On corrections of classical multivariate tests for high-dimensional data. Jian-feng Yao, Université de Rennes 1, IRMAR



Overview

1. Introduction: a two-sample problem
   - High-dimensional data and new challenges in statistics
   - A two-sample problem
2. Marčenko-Pastur distributions and one-sample problems
   - Sample vs. population covariance matrices
   - Marčenko-Pastur distributions
   - CLTs for linear spectral statistics of S_n
3. Random Fisher matrices and two-sample problems
   - Random Fisher matrices
   - Testing covariance matrices: the one-sample case
   - Simulation study I
   - Testing covariance matrices: the two-sample case
   - Simulation study II
4. Conclusions


High-dimensional data and new challenges in statistics

High-dimensional models: nonparametric regression,
  y_i = f(x_i) + \varepsilon_i,  f: R -> R,  i = 1, ..., n,
is a very high-dimensional (indeed infinite-dimensional) model, but with one-dimensional data.
High-dimensional data: observation vectors x_i in R^p, with p relatively large with respect to the sample size n.

Some typical data dimensions

Typical examples (data dimension p vs. sample size n): portfolio data, climate surveys, speech analysis, the ORL face database, micro-arrays. Important: the ratio y = p/n is not always close to 0; it is frequently larger than 1.

A two-sample problem: high-dimensional effects by an example

Two independent samples
  x_1, ..., x_{n_1} ~ (\mu_1, \Sigma),   y_1, ..., y_{n_2} ~ (\mu_2, \Sigma);
we want to test H_0: \mu_1 = \mu_2 against H_1: \mu_1 \ne \mu_2.

Classical approach: Hotelling's T^2 test,
  T^2 = \frac{n_1 n_2}{n} (\bar x - \bar y)' S_n^{-1} (\bar x - \bar y),
where
  \bar x = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i, \quad \bar y = \frac{1}{n_2}\sum_{j=1}^{n_2} y_j, \quad n = n_1 + n_2,
  S_n = \frac{1}{n-2}\Big[\sum_{i=1}^{n_1}(x_i-\bar x)(x_i-\bar x)' + \sum_{j=1}^{n_2}(y_j-\bar y)(y_j-\bar y)'\Big]
is a pooled sample covariance matrix.
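As a point of reference, here is a minimal numerical sketch (plain numpy) of the Hotelling statistic above; the classical F-calibration of the test is omitted, and the function name is illustrative, not from the talk.

```python
import numpy as np

def hotelling_T2(x, y):
    """Two-sample Hotelling T^2; x has shape (n1, p), y has shape (n2, p)."""
    n1, p = x.shape
    n2 = y.shape[0]
    n = n1 + n2
    xbar, ybar = x.mean(axis=0), y.mean(axis=0)
    # pooled sample covariance matrix with n - 2 degrees of freedom
    S = ((x - xbar).T @ (x - xbar) + (y - ybar).T @ (y - ybar)) / (n - 2)
    d = xbar - ybar
    return (n1 * n2 / n) * d @ np.linalg.solve(S, d)  # fails when p > n - 2 (S singular)

rng = np.random.default_rng(0)
print(hotelling_T2(rng.standard_normal((25, 10)), rng.standard_normal((20, 10))))
```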

The two-sample problem: Hotelling's T^2 test

Nice properties:
- invariance under linear transformations;
- optimality in the Gaussian case (LRT, BLUE, ...).
Bad news:
- low power even for moderate data dimensions;
- very little is known in the non-Gaussian case;
- fatal deficiency: when p > n - 2, S_n is not invertible.

Dempster's non-exact test (NET) — Dempster A.P., 1958, 1960

A reasonable test must be based on \bar x - \bar y even when p > n - 2. Choose a new orthonormal basis of R^n such that
  1. axis 1 is the direction of the grand mean,
  2. axis 2 is the direction of \bar x - \bar y.
With an orthogonal n x n matrix H_n = (h_1, ..., h_n)' and the n x p data matrix X stacking x_1, ..., x_{n_1}, y_1, ..., y_{n_2}, set
  Z = H_n X = (z_1, ..., z_n)',
where
  h_1 = \frac{1}{\sqrt n}\,1_n, \qquad h_2 = \Big(\sqrt{\tfrac{n_2}{n n_1}}\,1_{n_1}',\; -\sqrt{\tfrac{n_1}{n n_2}}\,1_{n_2}'\Big)'.
Under normality, the z_i are n independent N_p(\cdot, \Sigma) vectors with
  E z_1 = \tfrac{1}{\sqrt n}(n_1 \mu_1 + n_2 \mu_2), \quad E z_2 = \sqrt{\tfrac{n_1 n_2}{n}}(\mu_1 - \mu_2), \quad E z_i = 0, \ i = 3, ..., n.

Dempster's non-exact test (NET)

Test statistic:
  F_D = \frac{\|z_2\|^2}{\frac{1}{n-2}\sum_{j=3}^{n}\|z_j\|^2}.
Under H_0, each \|z_j\|^2 is distributed as
  Q = \sum_{k=1}^{r} \alpha_k \chi^2_{1,k},
where \alpha_1 \ge \cdots \ge \alpha_r > 0 are the non-null eigenvalues of \Sigma and the \chi^2_{1,k} are independent \chi^2_1 variables.
The distribution of F_D is complicated, hence the approximations (and the name "non-exact" test!):
  1. approximate Q by m \chi^2_r;
  2. then estimate r by \hat r.
Finally, under H_0, F_D \approx F(\hat r, (n-2)\hat r).
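A sketch of this recipe in numpy/scipy is below. From the construction above, \|z_2\|^2 = (n_1 n_2/n)\|\bar x - \bar y\|^2 and \sum_{j\ge 3}\|z_j\|^2 = (n-2) tr S_n, so the statistic itself does not require building H_n; the estimator \hat r = (tr S_n)^2 / tr(S_n^2) used here is a simple moment-matching choice and may differ in detail from Dempster's original estimate.

```python
import numpy as np
from scipy import stats

def dempster_net(x, y):
    """Sketch of Dempster's non-exact test (NET); x is (n1, p), y is (n2, p)."""
    n1, p = x.shape
    n2 = y.shape[0]
    n = n1 + n2
    xbar, ybar = x.mean(axis=0), y.mean(axis=0)
    S = ((x - xbar).T @ (x - xbar) + (y - ybar).T @ (y - ybar)) / (n - 2)
    FD = (n1 * n2 / n) * np.sum((xbar - ybar) ** 2) / np.trace(S)   # ||z_2||^2 / tr(S_n)
    r_hat = np.trace(S) ** 2 / np.trace(S @ S)                      # moment-matching estimate of r
    pval = stats.f.sf(FD, r_hat, (n - 2) * r_hat)                   # F(r_hat, (n-2) r_hat) calibration
    return FD, r_hat, pval

rng = np.random.default_rng(0)
print(dempster_net(rng.standard_normal((25, 100)), rng.standard_normal((20, 100))))
```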

Dempster's non-exact test (NET): problems with the NET test

- It is difficult to construct the orthogonal transformation H_n = {h_j} for large n;
- even under Gaussianity, the exact power function depends on H_n.

Bai and Saranadasa's test (ANT) — Bai & Saranadasa, 1996

Consider directly the statistic
  M_n = \|\bar x - \bar y\|^2 - \frac{n}{n_1 n_2}\operatorname{tr} S_n.
Under very mild conditions (here RMT comes in!),
  \frac{M_n}{\sigma_n} \Rightarrow N(0, 1), \qquad \sigma_n^2 = \operatorname{Var}(M_n) = \frac{2 n^2 (n-1)}{n_1^2 n_2^2 (n-2)}\operatorname{tr}\Sigma^2.
A ratio-consistent estimator:
  \hat\sigma_n^2 = \frac{2 n (n-1)(n-2)}{n_1^2 n_2^2 (n-3)}\Big[\operatorname{tr} S_n^2 - \frac{1}{n-2}(\operatorname{tr} S_n)^2\Big], \qquad \hat\sigma_n^2 / \sigma_n^2 \xrightarrow{P} 1.
Finally, under H_0,
  Z_n = \frac{M_n}{\hat\sigma_n} \Rightarrow N(0, 1).
This is the Bai-Saranadasa asymptotic normal test (ANT).
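The test is easy to compute even when p > n. The sketch below follows the formulas on this slide (with the constants as reconstructed here) and uses a one-sided rejection region for large Z_n; it is an illustration, not the authors' reference implementation.

```python
import numpy as np
from scipy import stats

def bai_saranadasa_ant(x, y):
    """Bai-Saranadasa asymptotic normal test (sketch); x is (n1, p), y is (n2, p)."""
    n1, p = x.shape
    n2 = y.shape[0]
    n = n1 + n2
    xbar, ybar = x.mean(axis=0), y.mean(axis=0)
    S = ((x - xbar).T @ (x - xbar) + (y - ybar).T @ (y - ybar)) / (n - 2)
    M = np.sum((xbar - ybar) ** 2) - (n / (n1 * n2)) * np.trace(S)
    trS, trS2 = np.trace(S), np.trace(S @ S)
    sigma2_hat = 2 * n * (n - 1) * (n - 2) / (n1**2 * n2**2 * (n - 3)) * (trS2 - trS**2 / (n - 2))
    Z = M / np.sqrt(sigma2_hat)
    return Z, stats.norm.sf(Z)  # one-sided p-value: reject H_0 for large Z_n

rng = np.random.default_rng(1)
x = rng.standard_normal((25, 100))  # p = 100 > n - 2: Hotelling unusable, ANT still works
y = rng.standard_normal((20, 100))
print(bai_saranadasa_ant(x, y))
```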

Comparison between T^2, NET and ANT: power functions (Bai & Saranadasa, 1996)

Assuming p, n -> infinity with p/n -> y in (0, 1) and n_1/n -> \kappa, the asymptotic power functions of Hotelling's T^2 and of Dempster's NET / Bai-Saranadasa's ANT are
  \beta_H(\mu) = \Phi\Big(-\xi_\alpha + \sqrt{\tfrac{n(1-y)}{2y}}\;\kappa(1-\kappa)\,\|\Sigma^{-1/2}\mu\|^2\Big) + o(1),
  \beta_D(\mu) = \Phi\Big(-\xi_\alpha + \frac{n\,\kappa(1-\kappa)\,\|\mu\|^2}{\sqrt{2\operatorname{tr}\Sigma^2}}\Big) + o(1) = \beta_{BS}(\mu),
where \alpha is the test size, \mu = \mu_1 - \mu_2 and \xi_\alpha = \Phi^{-1}(1-\alpha).
Important: because of the factor (1 - y), T^2 loses power when y increases, i.e. when p increases relative to n.

Comparison between T^2, NET and ANT: simulation results 1 (Gaussian case)

Choice of covariance: \Sigma = (1-\rho) I_p + \rho J_p, with J_p = 1_p 1_p'; noncentrality parameter \eta = \|\mu_1 - \mu_2\|^2 / \sqrt{\operatorname{tr}\Sigma^2}; (n_1, n_2) = (25, 20), n = 45.

Comparison between T^2, NET and ANT: simulation results 2 (Gamma variables)

Choice of covariance: \Sigma = tridiag(\rho, 1 + \rho^2, \rho).

A summary of the introduction

- High-dimensional effects need to be taken into account;
- surprisingly, asymptotic methods based on RMT perform well even for small p (as low as p = 4);
- many classical multivariate analysis methods have to be re-examined with respect to high-dimensional effects.


Sample vs. population covariance matrices: an effect of high dimensions (S_n vs. \Sigma)

x_i, 1 <= i <= n, i.i.d. N_p(0, \Sigma); sample covariance matrix
  S_n = \frac{1}{n}\sum_i x_i x_i'.
Figure: histogram of the eigenvalues of S_n for p = 40, n = 160, y = p/n = 0.25.

Sample vs. population covariance matrices

Figure: histogram of the eigenvalues of S_n for p = 40, n = 320, y = p/n = 0.125.

The Marčenko-Pastur distribution

Theorem (Marčenko & Pastur, 1967). Assume
- X is p x n with i.i.d. entries of mean 0 and variance 1 (not necessarily Gaussian, but with finite 4th moment);
- \Sigma = I_p;
- p, n -> infinity with p/n -> y in (0, 1].
Then the empirical distribution of the eigenvalues of S_n = \frac{1}{n} X X' converges to the distribution with density
  f(x) = \frac{1}{2\pi y x}\sqrt{(x-a)(b-x)}, \qquad a \le x \le b,
where a = (1 - \sqrt y)^2 and b = (1 + \sqrt y)^2.
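This convergence is easy to see numerically. The short numpy sketch below simulates S_n for p = 40, n = 160 (as in the histograms above) and compares the eigenvalue histogram with the limiting density; sizes and names are illustrative choices.

```python
import numpy as np

def mp_density(x, y):
    """Marcenko-Pastur density with index y = p/n, 0 < y <= 1."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    out = np.zeros_like(x, dtype=float)
    m = (x > a) & (x < b)
    out[m] = np.sqrt((x[m] - a) * (b - x[m])) / (2 * np.pi * y * x[m])
    return out

rng = np.random.default_rng(2)
p, n = 40, 160                              # y = 0.25
X = rng.standard_normal((p, n))
eig = np.linalg.eigvalsh(X @ X.T / n)       # eigenvalues of S_n
hist, edges = np.histogram(eig, bins=15, density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
print(np.c_[mids, hist, mp_density(mids, p / n)])  # histogram vs. limiting density
```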

The Marčenko-Pastur distribution

  f(x) = \frac{1}{2\pi y x}\sqrt{(x-a)(b-x)}, \qquad (1-\sqrt y)^2 = a \le x \le b = (1+\sqrt y)^2.
Figure: densities of the Marčenko-Pastur law for y = p/n = 1/8, 1/4 and 1/2, with the corresponding support edges a and b.

An explanation of the power deficiency of Hotelling's T^2

- When p increases, even in the Gaussian case, S_n is different from its population counterpart \Sigma;
- when y = p/n -> 1, the left edge a -> 0: small eigenvalues yield an instability of the T^2 statistic
  T^2 = \frac{n_1 n_2}{n}(\bar x - \bar y)' S_n^{-1}(\bar x - \bar y).

CLT for linear spectral statistics of S_n

Set y_n = p/n and let U be an open subset of C containing the support [a, b]. For any g analytic on U, define
  G_n(g) = p\,[F_n(g) - \mu_{y_n}(g)],   (1)
where F_n(g) = \int g\, dF_n, F_n is the empirical spectral distribution of S_n, and \mu_\alpha denotes the Marčenko-Pastur distribution of index \alpha in (0, 1).

A CLT for linear spectral statistics

Theorem (Bai and Silverstein, 2004). Assume that
- g_1, ..., g_k are k analytic functions on U;
- the matrix entries x_ij are i.i.d. real-valued random variables such that E x_ij = 0, E x_ij^2 = 1, E x_ij^4 = 3;
- n, p -> infinity with y_n = p/n -> y in (0, 1).
Then (G_n(g_1), ..., G_n(g_k)) => N_k(m, V), with an explicit mean vector m = m(g_1, ..., g_k) and asymptotic covariance matrix V = V(g_1, ..., g_k).


Random Fisher matrices

Two independent samples
  x_1, ..., x_{n_1} ~ (0, I_p),   y_1, ..., y_{n_2} ~ (0, I_p),
with i.i.d. coordinates of mean 0 and variance 1. Associated sample covariance matrices:
  S_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i x_i', \qquad S_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} y_j y_j'.
Fisher matrix: V_n = S_1 S_2^{-1}, where n_2 > p.

Random Fisher matrices

Assume y_{n_1} = p/n_1 -> y_1 in (0, 1) and y_{n_2} = p/n_2 -> y_2 in (0, 1). Under mild moment conditions, the ESD F_n^{V_n} of V_n has a LSD F_{y_1, y_2} with density
  \ell(x) = \frac{(1-y_2)\sqrt{(b-x)(x-a)}}{2\pi x (y_1 + y_2 x)} for a \le x \le b, and 0 otherwise,
where
  a = \frac{(1-h)^2}{(1-y_2)^2}, \qquad b = \frac{(1+h)^2}{(1-y_2)^2}, \qquad h = \sqrt{y_1 + y_2 - y_1 y_2}.
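A quick numerical check of this limit (a sketch; the sizes p, n_1, n_2 and the function name are illustrative, not from the talk):

```python
import numpy as np

def fisher_lsd_density(x, y1, y2):
    """Density of the limiting spectral distribution of the Fisher matrix S1 S2^{-1}."""
    h = np.sqrt(y1 + y2 - y1 * y2)
    a, b = ((1 - h) / (1 - y2)) ** 2, ((1 + h) / (1 - y2)) ** 2
    out = np.zeros_like(x, dtype=float)
    m = (x > a) & (x < b)
    out[m] = (1 - y2) * np.sqrt((b - x[m]) * (x[m] - a)) / (2 * np.pi * x[m] * (y1 + y2 * x[m]))
    return out

rng = np.random.default_rng(3)
p, n1, n2 = 50, 200, 400                      # y1 = 0.25, y2 = 0.125, n2 > p
X = rng.standard_normal((p, n1))
Y = rng.standard_normal((p, n2))
S1, S2 = X @ X.T / n1, Y @ Y.T / n2
eig = np.sort(np.linalg.eigvals(S1 @ np.linalg.inv(S2)).real)
print(eig[0], eig[-1])                        # should lie roughly within the support [a, b]
```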

CLT for LSS of random Fisher matrices

Let \tilde U be an open subset of C containing the interval
  I = \Big[\, I_{(0,1)}(y_1)\,\frac{(1-\sqrt{y_1})^2}{(1+\sqrt{y_2})^2},\ \frac{(1+\sqrt{y_1})^2}{(1-\sqrt{y_2})^2} \,\Big].
For an analytic function f on \tilde U, define
  \tilde G_n(f) = p \int f(x)\,\big[F_n^{V_n} - F_{y_{n_1}, y_{n_2}}\big](dx),
where F_{y_{n_1}, y_{n_2}} is the LSD with indexes y_{n_k}, k = 1, 2.

CLT for LSS of random Fisher matrices

Theorem (Zheng, 2008). Assume E x_11^4 = E y_11^4 < infinity and let \beta = E|x_11|^4 - 3. Then for any analytic functions f_1, ..., f_k defined on \tilde U,
  (\tilde G_n(f_1), ..., \tilde G_n(f_k)) => N_k(m, \upsilon).

CLT for LSS of random Fisher matrices: limiting mean function m (Zheng, 2008)

  m(f_j) = \lim_{r \downarrow 1} \Big\{ \frac{1}{4\pi i}\oint_{|\zeta|=1} f_j(z(\zeta))\Big[\frac{1}{\zeta - \frac{1}{r}} + \frac{1}{\zeta + \frac{1}{r}} - \frac{2}{\zeta + \frac{y_2}{hr}}\Big]\, d\zeta
   + \frac{\beta\, y_1 (1-y_2)^2}{2\pi i\, h^2}\oint_{|\zeta|=1} \frac{f_j(z(\zeta))}{\big(\zeta + \frac{y_2}{hr}\big)^3}\, d\zeta
   + \frac{\beta\, y_2 (1-y_2)}{2\pi i\, h}\oint_{|\zeta|=1} f_j(z(\zeta))\, \frac{\zeta + \frac{1}{hr}}{\big(\zeta + \frac{y_2}{hr}\big)^3}\, d\zeta \Big\},
where
  z(\zeta) = \frac{1 + h^2 + 2h\,\Re(\zeta)}{(1-y_2)^2}, \qquad h = \sqrt{y_1 + y_2 - y_1 y_2}.

CLT for LSS of random Fisher matrices: limiting covariance function \upsilon (Zheng, 2008)

  \upsilon(f_j, f_l) = \lim_{1 < r_1 < r_2 \downarrow 1} \Big\{ -\frac{1}{2\pi^2}\oint_{|\zeta_2|=1}\oint_{|\zeta_1|=1} \frac{f_j(z(r_1\zeta_1))\, f_l(z(r_2\zeta_2))\, r_1 r_2}{(r_2\zeta_2 - r_1\zeta_1)^2}\, d\zeta_1\, d\zeta_2
   - \frac{\beta\,(y_1+y_2)(1-y_2)^2}{4\pi^2 h^2}\oint_{|\zeta_1|=1}\frac{f_j(z(\zeta_1))}{\big(\zeta_1 + \frac{y_2}{hr_1}\big)^2}\, d\zeta_1 \oint_{|\zeta_2|=1}\frac{f_l(z(\zeta_2))}{\big(\zeta_2 + \frac{y_2}{hr_2}\big)^2}\, d\zeta_2 \Big\},
for j, l in {1, ..., k}.


One-sample test on covariance matrices

A sample x_1, ..., x_n ~ N_p(\mu, \Sigma); we want to test H_0: \Sigma = I_p.
LR statistic:
  T_n = n\,[\operatorname{tr} S_n - \log|S_n| - p],
with the sample covariance matrix
  S_n = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)(x_i - \bar x)'.
Classical LRT: the data dimension p is fixed and, as n -> infinity, T_n => \chi^2_{p(p+1)/2}. As we will see, this approximation becomes rapidly deficient when p is not small.

RMT-corrected LRT

Theorem (Bai, Jiang, Yao and Zheng, 2009). Assume p/n -> y in (0, 1) and let g(x) = x - \log x - 1. Then, under H_0 and as n -> infinity,
  \frac{T_n}{n} - p\,F_{y_n}(g) \Rightarrow N(m(g), \upsilon(g)),
where F_{y_n} is the Marčenko-Pastur law of index y_n = p/n, and
  m(g) = -\frac{\log(1-y)}{2}, \qquad \upsilon(g) = -2\log(1-y) - 2y.
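A minimal sketch of the corrected LRT (CLRT) follows. The centering p F_{y_n}(g) uses the closed-form Marčenko-Pastur integral of g(x) = x - log x - 1, namely 1 - ((y-1)/y) log(1-y); this closed form is a standard MP integral and is assumed here rather than taken from the slide, and finer calibration details (e.g. n vs. n-1 conventions) are glossed over.

```python
import numpy as np
from scipy import stats

def clrt_identity(X):
    """One-sample corrected LRT for H0: Sigma = I_p (sketch); X has shape (n, p), p < n."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                              # S_n as on the slide (divisor n)
    sign, logdet = np.linalg.slogdet(S)
    Tn = n * (np.trace(S) - logdet - p)            # classical LRT statistic
    y = p / n
    # Marcenko-Pastur integral of g(x) = x - log x - 1 (closed form assumed, not shown on the slide)
    Fy_g = 1 - (y - 1) / y * np.log(1 - y)
    m = -0.5 * np.log(1 - y)
    v = -2.0 * np.log(1 - y) - 2.0 * y
    Z = (Tn / n - p * Fy_g - m) / np.sqrt(v)
    return Z, stats.norm.sf(Z)                     # one-sided p-value: reject H0 for large Z

rng = np.random.default_rng(4)
print(clrt_identity(rng.standard_normal((500, 100))))   # p = 100, n = 500
```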

Simulation study I: comparison of LRT and CLRT by simulation

- nominal test level \alpha = 0.05;
- for each (p, n), 10,000 independent replications with real Gaussian variables;
- powers are estimated under the alternative H_1: \Sigma = diag(1, 0.05, 0.05, ..., 0.05).
Table: empirical size (and its difference from 5%) and power of the CLRT, and empirical size and power of the LRT, for (p, n) = (5, 500), (10, 500), (50, 500), (100, 500), (300, 500).

Simulation study I: on a plot

Figure: empirical type I error of the CLRT and the LRT as a function of the dimension p, for n = 500.


Two-sample test on covariance matrices

Two samples x_1, ..., x_{n_1} ~ N_p(\mu_1, \Sigma_1) and y_1, ..., y_{n_2} ~ N_p(\mu_2, \Sigma_2); we want to test H_0: \Sigma_1 = \Sigma_2. The associated sample covariance matrices are
  S_1 = \frac{1}{n_1}\sum_{i=1}^{n_1}(x_i - \bar x)(x_i - \bar x)', \qquad S_2 = \frac{1}{n_2}\sum_{i=1}^{n_2}(y_i - \bar y)(y_i - \bar y)'.
The LR statistic is
  L_1 = \frac{|S_1 S_2^{-1}|^{n_1/2}}{|c_1 S_1 S_2^{-1} + c_2 I_p|^{n/2}},
where n = n_1 + n_2 and c_k = n_k / n, k = 1, 2.

Two-sample test on covariance matrices: classical LRT

The data dimension p is fixed; when n_1, n_2 -> infinity and under H_0,
  T_n = -2\log L_1 \Rightarrow \chi^2_{p(p+1)/2}.
As we will see, this approximation becomes rapidly deficient when p is not small.

RMT-corrected LRT

Theorem (Bai, Jiang, Yao and Zheng, 2009). Assume that the conditions of the CLT for LSS of Fisher matrices hold, and let
  f(x) = \log(y_1 + y_2 x) - \frac{y_2}{y_1 + y_2}\log x - \log(y_1 + y_2).
Then, under H_0 and as n_1, n_2 -> infinity,
  -\frac{2}{n}\log L_1 - p\,F_{y_{n_1}, y_{n_2}}(f) \Rightarrow N(m(f), \upsilon(f)),
with
  m(f) = \frac{1}{2}\Big[\log\frac{y_1 + y_2 - y_1 y_2}{y_1 + y_2} - \frac{y_1}{y_1 + y_2}\log(1 - y_2) - \frac{y_2}{y_1 + y_2}\log(1 - y_1)\Big],
  \upsilon(f) = -\frac{2 y_2^2}{(y_1 + y_2)^2}\log(1 - y_1) - \frac{2 y_1^2}{(y_1 + y_2)^2}\log(1 - y_2) - 2\log\frac{y_1 + y_2}{y_1 + y_2 - y_1 y_2}.
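A sketch of the corrected two-sample test, following the formulas as reconstructed above. The centering term F_{y_{n_1},y_{n_2}}(f) is obtained here by numerically integrating f against the Fisher LSD density from the earlier slide, which avoids quoting a closed form not shown in the talk; the implementation and its calibration details are illustrative, not the authors' exact procedure.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def two_sample_clrt(X, Y):
    """Two-sample corrected LRT for H0: Sigma_1 = Sigma_2 (sketch).
    X has shape (n1, p), Y has shape (n2, p), with p < n1 and p < n2."""
    n1, p = X.shape
    n2 = Y.shape[0]
    n = n1 + n2
    c1, c2 = n1 / n, n2 / n
    y1, y2 = p / n1, p / n2
    S1 = np.cov(X, rowvar=False, bias=True)        # divisor n1, as on the slides
    S2 = np.cov(Y, rowvar=False, bias=True)
    lam = np.linalg.eigvals(S1 @ np.linalg.inv(S2)).real   # eigenvalues of the Fisher matrix
    stat = -(n1 / n) * np.sum(np.log(lam)) + np.sum(np.log(c1 * lam + c2))   # -(2/n) log L_1

    h = np.sqrt(y1 + y2 - y1 * y2)
    a, b = ((1 - h) / (1 - y2)) ** 2, ((1 + h) / (1 - y2)) ** 2

    def f(x):      # the function f of the theorem
        return np.log(y1 + y2 * x) - y2 / (y1 + y2) * np.log(x) - np.log(y1 + y2)

    def lsd(x):    # Fisher LSD density, used to compute the centering term numerically
        return (1 - y2) * np.sqrt(max((b - x) * (x - a), 0.0)) / (2 * np.pi * x * (y1 + y2 * x))

    center, _ = quad(lambda x: f(x) * lsd(x), a, b)
    m = 0.5 * (np.log(h * h / (y1 + y2))
               - y1 / (y1 + y2) * np.log(1 - y2)
               - y2 / (y1 + y2) * np.log(1 - y1))
    v = (-2 * y2**2 / (y1 + y2)**2 * np.log(1 - y1)
         - 2 * y1**2 / (y1 + y2)**2 * np.log(1 - y2)
         - 2 * np.log((y1 + y2) / (h * h)))
    Z = (stat - p * center - m) / np.sqrt(v)
    return Z, stats.norm.sf(Z)                     # one-sided p-value: reject H0 for large Z

rng = np.random.default_rng(5)
print(two_sample_clrt(rng.standard_normal((200, 40)), rng.standard_normal((400, 40))))
```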

Simulation study II: comparison of LRT and CLRT by simulation

- nominal test level \alpha = 0.05;
- for each (p, n_1, n_2), 10,000 independent replications with real Gaussian variables;
- powers are estimated under the alternative H_1: \Sigma_1 \Sigma_2^{-1} = diag(3, 1, 1, ..., 1).

Simulation study II: comparison of LRT and CLRT by simulation with (y_1, y_2) = (0.05, 0.05)

Table: empirical size (and its difference from 5%) and power of the CLRT, and empirical size and power of the LRT, for (p, n_1, n_2) = (5, 100, 100), (10, 200, 200), (20, 400, 400), (40, 800, 800), (80, 1600, 1600), (160, 3200, 3200), (320, 6400, 6400).

Simulation study II: comparison of LRT and CLRT by simulation with (y_1, y_2) = (0.05, 0.1)

Table: empirical size (and its difference from 5%) and power of the CLRT, and empirical size and power of the LRT, for (p, n_1, n_2) = (5, 100, 50), (10, 200, 100), (20, 400, 200), (40, 800, 400), (80, 1600, 800), (160, 3200, 1600), (320, 6400, 3200).

Simulation study II: comparisons of LRT and CLRT

Figure: empirical type I error of the CLRT and the LRT as a function of the dimension, for (y_1, y_2) = (0.05, 0.05) (left panel) and (y_1, y_2) = (0.05, 0.1) (right panel).


Some conclusions

- High-dimensional effects should be taken into account;
- RMT for sample covariance matrices is a powerful tool to correct classical multivariate procedures;
- each time some \Sigma is to be estimated, take care with the natural estimator \hat\Sigma = S_n.
Yet RMT is not sufficiently developed for statistics with
  1. dependent observations (time series);
  2. not identically distributed variables.

Some references

- Bai, Z. D. and Saranadasa, H. (1996). Effect of high dimension: comparison of significance tests for a high dimensional two sample problem. Statistica Sinica 6.
- Bai, Z. D. and Silverstein, J. W. (2004). CLT for linear spectral statistics of large-dimensional sample covariance matrices. Ann. Probab. 32.
- Bai, Z. D., Jiang, D., Yao, J. and Zheng, S. (2009). Corrections to LRT on large dimensional covariance matrix by RMT. Annals of Statistics 37.
- Bai, Z. D., Jiang, D., Yao, J. and Zheng, S. Large regression analysis. Submitted.
- Dempster, A. P. (1958). A high dimensional two sample significance test. Ann. Math. Statist. 29.
- Zheng, S. (2008). Central limit theorem for linear spectral statistics of large dimensional F-matrix. Preprint, Northeast Normal University.
