Random Matrices and Multivariate Statistical Analysis

1  Random Matrices and Multivariate Statistical Analysis
Iain Johnstone, Statistics, Stanford
SEA 06@MIT

2  Agenda
Classical multivariate techniques: Principal Component Analysis, Canonical Correlations, Multivariate Regression
Hypothesis Testing: Single and Double Wishart
Eigenvalue densities
Linear Statistics: Single Wishart, Double Wishart
Largest Eigenvalue: Single Wishart, Double Wishart
Concluding Remarks

3  Classical Multivariate Statistics
Canonical methods are based on spectral decompositions.
One matrix (Wishart): Principal Component Analysis, Factor Analysis, Multidimensional Scaling.
Two matrices (independent Wisharts): Multivariate Analysis of Variance (MANOVA), Multivariate Regression Analysis, Discriminant Analysis, Canonical Correlation Analysis, Tests of Equality of Covariance Matrices.

4  Gaussian data matrices
$X$ is an $n \times p$ data matrix (rows = cases, columns = variables), with independent rows $x_i \sim N_p(0, \Sigma)$, $i = 1, \ldots, n$; equivalently $X \sim N(0, I_n \otimes \Sigma_p)$.
Zero mean, so there is no centering in the sample covariance matrix:
$S = (S_{kk'}), \qquad S = \frac{1}{n} X^T X, \qquad S_{kk'} = \frac{1}{n}\sum_{i=1}^{n} x_{ik} x_{ik'}$
$nS \sim W_p(n, \Sigma)$
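
As a concrete illustration of this setup (not part of the talk), here is a minimal numpy sketch that simulates a Gaussian data matrix with $\Sigma = I$ and forms $S$ and $A = nS$; all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10                      # illustrative sample size and dimension
X = rng.standard_normal((n, p))     # rows x_i ~ N_p(0, I)
S = X.T @ X / n                     # sample covariance (no centering: mean is zero)
A = n * S                           # A = nS ~ W_p(n, I)
print(np.linalg.eigvalsh(S)[::-1])  # eigenvalues l_1 >= ... >= l_p of S
```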

5  Principal Components Analysis (Hotelling, 1933)
$X_1, \ldots, X_n \sim N_p(\mu, \Sigma)$. Low-dimensional subspace explaining most variance:
$l_i = \max\{u' S u : u'u = 1,\ u'u_j = 0,\ j < i\}$
Eigenvalues of the Wishart matrix $A = nS \sim W_p(n, \Sigma)$: $A u_i = l_i u_i$, $l_1 \ge \cdots \ge l_p \ge 0$.
Key question: how many $l_i$ are significant?
[Figure: "scree" plot of singular values of phoneme data]
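
A small sketch of the PCA computation behind a scree plot, using simulated data rather than the phoneme data shown on the slide; the variance profile is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.standard_normal((n, p)) * np.linspace(2.0, 0.5, p)  # unequal column scales (hypothetical)
S = X.T @ X / n
l = np.sort(np.linalg.eigvalsh(S))[::-1]    # principal component variances l_1 >= ... >= l_p
explained = np.cumsum(l) / l.sum()          # cumulative proportion of variance explained
print(l)                                    # the values one would plot in a scree plot
print(explained)
```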

6  Canonical Correlations
$\binom{X_1}{Y_1}, \ldots, \binom{X_n}{Y_n}$ jointly $(p+q)$-variate normal.
"Most predictable criterion" (Hotelling, 1935, 1936): $\max_{u_i, v_i} \mathrm{Corr}(u_i'X, v_i'Y)$.
$A v_i = r_i^2 (A + B) v_i$, $r_1^2 \ge \cdots \ge r_p^2$.
Two independent Wishart distributions: $A \sim W_p(q, \Sigma)$, $B \sim W_p(n-q, \Sigma)$.
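
A sketch (numpy only, made-up data) of how sample canonical correlations can be computed, here as singular values of the whitened cross-covariance $S_{XX}^{-1/2} S_{XY} S_{YY}^{-1/2}$, which is equivalent to the generalized eigenproblem on this slide.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 500, 3, 5
Y = rng.standard_normal((n, q))
X = 0.6 * Y[:, :p] + rng.standard_normal((n, p))   # X correlated with the first p columns of Y

Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

def inv_sqrt(M):
    # symmetric inverse square root via the eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T

r = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy), compute_uv=False)
print(r)   # sample canonical correlations r_1 >= ... >= r_p
```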

7  Multivariate Multiple Regression
$Y = XB + U$, with $Y: n \times p$, $X: n \times q$, $B: q \times p$, $U: n \times p$, and $U \sim N_p(0, I \otimes \Sigma)$.
$n$ = # observations; $p$ = # response variables; $q$ = # predictor variables.
$P = X(X^T X)^{-1} X^T$ is the projection on span{cols$(X)$}.
$Y^T Y = Y^T P Y + Y^T (I-P) Y = H + E$, with $H$ the hypothesis SSP and $E$ the error SSP.
$H \sim W_p(q, \Sigma)$ independent of $E \sim W_p(n-q, \Sigma)$.
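
The following sketch (numpy, hypothetical dimensions) carries out the sums-of-squares-and-products split described on this slide, simulating under $B = 0$ so that $H$ and $E$ are independent Wisharts.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 100, 4, 6
X = rng.standard_normal((n, q))            # predictors
Y = rng.standard_normal((n, p))            # responses with B = 0, Sigma = I
P = X @ np.linalg.solve(X.T @ X, X.T)      # projection onto span{cols(X)}
H = Y.T @ P @ Y                            # hypothesis SSP, W_p(q, Sigma) when B = 0
E = Y.T @ (np.eye(n) - P) @ Y              # error SSP, W_p(n - q, Sigma)
assert np.allclose(H + E, Y.T @ Y)         # Y'Y = Y'PY + Y'(I-P)Y
```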

8  Agenda
Classical multivariate techniques; Hypothesis Testing: Single and Double Wishart; Eigenvalue densities; Linear Statistics; Largest Eigenvalue; Concluding Remarks.

9  Hypothesis Testing
Null hypothesis $H_0$ nested within alternative hypothesis $H_A$.
Test statistics are functions of eigenvalues: $T = T(l_1, \ldots, l_p)$. Null hypothesis distribution: $P(T > t \mid H_0 \text{ true})$.
RMT offers tools for evaluation, and approximations based on $p \to \infty$.
Single Wishart: $A \sim W_p(n, I)$, eigenvalues $\det(A - l_i I) = 0$. Test $H_0: \Sigma = I$ (or $\lambda I$) versus $H_A$: $\Sigma$ unrestricted.
Double Wishart: $H \sim W_p(q, \Sigma)$, $E \sim W_p(n-q, \Sigma)$ independently; eigenvalues $\det(H - l_i(E+H)) = 0$. Typical hypothesis test (e.g. from $Y = XB + U$): $H_0: B = 0$ versus $H_A$: $B$ unrestricted.

10  Likelihood Ratio Test
If $X \sim N(0, I_n \otimes \Sigma)$, the density is $f_\Sigma(X) = \det(2\pi\Sigma)^{-n/2} \exp\{-(n/2)\,\mathrm{tr}\,\Sigma^{-1} S\}$.
Log likelihood: $\ell(\Sigma \mid X) = \log f_\Sigma(X) = c_{np} - \frac{n}{2}\log\det\Sigma - \frac{n}{2}\,\mathrm{tr}\,\Sigma^{-1} S$.
Maximum likelihood occurs at $\hat\Sigma = S$: $\max_\Sigma \ell(\Sigma \mid X) = c_{np} - \frac{n}{2}\log\det S$.
Likelihood ratio test of $H_0: \Sigma = I$ vs. $H_A$: $\Sigma$ unrestricted:
$\log LR = \max_{\Sigma \in H_0} \ell(\Sigma \mid X) - \max_{\Sigma \in H_A} \ell(\Sigma \mid X) = c_{np} + \frac{n}{2}\Big(\sum_i \log l_i - \sum_i l_i\Big)$
Linear statistics in the eigenvalues of $S$: $\sum_i \log l_i$, $\sum_i l_i$.
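
A minimal sketch computing the log likelihood ratio as the linear eigenvalue statistic above, on simulated null data (numpy only); the classical approximation compares $-2\log LR$ with a $\chi^2$ on $p(p+1)/2$ degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 5
X = rng.standard_normal((n, p))                 # H_0 true: Sigma = I
l = np.linalg.eigvalsh(X.T @ X / n)             # eigenvalues of S
log_LR = (n / 2) * (np.sum(np.log(l)) - np.sum(l) + p)
print(log_LR, -2 * log_LR)                      # -2 log LR is compared with chi^2, p(p+1)/2 df
```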

11  (Union-) Intersection Principle
Combine univariate test statistics: $H_0: \Sigma = I$ is the intersection over $\|a\| = 1$ of $H_{0a}: a^T \Sigma a = 1$.
$\mathrm{Var}(a^T X) = a^T \Sigma a$, so reject $H_{0a}$ if the sample version $a^T S a > c_a$.
Reject $H_0$ $\iff$ reject some $H_{0a}$ $\iff$ $\max_a a^T S a > c_{\max}$ $\iff$ $l_{\max}(S) > c_{\max}$.
Summary: the likelihood ratio principle leads to linear statistics in eigenvalues; the intersection principle leads to extreme eigenvalues.

12  Agenda
Classical multivariate techniques; Hypothesis Testing: Single and Double Wishart; Eigenvalue densities; Linear Statistics; Largest Eigenvalue; Concluding Remarks.

13  Eigenvalue densities: single Wishart
Statistics $(n, p)$: $\quad c \prod_{i=1}^{p} l_i^{(n-p-1)/2} e^{-l_i/2} \prod_{j<k} |l_j - l_k|$
Laguerre OE $(N, \alpha)$: $\quad c \prod_{i=1}^{N} x_i^{\alpha/2} e^{-x_i/2} \prod_{j<k} |x_j - x_k|$
Correspondence: $N \leftrightarrow p$, $\alpha \leftrightarrow n - p - 1$.
The notation change has significance: $p$ = # variables, $n$ = sample size.
Statistics: no necessary relation between $p$ and $n$; the traditional approximation takes $p$ fixed, $n \to \infty$.
RMT: $N \to \infty$ with $\alpha$ fixed is most natural (in statistics, fixing $n - p$ would be less natural).

14  Eigenvalue densities: double Wishart
Statistics: if $H \sim W_p(q, I)$ and $E \sim W_p(n-q, I)$ are independent, the joint density of the eigenvalues $\{u_i\}$ of $H(H+E)^{-1}$ is
$f(u) = c \prod_{i=1}^{p} u_i^{(q-p-1)/2} (1-u_i)^{(n-q-p-1)/2} \prod_{i<j} |u_i - u_j|$.
With $p \leftrightarrow N+1$, $n-q-p \leftrightarrow \alpha$, $q-p \leftrightarrow \beta$ and $u = (1+x)/2$, recover the Jacobi orthogonal ensemble
$f(x) = c \prod_{i=1}^{N+1} (1-x_i)^{(\alpha-1)/2} (1+x_i)^{(\beta-1)/2} \prod_{i<j} |x_i - x_j|$.

15  Convergence of Empirical Spectra
For eigenvalues $\{l_i\}_{i=1}^{p}$: $G_p(t) = p^{-1}\#\{l_i \le t\} \to G(t) = \int g(t)\,dt$.
Single Wishart (Marčenko-Pastur, 1967): $A \sim W_p(n, I)$. If $p/n \to c > 0$,
$g_{MP}(t) = \frac{\sqrt{(b_+ - t)(t - b_-)}}{2\pi c t}, \qquad b_\pm = (1 \pm \sqrt{c})^2$.
Double Wishart (Wachter, 1980): $\det(H - l_i(H+E)) = 0$. If $p \le q$, $p/n \to c = \sin^2(\gamma/2) > 0$, $q/n \to \sin^2(\phi/2)$,
$g_W(t) = \frac{\sqrt{(b_+ - t)(t - b_-)}}{2\pi c\, t(1-t)}, \qquad b_\pm = \sin^2\Big(\frac{\phi \pm \gamma}{2}\Big)$.
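
A quick numerical sketch (numpy; illustrative sizes) comparing the empirical spectrum of a white sample covariance matrix with the Marčenko-Pastur density quoted above.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 2000, 500
c = p / n
X = rng.standard_normal((n, p))
l = np.linalg.eigvalsh(X.T @ X / n)
b_lo, b_hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2

def g_mp(t):
    # Marchenko-Pastur density on [b_lo, b_hi]
    return np.sqrt(np.maximum((b_hi - t) * (t - b_lo), 0.0)) / (2 * np.pi * c * t)

t = np.linspace(b_lo, 1.0, 4000)
mp_mass = np.sum(g_mp(t)) * (t[1] - t[0])        # integral of g_MP over [b_-, 1]
print(np.mean(l <= 1.0), mp_mass)                # empirical vs. limiting fraction below 1
```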

16  Agenda
Classical multivariate techniques; Hypothesis Testing: Single and Double Wishart; Eigenvalue densities; Linear Statistics: Single Wishart, Double Wishart; Largest Eigenvalue; Concluding Remarks.

17  Linear Statistics: Single Wishart
Approximate distributions.
Statistics: typically $p$ fixed; standard $\chi^2$ approximation, with improvements by Bartlett correction.
RMT: Central Limit Theorems ($p$ large) for linear statistics of eigenvalues; large literature.
Jonsson (1982): $nS \sim W_p(n, I)$, $p/n \to c > 0$. With $d(c) = (1 - c^{-1})\log(1-c) - 1$,
$\log\det S - p\,d(c) \xrightarrow{D} N\big(\tfrac{1}{2}\log(1-c),\ -2\log(1-c)\big) \qquad (1)$
$\mathrm{tr}\,S - p \xrightarrow{D} N(0, 2c)$
Surprise: the quality of the approximation in (1) even for small $p$ (e.g. $p = 2$!).
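
A Monte Carlo sketch (numpy; modest replication count) checking the centering and the limiting mean and variance in (1); the parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, reps = 400, 100, 200
c = p / n
d = (1 - 1 / c) * np.log(1 - c) - 1
stat = []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    l = np.linalg.eigvalsh(X.T @ X / n)
    stat.append(np.sum(np.log(l)) - p * d)       # log det S - p d(c)
print(np.mean(stat), 0.5 * np.log(1 - c))        # sample mean vs. limiting mean
print(np.var(stat), -2 * np.log(1 - c))          # sample variance vs. limiting variance
```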

18  Small p asymptotics
[Figure: QQ plot of sample data versus standard normal quantiles, illustrating the normal approximation for small n and p.]

19  CLT for the Likelihood Ratio distribution
Bai-Silverstein (2004):
$\sum_{i=1}^{p} f(l_i) - p\int f(x)\, g_{MP}(x)\,dx \xrightarrow{D} X_f \sim N(EX_f, \mathrm{Var}(X_f))$,
$\mathrm{Cov}(X_f, X_g) = -\frac{1}{2\pi^2}\oint_{\Gamma_1}\oint_{\Gamma_2} \frac{f(z(m_1))\, g(z(m_2))}{(m_1 - m_2)^2}\, dm_1\, dm_2$.
CLT for the null distribution of the LR test of $H_0: \Sigma = I$:
$\sum_{1}^{p} (\log l_i - l_i + 1) \xrightarrow{D} N\Big(p\,d(c) + \tfrac{1}{2}\log(1-c),\ -2[\log(1-c) + c]\Big)$.
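
A companion Monte Carlo sketch for the likelihood ratio statistic itself, comparing its null mean and variance with the normal approximation displayed above (numpy; simulated $H_0$ data, illustrative sizes).

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, reps = 400, 100, 200
c = p / n
d = (1 - 1 / c) * np.log(1 - c) - 1
T = []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    l = np.linalg.eigvalsh(X.T @ X / n)
    T.append(np.sum(np.log(l) - l + 1))          # LR statistic as a linear eigenvalue statistic
print(np.mean(T), p * d + 0.5 * np.log(1 - c))   # sample mean vs. limiting mean
print(np.var(T), -2 * (np.log(1 - c) + c))       # sample variance vs. limiting variance
```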

20  Linear Statistics: Double Wishart
Hypothesis tests are based on the eigenvalues $u_i$ of $H(H+E)^{-1}$, i.e. the eigenvalues $w_i = u_i/(1-u_i)$ of $HE^{-1}$.
Many standard tests are linear statistics $S_N(g) = \sum_{1}^{p} g(u_i)$:
Wilks' $\Lambda$: $\log\Lambda = \sum_{1}^{p} \log(1-u_i)$ [likelihood ratio test]
Pillai's trace $= \sum_{1}^{p} u_i$
Hotelling-Lawley trace $= \sum u_i/(1-u_i) = \sum_{1}^{p} w_i$
Roy's largest root $= u_{(1)}$.
Basor-Chen (2005), unitary case, formal; $N \to \infty$, $\alpha, \beta$ fixed:
$S_N(g) - (2N + \alpha + \beta)\, a_g \xrightarrow{D} N(0, b_{gg})$,
with $a_g = \frac{1}{2\pi}\int_{-1}^{1} \frac{g(x)}{\sqrt{1-x^2}}\,dx$ and $b_{gg} = \frac{1}{2\pi^2}\int_{-1}^{1} \frac{g(x)}{\sqrt{1-x^2}}\ \mathrm{P}\!\!\int_{-1}^{1} \frac{\sqrt{1-y^2}}{x-y}\, g'(y)\,dy\,dx$.
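
A short sketch (numpy/scipy, simulated null data) computing the four classical statistics listed above from the eigenvalues $u_i$ of $H(H+E)^{-1}$.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(8)
n, p, q = 100, 4, 6
X = rng.standard_normal((n, q))
Y = rng.standard_normal((n, p))                  # B = 0: null hypothesis true
P = X @ np.linalg.solve(X.T @ X, X.T)
H, E = Y.T @ P @ Y, Y.T @ (np.eye(n) - P) @ Y
u = eigh(H, H + E, eigvals_only=True)[::-1]      # u_1 >= ... >= u_p

print(np.sum(np.log(1 - u)))                     # log Wilks' Lambda
print(np.sum(u))                                 # Pillai's trace
print(np.sum(u / (1 - u)))                       # Hotelling-Lawley trace
print(u[0])                                      # Roy's largest root
```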

21  Agenda
Classical multivariate techniques; Hypothesis Testing: Single and Double Wishart; Eigenvalue densities; Linear Statistics; Largest Eigenvalue: Single Wishart, Double Wishart; Concluding Remarks.

22  Largest Eigenvalue: Single Wishart
The usual (independence-based) approach to maxima is classically infeasible:
$I\{l_{(1)} \le x\} = \prod_{i=1}^{p} I\{l_i \le x\}$.
Key role: determinants, not independence:
$\prod_{i<j} (l_i - l_j) = \det\big[l_i^{k-1}\big]_{1 \le i,k \le p}$,
$\prod_{i=1}^{p} I\{l_i \le x\} = \sum_{k=0}^{p} \binom{p}{k} (-1)^k\, I\{l_1 > x\} \cdots I\{l_k > x\}$ (using exchangeability),
$P\{\max_{1 \le i \le p} l_i \le t\} = \det(I - K_p \chi_{[t,\infty)})$,
where $K_p(x,y)$ is a ($2 \times 2$ matrix) kernel built from {Laguerre, Jacobi} orthogonal polynomials via Christoffel-Darboux summation.

23  Tracy-Widom Limit
For real ($\beta = 1$, IMJ) or complex ($\beta = 2$, Johansson) data, if $n/p \to c \in (0, \infty)$:
$F_p(s) = P\{l_1 \le \mu_{np} + \sigma_{np}\, s\} \to F_\beta(s)$, with
$\mu_{np} = (\sqrt{n} + \sqrt{p})^2, \qquad \sigma_{np} = (\sqrt{n} + \sqrt{p})\Big(\frac{1}{\sqrt{n}} + \frac{1}{\sqrt{p}}\Big)^{1/3}$.
El Karoui (2004): in the complex case, for refined $\mu_{np}, \sigma_{np}$,
$|F_p(s) - F_2(s)| \le C e^{-s} p^{-2/3}$.
Also, results for $n, p \to \infty$ separately, and under alternative hypotheses.
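
A sketch of the resulting test for the largest eigenvalue of $A = nS$ under $\Sigma = I$, using the centering and scaling above (numpy; the cutoff 0.98 is roughly the 95th percentile of $F_1$).

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 400, 100
X = rng.standard_normal((n, p))
l1 = np.linalg.eigvalsh(X.T @ X)[-1]             # largest eigenvalue of A = nS
mu = (np.sqrt(n) + np.sqrt(p)) ** 2
sigma = (np.sqrt(n) + np.sqrt(p)) * (1 / np.sqrt(n) + 1 / np.sqrt(p)) ** (1 / 3)
s = (l1 - mu) / sigma                            # approximately Tracy-Widom (beta = 1)
print(s, s > 0.98)                               # reject H_0: Sigma = I at ~5% if s > 0.98
```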

24  Painlevé II and Tracy-Widom
Painlevé II: $q'' = xq + 2q^3$, with $q(x) \sim \mathrm{Ai}(x)$ as $x \to \infty$.
Tracy-Widom distributions:
$F_2(s) = \exp\Big\{-\int_s^\infty (x-s)\, q^2(x)\,dx\Big\}$
$F_1(s) = (F_2(s))^{1/2} \exp\Big\{-\frac{1}{2}\int_s^\infty q(x)\,dx\Big\}$
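
A numerical sketch (scipy assumed) of this recipe: integrate Painlevé II backwards from the Airy boundary condition and evaluate $F_2$ at a single point; the starting point $x_0$, tolerances and grid are ad hoc choices.

```python
import numpy as np
from scipy.integrate import solve_ivp, simpson
from scipy.special import airy

x0, s = 8.0, -2.0                                # start far to the right; evaluate F_2(-2)

def painleve2(x, y):                             # y = [q, q']
    return [y[1], x * y[0] + 2 * y[0] ** 3]

ai, aip, _, _ = airy(x0)                         # boundary condition q(x0) ~ Ai(x0)
sol = solve_ivp(painleve2, [x0, s], [ai, aip], dense_output=True, rtol=1e-10, atol=1e-12)
x = np.linspace(s, x0, 4001)
q = sol.sol(x)[0]
F2 = np.exp(-simpson((x - s) * q ** 2, x=x))     # F_2(s); contribution beyond x0 is negligible
print(F2)                                        # roughly 0.4 at s = -2
```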

25  Largest Root: Double Wishart
Assume $p \to \infty$ with $q(p), n(p) \to \infty$. Define
$\frac{\gamma_p}{2} = \sin^{-1}\sqrt{\frac{p - .5}{n-1}}, \qquad \frac{\phi_p}{2} = \sin^{-1}\sqrt{\frac{q - .5}{n-1}}$,
$\mu_\pm = \cos^2\Big(\frac{\pi}{2} - \frac{\phi_p \pm \gamma_p}{2}\Big) = \sin^2\Big(\frac{\phi_p \pm \gamma_p}{2}\Big), \qquad \sigma_{p+}^3 = \frac{\sin^4(\phi_p + \gamma_p)}{(2n-2)^2\, \sin\phi_p \sin\gamma_p}$.
Then $\frac{u_1 - \mu_+}{\sigma_+} \xrightarrow{D} W_1 \sim F_1$.
More precisely, with the logit transform $\ell(u) = \log(u/(1-u))$ (IMJ, PJF):
$\big|P\{\ell(u_1) \le \ell(\mu_+) + s\,\sigma_+\,\ell'(\mu_+)\} - F_1(s)\big| \le C e^{-s/4} p^{-2/3}$.
The corrections ($.5$, $1$, $2$) improve the approximation; for $p, q$ small the error is $O(p^{-2/3})$ [instead of $O(p^{-1/3})$].
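
A sketch implementing the logit-scale centering and scaling defined on this slide and using it to approximate a null 95th percentile for Roy's largest root; the dimensions and the 0.98 Tracy-Widom percentile are illustrative.

```python
import numpy as np

def roy_logit_params(p, q, n):
    """Center and scale for logit(u_1) under the null (double Wishart), per this slide."""
    gamma = 2 * np.arcsin(np.sqrt((p - 0.5) / (n - 1)))
    phi = 2 * np.arcsin(np.sqrt((q - 0.5) / (n - 1)))
    mu_plus = np.sin((phi + gamma) / 2) ** 2
    sigma_plus = (np.sin(phi + gamma) ** 4 /
                  ((2 * n - 2) ** 2 * np.sin(phi) * np.sin(gamma))) ** (1 / 3)
    center = np.log(mu_plus / (1 - mu_plus))         # l(mu_+)
    scale = sigma_plus / (mu_plus * (1 - mu_plus))   # sigma_+ * l'(mu_+)
    return center, scale

center, scale = roy_logit_params(p=5, q=10, n=100)
z = center + 0.98 * scale                # 0.98 ~ 95th percentile of F_1
print(1 / (1 + np.exp(-z)))              # approximate null 95th percentile of u_1
```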

26  Approximation vs. Tables for p = 5
[Figure: 95th percentile of the squared correlation plotted against $m_c = (q-p-1)/2$ for several values of $n_c = (n-q-p-1)/2$; Tracy-Widom approximation vs. Chen table, $p = 5$.]
Tables: William Chen, IRS (2002), covering $m_c = \frac{q-p-1}{2} \in [0, 15]$ and $n_c = \frac{n-q-p-1}{2} \in [1, 1000]$.

27  Remarks
$p^{-2/3}$ scale of variability for $u_1$: 95th percentile $\doteq \mu_{p+} + \sigma_{p+}$, 99th percentile $\doteq \mu_{p+} + 2\sigma_{p+}$.
If $\mu_{p+} > .7$, the logit scale $v_i = \log(u_i/(1-u_i))$ is better.
Smallest eigenvalue: with the previous assumptions and $\gamma_0 < \phi_0$,
$\sigma_{p-}^3 = \frac{\sin^4(\phi_p - \gamma_p)}{(2n-2)^2\, \sin\phi_p \sin\gamma_p}$,
then $\frac{\mu_{p-} - u_p}{\sigma_{p-}} \xrightarrow{D} W_1\ (W_2)$.
Corresponding limit distributions hold for $u_2, \ldots, u_k$ and $u_{p-k}, \ldots, u_{p-1}$, $k$ fixed.

28  Agenda
Classical multivariate techniques; Hypothesis Testing: Single and Double Wishart; Eigenvalue densities; Linear Statistics; Largest Eigenvalue; Concluding Remarks.

29  Concluding Remarks
Numerous other topics deserve attention:
distributions under alternative hypotheses: integral representations, matrix hypergeometric functions;
empirical distributions and graphical display (Wachter);
computational advances (Dumitriu, Edelman, Koev, Rao): operations on random matrices, multivariate orthogonal polynomials, matrix hypergeometric functions;
estimation and testing for eigenvectors (Paul);
a technical role for RMT in other statistical areas, e.g. via large deviations results.

30  Back-Up Slides

31  Upper Bound in SAS
Approximate $\frac{n-q}{q} \cdot \frac{u_1}{1-u_1}$ by $F_{q,\, n-q}$.
[Figure: 95th percentile of the squared correlation against $m_c = (q-p-1)/2$ for several $n_c = (n-q-p-1)/2$; Tracy-Widom approximation, Chen table ($p = 5$), and the SAS F approximation.]
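
A sketch of this F approximation (scipy assumed; illustrative p, q, n), converting the $F_{q,n-q}$ percentile back to the $u_1$ scale.

```python
from scipy.stats import f

p, q, n = 5, 10, 100
f95 = f.ppf(0.95, q, n - q)          # 95th percentile of F_{q, n-q}
w95 = q * f95 / (n - q)              # corresponding bound for u_1 / (1 - u_1)
print(w95 / (1 + w95))               # approximate upper bound for the 95th percentile of u_1
```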

32  Testing Subsequent Correlations
Suppose $\Sigma_{XY} = [\,\mathrm{diag}(\rho_1, \rho_2, \ldots, \rho_p)\ \ 0\,]$.
If the largest $r$ correlations are large, test $H_r: \rho_{r+1} = \rho_{r+2} = \cdots = \rho_p = 0$?
Comparison Lemma (from SVD interlacing): $\mathcal{L}(u_{r+1} \mid p, q, n;\ \Sigma_{XY} \in H_r)\ \le_{st}\ \mathcal{L}(u_1 \mid p, q-r, n;\ I)$,
so conservative $P$-values for $H_r$ follow via the TW$(p, q-r, n)$ approximation to the right-hand side.
[Aside: $\mathcal{L}(u_1 \mid p-r, q-r, n;\ I)$ may be better, but there are no bounds.]
