Statistical Inference and Random Matrices

N.S. Witte, Institute of Fundamental Sciences, Massey University, New Zealand

Joint work with Peter Forrester

6th Wellington Workshop in Probability and Mathematical Statistics, 4-6 December 2017 (talk given 5 December 2017)
Applications

Historical origins:
- Integrals over the classical groups U(N), O(N), Sp(2N): Hurwitz 1897, Haar 1933
- Mathematical statistics: Wishart 1928, James 1954-64, Constantine 1964, Mathai 1997
- Quantisation of classically chaotic systems: Wigner 1955, 1958

Contemporary applications:
- Principal component analysis: sample covariance matrices, Wishart matrices, null and non-null SCM
- Mathematical finance: cross-correlations of financial data, sample correlation matrices
- Polynuclear growth models, random permutations, last-passage percolation, queuing models
- Biogeographic patterns of species nestedness: ordered binary presence-absence matrices
- Distribution of mutation fitness effects across species: Fisher's geometrical model
- Complex networks modelled by random graphs, e.g. adjacency matrices
- Data analysis and statistical learning: stable signal recovery from incomplete and inaccurate measurements; compressed sensing, best k-term approximation, n-widths
- Wireless communication, antenna networks
- Quantum entanglement, quantum chaos, semi-classical approximation, quantum transport in mesoscopic systems
What is a random matrix?

E.g. the Gaussian Orthogonal Ensemble (GOE), aka Gaussian Wigner matrices:
- i.i.d. random variables $x_{j,j} \sim N[0,1]$, $x_{j,k} \sim N[0,\tfrac{1}{2}]$
- construct the $n \times n$ real symmetric matrix $X = (x_{j,k})_{j,k=1}^{n}$
- joint p.d.f. of the elements
$$ P(X) = \frac{1}{C_n} \prod_{j=1}^{n} e^{-\frac{1}{2} x_{j,j}^2} \prod_{1\le j<k\le n} e^{-x_{j,k}^2} = \frac{1}{C_n}\, e^{-\frac{1}{2} \mathrm{Tr}(X^2)} $$
- invariance under orthogonal transformations $X \mapsto O X O^T$, $O O^T = I$
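The construction above is easy to sketch numerically. A minimal NumPy sketch (illustrative, not from the talk): symmetrising an i.i.d. standard Gaussian matrix reproduces exactly the stated variances, $N[0,1]$ on the diagonal and $N[0,\tfrac12]$ off it, and eigenvalues are invariant under conjugation by an orthogonal matrix.

```python
import numpy as np

def goe_sample(n, rng):
    """Draw an n x n GOE matrix: x_jj ~ N(0,1), x_jk ~ N(0,1/2) for j < k."""
    A = rng.normal(size=(n, n))   # i.i.d. N(0,1) entries
    return (A + A.T) / 2.0        # diagonal ~ N(0,1), off-diagonal ~ N(0,1/2)

rng = np.random.default_rng(42)
X = goe_sample(4, rng)
assert np.allclose(X, X.T)        # real symmetric

# Orthogonal invariance: Spec(O X O^T) = Spec(X).
O, _ = np.linalg.qr(rng.normal(size=(4, 4)))
assert np.allclose(np.sort(np.linalg.eigvalsh(O @ X @ O.T)),
                   np.sort(np.linalg.eigvalsh(X)))
```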
Spectral Decomposition of a Matrix

$n \times n$ real symmetric matrices $X = (x_{j,k})_{1\le j,k\le n}$: eigenvalue analysis
$$ X = O \Lambda O^T, \quad \Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_n), $$
with eigenvalues $\lambda_1,\dots,\lambda_n$ and orthogonal eigenvectors $O = (O_1,\dots,O_n)$, $O O^T = I$.

Volume form $(dX) = \prod_{j=1}^{n} dx_{j,j} \prod_{1\le j<k\le n} dx_{j,k}$. The change of the $\tfrac12 n(n+1)$ variables $\{x_{j,k}\} \to \{\lambda_j, O_j\}$ has Jacobian
$$ (dX) = (O^T dO) \prod_{1\le j<k\le n} |\lambda_j - \lambda_k| \prod_{j=1}^{n} d\lambda_j. $$

$n \times m$ real matrices $X$: singular value decomposition
$$ X = O \Sigma P^T, \quad \Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_m) \in \mathbb{R}^{n\times m}, $$
with singular values $\sigma_1,\dots,\sigma_m$ and orthogonal $O \in \mathbb{R}^{n\times n}$, $P \in \mathbb{R}^{m\times m}$.

Cf. Gram-Schmidt orthogonalisation and the QR, LU, Cholesky and Hessenberg decompositions.
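Both decompositions on this slide can be verified directly; a short NumPy sketch (the matrix sizes are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Eigenvalue analysis of a real symmetric matrix: X = O Lambda O^T.
X = rng.normal(size=(5, 5))
X = (X + X.T) / 2.0
lam, O = np.linalg.eigh(X)                      # orthonormal eigenvector matrix O
assert np.allclose(O @ O.T, np.eye(5))          # O O^T = I
assert np.allclose(O @ np.diag(lam) @ O.T, X)   # reconstruction

# Singular value decomposition of an n x m real matrix: X = O Sigma P^T.
Y = rng.normal(size=(6, 4))
U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
assert np.allclose(U @ np.diag(sigma) @ Vt, Y)
assert np.all(sigma >= 0)                       # singular values are nonnegative
```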
Putting it all together: GOE or Gaussian Wigner matrices

Joint p.d.f. of the eigenvalues
$$ P(\lambda) = \frac{1}{C_n} \prod_{j=1}^{n} e^{-\frac{1}{2}\lambda_j^2} \prod_{1\le j<k\le n} |\lambda_j - \lambda_k|^{1}, \quad \lambda_j \in \mathbb{R}. $$
N.B.
- the repulsion parameter (Dyson index) is $\beta = 1$; this is a log-gas in the Stieltjes picture
- the Hermite weight appears, one of the classical orthogonal polynomial weights
- the normalisation is a Selberg integral
Principal Component Analysis

Data matrix $X = \{x^{(j)}_{k}\}_{j=1,\dots,n;\,k=1,\dots,p} \in F^{n\times p}$ ($F = \mathbb{R}, \mathbb{C}$), with $p$ = number of variables and $n$ = number of data points. The $p \times p$ covariance matrix is
$$ A = X^{\dagger} X = \Bigg( \sum_{j=1}^{n} x^{(j)}_{k_1} x^{(j)}_{k_2} \Bigg)_{k_1,k_2=1,\dots,p}. $$
$A$ is a Wishart matrix if the $x_{j,k}$ are i.i.d. random variables drawn from $N[0,1]$. Joint eigenvalue p.d.f.
$$ \frac{1}{C} \prod_{k=1}^{p} \lambda_k^{\beta a/2} e^{-\beta \lambda_k/2} \prod_{1\le j<k\le p} |\lambda_j - \lambda_k|^{\beta}, \quad \lambda_k \in [0,\infty), $$
i.e. a Laguerre weight, with $a = n - p + 1 - 2/\beta$, $n \ge p$, and
$$ \beta = \begin{cases} 1 & \text{real}, \ \mathbb{R} \\ 2 & \text{complex}, \ \mathbb{C}. \end{cases} $$
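A sketch of the Wishart construction (illustrative NumPy code, not from the talk): with i.i.d. $N[0,1]$ data the matrix $A = X^T X$ is symmetric positive semi-definite, so all $p$ eigenvalues land on $[0,\infty)$, as the Laguerre weight in the joint p.d.f. requires.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 10                   # n data points, p variables, n >= p
X = rng.normal(size=(n, p))     # i.i.d. N(0,1) entries (real case, beta = 1)
A = X.T @ X                     # p x p Wishart matrix
lam = np.linalg.eigvalsh(A)

assert lam.shape == (p,)
assert np.all(lam >= 0)         # spectrum lives on [0, infinity)
assert np.allclose(A, A.T)      # symmetric by construction
```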
Translation

Single Wishart: null hypothesis $\mu = 0$, $\Sigma = I$; $p$ variates, degrees of freedom $= n$. Laguerre ensemble L$\beta$E, with weight
$$ e^{-\frac{\beta}{2}\lambda}\, \lambda^{\frac{\beta}{2}(n-p+1-2/\beta)}. $$

Double Wishart: null hypothesis $\mu_1 = 0$, $\Sigma_1 = I$, $\mu_2 = 0$, $\Sigma_2 = I$; $p$ first variates, $q$ second variates, degrees of freedom $= n$. Jacobi ensemble J$\beta$E, with weight
$$ (1-\lambda)^{\frac{\beta}{2}(q-p+1-2/\beta)} (1+\lambda)^{\frac{\beta}{2}(n-q-p+1-2/\beta)}. $$
Global Properties of the Spectra: Empirical Spectral Distribution

As $n \to \infty$, the Wigner semicircle law for the global density of eigenvalues:
$$ \rho(\lambda) = \frac{1}{\pi} \sqrt{2n - \lambda^2}. $$
As $n, p \to \infty$, the Marchenko-Pastur law for the eigenvalues of $X^{\dagger} X$:
$$ \rho(\lambda) = \frac{1}{2\pi\lambda} \sqrt{(\lambda - n x_-)(n x_+ - \lambda)}, \quad x_{\pm} = \big(c^{1/2} \pm 1\big)^2, \quad c = \frac{p}{n} \le 1. $$
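Both laws are easy to test empirically. A sketch (assuming the normalisations used on this slide: GOE spectrum filling $[-\sqrt{2n}, \sqrt{2n}]$, and the eigenvalues of $X^{\dagger}X$ concentrating on $[n x_-, n x_+]$ with $c = p/n$; the tolerances allow for edge fluctuations):

```python
import numpy as np

rng = np.random.default_rng(7)

# Wigner semicircle: GOE eigenvalues fill [-sqrt(2n), sqrt(2n)].
n = 300
A = rng.normal(size=(n, n))
goe_eigs = np.linalg.eigvalsh((A + A.T) / 2.0)
assert np.max(np.abs(goe_eigs)) < 1.1 * np.sqrt(2 * n)

# Marchenko-Pastur: eigenvalues of X^T X concentrate on [n x_-, n x_+].
p = 100
c = p / n
x_minus = (np.sqrt(c) - 1) ** 2
x_plus = (np.sqrt(c) + 1) ** 2
X = rng.normal(size=(n, p))
wishart_eigs = np.linalg.eigvalsh(X.T @ X)
assert wishart_eigs.min() > 0.8 * n * x_minus
assert wishart_eigs.max() < 1.1 * n * x_plus
```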
[Figure: scaled eigenvalue densities $2\sqrt{n}\,\rho(2\sqrt{n}\,x)$ for the GUE and GOE at $n = 24$, plotted against the semicircle law on $-1.5 \le x \le 1.5$.]
Global Properties of the Spectra: Differential Equations for the Density

Theorem (Haagerup and Thorbjørnsen [2012], Prop. 2.2 and Lemma 4.1). The eigenvalue density $\rho(x)$ for the GUE satisfies the third order, homogeneous ordinary differential equation
$$ \rho''' + (4n - x^2)\rho' + x\rho = 0, $$
subject to certain boundary conditions as $x \to \pm\infty$, for fixed $n$.

Theorem (Witte and Forrester [2013]). The eigenvalue density $\rho(x)$ for the GOE satisfies the fifth order, linear homogeneous ordinary differential equation
$$ 4\rho^{(V)} + 5(x^2 - 4n + 2)\rho''' - 6x\rho'' + \big[{-x^4} + (8n-4)x^2 - 16n^2 + 16n + 2\big]\rho' + x(x^2 - 4n + 2)\rho = 0, $$
again subject to certain boundary conditions as $x \to \pm\infty$, for fixed $n$.
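The GUE equation can be checked numerically for small $n$: with the convention $P(X) \propto e^{-\frac12\mathrm{Tr}X^2}$ the density is a sum of squared orthonormal Hermite (oscillator) functions, and finite differences show the residual of $\rho''' + (4n-x^2)\rho' + x\rho$ is small. A sketch; the grid spacing, window and tolerance are ad hoc choices.

```python
import numpy as np

def gue_density(x, n):
    """GUE eigenvalue density for weight e^{-x^2/2}: rho(x) = sum_{k<n} phi_k(x)^2,
    built from orthonormal Hermite functions via the three-term recurrence."""
    u = x / np.sqrt(2.0)
    psi = [np.pi ** -0.25 * np.exp(-u ** 2 / 2.0)]            # psi_0
    if n > 1:
        psi.append(np.sqrt(2.0) * u * psi[0])                 # psi_1
    for k in range(1, n - 1):
        psi.append(np.sqrt(2.0 / (k + 1)) * u * psi[k]
                   - np.sqrt(k / (k + 1.0)) * psi[k - 1])
    return sum(f ** 2 for f in psi) / np.sqrt(2.0)

n, dx = 5, 0.002
x = np.arange(-8.0, 8.0 + dx, dx)
rho = gue_density(x, n)

# Normalisation: the density integrates to n.
assert abs(np.sum(rho) * dx - n) < 1e-6

# ODE residual rho''' + (4n - x^2) rho' + x rho, via central differences.
d1 = np.gradient(rho, dx)
d3 = np.gradient(np.gradient(d1, dx), dx)
resid = d3 + (4 * n - x ** 2) * d1 + x * rho
assert np.max(np.abs(resid[20:-20])) < 0.05   # small away from the grid edges
```

For $n = 1$ the check can also be done by hand: $\rho \propto e^{-x^2/2}$ gives $\rho''' = (3x - x^3)\rho$ and $(4 - x^2)\rho' = -(4x - x^3)\rho$, which cancel against $x\rho$.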
The many ways to look at these problems

Statistics on $\mathrm{Spec}(X) = \{\lambda_1, \dots, \lambda_n\}$, and the corresponding regime:
- density $\rho(\lambda)$: global spectrum
- $m$-point correlation functions $\rho_m(\lambda_1,\dots,\lambda_m)$
- linear spectral statistics $\sum_{i=1}^{n} f(\lambda_i)$: hypothesis tests, distribution theory
- extreme eigenvalues $\lambda_{\max}$, $\lambda_{\min}$: large deviations, spectrum edge
- eigenvalue spacings $\lambda_{i+1} - \lambda_i$: bulk or edge spectrum
- spectral gaps: $\lambda_j \notin J$ for all $j$, for an interval $J$
- condition numbers $\lambda_{\max}/\lambda_{\min}$
- determinants and characteristic polynomials $\prod_{i=1}^{n}(\zeta - \lambda_i)$
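Most of the statistics in this list can be read off a single sampled spectrum; a small NumPy sketch (all numerical choices, including $f(x) = x^2$, the gap interval and the point $\zeta$, are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
A = rng.normal(size=(n, n))
lam = np.sort(np.linalg.eigvalsh((A + A.T) / 2.0))   # Spec(X) of one GOE sample

lam_max, lam_min = lam[-1], lam[0]                   # extreme eigenvalues
spacings = np.diff(lam)                              # eigenvalue spacings
linear_stat = np.sum(lam ** 2)                       # sum f(lam_i) with f(x) = x^2
gap_free = not np.any((lam > 0.0) & (lam < 0.1))     # is (0, 0.1) a spectral gap?
cond = np.abs(lam).max() / np.abs(lam).min()         # condition number (by modulus)
log_char = np.sum(np.log(20.0 - lam))                # log prod (zeta - lam_i), zeta = 20

assert lam_max > 0 > lam_min                         # spectrum straddles zero
assert np.all(spacings >= 0)
```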
Tools one can use

Approaches: moment methods; concentration inequalities; large deviation theory; free probability; loop equations; hypergeometric functions of matrix argument; orthogonal/bi-orthogonal polynomials; integrable systems and Painlevé equations; potential problems and equilibrium measures.

Primary objects: Stieltjes transforms and resolvents; zonal and Jack polynomials; Riemann-Hilbert asymptotics; gap probabilities; characteristic polynomials.
The Spectrum Edge: Soft Edge, Tracy-Widom Distribution $F_2(s)$, $\beta = 2$

Gap probability, i.e. the probability of no eigenvalues of the $n \times n$ GUE in $(t,\infty)$, denoted $E_{2,n}(0; (t,\infty))$. Shift and scale $t$ as
$$ t = \sqrt{2n} + \frac{s}{\sqrt{2}\, n^{1/6}}. $$
Take the limit $n \to \infty$ of the gap probability:
$$ \lim_{n\to\infty} P\Big[ \sqrt{2}\, n^{1/6} \big(\lambda_{\max} - \sqrt{2n}\big) \le s \Big] = F_2(s). $$
Tracy-Widom [1994]: the Fredholm determinant
$$ F_2(s) = \det(1 - K_2)\big|_{L^2(s,\infty)}, $$
where the integral operator $K_2$ has the Airy kernel
$$ K_2(x,y) = \frac{\mathrm{Ai}(x)\mathrm{Ai}'(y) - \mathrm{Ai}(y)\mathrm{Ai}'(x)}{x - y}. $$
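The centring and scaling can be checked by simulation: a single large draw already puts the largest eigenvalue close to $\sqrt{2n}$, with $O(n^{-1/6})$ fluctuations. A sketch using the real $\beta = 1$ ensemble for convenience (the limit law there is $F_1$ rather than $F_2$, but the centring and scaling are the same); the thresholds are loose bounds on typical Tracy-Widom fluctuations, not exact quantiles.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
A = rng.normal(size=(n, n))
lam_max = np.linalg.eigvalsh((A + A.T) / 2.0)[-1]

# Centre at sqrt(2n) and rescale by sqrt(2) n^{1/6}, as on the slide.
centred = np.sqrt(2.0) * n ** (1.0 / 6.0) * (lam_max - np.sqrt(2.0 * n))

# lam_max itself is ~ sqrt(2n) ~ 31.6, while the centred variable is O(1).
assert abs(lam_max / np.sqrt(2.0 * n) - 1.0) < 0.05
assert abs(centred) < 6.0
```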
Tracy-Widom Distribution $F_2(s)$ and the second Painlevé transcendent P$_{\rm II}$

The P$_{\rm II}$ transcendent $q(t;\alpha)$ satisfies the standard form of the second Painlevé equation
$$ \frac{d^2 q}{dt^2} = 2q^3 + tq + \alpha, \quad \alpha \in \mathbb{C}. $$
The soft-edge gap probability, i.e. the Tracy-Widom distribution $F_2(s)$, is
$$ E_2^{\rm soft}\big(0; (s,\infty)\big) = \exp\Big( -\int_s^{\infty} dt\, (t-s)\, q(t)^2 \Big), $$
where $q(t)$ is the $\alpha = 0$ solution of P$_{\rm II}$ with the boundary condition $q(t) \sim \mathrm{Ai}(t)$ as $t \to \infty$: the Hastings-McLeod solution, see Hastings, S. P. and McLeod, J. B., A boundary value problem associated with the second Painlevé transcendent and the Korteweg-de Vries equation, Arch. Rational Mech. Anal. 73 (1980), no. 1, 31-51.

Forrester and Witte [2012], tails:
$$ \log E_{\beta}^{\rm soft}\big(n; (s,\infty)\big) \underset{s\to-\infty}{\sim} -\frac{\beta |s|^3}{24} + \frac{2\sqrt{2}}{3}\,\beta n\, |s|^{3/2} - \Big[ \frac{\beta}{2}n^2 + \Big(\frac{\beta}{2}-1\Big)n + \frac{1}{8}\Big(1 + \Big(1-\frac{\beta}{2}\Big)\Big(1-\frac{2}{\beta}\Big)\Big) \Big] \log|s| $$
Universality Hypothesis

Extension of central limit theorems and the Gaussian distribution.

Conjecture. As the rank $n \to \infty$ of the random matrix ensembles, with or without a similar scaling of other parameters, the ensembles have well-defined limits; these limits define new distributions which
- are insensitive to details of the finite model other than their symmetry class $\beta$,
- are characterised by the solutions of integrable dynamical systems, e.g. of the integrable hierarchies such as the Toda lattice, KdV or KP systems, or more precisely by Painlevé-type equations.

Proven cases:
- Four Moments Theorems: Tao, T. and Vu, V., Random covariance matrices: universality of local statistics of eigenvalues, Ann. Probab. 40 (2012), 1285-1315.
- Riemann-Hilbert approach: Deift, P. and Gioev, D., Random Matrix Theory: Invariant Ensembles and Universality, Amer. Math. Soc., 2009.
Beyond the Null Case: Example 1, a Phase Transition Phenomenon

Baik, J., Ben Arous, G. and Péché, S., Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, Ann. Probab. 33 (2005), no. 5, 1643-1697.

Here $\beta = 2$, $\lambda_1$ is the largest eigenvalue of the sample covariance matrix, $\gamma^2 = n/p$, and the population covariance matrix is $\Sigma = \mathrm{diag}(l_1, l_2) \oplus I_{p-2}$. As $p, n \to \infty$, either
$$ P\Big[ \big(\lambda_1 - (1+\gamma^{-1})^2\big)\, \frac{\gamma}{(1+\gamma)^{4/3}}\, n^{2/3} \le x \Big] \to \begin{cases} F_2(x), & 0 < l_1, l_2 < 1+\gamma^{-1} \\ F_2^{(1)}(x), & 0 < l_2 < 1+\gamma^{-1} = l_1 \\ F_2^{(2)}(x), & l_1 = l_2 = 1+\gamma^{-1} \end{cases} $$
or
$$ P\Big[ \Big(\lambda_1 - l_1\Big(1 + \frac{\gamma^{-2}}{l_1-1}\Big)\Big)\, \frac{n^{1/2}}{l_1\sqrt{1 - \gamma^{-2}/(l_1-1)^2}} \le x \Big] \to \begin{cases} G_1(x), & l_1 > 1+\gamma^{-1},\ l_1 > l_2 \\ G_2(x), & l_1 = l_2 > 1+\gamma^{-1}. \end{cases} $$
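The phase transition is visible in simulation: a supercritical spike $l_1 > 1 + \gamma^{-1}$ pulls the top sample eigenvalue out of the Marchenko-Pastur bulk to approximately $l_1(1 + \gamma^{-2}/(l_1-1))$. A sketch using real Gaussian data rather than the complex $\beta = 2$ case of the theorem (the first-order location of $\lambda_1$ is the same); all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 200                    # gamma^2 = n/p = 10
l1 = 5.0                            # spike; supercritical since 5 > 1 + gamma^{-1}
c = p / n                           # = gamma^{-2} = 0.1

# Population covariance: one spiked direction, identity elsewhere.
sigma_diag = np.ones(p)
sigma_diag[0] = l1
X = rng.normal(size=(n, p)) * np.sqrt(sigma_diag)   # rows ~ N(0, Sigma)
lam1 = np.linalg.eigvalsh(X.T @ X / n)[-1]          # top sample eigenvalue

predicted = l1 * (1.0 + c / (l1 - 1.0))             # BBP location, = 5.125 here
bulk_edge = (1.0 + np.sqrt(c)) ** 2                 # ~1.73: the null-case edge
assert abs(lam1 - predicted) < 0.5                  # lam1 tracks the spike...
assert lam1 > bulk_edge + 1.0                       # ...well outside the bulk
```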
Beyond the Null Case: The Moral of the Story

As $n \to \infty$ with $p$ fixed, PCA works: sample CM $\to$ population CM. As $n, p \to \infty$ with $p = O(n)$???

Issue: how close are the eigenvalues of the sample PCA to those of the population PCA? Even though for finite $n$ there is no phase transition, as $n \to \infty$, as a function of $n$ or some other parameter, the eigenvector of the sample PCA (e.g. that associated with the eigenvalue $\lambda_1$) may exhibit a sharp loss of tracking, suddenly losing its relation to the eigenvector of the population PCA.

Nadler, B., Finite sample approximation results for principal component analysis: a matrix perturbation approach, Ann. Statist. 36 (2008), no. 6, 2791-2817.
Beyond the Null Case: Example 2, Spiked Population Models

Johnstone, I. M., On the distribution of the largest eigenvalue in principal components analysis, Ann. Statist. 29 (2001), no. 2, 295-327.

A model:
$$ x_i = \mu + A u_i + \sigma z_i, \quad i = 1, \dots, n, $$
where
- $p$: number of variables; $M$: number of spikes
- $x_i$: observation $p$-vector; $\mu$: $p$-vector of means
- $A$: $p \times M$ factor loading matrix
- $u_i$: $M$-vector of random factors; $z_i$: $p$-vector of white noise

Population covariance matrix
$$ \Sigma = \sum_{j=1}^{M} l_j^2\, q_j q_j^T + \sigma^2 I_p, $$
where $\Phi$ is the $M \times M$ covariance matrix of the $u_i$, and $l_j^2, q_j$, $j = 1, \dots, M$, are the eigenvalues/eigenvectors of $A \Phi A^T$.

Ma, Z., Sparse principal component analysis and iterative thresholding, Ann. Statist. 41 (2013), no. 2, 772-801.
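The spiked covariance structure is straightforward to build explicitly: for a single spike ($M = 1$) the population eigenvalues are $l^2 + \sigma^2$ once and $\sigma^2$ with multiplicity $p - 1$. A sketch with illustrative values:

```python
import numpy as np

p, sigma2, l2 = 8, 0.5, 4.0          # dimension, noise variance sigma^2, spike l^2
q = np.zeros(p)
q[0] = 1.0                           # unit spike direction q

Sigma = l2 * np.outer(q, q) + sigma2 * np.eye(p)   # Sigma = l^2 q q^T + sigma^2 I_p
eigs = np.sort(np.linalg.eigvalsh(Sigma))

assert np.allclose(eigs[:-1], sigma2)              # bulk: sigma^2, multiplicity p-1
assert np.isclose(eigs[-1], l2 + sigma2)           # spike: l^2 + sigma^2
```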
Painlevé Transcendents

P-I: $\dfrac{d^2y}{dx^2} = 6y^2 + x$

P-II: $\dfrac{d^2y}{dx^2} = 2y^3 + xy + \nu$

P-III: $\dfrac{d^2y}{dx^2} = \dfrac{1}{y}\left(\dfrac{dy}{dx}\right)^2 - \dfrac{1}{x}\dfrac{dy}{dx} + \dfrac{y^2}{4x^2}(\gamma y + \alpha) + \dfrac{\beta}{4x} + \dfrac{\delta}{4y}$, with $\gamma = 4$, $\delta = -4$

P-IV: $\dfrac{d^2y}{dx^2} = \dfrac{1}{2y}\left(\dfrac{dy}{dx}\right)^2 + \dfrac{3}{2}y^3 + 4xy^2 + 2(x^2-\alpha)y + \dfrac{\beta}{y}$

P-V: $\dfrac{d^2y}{dx^2} = \left(\dfrac{1}{2y} + \dfrac{1}{y-1}\right)\left(\dfrac{dy}{dx}\right)^2 - \dfrac{1}{x}\dfrac{dy}{dx} + \dfrac{(y-1)^2}{x^2}\left(\alpha y + \dfrac{\beta}{y}\right) + \gamma\dfrac{y}{x} + \delta\dfrac{y(y+1)}{y-1}$, with $\delta = -1/2$

P-VI: $\dfrac{d^2y}{dx^2} = \dfrac{1}{2}\left(\dfrac{1}{y} + \dfrac{1}{y-1} + \dfrac{1}{y-x}\right)\left(\dfrac{dy}{dx}\right)^2 - \left(\dfrac{1}{x} + \dfrac{1}{x-1} + \dfrac{1}{y-x}\right)\dfrac{dy}{dx} + \dfrac{y(y-1)(y-x)}{x^2(x-1)^2}\left(\alpha + \dfrac{\beta x}{y^2} + \dfrac{\gamma(x-1)}{(y-1)^2} + \dfrac{\delta x(x-1)}{(y-x)^2}\right)$
Painlevé Equations: Digital Library of Mathematical Functions, Chapter 32, http://dlmf.nist.gov/

Classical solutions and affine Weyl group symmetries:
- P-I: no classical special-function solutions
- P-II: Airy $\mathrm{Ai}(x)$, $\mathrm{Bi}(x)$; affine Weyl group $A_1$
- P-III: Bessel $I_\nu(x)$, $K_\nu(x)$; $B_2$
- P-IV: Hermite-Weber $D_\nu(x)$; $A_2$
- P-V: confluent hypergeometric ${}_1F_1(a;c;x)$; $A_3$
- P-VI: Gauss hypergeometric ${}_2F_1(a,b;c;x)$; $D_4$
Reference Monographs/Reviews

- Muirhead, R. J., Aspects of Multivariate Statistical Theory, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, 1982.
- Bai, Z. and Silverstein, J. W., Spectral Analysis of Large Dimensional Random Matrices, 2nd edition, Springer, New York, 2010.
- Mehta, M. L., Random Matrices, Pure and Applied Mathematics (Amsterdam), Vol. 142, 3rd edition, Elsevier/Academic Press, Amsterdam, 2004.
- Anderson, G. W., Guionnet, A. and Zeitouni, O., An Introduction to Random Matrices, Cambridge University Press, Cambridge, 2010.
- Forrester, P. J., Log-Gases and Random Matrices, Princeton University Press, 2010.
- Akemann, G., Baik, J. and Di Francesco, P. (eds.), The Oxford Handbook of Random Matrix Theory, Oxford University Press, 2011.
- Couillet, R. and Debbah, M., Random Matrix Methods for Wireless Communications, Cambridge University Press, 2011.
- Johnstone, I. M., High dimensional statistical inference and random matrices, International Congress of Mathematicians, Vol. I, Eur. Math. Soc., Zürich, 2006, pp. 307-333.
- Paul, D. and Aue, A., Random matrix theory in statistics: a review, J. Statist. Plann. Inference 150 (2014), 1-29.