Concentration Inequalities for Random Matrices

Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France

exponential tail inequalities classical theme in probability and statistics quantify the asymptotic statements central limit theorems large deviation principles

classical exponential inequalities sum of independent random variables S n = 1 n (X 1 + + X n ) 0 X i 1 independent P ( S n E(S n ) + t ) e t2 /2, t 0 Hoeffding s inequality same as for X i standard Gaussian central limit theorem

measure concentration ideas asymptotic geometric analysis V. Milman (1970) S n = 1 n (X 1 + + X n ) F (X ) = F (X 1,..., X n ), F : R n R Lipschitz Gaussian sample independent random variables

concentration inequalities S n = 1 n (X 1 + + X n ) F (X ) = F (X 1,..., X n ), F : R n R 1-Lipschitz X 1,..., X n independenty standard Gaussian ( P F (X ) E ( F (X ) ) ) + t e t2 /2, t 0 0 X i 1 independent, F 1-Lipschitz and convex ( P F (X ) E ( F (X ) ) ) + t 2 e t2 /4, t 0 M. Talagrand (1995)

empirical processes X 1,..., X n independent with values in (S, S) F collection of functions f : S [0, 1] Z = sup f F n f (X i ) i=1 Z Lipschitz and convex concentration inequalities on P( Z E(Z) ) t, t 0

Z = sup f F n f (X i ) i=1 f 1, E(f (X i )) = 0, f F P ( Z M t ) ( C exp tc ( )) log t 1 + σ 2, t 0 + M C > 0 numerical constant, M mean or median of Z σ 2 = sup f F n i=1 E(f 2 (X i )) M. Talagrand (1996) P. Massart (2000) S. Boucheron, G. Lugosi, P. Massart (2005) P.-M. Samson (2000) (dependence)

concentration inequalities numerous applications geometric functional analysis discrete and combinatorial probability empirical processes statistical mechanics random matrix theory

recent studies of random matrix and random growth models new asymptotics common, non-central, rate (mean) 1/3 universal limiting Tracy-Widom distribution random matrices, longest increasing subsequence, random growth models, last passage percolation...

sample covariance matrices multivariate statistical inference principal component analysis population (Y 1,..., Y N ) Y j vectors (column) in R M (characters) Y = (Y 1,..., Y N ) M N matrix sample covariance matrix Y Y t (M M) (independent) Gaussian Y j : Wishart matrix models

is Y Y t a good approximation of the population covariance matrix E(Y Y t )? M finite 1 N Y Y t E(Y Y t ) N M infinite? M = M(N) N M N ρ (0, ) N

sample covariance matrices Y = (Y 1,..., Y N ) M N matrix Y = (Y ij ) 1 i M,1 j N Y ij independent identically distributed (real or complex) E(Y ij ) = 0, E(Y 2 ij ) = 1 Wishart model : Y j standard Gaussian in R M numerous extensions

sample covariance matrices Y = (Y 1,..., Y N ) M N matrix Y = (Y ij ) 1 i M,1 j N iid E(Y ij ) = 0, E(Y 2 ij ) = 1 center of interest : eigenvalues 0 λ N 1 λn M of Y Y t (M M non-negative symmetric matrix) λ N k singular values of Y λ N k = λn k N eigenvalues of 1 N Y Y t spectral measure 1 M M k=1 δ λn k asymptotics M = M(N) ρ N N

Marchenko-Pastur theorem (1967) asymptotic behavior of the spectral measure ( λ N k = λn k /N) 1 M M k=1 dν(x) = δ λn k ν Marchenko-Pastur distribution ( 1 1 ) ρ δ 0 + 1 (b x)(x a) 1[a,b] dx + ρ 2πx a = a(ρ) = ( 1 ρ ) 2 b = b(ρ) = ( 1 + ρ ) 2

1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N global regime large deviation asymptotics of the spectral measure fluctuations of the spectral measure M [ ) f ( λn k R f dν] G k=1 f : R R smooth Gaussian variable

1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N local regime behavior of the individual eigenvalues spacings (bulk behavior) extremal eigenvalues (edge behavior)

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3[ λn M b(ρ) ] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3 N 1[ λ N M b(ρ)n] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

F TW (complex) C. Tracy, H. Widom (1994) distribution ( F TW (s) = exp u = 2u 3 + xu s (x s)u(x) 2 dx Painlevé II equation ), s R density

mean 1.77 F TW (s) e s3 /12 as s 1 F TW (s) e 4s3/2 /3 as s + density (similar for real case)

Gaussian (Wishart matrices) completely solvable models determinantal structure orthogonal polynomial analysis asymptotics of Laguerre orthogonal polynomials C. Tracy, H. Widom (1994) K. Johansson (2000), I. Johnstone (2001)

extension to non-gaussian matrices A. Soshnikov (2001-02) moment method E ( Tr ( (YY t ) p)) L. Erdös, H.-T. Yau (2009-12) (and collaborators) local Marchenko-Pastur law T. Tao, V. Vu (2010-11) Lindeberg comparison method symmetric matrices

(brief) survey of recent approaches to non-asymptotic exponential inequalities quantify the limit theorems spectral measure extremal eigenvalues catch the new rate (mean) 1/3 from the Gaussian case to non-gaussian models

two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1

two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε )

two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε ) Wishart matrices more general covariance matrices

measure concentration tool F = F (Y Y t ) = F (Y ij ) satisfactory for the global regime less satisfactory for the local regime specific functionals eigenvalue counting function extreme eigenvalues

tail inequalities for the spectral measure A. Guionnet, O. Zeitouni (2000) measure concentration tool f : R R smooth (Lipschitz) X = (X ij ) 1 i,j M M M symmetric matrix eigenvalues λ 1 λ M M F : X Tr f (X ) = f (λ k ) k=1 with respect to the Euclidean structure on Lipschitz M M matrices convex if f is convex

tail inequalities for the spectral measure Gaussian entries Y ij f : R R such that f (x 2 ) 1-Lipschitz ( M [ ) P f ( λ N k ) E( f ( λ N k ))] t C(ρ) e t2 /C(ρ), t 0 k=1 compactly supported entries Y ij f : R R such that f (x 2 ) 1-Lipschitz and convex

non-lipschitz functions f typically f = 1 I, I R interval M f ( λn ) k = # { λn k I } = N I counting function k=1 Wishart matrices (determinantal structure) I interval in (a, b) 1 log M [ NI E(N I ) ] G Gaussian variable exponential tail inequalities P ( N I E(N I ) t ) C e ct log(1+t/ log M), t 0 Var ( N I ) = O(log M)

non-gaussian covariance matrices comparison with Wishart model partial results localization results L. Erdös, H.-T. Yau (2009-12) Lindeberg comparison method T. Tao, V. Vu (2010-11) Var ( N I ) = O(log M) S. Dallaporta, V. Vu (2011) P ( N I E(N I ) t ) C e ctδ, t C log M, 0 < δ 1 T. Tao, V. Vu (2012)

tail inequalities for the extremal eigenvalues fluctuations of the largest eigenvalue M 2/3[ λn M b(ρ) ] C(ρ) F TW M ρ N

tail inequalities for the extremal eigenvalues fluctuations of the largest eigenvalue M 2/3[ λn M b(ρ) ] C(ρ) F TW M ρ N finite M inequalities at the (mean) 1/3 rate reflecting the tails of F TW bounds on Var( λ N M )

measure concentration tool (Gaussian) Wishart matrix Y Y t λ N M = max 1 k M λn k = sup Y v 2 v =1 s N M = λ N M Lipschitz of the Gaussian entries Y ij Gaussian concentration ( P ŝm N E( ŝm N ) ) + t e M t2 /C, t 0 E(ŝ N M ) b(ρ) correct large deviation bounds (t 1)

extreme eigenvalues alternate tools Riemann-Hilbert analysis (Wishart matrices) tri-diagonal representations (Wishart and β-ensembles) moment methods (Wishart and non-gaussian matrices)

M 2/3[ λn M b(ρ) ] C(ρ) F TW P( λn M b(ρ) + s M 2/3) F TW (C s) bounds for Wishart matrices tri-diagonal representation B. Rider, M. L. (2010) P ( λn M b(ρ) + ɛ ) C e Mε3/2 /C, 0 < ε 1 P ( λn M b(ρ) ɛ ) C e Mε3 /C, 0 < ε b(ρ)

P( λn M b(ρ) + s M 2/3) F TW (C s) bounds for Wishart matrices P ( λn M b(ρ) + ɛ ) C e Mε3/2 /C, 0 < ε 1 P ( λn M b(ρ) ɛ ) C e Mε3 /C, 0 < ε b(ρ) fit the Tracy-Widom asymptotics (ε = s M 2/3 ) 1 F TW (s) e s3/2 /C (s + ) F TW (s) e s3 /C (s ) Var( λ N M ) = O ( 1 M 4/3 )

M 2/3[ λn M b(ρ) ] C(ρ) F TW b(ρ) = ( 1 + ρ ) 2 λ N M = λn M /N, M = M(N) ρ N ( MN) 1/3 ( ( M + λ N N) 4/3 M ( M + N) 2) F TW N + 1 M 0 < ε 1 ( P λ N M ( M + ) N) 2 (1 + ε) C e MN ε 3/2 ( 1 ε ( M N ) 1/4)/C ( P λ N M ( M + ) N) 2 (1 ε) C e MN ε3 ( 1 ε ( M N ) 1/2)/C

bi and tri-diagonal representation χ N 0 0 0 χ (M 1) χ N 1 0 0.. 0 χ B = (M 2) χ N 3 0..... 0........ 0.... χ2 χ N M+2 0 0 0 χ 1 χ N M+1 χ (N 1),..., χ 1, χ (M 1),..., χ 1 independent chi-variables B B t same spectrum as Y Y t (Wishart) H. Trotter (1984), A. Edelman, I. Dimitriu (2002) extension to β-ensembles

bounds for non-gaussian entries moment method E ( Tr ( (YY t ) p)) O. Feldheim, S. Sodin (2010) largest eigenvalue (symmetric, subgaussian entries) P ( λn M b(ρ) + ε ) C e M ε3/2 /C, 0 < ε 1 below the mean? necessary for variance bounds

variance level Var( λ N M ) = O ( 1 M 4/3 ) S. Dallaporta (2012) comparison with Wishart model localization results L. Erdös, H.-T. Yau (2009-12) Lindeberg comparison method T. Tao, V. Vu (2010-11)

smallest eigenvalue soft edge M = M(N) ρ N, ρ < 1 a(ρ) = ( 1 ρ ) 2 P ( λn 1 a(ρ) ε ) C e M ε3/2 /C, 0 < ε 1 P ( λn 1 a(ρ) + ε ) C e M ε3 /C, 0 < ε a(ρ) Wishart matrices B. Rider, M. L. (2010)

smallest eigenvalue hard edge M = N, ρ = 1 a(ρ) = ( 1 ρ ) 2 = 0 P ( λn 1 ε ) N 2 C ε + C e cn large families of covariance matrices M. Rudelson, R. Vershynin (2008-10)