Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France
exponential tail inequalities classical theme in probability and statistics quantify the asymptotic statements central limit theorems large deviation principles
classical exponential inequalities sum of independent random variables S n = 1 n (X 1 + + X n ) 0 X i 1 independent P ( S n E(S n ) + t ) e t2 /2, t 0 Hoeffding s inequality same as for X i standard Gaussian central limit theorem
measure concentration ideas asymptotic geometric analysis V. Milman (1970) S n = 1 n (X 1 + + X n ) F (X ) = F (X 1,..., X n ), F : R n R Lipschitz Gaussian sample independent random variables
concentration inequalities S n = 1 n (X 1 + + X n ) F (X ) = F (X 1,..., X n ), F : R n R 1-Lipschitz X 1,..., X n independenty standard Gaussian ( P F (X ) E ( F (X ) ) ) + t e t2 /2, t 0 0 X i 1 independent, F 1-Lipschitz and convex ( P F (X ) E ( F (X ) ) ) + t 2 e t2 /4, t 0 M. Talagrand (1995)
empirical processes X 1,..., X n independent with values in (S, S) F collection of functions f : S [0, 1] Z = sup f F n f (X i ) i=1 Z Lipschitz and convex concentration inequalities on P( Z E(Z) ) t, t 0
Z = sup f F n f (X i ) i=1 f 1, E(f (X i )) = 0, f F P ( Z M t ) ( C exp tc ( )) log t 1 + σ 2, t 0 + M C > 0 numerical constant, M mean or median of Z σ 2 = sup f F n i=1 E(f 2 (X i )) M. Talagrand (1996) P. Massart (2000) S. Boucheron, G. Lugosi, P. Massart (2005) P.-M. Samson (2000) (dependence)
concentration inequalities numerous applications geometric functional analysis discrete and combinatorial probability empirical processes statistical mechanics random matrix theory
concentration inequalities numerous applications geometric functional analysis discrete and combinatorial probability empirical processes statistical mechanics random matrix theory
recent studies of random matrix and random growth models new asymptotics common, non-central, rate (mean) 1/3 universal limiting Tracy-Widom distribution random matrices, longest increasing subsequence, random growth models, last passage percolation...
sample covariance matrices multivariate statistical inference principal component analysis population (Y 1,..., Y N ) Y j vectors (column) in R M (characters) Y = (Y 1,..., Y N ) M N matrix sample covariance matrix Y Y t (M M) (independent) Gaussian Y j : Wishart matrix models
is Y Y t a good approximation of the population covariance matrix E(Y Y t )? M finite 1 N Y Y t E(Y Y t ) N M infinite? M = M(N) N M N ρ (0, ) N
sample covariance matrices Y = (Y 1,..., Y N ) M N matrix Y = (Y ij ) 1 i M,1 j N Y ij independent identically distributed (real or complex) E(Y ij ) = 0, E(Y 2 ij ) = 1 Wishart model : Y j standard Gaussian in R M numerous extensions
sample covariance matrices Y = (Y 1,..., Y N ) M N matrix Y = (Y ij ) 1 i M,1 j N iid E(Y ij ) = 0, E(Y 2 ij ) = 1 center of interest : eigenvalues 0 λ N 1 λn M of Y Y t (M M non-negative symmetric matrix) λ N k singular values of Y λ N k = λn k N eigenvalues of 1 N Y Y t spectral measure 1 M M k=1 δ λn k asymptotics M = M(N) ρ N N
Marchenko-Pastur theorem (1967) asymptotic behavior of the spectral measure ( λ N k = λn k /N) 1 M M k=1 dν(x) = δ λn k ν Marchenko-Pastur distribution ( 1 1 ) ρ δ 0 + 1 (b x)(x a) 1[a,b] dx + ρ 2πx a = a(ρ) = ( 1 ρ ) 2 b = b(ρ) = ( 1 + ρ ) 2
1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N global regime large deviation asymptotics of the spectral measure fluctuations of the spectral measure M [ ) f ( λn k R f dν] G k=1 f : R R smooth Gaussian variable
1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N local regime behavior of the individual eigenvalues spacings (bulk behavior) extremal eigenvalues (edge behavior)
extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N
Marchenko-Pastur theorem (1967) asymptotic behavior of the spectral measure ( λ N k = λn k /N) 1 M M k=1 dν(x) = δ λn k ν Marchenko-Pastur distribution ( 1 1 ) ρ δ 0 + 1 (b x)(x a) 1[a,b] dx + ρ 2πx a = a(ρ) = ( 1 ρ ) 2 b = b(ρ) = ( 1 + ρ ) 2
extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)
extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)
extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)
extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3[ λn M b(ρ) ] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)
extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3 N 1[ λ N M b(ρ)n] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)
F TW (complex) C. Tracy, H. Widom (1994) distribution ( F TW (s) = exp u = 2u 3 + xu s (x s)u(x) 2 dx Painlevé II equation ), s R density
mean 1.77 F TW (s) e s3 /12 as s 1 F TW (s) e 4s3/2 /3 as s + density (similar for real case)
extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3[ λn M b(ρ) ] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)
Gaussian (Wishart matrices) completely solvable models determinantal structure orthogonal polynomial analysis asymptotics of Laguerre orthogonal polynomials C. Tracy, H. Widom (1994) K. Johansson (2000), I. Johnstone (2001)
extension to non-gaussian matrices A. Soshnikov (2001-02) moment method E ( Tr ( (YY t ) p)) L. Erdös, H.-T. Yau (2009-12) (and collaborators) local Marchenko-Pastur law T. Tao, V. Vu (2010-11) Lindeberg comparison method symmetric matrices
(brief) survey of recent approaches to non-asymptotic exponential inequalities quantify the limit theorems spectral measure extremal eigenvalues catch the new rate (mean) 1/3 from the Gaussian case to non-gaussian models
two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1
1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N global regime large deviation asymptotics of the spectral measure fluctuations of the spectral measure M [ ) f ( λn k R f dν] G k=1 f : R R smooth Gaussian variable
two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε )
extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3[ λn M b(ρ) ] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)
two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε ) Wishart matrices more general covariance matrices
measure concentration tool F = F (Y Y t ) = F (Y ij ) satisfactory for the global regime less satisfactory for the local regime specific functionals eigenvalue counting function extreme eigenvalues
two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε ) Wishart matrices more general covariance matrices
tail inequalities for the spectral measure A. Guionnet, O. Zeitouni (2000) measure concentration tool f : R R smooth (Lipschitz) X = (X ij ) 1 i,j M M M symmetric matrix eigenvalues λ 1 λ M M F : X Tr f (X ) = f (λ k ) k=1 with respect to the Euclidean structure on Lipschitz M M matrices convex if f is convex
concentration inequalities S n = 1 n (X 1 + + X n ) F (X ) = F (X 1,..., X n ), F : R n R 1-Lipschitz X 1,..., X n independenty standard Gaussian ( P F (X ) E ( F (X ) ) ) + t e t2 /2, t 0 0 X i 1 independent, F 1-Lipschitz and convex ( P F (X ) E ( F (X ) ) ) + t 2 e t2 /4, t 0 M. Talagrand (1995)
tail inequalities for the spectral measure Gaussian entries Y ij f : R R such that f (x 2 ) 1-Lipschitz ( M [ ) P f ( λ N k ) E( f ( λ N k ))] t C(ρ) e t2 /C(ρ), t 0 k=1 compactly supported entries Y ij f : R R such that f (x 2 ) 1-Lipschitz and convex
1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N global regime large deviation asymptotics of the spectral measure fluctuations of the spectral measure M [ ) f ( λn k R f dν] G k=1 f : R R smooth Gaussian variable
non-lipschitz functions f typically f = 1 I, I R interval M f ( λn ) k = # { λn k I } = N I counting function k=1 Wishart matrices (determinantal structure) I interval in (a, b) 1 log M [ NI E(N I ) ] G Gaussian variable exponential tail inequalities P ( N I E(N I ) t ) C e ct log(1+t/ log M), t 0 Var ( N I ) = O(log M)
non-gaussian covariance matrices comparison with Wishart model partial results localization results L. Erdös, H.-T. Yau (2009-12) Lindeberg comparison method T. Tao, V. Vu (2010-11) Var ( N I ) = O(log M) S. Dallaporta, V. Vu (2011) P ( N I E(N I ) t ) C e ctδ, t C log M, 0 < δ 1 T. Tao, V. Vu (2012)
non-lipschitz functions f typically f = 1 I, I R interval M f ( λn ) k = # { λn k I } = N I counting function k=1 Wishart matrices (determinantal structure) I interval in (a, b) 1 log M [ NI E(N I ) ] G Gaussian variable exponential tail inequalities P ( N I E(N I ) t ) C e ct log(1+t/ log M), t 0 Var ( N I ) = O(log M)
two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε ) Wishart matrices more general covariance matrices
two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε ) Wishart matrices more general covariance matrices
tail inequalities for the extremal eigenvalues fluctuations of the largest eigenvalue M 2/3[ λn M b(ρ) ] C(ρ) F TW M ρ N
extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3[ λn M b(ρ) ] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)
tail inequalities for the extremal eigenvalues fluctuations of the largest eigenvalue M 2/3[ λn M b(ρ) ] C(ρ) F TW M ρ N finite M inequalities at the (mean) 1/3 rate reflecting the tails of F TW bounds on Var( λ N M )
measure concentration tool (Gaussian) Wishart matrix Y Y t λ N M = max 1 k M λn k = sup Y v 2 v =1 s N M = λ N M Lipschitz of the Gaussian entries Y ij Gaussian concentration ( P ŝm N E( ŝm N ) ) + t e M t2 /C, t 0 E(ŝ N M ) b(ρ) correct large deviation bounds (t 1)
measure concentration tool (Gaussian) Wishart matrix Y Y t λ N M = max 1 k M λn k = sup Y v 2 v =1 s N M = λ N M Lipschitz of the Gaussian entries Y ij Gaussian concentration ( P ŝm N E( ŝm N ) ) + t e M t2 /C, t 0 E(ŝ N M ) b(ρ) does not fit the small deviation regime t = s M 2/3
extreme eigenvalues alternate tools Riemann-Hilbert analysis (Wishart matrices) tri-diagonal representations (Wishart and β-ensembles) moment methods (Wishart and non-gaussian matrices)
extreme eigenvalues alternate tools Riemann-Hilbert analysis (Wishart matrices) tri-diagonal representations (Wishart and β-ensembles) moment methods (Wishart and non-gaussian matrices)
M 2/3[ λn M b(ρ) ] C(ρ) F TW P( λn M b(ρ) + s M 2/3) F TW (C s) bounds for Wishart matrices tri-diagonal representation B. Rider, M. L. (2010) P ( λn M b(ρ) + ɛ ) C e Mε3/2 /C, 0 < ε 1 P ( λn M b(ρ) ɛ ) C e Mε3 /C, 0 < ε b(ρ)
P( λn M b(ρ) + s M 2/3) F TW (C s) bounds for Wishart matrices P ( λn M b(ρ) + ɛ ) C e Mε3/2 /C, 0 < ε 1 P ( λn M b(ρ) ɛ ) C e Mε3 /C, 0 < ε b(ρ) fit the Tracy-Widom asymptotics (ε = s M 2/3 ) 1 F TW (s) e s3/2 /C (s + ) F TW (s) e s3 /C (s ) Var( λ N M ) = O ( 1 M 4/3 )
M 2/3[ λn M b(ρ) ] C(ρ) F TW b(ρ) = ( 1 + ρ ) 2 λ N M = λn M /N, M = M(N) ρ N ( MN) 1/3 ( ( M + λ N N) 4/3 M ( M + N) 2) F TW N + 1 M 0 < ε 1 ( P λ N M ( M + ) N) 2 (1 + ε) C e MN ε 3/2 ( 1 ε ( M N ) 1/4)/C ( P λ N M ( M + ) N) 2 (1 ε) C e MN ε3 ( 1 ε ( M N ) 1/2)/C
bi and tri-diagonal representation χ N 0 0 0 χ (M 1) χ N 1 0 0.. 0 χ B = (M 2) χ N 3 0..... 0........ 0.... χ2 χ N M+2 0 0 0 χ 1 χ N M+1 χ (N 1),..., χ 1, χ (M 1),..., χ 1 independent chi-variables B B t same spectrum as Y Y t (Wishart) H. Trotter (1984), A. Edelman, I. Dimitriu (2002) extension to β-ensembles
bounds for non-gaussian entries moment method E ( Tr ( (YY t ) p)) O. Feldheim, S. Sodin (2010) largest eigenvalue (symmetric, subgaussian entries) P ( λn M b(ρ) + ε ) C e M ε3/2 /C, 0 < ε 1 below the mean? necessary for variance bounds
variance level Var( λ N M ) = O ( 1 M 4/3 ) S. Dallaporta (2012) comparison with Wishart model localization results L. Erdös, H.-T. Yau (2009-12) Lindeberg comparison method T. Tao, V. Vu (2010-11)
smallest eigenvalue soft edge M = M(N) ρ N, ρ < 1 a(ρ) = ( 1 ρ ) 2 P ( λn 1 a(ρ) ε ) C e M ε3/2 /C, 0 < ε 1 P ( λn 1 a(ρ) + ε ) C e M ε3 /C, 0 < ε a(ρ) Wishart matrices B. Rider, M. L. (2010)
smallest eigenvalue hard edge M = N, ρ = 1 a(ρ) = ( 1 ρ ) 2 = 0 P ( λn 1 ε ) N 2 C ε + C e cn large families of covariance matrices M. Rudelson, R. Vershynin (2008-10)