Concentration Inequalities for Random Matrices

Similar documents
Exponential tail inequalities for eigenvalues of random matrices

Concentration inequalities: basics and some new challenges

Eigenvalue variance bounds for Wigner and covariance random matrices

Markov operators, classical orthogonal polynomial ensembles, and random matrices

Fluctuations from the Semicircle Law Lecture 4

Invertibility of random matrices

Random matrices: Distribution of the least singular value (via Property Testing)

Random regular digraphs: singularity and spectrum

A Note on the Central Limit Theorem for the Eigenvalue Counting Function of Wigner and Covariance Matrices

A Remark on Hypercontractivity and Tail Inequalities for the Largest Eigenvalues of Random Matrices

Random Matrices: Invertibility, Structure, and Applications

Universality of distribution functions in random matrix theory Arno Kuijlaars Katholieke Universiteit Leuven, Belgium

Random Matrices: Beyond Wigner and Marchenko-Pastur Laws

Local semicircle law, Wegner estimate and level repulsion for Wigner random matrices

1 Intro to RMT (Gene)

The circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA)

Stein s method, logarithmic Sobolev and transport inequalities

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1

Rectangular Young tableaux and the Jacobi ensemble

Numerical analysis and random matrix theory. Tom Trogdon UC Irvine

Fluctuations of random tilings and discrete Beta-ensembles

Least singular value of random matrices. Lewis Memorial Lecture / DIMACS minicourse March 18, Terence Tao (UCLA)

Comparison Method in Random Matrix Theory

Random Matrix: From Wigner to Quantum Chaos

Update on the beta ensembles

III. Quantum ergodicity on graphs, perspectives

Determinantal point processes and random matrix theory in a nutshell

On the concentration of eigenvalues of random symmetric matrices

Fluctuations of random tilings and discrete Beta-ensembles

Extreme eigenvalue fluctutations for GUE

Small Ball Probability, Arithmetic Structure and Random Matrices

Convergence of spectral measures and eigenvalue rigidity

DEVIATION INEQUALITIES ON LARGEST EIGENVALUES

Universality of local spectral statistics of random matrices

RANDOM MATRICES: LAW OF THE DETERMINANT

arxiv: v2 [math.pr] 16 Aug 2014

1 Tridiagonal matrices

Assessing the dependence of high-dimensional time series via sample autocovariances and correlations

On the Central Limit Theorem for the Eigenvalue Counting Function of Wigner and Covariance matrices

arxiv: v1 [math.pr] 22 May 2008

arxiv: v2 [math.fa] 7 Apr 2010

Random Matrices and Multivariate Statistical Analysis

From the mesoscopic to microscopic scale in random matrix theory

Rigidity of the 3D hierarchical Coulomb gas. Sourav Chatterjee

Semicircle law on short scales and delocalization for Wigner random matrices

Stein s Method: Distributional Approximation and Concentration of Measure

arxiv: v3 [math-ph] 21 Jun 2012

Large sample covariance matrices and the T 2 statistic

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

arxiv: v1 [math.pr] 22 Dec 2018

Fluctuations of the free energy of spherical spin glass

The Matrix Dyson Equation in random matrix theory

Invertibility of symmetric random matrices

Estimation of large dimensional sparse covariance matrices

Triangular matrices and biorthogonal ensembles

Inhomogeneous circular laws for random matrices with non-identically distributed entries

Insights into Large Complex Systems via Random Matrix Theory

Least squares under convex constraint

Concentration, self-bounding functions

Concentration inequalities and the entropy method

Random Matrix Theory and its Applications to Econometrics

Anti-concentration Inequalities

STAT 200C: High-dimensional Statistics

Multivariate Statistics Random Projections and Johnson-Lindenstrauss Lemma

Statistical Inference and Random Matrices

STAT C206A / MATH C223A : Stein s method and applications 1. Lecture 31

Universal Fluctuation Formulae for one-cut β-ensembles

Painlevé Representations for Distribution Functions for Next-Largest, Next-Next-Largest, etc., Eigenvalues of GOE, GUE and GSE

OPTIMAL UPPER BOUND FOR THE INFINITY NORM OF EIGENVECTORS OF RANDOM MATRICES

Concentration inequalities for non-lipschitz functions

Concentration of the Spectral Measure for Large Random Matrices with Stable Entries

Central Limit Theorems for linear statistics for Biorthogonal Ensembles

Lectures 6 7 : Marchenko-Pastur Law

On corrections of classical multivariate tests for high-dimensional data. Jian-feng. Yao Université de Rennes 1, IRMAR

Model Selection and Geometry

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

RANDOM MATRICES: OVERCROWDING ESTIMATES FOR THE SPECTRUM. 1. introduction

Inverse Theorems in Probability. Van H. Vu. Department of Mathematics Yale University

Operator norm convergence for sequence of matrices and application to QIT

Superconcentration inequalities for centered Gaussian stationnary processes

Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications

Non white sample covariance matrices.

Concentration for Coulomb gases

Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices

The Tracy-Widom distribution is not infinitely divisible.

Eigenvalues and Singular Values of Random Matrices: A Tutorial Introduction

Random matrices: A Survey. Van H. Vu. Department of Mathematics Rutgers University

Homogenization of the Dyson Brownian Motion

Sparse recovery for spherical harmonic expansions

Advances in Random Matrix Theory: Let there be tools

Beyond the Gaussian universality class

Free Probability and Random Matrices: from isomorphisms to universality

Generalization theory

Progress in the method of Ghosts and Shadows for Beta Ensembles

Testing Algebraic Hypotheses

TOP EIGENVALUE OF CAUCHY RANDOM MATRICES

Lecture I: Asymptotics for large GUE random matrices

ROW PRODUCTS OF RANDOM MATRICES

Concentration and Anti-concentration. Van H. Vu. Department of Mathematics Yale University

Second Order Freeness and Random Orthogonal Matrices

Transcription:

Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France

exponential tail inequalities classical theme in probability and statistics quantify the asymptotic statements central limit theorems large deviation principles

classical exponential inequalities sum of independent random variables S n = 1 n (X 1 + + X n ) 0 X i 1 independent P ( S n E(S n ) + t ) e t2 /2, t 0 Hoeffding s inequality same as for X i standard Gaussian central limit theorem

measure concentration ideas asymptotic geometric analysis V. Milman (1970) S n = 1 n (X 1 + + X n ) F (X ) = F (X 1,..., X n ), F : R n R Lipschitz Gaussian sample independent random variables

concentration inequalities S n = 1 n (X 1 + + X n ) F (X ) = F (X 1,..., X n ), F : R n R 1-Lipschitz X 1,..., X n independenty standard Gaussian ( P F (X ) E ( F (X ) ) ) + t e t2 /2, t 0 0 X i 1 independent, F 1-Lipschitz and convex ( P F (X ) E ( F (X ) ) ) + t 2 e t2 /4, t 0 M. Talagrand (1995)

empirical processes X 1,..., X n independent with values in (S, S) F collection of functions f : S [0, 1] Z = sup f F n f (X i ) i=1 Z Lipschitz and convex concentration inequalities on P( Z E(Z) ) t, t 0

Z = sup f F n f (X i ) i=1 f 1, E(f (X i )) = 0, f F P ( Z M t ) ( C exp tc ( )) log t 1 + σ 2, t 0 + M C > 0 numerical constant, M mean or median of Z σ 2 = sup f F n i=1 E(f 2 (X i )) M. Talagrand (1996) P. Massart (2000) S. Boucheron, G. Lugosi, P. Massart (2005) P.-M. Samson (2000) (dependence)

concentration inequalities numerous applications geometric functional analysis discrete and combinatorial probability empirical processes statistical mechanics random matrix theory

concentration inequalities numerous applications geometric functional analysis discrete and combinatorial probability empirical processes statistical mechanics random matrix theory

recent studies of random matrix and random growth models new asymptotics common, non-central, rate (mean) 1/3 universal limiting Tracy-Widom distribution random matrices, longest increasing subsequence, random growth models, last passage percolation...

sample covariance matrices multivariate statistical inference principal component analysis population (Y 1,..., Y N ) Y j vectors (column) in R M (characters) Y = (Y 1,..., Y N ) M N matrix sample covariance matrix Y Y t (M M) (independent) Gaussian Y j : Wishart matrix models

is Y Y t a good approximation of the population covariance matrix E(Y Y t )? M finite 1 N Y Y t E(Y Y t ) N M infinite? M = M(N) N M N ρ (0, ) N

sample covariance matrices Y = (Y 1,..., Y N ) M N matrix Y = (Y ij ) 1 i M,1 j N Y ij independent identically distributed (real or complex) E(Y ij ) = 0, E(Y 2 ij ) = 1 Wishart model : Y j standard Gaussian in R M numerous extensions

sample covariance matrices Y = (Y 1,..., Y N ) M N matrix Y = (Y ij ) 1 i M,1 j N iid E(Y ij ) = 0, E(Y 2 ij ) = 1 center of interest : eigenvalues 0 λ N 1 λn M of Y Y t (M M non-negative symmetric matrix) λ N k singular values of Y λ N k = λn k N eigenvalues of 1 N Y Y t spectral measure 1 M M k=1 δ λn k asymptotics M = M(N) ρ N N

Marchenko-Pastur theorem (1967) asymptotic behavior of the spectral measure ( λ N k = λn k /N) 1 M M k=1 dν(x) = δ λn k ν Marchenko-Pastur distribution ( 1 1 ) ρ δ 0 + 1 (b x)(x a) 1[a,b] dx + ρ 2πx a = a(ρ) = ( 1 ρ ) 2 b = b(ρ) = ( 1 + ρ ) 2

1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N global regime large deviation asymptotics of the spectral measure fluctuations of the spectral measure M [ ) f ( λn k R f dν] G k=1 f : R R smooth Gaussian variable

1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N local regime behavior of the individual eigenvalues spacings (bulk behavior) extremal eigenvalues (edge behavior)

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N

Marchenko-Pastur theorem (1967) asymptotic behavior of the spectral measure ( λ N k = λn k /N) 1 M M k=1 dν(x) = δ λn k ν Marchenko-Pastur distribution ( 1 1 ) ρ δ 0 + 1 (b x)(x a) 1[a,b] dx + ρ 2πx a = a(ρ) = ( 1 ρ ) 2 b = b(ρ) = ( 1 + ρ ) 2

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3[ λn M b(ρ) ] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3 N 1[ λ N M b(ρ)n] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

F TW (complex) C. Tracy, H. Widom (1994) distribution ( F TW (s) = exp u = 2u 3 + xu s (x s)u(x) 2 dx Painlevé II equation ), s R density

mean 1.77 F TW (s) e s3 /12 as s 1 F TW (s) e 4s3/2 /3 as s + density (similar for real case)

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3[ λn M b(ρ) ] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

Gaussian (Wishart matrices) completely solvable models determinantal structure orthogonal polynomial analysis asymptotics of Laguerre orthogonal polynomials C. Tracy, H. Widom (1994) K. Johansson (2000), I. Johnstone (2001)

extension to non-gaussian matrices A. Soshnikov (2001-02) moment method E ( Tr ( (YY t ) p)) L. Erdös, H.-T. Yau (2009-12) (and collaborators) local Marchenko-Pastur law T. Tao, V. Vu (2010-11) Lindeberg comparison method symmetric matrices

(brief) survey of recent approaches to non-asymptotic exponential inequalities quantify the limit theorems spectral measure extremal eigenvalues catch the new rate (mean) 1/3 from the Gaussian case to non-gaussian models

two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1

1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N global regime large deviation asymptotics of the spectral measure fluctuations of the spectral measure M [ ) f ( λn k R f dν] G k=1 f : R R smooth Gaussian variable

two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε )

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3[ λn M b(ρ) ] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε ) Wishart matrices more general covariance matrices

measure concentration tool F = F (Y Y t ) = F (Y ij ) satisfactory for the global regime less satisfactory for the local regime specific functionals eigenvalue counting function extreme eigenvalues

two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε ) Wishart matrices more general covariance matrices

tail inequalities for the spectral measure A. Guionnet, O. Zeitouni (2000) measure concentration tool f : R R smooth (Lipschitz) X = (X ij ) 1 i,j M M M symmetric matrix eigenvalues λ 1 λ M M F : X Tr f (X ) = f (λ k ) k=1 with respect to the Euclidean structure on Lipschitz M M matrices convex if f is convex

concentration inequalities S n = 1 n (X 1 + + X n ) F (X ) = F (X 1,..., X n ), F : R n R 1-Lipschitz X 1,..., X n independenty standard Gaussian ( P F (X ) E ( F (X ) ) ) + t e t2 /2, t 0 0 X i 1 independent, F 1-Lipschitz and convex ( P F (X ) E ( F (X ) ) ) + t 2 e t2 /4, t 0 M. Talagrand (1995)

tail inequalities for the spectral measure Gaussian entries Y ij f : R R such that f (x 2 ) 1-Lipschitz ( M [ ) P f ( λ N k ) E( f ( λ N k ))] t C(ρ) e t2 /C(ρ), t 0 k=1 compactly supported entries Y ij f : R R such that f (x 2 ) 1-Lipschitz and convex

1 M M k=1 Marchenko-Pastur theorem δ λn k ν on ( a(ρ), b(ρ) ) M ρ N global regime large deviation asymptotics of the spectral measure fluctuations of the spectral measure M [ ) f ( λn k R f dν] G k=1 f : R R smooth Gaussian variable

non-lipschitz functions f typically f = 1 I, I R interval M f ( λn ) k = # { λn k I } = N I counting function k=1 Wishart matrices (determinantal structure) I interval in (a, b) 1 log M [ NI E(N I ) ] G Gaussian variable exponential tail inequalities P ( N I E(N I ) t ) C e ct log(1+t/ log M), t 0 Var ( N I ) = O(log M)

non-gaussian covariance matrices comparison with Wishart model partial results localization results L. Erdös, H.-T. Yau (2009-12) Lindeberg comparison method T. Tao, V. Vu (2010-11) Var ( N I ) = O(log M) S. Dallaporta, V. Vu (2011) P ( N I E(N I ) t ) C e ctδ, t C log M, 0 < δ 1 T. Tao, V. Vu (2012)

non-lipschitz functions f typically f = 1 I, I R interval M f ( λn ) k = # { λn k I } = N I counting function k=1 Wishart matrices (determinantal structure) I interval in (a, b) 1 log M [ NI E(N I ) ] G Gaussian variable exponential tail inequalities P ( N I E(N I ) t ) C e ct log(1+t/ log M), t 0 Var ( N I ) = O(log M)

two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε ) Wishart matrices more general covariance matrices

two main questions and objectives tail inequalities for the spectral measure ( M ) P f ( λ N k ) t k=1 tail inequalities for the extremal eigenvalues P ( λn M b(ρ) + ε ) Wishart matrices more general covariance matrices

tail inequalities for the extremal eigenvalues fluctuations of the largest eigenvalue M 2/3[ λn M b(ρ) ] C(ρ) F TW M ρ N

extremal eigenvalues largest eigenvalue λ N M = max 1 k M λ N k λ N M = λn M N b(ρ) = ( 1 + ρ ) 2 M ρ N fluctuations around b(ρ) complex or real Gaussian (Wishart matrices) M 2/3[ λn M b(ρ) ] C(ρ) F TW F TW C. Tracy, H. Widom (1994) distribution K. Johansson (2000), I. Johnstone (2001)

tail inequalities for the extremal eigenvalues fluctuations of the largest eigenvalue M 2/3[ λn M b(ρ) ] C(ρ) F TW M ρ N finite M inequalities at the (mean) 1/3 rate reflecting the tails of F TW bounds on Var( λ N M )

measure concentration tool (Gaussian) Wishart matrix Y Y t λ N M = max 1 k M λn k = sup Y v 2 v =1 s N M = λ N M Lipschitz of the Gaussian entries Y ij Gaussian concentration ( P ŝm N E( ŝm N ) ) + t e M t2 /C, t 0 E(ŝ N M ) b(ρ) correct large deviation bounds (t 1)

measure concentration tool (Gaussian) Wishart matrix Y Y t λ N M = max 1 k M λn k = sup Y v 2 v =1 s N M = λ N M Lipschitz of the Gaussian entries Y ij Gaussian concentration ( P ŝm N E( ŝm N ) ) + t e M t2 /C, t 0 E(ŝ N M ) b(ρ) does not fit the small deviation regime t = s M 2/3

extreme eigenvalues alternate tools Riemann-Hilbert analysis (Wishart matrices) tri-diagonal representations (Wishart and β-ensembles) moment methods (Wishart and non-gaussian matrices)

extreme eigenvalues alternate tools Riemann-Hilbert analysis (Wishart matrices) tri-diagonal representations (Wishart and β-ensembles) moment methods (Wishart and non-gaussian matrices)

M 2/3[ λn M b(ρ) ] C(ρ) F TW P( λn M b(ρ) + s M 2/3) F TW (C s) bounds for Wishart matrices tri-diagonal representation B. Rider, M. L. (2010) P ( λn M b(ρ) + ɛ ) C e Mε3/2 /C, 0 < ε 1 P ( λn M b(ρ) ɛ ) C e Mε3 /C, 0 < ε b(ρ)

P( λn M b(ρ) + s M 2/3) F TW (C s) bounds for Wishart matrices P ( λn M b(ρ) + ɛ ) C e Mε3/2 /C, 0 < ε 1 P ( λn M b(ρ) ɛ ) C e Mε3 /C, 0 < ε b(ρ) fit the Tracy-Widom asymptotics (ε = s M 2/3 ) 1 F TW (s) e s3/2 /C (s + ) F TW (s) e s3 /C (s ) Var( λ N M ) = O ( 1 M 4/3 )

M 2/3[ λn M b(ρ) ] C(ρ) F TW b(ρ) = ( 1 + ρ ) 2 λ N M = λn M /N, M = M(N) ρ N ( MN) 1/3 ( ( M + λ N N) 4/3 M ( M + N) 2) F TW N + 1 M 0 < ε 1 ( P λ N M ( M + ) N) 2 (1 + ε) C e MN ε 3/2 ( 1 ε ( M N ) 1/4)/C ( P λ N M ( M + ) N) 2 (1 ε) C e MN ε3 ( 1 ε ( M N ) 1/2)/C

bi and tri-diagonal representation χ N 0 0 0 χ (M 1) χ N 1 0 0.. 0 χ B = (M 2) χ N 3 0..... 0........ 0.... χ2 χ N M+2 0 0 0 χ 1 χ N M+1 χ (N 1),..., χ 1, χ (M 1),..., χ 1 independent chi-variables B B t same spectrum as Y Y t (Wishart) H. Trotter (1984), A. Edelman, I. Dimitriu (2002) extension to β-ensembles

bounds for non-gaussian entries moment method E ( Tr ( (YY t ) p)) O. Feldheim, S. Sodin (2010) largest eigenvalue (symmetric, subgaussian entries) P ( λn M b(ρ) + ε ) C e M ε3/2 /C, 0 < ε 1 below the mean? necessary for variance bounds

variance level Var( λ N M ) = O ( 1 M 4/3 ) S. Dallaporta (2012) comparison with Wishart model localization results L. Erdös, H.-T. Yau (2009-12) Lindeberg comparison method T. Tao, V. Vu (2010-11)

smallest eigenvalue soft edge M = M(N) ρ N, ρ < 1 a(ρ) = ( 1 ρ ) 2 P ( λn 1 a(ρ) ε ) C e M ε3/2 /C, 0 < ε 1 P ( λn 1 a(ρ) + ε ) C e M ε3 /C, 0 < ε a(ρ) Wishart matrices B. Rider, M. L. (2010)

smallest eigenvalue hard edge M = N, ρ = 1 a(ρ) = ( 1 ρ ) 2 = 0 P ( λn 1 ε ) N 2 C ε + C e cn large families of covariance matrices M. Rudelson, R. Vershynin (2008-10)