Random Matrices for Big Data Signal Processing and Machine Learning


1 Random Matrices for Big Data Signal Processing and Machine Learning (ICASSP 2017, New Orleans) Romain COUILLET and Hafiz TIOMOKO ALI, CentraleSupélec, France. March 2017. 1 / 153

2 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 2 / 153

3 Basics of Random Matrix Theory/ 3/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 3 / 153

4 Basics of Random Matrix Theory/Motivation: Large Sample Covariance Matrices 4/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 4 / 153

5 Basics of Random Matrix Theory/Motivation: Large Sample Covariance Matrices 5/153 Context

Baseline scenario: $x_1,\dots,x_n\in\mathbb{C}^N$ (or $\mathbb{R}^N$) i.i.d. with $E[x_1]=0$, $E[x_1x_1^*]=C_N$:

If $x_1\sim\mathcal{N}(0,C_N)$, the ML estimator for $C_N$ is the sample covariance matrix (SCM)
$$\hat C_N=\frac1n\sum_{i=1}^n x_ix_i^*.$$

If $n\to\infty$, then, by the strong law of large numbers, $\hat C_N\xrightarrow{a.s.}C_N$, or equivalently, in spectral norm, $\|\hat C_N-C_N\|\xrightarrow{a.s.}0$.

Random Matrix Regime: no longer valid if $N,n\to\infty$ with $N/n\to c\in(0,\infty)$: $\|\hat C_N-C_N\|\not\to0$.

For practical $N,n$ with $N\simeq n$, this leads to dramatically wrong conclusions, even for $N=n/100$.

11 Basics of Random Matrix Theory/Motivation: Large Sample Covariance Matrices 6/153 The Large Dimensional Fallacies

Setting: $x_i\in\mathbb{C}^N$ i.i.d., $x_1\sim\mathcal{CN}(0,I_N)$; assume $N=N(n)$ such that $N/n\to c>1$.

Then, joint point-wise convergence:
$$\max_{1\le i,j\le N}\left|[\hat C_N-I_N]_{ij}\right|=\max_{1\le i,j\le N}\left|\frac1n\langle X_{j,\cdot},X_{i,\cdot}\rangle-\delta_{ij}\right|\xrightarrow{a.s.}0.$$

However, eigenvalue mismatch:
$$0=\lambda_1(\hat C_N)=\dots=\lambda_{N-n}(\hat C_N)\le\lambda_{N-n+1}(\hat C_N)\le\dots\le\lambda_N(\hat C_N)$$
$$1=\lambda_1(I_N)=\dots=\lambda_N(I_N)$$
$\Rightarrow$ no convergence in spectral norm.

16 Basics of Random Matrix Theory/Motivation: Large Sample Covariance Matrices 7/153 The Marčenko–Pastur law

[Figure: Histogram of the eigenvalues of $\hat C_N$ for $N=500$, $n=2000$, $C_N=I_N$, against the Marčenko–Pastur law.]

17 Basics of Random Matrix Theory/Motivation: Large Sample Covariance Matrices 8/153 The Marčenko–Pastur law

Definition (Empirical Spectral Density). The empirical spectral density (e.s.d.) $\mu_N$ of a Hermitian matrix $A_N\in\mathbb{C}^{N\times N}$ is
$$\mu_N=\frac1N\sum_{i=1}^N\delta_{\lambda_i(A_N)}.$$

Theorem (Marčenko–Pastur Law [Marčenko, Pastur 67]). Let $X_N\in\mathbb{C}^{N\times n}$ have i.i.d. zero mean, unit variance entries. As $N,n\to\infty$ with $N/n\to c\in(0,\infty)$, the e.s.d. $\mu_N$ of $\frac1n X_NX_N^*$ satisfies
$$\mu_N\xrightarrow{a.s.}\mu_c$$
weakly, where
- $\mu_c(\{0\})=\max\{0,1-c^{-1}\}$,
- on $(0,\infty)$, $\mu_c$ has continuous density $f_c$ supported on $[(1-\sqrt c)^2,(1+\sqrt c)^2]$,
$$f_c(x)=\frac{1}{2\pi cx}\sqrt{(x-(1-\sqrt c)^2)((1+\sqrt c)^2-x)}.$$
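As a quick numerical check of the theorem, here is a minimal NumPy sketch (using $N=500$, $n=2000$ as in the figure above; the bin count is an arbitrary choice) comparing the empirical eigenvalue histogram of $\frac1n X_NX_N^*$ with $f_c$:

```python
import numpy as np

N, n = 500, 2000                       # c = N/n = 0.25
c = N / n
X = np.random.randn(N, n)              # i.i.d. zero mean, unit variance entries
eig = np.linalg.eigvalsh(X @ X.T / n)  # eigenvalues of (1/n) X X^T

# Marcenko-Pastur density f_c on [(1 - sqrt(c))^2, (1 + sqrt(c))^2]
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
x = np.linspace(a, b, 200)
f_c = np.sqrt(np.maximum((x - a) * (b - x), 0)) / (2 * np.pi * c * x)

# Compare the empirical histogram with f_c (small discrepancy for large N, n)
hist, edges = np.histogram(eig, bins=40, density=True)
centers = (edges[:-1] + edges[1:]) / 2
print("max |histogram - f_c|:", np.max(np.abs(hist - np.interp(centers, x, f_c))))
```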

20 Basics of Random Matrix Theory/Motivation: Large Sample Covariance Matrices 9/153 The Marčenko–Pastur law

[Figure: The Marčenko–Pastur density $f_c$ for different limit ratios $c=\lim_N N/n$: $c=0.1$, $c=0.2$, $c=0.5$.]

23 Basics of Random Matrix Theory/The Stieltjes Transform Method 10/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 10 / 153

24 Basics of Random Matrix Theory/The Stieltjes Transform Method 11/153 The Stieltjes transform

Definition (Stieltjes Transform). For $\mu$ a real probability measure with support $\mathrm{supp}(\mu)$, the Stieltjes transform $m_\mu$ is defined, for $z\in\mathbb{C}\setminus\mathrm{supp}(\mu)$, as
$$m_\mu(z)=\int\frac{1}{t-z}\mu(dt).$$

Property (Inverse Stieltjes Transform). For $a<b$ continuity points of $\mu$,
$$\mu([a,b])=\lim_{\varepsilon\downarrow0}\frac1\pi\int_a^b\Im[m_\mu(x+\imath\varepsilon)]\,dx.$$
Besides, if $\mu$ has a density $f$ at $x$,
$$f(x)=\lim_{\varepsilon\downarrow0}\frac1\pi\Im[m_\mu(x+\imath\varepsilon)].$$

27 Basics of Random Matrix Theory/The Stieltjes Transform Method 12/153 The Stieltjes transform

Property (Relation to e.s.d.). If $\mu$ is the e.s.d. of a Hermitian matrix $A\in\mathbb{C}^{N\times N}$ (i.e., $\mu=\frac1N\sum_{i=1}^N\delta_{\lambda_i(A)}$),
$$m_\mu(z)=\frac1N\operatorname{tr}(A-zI_N)^{-1}.$$
Proof:
$$m_\mu(z)=\int\frac{\mu(dt)}{t-z}=\frac1N\sum_{i=1}^N\frac{1}{\lambda_i(A)-z}=\frac1N\operatorname{tr}\left(\operatorname{diag}\{\lambda_i(A)\}-zI_N\right)^{-1}=\frac1N\operatorname{tr}(A-zI_N)^{-1}.$$
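Both the resolvent form of $m_\mu$ and the inversion formula are easy to verify numerically; a small sketch (the Wigner-type test matrix and the evaluation points are arbitrary choices):

```python
import numpy as np

N = 400
G = np.random.randn(N, N)
A = (G + G.T) / np.sqrt(2 * N)        # Hermitian test matrix (Wigner-type)
lam = np.linalg.eigvalsh(A)

# m_mu(z) from the eigenvalues vs (1/N) tr (A - z I)^{-1}
z = 0.5 + 0.1j
m_def = np.mean(1 / (lam - z))
m_tr = np.trace(np.linalg.inv(A - z * np.eye(N))) / N
print(abs(m_def - m_tr))              # numerically zero

# Inverse Stieltjes transform: (1/pi) Im m_mu(x + i*eps) approximates the density
eps = 5e-2
x = np.linspace(-2.5, 2.5, 11)
dens = np.imag(np.mean(1 / (lam[None, :] - (x[:, None] + 1j * eps)), axis=1)) / np.pi
print(dens)                           # a smoothed eigenvalue density of A
```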

29 Basics of Random Matrix Theory/The Stieltjes Transform Method 13/153 The Stieltjes transform

Property (Stieltjes transform of Gram matrices). For $X\in\mathbb{C}^{N\times n}$, let $\mu$ be the e.s.d. of $XX^*$ and $\tilde\mu$ the e.s.d. of $X^*X$. Then
$$m_\mu(z)=\frac nN m_{\tilde\mu}(z)-\frac{N-n}{N}\frac1z.$$
Proof:
$$m_\mu(z)=\frac1N\sum_{i=1}^N\frac{1}{\lambda_i(XX^*)-z}=\frac1N\sum_{i=1}^n\frac{1}{\lambda_i(X^*X)-z}+\frac1N(N-n)\frac{1}{0-z}.$$

31 Basics of Random Matrix Theory/The Stieltjes Transform Method 14/153 The Stieltjes transform

Three fundamental lemmas in all proofs.

Lemma (Resolvent Identity). For $A,B\in\mathbb{C}^{N\times N}$ invertible,
$$A^{-1}-B^{-1}=A^{-1}(B-A)B^{-1}.$$

Corollary. For $t\in\mathbb{C}$, $x\in\mathbb{C}^N$, $A\in\mathbb{C}^{N\times N}$, with $A$ and $A+txx^*$ invertible,
$$(A+txx^*)^{-1}x=\frac{A^{-1}x}{1+tx^*A^{-1}x}.$$

33 Basics of Random Matrix Theory/The Stieltjes Transform Method 15/153 The Stieltjes transform

Lemma (Rank-one perturbation). For $A,B\in\mathbb{C}^{N\times N}$ Hermitian nonnegative definite, $\mu$ the e.s.d. of $A$, $t>0$, $x\in\mathbb{C}^N$, $z\in\mathbb{C}\setminus\mathrm{supp}(\mu)$,
$$\left|\frac1N\operatorname{tr}B(A+txx^*-zI_N)^{-1}-\frac1N\operatorname{tr}B(A-zI_N)^{-1}\right|\le\frac1N\frac{\|B\|}{\operatorname{dist}(z,\mathrm{supp}(\mu))}.$$
In particular, as $N\to\infty$, if $\limsup_N\|B\|<\infty$,
$$\frac1N\operatorname{tr}B(A+txx^*-zI_N)^{-1}-\frac1N\operatorname{tr}B(A-zI_N)^{-1}\to0.$$

35 Basics of Random Matrix Theory/The Stieltjes Transform Method 16/153 The Stieltjes transform

Lemma (Trace Lemma). For
- $x\in\mathbb{C}^N$ with i.i.d. entries of zero mean, unit variance, and finite $2p$-th order moment,
- $A\in\mathbb{C}^{N\times N}$ deterministic (or independent of $x$),
then
$$E\left[\left|\frac1N x^*Ax-\frac1N\operatorname{tr}A\right|^p\right]\le K\frac{\|A\|^p}{N^{p/2}}.$$
In particular, if $\limsup_N\|A\|<\infty$ and $x$ has entries with finite eighth-order moment,
$$\frac1N x^*Ax-\frac1N\operatorname{tr}A\xrightarrow{a.s.}0$$
(by the Markov inequality and the Borel–Cantelli lemma).
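A quick numerical illustration of this concentration, with a bounded-norm matrix independent of $x$ (sizes are arbitrary):

```python
import numpy as np

for N in [100, 1000, 10000]:
    A = np.diag(1 + np.random.rand(N))   # bounded-norm matrix, independent of x
    x = np.random.randn(N)               # i.i.d. zero mean, unit variance entries
    quad = x @ A @ x / N                 # (1/N) x^T A x
    tr = np.trace(A) / N                 # (1/N) tr A
    print(N, abs(quad - tr))             # decays roughly like N^{-1/2}
```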

37 Basics of Random Matrix Theory/The Stieltjes Transform Method 17/153 Proof of the Marčenko–Pastur law

Theorem (Marčenko–Pastur Law [Marčenko, Pastur 67]). Let $X_N\in\mathbb{C}^{N\times n}$ have i.i.d. zero mean, unit variance entries. As $N,n\to\infty$ with $N/n\to c\in(0,\infty)$, the e.s.d. $\mu_N$ of $\frac1n X_NX_N^*$ satisfies $\mu_N\xrightarrow{a.s.}\mu_c$ weakly, where
- $\mu_c(\{0\})=\max\{0,1-c^{-1}\}$,
- on $(0,\infty)$, $\mu_c$ has continuous density $f_c$ supported on $[(1-\sqrt c)^2,(1+\sqrt c)^2]$,
$$f_c(x)=\frac{1}{2\pi cx}\sqrt{(x-(1-\sqrt c)^2)((1+\sqrt c)^2-x)}.$$

39 Basics of Random Matrix Theory/The Stieltjes Transform Method 18/153 Proof of the Marčenko–Pastur law

Stieltjes transform approach.

Proof. With $\mu_N$ the e.s.d. of $\frac1n X_NX_N^*$,
$$m_{\mu_N}(z)=\frac1N\operatorname{tr}\left(\frac1n X_NX_N^*-zI_N\right)^{-1}=\frac1N\sum_{i=1}^N\left[\left(\frac1n X_NX_N^*-zI_N\right)^{-1}\right]_{ii}.$$
Write
$$X_N=\begin{bmatrix}y^*\\ Y_{N-1}\end{bmatrix}\in\mathbb{C}^{N\times n},\qquad y\in\mathbb{C}^n,\ Y_{N-1}\in\mathbb{C}^{(N-1)\times n}$$
so that, for $\Im[z]>0$,
$$\frac1n X_NX_N^*-zI_N=\begin{pmatrix}\frac1n y^*y-z & \frac1n y^*Y_{N-1}^*\\ \frac1n Y_{N-1}y & \frac1n Y_{N-1}Y_{N-1}^*-zI_{N-1}\end{pmatrix}.$$

43 Basics of Random Matrix Theory/The Stieltjes Transform Method 19/153 Proof of the Marčenko–Pastur law

Proof (continued). From the block matrix inverse formula
$$\begin{pmatrix}A&B\\C&D\end{pmatrix}^{-1}=\begin{pmatrix}(A-BD^{-1}C)^{-1} & -A^{-1}B(D-CA^{-1}B)^{-1}\\ -(D-CA^{-1}B)^{-1}CA^{-1} & (D-CA^{-1}B)^{-1}\end{pmatrix}$$
we have
$$\left[\left(\frac1n X_NX_N^*-zI_N\right)^{-1}\right]_{11}=\frac{1}{-z-z\frac1n y^*\left(\frac1n Y_{N-1}^*Y_{N-1}-zI_n\right)^{-1}y}.$$
By the Trace Lemma, as $N,n\to\infty$,
$$\left[\left(\frac1n X_NX_N^*-zI_N\right)^{-1}\right]_{11}-\frac{1}{-z-z\frac1n\operatorname{tr}\left(\frac1n Y_{N-1}^*Y_{N-1}-zI_n\right)^{-1}}\xrightarrow{a.s.}0.$$

45 Basics of Random Matrix Theory/The Stieltjes Transform Method 20/153 Proof of the Marčenko–Pastur law

Proof (continued). By the Rank-1 Perturbation Lemma ($X_N^*X_N=Y_{N-1}^*Y_{N-1}+yy^*$), as $N,n\to\infty$,
$$\left[\left(\frac1n X_NX_N^*-zI_N\right)^{-1}\right]_{11}-\frac{1}{-z-z\frac1n\operatorname{tr}\left(\frac1n X_N^*X_N-zI_n\right)^{-1}}\xrightarrow{a.s.}0.$$
Since
$$\frac1n\operatorname{tr}\left(\frac1n X_N^*X_N-zI_n\right)^{-1}=\frac1n\operatorname{tr}\left(\frac1n X_NX_N^*-zI_N\right)^{-1}-\frac{n-N}{n}\frac1z,$$
$$\left[\left(\frac1n X_NX_N^*-zI_N\right)^{-1}\right]_{11}-\frac{1}{1-\frac Nn-z-z\frac1n\operatorname{tr}\left(\frac1n X_NX_N^*-zI_N\right)^{-1}}\xrightarrow{a.s.}0.$$
Repeating for entries $(2,2),\dots,(N,N)$, and averaging, we get (for $\Im[z]>0$)
$$m_{\mu_N}(z)-\frac{1}{1-\frac Nn-z-z\frac Nn m_{\mu_N}(z)}\xrightarrow{a.s.}0.$$

48 Basics of Random Matrix Theory/The Stieltjes Transform Method 21/153 Proof of the Marčenko–Pastur law

Proof (continued). Then $m_{\mu_N}(z)\xrightarrow{a.s.}m(z)$, solution to
$$m(z)=\frac{1}{1-c-z-czm(z)}$$
i.e. (with the branch of the square root such that $m(z)\to0$ as $|z|\to\infty$),
$$m(z)=\frac{1-c}{2cz}-\frac{1}{2c}+\frac{\sqrt{(z-(1-\sqrt c)^2)(z-(1+\sqrt c)^2)}}{2cz}.$$
Finally, by the inverse Stieltjes transform, for $x>0$,
$$\lim_{\varepsilon\downarrow0}\frac1\pi\Im[m(x+\imath\varepsilon)]=\frac{1}{2\pi cx}\sqrt{((1+\sqrt c)^2-x)(x-(1-\sqrt c)^2)}\,1_{\{x\in[(1-\sqrt c)^2,(1+\sqrt c)^2]\}}.$$
And for $x=0$, $\lim_{\varepsilon\downarrow0}\varepsilon\,\Im[m(\imath\varepsilon)]=\left(1-c^{-1}\right)1_{\{c>1\}}$.
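The closed form can be checked against the resolvent trace of a single large realization; a minimal sketch (the test point $z$ and the dimensions are arbitrary), picking the quadratic root with positive imaginary part as the Stieltjes branch:

```python
import numpy as np

N, n = 600, 1200
c = N / n
X = np.random.randn(N, n)
z = 1.5 + 0.05j                                # test point, Im[z] > 0

# Empirical Stieltjes transform (1/N) tr ((1/n) X X^T - z I)^{-1}
m_emp = np.trace(np.linalg.inv(X @ X.T / n - z * np.eye(N))) / N

# Closed form: root of  c z m^2 - (1 - c - z) m + 1 = 0  with Im[m] > 0
roots = np.roots([c * z, -(1 - c - z), 1])
m_th = roots[np.argmax(roots.imag)]
print(abs(m_emp - m_th))                       # small (finite-N fluctuation)
```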

51 Basics of Random Matrix Theory/The Stieltjes Transform Method 22/153 Sample Covariance Matrices

Theorem (Sample Covariance Matrix Model [Silverstein, Bai 95]). Let $Y_N=C_N^{\frac12}X_N\in\mathbb{C}^{N\times n}$, with
- $C_N\in\mathbb{C}^{N\times N}$ nonnegative definite with e.s.d. $\nu_N\to\nu$ weakly,
- $X_N\in\mathbb{C}^{N\times n}$ with i.i.d. entries of zero mean and unit variance.
As $N,n\to\infty$, $N/n\to c\in(0,\infty)$, the e.s.d. $\mu_N$ of $\frac1n Y_N^*Y_N\in\mathbb{C}^{n\times n}$ satisfies $\mu_N\xrightarrow{a.s.}\mu$ weakly, with $m_\mu(z)$, $\Im[z]>0$, the unique solution with $\Im[m_\mu(z)]>0$ of
$$m_\mu(z)=\left(-z+c\int\frac{t}{1+tm_\mu(z)}\nu(dt)\right)^{-1}.$$
Moreover, $\mu$ is continuous on $\mathbb{R}^+$ and real analytic wherever its density is positive.

Immediate corollary: for $\bar\mu_N$ the e.s.d. of $\frac1n Y_NY_N^*=\frac1n\sum_{i=1}^n C_N^{\frac12}x_ix_i^*C_N^{\frac12}$,
$$\bar\mu_N\xrightarrow{a.s.}\bar\mu$$
weakly, with $\mu=c\bar\mu+(1-c)\delta_0$.
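For a given $\nu$, the fixed-point equation is easily solved by simple iteration at any $z$ with $\Im[z]>0$; a minimal sketch (masses $1,3,7$ as in the figure below; dimensions, test point and iteration count are arbitrary) comparing the solution with the empirical resolvent trace of $\frac1n Y_N^*Y_N$:

```python
import numpy as np

N, n = 300, 3000
c = N / n
t = np.repeat([1.0, 3.0, 7.0], N // 3)             # eigenvalues of C_N (measure nu)
Y = np.sqrt(t)[:, None] * np.random.randn(N, n)     # Y_N = C_N^{1/2} X_N
z = 2.0 + 0.01j

# Fixed-point iteration for  m(z) = ( -z + c * int t/(1 + t m(z)) nu(dt) )^{-1}
m = -1 / z
for _ in range(500):
    m = 1 / (-z + c * np.mean(t / (1 + t * m)))

# Empirical counterpart: (1/n) tr ((1/n) Y^T Y - z I_n)^{-1}
m_emp = np.trace(np.linalg.inv(Y.T @ Y / n - z * np.eye(n))) / n
print(abs(m - m_emp))                               # small for large N, n
```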

54 Basics of Random Matrix Theory/The Stieltjes Transform Method 23/153 Sample Covariance Matrices

[Figure: Histogram of the eigenvalues of $\frac1n Y_N^*Y_N$, $n=3000$, $N=300$, with $C_N$ diagonal with evenly weighted masses at (i) $1,3,7$, (ii) $1,3,\dots$, against the limiting density $f$.]

55 Basics of Random Matrix Theory/The Stieltjes Transform Method 24/153 Further Models and Deterministic Equivalents

Theorem (Doubly-correlated i.i.d. matrices). Let $B_N=C_N^{\frac12}X_NT_NX_N^*C_N^{\frac12}$, with e.s.d. $\mu_N$, where
- $X_N\in\mathbb{C}^{N\times n}$ has i.i.d. entries of zero mean and variance $1/n$,
- $C_N$ is Hermitian nonnegative definite, $T_N$ diagonal nonnegative,
- $\limsup_N\max(\|C_N\|,\|T_N\|)<\infty$.
Denote $c=N/n$. Then, as $N,n\to\infty$ with bounded ratio $c$, for $z\in\mathbb{C}\setminus\mathbb{R}_+$,
$$m_{\mu_N}(z)-m_N(z)\xrightarrow{a.s.}0,\qquad m_N(z)=\frac1N\operatorname{tr}\left(-zI_N+\bar e_N(z)C_N\right)^{-1}$$
with $\bar e_N(z)$ the unique solution in $\{z\in\mathbb{C}^+,\ \bar e_N(z)\in\mathbb{C}^+\}$ or $\{z\in\mathbb{R}_-,\ \bar e_N(z)\in\mathbb{R}_+\}$ of
$$e_N(z)=\frac1N\operatorname{tr}C_N\left(-zI_N+\bar e_N(z)C_N\right)^{-1},\qquad \bar e_N(z)=\frac1n\operatorname{tr}T_N\left(I_n+ce_N(z)T_N\right)^{-1}.$$

56 Basics of Random Matrix Theory/The Stieltjes Transform Method 25/153 Other Refined Sample Covariance Models

Side note on other models. Similar results hold for multiple matrix models:
- Information-plus-noise: $Y_N=A_N+X_N$, $A_N$ deterministic
- Variance profile: $Y_N=P_N\odot X_N$ (entry-wise product)
- Per-column covariance: $Y_N=[y_1,\dots,y_n]$, $y_i=C_{N,i}^{\frac12}x_i$
- etc.

58 Basics of Random Matrix Theory/Spiked Models 26/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 26 / 153

59 Basics of Random Matrix Theory/Spiked Models 27/153 No Eigenvalue Outside the Support

Theorem (No Eigenvalue Outside the Support [Silverstein, Bai 98]). Let $Y_N=C_N^{\frac12}X_N\in\mathbb{C}^{N\times n}$, with
- $C_N\in\mathbb{C}^{N\times N}$ nonnegative definite with e.s.d. $\nu_N\to\nu$ weakly,
- $E[|X_N|_{ij}^4]<\infty$,
- $X_N\in\mathbb{C}^{N\times n}$ with i.i.d. entries of zero mean and unit variance,
- $\max_i\operatorname{dist}(\lambda_i(C_N),\mathrm{supp}(\nu))\to0$.
Let $\bar\mu$ be the limiting e.s.d. of $\frac1n Y_NY_N^*$ as before. Let $[a,b]\subset\mathbb{R}\setminus\mathrm{supp}(\bar\mu)$. Then, for all large $n$, almost surely,
$$\left\{\lambda_i\left(\tfrac1n Y_NY_N^*\right)\right\}_{i=1}^N\cap[a,b]=\emptyset.$$

In practice: this means that the eigenvalues of $\frac1n Y_NY_N^*$ cannot be found at macroscopic distance from the bulk, for $N,n$ large.

65 Basics of Random Matrix Theory/Spiked Models 28/153 Spiked Models

Breaking the rules. If we break Rule 1 ($E[|X_{ij}|^4]<\infty$): infinitely many eigenvalues may wander away from $\mathrm{supp}(\mu)$.

[Figure: eigenvalues $\{\lambda_i\}_{i=1}^n$ versus $\mu$, with $E[|X_{ij}|^4]<\infty$ (left) and $E[|X_{ij}|^4]=\infty$ (right).]

66 Basics of Random Matrix Theory/Spiked Models 29/153 Spiked Models

If we break Rule 2: $C_N$ may create isolated eigenvalues of $\frac1n Y_NY_N^*$, called spikes.

[Figure: Eigenvalues $\{\lambda_i\}_{i=1}^n$ versus $\mu$, with isolated eigenvalues near $1+\omega_1+c\frac{1+\omega_1}{\omega_1}$ and $1+\omega_2+c\frac{1+\omega_2}{\omega_2}$; $C_N=\operatorname{diag}(\underbrace{1,\dots,1}_{N-4},2,2,3,3)$, $N=500$.]

67 Basics of Random Matrix Theory/Spiked Models 30/153 Spiked Models

Theorem (Eigenvalues [Baik, Silverstein 06]). Let $Y_N=C_N^{\frac12}X_N$, with
- $X_N$ with i.i.d. zero mean, unit variance entries, $E[|X_N|_{ij}^4]<\infty$,
- $C_N=I_N+P$, $P=U\Omega U^*$, where, for $K$ fixed, $\Omega=\operatorname{diag}(\omega_1,\dots,\omega_K)\in\mathbb{R}^{K\times K}$ with $\omega_1\ge\dots\ge\omega_K>0$.
Then, as $N,n\to\infty$, $N/n\to c\in(0,\infty)$, denoting $\lambda_m=\lambda_m(\frac1n Y_NY_N^*)$ (ordered decreasingly),
- if $\omega_m>\sqrt c$, $\lambda_m\xrightarrow{a.s.}1+\omega_m+c\frac{1+\omega_m}{\omega_m}>(1+\sqrt c)^2$,
- if $\omega_m\in(0,\sqrt c]$, $\lambda_m\xrightarrow{a.s.}(1+\sqrt c)^2$.
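A minimal simulation of this phase transition (the spike values and dimensions are arbitrary choices):

```python
import numpy as np

N, n = 400, 1200
c = N / n                                     # detectability threshold: sqrt(c) ~ 0.577
for omega in [0.2, 0.5, 1.0, 2.0]:            # population spikes, below and above sqrt(c)
    C_sqrt = np.ones(N)
    C_sqrt[0] = np.sqrt(1 + omega)            # C_N = I_N + omega e_1 e_1^T
    Y = C_sqrt[:, None] * np.random.randn(N, n)
    lam1 = np.linalg.eigvalsh(Y @ Y.T / n)[-1]
    if omega > np.sqrt(c):
        limit = 1 + omega + c * (1 + omega) / omega
    else:
        limit = (1 + np.sqrt(c))**2           # the spike is swallowed by the bulk edge
    print(f"omega = {omega}: lambda_1 = {lam1:.3f}, predicted limit = {limit:.3f}")
```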

70 Basics of Random Matrix Theory/Spiked Models 31/153 Spiked Models

Proof. Two ingredients: algebraic calculus + trace lemma.

Find eigenvalues away from the eigenvalues of $\frac1n X_NX_N^*$:
$$0=\det\left(\frac1n Y_NY_N^*-\lambda I_N\right)=\det(C_N)\det\left(\frac1n X_NX_N^*-\lambda C_N^{-1}\right)$$
$$=\det(C_N)\det\left(\frac1n X_NX_N^*-\lambda I_N+\lambda(I_N-C_N^{-1})\right)$$
$$=\det(C_N)\det\left(\frac1n X_NX_N^*-\lambda I_N\right)\det\left(I_N+\lambda(I_N-C_N^{-1})\left(\frac1n X_NX_N^*-\lambda I_N\right)^{-1}\right).$$
Use the low rank property:
$$I_N-C_N^{-1}=I_N-(I_N+U\Omega U^*)^{-1}=U(I_K+\Omega^{-1})^{-1}U^*,\qquad\Omega\in\mathbb{C}^{K\times K}.$$
Hence
$$0=\det\left(\frac1n X_NX_N^*-\lambda I_N\right)\det\left(I_N+\lambda U(I_K+\Omega^{-1})^{-1}U^*\left(\frac1n X_NX_N^*-\lambda I_N\right)^{-1}\right).$$

74 Basics of Random Matrix Theory/Spiked Models 32/153 Spiked Models

Proof (2). By Sylvester's identity ($\det(I+AB)=\det(I+BA)$),
$$0=\det\left(\frac1n X_NX_N^*-\lambda I_N\right)\det\left(I_K+\lambda(I_K+\Omega^{-1})^{-1}U^*\left(\frac1n X_NX_N^*-\lambda I_N\right)^{-1}U\right).$$
No eigenvalue outside the support [Bai, Sil 98]: $\det(\frac1n X_NX_N^*-\lambda I_N)$ has no zero beyond $(1+\sqrt c)^2$ for all large $n$ a.s.

Extension of the Trace Lemma: for each $z\in\mathbb{C}\setminus\mathrm{supp}(\mu)$,
$$U^*\left(\frac1n X_NX_N^*-zI_N\right)^{-1}U\xrightarrow{a.s.}m_\mu(z)I_K.$$
($X_N$ being almost unitarily invariant, $U$ can be seen as formed of random i.i.d.-like vectors.)

As a result, for all large $n$ a.s. (with $\omega_1>\dots>\omega_M$ the distinct spikes, of multiplicities $k_m$),
$$0=\det\left(I_K+\lambda(I_K+\Omega^{-1})^{-1}U^*\left(\frac1n X_NX_N^*-\lambda I_N\right)^{-1}U\right)\simeq\prod_{m=1}^M\left(1+\frac{\lambda}{1+\omega_m^{-1}}m_\mu(\lambda)\right)^{k_m}=\prod_{m=1}^M\left(1+\frac{\lambda\omega_m}{1+\omega_m}m_\mu(\lambda)\right)^{k_m}.$$

78 Basics of Random Matrix Theory/Spiked Models 33/153 Spiked Models

Proof (3). Limiting solutions: zeros (with multiplicity) of
$$1+\frac{\lambda\omega_m}{1+\omega_m}m_\mu(\lambda)=0.$$
Using the Marčenko–Pastur law properties ($m_\mu(z)=(1-c-z-czm_\mu(z))^{-1}$),
$$\lambda\in\left\{1+\omega_m+c\frac{1+\omega_m}{\omega_m}\right\}_{m=1}^M.$$

80 Basics of Random Matrix Theory/Spiked Models 34/153 Spiked Models

Theorem (Eigenvectors [Paul 07]). Let $Y_N=C_N^{\frac12}X_N$, with
- $X_N$ with i.i.d. zero mean, unit variance, finite fourth-order moment entries,
- $C_N=I_N+P$, $P=\sum_{i=1}^K\omega_iu_iu_i^*$, $\omega_1>\dots>\omega_K>0$.
Then, as $N,n\to\infty$, $N/n\to c\in(0,\infty)$, for $a,b\in\mathbb{C}^N$ deterministic and $\hat u_i$ the eigenvector associated with $\lambda_i(\frac1n Y_NY_N^*)$,
$$a^*\hat u_i\hat u_i^*b-\frac{1-c\omega_i^{-2}}{1+c\omega_i^{-1}}a^*u_iu_i^*b\,1_{\{\omega_i>\sqrt c\}}\xrightarrow{a.s.}0.$$
In particular,
$$|\hat u_i^*u_i|^2\xrightarrow{a.s.}\frac{1-c\omega_i^{-2}}{1+c\omega_i^{-1}}1_{\{\omega_i>\sqrt c\}}.$$
Proof: based on a Cauchy integral + similar ingredients as the eigenvalue proof:
$$a^*\hat u_i\hat u_i^*b=-\frac{1}{2\pi\imath}\oint_{\mathcal{C}_i}a^*\left(\frac1n Y_NY_N^*-zI_N\right)^{-1}b\,dz$$
for $\mathcal{C}_i$ a contour circling around $\lambda_i$ only.
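The eigenvector alignment can be checked the same way, taking $u_1=e_1$ (a sketch with arbitrary parameters):

```python
import numpy as np

N, n = 600, 1800
c = N / n
omega = 1.5                                    # spike above the sqrt(c) threshold
C_sqrt = np.ones(N)
C_sqrt[0] = np.sqrt(1 + omega)                 # C_N = I_N + omega u_1 u_1^T with u_1 = e_1
Y = C_sqrt[:, None] * np.random.randn(N, n)
vals, vecs = np.linalg.eigh(Y @ Y.T / n)
u1_hat = vecs[:, -1]                           # eigenvector of the largest eigenvalue
alignment = u1_hat[0]**2                       # |u1_hat^* u_1|^2 since u_1 = e_1
limit = (1 - c / omega**2) / (1 + c / omega)
print(alignment, limit)                        # close for large N
```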

83 Basics of Random Matrix Theory/Spiked Models 35/153 Spiked Models

[Figure: Simulated versus limiting $|\hat u_1^*u_1|^2$ for $Y_N=C_N^{\frac12}X_N$, $C_N=I_N+\omega_1u_1u_1^*$, $N/n=1/3$, varying the population spike $\omega_1$; curves for $N=100$, $N=200$, $N=\dots$, against the limit $\frac{1-c/\omega_1^2}{1+c/\omega_1}$.]

84 Basics of Random Matrix Theory/Spiked Models 36/153 Tracy–Widom Theorem

Theorem (Phase Transition [Baik, Ben Arous, Péché 05]). Let $Y_N=C_N^{\frac12}X_N$, with
- $X_N$ with i.i.d. complex Gaussian zero mean, unit variance entries,
- $C_N=I_N+P$, $P=\sum_{i=1}^K\omega_iu_iu_i^*$, $\omega_1>\dots>\omega_K>0$ ($K\ge0$).
Then, as $N,n\to\infty$, $N/n\to c<1$,
- If $\omega_1<\sqrt c$ (or $K=0$),
$$N^{\frac23}\frac{\lambda_1-(1+\sqrt c)^2}{(1+\sqrt c)^{\frac43}c^{\frac12}}\xrightarrow{\mathcal L}T_2\quad\text{(complex Tracy–Widom law)}.$$
- If $\omega_1>\sqrt c$,
$$N^{\frac12}\left((1+\omega_1)^2-c\frac{(1+\omega_1)^2}{\omega_1^2}\right)^{-\frac12}\left[\lambda_1-\left(1+\omega_1+c\frac{1+\omega_1}{\omega_1}\right)\right]\xrightarrow{\mathcal L}\mathcal N(0,1).$$

87 Basics of Random Matrix Theory/Spiked Models 37/153 Tracy–Widom Theorem

[Figure: Distribution of the centered-scaled largest eigenvalue $N^{\frac23}c^{-\frac12}(1+\sqrt c)^{-\frac43}\left[\lambda_1(\frac1n X_NX_N^*)-(1+\sqrt c)^2\right]$ versus the Tracy–Widom law ($T_2$), $N=500$.]

88 Basics of Random Matrix Theory/Spiked Models 38/153 Other Spiked Models

Similar results hold for multiple matrix models:
- Additive spiked model: $Y_N=\frac1n XX^*+P$, $P$ deterministic and low rank
- $Y_N=\frac1n X^*(I+P)X$
- $Y_N=\frac1n(X+P)^*(X+P)$
- $Y_N=\frac1n TX^*(I+P)XT$
- etc.

89 Basics of Random Matrix Theory/Other Common Random Matrix Models 39/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 39 / 153

90 Basics of Random Matrix Theory/Other Common Random Matrix Models 40/153 The Semi-circle law

Theorem. Let $X_N\in\mathbb{C}^{N\times N}$ be Hermitian with e.s.d. $\mu_N$, such that the entries $[\sqrt N X_N]_{i>j}$ are i.i.d. with zero mean and unit variance. Then, as $N\to\infty$, $\mu_N\xrightarrow{a.s.}\mu$ with
$$\mu(dt)=\frac{1}{2\pi}\sqrt{(4-t^2)^+}\,dt.$$
In particular, $m_\mu$ satisfies
$$m_\mu(z)=\frac{1}{-z-m_\mu(z)}.$$

91 Basics of Random Matrix Theory/Other Common Random Matrix Models 41/153 The Semi-circle law

[Figure: Histogram of the eigenvalues of a Wigner matrix against the semi-circle law.]
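A sketch reproducing the figure numerically (the dimension is an arbitrary choice):

```python
import numpy as np

N = 1000
G = np.random.randn(N, N)
X = (G + G.T) / np.sqrt(2 * N)        # Hermitian, off-diagonal entries of variance 1/N
eig = np.linalg.eigvalsh(X)

x = np.linspace(-2, 2, 201)
semicircle = np.sqrt(np.maximum(4 - x**2, 0)) / (2 * np.pi)
hist, edges = np.histogram(eig, bins=40, range=(-2.2, 2.2), density=True)
# 'hist' matches 'semicircle' up to sampling noise
print("support of the spectrum:", eig.min(), eig.max())   # close to [-2, 2]
```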

92 Basics of Random Matrix Theory/Other Common Random Matrix Models 42/153 The Circular law

Theorem. Let $X_N\in\mathbb{C}^{N\times N}$ with e.s.d. $\mu_N$ be such that the entries $[\sqrt N X_N]_{ij}$ are i.i.d. with zero mean and unit variance. Then, as $N\to\infty$, $\mu_N\xrightarrow{a.s.}\mu$ with $\mu$ the uniform measure on the complex unit disk,
$$\mu(dz)=\frac{1}{\pi}1_{\{|z|\le1\}}\,dz.$$

93 Basics of Random Matrix Theory/Other Common Random Matrix Models 43/153 The Circular law

[Figure: Eigenvalues (real and imaginary parts) of $X_N$ with i.i.d. standard Gaussian entries, against the circular law.]
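A quick numerical check of the circular law, using the fact that the uniform law on the unit disk puts mass $r^2$ inside radius $r$ (the dimension is arbitrary):

```python
import numpy as np

N = 500
X = np.random.randn(N, N) / np.sqrt(N)     # i.i.d. entries of variance 1/N
eig = np.linalg.eigvals(X)                 # complex (non-Hermitian) eigenvalues
r = np.abs(eig)
print("fraction with |z| <= 1  :", np.mean(r <= 1.0))   # close to 1
print("fraction with |z| <= 0.5:", np.mean(r <= 0.5))   # close to 0.25
```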

94 Basics of Random Matrix Theory/Other Common Random Matrix Models 44/153 Bibliographical references: Maths Book and Tutorial References I From most accessible to least: Couillet, R., & Debbah, M. (2011). Random matrix methods for wireless communications. Cambridge University Press. Tao, T. (2012). Topics in random matrix theory (Vol. 132). Providence, RI: American Mathematical Society. Bai, Z., & Silverstein, J. W. (2010). Spectral analysis of large dimensional random matrices (Vol. 20). New York: Springer. Pastur, L. A., Shcherbina, M., & Shcherbina, M. (2011). Eigenvalue distribution of large random matrices (Vol. 171). Providence, RI: American Mathematical Society. Anderson, G. W., Guionnet, A., & Zeitouni, O. (2010). An introduction to random matrices (Vol. 118). Cambridge university press. 44 / 153

95 Applications/ 45/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 45 / 153

96 Applications/Random Matrices and Robust Estimation 46/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 46 / 153

97 Applications/Random Matrices and Robust Estimation 47/153 Context

Baseline scenario: $x_1,\dots,x_n\in\mathbb{C}^N$ (or $\mathbb{R}^N$) i.i.d. with $E[x_1]=0$, $E[x_1x_1^*]=C_N$:

If $x_1\sim\mathcal N(0,C_N)$, the ML estimator for $C_N$ is the sample covariance matrix (SCM)
$$\hat C_N=\frac1n\sum_{i=1}^n x_ix_i^*.$$

[Huber 67] If $x_1\sim(1-\varepsilon)\mathcal N(0,C_N)+\varepsilon G$, $G$ unknown, robust estimator ($n>N$)
$$\hat C_N=\frac1n\sum_{i=1}^n\max\left\{l_1,\frac{l_2}{\frac1N x_i^*\hat C_N^{-1}x_i}\right\}x_ix_i^*\quad\text{for some }l_1,l_2>0.$$

[Maronna 76] If $x_1$ is elliptical (and $n>N$), the ML estimator for $C_N$ is given by
$$\hat C_N=\frac1n\sum_{i=1}^n u\left(\frac1N x_i^*\hat C_N^{-1}x_i\right)x_ix_i^*\quad\text{for some non-increasing }u.$$

[Pascal 13; Chen 11] If $N>n$, $x_1$ elliptical or with outliers, shrinkage extensions
$$\hat C_N(\rho)=(1-\rho)\frac1n\sum_{i=1}^n\frac{x_ix_i^*}{\frac1N x_i^*\hat C_N^{-1}(\rho)x_i}+\rho I_N$$
$$\check C_N(\rho)=\frac{\check B_N(\rho)}{\frac1N\operatorname{tr}\check B_N(\rho)},\qquad\check B_N(\rho)=(1-\rho)\frac1n\sum_{i=1}^n\frac{x_ix_i^*}{\frac1N x_i^*\check C_N^{-1}(\rho)x_i}+\rho I_N.$$

102 Applications/Random Matrices and Robust Estimation 48/153 Context

Results are only known for $N$ fixed and $n\to\infty$: not appropriate in settings of interest today (BigData, array processing, MIMO).

We study such $\hat C_N$ in the regime $N,n\to\infty$, $N/n\to c\in(0,\infty)$.

Math interest:
- limiting eigenvalue distribution of $\hat C_N$
- limiting values and fluctuations of functionals $f(\hat C_N)$

Application interest:
- comparison between SCM and robust estimators
- performance of robust/non-robust estimation methods
- improvement thereof (by proper parametrization)

106 Applications/Random Matrices and Robust Estimation 49/153 Model Description

Definition (Maronna's Estimator). For $x_1,\dots,x_n\in\mathbb{C}^N$ with $n>N$, $\hat C_N$ is the solution (upon existence and uniqueness) of
$$\hat C_N=\frac1n\sum_{i=1}^n u\left(\frac1N x_i^*\hat C_N^{-1}x_i\right)x_ix_i^*$$
where $u:[0,\infty)\to(0,\infty)$ is
- non-increasing,
- such that $\phi(x)\triangleq xu(x)$ is increasing with supremum $\phi_\infty$ satisfying $1<\phi_\infty<c^{-1}$, $c\in(0,1)$.
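In practice $\hat C_N$ is computed by fixed-point iteration. Below is a minimal sketch with the illustrative (not prescribed by the slides) choice $u(t)=(1+\alpha)/(\alpha+t)$, which is non-increasing with $\phi(t)=tu(t)$ increasing to $\phi_\infty=1+\alpha$; heavy-tailed elliptical-like data are used to contrast the robust estimate with the SCM:

```python
import numpy as np

def maronna(X, alpha=0.2, n_iter=50):
    """Fixed-point iteration for Maronna's estimator with u(t) = (1+alpha)/(alpha+t).
    X is N x n (n > N), columns are the samples x_i."""
    N, n = X.shape
    C = np.eye(N)
    for _ in range(n_iter):
        Cinv = np.linalg.inv(C)
        q = np.einsum('in,ij,jn->n', X, Cinv, X) / N   # (1/N) x_i^* C^{-1} x_i
        w = (1 + alpha) / (alpha + q)                  # u applied to the quadratic forms
        C = (X * w) @ X.T / n                          # (1/n) sum_i u(q_i) x_i x_i^*
    return C

# Heavy-tailed elliptical-like data: x_i = sqrt(tau_i) w_i, w_i Gaussian
N, n = 100, 500
tau = np.random.gamma(0.5, 2.0, size=n)
X = np.sqrt(tau) * np.random.randn(N, n)
C_scm = X @ X.T / n                                    # sample covariance matrix
C_rob = maronna(X)
print("largest eigenvalue, SCM vs Maronna:",
      np.linalg.eigvalsh(C_scm)[-1], np.linalg.eigvalsh(C_rob)[-1])
```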

108 Applications/Random Matrices and Robust Estimation 50/153 The Results in a Nutshell

For various models of the $x_i$'s:

First order convergence: $\|\hat C_N-\hat S_N\|\xrightarrow{a.s.}0$ for some tractable random matrices $\hat S_N$. We only discuss this result here.

Second order results: $N^{1-\varepsilon}\left(a^*\hat C_N^kb-a^*\hat S_N^kb\right)\xrightarrow{a.s.}0$, allowing transfer of CLT results.

Applications:
- improved robust covariance matrix estimation
- improved robust tests / estimators
- specific examples in statistics at large, array processing, statistical finance, etc.

111 Applications/Random Matrices and Robust Estimation 51/153 (Elliptical) scenario

Theorem (Large dimensional behavior, elliptical case). For $x_i=\sqrt{\tau_i}w_i$, $\tau_i$ impulsive (random or not), $w_i$ unitarily invariant with $\|w_i\|=\sqrt N$,
$$\|\hat C_N-\hat S_N\|\xrightarrow{a.s.}0$$
with, for some $v$ related to $u$ ($v=u\circ g^{-1}$, $g(x)=x(1-c\phi(x))^{-1}$),
$$\hat C_N\triangleq\frac1n\sum_{i=1}^n u\left(\frac1N x_i^*\hat C_N^{-1}x_i\right)x_ix_i^*,\qquad\hat S_N\triangleq\frac1n\sum_{i=1}^n v(\tau_i\gamma_N)x_ix_i^*$$
and $\gamma_N$ the unique solution of
$$1=\frac1n\sum_{i=1}^n\frac{\gamma v(\tau_i\gamma)}{1+c\gamma v(\tau_i\gamma)}.$$

Corollaries
- Spectral measure: $\mu_N^{\hat C_N}-\mu_N^{\hat S_N}\xrightarrow{\mathcal L}0$ a.s. (with $\mu_N^X\triangleq\frac1N\sum_{i=1}^N\delta_{\lambda_i(X)}$).
- Local convergence: $\max_{1\le i\le N}|\lambda_i(\hat C_N)-\lambda_i(\hat S_N)|\xrightarrow{a.s.}0$.
- Norm boundedness: $\limsup_N\|\hat C_N\|<\infty$, i.e., bounded spectrum (unlike the SCM!).

115 Applications/Random Matrices and Robust Estimation 52/153 Large dimensional behavior

[Figure: Histograms of the eigenvalues of $\hat C_N$ and of $\hat S_N$, together with the approximate (limiting) density; $n=2500$, $N=500$, $C_N=\operatorname{diag}(I_{125},3I_{125},10I_{250})$, $\tau_i\sim\Gamma(.5,2)$ i.i.d.]

118 Applications/Random Matrices and Robust Estimation 53/153 Elements of Proof

Definition ($v$ and $\psi$). Letting $g(x)=x(1-c\phi(x))^{-1}$ (on $\mathbb{R}^+$),
- $v(x)\triangleq(u\circ g^{-1})(x)$, non-increasing,
- $\psi(x)\triangleq xv(x)$, increasing and bounded by $\psi_\infty$.

Lemma (Rewriting $\hat C_N$). It holds (with $C_N=I_N$) that
$$\hat C_N=\frac1n\sum_{i=1}^n\tau_iv(\tau_id_i)w_iw_i^*$$
with $(d_1,\dots,d_n)\in\mathbb{R}_+^n$ the a.s. unique solution to
$$d_i=\frac1N w_i^*\hat C_{(i)}^{-1}w_i=\frac1N w_i^*\left(\frac1n\sum_{j\ne i}\tau_jv(\tau_jd_j)w_jw_j^*\right)^{-1}w_i,\qquad i=1,\dots,n.$$

120 Applications/Random Matrices and Robust Estimation 54/153 Elements of Proof

Remark (Quadratic Form close to Trace). Random matrix insight: $\left(\frac1n\sum_{j\ne i}\tau_jv(\tau_jd_j)w_jw_j^*\right)^{-1}$ is almost independent of $w_i$, so
$$d_i=\frac1N w_i^*\left(\frac1n\sum_{j\ne i}\tau_jv(\tau_jd_j)w_jw_j^*\right)^{-1}w_i\simeq\frac1N\operatorname{tr}\left(\frac1n\sum_{j\ne i}\tau_jv(\tau_jd_j)w_jw_j^*\right)^{-1}\simeq\gamma_N$$
for some deterministic sequence $(\gamma_N)_{N=1}^\infty$, irrespective of $i$.

Lemma (Key Lemma). Letting $e_i\triangleq\frac{v(\tau_id_i)}{v(\tau_i\gamma_N)}$ with $\gamma_N$ the unique solution to
$$1=\frac1n\sum_{i=1}^n\frac{\psi(\tau_i\gamma_N)}{1+c\psi(\tau_i\gamma_N)}$$
we have
$$\max_{1\le i\le n}|e_i-1|\xrightarrow{a.s.}0.$$

123 Applications/Random Matrices and Robust Estimation 55/153 Proof of the Key Lemma: $\max_i|e_i-1|\xrightarrow{a.s.}0$, $e_i=\frac{v(\tau_id_i)}{v(\tau_i\gamma_N)}$

Property (Quadratic form and $\gamma_N$).
$$\max_{1\le i\le n}\left|\frac1N w_i^*\left(\frac1n\sum_{j\ne i}\tau_jv(\tau_j\gamma_N)w_jw_j^*\right)^{-1}w_i-\gamma_N\right|\xrightarrow{a.s.}0.$$
Proof of the Property. Uniformity is easy (moments of all orders for $[w_i]_j$). By a quadratic-form-close-to-the-trace approach, we get
$$\max_{1\le i\le n}\left|\frac1N w_i^*\left(\frac1n\sum_{j\ne i}\tau_jv(\tau_j\gamma_N)w_jw_j^*\right)^{-1}w_i-m(0)\right|\xrightarrow{a.s.}0$$
with $m(0)$ the unique positive solution to [MarPas 67; BaiSil 95]
$$m(0)=\left(\frac1n\sum_{i=1}^n\frac{\tau_iv(\tau_i\gamma_N)}{1+c\tau_iv(\tau_i\gamma_N)m(0)}\right)^{-1}.$$
$\gamma_N$ precisely solves this equation, thus $m(0)=\gamma_N$.

125 Applications/Random Matrices and Robust Estimation 56/153 Proof of the Key Lemma: $\max_i|e_i-1|\xrightarrow{a.s.}0$, $e_i=\frac{v(\tau_id_i)}{v(\tau_i\gamma_N)}$

Substitution Trick (case $\tau_i\in[a,b]\subset(0,\infty)$). Up to relabelling $e_1\le\dots\le e_n$, use
$$v(\tau_n\gamma_N)e_n=v(\tau_nd_n)=v\Bigg(\frac{\tau_n}{N}w_n^*\Big(\frac1n\sum_{i<n}\underbrace{\tau_iv(\tau_id_i)}_{=\tau_iv(\tau_i\gamma_N)e_i}w_iw_i^*\Big)^{-1}w_n\Bigg)\le v\Bigg(\frac{\tau_ne_n^{-1}}{N}w_n^*\Big(\frac1n\sum_{i<n}\tau_iv(\tau_i\gamma_N)w_iw_i^*\Big)^{-1}w_n\Bigg)\le v\left(\tau_ne_n^{-1}(\gamma_N-\varepsilon_n)\right)\ \text{a.s.},\quad\varepsilon_n\to0\ \text{(slowly)}.$$
Use the properties of $\psi$ to get
$$\psi(\tau_n\gamma_N)\le\psi\left(\tau_ne_n^{-1}\gamma_N\right)\left(1-\varepsilon_n\gamma_N^{-1}\right)^{-1}.$$
Conclusion: if $e_n>1+\ell$ infinitely often, as $\tau_n\in[a,b]$, on a subsequence $\{\tau_n\to\tau_0>0,\ \gamma_N\to\gamma_0>0\}$,
$$\psi(\tau_0\gamma_0)\le\psi\left(\frac{\tau_0\gamma_0}{1+\ell}\right),$$
a contradiction.

128 Applications/Random Matrices and Robust Estimation 57/153 Outlier Data

Theorem (Outlier Rejection). Observation set $X=\left[x_1,\dots,x_{(1-\varepsilon_n)n},a_1,\dots,a_{\varepsilon_nn}\right]$ where $x_i\sim\mathcal{CN}(0,C_N)$ and $a_1,\dots,a_{\varepsilon_nn}\in\mathbb{C}^N$ are deterministic outliers. Then, $\|\hat C_N-\hat S_N\|\xrightarrow{a.s.}0$ where
$$\hat S_N\triangleq v(\gamma_N)\frac1n\sum_{i=1}^{(1-\varepsilon_n)n}x_ix_i^*+\frac1n\sum_{i=1}^{\varepsilon_nn}v(\alpha_{i,n})a_ia_i^*$$
with $\gamma_N$ and $\alpha_{1,n},\dots,\alpha_{\varepsilon_nn,n}$ the unique positive solutions to
$$\gamma_N=\frac1N\operatorname{tr}C_N\left(\frac{(1-\varepsilon)v(\gamma_N)}{1+cv(\gamma_N)\gamma_N}C_N+\frac1n\sum_{i=1}^{\varepsilon_nn}v(\alpha_{i,n})a_ia_i^*\right)^{-1}$$
$$\alpha_{i,n}=\frac1N a_i^*\left(\frac{(1-\varepsilon)v(\gamma_N)}{1+cv(\gamma_N)\gamma_N}C_N+\frac1n\sum_{j\ne i}v(\alpha_{j,n})a_ja_j^*\right)^{-1}a_i,\qquad i=1,\dots,\varepsilon_nn.$$

129 Applications/Random Matrices and Robust Estimation 58/153 Outlier Data

For $\varepsilon_nn=1$,
$$\hat S_N=v\left(\frac{\phi^{-1}(1)}{1-c}\right)\frac1n\sum_{i=1}^{n-1}x_ix_i^*+\left(v\left(\frac{\phi^{-1}(1)}{1-c}\,\frac1N a_1^*C_N^{-1}a_1\right)+o(1)\right)a_1a_1^*.$$
Outlier rejection relies on $\frac1N a_1^*C_N^{-1}a_1\gtrless1$.

For $a_i\sim\mathcal{CN}(0,D_N)$, $\varepsilon_n\to\varepsilon>0$,
$$\hat S_N=v(\gamma_n)\frac1n\sum_{i=1}^{(1-\varepsilon_n)n}x_ix_i^*+v(\alpha_n)\frac1n\sum_{i=1}^{\varepsilon_nn}a_ia_i^*$$
$$\gamma_n=\frac1N\operatorname{tr}C_N\left(\frac{(1-\varepsilon)v(\gamma_n)}{1+cv(\gamma_n)\gamma_n}C_N+\frac{\varepsilon v(\alpha_n)}{1+cv(\alpha_n)\alpha_n}D_N\right)^{-1}$$
$$\alpha_n=\frac1N\operatorname{tr}D_N\left(\frac{(1-\varepsilon)v(\gamma_n)}{1+cv(\gamma_n)\gamma_n}C_N+\frac{\varepsilon v(\alpha_n)}{1+cv(\alpha_n)\alpha_n}D_N\right)^{-1}.$$

For $\varepsilon_n\to0$,
$$\hat S_N=v\left(\frac{\phi^{-1}(1)}{1-c}\right)\frac1n\sum_{i=1}^{(1-\varepsilon_n)n}x_ix_i^*+\frac1n\sum_{i=1}^{\varepsilon_nn}\left(v\left(\frac{\phi^{-1}(1)}{1-c}\,\frac1N\operatorname{tr}D_NC_N^{-1}\right)+o(1)\right)a_ia_i^*.$$
Outlier rejection relies on $\frac1N\operatorname{tr}D_NC_N^{-1}\gtrless1$.

132 Applications/Random Matrices and Robust Estimation 59/153 Outlier Data

[Figure: Limiting (deterministic equivalent) eigenvalue distributions of $\frac1n\sum_{i=1}^{(1-\varepsilon_n)n}x_ix_i^*$ (clean data), of $\frac1nXX^*$ (or per-input normalized), and of $\hat C_N$; $[C_N]_{ij}=.9^{|i-j|}$, $D_N=I_N$.]

135 Applications/Random Matrices and Robust Estimation 60/153 Example of application to statistical finance

Robust matrix-optimized portfolio allocation.

[Figure: Annualized standard deviation over time $t$ for the portfolio estimators $\hat C_{ST}$, $\hat C_P$, $\hat C_C$, $\hat C_{C2}$, $\hat C_{LW}$, $\hat C_R$ ($N=45$, $n=300$).]

136 Applications/Spectral Clustering Methods and Random Matrices 61/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 61 / 153

137 Applications/Spectral Clustering Methods and Random Matrices 62/153 Reminder on Spectral Clustering Methods

Context: two-step classification of $n$ objects based on a similarity matrix $A\in\mathbb{R}^{n\times n}$:
1. extraction of the eigenvectors $U=[u_1,\dots,u_l]$ associated with the dominant eigenvalues (the isolated "spikes" of the spectrum of $A$),
2. classification of the rows $U_{1,\cdot},\dots,U_{n,\cdot}\in\mathbb{R}^l$ using k-means/EM.

[Figure: spectrum of $A$ with isolated spikes away from the bulk; corresponding eigenvectors (in practice, shuffled!).]

141 Applications/Spectral Clustering Methods and Random Matrices 63/153 Reminder on Spectral Clustering Methods

[Figure: Eigenvector 1 and Eigenvector 2 entries; their $l$-dimensional representation (shuffling no longer matters!); EM or k-means clustering.]
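A bare-bones version of this two-step pipeline (a sketch only: the Gaussian similarity kernel, the two-class Gaussian mixture and all sizes are arbitrary illustration choices):

```python
import numpy as np

# Two-class Gaussian mixture in dimension p
n, p = 400, 50
labels = np.repeat([0, 1], n // 2)
mu = np.vstack([-np.ones(p), np.ones(p)]) * 2 / np.sqrt(p)   # class means, ||mu_0 - mu_1|| = 4
X = mu[labels] + np.random.randn(n, p)

# Similarity matrix: Gaussian kernel on pairwise distances
D2 = ((X[:, None, :] - X[None, :, :])**2).sum(axis=2)
A = np.exp(-D2 / (2 * p))

# Step 1: dominant eigenvectors of A
vals, vecs = np.linalg.eigh(A)
U = vecs[:, -2:]                                   # l = 2 dominant eigenvectors

# Step 2: k-means (tiny Lloyd iteration, k = 2) on the rows of U
centers = U[[np.argmin(U[:, 0]), np.argmax(U[:, 0])]]
for _ in range(20):
    assign = ((U[:, None, :] - centers[None, :, :])**2).sum(axis=2).argmin(axis=1)
    centers = np.array([U[assign == k].mean(axis=0) for k in range(2)])

accuracy = max(np.mean(assign == labels), np.mean(assign == 1 - labels))
print("clustering accuracy:", accuracy)
```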

144 Applications/Spectral Clustering Methods and Random Matrices 64/153 The Random Matrix Approach

A two-step method:
1. If $A_n$ is not a standard random matrix, retrieve $\tilde A_n$ such that $\|A_n-\tilde A_n\|\xrightarrow{a.s.}0$ in operator norm as $n\to\infty$. This transfers the crucial properties from $A_n$ to $\tilde A_n$:
   - limiting eigenvalue distribution
   - spikes
   - eigenvectors of isolated eigenvalues.
2. From $\tilde A_n$, perform a spiked model analysis:
   - exhibit the phase transition phenomenon
   - read the content of the isolated eigenvectors of $\tilde A_n$.

152 Applications/Spectral Clustering Methods and Random Matrices 65/153 The Random Matrix Approach The Spike Analysis: For the noisy-plateau-looking isolated eigenvectors u_1, ..., u_l of Ã_n, write u_i = Σ_{a=1}^k α_i^a j_a/√n_a + σ_i^a w_i^a with j_a ∈ R^n the canonical vector of class C_a and w_i^a noise orthogonal to j_a, 65 / 153

153 Applications/Spectral Clustering Methods and Random Matrices 65/153 The Random Matrix Approach The Spike Analysis: For the noisy-plateau-looking isolated eigenvectors u_1, ..., u_l of Ã_n, write u_i = Σ_{a=1}^k α_i^a j_a/√n_a + σ_i^a w_i^a with j_a ∈ R^n the canonical vector of class C_a and w_i^a noise orthogonal to j_a, and evaluate α_i^a = (1/√n_a) u_i^T j_a, (σ_i^a)^2 = ‖u_i − α_i^a j_a/√n_a‖^2. 65 / 153

154 Applications/Spectral Clustering Methods and Random Matrices 65/153 The Random Matrix Approach The Spike Analysis: For the noisy-plateau-looking isolated eigenvectors u_1, ..., u_l of Ã_n, write u_i = Σ_{a=1}^k α_i^a j_a/√n_a + σ_i^a w_i^a with j_a ∈ R^n the canonical vector of class C_a and w_i^a noise orthogonal to j_a, and evaluate α_i^a = (1/√n_a) u_i^T j_a, (σ_i^a)^2 = ‖u_i − α_i^a j_a/√n_a‖^2. ⇒ Can be done using complex analysis calculus, e.g., (α_i^a)^2 = (1/n_a) j_a^T u_i u_i^T j_a = −(1/(2πı n_a)) ∮_{γ_a} j_a^T (Ã_n − z I_n)^{-1} j_a dz. 65 / 153
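When ground-truth labels are available (e.g., in simulations), the alignment α_i^a and residual noise (σ_i^a)^2 of an observed eigenvector can also be evaluated empirically; a minimal sketch (the function name and interface are illustrative assumptions):

```python
import numpy as np

def eigvec_alignment(u, labels, a):
    """Empirical class alignment of an eigenvector u:
    alpha = u^T j_a / sqrt(n_a) and sigma^2 = ||u - alpha j_a / sqrt(n_a)||^2,
    where j_a is the canonical (0/1) indicator vector of class a."""
    j_a = (labels == a).astype(float)
    n_a = j_a.sum()
    alpha = u @ j_a / np.sqrt(n_a)
    sigma2 = np.sum((u - alpha * j_a / np.sqrt(n_a)) ** 2)
    return alpha, sigma2
```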

155 Applications/Community Detection on Graphs 66/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 66 / 153

156 Applications/Community Detection on Graphs 67/153 System Setting Assume an n-node, m-edge undirected graph G, with intrinsic average connectivities q_1, ..., q_n ∼ µ i.i.d. 67 / 153

157 Applications/Community Detection on Graphs 67/153 System Setting Assume an n-node, m-edge undirected graph G, with intrinsic average connectivities q_1, ..., q_n ∼ µ i.i.d. k classes C_1, ..., C_k, independent of {q_i}, of (large) sizes n_1, ..., n_k, with preferential attachment C_ab between C_a and C_b 67 / 153

158 Applications/Community Detection on Graphs 67/153 System Setting Assume an n-node, m-edge undirected graph G, with intrinsic average connectivities q_1, ..., q_n ∼ µ i.i.d. k classes C_1, ..., C_k, independent of {q_i}, of (large) sizes n_1, ..., n_k, with preferential attachment C_ab between C_a and C_b inducing, for nodes i ∈ C_a, j ∈ C_b, the edge probability P(i ∼ j) = q_i q_j C_ab. 67 / 153

159 Applications/Community Detection on Graphs 67/153 System Setting Assume an n-node, m-edge undirected graph G, with intrinsic average connectivities q_1, ..., q_n ∼ µ i.i.d. k classes C_1, ..., C_k, independent of {q_i}, of (large) sizes n_1, ..., n_k, with preferential attachment C_ab between C_a and C_b inducing, for nodes i ∈ C_a, j ∈ C_b, the edge probability P(i ∼ j) = q_i q_j C_ab. adjacency matrix A with A_ij ∼ Bernoulli(q_i q_j C_ab). 67 / 153
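A minimal sketch of drawing a graph from this model (the class sizes, the bi-modal law of the q_i and the matrix C in the example are illustrative choices, not those of the slides):

```python
import numpy as np

def dcsbm_adjacency(q, labels, C, seed=0):
    """Undirected adjacency matrix with A_ij ~ Bernoulli(q_i q_j C_{ab})
    for i in class a, j in class b (no self-loops)."""
    rng = np.random.default_rng(seed)
    n = len(q)
    P = np.outer(q, q) * C[np.ix_(labels, labels)]    # edge probabilities
    A = np.triu(rng.random((n, n)) < P, k=1).astype(float)  # draw each pair once
    return A + A.T

# illustrative example: 2 classes, bi-modal connectivities q_i in {0.1, 0.5}
rng = np.random.default_rng(1)
n = 1000
labels = np.repeat([0, 1], n // 2)
q = rng.choice([0.1, 0.5], size=n, p=[0.75, 0.25])
C = 1 + np.array([[10.0, 2.0], [2.0, 10.0]]) / np.sqrt(n)
A = dcsbm_adjacency(q, labels, C)
```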

160 Applications/Community Detection on Graphs 68/153 Objective Study of spectral methods: standard methods based on the adjacency A, the modularity A − dd^T/(2m), the normalized adjacency D^{-1}AD^{-1}, etc. (adapted to dense nets) refined methods based on the Bethe Hessian (r^2 − 1)I_n − rA + D (adapted to sparse nets!) 68 / 153

161 Applications/Community Detection on Graphs 68/153 Objective Study of spectral methods: standard methods based on the adjacency A, the modularity A − dd^T/(2m), the normalized adjacency D^{-1}AD^{-1}, etc. (adapted to dense nets) refined methods based on the Bethe Hessian (r^2 − 1)I_n − rA + D (adapted to sparse nets!) Improvement for realistic graphs: observation of the failure of the standard methods above; improvement by new methods. 68 / 153
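As an illustration, both families of matrices are straightforward to form from A; a minimal sketch (the default r = square root of the mean degree for the Bethe Hessian is a common heuristic, assumed here rather than taken from the slides):

```python
import numpy as np

def modularity_matrix(A):
    """Modularity matrix A - d d^T / (2m), with d the degree vector."""
    d = A.sum(axis=1)
    return A - np.outer(d, d) / d.sum()

def bethe_hessian(A, r=None):
    """Bethe Hessian H(r) = (r^2 - 1) I_n - r A + D."""
    d = A.sum(axis=1)
    if r is None:
        r = np.sqrt(d.mean())               # heuristic choice of r
    return (r**2 - 1) * np.eye(A.shape[0]) - r * A + np.diag(d)
```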

162 Applications/Community Detection on Graphs 69/153 Limitations of Adjacency/Modularity Approach (Figure: two dominant eigenvectors for the modularity matrix and for the Bethe Hessian.) 69 / 153

163 Applications/Community Detection on Graphs 69/153 Limitations of Adjacency/Modularity Approach Scenario: 3 classes with µ bi-modal (e.g., µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{0.5}). (Figure: two dominant eigenvectors for the modularity matrix and for the Bethe Hessian.) Leading eigenvectors of A (or of the modularity A − dd^T/(2m)) are biased by the q_i distribution. Similar behavior for the Bethe Hessian. 69 / 153

164 Applications/Community Detection on Graphs 70/153 Regularized Modularity Approach Connectivity Model: P(i ∼ j) = q_i q_j C_ab for i ∈ C_a, j ∈ C_b. Dense Regime Assumptions: non-trivial regime when, as n → ∞, C_ab = 1 + M_ab/√n, M_ab = O(1). 70 / 153

165 Applications/Community Detection on Graphs 70/153 Regularized Modularity Approach Connectivity Model: P(i ∼ j) = q_i q_j C_ab for i ∈ C_a, j ∈ C_b. Dense Regime Assumptions: non-trivial regime when, as n → ∞, C_ab = 1 + M_ab/√n, M_ab = O(1). Community information is weak but highly REDUNDANT! 70 / 153

166 Applications/Community Detection on Graphs 70/153 Regularized Modularity Approach Connectivity Model: P(i ∼ j) = q_i q_j C_ab for i ∈ C_a, j ∈ C_b. Dense Regime Assumptions: non-trivial regime when, as n → ∞, C_ab = 1 + M_ab/√n, M_ab = O(1). Community information is weak but highly REDUNDANT! Considered Matrix: For α ∈ [0, 1] (and with D = diag(A1_n) = diag(d) the degree matrix, m = (1/2)d^T 1_n the number of edges), L_α = (2m)^α (1/n) D^{−α} [ A − dd^T/(2m) ] D^{−α}. 70 / 153

167 Applications/Community Detection on Graphs 70/153 Regularized Modularity Approach Connectivity Model: P(i ∼ j) = q_i q_j C_ab for i ∈ C_a, j ∈ C_b. Dense Regime Assumptions: non-trivial regime when, as n → ∞, C_ab = 1 + M_ab/√n, M_ab = O(1). Community information is weak but highly REDUNDANT! Considered Matrix: For α ∈ [0, 1] (and with D = diag(A1_n) = diag(d) the degree matrix, m = (1/2)d^T 1_n the number of edges), L_α = (2m)^α (1/n) D^{−α} [ A − dd^T/(2m) ] D^{−α}. Our results in a nutshell: we find the optimal α_opt having the best phase transition. 70 / 153

168 Applications/Community Detection on Graphs 70/153 Regularized Modularity Approach Connectivity Model: P(i ∼ j) = q_i q_j C_ab for i ∈ C_a, j ∈ C_b. Dense Regime Assumptions: non-trivial regime when, as n → ∞, C_ab = 1 + M_ab/√n, M_ab = O(1). Community information is weak but highly REDUNDANT! Considered Matrix: For α ∈ [0, 1] (and with D = diag(A1_n) = diag(d) the degree matrix, m = (1/2)d^T 1_n the number of edges), L_α = (2m)^α (1/n) D^{−α} [ A − dd^T/(2m) ] D^{−α}. Our results in a nutshell: we find the optimal α_opt having the best phase transition. we find a consistent estimator α̂_opt from A alone. 70 / 153

169 Applications/Community Detection on Graphs 70/153 Regularized Modularity Approach Connectivity Model: P(i ∼ j) = q_i q_j C_ab for i ∈ C_a, j ∈ C_b. Dense Regime Assumptions: non-trivial regime when, as n → ∞, C_ab = 1 + M_ab/√n, M_ab = O(1). Community information is weak but highly REDUNDANT! Considered Matrix: For α ∈ [0, 1] (and with D = diag(A1_n) = diag(d) the degree matrix, m = (1/2)d^T 1_n the number of edges), L_α = (2m)^α (1/n) D^{−α} [ A − dd^T/(2m) ] D^{−α}. Our results in a nutshell: we find the optimal α_opt having the best phase transition. we find a consistent estimator α̂_opt from A alone. we claim optimal eigenvector regularization D^{α−1}u, u eigenvector of L_α. 70 / 153
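A minimal sketch of building L_α and the D^{α−1}-regularized dominant eigenvectors on which k-means/EM is then run (the value of α, the number of eigenvectors kept, and the helper name are illustrative assumptions):

```python
import numpy as np

def regularized_modularity_eigvecs(A, alpha, ell):
    """L_alpha = (2m)^alpha / n * D^-alpha (A - d d^T / (2m)) D^-alpha;
    returns its ell dominant eigenvectors regularized by D^(alpha - 1)."""
    n = A.shape[0]
    d = A.sum(axis=1)
    two_m = d.sum()                                   # 2m = d^T 1_n
    D_ma = np.diag(d ** -alpha)
    L = (two_m ** alpha / n) * D_ma @ (A - np.outer(d, d) / two_m) @ D_ma
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -ell:]                                # dominant eigenvectors
    return np.diag(d ** (alpha - 1)) @ U              # D^(alpha-1) u regularization
```

The rows of the returned matrix are then clustered with k-means or EM, exactly as in the generic spectral clustering scheme recalled earlier.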

170 Applications/Community Detection on Graphs 71/153 Asymptotic Equivalence Theorem (Limiting Random Matrix Equivalent) For each α ∈ [0, 1], as n → ∞, ‖L_α − L̃_α‖ → 0 almost surely, where L_α = (2m)^α (1/n) D^{−α} [ A − dd^T/(2m) ] D^{−α}, L̃_α = (1/n) D_q^{−α} X D_q^{−α} + U Λ U^T, with D_q = diag({q_i}), X a zero-mean random matrix, U = [ D_q^{1−α} J/√n , D_q^{−α} X 1_n ] of rank k + 1, Λ = [ (I_k − 1_k c^T) M (I_k − c 1_k^T) , 1_k ; 1_k^T , 0 ], and J = [j_1, ..., j_k], j_a = (0, ..., 0, 1_{n_a}^T, 0, ..., 0)^T ∈ R^n the canonical vector of class C_a. 71 / 153

171 Applications/Community Detection on Graphs 71/153 Asymptotic Equivalence Theorem (Limiting Random Matrix Equivalent) For each α ∈ [0, 1], as n → ∞, ‖L_α − L̃_α‖ → 0 almost surely, where L_α = (2m)^α (1/n) D^{−α} [ A − dd^T/(2m) ] D^{−α}, L̃_α = (1/n) D_q^{−α} X D_q^{−α} + U Λ U^T, with D_q = diag({q_i}), X a zero-mean random matrix, U = [ D_q^{1−α} J/√n , D_q^{−α} X 1_n ] of rank k + 1, Λ = [ (I_k − 1_k c^T) M (I_k − c 1_k^T) , 1_k ; 1_k^T , 0 ], and J = [j_1, ..., j_k], j_a = (0, ..., 0, 1_{n_a}^T, 0, ..., 0)^T ∈ R^n the canonical vector of class C_a. Consequences: isolated eigenvalues beyond the phase transition λ(M) > spectrum edge; optimal choice α_opt of α from the study of the noise spectrum. 71 / 153

172 Applications/Community Detection on Graphs 71/153 Asymptotic Equivalence Theorem (Limiting Random Matrix Equivalent) For each α ∈ [0, 1], as n → ∞, ‖L_α − L̃_α‖ → 0 almost surely, where L_α = (2m)^α (1/n) D^{−α} [ A − dd^T/(2m) ] D^{−α}, L̃_α = (1/n) D_q^{−α} X D_q^{−α} + U Λ U^T, with D_q = diag({q_i}), X a zero-mean random matrix, U = [ D_q^{1−α} J/√n , D_q^{−α} X 1_n ] of rank k + 1, Λ = [ (I_k − 1_k c^T) M (I_k − c 1_k^T) , 1_k ; 1_k^T , 0 ], and J = [j_1, ..., j_k], j_a = (0, ..., 0, 1_{n_a}^T, 0, ..., 0)^T ∈ R^n the canonical vector of class C_a. Consequences: isolated eigenvalues beyond the phase transition λ(M) > spectrum edge; optimal choice α_opt of α from the study of the noise spectrum. eigenvectors correlated with D_q^{1−α} J ⇒ natural regularization by D^{α−1}! 71 / 153

173 Applications/Community Detection on Graphs 72/153 Eigenvalue Spectrum (Histogram: eigenvalues of L_1 versus the limiting law, with isolated spikes outside the support [S_α^−, S_α^+].) Figure: Eigenvalues of L_1, K = 3, n = 2000, c_1 = 0.3, c_2 = 0.3, c_3 = 0.4, µ = (1/2)δ_{q^{(1)}} + (1/2)δ_{q^{(2)}}, q^{(1)} = 0.4, q^{(2)} = 0.9, M defined by M_ii = 12, M_ij = 4 for i ≠ j. 72 / 153

174 Applications/Community Detection on Graphs 73/153 Phase Transition Theorem (Phase Transition) For α ∈ [0, 1], there is an isolated eigenvalue λ_i(L_α) if λ_i(M̄) > τ_α, where M̄ = (D(c) − cc^T)M, τ_α = lim_{x↓S_α^+} 1/e_2^α(x) is the phase transition threshold, [S_α^−, S_α^+] is the limiting eigenvalue support of L_α, and e_1^α(x), e_2^α(x) (x > S_α^+) solve e_1^α(x) = ∫ q^{1−2α} / ( x − [ q^{1−2α} e_1^α(x) + q^{2−2α} e_2^α(x) ] ) µ(dq), e_2^α(x) = ∫ q^{2−2α} / ( x − [ q^{1−2α} e_1^α(x) + q^{2−2α} e_2^α(x) ] ) µ(dq). In this case, 1/e_2^α(λ_i(L_α)) = λ_i(M̄). 73 / 153

175 Applications/Community Detection on Graphs 73/153 Phase Transition Theorem (Phase Transition) For α ∈ [0, 1], there is an isolated eigenvalue λ_i(L_α) if λ_i(M̄) > τ_α, where M̄ = (D(c) − cc^T)M, τ_α = lim_{x↓S_α^+} 1/e_2^α(x) is the phase transition threshold, [S_α^−, S_α^+] is the limiting eigenvalue support of L_α, and e_1^α(x), e_2^α(x) (x > S_α^+) solve e_1^α(x) = ∫ q^{1−2α} / ( x − [ q^{1−2α} e_1^α(x) + q^{2−2α} e_2^α(x) ] ) µ(dq), e_2^α(x) = ∫ q^{2−2α} / ( x − [ q^{1−2α} e_1^α(x) + q^{2−2α} e_2^α(x) ] ) µ(dq). In this case, 1/e_2^α(λ_i(L_α)) = λ_i(M̄). Clustering still possible when λ_i(M̄) = (min_α τ_α) + ε. Optimal α = α_opt: α_opt = argmin_{α∈[0,1]} {τ_α}. 73 / 153

176 Applications/Community Detection on Graphs 73/153 Phase Transition Theorem (Phase Transition) For α ∈ [0, 1], there is an isolated eigenvalue λ_i(L_α) if λ_i(M̄) > τ_α, where M̄ = (D(c) − cc^T)M, τ_α = lim_{x↓S_α^+} 1/e_2^α(x) is the phase transition threshold, [S_α^−, S_α^+] is the limiting eigenvalue support of L_α, and e_1^α(x), e_2^α(x) (x > S_α^+) solve e_1^α(x) = ∫ q^{1−2α} / ( x − [ q^{1−2α} e_1^α(x) + q^{2−2α} e_2^α(x) ] ) µ(dq), e_2^α(x) = ∫ q^{2−2α} / ( x − [ q^{1−2α} e_1^α(x) + q^{2−2α} e_2^α(x) ] ) µ(dq). In this case, 1/e_2^α(λ_i(L_α)) = λ_i(M̄). Clustering still possible when λ_i(M̄) = (min_α τ_α) + ε. Optimal α = α_opt: α_opt = argmin_{α∈[0,1]} {τ_α}. From max_i | d_i/√(d^T 1_n) − q_i | → 0 a.s., we obtain a consistent estimator α̂_opt of α_opt. 73 / 153

177 Applications/Community Detection on Graphs 74/153 Simulated Performance Results (2 masses of q_i) (Panels: Modularity; Bethe Hessian.) 74 / 153

178 Applications/Community Detection on Graphs 74/153 Simulated Performance Results (2 masses of q_i) (Panels: Modularity; Bethe Hessian; Algo with α = 1.) Figure: Two dominant eigenvectors (x-y axes) for n = 2000, K = 3, µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}}, q^{(1)} = 0.1, q^{(2)} = 0.5, c_1 = c_2 = 1/4, c_3 = 1/2, M = 100 I_3. 74 / 153

179 Applications/Community Detection on Graphs 74/153 Simulated Performance Results (2 masses of q_i) (Panels: Modularity; Bethe Hessian; Algo with α = 1; Algo with α_opt.) Figure: Two dominant eigenvectors (x-y axes) for n = 2000, K = 3, µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}}, q^{(1)} = 0.1, q^{(2)} = 0.5, c_1 = c_2 = 1/4, c_3 = 1/2, M = 100 I_3. 74 / 153

180 Applications/Community Detection on Graphs 75/153 Simulated Performance Results (2 masses for q_i) (Plot: normalized spike λ − S_α^+ versus the eigenvalue ℓ.) Figure: Largest eigenvalue λ of L_α as a function of the largest eigenvalue ℓ of (D(c) − cc^T)M, for µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}} with q^{(1)} = 0.1 and q^{(2)} = 0.5, for α ∈ {0, 1/4, 1/2, 3/4, 1, α_opt} (indicated below the graph). Here, α_opt = . Circles indicate the phase transition. Beyond the phase transition, ℓ = 1/e_2^α(λ). 75 / 153

181 Applications/Community Detection on Graphs 75/153 Simulated Performance Results (2 masses for q_i) (Plot: normalized spike λ − S_α^+ versus the eigenvalue ℓ.) Figure: Largest eigenvalue λ of L_α as a function of the largest eigenvalue ℓ of (D(c) − cc^T)M, for µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}} with q^{(1)} = 0.1 and q^{(2)} = 0.5, for α ∈ {0, 1/4, 1/2, 3/4, 1, α_opt} (indicated below the graph). Here, α_opt = . Circles indicate the phase transition. Beyond the phase transition, ℓ = 1/e_2^α(λ). 75 / 153

182 Applications/Community Detection on Graphs 75/153 Simulated Performance Results (2 masses for q_i) (Plot: normalized spike λ − S_α^+ versus the eigenvalue ℓ; the α_opt curve is highlighted.) Figure: Largest eigenvalue λ of L_α as a function of the largest eigenvalue ℓ of (D(c) − cc^T)M, for µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}} with q^{(1)} = 0.1 and q^{(2)} = 0.5, for α ∈ {0, 1/4, 1/2, 3/4, 1, α_opt} (indicated below the graph). Here, α_opt = . Circles indicate the phase transition. Beyond the phase transition, ℓ = 1/e_2^α(λ). 75 / 153

183 Applications/Community Detection on Graphs 76/153 Simulated Performance Results (2 masses for q_i) (Plot: overlap versus ∆; curve for α = 1, with the phase transition marked.) Figure: Overlap performance for n = 3000, K = 3, c_i = 1/3, µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}} with q^{(1)} = 0.1 and q^{(2)} = 0.5, M = ∆ I_3 for ∆ ∈ [5, 50]. Here α_opt = . 76 / 153

184 Applications/Community Detection on Graphs 76/153 Simulated Performance Results (2 masses for q_i) (Plot: overlap versus ∆; curves for α = 0, 1/2, 1, with the phase transition marked.) Figure: Overlap performance for n = 3000, K = 3, c_i = 1/3, µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}} with q^{(1)} = 0.1 and q^{(2)} = 0.5, M = ∆ I_3 for ∆ ∈ [5, 50]. Here α_opt = . 76 / 153

185 Applications/Community Detection on Graphs 76/153 Simulated Performance Results (2 masses for q_i) (Plot: overlap versus ∆; curves for α = 0, 1/2, 1 and α_opt, with the phase transition marked.) Figure: Overlap performance for n = 3000, K = 3, c_i = 1/3, µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}} with q^{(1)} = 0.1 and q^{(2)} = 0.5, M = ∆ I_3 for ∆ ∈ [5, 50]. Here α_opt = . 76 / 153

186 Applications/Community Detection on Graphs 76/153 Simulated Performance Results (2 masses for q_i) (Plot: overlap versus ∆; curves for α = 0, 1/2, 1, α_opt and the Bethe Hessian, with the phase transition marked.) Figure: Overlap performance for n = 3000, K = 3, c_i = 1/3, µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}} with q^{(1)} = 0.1 and q^{(2)} = 0.5, M = ∆ I_3 for ∆ ∈ [5, 50]. Here α_opt = . 76 / 153

187 Applications/Community Detection on Graphs 77/153 Simulated Performance Results (2 masses for q_i) (Plot: overlap versus q^{(2)} (q^{(1)} = 0.1); curves for α = 0, 0.5, 1, α_opt and the Bethe Hessian.) Figure: Overlap performance for n = 3000, K = 3, µ = (3/4)δ_{q^{(1)}} + (1/4)δ_{q^{(2)}} with q^{(1)} = 0.1 and q^{(2)} ∈ [0.1, 0.9], M = 10(2I_3 − 1_3 1_3^T), c_i = 1/3. 77 / 153

188 Applications/Community Detection on Graphs 78/153 Theoretical Performance Analysis of eigenvectors reveals: eigenvectors are noisy staircase vectors 78 / 153

189 Applications/Community Detection on Graphs 78/153 Theoretical Performance Analysis of eigenvectors reveals: eigenvectors are noisy staircase vectors conjectured Gaussian fluctuations of eigenvector entries 78 / 153

190 Applications/Community Detection on Graphs 78/153 Theoretical Performance Analysis of eigenvectors reveals: eigenvectors are noisy staircase vectors conjectured Gaussian fluctuations of eigenvector entries for q_i ≡ q_0 (homogeneous case), the same variance for all entries; in the non-homogeneous case, we can compute the average variance per class ⇒ heuristic asymptotic performance upper-bound using EM. 78 / 153

191 Applications/Community Detection on Graphs 79/153 Theoretical Performance Results (uniform distribution for q_i) (Plot: correct classification rate; curves for α = 1, 0.75, 0.5, 0.25, 0 and α_opt.) Figure: Theoretical probability of correct recovery for n = 2000, K = 2, c_1 = 0.6, c_2 = 0.4, µ uniformly distributed in [0.2, 0.8], M = ∆ I_2 for ∆ ∈ [0, 20]. 79 / 153

193 Applications/Community Detection on Graphs 80/153 Some Takeaway messages Main findings: 80 / 153

194 Applications/Community Detection on Graphs 80/153 Some Takeaway messages Main findings: Degree heterogeneity breaks community structures in eigenvectors. Compensation by D^{α−1} normalization of the eigenvectors. 80 / 153

195 Applications/Community Detection on Graphs 80/153 Some Takeaway messages Main findings: Degree heterogeneity breaks community structures in eigenvectors. Compensation by D^{α−1} normalization of the eigenvectors. Classical debate over the best normalization of the adjacency (or modularity) matrix A not trivial to solve. With heterogeneous degrees, we found a good on-line method. 80 / 153

196 Applications/Community Detection on Graphs 80/153 Some Takeaway messages Main findings: Degree heterogeneity breaks community structures in eigenvectors. Compensation by D^{α−1} normalization of the eigenvectors. Classical debate over the best normalization of the adjacency (or modularity) matrix A not trivial to solve. With heterogeneous degrees, we found a good on-line method. Simulations support good performance even for rather sparse settings. 80 / 153

197 Applications/Community Detection on Graphs 80/153 Some Takeaway messages Main findings: Degree heterogeneity breaks community structures in eigenvectors. Compensation by D^{α−1} normalization of the eigenvectors. Classical debate over the best normalization of the adjacency (or modularity) matrix A not trivial to solve. With heterogeneous degrees, we found a good on-line method. Simulations support good performance even for rather sparse settings. But strong limitations: 80 / 153

198 Applications/Community Detection on Graphs 80/153 Some Takeaway messages Main findings: Degree heterogeneity breaks community structures in eigenvectors. Compensation by D^{α−1} normalization of the eigenvectors. Classical debate over the best normalization of the adjacency (or modularity) matrix A not trivial to solve. With heterogeneous degrees, we found a good on-line method. Simulations support good performance even for rather sparse settings. But strong limitations: Key assumption: C_ab = 1 + M_ab/√n. Everything collapses in a different regime. 80 / 153

199 Applications/Community Detection on Graphs 80/153 Some Takeaway messages Main findings: Degree heterogeneity breaks community structures in eigenvectors. Compensation by D^{α−1} normalization of the eigenvectors. Classical debate over the best normalization of the adjacency (or modularity) matrix A not trivial to solve. With heterogeneous degrees, we found a good on-line method. Simulations support good performance even for rather sparse settings. But strong limitations: Key assumption: C_ab = 1 + M_ab/√n. Everything collapses in a different regime. Simulations on small networks in fact give rather arbitrary results. 80 / 153

200 Applications/Kernel Spectral Clustering 81/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 81 / 153

201 Applications/Kernel Spectral Clustering 82/153 Kernel Spectral Clustering Problem Statement Dataset x_1, ..., x_n ∈ R^p Objective: cluster the data in k similarity classes S_1, ..., S_k. 82 / 153

202 Applications/Kernel Spectral Clustering 82/153 Kernel Spectral Clustering Problem Statement Dataset x_1, ..., x_n ∈ R^p Objective: cluster the data in k similarity classes S_1, ..., S_k. Typical metric to optimize: (RatioCut) argmin_{S_1 ∪ ... ∪ S_k = {1,...,n}} Σ_{i=1}^k (1/|S_i|) Σ_{j ∈ S_i, j′ ∉ S_i} κ(x_j, x_{j′}) for some similarity kernel κ(x, y) ≥ 0 (large if x similar to y). 82 / 153

203 Applications/Kernel Spectral Clustering 82/153 Kernel Spectral Clustering Problem Statement Dataset x_1, ..., x_n ∈ R^p Objective: cluster the data in k similarity classes S_1, ..., S_k. Typical metric to optimize: (RatioCut) argmin_{S_1 ∪ ... ∪ S_k = {1,...,n}} Σ_{i=1}^k (1/|S_i|) Σ_{j ∈ S_i, j′ ∉ S_i} κ(x_j, x_{j′}) for some similarity kernel κ(x, y) ≥ 0 (large if x similar to y). Can be shown equivalent to (RatioCut) argmin_{M ∈ M} tr M^T (D − K) M, where M = { M ∈ R^{n×k} ; M_ij ∈ {0, |S_j|^{−1/2}} } (in particular, M^T M = I_k) and K = {κ(x_i, x_j)}_{i,j=1}^n, D_ii = Σ_{j=1}^n K_ij. 82 / 153

204 Applications/Kernel Spectral Clustering 82/153 Kernel Spectral Clustering Problem Statement Dataset x_1, ..., x_n ∈ R^p Objective: cluster the data in k similarity classes S_1, ..., S_k. Typical metric to optimize: (RatioCut) argmin_{S_1 ∪ ... ∪ S_k = {1,...,n}} Σ_{i=1}^k (1/|S_i|) Σ_{j ∈ S_i, j′ ∉ S_i} κ(x_j, x_{j′}) for some similarity kernel κ(x, y) ≥ 0 (large if x similar to y). Can be shown equivalent to (RatioCut) argmin_{M ∈ M} tr M^T (D − K) M, where M = { M ∈ R^{n×k} ; M_ij ∈ {0, |S_j|^{−1/2}} } (in particular, M^T M = I_k) and K = {κ(x_i, x_j)}_{i,j=1}^n, D_ii = Σ_{j=1}^n K_ij. But an integer problem! Usually NP-complete. 82 / 153

205 Applications/Kernel Spectral Clustering 83/153 Kernel Spectral Clustering Towards kernel spectral clustering Kernel spectral clustering: discrete-to-continuous relaxations of such metrics, (RatioCut) argmin_{M, M^T M = I_k} tr M^T (D − K) M, i.e., an eigenvector problem: 1. find the eigenvectors of smallest eigenvalues 2. retrieve classes from the eigenvector components 83 / 153

206 Applications/Kernel Spectral Clustering 83/153 Kernel Spectral Clustering Towards kernel spectral clustering Kernel spectral clustering: discrete-to-continuous relaxations of such metrics, (RatioCut) argmin_{M, M^T M = I_k} tr M^T (D − K) M, i.e., an eigenvector problem: 1. find the eigenvectors of smallest eigenvalues 2. retrieve classes from the eigenvector components Refinements: working on K, D − K, I_n − D^{−1}K, I_n − D^{−1/2}KD^{−1/2}, etc. several-step algorithms: Ng–Jordan–Weiss, Shi–Malik, etc. 83 / 153
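A minimal sketch of the relaxed RatioCut step followed by k-means, assuming the kernel matrix K has already been formed (a single-pass illustrative version, not one of the refined multi-step algorithms just listed):

```python
import numpy as np
from sklearn.cluster import KMeans

def ratiocut_spectral(K, k):
    """Relaxed RatioCut: take the k eigenvectors of D - K with smallest
    eigenvalues, then cluster their rows with k-means."""
    D = np.diag(K.sum(axis=1))
    _, vecs = np.linalg.eigh(D - K)          # ascending eigenvalues
    M = vecs[:, :k]                          # relaxed solution, M^T M = I_k
    return KMeans(n_clusters=k, n_init=10).fit_predict(M)
```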

210 Applications/Kernel Spectral Clustering 84/153 Kernel Spectral Clustering Figure: Leading four eigenvectors of D^{−1/2}KD^{−1/2} for MNIST data. 84 / 153

211 Applications/Kernel Spectral Clustering 85/153 Methodology and objectives Current state: Algorithms derived from ad-hoc procedures (e.g., relaxation). Little understanding of performance, even for Gaussian mixtures! Let alone when both p and n are large (BigData setting) 85 / 153

212 Applications/Kernel Spectral Clustering 85/153 Methodology and objectives Current state: Algorithms derived from ad-hoc procedures (e.g., relaxation). Little understanding of performance, even for Gaussian mixtures! Let alone when both p and n are large (BigData setting) Objectives and Roadmap: Develop mathematical analysis framework for BigData kernel spectral clustering (p, n ) 85 / 153

213 Applications/Kernel Spectral Clustering 85/153 Methodology and objectives Current state: Algorithms derived from ad-hoc procedures (e.g., relaxation). Little understanding of performance, even for Gaussian mixtures! Let alone when both p and n are large (BigData setting) Objectives and Roadmap: Develop mathematical analysis framework for BigData kernel spectral clustering (p, n ) Understand: 1. Phase transition effects (i.e., when is clustering possible?) 2. Content of each eigenvector 3. Influence of kernel function 4. Performance comparison of clustering algorithms 85 / 153

214 Applications/Kernel Spectral Clustering 85/153 Methodology and objectives Current state: Algorithms derived from ad-hoc procedures (e.g., relaxation). Little understanding of performance, even for Gaussian mixtures! Let alone when both p and n are large (BigData setting) Objectives and Roadmap: Develop mathematical analysis framework for BigData kernel spectral clustering (p, n ) Understand: 1. Phase transition effects (i.e., when is clustering possible?) 2. Content of each eigenvector 3. Influence of kernel function 4. Performance comparison of clustering algorithms Methodology: Use statistical assumptions (Gaussian mixture) Benefit from doubly-infinite independence and random matrix tools 85 / 153

215 Applications/Kernel Spectral Clustering 86/153 Model and Assumptions Gaussian mixture model: x_1, ..., x_n ∈ R^p, k classes C_1, ..., C_k, x_1, ..., x_{n_1} ∈ C_1, ..., x_{n−n_k+1}, ..., x_n ∈ C_k, C_a = {x | x ∼ N(µ_a, C_a)}. Then, for x_i ∈ C_a, x_i = µ_a + w_i with w_i ∼ N(0, C_a). 86 / 153

216 Applications/Kernel Spectral Clustering 86/153 Model and Assumptions Gaussian mixture model: x_1, ..., x_n ∈ R^p, k classes C_1, ..., C_k, x_1, ..., x_{n_1} ∈ C_1, ..., x_{n−n_k+1}, ..., x_n ∈ C_k, C_a = {x | x ∼ N(µ_a, C_a)}. Then, for x_i ∈ C_a, x_i = µ_a + w_i with w_i ∼ N(0, C_a). Assumption (Convergence Rate) As n, p → ∞, 1. Data scaling: p/n → c_0 ∈ (0, ∞), 2. Class scaling: n_a/n → c_a ∈ (0, 1), 3. Mean scaling: with µ° = Σ_{a=1}^k (n_a/n) µ_a and µ°_a = µ_a − µ°, then ‖µ°_a‖ = O(1), 4. Covariance scaling: with C° = Σ_{a=1}^k (n_a/n) C_a and C°_a = C_a − C°, then ‖C_a‖ = O(1), (1/√p) tr C°_a = O(1), tr C°_a C°_b = O(p). 86 / 153
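A minimal sketch of generating data from this mixture model for numerical experiments (the class sizes, means and covariances in the example loosely mimic the setting of the figures below, with a smaller p for speed; the helper name is an illustrative assumption):

```python
import numpy as np

def gaussian_mixture(sizes, means, covs, seed=0):
    """Draw n = sum(sizes) samples: n_a samples from N(mu_a, C_a) for each class a."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    X = np.stack([rng.multivariate_normal(means[a], covs[a]) for a in labels])
    return X, labels

# illustrative example: 3 classes with shifted means and scaled covariances
p, n = 256, 512
sizes = [n // 4, n // 4, n // 2]
means = [4 * np.eye(p)[a] for a in range(3)]                 # [mu_a]_j = 4 delta_aj
covs = [(1 + 2 * a / np.sqrt(p)) * np.eye(p) for a in range(3)]
X, labels = gaussian_mixture(sizes, means, covs)
```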

217 Applications/Kernel Spectral Clustering 87/153 Model and Assumptions Kernel Matrix: Kernel matrix of interest: K = { f( (1/p)‖x_i − x_j‖^2 ) }_{i,j=1}^n for some sufficiently smooth nonnegative f. 87 / 153

218 Applications/Kernel Spectral Clustering 87/153 Model and Assumptions Kernel Matrix: Kernel matrix of interest: K = { f( (1/p)‖x_i − x_j‖^2 ) }_{i,j=1}^n for some sufficiently smooth nonnegative f. We study the normalized Laplacian: L = n D^{−1/2} K D^{−1/2} with d = K1_n, D = diag(d). 87 / 153
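A minimal sketch of forming K and L; the kernel f(x) = exp(−x/2) is the example used in the figures below, while the rest of the interface is an illustrative assumption:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def kernel_laplacian(X, f=lambda t: np.exp(-t / 2)):
    """K_ij = f(||x_i - x_j||^2 / p) and L = n D^{-1/2} K D^{-1/2}."""
    n, p = X.shape
    K = f(squareform(pdist(X, 'sqeuclidean')) / p)
    d = K.sum(axis=1)                       # d = K 1_n
    D_m12 = np.diag(d ** -0.5)
    return n * D_m12 @ K @ D_m12
```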

219 Applications/Kernel Spectral Clustering 88/153 Model and Assumptions Difficulty: L is a very intractable random matrix non-linear f non-trivial dependence between the entries of L 88 / 153

220 Applications/Kernel Spectral Clustering 88/153 Model and Assumptions Difficulty: L is a very intractable random matrix non-linear f non-trivial dependence between the entries of L Strategy: 1. Find a random equivalent L̂ (i.e., ‖L − L̂‖ → 0 a.s. as n, p → ∞) based on: concentration: K_ij → constant as n, p → ∞ (for all i ≠ j) Taylor expansion around the limit point 88 / 153

221 Applications/Kernel Spectral Clustering 88/153 Model and Assumptions Difficulty: L is a very intractable random matrix non-linear f non-trivial dependence between the entries of L Strategy: 1. Find a random equivalent L̂ (i.e., ‖L − L̂‖ → 0 a.s. as n, p → ∞) based on: concentration: K_ij → constant as n, p → ∞ (for all i ≠ j) Taylor expansion around the limit point 2. Apply the spiked random matrix approach to study: existence of isolated eigenvalues in L̂: phase transition 88 / 153

222 Applications/Kernel Spectral Clustering 88/153 Model and Assumptions Difficulty: L is a very intractable random matrix non-linear f non-trivial dependence between the entries of L Strategy: 1. Find a random equivalent L̂ (i.e., ‖L − L̂‖ → 0 a.s. as n, p → ∞) based on: concentration: K_ij → constant as n, p → ∞ (for all i ≠ j) Taylor expansion around the limit point 2. Apply the spiked random matrix approach to study: existence of isolated eigenvalues in L̂: phase transition eigenvector projections on the canonical class basis 88 / 153

223 Applications/Kernel Spectral Clustering 89/153 Random Matrix Equivalent Results on K: Key Remark: Under our assumptions, uniformly over i ≠ j in {1, ..., n}, (1/p)‖x_i − x_j‖^2 → τ a.s., for some common limit τ. 89 / 153

224 Applications/Kernel Spectral Clustering 89/153 Random Matrix Equivalent Results on K: Key Remark: Under our assumptions, uniformly over i ≠ j in {1, ..., n}, (1/p)‖x_i − x_j‖^2 → τ a.s., for some common limit τ. large-dimensional approximation for K: K = f(τ)1_n1_n^T + √n A_1 + A_2, with ‖f(τ)1_n1_n^T‖ = O(n), √n A_1 low rank of norm O(√n), and A_2 the informative terms of norm O(1) 89 / 153

225 Applications/Kernel Spectral Clustering 89/153 Random Matrix Equivalent Results on K: Key Remark: Under our assumptions, uniformly over i ≠ j in {1, ..., n}, (1/p)‖x_i − x_j‖^2 → τ a.s., for some common limit τ. large-dimensional approximation for K: K = f(τ)1_n1_n^T + √n A_1 + A_2, with ‖f(τ)1_n1_n^T‖ = O(n), √n A_1 low rank of norm O(√n), and A_2 the informative terms of norm O(1) difficult to handle (3 orders to manipulate!) 89 / 153

226 Applications/Kernel Spectral Clustering 89/153 Random Matrix Equivalent Results on K: Key Remark: Under our assumptions, uniformly over i ≠ j in {1, ..., n}, (1/p)‖x_i − x_j‖^2 → τ a.s., for some common limit τ. large-dimensional approximation for K: K = f(τ)1_n1_n^T + √n A_1 + A_2, with ‖f(τ)1_n1_n^T‖ = O(n), √n A_1 low rank of norm O(√n), and A_2 the informative terms of norm O(1) difficult to handle (3 orders to manipulate!) Observation: Spectrum of L = n D^{−1/2} K D^{−1/2}: dominant eigenvalue n with eigenvector D^{1/2}1_n; all other eigenvalues of order O(1). 89 / 153

227 Applications/Kernel Spectral Clustering 89/153 Random Matrix Equivalent Results on K: Key Remark: Under our assumptions, uniformly over i ≠ j in {1, ..., n}, (1/p)‖x_i − x_j‖^2 → τ a.s., for some common limit τ. large-dimensional approximation for K: K = f(τ)1_n1_n^T + √n A_1 + A_2, with ‖f(τ)1_n1_n^T‖ = O(n), √n A_1 low rank of norm O(√n), and A_2 the informative terms of norm O(1) difficult to handle (3 orders to manipulate!) Observation: Spectrum of L = n D^{−1/2} K D^{−1/2}: dominant eigenvalue n with eigenvector D^{1/2}1_n; all other eigenvalues of order O(1). Naturally leads to study: projected normalized Laplacian (or modularity-type Laplacian): L′ = n D^{−1/2} K D^{−1/2} − n (D^{1/2}1_n1_n^T D^{1/2})/(1_n^T D 1_n) = n D^{−1/2} ( K − dd^T/(d^T 1_n) ) D^{−1/2}. Dominant (normalized) eigenvector: D^{1/2}1_n / √(1_n^T D 1_n). 89 / 153
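A minimal sketch of this projection step, removing the O(n) direction from the Laplacian built above (illustrative helper, reusing an already formed kernel matrix K):

```python
import numpy as np

def projected_kernel_laplacian(K):
    """L' = n D^{-1/2} (K - d d^T / (d^T 1_n)) D^{-1/2}: the normalized
    Laplacian with its dominant eigenvector D^{1/2} 1_n projected out."""
    n = K.shape[0]
    d = K.sum(axis=1)
    D_m12 = np.diag(d ** -0.5)
    return n * D_m12 @ (K - np.outer(d, d) / d.sum()) @ D_m12
```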

228 Applications/Kernel Spectral Clustering 90/153 Random Matrix Equivalent Theorem (Random Matrix Equivalent) As n, p → ∞, in operator norm, ‖L′ − L̂‖ → 0 a.s., where L̂ = 2 (f′(τ)/f(τ)) [ (1/p) P W^T W P + U B U^T ] + α(τ) I_n and τ = (2/p) tr C°, W = [w_1, ..., w_n] ∈ R^{p×n} (x_i = µ_a + w_i), P = I_n − (1/n)1_n1_n^T, U = [ (1/√p) J, Φ, ψ ] ∈ R^{n×(2k+4)}, and B ∈ R^{(2k+4)×(2k+4)} is symmetric with upper-left k×k block B_11 = M°^T M° + ( 5f′(τ)/(8f(τ)) − f″(τ)/(2f′(τ)) ) t t^T + (f″(τ)/f′(τ)) T + (p/n) (f(τ)α(τ)/(2f′(τ))) 1_k1_k^T ∈ R^{k×k}, the remaining blocks being built from (I_k − 1_k c^T), t, 1_k and scalars depending on f(τ), f′(τ), f″(τ). Notations: (1/√p) J = (1/√p)[j_1, ..., j_k] ∈ R^{n×k}, with j_a the canonical vector of class C_a; M° = [µ°_1, ..., µ°_k] ∈ R^{p×k}, µ°_a = µ_a − Σ_{b=1}^k (n_b/n) µ_b; t = [ (1/√p) tr C°_1, ..., (1/√p) tr C°_k ]^T ∈ R^k, C°_a = C_a − Σ_{b=1}^k (n_b/n) C_b; T = { (1/p) tr C°_a C°_b }_{a,b=1}^k ∈ R^{k×k}. 90 / 153

233 Applications/Kernel Spectral Clustering 91/153 Random Matrix Equivalent Some consequences: L̂ is a spiked model: UBU^T seen as a low-rank perturbation of (1/p) P W^T W P 91 / 153

234 Applications/Kernel Spectral Clustering 91/153 Random Matrix Equivalent Some consequences: L̂ is a spiked model: UBU^T seen as a low-rank perturbation of (1/p) P W^T W P If f′(τ) = 0, L asymptotically deterministic! only t and T can be discriminated upon If f″(τ) = 0 (e.g., f(x) = x), T unused If 5f′(τ)/(8f(τ)) = f″(τ)/(2f′(τ)), t (seemingly) unused 91 / 153

235 Applications/Kernel Spectral Clustering 91/153 Random Matrix Equivalent Some consequences: L̂ is a spiked model: UBU^T seen as a low-rank perturbation of (1/p) P W^T W P If f′(τ) = 0, L asymptotically deterministic! only t and T can be discriminated upon If f″(τ) = 0 (e.g., f(x) = x), T unused If 5f′(τ)/(8f(τ)) = f″(τ)/(2f′(τ)), t (seemingly) unused Further analysis: Determine the separability condition for eigenvalues Evaluate the eigenvalue positions when separable Evaluate the eigenvector projections onto the canonical basis j_1, ..., j_k Evaluate the fluctuations of the eigenvectors. 91 / 153

236 Applications/Kernel Spectral Clustering 92/153 Isolated eigenvalues: Gaussian inputs (Histogram: eigenvalues of L and of L̂.) Figure: Eigenvalues of L and L̂, k = 3, p = 2048, n = 512, c_1 = c_2 = 1/4, c_3 = 1/2, [µ_a]_j = 4δ_{aj}, C_a = (1 + 2(a−1)/√p) I_p, f(x) = exp(−x/2). 92 / 153

237 Applications/Kernel Spectral Clustering 93/153 Theoretical Findings versus MNIST (Histogram: eigenvalues of L.) Figure: Eigenvalues of L (red) and of the equivalent Gaussian model L̂ (white), MNIST data, p = 784, n = . 93 / 153

238 Applications/Kernel Spectral Clustering 93/153 Theoretical Findings versus MNIST (Histogram: eigenvalues of L versus eigenvalues of L̂, as if from the Gaussian model.) Figure: Eigenvalues of L (red) and of the equivalent Gaussian model L̂ (white), MNIST data, p = 784, n = . 93 / 153

239 Applications/Kernel Spectral Clustering 94/153 Theoretical Findings versus MNIST Figure: Leading four eigenvectors of D^{−1/2}KD^{−1/2} for MNIST data (red) and theoretical findings (blue). 94 / 153

240 Applications/Kernel Spectral Clustering 94/153 Theoretical Findings versus MNIST Figure: Leading four eigenvectors of D^{−1/2}KD^{−1/2} for MNIST data (red) and theoretical findings (blue). 94 / 153


More information

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad

More information

Archive of past papers, solutions and homeworks for. MATH 224, Linear Algebra 2, Spring 2013, Laurence Barker

Archive of past papers, solutions and homeworks for. MATH 224, Linear Algebra 2, Spring 2013, Laurence Barker Archive of past papers, solutions and homeworks for MATH 224, Linear Algebra 2, Spring 213, Laurence Barker version: 4 June 213 Source file: archfall99.tex page 2: Homeworks. page 3: Quizzes. page 4: Midterm

More information

Random Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws. Symeon Chatzinotas February 11, 2013 Luxembourg

Random Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws. Symeon Chatzinotas February 11, 2013 Luxembourg Random Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws Symeon Chatzinotas February 11, 2013 Luxembourg Outline 1. Random Matrix Theory 1. Definition 2. Applications 3. Asymptotics 2. Ensembles

More information

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as L30-1 EEL 5544 Noise in Linear Systems Lecture 30 OTHER TRANSFORMS For a continuous, nonnegative RV X, the Laplace transform of X is X (s) = E [ e sx] = 0 f X (x)e sx dx. For a nonnegative RV, the Laplace

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

1 Math 241A-B Homework Problem List for F2015 and W2016

1 Math 241A-B Homework Problem List for F2015 and W2016 1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let

More information

Invertibility of random matrices

Invertibility of random matrices University of Michigan February 2011, Princeton University Origins of Random Matrix Theory Statistics (Wishart matrices) PCA of a multivariate Gaussian distribution. [Gaël Varoquaux s blog gael-varoquaux.info]

More information

Lecture I: Asymptotics for large GUE random matrices

Lecture I: Asymptotics for large GUE random matrices Lecture I: Asymptotics for large GUE random matrices Steen Thorbjørnsen, University of Aarhus andom Matrices Definition. Let (Ω, F, P) be a probability space and let n be a positive integer. Then a random

More information

Applications of Robust Optimization in Signal Processing: Beamforming and Power Control Fall 2012

Applications of Robust Optimization in Signal Processing: Beamforming and Power Control Fall 2012 Applications of Robust Optimization in Signal Processing: Beamforg and Power Control Fall 2012 Instructor: Farid Alizadeh Scribe: Shunqiao Sun 12/09/2012 1 Overview In this presentation, we study the applications

More information

5 Operations on Multiple Random Variables

5 Operations on Multiple Random Variables EE360 Random Signal analysis Chapter 5: Operations on Multiple Random Variables 5 Operations on Multiple Random Variables Expected value of a function of r.v. s Two r.v. s: ḡ = E[g(X, Y )] = g(x, y)f X,Y

More information

Compound matrices and some classical inequalities

Compound matrices and some classical inequalities Compound matrices and some classical inequalities Tin-Yau Tam Mathematics & Statistics Auburn University Dec. 3, 04 We discuss some elegant proofs of several classical inequalities of matrices by using

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

OR MSc Maths Revision Course

OR MSc Maths Revision Course OR MSc Maths Revision Course Tom Byrne School of Mathematics University of Edinburgh t.m.byrne@sms.ed.ac.uk 15 September 2017 General Information Today JCMB Lecture Theatre A, 09:30-12:30 Mathematics revision

More information

The Multivariate Normal Distribution. In this case according to our theorem

The Multivariate Normal Distribution. In this case according to our theorem The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this

More information

Operator norm convergence for sequence of matrices and application to QIT

Operator norm convergence for sequence of matrices and application to QIT Operator norm convergence for sequence of matrices and application to QIT Benoît Collins University of Ottawa & AIMR, Tohoku University Cambridge, INI, October 15, 2013 Overview Overview Plan: 1. Norm

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Free deconvolution for signal processing applications

Free deconvolution for signal processing applications IEEE TRANSACTIONS ON INFORMATION THEORY, VOL., NO., JANUARY 27 Free deconvolution for signal processing applications Øyvind Ryan, Member, IEEE, Mérouane Debbah, Member, IEEE arxiv:cs.it/725v 4 Jan 27 Abstract

More information

Minimum variance portfolio optimization in the spiked covariance model

Minimum variance portfolio optimization in the spiked covariance model Minimum variance portfolio optimization in the spiked covariance model Liusha Yang, Romain Couillet, Matthew R Mckay To cite this version: Liusha Yang, Romain Couillet, Matthew R Mckay Minimum variance

More information

Lecture 4: Analysis of MIMO Systems

Lecture 4: Analysis of MIMO Systems Lecture 4: Analysis of MIMO Systems Norms The concept of norm will be extremely useful for evaluating signals and systems quantitatively during this course In the following, we will present vector norms

More information

Convergence of Eigenspaces in Kernel Principal Component Analysis

Convergence of Eigenspaces in Kernel Principal Component Analysis Convergence of Eigenspaces in Kernel Principal Component Analysis Shixin Wang Advanced machine learning April 19, 2016 Shixin Wang Convergence of Eigenspaces April 19, 2016 1 / 18 Outline 1 Motivation

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Matrix Theory. A.Holst, V.Ufnarovski

Matrix Theory. A.Holst, V.Ufnarovski Matrix Theory AHolst, VUfnarovski 55 HINTS AND ANSWERS 9 55 Hints and answers There are two different approaches In the first one write A as a block of rows and note that in B = E ij A all rows different

More information

1 Tridiagonal matrices

1 Tridiagonal matrices Lecture Notes: β-ensembles Bálint Virág Notes with Diane Holcomb 1 Tridiagonal matrices Definition 1. Suppose you have a symmetric matrix A, we can define its spectral measure (at the first coordinate

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

Part IA. Vectors and Matrices. Year

Part IA. Vectors and Matrices. Year Part IA Vectors and Matrices Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2018 Paper 1, Section I 1C Vectors and Matrices For z, w C define the principal value of z w. State de Moivre s

More information

On the principal components of sample covariance matrices

On the principal components of sample covariance matrices On the principal components of sample covariance matrices Alex Bloemendal Antti Knowles Horng-Tzer Yau Jun Yin February 4, 205 We introduce a class of M M sample covariance matrices Q which subsumes and

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information