Random Matrices for Big Data Signal Processing and Machine Learning
Random Matrices for Big Data Signal Processing and Machine Learning (ICASSP 2017, New Orleans). Romain Couillet and Hafiz Tiomoko Ali, CentraleSupélec, France. March 2017.
Outline

- Basics of Random Matrix Theory
  - Motivation: Large Sample Covariance Matrices
  - The Stieltjes Transform Method
  - Spiked Models
  - Other Common Random Matrix Models
- Applications
  - Random Matrices and Robust Estimation
  - Spectral Clustering Methods and Random Matrices
    - Community Detection on Graphs
    - Kernel Spectral Clustering
    - Kernel Spectral Clustering: Subspace Clustering
    - Semi-supervised Learning
    - Support Vector Machines
    - Neural Networks: Extreme Learning Machines
- Perspectives
Basics of Random Matrix Theory / Motivation: Large Sample Covariance Matrices
Context

Baseline scenario: x_1, ..., x_n ∈ C^N (or R^N) i.i.d. with E[x_1] = 0, E[x_1 x_1*] = C_N.

- If x_1 ~ N(0, C_N), the maximum-likelihood estimator of C_N is the sample covariance matrix (SCM)
  Ĉ_N = (1/n) Σ_{i=1}^n x_i x_i*.
- If n → ∞, then, by the strong law of large numbers, Ĉ_N → C_N a.s., or equivalently, in spectral norm, ‖Ĉ_N − C_N‖ → 0 a.s.

Random Matrix Regime

- No longer valid if N, n → ∞ with N/n → c ∈ (0, ∞): ‖Ĉ_N − C_N‖ does not converge to 0.
- For practical N, n with N ~ n, this leads to dramatically wrong conclusions.
- Even for N = n/…
The Large Dimensional Fallacies

Setting: x_i ∈ C^N i.i.d., x_1 ~ CN(0, I_N); assume N = N(n) such that N/n → c > 1.

- Then, joint point-wise convergence:
  max_{1≤i,j≤N} |[Ĉ_N − I_N]_{ij}| = max_{1≤i,j≤N} |(1/n)⟨X_{i,·}, X_{j,·}⟩ − δ_{ij}| → 0 a.s.
- However, eigenvalue mismatch: since Ĉ_N has rank at most n < N,
  0 = λ_1(Ĉ_N) = ... = λ_{N−n}(Ĉ_N) ≤ λ_{N−n+1}(Ĉ_N) ≤ ... ≤ λ_N(Ĉ_N)
  1 = λ_1(I_N) = ... = λ_N(I_N)
- Hence, no convergence in spectral norm.
The Marčenko-Pastur law

Figure: Histogram of the eigenvalues of Ĉ_N for N = 500, n = 2000, C_N = I_N, against the Marčenko-Pastur law.
The Marčenko-Pastur law

Definition (Empirical Spectral Density). The empirical spectral density (e.s.d.) μ_N of a Hermitian matrix A_N ∈ C^{N×N} is
  μ_N = (1/N) Σ_{i=1}^N δ_{λ_i(A_N)}.

Theorem (Marčenko-Pastur Law [Marčenko, Pastur 67]). Let X_N ∈ C^{N×n} have i.i.d. zero mean, unit variance entries. As N, n → ∞ with N/n → c ∈ (0, ∞), the e.s.d. μ_N of (1/n) X_N X_N* satisfies
  μ_N → μ_c a.s., weakly,
where
- μ_c({0}) = max{0, 1 − c^{−1}},
- on (0, ∞), μ_c has continuous density f_c supported on [(1 − √c)², (1 + √c)²],
  f_c(x) = (1/(2πcx)) √((x − (1 − √c)²)((1 + √c)² − x)).
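The statement above is easy to check numerically. The sketch below (an illustrative simulation, not part of the slides; sizes, seed, and tolerances are free choices) compares the spectrum of (1/n) X_N X_N* with the Marčenko-Pastur support and mass.

```python
# Quick numerical check of the Marchenko-Pastur law (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
N, n = 500, 2000                      # ratio c = N/n = 1/4
c = N / n
X = rng.standard_normal((N, n))       # i.i.d. entries, zero mean, unit variance
eigs = np.linalg.eigvalsh(X @ X.T / n)

# Support edges predicted by the MP law
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2

# MP density f_c on (a, b)
def f_c(x):
    return np.sqrt(np.maximum((x - a) * (b - x), 0)) / (2 * np.pi * c * x)

# Compare the empirical fraction of eigenvalues in [a, m] with the MP mass
m = (a + b) / 2
grid = np.linspace(a, m, 10_000)
mp_mass = f_c(grid).sum() * (grid[1] - grid[0])   # simple Riemann sum
emp_mass = np.mean(eigs <= m)
print(emp_mass, mp_mass)              # close for large N
```

All eigenvalues also fall near [(1 − √c)², (1 + √c)²], as the "no eigenvalue outside the support" result of a later section predicts.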
The Marčenko-Pastur law

Figure: Marčenko-Pastur density f_c for different limit ratios c = lim_N N/n, including c = 0.1.
Basics of Random Matrix Theory / The Stieltjes Transform Method
The Stieltjes transform

Definition (Stieltjes Transform). For μ a real probability measure with support supp(μ), the Stieltjes transform m_μ is defined, for z ∈ C \ supp(μ), as
  m_μ(z) = ∫ 1/(t − z) μ(dt).

Property (Inverse Stieltjes Transform). For a < b continuity points of μ,
  μ([a, b]) = lim_{ε↓0} (1/π) ∫_a^b ℑ[m_μ(x + iε)] dx.
Besides, if μ has a density f at x,
  f(x) = lim_{ε↓0} (1/π) ℑ[m_μ(x + iε)].
The Stieltjes transform

Property (Relation to e.s.d.). If μ is the e.s.d. of a Hermitian matrix A ∈ C^{N×N} (i.e., μ = (1/N) Σ_{i=1}^N δ_{λ_i(A)}), then
  m_μ(z) = (1/N) tr (A − z I_N)^{−1}.

Proof:
  m_μ(z) = ∫ μ(dt)/(t − z) = (1/N) Σ_{i=1}^N 1/(λ_i(A) − z) = (1/N) tr (diag{λ_i(A)} − z I_N)^{−1} = (1/N) tr (A − z I_N)^{−1}.
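This identity can be verified directly on any Hermitian matrix; the sketch below (illustrative sizes, not from the slides) compares the normalized trace of the resolvent with the Stieltjes transform of the e.s.d.

```python
# Sanity check: (1/N) tr (A - zI)^{-1} equals the Stieltjes transform
# of the empirical spectral distribution of A.
import numpy as np

rng = np.random.default_rng(1)
N = 200
G = rng.standard_normal((N, N))
A = (G + G.T) / 2                     # Hermitian (real symmetric) test matrix
z = 1.0 + 1.0j                        # any z off the real axis

m_resolvent = np.trace(np.linalg.inv(A - z * np.eye(N))) / N
m_esd = np.mean(1 / (np.linalg.eigvalsh(A) - z))
print(abs(m_resolvent - m_esd))       # ~ 0 up to numerical error
```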
The Stieltjes transform

Property (Stieltjes transform of Gram matrices). For X ∈ C^{N×n}, let
- μ be the e.s.d. of XX*,
- μ̄ be the e.s.d. of X*X.
Then
  m_μ(z) = (n/N) m_μ̄(z) − ((N − n)/N) (1/z).

Proof:
  m_μ(z) = (1/N) Σ_{i=1}^N 1/(λ_i(XX*) − z) = (1/N) Σ_{i=1}^n 1/(λ_i(X*X) − z) + (1/N)(N − n) · 1/(0 − z).
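Since XX* and X*X share their nonzero eigenvalues, the relation above is an exact algebraic identity; a quick numerical confirmation (sizes and seed are illustrative choices):

```python
# Check of the Gram-matrix relation between the Stieltjes transforms of
# XX* (N x N) and X*X (n x n): m_mu(z) = (n/N) m_mubar(z) - ((N - n)/N) / z.
import numpy as np

rng = np.random.default_rng(2)
N, n = 120, 80
X = rng.standard_normal((N, n))
z = 0.5 + 1.0j

m_mu = np.mean(1 / (np.linalg.eigvalsh(X @ X.T) - z))      # e.s.d. of XX*
m_mubar = np.mean(1 / (np.linalg.eigvalsh(X.T @ X) - z))   # e.s.d. of X*X
print(abs(m_mu - (n / N) * m_mubar + ((N - n) / N) / z))   # ~ 0
```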
The Stieltjes transform

Three fundamental lemmas used in all proofs.

Lemma (Resolvent Identity). For A, B ∈ C^{N×N} invertible,
  A^{−1} − B^{−1} = A^{−1} (B − A) B^{−1}.

Corollary. For t ∈ C, x ∈ C^N, A ∈ C^{N×N}, with A and A + t x x* invertible,
  (A + t x x*)^{−1} x = A^{−1} x / (1 + t x* A^{−1} x).
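Both identities are finite-dimensional linear algebra and can be checked on random matrices; a sketch (with positive definite test matrices chosen for guaranteed invertibility, an assumption of this illustration):

```python
# Numerical check of the resolvent identity and its Sherman-Morrison-type corollary.
import numpy as np

rng = np.random.default_rng(3)
N = 50
G, H = rng.standard_normal((2, N, N))
A = G @ G.T + np.eye(N)               # positive definite, hence invertible
B = H @ H.T + np.eye(N)
x = rng.standard_normal(N)
t = 0.7

# Resolvent identity: A^{-1} - B^{-1} = A^{-1} (B - A) B^{-1}
lhs = np.linalg.inv(A) - np.linalg.inv(B)
rhs = np.linalg.inv(A) @ (B - A) @ np.linalg.inv(B)
print(np.abs(lhs - rhs).max())

# Corollary: (A + t x x*)^{-1} x = A^{-1} x / (1 + t x* A^{-1} x)
lhs2 = np.linalg.solve(A + t * np.outer(x, x), x)
Ainv_x = np.linalg.solve(A, x)
rhs2 = Ainv_x / (1 + t * x @ Ainv_x)
print(np.abs(lhs2 - rhs2).max())
```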
The Stieltjes transform

Lemma (Rank-one perturbation). For A, B ∈ C^{N×N} Hermitian nonnegative definite, μ the e.s.d. of A, t > 0, x ∈ C^N, z ∈ C \ supp(μ),
  |(1/N) tr B (A + t x x* − z I_N)^{−1} − (1/N) tr B (A − z I_N)^{−1}| ≤ (1/N) ‖B‖ / dist(z, supp(μ)).
In particular, as N → ∞, if lim sup_N ‖B‖ < ∞,
  (1/N) tr B (A + t x x* − z I_N)^{−1} − (1/N) tr B (A − z I_N)^{−1} → 0.
The Stieltjes transform

Lemma (Trace Lemma). For
- x ∈ C^N with i.i.d. entries of zero mean, unit variance, and finite 2p-th order moment,
- A ∈ C^{N×N} deterministic (or independent of x),
then
  E[ |(1/N) x* A x − (1/N) tr A|^p ] ≤ K ‖A‖^p / N^{p/2}.
In particular, if lim sup_N ‖A‖ < ∞ and x has entries with finite eighth-order moment,
  (1/N) x* A x − (1/N) tr A → 0 a.s.
(by the Markov inequality and the Borel-Cantelli lemma).
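The concentration of the quadratic form can be seen empirically; the sketch below (illustrative sizes, one draw per size) shows the error shrinking as N grows, at the N^{−1/2} rate the lemma suggests.

```python
# Illustration of the trace lemma: (1/N) x* A x concentrates around (1/N) tr A
# when x is independent of A and A has bounded spectral norm.
import numpy as np

rng = np.random.default_rng(4)
errs = []
for N in (100, 400, 1600):
    G = rng.standard_normal((N, N))
    A = (G + G.T) / (2 * np.sqrt(N))  # Hermitian, spectral norm bounded in N
    x = rng.standard_normal(N)
    errs.append(abs(x @ A @ x / N - np.trace(A) / N))
print(errs)                           # decays roughly like N^{-1/2}
```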
Proof of the Marčenko-Pastur law

Theorem (Marčenko-Pastur Law [Marčenko, Pastur 67], restated). Let X_N ∈ C^{N×n} have i.i.d. zero mean, unit variance entries. As N, n → ∞ with N/n → c ∈ (0, ∞), the e.s.d. μ_N of (1/n) X_N X_N* satisfies μ_N → μ_c a.s., weakly, where
- μ_c({0}) = max{0, 1 − c^{−1}},
- on (0, ∞), μ_c has continuous density f_c supported on [(1 − √c)², (1 + √c)²],
  f_c(x) = (1/(2πcx)) √((x − (1 − √c)²)((1 + √c)² − x)).
Proof of the Marčenko-Pastur law

Stieltjes transform approach.

Proof. With μ_N the e.s.d. of (1/n) X_N X_N*,
  m_{μ_N}(z) = (1/N) tr ((1/n) X_N X_N* − z I_N)^{−1} = (1/N) Σ_{i=1}^N [((1/n) X_N X_N* − z I_N)^{−1}]_{ii}.
Write
  X_N = [ y* ; Y_{N−1} ] ∈ C^{N×n}   (y ∈ C^n, Y_{N−1} ∈ C^{(N−1)×n})
so that, for ℑ[z] > 0,
  (1/n) X_N X_N* − z I_N = [ (1/n) y* y − z , (1/n) y* Y_{N−1}* ; (1/n) Y_{N−1} y , (1/n) Y_{N−1} Y_{N−1}* − z I_{N−1} ].
Proof of the Marčenko-Pastur law

Proof (continued). From the block matrix inverse formula
  [A B; C D]^{−1} = [ (A − B D^{−1} C)^{−1} , −A^{−1} B (D − C A^{−1} B)^{−1} ; −D^{−1} C (A − B D^{−1} C)^{−1} , (D − C A^{−1} B)^{−1} ]
we have
  [((1/n) X_N X_N* − z I_N)^{−1}]_{11} = 1 / ( −z − z (1/n) y* ((1/n) Y_{N−1}* Y_{N−1} − z I_n)^{−1} y ).
By the Trace Lemma, as N, n → ∞,
  [((1/n) X_N X_N* − z I_N)^{−1}]_{11} − 1 / ( −z − z (1/n) tr ((1/n) Y_{N−1}* Y_{N−1} − z I_n)^{−1} ) → 0 a.s.
Proof of the Marčenko-Pastur law

Proof (continued). By the Rank-1 Perturbation Lemma (X_N* X_N = Y_{N−1}* Y_{N−1} + y y*), as N, n → ∞,
  [((1/n) X_N X_N* − z I_N)^{−1}]_{11} − 1 / ( −z − z (1/n) tr ((1/n) X_N* X_N − z I_n)^{−1} ) → 0 a.s.
Since
  (1/n) tr ((1/n) X_N* X_N − z I_n)^{−1} = (1/n) tr ((1/n) X_N X_N* − z I_N)^{−1} + ((N − n)/n) (1/z),
this becomes
  [((1/n) X_N X_N* − z I_N)^{−1}]_{11} − 1 / ( 1 − N/n − z − z (1/n) tr ((1/n) X_N X_N* − z I_N)^{−1} ) → 0 a.s.
Repeating for entries (2,2), ..., (N,N), and averaging, we get (for ℑ[z] > 0)
  m_{μ_N}(z) − 1 / ( 1 − N/n − z − z (N/n) m_{μ_N}(z) ) → 0 a.s.
Proof of the Marčenko-Pastur law

Proof (continued). Then m_{μ_N}(z) → m(z) a.s., with m(z) the solution to
  m(z) = 1 / (1 − c − z − c z m(z)),
i.e. (with the branch of the square root chosen such that m(z) → 0 as |z| → ∞),
  m(z) = (1 − c)/(2cz) − 1/(2c) + √((z − (1 − √c)²)(z − (1 + √c)²)) / (2cz).
Finally, by the inverse Stieltjes transform, for x > 0,
  lim_{ε↓0} (1/π) ℑ[m(x + iε)] = (1/(2πcx)) √(((1 + √c)² − x)(x − (1 − √c)²)) · 1_{x ∈ [(1−√c)², (1+√c)²]},
and for x = 0,
  lim_{ε↓0} ε ℑ[m(iε)] = (1 − c^{−1}) 1_{c > 1}.
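The self-consistent equation obtained in the proof can be solved numerically and compared with the empirical Stieltjes transform; a sketch (illustrative sizes, seed, and choice of z, not from the slides):

```python
# The empirical Stieltjes transform of (1/n) X X* approximately satisfies the
# Marchenko-Pastur fixed point m = 1 / (1 - c - z - c z m).
import numpy as np

rng = np.random.default_rng(5)
N, n = 400, 1200
c = N / n
X = rng.standard_normal((N, n))
z = 1.0 + 1.0j                        # any z in the upper half plane

m_emp = np.mean(1 / (np.linalg.eigvalsh(X @ X.T / n) - z))

# Solve the fixed point by simple iteration started in C^+
m = 1j
for _ in range(500):
    m = 1 / (1 - c - z - c * z * m)
print(abs(m_emp - m))                 # small for large N
```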
Sample Covariance Matrices

Theorem (Sample Covariance Matrix Model [Silverstein, Bai 95]). Let Y_N = C_N^{1/2} X_N ∈ C^{N×n}, where
- C_N ∈ C^{N×N} is nonnegative definite with e.s.d. ν_N → ν weakly,
- X_N ∈ C^{N×n} has i.i.d. entries of zero mean and unit variance.
As N, n → ∞, N/n → c ∈ (0, ∞), the e.s.d. μ̄_N of (1/n) Y_N* Y_N ∈ C^{n×n} satisfies
  μ̄_N → μ̄ a.s., weakly,
with m_μ̄(z), ℑ[z] > 0, the unique solution with ℑ[m_μ̄(z)] > 0 of
  m_μ̄(z) = −( z − c ∫ t/(1 + t m_μ̄(z)) ν(dt) )^{−1}.
Moreover, μ̄ is continuous on R_+ and real analytic wherever positive.

Immediate corollary: for μ_N the e.s.d. of (1/n) Y_N Y_N* = (1/n) Σ_{i=1}^n C_N^{1/2} x_i x_i* C_N^{1/2},
  μ_N → μ a.s., weakly, with μ̄ = c μ + (1 − c) δ_0.

Figure: Histogram of the eigenvalues of (1/n) Y_N* Y_N, n = 3000, N = 300, with C_N diagonal with evenly weighted masses at (i) 1, 3, 7 and (ii) 1, 3, 4.
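The implicit equation can be solved by fixed-point iteration for a discrete ν; the sketch below uses the setting of the figure (evenly weighted masses at 1, 3, 7), with reduced, illustrative dimensions for speed, and compares the solution with the empirical Stieltjes transform of (1/n) Y_N* Y_N.

```python
# Solving the Silverstein-Bai fixed point for nu with equal masses at 1, 3, 7
# and comparing with the empirical Stieltjes transform of (1/n) Y* Y.
import numpy as np

rng = np.random.default_rng(6)
N, n = 150, 1500
c = N / n
ts = np.array([1.0, 3.0, 7.0])        # support of nu, equal weights 1/3

C_sqrt = np.sqrt(np.repeat(ts, N // 3))
Y = C_sqrt[:, None] * rng.standard_normal((N, n))   # Y = C^{1/2} X
z = 2.0 + 1.0j

m_emp = np.mean(1 / (np.linalg.eigvalsh(Y.T @ Y / n) - z))

# Fixed-point iteration for m(z), started in C^+
m = 1j
for _ in range(2000):
    m = -1 / (z - c * np.mean(ts / (1 + ts * m)))
print(abs(m_emp - m))                 # small for large N, n
```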
Further Models and Deterministic Equivalents

Theorem (Doubly-correlated i.i.d. matrices). Let B_N = C_N^{1/2} X_N T_N X_N* C_N^{1/2}, with e.s.d. μ_N, where
- X_N ∈ C^{N×n} has i.i.d. entries of zero mean and variance 1/n,
- C_N is Hermitian nonnegative definite, T_N is diagonal nonnegative,
- lim sup_N max(‖C_N‖, ‖T_N‖) < ∞.
Denote c = N/n. Then, as N, n → ∞ with bounded ratio c, for z ∈ C \ R,
  m_{μ_N}(z) − m_N(z) → 0 a.s.,  m_N(z) = (1/N) tr ( ē_N(z) C_N − z I_N )^{−1},
with ē_N(z) the unique solution in {z ∈ C^+, ē_N(z) ∈ C^+} or {z ∈ R_−, ē_N(z) ∈ R_+} of
  e_N(z) = (1/N) tr C_N ( ē_N(z) C_N − z I_N )^{−1}
  ē_N(z) = (1/n) tr T_N ( I_n + c e_N(z) T_N )^{−1}.
Other Refined Sample Covariance Models

Side note on other models. Similar results hold for multiple matrix models:
- Information-plus-noise: Y_N = A_N + X_N, A_N deterministic
- Variance profile: Y_N = P_N ⊙ X_N (entry-wise product)
- Per-column covariance: Y_N = [y_1, ..., y_n], y_i = C_{N,i}^{1/2} x_i
- etc.
Basics of Random Matrix Theory / Spiked Models
No Eigenvalue Outside the Support

Theorem (No Eigenvalue Outside the Support [Silverstein, Bai 98]). Let Y_N = C_N^{1/2} X_N ∈ C^{N×n}, where
- C_N ∈ C^{N×N} is nonnegative definite with e.s.d. ν_N → ν weakly,
- E[|X_N,ij|^4] < ∞,
- X_N ∈ C^{N×n} has i.i.d. entries of zero mean and unit variance,
- max_i dist(λ_i(C_N), supp(ν)) → 0.
Let μ̄ be the limiting e.s.d. of (1/n) Y_N* Y_N as before, and let [a, b] ⊂ R \ supp(μ̄). Then, for all large n,
  { λ_i((1/n) Y_N* Y_N) }_{i=1}^n ∩ [a, b] = ∅
almost surely.

In practice: this means that, for N, n large, eigenvalues of (1/n) Y_N* Y_N cannot be found at macroscopic distance from the bulk.
Spiked Models

Breaking the rules. If we break Rule 1 (E[|X_ij|^4] < ∞): infinitely many eigenvalues may wander away from supp(μ).

Figure: Eigenvalues {λ_i}_{i=1}^n against μ, for E[X_ij^4] < ∞ and for E[X_ij^4] = ∞.
Spiked Models

If we break Rule 2: C_N may create isolated eigenvalues of (1/n) Y_N Y_N*, called spikes.

Figure: Eigenvalues of (1/n) Y_N Y_N* for C_N = diag(1, ..., 1, 2, 2, 3, 3) with N − 4 ones, N = 500; isolated eigenvalues appear near 1 + ω_1 + c(1 + ω_1)/ω_1 and 1 + ω_2 + c(1 + ω_2)/ω_2.
Spiked Models

Theorem (Eigenvalues [Baik, Silverstein 06]). Let Y_N = C_N^{1/2} X_N, with
- X_N with i.i.d. zero mean, unit variance entries, E[|X_N,ij|^4] < ∞,
- C_N = I_N + P, P = U Ω U*, where, for K fixed, Ω = diag(ω_1, ..., ω_K) ∈ R^{K×K}, with ω_1 ≥ ... ≥ ω_K > 0.
Then, as N, n → ∞, N/n → c ∈ (0, ∞), denoting λ_m = λ_m((1/n) Y_N Y_N*),
- if ω_m > √c,  λ_m → 1 + ω_m + c (1 + ω_m)/ω_m > (1 + √c)² a.s.;
- if ω_m ∈ (0, √c],  λ_m → (1 + √c)² a.s.
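The phase transition is visible in simulation; the sketch below (illustrative sizes and a rank-one spike carried by u = e_1, choices made for this illustration only) places one eigenvalue at the predicted position above the bulk edge.

```python
# Illustration of the spike phase transition: an isolated sample eigenvalue
# appears near 1 + w + c (1 + w)/w when w > sqrt(c).
import numpy as np

rng = np.random.default_rng(7)
N, n = 600, 1800
c = N / n                              # c = 1/3, threshold sqrt(c) ~ 0.577
w = 2.0                                # spike strength above the threshold
d = np.ones(N); d[0] = 1 + w           # C_N = I_N + w u u*, with u = e_1
Y = np.sqrt(d)[:, None] * rng.standard_normal((N, n))
lam = np.linalg.eigvalsh(Y @ Y.T / n)  # ascending order

bulk_edge = (1 + np.sqrt(c))**2        # right edge of the Marchenko-Pastur bulk
spike_limit = 1 + w + c * (1 + w) / w  # asymptotic position of the spike
print(lam[-1], spike_limit, bulk_edge)
```

The largest eigenvalue sits near spike_limit while the second largest stays at the bulk edge.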
Spiked Models

Proof. Two ingredients: algebraic calculus + trace lemma.

Find the eigenvalues away from those of (1/n) X_N X_N*:
  0 = det((1/n) Y_N Y_N* − λ I_N)
    = det(C_N) det((1/n) X_N X_N* − λ C_N^{−1})
    = det(C_N) det((1/n) X_N X_N* − λ I_N + λ(I_N − C_N^{−1}))
    = det(C_N) det((1/n) X_N X_N* − λ I_N) det( I_N + λ (I_N − C_N^{−1}) ((1/n) X_N X_N* − λ I_N)^{−1} ).

Use the low rank property: I_N − C_N^{−1} = I_N − (I_N + U Ω U*)^{−1} = U (I_K + Ω^{−1})^{−1} U*, Ω ∈ C^{K×K}. Hence
  0 = det((1/n) X_N X_N* − λ I_N) det( I_N + λ U (I_K + Ω^{−1})^{−1} U* ((1/n) X_N X_N* − λ I_N)^{−1} ).
Spiked Models

Proof (continued). By Sylvester's identity (det(I + AB) = det(I + BA)),
  0 = det((1/n) X_N X_N* − λ I_N) det( I_K + λ (I_K + Ω^{−1})^{−1} U* ((1/n) X_N X_N* − λ I_N)^{−1} U ).
- No eigenvalue outside the support [Bai, Sil 98]: det((1/n) X_N X_N* − λ I_N) has no zero beyond (1 + √c)² for all large n a.s.
- Extension of the Trace Lemma: for each z ∈ C \ supp(μ),
  U* ((1/n) X_N X_N* − z I_N)^{−1} U → m_μ(z) I_K a.s.
  (X_N being almost unitarily invariant, U can be seen as formed of random i.i.d.-like vectors).
As a result, for all large n a.s., the zeros of
  det( I_K + λ (I_K + Ω^{−1})^{−1} U* ((1/n) X_N X_N* − λ I_N)^{−1} U )
approach those of
  ∏_{m=1}^M ( 1 + λ (1 + ω_m^{−1})^{−1} m_μ(λ) )^{k_m} = ∏_{m=1}^M ( 1 + λ ω_m m_μ(λ)/(1 + ω_m) )^{k_m},
with ω_1, ..., ω_M the distinct values on the diagonal of Ω and k_1, ..., k_M their multiplicities.
Spiked Models

Proof (continued). The limiting solutions are the zeros (with multiplicity) of
  1 + (λ ω_m/(1 + ω_m)) m_μ(λ) = 0.
Using the Marčenko-Pastur law properties (m_μ(z) = (1 − c − z − c z m_μ(z))^{−1}), this yields
  λ ∈ { 1 + ω_m + c (1 + ω_m)/ω_m }_{m=1}^M.
Spiked Models

Theorem (Eigenvectors [Paul 07]). Let Y_N = C_N^{1/2} X_N, with
- X_N with i.i.d. zero mean, unit variance, finite fourth-order moment entries,
- C_N = I_N + P, P = Σ_{i=1}^K ω_i u_i u_i*, ω_1 > ... > ω_K > 0.
Then, as N, n → ∞, N/n → c ∈ (0, ∞), for a, b ∈ C^N deterministic and û_i the eigenvector of λ_i((1/n) Y_N Y_N*),
  a* û_i û_i* b − ((1 − c ω_i^{−2})/(1 + c ω_i^{−1})) a* u_i u_i* b · 1_{ω_i > √c} → 0 a.s.
In particular,
  |û_i* u_i|² → ((1 − c ω_i^{−2})/(1 + c ω_i^{−1})) 1_{ω_i > √c} a.s.

Proof: based on a Cauchy integral + similar ingredients as in the eigenvalue proof,
  a* û_i û_i* b = −(1/(2πi)) ∮_{C_i} a* ((1/n) Y_N Y_N* − z I_N)^{−1} b dz
for C_i a contour circling around λ_i only.
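The alignment formula can also be observed directly; a sketch in the rank-one case (illustrative sizes, u_1 = e_1 chosen for convenience):

```python
# Illustration of the eigenvector result: past the phase transition, the
# alignment |u1* uhat1|^2 concentrates around (1 - c w^{-2}) / (1 + c w^{-1}).
import numpy as np

rng = np.random.default_rng(8)
N, n = 600, 1800
c = N / n
w = 2.0                                # spike above the threshold sqrt(c)
d = np.ones(N); d[0] = 1 + w           # C_N = I_N + w e1 e1*
Y = np.sqrt(d)[:, None] * rng.standard_normal((N, n))
vals, vecs = np.linalg.eigh(Y @ Y.T / n)

alignment = vecs[0, -1]**2             # |e1* uhat_1|^2, uhat_1 = top eigenvector
limit = (1 - c / w**2) / (1 + c / w)
print(alignment, limit)
```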
Spiked Models

Figure: Simulated versus limiting |û_1* u_1|² for Y_N = C_N^{1/2} X_N, C_N = I_N + ω_1 u_1 u_1*, N/n = 1/3, N = 100, 200, 500, varying the population spike ω_1; the phase transition occurs at ω_1 = √c.
84 Basics of Random Matrix Theory/Spiked Models 36/153 Tracy Widom Theorem Theorem (Phase Transition [Baik,BenArous,Péché 05]) Let Y N = C 1 2 N X N, with X N with i.i.d. complex Gaussian zero mean, unit variance entries, C N = I N + P, P = K i=1 ω iu i u i, ω 1 >... > ω K > 0 (K 0). 36 / 153
85 Basics of Random Matrix Theory/Spiked Models 36/153 Tracy Widom Theorem Theorem (Phase Transition [Baik,BenArous,Péché 05]) Let Y N = C 1 2 N X N, with X N with i.i.d. complex Gaussian zero mean, unit variance entries, C N = I N + P, P = K i=1 ω iu i u i, ω 1 >... > ω K > 0 (K 0). Then, as N, n, N/n c < 1, If ω 1 < c (or K = 0), N 2 3 λ 1 (1 + c) 2 (1 + c) 4 3 c 1 2 L T 2, (complex Tracy Widom law) 36 / 153
86 Basics of Random Matrix Theory/Spiked Models 36/153

Tracy Widom Theorem

Theorem (Phase Transition [Baik, Ben Arous, Péché 05])
Let Y_N = C_N^{1/2} X_N, with X_N having i.i.d. complex Gaussian zero mean, unit variance entries, C_N = I_N + P, P = Σ_{i=1}^K ω_i u_i u_i^*, ω_1 > ... > ω_K > 0 (K ≥ 0). Then, as N, n → ∞ with N/n → c < 1:

If ω_1 < √c (or K = 0),

  N^{2/3} (λ_1 − (1 + √c)^2) / ((1 + √c)^{4/3} c^{1/2}) →_L T_2  (complex Tracy Widom law).

If ω_1 > √c,

  ((1 + ω_1)^2 − c (1 + ω_1)^2 / ω_1^2)^{−1/2} √N [ λ_1 − (1 + ω_1 + c (1 + ω_1)/ω_1) ] →_L N(0, 1).
87 Basics of Random Matrix Theory/Spiked Models 37/153

Tracy Widom Theorem

[Plot: histogram of the centered and scaled λ_1 against the Tracy Widom density T_2.]

Figure: Distribution of N^{2/3} c^{−1/2} (1 + √c)^{−4/3} [λ_1((1/n) X_N X_N^*) − (1 + √c)^2] versus Tracy Widom (T_2), N = 500, n = …
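The phase transition can be observed directly on the largest eigenvalue; below is a sketch (real Gaussian entries, so only the location of λ_1 is checked, not the Tracy Widom fluctuations, and all parameters are illustrative):

```python
import numpy as np

# BBP phase transition: below omega = sqrt(c) the largest eigenvalue sticks to
# the bulk edge (1 + sqrt(c))^2; above it, lambda_1 -> 1 + omega + c(1+omega)/omega.
rng = np.random.default_rng(1)
N, n = 900, 3600                 # c = 1/4, threshold sqrt(c) = 0.5
c = N / n

def lambda1(omega):
    X = rng.standard_normal((N, n))
    X[0, :] *= np.sqrt(1 + omega)        # C_N = I_N + omega e_1 e_1^T
    return np.linalg.eigvalsh(X @ X.T / n)[-1]

edge = (1 + np.sqrt(c)) ** 2             # bulk edge, 2.25 here
below = lambda1(0.1)                     # omega < sqrt(c): no isolated eigenvalue
above = lambda1(2.0)                     # omega > sqrt(c): isolated spike
spike_limit = 1 + 2.0 + c * (1 + 2.0) / 2.0   # 3.375 here
print(below, edge)
print(above, spike_limit)
```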
88 Basics of Random Matrix Theory/Spiked Models 38/153

Other Spiked Models

Similar results hold for multiple matrix models (P deterministic and low rank):
- Additive spiked model: Y_N = (1/n) X X^* + P
- Y_N = (1/n) X^* (I + P) X
- Y_N = (1/n) (X + P)^* (X + P)
- Y_N = (1/n) T^* X^* (I + P) X T
- etc.
89 Basics of Random Matrix Theory/Other Common Random Matrix Models 39/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 39 / 153
90 Basics of Random Matrix Theory/Other Common Random Matrix Models 40/153

The Semi-circle law

Theorem
Let (1/√N) X_N ∈ C^{N×N} be Hermitian with e.s.d. µ_N, such that the [X_N]_{i>j} are i.i.d. with zero mean and unit variance. Then, as N → ∞, µ_N → µ almost surely (weakly), with

  µ(dt) = (1/2π) √((4 − t^2)^+) dt.

In particular, m_µ satisfies

  m_µ(z) = 1 / (−z − m_µ(z)).
91 Basics of Random Matrix Theory/Other Common Random Matrix Models 41/153

The Semi-circle law

[Plot: empirical eigenvalue histogram against the semi-circle density.]

Figure: Histogram of the eigenvalues of a Wigner matrix versus the semi-circle law, for N = …
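A quick numerical sanity check of the semi-circle law (a sketch with Gaussian entries; the interval [−1, 1] and dimension are arbitrary choices):

```python
import numpy as np

# Eigenvalues of a (1/sqrt(N))-scaled Wigner matrix concentrate on [-2, 2]
# with density sqrt(4 - t^2)/(2*pi).
rng = np.random.default_rng(2)
N = 1000
G = rng.standard_normal((N, N))
W = (G + G.T) / np.sqrt(2 * N)          # symmetric, off-diagonal variance 1/N
eigs = np.linalg.eigvalsh(W)

# Compare the empirical fraction of eigenvalues in [-1, 1] with the
# semi-circle mass of that interval (simple Riemann sum).
emp = np.mean(np.abs(eigs) <= 1)
t = np.linspace(-1, 1, 2001)
theory = (np.sqrt(4 - t**2) / (2 * np.pi)).sum() * (t[1] - t[0])
print(emp, theory)                       # both close to 0.61
```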
92 Basics of Random Matrix Theory/Other Common Random Matrix Models 42/153

The Circular law

Theorem
Let X_N ∈ C^{N×N} with e.s.d. µ_N be such that the (1/√N)[X_N]_{ij} are i.i.d. entries with zero mean and unit variance. Then, as N → ∞, µ_N → µ almost surely, with µ the uniform measure on the complex unit disk,

  µ(dz) = (1/π) 1_{|z| ≤ 1} dz.
93 Basics of Random Matrix Theory/Other Common Random Matrix Models 43/153

The Circular law

[Plot: empirical eigenvalues of X_N in the complex plane (real versus imaginary parts), uniformly filling the unit circle.]

Figure: Eigenvalues of X_N with i.i.d. standard Gaussian entries, for N = …
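The circular law is just as easy to visualize numerically; a minimal sketch (real Gaussian entries, so a small excess of real eigenvalues is expected at finite N):

```python
import numpy as np

# Eigenvalues of X/sqrt(N), X with i.i.d. standard entries, fill the unit
# disk uniformly as N grows.
rng = np.random.default_rng(3)
N = 1000
X = rng.standard_normal((N, N)) / np.sqrt(N)
eigs = np.linalg.eigvals(X)

r = np.abs(eigs)
half = np.mean(r <= np.sqrt(0.5))        # uniform on the disk => area fraction 1/2
print(r.max(), half)                     # spectral radius near 1, half near 0.5
```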
94 Basics of Random Matrix Theory/Other Common Random Matrix Models 44/153

Bibliographical references: Maths Book and Tutorial References I

From most accessible to least:
- Couillet, R., & Debbah, M. (2011). Random matrix methods for wireless communications. Cambridge University Press.
- Tao, T. (2012). Topics in random matrix theory (Vol. 132). Providence, RI: American Mathematical Society.
- Bai, Z., & Silverstein, J. W. (2010). Spectral analysis of large dimensional random matrices (Vol. 20). New York: Springer.
- Pastur, L. A., & Shcherbina, M. (2011). Eigenvalue distribution of large random matrices (Vol. 171). Providence, RI: American Mathematical Society.
- Anderson, G. W., Guionnet, A., & Zeitouni, O. (2010). An introduction to random matrices (Vol. 118). Cambridge University Press.
95 Applications/ 45/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 45 / 153
96 Applications/Random Matrices and Robust Estimation 46/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 46 / 153
101 Applications/Random Matrices and Robust Estimation 47/153

Context

Baseline scenario: x_1, ..., x_n ∈ C^N (or R^N) i.i.d. with E[x_1] = 0, E[x_1 x_1^*] = C_N:

If x_1 ~ N(0, C_N), the ML estimator for C_N is the sample covariance matrix (SCM)

  Ĉ_N = (1/n) Σ_{i=1}^n x_i x_i^*.

[Huber 67] If x_1 ~ (1 − ε) N(0, C_N) + ε G, G unknown, robust estimator (n > N)

  Ĉ_N = (1/n) Σ_{i=1}^n max{ l_1, l_2 / ((1/N) x_i^* Ĉ_N^{−1} x_i) } x_i x_i^*   for some l_1, l_2 > 0.

[Maronna 76] If x_1 is elliptical (and n > N), the ML estimator for C_N is given by

  Ĉ_N = (1/n) Σ_{i=1}^n u( (1/N) x_i^* Ĉ_N^{−1} x_i ) x_i x_i^*   for some non-increasing u.

[Pascal 13; Chen 11] If N > n, x_1 elliptical or with outliers, shrinkage extensions

  Ĉ_N(ρ) = (1 − ρ) (1/n) Σ_{i=1}^n x_i x_i^* / ((1/N) x_i^* Ĉ_N^{−1}(ρ) x_i) + ρ I_N
  Č_N(ρ) = B̌_N(ρ) / ((1/N) tr B̌_N(ρ)),  B̌_N(ρ) = (1 − ρ) (1/n) Σ_{i=1}^n x_i x_i^* / ((1/N) x_i^* Č_N^{−1}(ρ) x_i) + ρ I_N.
105 Applications/Random Matrices and Robust Estimation 48/153

Context

Results only known for N fixed and n → ∞: not appropriate in settings of interest today (BigData, array processing, MIMO).

We study such Ĉ_N in the regime N, n → ∞, N/n → c ∈ (0, ∞).

Math interest:
- limiting eigenvalue distribution of Ĉ_N
- limiting values and fluctuations of functionals f(Ĉ_N)

Application interest:
- comparison between SCM and robust estimators
- performance of robust/non-robust estimation methods
- improvement thereof (by proper parametrization)
107 Applications/Random Matrices and Robust Estimation 49/153

Model Description

Definition (Maronna's Estimator)
For x_1, ..., x_n ∈ C^N with n > N, Ĉ_N is the solution (upon existence and uniqueness) of

  Ĉ_N = (1/n) Σ_{i=1}^n u( (1/N) x_i^* Ĉ_N^{−1} x_i ) x_i x_i^*

where u : [0, ∞) → (0, ∞) is
- non-increasing
- such that φ(x) ≜ x u(x) is increasing, with supremum φ_∞ satisfying 1 < φ_∞ < c^{−1}, c ∈ (0, 1).
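The defining equation is a fixed point and is usually solved by plain iteration; the following is a minimal sketch, with u(x) = (1 + t)/(t + x) as one classical weight meeting the stated conditions (u non-increasing, φ(x) = x u(x) increasing to φ_∞ = 1 + t, and 1 < φ_∞ < c^{−1} for the chosen t, N, n):

```python
import numpy as np

# Fixed-point iteration for Maronna's estimator (identity scatter, illustrative
# parameters; t = 0.5 gives phi_inf = 1.5 < c^{-1} = n/N = 10).
rng = np.random.default_rng(4)
N, n = 50, 500
t = 0.5
u = lambda x: (1 + t) / (t + x)

# Elliptical-like samples x_i = sqrt(tau_i) w_i with impulsive tau_i
tau = rng.gamma(shape=0.5, scale=2.0, size=n)
X = np.sqrt(tau) * rng.standard_normal((N, n))

C_hat = np.eye(N)
for _ in range(500):
    d = np.sum(X * np.linalg.solve(C_hat, X), axis=0) / N   # (1/N) x_i' C^{-1} x_i
    C_new = (X * u(d)) @ X.T / n
    delta = np.linalg.norm(C_new - C_hat) / np.linalg.norm(C_hat)
    C_hat = C_new
    if delta < 1e-12:
        break

# Residual of the defining fixed-point equation at the returned C_hat
d = np.sum(X * np.linalg.solve(C_hat, X), axis=0) / N
resid = np.linalg.norm((X * u(d)) @ X.T / n - C_hat) / np.linalg.norm(C_hat)
print(resid)
```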
110 Applications/Random Matrices and Robust Estimation 50/153

The Results in a Nutshell

For various models of the x_i's:

First order convergence: ‖Ĉ_N − Ŝ_N‖ → 0 almost surely, for some tractable random matrices Ŝ_N. We only discuss this result here.

Second order results: N^{1−ε} ( a^* Ĉ_N^k b − a^* Ŝ_N^k b ) → 0 almost surely, allowing transfer of CLT results.

Applications:
- improved robust covariance matrix estimation
- improved robust tests / estimators
- specific examples in statistics at large, array processing, statistical finance, etc.
114 Applications/Random Matrices and Robust Estimation 51/153

(Elliptical) scenario

Theorem (Large dimensional behavior, elliptical case)
For x_i = √τ_i w_i, τ_i impulsive (random or not), w_i unitarily invariant with ‖w_i‖ = √N, we have ‖Ĉ_N − Ŝ_N‖ → 0 almost surely, with, for some v related to u (v = u ∘ g^{−1}, g(x) = x(1 − cφ(x))^{−1}),

  Ĉ_N ≡ (1/n) Σ_{i=1}^n u( (1/N) x_i^* Ĉ_N^{−1} x_i ) x_i x_i^*,   Ŝ_N ≡ (1/n) Σ_{i=1}^n v(τ_i γ_N) x_i x_i^*

and γ_N the unique solution of

  1 = (1/n) Σ_{j=1}^n γ v(τ_j γ) / (1 + c γ v(τ_j γ)).

Corollaries
- Spectral measure: µ^{Ĉ_N}_N − µ^{Ŝ_N}_N → 0 a.s. (weakly), where µ^X_N ≜ (1/N) Σ_{i=1}^N δ_{λ_i(X)}
- Local convergence: max_{1 ≤ i ≤ N} |λ_i(Ĉ_N) − λ_i(Ŝ_N)| → 0 a.s.
- Norm boundedness: lim sup_N ‖Ĉ_N‖ < ∞. Bounded spectrum (unlike the SCM!)
117 Applications/Random Matrices and Robust Estimation 52/153

Large dimensional behavior

[Plot: eigenvalue histograms of Ĉ_N and Ŝ_N, together with the approximating density.]

Figure: n = 2500, N = 500, C_N = diag(I_125, 3 I_125, 10 I_250), τ_i ~ Γ(.5, 2) i.i.d.
119 Applications/Random Matrices and Robust Estimation 53/153

Elements of Proof

Definition (v and ψ)
Letting g(x) = x(1 − cφ(x))^{−1} (on R^+), define
- v(x) ≜ (u ∘ g^{−1})(x), non-increasing
- ψ(x) ≜ x v(x), increasing and bounded by ψ_∞.

Lemma (Rewriting Ĉ_N)
It holds (with C_N = I_N) that

  Ĉ_N = (1/n) Σ_{i=1}^n τ_i v(τ_i d_i) w_i w_i^*

with (d_1, ..., d_n) ∈ R^n_+ the a.s. unique solution to

  d_i = (1/N) w_i^* Ĉ_{(i)}^{−1} w_i = (1/N) w_i^* ( (1/n) Σ_{j≠i} τ_j v(τ_j d_j) w_j w_j^* )^{−1} w_i,  i = 1, ..., n.
122 Applications/Random Matrices and Robust Estimation 54/153

Elements of Proof

Remark (Quadratic Form close to Trace)
Random matrix insight: ( (1/n) Σ_{j≠i} τ_j v(τ_j d_j) w_j w_j^* )^{−1} is almost independent of w_i, so

  d_i = (1/N) w_i^* ( (1/n) Σ_{j≠i} τ_j v(τ_j d_j) w_j w_j^* )^{−1} w_i ≈ (1/N) tr ( (1/n) Σ_{j≠i} τ_j v(τ_j d_j) w_j w_j^* )^{−1} ≈ γ_N

for some deterministic sequence (γ_N)_{N=1}^∞, irrespective of i.

Lemma (Key Lemma)
Letting e_i ≜ v(τ_i d_i) / v(τ_i γ_N), with γ_N the unique solution to

  1 = (1/n) Σ_{k=1}^n ψ(τ_k γ_N) / (1 + c ψ(τ_k γ_N)),

we have

  max_{1 ≤ i ≤ n} |e_i − 1| → 0 a.s.
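The "quadratic form close to trace" step can be illustrated numerically; a minimal sketch with an arbitrary well-conditioned random matrix in place of the weighted sum (all parameters illustrative):

```python
import numpy as np

# For a vector w with i.i.d. entries independent of A,
# (1/N) w^T A^{-1} w concentrates around (1/N) tr A^{-1}.
rng = np.random.default_rng(5)
N, n = 1000, 3000
W = rng.standard_normal((N, n))
A = W @ W.T / n + 0.1 * np.eye(N)       # random positive definite matrix
w = rng.standard_normal(N)              # fresh vector, independent of A

Ainv = np.linalg.inv(A)
qf = w @ Ainv @ w / N
tr = np.trace(Ainv) / N
print(qf, tr)                           # close, fluctuations of order 1/sqrt(N)
```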
124 Applications/Random Matrices and Robust Estimation 55/153

Proof of the Key Lemma: max_i |e_i − 1| → 0 a.s., e_i = v(τ_i d_i)/v(τ_i γ_N)

Property (Quadratic form and γ_N)

  max_{1 ≤ i ≤ n} | (1/N) w_i^* ( (1/n) Σ_{j≠i} τ_j v(τ_j γ_N) w_j w_j^* )^{−1} w_i − γ_N | → 0 a.s.

Proof of the Property
Uniformity is easy (moments of all orders for the [w_i]_j). By a quadratic-form-close-to-trace approach, we get

  max_{1 ≤ i ≤ n} | (1/N) w_i^* ( (1/n) Σ_{j≠i} τ_j v(τ_j γ_N) w_j w_j^* )^{−1} w_i − m(0) | → 0 a.s.

with m(0) the unique positive solution to [MarPas 67; BaiSil 95]

  m(0) = ( (1/n) Σ_{i=1}^n τ_i v(τ_i γ_N) / (1 + c τ_i v(τ_i γ_N) m(0)) )^{−1}.

γ_N precisely solves this equation, thus m(0) = γ_N.
127 Applications/Random Matrices and Robust Estimation 56/153

Proof of the Key Lemma: max_i |e_i − 1| → 0 a.s., e_i = v(τ_i d_i)/v(τ_i γ_N)

Substitution Trick (case τ_i ∈ [a, b] ⊂ (0, ∞))
Up to relabelling e_1 ≤ ... ≤ e_n, use (recall v(τ_i d_i) = v(τ_i γ_N) e_i)

  v(τ_n γ_N) e_n = v(τ_n d_n)
                = v( τ_n (1/N) w_n^* ( (1/n) Σ_{i<n} τ_i v(τ_i d_i) w_i w_i^* )^{−1} w_n )
                ≤ v( τ_n e_n^{−1} (1/N) w_n^* ( (1/n) Σ_{i<n} τ_i v(τ_i γ_N) w_i w_i^* )^{−1} w_n )
                ≤ v( τ_n e_n^{−1} (γ_N − ε_n) ) a.s.,  ε_n → 0 (slowly).

Use the properties of ψ to get

  ψ(τ_n γ_N) ≤ ψ( τ_n e_n^{−1} γ_N ) (1 − ε_n γ_N^{−1})^{−1}.

Conclusion: if e_n > 1 + l infinitely often, since τ_n ∈ [a, b], on a subsequence

  τ_n → τ_0 > 0,  γ_N → γ_0 > 0,  ψ(τ_0 γ_0) ≤ ψ( τ_0 γ_0 / (1 + l) ),

a contradiction.
128 Applications/Random Matrices and Robust Estimation 57/153

Outlier Data

Theorem (Outlier Rejection)
Observation set

  X = [ x_1, ..., x_{(1−ε_n)n}, a_1, ..., a_{ε_n n} ]

where x_i ~ CN(0, C_N) and a_1, ..., a_{ε_n n} ∈ C^N are deterministic outliers. Then ‖Ĉ_N − Ŝ_N‖ → 0 a.s., where

  Ŝ_N ≜ v(γ_N) (1/n) Σ_{i=1}^{(1−ε_n)n} x_i x_i^* + (1/n) Σ_{i=1}^{ε_n n} v(α_{i,N}) a_i a_i^*

with γ_N and α_{1,N}, ..., α_{ε_n n,N} the unique positive solutions to

  γ_N = (1/N) tr C_N ( (1−ε) v(γ_N) / (1 + c v(γ_N) γ_N) · C_N + (1/n) Σ_{i=1}^{ε_n n} v(α_{i,N}) a_i a_i^* )^{−1}
  α_{i,N} = (1/N) a_i^* ( (1−ε) v(γ_N) / (1 + c v(γ_N) γ_N) · C_N + (1/n) Σ_{j≠i} v(α_{j,N}) a_j a_j^* )^{−1} a_i,  i = 1, ..., ε_n n.
131 Applications/Random Matrices and Robust Estimation 58/153

Outlier Data

For ε_n n = 1,

  Ŝ_N = v( φ^{−1}(1)/(1 − c) ) (1/n) Σ_{i=1}^{n−1} x_i x_i^* + (1/n) v( φ^{−1}(1)/(1 − c) · (1/N) a_1^* C_N^{−1} a_1 ) a_1 a_1^* + o(1).

Outlier rejection relies on (1/N) a_1^* C_N^{−1} a_1 ≷ 1.

For a_i ~ CN(0, D_N), ε_n → ε > 0,

  Ŝ_N = v(γ_n) (1/n) Σ_{i=1}^{(1−ε_n)n} x_i x_i^* + v(α_n) (1/n) Σ_{i=1}^{ε_n n} a_i a_i^*
  γ_n = (1/N) tr C_N ( (1−ε) v(γ_n)/(1 + c v(γ_n) γ_n) · C_N + ε v(α_n)/(1 + c v(α_n) α_n) · D_N )^{−1}
  α_n = (1/N) tr D_N ( (1−ε) v(γ_n)/(1 + c v(γ_n) γ_n) · C_N + ε v(α_n)/(1 + c v(α_n) α_n) · D_N )^{−1}.

For ε_n → 0,

  Ŝ_N = v( φ^{−1}(1)/(1 − c) ) (1/n) Σ_{i=1}^{(1−ε_n)n} x_i x_i^* + (1/n) Σ_{i=1}^{ε_n n} v( φ^{−1}(1)/(1 − c) · (1/N) tr D_N C_N^{−1} ) a_i a_i^*.

Outlier rejection relies on (1/N) tr D_N C_N^{−1} ≷ 1.
134 Applications/Random Matrices and Robust Estimation 59/153

Outlier Data

[Plot: deterministic equivalent eigenvalue distributions of (i) the clean-data SCM (1/((1−ε_n)n)) Σ_{i=1}^{(1−ε_n)n} x_i x_i^*, (ii) (1/n) X X^* or its per-input normalized version, and (iii) Ĉ_N.]

Figure: Limiting eigenvalue distributions. [C_N]_{ij} = .9^{|i−j|}, D_N = I_N, ε = …
135 Applications/Random Matrices and Robust Estimation 60/153

Example of application to statistical finance

Robust matrix-optimized portfolio allocation.

[Plot: annualized standard deviation over time t of portfolios built from Ĉ_ST, Ĉ_P, Ĉ_C, Ĉ_C2, Ĉ_LW, Ĉ_R (N = 45, n = 300).]
136 Applications/Spectral Clustering Methods and Random Matrices 61/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 61 / 153
140 Applications/Spectral Clustering Methods and Random Matrices 62/153

Reminder on Spectral Clustering Methods

Context: Two-step classification of n objects based on a similarity matrix A ∈ R^{n×n}:
1. extraction of the eigenvectors U = [u_1, ..., u_l] associated with the dominant eigenvalues
2. classification of the vectors U_{1,·}, ..., U_{n,·} ∈ R^l using k-means/EM.

[Plot: spectrum of A, a bulk around 0 with isolated spikes, and the corresponding eigenvectors (in practice, shuffled!!).]
143 Applications/Spectral Clustering Methods and Random Matrices 63/153

Reminder on Spectral Clustering Methods

[Plots: Eigenvector 1 and Eigenvector 2 against the object index; then the l-dimensional representation (Eigenvector 1 versus Eigenvector 2), in which the shuffling no longer matters; finally, EM or k-means clustering of that representation.]
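The two-step method can be run end to end on a toy spiked similarity matrix; the following sketch uses rank-one class structure plus Wigner noise, with a simple sign rule standing in for the k-means/EM step (all parameters illustrative):

```python
import numpy as np

# Two-step spectral clustering on A = signal + noise: extract the dominant
# eigenvector, then cluster its entries.
rng = np.random.default_rng(6)
n = 200
labels = np.repeat([0, 1], n // 2)
s = 1.0 - 2.0 * labels                   # +/-1 class signs

G = rng.standard_normal((n, n))
noise = (G + G.T) / np.sqrt(2 * n)       # Wigner noise, bulk in [-2, 2]
A = 5.0 * np.outer(s, s) / n + noise     # isolated spike near 5 + 1/5

u = np.linalg.eigh(A)[1][:, -1]          # step 1: dominant eigenvector
pred = (u > 0).astype(int)               # step 2: clustering of the entries
acc = max(np.mean(pred == labels), np.mean(pred != labels))  # up to label swap
print(acc)
```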
151 Applications/Spectral Clustering Methods and Random Matrices 64/153

The Random Matrix Approach

A two-step method:
1. If A_n is not a standard random matrix, retrieve Ã_n such that ‖A_n − Ã_n‖ → 0 almost surely in operator norm as n → ∞.
   This transfers the crucial properties from A_n to Ã_n:
   - limiting eigenvalue distribution
   - spikes
   - eigenvectors of isolated eigenvalues.
2. From Ã_n, perform a spiked model analysis:
   - exhibit the phase transition phenomenon
   - read the content of the isolated eigenvectors of Ã_n.
154 Applications/Spectral Clustering Methods and Random Matrices 65/153

The Random Matrix Approach

The Spike Analysis: For the noisy-plateau-looking isolated eigenvectors u_1, ..., u_l of Ã_n, write

  u_i = Σ_{a=1}^k ( α_i^a j_a/√(n_a) + σ_i^a w_i^a )

with j_a ∈ R^n the canonical vector of class C_a, w_i^a noise orthogonal to j_a, and evaluate

  α_i^a = (1/√(n_a)) u_i^T j_a,   (σ_i^a)^2 = ‖u_i − α_i^a j_a/√(n_a)‖^2.

⇒ Can be done using complex analysis calculus, e.g.,

  (α_i^a)^2 = (1/n_a) j_a^T u_i u_i^T j_a = −(1/2πı) ∮_{γ_a} (1/n_a) j_a^T (Ã_n − z I_n)^{−1} j_a dz.
155 Applications/Community Detection on Graphs 66/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 66 / 153
159 Applications/Community Detection on Graphs 67/153

System Setting

Assume an n-node, m-edge undirected graph G, with:
- intrinsic average connectivities q_1, ..., q_n ~ µ i.i.d.
- k classes C_1, ..., C_k, independent of {q_i}, of (large) sizes n_1, ..., n_k, with preferential attachment C_ab between C_a and C_b
- induced edge probability for nodes i ∈ C_a, j ∈ C_b: P(i ∼ j) = q_i q_j C_ab
- adjacency matrix A with A_ij ~ Bernoulli(q_i q_j C_ab).
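Sampling this degree-corrected model is straightforward; a sketch with illustrative values of µ and C_ab (chosen so that all edge probabilities stay below 1):

```python
import numpy as np

# Node i in class C_a and node j in class C_b are linked with probability
# q_i q_j C_ab, i.e. A_ij ~ Bernoulli(q_i q_j C_ab).
rng = np.random.default_rng(7)
n, k = 600, 3
labels = np.repeat(np.arange(k), n // k)         # classes C_1, C_2, C_3
q = rng.choice([0.4, 0.9], size=n)               # bi-modal intrinsic connectivity
C = np.where(np.eye(k, dtype=bool), 1.2, 0.9)    # preferential attachment C_ab

P = np.outer(q, q) * C[labels][:, labels]        # P(i ~ j), all entries <= 1
A = np.triu((rng.random((n, n)) < P).astype(int), k=1)
A = A + A.T                                      # undirected adjacency, no self-loops

same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(n, dtype=bool)
within = A[same & off_diag].mean()
across = A[~same].mean()
print(within, across)                            # within-class density is higher
```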
161 Applications/Community Detection on Graphs 68/153

Objective

Study of spectral methods:
- standard methods, based on the adjacency A, the modularity A − d d^T/(2m), the normalized adjacency D^{−1} A D^{−1}, etc. (adapted to dense nets)
- refined methods, based on the Bethe Hessian (r^2 − 1) I_n − r A + D (adapted to sparse nets!)

Improvement to realistic graphs:
- observation of the failure of the standard methods above
- improvement by new methods.
163 Applications/Community Detection on Graphs 69/153

Limitations of Adjacency/Modularity Approach

Scenario: 3 classes with µ bi-modal (e.g., µ = (3/4) δ_{q^{(1)}} + (1/4) δ_{q^{(2)}}, q^{(2)} = 0.5).

[Plots: leading eigenvectors for the modularity and the Bethe Hessian approaches.]

Leading eigenvectors of A (or of the modularity A − d d^T/(2m)) are biased by the q_i distribution. Similar behavior for the Bethe Hessian.
169 Applications/Community Detection on Graphs 70/153

Regularized Modularity Approach

Connectivity Model: P(i ∼ j) = q_i q_j C_ab for i ∈ C_a, j ∈ C_b.

Dense Regime Assumptions: non-trivial regime when, as n → ∞,

  C_ab = 1 + M_ab/√n,  M_ab = O(1).

Community information is weak but highly REDUNDANT!

Considered Matrix: For α ∈ [0, 1] (and with D = diag(A 1_n) = diag(d) the degree matrix, m = (1/2) d^T 1_n the number of edges),

  L_α = (2m)^α (1/√n) D^{−α} [ A − d d^T/(2m) ] D^{−α}.

Our results in a nutshell:
- we find the optimal α_opt having the best phase transition
- we find a consistent estimator α̂_opt from A alone
- we claim optimal eigenvector regularization D^{α−1} u, u an eigenvector of L_α.
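Building L_α from a sampled dense-regime graph is a few lines; the sketch below uses α = 1/2 and an illustrative affinity M (not the optimal α_opt or its estimator from the slides), and exploits that d^α is an exact null vector of L_α since (A − d d^T/(2m)) 1_n = 0:

```python
import numpy as np

# L_alpha for a two-class dense-regime graph; the top eigenvector, after the
# D^{alpha-1} regularization, carries the class information.
rng = np.random.default_rng(8)
n = 1000
labels = np.repeat([0, 1], n // 2)
q = rng.choice([0.4, 0.9], size=n)
M = np.where(labels[:, None] == labels[None, :], 10.0, -10.0)   # M_ab = O(1)
P = np.clip(np.outer(q, q) * (1 + M / np.sqrt(n)), 0, 1)        # C_ab = 1 + M_ab/sqrt(n)
A = np.triu((rng.random((n, n)) < P).astype(int), k=1)
A = A + A.T

d = A.sum(axis=1).astype(float)
m = d.sum() / 2
alpha = 0.5
Da = np.diag(d ** -alpha)
L = (2 * m) ** alpha / np.sqrt(n) * Da @ (A - np.outer(d, d) / (2 * m)) @ Da

u = np.linalg.eigh(L)[1][:, -1]                  # isolated top eigenvector
pred = (d ** (alpha - 1) * u > 0).astype(int)    # regularized, then sign rule
acc = max(np.mean(pred == labels), np.mean(pred != labels))
null_resid = np.abs(L @ d ** alpha).max()        # d^alpha lies in the kernel
print(acc, null_resid)
```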
170 Applications/Community Detection on Graphs 71/153

Asymptotic Equivalence

Theorem (Limiting Random Matrix Equivalent). For each α ∈ [0, 1], as n → ∞, ‖L_α − L̄_α‖ → 0 almost surely, where
L_α = (2m)^α (1/n) D^{-α} [ A − dd^T/(2m) ] D^{-α}
L̄_α = (1/√n) D_q^{-α} X D_q^{-α} + U Λ U^T
with D_q = diag({q_i}), X a zero-mean random matrix,
U = [ (1/√n) D_q^{1−α} J , D_q^{-α} X 1_n ], of rank k + 1,
Λ = [ (I_k − 1_k c^T) M (I_k − c 1_k^T) , −1_k ; −1_k^T , 0 ]
and J = [j_1, ..., j_k], j_a = [0, ..., 0, 1_{n_a}^T, 0, ..., 0]^T ∈ R^n the canonical vector of class C_a.

Consequences:
- isolated eigenvalues beyond the phase transition λ(M) > spectrum edge
- optimal choice α_opt of α from the study of the noise spectrum
- eigenvectors correlated to D_q^{1−α} J → natural regularization by D^{α−1}!

71 / 153
173 Applications/Community Detection on Graphs 72/153

Eigenvalue Spectrum

[Plot: eigenvalues of L_1, the limiting law, and the isolated spikes, with support edges S_α^−, S_α^+.]

Figure: Eigenvalues of L_1, K = 3, n = 2000, c_1 = 0.3, c_2 = 0.3, c_3 = 0.4, µ = (1/2)δ_{q(1)} + (1/2)δ_{q(2)}, q(1) = 0.4, q(2) = 0.9, M defined by M_ii = 12, M_ij = 4, i ≠ j.

72 / 153
174 Applications/Community Detection on Graphs 73/153

Phase Transition

Theorem (Phase Transition). For α ∈ [0, 1], there exists an isolated eigenvalue λ_i(L_α) if λ_i(M̄) > τ_α, where
M̄ = (D(c) − cc^T) M,
τ_α = lim_{x↓S_α^+} 1/e_2^α(x), the phase transition threshold,
with [S_α^−, S_α^+] the limiting eigenvalue support of L_α and e_1^α(x), e_2^α(x) (x > S_α^+) the solutions of
e_1^α(x) = ∫ q^{1−2α} / ( x − [ q^{1−2α} e_1^α(x) + q^{2−2α} e_2^α(x) ] ) µ(dq)
e_2^α(x) = ∫ q^{2−2α} / ( x − [ q^{1−2α} e_1^α(x) + q^{2−2α} e_2^α(x) ] ) µ(dq).
In this case, 1/e_2^α(λ_i(L_α)) = λ_i(M̄).

- Clustering is still possible when λ_i(M̄) = (min_α τ_α) + ε. Optimal α = α_opt:
  α_opt = argmin_{α∈[0,1]} {τ_α}.
- From max_i | d_i/√(d^T 1_n) − q_i | → 0 almost surely, we obtain a consistent estimator α̂_opt of α_opt.

73 / 153
177 Applications/Community Detection on Graphs 74/153

Simulated Performance Results (2 masses of q_i)

[Four panels of dominant-eigenvector scatter plots: (Modularity), (Bethe Hessian), (Algo with α = 1), (Algo with α = α_opt).]

Figure: Two dominant eigenvectors (x-y axes) for n = 2000, K = 3, µ = (3/4)δ_{q(1)} + (1/4)δ_{q(2)}, q(1) = 0.1, q(2) = 0.5, c_1 = c_2 = 1/4, c_3 = 1/2, M = 100 I_3.

74 / 153
180 Applications/Community Detection on Graphs 75/153

Simulated Performance Results (2 masses for q_i)

[Plot: normalized spike λ − S_α^+ versus the eigenvalue ℓ, one curve per value of α, with α_opt highlighted.]

Figure: Largest eigenvalue λ of L_α as a function of the largest eigenvalue ℓ of (D(c) − cc^T)M, for µ = (3/4)δ_{q(1)} + (1/4)δ_{q(2)} with q(1) = 0.1 and q(2) = 0.5, for α ∈ {0, 1/4, 1/2, 3/4, 1, α_opt} (indicated below the graph). Here, α_opt = …. Circles indicate the phase transition. Beyond the phase transition, ℓ = 1/e_2^α(λ).

75 / 153
183 Applications/Community Detection on Graphs 76/153

Simulated Performance Results (2 masses for q_i)

[Plot: overlap versus community strength, curves for α = 0, α = 1/2, α = 1, α = α_opt and the Bethe Hessian; the phase transition is marked.]

Figure: Overlap performance for n = 3000, K = 3, c_i = 1/3, µ = (3/4)δ_{q(1)} + (1/4)δ_{q(2)} with q(1) = 0.1 and q(2) = 0.5, M = I_3, for [5, 50]. Here, α_opt = ….

76 / 153
187 Applications/Community Detection on Graphs 77/153

Simulated Performance Results (2 masses for q_i)

[Plot: overlap versus q(2), curves for α = 0, α = 0.5, α = 1, α = α_opt and the Bethe Hessian.]

Figure: Overlap performance for n = 3000, K = 3, µ = (3/4)δ_{q(1)} + (1/4)δ_{q(2)} with q(1) = 0.1 and q(2) ∈ [0.1, 0.9], M = 10(2I_3 − 1_3 1_3^T), c_i = ….

77 / 153
188 Applications/Community Detection on Graphs 78/153

Theoretical Performance

Analysis of the eigenvectors reveals:
- eigenvectors are noisy staircase vectors
- conjectured Gaussian fluctuations of the eigenvector entries
- for q_i = q_0 (homogeneous case), same variance for all entries
- in the non-homogeneous case, we can compute the average variance per class
→ Heuristic asymptotic performance upper-bound using EM.

78 / 153
191 Applications/Community Detection on Graphs 79/153

Theoretical Performance Results (uniform distribution for q_i)

[Plot: correct classification rate, curves for α = 1, α = 0.75, α = 0.5, α = 0.25, α = 0 and α_opt.]

Figure: Theoretical probability of correct recovery for n = 2000, K = 2, c_1 = 0.6, c_2 = 0.4, µ uniformly distributed in [0.2, 0.8], M = I_2, for [0, 20].

79 / 153
192 Applications/Community Detection on Graphs 80/153

Some Takeaway Messages

Main findings:
- Degree heterogeneity breaks community structures in eigenvectors. Compensation by the D^{α−1} normalization of eigenvectors.
- The classical debate over the best normalization of the adjacency (or modularity) matrix A is not trivial to settle. With heterogeneous degrees, we found a good on-line method.
- Simulations support good performances even for rather sparse settings.

But strong limitations:
- Key assumption: C_{ab} = 1 + M_{ab}/√n. Everything collapses in a different regime.
- Simulations on small networks in fact give arbitrary, unreliable results.

80 / 153
200 Applications/Kernel Spectral Clustering 81/153 Outline Basics of Random Matrix Theory Motivation: Large Sample Covariance Matrices The Stieltjes Transform Method Spiked Models Other Common Random Matrix Models Applications Random Matrices and Robust Estimation Spectral Clustering Methods and Random Matrices Community Detection on Graphs Kernel Spectral Clustering Kernel Spectral Clustering: Subspace Clustering Semi-supervised Learning Support Vector Machines Neural Networks: Extreme Learning Machines Perspectives 81 / 153
201 Applications/Kernel Spectral Clustering 82/153

Kernel Spectral Clustering

Problem statement:
- Dataset x_1, ..., x_n ∈ R^p.
- Objective: cluster the data into k similarity classes S_1, ..., S_k.

Typical metric to optimize:
(RatioCut) argmin_{S_1 ∪ ... ∪ S_k = {1,...,n}} Σ_{i=1}^k Σ_{j ∈ S_i, j′ ∉ S_i} κ(x_j, x_{j′}) / |S_i|
for some similarity kernel κ(x, y) ≥ 0 (large if x similar to y).

Can be shown equivalent to
(RatioCut) argmin_{M ∈ M} tr M^T (D − K) M
where M = { M ∈ R^{n×k} ; M_ij ∈ {0, |S_j|^{−1/2}} } (in particular, M^T M = I_k) and
K = {κ(x_i, x_j)}_{i,j=1}^n, D_ii = Σ_{j=1}^n K_ij.

But an integer problem! Usually NP-hard.

82 / 153
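The stated equivalence can be checked numerically. The snippet below is our own sanity check, with an arbitrary symmetric similarity matrix and a fixed two-class partition (all names and numbers are illustrative):

```python
import numpy as np

# RatioCut identity check: sum_i cut(S_i, complement)/|S_i| = tr M^T (D - K) M
rng = np.random.default_rng(0)
n = 6
K = rng.random((n, n))
K = (K + K.T) / 2                      # symmetric similarity matrix
np.fill_diagonal(K, 0)
D = np.diag(K.sum(axis=1))             # degree matrix D_ii = sum_j K_ij
parts = [[0, 1, 2], [3, 4, 5]]         # a fixed partition S_1, S_2

ratiocut = sum(K[np.ix_(S, [j for j in range(n) if j not in S])].sum() / len(S)
               for S in parts)

M = np.zeros((n, 2))                   # indicator matrix with M_ij in {0, |S_j|^{-1/2}}
for a, S in enumerate(parts):
    M[S, a] = 1 / np.sqrt(len(S))

print(np.isclose(ratiocut, np.trace(M.T @ (D - K) @ M)))
```

The identity holds exactly: expanding tr M^T (D − K) M per class reproduces the cut weight leaving each S_i, divided by |S_i|.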
205 Applications/Kernel Spectral Clustering 83/153

Kernel Spectral Clustering

Towards kernel spectral clustering: discrete-to-continuous relaxations of such metrics,
(RatioCut) argmin_{M, M^T M = I_k} tr M^T (D − K) M
i.e., an eigenvector problem:
1. find the eigenvectors of the smallest eigenvalues
2. retrieve the classes from the eigenvector components

Refinements:
- working on K, D − K, I_n − D^{−1}K, I_n − D^{−1/2}KD^{−1/2}, etc.
- several-step algorithms: Ng-Jordan-Weiss, Shi-Malik, etc.

83 / 153
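The two relaxation steps above can be condensed into a short numpy sketch. This is a generic illustration, not the authors' code: the Gaussian kernel, the farthest-point k-means initialization, and all names are our choices.

```python
import numpy as np

def kernel_spectral_clustering(X, k, f=lambda x: np.exp(-x / 2)):
    """Relaxed RatioCut: kernel -> normalized Laplacian -> eigenvectors -> k-means."""
    n, p = X.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / p  # (1/p)||x_i - x_j||^2
    K = f(sq)                                                # kernel matrix
    d = K.sum(axis=1)
    L = np.eye(n) - K / np.sqrt(np.outer(d, d))              # I_n - D^{-1/2} K D^{-1/2}
    U = np.linalg.eigh(L)[1][:, :k]                          # eigenvectors, smallest eigenvalues
    # plain k-means on the rows of U, deterministic farthest-point initialization
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[dist.argmax()])
    centers = np.array(centers)
    for _ in range(50):
        labels = ((U[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([U[labels == j].mean(0) if (labels == j).any() else centers[j]
                            for j in range(k)])
    return labels
```

On two well-separated Gaussian blobs the bottom eigenvectors are near-indicator vectors of the classes, so the final k-means step recovers the partition.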
210 Applications/Kernel Spectral Clustering 84/153

Kernel Spectral Clustering

Figure: Leading four eigenvectors of D^{−1/2} K D^{−1/2} for the MNIST data.

84 / 153
211 Applications/Kernel Spectral Clustering 85/153

Methodology and Objectives

Current state:
- Algorithms derived from ad-hoc procedures (e.g., relaxation).
- Little understanding of performance, even for Gaussian mixtures!
- Let alone when both p and n are large (the BigData setting).

Objectives and roadmap:
- Develop a mathematical analysis framework for BigData kernel spectral clustering (p, n → ∞).
- Understand:
  1. phase transition effects (i.e., when is clustering possible?)
  2. the content of each eigenvector
  3. the influence of the kernel function
  4. the performance comparison of clustering algorithms

Methodology:
- Use statistical assumptions (Gaussian mixture).
- Benefit from doubly-infinite independence and random matrix tools.

85 / 153
215 Applications/Kernel Spectral Clustering 86/153

Model and Assumptions

Gaussian mixture model: x_1, ..., x_n ∈ R^p, k classes C_1, ..., C_k,
x_1, ..., x_{n_1} ∈ C_1, ..., x_{n−n_k+1}, ..., x_n ∈ C_k, C_a = {x | x ∼ N(µ_a, C_a)}.
Then, for x_i ∈ C_a, x_i = µ_a + w_i with w_i ∼ N(0, C_a).

Assumption (Convergence Rate). As n, p → ∞,
1. Data scaling: p/n → c_0 ∈ (0, ∞).
2. Class scaling: n_a/n → c_a ∈ (0, 1).
3. Mean scaling: with µ = Σ_{a=1}^k (n_a/n) µ_a and µ°_a = µ_a − µ, then ‖µ°_a‖ = O(1).
4. Covariance scaling: with C = Σ_{a=1}^k (n_a/n) C_a and C°_a = C_a − C, then ‖C_a‖ = O(1), (1/√p) tr C°_a = O(1), tr C°_a C°_b = O(p).

86 / 153
217 Applications/Kernel Spectral Clustering 87/153

Model and Assumptions

Kernel matrix of interest:
K = { f( (1/p) ‖x_i − x_j‖² ) }_{i,j=1}^n
for some sufficiently smooth nonnegative f.

We study the normalized Laplacian:
L = n D^{−1/2} K D^{−1/2}
with d = K 1_n, D = diag(d).

87 / 153
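One structural property of this normalization can be checked exactly: L = nD^{−1/2}KD^{−1/2} always admits the eigenvalue n with eigenvector D^{1/2}1_n, since L D^{1/2}1_n = nD^{−1/2}K1_n = nD^{1/2}1_n. A quick numerical confirmation (our own illustration, with synthetic data and the kernel f(x) = exp(−x/2)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 30
X = rng.normal(size=(n, p))                      # arbitrary data, for illustration only
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / p
K = np.exp(-sq / 2)                              # kernel matrix, f(x) = exp(-x/2)
d = K.sum(axis=1)                                # d = K 1_n
L = n * K / np.sqrt(np.outer(d, d))              # L = n D^{-1/2} K D^{-1/2}
v = np.sqrt(d)                                   # the vector D^{1/2} 1_n
print(np.allclose(L @ v, n * v))                 # (n, D^{1/2} 1_n) is an exact eigenpair
```

Since D^{−1/2}KD^{−1/2} is similar to the row-stochastic matrix D^{−1}K, this eigenvalue n is in fact the largest one, which motivates projecting it away.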
219 Applications/Kernel Spectral Clustering 88/153

Model and Assumptions

Difficulty: L is a very intractable random matrix
- non-linear f
- non-trivial dependence between the entries of L

Strategy:
1. Find a random equivalent ˆL (i.e., ‖L − ˆL‖ → 0 almost surely as n, p → ∞) based on:
   - concentration: K_ij → constant as n, p → ∞ (for all i ≠ j)
   - Taylor expansion around the limit point
2. Apply the spiked random matrix approach to study:
   - the existence of isolated eigenvalues in ˆL: phase transition
   - the eigenvector projections on the canonical class basis

88 / 153
223 Applications/Kernel Spectral Clustering 89/153

Random Matrix Equivalent

Key remark: under our assumptions, uniformly in i ≠ j ∈ {1, ..., n},
(1/p) ‖x_i − x_j‖² → τ almost surely,
for some common limit τ.

→ large dimensional approximation for K:
K = f(τ) 1_n 1_n^T [O(n)] + √n A_1 [low rank, O(√n)] + A_2 [informative terms, O(1)]
→ difficult to handle (3 orders to manipulate!)

Observation: spectrum of L = n D^{−1/2} K D^{−1/2}:
- dominant eigenvalue n with eigenvector D^{1/2} 1_n
- all other eigenvalues of order O(1).

Naturally leads to studying the projected normalized Laplacian (or "modularity"-type Laplacian):
L = n D^{−1/2} K D^{−1/2} − n (D^{1/2} 1_n 1_n^T D^{1/2}) / (1_n^T D 1_n) = n D^{−1/2} ( K − dd^T / (1_n^T d) ) D^{−1/2}.
Dominant (normalized) eigenvector: D^{1/2} 1_n / √(1_n^T D 1_n).

89 / 153
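The key remark is easy to visualize numerically. In the sketch below (our own toy instance: two Gaussian classes with an O(1) mean separation, so that τ = (2/p) tr C = 2), all off-diagonal normalized distances fall in a narrow band around τ:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 2048, 64
mu = np.zeros(p)
mu[0] = 3.0                                          # O(1) mean separation between classes
X = np.vstack([rng.normal(0, 1, (n // 2, p)) + mu,   # class 1
               rng.normal(0, 1, (n // 2, p)) - mu])  # class 2
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / p
off = sq[~np.eye(n, dtype=bool)]                     # off-diagonal entries only
print(round(off.min(), 2), round(off.max(), 2))      # narrow band around tau = 2
```

The class information (a shift of order ‖2µ‖²/p ≈ 0.02 here) is buried inside O(p^{−1/2}) fluctuations, which is exactly why the Taylor expansion around τ must track several orders.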
228 Applications/Kernel Spectral Clustering 90/153

Random Matrix Equivalent

Theorem (Random Matrix Equivalent). As n, p → ∞, in operator norm, ‖L − ˆL‖ → 0 almost surely, where
ˆL = −2 (f′(τ)/f(τ)) [ (1/p) P W^T W P + U B U^T + α(τ) I_n ]
and τ = (2/p) tr C, W = [w_1, ..., w_n] ∈ R^{p×n} (x_i = µ_a + w_i), P = I_n − (1/n) 1_n 1_n^T,
U = [ (1/√p) J, Φ, ψ ] ∈ R^{n×(2k+4)},
B ∈ R^{(2k+4)×(2k+4)} a symmetric matrix whose blocks are built from M^T M, t, T, c and the derivatives of f at τ; in particular its leading k × k block reads
B_11 = M^T M + ( 5f′(τ)/(8f(τ)) − f″(τ)/(2f′(τ)) ) t t^T − (f″(τ)/f′(τ)) T + (p/n) ( f(τ)α(τ)/(2f′(τ)) ) 1_k 1_k^T ∈ R^{k×k}.

Notations:
- J = [j_1, ..., j_k] ∈ R^{n×k}, j_a the canonical vector of class C_a.
- M = [µ°_1, ..., µ°_k], µ°_a = µ_a − Σ_{b=1}^k (n_b/n) µ_b.
- t = [ (1/√p) tr C°_1, ..., (1/√p) tr C°_k ]^T ∈ R^k, C°_a = C_a − Σ_{b=1}^k (n_b/n) C_b.
- T = { (1/p) tr C°_a C°_b }_{a,b=1}^k ∈ R^{k×k}.

90 / 153
233 Applications/Kernel Spectral Clustering 91/153

Random Matrix Equivalent

Some consequences:
- ˆL is a spiked model: U B U^T is seen as a low-rank perturbation of (1/p) P W^T W P.
- If f′(τ) = 0, L is asymptotically deterministic! Only t and T can be discriminated upon.
- If f″(τ) = 0 (e.g., f(x) = x), T is unused.
- If 5f′(τ)/(8f(τ)) = f″(τ)/(2f′(τ)), t is (seemingly) unused.

Further analysis:
- Determine the separability condition for the eigenvalues.
- Evaluate the eigenvalue positions when separable.
- Evaluate the eigenvector projections on the canonical basis j_1, ..., j_k.
- Evaluate the fluctuations of the eigenvectors.

91 / 153
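Which statistics a given kernel can exploit thus reduces to the values of f′ and f″ at τ. A small finite-difference check of the two cases above (the kernels and the value of τ are our own illustrative choices):

```python
import numpy as np

def f_derivs(f, tau, h=1e-4):
    """Central finite differences for f'(tau) and f''(tau)."""
    f1 = (f(tau + h) - f(tau - h)) / (2 * h)
    f2 = (f(tau + h) - 2 * f(tau) + f(tau - h)) / h ** 2
    return f1, f2

tau = 2.0
f1, f2 = f_derivs(lambda x: x, tau)               # linear kernel f(x) = x
print(abs(f2) < 1e-4)                             # f'' = 0: T cannot be exploited
g1, g2 = f_derivs(lambda x: np.exp(-x / 2), tau)  # Gaussian-type kernel
print(abs(g2) > 1e-3)                             # f'' != 0: T enters the equivalent
```

In practice τ is itself estimated from the data (e.g., as the mean off-diagonal normalized distance), so such a check can be run on any candidate kernel before clustering.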
236 Applications/Kernel Spectral Clustering 92/153

Isolated Eigenvalues: Gaussian Inputs

[Histogram: eigenvalues of L versus eigenvalues of ˆL.]

Figure: Eigenvalues of L and ˆL, k = 3, p = 2048, n = 512, c_1 = c_2 = 1/4, c_3 = 1/2, [µ_a]_j = 4δ_{aj}, C_a = (1 + 2(a−1)/√p) I_p, f(x) = exp(−x/2).

92 / 153
237 Applications/Kernel Spectral Clustering 93/153

Theoretical Findings versus MNIST

[Histogram: eigenvalues of L (red) and of ˆL, as if the Gaussian model held (white).]

Figure: Eigenvalues of L (red) and of the equivalent Gaussian model ˆL (white), MNIST data, p = 784, n = ….

93 / 153
239 Applications/Kernel Spectral Clustering 94/153

Theoretical Findings versus MNIST

Figure: Leading four eigenvectors of D^{−1/2} K D^{−1/2} for the MNIST data (red) and theoretical findings (blue).

94 / 153
More informationStrong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices
Strong Convergence of the Empirical Distribution of Eigenvalues of Large Dimensional Random Matrices by Jack W. Silverstein* Department of Mathematics Box 8205 North Carolina State University Raleigh,
More informationRandom Matrices: Invertibility, Structure, and Applications
Random Matrices: Invertibility, Structure, and Applications Roman Vershynin University of Michigan Colloquium, October 11, 2011 Roman Vershynin (University of Michigan) Random Matrices Colloquium 1 / 37
More informationApplications and fundamental results on random Vandermon
Applications and fundamental results on random Vandermonde matrices May 2008 Some important concepts from classical probability Random variables are functions (i.e. they commute w.r.t. multiplication)
More informationConcentration Inequalities for Random Matrices
Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic
More informationRobust covariance matrices estimation and applications in signal processing
Robust covariance matrices estimation and applications in signal processing F. Pascal SONDRA/Supelec GDR ISIS Journée Estimation et traitement statistique en grande dimension May 16 th, 2013 FP (SONDRA/Supelec)
More informationHomework 1. Yuan Yao. September 18, 2011
Homework 1 Yuan Yao September 18, 2011 1. Singular Value Decomposition: The goal of this exercise is to refresh your memory about the singular value decomposition and matrix norms. A good reference to
More informationThe Matrix Dyson Equation in random matrix theory
The Matrix Dyson Equation in random matrix theory László Erdős IST, Austria Mathematical Physics seminar University of Bristol, Feb 3, 207 Joint work with O. Ajanki, T. Krüger Partially supported by ERC
More informationResearch Program. Romain COUILLET. 11 février CentraleSupélec Université Paris-Sud 11 1 / 38
Research Program Romain COUILLET CentraleSupélec Université Paris-Sud 11 11 février 2016 1 / 38 Outline Curriculum Vitae Research Project : Learning in Large Dimensions Axis 1 : Robust Estimation in Large
More informationExponential tail inequalities for eigenvalues of random matrices
Exponential tail inequalities for eigenvalues of random matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify
More informationEigenvalue variance bounds for Wigner and covariance random matrices
Eigenvalue variance bounds for Wigner and covariance random matrices S. Dallaporta University of Toulouse, France Abstract. This work is concerned with finite range bounds on the variance of individual
More informationMarkov operators, classical orthogonal polynomial ensembles, and random matrices
Markov operators, classical orthogonal polynomial ensembles, and random matrices M. Ledoux, Institut de Mathématiques de Toulouse, France 5ecm Amsterdam, July 2008 recent study of random matrix and random
More informationSupelec Randomness in Wireless Networks: how to deal with it?
Supelec Randomness in Wireless Networks: how to deal with it? Mérouane Debbah Alcatel-Lucent Chair on Flexible Radio merouane.debbah@supelec.fr The performance dilemma... La théorie, c est quand on sait
More informationMIMO Capacities : Eigenvalue Computation through Representation Theory
MIMO Capacities : Eigenvalue Computation through Representation Theory Jayanta Kumar Pal, Donald Richards SAMSI Multivariate distributions working group Outline 1 Introduction 2 MIMO working model 3 Eigenvalue
More informationDISTRIBUTION OF EIGENVALUES OF REAL SYMMETRIC PALINDROMIC TOEPLITZ MATRICES AND CIRCULANT MATRICES
DISTRIBUTION OF EIGENVALUES OF REAL SYMMETRIC PALINDROMIC TOEPLITZ MATRICES AND CIRCULANT MATRICES ADAM MASSEY, STEVEN J. MILLER, AND JOHN SINSHEIMER Abstract. Consider the ensemble of real symmetric Toeplitz
More informationFluctuations from the Semicircle Law Lecture 4
Fluctuations from the Semicircle Law Lecture 4 Ioana Dumitriu University of Washington Women and Math, IAS 2014 May 23, 2014 Ioana Dumitriu (UW) Fluctuations from the Semicircle Law Lecture 4 May 23, 2014
More informationEXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2
EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME Xavier Mestre, Pascal Vallet 2 Centre Tecnològic de Telecomunicacions de Catalunya, Castelldefels, Barcelona (Spain) 2 Institut
More information1 Intro to RMT (Gene)
M705 Spring 2013 Summary for Week 2 1 Intro to RMT (Gene) (Also see the Anderson - Guionnet - Zeitouni book, pp.6-11(?) ) We start with two independent families of R.V.s, {Z i,j } 1 i
More informationOn the concentration of eigenvalues of random symmetric matrices
On the concentration of eigenvalues of random symmetric matrices Noga Alon Michael Krivelevich Van H. Vu April 23, 2012 Abstract It is shown that for every 1 s n, the probability that the s-th largest
More informationConcentration inequalities: basics and some new challenges
Concentration inequalities: basics and some new challenges M. Ledoux University of Toulouse, France & Institut Universitaire de France Measure concentration geometric functional analysis, probability theory,
More informationSemicircle law on short scales and delocalization for Wigner random matrices
Semicircle law on short scales and delocalization for Wigner random matrices László Erdős University of Munich Weizmann Institute, December 2007 Joint work with H.T. Yau (Harvard), B. Schlein (Munich)
More informationNORMS ON SPACE OF MATRICES
NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system
More informationOn corrections of classical multivariate tests for high-dimensional data
On corrections of classical multivariate tests for high-dimensional data Jian-feng Yao with Zhidong Bai, Dandan Jiang, Shurong Zheng Overview Introduction High-dimensional data and new challenge in statistics
More informationAPPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.
APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product
More informationRandom regular digraphs: singularity and spectrum
Random regular digraphs: singularity and spectrum Nick Cook, UCLA Probability Seminar, Stanford University November 2, 2015 Universality Circular law Singularity probability Talk outline 1 Universality
More information2 Two-Point Boundary Value Problems
2 Two-Point Boundary Value Problems Another fundamental equation, in addition to the heat eq. and the wave eq., is Poisson s equation: n j=1 2 u x 2 j The unknown is the function u = u(x 1, x 2,..., x
More informationPart II Probability and Measure
Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationData Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs
Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture
More informationA new method to bound rate of convergence
A new method to bound rate of convergence Arup Bose Indian Statistical Institute, Kolkata, abose@isical.ac.in Sourav Chatterjee University of California, Berkeley Empirical Spectral Distribution Distribution
More informationRandom Matrix: From Wigner to Quantum Chaos
Random Matrix: From Wigner to Quantum Chaos Horng-Tzer Yau Harvard University Joint work with P. Bourgade, L. Erdős, B. Schlein and J. Yin 1 Perhaps I am now too courageous when I try to guess the distribution
More informationLectures 6 7 : Marchenko-Pastur Law
Fall 2009 MATH 833 Random Matrices B. Valkó Lectures 6 7 : Marchenko-Pastur Law Notes prepared by: A. Ganguly We will now turn our attention to rectangular matrices. Let X = (X 1, X 2,..., X n ) R p n
More informationarxiv: v1 [math-ph] 19 Oct 2018
COMMENT ON FINITE SIZE EFFECTS IN THE AVERAGED EIGENVALUE DENSITY OF WIGNER RANDOM-SIGN REAL SYMMETRIC MATRICES BY G.S. DHESI AND M. AUSLOOS PETER J. FORRESTER AND ALLAN K. TRINH arxiv:1810.08703v1 [math-ph]
More informationFree Probability, Sample Covariance Matrices and Stochastic Eigen-Inference
Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference Alan Edelman Department of Mathematics, Computer Science and AI Laboratories. E-mail: edelman@math.mit.edu N. Raj Rao Deparment
More informationINVERSE EIGENVALUE STATISTICS FOR RAYLEIGH AND RICIAN MIMO CHANNELS
INVERSE EIGENVALUE STATISTICS FOR RAYLEIGH AND RICIAN MIMO CHANNELS E. Jorswieck, G. Wunder, V. Jungnickel, T. Haustein Abstract Recently reclaimed importance of the empirical distribution function of
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationx. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).
.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics
More informationEE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17
EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko
More informationMATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators.
MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators. Adjoint operator and adjoint matrix Given a linear operator L on an inner product space V, the adjoint of L is a transformation
More informationBasic Concepts in Matrix Algebra
Basic Concepts in Matrix Algebra An column array of p elements is called a vector of dimension p and is written as x p 1 = x 1 x 2. x p. The transpose of the column vector x p 1 is row vector x = [x 1
More informationAn introduction to G-estimation with sample covariance matrices
An introduction to G-estimation with sample covariance matrices Xavier estre xavier.mestre@cttc.cat Centre Tecnològic de Telecomunicacions de Catalunya (CTTC) "atrices aleatoires: applications aux communications
More informationOn the Spectrum of Random Features Maps of High Dimensional Data
On the Spectrum of Random Features Maps of High Dimensional Data ICML 018, Stockholm, Sweden Zhenyu Liao, Romain Couillet LS, CentraleSupélec, Université Paris-Saclay, France GSTATS IDEX DataScience Chair,
More informationKnowledge Discovery and Data Mining 1 (VO) ( )
Knowledge Discovery and Data Mining 1 (VO) (707.003) Review of Linear Algebra Denis Helic KTI, TU Graz Oct 9, 2014 Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 74 Big picture: KDDM Probability Theory
More informationRandom Matrix Theory for Neural Networks
Random Matrix Theory for Neural Networks Ph.D. Mid-Term Evaluation Zhenyu Liao Laboratoire des Signaux et Systèmes CentraleSupélec Université Paris-Saclay Salle sd.207, Bâtiment Bouygues Gif-sur-Yvette,
More informationDensity of States for Random Band Matrices in d = 2
Density of States for Random Band Matrices in d = 2 via the supersymmetric approach Mareike Lager Institute for applied mathematics University of Bonn Joint work with Margherita Disertori ZiF Summer School
More informationMATH 205C: STATIONARY PHASE LEMMA
MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationMath 118, Fall 2014 Final Exam
Math 8, Fall 4 Final Exam True or false Please circle your choice; no explanation is necessary True There is a linear transformation T such that T e ) = e and T e ) = e Solution Since T is linear, if T
More informationHigh Dimensional Minimum Risk Portfolio Optimization
High Dimensional Minimum Risk Portfolio Optimization Liusha Yang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology Centrale-Supélec June 26, 2015 Yang (HKUST)
More informationELEMENTS OF PROBABILITY THEORY
ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable
More informationDef. The euclidian distance between two points x = (x 1,...,x p ) t and y = (y 1,...,y p ) t in the p-dimensional space R p is defined as
MAHALANOBIS DISTANCE Def. The euclidian distance between two points x = (x 1,...,x p ) t and y = (y 1,...,y p ) t in the p-dimensional space R p is defined as d E (x, y) = (x 1 y 1 ) 2 + +(x p y p ) 2
More informationLINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM
LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM Unless otherwise stated, all vector spaces in this worksheet are finite dimensional and the scalar field F is R or C. Definition 1. A linear operator
More informationCompressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery
Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad
More informationArchive of past papers, solutions and homeworks for. MATH 224, Linear Algebra 2, Spring 2013, Laurence Barker
Archive of past papers, solutions and homeworks for MATH 224, Linear Algebra 2, Spring 213, Laurence Barker version: 4 June 213 Source file: archfall99.tex page 2: Homeworks. page 3: Quizzes. page 4: Midterm
More informationRandom Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws. Symeon Chatzinotas February 11, 2013 Luxembourg
Random Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws Symeon Chatzinotas February 11, 2013 Luxembourg Outline 1. Random Matrix Theory 1. Definition 2. Applications 3. Asymptotics 2. Ensembles
More informationEEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as
L30-1 EEL 5544 Noise in Linear Systems Lecture 30 OTHER TRANSFORMS For a continuous, nonnegative RV X, the Laplace transform of X is X (s) = E [ e sx] = 0 f X (x)e sx dx. For a nonnegative RV, the Laplace
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationBrownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539
Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory
More information1 Math 241A-B Homework Problem List for F2015 and W2016
1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let
More informationInvertibility of random matrices
University of Michigan February 2011, Princeton University Origins of Random Matrix Theory Statistics (Wishart matrices) PCA of a multivariate Gaussian distribution. [Gaël Varoquaux s blog gael-varoquaux.info]
More informationLecture I: Asymptotics for large GUE random matrices
Lecture I: Asymptotics for large GUE random matrices Steen Thorbjørnsen, University of Aarhus andom Matrices Definition. Let (Ω, F, P) be a probability space and let n be a positive integer. Then a random
More informationApplications of Robust Optimization in Signal Processing: Beamforming and Power Control Fall 2012
Applications of Robust Optimization in Signal Processing: Beamforg and Power Control Fall 2012 Instructor: Farid Alizadeh Scribe: Shunqiao Sun 12/09/2012 1 Overview In this presentation, we study the applications
More information5 Operations on Multiple Random Variables
EE360 Random Signal analysis Chapter 5: Operations on Multiple Random Variables 5 Operations on Multiple Random Variables Expected value of a function of r.v. s Two r.v. s: ḡ = E[g(X, Y )] = g(x, y)f X,Y
More informationCompound matrices and some classical inequalities
Compound matrices and some classical inequalities Tin-Yau Tam Mathematics & Statistics Auburn University Dec. 3, 04 We discuss some elegant proofs of several classical inequalities of matrices by using
More information2. Matrix Algebra and Random Vectors
2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns
More informationOR MSc Maths Revision Course
OR MSc Maths Revision Course Tom Byrne School of Mathematics University of Edinburgh t.m.byrne@sms.ed.ac.uk 15 September 2017 General Information Today JCMB Lecture Theatre A, 09:30-12:30 Mathematics revision
More informationThe Multivariate Normal Distribution. In this case according to our theorem
The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this
More informationOperator norm convergence for sequence of matrices and application to QIT
Operator norm convergence for sequence of matrices and application to QIT Benoît Collins University of Ottawa & AIMR, Tohoku University Cambridge, INI, October 15, 2013 Overview Overview Plan: 1. Norm
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationFree deconvolution for signal processing applications
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL., NO., JANUARY 27 Free deconvolution for signal processing applications Øyvind Ryan, Member, IEEE, Mérouane Debbah, Member, IEEE arxiv:cs.it/725v 4 Jan 27 Abstract
More informationMinimum variance portfolio optimization in the spiked covariance model
Minimum variance portfolio optimization in the spiked covariance model Liusha Yang, Romain Couillet, Matthew R Mckay To cite this version: Liusha Yang, Romain Couillet, Matthew R Mckay Minimum variance
More informationLecture 4: Analysis of MIMO Systems
Lecture 4: Analysis of MIMO Systems Norms The concept of norm will be extremely useful for evaluating signals and systems quantitatively during this course In the following, we will present vector norms
More informationConvergence of Eigenspaces in Kernel Principal Component Analysis
Convergence of Eigenspaces in Kernel Principal Component Analysis Shixin Wang Advanced machine learning April 19, 2016 Shixin Wang Convergence of Eigenspaces April 19, 2016 1 / 18 Outline 1 Motivation
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationMatrix Theory. A.Holst, V.Ufnarovski
Matrix Theory AHolst, VUfnarovski 55 HINTS AND ANSWERS 9 55 Hints and answers There are two different approaches In the first one write A as a block of rows and note that in B = E ij A all rows different
More information1 Tridiagonal matrices
Lecture Notes: β-ensembles Bálint Virág Notes with Diane Holcomb 1 Tridiagonal matrices Definition 1. Suppose you have a symmetric matrix A, we can define its spectral measure (at the first coordinate
More informationP (A G) dp G P (A G)
First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume
More informationPart IA. Vectors and Matrices. Year
Part IA Vectors and Matrices Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2018 Paper 1, Section I 1C Vectors and Matrices For z, w C define the principal value of z w. State de Moivre s
More informationOn the principal components of sample covariance matrices
On the principal components of sample covariance matrices Alex Bloemendal Antti Knowles Horng-Tzer Yau Jun Yin February 4, 205 We introduce a class of M M sample covariance matrices Q which subsumes and
More information8.1 Concentration inequality for Gaussian random matrix (cont d)
MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration
More information