Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices


Alex Wein (MIT)
Joint work with: Amelia Perry (MIT), Afonso Bandeira (Courant, NYU), Ankur Moitra (MIT)

Random Matrices

Wigner matrix: $\frac{1}{\sqrt{n}} W \in \mathbb{R}^{n \times n}$ symmetric, $W_{ij} \sim \mathcal{N}(0, 1)$.

Wishart matrix: $\frac{1}{N} \sum_{k=1}^{N} y_k y_k^T$, with $y_k \sim \mathcal{N}(0, I_n)$.

[E. P. Wigner, AoM 1955; V. A. Marchenko, L. A. Pastur, Mat. Sb. 1967]

Spiked Models

Spiked Wigner matrix: $Y = \frac{1}{\sqrt{n}} W + \lambda x x^T$, with $\frac{1}{\sqrt{n}} W$ Wigner and $\|x\| = 1$. Visible in the largest eigenvalue when $\lambda > 1$.

Spiked Wishart matrix: $Y = \frac{1}{N} \sum_{k=1}^{N} y_k y_k^T$, with $y_k \sim \mathcal{N}(0, I_n + \beta x x^T)$, $\|x\| = 1$, $\beta \in [-1, \infty)$. Visible in the largest eigenvalue when $\beta > \sqrt{\gamma}$, where $\gamma = n/N$.

[J. Baik, G. Ben Arous, S. Péché, AoP 2005; D. Féral, S. Péché, CMP 2007]
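As a quick numerical illustration of this BBP transition (a minimal simulation sketch, not from the slides; all parameters are arbitrary), the top eigenvalue of a spiked Wigner matrix sticks to the bulk edge 2 for $\lambda \le 1$ and pops out to $\lambda + 1/\lambda$ for $\lambda > 1$:

```python
import numpy as np

# Sketch: spiked Wigner matrix Y = W/sqrt(n) + lam * x x^T.
# BBP transition: top eigenvalue ~ 2 for lam <= 1, ~ lam + 1/lam for lam > 1.
rng = np.random.default_rng(0)
n = 2000
for lam in [0.5, 1.5]:
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2)              # symmetric; off-diagonal entries ~ N(0,1)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)                  # spherical unit spike
    Y = W / np.sqrt(n) + lam * np.outer(x, x)
    print(lam, np.linalg.eigvalsh(Y)[-1])   # ~2.0 at lam=0.5, ~2.17 at lam=1.5
```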

Statistical Questions

Detection: distinguish reliably (error probability $\to 0$): $Y \sim \frac{1}{\sqrt{n}} W$ (Wigner) vs $Y \sim \frac{1}{\sqrt{n}} W + \lambda x x^T$.

Recovery: achieve nontrivial correlation with the truth: $|\langle x, \hat{x} \rangle| > \epsilon$.

PCA solves both detection and recovery above the threshold $\lambda > 1$ (Wigner). Are they statistically possible below the threshold?

This requires a prior on the spike $x \in \mathbb{R}^n$: uniform on the unit sphere; i.i.d. $\pm 1/\sqrt{n}$ (Rademacher); sparse Rademacher.
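To make the recovery criterion concrete, here is a hypothetical sketch (arbitrary parameters) comparing the correlation between the top eigenvector of $Y$ and the planted spike, below and above the threshold:

```python
import numpy as np

# Sketch: PCA recovery |<x, x_hat>| above vs below the lambda = 1 threshold.
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
for lam in [0.5, 1.5]:
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2)
    Y = W / np.sqrt(n) + lam * np.outer(x, x)
    x_hat = np.linalg.eigh(Y)[1][:, -1]       # top eigenvector of Y
    print(lam, abs(x_hat @ x))                # small at 0.5, ~sqrt(1 - 1/lam^2) at 1.5
```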

Detection vs Recovery

[Figure: hypothesis testing power and recovery quality as functions of $\lambda$, comparing "Optimal" against "PCA"; depending on the model, PCA is either optimal or sub-optimal.]

This talk: focus on the detection threshold (with hypothesis testing bounds and the recovery threshold along the way).

3 Scenarios

1. PCA achieves the optimal threshold (e.g. Wigner with spherical prior).
2. PCA can be beaten by an efficient algorithm (e.g. non-Gaussian Wigner).
3. PCA can be beaten, but only by an inefficient algorithm (e.g. sparse priors; Wishart).

Tool: Contiguity

A sequence of distributions $P_n$ is contiguous to $Q_n$ if, for any sequence of events $A_n$,
$$\lim_{n \to \infty} Q_n(A_n) = 0 \implies \lim_{n \to \infty} P_n(A_n) = 0.$$
[L. Le Cam, 1960]

If the distributions are contiguous, there is no reliable test to distinguish them. (Proof: let $A_n$ be the event that the distinguisher says $P_n$.)

Theorem (Second Moment). If $\mathbb{E}_{Q_n}\left[\left(\frac{dP_n}{dQ_n}\right)^2\right] = O(1)$, then $P_n$ is contiguous to $Q_n$.

E.g. if $P_n, Q_n$ have densities $p_n, q_n$: $\mathbb{E}_{Q_n}\left[\left(\frac{dP_n}{dQ_n}\right)^2\right] = \mathbb{E}_{Y \sim Q_n}\left[\left(\frac{p_n(Y)}{q_n(Y)}\right)^2\right]$.

Proof (Cauchy–Schwarz):
$$\underbrace{\int_{A_n} dP_n}_{P_n(A_n)} = \int_{A_n} \frac{dP_n}{dQ_n}\, dQ_n \le \Bigg(\underbrace{\int \left(\frac{dP_n}{dQ_n}\right)^2 dQ_n}_{\mathbb{E}_{Q_n}(dP_n/dQ_n)^2}\Bigg)^{1/2} \Bigg(\underbrace{\int_{A_n} dQ_n}_{Q_n(A_n)}\Bigg)^{1/2}.$$

Second Moment

Take $P_n$: $\frac{1}{\sqrt{n}} W + \lambda x x^T$ with $x \sim \mathcal{X}$, and $Q_n$: $\frac{1}{\sqrt{n}} W$. Then
$$\mathbb{E}_{Q_n}\left(\frac{dP_n}{dQ_n}\right)^2 = \mathbb{E}_{x, x' \sim \mathcal{X}} \exp\left(\frac{\lambda^2 n}{2} \langle x, x' \rangle^2\right).$$

E.g. if the prior $\mathcal{X}$ is uniform on the unit sphere (and $\lambda < 1$):
$$\lim_{n \to \infty} \mathbb{E}_{x, x' \sim \mathcal{X}} \exp\left(\frac{\lambda^2 n}{2} \langle x, x' \rangle^2\right) = \left(1 - \lambda^2\right)^{-1/2} < \infty.$$

So the distributions are contiguous below the spectral threshold. The same can be shown in the Wishart case. But what about when we know more about the spike?

[A. Montanari, D. Reichman, O. Zeitouni, NIPS 2015; A. Onatski, M. J. Moreira, M. Hallin, AoS 2013]
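The closed form $(1 - \lambda^2)^{-1/2}$ is easy to sanity-check by Monte Carlo (a sketch with arbitrary $n$ and $\lambda$; for $x, x'$ uniform on the sphere, $\sqrt{n}\langle x, x'\rangle$ is approximately standard Gaussian):

```python
import numpy as np

# Monte Carlo check: E exp(lam^2 n <x,x'>^2 / 2) ~ (1 - lam^2)^(-1/2) for lam < 1.
rng = np.random.default_rng(2)
n, trials, lam = 200, 50_000, 0.6
x = rng.standard_normal((trials, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)
y = rng.standard_normal((trials, n))
y /= np.linalg.norm(y, axis=1, keepdims=True)
overlap = np.einsum('ij,ij->i', x, y)          # <x, x'> for each independent pair
est = np.exp(0.5 * lam**2 * n * overlap**2).mean()
print(est, (1 - lam**2) ** -0.5)               # both ~ 1.25
```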

Rademacher Spike Prior

$Y \sim \frac{1}{\sqrt{n}} W$ (Wigner) vs $Y \sim \frac{1}{\sqrt{n}} W + \lambda x x^T$, with $x \sim \mathrm{Unif}\left\{\pm \frac{1}{\sqrt{n}}\right\}^n$.

Still contiguous for $\lambda < 1$, and the contiguity argument goes through for a general class of priors! But for sparse priors, with enough sparsity, PCA is no longer optimal.

[J. Banks, C. Moore, R. Vershynin, J. Xu]

What about for Wishart?

$\frac{1}{N} \sum_{k=1}^{N} y_k y_k^T$, with $y_k \sim \mathcal{N}(0, I_n)$ vs $y_k \sim \mathcal{N}\left(0, I_n + \beta x x^T\right)$, $x \sim \mathrm{Unif}\left\{\pm \frac{1}{\sqrt{n}}\right\}^n$.

Same story for the spherical prior: contiguous for $|\beta| < \sqrt{\gamma}$ (the PCA threshold).

For the Rademacher prior, something surprising happens:
- If $\gamma = n/N < 1/3$, then the models are contiguous for $|\beta| < \sqrt{\gamma}$.
- But... for $\gamma$ above an explicit constant, there exists a computationally inefficient procedure that distinguishes the two models for some $\beta \in (-\sqrt{\gamma}, 0)$ (below the spectral threshold).

Is there a computational gap?
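For concreteness, here is a minimal sketch of the spiked Wishart model with a Rademacher spike (parameters arbitrary); the top sample-covariance eigenvalue separates from the Marchenko–Pastur bulk edge only when $\beta > \sqrt{\gamma}$:

```python
import numpy as np

# Sketch: spiked Wishart, rows y_k ~ N(0, I + beta * x x^T), Rademacher spike x.
rng = np.random.default_rng(3)
n, N = 300, 900                                   # gamma = n/N = 1/3
gamma = n / N
x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)  # x ~ Unif{+-1/sqrt(n)}^n
for beta in [0.3, 1.0]:                           # below vs above sqrt(gamma) ~ 0.577
    G = rng.standard_normal((N, n))
    a = np.sqrt(1 + beta) - 1                     # rows of G + a (Gx) x^T have cov I + beta x x^T
    Y = G + a * np.outer(G @ x, x)
    S = Y.T @ Y / N                               # sample covariance
    print(beta, np.linalg.eigvalsh(S)[-1], (1 + np.sqrt(gamma)) ** 2)
```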

Rademacher Wishart, Negative β

[Figure: phase diagram in the $(\gamma, \beta)$ plane. PCA succeeds above one line; the inefficient algorithm succeeds above a lower line; the contiguity lower bound shows detection is impossible below another line.]

Back to Wigner: What if the noise is not Gaussian?

$x \sim \mathrm{Unif}(S^{n-1})$, $W \in \mathbb{R}^{n \times n}$, $Y = \frac{1}{\sqrt{n}} W + \lambda x x^T$, but now $W_{ij} \sim p(w)$ with $\mathbb{E}\, w = 0$, $\mathbb{E}\, w^2 = 1$.

Universality: the spectral properties are unchanged...

[D. Féral, S. Péché, CMP 2007; T. Tao, V. Vu]

Can you tell which one is which?

For $W_{ij} \sim \mathrm{Unif}\{\pm 1\}$ and $\lambda < 1$: $W + \lambda \sqrt{n}\, x x^T$ vs $W$.

[Figure: heatmaps of the two matrices, visually indistinguishable.]

(With discrete $\pm 1$ noise the models are in fact easy to tell apart, since the spiked entries are no longer exactly $\pm 1$.) So let's restrict ourselves to the case where the density $p(w)$ is smooth.
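A hypothetical one-line check of why discrete noise is excluded: the rank-one spike shifts every entry off the lattice $\{-1, +1\}$, so the two models are trivially distinguishable entrywise:

```python
import numpy as np

# With Unif{+-1} noise, spiked entries are no longer exactly +-1.
rng = np.random.default_rng(4)
n, lam = 1000, 0.5
U = rng.choice([-1.0, 1.0], size=(n, n))
W = np.triu(U) + np.triu(U, 1).T                 # symmetric +-1 matrix
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
Y = W + lam * np.sqrt(n) * np.outer(x, x)
print(np.isin(W, [-1.0, 1.0]).mean())            # 1.0: unspiked entries are +-1
print(np.isin(Y, [-1.0, 1.0]).mean())            # ~0.0: spiked entries are not
```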

Entrywise function

If the noise is drawn from a non-Gaussian $p(w)$, we can beat PCA by applying a function $f: \mathbb{R} \to \mathbb{R}$ entrywise to our matrix $Y = W + \lambda \sqrt{n}\, x x^T$, followed by PCA:
$$f(Y_{ij}) = f\left(W_{ij} + \lambda \sqrt{n}\, x_i x_j\right)$$
$$\approx f(W_{ij}) + f'(W_{ij})\, \lambda \sqrt{n}\, x_i x_j$$
$$= f(W_{ij}) + \mathbb{E}[f'(W_{ij})]\, \lambda \sqrt{n}\, x_i x_j + \left(f'(W_{ij}) - \mathbb{E} f'(W_{ij})\right) \lambda \sqrt{n}\, x_i x_j$$
$$\approx f(W_{ij}) + \mathbb{E}[f'(W_{ij})]\, \lambda \sqrt{n}\, x_i x_j.$$

This is (close to) a new spiked Wigner matrix with $\lambda' = \lambda\, \mathbb{E} f'(W_{ij}) \big/ \sqrt{\mathbb{E} f^2(W_{ij})}$.

Calculus of variations gives the optimal choice of $f$: $f(w) = -\frac{p'(w)}{p(w)}$.
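A sketch of the whole pipeline, assuming unit-variance Laplace noise, for which $f(w) = -p'(w)/p(w) = \sqrt{2}\,\mathrm{sign}(w)$ (Laplace is not smooth at the origin, but it makes the optimal $f$ especially simple; all parameters below are arbitrary). At $\lambda$ between $1/\sqrt{F_p}$ and $1$, plain PCA fails while pre-transformed PCA recovers the spike:

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam = 2000, 0.85                                 # below the plain-PCA threshold (lam = 1)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
L = rng.laplace(scale=1/np.sqrt(2), size=(n, n))    # unit-variance Laplace entries
W = np.triu(L) + np.triu(L, 1).T                    # symmetrize
Y = W / np.sqrt(n) + lam * np.outer(x, x)

def top_eigvec(M):
    return np.linalg.eigh(M)[1][:, -1]

f = lambda w: np.sqrt(2) * np.sign(w)               # f = -p'/p for the Laplace density
Y_pre = f(np.sqrt(n) * Y) / np.sqrt(n)              # apply f entrywise to W + lam*sqrt(n)*x x^T
print("plain PCA:          ", abs(top_eigvec(Y) @ x))       # small
print("pre-transformed PCA:", abs(top_eigvec(Y_pre) @ x))   # order-1 overlap
```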

Pre-transformed PCA

[Figure: dashed: $p(w)$; solid: $f(w) = -p'(w)/p(w)$. The transformation $Y \mapsto f(Y)$.]

New threshold at $\lambda = \frac{1}{\sqrt{F_p}}$, where $F_p = \mathbb{E}_p\left[\left(\frac{p'(w)}{p(w)}\right)^2\right] \ge 1$ is the Fisher information. So Gaussian noise is hardest. Contiguity shows this is optimal!

[T. Lesieur, F. Krzakala, L. Zdeborová, Allerton 2015; F. Krzakala, J. Xu, L. Zdeborová, 2016]
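The new threshold $1/\sqrt{F_p}$ is easy to evaluate numerically for a given density. A sketch using a unit-variance two-component Gaussian mixture as a stand-in (the mixture parameters are made up):

```python
import numpy as np
from scipy.integrate import quad

# Fisher information F_p = E_p[(p'(w)/p(w))^2] for a unit-variance Gaussian mixture.
s = 0.4                               # component std; means +/-mu chosen so Var = 1
mu = np.sqrt(1 - s**2)
z = s * np.sqrt(2 * np.pi)
p  = lambda w: 0.5 * (np.exp(-(w - mu)**2 / (2*s**2)) + np.exp(-(w + mu)**2 / (2*s**2))) / z
dp = lambda w: 0.5 * (-(w - mu) * np.exp(-(w - mu)**2 / (2*s**2))
                      - (w + mu) * np.exp(-(w + mu)**2 / (2*s**2))) / (z * s**2)
F_p, _ = quad(lambda w: dp(w)**2 / p(w), -12, 12)
print("F_p =", F_p, " new threshold 1/sqrt(F_p) =", 1 / np.sqrt(F_p))  # < 1 since F_p > 1
```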

Proof Details: Bounding the (Wigner) Second Moment

$$\mathbb{E}_{Q_n}\left(\frac{dP_n}{dQ_n}\right)^2 = \mathbb{E}_{x, x' \sim \mathcal{X}} \exp\left(\frac{\lambda^2 n}{2} \langle x, x' \rangle^2\right)$$
$$= \int_0^\infty \Pr\left[\exp\left(\frac{\lambda^2 n}{2} \langle x, x' \rangle^2\right) \ge t\right] dt$$
$$= \int_0^\infty \Pr\left[|\langle x, x' \rangle| \ge \sqrt{\frac{2 \log t}{\lambda^2 n}}\right] dt$$
$$= \int_0^1 \Pr\left[|\langle x, x' \rangle| \ge u\right] \lambda^2 n\, u \exp\left(\frac{\lambda^2 n}{2} u^2\right) du \qquad \text{(substituting } t = e^{\lambda^2 n u^2/2}\text{)}$$
$$\approx \int_0^1 \lambda^2 n\, u \exp\left(n\left(\frac{\lambda^2}{2} u^2 - R(u)\right)\right) du.$$

Rate function: $R(u) = \lim_{n \to \infty} -\frac{1}{n} \log \Pr\left[|\langle x, x' \rangle| \ge u\right]$. In other words: $\Pr\left[|\langle x, x' \rangle| \ge u\right] \approx \exp(-n\, R(u))$.
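The rate-function approximation can be checked directly against exact binomial tails in the Rademacher case (a sketch; $n$ and $u$ are arbitrary): for $x, x'$ i.i.d. $\mathrm{Unif}\{\pm 1/\sqrt{n}\}^n$, $n\langle x, x'\rangle$ is a sum of $n$ independent signs, and $R$ is the Rademacher formula from the next slide.

```python
import numpy as np
from scipy.stats import binom

# Check Pr[|<x,x'>| >= u] ~ exp(-n R(u)) for the Rademacher prior.
n, u = 400, 0.3
# n<x,x'> = 2K - n with K ~ Binomial(n, 1/2), so |<x,x'>| >= u iff K >= n(1+u)/2 (or the mirror).
k = int(np.ceil(n * (1 + u) / 2))
tail = 2 * binom.sf(k - 1, n, 0.5)                 # Pr[K >= k] plus the symmetric lower tail
h = lambda p: -p*np.log(p) - (1-p)*np.log(1-p)     # binary entropy (nats)
R = np.log(2) - h((1 + u) / 2)                     # Rademacher rate function
print(-np.log(tail) / n, R)                        # close for large n
```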

Proof Details: Bounding the (Wigner) Second Moment (continued)

$$\mathbb{E}_{Q_n}\left(\frac{dP_n}{dQ_n}\right)^2 \lesssim \int_0^1 \lambda^2 n\, u \exp\left(n\left(\frac{\lambda^2}{2} u^2 - R(u)\right)\right) du.$$

This is bounded iff $\frac{\lambda^2}{2} u^2 - R(u) < 0$ for all $u \in (0, 1)$. In other words: how big a parabola can you fit underneath the rate function?

E.g. the Rademacher prior ($\pm 1/\sqrt{n}$) has $R(u) = \log 2 - h\left(\frac{1+u}{2}\right)$, where $h(p) = -p \log p - (1-p) \log(1-p)$.

[Figure: $R(u)$ versus the parabola $\frac{1}{2} u^2$, i.e. $\lambda = 1$.]
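The parabola condition itself can be verified numerically for the Rademacher rate function (a quick sketch): $\frac{\lambda^2}{2} u^2 \le R(u)$ holds on $(0, 1)$ all the way up to $\lambda = 1$, consistent with PCA being optimal for this prior.

```python
import numpy as np

h = lambda p: -p*np.log(p) - (1-p)*np.log(1-p)   # binary entropy (nats)
u = np.linspace(1e-3, 1 - 1e-3, 100_000)
R = np.log(2) - h((1 + u) / 2)                   # Rademacher rate function
lam = 1.0
print(np.max(lam**2 * u**2 / 2 - R))             # <= 0 (up to float error): parabola fits under R
```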

Improvement: Conditioning

Goal: $P_n$ contiguous to $Q_n$: $Q_n(A_n) \to 0 \implies P_n(A_n) \to 0$.

It is sufficient to show that $\tilde{P}_n$ is contiguous to $Q_n$, where $P_n$ and $\tilde{P}_n$ agree with probability $1 - o(1)$. Choose $\tilde{P}_n$ to make $\mathbb{E}_{Q_n}\left(\frac{d\tilde{P}_n}{dQ_n}\right)^2$ small: condition away from rare bad events.

[J. Banks, C. Moore, J. Neeman, P. Netrapalli, COLT 2016]

Conditioning Example: Sparse Rademacher Prior

Sparsity $\rho \in [0, 1]$. Prior $\mathcal{X}_\rho$: i.i.d.
$$x_i \sim \frac{1}{\sqrt{\rho n}} \times \begin{cases} +1 & \text{w.p. } \rho/2 \\ -1 & \text{w.p. } \rho/2 \\ 0 & \text{w.p. } 1 - \rho. \end{cases}$$

$P_n$: spiked Wigner with prior $\mathcal{X}_\rho$; $Q_n$: unspiked. $\tilde{P}_n$: change the prior to $\tilde{\mathcal{X}}_\rho$, conditioning on a close-to-typical proportion of nonzeros (a minimal sampler is sketched below).

[Figure: conditioned vs. unconditioned rate function $R(u)$, for $\rho = 0.2$.]

[J. Banks, C. Moore, R. Vershynin, J. Xu]
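A minimal sketch of the sparse Rademacher prior (the conditioning step is only indicated in a comment; parameters arbitrary):

```python
import numpy as np

def sparse_rademacher(n, rho, rng):
    """i.i.d. entries: +-1/sqrt(rho*n) w.p. rho/2 each, 0 w.p. 1 - rho."""
    signs = rng.choice([-1.0, 1.0], size=n)
    support = rng.random(n) < rho
    return signs * support / np.sqrt(rho * n)

rng = np.random.default_rng(6)
x = sparse_rademacher(10_000, 0.2, rng)
# ||x|| ~ 1; the conditioned prior X~_rho would resample until the number of
# nonzeros is close to its typical value rho*n, keeping ||x||^2 = 1 + o(1).
print(np.count_nonzero(x), np.linalg.norm(x))
```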

Sparse Rademacher: Results

[Figure: detection threshold $\lambda$ as a function of sparsity $\rho$, comparing the unconditioned bound, the conditioned bound, a noise-conditioned bound (upcoming), and the replica prediction (the truth).]

Summary

3 scenarios:
- PCA optimal (Wigner with spherical or Rademacher prior)
- PCA beaten efficiently (non-Gaussian Wigner)
- PCA beaten inefficiently (Rademacher Wishart; sparse Rademacher Wigner)

Second moment method: a simple, widely applicable technique for proving non-detection lower bounds.

Is it possible to match the replica prediction with a simple method?

Thanks! Questions?
