Optimality and Sub-optimality of Principal Component Analysis for Spiked Random Matrices
Alex Wein (MIT)
Joint work with: Amelia Perry (MIT), Afonso Bandeira (Courant, NYU), Ankur Moitra (MIT)
Random Matrices

Wigner matrix: (1/√n) W, where W ∈ R^(n×n) is symmetric with W_ij ~ N(0, 1).
Wishart matrix: (1/N) Σ_{k=1}^{N} y_k y_kᵀ, where y_k ~ N(0, I_n).

[E. P. Wigner, AoM; V. A. Marchenko, L. A. Pastur, M.S.]
Spiked Models

Spiked Wigner matrix: Y = (1/√n) W + λ x xᵀ, with (1/√n) W Wigner and ‖x‖ = 1.
Visible on the largest eigenvalue when λ > 1.

Spiked Wishart matrix: Y = (1/N) Σ_{k=1}^{N} y_k y_kᵀ, with y_k ~ N(0, I_n + β x xᵀ), ‖x‖ = 1, β ∈ [−1, ∞).
Visible on the largest eigenvalue when β > √γ, where γ = n/N.

[J. Baik, G. Ben Arous, S. Péché, AoP; D. Féral, S. Péché, CMP]
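The spectral (BBP) threshold above is easy to see numerically. The sketch below is my own illustration (the parameters n and λ are arbitrary choices, not from the talk): it builds a spiked Wigner matrix and compares its top eigenvalue to the bulk edge 2 and to the outlier location λ + 1/λ that appears once λ > 1.

```python
import numpy as np

def top_eigenvalue_spiked_wigner(n, lam, rng):
    """Largest eigenvalue of Y = W/sqrt(n) + lam * x x^T with a unit spike x."""
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2)      # symmetric; off-diagonal entries N(0, 1)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)          # uniform spike on the unit sphere
    Y = W / np.sqrt(n) + lam * np.outer(x, x)
    return np.linalg.eigvalsh(Y)[-1]

rng = np.random.default_rng(0)
n = 1200
below = top_eigenvalue_spiked_wigner(n, 0.5, rng)  # lam < 1: sticks to the bulk edge ~2
above = top_eigenvalue_spiked_wigner(n, 1.5, rng)  # lam > 1: separates, near lam + 1/lam
print(below, above)
```

Below the threshold the spike leaves no visible trace on the top eigenvalue, which is exactly why the detection question becomes interesting there.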
Statistical Questions

Detection: distinguish reliably (error probability → 0) between Y ~ (1/√n) W (Wigner) and Y ~ (1/√n) W + λ x xᵀ.
Recovery: achieve nontrivial correlation with the truth: |⟨x, x̂⟩| > ɛ.

PCA solves both detection and recovery above the threshold λ > 1 (Wigner). Are they statistically possible below the threshold? This requires a prior on the spike x ∈ Rⁿ, e.g. uniform on the unit sphere, i.i.d. ±1/√n entries, or sparse ±1 entries (suitably rescaled).
Detection vs Recovery

[Figure: hypothesis testing power and recovery quality as functions of λ, comparing the optimal curve to PCA. In one scenario PCA matches the optimal curve; in another it is sub-optimal.]

This talk: focus on the detection threshold (also hypothesis testing bounds, recovery threshold).
3 Scenarios

1. PCA achieves the optimal threshold (e.g. Wigner with spherical prior).
2. PCA can be beaten by an efficient algorithm (e.g. non-Gaussian Wigner).
3. PCA can be beaten, but only by an inefficient algorithm (e.g. sparse priors; Wishart).
Tool: Contiguity

A sequence of distributions P_n is contiguous to Q_n if for any sequence of events A_n, lim_{n→∞} Q_n(A_n) = 0 implies lim_{n→∞} P_n(A_n) = 0. [L. Le Cam]

If the distributions are contiguous, there is no reliable test to distinguish them. (Proof sketch: let A_n be the event that the distinguisher says P_n; a reliable test would need Q_n(A_n) → 0 yet P_n(A_n) → 1, contradicting contiguity.)

Theorem (second moment). If E_{Q_n}[(dP_n/dQ_n)²] = O(1), then P_n is contiguous to Q_n. (E.g. if P_n, Q_n have densities p_n, q_n, then E_{Q_n}[(dP_n/dQ_n)²] = E_{Y~Q_n}[(p_n(Y)/q_n(Y))²].)

Proof (Cauchy–Schwarz):
P_n(A_n) = ∫_{A_n} dP_n = ∫_{A_n} (dP_n/dQ_n) dQ_n ≤ (∫ (dP_n/dQ_n)² dQ_n)^{1/2} · (∫_{A_n} dQ_n)^{1/2} = (E_{Q_n}[(dP_n/dQ_n)²])^{1/2} · Q_n(A_n)^{1/2}.
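As a toy illustration of the theorem (my own example, not from the talk): take P_n = N(a/√n, 1)^⊗n versus Q_n = N(0, 1)^⊗n. The second moment of the likelihood ratio is e^{a²} = O(1), so the sequences are contiguous, and indeed the error of even the optimal (likelihood-ratio) test stays bounded away from 0 as n grows.

```python
import numpy as np

rng = np.random.default_rng(4)
a = 1.0  # per-coordinate shift a/sqrt(n); second moment of dP/dQ is e^{a^2}, bounded

def lr_test_error(n, trials=4000):
    """Average error of the likelihood-ratio test between Q_n = N(0,1)^n and
    P_n = N(a/sqrt(n), 1)^n; the sample sum is a sufficient statistic."""
    mu = a / np.sqrt(n)
    s0 = rng.standard_normal((trials, n)).sum(axis=1)          # samples from Q_n
    s1 = (rng.standard_normal((trials, n)) + mu).sum(axis=1)   # samples from P_n
    thresh = n * mu / 2                                        # midpoint threshold
    return 0.5 * (np.mean(s0 > thresh) + np.mean(s1 <= thresh))

errs = [lr_test_error(n) for n in (10, 100, 400)]
print(errs)  # stays near Phi(-a/2) ~ 0.31 at every n: no reliable test exists
```

The error does not decay with n, which is precisely the "no reliable test" conclusion that contiguity encodes.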
Second Moment

Taking P_n: (1/√n) W + λ x xᵀ with x ~ X, and Q_n: (1/√n) W,

E_{Q_n}[(dP_n/dQ_n)²] = E_{x,x'~X} exp((λ² n / 2) ⟨x, x'⟩²),

where x, x' are independent draws from the prior. E.g. if the prior X is uniform on the unit sphere (and λ < 1):

lim_{n→∞} E_{x,x'~X} exp((λ² n / 2) ⟨x, x'⟩²) = (1 − λ²)^{−1/2} < ∞,

so the distributions are contiguous below the spectral threshold. The same can be shown for the Wishart case. But what about when we know more about the spike?

[A. Montanari, D. Reichman, O. Zeitouni, NIPS; A. Onatski, M. J. Moreira, M. Hallin, AoS]
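A quick numerical sanity check (my own, not from the slides): for the spherical prior, √n⟨x, x'⟩ is asymptotically N(0, 1), so the limit above reduces to the scalar Gaussian computation E exp(λ²g²/2) = (1 − λ²)^{−1/2}, which is finite exactly when λ < 1.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.6
g = rng.standard_normal(2_000_000)              # limit law of sqrt(n) * <x, x'>
mc = np.mean(np.exp(0.5 * lam**2 * g**2))       # Monte Carlo estimate of the limit
exact = (1 - lam**2) ** -0.5                    # MGF of g^2 at t = lam^2 / 2
print(mc, exact)  # agree for lam < 1; both expressions blow up as lam -> 1
```

The blow-up at λ = 1 is the second moment method "feeling" the spectral threshold.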
Rademacher Spike Prior

Y ~ (1/√n) W (Wigner) vs Y ~ (1/√n) W + λ x xᵀ, with x ~ Unif({±1/√n}ⁿ).

Still contiguous for λ < 1, and the contiguity argument goes through for a general class of priors! But for sparse priors, with enough sparsity, PCA is no longer optimal.

[J. Banks, C. Moore, R. Vershynin, J. Xu]
What About Wishart?

(1/N) Σ_{k=1}^{N} y_k y_kᵀ, with y_k ~ N(0, I_n) vs y_k ~ N(0, I_n + β x xᵀ), x ~ Unif({±1/√n}ⁿ).

Same story for the spherical prior: contiguous for β < √γ (the PCA threshold). For the Rademacher prior, something surprising happens:

- If n/N = γ < 1/3, then the models are contiguous for β < √γ.
- But... for γ above a certain constant, there exists a computationally inefficient procedure that distinguishes the two models for some β ∈ (−√γ, 0), i.e. below the spectral threshold.

Is there a computational gap?
Rademacher Wishart, Negative β

[Figure: phase diagram in (β, γ). PCA succeeds above one line; an inefficient algorithm succeeds above a second line; the contiguity lower bound shows impossibility below a third.]
Back to Wigner: What If the Noise Is Not Gaussian?

x ~ Unif(S^{n−1}), W ∈ R^(n×n), Y = (1/√n) W + λ x xᵀ, but now W_ij ~ p(w) such that E w = 0, E w² = 1.

Universality: the spectral properties are unchanged...

[D. Féral, S. Péché, CMP; T. Tao, V. Vu, MAoRMT]
Can You Tell Which One Is Which?

For W_ij ~ Unif(±1) and λ < 1: W + λ√n x xᵀ vs W.

[Figure: the two spectra, visually indistinguishable.]

Let's restrict ourselves to the case where the density p(w) is smooth.
Entrywise Function

If the noise is drawn from a non-Gaussian p(w), we will beat PCA by applying some function f: R → R entrywise to our matrix Y = W + λ√n x xᵀ, followed by PCA:

f(Y_ij) = f(W_ij + λ√n x_i x_j)
        ≈ f(W_ij) + f'(W_ij) λ√n x_i x_j
        = f(W_ij) + E[f'(W_ij)] λ√n x_i x_j + (f'(W_ij) − E[f'(W_ij)]) λ√n x_i x_j
        ≈ f(W_ij) + E[f'(W_ij)] λ√n x_i x_j.

This is (close to) a new spiked Wigner matrix with λ' = λ E[f'(W_ij)] / √(E[f(W_ij)²]). Calculus of variations gives the optimal choice of f: f(w) = −p'(w)/p(w).
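The effective SNR formula above can be explored by Monte Carlo. The sketch below is my own illustration (the unit-variance logistic density is an arbitrary smooth choice, and the candidate functions are made up for comparison): for each candidate f it estimates the boost E[f'(W)] / √(E[f(W)²]), and the score −p'/p comes out on top, attaining √F_p = π/3 for the logistic.

```python
import numpy as np

rng = np.random.default_rng(3)
s = np.sqrt(3) / np.pi                   # logistic scale giving unit variance
w = rng.logistic(scale=s, size=2_000_000)

def boost(f, fprime):
    """Effective SNR multiplier E[f'(W)] / sqrt(E[f(W)^2]) of entrywise f."""
    return np.mean(fprime(w)) / np.sqrt(np.mean(f(w) ** 2))

# Score function of the logistic density: -p'(w)/p(w) = (2*sigmoid(w/s) - 1)/s.
sig = lambda t: 1.0 / (1.0 + np.exp(-t))
score = lambda v: (2.0 * sig(v / s) - 1.0) / s
score_prime = lambda v: 2.0 * sig(v / s) * (1.0 - sig(v / s)) / s**2

results = {
    "identity (plain PCA)": boost(lambda v: v, lambda v: np.ones_like(v)),
    "cube": boost(lambda v: v**3, lambda v: 3.0 * v**2),
    "score": boost(score, score_prime),
}
print(results)
# The score attains sqrt(F_p) = pi/3 ~ 1.047, the best possible by Cauchy-Schwarz.
```

The identity gives boost 1 (plain PCA), a generic nonlinearity can even hurt, and only the score saturates the Cauchy–Schwarz bound E[f'] = E[f · (−p'/p)] ≤ √(E[f²]) · √F_p.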
Pre-transformed PCA

Y → f(Y).

[Figure: dashed: p(w); solid: f(w) = −p'(w)/p(w).]

New threshold at λ = 1/√F_p, where F_p = E_p[(p'(w)/p(w))²] ≥ 1 is the Fisher information. The Gaussian is the hardest noise (F_p = 1). Contiguity shows this is optimal!

[T. Lesieur, F. Krzakala, L. Zdeborová, Allerton; F. Krzakala, J. Xu, L. Zdeborová]
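A numerical illustration (my own construction, not from the talk): take unit-variance Laplace noise, whose score is −p'(w)/p(w) = √2·sign(w) and whose Fisher information is F_p = 2, so the pre-transformed threshold drops to λ = 1/√2. (Strictly, the talk restricts to smooth densities; the Laplace score still behaves as expected here and keeps the example simple.) For λ between 1/√2 and 1, entrywise f followed by PCA recovers the spike while plain PCA does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 2000, 0.85                 # 1/sqrt(2) < lam < 1: below the plain PCA threshold

# Symmetric matrix with i.i.d. unit-variance Laplace entries (scale 1/sqrt(2)).
A = rng.laplace(scale=1 / np.sqrt(2), size=(n, n))
W = np.triu(A, 1) + np.triu(A, 1).T + np.diag(np.diag(A))
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
Y = W + lam * np.sqrt(n) * np.outer(x, x)   # unnormalized spiked matrix

def top_corr2(M):
    """Squared overlap of the top eigenvector of M with the planted spike x."""
    _, vecs = np.linalg.eigh(M)
    return float(vecs[:, -1] @ x) ** 2

plain = top_corr2(Y / np.sqrt(n))           # ordinary PCA: lam < 1, near-zero overlap

# Entrywise score f(w) = -p'(w)/p(w) = sqrt(2)*sign(w); the effective SNR
# becomes lam * sqrt(F_p) = lam * sqrt(2) > 1, so PCA on f(Y) succeeds.
fY = np.sqrt(2) * np.sign(Y)
pre = top_corr2(fY / np.sqrt(2 * n))        # renormalize: E[f(W)^2] = 2
print(plain, pre)
```

The transformed matrix behaves like a spiked Wigner model at SNR λ√2, which is above 1, so its top eigenvector correlates nontrivially with x.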
Proof Details: Bounding the (Wigner) Second Moment

E_{Q_n}[(dP_n/dQ_n)²] = E_{x,x'~X} exp((λ² n / 2) ⟨x, x'⟩²)
  = ∫₀^∞ Pr[exp((λ² n / 2) ⟨x, x'⟩²) ≥ t] dt
  = ∫₀^∞ 2 Pr[⟨x, x'⟩ ≥ √((2 log t)/(λ² n))] dt    (by symmetry of ⟨x, x'⟩; the t < 1 part contributes at most 1)
  = 2 ∫₀^1 Pr[⟨x, x'⟩ ≥ u] · λ² n u · exp((λ² n / 2) u²) du    (substituting t = exp((λ² n / 2) u²))
  ≈ 2 ∫₀^1 λ² n u · exp(n (λ² u² / 2 − R(u))) du.

Rate function: R(u) = −lim_{n→∞} (1/n) log Pr[⟨x, x'⟩ ≥ u]. In other words, Pr[⟨x, x'⟩ ≥ u] ≈ exp(−n R(u)).
Proof Details: Bounding the (Wigner) Second Moment (continued)

E_{Q_n}[(dP_n/dQ_n)²] ≲ 2 ∫₀^1 λ² n u · exp(n (λ² u² / 2 − R(u))) du.

This is bounded iff λ² u² / 2 − R(u) < 0 for all u ∈ (0, 1). How big a parabola can you fit underneath the rate function?

E.g. the Rademacher prior (±1/√n entries) has R(u) = log 2 − h((1 + u)/2), where h(p) = −p log p − (1 − p) log(1 − p) is the binary entropy; here R(u) ≥ u²/2, which allows λ = 1.
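The claim that the unit parabola fits under the Rademacher rate function is easy to check numerically (my own check, grid resolution arbitrary):

```python
import numpy as np

def h(p):
    """Binary entropy in nats."""
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

u = np.linspace(1e-6, 1 - 1e-6, 100_000)
R = np.log(2) - h((1 + u) / 2)      # rate function of <x, x'> for the Rademacher prior
gap = R - 0.5 * u**2                # parabola with lambda = 1
print(gap.min())                    # nonnegative: lambda^2 u^2 / 2 stays below R(u)
```

Near u = 0 a Taylor expansion gives R(u) = u²/2 + u⁴/12 + ..., so the gap vanishes only at the origin: λ = 1 is exactly the largest parabola that fits.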
Improvement: Conditioning

Goal: P_n contiguous to Q_n, i.e. Q_n(A_n) → 0 implies P_n(A_n) → 0.

It is sufficient to show that P̃_n is contiguous to Q_n, where P̃_n and P_n agree with probability 1 − o(1). Choose P̃_n to make E_{Q_n}[(dP̃_n/dQ_n)²] small: condition away from rare bad events.

[J. Banks, C. Moore, J. Neeman, P. Netrapalli, COLT]
Conditioning Example: Sparse Rademacher Prior

Sparsity ρ ∈ [0, 1]. Prior X_ρ: i.i.d. x_i = (1/√(ρn)) · { +1 w.p. ρ/2; −1 w.p. ρ/2; 0 w.p. 1 − ρ }.

P_n: spiked Wigner with prior X_ρ; Q_n: unspiked. P̃_n: change the prior to X̃_ρ, conditioning on a close-to-typical proportion of nonzeros.

[Figure: conditioned vs unconditioned rate function R(u), for ρ = 0.2.]

[J. Banks, C. Moore, R. Vershynin, J. Xu]
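A quick look at the prior and the conditioning event (my own illustration; ρ and n are arbitrary): the proportion of nonzero coordinates concentrates tightly around ρ, so conditioning on a close-to-typical proportion discards only o(1) probability mass, which is what makes P̃_n and P_n agree with probability 1 − o(1).

```python
import numpy as np

rng = np.random.default_rng(5)
n, rho = 100_000, 0.2
signs = rng.choice([1.0, -1.0, 0.0], size=n, p=[rho / 2, rho / 2, 1 - rho])
x = signs / np.sqrt(rho * n)        # a draw from the sparse Rademacher prior X_rho
frac = np.mean(signs != 0)          # proportion of nonzero coordinates
norm = np.linalg.norm(x)
print(frac, norm)                   # frac concentrates near rho, norm near 1
```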
Sparse Rademacher: Results

[Figure: phase diagram in (λ, ρ) comparing the unconditioned bound, the conditioned bound, a noise-conditioned bound (upcoming), and the replica prediction (the truth).]
Summary

3 scenarios:
- PCA optimal (Wigner with spherical or Rademacher prior)
- PCA beaten efficiently (non-Gaussian Wigner)
- PCA beaten inefficiently (Rademacher Wishart; sparse Rademacher Wigner)

The second moment method is a simple, widely applicable technique for showing non-detection lower bounds. Is it possible to match the replica prediction with a simple method?

Thanks! Questions?
Rigidity of point processes Let P be a Poisson point process of intensity n in R d, and let A R d be a set of nonzero volume. Let N(A) := A P. Then E(N(A)) = Var(N(A)) = vol(a)n. Thus, N(A) has fluctuations
More informationOn State Estimation with Bad Data Detection
On State Estimation with Bad Data Detection Weiyu Xu, Meng Wang, and Ao Tang School of ECE, Cornell University, Ithaca, NY 4853 Abstract We consider the problem of state estimation through observations
More informationNumerical Solutions to the General Marcenko Pastur Equation
Numerical Solutions to the General Marcenko Pastur Equation Miriam Huntley May 2, 23 Motivation: Data Analysis A frequent goal in real world data analysis is estimation of the covariance matrix of some
More informationSparse PCA in High Dimensions
Sparse PCA in High Dimensions Jing Lei, Department of Statistics, Carnegie Mellon Workshop on Big Data and Differential Privacy Simons Institute, Dec, 2013 (Based on joint work with V. Q. Vu, J. Cho, and
More informationPainlevé Representations for Distribution Functions for Next-Largest, Next-Next-Largest, etc., Eigenvalues of GOE, GUE and GSE
Painlevé Representations for Distribution Functions for Next-Largest, Next-Next-Largest, etc., Eigenvalues of GOE, GUE and GSE Craig A. Tracy UC Davis RHPIA 2005 SISSA, Trieste 1 Figure 1: Paul Painlevé,
More informationSpin Glass Approach to Restricted Isometry Constant
Spin Glass Approach to Restricted Isometry Constant Ayaka Sakata 1,Yoshiyuki Kabashima 2 1 Institute of Statistical Mathematics 2 Tokyo Institute of Technology 1/29 Outline Background: Compressed sensing
More informationNorms of Random Matrices & Low-Rank via Sampling
CS369M: Algorithms for Modern Massive Data Set Analysis Lecture 4-10/05/2009 Norms of Random Matrices & Low-Rank via Sampling Lecturer: Michael Mahoney Scribes: Jacob Bien and Noah Youngs *Unedited Notes
More informationBeyond the Gaussian universality class
Beyond the Gaussian universality class MSRI/Evans Talk Ivan Corwin (Courant Institute, NYU) September 13, 2010 Outline Part 1: Random growth models Random deposition, ballistic deposition, corner growth
More informationThe non-backtracking operator
The non-backtracking operator Florent Krzakala LPS, Ecole Normale Supérieure in collaboration with Paris: L. Zdeborova, A. Saade Rome: A. Decelle Würzburg: J. Reichardt Santa Fe: C. Moore, P. Zhang Berkeley:
More informationSVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More informationPriors for the frequentist, consistency beyond Schwartz
Victoria University, Wellington, New Zealand, 11 January 2016 Priors for the frequentist, consistency beyond Schwartz Bas Kleijn, KdV Institute for Mathematics Part I Introduction Bayesian and Frequentist
More informationMixing in Product Spaces. Elchanan Mossel
Poincaré Recurrence Theorem Theorem (Poincaré, 1890) Let f : X X be a measure preserving transformation. Let E X measurable. Then P[x E : f n (x) / E, n > N(x)] = 0 Poincaré Recurrence Theorem Theorem
More informationOptimal hypothesis testing for stochastic block models with growing degrees
Optimal hypothesis testing for stochastic block models with growing degrees Debapratim Banerjee and Zongming Ma arxiv:705.05305v [math.st] 0 Aug 07 University of Pennsylvania August 0, 07 Abstract The
More informationRandom matrices: A Survey. Van H. Vu. Department of Mathematics Rutgers University
Random matrices: A Survey Van H. Vu Department of Mathematics Rutgers University Basic models of random matrices Let ξ be a real or complex-valued random variable with mean 0 and variance 1. Examples.
More informationLecture 5 : Projections
Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization
More informationFluctuations of the free energy of spherical spin glass
Fluctuations of the free energy of spherical spin glass Jinho Baik University of Michigan 2015 May, IMS, Singapore Joint work with Ji Oon Lee (KAIST, Korea) Random matrix theory Real N N Wigner matrix
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationRandom Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws. Symeon Chatzinotas February 11, 2013 Luxembourg
Random Matrix Theory Lecture 1 Introduction, Ensembles and Basic Laws Symeon Chatzinotas February 11, 2013 Luxembourg Outline 1. Random Matrix Theory 1. Definition 2. Applications 3. Asymptotics 2. Ensembles
More informationDETECTING RARE AND WEAK SPIKES IN LARGE COVARIANCE MATRICES
Submitted to the Annals of Statistics DETECTING RARE AND WEAK SPIKES IN LARGE COVARIANCE MATRICES By Zheng Tracy Ke University of Chicago arxiv:1609.00883v1 [math.st] 4 Sep 2016 Given p-dimensional Gaussian
More informationSmall Ball Probability, Arithmetic Structure and Random Matrices
Small Ball Probability, Arithmetic Structure and Random Matrices Roman Vershynin University of California, Davis April 23, 2008 Distance Problems How far is a random vector X from a given subspace H in
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationAnalysis of Robust PCA via Local Incoherence
Analysis of Robust PCA via Local Incoherence Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yi Zhou Department of EECS Syracuse University Syracuse, NY 3244 yzhou35@syr.edu
More informationBi-cross-validation for factor analysis
Picking k in factor analysis 1 Bi-cross-validation for factor analysis Art B. Owen, Stanford University and Jingshu Wang, Stanford University Picking k in factor analysis 2 Summary Factor analysis is over
More informationSpectral Partitiong in a Stochastic Block Model
Spectral Graph Theory Lecture 21 Spectral Partitiong in a Stochastic Block Model Daniel A. Spielman November 16, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened
More informationMutual information for symmetric rank-one matrix estimation: A proof of the replica formula
Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula Jean Barbier, Mohamad Dia and Nicolas Macris Laboratoire de Théorie des Communications, Faculté Informatique
More informationarxiv: v5 [math.na] 16 Nov 2017
RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem
More informationHigh-dimensional graphical model selection: Practical and information-theoretic limits
1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John
More informationConvergence of Eigenspaces in Kernel Principal Component Analysis
Convergence of Eigenspaces in Kernel Principal Component Analysis Shixin Wang Advanced machine learning April 19, 2016 Shixin Wang Convergence of Eigenspaces April 19, 2016 1 / 18 Outline 1 Motivation
More informationSPECTRAL METHOD FOR MULTIPLEXED PHASE RETRIEVAL AND APPLICATION IN OPTICAL IMAGING IN COMPLEX MEDIA
SPECTRAL METHOD FOR MULTIPLEXED PHASE RETRIEVAL AND APPLICATION IN OPTICAL IMAGING IN COMPLEX MEDIA Jonathan Dong 1,2 Florent Krzakala 2 Sylvain Gigan 1 We will first introduce the phase retrieval problem
More informationDiscussion of High-dimensional autocovariance matrices and optimal linear prediction,
Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University
More informationA Note on the Central Limit Theorem for the Eigenvalue Counting Function of Wigner and Covariance Matrices
A Note on the Central Limit Theorem for the Eigenvalue Counting Function of Wigner and Covariance Matrices S. Dallaporta University of Toulouse, France Abstract. This note presents some central limit theorems
More informationFIRST ORDER SENTENCES ON G(n, p), ZERO-ONE LAWS, ALMOST SURE AND COMPLETE THEORIES ON SPARSE RANDOM GRAPHS
FIRST ORDER SENTENCES ON G(n, p), ZERO-ONE LAWS, ALMOST SURE AND COMPLETE THEORIES ON SPARSE RANDOM GRAPHS MOUMANTI PODDER 1. First order theory on G(n, p) We start with a very simple property of G(n,
More informationRandom Matrix: From Wigner to Quantum Chaos
Random Matrix: From Wigner to Quantum Chaos Horng-Tzer Yau Harvard University Joint work with P. Bourgade, L. Erdős, B. Schlein and J. Yin 1 Perhaps I am now too courageous when I try to guess the distribution
More informationDense Error Correction for Low-Rank Matrices via Principal Component Pursuit
Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit Arvind Ganesh, John Wright, Xiaodong Li, Emmanuel J. Candès, and Yi Ma, Microsoft Research Asia, Beijing, P.R.C Dept. of Electrical
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationUniversality of local spectral statistics of random matrices
Universality of local spectral statistics of random matrices László Erdős Ludwig-Maximilians-Universität, Munich, Germany CRM, Montreal, Mar 19, 2012 Joint with P. Bourgade, B. Schlein, H.T. Yau, and J.
More informationStatistical and Computational Phase Transitions in Planted Models
Statistical and Computational Phase Transitions in Planted Models Jiaming Xu Joint work with Yudong Chen (UC Berkeley) Acknowledgement: Prof. Bruce Hajek November 4, 203 Cluster/Community structure in
More informationHigh-dimensional graphical model selection: Practical and information-theoretic limits
1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John
More informationStochastic geometry and random matrix theory in CS
Stochastic geometry and random matrix theory in CS IPAM: numerical methods for continuous optimization University of Edinburgh Joint with Bah, Blanchard, Cartis, and Donoho Encoder Decoder pair - Encoder/Decoder
More informationLecture 13 October 6, Covering Numbers and Maurey s Empirical Method
CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and
More informationDistribution of Eigenvalues of Weighted, Structured Matrix Ensembles
Distribution of Eigenvalues of Weighted, Structured Matrix Ensembles Olivia Beckwith 1, Steven J. Miller 2, and Karen Shen 3 1 Harvey Mudd College 2 Williams College 3 Stanford University Joint Meetings
More informationEfficient Spectral Methods for Learning Mixture Models!
Efficient Spectral Methods for Learning Mixture Models Qingqing Huang 2016 February Laboratory for Information & Decision Systems Based on joint works with Munther Dahleh, Rong Ge, Sham Kakade, Greg Valiant.
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationThe circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA)
The circular law Lewis Memorial Lecture / DIMACS minicourse March 19, 2008 Terence Tao (UCLA) 1 Eigenvalue distributions Let M = (a ij ) 1 i n;1 j n be a square matrix. Then one has n (generalised) eigenvalues
More informationBayesian Regularization
Bayesian Regularization Aad van der Vaart Vrije Universiteit Amsterdam International Congress of Mathematicians Hyderabad, August 2010 Contents Introduction Abstract result Gaussian process priors Co-authors
More informationCompressed Sensing and Sparse Recovery
ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior
More informationTHE PHYSICS OF COUNTING AND SAMPLING ON RANDOM INSTANCES. Lenka Zdeborová
THE PHYSICS OF COUNTING AND SAMPLING ON RANDOM INSTANCES Lenka Zdeborová (CEA Saclay and CNRS, France) MAIN CONTRIBUTORS TO THE PHYSICS UNDERSTANDING OF RANDOM INSTANCES Braunstein, Franz, Kabashima, Kirkpatrick,
More informationInformation Recovery from Pairwise Measurements
Information Recovery from Pairwise Measurements A Shannon-Theoretic Approach Yuxin Chen, Changho Suh, Andrea Goldsmith Stanford University KAIST Page 1 Recovering data from correlation measurements A large
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationTractable Upper Bounds on the Restricted Isometry Constant
Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationMismatched Estimation in Large Linear Systems
Mismatched Estimation in Large Linear Systems Yanting Ma, Dror Baron, Ahmad Beirami Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 7695, USA Department
More informationContraction Principles some applications
Contraction Principles some applications Fabrice Gamboa (Institut de Mathématiques de Toulouse) 7th of June 2017 Christian and Patrick 59th Birthday Overview Appetizer : Christian and Patrick secret lives
More informationSpectral thresholds in the bipartite stochastic block model
Spectral thresholds in the bipartite stochastic block model Laura Florescu and Will Perkins NYU and U of Birmingham September 27, 2016 Laura Florescu and Will Perkins Spectral thresholds in the bipartite
More informationInterpolation-Based Trust-Region Methods for DFO
Interpolation-Based Trust-Region Methods for DFO Luis Nunes Vicente University of Coimbra (joint work with A. Bandeira, A. R. Conn, S. Gratton, and K. Scheinberg) July 27, 2010 ICCOPT, Santiago http//www.mat.uc.pt/~lnv
More informationBulk scaling limits, open questions
Bulk scaling limits, open questions Based on: Continuum limits of random matrices and the Brownian carousel B. Valkó, B. Virág. Inventiones (2009). Eigenvalue statistics for CMV matrices: from Poisson
More information