Phase transitions in low-rank matrix estimation
1 Phase transitions in low-rank matrix estimation
Florent Krzakala
arxiv: &
2 Thibault Lesieur, Lenka Zdeborová, Caterina De Bacco, Cris Moore, Jess Banks, Jiaming Xu, Jean Barbier, Nicolas Macris, Mohamad Dia
3 Low-rank matrix estimation
$W = \frac{1}{\sqrt{n}} XX^T$ or $W = \frac{1}{\sqrt{n}} UV^T$, with $X \in \mathbb{R}^{n\times r}$ (or $U \in \mathbb{R}^{m\times r}$, $V \in \mathbb{R}^{n\times r}$) unknown.
The matrix $W$ has low (finite) rank $r \ll n, m$. $W$ is observed element-wise through a noisy channel $P_{out}(y_{ij}|w_{ij})$.
Goal: estimate the unknown $X$ (or $U$ & $V$) from the known $Y$.
4 Some examples
$x_i^T = (0,\dots,0,1,0,\dots,0)$: an $r$-dimensional indicator variable ($r$ = rank), $w_{ij} = x_i^T x_j/\sqrt{n}$.
(Goal: estimate the unknown $X$ from the known $Y$.)
5 Some examples
$x_i^T = (0,\dots,0,1,0,\dots,0)$, $w_{ij} = x_i^T x_j/\sqrt{n}$.
Additive white Gaussian noise (sub-matrix localization): $P_{out}(y|w) = \frac{1}{\sqrt{2\pi\Delta}}\, e^{-\frac{(y-w)^2}{2\Delta}}$.
(Goal: estimate the unknown $X$ from the known $Y$.)
6 Some examples
$x_i^T = (0,\dots,0,1,0,\dots,0)$, $w_{ij} = x_i^T x_j/\sqrt{n}$.
Additive white Gaussian noise (sub-matrix localization): $P_{out}(y|w) = \frac{1}{\sqrt{2\pi\Delta}}\, e^{-\frac{(y-w)^2}{2\Delta}}$.
Dense stochastic block model (community detection): $P_{out}(y_{ij}=1|w_{ij}) = p_{out} + \mu w_{ij}$, $P_{out}(y_{ij}=0|w_{ij}) = 1 - p_{out} - \mu w_{ij}$, so that $p_{in} = p_{out} + \mu/\sqrt{n}$ and $\Delta = \frac{p_{out}(1-p_{out})}{\mu^2}$.
(Goal: estimate the unknown $X$ from the known $Y$.)
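To make the channel concrete, here is a minimal sketch that samples a dense SBM observation matrix for two balanced groups, using the equivalent ±1 encoding of the labels; all parameter values and variable names are illustrative, not from the talk:

```python
import numpy as np

# Minimal sketch: sample the dense SBM channel of this slide for two
# balanced groups, encoding the labels as +/-1 spins.
rng = np.random.default_rng(0)
n, p_out, mu = 2000, 0.5, 0.3

x = rng.choice([-1.0, 1.0], size=n)           # group labels as spins
w = np.outer(x, x) / np.sqrt(n)               # w_ij = x_i x_j / sqrt(n)

# P_out(y_ij = 1 | w_ij) = p_out + mu * w_ij  (stays in [0,1] for large n)
y = (rng.random((n, n)) < p_out + mu * w).astype(int)
y = np.triu(y, 1) + np.triu(y, 1).T           # symmetric adjacency matrix

print("p_in - p_out    =", mu / np.sqrt(n))
print("effective Delta =", p_out * (1 - p_out) / mu**2)
```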
7 Some examples
$u_i^T = (u_i^1,\dots,u_i^R) \in \mathbb{R}^R$, $v_i^T = (0,\dots,0,1,0,\dots,0)$, $Y = W + \mathcal{N}(0,\Delta)$ with $W = \frac{1}{\sqrt{n}} UV^T$.
Additive white Gaussian noise (clustering a mixture of Gaussians).
(Goal: estimate the unknown $U$ and $V$ from the known $Y$.)
8 Even more examples
Sparse PCA, robust PCA. Collaborative filtering (low-rank matrix completion). 1-bit collaborative filtering (like/unlike). Bi-clustering. Planted clique (cf. Andrea Montanari yesterday). Etc.
9 QUESTIONS
Many interesting problems can be formulated this way.
Q1: When is it possible to perform a good factorization? Q2: When is it algorithmically tractable? Q3: How good are spectral methods (the main tool)?
Yesterday: Andrea taught us how to analyze AMP for such problems with the state evolution approach. Today: we continue in this direction and answer these questions in a probabilistic setting, with instances randomly generated from a model.
10 Probabilistic setting
Assume $X$ is generated from $P(X) = \prod_{i=1}^n P_X(x_i)$. The posterior distribution reads
$P(X|Y) = \frac{1}{Z(Y)} \prod_{i=1}^n P_X(x_i) \prod_{i<j} P_{out}\!\left(y_{ij} \,\Big|\, \frac{x_i^T x_j}{\sqrt{n}}\right)$
Graphical model where the $y_{ij}$ are pair-wise observations of the variables. MMSE estimator (minimal error) = marginal probabilities of the posterior.
EXACTLY solvable? When $P_X$ and $P_{out}$ are known, $n\to\infty$, $r = O(1)$:
Approximate message passing = dense-factor-graph simplification of belief propagation, hopefully giving asymptotically exact marginals of the posterior distribution. Exact analysis possible with statistical-physics style methods. Many rigorous proofs possible when one works a bit harder.
11 OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof
12 OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof
13 AMP FOR GAUSSIAN ADDITIVE CHANNEL
$w_{ij} = x_i^T x_j/\sqrt{n}$, AWGN: $y_{ij} = w_{ij} + \xi_{ij}$, with the score matrix $S = Y/\Delta$:
$B^t = \frac{1}{\sqrt{n}}\, S a^t - \frac{1}{\Delta n}\Big(\sum_i v_i^t\Big)\, a^{t-1}, \qquad A^t = \frac{1}{\Delta n} \sum_i a_i^t (a_i^t)^T$
Mean and variance of the marginals: $a_i^{t+1} = f(A^t, B_i^t)$, $v_i^{t+1} = \partial_B f(A^t, B_i^t)$.
First written for rank $r=1$ in Rangan, Fletcher '12.
Dependence on the prior only via a thresholding function $f(A,B)$ given by the mean of $P(x) = \frac{1}{Z(A,B)}\, P_X(x)\, \exp\!\left(B^T x - \frac{x^T A x}{2}\right)$.
14 AMP FOR GAUSSIAN ADDITIVE CHANNEL
(Same equations as on the previous slide.)
Note: for $x = \pm 1$, these are nothing but the TAP equations for the Ising Sherrington-Kirkpatrick model (on the Nishimori line):
$a_i^{t+1} = \tanh(B_i^t), \qquad v_i^{t+1} = 1 - (a_i^{t+1})^2$
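A minimal sketch of this iteration for the ±1 prior (the tanh form above), on data generated from the Gaussian channel; the size, seed and Δ are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Minimal sketch: rank-one AMP for the Gaussian channel with a +/-1 prior,
# i.e. the TAP iteration a = tanh(B) above. Below Delta_c = 1 (for this
# prior) AMP recovers a macroscopic overlap with the signal.
rng = np.random.default_rng(1)
n, delta, n_iter = 3000, 0.5, 50

x = rng.choice([-1.0, 1.0], size=n)                    # ground truth
xi = rng.normal(size=(n, n))
Y = np.outer(x, x) / np.sqrt(n) + np.sqrt(delta / 2) * (xi + xi.T)

S = Y / delta                                          # Fisher score matrix
a, a_old = 1e-3 * rng.normal(size=n), np.zeros(n)      # weak random init
v = np.ones(n)                                         # prior variance of x

for t in range(n_iter):
    B = S @ a / np.sqrt(n) - (v.sum() / (delta * n)) * a_old  # Onsager term
    a, a_old = np.tanh(B), a                           # f(A,B) = tanh(B) here
    v = 1.0 - a**2

print("overlap |M| =", abs(a @ x) / n)                 # sign symmetry: use |.|
```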
15 State evolution
Single-letter characterization of AMP: $M^t = \frac{1}{n}\, \hat{x}_{AMP}^T x$
$M^{t+1} = \mathbb{E}_{x,z}\!\left[ f\!\left(\frac{M^t}{\Delta},\, \frac{M^t}{\Delta}\, x + \sqrt{\frac{M^t}{\Delta}}\, z \right) x \right]$
Depends on the channel only through its Fisher information: a 1-parameter family of channels having the same MMSE.
Cf. yesterday's talk: rigorously proven in Montanari, Bayati '10; Montanari, Deshpande '14.
16 State evolution
(Same single-letter recursion as on the previous slide.)
Note: for $x = \pm 1$, this is nothing but the replica symmetric equation for the Ising Sherrington-Kirkpatrick model (on the Nishimori line):
$M^{t+1} = \int \mathcal{D}z\, \tanh\!\left(\frac{M^t}{\Delta} + \sqrt{\frac{M^t}{\Delta}}\, z\right)$
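The scalar recursion is easy to iterate numerically; the following sketch uses Gauss-Hermite quadrature for the Gaussian integral (an illustrative choice, not the talk's code):

```python
import numpy as np

# Minimal sketch: iterate the state evolution above for the +/-1 prior,
# M <- \int Dz tanh(M/Delta + sqrt(M/Delta) z).
nodes, weights = np.polynomial.hermite_e.hermegauss(41)
weights = weights / np.sqrt(2 * np.pi)   # E_{z~N(0,1)}[g(z)] ~ weights @ g(nodes)

def se_fixed_point(delta, m=1e-4, n_iter=300):
    for _ in range(n_iter):
        m = weights @ np.tanh(m / delta + np.sqrt(m / delta) * nodes)
    return m

for delta in (0.5, 0.9, 1.1):
    print(f"Delta = {delta}: M* = {se_fixed_point(delta):.4f}")
# M* > 0 for Delta < 1 and M* -> 0 for Delta > 1: the transition sits at
# Delta_c = E[x^2]^2 = 1 for this prior.
```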
17 Replica Mutual information
Most quantities of interest can be computed from the mutual information (the free energy, for physicists):
$I(X;Y) = \int dx\, dy\, P(x,y) \log \frac{P(x,y)}{P(x)P(y)}$
The replica method predicts an asymptotic formula for $i = I/n$:
$i_{RS}(m) = \frac{m^2 + \mathbb{E}_x[x^2]^2}{4\Delta} - \mathbb{E}_{x,z}\!\left[ J\!\left(\frac{m}{\Delta},\, \frac{m}{\Delta}\, x + \sqrt{\frac{m}{\Delta}}\, z\right)\right]$
with $J(A,B) = \log \int dx\, P_X(x)\, e^{Bx - Ax^2/2}$
FACT: the fixed points of the AMP state evolution are the stationary points of $i_{RS}(m)$.
CONJECTURE: $\lim_{n\to\infty} \frac{I(X;Y)}{n} = \min_m i_{RS}(m)$
18 Replica Mutual information
(Same $i_{RS}(m)$ and $J(A,B)$ as on the previous slide.)
[Plots of $i_{RS}(E)$ versus $E$ in the regimes that are good or bad for AMP, where $E$ = MSE of the AMP vector estimate $= \mathbb{E}[x^2] - m$.]
19 Replica Mutual information
Can we prove $\lim_{n\to\infty} \frac{I(X;Y)}{n} = \min_m i_{RS}(m)$? Yes!
* Proven for not-too-sparse PCA (Montanari & Deshpande '14) and for symmetric community detection (Montanari, Abbe & Deshpande '16).
* Proven for the planted SK model by Korada & Macris '10.
FK, Xu & Zdeborová '16 (Guerra interpolation): upper bound $\frac{I}{n} \le i_{RS}(m)$.
Barbier, Dia, Macris, FK & Zdeborová '16 (spatial coupling + thermodynamic integration / I-MMSE): $\lim_{n\to\infty} \frac{I}{n} \ge \min_m i_{RS}(m)$.
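A minimal numerical sketch of the conjectured formula for the ±1 prior, where $J(A,B) = \log\cosh(B) - A/2$ in closed form; the grid and parameters are illustrative:

```python
import numpy as np

# Minimal sketch: scan the RS potential above for the +/-1 prior and locate
# its minimizer. By symmetry of the prior the outer E_x is taken at x = +1.
nodes, weights = np.polynomial.hermite_e.hermegauss(41)
weights = weights / np.sqrt(2 * np.pi)

def i_rs(m, delta):
    B = m / delta + np.sqrt(m / delta) * nodes
    # J(A,B) = log cosh(B) - A/2, computed in an overflow-safe way:
    EJ = weights @ (np.logaddexp(B, -B) - np.log(2) - m / (2 * delta))
    return (m**2 + 1.0) / (4 * delta) - EJ   # E[x^2] = 1 for this prior

for delta in (0.8, 1.2):
    grid = np.linspace(0.0, 1.0, 401)
    m_star = grid[np.argmin([i_rs(m, delta) for m in grid])]
    print(f"Delta = {delta}: argmin_m i_RS(m) = {m_star:.3f}")
# The minimizer matches the state-evolution fixed point: nonzero for
# Delta < 1, zero for Delta > 1.
```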
20 FORMULAS FOR $UV^T$
AMP:
$B_u^{t+1} = \frac{1}{\sqrt{n}}\, S\, f_v(A_v^t, B_v^t) - \frac{1}{\Delta}\langle f_v'\rangle\, f_u(A_u^{t-1}, B_u^{t-1}), \qquad A_u^{t+1} = \frac{1}{\Delta n}\, f_v(A_v^t, B_v^t)^T f_v(A_v^t, B_v^t)$
$B_v^{t+1} = \frac{1}{\sqrt{n}}\, S^T f_u(A_u^t, B_u^t) - \frac{1}{\Delta}\langle f_u'\rangle\, f_v(A_v^{t-1}, B_v^{t-1}), \qquad A_v^{t+1} = \frac{1}{\Delta n}\, f_u(A_u^t, B_u^t)^T f_u(A_u^t, B_u^t)$
State evolution:
$M_u^t = \frac{1}{n}\, \hat{u}_{AMP}^T u, \qquad M_u^{t+1} = \mathbb{E}_{u,z}\!\left[f_u\!\left(\frac{M_v^t}{\Delta},\, \frac{M_v^t}{\Delta}\, u + \sqrt{\frac{M_v^t}{\Delta}}\, z\right) u\right]$
$M_v^t = \frac{1}{m}\, \hat{v}_{AMP}^T v, \qquad M_v^{t+1} = \mathbb{E}_{v,z}\!\left[f_v\!\left(\frac{M_u^t}{\Delta},\, \frac{M_u^t}{\Delta}\, v + \sqrt{\frac{M_u^t}{\Delta}}\, z\right) v\right]$
Mutual information:
$i_{RS}(m_u, m_v) = \frac{m_u m_v + \mathbb{E}[u^2]\,\mathbb{E}[v^2]}{2\Delta} - \mathbb{E}_{u,z}\!\left[J_u\!\left(\frac{m_v}{\Delta}, \frac{m_v}{\Delta} u + \sqrt{\frac{m_v}{\Delta}} z\right)\right] - \mathbb{E}_{v,z}\!\left[J_v\!\left(\frac{m_u}{\Delta}, \frac{m_u}{\Delta} v + \sqrt{\frac{m_u}{\Delta}} z\right)\right]$
21 OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof
22 WHAT ABOUT OTHER CHANNELS?
$w_{ij} = x_i^T x_j/\sqrt{n}$, observed through $P_{out}(y_{ij}|w_{ij})$, giving $y_{ij}$.
23 CHANNEL UNIVERSALITY
$w_{ij} = x_i^T x_j/\sqrt{n}$, observed through $P_{out}(y_{ij}|w_{ij})$, giving $y_{ij}$.
Dependence on the channel only via:
the Fisher-score matrix $S_{ij} = \left.\frac{\partial \log P_{out}(y_{ij}|w)}{\partial w}\right|_{w=0}$
and the Fisher information $\frac{1}{\Delta} = \mathbb{E}_{P_{out}(y|w=0)}\!\left[\left(\left.\frac{\partial \log P_{out}(y|w)}{\partial w}\right|_{w=0}\right)^2\right]$
Effective Gaussian channel with $\tilde{y} = \Delta S$: $w_{ij} = x_i^T x_j/\sqrt{n}$, AWGN, $\tilde{y}_{ij} = w_{ij} + \xi_{ij}$, $\xi_{ij} \sim \mathcal{N}(0,\Delta)$.
24 CHANNEL UNIVERSALITY
A physicist's argument: $W = \frac{1}{\sqrt{n}} XX^T$, so each $W_{ij}$ is a small quantity:
$P_{out}(Y_{ij}|W_{ij}) = e^{\log P_{out}(Y_{ij}|W_{ij})} = P_{out}(Y_{ij}|0)\, e^{W_{ij} S_{ij} + \frac{1}{2} W_{ij}^2 S'_{ij} + O(n^{-3/2})}$
Summing the $n^2$ terms:
$P_{out}(Y|W) = P_{out}(Y|0)\, e^{\sum_{i\le j}\left(W_{ij} S_{ij} + \frac{1}{2} W_{ij}^2 S'_{ij}\right) + O(\sqrt{n})}$
Concentration ($S'_{ij} \to -1/\Delta$):
$P_{out}(Y|W) \approx P_{out}(Y|0)\, e^{\sum_{i\le j}\left(W_{ij} S_{ij} - \frac{1}{2\Delta} W_{ij}^2\right) + O(\sqrt{n})}$
Effective Gaussian posterior probability:
$P(W|Y) \propto P(W)\, e^{-\frac{1}{2\Delta}\sum_{i\le j}\left(\Delta S_{ij} - W_{ij}\right)^2 + O(\sqrt{n})}$
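A minimal sketch of the two channel quantities AMP actually needs, evaluated for the dense SBM channel of slide 6 (parameter values illustrative):

```python
# Minimal sketch: the score and Fisher information of the dense SBM channel.
p_out, mu = 0.5, 0.3

# Score S = d/dw log P_out(y|w) at w = 0, for each possible output y:
score = {1: mu / p_out, 0: -mu / (1.0 - p_out)}

# Fisher information at w = 0 (expected squared score under P_out(y|w=0)),
# and the effective Gaussian noise Delta = 1/Fisher:
fisher = p_out * score[1]**2 + (1 - p_out) * score[0]**2
print("Delta from Fisher info:", 1.0 / fisher)
print("p_out(1-p_out)/mu^2   :", p_out * (1 - p_out) / mu**2)
# Any channel with the same score matrix and Fisher information leads to
# the same AMP and the same MMSE: universality.
```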
25 CHANNEL UNIVERSALITY
arxiv: FK, Xu & Zdeborová '16. Conjectured in Lesieur, FK & Zdeborová '15. The rank-1 SBM case was used & proven in Abbe, Deshpande, Montanari '16.
26 OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof
27 Minimum mean-square error
$x_i^T = (0,\dots,0,1,0,\dots,0)$, $W = \frac{1}{\sqrt{n}} XX^T$, $Y \sim P_{out}(\cdot|W)$. Rank $r=2$, size $n=20000$.
[Plot: MSE versus the Fisher-information noise parameter $\Delta$.]
28 LARGER RANK?
$x_i^T = (0,\dots,0,1,0,\dots,0)$, $W = \frac{1}{\sqrt{n}} XX^T$, $Y \sim P_{out}(\cdot|W)$. Rank $r \ge 4$.
[Plot: MSE versus the Fisher-information noise parameter $\Delta$.]
29 Two phase transitions for $r > 4$
EASY phase: $\Delta < \Delta_c$.
[Plot: MSE versus the noise $\Delta$.]
30 Two phase transitions for $r > 4$
EASY, then HARD phase: $\Delta_c < \Delta < \Delta_s$.
[Plot: MSE versus the noise $\Delta$.]
31 Two phase transitions for $r > 4$
EASY, HARD, then IMPOSSIBLE phase: $\Delta > \Delta_s$.
[Plot: MSE versus the noise $\Delta$.]
32 Two phase transitions for $r > 4$
Easy phase: $\Delta < \frac{1}{r^2} = \Delta_c$, i.e. $p_{in} - p_{out} > \frac{r}{\sqrt{n}}\sqrt{p_{out}(1-p_{out})}$. Same transition for spectral methods (~BBP '05).
Hard phase: $\frac{1}{r^2} < \Delta < \frac{1}{4r\ln r}\,[1+o_r(1)] = \Delta_s$ (Decelle, Moore, FK, Zdeborová '11). Conjectured to be hard for all polynomial algorithms.
Impossible phase: $\Delta > \frac{1}{4r\ln r}\,[1+o_r(1)] = \Delta_s$, i.e. $p_{in} - p_{out} < \frac{2\sqrt{r\log r}}{\sqrt{n}}\sqrt{p_{out}(1-p_{out})}$. Related to the Potts glass temperature (Kanter, Gross, Sompolinsky '85).
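Plugging numbers into the two leading-order expressions (a quick illustrative computation; the asymptotic window only opens once $r$ exceeds roughly $4\ln r$):

```python
import numpy as np

# Minimal sketch: the two leading-order thresholds quoted above.
for r in (16, 64, 256):
    delta_c = 1.0 / r**2                     # algorithmic / spectral threshold
    delta_s = 1.0 / (4 * r * np.log(r))      # IT threshold, up to 1 + o_r(1)
    print(f"r = {r:3d}: Delta_c = {delta_c:.2e}, Delta_s = {delta_s:.2e}, "
          f"window Delta_s/Delta_c = {delta_s / delta_c:.1f}")
```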
33 Non-symmetric Community Detection
Two communities $\pm$, of size $\rho n$ (+) and $(1-\rho)n$ (−):
$p_{++} = p + \frac{\mu}{\sqrt{n}}\frac{1-\rho}{\rho}, \qquad p_{--} = p + \frac{\mu}{\sqrt{n}}\frac{\rho}{1-\rho}, \qquad p_{+-} = p - \frac{\mu}{\sqrt{n}}$
$P(x) = \rho\, \delta\!\left(x - \sqrt{\tfrac{1-\rho}{\rho}}\right) + (1-\rho)\, \delta\!\left(x + \sqrt{\tfrac{\rho}{1-\rho}}\right), \qquad \Delta = \frac{p(1-p)}{\mu^2}$
[Plots: MSE versus $\Delta$ at $\rho = 0.05$ for MMSE, AMP and spectral; AMP versus optimal as a function of $\rho$.]
34 CLUSTERING MIXTURES OF GAUSSIANS IN HIGH DIMENSIONS
$u_i^T = (u_i^1,\dots,u_i^R) \in \mathbb{R}^R$, $v_i^T = (0,\dots,0,1,0,\dots,0)$, $Y = \frac{1}{\sqrt{n}} UV^T + \mathcal{N}(0, 1/\Delta)$, $\alpha = m/n = 1$.
Algorithm first proposed by Matsushita, Tanaka '13.
[Plots: MSE versus noise for 2 groups and for 20 groups.]
35 What about RANK-1 sparse PCA?
$x_i \in \{0,1\}$, $W = \frac{1}{\sqrt{n}} XX^T$, $P(x) = \rho\,\delta_{x,1} + (1-\rho)\,\delta_{x,0}$, $Y = W + \mathcal{N}(0,\Delta)$. Rank $r=1$.
For $\rho > 0.05$, Montanari and Deshpande showed that AMP achieves the optimal MMSE.
[Phase diagram in the $(\rho, \Delta)$ plane.]
36 What about RANK-1 sparse PCA?
$x_i \in \{0,1\}$, $W = \frac{1}{\sqrt{n}} XX^T$, $Y = W + \mathcal{N}(0,\Delta)$, $P(x) = \rho\,\delta_{x,1} + (1-\rho)\,\delta_{x,0}$. Rank $r=1$.
AMP achieves the optimal MMSE everywhere EXCEPT between the blue and red curves.
[Phase diagram in the $(\rho, \Delta)$ plane: spectral methods, AMP spinodal, information-theoretic transition, 2nd spinodal.]
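A minimal state-evolution sketch for this Bernoulli(ρ) prior, run from an uninformative and from an informative initialization; where the two branches disagree, the RS potential has two minima (first-order region) and the hard phase for AMP sits inside it. The value of ρ and the Δ grid are illustrative guesses, not numbers from the slide:

```python
import numpy as np

# Minimal sketch: state evolution for the sparse-PCA prior
# P(x) = rho*delta_{x,1} + (1-rho)*delta_{x,0}, from two initializations.
nodes, weights = np.polynomial.hermite_e.hermegauss(61)
weights = weights / np.sqrt(2 * np.pi)
rho = 0.01

def f(A, B):
    # Posterior mean for the Bernoulli(rho) prior under field (A, B)
    w1 = rho * np.exp(B - A / 2.0)
    return w1 / (w1 + 1.0 - rho)

def se(delta, m, n_iter=2000):
    for _ in range(n_iter):
        A = m / delta
        # M <- E_{x,z}[f(...) x]; only x = 1 (probability rho) contributes:
        m = rho * (weights @ f(A, A + np.sqrt(A) * nodes))
    return m

for delta in np.geomspace(1e-4, 1e-3, 6):
    lo, hi = se(delta, m=1e-9), se(delta, m=rho)
    print(f"Delta = {delta:.2e}: M_uninformed = {lo:.5f}, M_informed = {hi:.5f}")
```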
37 Easy, hard and impossible inference
A very common phenomenon: information is hidden by metastability (a 1st-order transition in physics).
[Diagram: MSE of known algorithms and known bounds versus the amount of information in the measurements, delimiting the easy, hard and impossible regions between the thresholds $c_k$ and $c_s$.]
The hard phase is quantified also in: planted constraint satisfaction, compressed sensing, the stochastic block model, dictionary learning, blind source separation, sparse PCA, error correcting codes, the hidden clique problem, and others.
Conjecture: hard for all polynomial algorithms.
38 OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof
39 SKETCH OF THE PROOF
Result 1 [FK, Xu & Zdeborová]: Upper bound
$\frac{I}{n} \le i_{Bethe}(m) = \frac{m^2 + \mathbb{E}_x[x^2]^2}{4\Delta} - \mathbb{E}_{x,z}\!\left[J\!\left(\frac{m}{\Delta},\, \frac{m}{\Delta}\, x + \sqrt{\frac{m}{\Delta}}\, z\right)\right]$
(note that this is true at any value of $n$, not only asymptotically)
Method: Guerra's interpolation + Nishimori identities.
40 SKETCH OF THE PROOF
Result 1 [FK, Xu & Zdeborová]: Upper bound
Interpolate the factorization problem at $t=1$ from a denoising problem at $t=0$:
$P_t(x|Y,D) = \frac{1}{Z_t}\, p_0(x)\, \exp\!\left[-\frac{t}{2\Delta}\sum_{i\le j}\left(\frac{x_i x_j}{\sqrt{n}} - Y_{ij}\right)^2 - \frac{1-t}{2\Delta_D}\sum_i \left(D_i - x_i\right)^2\right]$
Factorization: $w_{ij} = x_i^T x_j/\sqrt{n}$ observed through $\mathcal{N}(0, \Delta/t)$, giving $y_{ij}$. Denoising: $x_i$ observed through $\mathcal{N}(0, \Delta_D/(1-t))$, giving $D_i$, with $\Delta_D = \Delta/m$.
$I_{t=1}(x;Y) = I_{t=0}(x;D) + \int_0^1 dt\; \partial_t I_t(x;Y,D)$
41 SKETCH OF THE PROOF
Result 1 [FK, Xu & Zdeborová]: Upper bound
(Same interpolating posterior and identity as on the previous slide.)
Using Stein's lemma and the Nishimori identities, one can show that
$\frac{I(x;Y)}{n} = i_{RS}(m) - \int_0^1 dt\; \frac{\left(m - \mathbb{E}_t[\langle x x'\rangle]\right)^2}{4\Delta} \le i_{RS}(m)$
Remark: for estimation problems Guerra's interpolation yields an upper bound, while in the usual case (i.e. CSP) it yields a lower bound.
42 SKETCH OF THE PROOF
Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: Converse lower bound
[Sketch: the $\Delta$ axis running from $0$ through $\Delta_{AMP}$ to $\Delta_{RS}$.]
43 SKETCH OF THE PROOF
Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: Converse lower bound
Step 1: For $\Delta < \Delta_{AMP}$, $\mathrm{MSE}_{AMP}(\Delta) \ge \mathrm{MMSE}(\Delta)$.
44 SKETCH OF THE PROOF
Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: Converse lower bound
Step 1: For $\Delta < \Delta_{AMP}$, $\mathrm{MSE}_{AMP}(\Delta) \ge \mathrm{MMSE}(\Delta)$.
I-MMSE theorem (Guo-Verdú) (for physicists: this is just thermodynamics):
$\frac{d\, i(\Delta)}{d\Delta^{-1}} = \frac{1}{4}\, \mathrm{MMSE}(\Delta)$
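As a quick sanity check of the I-MMSE idea in its classical scalar form (the talk's statement is the matrix version, with the factor 1/4), a minimal sketch:

```python
import numpy as np

# Minimal sketch: Guo-Verdu I-MMSE for the scalar Gaussian channel
# y = sqrt(snr) x + z with x, z ~ N(0,1), where I(snr) = (1/2) log(1+snr)
# and mmse(snr) = 1/(1+snr), so that dI/dsnr = mmse/2.
snr, eps = 1.7, 1e-6
I = lambda s: 0.5 * np.log1p(s)
dI = (I(snr + eps) - I(snr - eps)) / (2 * eps)   # numerical derivative
print("dI/dsnr =", dI, " mmse/2 =", 0.5 / (1 + snr))
```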
45 SKETCH OF THE PROOF
Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: Converse lower bound
Step 1: For $\Delta < \Delta_{AMP}$, $\mathrm{MSE}_{AMP}(\Delta) \ge \mathrm{MMSE}(\Delta)$, so by state evolution and the I-MMSE theorem
$\frac{d}{d\Delta^{-1}}\left[\min_E i_{RS}(E;\Delta)\right] \ge \frac{d}{d\Delta^{-1}}\, i(\Delta)$
46 SKETCH OF THE PROOF
Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: Converse lower bound
Step 1 (continued): integrating the derivative inequality over the noise gives
$\min_E i_{RS}(E;\Delta) \le i(\Delta)$
47 SKETCH OF THE PROOF
Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: Converse lower bound
Step 1 (continued):
$\min_E i_{RS}(E;\Delta) - \min_E i_{RS}(E;0) \le i(\Delta) - i(0)$, where both $\min_E i_{RS}(E;0)$ and $i(0)$ are equal to $H(x)$.
48 SKETCH OF THE PROOF
Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: Converse lower bound
Step 1 (conclusion): for $\Delta < \Delta_{AMP}$, $\min_E i_{RS}(E;\Delta) \le i(\Delta)$.
49 SKETCH OF THE PROOF
Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: Converse lower bound
Step 2: Prove that the true thermodynamic potential is analytic until $\Delta_{RS}$. This is the hardest part. Sketch:
1) Map the problem to its spatially coupled version.
2) Prove both models have the same mutual information (with a Guerra-type interpolation).
3) Prove the spatially coupled version is analytic until $\Delta_{RS}$ (threshold saturation).
More in Nicolas Macris's talk this afternoon.
50 SKETCH OF THE PROOF
Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: Converse lower bound
Step 3: For $\Delta > \Delta_{RS}$, AMP provides back the MMSE. Use again the I-MMSE theorem, from $\Delta_{RS}$ to $\Delta > \Delta_{RS}$.
51 Conclusions
MMSE in a random setting of low-rank matrix estimation evaluated (proven rigorously for symmetric problems, open for the generic $UV^T$ factorization).
Promising proof strategy for the replica formula in estimation problems (possible generalization to larger ranks, tensors, etc.).
Dependence on the output channel through its Fisher information: universality.
Two phase transitions: an algorithmic one and an information-theoretic one. Sometimes equal, but not always; AMP reaches the MMSE in a large region.
When the problem is not balanced (i.e. if the prior has a non-zero mean), AMP has a better detectability transition than spectral methods.
Promising algorithm with good convergence properties for applications beyond the random setting (example: Restricted Boltzmann Machines). Open-source implementation available.