Fundamental limits of symmetric low-rank matrix estimation


Proceedings of Machine Learning Research vol 65:1-5, 2017

Fundamental limits of symmetric low-rank matrix estimation

Marc Lelarge (MARC.LELARGE@ENS.FR)
Safran Tech, Magny-Les-Hameaux, France
Département d'informatique, École normale supérieure, CNRS, PSL Research University, Paris, France
INRIA

Léo Miolane (LEO.MIOLANE@INRIA.FR)
Département d'informatique, École normale supérieure, CNRS, PSL Research University, Paris, France
INRIA

Abstract
We consider the high-dimensional inference problem where the signal is a low-rank symmetric matrix corrupted by additive Gaussian noise. Given a probabilistic model for the low-rank matrix, we compute the limit, in the large-dimension setting, of the mutual information between the signal and the observations, as well as the matrix minimum mean square error, while the rank of the signal remains constant. We unify and generalize a number of recent works on PCA, sparse PCA, submatrix localization and community detection by computing the information-theoretic limits for these problems in the high-noise regime. This allows us to locate precisely the information-theoretic thresholds for the above-mentioned problems. [1]

Keywords: Matrix factorization, PCA, community detection, spin glasses

1. Low-rank matrix estimation

The estimation of a low-rank matrix observed through a noisy channel is a fundamental problem in statistical inference, with applications in machine learning, signal processing and information theory. We shall consider the high-dimensional setting where the low-rank matrix to estimate is symmetric and where the noise is additive and Gaussian. Let P_0 be a probability distribution over \mathbb{R}^k (k \in \mathbb{N} is fixed) with finite second moment. Let X_1, \dots, X_n be i.i.d. P_0 and suppose that we observe, for 1 \le i < j \le n,

    Y_{i,j} = \sqrt{\frac{\lambda}{n}} \, X_i^\top X_j + Z_{i,j},    (1)

where the Z_{i,j} are i.i.d. \mathcal{N}(0, 1). The goal here is to recover the low-rank signal X X^\top from the observation Y.
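As a concrete illustration (our own sketch, not code from the paper), the observation model (1) in the rank-one case k = 1 with the Rademacher prior P_0 = (1/2)\delta_{-1} + (1/2)\delta_{+1} can be simulated as follows; the function name and the parameter values are ours:

```python
import numpy as np

def simulate(n, lam, rng):
    """Draw X_1, ..., X_n i.i.d. uniform on {-1, +1} and return the
    upper-triangular observations Y_ij = sqrt(lam/n) X_i X_j + Z_ij
    of model (1), with Z_ij i.i.d. N(0, 1)."""
    x = rng.choice([-1.0, 1.0], size=n)           # X_i ~ P_0 (Rademacher)
    z = rng.standard_normal((n, n))               # Gaussian noise Z_ij
    y = np.sqrt(lam / n) * np.outer(x, x) + z     # noisy rank-one signal
    iu = np.triu_indices(n, k=1)                  # keep only i < j
    return x, y[iu]

rng = np.random.default_rng(0)
x, y = simulate(n=500, lam=2.0, rng=rng)          # n(n-1)/2 observations
```

An estimator (spectral, or approximate message passing) would then be run on the symmetric matrix rebuilt from y; here we only sample the model.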
Notice that we suppose here that we observe only the coefficients of \sqrt{\lambda/n}\, X X^\top + Z that lie above the diagonal. The case where all the coefficients are observed can be deduced directly from this one. Depending on the choice of the prior P_0, (1) can encode many classical statistical problems such as PCA, sparse PCA, submatrix localization or community detection. These examples are detailed in the full version of this work. Define the minimal mean square error (MMSE) for this statistical

[1] Extended abstract. The full version appears as arXiv:1611.03888v3.
© 2017 M. Lelarge & L. Miolane.

problem:

    MMSE_n(\lambda) = \min_{\hat\theta} \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \mathbb{E}\Big[ \big( X_i^\top X_j - \hat\theta_{i,j}(\mathbf{Y}) \big)^2 \Big],

where the minimum is taken over all estimators \hat\theta (i.e. measurable functions of the observations Y). We aim at computing the limits of MMSE_n(\lambda) and of the mutual information \frac{1}{n} I(X; Y) as n goes to infinity.

2. The replica symmetric formula

We define the function F on (\lambda, q) \in \mathbb{R}_{>0} \times S_k^+ by

    F(\lambda, q) = \mathbb{E} \log \int dP_0(x) \, \exp\Big( \sqrt{\lambda}\, Z^\top q^{1/2} x + \lambda\, X^\top q\, x - \frac{\lambda}{2}\, x^\top q\, x \Big) - \frac{\lambda}{4} \|q\|^2,

where S_k^+ denotes the set of k \times k symmetric positive-semidefinite matrices, and Z \sim \mathcal{N}(0, I_k) and X \sim P_0 are independent random variables. The limits of the MMSE and of the mutual information have been conjectured in Lesieur et al. (2015b) according to powerful, but non-rigorous, statistical physics arguments. They are given by the following replica symmetric formula.

Theorem 1 For \lambda > 0, we have

    \lim_{n \to +\infty} \frac{1}{n} I(X; Y) = \frac{\lambda}{4} \big\| \mathbb{E}_{P_0}[X X^\top] \big\|^2 - \sup_{q \in S_k^+} F(\lambda, q).

For almost all \lambda > 0, all the maximizers q^* of q \in S_k^+ \mapsto F(\lambda, q) have the same norm \|q^*\| = q^*(\lambda), and

    MMSE_n(\lambda) \xrightarrow[n \to \infty]{} \big\| \mathbb{E}_{P_0}[X X^\top] \big\|^2 - q^*(\lambda)^2.

In the rank-one case (k = 1), Barbier et al. (2016) proved Theorem 1 for discrete P_0, but under the restrictive assumption that the function q \mapsto F(\lambda, q) has at most three stationary points. Notice that the dummy estimator \hat\theta_{i,j}(Y) = \mathbb{E}[X_i^\top X_j] achieves a mean square error equal to

    DMSE = \big\| \mathbb{E}[X X^\top] \big\|^2 - \big\| (\mathbb{E} X)(\mathbb{E} X)^\top \big\|^2.

We deduce from Theorem 1 that

    \lambda_c := \inf \big\{ \lambda > 0 \;:\; q^*(\lambda) > \| (\mathbb{E} X)(\mathbb{E} X)^\top \| \big\}

is the information-theoretic threshold for the estimation problem (1). Namely, if \lambda > \lambda_c, then \lim_n MMSE_n < DMSE: one can estimate X X^\top better than chance as n goes to infinity. If \lambda < \lambda_c, then \lim_n MMSE_n = DMSE: it is impossible to retrieve X X^\top asymptotically better than with a random guess.
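For the same rank-one Rademacher prior, the integral over P_0 in the definition of F reduces to a two-point sum: F(\lambda, q) = \mathbb{E} \log\cosh(\sqrt{\lambda q}\, Z + \lambda q) - \lambda q / 2 - \lambda q^2 / 4, where the sign symmetry of the prior lets one fix X = +1. The following Monte Carlo sketch (ours; the grid and sample size are arbitrary) locates the norm q^*(\lambda) of the maximizer, which is known for this prior to be 0 below \lambda_c = 1 and positive above:

```python
import numpy as np

def F(lam, q, z):
    """Monte Carlo estimate of the RS potential F(lam, q) for k = 1 and
    P_0 = (1/2) delta_{-1} + (1/2) delta_{+1}, using a fixed Gaussian
    sample z so that the estimate is smooth as a function of q."""
    return (np.mean(np.log(np.cosh(np.sqrt(lam * q) * z + lam * q)))
            - lam * q / 2 - lam * q ** 2 / 4)

rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)    # sample of Z ~ N(0, 1)
qs = np.linspace(0.0, 1.0, 201)     # grid over q >= 0

for lam in (0.5, 2.0):
    q_star = qs[np.argmax([F(lam, q, z) for q in qs])]
    print(f"lambda = {lam}: q* ~ {q_star:.2f}")
```

For this prior \| \mathbb{E}[X X^\top] \|^2 = 1, so the limit of (1/n) I(X; Y) in Theorem 1 is estimated by lam / 4 - max over the grid of F(lam, q).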

References

IEEE International Symposium on Information Theory, ISIT 2015, Hong Kong, China, June 14-19, 2015. IEEE, 2015. ISBN 978-1-4673-7704-1. URL http://ieeexplore.ieee.org/xpl/mostrecentissue.jsp?punumber=7270906.

Emmanuel Abbe and Colin Sandon. Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap. arXiv preprint arXiv:1512.09080, 2015.

Michael Aizenman, Robert Sims, and Shannon L. Starr. Extended variational principle for the Sherrington-Kirkpatrick spin-glass model. Physical Review B, 68(21):214403, 2003.

Jinho Baik, Gérard Ben Arous, and Sandrine Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability, pages 1643–1697, 2005.

Afonso S. Bandeira, Nicolas Boumal, and Amit Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Mathematical Programming, pages 1–23, 2016.

Jess Banks, Cristopher Moore, Nicolas Verzelen, Roman Vershynin, and Jiaming Xu. Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization. arXiv preprint arXiv:1607.05222v2, 2016.

Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, and Lenka Zdeborová. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. In Advances in Neural Information Processing Systems, pages 424–432, 2016.

Florent Benaych-Georges and Raj Rao Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521, 2011.

Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

Francesco Caltagirone, Marc Lelarge, and Léo Miolane. Recovering asymmetric communities in the stochastic block model. arXiv preprint arXiv:1610.03680, 2016.

Amin Coja-Oghlan, Florent Krzakala, Will Perkins, and Lenka Zdeborová. Information-theoretic thresholds from the cavity method. arXiv preprint arXiv:1611.00814, 2016.

Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, 2011.

Yash Deshpande and Emmanuel Abbe. Asymptotic mutual information for the balanced binary stochastic block model. Information and Inference, page iaw017, 2016.

Yash Deshpande and Andrea Montanari. Information-theoretically optimal sparse PCA. In 2014 IEEE International Symposium on Information Theory, pages 2197–2201. IEEE, 2014.

Delphine Féral and Sandrine Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics, 272(1):185–228, 2007.

Francesco Guerra. Broken replica symmetry bounds in the mean field spin glass model. Communications in Mathematical Physics, 233(1):1–12, 2003.

Dongning Guo, Shlomo Shamai, and Sergio Verdú. Mutual information and minimum mean-square error in Gaussian channels. IEEE Transactions on Information Theory, 51(4):1261–1282, 2005.

Bruce Hajek, Yihong Wu, and Jiaming Xu. Information limits for recovering a hidden community. In Information Theory (ISIT), 2016 IEEE International Symposium on, pages 1894–1898. IEEE, 2016.

Yukito Iba. The Nishimori line and Bayesian statistics. Journal of Physics A: Mathematical and General, 32(21):3875, 1999.

Satish Babu Korada and Nicolas Macris. Exact solution of the gauge symmetric p-spin glass model on a complete graph. Journal of Statistical Physics, 136(2):205–230, 2009.

Satish Babu Korada and Nicolas Macris. Tight bounds on the capacity of binary input random CDMA systems. IEEE Transactions on Information Theory, 56(11):5590–5613, 2010.

Satish Babu Korada and Andrea Montanari. Applications of the Lindeberg principle in communications and statistical learning. IEEE Transactions on Information Theory, 57(4):2440–2450, 2011.

Florent Krzakala, Jiaming Xu, and Lenka Zdeborová. Mutual information in rank-one matrix estimation. In Information Theory Workshop (ITW), 2016 IEEE, pages 71–75. IEEE, 2016.

Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel. In 53rd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2015, Allerton Park & Retreat Center, Monticello, IL, USA, September 29 - October 2, 2015, pages 680–687. IEEE, 2015a. ISBN 978-1-5090-1824-6. doi: 10.1109/ALLERTON.2015.7447070. URL http://dx.doi.org/10.1109/ALLERTON.2015.7447070.

Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. Phase transitions in sparse PCA. In IEEE International Symposium on Information Theory, ISIT 2015, Hong Kong, China, June 14-19, 2015 (IEEE, 2015), pages 1635–1639. ISBN 978-1-4673-7704-1. doi: 10.1109/ISIT.2015.7282733. URL http://dx.doi.org/10.1109/isit.2015.7282733.

Cyril Méasson, Andrea Montanari, Thomas J. Richardson, and Rüdiger Urbanke. The generalized area theorem and some of its consequences. IEEE Transactions on Information Theory, 55(11):4793–4821, 2009.

Marc Mézard, Giorgio Parisi, and Miguel Virasoro. Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Co Inc, 1987.

Paul Milgrom and Ilya Segal. Envelope theorems for arbitrary choice sets. Econometrica, 70(2):583–601, 2002.

Andrea Montanari. Estimating random variables from random sparse observations. European Transactions on Telecommunications, 19(4):385–403, 2008.

Andrea Montanari. Finding one community in a sparse graph. Journal of Statistical Physics, 161(2):273–299, 2015.

Joe Neeman and Praneeth Netrapalli. Non-reconstructability in the stochastic block model. arXiv preprint arXiv:1404.6304, 2014.

Hidetoshi Nishimori. Statistical Physics of Spin Glasses and Information Processing: An Introduction, volume 111. Clarendon Press, 2001.

Dmitry Panchenko. The Sherrington-Kirkpatrick Model. Springer Science & Business Media, 2013.

Amelia Perry, Alexander S. Wein, Afonso S. Bandeira, and Ankur Moitra. Optimality and sub-optimality of PCA for spiked random matrices and synchronization. arXiv preprint arXiv:1609.05573, 2016.

Sundeep Rangan and Alyson K. Fletcher. Iterative estimation of constrained rank-one matrices in noise. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 1246–1250. IEEE, 2012.

Alaa Saade, Marc Lelarge, Florent Krzakala, and Lenka Zdeborová. Spectral detection in the censored block model. In 2015 IEEE International Symposium on Information Theory (ISIT), pages 1184–1188. IEEE, 2015.

Alaa Saade, Marc Lelarge, Florent Krzakala, and Lenka Zdeborová. Clustering from sparse pairwise measurements. In Information Theory (ISIT), 2016 IEEE International Symposium on, pages 780–784. IEEE, 2016.

Michel Talagrand. Mean Field Models for Spin Glasses: Volume I: Basic Examples, volume 54. Springer Science & Business Media, 2010.

Michel Talagrand. Mean Field Models for Spin Glasses: Volume II: Advanced Replica-Symmetry and Low Temperature, volume 55. Springer Science & Business Media, 2011.

Lenka Zdeborová and Florent Krzakala. Statistical physics of inference: Thresholds and algorithms. Advances in Physics, 65(5):453–552, 2016.