Risk and Noise Estimation in High Dimensional Statistics via State Evolution


Mohsen Bayati (Stanford University). Joint work with Jose Bento, Murat Erdogdu, Marc Lelarge, and Andrea Montanari.

Statistical learning motivations

From data to prediction.
Online advertising: predict the probability of a click on an ad.
Healthcare: predict the occurrence of diabetes.
Finance: predict changes in stock prices.

Formulation. Patient record i consists of a feature vector a_i and an outcome y_i. Given n records, posit a linear model y = A x_0 + w (with noise w). Goal: find a good estimate of x_0.

Massive amounts of measurements.
Traditional clinical decision making is based on a few important measurements (small p).
Electronic health records provide many cheap measurements (large p): monitoring (nurse observation, MD exam, labs, medications, radiology, real-time vital signs), testing for wellness (location tracking, smartphone apps), and personalized medicine (genomic data).

How to use more measurements? Standard least squares has many solutions when p is large, and most of them predict poorly on future outcomes (due to noise). Main problem: for large p, find the few important measurements, i.e., infer a sparse x_0.

Learning recipe. Define a loss function L(x); for example, the least-squares loss (equivalently, Gaussian noise) L(x) = ||y - Ax||^2 / 2. Estimate x_0 by minimizing the loss.

Estimating x_0 by minimizing the loss subject to a sparsity (L0) constraint is an NP-hard problem.

Convex relaxation (LASSO). Estimate x_0 by minimizing ||y - Ax||^2 / 2 + λ||x||_1 (Tibshirani 96, Chen-Donoho 95). The L1 penalty automatically selects a few important measurements.
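To make the objective concrete, here is a minimal numerical sketch (not the speaker's code) that solves the LASSO by proximal gradient descent (ISTA); the toy dimensions, the regularization level, and the function names are illustrative choices.

```python
import numpy as np

def soft_threshold(z, theta):
    """Entrywise soft thresholding: eta(z; theta) = sign(z) * max(|z| - theta, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def lasso_ista(y, A, lam, n_iter=500):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||x||_1 by proximal gradient (ISTA)."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        x = soft_threshold(x + step * (A.T @ (y - A @ x)), step * lam)
    return x

# Toy instance: sparse x0, Gaussian design with N(0, 1/n) entries, additive noise.
rng = np.random.default_rng(0)
n, p, k, sigma = 200, 500, 20, 0.1
A = rng.normal(size=(n, p)) / np.sqrt(n)
x0 = np.zeros(p)
x0[rng.choice(p, size=k, replace=False)] = rng.normal(size=k)
y = A @ x0 + sigma * rng.normal(size=n)
x_hat = lasso_ista(y, A, lam=0.05)
print("per-coordinate MSE:", np.mean((x_hat - x0) ** 2))
```

Only a few coordinates of x_hat come out nonzero, which is the "automatic selection" mentioned on the slide.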

Model selection: low complexity (large λ) versus high complexity (small λ). [Figure] Source: The Elements of Statistical Learning, Hastie et al., 2009.

Mathematical questions

Characteristics of the solution. What performance should we expect? What is the MSE (or other error measure) for each value of λ? How do we choose the best λ?

Growing theory on LASSO:
Zhao, Yu (2006)
Candes, Romberg, Tao (2006)
Candes, Tao (2007)
Bunea, Tsybakov, Wegkamp (2007)
Bickel, Ritov, Tsybakov (2009)
Buhlmann, van de Geer (2009)
Zhang (2009)
Meinshausen, Yu (2009)
Wainwright (2009, 2011)
Talagrand (2010)
Belloni, Chernozhukov et al (2009-13)
Maleki et al (2011)
Bickel et al (2012)
There are many more, not listed here due to space limitations.

General random convex functions. Consider minimizing a random convex cost. Let A be a Gaussian matrix and let the cost be strictly convex (or satisfy a similar regularity condition). Talagrand (2010) finds generic properties of the minimizer; in particular, the MSE can be calculated when certain replica-symmetric equations have solutions (Chapter 3 of Mean Field Models for Spin Glasses, Vol. 1). This does not apply to our case with the L1 penalty, which is not strictly convex.

Some intuition: the scalar case (p = n = 1). For y = x_0 + w, the LASSO estimate is the soft thresholding of y at level λ (simple calculus; see the derivation below). The MSE is then the expected squared error of soft thresholding, with w independent Gaussian noise.
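Since the slide's formulas did not transcribe, here is the standard scalar computation it refers to (a reconstruction, using η for soft thresholding):

```latex
% Scalar LASSO (p = n = 1), observation y = x_0 + w:
\hat{x} \;=\; \arg\min_{x\in\mathbb{R}} \tfrac{1}{2}(y-x)^2 + \lambda|x|
        \;=\; \eta(y;\lambda) \;=\; \mathrm{sign}(y)\,\bigl(|y|-\lambda\bigr)_+ ,
\qquad\text{so}\qquad
\mathrm{MSE}(\lambda) \;=\; \mathbb{E}\bigl[\bigl(\eta(x_0+w;\lambda)-x_0\bigr)^2\bigr],
\quad w \sim \mathsf{N}(0,\sigma^2)\ \text{independent of } x_0 .
```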

Main result. Theorem (Bayati-Montanari): for a Gaussian design A, in the high-dimensional limit the per-coordinate MSE of the LASSO is exactly as in the scalar case, with the noise variance replaced by a larger effective variance (see the state-evolution equations below).

Main result. Theorem (Bayati-Montanari): in fact we prove that the problem asymptotically decouples into p scalar sub-problems with increased Gaussian noise, and we find a formula for the effective noise level.
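The displayed formulas on these "Main result" slides did not transcribe. For reference, the state-evolution characterization from Bayati and Montanari's LASSO papers reads, up to notation, roughly as follows (a reconstruction, not the slides' own display):

```latex
% Asymptotic LASSO risk (delta = n/p, eta = soft thresholding, Z ~ N(0,1)):
\lim_{p\to\infty}\frac{1}{p}\bigl\lVert \hat{x}-x_0 \bigr\rVert_2^2
  \;=\; \mathbb{E}\Bigl[\bigl(\eta(X_0+\tau_* Z;\,\theta_*)-X_0\bigr)^2\Bigr],
\qquad
\tau_*^2 \;=\; \sigma^2 + \frac{1}{\delta}\,
  \mathbb{E}\Bigl[\bigl(\eta(X_0+\tau_* Z;\,\theta_*)-X_0\bigr)^2\Bigr],
\qquad
\lambda \;=\; \theta_*\Bigl(1-\tfrac{1}{\delta}\,
  \mathbb{P}\bigl(|X_0+\tau_* Z|>\theta_*\bigr)\Bigr).
```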

Main result (general case). Theorem (Bayati-Montanari): the same statement holds in the general setting: the problem asymptotically decouples into p scalar sub-problems with increased Gaussian noise.

Main result (general case). Theorem (Bayati-Montanari). Note: there is strong empirical and theoretical evidence that the Gaussian assumption on A is not necessary (Donoho-Maleki-Montanari 09; Bayati-Lelarge-Montanari 12-13).
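For concreteness, here is a minimal Monte Carlo sketch (not from the talk) that iterates the state-evolution recursion displayed above, assuming a Bernoulli-Gaussian signal and a threshold rule θ = α·τ; the prior, the parameter values, and the function name are illustrative assumptions.

```python
import numpy as np

def state_evolution_mse(delta, sigma2, eps, alpha, n_iter=50, n_mc=200_000, seed=0):
    """Iterate (by Monte Carlo) the state-evolution recursion
        tau_{t+1}^2 = sigma^2 + (1/delta) * E[(eta(X0 + tau_t Z; alpha*tau_t) - X0)^2]
    for a Bernoulli-Gaussian X0: nonzero with prob. eps, N(0,1) when nonzero."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=n_mc) * (rng.random(n_mc) < eps)   # Bernoulli-Gaussian samples
    z = rng.normal(size=n_mc)                                # standard Gaussian noise
    tau2 = sigma2 + np.mean(x0 ** 2) / delta                 # initialization
    for _ in range(n_iter):
        u = x0 + np.sqrt(tau2) * z                           # decoupled scalar observation
        eta = np.sign(u) * np.maximum(np.abs(u) - alpha * np.sqrt(tau2), 0.0)
        mse = np.mean((eta - x0) ** 2)
        tau2 = sigma2 + mse / delta
    return mse, tau2

# Example: n/p = 0.4, noise variance 0.01, 10% nonzero coordinates, alpha = 1.5.
mse, tau2 = state_evolution_mse(delta=0.4, sigma2=0.01, eps=0.1, alpha=1.5)
print(f"predicted per-coordinate MSE ~ {mse:.4f}, effective noise tau*^2 ~ {tau2:.4f}")
```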

Algorithmic Analysis

Proof strategy. 1. Construct a sequence of iterates x^t that converges to the LASSO estimator. 2. Show that each iterate x^t satisfies the claimed scalar (state-evolution) characterization.

Start with belief propagation. 1. Write the Gibbs measure associated with the LASSO cost. 2. Write the cavity (belief propagation) equations. 3. Each message is a probability distribution whose mean satisfies a simple update equation.

AMP algorithm: derivation. 1. Gibbs measure; 2. message-passing (MP) algorithm; 3. first-order approximation of the messages (AMP).

AMP algorithm. Approximate message passing (AMP), Donoho, Maleki, Montanari 09. The extra memory term in the residual update is the Onsager reaction term; see the sketch below.
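Below is a minimal sketch of the AMP iteration for the LASSO as described on this slide, with illustrative tuning choices of my own: the design is assumed to have N(0, 1/n) entries, the threshold is set to θ_t = α·τ_t with τ_t estimated from the residual, and the number of iterations is fixed; none of these choices come from the slides.

```python
import numpy as np

def soft_threshold(z, theta):
    """Entrywise soft thresholding: eta(z; theta) = sign(z) * max(|z| - theta, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def amp_lasso(y, A, alpha=2.0, n_iter=30):
    """AMP iteration sketch:
        x^{t+1} = eta(x^t + A^T z^t; alpha * tau_t)
        z^{t+1} = y - A x^{t+1} + (||x^{t+1}||_0 / n) * z^t   # Onsager reaction term
    with tau_t estimated from the residual; assumes A has N(0, 1/n) entries."""
    n, p = A.shape
    x = np.zeros(p)
    z = y.copy()
    for _ in range(n_iter):
        tau = np.linalg.norm(z) / np.sqrt(n)              # effective noise estimate
        x = soft_threshold(x + A.T @ z, alpha * tau)
        z = y - A @ x + (np.count_nonzero(x) / n) * z     # last term = Onsager correction
    return x
```

Dropping the Onsager correction yields plain iterative soft thresholding; the correction is what makes the effective noise at each iteration approximately Gaussian, which is exactly what state evolution tracks.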

AMP and compressed sensing. AMP was originally designed to solve the basis pursuit problem: minimize ||x||_1 subject to Ax = y. This coincides with the LASSO solution in the limit λ → 0. In this setting, the assumption is that x_0 is the solution of the above optimization with the L1 norm replaced by the L0 norm.

Phase transition line and algorithms (figure). Source: Arian Maleki's PhD thesis.

AMP algorithm. Approximate message passing (AMP), Donoho, Maleki, Montanari 09. For Gaussian A, as n, p → ∞ the iteration converges, and a prescribed accuracy can be achieved after a number of iterations that does not grow with the dimensions (Bayati, Montanari 12).

Main steps of the proof. 1. We use a conditioning technique (due to Bolthausen) to prove that the AMP iterates obey state evolution. 2. The AMP iterates converge to the LASSO estimator. 3. Therefore the algorithm's estimate satisfies the main claim.

Recall the main result, Theorem (Bayati-Montanari). Problem: the right-hand side requires knowledge of the noise level and of the distribution of x_0. What can be done when we do not have that?

Objective. Recall the problem: given the data (y, A), construct estimators for the MSE and for the noise level (*). So far we have used knowledge of the noise and of the distribution of x_0, which is not realistic. Next, we'll demonstrate how to solve (*).

Recipe (columns of A are i.i.d.). 1. Compute the LASSO estimate. 2. Define the pseudo-data. 3. The MSE and noise-level estimators are computed from the pseudo-data (see the sketch below).
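Since the slide's formulas did not transcribe, here is a sketch of the kind of estimator this recipe describes, reconstructed from the state-evolution relations in this talk; the exact normalizations in Bayati-Erdogdu-Montanari 13 may differ, so treat the formulas below as assumptions for illustration.

```python
import numpy as np

def lasso_risk_and_noise_estimates(y, A, x_hat):
    """Sketch of risk / noise-level estimators built from the LASSO output
    (a reconstruction in the spirit of this recipe, not the paper's exact recipe).
    Assumes A has N(0, 1/n) entries."""
    n, p = A.shape
    k = np.count_nonzero(x_hat)                 # degrees of freedom of the LASSO fit
    c = 1.0 - k / n
    resid = y - A @ x_hat
    tau_hat = np.linalg.norm(resid) / (np.sqrt(n) * c)   # effective noise level
    x_u = x_hat + (A.T @ resid) / c                      # pseudo-data ("de-biased" LASSO)
    # SURE for soft thresholding, applied to the pseudo-data at noise level tau_hat:
    mse_hat = (np.sum((x_hat - x_u) ** 2) + tau_hat ** 2 * (2 * k - p)) / p
    # State evolution, tau^2 = sigma^2 + (p/n) * MSE, inverted to estimate sigma^2:
    sigma2_hat = tau_hat ** 2 - (p / n) * mse_hat
    return mse_hat, sigma2_hat, tau_hat
```

On the toy instance from the ISTA sketch earlier, these estimates can be compared against the true values np.mean((x_hat - x0) ** 2) and sigma ** 2.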

Main result. Theorem (Bayati-Erdogdu-Montanari 13): the resulting MSE and noise-level estimators are asymptotically consistent. For correlated columns we have a similar (non-rigorous) formula that relies on a replica-method conjecture due to Javanmard-Montanari 13.

Sketch of the proof. Asymptotically, the pseudo-data equals the true signal plus i.i.d. Gaussian noise of variance τ*², so the MSE can be estimated using Stein's unbiased risk estimate (SURE).
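For reference, the SURE identity for soft thresholding used at this step is the standard one (stated here because the slide's display did not transcribe):

```latex
% SURE for soft thresholding: if z = mu + tau * eps with eps ~ N(0, I_p), then
\mathbb{E}\,\bigl\lVert \eta(z;\theta)-\mu \bigr\rVert_2^2
  \;=\; \mathbb{E}\Bigl[\bigl\lVert \eta(z;\theta)-z \bigr\rVert_2^2
        \;+\; \tau^2\bigl(2\,\lVert \eta(z;\theta)\rVert_0 - p\bigr)\Bigr],
% using that the divergence of the soft-thresholding map equals the number of
% coordinates that survive the threshold.
```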

MSE Estimation (iid Gaussian Data)

MSE Estimation (Correlated Gaussian Data) Relies on a replica method conjecture

Comparison with noise estimation methods:
Belloni, Chernozhukov (2009)
Fan, Guo, Hao (2010)
Sun, Zhang (2010)
Zhang (2010)
Städler, Bühlmann, van de Geer (2010, 2012)

Noise Estimation (iid Gaussian Data)

Noise Estimation (Correlated Gaussian Data)

Extensions to general random matrices

Recall the AMP and MP algorithms. 1. Gibbs measure; 2. message-passing algorithm; 3. first-order approximation of the messages.

General random matrices (independent, non-identically distributed entries). Theorem (Bayati-Lelarge-Montanari 12): 1) As n, p → ∞, the finite-dimensional marginals of the AMP iterates are asymptotically insensitive to the distribution of the entries of A, provided the entries have sub-exponential tails. 2) The entries of the iterates are asymptotically Gaussian with zero mean and a variance that can be computed from a one-dimensional equation.

Main steps of the proof. Step 1: AMP is asymptotically equivalent to its message-passing (MP) counterpart (w.l.o.g. assume A is symmetric).

Main steps of the proof. Step 2: MP messages are sums over non-backtracking trees (illustrated with an example on nodes i, l, a, b).

Main steps of the proof. Step 2 (continued): the dominant contributions come from trees in which each edge is repeated exactly twice; the remaining terms converge to 0 as p grows. The dominant term is independent of the entry distribution and depends only on its second moments.

Extensions and open directions.
Setting: general distributions on A, other cost functions/regularizers.
Promising progress:
Rangan et al 10-12
Schniter et al 10-12
Donoho-Johnston-Montanari 11
Maleki et al 11
Krzakala-Mézard-Sausset-Sun-Zdeborová 11-12
Bean-Bickel-El Karoui-Yu 12
Bayati-Lelarge-Montanari 12
Javanmard-Montanari 12, 13
Kabashima et al 12-14
Manoel-Krzakala-Tramel-Zdeborová 14
Caltagirone-Krzakala-Zdeborová 14
Schülke-Caltagirone-Zdeborová 14

Thank you!