An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery. Alexandra Carpentier, based on joint work with Arlene K.Y. Kim. Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge. Workshop HDP-QPh, June 10th, 2015.

Introduction. Low rank matrix recovery in high dimension: relevant applications (in particular quantum tomography) and interesting theoretical challenges. This talk is based on joint work with Arlene K.Y. Kim, "An iterative hard thresholding estimator for low rank matrix recovery with explicit limiting distribution", available on arXiv (arXiv:1502.04654).

Outline: The matrix recovery setting (Setting; Results). In high dimension (The problem; Discussion). Hard thresholding estimator (The estimator; Results; Simulations).

Setting: Background and notations. Vector notations (let $p > 0$): $\langle \cdot, \cdot \rangle$ is the classical scalar product in $\mathbb{C}^p$; $\|\cdot\|_q$, $q \geq 0$, is the $\ell_q$ norm (semi-norm for $q = 0$) in $\mathbb{C}^p$. Matrix notations (let $d > 0$): $\langle \cdot, \cdot \rangle_{\mathrm{tr}}$ is the scalar product for $d \times d$ Hermitian matrices $(A, B)$: $\langle A, B \rangle_{\mathrm{tr}} = \mathrm{tr}(A^* B)$; $\|\cdot\|_F$ is the Frobenius norm, $\|A\|_F^2 = \langle A, A \rangle_{\mathrm{tr}} = \sum_i \lambda_i^2$, where the $(\lambda_i^2)_i$ are the eigenvalues of $A^* A$; $\|\cdot\|_*$ is the trace (or Schatten 1) norm, $\|A\|_* = \sum_i \lambda_i$; $\|\cdot\|_S$ is the spectral (or Schatten $\infty$) norm, $\|A\|_S = \sup_i \lambda_i$.

Setting: The matrix regression setting. For a parameter $\Theta$ and sensing matrices $X^i$ of dimension $d \times d$, one observes noisy data, for $i \leq n$, $Y_i = \mathrm{tr}\big((X^i)^* \Theta\big) + \epsilon_i = \langle X^i, \Theta \rangle_{\mathrm{tr}} + \epsilon_i$, where $\epsilon \in \mathbb{R}^n$ is an i.i.d. noise vector.

Setting: The matrix regression setting, objective. Let $\mathcal{X}$ be the linear operator such that $\mathcal{X}(A) = \big(\mathrm{tr}((X^i)^* A)\big)_{i \leq n} = \big(\langle X^i, A \rangle_{\mathrm{tr}}\big)_{i \leq n}$. Model: $Y = \mathcal{X}(\Theta) + \epsilon$. Problem: given $(Y, \mathcal{X})$, reconstruct $\Theta$. This is the matrix version of the linear regression problem.
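To make the model concrete, here is a minimal numerical sketch of the sensing operator $\mathcal{X}$, its adjoint $\mathcal{X}^*$, and the observation model $Y = \mathcal{X}(\Theta) + \epsilon$. The Gaussian design, the dimensions and the noise level are illustrative assumptions, not values taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k, sigma = 8, 200, 2, 0.1   # hypothetical sizes and noise level

# Sensing matrices X^i with i.i.d. standard Gaussian entries (one admissible design).
X = rng.standard_normal((n, d, d))

def sensing_op(A):
    """X(A) = (<X^i, A>_tr)_{i <= n}, returned as a length-n vector."""
    return np.einsum('ikl,kl->i', X, A)

def sensing_adjoint(u):
    """X*(u) = sum_i u_i X^i, the adjoint of the sensing operator."""
    return np.einsum('i,ikl->kl', u, X)

# Rank-k parameter and noisy observations Y = X(Theta) + eps.
G = rng.standard_normal((d, k))
Theta = G @ G.T
Y = sensing_op(Theta) + sigma * rng.standard_normal(n)
```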

Setting: Application to quantum tomography. (Picture: the source corresponds to the parameter $\Theta$, the measurement to the sensing matrix $X$.) See Gross (2011); Flammia et al. (2012); Kahn and Guta (2009); Guta et al. (2012); Barndorff-Nielsen et al. (2003); Butucea, Guta, Kypraios (2015); Alquier et al. (2013); Liu (2011); Gross et al. (2010), etc.

Results: The least squares estimator. Model: $Y = \mathcal{X}(\Theta) + \epsilon$. Most natural idea: least squares, i.e. $\hat\Theta = \arg\min_T \|\mathcal{X}(T) - Y\|_2^2$, whose solution is the least squares estimator $\hat\Theta = (\mathcal{X}^* \mathcal{X})^{-1} \mathcal{X}^*(Y)$, where $\mathcal{X}^*(u) = \sum_{i=1}^n u_i (X^i)^*$ is the adjoint of $\mathcal{X}$.
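Concretely, $\mathcal{X}$ acts as an $n \times d^2$ design matrix on $\mathrm{vec}(\Theta)$, so when the measurement system is complete the estimator above is just ordinary least squares on the vectorised problem. A small sketch under an assumed Gaussian design, with sizes chosen so that $n \geq d^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 64                                   # complete system: n = 4 d^2 >= d^2
X = rng.standard_normal((n, d, d))             # illustrative Gaussian sensing matrices
G = rng.standard_normal((d, 2)); Theta = G @ G.T
Y = np.einsum('ikl,kl->i', X, Theta) + 0.05 * rng.standard_normal(n)

# Row i of the vectorised design is vec(X^i), so Y = A vec(Theta) + eps.
A = X.reshape(n, d * d)
vec_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)   # = (A^T A)^{-1} A^T Y when A^T A is invertible
Theta_hat = vec_hat.reshape(d, d)
print(np.linalg.norm(Theta_hat - Theta, 'fro'))   # Frobenius error of the LS estimate
```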

Results: Properties of the least squares estimator. Under standard assumptions on the noise, and invertibility of the design (i.e. existence of $(\mathcal{X}^*\mathcal{X})^{-1}$), one has asymptotically $\hat\Theta - \Theta \sim \mathcal{N}\big(0, (\mathcal{X}^*\mathcal{X})^{-1}\big)$, and for sub-Gaussian noise (e.g. bounded), with probability $1 - \delta$, $\|\hat\Theta - \Theta\|_F^2 \leq C \frac{d^2 \log(1/\delta)}{n}$, which is minimax optimal over $d \times d$ matrices. These two results enable inference (estimation + confidence statements).

Outline: The matrix recovery setting (Setting; Results). In high dimension (The problem; Discussion). Hard thresholding estimator (The estimator; Results; Simulations).

The problem: Main problem. Crucial assumption: $\mathcal{X}^*\mathcal{X}$ invertible, i.e. the measurement system must be complete. We thus require $n \geq d^2$. Question: what if this is not the case, i.e. in the high-dimensional setting where $n \ll d^2$? Answer: restrict the set of parameters and impose a condition on the design.

The problem: Restriction on the set of parameters. Problem: if $n < d^2$, some parameters have the same image even in the absence of noise, so uniform reconstruction over $d \times d$ matrices is impossible. Solution: restrict the space of parameters. Natural restriction: low rank matrices. We write $M(k)$ for the set of matrices of rank at most $k$.

The problem: Design assumption. A good sampling scheme $\mathcal{X}$ satisfies the matrix Restricted Isometry Property (RIP): $\sup_{T \in M(k)} \frac{\big| \frac{1}{n}\|\mathcal{X}(T)\|_2^2 - \|T\|_F^2 \big|}{\|T\|_F^2} \leq \epsilon$. See e.g. Candès and Recht (2009); Candès and Tao (2010); Candès and Plan (2011); Gross (2011); Liu (2011), etc. Examples: random sub-Gaussian design, or random sampling from an incoherent basis (e.g. Pauli).

The problem: Remark on the following pictures. $M(k)$ is not easy to draw, so to illustrate the intuitions I will resort to the sparse linear regression model $Y = X\theta + \epsilon$, where $\theta$ is $k$-sparse. (Picture: $M(k)$, illustrated by the set of 1-sparse vectors.)

The problem: First non-convex solution. If $\mathcal{X}$ satisfies the matrix RIP, the measurement system is complete for $M(k)$. The estimator $\hat\Theta_0 = \arg\min_{T \in M(k)} \|\mathcal{X}(T) - Y\|_2$ satisfies, with probability larger than $1 - \delta$, $\|\hat\Theta_0 - \Theta\|_F^2 \leq C \frac{kd \log(1/\delta)}{n}$. Problem: non-convex, horrible program.

The problem: Convex relaxation. Current estimator: $\hat\Theta_0 = \arg\min_{T \in M(k)} \|\mathcal{X}(T) - Y\|_2$. Problem: non-convex, horrible program. Idea = convex relaxation: $\hat\Theta = \arg\min_{T : \|T\|_* \leq b} \|\mathcal{X}(T) - Y\|_2$, or rather $\hat\Theta = \arg\min_T \|\mathcal{X}(T) - Y\|_2^2 + \lambda \|T\|_*$.

The problem: Convex relaxation. Matrix Lasso: $\hat\Theta = \arg\min_T \|\mathcal{X}(T) - Y\|_2^2 + \lambda \|T\|_*$. (Picture illustrating $\hat\Theta$.)

The problem: Convex relaxation. Theorem: if the design satisfies the matrix RIP, and if $\lambda \gtrsim \sqrt{\frac{d \log(1/\delta)}{n}}$, then the matrix lasso satisfies, with probability larger than $1 - \delta$, $\|\hat\Theta - \Theta\|_F^2 \leq C \frac{kd \log(1/\delta)}{n}$. See e.g. Fazel et al. (2010); Candès and Plan (2011); Gross et al. (2010); Flammia et al. (2012); Koltchinskii et al. (2011), etc.
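For completeness, one standard way to compute the matrix lasso is proximal gradient descent, where each iteration soft-thresholds the singular values. The sketch below is a generic solver under the same illustrative sensing-tensor format as above, not the algorithm proposed in the talk.

```python
import numpy as np

def matrix_lasso(X, Y, lam, step=0.5, n_iter=500):
    """Proximal gradient for min_T (1/n)||X(T) - Y||_2^2 + lam ||T||_*
    (illustrative sketch; rescaling the data term by 1/n only rescales lam)."""
    n, d, _ = X.shape
    T = np.zeros((d, d))
    for _ in range(n_iter):
        resid = np.einsum('ikl,kl->i', X, T) - Y              # X(T) - Y
        grad = (2.0 / n) * np.einsum('i,ikl->kl', resid, X)   # gradient of the smooth term
        U, s, Vt = np.linalg.svd(T - step * grad, full_matrices=False)
        s = np.maximum(s - step * lam, 0.0)                   # soft-threshold singular values
        T = (U * s) @ Vt
    return T
```

The soft-thresholding of singular values is the proximal map of $\lambda\|\cdot\|_*$; with a step size below the inverse Lipschitz constant of the smooth term, each iteration decreases the objective.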

Discussion: Questions. From a minimax perspective, the problem is solved. But: how can the matrix lasso be implemented efficiently? Or rather, is there an estimator that is computationally efficient? What is the precision of the estimate? Entry-wise results? Limiting distribution?

Discussion: Implementation. $\hat\Theta$ is defined by an optimisation program, so it is computationally solvable in theory. But in practice? Projected gradient descent in the noisy case: Agarwal et al. (2012). Many works on this in the regression setting: Agarwal et al. (2012); Goldfarb and Ma (2011); Blumensath and Davies (2009); Tanner and Wei (2012), in particular hard thresholding in the noiseless case.

Discussion: Implementation. (Pictures comparing projected gradient descent and hard thresholding in terms of mean squared error: the convex relaxation of the constraint vs. the actual constraint.)

Discussion: Uncertainty quantification. Uncertainty quantification? Results exist only for the linear regression model... Global vs. local results ($\|\cdot\|_F$ vs. $\|\cdot\|_S$). Estimators with explicit limiting distribution: van de Geer et al. (2014); Javanmard and Montanari (2014); Zhang and Zhang (2014). Remark: a minimax confidence set depending on the sparsity? It does not exist in the linear regression model, Nickl and van de Geer (2014)... but it exists in the matrix recovery model, Carpentier, Eisert, Gross and Nickl (2015)!

Discussion: Uncertainty quantification. Constrained solution: no obvious limiting distribution. Projected solution: Gaussian limiting distribution. (Picture: $\hat\Theta$, the residual $Y - \mathcal{X}(\hat\Theta)$, and the back-projected residual $\frac{1}{n}\mathcal{X}^*\big(Y - \mathcal{X}(\hat\Theta)\big)$.)

Outline: The matrix recovery setting (Setting; Results). In high dimension (The problem; Discussion). Hard thresholding estimator (The estimator; Results; Simulations).

The estimator: Prerequisites. Let $K$ be an upper bound on the rank of the parameter, i.e. $\Theta \in M(K)$. Assume that $\mathcal{X}$ satisfies the matrix RIP; we will need: $\sup_{T \in M(2K)} \frac{\big| \frac{1}{n}\|\mathcal{X}(T)\|_2^2 - \|T\|_F^2 \big|}{\|T\|_F^2} \leq c_n(2K)$ with $c_n(2K)\sqrt{K} < 1/4$. For e.g. Gaussian design, or random Pauli design, we have (up to a $\log(d)$ factor for Pauli) $c_n(2K) \lesssim \sqrt{\frac{Kd}{n}}$, and so the condition is satisfied whenever $\frac{K^2 d}{n} \lesssim 1$.

The estimator: The hard thresholding estimator. Initial values for the estimator $\hat\Theta^0$ and the threshold $T_0$: $\hat\Theta^0 = 0 \in \mathbb{R}^{d \times d}$, $T_0 = B \in \mathbb{R}_+$. Set now recursively, for $r \in \mathbb{N}$, $r \geq 1$, $T_r = 4\, c_n(2K)\sqrt{K}\, T_{r-1} + C \sqrt{\frac{d \log(1/\delta)}{n}}$, and $\hat\Theta^r = \big[\hat\Theta^{r-1} + \frac{1}{n} \mathcal{X}^*\big(Y - \mathcal{X}(\hat\Theta^{r-1})\big)\big]_{T_r}$, where $[M]_T$ is the matrix where all singular values of $M$ smaller than $T$ are thresholded (set to 0).
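Below is a minimal sketch of this recursion (my own illustrative implementation; the constants $B$, $C$ and the number of iterations are placeholders, whereas the analysis in the paper fixes them via the RIP constant $c_n(2K)$):

```python
import numpy as np

def svd_hard_threshold(M, T):
    """[M]_T: set every singular value of M smaller than T to zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.where(s >= T, s, 0.0)) @ Vt

def iht_estimator(X, Y, c_n2K, K, B, delta=0.05, C=1.0, n_iter=50):
    """Iterative hard thresholding recursion from the slides (illustrative constants)."""
    n, d, _ = X.shape
    Theta_hat = np.zeros((d, d))                      # \hat\Theta^0 = 0
    T_r = B                                           # T_0 = B
    noise_term = C * np.sqrt(d * np.log(1.0 / delta) / n)
    for _ in range(n_iter):
        T_r = 4.0 * c_n2K * np.sqrt(K) * T_r + noise_term
        resid = Y - np.einsum('ikl,kl->i', X, Theta_hat)          # Y - X(\hat\Theta^{r-1})
        Theta_hat = svd_hard_threshold(
            Theta_hat + np.einsum('i,ikl->kl', resid, X) / n,     # gradient step of size 1
            T_r)
    return Theta_hat
```

Because $4\, c_n(2K)\sqrt{K} < 1$, the threshold $T_r$ decreases geometrically from $B$ down to the noise level $C\sqrt{d\log(1/\delta)/n}$, which is what drives the contraction argument on the next slides.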

The estimator: Interpretations. Low rank projected gradient descent: a projected gradient descent on the set of low rank matrices, but the gradient step is 1 (so not really a gradient descent...). The gradient step of 1 is very important... (Picture: mean squared error landscape and the actual constraint.)

The estimator: Interpretations. Application of a contracting operator: we have $\hat\Theta^r = \big[\hat\Theta^{r-1} + \frac{1}{n} \mathcal{X}^*\big(\mathcal{X}(\Theta - \hat\Theta^{r-1}) + \epsilon\big)\big]_{T_r}$. By the condition $c_n(2K)\sqrt{K} < 1/4$, we have $\big\|\frac{1}{n}\mathcal{X}^*\mathcal{X} - \mathrm{Id}_d\big\|_S \leq 1/4$. So the estimator is obtained by multiple applications of a spectral contraction (with thresholdings). (Picture: $\Theta - \hat\Theta^{r-1}$ and $\frac{1}{n}\mathcal{X}^*\mathcal{X}(\Theta - \hat\Theta^{r-1})$.)

The estimator: Interpretations. Taylor expansion of the inverse function: least squares is $(\mathcal{X}^*\mathcal{X})^{-1}\mathcal{X}^* Y$. Problem: $(\mathcal{X}^*\mathcal{X})^{-1}$ does not exist here. Taylor expansion at order $r$ of $\big(\mathrm{Id}_d - (\mathrm{Id}_d - \frac{1}{n}\mathcal{X}^*\mathcal{X})\big)^{-1}$: $L(r) = \sum_{m=0}^{r} \big(\mathrm{Id}_d - \frac{1}{n}\mathcal{X}^*\mathcal{X}\big)^m$. If the thresholding step is suppressed, the estimator we constructed is of the form $\frac{1}{n} L(r)\mathcal{X}^* Y$. Thresholding between each step controls the small eigenvalues...
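To see why, note that without the thresholding the recursion unrolls exactly into this truncated series (a one-line induction, using $\hat\Theta^0 = 0$):

$$\hat\Theta^{r} = \hat\Theta^{r-1} + \tfrac{1}{n}\mathcal{X}^*\big(Y - \mathcal{X}(\hat\Theta^{r-1})\big) = \Big(\mathrm{Id}_d - \tfrac{1}{n}\mathcal{X}^*\mathcal{X}\Big)\hat\Theta^{r-1} + \tfrac{1}{n}\mathcal{X}^* Y,$$

so that

$$\hat\Theta^{r} = \sum_{m=0}^{r-1}\Big(\mathrm{Id}_d - \tfrac{1}{n}\mathcal{X}^*\mathcal{X}\Big)^m \tfrac{1}{n}\mathcal{X}^* Y = \tfrac{1}{n}\, L(r-1)\, \mathcal{X}^* Y.$$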

Results: General result. Theorem: let $r \geq O(\log(n))$. With probability larger than $1 - \delta$, for any $k \leq K/2$, $\sup_{\Theta \in M(k), \|\Theta\|_F \leq B} \|\Theta - \hat\Theta^r\|_S \leq C_1 \sqrt{\frac{d \log(1/\delta)}{n}}$, and also $\sup_{\Theta \in M(k), \|\Theta\|_F \leq B} \mathrm{rank}(\hat\Theta^r) \leq k$. Note: minimax optimal results in Frobenius and trace norm follow immediately.

Results: Discussion. The bounds are minimax-optimal. The spectral norm bound yields a bound on the entry-wise risk. Adaptive estimator: no need to know $k$. The results also hold in the linear regression setting, with an estimator in the same vein.

Results: Result in Gaussian design. Theorem: assume that the entries of the design matrices $X^i$ are i.i.d. Gaussian with mean 0 and variance 1. Then, writing $Z := \frac{1}{\sqrt{n}} \mathcal{X}^*(\epsilon)$ and $\Delta := \sqrt{n}(\hat\Theta^r - \Theta) - \frac{1}{\sqrt{n}} \mathcal{X}^*\mathcal{X}(\hat\Theta^r - \Theta)$, we have $\sqrt{n}(\hat\Theta - \Theta) = \Delta + Z$, where $Z \mid \mathcal{X} \sim \mathcal{N}\big(0, \frac{1}{n}\mathcal{X}^*\mathcal{X}\big)$. Assuming that $\max(K^2 d, K d \log(d)) = o(n)$, we have that $\Delta = o_P(1)$.

Results: Discussion. The limiting distribution gives entry-wise confidence sets. Bound on the risk of each entry of order $1/n$ (Gaussian concentration). The results also hold in the linear regression setting.

Simulations: Simulations for Gaussian design. Gaussian uncorrelated noise $\epsilon \sim \mathcal{N}(0, I_n)$. Parameter $\Theta$ of rank $k$: $\Theta = \sum_{l=1}^{k} N_l N_l^T$, where $N_l \sim \mathcal{N}(0, I_d)$. Computing the estimator and entry-wise confidence intervals.
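A sketch of this data-generating process (the dimensions below are placeholders; the figures that follow use $p \in \{64, 100\}$ and $k \in \{3, 10\}$):

```python
import numpy as np

rng = np.random.default_rng(42)
d, n, k = 20, 2000, 3                      # placeholder sizes for illustration

# Rank-k parameter Theta = sum_l N_l N_l^T with N_l ~ N(0, I_d).
N = rng.standard_normal((d, k))
Theta = N @ N.T

# Gaussian design and uncorrelated Gaussian noise eps ~ N(0, I_n).
X = rng.standard_normal((n, d, d))
Y = np.einsum('ikl,kl->i', X, Theta) + rng.standard_normal(n)
# The estimator and the entry-wise confidence intervals are then computed from (X, Y).
```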

(Figure: logarithm of the rescaled Frobenius risk of the estimate as a function of $n$, for the four settings $p \in \{64, 100\}$, $k \in \{3, 10\}$.)

(Figure: logarithm of the rescaled confidence interval length as a function of $n$, for $p \in \{64, 100\}$, $k \in \{3, 10\}$.)

(Figure: coverage probability of the confidence intervals as a function of $n$, for $p \in \{64, 100\}$, $k \in \{3, 10\}$.)

Conclusion. We have: minimax-optimal bounds, in particular in spectral norm; an estimator that is very fast to compute; a limiting distribution in the case of a Gaussian design. We want: a limiting distribution for non-Gaussian designs? Sharper bounds on entries for non-Gaussian designs? Results with the true quantum model?

Thank you!

References I
Agarwal, A., S. Negahban, and M. J. Wainwright (2012). Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Statist. 40(5), 2452-2482.
Alquier, P., C. Butucea, M. Hebiri, K. Meziani, and T. Morimae (2013). Rank penalized estimation of a quantum system. Physical Review A 88, 032113.
Barndorff-Nielsen, O. E., R. D. Gill, and P. E. Jupp (2003). On quantum statistical inference (with discussion). J. R. Statist. Soc. B 65(5), 775-816.
Blumensath, T. and M. E. Davies (2009). Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265-274.

References II
Butucea, Guta, and Kypraios (2015). Spectral thresholding quantum tomography for low rank states. arXiv:1504.08295.
Candès, E. and T. Tao (2010). The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inform. Theory 56, 2053-2080.
Candès, E. J. and Y. Plan (2011). Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. IEEE Trans. Inform. Theory 57(4), 2342-2359.
Candès, E. and B. Recht (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717-772.

References III
Carpentier, A., J. Eisert, D. Gross, and R. Nickl (2015). Uncertainty Quantification for Matrix Compressed Sensing and Quantum Tomography Problems. arXiv:1504.03234.
Flammia, S. T., D. Gross, Y.-K. Liu, and J. Eisert (2012). Quantum tomography via compressed sensing: error bounds, sample complexity and efficient estimators. New J. Phys. 14(9), 095022.
van de Geer, S., P. Bühlmann, Y. Ritov, and R. Dezeure (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42(3), 1166-1202.
Recht, B., M. Fazel, and P. A. Parrilo (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3), 471-501.

References IV
Goldfarb, D. and S. Ma (2011). Convergence of fixed-point continuation algorithms for matrix rank minimization. Found. Comput. Math. 11, 183-210.
Gross, D. (2011). Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory 57(3), 1548-1566.
Gross, D., Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert (2010). Quantum state tomography via compressed sensing. Physical Review Letters 105(15), 150401.
Guta, M., T. Kypraios, and I. Dryden (2012). Rank based model selection for multiple ions quantum tomography. New Journal of Physics 14, 105002.

References V
Haffner, H., et al. (2005). Scalable multiparticle entanglement of trapped ions. Nature 438, 643-646.
Javanmard, A. and A. Montanari (2014). Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 15(1), 2869-2909.
Kahn, J. and M. Guta (2009). Local asymptotic normality for finite dimensional quantum systems. Commun. Math. Phys. 289, 597-652.
Koltchinskii, V., K. Lounici, and A. B. Tsybakov (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist. 39(5), 2302-2329.
Liu, Y.-K. (2011). Universal low-rank matrix recovery from Pauli measurements. Adv. Neural Inf. Process. Syst., 1638-1646.

References VI
Nickl, R. and S. van de Geer (2014). Confidence sets in sparse regression. Ann. Statist. 41(6), 2852-2876.
Tanner, J. and K. Wei (2012). Normalized iterative hard thresholding for matrix completion. SIAM J. Sci. Comput. 35, S104-S125.
Zhang, C.-H. and S. S. Zhang (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 76, 217-242.