Reconstruction from Anisotropic Random Measurements


Reconstruction from Anisotropic Random Measurements
Mark Rudelson and Shuheng Zhou, The University of Michigan, Ann Arbor
Coding, Complexity, and Sparsity Workshop, Ann Arbor, Michigan, August 7, 2013

Want to estimate a parameter β ∈ R^p. Example: how is a response y ∈ R (e.g., related to Parkinson's disease) affected by a set of genes among the Chinese population? Construct a linear model: y = β^T x + ɛ, where E(y | x) = β^T x. Parameter: the non-zero entries in β (the sparsity of β) identify a subset of genes and indicate how much they influence y. Take a random sample of (X, Y) and use the sample to estimate β; that is, we have Y = Xβ + ɛ.
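As a concrete illustration of the model above, here is a minimal numpy sketch that generates an s-sparse β and noisy observations Y = Xβ + ɛ; the dimensions, noise level, and Gaussian design are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                      # illustrative sizes (assumptions)

# s-sparse parameter: s non-zero entries at random positions
beta = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
beta[support] = rng.normal(size=s)

# design matrix with i.i.d. rows and noisy response Y = X beta + eps
X = rng.normal(size=(n, p))
eps = 0.1 * rng.normal(size=n)
Y = X @ beta + eps
```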

Model selection and parameter estimation. When can we approximately recover β from n noisy observations Y? Questions: How many measurements n do we need in order to recover the non-zero positions in β? How does n scale with p or s, where s is the number of non-zero entries of β? What assumptions about the data matrix X are reasonable?

Sparse recovery. Suppose β is known to be s-sparse for some 1 ≤ s ≤ n, which means that at most s of the coefficients of β can be non-zero. Assume every 2s columns of X are linearly independent. Identifiability condition (reasonable once n ≥ 2s):
Λ_min(2s) = min_{υ ≠ 0, 2s-sparse} ‖Xυ‖_2^2 / (n ‖υ‖_2^2) > 0
Proposition (Candès-Tao 05): Suppose that any 2s columns of the n × p matrix X are linearly independent. Then any s-sparse signal β ∈ R^p can be reconstructed uniquely from Xβ.
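For a toy problem the identifiability condition can be checked by brute force, computing Λ_min(2s) as the smallest squared singular value over all 2s-column submatrices, scaled by n. A sketch under the assumption that p is small enough to enumerate supports:

```python
import numpy as np
from itertools import combinations

def lambda_min(X, m):
    """Smallest value of ||X v||_2^2 / n over m-sparse unit vectors v,
    computed by enumerating all supports of size m (toy sizes only)."""
    n, p = X.shape
    best = np.inf
    for T in combinations(range(p), m):
        smin = np.linalg.svd(X[:, list(T)], compute_uv=False)[-1]  # smallest singular value of X_T
        best = min(best, smin**2 / n)
    return best

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 8))          # tiny toy design (assumption)
s = 2
print(lambda_min(X, 2 * s) > 0)       # identifiability for s-sparse signals
```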

l_0-minimization. How to reconstruct an s-sparse signal β ∈ R^p from the measurements Y = Xβ, given Λ_min(2s) > 0? Let β̂ be the unique sparsest solution to Xβ = Y:
β̂ = argmin_{β : Xβ = Y} ‖β‖_0,
where ‖β‖_0 := #{1 ≤ i ≤ p : β_i ≠ 0} is the sparsity of β. Unfortunately, l_0-minimization is computationally intractable (in fact, it is an NP-complete problem).

Basis pursuit. Consider the following convex optimization problem:
β̂ := argmin_{β : Xβ = Y} ‖β‖_1
Basis pursuit works whenever the n × p measurement matrix X is sufficiently incoherent. RIP (Candès-Tao 05) requires that for all T ⊂ {1, ..., p} with |T| ≤ s and for all coefficient sequences (c_j)_{j ∈ T},
(1 − δ_s) ‖c‖_2^2 ≤ ‖X_T c‖_2^2 / n ≤ (1 + δ_s) ‖c‖_2^2
holds for some 0 < δ_s < 1 (the s-restricted isometry constant). Good matrices for compressed sensing should satisfy these inequalities for the largest possible s.
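Basis pursuit can be written as a linear program in the positive and negative parts of β. The sketch below uses scipy.optimize.linprog; the toy sizes and Gaussian design are assumptions chosen so that exact recovery is expected with high probability.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, Y):
    """Solve min ||b||_1 subject to X b = Y as an LP in (b_plus, b_minus)."""
    n, p = X.shape
    c = np.ones(2 * p)                         # objective: sum(b_plus) + sum(b_minus)
    A_eq = np.hstack([X, -X])                  # X b_plus - X b_minus = Y
    res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=(0, None), method="highs")
    z = res.x
    return z[:p] - z[p:]

rng = np.random.default_rng(2)
n, p, s = 40, 100, 3                           # toy sizes (assumptions)
beta = np.zeros(p)
beta[rng.choice(p, s, replace=False)] = rng.normal(size=s)
X = rng.normal(size=(n, p))
beta_hat = basis_pursuit(X, X @ beta)
print(np.allclose(beta_hat, beta, atol=1e-6))  # exact recovery expected w.h.p. for these sizes
```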

Restricted Isometry Property (RIP): examples. For a Gaussian random matrix, or any sub-Gaussian ensemble, RIP holds with s ≍ n / log(p/n). For the random Fourier ensemble, or randomly sampled rows of orthonormal matrices, RIP holds for s = O(n / log^4 p). For a random matrix composed of columns that are independent isotropic vectors with log-concave densities, RIP holds for s = O(n / log^2(p/n)). References: Candès-Tao 05, 06, Rudelson-Vershynin 05, Donoho 06, Baraniuk et al 08, Mendelson et al 08, Adamczak et al 09.

Basis pursuit for high-dimensional data. These algorithms are also robust with regard to noise, and RIP will be replaced by more relaxed conditions. In particular, the isotropicity condition, which has been assumed in all of the literature cited above, needs to be dropped. Let X_i ∈ R^p, i = 1, ..., n, be i.i.d. random row vectors of the design matrix X. Covariance matrix: Σ(X_i) = E X_i ⊗ X_i = E X_i X_i^T. Sample covariance: Σ̂_n = (1/n) ∑_{i=1}^n X_i ⊗ X_i = (1/n) ∑_{i=1}^n X_i X_i^T. X_i is isotropic if Σ(X_i) = I and E ‖X_i‖_2^2 = p.

Sparse recovery for Y = Xβ + ɛ. Lasso (Tibshirani 96), a.k.a. Basis Pursuit (Chen, Donoho and Saunders 98, and others):
β̂ = argmin_β ‖Y − Xβ‖_2^2 / (2n) + λ_n ‖β‖_1,
where the scaling factor 1/(2n) is chosen for convenience. Dantzig selector (Candès-Tao 07):
(DS)  argmin_{β̃ ∈ R^p} ‖β̃‖_1  subject to  ‖X^T(Y − Xβ̃)/n‖_∞ ≤ λ_n
References: Greenshtein-Ritov 04, Meinshausen-Bühlmann 06, Zhao-Yu 06, Bunea et al 07, Candès-Tao 07, van de Geer 08, Zhang-Huang 08, Wainwright 09, Koltchinskii 09, Meinshausen-Yu 09, Bickel et al 09, and others.
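A minimal sketch of the Lasso with the 1/(2n) scaling above, solved by proximal gradient descent (ISTA) in plain numpy; the step size rule and iteration count are illustrative choices rather than anything prescribed in the talk.

```python
import numpy as np

def soft_threshold(v, t):
    """Entrywise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=2000):
    """Minimize ||Y - X b||_2^2 / (2n) + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n      # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (Y - X @ b) / n      # gradient of the quadratic term
        b = soft_threshold(b - grad / L, lam / L)
    return b
```

In line with the references above, a typical tuning choice is λ_n of order sqrt(log p / n) times the noise level.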

The Cone Constraint. For an appropriately chosen λ_n, the solution of the Lasso or the Dantzig selector satisfies (under i.i.d. Gaussian noise), with high probability,
υ := β̂ − β ∈ C(s, k_0),
with k_0 = 1 for the Dantzig selector and k_0 = 3 for the Lasso. Object of interest: for 1 ≤ s_0 ≤ p and a positive number k_0,
C(s_0, k_0) = { x ∈ R^p : ∃ J ⊂ {1, ..., p}, |J| = s_0, s.t. ‖x_{J^c}‖_1 ≤ k_0 ‖x_J‖_1 }
This object has appeared in earlier work in the noiseless setting. References: Donoho-Huo 01, Elad-Bruckstein 02, Feuer-Nemirovski 03, Candès-Tao 07, Bickel-Ritov-Tsybakov 09, Cohen-Dahmen-DeVore 09.
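Membership in the cone C(s_0, k_0) is easy to test numerically, since the best choice of J is always the set of the s_0 largest coordinates in absolute value. A small helper, as a sketch:

```python
import numpy as np

def in_cone(x, s0, k0):
    """Check x in C(s0, k0): exists |J| = s0 with ||x_{J^c}||_1 <= k0 * ||x_J||_1.
    Taking J as the s0 largest coordinates in absolute value is optimal."""
    a = np.sort(np.abs(x))[::-1]
    head, tail = a[:s0].sum(), a[s0:].sum()
    return tail <= k0 * head
```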

The Lasso solution

Restricted Eigenvalue (RE) condition. Object of interest:
C(s_0, k_0) = { x ∈ R^p : ∃ J ⊂ {1, ..., p}, |J| = s_0, s.t. ‖x_{J^c}‖_1 ≤ k_0 ‖x_J‖_1 }
Definition: A q × p matrix A satisfies the RE(s_0, k_0, A) condition with parameter K(s_0, k_0, A) if, for any υ ≠ 0,
1/K(s_0, k_0, A) := min_{J ⊂ {1,...,p}, |J| ≤ s_0}  min_{υ ≠ 0 : ‖υ_{J^c}‖_1 ≤ k_0 ‖υ_J‖_1}  ‖Aυ‖_2 / ‖υ_J‖_2 > 0
References: van de Geer 07, Bickel-Ritov-Tsybakov 09, van de Geer-Bühlmann 09.

An elementary estimate. Lemma: For each vector υ ∈ C(s_0, k_0), let T_0 denote the locations of the s_0 largest coefficients of υ in absolute values. Then
‖υ_{T_0^c}‖_2 ≤ ‖υ‖_1 / √|T_0|   and   ‖υ_{T_0}‖_2 ≥ ‖υ‖_2 / √(1 + k_0).
Implication: Let A be a q × p matrix such that the RE(s_0, 3k_0, A) condition holds with 0 < K(s_0, 3k_0, A) < ∞. Then for every υ ∈ C(s_0, k_0) ∩ S^{p−1},
‖Aυ‖_2 ≥ ‖υ_{T_0}‖_2 / K(s_0, k_0, A) ≥ 1 / (K(s_0, k_0, A) √(1 + k_0)) > 0.
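A quick simulation checking the two bounds of the lemma as stated above on randomly generated cone vectors; the generator of vectors in C(s_0, k_0) and the toy sizes are assumptions made only for this sanity check.

```python
import numpy as np

rng = np.random.default_rng(3)
p, s0, k0 = 200, 10, 3.0                       # toy sizes (assumptions)

def random_cone_vector(p, s0, k0):
    """Draw a vector in C(s0, k0): head on a random support J, tail scaled so
    that ||x_{J^c}||_1 <= k0 * ||x_J||_1."""
    x = np.zeros(p)
    J = rng.choice(p, s0, replace=False)
    x[J] = rng.normal(size=s0)
    tail = rng.normal(size=p - s0)
    mask = np.ones(p, dtype=bool); mask[J] = False
    scale = rng.uniform(0, 1) * k0 * np.abs(x[J]).sum() / np.abs(tail).sum()
    x[mask] = scale * tail
    return x

for _ in range(1000):
    v = random_cone_vector(p, s0, k0)
    T0 = np.argsort(np.abs(v))[::-1][:s0]       # s0 largest coordinates in absolute value
    tail_norm = np.linalg.norm(np.delete(v, T0))
    assert tail_norm <= np.abs(v).sum() / np.sqrt(s0) + 1e-12
    assert np.linalg.norm(v[T0]) >= np.linalg.norm(v) / np.sqrt(1 + k0) - 1e-12
```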

Sparse eigenvalues. Definition: For m ≤ p, we define the largest and smallest m-sparse eigenvalues of a q × p matrix A to be
ρ_max(m, A) := max_{t ∈ R^p, t ≠ 0; m-sparse} ‖At‖_2^2 / ‖t‖_2^2,
ρ_min(m, A) := min_{t ∈ R^p, t ≠ 0; m-sparse} ‖At‖_2^2 / ‖t‖_2^2.
If RE(s_0, k_0, A) is satisfied with k_0 ≥ 1, then the square submatrices of size 2s_0 of A^T A are necessarily positive definite, that is, ρ_min(2s_0, A) > 0.
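For small p the m-sparse eigenvalues can be computed by brute force over all supports, with the squared-norm convention used in the definition above; a sketch (only feasible for toy sizes):

```python
import numpy as np
from itertools import combinations

def sparse_eigenvalues(A, m):
    """Brute-force rho_min(m, A) and rho_max(m, A) over all supports of size m."""
    q, p = A.shape
    lo, hi = np.inf, 0.0
    for T in combinations(range(p), m):
        G = A[:, list(T)].T @ A[:, list(T)]     # m x m Gram matrix of the selected columns
        w = np.linalg.eigvalsh(G)               # eigenvalues = squared singular values of A_T
        lo, hi = min(lo, w[0]), max(hi, w[-1])
    return lo, hi
```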

Examples of A which satisfy the Restricted Eigenvalue condition, but not RIP (Raskutti, Wainwright, and Yu 10). Spiked identity matrix: for a ∈ [0, 1),
Σ_{p×p} = (1 − a) I_{p×p} + a 1 1^T,
where 1 ∈ R^p is the vector of all ones; ρ_min(Σ) > 0. Then for every s_0 × s_0 submatrix Σ_SS we have
ρ_max(Σ_SS) / ρ_min(Σ_SS) = (1 + a(s_0 − 1)) / (1 − a).
The largest sparse eigenvalue tends to ∞ as s_0 → ∞, but max_j ‖Σ^{1/2} e_j‖_2 = 1 is bounded.
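A numerical check of the spiked-identity example: the condition number of any s_0 × s_0 principal submatrix matches the closed form, while the columns of Σ^{1/2} keep unit norm. The values of a, p, s_0 below are illustrative.

```python
import numpy as np

a, p, s0 = 0.9, 50, 10                              # illustrative values (assumptions)
Sigma = (1 - a) * np.eye(p) + a * np.ones((p, p))   # spiked identity covariance

S = Sigma[:s0, :s0]                                  # any s0 x s0 principal submatrix
w = np.linalg.eigvalsh(S)
print(w[-1] / w[0], (1 + a * (s0 - 1)) / (1 - a))    # both equal the ratio in the formula

# columns of Sigma^{1/2} stay bounded: ||Sigma^{1/2} e_j||_2^2 = Sigma_jj = 1
print(np.allclose(np.diag(Sigma), 1.0))
```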

Motivation: to construct classes of design matrices such that the Restricted Eigenvalue condition will be satisfied. Design matrix X has just independent rows, rather than independent entries: e.g., consider, for some q × p matrix A,
X = ΨA,
where the rows of the n × q matrix Ψ are independent isotropic vectors with subgaussian marginals, and RE(s_0, (1 + ε)k_0, A) holds for some ε > 0, p > s_0 > 0, and k_0 > 0. Design matrix X consists of independent identically distributed rows with bounded entries, whose covariance matrix Σ(X_i) = E X_i X_i^T satisfies RE(s_0, (1 + ε)k_0, Σ^{1/2}). The rows of X will be sampled from some distribution in R^p; the distribution may be highly non-Gaussian and perhaps discrete.
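A minimal sketch of the first construction, X = ΨA, taking A = Σ^{1/2} for a spiked-identity Σ and Ψ with independent symmetric Bernoulli (hence isotropic subgaussian) rows; all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, a = 60, 80, 0.5                               # toy sizes (assumptions)

# A = Sigma^{1/2} for a spiked-identity covariance; rows of X then have covariance Sigma
Sigma = (1 - a) * np.eye(p) + a * np.ones((p, p))
evals, evecs = np.linalg.eigh(Sigma)
A = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

# Psi has independent isotropic rows with subgaussian marginals (symmetric Bernoulli entries)
Psi = rng.choice([-1.0, 1.0], size=(n, p))

X = Psi @ A          # anisotropic design: independent rows, but dependent, non-Gaussian entries
```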

Outline: Introduction; The main results; The reduction principle; Applications of the reduction principle; Ingredients of the proof; Conclusion.

Notation. Let e_1, ..., e_p be the canonical basis of R^p. For a set J ⊂ {1, ..., p}, denote E_J = span{e_j : j ∈ J}. For a matrix A, we use ‖A‖_2 to denote its operator norm. For a set V ⊂ R^p, we let conv V denote the convex hull of V. For a finite set Y, its cardinality is denoted by |Y|. Let B_2^p and S^{p−1} be the unit Euclidean ball and the unit sphere, respectively.

The reduction principle. Theorem: Let E = ⋃_{|J| = d} E_J for d = d(3k_0) < p, where
d(3k_0) = s_0 + s_0 max_j ‖Ae_j‖_2^2 · 16 K^2(s_0, 3k_0, A) (3k_0)^2 (3k_0 + 1) / δ^2,
and E denotes R^p otherwise. Let Ψ be a matrix such that
∀ x ∈ AE,  (1 − δ) ‖x‖_2 ≤ ‖Ψx‖_2 ≤ (1 + δ) ‖x‖_2.
Then RE(s_0, k_0, ΨA) holds with
0 < K(s_0, k_0, ΨA) ≤ K(s_0, k_0, A) / (1 − 5δ).
In words: if the matrix Ψ acts as an almost isometry on the images of the d-sparse vectors under A, then the product ΨA satisfies the RE condition with a smaller parameter k_0.

Reformulation of the reduction principle. Theorem (restricted isometry): Let Ψ be a matrix such that
∀ x ∈ AE,  (1 − δ) ‖x‖_2 ≤ ‖Ψx‖_2 ≤ (1 + δ) ‖x‖_2.   (∗)
Then for any x ∈ A C(s_0, k_0) ∩ S^{q−1},
(1 − 5δ) ≤ ‖Ψx‖_2 ≤ (1 + 3δ).
If the matrix Ψ acts as an almost isometry on the images of the d-sparse vectors under A, then it acts the same way on the images of C(s_0, k_0). The problem is reduced to checking that the almost-isometry property holds for all vectors from some low-dimensional subspaces, which is easier than checking the RE property directly.

Definition: subgaussian random vectors. Let Y be a random vector in R^p.
1. Y is called isotropic if for every y ∈ R^p, E ⟨Y, y⟩^2 = ‖y‖_2^2.
2. Y is ψ_2 with a constant α if for every y ∈ R^p,
‖⟨Y, y⟩‖_{ψ_2} := inf{ t : E exp(⟨Y, y⟩^2 / t^2) ≤ 2 } ≤ α ‖y‖_2.
The ψ_2 condition on a scalar random variable V is equivalent to the subgaussian tail decay of V, which means that for some constant c,
P(|V| > t) ≤ 2 exp(−t^2 / c^2)  for all t > 0.
A random vector Y in R^p is subgaussian if the one-dimensional marginals ⟨Y, y⟩ are subgaussian random variables for all y ∈ R^p.

The first application of the reduction principle. Let A be a q × p matrix satisfying the RE(s_0, 3k_0, A) condition. Let m = min(d, p), where
d = s_0 + s_0 max_j ‖Ae_j‖_2^2 · 16 K^2(s_0, 3k_0, A) (3k_0)^2 (3k_0 + 1) / δ^2.
Theorem: Let Ψ be an n × q matrix whose rows are independent isotropic ψ_2 random vectors in R^q with constant α. Suppose
n ≥ (2000 m α^4 / δ^2) log(60 e p / (m δ)).
Then with probability at least 1 − exp(−δ^2 n / (2000 α^4)), the RE(s_0, k_0, (1/√n) ΨA) condition holds with
0 < K(s_0, k_0, (1/√n) ΨA) ≤ K(s_0, k_0, A) / (1 − δ).

Examples of subgaussian vectors. The random vector Y with i.i.d. N(0, 1) random coordinates. The discrete Gaussian vector, which is a random vector taking values on the integer lattice Z^p with distribution P(X = m) = C exp(−‖m‖_2^2 / 2) for m ∈ Z^p. A vector with independent centered bounded random coordinates; in particular, vectors with random symmetric Bernoulli coordinates, in other words, random vertices of the discrete cube.

Previous results on (sub)gaussian random vectors. Raskutti, Wainwright, and Yu 10: RE(s_0, k_0, X) holds for a random Gaussian measurement/design matrix X which consists of n = O(s_0 log p) independent copies of a Gaussian random vector Y ∼ N_p(0, Σ), assuming that the RE condition holds for Σ^{1/2}. Their proof relies on a deep result from the theory of Gaussian random processes: Gordon's Minimax Lemma. To establish the RE condition for more general classes of random matrices we had to introduce a new approach based on geometric functional analysis, namely, the reduction principle. The bound n = O(s_0 log p) can be improved to the optimal one n = O(s_0 log(p/s_0)) when RE(s_0, k_0, Σ^{1/2}) is replaced with RE(s_0, (1 + ε)k_0, Σ^{1/2}) for any ε > 0.

In Zhou 09, subgaussian random matrices of the form X = ΨΣ^{1/2} were considered, where Σ is a p × p positive semidefinite matrix: X satisfies the RE(s_0, k_0) condition with overwhelming probability if, for K := K(s_0, k_0, Σ^{1/2}),
n > (9 c α^4 / δ^2) (2 + k_0)^2 K^4 ρ_max(s_0, Σ^{1/2}) s_0 log(5ep/s_0) ∨ s_0 log p.
The analysis there used a result in Mendelson et al 07, 08. The current result involves neither ρ_max(s_0, A) nor the global parameters of the matrices A and Ψ, such as the norm or the smallest singular value. Recall the spiked identity matrix: for a ∈ [0, 1), Σ_{p×p} = (1 − a) I_{p×p} + a 1 1^T, which satisfies the RE condition, and for which ρ_max(s_0, Σ^{1/2}) grows linearly with s_0 while max_j ‖Σ^{1/2} e_j‖_2 = 1.

Design matrices with uniformly bounded entries. Let Y ∈ R^p be a random vector such that ‖Y‖_∞ ≤ M a.s., and denote Σ = E Y Y^T. Let X be an n × p matrix whose rows X_1, ..., X_n are independent copies of Y. Set
d = s_0 + s_0 max_j ‖Σ^{1/2} e_j‖_2^2 · 16 K^2(s_0, 3k_0, Σ^{1/2}) (3k_0)^2 (3k_0 + 1) / δ^2.
Theorem: Assume that d ≤ p and ρ = ρ_min(d, Σ^{1/2}) > 0, and let Σ satisfy the RE(s_0, 3k_0, Σ^{1/2}) condition. Suppose
n ≥ (C M^2 d log p / (ρ δ^2)) log^3( C M^2 d log p / (ρ δ^2) ).
Then with probability at least 1 − exp(−δ ρ n / (6 M^2 d)), RE(s_0, k_0, X) holds for the matrix X/√n with
0 < K(s_0, k_0, X/√n) ≤ K(s_0, k_0, Σ^{1/2}) / (1 − δ).

Remarks on applying the reduction principle to analyze different classes of random design matrices. Unlike the case of a random matrix with subgaussian marginals, the estimate in the second example contains the minimal sparse singular value ρ_min(d, Σ^{1/2}). The reconstruction of sparse signals by subgaussian design matrices or by the random Fourier ensemble was analyzed in the literature before, however only under RIP assumptions. The reduction principle can be applied to other types of random variables: e.g., random vectors with heavy-tailed marginals, or random vectors with log-concave densities. References: Rudelson-Vershynin 05, Baraniuk et al 08, Mendelson et al 08, Vershynin 11a, b, Adamczak et al 09.

Maurey's empirical approximation argument (Pisier 81). Let u_1, ..., u_M ∈ R^q and let y ∈ conv(u_1, ..., u_M):
y = ∑_{j ∈ {1,...,M}} α_j u_j, where α_j ≥ 0 and ∑_j α_j = 1.
There exists a set L ⊂ {1, 2, ..., M} such that
|L| ≤ m = 4 max_{j ∈ {1,...,M}} ‖u_j‖_2^2 / ε^2
and a vector y' ∈ conv(u_j, j ∈ L) such that ‖y − y'‖_2 ≤ ε.
Proof: an application of the probabilistic method. If we only want to approximate y rather than represent it exactly as a convex combination of u_1, ..., u_M, this is possible with many fewer points, namely u_j, j ∈ L.

Let y = ∑_{j ∈ {1,...,M}} α_j u_j, where α_j ≥ 0 and ∑_j α_j = 1. Goal: to find a vector y' ∈ conv(u_j, j ∈ L) such that ‖y − y'‖_2 ≤ ε.

Let Y be a random vector in R^q such that P(Y = u_l) = α_l, l ∈ {1, ..., M}. Then
E(Y) = ∑_{l ∈ {1,...,M}} α_l u_l = y.

Let Y_1, ..., Y_m be independent copies of Y, and let ε_1, ..., ε_m be ±1 i.i.d. mean-zero Bernoulli random variables, chosen independently of Y_1, ..., Y_m. By the standard symmetrization argument we have
E ‖y − (1/m) ∑_{j=1}^m Y_j‖_2^2 ≤ 4 E ‖(1/m) ∑_{j=1}^m ε_j Y_j‖_2^2 = (4/m^2) ∑_{j=1}^m E ‖Y_j‖_2^2 ≤ 4 max_{l ∈ {1,...,M}} ‖u_l‖_2^2 / m ≤ ε^2,   (1)
where sup_j ‖Y_j‖_2 ≤ max_{l ∈ {1,...,M}} ‖u_l‖_2, and the last inequality in (1) follows from the definition of m.

Fix a realization Y_j = u_{k_j}, j = 1, ..., m, for which ‖y − (1/m) ∑_{j=1}^m Y_j‖_2 ≤ ε. The vector (1/m) ∑_{j=1}^m Y_j belongs to the convex hull of {u_l : l ∈ L}, where L is the set of distinct elements of the sequence k_1, ..., k_m. Obviously |L| ≤ m, and the lemma is proved. QED
[Figure: y approximated by (1/m) ∑_j Y_j inside the convex hull of the sampled points u_{k_1}, ..., u_{k_m}.]
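A small simulation of the argument just given: sample m indices i.i.d. from the weights α and average the corresponding points; with m = 4 max_l ‖u_l‖_2^2 / ε^2 the expected squared error is at most ε^2. The dimensions and weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
q, M, eps = 50, 1000, 0.25                          # toy sizes (assumptions)

u = rng.normal(size=(M, q))
u /= np.linalg.norm(u, axis=1, keepdims=True)       # points with ||u_l||_2 = 1
alpha = rng.dirichlet(np.ones(M))                   # convex weights
y = alpha @ u                                       # y in conv(u_1, ..., u_M)

m = int(np.ceil(4 * 1.0 / eps**2))                  # m = 4 * max_l ||u_l||^2 / eps^2
idx = rng.choice(M, size=m, p=alpha)                # Y_1, ..., Y_m with P(Y = u_l) = alpha_l
y_approx = u[idx].mean(axis=0)                      # lies in conv(u_l : l in L), |L| <= m

print(np.linalg.norm(y - y_approx))                 # typically well below eps
```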

The Inclusion Lemma. To prove the restricted isometry of Ψ over the set of vectors in A C(s_0, k_0) ∩ S^{q−1}, we show that this set is contained in the convex hull of the images of the sparse vectors with norms not exceeding (1 − δ)^{−1}.
Lemma: Let 1 > δ > 0. Suppose the RE(s_0, k_0, A) condition holds for the q × p matrix A. For a set J ⊂ {1, ..., p}, let E_J = span{e_j : j ∈ J}. Set
d = d(k_0, A) = s_0 + s_0 max_j ‖Ae_j‖_2^2 · 16 K^2(s_0, k_0, A) k_0^2 (k_0 + 1) / δ^2.
Then
A C(s_0, k_0) ∩ S^{q−1} ⊂ (1 − δ)^{−1} conv( ⋃_{|J| ≤ d} A E_J ∩ S^{q−1} ),
where for d ≥ p, E_J is understood to be R^p.

Conclusion. We prove a general reduction principle showing that if the matrix Ψ acts as an almost isometry on the images of the sparse vectors under A, then the product ΨA satisfies the RE condition with a smaller parameter k_0. We apply the reduction principle to analyze different classes of random design matrices. The analysis is reduced to checking that the almost-isometry property holds for all vectors from some low-dimensional subspaces, which is easier than checking the RE property directly.

Thank you!