
Optimisation Combinatoire et Convexe. Low complexity models, ℓ1 penalties. A. d'Aspremont, M1 ENS.

Today

- Sparsity, low complexity models.
- ℓ1-recovery results: three approaches.
- Extensions: matrix completion, atomic norms.
- Algorithmic implications.

Low complexity models

Consider the following underdetermined linear system

    Ax = b

where A ∈ R^{m×n}, with n ≫ m. Can we find the sparsest solution?

Introduction

- Signal processing: we make a few measurements of a high-dimensional signal which admits a sparse representation in a well-chosen basis (e.g. Fourier, wavelet). Can we reconstruct the signal exactly?
- Coding: suppose we transmit a message which is corrupted by a few errors. How many errors does it take before the message can no longer be recovered?
- Statistics: variable selection in regression (LASSO, etc.).

Introduction

Why sparsity? Sparsity is a proxy for power laws: most results stated here for sparse vectors also apply to vectors whose coefficient magnitudes decay according to a power law. Power laws appear everywhere...

- Zipf's law: word frequencies in natural language follow a power law.
- Ranking: PageRank coefficients follow a power law.
- Signal processing: 1/f signals.
- Social networks: node degrees follow a power law.
- Earthquakes: Gutenberg-Richter power laws.
- River systems, cities, net worth, etc.

Introduction

[Figure: frequency vs. word in Wikipedia (from Wikipedia).]

Introduction

[Figure: frequency vs. magnitude for earthquakes worldwide. Christensen et al. [2002].]

Introduction

[Figure: pages vs. PageRank on a web sample. Pandurangan et al. [2006].]

Introduction

Getting the sparsest solution means solving

    minimize    Card(x)
    subject to  Ax = b

which is a (hard) combinatorial problem in x ∈ R^n. A classic heuristic is to solve instead

    minimize    ||x||_1
    subject to  Ax = b

which is equivalent to an (easy) linear program.
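This LP reformulation is easy to try numerically. Below is a minimal sketch (not from the slides) using scipy.optimize.linprog, splitting x = x⁺ − x⁻ with x⁺, x⁻ ≥ 0 so that ||x||_1 becomes a linear objective; the matrix A, vector b and test signal e are placeholder data.

    import numpy as np
    from scipy.optimize import linprog

    def l1_minimize(A, b):
        """Solve min ||x||_1 s.t. Ax = b as an LP via the split x = xp - xm, xp, xm >= 0."""
        m, n = A.shape
        c = np.ones(2 * n)                      # sum(xp) + sum(xm) equals ||x||_1 at the optimum
        A_eq = np.hstack([A, -A])               # A xp - A xm = b
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
        xp, xm = res.x[:n], res.x[n:]
        return xp - xm

    # Toy example: recover a sparse vector from random measurements.
    rng = np.random.default_rng(0)
    n, m, k = 50, 30, 5
    A = rng.standard_normal((m, n))
    e = np.zeros(n)
    e[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    print(np.allclose(l1_minimize(A, A @ e), e, atol=1e-6))   # True when l1 recovery succeeds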

Introduction

Assuming ||x||_∞ ≤ 1, we can replace

    Card(x) = Σ_{i=1}^n 1_{x_i ≠ 0}

with

    ||x||_1 = Σ_{i=1}^n |x_i|.

[Figure: Card(x) and |x| plotted on [−1, 1].]

The ℓ1 norm is the largest convex lower bound on Card(x) over [−1, 1]^n.
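To see why (a brief elaboration of the slide's claim, argued coordinate by coordinate): for |t| ≤ 1 we have |t| ≤ 1_{t≠0}, and if f is convex with f ≤ 1_{·≠0} on [−1, 1], then for t ∈ [0, 1]

    f(t) = f((1 − t)·0 + t·1) ≤ (1 − t) f(0) + t f(1) ≤ t = |t|,

and symmetrically for t ∈ [−1, 0]. Hence |·| is the largest convex minorant of 1_{·≠0} on [−1, 1], and summing over coordinates gives the statement for Card(x) and ||x||_1.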

Introduction

Example: we fix A, draw many sparse signals e and plot the probability of perfectly recovering e by solving

    minimize    ||x||_1
    subject to  Ax = Ae

in x ∈ R^n, with n = 50 and m = 30.

[Figure: probability of recovering e vs. cardinality of e.]
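A rough sketch of this experiment (my reconstruction, with n = 50 and m = 30 as on the slide), reusing the l1_minimize helper from the sketch above; the number of trials per cardinality is an arbitrary choice.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m, trials = 50, 30, 50
    A = rng.standard_normal((m, n))              # fixed measurement matrix

    recovery_prob = []
    for k in range(1, n + 1):                    # cardinality of e
        hits = 0
        for _ in range(trials):
            e = np.zeros(n)
            e[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
            x_hat = l1_minimize(A, A @ e)        # l1_minimize from the earlier sketch
            hits += np.allclose(x_hat, e, atol=1e-5)
        recovery_prob.append(hits / trials)
    # recovery_prob[k-1] estimates the probability of exact recovery at cardinality k.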

Introduction

Donoho and Tanner [2005] and Candès and Tao [2005] show that for certain classes of matrices, when the signal e is sparse enough, the solution of the ℓ1-minimization problem is also the sparsest solution to Ax = Ae. Writing k = Card(e), this happens even when k = O(m) asymptotically, which is provably optimal. They also obtain bounds on the reconstruction error outside of this range.

[Figure: phase transition, cardinality k/m vs. shape m/n.]

Introduction

Similar results exist for rank minimization. The ℓ1 norm is replaced by the trace norm on matrices. Exact recovery results are detailed in Recht et al. [2007], Candès and Recht [2008], ...

ℓ1 recovery

Diameter

Kashin and Temlyakov [2007]: a simple relationship between the diameter of a section of the ℓ1 ball and the size of signals recovered by ℓ1-minimization.

Proposition (Diameter & recovery threshold)
Given a coding matrix A ∈ R^{m×n}, suppose that there is some k > 0 such that

    sup_{Ax=0, ||x||_1 ≤ 1} ||x||_2 ≤ 1/√k        (1)

then sparse recovery x_LP = u is guaranteed if Card(u) ≤ k/4, and

    ||u − x_LP||_1 ≤ 4 min_{Card(y) ≤ k/16} ||u − y||_1

where x_LP solves the ℓ1-minimization LP and u is the true signal.

Diameter

Proof. [Kashin and Temlyakov, 2007] Suppose

    sup_{Ax=0, ||x||_1 ≤ 1} ||x||_2 ≤ k^{-1/2}.

Let u be the true signal, with Card(u) ≤ k/4. If x satisfies Ax = 0, then for any support set Λ with |Λ| ≤ k/4,

    Σ_{i∈Λ} |x_i| ≤ √|Λ| ||x||_2 ≤ √(|Λ|/k) ||x||_1 ≤ ||x||_1 / 2.

Now let Λ = supp(u) and let v ≠ u be such that x = v − u satisfies Ax = 0. Then

    ||v||_1 = Σ_{i∈Λ} |u_i + x_i| + Σ_{i∉Λ} |x_i|
            ≥ Σ_{i∈Λ} |u_i| − Σ_{i∈Λ} |x_i| + Σ_{i∉Λ} |x_i|
            = ||u||_1 + ||x||_1 − 2 Σ_{i∈Λ} |x_i|

and ||x||_1 − 2 Σ_{i∈Λ} |x_i| > 0 means that ||v||_1 > ||u||_1, so x_LP = u. The error bound follows from a similar argument.

Diameter, low M* estimate

Theorem (Low M* estimate)
Let E ⊂ R^n be a subspace of codimension k chosen uniformly at random w.r.t. the Haar measure on G_{n,n−k}. Then

    diam(K ∩ E) ≤ c √(n/k) M(K°),   where M(K°) = ∫_{S^{n−1}} ||x||_{K°} dσ(x),

with probability 1 − e^{−k}, where c is an absolute constant.

Proof. See [Pajor and Tomczak-Jaegermann, 1986] for example.

We have M(B_∞^n) ≍ √(log n / n) asymptotically. This means that random sections of the ℓ1 ball of dimension n − k have diameter bounded by

    diam(B_1^n ∩ E) ≤ c √(log(n)/k)

with high probability, where c is an absolute constant (a more precise analysis allows the log n term to be replaced by log(n/k)).
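As a quick numerical illustration (my own sketch, not on the slide), the quantity M(B_∞^n) = ∫_{S^{n−1}} ||x||_∞ dσ(x) driving this bound can be estimated by Monte Carlo and compared with the √(log n / n) rate:

    import numpy as np

    rng = np.random.default_rng(0)
    for n in (100, 1000, 5000):
        g = rng.standard_normal((1000, n))
        x = g / np.linalg.norm(g, axis=1, keepdims=True)     # uniform samples on S^{n-1}
        m_est = np.abs(x).max(axis=1).mean()                 # Monte Carlo estimate of M(B_inf^n)
        print(n, m_est, np.sqrt(2 * np.log(2 * n) / n))      # compare to a sqrt(log(n)/n) proxy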

Sections of the ℓ1 ball

Results guaranteeing near-optimal bounds on the diameter can be traced back to Kashin's and Dvoretzky's theorems.

Kashin decomposition [Kashin, 1977]. Given n = 2m, there exist two orthogonal m-dimensional subspaces E_1, E_2 ⊂ R^n such that

    (1/8) ||x||_2 ≤ (1/√n) ||x||_1 ≤ ||x||_2,   for all x ∈ E_1 ∪ E_2.

In fact, most m-dimensional subspaces satisfy this relationship.

Atomic norms

We can give another geometric view on the recovery of low complexity models. Once again, we focus on problem (2), namely

    minimize    ||x||_𝒜
    subject to  Ax = Ax_0

and we start with a construction from [Chandrasekaran et al., 2010] on a specific type of norm penalty, which induces simple representations in a generic setting.

Definition (Atomic norm)
Let 𝒜 ⊂ R^n be a set of atoms. Let ||·||_𝒜 be the gauge of 𝒜, i.e.

    ||x||_𝒜 = inf{ t > 0 : x ∈ t Co(𝒜) }.

The motivation for this definition is simple: if the centroid of 𝒜 is at the origin, we have

    ||x||_𝒜 = inf { Σ_{a∈𝒜} λ_a : x = Σ_{a∈𝒜} λ_a a, λ_a ≥ 0 }.

Atomic norms

Depending on 𝒜, atomic norms look very familiar.

- Suppose 𝒜 = {±e_i}_{i=1,...,n}, where the e_i form the Euclidean basis of R^n. Then Co(𝒜) is the ℓ1 ball and ||x||_𝒜 = ||x||_1.
- Suppose 𝒜 = {uv^T : u, v ∈ R^n, ||u||_2 = ||v||_2 = 1}. Then Co(𝒜) is the unit ball of the trace norm and ||X||_𝒜 = ||X||_* when X ∈ R^{n×n}.
- Suppose 𝒜 is the set of all orthogonal matrices of dimension n. Its convex hull is the unit ball of the spectral norm, and ||X||_𝒜 = ||X||_2 when X ∈ R^{n×n}.
- Suppose 𝒜 is the set of all permutations of the list {1, 2, ..., n}. Its convex hull is the permutahedron (it needs to be recentered) and ||x||_𝒜 is hard to compute (but can still be used as a penalty).
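For the first three atom sets the atomic norm is a standard norm that numpy can evaluate directly; a small numerical illustration (my own sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(5)
    X = rng.standard_normal((5, 5))
    s = np.linalg.svd(X, compute_uv=False)      # singular values of X

    l1_norm = np.abs(x).sum()                   # atoms {±e_i}: gauge is the l1 norm
    trace_norm = s.sum()                        # atoms {uv^T, unit u, v}: trace (nuclear) norm
    spectral_norm = s.max()                     # atoms O(n): spectral norm
    print(l1_norm, trace_norm, spectral_norm)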

Atomic norms

Suppose ||·||_𝒜 is an atomic norm, and focus on

    minimize    ||x||_𝒜        (2)
    subject to  Ax = Ax_0

Proposition (Optimality & recovery)
Write T_𝒜(x_0) = Cone{z − x_0 : ||z||_𝒜 ≤ ||x_0||_𝒜} for the tangent cone at x_0. Then x_0 is the unique optimal solution of (2) iff

    T_𝒜(x_0) ∩ N(A) = {0}.

Atomic norms

Perfect recovery of x_0 by minimizing the atomic norm ||x||_𝒜 occurs when the intersection of the subspace N(A) with the cone T_𝒜(x_0) reduces to the origin. When A is i.i.d. Gaussian with variance 1/m, the probability of the event T_𝒜(x_0) ∩ N(A) = {0} can be bounded explicitly.

Proposition [Gordon, 1988]
Let A ∈ R^{m×n} be i.i.d. Gaussian with A_{i,j} ~ N(0, 1/m), and let Ω = T_𝒜(x_0) ∩ S^{n−1} be the intersection of the cone T_𝒜(x_0) with the unit sphere. Then x_0 is the unique minimizer of (2) with probability 1 − exp(−(λ_m − ω(Ω))²/2) if m ≥ ω(Ω)² + 1, where

    ω(Ω) = E[ sup_{y∈Ω} y^T g ],   with g ~ N(0, I_n),   and   λ_m = √2 Γ((m+1)/2) / Γ(m/2).
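The constant λ_m is just the mean length of a standard Gaussian vector in R^m, so it is easy to sanity-check numerically (my own sketch; it grows like √m):

    import numpy as np
    from scipy.special import gammaln

    m = 30
    lam_exact = np.sqrt(2) * np.exp(gammaln((m + 1) / 2) - gammaln(m / 2))   # sqrt(2) Gamma((m+1)/2) / Gamma(m/2)
    g = np.random.default_rng(0).standard_normal((100000, m))
    print(lam_exact, np.linalg.norm(g, axis=1).mean(), np.sqrt(m))           # exact value, Monte Carlo, sqrt(m)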

Atomic norms

The previous result shows that to compute the recovery threshold (the number of samples m required to reconstruct the signal x_0), it suffices to estimate

    ω(Ω) = E[ sup_{y ∈ T_𝒜(x_0), ||y||_2 = 1} y^T g ].

This quantity can be computed for many atomic norms.

- Suppose x_0 is a k-sparse vector in R^p, so ||x||_𝒜 = ||x||_1. Then ω(Ω)² ≤ 2k log(p/k) + 5k/4.
- Suppose X_0 is a rank r matrix in R^{m_1×m_2}, so ||X||_𝒜 = ||X||_*. Then ω(Ω)² ≤ 3r(m_1 + m_2 − r).
- Suppose X_0 is an orthogonal matrix of dimension n, so ||X||_𝒜 = ||X||_2. Then ω(Ω)² ≤ (3n² − n)/4.
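Plugging these bounds into the condition m ≥ ω(Ω)² + 1 from the previous proposition gives rough sample-size estimates; a small numeric sketch (my own, illustrative values only):

    import numpy as np

    def samples_sparse(k, p):
        """Measurement bound for a k-sparse vector in R^p (l1 recovery)."""
        return int(np.ceil(2 * k * np.log(p / k) + 5 * k / 4 + 1))

    def samples_low_rank(r, m1, m2):
        """Measurement bound for a rank-r matrix in R^{m1 x m2} (trace norm recovery)."""
        return int(np.ceil(3 * r * (m1 + m2 - r) + 1))

    print(samples_sparse(5, 1000))        # k = 5 nonzeros in dimension p = 1000
    print(samples_low_rank(2, 100, 100))  # rank-2 matrix of size 100 x 100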

References

J. Bourgain, J. Lindenstrauss, and V. Milman. Minkowski sums and symmetrizations. Geometric Aspects of Functional Analysis, pages 44-66, 1988.

E. Candès and B. Recht. Simple bounds for low-complexity model reconstruction. arXiv preprint arXiv:1106.1474, 2011.

E. J. Candès and T. Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203-4215, 2005.

E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Preprint, 2008.

V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. arXiv preprint arXiv:1012.0621, 2010.

K. Christensen, L. Danon, T. Scanlon, and P. Bak. Unified scaling law for earthquakes, 2002.

D. L. Donoho and J. Tanner. Sparse nonnegative solutions of underdetermined linear equations by linear programming. Proceedings of the National Academy of Sciences, 102(27):9446-9451, 2005.

J. J. Fuchs. On sparse representations in arbitrary redundant bases. IEEE Transactions on Information Theory, 50(6):1341-1344, 2004.

A. Giannopoulos, V. D. Milman, and A. Tsolomitis. Asymptotic formulas for the diameter of sections of symmetric convex bodies. Journal of Functional Analysis, 223(1):86-108, 2005.

A. A. Giannopoulos and V. D. Milman. On the diameter of proportional sections of a symmetric convex body. International Mathematics Research Notices, (1):5-19, 1997.

Y. Gordon. On Milman's inequality and random subspaces which escape through a mesh in R^n. Geometric Aspects of Functional Analysis, pages 84-106, 1988.

B. Kashin. The widths of certain finite dimensional sets and classes of smooth functions. Izv. Akad. Nauk SSSR Ser. Mat., 41(2):334-351, 1977.

B. S. Kashin and V. N. Temlyakov. A remark on compressed sensing. Mathematical Notes, 82(5):748-755, 2007.

A. Pajor and N. Tomczak-Jaegermann. Subspaces of small codimension of finite-dimensional Banach spaces. Proceedings of the American Mathematical Society, 97(4):637-642, 1986.

G. Pandurangan, P. Raghavan, and E. Upfal. Using PageRank to characterize web structure. Internet Mathematics, 3(1):1-20, 2006.

B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. arXiv preprint arXiv:0706.4138, 2007.

R. Vershynin. Lectures in Geometric Functional Analysis. In preparation, 2011. URL http://www-personal.umich.edu/~romanv/papers/gfa-book/gfa-book.pdf.