Constrained optimization


Constrained optimization
DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis
http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html
Carlos Fernandez-Granda

Compressed sensing Convex constrained problems Analyzing optimization-based methods

Magnetic resonance imaging 2D DFT (magnitude) 2D DFT (log. of magnitude)

Magnetic resonance imaging
Data: samples from the spectrum
Problem: sampling is time consuming (annoying, kids move...)
Images are compressible (sparse in a wavelet basis)
Can we recover compressible signals from less data?

2D wavelet transform

2D wavelet transform

Full DFT matrix

Full DFT matrix

Regular subsampling

Regular subsampling

Random subsampling

Random subsampling

Toy example

Regular subsampling

Random subsampling

Linear inverse problems
Linear inverse problem: A x = y
Linear measurements, A ∈ R^{n×m}: y[i] = ⟨A_{i:}, x⟩, 1 ≤ i ≤ n
Aim: recover the signal x ∈ R^m from the data y ∈ R^n
We need n ≥ m, otherwise the problem is underdetermined
If n < m there are infinitely many solutions x + w, where w ∈ null(A)

Sparse recovery
Aim: recover a sparse x from linear measurements A x = y
When is the problem well posed?
There shouldn't be two distinct sparse vectors x_1 and x_2 such that A x_1 = A x_2

Spark
The spark of a matrix is the size of the smallest subset of columns that is linearly dependent
Let y := A x, where A ∈ R^{n×m}, y ∈ R^n and x ∈ R^m is a sparse vector with s nonzero entries
The vector x is the only vector with sparsity s consistent with the data, i.e. it is the solution of
  min_{x'} ‖x'‖_0 subject to A x' = y
for any choice of x, if and only if
  spark(A) > 2s

Proof
Equivalent statements:
- For any x, x is the only vector with sparsity s consistent with the data
- For any pair of s-sparse vectors x_1 and x_2, A (x_1 − x_2) ≠ 0
- For any pair of subsets of s indices T_1 and T_2, A_{T_1 ∪ T_2} α ≠ 0 for any nonzero α ∈ R^{|T_1 ∪ T_2|}
- All submatrices with at most 2s columns have no nonzero vectors in their null space
- All submatrices with at most 2s columns are full rank
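
A brute-force sketch (not from the slides) of the spark condition. It enumerates all column subsets, so it is only usable for tiny matrices; checking the spark is intractable in general, as noted later.

```python
# Sketch (not from the slides): brute-force computation of the spark of a tiny matrix.
# Checking the spark is NP-hard in general, so this only works for very small examples.
import itertools
import numpy as np

def spark(A, tol=1e-10):
    """Size of the smallest linearly dependent subset of columns of A."""
    num_cols = A.shape[1]
    for k in range(1, num_cols + 1):
        for cols in itertools.combinations(range(num_cols), k):
            # A subset of columns is linearly dependent iff the submatrix is rank deficient
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return num_cols + 1  # every subset of columns is linearly independent

A = np.random.randn(4, 6)  # generic 4 x 6 matrix: any 5 columns are dependent, so spark = 5
print(spark(A))            # recovery of any s-sparse vector is guaranteed when spark(A) > 2s
```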

Restricted-isometry property Robust version of spark If two s-sparse vectors x 1, x 2 are far, then A x 1, A x 2 should be far The linear operator should preserve distances (be an isometry) when restricted to act upon sparse vectors

Restricted-isometry property
A satisfies the restricted isometry property (RIP) with constant κ_s if for any s-sparse vector x
  (1 − κ_s) ‖x‖_2 ≤ ‖A x‖_2 ≤ (1 + κ_s) ‖x‖_2
If A satisfies the RIP for sparsity level 2s, then for any s-sparse x_1, x_2
  ‖y_1 − y_2‖_2 = ‖A (x_1 − x_2)‖_2 ≥ (1 − κ_{2s}) ‖x_1 − x_2‖_2

Regular subsampling

Regular subsampling

Correlation with column 20
[plot: correlation of column 20 with the other columns, regular subsampling]

Random subsampling

Random subsampling

Correlation with column 20
[plot: correlation of column 20 with the other columns, random subsampling]

Restricted-isometry property
Deterministic matrices tend not to satisfy the RIP
It is NP-hard to check whether the spark or RIP conditions hold
Random matrices satisfy the RIP with high probability
We prove it for Gaussian iid matrices; the ideas in the proof for random Fourier matrices are similar

Restricted-isometry property for Gaussian matrices
Let A ∈ R^{m×n} be a random matrix with iid standard Gaussian entries
(1/√m) A satisfies the RIP for a constant κ_s with probability 1 − C_2/n as long as
  m ≥ (C_1 s / κ_s²) log(n/s)
for two fixed constants C_1, C_2 > 0

Proof For a fixed support T of size s bounds follow from bounds on singular values of Gaussian matrices

Singular values of an m × s Gaussian matrix, s = 100
[plot: σ_i/√m for i = 1, ..., s and m/s ∈ {2, 5, 10, 20, 50, 100, 200}]

Singular values of an m × s Gaussian matrix, s = 1000
[plot: σ_i/√m for i = 1, ..., s and m/s ∈ {2, 5, 10, 20, 50, 100, 200}]

Proof
For a fixed submatrix the singular values are bounded by
  √m (1 − κ_s) ≤ σ_s ≤ σ_1 ≤ √m (1 + κ_s)
with probability at least
  1 − 2 (12/κ_s)^s exp(−m κ_s² / 32)
For any vector x with support T
  (1 − κ_s) ‖x‖_2 ≤ (1/√m) ‖A x‖_2 ≤ (1 + κ_s) ‖x‖_2
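
A small sketch (not from the slides) that reproduces the phenomenon in the plots above empirically: the extreme singular values of a tall iid Gaussian matrix, normalized by √m, concentrate around 1 as m/s grows, which is what the fixed-support bound relies on.

```python
# Sketch (not from the slides): extreme singular values of A/sqrt(m) for an m x s
# matrix A with iid standard Gaussian entries, for increasing aspect ratios m/s.
import numpy as np

rng = np.random.default_rng(0)
s = 100
for ratio in [2, 5, 10, 20, 50, 100, 200]:
    m = ratio * s
    A = rng.standard_normal((m, s))
    sv = np.linalg.svd(A / np.sqrt(m), compute_uv=False)
    print(f"m/s = {ratio:3d}: sigma_max = {sv[0]:.3f}, sigma_min = {sv[-1]:.3f}")
```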

Union bound
For any events S_1, S_2, ..., S_n in a probability space
  P(∪_i S_i) ≤ Σ_{i=1}^n P(S_i)

Proof
Number of different supports of size s:
  (n choose s) ≤ (e n / s)^s
By the union bound,
  (1 − κ_s) ‖x‖_2 ≤ (1/√m) ‖A x‖_2 ≤ (1 + κ_s) ‖x‖_2
holds for any s-sparse vector x with probability at least
  1 − (e n / s)^s · 2 (12/κ_s)^s exp(−m κ_s² / 32)
  = 1 − exp( log 2 + s + s log(n/s) + s log(12/κ_s) − m κ_s² / 32 )
  ≥ 1 − C_2 / n
as long as
  m ≥ (C_1 s / κ_s²) log(n/s)

Sparse recovery via ℓ1-norm minimization
"ℓ0-norm" minimization is intractable
(As usual) we can minimize the ℓ1 norm instead: the estimate x_ℓ1 is the solution of
  min_x ‖x‖_1 subject to A x = y

Minimum ℓ2-norm solution (regular subsampling)
[plot: true signal vs. minimum ℓ2-norm solution]

Minimum ℓ1-norm solution (regular subsampling)
[plot: true signal vs. minimum ℓ1-norm solution]

Minimum ℓ2-norm solution (random subsampling)
[plot: true signal vs. minimum ℓ2-norm solution]

Minimum ℓ1-norm solution (random subsampling)
[plot: true signal vs. minimum ℓ1-norm solution]
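
A sketch of the comparison in the plots above. It assumes the cvxpy package and uses Gaussian measurements rather than the subsampled DFT of the figures; the dimensions and sparsity level are illustrative choices.

```python
# Sketch (assumes cvxpy; Gaussian measurements stand in for the subsampled DFT):
# minimum l2-norm vs minimum l1-norm recovery of a sparse signal.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 25, 64, 3                    # n measurements, dimension m, sparsity s
A = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

x_l2 = np.linalg.pinv(A) @ y           # minimum l2-norm solution (closed form)

x = cp.Variable(m)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
x_l1 = x.value                         # minimum l1-norm solution

print("l2 estimate error:", np.linalg.norm(x_l2 - x_true))  # typically large, not sparse
print("l1 estimate error:", np.linalg.norm(x_l1 - x_true))  # typically ~0 (exact recovery)
```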

Geometric intuition
[figure: ℓ2-norm ball vs. ℓ1-norm ball]

Sparse recovery via ℓ1-norm minimization
If the signal is sparse in a transform domain W, then
  min_c ‖c‖_1 subject to A W c = y
If we want to recover the original c then A W should satisfy the RIP
However, we might be fine with any c' such that W c' = x

Regular subsampling

Minimum ℓ2-norm solution (regular subsampling)

Minimum ℓ1-norm solution (regular subsampling)

Random subsampling

Minimum ℓ2-norm solution (random subsampling)

Minimum ℓ1-norm solution (random subsampling)

Compressed sensing Convex constrained problems Analyzing optimization-based methods

Convex sets
A convex set S is any set such that for any x, y ∈ S and θ ∈ (0, 1)
  θ x + (1 − θ) y ∈ S
The intersection of convex sets is convex

Convex vs nonconvex Nonconvex Convex

Epigraph
[figure: the epigraph epi(f) of a function f]
A function is convex if and only if its epigraph is convex

Projection onto a convex set
The projection of any vector x onto a non-empty closed convex set S exists and is unique:
  P_S(x) := arg min_{y ∈ S} ‖x − y‖_2
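
A small sketch (not from the slides) of the projection operator for two simple closed convex sets with closed-form projections; the helper names are illustrative.

```python
# Sketch (not from the slides): closed-form projections onto two simple non-empty
# closed convex sets, illustrating the operator P_S defined above.
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Projection onto { y : ||y||_2 <= radius }."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def project_halfspace(x, a, b):
    """Projection onto { y : a^T y <= b }."""
    slack = a @ x - b
    return x if slack <= 0 else x - (slack / (a @ a)) * a

x = np.array([3.0, 4.0])
print(project_l2_ball(x))                                # [0.6 0.8]
print(project_halfspace(x, np.array([1.0, 0.0]), 1.0))   # [1. 4.]
```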

Proof
Assume there are two distinct projections y_1 ≠ y_2
Consider y := (y_1 + y_2)/2, which belongs to S (why?)

Proof
  ⟨x − y, y_1 − y⟩ = ⟨x − (y_1 + y_2)/2, y_1 − (y_1 + y_2)/2⟩
                   = ⟨(x − y_1)/2 + (x − y_2)/2, (x − y_2)/2 − (x − y_1)/2⟩
                   = (1/4) (‖x − y_2‖_2² − ‖x − y_1‖_2²)
                   = 0
since both distances equal the minimum distance from x to S
By Pythagoras' theorem
  ‖x − y_1‖_2² = ‖x − y‖_2² + ‖y_1 − y‖_2²
              = ‖x − y‖_2² + ‖(y_1 − y_2)/2‖_2²
              > ‖x − y‖_2²
which contradicts y_1 being a projection

Convex combination
Given n vectors x_1, x_2, ..., x_n ∈ R^n,
  x := Σ_{i=1}^n θ_i x_i
is a convex combination of x_1, x_2, ..., x_n if
  θ_i ≥ 0, 1 ≤ i ≤ n, and Σ_{i=1}^n θ_i = 1

Convex hull
The convex hull of S is the set of convex combinations of points in S
The ℓ1-norm ball is the convex hull of the intersection between the "ℓ0-norm" ball and the ℓ∞-norm ball

ℓ1-norm ball

B_ℓ1 ⊆ C(B_ℓ0 ∩ B_ℓ∞)
Let x ∈ B_ℓ1
Set θ_i := |x[i]| for 1 ≤ i ≤ n and θ_0 := 1 − Σ_{i=1}^n θ_i
Σ_{i=0}^n θ_i = 1 by construction, θ_i ≥ 0, and
  θ_0 = 1 − Σ_{i=1}^n θ_i = 1 − ‖x‖_1 ≥ 0 because x ∈ B_ℓ1
x ∈ C(B_ℓ0 ∩ B_ℓ∞) because
  x = Σ_{i=1}^n θ_i sign(x[i]) e_i + θ_0 · 0
is a convex combination of points in B_ℓ0 ∩ B_ℓ∞

C(B_ℓ0 ∩ B_ℓ∞) ⊆ B_ℓ1
Let x ∈ C(B_ℓ0 ∩ B_ℓ∞), then x = Σ_{i=1}^m θ_i y_i
  ‖x‖_1 ≤ Σ_{i=1}^m θ_i ‖y_i‖_1   (by the triangle inequality)
        ≤ Σ_{i=1}^m θ_i ‖y_i‖_∞   (each y_i has only one nonzero entry)
        ≤ Σ_{i=1}^m θ_i
        = 1

Convex optimization problem
f_0, f_1, ..., f_m, h_1, ..., h_p : R^n → R
  minimize   f_0(x)
  subject to f_i(x) ≤ 0,  1 ≤ i ≤ m,
             h_i(x) = 0,  1 ≤ i ≤ p

Definitions
A feasible vector is a vector that satisfies all the constraints
A solution is any feasible vector x* such that f_0(x*) ≤ f_0(x) for all feasible vectors x
If a solution exists, f_0(x*) is the optimal value or optimum of the problem

Convex optimization problem
The optimization problem is convex if
- f_0 is convex
- f_1, ..., f_m are convex
- h_1, ..., h_p are affine, i.e. h_i(x) = a_i^T x + b_i for some a_i ∈ R^n and b_i ∈ R

Linear program
  minimize   a^T x
  subject to c_i^T x ≤ d_i,  1 ≤ i ≤ m
             A x = b

ℓ1-norm minimization as an LP
The optimization problem
  minimize   ‖x‖_1
  subject to A x = b
can be recast as the LP
  minimize   Σ_{i=1}^m t[i]
  subject to t[i] ≥ e_i^T x
             t[i] ≥ −e_i^T x
             A x = b
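
A sketch (not from the slides) of this LP reformulation solved with a generic LP solver; the Gaussian test matrix and dimensions are illustrative assumptions.

```python
# Sketch (not from the slides): the LP reformulation above solved with
# scipy.optimize.linprog over the stacked variable z = [x; t].
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_meas, m = 15, 40
A = rng.standard_normal((n_meas, m))
x_true = np.zeros(m)
x_true[rng.choice(m, 3, replace=False)] = rng.standard_normal(3)
b = A @ x_true

c = np.concatenate([np.zeros(m), np.ones(m)])           # objective: sum_i t[i]
A_ub = np.block([[np.eye(m), -np.eye(m)],               #  x - t <= 0, i.e. t >= x
                 [-np.eye(m), -np.eye(m)]])             # -x - t <= 0, i.e. t >= -x
b_ub = np.zeros(2 * m)
A_eq = np.hstack([A, np.zeros((n_meas, m))])            # A x = b
bounds = [(None, None)] * m + [(0, None)] * m           # x free, t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=bounds, method="highs")
x_l1 = res.x[:m]
print(np.linalg.norm(x_l1 - x_true))                    # ~0 when recovery succeeds
```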

Proof
Solution to the ℓ1-norm min. problem: x_ℓ1
Solution to the linear program: (x_lp, t_lp)
Set t_ℓ1[i] := |x_ℓ1[i]|, so (x_ℓ1, t_ℓ1) is feasible for the linear program
  ‖x_ℓ1‖_1 = Σ_{i=1}^m t_ℓ1[i]
           ≥ Σ_{i=1}^m t_lp[i]   (by optimality of t_lp)
           ≥ ‖x_lp‖_1            (since t_lp[i] ≥ |x_lp[i]|)
so x_lp is a solution to the ℓ1-norm min. problem

Proof
Set t_ℓ1[i] := |x_ℓ1[i]|
  Σ_{i=1}^m t_ℓ1[i] = ‖x_ℓ1‖_1
                    ≤ ‖x_lp‖_1   (by optimality of x_ℓ1)
                    ≤ Σ_{i=1}^m t_lp[i]
so (x_ℓ1, t_ℓ1) is a solution to the linear program

Quadratic program
For a positive semidefinite matrix Q ∈ R^{n×n}
  minimize   x^T Q x + a^T x
  subject to c_i^T x ≤ d_i,  1 ≤ i ≤ m,
             A x = b

ℓ1-norm regularized least squares as a QP
The optimization problem
  minimize ‖A x − y‖_2² + α ‖x‖_1
can be recast as the QP
  minimize   x^T A^T A x − 2 y^T A x + α Σ_{i=1}^n t[i]
  subject to t[i] ≥ e_i^T x
             t[i] ≥ −e_i^T x
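
A sketch (assumes the cvxpy package; random data and the value of α are illustrative) checking that the regularized problem and its QP reformulation with auxiliary variables t[i] ≥ |x[i]| produce the same minimizer.

```python
# Sketch (assumes cvxpy): the l1-regularized least-squares problem written directly
# and via the QP reformulation with auxiliary variables t; both give the same minimizer.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, alpha = 20, 50, 0.5
A = rng.standard_normal((n, m))
y = rng.standard_normal(n)

x1 = cp.Variable(m)
cp.Problem(cp.Minimize(cp.sum_squares(A @ x1 - y) + alpha * cp.norm1(x1))).solve()

x2, t = cp.Variable(m), cp.Variable(m)
constraints = [t >= x2, t >= -x2]
cp.Problem(cp.Minimize(cp.sum_squares(A @ x2 - y) + alpha * cp.sum(t)), constraints).solve()

print(np.max(np.abs(x1.value - x2.value)))  # ~0: the two formulations agree
```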

Lagrangian
The Lagrangian of a canonical optimization problem is
  L(x, α, ν) := f_0(x) + Σ_{i=1}^m α[i] f_i(x) + Σ_{j=1}^p ν[j] h_j(x)
α ∈ R^m, ν ∈ R^p are called Lagrange multipliers or dual variables
If x is feasible and α[i] ≥ 0 for 1 ≤ i ≤ m,
  L(x, α, ν) ≤ f_0(x)

Lagrange dual function
The Lagrange dual function of the problem is
  l(α, ν) := inf_{x ∈ R^n} f_0(x) + Σ_{i=1}^m α[i] f_i(x) + Σ_{j=1}^p ν[j] h_j(x)
Let p* be an optimum of the optimization problem; as long as α[i] ≥ 0 for 1 ≤ i ≤ m,
  l(α, ν) ≤ p*

Dual problem
The dual problem of the (primal) optimization problem is
  maximize   l(α, ν)
  subject to α[i] ≥ 0,  1 ≤ i ≤ m
The dual problem is always convex, even if the primal isn't: the dual function is a pointwise infimum of affine functions of (α, ν), hence concave

Maximum/supremum of convex functions
The pointwise maximum of m convex functions f_1, ..., f_m is convex:
  f_max(x) := max_{1 ≤ i ≤ m} f_i(x)
The pointwise supremum of a family of convex functions indexed by a set I is convex:
  f_sup(x) := sup_{i ∈ I} f_i(x)

Proof
For any 0 ≤ θ ≤ 1 and any x, y,
  f_sup(θ x + (1 − θ) y) = sup_{i ∈ I} f_i(θ x + (1 − θ) y)
    ≤ sup_{i ∈ I} θ f_i(x) + (1 − θ) f_i(y)   (by convexity of the f_i)
    ≤ θ sup_{i ∈ I} f_i(x) + (1 − θ) sup_{j ∈ I} f_j(y)
    = θ f_sup(x) + (1 − θ) f_sup(y)

Weak duality
If p* is a primal optimum and d* a dual optimum, then d* ≤ p*

Strong duality
For convex problems d* = p* under very weak conditions
LPs: the primal optimum is finite
General convex programs (Slater's condition): there exists a point that is strictly feasible,
  f_i(x) < 0 for 1 ≤ i ≤ m

ℓ1-norm minimization
The dual problem of
  min_x ‖x‖_1 subject to A x = y
is
  max_ν y^T ν subject to ‖A^T ν‖_∞ ≤ 1

Proof
Lagrangian:
  L(x, ν) = ‖x‖_1 + ν^T (y − A x)
Lagrange dual function:
  l(ν) := inf_{x ∈ R^n} ‖x‖_1 − (A^T ν)^T x + ν^T y
If |(A^T ν)[i]| > 1? We can send x[i] → ±∞ and l(ν) → −∞
If ‖A^T ν‖_∞ ≤ 1? Then
  (A^T ν)^T x ≤ ‖A^T ν‖_∞ ‖x‖_1 ≤ ‖x‖_1
so the infimum is attained at x = 0 and l(ν) = ν^T y

Strong duality
The solution ν* to
  max_ν y^T ν subject to ‖A^T ν‖_∞ ≤ 1
satisfies
  (A^T ν*)[i] = sign(x*[i]) for all x*[i] ≠ 0
for all solutions x* to the primal problem
  min_x ‖x‖_1 subject to A x = y

Dual solution
[plot: sign of the true signal vs. A^T ν* for the minimum-ℓ1 dual solution ν*]

Proof
By strong duality
  ‖x*‖_1 = y^T ν* = (A x*)^T ν* = (x*)^T (A^T ν*) = Σ_{i=1}^m (A^T ν*)[i] x*[i]
By Hölder's inequality
  Σ_{i=1}^m (A^T ν*)[i] x*[i] ≤ ‖A^T ν*‖_∞ ‖x*‖_1 ≤ ‖x*‖_1
with equality if and only if (A^T ν*)[i] = sign(x*[i]) for all x*[i] ≠ 0
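
A sketch (assumes the cvxpy package; the random Gaussian data is an illustrative choice) that solves the primal and dual problems separately and checks strong duality and the sign condition above.

```python
# Sketch (assumes cvxpy): l1-minimization primal and its dual; strong duality holds
# and (A^T nu*)[i] = sign(x*[i]) on the support of the primal solution.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 25, 60, 4
A = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

x = cp.Variable(m)                     # primal: min ||x||_1 s.t. A x = y
primal = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
primal.solve()

nu = cp.Variable(n)                    # dual: max y^T nu s.t. ||A^T nu||_inf <= 1
dual = cp.Problem(cp.Maximize(y @ nu), [cp.norm(A.T @ nu, "inf") <= 1])
dual.solve()

print("p* - d* =", primal.value - dual.value)   # ~0 by strong duality
support = np.abs(x.value) > 1e-6
print(np.sign(x.value[support]))                # signs of the solution on its support
print((A.T @ nu.value)[support])                # should match those signs
```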

Another algorithm for sparse recovery
Aim: find the nonzero locations of a sparse vector x from y = A x
Insight: we have access to the inner product of x with A^T w for any w:
  y^T w = (A x)^T w = x^T (A^T w)
Idea: maximize y^T w = x^T (A^T w) while bounding the magnitude of the entries of A^T w by 1
The entries of A^T w where x is nonzero should saturate to 1 or −1

Compressed sensing Convex constrained problems Analyzing optimization-based methods

Analyzing optimization-based methods Best case scenario: Primal solution has closed form Otherwise: Use dual solution to characterize primal solution

Minimum ℓ2-norm solution
Let A ∈ R^{m×n} be a full rank matrix with m < n
For any y ∈ R^m the solution to the optimization problem
  arg min_x ‖x‖_2 subject to A x = y
is
  x_ℓ2 := V S^{-1} U^T y = A^T (A A^T)^{-1} y
where A = U S V^T is the SVD of A

Proof x = P row(a) x + P row(a) x Since A is full rank V, P row(a) x = V c for some vector c R n A x = AP row(a) x

Proof x = P row(a) x + P row(a) x Since A is full rank V, P row(a) x = V c for some vector c R n A x = AP row(a) x = USV T V c

Proof x = P row(a) x + P row(a) x Since A is full rank V, P row(a) x = V c for some vector c R n A x = AP row(a) x = USV T V c = US c

Proof x = P row(a) x + P row(a) x Since A is full rank V, P row(a) x = V c for some vector c R n A x = AP row(a) x = USV T V c = US c A x = y is equivalent to US c = y and c = S 1 U T y

Proof For all feasible vectors x P row(a) x = VS 1 U T y By Pythagoras theorem, minimizing x 2 is equivalent to minimizing x 2 2 = Prow(A) x 2 2 + Prow(A) x 2 2

Regular subsampling

Minimum ℓ2-norm solution (regular subsampling)
[plot: true signal vs. minimum ℓ2-norm solution]

Regular subsampling
  A := (1/√2) [ F_{m/2}  F_{m/2} ]
where F_{m/2} is unitary, i.e. F_{m/2} F_{m/2}^* = I and F_{m/2}^* F_{m/2} = I, and
  x := [ x_up ; x_down ]

Regular subsampling
  x_ℓ2 = arg min_{A x = y} ‖x‖_2
       = A^* (A A^*)^{-1} y
       = (1/√2) [ F_{m/2}^* ; F_{m/2}^* ] ( (1/2)(F_{m/2} F_{m/2}^* + F_{m/2} F_{m/2}^*) )^{-1} (1/√2) [ F_{m/2}  F_{m/2} ] [ x_up ; x_down ]
       = (1/2) [ F_{m/2}^* ; F_{m/2}^* ] (F_{m/2} x_up + F_{m/2} x_down)
       = (1/2) [ F_{m/2}^* F_{m/2} (x_up + x_down) ; F_{m/2}^* F_{m/2} (x_up + x_down) ]
       = (1/2) [ x_up + x_down ; x_up + x_down ]
Each half of the estimate is the average of the two halves of the signal (aliasing)

Minimum ℓ1-norm solution
Problem: arg min_{A x = y} ‖x‖_1 doesn't have a closed form
Instead we can use a dual variable to certify optimality

Dual solution
The solution ν* to
  max_ν y^T ν subject to ‖A^T ν‖_∞ ≤ 1
satisfies
  (A^T ν*)[i] = sign(x*[i]) for all x*[i] ≠ 0
where x* is a solution to the primal problem
  min_x ‖x‖_1 subject to A x = y

Dual certificate
If there exists a vector ν ∈ R^n such that
  (A^T ν)[i] = sign(x*[i])   if x*[i] ≠ 0
  |(A^T ν)[i]| < 1           if x*[i] = 0
then x* is the unique solution to the primal problem
  min_x ‖x‖_1 subject to A x = y
as long as the submatrix A_T (the columns of A indexed by the support T of x*) is full rank

Proof 1
ν is feasible for the dual problem, so for any primal feasible x
  ‖x‖_1 ≥ y^T ν
        = (A x*)^T ν
        = (x*)^T (A^T ν)
        = Σ_{i ∈ T} x*[i] sign(x*[i])
        = ‖x*‖_1
so x* must be a solution

Proof 2
A^T ν is a subgradient of the ℓ1 norm at x*
For any other feasible vector x
  ‖x‖_1 ≥ ‖x*‖_1 + (A^T ν)^T (x − x*)
        = ‖x*‖_1 + ν^T (A x − A x*)
        = ‖x*‖_1

Random subsampling

Minimum ℓ1-norm solution (random subsampling)
[plot: true signal vs. minimum ℓ1-norm solution]

Exact sparse recovery via ℓ1-norm minimization
Assumption: there exists a signal x ∈ R^n with s nonzeros such that A x = y for a random A ∈ R^{m×n} (random Fourier, Gaussian iid, ...)
Exact recovery: if the number of measurements satisfies
  m ≥ C s log n
the solution of the problem
  minimize ‖x‖_1 subject to A x = y
is the original signal with probability at least 1 − 1/n

Proof
Show that a dual certificate always exists
We need
  A_T^T ν = sign(x_T)   (s constraints)
  ‖A_{T^c}^T ν‖_∞ < 1
Idea: impose A_T^T ν = sign(x_T) and minimize ‖A_{T^c}^T ν‖_∞
Problem: no closed-form solution
How about minimizing the ℓ2 norm instead?

Proof of exact recovery
Prove that a dual certificate exists for any s-sparse x
Dual certificate candidate: solution of
  minimize ‖v‖_2 subject to A_T^T v = sign(x_T)
Closed-form solution:
  ν_ℓ2 := A_T (A_T^T A_T)^{-1} sign(x_T)
A_T^T A_T is invertible with high probability
We need to prove that A^T ν_ℓ2 satisfies ‖(A^T ν_ℓ2)_{T^c}‖_∞ < 1
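
A sketch (not from the slides) that constructs the candidate certificate ν_ℓ2 numerically for a random Gaussian A and a random support, and checks the two conditions; the dimensions and sparsity level are illustrative assumptions.

```python
# Sketch (not from the slides): build nu_l2 = A_T (A_T^T A_T)^{-1} sign(x_T) and
# check that A^T nu_l2 equals the sign pattern on T and stays below 1 off T.
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 100, 400, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
T = rng.choice(n, s, replace=False)
sign_T = rng.choice([-1.0, 1.0], s)

A_T = A[:, T]
nu = A_T @ np.linalg.solve(A_T.T @ A_T, sign_T)   # min l2-norm solution of A_T^T v = sign_T

corr = A.T @ nu
print(np.allclose(corr[T], sign_T))                # matches the sign pattern on T
off_support = np.setdiff1d(np.arange(n), T)
print(np.max(np.abs(corr[off_support])))           # should be strictly below 1
```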

Dual certificate
[plot: sign pattern vs. the dual function A^T ν_ℓ2]

Proof of exact recovery
To control ‖(A^T ν_ℓ2)_{T^c}‖_∞, we need to bound, for each i ∈ T^c,
  |A_i^T ν_ℓ2| = |A_i^T A_T (A_T^T A_T)^{-1} sign(x_T)|
Let w := (A_T^T A_T)^{-1} sign(x_T)
For i ∈ T^c the column A_i is independent of A_T w, so A_i^T A_T w can be bounded using independence
The result then follows from the union bound