Constrained optimization


Constrained optimization
DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis
http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html
Carlos Fernandez-Granda

Compressed sensing Convex constrained problems Analyzing optimization-based methods

Magnetic resonance imaging 2D DFT (magnitude) 2D DFT (log. of magnitude)

Magnetic resonance imaging
Data: samples from the spectrum
Problem: sampling is time consuming (annoying, kids move...)
Images are compressible (sparse in a wavelet basis)
Can we recover compressible signals from less data?

2D wavelet transform

2D wavelet transform

Full DFT matrix

Full DFT matrix

Regular subsampling

Regular subsampling

Random subsampling

Random subsampling

Toy example

Regular subsampling

Random subsampling

Linear inverse problems
Linear inverse problem: A x = y
Linear measurements, A ∈ R^{n×m}: y[i] = ⟨A_{i:}, x⟩, 1 ≤ i ≤ n
Aim: recover the signal x ∈ R^m from the data y ∈ R^n
We need n ≥ m, otherwise the problem is underdetermined
If n < m there are infinitely many solutions x + w, where w ∈ null(A)

Sparse recovery
Aim: recover a sparse x from linear measurements A x = y
When is the problem well posed?
There shouldn't be two distinct sparse vectors x_1 and x_2 such that A x_1 = A x_2

Spark
The spark of a matrix is the size of the smallest subset of columns that is linearly dependent
Let y := A x, where A ∈ R^{n×m}, y ∈ R^n and x ∈ R^m is a sparse vector with s nonzero entries
The vector x is the only vector with sparsity s consistent with the data, i.e. it is the solution of
  min_{x'} ‖x'‖_0 subject to A x' = y
for any choice of x, if and only if
  spark(A) > 2s

Proof
Equivalent statements:
- For any x, x is the only vector with sparsity s consistent with the data
- For any pair of s-sparse vectors x_1 and x_2, A (x_1 − x_2) ≠ 0
- For any pair of subsets of s indices T_1 and T_2, A_{T_1 ∪ T_2} α ≠ 0 for any nonzero α ∈ R^{|T_1 ∪ T_2|}
- All submatrices with at most 2s columns have no nonzero vectors in their null space
- All submatrices with at most 2s columns are full rank
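
A brute-force sketch (not from the slides) of the spark condition. It enumerates all column subsets, so it is only usable for tiny matrices; checking the spark is intractable in general, as noted later.

```python
# Sketch (not from the slides): brute-force computation of the spark of a tiny matrix.
# Checking the spark is NP-hard in general, so this only works for very small examples.
import itertools
import numpy as np

def spark(A, tol=1e-10):
    """Size of the smallest linearly dependent subset of columns of A."""
    num_cols = A.shape[1]
    for k in range(1, num_cols + 1):
        for cols in itertools.combinations(range(num_cols), k):
            # A subset of columns is linearly dependent iff the submatrix is rank deficient
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return num_cols + 1  # every subset of columns is linearly independent

A = np.random.randn(4, 6)  # generic 4 x 6 matrix: any 5 columns are dependent, so spark = 5
print(spark(A))            # recovery of any s-sparse vector is guaranteed when spark(A) > 2s
```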

Restricted-isometry property Robust version of spark If two s-sparse vectors x 1, x 2 are far, then A x 1, A x 2 should be far The linear operator should preserve distances (be an isometry) when restricted to act upon sparse vectors

Restricted-isometry property
A satisfies the restricted isometry property (RIP) with constant κ_s if for any s-sparse vector x
  (1 − κ_s) ‖x‖_2 ≤ ‖A x‖_2 ≤ (1 + κ_s) ‖x‖_2
If A satisfies the RIP for sparsity level 2s, then for any s-sparse x_1, x_2
  ‖y_1 − y_2‖_2 = ‖A (x_1 − x_2)‖_2 ≥ (1 − κ_{2s}) ‖x_1 − x_2‖_2

Regular subsampling

Regular subsampling

Correlation with column 20
[plot: correlation of column 20 with the other columns, regular subsampling]

Random subsampling

Random subsampling

Correlation with column 20
[plot: correlation of column 20 with the other columns, random subsampling]

Restricted-isometry property
Deterministic matrices tend not to satisfy the RIP
It is NP-hard to check whether the spark or RIP conditions hold
Random matrices satisfy the RIP with high probability
We prove it for Gaussian iid matrices; the ideas in the proof for random Fourier matrices are similar

Restricted-isometry property for Gaussian matrices
Let A ∈ R^{m×n} be a random matrix with iid standard Gaussian entries
(1/√m) A satisfies the RIP for a constant κ_s with probability 1 − C_2/n as long as
  m ≥ (C_1 s / κ_s²) log(n/s)
for two fixed constants C_1, C_2 > 0

Proof For a fixed support T of size s bounds follow from bounds on singular values of Gaussian matrices

Singular values of an m × s Gaussian matrix, s = 100
[plot: σ_i/√m for i = 1, ..., s and m/s ∈ {2, 5, 10, 20, 50, 100, 200}]

Singular values of an m × s Gaussian matrix, s = 1000
[plot: σ_i/√m for i = 1, ..., s and m/s ∈ {2, 5, 10, 20, 50, 100, 200}]

Proof
For a fixed submatrix the singular values are bounded by
  √m (1 − κ_s) ≤ σ_s ≤ σ_1 ≤ √m (1 + κ_s)
with probability at least
  1 − 2 (12/κ_s)^s exp(−m κ_s² / 32)
For any vector x with support T
  (1 − κ_s) ‖x‖_2 ≤ (1/√m) ‖A x‖_2 ≤ (1 + κ_s) ‖x‖_2
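
A small sketch (not from the slides) that reproduces the phenomenon in the plots above empirically: the extreme singular values of a tall iid Gaussian matrix, normalized by √m, concentrate around 1 as m/s grows, which is what the fixed-support bound relies on.

```python
# Sketch (not from the slides): extreme singular values of A/sqrt(m) for an m x s
# matrix A with iid standard Gaussian entries, for increasing aspect ratios m/s.
import numpy as np

rng = np.random.default_rng(0)
s = 100
for ratio in [2, 5, 10, 20, 50, 100, 200]:
    m = ratio * s
    A = rng.standard_normal((m, s))
    sv = np.linalg.svd(A / np.sqrt(m), compute_uv=False)
    print(f"m/s = {ratio:3d}: sigma_max = {sv[0]:.3f}, sigma_min = {sv[-1]:.3f}")
```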

Union bound
For any events S_1, S_2, ..., S_n in a probability space
  P(∪_i S_i) ≤ Σ_{i=1}^n P(S_i)

Proof
Number of different supports of size s:
  (n choose s) ≤ (e n / s)^s
By the union bound,
  (1 − κ_s) ‖x‖_2 ≤ (1/√m) ‖A x‖_2 ≤ (1 + κ_s) ‖x‖_2
holds for any s-sparse vector x with probability at least
  1 − (e n / s)^s · 2 (12/κ_s)^s exp(−m κ_s² / 32)
  = 1 − exp( log 2 + s + s log(n/s) + s log(12/κ_s) − m κ_s² / 32 )
  ≥ 1 − C_2 / n
as long as
  m ≥ (C_1 s / κ_s²) log(n/s)

Sparse recovery via ℓ1-norm minimization
"ℓ0-norm" minimization is intractable
(As usual) we can minimize the ℓ1 norm instead: the estimate x_ℓ1 is the solution of
  min_x ‖x‖_1 subject to A x = y

Minimum ℓ2-norm solution (regular subsampling)
[plot: true signal vs. minimum ℓ2-norm solution]

Minimum ℓ1-norm solution (regular subsampling)
[plot: true signal vs. minimum ℓ1-norm solution]

Minimum ℓ2-norm solution (random subsampling)
[plot: true signal vs. minimum ℓ2-norm solution]

Minimum ℓ1-norm solution (random subsampling)
[plot: true signal vs. minimum ℓ1-norm solution]
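
A sketch of the comparison in the plots above. It assumes the cvxpy package and uses Gaussian measurements rather than the subsampled DFT of the figures; the dimensions and sparsity level are illustrative choices.

```python
# Sketch (assumes cvxpy; Gaussian measurements stand in for the subsampled DFT):
# minimum l2-norm vs minimum l1-norm recovery of a sparse signal.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 25, 64, 3                    # n measurements, dimension m, sparsity s
A = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

x_l2 = np.linalg.pinv(A) @ y           # minimum l2-norm solution (closed form)

x = cp.Variable(m)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
x_l1 = x.value                         # minimum l1-norm solution

print("l2 estimate error:", np.linalg.norm(x_l2 - x_true))  # typically large, not sparse
print("l1 estimate error:", np.linalg.norm(x_l1 - x_true))  # typically ~0 (exact recovery)
```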

Geometric intuition
[figure: ℓ2-norm ball vs. ℓ1-norm ball]

Sparse recovery via ℓ1-norm minimization
If the signal is sparse in a transform domain W, then
  min_c ‖c‖_1 subject to A W c = y
If we want to recover the original c then A W should satisfy the RIP
However, we might be fine with any c' such that W c' = x

Regular subsampling

Minimum ℓ2-norm solution (regular subsampling)

Minimum ℓ1-norm solution (regular subsampling)

Random subsampling

Minimum ℓ2-norm solution (random subsampling)

Minimum ℓ1-norm solution (random subsampling)

Compressed sensing Convex constrained problems Analyzing optimization-based methods

Convex sets
A convex set S is any set such that for any x, y ∈ S and θ ∈ (0, 1)
  θ x + (1 − θ) y ∈ S
The intersection of convex sets is convex

Convex vs nonconvex Nonconvex Convex

Epigraph
[figure: the epigraph epi(f) of a function f]
A function is convex if and only if its epigraph is convex

Projection onto a convex set
The projection of any vector x onto a non-empty closed convex set S exists and is unique:
  P_S(x) := arg min_{y ∈ S} ‖x − y‖_2
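
A small sketch (not from the slides) of the projection operator for two simple closed convex sets with closed-form projections; the helper names are illustrative.

```python
# Sketch (not from the slides): closed-form projections onto two simple non-empty
# closed convex sets, illustrating the operator P_S defined above.
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Projection onto { y : ||y||_2 <= radius }."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def project_halfspace(x, a, b):
    """Projection onto { y : a^T y <= b }."""
    slack = a @ x - b
    return x if slack <= 0 else x - (slack / (a @ a)) * a

x = np.array([3.0, 4.0])
print(project_l2_ball(x))                                # [0.6 0.8]
print(project_halfspace(x, np.array([1.0, 0.0]), 1.0))   # [1. 4.]
```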

Proof
Assume there are two distinct projections y_1 ≠ y_2
Consider y := (y_1 + y_2)/2, which belongs to S (why?)

Proof
  ⟨x − y, y_1 − y⟩ = ⟨x − (y_1 + y_2)/2, y_1 − (y_1 + y_2)/2⟩
                   = ⟨(x − y_1)/2 + (x − y_2)/2, (x − y_2)/2 − (x − y_1)/2⟩
                   = (1/4) (‖x − y_2‖_2² − ‖x − y_1‖_2²)
                   = 0
since both distances equal the minimum distance from x to S
By Pythagoras' theorem
  ‖x − y_1‖_2² = ‖x − y‖_2² + ‖y_1 − y‖_2²
              = ‖x − y‖_2² + ‖(y_1 − y_2)/2‖_2²
              > ‖x − y‖_2²
which contradicts y_1 being a projection

Convex combination
Given n vectors x_1, x_2, ..., x_n ∈ R^n,
  x := Σ_{i=1}^n θ_i x_i
is a convex combination of x_1, x_2, ..., x_n if
  θ_i ≥ 0, 1 ≤ i ≤ n, and Σ_{i=1}^n θ_i = 1

Convex hull
The convex hull of S is the set of convex combinations of points in S
The ℓ1-norm ball is the convex hull of the intersection between the "ℓ0-norm" ball and the ℓ∞-norm ball

ℓ1-norm ball

B_ℓ1 ⊆ C(B_ℓ0 ∩ B_ℓ∞)
Let x ∈ B_ℓ1
Set θ_i := |x[i]| for 1 ≤ i ≤ n and θ_0 := 1 − Σ_{i=1}^n θ_i
Σ_{i=0}^n θ_i = 1 by construction, θ_i ≥ 0, and
  θ_0 = 1 − Σ_{i=1}^n θ_i = 1 − ‖x‖_1 ≥ 0 because x ∈ B_ℓ1
x ∈ C(B_ℓ0 ∩ B_ℓ∞) because
  x = Σ_{i=1}^n θ_i sign(x[i]) e_i + θ_0 · 0
is a convex combination of points in B_ℓ0 ∩ B_ℓ∞

C(B_ℓ0 ∩ B_ℓ∞) ⊆ B_ℓ1
Let x ∈ C(B_ℓ0 ∩ B_ℓ∞), then x = Σ_{i=1}^m θ_i y_i
  ‖x‖_1 ≤ Σ_{i=1}^m θ_i ‖y_i‖_1   (by the triangle inequality)
        ≤ Σ_{i=1}^m θ_i ‖y_i‖_∞   (each y_i has only one nonzero entry)
        ≤ Σ_{i=1}^m θ_i
        = 1

Convex optimization problem
f_0, f_1, ..., f_m, h_1, ..., h_p : R^n → R
  minimize   f_0(x)
  subject to f_i(x) ≤ 0,  1 ≤ i ≤ m,
             h_i(x) = 0,  1 ≤ i ≤ p

Definitions
A feasible vector is a vector that satisfies all the constraints
A solution is any feasible vector x* such that f_0(x*) ≤ f_0(x) for all feasible vectors x
If a solution exists, f_0(x*) is the optimal value or optimum of the problem

Convex optimization problem
The optimization problem is convex if
- f_0 is convex
- f_1, ..., f_m are convex
- h_1, ..., h_p are affine, i.e. h_i(x) = a_i^T x + b_i for some a_i ∈ R^n and b_i ∈ R

Linear program
  minimize   a^T x
  subject to c_i^T x ≤ d_i,  1 ≤ i ≤ m
             A x = b

ℓ1-norm minimization as an LP
The optimization problem
  minimize   ‖x‖_1
  subject to A x = b
can be recast as the LP
  minimize   Σ_{i=1}^m t[i]
  subject to t[i] ≥ e_i^T x
             t[i] ≥ −e_i^T x
             A x = b
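
A sketch (not from the slides) of this LP reformulation solved with a generic LP solver; the Gaussian test matrix and dimensions are illustrative assumptions.

```python
# Sketch (not from the slides): the LP reformulation above solved with
# scipy.optimize.linprog over the stacked variable z = [x; t].
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_meas, m = 15, 40
A = rng.standard_normal((n_meas, m))
x_true = np.zeros(m)
x_true[rng.choice(m, 3, replace=False)] = rng.standard_normal(3)
b = A @ x_true

c = np.concatenate([np.zeros(m), np.ones(m)])           # objective: sum_i t[i]
A_ub = np.block([[np.eye(m), -np.eye(m)],               #  x - t <= 0, i.e. t >= x
                 [-np.eye(m), -np.eye(m)]])             # -x - t <= 0, i.e. t >= -x
b_ub = np.zeros(2 * m)
A_eq = np.hstack([A, np.zeros((n_meas, m))])            # A x = b
bounds = [(None, None)] * m + [(0, None)] * m           # x free, t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=bounds, method="highs")
x_l1 = res.x[:m]
print(np.linalg.norm(x_l1 - x_true))                    # ~0 when recovery succeeds
```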

Proof
Solution to the ℓ1-norm min. problem: x_ℓ1
Solution to the linear program: (x_lp, t_lp)
Set t_ℓ1[i] := |x_ℓ1[i]|, so (x_ℓ1, t_ℓ1) is feasible for the linear program
  ‖x_ℓ1‖_1 = Σ_{i=1}^m t_ℓ1[i]
           ≥ Σ_{i=1}^m t_lp[i]   (by optimality of t_lp)
           ≥ ‖x_lp‖_1            (since t_lp[i] ≥ |x_lp[i]|)
so x_lp is a solution to the ℓ1-norm min. problem

Proof
Set t_ℓ1[i] := |x_ℓ1[i]|
  Σ_{i=1}^m t_ℓ1[i] = ‖x_ℓ1‖_1
                    ≤ ‖x_lp‖_1   (by optimality of x_ℓ1)
                    ≤ Σ_{i=1}^m t_lp[i]
so (x_ℓ1, t_ℓ1) is a solution to the linear program

Quadratic program
For a positive semidefinite matrix Q ∈ R^{n×n}
  minimize   x^T Q x + a^T x
  subject to c_i^T x ≤ d_i,  1 ≤ i ≤ m,
             A x = b

ℓ1-norm regularized least squares as a QP
The optimization problem
  minimize ‖A x − y‖_2² + α ‖x‖_1
can be recast as the QP
  minimize   x^T A^T A x − 2 y^T A x + α Σ_{i=1}^n t[i]
  subject to t[i] ≥ e_i^T x
             t[i] ≥ −e_i^T x
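
A sketch (assumes the cvxpy package; random data and the value of α are illustrative) checking that the regularized problem and its QP reformulation with auxiliary variables t[i] ≥ |x[i]| produce the same minimizer.

```python
# Sketch (assumes cvxpy): the l1-regularized least-squares problem written directly
# and via the QP reformulation with auxiliary variables t; both give the same minimizer.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, alpha = 20, 50, 0.5
A = rng.standard_normal((n, m))
y = rng.standard_normal(n)

x1 = cp.Variable(m)
cp.Problem(cp.Minimize(cp.sum_squares(A @ x1 - y) + alpha * cp.norm1(x1))).solve()

x2, t = cp.Variable(m), cp.Variable(m)
constraints = [t >= x2, t >= -x2]
cp.Problem(cp.Minimize(cp.sum_squares(A @ x2 - y) + alpha * cp.sum(t)), constraints).solve()

print(np.max(np.abs(x1.value - x2.value)))  # ~0: the two formulations agree
```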

Lagrangian
The Lagrangian of a canonical optimization problem is
  L(x, α, ν) := f_0(x) + Σ_{i=1}^m α[i] f_i(x) + Σ_{j=1}^p ν[j] h_j(x)
α ∈ R^m, ν ∈ R^p are called Lagrange multipliers or dual variables
If x is feasible and α[i] ≥ 0 for 1 ≤ i ≤ m,
  L(x, α, ν) ≤ f_0(x)

Lagrange dual function
The Lagrange dual function of the problem is
  l(α, ν) := inf_{x ∈ R^n} f_0(x) + Σ_{i=1}^m α[i] f_i(x) + Σ_{j=1}^p ν[j] h_j(x)
Let p* be an optimum of the optimization problem; as long as α[i] ≥ 0 for 1 ≤ i ≤ m,
  l(α, ν) ≤ p*

Dual problem
The dual problem of the (primal) optimization problem is
  maximize   l(α, ν)
  subject to α[i] ≥ 0,  1 ≤ i ≤ m
The dual problem is always convex, even if the primal isn't: the dual function is a pointwise infimum of affine functions of (α, ν), hence concave

Maximum/supremum of convex functions
The pointwise maximum of m convex functions f_1, ..., f_m is convex:
  f_max(x) := max_{1 ≤ i ≤ m} f_i(x)
The pointwise supremum of a family of convex functions indexed by a set I is convex:
  f_sup(x) := sup_{i ∈ I} f_i(x)

Proof
For any 0 ≤ θ ≤ 1 and any x, y,
  f_sup(θ x + (1 − θ) y) = sup_{i ∈ I} f_i(θ x + (1 − θ) y)
    ≤ sup_{i ∈ I} θ f_i(x) + (1 − θ) f_i(y)   (by convexity of the f_i)
    ≤ θ sup_{i ∈ I} f_i(x) + (1 − θ) sup_{j ∈ I} f_j(y)
    = θ f_sup(x) + (1 − θ) f_sup(y)

Weak duality
If p* is a primal optimum and d* a dual optimum, then d* ≤ p*

Strong duality
For convex problems d* = p* under very weak conditions
LPs: the primal optimum is finite
General convex programs (Slater's condition): there exists a point that is strictly feasible,
  f_i(x) < 0 for 1 ≤ i ≤ m

ℓ1-norm minimization
The dual problem of
  min_x ‖x‖_1 subject to A x = y
is
  max_ν y^T ν subject to ‖A^T ν‖_∞ ≤ 1

Proof
Lagrangian:
  L(x, ν) = ‖x‖_1 + ν^T (y − A x)
Lagrange dual function:
  l(ν) := inf_{x ∈ R^n} ‖x‖_1 − (A^T ν)^T x + ν^T y
If |(A^T ν)[i]| > 1? We can send x[i] → ±∞ and l(ν) → −∞
If ‖A^T ν‖_∞ ≤ 1? Then
  (A^T ν)^T x ≤ ‖A^T ν‖_∞ ‖x‖_1 ≤ ‖x‖_1
so the infimum is attained at x = 0 and l(ν) = ν^T y

Strong duality
The solution ν* to
  max_ν y^T ν subject to ‖A^T ν‖_∞ ≤ 1
satisfies
  (A^T ν*)[i] = sign(x*[i]) for all x*[i] ≠ 0
for all solutions x* to the primal problem
  min_x ‖x‖_1 subject to A x = y

Dual solution
[plot: sign of the true signal vs. A^T ν* for the minimum-ℓ1 dual solution ν*]

Proof
By strong duality
  ‖x*‖_1 = y^T ν* = (A x*)^T ν* = (x*)^T (A^T ν*) = Σ_{i=1}^m (A^T ν*)[i] x*[i]
By Hölder's inequality
  Σ_{i=1}^m (A^T ν*)[i] x*[i] ≤ ‖A^T ν*‖_∞ ‖x*‖_1 ≤ ‖x*‖_1
with equality if and only if (A^T ν*)[i] = sign(x*[i]) for all x*[i] ≠ 0
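
A sketch (assumes the cvxpy package; the random Gaussian data is an illustrative choice) that solves the primal and dual problems separately and checks strong duality and the sign condition above.

```python
# Sketch (assumes cvxpy): l1-minimization primal and its dual; strong duality holds
# and (A^T nu*)[i] = sign(x*[i]) on the support of the primal solution.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 25, 60, 4
A = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

x = cp.Variable(m)                     # primal: min ||x||_1 s.t. A x = y
primal = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
primal.solve()

nu = cp.Variable(n)                    # dual: max y^T nu s.t. ||A^T nu||_inf <= 1
dual = cp.Problem(cp.Maximize(y @ nu), [cp.norm(A.T @ nu, "inf") <= 1])
dual.solve()

print("p* - d* =", primal.value - dual.value)   # ~0 by strong duality
support = np.abs(x.value) > 1e-6
print(np.sign(x.value[support]))                # signs of the solution on its support
print((A.T @ nu.value)[support])                # should match those signs
```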

Another algorithm for sparse recovery
Aim: find the nonzero locations of a sparse vector x from y = A x
Insight: we have access to the inner product of x with A^T w for any w:
  y^T w = (A x)^T w = x^T (A^T w)
Idea: maximize y^T w = x^T (A^T w) while bounding the magnitude of the entries of A^T w by 1
The entries of A^T w where x is nonzero should saturate to 1 or −1

Compressed sensing Convex constrained problems Analyzing optimization-based methods

Analyzing optimization-based methods Best case scenario: Primal solution has closed form Otherwise: Use dual solution to characterize primal solution

Minimum ℓ2-norm solution
Let A ∈ R^{m×n} be a full rank matrix with m < n
For any y ∈ R^m the solution to the optimization problem
  arg min_x ‖x‖_2 subject to A x = y
is
  x_ℓ2 := V S^{-1} U^T y = A^T (A A^T)^{-1} y
where A = U S V^T is the SVD of A

Proof x = P row(a) x + P row(a) x Since A is full rank V, P row(a) x = V c for some vector c R n A x = AP row(a) x

Proof x = P row(a) x + P row(a) x Since A is full rank V, P row(a) x = V c for some vector c R n A x = AP row(a) x = USV T V c

Proof x = P row(a) x + P row(a) x Since A is full rank V, P row(a) x = V c for some vector c R n A x = AP row(a) x = USV T V c = US c

Proof x = P row(a) x + P row(a) x Since A is full rank V, P row(a) x = V c for some vector c R n A x = AP row(a) x = USV T V c = US c A x = y is equivalent to US c = y and c = S 1 U T y

Proof For all feasible vectors x P row(a) x = VS 1 U T y By Pythagoras theorem, minimizing x 2 is equivalent to minimizing x 2 2 = Prow(A) x 2 2 + Prow(A) x 2 2

Regular subsampling

Minimum ℓ2-norm solution (regular subsampling)
[plot: true signal vs. minimum ℓ2-norm solution]

Regular subsampling
  A := (1/√2) [ F_{m/2}  F_{m/2} ]
where F_{m/2} is unitary, i.e. F_{m/2} F_{m/2}^* = I and F_{m/2}^* F_{m/2} = I, and
  x := [ x_up ; x_down ]

Regular subsampling
  x_ℓ2 = arg min_{A x = y} ‖x‖_2
       = A^* (A A^*)^{-1} y
       = (1/√2) [ F_{m/2}^* ; F_{m/2}^* ] ( (1/2)(F_{m/2} F_{m/2}^* + F_{m/2} F_{m/2}^*) )^{-1} (1/√2) [ F_{m/2}  F_{m/2} ] [ x_up ; x_down ]
       = (1/2) [ F_{m/2}^* ; F_{m/2}^* ] (F_{m/2} x_up + F_{m/2} x_down)
       = (1/2) [ F_{m/2}^* F_{m/2} (x_up + x_down) ; F_{m/2}^* F_{m/2} (x_up + x_down) ]
       = (1/2) [ x_up + x_down ; x_up + x_down ]
Each half of the estimate is the average of the two halves of the signal (aliasing)

Minimum ℓ1-norm solution
Problem: arg min_{A x = y} ‖x‖_1 doesn't have a closed form
Instead we can use a dual variable to certify optimality

Dual solution
The solution ν* to
  max_ν y^T ν subject to ‖A^T ν‖_∞ ≤ 1
satisfies
  (A^T ν*)[i] = sign(x*[i]) for all x*[i] ≠ 0
where x* is a solution to the primal problem
  min_x ‖x‖_1 subject to A x = y

Dual certificate
If there exists a vector ν ∈ R^n such that
  (A^T ν)[i] = sign(x*[i])   if x*[i] ≠ 0
  |(A^T ν)[i]| < 1           if x*[i] = 0
then x* is the unique solution to the primal problem
  min_x ‖x‖_1 subject to A x = y
as long as the submatrix A_T (the columns of A indexed by the support T of x*) is full rank

Proof 1
ν is feasible for the dual problem, so for any primal feasible x
  ‖x‖_1 ≥ y^T ν
        = (A x*)^T ν
        = (x*)^T (A^T ν)
        = Σ_{i ∈ T} x*[i] sign(x*[i])
        = ‖x*‖_1
so x* must be a solution

Proof 2
A^T ν is a subgradient of the ℓ1 norm at x*
For any other feasible vector x
  ‖x‖_1 ≥ ‖x*‖_1 + (A^T ν)^T (x − x*)
        = ‖x*‖_1 + ν^T (A x − A x*)
        = ‖x*‖_1

Random subsampling

Minimum ℓ1-norm solution (random subsampling)
[plot: true signal vs. minimum ℓ1-norm solution]

Exact sparse recovery via ℓ1-norm minimization
Assumption: there exists a signal x ∈ R^n with s nonzeros such that A x = y for a random A ∈ R^{m×n} (random Fourier, Gaussian iid, ...)
Exact recovery: if the number of measurements satisfies
  m ≥ C s log n
the solution of the problem
  minimize ‖x‖_1 subject to A x = y
is the original signal with probability at least 1 − 1/n

Proof
Show that a dual certificate always exists
We need
  A_T^T ν = sign(x_T)   (s constraints)
  ‖A_{T^c}^T ν‖_∞ < 1
Idea: impose A_T^T ν = sign(x_T) and minimize ‖A_{T^c}^T ν‖_∞
Problem: no closed-form solution
How about minimizing the ℓ2 norm instead?

Proof of exact recovery
Prove that a dual certificate exists for any s-sparse x
Dual certificate candidate: solution of
  minimize ‖v‖_2 subject to A_T^T v = sign(x_T)
Closed-form solution:
  ν_ℓ2 := A_T (A_T^T A_T)^{-1} sign(x_T)
A_T^T A_T is invertible with high probability
We need to prove that A^T ν_ℓ2 satisfies ‖(A^T ν_ℓ2)_{T^c}‖_∞ < 1
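
A sketch (not from the slides) that constructs the candidate certificate ν_ℓ2 numerically for a random Gaussian A and a random support, and checks the two conditions; the dimensions and sparsity level are illustrative assumptions.

```python
# Sketch (not from the slides): build nu_l2 = A_T (A_T^T A_T)^{-1} sign(x_T) and
# check that A^T nu_l2 equals the sign pattern on T and stays below 1 off T.
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 100, 400, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
T = rng.choice(n, s, replace=False)
sign_T = rng.choice([-1.0, 1.0], s)

A_T = A[:, T]
nu = A_T @ np.linalg.solve(A_T.T @ A_T, sign_T)   # min l2-norm solution of A_T^T v = sign_T

corr = A.T @ nu
print(np.allclose(corr[T], sign_T))                # matches the sign pattern on T
off_support = np.setdiff1d(np.arange(n), T)
print(np.max(np.abs(corr[off_support])))           # should be strictly below 1
```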

Dual certificate
[plot: sign pattern vs. the dual function A^T ν_ℓ2]

Proof of exact recovery
To control ‖(A^T ν_ℓ2)_{T^c}‖_∞, we need to bound, for each i ∈ T^c,
  |A_i^T ν_ℓ2| = |A_i^T A_T (A_T^T A_T)^{-1} sign(x_T)|
Let w := (A_T^T A_T)^{-1} sign(x_T)
For i ∈ T^c the column A_i is independent of A_T w, so A_i^T A_T w can be bounded using independence
The result then follows from the union bound