Constrained optimization


1 Constrained optimization DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis Carlos Fernandez-Granda

2 Compressed sensing Convex constrained problems Analyzing optimization-based methods

3 Magnetic resonance imaging 2D DFT (magnitude) 2D DFT (log. of magnitude)

4 Magnetic resonance imaging Data: samples from the spectrum. Problem: sampling is time consuming (annoying, kids move...). Images are compressible (sparse in a wavelet basis). Can we recover compressible signals from fewer samples?

5 2D wavelet transform

6 2D wavelet transform

7 Full DFT matrix

8 Full DFT matrix

9 Regular subsampling

10 Regular subsampling

11 Random subsampling

12 Random subsampling

13 Toy example

14 Regular subsampling

15 Random subsampling

16 Linear inverse problems Linear inverse problem $Ax = y$. Linear measurements: $A \in \mathbb{R}^{n \times m}$, $y[i] = \langle A_{i:}, x \rangle$, $1 \le i \le n$. Aim: recover the data signal $x \in \mathbb{R}^m$ from the data $y \in \mathbb{R}^n$. We need $n \ge m$, otherwise the problem is underdetermined. If $n < m$ there are infinitely many solutions $x + w$ where $w \in \operatorname{null}(A)$
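
A small numpy sketch (illustrative, not from the slides) of the underdetermined case, checking that adding any null-space vector to a solution yields another solution:

import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5                              # n measurements of an m-dimensional signal, n < m
A = rng.standard_normal((n, m))
x = rng.standard_normal(m)
y = A @ x
_, _, Vt = np.linalg.svd(A)
W = Vt[n:]                               # rows of W span null(A) because A is full rank
w = W.T @ rng.standard_normal(m - n)     # an arbitrary vector in the null space
print(np.allclose(A @ (x + w), y))       # True: x + w is also consistent with the data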

17 Sparse recovery Aim: recover a sparse $x$ from linear measurements $Ax = y$. When is the problem well posed? There shouldn't be two sparse vectors $x_1$ and $x_2$ such that $Ax_1 = Ax_2$

18 Spark The spark of a matrix is the size of the smallest linearly dependent subset of its columns

19 Spark The spark of a matrix is the size of the smallest linearly dependent subset of its columns. Let $y := Ax$, where $A \in \mathbb{R}^{n \times m}$, $y \in \mathbb{R}^n$ and $x \in \mathbb{R}^m$ is a sparse vector with $s$ nonzero entries. The vector $x$ is the only vector with sparsity $s$ consistent with the data, i.e. it is the solution of $\min_{\tilde{x}} \|\tilde{x}\|_0$ subject to $A\tilde{x} = y$, for any choice of $x$, if and only if $\operatorname{spark}(A) > 2s$
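
The spark condition can be checked by brute force on very small matrices (an illustrative sketch; the helper spark below is not from the slides and scales exponentially with the number of columns):

import itertools
import numpy as np

def spark(A, tol=1e-10):
    # smallest number of linearly dependent columns of A, by exhaustive search
    n_cols = A.shape[1]
    for k in range(1, n_cols + 1):
        for cols in itertools.combinations(range(n_cols), k):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return n_cols + 1   # convention when no subset of columns is linearly dependent

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
print(spark(A))   # generically 5: any 5 columns in R^4 are dependent, any 4 are independent
# unique recovery of s-sparse vectors requires spark(A) > 2s, so here s <= 2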

20 Proof Equivalent statements For any x, x is the only vector with sparsity s consistent with the data

21 Proof Equivalent statements: For any $x$, $x$ is the only vector with sparsity $s$ consistent with the data. For any pair of distinct $s$-sparse vectors $x_1$ and $x_2$, $A(x_1 - x_2) \neq 0$

22 Proof Equivalent statements: For any $x$, $x$ is the only vector with sparsity $s$ consistent with the data. For any pair of distinct $s$-sparse vectors $x_1$ and $x_2$, $A(x_1 - x_2) \neq 0$. For any pair of subsets of $s$ indices $T_1$ and $T_2$, $A_{T_1 \cup T_2}\,\alpha \neq 0$ for any nonzero $\alpha \in \mathbb{R}^{|T_1 \cup T_2|}$

23 Proof Equivalent statements: For any $x$, $x$ is the only vector with sparsity $s$ consistent with the data. For any pair of distinct $s$-sparse vectors $x_1$ and $x_2$, $A(x_1 - x_2) \neq 0$. For any pair of subsets of $s$ indices $T_1$ and $T_2$, $A_{T_1 \cup T_2}\,\alpha \neq 0$ for any nonzero $\alpha \in \mathbb{R}^{|T_1 \cup T_2|}$. All submatrices with at most $2s$ columns have no nonzero vectors in their null space

24 Proof Equivalent statements: For any $x$, $x$ is the only vector with sparsity $s$ consistent with the data. For any pair of distinct $s$-sparse vectors $x_1$ and $x_2$, $A(x_1 - x_2) \neq 0$. For any pair of subsets of $s$ indices $T_1$ and $T_2$, $A_{T_1 \cup T_2}\,\alpha \neq 0$ for any nonzero $\alpha \in \mathbb{R}^{|T_1 \cup T_2|}$. All submatrices with at most $2s$ columns have no nonzero vectors in their null space. All submatrices with at most $2s$ columns are full rank

25 Restricted-isometry property Robust version of the spark: if two $s$-sparse vectors $x_1$, $x_2$ are far apart, then $Ax_1$, $Ax_2$ should be far apart. The linear operator should preserve distances (be an isometry) when restricted to act upon sparse vectors

26 Restricted-isometry property $A$ satisfies the restricted isometry property (RIP) with constant $\kappa_s$ if for any $s$-sparse vector $x$, $(1 - \kappa_s)\|x\|_2 \le \|Ax\|_2 \le (1 + \kappa_s)\|x\|_2$. If $A$ satisfies the RIP for sparsity level $2s$ then for any $s$-sparse $x_1$, $x_2$: $\|y_2 - y_1\|_2$

27 Restricted-isometry property $A$ satisfies the restricted isometry property (RIP) with constant $\kappa_s$ if for any $s$-sparse vector $x$, $(1 - \kappa_s)\|x\|_2 \le \|Ax\|_2 \le (1 + \kappa_s)\|x\|_2$. If $A$ satisfies the RIP for sparsity level $2s$ then for any $s$-sparse $x_1$, $x_2$: $\|y_2 - y_1\|_2 = \|A(x_2 - x_1)\|_2 \ge (1 - \kappa_{2s})\|x_2 - x_1\|_2$
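
A quick numerical illustration of this distance preservation, assuming only numpy (a sketch, not part of the slides): draw a scaled Gaussian matrix and compare $\|A(x_1 - x_2)\|_2$ with $\|x_1 - x_2\|_2$ for random pairs of sparse vectors.

import numpy as np

rng = np.random.default_rng(0)
m, n, s = 200, 1000, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)     # scaling as in the Gaussian RIP result below

def sparse_vector(n, s, rng):
    x = np.zeros(n)
    x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
    return x

ratios = []
for _ in range(1000):
    d = sparse_vector(n, s, rng) - sparse_vector(n, s, rng)   # difference is at most 2s-sparse
    ratios.append(np.linalg.norm(A @ d) / np.linalg.norm(d))
print(min(ratios), max(ratios))   # ratios concentrate around 1: distances between sparse vectors are roughly preserved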

28 Regular subsampling

29 Regular subsampling

30 Correlation with column [figure]

31 Random subsampling

32 Random subsampling

33 Correlation with column [figure]

34 Restricted-isometry property Deterministic matrices tend not to satisfy the RIP. It is NP-hard to check whether the spark or RIP conditions hold. Random matrices satisfy the RIP with high probability. We prove it for Gaussian iid matrices; the ideas in the proof for random Fourier matrices are similar

35 Restricted-isometry property for Gaussian matrices Let $A \in \mathbb{R}^{m \times n}$ be a random matrix with iid standard Gaussian entries. $\frac{1}{\sqrt{m}}A$ satisfies the RIP for a constant $\kappa_s$ with probability $1 - \frac{C_2}{n}$ as long as $m \ge \frac{C_1 s}{\kappa_s^2}\log\left(\frac{n}{s}\right)$, for two fixed constants $C_1, C_2 > 0$

36 Proof For a fixed support T of size s bounds follow from bounds on singular values of Gaussian matrices

37 Singular values of an $m \times s$ Gaussian matrix, $s = 100$ [plot of the singular values $\sigma_i$ versus $i$]

38 Singular values of an $m \times s$ Gaussian matrix, $s = 1000$ [plot of the singular values $\sigma_i$ versus $i$]

39 Proof For a fixed submatrix the singular values are bounded by $\sqrt{m}(1 - \kappa_s) \le \sigma_s \le \sigma_1 \le \sqrt{m}(1 + \kappa_s)$ with probability at least $1 - 2\left(\frac{12}{\kappa_s}\right)^s \exp\left(-\frac{m\kappa_s^2}{32}\right)$. For any vector $x$ with support $T$: $(1 - \kappa_s)\|x\|_2 \le \frac{1}{\sqrt{m}}\|Ax\|_2 \le (1 + \kappa_s)\|x\|_2$
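
The singular-value concentration used in this step can be simulated directly (illustrative numpy sketch, in the spirit of the plots on the two previous slides):

import numpy as np

rng = np.random.default_rng(0)
s = 100
for m in (200, 1000, 5000):
    G = rng.standard_normal((m, s)) / np.sqrt(m)
    sv = np.linalg.svd(G, compute_uv=False)
    # the largest and smallest singular values approach 1 as m/s grows,
    # consistent with the (1 +/- kappa_s) bounds for a fixed support
    print(m, sv.max().round(3), sv.min().round(3))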

40 Union bound For any events $S_1, S_2, \ldots, S_n$ in a probability space, $P\left(\bigcup_i S_i\right) \le \sum_{i=1}^n P(S_i)$

41 Proof Number of different supports of size s

42 Proof Number of different supports of size $s$: $\binom{n}{s} \le \left(\frac{en}{s}\right)^s$

43 Proof Number of different supports of size $s$: $\binom{n}{s} \le \left(\frac{en}{s}\right)^s$. By the union bound, $(1 - \kappa_s)\|x\|_2 \le \frac{1}{\sqrt{m}}\|Ax\|_2 \le (1 + \kappa_s)\|x\|_2$ holds for any $s$-sparse vector $x$ with probability at least $1 - \left(\frac{en}{s}\right)^s 2\left(\frac{12}{\kappa_s}\right)^s \exp\left(-\frac{m\kappa_s^2}{32}\right) = 1 - \exp\left(\log 2 + s + s\log\left(\frac{n}{s}\right) + s\log\left(\frac{12}{\kappa_s}\right) - \frac{m\kappa_s^2}{32}\right) \ge 1 - \frac{C_2}{n}$ as long as $m \ge \frac{C_1 s}{\kappa_s^2}\log\left(\frac{n}{s}\right)$

44 Sparse recovery via $\ell_1$-norm minimization $\ell_0$-"norm" minimization is intractable. (As usual) we can minimize the $\ell_1$ norm instead: the estimate $x_{\ell_1}$ is the solution to $\min_x \|x\|_1$ subject to $Ax = y$
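
A minimal sketch of this estimator, assuming the cvxpy package is available (the dimensions and the solver call are illustrative, not from the slides):

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, s = 40, 100, 4                      # m random measurements of an n-dimensional s-sparse signal
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
problem.solve()
print(np.allclose(x.value, x_true, atol=1e-4))   # exact recovery with enough random measurements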

45 Minimum $\ell_2$-norm solution (regular subsampling) [plot: true signal vs minimum $\ell_2$-norm solution]

46 Minimum $\ell_1$-norm solution (regular subsampling) [plot: true signal vs minimum $\ell_1$-norm solution]

47 Minimum $\ell_2$-norm solution (random subsampling) [plot: true signal vs minimum $\ell_2$-norm solution]

48 Minimum $\ell_1$-norm solution (random subsampling) [plot: true signal vs minimum $\ell_1$-norm solution]

49 Geometric intuition [figures: the $\ell_2$-norm ball and the $\ell_1$-norm ball]

50 Sparse recovery via $\ell_1$-norm minimization If the signal is sparse in a transform domain ($x = Wc$ with $c$ sparse) then solve $\min_c \|c\|_1$ subject to $AWc = y$. If we want to recover the original $c$ then $AW$ should satisfy the RIP

51 Sparse recovery via $\ell_1$-norm minimization If the signal is sparse in a transform domain ($x = Wc$ with $c$ sparse) then solve $\min_c \|c\|_1$ subject to $AWc = y$. If we want to recover the original $c$ then $AW$ should satisfy the RIP. However, we might be fine with any $c$ such that $Wc = x$

52 Regular subsampling

53 Minimum $\ell_2$-norm solution (regular subsampling)

54 Minimum $\ell_1$-norm solution (regular subsampling)

55 Random subsampling

56 Minimum $\ell_2$-norm solution (random subsampling)

57 Minimum $\ell_1$-norm solution (random subsampling)

58 Compressed sensing Convex constrained problems Analyzing optimization-based methods

59 Convex sets A convex set $\mathcal{S}$ is any set such that for any $x, y \in \mathcal{S}$ and $\theta \in (0, 1)$, $\theta x + (1 - \theta)y \in \mathcal{S}$. The intersection of convex sets is convex

60 Convex vs nonconvex Nonconvex Convex

61 Epigraph [figure: a function $f$ and its epigraph $\operatorname{epi}(f)$] A function is convex if and only if its epigraph is convex

62 Projection onto convex set The projection of any vector $x$ onto a non-empty closed convex set $\mathcal{S}$ exists and is unique: $\mathcal{P}_{\mathcal{S}}(x) := \arg\min_{y \in \mathcal{S}} \|x - y\|_2$
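
Two standard closed-form projections illustrate the definition (an illustrative sketch; these particular sets are not discussed on the slide):

import numpy as np

def project_l2_ball(x, radius=1.0):
    # projection onto the closed l2 ball of the given radius
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def project_box(x, lo, hi):
    # projection onto the box {y : lo <= y <= hi} is entrywise clipping
    return np.clip(x, lo, hi)

x = np.array([3.0, -4.0])
print(project_l2_ball(x))           # [ 0.6 -0.8], the closest point of the unit ball
print(project_box(x, -1.0, 1.0))    # [ 1. -1.]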

63 Proof Assume there are two distinct projections $y_1 \neq y_2$. Consider $y := \frac{y_1 + y_2}{2}$, which belongs to $\mathcal{S}$ (why?)

64 Proof $\langle x - y, y_1 - y \rangle = \left\langle x - \frac{y_1 + y_2}{2},\, y_1 - \frac{y_1 + y_2}{2} \right\rangle = \frac{1}{4}\left\langle (x - y_1) + (x - y_2),\, (x - y_2) - (x - y_1) \right\rangle$

65 Proof $\langle x - y, y_1 - y \rangle = \left\langle x - \frac{y_1 + y_2}{2},\, y_1 - \frac{y_1 + y_2}{2} \right\rangle = \frac{1}{4}\left\langle (x - y_1) + (x - y_2),\, (x - y_2) - (x - y_1) \right\rangle = \frac{1}{4}\left(\|x - y_2\|_2^2 - \|x - y_1\|_2^2\right) = 0$ (since $\|x - y_1\|_2 = \|x - y_2\|_2$)

66 Proof $\langle x - y, y_1 - y \rangle = \cdots = \frac{1}{4}\left(\|x - y_2\|_2^2 - \|x - y_1\|_2^2\right) = 0$. By the Pythagorean theorem $\|x - y_1\|_2^2$

67 Proof ... By the Pythagorean theorem $\|x - y_1\|_2^2 = \|x - y\|_2^2 + \|y_1 - y\|_2^2$

68 Proof ... By the Pythagorean theorem $\|x - y_1\|_2^2 = \|x - y\|_2^2 + \|y_1 - y\|_2^2 = \|x - y\|_2^2 + \frac{1}{4}\|y_1 - y_2\|_2^2$

69 Proof ... By the Pythagorean theorem $\|x - y_1\|_2^2 = \|x - y\|_2^2 + \frac{1}{4}\|y_1 - y_2\|_2^2 > \|x - y\|_2^2$, which contradicts $y_1$ being a projection since $y \in \mathcal{S}$

70 Convex combination Given $n$ vectors $x_1, x_2, \ldots, x_n \in \mathbb{R}^n$, $x := \sum_{i=1}^n \theta_i x_i$ is a convex combination of $x_1, x_2, \ldots, x_n$ if $\theta_i \ge 0$, $1 \le i \le n$, and $\sum_{i=1}^n \theta_i = 1$

71 Convex hull The convex hull of $\mathcal{S}$ is the set of convex combinations of points in $\mathcal{S}$. The $\ell_1$-norm ball is the convex hull of the intersection between the $\ell_0$-"norm" ball and the $\ell_\infty$-norm ball

72 $\ell_1$-norm ball

73 $\mathcal{B}_{\ell_1} \subseteq \mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty})$ Let $x \in \mathcal{B}_{\ell_1}$. Set $\theta_i := |x[i]|$ for $1 \le i \le n$ and $\theta_0 := 1 - \sum_{i=1}^n \theta_i$. Then $\sum_{i=0}^n \theta_i = 1$ by construction, $\theta_i \ge 0$, and $\theta_0 = 1 - \sum_{i=1}^n \theta_i = 1 - \|x\|_1 \ge 0$ because $x \in \mathcal{B}_{\ell_1}$

74 $\mathcal{B}_{\ell_1} \subseteq \mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty})$ Let $x \in \mathcal{B}_{\ell_1}$. Set $\theta_i := |x[i]|$ for $1 \le i \le n$ and $\theta_0 := 1 - \sum_{i=1}^n \theta_i$. Then $\sum_{i=0}^n \theta_i = 1$ by construction, $\theta_i \ge 0$, and $\theta_0 = 1 - \|x\|_1 \ge 0$ because $x \in \mathcal{B}_{\ell_1}$. Hence $x \in \mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty})$ because $x = \sum_{i=1}^n \theta_i \operatorname{sign}(x[i])\, e_i + \theta_0 \cdot 0$, and each $\operatorname{sign}(x[i])\, e_i$ and the zero vector belong to $\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty}$
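
The construction in this proof can be checked numerically (an illustrative numpy sketch, not part of the slides):

import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal(n)
x /= 2 * np.linalg.norm(x, 1)                 # ensure ||x||_1 <= 1, so x is in the l1 ball

theta = np.abs(x)                             # weights on the signed standard basis vectors
theta0 = 1 - theta.sum()                      # remaining weight, placed on the zero vector
vertices = np.sign(x)[:, None] * np.eye(n)    # row i is sign(x[i]) e_i
recombined = theta @ vertices + theta0 * np.zeros(n)

print(theta0 >= 0, np.isclose(theta.sum() + theta0, 1.0))   # valid convex-combination weights
print(np.allclose(recombined, x))                           # the combination reproduces x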

75 $\mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty}) \subseteq \mathcal{B}_{\ell_1}$ Let $x \in \mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty})$, then $x = \sum_{i=1}^m \theta_i y_i$ for some $y_i \in \mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty}$ and convex-combination weights $\theta_i$

76 $\mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty}) \subseteq \mathcal{B}_{\ell_1}$ Let $x \in \mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty})$, then $x = \sum_{i=1}^m \theta_i y_i$. Consider $\|x\|_1$

77 $\mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty}) \subseteq \mathcal{B}_{\ell_1}$ Let $x \in \mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty})$, then $x = \sum_{i=1}^m \theta_i y_i$. $\|x\|_1 \le \sum_{i=1}^m \theta_i \|y_i\|_1$ by the triangle inequality

78 $\mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty}) \subseteq \mathcal{B}_{\ell_1}$ Let $x \in \mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty})$, then $x = \sum_{i=1}^m \theta_i y_i$. $\|x\|_1 \le \sum_{i=1}^m \theta_i \|y_i\|_1 = \sum_{i=1}^m \theta_i \|y_i\|_\infty$ by the triangle inequality, because each $y_i$ only has one nonzero entry

79 $\mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty}) \subseteq \mathcal{B}_{\ell_1}$ Let $x \in \mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty})$, then $x = \sum_{i=1}^m \theta_i y_i$. $\|x\|_1 \le \sum_{i=1}^m \theta_i \|y_i\|_1 = \sum_{i=1}^m \theta_i \|y_i\|_\infty \le \sum_{i=1}^m \theta_i$ because $\|y_i\|_\infty \le 1$

80 $\mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty}) \subseteq \mathcal{B}_{\ell_1}$ Let $x \in \mathcal{C}(\mathcal{B}_{\ell_0} \cap \mathcal{B}_{\ell_\infty})$, then $x = \sum_{i=1}^m \theta_i y_i$. $\|x\|_1 \le \sum_{i=1}^m \theta_i \|y_i\|_1 = \sum_{i=1}^m \theta_i \|y_i\|_\infty \le \sum_{i=1}^m \theta_i = 1$

81 Convex optimization problem $f_0, f_1, \ldots, f_m, h_1, \ldots, h_p : \mathbb{R}^n \to \mathbb{R}$. Minimize $f_0(x)$ subject to $f_i(x) \le 0$, $1 \le i \le m$, and $h_i(x) = 0$, $1 \le i \le p$

82 Definitions A feasible vector is a vector that satisfies all the constraints. A solution is any feasible vector $x^*$ such that for all feasible vectors $x$, $f_0(x) \ge f_0(x^*)$. If a solution exists, $f_0(x^*)$ is the optimal value or optimum of the problem

83 Convex optimization problem The optimization problem is convex if $f_0$ is convex, $f_1, \ldots, f_m$ are convex, and $h_1, \ldots, h_p$ are affine, i.e. $h_i(x) = a_i^T x + b_i$ for some $a_i \in \mathbb{R}^n$ and $b_i \in \mathbb{R}$

84 Linear program Minimize $a^T x$ subject to $c_i^T x \le d_i$, $1 \le i \le m$, and $Ax = b$

85 $\ell_1$-norm minimization as an LP The optimization problem minimize $\|x\|_1$ subject to $Ax = b$ can be recast as the LP: minimize $\sum_{i=1}^m t[i]$ subject to $t[i] \ge e_i^T x$, $t[i] \ge -e_i^T x$, and $Ax = b$
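
A sketch of this reformulation using scipy.optimize.linprog (assuming scipy; the stacking of the variables as [x, t] is my choice, not from the slides):

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
rows, n = 30, 80                                   # A is rows x n, b = A x_true
A = rng.standard_normal((rows, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 3, replace=False)] = rng.standard_normal(3)
b = A @ x_true

I = np.eye(n)
c = np.concatenate([np.zeros(n), np.ones(n)])      # objective: sum of the t[i]
A_ub = np.block([[I, -I], [-I, -I]])               # x - t <= 0 and -x - t <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((rows, n))])         # A x = b (t does not enter the equality)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * n + [(0, None)] * n)
x_lp = res.x[:n]
print(np.allclose(x_lp, x_true, atol=1e-5))        # the LP recovers the sparse signal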

86 Proof Solution to the $\ell_1$-norm min. problem: $x_{\ell_1}$. Solution to the linear program: $(x_{\text{lp}}, t_{\text{lp}})$. Set $t_{\ell_1}[i] := |x_{\ell_1}[i]|$, so that $(x_{\ell_1}, t_{\ell_1})$ is feasible for the linear program. Then $\|x_{\ell_1}\|_1 = \sum_{i=1}^m t_{\ell_1}[i]$

87 Proof ... $\|x_{\ell_1}\|_1 = \sum_{i=1}^m t_{\ell_1}[i] \ge \sum_{i=1}^m t_{\text{lp}}[i]$ by optimality of $t_{\text{lp}}$

88 Proof ... $\|x_{\ell_1}\|_1 = \sum_{i=1}^m t_{\ell_1}[i] \ge \sum_{i=1}^m t_{\text{lp}}[i] \ge \|x_{\text{lp}}\|_1$ by optimality of $t_{\text{lp}}$ and the LP constraints

89 Proof ... $\|x_{\ell_1}\|_1 \ge \|x_{\text{lp}}\|_1$, so $x_{\text{lp}}$ is a solution to the $\ell_1$-norm min. problem

90 Proof Set $t_{\ell_1}[i] := |x_{\ell_1}[i]|$. Then $\sum_{i=1}^m t_{\ell_1}[i] = \|x_{\ell_1}\|_1$

91 Proof Set $t_{\ell_1}[i] := |x_{\ell_1}[i]|$. Then $\sum_{i=1}^m t_{\ell_1}[i] = \|x_{\ell_1}\|_1 \le \|x_{\text{lp}}\|_1$ by optimality of $x_{\ell_1}$

92 Proof Set $t_{\ell_1}[i] := |x_{\ell_1}[i]|$. Then $\sum_{i=1}^m t_{\ell_1}[i] = \|x_{\ell_1}\|_1 \le \|x_{\text{lp}}\|_1 \le \sum_{i=1}^m t_{\text{lp}}[i]$

93 Proof ... $\sum_{i=1}^m t_{\ell_1}[i] \le \sum_{i=1}^m t_{\text{lp}}[i]$, so $(x_{\ell_1}, t_{\ell_1})$ is a solution to the linear program

94 Quadratic program For a positive semidefinite matrix $Q \in \mathbb{R}^{n \times n}$: minimize $x^T Q x + a^T x$ subject to $c_i^T x \le d_i$, $1 \le i \le m$, and $Ax = b$

95 $\ell_1$-norm regularized least squares as a QP The optimization problem minimize $\|Ax - y\|_2^2 + \alpha\|x\|_1$ can be recast as the QP: minimize $x^T A^T A x - 2 y^T A x + \alpha \sum_{i=1}^n t[i]$ subject to $t[i] \ge e_i^T x$ and $t[i] \ge -e_i^T x$

96 Lagrangian The Lagrangian of a canonical optimization problem is $L(x, \alpha, \nu) := f_0(x) + \sum_{i=1}^m \alpha[i] f_i(x) + \sum_{j=1}^p \nu[j] h_j(x)$, where $\alpha \in \mathbb{R}^m$, $\nu \in \mathbb{R}^p$ are called Lagrange multipliers or dual variables. If $x$ is feasible and $\alpha[i] \ge 0$ for $1 \le i \le m$, then $L(x, \alpha, \nu) \le f_0(x)$

97 Lagrange dual function The Lagrange dual function of the problem is $l(\alpha, \nu) := \inf_{x \in \mathbb{R}^n} f_0(x) + \sum_{i=1}^m \alpha[i] f_i(x) + \sum_{j=1}^p \nu[j] h_j(x)$. Let $p^*$ be an optimum of the optimization problem. As long as $\alpha[i] \ge 0$ for $1 \le i \le m$, $l(\alpha, \nu) \le p^*$

98 Dual problem The dual problem of the (primal) optimization problem is: maximize $l(\alpha, \nu)$ subject to $\alpha[i] \ge 0$, $1 \le i \le m$. The dual problem is always convex, even if the primal isn't!

99 Maximum/supremum of convex functions The pointwise maximum of $m$ convex functions $f_1, \ldots, f_m$ is convex: $f_{\max}(x) := \max_{1 \le i \le m} f_i(x)$. The pointwise supremum of a family of convex functions indexed by a set $I$ is convex: $f_{\sup}(x) := \sup_{i \in I} f_i(x)$

100 Proof For any $0 \le \theta \le 1$ and any $x$, $y$: $f_{\sup}(\theta x + (1 - \theta)y) = \sup_{i \in I} f_i(\theta x + (1 - \theta)y)$

101 Proof ... $f_{\sup}(\theta x + (1 - \theta)y) = \sup_{i \in I} f_i(\theta x + (1 - \theta)y) \le \sup_{i \in I}\, \theta f_i(x) + (1 - \theta) f_i(y)$ by convexity of the $f_i$

102 Proof ... $\le \sup_{i \in I}\, \theta f_i(x) + (1 - \theta) f_i(y) \le \theta \sup_{i \in I} f_i(x) + (1 - \theta)\sup_{j \in I} f_j(y)$

103 Proof ... $\le \theta \sup_{i \in I} f_i(x) + (1 - \theta)\sup_{j \in I} f_j(y) = \theta f_{\sup}(x) + (1 - \theta) f_{\sup}(y)$

104 Weak duality If $p^*$ is a primal optimum and $d^*$ a dual optimum, then $d^* \le p^*$

105 Strong duality For convex problems $d^* = p^*$ under very weak conditions. LPs: the primal optimum is finite. General convex programs (Slater's condition): there exists a point that is strictly feasible, $f_i(x) < 0$ for $1 \le i \le m$

106 $\ell_1$-norm minimization The dual problem of $\min_x \|x\|_1$ subject to $Ax = y$ is $\max_\nu\, y^T\nu$ subject to $\|A^T\nu\|_\infty \le 1$

107 Proof Lagrangian $L(x, \nu) = \|x\|_1 + \nu^T(y - Ax)$. Lagrange dual function $l(\nu) := \inf_{x \in \mathbb{R}^n} \|x\|_1 - (A^T\nu)^T x + \nu^T y$

108 Proof ... What if $|(A^T\nu)[i]| > 1$?

109 Proof ... What if $|(A^T\nu)[i]| > 1$? We can send $x[i]$ to $\pm\infty$, so that $l(\nu) = -\infty$

110 Proof ... What if $\|A^T\nu\|_\infty \le 1$? Then $(A^T\nu)^T x$

111 Proof ... What if $\|A^T\nu\|_\infty \le 1$? Then $(A^T\nu)^T x \le \|A^T\nu\|_\infty \|x\|_1 \le \|x\|_1$

112 Proof ... What if $\|A^T\nu\|_\infty \le 1$? Then $(A^T\nu)^T x \le \|A^T\nu\|_\infty \|x\|_1 \le \|x\|_1$, so $l(\nu) = \nu^T y$

113 Strong duality The solution $\nu^*$ to $\max_\nu\, y^T\nu$ subject to $\|A^T\nu\|_\infty \le 1$ satisfies $(A^T\nu^*)[i] = \operatorname{sign}(x^*[i])$ for all $x^*[i] \neq 0$, for all solutions $x^*$ to the primal problem $\min_x \|x\|_1$ subject to $Ax = y$

114 Dual solution [plot: sign of the true signal and the minimum $\ell_1$-norm dual solution]

115 Proof By strong duality $\|x^*\|_1 = y^T\nu^* = (Ax^*)^T\nu^* = (x^*)^T(A^T\nu^*) = \sum_{i=1}^m (A^T\nu^*)[i]\, x^*[i]$. By Hölder's inequality $\|x^*\|_1 \ge \sum_{i=1}^m (A^T\nu^*)[i]\, x^*[i]$, with equality if and only if $(A^T\nu^*)[i] = \operatorname{sign}(x^*[i])$ for all $x^*[i] \neq 0$
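
This sign relation can be verified numerically by solving the primal and dual problems separately (a sketch assuming cvxpy, not from the slides):

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m, n, s = 30, 80, 3
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
support = rng.choice(n, s, replace=False)
x_true[support] = rng.standard_normal(s)
y = A @ x_true

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()               # primal
nu = cp.Variable(m)
cp.Problem(cp.Maximize(y @ nu), [cp.norm(A.T @ nu, 'inf') <= 1]).solve()  # dual

corr = A.T @ nu.value
print(np.round(corr[support], 3), np.sign(x.value[support]))   # matches the signs on the support
print(np.max(np.abs(corr)) <= 1 + 1e-4)                        # dual feasibility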

116 Another algorithm for sparse recovery Aim: find the nonzero locations of a sparse vector $x$ from $y = Ax$. Insight: we have access to the inner product of $x$ with $A^T w$ for any $w$: $y^T w$

117 Another algorithm for sparse recovery Aim: find the nonzero locations of a sparse vector $x$ from $y = Ax$. Insight: we have access to the inner product of $x$ with $A^T w$ for any $w$: $y^T w = (Ax)^T w$

118 Another algorithm for sparse recovery ... $y^T w = (Ax)^T w = x^T(A^T w)$

119 Another algorithm for sparse recovery ... $y^T w = (Ax)^T w = x^T(A^T w)$. Idea: maximize $y^T w$ while bounding the magnitude of the entries of $A^T w$ by 1

120 Another algorithm for sparse recovery ... Idea: maximize $y^T w$ while bounding the magnitude of the entries of $A^T w$ by 1. The entries of $A^T w$ where $x$ is nonzero should saturate to 1 or $-1$

121 Compressed sensing Convex constrained problems Analyzing optimization-based methods

122 Analyzing optimization-based methods Best case scenario: Primal solution has closed form Otherwise: Use dual solution to characterize primal solution

123 Minimum $\ell_2$-norm solution Let $A \in \mathbb{R}^{m \times n}$ be a full rank matrix with $m < n$. For any $y \in \mathbb{R}^m$ the solution to the optimization problem $\arg\min_x \|x\|_2$ subject to $Ax = y$ is $x_{\ell_2} := V S^{-1} U^T y = A^T(AA^T)^{-1} y$, where $A = USV^T$ is the SVD of $A$
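
A direct numerical check of this closed form (illustrative numpy sketch):

import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 10                              # fat full-rank matrix, m < n
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

x_formula = A.T @ np.linalg.solve(A @ A.T, y)
x_pinv = np.linalg.pinv(A) @ y            # the pseudoinverse gives the same minimum-norm solution

print(np.allclose(x_formula, x_pinv))     # True
print(np.allclose(A @ x_formula, y))      # the solution is feasible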

124 Proof $x = \mathcal{P}_{\operatorname{row}(A)}\, x + \mathcal{P}_{\operatorname{row}(A)^\perp}\, x$. Since $A$ is full rank, the columns of $V$ are an orthonormal basis of $\operatorname{row}(A)$, so $\mathcal{P}_{\operatorname{row}(A)}\, x = Vc$ for some vector $c \in \mathbb{R}^m$. Then $Ax = A\mathcal{P}_{\operatorname{row}(A)}\, x$

125 Proof ... $Ax = A\mathcal{P}_{\operatorname{row}(A)}\, x = USV^T Vc$

126 Proof ... $Ax = A\mathcal{P}_{\operatorname{row}(A)}\, x = USV^T Vc = USc$

127 Proof ... $Ax = USc$, so $Ax = y$ is equivalent to $USc = y$, i.e. $c = S^{-1}U^T y$

128 Proof For all feasible vectors $x$, $\mathcal{P}_{\operatorname{row}(A)}\, x = VS^{-1}U^T y$. By the Pythagorean theorem, minimizing $\|x\|_2$ is equivalent to minimizing $\|x\|_2^2 = \|\mathcal{P}_{\operatorname{row}(A)}\, x\|_2^2 + \|\mathcal{P}_{\operatorname{row}(A)^\perp}\, x\|_2^2$, so the minimum is attained by setting $\mathcal{P}_{\operatorname{row}(A)^\perp}\, x = 0$

129 Regular subsampling

130 Minimum $\ell_2$-norm solution (regular subsampling) [plot: true signal vs minimum $\ell_2$-norm solution]

131 Regular subsampling $A := \frac{1}{\sqrt{2}}\begin{bmatrix} F_{m/2} & F_{m/2} \end{bmatrix}$, where $F_{m/2}F_{m/2}^* = I$, $F_{m/2}^*F_{m/2} = I$, and $x := \begin{bmatrix} x_{\text{up}} \\ x_{\text{down}} \end{bmatrix}$

132 Regular subsampling $x_{\ell_2} = \arg\min_{A\tilde{x} = y} \|\tilde{x}\|_2$

133 Regular subsampling $x_{\ell_2} = \arg\min_{A\tilde{x} = y} \|\tilde{x}\|_2 = A^*(AA^*)^{-1}y$

134 Regular subsampling $x_{\ell_2} = A^*(AA^*)^{-1}y = \frac{1}{\sqrt{2}}\begin{bmatrix} F_{m/2}^* \\ F_{m/2}^* \end{bmatrix}\left(\frac{1}{2}\begin{bmatrix} F_{m/2} & F_{m/2} \end{bmatrix}\begin{bmatrix} F_{m/2}^* \\ F_{m/2}^* \end{bmatrix}\right)^{-1}\frac{1}{\sqrt{2}}\begin{bmatrix} F_{m/2} & F_{m/2} \end{bmatrix}\begin{bmatrix} x_{\text{up}} \\ x_{\text{down}} \end{bmatrix}$

135 Regular subsampling $\cdots = \frac{1}{2}\begin{bmatrix} F_{m/2}^* \\ F_{m/2}^* \end{bmatrix}\left(\frac{1}{2}\left(F_{m/2}F_{m/2}^* + F_{m/2}F_{m/2}^*\right)\right)^{-1}\left(F_{m/2}\,x_{\text{up}} + F_{m/2}\,x_{\text{down}}\right)$

136 Regular subsampling $\cdots = \frac{1}{2}\begin{bmatrix} F_{m/2}^* \\ F_{m/2}^* \end{bmatrix} I^{-1}\left(F_{m/2}\,x_{\text{up}} + F_{m/2}\,x_{\text{down}}\right)$

137 Regular subsampling $\cdots = \frac{1}{2}\begin{bmatrix} F_{m/2}^*\left(F_{m/2}\,x_{\text{up}} + F_{m/2}\,x_{\text{down}}\right) \\ F_{m/2}^*\left(F_{m/2}\,x_{\text{up}} + F_{m/2}\,x_{\text{down}}\right) \end{bmatrix}$

138 Regular subsampling $\cdots = \frac{1}{2}\begin{bmatrix} x_{\text{up}} + x_{\text{down}} \\ x_{\text{up}} + x_{\text{down}} \end{bmatrix}$
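
The aliasing in this formula can be reproduced numerically (a sketch using the unitary DFT built from numpy's FFT; the signal length is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
m = 8
F_full = np.fft.fft(np.eye(m), norm='ortho')       # unitary m-point DFT matrix
A = F_full[::2, :]                                 # regular subsampling: keep every other frequency
x = rng.standard_normal(m)
y = A @ x

x_l2 = A.conj().T @ np.linalg.solve(A @ A.conj().T, y)     # minimum l2-norm solution
x_up, x_down = x[: m // 2], x[m // 2 :]
predicted = np.concatenate([x_up + x_down, x_up + x_down]) / 2
print(np.allclose(x_l2, predicted))                # True: the two halves of the signal are averaged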

139 Minimum $\ell_1$-norm solution Problem: $\arg\min_{Ax = y} \|x\|_1$ doesn't have a closed form. Instead we can use a dual variable to certify optimality

140 Dual solution The solution $\nu^*$ to $\max_\nu\, y^T\nu$ subject to $\|A^T\nu\|_\infty \le 1$ satisfies $(A^T\nu^*)[i] = \operatorname{sign}(x^*[i])$ for all $x^*[i] \neq 0$, where $x^*$ is a solution to the primal problem $\min_x \|x\|_1$ subject to $Ax = y$

141 Dual certificate If there exists a vector $\nu \in \mathbb{R}^n$ such that $(A^T\nu)[i] = \operatorname{sign}(x^*[i])$ if $x^*[i] \neq 0$ and $|(A^T\nu)[i]| < 1$ if $x^*[i] = 0$, then $x^*$ is the unique solution to the primal problem $\min_x \|x\|_1$ subject to $Ax = y$, as long as the submatrix $A_T$ (the columns indexed by the support $T$ of $x^*$) is full rank

142 Proof 1 $\nu$ is feasible for the dual problem, so for any primal feasible $x$, $\|x\|_1 \ge y^T\nu$

143 Proof 1 $\nu$ is feasible for the dual problem, so for any primal feasible $x$, $\|x\|_1 \ge y^T\nu = (Ax^*)^T\nu$

144 Proof 1 ... $\|x\|_1 \ge y^T\nu = (Ax^*)^T\nu = (x^*)^T(A^T\nu)$

145 Proof 1 ... $\|x\|_1 \ge y^T\nu = (Ax^*)^T\nu = (x^*)^T(A^T\nu) = \sum_{i \in T} x^*[i]\operatorname{sign}(x^*[i])$

146 Proof 1 ... $\|x\|_1 \ge \sum_{i \in T} x^*[i]\operatorname{sign}(x^*[i]) = \|x^*\|_1$, so $x^*$ must be a solution

147 Proof 2 $A^T\nu$ is a subgradient of the $\ell_1$ norm at $x^*$. For any other feasible vector $x$: $\|x\|_1 \ge \|x^*\|_1 + (A^T\nu)^T(x - x^*)$

148 Proof 2 ... $\|x\|_1 \ge \|x^*\|_1 + (A^T\nu)^T(x - x^*) = \|x^*\|_1 + \nu^T(Ax - Ax^*)$

149 Proof 2 ... $\|x\|_1 \ge \|x^*\|_1 + \nu^T(Ax - Ax^*) = \|x^*\|_1$

150 Random subsampling

151 Minimum $\ell_1$-norm solution (random subsampling) [plot: true signal vs minimum $\ell_1$-norm solution]

152 Exact sparse recovery via $\ell_1$-norm minimization Assumption: there exists a signal $x^* \in \mathbb{R}^n$ with $s$ nonzeros such that $Ax^* = y$ for a random $A \in \mathbb{R}^{m \times n}$ (random Fourier, Gaussian iid, ...). Exact recovery: if the number of measurements satisfies $m \ge C\, s \log n$, the solution of the problem minimize $\|x\|_1$ subject to $Ax = y$ is the original signal with probability at least $1 - \frac{1}{n}$

153 Proof Show that a dual certificate always exists. We need $A_T^T\nu = \operatorname{sign}(x^*_T)$ ($s$ constraints) and $\|A_{T^c}^T\nu\|_\infty < 1$. Idea: impose $A_T^T\nu = \operatorname{sign}(x^*_T)$ and minimize $\|A_{T^c}^T\nu\|_\infty$. Problem: no closed-form solution. How about minimizing the $\ell_2$ norm instead?

154 Proof of exact recovery Prove that a dual certificate exists for any $s$-sparse $x^*$. Dual certificate candidate: the solution of minimize $\|v\|_2$ subject to $A_T^T v = \operatorname{sign}(x^*_T)$. Closed-form solution: $\nu_{\ell_2} := A_T\left(A_T^T A_T\right)^{-1}\operatorname{sign}(x^*_T)$. $A_T^T A_T$ is invertible with high probability. We need to prove that $A^T\nu_{\ell_2}$ satisfies $\|(A^T\nu_{\ell_2})_{T^c}\|_\infty < 1$
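
The least-squares certificate can be constructed and tested directly (illustrative numpy sketch; the dimensions are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
m, n, s = 100, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
support = rng.choice(n, s, replace=False)
sign_T = rng.choice([-1.0, 1.0], size=s)

A_T = A[:, support]
nu_l2 = A_T @ np.linalg.solve(A_T.T @ A_T, sign_T)     # minimum l2-norm v with A_T^T v = sign_T

corr = A.T @ nu_l2
off_support = np.setdiff1d(np.arange(n), support)
print(np.allclose(corr[support], sign_T))              # the equality constraints hold
print(np.max(np.abs(corr[off_support])) < 1)           # strict bound off the support (w.h.p.)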

155 Dual certificate [plot: sign pattern of the signal and the dual certificate $A^T\nu_{\ell_2}$]

156 Proof of exact recovery To control $(A^T\nu_{\ell_2})_{T^c}$ we need to bound, for $i \in T^c$, $A_i^T A_T\left(A_T^T A_T\right)^{-1}\operatorname{sign}(x^*_T)$. Let $w := A_T\left(A_T^T A_T\right)^{-1}\operatorname{sign}(x^*_T)$; for $i \in T^c$ the column $A_i$ is independent of $w$, so $A_i^T w$ can be bounded using this independence. The result then follows from the union bound