Sparse Optimization Lecture: Dual Certificate in $\ell_1$ Minimization

Instructor: Wotao Yin, July 2013. Notes scribed by Zheng Sun.

Those who complete this lecture will know
- what a dual certificate for $\ell_1$ minimization is
- that a strictly complementary dual certificate guarantees exact recovery
- that it also guarantees stable recovery
What is covered

- a review of dual certificates in the context of conic programming
- a condition that guarantees recovery of a set of sparse vectors (those sharing a support and sign pattern), not of all $k$-sparse vectors
- the condition depends on $\mathrm{sign}(x^o)$, but not on $x^o$ itself or $b$
- the condition is sufficient and necessary
- it also guarantees robust recovery against measurement errors
- the condition can be numerically verified (in polynomial time)

The underlying techniques are Lagrange duality, strict complementarity, and LP strong duality.

Results in this lecture are drawn from various papers. For references, see: H. Zhang, M. Yan, and W. Yin, "One condition for all: solution uniqueness and robustness of $\ell_1$-synthesis and $\ell_1$-analysis minimizations."
Lagrange dual for conic programs

Let $K_i$ be a nonnegative-orthant, second-order, or semidefinite cone. Each is self-dual. (If $a, b \in K_i$, then $a^T b \ge 0$; equality is possible with both $a$ and $b$ nonzero.)

Primal: $\min\ c^T x$ s.t. $Ax = b$, $x_i \in K_i\ \forall i$.

Lagrangian relaxation: $L(x; s) = c^T x + s^T(Ax - b)$

Dual function: $d(s) = \min\{L(x; s) : x_i \in K_i\ \forall i\} = -b^T s - \iota\{(A^T s + c)_i \in K_i\ \forall i\}$

Dual problem: $\max_s d(s) \iff \min_s\ b^T s$ s.t. $(A^T s + c)_i \in K_i\ \forall i$.

One problem might be simpler to solve than the other; solving one might help solve the other.
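To make the recipe concrete, here is the derivation specialized to $K_i = \mathbb{R}_+$ (the LP case), a step the slide does not spell out:
\[
\begin{aligned}
L(x; s) &= c^T x + s^T(Ax - b) = (c + A^T s)^T x - b^T s,\\
\min_{x \ge 0}\,(c + A^T s)^T x &=
\begin{cases} 0 & \text{if } c + A^T s \ge 0,\\ -\infty & \text{otherwise,}\end{cases}
\end{aligned}
\]
so $d(s) = -b^T s$ on $\{s : A^T s + c \ge 0\}$, and the dual problem is $\max_s -b^T s$, i.e., $\min_s b^T s$ s.t. $A^T s + c \ge 0$: the familiar LP dual.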
Dual certificate

Given that $x^*$ is primal feasible, i.e., it obeys $Ax^* = b$ and $x^*_i \in K_i\ \forall i$:

Question: is $x^*$ optimal?

Answer: one does not need to compare $x^*$ to all other feasible $x$. A dual vector $y^*$ will certify the optimality of $x^*$.
Dual certificate

Theorem. Suppose $x^*$ is feasible (i.e., $Ax^* = b$, $x^*_i \in K_i\ \forall i$). If $s^*$ obeys
1. vanished duality gap: $c^T x^* + b^T s^* = 0$, and
2. dual feasibility: $(A^T s^* + c)_i \in K_i\ \forall i$,
then $x^*$ is primal optimal.

Proof. Pick any primal feasible $x$ (i.e., $Ax = b$, $x_i \in K_i\ \forall i$). We have
\[(c + A^T s^*)^T x = \sum_i \underbrace{(c + A^T s^*)_i^T}_{\in K_i}\underbrace{x_i}_{\in K_i} \ \ge\ 0,\]
and thus, due to $Ax = b$,
\[c^T x = (c + A^T s^*)^T x - (A^T s^*)^T x \ \ge\ -(A^T s^*)^T x = -b^T s^* = c^T x^*.\]
Therefore, $x^*$ is optimal. $\square$

Corollary: $(c + A^T s^*)^T x^* = 0$ and $(c + A^T s^*)_i^T x^*_i = 0\ \forall i$.

Bottom line: the dual vector $y^* = A^T s^*$ certifies the optimality of $x^*$.
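The theorem is easy to check numerically. Below is a minimal sketch for the LP case $K_i = \mathbb{R}_+$; the function name, tolerances, and toy data are ours, not from the lecture:

```python
import numpy as np

def is_dual_certificate(A, b, c, x, s, tol=1e-8):
    """Check, for K_i = R_+ (the LP case), whether s certifies optimality of x:
    primal feasibility, dual feasibility A^T s + c >= 0, and vanished duality
    gap c^T x + b^T s = 0 (the sign convention of L(x; s) = c^T x + s^T (Ax - b))."""
    primal_feasible = np.allclose(A @ x, b, atol=tol) and np.all(x >= -tol)
    dual_feasible = np.all(A.T @ s + c >= -tol)
    zero_gap = abs(c @ x + b @ s) <= tol
    return primal_feasible and dual_feasible and zero_gap

# Tiny made-up example: min x1 + 2*x2  s.t.  x1 + x2 = 1,  x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x_star = np.array([1.0, 0.0])   # optimal: all weight on the cheaper variable
s_star = np.array([-1.0])       # A^T s + c = [0, 1] >= 0 and c^T x + b^T s = 0
print(is_dual_certificate(A, b, c, x_star, s_star))  # True
```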
Dual certificate

A related claim:

Theorem. If a primal feasible $x$ and a dual feasible $s$ have no duality gap, then $x$ is primal optimal and $s$ is dual optimal.

Reason: by weak duality, the primal objective value of any primal feasible $x$ is no less than the dual objective value of any dual feasible $s$. Therefore, assuming both primal and dual feasibility, a pair of equal primal/dual objectives must be optimal.
Complementarity and strict complementarity

From
\[\sum_i (c + A^T s^*)_i^T x^*_i = (c + A^T s^*)^T x^* = c^T x^* + b^T s^* = 0\]
and
\[\underbrace{(c + A^T s^*)_i^T}_{\in K_i}\underbrace{x^*_i}_{\in K_i} \ \ge\ 0\ \ \forall i,\]
we get
\[(c + A^T s^*)_i^T x^*_i = 0\ \ \forall i.\]

Hence, at least one of $(c + A^T s^*)_i$ and $x^*_i$ is $0$ (but they can both be zero).

If exactly one of $(c + A^T s^*)_i$ and $x^*_i$ is zero (the other is nonzero), then they are strictly complementary.

Certifying the uniqueness of $x^*$ requires a strictly complementary $s^*$.
$\ell_1$ duality and dual certificate

Primal:
\[\min\ \|x\|_1 \quad \text{s.t.}\ Ax = b \tag{1}\]

Dual:
\[\max\ b^T s \quad \text{s.t.}\ \|A^T s\|_\infty \le 1\]

Given a feasible $x^*$, if $s^*$ obeys
1. $\|A^T s^*\|_\infty \le 1$, and
2. $\|x^*\|_1 - b^T s^* = 0$,
then $y^* = A^T s^*$ certifies the optimality of $x^*$.

Any primal optimal $x^*$ must satisfy $\|x^*\|_1 - b^T s^* = 0$.
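One way to obtain $s^*$ in practice is to solve the dual LP directly. A sketch using scipy.optimize.linprog; the reformulation of $\|A^T s\|_\infty \le 1$ as $2n$ linear inequalities is standard, and the example data are made up:

```python
import numpy as np
from scipy.optimize import linprog

def l1_dual_certificate(A, b, x, tol=1e-8):
    """Solve the dual LP  max b^T s  s.t.  ||A^T s||_inf <= 1  and test whether
    the optimal s certifies x, i.e., whether ||x||_1 = b^T s (zero duality gap)."""
    m, n = A.shape
    # linprog minimizes, so minimize -b^T s;  A^T s <= 1  and  -A^T s <= 1.
    res = linprog(-b, A_ub=np.vstack([A.T, -A.T]), b_ub=np.ones(2 * n),
                  bounds=[(None, None)] * m, method="highs")
    s = res.x
    certified = abs(np.abs(x).sum() - b @ s) <= tol
    return s, certified

# Made-up example: x = (1, 0) solves min ||x||_1 s.t. x1 + 0.5*x2 = 1.
A = np.array([[1.0, 0.5]])
b = np.array([1.0])
s, ok = l1_dual_certificate(A, b, np.array([1.0, 0.0]))
print(s, ok)  # expect s = [1.0], so A^T s = [1, 0.5], and ok = True
```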
$\ell_1$ duality and complementarity

Scalar fact: if $|a| \le 1$, then $ab \le |b|$. If, in addition, $ab = |b|$, then
1. $|a| < 1 \implies b = 0$
2. $a = 1 \implies b \ge 0$
3. $a = -1 \implies b \le 0$

From $\|A^T s^*\|_\infty \le 1$, we get
\[\|x^*\|_1 = b^T s^* = (A^T s^*)^T x^* \le \|x^*\|_1\]
and $(A^T s^*)_i\, x^*_i = |x^*_i|\ \forall i$. Therefore,
1. if $|(A^T s^*)_i| < 1$, then $x^*_i = 0$
2. if $(A^T s^*)_i = 1$, then $x^*_i \ge 0$
3. if $(A^T s^*)_i = -1$, then $x^*_i \le 0$

Strict complementarity holds if, for each $i$, either $1 - |(A^T s^*)_i|$ or $x^*_i$ is zero but not both.
Uniqueness of $x^*$

Suppose $x^*$ is a solution to the basis pursuit model. Question: is it the unique solution?

Define $I := \mathrm{supp}(x^*) = \{i : x^*_i \neq 0\}$ and $J = I^c$.

If $s^*$ is a dual certificate and $\|(A^T s^*)_J\|_\infty < 1$, then $x_J = 0$ for all optimal $x$.

For $i \in I$, $(A^T s^*)_i = \pm 1$ cannot determine whether $x_i \neq 0$ for optimal $x$: it is possible that $(A^T s^*)_i = \pm 1$ yet $x^*_i = 0$ (this is called degenerate).

On the other hand, if $A_I x_I = b$ has a unique solution, denoted by $x^*_I$, then since $x_J = 0$ is unique, $x^* = [x^*_I; x^*_J] = [x^*_I; 0]$ is the unique solution to the basis pursuit model.

$A_I x_I = b$ has a unique solution provided that $A_I$ has independent columns, or equivalently, $\ker(A_I) = \{0\}$.
Optimality and uniqueness

Condition. For a given $\bar{x}$, the index sets $I = \mathrm{supp}(\bar{x})$ and $J = I^c$ satisfy
1. $\ker(A_I) = \{0\}$
2. there exists $y$ such that $y \in \mathcal{R}(A^T)$, $y_I = \mathrm{sign}(\bar{x}_I)$, and $\|y_J\|_\infty < 1$.

Comments:
- part 1 guarantees that $\bar{x}_I$ is the unique solution to $A_I x_I = b$
- part 2 guarantees $x_J = 0$
- $y \in \mathcal{R}(A^T)$ means $y = A^T s$ for some $s$
- the Condition involves $I$ and $\mathrm{sign}(\bar{x}_I)$, not the values of $\bar{x}_I$ or $b$; but different $I$ and $\mathrm{sign}(\bar{x}_I)$ require a different condition
- RIP guarantees that the Condition holds for all small $I$ and arbitrary signs
- the Condition is easy to verify (see the sketch below)
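As claimed above, the Condition is verifiable in polynomial time: part 1 is a rank check and part 2 is a small LP. A sketch; the LP formulation and helper are ours, with SciPy's linprog as the solver:

```python
import numpy as np
from scipy.optimize import linprog

def verify_condition(A, x_bar, tol=1e-8):
    """Numerically verify the Condition for a given x_bar.
    Part 1: ker(A_I) = {0}, i.e., A_I has full column rank.
    Part 2: solve  min_{s,t} t  s.t.  (A^T s)_I = sign(x_bar_I), |(A^T s)_J| <= t,
    and check that the optimum is < 1, i.e., some y = A^T s has
    y_I = sign(x_bar_I) and ||y_J||_inf < 1."""
    m, n = A.shape
    I = np.flatnonzero(np.abs(x_bar) > tol)
    J = np.setdiff1d(np.arange(n), I)
    part1 = np.linalg.matrix_rank(A[:, I]) == len(I)
    # Variables z = [s; t]: s in R^m, t a scalar bounding ||(A^T s)_J||_inf.
    c = np.r_[np.zeros(m), 1.0]
    AJ_T = A[:, J].T                                   # rows are (A^T)_j, j in J
    A_ub = np.block([[AJ_T, -np.ones((len(J), 1))],    #  (A^T s)_j - t <= 0
                     [-AJ_T, -np.ones((len(J), 1))]])  # -(A^T s)_j - t <= 0
    b_ub = np.zeros(2 * len(J))
    A_eq = np.c_[A[:, I].T, np.zeros(len(I))]          # (A^T s)_I = sign(x_bar_I)
    b_eq = np.sign(x_bar[I])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * m + [(0, None)], method="highs")
    part2 = res.status == 0 and res.fun < 1 - tol
    return part1 and part2
```

For instance, with the toy data above ($A = [1,\ 0.5]$, $\bar{x} = (1, 0)$), verify_condition returns True: $A_I = [1]$ has full column rank, and the LP optimum is $t^* = 0.5 < 1$.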
Optimality and uniqueness

Theorem. Suppose $\bar{x}$ obeys $A\bar{x} = b$ and the above Condition. Then $\bar{x}$ is the unique solution to $\min\{\|x\|_1 : Ax = b\}$.

In fact, the converse is also true; namely, the Condition is also necessary.
Uniqueness of $x^*$

Part 1, $\ker(A_I) = \{0\}$, is necessary.

Lemma. If $0 \neq h \in \ker(A_I)$, then $x^\alpha = x^* + \alpha[h; 0]$ is optimal for all sufficiently small $\alpha$.

Proof. $x^\alpha$ is feasible since $Ax^\alpha = Ax^* = b$. We know $\|x^\alpha\|_1 \ge \|x^*\|_1$. But for small $\alpha$ around $0$, the entries of $x^*_I + \alpha h$ keep the signs of $x^*_I$, so we also have
\[\|x^\alpha\|_1 = \|x^*_I + \alpha h\|_1 = (A^T s^*)_I^T (x^*_I + \alpha h) = \|x^*\|_1 + \alpha\,(A^T s^*)_I^T h.\]
Since this holds for $\alpha$ of both signs, $(A^T s^*)_I^T h = 0$ and thus $\|x^\alpha\|_1 = \|x^*\|_1$. So, $x^\alpha$ is also optimal. $\square$
Necessity

Is part 2 necessary? Introduce
\[\min_y\ \|y_J\|_\infty \quad \text{s.t.}\ y \in \mathcal{R}(A^T),\ y_I = \mathrm{sign}(\bar{x}_I). \tag{2}\]

If the optimal objective value of (2) is $< 1$, then there exists $y$ obeying part 2; so, to prove necessity, it suffices to show that uniqueness of $\bar{x}$ forces the optimal value of (2) below $1$.

We shall translate (2) and rewrite $y \in \mathcal{R}(A^T)$.
Necessity

Define $a = [\mathrm{sign}(\bar{x}_I); 0]$ and a basis $Q$ of $\mathrm{Null}(A)$.

If $a \in \mathcal{R}(A^T)$, set $y = a$; done. Otherwise, let $y = a + z$. Then
- $y \in \mathcal{R}(A^T) \iff Q^T y = 0 \iff Q^T z = -Q^T a$
- $y_I = \mathrm{sign}(\bar{x}_I) = a_I \iff z_I = 0$
- $a_J = 0 \implies y_J = z_J$

Equivalent problem:
\[\min_z\ \|z_J\|_\infty \quad \text{s.t.}\ Q^T z = -Q^T a,\ z_I = 0. \tag{3}\]

If the optimal objective value of (3) is $< 1$, then part 2 is necessary.
Necessity

Theorem (LP strong duality). If a linear program has a finite solution, its Lagrange dual has a finite solution. The two solutions achieve the same primal and dual optimal objectives.

Problem (3) is feasible and has a finite objective value. The dual of (3) is
\[\max_p\ (Q^T a)^T p \quad \text{s.t.}\ \|(Qp)_J\|_1 \le 1.\]

If its optimal objective value is $< 1$, then part 2 is necessary.
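For completeness, here is how the stated dual of (3) can be derived (a step the slide omits); it uses the fact that $\|\cdot\|_\infty$ and $\|\cdot\|_1$ are dual norms:
\[
\begin{aligned}
L(z; p) &= \|z_J\|_\infty + p^T(Q^T z + Q^T a),\\
\min_{z:\ z_I = 0} L(z; p) &= (Q^T a)^T p + \min_{z_J}\left(\|z_J\|_\infty + (Qp)_J^T z_J\right)\\
&= \begin{cases} (Q^T a)^T p & \text{if } \|(Qp)_J\|_1 \le 1,\\ -\infty & \text{otherwise.}\end{cases}
\end{aligned}
\]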
Necessity

Lemma. If $x^*$ is unique, then the optimal objective of the following primal-dual pair is strictly less than $1$:
\[\min_z\ \|z_J\|_\infty \quad \text{s.t.}\ Q^T z = -Q^T a,\ z_I = 0,\]
\[\max_p\ (Q^T a)^T p \quad \text{s.t.}\ \|(Qp)_J\|_1 \le 1.\]

Proof. Define $a = \mathrm{sign}(x^*)$. Uniqueness of $x^*$ implies that, for $h \in \ker(A) \setminus \{0\}$,
\[\|x^*\|_1 < \|x^* + h\|_1 \ \implies\ a_I^T h_I < \|h_J\|_1.\]
Let $p^*$ be dual optimal (by strong duality, the dual value equals $\|z^*_J\|_\infty$). Therefore,
- if $p^* = 0$, then $\|z^*_J\|_\infty = (Q^T a)^T p^* = 0$;
- if $p^* \neq 0$, then $h := Qp^* \in \ker(A) \setminus \{0\}$ obeys
\[\|z^*_J\|_\infty = (Q^T a)^T p^* = a_I^T h_I < \|h_J\|_1 = \|(Qp^*)_J\|_1 \le 1.\]
In both cases, the optimal objective value is $< 1$. $\square$
Theorem. Suppose $\bar{x}$ obeys $A\bar{x} = b$. Then $\bar{x}$ is the unique solution to $\min\{\|x\|_1 : Ax = b\}$ if and only if the Condition holds.

Comments:
- the uniqueness direction requires the strong duality result for problems involving $\|z_J\|_\infty$
- strong duality does not hold for all convex programs
- strong duality does hold for convex polyhedral functions $f(z_J)$, as well as for those with constraint qualifications (e.g., the Slater condition)
- indeed, the theorem generalizes to analysis $\ell_1$ minimization: $\|\Psi^T x\|_1$
- does it generalize to $\sum_i \|x_{G_i}\|_2$ or $\|X\|_*$? The key is strong duality for $\|\cdot\|_2$ and $\|\cdot\|_*$
- also, the theorem generalizes to the noisy $\ell_1$ models (next part...)
Noisy measurements

Suppose $b$ is contaminated by noise: $b = Ax + w$. Appropriate models to recover a sparse $x$ include
\[\min\ \lambda\|x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2 \tag{4}\]
\[\min\ \|x\|_1 \quad \text{s.t.}\ \|Ax - b\|_2 \le \delta \tag{5}\]

Theorem. Suppose $x^*$ is a solution to either (4) or (5). Then $x^*$ is the unique solution if and only if the Condition holds for $x^*$.

Key intuition: reduce (4) to (1) with a specific $b$. Let $\hat{x}$ be any solution to (4) and $\bar{b} := A\hat{x}$. All solutions to (4) are solutions to
\[\min\ \|x\|_1 \quad \text{s.t.}\ Ax = \bar{b}.\]
The same applies to (5). Recall that the Condition does not involve $b$.
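Any convex-programming solver handles (4) and (5). As one concrete, deliberately minimal option for (4), a proximal-gradient (ISTA) sketch; the solver choice and the synthetic test data are ours:

```python
import numpy as np

def ista_lasso(A, b, lam, iters=5000):
    """Minimize  lam * ||x||_1 + 0.5 * ||Ax - b||_2^2  by proximal gradient
    descent (ISTA): a gradient step on the quadratic, then soft-thresholding."""
    x = np.zeros(A.shape[1])
    t = 1.0 / np.linalg.norm(A, 2) ** 2            # step size 1/L, L = ||A||_2^2
    for _ in range(iters):
        z = x - t * A.T @ (A @ x - b)              # gradient step on the quadratic
        x = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)  # prox of lam*||.||_1
    return x

# Tiny synthetic test with a sparse ground truth and small noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x0 = np.zeros(100); x0[[3, 17, 42]] = [2.0, -1.5, 1.0]
b = A @ x0 + 0.01 * rng.standard_normal(40)
x_hat = ista_lasso(A, b, lam=0.1)
print(np.flatnonzero(np.abs(x_hat) > 0.1))         # expect roughly {3, 17, 42}
```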
Stable recovery

Assumptions:
- $\bar{x}$ and $y$ satisfy the Condition
- $\bar{x}$ is the original signal
- $b = A\bar{x} + w$, where $\|w\|_2 \le \delta$
- $x^*$ is the solution to $\min\ \|x\|_1$ s.t. $\|Ax - b\|_2 \le \delta$

Goal: obtain a bound $\|x^* - \bar{x}\|_2 \le C\delta$. The constant $C$ shall be independent of $\delta$.
Stable recovery

Lemma. Define $I = \mathrm{supp}(\bar{x})$ and $J = I^c$. Then
\[\|x^* - \bar{x}\|_1 \le C_3\,\delta + C_4\,\|x^*_J\|_1,\]
where $C_3 = 2\sqrt{|I|}\,r(I)$ and $C_4 = \|A\|\sqrt{|I|}\,r(I) + 1$.

Proof.
\[\|x^* - \bar{x}\|_1 = \|x^*_I - \bar{x}_I\|_1 + \|x^*_J\|_1,\]
\[\|x^*_I - \bar{x}_I\|_1 \le \sqrt{|I|}\,\|x^*_I - \bar{x}_I\|_2 \le \sqrt{|I|}\,r(I)\,\|A_I(x^*_I - \bar{x}_I)\|_2,\]
where
\[r(I) := \sup_{\mathrm{supp}(u) = I,\ u \neq 0} \frac{\|u\|_2}{\|Au\|_2}\]
($r(I)$ is related to one side of the RIP bound). Introduce $\hat{x} = [x^*_I; 0]$. Then
\[\|A_I(x^*_I - \bar{x}_I)\|_2 = \|A(\hat{x} - \bar{x})\|_2 \le \|A(\hat{x} - x^*)\|_2 + \underbrace{\|A(x^* - \bar{x})\|_2}_{\le\, 2\delta}\]
(the last term is at most $\|Ax^* - b\|_2 + \|b - A\bar{x}\|_2 \le 2\delta$), and
\[\|A(\hat{x} - x^*)\|_2 \le \|A\|\,\|\hat{x} - x^*\|_2 \le \|A\|\,\|\hat{x} - x^*\|_1 = \|A\|\,\|x^*_J\|_1.\]
Combining the three displays yields the lemma. $\square$
Stable recovery

Recall that in the Condition, $y_I = \mathrm{sign}(\bar{x}_I)$ and $\|y_J\|_\infty < 1$. Then
\[\|x^*_I\|_1 \ge \langle y_I, x^*_I\rangle \quad \text{and} \quad \|x^*_J\|_1 \le (1 - \|y_J\|_\infty)^{-1}\left(\|x^*_J\|_1 - \langle y_J, x^*_J\rangle\right).\]
Therefore,
\[\|x^*_J\|_1 \le (1 - \|y_J\|_\infty)^{-1}\left(\|x^*\|_1 - \langle y, x^*\rangle\right) = (1 - \|y_J\|_\infty)^{-1}\, d_y(x^*, \bar{x}),\]
where $d_y(x^*, \bar{x}) = \|x^*\|_1 - \|\bar{x}\|_1 - \langle y, x^* - \bar{x}\rangle$ is the Bregman distance induced by $\|\cdot\|_1$.

Recall that in the Condition, $y \in \mathcal{R}(A^T)$, so $y = A^T\beta$ for some vector $\beta$. Then
\[d_y(x^*, \bar{x}) \le 2\|\beta\|_2\,\delta.\]

Lemma. Under the above assumptions, $\|x^*_J\|_1 \le 2(1 - \|y_J\|_\infty)^{-1}\|\beta\|_2\,\delta$.
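The bound $d_y(x^*, \bar{x}) \le 2\|\beta\|_2\delta$ combines the optimality of $x^*$ with the feasibility of $\bar{x}$ for (5); spelled out (this step is implicit on the slide):
\[
d_y(x^*, \bar{x}) = \underbrace{\|x^*\|_1 - \|\bar{x}\|_1}_{\le\, 0} - \beta^T A(x^* - \bar{x})
\le \|\beta\|_2\,\|A(x^* - \bar{x})\|_2
\le \|\beta\|_2\left(\|Ax^* - b\|_2 + \|b - A\bar{x}\|_2\right) \le 2\|\beta\|_2\,\delta.
\]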
Stable recovery

Theorem.
Assumptions:
- $\bar{x}$ and $y$ satisfy the Condition; $\bar{x}$ is the original signal; $y = A^T\beta$
- $b = A\bar{x} + w$, where $\|w\|_2 \le \delta$
- $x^*$ is the solution to $\min\ \|x\|_1$ s.t. $\|Ax - b\|_2 \le \delta$

Conclusion: $\|x^* - \bar{x}\|_1 \le C\delta$, where
\[C = 2\sqrt{|I|}\,r(I) + \frac{2\|\beta\|_2\left(\|A\|\sqrt{|I|}\,r(I) + 1\right)}{1 - \|y_J\|_\infty}.\]

Comment: a similar bound can be obtained for $\min\ \lambda\|x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2$ with a condition on $\lambda$.
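The constant $C$ is computable from the problem data. A sketch (our helper, not from the lecture); it uses $r(I) = 1/\sigma_{\min}(A_I)$, which follows from the definition of $r(I)$ when $\ker(A_I) = \{0\}$:

```python
import numpy as np

def stability_constant(A, x_bar, y, beta, tol=1e-12):
    """Evaluate C in ||x* - x_bar||_1 <= C * delta, given a certificate pair
    (y, beta) with y = A^T beta from the Condition."""
    I = np.flatnonzero(np.abs(x_bar) > tol)
    J = np.setdiff1d(np.arange(A.shape[1]), I)
    r_I = 1.0 / np.linalg.svd(A[:, I], compute_uv=False).min()  # r(I) = 1/sigma_min(A_I)
    C3 = 2.0 * np.sqrt(len(I)) * r_I
    C4 = np.linalg.norm(A, 2) * np.sqrt(len(I)) * r_I + 1.0
    return C3 + 2.0 * np.linalg.norm(beta) * C4 / (1.0 - np.abs(y[J]).max())
```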
Generalization

All the previous results (exact and stable recovery) generalize to the following models:
\[\min\ \|\Psi^T x\|_1 \quad \text{s.t.}\ Ax = b\]
\[\min\ \lambda\|\Psi^T x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2\]
\[\min\ \|\Psi^T x\|_1 \quad \text{s.t.}\ \|Ax - b\|_2 \le \delta\]

Assume that $A$ and $\Psi$ each have independent rows. The updated conditions are:

Condition. For a given $\bar{x}$, the index sets $I = \mathrm{supp}(\Psi^T\bar{x})$ and $J = I^c$ satisfy
1. $\ker(\Psi_J^T) \cap \ker(A) = \{0\}$
2. there exists $y$ such that $\Psi y \in \mathcal{R}(A^T)$, $y_I = \mathrm{sign}(\Psi_I^T\bar{x})$, and $\|y_J\|_\infty < 1$.