Sparse Optimization Lecture: Dual Certificate in $\ell_1$ Minimization


Instructor: Wotao Yin, July 2013. Note scriber: Zheng Sun.

Those who complete this lecture will know
- what a dual certificate for $\ell_1$ minimization is
- that a strictly complementary dual certificate guarantees exact recovery
- that it also guarantees stable recovery

What is covered

- A review of dual certificates in the context of conic programming
- A condition that guarantees recovery of a set of sparse vectors (whose entries have the same signs), not of all $k$-sparse vectors
- The condition depends on $\mathrm{sign}(x^o)$, but not on $x^o$ itself or $b$
- The condition is both sufficient and necessary
- It also guarantees robust recovery against measurement errors
- The condition can be verified numerically (in polynomial time)
- The underlying techniques are Lagrange duality, strict complementarity, and LP strong duality

Results in this lecture are drawn from various papers. For references, see: H. Zhang, M. Yan, and W. Yin, "One condition for all: solution uniqueness and robustness of $\ell_1$-synthesis and $\ell_1$-analysis minimizations".

Lagrange dual for conic programs

Let each $K_i$ be a first-orthant, second-order, or semidefinite cone. Such cones are self-dual. (If $a, b \in K_i$, then $a^T b \ge 0$.)

Primal:
    $\min_x\ c^T x$  s.t.  $Ax = b$, $x_i \in K_i\ \forall i$

Lagrangian relaxation:
    $L(x; s) = c^T x + s^T(Ax - b)$

Dual function:
    $d(s) = \min\{L(x; s) : x_i \in K_i\ \forall i\} = -b^T s - \iota\{(A^T s + c)_i \in K_i\ \forall i\}$

Dual problem:
    $\max_s\ d(s) \iff \min_s\ b^T s$  s.t.  $(A^T s + c)_i \in K_i\ \forall i$

One problem might be simpler to solve than the other; solving one might help solve the other.
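The self-duality fact above can be sanity-checked numerically. The sketch below (illustrative, not part of the lecture) samples points of a second-order cone $\{(t, v) : t \ge \|v\|_2\}$ and verifies that inner products of cone members are nonnegative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_soc(dim):
    """Sample a point of the second-order cone {(t, v) : t >= ||v||_2}."""
    v = rng.standard_normal(dim - 1)
    t = np.linalg.norm(v) + rng.random()   # any t >= ||v||_2 lies in the cone
    return np.concatenate(([t], v))

# Self-duality implies a^T b >= 0 for any two cone members:
# t_a*t_b + v_a.v_b >= ||v_a||*||v_b|| + v_a.v_b >= 0 by Cauchy-Schwarz.
ok = all(sample_soc(5) @ sample_soc(5) >= 0 for _ in range(1000))
```

Note that $a^T b = 0$ with both $a, b \ne 0$ is possible on the cone boundary, which is why certifying uniqueness later requires strict complementarity.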

Dual certificate

Given that $x^*$ is primal feasible, i.e., $Ax^* = b$ and $x_i^* \in K_i\ \forall i$: is $x^*$ optimal?

Answer: one does not need to compare $x^*$ against all other feasible $x$. A dual vector $y$ will certify the optimality of $x^*$.

Dual certificate

Theorem. Suppose $x^*$ is feasible (i.e., $Ax^* = b$, $x_i^* \in K_i\ \forall i$). If $s^*$ obeys
1. vanished duality gap: $b^T s^* = -c^T x^*$, and
2. dual feasibility: $(A^T s^* + c)_i \in K_i\ \forall i$,
then $x^*$ is primal optimal.

Proof. Pick any primal feasible $x$ (i.e., $Ax = b$, $x_i \in K_i\ \forall i$). Since $(c + A^T s^*)_i \in K_i$ and $x_i \in K_i$, we have
    $(c + A^T s^*)^T x = \sum_i (c + A^T s^*)_i^T x_i \ge 0,$
and thus, due to $Ax = b$,
    $c^T x = (c + A^T s^*)^T x - (A^T s^*)^T x \ge -(A^T s^*)^T x = -b^T s^* = c^T x^*.$
Therefore, $x^*$ is optimal.

Corollary: $(c + A^T s^*)^T x^* = 0$ and $(c + A^T s^*)_i^T x_i^* = 0\ \forall i$.

Bottom line: the dual vector $y = A^T s^*$ certifies the optimality of $x^*$.
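As a concrete check of the theorem, the sketch below (hypothetical data, with $K$ the first orthant, so the conic program is a standard-form LP) verifies the two certificate conditions for a candidate pair $(x^*, s^*)$ and cross-checks optimality with scipy's LP solver:

```python
import numpy as np
from scipy.optimize import linprog

# Tiny conic program with K = first orthant (an LP in standard form):
#   min c^T x  s.t.  A x = b,  x >= 0.  Data chosen so the certificate
# can be checked by hand: x* = (1, 0) with dual certificate s* = -1.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

x_star = np.array([1.0, 0.0])   # candidate primal solution
s_star = np.array([-1.0])       # candidate dual certificate

# 1. primal feasibility
assert np.allclose(A @ x_star, b) and np.all(x_star >= 0)
# 2. vanished duality gap: b^T s* = -c^T x*
assert np.isclose(b @ s_star, -(c @ x_star))
# 3. dual feasibility: c + A^T s* lies in K (componentwise >= 0)
slack = c + A.T @ s_star
assert np.all(slack >= 0)
# Complementarity follows: slack_i * x*_i = 0 for every i
assert np.allclose(slack * x_star, 0)

# Cross-check optimality against an LP solver
res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 2)
```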

Dual certificate

A related claim:

Theorem. If a primal feasible $x$ and a dual feasible $s$ have no duality gap, then $x$ is primal optimal and $s$ is dual optimal.

Reason: by weak duality, the primal objective value of any primal feasible $x$ is no less than the dual objective value of any dual feasible $s$. Therefore, assuming both primal and dual feasibility, a pair of equal primal/dual objectives must be optimal.

Complementarity and strict complementarity

From
    $\sum_i (c + A^T s^*)_i^T x_i^* = (c + A^T s^*)^T x^* = c^T x^* + b^T s^* = 0$
and
    $(c + A^T s^*)_i^T x_i^* \ge 0\ \forall i$  (both factors lie in $K_i$),
we get
    $(c + A^T s^*)_i^T x_i^* = 0\ \forall i.$

Hence, at least one of $(c + A^T s^*)_i$ and $x_i^*$ is $0$ (but they can both be zero). If exactly one of $(c + A^T s^*)_i$ and $x_i^*$ is zero (the other is nonzero), then they are strictly complementary.

Certifying the uniqueness of $x^*$ requires a strictly complementary $s^*$.

$\ell_1$ duality and dual certificate

Primal:
    $\min\ \|x\|_1$  s.t.  $Ax = b$   (1)
Dual:
    $\max\ b^T s$  s.t.  $\|A^T s\|_\infty \le 1$

Given a feasible $x^*$, if $s^*$ obeys
1. $\|A^T s^*\|_\infty \le 1$, and
2. $\|x^*\|_1 - b^T s^* = 0$,
then $y = A^T s^*$ certifies the optimality of $x^*$. Conversely, any primal optimal $x^*$ must satisfy $\|x^*\|_1 - b^T s^* = 0$.

$\ell_1$ duality and complementarity

For scalars $a$ and $b$ with $|a| \le 1$, we have $|b| \ge ab$. If $ab = |b|$, then
1. $|a| < 1 \implies b = 0$
2. $a = 1 \implies b \ge 0$
3. $a = -1 \implies b \le 0$

From $\|A^T s^*\|_\infty \le 1$, we get
    $\|x^*\|_1 = b^T s^* = (A^T s^*)^T x^* \le \|x^*\|_1$
and hence $(A^T s^*)_i x_i^* = |x_i^*|\ \forall i$. Therefore,
1. if $|(A^T s^*)_i| < 1$, then $x_i^* = 0$
2. if $(A^T s^*)_i = 1$, then $x_i^* \ge 0$
3. if $(A^T s^*)_i = -1$, then $x_i^* \le 0$

Strict complementarity holds if, for each $i$, exactly one of $1 - |(A^T s^*)_i|$ and $x_i^*$ is zero.
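These componentwise relations can be observed on a small instance. The data below are illustrative (my own, not from the lecture); $s^*$ is a valid dual certificate for $x^*$, and the off-support coordinate is strictly complementary since $|(A^T s^*)_3| = 0.6 < 1$:

```python
import numpy as np

# Illustrative data where s* certifies x* for min ||x||_1 s.t. Ax = b
A = np.array([[1.0, 0.0, 0.3],
              [0.0, 1.0, 0.3]])
x_star = np.array([1.0, 1.0, 0.0])
b = A @ x_star
s_star = np.array([1.0, 1.0])      # dual certificate
y = A.T @ s_star                   # y = A^T s*

# certificate conditions
assert np.abs(y).max() <= 1 + 1e-12                   # ||A^T s*||_inf <= 1
assert np.isclose(np.abs(x_star).sum(), b @ s_star)   # zero duality gap

# componentwise complementarity: (A^T s*)_i x*_i = |x*_i|
assert np.allclose(y * x_star, np.abs(x_star))
# strict complementarity: exactly one of 1 - |y_i| and x*_i vanishes
strict = all(np.isclose(abs(yi), 1) != np.isclose(xi, 0)
             for yi, xi in zip(y, x_star))
```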

Uniqueness of $x^*$

Suppose $x^*$ is a solution to the basis pursuit model. Question: is it the unique solution?

Define $I := \mathrm{supp}(x^*) = \{i : x_i^* \ne 0\}$ and $J = I^c$.

- If $s^*$ is a dual certificate and $\|(A^T s^*)_J\|_\infty < 1$, then $x_J = 0$ for all optimal $x$.
- For $i \in I$, $(A^T s^*)_i = \pm 1$ alone cannot determine whether $x_i = 0$ for optimal $x$. It is possible that $(A^T s^*)_i = \pm 1$ yet $x_i^* = 0$ (this is called degenerate).
- On the other hand, if $A_I x_I = b$ has a unique solution, denoted by $x_I^*$, then since $x_J = 0$ is unique, $x^* = [x_I^*; x_J^*] = [x_I^*; 0]$ is the unique solution to the basis pursuit model.
- $A_I x_I = b$ has a unique solution provided that $A_I$ has independent columns, or equivalently, $\ker(A_I) = \{0\}$.

Optimality and uniqueness

Condition. For a given $\bar x$, the index sets $I = \mathrm{supp}(\bar x)$ and $J = I^c$ satisfy
1. $\ker(A_I) = \{0\}$;
2. there exists $y$ such that $y \in \mathcal{R}(A^T)$, $y_I = \mathrm{sign}(\bar x_I)$, and $\|y_J\|_\infty < 1$.

Comments:
- part 1 guarantees that $\bar x_I$ is the unique solution to $A_I x_I = b$
- part 2 guarantees that $x_J = 0$
- $y \in \mathcal{R}(A^T)$ means $y = A^T s$ for some $s$
- the condition involves $I$ and $\mathrm{sign}(\bar x_I)$, not the values of $\bar x_I$ or $b$; but different $I$ and $\mathrm{sign}(\bar x_I)$ require a different condition
- RIP guarantees that the condition holds for all small $I$ and arbitrary signs
- the condition is easy to verify
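The verification is indeed polynomial-time: part 1 is a rank check, and part 2 is the optimal value of a linear program (minimize $\|y_J\|_\infty$ over $y = A^T s$ with $y_I$ fixed). A sketch using scipy.optimize.linprog; the function name and the epigraph formulation are my own:

```python
import numpy as np
from scipy.optimize import linprog

def check_condition(A, x_bar, tol=1e-9):
    """Check the uniqueness Condition for a candidate solution x_bar.

    Part 1: ker(A_I) = {0}, i.e., A_I has full column rank.
    Part 2: min { ||y_J||_inf : y = A^T s, y_I = sign(x_bar_I) } < 1,
            posed as an LP in (s, t) with epigraph variable t.
    """
    m, n = A.shape
    I = np.flatnonzero(np.abs(x_bar) > tol)
    J = np.setdiff1d(np.arange(n), I)

    # Part 1: full column rank of A_I
    part1 = np.linalg.matrix_rank(A[:, I]) == len(I)

    # Part 2: min t  s.t.  (A^T s)_I = sign(x_I),  |(A^T s)_j| <= t for j in J
    c = np.zeros(m + 1)
    c[-1] = 1.0
    A_eq = np.hstack([A[:, I].T, np.zeros((len(I), 1))])
    b_eq = np.sign(x_bar[I])
    A_ub = np.vstack([np.hstack([ A[:, J].T, -np.ones((len(J), 1))]),
                      np.hstack([-A[:, J].T, -np.ones((len(J), 1))])])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * len(J)),
                  A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * m + [(0, None)])
    part2 = res.success and res.fun < 1 - tol
    return part1, part2, res.fun
```

For instance, with $A = [[1,0,0.3],[0,1,0.3]]$ and $\bar x = (1,1,0)$ the LP value is $0.6 < 1$ and both parts hold, whereas replacing $0.3$ by $0.5$ drives the value to exactly $1$, and uniqueness fails.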

Optimality and uniqueness

Theorem. Suppose $\bar x$ obeys $A\bar x = b$ and the above Condition. Then $\bar x$ is the unique solution to $\min\{\|x\|_1 : Ax = b\}$.

In fact, the converse is also true: the Condition is also necessary.

Uniqueness of $x^*$

Part 1, $\ker(A_I) = \{0\}$, is necessary.

Lemma. If $0 \ne h \in \ker(A_I)$, then $x_\alpha = x^* + \alpha[h; 0]$ is optimal for all sufficiently small $\alpha$.

Proof. $x_\alpha$ is feasible since $Ax_\alpha = Ax^* = b$. We know $\|x_\alpha\|_1 \ge \|x^*\|_1$, but for small $\alpha$ around $0$ we also have
    $\|x_\alpha\|_1 = \|x_I^* + \alpha h\|_1 = (A^T s^*)_I^T (x_I^* + \alpha h) = \|x^*\|_1 + \alpha (A^T s^*)_I^T h.$
Hence, $(A^T s^*)_I^T h = 0$ and thus $\|x_\alpha\|_1 = \|x^*\|_1$. So $x_\alpha$ is also optimal.

Necessity

Is part 2 necessary? Introduce
    $\min_y\ \|y_J\|_\infty$  s.t.  $y \in \mathcal{R}(A^T)$, $y_I = \mathrm{sign}(\bar x_I)$.   (2)

If the optimal objective value is $< 1$, then there exists $y$ obeying part 2; showing this under the uniqueness assumption establishes that part 2 is necessary.

We shall translate (2) by rewriting the constraint $y \in \mathcal{R}(A^T)$.

Necessity

Define $a = [\mathrm{sign}(\bar x_I); 0]$ and let $Q$ be a basis matrix of $\mathrm{Null}(A)$, so that $y \in \mathcal{R}(A^T) \iff Q^T y = 0$.

If $a \in \mathcal{R}(A^T)$, set $y = a$; done. Otherwise, let $y = a + z$. Then
- $y \in \mathcal{R}(A^T) \iff Q^T y = 0 \iff Q^T z = -Q^T a$
- $y_I = \mathrm{sign}(\bar x_I) = a_I \iff z_I = 0$
- $a_J = 0 \implies y_J = z_J$

Equivalent problem:
    $\min_z\ \|z_J\|_\infty$  s.t.  $Q^T z = -Q^T a$, $z_I = 0$.   (3)

If the optimal objective value of (3) is $< 1$, then part 2 holds.

Necessity

Theorem (LP strong duality). If a linear program has a finite optimal solution, its Lagrange dual also has a finite optimal solution, and the two achieve the same optimal objective value.

Problem (3) is feasible and has a finite objective value. The dual of (3) is
    $\max_p\ (Q^T a)^T p$  s.t.  $\|(Qp)_J\|_1 \le 1$.

If its optimal objective value is $< 1$, then so is that of (3), and part 2 holds.
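To see strong duality at work, the sketch below builds problem (3) and its dual for a small illustrative instance (data my own: $A = [[1,0,0.3],[0,1,0.3]]$, $\bar x = (1,1,0)$) and checks that both LPs attain the same value, here $0.6 < 1$:

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

A = np.array([[1.0, 0.0, 0.3],
              [0.0, 1.0, 0.3]])
x_bar = np.array([1.0, 1.0, 0.0])
n = A.shape[1]
I = np.flatnonzero(x_bar)
J = np.setdiff1d(np.arange(n), I)
Q = null_space(A)                 # columns span ker(A)
d = Q.shape[1]
a = np.zeros(n)
a[I] = np.sign(x_bar[I])

# Primal (3): min ||z_J||_inf  s.t.  Q^T z = -Q^T a, z_I = 0.
# Variables (z_J, t) with t the epigraph variable for the max-norm.
nJ = len(J)
c_p = np.r_[np.zeros(nJ), 1.0]
A_eq = np.hstack([Q[J, :].T, np.zeros((d, 1))])
A_ub = np.vstack([np.hstack([ np.eye(nJ), -np.ones((nJ, 1))]),
                  np.hstack([-np.eye(nJ), -np.ones((nJ, 1))])])
prim = linprog(c_p, A_ub=A_ub, b_ub=np.zeros(2 * nJ),
               A_eq=A_eq, b_eq=-Q.T @ a,
               bounds=[(None, None)] * nJ + [(0, None)])

# Dual: max (Q^T a)^T p  s.t.  ||(Q p)_J||_1 <= 1.
# Variables (p, u) with u >= |(Q p)_J| modeling the l1-ball constraint.
c_d = np.r_[-(Q.T @ a), np.zeros(nJ)]
A_ub_d = np.vstack([np.hstack([ Q[J, :], -np.eye(nJ)]),
                    np.hstack([-Q[J, :], -np.eye(nJ)]),
                    np.r_[np.zeros(d), np.ones(nJ)][None, :]])
dual = linprog(c_d, A_ub=A_ub_d, b_ub=np.r_[np.zeros(2 * nJ), 1.0],
               bounds=[(None, None)] * d + [(0, None)] * nJ)

primal_val, dual_val = prim.fun, -dual.fun   # strong duality: equal values
```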

Necessity

Lemma. If $x^*$ is unique, then the common optimal objective of the primal-dual pair
    $\min_z\ \|z_J\|_\infty$  s.t.  $Q^T z = -Q^T a$, $z_I = 0$
    $\max_p\ (Q^T a)^T p$  s.t.  $\|(Qp)_J\|_1 \le 1$
is strictly less than $1$.

Proof. Define $a = \mathrm{sign}(x^*)$. Uniqueness of $x^*$ implies that for every $h \in \ker(A) \setminus \{0\}$,
    $\|x^*\|_1 < \|x^* + th\|_1$ for small $t \ne 0 \implies |a_I^T h_I| < \|h_J\|_1.$
Therefore,
- if $p = 0$, the dual objective is $(Q^T a)^T p = 0 < 1$;
- if $p \ne 0$, then $h := Qp \in \ker(A) \setminus \{0\}$ obeys (using $a_J = 0$)
    $(Q^T a)^T p = a_I^T h_I < \|h_J\|_1 = \|(Qp)_J\|_1 \le 1.$

In both cases, the optimal objective value is $< 1$.

Theorem. Suppose $\bar x$ obeys $A\bar x = b$. Then $\bar x$ is the unique solution to $\min\{\|x\|_1 : Ax = b\}$ if and only if the Condition holds.

Comments:
- the uniqueness direction requires a strong duality result for problems involving $\|z_J\|_\infty$
- strong duality does not hold for all convex programs
- strong duality does hold for convex polyhedral functions $f(z_J)$, as well as for problems with constraint qualifications (e.g., the Slater condition)
- indeed, the theorem generalizes to analysis $\ell_1$ minimization: $\|\Psi^T x\|_1$
- does it generalize to $\sum_i \|x_{G_i}\|_2$ or $\|X\|_*$? the key is strong duality for $\|\cdot\|_2$ and $\|\cdot\|_*$
- also, the theorem generalizes to the noisy $\ell_1$ models (next part...)

Noisy measurements

Suppose $b$ is contaminated by noise: $b = Ax + w$. Appropriate models to recover a sparse $x$ include
    $\min\ \lambda\|x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2$   (4)
    $\min\ \|x\|_1$  s.t.  $\|Ax - b\|_2 \le \delta$   (5)

Theorem. Suppose $x^*$ is a solution to either (4) or (5). Then $x^*$ is the unique solution if and only if the Condition holds for $x^*$.

Key intuition: reduce (4) to (1) with a specific $b$. Let $\hat x$ be any solution to (4) and $\bar b := A\hat x$. All solutions to (4) are solutions to $\min\ \|x\|_1$ s.t. $Ax = \bar b$. The same applies to (5). Recall that the Condition does not involve $b$.
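Model (4) can be solved with a few lines of proximal-gradient iteration (ISTA). This is a generic sketch, not the lecture's method; the step size uses the standard $1/\|A\|^2$ choice, and the iteration count is illustrative:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, iters=5000):
    """Proximal gradient (ISTA) for min lam*||x||_1 + 0.5*||Ax - b||_2^2."""
    L = np.linalg.norm(A, 2) ** 2    # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x - A.T @ (A @ x - b) / L, lam / L)
    return x
```

On a small illustrative instance ($A = [[1,0,0.3],[0,1,0.3]]$, $b = (1,1)$, $\lambda = 0.01$), the iteration converges to $x \approx (0.99, 0.99, 0)$: the usual slight $\ell_1$ shrinkage on the support, and an exact zero off it.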

Stable recovery

Assumptions:
- $\bar x$ and $y$ satisfy the Condition
- $\bar x$ is the original signal
- $b = A\bar x + w$, where $\|w\|_2 \le \delta$
- $x^*$ is the solution to $\min\ \|x\|_1$ s.t. $\|Ax - b\|_2 \le \delta$

Goal: obtain a bound $\|x^* - \bar x\|_2 \le C\delta$. The constant $C$ shall be independent of $\delta$.

Stable recovery

Lemma. Define $I = \mathrm{supp}(\bar x)$ and $J = I^c$. Then
    $\|x^* - \bar x\|_1 \le C_3\delta + C_4\|x_J^*\|_1,$
where $C_3 = 2\sqrt{|I|}\, r(I)$ and $C_4 = \|A\|\sqrt{|I|}\, r(I) + 1$.

Proof.
    $\|x^* - \bar x\|_1 = \|x_I^* - \bar x_I\|_1 + \|x_J^*\|_1$
and
    $\|x_I^* - \bar x_I\|_1 \le \sqrt{|I|}\,\|x_I^* - \bar x_I\|_2 \le \sqrt{|I|}\, r(I)\, \|A_I(x_I^* - \bar x_I)\|_2,$
where
    $r(I) := \sup_{\mathrm{supp}(u)=I,\ u \ne 0} \frac{\|u\|_2}{\|Au\|_2}$
($r(I)$ is related to one side of the RIP bound).

Introduce $\hat x = [x_I^*; 0]$. Then
    $\|A_I(x_I^* - \bar x_I)\|_2 = \|A(\hat x - \bar x)\|_2 \le \|A(\hat x - x^*)\|_2 + \underbrace{\|A(x^* - \bar x)\|_2}_{\le\, 2\delta}$
and
    $\|A(\hat x - x^*)\|_2 \le \|A\|\,\|\hat x - x^*\|_2 \le \|A\|\,\|\hat x - x^*\|_1 = \|A\|\,\|x_J^*\|_1.$
Combining the inequalities proves the lemma.

Stable recovery

Recall that in the Condition, $y_I = \mathrm{sign}(\bar x_I)$ and $\|y_J\|_\infty < 1$. Since $\|x_I^*\|_1 \ge \langle y_I, x_I^* \rangle$ and $\|x_J^*\|_1 - \langle y_J, x_J^* \rangle \ge (1 - \|y_J\|_\infty)\|x_J^*\|_1$, we get
    $\|x_J^*\|_1 \le (1 - \|y_J\|_\infty)^{-1}\left(\|x_J^*\|_1 - \langle y_J, x_J^* \rangle\right).$
Therefore,
    $\|x_J^*\|_1 \le (1 - \|y_J\|_\infty)^{-1}\left(\|x^*\|_1 - \langle y, x^* \rangle\right) = (1 - \|y_J\|_\infty)^{-1}\, d_y(x^*, \bar x),$
where $d_y(x^*, \bar x) = \|x^*\|_1 - \|\bar x\|_1 - \langle y, x^* - \bar x \rangle$ is the Bregman distance induced by $\|\cdot\|_1$ (note $\|\bar x\|_1 = \langle y, \bar x \rangle$).

Recall that in the Condition, $y \in \mathcal{R}(A^T)$, so $y = A^T\beta$ for some vector $\beta$. Then
    $d_y(x^*, \bar x) \le 2\|\beta\|_2\,\delta.$

Lemma. Under the above assumptions, $\|x_J^*\|_1 \le 2(1 - \|y_J\|_\infty)^{-1}\|\beta\|_2\,\delta$.

Stable recovery

Theorem. Assumptions:
- $\bar x$ and $y$ satisfy the Condition; $\bar x$ is the original signal; $y = A^T\beta$
- $b = A\bar x + w$, where $\|w\|_2 \le \delta$
- $x^*$ is the solution to $\min\ \|x\|_1$ s.t. $\|Ax - b\|_2 \le \delta$

Conclusion: $\|x^* - \bar x\|_1 \le C\delta$, where
    $C = 2\sqrt{|I|}\, r(I) + \frac{2\|\beta\|_2\left(\|A\|\sqrt{|I|}\, r(I) + 1\right)}{1 - \|y_J\|_\infty}.$

Comment: a similar bound can be obtained for $\min\ \lambda\|x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2$ with a condition on $\lambda$.
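The constant $C$ is computable from $(A, I, y, \beta)$ alone. A sketch (function name mine), using the fact that $r(I) = 1/\sigma_{\min}(A_I)$, which equals the supremum defining $r(I)$ in the lemma:

```python
import numpy as np

def stability_constant(A, I, y, beta):
    """Evaluate C = 2*sqrt(|I|)*r(I)
                   + 2*||beta||_2*(||A||*sqrt(|I|)*r(I) + 1)/(1 - ||y_J||_inf),
    where r(I) = 1/sigma_min(A_I) = sup{||u||_2/||Au||_2 : supp(u) = I, u != 0}."""
    J = np.setdiff1d(np.arange(A.shape[1]), I)
    rI = 1.0 / np.linalg.svd(A[:, I], compute_uv=False).min()
    k = np.sqrt(len(I))
    C3 = 2 * k * rI
    C4 = np.linalg.norm(A, 2) * k * rI + 1
    return C3 + 2 * np.linalg.norm(beta) * C4 / (1 - np.abs(y[J]).max())
```

For an illustrative instance ($A = [[1,0,0.3],[0,1,0.3]]$, $I = \{0,1\}$, $y = (1,1,0.6)$, $\beta = (1,1)$), this gives $C \approx 20.8$.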

Generalization

All the previous results (exact and stable recovery) generalize to the following models:
    $\min\ \|\Psi^T x\|_1$  s.t.  $Ax = b$
    $\min\ \lambda\|\Psi^T x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2$
    $\min\ \|\Psi^T x\|_1$  s.t.  $\|Ax - b\|_2 \le \delta$

Assume that $A$ and $\Psi$ each have independent rows. The updated conditions are:

Condition. For a given $\bar x$, the index sets $I = \mathrm{supp}(\Psi^T \bar x)$ and $J = I^c$ satisfy
1. $\ker(\Psi_J^T) \cap \ker(A) = \{0\}$;
2. there exists $y$ such that $\Psi y \in \mathcal{R}(A^T)$, $y_I = \mathrm{sign}(\Psi_I^T \bar x)$, and $\|y_J\|_\infty < 1$.