Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization


Duality Uses and Correspondences
Ryan Tibshirani
Convex Optimization 10-725

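Last time: KKT conditions

Recall that for the problem
$$\min_x \; f(x) \quad \text{subject to} \quad h_i(x) \le 0, \; i = 1, \ldots, m, \quad \ell_j(x) = 0, \; j = 1, \ldots, r$$
the KKT conditions are
$$\begin{aligned}
& 0 \in \partial \Big( f(x) + \sum_{i=1}^m u_i h_i(x) + \sum_{j=1}^r v_j \ell_j(x) \Big) && \text{(stationarity)} \\
& u_i \cdot h_i(x) = 0 \;\; \text{for all } i && \text{(complementary slackness)} \\
& h_i(x) \le 0, \;\; \ell_j(x) = 0 \;\; \text{for all } i, j && \text{(primal feasibility)} \\
& u_i \ge 0 \;\; \text{for all } i && \text{(dual feasibility)}
\end{aligned}$$
These are necessary for optimality (of a primal-dual pair $x^\star$ and $u^\star, v^\star$) under strong duality, and always sufficient.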

Uses of duality

Two key uses of duality:

For $x$ primal feasible and $u, v$ dual feasible, $f(x) - g(u, v)$ is called the duality gap between $x$ and $u, v$. Since
$$f(x) - f(x^\star) \le f(x) - g(u, v),$$
a zero duality gap implies optimality. Also, the duality gap can be used as a stopping criterion in algorithms.

Under strong duality, given dual optimal $u^\star, v^\star$, any primal solution minimizes $L(x, u^\star, v^\star)$ over all $x$ (i.e., it satisfies the stationarity condition). This can be used to characterize or compute primal solutions.
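As a rough illustration of the duality gap used as a stopping criterion, here is a minimal Python sketch. The toy problem (minimize $\tfrac12 x^2$ subject to $1 - x \le 0$), the step size, and the function names are assumptions chosen for illustration; they do not come from the slides.

```python
# Toy problem (an assumption for illustration, not from the slides):
#   minimize 0.5 * x**2  subject to  1 - x <= 0
# Lagrangian: L(x, u) = 0.5 * x**2 + u * (1 - x), minimized over x at x = u,
# so the dual function is g(u) = u - 0.5 * u**2 for u >= 0.

def f(x):
    """Primal objective."""
    return 0.5 * x ** 2

def g(u):
    """Dual function of the toy problem."""
    return u - 0.5 * u ** 2

def dual_ascent(step=0.5, tol=1e-8, max_iter=1000):
    """Gradient ascent on g, stopped once the duality gap is at most tol."""
    u = 0.0
    for k in range(max_iter):
        x_feasible = max(u, 1.0)        # Lagrangian minimizer x = u, pushed into x >= 1
        gap = f(x_feasible) - g(u)      # duality gap between a feasible x and a feasible u
        if gap <= tol:
            break
        u = max(0.0, u + step * (1.0 - u))  # g'(u) = 1 - u, keep u dual feasible
    return x_feasible, u, gap, k

x_hat, u_hat, gap, iters = dual_ascent()
print(x_hat, u_hat, gap, iters)
```

Because $f(x) - f(x^\star) \le f(x) - g(u)$, the returned gap is a certificate: the primal iterate is within `tol` of optimal without knowing $f(x^\star)$ in advance.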

Solving the primal via the dual

An important consequence of stationarity: under strong duality, given a dual solution $u^\star, v^\star$, any primal solution $x^\star$ solves
$$\min_x \; f(x) + \sum_{i=1}^m u_i^\star h_i(x) + \sum_{j=1}^r v_j^\star \ell_j(x)$$
Often, solutions of this unconstrained problem can be expressed explicitly, giving an explicit characterization of primal solutions from dual solutions.

Furthermore, suppose the solution of this problem is unique; then it must be the primal solution $x^\star$.

This can be very helpful when the dual is easier to solve than the primal.

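Example from B & V page 249:
$$\min_x \; \sum_{i=1}^n f_i(x_i) \quad \text{subject to} \quad a^T x = b$$
where each $f_i : \mathbb{R} \to \mathbb{R}$ is smooth, strictly convex. Dual function:
$$\begin{aligned}
g(v) &= \min_x \; \sum_{i=1}^n f_i(x_i) + v (b - a^T x) \\
&= bv + \sum_{i=1}^n \min_{x_i} \big\{ f_i(x_i) - a_i v x_i \big\} \\
&= bv - \sum_{i=1}^n f_i^*(a_i v)
\end{aligned}$$
where $f_i^*$ is the conjugate of $f_i$, to be defined shortly.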

Therefore the dual problem is
$$\max_v \; bv - \sum_{i=1}^n f_i^*(a_i v) \quad \iff \quad \min_v \; \sum_{i=1}^n f_i^*(a_i v) - bv$$
This is a convex minimization problem with a scalar variable, much easier to solve than the primal.

Given $v^\star$, the primal solution $x^\star$ solves
$$\min_x \; \sum_{i=1}^n \big( f_i(x_i) - a_i v^\star x_i \big)$$
Strict convexity of each $f_i$ implies that this has a unique solution, namely $x^\star$, which we compute by solving $\nabla f_i(x_i) = a_i v^\star$ for each $i$.
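To make the recipe concrete, the sketch below works through one special case. It assumes $f_i(x_i) = \tfrac12 c_i x_i^2$ with $c_i > 0$ (a choice made here for illustration, not taken from the slides), for which $f_i^*(y) = y^2 / (2 c_i)$, so the scalar dual has a closed-form solution and the primal solution follows from $c_i x_i = a_i v^\star$.

```python
# Assumed special case: f_i(x_i) = 0.5 * c_i * x_i**2 with c_i > 0,
# whose conjugate is f_i^*(y) = y**2 / (2 * c_i).

def solve_via_dual(a, c, b):
    """Solve min sum_i 0.5*c_i*x_i^2 s.t. a^T x = b through its scalar dual."""
    # Dual: minimize sum_i (a_i * v)**2 / (2 * c_i) - b * v over the scalar v.
    curvature = sum(ai * ai / ci for ai, ci in zip(a, c))
    v_star = b / curvature
    # Recover the primal solution from stationarity: c_i * x_i = a_i * v_star.
    x_star = [ai * v_star / ci for ai, ci in zip(a, c)]
    return x_star, v_star

a = [1.0, 2.0, -1.0]
c = [1.0, 4.0, 2.0]
b = 3.0

x_star, v_star = solve_via_dual(a, c, b)
primal_value = sum(0.5 * ci * xi * xi for ci, xi in zip(c, x_star))
dual_value = b * v_star - sum((ai * v_star) ** 2 / (2.0 * ci) for ai, ci in zip(a, c))
constraint = sum(ai * xi for ai, xi in zip(a, x_star))

print(x_star, v_star, primal_value, dual_value, constraint)
```

The recovered point satisfies the constraint $a^T x = b$ and the primal and dual values coincide, as strong duality predicts for this problem.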

Outline

Today:
Dual norms
Conjugate functions
Dual cones
Dual tricks and subtleties

(Note: there are many other uses of duality and relationships to duality that we could discuss, but not enough time...)

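Dual norms

Let $\|x\|$ be a norm, e.g.,
$\ell_p$ norm: $\|x\|_p = \big( \sum_{i=1}^n |x_i|^p \big)^{1/p}$, for $p \ge 1$
Trace norm: $\|X\|_{\mathrm{tr}} = \sum_{i=1}^r \sigma_i(X)$

We define its dual norm $\|x\|_*$ as
$$\|x\|_* = \max_{\|z\| \le 1} \; z^T x$$
Gives us the inequality $|z^T x| \le \|z\| \, \|x\|_*$ (like generalized Hölder). Back to our examples,
$\ell_p$ norm dual: $(\|x\|_p)_* = \|x\|_q$, where $1/p + 1/q = 1$
Trace norm dual: $(\|X\|_{\mathrm{tr}})_* = \|X\|_{\mathrm{op}} = \sigma_1(X)$

Dual norm of dual norm: can show that $\|x\|_{**} = \|x\|$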
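The $\ell_p$ / $\ell_q$ correspondence can be checked numerically. The sketch below is only a sanity check under assumed values ($p = 3$, a fixed vector $x$, random unit vectors $z$); the helper names are hypothetical. It builds the standard maximizer $z_i = \mathrm{sign}(x_i) |x_i|^{q-1} / \|x\|_q^{q-1}$ and confirms that no sampled $z$ with $\|z\|_p = 1$ beats $\|x\|_q$.

```python
import random

def pnorm(x, p):
    """l_p norm of a list of floats, for finite p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def dual_norm_maximizer(x, p):
    """A z with ||z||_p = 1 attaining z^T x = ||x||_q, 1/p + 1/q = 1 (needs p > 1, x != 0)."""
    q = p / (p - 1.0)
    scale = pnorm(x, q) ** (q - 1.0)
    return [(1.0 if xi >= 0 else -1.0) * abs(xi) ** (q - 1.0) / scale for xi in x]

p = 3.0
q = p / (p - 1.0)
x = [1.0, -2.0, 0.5]
z_star = dual_norm_maximizer(x, p)

# Holder-type inequality: z^T x <= ||z||_p * ||x||_q for random z on the unit l_p sphere.
random.seed(0)
worst_slack = float("inf")
for _ in range(1000):
    z = [random.gauss(0.0, 1.0) for _ in x]
    nz = pnorm(z, p)
    z = [zi / nz for zi in z]
    worst_slack = min(worst_slack, pnorm(x, q) - dot(z, x))

print(pnorm(z_star, p), dot(z_star, x), pnorm(x, q), worst_slack)
```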

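Proof: consider the (trivial-looking) problem
$$\min_y \; \|y\| \quad \text{subject to} \quad y = x$$
whose optimal value is $\|x\|$. Lagrangian:
$$L(y, u) = \|y\| + u^T (x - y) = \|y\| - y^T u + x^T u$$
Using the definition of $\|\cdot\|_*$,
If $\|u\|_* > 1$, then $\min_y \, \{ \|y\| - y^T u \} = -\infty$
If $\|u\|_* \le 1$, then $\min_y \, \{ \|y\| - y^T u \} = 0$

Therefore the Lagrange dual problem is
$$\max_u \; u^T x \quad \text{subject to} \quad \|u\|_* \le 1$$
By strong duality $f^\star = g^\star$, i.e., $\|x\| = \|x\|_{**}$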

Conjugate function

Given a function $f : \mathbb{R}^n \to \mathbb{R}$, define its conjugate $f^* : \mathbb{R}^n \to \mathbb{R}$,
$$f^*(y) = \max_x \; y^T x - f(x)$$
Note that $f^*$ is always convex, since it is the pointwise maximum of convex (affine) functions in $y$ (here $f$ need not be convex).

[Figure, from B & V page 91: $f^*(y)$ is the maximum gap between the linear function $y^T x$ and $f(x)$.]

For differentiable $f$, conjugation is called the Legendre transform.

Properties:

Fenchel's inequality: for any $x, y$,
$$f(x) + f^*(y) \ge x^T y$$
Conjugate of conjugate $f^{**}$ satisfies $f^{**} \le f$

If $f$ is closed and convex, then $f^{**} = f$

If $f$ is closed and convex, then for any $x, y$,
$$x \in \partial f^*(y) \iff y \in \partial f(x) \iff f(x) + f^*(y) = x^T y$$
If $f(u, v) = f_1(u) + f_2(v)$, then
$$f^*(w, z) = f_1^*(w) + f_2^*(z)$$

Examples:

Simple quadratic: let $f(x) = \frac{1}{2} x^T Q x$, where $Q \succ 0$. Then $y^T x - \frac{1}{2} x^T Q x$ is strictly concave in $x$ and is maximized at $x = Q^{-1} y$, so
$$f^*(y) = \frac{1}{2} y^T Q^{-1} y$$
Indicator function: if $f(x) = I_C(x)$, then its conjugate is
$$f^*(y) = I_C^*(y) = \max_{x \in C} \; y^T x$$
called the support function of $C$

Norm: if $f(x) = \|x\|$, then its conjugate is
$$f^*(y) = I_{\{z \,:\, \|z\|_* \le 1\}}(y)$$
where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$
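The quadratic example and Fenchel's inequality can be sanity-checked in one dimension. The sketch below assumes the scalar case $f(x) = \tfrac12 q x^2$ with $q = 2$ (so $f^*(y) = y^2 / (2q)$) and approximates the maximum in the definition of the conjugate by a grid search; the grid bounds and function names are illustrative assumptions.

```python
def f(x, q=2.0):
    """Scalar quadratic f(x) = 0.5 * q * x**2 with q > 0 (assumed example)."""
    return 0.5 * q * x * x

def f_conj_closed_form(y, q=2.0):
    """Conjugate from the formula f^*(y) = 0.5 * y * (1/q) * y."""
    return y * y / (2.0 * q)

def f_conj_numeric(y, q=2.0, lo=-10.0, hi=10.0, n=20001):
    """Approximate max_x { y*x - f(x) } by brute force over a uniform grid."""
    step = (hi - lo) / (n - 1)
    return max(y * (lo + i * step) - f(lo + i * step, q) for i in range(n))

ys = [-4.0, -1.0, 0.0, 3.0]
errors = [abs(f_conj_numeric(y) - f_conj_closed_form(y)) for y in ys]

# Fenchel's inequality: f(x) + f^*(y) >= x * y for every pair (x, y).
xs = [-3.0, -0.5, 0.0, 1.5, 4.0]
fenchel_slack = min(f(x) + f_conj_closed_form(y) - x * y for x in xs for y in ys)

print(errors, fenchel_slack)
```

Equality in Fenchel's inequality occurs at pairs with $y = q x$, for example $x = 1.5$, $y = 3$, which is why the smallest slack over the sampled pairs is zero up to rounding.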

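Example: lasso dual

Given $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, recall the lasso problem:
$$\min_\beta \; \frac{1}{2} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$$
Its dual function is just a constant (equal to $f^\star$). Therefore we transform the primal to
$$\min_{\beta, z} \; \frac{1}{2} \|y - z\|_2^2 + \lambda \|\beta\|_1 \quad \text{subject to} \quad z = X\beta$$
so the dual function is now
$$\begin{aligned}
g(u) &= \min_{\beta, z} \; \frac{1}{2} \|y - z\|_2^2 + \lambda \|\beta\|_1 + u^T (z - X\beta) \\
&= \frac{1}{2} \|y\|_2^2 - \frac{1}{2} \|y - u\|_2^2 - I_{\{v \,:\, \|v\|_\infty \le 1\}}(X^T u / \lambda)
\end{aligned}$$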

Therefore the lasso dual problem is
$$\max_u \; \frac{1}{2} \big( \|y\|_2^2 - \|y - u\|_2^2 \big) \quad \text{subject to} \quad \|X^T u\|_\infty \le \lambda$$
$$\iff \quad \min_u \; \|y - u\|_2^2 \quad \text{subject to} \quad \|X^T u\|_\infty \le \lambda$$
Check: Slater's condition holds, and hence so does strong duality. But note: the optimal value of the last problem is not the optimal lasso objective value.

Further, note that given the dual solution $u$, any lasso solution $\beta$ satisfies
$$X\beta = y - u$$
This is from the KKT stationarity condition for $z$ (i.e., $z - y + u = 0$). So the lasso fit is just the dual residual.
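A small Python sketch can illustrate both remarks. It assumes the simplest possible design, $X = I$ (an assumption made here, not in the slides), so that the dual constraint set is the box $\{u : \|u\|_\infty \le \lambda\}$, the dual solution is the coordinatewise clipping of $y$, and the lasso solution $\beta = y - u$ is soft-thresholding. The data values and helper names are hypothetical.

```python
# Assumed special case X = I: the dual feasible set is the box {u : ||u||_inf <= lam}.

def clip(value, lam):
    return max(-lam, min(lam, value))

def lasso_identity_design(y, lam):
    """Solve the lasso with X = I through its dual."""
    u = [clip(yi, lam) for yi in y]            # dual solution: projection of y onto the box
    beta = [yi - ui for yi, ui in zip(y, u)]   # primal fit X beta = y - u (dual residual)
    return beta, u

def lasso_objective(beta, y, lam):
    fit = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))
    return fit + lam * sum(abs(bi) for bi in beta)

y = [3.0, -0.5, 1.5]
lam = 1.0
beta, u = lasso_identity_design(y, lam)

primal_value = lasso_objective(beta, y, lam)
sq_norm_y = sum(yi * yi for yi in y)
sq_dist = sum((yi - ui) ** 2 for yi, ui in zip(y, u))
dual_value = 0.5 * (sq_norm_y - sq_dist)   # value of the max form of the dual
transformed_value = sq_dist                # value of the equivalent min form

print(beta, u, primal_value, dual_value, transformed_value)
```

The max form of the dual matches the lasso optimal value, while the transformed problem $\min_u \|y - u\|_2^2$ has a different optimal value even though it has the same solution $u$.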

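[Figure: geometry of the lasso dual. Labels: $y$, the lasso fit $X\hat\beta$, the dual solution $\hat u$, the set $C = \{u : \|X^T u\|_\infty \le \lambda\} \subseteq \mathbb{R}^n$, and the box $\{v : \|v\|_\infty \le \lambda\} \subseteq \mathbb{R}^p$, with $C$ its inverse image under $X^T$.]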

Conjugates and dual problems

Conjugates appear frequently in the derivation of dual problems, via
$$-f^*(u) = \min_x \; f(x) - u^T x$$
in minimization of the Lagrangian. E.g., consider
$$\min_x \; f(x) + g(x)$$
Equivalently:
$$\min_{x, z} \; f(x) + g(z) \quad \text{subject to} \quad x = z$$
Dual function:
$$g(u) = \min_{x, z} \; f(x) + g(z) + u^T (z - x) = -f^*(u) - g^*(-u)$$
Hence the dual problem is
$$\max_u \; -f^*(u) - g^*(-u)$$

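Examples of this last calculation:

Indicator function:
$$\text{Primal}: \; \min_x \; f(x) + I_C(x)$$
$$\text{Dual}: \; \max_u \; -f^*(u) - I_C^*(-u)$$
where $I_C^*$ is the support function of $C$

Norms:
$$\text{Primal}: \; \min_x \; f(x) + \|x\|$$
$$\text{Dual}: \; \max_u \; -f^*(u) \quad \text{subject to} \quad \|u\|_* \le 1$$
where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$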
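The norm case can be traced on a small instance. The sketch below assumes $f(x) = \tfrac12 \|x - a\|_2^2$ and the $\ell_2$ norm (which is its own dual norm); both choices, the vector $a$, and the function names are illustrative assumptions. Here $f^*(u) = \tfrac12 \|u\|_2^2 + a^T u$, the dual solution is the projection of $-a$ onto the unit ball, and the primal solution is recovered from stationarity of $f(x) - u^T x$, i.e. $x = a + u$.

```python
import math

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def primal_objective(x, a):
    """f(x) + ||x||_2 with the assumed f(x) = 0.5 * ||x - a||_2^2."""
    return 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a)) + norm2(x)

def dual_objective(u, a):
    """-f^*(u), where f^*(u) = 0.5 * ||u||_2^2 + a^T u; valid for ||u||_2 <= 1."""
    return -0.5 * sum(ui * ui for ui in u) - sum(ai * ui for ai, ui in zip(a, u))

def solve_dual(a):
    """Maximize -f^*(u) over the unit ball: project -a onto {u : ||u||_2 <= 1}."""
    scale = max(1.0, norm2(a))
    return [-ai / scale for ai in a]

a = [3.0, 4.0]
u_star = solve_dual(a)
x_star = [ai + ui for ai, ui in zip(a, u_star)]   # stationarity: x - a - u = 0

primal_value = primal_objective(x_star, a)
dual_value = dual_objective(u_star, a)
print(x_star, u_star, primal_value, dual_value)
```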

Shifting linear transformations

Dual formulations can help us by shifting a linear transformation between one part of the objective and another. Consider
$$\min_x \; f(x) + g(Ax)$$
Equivalently:
$$\min_{x, z} \; f(x) + g(z) \quad \text{subject to} \quad Ax = z$$
Like before, the dual is:
$$\max_u \; -f^*(A^T u) - g^*(-u)$$
Example: for a norm and its dual norm, $\|\cdot\|$, $\|\cdot\|_*$:
$$\text{Primal}: \; \min_x \; f(x) + \|Ax\|$$
$$\text{Dual}: \; \max_u \; -f^*(A^T u) \quad \text{subject to} \quad \|u\|_* \le 1$$
The dual can often be a helpful transformation here.

Dual cones

For a cone $K \subseteq \mathbb{R}^n$ (recall this means $x \in K$, $t \ge 0 \implies tx \in K$),
$$K^* = \{ y : y^T x \ge 0 \text{ for all } x \in K \}$$
is called its dual cone. This is always a convex cone (even if $K$ is not convex).

Notice that $y \in K^*$ $\iff$ the halfspace $\{ x : y^T x \ge 0 \}$ contains $K$ (figure from B & V page 52).

Important property: if $K$ is a closed convex cone, then $K^{**} = K$

Examples:

Linear subspace: the dual cone of a linear subspace $V$ is $V^\perp$, its orthogonal complement. E.g., $(\mathrm{row}(A))^* = \mathrm{null}(A)$

Norm cone: the dual cone of the norm cone
$$K = \{ (x, t) \in \mathbb{R}^{n+1} : \|x\| \le t \}$$
is the norm cone of its dual norm
$$K^* = \{ (y, s) \in \mathbb{R}^{n+1} : \|y\|_* \le s \}$$
Positive semidefinite cone: the convex cone $\mathbb{S}^n_+$ is self-dual, meaning $(\mathbb{S}^n_+)^* = \mathbb{S}^n_+$. Why? Check that
$$Y \succeq 0 \iff \mathrm{tr}(YX) \ge 0 \text{ for all } X \succeq 0$$
by looking at the eigendecomposition of $X$
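The norm cone example can be probed numerically. The sketch below assumes the $\ell_1$ norm cone, whose dual cone under the statement above is the $\ell_\infty$ norm cone; the sampling scheme and helper names are hypothetical. It checks that sampled pairs from the two cones have nonnegative inner product, and that a point outside the $\ell_\infty$ cone is exposed by some element of the $\ell_1$ cone.

```python
import random

def in_l1_cone(x, t):
    return sum(abs(xi) for xi in x) <= t

def in_linf_cone(y, s):
    return max(abs(yi) for yi in y) <= s

def inner(x, t, y, s):
    """Inner product of (x, t) and (y, s) in R^{n+1}."""
    return sum(xi * yi for xi, yi in zip(x, y)) + t * s

def violating_point(y, s):
    """For (y, s) outside the l_inf cone, return (x, t) in the l_1 cone with inner < 0."""
    j = max(range(len(y)), key=lambda i: abs(y[i]))
    x = [0.0] * len(y)
    x[j] = -1.0 if y[j] > 0 else 1.0
    return x, 1.0

random.seed(0)
n = 4
smallest = float("inf")
for _ in range(1000):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    t = sum(abs(xi) for xi in x) + random.random()      # (x, t) lies in the l_1 cone
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    s = max(abs(yi) for yi in y) + random.random()      # (y, s) lies in the l_inf cone
    smallest = min(smallest, inner(x, t, y, s))

y_out, s_out = [0.5, -2.0, 1.0, 0.0], 1.0               # ||y||_inf = 2 > s, so outside
x_bad, t_bad = violating_point(y_out, s_out)
print(smallest, inner(x_bad, t_bad, y_out, s_out))
```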

Dual cones and dual problems

Consider the cone constrained problem
$$\min_x \; f(x) \quad \text{subject to} \quad Ax \in K$$
Recall that its dual problem is
$$\max_u \; -f^*(A^T u) - I_K^*(-u)$$
where recall $I_K^*(y) = \max_{z \in K} z^T y$, the support function of $K$. If $K$ is a cone, then this is simply
$$\max_u \; -f^*(A^T u) \quad \text{subject to} \quad u \in K^*$$
where $K^*$ is the dual cone of $K$, because $I_K^*(-u) = I_{K^*}(u)$

This is quite a useful observation, because many different types of constraints can be posed as cone constraints.

Dual subtleties

Often, we will transform the dual into an equivalent problem and still call this the dual. Under strong duality, we can use solutions of the (transformed) dual problem to characterize or compute primal solutions.

Warning: the optimal value of this transformed dual problem is not necessarily the optimal primal value.

A common trick in deriving duals for unconstrained problems is to first transform the primal by adding a dummy variable and an equality constraint.

Usually there is ambiguity in how to do this. Different choices can lead to different dual problems!

Double dual

Consider the general minimization problem with linear constraints:
$$\min_x \; f(x) \quad \text{subject to} \quad Ax \le b, \; Cx = d$$
The Lagrangian is
$$L(x, u, v) = f(x) + (A^T u + C^T v)^T x - b^T u - d^T v$$
and hence the dual problem is
$$\max_{u, v} \; -f^*(-A^T u - C^T v) - b^T u - d^T v \quad \text{subject to} \quad u \ge 0$$
Recall property: $f^{**} = f$ if $f$ is closed and convex. Hence in this case, we can show that the dual of the dual is the primal.
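For readers who want the intermediate step, the dual function can be obtained from the Lagrangian by recognizing a conjugate. The following is a sketch of that one-line calculation, using only the definition $f^*(y) = \max_x \, y^T x - f(x)$:

```latex
\begin{aligned}
g(u, v) &= \min_x \; f(x) + (A^T u + C^T v)^T x - b^T u - d^T v \\
        &= -\max_x \, \big\{ (-A^T u - C^T v)^T x - f(x) \big\} - b^T u - d^T v \\
        &= -f^*(-A^T u - C^T v) - b^T u - d^T v .
\end{aligned}
```

Maximizing this expression over $u \ge 0$ and unrestricted $v$ gives the dual problem for the linearly constrained case.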

Actually, the connection (between duals of duals and conjugates) runs much deeper than this, beyond linear constraints. Consider
$$\min_x \; f(x) \quad \text{subject to} \quad h_i(x) \le 0, \; i = 1, \ldots, m, \quad \ell_j(x) = 0, \; j = 1, \ldots, r$$
If $f$ and $h_1, \ldots, h_m$ are closed and convex, and $\ell_1, \ldots, \ell_r$ are affine, then the dual of the dual is the primal.

This is proved by viewing the minimization problem in terms of a bifunction. In this framework, the dual function corresponds to the conjugate of this bifunction (for more, read Chapters 29 and 30 of Rockafellar).

References

S. Boyd and L. Vandenberghe (2004), Convex optimization, Chapters 2, 3, 5
R. T. Rockafellar (1970), Convex analysis, Chapters 12, 13, 14, 16, 28-30