Lecture 13: Duality Uses and Correspondences

10-725/36-725: Convex Optimization, Fall 2016
Lecture 13: Duality Uses and Correspondences
Lecturer: Ryan Tibshirani. Scribes: Yichong Xu, Yanyu Liang, Yanning Li

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

13.1 Last Class

13.1.1 KKT conditions

For the problem

$$\min_x \ f(x) \quad \text{subject to} \quad h_i(x) \le 0,\ i = 1, \dots, m, \qquad \ell_j(x) = 0,\ j = 1, \dots, r,$$

the KKT conditions are

- Stationarity: $0 \in \partial f(x) + \sum_{i=1}^m u_i \, \partial h_i(x) + \sum_{j=1}^r v_j \, \partial \ell_j(x)$
- Complementary slackness: $u_i h_i(x) = 0$ for all $i$
- Primal feasibility: $h_i(x) \le 0$, $\ell_j(x) = 0$ for all $i, j$
- Dual feasibility: $u_i \ge 0$ for all $i$

For convex problems, the KKT conditions are always sufficient for optimality. They are also necessary under strong duality.

13.1.2 Uses of duality

For $x$ primal feasible and $u, v$ dual feasible, $f(x) - g(u, v)$ is called the duality gap between $x$ and $u, v$. By weak duality, $f(x^\star) \ge g(u, v)$, so we have

$$f(x) - f(x^\star) \le f(x) - g(u, v).$$

So a zero duality gap implies optimality. The duality gap can also be used as a stopping criterion in algorithms.

Under strong duality, suppose we are given dual optimal $u^\star, v^\star$. Then any primal solution minimizes $L(x, u^\star, v^\star)$ over all $x$, because of the stationarity condition. This can be used to characterize or compute primal solutions. Explicitly, given a dual solution $u^\star, v^\star$, any primal solution solves

$$\min_x \ f(x) + \sum_{i=1}^m u_i^\star h_i(x) + \sum_{j=1}^r v_j^\star \ell_j(x).$$
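The sketch below makes the duality gap and primal recovery concrete. It uses NumPy on a hypothetical instance: an equality-constrained quadratic program $\min_x \frac12 x^T Q x + c^T x$ subject to $Ax = b$, with $Q \succ 0$. The random data, the helper names, and solving the dual by a direct linear solve are illustrative assumptions, not part of the lecture. Because there are no inequality constraints, every $v$ is dual feasible.

```python
import numpy as np

def lagrangian_minimizer(v, Q, c, A):
    # Stationarity of L(x, v) = 0.5 x^T Q x + c^T x + v^T (Ax - b):  Qx + c + A^T v = 0
    return -np.linalg.solve(Q, c + A.T @ v)

def primal_objective(x, Q, c):
    return 0.5 * x @ Q @ x + c @ x

def dual_function(v, Q, c, A, b):
    # g(v) = min_x L(x, v), evaluated at the unique Lagrangian minimizer
    x = lagrangian_minimizer(v, Q, c, A)
    return primal_objective(x, Q, c) + v @ (A @ x - b)

# Hypothetical random instance (Q positive definite, A with full row rank)
rng = np.random.default_rng(0)
n, m = 6, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Dual optimum: grad g(v) = A x(v) - b is affine in v, so setting it to zero
# gives the linear system (A Q^{-1} A^T) v = -b - A Q^{-1} c.
v_star = np.linalg.solve(A @ np.linalg.solve(Q, A.T), -b - A @ np.linalg.solve(Q, c))

# Primal solution recovered by minimizing the Lagrangian at the dual optimum
x_star = lagrangian_minimizer(v_star, Q, c, A)
gap_at_optimum = primal_objective(x_star, Q, c) - dual_function(v_star, Q, c, A, b)

# Any primal feasible x and any v bound the suboptimality of x
x_feas = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm feasible point
gap_bound = primal_objective(x_feas, Q, c) - dual_function(np.zeros(m), Q, c, A, b)
true_subopt = primal_objective(x_feas, Q, c) - primal_objective(x_star, Q, c)
print(np.abs(A @ x_star - b).max(), gap_at_optimum, gap_bound >= true_subopt)
```

The recovered `x_star` should be feasible up to round-off, with a duality gap near zero. The crude pair `x_feas`, `v = 0` still gives a valid upper bound on how suboptimal `x_feas` is.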

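Solutions of this unconstrained problem can often be expressed explicitly. This gives an explicit characterization of primal solutions in terms of dual solutions.

Example (B & V page 249). Consider the problem

$$\min_x \ \sum_{i=1}^n f_i(x_i) \quad \text{subject to} \quad a^T x = b,$$

where each $f_i : \mathbb{R} \to \mathbb{R}$ is smooth and strictly convex. The dual function is

$$g(v) = \min_x \ \sum_{i=1}^n f_i(x_i) + v(b - a^T x) = bv + \sum_{i=1}^n \min_{x_i} \{ f_i(x_i) - a_i v x_i \} = bv - \sum_{i=1}^n f_i^*(a_i v),$$

where $f_i^*$ is the conjugate of $f_i$, defined in Section 13.3. The dual problem is thus

$$\max_v \ bv - \sum_{i=1}^n f_i^*(a_i v) \qquad \text{or equivalently} \qquad \min_v \ \sum_{i=1}^n f_i^*(a_i v) - bv.$$

This is a convex minimization problem in a single scalar variable, so it is much easier to solve than the primal problem. Given $v^\star$, the primal solution solves

$$\min_x \ \sum_{i=1}^n \big( f_i(x_i) - a_i v^\star x_i \big).$$

Each $f_i$ is strictly convex, so the problem $\min_{x_i} f_i(x_i) - a_i v^\star x_i$ has a unique solution, assuming a minimizer exists. It can be computed by solving $f_i'(x_i) = a_i v^\star$ for each $i$.

13.2 Dual norms

Let $\|x\|$ be an arbitrary norm. For example:

- $\ell_p$ norm: $\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$, for $p \ge 1$.
- Trace norm: $\|X\|_{\mathrm{tr}} = \sum_{i=1}^r \sigma_i(X)$, the sum of the singular values of $X$ (here $r = \mathrm{rank}(X)$).

Define its dual norm $\|x\|_*$ as

$$\|x\|_* = \max_{\|z\| \le 1} z^T x.$$

The definition gives us the inequality $|z^T x| \le \|z\| \, \|x\|_*$, similar to the Cauchy-Schwarz inequality.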
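The B & V example reduces the problem to a one-dimensional dual. Here is a minimal sketch for the hypothetical choice $f_i(x_i) = \frac12 q_i (x_i - c_i)^2$ with $q_i > 0$, where $f_i'(x_i) = a_i v$ can be solved in closed form.

The method rests on one fact. The derivative of the dual objective $\sum_i f_i^*(a_i v) - bv$ is $a^T x(v) - b$, where $x_i(v)$ solves $f_i'(x_i) = a_i v$. That derivative is nondecreasing, so bisection on its sign solves the dual, and $x(v^\star)$ is then the primal solution. The data and function names are illustrative only.

```python
import numpy as np

def x_of_v(v, a, q, c):
    # Solves f_i'(x_i) = a_i v for the hypothetical f_i(x_i) = 0.5 * q_i * (x_i - c_i)^2
    return c + a * v / q

def solve_scalar_dual(a, b, q, c, lo=-1e6, hi=1e6, iters=200):
    # phi(v) = sum_i f_i^*(a_i v) - b v is convex with phi'(v) = a^T x(v) - b,
    # which is nondecreasing in v, so bisection on its sign finds the minimizer.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if a @ x_of_v(mid, a, q, c) - b > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
n = 8
a = rng.standard_normal(n)
q = rng.uniform(0.5, 2.0, n)
c = rng.standard_normal(n)
b = 1.5

v_star = solve_scalar_dual(a, b, q, c)
x_star = x_of_v(v_star, a, q, c)                     # primal solution from the dual one
v_closed_form = (b - a @ c) / np.sum(a**2 / q)      # available only for this quadratic choice
print(v_star, v_closed_form, a @ x_star - b)
```

For a general smooth, strictly convex $f_i$, `x_of_v` would need its own scalar root-finder. The outer bisection on $v$ stays the same.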

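Examples:

- $\ell_p$ norm: $(\|x\|_p)_* = \|x\|_q$, where $1/p + 1/q = 1$.
- Trace norm: $(\|X\|_{\mathrm{tr}})_* = \|X\|_{\mathrm{op}} = \sigma_1(X)$, the largest singular value of $X$.

We can show the following.

Theorem 13.1. The dual of the dual norm is the original norm, i.e., $\|x\|_{**} = \|x\|$.

Proof: Consider the problem

$$\min_y \ \|y\| \quad \text{subject to} \quad y = x,$$

whose optimal value is $\|x\|$. Its Lagrangian is

$$L(y, u) = \|y\| + u^T (x - y) = \|y\| - y^T u + x^T u.$$

First suppose $\|u\|_* > 1$. By the definition of $\|u\|_*$, let $z$ be the maximizer in $\max_{\|z\| \le 1} z^T u$, i.e., $z^T u = \|u\|_*$. Note that $\|z\| = 1$. Let $y = tz$ for $t > 0$. Then

$$\|y\| - y^T u = t(\|z\| - z^T u) = t(1 - \|u\|_*) \to -\infty \quad \text{as } t \to \infty.$$

So in this case $\min_y \|y\| - y^T u = -\infty$.

Now suppose $\|u\|_* \le 1$. Then $\|y\| - y^T u \ge \|y\| - \|y\| \, \|u\|_* \ge 0$, and this bound is attained at $y = 0$. So $\min_y \|y\| - y^T u = 0$.

Therefore the Lagrange dual problem is

$$\max_u \ u^T x \quad \text{subject to} \quad \|u\|_* \le 1,$$

whose optimal value is the dual norm of $\|\cdot\|_*$ evaluated at $x$, i.e., $\|x\|_{**}$. The only constraint is affine, so strong duality holds, and $\|x\|_{**} = \|x\|$.

13.3 Conjugate function

13.3.1 Definition

Given a function $f : \mathbb{R}^n \to \mathbb{R}$, define its conjugate $f^* : \mathbb{R}^n \to \mathbb{R}$ by

$$f^*(y) = \max_x \ y^T x - f(x).$$

For any fixed $x$, $y^T x - f(x)$ is affine in $y$. So $f^*$ is always convex, as a pointwise maximum of convex (affine) functions of $y$, even though $f$ itself need not be convex. $f^*(y)$ is the maximum gap between the linear function $y^T x$ and $f(x)$. Figure 13.1 shows this when $f$ is a scalar function. For differentiable $f$, conjugation is called the Legendre transform.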
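The $\ell_p$/$\ell_q$ pairing can be checked numerically. The sketch below assumes $1 < p < \infty$ and $x \neq 0$. The vector `z` is the equality case of Hölder's inequality: it has $\|z\|_p = 1$ and $z^T x = \|x\|_q$. Random points of the unit $\ell_p$ sphere never do better, consistent with $\max_{\|z\|_p \le 1} z^T x = \|x\|_q$. The helper names and sample sizes are hypothetical choices for illustration.

```python
import numpy as np

def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def holder_maximizer(x, p):
    # Maximizer of z^T x over ||z||_p <= 1 for 1 < p < inf and nonzero x
    q = p / (p - 1.0)
    z = np.sign(x) * np.abs(x) ** (q - 1.0)
    return z / lp_norm(x, q) ** (q - 1.0)

rng = np.random.default_rng(2)
x = rng.standard_normal(5)
p = 3.0
q = p / (p - 1.0)                 # conjugate exponent: 1/p + 1/q = 1

z = holder_maximizer(x, p)
# Random feasible points: random directions normalized onto the unit l_p sphere
samples = rng.standard_normal((10000, 5))
samples /= np.sum(np.abs(samples) ** p, axis=1, keepdims=True) ** (1.0 / p)
best_sampled = np.max(samples @ x)
print(lp_norm(z, p), z @ x, lp_norm(x, q), best_sampled)
```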

[Figure 13.1 (from [1], p. 91): Given $y$, the upper dashed line represents the function $g(x) = yx$ and the solid line represents $f(x)$. $f^*(y)$ is the biggest gap between $g$ and $f$ where $g$ lies above $f$. The lower dashed line, which also has slope $y$, is drawn to find this biggest gap. Its intercept equals $-f^*(y)$.]

13.3.2 Properties

- Fenchel's inequality: for any $x, y$,
$$f(x) + f^*(y) \ge x^T y.$$
Proof: $f^*(y) = \max_z \ z^T y - f(z) \ge x^T y - f(x)$.

- Hence the conjugate of the conjugate, $f^{**}$, satisfies $f^{**} \le f$.
Proof: $f^{**}(x) = \max_z \ z^T x - f^*(z) \le \max_z f(x) = f(x)$. The inequality is Fenchel's inequality, $z^T x - f^*(z) \le f(x)$.

- If $f$ is closed and convex, then $f^{**} = f$.
Proof: $f(x)$ is the optimal value of $\min_y f(y)$ subject to $y = x$. The dual of this problem is $\max_u \ u^T x - f^*(u) = f^{**}(x)$. Strong duality holds, so the equality follows.

- If $f$ is closed and convex, then for any $x, y$,
$$x \in \partial f^*(y) \iff y \in \partial f(x) \iff f(x) + f^*(y) = x^T y.$$

- If $f(u, v) = f_1(u) + f_2(v)$, then
$$f^*(w, z) = f_1^*(w) + f_2^*(z).$$
Proof:
$$f^*(w, z) = \max_{u, v} \ (u^T, v^T)(w, z)^T - f(u, v) = \max_{u, v} \ u^T w + v^T z - f_1(u) - f_2(v) = \max_u \{ u^T w - f_1(u) \} + \max_v \{ v^T z - f_2(v) \} = f_1^*(w) + f_2^*(z).$$

13.3.3 Examples

- Simple quadratic: let $f(x) = \frac{1}{2} x^T Q x$, where $Q \succ 0$. Then $y^T x - \frac{1}{2} x^T Q x$ is strictly concave in $x$ and is maximized at $x = Q^{-1} y$, so
$$f^*(y) = \frac{1}{2} y^T Q^{-1} y.$$
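For a scalar function, the conjugate can be approximated by maximizing $yx - f(x)$ over a grid of $x$ values (a discrete Legendre transform). The sketch below takes the scalar case of the quadratic example, $f(x) = \frac{q}{2} x^2$ with the arbitrary choice $q = 2$, and checks three properties: $f^*(y) = y^2/(2q)$, Fenchel's inequality, and $f^{**} = f$.

The grid ranges are assumptions. The grid answer is only accurate where the true maximizer lies inside the grid, which is why `ys` and `xs_sub` are kept well inside the ranges they are maximized over.

```python
import numpy as np

def conjugate_on_grid(xs, fx, ys):
    # Approximates f^*(y) = max_x y*x - f(x) by maximizing over the grid points xs
    return np.max(np.outer(ys, xs) - fx[None, :], axis=1)

q = 2.0
xs = np.linspace(-5.0, 5.0, 2001)        # grid for x (spacing 0.005)
fx = 0.5 * q * xs**2                      # f(x) = (q/2) x^2, closed and convex
ys = np.linspace(-4.0, 4.0, 801)          # maximizers x = y/q stay inside the x grid
f_star = conjugate_on_grid(xs, fx, ys)

# Conjugate of the conjugate, evaluated where the maximizer y = q*x stays inside ys
xs_sub = np.linspace(-1.5, 1.5, 301)
f_2star = conjugate_on_grid(ys, f_star, xs_sub)

print(np.max(np.abs(f_star - ys**2 / (2 * q))))          # vs. exact y^2 / (2q)
print(np.max(np.abs(f_2star - 0.5 * q * xs_sub**2)))     # f** = f for closed convex f
```

Replacing `fx` with a nonconvex function shows the inequality $f^{**} \le f$ becoming strict in places.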

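- Indicator function: if $f(x) = I_C(x)$, then its conjugate
$$f^*(y) = I_C^*(y) = \max_{x \in C} y^T x$$
is called the support function of $C$.
- Norm: if $f(x) = \|x\|$, then its conjugate is
$$f^*(y) = I_{\{z : \|z\|_* \le 1\}}(y),$$
where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$.

13.3.4 Example: lasso dual

Given $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times p}$, recall the lasso problem:

$$\min_\beta \ \frac{1}{2} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1.$$

Because this problem has no constraints, its dual function is just a constant (equal to $f^\star$). Therefore we transform the primal to

$$\min_{\beta, z} \ \frac{1}{2} \|y - z\|_2^2 + \lambda \|\beta\|_1 \quad \text{subject to} \quad z = X\beta.$$

So the dual function is now

$$g(u) = \min_{\beta, z} \ \frac{1}{2} \|y - z\|_2^2 + \lambda \|\beta\|_1 + u^T (z - X\beta) = -\frac{1}{2} \|u\|_2^2 + y^T u - I_{\{v : \|v\|_\infty \le 1\}}(X^T u / \lambda).$$

Therefore the lasso dual problem is

$$\max_u \ -\frac{1}{2} \|u\|_2^2 + y^T u \quad \text{subject to} \quad \|X^T u\|_\infty \le \lambda,$$

or equivalently

$$\min_u \ \|y - u\|_2^2 \quad \text{subject to} \quad \|X^T u\|_\infty \le \lambda.$$

Check: Slater's condition holds, and hence so does strong duality. But note that the optimal value of the last problem is not the optimal lasso objective value. Further, given the dual solution $u$, any lasso solution $\beta$ satisfies

$$X\beta = y - u.$$

This follows from the KKT stationarity condition for $z$, i.e., $z - y + u = 0$. So the lasso fit is just the dual residual (see Figure 13.2).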
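The relationships above can be checked on a small random problem. This is a minimal sketch, assuming a lasso solution computed by proximal gradient descent (ISTA). ISTA is a stand-in solver chosen for brevity and is not discussed in this lecture; the data, $\lambda$, and iteration count are arbitrary.

From $\beta$ the sketch forms $u = y - X\beta$ and checks three things. First, $u$ is dual feasible. Second, $g(u) = -\frac12\|u\|_2^2 + y^T u$ matches the lasso objective, so the duality gap is zero. Third, the value $\|y - u\|_2^2$ of the transformed dual is a different number.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, iters=5000):
    # Proximal gradient (ISTA) with fixed step 1/L, L = largest eigenvalue of X^T X
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta = soft_threshold(beta - step * X.T @ (X @ beta - y), step * lam)
    return beta

rng = np.random.default_rng(3)
n, p = 30, 10
X = rng.standard_normal((n, p))
y = X @ np.r_[2.0, -1.0, np.zeros(p - 2)] + 0.1 * rng.standard_normal(n)
lam = 0.2 * np.max(np.abs(X.T @ y))

beta = lasso_ista(X, y, lam)
u = y - X @ beta                                  # dual solution, from X beta = y - u
primal = 0.5 * np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))
dual_value = y @ u - 0.5 * u @ u                  # g(u) = -1/2 ||u||^2 + y^T u
transformed_value = np.sum((y - u) ** 2)          # objective of the min ||y - u||^2 form
print(np.max(np.abs(X.T @ u)) / lam, primal - dual_value, primal, transformed_value)
```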

[Figure 13.2: The lasso solution and its dual solution.]

13.3.5 Conjugates and dual problems

Conjugates appear frequently when deriving dual problems, because minimizing the Lagrangian often produces terms of the form

$$-f^*(u) = \min_x \ f(x) - u^T x.$$

E.g., consider

$$\min_x \ f(x) + g(x).$$

Equivalently: $\min_{x, z} f(x) + g(z)$ subject to $x = z$. The Lagrange dual function is:

$$g(u) = \min_{x, z} \ f(x) + g(z) + u^T (z - x) = \min_x \{ f(x) - u^T x \} + \min_z \{ g(z) - (-u)^T z \} = -\max_x \{ u^T x - f(x) \} - \max_z \{ (-u)^T z - g(z) \} = -f^*(u) - g^*(-u).$$

Examples of this last calculation:

- Indicator function: the dual of
$$\min_x \ f(x) + I_C(x)$$
is
$$\max_u \ -f^*(u) - I_C^*(-u),$$
where $I_C^*$ is the support function of $C$.
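Here is a worked instance of the indicator-function case. It makes the hypothetical choices $f(x) = \frac12\|x - c\|_2^2$ and $C = \{x : \|x\|_\infty \le 1\}$. Then $f^*(u) = \frac12\|u\|_2^2 + c^T u$ and $I_C^*(-u) = \|u\|_1$, so the dual $\max_u -\frac12\|u\|_2^2 - c^T u - \|u\|_1$ is solved by $u = -S_1(c)$, where $S_1$ is soft-thresholding at level 1.

The primal solution $x = c + u$, obtained by minimizing $f(x) - u^T x$, should then equal the projection of $c$ onto the box. The sketch checks this and the equality of the optimal values.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(4)
c = 2.0 * rng.standard_normal(7)

# Primal: min_x 0.5*||x - c||^2 + I_C(x) with C the unit l_inf ball; solution is clipping
x_primal = np.clip(c, -1.0, 1.0)
primal_value = 0.5 * np.sum((x_primal - c) ** 2)

# Dual: max_u -f^*(u) - I_C^*(-u), with f^*(u) = 0.5||u||^2 + c^T u and I_C^*(-u) = ||u||_1
u = -soft_threshold(c, 1.0)
dual_value = -(0.5 * u @ u + c @ u) - np.sum(np.abs(u))

# Primal recovery: x minimizes f(x) - u^T x, i.e. x = c + u
x_from_dual = c + u
print(primal_value, dual_value, np.max(np.abs(x_from_dual - x_primal)))
```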

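- Norms: the dual of
$$\min_x \ f(x) + \|x\|$$
is
$$\max_u \ -f^*(u) - I_{\{z : \|z\|_* \le 1\}}(-u)$$
or equivalently
$$\max_u \ -f^*(u) \quad \text{subject to} \quad \|u\|_* \le 1,$$
where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$.

13.3.6 Shifting linear transformations

Dual formulations can help us by shifting a linear transformation from one part of the objective to another. Consider

$$\min_x \ f(x) + g(Ax).$$

Equivalently: $\min_{x, z} f(x) + g(z)$ subject to $Ax = z$. As before, the dual function is

$$g(u) = \min_{x, z} \ f(x) + g(z) + u^T (z - Ax) = -\max_x \{ (A^T u)^T x - f(x) \} - \max_z \{ (-u)^T z - g(z) \} = -f^*(A^T u) - g^*(-u).$$

The dual problem is then

$$\max_u \ -f^*(A^T u) - g^*(-u).$$

Example: for a norm $\|\cdot\|$ and its dual norm $\|\cdot\|_*$, the problems

$$\min_x \ f(x) + \|Ax\| \qquad \text{and} \qquad \max_u \ -f^*(A^T u) \quad \text{subject to} \quad \|u\|_* \le 1$$

are primal and dual pairs.

13.4 Dual cones

13.4.1 Definition

Recall that a set $K \subseteq \mathbb{R}^n$ is a cone if $tx \in K$ for all $x \in K$ and $t \ge 0$. The dual cone of $K$ is defined as

$$K^* = \{ y : y^T x \ge 0 \text{ for all } x \in K \}.$$

Important properties:

- $K^*$ is closed and convex.
- $K_1 \subseteq K_2 \implies K_2^* \subseteq K_1^*$.
- $K^{**}$ is the closure of the convex hull of $K$. (Hence if $K$ is convex and closed, $K^{**} = K$.)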
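Membership in a dual cone is easy to test when the cone is generated by finitely many vectors. Take $K = \{G\lambda : \lambda \ge 0\}$, with generators as the columns of $G$. Then $y^T G\lambda = (G^T y)^T \lambda$, so the infinitely many conditions $y^T x \ge 0$ for $x \in K$ reduce to $G^T y \ge 0$.

The sketch below uses this to illustrate the order-reversal property. Adding a generator enlarges $K$ and shrinks $K^*$. The generator matrices are arbitrary examples.

```python
import numpy as np

def in_dual_cone(y, G, tol=1e-12):
    # y is in K^* for K = {G @ lam : lam >= 0} iff every generator g satisfies g^T y >= 0
    return bool(np.all(G.T @ y >= -tol))

G1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])            # K1 = nonnegative orthant in R^2 (self-dual)
G2 = np.hstack([G1, [[-1.0], [1.0]]])  # K2 adds the generator (-1, 1), so K1 is a subset of K2

rng = np.random.default_rng(5)
ys = rng.standard_normal((1000, 2))
in_K1_star = np.array([in_dual_cone(y, G1) for y in ys])
in_K2_star = np.array([in_dual_cone(y, G2) for y in ys])

# Order reversal: every sampled y in K2^* is also in K1^*, and K2^* is strictly smaller here
print(in_K1_star.sum(), in_K2_star.sum(), np.all(~in_K2_star | in_K1_star))
```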

[Figure 13.3 (from B & V [1], p. 52). Left: the halfspace with inward normal $y$ contains the cone $K$, so $y \in K^*$. Right: the halfspace with inward normal $z$ does not contain $K$, so $z \notin K^*$.]

13.4.2 Examples

- Linear subspace: the dual cone of a linear subspace $V$ is $V^\perp$, its orthogonal complement. E.g., $(\mathrm{row}(A))^* = \mathrm{null}(A)$.
- Norm cone: the dual cone of the norm cone
$$K = \{ (x, t) \in \mathbb{R}^{n+1} : \|x\| \le t \}$$
is the norm cone of its dual norm,
$$K^* = \{ (y, s) \in \mathbb{R}^{n+1} : \|y\|_* \le s \}.$$
- Positive semidefinite cone: the convex cone $\mathbb{S}^n_+$ is self-dual, i.e., $(\mathbb{S}^n_+)^* = \mathbb{S}^n_+$, because
$$Y \succeq 0 \iff \mathrm{tr}(YX) \ge 0 \ \text{for all } X \succeq 0.$$

13.4.3 Dual cones and dual problems

Consider the cone constrained problem

$$\min_x \ f(x) \quad \text{subject to} \quad Ax \in K.$$

Its dual problem is

$$\max_u \ -f^*(A^T u) - I_K^*(-u),$$

where $I_K^*(y) = \max_{z \in K} z^T y$ is the support function of $K$. If $K$ is a cone, then $I_K^*(-u) = I_{K^*}(u)$, so this is equivalent to

$$\max_u \ -f^*(A^T u) \quad \text{subject to} \quad u \in K^*,$$

where $K^*$ is the dual cone of $K$. It is usually easier to handle a cone constraint like $u \in K^*$ than a constraint requiring a linear transform of $x$ to lie in a cone, i.e., $Ax \in K$.
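The self-duality of $\mathbb{S}^n_+$ can be probed numerically. This is a minimal sketch with arbitrary random matrices; random sampling illustrates the claim but does not prove it. For PSD $X$ and $Y$, $\mathrm{tr}(YX) \ge 0$.

Conversely, suppose a symmetric $Y$ has a negative eigenvalue with unit eigenvector $v$. Then $X = vv^T$ is PSD and $\mathrm{tr}(YX) = v^T Y v < 0$, which certifies that $Y \notin (\mathbb{S}^n_+)^*$.

```python
import numpy as np

rng = np.random.default_rng(6)

def random_psd(n):
    B = rng.standard_normal((n, n))
    return B @ B.T

n = 4
# Trace inner products of PSD pairs are nonnegative
traces = [np.trace(random_psd(n) @ random_psd(n)) for _ in range(1000)]

# A symmetric matrix with a negative eigenvalue is not in the dual cone
Y = np.diag([3.0, 1.0, -0.5, 2.0])
eigvals, eigvecs = np.linalg.eigh(Y)   # eigenvalues in ascending order
v = eigvecs[:, 0]                      # eigenvector of the most negative eigenvalue
X = np.outer(v, v)                     # PSD (rank one)
witness = np.trace(Y @ X)              # equals v^T Y v = eigvals[0]
print(min(traces), witness)
```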

13.5 Double dual

Consider a general minimization problem with linear constraints:

$$\min_x \ f(x) \quad \text{subject to} \quad Ax \le b, \ Cx = d.$$

The Lagrangian is

$$L(x, u, v) = f(x) + (A^T u + C^T v)^T x - b^T u - d^T v,$$

and hence the dual problem is

$$\max_{u, v} \ -f^*(-A^T u - C^T v) - b^T u - d^T v \quad \text{subject to} \quad u \ge 0.$$

Recall the property $f^{**} = f$ if $f$ is closed and convex. Hence in this case, we can show that the dual of the dual is the primal. In fact, this goes beyond linear constraints. Consider

$$\min_x \ f(x) \quad \text{subject to} \quad h_i(x) \le 0,\ i = 1, \dots, m, \qquad \ell_j(x) = 0,\ j = 1, \dots, r.$$

If $f$ and $h_1, \dots, h_m$ are closed and convex, and $\ell_1, \dots, \ell_r$ are affine, then the dual of the dual is the primal. This is proved by viewing the minimization problem in terms of a bifunction. In this framework, the dual function corresponds to the conjugate of this bifunction. See Chapters 29 and 30 of Rockafellar [2].

13.6 Dual subtleties

- We often transform the dual into an equivalent problem and still call this the dual. Under strong duality, we can use solutions of the (transformed) dual problem to characterize or compute the primal solutions. Warning: the optimal value of this transformed dual problem is not necessarily the optimal primal value.
- A common trick in deriving duals for unconstrained problems is to first transform the primal by adding a dummy variable and an equality constraint, as in the lasso dual above. There is usually some ambiguity in how to do this, and different choices can lead to different dual problems.

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[2] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.