10-725/36-725: Convex Optimization, Fall 2016
Lecture 13: Duality Uses and Correspondences
Lecturer: Ryan Tibshirani. Scribes: Yichong Xu, Yanyu Liang, Yanning Li

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

13.1 Last Class

13.1.1 KKT conditions

For the problem
$$\min_x\ f(x) \quad \text{subject to} \quad h_i(x) \le 0,\ i = 1,\dots,m, \qquad \ell_j(x) = 0,\ j = 1,\dots,r,$$
the KKT conditions are

Stationarity: $0 \in \partial f(x) + \sum_{i=1}^m u_i\, \partial h_i(x) + \sum_{j=1}^r v_j\, \partial \ell_j(x)$,
Complementary slackness: $u_i\, h_i(x) = 0$ for all $i$,
Primal feasibility: $h_i(x) \le 0$ and $\ell_j(x) = 0$ for all $i, j$,
Dual feasibility: $u_i \ge 0$ for all $i$.

The KKT conditions are always sufficient for optimality, and they are also necessary under strong duality.

13.1.2 Uses of duality

For $x$ primal feasible and $u, v$ dual feasible, $f(x) - g(u, v)$ is called the duality gap between $x$ and $(u, v)$. Since $f(x^\star) \ge g(u, v)$, we have
$$f(x) - f(x^\star) \le f(x) - g(u, v).$$
So a zero duality gap implies optimality. The duality gap can also be used as a stopping criterion in algorithms.

Under strong duality, if we are given a dual optimal pair $u^\star, v^\star$, then any primal solution $x^\star$ minimizes $L(x, u^\star, v^\star)$ over all $x$, because of the stationarity condition. This can be used to characterize or compute primal solutions. Explicitly, given a dual solution $u^\star, v^\star$, any primal solution $x^\star$ solves
$$\min_x\ f(x) + \sum_{i=1}^m u_i^\star h_i(x) + \sum_{j=1}^r v_j^\star \ell_j(x).$$
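As a small illustration of the duality gap as a stopping criterion, the sketch below runs projected gradient descent on a nonnegatively constrained quadratic program, $\min_x \tfrac12 x^TQx - c^Tx$ subject to $x \ge 0$ with $Q \succ 0$. This test problem is an assumption chosen for convenience, not one taken from the lecture. With $h(x) = -x$, the dual function is $g(u) = \min_x f(x) - u^Tx = -\tfrac12 (c+u)^TQ^{-1}(c+u)$, and any $u \ge 0$ is dual feasible. The names `primal_obj`, `dual_fn`, and `nonneg_qp` are hypothetical, and building $u$ from the clipped stationarity residual is just one simple heuristic.

```python
import numpy as np


def primal_obj(Q, c, x):
    """f(x) = 1/2 x^T Q x - c^T x."""
    return 0.5 * x @ Q @ x - c @ x


def dual_fn(Q, c, u):
    """g(u) = min_x f(x) - u^T x, attained at x = Q^{-1}(c + u)."""
    w = c + u
    return -0.5 * w @ np.linalg.solve(Q, w)


def nonneg_qp(Q, c, tol=1e-8, max_iter=10_000):
    """Projected gradient for min_x f(x) s.t. x >= 0, stopped once the duality gap <= tol."""
    x = np.zeros(len(c))
    step = 1.0 / np.linalg.eigvalsh(Q).max()   # 1/L, L = Lipschitz constant of Qx - c
    for k in range(max_iter + 1):
        # Dual-feasible point from stationarity Qx - c - u = 0, clipped so that u >= 0.
        u = np.maximum(Q @ x - c, 0.0)
        # By weak duality this gap is an upper bound on f(x) - f(x*).
        gap = primal_obj(Q, c, x) - dual_fn(Q, c, u)
        if gap <= tol or k == max_iter:
            return x, u, gap, k
        x = np.maximum(x - step * (Q @ x - c), 0.0)   # projection keeps x primal feasible


Q = np.diag([1.0, 2.0])
c = np.array([1.0, -1.0])
x, u, gap, k = nonneg_qp(Q, c)
print(x, u, gap, k)
```

Because every iterate satisfies $x \ge 0$ and every $u \ge 0$, the reported gap certifies how far $f(x)$ can be from the optimum, which is what makes it a safe stopping rule.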
Solutions of this unconstrained problem can often be expressed explicitly, giving an explicit characterization of primal solutions from dual solutions.

Example (B & V page 249). Consider the problem
$$\min_x\ \sum_{i=1}^n f_i(x_i) \quad \text{subject to} \quad a^Tx = b,$$
where each $f_i : \mathbb{R} \to \mathbb{R}$ is smooth and strictly convex. The dual function is
$$g(v) = \min_x\ \sum_{i=1}^n f_i(x_i) + v(b - a^Tx) = bv + \sum_{i=1}^n \min_{x_i}\ \{f_i(x_i) - a_i v x_i\} = bv - \sum_{i=1}^n f_i^*(a_i v),$$
where $f_i^*$ is the conjugate of $f_i$, defined in Section 13.3. The dual problem is thus
$$\max_v\ bv - \sum_{i=1}^n f_i^*(a_i v) \qquad \text{or} \qquad \min_v\ \sum_{i=1}^n f_i^*(a_i v) - bv.$$
This is a convex minimization problem with a scalar variable, so it is much easier to solve than the primal problem. Given $v^\star$, the primal solution $x^\star$ solves
$$\min_x\ \sum_{i=1}^n \big(f_i(x_i) - a_i v^\star x_i\big).$$
Since each $f_i$ is strictly convex, this problem has a unique solution, which can be computed by solving $f_i'(x_i) = a_i v^\star$ for each $i$.

13.2 Dual norms

Let $\|x\|$ be an arbitrary norm. For example:
$\ell_p$ norm: $\|x\|_p = \big(\sum_{i=1}^n |x_i|^p\big)^{1/p}$, for $p \ge 1$.
Trace norm: $\|X\|_{\mathrm{tr}} = \sum_{i=1}^r \sigma_i(X)$, where $r = \mathrm{rank}(X)$.

Define its dual norm as
$$\|x\|_* = \max_{\|z\| \le 1} z^Tx.$$
The definition gives us the inequality $|z^Tx| \le \|z\|\,\|x\|_*$, similar to the Cauchy-Schwarz inequality.

Examples: for the
$\ell_p$ norm, $(\|x\|_p)_* = \|x\|_q$, where $1/p + 1/q = 1$; for the trace norm, $(\|X\|_{\mathrm{tr}})_* = \|X\|_{\mathrm{op}} = \sigma_1(X)$.

We can show the following.

Theorem 13.1 The dual of the dual norm is the original norm, i.e., $\|x\|_{**} = \|x\|$.

Proof: Consider the problem
$$\min_y\ \|y\| \quad \text{subject to} \quad y = x,$$
whose optimal value is $\|x\|$. Its Lagrangian is
$$L(y, u) = \|y\| + u^T(x - y) = \|y\| - y^Tu + x^Tu.$$
If $\|u\|_* > 1$, let $z$ be the maximizer in $\max_{\|z\| \le 1} z^Tu$, so that $z^Tu = \|u\|_*$; note that $\|z\| = 1$. Taking $y = tz$ for $t > 0$ gives
$$\|y\| - y^Tu = t(\|z\| - z^Tu) = t(1 - \|u\|_*) \to -\infty \quad \text{as } t \to \infty,$$
so in this case $\min_y\ \|y\| - y^Tu = -\infty$. If $\|u\|_* \le 1$, then $\|y\| - y^Tu \ge \|y\| - \|y\|\,\|u\|_* \ge 0$, with equality at $y = 0$, so $\min_y\ \|y\| - y^Tu = 0$. Therefore the Lagrange dual problem is
$$\max_u\ u^Tx \quad \text{subject to} \quad \|u\|_* \le 1,$$
whose optimal value is the dual norm of $\|\cdot\|_*$ evaluated at $x$, i.e., $\|x\|_{**}$. By strong duality, $\|x\|_{**} = \|x\|$.

13.3 Conjugate function

13.3.1 Definition

Given a function $f : \mathbb{R}^n \to \mathbb{R}$, define its conjugate $f^* : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ as
$$f^*(y) = \max_x\ y^Tx - f(x).$$
Since $y^Tx - f(x)$ is affine, hence convex, in $y$ for any fixed $x$, $f^*$ is always convex as a pointwise maximum of convex (affine) functions of $y$; this holds even when $f$ is not convex. $f^*(y)$ is the maximum gap between the linear function $y^Tx$ and $f(x)$; Figure 13.1 shows this when $f$ is a scalar function. For differentiable $f$, conjugation is called the Legendre transform.
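The definition of the conjugate can be explored numerically by maximizing $yx - f(x)$ over a grid of $x$ values. The sketch below assumes the scalar example $f(x) = e^x$, whose conjugate is $f^*(y) = y\log y - y$ for $y > 0$, and compares the grid approximation with that closed form. The grid range and spacing are arbitrary choices, and `conjugate_on_grid` is a hypothetical helper.

```python
import numpy as np


def conjugate_on_grid(f, y, grid):
    """Approximate f*(y) = max_x y*x - f(x) by maximizing over a finite grid of x.

    Maximizing over a subset can only undershoot the true supremum, so this is a
    lower bound; it is accurate when the maximizer lies well inside a fine grid.
    """
    return np.max(y * grid - f(grid))


grid = np.linspace(-20.0, 20.0, 400_001)   # spacing 1e-4
for y in [0.5, 1.0, 2.0, 5.0]:
    approx = conjugate_on_grid(np.exp, y, grid)
    exact = y * np.log(y) - y               # conjugate of exp, valid for y > 0
    print(f"y = {y}: grid {approx:.8f}, closed form {exact:.8f}")
```

A finite grid cannot represent $f^*(y) = +\infty$ (here the supremum is unbounded for $y < 0$, as $x \to -\infty$), so such a check is only meaningful where the maximizer lies well inside the grid.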
Given $y$, the upper dashed line represents the function $g(x) = yx$ and the solid line represents $f(x)$; $f^*(y)$ is the biggest gap between $g$ and $f$ where $g$ lies above $f$. The lower dashed line is drawn to find this biggest gap, and the absolute value of its intercept equals the value of the biggest gap.
Figure 13.1: From [1], p. 91.

13.3.2 Properties

Fenchel's inequality: for any $x, y$,
$$f(x) + f^*(y) \ge x^Ty.$$
Proof: $f^*(y) = \max_z\ z^Ty - f(z) \ge x^Ty - f(x)$.

Hence the conjugate of the conjugate, $f^{**}$, satisfies $f^{**} \le f$.
Proof: $f^{**}(x) = \max_z\ z^Tx - f^*(z) \le \max_z\ f(x) = f(x)$, where the inequality comes from Fenchel's inequality.

If $f$ is closed and convex, then $f^{**} = f$.
Proof: $f(x)$ equals the optimal value of $\min_y\ f(y)$ subject to $y = x$. The dual of this problem is $\max_u\ u^Tx - f^*(u) = f^{**}(x)$. Since strong duality holds, the equality follows.

If $f$ is closed and convex, then for any $x, y$,
$$x \in \partial f^*(y) \iff y \in \partial f(x) \iff f(x) + f^*(y) = x^Ty.$$

If $f(u, v) = f_1(u) + f_2(v)$, then
$$f^*(w, z) = f_1^*(w) + f_2^*(z).$$
Proof:
$$f^*(w, z) = \max_{u, v}\ (u^T, v^T)(w, z)^T - f(u, v) = \max_{u, v}\ u^Tw + v^Tz - f_1(u) - f_2(v) = \max_u\ \{u^Tw - f_1(u)\} + \max_v\ \{v^Tz - f_2(v)\} = f_1^*(w) + f_2^*(z).$$

13.3.3 Examples

Simple quadratic: let $f(x) = \tfrac12 x^TQx$, where $Q \succ 0$. Then $y^Tx - \tfrac12 x^TQx$ is strictly concave in $x$ and is maximized at $x = Q^{-1}y$, so
$$f^*(y) = \tfrac12 y^TQ^{-1}y.$$
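The simple quadratic conjugate makes the B & V example from Section 13.1.2 concrete. The sketch below assumes $f_i(x_i) = \tfrac12 q_i x_i^2$ with $q_i > 0$, so $f_i^*(y) = y^2/(2q_i)$. The derivative of the scalar dual objective $\sum_i f_i^*(a_i v) - bv$ is $a^Tx(v) - b$, because $(f_i^*)'(a_i v) = x_i(v)$, where $x_i(v)$ solves $f_i'(x_i) = a_i v$. This derivative is increasing in $v$, so bisection finds $v^\star$, and the closed form $v^\star = b / \sum_i (a_i^2/q_i)$ serves as a check. The helper names, the data, and the bisection bracket $[-10^6, 10^6]$ (assumed to contain the root) are illustrative assumptions.

```python
import numpy as np


def primal_from_dual(v, a, q):
    """x_i(v) = argmin_x f_i(x) - a_i v x for f_i(x) = q_i x^2 / 2, i.e. q_i x_i = a_i v."""
    return a * v / q


def solve_scalar_dual(a, b, q, lo=-1e6, hi=1e6, iters=200):
    """Bisection on the dual derivative a^T x(v) - b, which is increasing in v."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if a @ primal_from_dual(mid, a, q) > b:
            hi = mid      # derivative positive: the root lies to the left
        else:
            lo = mid
    return 0.5 * (lo + hi)


a = np.array([1.0, 2.0, -1.0])
q = np.array([1.0, 2.0, 4.0])
b = 3.0
v_star = solve_scalar_dual(a, b, q)
x_star = primal_from_dual(v_star, a, q)      # primal solution recovered from the dual
v_closed = b / np.sum(a ** 2 / q)            # maximizer of b v - v^2 sum_i a_i^2 / (2 q_i)
print(v_star, v_closed, a @ x_star)
```

For non-quadratic $f_i$ only `primal_from_dual` changes, since it must solve $f_i'(x_i) = a_i v$; any scalar root finder could replace bisection.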
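Indicator function: if $f(x) = I_C(x)$, then its conjugate is
$$f^*(y) = \max_{x \in C}\ y^Tx := I_C^*(y),$$
called the support function of $C$.

Norm: if $f(x) = \|x\|$, then its conjugate is
$$f^*(y) = I_{\{z : \|z\|_* \le 1\}}(y),$$
where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$.

13.3.4 Example: lasso dual

Given $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times p}$, recall the lasso problem:
$$\min_\beta\ \tfrac12 \|y - X\beta\|_2^2 + \lambda \|\beta\|_1.$$
Its dual function is just a constant (equal to the optimal value $f^\star$), since the problem has no constraints. Therefore we transform the primal to
$$\min_{\beta, z}\ \tfrac12 \|y - z\|_2^2 + \lambda \|\beta\|_1 \quad \text{subject to} \quad z = X\beta.$$
So the dual function is now
$$g(u) = \min_{\beta, z}\ \tfrac12 \|y - z\|_2^2 + \lambda \|\beta\|_1 + u^T(z - X\beta) = \tfrac12 \|y\|_2^2 - \tfrac12 \|y - u\|_2^2 - I_{\{v : \|v\|_\infty \le 1\}}(X^Tu/\lambda).$$
Therefore the lasso dual problem is
$$\max_u\ \tfrac12 \|y\|_2^2 - \tfrac12 \|y - u\|_2^2 \quad \text{subject to} \quad \|X^Tu\|_\infty \le \lambda,$$
or equivalently
$$\min_u\ \|y - u\|_2^2 \quad \text{subject to} \quad \|X^Tu\|_\infty \le \lambda.$$
Check: Slater's condition holds, and hence so does strong duality. But note: the optimal value of the last problem is not the optimal lasso objective value. Further, note that given the dual solution $u$, any lasso solution $\beta$ satisfies
$$X\beta = y - u.$$
This follows from the KKT stationarity condition for $z$ (i.e., $z - y + u = 0$). So the lasso fit is just the dual residual (see Figure 13.2).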
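A quick numerical check of the lasso dual: the sketch below solves a small random lasso instance with proximal gradient descent (ISTA), a solver chosen here for brevity and not covered in this lecture. It then forms $u = y - X\beta$ and verifies dual feasibility $\|X^Tu\|_\infty \le \lambda$ and that the lasso objective equals the dual value $\tfrac12\|y\|_2^2 - \tfrac12\|y - u\|_2^2$. The data, $\lambda = 5$, the iteration count, and the function names are all assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, iters=5000):
    """Proximal gradient (ISTA) for min_b 1/2 ||y - X b||_2^2 + lam ||b||_1."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the smooth part's gradient
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = rng.standard_normal(50)
lam = 5.0

beta = lasso_ista(X, y, lam)
u = y - X @ beta                          # dual point from X beta = y - u
primal = 0.5 * np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))
dual = 0.5 * np.sum(y ** 2) - 0.5 * np.sum((y - u) ** 2)
print(np.max(np.abs(X.T @ u)), primal, dual)   # first value should not exceed lam
```

Here $\|y - u\|_2^2 = \|X\beta\|_2^2$, the value of the projection form of the dual, which in general differs from the lasso objective; this matches the caveat about transformed duals.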
Figure 13.2: The lasso solution and its dual solution.

13.3.5 Conjugates and dual problems

Conjugates appear frequently in the derivation of dual problems, via
$$-f^*(u) = \min_x\ f(x) - u^Tx$$
in the minimization of the Lagrangian. E.g., consider
$$\min_x\ f(x) + g(x).$$
Equivalently: $\min_{x, z}\ f(x) + g(z)$ subject to $x = z$. The Lagrange dual function is
$$g(u) = \min_{x, z}\ f(x) + g(z) + u^T(z - x) = \min_x\ \{f(x) - u^Tx\} + \min_z\ \{g(z) - (-u)^Tz\} = -\max_x\ \{u^Tx - f(x)\} - \max_z\ \{(-u)^Tz - g(z)\} = -f^*(u) - g^*(-u).$$

Examples of this last calculation:

Indicator function: the dual of
$$\min_x\ f(x) + I_C(x)$$
is
$$\max_u\ -f^*(u) - I_C^*(-u),$$
where $I_C^*$ is the support function of $C$.

Norms: the dual of
$$\min_x\ f(x) + \|x\|$$
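is
$$\max_u\ -f^*(u) - I_{\{z : \|z\|_* \le 1\}}(-u),$$
or equivalently
$$\max_u\ -f^*(u) \quad \text{subject to} \quad \|u\|_* \le 1,$$
where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$.

13.3.6 Shifting linear transformations

Dual formulations can help us by shifting a linear transformation between one part of the objective and another. Consider
$$\min_x\ f(x) + g(Ax).$$
Equivalently: $\min_{x, z}\ f(x) + g(z)$ subject to $Ax = z$. Like before, the dual function is
$$g(u) = \min_{x, z}\ f(x) + g(z) + u^T(z - Ax) = -\max_x\ \{(A^Tu)^Tx - f(x)\} - \max_z\ \{(-u)^Tz - g(z)\} = -f^*(A^Tu) - g^*(-u),$$
so the dual problem is
$$\max_u\ -f^*(A^Tu) - g^*(-u).$$
Example: for a norm $\|\cdot\|$ and its dual norm $\|\cdot\|_*$, the problems
$$\min_x\ f(x) + \|Ax\| \qquad \text{and} \qquad \max_u\ -f^*(A^Tu) \quad \text{subject to} \quad \|u\|_* \le 1$$
are primal and dual pairs.

13.4 Dual cones

13.4.1 Definition

Recall that a set $K \subseteq \mathbb{R}^n$ is a cone if $x \in K$ and $t \ge 0$ imply $tx \in K$. The dual cone of $K$ is defined as
$$K^* = \{y : y^Tx \ge 0 \text{ for all } x \in K\}.$$
Important properties:
$K^*$ is closed and convex.
$K_1 \subseteq K_2 \implies K_2^* \subseteq K_1^*$.
$K^{**}$ is the closure of the convex hull of $K$. (Hence if $K$ is convex and closed, $K^{**} = K$.)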
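The norm example of shifting a linear transformation can be tried out numerically. The sketch below assumes $f(x) = \tfrac12\|x - c\|_2^2$, so $f^*(w) = \tfrac12\|w\|_2^2 + c^Tw$, and takes $\|Ax\|$ to be $\|Dx\|_1$ with $D$ a first-difference matrix (the 1D fused lasso, or total variation denoising, problem; this choice is an assumption for illustration). The primal couples a nonsmooth term through $D$, while the dual $\max_u -f^*(D^Tu)$ subject to $\|u\|_\infty \le 1$ is a smooth problem over a box that projected gradient handles easily. The primal solution is recovered from stationarity, $x - c - D^Tu = 0$. The function names and data are hypothetical.

```python
import numpy as np


def first_differences(n):
    """(n-1) x n matrix D with (D x)_i = x_{i+1} - x_i."""
    return np.diff(np.eye(n), axis=0)


def tv_denoise_via_dual(c, iters=5000):
    """Solve min_x 1/2||x - c||^2 + ||D x||_1 through its dual
    max_u -f*(D^T u) s.t. ||u||_inf <= 1, where f*(w) = 1/2||w||^2 + c^T w."""
    D = first_differences(len(c))
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    u = np.zeros(D.shape[0])
    for _ in range(iters):
        # Projected gradient on the equivalent problem: minimize f*(D^T u) over the box.
        grad = D @ (D.T @ u + c)
        u = np.clip(u - step * grad, -1.0, 1.0)
    x = c + D.T @ u          # primal recovery from stationarity
    return x, u, D


c = np.array([0.0, 0.2, 3.0, 3.1, 2.9, -1.0])
x, u, D = tv_denoise_via_dual(c)
primal = 0.5 * np.sum((x - c) ** 2) + np.sum(np.abs(D @ x))
w = D.T @ u
dual = -(0.5 * w @ w + c @ w)               # -f*(D^T u)
print(x.round(4), primal - dual)            # gap should be close to zero
```

By weak duality the printed gap can never be meaningfully negative, and it shrinks toward zero as the dual iterates converge.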
Left: the halfspace with inward normal $y$ contains the cone $K$, so $y \in K^*$. Right: the halfspace with inward normal $z$ does not contain $K$, so $z \notin K^*$.
Figure 13.3: From B & V [1], p. 52.

13.4.2 Examples

Linear subspace: the dual cone of a linear subspace $V$ is $V^\perp$, its orthogonal complement. E.g., $(\mathrm{row}(A))^* = \mathrm{null}(A)$.

Norm cone: the dual cone of the norm cone
$$K = \{(x, t) \in \mathbb{R}^{n+1} : \|x\| \le t\}$$
is the norm cone of its dual norm,
$$K^* = \{(y, s) \in \mathbb{R}^{n+1} : \|y\|_* \le s\}.$$

Positive semidefinite cone: the convex cone $\mathbb{S}^n_+$ is self-dual, i.e., $(\mathbb{S}^n_+)^* = \mathbb{S}^n_+$, since
$$Y \succeq 0 \iff \mathrm{tr}(YX) \ge 0 \text{ for all } X \succeq 0.$$

13.4.3 Dual cones and dual problems

Consider the cone-constrained problem
$$\min_x\ f(x) \quad \text{subject to} \quad Ax \in K.$$
Its dual problem is
$$\max_u\ -f^*(A^Tu) - I_K^*(-u),$$
where $I_K^*(y) = \max_{z \in K} z^Ty$ is the support function of $K$. If $K$ is a cone, then $I_K^*(-u) = I_{K^*}(u)$, so this is equivalent to
$$\max_u\ -f^*(A^Tu) \quad \text{subject to} \quad u \in K^*,$$
where $K^*$ is the dual cone of $K$. It is usually easier to handle cone constraints like $u \in K^*$ than constraints stating that a linear transformation of $x$ lies in a cone, i.e., $Ax \in K$.
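The norm-cone example can be sanity-checked numerically for the $\ell_1$ norm, whose dual norm is $\ell_\infty$. The sketch below samples points of $K = \{(x, t) : \|x\|_1 \le t\}$ and of the claimed dual cone $\{(y, s) : \|y\|_\infty \le s\}$ and confirms that all pairwise inner products $x^Ty + ts$ are nonnegative. For a point with $\|y\|_\infty > s$, it then builds an explicit $(x, t) \in K$ with negative inner product, which shows that point lies outside $K^*$. The dimension, seed, and helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4


def sample_norm_cone(ord_, size):
    """Random points (x, t) with ||x||_ord <= t (t is inflated by a random factor >= 1)."""
    x = rng.standard_normal((size, n))
    t = np.linalg.norm(x, ord=ord_, axis=1) * (1.0 + rng.random(size))
    return x, t


x, t = sample_norm_cone(1, 1000)          # points of K, the l1 norm cone
y, s = sample_norm_cone(np.inf, 1000)     # points of the l_inf norm cone, claimed to be K*
inner = x @ y.T + np.outer(t, s)          # (x, t)^T (y, s) for every pair
print(inner.min() >= 0)


def separating_point(y0, s0):
    """If ||y0||_inf > s0, return (x, t) in the l1 cone with x^T y0 + t s0 < 0."""
    i = np.argmax(np.abs(y0))
    x0 = np.zeros_like(y0)
    x0[i] = -np.sign(y0[i])               # ||x0||_1 = 1 <= t = 1
    return x0, 1.0


y0 = np.array([0.5, -2.0, 1.0, 0.0])
s0 = 1.5                                   # ||y0||_inf = 2 > s0, so (y0, s0) is not in K*
x0, t0 = separating_point(y0, s0)
print(x0 @ y0 + t0 * s0)                   # -0.5
```

The separating point mirrors the argument $x^Ty + ts \ge -\|x\|_1\|y\|_\infty + ts \ge 0$, which is tight exactly when $x$ concentrates on the largest entry of $y$.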
13.5 Double dual

Consider a general minimization problem with linear constraints:
$$\min_x\ f(x) \quad \text{subject to} \quad Ax \le b, \quad Cx = d.$$
The Lagrangian is
$$L(x, u, v) = f(x) + (A^Tu + C^Tv)^Tx - b^Tu - d^Tv,$$
and hence the dual problem is
$$\max_{u, v}\ -f^*(-A^Tu - C^Tv) - b^Tu - d^Tv \quad \text{subject to} \quad u \ge 0.$$
Recall the property $f^{**} = f$ if $f$ is closed and convex. Hence in this case we can show that the dual of the dual is the primal. In fact, this also goes beyond linear constraints. Consider
$$\min_x\ f(x) \quad \text{subject to} \quad h_i(x) \le 0,\ i = 1,\dots,m, \qquad \ell_j(x) = 0,\ j = 1,\dots,r.$$
If $f$ and $h_1, \dots, h_m$ are closed and convex, and $\ell_1, \dots, \ell_r$ are affine, then the dual of the dual is the primal. This is proved by viewing the minimization problem in terms of a bifunction; in this framework, the dual function corresponds to the conjugate of this bifunction. See Chapters 29 and 30 of Rockafellar [2].

13.6 Dual subtleties

We often transform the dual into an equivalent problem and still call this the dual. Under strong duality, we can use solutions of the (transformed) dual problem to characterize or compute the primal solutions. Warning: the optimal value of this transformed dual problem is not necessarily the optimal primal value.

A common trick in deriving duals for unconstrained problems is to first transform the primal by adding a dummy variable and an equality constraint, as in the lasso dual example above. Usually there is ambiguity in how to do this, and different choices can lead to different dual problems.

References

[1] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[2] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.