Duality Uses and Correspondences
Ryan Tibshirani
Convex Optimization 10-725
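Last time: KKT conditions

Recall that for the problem

$$\min_x \; f(x) \quad \text{subject to} \quad h_i(x) \le 0, \; i = 1, \dots, m, \qquad \ell_j(x) = 0, \; j = 1, \dots, r$$

the KKT conditions are

$$\begin{aligned}
& 0 \in \partial \Big( f(x) + \sum_{i=1}^m u_i h_i(x) + \sum_{j=1}^r v_j \ell_j(x) \Big) && \text{(stationarity)} \\
& u_i \, h_i(x) = 0 \ \text{for all } i && \text{(complementary slackness)} \\
& h_i(x) \le 0, \ \ell_j(x) = 0 \ \text{for all } i, j && \text{(primal feasibility)} \\
& u_i \ge 0 \ \text{for all } i && \text{(dual feasibility)}
\end{aligned}$$

These are necessary for optimality (of a primal-dual pair $x^\star$ and $u^\star, v^\star$) under strong duality, and always sufficient.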
Uses of duality

Two key uses of duality:

- For $x$ primal feasible and $u, v$ dual feasible, $f(x) - g(u, v)$ is called the duality gap between $x$ and $u, v$. Since
  $$f(x) - f(x^\star) \le f(x) - g(u, v),$$
  a zero duality gap implies optimality. Also, the duality gap can be used as a stopping criterion in algorithms.
- Under strong duality, given dual optimal $u^\star, v^\star$, any primal solution minimizes $L(x, u^\star, v^\star)$ over all $x$ (i.e., it satisfies the stationarity condition). This can be used to characterize or compute primal solutions.
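As a small illustration of the duality gap as a stopping criterion, the sketch below uses a toy problem that is not from the lecture: $\min_x x^2$ subject to $1 - x \le 0$, whose dual function is $g(u) = u - u^2/4$. The solver (projected gradient ascent on the dual), the step size, and the tolerance are all assumptions chosen for illustration.

```python
def dual_ascent_with_gap(step=1.0, tol=1e-8, max_iter=1000):
    """Projected gradient ascent on g(u) for: min x^2 subject to 1 - x <= 0.

    Toy problem and parameter values are illustrative assumptions.
    Stops as soon as the duality gap f(x) - g(u) falls below tol.
    """
    f = lambda x: x * x              # primal objective
    g = lambda u: u - u * u / 4.0    # dual function: min_x x^2 + u (1 - x)

    u = 0.0
    iterations = 0
    while True:
        # The Lagrangian minimizer is x = u / 2; push it to x >= 1 so that
        # the point used in the gap is primal feasible.
        x_feasible = max(u / 2.0, 1.0)
        gap = f(x_feasible) - g(u)   # upper bound on f(x_feasible) - f(x*)
        if gap <= tol or iterations >= max_iter:
            return x_feasible, u, gap, iterations
        # Gradient of g at u is the constraint value at the Lagrangian minimizer.
        u = max(0.0, u + step * (1.0 - u / 2.0))
        iterations += 1


x_hat, u_hat, gap, iterations = dual_ascent_with_gap()
```

Because the gap bounds the suboptimality of `x_hat`, the tolerance `tol` directly certifies how far the returned point can be from optimal in objective value.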
Solving the primal via the dual

An important consequence of stationarity: under strong duality, given a dual solution $u^\star, v^\star$, any primal solution $x^\star$ solves

$$\min_x \; f(x) + \sum_{i=1}^m u_i^\star h_i(x) + \sum_{j=1}^r v_j^\star \ell_j(x)$$

Often, solutions of this unconstrained problem can be expressed explicitly, giving an explicit characterization of primal solutions from dual solutions.

Furthermore, suppose the solution of this problem is unique; then it must be the primal solution $x^\star$.

This can be very helpful when the dual is easier to solve than the primal.
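Example from B & V page 249:

$$\min_x \; \sum_{i=1}^n f_i(x_i) \quad \text{subject to} \quad a^T x = b$$

where each $f_i : \mathbb{R} \to \mathbb{R}$ is smooth, strictly convex. Dual function:

$$\begin{aligned}
g(v) &= \min_x \; \sum_{i=1}^n f_i(x_i) + v (b - a^T x) \\
&= bv + \sum_{i=1}^n \min_{x_i} \big\{ f_i(x_i) - a_i v x_i \big\} \\
&= bv - \sum_{i=1}^n f_i^*(a_i v)
\end{aligned}$$

where $f_i^*$ is the conjugate of $f_i$, to be defined shortly.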
Therefore the dual problem is

$$\max_v \; bv - \sum_{i=1}^n f_i^*(a_i v) \quad \Longleftrightarrow \quad \min_v \; \sum_{i=1}^n f_i^*(a_i v) - bv$$

This is a convex minimization problem with a scalar variable, which is much easier to solve than the primal.

Given $v^\star$, the primal solution $x^\star$ solves

$$\min_x \; \sum_{i=1}^n \big( f_i(x_i) - a_i v^\star x_i \big)$$

Strict convexity of each $f_i$ implies that this has a unique solution, namely $x^\star$, which we compute by solving $f_i'(x_i) = a_i v^\star$ for each $i$.
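The recipe above can be sketched on a concrete instance. The choice $f_i(x_i) = \tfrac{1}{2} c_i x_i^2$ with $c_i > 0$ is an assumption made here for illustration (it is not from the lecture); it gives $f_i^*(y) = y^2 / (2 c_i)$ and $f_i'(x_i) = c_i x_i$, so both the scalar dual and the primal recovery have closed forms. The function name and the data are hypothetical.

```python
def solve_via_dual(a, c, b):
    """Solve min sum_i 0.5*c_i*x_i^2 subject to a^T x = b via its scalar dual.

    Assumes every c_i > 0 and at least one a_i != 0 (illustrative setup).
    """
    # Dual: min_v sum_i (a_i v)^2 / (2 c_i) - b v, a scalar convex quadratic,
    # minimized where its derivative v * sum_i a_i^2 / c_i - b vanishes.
    curvature = sum(ai * ai / ci for ai, ci in zip(a, c))
    v_star = b / curvature
    # Stationarity f_i'(x_i) = a_i v* reads c_i x_i = a_i v*.
    x_star = [ai * v_star / ci for ai, ci in zip(a, c)]
    return v_star, x_star


a = [1.0, 2.0, -1.0]
c = [1.0, 4.0, 2.0]
b = 3.0
v_star, x_star = solve_via_dual(a, c, b)

# Sanity checks: primal feasibility and zero duality gap.
constraint_residual = sum(ai * xi for ai, xi in zip(a, x_star)) - b
primal_value = sum(0.5 * ci * xi * xi for ci, xi in zip(c, x_star))
dual_value = b * v_star - sum((ai * v_star) ** 2 / (2.0 * ci) for ai, ci in zip(a, c))
```

For other choices of $f_i$ the dual minimization would typically need a one-dimensional numerical solver, but the structure (solve for $v^\star$, then recover each $x_i^\star$ separately) stays the same.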
Outline

Today:
- Dual norms
- Conjugate functions
- Dual cones
- Dual tricks and subtleties

(Note: there are many other uses of duality and relationships to duality that we could discuss, but not enough time...)
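Dual norms

Let $\|x\|$ be a norm, e.g.,
- $\ell_p$ norm: $\|x\|_p = \big( \sum_{i=1}^n |x_i|^p \big)^{1/p}$, for $p \ge 1$
- Trace norm: $\|X\|_{\mathrm{tr}} = \sum_{i=1}^r \sigma_i(X)$

We define its dual norm $\|x\|_*$ as

$$\|x\|_* = \max_{\|z\| \le 1} \; z^T x$$

This gives us the inequality $|z^T x| \le \|z\| \, \|x\|_*$ (like a generalized Hölder inequality). Back to our examples,
- $\ell_p$ norm dual: $(\|x\|_p)_* = \|x\|_q$, where $1/p + 1/q = 1$
- Trace norm dual: $(\|X\|_{\mathrm{tr}})_* = \|X\|_{\mathrm{op}} = \sigma_1(X)$

Dual norm of dual norm: can show that $\|x\|_{**} = \|x\|$.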
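The $\ell_p$ / $\ell_q$ pairing can be checked numerically. The sketch below assumes $1 < p < \infty$ and a nonzero $x$, and uses the standard closed-form maximizer $z_i = \mathrm{sign}(x_i) |x_i|^{q-1} / \|x\|_q^{q-1}$ of $z^T x$ over the unit $\ell_p$ ball; the helper names and test vector are hypothetical.

```python
import math
import random


def lp_norm(x, p):
    """The l_p norm of a list of floats, for finite p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_norm_maximizer(x, p):
    """A point z with ||z||_p = 1 attaining z^T x = ||x||_q (needs 1 < p, x != 0)."""
    q = p / (p - 1.0)
    norm_q = lp_norm(x, q)
    return [math.copysign(abs(xi) ** (q - 1.0), xi) / norm_q ** (q - 1.0) for xi in x]


x = [3.0, -4.0, 1.0]
p = 3.0
q = p / (p - 1.0)  # conjugate exponent, 1/p + 1/q = 1

z = dual_norm_maximizer(x, p)
attained = dot(z, x)  # should match lp_norm(x, q)

# Hölder-type inequality |w^T x| <= ||w||_p ||x||_q on random directions w.
rng = random.Random(0)
holder_ok = True
for _ in range(1000):
    w = [rng.uniform(-1.0, 1.0) for _ in x]
    if abs(dot(w, x)) > lp_norm(w, p) * lp_norm(x, q) + 1e-12:
        holder_ok = False
```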
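Proof: consider the (trivial-looking) problem

$$\min_y \; \|y\| \quad \text{subject to} \quad y = x$$

whose optimal value is $\|x\|$. Lagrangian:

$$L(y, u) = \|y\| + u^T (x - y) = \|y\| - y^T u + x^T u$$

Using the definition of $\|\cdot\|_*$,
- If $\|u\|_* > 1$, then $\min_y \{ \|y\| - y^T u \} = -\infty$
- If $\|u\|_* \le 1$, then $\min_y \{ \|y\| - y^T u \} = 0$

Therefore the Lagrange dual problem is

$$\max_u \; u^T x \quad \text{subject to} \quad \|u\|_* \le 1$$

By strong duality $f^\star = g^\star$, i.e., $\|x\| = \|x\|_{**}$.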
Conjugate function

Given a function $f : \mathbb{R}^n \to \mathbb{R}$, define its conjugate $f^* : \mathbb{R}^n \to \mathbb{R}$,

$$f^*(y) = \max_x \; y^T x - f(x)$$

Note that $f^*$ is always convex, since it is the pointwise maximum of convex (affine) functions in $y$ (here $f$ need not be convex).

$f^*(y)$: maximum gap between the linear function $y^T x$ and $f(x)$.

[Figure from B & V page 91: the graph of $f(x)$, the line $y^T x$, and the intercept $(0, -f^*(y))$.]

For differentiable $f$, conjugation is called the Legendre transform.
Properties:
- Fenchel's inequality: for any $x, y$,
  $$f(x) + f^*(y) \ge x^T y$$
- The conjugate of the conjugate, $f^{**}$, satisfies $f^{**} \le f$
- If $f$ is closed and convex, then $f^{**} = f$
- If $f$ is closed and convex, then for any $x, y$,
  $$x \in \partial f^*(y) \iff y \in \partial f(x) \iff f(x) + f^*(y) = x^T y$$
- If $f(u, v) = f_1(u) + f_2(v)$, then
  $$f^*(w, z) = f_1^*(w) + f_2^*(z)$$
Examples:
- Simple quadratic: let $f(x) = \tfrac{1}{2} x^T Q x$, where $Q \succ 0$. Then $y^T x - \tfrac{1}{2} x^T Q x$ is strictly concave in $x$ and is maximized at $x = Q^{-1} y$, so
  $$f^*(y) = \tfrac{1}{2} y^T Q^{-1} y$$
- Indicator function: if $f(x) = I_C(x)$, then its conjugate is
  $$f^*(y) = I_C^*(y) = \max_{x \in C} \; y^T x$$
  called the support function of $C$
- Norm: if $f(x) = \|x\|$, then its conjugate is
  $$f^*(y) = I_{\{z \,:\, \|z\|_* \le 1\}}(y)$$
  where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$
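The quadratic example lends itself to a quick numerical check. The sketch below assumes a hypothetical $2 \times 2$ positive definite $Q$ and plain Python lists. It evaluates $f^*(y) = \tfrac{1}{2} y^T Q^{-1} y$, confirms that $x = Q^{-1} y$ attains equality in Fenchel's inequality, and checks the inequality itself at random points.

```python
import random


def quad_form(Q, x):
    """x^T Q x for a 2x2 matrix Q given as nested lists."""
    return sum(x[i] * Q[i][j] * x[j] for i in range(2) for j in range(2))


def inverse_2x2(Q):
    det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
    return [[Q[1][1] / det, -Q[0][1] / det],
            [-Q[1][0] / det, Q[0][0] / det]]


def f(Q, x):
    return 0.5 * quad_form(Q, x)


def f_conjugate(Q, y):
    # Closed form from the example above: f*(y) = 0.5 * y^T Q^{-1} y.
    return 0.5 * quad_form(inverse_2x2(Q), y)


Q = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical positive definite matrix
y = [1.0, -2.0]

Q_inv = inverse_2x2(Q)
x_max = [Q_inv[0][0] * y[0] + Q_inv[0][1] * y[1],
         Q_inv[1][0] * y[0] + Q_inv[1][1] * y[1]]  # maximizer x = Q^{-1} y

# Equality in Fenchel's inequality at the maximizer.
gap_at_max = f(Q, x_max) + f_conjugate(Q, y) - (x_max[0] * y[0] + x_max[1] * y[1])

# Fenchel's inequality f(x) + f*(y) >= x^T y at random x.
rng = random.Random(0)
fenchel_ok = all(
    f(Q, x) + f_conjugate(Q, y) >= x[0] * y[0] + x[1] * y[1] - 1e-12
    for x in ([rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(1000))
)
```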
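Example: lasso dual

Given $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, recall the lasso problem:

$$\min_\beta \; \tfrac{1}{2} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$$

Its dual function is just a constant (equal to $f^\star$). Therefore we transform the primal to

$$\min_{\beta, z} \; \tfrac{1}{2} \|y - z\|_2^2 + \lambda \|\beta\|_1 \quad \text{subject to} \quad z = X\beta$$

so the dual function is now

$$\begin{aligned}
g(u) &= \min_{\beta, z} \; \tfrac{1}{2} \|y - z\|_2^2 + \lambda \|\beta\|_1 + u^T (z - X\beta) \\
&= \tfrac{1}{2} \|y\|_2^2 - \tfrac{1}{2} \|y - u\|_2^2 - I_{\{v \,:\, \|v\|_\infty \le 1\}}(X^T u / \lambda)
\end{aligned}$$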
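For readers who want the intermediate steps, the minimization over $z$ and $\beta$ separates. The following is a sketch of that calculation, using only the conjugate of a norm from the earlier examples:

```latex
\begin{align*}
\min_z \; \tfrac{1}{2}\|y - z\|_2^2 + u^T z
  &= u^T y - \tfrac{1}{2}\|u\|_2^2
   = \tfrac{1}{2}\|y\|_2^2 - \tfrac{1}{2}\|y - u\|_2^2
  && \text{(attained at } z = y - u\text{)} \\
\min_\beta \; \lambda\|\beta\|_1 - (X^T u)^T \beta
  &= -\lambda \max_\beta \big\{ (X^T u/\lambda)^T \beta - \|\beta\|_1 \big\}
   = -I_{\{v \,:\, \|v\|_\infty \le 1\}}(X^T u/\lambda)
\end{align*}
```

The second line uses that the conjugate of $\|\cdot\|_1$ is the indicator of the unit ball of its dual norm $\|\cdot\|_\infty$, and that multiplying an indicator by $\lambda > 0$ leaves it unchanged.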
Therefore the lasso dual problem is

$$\max_u \; \tfrac{1}{2} \big( \|y\|_2^2 - \|y - u\|_2^2 \big) \quad \text{subject to} \quad \|X^T u\|_\infty \le \lambda$$

$$\Longleftrightarrow \quad \min_u \; \|y - u\|_2^2 \quad \text{subject to} \quad \|X^T u\|_\infty \le \lambda$$

Check: Slater's condition holds, and hence so does strong duality. But note: the optimal value of the last problem is not the optimal lasso objective value.

Further, note that given the dual solution $u$, any lasso solution $\beta$ satisfies

$$X\beta = y - u$$

This is from the KKT stationarity condition for $z$ (i.e., $z - y + u = 0$). So the lasso fit is just the dual residual.
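A special case makes the relation $X\beta = y - u$ easy to verify by hand. The sketch below assumes $X = I$ (an illustrative simplification, not a case treated in the lecture), where the lasso solution is coordinatewise soft-thresholding of $y$ and the dual solution is the projection of $y$ onto the box $\{u : \|u\|_\infty \le \lambda\}$, i.e., coordinatewise clipping. The data values are hypothetical.

```python
import math


def soft_threshold(v, lam):
    """Lasso solution for X = I: shrink v toward zero by lam."""
    return math.copysign(max(abs(v) - lam, 0.0), v)


def clip(v, lam):
    """Projection onto [-lam, lam]; the dual solution for X = I."""
    return max(-lam, min(lam, v))


y = [3.0, -0.5, 1.2, -2.0]  # hypothetical response vector
lam = 1.0

beta = [soft_threshold(yi, lam) for yi in y]  # primal (lasso) solution
u = [clip(yi, lam) for yi in y]               # dual solution

# Lasso fit equals the dual residual: X beta = y - u, with X = I here.
fit_error = max(abs(bi - (yi - ui)) for bi, yi, ui in zip(beta, y, u))

# Strong duality: primal objective equals (1/2)(||y||^2 - ||y - u||^2).
primal_value = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta)) \
    + lam * sum(abs(bi) for bi in beta)
dual_value = 0.5 * (sum(yi ** 2 for yi in y)
                    - sum((yi - ui) ** 2 for yi, ui in zip(y, u)))
```

Note that the transformed dual $\min_u \|y - u\|_2^2$ has a different optimal value than `primal_value`; only the untransformed dual objective `dual_value` matches it.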
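[Figure: $y$, the dual solution $\hat{u}$, and the lasso fit $X\hat\beta$ in $\mathbb{R}^n$. The set $C = \{u : \|X^T u\|_\infty \le \lambda\} \subseteq \mathbb{R}^n$ is the preimage under $X^T$ of the box $\{v : \|v\|_\infty \le \lambda\} \subseteq \mathbb{R}^p$.]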
Conjugates and dual problems

Conjugates appear frequently in the derivation of dual problems, via

$$-f^*(u) = \min_x \; f(x) - u^T x$$

in minimization of the Lagrangian. E.g., consider

$$\min_x \; f(x) + g(x)$$

Equivalently: $\min_{x, z} \; f(x) + g(z)$ subject to $x = z$. Dual function:

$$g(u) = \min_{x, z} \; f(x) + g(z) + u^T (z - x) = -f^*(u) - g^*(-u)$$

Hence the dual problem is

$$\max_u \; -f^*(u) - g^*(-u)$$
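Examples of this last calculation:
- Indicator function:
  $$\text{Primal:} \quad \min_x \; f(x) + I_C(x)$$
  $$\text{Dual:} \quad \max_u \; -f^*(u) - I_C^*(-u)$$
  where $I_C^*$ is the support function of $C$
- Norms:
  $$\text{Primal:} \quad \min_x \; f(x) + \|x\|$$
  $$\text{Dual:} \quad \max_u \; -f^*(u) \quad \text{subject to} \quad \|u\|_* \le 1$$
  where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$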
Shifting linear transformations

Dual formulations can help us by shifting a linear transformation between one part of the objective and another. Consider

$$\min_x \; f(x) + g(Ax)$$

Equivalently: $\min_{x, z} \; f(x) + g(z)$ subject to $Ax = z$. Like before, the dual is:

$$\max_u \; -f^*(A^T u) - g^*(-u)$$

Example: for a norm $\|\cdot\|$ and its dual norm $\|\cdot\|_*$:

$$\text{Primal:} \quad \min_x \; f(x) + \|Ax\|$$
$$\text{Dual:} \quad \max_u \; -f^*(A^T u) \quad \text{subject to} \quad \|u\|_* \le 1$$

The dual can often be a helpful transformation here.
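Dual cones

For a cone $K \subseteq \mathbb{R}^n$ (recall this means $x \in K$, $t \ge 0 \implies tx \in K$),

$$K^* = \{ y : y^T x \ge 0 \ \text{for all } x \in K \}$$

is called its dual cone. This is always a convex cone (even if $K$ is not convex).

Notice that $y \in K^* \iff$ the halfspace $\{x : y^T x \ge 0\}$ contains $K$.

[Figure from B & V page 52: a cone $K$ with a point $y \in K^*$ and a point $z \notin K^*$.]

Important property: if $K$ is a closed convex cone, then $K^{**} = K$.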
Examples:
- Linear subspace: the dual cone of a linear subspace $V$ is $V^\perp$, its orthogonal complement. E.g., $(\mathrm{row}(A))^* = \mathrm{null}(A)$
- Norm cone: the dual cone of the norm cone
  $$K = \{ (x, t) \in \mathbb{R}^{n+1} : \|x\| \le t \}$$
  is the norm cone of its dual norm
  $$K^* = \{ (y, s) \in \mathbb{R}^{n+1} : \|y\|_* \le s \}$$
- Positive semidefinite cone: the convex cone $\mathbb{S}^n_+$ is self-dual, meaning $(\mathbb{S}^n_+)^* = \mathbb{S}^n_+$. Why? Check that
  $$Y \succeq 0 \iff \mathrm{tr}(YX) \ge 0 \ \text{for all } X \succeq 0$$
  by looking at the eigendecomposition of $X$
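The norm cone example can be probed numerically. The sketch below assumes the $\ell_1$ norm, so that $K^*$ is the $\ell_\infty$ norm cone. It checks that a point of $K^*$ has nonnegative inner product with sampled points of $K$, and builds an explicit witness in $K$ with negative inner product for a point outside $K^*$. All function names and sample points are hypothetical.

```python
import math
import random


def in_l1_cone(x, t):
    """Membership in K = {(x, t) : ||x||_1 <= t}."""
    return sum(abs(xi) for xi in x) <= t


def in_linf_cone(y, s):
    """Membership in K* = {(y, s) : ||y||_inf <= s}."""
    return max(abs(yi) for yi in y) <= s


def pairing(x, t, y, s):
    """Inner product of (x, t) and (y, s) in R^{n+1}."""
    return sum(xi * yi for xi, yi in zip(x, y)) + t * s


def witness(y, s):
    """For (y, s) outside K*, return (x, t) in K with negative pairing; else None."""
    if in_linf_cone(y, s):
        return None
    i = max(range(len(y)), key=lambda k: abs(y[k]))
    x = [0.0] * len(y)
    x[i] = -math.copysign(1.0, y[i])  # pairing becomes -|y_i| + s < 0
    return x, 1.0


# A point in K*: nonnegative pairing with every sampled point of K.
y_in, s_in = [0.5, -1.0], 1.0
rng = random.Random(0)
all_nonneg = True
for _ in range(1000):
    x = [rng.uniform(-1.0, 1.0) for _ in y_in]
    t = sum(abs(xi) for xi in x) + rng.uniform(0.0, 1.0)  # ensures (x, t) in K
    if pairing(x, t, y_in, s_in) < -1e-12:
        all_nonneg = False

# A point outside K*: the witness certifies it.
y_out, s_out = [2.0, -0.5], 1.0
x_w, t_w = witness(y_out, s_out)
witness_value = pairing(x_w, t_w, y_out, s_out)
```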
Dual cones and dual problems

Consider the cone constrained problem

$$\min_x \; f(x) \quad \text{subject to} \quad Ax \in K$$

Recall that its dual problem is

$$\max_u \; -f^*(A^T u) - I_K^*(-u)$$

where recall $I_K^*(y) = \max_{z \in K} z^T y$, the support function of $K$. If $K$ is a cone, then this is simply

$$\max_u \; -f^*(A^T u) \quad \text{subject to} \quad u \in K^*$$

where $K^*$ is the dual cone of $K$, because $I_K^*(-u) = I_{K^*}(u)$.

This is quite a useful observation, because many different types of constraints can be posed as cone constraints.
Dual subtleties

- Often, we will transform the dual into an equivalent problem and still call this the dual. Under strong duality, we can use solutions of the (transformed) dual problem to characterize or compute primal solutions.
  Warning: the optimal value of this transformed dual problem is not necessarily the optimal primal value.
- A common trick in deriving duals for unconstrained problems is to first transform the primal by adding a dummy variable and an equality constraint.
  Usually there is ambiguity in how to do this. Different choices can lead to different dual problems!
Double dual

Consider a general minimization problem with linear constraints:

$$\min_x \; f(x) \quad \text{subject to} \quad Ax \le b, \quad Cx = d$$

The Lagrangian is

$$L(x, u, v) = f(x) + (A^T u + C^T v)^T x - b^T u - d^T v$$

and hence the dual problem is

$$\max_{u, v} \; -f^*(-A^T u - C^T v) - b^T u - d^T v \quad \text{subject to} \quad u \ge 0$$

Recall the property: $f^{**} = f$ if $f$ is closed and convex. Hence in this case, we can show that the dual of the dual is the primal.
Actually, the connection (between duals of duals and conjugates) runs much deeper than this, beyond linear constraints. Consider

$$\min_x \; f(x) \quad \text{subject to} \quad h_i(x) \le 0, \; i = 1, \dots, m, \qquad \ell_j(x) = 0, \; j = 1, \dots, r$$

If $f$ and $h_1, \dots, h_m$ are closed and convex, and $\ell_1, \dots, \ell_r$ are affine, then the dual of the dual is the primal.

This is proved by viewing the minimization problem in terms of a bifunction. In this framework, the dual function corresponds to the conjugate of this bifunction (for more, read Chapters 29 and 30 of Rockafellar).
References

- S. Boyd and L. Vandenberghe (2004), Convex Optimization, Chapters 2, 3, 5
- R. T. Rockafellar (1970), Convex Analysis, Chapters 12, 13, 14, 16, 28–30