Exercises

Basic definitions

5.1 A simple example. Consider the optimization problem

    minimize   x² + 1
    subject to (x − 2)(x − 4) ≤ 0,

with variable x ∈ R.

(a) Analysis of primal problem. Give the feasible set, the optimal value, and the optimal solution.

(b) Lagrangian and dual function. Plot the objective x² + 1 versus x. On the same plot, show the feasible set, optimal point and value, and plot the Lagrangian L(x, λ) versus x for a few positive values of λ. Verify the lower bound property (p* ≥ inf_x L(x, λ) for λ ≥ 0). Derive and sketch the Lagrange dual function g.

(c) Lagrange dual problem. State the dual problem, and verify that it is a concave maximization problem. Find the dual optimal value and dual optimal solution λ*. Does strong duality hold?

(d) Sensitivity analysis. Let p*(u) denote the optimal value of the problem

    minimize   x² + 1
    subject to (x − 2)(x − 4) ≤ u,

as a function of the parameter u. Plot p*(u). Verify that dp*(0)/du = −λ*.

Solution.

(a) The feasible set is the interval [2, 4]. The (unique) optimal point is x* = 2, and the optimal value is p* = 5.

[Figure: f₀ and f₁ versus x, showing the feasible set [2, 4] and the optimal point and value.]

(b) The Lagrangian is

    L(x, λ) = (1 + λ)x² − 6λx + (1 + 8λ).

The plot shows the Lagrangian L(x, λ) = f₀ + λf₁ as a function of x for different values of λ ≥ 0. Note that the minimum value of L(x, λ) over x (i.e., g(λ)) is always less than p*. It increases as λ varies from 0 toward 2, reaches its maximum at λ = 2, and then decreases again as λ increases above 2. We have equality p* = g(λ) for λ = 2.
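The lower bound property in part (b) can be checked numerically. In this minimal sketch (the grid resolution and the test values of λ are arbitrary choices), g is evaluated both from the closed form −9λ²/(1 + λ) + 1 + 8λ derived in the solution and by brute-force minimization of the Lagrangian over a grid:

```python
import math

# Lower bound property for exercise 5.1: g(lambda) <= p* = 5 for all
# lambda >= 0, with equality at lambda = 2. The closed form for g is the
# one derived in part (b); it is also confirmed here by brute-force
# minimization of the Lagrangian over a grid of x values.

def lagrangian(x, lam):
    return x**2 + 1 + lam * (x - 2) * (x - 4)

def g_closed_form(lam):
    # valid for lam > -1: g(lam) = -9 lam^2/(1 + lam) + 1 + 8 lam
    return -9 * lam**2 / (1 + lam) + 1 + 8 * lam

p_star = 5.0
for lam in [0.0, 0.5, 1.0, 2.0, 3.0]:
    g_grid = min(lagrangian(0.001 * k - 10, lam) for k in range(20001))
    assert abs(g_grid - g_closed_form(lam)) < 1e-4   # grid matches closed form
    assert g_closed_form(lam) <= p_star + 1e-12      # weak duality

assert abs(g_closed_form(2.0) - p_star) < 1e-12      # strong duality: g(2) = 5
```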
5 Duality

[Figure: the Lagrangian f₀ + λf₁ versus x for λ = 1.0, 2.0, 3.0, together with f₀.]

For λ > −1, the Lagrangian reaches its minimum at x̄ = 3λ/(1 + λ). For λ ≤ −1 it is unbounded below. Thus

    g(λ) = −9λ²/(1 + λ) + 1 + 8λ,  λ > −1
           −∞,                     λ ≤ −1,

which is plotted below.

[Figure: the dual function g(λ).]

We can verify that the dual function is concave, that its value is equal to p* = 5 for λ = 2, and less than p* for other values of λ.

(c) The Lagrange dual problem is

    maximize   −9λ²/(1 + λ) + 1 + 8λ
    subject to λ ≥ 0.

The dual optimum occurs at λ* = 2, with d* = 5. So for this example we can directly observe that strong duality holds (as it must, since Slater's constraint qualification is satisfied).

(d) The perturbed problem is infeasible for u < −1, since inf_x (x² − 6x + 8) = −1. For u ≥ −1, the feasible set is the interval

    [3 − √(1 + u), 3 + √(1 + u)],

given by the two roots of x² − 6x + 8 = u. For −1 ≤ u ≤ 8 the optimum is x*(u) = 3 − √(1 + u). For u ≥ 8, the optimum is the unconstrained minimum of f₀,
i.e., x*(u) = 0. In summary,

    p*(u) = ∞,                    u < −1
            11 + u − 6√(1 + u),   −1 ≤ u ≤ 8
            1,                    u ≥ 8.

The figure shows the optimal value function p*(u) and its epigraph.

[Figure: p*(u) and its epigraph epi p*, together with the supporting line p*(0) − λ*u.]

Finally, we note that p*(u) is a differentiable function of u, and that

    dp*(0)/du = −2 = −λ*.

5.2 Weak duality for unbounded and infeasible problems. The weak duality inequality, d* ≤ p*, clearly holds when d* = −∞ or p* = ∞. Show that it holds in the other two cases as well: if p* = −∞, then we must have d* = −∞, and also, if d* = ∞, then we must have p* = ∞.

Solution.

(a) p* = −∞. The primal problem is unbounded, i.e., there exist feasible x with arbitrarily small values of f₀(x). This means that

    L(x, λ) = f₀(x) + Σᵢ₌₁ᵐ λᵢfᵢ(x)

is unbounded below for all λ ⪰ 0 (for feasible x and λ ⪰ 0 we have L(x, λ) ≤ f₀(x)), i.e., g(λ) = −∞ for all λ ⪰ 0. Therefore the dual problem is infeasible (d* = −∞).

(b) d* = ∞. The dual problem is unbounded above. This is only possible if the primal problem is infeasible. If it were feasible, with fᵢ(x̃) ≤ 0 for i = 1, ..., m, then for all λ ⪰ 0,

    g(λ) = inf_x (f₀(x) + Σᵢ λᵢfᵢ(x)) ≤ f₀(x̃) + Σᵢ λᵢfᵢ(x̃) ≤ f₀(x̃),

so the dual problem is bounded above.

5.3 Problems with one inequality constraint. Express the dual problem of

    minimize   cᵀx
    subject to f(x) ≤ 0,
(c) Defining z = λw, we obtain the equivalent problem

    maximize   −bᵀz
    subject to Aᵀz + c = 0
               z ⪰ 0.

This is the dual of the original LP.

5.5 Dual of general LP. Find the dual function of the LP

    minimize   cᵀx
    subject to Gx ⪯ h
               Ax = b.

Give the dual problem, and make the implicit equality constraints explicit.

Solution.

(a) The Lagrangian is

    L(x, λ, ν) = cᵀx + λᵀ(Gx − h) + νᵀ(Ax − b)
               = (c + Gᵀλ + Aᵀν)ᵀx − hᵀλ − bᵀν,

which is an affine function of x. It follows that the dual function is given by

    g(λ, ν) = inf_x L(x, λ, ν) = −hᵀλ − bᵀν,  c + Gᵀλ + Aᵀν = 0
                                 −∞,          otherwise.

(b) The dual problem is

    maximize   g(λ, ν)
    subject to λ ⪰ 0.

After making the implicit constraints explicit, we obtain

    maximize   −hᵀλ − bᵀν
    subject to c + Gᵀλ + Aᵀν = 0
               λ ⪰ 0.

5.6 Lower bounds in Chebyshev approximation from least-squares. Consider the Chebyshev or ℓ∞-norm approximation problem

    minimize  ‖Ax − b‖∞,    (5.103)

where A ∈ R^{m×n} and rank A = n. Let x_ch denote an optimal solution (there may be multiple optimal solutions; x_ch denotes one of them).

The Chebyshev problem has no closed-form solution, but the corresponding least-squares problem does. Define

    x_ls = argmin ‖Ax − b‖₂ = (AᵀA)⁻¹Aᵀb.

We address the following question. Suppose that for a particular A and b we have computed the least-squares solution x_ls (but not x_ch). How suboptimal is x_ls for the Chebyshev problem? In other words, how much larger is ‖Ax_ls − b‖∞ than ‖Ax_ch − b‖∞?

(a) Prove the lower bound

    ‖Ax_ls − b‖∞ ≤ √m ‖Ax_ch − b‖∞,

using the fact that for all z ∈ Rᵐ,

    (1/√m) ‖z‖₂ ≤ ‖z‖∞ ≤ ‖z‖₂.
(b) In example 5.6 (page 254) we derived a dual for the general norm approximation problem. Applying the results to the ℓ∞-norm (and its dual norm, the ℓ₁-norm), we can state the following dual for the Chebyshev approximation problem:

    maximize   bᵀν
    subject to ‖ν‖₁ ≤ 1    (5.104)
               Aᵀν = 0.

Any feasible ν corresponds to a lower bound bᵀν on ‖Ax_ch − b‖∞.

Denote the least-squares residual as r_ls = b − Ax_ls. Assuming r_ls ≠ 0, show that

    ν̂ = −r_ls/‖r_ls‖₁,    ν̄ = r_ls/‖r_ls‖₁,

are both feasible in (5.104). By duality bᵀν̂ and bᵀν̄ are lower bounds on ‖Ax_ch − b‖∞. Which is the better bound? How do these bounds compare with the bound derived in part (a)?

Solution.

(a) Simple manipulation yields

    ‖Ax_ch − b‖∞ ≥ (1/√m) ‖Ax_ch − b‖₂ ≥ (1/√m) ‖Ax_ls − b‖₂ ≥ (1/√m) ‖Ax_ls − b‖∞.

(b) From the expression x_ls = (AᵀA)⁻¹Aᵀb we note that

    Aᵀr_ls = Aᵀ(b − A(AᵀA)⁻¹Aᵀb) = Aᵀb − Aᵀb = 0.

Therefore Aᵀν̂ = 0 and Aᵀν̄ = 0. Obviously we also have ‖ν̂‖₁ = 1 and ‖ν̄‖₁ = 1, so ν̂ and ν̄ are dual feasible. We can write the dual objective value at ν̂ as

    bᵀν̂ = −bᵀr_ls/‖r_ls‖₁ = (Ax_ls − b)ᵀr_ls/‖r_ls‖₁ = −‖r_ls‖₂²/‖r_ls‖₁

and, similarly, bᵀν̄ = ‖r_ls‖₂²/‖r_ls‖₁. Therefore ν̄ gives a better bound than ν̂. Finally, to show that the resulting lower bound is better than the bound in part (a), we have to verify that

    ‖r_ls‖₂²/‖r_ls‖₁ ≥ (1/√m) ‖r_ls‖∞.

This follows from the inequalities

    ‖x‖₁ ≤ √m ‖x‖₂,    ‖x‖∞ ≤ ‖x‖₂,

which hold for general x ∈ Rᵐ.

5.7 Piecewise-linear minimization. We consider the convex piecewise-linear minimization problem

    minimize  max_{i=1,...,m} (aᵢᵀx + bᵢ)    (5.105)

with variable x ∈ Rⁿ.
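Returning to exercise 5.6, the chain of bounds can be verified on a small instance. The 3×2 matrix A and vector b below are arbitrary illustrative data; since n = 2, the normal equations are solved directly by Cramer's rule:

```python
import math

# Numerical check of the bounds in exercise 5.6 on a small instance
# (A is 3x2, chosen only for illustration). x_ls solves the 2x2 normal
# equations A^T A x = A^T b by Cramer's rule.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
m, n = 3, 2

AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x_ls = [(AtA[1][1] * Atb[0] - AtA[0][1] * Atb[1]) / det,
        (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det]

r = [b[k] - sum(A[k][j] * x_ls[j] for j in range(n)) for k in range(m)]  # r_ls

# A^T r_ls = 0, so nu_bar = r_ls/||r_ls||_1 is feasible in (5.104).
assert all(abs(sum(A[k][i] * r[k] for k in range(m))) < 1e-12 for i in range(n))

norm1 = sum(abs(v) for v in r)
norm2sq = sum(v * v for v in r)
norminf = max(abs(v) for v in r)

bound_dual = norm2sq / norm1          # b^T nu_bar, the dual lower bound
bound_a = norminf / math.sqrt(m)      # lower bound from part (a)
cheb_ls = norminf                     # ||A x_ls - b||_inf, an upper bound on p*

assert bound_a <= bound_dual <= cheb_ls
```

So on this instance the dual bound sandwiches the Chebyshev optimal value more tightly than the part (a) bound, as the solution predicts.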
5.10 Optimal experiment design. The following problems arise in experiment design (see §7.5).

(a) D-optimal design.

    minimize   log det (Σᵢ₌₁ᵖ xᵢvᵢvᵢᵀ)⁻¹
    subject to x ⪰ 0, 1ᵀx = 1.

(b) A-optimal design.

    minimize   tr (Σᵢ₌₁ᵖ xᵢvᵢvᵢᵀ)⁻¹
    subject to x ⪰ 0, 1ᵀx = 1.

The domain of both problems is {x | Σᵢ₌₁ᵖ xᵢvᵢvᵢᵀ ≻ 0}. The variable is x ∈ Rᵖ; the vectors v₁, ..., vₚ ∈ Rⁿ are given.

Derive dual problems by first introducing a new variable X ∈ Sⁿ and an equality constraint X = Σᵢ₌₁ᵖ xᵢvᵢvᵢᵀ, and then applying Lagrange duality. Simplify the dual problems as much as you can.

Solution.

(a) D-optimal design. We derive the dual of the reformulated problem

    minimize   log det X⁻¹
    subject to X = Σᵢ₌₁ᵖ xᵢvᵢvᵢᵀ
               x ⪰ 0, 1ᵀx = 1.

The Lagrangian is

    L(x, X, Z, z, ν) = log det X⁻¹ + tr(Z(X − Σᵢ xᵢvᵢvᵢᵀ)) − zᵀx + ν(1ᵀx − 1)
                     = log det X⁻¹ + tr(ZX) + Σᵢ₌₁ᵖ xᵢ(−vᵢᵀZvᵢ − zᵢ + ν) − ν.

The minimum over x is bounded below only if ν − vᵢᵀZvᵢ = zᵢ. Setting the gradient with respect to X equal to zero gives X⁻¹ = Z. We obtain the dual function

    g(Z, z, ν) = log det Z + n − ν,  ν − vᵢᵀZvᵢ = zᵢ, i = 1, ..., p
                 −∞,                 otherwise.

The dual problem is

    maximize   log det Z + n − ν
    subject to vᵢᵀZvᵢ ≤ ν, i = 1, ..., p,

with domain S^n₊₊ × R. We can eliminate ν by first making a change of variables W = (1/ν)Z, which gives

    maximize   log det W + n log ν + n − ν
    subject to vᵢᵀWvᵢ ≤ 1, i = 1, ..., p.

Finally, we note that we can easily optimize n log ν − ν over ν. The optimum is ν = n, and substituting gives

    maximize   log det W + n log n
    subject to vᵢᵀWvᵢ ≤ 1, i = 1, ..., p.
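The D-optimal duality in part (a) can be sanity-checked numerically. The instance below (v₁, v₂ the unit vectors in R², so n = p = 2) is chosen only for illustration; any positive definite W with vᵢᵀWvᵢ ≤ 1 must give a dual value no larger than the primal objective:

```python
import math

# Weak-duality check for the D-optimal design dual of exercise 5.10(a).
# Primal objective: log det (sum_i x_i v_i v_i^T)^{-1}; dual objective:
# log det W + n log n for any W in S^n_++ with v_i^T W v_i <= 1.

def logdet2(M):
    # log-determinant of a 2x2 matrix (assumed to have positive determinant)
    return math.log(M[0][0] * M[1][1] - M[0][1] * M[1][0])

v = [[1.0, 0.0], [0.0, 1.0]]         # illustrative data
x = [0.5, 0.5]                       # primal feasible: x >= 0, 1^T x = 1
X = [[sum(x[i] * v[i][r] * v[i][c] for i in range(2)) for c in range(2)]
     for r in range(2)]
primal = -logdet2(X)                 # log det X^{-1}

for t in [0.25, 0.5, 1.0]:
    W = [[t, 0.0], [0.0, t]]         # dual feasible iff v_i^T W v_i = t <= 1
    dual = logdet2(W) + 2 * math.log(2)
    assert dual <= primal + 1e-12    # weak duality

# For this instance W = I attains the primal value: the bound is tight.
assert abs((logdet2([[1.0, 0.0], [0.0, 1.0]]) + 2 * math.log(2)) - primal) < 1e-12
```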
(b) A-optimal design. We derive the dual of the reformulated problem

    minimize   tr X⁻¹
    subject to X = Σᵢ₌₁ᵖ xᵢvᵢvᵢᵀ
               x ⪰ 0, 1ᵀx = 1.

The Lagrangian is

    L(x, X, Z, z, ν) = tr X⁻¹ + tr(Z(X − Σᵢ xᵢvᵢvᵢᵀ)) − zᵀx + ν(1ᵀx − 1)
                     = tr X⁻¹ + tr(ZX) + Σᵢ₌₁ᵖ xᵢ(−vᵢᵀZvᵢ − zᵢ + ν) − ν.

The minimum over x is unbounded below unless vᵢᵀZvᵢ + zᵢ = ν. The minimum over X can be found by setting the gradient equal to zero: X⁻² = Z, or X = Z^{−1/2} if Z ≻ 0, which gives

    inf_{X≻0} (tr X⁻¹ + tr(ZX)) = 2 tr(Z^{1/2}),  Z ⪰ 0
                                  −∞,             otherwise.

The dual function is

    g(Z, z, ν) = −ν + 2 tr(Z^{1/2}),  Z ⪰ 0, vᵢᵀZvᵢ + zᵢ = ν
                 −∞,                  otherwise.

The dual problem is

    maximize   −ν + 2 tr(Z^{1/2})
    subject to vᵢᵀZvᵢ ≤ ν, i = 1, ..., p
               Z ⪰ 0.

As a first simplification, we define W = (1/ν)Z, and write the problem as

    maximize   −ν + 2√ν tr(W^{1/2})
    subject to vᵢᵀWvᵢ ≤ 1, i = 1, ..., p
               W ⪰ 0.

By optimizing over ν > 0, we obtain

    maximize   (tr(W^{1/2}))²
    subject to vᵢᵀWvᵢ ≤ 1, i = 1, ..., p
               W ⪰ 0.

5.11 Derive a dual problem for

    minimize  Σᵢ₌₁ᴺ ‖Aᵢx + bᵢ‖₂ + (1/2)‖x − x₀‖₂².

The problem data are Aᵢ ∈ R^{mᵢ×n}, bᵢ ∈ R^{mᵢ}, and x₀ ∈ Rⁿ. First introduce new variables yᵢ ∈ R^{mᵢ} and equality constraints yᵢ = Aᵢx + bᵢ.

Solution. The Lagrangian is

    L(x, y, z₁, ..., z_N) = Σᵢ₌₁ᴺ ‖yᵢ‖₂ + (1/2)‖x − x₀‖₂² − Σᵢ₌₁ᴺ zᵢᵀ(yᵢ − Aᵢx − bᵢ).
5.13 Lagrangian relaxation of Boolean LP. A Boolean linear program is an optimization problem of the form

    minimize   cᵀx
    subject to Ax ⪯ b
               xᵢ ∈ {0, 1}, i = 1, ..., n,

and is, in general, very difficult to solve. In exercise 4.15 we studied the LP relaxation of this problem,

    minimize   cᵀx
    subject to Ax ⪯ b    (5.107)
               0 ≤ xᵢ ≤ 1, i = 1, ..., n,

which is far easier to solve, and gives a lower bound on the optimal value of the Boolean LP. In this problem we derive another lower bound for the Boolean LP, and work out the relation between the two lower bounds.

(a) Lagrangian relaxation. The Boolean LP can be reformulated as the problem

    minimize   cᵀx
    subject to Ax ⪯ b
               xᵢ(1 − xᵢ) = 0, i = 1, ..., n,

which has quadratic equality constraints. Find the Lagrange dual of this problem. The optimal value of the dual problem (which is convex) gives a lower bound on the optimal value of the Boolean LP. This method of finding a lower bound on the optimal value is called Lagrangian relaxation.

(b) Show that the lower bound obtained via Lagrangian relaxation, and via the LP relaxation (5.107), are the same. Hint. Derive the dual of the LP relaxation (5.107).

Solution.

(a) The Lagrangian is

    L(x, µ, ν) = cᵀx + µᵀ(Ax − b) + Σᵢ νᵢxᵢ(xᵢ − 1)
               = xᵀ diag(ν) x + (c + Aᵀµ − ν)ᵀx − bᵀµ.

Minimizing over x gives the dual function

    g(µ, ν) = −bᵀµ − (1/4) Σᵢ₌₁ⁿ (cᵢ + aᵢᵀµ − νᵢ)²/νᵢ,  ν ⪰ 0
              −∞,                                       otherwise,

where aᵢ is the ith column of A, and we adopt the convention that a²/0 = ∞ if a ≠ 0, and a²/0 = 0 if a = 0. The resulting dual problem is

    maximize   −bᵀµ − (1/4) Σᵢ₌₁ⁿ (cᵢ + aᵢᵀµ − νᵢ)²/νᵢ
    subject to µ ⪰ 0, ν ⪰ 0.

In order to simplify this dual, we optimize analytically over ν, by noting that

    sup_{νᵢ≥0} ( −(cᵢ + aᵢᵀµ − νᵢ)²/(4νᵢ) ) = 0,           cᵢ + aᵢᵀµ ≥ 0
                                              cᵢ + aᵢᵀµ,   cᵢ + aᵢᵀµ ≤ 0
                                            = min{0, cᵢ + aᵢᵀµ}.

This allows us to eliminate ν from the dual problem, and simplify it as

    maximize   −bᵀµ + Σᵢ₌₁ⁿ min{0, cᵢ + aᵢᵀµ}
    subject to µ ⪰ 0.
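The simplified dual gives a cheap computational lower bound. In this sketch the Boolean LP data (c, A, b) are an arbitrary illustration, and the true optimal value is obtained by enumerating all 2ⁿ Boolean points:

```python
# Check of the Lagrangian-relaxation bound from exercise 5.13(a): the
# dual objective -b^T mu + sum_i min(0, c_i + a_i^T mu), at any mu >= 0,
# lower bounds c^T x over all feasible Boolean x.
from itertools import product

c = [2.0, -3.0, 1.0]
A = [[1.0, 1.0, 1.0],      # constraint Ax <= b, with b = (2, 1)
     [1.0, -1.0, 2.0]]
b = [2.0, 1.0]

def dual_bound(mu):
    assert all(m >= 0 for m in mu)
    val = -sum(m * bi for m, bi in zip(mu, b))
    for i in range(3):
        val += min(0.0, c[i] + sum(A[j][i] * mu[j] for j in range(2)))
    return val

# Enumerate all feasible Boolean points to get the true optimal value.
feasible = [x for x in product([0, 1], repeat=3)
            if all(sum(A[j][i] * x[i] for i in range(3)) <= b[j] for j in range(2))]
p_star = min(sum(ci * xi for ci, xi in zip(c, x)) for x in feasible)

for mu in [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (2.0, 1.0)]:
    assert dual_bound(mu) <= p_star + 1e-12
```

On this instance mu = 0 already attains the optimal value of the relaxation; in general one would maximize the (concave, piecewise-linear) dual objective over mu ⪰ 0.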
(b) We follow the hint. The Lagrangian and dual function of the LP relaxation are

    L(x, u, v, w) = cᵀx + uᵀ(Ax − b) − vᵀx + wᵀ(x − 1)
                  = (c + Aᵀu − v + w)ᵀx − bᵀu − 1ᵀw,

    g(u, v, w) = −bᵀu − 1ᵀw,  Aᵀu − v + w + c = 0
                 −∞,          otherwise.

The dual problem is

    maximize   −bᵀu − 1ᵀw
    subject to Aᵀu − v + w + c = 0
               u ⪰ 0, v ⪰ 0, w ⪰ 0,

which is equivalent to the Lagrange relaxation problem derived above. (Eliminating v leaves the constraints c + Aᵀu + w ⪰ 0, w ⪰ 0; at the optimum wᵢ = max{0, −(cᵢ + aᵢᵀu)}, so the objective becomes −bᵀu + Σᵢ min{0, cᵢ + aᵢᵀu}.) We conclude that the two relaxations give the same value.

5.14 A penalty method for equality constraints. We consider the problem

    minimize   f₀(x)
    subject to Ax = b,    (5.108)

where f₀ : Rⁿ → R is convex and differentiable, and A ∈ R^{m×n} with rank A = m. In a quadratic penalty method, we form an auxiliary function

    φ(x) = f₀(x) + α ‖Ax − b‖₂²,

where α > 0 is a parameter. This auxiliary function consists of the objective plus the penalty term α‖Ax − b‖₂². The idea is that a minimizer of the auxiliary function, x̃, should be an approximate solution of the original problem. Intuition suggests that the larger the penalty weight α, the better the approximation x̃ to a solution of the original problem.

Suppose x̃ is a minimizer of φ. Show how to find, from x̃, a dual feasible point for (5.108). Find the corresponding lower bound on the optimal value of (5.108).

Solution. If x̃ minimizes φ, then

    ∇f₀(x̃) + 2αAᵀ(Ax̃ − b) = 0.

Therefore x̃ is also a minimizer of

    f₀(x) + νᵀ(Ax − b),

where ν = 2α(Ax̃ − b). Therefore ν is dual feasible with

    g(ν) = inf_x (f₀(x) + νᵀ(Ax − b)) = f₀(x̃) + 2α ‖Ax̃ − b‖₂².

Therefore,

    f₀(x) ≥ f₀(x̃) + 2α ‖Ax̃ − b‖₂²

for all x that satisfy Ax = b.

5.15 Consider the problem

    minimize   f₀(x)
    subject to fᵢ(x) ≤ 0, i = 1, ..., m,    (5.109)
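The penalty-method bound of exercise 5.14 can be illustrated on a tiny instance. The problem below (minimize x₁² + x₂² subject to x₁ + x₂ = 1, so p* = 1/2 at x = (1/2, 1/2)) is an arbitrary choice for which the penalty minimizer has a closed form:

```python
# Check of the lower bound f0(x~) + 2*alpha*||A x~ - b||_2^2 <= p* from
# exercise 5.14, on the toy problem: minimize x1^2 + x2^2 subject to
# x1 + x2 = 1 (an illustration only; p* = 1/2). For this f0, the penalty
# function phi(x) = x1^2 + x2^2 + alpha*(x1 + x2 - 1)^2 is minimized at
# x~ = (t, t) with t = alpha/(1 + 2*alpha), by symmetry and calculus.
p_star = 0.5
prev = -float("inf")
for alpha in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    t = alpha / (1 + 2 * alpha)
    residual = 2 * t - 1                   # A x~ - b
    f0 = 2 * t * t
    bound = f0 + 2 * alpha * residual**2   # g(nu), nu = 2*alpha*residual
    assert bound <= p_star + 1e-12         # lower bound on p*
    assert bound >= prev                   # bound improves with alpha
    prev = bound

assert p_star - prev < 1e-3                # nearly tight for large alpha
```

This matches the intuition stated in the exercise: as alpha grows, the dual point produced by the penalty minimizer certifies a bound approaching p*.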
(c) with variables x, u, t, v.

    minimize   xᵀΣx
    subject to pᵀx ≥ r_min, 1ᵀx = 1, x ⪰ 0
               (n/20)t + 1ᵀu ≤ 0.9, λ1 + u ⪰ 0, u ⪰ 0,

5.20 Dual of channel capacity problem. Derive a dual for the problem

    minimize   −cᵀx + Σᵢ₌₁ᵐ yᵢ log yᵢ
    subject to Px = y
               x ⪰ 0, 1ᵀx = 1,

where P ∈ R^{m×n} has nonnegative elements, and its columns add up to one (i.e., Pᵀ1 = 1). The variables are x ∈ Rⁿ, y ∈ Rᵐ. (For cⱼ = Σᵢ₌₁ᵐ pᵢⱼ log pᵢⱼ, the optimal value is, up to a factor log 2, the negative of the capacity of a discrete memoryless channel with channel transition probability matrix P; see exercise 4.57.) Simplify the dual problem as much as possible.

Solution. The Lagrangian is

    L(x, y, λ, ν, z) = −cᵀx + Σᵢ₌₁ᵐ yᵢ log yᵢ − λᵀx + ν(1ᵀx − 1) + zᵀ(Px − y)
                     = (−c − λ + ν1 + Pᵀz)ᵀx + Σᵢ₌₁ᵐ yᵢ log yᵢ − zᵀy − ν.

The minimum over x is bounded below if and only if

    −c − λ + ν1 + Pᵀz = 0.

To minimize over y, we set the derivative with respect to yᵢ equal to zero, which gives log yᵢ + 1 − zᵢ = 0, and conclude that

    inf_{yᵢ>0} (yᵢ log yᵢ − zᵢyᵢ) = −e^{zᵢ−1}.

The dual function is

    g(λ, ν, z) = −Σᵢ₌₁ᵐ e^{zᵢ−1} − ν,  −c − λ + ν1 + Pᵀz = 0
                 −∞,                   otherwise.

The dual problem is

    maximize   −Σᵢ₌₁ᵐ exp(zᵢ − 1) − ν
    subject to Pᵀz − c + ν1 ⪰ 0.

This can be simplified by introducing a variable w = z + ν1 (and using the fact that Pᵀ1 = 1), which gives

    maximize   −Σᵢ₌₁ᵐ exp(wᵢ − ν − 1) − ν
    subject to Pᵀw ⪰ c.

Finally we can easily maximize the objective function over ν by setting the derivative equal to zero (the optimal value is ν = log(Σᵢ e^{wᵢ−1})), which leads to

    maximize   −log (Σᵢ₌₁ᵐ exp wᵢ)
    subject to Pᵀw ⪰ c.

This is a geometric program, in convex form, with linear inequality constraints (i.e., monomial inequality constraints in the associated geometric program).
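Weak duality for the simplified channel capacity dual can be checked numerically. The 2×2 column-stochastic P below is an arbitrary example; w = 0 is dual feasible because c ⪯ 0 componentwise:

```python
import math

# Weak-duality check for the channel capacity dual of exercise 5.20:
# for any dual feasible w (P^T w >= c), the value -log(sum_i exp w_i)
# lower bounds the primal objective -c^T x + sum_i y_i log y_i, y = Px.
P = [[0.7, 0.2],
     [0.3, 0.8]]                     # illustrative column-stochastic matrix
m = n = 2
c = [sum(P[i][j] * math.log(P[i][j]) for i in range(m)) for j in range(n)]

def primal(x):
    # x must satisfy x >= 0, 1^T x = 1 (all test points below do)
    y = [sum(P[i][j] * x[j] for j in range(n)) for i in range(m)]
    return -sum(c[j] * x[j] for j in range(n)) + sum(v * math.log(v) for v in y)

def dual(w):
    # feasibility: P^T w >= c
    assert all(sum(P[i][j] * w[i] for i in range(m)) >= c[j] - 1e-12
               for j in range(n))
    return -math.log(sum(math.exp(v) for v in w))

# w = 0 is feasible because every c_j <= 0.
for x in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.3, 0.7)]:
    assert primal(x) >= dual((0.0, 0.0)) - 1e-12
```

The bound dual((0, 0)) = −log m is the capacity-zero baseline; maximizing over feasible w would tighten it to the negative channel capacity (in nats).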
There are four solutions:

    ν = −3.15,  x = (0.16, 0.47, 0.87),
    ν = 0.22,   x = (0.36, 0.82, 0.45),
    ν = 1.89,   x = (0.90, 0.35, 0.26),
    ν = 4.04,   x = (−0.97, 0.20, 0.17).

(c) ν* is the largest of the four values: ν* = 4.0352. This can be seen several ways. The simplest way is to compare the objective values of the four solutions x, which are

    f₀(x) = 1.17,  f₀(x) = 0.67,  f₀(x) = 0.56,  f₀(x) = −4.70.

We can also evaluate the dual objective at the four candidate values for ν. Finally we can note that we must have

    ∇²f₀(x*) + ν* ∇²f₁(x*) ⪰ 0,

because x* is a minimizer of L(x, ν*). In other words

    [ −3  0  0 ]        [ 1  0  0 ]
    [  0  1  0 ]  + ν*  [ 0  1  0 ]  ⪰ 0,
    [  0  0  2 ]        [ 0  0  1 ]

and therefore ν* ≥ 3.

5.30 Derive the KKT conditions for the problem

    minimize   tr X − log det X
    subject to Xs = y,

with variable X ∈ Sⁿ and domain S^n₊₊. y ∈ Rⁿ and s ∈ Rⁿ are given, with sᵀy = 1. Verify that the optimal solution is given by

    X* = I + yyᵀ − (1/sᵀs) ssᵀ.

Solution. We introduce a Lagrange multiplier z ∈ Rⁿ for the equality constraint. The KKT optimality conditions are:

    X ≻ 0,   Xs = y,   X⁻¹ = I + (1/2)(zsᵀ + szᵀ).    (5.30.A)

We first determine z from the condition Xs = y. Multiplying the gradient equation (5.30.A) on the right with y gives

    s = X⁻¹y = y + (1/2)(z + (zᵀy)s).    (5.30.B)

By taking the inner product with y on both sides and simplifying, we get zᵀy = 1 − yᵀy. Substituting in (5.30.B) we get

    z = −2y + (1 + yᵀy)s,

and substituting this expression for z in (5.30.A) gives

    X⁻¹ = I + (1/2)(−2ysᵀ − 2syᵀ + 2(1 + yᵀy)ssᵀ) = I + (1 + yᵀy)ssᵀ − ysᵀ − syᵀ.
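The claimed optimal X and the derived expression for X⁻¹ can be verified numerically; the vectors s, y below are an arbitrary pair satisfying sᵀy = 1:

```python
# Numerical check of exercise 5.30: with s^T y = 1, the matrix
# X = I + y y^T - (1/s^T s) s s^T satisfies X s = y, and its inverse is
# X^{-1} = I + (1 + y^T y) s s^T - y s^T - s y^T.
s = [2.0, 1.0, -1.0]
y = [0.5, 0.5, 0.5]            # s^T y = 1.0 + 0.5 - 0.5 = 1
assert abs(sum(si * yi for si, yi in zip(s, y)) - 1.0) < 1e-12

n = 3
sts = sum(v * v for v in s)    # s^T s
yty = sum(v * v for v in y)    # y^T y

X = [[(1.0 if i == j else 0.0) + y[i] * y[j] - s[i] * s[j] / sts
      for j in range(n)] for i in range(n)]
Xinv = [[(1.0 if i == j else 0.0) + (1 + yty) * s[i] * s[j]
         - y[i] * s[j] - s[i] * y[j] for j in range(n)] for i in range(n)]

# X s = y
Xs = [sum(X[i][j] * s[j] for j in range(n)) for i in range(n)]
assert all(abs(Xs[i] - y[i]) < 1e-12 for i in range(n))

# X * Xinv = I
prod = [[sum(X[i][k] * Xinv[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)]
assert all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(n) for j in range(n))
```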
Finally we verify that this is the inverse of the matrix X given above:

    (I + (1 + yᵀy)ssᵀ − ysᵀ − syᵀ)(I + yyᵀ − (1/sᵀs)ssᵀ) = I.

Using sᵀy = 1, we have

    ssᵀ(I + yyᵀ − (1/sᵀs)ssᵀ) = syᵀ,
    ysᵀ(I + yyᵀ − (1/sᵀs)ssᵀ) = yyᵀ,
    syᵀ(I + yyᵀ − (1/sᵀs)ssᵀ) = syᵀ + (yᵀy)syᵀ − (1/sᵀs)ssᵀ,

so the product equals

    I + yyᵀ − (1/sᵀs)ssᵀ + (1 + yᵀy)syᵀ − yyᵀ − syᵀ − (yᵀy)syᵀ + (1/sᵀs)ssᵀ = I.

To complete the solution, we prove that X ≻ 0. An easy way to see this is to note that

    X = I + yyᵀ − (1/sᵀs)ssᵀ = (I + ysᵀ/‖s‖₂ − ssᵀ/sᵀs)(I + ysᵀ/‖s‖₂ − ssᵀ/sᵀs)ᵀ.

5.31 Supporting hyperplane interpretation of KKT conditions. Consider a convex problem with no equality constraints,

    minimize   f₀(x)
    subject to fᵢ(x) ≤ 0, i = 1, ..., m.

Assume that x* ∈ Rⁿ and λ* ∈ Rᵐ satisfy the KKT conditions

    fᵢ(x*) ≤ 0,      i = 1, ..., m
    λᵢ* ≥ 0,         i = 1, ..., m
    λᵢ* fᵢ(x*) = 0,  i = 1, ..., m
    ∇f₀(x*) + Σᵢ₌₁ᵐ λᵢ* ∇fᵢ(x*) = 0.

Show that

    ∇f₀(x*)ᵀ(x − x*) ≥ 0

for all feasible x. In other words the KKT conditions imply the simple optimality criterion of §4.2.3.

Solution. Suppose x is feasible. Since the fᵢ are convex and fᵢ(x) ≤ 0 we have

    0 ≥ fᵢ(x) ≥ fᵢ(x*) + ∇fᵢ(x*)ᵀ(x − x*),  i = 1, ..., m.

Using λᵢ* ≥ 0, we conclude that

    0 ≥ Σᵢ₌₁ᵐ λᵢ* ( fᵢ(x*) + ∇fᵢ(x*)ᵀ(x − x*) )
      = Σᵢ₌₁ᵐ λᵢ* fᵢ(x*) + Σᵢ₌₁ᵐ λᵢ* ∇fᵢ(x*)ᵀ(x − x*)
      = −∇f₀(x*)ᵀ(x − x*).

In the last line, we use the complementary slackness condition λᵢ* fᵢ(x*) = 0, and the last KKT condition. This shows that ∇f₀(x*)ᵀ(x − x*) ≥ 0, i.e., ∇f₀(x*) defines a supporting hyperplane to the feasible set at x*.

Perturbation and sensitivity analysis

5.32 Optimal value of perturbed problem. Let f₀, f₁, ..., fₘ : Rⁿ → R be convex. Show that the function

    p*(u, v) = inf{ f₀(x) | x ∈ D, fᵢ(x) ≤ uᵢ, i = 1, ..., m, Ax − b = v }