UC Berkeley Department of Electrical Engineering and Computer Science
EECS 227A Nonlinear and Convex Optimization
Solutions 6, Fall 2009

Solution 6.1

(a) $p^\star = 1$.

(b) The Lagrangian is $L(x, y, \lambda) = e^{-x} + \lambda x^2 / y$. The dual function is
\[
g(\lambda) = \inf_{x,\, y > 0} \big( e^{-x} + \lambda x^2 / y \big) =
\begin{cases}
0 & \lambda \geq 0 \\
-\infty & \text{otherwise,}
\end{cases}
\]
so we can write the dual problem as
\[
\begin{array}{ll}
\mbox{maximize} & 0 \\
\mbox{subject to} & \lambda \geq 0,
\end{array}
\]
with optimal value $d^\star = 0$. The optimal duality gap is $p^\star - d^\star = 1$.

(c) Slater's condition is not satisfied.

(d) $p^\star(u) = 1$ if $u = 0$, $p^\star(u) = 0$ if $u > 0$, and $p^\star(u) = \infty$ if $u < 0$.

Solution 6.2

Suppose $x$ is feasible. Since the $f_i$ are convex and $f_i(x) \leq 0$, we have
\[
0 \geq f_i(x) \geq f_i(x^\star) + \nabla f_i(x^\star)^T (x - x^\star), \qquad i = 1, \ldots, m.
\]
Using $\lambda_i^\star \geq 0$, we conclude that
\[
0 \geq \sum_{i=1}^m \lambda_i^\star \big( f_i(x^\star) + \nabla f_i(x^\star)^T (x - x^\star) \big)
= \sum_{i=1}^m \lambda_i^\star \nabla f_i(x^\star)^T (x - x^\star)
= -\nabla f_0(x^\star)^T (x - x^\star).
\]
In the second equality we use the complementary slackness condition $\lambda_i^\star f_i(x^\star) = 0$, and in the last we use the KKT stationarity condition $\nabla f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla f_i(x^\star) = 0$. This shows that $\nabla f_0(x^\star)^T (x - x^\star) \geq 0$, i.e., $\nabla f_0(x^\star)$ defines a supporting hyperplane to the feasible set at $x^\star$.

Solution 6.3

(a) Follows from $\mathrm{tr}(W x x^T) = x^T W x$ and $(x x^T)_{ii} = x_i^2$.

(b) It gives a lower bound because we minimize the same objective function over a larger set. If the optimal $X$ is rank one, it is also optimal for the original problem; a numerical sketch of the relaxation follows below.
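For concreteness, the relaxation of part (a) can be solved in a few lines of CVX. This is a minimal sketch, assuming CVX is installed and that a symmetric cost matrix W is given (both assumptions, not part of the original solution):

% Minimal CVX sketch (assumes CVX is installed and W is a given symmetric
% n-by-n matrix): solve the SDP relaxation of two-way partitioning,
%   minimize tr(W*X)  subject to  X positive semidefinite, X_ii = 1.
n = size(W, 1);
cvx_begin sdp quiet
    variable X(n, n) symmetric
    minimize( trace(W * X) )
    subject to
        X >= 0;         % X is positive semidefinite (sdp mode)
        diag(X) == 1;   % unit diagonal
cvx_end
lower_bound = cvx_optval;
% Per part (b), if X is (numerically) rank one, then x = sign(X(:, 1))
% is an optimal partition and the bound is tight.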
(c) We write the problem as a minimization problem:
\[
\begin{array}{ll}
\mbox{minimize} & \mathbf{1}^T \nu \\
\mbox{subject to} & W + \mathrm{diag}(\nu) \succeq 0.
\end{array}
\]
Introducing a Lagrange multiplier $X \in \mathbf{S}^n$ for the matrix inequality, we obtain the Lagrangian
\[
L(\nu, X) = \mathbf{1}^T \nu - \mathrm{tr}\big( X (W + \mathrm{diag}(\nu)) \big)
= \mathbf{1}^T \nu - \mathrm{tr}(X W) - \sum_{i=1}^n \nu_i X_{ii}
= -\mathrm{tr}(X W) + \sum_{i=1}^n \nu_i (1 - X_{ii}).
\]
This is bounded below as a function of $\nu$ only if $X_{ii} = 1$ for all $i$, so we obtain the dual problem
\[
\begin{array}{ll}
\mbox{maximize} & -\mathrm{tr}(W X) \\
\mbox{subject to} & X \succeq 0 \\
& X_{ii} = 1, \quad i = 1, \ldots, n.
\end{array}
\]
Changing the sign again, and switching from maximization to minimization, yields the problem in part (a).

Solution 6.4

(a) We introduce new variables $y_i$ and $t_i$, and write the problem as
\[
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & \|y_i\|_2 \leq t_i, \quad i = 1, \ldots, m \\
& y_i = A_i x + b_i, \quad i = 1, \ldots, m \\
& t_i = c_i^T x + d_i, \quad i = 1, \ldots, m.
\end{array}
\]
The Lagrangian is
\[
\begin{aligned}
L(x, y, t, \lambda, \nu, \mu)
&= c^T x + \sum_{i=1}^m \lambda_i \big( \|y_i\|_2 - t_i \big)
 + \sum_{i=1}^m \nu_i^T \big( y_i - A_i x - b_i \big)
 + \sum_{i=1}^m \mu_i \big( t_i - c_i^T x - d_i \big) \\
&= \Big( c - \sum_{i=1}^m (A_i^T \nu_i + \mu_i c_i) \Big)^T x
 + \sum_{i=1}^m \big( \lambda_i \|y_i\|_2 + \nu_i^T y_i \big)
 + \sum_{i=1}^m (\mu_i - \lambda_i)\, t_i
 - \sum_{i=1}^m \big( b_i^T \nu_i + d_i \mu_i \big).
\end{aligned}
\]
The minimum over $x$ is bounded below if and only if $\sum_{i=1}^m (A_i^T \nu_i + \mu_i c_i) = c$. To minimize over $y_i$, we note that
\[
\inf_{y_i} \big( \lambda_i \|y_i\|_2 + \nu_i^T y_i \big) =
\begin{cases}
0 & \|\nu_i\|_2 \leq \lambda_i \\
-\infty & \text{otherwise.}
\end{cases}
\]
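To see why the infimum over $y_i$ takes this form (a step the solution leaves implicit): if $\|\nu_i\|_2 \leq \lambda_i$, then by the Cauchy–Schwarz inequality
\[
\lambda_i \|y_i\|_2 + \nu_i^T y_i \;\geq\; \lambda_i \|y_i\|_2 - \|\nu_i\|_2 \|y_i\|_2 \;\geq\; 0,
\]
with equality at $y_i = 0$; while if $\|\nu_i\|_2 > \lambda_i$, taking $y_i = -t\,\nu_i$ and letting $t \to \infty$ gives $t\,\|\nu_i\|_2 \big( \lambda_i - \|\nu_i\|_2 \big) \to -\infty$.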
The minimum over $t_i$ is bounded below if and only if $\lambda_i = \mu_i$. The dual function is therefore
\[
g(\lambda, \nu, \mu) =
\begin{cases}
-\sum_{i=1}^m \big( b_i^T \nu_i + d_i \mu_i \big) & \sum_{i=1}^m (A_i^T \nu_i + \mu_i c_i) = c, \ \ \|\nu_i\|_2 \leq \lambda_i, \ \ \mu = \lambda \\
-\infty & \text{otherwise,}
\end{cases}
\]
which leads to the dual problem
\[
\begin{array}{ll}
\mbox{maximize} & -\sum_{i=1}^m \big( b_i^T \nu_i + d_i \lambda_i \big) \\
\mbox{subject to} & \sum_{i=1}^m \big( A_i^T \nu_i + \lambda_i c_i \big) = c \\
& \|\nu_i\|_2 \leq \lambda_i, \quad i = 1, \ldots, m.
\end{array}
\]

(b) We express the SOCP as a conic form problem:
\[
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & -\big( A_i x + b_i, \; c_i^T x + d_i \big) \preceq_{K_i} 0, \quad i = 1, \ldots, m.
\end{array}
\]
The conic dual is
\[
\begin{array}{ll}
\mbox{maximize} & -\sum_{i=1}^m \big( b_i^T u_i + d_i v_i \big) \\
\mbox{subject to} & \sum_{i=1}^m \big( A_i^T u_i + v_i c_i \big) = c \\
& (u_i, v_i) \succeq_{K_i^*} 0, \quad i = 1, \ldots, m.
\end{array}
\]
Since each second-order cone is self-dual ($K_i^* = K_i$), this agrees with the dual derived in part (a).

Solution 6.5

(a) Since $f$ is a convex and closed function (i.e., its epigraph is a closed set), it can be represented via its conjugate as $f(r) = \max_y \{ y^T r - f^*(y) \}$. Consequently, we can express the problem in minimax form as $p^\star = \min_x \max_y \phi(x, y)$, where the function
\[
\phi(x, y) := y^T (A x + b) - f^*(y) + \tfrac{1}{2} \|x\|_2^2.
\]
Weak duality tells us that $p^\star \geq d^\star$, where $d^\star := \max_y \min_x \phi(x, y)$. Minimizing over $x$ (setting $\nabla_x \phi = x + A^T y = 0$, i.e., $x = -A^T y$), we obtain
\[
d^\star = -\min_y \Big\{ f^*(y) + \tfrac{1}{2} \|A^T y\|_2^2 - b^T y \Big\}.
\]

(b) We observe that for every $y$, the sub-level sets of the function $\phi(\cdot, y)$ are bounded. (Here, to avoid trivial sub-cases, we assume that $p^\star$ is finite, a condition that should have been in the problem statement.) Thus, according to the result of [BV, exercise 5.25], we have $p^\star = d^\star$. We observe that $d^\star$ (hence $p^\star$) is convex in $K = A A^T$, since
\[
\tilde{d} := \min_y \Big\{ f^*(y) + \tfrac{1}{2} y^T K y - b^T y \Big\}
\]
is concave in $K$ (a pointwise minimum of functions affine in $K$) and $d^\star = -\tilde{d}$.
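As a quick numerical illustration of parts (a) and (b), a sketch with made-up random data (an addition, not part of the original solution): take $f(r) = \frac{1}{2}\|r\|_2^2$, which is self-conjugate, so that both $p^\star$ and $d^\star$ have closed forms. This case reappears in part (d) below.

% Illustrative check of p* = d* (random data) for f(r) = 0.5*||r||^2,
% where the primal is  min_x 0.5*||A*x + b||^2 + 0.5*||x||^2.
rng(0); m = 8; n = 5;
A = randn(m, n); b = randn(m, 1);
x = -(A'*A + eye(n)) \ (A'*b);                 % primal minimizer
pstar = 0.5*norm(A*x + b)^2 + 0.5*norm(x)^2;
K = A*A';                                      % f* = f, so d* = -min_y 0.5*y'*(K+I)*y - b'*y
dstar = 0.5 * (b' * ((K + eye(m)) \ b));
% pstar and dstar agree up to rounding, as guaranteed by part (b)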
(c) The primal problem involves a strictly convex objective function and no constraints, hence the optimum is attained and unique. For each $y$, the problem $\min_x \phi(x, y)$ has a unique solution, given by $x(y) := -A^T y$. According to the result in [BV, §5.5.5], we conclude that if $y^\star$ is optimal for the dual problem, then $x^\star = -A^T y^\star$ is optimal.

(d) The dual takes the following specific forms.

(i) Support vector machine classification: when $f(r) = \sum_{i=1}^m (r_i)_+$, we have $f(r) = \max_{0 \preceq u \preceq \mathbf{1}} u^T r$, which shows that $f^*$ is then the indicator function of the set $[0, 1]^m$. The dual problem writes (with $b = \mathbf{1}$):
\[
-2 d^\star = \min_y \Big\{ \|A^T y\|_2^2 - 2 b^T y \;:\; 0 \preceq y \preceq \mathbf{1} \Big\}.
\]

(ii) Least-squares regression: the function $f(r) = \frac{1}{2} \|r\|_2^2$ is self-conjugate, so the dual problem takes the form
\[
-2 d^\star = \min_y \Big\{ y^T (K + I)\, y - 2 b^T y \Big\} = -b^T (K + I)^{-1} b,
\]
as expected from the primal form.

(iii) Least-norm regression: when $f(r) = \|r\|$, where $\|\cdot\|$ is a norm, the conjugate of $f$ is the indicator of the unit ball for the dual norm $\|\cdot\|_*$, hence
\[
-2 d^\star = \min_y \Big\{ \|A^T y\|_2^2 - 2 b^T y \;:\; \|y\|_* \leq 1 \Big\}.
\]
Completing the square (assuming $K$ is invertible), we can express $d^\star$ as
\[
2 d^\star = b^T K^{-1} b - \min_y \Big\{ \|y - y_0\|_K^2 \;:\; \|y\|_* \leq 1 \Big\},
\]
where $y_0 := K^{-1} b$ and $\|\cdot\|_K$ is the weighted Euclidean norm with values $\|z\|_K^2 = z^T K z$. The minimization above represents the minimum weighted Euclidean distance from $y_0$ to the unit ball in the dual norm.

Solution 6.6

(a) The KKT conditions for $(z^\star, \lambda^\star)$ are given by the following equations:
\[
\nabla_z L(z^\star, \lambda^\star) = \nabla f(x_n) + \nabla^2 f(x_n)\, z^\star + A^T \lambda^\star = 0 \tag{1a}
\]
\[
A z^\star = 0. \tag{1b}
\]
Putting these equations together yields the given matrix form,
\[
\begin{bmatrix} \nabla^2 f(x_n) & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} z^\star \\ \lambda^\star \end{bmatrix}
=
\begin{bmatrix} -\nabla f(x_n) \\ 0 \end{bmatrix}.
\]
Since the problem is strictly convex (assuming that $\nabla^2 f(x_n) \succ 0$) with linear constraints, these KKT conditions are necessary and sufficient for optimality. Hence when $P(x_n)$ is invertible, solving this system yields the given form of the Newton update.

(b) Here is some MATLAB code to solve this problem via Newton's method with the Armijo rule:
% Newton's method with Armijo rule to solve the constrained maximum
% entropy problem in primal form
clear f;
MAXITS = 500;       % Maximum number of iterations
BETA = 0.5;         % Armijo parameter
SIGMA = 0.1;        % Armijo parameter
GRADTOL = 1e-7;     % Tolerance for gradient

load xinit.ascii; load A.ascii; load b.ascii
x = xinit;
m = size(A,1); n = size(A,2);

for iter = 1:MAXITS,
  val = x'*log(x);            % current function value
  f(iter) = val;
  grad = 1 + log(x);          % current gradient
  hess = diag(1./x);          % current (diagonal) Hessian
  % Solve the KKT system of part (a) for the Newton step and multiplier
  temp = -[hess A'; A zeros(m,m)] \ [grad; zeros(m,1)];
  newt = temp(1:n);
  primal_lambda = temp(n+1:(n+m));
  descmag = grad'*newt;       % Check magnitude of descent
  if (abs(descmag) < GRADTOL)
    break;
  end
  t = 1;
  % Backtrack until the candidate stays in the domain (x > 0)
  while (min(x + t*newt) <= 0)
    t = BETA*t;
  end
  % Armijo backtracking line search
  while ( (x+t*newt)'*log(x+t*newt) - val >= SIGMA*t*descmag )
    t = BETA*t;
  end
  x = x + t*newt;
end
gradviol = norm(A*x - b, 2)
pstar = val

Applying this to the problem data on the website yields $p^\star = 33.6429$. Figure 1(a) shows $f(x_n) - p^\star$, on a logarithmic scale, versus the iteration number $n$.
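As an optional sanity check (an addition to the original listing), the computed pair can be tested against the optimality conditions of part (a): at the optimum, stationarity gives $1 + \log x + A^T \lambda = 0$ together with $A x = b$.

% Optional check, using x and primal_lambda from the loop above:
stat_res = norm(1 + log(x) + A'*primal_lambda)   % should be near zero
feas_res = norm(A*x - b, 2)                      % should be near zero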
[Figure 1 omitted in this text version. Panel (a), "Convergence plot for primal Newton method," plots $f_n - p^\star$ against iteration number; panel (b), "Convergence plot for dual Newton method," plots $q_n - q^\star$ against iteration number.]

Figure 1: Convergence plots for the primal Newton method (a) and dual Newton method (b). The inverted quadratic shape (on a log scale) reveals the quadratic convergence.

(c) The Lagrangian dual function is given by
\[
\begin{aligned}
q(\lambda) &:= \inf_{x \succ 0} \Big\{ \sum_{i=1}^n x_i \log x_i + \lambda^T (A x - b) \Big\} \\
&= \sum_{i=1}^n \inf_{x_i > 0} \big\{ x_i \log x_i + (a_i^T \lambda)\, x_i \big\} - b^T \lambda \\
&= -\sum_{i=1}^n g^*(-a_i^T \lambda) - b^T \lambda,
\end{aligned}
\]
where $a_i$ denotes the $i$th column of $A$, $g(v) = v \log v$ with $\mathrm{dom}(g) = \{v > 0\}$, and $g^*$ is its conjugate. Straightforward calculations give $g^*(\mu) = \exp(\mu - 1)$ with $\mathrm{dom}(g^*) = \mathbb{R}$: maximizing $\mu v - v \log v$ over $v > 0$ gives $v = e^{\mu - 1}$, and substituting back yields the value $e^{\mu - 1}$, so that the result follows.

(d) Here is some MATLAB code to solve the dual problem:

% MATLAB code to perform dual optimization via Newton's method.
% This code actually solves the convex problem of minimizing the
% negative dual function.
clear q;
MAXITS = 500;       % Maximum number of iterations
BETA = 0.5;         % Armijo parameter
SIGMA = 0.1;        % Armijo parameter
GRADTOL = 1e-7;     % Tolerance for gradient

load xinit.ascii; load A.ascii; load b.ascii
m = size(A,1); n = size(A,2);
laminit = ones(m,1);
lambda = laminit;

for iter = 1:MAXITS,
  % Current value of the negative dual function
  val = b'*lambda + sum(exp(-A'*lambda - 1));
  q(iter) = -val;
  grad = b - A*exp(-A'*lambda - 1);          % current gradient
  hess = A*diag(exp(-A'*lambda - 1))*A';     % current Hessian
  newt = -hess \ grad;                       % Newton step
  descmag = grad'*newt;                      % Check magnitude of descent
  if (abs(descmag) < GRADTOL)
    break;
  end
  t = 1;
  newval = b'*(lambda + t*newt) + sum(exp(-A'*(lambda + t*newt) - 1));
  % Armijo backtracking line search
  while (newval > val + SIGMA*t*descmag)
    t = BETA*t;
    newval = b'*(lambda + t*newt) + sum(exp(-A'*(lambda + t*newt) - 1));
  end
  lambda = lambda + t*newt;
end
qstar = -val

It yields the optimal dual value $q^\star = 33.6429 = p^\star$, so that strong duality holds. Figure 1(b) shows a plot of $q^\star - q(\lambda_n)$, on a logarithmic scale, versus the iteration number $n$.
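Finally, the primal solution can be recovered from the dual optimum: the infimum in part (c) is attained at $x_i = \exp(-a_i^T \lambda - 1)$. A short check along these lines (an addition to the original listing, reusing A, b, and lambda from the code above):

% Recover the primal point from the dual optimum and verify it:
% x_i = exp(-a_i'*lambda - 1) attains the infimum in part (c).
xrec = exp(-A'*lambda - 1);
feas = norm(A*xrec - b, 2)    % near zero, since the dual gradient vanishes
pval = xrec'*log(xrec)        % matches qstar, confirming strong duality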