UC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 6 Fall 2009

Solution 6.1

(a) $p^\star = 1$.

(b) The Lagrangian is $L(x, y, \lambda) = e^{-x} + \lambda x^2/y$. The dual function is
\[
g(\lambda) = \inf_{x,\; y > 0} \bigl( e^{-x} + \lambda x^2/y \bigr) =
\begin{cases} 0 & \lambda \ge 0 \\ -\infty & \text{otherwise} \end{cases}
\]
(for $\lambda \ge 0$ the objective is nonnegative and can be driven to zero, e.g., by taking $x \to \infty$ with $y = x^3$; for $\lambda < 0$ it is unbounded below, e.g., fix $x = 1$ and let $y \to 0^+$), so we can write the dual problem as
\[
\begin{array}{ll} \mbox{maximize} & 0 \\ \mbox{subject to} & \lambda \ge 0 \end{array}
\]
with optimal value $d^\star = 0$. The optimal duality gap is $p^\star - d^\star = 1$.

(c) Slater's condition is not satisfied.

(d) $p^\star(u) = 1$ if $u = 0$, $p^\star(u) = 0$ if $u > 0$, and $p^\star(u) = \infty$ if $u < 0$.

Solution 6.2

Suppose $x$ is feasible. Since the $f_i$ are convex and $f_i(x) \le 0$, we have
\[
0 \ge f_i(x) \ge f_i(x^\star) + \nabla f_i(x^\star)^T (x - x^\star), \quad i = 1, \ldots, m.
\]
Using $\lambda_i^\star \ge 0$, we conclude that
\[
0 \ge \sum_{i=1}^m \lambda_i^\star \bigl( f_i(x^\star) + \nabla f_i(x^\star)^T (x - x^\star) \bigr)
  = \sum_{i=1}^m \lambda_i^\star \nabla f_i(x^\star)^T (x - x^\star)
  = -\nabla f_0(x^\star)^T (x - x^\star).
\]
In the first equality we use the complementary slackness condition $\lambda_i^\star f_i(x^\star) = 0$, and in the second the last KKT condition, $\nabla f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla f_i(x^\star) = 0$. This shows that $\nabla f_0(x^\star)^T (x - x^\star) \ge 0$, i.e., $\nabla f_0(x^\star)$ defines a supporting hyperplane to the feasible set at $x^\star$.

Solution 6.3

(a) Follows from $\mathop{\bf tr}(W x x^T) = x^T W x$ and $(x x^T)_{ii} = x_i^2$.

(b) It gives a lower bound because we minimize the same objective function over a larger set. If $X$ is rank one, it is optimal.
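As a quick numerical illustration of the lower bound in part (b), the following MATLAB sketch (not part of the original solutions; the variable names are illustrative) brute-forces a small instance of the two-way partitioning problem and compares it with the simple certificate $n \lambda_{\min}(W)$, which lower-bounds $\mathop{\bf tr}(WX)$ over the relaxed set (any feasible $X \succeq 0$ has $X_{ii} = 1$, hence $\mathop{\bf tr}(X) = n$) and therefore also lower-bounds $p^\star$:

% Compare a brute-force solution of the two-way partitioning problem
% (minimize x'*W*x over x in {-1,1}^n) with the simple lower bound
% n*lambda_min(W) implied by the relaxation.
n = 8;
W = randn(n); W = (W + W')/2;        % random symmetric W
best = inf;
for k = 0:2^n-1
  x = 2*(dec2bin(k,n) - '0')' - 1;   % k-th sign vector in {-1,1}^n (column)
  best = min(best, x'*W*x);
end
lowerbound = n*min(eig(W));
fprintf('brute force p* = %.4f, lower bound = %.4f\n', best, lowerbound);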

(c) We write the problem as a minimization problem
\[
\begin{array}{ll} \mbox{minimize} & \mathbf{1}^T \nu \\ \mbox{subject to} & W + \mathop{\bf diag}(\nu) \succeq 0. \end{array}
\]
Introducing a Lagrange multiplier $X \in \mathbf{S}^n$ for the matrix inequality, we obtain the Lagrangian
\[
L(\nu, X) = \mathbf{1}^T \nu - \mathop{\bf tr}\bigl( X (W + \mathop{\bf diag}(\nu)) \bigr)
          = \mathbf{1}^T \nu - \mathop{\bf tr}(XW) - \sum_{i=1}^n \nu_i X_{ii}
          = -\mathop{\bf tr}(XW) + \sum_{i=1}^n \nu_i (1 - X_{ii}).
\]
This is bounded below as a function of $\nu$ only if $X_{ii} = 1$ for all $i$, so we obtain the dual problem
\[
\begin{array}{ll} \mbox{maximize} & -\mathop{\bf tr}(WX) \\ \mbox{subject to} & X \succeq 0, \quad X_{ii} = 1, \quad i = 1, \ldots, n. \end{array}
\]
Changing the sign again, and switching from maximization to minimization, yields the problem in part (a).

Solution 6.4

(a) We introduce new variables $y_i$, $t_i$ and write the problem as
\[
\begin{array}{ll}
\mbox{minimize}   & c^T x \\
\mbox{subject to} & \|y_i\|_2 \le t_i, \quad i = 1, \ldots, m \\
                  & y_i = A_i x + b_i, \quad i = 1, \ldots, m \\
                  & t_i = c_i^T x + d_i, \quad i = 1, \ldots, m.
\end{array}
\]
The Lagrangian is
\[
\begin{aligned}
L(x, y, t, \lambda, \nu, \mu)
&= c^T x + \sum_{i=1}^m \lambda_i \bigl( \|y_i\|_2 - t_i \bigr)
         + \sum_{i=1}^m \nu_i^T \bigl( y_i - A_i x - b_i \bigr)
         + \sum_{i=1}^m \mu_i \bigl( t_i - c_i^T x - d_i \bigr) \\
&= \Bigl( c - \sum_{i=1}^m A_i^T \nu_i - \sum_{i=1}^m \mu_i c_i \Bigr)^T x
   + \sum_{i=1}^m \bigl( \lambda_i \|y_i\|_2 + \nu_i^T y_i \bigr)
   + \sum_{i=1}^m ( \mu_i - \lambda_i ) t_i
   - \sum_{i=1}^m \bigl( b_i^T \nu_i + d_i \mu_i \bigr).
\end{aligned}
\]
The minimum over $x$ is bounded below if and only if
\[
\sum_{i=1}^m \bigl( A_i^T \nu_i + \mu_i c_i \bigr) = c.
\]
To minimize over $y_i$, we note that
\[
\inf_{y_i} \bigl( \lambda_i \|y_i\|_2 + \nu_i^T y_i \bigr) =
\begin{cases} 0 & \|\nu_i\|_2 \le \lambda_i \\ -\infty & \text{otherwise.} \end{cases}
\]

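For completeness, here is a short justification of the infimum just used (a standard Cauchy–Schwarz argument, not spelled out in the original): if $\|\nu_i\|_2 \le \lambda_i$, then
\[
\lambda_i \|y_i\|_2 + \nu_i^T y_i \ge \lambda_i \|y_i\|_2 - \|\nu_i\|_2 \|y_i\|_2 = (\lambda_i - \|\nu_i\|_2) \|y_i\|_2 \ge 0,
\]
with equality at $y_i = 0$; if instead $\|\nu_i\|_2 > \lambda_i$, taking $y_i = -t \nu_i$ gives the value $t \|\nu_i\|_2 (\lambda_i - \|\nu_i\|_2) \to -\infty$ as $t \to \infty$.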
The minimum over $t_i$ is bounded below if and only if $\lambda_i = \mu_i$. The dual function is therefore
\[
g(\lambda, \nu, \mu) =
\begin{cases}
-\sum_{i=1}^m \bigl( b_i^T \nu_i + d_i \mu_i \bigr) & \sum_{i=1}^m \bigl( A_i^T \nu_i + \mu_i c_i \bigr) = c, \quad \|\nu_i\|_2 \le \lambda_i, \quad \mu = \lambda \\
-\infty & \text{otherwise,}
\end{cases}
\]
which leads to the dual problem
\[
\begin{array}{ll}
\mbox{maximize}   & -\sum_{i=1}^m \bigl( b_i^T \nu_i + d_i \lambda_i \bigr) \\
\mbox{subject to} & \sum_{i=1}^m \bigl( A_i^T \nu_i + \lambda_i c_i \bigr) = c \\
                  & \|\nu_i\|_2 \le \lambda_i, \quad i = 1, \ldots, m.
\end{array}
\]

(b) We express the SOCP as a conic form problem
\[
\begin{array}{ll}
\mbox{minimize}   & c^T x \\
\mbox{subject to} & (A_i x + b_i, \; c_i^T x + d_i) \succeq_{K_i} 0, \quad i = 1, \ldots, m.
\end{array}
\]
The conic dual is
\[
\begin{array}{ll}
\mbox{maximize}   & -\sum_{i=1}^m \bigl( b_i^T u_i + d_i v_i \bigr) \\
\mbox{subject to} & \sum_{i=1}^m \bigl( A_i^T u_i + v_i c_i \bigr) = c \\
                  & (u_i, v_i) \succeq_{K_i^*} 0, \quad i = 1, \ldots, m.
\end{array}
\]

Solution 6.5

(a) Since $f$ is a convex and closed function (i.e., its epigraph is a closed set), it can be represented via its conjugate as $f(r) = \max_y \{ y^T r - f^*(y) \}$. Consequently, we can express the problem in minimax form as $p^\star = \min_x \max_y \phi(x, y)$, where
\[
\phi(x, y) := y^T (A x + b) - f^*(y) + \tfrac{1}{2} \|x\|_2^2.
\]
Weak duality tells us that $p^\star \ge d^\star$, where $d^\star := \max_y \min_x \phi(x, y)$. Carrying out the inner minimization over $x$ (worked out after part (b) below), we obtain
\[
d^\star = -\min_y \Bigl\{ f^*(y) + \tfrac{1}{2} \|A^T y\|_2^2 - b^T y \Bigr\}.
\]

(b) We observe that for every $y$, the sub-level sets of the function $\phi(\cdot, y)$ are bounded. (Here, to avoid trivial sub-cases, we assume that $p^\star$ is finite, a condition that should have been in the problem statement.) Thus, according to the result of [BV, exercise 5.25], we have $p^\star = d^\star$. We also observe that $d^\star$ (hence $p^\star$) is convex in $K = A A^T$, since it is the negative of a pointwise minimum of functions affine in $K$, and such a minimum is concave:
\[
d^\star = -\min_y \Bigl\{ f^*(y) + \tfrac{1}{2} y^T K y - b^T y \Bigr\}.
\]
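The inner minimization over $x$ used in part (a) is a completion of squares; a brief worked version follows (notation as above), since part (c) reuses its minimizer:
\[
\min_x \phi(x, y) = y^T b - f^*(y) + \min_x \Bigl\{ (A^T y)^T x + \tfrac{1}{2} \|x\|_2^2 \Bigr\}
                  = y^T b - f^*(y) - \tfrac{1}{2} \|A^T y\|_2^2,
\]
with the minimum attained at $x = -A^T y$ (set the gradient $A^T y + x$ to zero). Maximizing this expression over $y$ gives the formula for $d^\star$ in part (a), and the minimizer $x(y) = -A^T y$ is the map used in part (c).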

(c) The primal problem involves a strictly convex objective function and no constraints, hence the optimum is attained and unique. For each $y$, the problem $\min_x \phi(x, y)$ has a unique solution, given by $x(y) := -A^T y$. According to the result in [BV, §5.5.5], we conclude that if $y^\star$ is optimal for the dual problem, then $x^\star = -A^T y^\star$ is optimal for the primal.

(d) The dual takes the following specific forms.

(i) Support vector machine classification: When $f(r) = \sum_{i=1}^m (r_i)_+$, we have $f(r) = \max_{0 \le u \le \mathbf{1}} u^T r$, which shows that $f^*$ is then the indicator function of the set $[0, 1]^m$. The dual problem can be written (with $b = \mathbf{1}$) as
\[
-2 d^\star = \min_y \bigl\{ \|A^T y\|_2^2 - 2 b^T y \; : \; 0 \le y \le \mathbf{1} \bigr\}.
\]

(ii) Least-squares regression: The function $f(r) = \tfrac{1}{2} \|r\|_2^2$ is self-conjugate, so the dual problem takes the form
\[
-2 d^\star = \min_y \bigl\{ y^T (K + I) y - 2 b^T y \bigr\} = -b^T (K + I)^{-1} b,
\]
as expected from the primal form.

(iii) Least-norm regression: When $f(r) = \|r\|$, where $\|\cdot\|$ is a norm, the conjugate of $f$ is the indicator of the unit ball for the dual norm $\|\cdot\|_*$, hence
\[
-2 d^\star = \min_y \bigl\{ \|A^T y\|_2^2 - 2 b^T y \; : \; \|y\|_* \le 1 \bigr\}.
\]
We can express $d^\star$ as
\[
2 d^\star = \|y_0\|_K^2 - \min_y \bigl\{ \|y - y_0\|_K^2 \; : \; \|y\|_* \le 1 \bigr\},
\]
where $y_0 := K^{-1} b$, and $\|\cdot\|_K$ is the weighted Euclidean norm with values $\|z\|_K^2 = z^T K z$. The minimum above is the squared minimum weighted Euclidean distance from $y_0$ to the unit ball in the dual norm.

Solution 6.6

(a) The KKT conditions for $(z^\star, \lambda^\star)$ are given by the following equations:
\[
\nabla_z L(z^\star, \lambda^\star) = \nabla f(x_n) + \nabla^2 f(x_n) \, z^\star + A^T \lambda^\star = 0, \tag{1a}
\]
\[
A z^\star = 0. \tag{1b}
\]
Putting these equations together yields the given matrix form. Since the problem is strictly convex (assuming that $\nabla^2 f(x_n) \succ 0$) with linear constraints, these KKT conditions are necessary and sufficient for the optimum. Hence when $P(x_n)$ is invertible, solving the system yields the given form of the Newton update.
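The "given matrix form" is not reproduced in these solutions; presumably (this is a reconstruction from (1a)-(1b), not a quotation of the problem statement) it is the KKT system
\[
\begin{bmatrix} \nabla^2 f(x_n) & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} z^\star \\ \lambda^\star \end{bmatrix}
= - \begin{bmatrix} \nabla f(x_n) \\ 0 \end{bmatrix},
\]
with $P(x_n)$ the block matrix on the left, so that when $P(x_n)$ is invertible the Newton step $z^\star$ and multiplier $\lambda^\star$ are obtained by solving this linear system, and the update is $x_{n+1} = x_n + t \, z^\star$ with the step size $t$ chosen by the Armijo rule used in the code below.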

(b) Here is some MATLAB code to solve this problem via Newton's method with the Armijo rule:

% Newton's method with Armijo rule to solve the constrained maximum
% entropy problem in primal form
clear f;
MAXITS = 500;      % Maximum number of iterations
BETA = 0.5;        % Armijo parameter
SIGMA = 0.1;       % Armijo parameter
GRADTOL = 1e-7;    % Tolerance for gradient

load xinit.ascii; load A.ascii; load b.ascii
x = xinit;
m = size(A,1); n = size(A,2);

for iter = 1:MAXITS,
  val = x'*log(x);           % current function value
  f(iter) = val;
  grad = 1 + log(x);         % current gradient
  hess = diag(1./x);         % current Hessian
  % solve the KKT system for the Newton step and multiplier
  temp = -[hess A'; A zeros(m,m)] \ [grad; zeros(m,1)];
  newt = temp(1:n);
  primal_lambda = temp(n+1:(n+m));
  descmag = grad'*newt;      % check magnitude of descent
  if (abs(descmag) < GRADTOL)
    break;
  end
  % shrink the step until the new point is strictly positive
  t = 1;
  while (min(x + t*newt) <= 0)
    t = BETA*t;
  end
  % Armijo backtracking line search
  while ( ((x + t*newt)'*log(x + t*newt)) - val >= SIGMA*t*descmag )
    t = BETA*t;
  end
  x = x + t*newt;
end
gradviol = norm(A*x - b, 2)
pstar = val

Applying this to the problem data on the website yields $p^\star = 33.6429$. Figure 1(a) shows $\log(f(x_n) - p^\star)$ versus the iteration number $n$.
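For reference, a plot like Figure 1(a) can be produced from the values stored in f (a sketch, assuming the loop above has been run; the final entry of f equals pstar exactly and is therefore omitted from the log-scale plot):

% Plot f_n - p* on a semilog scale; quadratic convergence appears as a
% sharply bending curve.
semilogy(1:length(f)-1, f(1:end-1) - pstar, 'o-');
xlabel('Iteration number'); ylabel('f_n - p^*');
title('Convergence plot for primal Newton method');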

Figure 1: Convergence plots for the primal Newton method (a) and the dual Newton method (b). The inverted quadratic shape (on a log scale) reveals the quadratic convergence.

(c) The Lagrangian dual is given by
\[
\begin{aligned}
q(\lambda) &:= \inf_{x \succ 0} \Bigl\{ \sum_{i=1}^n x_i \log x_i + \lambda^T (A x - b) \Bigr\} \\
           &= \sum_{i=1}^n \inf_{x_i > 0} \bigl\{ x_i \log x_i + (a_i^T \lambda) x_i \bigr\} - b^T \lambda \\
           &= -\sum_{i=1}^n g^*(-a_i^T \lambda) - b^T \lambda,
\end{aligned}
\]
where $a_i$ denotes the $i$th column of $A$, $g(v) = v \log v$ with $\mathop{\bf dom} g = \{ v > 0 \}$, and $g^*$ is its conjugate. Straightforward calculations give $g^*(\mu) = \exp(\mu - 1)$ with $\mathop{\bf dom} g^* = \mathbf{R}$, so that the result follows.
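The "straightforward calculations" can be spelled out as follows (added for completeness; notation as above). By definition,
\[
g^*(\mu) = \sup_{v > 0} \bigl\{ \mu v - v \log v \bigr\}.
\]
Setting the derivative $\mu - \log v - 1$ to zero gives $v = e^{\mu - 1}$, and since the objective is concave in $v$ this stationary point is the maximizer, so
\[
g^*(\mu) = \mu e^{\mu - 1} - e^{\mu - 1} (\mu - 1) = e^{\mu - 1}
\]
for every $\mu \in \mathbf{R}$. In particular, $\inf_{x_i > 0} \{ x_i \log x_i + (a_i^T \lambda) x_i \} = -g^*(-a_i^T \lambda) = -\exp(-a_i^T \lambda - 1)$, which is exactly the term summed in the dual code below.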

(d) Here is some MATLAB code to solve the dual problem:

% MATLAB code to perform dual optimization via Newton's method.
% This code actually solves the convex problem of minimizing the
% negative dual function.
clear q;
MAXITS = 500;      % Maximum number of iterations
BETA = 0.5;        % Armijo parameter
SIGMA = 0.1;       % Armijo parameter
GRADTOL = 1e-7;    % Tolerance for gradient

load xinit.ascii; load A.ascii; load b.ascii
m = size(A,1); n = size(A,2);
laminit = ones(m,1);
lambda = laminit;

for iter = 1:MAXITS,
  val = b'*lambda + sum(exp(-A'*lambda - 1));   % current function value
  q(iter) = -val;
  grad = b - A*exp(-A'*lambda - 1);             % current gradient
  hess = A*diag(exp(-A'*lambda - 1))*A';        % current Hessian
  newt = -hess \ grad;                          % Newton step
  descmag = grad'*newt;                         % check magnitude of descent
  if (abs(descmag) < GRADTOL)
    break;
  end
  % Armijo backtracking line search
  t = 1;
  newval = b'*(lambda + t*newt) + sum(exp(-A'*(lambda + t*newt) - 1));
  while (newval > val + SIGMA*t*descmag)
    t = BETA*t;
    newval = b'*(lambda + t*newt) + sum(exp(-A'*(lambda + t*newt) - 1));
  end
  lambda = lambda + t*newt;
end
qstar = -val

It yields the optimal dual value $q^\star = 33.6429 = p^\star$, so that strong duality holds. Figure 1(b) shows a plot of $\log|q(\lambda_n) - q^\star|$ versus the iteration number $n$.
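As a final check (a sketch, not part of the original solutions; variable names as in the code above), a primal solution can be recovered from the dual optimum: by the derivation in part (c), the minimizer of the Lagrangian at $\lambda$ is $x_i = \exp(-a_i^T \lambda - 1)$, which at the dual optimum should be primal feasible with objective value equal to $p^\star$:

% Recover the primal variable from the dual optimum and verify it.
xrec = exp(-A'*lambda - 1);       % minimizer of the Lagrangian at lambda
feasviol = norm(A*xrec - b, 2)    % should be small at the dual optimum
primalval = xrec'*log(xrec)       % should be close to pstar = qstar = 33.6429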