EE364b Convex Optimization II                                May 30 – June 2, 2014
Prof. S. Boyd

Final exam

By now, you know how it works, so we won't repeat it here. (If not, see the instructions for the EE364a final exam.) Since you have 72 hours to work on the final, your solutions must be typeset using LaTeX. We expect your solutions to be typo-free, clear, and correctly typeset. (And yes, we will deduct points for poor typesetting, typos, or unclear solutions.) All code submitted must be clear, commented, and readable.

Email your solutions to ee364b.submission@gmail.com by Monday, June 2nd, 5pm at the latest. You can find the matlab files containing the problem data on the course website homework page. Please make sure each problem starts on a new page, say, by using the \clearpage command. (This generates a new page after printing out any figures that have floated forward.)

1. Solving LPs via alternating projections. Consider an LP in standard form,

   minimize    $c^T x$
   subject to  $Ax = b$, $x \succeq 0$,

with variable $x \in \mathbf{R}^n$, and where $A \in \mathbf{R}^{m \times n}$. A tuple $(x, \nu, \lambda) \in \mathbf{R}^{2n+m}$ is primal-dual optimal if and only if

   $Ax = b$, $x \succeq 0$, $A^T \nu + \lambda = c$, $\lambda \succeq 0$, $c^T x + b^T \nu = 0$.

These are the KKT optimality conditions of the LP. The last condition, which states that the duality gap is zero, can be replaced with an equivalent condition, $\lambda^T x = 0$, which is complementary slackness.

(a) Let $z = (x, \nu, \lambda)$ denote the primal-dual variable. Express the optimality conditions as $z \in \mathcal{A} \cap \mathcal{C}$, where $\mathcal{A}$ is an affine set and $\mathcal{C}$ is a simple cone. Give $\mathcal{A}$ as $\mathcal{A} = \{ z \mid Fz = g \}$, for appropriate $F$ and $g$.

(b) Explain how to compute the Euclidean projections onto $\mathcal{A}$ and also onto $\mathcal{C}$.

(c) Implement alternating projections to solve the standard form LP. Use $z^{k+1/2}$ to denote the iterate after projection onto $\mathcal{A}$, and $z^{k+1}$ to denote the iterate after projection onto $\mathcal{C}$. Your implementation should exploit factorization caching in the projection onto $\mathcal{A}$, but you don't need to worry about exploiting structure in the matrix $F$. Test your solver on a problem instance with $m = 100$, $n = 500$. Plot the residual $\|z^{k+1} - z^{k+1/2}\|_2$ over 1000 iterations. (This should converge to zero, although perhaps slowly.)

Here is a simple method to generate LP instances that are feasible. First, generate a random vector $\omega \in \mathbf{R}^n$. Let $x^\star = \max\{\omega, 0\}$ and $\lambda^\star = \max\{-\omega, 0\}$, where the maximum is taken elementwise. Choose $A \in \mathbf{R}^{m \times n}$ and $\nu^\star \in \mathbf{R}^m$ with random entries, and set $b = Ax^\star$, $c = A^T \nu^\star + \lambda^\star$. This gives you an LP instance with optimal value $c^T x^\star$.

(d) Implement Dykstra's alternating projection method and try it on the same problem instances from part (c). Verify that you obtain a speedup, and plot the same residual as in part (c).
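
For concreteness, here is a minimal MATLAB sketch of the plain alternating projections loop (not Dykstra's method). It assumes A, b, c are already in the workspace; the particular stacking of F and g and the choice of the cone are one natural reading of part (a), which you should verify yourself, and the factorization caching is done with a single Cholesky factor of $FF^T$.

    % Minimal alternating-projections sketch; assumes A (m x n), b, c exist.
    % The stacking of F, g and the cone below are one reading of part (a).
    [m, n] = size(A);
    F = [A,           zeros(m, m), zeros(m, n);   % Ax = b
         zeros(n, n), A',          eye(n);        % A'*nu + lambda = c
         c',          b',          zeros(1, n)];  % c'*x + b'*nu = 0
    g = [b; c; 0];
    R = chol(F*F');                              % cache this factorization
    z = zeros(2*n + m, 1);                       % z = (x, nu, lambda)
    res = zeros(1000, 1);
    for k = 1:1000
        zA = z - F'*(R \ (R' \ (F*z - g)));      % projection onto the affine set
        zC = zA;
        zC(1:n)       = max(zA(1:n), 0);         % x >= 0
        zC(n+m+1:end) = max(zA(n+m+1:end), 0);   % lambda >= 0 (nu is free)
        res(k) = norm(zC - zA);                  % ||z^{k+1} - z^{k+1/2}||_2
        z = zC;
    end
    semilogy(res)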

2. Quantile regression. For $\alpha \in (0,1)$, define $h_\alpha : \mathbf{R}^n \to \mathbf{R}$ as

   $h_\alpha(x) = \alpha \mathbf{1}^T x_+ + (1-\alpha) \mathbf{1}^T x_-$,

where $x_+ = \max\{x, 0\}$ and $x_- = \max\{-x, 0\}$, with the maximum taken elementwise. For the connection between this function and quantiles, see exercise 1.4.

(a) Give a simple expression for the proximal operator of $h_\alpha$.

(b) The quantile regression problem is

   minimize   $h_\alpha(Ax - b)$,

with variable $x \in \mathbf{R}^n$ and parameters $A \in \mathbf{R}^{m \times n}$, $b \in \mathbf{R}^m$, and $\alpha \in (0,1)$. Explain how to use ADMM to solve this problem by introducing a new variable (and constraint) $z = Ax - b$. Give the details of each step in ADMM, including how one of the steps can be greatly sped up after the first step.

(c) Implement your method on data (i.e., A and b) generated as described below, for $\alpha \in \{0.2, 0.5, 0.8\}$. For each of these three values of $\alpha$, give the optimal objective value, and plot a histogram of the residual vector $Ax - b$. Generate A and b using the following code:

   m = 2000; n = 200;
   rand('state', 3);
   A = rand(m, n); b = rand(m, 1);

Hint. You should develop, debug, and test your code on a smaller problem instance, so you can easily (i.e., quickly) check the results against CVX.
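
As a starting point for part (b), here is a hedged MATLAB sketch of the ADMM iteration for the splitting $z = Ax - b$. The elementwise prox formula, the choice $\rho = 1$, and the fixed iteration count are our own assumptions to be checked against your derivations in parts (a) and (b); the cached Cholesky factor of $A^T A$ illustrates the speedup mentioned above.

    % Hedged ADMM sketch for minimize h_alpha(A*x - b), splitting z = A*x - b.
    % Assumes A, b, alpha are defined; rho = 1 and 500 iterations are arbitrary.
    rho = 1;
    % Candidate elementwise prox of h_alpha (asymmetric soft thresholding);
    % verify this against your answer to part (a).
    prox_h = @(v, t) max(v - t*alpha, 0) - max(-v - t*(1 - alpha), 0);
    [m, n] = size(A);
    x = zeros(n, 1); z = zeros(m, 1); u = zeros(m, 1);
    R = chol(A'*A);                       % factor once; reuse in every x-update
    for k = 1:500
        x = R \ (R' \ (A'*(b + z - u)));  % x-update: cached least-squares solve
        z = prox_h(A*x - b + u, 1/rho);   % z-update: prox of h_alpha
        u = u + A*x - b - z;              % scaled dual update
    end
    obj = alpha*sum(max(A*x - b, 0)) + (1 - alpha)*sum(max(b - A*x, 0));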

3. Optimal parameter choice for the Peaceman-Rachford algorithm. Consider the problem

   minimize   $f(x) + g(x)$,

with variable $x \in \mathbf{R}^n$, where $f, g : \mathbf{R}^n \to \mathbf{R} \cup \{+\infty\}$ are convex, closed, and proper functions. This problem is equivalent to solving $0 \in \partial f(x) + \partial g(x)$. The Peaceman-Rachford iteration for solving this problem is

   $z^{k+1} = C_f(C_g(z^k))$,

where

   $C_f(z) = 2(I + \lambda \partial f)^{-1}(z) - z = 2\,\mathrm{prox}_{\lambda f}(z) - z$

is the Cayley operator of $f$, and similarly for $C_g$, with $\lambda > 0$. This iteration need not converge. But it does converge if either $C_f$ or $C_g$ is a contraction. (Note that $C_f$ and $C_g$ are nonexpansive.)

(a) Assume that $f$ is convex quadratic, $f(x) = (1/2)x^T P x + q^T x$, with $P \in \mathbf{S}^n_{++}$ and $q \in \mathbf{R}^n$. Find the smallest Lipschitz constant of $C_f \circ C_g$ in terms of $\lambda$ and $P$, without any further assumptions on $g$. (Your answer can involve the eigenvalues of $P$, ordered as $\lambda_{\max}(P) = \lambda_1 \geq \cdots \geq \lambda_n = \lambda_{\min}(P) > 0$.)

(b) Find $\lambda_{\mathrm{opt}}$, the value of $\lambda$ for which the Lipschitz constant in part (a) is minimized, and give the associated Lipschitz constant for $C_f \circ C_g$. Express $\lambda_{\mathrm{opt}}$ in terms of the eigenvalues of $P$. Express the optimal Lipschitz constant in terms of the condition number $\kappa$ of $P$, given by $\kappa = \lambda_{\max}(P)/\lambda_{\min}(P)$.

(c) Consider the case $f(x) = \|Ax - b\|_2^2$ and $g$ the indicator function of the nonnegative orthant. (This is the nonnegative least-squares problem.) The optimality conditions for this problem are

   $x \succeq 0$, $A^T(Ax - b) \succeq 0$, $x_i (A^T(Ax - b))_i = 0$, $i = 1, \ldots, n$.

At each iteration of the Peaceman-Rachford algorithm, the point $x^k = R_g(z^k)$ (where $R_g = (I + \lambda \partial g)^{-1}$ is the resolvent of $g$) satisfies the first optimality condition. We stop the algorithm when $A^T(Ax^k - b) \succeq -\epsilon \mathbf{1}$ and $(x^k)_i (A^T(Ax^k - b))_i \leq \epsilon$ for $i = 1, \ldots, n$, where $\epsilon > 0$ is a tolerance.

Implement the Peaceman-Rachford algorithm in this case, with tolerance $\epsilon = 10^{-4}$. Generate a random instance of the problem with $m = 500$ and $n = 200$, and plot the number of iterations required versus $\lambda$ over a range that includes $\lambda_{\mathrm{opt}}$ (from part (b)). The horizontal axis should be logarithmic, showing $\lambda/\lambda_{\mathrm{opt}}$ (say, for 30 values from 0.01 to 100). Repeat for several random instances, and briefly comment on the results.
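
Here is a hedged MATLAB sketch of the inner loop for part (c). It assumes A, b, and a value of lambda are already defined; the maximum iteration count is an arbitrary safeguard, and the prox of $\lambda f$ is worked out from $f(x) = \|Ax - b\|_2^2$ (note there is no factor of 1/2), so check the algebra before relying on it.

    % Hedged Peaceman-Rachford sketch for nonnegative least squares.
    % Assumes A (m x n), b, and lambda are defined in the workspace.
    epsilon = 1e-4; maxiter = 20000;
    [m, n] = size(A);
    R   = chol(eye(n) + 2*lambda*(A'*A));   % factor once per value of lambda
    Atb = A'*b;
    z = zeros(n, 1);
    for k = 1:maxiter
        x = max(z, 0);                      % x^k = R_g(z^k), projection onto R^n_+
        r = A'*(A*x - b);
        if all(r >= -epsilon) && all(x.*r <= epsilon)
            break;                          % stopping criterion from part (c)
        end
        Cg = 2*max(z, 0) - z;               % Cayley operator of g (equals abs(z))
        z  = 2*(R \ (R' \ (Cg + 2*lambda*Atb))) - Cg;   % z^{k+1} = C_f(C_g(z^k))
    end
    iters_used = k;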

4. Regularization parameter for sparse Bayes network identification. We are given samples $y_1, \ldots, y_N \in \mathbf{R}^n$ from an $\mathcal{N}(0, \Sigma)$ distribution, where $\Sigma \succ 0$ is an unknown covariance matrix. From these samples we will estimate the parameter $\Sigma$, using the prior knowledge that $\Sigma^{-1}$ is sparse. (The diagonal entries will not be zero, so this means that many off-diagonal elements of $\Sigma^{-1}$ are zero. Zero entries in $\Sigma^{-1}$ can be interpreted as a conditional independence condition, which explains the title of this problem.) To this end, we solve the (convex) problem

   maximize   $\log\det S - \mathbf{Tr}(SY) - \lambda \sum_{i \neq j} |S_{ij}|$

with variable $S \in \mathbf{S}^n_{++}$, which is our estimate of $\Sigma^{-1}$. Modulo a constant and scaling, the first two terms in the objective are the log-likelihood, where the matrix $Y$ is the sample covariance matrix

   $Y = \frac{1}{N} \sum_{k=1}^{N} y_k y_k^T$

(which we assume satisfies $Y \succ 0$). The last term in the objective is a sparsifying regularizer, with regularization parameter $\lambda > 0$. It does not penalize the diagonal terms in $S$, since they cannot be zero. We let $S^\star$ denote the optimal $S$ (which is unique, since the objective is strictly concave). It depends on $Y$ and $\lambda$.

(a) Suppose we add the additional constraint that $S$ must be diagonal. (In this case $S$ is as sparse as it can possibly be: all its off-diagonal entries are zero.) Find a simple expression for $S^\star_{\mathrm{diag}}$, the optimal $S$ in this case.

(b) Show that there is a (finite) value $\lambda_{\mathrm{diag}}$ such that $S^\star = S^\star_{\mathrm{diag}}$ if and only if $\lambda \geq \lambda_{\mathrm{diag}}$. Find a simple expression for $\lambda_{\mathrm{diag}}$ in terms of $Y$.

Hint. See page 641 of the textbook for the derivative of $\log\det S$.

Remark. It is very useful in practice to know the value $\lambda_{\mathrm{diag}}$. Useful values of the regularization parameter $\lambda$ are almost always in the range $[0.05, 0.95]\,\lambda_{\mathrm{diag}}$.
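
If you want to check your answers numerically, here is a small CVX sketch (not required by the problem) that solves the penalized problem for a given $\lambda$ and counts the nonzero off-diagonal entries of the solution; Y, lambda, and the 1e-4 threshold for calling an entry zero are assumed to be set by you.

    % Optional CVX sanity check; assumes Y (positive definite) and lambda exist.
    n = size(Y, 1);
    cvx_begin
        variable S(n, n) symmetric
        maximize(log_det(S) - trace(S*Y) ...
                 - lambda*sum(sum(abs(S - diag(diag(S))))))
    cvx_end
    % Count off-diagonal entries that are (numerically) nonzero.
    offdiag_nnz = nnz(abs(S - diag(diag(S))) > 1e-4);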

5. Subgradient method for total variation in-painting. A grayscale image is represented as an $m \times n$ matrix of intensities $U^{\mathrm{orig}}$ (typically between the values 0 and 255). You are given the values $U^{\mathrm{orig}}_{ij}$, for $(i,j) \in \mathcal{K}$, where $\mathcal{K} \subseteq \{1, \ldots, m\} \times \{1, \ldots, n\}$ is the set of indices corresponding to known pixel values. Your job is to in-paint the image by guessing the missing pixel values, i.e., those with indices not in $\mathcal{K}$. The reconstructed image will be represented by $U \in \mathbf{R}^{m \times n}$, where $U$ matches the known pixels, i.e., $U_{ij} = U^{\mathrm{orig}}_{ij}$ for $(i,j) \in \mathcal{K}$.

The reconstruction $U$ is found by minimizing the total variation of $U$, subject to matching the known pixel values. We will use the $\ell_2$ total variation, defined as

   $\mathrm{tv}(U) = \sum_{i=1}^{m-1} \sum_{j=1}^{n-1} \left\| \begin{bmatrix} U_{i+1,j} - U_{i,j} \\ U_{i,j+1} - U_{i,j} \end{bmatrix} \right\|_2$.

Note that the norm of the discretized gradient is not squared.

(a) Explain how to find a subgradient $G \in \partial\,\mathrm{tv}(U)$. It is sufficient to give a formula for $G_{ij}$.

(b) Implement a projected subgradient method for minimizing $\mathrm{tv}(U)$ subject to $U_{ij} = U^{\mathrm{orig}}_{ij}$ for $(i,j) \in \mathcal{K}$. Use it to solve the problem instance given in subgrad_tv_inpaint_data.m. You will also need tv_l2_subgrad.m, lena512.bmp, and lena512_corrupted.bmp. Show the original image, the corrupted image, and the in-painted image. Plot $\mathrm{tv}(U^{(k)})$ ($U^{(k)}$ is $U$ in the $k$th iteration) versus $k$.

The file subgrad_tv_inpaint_data.m defines m, n, and matrices Uorig, Ucorrupt, and Known. The matrix Ucorrupt is Uorig with the unknown pixels whited out. The matrix Known is $m \times n$, with $(i,j)$ entry one if $(i,j) \in \mathcal{K}$ and zero otherwise. The file also includes code to display Uorig and Ucorrupt as images.

Writing matlab code that operates quickly on large image matrices is tricky, so we have provided a function tv_l2_subgrad.m that computes $\mathrm{tv}(U)$ and $G \in \partial\,\mathrm{tv}(U)$ given $U$. tv_l2_subgrad.m uses the norms function from CVX, so you will need CVX installed. A simple (and fast) way to set the known entries of a matrix U to their known values is U(Known == 1) = Uorig(Known == 1).

You may need to try several step length sequences to get fast enough convergence. We obtained good results with step sizes like $\alpha_k = 1000/k$ and $\alpha_k = 50/\sqrt{k}$, but feel free to experiment with others. Do not hesitate to run the algorithm for 1000 or more iterations.

Once it's working, you might like to create an animated GIF that shows algorithm progress, say, displaying U every 50 iterations. We used the function imwrite(U_record, 'inpaint.gif', 'DelayTime', 1, 'LoopCount', inf). Here U_record is an $m \times n \times 1 \times r$ matrix, where U_record(:, :, 1, i) is the $i$th intermediate value of U out of the $r$ stored in U_record. imwrite will project invalid intensity values into the range [0, 255] (with a warning).
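
For orientation, here is a hedged sketch of the projected subgradient loop in MATLAB. It assumes tv_l2_subgrad.m returns the pair [tvval, G] (check its actual signature), that subgrad_tv_inpaint_data.m has already been run, and that the step size is just one of the suggestions above.

    % Hedged projected-subgradient sketch; run subgrad_tv_inpaint_data.m first.
    % Assumes tv_l2_subgrad returns [tv value, subgradient]; verify its signature.
    U = Ucorrupt;
    U(Known == 1) = Uorig(Known == 1);        % start from a feasible point
    niter = 1000;
    tv_hist = zeros(niter, 1);
    for k = 1:niter
        [tv_hist(k), G] = tv_l2_subgrad(U);   % objective value and a subgradient
        U = U - (50/sqrt(k))*G;               % step with alpha_k = 50/sqrt(k)
        U(Known == 1) = Uorig(Known == 1);    % project back onto known pixels
    end
    figure; plot(tv_hist); xlabel('k'); ylabel('tv(U^{(k)})');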