Homework 5: ADMM, Primal-Dual Interior Point, Dual Theory, Dual Ascent
CMU 10-725/36-725: Convex Optimization (Fall 2017)
OUT: Nov 4
DUE: Nov 18, 11:59 PM

START HERE: Instructions

Collaboration policy: Collaboration on solving the homework is allowed, after you have thought about the problems on your own. It is also OK to get clarification (but not solutions) from books or online resources, again after you have thought about the problems on your own. There are two requirements: first, cite your collaborators fully and completely (e.g., "Jane explained to me what is asked in Question 3.4"). Second, write your solution independently: close the book and all of your notes, and send collaborators out of the room, so that the solution comes from you only.

Submitting your work: Assignments should be submitted as PDFs using Gradescope unless explicitly stated otherwise. Each derivation/proof should be completed on a separate page. Submissions may be handwritten, provided they are labeled and clearly legible; otherwise, they should be written in LaTeX. Upon submission, label each question using the template provided by Gradescope. Please refer to Piazza for detailed instructions on joining Gradescope and submitting your homework.

Programming: All programming portions of the assignments should be submitted to Gradescope as well. We will not be using it for autograding, so you may use any language you like.

1 Primal-Dual Interior Point Methods (20pts) [Devendra]

A network is described as a directed graph with m arcs or links. The network supports n flows, with nonnegative rates x_1, ..., x_n. Each flow moves along a fixed, or pre-determined, path or route in the network, from a source node to a destination node. Each link can support multiple flows, and the total traffic on a link is the sum of the rates of the flows that travel over it. The total traffic on link i can be expressed as (Ax)_i, where A ∈ R^{m×n} is the flow-link incidence matrix defined as

A_{ij} = { 1   if flow j passes through link i,
         { 0   otherwise.

Usually each path passes through only a small fraction of the total number of links, so the matrix A is sparse. Each link has a positive capacity, which is the maximum total traffic it can handle. These link capacity constraints can be expressed as Ax ≤ b, where b_i is the capacity of link i. We consider the network rate optimization problem (in which the matrix A is given, so that we are not optimizing over where each flow passes through)

maximize    f_1(x_1) + ... + f_n(x_n)
subject to  Ax ≤ b,
            x ≥ 0,

where

f_k(x_k) = { x_k               if x_k < c_k,
           { (x_k + c_k)/2     if x_k ≥ c_k,

and c_k > 0 is given. In this problem we choose feasible flow rates x_k that maximize the utility function Σ_k f_k(x_k).

(a) [6pt] Express the network rate optimization problem as a linear program in inequality form.

(b) [14pt] Write a custom implementation of the primal-dual interior point method for the linear program in part (a). Give an efficient method for solving the linear equations that arise in each iteration of the algorithm. Justify your method briefly, assuming that m and n are large and that the matrix A^T A is sparse.
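As a point of reference for part (b), here is a minimal NumPy sketch (not the required custom implementation; all names are the author's own illustrative choices, and it makes no attempt at efficiency) of one damped Newton step of a primal-dual interior point method for a generic inequality-form LP, minimize c^T x subject to Gx ≤ h. It shows the linear system that arises at each iteration; part (b) asks you to argue how to solve such a system efficiently when the problem data are large and sparse.

import numpy as np

def pd_ip_step(c, G, h, x, lam, sigma=0.1):
    # One damped Newton step of a primal-dual interior point method for the
    # generic inequality-form LP:  minimize c^T x  subject to  G x <= h.
    # Assumes x is strictly feasible (h - G x > 0), lam > 0, and that G has
    # full column rank so the block system below is nonsingular.
    m, n = G.shape
    s = h - G @ x                      # slack variables
    mu = lam @ s / m                   # duality measure
    r_dual = c + G.T @ lam             # stationarity residual
    r_cent = lam * s - sigma * mu      # relaxed complementarity residual

    # The linear system that arises at each iteration, in (dx, dlam).
    # It is formed and solved densely here; an efficient method would
    # exploit the sparsity structure instead.
    KKT = np.block([[np.zeros((n, n)),   G.T],
                    [-G * lam[:, None],  np.diag(s)]])
    step = np.linalg.solve(KKT, -np.concatenate([r_dual, r_cent]))
    dx, dlam = step[:n], step[n:]

    # Fraction-to-the-boundary step size keeping s > 0 and lam > 0.
    ds = -G @ dx
    alpha = 1.0
    for v, dv in ((s, ds), (lam, dlam)):
        if (dv < 0).any():
            alpha = min(alpha, 0.99 * np.min(-v[dv < 0] / dv[dv < 0]))
    return x + alpha * dx, lam + alpha * dlam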

2 Duality for Support Vector Regression (Yichong; 25 pts)

In the last homework, we used barrier methods to solve support vector regression. This time we look at SVR from a different point of view. Recall that for a set of data points (X_1, Y_1), ..., (X_n, Y_n) with X_i ∈ R^d and Y_i ∈ R for all i, an ε-SVR solves the convex optimization problem

min_{w ∈ R^{d+1}, ξ ∈ R^n}  f(w, ξ) = (1/2) ||w_{1:d}||_2^2 + C Σ_{i=1}^n ξ_i
s.t.  Y_i - w^T X_i ≤ ε + ξ_i,
      w^T X_i - Y_i ≤ ε + ξ_i,
      ξ_i ≥ 0,   i = 1, 2, ..., n.

For simplicity, we append a 1 to the end of each X_i so that we don't need to consider a bias term. Here ε and C are parameters of the SVR, and w_{1:d} denotes the first d dimensions of w.

(a) [5pt] Derive the KKT conditions for SVR. Use u_i, i = 1, 2, ..., n for the first set of constraints, and u_i^*, v_i for the second and third ones.

(b) [5pt] Derive the dual program for SVR. Simplify your program so that v_i does not appear in it.

(c) [5pt] Describe how to recover the primal solution w from the dual solution u_i, u_i^*. Remember also to show how to calculate the intercept (the (d+1)-th dimension of w). Hint: it is possible that you cannot obtain a unique solution for the intercept.

(d) [10pt] Without proof, determine the location of (X_i, Y_i) with respect to the margin {(x, y) : |y - w^T x| ≤ ε}, where w is the optimal solution, in each of the following cases:
i) u_i = u_i^* = 0;
ii) u_i ∈ (0, C);
iii) u_i^* ∈ (0, C);
iv) u_i = C;
v) u_i^* = C.

3 Dual Ascent [Hongyang; 25 pts]

The key to the success of the dual ascent method is so-called strong duality, namely, that the primal problem and the dual problem have exactly the same optimal objective value. Let us see a simple example:

min_x  x^2 + 1
s.t.   (x - 2)(x - 4) ≤ 0,

with variable x ∈ R.

(a) [5pt] Analysis of the primal problem. Give the feasible set, the optimal value, and the optimal solution.

(b) [15pt] Lagrangian and dual function.
i) Plot the objective x^2 + 1 versus x. On the same plot, show the feasible set, the optimal point and value, and plot the Lagrangian L(x, λ) versus x for a few positive values of λ.
ii) Verify the lower bound property (p^* ≥ inf_x L(x, λ) for λ ≥ 0).
iii) Derive and sketch the Lagrange dual function g.

(c) [5pt] Lagrange dual problem. State the dual problem, and verify that it is a concave maximization problem. Find the dual optimal value and the dual optimal solution λ^*. Does strong duality hold?
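For the plot in part (b)(i), a minimal sketch along the following lines may be helpful (NumPy and matplotlib assumed; the λ values shown are arbitrary, and the Lagrangian here is just the objective plus λ times the constraint function):

import numpy as np
import matplotlib.pyplot as plt

# Sketch for part (b)(i): the objective, the feasible set, and the Lagrangian
# L(x, lambda) = x^2 + 1 + lambda * (x - 2) * (x - 4) for a few positive lambda.
x = np.linspace(-1.0, 6.0, 400)
plt.plot(x, x**2 + 1, 'k', label='f(x) = x^2 + 1')
plt.axvspan(2.0, 4.0, alpha=0.2, label='feasible set')
for lam in (0.5, 1.0, 2.0, 3.0):           # arbitrary positive values
    plt.plot(x, x**2 + 1 + lam * (x - 2) * (x - 4), '--', label=f'L(x, {lam})')
# mark the optimal point and value from your part (a) answer here
plt.xlabel('x')
plt.legend()
plt.show()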

4 Quantile regression and ADMM [Yifeng; 30 pts]

Quantile regression is a way to estimate the conditional α-quantile, for some α ∈ [0, 1] fixed by the implementer, of a response variable Y given predictors X_1, ..., X_p. (Just to remind you: the α-quantile of Y | X_1, ..., X_p is given by inf_t {t : Prob(Y ≤ t | X_1, ..., X_p) ≥ α}.) It turns out that we can estimate the α-quantile by solving the following (convex) optimization problem:

min_{β ∈ R^p}  Σ_{i=1}^n ℓ_i^{(α)}(y - Xβ),    (1)

where y ∈ R^n is a response vector, X ∈ R^{n×p} is a data matrix, and the map ℓ^{(α)} : R^n → R^n is the quantile loss, defined as

ℓ_i^{(α)}(a) = max{α a_i, (α - 1) a_i},   i = 1, ..., n,

for some a ∈ R^n. (It is not too hard to show why this is the case, but you can just assume it is true for this question.) In Figure 1, we plot some estimated α-quantiles, for a few values of α, obtained by solving problem (1) on some synthetic data.

Figure 1: Various quantiles estimated from synthetic data.

We often want to estimate multiple quantiles simultaneously, but would like them to not intersect: e.g., in Figure 1, we would like to avoid the situation where the 0.1-quantile line (maybe eventually) lies above the 0.9-quantile line, since this doesn't make any sense. To do so, we can instead solve the following (again convex) optimization problem, referred to as the multiple quantile regression problem with non-crossing constraints:

min_{B ∈ R^{p×r}}  Σ_{i=1}^n Σ_{j=1}^r ℓ_{ij}^{(A)}(y 1^T - XB)
s.t.  XBD^T ≥ 0.    (2)

Here, r is the number of α's (i.e., quantile levels); A = {α_1, ..., α_r} is a set of user-defined quantile levels; y, X are as before; 1 is the r-dimensional all-ones vector; and B ∈ R^{p×r} is now a parameter matrix, structured as follows:

B = [ β_1  ⋯  β_r ],   for β_i ∈ R^p, i = 1, ..., r.

Here, we have also extended the definition of the quantile loss: it is now a map ℓ^{(A)} : R^{n×r} → R^{n×r}, defined as

ℓ_{ij}^{(A)}(A) = max{α_j A_{ij}, (α_j - 1) A_{ij}},   i = 1, ..., n,  j = 1, ..., r.

Lastly, D ∈ R^{(r-1)×r} is the first-order finite differencing operator, given by

D = [ -1   1   0   ⋯    0   0 ]
    [  0  -1   1   ⋯    0   0 ]
    [  ⋮             ⋱       ⋮ ]
    [  0   0   0   ⋯   -1   1 ].

Hence, you can convince yourself that the constraints in (2) prevent the estimated quantiles from crossing. Now, finally, for the problem.

(a, 5pts) Derive the prox operator for the quantile loss ℓ_{ij}^{(A)} (defined above). In other words, write down an expression for prox_{ℓ^{(A)}, λ}(A), where λ > 0 is a user-defined constant and A ∈ R^{m×n} is the point at which the prox operator is evaluated.
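Since the loss is separable across entries, its prox can be computed componentwise, so a candidate closed-form answer can be sanity-checked against a brute-force scalar minimization. A small sketch follows (SciPy assumed; the scaling convention used below is one common choice and may differ from the convention intended above, so adjust it to match your derivation):

from scipy.optimize import minimize_scalar

def check_scalar_prox(candidate_prox, alpha=0.3, lam=0.7, a=0.2):
    # Scalar check (quantile) loss: l(z) = max(alpha*z, (alpha - 1)*z).
    loss = lambda z: max(alpha * z, (alpha - 1) * z)
    # One common prox convention: prox_{lam*l}(a) = argmin_z lam*l(z) + 0.5*(z - a)^2.
    # Adjust if your part (a) derivation uses a different scaling.
    obj = lambda z: lam * loss(z) + 0.5 * (z - a) ** 2
    z_star = minimize_scalar(obj, bounds=(-10.0, 10.0), method='bounded').x
    return abs(z_star - candidate_prox(a, alpha, lam)) < 1e-4

For instance, check_scalar_prox(your_formula) should return True over a range of (a, alpha, lam) values if your_formula implements the closed form correctly.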

(b, 10pts) We can put problem (2) into ADMM form as follows:

min_{B ∈ R^{p×r}, Z_1, Z_2 ∈ R^{n×r}, Z_3 ∈ R^{n×(r-1)}}  Σ_{i=1}^n Σ_{j=1}^r ℓ_{ij}^{(A)}(Z_1) + I_+(Z_3)
s.t.  Z_1 = y1^T - XB,  Z_2 = XB,  Z_3 = Z_2 D^T,    (3)

where we introduced the variables Z_1, Z_2, Z_3, and I_+(·) is the indicator function of the nonnegative orthant. The augmented Lagrangian associated with problem (3) is then (in scaled form)

L_ρ(B, Z_1, Z_2, Z_3, U_1, U_2, U_3) = Σ_{i=1}^n Σ_{j=1}^r ℓ_{ij}^{(A)}(Z_1) + I_+(Z_3)
    + (ρ/2) ( ||y1^T - XB - Z_1 + U_1||_F^2 + ||XB - Z_2 + U_2||_F^2 + ||Z_2 D^T - Z_3 + U_3||_F^2
              - ||U_1||_F^2 - ||U_2||_F^2 - ||U_3||_F^2 ),

where U_1, U_2, U_3 are dual variables. Here are the update steps of ADMM in our question: for k = 0, 1, 2, ...,

B^{(k+1)}   = argmin_B    L_ρ(B, Z_1^{(k)}, Z_2^{(k)}, Z_3^{(k)}, U_1^{(k)}, U_2^{(k)}, U_3^{(k)});
Z_1^{(k+1)} = argmin_{Z_1} L_ρ(B^{(k+1)}, Z_1, Z_2^{(k)}, Z_3^{(k)}, U_1^{(k)}, U_2^{(k)}, U_3^{(k)});
Z_2^{(k+1)} = argmin_{Z_2} L_ρ(B^{(k+1)}, Z_1^{(k+1)}, Z_2, Z_3^{(k)}, U_1^{(k)}, U_2^{(k)}, U_3^{(k)});
Z_3^{(k+1)} = argmin_{Z_3} L_ρ(B^{(k+1)}, Z_1^{(k+1)}, Z_2^{(k+1)}, Z_3, U_1^{(k)}, U_2^{(k)}, U_3^{(k)});
U_1^{(k+1)} = U_1^{(k)} + y1^T - XB^{(k+1)} - Z_1^{(k+1)};
U_2^{(k+1)} = U_2^{(k)} + XB^{(k+1)} - Z_2^{(k+1)};
U_3^{(k+1)} = U_3^{(k)} + Z_2^{(k+1)} D^T - Z_3^{(k+1)}.

Please provide the explicit update formulas for B^{(k+1)}, Z_1^{(k+1)}, Z_2^{(k+1)}, Z_3^{(k+1)}. You may assume that X^T X is invertible.

(c, 15pts) Implement your ADMM algorithm from part (b) for the quantile levels A = {0.1, 0.5, 0.9}, using the synthetic data in hw5q4.zip, which contains a predictor matrix X and a response vector y in CSV files. Note that the predictor matrix in this example data is 1000 × 2, with the second column being the all-ones column, hence providing an intercept term in the quantile regression. Set the initial values of all primal and dual variables to zero matrices of the appropriate size, use an augmented Lagrangian parameter of ρ = 1, and run ADMM for 50 steps. Report the criterion value Σ_{i=1}^n Σ_{j=1}^r ℓ_{ij}^{(A)}(Z_1) after 50 steps, and plot the criterion value across the iterations. Recreate a plot similar to Figure 1, plotting the data as points and overlaying as lines the conditional quantile predictions for the quantiles in A. (Hint: recall that B is a matrix whose r columns are the coefficient estimates for each quantile level α_1, ..., α_r, so that X_new^T B at a new data point X_new = (x_new, 1) gives the fitted estimates for each quantile in A.)
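For part (c), a minimal NumPy scaffold along these lines may help organize the implementation. The CSV file names below are placeholders (use whatever files hw5q4.zip actually contains), and the four primal updates are deliberately left as comments, since filling them in with explicit formulas is exactly part (b):

import numpy as np

X = np.loadtxt('X.csv', delimiter=',')          # placeholder file names
y = np.loadtxt('y.csv', delimiter=',')
alphas = np.array([0.1, 0.5, 0.9])
n, p = X.shape
r = len(alphas)
rho = 1.0
D = -np.eye(r - 1, r) + np.eye(r - 1, r, k=1)   # first-order differencing operator

def criterion(Z):
    # sum_{i,j} max(alpha_j * Z_ij, (alpha_j - 1) * Z_ij)
    return np.sum(np.maximum(alphas * Z, (alphas - 1) * Z))

# all primal and dual variables initialized to zero matrices of the right size
B = np.zeros((p, r))
Z1, Z2, Z3 = np.zeros((n, r)), np.zeros((n, r)), np.zeros((n, r - 1))
U1, U2, U3 = np.zeros((n, r)), np.zeros((n, r)), np.zeros((n, r - 1))
Y1 = np.outer(y, np.ones(r))                    # the matrix y 1^T

crit = []
for k in range(50):
    # B  = ...   explicit B-update from your part (b) answer
    # Z1 = ...   explicit Z1-update (the prox from part (a) is relevant here)
    # Z2 = ...   explicit Z2-update
    # Z3 = ...   explicit Z3-update
    U1 = U1 + Y1 - X @ B - Z1
    U2 = U2 + X @ B - Z2
    U3 = U3 + Z2 @ D.T - Z3
    crit.append(criterion(Z1))

After the loop, crit[-1] is the criterion value to report, and the columns of X @ B give the fitted quantile lines to overlay on a scatter plot of the data, as in Figure 1.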