
10-725: Optimization                                                     Fall 2012
Lecture 20: November 1st
Lecturer: Geoff Gordon                       Scribes: Xiaolong Shen, Alex Beutel

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

20.1 Administrivia

- HW3 back at end of class
- Last day for the feedback survey
- All lectures are now up on YouTube
- Reminder: midterm next Tuesday 11/6! In-class test, 1 hr 20 min, one sheet of notes allowed.

20.2 Introduction and Definitions

This lecture is about quadratic programs (QPs) and cone programs (CPs). Starting with definitions, we will move on to applications of these optimization models. A great deal of research in machine learning and statistics amounts to converting interesting problems into quadratic programs or cone programs: familiar examples are the group lasso, which is a second-order cone program, and the SVM, which is a quadratic program.

20.3 Quadratic Programs

A QP has m constraints and n variables:

    A ∈ R^{m×n}    constraint matrix
    b ∈ R^m        right-hand side of the constraints
    c ∈ R^n        objective vector
    x ∈ R^n        decision variables
    H ∈ R^{n×n}    quadratic part of the objective

    [min or max]  x^T H x / 2 + c^T x
    subject to    Ax ≤ b  or  Ax = b
                  (possibly also x ≥ 0)
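To make this standard form concrete, here is a minimal sketch in Python (assuming the cvxpy modeling package is available; the data H, c, A, b below are made up for illustration) that sets up and solves a small convex QP of exactly this shape.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    m, n = 4, 3

    # Made-up data; H = R^T R is PSD, so the minimization below is a convex QP.
    R = rng.standard_normal((n, n))
    H = R.T @ R                       # quadratic part of the objective
    c = rng.standard_normal(n)        # objective vector
    A = rng.standard_normal((m, n))   # constraint matrix
    b = np.ones(m)                    # right-hand side (keeps x = 0 feasible)

    x = cp.Variable(n)                # decision variables
    prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, H) + c @ x),
                      [A @ x <= b, x >= 0])
    prob.solve()
    print("optimal value:", prob.value)
    print("optimal x:", x.value)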

A typical example (note that it maximizes a convex function, so this particular QP is not a convex problem):

    max  2x + x^2 + y^2
    s.t. x + y ≤ 4
         2x + 5y ≤ 12
         x + 2y ≤ 5
         x, y ≥ 0

The optimization problem is a convex problem if:

    min:  H ⪰ 0
    max:  H ⪯ 0

20.4 Cone Programs

A cone program has m constraints and n variables:

    A ∈ R^{m×n}    constraint matrix
    b ∈ R^m        right-hand side of the constraints
    c ∈ R^n        objective vector
    x ∈ R^n        decision variables
    cones K ⊆ R^m and L ⊆ R^n

    [min or max]  c^T x
    subject to    Ax + b ∈ K,  x ∈ L

This is a convex problem if the cones K and L are both convex.

Typical example of K:  K = {0}^p × R^q_+, where the first p elements represent equality constraints and the last q elements represent inequality constraints.

Typical example of L:  L = {0}^p × R^q_+ × R^r (coordinates fixed at zero, nonnegative coordinates, and free coordinates).

With these cones we recover ordinary LPs; with more general cones we obtain more general optimization problems.

20.4.1 SOCP

The mathematical problem is

    min  c^T x
    s.t. A_i x + b_i ∈ K_i,   i = 1, 2, ...

(maximization is also possible), where

    A_i ∈ R^{m_i × n},   x ∈ R^n,   b_i ∈ R^{m_i}.
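As a small illustration of the cone viewpoint of an ordinary LP described above, the following numpy-only sketch (the helper name in_K and the tolerance are my own choices, not from the lecture) checks whether a vector Ax + b lies in K = {0}^p × R^q_+, i.e. whether x satisfies the first p rows as equalities and the remaining q rows as inequalities.

    import numpy as np

    def in_K(z, p, q, tol=1e-8):
        """Membership test for K = {0}^p x R_+^q: the first p coordinates must be
        (numerically) zero and the last q coordinates nonnegative."""
        assert z.shape == (p + q,)
        return bool(np.all(np.abs(z[:p]) <= tol) and np.all(z[p:] >= -tol))

    # Example: one equality constraint x1 + x2 = 1 and two inequalities x >= 0,
    # written as Ax + b in K with p = 1 equality row and q = 2 inequality rows.
    A = np.array([[1.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
    b = np.array([-1.0, 0.0, 0.0])

    print(in_K(A @ np.array([0.3, 0.7]) + b, p=1, q=2))   # True: feasible
    print(in_K(A @ np.array([0.5, 0.8]) + b, p=1, q=2))   # False: equality row violated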

Each K_i should be some combination of equality constraints ({0}), inequality constraints (R_+), and second-order cones (SOC):

    SOC = {(y, t) : ||y||_2 ≤ t},   y ∈ R^{m_i},  t ∈ R,   so SOC ⊆ R^{m_i + 1}.

Since the objective function is linear, an optimal point can always be found on the boundary of the feasible region.

20.4.2 Relationship between QPs and SOCPs

20.4.2.1 QPs are reducible to SOCPs

The QP is

    min_x  x^T H x / 2 + c^T x
    s.t.   (some constraints).

Change the minimization problem to

    min_{x,t}  t + c^T x
    s.t.       t ≥ x^T H x / 2,  (some constraints).

The goal is to implement t ≥ x^T H x / 2 using a second-order cone constraint. Start by factoring

    H = R^T R     (possible since H ⪰ 0),

so that

    x^T H x = (x^T R^T)(Rx) = ||Rx||_2^2.

Now constrain (Rx, t, t + 1) ∈ SOC, which means

    t + 1 ≥ sqrt(||Rx||_2^2 + t^2).

Squaring both sides,

    t^2 + 2t + 1 ≥ ||Rx||_2^2 + t^2
    t ≥ ||Rx||_2^2 / 2 − 1/2.

The constant 1/2 is essentially irrelevant: it only shifts t, and using the constraint (Rx, t − 1/2, t + 1/2) ∈ SOC instead gives exactly t ≥ ||Rx||_2^2 / 2. Since ||Rx||_2^2 = x^T H x, we have implemented

    t ≥ x^T H x / 2.
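A quick sanity check of this reduction (a sketch assuming cvxpy; the random data are made up): solve a small convex QP directly and again in the SOC epigraph form, using the exact shifted constraint (Rx, t − 1/2, t + 1/2) ∈ SOC mentioned above, and compare the optimal values.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(1)
    n, m = 3, 5
    R = rng.standard_normal((n, n))
    H = R.T @ R                          # H = R^T R, so x^T H x = ||Rx||_2^2
    c = rng.standard_normal(n)
    A = rng.standard_normal((m, n))
    b = np.abs(rng.standard_normal(m)) + 0.5   # keep x = 0 feasible

    # (1) The QP, solved directly.
    x = cp.Variable(n)
    qp = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, H) + c @ x), [A @ x <= b])
    qp.solve()

    # (2) The SOCP reformulation: minimize t + c^T x with
    #     (Rx, t - 1/2, t + 1/2) in SOC, i.e. t >= x^T H x / 2.
    x2 = cp.Variable(n)
    t = cp.Variable()
    soc = cp.SOC(t + 0.5, cp.hstack([R @ x2, t - 0.5]))
    socp = cp.Problem(cp.Minimize(t + c @ x2), [soc, A @ x2 <= b])
    socp.solve()

    print(qp.value, socp.value)          # the two optimal values should agree

The two printed values should match to solver tolerance, illustrating that the QP and its SOC epigraph form are the same problem.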

20.4.2.2 Are all SOCPs QPs?

The answer is no. A convex QCQP (quadratic objective and quadratic constraints) gives a counterexample. Consider the optimization problem

    minimize  a^2 + b^2
    s.t.      a ≥ x^2,  b ≥ y^2
              2x + y = 4.

This is not a QP, since it has quadratic constraints. In fact, using the nonlinear constraints (which are tight at the optimum) we can reformulate the problem as

    minimize  x^4 + y^4
    s.t.      2x + y = 4.

This is an l_4-norm minimization problem, which is not something a QP can express. To turn the original problem into an SOCP, we solve min p + q where p ≥ a^2 and q ≥ b^2. Constrain (a, p, p + 1/2) ∈ SOC, which means

    p + 1/2 ≥ sqrt(a^2 + p^2).

Squaring both sides,

    p^2 + p + 1/4 ≥ a^2 + p^2
    p ≥ a^2 − 1/4.

As before, the constant 1/4 is essentially irrelevant, so we can implement p ≥ a^2; similarly we get q ≥ b^2. We can likewise use the intersection of a cone and a plane to impose the constraints a ≥ x^2 and b ≥ y^2 in the same way.

20.4.3 More Cone Programs: Semidefinite Programs

First we make some definitions about semidefinite programs.

Semidefinite constraint:

    variable x ∈ R^n
    constant symmetric matrices A_0, A_1, A_2, ... ∈ R^{m×m}
    constraint:  A_0 + Σ_i x_i A_i ⪰ 0,   i.e.  A_0 + Σ_i x_i A_i ∈ S^m_+

Semidefinite program: min c^T x subject to
- semidefinite constraints
- linear equalities
- linear inequalities
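To make the semidefinite-constraint notation concrete, here is a tiny numpy sketch (the matrices and the helper name lmi_holds are invented for illustration) that tests whether a given x satisfies A_0 + Σ_i x_i A_i ⪰ 0 by inspecting the smallest eigenvalue.

    import numpy as np

    def lmi_holds(x, A0, As, tol=1e-9):
        """Return True if A0 + sum_i x_i A_i is positive semidefinite."""
        S = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
        return bool(np.linalg.eigvalsh(S).min() >= -tol)

    # Small symmetric data.
    A0 = 2.0 * np.eye(2)
    A1 = np.array([[1.0, 0.0], [0.0, -1.0]])
    A2 = np.array([[0.0, 1.0], [1.0, 0.0]])

    print(lmi_holds([0.5, 0.5], A0, [A1, A2]))   # True: eigenvalues stay nonnegative
    print(lmi_holds([3.0, 0.0], A0, [A1, A2]))   # False: the (2,2) entry is 2 - 3 = -1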

The positive semidefinite cone S_+ is self-dual.

Proof: We suppress the m in S^m_+ and write S_+:

    S_+ = {A : A = A^T,  x^T A x ≥ 0 for all x}.

To show that the dual cone is the same as the primal cone, we show

    [x^T A x ≥ 0 for all x]   ⟺   [tr(B^T A) ≥ 0 for all PSD B],

i.e., having ⟨A, B⟩ ≥ 0 for every element B of the PSD cone (membership in the dual cone) is the same as membership in the primal cone.

First, we show [x^T A x ≥ 0 for all x] ⟹ [tr(B^T A) ≥ 0 for all PSD B]. Any PSD B can be written as

    B = Σ_i x_i x_i^T

(several matrix factorizations, e.g. the eigendecomposition or singular value decomposition, give such a representation). Then

    tr(B^T A) = Σ_i tr((x_i x_i^T)^T A) = Σ_i tr(x_i^T A x_i) = Σ_i x_i^T A x_i ≥ 0.

Next, we show [tr(B^T A) ≥ 0 for all PSD B] ⟹ [x^T A x ≥ 0 for all x]. For any x, take B = x x^T; then

    tr(B^T A) = x^T A x ≥ 0,

which shows that A is positive semidefinite.
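A small numerical illustration of this self-duality argument (numpy only; random instances, so a sanity check rather than a proof): tr(B^T A) is nonnegative whenever A and B are both PSD, while a symmetric A that is not PSD is exposed by the rank-one PSD matrix B = x x^T built from an eigenvector with a negative eigenvalue.

    import numpy as np

    rng = np.random.default_rng(0)

    def random_psd(n):
        G = rng.standard_normal((n, n))
        return G @ G.T                   # G G^T is always PSD

    n = 4

    # (1) tr(B^T A) >= 0 whenever A and B are both PSD.
    for _ in range(1000):
        A, B = random_psd(n), random_psd(n)
        assert np.trace(B.T @ A) >= -1e-9

    # (2) A symmetric matrix that is not PSD is "detected" by some PSD B:
    #     take B = x x^T for an eigenvector x with a negative eigenvalue.
    A = random_psd(n)
    A -= (np.linalg.eigvalsh(A).min() + 1.0) * np.eye(n)   # force min eigenvalue = -1
    w, V = np.linalg.eigh(A)
    x = V[:, 0]                          # eigenvector of the smallest eigenvalue
    B = np.outer(x, x)                   # rank-one PSD matrix
    print(w[0], np.trace(B.T @ A))       # both are -1: tr(B^T A) = x^T A x < 0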

20.5 Solving QPs and CPs

Solving a convex QP or CP is not much harder than solving an LP, as long as we have an efficient representation of the cone: with bit length L and accuracy ε, we can solve the QP or CP in time poly(L, 1/ε). Whether these programs can be solved in time polynomial in the bit length alone, poly(L), is another matter: people suspect it is true, but it remains an open question (you could earn a PhD by settling it!).

For general (non-convex) QPs or CPs the story is different: they are NP-hard, since for example max-cut reduces to a non-convex QP. This shows once again that convexity is the crucial property in our problems.

20.6 Examples of QPs

- Euclidean projection
- Lasso
- Mahalanobis projection
- Huber regression
- Support vector machines

20.6.1 Robust regression: the Huber loss

Given points (x_i, y_i) with x_i ∈ R^n and y_i ∈ R, ordinary L_2 regression solves

    min_w  Σ_i (y_i − x_i^T w)^2.

To avoid overfitting to outliers, we use the Huber loss rather than the squared loss, so the optimization problem becomes

    min_w  Σ_i Hu(y_i − x_i^T w),

where

    Hu(z) = { 2z − 1    if z ≥ 1
            { −2z − 1   if z ≤ −1                                          (20.1)
            { z^2       if −1 < z < 1.

[Figure: comparison of the two loss functions on z ∈ [−4, 4]; the red curve is the Huber loss and the blue curve is the squared loss.]

We would now like to show that Huber regression is in fact a quadratic program. We claim that the loss can be computed as the following QP, and then show that it works out to the Huber loss function:

    Hu(z_i) = min_{a_i, b_i}  (z_i + a_i − b_i)^2 + 2 a_i + 2 b_i          (20.2)
              s.t.  a_i, b_i ≥ 0.                                          (20.3)
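Before going through the case analysis below, the claim in (20.2)-(20.3) can be sanity-checked numerically. A minimal sketch, assuming cvxpy (the particular z values are arbitrary): solve the small QP for several z and compare with the closed-form Huber loss (20.1).

    import cvxpy as cp

    def huber_closed_form(z):
        # Eq. (20.1): quadratic near zero, linear in the tails.
        return z**2 if abs(z) < 1 else 2*abs(z) - 1

    def huber_via_qp(z):
        # Eqs. (20.2)-(20.3): minimize over a, b >= 0.
        a, b = cp.Variable(nonneg=True), cp.Variable(nonneg=True)
        prob = cp.Problem(cp.Minimize(cp.square(z + a - b) + 2*a + 2*b))
        prob.solve()
        return prob.value

    for z in [-2.7, -1.0, -0.4, 0.0, 0.8, 1.0, 3.2]:
        print(f"z={z:5.2f}  QP={huber_via_qp(z):.4f}  Hu(z)={huber_closed_form(z):.4f}")

The two columns should agree (up to solver tolerance) for every z.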

Taking the gradient with respect to a and b (dropping the subscript i), we get

    ∂/∂a [(z + a − b)^2 + 2a + 2b] = 2(z + a − b) + 2                      (20.4)
    ∂/∂b [(z + a − b)^2 + 2a + 2b] = −2(z + a − b) + 2                     (20.5)

Setting these to zero wherever the corresponding constraint is slack, there are four possibilities for the constraints:

1. Both constraints are tight: a = b = 0, which gives Hu(z) = z^2 (the optimal case when |z| ≤ 1).

2. If both constraints are slack, both derivatives must equal 0. Adding the two derivatives gives 4 = 0, which is impossible, so both constraints cannot be slack simultaneously.

3. a > 0 and b = 0 implies a = −z − 1 and therefore Hu(z) = −2z − 1. This requires a = −z − 1 ≥ 0, so it only holds when z ≤ −1.

4. The symmetric case a = 0 and b > 0 gives the other branch, Hu(z) = 2z − 1 for z ≥ 1.

20.6.2 Cone program examples

Sparse group lasso: we can show that the (sparse) group lasso is an SOCP. Given groups g_i ⊆ {1, ..., n}, the group lasso is

    min_w  ||y − Xw||_2^2 + λ Σ_i ||w_{g_i}||_2                            (20.6)

where w_{g_i} is the subvector of w corresponding to the indices in g_i. Similarly, the sparse group lasso is

    min_w  ||y − Xw||_2^2 + λ Σ_i ||w_{g_i}||_2 + µ ||w||_1.               (20.7)

We can formulate the group lasso as the following SOCP:

    min  t + λ Σ_i t_i                                                     (20.8)
    s.t. t ≥ ||y − Xw||_2^2                                                (20.9)
         t_i ≥ ||w_{g_i}||_2                                               (20.10)

where (20.9) is a squared-norm epigraph constraint handled as in Section 20.4.2 and (20.10) is an ordinary second-order cone constraint.

Markov random fields: inference in a discrete Markov random field can be relaxed to an SOCP [Kumar, Kolmogorov, Torr, JMLR 2008].

Minimum volume covering ellipsoid: we represent an ellipsoid as {x : ||Ax + b|| ≤ 1}. Containing a point x_i is the second-order cone constraint (A x_i + b, 1) ∈ SOC. The volume of the ellipsoid is proportional to det A^{−1}, so we can set up the optimization problem

    min  ln det A^{−1}.                                                    (20.11)

This is a convex function of A, so we get a cone program with second-order cone constraints but a nonlinear objective.
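A compact sketch of the covering-ellipsoid program, assuming cvxpy (the sample points are random, and the variable names A, b follow the formulation above): maximize log det A subject to ||A x_i + b||_2 ≤ 1 for each point, which is the same as minimizing ln det A^{-1} in (20.11).

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(2)
    pts = rng.standard_normal((20, 2))        # 20 points in R^2 to be covered

    A = cp.Variable((2, 2), PSD=True)         # ellipsoid {x : ||Ax + b|| <= 1}
    b = cp.Variable(2)
    constraints = [cp.norm(A @ p + b, 2) <= 1 for p in pts]

    # Volume is proportional to det(A^{-1}), so minimizing ln det A^{-1}
    # is the same as maximizing log det A.
    prob = cp.Problem(cp.Maximize(cp.log_det(A)), constraints)
    prob.solve()
    print("log det A =", prob.value)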

SDP: we can also frame a number of problems as SDPs:

- Graphical lasso (nonlinear objective):

      min_X  tr(S X) − log det X + λ Σ_{i≠j} |x_{ij}|                      (20.12)

  where S is the empirical covariance, and we require X ⪰ 0 and X symmetric.

- Markowitz portfolio optimization
- Max-cut relaxation [Goemans, Williamson]
- Matrix completion
- Manifold learning via maximum variance unfolding

Matrix completion: we have a matrix A of which we observe some elements A_{ij} for (i, j) ∈ E. Write

    O_{ij} = 1 if (i, j) ∈ E,  0 otherwise.                                (20.13)

Our objective is normally

    min_X  ||(X − A) ∘ O||_F^2 + λ ||X||_*                                 (20.14)

(as a recap, the nuclear norm term is there to encourage a low-rank X). We would like to turn this into a semidefinite program by showing that λ||X||_* is the same as the minimum of λ (tr(P) + tr(Q)) / 2 subject to

    M = [ P     X ]  ⪰ 0.                                                  (20.15)
        [ X^T   Q ]

To show this, decompose X = U Σ V^T. We know that M ⪰ 0 is equivalent to tr(B^T M) ≥ 0 for all B ⪰ 0 (because the PSD cone is self-dual). So if we take

    B = [  U U^T   −U V^T ]  ⪰ 0                                           (20.16)
        [ −V U^T    V V^T ]

(B is PSD since B = [U; −V][U; −V]^T), then for any feasible P, Q we find

    0 ≤ tr(B^T M) = tr(U U^T P) + tr(V V^T Q) − 2 tr(U^T X V)              (20.17)

and hence, using tr(U U^T P) ≤ tr(P) and tr(V V^T Q) ≤ tr(Q) (valid because P, Q ⪰ 0 while U U^T, V V^T ⪯ I),

    tr(P) + tr(Q) ≥ 2 tr(U^T X V) = 2 tr(Σ) = 2 ||X||_*.                   (20.18)

Conversely, if we set P = U Σ U^T and Q = V Σ V^T, then tr(P) = tr(U^T U Σ) = tr(Σ), and similarly for Q, so tr(P) + tr(Q) = 2||X||_*. Finally we must show that this choice of P and Q makes M PSD. We now have

    M = [ U Σ U^T   U Σ V^T ]                                              (20.19)
        [ V Σ U^T   V Σ V^T ]

which factors through the block-diagonal matrix diag(U, V):

    M = [ U  0 ] [ Σ  Σ ] [ U  0 ]^T  ⪰ 0,                                 (20.20)
        [ 0  V ] [ Σ  Σ ] [ 0  V ]

since [Σ Σ; Σ Σ] = [I; I] Σ [I  I] ⪰ 0.

From this we see that by setting M up as above, replacing λ||X||_* by λ(tr(P) + tr(Q))/2, and requiring M ⪰ 0, we obtain an equivalent SDP with a quadratic objective that solves the original matrix completion problem.
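The key step of the proof, choosing P = U Σ U^T and Q = V Σ V^T, can be checked numerically. A short numpy sketch on a random X (not part of the original notes): verify that M in (20.15) is PSD and that (tr(P) + tr(Q))/2 equals the nuclear norm ||X||_*.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((5, 3))

    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^T
    P = U @ np.diag(s) @ U.T
    Q = Vt.T @ np.diag(s) @ Vt

    M = np.block([[P, X],
                  [X.T, Q]])

    print("min eigenvalue of M :", np.linalg.eigvalsh(M).min())       # ~0, so M is PSD
    print("(tr P + tr Q) / 2   :", (np.trace(P) + np.trace(Q)) / 2)
    print("nuclear norm of X   :", s.sum())                           # matches the line above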