
10-725/36-725: Convex Optimization                                   Fall 2016

Lecture 4: September 12

Lecturer: Ryan Tibshirani        Scribes: Jay Hennig, Yifeng Tao, Sriram Vasudevan

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

4.1 Previous Lecture

4.1.1 Eliminating Equality Constraints

If the problem is of the form

$$\min_x \; f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \dots, m, \quad Ax = b$$

then $x$ can be expressed as $My + x_0$ (where $Ax_0 = b$ and $\mathrm{col}(M) = \mathrm{null}(A)$). Doing so allows us to rewrite the above problem as (a code sketch of this construction appears below, after Section 4.1.3):

$$\min_y \; f(My + x_0) \quad \text{s.t.} \quad g_i(My + x_0) \le 0, \; i = 1, \dots, m$$

4.1.2 Introducing Slack Variables

Introducing slack variables is the opposite of eliminating equality constraints. The first formulation in the previous section can thus be written as:

$$\min_{x,s} \; f(x) \quad \text{s.t.} \quad s_i \ge 0, \; g_i(x) + s_i = 0, \; i = 1, \dots, m, \quad Ax = b$$

This problem, however, is not convex unless the $g_i$ are all affine.

4.1.3 Relaxing Nonaffine Equality Constraints

Given an optimization problem $\min_x f(x)$ such that $x \in C$, we can consider an enlarged set $\tilde{C} \supseteq C$ and solve $\min_x f(x)$ such that $x \in \tilde{C}$ instead. This is known as relaxation, and its optimal value is always less than or equal to that of the original problem.

An important special case is replacing convex nonaffine equality constraints $h_j(x) = 0$, $j = 1, \dots, r$, with the inequalities $h_j(x) \le 0$, $j = 1, \dots, r$.
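Returning to Section 4.1.1, the change of variables $x = My + x_0$ is easy to construct numerically. Below is a minimal sketch in Python, assuming numpy and scipy are available; the function name and data are illustrative, not from the lecture.

```python
import numpy as np
from scipy.linalg import null_space

def eliminate_equality(A, b):
    """Return (x0, M) with A @ x0 = b and col(M) = null(A),
    so that x = M @ y + x0 parametrizes all solutions of Ax = b."""
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)  # a particular solution
    M = null_space(A)                           # orthonormal basis of null(A)
    return x0, M

# Quick check on a random feasible system (illustrative data)
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = A @ rng.standard_normal(5)           # guarantees Ax = b is feasible
x0, M = eliminate_equality(A, b)
y = rng.standard_normal(M.shape[1])
assert np.allclose(A @ (M @ y + x0), b)  # every y yields a feasible x
```

After this substitution, a solver can work in the lower-dimensional variable $y$ without ever seeing the equality constraint.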

4.1.4 Examples

1. Maximum Utility Problem: This problem models investment/consumption. It can be formulated as:

$$\max_{x,b} \; \sum_{t=1}^{T} \alpha_t u(x_t) \quad \text{s.t.} \quad b_{t+1} = b_t + f(b_t) - x_t, \; t = 1, \dots, T, \quad 0 \le x_t \le b_t, \; t = 1, \dots, T$$

with $b_t$ being the budget and $x_t$ being the amount consumed at time $t$. Here $f$ is the investment return function and $u$ is the utility function; both are concave and increasing. The equality constraint is nonaffine, but if we relax it to the inequality $b_{t+1} \le b_t + f(b_t) - x_t$, the problem doesn't change (the relaxation is tight), and the problem is now convex.

2. Principal Component Analysis: Given $X \in \mathbb{R}^{n \times p}$, consider the low rank approximation problem

$$\min_R \; \|X - R\|_F^2 \quad \text{such that} \quad \mathrm{rank}(R) = k.$$

Here $\|A\|_F^2 = \sum_{i=1}^n \sum_{j=1}^p A_{ij}^2$, the entrywise squared $\ell_2$ norm. This is equivalent to the PCA problem, whose solution is $R = U_k D_k V_k^T$, with $U_k$ and $V_k$ being the first $k$ columns of $U$ and $V$, and $D_k$ being the first $k$ diagonal elements of $D$ (where $X = UDV^T$ is the SVD of $X$). This is not a convex problem. To see this, take a matrix $A$ in the set $C = \{R : \mathrm{rank}(R) = k\}$; then $-A \in C$, but $0.5A + 0.5(-A) = 0 \notin C$.

This problem can be recast in a convex form by first rewriting it as

$$\min_{Z \in \mathbb{S}^p} \; \|X - XZ\|_F^2 \;\; \text{s.t. } Z \text{ a rank-}k \text{ projection} \quad \Longleftrightarrow \quad \max_{Z \in \mathbb{S}^p} \; \mathrm{tr}(SZ) \;\; \text{s.t. } Z \text{ a rank-}k \text{ projection}$$

where $Z$ is a projection and $S = X^T X$. Hence the constraint set is the nonconvex set

$$C = \{Z \in \mathbb{S}^p : \lambda_i(Z) \in \{0, 1\}, \; i = 1, \dots, p, \; \mathrm{tr}(Z) = k\}$$

where $\lambda_i(Z)$ are the $p$ eigenvalues of $Z$. For this formulation, the solution becomes $Z = V_k V_k^T$, where $V_k$ gives the first $k$ columns of $V$.

If we relax the constraint set to $F = \mathrm{conv}(C)$, its convex hull, we have a linear maximization over the fantope of order $k$, which is convex: $\max_{Z \in F} \mathrm{tr}(SZ)$. This is equivalent to the nonconvex PCA problem, i.e., it admits the same solution (a code sketch follows the note below).

Note: the fantope of order $k$ is given by

$$F = \{Z \in \mathbb{S}^p : \lambda_i(Z) \in [0, 1], \; i = 1, \dots, p, \; \mathrm{tr}(Z) = k\} = \{Z \in \mathbb{S}^p : 0 \preceq Z \preceq I, \; \mathrm{tr}(Z) = k\}$$
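The fantope relaxation is easy to express in a modeling language. Below is a minimal sketch using cvxpy (an assumption; any SDP-capable tool works), checking that the relaxed solution matches $V_k V_k^T$ from the SVD. Agreement is up to solver tolerance, and assumes a positive gap between the $k$-th and $(k+1)$-st eigenvalues of $S$ so the solution is unique.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, k = 50, 8, 3
X = rng.standard_normal((n, p))
S = X.T @ X

# Linear maximization over the fantope: max tr(SZ) s.t. 0 <= Z <= I, tr(Z) = k
Z = cp.Variable((p, p), symmetric=True)
constraints = [Z >> 0, np.eye(p) - Z >> 0, cp.trace(Z) == k]
cp.Problem(cp.Maximize(cp.trace(S @ Z)), constraints).solve()

# Compare with the nonconvex solution Z* = V_k V_k^T from the SVD of X
_, _, Vt = np.linalg.svd(X)
Z_star = Vt[:k].T @ Vt[:k]
print(np.max(np.abs(Z.value - Z_star)))  # small, up to solver tolerance
```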

4.2 Linear Programs

4.2.1 Definition

A linear program (LP) is an optimization problem of the form:

$$\min_x \; c^T x \quad \text{s.t.} \quad Dx \le d, \quad Ax = b$$

Note that this is always convex. A fundamental problem in convex optimization, it has many diverse applications and a rich history. Dantzig's simplex algorithm gives a direct solver.
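LPs in exactly this form can be handed to an off-the-shelf solver. Here is a minimal sketch with scipy's linprog; the data are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# A small LP: min c^T x  s.t.  Dx <= d, Ax = b
c = np.array([1.0, 2.0, 3.0])
D = -np.eye(3)                 # encodes x >= 0 as -x <= 0
d = np.zeros(3)
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

res = linprog(c, A_ub=D, b_ub=d, A_eq=A, b_eq=b, bounds=(None, None))
print(res.x)                   # optimal point, here (1, 0, 0)
```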

4.2.2 Examples

Some common LP problems are given below.

1. Diet Problem: This problem deals with finding the cheapest combination of food items that satisfies some nutritional requirements. It can be formulated as:

$$\min_x \; c^T x \quad \text{s.t.} \quad Dx \ge d, \quad x \ge 0$$

where $c_j$ is the per-unit cost of food $j$, $d_i$ is the minimum intake of nutrient $i$ required, $D_{ij}$ is the amount of nutrient $i$ contained in food $j$, and $x_j$ is the number of units of food $j$ in the diet.

2. Transportation Problem: This problem deals with minimizing the cost of shipping commodities from given sources to destinations. It can be formulated as:

$$\min_x \; \sum_{i=1}^m \sum_{j=1}^n c_{ij} x_{ij} \quad \text{s.t.} \quad \sum_{j=1}^n x_{ij} \le s_i, \; i = 1, \dots, m, \quad \sum_{i=1}^m x_{ij} \ge d_j, \; j = 1, \dots, n, \quad x \ge 0$$

where $s_i$ is the supply at source $i$, $d_j$ is the demand at destination $j$, $c_{ij}$ is the per-unit shipping cost from source $i$ to destination $j$, and $x_{ij}$ is the number of units shipped from $i$ to $j$.

3. Basis Pursuit: Given $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times p}$ (with $p > n$), the aim is to determine the sparsest solution to the underdetermined linear system $X\beta = y$. It can be formulated as:

$$\min_\beta \; \|\beta\|_0 \quad \text{s.t.} \quad X\beta = y$$

where $\|\beta\|_0 = \sum_{j=1}^p 1\{\beta_j \ne 0\}$. This is a nonconvex problem, which can be recast as a linear program through an $\ell_1$ approximation known as basis pursuit:

$$\min_\beta \; \|\beta\|_1 \quad \text{s.t.} \quad X\beta = y$$

The above problem can be reformulated as the LP (a code sketch appears at the end of this section):

$$\min_{\beta, z} \; 1^T z \quad \text{s.t.} \quad -z \le \beta \le z, \quad X\beta = y$$

4. Dantzig Selector: The Dantzig selector is a modification of basis pursuit where strict equality is not enforced, i.e., $X\beta \approx y$. The formulation becomes:

$$\min_\beta \; \|\beta\|_1 \quad \text{s.t.} \quad \|X^T (y - X\beta)\|_\infty \le \lambda$$

where $\lambda \ge 0$ is a tuning parameter. This too can be reformulated as a linear program if the constraint is written as:

$$-\lambda \le X_j^T (y - X\beta) \le \lambda, \quad j = 1, \dots, p$$

4.2.3 Standard Form

A linear program is said to be in standard form when it is written as:

$$\min_x \; c^T x \quad \text{s.t.} \quad Ax = b, \quad x \ge 0$$

Any LP can be written in standard form.
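To make the reformulation in example 3 concrete, the sketch below builds the LP in the stacked variable $(\beta, z)$ and solves it with scipy; this is one reasonable encoding under illustrative data, not the only one.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p = 20, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]           # a sparse ground truth
y = X @ beta_true

# Variables w = (beta, z) in R^{2p}; minimize 1^T z
c = np.concatenate([np.zeros(p), np.ones(p)])
I = np.eye(p)
A_ub = np.block([[I, -I], [-I, -I]])       # beta - z <= 0 and -beta - z <= 0
b_ub = np.zeros(2 * p)
A_eq = np.hstack([X, np.zeros((n, p))])    # X beta = y
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=(None, None))
beta_hat = res.x[:p]
print(np.round(beta_hat[:5], 3))  # ~ (2, -1.5, 1, 0, 0) when l1 recovery succeeds
```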

4.3 Quadratic Programs

4.3.1 Definition

A convex quadratic program (QP) is an optimization problem of the form:

$$\min_x \; c^T x + \frac{1}{2} x^T Q x \quad \text{s.t.} \quad Dx \le d, \quad Ax = b$$

We only discuss the case $Q \succeq 0$, since the problem is convex if and only if $Q \succeq 0$.

4.3.2 Examples

Here are some common QP problems:

1. Portfolio optimization: We can use the QP

$$\min_x \; -\mu^T x + \frac{\gamma}{2} x^T Q x \quad \text{s.t.} \quad 1^T x = 1, \quad x \ge 0$$

to trade off performance and risk in a financial portfolio. Here $\mu$ is the vector of expected asset returns, $Q$ is the covariance matrix of asset returns, $\gamma$ is the risk aversion, and $x$ gives the portfolio holdings (normalized to sum to 1).

2. Support vector machine: Given $y \in \{-1, 1\}^n$ and $X \in \mathbb{R}^{n \times p}$ with rows $x_1, x_2, \dots, x_n$, the SVM problem is:

$$\min_{\beta, \beta_0, \xi} \; \frac{1}{2} \|\beta\|_2^2 + C \sum_{i=1}^n \xi_i \quad \text{s.t.} \quad \xi_i \ge 0, \quad y_i (x_i^T \beta + \beta_0) \ge 1 - \xi_i, \; i = 1, \dots, n.$$

3. Lasso: Given $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, recall the lasso problem:

$$\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 \le s.$$

Here $s \ge 0$ is a tuning parameter. This can be rewritten as a quadratic program. An alternative way to parametrize the lasso problem is the penalized (Lagrange) form (see the sketch after this section):

$$\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1.$$

Here $\lambda \ge 0$ is the tuning parameter. This too can be rewritten as a quadratic program.

4.3.3 Standard Form

Any QP can be rewritten in the standard form:

$$\min_x \; c^T x + \frac{1}{2} x^T Q x \quad \text{s.t.} \quad Ax = b, \quad x \ge 0.$$
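As a quick illustration of the penalized form of the lasso, here is a minimal cvxpy sketch (cvxpy is an assumption; internally it reformulates the problem for a conic solver), with made-up data:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:4] = [3.0, -2.0, 1.5, 1.0]       # sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam = 5.0                                   # tuning parameter
beta = cp.Variable(p)
obj = cp.Minimize(cp.sum_squares(y - X @ beta) + lam * cp.norm(beta, 1))
cp.Problem(obj).solve()
print(np.round(beta.value[:6], 2))          # large entries on the true support
```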

4.4 Semidefinite Programs (SDPs)

4.4.1 Motivation

Recall that linear programs (LPs) have the following form:

$$\min_x \; c^T x \quad \text{subject to} \quad Dx \le d, \quad Ax = b \tag{4.1}$$

Here, $x$ is a vector. But we can generalize this problem to optimize over matrices $X$ by changing the elementwise inequality $\le$ to the matrix inequality $\preceq$, which defines a partial ordering over matrices. (More on this below.)

4.4.2 Background

Recall:

$\mathbb{S}^n$ is the space of $n \times n$ symmetric matrices.

$\mathbb{S}^n_+$ is the space of positive semidefinite matrices: $\mathbb{S}^n_+ = \{X \in \mathbb{S}^n : u^T X u \ge 0 \text{ for all } u \in \mathbb{R}^n\}$

$\mathbb{S}^n_{++}$ is the space of positive definite matrices: $\mathbb{S}^n_{++} = \{X \in \mathbb{S}^n : u^T X u > 0 \text{ for all } u \in \mathbb{R}^n \setminus \{0\}\}$

Membership of $X$ in one of the above sets constrains its eigenvalues. Letting $\lambda(X)$ denote the vector of eigenvalues of a matrix $X$:

$$X \in \mathbb{S}^n \iff \lambda(X) \in \mathbb{R}^n$$
$$X \in \mathbb{S}^n_+ \iff \lambda(X) \in \mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x \ge 0\}$$
$$X \in \mathbb{S}^n_{++} \iff \lambda(X) \in \mathbb{R}^n_{++} = \{x \in \mathbb{R}^n : x > 0\}$$

We can define an inner product between two symmetric matrices $X, Y \in \mathbb{S}^n$ using the trace operator:

$$X \bullet Y = \mathrm{tr}(XY) = \sum_{i,j} X_{ij} Y_{ij}$$

We can also partially order $\mathbb{S}^n$ by defining $\preceq$ as follows:

$$X \preceq Y \iff Y - X \in \mathbb{S}^n_+$$

If we consider diagonal matrices, this matrix ordering coincides with our ordering for vectors. Let $\mathrm{diag}(x)$ denote the matrix $X \in \mathbb{S}^n$ which has the vector $x \in \mathbb{R}^n$ as its diagonal elements, and 0 elsewhere. Then for $x, y \in \mathbb{R}^n$:

$$\mathrm{diag}(x) \preceq \mathrm{diag}(y) \iff x \le y$$
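These definitions translate directly into a few lines of numpy. The helpers below are hypothetical conveniences (not from the lecture), with a tolerance in the PSD check since floating-point eigenvalues of a PSD matrix can come out slightly negative.

```python
import numpy as np

def trace_inner(X, Y):
    """X . Y = tr(XY) = sum_ij X_ij Y_ij for symmetric X, Y."""
    return np.sum(X * Y)

def psd_leq(X, Y, tol=1e-9):
    """Check X <= Y in the semidefinite order, i.e. Y - X in S^n_+."""
    return np.all(np.linalg.eigvalsh(Y - X) >= -tol)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 5.0])
# diag(x) <= diag(y) in the matrix order iff x <= y elementwise
print(psd_leq(np.diag(x), np.diag(y)), bool(np.all(x <= y)))  # True True
```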

4.4.3 Semidefinite Programs (SDPs)

An SDP is an optimization problem of the form:

$$\min_x \; c^T x \quad \text{subject to} \quad x_1 F_1 + \dots + x_n F_n \preceq F_0, \quad Ax = b \tag{4.2}$$

Here $F_j \in \mathbb{S}^d$ for $j = 0, 1, \dots, n$, and $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$.

Recall that in an LP we have the constraint $Dx \le d$. If we let $D_i$ be the $i$th column of $D$, then this is the same as $\sum_i x_i D_i \le d$. So here, the SDP simply generalizes the vectors $D_i$ and $d$ to symmetric matrices $F_i$ and $F_0$. This problem is always convex because linear matrix inequalities define convex sets.

An SDP is said to be in standard form if it is written as:

$$\min_{X \in \mathbb{S}^n} \; C \bullet X \quad \text{subject to} \quad A_i \bullet X = b_i, \; i = 1, \dots, m, \quad X \succeq 0 \tag{4.3}$$

with $C \in \mathbb{S}^n$, $A_i \in \mathbb{S}^n$, and $b_i \in \mathbb{R}$. Any SDP can be written in this form, though a proof of this fact will require some effort!

Finally, any linear program is also a semidefinite program. To see this, consider the SDP where $X = \mathrm{diag}(x)$.

Example: trace norm minimization. Let $A_1, \dots, A_p \in \mathbb{R}^{m \times n}$. Then the following is a linear mapping from $\mathbb{R}^{m \times n}$ to $\mathbb{R}^p$:

$$\mathcal{A}(X) = \begin{bmatrix} A_1 \bullet X \\ \vdots \\ A_p \bullet X \end{bmatrix} \tag{4.4}$$

(Note that because $A_i$ is not necessarily symmetric, we use the standard definition of the trace inner product: $A_i \bullet X = \mathrm{tr}(A_i^T X)$.)

Finding a matrix $X$ that satisfies $\mathcal{A}(X) = b$ for some $b$ such that $X$ has the lowest rank is a nonconvex problem, because the rank is not a convex function. But we can use the trace norm as a surrogate objective, resulting in the following trace norm approximation (a code sketch follows below):

$$\min_X \; \|X\|_{\mathrm{tr}} \quad \text{subject to} \quad \mathcal{A}(X) = b \tag{4.5}$$

This is an SDP, though this is not a trivial fact.
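In a modeling language, the trace norm (nuclear norm) heuristic is a one-liner. A minimal sketch with cvxpy (an assumption), recovering a low-rank matrix from random linear measurements; recovery is typical at this measurement level but not guaranteed.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
m, n, r, p = 8, 8, 2, 40
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-2 target
As = [rng.standard_normal((m, n)) for _ in range(p)]
b = np.array([np.trace(A.T @ L) for A in As])     # measurements A_i . L

X = cp.Variable((m, n))
constraints = [cp.trace(A.T @ X) == bi for A, bi in zip(As, b)]
cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
print(np.linalg.matrix_rank(X.value, tol=1e-3))   # typically 2, up to solver accuracy
```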

4.5 Conic Programs

4.5.1 Definition

A conic program is an optimization problem of the form:

$$\min_x \; c^T x \quad \text{s.t.} \quad Ax = b, \quad D(x) + d \in K$$

Here $c, x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$; $D : \mathbb{R}^n \to Y$ is a linear mapping and $d \in Y$ for a Euclidean space $Y$; and $K \subseteq Y$ is a closed convex cone. This is very similar to an LP; the only distinction is that the set of linear inequalities is replaced with conic inequalities, i.e., $D(x) + d \succeq_K 0$. Notice that if $K = \mathbb{S}^n_+$, we recover SDPs. Thus, this is a very broad class of problems.

4.5.2 Examples

Second-order cone program. A second-order cone program (SOCP) is an optimization problem of the form:

$$\min_x \; c^T x \quad \text{s.t.} \quad \|D_i x + d_i\|_2 \le e_i^T x + f_i, \; i = 1, \dots, p, \quad Ax = b$$

This is a conic program with a specific choice of $K$. In particular, it is built from second-order cones, defined as $Q = \{(x, t) : \|x\|_2 \le t\}$. From the definition, it is easy to see that

$$\|D_i x + d_i\|_2 \le e_i^T x + f_i \iff (D_i x + d_i, \, e_i^T x + f_i) \in Q_i$$

for appropriate dimensions; taking $K = Q_1 \times Q_2 \times \cdots \times Q_p$ then yields the conic program form.

It is easy to see that every LP is an SOCP. In addition, to see that every SOCP is an SDP, recall the Schur complement theorem: for $A$, $C$ symmetric with $C \succ 0$,

$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq 0 \iff A - B C^{-1} B^T \succeq 0.$$

Applying the theorem to the following matrix (with $t > 0$) gives

$$\begin{bmatrix} tI & x \\ x^T & t \end{bmatrix} \succeq 0 \iff tI - \frac{x x^T}{t} \succeq 0 \iff \|x\|_2 \le t.$$

Thus, we can convert each second-order cone constraint to a PSD constraint, as checked numerically in the sketch below.
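The SOC-to-SDP embedding is easy to sanity-check: build the block matrix above for random $(x, t)$ pairs and compare positive semidefiniteness against $\|x\|_2 \le t$. A small sketch (the helper name is illustrative):

```python
import numpy as np

def soc_matrix(x, t):
    """The block matrix [[t*I, x], [x^T, t]] from the Schur embedding."""
    n = x.size
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = t * np.eye(n)
    M[:n, n] = x
    M[n, :n] = x
    M[n, n] = t
    return M

rng = np.random.default_rng(4)
for _ in range(5):
    x = rng.standard_normal(3)
    t = rng.uniform(0.0, 3.0)
    psd = np.linalg.eigvalsh(soc_matrix(x, t)).min() >= -1e-9
    assert psd == (np.linalg.norm(x) <= t)  # PSD iff (x, t) lies in the cone
```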

4.6 Relationship between Programs

The relationship between linear programs (LPs), quadratic programs (QPs), second-order cone programs (SOCPs), semidefinite programs (SDPs), and conic programs (CPs) is shown in the figure below: each class contains the one before it.

[Figure: nested program classes, LP ⊆ QP ⊆ SOCP ⊆ SDP ⊆ CP.]

A second figure shows the relationship between convex and nonconvex problems: convex problems occupy only a small bubble within the much larger space of nonconvex problems.

[Figure: convex problems as a small bubble inside the space of nonconvex problems.]

Acknowledgements

The slides and scribe notes from previous years were consulted while preparing these notes.