
10-725/36-725: Convex Optimization                                   Fall 2016

Lecture 4: September 12

Lecturer: Ryan Tibshirani        Scribes: Jay Hennig, Yifeng Tao, Sriram Vasudevan

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

4.1 Previous Lecture

4.1.1 Eliminating Equality Constraints

If the problem is of the form

$$\min_x \; f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \dots, m, \quad Ax = b$$

then $x$ can be expressed as $My + x_0$ (where $Ax_0 = b$ and $\mathrm{col}(M) = \mathrm{null}(A)$). Doing so allows us to rewrite the above problem as (a code sketch of this construction appears below, after Section 4.1.3):

$$\min_y \; f(My + x_0) \quad \text{s.t.} \quad g_i(My + x_0) \le 0, \; i = 1, \dots, m$$

4.1.2 Introducing Slack Variables

Introducing slack variables is the opposite of eliminating equality constraints. The first formulation in the previous section can thus be written as:

$$\min_{x,s} \; f(x) \quad \text{s.t.} \quad s_i \ge 0, \; g_i(x) + s_i = 0, \; i = 1, \dots, m, \quad Ax = b$$

This problem, however, is not convex unless the $g_i$ are all affine.

4.1.3 Relaxing Nonaffine Equality Constraints

Given an optimization problem $\min_x f(x)$ such that $x \in C$, we can consider an enlarged set $\tilde{C} \supseteq C$ and solve $\min_x f(x)$ such that $x \in \tilde{C}$ instead. This is known as relaxation, and its optimal value is always less than or equal to that of the original problem.

An important special case is replacing convex nonaffine equality constraints $h_j(x) = 0$, $j = 1, \dots, r$, with the inequalities $h_j(x) \le 0$, $j = 1, \dots, r$.
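Returning to Section 4.1.1, the change of variables $x = My + x_0$ is easy to construct numerically. Below is a minimal sketch in Python, assuming numpy and scipy are available; the function name and data are illustrative, not from the lecture.

```python
import numpy as np
from scipy.linalg import null_space

def eliminate_equality(A, b):
    """Return (x0, M) with A @ x0 = b and col(M) = null(A),
    so that x = M @ y + x0 parametrizes all solutions of Ax = b."""
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)  # a particular solution
    M = null_space(A)                           # orthonormal basis of null(A)
    return x0, M

# Quick check on a random feasible system (illustrative data)
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = A @ rng.standard_normal(5)           # guarantees Ax = b is feasible
x0, M = eliminate_equality(A, b)
y = rng.standard_normal(M.shape[1])
assert np.allclose(A @ (M @ y + x0), b)  # every y yields a feasible x
```

After this substitution, a solver can work in the lower-dimensional variable $y$ without ever seeing the equality constraint.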

4.1.4 Examples

1. Maximum Utility Problem: This problem models investment/consumption. It can be formulated as:

$$\max_{x,b} \; \sum_{t=1}^{T} \alpha_t u(x_t) \quad \text{s.t.} \quad b_{t+1} = b_t + f(b_t) - x_t, \; t = 1, \dots, T, \quad 0 \le x_t \le b_t, \; t = 1, \dots, T$$

with $b_t$ being the budget and $x_t$ being the amount consumed at time $t$. Here $f$ is the investment return function and $u$ is the utility function; both are concave and increasing. The equality constraint is nonaffine, but if we relax it to the inequality $b_{t+1} \le b_t + f(b_t) - x_t$, the problem doesn't change (the relaxation is tight), and the problem is now convex.

2. Principal Component Analysis: Given $X \in \mathbb{R}^{n \times p}$, consider the low rank approximation problem

$$\min_R \; \|X - R\|_F^2 \quad \text{such that} \quad \mathrm{rank}(R) = k.$$

Here $\|A\|_F^2 = \sum_{i=1}^n \sum_{j=1}^p A_{ij}^2$, the entrywise squared $\ell_2$ norm. This is equivalent to the PCA problem, whose solution is $R = U_k D_k V_k^T$, with $U_k$ and $V_k$ being the first $k$ columns of $U$ and $V$, and $D_k$ being the first $k$ diagonal elements of $D$ (where $X = UDV^T$ is the SVD of $X$). This is not a convex problem. To see this, take a matrix $A$ in the set $C = \{R : \mathrm{rank}(R) = k\}$; then $-A \in C$, but $0.5A + 0.5(-A) = 0 \notin C$.

This problem can be recast in a convex form by first rewriting it as

$$\min_{Z \in \mathbb{S}^p} \; \|X - XZ\|_F^2 \;\; \text{s.t. } Z \text{ a rank-}k \text{ projection} \quad \Longleftrightarrow \quad \max_{Z \in \mathbb{S}^p} \; \mathrm{tr}(SZ) \;\; \text{s.t. } Z \text{ a rank-}k \text{ projection}$$

where $Z$ is a projection and $S = X^T X$. Hence the constraint set is the nonconvex set

$$C = \{Z \in \mathbb{S}^p : \lambda_i(Z) \in \{0, 1\}, \; i = 1, \dots, p, \; \mathrm{tr}(Z) = k\}$$

where $\lambda_i(Z)$ are the $p$ eigenvalues of $Z$. For this formulation, the solution becomes $Z = V_k V_k^T$, where $V_k$ gives the first $k$ columns of $V$.

If we relax the constraint set to $F = \mathrm{conv}(C)$, its convex hull, we have a linear maximization over the fantope of order $k$, which is convex: $\max_{Z \in F} \mathrm{tr}(SZ)$. This is equivalent to the nonconvex PCA problem, i.e., it admits the same solution (a code sketch follows the note below).

Note: the fantope of order $k$ is given by

$$F = \{Z \in \mathbb{S}^p : \lambda_i(Z) \in [0, 1], \; i = 1, \dots, p, \; \mathrm{tr}(Z) = k\} = \{Z \in \mathbb{S}^p : 0 \preceq Z \preceq I, \; \mathrm{tr}(Z) = k\}$$
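The fantope relaxation is easy to express in a modeling language. Below is a minimal sketch using cvxpy (an assumption; any SDP-capable tool works), checking that the relaxed solution matches $V_k V_k^T$ from the SVD. Agreement is up to solver tolerance, and assumes a positive gap between the $k$-th and $(k+1)$-st eigenvalues of $S$ so the solution is unique.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, k = 50, 8, 3
X = rng.standard_normal((n, p))
S = X.T @ X

# Linear maximization over the fantope: max tr(SZ) s.t. 0 <= Z <= I, tr(Z) = k
Z = cp.Variable((p, p), symmetric=True)
constraints = [Z >> 0, np.eye(p) - Z >> 0, cp.trace(Z) == k]
cp.Problem(cp.Maximize(cp.trace(S @ Z)), constraints).solve()

# Compare with the nonconvex solution Z* = V_k V_k^T from the SVD of X
_, _, Vt = np.linalg.svd(X)
Z_star = Vt[:k].T @ Vt[:k]
print(np.max(np.abs(Z.value - Z_star)))  # small, up to solver tolerance
```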

4.2 Linear Programs

4.2.1 Definition

A linear program (LP) is an optimization problem of the form:

$$\min_x \; c^T x \quad \text{s.t.} \quad Dx \le d, \quad Ax = b$$

Note that this is always convex. A fundamental problem in convex optimization, it has many diverse applications and a rich history. Dantzig's simplex algorithm gives a direct solver.
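LPs in exactly this form can be handed to an off-the-shelf solver. Here is a minimal sketch with scipy's linprog; the data are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# A small LP: min c^T x  s.t.  Dx <= d, Ax = b
c = np.array([1.0, 2.0, 3.0])
D = -np.eye(3)                 # encodes x >= 0 as -x <= 0
d = np.zeros(3)
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

res = linprog(c, A_ub=D, b_ub=d, A_eq=A, b_eq=b, bounds=(None, None))
print(res.x)                   # optimal point, here (1, 0, 0)
```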

4.2.2 Examples

Some common LP problems are given below.

1. Diet Problem: This problem deals with finding the cheapest combination of food items that satisfies some nutritional requirements. It can be formulated as:

$$\min_x \; c^T x \quad \text{s.t.} \quad Dx \ge d, \quad x \ge 0$$

where $c_j$ is the per-unit cost of food $j$, $d_i$ is the minimum intake of nutrient $i$ required, $D_{ij}$ is the amount of nutrient $i$ contained in food $j$, and $x_j$ is the number of units of food $j$ in the diet.

2. Transportation Problem: This problem deals with minimizing the cost of shipping commodities from given sources to destinations. It can be formulated as:

$$\min_x \; \sum_{i=1}^m \sum_{j=1}^n c_{ij} x_{ij} \quad \text{s.t.} \quad \sum_{j=1}^n x_{ij} \le s_i, \; i = 1, \dots, m, \quad \sum_{i=1}^m x_{ij} \ge d_j, \; j = 1, \dots, n, \quad x \ge 0$$

where $s_i$ is the supply at source $i$, $d_j$ is the demand at destination $j$, $c_{ij}$ is the per-unit shipping cost from source $i$ to destination $j$, and $x_{ij}$ is the number of units shipped from $i$ to $j$.

3. Basis Pursuit: Given $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times p}$ (with $p > n$), the aim is to determine the sparsest solution to the underdetermined linear system $X\beta = y$. It can be formulated as:

$$\min_\beta \; \|\beta\|_0 \quad \text{s.t.} \quad X\beta = y$$

where $\|\beta\|_0 = \sum_{j=1}^p 1\{\beta_j \ne 0\}$. This is a nonconvex problem, which can be recast as a linear program through an $\ell_1$ approximation known as basis pursuit:

$$\min_\beta \; \|\beta\|_1 \quad \text{s.t.} \quad X\beta = y$$

The above problem can be reformulated as the LP (a code sketch appears at the end of this section):

$$\min_{\beta, z} \; 1^T z \quad \text{s.t.} \quad -z \le \beta \le z, \quad X\beta = y$$

4. Dantzig Selector: The Dantzig selector is a modification of basis pursuit where strict equality is not enforced, i.e., $X\beta \approx y$. The formulation becomes:

$$\min_\beta \; \|\beta\|_1 \quad \text{s.t.} \quad \|X^T (y - X\beta)\|_\infty \le \lambda$$

where $\lambda \ge 0$ is a tuning parameter. This too can be reformulated as a linear program if the constraint is written as:

$$-\lambda \le X_j^T (y - X\beta) \le \lambda, \quad j = 1, \dots, p$$

4.2.3 Standard Form

A linear program is said to be in standard form when it is written as:

$$\min_x \; c^T x \quad \text{s.t.} \quad Ax = b, \quad x \ge 0$$

Any LP can be written in standard form.
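To make the reformulation in example 3 concrete, the sketch below builds the LP in the stacked variable $(\beta, z)$ and solves it with scipy; this is one reasonable encoding under illustrative data, not the only one.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p = 20, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]           # a sparse ground truth
y = X @ beta_true

# Variables w = (beta, z) in R^{2p}; minimize 1^T z
c = np.concatenate([np.zeros(p), np.ones(p)])
I = np.eye(p)
A_ub = np.block([[I, -I], [-I, -I]])       # beta - z <= 0 and -beta - z <= 0
b_ub = np.zeros(2 * p)
A_eq = np.hstack([X, np.zeros((n, p))])    # X beta = y
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=(None, None))
beta_hat = res.x[:p]
print(np.round(beta_hat[:5], 3))  # ~ (2, -1.5, 1, 0, 0) when l1 recovery succeeds
```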

4.3 Quadratic Programs

4.3.1 Definition

A convex quadratic program (QP) is an optimization problem of the form:

$$\min_x \; c^T x + \frac{1}{2} x^T Q x \quad \text{s.t.} \quad Dx \le d, \quad Ax = b$$

We only discuss the case $Q \succeq 0$, since the problem is convex if and only if $Q \succeq 0$.

4.3.2 Examples

Here are some common QP problems:

1. Portfolio optimization: We can use the QP

$$\min_x \; -\mu^T x + \frac{\gamma}{2} x^T Q x \quad \text{s.t.} \quad 1^T x = 1, \quad x \ge 0$$

to trade off performance and risk in a financial portfolio. Here $\mu$ is the vector of expected asset returns, $Q$ is the covariance matrix of asset returns, $\gamma$ is the risk aversion, and $x$ gives the portfolio holdings (normalized to sum to 1).

2. Support vector machine: Given $y \in \{-1, 1\}^n$ and $X \in \mathbb{R}^{n \times p}$ with rows $x_1, x_2, \dots, x_n$, the SVM problem is:

$$\min_{\beta, \beta_0, \xi} \; \frac{1}{2} \|\beta\|_2^2 + C \sum_{i=1}^n \xi_i \quad \text{s.t.} \quad \xi_i \ge 0, \quad y_i (x_i^T \beta + \beta_0) \ge 1 - \xi_i, \; i = 1, \dots, n.$$

3. Lasso: Given $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, recall the lasso problem:

$$\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 \le s.$$

Here $s \ge 0$ is a tuning parameter. This can be rewritten as a quadratic program. An alternative way to parametrize the lasso problem is the penalized (Lagrange) form (see the sketch after this section):

$$\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1.$$

Here $\lambda \ge 0$ is the tuning parameter. This too can be rewritten as a quadratic program.

4.3.3 Standard Form

Any QP can be rewritten in the standard form:

$$\min_x \; c^T x + \frac{1}{2} x^T Q x \quad \text{s.t.} \quad Ax = b, \quad x \ge 0.$$
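As a quick illustration of the penalized form of the lasso, here is a minimal cvxpy sketch (cvxpy is an assumption; internally it reformulates the problem for a conic solver), with made-up data:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:4] = [3.0, -2.0, 1.5, 1.0]       # sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam = 5.0                                   # tuning parameter
beta = cp.Variable(p)
obj = cp.Minimize(cp.sum_squares(y - X @ beta) + lam * cp.norm(beta, 1))
cp.Problem(obj).solve()
print(np.round(beta.value[:6], 2))          # large entries on the true support
```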

4.4 Semidefinite Programs (SDPs)

4.4.1 Motivation

Recall that linear programs (LPs) have the following form:

$$\min_x \; c^T x \quad \text{subject to} \quad Dx \le d, \quad Ax = b \tag{4.1}$$

Here, $x$ is a vector. But we can generalize this problem to optimize over matrices $X$ by changing the elementwise inequality $\le$ to the matrix inequality $\preceq$, which defines a partial ordering over matrices. (More on this below.)

4.4.2 Background

Recall:

$\mathbb{S}^n$ is the space of $n \times n$ symmetric matrices.

$\mathbb{S}^n_+$ is the space of positive semidefinite matrices: $\mathbb{S}^n_+ = \{X \in \mathbb{S}^n : u^T X u \ge 0 \text{ for all } u \in \mathbb{R}^n\}$

$\mathbb{S}^n_{++}$ is the space of positive definite matrices: $\mathbb{S}^n_{++} = \{X \in \mathbb{S}^n : u^T X u > 0 \text{ for all } u \in \mathbb{R}^n \setminus \{0\}\}$

Membership of $X$ in one of the above sets constrains its eigenvalues. Letting $\lambda(X)$ denote the vector of eigenvalues of a matrix $X$:

$$X \in \mathbb{S}^n \iff \lambda(X) \in \mathbb{R}^n$$
$$X \in \mathbb{S}^n_+ \iff \lambda(X) \in \mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x \ge 0\}$$
$$X \in \mathbb{S}^n_{++} \iff \lambda(X) \in \mathbb{R}^n_{++} = \{x \in \mathbb{R}^n : x > 0\}$$

We can define an inner product between two symmetric matrices $X, Y \in \mathbb{S}^n$ using the trace operator:

$$X \bullet Y = \mathrm{tr}(XY) = \sum_{i,j} X_{ij} Y_{ij}$$

We can also partially order $\mathbb{S}^n$ by defining $\preceq$ as follows:

$$X \preceq Y \iff Y - X \in \mathbb{S}^n_+$$

If we consider diagonal matrices, this matrix ordering coincides with our ordering for vectors. Let $\mathrm{diag}(x)$ denote the matrix $X \in \mathbb{S}^n$ which has the vector $x \in \mathbb{R}^n$ as its diagonal elements, and 0 elsewhere. Then for $x, y \in \mathbb{R}^n$:

$$\mathrm{diag}(x) \preceq \mathrm{diag}(y) \iff x \le y$$
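These definitions translate directly into a few lines of numpy. The helpers below are hypothetical conveniences (not from the lecture), with a tolerance in the PSD check since floating-point eigenvalues of a PSD matrix can come out slightly negative.

```python
import numpy as np

def trace_inner(X, Y):
    """X . Y = tr(XY) = sum_ij X_ij Y_ij for symmetric X, Y."""
    return np.sum(X * Y)

def psd_leq(X, Y, tol=1e-9):
    """Check X <= Y in the semidefinite order, i.e. Y - X in S^n_+."""
    return np.all(np.linalg.eigvalsh(Y - X) >= -tol)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 5.0])
# diag(x) <= diag(y) in the matrix order iff x <= y elementwise
print(psd_leq(np.diag(x), np.diag(y)), bool(np.all(x <= y)))  # True True
```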

4.4.3 Semidefinite Programs (SDPs)

An SDP is an optimization problem of the form:

$$\min_x \; c^T x \quad \text{subject to} \quad x_1 F_1 + \dots + x_n F_n \preceq F_0, \quad Ax = b \tag{4.2}$$

Here $F_j \in \mathbb{S}^d$ for $j = 0, 1, \dots, n$, and $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$.

Recall that in an LP we have the constraint $Dx \le d$. If we let $D_i$ be the $i$th column of $D$, then this is the same as $\sum_i x_i D_i \le d$. So here, the SDP simply generalizes the vectors $D_i$ and $d$ to symmetric matrices $F_i$ and $F_0$. This problem is always convex because linear matrix inequalities define convex sets.

An SDP is said to be in standard form if it is written as:

$$\min_{X \in \mathbb{S}^n} \; C \bullet X \quad \text{subject to} \quad A_i \bullet X = b_i, \; i = 1, \dots, m, \quad X \succeq 0 \tag{4.3}$$

with $C \in \mathbb{S}^n$, $A_i \in \mathbb{S}^n$, and $b_i \in \mathbb{R}$. Any SDP can be written in this form, though a proof of this fact will require some effort!

Finally, any linear program is also a semidefinite program. To see this, consider the SDP where $X = \mathrm{diag}(x)$.

Example: trace norm minimization. Let $A_1, \dots, A_p \in \mathbb{R}^{m \times n}$. Then the following is a linear mapping from $\mathbb{R}^{m \times n}$ to $\mathbb{R}^p$:

$$\mathcal{A}(X) = \begin{bmatrix} A_1 \bullet X \\ \vdots \\ A_p \bullet X \end{bmatrix} \tag{4.4}$$

(Note that because $A_i$ is not necessarily symmetric, we use the standard definition of the trace inner product: $A_i \bullet X = \mathrm{tr}(A_i^T X)$.)

Finding a matrix $X$ that satisfies $\mathcal{A}(X) = b$ for some $b$ such that $X$ has the lowest rank is a nonconvex problem, because the rank is not a convex function. But we can use the trace norm as a surrogate objective, resulting in the following trace norm approximation (a code sketch follows below):

$$\min_X \; \|X\|_{\mathrm{tr}} \quad \text{subject to} \quad \mathcal{A}(X) = b \tag{4.5}$$

This is an SDP, though this is not a trivial fact.
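In a modeling language, the trace norm (nuclear norm) heuristic is a one-liner. A minimal sketch with cvxpy (an assumption), recovering a low-rank matrix from random linear measurements; recovery is typical at this measurement level but not guaranteed.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
m, n, r, p = 8, 8, 2, 40
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-2 target
As = [rng.standard_normal((m, n)) for _ in range(p)]
b = np.array([np.trace(A.T @ L) for A in As])     # measurements A_i . L

X = cp.Variable((m, n))
constraints = [cp.trace(A.T @ X) == bi for A, bi in zip(As, b)]
cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
print(np.linalg.matrix_rank(X.value, tol=1e-3))   # typically 2, up to solver accuracy
```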

4.5 Conic Programs

4.5.1 Definition

A conic program is an optimization problem of the form:

$$\min_x \; c^T x \quad \text{s.t.} \quad Ax = b, \quad D(x) + d \in K$$

Here $c, x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$; $D : \mathbb{R}^n \to Y$ is a linear mapping and $d \in Y$ for a Euclidean space $Y$; and $K \subseteq Y$ is a closed convex cone. This is very similar to an LP; the only distinction is that the set of linear inequalities is replaced with conic inequalities, i.e., $D(x) + d \succeq_K 0$. Notice that if $K = \mathbb{S}^n_+$, we recover SDPs. Thus, this is a very broad class of problems.

4.5.2 Examples

Second-order cone program. A second-order cone program (SOCP) is an optimization problem of the form:

$$\min_x \; c^T x \quad \text{s.t.} \quad \|D_i x + d_i\|_2 \le e_i^T x + f_i, \; i = 1, \dots, p, \quad Ax = b$$

This is a conic program with a specific choice of $K$. In particular, it is built from second-order cones, defined as $Q = \{(x, t) : \|x\|_2 \le t\}$. From the definition, it is easy to see that

$$\|D_i x + d_i\|_2 \le e_i^T x + f_i \iff (D_i x + d_i, \, e_i^T x + f_i) \in Q_i$$

for appropriate dimensions; taking $K = Q_1 \times Q_2 \times \cdots \times Q_p$ then yields the conic program form.

It is easy to see that every LP is an SOCP. In addition, to see that every SOCP is an SDP, recall the Schur complement theorem: for $A$, $C$ symmetric with $C \succ 0$,

$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq 0 \iff A - B C^{-1} B^T \succeq 0.$$

Applying the theorem to the following matrix (with $t > 0$) gives

$$\begin{bmatrix} tI & x \\ x^T & t \end{bmatrix} \succeq 0 \iff tI - \frac{x x^T}{t} \succeq 0 \iff \|x\|_2 \le t.$$

Thus, we can convert each second-order cone constraint to a PSD constraint, as checked numerically in the sketch below.
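The SOC-to-SDP embedding is easy to sanity-check: build the block matrix above for random $(x, t)$ pairs and compare positive semidefiniteness against $\|x\|_2 \le t$. A small sketch (the helper name is illustrative):

```python
import numpy as np

def soc_matrix(x, t):
    """The block matrix [[t*I, x], [x^T, t]] from the Schur embedding."""
    n = x.size
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = t * np.eye(n)
    M[:n, n] = x
    M[n, :n] = x
    M[n, n] = t
    return M

rng = np.random.default_rng(4)
for _ in range(5):
    x = rng.standard_normal(3)
    t = rng.uniform(0.0, 3.0)
    psd = np.linalg.eigvalsh(soc_matrix(x, t)).min() >= -1e-9
    assert psd == (np.linalg.norm(x) <= t)  # PSD iff (x, t) lies in the cone
```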

4.6 Relationship between Programs

The relationship between linear programs (LPs), quadratic programs (QPs), second-order cone programs (SOCPs), semidefinite programs (SDPs), and conic programs (CPs) is shown in the figure below: each class contains the one before it.

[Figure: nested program classes, LP ⊆ QP ⊆ SOCP ⊆ SDP ⊆ CP.]

A second figure shows the relationship between convex and nonconvex problems: convex problems occupy only a small bubble within the much larger space of nonconvex problems.

[Figure: convex problems as a small bubble inside the space of nonconvex problems.]

Acknowledgements

The slides and scribe notes from previous years were consulted while preparing these notes.