10-725/36-725: Convex Optimization                                        Spring 2015

Lecture 4: January 26
Lecturer: Javier Pena          Scribes: Vipul Singh, Shinjini Kundu, Chia-Yin Tsai

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

4.1 Introduction

Optimization is a huge class of problems, and within it there is a hierarchy of convex optimization problems. Today we will talk about linear programming, quadratic programming, second-order cone programming, and semidefinite programming. (A diagram of this hierarchy appears in the lecture slides.)
4.2 Linear Program

A linear program (LP) is a problem of the form

    min_x   c^T x
    subject to   Dx ≤ d
                 Ax = b

This is the simplest type of convex optimization problem. Recall that a convex optimization problem is a problem of the form

    min_x   f(x)
    subject to   g_i(x) ≤ d_i,  i = 1, ..., m
                 Ax = b

where f and the g_i's are convex functions. In an LP, the objective and all inequality constraints are linear, and linear functions are convex.

Aside: Linear programming has an interesting history. It is attributed to Dantzig in the 1940s, and it has a vast range of applications, especially in game theory.

Example: diet problem. Find the cheapest combination of foods that satisfies some nutritional requirements:

    min_x   c^T x
    subject to   Dx ≥ d
                 x ≥ 0

Here c_j is the per-unit cost of food j, d_i is the minimum required intake of nutrient i, D_ij is the content of nutrient i per unit of food j, and x_j is the number of units of food j in the diet.

Another example: transportation problem (see slides).

Another example: l1 minimization. This is a heuristic to find a sparse solution to an under-determined system of equations:

    min_x   ||x||_1
    subject to   Ax = b

where A is a fat m × n matrix (m < n), so there are fewer constraints than variables. Solving the combinatorial problem of finding the sparsest solution is NP-hard, but this LP recovers a sparse solution with high probability.

Another example: Dantzig selector. This tweaks the l1 minimization problem: assuming that b is not exactly Ax but Ax + ε, it allows some error in the solution.

Any linear programming problem can be written in standard form:

    min_x   c^T x
    subject to   Ax = b
                 x ≥ 0
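As an illustration, here is a tiny instance of the diet problem solved with `scipy.optimize.linprog` (the food costs and nutrient data below are made up for the example). `linprog` expects "≤" inequality constraints, so Dx ≥ d is passed as −Dx ≤ −d:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0, 1.5])          # per-unit cost of each food (hypothetical)
D = np.array([[1.0, 2.0, 0.0],         # nutrient content per unit of food
              [0.0, 1.0, 3.0]])
d = np.array([4.0, 6.0])               # minimum required intake of each nutrient

# minimize c^T x  subject to  Dx >= d, x >= 0
res = linprog(c, A_ub=-D, b_ub=-d, bounds=[(0, None)] * 3)
print(res.x)        # optimal diet
print(res.fun)      # minimum cost (8.0 for this data)
```

The solver returns a vertex of the feasible polyhedron; for this data the optimal diet uses only foods 2 and 3.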
For example, if we have

    min_x   c^T x
    subject to   Dx ≥ d

we can add slack variables s and write it as

    min_{x,s}   c^T x
    subject to   Dx − s = d
                 s ≥ 0

A free variable x can in turn be replaced by x = y − z with y, z ≥ 0.

Optimality conditions: x* is optimal if and only if −c lies in the normal cone to the feasible set at x*, which is equivalent to stating that c = A^T y + s for some y and s ≥ 0 with s^T x* = 0. The optimality conditions are easier to characterize when the problem is in standard form.

4.3 Quadratic Program

4.3.1 Convex quadratic programming

This is an optimization problem of the form

    min_x   c^T x + (1/2) x^T Q x
    subject to   Dx ≤ d
                 Ax = b

where Q is symmetric and positive semidefinite. The problem is convex if and only if the matrix Q is positive semidefinite.

Example: portfolio optimization. A model to construct a financial portfolio with an optimal performance/risk tradeoff:

    max_x   μ^T x − (γ/2) x^T Q x
    subject to   1^T x = 1
                 x ≥ 0

Here μ is the vector of expected asset returns, Q is the covariance matrix of asset returns, γ is the risk aversion, and x is the vector of portfolio holdings (percentages).

Another example: support vector machines. The objective is quadratic.
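A small numerical sketch of the portfolio QP, with made-up return and covariance data, using `scipy.optimize.minimize` (a general NLP solver standing in for a dedicated QP solver):

```python
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.10, 0.08, 0.05])       # expected asset returns (hypothetical)
Q = np.array([[0.10, 0.02, 0.01],       # covariance of returns (PSD)
              [0.02, 0.08, 0.01],
              [0.01, 0.01, 0.05]])
gamma = 5.0                             # risk aversion

# maximize mu^T x - (gamma/2) x^T Q x  ==  minimize the negated objective
obj = lambda x: -(mu @ x - 0.5 * gamma * x @ Q @ x)
cons = [{"type": "eq", "fun": lambda x: np.sum(x) - 1.0}]   # 1^T x = 1
res = minimize(obj, x0=np.ones(3) / 3, method="SLSQP",
               bounds=[(0, None)] * 3, constraints=cons)
print(res.x)   # optimal holdings: nonnegative, summing to 1
```

Because the objective is concave (Q is PSD and γ > 0) and the constraints are linear, the first-order conditions discussed below are both necessary and sufficient here.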
Why is it a quadratic minimization problem when the slack penalty terms C ξ_i do not appear squared? Because the Q matrix is allowed to have many zero entries: the quadratic part acts only on the weight vector, while the slack variables enter linearly.

4.3.2 Standard form

A quadratic program is in standard form if it is written as:

    min_x   c^T x + (1/2) x^T Q x
    subject to   Ax = b
                 x ≥ 0

x* is an optimal solution
    ⟺  −(c + Q x*) ∈ N_C(x*)
    ⟺  Q x* + c = A^T y + s for some y and s ≥ 0 such that s^T x* = 0.

We see that for linear and convex quadratic programs, all we need to characterize an optimal solution are first-order conditions. Comparing with the optimality condition for linear programs, we see that if we have a solver for linear programs, we might be able to tweak it a bit to solve convex quadratic programs.

4.4 Semi-definite programming

Semi-definite programs are a much bigger subset of convex optimization problems than convex quadratic programs. We can extend from linear programs to semi-definite programs by changing the order (≤) involved in the inequality constraint Dx ≤ d to a different kind of order in some vector space. We now work with the vector space S^n.

4.4.1 Notation and Definitions

S^n is the vector space of symmetric n × n real matrices. Inside this vector space resides the cone of positive semi-definite matrices:

    S^n_+ := {X ∈ S^n : u^T X u ≥ 0 for all u ∈ R^n}

We will be using a couple of facts from linear algebra:

- The eigenvalues of a symmetric matrix are always real.
- The eigenvalues of a positive semi-definite matrix are always non-negative.

The canonical inner product in S^n is:

    ⟨X, Y⟩ = X • Y := trace(XY) = Σ_{i=1}^n Σ_{j=1}^n X_ij Y_ij

The trace satisfies the property that if the product ABC is well-defined and the result is a square matrix, then tr(ABC) = tr(BCA).

S^n_+ is a closed convex cone. The interior of S^n_+ is the cone of positive definite matrices, defined as:

    S^n_++ := {X ∈ S^n : u^T X u > 0 for all u ∈ R^n \ {0}}
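The linear-algebra facts above are easy to verify numerically on random matrices (illustrative check only, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); X = (A + A.T) / 2      # symmetric
B = rng.standard_normal((4, 4)); Y = (B + B.T) / 2      # symmetric

# (1) trace inner product: trace(XY) = sum_ij X_ij Y_ij for symmetric X, Y
assert np.isclose(np.trace(X @ Y), np.sum(X * Y))

# (2) cyclic property of the trace: tr(M1 M2 M3) = tr(M2 M3 M1)
M1 = rng.standard_normal((3, 4))
M2 = rng.standard_normal((4, 2))
M3 = rng.standard_normal((2, 3))
assert np.isclose(np.trace(M1 @ M2 @ M3), np.trace(M2 @ M3 @ M1))

# (3) a PSD matrix has real, nonnegative eigenvalues (eigvalsh returns
#     real eigenvalues for symmetric input)
P = A @ A.T                                             # PSD by construction
assert np.min(np.linalg.eigvalsh(P)) >= -1e-10
u = rng.standard_normal(4)
assert u @ P @ u >= -1e-10                              # u^T P u >= 0
```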
X ∈ S^n_++ ⟺ λ(X) ∈ R^n_++, where λ(X) is the map giving the eigenvalues of X.

Loewner ordering: given X, Y ∈ S^n,

    X ⪰ Y ⟺ X − Y ∈ S^n_+

4.4.2 The Optimization Problem

A semi-definite program (SDP) is of the form:

    min_x   c^T x
    subject to   x_1 F_1 + · · · + x_n F_n ⪯ F_0
                 Ax = b

Here F_j ∈ S^d for j = 0, 1, ..., n, and A ∈ R^{m×n}, c ∈ R^n, b ∈ R^m.

In the LP formulation we had the constraint Dx ≤ d, i.e., Σ_{j=1}^n x_j d_j ≤ d, where the d_j's are the columns of D. This is a system of linear inequalities. For the SDP, we have replaced d_j with F_j, d with F_0, and the order ≤ with ⪯. By analogy, this is called a system of Linear Matrix Inequalities (LMIs). A semidefinite program is a convex optimization problem.

4.4.3 Standard form

    min_X   C • X
    subject to   A_i • X = b_i,  i = 1, ..., m
                 X ⪰ 0

where A_1, ..., A_m and C are given symmetric n × n matrices, and X ∈ S^n is the matrix variable.

Every linear program is an SDP. We see this using the following:

    x ∈ R^n_+ ⟺ Diag(x) ⪰ 0
    c^T x = Diag(c) • Diag(x)

The objective as well as the constraints can be written in matrix form; the requirement that the off-diagonal elements be 0 can be enforced by equality constraints.

4.4.4 History of semidefinite programming

- Eigenvalue optimization and linear matrix inequality (LMI) problems in control (1960s–1970s)
- Lovász theta function in information theory (1979) [lovasz]
- Interior-point algorithms for SDP (1980s–1990s)
- Advances in theory, algorithms, and applications (1990s)
- New algorithms and applications in data and imaging science (2000s–)
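The LP-to-SDP embedding can be checked directly for a concrete vector (illustrative data): Diag(x) is PSD exactly when x ≥ 0 componentwise, and the linear objective c^T x matches the trace inner product of the diagonal liftings:

```python
import numpy as np

x = np.array([1.0, 0.5, 2.0])       # a nonnegative vector
c = np.array([3.0, -1.0, 4.0])      # an LP cost vector
Dx, Dc = np.diag(x), np.diag(c)

# x >= 0  =>  Diag(x) is PSD (its eigenvalues are exactly the entries of x)
assert np.min(np.linalg.eigvalsh(Dx)) >= 0

# c^T x = Diag(c) . Diag(x) under the trace inner product
assert np.isclose(c @ x, np.trace(Dc @ Dx))

# a vector with a negative entry does NOT lift to a PSD matrix
assert np.min(np.linalg.eigvalsh(np.diag([1.0, -0.5]))) < 0
```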
4.5 Applications of semidefinite programming

4.5.1 Theta function

Let G = (N, E) be a graph, where N is the set of nodes and E is the set of edges.

- ω(G) is the clique number of G: the size of the largest set of nodes that are completely connected.
- χ(G) is the chromatic number of G: the minimum number of colors that suffice to color the nodes of the graph.

The theta function is defined by the SDP:

    θ(G) := max_X   11^T • X
    subject to   I • X = 1
                 X_ij = 0, (i, j) ∈ E
                 X ⪰ 0

Lovász sandwich theorem:  ω(G) ≤ θ(G) ≤ χ(G)

4.5.2 Nuclear norm minimization

This is similar to l1-norm minimization for vectors:

    min_X   ||X||_tr
    subject to   A(X) = b

Here A : R^{m×n} → R^p is a linear map and b ∈ R^p. The nuclear norm is ||X||_tr = ||σ(X)||_1, the sum of the singular values of X. The dual of the nuclear norm is the operator norm:

    ||X||_op = σ_max(X) = max { ||Xu||_2 : ||u||_2 ≤ 1 }

(Note: the duality is like that between the l_p and l_q norms for vectors.)

Example: the Netflix challenge, where we want to find a low-rank matrix consistent with the observed entries.

The key observation for the proof that this is a semidefinite program:

Observation. For Y ∈ R^{m×n},

    ||Y||_op ≤ 1  ⟺  Y Y^T ⪯ I_m  ⟺  [ I_m   Y  ]
                                       [ Y^T  I_n ]  ⪰ 0

(a special case of the Schur complement).
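The duality between the nuclear and operator norms can be seen numerically: ||X||_tr = max { trace(X^T Y) : ||Y||_op ≤ 1 }, and the maximum is attained at Y = U V^T from a thin SVD X = U diag(σ) V^T (a sanity check on random data, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD

nuc = sigma.sum()              # nuclear norm = sum of singular values
Y = U @ Vt                     # dual-optimal Y: all singular values equal 1
assert np.isclose(np.linalg.norm(Y, 2), 1.0)           # ||Y||_op = 1
assert np.isclose(np.trace(X.T @ Y), nuc)              # attains ||X||_tr

# any other Y' with ||Y'||_op <= 1 does no better
Yp = rng.standard_normal((5, 3))
Yp /= np.linalg.norm(Yp, 2)
assert np.trace(X.T @ Yp) <= nuc + 1e-10
```

This mirrors the vector case, where ||x||_1 = max { x^T y : ||y||_inf ≤ 1 }.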
Using the observation,

    ||X||_tr = max_Y { trace(X^T Y) : ||Y||_op ≤ 1 }

             = max_Y   trace(X^T Y)
               subject to   [ I_m   Y  ]
                            [ Y^T  I_n ]  ⪰ 0

             = min_{W_1, W_2}   (1/2) (I_m • W_1 + I_n • W_2)
               subject to   [ W_1   X  ]
                            [ X^T  W_2 ]  ⪰ 0

by SDP duality. The nuclear norm minimization problem therefore becomes an SDP:

    min_{X, W_1, W_2}   (1/2) (I_m • W_1 + I_n • W_2)
    subject to   A(X) = b
                 [ W_1   X  ]
                 [ X^T  W_2 ]  ⪰ 0

*Schur complement theorem: for a block matrix with A and C symmetric and C ≻ 0,

    [ A    B ]
    [ B^T  C ]  ⪰ 0  ⟺  A − B C^{-1} B^T ⪰ 0

4.6 Conic programming

LP and SDP are special cases of conic programming. A conic program is of the form

    min_x   c^T x
    subject to   d − Dx ∈ K
                 Ax = b

where K is a closed convex cone.

4.6.1 Second-order cone programming (SOCP)

    min_x   c^T x
    subject to   d − Dx ∈ Q
                 Ax = b

where Q = Q_{n_1} × · · · × Q_{n_r}, and the second-order cone Q_n is defined as

    Q_n := { x = (x_0, x̄) ∈ R × R^{n−1} : ||x̄||_2 ≤ x_0 }
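The Schur-complement observation used in the derivation above — ||Y||_op ≤ 1 if and only if the block matrix [I_m, Y; Y^T, I_n] is PSD — can be verified numerically (random data, illustrative only):

```python
import numpy as np

def block(Y):
    """Build the symmetric block matrix [[I_m, Y], [Y^T, I_n]]."""
    m, n = Y.shape
    return np.block([[np.eye(m), Y], [Y.T, np.eye(n)]])

rng = np.random.default_rng(2)
Y = rng.standard_normal((3, 2))
Y /= np.linalg.norm(Y, 2)          # rescale so ||Y||_op = 1 exactly

# ||Y||_op = 1: the block matrix is PSD (smallest eigenvalue is 1 - sigma_max = 0)
assert np.min(np.linalg.eigvalsh(block(Y))) >= -1e-10

# ||1.5 Y||_op = 1.5 > 1: the block matrix is no longer PSD
assert np.min(np.linalg.eigvalsh(block(1.5 * Y))) < 0
```

The eigenvalues of the block matrix are 1 ± σ_i(Y) (plus extra ones), which is exactly why the PSD condition encodes σ_max(Y) ≤ 1.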
Standard form:

    min_x   c^T x
    subject to   Ax = b
                 x ∈ Q

for Q = Q_{n_1} × · · · × Q_{n_r}. We have LP ⊆ SOCP ⊆ SDP.

4.6.1.1 Convex QCQP

Consider a convex quadratic inequality x^T Q x + q^T x + l ≤ 0, where Q = L L^T ∈ S^n_+, q ∈ R^n, and l ∈ R. It can be recast as

    || [ L^T x ; (1 + q^T x + l)/2 ] ||_2 ≤ (1 − q^T x − l)/2

(squaring both sides and cancelling terms recovers the original inequality). More generally, any constraint of the form ||Ax + b||_2 ≤ e^T x + f is a second-order cone constraint: (e^T x + f, Ax + b) ∈ Q. Therefore a QCQP

    min_x   x^T Q_0 x + q_0^T x
    subject to   x^T Q_i x + q_i^T x + l_i ≤ 0,  i = 1, ..., r

can be rewritten as an SOCP if Q_i ⪰ 0 for i = 0, 1, ..., r, since all of the inequalities can be transformed into second-order cone constraints.

4.6.1.2 Rewriting a second-order cone in terms of an SDP

We want to rewrite ||x̄||_2 ≤ x_0. By the Schur complement theorem (for x_0 > 0),

    [ x_0 I   x̄  ]
    [ x̄^T    x_0 ]  ⪰ 0  ⟺  x_0 − x̄^T (x_0 I)^{-1} x̄ ≥ 0  ⟺  ||x̄||_2 ≤ x_0

So any second-order cone constraint can be rewritten as a semidefinite constraint, which confirms SOCP ⊆ SDP.

References

[lovasz] Lovász, L. On the Shannon capacity of a graph. IEEE Transactions on Information Theory, Vol. 25, No. 1 (Jan. 1979), pp. 1–7.