Canonical Problem Forms
Ryan Tibshirani
Convex Optimization 10-725
Last time: optimization basics
- Optimization terminology (e.g., criterion, constraints, feasible points, solutions)
- Properties and first-order optimality
- Equivalent transformations (e.g., partial optimization, change of variables, eliminating equality constraints)
Outline

Today:
- Linear programs
- Quadratic programs
- Semidefinite programs
- Cone programs
Linear program

A linear program or LP is an optimization problem of the form

    min_x      c^T x
    subject to Dx ≤ d
               Ax = b

Observe that this is always a convex optimization problem.

First introduced by Kantorovich in the late 1930s and Dantzig in the 1940s. Dantzig's simplex algorithm gives a direct (noniterative) solver for LPs (later in the course we'll see interior point methods). Fundamental problem in convex optimization, with many diverse applications and a rich history.
Example: diet problem

Find the cheapest combination of foods that satisfies some nutritional requirements (useful for graduate students!):

    min_x      c^T x
    subject to Dx ≥ d
               x ≥ 0

Interpretation:
- c_j: per-unit cost of food j
- d_i: minimum required intake of nutrient i
- D_ij: content of nutrient i per unit of food j
- x_j: units of food j in the diet
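As a quick illustration, a toy instance of the diet problem (made-up costs and nutrient contents, two foods and two nutrients) can be solved with scipy's LP routine:

```python
# Toy diet problem solved as an LP with scipy.optimize.linprog
# (illustrative numbers only; any LP solver would do).
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])            # per-unit cost of each food
D = np.array([[1.0, 2.0],           # nutrient content per unit of food
              [2.0, 1.0]])
d = np.array([4.0, 4.0])            # minimum required intake of each nutrient

# linprog minimizes c^T x subject to A_ub x <= b_ub, so we flip the
# sign of the intake constraint D x >= d; x >= 0 is linprog's default bound
res = linprog(c, A_ub=-D, b_ub=-d)
print(res.x, res.fun)               # optimal diet and its cost
```

Note the sign flip: linprog only accepts "≤" inequalities, so Dx ≥ d becomes −Dx ≤ −d.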
Example: transportation problem

Ship commodities from given sources to destinations at minimum cost:

    min_x      Σ_{i=1}^m Σ_{j=1}^n c_ij x_ij
    subject to Σ_{j=1}^n x_ij ≤ s_i,  i = 1, ..., m
               Σ_{i=1}^m x_ij ≥ d_j,  j = 1, ..., n
               x ≥ 0

Interpretation:
- s_i: supply at source i
- d_j: demand at destination j
- c_ij: per-unit shipping cost from i to j
- x_ij: units shipped from i to j
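This also fits directly into linprog once the matrix of variables x_ij is flattened into a vector; here is a sketch on made-up data with two sources and two destinations:

```python
# Tiny transportation problem (2 sources, 2 destinations) as an LP,
# again via scipy.optimize.linprog; data is made up for illustration.
import numpy as np
from scipy.optimize import linprog

cost = np.array([[1.0, 2.0],        # c_ij: per-unit shipping cost i -> j
                 [3.0, 1.0]])
s = np.array([3.0, 3.0])            # supply at each source
d = np.array([2.0, 4.0])            # demand at each destination

m, n = cost.shape
c = cost.ravel()                    # variables x_ij, flattened row-major

# supply rows: sum_j x_ij <= s_i;  demand rows: -sum_i x_ij <= -d_j
A_supply = np.kron(np.eye(m), np.ones(n))
A_demand = -np.kron(np.ones(m), np.eye(n))
A_ub = np.vstack([A_supply, A_demand])
b_ub = np.concatenate([s, -d])

res = linprog(c, A_ub=A_ub, b_ub=b_ub)   # x >= 0 is the default bound
print(res.x.reshape(m, n), res.fun)      # optimal shipping plan and cost
```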
Example: basis pursuit

Given y ∈ R^n and X ∈ R^{n×p}, where p > n, suppose that we seek the sparsest solution to the underdetermined linear system Xβ = y. Nonconvex formulation:

    min_β      ‖β‖_0
    subject to Xβ = y

where recall ‖β‖_0 = Σ_{j=1}^p 1{β_j ≠ 0}, the l0 norm

The l1 approximation, often called basis pursuit:

    min_β      ‖β‖_1
    subject to Xβ = y
Basis pursuit is a linear program. Reformulation:

    min_β      ‖β‖_1          min_{β,z}  1^T z
    subject to Xβ = y    ⟺    subject to −z ≤ β ≤ z
                                         Xβ = y

(Check that this makes sense to you)
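The reformulation can be handed to a standard LP solver by stacking (β, z) into one variable; a minimal sketch on a toy underdetermined system:

```python
# Basis pursuit as the LP  min 1^T z  s.t.  -z <= beta <= z,  X beta = y,
# with stacked variable (beta, z); toy data for illustration.
import numpy as np
from scipy.optimize import linprog

X = np.array([[1.0, 2.0]])          # n = 1, p = 2 (underdetermined)
y = np.array([1.0])
n, p = X.shape

c = np.concatenate([np.zeros(p), np.ones(p)])   # objective: 1^T z

# beta - z <= 0  and  -beta - z <= 0  encode  |beta_j| <= z_j
A_ub = np.block([[np.eye(p), -np.eye(p)],
                 [-np.eye(p), -np.eye(p)]])
b_ub = np.zeros(2 * p)

A_eq = np.hstack([X, np.zeros((n, p))])         # X beta = y
bounds = [(None, None)] * p + [(0, None)] * p   # beta free, z >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
beta = res.x[:p]
print(beta, res.fun)                # l1-minimizing solution of X beta = y
```

On this instance the l1 solution is also the sparsest one, with a single nonzero entry.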
Example: Dantzig selector

Modification of the previous problem, where we allow for Xβ ≈ y (we don't require exact equality), the Dantzig selector:¹

    min_β      ‖β‖_1
    subject to ‖X^T(y − Xβ)‖_∞ ≤ λ

Here λ ≥ 0 is a tuning parameter. Again, this can be reformulated as a linear program (check this!)

¹ Candes and Tao (2007), "The Dantzig selector: statistical estimation when p is much larger than n"
Standard form

A linear program is said to be in standard form when it is written as

    min_x      c^T x
    subject to Ax = b
               x ≥ 0

Any linear program can be rewritten in standard form (check this!)
Convex quadratic program

A convex quadratic program or QP is an optimization problem of the form

    min_x      c^T x + (1/2) x^T Q x
    subject to Dx ≤ d
               Ax = b

where Q ⪰ 0, i.e., positive semidefinite

Note that this problem is not convex when Q ⋡ 0. From now on, when we say quadratic program or QP, we implicitly assume that Q ⪰ 0 (so the problem is convex)
Example: portfolio optimization

Construct a financial portfolio, trading off performance and risk:

    max_x      µ^T x − (γ/2) x^T Q x
    subject to 1^T x = 1
               x ≥ 0

Interpretation:
- µ: expected asset returns
- Q: covariance matrix of asset returns
- γ: risk aversion parameter
- x: portfolio holdings (percentages)
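A sketch of this QP on made-up data (two assets, γ = 1 assumed): scipy has no dedicated QP solver, but its general-purpose SLSQP method handles this tiny instance fine.

```python
# Toy portfolio QP solved with scipy's general-purpose SLSQP method
# (scipy has no dedicated QP solver; illustrative data, gamma = 1 assumed).
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.1, 0.2])           # expected asset returns
Q = np.diag([0.1, 0.2])             # covariance of asset returns
gamma = 1.0                         # risk aversion

def neg_objective(x):
    # minimize the negative of  mu^T x - (gamma/2) x^T Q x
    return -(mu @ x) + 0.5 * gamma * (x @ Q @ x)

cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1.0},)   # 1^T x = 1
bnds = [(0.0, None)] * 2                                     # x >= 0

res = minimize(neg_objective, x0=np.array([0.5, 0.5]),
               method='SLSQP', constraints=cons, bounds=bnds)
print(res.x)                        # optimal portfolio weights
```

For a larger or more serious instance one would use a real QP solver (e.g., via cvxpy) instead of a general NLP method.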
Example: support vector machines

Given y ∈ {−1, 1}^n, and X ∈ R^{n×p} having rows x_1, ..., x_n, recall the support vector machine or SVM problem:

    min_{β,β_0,ξ}  (1/2)‖β‖_2^2 + C Σ_{i=1}^n ξ_i
    subject to     ξ_i ≥ 0,  i = 1, ..., n
                   y_i(x_i^T β + β_0) ≥ 1 − ξ_i,  i = 1, ..., n

This is a quadratic program
Example: lasso

Given y ∈ R^n, X ∈ R^{n×p}, recall the lasso problem:

    min_β      ‖y − Xβ‖_2^2
    subject to ‖β‖_1 ≤ s

Here s ≥ 0 is a tuning parameter. Indeed, this can be reformulated as a quadratic program (check this!)

Alternative parametrization (called Lagrange, or penalized form):

    min_β  (1/2)‖y − Xβ‖_2^2 + λ‖β‖_1

Now λ ≥ 0 is a tuning parameter. And again, this can be rewritten as a quadratic program (check this!)
Standard form

A quadratic program is in standard form if it is written as

    min_x      c^T x + (1/2) x^T Q x
    subject to Ax = b
               x ≥ 0

Any quadratic program can be rewritten in standard form
Motivation for semidefinite programs

Consider linear programming again:

    min_x      c^T x
    subject to Dx ≤ d
               Ax = b

Can generalize by changing ≤ to a different (partial) order. Recall:
- S^n is the space of n × n symmetric matrices
- S^n_+ is the space of positive semidefinite matrices, i.e.,
  S^n_+ = {X ∈ S^n : u^T X u ≥ 0 for all u ∈ R^n}
- S^n_++ is the space of positive definite matrices, i.e.,
  S^n_++ = {X ∈ S^n : u^T X u > 0 for all u ∈ R^n \ {0}}
Facts about S^n, S^n_+, S^n_++

Basic linear algebra facts, where λ(X) = (λ_1(X), ..., λ_n(X)) denotes the eigenvalues:

    X ∈ S^n     ⟹  λ(X) ∈ R^n
    X ∈ S^n_+   ⟺  λ(X) ∈ R^n_+
    X ∈ S^n_++  ⟺  λ(X) ∈ R^n_++

We can define an inner product over S^n: given X, Y ∈ S^n,

    X • Y = tr(XY)

We can define a partial ordering over S^n: given X, Y ∈ S^n,

    X ⪰ Y  ⟺  X − Y ∈ S^n_+

Note: for x, y ∈ R^n, diag(x) ⪰ diag(y) ⟺ x ≥ y (recall, the latter is interpreted elementwise)
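These facts are easy to check numerically; a small sketch testing the semidefinite order via eigenvalues, and its reduction to the elementwise order for diagonal matrices:

```python
# Checking the facts above: X >= Y in the semidefinite order iff all
# eigenvalues of X - Y are nonnegative; for diagonal matrices this
# reduces to the elementwise order on the diagonals.
import numpy as np

def psd_geq(X, Y, tol=1e-10):
    # X - Y is symmetric here, so eigvalsh (for symmetric matrices) applies
    return bool(np.all(np.linalg.eigvalsh(X - Y) >= -tol))

X = np.array([[2.0, 1.0],
              [1.0, 2.0]])
Y = np.eye(2)
print(psd_geq(X, Y))                 # X - Y has eigenvalues 0 and 2

x = np.array([3.0, 5.0])
y = np.array([1.0, 2.0])
# diag(x) >= diag(y) in the PSD order iff x >= y elementwise
print(psd_geq(np.diag(x), np.diag(y)) == bool(np.all(x >= y)))
```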
Semidefinite program

A semidefinite program or SDP is an optimization problem of the form

    min_x      c^T x
    subject to x_1 F_1 + ... + x_n F_n ⪯ F_0
               Ax = b

Here F_j ∈ S^d, for j = 0, 1, ..., n, and A ∈ R^{m×n}, c ∈ R^n, b ∈ R^m. Observe that this is always a convex optimization problem

Also, any linear program is a semidefinite program (check this!)
Standard form

A semidefinite program is in standard form if it is written as

    min_X      C • X
    subject to A_i • X = b_i,  i = 1, ..., m
               X ⪰ 0

Any semidefinite program can be written in standard form (for a challenge, check this!)
Example: theta function

Let G = (N, E) be an undirected graph, N = {1, ..., n}, and
- ω(G): clique number of G
- χ(G): chromatic number of G

The Lovasz theta function:²

    ϑ(G) = max_X   11^T • X
           subject to I • X = 1
                      X_ij = 0, (i, j) ∉ E
                      X ⪰ 0

The Lovasz sandwich theorem: ω(G) ≤ ϑ(Ḡ) ≤ χ(G), where Ḡ is the complement graph of G

² Lovasz (1979), "On the Shannon capacity of a graph"
Example: trace norm minimization

Let A : R^{m×n} → R^p be a linear map,

    A(X) = (A_1 • X, ..., A_p • X)

for A_1, ..., A_p ∈ R^{m×n} (and where A_i • X = tr(A_i^T X)). Finding the lowest-rank solution to an underdetermined system, nonconvex:

    min_X      rank(X)
    subject to A(X) = b

Trace norm approximation:

    min_X      ‖X‖_tr
    subject to A(X) = b

This is indeed an SDP (but harder to show, requires duality ...)
Conic program

A conic program is an optimization problem of the form:

    min_x      c^T x
    subject to Ax = b
               D(x) + d ∈ K

Here:
- c, x ∈ R^n, and A ∈ R^{m×n}, b ∈ R^m
- D : R^n → Y is a linear map, d ∈ Y, for a Euclidean space Y
- K ⊆ Y is a closed convex cone

Both LPs and SDPs are special cases of conic programming. For LPs, K = R^n_+; for SDPs, K = S^n_+
Second-order cone program

A second-order cone program or SOCP is an optimization problem of the form:

    min_x      c^T x
    subject to ‖D_i x + d_i‖_2 ≤ e_i^T x + f_i,  i = 1, ..., p
               Ax = b

This is indeed a cone program. Why? Recall the second-order cone

    Q = {(x, t) : ‖x‖_2 ≤ t}

So we have

    ‖D_i x + d_i‖_2 ≤ e_i^T x + f_i  ⟺  (D_i x + d_i, e_i^T x + f_i) ∈ Q_i

for a second-order cone Q_i of appropriate dimensions. Now take K = Q_1 × ... × Q_p
Observe that every LP is an SOCP. Further, every SOCP is an SDP. Why? Turns out that

    ‖x‖_2 ≤ t  ⟺  [ tI   x ]
                   [ x^T  t ]  ⪰ 0

Hence we can write any SOCP constraint as an SDP constraint

The above is a special case of the Schur complement theorem:

    [ A    B ]
    [ B^T  C ]  ⪰ 0  ⟺  A − B C^{−1} B^T ⪰ 0

for A, C symmetric and C ≻ 0
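The SOC-to-SDP equivalence above is easy to verify numerically, by checking the smallest eigenvalue of the block matrix:

```python
# Checking that ||x||_2 <= t iff the block matrix [[t I, x], [x^T, t]]
# is positive semidefinite, via its smallest eigenvalue.
import numpy as np

def soc_matrix(x, t):
    # build [[t I, x], [x^T, t]] for x in R^n, t in R
    n = len(x)
    top = np.hstack([t * np.eye(n), x.reshape(-1, 1)])
    bot = np.hstack([x.reshape(1, -1), np.array([[t]])])
    return np.vstack([top, bot])

def in_soc(x, t, tol=1e-10):
    return bool(np.linalg.eigvalsh(soc_matrix(x, t)).min() >= -tol)

x = np.array([3.0, 4.0])
print(in_soc(x, 5.0))   # ||x||_2 = 5 <= 5: on the boundary of the cone
print(in_soc(x, 4.0))   # ||x||_2 = 5 > 4: outside the cone
```

(In fact the block matrix has eigenvalues t ± ‖x‖_2, plus t with multiplicity n − 1, which gives a direct proof of the equivalence.)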
Hey, what about QPs?

Finally, our old friend QPs sneak into the hierarchy. Turns out QPs are SOCPs, which we can see by rewriting a QP as

    min_{x,t}  c^T x + t
    subject to Dx ≤ d,  (1/2) x^T Q x ≤ t
               Ax = b

Now write

    (1/2) x^T Q x ≤ t  ⟺  ‖( (1/√2) Q^{1/2} x, (1 − t)/2 )‖_2 ≤ (1 + t)/2

Take a breath (phew!). Thus we have established the hierarchy

    LPs ⊂ QPs ⊂ SOCPs ⊂ SDPs ⊂ Conic programs

completing the picture we saw at the start
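The norm identity above can also be sanity-checked numerically on random instances (the algebra: squaring both sides and using ((1+t)² − (1−t)²)/4 = t):

```python
# Checking the identity  (1/2) x^T Q x <= t  iff
# ||( Q^{1/2} x / sqrt(2), (1-t)/2 )||_2 <= (1+t)/2
# on random PSD matrices Q, random x, and random t >= 0.
import numpy as np

rng = np.random.default_rng(0)

for _ in range(100):
    n = 3
    A = rng.standard_normal((n, n))
    Q = A @ A.T                      # random PSD matrix
    x = rng.standard_normal(n)
    t = rng.uniform(0.0, 10.0)

    # symmetric square root of Q via its eigendecomposition
    w, V = np.linalg.eigh(Q)
    Qhalf = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

    lhs = 0.5 * (x @ Q @ x) <= t
    vec = np.concatenate([Qhalf @ x / np.sqrt(2.0), [(1.0 - t) / 2.0]])
    rhs = np.linalg.norm(vec) <= (1.0 + t) / 2.0
    assert lhs == rhs

print("identity verified on 100 random instances")
```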
References and further reading
- D. Bertsimas and J. Tsitsiklis (1997), "Introduction to linear optimization", Chapters 1, 2
- A. Nemirovski and A. Ben-Tal (2001), "Lectures on modern convex optimization", Chapters 1-4
- S. Boyd and L. Vandenberghe (2004), "Convex optimization", Chapter 4