Disciplined Convex Programming


Stephen Boyd, Michael Grant
Electrical Engineering Department, Stanford University
University of Pennsylvania, 3/30/07

Outline

- convex optimization
- checking convexity via convex calculus
- convex optimization solvers
- efficient solution via problem transformations
- disciplined convex programming
- examples
  - bounding portfolio risk
  - computing probability bounds
  - antenna array beamforming
  - l1-regularized logistic regression

Optimization

optimization problem with variable x ∈ R^n:

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1, ..., m
                h_i(x) = 0,  i = 1, ..., p

- our ability to solve varies widely; it depends on properties of the f_i, h_i
- for f_i, h_i affine (linear plus constant) we get a linear program (LP); can solve very efficiently
- even simple-looking, relatively small problems with nonlinear f_i, h_i can be intractable

Convex optimization

convex optimization problem:

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1, ..., m
                Ax = b

- objective and inequality constraint functions f_i are convex: for all x, y, θ ∈ [0, 1],

      f_i(θx + (1 − θ)y) ≤ θ f_i(x) + (1 − θ) f_i(y)

- roughly speaking, the graphs of the f_i curve upward
- equality constraint functions are affine, so can be written as Ax = b

Convex optimization

- a subclass of optimization problems that includes LP as a special case
- convex problems can look very difficult (nonlinear, even nondifferentiable), but like LP can be solved very efficiently
- convex problems come up more often than was once thought
- many applications recently discovered in control, combinatorial optimization, signal processing, communications, circuit design, machine learning, statistics, finance, ...

General approaches to using convex optimization

- pretend/assume/hope the f_i are convex and proceed
  - easy on the user (problem specifier), but loses many benefits of convex optimization
- verify the problem is convex before attempting solution
  - but verification for a general problem description is hard, and often fails
- construct the problem as convex from the outset
  - user needs to follow a restricted set of rules and methods
  - convexity verification is automatic
- each has its advantages, but we focus on the third approach

How can you tell if a problem is convex?

- need to check convexity of a function
- approaches:
  - use the basic definition, or first- or second-order conditions, e.g., ∇²f(x) ⪰ 0
  - via convex calculus: construct f using
    - a library of basic examples or atoms that are convex
    - calculus rules or transformations that preserve convexity
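
For instance, a quadratic f(x) = x'Qx + q'x + r has Hessian Q + Q', so the second-order condition reduces to checking that a constant matrix is positive semidefinite. A minimal MATLAB sketch of that numerical check, on a made-up matrix Q (illustration only, not part of the slides):

    % check convexity of f(x) = x'*Q*x + q'*x + r via the second-order condition
    Q  = [2 -1; -1 2];            % hypothetical matrix
    Qs = (Q + Q')/2;              % only the symmetric part matters
    if min(eig(Qs)) >= -1e-9      % Hessian Q + Q' is PSD up to numerical tolerance
        disp('f is convex');
    else
        disp('f is not convex');
    end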

Convex functions: basic examples

- x^p for p ≥ 1 or p ≤ 0; −x^p for 0 ≤ p ≤ 1 (both on x > 0)
- e^x, −log x, x log x (the latter two on x > 0)
- a^T x + b
- x^T x;  x^T x / y (for y > 0);  (x^T x)^{1/2}
- ‖x‖ (any norm)
- max(x_1, ..., x_n),  log(e^{x_1} + ... + e^{x_n})
- −log Φ(x)  (Φ is the Gaussian CDF)
- log det X^{−1}  (for X ≻ 0)

Calculus rules

- nonnegative scaling: if f is convex and α ≥ 0, then αf is convex
- sum: if f and g are convex, so is f + g
- affine composition: if f is convex, so is f(Ax + b)
- pointwise maximum: if f_1, ..., f_m are convex, so is f(x) = max_i f_i(x)
- partial minimization: if f(x, y) is convex and C is convex, then g(x) = inf_{y∈C} f(x, y) is convex
- composition: if h is convex and increasing, and f is convex, then g(x) = h(f(x)) is convex (there are several other composition rules)
- ... and many others (but the rules above will get you quite far)

Examples

- piecewise-linear function: f(x) = max_{i=1,...,k} (a_i^T x + b_i)
- l1-regularized least-squares cost: ‖Ax − b‖_2² + λ‖x‖_1, with λ ≥ 0
- sum of largest k elements of x: f(x) = x_[1] + ... + x_[k]
- log-barrier: −Σ_{i=1}^m log(−f_i(x))  (on {x | f_i(x) < 0}, f_i convex)
- distance to a convex set C: f(x) = dist(x, C) = inf_{y∈C} ‖x − y‖_2
- note: except for the log-barrier, these functions are nondifferentiable...
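
Several of these appear directly as atoms in a DCP tool such as CVX, so they can be used without any manual smoothing. A small illustrative sketch (the data A, b, C, d and the scalars lambda, k, n are assumed to exist in the workspace; sum_square, sum_largest, norm, and max are standard CVX atoms):

    cvx_begin
        variable x(n)
        minimize( sum_square(A*x - b) + lambda*norm(x,1) + sum_largest(x,k) )
        subject to
            max(C*x + d) <= 1;   % piecewise-linear constraint: max_i (c_i'*x + d_i) <= 1
    cvx_end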

How do you solve a convex problem?

- use someone else's ("standard") solver (LP, QP, SDP, ...)
  - easy, but your problem must be in a standard form
  - cost of solver development amortized across many users
- write your own (custom) solver
  - lots of work, but can take advantage of special structure
- transform your problem into a standard form, and use a standard solver
  - extends the reach of problems that can be solved using standard solvers
  - transformation can be hard to find, cumbersome to carry out
- this talk: methods to formalize and automate the last approach

General convex optimization solvers

- subgradient, bundle, proximal, ellipsoid methods
- mostly developed in the Soviet Union, 1960s-1970s
- are universal convex optimization solvers that work even for nondifferentiable f_i
- ellipsoid method is efficient in theory (i.e., polynomial time)
- all can be slow in practice

Interior-point convex optimization solvers

- rapid development since the 1990s, but some ideas can be traced to the 1960s
- can handle smooth f_i (e.g., LP, QP, GP) and problems in conic form (SOCP, SDP)
- are extremely efficient, typically requiring a few tens of iterations, almost independent of problem type and size
- each iteration involves solving a set of linear equations (a least-squares problem) with the same size and structure as the problem
- method of choice when applicable

What if interior-point methods can't handle my problem?

- example: l1-regularized least-squares (used in machine learning):

      minimize  ‖Ax − b‖_2² + λ‖x‖_1

- a convex problem, but the objective is nondifferentiable, so we cannot directly use an interior-point method (IPM)
- basic idea: transform the problem, possibly adding new variables and constraints, so that an IPM can be used
- even though the transformed problem has more variables and constraints, we can solve it very efficiently via IPM

Example: l1-regularized least-squares

- original problem, with n variables, no constraints:

      minimize  ‖Ax − b‖_2² + λ‖x‖_1

- introduce a new variable t ∈ R^n, and new constraints |x_i| ≤ t_i:

      minimize    x^T(A^T A)x − 2(A^T b)^T x + λ 1^T t
      subject to  x ⪯ t,  −t ⪯ x

- a problem with 2n variables, 2n constraints, but the objective and constraint functions are smooth, so an IPM can be used
- key point: the problems are equivalent (if we solve one, we can easily get the solution of the other)
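
The transformed problem is a smooth QP in the stacked variable z = (x, t), so any generic QP solver applies. A minimal sketch using MATLAB's quadprog (requires the Optimization Toolbox; A, b, lambda are assumed to exist; this is an illustration, not the solver the talk's tools call):

    [m0, n] = size(A);
    H = blkdiag(2*(A'*A), zeros(n));            % 0.5*z'*H*z = x'*(A'*A)*x
    f = [-2*A'*b; lambda*ones(n,1)];            % linear term (constant b'*b dropped)
    G = [eye(n), -eye(n); -eye(n), -eye(n)];    % x - t <= 0  and  -x - t <= 0
    h = zeros(2*n, 1);
    z = quadprog(H, f, G, h);                   % solve the smooth QP
    x = z(1:n);                                 % recover the original variable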

Efficient solution via problem transformations

- start with a convex optimization problem P_0, possibly with nondifferentiable objective or constraint functions
- carry out a sequence of equivalence transformations to yield a problem P_K that can be handled by an IP solver:

      P_0 → P_1 → ... → P_K

- solve P_K efficiently
- transform the solution of P_K back to a solution of the original problem P_0
- P_K often has more variables and constraints than P_0, but its special structure, and the efficiency of IPMs, more than compensate

Convex calculus rules and problem transformations

- for most of the convex calculus rules, there is an associated problem transformation that "undoes" the rule
- example: when we encounter max{f_1(x), f_2(x)}, we
  - replace it with a new variable t
  - add new (convex) constraints f_1(x) ≤ t, f_2(x) ≤ t
- example: when we encounter h(f(x)), we
  - replace it with h(t)
  - add a new (convex) constraint f(x) ≤ t
- these transformations look trivial, but are not
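
As a concrete (illustrative) instance of the first rule, the problem

    minimize  max{ ‖A_1 x − b_1‖_2, ‖A_2 x − b_2‖_2 }

becomes the equivalent problem

    minimize    t
    subject to  ‖A_1 x − b_1‖_2 ≤ t,  ‖A_2 x − b_2‖_2 ≤ t

with variables x and t, which is an SOCP that an interior-point solver handles directly.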

From proof of convexity to IPM-compatible problem

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1, ..., m
                Ax = b

- when you construct the f_i from atoms and convex calculus rules, you have a mathematical proof that the problem is convex
- the same construction gives a sequence of problem transformations that yields a problem containing only atoms and equality constraints
- if the atoms are IPM-compatible, our constructive proof automatically gives us an equivalent problem that is IPM-compatible

Disciplined convex programming

- specify the convex problem in natural form
  - declare optimization variables
  - form convex objective and constraints using a specific set of atoms and calculus rules
- problem is convex-by-construction
- easy to parse, automatically transform to IPM-compatible form, solve, and transform back
- implemented using object-oriented methods and/or compiler-compilers

Example (cvx)

convex problem, with variable x ∈ R^n:

    minimize    ‖Ax − b‖_2 + λ‖x‖_1
    subject to  Fx ⪯ g

cvx specification:

    cvx_begin
        variable x(n)    % declare vector variable
        minimize( norm(A*x - b, 2) + lambda*norm(x, 1) )
        subject to
            F*x <= g
    cvx_end
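
To run this specification, the problem data and the parameter lambda must exist in the MATLAB workspace before cvx_begin. A minimal sketch with made-up random data (the dimensions and values are arbitrary, for illustration only):

    m = 50; n = 20; p = 10;           % hypothetical problem dimensions
    A = randn(m, n);  b = randn(m, 1);
    F = randn(p, n);
    x0 = randn(n, 1);
    g = F*x0 + rand(p, 1);            % guarantees the constraint set is nonempty
    lambda = 0.1;                     % regularization weight
    % ... followed by the cvx_begin ... cvx_end block above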

When cvx processes this specification, it

- verifies convexity of the problem
- generates an equivalent IPM-compatible problem
- solves it using SDPT3 or SeDuMi
- transforms the solution back to the original problem

The cvx code is easy to read, understand, and modify.

The same example, transformed by hand

transform the problem to an SOCP, call SeDuMi, reconstruct the solution:

    % set up big matrices
    [m,n] = size(A); [p,n] = size(F);
    AA = [ speye(n), -speye(n), speye(n), sparse(n,p+m+1) ; ...
           F, sparse(p,2*n), speye(p), sparse(p,m+1) ; ...
           A, sparse(m,2*n+p), speye(m), sparse(m,1) ];
    bb = [ zeros(n,1) ; g ; b ];
    cc = [ zeros(n,1) ; lambda*ones(2*n,1) ; zeros(m+p,1) ; 1 ];
    K.f = m; K.l = 2*n+p; K.q = m+1;      % specify cone
    xx = sedumi( AA, bb, cc, K );         % solve SOCP
    x = xx(1:n);                          % extract solution

History

- general purpose optimization modeling systems AMPL, GAMS (1970s)
- systems for SDPs/LMIs (1990s): sdpsol (Wu, Boyd), lmilab (Gahinet, Nemirovsky), lmitool (El Ghaoui)
- yalmip (Löfberg 2000)
- automated convexity checking (Crusius PhD thesis 2002)
- disciplined convex programming (DCP) (Grant, Boyd, Ye 2004)
- cvx (Grant, Boyd, Ye 2005)
- cvxopt (Dahl, Vandenberghe 2005)
- ggplab (Mutapcic, Koh, et al. 2006)

Summary

- the bad news:
  - you can't just call a convex optimization solver, hoping for the best; convex optimization is not a plug-and-play or "try my code" method
  - you can't just type in a problem description, hoping it's convex (and that a sophisticated analysis tool will recognize it)
- the good news:
  - by learning and following a modest set of atoms and rules, you can specify a problem in a form very close to its natural mathematical form
  - you can simultaneously verify convexity of the problem and automatically generate an IPM-compatible equivalent problem

Examples

- bounding portfolio risk
- computing probability bounds
- antenna array beamforming
- l1-regularized logistic regression

Portfolio risk bounding

- portfolio of n assets invested for a single period
- w_i is the amount of investment in asset i
- the asset returns form a random vector r with mean r̄ and covariance Σ
- the portfolio return is the random variable r^T w
- mean portfolio return is r̄^T w; variance is V = w^T Σ w
- value at risk and probability of loss are related to the portfolio variance V

Risk bound with uncertain covariance

- now suppose:
  - w is known (and fixed)
  - we have only partial information about Σ, i.e., L_ij ≤ Σ_ij ≤ U_ij, i, j = 1, ..., n
- problem: how large can the portfolio variance V = w^T Σ w be?

Risk bound via semidefinite programming

- can get a (tight) bound on V via semidefinite programming (SDP):

      maximize    w^T Σ w
      subject to  Σ ⪰ 0
                  L_ij ≤ Σ_ij ≤ U_ij

- the variable is the matrix Σ = Σ^T; Σ ⪰ 0 means Σ is positive semidefinite
- many extensions possible, e.g., optimize the portfolio w with a worst-case variance limit

cvx specification

    cvx_begin
        variable Sigma(n,n) symmetric
        maximize( w'*Sigma*w )
        subject to
            Sigma == semidefinite(n);   % Sigma is positive semidefinite
            Sigma >= L; Sigma <= U;
    cvx_end

Example

- portfolio with n = 4 assets; variance bounding with sign constraints on Σ
- w = (1, 2, .5, .5)
- Σ has unit diagonal; each off-diagonal entry is either sign-constrained (marked + or −) or of unknown sign (marked ?), e.g., Σ_12 ≥ 0, Σ_23 ≤ 0, ...
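
In cvx, such sign information is added as extra elementwise constraints on top of the SDP specification above. A sketch (the particular entries constrained here are illustrative, chosen to match the description, not necessarily the exact pattern from the slide):

    cvx_begin
        variable Sigma(4,4) symmetric
        maximize( w'*Sigma*w )
        subject to
            Sigma == semidefinite(4);
            diag(Sigma) == ones(4,1);   % unit diagonal
            Sigma(1,2) >= 0;            % example sign constraints on off-diagonal entries
            Sigma(2,3) <= 0;
    cvx_end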

Result

- (global) maximum value of V is 10.1, with

      Σ = [ 1.00   0.79   0.00   0.53
            0.79   1.00  −0.59   0.00
            0.00  −0.59   1.00   0.51
            0.53   0.00   0.51   1.00 ]

  (which has rank 3, so the constraint Σ ⪰ 0 is active)
- Σ = I yields V = 5.5
- Σ = [(L + U)/2]_+ yields V = 6.75  ([·]_+ is the positive semidefinite part)

Computing probability bounds

- random variable (X, Y) ∈ R² with N(0, 1) marginal distributions
- X, Y uncorrelated
- question: how large (small) can Prob(X ≤ 0, Y ≤ 0) be?
- if (X, Y) ~ N(0, I), Prob(X ≤ 0, Y ≤ 0) = 0.25

Probability bounds via LP

- discretize the distribution as p_ij, i, j = 1, ..., n, over the region [−3, 3]²
- x_i = y_i = 6(i − 1)/(n − 1) − 3, i = 1, ..., n

      maximize (minimize)  Σ_{i,j=1}^{n/2} p_ij
      subject to           p_ij ≥ 0,  i, j = 1, ..., n
                           Σ_{i=1}^n p_ij = a e^{−y_j²/2},  j = 1, ..., n
                           Σ_{j=1}^n p_ij = a e^{−x_i²/2},  i = 1, ..., n
                           Σ_{i,j=1}^n p_ij x_i y_j = 0

- with variable p ∈ R^{n×n}, a = 2.39/(n − 1)

cvx specification

    cvx_begin
        variable p(n,n)
        maximize( sum(sum(p(1:n/2,1:n/2))) )
        subject to
            p >= 0;
            sum(p,1) == a*exp(-y.^2/2)';
            sum(p,2) == a*exp(-x.^2/2);
            sum(sum(p.*(x*y'))) == 0;
    cvx_end
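
The specification assumes the grid vectors x, y, the constant a, and an even n are already defined. A minimal setup sketch consistent with the discretization above (x and y as column vectors, so that x*y' is the n-by-n outer product used in the correlation constraint):

    n = 100;                    % grid size (must be even here)
    x = linspace(-3, 3, n)';    % x_i = 6(i-1)/(n-1) - 3
    y = x;                      % same grid for y
    a = 2.39/(n - 1);           % normalization for the discretized N(0,1) marginals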

Gaussian: (X, Y) ~ N(0, I); Prob(X ≤ 0, Y ≤ 0) = 0.25
[figure: discretized joint density on [−3, 3]²]

Distribution that minimizes Prob(X ≤ 0, Y ≤ 0): Prob(X ≤ 0, Y ≤ 0) = 0.03
[figure: discretized joint density on [−3, 3]²]

Distribution that maximizes Prob(X ≤ 0, Y ≤ 0): Prob(X ≤ 0, Y ≤ 0) = 0.47
[figure: discretized joint density on [−3, 3]²]

Antenna array beamforming

- n omnidirectional antenna elements in the plane, at positions (x_i, y_i)
- unit plane wave (λ = 1) incident from angle θ
- the ith element has (demodulated) signal e^{j(x_i cos θ + y_i sin θ)}  (j = √−1)

- combine the antenna element signals using complex weights w_i to get the antenna array output

      y(θ) = Σ_{i=1}^n w_i e^{j(x_i cos θ + y_i sin θ)}

- typical design problem: choose w ∈ C^n so that
  - y(θ_tar) = 1 (unit gain in the target or "look" direction)
  - |y(θ)| is small for |θ − θ_tar| ≥ Δ (2Δ is the beamwidth)
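
The cvx specifications below use a target vector a_tar and a matrix A_outside_beam whose rows correspond to the discretized out-of-beam angles; those names come from the slides, but their construction is not shown. A sketch of how they might be built (element positions and the angle grid below are illustrative, and the conjugation convention is chosen so that a_tar'*w equals y(θ_tar)):

    n = 30;
    posx = 2*rand(n,1) - 1;  posy = 2*rand(n,1) - 1;   % hypothetical element positions (x_i, y_i)
    theta_tar = 60*pi/180;  Delta = 15*pi/180;         % look direction and half-beamwidth
    theta = linspace(0, 2*pi, 400);                    % discretized angles
    outside = abs(angle(exp(1i*(theta - theta_tar)))) >= Delta;
    thetas = theta(outside);
    A_outside_beam = exp(1i*(cos(thetas(:))*posx' + sin(thetas(:))*posy'));  % row k times w gives y(theta_k)
    a_tar = exp(-1i*(posx*cos(theta_tar) + posy*sin(theta_tar)));            % so a_tar'*w = y(theta_tar)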

Example

n = 30 antenna elements, θ_tar = 60°, Δ = 15° (30° beamwidth)
[figure: antenna element positions in the plane]

Uniform weights

w_i = 1/n; no particular directivity pattern
[figure: gain pattern |y(θ)|]

Least-squares (l2-norm) beamforming

discretize the angles outside the beam (i.e., |θ − θ_tar| ≥ Δ) as θ_1, ..., θ_N; solve the least-squares problem

    minimize    ( Σ_{i=1}^N |y(θ_i)|² )^{1/2}
    subject to  y(θ_tar) = 1

    cvx_begin
        variable w(n) complex
        minimize( norm( A_outside_beam*w ) )
        subject to
            a_tar'*w == 1;
    cvx_end

Least-squares beamforming
[figure: gain pattern |y(θ)|]

Chebyshev beamforming

solve the minimax problem (the objective is called the sidelobe level)

    minimize    max_{i=1,...,N} |y(θ_i)|
    subject to  y(θ_tar) = 1

    cvx_begin
        variable w(n) complex
        minimize( max( abs( A_outside_beam*w ) ) )
        subject to
            a_tar'*w == 1;
    cvx_end

Chebyshev beamforming: (globally optimal) sidelobe level 0.11
[figure: gain pattern |y(θ)|]

l1-regularized logistic regression

- logistic model:

      Prob(y = 1) = exp(a^T x + b) / (1 + exp(a^T x + b))

- y ∈ {−1, 1} is a Boolean random variable (outcome)
- x ∈ R^n is a vector of explanatory variables or features
- a ∈ R^n, b ∈ R are model parameters
- a^T x + b = 0 is the neutral hyperplane
- linear classifier: given x, ŷ = sgn(a^T x + b)
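
A small MATLAB sketch of using fitted parameters for prediction (a, b, and a new feature vector xnew are assumed to exist; illustration only, not part of the slides):

    p1   = 1/(1 + exp(-(a'*xnew + b)));   % Prob(y = +1 | xnew) under the logistic model
    yhat = sign(a'*xnew + b);             % linear classifier output (0 only on the hyperplane)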

Maximum likelihood estimation

- a.k.a. logistic regression
- given observed (training) examples (x_1, y_1), ..., (x_m, y_m), estimate a, b
- maximum likelihood model parameters found by solving the (convex) problem

      minimize  Σ_{i=1}^m lse(0, −y_i(x_i^T a + b))

  with variables a ∈ R^n, b ∈ R, where

      lse(u) = log(exp u_1 + ... + exp u_k)

  (which is convex)

l1-regularized logistic regression

- find a ∈ R^n, b ∈ R by solving the (convex) problem

      minimize  Σ_{i=1}^m lse(0, −y_i(x_i^T a + b)) + λ‖a‖_1

- λ > 0 is the regularization parameter
- protects against over-fitting
- heuristic to get sparse a (i.e., a simple explanation) for m > n
- heuristic to select relevant features when m < n

cvx code

    cvx_begin
        variables a(n) b
        tmp = [zeros(m,1), -y.*(x'*a + b)];
        minimize( sum(log_sum_exp(tmp')) + lambda*norm(a,1) )
    cvx_end
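
A sketch of synthetic data on which the specification above can be run (purely illustrative; the names x, y, m, n, lambda match those used in the code, with x holding one example per column):

    n = 20;  m = 100;
    a_true = [randn(5,1); zeros(n-5,1)];            % sparse "true" coefficient vector
    b_true = 0.5;
    x = randn(n, m);                                % feature matrix, one column per example
    y = sign(x'*a_true + b_true + 0.1*randn(m,1));  % labels in {-1, +1}
    lambda = 1;                                     % regularization parameter
    % ... followed by the cvx_begin ... cvx_end block above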

Leukemia example

- taken from Golub et al., Science, 1999
- n = 7129 features (gene expression data)
- m = 72 examples (acute leukemia patients), divided into a training set (38) and a validation set (34)
- outcome: type of cancer (ALL or AML)
- l1-regularized logistic regression model found using the training set; classification performance checked on the validation set

Results
[figure: classification error and card(a) versus λ/λ_max]

Final comments

- DCP formalizes the way we think convex optimization modeling should be done
- CVX makes convex optimization model development & exploration quite straightforward

References

- www.stanford.edu/~boyd
- www.stanford.edu/~boyd/cvx
- www.stanford.edu/class/ee364
- or just google "convex optimization", "convex programming", "cvx", ...