Convex Optimization. Lieven Vandenberghe. Electrical Engineering Department, UC Los Angeles
|
|
- Ambrose Gardner
- 5 years ago
- Views:
Transcription
1 Convex Optimization Lieven Vandenberghe Electrical Engineering Department, UC Los Angeles Tutorial lectures, 18th Machine Learning Summer School September 13-14, 2011
2 Introduction Convex optimization MLSS 2011 mathematical optimization linear and convex optimization recent history 1
3 Mathematical optimization minimize f 0 (x 1,...,x n ) subject to f 1 (x 1,...,x n ) 0... f m (x 1,...,x n ) 0 a mathematical model of a decision, design, or estimation problem generally intractable even simple looking nonlinear optimization problems can be very hard Introduction 2
4 The famous exception: linear programming minimize c T x subject to a T i x b i, i = 1,...,m widely used since Dantzig introduced the simplex algorithm in 1948 since 1950s, many applications in operations research, network optimization, finance, engineering, combinatorial optimization,... extensive theory (optimality conditions, sensitivity,... ) there exist very efficient algorithms for solving linear programs Introduction 3
5 Convex optimization problem minimize f 0 (x) subject to f i (x) 0, i = 1,...,m objective and constraint functions are convex: for 0 θ 1 f i (θx+(1 θ)y) θf i (x)+(1 θ)f i (y) can be solved globally, with similar (polynomial-time) complexity as LPs surprisingly many problems can be solved via convex optimization provides tractable heuristics and relaxations for non-convex problems Introduction 4
6 History 1940s: linear programming minimize c T x subject to a T i x b i, i = 1,...,m 1950s: quadratic programming 1960s: geometric programming 1990s: semidefinite programming, second-order cone programming, quadratically constrained quadratic programming, robust optimization, sum-of-squares programming,... Introduction 5
7 New applications since 1990 linear matrix inequality techniques in control support vector machine training via quadratic programming semidefinite programming relaxations in combinatorial optimization circuit design via geometric programming l 1 -norm optimization for sparse signal reconstruction applications in structural optimization, statistics, signal processing, communications, image processing, computer vision, quantum information theory, finance, power distribution,... Introduction 6
8 Advances in convex optimization algorithms interior-point methods 1984 (Karmarkar): first practical polynomial-time algorithm for LP : efficient implementations for large-scale LPs around 1990 (Nesterov & Nemirovski): polynomial-time interior-point methods for nonlinear convex programming since 1990: extensions and high-quality software packages fast first-order algorithms similar to gradient descent, but with better convergence properties based on Nesterov s optimal-rate gradient methods from 1980s extend to certain nondifferentiable or constrained problems Introduction 7
9 Overview 1. Basic theory and convex modeling convex sets and functions common problem classes and applications 2. Interior-point methods for conic optimization conic optimization barrier methods symmetric primal-dual methods 3. First-order methods gradient algorithms dual techniques
10 Convex sets and functions Convex optimization MLSS 2011 convex sets convex functions operations that preserve convexity
11 Convex set contains the line segment between any two points in the set x 1,x 2 C, 0 θ 1 = θx 1 +(1 θ)x 2 C convex not convex not convex Convex sets and functions 8
12 Basic examples affine set: solution set of linear equations Ax = b halfspace: solution of one linear inequality a T x b (a 0) polyhedron: solution of finitely many linear inequalities Ax b ellipsoid: solution of quadratic inquality (x x c ) T A(x x c ) 1 (A positive definite) norm ball: solution of x R (for any norm) positive semidefinite cone: S n + = {X S n X 0} the intersection of any number of convex sets is convex Convex sets and functions 9
13 Example of intersection property C = {x R n p(t) 1 for t π/3} where p(t) = x 1 cost+x 2 cos2t+ +x n cosnt 2 p(t) x2 1 0 C 1 0 π/3 2π/3 π t x 1 C is intersection of infinitely many halfspaces, hence convex Convex sets and functions 10
14 Outline convex sets convex functions operations that preserve convexity
15 Convex function domain domf is a convex set and Jensen s inequality holds: for all x,y domf, 0 θ 1 f(θx+(1 θ)y) θf(x)+(1 θ)f(y) (x,f(x)) (y,f(y)) f is concave if f is convex Convex sets and functions 11
16 Examples linear and affine functions are convex and concave expx, logx, xlogx are convex x α is convex for x > 0 and α 1 or α 0; x α is convex for α 1 norms are convex quadratic-over-linear function x T x/t is convex in x, t for t > 0 geometric mean (x 1 x 2 x n ) 1/n is concave for x 0 logdetx is concave on set of positive definite matrices log(e x 1 + e x n ) is convex Convex sets and functions 12
17 Epigraph and sublevel set epigraph: epif = {(x,t) x domf, f(x) t} epif a function is convex if and only its epigraph is a convex set f sublevel sets: C α = {x domf f(x) α} the sublevel sets of a convex function are convex (converse is false) Convex sets and functions 13
18 Differentiable convex functions differentiable f is convex if and only if domf is convex and f(y) f(x)+ f(x) T (y x) for all x,y domf f(y) f(x) + f(x) T (y x) (x,f(x)) twice differentiable f is convex if and only if domf is convex and 2 f(x) 0 for all x domf Convex sets and functions 14
19 Outline convex sets convex functions operations that preserve convexity
20 Methods for establishing convexity of a function 1. verify definition 2. for twice differentiable functions, show 2 f(x) 0 3. show that f is obtained from simple convex functions by operations that preserve convexity nonnegative weighted sum composition with affine function pointwise maximum and supremum minimization composition perspective Convex sets and functions 15
21 Positive weighted sum & composition with affine function nonnegative multiple: αf is convex if f is convex, α 0 sum: f 1 +f 2 convex if f 1,f 2 convex (extends to infinite sums, integrals) composition with affine function: f(ax+b) is convex if f is convex examples logarithmic barrier for linear inequalities f(x) = m log(b i a T i x) i=1 (any) norm of affine function: f(x) = Ax+b Convex sets and functions 16
22 Pointwise maximum is convex if f 1,..., f m are convex f(x) = max{f 1 (x),...,f m (x)} example: sum of r largest components of x R n f(x) = x [1] +x [2] + +x [r] is convex (x [i] is ith largest component of x) proof: f(x) = max{x i1 +x i2 + +x ir 1 i 1 < i 2 < < i r n} Convex sets and functions 17
23 Pointwise supremum g(x) = supf(x,y) y A is convex if f(x,y) is convex in x for each y A example: maximum eigenvalue of symmetric matrix λ max (X) = sup y T Xy y 2 =1 Convex sets and functions 18
24 Minimization h(x) = inf y C f(x,y) is convex if f(x,y) is convex in x,y and C is a convex set examples distance to a convex set C: h(x) = inf y C x y optimal value of linear program as function of righthand side follows by taking h(x) = inf y:ay x ct y f(x,y) = c T y, domf = {(x,y) Ay x} Convex sets and functions 19
25 Composition composition of g : R n R and h : R R: f is convex if f(x) = h(g(x)) g convex, h convex and nondecreasing g concave, h convex and nonincreasing (if we assign h(x) = for x domh) examples expg(x) is convex if g is convex 1/g(x) is convex if g is concave and positive Convex sets and functions 20
26 Perspective the perspective of a function f : R n R is the function g : R n R R, g(x,t) = tf(x/t) g is convex if f is convex on domg = {(x,t) x/t domf, t > 0} examples perspective of f(x) = x T x is quadratic-over-linear function g(x,t) = xt x t perspective of negative logarithm f(x) = logx is relative entropy g(x,t) = tlogt tlogx Convex sets and functions 21
27 the conjugate of a function f is Conjugate function f (y) = sup (y T x f(x)) x dom f f(x) xy x (0, f (y)) f is convex (even if f is not) Convex sets and functions 22
28 convex quadratic function (Q 0) Examples f(x) = 1 2 xt Qx f (y) = 1 2 yt Q 1 y negative entropy f(x) = n x i logx i f (y) = i=1 n e y i 1 i=1 norm f(x) = x f (y) = { 0 y 1 + otherwise indicator function (C convex) f(x) = I C (x) = { 0 x C + otherwise f (y) = supy T x x C Convex sets and functions 23
29 Convex optimization problems Convex optimization MLSS 2011 linear programming quadratic programming geometric programming second-order cone programming semidefinite programming modeling software
30 Convex optimization problem minimize f 0 (x) subject to f i (x) 0, i = 1,...,m Ax = b f 0, f 1,..., f m are convex functions feasible set is convex locally optimal points are globally optimal tractable, in theory and practice Convex optimization problems 24
31 Linear program (LP) minimize c T x+d subject to Gx h Ax = b inequality is componentwise vector inequality convex problem with affine objective and constraint functions feasible set is a polyhedron P x c Convex optimization problems 25
32 Piecewise-linear minimization minimize f(x) = max i=1,...,m (at i x+b i ) f(x) a T i x + b i equivalent linear program an LP with variables x, t R minimize t subject to a T i x+b i t, i = 1,...,m x Convex optimization problems 26
33 l 1 -Norm and l -norm minimization l 1 -norm approximation and equivalent LP ( y 1 = k y k ) minimize Ax b 1 n minimize i=1 y i subject to y Ax b y l -norm approximation ( y = max k y k ) minimize Ax b minimize y subject to y1 Ax b y1 (1 is vector of ones) Convex optimization problems 27
34 1.0 example: histograms of residuals (with A is ) minimize Ax b 2 minimize Ax b norm distribution is wider with a high peak at zero Convex optimization problems 28
35 Robust regression f(t) t 42 points t i, y i (circles), including two outliers function f(t) = α+βt fitted using 2-norm (dashed) and 1-norm Convex optimization problems 29
36 Linear discrimination separate two sets of points {x 1,...,x N }, {y 1,...,y M } by a hyperplane a T x i +b > 0, i = 1,...,N a T y i +b < 0, i = 1,...,M homogeneous in a, b, hence equivalent to the linear inequalities (in a, b) a T x i +b 1, i = 1,...,N, a T y i +b 1, i = 1,...,M Convex optimization problems 30
37 Approximate linear separation of non-separable sets minimize N max{0,1 a T x i b}+ i=1 M max{0,1+a T y i +b} i=1 a piecewise-linear minimization problem in a, b; equivalent to an LP can be interpreted as a heuristic for minimizing #misclassified points Convex optimization problems 31
38 Quadratic program (QP) minimize (1/2)x T Px+q T x+r subject to Gx h P S n +, so objective is convex quadratic minimize a convex quadratic function over a polyhedron f 0 (x ) x P Convex optimization problems 32
39 Linear program with random cost minimize c T x subject to Gx h c is random vector with mean c and covariance Σ hence, c T x is random variable with mean c T x and variance x T Σx expected cost-variance trade-off minimize Ec T x+γvar(c T x) = c T x+γx T Σx subject to Gx h γ > 0 is risk aversion parameter Convex optimization problems 33
40 Robust linear discrimination H 1 = {z a T z +b = 1} H 2 = {z a T z +b = 1} distance between hyperplanes is 2/ a 2 to separate two sets of points by maximum margin, a quadratic program in a, b minimize a 2 2 = a T a subject to a T x i +b 1, i = 1,...,N a T y i +b 1, i = 1,...,M Convex optimization problems 34
41 Support vector classifier min. γ a 2 2+ N max{0,1 a T x i b}+ i=1 M max{0,1+a T y i +b} i=1 equivalent to a QP γ = 0 γ = 10 Convex optimization problems 35
42 Total variation signal reconstruction minimize ˆx x cor 2 +γφ(ˆx) x cor = x+v is corrupted version of unknown signal x, with noise v variable ˆx (reconstructed signal) is estimate of x φ : R n R is quadratic or total variation smoothing penalty φ quad (ˆx) = n 1 (ˆx i+1 ˆx i ) 2, φ tv (ˆx) = i=1 n 1 i=1 ˆx i+1 ˆx i Convex optimization problems 36
43 example: x cor, and reconstruction with quadratic and t.v. smoothing 2 xcor i quad i t.v quadratic smoothing smooths out noise and sharp transitions in signal total variation smoothing preserves sharp transitions in signal i Convex optimization problems 37
44 Geometric programming posynomial function f(x) = K k=1 c k x a 1k 1 xa 2k 2 x a nk n, domf = R n ++ with c k > 0 geometric program (GP) with f i posynomial minimize f 0 (x) subject to f i (x) 1, i = 1,...,m Convex optimization problems 38
45 Geometric program in convex form change variables to y i = logx i, and take logarithm of cost, constraints geometric program in convex form: ( K ) minimize log exp(a T 0ky +b 0k ) ( k=1 K ) subject to log exp(a T iky +b ik ) 0, i = 1,...,m k=1 b ik = logc ik Convex optimization problems 39
46 Second-order cone program (SOCP) minimize f T x subject to A i x+b i 2 c T i x+d i, i = 1,...,m 2 is Euclidean norm y 2 = y y2 n constraints are nonlinear, nondifferentiable, convex 1 constraints are inequalities w.r.t. second-order cone: { y } y1 2+ +y2 p 1 y p y y y 1 Convex optimization problems 40
47 Robust linear program (stochastic) minimize c T x subject to prob(a T i x b i) η, i = 1,...,m a i random and normally distributed with mean ā i, covariance Σ i we require that x satisfies each constraint with probability exceeding η η = 10% η = 50% η = 90% Convex optimization problems 41
48 SOCP formulation the chance constraint prob(a T i x b i) η is equivalent to the constraint ā T i x+φ 1 (η) Σ 1/2 i x 2 b i Φ is the (unit) normal cumulative density function 1 η Φ(t) t Φ 1 (η) robust LP is a second-order cone program for η 0.5 Convex optimization problems 42
49 Robust linear program (deterministic) minimize c T x subject to a T i x b i for all a i E i, i = 1,...,m a i uncertain but bounded by ellipsoid E i = {ā i +P i u u 2 1} we require that x satisfies each constraint for all possible a i SOCP formulation minimize c T x subject to ā T i x+ PT i x 2 b i, i = 1,...,m follows from sup (ā i +P i u) T x = ā T i x+ Pi T x 2 u 2 1 Convex optimization problems 43
50 Examples of second-order cone constraints convex quadratic constraint (A = LL T positive definite) x T Ax+2b T x+c 0 L T x+l 1 b 2 (b T A 1 b c) 1/2 extends to positive semidefinite singular A hyperbolic constraint [ x T x yz, y,z 0 2x y z ] 2 y +z, y,z 0 Convex optimization problems 44
51 Examples of SOC-representable constraints positive powers x 1.5 t, x 0 z : x 2 tz, z 2 x, x,z 0 two hyperbolic constraints can be converted to SOC constraints extends to powers x p for rational p 1 negative powers x 3 t, x > 0 z : 1 tz, z 2 tx, x,z 0 two hyperbolic constraints on r.h.s. can be converted to SOC constraints extends to powers x p for rational p < 0 Convex optimization problems 45
52 Semidefinite program (SDP) minimize c T x subject to x 1 A 1 +x 2 A 2 + +x n A n B A 1, A 2,..., A n, B are symmetric matrices inequality X Y means Y X is positive semidefinite, i.e., z T (Y X)z = i,j (Y ij X ij )z i z j 0 for all z includes many nonlinear constraints as special cases Convex optimization problems 46
53 Geometry 1 [ x y y z ] 0 z y x 1 a nonpolyhedral convex cone feasible set of a semidefinite program is the intersection of the positive semidefinite cone in high dimension with planes Convex optimization problems 47
54 Examples A(x) = A 0 +x 1 A 1 + +x m A m (A i S n ) eigenvalue minimization (and equivalent SDP) minimize λ max (A(x)) minimize t subject to A(x) ti matrix-fractional function minimize b T A(x) 1 b subject to A(x) 0 minimize t[ A(x) b subject to b T t ] 0 Convex optimization problems 48
55 Matrix norm minimization A(x) = A 0 +x 1 A 1 +x 2 A 2 + +x n A n (A i R p q ) matrix norm approximation ( X 2 = max k σ k (X)) minimize A(x) 2 minimize t[ subject to ti A(x) A(x) T ti ] 0 nuclear norm approximation ( X = k σ k(x)) minimize A(x) minimize (tru +trv)/2 [ U A(x) subject to T A(x) V ] 0 Convex optimization problems 49
56 Semidefinite relaxations semidefinite programming is often used to find good bounds for nonconvex polynomial problems, via relaxation as a heuristic for good suboptimal points example: Boolean least-squares minimize Ax b 2 2 subject to x 2 i = 1, i = 1,...,n basic problem in digital communications could check all 2 n possible values of x { 1,1} n... an NP-hard problem, and very hard in general Convex optimization problems 50
57 Semidefinite lifting Boolean least-squares problem minimize x T A T Ax 2b T Ax+b T b subject to x 2 i = 1, i = 1,...,n reformulation: introduce new variable Y = xx T minimize tr(a T AY) 2b T Ax+b T b subject to Y = xx T diag(y) = 1 cost function and second constraint are linear (in the variables Y, x) first constraint is nonlinear and nonconvex... still a very hard problem Convex optimization problems 51
58 Semidefinite relaxation replace Y = xx T with weaker constraint Y xx T to obtain relaxation minimize tr(a T AY) 2b T Ax+b T b subject to Y xx T diag(y) = 1 convex; can be solved as a semidefinite program Y xx T [ Y x x T 1 ] 0 optimal value gives lower bound for Boolean LS problem if Y = xx T at the optimum, we have solved the exact problem otherwise, can use randomized rounding generate z from N(x,Y xx T ) and take x = sign(z) Convex optimization problems 52
59 Example SDP bound LS solution frequency Ax b 2 /(SDP bound) n = 100: feasible set has points histogram of 1000 randomized solutions from SDP relaxation Convex optimization problems 53
60 Nonnegative polynomial on R f(t) = x 0 +x 1 t+ +x 2m t 2m 0 for all t R a convex constraint on x holds if and only if f is a sum of squares of (two) polynomials: f(t) = k (y k0 +y k1 t+ +y km t m ) 2 T = 1. t m T k y k y T k 1. t m = 1. t m Y 1. t m where Y = k y k y T k 0 Convex optimization problems 54
61 SDP formulation f(t) 0 if and only if for some Y 0, f(t) = 1 ṭ. t 2m T x 0 x 1. x 2m = 1 ṭ. t m T Y 1 ṭ. t m this is an SDP constraint: there exists Y 0 such that x 0 = Y 11 x 1 = Y 12 +Y 21 x 2 = Y 13 +Y 22 +Y 32. x 2m = Y m+1,m+1 Convex optimization problems 55
62 General sum-of-squares constraints f(t) = x T p(t) is a sum of squares if ( s s x T p(t) = kq(t)) k=1(y T 2 = q(t) T y k yk T k=1 ) q(t) p, q: basis functions (of polynomials, trigonometric polynomials,... ) independent variable t can be one- or multidimensional a sufficient condition for nonnegativity of x T p(t), useful in nonconvex polynomial optimization in several variables in some nontrivial cases (e.g., polynomial on R), necessary and sufficient equivalent SDP constraint (on the variables x, X) x T p(t) = q(t) T Xq(t), X 0 Convex optimization problems 56
63 Modeling software modeling packages for convex optimization CVX, YALMIP (MATLAB) CVXMOD, CVXPY (Python) assist in formulating convex problems by automating two tasks: verifying convexity from convex calculus rules transforming problem in input format required by standard solvers related packages general-purpose optimization modeling: AMPL, GAMS Convex optimization problems 57
64 CVX example minimize Ax b 1 subject to 0 x k 1, k = 1,...,n MATLAB code cvx_begin variable x(3); minimize(norm(a*x - b, 1)) subject to x >= 0; x <= 1; cvx_end between cvx_begin and cvx_end, x is a CVX variable after execution, x is MATLAB variable with optimal solution Convex optimization problems 58
65 Conic optimization Convex optimization MLSS 2011 definitions and examples modeling duality
66 Generalized (conic) inequalities conic inequality: a constraint x K with K a convex cone in R m we require that K is a proper cone: closed pointed: K ( K) = {0} with nonempty interior: intk ; equivalently, K +( K) = R m notation x K y x y K, x K y x y intk with subscript in K omitted if K is clear from the context Conic optimization 59
67 Cone linear program minimize c T x subject to Ax K b if K is the nonnegative orthant, this reduces to regular linear program widely used in recent literature on convex optimization modeling: a small number of primitive cones is sufficient to express most convex constraints that arise in practice algorithms: a convenient problem format for extending interior-point algorithms for linear programming to convex optimization Conic optimization 60
68 Norm cones K = { (x,y) R m 1 R x y } 1 y x x for the Euclidean norm this is the second-order cone (notation: Q m ) Conic optimization 61
69 Second-order cone program minimize c T x subject to B k0 x+d k0 2 B k1 x+d k1, k = 1,...,r cone LP formulation: express constraints as Ax K b K = Q m 1 Q m r, A = B 10 B 11. B r0, b = d 10 d 11. d r0 B r1 d r1 (assuming B k0, d k0 have m k 1 rows) Conic optimization 62
70 Vector notation for symmetric matrices vectorized symmetric matrix: for U S p vec(u) = 2 ( U11, U 21,..., U p1, U 22, U 32,..., U p2,..., U ) pp inverse operation: for u = (u 1,u 2,...,u n ) R n with n = p(p+1)/2 mat(u) = 1 2 2u1 u 2 u p u 2 2up+1 u 2p 1... u p u 2p 1 2up(p+1)/2 coefficients 2 are added so that standard inner products are preserved: tr(uv) = vec(u) T vec(v), u T v = tr(mat(u)mat(v)) Conic optimization 63
71 Positive semidefinite cone S p = {vec(x) X S p +} = {x R p(p+1)/2 mat(x) 0} 1 z y 1 0 x S 2 = { (x,y,z) [ x y/ 2 y/ 2 z ] } 0 Conic optimization 64
72 Semidefinite program minimize c T x subject to x 1 A 11 +x 2 A x n A 1n B 1... x 1 A r1 +x 2 A r2 + +x n A rn B r r linear matrix inequalities of order p 1,..., p r cone LP formulation: express constraints as Ax K B K = S p 1 S p 2 S p r A = vec(a 11 ) vec(a 12 ) vec(a 1n ) vec(a 21 ) vec(a 22 ) vec(a 2n )... vec(a r1 ) vec(a r2 ) vec(a rn ), b = vec(b 1 ) vec(b 2 ). vec(b r ) Conic optimization 65
73 Exponential cone the epigraph of the perspective of expx is a non-proper cone { } K = (x,y,z) R 3 ye x/y z, y > 0 the exponential cone is K exp = clk = K {(x,0,z) x 0,z 0} 1 z y x 0 1 Conic optimization 66
74 Geometric program minimize c T x subject to log n i k=1 exp(a T ik x+b ik) 0, i = 1,...,r cone LP formulation minimize c T x subject to at ik x+b ik 1 z ik n i k=1 K exp, k = 1,...,n i, i = 1,...,r z ik 1, i = 1,...,m Conic optimization 67
75 Power cone definition: for α = (α 1,α 2,...,α m ) > 0, m i=1 α i = 1 K α = { (x,y) R m + R y x α } 1 1 xα m m examples for m = 2 α = ( 1 2, 1 2 ) α = (2 3, 1 3 ) α = (3 4, 1 4 ) y 0 y 0 y x 1 x x 1 x x 1 x 2 Conic optimization 68
76 Outline definition and examples modeling duality
77 Modeling tools convex modeling systems (CVX, YALMIP, CVXMOD, CVXPY,... ) convert problems stated in standard mathematical notation to cone LPs in principle, any convex problem can be represented as a cone LP in practice, a small set of primitive cones is used (R n +, Q p, S p ) choice of cones is limited by available algorithms and solvers (see later) modeling systems implement set of rules for expressing constraints f(x) t as conic inequalities for the implemented cones Conic optimization 69
78 Examples of second-order cone representable functions convex quadratic f(x) = x T Px+q T x+r (P 0) quadratic-over-linear function f(x,y) = xt x y with domf = R n R + (assume 0/0 = 0) convex powers with rational exponent f(x) = x α, f(x) = { x β x > 0 + x 0 for rational α 1 and β 0 p-norm f(x) = x p for rational p 1 Conic optimization 70
79 Examples of SD cone representable functions matrix-fractional function f(x,y) = y T X 1 y with domf = {(X,y) S n + R n y R(X)} maximum eigenvalue of symmetric matrix maximum singular value f(x) = X 2 = σ 1 (X) [ ] ti X X 2 t X T ti 0 nuclear norm f(x) = X = i σ i(x) X t U,V : [ U X X T V ] 0, 1 (tru +trv) t 2 Conic optimization 71
80 Functions representable with exponential and power cone exponential cone exponential and logarithm entropy f(x) = xlogx power cone increasing power of absolute value: f(x) = x p with p 1 decreasing power: f(x) = x q with q 0 and domain R ++ p-norm: f(x) = x p with p 1 Conic optimization 72
81 Outline definition and examples modeling duality
82 Linear programming duality primal and dual LP (P) minimize c T x subject to Ax b (D) maximize b T z subject to A T z +c = 0 z 0 primal optimal value is p (+ if infeasible, if unbounded below) dual optimal value is d ( if infeasible, + if unbounded below) duality theorem weak duality: p d, with no exception strong duality: p = d if primal or dual is feasible if p = d is finite, then primal and dual optima are attained Conic optimization 73
83 Dual cone definition K = {y x T y 0 for all x K} a proper cone if K is a proper cone dual inequality: x y means x K y for generic proper cone K note: dual cone depends on choice of inner product: H 1 K is dual cone for inner product x,y = x T Hy Conic optimization 74
84 Examples R p +, Q p, S p are self-dual: K = K dual of norm cone is norm cone for dual norm dual of exponential cone K exp = { (u,v,w) R R R + ulog( u/w)+u v 0 } (with 0log(0/w) = 0 if w 0) dual of power cone is K α = { (u,v) R m + R v (u 1 /α 1 ) α1 (u m /α m ) α m } Conic optimization 75
85 Primal and dual cone LP primal problem (optimal value p ) minimize c T x subject to Ax b dual problem (optimal value d ) maximize b T z subject to A T z +c = 0 z 0 weak duality: p d (without exception) Conic optimization 76
86 Strong duality p = d if primal or dual is strictly feasible slightly weaker than LP duality (which only requires feasibility) can have d < p with finite p and d other implications of strict feasibility if primal is strictly feasible, then dual optimum is attained (if d is finite) if dual is strictly feasible, then primal optimum is attained (if p is finite) Conic optimization 77
87 Optimality conditions minimize c T x subject to Ax+s = b s 0 maximize b T z subject to A T z +c = 0 z 0 optimality conditions [ 0 s ] = [ 0 A T A 0 ][ x z ] + [ c b ] s 0, z 0, z T s = 0 duality gap: inner product of (x,z) and (0,s) gives z T s = c T x+b T z Conic optimization 78
88 Barrier methods Convex optimization MLSS 2011 barrier method for linear programming normal barriers barrier method for conic optimization
89 History 1960s: Sequentially Unconstrained Minimization Technique (SUMT) solves nonlinear convex optimization problem minimize f 0 (x) subject to f i (x) 0, i = 1,...,m via a sequence of unconstrained minimization problems minimize tf 0 (x) m i=1 log( f i (x)) 1980s: LP barrier methods with polynomial worst-case complexity 1990s: barrier methods for non-polyhedral cone LPs Barrier methods 79
90 Logarithmic barrier function for linear inequalities ψ(x) = φ(b Ax), φ(s) = m logs i i=1 a smooth convex function with domψ = {x Ax < b} ψ(x) at boundary of domψ gradient and Hessian are ψ(x) = A T φ(s), 2 ψ(x) = A T φ 2 (s)a with s = b Ax ( 1 φ(s) =,..., s 1 1 s m ) ( 1, φ 2 (s) = diag s 2,..., 1 1 s 2 m ) Barrier methods 80
91 Central path for linear program central path: set of minimizers x (t) (with t > 0) of f t (x) = tc T x+φ(b Ax) c x x (t) optimality conditions: x = x (t) satisfies f t (x) = tc A T φ(s) = 0, s = b Ax Barrier methods 81
92 Central path and duality dual feasible point on central path for x = x (t) and s = b Ax, z (t) = 1 t φ(s) = ( 1 ts 1, 1 ts 2,..., is strictly dual feasible: c+a T z = 0 and z > 0 can be modified to correct for inexact centering of x duality gap between x = x (t) and z = z (t) is 1 ts m ) c T x+b T z = s T z = m t gives bound on suboptimality: c T x (t) p m/t Barrier methods 82
93 Barrier method starting with t > 0, strictly feasible x, repeat until c T x p ǫ make one or more Newton steps to (approximately) minimize f t : x + = x α 2 f t (x) 1 f t (x) step size α is fixed or from line search increase t complexity: with proper initialization, step size, update scheme for t, #Newton steps = O ( mlog(1/ǫ) ) result follows from convergence analysis of Newton s method for f t Barrier methods 83
94 Outline barrier method for linear programming normal barriers barrier method for conic optimization
95 Normal barrier for proper cone φ is a θ-normal barrier for the proper cone K if it is a barrier: smooth, convex, domain intk, blows up at boundary of K logarithmically homogeneous with parameter θ: φ(tx) = φ(x) θlogt, x intk, t > 0 self-concordant: restriction g(α) = φ(x + αv) to any line satisfies g (α) 2g (α) 3/2 introduced by Nesterov and Nemirovski (1994) Barrier methods 84
96 Examples nonnegative orthant: K = R m + φ(x) = m logx i (θ = m) i=1 second-order cone: K = Q p = {(x,y) R p 1 R x 2 y} φ(x,y) = log(y 2 x T x) (θ = 2) semidefinite cone: K = S m = {x R m(m+1)/2 mat(x) 0} φ(x) = logdetmat(x) (θ = m) Barrier methods 85
97 exponential cone: K exp = cl{(x,y,z) R 3 ye x/y z, y > 0} φ(x,y,z) = log(ylog(z/y) x) logz logy (θ = 3) power cone: K = {(x 1,x 2,y) R + R + R y x α 1 1 xα 2 2 } φ(x,y) = log ( x 2α 1 1 x 2α 2 2 y 2) logx 1 logx 2 (θ = 4) Barrier methods 86
98 Central path linear cone LP (with inequality with respect to proper cone K) minimize c T x subject to Ax b barrier for the feasible set where φ is a θ-normal barrier for K φ(b Ax) central path: set of minimizers x (t) (with t > 0) of f t (x) = tc T x+φ(b Ax) Barrier methods 87
99 Newton step centering problem minimize f t (x) = tc T x+φ(b Ax) Newton step at x x = 2 f t (x) 1 f t (x) Newton decrement λ t (x) = ( x T 2 f t (x) x ) 1/2 = ( f t (x) x) 1/2 used to measure proximity of x to x (t) Barrier methods 88
100 Damped Newton method minimize f t (x) = tc T x+φ(b Ax) algorithm select ǫ (0,1/2), η (0,1/4], and a starting point x domf t repeat: 1. compute Newton step x and Newton decrement λ t (x) 2. if λ t (x) 2 ǫ, return x 3. otherwise, set x := x+α x with α = 1 1+λ t (x) if λ t (x) η, α = 1 if λ t (x) < η alternatively, can use backtracking line search Barrier methods 89
101 Convergence results for damped Newton method damped Newton phase f t (x + ) f t (x) γ if λ t (x) η value decreases by at least a positive constant γ = η log(1+η) quadratically convergent phase 2λ t (x + ) (2λ t (x)) 2 if λ t (x) < η implies λ t (x + ) 2η 2 < η, and Newton decrement decreases to zero stopping criterion λ t (x) 2 ǫ implies f t (x) inff t (x) ǫ Barrier methods 90
102 Outline barrier method for linear programming normal barriers barrier method for conic optimization
103 Central path and duality duality point on central path: x (t) defines a strictly dual feasible z (t) z (t) = 1 t φ(s), s = b Ax (t) duality gap: gap between x = x (t) and z = z (t) is c T x+b T z = s T z = θ t, ct x p θ t near central path: for inexactly centered x c T x p ( 1+ λ t(x) θ ) θ t if λ t (x) < 1 (results follow from properties of normal barriers) Barrier methods 91
104 Short-step barrier method algorithm: parameters ǫ (0,1), β = 1/8 select initial x and t with λ t (x) β repeat until 2θ/t ǫ: ( ) 1 t := t, x := x f t (x) 1 f t (x) θ properties increase t slowly so x stays in region of quadratic region (λ t (x) β) iteration complexity ( ( )) θlog θ #iterations = O ǫt 0 best known worst-case complexity; same as for linear programming Barrier methods 92
105 Predictor-corrector methods short-step barrier methods stay in narrow neighborhood of central path (defined by limit on λ t ) make small, fixed increases t + = µt as a result, quite slow in practice predictor-corrector method select new t using a linear approximation to central path ( predictor ) re-center with new t ( corrector ) allows faster and adaptive increases in t; similar worst-case complexity Barrier methods 93
106 Primal-dual methods Convex optimization MLSS 2011 primal-dual algorithms for linear programming symmetric cones primal-dual algorithms for conic optimization implementation
107 Primal-dual interior-point methods similarities with barrier method follow the same central path same linear algebra cost per iteration differences more robust and faster (typically less than 50 iterations) primal and dual iterates updated at each iteration symmetric treatment of primal and dual iterates can start at infeasible points include heuristics for adaptive choice of central path parameter t often have superlinear asymptotic convergence Primal-dual methods 94
108 Primal-dual central path for linear programming minimize c T x subject to Ax+s = b s 0 maximize b T z subject to A T z +c = 0 z 0 optimality conditions Ax+s = b, A T z +c = 0, (s,z) 0, s z = 0 s z is component-wise vector product primal-dual parametrization of central pah Ax+s = b, A T z +c = 0, (s,z) 0, s z = 1 t 1 solution is x = x (t), z = z (t) Primal-dual methods 95
109 Primal-dual search direction steps solve central path equations linearized around current iterates x, s, z A(x+ x)+s+ s = b A T (z + z)+c = 0 (1) (s+ z) (z + z) = σµ1 where µ = (s T z)/m and σ [0,1] targets point on central path with 1/t = σµ, i.e., with gap σs T z different methods use different strategies for selecting σ linearized equations: the two linear equations in (1) and z s+s z = σµ1 s z after eliminating s, z reduces to an equation A T DA x = r, D = diag(z 1 /s 1,...,z m /s m ) Primal-dual methods 96
110 Outline primal-dual algorithms for linear programming symmetric cones primal-dual algorithms for conic optimization implementation
111 Symmetric cones symmetric primal-dual solvers for cone LPs are limited to symmetric cones second-order cone positive semidefinite cone direct products of these primitive symmetric cones (such as R p +) definition: cone of squares x = y 2 = y y for a product that satisfies 1. bilinearity (x y is linear in x for fixed y and vice-versa) 2. x y = y x 3. x 2 (y x) = (x 2 y) x 4. x T (y z) = (x y) T z not necessarily associative Primal-dual methods 97
112 Vector product and identity element nonnegative orthant: componentwise product x y = diag(x)y identity element is e = 1 = (1,1,...,1) positive semidefinite cone: symmetrized matrix product x y = 1 2 vec(xy +YX) with X = mat(x), Y = mat(y) identity element is e = vec(i) second-order cone: the product of x = (x 0,x 1 ) and y = (y 0,y 1 ) is x y = 1 2 [ identity element is e = ( 2,0,...,0) x T y x 0 y 1 +y 0 x 1 ] Primal-dual methods 98
113 Classification symmetric cones are studied in the theory of Euclidean Jordan algebras all possible symmetric cones have been characterized list of symmetric cones the second-order cone the positive semidefinite cone of Hermitian matrices with real, complex, or quaternion entries 3 3 positive semidefinite matrices with octonion entries Cartesian products of these primitive symmetric cones (such as R p +) practical implication can focus on Q p, S p and study these cones using elementary linear algebra Primal-dual methods 99
114 Spectral decomposition with each symmetric cone/product we associate a spectral decomposition x = θ λ i q i, i=1 with θ q i = e and q i q j = i=1 { qi i = j 0 i j semidefinite cone (K = S p ): eigenvalue decomposition of mat(x) θ = p, mat(x) = p λ i v i vi T, q i = vec(v i vi T ) i=1 second-order cone (K = Q p ) θ = 2, λ i = x 0± x 1 2 2, q i = 1 2 [ ] 1, i = 1,2 ±x 1 / x 1 2 Primal-dual methods 100
115 Applications nonnegativity x 0 λ 1,...,λ θ 0, x 0 λ 1,...,λ θ > 0 powers (in particular, inverse and square root) x α = i λ α i q i log-det barrier φ(x) = logdetx = θ logλ i i=1 a θ-normal barrier gradient is φ(x) = x 1 Primal-dual methods 101
116 Outline primal-dual algorithms for linear programming symmetric cones primal-dual algorithms for conic optimization implementation
117 Symmetric parametrization of central path centering problem minimize tc T x+φ(b Ax) optimality conditions (using φ(s) = s 1 ) Ax+s = b, A T z +c = 0, (s,z) 0, z = 1 t s 1 equivalent symmetric form Ax+b = s, A T z +c = 0, (s,z) 0, s z = 1 t e Primal-dual methods 102
118 Scaling with Hessian linear transformation with H = 2 φ(u) have several important properties preserves conic inequalities: s 0 Hs 0 if s is invertible, then Hs is invertible and (Hs) 1 = H 1 s 1 preserves central path: s z = µe (Hs) (H 1 z) = µe symmetric square root of H is H 1/2 = 2 φ(u 1/2 ) example (K = S p ): S = U 1 SU 1 S = mat(s), U = mat(u) Primal-dual methods 103
119 Primal-dual search direction steps solve central path equations linearized around current iterates x, s, z A(x+ x)+s+ s = b, A T (z + z)+c = 0 (2) (H(s+ s)) ( H 1 (z + z) ) = σµe where µ = (s T z)/m, σ [0,1], and H = 2 φ(u) different algorithms use different choices of σ, u Nesterov-Todd scaling: H = 2 φ(u) defined by Hs = H 1 z linearized equations: the two linear equations (2) and (Hs) (H 1 z)+(h 1 z) (H s) = σµe (Hs) (H 1 z) after eliminating s, z, reduces to an equation A T 2 φ(w)a x = r, w = u 2 Primal-dual methods 104
120 Outline primal-dual algorithms for linear programming symmetric cones primal-dual algorithms for conic optimization implementation
121 Software implementations General-purpose software for nonlinear convex optimization several high-quality packages (MOSEK, Sedumi, SDPT3,... ) exploit sparsity to achieve scalability Customized implementations can exploit non-sparse types of problem structure often orders of magnitude faster than general-purpose solvers Primal-dual methods 105
122 Example: l 1 -regularized least-squares A is m n (with m n) and dense quadratic program formulations minimize Ax b 2 2+ x 1 minimize Ax b T u subject to u x u coefficient of Newton system in interior-point method is [ A T A ] [ ] D1 +D + 2 D 2 D 1 D 2 D 1 D 1 +D 2 (D 1, D 2 positive diagonal) expensive (O(n 3 )) for large n Primal-dual methods 106
123 customized implementation can reduce Newton equation to solution of a system (AD 1 A T +I) u = r cost per iteration is O(m 2 n) comparison (seconds on 2.83 Ghz Core 2 Quad machine) m n custom general-purpose custom solver is CVXOPT; general-purpose solver is MOSEK Primal-dual methods 107
124 Gradient methods Convex optimization MLSS 2011 gradient and subgradient method proximal gradient method fast proximal gradient methods 108
125 Classical gradient method to minimize a convex differentiable function f: choose x (0) and repeat x (k) = x (k 1) t k f(x (k 1) ), k = 1,2,... step size t k is constant or from line search advantages every iteration is inexpensive does not require second derivatives disadvantages often very slow; very sensitive to scaling does not handle nondifferentiable functions Gradient methods 109
126 Quadratic example f(x) = 1 2 (x2 1+γx 2 2) (γ > 1) with exact line search and starting point x (0) = (γ,1) x (k) x 2 x (0) x 2 = ( ) k γ 1 γ +1 4 x x 1 Gradient methods 110
127 Nondifferentiable example f(x) = x 2 1 +γx2 2 ( x 2 x 1 ), f(x) = x 1+γ x 2 1+γ ( x 2 > x 1 ) with exact line search, x (0) = (γ,1), converges to non-optimal point 2 x x 1 Gradient methods 111
128 First-order methods address one or both disadvantages of the gradient method methods for nondifferentiable or constrained problems smoothing methods subgradient method proximal gradient method methods with improved convergence variable metric methods conjugate gradient method accelerated proximal gradient method we will discuss subgradient and proximal gradient methods Gradient methods 112
129 Subgradient g is a subgradient of a convex function f at x if f(y) f(x)+g T (y x) y domf f(x) f(x 1 ) + g T 1 (x x 1) f(x 2 ) + g T 2 (x x 2) f(x 2 ) + g T 3 (x x 2) x 1 x 2 generalizes basic inequality for convex differentiable f f(y) f(x)+ f(x) T (y x) y domf Gradient methods 113
130 Subdifferential the set of all subgradients of f at x is called the subdifferential f(x) absolute value f(x) = x f(x) = x f(x) 1 x 1 x Euclidean norm f(x) = x 2 f(x) = 1 x 2 x if x 0, f(x) = {g g 2 1} if x = 0 Gradient methods 114
131 Subgradient calculus weak calculus rules for finding one subgradient sufficient for most algorithms for nondifferentiable convex optimization if one can evaluate f(x), one can usually compute a subgradient much easier than finding the entire subdifferential subdifferentiability convex f is subdifferentiable on dom f except possibly at the boundary example of a non-subdifferentiable function: f(x) = x at x = 0 Gradient methods 115
132 Examples of calculus rules nonnegative combination: f = α 1 f 1 +α 2 f 2 with α 1,α 2 0 g = α 1 g 1 +α 2 g 2, g 1 f 1 (x), g 2 f 2 (x) composition with affine transformation: f(x) = h(ax + b) g = A T g, g h(ax+b) pointwise maximum f(x) = max{f 1 (x),...,f m (x)} g f i (x) where f i (x) = max k f k (x) conjugate f(x) = sup y (x T y f(y)); take any maximizing y Gradient methods 116
133 Subgradient method to minimize a nondifferentiable convex function f: choose x (0) and repeat x (k) = x (k 1) t k g (k 1), k = 1,2,... g (k 1) is any subgradient of f at x (k 1) step size rules fixed step size: t k constant fixed step length: t k g (k 1) 2 constant (i.e., x (k) x (k 1) 2 constant) diminishing: t k 0, t k = k=1 Gradient methods 117
134 Some convergence results assumption: f is convex and Lipschitz continuous with constant G > 0: f(x) f(y) G x y 2 x,y results fixed step size t k = t converges to approximately G 2 t/2-suboptimal fixed length t k g (k 1) 2 = s converges to approximately Gs/2-suboptimal decreasing k t k, t k 0: convergence rate of convergence is 1/ k with proper choice of step size sequence Gradient methods 118
135 Example: 1-norm minimization minimize Ax b 1 (A R ,b R 500 ) subgradient is given by A T sign(ax b) (f (k) best f )/f / k 0.01/k k fixed steplength s = 0.1, 0.01, k diminishing step size t k = 0.01/ k, t k = 0.01/k Gradient methods 119
136 Outline gradient and subgradient method proximal gradient method fast proximal gradient methods
137 Proximal mapping the proximal mapping (prox-operator) of a convex function h is prox h (x) = argmin u h(x) = 0: prox h (x) = x ( h(u)+ 1 ) 2 u x 2 2 h(x) = I C (x) (indicator function of C): prox h is projection on C prox h (x) = argmin u C u x 2 2 = P C (x) h(x) = x 1 : prox h is the soft-threshold (shrinkage) operation prox h (x) i = x i 1 x i 1 0 x i 1 x i +1 x i 1 Gradient methods 120
138 Proximal gradient method unconstrained problem with cost function split in two components minimize f(x) = g(x)+h(x) g convex, differentiable, with domg = R n h convex, possibly nondifferentiable, with inexpensive prox-operator proximal gradient algorithm x (k) = prox tk h ( ) x (k 1) t k g(x (k 1) ) t k > 0 is step size, constant or determined by line search Gradient methods 121
139 Interpretation x + = prox th (x t g(x)) from definition of proximal operator: x + = argmin u = argmin u ( h(u)+ 1 ) 2t u x+t g(x) 2 2 ( h(u)+g(x)+ g(x) T (u x)+ 1 ) 2t u x 2 2 x + minimizes h(u) plus a simple quadratic local model of g(u) around x Gradient methods 122
140 Examples minimize g(x) + h(x) gradient method: h(x) = 0, i.e., minimize g(x) x + = x t g(x) gradient projection method: h(x) = I C (x), i.e., minimize g(x) over C x x + = P C (x t g(x)) C x + x t g(x) Gradient methods 123
141 iterative soft-thresholding: h(x) = x 1, i.e., minimize g(x)+ x 1 x + = prox th (x t g(x)) and prox th (u) i = u i t u i t 0 t u i t u i +t u i t prox th (u) i t t u i Gradient methods 124
142 Some properties of proximal mappings prox h (x) = argmin u ( h(u)+ 1 ) 2 u x 2 2 assume h is closed and convex (i.e., convex with closed epigraph) prox h (x) is uniquely defined for all x prox h is nonexpansive prox h (x) prox h (y) 2 x y 2 Moreau decomposition x = prox h (x)+prox h (x) cf., properties of Euclidean projection on convex sets Gradient methods 125
143 example: h is indicator function of subspace L h(u) = I L (u) = { 0 u L + otherwise conjugate h is indicator function of the orthogonal complement L h (v) = supv T u = u L { 0 v L = I L (v) + otherwise Moreau decomposition is orthogonal decomposition x = P L (x)+p L (x) Gradient methods 126
144 Examples of inexpensive prox-operators projection on simple sets hyperplanes and halfspaces rectangles {x l x u} probability simplex {x 1 T x = 1,x 0} norm ball for many norms (Euclidean, 1-norm,... ) nonnegative orthant, second-order cone, positive semidefinite cone Euclidean norm: h(x) = x 2 prox th (x) = ( 1 t ) x if x 2 t, x 2 prox th (x) = 0 otherwise Gradient methods 127
145 logarithmic barrier h(x) = n i=1 logx i, prox th (x) i = x i+ x 2 i +4t, i = 1,...,n 2 Euclidean distance: d(x) = inf y C x y 2 (C closed convex) prox td (x) = θp C (x)+(1 θ)x, θ = t max{d(x), t} squared Euclidean distance: h(x) = d(x) 2 /2 prox th (x) = 1 1+t x+ t 1+t P C(x) Gradient methods 128
146 Prox-operator of conjugate prox th (x) = x tprox h/t (x/t) follows from Moreau decomposition of interest when prox-operator of h is inexpensive example: norms h(x) = I C (x), h (y) = y where C is unit norm ball for and is dual norm of prox h is projection on C formula useful for prox-operator of if projection on C is inexpensive Gradient methods 129
147 Support function many convex functions can be expressed as support functions with C closed, convex h(x) = S C (x) = supx T y y C conjugate is indicator function of C: h (y) = I C (y) hence, can compute prox th via projection on C example: h(x) is sum of largest r components of x h(x) = x [1] + +x [r] = S C (x), C = {y 0 y 1,1 T y = r} Gradient methods 130
148 Convergence of proximal gradient method minimize f(x) = g(x)+h(x) assumptions g is Lipschitz continuous with constant L > 0 g(x) g(y) 2 L x y 2 x,y optimal value f is finite and attained at x (not necessarily unique) result: with fixed step size t k = 1/L f(x (k) ) f L 2k x(0) x 2 2 compare with 1/ k rate of subgradient method can be extended to account for line searches Gradient methods 131
149 Outline gradient and subgradient method proximal gradient method fast proximal gradient methods
150 Fast (proximal) gradient methods Nesterov (1983, 1988, 2005): three gradient projection methods with 1/k 2 convergence rate Beck & Teboulle (2008): FISTA, a proximal gradient version of Nesterov s 1983 method Nesterov (2004 book), Tseng (2008): overview and unified analysis of fast gradient methods several recent variations and extensions this lecture: FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) Gradient methods 132
151 FISTA unconstrained problem with composite objective minimize f(x) = g(x)+h(x) g convex differentiable with domg = R n h convex with inexpensive prox-operator algorithm: choose x (0) = y (0) domh; for k 1 x (k) = prox tk h ( ) y (k 1) t k g(y (k 1) ) y (k) = x (k) + k 1 k +2 (x(k) x (k 1) ) Gradient methods 133
152 Interpretation first iteration (k = 1) is a proximal gradient step at x (0) next iterations are proximal gradient steps at extrapolated points y (k 1) ( x (k) = prox tk h y (k 1) t k g(y (k 1) ) ) x (k 2) x (k 1) y (k 1) sequence x (k) remains feasible (in domh); sequence y (k) not necessarily Gradient methods 134
153 Convergence of FISTA minimize f(x) = g(x)+h(x) assumptions optimal value f is finite and attained at x (not necessarily unique) domg = R n and g is Lipschitz continuous with constant L > 0 h is closed (implies prox th (u) exists and is unique for all u) result: with fixed step size t k = 1/L f(x (k) ) f 2L (k +1) 2 x(0) f 2 2 compare with 1/k convergence rate for gradient method can be extended to account for line searches Gradient methods 135
154 10 0 gradient 10 0 gradient Example minimize log m i=1 exp(a T i x+b i) randomly generated data with m = 2000, n = 1000, same fixed step size 10-1 FISTA 10-1 FISTA f(x (k) ) f f FISTA is not a descent method k k Gradient methods 136
155 Dual methods Convex optimization MLSS 2011 Lagrange duality dual decomposition dual proximal gradient method multiplier methods
156 Dual function convex problem (with linear constraints for simplicity) optimal value p Lagrangian minimize f(x) subject to Gx h Ax = b L(x,λ,ν) = f(x)+λ T (Gx h)+ν T (Ax b) = f(x)+(g T λ+a T ν) T x h T λ b T ν dual function g(λ,ν) = inf x L(x,λ,ν) = f ( G T λ A T ν) h T λ b T ν Dual methods 137
157 Dual problem maximize g(λ, ν) subject to λ 0 optimal value d a convex optimization problem in λ, ν weak duality: p d, without exception strong duality: p = d if a constraint qualification holds (for example, primal problem is feasible and dom f open) Dual methods 138
158 Least-norm solution of linear equations minimize f(x) = x subject to Ax = b recall that f is indicator function of unit dual norm ball dual problem maximize b T ν f ( A T ν) = { b T ν A T ν 1 otherwise reformulated dual problem minimize b T z subjec to A T z 1 Dual methods 139
159 Norm approximation minimize Ax b reformulated problem minimize y subject to y = Ax b dual function dual problem ( g(ν) = inf y +ν T y ν T Ax+b T ν ) x,y { b = T ν A T ν = 0, ν 1 otherwise minimize b T z subject to A T z = 0, z 1 Dual methods 140
160 Karush-Kuhn-Tucker optimality conditions if strong duality holds, then x, λ, ν are optimal if and only if 1. primal feasibility: x domf, Gx h, Ax = b 2. λ 0 3. complementary slackness: λ T (h Gx) = 0 4. x minimizes L(x,λ,ν) = f(x)+λ T (Gx h)+ν T (Ax b) for differentiable f, condition 4 can be expressed as f(x)+g T λ+a T ν = 0 Dual methods 141
161 Outline Lagrange dual dual decomposition dual proximal gradient method multiplier methods
162 Dual methods primal problem minimize f(x) subject to Gx h Ax = b dual problem maximize h T λ b T ν f ( G T λ A T ν) subject to λ 0 possible advantages of solving the dual when using first-order methods dual problem is unconstrained or has simple constraints dual problem can be decomposed into smaller problems Dual methods 142
163 (Sub-)gradients of conjugate function f (y) = sup x ( y T x f(x) ) subgradient: x is a subgradient at y if it maximizes y T x f(x) if maximizing x is unique, then f is differentiable this is the case, for example, if f is strictly convex strongly convex function: f is strongly convex with parameter µ if f(x) µ 2 xt x is convex implies that f (x) is Lipschitz continuous with parameter 1/µ Dual methods 143
164 Dual gradient method primal problem with equality constraints and dual minimize f(x) subject to Ax = b dual ascent: use (sub-)gradient method to minimize g(ν) = b T ν +f ( A T ν) = sup x ( (b Ax) T ν f(x) ) algorithm x + = argmin x ( f(x)+ν T Ax ) ν + = ν +t(ax + b) of interest if calculation of x + is inexpensive (for example, separable) Dual methods 144
165 Dual decomposition minimize f 1 (x 1 )+f 2 (x 2 ) subject to G 1 x 1 +G 2 x 2 h objective is separable; constraint is complicating (or coupling) constraint dual problem ( master problem) maximize h T λ f 1( G T 1λ) f 2( G T 2λ) subject to λ 0 can be solved by (sub-)gradient projection if λ 0 is the only constraint subproblems: for j = 1,2, evaluate f j( G T j λ) = inf x j ( fj (x j )+λ T G j x j ) maximizer x j gives subgradient G j x j of f j ( GT j λ) w.r.t. λ Dual methods 145
A Brief Review on Convex Optimization
A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review
More informationCSCI : Optimization and Control of Networks. Review on Convex Optimization
CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one
More information15. Conic optimization
L. Vandenberghe EE236C (Spring 216) 15. Conic optimization conic linear program examples modeling duality 15-1 Generalized (conic) inequalities Conic inequality: a constraint x K where K is a convex cone
More informationEE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1
EE 546, Univ of Washington, Spring 2012 6. Proximal mapping introduction review of conjugate functions proximal mapping Proximal mapping 6 1 Proximal mapping the proximal mapping (prox-operator) of a convex
More informationLecture: Convex Optimization Problems
1/36 Lecture: Convex Optimization Problems http://bicmr.pku.edu.cn/~wenzw/opt-2015-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/36 optimization
More information4. Convex optimization problems
Convex Optimization Boyd & Vandenberghe 4. Convex optimization problems optimization problem in standard form convex optimization problems quasiconvex optimization linear optimization quadratic optimization
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationConvex optimization problems. Optimization problem in standard form
Convex optimization problems optimization problem in standard form convex optimization problems linear optimization quadratic optimization geometric programming quasiconvex optimization generalized inequality
More informationL. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1
L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones definition spectral decomposition quadratic representation log-det barrier 18-1 Introduction This lecture: theoretical properties of the following
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More information3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions
3. Convex functions Convex Optimization Boyd & Vandenberghe basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions log-concave and log-convex functions
More information4. Convex optimization problems
Convex Optimization Boyd & Vandenberghe 4. Convex optimization problems optimization problem in standard form convex optimization problems quasiconvex optimization linear optimization quadratic optimization
More informationTutorial on Convex Optimization for Engineers Part II
Tutorial on Convex Optimization for Engineers Part II M.Sc. Jens Steinwandt Communications Research Laboratory Ilmenau University of Technology PO Box 100565 D-98684 Ilmenau, Germany jens.steinwandt@tu-ilmenau.de
More informationLecture: Duality of LP, SOCP and SDP
1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:
More information3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions
3. Convex functions Convex Optimization Boyd & Vandenberghe basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions log-concave and log-convex functions
More informationUnconstrained minimization
CSCI5254: Convex Optimization & Its Applications Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions 1 Unconstrained
More informationThe proximal mapping
The proximal mapping http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/37 1 closed function 2 Conjugate function
More informationConvex Optimization and l 1 -minimization
Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l
More informationExtreme Abridgment of Boyd and Vandenberghe s Convex Optimization
Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The
More informationOptimisation convexe: performance, complexité et applications.
Optimisation convexe: performance, complexité et applications. Introduction, convexité, dualité. A. d Aspremont. M2 OJME: Optimisation convexe. 1/128 Today Convex optimization: introduction Course organization
More information10. Unconstrained minimization
Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationFast proximal gradient methods
L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient
More informationConvex Optimization. 9. Unconstrained minimization. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University
Convex Optimization 9. Unconstrained minimization Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2017 Autumn Semester SJTU Ying Cui 1 / 40 Outline Unconstrained minimization
More information1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method
L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order
More informationConvex Optimization M2
Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization
More informationConvex Optimization M2
Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial
More informationCS-E4830 Kernel Methods in Machine Learning
CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This
More information5. Duality. Lagrangian
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationSubgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus
1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationConvex Optimization Boyd & Vandenberghe. 5. Duality
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications
ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications Professor M. Chiang Electrical Engineering Department, Princeton University March
More information8. Conjugate functions
L. Vandenberghe EE236C (Spring 2013-14) 8. Conjugate functions closed functions conjugate function 8-1 Closed set a set C is closed if it contains its boundary: x k C, x k x = x C operations that preserve
More information1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics
1. Introduction Convex Optimization Boyd & Vandenberghe mathematical optimization least-squares and linear programming convex optimization example course goals and topics nonlinear optimization brief history
More information1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics
1. Introduction Convex Optimization Boyd & Vandenberghe mathematical optimization least-squares and linear programming convex optimization example course goals and topics nonlinear optimization brief history
More information1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics
1. Introduction mathematical optimization least-squares and linear programming convex optimization example course goals and topics nonlinear optimization brief history of convex optimization 1 1 Mathematical
More informationAdvances in Convex Optimization: Theory, Algorithms, and Applications
Advances in Convex Optimization: Theory, Algorithms, and Applications Stephen Boyd Electrical Engineering Department Stanford University (joint work with Lieven Vandenberghe, UCLA) ISIT 02 ISIT 02 Lausanne
More informationLecture: Duality.
Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong
More information4. Convex optimization problems (part 1: general)
EE/AA 578, Univ of Washington, Fall 2016 4. Convex optimization problems (part 1: general) optimization problem in standard form convex optimization problems quasiconvex optimization 4 1 Optimization problem
More information9. Dual decomposition and dual algorithms
EE 546, Univ of Washington, Spring 2016 9. Dual decomposition and dual algorithms dual gradient ascent example: network rate control dual decomposition and the proximal gradient method examples with simple
More informationA : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:
1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 6. Suvrit Sra. (Conic optimization) 07 Feb, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 6 (Conic optimization) 07 Feb, 2013 Suvrit Sra Organizational Info Quiz coming up on 19th Feb. Project teams by 19th Feb Good if you can mix your research
More information1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics
1. Introduction Convex Optimization Boyd & Vandenberghe mathematical optimization least-squares and linear programming convex optimization example course goals and topics nonlinear optimization brief history
More information1. Introduction. mathematical optimization. least-squares and linear programming. convex optimization. example. course goals and topics
1. Introduction Convex Optimization Boyd & Vandenberghe mathematical optimization least-squares and linear programming convex optimization example course goals and topics nonlinear optimization brief history
More informationConvex Optimization in Communications and Signal Processing
Convex Optimization in Communications and Signal Processing Prof. Dr.-Ing. Wolfgang Gerstacker 1 University of Erlangen-Nürnberg Institute for Digital Communications National Technical University of Ukraine,
More informationLecture Note 5: Semidefinite Programming for Stability Analysis
ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State
More informationA : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:
1-5: Least-squares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 Chih-Jen Lin (National
More informationPrimal/Dual Decomposition Methods
Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients
More informationLecture 9 Sequential unconstrained minimization
S. Boyd EE364 Lecture 9 Sequential unconstrained minimization brief history of SUMT & IP methods logarithmic barrier function central path UMT & SUMT complexity analysis feasibility phase generalized inequalities
More informationEE/AA 578, Univ of Washington, Fall Duality
7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian
More informationLecture 15 Newton Method and Self-Concordance. October 23, 2008
Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications
More informationConvex Optimization & Lagrange Duality
Convex Optimization & Lagrange Duality Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Convex optimization Optimality condition Lagrange duality KKT
More informationConvex Optimization. Lieven Vandenberghe Electrical Engineering Department, UCLA. Joint work with Stephen Boyd, Stanford University
Convex Optimization Lieven Vandenberghe Electrical Engineering Department, UCLA Joint work with Stephen Boyd, Stanford University Ph.D. School in Optimization in Computer Vision DTU, May 19, 2008 Introduction
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationGeometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as
Chapter 8 Geometric problems 8.1 Projection on a set The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as dist(x 0,C) = inf{ x 0 x x C}. The infimum here is always achieved.
More informationORIE 6326: Convex Optimization. Quasi-Newton Methods
ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted
More informationAgenda. Interior Point Methods. 1 Barrier functions. 2 Analytic center. 3 Central path. 4 Barrier method. 5 Primal-dual path following algorithms
Agenda Interior Point Methods 1 Barrier functions 2 Analytic center 3 Central path 4 Barrier method 5 Primal-dual path following algorithms 6 Nesterov Todd scaling 7 Complexity analysis Interior point
More informationInterior Point Algorithms for Constrained Convex Optimization
Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems
More informationInterior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems
AMSC 607 / CMSC 764 Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 4: Introduction to Interior Point Methods Dianne P. O Leary c 2008 Interior Point Methods We ll discuss
More informationSparse Optimization Lecture: Basic Sparse Optimization Models
Sparse Optimization Lecture: Basic Sparse Optimization Models Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know basic l 1, l 2,1, and nuclear-norm
More informationLecture 14 Barrier method
L. Vandenberghe EE236A (Fall 2013-14) Lecture 14 Barrier method centering problem Newton decrement local convergence of Newton method short-step barrier method global convergence of Newton method predictor-corrector
More informationConvex Optimization and Modeling
Convex Optimization and Modeling Convex Optimization Fourth lecture, 05.05.2010 Jun.-Prof. Matthias Hein Reminder from last time Convex functions: first-order condition: f(y) f(x) + f x,y x, second-order
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More informationCourse Outline. FRTN10 Multivariable Control, Lecture 13. General idea for Lectures Lecture 13 Outline. Example 1 (Doyle Stein, 1979)
Course Outline FRTN Multivariable Control, Lecture Automatic Control LTH, 6 L-L Specifications, models and loop-shaping by hand L6-L8 Limitations on achievable performance L9-L Controller optimization:
More informationConvex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)
Convex Functions Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Definition convex function Examples
More informationDuality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities
Duality Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Lagrangian Consider the optimization problem in standard form
More informationDual Proximal Gradient Method
Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method
More informationConvex Functions. Pontus Giselsson
Convex Functions Pontus Giselsson 1 Today s lecture lower semicontinuity, closure, convex hull convexity preserving operations precomposition with affine mapping infimal convolution image function supremum
More informationLecture: Examples of LP, SOCP and SDP
1/34 Lecture: Examples of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html wenzw@pku.edu.cn Acknowledgement:
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More information18. Primal-dual interior-point methods
L. Vandenberghe EE236C (Spring 213-14) 18. Primal-dual interior-point methods primal-dual central path equations infeasible primal-dual method primal-dual method for self-dual embedding 18-1 Symmetric
More informationDual Decomposition.
1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:
More informationLecture 2: Convex functions
Lecture 2: Convex functions f : R n R is convex if dom f is convex and for all x, y dom f, θ [0, 1] f is concave if f is convex f(θx + (1 θ)y) θf(x) + (1 θ)f(y) x x convex concave neither x examples (on
More informationConvex functions. Definition. f : R n! R is convex if dom f is a convex set and. f ( x +(1 )y) < f (x)+(1 )f (y) f ( x +(1 )y) apple f (x)+(1 )f (y)
Convex functions I basic properties and I operations that preserve convexity I quasiconvex functions I log-concave and log-convex functions IOE 611: Nonlinear Programming, Fall 2017 3. Convex functions
More informationLinear and non-linear programming
Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)
More informationI.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010
I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0
More informationEE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17
EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko
More informationICS-E4030 Kernel Methods in Machine Learning
ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This
More informationBarrier Method. Javier Peña Convex Optimization /36-725
Barrier Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: Newton s method For root-finding F (x) = 0 x + = x F (x) 1 F (x) For optimization x f(x) x + = x 2 f(x) 1 f(x) Assume f strongly
More informationHW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.
HW1 solutions Exercise 1 (Some sets of probability distributions.) Let x be a real-valued random variable with Prob(x = a i ) = p i, i = 1,..., n, where a 1 < a 2 < < a n. Of course p R n lies in the standard
More informationUnconstrained minimization: assumptions
Unconstrained minimization I terminology and assumptions I gradient descent method I steepest descent method I Newton s method I self-concordant functions I implementation IOE 611: Nonlinear Programming,
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More information8. Geometric problems
8. Geometric problems Convex Optimization Boyd & Vandenberghe extremal volume ellipsoids centering classification placement and facility location 8 Minimum volume ellipsoid around a set Löwner-John ellipsoid
More informationminimize x x2 2 x 1x 2 x 1 subject to x 1 +2x 2 u 1 x 1 4x 2 u 2, 5x 1 +76x 2 1,
4 Duality 4.1 Numerical perturbation analysis example. Consider the quadratic program with variables x 1, x 2, and parameters u 1, u 2. minimize x 2 1 +2x2 2 x 1x 2 x 1 subject to x 1 +2x 2 u 1 x 1 4x
More informationLagrangian Duality and Convex Optimization
Lagrangian Duality and Convex Optimization David Rosenberg New York University February 11, 2015 David Rosenberg (New York University) DS-GA 1003 February 11, 2015 1 / 24 Introduction Why Convex Optimization?
More information10 Numerical methods for constrained problems
10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside
More informationsubject to (x 2)(x 4) u,
Exercises Basic definitions 5.1 A simple example. Consider the optimization problem with variable x R. minimize x 2 + 1 subject to (x 2)(x 4) 0, (a) Analysis of primal problem. Give the feasible set, the
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationA ten page introduction to conic optimization
CHAPTER 1 A ten page introduction to conic optimization This background chapter gives an introduction to conic optimization. We do not give proofs, but focus on important (for this thesis) tools and concepts.
More informationSemidefinite Programming Basics and Applications
Semidefinite Programming Basics and Applications Ray Pörn, principal lecturer Åbo Akademi University Novia University of Applied Sciences Content What is semidefinite programming (SDP)? How to represent
More informationLecture 3. Optimization Problems and Iterative Algorithms
Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex
More informationLecture 8 Plus properties, merit functions and gap functions. September 28, 2008
Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:
More informationLecture 1: January 12
10-725/36-725: Convex Optimization Fall 2015 Lecturer: Ryan Tibshirani Lecture 1: January 12 Scribes: Seo-Jin Bang, Prabhat KC, Josue Orellana 1.1 Review We begin by going through some examples and key
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More information