A Brief Review on Convex Optimization


Convex set

A set $S \subseteq \mathbf{R}^n$ is convex if for all $x, y \in S$ and all $\lambda, \mu \ge 0$ with $\lambda + \mu = 1$, we have $\lambda x + \mu y \in S$.

geometrically: $x, y \in S$ implies the line segment through $x, y$ lies in $S$

(figure: three example sets, one convex and two nonconvex)

Hyperplanes and halfspaces

hyperplane: set of the form $\{x \mid a^T x = b\}$ ($a \neq 0$)
halfspace: set of the form $\{x \mid a^T x \le b\}$ ($a \neq 0$)

(figure: hyperplane through $x_0$ with normal vector $a$, separating the halfspaces $a^T x \ge b$ and $a^T x \le b$)

- $a$ is the normal vector
- hyperplanes and halfspaces are convex

Euclidean balls and ellipsoids

Euclidean ball with center $x_c$ and radius $r$:
$$B(x_c, r) = \{x \mid \|x - x_c\|_2 \le r\} = \{x_c + ru \mid \|u\|_2 \le 1\}$$

ellipsoid: set of the form
$$\{x \mid (x - x_c)^T P^{-1} (x - x_c) \le 1\}$$
with $P \in \mathbf{S}^n_{++}$ (i.e., $P$ symmetric positive definite)

(figure: ellipsoid centered at $x_c$)

other representation: $\{x_c + Au \mid \|u\|_2 \le 1\}$ with $A$ square and nonsingular

Polyhedra

solution set of finitely many linear inequalities and equalities
$$Ax \preceq b, \qquad Cx = d$$
($A \in \mathbf{R}^{m \times n}$, $C \in \mathbf{R}^{p \times n}$, $\preceq$ is componentwise inequality)

(figure: polyhedron $P$ bounded by hyperplanes with normal vectors $a_1, \ldots, a_5$)

a polyhedron is the intersection of a finite number of halfspaces and hyperplanes

Convex functions

$f : \mathbf{R}^n \to \mathbf{R}$ is convex if $\operatorname{dom} f$ is a convex set and
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$$
for all $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1$

(figure: the chord from $(x, f(x))$ to $(y, f(y))$ lies above the graph of $f$)

- $f$ is concave if $-f$ is convex
- $f$ is strictly convex if $\operatorname{dom} f$ is convex and
$$f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y)$$
for $x, y \in \operatorname{dom} f$, $x \neq y$, $0 < \theta < 1$

Examples on R

convex:
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- exponential: $e^{ax}$, for any $a \in \mathbf{R}$
- powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $\alpha \ge 1$ or $\alpha \le 0$
- powers of absolute value: $|x|^p$ on $\mathbf{R}$, for $p \ge 1$
- negative entropy: $x \log x$ on $\mathbf{R}_{++}$

concave:
- affine: $ax + b$ on $\mathbf{R}$, for any $a, b \in \mathbf{R}$
- powers: $x^\alpha$ on $\mathbf{R}_{++}$, for $0 \le \alpha \le 1$
- logarithm: $\log x$ on $\mathbf{R}_{++}$

Examples on R^n and R^{m×n}

affine functions are convex and concave; all norms are convex

example on $\mathbf{R}^n$:
- affine function $f(x) = a^T x + b$

example on $\mathbf{R}^{m \times n}$ ($m \times n$ matrices):
- affine function $f(X) = \operatorname{tr}(A^T X) + b = \sum_{i=1}^m \sum_{j=1}^n A_{ij} X_{ij} + b$

Restriction of a convex function to a line

$f : \mathbf{R}^n \to \mathbf{R}$ is convex if and only if the function $g : \mathbf{R} \to \mathbf{R}$,
$$g(t) = f(x + tv), \qquad \operatorname{dom} g = \{t \mid x + tv \in \operatorname{dom} f\}$$
is convex (in $t$) for any $x \in \operatorname{dom} f$, $v \in \mathbf{R}^n$

so we can check convexity of $f$ by checking convexity of functions of one variable
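
As a quick illustration (not from the original slides), here is a minimal numpy sketch that samples the one-variable restriction $g(t) = f(x + tv)$ along a random line and tests the convexity inequality on pairs of sample points. The helper name `convex_along_line` is hypothetical, and passing the test is only a necessary check, not a proof of convexity.

```python
import numpy as np

def convex_along_line(f, x, v, ts=np.linspace(-1.0, 1.0, 41), theta=0.5, tol=1e-9):
    """Check g(theta*t1 + (1-theta)*t2) <= theta*g(t1) + (1-theta)*g(t2) on samples."""
    g = lambda s: f(x + s * v)
    for t1 in ts:
        for t2 in ts:
            if g(theta * t1 + (1 - theta) * t2) > theta * g(t1) + (1 - theta) * g(t2) + tol:
                return False
    return True

f = lambda z: np.log(np.sum(np.exp(z)))  # log-sum-exp, a known convex function
rng = np.random.default_rng(0)
x, v = rng.standard_normal(5), rng.standard_normal(5)
print(convex_along_line(f, x, v))  # True: no violation found along this line
```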

First-order condition

$f$ is differentiable if $\operatorname{dom} f$ is open and the gradient
$$\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)$$
exists at each $x \in \operatorname{dom} f$

1st-order condition: differentiable $f$ with convex domain is convex iff
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \text{for all } x, y \in \operatorname{dom} f$$

(figure: the tangent $f(x) + \nabla f(x)^T (y - x)$ at $(x, f(x))$ lies below the graph of $f$)

the first-order approximation of $f$ is a global underestimator
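
A small numeric sanity check (an addition, not part of the slides): for the convex log-sum-exp function, the first-order Taylor expansion at a fixed point should underestimate the function everywhere. The assertion below tests this at random points.

```python
import numpy as np

def lse(z):
    return np.log(np.sum(np.exp(z)))

def grad_lse(z):
    e = np.exp(z)
    return e / np.sum(e)

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
# f(y) >= f(x) + grad f(x)^T (y - x) should hold for every y.
for _ in range(1000):
    y = rng.standard_normal(4)
    assert lse(y) >= lse(x) + grad_lse(x) @ (y - x) - 1e-12
print("first-order underestimator verified on 1000 random points")
```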

Second-order conditions

$f$ is twice differentiable if $\operatorname{dom} f$ is open and the Hessian $\nabla^2 f(x) \in \mathbf{S}^n$,
$$\nabla^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \qquad i, j = 1, \ldots, n,$$
exists at each $x \in \operatorname{dom} f$

2nd-order conditions for twice differentiable $f$ with convex domain:
- $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \operatorname{dom} f$
- if $\nabla^2 f(x) \succ 0$ for all $x \in \operatorname{dom} f$, then $f$ is strictly convex

Examples

quadratic function: $f(x) = (1/2) x^T P x + q^T x + r$ (with $P \in \mathbf{S}^n$)
$$\nabla f(x) = Px + q, \qquad \nabla^2 f(x) = P$$
convex if $P \succeq 0$

least-squares objective: $f(x) = \|Ax - b\|_2^2$
$$\nabla f(x) = 2A^T (Ax - b), \qquad \nabla^2 f(x) = 2A^T A$$
convex (for any $A$)

quadratic-over-linear: $f(x, y) = x^2 / y$
$$\nabla^2 f(x, y) = \frac{2}{y^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y \\ -x \end{bmatrix}^T \succeq 0$$
convex for $y > 0$

(figure: surface plot of $f(x, y) = x^2 / y$)
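
To make the quadratic-over-linear example concrete, this short sketch (an addition to the slides) evaluates the Hessian formula above at random points with $y > 0$ and confirms numerically that its eigenvalues are nonnegative, i.e., the Hessian is positive semidefinite.

```python
import numpy as np

def hess_quad_over_lin(x, y):
    """Hessian of f(x, y) = x**2 / y for y > 0, per the rank-one formula above."""
    v = np.array([y, -x])
    return (2.0 / y**3) * np.outer(v, v)

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.standard_normal(), rng.uniform(0.1, 5.0)
    eigvals = np.linalg.eigvalsh(hess_quad_over_lin(x, y))
    assert eigvals.min() >= -1e-10  # PSD up to round-off, so f is convex on y > 0
print("Hessian PSD verified for y > 0")
```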

Epigraph and sublevel set

$\alpha$-sublevel set of $f : \mathbf{R}^n \to \mathbf{R}$:
$$C_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$
sublevel sets of convex functions are convex (the converse is false)

epigraph of $f : \mathbf{R}^n \to \mathbf{R}$:
$$\operatorname{epi} f = \{(x, t) \in \mathbf{R}^{n+1} \mid x \in \operatorname{dom} f,\ f(x) \le t\}$$

(figure: epigraph of a convex function $f$, the region above its graph)

$f$ is convex if and only if $\operatorname{epi} f$ is a convex set

Jensen's inequality

basic inequality: if $f$ is convex, then for $0 \le \theta \le 1$,
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$$

extension: if $f$ is convex, then
$$f(\mathbf{E} z) \le \mathbf{E} f(z)$$
for any random variable $z$

the basic inequality is the special case with discrete distribution
$$\operatorname{prob}(z = x) = \theta, \qquad \operatorname{prob}(z = y) = 1 - \theta$$

Optimization problem in standard form

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $h_i(x) = 0$,  $i = 1, \ldots, p$

- $x \in \mathbf{R}^n$ is the optimization variable
- $f_0 : \mathbf{R}^n \to \mathbf{R}$ is the objective or cost function
- $f_i : \mathbf{R}^n \to \mathbf{R}$, $i = 1, \ldots, m$, are the inequality constraint functions
- $h_i : \mathbf{R}^n \to \mathbf{R}$ are the equality constraint functions

optimal value:
$$p^\star = \inf \{f_0(x) \mid f_i(x) \le 0,\ i = 1, \ldots, m,\ h_i(x) = 0,\ i = 1, \ldots, p\}$$
- $p^\star = \infty$ if the problem is infeasible (no $x$ satisfies the constraints)
- $p^\star = -\infty$ if the problem is unbounded below

Optimal and locally optimal points

$x$ is feasible if $x \in \operatorname{dom} f_0$ and it satisfies the constraints

a feasible $x$ is optimal if $f_0(x) = p^\star$; $X_{\text{opt}}$ is the set of optimal points

$x$ is locally optimal if there is an $R > 0$ such that $x$ is optimal for

minimize (over $z$)  $f_0(z)$
subject to           $f_i(z) \le 0$, $i = 1, \ldots, m$,  $h_i(z) = 0$, $i = 1, \ldots, p$
                     $\|z - x\|_2 \le R$

examples (with $n = 1$, $m = p = 0$):
- $f_0(x) = 1/x$, $\operatorname{dom} f_0 = \mathbf{R}_{++}$: $p^\star = 0$, no optimal point
- $f_0(x) = -\log x$, $\operatorname{dom} f_0 = \mathbf{R}_{++}$: $p^\star = -\infty$
- $f_0(x) = x \log x$, $\operatorname{dom} f_0 = \mathbf{R}_{++}$: $p^\star = -1/e$, $x = 1/e$ is optimal
- $f_0(x) = x^3 - 3x$: $p^\star = -\infty$, local optimum at $x = 1$

Feasibility problem

find        $x$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $h_i(x) = 0$,  $i = 1, \ldots, p$

can be considered a special case of the general problem with $f_0(x) = 0$:

minimize    $0$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $h_i(x) = 0$,  $i = 1, \ldots, p$

- $p^\star = 0$ if the constraints are feasible; any feasible $x$ is optimal
- $p^\star = \infty$ if the constraints are infeasible

Convex optimization problem

standard form convex optimization problem:

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $a_i^T x = b_i$,  $i = 1, \ldots, p$

- $f_0, f_1, \ldots, f_m$ are convex; equality constraints are affine
- the problem is quasiconvex if $f_0$ is quasiconvex (and $f_1, \ldots, f_m$ convex)

often written as

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $Ax = b$

important property: the feasible set of a convex optimization problem is convex

example

minimize    $f_0(x) = x_1^2 + x_2^2$
subject to  $f_1(x) = x_1 / (1 + x_2^2) \le 0$
            $h_1(x) = (x_1 + x_2)^2 = 0$

- $f_0$ is convex; the feasible set $\{(x_1, x_2) \mid x_1 = -x_2 \le 0\}$ is convex
- not a convex problem (according to our definition): $f_1$ is not convex, $h_1$ is not affine
- equivalent (but not identical) to the convex problem

minimize    $x_1^2 + x_2^2$
subject to  $x_1 \le 0$
            $x_1 + x_2 = 0$

Local and global optima

any locally optimal point of a convex problem is (globally) optimal

proof: suppose $x$ is locally optimal and $y$ is optimal with $f_0(y) < f_0(x)$

$x$ locally optimal means there is an $R > 0$ such that
$$z \text{ feasible}, \quad \|z - x\|_2 \le R \implies f_0(z) \ge f_0(x)$$

consider $z = \theta y + (1 - \theta) x$ with $\theta = R / (2 \|y - x\|_2)$
- $\|y - x\|_2 > R$ (otherwise local optimality would already give $f_0(y) \ge f_0(x)$), so $0 < \theta < 1/2$
- $z$ is a convex combination of two feasible points, hence also feasible
- $\|z - x\|_2 = R/2$ and
$$f_0(z) \le \theta f_0(y) + (1 - \theta) f_0(x) < f_0(x)$$
which contradicts our assumption that $x$ is locally optimal

Optimality criterion for differentiable f_0

$x$ is optimal if and only if it is feasible and
$$\nabla f_0(x)^T (y - x) \ge 0 \quad \text{for all feasible } y$$

(figure: feasible set $X$ with $-\nabla f_0(x)$ normal to a supporting hyperplane at $x$)

if nonzero, $\nabla f_0(x)$ defines a supporting hyperplane to the feasible set $X$ at $x$

- unconstrained problem: $x$ is optimal if and only if
$$x \in \operatorname{dom} f_0, \qquad \nabla f_0(x) = 0$$

- equality constrained problem

  minimize $f_0(x)$  subject to $Ax = b$

  $x$ is optimal if and only if there exists a $\nu$ such that
$$x \in \operatorname{dom} f_0, \qquad Ax = b, \qquad \nabla f_0(x) + A^T \nu = 0$$

- minimization over the nonnegative orthant

  minimize $f_0(x)$  subject to $x \succeq 0$

  $x$ is optimal if and only if
$$x \in \operatorname{dom} f_0, \qquad x \succeq 0, \qquad \begin{cases} \nabla f_0(x)_i \ge 0 & x_i = 0 \\ \nabla f_0(x)_i = 0 & x_i > 0 \end{cases}$$

Linear program (LP)

minimize    $c^T x + d$
subject to  $Gx \preceq h$
            $Ax = b$

- convex problem with affine objective and constraint functions
- feasible set is a polyhedron

(figure: polyhedron $P$ with cost direction $-c$ pointing toward the optimal vertex $x^\star$)

Examples

diet problem: choose quantities $x_1, \ldots, x_n$ of $n$ foods
- one unit of food $j$ costs $c_j$ and contains amount $a_{ij}$ of nutrient $i$
- a healthy diet requires nutrient $i$ in quantity at least $b_i$

to find the cheapest healthy diet (see the sketch below),

minimize    $c^T x$
subject to  $Ax \succeq b$,  $x \succeq 0$

piecewise-linear minimization: minimize $\max_{i=1,\ldots,m} (a_i^T x + b_i)$, equivalent to the LP

minimize    $t$
subject to  $a_i^T x + b_i \le t$,  $i = 1, \ldots, m$
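
As a worked example (added here, assuming the cvxpy package and a bundled LP solver are available), the diet LP can be stated almost verbatim in code. All the numbers are made-up illustration data, not from the slides.

```python
import cvxpy as cp
import numpy as np

# Hypothetical diet data: 3 nutrients, 4 foods.
A = np.array([[2.0, 1.0, 0.0, 3.0],   # A[i, j] = amount of nutrient i in food j
              [0.5, 2.0, 1.5, 0.0],
              [1.0, 0.0, 2.0, 1.0]])
b = np.array([10.0, 8.0, 6.0])        # minimum required amount of each nutrient
c = np.array([1.5, 1.0, 2.0, 0.8])    # cost per unit of each food

x = cp.Variable(4)
prob = cp.Problem(cp.Minimize(c @ x), [A @ x >= b, x >= 0])
prob.solve()
print("cheapest diet:", np.round(x.value, 3), "cost:", round(prob.value, 3))
```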

Quadratic program (QP)

minimize    $(1/2) x^T P x + q^T x + r$
subject to  $Gx \preceq h$
            $Ax = b$

- $P \in \mathbf{S}^n_+$, so the objective is convex quadratic
- minimize a convex quadratic function over a polyhedron

(figure: level curves of $f_0$ over a polyhedron $P$, with $-\nabla f_0(x^\star)$ normal to the feasible set at the optimum $x^\star$)

Examples

least-squares:

minimize    $\|Ax - b\|_2^2$

- analytical solution $x^\star = A^\dagger b$ ($A^\dagger$ is the pseudo-inverse)
- can add linear constraints, e.g., $l \preceq x \preceq u$

linear program with random cost:

minimize    $\bar{c}^T x + \gamma x^T \Sigma x = \mathbf{E}\, c^T x + \gamma \operatorname{var}(c^T x)$
subject to  $Gx \preceq h$,  $Ax = b$

- $c$ is a random vector with mean $\bar{c}$ and covariance $\Sigma$
- hence $c^T x$ is a random variable with mean $\bar{c}^T x$ and variance $x^T \Sigma x$
- $\gamma > 0$ is the risk aversion parameter; it controls the trade-off between expected cost and variance (risk)
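
A minimal numpy check of the analytical least-squares solution (an addition to the slides, using random data): the pseudo-inverse formula $x^\star = A^\dagger b$ agrees with a generic least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))   # k > n, so the minimum is generally nonzero
b = rng.standard_normal(20)

x_pinv = np.linalg.pinv(A) @ b                    # x* = A^+ b, the analytical solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # same solution via a least-squares solver
assert np.allclose(x_pinv, x_lstsq)
print("optimal value:", np.linalg.norm(A @ x_pinv - b) ** 2)
```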

Quadratically constrained quadratic program (QCQP)

minimize    $(1/2) x^T P_0 x + q_0^T x + r_0$
subject to  $(1/2) x^T P_i x + q_i^T x + r_i \le 0$,  $i = 1, \ldots, m$
            $Ax = b$

- $P_i \in \mathbf{S}^n_+$; objective and constraints are convex quadratic
- if $P_1, \ldots, P_m \in \mathbf{S}^n_{++}$, the feasible region is the intersection of $m$ ellipsoids and an affine set

Example: Linear discrimination

separate two sets of points $\{x_1, \ldots, x_N\}$, $\{y_1, \ldots, y_M\}$ by a hyperplane:
$$a^T x_i + b > 0, \quad i = 1, \ldots, N, \qquad a^T y_i + b < 0, \quad i = 1, \ldots, M$$

homogeneous in $a$, $b$, hence equivalent to
$$a^T x_i + b \ge 1, \quad i = 1, \ldots, N, \qquad a^T y_i + b \le -1, \quad i = 1, \ldots, M$$

a set of linear inequalities in $a$, $b$

Robust linear discrimination

(Euclidean) distance between the hyperplanes
$$H_1 = \{z \mid a^T z + b = 1\}, \qquad H_2 = \{z \mid a^T z + b = -1\}$$
is $\operatorname{dist}(H_1, H_2) = 2 / \|a\|_2$

to separate the two sets of points by maximum margin,

minimize    $(1/2) \|a\|_2$
subject to  $a^T x_i + b \ge 1$,  $i = 1, \ldots, N$          (1)
            $a^T y_i + b \le -1$,  $i = 1, \ldots, M$

(after squaring the objective) a QP in $a$, $b$
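
A hedged sketch of problem (1) as a QP in cvxpy (added; not from the slides). The two point clouds are synthetic and assumed separable, which holds with overwhelming probability for the cluster separation used here.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 2)) + np.array([3.0, 3.0])   # class +1 points x_i
Y = rng.standard_normal((30, 2)) - np.array([3.0, 3.0])   # class -1 points y_i

a = cp.Variable(2)
b = cp.Variable()
# Squaring the objective turns problem (1) into a QP.
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(a)),
                  [X @ a + b >= 1, Y @ a + b <= -1])
prob.solve()
print("margin 2/||a||_2 =", 2 / np.linalg.norm(a.value))
```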

Approximate linear separation of non-separable sets

an LP in $a$, $b$, $u$, $v$:

minimize    $\mathbf{1}^T u + \mathbf{1}^T v$
subject to  $a^T x_i + b \ge 1 - u_i$,  $i = 1, \ldots, N$
            $a^T y_i + b \le -1 + v_i$,  $i = 1, \ldots, M$
            $u \succeq 0$,  $v \succeq 0$

- at the optimum, $u_i = \max\{0,\ 1 - a^T x_i - b\}$, $v_i = \max\{0,\ 1 + a^T y_i + b\}$
- can be interpreted as a heuristic for minimizing the number of misclassified points

Support vector classifier

minimize    $\|a\|_2 + \gamma (\mathbf{1}^T u + \mathbf{1}^T v)$
subject to  $a^T x_i + b \ge 1 - u_i$,  $i = 1, \ldots, N$
            $a^T y_i + b \le -1 + v_i$,  $i = 1, \ldots, M$
            $u \succeq 0$,  $v \succeq 0$

produces a point on the trade-off curve between the inverse of the margin $2/\|a\|_2$ and the classification error, measured by the total slack $\mathbf{1}^T u + \mathbf{1}^T v$

(figure: same example as the previous page, with $\gamma = 0.1$)
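
The same problem written in cvxpy (a sketch added here, with synthetic overlapping clusters; the slack variables keep it feasible even when the sets cannot be separated):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 2)) + np.array([1.0, 1.0])   # class +1 (overlapping)
Y = rng.standard_normal((40, 2)) - np.array([1.0, 1.0])   # class -1

a, b = cp.Variable(2), cp.Variable()
u, v = cp.Variable(40, nonneg=True), cp.Variable(40, nonneg=True)
gamma = 0.1
prob = cp.Problem(
    cp.Minimize(cp.norm(a, 2) + gamma * (cp.sum(u) + cp.sum(v))),
    [X @ a + b >= 1 - u, Y @ a + b <= -1 + v])
prob.solve()
print("total slack:", round(cp.sum(u).value + cp.sum(v).value, 3))
```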

Lagrangian

standard form problem (not necessarily convex):

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $h_i(x) = 0$,  $i = 1, \ldots, p$

variable $x \in \mathbf{R}^n$, domain $\mathcal{D}$, optimal value $p^\star$

Lagrangian: $L : \mathbf{R}^n \times \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$, with $\operatorname{dom} L = \mathcal{D} \times \mathbf{R}^m \times \mathbf{R}^p$,
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$$

- weighted sum of objective and constraint functions
- $\lambda_i$ is the Lagrange multiplier associated with $f_i(x) \le 0$
- $\nu_i$ is the Lagrange multiplier associated with $h_i(x) = 0$

Lagrange dual function

Lagrange dual function: $g : \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$,
$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = \inf_{x \in \mathcal{D}} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right)$$

$g$ is concave, and can be $-\infty$ for some $\lambda$, $\nu$

lower bound property: if $\lambda \succeq 0$, then $g(\lambda, \nu) \le p^\star$

proof: if $\bar{x}$ is feasible and $\lambda \succeq 0$, then
$$f_0(\bar{x}) \ge L(\bar{x}, \lambda, \nu) \ge \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = g(\lambda, \nu)$$
minimizing over all feasible $\bar{x}$ gives $p^\star \ge g(\lambda, \nu)$

Least-norm solution of linear equations

minimize    $x^T x$
subject to  $Ax = b$

dual function:
- Lagrangian is $L(x, \nu) = x^T x + \nu^T (Ax - b)$
- to minimize $L$ over $x$, set the gradient equal to zero:
$$\nabla_x L(x, \nu) = 2x + A^T \nu = 0 \implies x = -(1/2) A^T \nu$$
- plug this into $L$ to obtain $g$:
$$g(\nu) = L(-(1/2) A^T \nu,\ \nu) = -\frac{1}{4} \nu^T A A^T \nu - b^T \nu$$
a concave function of $\nu$

lower bound property: $p^\star \ge -(1/4) \nu^T A A^T \nu - b^T \nu$ for all $\nu$
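
A numeric sketch of the lower bound property (added; random data, assuming $AA^T$ is invertible): any $\nu$ gives a lower bound on $p^\star$, and the maximizing $\nu^\star = -2 (AA^T)^{-1} b$ attains it, since strong duality holds for this problem.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 8))   # fat matrix: fewer equations than unknowns
b = rng.standard_normal(3)

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # least-norm solution x* = A^T (A A^T)^{-1} b
p_star = x_star @ x_star

def g(nu):
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

nu_rand = rng.standard_normal(3)
nu_star = -2 * np.linalg.solve(A @ A.T, b)   # maximizer of the concave dual function
print(g(nu_rand) <= p_star)                  # True: lower bound property
print(np.isclose(g(nu_star), p_star))        # True: the bound is tight here
```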

Standard form LP

minimize    $c^T x$
subject to  $Ax = b$,  $x \succeq 0$

dual function:
- Lagrangian is
$$L(x, \lambda, \nu) = c^T x + \nu^T (Ax - b) - \lambda^T x = -b^T \nu + (c + A^T \nu - \lambda)^T x$$
- $L$ is affine in $x$, hence
$$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) = \begin{cases} -b^T \nu & A^T \nu - \lambda + c = 0 \\ -\infty & \text{otherwise} \end{cases}$$

$g$ is linear on the affine domain $\{(\lambda, \nu) \mid A^T \nu - \lambda + c = 0\}$, hence concave

lower bound property: $p^\star \ge -b^T \nu$ if $A^T \nu + c \succeq 0$

The dual problem

Lagrange dual problem:

maximize    $g(\lambda, \nu)$
subject to  $\lambda \succeq 0$

- finds the best lower bound on $p^\star$ obtainable from the Lagrange dual function
- a convex optimization problem; optimal value denoted $d^\star$
- $\lambda, \nu$ are dual feasible if $\lambda \succeq 0$, $(\lambda, \nu) \in \operatorname{dom} g$
- often simplified by making the implicit constraint $(\lambda, \nu) \in \operatorname{dom} g$ explicit

example: standard form LP (from the previous slide) and its dual:

minimize    $c^T x$
subject to  $Ax = b$,  $x \succeq 0$

maximize    $-b^T \nu$
subject to  $A^T \nu + c \succeq 0$

Weak and strong duality

weak duality: $d^\star \le p^\star$
- always holds (for convex and nonconvex problems)

strong duality: $d^\star = p^\star$
- does not hold in general
- (usually) holds for convex problems
- conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater's constraint qualification

strong duality holds for a convex problem

minimize    $f_0(x)$
subject to  $f_i(x) \le 0$,  $i = 1, \ldots, m$
            $Ax = b$

if it is strictly feasible, i.e.,
$$\exists x \in \operatorname{int} \mathcal{D} : \quad f_i(x) < 0, \quad i = 1, \ldots, m, \qquad Ax = b$$

- also guarantees that the dual optimum is attained (if $p^\star > -\infty$)
- can be sharpened: e.g., can replace $\operatorname{int} \mathcal{D}$ with $\operatorname{relint} \mathcal{D}$ (interior relative to the affine hull); linear inequalities do not need to hold with strict inequality, ...
- there exist many other types of constraint qualifications

Inequality form LP

primal problem:

minimize    $c^T x$
subject to  $Ax \preceq b$

dual function:
$$g(\lambda) = \inf_x \left( (c + A^T \lambda)^T x - b^T \lambda \right) = \begin{cases} -b^T \lambda & A^T \lambda + c = 0 \\ -\infty & \text{otherwise} \end{cases}$$

dual problem:

maximize    $-b^T \lambda$
subject to  $A^T \lambda + c = 0$,  $\lambda \succeq 0$

- from Slater's condition: $p^\star = d^\star$ if $A\tilde{x} \prec b$ for some $\tilde{x}$
- in fact, $p^\star = d^\star$ except when primal and dual are both infeasible
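
A sketch verifying strong duality numerically (an addition, assuming cvxpy and its bundled solvers). The data is constructed so the primal is strictly feasible and the dual is feasible, which keeps both optimal values finite.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))
b = A @ rng.standard_normal(3) + rng.uniform(0.5, 1.0, 6)   # strictly feasible primal
c = -A.T @ rng.uniform(0.5, 1.0, 6)                         # makes the dual feasible too

x = cp.Variable(3)
primal = cp.Problem(cp.Minimize(c @ x), [A @ x <= b])
primal.solve()

lam = cp.Variable(6, nonneg=True)
dual = cp.Problem(cp.Maximize(-b @ lam), [A.T @ lam + c == 0])
dual.solve()

print(np.isclose(primal.value, dual.value))   # strong duality: p* = d*
```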

Quadratic program

primal problem (assume $P \in \mathbf{S}^n_{++}$):

minimize    $x^T P x$
subject to  $Ax \preceq b$

dual function:
$$g(\lambda) = \inf_x \left( x^T P x + \lambda^T (Ax - b) \right) = -\frac{1}{4} \lambda^T A P^{-1} A^T \lambda - b^T \lambda$$

dual problem:

maximize    $-(1/4) \lambda^T A P^{-1} A^T \lambda - b^T \lambda$
subject to  $\lambda \succeq 0$

- from Slater's condition: $p^\star = d^\star$ if $A\tilde{x} \prec b$ for some $\tilde{x}$
- in fact, $p^\star = d^\star$ always
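
The same pair can be checked numerically; here is a hedged cvxpy sketch (random data; the feasibility construction and the explicit symmetrization of $A P^{-1} A^T$ guard against round-off when forming the dual objective).

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
n, m = 4, 7
R = rng.standard_normal((n, n))
P = R.T @ R + np.eye(n)                                     # symmetric positive definite
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.uniform(0.5, 1.0, m)   # strictly feasible primal

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(cp.quad_form(x, P)), [A @ x <= b])
primal.solve()

Q = A @ np.linalg.inv(P) @ A.T
Q = (Q + Q.T) / 2                                           # symmetrize against round-off
lam = cp.Variable(m, nonneg=True)
dual = cp.Problem(cp.Maximize(-0.25 * cp.quad_form(lam, Q) - b @ lam))
dual.solve()

print(np.isclose(primal.value, dual.value, atol=1e-6))      # p* = d* always for this QP
```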

Complementary slackness

assume strong duality holds, $x^\star$ is primal optimal, $(\lambda^\star, \nu^\star)$ is dual optimal:

$$f_0(x^\star) = g(\lambda^\star, \nu^\star) = \inf_x \left( f_0(x) + \sum_{i=1}^m \lambda_i^\star f_i(x) + \sum_{i=1}^p \nu_i^\star h_i(x) \right)$$
$$\le f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i(x^\star) \le f_0(x^\star)$$

hence, the two inequalities hold with equality:
- $x^\star$ minimizes $L(x, \lambda^\star, \nu^\star)$
- $\lambda_i^\star f_i(x^\star) = 0$ for $i = 1, \ldots, m$ (known as complementary slackness):
$$\lambda_i^\star > 0 \implies f_i(x^\star) = 0, \qquad f_i(x^\star) < 0 \implies \lambda_i^\star = 0$$

Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called the KKT conditions (for a problem with differentiable $f_i$, $h_i$):

1. primal feasibility: $f_i(x) \le 0$, $i = 1, \ldots, m$, $h_i(x) = 0$, $i = 1, \ldots, p$
2. dual feasibility: $\lambda \succeq 0$
3. complementary slackness: $\lambda_i f_i(x) = 0$, $i = 1, \ldots, m$
4. first-order condition (gradient of the Lagrangian with respect to $x$ vanishes):
$$\nabla_x L(x, \lambda, \nu) = \nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0$$

from the complementary slackness slide: if strong duality holds and $x$, $\lambda$, $\nu$ are optimal, then they must satisfy the KKT conditions

KKT conditions for convex problem

if $\tilde{x}$, $\tilde{\lambda}$, $\tilde{\nu}$ satisfy KKT for a convex problem, then they are optimal:
- from complementary slackness: $f_0(\tilde{x}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$
- from the 4th condition (and convexity): $g(\tilde{\lambda}, \tilde{\nu}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$

hence, $f_0(\tilde{x}) = g(\tilde{\lambda}, \tilde{\nu})$

if Slater's condition is satisfied: $x$ is optimal if and only if there exist $\lambda$, $\nu$ that satisfy the KKT conditions
- recall that Slater implies strong duality, and the dual optimum is attained
- generalizes the optimality condition $\nabla f_0(x) = 0$ for an unconstrained problem

example: water-filling (assume $\alpha_i > 0$)

minimize    $-\sum_{i=1}^n \log(x_i + \alpha_i)$
subject to  $x \succeq 0$,  $\mathbf{1}^T x = 1$

$x$ is optimal iff $x \succeq 0$, $\mathbf{1}^T x = 1$, and there exist $\lambda \in \mathbf{R}^n$, $\nu \in \mathbf{R}$ such that
$$\lambda \succeq 0, \qquad \lambda_i x_i = 0, \qquad \frac{1}{x_i + \alpha_i} + \lambda_i = \nu$$

- if $\nu < 1/\alpha_i$: $\lambda_i = 0$ and $x_i = 1/\nu - \alpha_i$
- if $\nu \ge 1/\alpha_i$: $\lambda_i = \nu - 1/\alpha_i$ and $x_i = 0$
- determine $\nu$ from $\mathbf{1}^T x = \sum_{i=1}^n \max\{0,\ 1/\nu - \alpha_i\} = 1$

interpretation:
- $n$ patches; the level of patch $i$ is at height $\alpha_i$
- flood the area with a unit amount of water; the resulting level is $1/\nu$

(figure: water poured over patches of height $\alpha_i$ up to the common level $1/\nu$)
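
Since $\sum_i \max\{0, 1/\nu - \alpha_i\}$ is monotone in the water level $w = 1/\nu$, the KKT system can be solved by simple bisection. A minimal sketch (added here; `water_filling` is an illustrative helper name):

```python
import numpy as np

def water_filling(alpha, tol=1e-12):
    """Solve the water-filling KKT system by bisection on the water level w = 1/nu."""
    alpha = np.asarray(alpha, dtype=float)
    lo, hi = alpha.min(), alpha.max() + 1.0   # unit water, so the level is in this bracket
    while hi - lo > tol:
        w = 0.5 * (lo + hi)
        if np.maximum(0.0, w - alpha).sum() < 1.0:
            lo = w        # not enough water used yet: raise the level
        else:
            hi = w
    w = 0.5 * (lo + hi)
    return np.maximum(0.0, w - alpha), 1.0 / w   # optimal x and the multiplier nu

alpha = np.array([0.8, 1.0, 1.2, 2.5])
x, nu = water_filling(alpha)
print("x =", np.round(x, 4), " sum =", round(x.sum(), 6), " nu =", round(nu, 4))
```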

Solving unconstrained minimization

minimize    $f(x)$

- $f$ convex, twice continuously differentiable (hence $\operatorname{dom} f$ open)
- we assume the optimal value $p^\star = \inf_x f(x)$ is attained (and finite)

unconstrained minimization methods:
- produce a sequence of points $x^{(k)} \in \operatorname{dom} f$, $k = 0, 1, \ldots$ with $f(x^{(k)}) \to p^\star$
- can be interpreted as iterative methods for solving the optimality condition $\nabla f(x^\star) = 0$

Initial point and sublevel set

algorithms in this chapter require a starting point $x^{(0)}$ such that
- $x^{(0)} \in \operatorname{dom} f$
- the sublevel set $S = \{x \mid f(x) \le f(x^{(0)})\}$ is closed

the 2nd condition is hard to verify, except when all sublevel sets are closed:
- equivalent to the condition that $\operatorname{epi} f$ is closed
- true if $\operatorname{dom} f = \mathbf{R}^n$
- true if $f(x) \to \infty$ as $x \to \operatorname{bd} \operatorname{dom} f$

examples of differentiable functions with closed sublevel sets:
$$f(x) = \log \left( \sum_{i=1}^m \exp(a_i^T x + b_i) \right), \qquad f(x) = -\sum_{i=1}^m \log(b_i - a_i^T x)$$

Strong convexity and implications

$f$ is strongly convex on $S$ if there exists an $m > 0$ such that
$$\nabla^2 f(x) \succeq mI \quad \text{for all } x \in S$$

implications:
- for $x, y \in S$,
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|x - y\|_2^2$$
- hence, $S$ is bounded
- $p^\star > -\infty$, and for $x \in S$,
$$f(x) - p^\star \le \frac{1}{2m} \|\nabla f(x)\|_2^2$$
useful as a stopping criterion (if you know $m$)

Descent methods

$$x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)} \quad \text{with } f(x^{(k+1)}) < f(x^{(k)})$$

- other notations: $x^+ = x + t \Delta x$, $x := x + t \Delta x$
- $\Delta x$ is the step, or search direction; $t$ is the step size, or step length
- from convexity, $f(x^+) < f(x)$ implies $\nabla f(x)^T \Delta x < 0$ (i.e., $\Delta x$ is a descent direction)

General descent method.
given a starting point $x \in \operatorname{dom} f$.
repeat
    1. Determine a descent direction $\Delta x$.
    2. Line search. Choose a step size $t > 0$.
    3. Update. $x := x + t \Delta x$.
until stopping criterion is satisfied.

Line search types

exact line search: $t = \operatorname{argmin}_{t > 0} f(x + t \Delta x)$

backtracking line search (with parameters $\alpha \in (0, 1/2)$, $\beta \in (0, 1)$):
- starting at $t = 1$, repeat $t := \beta t$ until
$$f(x + t \Delta x) < f(x) + \alpha t \nabla f(x)^T \Delta x$$
- graphical interpretation: backtrack until $t \le t_0$

(figure: $f(x + t \Delta x)$ versus $t$, together with the lines $f(x) + t \nabla f(x)^T \Delta x$ and $f(x) + \alpha t \nabla f(x)^T \Delta x$; $t_0$ marks where the latter crosses $f(x + t \Delta x)$)
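
A direct transcription of the backtracking rule into Python (added as a sketch; the demo function and parameter values are illustrative):

```python
import numpy as np

def backtracking(f, grad_f, x, dx, alpha=0.25, beta=0.5):
    """Shrink t by beta until the sufficient-decrease condition holds."""
    t = 1.0
    while f(x + t * dx) >= f(x) + alpha * t * grad_f(x) @ dx:
        t *= beta
    return t

# Demo on f(x) = log(sum(exp(x))) with the negative gradient as descent direction.
f = lambda x: np.log(np.sum(np.exp(x)))
grad_f = lambda x: np.exp(x) / np.sum(np.exp(x))
x = np.array([1.0, -2.0, 0.5])
dx = -grad_f(x)
print("step size t =", backtracking(f, grad_f, x, dx))
```

The loop always terminates when $\Delta x$ is a descent direction, since $\nabla f(x)^T \Delta x < 0$ makes the condition hold for small enough $t$.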

Gradient descent method

general descent method with $\Delta x = -\nabla f(x)$

given a starting point $x \in \operatorname{dom} f$.
repeat
    1. $\Delta x := -\nabla f(x)$.
    2. Line search. Choose step size $t$ via exact or backtracking line search.
    3. Update. $x := x + t \Delta x$.
until stopping criterion is satisfied.

- stopping criterion usually of the form $\|\nabla f(x)\|_2 \le \epsilon$
- convergence result: for strongly convex $f$,
$$f(x^{(k)}) - p^\star \le c^k \left( f(x^{(0)}) - p^\star \right)$$
where $c \in (0, 1)$ depends on $m$, $x^{(0)}$, and the line search type
- very simple, but often very slow; rarely used in practice
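
Putting the pieces together, here is a self-contained sketch of the gradient method with backtracking (an addition to the slides; names and parameter values are illustrative). The demo minimizes a strongly convex quadratic and compares against the closed-form optimum.

```python
import numpy as np

def gradient_descent(f, grad_f, x0, eps=1e-8, alpha=0.25, beta=0.5, max_iter=10000):
    """Gradient method with backtracking; stops when ||grad f(x)||_2 <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        dx = -g
        t = 1.0
        while f(x + t * dx) >= f(x) + alpha * t * g @ dx:   # backtracking line search
            t *= beta
        x = x + t * dx
    return x

# Demo: minimize f(x) = 0.5 x^T P x + q^T x; the optimum solves P x = -q.
P = np.array([[3.0, 0.5], [0.5, 1.0]])
q = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ P @ x + q @ x
grad_f = lambda x: P @ x + q
x_star = gradient_descent(f, grad_f, np.zeros(2))
print(np.allclose(x_star, np.linalg.solve(P, -q), atol=1e-6))
```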

quadratic problem in R^2

$$f(x) = (1/2)(x_1^2 + \gamma x_2^2) \qquad (\gamma > 0)$$

with exact line search, starting at $x^{(0)} = (\gamma, 1)$:
$$x_1^{(k)} = \gamma \left( \frac{\gamma - 1}{\gamma + 1} \right)^k, \qquad x_2^{(k)} = \left( -\frac{\gamma - 1}{\gamma + 1} \right)^k$$

very slow if $\gamma \gg 1$ or $\gamma \ll 1$

(figure: for $\gamma = 10$, the iterates $x^{(0)}, x^{(1)}, \ldots$ zig-zag across the level curves of $f$)
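
The closed-form iterates can be reproduced numerically (an added check): for a quadratic $f(x) = \frac{1}{2} x^T H x$, the exact line search step is $t = g^T g / (g^T H g)$ with $g = \nabla f(x) = Hx$.

```python
import numpy as np

gamma = 10.0
H = np.diag([1.0, gamma])           # Hessian of f(x) = 0.5*(x1^2 + gamma*x2^2)

x = np.array([gamma, 1.0])          # x^(0) = (gamma, 1)
r = (gamma - 1) / (gamma + 1)
for k in range(10):
    closed_form = np.array([gamma * r**k, (-r) ** k])
    assert np.allclose(x, closed_form)
    g = H @ x                       # gradient
    t = (g @ g) / (g @ H @ g)       # exact line search step for a quadratic
    x = x - t * g
print("closed-form iterates confirmed for k = 0..9")
```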

nonquadratic example

$$f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$$

(figures: iterates $x^{(0)}, x^{(1)}, x^{(2)}, \ldots$ on the level curves of $f$, with backtracking line search and with exact line search)

a problem in R^100

$$f(x) = c^T x - \sum_{i=1}^{500} \log(b_i - a_i^T x)$$

(figure: $f(x^{(k)}) - p^\star$ versus $k$ on a semilog scale, decreasing from about $10^4$ to $10^{-4}$ within roughly 200 iterations, for exact and backtracking line search)

linear convergence, i.e., a straight line on a semilog plot