CSCI7000-016: Optimization and Control of Networks. Review on Convex Optimization

CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1

Convex set: S ⊆ R n is convex if x, y ∈ S, λ, µ ≥ 0, λ + µ = 1 ⟹ λx + µy ∈ S geometrically: x, y ∈ S ⟹ line segment through x, y lies in S examples (one convex, two nonconvex sets) Review on Convex Optimization 2

Hyperplanes and halfspaces hyperplane: set of the form {x | a T x = b} (a ≠ 0) halfspace: set of the form {x | a T x ≤ b} (a ≠ 0) a is the normal vector hyperplanes and halfspaces are convex Review on Convex Optimization 3

Euclidean balls and ellipsoids Euclidean ball with center x c and radius r: B(x c, r) = {x | ‖x − x c‖ 2 ≤ r} = {x c + ru | ‖u‖ 2 ≤ 1} ellipsoid: set of the form {x | (x − x c) T P −1 (x − x c) ≤ 1} with P ∈ S n ++ (i.e., P symmetric positive definite) other representation: {x c + Au | ‖u‖ 2 ≤ 1} with A square and nonsingular Review on Convex Optimization 4

Polyhedra solution set of finitely many linear inequalities and equalities: Ax ⪯ b, Cx = d (A ∈ R m×n, C ∈ R p×n, ⪯ is componentwise inequality) a polyhedron is the intersection of a finite number of halfspaces and hyperplanes Review on Convex Optimization 5

Convex functions f : R n → R is convex if dom f is a convex set and f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y) for all x, y ∈ dom f, 0 ≤ θ ≤ 1 (the graph lies below the chord from (x, f(x)) to (y, f(y))) f is concave if −f is convex f is strictly convex if dom f is convex and f(θx + (1−θ)y) < θf(x) + (1−θ)f(y) for x, y ∈ dom f, x ≠ y, 0 < θ < 1 Review on Convex Optimization 6

Examples on R convex: affine: ax + b on R, for any a, b ∈ R exponential: e ax, for any a ∈ R powers: x α on R ++, for α ≥ 1 or α ≤ 0 powers of absolute value: |x| p on R, for p ≥ 1 negative entropy: x log x on R ++ concave: affine: ax + b on R, for any a, b ∈ R powers: x α on R ++, for 0 ≤ α ≤ 1 logarithm: log x on R ++ Review on Convex Optimization 7

Examples on R n and R m×n affine functions are convex and concave; all norms are convex examples on R n: affine function f(x) = a T x + b examples on R m×n (m × n matrices): affine function f(X) = tr(A T X) + b = Σ_{i=1}^m Σ_{j=1}^n A ij X ij + b Review on Convex Optimization 8

Restriction of a convex function to a line f : R n → R is convex if and only if the function g : R → R, g(t) = f(x + tv), dom g = {t | x + tv ∈ dom f} is convex (in t) for any x ∈ dom f, v ∈ R n. can check convexity of f by checking convexity of functions of one variable Review on Convex Optimization 9

First-order condition f is differentiable if dom f is open and the gradient ∇f(x) = (∂f(x)/∂x 1, ∂f(x)/∂x 2, ..., ∂f(x)/∂x n) exists at each x ∈ dom f 1st-order condition: differentiable f with convex domain is convex iff f(y) ≥ f(x) + ∇f(x) T (y − x) for all x, y ∈ dom f the first-order approximation of f is a global underestimator Review on Convex Optimization 10

Second-order conditions f is twice differentiable if dom f is open and the Hessian ∇ 2 f(x) ∈ S n, ∇ 2 f(x) ij = ∂ 2 f(x)/∂x i ∂x j, i, j = 1, ..., n, exists at each x ∈ dom f 2nd-order conditions for twice differentiable f with convex domain: f is convex if and only if ∇ 2 f(x) ⪰ 0 for all x ∈ dom f if ∇ 2 f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex Review on Convex Optimization 11

Examples quadratic function: f(x) = (1/2)x T Px + q T x + r (with P ∈ S n) ∇f(x) = Px + q, ∇ 2 f(x) = P, convex if P ⪰ 0 least-squares objective: f(x) = ‖Ax − b‖ 2 2 ∇f(x) = 2A T (Ax − b), ∇ 2 f(x) = 2A T A, convex (for any A) quadratic-over-linear: f(x, y) = x 2 /y ∇ 2 f(x, y) = (2/y 3) [y −x][y −x] T ⪰ 0, convex for y > 0 Review on Convex Optimization 12
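
A small numerical sanity check of the second-order condition, added here as a sketch (not part of the original slides): for f(x, y) = x 2 /y on y > 0, the Hessian (2/y 3)[y −x][y −x] T should be positive semidefinite at every sampled point.

```python
import numpy as np

def hessian_quad_over_lin(x, y):
    # Hessian of f(x, y) = x^2 / y, valid for y > 0
    v = np.array([y, -x])
    return (2.0 / y**3) * np.outer(v, v)

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal()
    y = rng.uniform(0.1, 10.0)        # restrict to the domain y > 0
    eigvals = np.linalg.eigvalsh(hessian_quad_over_lin(x, y))
    assert eigvals.min() >= -1e-12    # PSD up to numerical tolerance
print("Hessian of x^2/y is PSD at all sampled points with y > 0")
```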

Epigraph and sublevel set α-sublevel set of f : R n → R: C α = {x ∈ dom f | f(x) ≤ α} sublevel sets of convex functions are convex (converse is false) epigraph of f : R n → R: epi f = {(x, t) ∈ R n+1 | x ∈ dom f, f(x) ≤ t} f is convex if and only if epi f is a convex set Review on Convex Optimization 13

Jensen s inequality basic inequality: if f is convex, then for 0 ≤ θ ≤ 1, f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y) extension: if f is convex, then f(E z) ≤ E f(z) for any random variable z basic inequality is the special case with discrete distribution prob(z = x) = θ, prob(z = y) = 1 − θ Review on Convex Optimization 14

Optimization problem in standard form minimize f 0 (x) subject to f i (x) ≤ 0, i = 1, ..., m, h i (x) = 0, i = 1, ..., p x ∈ R n is the optimization variable f 0 : R n → R is the objective or cost function f i : R n → R, i = 1, ..., m, are the inequality constraint functions h i : R n → R are the equality constraint functions optimal value: p* = inf{f 0 (x) | f i (x) ≤ 0, i = 1, ..., m, h i (x) = 0, i = 1, ..., p} p* = ∞ if problem is infeasible (no x satisfies the constraints) p* = −∞ if problem is unbounded below Review on Convex Optimization 15

Optimal and locally optimal points x is feasible if x ∈ dom f 0 and it satisfies the constraints a feasible x is optimal if f 0 (x) = p*; X opt is the set of optimal points x is locally optimal if there is an R > 0 such that x is optimal for minimize (over z) f 0 (z) subject to f i (z) ≤ 0, i = 1, ..., m, h i (z) = 0, i = 1, ..., p, ‖z − x‖ 2 ≤ R examples (with n = 1, m = p = 0): f 0 (x) = 1/x, dom f 0 = R ++ : p* = 0, no optimal point f 0 (x) = − log x, dom f 0 = R ++ : p* = −∞ f 0 (x) = x log x, dom f 0 = R ++ : p* = −1/e, x = 1/e is optimal f 0 (x) = x 3 − 3x: p* = −∞, local optimum at x = 1 Review on Convex Optimization 16

Implicit constraints the standard form optimization problem has an implicit constraint x ∈ D = ∩_{i=0}^m dom f i ∩ ∩_{i=1}^p dom h i; we call D the domain of the problem the constraints f i (x) ≤ 0, h i (x) = 0 are the explicit constraints a problem is unconstrained if it has no explicit constraints (m = p = 0) example: minimize f 0 (x) = − Σ_{i=1}^k log(b i − a T i x) is an unconstrained problem with implicit constraints a T i x < b i Review on Convex Optimization 17

Feasibility problem find x subject to f i (x) ≤ 0, i = 1, ..., m, h i (x) = 0, i = 1, ..., p can be considered a special case of the general problem with f 0 (x) = 0: minimize 0 subject to f i (x) ≤ 0, i = 1, ..., m, h i (x) = 0, i = 1, ..., p p* = 0 if constraints are feasible; any feasible x is optimal p* = ∞ if constraints are infeasible Review on Convex Optimization 18

Convex optimization problem standard form convex optimization problem minimize f 0 (x) subject to f i (x) ≤ 0, i = 1, ..., m, a T i x = b i, i = 1, ..., p f 0, f 1, ..., f m are convex; equality constraints are affine problem is quasiconvex if f 0 is quasiconvex (and f 1, ..., f m convex) often written as minimize f 0 (x) subject to f i (x) ≤ 0, i = 1, ..., m, Ax = b important property: feasible set of a convex optimization problem is convex Review on Convex Optimization 19

example minimize f 0 (x) = x 1 2 + x 2 2 subject to f 1 (x) = x 1 /(1 + x 2 2) ≤ 0, h 1 (x) = (x 1 + x 2) 2 = 0 f 0 is convex; feasible set {(x 1, x 2) | x 1 = −x 2 ≤ 0} is convex not a convex problem (according to our definition): f 1 is not convex, h 1 is not affine equivalent (but not identical) to the convex problem minimize x 1 2 + x 2 2 subject to x 1 ≤ 0, x 1 + x 2 = 0 Review on Convex Optimization 20

Local and global optima any locally optimal point of a convex problem is (globally) optimal proof: suppose x is locally optimal and y is optimal with f 0 (y) < f 0 (x) x locally optimal means there is an R > 0 such that z feasible, ‖z − x‖ 2 ≤ R ⟹ f 0 (z) ≥ f 0 (x) consider z = θy + (1−θ)x with θ = R/(2‖y − x‖ 2) ‖y − x‖ 2 > R, so 0 < θ < 1/2 z is a convex combination of two feasible points, hence also feasible ‖z − x‖ 2 = R/2 and f 0 (z) ≤ θf 0 (y) + (1−θ)f 0 (x) < f 0 (x), which contradicts our assumption that x is locally optimal Review on Convex Optimization 21

Optimality criterion for differentiable f 0 x is optimal if and only if it is feasible and ∇f 0 (x) T (y − x) ≥ 0 for all feasible y if nonzero, ∇f 0 (x) defines a supporting hyperplane to the feasible set X at x Review on Convex Optimization 22

unconstrained problem: x is optimal if and only if x ∈ dom f 0, ∇f 0 (x) = 0 equality constrained problem: minimize f 0 (x) subject to Ax = b; x is optimal if and only if there exists a ν such that x ∈ dom f 0, Ax = b, ∇f 0 (x) + A T ν = 0 minimization over nonnegative orthant: minimize f 0 (x) subject to x ⪰ 0; x is optimal if and only if x ∈ dom f 0, x ⪰ 0, and ∇f 0 (x) i ≥ 0 if x i = 0, ∇f 0 (x) i = 0 if x i > 0 Review on Convex Optimization 23

Linear program (LP) minimize c T x + d subject to Gx ⪯ h, Ax = b convex problem with affine objective and constraint functions feasible set is a polyhedron Review on Convex Optimization 24
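
A minimal sketch of solving an LP in this form, minimize c T x subject to Gx ⪯ h, Ax = b, assuming scipy is available; the data below are made-up illustration values, not an example from the slides.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
G = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
h = np.array([0.0, 0.0, 4.0])           # encodes x >= 0 and x1 + x2 <= 4
A = np.array([[1.0, -1.0]])
b = np.array([1.0])                     # equality constraint x1 - x2 = 1

# linprog minimizes c^T x subject to A_ub x <= b_ub and A_eq x = b_eq
res = linprog(c, A_ub=G, b_ub=h, A_eq=A, b_eq=b, bounds=(None, None))
print("optimal x:", res.x, "optimal value:", res.fun)
```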

Examples diet problem: choose quantities x 1, ..., x n of n foods one unit of food j costs c j, contains amount a ij of nutrient i healthy diet requires nutrient i in quantity at least b i to find cheapest healthy diet, minimize c T x subject to Ax ⪰ b, x ⪰ 0 piecewise-linear minimization: minimize max i=1,...,m (a T i x + b i) is equivalent to the LP minimize t subject to a T i x + b i ≤ t, i = 1, ..., m Review on Convex Optimization 25

Chebyshev center of a polyhedron Chebyshev center of P = {x | a T i x ≤ b i, i = 1, ..., m} is the center x cheb of the largest inscribed ball B = {x c + u | ‖u‖ 2 ≤ r} a T i x ≤ b i for all x ∈ B if and only if sup{a T i (x c + u) | ‖u‖ 2 ≤ r} = a T i x c + r‖a i‖ 2 ≤ b i hence, x c, r can be determined by solving the LP maximize r subject to a T i x c + r‖a i‖ 2 ≤ b i, i = 1, ..., m Review on Convex Optimization 26
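
A sketch of this Chebyshev-center LP solved with scipy.optimize.linprog; the polyhedron below is an illustrative triangle chosen for this example, not taken from the slides.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])   # x1 + x2 <= 2, x1 >= 0, x2 >= 0
b = np.array([2.0, 0.0, 0.0])
norms = np.linalg.norm(A, axis=1)

# decision variable z = (x_c, r); maximize r  <=>  minimize -r
c = np.array([0.0, 0.0, -1.0])
A_ub = np.hstack([A, norms[:, None]])    # rows: a_i^T x_c + r * ||a_i||_2 <= b_i
res = linprog(c, A_ub=A_ub, b_ub=b, bounds=(None, None))
x_cheb, r = res.x[:2], res.x[2]
print("Chebyshev center:", x_cheb, "radius:", r)
```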

Quadratic program (QP) minimize (1/2)x T Px + q T x + r subject to Gx ⪯ h, Ax = b P ∈ S n +, so objective is convex quadratic minimize a convex quadratic function over a polyhedron Review on Convex Optimization 27

Examples least-squares: minimize ‖Ax − b‖ 2 2 analytical solution x* = A † b (A † is the pseudo-inverse) can add linear constraints, e.g., l ⪯ x ⪯ u linear program with random cost: minimize c̄ T x + γ x T Σx = E c T x + γ var(c T x) subject to Gx ⪯ h, Ax = b c is a random vector with mean c̄ and covariance Σ hence, c T x is a random variable with mean c̄ T x and variance x T Σx γ > 0 is a risk aversion parameter; it controls the trade-off between expected cost and variance (risk) Review on Convex Optimization 28
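
A quick sketch of the least-squares analytical solution x* = A † b, on illustrative random data; numpy's pinv and lstsq should agree.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))             # more rows than columns, so the minimum is nonzero
b = rng.normal(size=20)

x_pinv = np.linalg.pinv(A) @ b           # x* = A^dagger b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_pinv, x_lstsq)
print("least-squares solution:", x_pinv)
```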

Quadratically constrained quadratic program (QCQP) minimize (1/2)x T P 0 x + q 0 T x + r 0 subject to (1/2)x T P i x + q i T x + r i ≤ 0, i = 1, ..., m, Ax = b P i ∈ S n +; objective and constraints are convex quadratic if P 1, ..., P m ∈ S n ++, feasible region is intersection of m ellipsoids and an affine set Review on Convex Optimization 29

Lagrangian standard form problem (not necessarily convex) minimize f 0 (x) subject to f i (x) ≤ 0, i = 1, ..., m, h i (x) = 0, i = 1, ..., p variable x ∈ R n, domain D, optimal value p* Lagrangian: L : R n × R m × R p → R, with dom L = D × R m × R p, L(x, λ, ν) = f 0 (x) + Σ_{i=1}^m λ i f i (x) + Σ_{i=1}^p ν i h i (x) weighted sum of objective and constraint functions λ i is the Lagrange multiplier associated with f i (x) ≤ 0 ν i is the Lagrange multiplier associated with h i (x) = 0 Review on Convex Optimization 30

Lagrange dual function Lagrange dual function: g : R m × R p → R, g(λ, ν) = inf_{x ∈ D} L(x, λ, ν) = inf_{x ∈ D} ( f 0 (x) + Σ_{i=1}^m λ i f i (x) + Σ_{i=1}^p ν i h i (x) ) g is concave, can be −∞ for some λ, ν lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p* proof: if x̃ is feasible and λ ⪰ 0, then f 0 (x̃) ≥ L(x̃, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν) = g(λ, ν) minimizing over all feasible x̃ gives p* ≥ g(λ, ν) Review on Convex Optimization 31

Least-norm solution of linear equations minimize x T x subject to Ax = b dual function: Lagrangian is L(x, ν) = x T x + ν T (Ax − b) to minimize L over x, set gradient equal to zero: ∇ x L(x, ν) = 2x + A T ν = 0 ⟹ x = −(1/2)A T ν plug in L to obtain g: g(ν) = L(−(1/2)A T ν, ν) = −(1/4)ν T AA T ν − b T ν, a concave function of ν lower bound property: p* ≥ −(1/4)ν T AA T ν − b T ν for all ν Review on Convex Optimization 32
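
A numerical sketch of this example, on illustrative random data: any ν gives a lower bound g(ν) ≤ p*, and the bound is tight at ν* = −2(AA T)⁻¹b, where x* = −(1/2)A T ν* is the least-norm solution.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 6))
b = rng.normal(size=3)

def g(nu):
    # dual function g(nu) = -(1/4) nu^T A A^T nu - b^T nu
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

nu_star = -2.0 * np.linalg.solve(A @ A.T, b)   # maximizer of g
x_star = -0.5 * A.T @ nu_star                  # least-norm solution of Ax = b
p_star = x_star @ x_star

nu_random = rng.normal(size=3)
assert g(nu_random) <= p_star + 1e-9           # weak duality: any nu gives a lower bound
assert np.isclose(g(nu_star), p_star)          # bound attained at nu*
print("p* =", p_star)
```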

Standard form LP minimize c T x subject to Ax = b, x ⪰ 0 dual function: Lagrangian is L(x, λ, ν) = c T x + ν T (Ax − b) − λ T x = −b T ν + (c + A T ν − λ) T x L is affine in x, hence g(λ, ν) = inf_x L(x, λ, ν) = −b T ν if A T ν − λ + c = 0, −∞ otherwise g is linear on the affine domain {(λ, ν) | A T ν − λ + c = 0}, hence concave lower bound property: p* ≥ −b T ν if A T ν + c ⪰ 0 Review on Convex Optimization 33

The dual problem Lagrange dual problem maximize g(λ, ν) subject to λ ⪰ 0 finds the best lower bound on p* obtainable from the Lagrange dual function a convex optimization problem; optimal value denoted d* λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g often simplified by making the implicit constraint (λ, ν) ∈ dom g explicit example: standard form LP and its dual (page 33): minimize c T x subject to Ax = b, x ⪰ 0; maximize −b T ν subject to A T ν + c ⪰ 0 Review on Convex Optimization 34

Weak and strong duality weak duality: d* ≤ p* always holds (for convex and nonconvex problems) strong duality: d* = p* does not hold in general (usually) holds for convex problems conditions that guarantee strong duality in convex problems are called constraint qualifications Review on Convex Optimization 35

Slater s constraint qualification strong duality holds for a convex problem minimize f 0 (x) subject to f i (x) ≤ 0, i = 1, ..., m, Ax = b if it is strictly feasible, i.e., ∃ x ∈ int D : f i (x) < 0, i = 1, ..., m, Ax = b also guarantees that the dual optimum is attained (if p* > −∞) can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, ... there exist many other types of constraint qualifications Review on Convex Optimization 36

Inequality form LP primal problem minimize c T x subject to Ax ⪯ b dual function g(λ) = inf_x ( (c + A T λ) T x − b T λ ) = −b T λ if A T λ + c = 0, −∞ otherwise dual problem maximize −b T λ subject to A T λ + c = 0, λ ⪰ 0 from Slater s condition: p* = d* if Ax̃ ≺ b for some x̃ in fact, p* = d* except when primal and dual are infeasible Review on Convex Optimization 37
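
A sketch that checks p* = d* numerically for this primal/dual LP pair using scipy.optimize.linprog; the data are illustrative and the primal is strictly feasible, so Slater's condition applies.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])            # bounded polyhedron containing 0 strictly
c = np.array([1.0, 1.0])

# primal: minimize c^T x subject to Ax <= b
primal = linprog(c, A_ub=A, b_ub=b, bounds=(None, None))
# dual: maximize -b^T lam s.t. A^T lam + c = 0, lam >= 0  <=>  minimize b^T lam
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=(0, None))

p_star, d_star = primal.fun, -dual.fun
print("p* =", p_star, "d* =", d_star)    # equal up to solver tolerance
```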

Quadratic program primal problem (assume P ∈ S n ++) minimize x T Px subject to Ax ⪯ b dual function g(λ) = inf_x ( x T Px + λ T (Ax − b) ) = −(1/4)λ T AP −1 A T λ − b T λ dual problem maximize −(1/4)λ T AP −1 A T λ − b T λ subject to λ ⪰ 0 from Slater s condition: p* = d* if Ax̃ ≺ b for some x̃ in fact, p* = d* always Review on Convex Optimization 38

Complementary slackness assume strong duality holds, x* is primal optimal, (λ*, ν*) is dual optimal f 0 (x*) = g(λ*, ν*) = inf_x ( f 0 (x) + Σ_{i=1}^m λ* i f i (x) + Σ_{i=1}^p ν* i h i (x) ) ≤ f 0 (x*) + Σ_{i=1}^m λ* i f i (x*) + Σ_{i=1}^p ν* i h i (x*) ≤ f 0 (x*) hence, the two inequalities hold with equality: x* minimizes L(x, λ*, ν*) λ* i f i (x*) = 0 for i = 1, ..., m (known as complementary slackness): λ* i > 0 ⟹ f i (x*) = 0, f i (x*) < 0 ⟹ λ* i = 0 Review on Convex Optimization 39

Karush-Kuhn-Tucker (KKT) conditions the following four conditions are called KKT conditions (for a problem with differentiable f i, h i): 1. primal feasibility: f i (x) ≤ 0, i = 1, ..., m, h i (x) = 0, i = 1, ..., p 2. dual feasibility: λ ⪰ 0 3. complementary slackness: λ i f i (x) = 0, i = 1, ..., m 4. first order condition (gradient of Lagrangian with respect to x vanishes): ∇ x L(x, λ, ν) = ∇f 0 (x) + Σ_{i=1}^m λ i ∇f i (x) + Σ_{i=1}^p ν i ∇h i (x) = 0 from page 39: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions Review on Convex Optimization 40

KKT conditions for convex problem if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal: from complementary slackness: f 0 (x̃) = L(x̃, λ̃, ν̃) from the 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃) hence, f 0 (x̃) = g(λ̃, ν̃) if Slater s condition is satisfied: x is optimal if and only if there exist λ, ν that satisfy the KKT conditions recall that Slater implies strong duality, and the dual optimum is attained generalizes the optimality condition ∇f 0 (x) = 0 for an unconstrained problem Review on Convex Optimization 41

example: water-filling (assume α i > 0) minimize −Σ_{i=1}^n log(x i + α i) subject to x ⪰ 0, 1 T x = 1 x is optimal iff x ⪰ 0, 1 T x = 1, and there exist λ ∈ R n, ν ∈ R such that λ ⪰ 0, λ i x i = 0, 1/(x i + α i) + λ i = ν if ν < 1/α i: λ i = 0 and x i = 1/ν − α i if ν ≥ 1/α i: λ i = ν − 1/α i and x i = 0 determine ν from 1 T x = Σ_{i=1}^n max{0, 1/ν − α i} = 1 interpretation: n patches; level of patch i is at height α i flood area with unit amount of water; resulting level is 1/ν Review on Convex Optimization 42
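
A sketch of the water-filling solution in code: find ν with Σ max{0, 1/ν − α i} = 1 by bisection (the α values below are illustrative), then set x i = max{0, 1/ν − α i}.

```python
import numpy as np

alpha = np.array([0.5, 1.0, 2.0, 4.0])    # illustrative patch heights, alpha_i > 0

def total_water(nu):
    return np.maximum(0.0, 1.0 / nu - alpha).sum()

# total_water is decreasing in nu; bracket the unit-water level and bisect
lo, hi = 1e-6, 1.0 / alpha.min()
for _ in range(100):
    nu = 0.5 * (lo + hi)
    if total_water(nu) > 1.0:
        lo = nu                            # nu too small, raise it
    else:
        hi = nu
x = np.maximum(0.0, 1.0 / nu - alpha)
print("nu =", nu, "x =", x, "total water =", x.sum())
```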

Solving unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume the optimal value p* = inf_x f(x) is attained (and finite) unconstrained minimization methods produce a sequence of points x (k) ∈ dom f, k = 0, 1, ..., with f(x (k)) → p* can be interpreted as iterative methods for solving the optimality condition ∇f(x*) = 0 Review on Convex Optimization 43

Initial point and sublevel set algorithms in this chapter require a starting point x (0) such that x (0) ∈ dom f and the sublevel set S = {x | f(x) ≤ f(x (0))} is closed the 2nd condition is hard to verify, except when all sublevel sets are closed: equivalent to the condition that epi f is closed true if dom f = R n true if f(x) → ∞ as x → bd dom f examples of differentiable functions with closed sublevel sets: f(x) = log( Σ_{i=1}^m exp(a T i x + b i) ), f(x) = −Σ_{i=1}^m log(b i − a T i x) Review on Convex Optimization 44

Strong convexity and implications f is strongly convex on S if there exists an m > 0 such that ∇ 2 f(x) ⪰ mI for all x ∈ S implications: for x, y ∈ S, f(y) ≥ f(x) + ∇f(x) T (y − x) + (m/2)‖x − y‖ 2 2 hence, S is bounded, p* > −∞, and for x ∈ S, f(x) − p* ≤ (1/(2m))‖∇f(x)‖ 2 2 useful as a stopping criterion (if you know m) Review on Convex Optimization 45
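
A small sketch checking the stopping-criterion bound f(x) − p* ≤ (1/(2m))‖∇f(x)‖ 2 2 for a strongly convex quadratic f(x) = (1/2)x T Px − b T x, where m is the smallest eigenvalue of P (illustrative random data).

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(5, 5))
P = M @ M.T + np.eye(5)                  # symmetric positive definite
b = rng.normal(size=5)
m = np.linalg.eigvalsh(P).min()          # strong convexity constant

x_star = np.linalg.solve(P, b)           # minimizer, where grad f = Px - b = 0
p_star = 0.5 * x_star @ P @ x_star - b @ x_star

for _ in range(100):
    x = rng.normal(size=5)
    f_x = 0.5 * x @ P @ x - b @ x
    g_x = P @ x - b
    assert f_x - p_star <= (g_x @ g_x) / (2 * m) + 1e-9
print("suboptimality bound f(x) - p* <= ||grad f||^2 / (2m) holds at all samples")
```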

Descent methods x (k+1) = x (k) + t (k) Δx (k) with f(x (k+1)) < f(x (k)) other notations: x + = x + tΔx, x := x + tΔx Δx is the step, or search direction; t is the step size, or step length from convexity, f(x +) < f(x) implies ∇f(x) T Δx < 0 (i.e., Δx is a descent direction) General descent method. given a starting point x ∈ dom f. repeat 1. Determine a descent direction Δx. 2. Line search. Choose a step size t > 0. 3. Update. x := x + tΔx. until stopping criterion is satisfied. Review on Convex Optimization 46

Line search types exact line search: t = argmin_{t>0} f(x + tΔx) backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1)): starting at t = 1, repeat t := βt until f(x + tΔx) < f(x) + αt∇f(x) T Δx graphical interpretation: backtrack until t ≤ t 0, the largest step for which f(x + tΔx) lies below the line f(x) + αt∇f(x) T Δx Review on Convex Optimization 47
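
A minimal sketch of the backtracking line search just described (α ∈ (0, 1/2), β ∈ (0, 1)); f, grad, x, and dx are supplied by the caller, and dx is assumed to be a descent direction so the loop terminates.

```python
import numpy as np

def backtracking(f, grad, x, dx, alpha=0.3, beta=0.7):
    t = 1.0
    # shrink t until f(x + t*dx) < f(x) + alpha*t*grad(x)^T dx
    while f(x + t * dx) >= f(x) + alpha * t * grad(x) @ dx:
        t *= beta
    return t

# tiny usage example on f(x) = ||x||^2 with dx = -grad f(x)
f = lambda x: x @ x
grad = lambda x: 2 * x
x0 = np.array([1.0, -2.0])
print("accepted step size:", backtracking(f, grad, x0, -grad(x0)))
```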

Gradient descent method general descent method with Δx = −∇f(x) given a starting point x ∈ dom f. repeat 1. Δx := −∇f(x). 2. Line search. Choose step size t via exact or backtracking line search. 3. Update. x := x + tΔx. until stopping criterion is satisfied. stopping criterion usually of the form ‖∇f(x)‖ 2 ≤ ε convergence result: for strongly convex f, f(x (k)) − p* ≤ c^k (f(x (0)) − p*), where c ∈ (0, 1) depends on m, x (0), and the line search type very simple, but often very slow; rarely used in practice Review on Convex Optimization 48
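
A sketch of the full gradient descent loop with backtracking line search, applied to the two-variable nonquadratic example f(x 1, x 2) = exp(x 1 + 3x 2 − 0.1) + exp(x 1 − 3x 2 − 0.1) + exp(−x 1 − 0.1) that appears two slides below; the starting point and tolerances are illustrative choices.

```python
import numpy as np

def f(x):
    return np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1) + np.exp(-x[0] - 0.1)

def grad(x):
    e1 = np.exp(x[0] + 3*x[1] - 0.1)
    e2 = np.exp(x[0] - 3*x[1] - 0.1)
    e3 = np.exp(-x[0] - 0.1)
    return np.array([e1 + e2 - e3, 3*e1 - 3*e2])

x, eps, alpha, beta = np.array([-1.0, 1.0]), 1e-6, 0.3, 0.7
k = 0
while np.linalg.norm(grad(x)) > eps:      # stopping criterion ||grad f(x)||_2 <= eps
    dx = -grad(x)                         # descent direction
    t = 1.0
    while f(x + t*dx) >= f(x) + alpha * t * grad(x) @ dx:
        t *= beta                         # backtracking line search
    x = x + t*dx
    k += 1
print(k, "iterations; x =", x, "f(x) =", f(x))
```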

quadratic problem in R 2 f(x) = (1/2)(x 1 2 + γ x 2 2) (γ > 0) with exact line search, starting at x (0) = (γ, 1): x 1 (k) = γ ((γ − 1)/(γ + 1))^k, x 2 (k) = (−(γ − 1)/(γ + 1))^k very slow if γ ≫ 1 or γ ≪ 1 example for γ = 10 (figure: zig-zagging iterates x (0), x (1), ... in the (x 1, x 2) plane) Review on Convex Optimization 49
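
A sketch verifying the closed-form iterates above: gradient descent with exact line search on f(x) = (1/2)(x 1 2 + γ x 2 2), started at x (0) = (γ, 1). For a quadratic (1/2)x T Dx the exact step is t = (g T g)/(g T Dg) with g = ∇f(x) = Dx.

```python
import numpy as np

gamma = 10.0
D = np.diag([1.0, gamma])
x = np.array([gamma, 1.0])                 # x(0) = (gamma, 1)
r = (gamma - 1.0) / (gamma + 1.0)          # contraction factor per iteration

for k in range(1, 6):
    g = D @ x                              # gradient of f
    t = (g @ g) / (g @ D @ g)              # exact line search step for a quadratic
    x = x - t * g
    assert np.allclose(x, [gamma * r**k, (-r)**k])
print("iterates match the closed form; contraction factor", r)
```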

nonquadratic example f(x 1, x 2) = exp(x 1 + 3x 2 − 0.1) + exp(x 1 − 3x 2 − 0.1) + exp(−x 1 − 0.1) (figures: iterates x (0), x (1), x (2) under backtracking line search and under exact line search) Review on Convex Optimization 50

a problem in R 100 f(x) = c T x 500 i=1 log(b i a T i x) 10 4 f(x (k) ) p 10 2 10 0 10 2 exact l.s. k backtracking l.s. 10 4 0 50 100 150 200 linear convergence, i.e., a straight line on a semilog plot Review on Convex Optimization 51