Convex Optimization. Newton's method. ENSAE: Optimisation 1/44


Unconstrained minimization

minimize f(x)

- f convex, twice continuously differentiable (hence dom f open)
- we assume the optimal value p* = inf_x f(x) is attained (and finite)
- unconstrained minimization methods produce a sequence of points x^(k) ∈ dom f, k = 0, 1, ..., with f(x^(k)) → p*
- can be interpreted as iterative methods for solving the optimality condition ∇f(x*) = 0

Strong convexity and implications

f is strongly convex on S if there exists an m > 0 such that ∇²f(x) ⪰ mI for all x ∈ S

implications:
- for x, y ∈ S, f(y) ≥ f(x) + ∇f(x)^T (y - x) + (m/2) ||y - x||_2²
- hence, S is bounded
- p* > -∞, and for x ∈ S, f(x) - p* ≤ (1/(2m)) ||∇f(x)||_2²
- useful as a stopping criterion (if you know m)

Descent methods

x^(k+1) = x^(k) + t^(k) Δx^(k), with f(x^(k+1)) < f(x^(k))

- other notations: x+ = x + t Δx, x := x + t Δx
- Δx is the step, or search direction; t is the step size, or step length
- from convexity, f(x+) < f(x) implies ∇f(x)^T Δx < 0 (i.e., Δx is a descent direction)

General descent method.
given a starting point x ∈ dom f.
repeat
1. Determine a descent direction Δx.
2. Line search. Choose a step size t > 0.
3. Update. x := x + t Δx.
until stopping criterion is satisfied.

Line search types

- exact line search: t = argmin_{t > 0} f(x + t Δx)
- backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1)): starting at t = 1, repeat t := βt until f(x + t Δx) < f(x) + α t ∇f(x)^T Δx
- graphical interpretation: backtrack until t ≤ t_0, the point where f(x + t Δx) crosses the line f(x) + α t ∇f(x)^T Δx

(figure: f(x + t Δx) as a function of t, together with the lines f(x) + t ∇f(x)^T Δx and f(x) + α t ∇f(x)^T Δx)
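The backtracking rule above is easy to state in code. A minimal sketch (assuming f takes a list of floats and the gradient at x is supplied precomputed; function and parameter names are illustrative):

```python
def backtracking(f, x, dx, grad_fx, alpha=0.2, beta=0.7):
    """Backtracking line search: shrink t by beta until the sufficient-decrease
    condition f(x + t*dx) <= f(x) + alpha*t*grad_fx^T dx holds."""
    t = 1.0
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad_fx, dx))  # directional derivative, < 0 for a descent direction
    while f([xi + t * di for xi, di in zip(x, dx)]) > fx + alpha * t * slope:
        t *= beta
    return t

# example: f(x) = x1^2 + x2^2 at x = (1, 1), with dx = -grad f(x) = (-2, -2)
f = lambda x: x[0] ** 2 + x[1] ** 2
t = backtracking(f, [1.0, 1.0], [-2.0, -2.0], [2.0, 2.0])
```

With these parameters the loop backtracks once and returns t = 0.7; any returned t is guaranteed to decrease f.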

Gradient descent method

general descent method with Δx = -∇f(x)

given a starting point x ∈ dom f.
repeat
1. Δx := -∇f(x).
2. Line search. Choose step size t via exact or backtracking line search.
3. Update. x := x + t Δx.
until stopping criterion is satisfied.

- stopping criterion usually of the form ||∇f(x)||_2 ≤ ε
- convergence result: for strongly convex f, f(x^(k)) - p* ≤ c^k (f(x^(0)) - p*), where c ∈ (0, 1) depends on m, x^(0), and the line search type
- very simple, but often very slow; rarely used in practice
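A minimal sketch of the full loop, run on the strongly convex quadratic f(x) = (1/2)(x_1² + γx_2²) with γ = 10; the parameter values (α, β, ε) are illustrative choices:

```python
import math

def f(x, gamma=10.0):
    return 0.5 * (x[0] ** 2 + gamma * x[1] ** 2)

def grad(x, gamma=10.0):
    return [x[0], gamma * x[1]]

def gradient_descent(x, gamma=10.0, eps=1e-6, alpha=0.3, beta=0.8, max_iter=2000):
    """Gradient descent with backtracking; stops when ||grad f(x)||_2 <= eps."""
    for k in range(max_iter):
        g = grad(x, gamma)
        gnorm2 = g[0] ** 2 + g[1] ** 2
        if math.sqrt(gnorm2) <= eps:
            return x, k
        t, fx = 1.0, f(x, gamma)
        # backtracking line search with dx = -g, so slope = -||g||^2
        while f([x[0] - t * g[0], x[1] - t * g[1]], gamma) > fx - alpha * t * gnorm2:
            t *= beta
        x = [x[0] - t * g[0], x[1] - t * g[1]]
    return x, max_iter

x_star, iters = gradient_descent([10.0, 1.0])
```

The iterates approach the minimizer (0, 0); linear convergence means the iteration count grows with the conditioning γ.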

quadratic problem in R²

f(x) = (1/2)(x_1² + γ x_2²)   (γ > 0)

with exact line search, starting at x^(0) = (γ, 1):

x_1^(k) = γ ((γ - 1)/(γ + 1))^k,   x_2^(k) = (-(γ - 1)/(γ + 1))^k

- very slow if γ ≫ 1 or γ ≪ 1

(figure: example for γ = 10; the iterates x^(0), x^(1), ... zigzag across the contour lines of f)
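The closed-form iterates above can be checked numerically; a sketch, using the fact that for a quadratic the exact line search step is available analytically (t = gᵀg / gᵀQg with Q = diag(1, γ)):

```python
def exact_ls_iterate(x, gamma):
    """One exact-line-search gradient step for f(x) = 0.5*(x1^2 + gamma*x2^2)."""
    g = [x[0], gamma * x[1]]                                  # grad f(x)
    t = (g[0] ** 2 + g[1] ** 2) / (g[0] ** 2 + gamma * g[1] ** 2)
    return [x[0] - t * g[0], x[1] - t * g[1]]

gamma = 10.0
r = (gamma - 1.0) / (gamma + 1.0)
x = [gamma, 1.0]                       # x^(0) = (gamma, 1)
for k in range(1, 6):
    x = exact_ls_iterate(x, gamma)
    closed_form = [gamma * r ** k, (-r) ** k]
```

After each step, x matches the closed-form expression; since r = 9/11 here, the error shrinks by less than a factor of 2 per iteration.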

a problem in R^100

f(x) = c^T x - Σ_{i=1}^{500} log(b_i - a_i^T x)

(figure: f(x^(k)) - p* versus k for exact and backtracking line search; both decrease along a straight line on a semilog plot, over 100-200 iterations)

linear convergence, i.e., a straight line on a semilog plot

Steepest descent method

normalized steepest descent direction (at x, for norm ||·||):

Δx_nsd = argmin { ∇f(x)^T v : ||v|| = 1 }

interpretation: for small v, f(x + v) ≈ f(x) + ∇f(x)^T v; direction Δx_nsd is the unit-norm step with most negative directional derivative

(unnormalized) steepest descent direction

Δx_sd = ||∇f(x)||_* Δx_nsd

satisfies ∇f(x)^T Δx_sd = -||∇f(x)||_*²   (||·||_* is the dual norm)

steepest descent method: general descent method with Δx = Δx_sd; convergence properties similar to gradient descent

examples

- Euclidean norm: Δx_sd = -∇f(x)
- quadratic norm ||x||_P = (x^T P x)^{1/2} (P ∈ S^n_++): Δx_sd = -P^{-1} ∇f(x)
- l_1-norm: Δx_sd = -(∂f(x)/∂x_i) e_i, where i is an index with |∂f(x)/∂x_i| = ||∇f(x)||_∞

(figure: unit balls and normalized steepest descent directions for a quadratic norm and the l_1-norm)

choice of norm for steepest descent

(figure: steepest descent with backtracking line search for two quadratic norms; iterates x^(0), x^(1), x^(2); ellipses show {x : ||x - x^(k)||_P = 1})

- equivalent interpretation of steepest descent with quadratic norm ||·||_P: gradient descent after the change of variables x̄ = P^{1/2} x
- shows that the choice of P has a strong effect on speed of convergence

Newton step

Δx_nt = -∇²f(x)^{-1} ∇f(x)

interpretations
- x + Δx_nt minimizes the second-order approximation
  f̂(x + v) = f(x) + ∇f(x)^T v + (1/2) v^T ∇²f(x) v
- x + Δx_nt solves the linearized optimality condition
  ∇f(x + v) ≈ ∇f̂(x + v) = ∇f(x) + ∇²f(x) v = 0

(figure, left: f and its quadratic model f̂ through (x, f(x)), with x + Δx_nt the minimizer of f̂; right: f' and its linearization f̂')
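A minimal numerical illustration of the first interpretation: for a quadratic f the second-order model is f itself, so x + Δx_nt is exactly the minimizer. The 2×2 solve is done by Cramer's rule to keep the sketch dependency-free:

```python
def newton_step_2x2(grad, hess):
    """Newton step dx = -H^{-1} g for a two-variable problem, via Cramer's rule."""
    (a, b), (c, d) = hess
    g1, g2 = grad
    det = a * d - b * c
    dx1 = (-g1 * d + g2 * b) / det
    dx2 = (g1 * c - g2 * a) / det
    return [dx1, dx2]

# quadratic f(x) = x1^2 + 10*x2^2, minimized at the origin
x = [3.0, 1.0]
g = [2 * x[0], 20 * x[1]]          # grad f(x)
H = [[2.0, 0.0], [0.0, 20.0]]      # Hessian (constant for a quadratic)
dx = newton_step_2x2(g, H)
```

One full Newton step from any starting point lands on the minimizer, regardless of the conditioning that slowed gradient descent.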

Δx_nt is the steepest descent direction at x in the local Hessian norm

||u||_{∇²f(x)} = (u^T ∇²f(x) u)^{1/2}

(figure: dashed lines are contour lines of f; the ellipse is {x + v : v^T ∇²f(x) v = 1}; the arrow shows -∇f(x); the points x + Δx_nsd and x + Δx_nt are marked)

Newton decrement

λ(x) = ( ∇f(x)^T ∇²f(x)^{-1} ∇f(x) )^{1/2}

a measure of the proximity of x to x*

properties
- gives an estimate of f(x) - p*, using the quadratic approximation f̂: f(x) - inf_y f̂(y) = (1/2) λ(x)²
- equal to the norm of the Newton step in the quadratic Hessian norm: λ(x) = (Δx_nt^T ∇²f(x) Δx_nt)^{1/2}
- directional derivative in the Newton direction: ∇f(x)^T Δx_nt = -λ(x)²
- affine invariant (unlike ||∇f(x)||_2)

Newton's method

given a starting point x ∈ dom f, tolerance ε > 0.
repeat
1. Compute the Newton step and decrement. Δx_nt := -∇²f(x)^{-1} ∇f(x); λ² := ∇f(x)^T ∇²f(x)^{-1} ∇f(x).
2. Stopping criterion. quit if λ²/2 ≤ ε.
3. Line search. Choose step size t by backtracking line search.
4. Update. x := x + t Δx_nt.

affine invariant, i.e., independent of linear changes of coordinates: Newton iterates for f̄(y) = f(Ty) with starting point y^(0) = T^{-1} x^(0) are y^(k) = T^{-1} x^(k)
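A sketch of the algorithm in one variable, with the λ²/2 ≤ ε stopping rule and backtracking line search; the test function f(x) = eˣ + e⁻ˣ - 3x is an illustrative choice (its minimizer satisfies 2 sinh x = 3):

```python
import math

def newton(f, fp, fpp, x, eps=1e-10, alpha=0.25, beta=0.5):
    """Damped Newton's method in one variable: Newton step, decrement-based
    stopping criterion, backtracking line search."""
    while True:
        g, h = fp(x), fpp(x)
        lam2 = g * g / h               # Newton decrement squared
        if lam2 / 2 <= eps:
            return x
        dx = -g / h
        t = 1.0
        while f(x + t * dx) > f(x) + alpha * t * g * dx:
            t *= beta
        x += t * dx

f = lambda x: math.exp(x) + math.exp(-x) - 3 * x
fp = lambda x: math.exp(x) - math.exp(-x) - 3
fpp = lambda x: math.exp(x) + math.exp(-x)
x_star = newton(f, fp, fpp, 0.0)
```

From x = 0 the backtracking condition accepts t = 1 immediately and the iterates converge quadratically, matching the two-phase picture of the next slides.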

Classical convergence analysis

assumptions
- f strongly convex on S with constant m
- ∇²f is Lipschitz continuous on S, with constant L > 0:
  ||∇²f(x) - ∇²f(y)||_2 ≤ L ||x - y||_2
  (L measures how well f can be approximated by a quadratic function)

outline: there exist constants η ∈ (0, m²/L), γ > 0 such that
- if ||∇f(x)||_2 ≥ η, then f(x^(k+1)) - f(x^(k)) ≤ -γ
- if ||∇f(x)||_2 < η, then (L/(2m²)) ||∇f(x^(k+1))||_2 ≤ ( (L/(2m²)) ||∇f(x^(k))||_2 )²

damped Newton phase (||∇f(x)||_2 ≥ η)
- most iterations require backtracking steps
- function value decreases by at least γ
- if p* > -∞, this phase ends after at most (f(x^(0)) - p*)/γ iterations

quadratically convergent phase (||∇f(x)||_2 < η)
- all iterations use step size t = 1
- ||∇f(x)||_2 converges to zero quadratically: if ||∇f(x^(k))||_2 < η, then

(L/(2m²)) ||∇f(x^(l))||_2 ≤ ( (L/(2m²)) ||∇f(x^(k))||_2 )^{2^{l-k}} ≤ (1/2)^{2^{l-k}},   l ≥ k

Newton's method: complexity

conclusion: the number of iterations until f(x) - p* ≤ ε is bounded above by

(f(x^(0)) - p*)/γ + log₂ log₂(ε₀/ε)

- γ, ε₀ are constants that depend on m, L, x^(0)
- the second term is small (of the order of 6) and almost constant for practical purposes
- in practice, the constants m, L (hence γ, ε₀) are usually unknown, but we can show, under different assumptions (self-concordance), that the number of iterations is bounded by 375(f(x^(0)) - p*) + 6

Examples

example in R²

(figure: iterates x^(0), x^(1) on the contour plot of f; f(x^(k)) - p* versus k drops from about 10^5 to below 10^-15 within 5 iterations)

- backtracking parameters α = 0.1, β = 0.7
- converges in only 5 steps
- quadratic local convergence

example in R^100 (page 8)

(figure, left: f(x^(k)) - p* versus k for exact and backtracking line search, dropping below 10^-15 in about 10 iterations; right: step size t^(k), which settles at t = 1 after a few iterations)

- backtracking parameters α = 0.01, β = 0.5
- backtracking line search almost as fast as exact l.s. (and much simpler)
- clearly shows the two phases of the algorithm

example in R^10000 (with sparse a_i)

f(x) = -Σ_{i=1}^{10000} log(1 - x_i²) - Σ_{i=1}^{100000} log(b_i - a_i^T x)

(figure: f(x^(k)) - p* versus k, reaching about 10^-5 in some 20 iterations)

- backtracking parameters α = 0.01, β = 0.5
- performance similar as for small examples

numerical example: 150 randomly generated instances of

minimize f(x) = -Σ_{i=1}^m log(b_i - a_i^T x)

with problem sizes m = 100, n = 50; m = 1000, n = 500; m = 1000, n = 50

(figure: number of iterations versus f(x^(0)) - p*; roughly 5 to 25 iterations across all instances)

- number of iterations much smaller than 375(f(x^(0)) - p*) + 6
- a bound of the form c(f(x^(0)) - p*) + 6 with smaller c is (empirically) valid

Equality Constraints

Equality constrained minimization

minimize f(x)
subject to Ax = b

- f convex, twice continuously differentiable
- A ∈ R^{p×n} with rank A = p
- we assume p* is finite and attained

optimality conditions: x* is optimal iff there exists a ν* such that

∇f(x*) + A^T ν* = 0,   Ax* = b

equality constrained quadratic minimization (with P ∈ S^n_+)

minimize (1/2) x^T P x + q^T x + r
subject to Ax = b

optimality condition:

[ P   A^T ] [ x* ]   [ -q ]
[ A   0   ] [ ν* ] = [  b ]

the coefficient matrix is called the KKT matrix
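A direct numerical check of the optimality condition, assembling and solving the KKT system with NumPy (the problem data below is illustrative):

```python
import numpy as np

# illustrative data: P positive definite, one equality constraint
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
n, p = P.shape[0], A.shape[0]

# assemble and solve the KKT system [P A^T; A 0] [x; nu] = [-q; b]
K = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(K, rhs)
x_opt, nu_opt = sol[:n], sol[n:]
```

The solution is verified by plugging back into the two optimality conditions: feasibility Ax* = b and stationarity Px* + q + Aᵀν* = 0.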

Eliminating equality constraints

represent the solution set of Ax = b as

{x : Ax = b} = { Fz + x̂ : z ∈ R^{n-p} }

- x̂ is (any) particular solution
- the range of F ∈ R^{n×(n-p)} is the nullspace of A (rank F = n - p and AF = 0)

reduced or eliminated problem:

minimize f(Fz + x̂)

an unconstrained problem with variable z ∈ R^{n-p}; from the solution z*, obtain x* and ν* as

x* = Fz* + x̂,   ν* = -(AA^T)^{-1} A ∇f(x*)

example: optimal allocation with resource constraint

minimize f_1(x_1) + f_2(x_2) + ··· + f_n(x_n)
subject to x_1 + x_2 + ··· + x_n = b

eliminate x_n = b - x_1 - ··· - x_{n-1}, i.e., choose

x̂ = b e_n,   F = [ I ; -1^T ] ∈ R^{n×(n-1)}

reduced problem:

minimize f_1(x_1) + ··· + f_{n-1}(x_{n-1}) + f_n(b - x_1 - ··· - x_{n-1})

(variables x_1, ..., x_{n-1})
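A sketch of the elimination recipe on this example with quadratic costs f_i(x_i) = (x_i - a_i)² (illustrative data); for this choice the reduced problem is a least-squares problem, so it can be solved via its normal equations:

```python
import numpy as np

# resource allocation with quadratic costs f_i(x_i) = (x_i - a_i)^2
n = 3
a = np.array([1.0, 2.0, 3.0])
b_total = 9.0

# elimination of x_n: xhat = b*e_n, columns of F span the nullspace of 1^T
F = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])
xhat = np.zeros(n)
xhat[-1] = b_total

# reduced problem: minimize ||F z + xhat - a||^2
# => normal equations F^T F z = F^T (a - xhat)
z = np.linalg.solve(F.T @ F, F.T @ (a - xhat))
x_opt = F @ z + xhat
```

Here the optimum spreads the surplus equally: x_i* = a_i + (b - Σ_j a_j)/n, which for this data gives (2, 3, 4).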

Newton step

the Newton step of f at a feasible x is given by the first block of the solution of

[ ∇²f(x)   A^T ] [ Δx_nt ]   [ -∇f(x) ]
[ A        0   ] [ w     ] = [   0    ]

interpretations
- Δx_nt solves the second-order approximation (with variable v)
  minimize f̂(x + v) = f(x) + ∇f(x)^T v + (1/2) v^T ∇²f(x) v
  subject to A(x + v) = b
- the equations follow from linearizing the optimality conditions
  ∇f(x + Δx_nt) + A^T w ≈ ∇f(x) + ∇²f(x) Δx_nt + A^T w = 0,   A(x + Δx_nt) = b
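A numerical sketch of the KKT system above, checking that the computed step satisfies A Δx_nt = 0 (so feasibility is preserved); the objective f(x) = Σ x_i log x_i on {x : 1ᵀx = 1} is an illustrative choice:

```python
import numpy as np

def newton_step_eq(grad, hess, A):
    """Newton step at a feasible x: solve [H A^T; A 0] [dx; w] = [-g; 0]."""
    n, p = hess.shape[0], A.shape[0]
    K = np.block([[hess, A.T], [A, np.zeros((p, p))]])
    rhs = np.concatenate([-grad, np.zeros(p)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]

# f(x) = sum_i x_i log x_i on the probability simplex
x = np.array([0.2, 0.3, 0.5])      # feasible starting point (1^T x = 1)
g = 1.0 + np.log(x)                # grad f(x)
H = np.diag(1.0 / x)               # Hessian of f
A = np.ones((1, 3))
dx, w = newton_step_eq(g, H, A)
```

Because A Δx_nt = 0, the update x + tΔx_nt stays in the affine set Ax = b for every step size t.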

Newton decrement

λ(x) = ( Δx_nt^T ∇²f(x) Δx_nt )^{1/2} = ( -∇f(x)^T Δx_nt )^{1/2}

properties
- gives an estimate of f(x) - p* using the quadratic approximation f̂: f(x) - inf_{Ay=b} f̂(y) = (1/2) λ(x)²
- directional derivative in the Newton direction: (d/dt) f(x + t Δx_nt) |_{t=0} = -λ(x)²
- in general, λ(x) ≠ ( ∇f(x)^T ∇²f(x)^{-1} ∇f(x) )^{1/2}

Newton's method with equality constraints

given a starting point x ∈ dom f with Ax = b, tolerance ε > 0.
repeat
1. Compute the Newton step and decrement Δx_nt, λ(x).
2. Stopping criterion. quit if λ²/2 ≤ ε.
3. Line search. Choose step size t by backtracking line search.
4. Update. x := x + t Δx_nt.

- a feasible descent method: x^(k) feasible and f(x^(k+1)) < f(x^(k))
- affine invariant

Barrier Methods

Inequality constrained minimization

minimize f_0(x)
subject to f_i(x) ≤ 0, i = 1, ..., m
Ax = b        (1)

- f_i convex, twice continuously differentiable
- A ∈ R^{p×n} with rank A = p
- we assume p* is finite and attained
- we assume the problem is strictly feasible: there exists x̃ with x̃ ∈ dom f_0, f_i(x̃) < 0, i = 1, ..., m, Ax̃ = b
- hence, strong duality holds and the dual optimum is attained

Logarithmic barrier

reformulation of (1) via indicator function:

minimize f_0(x) + Σ_{i=1}^m I_-(f_i(x))
subject to Ax = b

where I_-(u) = 0 if u ≤ 0, I_-(u) = ∞ otherwise (the indicator function of R_-)

approximation via logarithmic barrier:

minimize f_0(x) - (1/t) Σ_{i=1}^m log(-f_i(x))
subject to Ax = b

- an equality constrained problem
- for t > 0, -(1/t) log(-u) is a smooth approximation of I_-(u)
- the approximation improves as t → ∞

(figure: -(1/t) log(-u) for several values of t, approaching I_-(u))

logarithmic barrier function

φ(x) = -Σ_{i=1}^m log(-f_i(x)),   dom φ = {x : f_1(x) < 0, ..., f_m(x) < 0}

- convex (follows from composition rules)
- twice continuously differentiable, with derivatives

∇φ(x) = Σ_{i=1}^m (1/(-f_i(x))) ∇f_i(x)

∇²φ(x) = Σ_{i=1}^m (1/f_i(x)²) ∇f_i(x) ∇f_i(x)^T + Σ_{i=1}^m (1/(-f_i(x))) ∇²f_i(x)
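For linear inequalities f_i(x) = a_iᵀx - b_i, these formulas collapse to matrix expressions in the slack vector s = b - Ax. A sketch, with a finite-difference check of the gradient (the polyhedron data is illustrative):

```python
import numpy as np

def log_barrier(A, b, x):
    """phi, grad phi, hess phi for linear inequalities a_i^T x <= b_i."""
    s = b - A @ x                        # slacks -f_i(x); must be positive
    phi = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)               # sum_i (1/-f_i) a_i
    hess = (A.T * (1.0 / s ** 2)) @ A    # sum_i (1/f_i^2) a_i a_i^T
    return phi, grad, hess

# illustrative polyhedron {x : x1 <= 1, x2 <= 1, -x1 - x2 <= 1}
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])
x = np.array([0.2, 0.3])
phi, g, H = log_barrier(A, b, x)

# central-difference check of the gradient
eps = 1e-6
g_fd = np.array([(log_barrier(A, b, x + eps * e)[0]
                  - log_barrier(A, b, x - eps * e)[0]) / (2 * eps)
                 for e in np.eye(2)])
```

The Hessian is positive definite whenever A has full column rank, which is what makes Newton centering on t·f_0 + φ well behaved.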

Central path

for t > 0, define x*(t) as the solution of

minimize t f_0(x) + φ(x)
subject to Ax = b

(for now, assume x*(t) exists and is unique for each t > 0)

the central path is {x*(t) : t > 0}

example: central path for an LP

minimize c^T x
subject to a_i^T x ≤ b_i, i = 1, ..., 6

the hyperplane c^T x = c^T x*(t) is tangent to the level curve of φ through x*(t)

(figure: hexagonal feasible set with the central path from the analytic center through x*(10) to x*)

Interpretation via KKT conditions

x = x*(t), λ_i = -1/(t f_i(x*(t))), ν = w/t (with w the dual variable from the equality constrained barrier problem) satisfy

1. primal constraints: f_i(x) ≤ 0, i = 1, ..., m, Ax = b
2. dual constraints: λ ⪰ 0
3. approximate complementary slackness: -λ_i f_i(x) = 1/t, i = 1, ..., m
4. gradient of Lagrangian with respect to x vanishes:
   ∇f_0(x) + Σ_{i=1}^m λ_i ∇f_i(x) + A^T ν = 0

the difference with KKT is that condition 3 replaces λ_i f_i(x) = 0

Barrier method

given strictly feasible x, t := t^(0) > 0, μ > 1, tolerance ε > 0.
repeat
1. Centering step. Compute x*(t) by minimizing t f_0 + φ, subject to Ax = b.
2. Update. x := x*(t).
3. Stopping criterion. quit if m/t < ε.
4. Increase t. t := μt.

- terminates with f_0(x) - p* ≤ ε (the stopping criterion follows from f_0(x*(t)) - p* ≤ m/t)
- centering is usually done using Newton's method, starting at the current x
- choice of μ involves a trade-off: large μ means fewer outer iterations, more inner (Newton) iterations; typical values: μ = 10-20
- several heuristics for choice of t^(0)
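A sketch of the method on the smallest possible example: minimize x subject to x ≥ 1 (one inequality, no equality constraints), so the centering step is one-variable Newton's method and the central path is x*(t) = 1 + 1/t. The parameter choices (t^(0) = 1, μ = 10) are illustrative:

```python
import math

def psi(t, x):
    """Centering objective t*f0(x) + phi(x) for: minimize x s.t. x >= 1."""
    return t * x - math.log(x - 1.0)

def center(t, x, tol=1e-10):
    """Newton's method on psi(t, .), with backtracking that keeps x > 1."""
    while True:
        g = t - 1.0 / (x - 1.0)
        h = 1.0 / (x - 1.0) ** 2
        if g * g / h / 2 <= tol:        # lambda^2 / 2 <= tol
            return x
        dx = -g / h
        s = 1.0
        while x + s * dx <= 1.0 or psi(t, x + s * dx) > psi(t, x) + 0.25 * s * g * dx:
            s *= 0.5
        x += s * dx

def barrier_method(x=2.0, t=1.0, mu=10.0, eps=1e-6, m=1):
    while True:
        x = center(t, x)                # 1. centering step (warm-started at current x)
        if m / t < eps:                 # 3. duality-gap stopping criterion
            return x
        t *= mu                         # 4. increase t

x_opt = barrier_method()                # central path x*(t) = 1 + 1/t -> p* = 1
```

Each centering starts from the previous x*(t), and the loop stops once m/t < ε, so the returned point satisfies f_0(x) - p* ≤ ε while staying strictly feasible.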

Examples

inequality form LP (m = 100 inequalities, n = 50 variables)

(figure, left: duality gap versus Newton iterations for μ = 2, 50, 150, dropping from 10^2 to 10^-6 within roughly 20 to 80 iterations; right: total Newton iterations versus μ, for μ up to 200)

- starts with x on the central path (t^(0) = 1, duality gap 100)
- terminates when t = 10^8 (gap 10^-6)
- centering uses Newton's method with backtracking
- total number of Newton iterations not very sensitive for μ ≥ 10

geometric program (m = 100 inequalities and n = 50 variables)

minimize log( Σ_{k=1}^5 exp(a_{0k}^T x + b_{0k}) )
subject to log( Σ_{k=1}^5 exp(a_{ik}^T x + b_{ik}) ) ≤ 0, i = 1, ..., m

(figure: duality gap versus Newton iterations for μ = 2, 50, 150; behavior similar to the LP example)

family of standard LPs (A ∈ R^{m×2m})

minimize c^T x
subject to Ax = b, x ⪰ 0

m = 10, ..., 1000; for each m, solve 100 randomly generated instances

(figure: Newton iterations versus m, roughly 15 to 35 over the whole range)

the number of iterations grows very slowly as m ranges over a 100:1 ratio

Polynomial-time complexity of barrier method

for μ = 1 + 1/√m:

N = O( √m log( m/(t^(0) ε) ) )

- number of Newton iterations for a fixed gap reduction is O(√m)
- multiply by the cost of one Newton iteration (a polynomial function of the problem dimensions) to get a bound on the number of flops
- this choice of μ optimizes worst-case complexity; in practice we choose μ fixed (μ = 10, ..., 20)

Feasibility and phase I methods

feasibility problem: find x such that

f_i(x) ≤ 0, i = 1, ..., m,   Ax = b        (2)

phase I: computes a strictly feasible starting point for the barrier method

basic phase I method

minimize (over x, s) s
subject to f_i(x) ≤ s, i = 1, ..., m
Ax = b        (3)

- if x, s are feasible with s < 0, then x is strictly feasible for (2)
- if the optimal value p̄* of (3) is positive, then problem (2) is infeasible
- if p̄* = 0 and attained, then problem (2) is feasible (but not strictly); if p̄* = 0 and not attained, then problem (2) is infeasible

sum of infeasibilities phase I method

minimize 1^T s
subject to s ⪰ 0, f_i(x) ≤ s_i, i = 1, ..., m
Ax = b

for infeasible problems, produces a solution that satisfies many more inequalities than the basic phase I method

example (infeasible set of 100 linear inequalities in 50 variables)

(figure: histograms of the margins b_i - a_i^T x for the two phase I solutions)

- left: basic phase I solution; satisfies 39 inequalities
- right: sum of infeasibilities phase I solution; satisfies 79 inequalities

example: family of linear inequalities Ax ⪯ b + γΔb

- data chosen to be strictly feasible for γ > 0, infeasible for γ ≤ 0
- use basic phase I, terminate when s < 0 or the dual objective is positive

(figure: Newton iterations versus γ, on linear and logarithmic scales; the iteration count grows as γ approaches 0 from either side)

the number of iterations is roughly proportional to log(1/|γ|)