15.093 Optimization Methods. Lecture 19: Line Searches and Newton's Method

1 Last Lecture

Necessary conditions for optimality (identify candidates):
  x* local min ⇒ ∇f(x*) = 0, ∇²f(x*) PSD
Sufficient conditions for optimality:
  ∇f(x*) = 0 and ∇²f(x) PSD for all x ∈ B_ε(x*) ⇒ x* local min
Characterizations of convexity:
  a. f convex ⇔ f(y) ≥ f(x) + ∇f(x)'(y − x) for all x, y
  b. f convex ⇔ ∇²f(x) PSD for all x
If f is convex, then x* is a global min ⇔ ∇f(x*) = 0.

2 Steepest descent

2.1 The algorithm

Step 0: Given x^0, set k := 0.
Step 1: d^k := −∇f(x^k). If ‖d^k‖ ≤ ε, then stop.
Step 2: Solve min_λ h(λ) := f(x^k + λ d^k) for the step-length λ^k, perhaps chosen by an exact or inexact line-search.
Step 3: Set x^{k+1} ← x^k + λ^k d^k, k ← k + 1. Go to Step 1.

3 Outline

1. Bisection method and Armijo's rule
2. Motivation for Newton's method
3. Newton's method
4. Quadratic rate of convergence
5. Modification for global convergence

4 Choices of step sizes

Exact minimization: λ^k = argmin_{λ ≥ 0} f(x^k + λ d^k)
Limited minimization: λ^k = argmin_{λ ∈ [0, s]} f(x^k + λ d^k)
Constant stepsize: λ^k = s constant
Diminishing stepsize: λ^k → 0, Σ_k λ^k = ∞
Armijo rule (Section 4.2; a code sketch of the basic descent loop follows this list)
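The following is a minimal Python sketch of the steepest-descent loop of Section 2.1, using the constant-stepsize choice from the list above; the function name steepest_descent, the default parameter values and the quadratic test function are illustrative choices, not anything prescribed by the notes.

import numpy as np

def steepest_descent(grad, x0, step=0.1, eps=1e-6, max_iter=1000):
    """Minimize f by moving along d^k = -grad f(x^k) with a fixed stepsize."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)                  # Step 1: steepest-descent direction
        if np.linalg.norm(d) <= eps:  # stopping test ||d^k|| <= eps
            break
        x = x + step * d              # Step 3 with constant stepsize lambda^k = step
    return x

# Example: f(x) = ||x||^2 / 2 has grad f(x) = x; the minimizer is the origin.
print(steepest_descent(lambda x: x, x0=[1.0, -2.0]))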

4.1 Bisection Line-Search Algorithm

4.1.1 Convex functions

λ^k := argmin_{λ ≥ 0} h(λ) := argmin_{λ ≥ 0} f(x^k + λ d^k)

If f(x) is convex, h(λ) is convex. Find λ* for which h'(λ*) = 0, where

h'(λ) = ∇f(x^k + λ d^k)' d^k.

4.1.2 Algorithm

Step 0: Set k = 0. Set λ_L := 0 and λ_U := λ̂.
Step k: Set λ̃ = (λ_U + λ_L)/2 and compute h'(λ̃).
  If h'(λ̃) > 0, re-set λ_U := λ̃. Set k ← k + 1.
  If h'(λ̃) < 0, re-set λ_L := λ̃. Set k ← k + 1.
  If h'(λ̃) = 0, stop.

4.1.3 Analysis

At the k-th iteration of the bisection algorithm, the current interval [λ_L, λ_U] must contain a point λ* with h'(λ*) = 0.
At the k-th iteration of the bisection algorithm, the length of the current interval [λ_L, λ_U] is

length = (1/2)^k λ̂.

A value of λ such that |λ − λ*| ≤ ε can be found in at most ⌈log_2(λ̂ / ε)⌉ steps of the bisection algorithm.
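A minimal Python sketch of the bisection line search of Section 4.1.2; it assumes h is convex and differentiable, that h'(0) < 0, and that the supplied upper bound lam_hat satisfies h'(lam_hat) > 0. The function name and the tolerance are illustrative choices.

def bisection_line_search(hprime, lam_hat, eps=1e-8, max_iter=100):
    lam_lo, lam_hi = 0.0, lam_hat         # interval [lambda_L, lambda_U]
    for _ in range(max_iter):
        lam = 0.5 * (lam_lo + lam_hi)     # midpoint
        g = hprime(lam)
        if g > 0:                         # minimizer lies to the left
            lam_hi = lam
        elif g < 0:                       # minimizer lies to the right
            lam_lo = lam
        else:                             # exact stationary point found
            return lam
        if lam_hi - lam_lo <= eps:        # interval length below tolerance
            break
    return 0.5 * (lam_lo + lam_hi)

Consistent with the analysis above, the interval is halved at every iteration, so reaching tolerance eps takes about log_2(lam_hat/eps) evaluations of h'.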

4.1.4 Example

h(λ) = (x_1 + λ d_1) − 0.6 (x_2 + λ d_2) + 4 (x_3 + λ d_3) + 0.25 (x_4 + λ d_4)
       − Σ_{i=1}^{4} log(x_i + λ d_i) − log( 5 − Σ_{i=1}^{4} x_i − λ Σ_{i=1}^{4} d_i ),

where (x_1, x_2, x_3, x_4) = (1, 1, 1, 1) and (d_1, d_2, d_3, d_4) = (−1, 0.6, −4, −0.25). Then

h(λ) = 4.65 − 17.4225 λ − log(1 − λ) − log(1 + 0.6 λ) − log(1 − 4 λ) − log(1 − 0.25 λ) − log(1 + 4.65 λ)

and

h'(λ) = −17.4225 + 1/(1 − λ) − 0.6/(1 + 0.6 λ) + 4/(1 − 4 λ) + 0.25/(1 − 0.25 λ) − 4.65/(1 + 4.65 λ).

[Figure: plot of h(λ) on 0 ≤ λ < 0.25, the interval on which h is defined.]

[Table: iterations of the bisection algorithm on h'(λ) starting from [λ_L, λ_U] = [0, 1], with columns k, λ_L, λ_U, h'(λ̃_k). The first midpoints lie outside the domain of h, so h'(λ̃) is reported as NaN and only λ_U is reduced; thereafter the interval halves at every step and the iterates converge to the step length λ* ≈ 0.197, at which h'(λ*) = 0.]
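As a standalone check of this example, the short Python snippet below runs the bisection of Section 4.1.2 directly on h'(λ) above; the bracket [0, 0.24] is an illustrative choice that keeps the trial points inside the domain of h (which requires λ < 0.25).

def hprime(lam):
    return (-17.4225 + 1/(1 - lam) - 0.6/(1 + 0.6*lam) + 4/(1 - 4*lam)
            + 0.25/(1 - 0.25*lam) - 4.65/(1 + 4.65*lam))

lo, hi = 0.0, 0.24
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if hprime(mid) > 0:    # minimizer is to the left of mid
        hi = mid
    else:                  # minimizer is to the right of mid
        lo = mid
print(0.5 * (lo + hi))     # approximately 0.197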

4.2 Armijo Rule

Bisection is accurate but may be expensive in practice. We need a cheap method that still guarantees sufficient accuracy: an inexact line-search method. It requires two parameters, ε ∈ (0, 1) and σ > 1. Define

ĥ(λ) = h(0) + λ ε h'(0).

A step λ̄ is acceptable by Armijo's rule if:

h(λ̄) ≤ ĥ(λ̄)        (i.e. f(x^k + λ̄ d^k) ≤ f(x^k) + λ̄ ε ∇f(x^k)' d^k)
h(σ λ̄) ≥ ĥ(σ λ̄)     (prevents the step size from being too small)

We get a range of acceptable stepsizes. The rule is usually implemented by backtracking:

Step 0: Set k = 0, λ_0 = λ̄ > 0.
Step k: If h(λ_k) ≤ ĥ(λ_k), choose λ_k and stop. If h(λ_k) > ĥ(λ_k), let λ_{k+1} = (1/σ) λ_k (for example, σ = 2).
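A minimal Python sketch of this backtracking procedure; the function name armijo_backtracking and the default parameter values are illustrative choices.

def armijo_backtracking(h, h0, hprime0, lam_bar=1.0, eps=0.2, sigma=2.0,
                        max_iter=50):
    """Shrink the trial step by sigma until h(lam) <= h(0) + lam*eps*h'(0)."""
    lam = lam_bar
    for _ in range(max_iter):
        if h(lam) <= h0 + lam * eps * hprime0:   # sufficient decrease holds
            return lam
        lam = lam / sigma                        # backtrack
    return lam

Here h(λ) = f(x^k + λ d^k), so the caller passes h0 = f(x^k) and hprime0 = ∇f(x^k)' d^k, which is negative whenever d^k is a descent direction.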

5 Newton's method

5.1 History

Steepest descent is simple but slow; Newton's method is more complex but fast. Its origins are not clear: Raphson became a member of the Royal Society in 1691 for his book "Analysis Aequationum Universalis", which contained the Newton method. Raphson published it 50 years before Newton.

6 Newton's method

6.1 Motivation

Consider min f(x). Take the Taylor series expansion around x̄:

f(x) ≈ g(x) = f(x̄) + ∇f(x̄)'(x − x̄) + (1/2)(x − x̄)' ∇²f(x̄)(x − x̄).

Instead of min f(x), solve min g(x), i.e.,

∇g(x) = ∇f(x̄) + ∇²f(x̄)(x − x̄) = 0  ⇒  x − x̄ = −(∇²f(x̄))^{-1} ∇f(x̄).

The direction d = −(∇²f(x̄))^{-1} ∇f(x̄) is the Newton direction.

6.2 The algorithm

Step 0: Given x^0, set k := 0.
Step 1: d^k := −(∇²f(x^k))^{-1} ∇f(x^k). If ‖d^k‖ ≤ ε, then stop.
Step 2: Set x^{k+1} ← x^k + d^k, k ← k + 1. Go to Step 1.

6.3 Comments

The method assumes that ∇²f(x^k) is nonsingular at each iteration.
There is no guarantee that f(x^{k+1}) ≤ f(x^k).
We can augment the algorithm with a line-search: min_λ f(x^k + λ d^k).

6.4 Properties

Theorem: If H = ∇²f(x^k) is PD, then d^k is a descent direction: ∇f(x^k)' d^k < 0.

Proof: ∇f(x^k)' d^k = −∇f(x^k)' H^{-1} ∇f(x^k) < 0 if H^{-1} is PD. But for any v ≠ 0,

0 < v' (H^{-1})' H H^{-1} v = v' H^{-1} v,

so H^{-1} is PD.
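A minimal Python sketch of the pure Newton iteration of Section 6.2. Instead of forming the inverse Hessian explicitly, it solves the linear system ∇²f(x^k) d^k = −∇f(x^k); the function name and the tolerances are illustrative choices.

import numpy as np

def newton(grad, hess, x0, eps=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.solve(hess(x), -grad(x))  # Newton direction d^k
        if np.linalg.norm(d) <= eps:            # stopping test ||d^k|| <= eps
            break
        x = x + d                               # full step, no line search
    return x

Solving the linear system is cheaper and numerically more stable than computing the inverse, but the iteration above is still only locally convergent, as the comments in Section 6.3 and the examples below illustrate.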

6.5 Example 1

f(x) = 7x − log x,   x* = 1/7 = 0.142857143...

f'(x) = 7 − 1/x,   f''(x) = 1/x²

d^k = −(f''(x^k))^{-1} f'(x^k) = −(x^k)² (7 − 1/x^k) = x^k − 7 (x^k)²

x^{k+1} = x^k + ( x^k − 7 (x^k)² ) = 2 x^k − 7 (x^k)²

  k   x^k (x^0 = 1)   x^k (x^0 = 0.01)   x^k (x^0 = 0.1)
  0   1               0.01               0.1
  1   -5              0.0193             0.13
  2   -185            0.0359926          0.1417
  3   -239945         0.0629169          0.1428478
  4   ≈ -4.0e11       0.0981240          0.1428571
  5   ≈ -1.1e24       0.1288498          0.1428571
  6                   0.1414837          0.1428571
  7                   0.1428439          0.1428571
  8                   0.1428571          0.1428571

From x^0 = 1 the iterates diverge; from x^0 = 0.01 and x^0 = 0.1 they converge to x* = 1/7.
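A short Python check of this example: for f(x) = 7x − log x the Newton update reduces to the scalar recursion x_{k+1} = 2 x_k − 7 x_k², so the behaviour in the table above can be regenerated directly.

def newton_example1(x0, iters=8):
    xs = [x0]
    for _ in range(iters):
        xs.append(2 * xs[-1] - 7 * xs[-1] ** 2)   # x_{k+1} = 2 x_k - 7 x_k^2
    return xs

print(newton_example1(1.0))    # diverges
print(newton_example1(0.1))    # converges quickly to 1/7 = 0.142857...
print(newton_example1(0.01))   # converges after a few more iterations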

6.6 Example 2

f(x_1, x_2) = −log(1 − x_1 − x_2) − log x_1 − log x_2

∇f(x_1, x_2) = ( 1/(1 − x_1 − x_2) − 1/x_1 ,  1/(1 − x_1 − x_2) − 1/x_2 )

∇²f(x_1, x_2) = [ (1/(1 − x_1 − x_2))² + (1/x_1)²     (1/(1 − x_1 − x_2))²
                  (1/(1 − x_1 − x_2))²                 (1/(1 − x_1 − x_2))² + (1/x_2)² ]

(x_1*, x_2*) = (1/3, 1/3),   f(x_1*, x_2*) = 3 log 3 ≈ 3.2958369.

[Table: Newton iterates starting from x^0 = (0.85, 0.05), with columns k, x_1^k, x_2^k, ‖x^k − x*‖. Both coordinates converge to 1/3 within about 7 iterations; the error ‖x^k − x*‖ falls from roughly 0.59 to 0.45, 0.24, 0.064, ..., reaching about 10^-16 at k = 7, so once the iterates are close to x* the error is roughly squared at every step.]
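Newton's quadratic rate of convergence, the subject of the next section, can be seen numerically on this example. The illustrative sketch below writes out the gradient and Hessian from the formulas above and prints the error ‖x^k − x*‖ at each pure Newton step; the error roughly squares from one iteration to the next.

import numpy as np

def grad(x):
    s = 1.0 - x[0] - x[1]
    return np.array([1.0 / s - 1.0 / x[0], 1.0 / s - 1.0 / x[1]])

def hess(x):
    s2 = (1.0 - x[0] - x[1]) ** -2
    return np.array([[s2 + x[0] ** -2, s2],
                     [s2, s2 + x[1] ** -2]])

x = np.array([0.85, 0.05])
x_star = np.array([1.0 / 3.0, 1.0 / 3.0])
for k in range(8):
    print(k, np.linalg.norm(x - x_star))          # error shrinks quadratically
    x = x - np.linalg.solve(hess(x), grad(x))     # pure Newton step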

7 Quadratic convergence

Recall: for a sequence z_n with limit z* = lim_n z_n, the convergence is quadratic with rate δ if

limsup_n |z_{n+1} − z*| / |z_n − z*|² = δ < ∞.

Example: z_n = a^(2^n), 0 < a < 1, z* = 0:

|z_{n+1} − z*| / |z_n − z*|² = a^(2^(n+1)) / ( a^(2^n) )² = 1.

7.1 Intuitive statement

Theorem: Suppose f(x) ∈ C³ (thrice continuously differentiable) and x* is such that ∇f(x*) = 0 and ∇²f(x*) is nonsingular. If Newton's method is started sufficiently close to x*, the sequence of iterates converges quadratically to x*.

Lemma: Suppose f(x) ∈ C³ on R^n and x is a given point. Then for every ε > 0 there is a β > 0 such that ‖x − y‖ ≤ ε implies

‖∇f(x) − ∇f(y) − H(y)(x − y)‖ ≤ β ‖x − y‖².

7.2 Formal statement

Theorem: Suppose f is twice continuously differentiable. Let g(x) = ∇f(x), H(x) = ∇²f(x), and let x* satisfy g(x*) = 0. Suppose the Hessian satisfies

‖H(x*)‖ ≥ h > 0,
‖H(x) − H(x*)‖ ≤ L ‖x − x*‖   for all x with ‖x − x*‖ ≤ β.

(Reminder: if A is a square symmetric matrix, ‖A‖ = max_{‖x‖=1} ‖Ax‖ = max{ |λ_1|, ..., |λ_n| }.)

Suppose

‖x − x*‖ ≤ min( β, 2h/(3L) ) = γ,

and let x̃ be the next iterate of the Newton method. Then

‖x̃ − x*‖ ≤ ‖x − x*‖² · 3L/(2h)   and   ‖x̃ − x*‖ < ‖x − x*‖ ≤ γ.

7.3 Proof

x̃ − x* = x − H(x)^{-1} g(x) − x*
        = x − x* + H(x)^{-1} ( g(x*) − g(x) )                        (using g(x*) = 0)
        = x − x* + H(x)^{-1} ∫_0^1 H(x + t(x* − x)) (x* − x) dt      (Lemma 1)
        = H(x)^{-1} ∫_0^1 ( H(x + t(x* − x)) − H(x) ) (x* − x) dt.

Now

‖x̃ − x*‖ ≤ ‖H(x)^{-1}‖ ∫_0^1 ‖H(x + t(x* − x)) − H(x)‖ ‖x* − x‖ dt
          ≤ ‖H(x)^{-1}‖ ‖x* − x‖² ∫_0^1 L t dt
          = (1/2) L ‖H(x)^{-1}‖ ‖x − x*‖².

Moreover,

h ≤ ‖H(x*)‖ = ‖H(x) + H(x*) − H(x)‖ ≤ ‖H(x)‖ + ‖H(x*) − H(x)‖ ≤ ‖H(x)‖ + L ‖x − x*‖,

so ‖H(x)‖ ≥ h − L ‖x − x*‖ and therefore ‖H(x)^{-1}‖ ≤ 1 / ( h − L ‖x − x*‖ ).

Therefore

‖x̃ − x*‖ ≤ ‖x − x*‖² · L / ( 2 ( h − L ‖x − x*‖ ) )
          ≤ ‖x − x*‖² · L / ( 2 ( h − (2/3) h ) )
          = (3L/(2h)) ‖x − x*‖².

7.4 Lemma 1

Fix w. Let φ(t) = w' g(x + t(x* − x)). Then

φ(0) = w' g(x),   φ(1) = w' g(x*),
φ'(t) = w' H(x + t(x* − x)) (x* − x),
φ(1) = φ(0) + ∫_0^1 φ'(t) dt.

So for every w:

w' ( g(x*) − g(x) ) = w' ∫_0^1 H(x + t(x* − x)) (x* − x) dt,

and therefore

g(x*) − g(x) = ∫_0^1 H(x + t(x* − x)) (x* − x) dt.

7.5 Critical comments

The iterates from Newton's method are equally attracted to local minima and local maxima.
We do not know β, h, L in general. Note, however, that they are only used in the proof, not in the algorithm.
We do not assume convexity, only that H(x*) is nonsingular and not badly behaved near x*.

7.6 Properties of Convergence

Proposition: Let r^k = ‖x^k − x*‖ and C = 3L/(2h). If ‖x^0 − x*‖ < γ, then

r^k ≤ (1/C) (C r^0)^(2^k).
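To get a feel for this bound, a tiny illustrative computation (the values of C and r^0 below are hypothetical, chosen only so that C r^0 < 1):

C, r0 = 1.0, 0.5
for k in range(6):
    print(k, (1.0 / C) * (C * r0) ** (2 ** k))
# prints 0.5, 0.25, 0.0625, 0.0039..., 1.5e-05, 2.3e-10: the number of
# correct digits roughly doubles at every iteration.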

Proposition: If ‖x^0 − x*‖ > ε > 0, then ‖x^k − x*‖ ≤ ε for all

k ≥ log_2 ( log( 1/(Cε) ) / log( 1/(C r^0) ) ).

8 Solving systems of equations

To solve g(x) = 0, linearize:

0 = g(x^{t+1}) ≈ g(x^t) + ∇g(x^t) (x^{t+1} − x^t),

which gives the iteration

x^{t+1} = x^t − ( ∇g(x^t) )^{-1} g(x^t).

Application in optimization: g(x) = ∇f(x).

9 Modifications for global convergence

Perform a line search.
When the Hessian is singular or near singular, use

d^k = −( ∇²f(x^k) + D^k )^{-1} ∇f(x^k),

where D^k is a diagonal matrix such that ∇²f(x^k) + D^k is PD.
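A minimal Python sketch of this safeguard. One simple (illustrative) choice of D^k is a multiple of the identity, increased until a Cholesky factorization of ∇²f(x^k) + D^k succeeds, i.e. until the shifted matrix is PD; the shift schedule and the function name are illustrative choices.

import numpy as np

def modified_newton_direction(grad_x, hess_x, tau0=1e-3):
    """Return d = -(H + D)^{-1} g with D = tau*I chosen so that H + tau*I is PD."""
    n = hess_x.shape[0]
    tau = 0.0
    while True:
        try:
            # Cholesky succeeds exactly when H + tau*I is positive definite.
            L = np.linalg.cholesky(hess_x + tau * np.eye(n))
            break
        except np.linalg.LinAlgError:
            tau = max(2 * tau, tau0)     # increase the shift and retry
    # Solve (H + tau*I) d = -g using the Cholesky factor L (H + tau*I = L L').
    y = np.linalg.solve(L, -grad_x)
    return np.linalg.solve(L.T, y)

With tau = 0 this is the pure Newton direction; as tau grows, the direction bends toward the steepest-descent direction, which is what makes it usable far from x*.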

10 Summary

1. Line search methods: bisection method, Armijo's rule
2. Newton's method
3. Quadratic rate of convergence
4. Modification for global convergence

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods
Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.