Optimization Methods. Lecture 18: Optimality Conditions and Gradient Methods for Unconstrained Optimization

15.093 Optimization Methods, Lecture 18: Optimality Conditions and Gradient Methods for Unconstrained Optimization

1 Outline

1. Necessary and sufficient optimality conditions
2. Gradient methods
3. The steepest descent algorithm
4. Rate of convergence
5. Line search algorithms

2 Last Lecture: Nonlinear Optimization Applications

Portfolio selection; facility location (geometry problems); traffic assignment, routing.

3 The general problem

$f : \mathbb{R}^n \to \mathbb{R}$ is a continuous (usually differentiable) function of $n$ variables, and $g_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, m$, $h_j : \mathbb{R}^n \to \mathbb{R}$, $j = 1, \dots, l$.

$$\mathrm{NLP}: \quad \min f(x) \quad \text{s.t.} \quad g_1(x) \le 0, \; \dots, \; g_m(x) \le 0, \quad h_1(x) = 0, \; \dots, \; h_l(x) = 0.$$

3.1 Local vs Global Minima

$x^0 \in F$ is a local minimum of NLP if there exists $\epsilon > 0$ such that $f(x^0) \le f(y)$ for all $y \in B(x^0, \epsilon) \cap F$.

$x^0 \in F$ is a global minimum of NLP if $f(x^0) \le f(y)$ for all $y \in F$.

4 Convex Sets and Functions

A subset $S \subseteq \mathbb{R}^n$ is a convex set if
$$x, y \in S \;\Rightarrow\; \lambda x + (1 - \lambda) y \in S \quad \forall \lambda \in [0, 1].$$

A function $f(x)$ is a convex function if
$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y) \quad \forall x, y, \;\; \forall \lambda \in [0, 1].$$

5 Convex Optimization

5.1 Convexity and Minima

COP is called a convex optimization problem if $f(x), g_1(x), \dots, g_m(x)$ are convex functions. This implies that the objective function is convex and the feasible region $F$ is a convex set.

Implication: if COP is a convex optimization problem, then any local minimum is a global minimum.

6 Optimality Conditions

Necessary conditions for local optima: "If $x^*$ is a local optimum, then $x^*$ must satisfy ..." These identify all candidates for local optima.

Sufficient conditions for local optima: "If $x^*$ satisfies ..., then $x^*$ must be a local optimum."

7 Optimality Conditions

7.1 Necessary conditions

Consider $\min_{x \in \mathbb{R}^n} f(x)$.

Zero first-order variation along all directions.

Theorem. Let $f(x)$ be twice continuously differentiable. If $x^* \in \mathbb{R}^n$ is a local minimum of $f(x)$, then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is PSD.

7.2 Proof

Zero slope at a local minimum $x^*$: $f(x^*) \le f(x^* + \lambda d)$ for all $d \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$. Pick $\lambda > 0$:
$$0 \le \frac{f(x^* + \lambda d) - f(x^*)}{\lambda}.$$
Take limits as $\lambda \to 0$: $0 \le \nabla f(x^*)^T d$ for all $d \in \mathbb{R}^n$. Since $d$ is arbitrary, replace $d$ with $-d$, which forces $\nabla f(x^*) = 0$.

Nonnegative curvature at a local minimum $x^*$:
$$0 \le f(x^* + \lambda d) - f(x^*) = \lambda \nabla f(x^*)^T d + \tfrac{1}{2} (\lambda d)^T \nabla^2 f(x^*) (\lambda d) + \|\lambda d\|^2 R(x^*, \lambda d),$$
where $R(x, y) \to 0$ as $y \to 0$. Since $\nabla f(x^*) = 0$,
$$\frac{f(x^* + \lambda d) - f(x^*)}{\lambda^2} = \tfrac{1}{2}\, d^T \nabla^2 f(x^*)\, d + \|d\|^2 R(x^*, \lambda d).$$
If $\nabla^2 f(x^*)$ is not PSD, there exists $\bar d$ with $\bar d^T \nabla^2 f(x^*) \bar d < 0$, so $f(x^* + \lambda \bar d) < f(x^*)$ for all $\lambda > 0$ sufficiently small, a contradiction. QED.

7.3 Example

$$f(x) = \tfrac{1}{2} x_1^2 + x_1 x_2 + 2 x_2^2 - 4 x_1 - 4 x_2 - x_2^3, \qquad \nabla f(x) = \big(x_1 + x_2 - 4, \;\; x_1 + 4 x_2 - 4 - 3 x_2^2\big).$$
Setting $\nabla f(x) = 0$ gives the candidates $x^1 = (4, 0)$ and $x^2 = (3, 1)$.
$$\nabla^2 f(x) = \begin{pmatrix} 1 & 1 \\ 1 & 4 - 6 x_2 \end{pmatrix}, \qquad \nabla^2 f(x^1) = \begin{pmatrix} 1 & 1 \\ 1 & 4 \end{pmatrix} \;\text{PSD}, \qquad \nabla^2 f(x^2) = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix} \;\text{indefinite}.$$
$x^1$ is the only candidate for a local minimum.
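A quick symbolic check of these conditions is possible with sympy; the sketch below (using the example function as written above, with our own variable names) solves $\nabla f(x) = 0$ and inspects the eigenvalues of the Hessian at each candidate.

```python
# Minimal sympy check of first- and second-order conditions for the example above.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = sp.Rational(1, 2)*x1**2 + x1*x2 + 2*x2**2 - 4*x1 - 4*x2 - x2**3

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.hessian(f, (x1, x2))

# Candidates: solve grad f = 0
candidates = sp.solve(grad, [x1, x2], dict=True)
for c in candidates:
    Hc = H.subs(c)
    eigs = list(Hc.eigenvals().keys())
    # A symmetric matrix is PSD iff all eigenvalues are nonnegative
    print(c, eigs, all(e >= 0 for e in eigs))
# Expected: (4, 0) passes the PSD test, (3, 1) has a negative eigenvalue.
```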

7.4 Sufficient conditions

Theorem. Let $f$ be twice continuously differentiable. If $\nabla f(x^*) = 0$ and $\nabla^2 f(x)$ is PSD for all $x \in B(x^*, \epsilon)$, then $x^*$ is a local minimum.

Proof: Taylor series expansion. For all $x \in B(x^*, \epsilon)$ and some $\theta \in [0, 1]$,
$$f(x) = f(x^*) + \nabla f(x^*)^T (x - x^*) + \tfrac{1}{2} (x - x^*)^T \nabla^2 f\big(x^* + \theta (x - x^*)\big) (x - x^*) \;\Rightarrow\; f(x) \ge f(x^*).$$

7.5 Example continued ...

At $x^1 = (4, 0)$, $\nabla f(x^1) = 0$ and $\nabla^2 f(x) = \begin{pmatrix} 1 & 1 \\ 1 & 4 - 6 x_2 \end{pmatrix}$ is PSD for all $x \in B(x^1, \epsilon)$, so $x^1$ is a local minimum.

In contrast, consider $f(x) = x_1^3 + x_2^2$ with $\nabla f(x) = (3 x_1^2, \; 2 x_2)$ and $\bar x = (0, 0)$. Here $\nabla f(\bar x) = 0$ and $\nabla^2 f(\bar x)$ is PSD, but
$$\nabla^2 f(x) = \begin{pmatrix} 6 x_1 & 0 \\ 0 & 2 \end{pmatrix}$$
is not PSD throughout $B(\bar x, \epsilon)$, and indeed $f(-\epsilon, 0) = -\epsilon^3 < 0 = f(\bar x)$, so $\bar x$ is not a local minimum.

7.6 Characterization of convex functions

Theorem. Let $f(x)$ be continuously differentiable. Then $f(x)$ is convex if and only if
$$\nabla f(\bar x)^T (x - \bar x) \le f(x) - f(\bar x) \quad \forall x, \bar x.$$

7.7 Proof

By convexity,
$$f(\lambda x + (1 - \lambda) \bar x) \le \lambda f(x) + (1 - \lambda) f(\bar x) \;\Rightarrow\; \frac{f(\bar x + \lambda (x - \bar x)) - f(\bar x)}{\lambda} \le f(x) - f(\bar x).$$
As $\lambda \to 0$,
$$\nabla f(\bar x)^T (x - \bar x) \le f(x) - f(\bar x).$$

7.8 Convex functions

Theorem. Let $f(x)$ be a continuously differentiable convex function. Then $x^*$ is a minimum of $f$ if and only if $\nabla f(x^*) = 0$.

Proof: If $f$ is convex and $\nabla f(x^*) = 0$, then
$$f(x) - f(x^*) \ge \nabla f(x^*)^T (x - x^*) = 0.$$
Conversely, if $\nabla f(x^*)^T d < 0$ for some $d$, then for all $\lambda > 0$ sufficiently small $f(x^* + \lambda d) < f(x^*)$ (see the observation below), so a minimum must satisfy $\nabla f(x^*) = 0$.

7.9 Descent Directions

Interesting observation: let $f$ be differentiable at $\bar x$. If there exists $d$ with $\nabla f(\bar x)^T d < 0$, then for all $\lambda > 0$ sufficiently small, $f(\bar x + \lambda d) < f(\bar x)$ ($d$ is a descent direction).

7.10 Proof

$$f(\bar x + \lambda d) = f(\bar x) + \lambda \nabla f(\bar x)^T d + \lambda \|d\| R(\bar x, \lambda d), \quad \text{where } R(\bar x, \lambda d) \to 0 \text{ as } \lambda \to 0,$$
$$\frac{f(\bar x + \lambda d) - f(\bar x)}{\lambda} = \nabla f(\bar x)^T d + \|d\| R(\bar x, \lambda d).$$
Since $\nabla f(\bar x)^T d < 0$ and $R(\bar x, \lambda d) \to 0$ as $\lambda \to 0$, for all $\lambda > 0$ sufficiently small $f(\bar x + \lambda d) < f(\bar x)$. QED.

8 Algorithms for unconstrained optimization

8.1 Gradient Methods: Motivation

Decrease $f(x)$ until $\nabla f(x^*) = 0$. Since
$$f(\bar x + \lambda d) \approx f(\bar x) + \lambda \nabla f(\bar x)^T d,$$
choosing $d$ with $\nabla f(\bar x)^T d < 0$ gives $f(\bar x + \lambda d) < f(\bar x)$ for small $\lambda > 0$.

9 Gradient Methods

9.1 A generic algorithm

$$x^{k+1} = x^k + \lambda^k d^k.$$
If $\nabla f(x^k) \ne 0$, the direction $d^k$ satisfies $\nabla f(x^k)^T d^k < 0$, and the step-length satisfies $\lambda^k > 0$.

Principal example:
$$x^{k+1} = x^k - \lambda^k D^k \nabla f(x^k), \quad D^k \text{ a positive definite symmetric matrix}.$$

9.2 Principal directions

Steepest descent: $x^{k+1} = x^k - \lambda^k \nabla f(x^k)$.

Newton's method: $x^{k+1} = x^k - \lambda^k (\nabla^2 f(x^k))^{-1} \nabla f(x^k)$.

9.3 Other directions

Diagonally scaled steepest descent: $D^k$ = diagonal approximation to $(\nabla^2 f(x^k))^{-1}$.

Modified Newton's method: $D^k = (\nabla^2 f(x^0))^{-1}$.

Gauss-Newton method for least squares problems $f(x) = \|g(x)\|^2$: $D^k = (\nabla g(x^k) \nabla g(x^k)^T)^{-1}$.

10 Steepest descent

10.1 The algorithm

Step 0. Given $x^0$, set $k := 0$.
Step 1. $d^k := -\nabla f(x^k)$. If $\|d^k\| \le \epsilon$, then stop.
Step 2. Solve $\min_{\lambda} h(\lambda) := f(x^k + \lambda d^k)$ for the step-length $\lambda^k$, perhaps chosen by an exact or inexact line-search.
Step 3. Set $x^{k+1} \leftarrow x^k + \lambda^k d^k$, $k \leftarrow k + 1$. Go to Step 1.
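As a rough illustration, the following Python sketch implements Steps 0-3, with a bounded scalar minimization standing in for the line search of Step 2; the function names and the search interval bound s are our own choices, not part of the notes.

```python
# A minimal steepest-descent sketch following Steps 0-3 above.
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad_f, x0, eps=1e-6, s=1.0, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        d = -grad_f(x)                      # Step 1: steepest-descent direction
        if np.linalg.norm(d) <= eps:        # stop when the gradient is small
            break
        h = lambda lam: f(x + lam * d)      # Step 2: line search on h(lambda)
        lam = minimize_scalar(h, bounds=(0.0, s), method='bounded').x
        x = x + lam * d                     # Step 3: take the step
    return x

# Example: f(x) = (x1 - 1)^2 + (x2 + 2)^2 has its minimum at (1, -2):
# steepest_descent(lambda x: (x[0] - 1)**2 + (x[1] + 2)**2,
#                  lambda x: np.array([2*(x[0] - 1), 2*(x[1] + 2)]),
#                  [0.0, 0.0])
```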

10.2 An example

Minimize $f(x_1, x_2) = 5 x_1^2 + x_2^2 + 4 x_1 x_2 - 14 x_1 - 6 x_2 + 20$.

The optimal solution is $x^* = (x_1^*, x_2^*) = (1, 1)$ with $f(x^*) = 10$.

Given $x^k = (x_1^k, x_2^k)$, the steepest-descent direction is
$$d^k = (d_1^k, d_2^k) = -\nabla f(x_1^k, x_2^k) = \big(-10 x_1^k - 4 x_2^k + 14, \;\; -4 x_1^k - 2 x_2^k + 6\big),$$
and the exact line search minimizes
$$h(\lambda) = f(x^k + \lambda d^k) = 5 (x_1^k + \lambda d_1^k)^2 + (x_2^k + \lambda d_2^k)^2 + 4 (x_1^k + \lambda d_1^k)(x_2^k + \lambda d_2^k) - 14 (x_1^k + \lambda d_1^k) - 6 (x_2^k + \lambda d_2^k) + 20,$$
whose $\lambda^2$ coefficient is $5 (d_1^k)^2 + (d_2^k)^2 + 4 d_1^k d_2^k$.

Start at $x^0 = (0, 10)$ with $\epsilon = 10^{-6}$.

[Iteration table: $x_1^k$, $x_2^k$, $d_1^k$, $d_2^k$, $\|d^k\|$, $\lambda^k$, $f(x^k)$ over roughly 25 iterations; the iterates converge to $x^* = (1, 1)$ while $f(x^k)$ decreases from $f(x^0) = 60$ to the optimal value $10$.]
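Since $h(\lambda)$ is quadratic in $\lambda$, Step 2 has the closed-form minimizer $\lambda^k = \|d^k\|^2 / (d^k)^T Q d^k$. The sketch below uses our matrix rewriting of the example, $f(x) = \tfrac{1}{2} x^T Q x - c^T x + 20$ with $Q = \begin{pmatrix} 10 & 4 \\ 4 & 2 \end{pmatrix}$ and $c = (14, 6)$, to reproduce the run from $x^0 = (0, 10)$.

```python
# Steepest descent with the exact line-search step for this quadratic.
import numpy as np

Q = np.array([[10.0, 4.0], [4.0, 2.0]])
c = np.array([14.0, 6.0])
f = lambda x: 0.5 * x @ Q @ x - c @ x + 20.0

x = np.array([0.0, 10.0])
for k in range(25):
    d = c - Q @ x                      # d^k = -grad f(x^k)
    if np.linalg.norm(d) <= 1e-6:
        break
    lam = (d @ d) / (d @ Q @ d)        # exact minimizer of h(lambda)
    x = x + lam * d
    print(k, x, f(x))
# f(x^k) decreases toward 10 and x^k approaches (1, 1).
```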

[Figure: surface and contour plot of a quadratic of the form $5x^2 + 4y^2 + 3xy + 7x + \mathrm{const}$, omitted.]

10.3 Important Properties

$f(x^{k+1}) < f(x^k) < \dots < f(x^0)$ (because the $d^k$ are descent directions).

Under reasonable assumptions on $f(x)$, the sequence $x^0, x^1, \dots$ will have at least one cluster point $\bar x$.

Every cluster point $\bar x$ will satisfy $\nabla f(\bar x) = 0$.

Implication: if $f(x)$ is a convex function, $\bar x$ will be an optimal solution.

Global Convergence Result

Theorem: Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable on $F = \{x \in \mathbb{R}^n : f(x) \le f(x^0)\}$, a closed, bounded set. Then every cluster point $\bar x$ of $\{x^k\}$ satisfies $\nabla f(\bar x) = 0$.

11 Work Per Iteration

Two computational tasks at each iteration of steepest descent:

1. Compute $\nabla f(x^k)$ (for quadratic objective functions, this takes $O(n^2)$ steps) to determine $d^k = -\nabla f(x^k)$.
2. Perform the line-search on $h(\lambda) = f(x^k + \lambda d^k)$ to determine $\lambda^k = \arg\min_\lambda h(\lambda) = \arg\min_\lambda f(x^k + \lambda d^k)$.

12 Rate of convergence of algorithms

Let $z_1, \dots, z_k, \dots \to \bar z$ be a convergent sequence. We say that the order of convergence of this sequence is
$$p^* = \sup \left\{ p : \limsup_{k \to \infty} \frac{|z_{k+1} - \bar z|}{|z_k - \bar z|^p} < \infty \right\}.$$
Let
$$\beta = \limsup_{k \to \infty} \frac{|z_{k+1} - \bar z|}{|z_k - \bar z|^{p^*}}.$$
The larger $p^*$, the faster the convergence.

12.1 Types of convergence

1. $p^* = 1$, $0 < \beta < 1$: linear (or geometric) rate of convergence.
2. $p^* = 1$, $\beta = 0$: super-linear convergence.
3. $p^* = 1$, $\beta = 1$: sub-linear convergence.
4. $p^* = 2$: quadratic convergence.

12.2 Examples

$z_k = a^k$, $0 < a < 1$, converges linearly to zero, with $\beta = a$.
$z_k = a^{2^k}$, $0 < a < 1$, converges quadratically to zero.
$z_k = 1/k$ converges sub-linearly to zero.
$z_k = (1/k)^k$ converges super-linearly to zero.

12.3 Steepest descent

Take $z_k = f(x^k)$ and $\bar z = f(x^*)$, where $x^* = \arg\min_x f(x)$. Then an algorithm exhibits linear convergence if there is a constant $\delta < 1$ such that
$$\frac{f(x^{k+1}) - f(x^*)}{f(x^k) - f(x^*)} \le \delta$$
for all $k$ sufficiently large, where $x^*$ is an optimal solution.

12.3.1 Discussion

$$\frac{f(x^{k+1}) - f(x^*)}{f(x^k) - f(x^*)} \le \delta < 1.$$

If $\delta = 0.1$, every iteration adds another digit of accuracy to the optimal objective value.

If $\delta = 0.9$, every 22 iterations add another digit of accuracy to the optimal objective value, because $(0.9)^{22} \approx 0.1$.

13 Rate of convergence of steepest descent

13.1 Quadratic Case

13.1.1 Theorem

Suppose $f(x) = \frac{1}{2} x^T Q x - c^T x$, where $Q$ is PSD.
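The digits-per-iteration remark follows from requiring $\delta^k \le 0.1$; a small numerical check (our own illustration, not from the notes):

```python
# With a linear rate delta, one extra digit of accuracy needs
# ceil(ln(0.1)/ln(delta)) iterations.
import math

for delta in (0.1, 0.5, 0.9, 0.99):
    iters = math.ceil(math.log(0.1) / math.log(delta))
    print(f"delta = {delta}: {iters} iterations per extra digit")
# delta = 0.9 gives 22 iterations, matching (0.9)**22 ~= 0.098.
```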

Let $\lambda_{\max}$ be the largest eigenvalue of $Q$ and $\lambda_{\min}$ the smallest eigenvalue of $Q$.

Linear Convergence Theorem: If $f(x)$ is a quadratic function and $Q$ is PSD, then
$$\frac{f(x^{k+1}) - f(x^*)}{f(x^k) - f(x^*)} \le \left( \frac{\lambda_{\max}/\lambda_{\min} - 1}{\lambda_{\max}/\lambda_{\min} + 1} \right)^2.$$

13.1.2 Discussion

$$f(x^{k+1}) - f(x^*) \le \left( \frac{\lambda_{\max}/\lambda_{\min} - 1}{\lambda_{\max}/\lambda_{\min} + 1} \right)^2 \big( f(x^k) - f(x^*) \big).$$

$\kappa(Q) := \lambda_{\max}/\lambda_{\min} \ge 1$ is the condition number of $Q$; it plays an extremely important role in analyzing computation involving $Q$. In terms of $\kappa(Q)$,
$$\frac{f(x^{k+1}) - f(x^*)}{f(x^k) - f(x^*)} \le \left( \frac{\kappa(Q) - 1}{\kappa(Q) + 1} \right)^2.$$

kappa(Q) | Upper bound on convergence constant | Number of iterations to reduce the optimality gap by 0.10
1.1      | 0.0023 | 1
3.0      | 0.25   | 2
10.0     | 0.67   | 6
100.0    | 0.96   | 58
200.0    | 0.98   | 116
400.0    | 0.99   | 231

For $\kappa(Q) = O(1)$, steepest descent converges fast. For large $\kappa(Q)$,
$$\frac{\kappa(Q) - 1}{\kappa(Q) + 1} \le 1 - \frac{1}{\kappa(Q)},$$
and therefore
$$f(x^k) - f(x^*) \le \left(1 - \frac{1}{\kappa(Q)}\right)^{2k} \big( f(x^0) - f(x^*) \big).$$
Hence in at most $\kappa(Q)(-\ln \epsilon)$ iterations, steepest descent finds $x^k$ with
$$f(x^k) - f(x^*) \le \epsilon \big( f(x^0) - f(x^*) \big).$$
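For a concrete quadratic, the bound and the per-digit iteration count follow directly from the eigenvalues of $Q$. A minimal helper (our own, not from the notes):

```python
# Convergence-constant bound and iterations-per-digit from the eigenvalues of Q.
import math
import numpy as np

def sd_rate_summary(Q):
    eigs = np.linalg.eigvalsh(Q)            # eigenvalues of a symmetric matrix
    lam_min, lam_max = eigs[0], eigs[-1]
    kappa = lam_max / lam_min
    bound = ((kappa - 1.0) / (kappa + 1.0)) ** 2
    iters_per_digit = math.ceil(math.log(0.1) / math.log(bound)) if bound > 0 else 1
    return kappa, bound, iters_per_digit

# Example: the quadratic from Section 10.2 has Q = [[10, 4], [4, 2]].
print(sd_rate_summary(np.array([[10.0, 4.0], [4.0, 2.0]])))
# kappa ~= 34, bound ~= 0.89, roughly 20 iterations per extra digit of accuracy.
```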

13.2 Example 2

$f(x) = \frac{1}{2} x^T Q x - c^T x$ for a 2x2 positive definite $Q$ with condition number $\kappa(Q) = 30.234$, so the convergence-constant bound is
$$\left( \frac{\kappa(Q) - 1}{\kappa(Q) + 1} \right)^2 = 0.876.$$

[Iteration table: $x_1^k$, $x_2^k$, $\|d^k\|$, $\lambda^k$, $f(x^k)$ and the ratio $(f(x^k) - f(x^*)) / (f(x^{k-1}) - f(x^*))$ over roughly 20 iterations; the ratio settles to an essentially constant value below the bound, and convergence is visibly slow.]

13.3 Example 3

$f(x) = \frac{1}{2} x^T Q x - c^T x$ for a different 2x2 positive definite $Q$, this time well-conditioned:

$\kappa(Q) = 1.854$, so
$$\left( \frac{\kappa(Q) - 1}{\kappa(Q) + 1} \right)^2 = 0.0896.$$

[Iteration table: the same quantities as in Example 2; here the optimality gap collapses within a handful of iterations, with an essentially constant per-iteration ratio well below one.]

13.4 Empirical behavior

The convergence-constant bound is not just theoretical; it is typically experienced in practice.

The analysis is due to Leonid Kantorovich, who won the Nobel Memorial Prize in Economic Science in 1975 for his contributions to optimization and economic planning.
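To see the bound actually being attained, one can start exact-line-search steepest descent from the "balanced" error direction, which is the worst case in the Kantorovich analysis; the sketch below (our own 2-D test data, not from the notes) then reduces the optimality gap by $((\kappa - 1)/(\kappa + 1))^2$ at every step, up to rounding.

```python
# Worst-case start: error aligned with (1/lambda_max, 1/lambda_min).
import numpy as np

l1, l2 = 100.0, 1.0                        # eigenvalues of Q, so kappa(Q) = 100
Q = np.diag([l1, l2])
x_star = np.array([2.0, -3.0])             # choose the minimizer, then c = Q x*
c = Q @ x_star
f = lambda x: 0.5 * x @ Q @ x - c @ x
gap = lambda x: f(x) - f(x_star)

x = x_star + np.array([1.0 / l1, 1.0 / l2])    # "balanced" worst-case start
prev = gap(x)
for k in range(5):
    d = c - Q @ x                          # steepest-descent direction
    x = x + (d @ d) / (d @ Q @ d) * d      # exact line-search step
    print(k, gap(x) / prev)                # per-iteration reduction of the gap
    prev = gap(x)

kappa = l1 / l2
print("bound:", ((kappa - 1) / (kappa + 1)) ** 2)   # ~0.9608, same as the printed ratios
```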

What about non-quadratic functions?

- Suppose $x^* = \arg\min_x f(x)$.
- $\nabla^2 f(x^*)$ is the Hessian of $f(x)$ at $x = x^*$.
- The rate of convergence will depend on $\kappa(\nabla^2 f(x^*))$.

14 Summary

1. Optimality conditions
2. The steepest descent algorithm - convergence
3. Rate of convergence of steepest descent

15 Choices of step sizes

Exact minimization: $\min_{\lambda \ge 0} f(x^k + \lambda d^k)$.
Limited minimization: $\min_{\lambda \in [0, s]} f(x^k + \lambda d^k)$.
Constant stepsize: $\lambda^k = s$ constant.
Diminishing stepsize: $\lambda^k \to 0$, $\sum_k \lambda^k = \infty$.
Armijo Rule.
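For the Armijo rule, a typical backtracking sketch looks like the following (the parameter names s, sigma, beta are conventional choices, not taken from the notes):

```python
# Armijo rule (backtracking): accept the largest trial step s * beta^m whose
# decrease is at least a sigma-fraction of the first-order prediction.
import numpy as np

def armijo_step(f, grad_fx, x, d, s=1.0, sigma=1e-4, beta=0.5, max_halvings=50):
    fx = f(x)
    slope = float(np.dot(grad_fx, d))   # must be negative for a descent direction
    lam = s
    for _ in range(max_halvings):
        if f(x + lam * d) <= fx + sigma * lam * slope:
            return lam
        lam *= beta                     # shrink the trial step
    return lam
```

Substituting armijo_step for the exact line search in Step 2 of the steepest-descent loop above gives the inexact-line-search variant mentioned in Section 10.1.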