Lecture V. Numerical Optimization


Lecture V: Numerical Optimization
Gianluca Violante, New York University
Quantitative Macroeconomics

G. Violante, Numerical Optimization, p. 1/19

Isomorphism I

We describe minimization problems: to maximize f, minimize -f.

If you are dealing with the minimization problem min_{x in [a,b]} f(x) and you know f is convex, you can use root-finding on the FOC f'(x) = 0. You do not need to compute the derivative analytically: a finite-difference approximation of f' suffices.
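A minimal sketch of this idea, assuming an illustrative convex objective f(x) = e^x - 3x (minimizer ln 3, not part of the slide): bisect on the sign of a central finite-difference approximation of f'.

```python
import math

def fd_deriv(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def minimize_foc(f, a, b, tol=1e-10):
    """Minimize a convex f on [a,b] by bisection on the FOC f'(x) = 0."""
    while b - a > tol:
        m = (a + b) / 2
        if fd_deriv(f, m) > 0:   # derivative positive: minimizer lies to the left
            b = m
        else:
            a = m
    return (a + b) / 2

f = lambda x: math.exp(x) - 3 * x   # convex; FOC e^x = 3, minimizer x = ln 3
x_star = minimize_foc(f, 0.0, 2.0)
```

Bisection only uses the sign of the approximate derivative, so the O(h^2) finite-difference error is harmless away from the root.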

Isomorphism II

Note that the multivariate root-finding problem

f_1(x_1, x_2, ..., x_n) = 0
f_2(x_1, x_2, ..., x_n) = 0
...
f_n(x_1, x_2, ..., x_n) = 0

can be restated as a nonlinear least-squares problem

min_{x_1, x_2, ..., x_n}  sum_{i=1}^n f_i(x_1, x_2, ..., x_n)^2

which can be solved via numerical optimization methods.
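A small sketch of the reformulation, using an illustrative linear system (my assumption, not from the slide): f_1 = x + y - 3, f_2 = x - y - 1, with root (2, 1). The sum of squares is minimized by plain fixed-step gradient descent.

```python
# Restate root-finding f1 = f2 = 0 as min of S(x,y) = f1^2 + f2^2.
def residuals(x, y):
    return (x + y - 3.0, x - y - 1.0)

def grad_S(x, y):
    f1, f2 = residuals(x, y)
    # dS/dx = 2*f1*df1/dx + 2*f2*df2/dx, and similarly for y
    return (2 * f1 + 2 * f2, 2 * f1 - 2 * f2)

x, y = 0.0, 0.0        # arbitrary starting guess
step = 0.1             # hand-picked step length (assumption)
for _ in range(200):
    gx, gy = grad_S(x, y)
    x, y = x - step * gx, y - step * gy
```

At the minimum S = 0, so the minimizer of S is exactly the root of the system.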

Basic concepts of numerical optimization

All methods search through the space of feasible choices, generating a sequence of guesses that converges to the solution.

Methods can be divided into two groups:
- Comparison methods: compute the objective at several points and pick the one yielding the smallest value.
- Gradient-based methods: use information on the slope and possibly on the curvature (Hessian-based). Fastest, but unstable and most costly to code.

All methods discussed here yield local minima. We need multiple initializations to explore whether a solution is a global minimum.

Grid search is the most primitive method. It does not find an exact solution, but it conveys information about the shape of the objective function. A useful first step.
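A grid search is a one-liner; as a sketch, with an illustrative objective (x - 1.2)^2 that is not from the slide:

```python
def grid_search(f, lo, hi, n):
    """Evaluate f on a uniform grid of n+1 points and return the best grid point."""
    pts = [lo + i * (hi - lo) / n for i in range(n + 1)]
    return min(pts, key=f)

f = lambda x: (x - 1.2) ** 2          # illustrative objective (assumption)
best = grid_search(f, -3.0, 3.0, 60)  # grid spacing 0.1
```

The answer is accurate only up to the grid spacing, but evaluating f on the grid already reveals the rough shape of the objective and gives a starting point for the finer methods below.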

Choosing an optimization algorithm

1. Speed: the major determinant of computation time is the number of function evaluations required by the algorithm to find a local minimum. Gradient-based methods generally involve fewer function evaluations than the other methods.
2. Robustness to non-smoothness: gradient-based methods get stuck if the objective function is not smooth; comparison methods are better behaved.
3. Robustness to starting parameters: gradient-based algorithms started from different initial parameters are more likely to converge to different local minima. Stochastic comparison methods are more likely to find the global minimum.

Bracketing method

Most reliable for one-dimensional problems.

Initialization: find a < b < c such that f(a), f(c) > f(b).
1. Choose d in (a,c) and compute f(d).
2. Choose the new (a,b,c) triple:
   - If d < b and f(d) > f(b), there is a minimum in [d,c]: update (a,b,c) to (d,b,c).
   - If d < b and f(d) < f(b), the minimum is in [a,b]: update (a,b,c) to (a,d,b).
   - If d > b and f(d) > f(b), the minimum is in [a,d]: update (a,b,c) to (a,b,d).
   - If d > b and f(d) < f(b), the minimum is in [b,c]: update (a,b,c) to (b,d,c).
3. Stop if c - a < delta. If not, go back to step 1.

Golden search: a more sophisticated version that features an optimal way to segment the intervals.
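A standard golden-section implementation of this bracketing idea (the test objective (x - 0.7)^2 + 1 is my illustration, not from the slide). New probe points are placed at the golden ratio, so one of the two interior evaluations is reused every iteration:

```python
import math

def golden_section(f, a, c, tol=1e-8):
    """Golden-section search for a minimum of unimodal f on [a, c]."""
    invphi = (math.sqrt(5) - 1) / 2          # 1/phi, about 0.618
    b = c - invphi * (c - a)                  # left interior point
    d = a + invphi * (c - a)                  # right interior point
    fb, fd = f(b), f(d)
    while c - a > tol:
        if fb < fd:                           # minimum bracketed in [a, d]
            c, d, fd = d, b, fb
            b = c - invphi * (c - a)
            fb = f(b)
        else:                                 # minimum bracketed in [b, c]
            a, b, fb = b, d, fd
            d = a + invphi * (c - a)
            fd = f(d)
    return (a + c) / 2

xm = golden_section(lambda x: (x - 0.7) ** 2 + 1.0, 0.0, 2.0)
```

Each iteration shrinks the bracket by the constant factor 0.618 at the cost of a single new function evaluation, which is what makes the golden split optimal.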

Simplex method

A comparison method in n dimensions, also called the Nelder-Mead or polytope method. In Matlab: fminsearch.

1. Choose an initial simplex {x_1, x_2, ..., x_n, x_{n+1}} in R^n.
2. Reorder the simplex vertices in descending order: f(x_i) >= f(x_{i+1}) for all i.
3. Find the smallest i such that f(x_i^R) < f(x_i), where x_i^R is the reflection of x_i. If it exists, replace x_i with x_i^R and go back to step 2. Else, go to step 4.
4. If the width of the current simplex is < epsilon, stop. Else, go to step 5.
5. For i = 1, ..., n set x_i^S = (x_i + x_{i+1})/2 to shrink the simplex. Go back to step 2.

Other simplex methods use other operations, such as expansions and contractions of vertices around the centroid of the simplex.
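The steps above can be sketched as follows. Two assumptions not spelled out on the slide: reflection is taken through the centroid of the other vertices, and the test objective is an illustrative quadratic with minimum at (0.3, 0.4).

```python
def simplex_min(f, simplex, eps=1e-7, max_iter=500):
    """Simplified simplex search (reflection + shrink only), following the
    steps on this slide."""
    simplex = [list(p) for p in simplex]
    n = len(simplex) - 1
    for _ in range(max_iter):
        simplex.sort(key=f, reverse=True)        # step 2: worst vertex first
        improved = False
        for i in range(n + 1):                   # step 3: smallest i that improves
            others = [p for j, p in enumerate(simplex) if j != i]
            cent = [sum(c) / n for c in zip(*others)]
            refl = [2 * c - xi for c, xi in zip(cent, simplex[i])]
            if f(refl) < f(simplex[i]):
                simplex[i] = refl
                improved = True
                break
        if improved:
            continue
        dim = len(simplex[0])                    # step 4: stop if simplex is small
        width = max(max(p[k] for p in simplex) - min(p[k] for p in simplex)
                    for k in range(dim))
        if width < eps:
            break
        for i in range(n):                       # step 5: shrink toward better vertices
            simplex[i] = [(a + b) / 2 for a, b in zip(simplex[i], simplex[i + 1])]
    return min(simplex, key=f)

# Illustrative objective (assumption): quadratic with minimum at (0.3, 0.4).
f = lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.4) ** 2
best = simplex_min(f, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
```

Matlab's fminsearch (and scipy.optimize.minimize with method="Nelder-Mead") implements the richer variant with expansion and contraction steps, which converges faster in practice.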


Simulated annealing: a stochastic comparison method

The best algorithm for finding a global minimum in a problem plagued with lots of local ones... but excruciatingly slow.

1. Draw z from N(0,1) and perturb the initial guess x_0: x_1 = x_0 + z*lambda, with lambda a given step length.
2. If f(x_1) < f(x_0), accept the candidate solution and go to step 3. If f(x_1) >= f(x_0), accept it stochastically, i.e., if

   [f(x_1) - f(x_0)] / f(x_0) < tau*c,   c ~ U[0,1]

   where tau > 0 is a temperature parameter. If not accepted, go back to step 1.
3. Stop if |x_1 - x_0| is less than the tolerance level.

Along the algorithm, decrease tau toward 0 to reduce randomness.
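A minimal sketch of the loop above. Assumptions not in the slide: a 1-D test objective x^2, a geometric cooling factor, a guard in the denominator of the acceptance ratio, and tracking of the best point visited rather than the slide's displacement-based stopping rule.

```python
import random

def anneal(f, x0, lam=0.5, tau0=1.0, n_iter=2000, seed=0):
    """Simulated annealing sketch with the slide's relative-worsening
    acceptance rule: accept a worse point when (f1-f0)/|f0| < tau*c."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    tau = tau0
    for _ in range(n_iter):
        x1 = x + rng.gauss(0.0, 1.0) * lam               # perturb current guess
        f1 = f(x1)
        if f1 < fx or (f1 - fx) / (abs(fx) + 1e-12) < tau * rng.random():
            x, fx = x1, f1                               # accept move
            if fx < fbest:
                best, fbest = x, fx
        tau *= 0.995                                     # cool the temperature
    return best, fbest

best, fbest = anneal(lambda x: x * x, x0=5.0)
```

Early on, with tau large, uphill moves are accepted often enough to escape local minima; as tau falls the algorithm degenerates into a local comparison search.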

Newton/quasi-Newton methods (one dimension)

The idea is similar to root-finding Newton methods. Take a second-order Taylor approximation of f around x_n:

f(x) ≈ f(x_n) + f'(x_n)(x - x_n) + (1/2) f''(x_n)(x - x_n)^2

Choose x = x_{n+1} to minimize the approximate function:

f'(x_n) + f''(x_n)(x_{n+1} - x_n) = 0   =>   x_{n+1} = x_n - f'(x_n)/f''(x_n)

This requires evaluation of the first and second derivatives.

Idea: you move in the direction indicated by the derivative; where the function is very curved you move a little, where it is nearly linear you move a lot.
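The updating rule in code, assuming an illustrative convex objective f(x) = e^x - 2x (minimized at ln 2) with analytic derivatives; neither the function nor the stopping rule is from the slide.

```python
import math

def newton_min(f1, f2, x0, tol=1e-12, max_iter=50):
    """Newton's method for 1-D minimization: x <- x - f'(x)/f''(x).
    f1 and f2 are the first and second derivatives of the objective."""
    x = x0
    for _ in range(max_iter):
        step = f1(x) / f2(x)   # small when curvature f'' is large
        x -= step
        if abs(step) < tol:
            break
    return x

# Illustrative objective (assumption): f(x) = e^x - 2x, minimizer x = ln 2.
x_star = newton_min(lambda x: math.exp(x) - 2, lambda x: math.exp(x), x0=0.0)
```

Note this is exactly root-finding Newton applied to f', which is the isomorphism from the start of the lecture.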

Multivariate version

The updating equation in R^n:

x^{n+1} = x^n - H(x^n)^{-1} ∇f(x^n)

Note that a second-order Taylor approximation around x^n yields:

f(x^{n+1}) ≈ f(x^n) + ∇f(x^n)'(x^{n+1} - x^n) + (1/2)(x^{n+1} - x^n)' H(x^n)(x^{n+1} - x^n)

and, using the updating rule:

f(x^{n+1}) ≈ f(x^n) - (1/2)(x^{n+1} - x^n)' H(x^n)(x^{n+1} - x^n)

Therefore, as long as the Hessian is approximated by a positive definite matrix, the algorithm moves in the right direction.
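A sketch of one multivariate Newton step, assuming an illustrative 2-D quadratic f(x,y) = x^2 + xy + y^2 - x (constant positive definite Hessian, minimizer (2/3, -1/3)); for a quadratic the method converges in a single step.

```python
def newton_step(grad, hess, x):
    """One Newton step x - H^{-1} grad, with a closed-form 2x2 solve."""
    g1, g2 = grad(x)
    (a, b), (c, d) = hess(x)
    det = a * d - b * c
    dx = (d * g1 - b * g2) / det     # H^{-1} g via Cramer's rule
    dy = (a * g2 - c * g1) / det
    return (x[0] - dx, x[1] - dy)

# Illustrative quadratic (assumption): f(x,y) = x^2 + xy + y^2 - x,
# gradient (2x + y - 1, x + 2y), constant Hessian [[2,1],[1,2]].
grad = lambda p: (2 * p[0] + p[1] - 1, p[0] + 2 * p[1])
hess = lambda p: ((2.0, 1.0), (1.0, 2.0))
x = (0.0, 0.0)
for _ in range(3):                   # one step already lands on the minimizer
    x = newton_step(grad, hess, x)
```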

Multivariate version

Computation of the Hessian is time consuming, and nothing guarantees that it is positive definite away from the solution. That is where quasi-Newton methods come in handy.

Simplest option: set the Hessian to I:

x^{n+1} = x^n - ∇f(x^n)

This method is called steepest descent. It always moves in the right direction, but has a slow convergence rate.

The BHHH (Berndt, Hall, Hall, and Hausman) method uses the outer product of the gradient vectors to replace the Hessian.

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) and Davidon-Fletcher-Powell (DFP) algorithms are secant methods that approximate the Hessian with a symmetric positive definite matrix.
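Steepest descent on the same illustrative quadratic used above (f = x^2 + xy + y^2 - x, an assumption): a fixed step length is added for stability, which the slide's bare update x - ∇f(x) omits. Where Newton finishes in one step, steepest descent needs hundreds.

```python
def steepest_descent(grad, x, step=0.1, n_iter=500):
    """Steepest descent: Hessian replaced by (1/step)*I, so x <- x - step*grad(x)."""
    for _ in range(n_iter):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Illustrative quadratic (assumption): gradient of x^2 + xy + y^2 - x.
grad = lambda p: [2 * p[0] + p[1] - 1, p[0] + 2 * p[1]]
sol = steepest_descent(grad, [0.0, 0.0])
```

The error contracts by a constant factor per iteration governed by the Hessian's conditioning, which is the "slow convergence rate" the slide refers to.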

Penalty-function methods

We wish to solve the minimization problem subject to inequality and equality constraints:

min_x f(x)
s.t. g_i(x) <= 0,  i = 1, ..., m
     r_j(x) = 0,   j = 1, ..., k

Construct the quadratic penalty function:

P(x) = (1/2) sum_{i=1}^m [max{0, g_i(x)}]^2 + (1/2) sum_{j=1}^k [r_j(x)]^2

which yields the quadratic-augmented objective function:

h(c,x) ≡ f(x) + cP(x)

Choice of the penalty parameter

Each unsatisfied constraint influences x by assessing a penalty equal to the square of the violation. These influences are summed and multiplied by c, the penalty parameter.

This influence is counterbalanced by f(x): if the penalty term is small relative to f(x), minimization of h(c,x) will almost certainly not result in an x that is feasible for the original problem.

Starting the problem with a very high value of c yields a feasible solution but causes problems:
1. It is almost impossible to tell how large c must be to get a solution without creating numerical difficulties in the computations.
2. The problem is dynamically changing with the relative position of x and with the subset of the constraints violated.
3. Large values of c create enormously steep valleys at the constraint boundaries, and hence formidable convergence difficulties, unless the algorithm starts extremely close to the minimum.

Sequential Unconstrained Minimization Technique (SUMT)

Start with a relatively small value of c and an infeasible point, so that no steep valleys are present in the initial minimization of h(c,x). Next, solve a sequence of problems with monotonically increasing values of c, chosen so that the solution to each new problem is close to the previous one.

1. Initialization step: select a growth parameter eta > 1, a stopping parameter epsilon > 0, and an initial penalty value c_0.
2. At iteration t, minimize h(c_t, x). Call the solution x_t. If convergence is reached, stop; if not, set c_{t+1} = (1+eta)c_t and solve the new minimization problem starting from x_t.

Scale the constraints so that the penalty generated by each one is of about the same magnitude. Choose c_0 so that the penalty term is of the same size as the objective function.
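A SUMT sketch on an illustrative 1-D problem (my assumption): min x^2 s.t. x >= 1, i.e. g(x) = 1 - x <= 0, whose solution is x* = 1. The inner minimization is done by a global ternary search rather than a warm-started local solver, so the "start from x_t" refinement is not needed here.

```python
def ternary_min(h, lo, hi, tol=1e-9):
    """1-D minimization of a unimodal function by ternary search."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if h(m1) < h(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

f = lambda x: x * x
P = lambda x: 0.5 * max(0.0, 1.0 - x) ** 2   # quadratic penalty for g(x) = 1 - x
c, eta = 1.0, 2.0                            # initial penalty and growth parameter
x = 0.0                                      # infeasible starting point
for _ in range(12):
    h = lambda y, c=c: f(y) + c * P(y)       # penalized objective h(c, y)
    x = ternary_min(h, -5.0, 5.0)            # unconstrained inner minimization
    c *= (1 + eta)                           # c_{t+1} = (1 + eta) c_t
```

For this problem the inner minimizer is c/(2+c), which approaches the constrained solution x* = 1 from the infeasible side as c grows, exactly the exterior-path behavior described above.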

Proof of convergence of SUMT

It is easy to prove that {h(c_t, x_t)} is a nondecreasing sequence. Since c_{t+1} >= c_t:

h(c_{t+1}, x_{t+1}) = f(x_{t+1}) + c_{t+1} P(x_{t+1}) >= f(x_{t+1}) + c_t P(x_{t+1})

and, because x_t minimizes h(c_t, ·):

f(x_{t+1}) + c_t P(x_{t+1}) >= f(x_t) + c_t P(x_t) = h(c_t, x_t)

Similarly, because x_{t+1} minimizes h(c_{t+1}, ·):

f(x_t) + c_{t+1} P(x_t) >= f(x_{t+1}) + c_{t+1} P(x_{t+1})

Now combine the two minimality inequalities (add them and cancel the f terms) to prove that P(x_{t+1}) <= P(x_t).

Proof of convergence of SUMT (continued)

Adding the two minimality inequalities

f(x_{t+1}) + c_t P(x_{t+1}) >= f(x_t) + c_t P(x_t)
f(x_t) + c_{t+1} P(x_t) >= f(x_{t+1}) + c_{t+1} P(x_{t+1})

and cancelling the f terms gives

(c_{t+1} - c_t) P(x_t) >= (c_{t+1} - c_t) P(x_{t+1})   =>   P(x_{t+1}) <= P(x_t)

This result, together with the first inequality, yields

f(x_{t+1}) >= f(x_t) + c_t [P(x_t) - P(x_{t+1})] >= f(x_t)

so the objective is nondecreasing, because we are moving from outside toward the feasible region.

Finally, since the solution x* is feasible, P(x*) = 0, and since x_t minimizes h(c_t, ·):

f(x*) = f(x*) + c_t P(x*) >= f(x_t) + c_t P(x_t) >= f(x_t)

Thus P(x_t) decreases toward P(x*) = 0 and f(x_t) increases toward f(x*).

Barrier methods

The penalty method always searches outside the feasible region. An alternative that always searches inside is the barrier method, based on the augmented objective function

h(c,x) ≡ f(x) - c sum_{i=1}^m log[-g_i(x)]

where you start from a large value of c. At each subsequent iteration, the value of c is monotonically decreased and x approaches the boundaries.

Penalty (exterior) methods are preferable here because barrier (interior) methods cannot handle equality constraints. The issue with the penalty method: f needs to be defined smoothly outside the feasible region.
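The same illustrative problem as in the SUMT sketch (an assumption: min x^2 s.t. g(x) = 1 - x <= 0), now approached from inside with a log barrier. The barrier objective is defined only for x > 1, so the search interval must stay strictly feasible.

```python
import math

def ternary_min(h, lo, hi, tol=1e-10):
    """1-D minimization of a unimodal function by ternary search."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if h(m1) < h(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

f = lambda x: x * x
c = 10.0                                     # start with a large barrier weight
x = 2.0                                      # strictly feasible starting point
for _ in range(30):
    # h(c, y) = f(y) - c * log(-g(y)) with g(y) = 1 - y, i.e. log(y - 1)
    h = lambda y, c=c: f(y) - c * math.log(y - 1.0)
    x = ternary_min(h, 1.0 + 1e-12, 10.0)    # search only inside the feasible set
    c *= 0.5                                 # decrease c toward 0
```

Mirroring the penalty case, the iterates approach the solution x* = 1 from the feasible side (x > 1) as c shrinks.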

Karush-Kuhn-Tucker methods

These methods directly solve the KKT system of (nonlinear) equations associated with the problem by a Newton or quasi-Newton approach.

Example with equality constraints: min_x f(x) s.t. r(x) = 0, with Lagrangian

L(x,lambda) = f(x) + lambda' r(x)

The system of KKT conditions is:

∇_x L(x,lambda) = 0   =>   ∇f(x) + ∇r(x)' lambda = 0
∇_lambda L(x,lambda) = 0   =>   r(x) = 0

We have a nonlinear system of (n+k) equations in (n+k) unknowns that can be solved as we studied.
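A sketch on an illustrative problem (an assumption, not from the slide): min x^2 + y^2 s.t. x + y = 1. With a quadratic objective and a linear constraint the KKT system is linear, so a single Newton step, i.e. one linear solve, finds the solution x = y = 0.5, lambda = -1 exactly.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))   # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):                         # back-substitution
        z[k] = (M[k][n] - sum(M[k][c] * z[c] for c in range(k + 1, n))) / M[k][k]
    return z

# KKT system for min x^2 + y^2 s.t. x + y = 1:
#   2x + lambda = 0,  2y + lambda = 0,  x + y = 1
A = [[2.0, 0.0, 1.0],
     [0.0, 2.0, 1.0],
     [1.0, 1.0, 0.0]]
b = [0.0, 0.0, 1.0]
x, y, lam = solve_linear(A, b)
```

For a genuinely nonlinear f or r, this same linear solve becomes the inner step of Newton's method on the KKT system, iterated until the residuals vanish.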