ECS550NFB Introduction to Numerical Methods using Matlab Day 2


ECS550NFB Introduction to Numerical Methods using Matlab Day 2 Lukas Laffers lukas.laffers@umb.sk Department of Mathematics, University of Matej Bel June 9, 2015

Today Root-finding: find x that solves f(x) = 0. Optimization: find x that optimizes (minimizes/maximizes) f(x). Constrained vs. unconstrained optimization.

Root-finding Given a univariate function on an interval, we want to find x that solves f(x) = 0. The bisection method (next slide) is simple, slow, and robust. In MATLAB: fzero, and fsolve - requires the Optimization Toolbox.

Bisection Image source: wiki

Root-finding: Newton Method Based on a Taylor approximation: f(x + h) ≈ f(x) + h f'(x). Setting this to zero gives h ≈ -f(x)/f'(x), and the iteration x_{k+1} = x_k - f(x_k)/f'(x_k). The key is the linear approximation: the Newton-Raphson method is only as good as the linear approximation. Multiple dimensions: f(x + h) ≈ f(x) + ∇f(x)^T h, so h ≈ -(∇f(x))^{-1} f(x) and x_{k+1} = x_k - (∇f(x_k))^{-1} f(x_k).
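
A minimal MATLAB sketch of the univariate Newton iteration above; the function name newton_root and the handles f, df, tolerance tol and iteration cap maxit are illustrative choices, not part of the slides.
% Newton's method for f(x) = 0; f and df are function handles.
function x = newton_root(f, df, x0, tol, maxit)
    x = x0;
    for k = 1:maxit
        fx = f(x);
        if abs(fx) < tol
            return;                 % |f(x)| small enough: accept x as root
        end
        x = x - fx / df(x);         % x_{k+1} = x_k - f(x_k)/f'(x_k)
    end
end
% Example: newton_root(@(x) x/2 - sin(x), @(x) 1/2 - cos(x), 2, 1e-10, 50)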

Newton Method Image source: wiki

Newton Method Experiment with different formulations of the same equation, e.g. x/2 - sin x = 0 vs. 2/x - 1/sin x = 0; this may help us get rid of a flat part. If x_k fails to settle down, use a different starting point. Verification of a bracketed root: f(x - ε) < 0 < f(x + ε).

Secant Method In Newton's method we had to calculate the derivative; this may be computationally expensive. The secant method approximates the derivative from the last two iterates: (0 - f(x_1))/(x_2 - x_1) = (f(x_0) - f(x_1))/(x_0 - x_1), which gives the update x_{k+1} = x_k - f(x_k) (x_k - x_{k-1})/(f(x_k) - f(x_{k-1})).
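
A minimal MATLAB sketch of the secant iteration; the name secant_root and its argument list are illustrative.
% Secant method: approximate f' by a finite difference of the last two
% iterates, so no derivative is needed.
function x1 = secant_root(f, x0, x1, tol, maxit)
    for k = 1:maxit
        f0 = f(x0);  f1 = f(x1);
        if abs(f1) < tol, return; end
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0);   % secant update
        x0 = x1;  x1 = x2;                      % keep the two latest points
    end
end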

Secant Method Image source: mathworld.wolfram.com

Optimization - problem classification Local vs. Global. Local - most numerical methods concern local optimization; the nature of the problem may guarantee uniqueness. Global - usually stochastic methods; there is a chance that we jump off local extrema. Constrained vs. Unconstrained.

Optimization Understand the problem! There exists no method that is superior in all situations. How many variables do we optimize over? What is the shape of the objective function (e.g. concave/convex)? Is the constraint set convex? How costly is it to evaluate the objective function? How costly is it to evaluate the gradient of the objective function? Trade-off: speed vs. generality.

What is available
Without any toolbox: fminbnd - minimum of a single-variable function on a fixed interval; fminsearch - unconstrained derivative-free minimization; fzero - find a root of a nonlinear function.
Optimization Toolbox: constrained minimization, linear and quadratic programming, mixed-integer programming.
Global Optimization Toolbox: global optimization methods (covered later).

Formulating the Problem in MATLAB: structure problem
min_x f(x)  s.t.  A·x ≤ b,  Aeq·x = beq,  lb ≤ x ≤ ub
f - function to be minimized
A and b - define the inequality restrictions
Aeq and beq - define the equality restrictions
lb and ub - define bounds on the variables
x0 - starting point
solver - e.g. linprog, fminunc
options
In MATLAB, run: solver(problem)
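
A hedged illustration of the structure-based call for linprog; the field names follow the MATLAB documentation, and the small LP data here is made up.
% Illustrative problem structure; solver(problem) reads the fields below.
problem.f       = [-1; -2];            % minimize -x1 - 2x2
problem.Aineq   = [1 1];               % x1 + x2 <= 1
problem.bineq   = 1;
problem.lb      = [0; 0];              % x >= 0
problem.solver  = 'linprog';
problem.options = optimset('linprog'); % default options for this solver
[x, fval, exitflag] = linprog(problem);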

Optimization in MATLAB
structure output:
exitflag - reason why the solver terminated (1, 0, -1, -2, -3, ...)
lambda - Lagrange multipliers at the solution (lower, upper, ineqlin, eqlin)
output - information about the optimization (iterations, constrviolation, firstorderopt, ...)
structure options:
Algorithm - algorithm to be used
MaxIter - maximum number of iterations
TolFun - termination tolerance for the function value

Unconstrained optimization
Golden Section Search - univariate function on an interval
Newton Method - multivariate function; gradient and Hessian supplied
Quasi-Newton Methods - multivariate function; gradient supplied, Hessian approximated
Nelder-Mead Method - multivariate function; only needs function values
Trust-region Methods - multivariate function; useful for large-scale problems

What is a good method? fast - in terms of computing speed, usually measured in the number of function evaluations, or more generally by the rate of convergence ||x_{k+1} - x*|| / ||x_k - x*||. reliable - guarantees success (assumptions are needed for this!). robust - behaves well under different scenarios. efficient - is the fastest (for a certain class of problems, under certain assumptions).

Convex and non-convex sets and functions Convex sets and functions. Image source: Brandimarte

Unconstrained optimization - conditions for smooth functions
First-order necessary conditions: x* is a local min of f, f is continuously differentiable around x*, then ∇f(x*) = 0.
Second-order necessary conditions: x* is a local min of f, ∇²f is continuous around x*, then ∇f(x*) = 0 and ∇²f(x*) is positive semidefinite.
Second-order sufficient conditions: ∇²f is continuous around x*, ∇f(x*) = 0 and ∇²f(x*) is positive definite, then x* is a strict local min of f.
For a convex/concave differentiable function, a stationary point is a global minimizer/maximizer.

Unconstrained optimization - strategies Line search: set a direction p and make a step of size α solving min_{α>0} f(x + αp). Trust region: approximate f with some model function m in a region around x.

Unconstrained optimization - strategies Image source: Nocedal and Wright

Step size How to choose the step size α?
Fixed step size - reduce the step size if no optimum is found.
Wolfe conditions (c_1 = 10^{-4}, c_2 ∈ (c_1, 1)):
Sufficient decrease: f(x_k + α_k h_k) ≤ f(x_k) + c_1 α_k ∇f(x_k)^T h_k
Curvature: ∇f(x_k + α_k h_k)^T h_k ≥ c_2 ∇f(x_k)^T h_k (the step is not too small)
Backtracking line search - adaptively reduce the step size (from α to βα, for some β ∈ (0.1, 0.8)) until the step is good enough according to some criterion (e.g. sufficient decrease).
Exact line search - argmin_{α≥0} f(x - α ∇f(x)) - usually not efficient.
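
A minimal MATLAB sketch of backtracking with the sufficient-decrease (Armijo) condition; the name backtrack and the argument list are illustrative choices.
% Shrink alpha until f(x + alpha*h) satisfies the sufficient-decrease test.
% f and g are handles for the objective and its gradient; h is a descent
% direction (so g(x)'*h < 0).
function alpha = backtrack(f, g, x, h, alpha0, c1, beta)
    alpha = alpha0;
    slope = g(x)' * h;                                % directional derivative
    while f(x + alpha*h) > f(x) + c1 * alpha * slope
        alpha = beta * alpha;                         % reduce the step
    end
end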

Wolfe conditions Image source: Nocedal and Wright

Golden Section Search Finds an extremum of a unimodal function on an interval. We choose the points at which we evaluate f in a clever way: we make sure that the interval containing the extremum shrinks at the best possible rate, I_n / I_{n+1} = (1 + √5)/2 ≈ 1.618 (the golden ratio).
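
A minimal MATLAB sketch for a minimum; the helper name golden_section is illustrative, and this version re-evaluates f at both interior points instead of reusing one evaluation per step.
function xmin = golden_section(f, a, b, tol)
    r = (sqrt(5) - 1) / 2;                 % 1/phi, about 0.618
    while (b - a) > tol
        c = b - r*(b - a);                 % interior evaluation points
        d = a + r*(b - a);
        if f(c) < f(d)
            b = d;                         % minimum lies in [a, d]
        else
            a = c;                         % minimum lies in [c, b]
        end
    end
    xmin = (a + b) / 2;
end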

Golden Section Search Image source: wiki

Steepest descent Let us choose the natural direction going down the hill, -∇f(x): x_{k+1} = x_k - α ∇f(x_k). When do we stop the algorithm? When ||∇f(x)|| < ε.
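
A minimal MATLAB sketch with a fixed step size; g is a gradient handle and the name steepest_descent is illustrative.
function x = steepest_descent(g, x0, alpha, tol, maxit)
    x = x0;
    for k = 1:maxit
        grad = g(x);
        if norm(grad) < tol, return; end   % stop when the gradient is small
        x = x - alpha * grad;              % step in the direction -grad f(x)
    end
end
% Example for f(x) = x1^2 + 5*x2^2:
% steepest_descent(@(x) [2*x(1); 10*x(2)], [3; 1], 0.1, 1e-6, 1000)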

Steepest descent Image source: Nocedal and Wright

Steepest descent - scaling Image source: Nocedal and Wright

Newton Method We require information about the gradient and Hessian (this can be very costly). Quadratic model: f(x + h) ≈ f(x) + ∇f(x)^T h + ½ h^T ∇²f(x) h. Minimizing over the direction h, we get h = -(∇²f(x))^{-1} ∇f(x). This is basically root-finding applied to the first derivative. Computation of (∇²f(x))^{-1} is costly. Convergence is not guaranteed (the iteration may get stuck in an infinite cycle).
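
A minimal MATLAB sketch of the pure Newton step; in practice one solves the Newton system rather than forming the inverse Hessian. The handles g and H and the name newton_min are illustrative.
function x = newton_min(g, H, x0, tol, maxit)
    x = x0;
    for k = 1:maxit
        grad = g(x);
        if norm(grad) < tol, return; end
        h = -H(x) \ grad;       % Newton direction from Hessian * h = -grad
        x = x + h;              % full Newton step (no globalization here)
    end
end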

Quasi-Newton Methods We avoid the calculation of the Hessian matrix and simplify the computation of the search direction. Start with B_0, some positive definite matrix, and update it: B_0 → B_1 → ... Given gradient information, we iteratively update our approximation of the Hessian matrix. Our Hessian approximation at step k, B_k:
is updated by a low-rank correction
is symmetric
is positive definite
is chosen so that the quadratic approximation of f matches the gradient of f at x_{k+1} and x_k
is not very different from B_{k-1} (in a certain sense, e.g. the Frobenius norm), so that B_k does not change wildly
makes it easy to find B_{k+1}^{-1} (using the Sherman-Morrison formula), which simplifies the calculation of the optimal step.

Quasi-Newton Methods There are different ways to update the approximation of the Hessian matrix. DFP (Davidon-Fletcher-Powell). BFGS (Broyden-Fletcher-Goldfarb-Shanno), which supersedes DFP: instead of imposing conditions on B_k, we impose conditions on H_k = B_k^{-1}. L-BFGS, L-DFP - limited-memory versions.

Algorithm Overview We need: H_0, x_0.
Step 1: Check whether the solution is good enough; if ||∇f(x_k)|| > ε, continue.
Step 2: Compute the search direction -H_k ∇f(x_k).
Step 3: Compute the size of the optimal step in this direction (line search - a one-dimensional optimization) and compute x_{k+1}.
Step 4: Update the inverse Hessian approximation H_{k+1}; set k = k + 1.
Step 5: Go to Step 1.
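
A minimal MATLAB sketch of the loop above with the BFGS update of the inverse Hessian, using the backtracking helper sketched earlier and H_0 = I; everything here is an illustrative sketch, not MATLAB's own implementation (fminunc).
function x = bfgs_min(f, g, x0, tol, maxit)
    n = numel(x0);  x = x0;
    H = eye(n);  grad = g(x);                          % H_0, x_0
    for k = 1:maxit
        if norm(grad) < tol, return; end               % Step 1: convergence test
        p = -H * grad;                                 % Step 2: search direction
        alpha = backtrack(f, g, x, p, 1, 1e-4, 0.5);   % Step 3: line search
        x_new = x + alpha * p;
        g_new = g(x_new);
        s = x_new - x;  y = g_new - grad;              % Step 4: BFGS update of H
        rho = 1 / (y' * s);
        H = (eye(n) - rho*(s*y')) * H * (eye(n) - rho*(y*s')) + rho*(s*s');
        x = x_new;  grad = g_new;
    end
end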

Quasi-Newton Methods More efficient than Newton's method in situations when evaluating the Hessian is costly (O(n²) vs. O(n³)). We need the function to have a quadratic Taylor approximation near the optimum. Superlinear convergence rate. We can use the scaling H_0 = [(∇f(x_{k+1}) - ∇f(x_k))^T (x_{k+1} - x_k)] / [(∇f(x_{k+1}) - ∇f(x_k))^T (∇f(x_{k+1}) - ∇f(x_k))] · I. For a quadratic function, n steps of a quasi-Newton method correspond to one Newton step.

Linear Programming
min_x c^T x  s.t.  A·x ≤ b,  Aeq·x = beq,  lb ≤ x ≤ ub
[x, fval, exit] = linprog(c, A, b, Aeq, beq, lb, ub, x0, options)

Linear Programming Classical examples: the travelling salesman problem, the vehicle routing problem, cost minimization, manufacturing and transportation. More recent economics applications: tests for rationality of consumption behaviour (Cherchye, De Rock and Vermeulen); identification and shape restrictions in nonparametric instrumental variables estimation (Freyberger and Horowitz).

Linear Programming - Example
min_x x_1 + 2x_2 - 3x_3
s.t. -2x_1 + x_2 + 3x_3 ≤ 1
-x_1 + 2x_2 - 0.5x_3 ≤ 2
x_1 + x_2 + x_3 = 1
0 ≤ x_1, x_2, x_3 ≤ 1
c = [1 2 -3]; A = [-2 1 3; -1 2 -0.5]; b = [1; 2];
Aeq = [1 1 1]; beq = 1; lb = [0 0 0]; ub = [1 1 1];
options = optimset('linprog');
[x,fval,exit] = linprog(c,A,b,Aeq,beq,lb,ub,[],options)

Linear Programming - Algorithms interior-point dual-simplex active-set simplex

Linear Programming - Simplex algorithm Image source: wiki

Integer Programming
min_x c^T x  s.t.  x(intcon) are integers,  A·x ≤ b,  Aeq·x = beq,  lb ≤ x ≤ ub
[x, fval, exit] = intlinprog(c, intcon, A, b, Aeq, beq, lb, ub, x0, options)

Integer Programming - Example
min_x x_1 + 2x_2 - 3x_3 + x_4
s.t. x_4 is an integer
-2x_1 + x_2 + 3x_3 - 2x_4 ≤ 1
-x_1 + 2x_2 - 0.5x_3 - 4x_4 ≤ 2
x_1 + x_2 + x_3 + x_4 = 1
0 ≤ x_1, x_2, x_3, x_4 ≤ 1
c = [1 2 -3 1]; A = [-2 1 3 -2; -1 2 -0.5 -4]; b = [1; 2];
Aeq = [1 1 1 1]; beq = 1; lb = [0 0 0 0]; ub = [1 1 1 1]; intcon = 4;
[x,fval,exit] = intlinprog(c,intcon,A,b,Aeq,beq,lb,ub)
http://www.mathworks.com/help/optim/ug/tuning-integerlinear-programming.html

Integer Programming - Branch and Bound Example from http://www.columbia.edu/~cs2035/courses/ieor4600.s07/bb-lecb.pdf
max_x -x_1 + 4x_2
s.t. x_1, x_2 are integers
-10x_1 + 20x_2 ≤ 22
5x_1 + 10x_2 ≤ 49
x_1 ≤ 5
0 ≤ x_1, x_2
Optimal solution of the LP relaxation: (3.8, 3), with Z = 8.2.

Integer Programming - Branch and Bound Branch on x_1: x_1 ≤ 3 or x_1 ≥ 4.
Branch x_1 ≤ 3: x = (3, 2.6), Z = 7.4.
Branch x_1 ≥ 4: x = (4, 2.9), Z = 7.6.

Branch x_1 ≥ 4: x = (4, 2.9), Z = 7.6. Branch further on x_2:
Branch x_1 ≥ 4 and x_2 ≤ 2: x = (4, 2), Z = 4.
Branch x_1 ≥ 4 and x_2 ≥ 3: NO SOLUTION.

Branch x_1 ≤ 3: x = (3, 2.6), Z = 7.4. Branch further on x_2:
Branch x_1 ≤ 3 and x_2 ≤ 2: x = (1.8, 2), Z = 6.2.
Branch x_1 ≤ 3 and x_2 ≥ 3: NO SOLUTION.

Branch x_1 ≤ 3 and x_2 ≤ 2: x = (1.8, 2), Z = 6.2. Branch further on x_1:
Branch x_1 ≤ 3, x_2 ≤ 2 and x_1 ≤ 1: x = (1, 1.6), Z = 5.4.
Branch x_1 ≤ 3, x_2 ≤ 2 and x_1 ≥ 2: x = (2, 2), Z = 6 - integer feasible, and no remaining node has a better bound, so this is the optimal integer solution.

Quadratic Programming
min_x ½ x^T H x + f^T x  s.t.  A·x ≤ b,  Aeq·x = beq,  lb ≤ x ≤ ub
[x, fval, exit] = quadprog(H, f, A, b, Aeq, beq, lb, ub, x0, options)

Quadratic Programming - Example
min_x x_1² - x_1 x_2 + x_2² + 2x_1 + 3x_2
s.t. -2x_1 + x_2 ≤ -1
x_1 - 2x_2 ≤ 3
x_1 + 2x_2 = 4
H = [2 -1; -1 2]; f = [2; 3];
A = [-2 1; 1 -2]; b = [-1; 3];
Aeq = [1 2]; beq = 4; lb = []; ub = [];
[x,fval,exit] = quadprog(H,f,A,b,Aeq,beq,lb,ub)

Global Optimization methods Image source: www.mathworks.com

Global Optimization methods In MATLAB's Global Optimization Toolbox:
GlobalSearch and MultiStart solvers - generate multiple starting points, filter non-promising points.
Genetic Algorithm solver - a population of points; we simulate the evolution of the population. Phases: Selection - we select good parents; Crossover - they produce children; Mutation - induce randomness, so we can jump off local optima.
Pattern Search solver - direct search, no derivatives needed.
Simulated Annealing - a probabilistic search algorithm that mimics the physical process of annealing; we slowly reduce the system temperature to minimize the system energy.
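
Illustrative calls, assuming the Global Optimization Toolbox is installed; the Rastrigin-style objective is just a stand-in test function with many local minima.
fun = @(x) 10*numel(x) + sum(x.^2 - 10*cos(2*pi*x));   % many local minima
x_ga = ga(fun, 2);                  % genetic algorithm with 2 variables
x_ps = patternsearch(fun, [1 1]);   % pattern search from the point [1 1]
x_sa = simulannealbnd(fun, [1 1]);  % simulated annealing from [1 1]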

Choice of Algorithm Identify the objective function: linear, quadratic, smooth nonlinear, nonsmooth. Identify the types of constraints: none, bound, linear, smooth, discrete. http://www.mathworks.com/help/optim/ug/choosing-a-solver.html

Optimization Cookbook - practical advice What to do if...? Solver did not succeed Not sure if the solver succeeded Solver succeeded

Solver did not succeed Try to find out what is going on: set Display to 'iter'. Does the objective function, maximum constraint violation, first-order optimality criterion, or trust-region radius decrease?
increase MaxIter or MaxFunEvals
relax the tolerances
change the initial point
center and scale your problem
if the problem is unbounded, check the formulation of the problem
start from a simpler problem, iteratively add restrictions and use the optimal solutions as starting points for the more complex problem
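
A hedged example of turning on iteration display and loosening the limits; the toy objective (Rosenbrock) and the numbers are made up for illustration.
options = optimset('Display', 'iter', 'MaxIter', 2000, ...
                   'MaxFunEvals', 1e5, 'TolFun', 1e-8, 'TolX', 1e-8);
[x, fval, exitflag, output] = ...
    fminsearch(@(x) (x(1) - 1)^2 + 100*(x(2) - x(1)^2)^2, [-1 2], options);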

Not sure whether the solver succeeded
The first-order optimality condition is not satisfied.
Final point = initial point: change the initial point to some nearby points.
"Local minimum possible": for a non-smooth function this may be the best we can possibly get; set the optimum as a new starting point and re-run the optimization; try a different algorithm; play with the tolerances.
Takes too long? Use a sparse solver (uses less memory), use parallel computing.

Solver succeeds - Robustness Local minimum vs. global minimum? Use a grid of initial points. Check whether the formulation of the problem in MATLAB corresponds to the problem at hand: try the objective function at a few points, check the sign if maximizing, check the signs of the inequalities. Check with the Global Optimization Toolbox.

Lagrange multipliers Lagrange multipliers tell you how important particular restrictions are at the optimal solution. Use this to your advantage: which restrictions are important, and which cause problems for your solver?

Tolerances and Stopping Criteria TolX TolFun TolCon MaxIter MaxFunEvals many others for different algorithms (e.g. Interior-point)

For Tomorrow Install Dynare http://www.dynare.org/documentation-and-support/quick-start

Literature
Miranda, M., and P. Fackler. Applied Computational Economics. (2001).
Wright, Stephen J., and Jorge Nocedal. Numerical Optimization. New York: Springer, 1999.
Dennis, J. E., Jr., and Jorge J. Moré. Quasi-Newton Methods, Motivation and Theory. SIAM Review, Vol. 19, No. 1 (Jan. 1977), 46-89.
Nocedal, J. (1980). Updating Quasi-Newton Matrices with Limited Storage. Mathematics of Computation 35 (151): 773-782.
Branch and Bound method explained on an example: http://www.columbia.edu/~cs2035/courses/ieor4600.s07/bb-lecb.pdf
Branch and Bound method - example with animals: http://ocw.mit.edu/courses/sloan-school-of-management/15-053-optimization-methods-in-management-science-spring-2013/tutorials/MIT15_053S13_tut10.pdf
http://www.mathworks.com/help/optim/ug/choosing-a-solver.html