ECS550NFB Introduction to Numerical Methods using Matlab Day 2


ECS550NFB Introduction to Numerical Methods using Matlab Day 2 Lukas Laffers lukas.laffers@umb.sk Department of Mathematics, University of Matej Bel June 9, 2015

Today Root-finding: find x that solves f(x) = 0. Optimization: find x that optimizes (minimizes/maximizes) f(x). Constrained vs. unconstrained optimization.

Root-finding Given a univariate function on an interval, we want to find x that solves f(x) = 0. The bisection method (next slide) is simple, slow, and robust. In MATLAB: fzero, and fsolve - requires the Optimization Toolbox.

Bisection Image source: wiki

Root-finding: Newton Method Based on a Taylor approximation: f(x + h) ≈ f(x) + h f'(x). Setting this to zero gives h ≈ -f(x)/f'(x), and the iteration x_{k+1} = x_k - f(x_k)/f'(x_k). The key is the linear approximation: the Newton-Raphson method is only as good as the linear approximation. Multiple dimensions: f(x + h) ≈ f(x) + ∇f(x)^T h, so h ≈ -(∇f(x))^{-1} f(x) and x_{k+1} = x_k - (∇f(x_k))^{-1} f(x_k).
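
A minimal MATLAB sketch of the univariate Newton iteration above; the function name newton_root and the handles f, df, tolerance tol and iteration cap maxit are illustrative choices, not part of the slides.
% Newton's method for f(x) = 0; f and df are function handles.
function x = newton_root(f, df, x0, tol, maxit)
    x = x0;
    for k = 1:maxit
        fx = f(x);
        if abs(fx) < tol
            return;                 % |f(x)| small enough: accept x as root
        end
        x = x - fx / df(x);         % x_{k+1} = x_k - f(x_k)/f'(x_k)
    end
end
% Example: newton_root(@(x) x/2 - sin(x), @(x) 1/2 - cos(x), 2, 1e-10, 50)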

Newton Method Image source: wiki

Newton Method Experiment with different formulations of the same equation, e.g. x/2 - sin x = 0 vs. 2/x - 1/sin x = 0; this may help us get rid of a flat part. If x_k fails to settle down, use a different starting point. Verification of a bracketed root: f(x - ε) < 0 < f(x + ε).

Secant Method In Newton's method we had to calculate the derivative; this may be computationally expensive. The secant method approximates the derivative from the last two iterates: (0 - f(x_1))/(x_2 - x_1) = (f(x_0) - f(x_1))/(x_0 - x_1), which gives the update x_{k+1} = x_k - f(x_k) (x_k - x_{k-1})/(f(x_k) - f(x_{k-1})).
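
A minimal MATLAB sketch of the secant iteration; the name secant_root and its argument list are illustrative.
% Secant method: approximate f' by a finite difference of the last two
% iterates, so no derivative is needed.
function x1 = secant_root(f, x0, x1, tol, maxit)
    for k = 1:maxit
        f0 = f(x0);  f1 = f(x1);
        if abs(f1) < tol, return; end
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0);   % secant update
        x0 = x1;  x1 = x2;                      % keep the two latest points
    end
end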

Secant Method Image source: mathworld.wolfram.com

Optimization - problem classification Local vs. Global. Local - most numerical methods concern local optimization; the nature of the problem may guarantee uniqueness. Global - usually stochastic methods; there is a chance that we jump off local extrema. Constrained vs. Unconstrained.

Optimization Understand the problem! There exists no method that is superior in all situations. How many variables do we optimize over? What is the shape of the objective function (e.g. concave/convex)? Is the constraint set convex? How costly is it to evaluate the objective function? How costly is it to evaluate the gradient of the objective function? Trade-off: speed vs. generality.

What is available
Without any toolbox: fminbnd - minimum of a single-variable function on a fixed interval; fminsearch - unconstrained derivative-free minimization; fzero - find a root of a nonlinear function.
Optimization Toolbox: constrained minimization, linear and quadratic programming, mixed-integer programming.
Global Optimization Toolbox: global optimization methods (covered later).

Formulating the Problem in MATLAB: structure problem
min_x f(x)  s.t.  A·x ≤ b,  Aeq·x = beq,  lb ≤ x ≤ ub
f - function to be minimized
A and b - define the inequality restrictions
Aeq and beq - define the equality restrictions
lb and ub - define bounds on the variables
x0 - starting point
solver - e.g. linprog, fminunc
options
In MATLAB, run: solver(problem)
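
A hedged illustration of the structure-based call for linprog; the field names follow the MATLAB documentation, and the small LP data here is made up.
% Illustrative problem structure; solver(problem) reads the fields below.
problem.f       = [-1; -2];            % minimize -x1 - 2x2
problem.Aineq   = [1 1];               % x1 + x2 <= 1
problem.bineq   = 1;
problem.lb      = [0; 0];              % x >= 0
problem.solver  = 'linprog';
problem.options = optimset('linprog'); % default options for this solver
[x, fval, exitflag] = linprog(problem);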

Optimization in MATLAB
structure output:
exitflag - reason why the solver terminated (1, 0, -1, -2, -3, ...)
lambda - Lagrange multipliers at the solution (lower, upper, ineqlin, eqlin)
output - information about the optimization (iterations, constrviolation, firstorderopt, ...)
structure options:
Algorithm - algorithm to be used
MaxIter - maximum number of iterations
TolFun - termination tolerance for the function value

Unconstrained optimization
Golden Section Search - univariate function on an interval
Newton Method - multivariate function; gradient and Hessian supplied
Quasi-Newton Methods - multivariate function; gradient supplied, Hessian approximated
Nelder-Mead Method - multivariate function; only needs function values
Trust-region Methods - multivariate function; useful for large-scale problems

What is a good method? fast - in terms of computing speed, usually measured in the number of function evaluations, or more generally by the rate of convergence ||x_{k+1} - x*|| / ||x_k - x*||. reliable - guarantees success (assumptions are needed for this!). robust - behaves well under different scenarios. efficient - is the fastest (for a certain class of problems, under certain assumptions).

Convex and non-convex sets and functions Convex sets and functions. Image source: Brandimarte

Unconstrained optimization - conditions for smooth functions
First-order necessary conditions: x* is a local min of f, f is continuously differentiable around x*, then ∇f(x*) = 0.
Second-order necessary conditions: x* is a local min of f, ∇²f is continuous around x*, then ∇f(x*) = 0 and ∇²f(x*) is positive semidefinite.
Second-order sufficient conditions: ∇²f is continuous around x*, ∇f(x*) = 0 and ∇²f(x*) is positive definite, then x* is a strict local min of f.
For a convex/concave differentiable function, a stationary point is a global minimizer/maximizer.

Unconstrained optimization - strategies Line search: set a direction p and make a step of size α solving min_{α>0} f(x + αp). Trust region: approximate f with some model function m in a region around x.

Unconstrained optimization - strategies Image source: Nocedal and Wright

Step size How to choose the step size α?
Fixed step size - reduce the step size if no optimum is found.
Wolfe conditions (c_1 = 10^{-4}, c_2 ∈ (c_1, 1)):
Sufficient decrease: f(x_k + α_k h_k) ≤ f(x_k) + c_1 α_k ∇f(x_k)^T h_k
Curvature: ∇f(x_k + α_k h_k)^T h_k ≥ c_2 ∇f(x_k)^T h_k (the step is not too small)
Backtracking line search - adaptively reduce the step size (from α to βα, for some β ∈ (0.1, 0.8)) until the step is good enough according to some criterion (e.g. sufficient decrease).
Exact line search - argmin_{α≥0} f(x - α ∇f(x)) - usually not efficient.
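
A minimal MATLAB sketch of backtracking with the sufficient-decrease (Armijo) condition; the name backtrack and the argument list are illustrative choices.
% Shrink alpha until f(x + alpha*h) satisfies the sufficient-decrease test.
% f and g are handles for the objective and its gradient; h is a descent
% direction (so g(x)'*h < 0).
function alpha = backtrack(f, g, x, h, alpha0, c1, beta)
    alpha = alpha0;
    slope = g(x)' * h;                                % directional derivative
    while f(x + alpha*h) > f(x) + c1 * alpha * slope
        alpha = beta * alpha;                         % reduce the step
    end
end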

Wolfe conditions Image source: Nocedal and Wright

Golden Section Search Finds an extremum of a unimodal function on an interval. We choose the points at which we evaluate f in a clever way: we make sure that the interval containing the extremum shrinks at the best possible rate, I_n / I_{n+1} = (1 + √5)/2 ≈ 1.618 (the golden ratio).
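
A minimal MATLAB sketch for a minimum; the helper name golden_section is illustrative, and this version re-evaluates f at both interior points instead of reusing one evaluation per step.
function xmin = golden_section(f, a, b, tol)
    r = (sqrt(5) - 1) / 2;                 % 1/phi, about 0.618
    while (b - a) > tol
        c = b - r*(b - a);                 % interior evaluation points
        d = a + r*(b - a);
        if f(c) < f(d)
            b = d;                         % minimum lies in [a, d]
        else
            a = c;                         % minimum lies in [c, b]
        end
    end
    xmin = (a + b) / 2;
end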

Golden Section Search Image source: wiki

Steepest descent Let us choose the natural direction going down the hill, -∇f(x): x_{k+1} = x_k - α ∇f(x_k). When do we stop the algorithm? When ||∇f(x)|| < ε.
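
A minimal MATLAB sketch with a fixed step size; g is a gradient handle and the name steepest_descent is illustrative.
function x = steepest_descent(g, x0, alpha, tol, maxit)
    x = x0;
    for k = 1:maxit
        grad = g(x);
        if norm(grad) < tol, return; end   % stop when the gradient is small
        x = x - alpha * grad;              % step in the direction -grad f(x)
    end
end
% Example for f(x) = x1^2 + 5*x2^2:
% steepest_descent(@(x) [2*x(1); 10*x(2)], [3; 1], 0.1, 1e-6, 1000)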

Steepest descent Image source: Nocedal and Wright

Steepest descent - scaling Image source: Nocedal and Wright

Newton Method We require information about the gradient and Hessian (this can be very costly). Quadratic model: f(x + h) ≈ f(x) + ∇f(x)^T h + ½ h^T ∇²f(x) h. Minimizing over the direction h, we get h = -(∇²f(x))^{-1} ∇f(x). This is basically root-finding applied to the first derivative. Computation of (∇²f(x))^{-1} is costly. Convergence is not guaranteed (the iteration may get stuck in an infinite cycle).
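
A minimal MATLAB sketch of the pure Newton step; in practice one solves the Newton system rather than forming the inverse Hessian. The handles g and H and the name newton_min are illustrative.
function x = newton_min(g, H, x0, tol, maxit)
    x = x0;
    for k = 1:maxit
        grad = g(x);
        if norm(grad) < tol, return; end
        h = -H(x) \ grad;       % Newton direction from Hessian * h = -grad
        x = x + h;              % full Newton step (no globalization here)
    end
end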

Quasi-Newton Methods We avoid the calculation of the Hessian matrix and simplify the computation of the search direction. Start with B_0, some positive definite matrix, and update it: B_0 → B_1 → ... Given gradient information, we iteratively update our approximation of the Hessian matrix. Our Hessian approximation at step k, B_k:
is updated by a low-rank correction
is symmetric
is positive definite
is chosen so that the quadratic approximation of f matches the gradient of f at x_{k+1} and x_k
is not very different from B_{k-1} (in a certain sense, e.g. the Frobenius norm), so that B_k does not change wildly
makes it easy to find B_{k+1}^{-1} (using the Sherman-Morrison formula), which simplifies the calculation of the optimal step.

Quasi-Newton Methods There are different ways to update the approximation of the Hessian matrix. DFP (Davidon-Fletcher-Powell). BFGS (Broyden-Fletcher-Goldfarb-Shanno), which supersedes DFP: instead of imposing conditions on B_k, we impose conditions on H_k = B_k^{-1}. L-BFGS, L-DFP - limited-memory versions.

Algorithm Overview We need: H_0, x_0.
Step 1: Check whether the solution is good enough; if ||∇f(x_k)|| > ε, continue.
Step 2: Compute the search direction -H_k ∇f(x_k).
Step 3: Compute the size of the optimal step in this direction (line search - a one-dimensional optimization) and compute x_{k+1}.
Step 4: Update the inverse Hessian approximation H_{k+1}; set k = k + 1.
Step 5: Go to Step 1.
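
A minimal MATLAB sketch of the loop above with the BFGS update of the inverse Hessian, using the backtracking helper sketched earlier and H_0 = I; everything here is an illustrative sketch, not MATLAB's own implementation (fminunc).
function x = bfgs_min(f, g, x0, tol, maxit)
    n = numel(x0);  x = x0;
    H = eye(n);  grad = g(x);                          % H_0, x_0
    for k = 1:maxit
        if norm(grad) < tol, return; end               % Step 1: convergence test
        p = -H * grad;                                 % Step 2: search direction
        alpha = backtrack(f, g, x, p, 1, 1e-4, 0.5);   % Step 3: line search
        x_new = x + alpha * p;
        g_new = g(x_new);
        s = x_new - x;  y = g_new - grad;              % Step 4: BFGS update of H
        rho = 1 / (y' * s);
        H = (eye(n) - rho*(s*y')) * H * (eye(n) - rho*(y*s')) + rho*(s*s');
        x = x_new;  grad = g_new;
    end
end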

Quasi-Newton Methods More efficient than Newton's method in situations when evaluating the Hessian is costly (O(n²) vs. O(n³)). We need the function to have a quadratic Taylor approximation near the optimum. Superlinear convergence rate. We can use the scaling H_0 = [(∇f(x_{k+1}) - ∇f(x_k))^T (x_{k+1} - x_k)] / [(∇f(x_{k+1}) - ∇f(x_k))^T (∇f(x_{k+1}) - ∇f(x_k))] · I. For a quadratic function, n steps of a quasi-Newton method correspond to one Newton step.

Linear Programming
min_x c^T x  s.t.  A·x ≤ b,  Aeq·x = beq,  lb ≤ x ≤ ub
[x, fval, exit] = linprog(c, A, b, Aeq, beq, lb, ub, x0, options)

Linear Programming Classical examples: the travelling salesman problem, the vehicle routing problem, cost minimization, manufacturing and transportation. More recent economics applications: tests for rationality of consumption behaviour (Cherchye, De Rock and Vermeulen); identification and shape restrictions in nonparametric instrumental variables estimation (Freyberger and Horowitz).

Linear Programming - Example
min_x x_1 + 2x_2 - 3x_3
s.t. -2x_1 + x_2 + 3x_3 ≤ 1
-x_1 + 2x_2 - 0.5x_3 ≤ 2
x_1 + x_2 + x_3 = 1
0 ≤ x_1, x_2, x_3 ≤ 1
c = [1 2 -3]; A = [-2 1 3; -1 2 -0.5]; b = [1; 2];
Aeq = [1 1 1]; beq = 1; lb = [0 0 0]; ub = [1 1 1];
options = optimset('linprog');
[x,fval,exit] = linprog(c,A,b,Aeq,beq,lb,ub,[],options)

Linear Programming - Algorithms interior-point dual-simplex active-set simplex

Linear Programming - Simplex algorithm Image source: wiki

Integer Programming
min_x c^T x  s.t.  x(intcon) are integers,  A·x ≤ b,  Aeq·x = beq,  lb ≤ x ≤ ub
[x, fval, exit] = intlinprog(c, intcon, A, b, Aeq, beq, lb, ub, x0, options)

Integer Programming - Example
min_x x_1 + 2x_2 - 3x_3 + x_4
s.t. x_4 is an integer
-2x_1 + x_2 + 3x_3 - 2x_4 ≤ 1
-x_1 + 2x_2 - 0.5x_3 - 4x_4 ≤ 2
x_1 + x_2 + x_3 + x_4 = 1
0 ≤ x_1, x_2, x_3, x_4 ≤ 1
c = [1 2 -3 1]; A = [-2 1 3 -2; -1 2 -0.5 -4]; b = [1; 2];
Aeq = [1 1 1 1]; beq = 1; lb = [0 0 0 0]; ub = [1 1 1 1]; intcon = 4;
[x,fval,exit] = intlinprog(c,intcon,A,b,Aeq,beq,lb,ub)
http://www.mathworks.com/help/optim/ug/tuning-integerlinear-programming.html

Integer Programming - Branch and Bound Example from http://www.columbia.edu/~cs2035/courses/ieor4600.s07/bb-lecb.pdf
max_x -x_1 + 4x_2
s.t. x_1, x_2 are integers
-10x_1 + 20x_2 ≤ 22
5x_1 + 10x_2 ≤ 49
x_1 ≤ 5
0 ≤ x_1, x_2
Optimal solution of the LP relaxation: (3.8, 3), with Z = 8.2.

Integer Programming - Branch and Bound Branch on x_1: x_1 ≤ 3 or x_1 ≥ 4.
Branch x_1 ≤ 3: x = (3, 2.6), Z = 7.4.
Branch x_1 ≥ 4: x = (4, 2.9), Z = 7.6.

Branch x_1 ≥ 4: x = (4, 2.9), Z = 7.6. Branch further on x_2:
Branch x_1 ≥ 4 and x_2 ≤ 2: x = (4, 2), Z = 4.
Branch x_1 ≥ 4 and x_2 ≥ 3: NO SOLUTION.

Branch x_1 ≤ 3: x = (3, 2.6), Z = 7.4. Branch further on x_2:
Branch x_1 ≤ 3 and x_2 ≤ 2: x = (1.8, 2), Z = 6.2.
Branch x_1 ≤ 3 and x_2 ≥ 3: NO SOLUTION.

Branch x_1 ≤ 3 and x_2 ≤ 2: x = (1.8, 2), Z = 6.2. Branch further on x_1:
Branch x_1 ≤ 3, x_2 ≤ 2 and x_1 ≤ 1: x = (1, 1.6), Z = 5.4.
Branch x_1 ≤ 3, x_2 ≤ 2 and x_1 ≥ 2: x = (2, 2), Z = 6 - integer feasible, and no remaining node has a better bound, so this is the optimal integer solution.

Quadratic Programming
min_x ½ x^T H x + f^T x  s.t.  A·x ≤ b,  Aeq·x = beq,  lb ≤ x ≤ ub
[x, fval, exit] = quadprog(H, f, A, b, Aeq, beq, lb, ub, x0, options)

Quadratic Programming - Example
min_x x_1² - x_1 x_2 + x_2² + 2x_1 + 3x_2
s.t. -2x_1 + x_2 ≤ -1
x_1 - 2x_2 ≤ 3
x_1 + 2x_2 = 4
H = [2 -1; -1 2]; f = [2; 3];
A = [-2 1; 1 -2]; b = [-1; 3];
Aeq = [1 2]; beq = 4; lb = []; ub = [];
[x,fval,exit] = quadprog(H,f,A,b,Aeq,beq,lb,ub)

Global Optimization methods Image source: www.mathworks.com

Global Optimization methods In MATLAB's Global Optimization Toolbox:
GlobalSearch and MultiStart solvers - generate multiple starting points, filter non-promising points.
Genetic Algorithm solver - a population of points; we simulate the evolution of the population. Phases: Selection - we select good parents; Crossover - they produce children; Mutation - induce randomness, so we can jump off local optima.
Pattern Search solver - direct search, no derivatives needed.
Simulated Annealing - a probabilistic search algorithm that mimics the physical process of annealing; we slowly reduce the system temperature to minimize the system energy.
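
Illustrative calls, assuming the Global Optimization Toolbox is installed; the Rastrigin-style objective is just a stand-in test function with many local minima.
fun = @(x) 10*numel(x) + sum(x.^2 - 10*cos(2*pi*x));   % many local minima
x_ga = ga(fun, 2);                  % genetic algorithm with 2 variables
x_ps = patternsearch(fun, [1 1]);   % pattern search from the point [1 1]
x_sa = simulannealbnd(fun, [1 1]);  % simulated annealing from [1 1]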

Choice of Algorithm Identify the objective function: linear, quadratic, smooth nonlinear, nonsmooth. Identify the types of constraints: none, bound, linear, smooth, discrete. http://www.mathworks.com/help/optim/ug/choosing-a-solver.html

Optimization Cookbook - practical advice What to do if...? Solver did not succeed Not sure if the solver succeeded Solver succeeded

Solver did not succeed Try to find out what is going on: set Display to 'iter'. Does the objective function, maximum constraint violation, first-order optimality criterion, or trust-region radius decrease?
increase MaxIter or MaxFunEvals
relax the tolerances
change the initial point
center and scale your problem
if the problem is unbounded, check the formulation of the problem
start from a simpler problem, iteratively add restrictions and use the optimal solutions as starting points for the more complex problem
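
A hedged example of turning on iteration display and loosening the limits; the toy objective (Rosenbrock) and the numbers are made up for illustration.
options = optimset('Display', 'iter', 'MaxIter', 2000, ...
                   'MaxFunEvals', 1e5, 'TolFun', 1e-8, 'TolX', 1e-8);
[x, fval, exitflag, output] = ...
    fminsearch(@(x) (x(1) - 1)^2 + 100*(x(2) - x(1)^2)^2, [-1 2], options);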

Not sure whether the solver succeeded
The first-order optimality condition is not satisfied.
Final point = initial point: change the initial point to some nearby points.
"Local minimum possible": for a non-smooth function this may be the best we can possibly get; set the optimum as a new starting point and re-run the optimization; try a different algorithm; play with the tolerances.
Takes too long? Use a sparse solver (uses less memory), use parallel computing.

Solver succeeds - Robustness Local minimum vs. global minimum? Use a grid of initial points. Check whether the formulation of the problem in MATLAB corresponds to the problem at hand: try the objective function at a few points, check the sign if maximizing, check the signs of the inequalities. Check with the Global Optimization Toolbox.

Lagrange multipliers Lagrange multipliers tell you how important particular restrictions are at the optimal solution. Use this to your advantage: which restrictions are important, and which cause problems for your solver?

Tolerances and Stopping Criteria TolX TolFun TolCon MaxIter MaxFunEvals many others for different algorithms (e.g. Interior-point)

For Tomorrow Install Dynare http://www.dynare.org/documentation-and-support/quick-start

Literature
Miranda, M., and P. Fackler. Applied Computational Economics. (2001).
Wright, Stephen J., and Jorge Nocedal. Numerical Optimization. New York: Springer, 1999.
Dennis, J. E., Jr., and Jorge J. Moré. Quasi-Newton Methods, Motivation and Theory. SIAM Review, Vol. 19, No. 1 (Jan. 1977), 46-89.
Nocedal, J. (1980). Updating Quasi-Newton Matrices with Limited Storage. Mathematics of Computation 35 (151): 773-782.
Branch and Bound method explained on an example: http://www.columbia.edu/~cs2035/courses/ieor4600.s07/bb-lecb.pdf
Branch and Bound method - example with animals: http://ocw.mit.edu/courses/sloan-school-of-management/15-053-optimization-methods-in-management-science-spring-2013/tutorials/MIT15_053S13_tut10.pdf
http://www.mathworks.com/help/optim/ug/choosing-a-solver.html