Global and derivative-free optimization Lectures 1-4


1 Global and derivative-free optimization, Lectures 1-4
Coralia Cartis, University of Oxford
INFOMM CDT: Contemporary Numerical Techniques

2 Lectures 1-4: outline
- Brief overview of derivative-based methods for local NLO.
- Global optimization: definition and overview.
- Derivative-Free Optimization (DFO): motivation and applications.
- Overview of DFO algorithms:
  - model-based (+ with probabilistic models, later);
  - direct search, pattern search, Nelder-Mead;
  - implicit filtering (in the context of an application).
- Overview of GO algorithms (briefly):
  - stochastic methods;
  - deterministic methods: branch-and-bound; interval methods; response surface methods; modern branch-and-bound.

3 Nonlinear optimization: derivative-based algorithms
minimize $f(x)$ subject to $x \in \Omega \subseteq \mathbb{R}^n$.   (P)
$f:\Omega\to\mathbb{R}$ is (generally) smooth and nonconvex; $\Omega$ is the feasible set (determined by finitely many constraints).
- Guaranteed to find local minimizers of (P).
- Rely heavily on accurate/exact derivative information of $f$ and the constraints.
- Optimality conditions: for example, when $\Omega=\mathbb{R}^n$,
  $x^*$ local minimizer of $f$ $\Longrightarrow$ $\nabla f(x^*)=0$ and $\nabla^2 f(x^*)\succeq 0$;
  $\nabla f(x^*)=0$ and $\nabla^2 f(x^*)\succ 0$ $\Longrightarrow$ $x^*$ local minimizer of $f$.
  Used as termination criteria for algorithms.
- Taylor expansions of $f$: at the $k$th iterate $x^k\approx x^*$,
  $f(x^k+s)\approx m_k(s)=f(x^k)+s^T\nabla f(x^k)\;\big[+\tfrac{1}{2}s^T\nabla^2 f(x^k)s\big]$,
  used in algorithm construction.

4 Nonlinear optimization: derivative-based algorithms...
Methods for local unconstrained optimization [i.e., $\Omega=\mathbb{R}^n$ in (P)]
A Generic Method (GM)
Choose $\epsilon>0$ and $x^0\in\mathbb{R}^n$. While (TERMINATION CRITERIA not achieved), REPEAT:
- compute the change $x^{k+1}-x^k=\alpha_k s^k$ [linesearch, trust region] to ensure $f(x^{k+1})<f(x^k)$, where $\alpha_k\in[0,1]$ and $s^k=\arg\min_{s\in\mathbb{R}^n} m_k(s)\approx f(x^k+s)$;
- set $x^{k+1}:=x^k+\alpha_k s^k$, $k:=k+1$.
TC: $\|\nabla f(x^k)\|\le\epsilon$; maybe also $\lambda_{\min}(\nabla^2 f(x^k))\ge-\epsilon$.

5 Nonlinear optimization: derivative-based algorithms...
Linesearch methods for local unconstrained optimization
- compute a descent direction $s^k$ from $x^k$ [i.e., $(s^k)^T\nabla f(x^k)<0$];
- set $x^{k+1}=x^k+\alpha_k s^k$ to decrease $f$ [$\alpha_k$ from an (in)exact linesearch].
Examples:
- $s^k=-\nabla f(x^k)$, i.e., $s^k$ minimizes the linear model $m_k(s)$ (over steps of fixed length): steepest descent.
- $s^k=-\nabla^2 f(x^k)^{-1}\nabla f(x^k)$, i.e., $s^k=\arg\min_s m_k(s)$ with $m_k$ quadratic: Newton.
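For concreteness, here is a minimal Python sketch of the generic linesearch framework with the steepest-descent direction and a backtracking Armijo linesearch; the quadratic test problem and all constants are illustrative choices rather than part of the lecture material.

```python
import numpy as np

def backtracking_armijo(f, x, g, s, alpha0=1.0, beta=1e-4, tau=0.5, max_backtracks=50):
    """Shrink alpha until the Armijo sufficient-decrease condition holds."""
    alpha = alpha0
    while f(x + alpha * s) > f(x) + beta * alpha * (g @ s) and max_backtracks > 0:
        alpha *= tau
        max_backtracks -= 1
    return alpha

def steepest_descent(f, grad, x0, eps=1e-6, max_iter=1000):
    """Generic linesearch method with the steepest-descent direction s^k = -grad f(x^k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:      # termination: ||grad f(x^k)|| <= eps
            break
        s = -g                            # descent direction: (s^k)^T grad f(x^k) < 0
        x = x + backtracking_armijo(f, x, g, s) * s
    return x

# Example on a strictly convex quadratic f(x) = 0.5 x^T A x - b^T x (minimizer A^{-1} b)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(steepest_descent(f, grad, [5.0, -3.0]), np.linalg.solve(A, b))
```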

6 Nonlinear optimization: derivative-based algorithms...
Trust-region methods for local unconstrained optimization
- compute a step $s^k$ from $x^k$ to improve the model $m_k(s)$ of $f$ within the trust region $\|s\|\le\Delta_k$:
  $s^k\approx\arg\min_s m_k(s)$ subject to $\|s\|\le\Delta_k$;
- set $x^{k+1}=x^k+s^k$ if $m_k$ and $f$ agree at $x^k+s^k$;
- otherwise set $x^{k+1}=x^k$ and reduce the radius $\Delta_k$.
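A matching Python sketch of the trust-region mechanism, using the quadratic Taylor model, a Cauchy-point step and the ratio test; the radius-update rule and constants are simplified, illustrative choices.

```python
import numpy as np

def trust_region(f, grad, hess, x0, Delta0=1.0, eta=0.1, eps=1e-6, max_iter=200):
    """Minimal trust-region sketch: quadratic Taylor model, Cauchy-point step,
    and the standard ratio test for accepting steps and updating the radius."""
    x, Delta = np.asarray(x0, dtype=float), Delta0
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) <= eps:
            break
        m = lambda s: f(x) + g @ s + 0.5 * s @ H @ s    # quadratic model m_k(s)
        # Cauchy point: minimize the model along -g within the trust region
        gHg = g @ H @ g
        t = Delta / np.linalg.norm(g)
        if gHg > 0:
            t = min(t, (g @ g) / gHg)
        s = -t * g
        rho = (f(x) - f(x + s)) / (f(x) - m(s))          # actual vs predicted decrease
        if rho >= eta:                                   # successful: accept, possibly grow radius
            x, Delta = x + s, max(Delta, 2.0 * np.linalg.norm(s))
        else:                                            # unsuccessful: reject, shrink radius
            Delta *= 0.5
    return x

f = lambda x: (x[0] - 1) ** 2 + 5 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 10 * (x[1] + 2)])
hess = lambda x: np.diag([2.0, 10.0])
print(trust_region(f, grad, hess, [0.0, 0.0]))           # should approach (1, -2)
```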

7 Nonlinear optimization: derivative-based algorithms...
How to compute/provide derivatives to a solver?
- Calculate derivatives by hand when the objective and constraints are easy/simple; the user provides code that computes them.
- Calculate or approximate derivatives automatically:
  - Automatic differentiation: breaks the computer code for evaluating $f$ down into elementary arithmetic operations and differentiates by the chain rule. Software: ADIFOR, ADOL-C.
  - Symbolic differentiation: manipulates the algebraic expression of $f$ (if available). Software: symbolic packages of MAPLE, MATHEMATICA, MATLAB.
  - Finite differencing: approximate derivatives.
See Nocedal & Wright, Numerical Optimization (2nd edition, 2006) for more details.

8 Nonlinear optimization: derivative-based algorithms...
Advantages and successes
- global convergence to stationary points of (P) under mild assumptions on the problem class; fast local convergence for Newton-like variants.
- can solve large-scale problems ($n$ large, at least of order $10^3$) efficiently, even when (P) has nonlinear constraints.
Limitations
- only guaranteed to provide local solutions of (P) when (P) is nonconvex.
- require accurate or exact first-, and sometimes even second-, derivatives of the objective $f$ and the constraints to be available.

9 Global and derivative-free optimization algorithms
Attempt to overcome the limitations of derivative-based NLO algorithms for local minimization:
- Global Optimization (GO): the global minimizer of (P) is required; derivatives are allowed.
- Derivative-Free Optimization (DFO): derivatives are unavailable, even if (P) may be smooth; use only function values to construct iterates that approach a (local) minimizer.
For the remainder, $\Omega=\mathbb{R}^n$ in (P), i.e., we solve
minimize $f(x)$ subject to $x\in\mathbb{R}^n$.   (UP)
[GO and DFO may not deal with nonlinear constraints; at best, bounds.]
Comparison to local optimization of (UP): GO is more difficult (generally, NP-hard); in DFO, we lose problem information.
$\Longrightarrow$ For both GO and DFO, we are often content with improvement rather than optimization.

10 Global optimization
Consider (UP). When $f$ is convex, $x^*$ local minimizer of $f$ $\Longrightarrow$ $x^*$ global minimizer of $f$. Hence, for such instances, local optimization = GO.
$f$ nonconvex and bounded below: how to compute the global minimizer of $f$ in the presence of local minimizers/high oscillations and sometimes noise?
A local optimization algorithm gets trapped at local minimizers and cannot advance further towards the global solution.
- How do GO methods avoid this?
- When to terminate a GO algorithm?

11 Global optimization...
Applications: many of the grand-challenge problems of scientific computing, such as weather forecasting, electronic-structure design, protein folding, molecular dynamics, etc.
Methods (to be addressed): branch-and-bound, multistart local search, randomized methods, etc.
Limitations
- on average, can solve efficiently only problems of (very) small scale (of the order of 10 variables); better if parallelism is employed.
- difficulties with incorporating nonlinear constraints; only bound constraints are (more) straightforward.

12 Derivative-Free Optimization (DFO)
Consider (UP). Even when $f$ is smooth, for many applications:
- Exact first derivatives of $f$ are unavailable: $f(x)$ is given by a black-box code, proprietary code or a simulation package.
- Computing $f(x)$ for any given $x$ is expensive: $f(x)$ is given by a time-consuming numerical simulation or lab experiments.
- Numerical approximation of $\nabla f(x)$ is impractically expensive or slow: e.g., using finite differencing to approximate $\nabla f(x)$ when $f(x)$ is expensive.
- The values of $f(x)$ are noisy, i.e., the evaluation of $f(x)$ is inaccurate; for example, when $f(x)$ depends on discretization, sampling, inaccurate data, etc. Then gradient information is meaningless.

13 DFO: effect of noise on finite differencing
Effect of noisy function values on finite differencing: let $F$ be smooth and $\Psi$ the noise, so that $f(x)=F(x)+\Psi(x)$.
Central-difference (CD) formula for $\nabla f(x)$ with stepsize $h$:
$\dfrac{\partial f}{\partial x_i}(x)\approx\dfrac{f(x+h e_i)-f(x-h e_i)}{2h}$, $i=1,\dots,n$,
and let $\eta(x,h):=\sup_{\|z-x\|\le h}|\Psi(z)|$. Then
$\|\nabla_h f(x)-\nabla F(x)\|\le L_F h^2+\dfrac{\eta(x,h)}{h}$,
where $L_F$ is the Lipschitz constant of $\nabla^2 F$.
If $\eta(x,h)$ dominates the right-hand side, then we are lucky if $-\nabla_h f(x)$ is a descent direction. May use DFO methods in that case.
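The trade-off between the $L_F h^2$ and $\eta(x,h)/h$ terms is easy to observe numerically; the following small Python experiment (illustrative test function and noise level, not from the lecture) estimates the gradient of a noisy quadratic by central differences for several stepsizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_noisy(x, sigma=1e-3):
    """Smooth F(x) = ||x||^2 plus bounded uniform noise Psi of size sigma."""
    return float(x @ x) + sigma * rng.uniform(-1.0, 1.0)

def central_diff_grad(f, x, h):
    """Central-difference gradient estimate with stepsize h."""
    n, g = len(x), np.zeros(len(x))
    for i in range(n):
        e = np.zeros(n); e[i] = 1.0
        g[i] = (f(x + h * e) - f(x - h * e)) / (2.0 * h)
    return g

x = np.array([1.0, -2.0])
true_grad = 2.0 * x                       # gradient of F(x) = ||x||^2
for h in [1e-1, 1e-2, 1e-4, 1e-8]:
    err = np.linalg.norm(central_diff_grad(f_noisy, x, h) - true_grad)
    print(f"h = {h:.0e}   gradient error = {err:.2e}")
# The error first shrinks with h (the L_F h^2 term) and then blows up
# as the noise term eta/h dominates for very small h.
```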

14 DFO: illustrative application
Tuning of algorithmic parameters. Consider some nonlinear optimization solver (say trust region). Its performance depends on parameter choices: starting point, initial trust-region radius, successful-step parameter, etc. For their automatic (optimal) adjustment, solve for instance
minimize $f(p)=\mathrm{CPU}(\text{solver};p)$ over $p\in\mathbb{R}^{n_p}$ subject to $p\in P$,
where $p$ is the vector of all parameters to be tuned and $P=\{p: l\le p\le u\}$.
Derivative calculation is hard; $f$ is possibly nondifferentiable.
Other applications: automatic error analysis, engineering design, molecular geometry, etc.
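The shape of such a tuning problem can be sketched as a black-box objective; the solver call below is a hypothetical stand-in (not a real solver interface), and returning infinity outside the bounds is a crude illustrative way of handling $P$.

```python
import time
import numpy as np

def run_solver(params):
    """Hypothetical stand-in for an expensive solver run; in practice this would
    call the actual optimization code with the given parameter vector."""
    radius0, shrink = params
    # dummy workload whose cost depends on the parameters
    time.sleep(0.001 * (1.0 + (radius0 - 1.0) ** 2 + (shrink - 0.5) ** 2))

def tuning_objective(params, lower, upper):
    """f(p) = CPU time of the solver at parameters p, with the bound constraints
    P = {l <= p <= u} handled crudely by returning +inf outside the box."""
    p = np.asarray(params, dtype=float)
    if np.any(p < lower) or np.any(p > upper):
        return np.inf
    t0 = time.perf_counter()
    run_solver(p)
    return time.perf_counter() - t0      # noisy, derivative-free objective value

lower, upper = np.array([0.1, 0.1]), np.array([10.0, 0.9])
print(tuning_objective([1.0, 0.5], lower, upper))
```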

15 DFO methods
- use only objective function values to construct iterates.
- do not, essentially, compute an approximate gradient; instead, they form a sample of points (less tightly clustered than for finite differences) and use the associated function values to generate $x^{k+1}$ so as to ensure descent; they must also control the geometry of the sample sets.
- Algorithms (to be addressed): model-based trust-region, direct-search algorithms, etc.
- compute an approximate (local) solution with few function evaluations; asymptotic speed is irrelevant as there are no optimality conditions for termination.
- also suitable (but not guaranteed to be successful) for nonsmooth and for global optimization.

16 DFO methods...
Limitations. With current state-of-the-art DFO methods, expect to successfully solve problems provided:
- the problem is small-scale (of the order of $10^2$ variables);
- $f$ is quite smooth;
- accurate finite differencing cannot be achieved ($f$ noisy or expensive, etc.);
- high accuracy is not required (as the methods are slow asymptotically).

17 Derivative-free optimization
Model-based derivative-free methods
- Model-based derivative-free algorithm
- Interpolation models
- Polynomial interpolation
- Geometry of the sample set
- Comments

18 Models in optimization methods
minimize $f(x)$ subject to $x\in\mathbb{R}^n$.   (UP)
Derivative-based methods rely on (linear or quadratic) Taylor models of $f$:
$f(x^k+s)\approx f(x^k)+s^T\nabla f(x^k)\;\big(+\tfrac{1}{2}s^T\nabla^2 f(x^k)s\big)=: m_k(s)$.
These need accurate gradient values $\nabla f(x^k)$ [and maybe Hessians $\nabla^2 f(x^k)$].
How to construct models of $f$ when derivatives are unavailable / don't exist / cannot be approximated?
By interpolation of $f$ on a set of appropriately chosen sample points.

19 Models in derivative-free optimization methods
Sample set: $Y=\{y^1,\dots,y^q\}$ for some $q$; $\{f(y^1),\dots,f(y^q)\}$ assumed to be known/computed.
$x^k$ is the current iterate/estimate of the minimizer $x^*$; $x^k\in Y$ and $f(x^k)\le f(y^i)$, $i=1,\dots,q$.
Model: $m_k(s)=c+s^T g\;\big(+\tfrac{1}{2}s^T H s\big)$, where $c\in\mathbb{R}$, $g\in\mathbb{R}^n$ (and $H\in\mathbb{R}^{n\times n}$ symmetric) are unknown.
Compute $c\in\mathbb{R}$, $g\in\mathbb{R}^n$ (and $H\in\mathbb{R}^{n\times n}$) to satisfy the interpolation conditions:
$m_k(y^i-x^k)=f(y^i)$, $i\in\{1,\dots,q\}$.   (IC)
Need $q=n+1$ for $m_k$ linear (i.e., $H=0$); $m_k$ quadratic needs $q=(n+1)(n+2)/2$; connect to finite differences.

20 Model-based DFO algorithm
Issues to address:
- model interpolation: the matrix of the linear system (IC) must be nonsingular and well-conditioned.
- model minimization: since $m_k$ is nonconvex, add a trust-region constraint:
  $s^k=\arg\min_{s\in\mathbb{R}^n} m_k(s)$ subject to $\|s\|\le\Delta_k$.   (TR)
- update $m_k$ rather than recompute it (only one point leaves $Y$ and a new one enters);
- improve the geometry of $Y$ to help with the (conditioning of the) model interpolation step.
A complete algorithm is very involved; here we give a generic framework.

21 Model-based DFO algorithm...
Let $s^k$ be a(n approximate) solution of (TR). Then
- predicted model decrease: $m_k(0)-m_k(s^k)=f(x^k)-m_k(s^k)$;
- actual function decrease: $f(x^k)-f(x^k+s^k)$.
The trust-region radius $\Delta_k$ is chosen based on the value of
$\rho_k:=\dfrac{f(x^k)-f(x^k+s^k)}{f(x^k)-m_k(s^k)}$.
- If $\rho_k\ge\eta$, where $\eta\in(0,1)$: $x^{k+1}:=x^k+s^k$, $\Delta_{k+1}\ge\Delta_k$.
- If $\rho_k<\eta$: $x^{k+1}=x^k$, and $\Delta_k$ is reduced or $Y$ is improved.

22 Generic model-based DFO algorithm
Given $Y=\{y^1,\dots,y^q\}$ such that (IC) is nonsingular, $x^0\in Y$ such that $f(x^0)\le f(y^i)$ for $i=1,\dots,q$, $\eta\in(0,1)$, $\Delta_0>0$ and $k=0$.
While (TC not satisfied), do:
1. Form the linear/quadratic model $m_k(s)$ to satisfy (IC).
2. Solve (approximately) the (TR) subproblem for $s^k$ with $m_k(s^k)<f(x^k)$ ("sufficiently"). Compute $\rho_k:=[f(x^k)-f(x^k+s^k)]/[f(x^k)-m_k(s^k)]$.
3. If $\rho_k\ge\eta$, then [successful step] set $x^{k+1}:=x^k+s^k$, $\Delta_{k+1}\ge\Delta_k$, and replace some $y^i\in Y$ by $x^{k+1}$.
   Else if ($\rho_k<\eta$) and ($Y$ need not be improved), then set $x^{k+1}=x^k$ and $\Delta_{k+1}<\Delta_k$. [unsuccessful step]
   End (if)
4. Geometry-improving step...
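A heavily simplified Python rendering of Steps 1-3 (linear model only, crude point replacement and no geometry-improving step, so this is an illustration of the framework rather than a robust implementation):

```python
import numpy as np

def dfo_linear_tr(f, x0, Delta0=1.0, eta=0.1, Delta_min=1e-6, max_iter=500):
    """Simplified model-based DFO sketch: linear interpolation model, trust-region
    step, ratio test.  Geometry management of Y is omitted, so the interpolation
    system may become ill-conditioned; this is only an illustration."""
    n = len(x0)
    # initial sample set: x0 plus the coordinate perturbations x0 + Delta0*e_i
    Y = np.vstack([x0, x0 + Delta0 * np.eye(n)])
    fY = np.array([f(y) for y in Y])
    Delta = Delta0
    for _ in range(max_iter):
        best = int(np.argmin(fY))
        xk, fk = Y[best], fY[best]
        others = [i for i in range(len(Y)) if i != best]
        S = Y[others] - xk                        # interpolation matrix of (IC)
        g = np.linalg.solve(S, fY[others] - fk)   # model gradient from (IC)
        if np.linalg.norm(g) < 1e-12 or Delta < Delta_min:
            return xk
        s = -Delta * g / np.linalg.norm(g)        # minimizer of the linear model in the TR
        f_trial = f(xk + s)
        pred = fk - (fk + s @ g)                  # predicted decrease f(x^k) - m_k(s^k)
        rho = (fk - f_trial) / pred
        if rho >= eta:                            # successful: accept and swap the worst point
            worst = others[int(np.argmax(fY[others]))]
            Y[worst], fY[worst] = xk + s, f_trial
        else:                                     # unsuccessful: shrink the radius
            Delta *= 0.5
    return Y[int(np.argmin(fY))]

print(dfo_linear_tr(lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2, np.array([0.0, 0.0])))
```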

23 Generic model-based DFO algorithm... (continued)
4. Invoke a geometry-improving procedure to update $Y$ [one point leaves $Y$, a new one enters so as to improve the conditioning of (IC)].
   Choose $\hat{x}\in Y$ such that $f(\hat{x})\le f(y^i)$ for all $y^i\in Y$; set $\Delta_{k+1}:=\Delta_k$; recompute $\rho_k$ for $x^k+s^k:=\hat{x}$.
   If $\rho_k\ge\eta$, then set $x^{k+1}=\hat{x}$; else set $x^{k+1}=x^k$. End (if)
5. Let $k:=k+1$.

24 Model-based DFO algorithm: comments
- $\rho_k<\eta$ $\Longrightarrow$ the trust region is too large OR the sample set $Y$ is inadequate (degenerate): the iterates become confined to a low-dimensional surface of $\mathbb{R}^n$ that does not contain the solution. Replace a point in $Y$ if the condition number of (IC) is too high, so as to improve this condition number.
- initial $Y$: vertices and edge midpoints of a simplex in $\mathbb{R}^n$.
- quadratic models are expensive: $O(n^2)$ function evaluations for the initial model set-up; $O(n^4)$ arithmetic operations per iteration for the model update and minimization. (Cheaper quadratic models: see Frobenius-norm updates.)
- use linear models (at least at the start of the algorithm, until enough function evaluations have been accumulated): $O(n)$ function evaluations for the initial model set-up; $O(n^3)$ operations per iteration.

25 Model construction by interpolation
Linear model (linear polynomial in $n$ variables): $m_k(s)=f(x^k)+s^T g$.
Need $q=n+1$ in (IC) to determine $c\in\mathbb{R}$, $g\in\mathbb{R}^n$; but $c=f(x^k)$, and so (IC) provides
$f(x^k)+(y^i-x^k)^T g=f(y^i)$, $i=1,\dots,n$, or equivalently, $(y^i-x^k)^T g=f(y^i)-f(x^k)$, $i=1,\dots,n$.
Thus $g$, and hence $m_k(s)$, is uniquely defined $\iff$ $\{y^1-x^k,\,y^2-x^k,\dots,\,y^n-x^k\}$ is linearly independent $\iff$ $\{x^k,y^1,y^2,\dots,y^n\}$ is a nondegenerate simplex.
$\mathcal{P}_n^1$: polynomials of degree at most 1 in $\mathbb{R}^n$; $\dim\mathcal{P}_n^1=n+1$; monomial basis = natural basis $\phi=\{1,x_1,\dots,x_n\}$, $\phi_j(x)=x_j$;
$m_k(y^i-x^k)=\sum_{j=1}^{q}\alpha_j\phi_j(y^i)=f(y^i)$, $i=1,\dots,q$.

26 Model construction by interpolation...
Quadratic model (quadratic polynomial in $n$ variables): $m_k(s)=f(x^k)+s^T g+\tfrac{1}{2}s^T H s$,
or equivalently, by symmetry of $H$,
$m_k(s)=f(x^k)+s^T g+\sum_{i<j}H_{ij}s_i s_j+\tfrac{1}{2}\sum_i H_{ii}s_i^2=f(x^k)+\hat{s}^T\hat{g}$,
where $\hat{s}=\big(s,\,\{s_i s_j\}_{i<j},\,\{\tfrac{1}{2}s_i^2\}\big)$ and $\hat{g}=\big(g,\,\{H_{ij}\}_{i<j},\,\{H_{ii}\}\big)$.
$\mathcal{P}_n^2$: polynomials of degree at most 2 in $\mathbb{R}^n$; $\dim\mathcal{P}_n^2=(n+1)(n+2)/2=q$; monomial basis = natural basis $\phi=\{\phi_j: j=1,\dots,q\}$;
$m_k(y^i-x^k)=\sum_{j=1}^{q}\alpha_j\phi_j(y^i)=f(y^i)$, $i=1,\dots,q$.
Thus $m_k$ is uniquely defined $\iff$ $\delta(\phi,Y):=\det(\{\phi_j(y^i)\}_{ij})\ne 0$, for some polynomial basis $\phi$.

27 Model construction by interpolation...
(IC) $\iff$ $M(\phi,Y)\,\alpha_\phi=f(Y)$, where $M(\phi,Y)_{ij}=\phi_j(y^i)$ and $f(Y)_i=f(y^i)$, $i,j=1,\dots,q$.
$Y$ poised for interpolation $\iff$ $\delta(\phi,Y)=\det M(\phi,Y)\ne 0$ for some basis $\phi$ $\iff$ $\delta(\phi,Y)\ne 0$ for any basis $\phi$ $\iff$ the interpolating polynomial $m_k(s)$ exists and is unique.
Other (useful) polynomial bases: Lagrange polynomials; Newton fundamental polynomials.
Lagrange polynomials: given $Y=\{y^1,\dots,y^q\}$, the Lagrange polynomials $\chi_j(x)$, $j=1,\dots,q$, satisfy $\chi_j(y^i)=1$ if $i=j$ and $\chi_j(y^i)=0$ if $i\ne j$.
$Y$ poised $\Longrightarrow$ the basis $\{\chi_j(x)\}_j$ exists uniquely $\Longrightarrow$ the interpolating polynomial of $f$ on $Y$ is $m_k(s)=\sum_{j=1}^{q}f(y^j)\chi_j(x^k+s)$.
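A quick numerical check of poisedness, using my own encoding of the natural quadratic basis $\{1,\ x_i,\ x_i x_j\ (i<j),\ \tfrac{1}{2}x_i^2\}$: build $M(\phi,Y)$ for a generic sample set and for a collinear (degenerate) one and compare determinants and condition numbers.

```python
import numpy as np
from itertools import combinations

def quad_monomial_basis(n):
    """Natural basis of P_n^2: 1, x_i, x_i*x_j (i<j), 0.5*x_i^2."""
    funcs = [lambda x: 1.0]
    funcs += [lambda x, i=i: x[i] for i in range(n)]
    funcs += [lambda x, i=i, j=j: x[i] * x[j] for i, j in combinations(range(n), 2)]
    funcs += [lambda x, i=i: 0.5 * x[i] ** 2 for i in range(n)]
    return funcs

def interpolation_matrix(Y):
    """M(phi, Y)_{ij} = phi_j(y^i) for the quadratic monomial basis."""
    phi = quad_monomial_basis(Y.shape[1])
    return np.array([[p(y) for p in phi] for y in Y])

n = 2
q = (n + 1) * (n + 2) // 2                       # = 6 sample points needed
rng = np.random.default_rng(1)
Y_good = rng.standard_normal((q, n))             # generic points: poised with high probability
Y_bad = np.column_stack([np.linspace(0, 1, q),   # all points on a line: not poised
                         np.linspace(0, 1, q)])
for name, Y in [("generic", Y_good), ("collinear", Y_bad)]:
    M = interpolation_matrix(Y)
    print(name, "det =", np.linalg.det(M), "cond =", np.linalg.cond(M))
```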

28 Updating Y to improve its geometry
Remove $y^-\in Y$ and add $y^+$ to give $Y^+$ so that $\delta(\phi,Y^+)$ increases in magnitude.
Property: $|\delta(\phi,Y^+)|=|\chi_j(y^+)|\,|\delta(\phi,Y)|$, with $j$ = index of $y^-$.
When $\rho_k\ge\eta$: $y^+=x^k+s^k$ and $y^-=\arg\max_{y^j\in Y}|\chi_j(x^k+s^k)|$.
When $\rho_k<\eta$: check whether $Y$ needs improvement.
[$Y$ is adequate at $x^k$ if, for all $y^j\in Y$ with $\|y^j-x^k\|\le\Delta_k$, $|\delta(\phi,Y)|$ cannot be doubled when $y^j$ is replaced by a point $y$ inside the TR constraint.]
If $Y$ is adequate, choose $\Delta_{k+1}<\Delta_k$, $x^{k+1}=x^k$, and leave $Y$ unchanged.
Else, for every $y^j\in Y$, define the potential replacement $y^j_r=\arg\max_{\|y-x^k\|\le\Delta_k}|\chi_j(y)|$. Let $y^-=y^j$, where $j=\arg\max_{y^i\in Y}|\chi_i(y^i_r)|$.

29 Constructing cheaper quadratic models
- Extension to DFO of quasi-Newton techniques; only $O(n)$ function evaluations are required to construct the quadratic model, with an $O(n^3)$ arithmetic cost per iteration.
- Compute $c$, $g$ and $H$ by solving
  minimize $\|H-H_k\|_F$ over $c,g,H$ subject to $H=H^T$, $m_k(y^i-x^k)=f(y^i)$, $i=1,\dots,\hat{q}$,
  where $\|\cdot\|_F$ is the Frobenius norm and $H_k$ is the previous model Hessian.
- $\hat{q}=O(n)$: $\hat{q}\ge n+2$, so that we can compute $c$, $g$ and some $H_{k+1}\ne H_k$; practical value: $\hat{q}=2n+1$.
- As before, we need to consider the geometry of $Y$...
Software (model-based implementations): COBYLA (linear), DFO (quadratic), UOBYQA (quadratic), WEDGE (quadratic), NEWUOA (cheap quadratic based on quasi-Newton updating).

30 Derivative-free optimization
Direct-search derivative-free methods
- Linesearch methods
- Coordinate search method
- Pattern search methods
- Simplex methods
- Nelder-Mead algorithm

31 Linesearch derivative-free methods
minimize $f(x)$ subject to $x\in\mathbb{R}^n$.   (UP)
A Generic Linesearch DF Method (GLM-DF)
Choose $x^0\in\mathbb{R}^n$; $k=0$. While (TC not satisfied), REPEAT:
- choose a search direction $s^k\in\mathbb{R}^n$ from $x^k$;
- if possible, compute a stepsize $\alpha_k\in\mathbb{R}$ along $s^k$ such that $f(x^k+\alpha_k s^k)<f(x^k)$;
- set $x^{k+1}:=x^k+\alpha_k s^k$ (if such a step $\alpha_k$ exists) and $x^{k+1}:=x^k$ (otherwise), and $k:=k+1$.
Recall derivative-based linesearch methods: if $s^k$ is descent, i.e., $(s^k)^T\nabla f(x^k)<0$, then $f(x^k+\alpha s^k)<f(x^k)$ for $\alpha>0$ sufficiently small; $s^k=-\nabla f(x^k)$ is descent.

32 Linesearch derivative-free methods: linesearch
[Figure, Kolda, Lewis & Torczon (SIREV): (a) steps are too long; (b) steps are too short; (c) bad search direction.]
Exact linesearch: $\alpha_k=\arg\min_{\alpha\in\mathbb{R}}\phi_k(\alpha)=f(x^k+\alpha s^k)$.
Inexact linesearch: sufficient decrease (Armijo-like condition): $f(x^k+\alpha_k s^k)<f(x^k)-\rho(\alpha_k)$, where $\rho(t)\ge 0$ is an increasing function of $t$ with $\rho(t)/t\to 0$ as $t\to 0$.
Use backtracking to satisfy the Armijo-like condition.
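A minimal derivative-free backtracking routine implementing the Armijo-like condition with $\rho(t)=\gamma t^2$ (one of the choices mentioned on the pattern-search slides below); the constants and the test problem are illustrative.

```python
import numpy as np

def df_backtracking(f, x, s, alpha0=1.0, gamma=1e-3, tau=0.5, max_backtracks=40):
    """Derivative-free backtracking: return the first alpha satisfying the
    sufficient-decrease condition f(x + alpha*s) < f(x) - rho(alpha),
    with rho(t) = gamma*t^2; return None if no acceptable step is found along s."""
    x, s = np.asarray(x, float), np.asarray(s, float)
    fx, alpha = f(x), alpha0
    for _ in range(max_backtracks):
        if f(x + alpha * s) < fx - gamma * alpha ** 2:
            return alpha
        alpha *= tau
    return None

# One derivative-free step on f(x) = x1^2 + 4*x2^2 along the direction s = (-1, 0)
f = lambda x: x[0] ** 2 + 4 * x[1] ** 2
print(df_backtracking(f, x=[2.0, 1.0], s=[-1.0, 0.0]))
```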

33 Linesearch DF methods: choice of search directions
Example: coordinate search method
$x^1=x^0+\alpha_0 e_1$, $x^2=x^1+\alpha_1 e_2$, ..., $x^n=x^{n-1}+\alpha_{n-1}e_n$, $x^{n+1}=x^n+\alpha_n e_1$, ...
$\alpha_k$ is computed by an exact or inexact linesearch.
[Figure: iterates $x^0, x^1, \dots$ zig-zagging along the coordinate directions towards $x^*$.]
Inefficient behaviour: a coordinate direction may be (almost) orthogonal to $-\nabla f(x^k)$; see Figs. 1(c) & 2.
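A compact coordinate-search sketch; it uses the inexact sufficient-decrease condition rather than an exact linesearch, and all constants are illustrative choices.

```python
import numpy as np

def coordinate_search(f, x0, alpha0=1.0, gamma=1e-3, tau=0.5, alpha_min=1e-8, max_cycles=200):
    """Cycle through the coordinate directions +/- e_i, taking a backtracked step
    along each whenever it gives sufficient decrease rho(t) = gamma*t^2."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    alpha = alpha0
    for _ in range(max_cycles):
        improved = False
        for i in range(n):
            for sign in (+1.0, -1.0):
                s = np.zeros(n); s[i] = sign
                a = alpha
                while a > alpha_min:
                    if f(x + a * s) < f(x) - gamma * a ** 2:
                        x = x + a * s
                        improved = True
                        break
                    a *= tau
        if not improved:
            alpha *= tau                      # shrink the base stepsize
            if alpha < alpha_min:
                break
    return x

f = lambda x: (x[0] - 1) ** 2 + 5 * (x[1] + 3) ** 2
print(coordinate_search(f, [0.0, 0.0]))       # should approach (1, -3)
```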

34 Linesearch DF methods: choice of search directions
Coordinate search method (continued)...
- CS with exact linesearch: there is an example of failure to converge to a stationary point of $f$ (M.J.D. Powell, 1973).
- Efficient when the variables are essentially uncoupled (equivalent to a nearly diagonal Hessian).
Problems with coordinate search:
- when convergent, the local rate of convergence is often slower than steepest descent (Luenberger, 2003): one step of SD $\approx$ $n$ steps of CS.
- globally convergent variants exist (under strong assumptions or with a sophisticated linesearch); for example, assume that along each coordinate direction, $f$ has a unique minimizer.

35 Linesearch DF methods: choice of search directions
Other variants of coordinate search and linesearch DF:
- "Back and forth"/double-sweep method: search along $e_1,e_2,\dots,e_n,e_{n-1},\dots,e_2,e_1,e_2,\dots$
- Hooke & Jeeves: search along the $n$ coordinates, then along the line from the first to the last point in the cycle.
- Conjugate-directions algorithm (connection to derivative-based conjugate gradients).
Global convergence for GLM-DF: prevent inefficient behaviour by requiring that
$\cos\theta_k=\dfrac{-\nabla f(x^k)^T s^k}{\|\nabla f(x^k)\|\,\|s^k\|}\ge\delta>0$ for all $k$.
When the gradient of $f$ is unavailable, require instead
$\min_{v\ne 0}\max_{j\in\{0,\dots,n-1\}}\dfrac{v^T s^{k+j}}{\|v\|\,\|s^{k+j}\|}\ge\delta>0$ for all $k$
$\Longrightarrow$ $\mathrm{span}\{s^0,s^1,\dots,s^{n-1}\}=\mathbb{R}^n$.
Still not enough for global convergence: a sophisticated linesearch is needed (Lucidi et al., 2002).

36 Pattern-search methods
Motivated by the need to make use of parallelization of function evaluations in linesearch methods.
Pattern-search algorithm
Given $\epsilon>0$, $\theta_1\in(0,1)$, $\theta_2\ge 1$, choose $x^0\in\mathbb{R}^n$, a stepsize $\alpha_0>\epsilon$ and an initial direction set $D_0$; $k=0$.
While ($\alpha_k>\epsilon$), do:
1. If the sufficient decrease condition holds at $\alpha_k$ for some $s^i\in D_k$, then set $x^{k+1}=x^k+\alpha_k s^i$ and $\alpha_{k+1}=\theta_2\alpha_k$.
2. Else set $x^{k+1}=x^k$ and $\alpha_{k+1}=\theta_1\alpha_k$.
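A sketch of this loop with $D_k=\{\pm e_i\}$ and sufficient decrease $\rho(t)=\gamma t^2$. It polls all directions and takes the best one (a variant of the rule above), and the polling is written serially here, although the point of pattern search is that these evaluations can be done in parallel; constants are illustrative.

```python
import numpy as np

def pattern_search(f, x0, alpha0=1.0, eps=1e-6, theta1=0.5, theta2=1.0, gamma=1e-3, max_iter=10000):
    """Pattern-search sketch: poll D_k = {+/- e_i}, accept a best direction that
    gives sufficient decrease rho(t) = gamma*t^2, otherwise contract the stepsize."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    D = np.vstack([np.eye(n), -np.eye(n)])        # coordinate directions +/- e_i
    alpha = alpha0
    for _ in range(max_iter):
        if alpha <= eps:
            break
        fx = f(x)
        # poll all directions (these evaluations could be done in parallel)
        trials = [(f(x + alpha * s), s) for s in D]
        fbest, sbest = min(trials, key=lambda t: t[0])
        if fbest < fx - gamma * alpha ** 2:       # sufficient decrease: accept the step
            x = x + alpha * sbest
            alpha *= theta2
        else:                                     # no acceptable direction: contract
            alpha *= theta1
    return x

f = lambda x: (x[0] + 2) ** 2 + 3 * x[1] ** 2
print(pattern_search(f, [5.0, 5.0]))              # should approach (-2, 0)
```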

37 Pattern-search methods...
Instead of one search direction $s^k$, at each PS iteration we have a set of directions $D_k$. Conditions for a good set of directions $D_k$:
- at least one direction in $D_k$ should give descent, in the sense that
  $\min_{v\ne 0}\max_{s\in D_k}\dfrac{v^T s}{\|v\|\,\|s\|}\ge\delta>0$ for all $k$   (*)
  (very similar to the linesearch methods' condition earlier);
- also require that $0<s_{\min}\le\|s\|\le s_{\max}$ for all $s\in D_k$.   (**)
Note that $D_k=\{e_i,-e_i\}$ (for a single coordinate $i$) does not satisfy (*).

38 Pattern-search methods...
Suitable choices for $D_k$ that satisfy (*) and (**):
- coordinate directions: $\{e_1,e_2,\dots,e_n,-e_1,-e_2,\dots,-e_n\}$;
- $\{s^i=\tfrac{1}{2n}e-e_i\}$, $i=1,\dots,n$, and $s^{n+1}=\tfrac{1}{2n}e$, where $e=(1,1,\dots,1)^T$.
The stepsize $\alpha_k$ is fixed during the $k$th iteration; the sufficient decrease condition (see "inexact linesearch" earlier) is checked at $\alpha_k$ along each direction in $D_k$.
Suitable values for $\rho(t)$: $\gamma t^2\|s^k\|$ or $\gamma t^{3/2}$, etc.
Pattern-search software packages: APPS (Hough, Kolda & Torczon), DIRECT (Jones, Perttunen & Stuckman), etc.

39 Pattern-search methods...
[Figure, Kolda, Lewis & Torczon (SIREV): pattern search with $D_k=\{e_1,e_2,-e_1,-e_2\}$, $n=2$: (a) initial pattern, (b) move North, (c) move West, (d) move North, (e) contract, (f) move West.]

40 Simplex methods: Nelder-Mead
Nothing to do with simplex methods for linear programming.
Nelder-Mead (1965): the most popular algorithm with users of optimization: easy to understand and implement, not sophisticated. But heuristic, not rigorous or reliable, and hence not popular with optimizers.
Connection to (linear) model-based DFO: NM and simplex methods keep a simplex of points at each iteration, but do not construct a linear approximation of $f$ over this simplex; they only use function values at the vertices of the simplex and certain operations on the simplex.
Vertices of the simplex: $Y=\{x^k,y^1,\dots,y^n\}$; edges from $x^k$: matrix $M=\big(y^1-x^k\;\;y^2-x^k\;\;\dots\;\;y^n-x^k\big)$.
$M$ nonsingular $\iff$ the simplex (i.e., the convex hull of $Y$) is nondegenerate.
Connection to pattern search: set of search directions $D_k=\{s^i=y^i-x^k: i=1,\dots,n\}$.

41 The Nelder-Mead algorithm
Change of notation: $Y=\{x^1,\dots,x^{n+1}\}$ at iteration $k$, with $f(x^1)\le f(x^2)\le\dots\le f(x^{n+1})$.
Attempt to improve the worst function value $f(x^{n+1})$.
Centroid of the best $n$ points: $\bar{x}=\tfrac{1}{n}\sum_{i=1}^{n}x^i$.
Search direction: $x^{n+1}-\bar{x}$; $x(\alpha)=\bar{x}+\alpha(x^{n+1}-\bar{x})$.
Simplex operations, illustrated for $n=2$ (S. Richards, 2010): (a) reflection, (b) expansion, (c) outside contraction, (d) inside contraction, (e) shrink.

42 The Nelder-Mead algorithm
Given $\rho\ge 1$ (reflection), $\chi>\rho$ (expansion), $\gamma\in(0,1)$ (contraction) and $\sigma\in(0,1)$ (shrinkage); initial simplex $Y=\{x^1,\dots,x^{n+1}\}$ in $\mathbb{R}^n$; $k=0$.
While (TC not satisfied), do:
1. Order the vertices: $f(x^1)\le f(x^2)\le\dots\le f(x^{n+1})$.
2. (Reflect) Compute $x_r=x(-\rho)$ and $f(x_r)$.
   If $f(x^1)\le f(x_r)<f(x^n)$, replace $x^{n+1}\in Y$ by $x_r$; $k=k+1$.
   Else if $f(x_r)<f(x^1)$, then
3. (Expand) Compute $x_e=x(-\chi)$ and $f(x_e)$.
   If $f(x_e)<f(x_r)$, replace $x^{n+1}\in Y$ by $x_e$; $k=k+1$. Else replace $x^{n+1}\in Y$ by $x_r$; $k=k+1$. (End if)
   Else (i.e., $f(x_r)\ge f(x^n)$)...

43 The Nelder-Mead algorithm (continued)...
   Else (i.e., $f(x_r)\ge f(x^n)$)
4. (Contract) If $f(x^n)\le f(x_r)<f(x^{n+1})$, then (outside contraction)
     compute $x_{oc}=x(-\gamma)$ and $f(x_{oc})$.
     If $f(x_{oc})\le f(x_r)$, replace $x^{n+1}\in Y$ by $x_{oc}$; $k=k+1$. Else go to Step 5. (End if)
   Else (i.e., $f(x_r)\ge f(x^{n+1})$), then (inside contraction)
     compute $x_{ic}=x(\gamma)$ and $f(x_{ic})$.
     If $f(x_{ic})<f(x^{n+1})$, replace $x^{n+1}\in Y$ by $x_{ic}$; $k=k+1$. Else go to Step 5. (End if)
   (End if) (End if)
5. (Shrink) Define $n$ new vertices $y^i=x^1+\sigma(x^{i+1}-x^1)$, $i=1,\dots,n$, and the new simplex $Y^+=\{x^1,y^1,\dots,y^n\}$; $k=k+1$.
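A compact Python rendering of Steps 1-5 (a simplified sketch; the initial simplex, the stopping rule and the standard parameter values $\rho=1$, $\chi=2$, $\gamma=1/2$, $\sigma=1/2$ are illustrative choices):

```python
import numpy as np

def nelder_mead(f, x0, rho=1.0, chi=2.0, gamma=0.5, sigma=0.5, tol=1e-8, max_iter=500):
    """Compact Nelder-Mead sketch following the steps on the previous slides."""
    n = len(x0)
    Y = np.vstack([x0, x0 + np.eye(n)])          # initial nondegenerate simplex
    fY = np.array([f(y) for y in Y])
    for _ in range(max_iter):
        order = np.argsort(fY)                   # Step 1: sort vertices by f
        Y, fY = Y[order], fY[order]
        if np.max(np.linalg.norm(Y[1:] - Y[0], axis=1)) <= tol * max(1.0, np.linalg.norm(Y[0])):
            break                                # simplex has become too small
        xbar = Y[:-1].mean(axis=0)               # centroid of the best n points
        x = lambda a: xbar + a * (Y[-1] - xbar)  # points along the worst-vertex direction
        xr = x(-rho); fr = f(xr)                 # Step 2: reflection
        if fY[0] <= fr < fY[-2]:
            Y[-1], fY[-1] = xr, fr
        elif fr < fY[0]:                         # Step 3: expansion
            xe = x(-chi); fe = f(xe)
            Y[-1], fY[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:                                    # Step 4: contraction
            if fr < fY[-1]:
                xc = x(-gamma); fc = f(xc)       # outside contraction
                accept = fc <= fr
            else:
                xc = x(gamma); fc = f(xc)        # inside contraction
                accept = fc < fY[-1]
            if accept:
                Y[-1], fY[-1] = xc, fc
            else:                                # Step 5: shrink towards the best vertex
                Y[1:] = Y[0] + sigma * (Y[1:] - Y[0])
                fY[1:] = [f(y) for y in Y[1:]]
    return Y[np.argmin(fY)]

f = lambda x: 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2   # 2-d Rosenbrock
print(nelder_mead(f, np.array([-1.2, 1.0])))                     # should approach (1, 1)
```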

44 The Nelder-Mead algorithm: some properties
- Termination conditions: function values at the simplex vertices close to each other, or the simplex has become too small ($\max_i\|x^i-x^1\|\le\epsilon\max(1,\|x^1\|)$).
- Function-evaluation cost: $k=0$ and any shrinkage step are expensive ($n+1$ function values); otherwise, one or two function evaluations per operation.
- Limited convergence results: only for $n=1$ and $n=2$. Other simplex methods have better convergence theory; see Torczon (1991).
- Examples of failure (many), documented (McKinnon, 1998).

45 The Nelder-Mead algorithm: convergence (Lagarias et al., 1998)
Theorem 1. ($n=1$) Let $f:\mathbb{R}\to\mathbb{R}$ be a strictly convex objective with bounded level sets. Assume the initial simplex is nondegenerate. Apply the Nelder-Mead algorithm to minimizing $f$. Then both end points of the Nelder-Mead interval (i.e., the simplex in one dimension) converge to the minimizer $x^*$ of $f$.
Theorem 2. ($n=2$) Let $f:\mathbb{R}^2\to\mathbb{R}$ be a strictly convex objective with bounded level sets. Assume the initial simplex is nondegenerate and that $\rho=1$, $\chi=2$ and $\gamma=1/2$. Apply the Nelder-Mead algorithm to minimizing $f$. Then
$\lim_{k\to\infty}f(x^{1,k})=\lim_{k\to\infty}f(x^{2,k})=\lim_{k\to\infty}f(x^{3,k})$
and $\lim_{k\to\infty}\mathrm{diam}(\mathrm{conv}(Y_k))=0$.

46 Illustrations of the Nelder-Mead algorithm in action
Margaret Wright, 2013

47 A 2-d NM picture (note the ease of understanding what's happening!):

48 Nelder-Mead on the McKinnon counterexample:

49 Similar things happen on the more complicated (in)famous Rosenbrock function, $f=100(x_1^2-x_2)^2+(1-x_1)^2$, with its curving steep-sided valley. Coordinate search: 81 function evaluations, step =

50 Nelder-Mead, 76 function evaluations
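For comparison, SciPy's Nelder-Mead implementation can be run on the same function; the starting point and tolerances below are illustrative choices, so the evaluation count will differ from the 76 quoted above.

```python
import numpy as np
from scipy.optimize import minimize, rosen

# rosen is SciPy's built-in Rosenbrock function, here used in 2 dimensions.
res = minimize(rosen, x0=np.array([-1.2, 1.0]), method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-6})
print(res.x, res.fun, res.nfev)   # minimizer estimate, value, function evaluations
```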
