Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

1 Methods for Unconstrained Optimization
Numerical Optimization Lectures 1-2
Coralia Cartis, University of Oxford
INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems

2 Problems and solutions

minimize f(x) subject to x ∈ Ω ⊆ R^n.   (*)

f : Ω → R is (sufficiently) smooth (f ∈ C^i(Ω), i ∈ {1,2}). f objective; x variables; Ω feasible set (determined by finitely many constraints). n may be large.

minimizing f(x) ⟺ maximizing −f(x). Wlog, minimize.

x* global minimizer of f over Ω ⟺ f(x) ≥ f(x*) for all x ∈ Ω.

x* local minimizer of f over Ω ⟺ there exists N(x*,δ) such that f(x) ≥ f(x*) for all x ∈ Ω ∩ N(x*,δ), where N(x*,δ) := {x ∈ R^n : ‖x − x*‖ ≤ δ} and ‖·‖ is the Euclidean norm.

3 Example problem in one dimension

Example: min f(x) subject to a ≤ x ≤ b.

[Figure: graph of f(x) on [a,b], with the points x_1 and x_2 marked.]

The feasible region Ω is the interval [a,b]. The point x_1 is the global minimizer; x_2 is a local (non-global) minimizer; x = a is a constrained local minimizer.

4 Example problems in two dimensions

[Figures: surface plots of Ackley's test function and Rosenbrock's test function; see Wikipedia.]

5 Main classes of continuous optimization problems

Linear (Quadratic) programming: linear (quadratic) objective and linear constraints in the variables
  min_{x ∈ R^n} c^T x (+ ½ x^T H x)  subject to  a_i^T x = b_i, i ∈ E;  a_i^T x ≥ b_i, i ∈ I,
where c, a_i ∈ R^n for all i and H is an n × n matrix; E and I are finite index sets.

Unconstrained (Constrained) nonlinear programming
  min_{x ∈ R^n} f(x)  (subject to c_i(x) = 0, i ∈ E;  c_i(x) ≥ 0, i ∈ I)
where f, c_i : R^n → R are (smooth, possibly nonlinear) functions for all i; E and I are finite index sets.

Most real-life problems are nonlinear, often large-scale!

6 Example: an OR application [Gould '06]

Optimization of a high-pressure gas network: pressures p = (p_i), flows q = (q_j), demands d = (d_k), compressors.

Maximize net flow subject to the constraints:
  Aq − d = 0
  A^T p^2 + Kq = 0
  A_2^T q + z c(p,q) = 0
  p_min ≤ p ≤ p_max
  q_min ≤ q ≤ q_max
  A, A_2 ∈ {±1, 0};  z ∈ {0,1}

200 nodes and pipes, 26 machines: 400 variables; variable demand (p, d): 10 mins. 58,000 vars: real-time.

7 Example: an inverse-problem application [Met Office]

Data assimilation for weather forecasting: find the best estimate of the current state of the atmosphere, i.e. the initial conditions x_0 for the numerical forecast, by solving the (ill-posed) nonlinear inverse problem
  min_{x_0} Σ_{i=0}^{m} (H_i[x_i] − y_i)^T R_i^{-1} (H_i[x_i] − y_i),
where x_i = S(t_i, t_0, x_0), S is the solution operator of the discrete nonlinear model; H_i maps x_i to the observations y_i; R_i is the error covariance matrix of the observations at t_i.

x_0 is very large; the number of observations m ≈ 250,000.

8 Optimality conditions for unconstrained problems

Optimality conditions = algebraic characterizations of solutions suitable for computations. They
  provide a way to guarantee that a candidate point is optimal (sufficient conditions);
  indicate when a point is not optimal (necessary conditions).

minimize f(x) subject to x ∈ R^n.   (UP)

First-order necessary conditions: f ∈ C^1(R^n); x* a local minimizer of f ⇒ ∇f(x*) = 0.

∇f(x) = 0 ⟺ x stationary point of f.

9 Optimality conditions for unconstrained problems...

Lemma. Let f ∈ C^1, x ∈ R^n and s ∈ R^n with s ≠ 0. Then
  ∇f(x)^T s < 0  ⇒  f(x + αs) < f(x) for all α > 0 sufficiently small.

Proof. f ∈ C^1 ⇒ there exists ᾱ > 0 such that ∇f(x + αs)^T s < 0 for all α ∈ [0, ᾱ]. Taylor's/mean value theorem:
  f(x + αs) = f(x) + α ∇f(x + α̃s)^T s, for some α̃ ∈ (0, α).   (*)
(*) ⇒ f(x + αs) < f(x) for all α ∈ (0, ᾱ].

s is a descent direction for f at x if ∇f(x)^T s < 0.

Proof of the first-order necessary conditions: assume ∇f(x*) ≠ 0. Then s := −∇f(x*) is a descent direction for f at x = x*:
  ∇f(x*)^T (−∇f(x*)) = −∇f(x*)^T ∇f(x*) = −‖∇f(x*)‖^2 < 0,
since ∇f(x*) ≠ 0 and ‖a‖ ≥ 0 with equality iff a = 0. Thus, by the Lemma, x* is not a local minimizer of f.

10 Optimality conditions for unconstrained problems...

−∇f(x) is a descent direction for f at x whenever ∇f(x) ≠ 0.

s is a descent direction for f at x if ∇f(x)^T s < 0, which is equivalent to
  cos ∠(−∇f(x), s) = (−∇f(x))^T s / (‖∇f(x)‖ ‖s‖) = −∇f(x)^T s / (‖∇f(x)‖ ‖s‖) > 0,
and so ∠(−∇f(x), s) ∈ [0, π/2).

[Figure: a descent direction p_k makes an angle of less than π/2 with −∇f_k.]

11 Summary of first-order conditions. A look ahead

minimize f(x) subject to x ∈ R^n.   (UP)

First-order necessary optimality conditions: f ∈ C^1(R^n); x* a local minimizer of f ⇒ ∇f(x*) = 0.

Also, x̄ = argmax_{x ∈ R^n} f(x) ⇒ ∇f(x̄) = 0.

[Figure: f(x) on [a,b], with its minimizers and maximizers marked.]

Look at higher-order derivatives to distinguish between minimizers and maximizers... except for convex functions: x* local minimizer/stationary point ⇒ x* global minimizer. [see Problem Sheet]

12 Second-order optimality conditions (nonconvex fcts.)

Second-order necessary conditions: f ∈ C^2(R^n); x* local minimizer of f ⇒ ∇^2 f(x*) positive semidefinite, i.e.,
  s^T ∇^2 f(x*) s ≥ 0 for all s ∈ R^n.
Example: f(x) := x^3, x* = 0 is not a local minimizer, but f'(0) = f''(0) = 0.

Second-order sufficient conditions: f ∈ C^2(R^n); ∇f(x*) = 0 and ∇^2 f(x*) positive definite, namely
  s^T ∇^2 f(x*) s > 0 for all s ≠ 0,
⇒ x* (strict) local minimizer of f. [local convexity]
Example: f(x) := x^4, x* = 0 is a (strict) local minimizer, but f''(0) = 0.
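The conditions above translate directly into a computational check. Below is a minimal sketch (not from the lectures; the routine name and tolerance are illustrative) that tests the first- and second-order conditions numerically via the gradient norm and the eigenvalues of the Hessian.

```python
import numpy as np

def classify_stationary_point(grad, hess, x, tol=1e-8):
    """Check the first- and second-order optimality conditions at x,
    given callables for the gradient and the Hessian."""
    g = np.asarray(grad(x), dtype=float)
    if np.linalg.norm(g) > tol:
        return "not stationary: first-order necessary condition fails"
    eigvals = np.linalg.eigvalsh(np.asarray(hess(x), dtype=float))
    if np.all(eigvals > tol):
        return "strict local minimizer (second-order sufficient conditions hold)"
    if np.all(eigvals >= -tol):
        return "second-order necessary conditions hold (test inconclusive)"
    return "not a local minimizer (Hessian has a negative eigenvalue)"

# Example: f(x1,x2) = x1^2 - x2^2 has a saddle point at the origin.
grad = lambda x: np.array([2*x[0], -2*x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])
print(classify_stationary_point(grad, hess, np.zeros(2)))
```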

13 Stationary points of quadratic functions

[Figures: quadratic functions whose stationary point x* is (a) a minimum, (b) a maximum, (c) a saddle, (d) semidefinite, together with the corresponding contour plots: (e) maximum or minimum, (f) saddle, (g) semidefinite.]

For H ∈ R^{n×n} and g ∈ R^n, let q(x) := g^T x + ½ x^T H x. Then ∇q(x*) = 0 ⟺ Hx* + g = 0, and ∇^2 q(x) = H for all x.

General f: approximately locally quadratic around a stationary point x*.

14 Methods for local unconstrained optimization

minimize f(x) subject to x ∈ R^n   (UP)   [f ∈ C^1(R^n) or f ∈ C^2(R^n)]

A Generic Method (GM)
Choose ε > 0 and x_0 ∈ R^n. While (TERMINATION CRITERIA not achieved), REPEAT:
  compute the change x_{k+1} − x_k = F(x_k, problem data) [linesearch, trust-region] to ensure f(x_{k+1}) < f(x_k);
  set x_{k+1} := x_k + F(x_k, prob. data), k := k + 1.

TC: ‖∇f(x_k)‖ ≤ ε; maybe also, λ_min(∇^2 f(x_k)) ≥ −ε.

e.g., x_{k+1} minimizer of some (simple) model of f around x_k ⇒ linesearch, trust-region methods.
If F = F(x_k, x_{k−1}, problem data) ⇒ conjugate gradients method.
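As an illustration (not part of the slides), the GM loop can be written as a short driver in which the update rule F is passed in as a function; all names and defaults below are placeholders.

```python
import numpy as np

def generic_method(f, grad, x0, step_rule, eps=1e-6, max_iter=1000):
    """Generic Method (GM): repeat x_{k+1} = x_k + F(x_k, data) until the
    termination criterion ||grad f(x_k)|| <= eps holds. `step_rule` is any
    rule (linesearch, trust-region, ...) returning a change that decreases f."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) <= eps:   # termination criterion (TC)
            break
        x = x + step_rule(f, grad, x)        # the change must ensure descent
    return x, k
```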

15 Issues to consider about GM

Global convergence of GM: if ε := 0 and any x_0 ∈ R^n, does ∇f(x_k) → 0 as k → ∞? [maybe also, lim inf_k λ_min(∇^2 f(x_k)) ≥ 0?]

Local convergence of GM: if ε := 0 and x_0 is sufficiently close to x*, a stationary point/local minimizer of f, does x_k → x* as k → ∞?

Global/local complexity of GM: count the number of iterations, and their cost, required by GM to generate x_k within desired accuracy ε > 0, e.g., such that ‖∇f(x_k)‖ ≤ ε. [connection to convergence and its rate]

Rate of global/local convergence of GM.

16 Rates of convergence of sequences: an example

l_k := (1/2)^k → 0 linearly,  q_k := (1/2)^{2^k} → 0 quadratically,  s_k := k^{−k} → 0 superlinearly, as k → ∞.

[Table: values of l_k, q_k and s_k for increasing k; rates of convergence on a log scale. Notation: (−i) := 10^{−i}.]

17 Rates of convergence of sequences

{x_k} ⊂ R^n, x* ∈ R^n; x_k → x* as k → ∞.

p-rate of convergence: x_k → x* with rate p ≥ 1 if there exist ρ > 0 and k_0 ≥ 0 such that
  ‖x_{k+1} − x*‖ ≤ ρ ‖x_k − x*‖^p for all k ≥ k_0.
ρ is the convergence factor; e_k := ‖x_k − x*‖ is the error in x_k ≈ x*.

Linear convergence: p = 1, ρ < 1; (asymptotically) the number of correct digits grows linearly in the number of iterations.

Quadratic convergence: p = 2; (asymptotically) the number of correct digits grows exponentially in the number of iterations.

Superlinear convergence: ‖x_{k+1} − x*‖ / ‖x_k − x*‖ → 0 as k → ∞. [faster than linear, slower than quadratic; practically very acceptable]
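These definitions can be checked numerically on the example sequences of the previous slide; the short script below (an illustration, not lecture material) prints the ratios that each rate controls.

```python
# l_k = (1/2)^k (linear), q_k = (1/2)^(2^k) (quadratic), s_k = k^(-k) (superlinear).
for k in range(1, 6):
    lin_ratio = 0.5**(k+1) / 0.5**k                      # constant 1/2: linear, rho = 1/2
    quad_ratio = 0.5**(2**(k+1)) / (0.5**(2**k))**2      # constant 1: quadratic, rho = 1
    super_ratio = float(k+1)**(-(k+1)) / float(k)**(-k)  # tends to 0: superlinear
    print(f"k={k}: linear ratio {lin_ratio:.3f}, "
          f"quadratic ratio {quad_ratio:.3f}, "
          f"superlinear ratio {super_ratio:.3e}")
```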

18 Linesearch methods

19 A generic linesearch method

(UP): minimize f(x) subject to x ∈ R^n, where f ∈ C^1 or C^2.

A Generic Linesearch Method (GLM)
Choose ε > 0 and x_0 ∈ R^n. While ‖∇f(x_k)‖ > ε, REPEAT:
  compute a descent search direction s_k ∈ R^n, i.e. ∇f(x_k)^T s_k < 0;
  compute a stepsize α_k > 0 along s_k such that f(x_k + α_k s_k) < f(x_k);
  set x_{k+1} := x_k + α_k s_k and k := k + 1.

Recall the property of descent directions.

20 Performing a linesearch

How to compute α_k?

Exact linesearch: α_k := argmin_{α>0} f(x_k + α s_k); computationally expensive for nonlinear objectives.

[Figures: exact linesearch iterates for quadratic functions.]

21 Performing a linesearch...

Inexact linesearch: want the stepsize α_k not to be
  too short,
  too long compared to the decrease in f.

[Figures: iterates (x_i, f(x_i)) illustrating stepsizes that are too short and stepsizes that are too long relative to the achieved decrease in f.]

22 Performing a linesearch...

Inexact linesearch: want the stepsize α_k not too long compared to the decrease in f.

The Armijo condition
Choose β ∈ (0,1). Compute α_k > 0 such that
  f(x_k + α_k s_k) ≤ f(x_k) + β α_k ∇f(x_k)^T s_k   (*)
is satisfied.

In practice, β := 0.1 or even smaller.
Due to the descent condition, there exists ᾱ_k > 0 such that (*) holds for all α ∈ [0, ᾱ_k]. Choose α_k ∈ (0, ᾱ_k] as large as possible.

23 Performing a linesearch...

Inexact linesearch... Let Φ_k : R → R, Φ_k(α) := f(x_k + α s_k), α ≥ 0. Then the Armijo condition reads
  Φ_k(α_k) ≤ Φ_k(0) + β α_k Φ'_k(0).
Let y_β(α) := Φ_k(0) + β α Φ'_k(0), α ≥ 0.

[Figure: the curve y = Φ(α) together with the lines y = y_0(α), y = y_0.6(α) and y = y_1(α); the acceptable stepsizes form an interval α ∈ (0, A].]

24 Performing a linesearch...

Inexact linesearch

The backtracking-Armijo (bArmijo) linesearch algorithm
Choose α^(0) > 0, τ ∈ (0,1) and β ∈ (0,1). While
  f(x_k + α^(i) s_k) > f(x_k) + β α^(i) ∇f(x_k)^T s_k,
REPEAT: set α^(i+1) := τ α^(i) and i := i + 1. END. Set α_k := α^(i).

Backtracking ⇒ the stepsize α_k is not too short; e.g. α^(0) := 1, τ := 0.5 ⇒ α^(0) = 1, α^(1) = 0.5, α^(2) = 0.25, ...

The bArmijo linesearch algorithm terminates in a finite number of steps with α_k > 0, due to the descent condition.
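A direct transcription of the bArmijo loop (an illustrative sketch, not the lecturer's code; the default parameter values are assumptions):

```python
import numpy as np

def backtracking_armijo(f, grad, x, s, alpha0=1.0, tau=0.5, beta=0.1, max_backtracks=60):
    """Backtracking-Armijo linesearch: shrink alpha by the factor tau until the
    Armijo condition f(x + alpha*s) <= f(x) + beta*alpha*grad(x)^T s holds.
    Assumes s is a descent direction, i.e. grad(x)^T s < 0."""
    fx, slope = f(x), grad(x) @ s
    alpha = alpha0
    for _ in range(max_backtracks):
        if f(x + alpha * s) <= fx + beta * alpha * slope:
            break
        alpha *= tau
    return alpha
```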

25 Steepest descent method

Steepest-descent (SD) direction: set s_k := −∇f(x_k), k ≥ 0.

s_k is a descent direction whenever ∇f(x_k) ≠ 0:
  ∇f(x_k)^T s_k < 0 ⟺ ∇f(x_k)^T (−∇f(x_k)) < 0 ⟺ −‖∇f(x_k)‖^2 < 0.

s_k is the steepest-descent direction: the unique global solution of
  minimize_{s ∈ R^n} f(x_k) + s^T ∇f(x_k) subject to ‖s‖ = ‖∇f(x_k)‖.
Cauchy-Schwarz: s^T ∇f(x_k) ≥ −‖s‖ ‖∇f(x_k)‖ for all s, with equality iff s is proportional to −∇f(x_k).

Method of steepest descent (SD): GLM with s_k = SD direction; any linesearch.
SD-e := SD method with exact linesearch; SD-bA := SD method with bArmijo linesearch.
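Putting the SD direction and the bArmijo stepsize together gives the SD-bA method. A self-contained sketch (again illustrative, not the lecturer's code; the test function is the scaled Rosenbrock function used later in the slides):

```python
import numpy as np

def steepest_descent_bA(f, grad, x0, eps=1e-6, beta=0.1, tau=0.5, max_iter=100_000):
    """SD-bA: generic linesearch method with s_k = -grad f(x_k) and a
    backtracking-Armijo stepsize starting from alpha = 1."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        s, fx, slope, alpha = -g, f(x), -(g @ g), 1.0
        while f(x + alpha * s) > fx + beta * alpha * slope:
            alpha *= tau                      # backtrack
        x = x + alpha * s
    return x, k

# f(x1,x2) = 10(x2 - x1^2)^2 + (x1 - 1)^2, minimizer x* = (1,1)
f = lambda x: 10*(x[1] - x[0]**2)**2 + (x[0] - 1)**2
grad = lambda x: np.array([-40*x[0]*(x[1] - x[0]**2) + 2*(x[0] - 1),
                           20*(x[1] - x[0]**2)])
print(steepest_descent_bA(f, grad, [-1.2, 1.0]))   # converges, but slowly
```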

26 Global convergence of steepest descent methods

f ∈ C^1(R^n); ∇f is Lipschitz continuous (on R^n) iff there exists L > 0 such that
  ‖∇f(y) − ∇f(x)‖ ≤ L‖y − x‖ for all x, y ∈ R^n.

Theorem. Let f ∈ C^1(R^n) be bounded below on R^n. Let ∇f be Lipschitz continuous. Apply the SD-e or the SD-bA method to minimizing f with ε := 0. Then both variants of the SD method have the property:
  either there exists l ≥ 0 such that ∇f(x_l) = 0,
  or ‖∇f(x_k)‖ → 0 as k → ∞.

SD methods have excellent global convergence properties (under weak assumptions).

27 Some disadvantages of steepest descent methods

SD methods are scale-dependent: poorly scaled variables ⇒ the SD direction gives little progress.

The effect of problem scaling on SD-e performance: let f(x_1,x_2) = ½(a x_1^2 + x_2^2), where a > 0.

[Figures: SD-e iterates; left figure: mildly poor scaling; right figure: a = 1 (perfect scaling).]

28 Some disadvantages of steepest descent methods...

Usually, SD methods converge very slowly to the solution, asymptotically:
  theory: very slow convergence;
  numerics: break-down (accumulation of round-off and ill-conditioning).

f(x_1,x_2) = 10(x_2 − x_1^2)^2 + (x_1 − 1)^2.

[Figure: SD-bA applied to the Rosenbrock function f.]

29 Local rate of convergence for steepest descent

Locally, SD converges linearly to a solution, namely
  f(x_{k+1}) − f(x*) ≤ ρ (f(x_k) − f(x*)) for all k sufficiently large,
BUT the convergence factor ρ is usually very close to 1!

Theorem. f ∈ C^2; x* local minimizer of f with ∇^2 f(x*) positive definite with largest/smallest eigenvalues λ_max, λ_min. Apply SD-e to min f. If x_k → x* as k → ∞, then f(x_k) converges linearly to f(x*), with
  ρ ≤ ((κ(x*) − 1)/(κ(x*) + 1))^2 =: ρ_SD,
where κ(x*) = λ_max/λ_min is the condition number of ∇^2 f(x*).

In practice, ρ ≈ ρ_SD; for the Rosenbrock f, ρ_SD is very close to 1. For instance, with κ(x*) = 800, f(x_0) = 1 and f(x*) = 0, the bound only guarantees f(x_k) ≈ ρ_SD^k ≈ 7·10^{-3} after 1000 iterations!

30 Other directions for GLMs

Let B_k be a symmetric, positive definite matrix and let s_k be defined by
  B_k s_k = −∇f(x_k).   (*)
⇒ s_k is a descent direction; s_k solves the problem
  minimize_{s ∈ R^n} m_k(s) = f(x_k) + ∇f(x_k)^T s + ½ s^T B_k s.

If ∇^2 f(x_k) is positive definite and B_k := ∇^2 f(x_k) ⇒ damped Newton method.

(*) is a scaled steepest-descent direction; the resulting GLMs can be made scale-invariant for some B_k (e.g., B_k = ∇^2 f(x_k)); local convergence can be made faster than for SD methods for some B_k (e.g., quadratic for Newton).
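For concreteness, a sketch of the damped Newton variant of the GLM (illustrative only; it assumes, as on this slide, that the Hessian is positive definite at the iterates so that the Newton direction is a descent direction):

```python
import numpy as np

def damped_newton_bA(f, grad, hess, x0, eps=1e-8, beta=0.1, tau=0.5, max_iter=100):
    """Damped Newton: B_k = Hessian (assumed positive definite), direction from
    B_k s_k = -grad f(x_k), stepsize by backtracking-Armijo starting at alpha = 1."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        s = np.linalg.solve(hess(x), -g)     # Newton direction
        fx, slope, alpha = f(x), g @ s, 1.0
        while f(x + alpha * s) > fx + beta * alpha * slope:
            alpha *= tau
        x = x + alpha * s
    return x, k
```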

31 Global convergence for general GLMs

Theorem. Let f ∈ C^1(R^n) be bounded below on R^n. Let ∇f be Lipschitz continuous. Let the eigenvalues of B_k be uniformly bounded above and bounded below away from zero, for all k. Apply the GLM to min f with the bArmijo linesearch, s_k from (*) and ε = 0. Then
  either there exists l ≥ 0 such that ∇f(x_l) = 0,
  or ‖∇f(x_k)‖ → 0 as k → ∞.

The Theorem requires locally strongly convex quadratic models of f for all k. In particular, the Theorem holds if f is locally strongly convex at all iterates.

See the Problem Sheet for examples where Newton's method (without linesearch) fails to converge globally.

32 Local convergence for damped Newton's method

Theorem. Let f ∈ C^2(R^n) and let ∇^2 f be Lipschitz continuous and uniformly positive definite. Let s_k be the Newton direction in the GLM and let the bArmijo linesearch have β < 0.5 and α^(0) = 1. Then, provided limit points of the sequence of iterates exist,
  α_k = 1 for all sufficiently large k;
  x_k → x* as k → ∞ quadratically.

33 Local convergence for damped Newton...

f(x_1,x_2) = 10(x_2 − x_1^2)^2 + (x_1 − 1)^2;  x* = (1,1).

[Figure: damped Newton with bArmijo linesearch applied to the Rosenbrock function f.]

34 Modified Newton methods

If ∇^2 f(x_k) is not positive definite, it is usual to solve instead
  (∇^2 f(x_k) + M_k) s_k = −∇f(x_k),
where M_k is chosen such that ∇^2 f(x_k) + M_k is sufficiently positive definite, and M_k := 0 when ∇^2 f(x_k) is itself sufficiently positive definite.

Popular option: modified Cholesky. Compute a Cholesky factorization ∇^2 f(x_k) = L_k (L_k)^T, where L_k is a lower triangular matrix, and modify the generated L_k if the factorization is in danger of failing (modify small or negative diagonal pivots, etc.).
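A common, simpler alternative to a full modified-Cholesky factorization is to take M_k as a multiple of the identity, increased until the factorization succeeds. The sketch below is an illustration of that idea, not the modified-Cholesky algorithm of the slide; the starting value and growth factor are assumptions.

```python
import numpy as np

def regularized_newton_direction(H, g, tau0=1e-3, growth=10.0):
    """Compute s from (H + M) s = -g with M = tau*I, increasing tau until
    H + tau*I admits a Cholesky factorization (i.e. is positive definite).
    tau = 0 is kept when H is already positive definite."""
    n = H.shape[0]
    tau = 0.0
    while True:
        try:
            L = np.linalg.cholesky(H + tau * np.eye(n))   # raises if not PD
            break
        except np.linalg.LinAlgError:
            tau = max(tau0, growth * tau)
    # Solve L L^T s = -g by forward and backward substitution.
    y = np.linalg.solve(L, -g)
    return np.linalg.solve(L.T, y)
```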

35 Quasi-Newton methods

Secant approximations for computing B_k ≈ ∇^2 f(x_k) [when ∇^2 f is unavailable exactly or via function/gradient finite differences].

At the start of the GLM, choose B_0 (say, B_0 := I). After computing s_k from B_k s_k = −∇f(x_k) and x_{k+1} = x_k + α_k s_k, compute an update B_{k+1} of B_k.

Wish list:
  compute B_{k+1} as a function of already-computed quantities ∇f(x_{k+1}), ∇f(x_k), ..., ∇f(x_0), B_k, s_k;
  B_{k+1} should be symmetric, nonsingular (pos. def.?);
  B_{k+1} close to B_k, a cheap update of B_k;
  B_k ≈ ∇^2 f(x_k), etc.

⇒ a new class of methods: faster than the steepest descent method, cheaper to compute per iteration than Newton's.

36 Quasi-Newton methods...

For the first wish, choose B_{k+1} to satisfy the secant equation
  γ_k := ∇f(x_{k+1}) − ∇f(x_k) = B_{k+1}(x_{k+1} − x_k) = B_{k+1} α_k s_k.
It is satisfied by B_{k+1} = ∇^2 f when f is a quadratic function. The change in gradient contains information about the Hessian.

Many ways to compute B_{k+1} to satisfy the secant equation; trade-offs between the wishes: rank-1 update of B_k (Symmetric Rank-1 (SR1)), rank-2 updates of B_k (BFGS (Broyden-Fletcher-Goldfarb-Shanno), DFP (Davidon-Fletcher-Powell)), etc.

Work per iteration: O(n^2) (as opposed to the O(n^3) of Newton).
For global convergence, must use the Wolfe linesearch instead of the bArmijo linesearch.
Local Q-superlinear convergence. [see Problem Sheet]
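As an example of such an update, here is the BFGS (rank-2) formula with a numerical check that the secant equation holds afterwards (a sketch; the safeguard tolerance and the test data are assumptions):

```python
import numpy as np

def bfgs_update(B, step, gamma):
    """BFGS update of the Hessian approximation B, so that B_new @ step = gamma.
    step = x_{k+1} - x_k (= alpha_k s_k); gamma = grad f(x_{k+1}) - grad f(x_k).
    Skipped if the curvature condition step^T gamma > 0 fails (a Wolfe
    linesearch guarantees it)."""
    Bs, sy = B @ step, step @ gamma
    if sy <= 1e-10 * np.linalg.norm(step) * np.linalg.norm(gamma):
        return B                                     # no positive curvature: skip
    return B - np.outer(Bs, Bs) / (step @ Bs) + np.outer(gamma, gamma) / sy

# Check the secant equation on data generated from a positive definite "Hessian".
H_true = np.array([[2.0, 0.3, 0.0], [0.3, 1.5, 0.2], [0.0, 0.2, 1.0]])
step = np.array([0.4, -0.1, 0.7])
gamma = H_true @ step                                # ensures step^T gamma > 0
B1 = bfgs_update(np.eye(3), step, gamma)
print(np.allclose(B1 @ step, gamma))                 # True
```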

37 Trust-region methods

38 Linesearch versus trust-region methods

(UP): minimize f(x) subject to x ∈ R^n.

Linesearch methods: liberal in the choice of search direction, keeping bad behaviour under control by the choice of α_k.
  Choose a descent direction s_k, compute a stepsize α_k to reduce f(x_k + α s_k), update x_{k+1} := x_k + α_k s_k.

Trust-region (TR) methods: conservative in the choice of search direction, so that a full step along it may really reduce the objective.
  Pick a direction s_k to reduce a local model of f(x_k + s_k); accept x_{k+1} := x_k + s_k if the decrease in the model is also achieved by f(x_k + s_k), else set x_{k+1} := x_k and refine the model.

39 Trust-region models for unconstrained problems

Approximate f(x_k + s) by:
  the linear model  l_k(s) := f(x_k) + s^T ∇f(x_k), or
  the quadratic model  q_k(s) := f(x_k) + s^T ∇f(x_k) + ½ s^T ∇^2 f(x_k) s.

Impediments:
  the models may not resemble f(x_k + s) when s is large;
  the models may be unbounded from below:
    l_k(s) is always unbounded below (unless ∇f(x_k) = 0);
    q_k(s) is always unbounded below if ∇^2 f(x_k) is negative definite or indefinite, and sometimes if ∇^2 f(x_k) is positive semidefinite.

40 Trust-region models and subproblem

Prevent bad approximations by trusting the model only in a trust region, defined by the trust-region constraint
  ‖s‖ ≤ Δ_k,   (R)
for some appropriate radius Δ_k > 0. The constraint (R) also prevents l_k, q_k from unboundedness!

⇒ the trust-region subproblem
  min_{s ∈ R^n} m_k(s) subject to ‖s‖ ≤ Δ_k,   (TR)
where m_k := l_k, k ≥ 0, or m_k := q_k, k ≥ 0. From now on, m_k := q_k.

(TR) is easier to solve than (UP). We may even solve (TR) only approximately.

41 Trust-region models and subproblem - an example

[Figures: linear and quadratic trust-region models of f about several points, for various radii Δ.]

Trust-region models of f(x) = x_1^4 + x_1 x_2 + (1 + x_2)^2.

42 Generic trust-region method

Let s_k be a(n approximate) solution of (TR). Then:
  predicted model decrease: m_k(0) − m_k(s_k) = f(x_k) − m_k(s_k);
  actual function decrease: f(x_k) − f(x_k + s_k).

The trust-region radius Δ_k is chosen based on the value of
  ρ_k := (f(x_k) − f(x_k + s_k)) / (f(x_k) − m_k(s_k)).

If ρ_k is not too much smaller than 1, set x_{k+1} := x_k + s_k and Δ_{k+1} ≥ Δ_k. If ρ_k is close to (or larger than) 1, Δ_k is increased. If ρ_k ≪ 1, set x_{k+1} = x_k and reduce Δ_k.

43 A Generic Trust Region (GTR) method

Given Δ_0 > 0, x_0 ∈ R^n, ε > 0. While ‖∇f(x_k)‖ ≥ ε, do:
1. Form the local quadratic model m_k(s) of f(x_k + s).
2. Solve (approximately) the (TR) subproblem for s_k with m_k(s_k) < f(x_k) ("sufficiently"). Compute ρ_k := [f(x_k) − f(x_k + s_k)] / [f(x_k) − m_k(s_k)].
3. If ρ_k ≥ 0.9, then [very successful step] set x_{k+1} := x_k + s_k and Δ_{k+1} := 2Δ_k.
   Else, if ρ_k ≥ 0.1, then [successful step] set x_{k+1} := x_k + s_k and Δ_{k+1} := Δ_k.
   Else [unsuccessful step] set x_{k+1} = x_k and Δ_{k+1} := ½ Δ_k.
4. Let k := k + 1.
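The GTR loop, with the (TR) subproblem solved only approximately at the Cauchy point defined two slides further on, can be sketched as follows (illustrative code; the parameter values are those of the slide, everything else is an assumption):

```python
import numpy as np

def cauchy_step(g, H, delta):
    """Minimize the quadratic model along -g inside the trust region ||s|| <= delta."""
    gnorm, gHg = np.linalg.norm(g), g @ H @ g
    alpha = delta / gnorm                        # step to the boundary along -g
    if gHg > 0:
        alpha = min(alpha, gnorm**2 / gHg)       # unconstrained 1-D minimizer
    return -alpha * g

def gtr(f, grad, hess, x0, delta0=1.0, eps=1e-6, max_iter=1000):
    """Generic Trust Region (GTR) method with the slide's parameters:
    rho >= 0.9 very successful (double the radius), rho >= 0.1 successful,
    otherwise reject the step and halve the radius."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        H = hess(x)
        s = cauchy_step(g, H, delta)             # approximate (TR) solution
        model_decrease = -(g @ s + 0.5 * s @ H @ s)
        rho = (f(x) - f(x + s)) / model_decrease
        if rho >= 0.9:
            x, delta = x + s, 2.0 * delta
        elif rho >= 0.1:
            x = x + s
        else:
            delta = 0.5 * delta
    return x, k
```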

44 Trust-region methods

Other sensible values of the parameters of the GTR are possible.

Solving the (TR) subproblem
  min_{s ∈ R^n} m_k(s) subject to ‖s‖ ≤ Δ_k   (TR)
exactly, or even approximately, may imply considerable work. We want a minimal condition of sufficient decrease in the model that ensures global convergence of the GTR method (the Cauchy condition). In practice, we (usually) do much better than this condition!

Example of applying a trust-region method: [Sartenaer, 2008]; approximate solution of the (TR) subproblem: better than Cauchy, but not exact; notation: Δf/Δm_k ≡ ρ_k.

45 The Cauchy point of the (TR) subproblem

Recall that the steepest descent method has strong (theoretical) global convergence properties; the same will hold for the GTR method with the SD direction.

Minimal condition of sufficient decrease in the model: require
  m_k(s_k) ≤ m_k(s_k^C) and ‖s_k‖ ≤ Δ_k,
where s_k^C := −α_k^C ∇f(x_k), with
  α_k^C := argmin_{α>0} m_k(−α ∇f(x_k)) subject to α‖∇f(x_k)‖ ≤ Δ_k.
[i.e. a linesearch along the steepest descent direction is applied to m_k at x_k and is restricted to the trust region.]

Easy: α_k^C := argmin_α m_k(−α ∇f(x_k)) subject to 0 < α ≤ Δ_k/‖∇f(x_k)‖. [see Problem Sheet]

y_k^C := x_k + s_k^C is the Cauchy point.
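For reference, minimizing the one-dimensional quadratic ϕ(α) := m_k(−α ∇f(x_k)) over that interval gives the standard closed form sketched below (essentially the Problem Sheet calculation, under the assumption that m_k is the quadratic model with Hessian B_k and g_k := ∇f(x_k)):

```latex
% phi(alpha) = f(x_k) - alpha ||g_k||^2 + (alpha^2/2) g_k^T B_k g_k,
% minimized over 0 < alpha <= Delta_k / ||g_k||.
\alpha_k^C =
\begin{cases}
  \min\!\left( \dfrac{\|g_k\|^2}{g_k^T B_k g_k},\ \dfrac{\Delta_k}{\|g_k\|} \right),
    & \text{if } g_k^T B_k g_k > 0,\\[2ex]
  \dfrac{\Delta_k}{\|g_k\|},
    & \text{otherwise (the model decreases all the way to the boundary).}
\end{cases}
```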

46 Global convergence of the GTR method

Theorem (GTR Global Convergence). Let f be smooth on R^n and bounded below on R^n. Let ∇f be Lipschitz continuous on R^n. Let {x_k}, k ≥ 0, be generated by the generic trust region (GTR) method, and let the computation of s_k be such that m_k(s_k) ≤ m_k(s_k^C) for all k. Then
  either there exists k ≥ 0 such that ∇f(x_k) = 0,
  or lim_{k→∞} ‖∇f(x_k)‖ = 0.

47 Solving the (TR) subproblem

On each TR iteration we compute or approximate the solution of
  min_{s ∈ R^n} m_k(s) = f(x_k) + s^T ∇f(x_k) + ½ s^T ∇^2 f(x_k) s subject to ‖s‖ ≤ Δ_k.

Also, s_k must satisfy the Cauchy condition m_k(s_k) ≤ m_k(s_k^C), where s_k^C := −α_k^C ∇f(x_k), with
  α_k^C := argmin_{α>0} m_k(−α ∇f(x_k)) subject to α‖∇f(x_k)‖ ≤ Δ_k.
[The Cauchy condition ensures global convergence.]

Solve (TR) exactly (i.e., compute the global minimizer of (TR)) ⇒ TR akin to a Newton-like method.
Solve (TR) approximately (i.e., an approximate global minimizer) ⇒ suitable for large-scale problems.

48 Solving the (TR) subproblem exactly

For h ∈ R, Δ > 0, g ∈ R^n and H an n × n real matrix, consider
  min_{s ∈ R^n} m(s) := h + s^T g + ½ s^T H s subject to ‖s‖ ≤ Δ.   (TR)

Characterization result for the solution of (TR):

Theorem. Any global minimizer s* of (TR) satisfies the equation
  (H + λ* I) s* = −g,
where H + λ* I is positive semidefinite, λ* ≥ 0, λ*(‖s*‖ − Δ) = 0 and ‖s*‖ ≤ Δ. If H + λ* I is positive definite, then s* is unique.

The above Theorem gives necessary and sufficient global optimality conditions for a nonconvex optimization problem!

49 Solving the (TR) subproblem exactly

Computing the global solution s* of (TR):

Case 1. If H is positive definite and the solution of Hs = −g satisfies ‖s‖ ≤ Δ, then s* := s (unique), λ* := 0 (by the Theorem).

Case 2. If H is positive definite but ‖s‖ > Δ, or H is not positive definite, then the Theorem implies that s* satisfies
  (H + λI)s = −g,  ‖s‖ = Δ,   (**)
for some λ ≥ max{0, −λ_min(H)} =: λ̄. Let s(λ) := −(H + λI)^{-1} g, for any λ > λ̄. Then s* = s(λ*), where λ* ≥ λ̄ solves
  ‖s(λ)‖ = Δ, λ ≥ λ̄,
a nonlinear equation in the single variable λ. Use Newton's method to solve it.
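A basic implementation of this recipe (a sketch in the spirit of the Moré-Sorensen algorithm; it applies Newton's method to the equivalent secular equation 1/‖s(λ)‖ − 1/Δ = 0 and ignores the so-called hard case, so it is illustrative rather than robust):

```python
import numpy as np

def tr_subproblem_exact(H, g, delta, tol=1e-10, max_iter=100):
    """Global solution of min h + s^T g + 0.5 s^T H s  s.t. ||s|| <= delta.
    Case 1: interior Newton step. Case 2: Newton iteration on the boundary
    equation ||s(lambda)|| = delta, with s(lambda) = -(H + lambda I)^{-1} g."""
    n = H.shape[0]
    eigmin = np.linalg.eigvalsh(H)[0]
    lam_bar = max(0.0, -eigmin)                       # = max{0, -lambda_min(H)}
    if eigmin > 0:                                    # H positive definite: try Case 1
        s = np.linalg.solve(H, -g)
        if np.linalg.norm(s) <= delta:
            return s, 0.0
    lam = lam_bar + 1e-8                              # Case 2: boundary solution
    for _ in range(max_iter):
        L = np.linalg.cholesky(H + lam * np.eye(n))   # H + lam*I is positive definite
        s = np.linalg.solve(L.T, np.linalg.solve(L, -g))
        snorm = np.linalg.norm(s)
        if abs(snorm - delta) <= tol * delta:
            break
        w = np.linalg.solve(L, s)
        lam += (snorm / np.linalg.norm(w))**2 * (snorm - delta) / delta   # Newton step
        lam = max(lam, lam_bar + 1e-12)               # keep lambda > lambda_bar
    return s, lam
```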

50 Further considerations on trust-region methods

Solving the (TR) subproblem accurately for large-scale (and dense) problems uses iterative methods (conjugate gradients, Lanczos); the Cauchy condition is satisfied.

Quasi-Newton methods/approximate derivatives are also possible in the trust-region framework (no need for positive definite updates of the Hessian); replace ∇^2 f(x_k) with an approximation B_k in the quadratic local model m_k(s).

Conclusions: state-of-the-art software exists implementing linesearch and TR methods; similar performance for linesearch and TR methods (more heuristic approaches are needed by linesearch methods to deal with negative curvature). Choosing between the two is mostly a matter of taste.

51 Numerical optimization bibliography

J. Nocedal & S. J. Wright, Numerical Optimization, Springer, 2nd edition, 2006.
R. Fletcher, Practical Methods of Optimization, Wiley, 2nd edition, 1987.
J. Dennis and R. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, (republished by) SIAM, 1996.

Information on existing software can be found at the NEOS Center: look under the Optimization Guide and Optimization Tree, etc.
