Methods for Unconstrained Optimization: Numerical Optimization Lectures 1-2


Methods for Unconstrained Optimization. Numerical Optimization Lectures 1-2. Coralia Cartis, University of Oxford. InFoMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems.

Problems and solutions

minimize $f(x)$ subject to $x \in \Omega \subseteq \mathbb{R}^n$, $\quad(*)$

where $f : \Omega \to \mathbb{R}$ is (sufficiently) smooth ($f \in C^i(\Omega)$, $i \in \{1,2\}$). $f$ is the objective, $x$ the variables, $\Omega$ the feasible set (determined by finitely many constraints); $n$ may be large. Minimizing $f(x)$ is equivalent to maximizing $-f(x)$; wlog, we minimize.

$x^*$ is a global minimizer of $f$ over $\Omega$ iff $f(x) \ge f(x^*)$ for all $x \in \Omega$. $x^*$ is a local minimizer of $f$ over $\Omega$ iff there exists $N(x^*,\delta)$ such that $f(x) \ge f(x^*)$ for all $x \in \Omega \cap N(x^*,\delta)$, where $N(x^*,\delta) := \{x \in \mathbb{R}^n : \|x - x^*\| \le \delta\}$ and $\|\cdot\|$ is the Euclidean norm.

Example problem in one dimension

Example: $\min f(x)$ subject to $a \le x \le b$. [Figure: graph of $f$ on $[a,b]$ with points $x_1$ and $x_2$ marked.] The feasible region $\Omega$ is the interval $[a,b]$. The point $x_1$ is the global minimizer; $x_2$ is a local (non-global) minimizer; $x = a$ is a constrained local minimizer.

Example problems in two dimensions

[Figure: surface plots of Ackley's test function (left) and Rosenbrock's test function (right); see Wikipedia.]

Main classes of continuous optimization problems

Linear (quadratic) programming: linear (quadratic) objective and linear constraints in the variables,
$\min_{x \in \mathbb{R}^n} c^T x \ \left(+\ \tfrac12 x^T H x\right)$ subject to $a_i^T x = b_i$, $i \in E$; $a_i^T x \ge b_i$, $i \in I$,
where $c, a_i \in \mathbb{R}^n$ for all $i$ and $H$ is an $n \times n$ matrix; $E$ and $I$ are finite index sets.

Unconstrained (constrained) nonlinear programming:
$\min_{x \in \mathbb{R}^n} f(x)$ (subject to $c_i(x) = 0$, $i \in E$; $c_i(x) \ge 0$, $i \in I$),
where $f, c_i : \mathbb{R}^n \to \mathbb{R}$ are (smooth, possibly nonlinear) functions for all $i$; $E$ and $I$ are finite index sets.

Most real-life problems are nonlinear, often large-scale!

Example: an OR application [Gould 06]

Optimization of a high-pressure gas network: pressures $p = (p_i, \forall i)$; flows $q = (q_j, \forall j)$; demands $d = (d_k, \forall k)$; compressors. Maximize net flow subject to the constraints:
$Aq - d = 0$, $\quad A^T p^2 + K q^{2.8359} = 0$, $\quad A_2^T q + z\,c(p,q) = 0$, $\quad p_{\min} \le p \le p_{\max}$, $\quad q_{\min} \le q \le q_{\max}$,
with $A, A_2 \in \{\pm 1, 0\}$ and $z \in \{0,1\}$. 200 nodes and pipes, 26 machines: 400 variables. Variable demand $(p, d)$ every 10 mins: 58,000 variables; real-time solution needed.

Example: an inverse problem application [Met Office]

Data assimilation for weather forecasting: best estimate of the current state of the atmosphere. Find initial conditions $x_0$ for the numerical forecast by solving the (ill-posed) nonlinear inverse problem
$\min_{x_0} \sum_{i=0}^{m} (H_i[x_i] - y_i)^T R_i^{-1} (H_i[x_i] - y_i)$,
where $x_i = S(t_i, t_0, x_0)$, $S$ is the solution operator of the discrete nonlinear model, $H_i$ maps $x_i$ to the observations $y_i$, and $R_i$ is the error covariance matrix of the observations at $t_i$. $x_0$ is of size $10^7$-$10^8$; number of observations $m \approx 250{,}000$.

Optimality conditions for unconstrained problems

Optimality conditions are algebraic characterizations of solutions suitable for computations. They provide a way to guarantee that a candidate point is optimal (sufficient conditions), and indicate when a point is not optimal (necessary conditions).

minimize $f(x)$ subject to $x \in \mathbb{R}^n$. (UP)

First-order necessary conditions: $f \in C^1(\mathbb{R}^n)$; $x^*$ a local minimizer of $f$ $\implies$ $\nabla f(x^*) = 0$. $\nabla f(x) = 0$ $\iff$ $x$ is a stationary point of $f$.

Optimality conditions for unconstrained problems...

Lemma. Let $f \in C^1$, $x \in \mathbb{R}^n$ and $s \in \mathbb{R}^n$ with $s \ne 0$. Then $\nabla f(x)^T s < 0 \implies f(x + \alpha s) < f(x)$ for all $\alpha > 0$ sufficiently small.

Proof. $f \in C^1$ $\implies$ there exists $\bar\alpha > 0$ such that $\nabla f(x + \alpha s)^T s < 0$ for all $\alpha \in [0, \bar\alpha]$. Taylor's/mean value theorem: $f(x + \alpha s) = f(x) + \alpha \nabla f(x + \tilde\alpha s)^T s$ for some $\tilde\alpha \in (0, \alpha)$ $\implies$ $f(x + \alpha s) < f(x)$ for all $\alpha \in (0, \bar\alpha]$.

$s$ is a descent direction for $f$ at $x$ if $\nabla f(x)^T s < 0$.

Proof of first-order necessary conditions: assume $\nabla f(x^*) \ne 0$. Then $s := -\nabla f(x^*)$ is a descent direction for $f$ at $x^*$: $\nabla f(x^*)^T (-\nabla f(x^*)) = -\|\nabla f(x^*)\|^2 < 0$, since $\nabla f(x^*) \ne 0$ and $\|a\| \ge 0$ with equality iff $a = 0$. Thus, by the Lemma, $x^*$ is not a local minimizer of $f$.

Optimality conditions for unconstrained problems...

$-\nabla f(x)$ is a descent direction for $f$ at $x$ whenever $\nabla f(x) \ne 0$. $s$ is a descent direction for $f$ at $x$ if $\nabla f(x)^T s < 0$, which is equivalent to
$\cos \angle(-\nabla f(x), s) = \frac{(-\nabla f(x))^T s}{\|\nabla f(x)\|\,\|s\|} = \frac{-\nabla f(x)^T s}{\|\nabla f(x)\|\,\|s\|} > 0$,
and so $\angle(-\nabla f(x), s) \in [0, \pi/2)$. [Figure: a descent direction $p_k$ makes an acute angle with $-\nabla f_k$.]

Summary of first-order conditions. A look ahead

minimize $f(x)$ subject to $x \in \mathbb{R}^n$. (UP)

First-order necessary optimality conditions: $f \in C^1(\mathbb{R}^n)$; $x^*$ a local minimizer of $f$ $\implies$ $\nabla f(x^*) = 0$. But also $\bar x = \arg\max_{x \in \mathbb{R}^n} f(x) \implies \nabla f(\bar x) = 0$. [Figure: the one-dimensional example with $x_1$, $x_2$ on $[a,b]$ again.]

Look at higher-order derivatives to distinguish between minimizers and maximizers... except for convex functions: $x^*$ local minimizer/stationary point $\implies$ $x^*$ global minimizer. [see Problem Sheet]

Second-order optimality conditions (nonconvex functions)

Second-order necessary conditions: $f \in C^2(\mathbb{R}^n)$; $x^*$ local minimizer of $f$ $\implies$ $\nabla^2 f(x^*)$ positive semidefinite, i.e. $s^T \nabla^2 f(x^*) s \ge 0$ for all $s \in \mathbb{R}^n$. Example: $f(x) := x^3$; $x^* = 0$ is not a local minimizer, but $f'(0) = f''(0) = 0$.

Second-order sufficient conditions: $f \in C^2(\mathbb{R}^n)$; $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ positive definite, namely $s^T \nabla^2 f(x^*) s > 0$ for all $s \ne 0$ $\implies$ $x^*$ is a (strict) local minimizer of $f$. [local convexity] Example: $f(x) := x^4$; $x^* = 0$ is a (strict) local minimizer but $f''(0) = 0$.

Stationary points of quadratic functions

[Figure: surfaces and contours of quadratic functions: (a) minimum, (b) maximum, (c) saddle, (d) semidefinite; (e) maximum or minimum, (f) saddle, (g) semidefinite.]

For $q(x) := g^T x + \frac12 x^T H x$ with $H \in \mathbb{R}^{n \times n}$ and $g \in \mathbb{R}^n$: $\nabla q(x^*) = 0 \iff Hx^* + g = 0$; $\nabla^2 q(x) = H$ for all $x$. General $f$: approximately locally quadratic around a stationary point $x^*$.

Methods for local unconstrained optimization

minimize $f(x)$ subject to $x \in \mathbb{R}^n$ (UP) [$f \in C^1(\mathbb{R}^n)$ or $f \in C^2(\mathbb{R}^n)$]

A Generic Method (GM). Choose $\epsilon > 0$ and $x_0 \in \mathbb{R}^n$. While the termination criteria are not achieved, repeat: compute the change $x_{k+1} - x_k = F(x_k, \text{problem data})$ [linesearch, trust-region] so as to ensure $f(x_{k+1}) < f(x_k)$; set $x_{k+1} := x_k + F(x_k, \text{prob. data})$, $k := k + 1$.

TC: $\|\nabla f(x_k)\| \le \epsilon$; maybe also $\lambda_{\min}(\nabla^2 f(x_k)) \ge -\epsilon$.

E.g., $x_{k+1}$ is the minimizer of some (simple) model of $f$ around $x_k$: linesearch, trust-region methods. If $F = F(x_k, x_{k-1}, \text{problem data})$: conjugate gradients method.

Issues to consider about GM

Global convergence of GM: if $\epsilon := 0$ and for any $x_0 \in \mathbb{R}^n$: $\nabla f(x_k) \to 0$ as $k \to \infty$? [maybe also, $\liminf_k \lambda_{\min}(\nabla^2 f(x_k)) \ge 0$?]

Local convergence of GM: if $\epsilon := 0$ and $x_0$ is sufficiently close to $x^*$, a stationary point/local minimizer of $f$: $x_k \to x^*$ as $k \to \infty$?

Global/local complexity of GM: count the number of iterations, and their cost, required by GM to generate $x_k$ within desired accuracy $\epsilon > 0$, e.g. such that $\|\nabla f(x_k)\| \le \epsilon$. [connection to convergence and its rate]

Rate of global/local convergence of GM.

Rates of convergence of sequences: an example

$l_k := (1/2)^k \to 0$ linearly, $q_k := (1/2)^{2^k} \to 0$ quadratically, and $s_k := k^{-k} \to 0$ superlinearly as $k \to \infty$.

[Figure and table: $l_k$, $q_k$ and $s_k$ plotted on a log scale, with a table of $l_k$ and $q_k$ for $k = 0, \dots, 6$. Notation: $(-i) := 10^{-i}$.]

Rates of convergence of sequences

$\{x_k\} \subset \mathbb{R}^n$, $x^* \in \mathbb{R}^n$; $x_k \to x^*$ as $k \to \infty$. $p$-rate of convergence: $x_k \to x^*$ with rate $p \ge 1$ if there exist $\rho > 0$ and $k_0 \ge 0$ such that $\|x_{k+1} - x^*\| \le \rho \|x_k - x^*\|^p$ for all $k \ge k_0$. $\rho$ is the convergence factor; $e_k := \|x_k - x^*\|$ is the error in $x_k \approx x^*$.

Linear convergence: $p = 1$, $\rho < 1$; (asymptotically,) the number of correct digits grows linearly in the number of iterations.

Quadratic convergence: $p = 2$; (asymptotically,) the number of correct digits grows exponentially in the number of iterations.

Superlinear convergence: $\|x_{k+1} - x^*\| / \|x_k - x^*\| \to 0$ as $k \to \infty$. [faster than linear, slower than quadratic; practically very acceptable]
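The rate definitions above can be checked numerically on the model sequences of the previous slide: a minimal sketch (the `ratios` helper is illustrative, not from the slides), using exact rational arithmetic so the ratios come out exactly.

```python
# Sketch: checking the p-rate definition for the model sequences
# l_k = (1/2)^k (linear) and q_k = (1/2)^(2^k) (quadratic), x* = 0.
from fractions import Fraction

def ratios(seq, p):
    """Return e_{k+1} / e_k^p for the errors e_k = |x_k - 0|."""
    return [seq[k + 1] / seq[k] ** p for k in range(len(seq) - 1)]

l = [Fraction(1, 2) ** k for k in range(8)]          # linear sequence
q = [Fraction(1, 2) ** (2 ** k) for k in range(6)]   # quadratic sequence

print(ratios(l, 1))   # constant 1/2: linear with factor rho = 1/2 < 1
print(ratios(q, 2))   # constant 1: quadratic with factor rho = 1
```

For $q_k$ the ratio $q_{k+1}/q_k^2 = 2^{-2^{k+1}}/2^{-2\cdot 2^k} = 1$ exactly, which is why the number of correct digits doubles per iteration.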

Linesearch methods

A generic linesearch method

(UP): minimize $f(x)$ subject to $x \in \mathbb{R}^n$, where $f \in C^1$ or $C^2$.

A Generic Linesearch Method (GLM). Choose $\epsilon > 0$ and $x_0 \in \mathbb{R}^n$. While $\|\nabla f(x_k)\| > \epsilon$, repeat: compute a descent search direction $s_k \in \mathbb{R}^n$, $\nabla f(x_k)^T s_k < 0$; compute a stepsize $\alpha_k > 0$ along $s_k$ such that $f(x_k + \alpha_k s_k) < f(x_k)$; set $x_{k+1} := x_k + \alpha_k s_k$ and $k := k + 1$.

Recall the property of descent directions.

Performing a linesearch

How to compute $\alpha_k$? Exact linesearch: $\alpha_k := \arg\min_{\alpha > 0} f(x_k + \alpha s_k)$; computationally expensive for nonlinear objectives. [Figure: exact linesearch for quadratic functions.]

Performing a linesearch...

Inexact linesearch: want the stepsize $\alpha_k$ to be neither too short nor too long compared to the decrease in $f$. [Figure: two sequences of iterates $(x_i, f(x_i))$, one with steps too long, one with steps too short; neither reaches the minimizer.]

Performing a linesearch...

Inexact linesearch: want the stepsize $\alpha_k$ not too long compared to the decrease in $f$. The Armijo condition: choose $\beta \in (0,1)$; compute $\alpha_k > 0$ such that
$f(x_k + \alpha_k s_k) \le f(x_k) + \beta \alpha_k \nabla f(x_k)^T s_k \quad(*)$
is satisfied. In practice, $\beta := 0.1$ or even $\beta := 0.001$. Due to the descent condition, there exists $\bar\alpha_k > 0$ such that $(*)$ holds for all $\alpha \in [0, \bar\alpha_k]$. Choose $\alpha_k \in (0, \bar\alpha_k]$ as large as possible.

Performing a linesearch...

Inexact linesearch... Let $\Phi_k : \mathbb{R} \to \mathbb{R}$, $\Phi_k(\alpha) := f(x_k + \alpha s_k)$, $\alpha \ge 0$. Then Armijo $\iff$ $\Phi_k(\alpha_k) \le \Phi_k(0) + \beta \alpha_k \Phi_k'(0)$. Let $y_\beta(\alpha) := \Phi_k(0) + \beta \alpha \Phi_k'(0)$, $\alpha \ge 0$. [Figure: $y = \Phi(\alpha)$ plotted against the lines $y_0$, $y_{0.6}$ and $y_1$; the acceptable stepsizes form an interval $\alpha \in (0, A]$.]

Performing a linesearch...

The backtracking-Armijo (bArmijo) linesearch algorithm. Choose $\alpha^{(0)} > 0$, $\tau \in (0,1)$ and $\beta \in (0,1)$. While $f(x_k + \alpha^{(i)} s_k) > f(x_k) + \beta \alpha^{(i)} \nabla f(x_k)^T s_k$, repeat: set $\alpha^{(i+1)} := \tau \alpha^{(i)}$ and $i := i + 1$. End. Set $\alpha_k := \alpha^{(i)}$.

Backtracking $\implies$ stepsize $\alpha_k$ not too short. $\alpha^{(0)} := 1$, $\tau := 0.5$ $\implies$ $\alpha^{(0)} = 1$, $\alpha^{(1)} = 0.5$, $\alpha^{(2)} = 0.25$, ... The bArmijo linesearch algorithm terminates in a finite number of steps with $\alpha_k > 0$, due to the descent condition.
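The bArmijo algorithm above can be sketched in a few lines; this is a minimal illustration (function and variable names are my own), assuming the gradient at $x_k$ has been precomputed.

```python
# A minimal sketch of the backtracking-Armijo linesearch.
import numpy as np

def backtracking_armijo(f, xk, grad_fk, sk, alpha0=1.0, tau=0.5, beta=0.1):
    """Shrink alpha by tau until the Armijo condition (*) holds."""
    slope = grad_fk @ sk          # must be < 0: s_k is a descent direction
    assert slope < 0, "s_k is not a descent direction"
    alpha = alpha0
    while f(xk + alpha * sk) > f(xk) + beta * alpha * slope:
        alpha *= tau              # backtrack: keeps alpha "not too short"
    return alpha

# Usage: f(x) = x^2 in 1-d, from x_k = 1 along s_k = -f'(x_k) = -2.
f = lambda x: float(x[0] ** 2)
xk, gk = np.array([1.0]), np.array([2.0])
alpha = backtracking_armijo(f, xk, gk, -gk)   # alpha = 1 fails, alpha = 0.5 accepted
```

Here $\alpha = 1$ violates $(*)$ (it overshoots to $x = -1$ with no decrease), so one backtracking step gives $\alpha_k = 0.5$.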

Steepest descent method

Steepest descent (SD) direction: set $s_k := -\nabla f(x_k)$, $k \ge 0$. $s_k$ is a descent direction whenever $\nabla f(x_k) \ne 0$: $\nabla f(x_k)^T s_k = \nabla f(x_k)^T (-\nabla f(x_k)) = -\|\nabla f(x_k)\|^2 < 0$.

$s_k$ is the "steepest" descent direction: the unique global solution of minimize$_{s \in \mathbb{R}^n}$ $f(x_k) + s^T \nabla f(x_k)$ subject to $\|s\| = \|\nabla f(x_k)\|$. Cauchy-Schwarz: $s^T \nabla f(x_k) \ge -\|s\|\,\|\nabla f(x_k)\|$ for all $s$, with equality iff $s$ is proportional to $-\nabla f(x_k)$.

Method of steepest descent (SD): the GLM with $s_k$ := SD direction; any linesearch. SD-e := the SD method with exact linesearches; SD-bA := the SD method with bArmijo linesearches.

Global convergence of steepest descent methods

$f \in C^1(\mathbb{R}^n)$; $\nabla f$ is Lipschitz continuous (on $\mathbb{R}^n$) iff there exists $L > 0$ such that $\|\nabla f(y) - \nabla f(x)\| \le L \|y - x\|$ for all $x, y \in \mathbb{R}^n$.

Theorem. Let $f \in C^1(\mathbb{R}^n)$ be bounded below on $\mathbb{R}^n$ and let $\nabla f$ be Lipschitz continuous. Apply the SD-e or the SD-bA method to minimizing $f$ with $\epsilon := 0$. Then both variants of the SD method have the property: either there exists $l \ge 0$ such that $\nabla f(x_l) = 0$, or $\nabla f(x_k) \to 0$ as $k \to \infty$.

SD methods have excellent global convergence properties (under weak assumptions).

Some disadvantages of steepest descent methods

SD methods are scale-dependent: poorly scaled variables $\implies$ the SD direction gives little progress.

[Figure: the effect of problem scaling on SD-e performance for $f(x_1, x_2) = \frac12 (a x_1^2 + x_2^2)$, $a > 0$. Left: $a = 100.6$ (mildly poor scaling). Right: $a = 1$ (perfect scaling).]

Some disadvantages of steepest descent methods...

Usually, SD methods converge very slowly to the solution, asymptotically. Theory: very slow convergence. Numerics: break-down (accumulation of round-off and ill-conditioning).

[Figure: SD-bA applied to the Rosenbrock function $f(x_1, x_2) = 10(x_2 - x_1^2)^2 + (x_1 - 1)^2$.]

Local rate of convergence for steepest descent

Locally, SD converges linearly to a solution, namely $f(x_{k+1}) - f(x^*) \le \rho\,(f(x_k) - f(x^*))$ for $k$ sufficiently large, BUT the convergence factor $\rho$ is usually very close to 1!

Theorem. $f \in C^2$; $x^*$ a local minimizer of $f$ with $\nabla^2 f(x^*)$ positive definite, with extreme eigenvalues $\lambda_{\max}$, $\lambda_{\min}$. Apply SD-e to $\min f$. If $x_k \to x^*$ as $k \to \infty$, then $f(x_k)$ converges linearly to $f(x^*)$, with
$\rho \le \left(\frac{\kappa(x^*) - 1}{\kappa(x^*) + 1}\right)^2 =: \rho_{SD}$,
where $\kappa(x^*) = \lambda_{\max}/\lambda_{\min}$ is the condition number of $\nabla^2 f(x^*)$.

Practice: $\rho \approx \rho_{SD}$. For the Rosenbrock $f$: $\kappa(x^*) = 258.10$, $\rho_{SD} \approx 0.984$. For $\kappa(x^*) = 800$, $f(x_0) = 1$, $f(x^*) = 0$: SD-e gives $f(x_k) \approx 0.007$ after 1000 iterations!

Other directions for GLMs

Let $B_k$ be a symmetric, positive definite matrix, and let $s_k$ be defined by
$B_k s_k = -\nabla f(x_k). \quad(*)$
$\implies$ $s_k$ is a descent direction; $\implies$ $s_k$ solves the problem minimize$_{s \in \mathbb{R}^n}$ $m_k(s) = f(x_k) + \nabla f(x_k)^T s + \frac12 s^T B_k s$.

If $\nabla^2 f(x_k)$ is positive definite and $B_k := \nabla^2 f(x_k)$ $\implies$ the damped Newton method.

$(*)$ is a scaled steepest descent direction; the resulting GLMs can be made scale-invariant for some $B_k$ (e.g., $B_k = \nabla^2 f(x_k)$); local convergence can be made faster than SD methods for some $B_k$ (e.g., quadratic for Newton).
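A minimal sketch of computing the direction from $B_k s_k = -\nabla f(x_k)$; the helper name is mine, and the usage evaluates the gradient and Hessian of the Rosenbrock-type function from these slides at the origin by hand.

```python
# Sketch: search directions from B_k s_k = -grad f(x_k).  With
# B_k = Hessian (when positive definite) this is the damped Newton
# direction; B_k = I recovers steepest descent.
import numpy as np

def model_direction(Bk, gk):
    """Solve B_k s = -g_k; for s.p.d. B_k this minimises the model
    m_k(s) = f_k + g_k^T s + s^T B_k s / 2 and is a descent direction."""
    s = np.linalg.solve(Bk, -gk)
    assert gk @ s < 0            # descent: g^T s = -g^T B_k^{-1} g < 0
    return s

# Usage: f = 10 (x2 - x1^2)^2 + (x1 - 1)^2 at x = (0, 0), where the
# Hessian is positive definite.
g = np.array([-2.0, 0.0])                 # gradient of f at (0, 0)
H = np.array([[2.0, 0.0], [0.0, 20.0]])   # Hessian of f at (0, 0)
s = model_direction(H, g)                 # Newton direction (1, 0)
```

Note how the Newton direction $(1, 0)$ rescales the raw negative gradient $(2, 0)$ by the local curvature.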

Global convergence for general GLMs

Theorem. Let $f \in C^1(\mathbb{R}^n)$ be bounded below on $\mathbb{R}^n$ and let $\nabla f$ be Lipschitz continuous. Let the eigenvalues of $B_k$ be uniformly bounded above and below, away from zero, for all $k$. Apply the GLM to $\min f$ with the bArmijo linesearch, $s_k$ from $(*)$ and $\epsilon = 0$. Then either there exists $l \ge 0$ such that $\nabla f(x_l) = 0$, or $\nabla f(x_k) \to 0$ as $k \to \infty$.

The theorem requires locally strongly convex quadratic models of $f$ for all $k$. In particular, the theorem holds if $f$ is locally strongly convex at all iterates. See the Problem Sheet for examples where Newton's method (without linesearch) fails to converge globally.

Local convergence for damped Newton's method

Theorem. Let $f \in C^2(\mathbb{R}^n)$ and let $\nabla^2 f$ be Lipschitz continuous and uniformly positive definite. Let $s_k$ be the Newton direction in the GLM and let the bArmijo linesearch have $\beta < 0.5$ and $\alpha^{(0)} = 1$. Then, provided limit points of the sequence of iterates exist: $\alpha_k = 1$ for all sufficiently large $k$, and $x_k \to x^*$ as $k \to \infty$ quadratically.

Local convergence for damped Newton...

$f(x_1, x_2) = 10(x_2 - x_1^2)^2 + (x_1 - 1)^2$; $x^* = (1, 1)$. [Figure: damped Newton with bArmijo linesearch applied to the Rosenbrock function $f$.]

Modified Newton methods

If $\nabla^2 f(x_k)$ is not positive definite, it is usual to solve instead
$\left(\nabla^2 f(x_k) + M_k\right) s_k = -\nabla f(x_k)$,
where $M_k$ is chosen such that $\nabla^2 f(x_k) + M_k$ is sufficiently positive definite, and $M_k := 0$ when $\nabla^2 f(x_k)$ is itself sufficiently positive definite.

Popular option, modified Cholesky: compute the Cholesky factorization $\nabla^2 f(x_k) = L_k (L_k)^T$, where $L_k$ is a lower triangular matrix; modify the generated $L_k$ if the factorization is in danger of failing (modify small or negative diagonal pivots, etc.).
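A simpler relative of modified Cholesky, shown here as an assumption of this note rather than the slides' exact procedure, is to add a multiple $\tau I$ of the identity (so $M_k = \tau I$) and retry the factorization until it succeeds:

```python
# Sketch of a modified Newton direction: shift the Hessian by tau * I
# until Cholesky succeeds, then solve (H + tau I) s = -g.
import numpy as np

def modified_newton_direction(H, g, tau0=1e-3):
    tau = 0.0
    while True:
        try:
            L = np.linalg.cholesky(H + tau * np.eye(len(g)))
            break                      # H + tau I is positive definite
        except np.linalg.LinAlgError:
            tau = max(2 * tau, tau0)   # increase the shift and retry
    # Solve (H + tau I) s = -g via the triangular factors L L^T.
    y = np.linalg.solve(L, -g)
    return np.linalg.solve(L.T, y)

# Usage: an indefinite Hessian, where the raw Newton step need not
# be a descent direction.
H = np.array([[-1.0, 0.0], [0.0, 2.0]])
g = np.array([1.0, 1.0])
s = modified_newton_direction(H, g)    # descent: g^T s < 0
```

The shifted system is positive definite by construction, so the resulting $s_k$ is always a descent direction, at the cost of distorting the Newton model more than a pivot-wise modified Cholesky would.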

Quasi-Newton methods

Secant approximations for computing $B_k \approx \nabla^2 f(x_k)$ [when $\nabla^2 f$ is unavailable exactly or by finite differences of function or gradient values]. At the start of the GLM, choose $B_0$ (say, $B_0 := I$). After computing $s_k$ from $B_k s_k = -\nabla f(x_k)$ and $x_{k+1} = x_k + \alpha_k s_k$, compute an update $B_{k+1}$ of $B_k$.

Wish list: compute $B_{k+1}$ as a function of already-computed quantities $\nabla f(x_{k+1}), \nabla f(x_k), \dots, \nabla f(x_0), B_k, s_k$; $B_{k+1}$ should be symmetric and nonsingular (positive definite?); $B_{k+1}$ close to $B_k$, a cheap update of $B_k$; $B_k \approx \nabla^2 f(x_k)$, etc.

$\implies$ a new class of methods: faster than the steepest descent method, cheaper to compute per iteration than Newton's.

Quasi-Newton methods...

For the first wish, choose $B_{k+1}$ to satisfy the secant equation
$\gamma_k := \nabla f(x_{k+1}) - \nabla f(x_k) = B_{k+1}(x_{k+1} - x_k) = B_{k+1} \alpha_k s_k$.
It is satisfied by $B_{k+1} = \nabla^2 f$ when $f$ is a quadratic function: the change in gradient contains information about the Hessian.

There are many ways to compute $B_{k+1}$ satisfying the secant equation; trade-offs between the wishes: rank-1 update of $B_k$ (Symmetric Rank 1, SR1); rank-2 updates of $B_k$ (BFGS (Broyden-Fletcher-Goldfarb-Shanno), DFP (Davidon-Fletcher-Powell)), etc.

Work per iteration: $O(n^2)$ (as opposed to the $O(n^3)$ of Newton). For global convergence, one must use a Wolfe linesearch instead of the bArmijo linesearch. Local Q-superlinear convergence. [see Problem Sheet]
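As a concrete instance of a rank-2 update named above, here is a sketch of the BFGS formula (the standard expression from the literature; variable names are mine). Whatever the data, the update reproduces the secant equation exactly, which the usage checks.

```python
# Sketch of the BFGS rank-2 update: symmetric, satisfies the secant
# equation B_{k+1} d = gamma, and stays positive definite when the
# curvature condition gamma^T d > 0 holds (guaranteed by a Wolfe
# linesearch).
import numpy as np

def bfgs_update(B, d, gamma):
    """d = x_{k+1} - x_k = alpha_k s_k; gamma = grad f(x_{k+1}) - grad f(x_k)."""
    Bd = B @ d
    return (B - np.outer(Bd, Bd) / (d @ Bd)
              + np.outer(gamma, gamma) / (gamma @ d))

# Usage: an illustrative step and gradient change.
B = np.eye(2)
d = np.array([1.0, 0.5])
gamma = np.array([2.0, 1.0])
B1 = bfgs_update(B, d, gamma)
# B1 is symmetric and B1 @ d == gamma (the secant equation).
```

Each update costs $O(n^2)$, which is where the per-iteration saving over Newton's $O(n^3)$ factorization comes from.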

Trust-region methods

Linesearch versus trust-region methods

(UP): minimize $f(x)$ subject to $x \in \mathbb{R}^n$.

Linesearch methods: liberal in the choice of search direction, keeping bad behaviour in check through the choice of $\alpha_k$. Choose a descent direction $s_k$, compute a stepsize $\alpha_k$ to reduce $f(x_k + \alpha s_k)$, update $x_{k+1} := x_k + \alpha_k s_k$.

Trust-region (TR) methods: conservative in the choice of search direction, so that a full stepsize along it may genuinely reduce the objective. Pick a direction $s_k$ to reduce a local model of $f(x_k + s_k)$; accept $x_{k+1} := x_k + s_k$ if the decrease in the model is also achieved by $f(x_k + s_k)$; else set $x_{k+1} := x_k$ and refine the model.

Trust-region models for unconstrained problems

Approximate $f(x_k + s)$ by: a linear model $l_k(s) := f(x_k) + s^T \nabla f(x_k)$, or a quadratic model $q_k(s) := f(x_k) + s^T \nabla f(x_k) + \frac12 s^T \nabla^2 f(x_k) s$.

Impediments: the models may not resemble $f(x_k + s)$ when $s$ is large; the models may be unbounded from below. $l_k(s)$ is always unbounded below (unless $\nabla f(x_k) = 0$); $q_k(s)$ is always unbounded below if $\nabla^2 f(x_k)$ is negative definite or indefinite, and sometimes if $\nabla^2 f(x_k)$ is positive semidefinite.

Trust-region models and subproblem

Prevent bad approximations by trusting the model only in a trust region, defined by the trust-region constraint
$\|s\| \le \Delta_k, \quad(\mathrm{R})$
for some appropriate radius $\Delta_k > 0$. The constraint (R) also prevents $l_k$, $q_k$ from unboundedness!

$\implies$ the trust-region subproblem
$\min_{s \in \mathbb{R}^n} m_k(s)$ subject to $\|s\| \le \Delta_k$, (TR)
where $m_k := l_k$, $k \ge 0$, or $m_k := q_k$, $k \ge 0$. From now on, $m_k := q_k$. (TR) is easier to solve than (UP); we may even solve (TR) only approximately.

Trust-region models and subproblem: an example

[Figure: trust-region models of $f(x) = x_1^4 + x_1 x_2 + (1 + x_2)^2$: quadratic TR model about $x = (1, -0.5)$, $\Delta = 1$; linear TR model about $x = (1, -0.5)$, $\Delta = 1$; quadratic TR model about $x = (0, 0)$, $\Delta = 1$; quadratic TR model about $x = (-0.25, 0.5)$, $\Delta = 1$.]

Generic trust-region method

Let $s_k$ be a(n approximate) solution of (TR). Then: predicted model decrease $m_k(0) - m_k(s_k) = f(x_k) - m_k(s_k)$; actual function decrease $f(x_k) - f(x_k + s_k)$. The trust-region radius $\Delta_k$ is chosen based on the value of
$\rho_k := \frac{f(x_k) - f(x_k + s_k)}{f(x_k) - m_k(s_k)}$.
If $\rho_k$ is not too much smaller than 1, set $x_{k+1} := x_k + s_k$, $\Delta_{k+1} \ge \Delta_k$. If $\rho_k$ is close to or larger than 1, $\Delta_k$ is increased. If $\rho_k \ll 1$, set $x_{k+1} = x_k$ and reduce $\Delta_k$.

A Generic Trust Region (GTR) method

Given $\Delta_0 > 0$, $x_0 \in \mathbb{R}^n$, $\epsilon > 0$. While $\|\nabla f(x_k)\| \ge \epsilon$, do:

1. Form the local quadratic model $m_k(s)$ of $f(x_k + s)$.
2. Solve (approximately) the (TR) subproblem for $s_k$ with $m_k(s_k) < f(x_k)$ ("sufficiently"). Compute $\rho_k := [f(x_k) - f(x_k + s_k)]/[f(x_k) - m_k(s_k)]$.
3. If $\rho_k \ge 0.9$ [very successful step]: set $x_{k+1} := x_k + s_k$ and $\Delta_{k+1} := 2\Delta_k$. Else if $\rho_k \ge 0.1$ [successful step]: set $x_{k+1} := x_k + s_k$ and $\Delta_{k+1} := \Delta_k$. Else [unsuccessful step]: set $x_{k+1} := x_k$ and $\Delta_{k+1} := \frac12 \Delta_k$.
4. Let $k := k + 1$.
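The GTR loop above can be sketched compactly; here, as a simplifying assumption of this note, the subproblem is solved approximately by exact minimisation of $m_k$ along $-\nabla f(x_k)$ inside the region (a Cauchy-type step), with all names illustrative.

```python
# Sketch of the GTR method with a steepest-descent (Cauchy-type) step
# as the approximate (TR) subproblem solution.
import numpy as np

def gtr(f, grad, hess, x0, delta0=1.0, eps=1e-6, max_iter=1000):
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        H = hess(x)
        # minimise m(s) along s = -alpha g, 0 < alpha <= delta / ||g||
        gHg = g @ H @ g
        alpha_max = delta / np.linalg.norm(g)
        alpha = min((g @ g) / gHg, alpha_max) if gHg > 0 else alpha_max
        s = -alpha * g
        pred = -(g @ s + 0.5 * s @ H @ s)     # f(x_k) - m_k(s_k) > 0
        rho = (f(x) - f(x + s)) / pred
        if rho >= 0.9:                        # very successful
            x, delta = x + s, 2 * delta
        elif rho >= 0.1:                      # successful
            x = x + s
        else:                                 # unsuccessful
            delta = 0.5 * delta
    return x

# Usage: convex quadratic f(x) = (x1^2 + 10 x2^2) / 2, minimiser at 0.
f = lambda x: 0.5 * (x[0] ** 2 + 10 * x[1] ** 2)
grad = lambda x: np.array([x[0], 10 * x[1]])
hess = lambda x: np.diag([1.0, 10.0])
xstar = gtr(f, grad, hess, [1.0, 1.0])
```

On a quadratic the model is exact, so every step has $\rho_k \approx 1$ and the radius only grows; the step-acceptance machinery matters once the model and $f$ disagree.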

Trust-region methods

Other sensible values of the parameters of the GTR are possible. Solving the (TR) subproblem
$\min_{s \in \mathbb{R}^n} m_k(s)$ subject to $\|s\| \le \Delta_k$, (TR)
exactly, or even approximately, may imply substantial work. We want the minimal condition of sufficient decrease in the model that ensures global convergence of the GTR method (the Cauchy condition). In practice, we (usually) do much better than this condition!

Example of applying a trust-region method: [Sartenaer, 2008]; approximate solution of the (TR) subproblem: better than Cauchy, but not exact; notation: the ratio of decreases $\Delta f / \Delta m_k$ corresponds to $\rho_k$.

The Cauchy point of the (TR) subproblem

Recall that the steepest descent method has strong (theoretical) global convergence properties; the same will hold for the GTR method with the SD direction. Minimal condition of sufficient decrease in the model: require $m_k(s_k) \le m_k(s_k^c)$ and $\|s_k\| \le \Delta_k$, where $s_k^c := -\alpha_k^c \nabla f(x_k)$, with
$\alpha_k^c := \arg\min_{\alpha > 0} m_k(-\alpha \nabla f(x_k))$ subject to $\alpha \|\nabla f(x_k)\| \le \Delta_k$.
[I.e. a linesearch along the steepest descent direction is applied to $m_k$ at $x_k$ and is restricted to the trust region.]

Easy: $\alpha_k^c = \arg\min_\alpha m_k(-\alpha \nabla f(x_k))$ subject to $0 < \alpha \le \Delta_k / \|\nabla f(x_k)\|$. $y_k^c := x_k + s_k^c$ is the Cauchy point. [see Problem Sheet]
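For the quadratic model, the constrained one-dimensional minimisation defining $\alpha_k^c$ has a closed form, which the following sketch implements (subscripts dropped; the helper name is mine): the unconstrained minimiser along $-g$ is $g^T g / g^T H g$ when the curvature $g^T H g$ is positive, clipped to the boundary $\alpha = \Delta / \|g\|$; with nonpositive curvature the model decreases all the way to the boundary.

```python
# Sketch: the Cauchy step s_c = -alpha_c g for the quadratic model
# m(s) = f + g^T s + s^T H s / 2 inside ||s|| <= delta.
import numpy as np

def cauchy_step(g, H, delta):
    alpha_max = delta / np.linalg.norm(g)
    gHg = g @ H @ g
    if gHg <= 0:                   # nonpositive curvature along -g:
        alpha = alpha_max          # minimiser is on the boundary
    else:
        alpha = min((g @ g) / gHg, alpha_max)
    return -alpha * g

# Usage
g = np.array([1.0, 0.0])
H = np.diag([2.0, 1.0])
s_interior = cauchy_step(g, H, delta=10.0)   # interior: alpha = 0.5
s_boundary = cauchy_step(g, H, delta=0.1)    # clipped to ||s|| = 0.1
```

Any $s_k$ that does at least as well as this step in the model satisfies the Cauchy condition and so inherits the global convergence of the next slide.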

Global convergence of the GTR method

Theorem (GTR Global Convergence). Let $f$ be smooth and bounded below on $\mathbb{R}^n$, and let $\nabla f$ be Lipschitz continuous on $\mathbb{R}^n$. Let $\{x_k\}$, $k \ge 0$, be generated by the generic trust-region (GTR) method, and let the computation of $s_k$ be such that $m_k(s_k) \le m_k(s_k^c)$ for all $k$. Then either there exists $k \ge 0$ such that $\nabla f(x_k) = 0$, or $\lim_{k \to \infty} \nabla f(x_k) = 0$.

Solving the (TR) subproblem

On each TR iteration we compute or approximate the solution of
$\min_{s \in \mathbb{R}^n} m_k(s) = f(x_k) + s^T \nabla f(x_k) + \frac12 s^T \nabla^2 f(x_k) s$ subject to $\|s\| \le \Delta_k$.
Also, $s_k$ must satisfy the Cauchy condition $m_k(s_k) \le m_k(s_k^c)$, where $s_k^c := -\alpha_k^c \nabla f(x_k)$, with $\alpha_k^c := \arg\min_{\alpha > 0} m_k(-\alpha \nabla f(x_k))$ subject to $\alpha \|\nabla f(x_k)\| \le \Delta_k$. [The Cauchy condition ensures global convergence.]

Solve (TR) exactly (i.e., compute a global minimizer of (TR)) $\implies$ TR akin to a Newton-like method. Solve (TR) approximately (i.e., an approximate global minimizer) $\implies$ large-scale problems.

Solving the (TR) subproblem exactly

For $h \in \mathbb{R}$, $\Delta > 0$, $g \in \mathbb{R}^n$ and $H$ an $n \times n$ real matrix, consider
$\min_{s \in \mathbb{R}^n} m(s) := h + s^T g + \frac12 s^T H s$, s.t. $\|s\| \le \Delta$. (TR)

Characterization result for the solution of (TR). Theorem. Any global minimizer $s^*$ of (TR) satisfies the equation $(H + \lambda^* I)s^* = -g$, where $H + \lambda^* I$ is positive semidefinite, $\lambda^* \ge 0$, $\lambda^*(\|s^*\| - \Delta) = 0$ and $\|s^*\| \le \Delta$. If $H + \lambda^* I$ is positive definite, then $s^*$ is unique.

The above theorem gives necessary and sufficient global optimality conditions for a nonconvex optimization problem!

Solving the (TR) subproblem exactly

Computing the global solution $s^*$ of (TR):

Case 1. If $H$ is positive definite and the solution of $Hs = -g$ satisfies $\|s\| \le \Delta$: $s^* := s$ (unique), $\lambda^* := 0$ (by the Theorem).

Case 2. If $H$ is positive definite but $\|s\| > \Delta$, or $H$ is not positive definite, the Theorem implies that $s^*$ satisfies
$(H + \lambda I)s = -g$, $\|s\| = \Delta$, $\quad(**)$
for some $\lambda \ge \max\{0, -\lambda_{\min}(H)\} =: \bar\lambda$. Let $s(\lambda) := -(H + \lambda I)^{-1} g$ for any $\lambda > \bar\lambda$. Then $s^* = s(\lambda^*)$, where $\lambda^* \ge \bar\lambda$ is a solution of $\|s(\lambda)\| = \Delta$, $\lambda \ge \bar\lambda$: a nonlinear equation in the one variable $\lambda$. Use Newton's method to solve it.
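A sketch of Case 2, substituting bisection for the Newton iteration the slide mentions (a simplification of mine), using that $\|s(\lambda)\|$ is decreasing for $\lambda > \bar\lambda$; it assumes the Case 2 situation where the unconstrained step lies outside the region, and ignores the degenerate "hard case".

```python
# Sketch: solve ||s(lambda)|| = Delta for the boundary solution of (TR),
# with s(lambda) = -(H + lambda I)^{-1} g, by bisection on lambda.
import numpy as np

def tr_boundary_solution(H, g, delta):
    lam_bar = max(0.0, -np.linalg.eigvalsh(H)[0])   # smallest eigenvalue
    s = lambda lam: np.linalg.solve(H + lam * np.eye(len(g)), -g)
    lo = lam_bar + 1e-12                 # here ||s(lo)|| > delta (Case 2)
    hi = lam_bar + 1.0
    while np.linalg.norm(s(hi)) > delta: # bracket the root from the right
        hi *= 2
    for _ in range(200):                 # bisection on ||s(lam)|| - delta
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(s(mid)) > delta:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return s(lam), lam

# Usage: H positive definite, but the Newton step (-1, -0.5) has norm
# about 1.118 > Delta, so the solution lies on the boundary.
H = np.diag([1.0, 2.0])
g = np.array([1.0, 1.0])
s_star, lam_star = tr_boundary_solution(H, g, delta=0.5)
```

By construction the returned pair satisfies $(H + \lambda I)s = -g$ with $\|s\| = \Delta$ to bisection accuracy; production solvers use the safeguarded Newton iteration on the equivalent secular equation $1/\|s(\lambda)\| = 1/\Delta$ instead, which converges much faster.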

Further considerations on trust-region methods

Solving the (TR) subproblem accurately for large-scale (and dense) problems uses iterative methods (conjugate gradients, Lanczos); the Cauchy condition remains satisfied. Quasi-Newton methods/approximate derivatives are also possible in the trust-region framework (no need for positive definite updates of the Hessian): replace $\nabla^2 f(x_k)$ with an approximation $B_k$ in the quadratic local model $m_k(s)$.

Conclusions: state-of-the-art software exists implementing both linesearch and TR methods; the two have similar performance (linesearch methods need more heuristic approaches to deal with negative curvature). Choosing between the two is mostly a matter of taste.

Numerical optimization bibliography

J. Nocedal and S. J. Wright, Numerical Optimization, Springer, 2nd edition, 2006.
R. Fletcher, Practical Methods of Optimization, Wiley, 2nd edition, 2000.
J. Dennis and R. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, (republished by) SIAM, 1996.

Information on existing software can be found at the NEOS Center: http://www.neos-guide.org (look under the Optimization Guide and Optimization Tree, etc.).