Unconstrained optimization I Gradient-type methods


1 Unconstrained optimization I: Gradient-type methods
Antonio Frangioni, Department of Computer Science, University of Pisa (frangio@di.unipi.it)
Computational Mathematics for Learning and Data Analysis, Master in Computer Science, University of Pisa

2 Outline Unconstrained optimization Gradient method for quadratic functions Gradient method for general functions Exact Line Search: first-order approaches Exact Line Search: second-order approaches Exact Line Search: zeroth-order approaches Inexact Line Search: Armijo-Wolfe Really inexact Line Search: fixed stepsize

3 Optimization algorithms
Iterative procedures (doh!): start from an initial guess x^0, some process x^i → x^{i+1}
Want the sequence { x^i } to go towards an optimal solution
Actually three different forms:
(strong) { x^i } → x*: the whole sequence converges to an optimal solution
(weaker) all accumulation points of { x^i } (if any) are optimal solutions
(weakest) at least one accumulation point of { x^i } (if any) is optimal
X compact helps (accumulation points always exist), but here X = R^n
f not convex ⟹ "optimal" has to be weakened to "stationary point"
Two general forms of the process:
line search: first choose d^i ∈ R^n (direction), then choose α_i ∈ R (stepsize) s.t. x^{i+1} ← x^i + α_i d^i
trust region: first choose α_i (trust radius), then choose d^i
In ML, α_i is often called the learning rate
Crucial concept: the model of f used to construct the next iterate

4 Outline Unconstrained optimization Gradient method for quadratic functions Gradient method for general functions Exact Line Search: first-order approaches Exact Line Search: second-order approaches Exact Line Search: zeroth-order approaches Inexact Line Search: Armijo-Wolfe Really inexact Line Search: fixed stepsize

5 First example of line search: gradient method
Simplest idea: my model is linear
Best linear model of f at x^i: f_i(x) = f(x^i) + ∇f(x^i)(x − x^i), x^{i+1} ∈ argmin{ f_i(x) : x ∈ R^n }
Except, of course, the argmin is empty: f_i is unbounded below on R^n
Go infinitely far along the steepest descent direction d^i = −∇f(x^i)?
But this clearly is trusting the model too much: f(x) ≉ f_i(x) far from x^i
As you move along d^i, ∇f changes; soon ∇f d^i will no longer be negative
Beware too long steps, as f will (probably) start growing after a while
Too short steps are bad too: f will decrease, but only too little
The best step ever: α_i ∈ argmin{ f(x^i + α d^i) : α ≥ 0 }  (exact line search, doh!)
Then, x^{i+1} ← x^i + α_i d^i
Exact line search is difficult in general, let's start simple
Exercise: prove α_i > 0

6 Gradient method for quadratic functions
Couldn't be simpler than f(x) = (1/2) x^T Q x + q^T x
Think Q ⪰ 0, as otherwise f is surely unbounded below
x* solves Qx = −q (if any solution exists), so this is "just" linear algebra
Inverting/factorizing Q is O(n^3) in practice, can we do better?
d^i = −∇f(x^i) = −(Q x^i + q) (O(n^2) to compute)
Good news: the line search is easy, α_i = ‖d^i‖^2 / ( (d^i)^T Q d^i ) ⟹
procedure x = SDQ( Q, q, x, ε ) {
  while( ‖∇f(x)‖ > ε ) do { d ← −∇f(x); α ← ‖d‖^2 / ( d^T Q d ); x ← x + α d; } }
Exercise: prove the formula for α_i
Exercise: there is a glaring numerical problem in that procedure, fix it
Exercise: something can go wrong with that formula: what does it mean? Improve the code to take that occurrence into account.
Exercise: what happens if Q is ⪰ 0 but not ≻ 0? Does the (improved) code need be fixed?
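The SDQ procedure above can be sketched in runnable Python (a sketch of mine, not the course's code: names are assumptions, and the exercises' fixes are folded in, namely the gradient-norm stopping test and a guard for d^T Q d ≤ 0, which certifies that f is unbounded below along d):

```python
import numpy as np

def sdq(Q, q, x, eps=1e-8, max_iter=10000):
    """Steepest descent with exact line search for f(x) = 0.5 x^T Q x + q^T x."""
    for _ in range(max_iter):
        d = -(Q @ x + q)                 # d^i = -grad f(x^i)
        if np.linalg.norm(d) <= eps:     # ||grad f(x)|| <= eps: (approximately) stationary
            return x
        dQd = d @ Q @ d
        if dQd <= 0:                     # f is unbounded below along d
            raise ValueError("f unbounded below: d^T Q d <= 0")
        alpha = (d @ d) / dQd            # exact line search on the quadratic
        x = x + alpha * d
    return x
```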

7 Gradient method: convergence
"The gradient method works": what does this mean?
Asymptotic analysis: ε = 0 ⟹ { x^i } is/contains a minimizing sequence
Fundamental relationship: ⟨∇f(x^i), ∇f(x^{i+1})⟩ = 0
Proof: ⟨d^i, ∇f(x^{i+1})⟩ is the directional derivative of f along d^i at x^{i+1}, but x^{i+1} is a local minimum along d^i
{ x^i } → x̄ ⟹ ∇f(x̄) = 0
Proof: lim_i ⟨∇f(x^i), ∇f(x^{i+1})⟩ = 0 = ⟨∇f(x̄), ∇f(x̄)⟩ (why?)
Any subsequence that converges does so at a stationary point (weaker)
Do (sub)sequence(s) converge? X compact would help, but X = R^n
ε > 0 ⟹ finitely terminates (why?), no convergence required
Exercise: prove that if Q ≻ 0, then { x^i } → x*, the unique optimum

8 Gradient method: efficiency
"The gradient method is (not) fast": what does this mean?
How rapidly ‖x^i − x*‖ decreases... hard; { x^i } may not converge, different subsequences → different optima (which x*?)
Typically, how rapidly f(x^i) − f* decreases (eventually, it has to)
Rate/order of convergence: lim_i ( f(x^{i+1}) − f* ) / ( f(x^i) − f* )^p = R
p = 1, R = 1 ⟹ sublinear convergence (e.g. 1/i, 1/i^2, ...)
p = 1, R < 1 ⟹ linear convergence (γ^i, γ < 1)
p = 1, R = 0 ⟹ superlinear (!) convergence (γ^{i^2}, γ < 1)
p = 2, R > 0 ⟹ quadratic (!!!) convergence (γ^{2^i}, γ < 1)
Linear convergence: in the tail, f(x^{i+1}) − f* ≈ R ( f(x^i) − f* ) ⟹ f(x^i) − f* ≈ ( f(x^1) − f* ) R^i, as fast as a negative exponential
f(x^i) − f* ≤ ε for i ≥ log( ( f(x^1) − f* ) / ε ) / log( 1/R )
O( log( 1/ε ) ) [good!], but the constant grows unboundedly as R → 1 [bad!]

9 Gradient method: efficiency
Analysis is not obvious, have to use properties of x* (unknown)
In this case, nifty trick: (1/2)(x − x*)^T Q (x − x*) = f(x) + (1/2) x*^T Q x* = f(x) − f*
the error at x is the distance between x and x* in the norm induced by Q
Exercise: check the above formula (hint: remember Q x* + q = 0)
One can then prove that if Q ≻ 0 then
f(x^{i+1}) − f* = ( 1 − ‖d^i‖^4 / ( ((d^i)^T Q d^i)((d^i)^T Q^{-1} d^i) ) ) ( f(x^i) − f* )
the error decreases by exactly a constant factor at each iteration
Making sense of the above bound requires a bit of work
Exercise: check the above formula (hint: for y^i = x^i − x*, d^i = −Q y^i)

10 Gradient method: efficiency (cont'd)
Recall a few facts: Λ(Q) = { λ_1 ≥ ... ≥ λ_n > 0 } eigenvalues of Q ⟹ Λ(Q^{-1}) = { 1/λ_n ≥ ... ≥ 1/λ_1 > 0 } eigenvalues of Q^{-1}
λ_n ‖x‖^2 ≤ x^T Q x ≤ λ_1 ‖x‖^2 ∀ x ∈ R^n
Hence, ‖x‖^2 / (x^T Q x) ≥ 1/λ_1, ‖x‖^2 / (x^T Q^{-1} x) ≥ λ_n (check)
⟹ ∀ x ∈ R^n: ‖x‖^4 / ( (x^T Q x)(x^T Q^{-1} x) ) ≥ λ_n / λ_1
A better estimate is possible (technical, just believe it):
∀ x ∈ R^n: ‖x‖^4 / ( (x^T Q x)(x^T Q^{-1} x) ) ≥ 4 λ_1 λ_n / (λ_1 + λ_n)^2
A bit better: with λ_1 = 1000 λ_n, λ_n / λ_1 = 0.001 < 4 λ_1 λ_n / (λ_1 + λ_n)^2 ≈ 0.004

11 Gradient method: efficiency (wrap up)
All in all: f(x^{i+1}) − f* ≤ ( (λ_1 − λ_n) / (λ_1 + λ_n) )^2 ( f(x^i) − f* )
the prototype of all linear convergence results
Good news: the bound is dimension independent, it does not depend on n ⟹ holds the same for very-large-scale problems
Bad news: the bound depends badly on the conditioning of Q
Example: λ_1 = 1000 λ_n ⟹ R = (999/1001)^2 ≈ 0.996, 1 / log_10(1/R) ≈ 576
Note: with the coarser formula, R = 1 − λ_n/λ_1 = 0.999, 1 / log_10(1/R) ≈ 2302
With f(x^1) − f* = 1, ε = 10^{-6} requires ≈ 3500 iterations even for n = 2... but also for n = 10^8
Dimension independence is liked a lot in ML, but R may → 1 as n grows
More bad news: the behaviour in practice is close to the bound
Intuitively, the algorithm zig-zags a lot when level sets are very elongated
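The arithmetic behind the example above can be checked with a small sketch (function names are mine):

```python
import math

def linear_rate(l1, ln):
    """Per-iteration error-reduction factor R = ((λ1 − λn)/(λ1 + λn))^2 of the bound."""
    return ((l1 - ln) / (l1 + ln)) ** 2

def iters_to_accuracy(R, gap0, eps):
    """Smallest i with gap0 * R^i <= eps, per the linear-convergence bound."""
    return math.ceil(math.log(gap0 / eps) / math.log(1.0 / R))
```

For λ_1 = 1000 λ_n this gives R ≈ 0.996 and roughly 3450 iterations to reach ε = 10^{-6} from a unit initial gap, matching the "≈ 3500 iterations" on the slide.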

13 En passant: the stopping criterion
The stopping criterion is not what one would want, which is
f(x^i) − f* = ε_A ≤ ε (absolute error) or ε_A / |f*| = ε_R ≤ ε (relative error)
(a more or less alternative version has |f(x^i)| at the denominator)
Exercise: the definition of ε_R has a glaring numerical problem, fix it
Exercise: explain exactly why ε_R is better than ε_A
Except, f* is unknown (most often) and cannot be used on-line
Need a lower bound f̲ ≤ f*, tight at least towards termination
Estimating f* could be considered "the true problem"
Often f̲ is not there, hence ‖∇f(x^i)‖ ≤ ε is the only workable alternative
But the relationship between the two ε is far from obvious
Sometimes ‖∇f(x)‖ has a physical meaning that can be used
Exercise: for X = B(0, r) and f convex, estimate ε_A when ‖∇f(x^i)‖ ≤ ε
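One plausible fix for the "glaring numerical problem" of ε_R (division by zero when f* = 0) is the standard guard below; this is my sketch of one common convention, not necessarily the fix the exercise has in mind:

```python
def relative_error(fx, f_star):
    """Relative optimality gap; the max(...) guard avoids dividing by
    zero when f* = 0, falling back to the absolute gap in that case."""
    return (fx - f_star) / max(abs(f_star), 1.0)
```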

14 Outline Unconstrained optimization Gradient method for quadratic functions Gradient method for general functions Exact Line Search: first-order approaches Exact Line Search: second-order approaches Exact Line Search: zeroth-order approaches Inexact Line Search: Armijo-Wolfe Really inexact Line Search: fixed stepsize

15 Gradient method: non-quadratic case
What happens when f is a general nonlinear function?
Good news: convergence is the same (we never used "f quadratic")
Condition ⟨∇f(x^i), ∇f(x^{i+1})⟩ = 0 holds at local minima (but also at local maxima and saddle points), so convexity is not crucial
Good/bad news: efficiency is basically the same. f ∈ C^2, x* a local minimum such that ∇^2 f(x*) = Q ≻ 0; if { x^i } → x*, then { f(x^i) } → f(x*) linearly with the same R as in the quadratic case (depending on λ_1 and λ_n of Q)
In the tail of the convergence process f ≈ its second-order model, so convergence is the same
Fundamental issue: exact line search is difficult
Algebraic solution (compute the derivative of f( x − α ∇f(x) ) in α, find its roots) possible only in a limited set of cases
Has to algorithmically search along the line for the right α_i (doh!)

16 Outline Unconstrained optimization Gradient method for quadratic functions Gradient method for general functions Exact Line Search: first-order approaches Exact Line Search: second-order approaches Exact Line Search: zeroth-order approaches Inexact Line Search: Armijo-Wolfe Really inexact Line Search: fixed stepsize

17 Line Search: first-order approaches
For ϕ(α) = f(x^i + α d^i) : R → R, ϕ'(α) = ⟨∇f(x^i + α d^i), d^i⟩
Exercise: prove this using the chain rule: f : R^m → R^k, g : R^n → R^m, h(x) = f(g(x)) : R^n → R^k ⟹ Jh(x) = Jf(g(x)) Jg(x)
(note that Jf ∈ R^{k×m}, Jg ∈ R^{m×n}, in fact Jh ∈ R^{k×m} · R^{m×n} = R^{k×n})
Find α_i s.t. ϕ'(α_i) = 0
∇f continuous ⟹ ϕ' continuous (why?)
α_i must exist if ∃ ᾱ s.t. ϕ'(ᾱ) > 0
Exercise: prove this (hint: use the intermediate value theorem)
Obvious solution:
ᾱ ← 1; // or whatever value
while( ϕ'(ᾱ) < 0 ) do ᾱ ← 2ᾱ; // or whatever factor
Will work in practice for all reasonable functions
Works if ϕ is coercive: lim_{α→∞} ϕ(α) = +∞ (e.g. f strongly convex)
Exercise: construct an example where ᾱ exists but is not found
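The step-doubling loop above can be sketched as follows (my own rendering; the function name and the iteration cap, which guards against ϕ unbounded below, are assumptions):

```python
def bracket(phi_prime, alpha0=1.0, factor=2.0, max_doublings=60):
    """Double the step until phi'(alpha) >= 0, so that [0, alpha]
    must contain a stationary point of phi (intermediate value theorem,
    assuming phi'(0) < 0 and phi' continuous)."""
    a = alpha0
    for _ in range(max_doublings):
        if phi_prime(a) >= 0:
            return a
        a *= factor
    raise RuntimeError("no bracket found: phi may be unbounded below")
```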

18 Line Search: Bisection method
Pretty darn obvious:
procedure α = LSBM( ϕ', ᾱ, ε ) {
  α⁻ ← 0; α⁺ ← ᾱ;
  while( true ) do { α ← ( α⁺ + α⁻ )/2; v ← ϕ'(α);
    if( |v| ≤ ε ) then break;
    if( v < 0 ) then α⁻ ← α; else α⁺ ← α; } }
Asymptotic convergence: ε = 0 ⟹ { α_k } infinite sequence
{ α_k } ⊂ [0, ᾱ] ⟹ convergent subsequence to some α* (why?)
α* ∈ [ α_k⁻, α_k⁺ ] ∀ k, α_k⁺ − α_k⁻ = ᾱ 2^{−k} ⟹ { α_k } → α* (why?)
⟹ { ϕ'(α_k) } → ϕ'(α*) = 0 (why?) ⟹ finitely terminates for ε > 0
Exercise: prove: ϕ' locally Lipschitz at α* ⟹ { ϕ'(α_k) } → 0 linearly (R?)
Exercise: construct a counter-example (ϕ' not locally Lipschitz)
Exercise: suggest assumptions ensuring ϕ' locally Lipschitz ⟹ linear convergence
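A direct Python transcription of LSBM (names mine; an iteration cap is added so the ε = 0 case cannot loop forever):

```python
def ls_bisection(phi_prime, alpha_bar, eps=1e-8, max_iter=200):
    """Bisection on phi' over [0, alpha_bar]; assumes phi'(0) < 0 <= phi'(alpha_bar)."""
    lo, hi = 0.0, alpha_bar
    for _ in range(max_iter):
        a = 0.5 * (lo + hi)
        v = phi_prime(a)
        if abs(v) <= eps:
            return a
        if v < 0:
            lo = a          # stationary point is to the right
        else:
            hi = a          # stationary point is to the left
    return 0.5 * (lo + hi)
```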

19 Improving the bisection method: interpolation
Choosing α_{k+1} right in the middle is just the dumbest possible approach
One knows a lot about ϕ: ϕ(α⁻), ϕ(α⁺), ϕ'(α⁻), ϕ'(α⁺)
(the ϕ values need be computed, but are usually free if one computes ϕ')
Quadratic interpolation: a α^2 + b α + c that agrees with ϕ at α⁻, α⁺
Three parameters, four conditions, something's gotta give (three cases)
Example: 2 a α⁺ + b = ϕ'(α⁺), 2 a α⁻ + b = ϕ'(α⁻) ⟹
a = ( ϕ'(α⁺) − ϕ'(α⁻) ) / ( 2 (α⁺ − α⁻) ), b = ( α⁺ ϕ'(α⁻) − α⁻ ϕ'(α⁺) ) / ( α⁺ − α⁻ )
Minimum solves 2 a α + b = 0 (c irrelevant):
α = ( α⁻ ϕ'(α⁺) − α⁺ ϕ'(α⁻) ) / ( ϕ'(α⁺) − ϕ'(α⁻) )
a convex combination of α⁺ and α⁻ (check)
Exercise: develop the other cases of quadratic interpolation and discuss them
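The interpolation step above (the derivative-only case, i.e. the "false position" formula) can be written as a one-liner; the function name is mine:

```python
def secant_step(a_minus, a_plus, dphi_minus, dphi_plus):
    """Minimizer of the quadratic whose derivative interpolates
    phi' at a_minus and a_plus: solves 2*a*alpha + b = 0.
    Requires dphi_minus < 0 < dphi_plus, so the result is a
    convex combination of the two endpoints."""
    return (a_minus * dphi_plus - a_plus * dphi_minus) / (dphi_plus - dphi_minus)
```

Since the formula is exact for quadratic ϕ, feeding it the derivative of (α − 3)^2 at 0 and 4 lands on the minimizer in one step.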

20 Improving the bisection method: more interpolation
It can be proven (long and complicated) that, if ϕ ∈ C^3, then quadratic interpolation has convergence of order 1 < p < 2 (superlinear)
For instance, the previous formula (a.k.a. "method of false position" or "secant formula") has p = (1 + √5)/2 ≈ 1.618
Exercise: propose a simple modification that guarantees (linear) convergence even if ϕ ∉ C^3, while changing the "normal" run as little as possible
Four conditions ⟹ can fit a cubic polynomial and use its minima
Rather tedious to write down, analyse and implement
Theoretically pays: cubic interpolation has quadratic convergence (p = 2)
Seems to work pretty well in practice
Exercise (not for the faint of heart): develop cubic interpolation

21 Outline Unconstrained optimization Gradient method for quadratic functions Gradient method for general functions Exact Line Search: first-order approaches Exact Line Search: second-order approaches Exact Line Search: zeroth-order approaches Inexact Line Search: Armijo-Wolfe Really inexact Line Search: fixed stepsize

22 Line Search: second-order approaches
More derivatives ⟹ same information with fewer points
f ∈ C^2 ⟹ ϕ''(α) = d^T ∇^2 f(x + α d) d, and it is continuous (why?)
Exercise: prove this using the chain rule
Computing ∇^2 f ⟹ quadratic convergence with only one point
Newton's method (tangent method): first-order Taylor of ϕ' at α_k:
ϕ'(α) ≈ ϕ'(α_k) + ϕ''(α_k)( α − α_k ), solve ϕ'(α) = 0 ⟹ α = α_k − ϕ'(α_k) / ϕ''(α_k)
This is clearly a second-order approximation of ϕ
Fantastically simple:
procedure α = LSNM( ϕ', ϕ'', α, ε ) { while( |ϕ'(α)| > ε ) do α ← α − ϕ'(α) / ϕ''(α); }
Extremely good convergence (under appropriate conditions)
Clearly numerically delicate: what if ϕ''(α) ≈ 0?
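The LSNM procedure above translates almost verbatim into Python (names mine; the iteration cap is my addition, since away from the safe neighbourhood Newton's method can wander):

```python
import math

def ls_newton(dphi, d2phi, alpha, eps=1e-10, max_iter=100):
    """Newton's (tangent) method on phi'(alpha) = 0.
    Quadratically convergent near a point with phi'' != 0, but
    numerically delicate when phi''(alpha) ~ 0 (division blows up)."""
    for _ in range(max_iter):
        v = dphi(alpha)
        if abs(v) <= eps:
            return alpha
        alpha = alpha - v / d2phi(alpha)
    return alpha
```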

23 Analysis of Newton's method
The theoretical analysis of Newton's method is instructive
If ϕ ∈ C^3, ϕ'(α*) = 0 and ϕ''(α*) ≠ 0, then ∃ δ > 0 s.t. if Newton's method starts at α ∈ [ α* − δ, α* + δ ], then { α_k } → α* with p = 2
Proof: the iteration gives
α_{k+1} − α* = α_k − α* − ( ϕ'(α_k) − ϕ'(α*) ) / ϕ''(α_k) = [ ϕ'(α*) − ϕ'(α_k) + ϕ''(α_k)( α_k − α* ) ] / ϕ''(α_k)
For some β ∈ [ α_k, α* ], Taylor gives ϕ'(α*) = ϕ'(α_k) + ϕ''(α_k)( α* − α_k ) + ϕ'''(β)( α* − α_k )^2 / 2
⟹ α_{k+1} − α* = [ ϕ'''(β) / ( 2 ϕ''(α_k) ) ]( α_k − α* )^2
∃ δ > 0 s.t. |ϕ''(α)| ≥ k_2 > 0 (why?) and |ϕ'''(β)| ≤ k_1 < ∞ (why?) for α, β ∈ [ α* − δ, α* + δ ]
⟹ |α_{k+1} − α*| ≤ [ k_1 / ( 2 k_2 ) ] |α_k − α*|^2
k_1 |α_k − α*| / ( 2 k_2 ) < 1 ⟹ |α_{k+1} − α*| < |α_k − α*| ⟹ { α_k } → α*, and the convergence is quadratic
Convergence only if the initial error |α_1 − α*| is small enough
Nontrivial to ensure in practice

24 Outline Unconstrained optimization Gradient method for quadratic functions Gradient method for general functions Exact Line Search: first-order approaches Exact Line Search: second-order approaches Exact Line Search: zeroth-order approaches Inexact Line Search: Armijo-Wolfe Really inexact Line Search: fixed stepsize

25 Line Search: zeroth-order approaches
Computing ∇f / ∇^2 f can be costly ( d^T ∇^2 f d is O(n^2) already )
Only use ϕ values: fewer derivatives ⟹ more points
Golden ratio search, assuming ϕ unimodal on [ 0, ᾱ ]:
procedure α = LSGRM( ϕ, ᾱ, ε ) {
  α⁻ ← 0; α⁺ ← ᾱ; α'⁻ ← α⁻ + 0.382( α⁺ − α⁻ ); α'⁺ ← α⁻ + 0.618( α⁺ − α⁻ );
  while( α⁺ − α⁻ > ε ) do
    if( ϕ(α'⁻) > ϕ(α'⁺) ) then { α⁻ ← α'⁻; α'⁻ ← α'⁺; α'⁺ ← α⁻ + 0.618( α⁺ − α⁻ ); }
    else { α⁺ ← α'⁺; α'⁺ ← α'⁻; α'⁻ ← α⁻ + 0.382( α⁺ − α⁻ ); } }
r = ( √5 − 1 )/2 ≈ 0.618 (inverse golden ratio), 0.382 = 1 − r
Property: r^2 = 1 − r, i.e., ( 1 − r ) : r = r : 1
Can compute only one new ϕ(α) per iteration
Can do slightly better by using r_k = F_{n−k} / F_{n−k+1} (Fibonacci sequence)
Exercise: picture out graphically how it works
Exercise: analyse asymptotic and finite convergence of the approach
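A runnable sketch of LSGRM (my rendering; names are assumptions). The reuse of the surviving interior point is what makes one ϕ evaluation per iteration enough:

```python
import math

def ls_golden(phi, alpha_bar, eps=1e-8):
    """Golden-section search on [0, alpha_bar] for unimodal phi."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # ~0.618
    lo, hi = 0.0, alpha_bar
    a = hi - r * (hi - lo)                    # = lo + 0.382*(hi - lo)
    b = lo + r * (hi - lo)
    fa, fb = phi(a), phi(b)
    while hi - lo > eps:
        if fa > fb:                           # minimum lies in [a, hi]
            lo, a, fa = a, b, fb              # old b becomes new a: no new eval
            b = lo + r * (hi - lo)
            fb = phi(b)
        else:                                 # minimum lies in [lo, b]
            hi, b, fb = b, a, fa              # old a becomes new b: no new eval
            a = hi - r * (hi - lo)
            fa = phi(a)
    return 0.5 * (lo + hi)
```

The recycling works because r^2 = 1 − r: after shrinking the interval by the factor r, the surviving interior point sits exactly where the next iteration needs it.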

26 Gradient method and (inexact) line search
Is |ϕ'(α_i)| ≤ ε enough for convergence? It depends on ε (of course)
Trick: d^i = −∇f(x^i) / ‖∇f(x^i)‖ ⟹ ‖d^i‖ = 1, ϕ'(0) = −‖∇f(x^i)‖
ϕ'(α_i) = ⟨d^i, ∇f(x^{i+1})⟩ = −⟨∇f(x^i) / ‖∇f(x^i)‖, ∇f(x^{i+1})⟩
{ x^i } → x̄ ⟹ lim_i ⟨∇f(x^i) / ‖∇f(x^i)‖, ∇f(x^{i+1})⟩ = ⟨∇f(x̄) / ‖∇f(x̄)‖, ∇f(x̄)⟩ = ‖∇f(x̄)‖ ≤ ε (note: ‖∇f(x^i)‖ > ε while the algorithm runs)
ε > 0 and { x^i } → x̄ ⟹ for some finite i, x^i is an approximate stationary point
Note: with d^i := −∇f(x^i), the test becomes |ϕ'(α_i)| ≤ ε ‖∇f(x^i)‖
Other assumptions on f are needed to ensure { x^i } → x̄ (R^n not compact)
Simple one: f coercive, i.e., lim_{‖x‖→∞} f(x) = +∞
f continuous and coercive ⟹ S(f, v) compact ∀ v
Exercise: prove f coercive (+ what else needed) ⟹ the algorithm finitely stops
Exercise: discuss how to get asymptotic convergence (ε = 0)
Do we really need a close approximation to ∇f(x̄) = 0?

27 Outline Unconstrained optimization Gradient method for quadratic functions Gradient method for general functions Exact Line Search: first-order approaches Exact Line Search: second-order approaches Exact Line Search: zeroth-order approaches Inexact Line Search: Armijo-Wolfe Really inexact Line Search: fixed stepsize

28 Gradient method and (really) inexact line search
Don't need to get a local minimum, just decrease f "enough"
Armijo condition: 0 < m_1 < 1, (A) ϕ(α) ≤ ϕ(0) + m_1 α ϕ'(0)
accept a fraction m_1 (≪ 1) of the descent promised by ϕ'
Issue: arbitrarily short steps satisfy (A)
Goldstein condition: m_1 < m_2 < 1, (G) ϕ(α) ≥ ϕ(0) + m_2 α ϕ'(0)
Issue: (A) ∧ (G) can easily exclude all local minima
Wolfe condition: m_1 < m_3 < 1, (W) ϕ'(α) ≥ m_3 ϕ'(0)
the slope has to be a bit closer to 0 (but can be > 0)
Strong Wolfe: (W') |ϕ'(α)| ≤ −m_3 ϕ'(0) = m_3 |ϕ'(0)|
the slope cannot be ≫ 0, but (W') still captures all local minima (and maxima)
Clearly, (W') ⟹ (W)
(A) ∧ (W)/(W') typically captures all local minima, unless m_1 is too close to 1 (that's why m_1 ≪ 1)
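The (A), (W) and (W') tests above are one-liners; here is a sketch with my own names (the default values of m_1 and m_3 are common conventions, not prescribed by the slide):

```python
def armijo(phi0, dphi0, phi_a, alpha, m1=1e-4):
    """(A): accept at least a fraction m1 of the descent promised by phi'(0) < 0."""
    return phi_a <= phi0 + m1 * alpha * dphi0

def wolfe(dphi0, dphi_a, m3=0.9, strong=False):
    """(W): phi'(alpha) >= m3 * phi'(0); (W'): |phi'(alpha)| <= m3 * |phi'(0)|."""
    if strong:
        return abs(dphi_a) <= -m3 * dphi0   # -m3*phi'(0) = m3*|phi'(0)| since phi'(0) < 0
    return dphi_a >= m3 * dphi0
```

For instance, for ϕ(α) = (1 − α)^2 (so ϕ(0) = 1, ϕ'(0) = −2) the exact minimizer α = 1 passes both tests, while the too-long step α = 2 fails (A).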

39 Armijo-Wolfe line search
ϕ ∈ C^1 and ϕ(α) bounded below for α ≥ 0 ⟹ ∃ α s.t. (A) ∧ (W') holds
Proof: l(α) = ϕ(0) + m_1 α ϕ'(0), d(α) = l(α) − ϕ(α) ⟹ d(0) = 0, d'(0) = ( m_1 − 1 ) ϕ'(0) > 0 (as m_1 < 1)
∄ ᾱ > 0 s.t. d(ᾱ) = 0 ⟹ ϕ unbounded below (why?)
Take the smallest ᾱ > 0 s.t. d(ᾱ) = 0: (A) is satisfied ∀ α ∈ ( 0, ᾱ ] (why?)
Rolle's theorem: d'(ᾱ) ≤ 0 ⟹ ϕ'(ᾱ) ≥ m_1 ϕ'(0) ( > m_3 ϕ'(0) > ϕ'(0) )
Intermediate value theorem (on ϕ'): ∃ α* ∈ ( 0, ᾱ ) s.t. ϕ'(α*) = m_3 ϕ'(0) ⟹ (W') also holds at α*
But how do I actually find such a point?
m_1 small enough s.t. local minima are not cut ⟹ just go for the local minima and stop whenever (A) ∧ (W)/(W') holds
Hard to say if m_1 is "small enough", although a tiny m_1 most often is
A specialized line search can be constructed for the odd case it is not
Basic idea: find an interval [ α̲, ᾱ ] that surely contains points satisfying (A) ∧ (W)/(W') (cf. the proof above), restrict the search there inside
Exercise (not for the faint of heart): develop the specialized line search

40 Convergence with Armijo-Wolfe line search
∇f Lipschitz continuous and (A) ∧ (W) always hold ⟹ either f is unbounded below or { ‖∇f(x^i)‖ } → 0
Proof: (W) ⟹ ϕ'(α_i) − ϕ'(0) ≥ ( 1 − m_3 )( −ϕ'(0) )
∇f Lipschitz ⟹ ϕ' Lipschitz, and L does not depend on x^i (check)
⟹ α_i ≥ ( 1 − m_3 )( −ϕ'(0) ) / L (check: where has ‖d^i‖ gone?)
−ϕ'(0) = ‖∇f(x^i)‖ > ε > 0 ⟹ α_i ≥ δ > 0
(A) ⟹ f(x^{i+1}) ≤ f(x^i) − m_1 α_i ‖∇f(x^i)‖ ≤ f(x^i) − m_1 δ ε ⟹ { f(x^i) } → −∞ (or { ‖∇f(x^i)‖ } → 0)
Usual stuff: { x^i } → x̄ ⟹ x̄ is a stationary point
Hence, the algorithm finitely terminates with ε > 0
Insight from the proof: (W) (+ Lipschitz) serve to ensure that α_i ≥ c ‖∇f(x^i)‖ for some c > 0
Can we get the same in a simpler way?

41 Backtracking line search
Backtracking line search:
procedure α = BLS( ϕ, ϕ'(0), α, m_1, τ ) { while( ϕ(α) > ϕ(0) + m_1 α ϕ'(0) ) do α ← τ α; }
∇f Lipschitz ⟹ the gradient method with BLS works
Proof: for simplicity, the initial α = 1 (input). Remember the previous proof: ∃ ᾱ s.t. (A) holds ∀ α ∈ ( 0, ᾱ ] and ϕ'(ᾱ) > m_1 ϕ'(0) > ϕ'(0)
⟹ L( ᾱ − 0 ) ≥ ϕ'(ᾱ) − ϕ'(0) > ( 1 − m_1 )( −ϕ'(0) ) ⟹ ᾱ > ( 1 − m_1 ) ‖∇f(x^i)‖ / L (same as before)
‖∇f(x^i)‖ > ε ∀ i ⟹ ᾱ > δ > 0 ∀ i
h = min{ k : τ^k ≤ δ } ⟹ α_i ≥ τ^h > 0 ∀ i ⟹ f(x^{i+1}) ≤ f(x^i) − m_1 τ^h ε ⟹ { f(x^i) } → −∞ or the algorithm stops
Now, { x^i } → x̄ ⟹ x̄ stationary, blah blah
Fundamental trick: α_i can → 0, but only as fast as ‖∇f(x^i)‖
Would be simpler if α_i ≥ δ > 0 for good
Exercise: remove the assumption α = 1 (input)
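A Python sketch of BLS applied along d = −∇f(x) (my rendering with NumPy; names are assumptions, and termination relies on d being a descent direction for smooth f, as in the proof above):

```python
import numpy as np

def backtracking(f, grad, x, m1=1e-4, tau=0.5, alpha=1.0):
    """Armijo backtracking along d = -grad f(x): shrink alpha by tau
    until the sufficient-decrease condition (A) holds."""
    g = grad(x)
    d = -g
    fx = f(x)
    dphi0 = -(g @ g)                 # phi'(0) = grad f(x)^T d < 0
    while f(x + alpha * d) > fx + m1 * alpha * dphi0:
        alpha *= tau
    return alpha
```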

42 Outline Unconstrained optimization Gradient method for quadratic functions Gradient method for general functions Exact Line Search: first-order approaches Exact Line Search: second-order approaches Exact Line Search: zeroth-order approaches Inexact Line Search: Armijo-Wolfe Really inexact Line Search: fixed stepsize

43 Line Search: really really inexact
... no line search at all: fixed stepsize
Recall ∇f Lipschitz ⟹ f(y) ≤ f(x) + ∇f(x)( y − x ) + ( L/2 ) ‖y − x‖^2
y := x^{i+1}, x := x^i, y − x := −α ∇f(x^i) ⟹ f(x^{i+1}) ≤ f(x^i) + ( L α^2 / 2 − α ) ‖∇f(x^i)‖^2 (check)
Powerful idea: find the α that provides the best worst-case improvement:
v(α) = L α^2 / 2 − α, v'(α) = L α − 1 = 0 ⟹ α* = 1/L, v(α*) = −1/(2L)
All in all: f(x^{i+1}) ≤ f(x^i) − ‖∇f(x^i)‖^2 / (2L)
Can't do better if you trust the quadratic upper estimate (which of course must not be trusted)
In fact, α_i = 1/L is terrible in practice ⟹ use the previous methods
Enticing because simple and inexpensive
Selecting the parameters that lead to the best performance for a model is a very powerful idea in general
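The fixed-step method is barely more than one line per iteration; a sketch of mine (names are assumptions, L is the Lipschitz constant of ∇f, here λ_max(Q) for a quadratic):

```python
import numpy as np

def gradient_fixed_step(grad, x, L, eps=1e-8, max_iter=100000):
    """Gradient method with the fixed stepsize 1/L, which guarantees a
    decrease of at least ||grad f(x)||^2/(2L) per step."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            return x
        x = x - g / L
    return x
```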

44 Fixed stepsize: convergence rate
Once you have convergence, you can talk efficiency (easier with α fixed)
Already know the error decreases, but how fast?
Δ^{i+1} := f(x^{i+1}) − f(x*) ( Δ^i := f(x^i) − f(x*) )
f(x*) ≤ f(x^{i+1}) ≤ f(x^i) − ‖∇f(x^i)‖^2/(2L) with x^i arbitrary ⟹ f(x) − f(x*) ≥ ‖∇f(x)‖^2/(2L) ∀ x
f convex ⟹ ∇f(x)( x − x* ) ≥ f(x) − f(x*) ≥ ‖∇f(x)‖^2/(2L) ∀ x
This proves r^i := ‖x^i − x*‖ decreases:
( r^{i+1} )^2 = ‖x^{i+1} − x*‖^2 = ‖x^i − x* − ∇f(x^i)/L‖^2 = ‖x^i − x*‖^2 − 2 ∇f(x^i)( x^i − x* )/L + ‖∇f(x^i)‖^2/L^2 ≤ ‖x^i − x*‖^2 = ( r^i )^2
Hence, at the very least { x^i } → x* (no problem here)
Technical step: ‖∇f(x^i)‖ ≥ ∇f(x^i)( x^i − x* )/r^i [Cauchy-Schwarz] ≥ ( f(x^i) − f(x*) )/r^i [convexity] ≥ Δ^i/r^1 [r^i ≤ r^1]
Conclusion: Δ^{i+1} ≤ Δ^i − ‖∇f(x^i)‖^2/(2L) ≤ Δ^i − ( Δ^i )^2/( 2 (r^1)^2 L ) = Δ^i ( 1 − Δ^i/( 2 (r^1)^2 L ) )
not linear convergence, as the factor in front of Δ^i is not a constant R < 1: sublinear

45 Fixed stepsize: convergence rate (cont'd)
What does this mean, exactly? Δ^{i+1} ≤ Δ^i − ( Δ^i )^2/( 2 (r^1)^2 L ): divide by Δ^i Δ^{i+1}:
1/Δ^i ≤ 1/Δ^{i+1} − Δ^i/( Δ^{i+1} 2 (r^1)^2 L ) ⟹ 1/Δ^{i+1} ≥ 1/Δ^i + 1/( 2 (r^1)^2 L ) (why?)
1/Δ grows by a constant at each i ⟹ 1/Δ^{i+1} ≥ 1/Δ^1 + i/( 2 (r^1)^2 L ) ⟹ Δ^i ≤ 2 (r^1)^2 L Δ^1 / ( 2 (r^1)^2 L + ( i − 1 ) Δ^1 )
Error decreases as O( 1/i ) ⟹ O( 1/ε ) iterations (check details)
Exponentially worse than O( log( 1/ε ) )
However, this comparison is unfair: the linear rate used Q nonsingular, λ_n > 0
Does it make a difference? You bet

46 Fixed stepsize: convergence rate with strong convexity
Basically, strong convexity ≡ eigenvalues bounded both above and below: u I ⪯ ∇^2 f(x) ⪯ L I, u > 0
Taylor ⟹ f(x) ≥ f(x^i) + ∇f(x^i)( x − x^i ) + u ‖x − x^i‖^2/2 (why?)
Minimize both sides independently in x ⟹ f(x*) ≥ f(x^i) − ‖∇f(x^i)‖^2/(2u) (check) ⟹ ‖∇f(x^i)‖^2 ≥ 2u( f(x^i) − f(x*) )
Put it in f(x^{i+1}) − f(x*) ≤ f(x^i) − f(x*) − ‖∇f(x^i)‖^2/(2L) ⟹
f(x^{i+1}) − f(x*) ≤ ( f(x^i) − f(x*) )( 1 − u/L )
with the exact step, funnily, the same as with the coarse estimate, i.e., much worse
A small difference in f makes a big difference in convergence
Properties of f are even more important than the algorithm
O( 1/ε ) is not the best possible for f not strongly convex: can be O( 1/√ε )
better, but still much worse than O( log( 1/ε ) )
Hence better algorithms do count, we'll work towards that
However, O( 1/√ε ) is tight: can't do better without strong convexity
Algorithms can only get so far with nasty problems

47 Wrap up
Gradient (descent direction) + line search ⟹ convergence
Line search by no means has to be exact... but not too coarse either
Many different practical line searches, up to no search at all
Convergence of gradient methods can be from quite bad to horrible... in practice as well as in theory
Something better sorely needed


Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

1 Numerical optimization

1 Numerical optimization Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................

More information

Outline. Scientific Computing: An Introductory Survey. Nonlinear Equations. Nonlinear Equations. Examples: Nonlinear Equations

Outline. Scientific Computing: An Introductory Survey. Nonlinear Equations. Nonlinear Equations. Examples: Nonlinear Equations Methods for Systems of Methods for Systems of Outline Scientific Computing: An Introductory Survey Chapter 5 1 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign

More information

Optimization Methods. Lecture 19: Line Searches and Newton s Method

Optimization Methods. Lecture 19: Line Searches and Newton s Method 15.93 Optimization Methods Lecture 19: Line Searches and Newton s Method 1 Last Lecture Necessary Conditions for Optimality (identifies candidates) x local min f(x ) =, f(x ) PSD Slide 1 Sufficient Conditions

More information

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 5. Nonlinear Equations

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 5. Nonlinear Equations Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T Heath Chapter 5 Nonlinear Equations Copyright c 2001 Reproduction permitted only for noncommercial, educational

More information

V. Graph Sketching and Max-Min Problems

V. Graph Sketching and Max-Min Problems V. Graph Sketching and Max-Min Problems The signs of the first and second derivatives of a function tell us something about the shape of its graph. In this chapter we learn how to find that information.

More information

Generating Function Notes , Fall 2005, Prof. Peter Shor

Generating Function Notes , Fall 2005, Prof. Peter Shor Counting Change Generating Function Notes 80, Fall 00, Prof Peter Shor In this lecture, I m going to talk about generating functions We ve already seen an example of generating functions Recall when we

More information

Non-Convex Optimization. CS6787 Lecture 7 Fall 2017

Non-Convex Optimization. CS6787 Lecture 7 Fall 2017 Non-Convex Optimization CS6787 Lecture 7 Fall 2017 First some words about grading I sent out a bunch of grades on the course management system Everyone should have all their grades in Not including paper

More information

Motivation: We have already seen an example of a system of nonlinear equations when we studied Gaussian integration (p.8 of integration notes)

Motivation: We have already seen an example of a system of nonlinear equations when we studied Gaussian integration (p.8 of integration notes) AMSC/CMSC 460 Computational Methods, Fall 2007 UNIT 5: Nonlinear Equations Dianne P. O Leary c 2001, 2002, 2007 Solving Nonlinear Equations and Optimization Problems Read Chapter 8. Skip Section 8.1.1.

More information

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file

More information

ROOT FINDING REVIEW MICHELLE FENG

ROOT FINDING REVIEW MICHELLE FENG ROOT FINDING REVIEW MICHELLE FENG 1.1. Bisection Method. 1. Root Finding Methods (1) Very naive approach based on the Intermediate Value Theorem (2) You need to be looking in an interval with only one

More information

Lecture 4: Training a Classifier

Lecture 4: Training a Classifier Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as

More information

Numerical Methods in Informatics

Numerical Methods in Informatics Numerical Methods in Informatics Lecture 2, 30.09.2016: Nonlinear Equations in One Variable http://www.math.uzh.ch/binf4232 Tulin Kaman Institute of Mathematics, University of Zurich E-mail: tulin.kaman@math.uzh.ch

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 5 Nonlinear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

THE SECANT METHOD. q(x) = a 0 + a 1 x. with

THE SECANT METHOD. q(x) = a 0 + a 1 x. with THE SECANT METHOD Newton s method was based on using the line tangent to the curve of y = f (x), with the point of tangency (x 0, f (x 0 )). When x 0 α, the graph of the tangent line is approximately the

More information

Lecture 10: Powers of Matrices, Difference Equations

Lecture 10: Powers of Matrices, Difference Equations Lecture 10: Powers of Matrices, Difference Equations Difference Equations A difference equation, also sometimes called a recurrence equation is an equation that defines a sequence recursively, i.e. each

More information

CS 450 Numerical Analysis. Chapter 5: Nonlinear Equations

CS 450 Numerical Analysis. Chapter 5: Nonlinear Equations Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

More information

Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore

Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 13 Steepest Descent Method Hello, welcome back to this series

More information

5 Overview of algorithms for unconstrained optimization

5 Overview of algorithms for unconstrained optimization IOE 59: NLP, Winter 22 c Marina A. Epelman 9 5 Overview of algorithms for unconstrained optimization 5. General optimization algorithm Recall: we are attempting to solve the problem (P) min f(x) s.t. x

More information

Bindel, Fall 2011 Intro to Scientific Computing (CS 3220) Week 6: Monday, Mar 7. e k+1 = 1 f (ξ k ) 2 f (x k ) e2 k.

Bindel, Fall 2011 Intro to Scientific Computing (CS 3220) Week 6: Monday, Mar 7. e k+1 = 1 f (ξ k ) 2 f (x k ) e2 k. Problem du jour Week 6: Monday, Mar 7 Show that for any initial guess x 0 > 0, Newton iteration on f(x) = x 2 a produces a decreasing sequence x 1 x 2... x n a. What is the rate of convergence if a = 0?

More information

Lecture 4: Training a Classifier

Lecture 4: Training a Classifier Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as

More information

Numerical solutions of nonlinear systems of equations

Numerical solutions of nonlinear systems of equations Numerical solutions of nonlinear systems of equations Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan E-mail: min@math.ntnu.edu.tw August 28, 2011 Outline 1 Fixed points

More information

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero Chapter Limits of Sequences Calculus Student: lim s n = 0 means the s n are getting closer and closer to zero but never gets there. Instructor: ARGHHHHH! Exercise. Think of a better response for the instructor.

More information

CS 323: Numerical Analysis and Computing

CS 323: Numerical Analysis and Computing CS 323: Numerical Analysis and Computing MIDTERM #2 Instructions: This is an open notes exam, i.e., you are allowed to consult any textbook, your class notes, homeworks, or any of the handouts from us.

More information

Numerical Methods I Solving Nonlinear Equations

Numerical Methods I Solving Nonlinear Equations Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)

More information

2.5 The Fundamental Theorem of Algebra.

2.5 The Fundamental Theorem of Algebra. 2.5. THE FUNDAMENTAL THEOREM OF ALGEBRA. 79 2.5 The Fundamental Theorem of Algebra. We ve seen formulas for the (complex) roots of quadratic, cubic and quartic polynomials. It is then reasonable to ask:

More information

STOP, a i+ 1 is the desired root. )f(a i) > 0. Else If f(a i+ 1. Set a i+1 = a i+ 1 and b i+1 = b Else Set a i+1 = a i and b i+1 = a i+ 1

STOP, a i+ 1 is the desired root. )f(a i) > 0. Else If f(a i+ 1. Set a i+1 = a i+ 1 and b i+1 = b Else Set a i+1 = a i and b i+1 = a i+ 1 53 17. Lecture 17 Nonlinear Equations Essentially, the only way that one can solve nonlinear equations is by iteration. The quadratic formula enables one to compute the roots of p(x) = 0 when p P. Formulas

More information

To get horizontal and slant asymptotes algebraically we need to know about end behaviour for rational functions.

To get horizontal and slant asymptotes algebraically we need to know about end behaviour for rational functions. Concepts: Horizontal Asymptotes, Vertical Asymptotes, Slant (Oblique) Asymptotes, Transforming Reciprocal Function, Sketching Rational Functions, Solving Inequalities using Sign Charts. Rational Function

More information

MATH 1A, Complete Lecture Notes. Fedor Duzhin

MATH 1A, Complete Lecture Notes. Fedor Duzhin MATH 1A, Complete Lecture Notes Fedor Duzhin 2007 Contents I Limit 6 1 Sets and Functions 7 1.1 Sets................................. 7 1.2 Functions.............................. 8 1.3 How to define a

More information

ExtremeValuesandShapeofCurves

ExtremeValuesandShapeofCurves ExtremeValuesandShapeofCurves Philippe B. Laval Kennesaw State University March 23, 2005 Abstract This handout is a summary of the material dealing with finding extreme values and determining the shape

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Single Variable Minimization

Single Variable Minimization AA222: MDO 37 Sunday 1 st April, 2012 at 19:48 Chapter 2 Single Variable Minimization 2.1 Motivation Most practical optimization problems involve many variables, so the study of single variable minimization

More information

Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

More information

4 damped (modified) Newton methods

4 damped (modified) Newton methods 4 damped (modified) Newton methods 4.1 damped Newton method Exercise 4.1 Determine with the damped Newton method the unique real zero x of the real valued function of one variable f(x) = x 3 +x 2 using

More information

εx 2 + x 1 = 0. (2) Suppose we try a regular perturbation expansion on it. Setting ε = 0 gives x 1 = 0,

εx 2 + x 1 = 0. (2) Suppose we try a regular perturbation expansion on it. Setting ε = 0 gives x 1 = 0, 4 Rescaling In this section we ll look at one of the reasons that our ε = 0 system might not have enough solutions, and introduce a tool that is fundamental to all perturbation systems. We ll start with

More information

1.1: The bisection method. September 2017

1.1: The bisection method. September 2017 (1/11) 1.1: The bisection method Solving nonlinear equations MA385/530 Numerical Analysis September 2017 3 2 f(x)= x 2 2 x axis 1 0 1 x [0] =a x [2] =1 x [3] =1.5 x [1] =b 2 0.5 0 0.5 1 1.5 2 2.5 1 Solving

More information

Chapter 3: Root Finding. September 26, 2005

Chapter 3: Root Finding. September 26, 2005 Chapter 3: Root Finding September 26, 2005 Outline 1 Root Finding 2 3.1 The Bisection Method 3 3.2 Newton s Method: Derivation and Examples 4 3.3 How To Stop Newton s Method 5 3.4 Application: Division

More information

We consider the problem of finding a polynomial that interpolates a given set of values:

We consider the problem of finding a polynomial that interpolates a given set of values: Chapter 5 Interpolation 5. Polynomial Interpolation We consider the problem of finding a polynomial that interpolates a given set of values: x x 0 x... x n y y 0 y... y n where the x i are all distinct.

More information

Optimal Newton-type methods for nonconvex smooth optimization problems

Optimal Newton-type methods for nonconvex smooth optimization problems Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations

More information

Nonlinear equations and optimization

Nonlinear equations and optimization Notes for 2017-03-29 Nonlinear equations and optimization For the next month or so, we will be discussing methods for solving nonlinear systems of equations and multivariate optimization problems. We will

More information

Unit 2: Solving Scalar Equations. Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright

Unit 2: Solving Scalar Equations. Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright cs416: introduction to scientific computing 01/9/07 Unit : Solving Scalar Equations Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright 1 Introduction We now

More information

8.5 Taylor Polynomials and Taylor Series

8.5 Taylor Polynomials and Taylor Series 8.5. TAYLOR POLYNOMIALS AND TAYLOR SERIES 50 8.5 Taylor Polynomials and Taylor Series Motivating Questions In this section, we strive to understand the ideas generated by the following important questions:

More information

Optimization and Calculus

Optimization and Calculus Optimization and Calculus To begin, there is a close relationship between finding the roots to a function and optimizing a function. In the former case, we solve for x. In the latter, we solve: g(x) =

More information

CLASS NOTES Models, Algorithms and Data: Introduction to computing 2018

CLASS NOTES Models, Algorithms and Data: Introduction to computing 2018 CLASS NOTES Models, Algorithms and Data: Introduction to computing 2018 Petros Koumoutsakos, Jens Honore Walther (Last update: April 16, 2018) IMPORTANT DISCLAIMERS 1. REFERENCES: Much of the material

More information

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

Notes on Constrained Optimization

Notes on Constrained Optimization Notes on Constrained Optimization Wes Cowan Department of Mathematics, Rutgers University 110 Frelinghuysen Rd., Piscataway, NJ 08854 December 16, 2016 1 Introduction In the previous set of notes, we considered

More information

x 2 x n r n J(x + t(x x ))(x x )dt. For warming-up we start with methods for solving a single equation of one variable.

x 2 x n r n J(x + t(x x ))(x x )dt. For warming-up we start with methods for solving a single equation of one variable. Maria Cameron 1. Fixed point methods for solving nonlinear equations We address the problem of solving an equation of the form (1) r(x) = 0, where F (x) : R n R n is a vector-function. Eq. (1) can be written

More information

FIXED POINT ITERATION

FIXED POINT ITERATION FIXED POINT ITERATION The idea of the fixed point iteration methods is to first reformulate a equation to an equivalent fixed point problem: f (x) = 0 x = g(x) and then to use the iteration: with an initial

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

More information

Section 1.x: The Variety of Asymptotic Experiences

Section 1.x: The Variety of Asymptotic Experiences calculus sin frontera Section.x: The Variety of Asymptotic Experiences We talked in class about the function y = /x when x is large. Whether you do it with a table x-value y = /x 0 0. 00.0 000.00 or with

More information

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization 5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The

More information

Line Search Techniques

Line Search Techniques Multidisciplinary Design Optimization 33 Chapter 2 Line Search Techniques 2.1 Introduction Most practical optimization problems involve many variables, so the study of single variable minimization may

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

Numerical differentiation

Numerical differentiation Numerical differentiation Paul Seidel 1801 Lecture Notes Fall 011 Suppose that we have a function f(x) which is not given by a formula but as a result of some measurement or simulation (computer experiment)

More information

Maria Cameron. f(x) = 1 n

Maria Cameron. f(x) = 1 n Maria Cameron 1. Local algorithms for solving nonlinear equations Here we discuss local methods for nonlinear equations r(x) =. These methods are Newton, inexact Newton and quasi-newton. We will show that

More information

Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane.

Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane. Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane c Sateesh R. Mane 2018 3 Lecture 3 3.1 General remarks March 4, 2018 This

More information

3.1 Introduction. Solve non-linear real equation f(x) = 0 for real root or zero x. E.g. x x 1.5 =0, tan x x =0.

3.1 Introduction. Solve non-linear real equation f(x) = 0 for real root or zero x. E.g. x x 1.5 =0, tan x x =0. 3.1 Introduction Solve non-linear real equation f(x) = 0 for real root or zero x. E.g. x 3 +1.5x 1.5 =0, tan x x =0. Practical existence test for roots: by intermediate value theorem, f C[a, b] & f(a)f(b)

More information

WEEK 7 NOTES AND EXERCISES

WEEK 7 NOTES AND EXERCISES WEEK 7 NOTES AND EXERCISES RATES OF CHANGE (STRAIGHT LINES) Rates of change are very important in mathematics. Take for example the speed of a car. It is a measure of how far the car travels over a certain

More information

Approximation, Taylor Polynomials, and Derivatives

Approximation, Taylor Polynomials, and Derivatives Approximation, Taylor Polynomials, and Derivatives Derivatives for functions f : R n R will be central to much of Econ 501A, 501B, and 520 and also to most of what you ll do as professional economists.

More information

Math Lecture 4 Limit Laws

Math Lecture 4 Limit Laws Math 1060 Lecture 4 Limit Laws Outline Summary of last lecture Limit laws Motivation Limits of constants and the identity function Limits of sums and differences Limits of products Limits of polynomials

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

Sequence convergence, the weak T-axioms, and first countability

Sequence convergence, the weak T-axioms, and first countability Sequence convergence, the weak T-axioms, and first countability 1 Motivation Up to now we have been mentioning the notion of sequence convergence without actually defining it. So in this section we will

More information

Nonlinearity Root-finding Bisection Fixed Point Iteration Newton s Method Secant Method Conclusion. Nonlinear Systems

Nonlinearity Root-finding Bisection Fixed Point Iteration Newton s Method Secant Method Conclusion. Nonlinear Systems Nonlinear Systems CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Justin Solomon CS 205A: Mathematical Methods Nonlinear Systems 1 / 24 Part III: Nonlinear Problems Not all numerical problems

More information

Static unconstrained optimization

Static unconstrained optimization Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R

More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

More information

6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE. Three Alternatives/Remedies for Gradient Projection

6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE. Three Alternatives/Remedies for Gradient Projection 6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE Three Alternatives/Remedies for Gradient Projection Two-Metric Projection Methods Manifold Suboptimization Methods

More information

UNCONSTRAINED OPTIMIZATION

UNCONSTRAINED OPTIMIZATION UNCONSTRAINED OPTIMIZATION 6. MATHEMATICAL BASIS Given a function f : R n R, and x R n such that f(x ) < f(x) for all x R n then x is called a minimizer of f and f(x ) is the minimum(value) of f. We wish

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

MATH 23b, SPRING 2005 THEORETICAL LINEAR ALGEBRA AND MULTIVARIABLE CALCULUS Midterm (part 1) Solutions March 21, 2005

MATH 23b, SPRING 2005 THEORETICAL LINEAR ALGEBRA AND MULTIVARIABLE CALCULUS Midterm (part 1) Solutions March 21, 2005 MATH 23b, SPRING 2005 THEORETICAL LINEAR ALGEBRA AND MULTIVARIABLE CALCULUS Midterm (part 1) Solutions March 21, 2005 1. True or False (22 points, 2 each) T or F Every set in R n is either open or closed

More information

CS 323: Numerical Analysis and Computing

CS 323: Numerical Analysis and Computing CS 323: Numerical Analysis and Computing MIDTERM #2 Instructions: This is an open notes exam, i.e., you are allowed to consult any textbook, your class notes, homeworks, or any of the handouts from us.

More information

2.098/6.255/ Optimization Methods Practice True/False Questions

2.098/6.255/ Optimization Methods Practice True/False Questions 2.098/6.255/15.093 Optimization Methods Practice True/False Questions December 11, 2009 Part I For each one of the statements below, state whether it is true or false. Include a 1-3 line supporting sentence

More information

Stochastic Gradient Descent. Ryan Tibshirani Convex Optimization

Stochastic Gradient Descent. Ryan Tibshirani Convex Optimization Stochastic Gradient Descent Ryan Tibshirani Convex Optimization 10-725 Last time: proximal gradient descent Consider the problem min x g(x) + h(x) with g, h convex, g differentiable, and h simple in so

More information