Unconstrained optimization I Gradient-type methods
1 Unconstrained optimization I: Gradient-type methods

Antonio Frangioni
Department of Computer Science, University of Pisa
frangio@di.unipi.it

Computational Mathematics for Learning and Data Analysis
Master in Computer Science, University of Pisa
2 Outline

- Unconstrained optimization
- Gradient method for quadratic functions
- Gradient method for general functions
- Exact Line Search: first-order approaches
- Exact Line Search: second-order approaches
- Exact Line Search: zeroth-order approaches
- Inexact Line Search: Armijo-Wolfe
- Really inexact Line Search: fixed stepsize
3 Optimization algorithms

- Iterative procedures (doh!): start from an initial guess x^0, some process x^i → x^{i+1}
- Want the sequence { x^i } to go towards an optimal solution
- Actually three different forms:
  - (strong) { x^i } → x*: the whole sequence converges to an optimal solution
  - (weaker) all accumulation points of { x^i } (if any) are optimal solutions
  - (weakest) at least one accumulation point of { x^i } (if any) is optimal
- X compact helps (accumulation points always exist), but here X = R^n
- f not convex ⟹ "optimal" ≡ stationary point
- Two general forms of the process:
  - line search: first choose d^i ∈ R^n (direction), then choose α^i ∈ R (stepsize) s.t. x^{i+1} = x^i + α^i d^i
  - trust region: first choose α^i (trust radius), then choose d^i
- In ML, α^i is often called the "learning rate"
- Crucial concept: the model of f used to construct the next iterate
5 First example of line search: gradient method

- Simplest idea: my model is linear
- Best linear model of f at x^i: f^i(x) = f(x^i) + ∇f(x^i)(x - x^i)
- x^{i+1} ∈ argmin{ f^i(x) : x ∈ R^n }
- Except, of course, the argmin is empty: f^i is unbounded below on R^n
- Go "infinitely much" along the steepest descent direction d^i = -∇f(x^i)
- But this clearly is trusting the model too much: f(x) ≠ f^i(x) far from x^i
- As you move along d^i, ∇f changes; soon the directional derivative along d^i will no longer be negative
- Beware too long steps, as f will (probably) start growing after a while
- Too short steps are bad as well: f will decrease, but only too little
- The best step ever: α^i ∈ argmin{ f(x^i + αd^i) : α ≥ 0 }, i.e., exact line search (doh!)
- Then, x^{i+1} = x^i + α^i d^i
- Exact line search is difficult in general, let's start simple
- Exercise: prove α^i > 0
6 Gradient method for quadratic functions

- Couldn't be simpler than f(x) = ½ xᵀQx + qx
- Think Q ⪰ 0, as otherwise f is surely unbounded below
- x* solves Qx = -q (if it exists), so this is "just" linear algebra
- Inverting/factorizing Q is O(n³) in practice; can we do better?
- d^i = -∇f(x^i) = -(Qx^i + q) (O(n²) to compute)
- Good news: the line search is easy, α^i = ‖d^i‖² / ( (d^i)ᵀQd^i )

  procedure x = SDQ( Q, q, x, ε ) {
    while( ‖∇f(x)‖ > ε ) do {
      d ← -∇f(x);
      α ← ‖d‖² / ( dᵀQd );
      x ← x + αd;
    }
  }

- Exercise: prove the formula for α^i
- Exercise: there is a glaring numerical problem in that procedure, fix it
- Exercise: something can go wrong with that formula: what does it mean? Improve the code to take that occurrence into account.
- Exercise: what happens if Q is indefinite (not ⪰ 0)? Does the (improved) code need be fixed?
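The SDQ procedure above can be sketched in Python (a sketch, not the official course code; the relative stopping test and the curvature guard are one possible answer to the exercises):

```python
import numpy as np

def sdq(Q, q, x, eps=1e-8, max_iter=10000):
    """Steepest descent for f(x) = 0.5 x^T Q x + q^T x with exact stepsize.

    Stops on || grad f(x) || <= eps * max(1, || grad f(x0) ||), a relative
    test that avoids the numerical problem of a fixed absolute threshold.
    """
    g0 = np.linalg.norm(Q @ x + q)
    for _ in range(max_iter):
        d = -(Q @ x + q)                      # d = -grad f(x), O(n^2)
        if np.linalg.norm(d) <= eps * max(1.0, g0):
            return x                          # (approximately) stationary
        dQd = d @ Q @ d
        if dQd <= 0:                          # f is unbounded below along d
            raise ValueError("nonpositive curvature: f unbounded below along d")
        x = x + (d @ d) / dQd * d             # exact stepsize alpha
    return x

# Sanity check: the minimizer solves Qx = -q
Q = np.array([[2.0, 0.0], [0.0, 10.0]])
q = np.array([-2.0, -10.0])
x_star = sdq(Q, q, np.zeros(2))
```

Here the condition number of Q is only 5, so convergence is fast; the elongated case discussed later behaves much worse.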
7 Gradient method: convergence

- "The gradient method works": what does this mean?
- Asymptotic analysis: ε = 0 ⟹ { x^i } is/contains a minimizing sequence
- Fundamental relationship: ⟨ ∇f(x^i), ∇f(x^{i+1}) ⟩ = 0
  Proof: ⟨ d^i, ∇f(x^{i+1}) ⟩ is the directional derivative of f along d^i at x^{i+1}, but x^{i+1} is a local minimum along d^i
- { x^i } → x̄ ⟹ ∇f(x̄) = 0
  Proof: lim_i ⟨ ∇f(x^i), ∇f(x^{i+1}) ⟩ = 0 = ⟨ ∇f(x̄), ∇f(x̄) ⟩ (why?)
- Any subsequence that converges does so at a stationary point (weaker)
- Do (sub)sequence(s) converge? X compact would help, but X = R^n
- ε > 0 ⟹ finite termination (why?), no convergence required
- Exercise: prove that if Q ≻ 0, then { x^i } → x*, the unique optimum
8 Gradient method: efficiency

- "The gradient method is (not) fast": what does this mean?
- How rapidly ‖x^i - x*‖ decreases... hard; { x^i } may not converge, and different subsequences may go to different optima (which x*?)
- Typically: how rapidly f(x^i) - f* decreases (eventually, it has to)
- Rate/order of convergence: lim_i ( f(x^{i+1}) - f* ) / ( f(x^i) - f* )^p = R
  - p = 1, R = 1 ⟹ sublinear convergence (1/i, 1/i², ...)
  - p = 1, R < 1 ⟹ linear convergence (γ^i, γ < 1)
  - p = 1, R = 0 ⟹ superlinear (!) convergence (γ^{i²}, γ < 1)
  - p = 2, R > 0 ⟹ quadratic (!!!) convergence (γ^{2^i}, γ < 1)
- Linear convergence: in the tail, f(x^{i+1}) - f* ≈ R ( f(x^i) - f* ) ⟹ f(x^i) - f* ≈ ( f(x^1) - f* ) R^i, as fast as a negative exponential
- f(x^i) - f* ≤ ε for i ≥ log( ( f(x^1) - f* )/ε ) / log(1/R), i.e., O( log(1/ε) ) [good!], but the constant → ∞ as R → 1 [bad!]
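The iteration-count formula above is easy to evaluate numerically (a small sketch; the function name is mine):

```python
import math

def iters_linear(gap0, eps, R):
    """Smallest i such that gap0 * R^i <= eps under linear convergence
    with rate R < 1: i >= log(gap0 / eps) / log(1 / R)."""
    return math.ceil(math.log(gap0 / eps) / math.log(1.0 / R))

# R = 0.5 halves the gap per iteration: ~20 iterations for a 1e6 reduction
fast = iters_linear(1.0, 1e-6, 0.5)

# R ~ 0.996 (condition number ~1000, see below): thousands of iterations
slow = iters_linear(1.0, 1e-6, 0.996)
```

This makes concrete how the O(log(1/ε)) constant blows up as R → 1.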
9 Gradient method: efficiency

- The analysis is not obvious, have to use properties of x* (unknown)
- In this case, a nifty trick:
  ½ (x - x*)ᵀQ(x - x*) = f(x) + ½ (x*)ᵀQx* = f(x) - f*
  the error at x is the distance between x and x* in the norm induced by Q
- Exercise: check the above formula (hint: remember Qx* + q = 0)
- One can then prove that, if Q ≻ 0, then
  f(x^{i+1}) - f* = ( 1 - ‖d^i‖⁴ / ( ((d^i)ᵀQd^i)((d^i)ᵀQ⁻¹d^i) ) ) ( f(x^i) - f* )
  the error decreases by exactly a constant factor at each iteration
- Making sense of the above bound requires a bit of work
- Exercise: check the above formula (hint: for y^i = x^i - x*, d^i = -Qy^i)
10 Gradient method: efficiency (cont.d)

- Recall a few facts:
  Λ(Q) = { λ₁ ≥ ... ≥ λₙ > 0 } eigenvalues of Q ⟹ Λ(Q⁻¹) = { 1/λₙ ≥ ... ≥ 1/λ₁ > 0 } eigenvalues of Q⁻¹
  λₙ‖x‖² ≤ xᵀQx ≤ λ₁‖x‖² ∀x ∈ R^n
- Hence, ‖x‖²/(xᵀQx) ≥ 1/λ₁ and ‖x‖²/(xᵀQ⁻¹x) ≥ λₙ (check)
  ⟹ ∀x ∈ R^n: ‖x‖⁴ / ( (xᵀQx)(xᵀQ⁻¹x) ) ≥ λₙ/λ₁
- A better estimate is possible (technical, just believe it):
  ∀x ∈ R^n: ‖x‖⁴ / ( (xᵀQx)(xᵀQ⁻¹x) ) ≥ 4λ₁λₙ / (λ₁ + λₙ)²
- A bit better indeed: with λ₁ = 1000λₙ, λₙ/λ₁ = 0.001, while 4λ₁λₙ/(λ₁ + λₙ)² ≈ 0.004
12 Gradient method: efficiency (wrap up)

- All in all:
  f(x^{i+1}) - f* ≤ ( (λ₁ - λₙ)/(λ₁ + λₙ) )² ( f(x^i) - f* )
  the prototype of all linear convergence results
- Good news: the bound is dimension independent, it does not depend on n ⟹ holds the same for very-large-scale problems
- Bad news: the bound depends badly on the conditioning of Q
- Example: λ₁ = 1000λₙ ⟹ R = (999/1001)² ≈ 0.996, 1/log₁₀(1/R) ≈ 576
- Note: with the coarser formula, R = 0.999 and 1/log₁₀(1/R) ≈ 2302
- With f(x^1) - f* = 1, ε = 10⁻⁶ requires ≈ 3500 iterations even for n = 2... but also for n = 10⁸
- Dimension independence is liked a lot in ML, but R may → 1 as n grows
- More bad news: the behaviour in practice is close to the bound
- Intuitively, the algorithm zig-zags a lot when the level sets are very elongated
13 En passant: the stopping criterion

- The stopping criterion ‖∇f(x^i)‖ ≤ ε is not what one would want, which is
  f(x^i) - f* = ε_A ≤ ε (absolute error) or ε_A / |f*| = ε_R ≤ ε (relative error)
  (a more or less alternative version has f(x^i) at the denominator)
- Exercise: the definition of ε_R has a glaring numerical problem, fix it
- Exercise: explain exactly why ε_R is better than ε_A
- Except, f* is unknown (most often) and cannot be used on-line
- Need a lower bound f ≤ f*, tight at least towards termination
- Estimating f* could be considered "the true problem"
- Often f is not there, hence ‖∇f(x^i)‖ ≤ ε is the only workable alternative
- But the relationship between the two ε is far from obvious
- Sometimes ‖∇f(x)‖ has a physical meaning that can be used
- Exercise: for X = B(0, r) and f convex, estimate ε_A when ‖∇f(x^i)‖ ≤ ε
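One customary way to make ε_R numerically safe, given a lower bound f on f*, can be sketched as follows (one possible answer to the exercise, not the slides' notation):

```python
def rel_gap(fx, f_lb):
    """Relative optimality gap (f(x) - f_lb) / max(1, |f_lb|).

    The max(1, .) guard avoids dividing by (near-)zero when the optimal
    value is around 0, and makes the gap degrade gracefully into the
    absolute one in that case.
    """
    return (fx - f_lb) / max(1.0, abs(f_lb))

small_opt = rel_gap(1e-3, 0.0)     # |f_lb| ~ 0: behaves like the absolute gap
large_opt = rel_gap(101.0, 100.0)  # |f_lb| large: a genuine relative gap
```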
15 Gradient method: non-quadratic case

- What happens when f is a general nonlinear function?
- Good news: convergence is the same (we never used "f quadratic")
- The condition ⟨ ∇f(x^i), ∇f(x^{i+1}) ⟩ = 0 holds at local minima (but also at local maxima and saddle points), so convexity is not crucial
- Good/bad news: efficiency is basically the same. If f ∈ C², x* is a local minimum such that ∇²f(x*) = Q ≻ 0, and { x^i } → x*, then { f(x^i) } → f(x*) linearly with the same R as in the quadratic case (depending on λ₁ and λₙ of Q)
- In the tail of the convergence process f ≈ its second-order model, so convergence is the same
- Fundamental issue: exact line search is difficult
- The algebraic solution (compute ∇f(x - α∇f(x)), find its roots) is possible only in a limited set of cases
- Has to algorithmically search along the line for "the right" α^i (doh!)
17 Line Search: first-order approaches

- For ϕ(α) = f(x^i + αd^i) : R → R,  ϕ′(α) = ⟨ ∇f(x^i + αd^i), d^i ⟩
- Exercise: prove this using the chain rule: f : R^m → R^k, g : R^n → R^m, h(x) = f(g(x)) : R^n → R^k ⟹ Jh(x) = Jf(g(x)) Jg(x)
  (note that Jf ∈ R^{k×m}, Jg ∈ R^{m×n}, in fact Jh ∈ R^{k×m} · R^{m×n} = R^{k×n})
- Find α^i s.t. ϕ′(α^i) = 0
- ∇f continuous ⟹ ϕ′ continuous (why?)
- α^i must exist if ∃ ᾱ s.t. ϕ′(ᾱ) > 0
- Exercise: prove this (hint: use the intermediate value theorem)
- Obvious solution:
    ᾱ ← 1;                          // or whatever value
    while( ϕ′(ᾱ) < 0 ) do ᾱ ← 2ᾱ;   // or whatever factor
- Will work in practice for all "reasonable" functions
- Works if ϕ is coercive: lim_{α→∞} ϕ(α) = +∞ (e.g., f strongly convex)
- Exercise: construct an example where ᾱ exists but it is not found
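The "obvious solution" loop above, as a minimal Python sketch (the naming and the give-up bound are mine), returning None when the doubling gives up:

```python
def bracket(dphi, alpha=1.0, factor=2.0, max_doublings=60):
    """Double alpha until phi'(alpha) >= 0; works when phi is coercive.

    Returns None if no such alpha is found within max_doublings, which can
    happen when phi keeps decreasing (e.g., f unbounded below along d)."""
    for _ in range(max_doublings):
        if dphi(alpha) >= 0:
            return alpha
        alpha *= factor
    return None

# phi(a) = (a - 5)^2: the doubling 1, 2, 4, 8 first sees phi' >= 0 at 8
a_bar = bracket(lambda a: 2.0 * (a - 5.0))

# phi strictly decreasing forever: the bracket (correctly) fails
no_bar = bracket(lambda a: -1.0)
```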
18 Line Search: Bisection method

- Pretty darn obvious:

  procedure α = LSBM( ϕ′, ᾱ, ε ) {
    α⁻ ← 0; α⁺ ← ᾱ;
    while( true ) do {
      α ← ( α⁺ + α⁻ )/2; v ← ϕ′(α);
      if( |v| ≤ ε ) then break;
      if( v < 0 ) then α⁻ ← α; else α⁺ ← α;
    }
  }

- Asymptotic convergence: ε = 0 gives an infinite sequence { α_k }
- { α_k } ⊂ [0, ᾱ] ⟹ a subsequence converges to some α* (why?)
- α* ∈ [ α⁻_k, α⁺_k ] ∀k and α⁺_k - α⁻_k = ᾱ 2⁻ᵏ ⟹ { α_k } → α* (why?)
- ⟹ { ϕ′(α_k) } → ϕ′(α*) = 0 (why?) ⟹ finite termination for ε > 0
- Exercise: prove: ϕ′ locally Lipschitz at α* ⟹ { ϕ′(α_k) } → 0 linearly (R?)
- Exercise: construct a counter-example (ϕ′ not locally Lipschitz)
- Exercise: suggest assumptions on ϕ ensuring ϕ′ locally Lipschitz ⟹ linear convergence
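LSBM translates almost line by line into Python (a sketch; the iteration cap is my safeguard, not part of the slides' procedure):

```python
def lsbm(dphi, alpha_bar, eps=1e-8, max_iter=200):
    """Bisection on phi' over [0, alpha_bar]; assumes phi'(0) < 0 and
    phi'(alpha_bar) > 0, so a stationary point exists in between."""
    lo, hi = 0.0, alpha_bar
    mid = 0.5 * (lo + hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        v = dphi(mid)
        if abs(v) <= eps:
            break
        if v < 0:
            lo = mid           # the stationary point is to the right
        else:
            hi = mid           # the stationary point is to the left
    return mid

# phi(a) = (a - 0.7)^2, so phi'(a) = 2(a - 0.7) and the minimum is at 0.7
alpha = lsbm(lambda a: 2.0 * (a - 0.7), 2.0)
```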
19 Improving the bisection method: interpolation

- Choosing α_{k+1} right in the middle is just the dumbest possible approach
- One knows a lot about ϕ: ϕ(α⁻), ϕ(α⁺), ϕ′(α⁻), ϕ′(α⁺) (ϕ′ need be computed, but it is usually almost free if one computes ϕ)
- Quadratic interpolation: aα² + bα + c that agrees with ϕ at α⁺, α⁻
- Three parameters, four conditions: something's gotta give (three cases)
- Example: impose 2aα⁺ + b = ϕ′(α⁺), 2aα⁻ + b = ϕ′(α⁻) ⟹
  a = ( ϕ′(α⁺) - ϕ′(α⁻) ) / ( 2( α⁺ - α⁻ ) ),  b = ( α⁺ϕ′(α⁻) - α⁻ϕ′(α⁺) ) / ( α⁺ - α⁻ )
- The minimum solves 2aα + b = 0 (c irrelevant):
  α = ( α⁻ϕ′(α⁺) - α⁺ϕ′(α⁻) ) / ( ϕ′(α⁺) - ϕ′(α⁻) )
  a convex combination of α⁺ and α⁻ (check)
- Exercise: develop the other cases of quadratic interpolation and discuss them
20 Improving the bisection method: more interpolation

- It can be proven (long and complicated) that, if ϕ ∈ C³, then quadratic interpolation has convergence of order 1 < p < 2 (superlinear)
- For instance, the previous formula (a.k.a. the method of false position, or "secant formula") has p = ( 1 + √5 )/2 ≈ 1.618
- Exercise: propose a simple modification that guarantees (linear) convergence even if ϕ ∉ C³, while changing as little as possible the "normal" run
- Four conditions ⟹ can fit a cubic polynomial and use its minima
- Rather tedious to write down, analyse and implement
- Theoretically pays: cubic interpolation has quadratic convergence (p = 2)
- Seems to work pretty well in practice
- Exercise (not for the faint of heart): develop cubic interpolation
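Replacing the bisection midpoint with the quadratic-interpolation point gives a false-position search on ϕ′ (a sketch under the same bracketing assumption ϕ′(lo) < 0 < ϕ′(hi); names are mine):

```python
def secant_ls(dphi, lo, hi, eps=1e-10, max_iter=100):
    """Bracketed search for phi'(a) = 0 where the next trial point is the
    interpolation formula a = (lo*dphi(hi) - hi*dphi(lo)) / (dphi(hi) - dphi(lo)),
    a convex combination of lo and hi when dphi(lo) < 0 < dphi(hi)."""
    dl, dh = dphi(lo), dphi(hi)
    a = lo
    for _ in range(max_iter):
        a = (lo * dh - hi * dl) / (dh - dl)
        v = dphi(a)
        if abs(v) <= eps:
            break
        if v < 0:
            lo, dl = a, v      # keep the bracket: zero is to the right
        else:
            hi, dh = a, v      # zero is to the left
    return a

# phi(a) = a^4 - a: phi'(a) = 4a^3 - 1, zero at a = (1/4)^(1/3) ~ 0.63
alpha = secant_ls(lambda a: 4.0 * a ** 3 - 1.0, 0.0, 1.0)
```

Note that with a convex ϕ′ one endpoint tends to stay fixed, so the pure scheme can degrade to linear convergence; practical codes mix interpolation with bisection safeguards.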
22 Line Search: second-order approaches

- More derivatives ⟹ same information with less points
- f ∈ C² ⟹ ϕ″(α) = dᵀ∇²f(x + αd)d, and it is continuous (why?)
- Exercise: prove this using the chain rule
- Computing ∇²f ⟹ quadratic convergence with only one point
- Newton's method (tangent method): first-order Taylor of ϕ′ at α_k:
  ϕ′(α) ≈ ϕ′(α_k) + ϕ″(α_k)( α - α_k ), solve ϕ′(α) = 0
  ⟹ α = α_k - ϕ′(α_k) / ϕ″(α_k)
- This is clearly a second-order approximation of ϕ
- Fantastically simple:

  procedure α = LSNM( ϕ′, ϕ″, α, ε ) {
    while( |ϕ′(α)| > ε ) do α ← α - ϕ′(α) / ϕ″(α);
  }

- Extremely good convergence (under appropriate conditions)
- Clearly numerically delicate: what if ϕ″(α) ≈ 0?
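LSNM in Python, with an explicit guard for the numerically delicate ϕ″ ≈ 0 case noted above (a sketch; the guard threshold and iteration cap are my choices):

```python
def lsnm(dphi, ddphi, alpha, eps=1e-10, max_iter=100):
    """Newton's (tangent) method on phi'(alpha) = 0.

    Assumes the starting point is close enough to a stationary point with
    phi'' bounded away from 0 there; raises when phi'' ~ 0 instead of
    silently taking a huge, meaningless step."""
    for _ in range(max_iter):
        d1 = dphi(alpha)
        if abs(d1) <= eps:
            break
        d2 = ddphi(alpha)
        if abs(d2) < 1e-15:
            raise ZeroDivisionError("phi'' ~ 0: Newton step undefined")
        alpha -= d1 / d2
    return alpha

# phi(a) = a^4 - a: phi' = 4a^3 - 1, phi'' = 12a^2; 1.0 is close enough
alpha = lsnm(lambda a: 4.0 * a ** 3 - 1.0, lambda a: 12.0 * a ** 2, 1.0)
```

Starting from 1.0 the iterates 0.75, 0.648, 0.630, ... home in quadratically, illustrating the local analysis on the next slide.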
23 Analysis of Newton's method

- The theoretical analysis of Newton's method is instructive
- If ϕ ∈ C³, ϕ′(α*) = 0 and ϕ″(α*) ≠ 0, then ∃ δ > 0 s.t. if Newton's method starts at α¹ ∈ [ α* - δ, α* + δ ], then { α_k } → α* with p = 2
- Proof: the iteration gives
  α_{k+1} - α* = α_k - α* - ( ϕ′(α_k) - ϕ′(α*) ) / ϕ″(α_k)
               = [ ϕ′(α*) - ϕ′(α_k) + ϕ″(α_k)( α_k - α* ) ] / ϕ″(α_k)
- For some β ∈ [ α_k, α* ], Taylor gives
  ϕ′(α*) = ϕ′(α_k) + ϕ″(α_k)( α* - α_k ) + ϕ‴(β)( α* - α_k )²/2
  ⟹ α_{k+1} - α* = [ ϕ‴(β) / ( 2ϕ″(α_k) ) ]( α_k - α* )²
- ∃ δ > 0 s.t. |ϕ″(α)| ≥ k₂ > 0 (why?) and |ϕ‴(β)| ≤ k₁ < ∞ (why?) for α, β ∈ [ α* - δ, α* + δ ]
  ⟹ |α_{k+1} - α*| ≤ [ k₁ / (2k₂) ]( α_k - α* )²
- k₁|α_k - α*| / (2k₂) < 1 ⟹ |α_{k+1} - α*| < |α_k - α*|
  ⟹ { α_k } → α*, and the convergence is quadratic
- Convergence only if |α¹ - α*| is small enough
- Nontrivial to ensure in practice
25 Line Search: zeroth-order approaches

- Computing ∇f / ∇²f can be costly (dᵀ∇²f d is O(n²) already)
- Only use ϕ values: less derivatives ⟹ more points
- Golden ratio search, assuming ϕ(0) ≥ ϕ(ᾱ):

  procedure α = LSGRM( ϕ, ᾱ, ε ) {
    α⁻ ← 0; α⁺ ← ᾱ;
    α′ ← α⁻ + 0.382( α⁺ - α⁻ ); α″ ← α⁻ + 0.618( α⁺ - α⁻ );
    while( α⁺ - α⁻ > ε ) do
      if( ϕ(α′) > ϕ(α″) ) then { α⁻ ← α′; α′ ← α″; α″ ← α⁻ + 0.618( α⁺ - α⁻ ); }
      else { α⁺ ← α″; α″ ← α′; α′ ← α⁻ + 0.382( α⁺ - α⁻ ); }
  }

- 0.618 ≈ r = ( √5 - 1 )/2, the golden ratio; 0.382 ≈ 1 - r
- Property: r = (1 - r)/r ≈ 0.382/0.618, i.e., r : 1 = (1 - r) : r
- Can compute only one new ϕ(α) per iteration
- Can do slightly better by using r_k = F_{n-k} / F_{n-k+1} (Fibonacci sequence)
- Exercise: picture out graphically how it works
- Exercise: analyse asymptotic and finite convergence of the approach
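A runnable sketch of LSGRM (my naming; it makes explicit how one interior point is reused so only one new ϕ value is needed per iteration):

```python
def lsgrm(phi, alpha_bar, eps=1e-8):
    """Golden-section search for a minimum of phi on [0, alpha_bar],
    assuming phi is unimodal there."""
    r = (5.0 ** 0.5 - 1.0) / 2.0              # golden ratio ~ 0.618
    lo, hi = 0.0, alpha_bar
    a, b = hi - r * (hi - lo), lo + r * (hi - lo)
    fa, fb = phi(a), phi(b)
    while hi - lo > eps:
        if fa > fb:                           # minimum lies in [a, hi]
            lo, a, fa = a, b, fb              # old b becomes the new a
            b = lo + r * (hi - lo)
            fb = phi(b)                       # the only new evaluation
        else:                                 # minimum lies in [lo, b]
            hi, b, fb = b, a, fa              # old a becomes the new b
            a = hi - r * (hi - lo)
            fa = phi(a)                       # the only new evaluation
    return 0.5 * (lo + hi)

alpha = lsgrm(lambda a: (a - 0.3) ** 2, 1.0)
```

The interval shrinks by the factor r ≈ 0.618 per iteration, i.e., linear convergence with one function value each.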
26 Gradient method and (inexact) line search

- Is |ϕ′(α^i)| ≤ ε enough for convergence? It depends on ε (of course)
- Trick: d^i = -∇f(x^i)/‖∇f(x^i)‖ ⟹ ‖d^i‖ = 1, ϕ′(0) = -‖∇f(x^i)‖
- ϕ′(α^i) = ⟨ d^i, ∇f(x^{i+1}) ⟩ = ⟨ -∇f(x^i)/‖∇f(x^i)‖, ∇f(x^{i+1}) ⟩
- { x^i } → x̄ ⟹ lim_i ⟨ -∇f(x^i)/‖∇f(x^i)‖, ∇f(x^{i+1}) ⟩ = ⟨ -∇f(x̄)/‖∇f(x̄)‖, ∇f(x̄) ⟩ = -‖∇f(x̄)‖ ⟹ ‖∇f(x̄)‖ ≤ ε (note: this uses ‖∇f(x^i)‖ > ε)
- ε > 0 and { x^i } → x̄ ⟹ for some finite i, x^i is an approximate stationary point
- Note: with d^i := -∇f(x^i), use ε := ε‖∇f(x^i)‖ instead
- Other assumptions on f are needed to ensure { x^i } → x̄ (R^n is not compact)
- A simple one: f coercive, i.e., lim_{‖x‖→∞} f(x) = +∞: f continuous and coercive ⟹ S(f, v) compact ∀v
- Exercise: prove f coercive (+ what else needed) ⟹ the algorithm finitely stops
- Exercise: discuss how to get asymptotic convergence (ε = 0)
- Do we really need a close approximation to a point where ∇f(x) = 0?
38 Gradient method and (really) inexact line search

- Don't need to get a local minimum, just decrease f "enough"
- Armijo condition: 0 < m₁ < 1,
  (A) ϕ(α) ≤ ϕ(0) + m₁ α ϕ′(0)
  get at least a fraction m₁ of the descent promised by ϕ′(0)
- Issue: arbitrarily short steps satisfy (A)
- Goldstein condition: m₁ < m₂ < 1,
  (G) ϕ(α) ≥ ϕ(0) + m₂ α ϕ′(0)
- Issue: (A) ∧ (G) can easily exclude all local minima
- Wolfe condition: m₁ < m₃ < 1,
  (W) ϕ′(α) ≥ m₃ ϕ′(0)
  the curvature has to be "a bit closer to 0" (but it can be ≥ 0)
- Strong Wolfe condition:
  (W′) |ϕ′(α)| ≤ -m₃ ϕ′(0) = m₃ |ϕ′(0)|
  ϕ′(α) can no longer be arbitrarily ≥ 0, but (W′) still captures all local minima (and maxima)
- Clearly, (W′) ⟹ (W)
- (A) ∧ (W) / (W′) typically captures all local minima... unless m₁ is too close to 1 (that's why m₁ is kept small)
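The conditions are cheap to test in code; a sketch of an (A) ∧ (W′) acceptance check (the default values m₁ = 10⁻⁴ and m₃ = 0.9 are common textbook choices, not prescribed by the slides):

```python
def aw_ok(phi, dphi, alpha, m1=1e-4, m3=0.9):
    """True iff stepsize alpha satisfies Armijo (A) and strong Wolfe (W'),
    given phi(a) = f(x + a*d) and its derivative dphi; dphi(0) < 0 assumed."""
    armijo = phi(alpha) <= phi(0.0) + m1 * alpha * dphi(0.0)
    strong_wolfe = abs(dphi(alpha)) <= m3 * abs(dphi(0.0))
    return armijo and strong_wolfe

# phi(a) = (a - 1)^2: the exact minimizer a = 1 is accepted,
# while an arbitrarily short step passes (A) but fails (W')
ok_exact = aw_ok(lambda a: (a - 1.0) ** 2, lambda a: 2.0 * (a - 1.0), 1.0)
ok_tiny = aw_ok(lambda a: (a - 1.0) ** 2, lambda a: 2.0 * (a - 1.0), 0.001)
```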
39 Armijo-Wolfe line search

- ϕ ∈ C¹ and ϕ(α) bounded below for α ≥ 0 ⟹ ∃ α s.t. (A) ∧ (W′) holds
- Proof: let l(α) = ϕ(0) + m₁αϕ′(0) and d(α) = l(α) - ϕ(α) ⟹ d(0) = 0, d′(0) = ( m₁ - 1 )ϕ′(0) > 0 (as m₁ < 1)
- ∄ ᾱ > 0 s.t. d(ᾱ) = 0 ⟹ ϕ unbounded below (why?)
- Take the smallest ᾱ > 0 s.t. d(ᾱ) = 0: (A) is satisfied ∀α ∈ (0, ᾱ] (why?)
- Rolle's theorem: d′(ᾱ) < 0 ⟹ ϕ′(ᾱ) > m₁ϕ′(0) (> m₃ϕ′(0) > ϕ′(0))
- Intermediate value theorem (on ϕ′): ∃ α* ∈ (0, ᾱ) s.t. ϕ′(α*) = m₃ϕ′(0) ⟹ (W′) also holds at α*
- But how do I actually find such a point?
- m₁ small enough s.t. local minima are not cut ⟹ just go for a local minimum and stop whenever (A) ∧ (W) / (W′) holds
- Hard to say if m₁ is "small enough", although the usual (small) choice most often is
- A specialized line search can be constructed for the odd case when it is not
- Basic idea: find an interval [ α⁻, α⁺ ] that surely contains points satisfying (A) ∧ (W) / (W′) (cf. the proof above), restrict the search there inside
- Exercise (not for the faint of heart): develop the specialized line search
40 Convergence with Armijo-Wolfe line search

- ∇f Lipschitz continuous and (A) ∧ (W) always hold ⟹ either f is unbounded below or { ∇f(x^i) } → 0
- Proof: (W) ⟹ ϕ′(α^i) - ϕ′(0) ≥ ( 1 - m₃ )( -ϕ′(0) )
- ∇f Lipschitz ⟹ ϕ′ Lipschitz, and L does not depend on x^i (check)
  ⟹ α^i ≥ ( 1 - m₃ )( -ϕ′(0) )/L (check: where has ‖d^i‖ gone?)
- -ϕ′(0) = ‖∇f(x^i)‖ > ε > 0 ⟹ α^i ≥ δ > 0
- (A) ⟹ f(x^{i+1}) ≤ f(x^i) - m₁α^i‖∇f(x^i)‖ ≤ f(x^i) - m₁δε ⟹ { f(x^i) } → -∞ (or { ∇f(x^i) } → 0)
- Usual stuff: { x^i } → x̄ ⟹ x̄ is a stationary point
- Hence, the algorithm finitely terminates when ε > 0
- Insight from the proof: (W) (+ Lipschitz) serves to ensure that α^i ≥ c‖∇f(x^i)‖ for some c > 0
- Can we get the same in a simpler way?
41 Backtracking line search

- Backtracking line search:

  procedure α = BLS( ϕ, ϕ′, α, m₁, τ ) {
    while( ϕ(α) > ϕ(0) + m₁ α ϕ′(0) ) do α ← τα;
  }

- ∇f Lipschitz ⟹ the gradient method with BLS works
- Proof: for simplicity, α = 1 (input). Remember the previous proof: ∃ ᾱ s.t. (A) holds ∀α ∈ (0, ᾱ] and ϕ′(ᾱ) > m₁ϕ′(0) > ϕ′(0)
  ⟹ L( ᾱ - 0 ) ≥ ϕ′(ᾱ) - ϕ′(0) > ( 1 - m₁ )( -ϕ′(0) ) ⟹ ᾱ > ( 1 - m₁ )‖∇f(x^i)‖/L (same as before)
- ‖∇f(x^i)‖ > ε ∀i ⟹ ᾱ > δ > 0 ∀i
- h = min{ k : τᵏ ≤ δ } ⟹ α^i ≥ τʰ > 0 ∀i ⟹ f(x^{i+1}) ≤ f(x^i) - m₁τʰε ⟹ { f(x^i) } → -∞ or { ∇f(x^i) } → 0
- Now, { x^i } → x̄ ⟹ x̄ stationary, blah blah
- Fundamental trick: α^i can → 0, but only as fast as ‖∇f(x^i)‖
- Would be simpler if α^i ≥ δ > 0 for good
- Exercise: remove the assumption α = 1 (input)
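A gradient method built on BLS can be sketched as follows (a minimal illustration; the test problem, names, and parameter defaults are mine, with m₁ = 10⁻⁴ and τ = 0.5 as common choices):

```python
import numpy as np

def bls_gd(f, grad, x, m1=1e-4, tau=0.5, eps=1e-6, max_iter=10000):
    """Gradient method with the BLS backtracking line search:
    start from alpha = 1 and shrink by tau until (A) holds."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break                             # approximate stationarity
        d, alpha = -g, 1.0
        while f(x + alpha * d) > f(x) + m1 * alpha * (g @ d):
            alpha *= tau                      # backtrack: (A) not yet met
        x = x + alpha * d
    return x

# Convex quadratic test problem with minimizer (1, -0.5)
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2
g = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)])
x_star = bls_gd(f, g, np.zeros(2))
```

The inner loop always terminates because, as the proof above shows, (A) holds for all sufficiently small α.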
43 Line Search: really really inexact... no line search at all (fixed stepsize)

- Recall: ∇f Lipschitz ⟹ f(y) ≤ f(x) + ∇f(x)( y - x ) + (L/2)‖y - x‖² ∀y
- With y := x^{i+1}, x := x^i, y - x := -α∇f(x^i):
  f(x^{i+1}) ≤ f(x^i) + ( Lα²/2 - α )‖∇f(x^i)‖² (check)
- Powerful idea: find the α that provides the best worst-case improvement:
  v(α) = Lα²/2 - α, v′(α) = Lα - 1 = 0 ⟹ α* = 1/L, v(α*) = -1/(2L)
- All in all: f(x^{i+1}) ≤ f(x^i) - ‖∇f(x^i)‖²/(2L)
- Can't do better if you trust the quadratic upper estimate (which of course must not be trusted)
- In fact, α^i = 1/L is terrible in practice ⟹ use the previous methods
- Enticing because simple and inexpensive
- Selecting the parameters that lead to the best performance for a model: a very powerful idea in general
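The whole method collapses to a few lines once α = 1/L is fixed (a sketch; the test problem and names are mine):

```python
import numpy as np

def fixed_step_gd(grad, x, L, n_iter=500):
    """Gradient method with the fixed stepsize alpha = 1/L, where L is a
    Lipschitz constant of grad f; the worst-case argument above guarantees
    a decrease of at least ||grad f(x)||^2 / (2L) per step."""
    for _ in range(n_iter):
        x = x - grad(x) / L
    return x

# f(x) = 0.5 x^T Q x: grad f(x) = Q x, minimizer 0, L = lambda_max(Q)
Q = np.diag([1.0, 10.0])
x_out = fixed_step_gd(lambda x: Q @ x, np.array([5.0, 5.0]), L=10.0)
```

On this problem each coordinate contracts by the factor (1 - λ/L), so the ill-conditioned coordinate (λ = 1, factor 0.9) dominates the slow, sublinear-in-general behaviour analysed next.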
44 Fixed stepsize: convergence rate

- Once you have convergence, you can talk efficiency (easier with α fixed)
- Already know the error decreases, but how fast?
- Δ^{i+1} := f(x^{i+1}) - f(x*) ≤ Δ^i - ‖∇f(x^i)‖²/(2L), with Δ^i := f(x^i) - f(x*)
- x^i arbitrary and f(x*) ≤ f(x^{i+1}) ⟹ f(x*) ≤ f(x) - ‖∇f(x)‖²/(2L) ∀x
- f convex ⟹ ∇f(x)( x - x* ) ≥ f(x) - f(x*) ≥ ‖∇f(x)‖²/(2L) ∀x
- This proves r^i := ‖x^i - x*‖ decreases:
  ( r^{i+1} )² = ‖x^{i+1} - x*‖² = ‖x^i - x* - ∇f(x^i)/L‖²
  = ‖x^i - x*‖² - 2∇f(x^i)( x^i - x* )/L + ‖∇f(x^i)‖²/L² ≤ ‖x^i - x*‖² = ( r^i )²
- Hence, at the very least { x^i } → x* (no problem here)
- Technical step: ‖∇f(x^i)‖ ≥ ( r^i/r^1 )‖∇f(x^i)‖ ≥ ∇f(x^i)( x^i - x* )/r^1 [Cauchy-Schwarz] ≥ ( f(x^i) - f(x*) )/r^1 [convexity] = Δ^i/r^1
- Conclusion: Δ^{i+1} ≤ Δ^i - ‖∇f(x^i)‖²/(2L) ≤ Δ^i - ( Δ^i )²/( 2( r^1 )²L )
  = Δ^i ( 1 - Δ^i/( 2( r^1 )²L ) )
- Not linear convergence, as the "R" is not constant (it → 1): sublinear
45 Fixed stepsize: convergence rate (cont'd)

- What does this mean, exactly? Take Δ^{i+1} ≤ Δ^i - ( Δ^i )²/( 2( r^1 )²L ) and divide by Δ^{i+1}Δ^i:
  1/Δ^i ≤ 1/Δ^{i+1} - Δ^i/( Δ^{i+1} 2( r^1 )²L ) ⟹ 1/Δ^{i+1} ≥ 1/Δ^i + 1/( 2( r^1 )²L ) (why?)
- 1/Δ grows by (at least) a constant at each i ⟹ 1/Δ^{i+1} ≥ 1/Δ^1 + i/( 2( r^1 )²L )
  ⟹ Δ^i ≤ 2( r^1 )²L / ( 2( r^1 )²L/Δ^1 + i - 1 )
- The error decreases as O(1/i) ⟹ O(1/ε) iterations (check the details)
- Exponentially worse than O( log(1/ε) )
- However, this is unfair: there we used Q nonsingular, i.e., λₙ > 0
- Does it make a difference? You bet
46 Fixed stepsize: convergence rate with strong convexity

- Basically, strong convexity: eigenvalues bounded both above and below, uI ⪯ ∇²f(x) ⪯ LI with u > 0
- Taylor ⟹ f(x) ≥ f(x^i) + ∇f(x^i)( x - x^i ) + u‖x - x^i‖²/2 ∀x (why?)
- Minimize both sides independently over x ⟹ f(x*) ≥ f(x^i) - ‖∇f(x^i)‖²/(2u) (check)
  ⟹ ‖∇f(x^i)‖² ≥ 2u( f(x^i) - f(x*) )
- Put it into f(x^{i+1}) - f(x*) ≤ f(x^i) - f(x*) - ‖∇f(x^i)‖²/(2L):
  f(x^{i+1}) - f(x*) ≤ ( f(x^i) - f(x*) )( 1 - u/L )
- Funnily, the same as the coarse estimate with exact step, i.e., "much worse"
- A small difference in f makes a big difference in convergence
- Properties of f even more important than the algorithm
- O(1/ε) is not the best possible when f is not strongly convex: O(1/√ε) can be done, better, but still much worse than O( log(1/ε) )
- Hence better algorithms do count, we'll work towards that
- However, O(1/√ε) is tight: can't do better without strong convexity
- Algorithms can only get so far with nasty problems
49 Wrap up

- Gradient (descent direction) + line search ⟹ convergence
- The line search by no means has to be exact... but not too coarse either
- Many different practical line searches, up to no search at all
- Convergence of gradient methods can be from quite bad to horrible... in practice as well as in theory
- Something better is sorely needed
Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T Heath Chapter 5 Nonlinear Equations Copyright c 2001 Reproduction permitted only for noncommercial, educational
More informationV. Graph Sketching and Max-Min Problems
V. Graph Sketching and Max-Min Problems The signs of the first and second derivatives of a function tell us something about the shape of its graph. In this chapter we learn how to find that information.
More informationGenerating Function Notes , Fall 2005, Prof. Peter Shor
Counting Change Generating Function Notes 80, Fall 00, Prof Peter Shor In this lecture, I m going to talk about generating functions We ve already seen an example of generating functions Recall when we
More informationNon-Convex Optimization. CS6787 Lecture 7 Fall 2017
Non-Convex Optimization CS6787 Lecture 7 Fall 2017 First some words about grading I sent out a bunch of grades on the course management system Everyone should have all their grades in Not including paper
More informationMotivation: We have already seen an example of a system of nonlinear equations when we studied Gaussian integration (p.8 of integration notes)
AMSC/CMSC 460 Computational Methods, Fall 2007 UNIT 5: Nonlinear Equations Dianne P. O Leary c 2001, 2002, 2007 Solving Nonlinear Equations and Optimization Problems Read Chapter 8. Skip Section 8.1.1.
More informationChapter 4. Unconstrained optimization
Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file
More informationROOT FINDING REVIEW MICHELLE FENG
ROOT FINDING REVIEW MICHELLE FENG 1.1. Bisection Method. 1. Root Finding Methods (1) Very naive approach based on the Intermediate Value Theorem (2) You need to be looking in an interval with only one
More informationLecture 4: Training a Classifier
Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as
More informationNumerical Methods in Informatics
Numerical Methods in Informatics Lecture 2, 30.09.2016: Nonlinear Equations in One Variable http://www.math.uzh.ch/binf4232 Tulin Kaman Institute of Mathematics, University of Zurich E-mail: tulin.kaman@math.uzh.ch
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 5 Nonlinear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction
More informationTHE SECANT METHOD. q(x) = a 0 + a 1 x. with
THE SECANT METHOD Newton s method was based on using the line tangent to the curve of y = f (x), with the point of tangency (x 0, f (x 0 )). When x 0 α, the graph of the tangent line is approximately the
More informationLecture 10: Powers of Matrices, Difference Equations
Lecture 10: Powers of Matrices, Difference Equations Difference Equations A difference equation, also sometimes called a recurrence equation is an equation that defines a sequence recursively, i.e. each
More informationCS 450 Numerical Analysis. Chapter 5: Nonlinear Equations
Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80
More informationNumerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore
Numerical Optimization Prof. Shirish K. Shevade Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 13 Steepest Descent Method Hello, welcome back to this series
More information5 Overview of algorithms for unconstrained optimization
IOE 59: NLP, Winter 22 c Marina A. Epelman 9 5 Overview of algorithms for unconstrained optimization 5. General optimization algorithm Recall: we are attempting to solve the problem (P) min f(x) s.t. x
More informationBindel, Fall 2011 Intro to Scientific Computing (CS 3220) Week 6: Monday, Mar 7. e k+1 = 1 f (ξ k ) 2 f (x k ) e2 k.
Problem du jour Week 6: Monday, Mar 7 Show that for any initial guess x 0 > 0, Newton iteration on f(x) = x 2 a produces a decreasing sequence x 1 x 2... x n a. What is the rate of convergence if a = 0?
More informationLecture 4: Training a Classifier
Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as
More informationNumerical solutions of nonlinear systems of equations
Numerical solutions of nonlinear systems of equations Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan E-mail: min@math.ntnu.edu.tw August 28, 2011 Outline 1 Fixed points
More informationWe are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero
Chapter Limits of Sequences Calculus Student: lim s n = 0 means the s n are getting closer and closer to zero but never gets there. Instructor: ARGHHHHH! Exercise. Think of a better response for the instructor.
More informationCS 323: Numerical Analysis and Computing
CS 323: Numerical Analysis and Computing MIDTERM #2 Instructions: This is an open notes exam, i.e., you are allowed to consult any textbook, your class notes, homeworks, or any of the handouts from us.
More informationNumerical Methods I Solving Nonlinear Equations
Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)
More information2.5 The Fundamental Theorem of Algebra.
2.5. THE FUNDAMENTAL THEOREM OF ALGEBRA. 79 2.5 The Fundamental Theorem of Algebra. We ve seen formulas for the (complex) roots of quadratic, cubic and quartic polynomials. It is then reasonable to ask:
More informationSTOP, a i+ 1 is the desired root. )f(a i) > 0. Else If f(a i+ 1. Set a i+1 = a i+ 1 and b i+1 = b Else Set a i+1 = a i and b i+1 = a i+ 1
53 17. Lecture 17 Nonlinear Equations Essentially, the only way that one can solve nonlinear equations is by iteration. The quadratic formula enables one to compute the roots of p(x) = 0 when p P. Formulas
More informationTo get horizontal and slant asymptotes algebraically we need to know about end behaviour for rational functions.
Concepts: Horizontal Asymptotes, Vertical Asymptotes, Slant (Oblique) Asymptotes, Transforming Reciprocal Function, Sketching Rational Functions, Solving Inequalities using Sign Charts. Rational Function
More informationMATH 1A, Complete Lecture Notes. Fedor Duzhin
MATH 1A, Complete Lecture Notes Fedor Duzhin 2007 Contents I Limit 6 1 Sets and Functions 7 1.1 Sets................................. 7 1.2 Functions.............................. 8 1.3 How to define a
More informationExtremeValuesandShapeofCurves
ExtremeValuesandShapeofCurves Philippe B. Laval Kennesaw State University March 23, 2005 Abstract This handout is a summary of the material dealing with finding extreme values and determining the shape
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationSingle Variable Minimization
AA222: MDO 37 Sunday 1 st April, 2012 at 19:48 Chapter 2 Single Variable Minimization 2.1 Motivation Most practical optimization problems involve many variables, so the study of single variable minimization
More informationIntroduction to Nonlinear Optimization Paul J. Atzberger
Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,
More information4 damped (modified) Newton methods
4 damped (modified) Newton methods 4.1 damped Newton method Exercise 4.1 Determine with the damped Newton method the unique real zero x of the real valued function of one variable f(x) = x 3 +x 2 using
More informationεx 2 + x 1 = 0. (2) Suppose we try a regular perturbation expansion on it. Setting ε = 0 gives x 1 = 0,
4 Rescaling In this section we ll look at one of the reasons that our ε = 0 system might not have enough solutions, and introduce a tool that is fundamental to all perturbation systems. We ll start with
More information1.1: The bisection method. September 2017
(1/11) 1.1: The bisection method Solving nonlinear equations MA385/530 Numerical Analysis September 2017 3 2 f(x)= x 2 2 x axis 1 0 1 x [0] =a x [2] =1 x [3] =1.5 x [1] =b 2 0.5 0 0.5 1 1.5 2 2.5 1 Solving
More informationChapter 3: Root Finding. September 26, 2005
Chapter 3: Root Finding September 26, 2005 Outline 1 Root Finding 2 3.1 The Bisection Method 3 3.2 Newton s Method: Derivation and Examples 4 3.3 How To Stop Newton s Method 5 3.4 Application: Division
More informationWe consider the problem of finding a polynomial that interpolates a given set of values:
Chapter 5 Interpolation 5. Polynomial Interpolation We consider the problem of finding a polynomial that interpolates a given set of values: x x 0 x... x n y y 0 y... y n where the x i are all distinct.
More informationOptimal Newton-type methods for nonconvex smooth optimization problems
Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations
More informationNonlinear equations and optimization
Notes for 2017-03-29 Nonlinear equations and optimization For the next month or so, we will be discussing methods for solving nonlinear systems of equations and multivariate optimization problems. We will
More informationUnit 2: Solving Scalar Equations. Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright
cs416: introduction to scientific computing 01/9/07 Unit : Solving Scalar Equations Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright 1 Introduction We now
More information8.5 Taylor Polynomials and Taylor Series
8.5. TAYLOR POLYNOMIALS AND TAYLOR SERIES 50 8.5 Taylor Polynomials and Taylor Series Motivating Questions In this section, we strive to understand the ideas generated by the following important questions:
More informationOptimization and Calculus
Optimization and Calculus To begin, there is a close relationship between finding the roots to a function and optimizing a function. In the former case, we solve for x. In the latter, we solve: g(x) =
More informationCLASS NOTES Models, Algorithms and Data: Introduction to computing 2018
CLASS NOTES Models, Algorithms and Data: Introduction to computing 2018 Petros Koumoutsakos, Jens Honore Walther (Last update: April 16, 2018) IMPORTANT DISCLAIMERS 1. REFERENCES: Much of the material
More informationLecture 4 - The Gradient Method Objective: find an optimal solution of the problem
Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...
More informationConvex Optimization. Problem set 2. Due Monday April 26th
Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining
More informationLecture 4 - The Gradient Method Objective: find an optimal solution of the problem
Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...
More informationNumerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen
Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen
More informationNotes on Constrained Optimization
Notes on Constrained Optimization Wes Cowan Department of Mathematics, Rutgers University 110 Frelinghuysen Rd., Piscataway, NJ 08854 December 16, 2016 1 Introduction In the previous set of notes, we considered
More informationx 2 x n r n J(x + t(x x ))(x x )dt. For warming-up we start with methods for solving a single equation of one variable.
Maria Cameron 1. Fixed point methods for solving nonlinear equations We address the problem of solving an equation of the form (1) r(x) = 0, where F (x) : R n R n is a vector-function. Eq. (1) can be written
More informationFIXED POINT ITERATION
FIXED POINT ITERATION The idea of the fixed point iteration methods is to first reformulate a equation to an equivalent fixed point problem: f (x) = 0 x = g(x) and then to use the iteration: with an initial
More information5 Quasi-Newton Methods
Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationLecture Notes: Geometric Considerations in Unconstrained Optimization
Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections
More informationSection 1.x: The Variety of Asymptotic Experiences
calculus sin frontera Section.x: The Variety of Asymptotic Experiences We talked in class about the function y = /x when x is large. Whether you do it with a table x-value y = /x 0 0. 00.0 000.00 or with
More informationOptimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization
5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The
More informationLine Search Techniques
Multidisciplinary Design Optimization 33 Chapter 2 Line Search Techniques 2.1 Introduction Most practical optimization problems involve many variables, so the study of single variable minimization may
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationNumerical differentiation
Numerical differentiation Paul Seidel 1801 Lecture Notes Fall 011 Suppose that we have a function f(x) which is not given by a formula but as a result of some measurement or simulation (computer experiment)
More informationMaria Cameron. f(x) = 1 n
Maria Cameron 1. Local algorithms for solving nonlinear equations Here we discuss local methods for nonlinear equations r(x) =. These methods are Newton, inexact Newton and quasi-newton. We will show that
More informationQueens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane.
Queens College, CUNY, Department of Computer Science Numerical Methods CSCI 361 / 761 Spring 2018 Instructor: Dr. Sateesh Mane c Sateesh R. Mane 2018 3 Lecture 3 3.1 General remarks March 4, 2018 This
More information3.1 Introduction. Solve non-linear real equation f(x) = 0 for real root or zero x. E.g. x x 1.5 =0, tan x x =0.
3.1 Introduction Solve non-linear real equation f(x) = 0 for real root or zero x. E.g. x 3 +1.5x 1.5 =0, tan x x =0. Practical existence test for roots: by intermediate value theorem, f C[a, b] & f(a)f(b)
More informationWEEK 7 NOTES AND EXERCISES
WEEK 7 NOTES AND EXERCISES RATES OF CHANGE (STRAIGHT LINES) Rates of change are very important in mathematics. Take for example the speed of a car. It is a measure of how far the car travels over a certain
More informationApproximation, Taylor Polynomials, and Derivatives
Approximation, Taylor Polynomials, and Derivatives Derivatives for functions f : R n R will be central to much of Econ 501A, 501B, and 520 and also to most of what you ll do as professional economists.
More informationMath Lecture 4 Limit Laws
Math 1060 Lecture 4 Limit Laws Outline Summary of last lecture Limit laws Motivation Limits of constants and the identity function Limits of sums and differences Limits of products Limits of polynomials
More informationSeptember Math Course: First Order Derivative
September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which
More informationSequence convergence, the weak T-axioms, and first countability
Sequence convergence, the weak T-axioms, and first countability 1 Motivation Up to now we have been mentioning the notion of sequence convergence without actually defining it. So in this section we will
More informationNonlinearity Root-finding Bisection Fixed Point Iteration Newton s Method Secant Method Conclusion. Nonlinear Systems
Nonlinear Systems CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Justin Solomon CS 205A: Mathematical Methods Nonlinear Systems 1 / 24 Part III: Nonlinear Problems Not all numerical problems
More informationStatic unconstrained optimization
Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R
More informationChapter 3 Numerical Methods
Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2
More information6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE. Three Alternatives/Remedies for Gradient Projection
6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE Three Alternatives/Remedies for Gradient Projection Two-Metric Projection Methods Manifold Suboptimization Methods
More informationUNCONSTRAINED OPTIMIZATION
UNCONSTRAINED OPTIMIZATION 6. MATHEMATICAL BASIS Given a function f : R n R, and x R n such that f(x ) < f(x) for all x R n then x is called a minimizer of f and f(x ) is the minimum(value) of f. We wish
More informationWritten Examination
Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes
More informationMATH 23b, SPRING 2005 THEORETICAL LINEAR ALGEBRA AND MULTIVARIABLE CALCULUS Midterm (part 1) Solutions March 21, 2005
MATH 23b, SPRING 2005 THEORETICAL LINEAR ALGEBRA AND MULTIVARIABLE CALCULUS Midterm (part 1) Solutions March 21, 2005 1. True or False (22 points, 2 each) T or F Every set in R n is either open or closed
More informationCS 323: Numerical Analysis and Computing
CS 323: Numerical Analysis and Computing MIDTERM #2 Instructions: This is an open notes exam, i.e., you are allowed to consult any textbook, your class notes, homeworks, or any of the handouts from us.
More information2.098/6.255/ Optimization Methods Practice True/False Questions
2.098/6.255/15.093 Optimization Methods Practice True/False Questions December 11, 2009 Part I For each one of the statements below, state whether it is true or false. Include a 1-3 line supporting sentence
More informationStochastic Gradient Descent. Ryan Tibshirani Convex Optimization
Stochastic Gradient Descent Ryan Tibshirani Convex Optimization 10-725 Last time: proximal gradient descent Consider the problem min x g(x) + h(x) with g, h convex, g differentiable, and h simple in so
More information