Programming, numerics and optimization


1 Programming, numerics and optimization. Lecture C-3: Unconstrained optimization II. Łukasz Jankowski, Institute of Fundamental Technological Research, Room 4.32, phone ext. 428. June 5. Current version is available at ljank.

2 Outline: 1. Line search methods, 2. Zero order methods, 3. Steepest descent, 4. Conjugate gradient methods, 5. Newton methods, 6. Quasi-Newton methods, 7. Least-squares problems.

3 Outline: 1. Line search methods.

4 Line search methods. The basic outline of all line search algorithms is: select any feasible point $x_0$; having calculated the points $x_0, x_1, \ldots, x_k$, iteratively calculate the successive point $x_{k+1}$:
1. Choose the direction $d_k$ of optimization.
2. Starting from $x_k$, perform a (usually approximate) 1D optimization in the direction $d_k$, that is, find $s_k \in \mathbb{R}$ that sufficiently² decreases $f_k$ and $f_k'$, where $f_k(s) = f(x_k + s\, d_k)$. Then take the step $x_{k+1} := x_k + s_k d_k$.
3. Check the stop conditions.
² Sufficiently, that is, significantly enough to guarantee convergence to the minimum. Exact minimization is usually not necessary; it can be costly and sometimes it can even make the convergence slower.
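A minimal sketch of this skeleton in Python, assuming a simple backtracking (Armijo) search as the approximate 1D optimization; `choose_direction` is a hypothetical callback standing in for whichever direction rule (steepest descent, conjugate gradient, Newton) a particular method supplies.

```python
import numpy as np

def line_search_minimize(f, grad, x0, choose_direction, tol=1e-8, max_iter=1000):
    """Generic line search skeleton: direction choice + approximate 1D minimization.

    choose_direction(x, g) -> d is a hypothetical callback; a backtracking
    (Armijo) search stands in for the 'sufficient decrease' 1D optimization.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:                    # stop condition
            break
        d = choose_direction(x, g)                     # step 1: direction
        s, fx = 1.0, f(x)
        while f(x + s * d) > fx + 1e-4 * s * (g @ d):  # step 2: backtracking
            s *= 0.5
        x = x + s * d                                  # take the step
    return x

# Usage: steepest descent direction on a simple quadratic
x_min = line_search_minimize(lambda x: x @ x, lambda x: 2 * x,
                             np.array([3.0, -2.0]), lambda x, g: -g)
print(x_min)
```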

5 Line search methods. 1. Choose the direction of optimization. 2. Perform a (usually approximate) 1D optimization in that direction. 3. Check the stop conditions. Two problems: 1. direction choice, 2. step size.

6 Outline: 2. Zero order methods (coordinate descent, Powell's direction set, Rosenbrock method).

7 Zero order methods: coordinate descent. The simplest method: all coordinate directions $e_1, \ldots, e_n$ are used sequentially, $d_k := e_{k \bmod n}$. Very slow or even nonconvergent. In 2D and with exact line minimizations it is equivalent to the steepest descent (besides the first step). The set of search directions can be periodically modified to include directions deemed to be more effective.

8 Zero order methods: Powell's direction set. Given $x_0$:
1. Optimize along all the directions $e_1, \ldots, e_n$, yielding the points $x_1, \ldots, x_n$.
2. Modify the directions: $e_1 := e_2$, ..., $e_{n-1} := e_n$, $e_n := x_n - x_0$.
3. Optimize along the new $e_n$, yielding the new starting point $x_0$.
4. Repeat until the stop conditions are satisfied.
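A minimal sketch of this basic Powell scheme, assuming SciPy's Brent routine (`scipy.optimize.minimize_scalar`) for the exact line minimizations; the direction-reset safeguards discussed on the next slide are omitted.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def powell(f, x0, n_cycles=20, tol=1e-10):
    """Minimal sketch of Powell's direction-set method (basic variant from the
    slide: discard e_1, append x_n - x_0). Line minimizations via Brent's method."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    dirs = list(np.eye(n))                   # start from the coordinate directions
    for _ in range(n_cycles):
        x_start, x = x0.copy(), x0.copy()
        for d in dirs:                       # 1. optimize along all directions
            s = minimize_scalar(lambda s: f(x + s * d)).x
            x = x + s * d
        new_dir = x - x_start                # 2. e_n := x_n - x_0
        if np.linalg.norm(new_dir) < tol:
            break
        dirs = dirs[1:] + [new_dir]          #    discard e_1, append the new one
        s = minimize_scalar(lambda s: f(x + s * new_dir)).x
        x0 = x + s * new_dir                 # 3. optimize along e_n -> new x_0
    return x0

# Usage: a simple convex quadratic
print(powell(lambda z: (z[0] - 1) ** 2 + 10 * (z[1] + 2) ** 2, [0.0, 0.0]))
```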

9 Zero order methods: Powell's direction set. For a quadratic objective function and exact line minimizations, Powell's method generates a set of conjugate directions (besides the first step). Iteratively generated directions tend to fold up on each other and become linearly dependent. Possible solutions:
- Every m iterations reset the directions to the original set.
- Every m iterations reset the directions to any orthogonal basis (making use of some of the already generated directions).
- When modifying the directions (step 2), instead of discarding the first direction $e_1$, discard the direction of the largest decrease (since it is anyway a major component of the newly generated direction $e_n$).

10 Zero order methods: Rosenbrock method. The Rosenbrock method³ involves approximate optimization that cycles over all the directions. The direction set is modified after each approximate optimization. Single stage of the Rosenbrock method, given $x_0$:
1. Approximately optimize $f$ using all the directions $e_1, \ldots, e_n$, yielding the points $x_1, \ldots, x_n$.
2. Modify the directions so that $e_n := x_n - x_0$. Orthogonalize the resulting set.
3. Let $x_0 := x_n$.
³ H.H. Rosenbrock, An Automatic Method for Finding the Greatest or Least Value of a Function, The Computer Journal 3(3):175-184, 1960.

11 Zero order methods: Rosenbrock method. Approximate optimization:
1. Let $s_i$ be an arbitrary initial step length in the $i$th direction.
2. Let $\alpha > 1$ and $\beta \in (0, 1)$ be two given constants (step elongation and step shortening).
3. Repeat for every direction $i = 1, \ldots, n$:
   1. Make a step $s_i$ in the $i$th direction.
   2. If successful ($f$ not increased), then $s_i := \alpha s_i$; else, if failed ($f$ increased), then $s_i := \beta s_i$;
   until the loop has been executed N times, or until all $s_i$ become too small, or until at least one success and one failure have occurred in each direction.
The Rosenbrock method requires a cheap objective function.
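A minimal sketch of a single (simplified) Rosenbrock stage along the coordinate axes. Note one deviation: on failure the trial step is also reversed ($-\beta s_i$), a common Rosenbrock variant, whereas the slide lists only the shortening factor; the orthogonalization of the new direction set is omitted.

```python
import numpy as np

def rosenbrock_stage(f, x0, steps, alpha=3.0, beta=0.5, max_sweeps=50):
    """One (simplified) stage of the Rosenbrock zero-order method along the
    coordinate axes: trial steps are elongated on success and shortened on
    failure.  Orthogonalization and the full termination rule are omitted."""
    x = np.asarray(x0, dtype=float)
    s = np.asarray(steps, dtype=float)
    dirs = np.eye(x.size)
    for _ in range(max_sweeps):
        for i in range(x.size):
            trial = x + s[i] * dirs[i]       # make a step s_i in the i-th direction
            if f(trial) <= f(x):             # success: accept and elongate
                x, s[i] = trial, alpha * s[i]
            else:                            # failure: shorten and reverse the trial step
                s[i] = -beta * s[i]
        if np.all(np.abs(s) < 1e-8):         # all steps too small
            break
    return x

# Usage: a simple quadratic with minimum at (1, 2)
print(rosenbrock_stage(lambda z: (z[0] - 1) ** 2 + (z[1] - 2) ** 2,
                       [0.0, 0.0], [0.5, 0.5]))
```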

12 Zero order methods: Rosenbrock method. [Figure: iterations of the Rosenbrock method.]

13 Outline: 3. Steepest descent (simplified version of the method, plotting the number of iterations, attraction basins of minima).

14 Simplified steepest descent. The steepest descent method is often implemented in the following simplified version: $x_{k+1} = x_k - \alpha \nabla f(x_k)$, where $\alpha > 0$ is a given coefficient. This version does not satisfy the Wolfe (strong Wolfe, Goldstein and Price, backtracking) conditions, so it can perform extremely poorly and should not be used. However, investigation of its properties reveals astonishing complexity⁴. Consider the following characteristics: the number of iterations necessary for the method to converge to the minimum from a given starting point, and the attraction basins of the minima.
⁴ See C. A. Reiter's home page at reiterc.
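A minimal sketch of the simplified iteration, for study only (the coefficient $\alpha$ below is an illustrative choice):

```python
import numpy as np

def simplified_steepest_descent(grad, x0, alpha, n_iter=100):
    """x_{k+1} = x_k - alpha * grad f(x_k) with a fixed coefficient alpha:
    the simplified variant discussed on the slide (no line search)."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_iter):
        x = x - alpha * grad(x)
        traj.append(x.copy())
    return np.array(traj)

# Usage: Rosenbrock banana function, gradient derived by hand
def banana_grad(p):
    x, y = p
    return np.array([-400 * x * (y - x ** 2) + 2 * (x - 1), 200 * (y - x ** 2)])

traj = simplified_steepest_descent(banana_grad, [0.2, 1.2], alpha=1e-3, n_iter=2000)
print(traj[-1])          # slowly approaches the minimum (1, 1) for small alpha
```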

15 Plotting the number of iterations. The simplified steepest descent method: $x_{k+1} = x_k - \alpha \nabla f(x_k)$. For every value of the coefficient $\alpha$, the regions of quick and slow convergence of the method can be illustrated using the following characteristics of starting points $x_0$:
$n_{\mathrm{steps}}(x; \alpha, \epsilon) = \arg\min_k \left[ \min_m \| x_k - x^{(m)} \| < \epsilon \right]$, with $x_0 = x$,
where $x^{(m)}$ are all local minima of the objective function $f$. Thus, $n_{\mathrm{steps}}(x; \alpha, \epsilon)$ is the number of iterations necessary for $\{x_k\}$ to converge from $x$ to any minimum of $f$.

16 Plotting the number of iterations. Consider the Rosenbrock banana function, $f(x, y) = 100(y - x^2)^2 + (x - 1)^2$, and focus on $[0, 2] \times [1, 3]$. The function $f$ has a single global minimum at $(x^*, y^*) = (1, 1)$.

17 Plotting the number of iterations. Legend: the colour scale ranges from few iterations to many iterations; it begins at zero iterations and ends at the maximum number of iterations in the current frame (but not more than a fixed cap). All images have been computed with the same resolution and accuracy ɛ for the accompanying video.
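A sketch of how such an iteration-count map could be computed; the grid resolution, the iteration cap and the accuracy below are illustrative choices, not those used for the original figures.

```python
import numpy as np

def banana_grad(p):
    x, y = p
    return np.array([-400 * x * (y - x ** 2) + 2 * (x - 1), 200 * (y - x ** 2)])

def n_steps(x0, alpha, eps=1e-2, cap=2000, minima=(np.array([1.0, 1.0]),)):
    """Number of simplified-steepest-descent iterations from x0 until some minimum
    is closer than eps; returns cap if the iterate escapes or never gets there."""
    x = np.asarray(x0, dtype=float)
    for k in range(cap):
        if min(np.linalg.norm(x - m) for m in minima) < eps:
            return k
        x = x - alpha * banana_grad(x)
        if np.linalg.norm(x) > 1e6:          # treat as nonconvergent
            return cap
    return cap

# A coarse map over the plotted region (resolution, cap and eps are illustrative)
xs, ys = np.linspace(0, 2, 30), np.linspace(1, 3, 30)
image = np.array([[n_steps((x, y), alpha=1e-3) for x in xs] for y in ys])
print(image.min(), image.max())
```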

18 Plotting the number of iterations. [Figure: iteration-count maps.]

19 Plotting the number of iterations: zoom for a selected value of α. [Figure: zoomed iteration-count map.]

20 Plotting the attraction basins of minima. The simplified steepest descent method: $x_{k+1} = x_k - \alpha \nabla f(x_k)$. For every value of the coefficient $\alpha$, the attraction basins of the minima of the objective function can be illustrated by plotting
$n_{\min}(x; \alpha, \epsilon) = \arg\min_m \| x^{(m)} - \lim_{k\to\infty} x_k \|$, with $x_0 = x$,
if $\{x_k\}$ is convergent, and 0 otherwise. Thus, $n_{\min}(x; \alpha, \epsilon)$ is the number of the minimum to which $\{x_k\}$ converges from $x$ (or 0, if $\{x_k\}$ is nonconvergent).

21 Plotting the attraction basins of minima. Consider a two-minimum modification of the Rosenbrock banana function: $f(x, y) = 100(y - x^2)^2 + (x - \tfrac{1}{2})^2 (x - 1)^2$. The function $f$ has two global minima, at $(\tfrac{1}{2}, \tfrac{1}{4})$ and $(1, 1)$.

22 Plotting the attraction basins of minima. Legend: no convergence / minimum $(1, 1)$ / minimum $(\tfrac{1}{2}, \tfrac{1}{4})$. All images have been computed with the same resolution and accuracy ɛ; up to a fixed maximum number of iterations was performed for the accompanying video.

23 Plotting the attraction basins of minima. [Figure: attraction basins of the two minima.]

24 Outline: 4. Conjugate gradient methods (conjugate directions, linear conjugate gradient, nonlinear conjugate gradient).

25 Search directions: conjugate direction. Any (smooth enough) function can be approximated by a quadratic form, that is, by its second-order Taylor series,
$f(x_k + s\, d_k) \approx f(x_k) + s\, d_k^T \nabla f(x_k) + \tfrac{1}{2} s^2\, d_k^T \nabla^2 f(x_k)\, d_k$.
The gradient of the approximation is
$\nabla f(x_k + s\, d_k) \approx \nabla f(x_k) + s\, \nabla^2 f(x_k)\, d_k$.

26 Search directions: conjugate direction. If $x_k$ is the exact minimum along the previous search direction $d_{k-1}$, the gradient at the minimum $x_k$ is perpendicular to the search direction $d_{k-1}$: $d_{k-1}^T \nabla f(x_k) = 0$. It is reasonable to expect that the minimization along the next direction $d_k$ does not jeopardize the minimization along the previous direction $d_{k-1}$; therefore, the gradient at the points along the new search direction $d_k$ should still stay perpendicular to $d_{k-1}$. Thus we require that
$0 = d_{k-1}^T \nabla f(x_k + s\, d_k) \approx d_{k-1}^T \left[ \nabla f(x_k) + s\, \nabla^2 f(x_k)\, d_k \right] = s\, d_{k-1}^T \nabla^2 f(x_k)\, d_k$.

27 Conjugate directions. Conjugate direction: every direction $d_k$ which satisfies $0 = d_{k-1}^T \nabla^2 f(x_k)\, d_k$ is said to be conjugate to $d_{k-1}$ (at $x$ with respect to $f$). Conjugate set: a set of vectors $d_i$ that pairwise satisfy $0 = d_i^T \nabla^2 f(x)\, d_j$ is called a conjugate set (at $x$ with respect to $f$). If $f$ is an $n$-dimensional quadratic form, then $n$ global conjugate directions can always be found. If $f$ is also positive definite, then a single exact optimization along each of them leads directly to the global minimum.

28 Conjugate directions. [Figure: coordinate descent vs. conjugate directions.]

29 Linear conjugate directions. Let $A$ be an $n \times n$ positive definite matrix,
$f(x) := c + x^T b + \tfrac{1}{2} x^T A x$, $\quad r(x) := \nabla f(x) = b + A x$,
and let $d_i$ ($i = 1, 2, \ldots, n$) be mutually conjugate directions with respect to $A$: $d_i^T A d_j = 0$ for $i \ne j$.

30 Linear conjugate directions. According to the line search principle, $x_{k+1} := x_k + s_k d_k$, where $s_k$ minimizes $f_k(s) := f(x_k + s d_k)$. For quadratic $f$, $f_k$ is a parabola with the exact minimizer
$s_k = -\dfrac{d_k^T r_k}{d_k^T A d_k}$, where $r_k = r(x_k) = \nabla f(x_k) = b + A x_k$.
The matrix $A$ is $n \times n$, hence exactly $n$ such optimum steps along all conjugate directions $d_i$ lead to the global minimum,
$x_n = x^* = x_0 + \sum_{k=0}^{n-1} s_k d_k$.

31 Linear conjugate directions: choosing the next direction. Let $d_i$, $i = 0, 1, \ldots, k$ be (already computed) conjugate directions with respect to $A$. Then
$x_{k+1} = x_0 + \sum_{i=0}^{k} s_i d_i$
is the minimum in the subspace $x_0 + \mathrm{span}\{d_0, \ldots, d_k\}$, and thus the gradient $r_{k+1} = \nabla f(x_{k+1})$ is perpendicular to $\mathrm{span}\{d_0, \ldots, d_k\}$, so that $r_{k+1}^T d_i = 0$, $i = 0, 1, \ldots, k$.

32 Linear conjugate directions: choosing the next direction. The next conjugate direction $d_{k+1}$ can hence be computed using the gradient $r_{k+1}$ as
$d_{k+1} = -r_{k+1} + \sum_{i=0}^{k} \eta_{k+1,i}\, d_i$,
where the coefficients $\eta_{k+1,i}$ ensure conjugacy of $d_{k+1}$ with all the previous directions, $d_{k+1}^T A d_i = 0$, $i = 0, 1, \ldots, k$, which yields
$\eta_{k+1,i} = \dfrac{r_{k+1}^T A d_i}{d_i^T A d_i}$.
Therefore,
$d_{k+1} = -r_{k+1} + \sum_{i=0}^{k} \dfrac{r_{k+1}^T A d_i}{d_i^T A d_i}\, d_i$.

33 Linear conjugate directions (for a quadratic form).
$d_{k+1} := -r_{k+1} + \sum_{i=0}^{k} \eta_{k+1,i}\, d_i$, $\quad \eta_{k+1,i} = \dfrac{r_{k+1}^T A d_i}{d_i^T A d_i}$,
where $f(x) = c + x^T b + \tfrac{1}{2} x^T A x$ and $r(x) := \nabla f(x) = b + A x$. Therefore:
- the computation of the next conjugate direction $d_{k+1}$ requires $k + 1$ coefficients $\eta_{k+1,i}$ ($i = 0, 1, \ldots, k$) to be computed;
- the entire set of all conjugate directions $d_i$, $i = 0, \ldots, n-1$, requires $O(n^2)$ time with a large constant and can be numerically unstable (a scheme similar to Gram-Schmidt orthogonalization).

34 Linear conjugate gradient method. Fortunately, it turns out that if the first direction $d_0$ is the steepest descent direction, $d_0 = -r_0 = -\nabla f(x_0)$, then in
$d_{k+1} := -r_{k+1} + \sum_{i=0}^{k} \eta_{k+1,i}\, d_i$
it is enough to take into account the last direction only,
$d_{k+1} := -r_{k+1} + \eta_k d_k = -r_{k+1} + \dfrac{r_{k+1}^T A d_k}{d_k^T A d_k}\, d_k$,
and the resulting direction $d_{k+1}$ is automatically conjugate to all previous directions $d_i$, $i = 0, 1, \ldots, k$. This is called (a bit misleadingly) the conjugate gradient method.

35 Linear conjugate gradient method. Using $r_{k+1} - r_k = s_k A d_k$, $d_{k+1} = -r_{k+1} + \eta_k d_k$ and $r_{k+1}^T d_i = 0$ for $i = 0, 1, \ldots, k$, it is easy to:
- express the optimum step length $s_k$ for use with $x_{k+1} := x_k + s_k d_k$ as $s_k = -\dfrac{d_k^T r_k}{d_k^T A d_k} = \dfrac{r_k^T r_k}{d_k^T A d_k}$;
- simplify $\eta_k$ for use with $d_{k+1} := -r_{k+1} + \eta_k d_k$ as $\eta_k = \dfrac{r_{k+1}^T A d_k}{d_k^T A d_k} = \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$.

36 Linear conjugate gradient method. Linear conjugate gradient: given $f(x) = c + x^T b + \tfrac{1}{2} x^T A x$, choose the initial point $x_0$. Initialize $r_0 := b + A x_0$, $d_0 := -r_0$ and $k := 0$. While the stop conditions are not satisfied, do:
$s_k := \dfrac{r_k^T r_k}{d_k^T A d_k}$, $\quad x_{k+1} := x_k + s_k d_k$, $\quad r_{k+1} := r_k + s_k A d_k$, $\quad \eta_k := \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$, $\quad d_{k+1} := -r_{k+1} + \eta_k d_k$, $\quad k := k + 1$.
This is basically the CG method for solving $Ax = b$ (Lecture B-3).
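A minimal sketch of the linear CG iteration exactly as listed above (with the gradient convention $r = b + Ax$, minimizing $f$ amounts to solving $Ax = -b$):

```python
import numpy as np

def linear_cg(A, b, x0, tol=1e-10, max_iter=None):
    """Linear conjugate gradient for the quadratic f(x) = c + x^T b + 1/2 x^T A x,
    following the slide: r = b + A x is the gradient, d_0 = -r_0."""
    x = np.asarray(x0, dtype=float)
    r = b + A @ x                 # gradient at x_0
    d = -r
    for _ in range(max_iter or len(b)):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        s = (r @ r) / (d @ Ad)    # exact minimizer along d
        x = x + s * d
        r_new = r + s * Ad        # gradient update
        eta = (r_new @ r_new) / (r @ r)
        d = -r_new + eta * d      # next conjugate direction
        r = r_new
    return x

# Usage: for an SPD matrix A this minimizes f, i.e. solves A x = -b
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = linear_cg(A, b, np.zeros(2))
print(x, A @ x + b)               # residual (gradient) should be ~ 0
```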

37 Nonlinear conjugate gradient method. The conjugate gradient algorithm can be (almost) directly used with non-quadratic objective functions, provided the exactly minimizing step is replaced with a line search. It is inexpensive numerically (no Hessian, just gradients; data storage only one step back) and in general yields superlinear convergence. The line search need not be exact. However, the step length $s_k$ has to satisfy the strong Wolfe conditions with $0 < c_1 < c_2 < 0.5$ (Lecture C-2) in order to assure that the next direction $d_{k+1} := -r_{k+1} + \eta_k d_k$ is a descent direction, with
$\eta_k := \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k} = \dfrac{\nabla f(x_{k+1})^T \nabla f(x_{k+1})}{\nabla f(x_k)^T \nabla f(x_k)}$.

38 Nonlinear conjugate gradient method. Nonlinear conjugate gradient: given $f(x)$, choose the initial point $x_0$. Initialize $r_0 := \nabla f(x_0)$, $d_0 := -r_0$ and $k := 0$. While the stop conditions are not satisfied, do:
1. Find a step length $s_k$ satisfying the strong Wolfe conditions with $0 < c_1 < c_2 < 0.5$ and set $x_{k+1} := x_k + s_k d_k$.
2. Compute $r_{k+1} := \nabla f(x_{k+1})$.
3. Compute Fletcher-Reeves $\eta_k := \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$ or Polak-Ribière $\eta_k := \max\left[0,\ \dfrac{r_{k+1}^T (r_{k+1} - r_k)}{r_k^T r_k}\right]$.
4. Compute $d_{k+1} := -r_{k+1} + \eta_k d_k$.
5. Update the step number $k := k + 1$.

39 Nonlinear conjugate gradient method. Iteratively generated search directions $d_k$ can tend to fold up on each other and become linearly dependent. Therefore, the directions should be periodically reset by assuming $\eta_k := 0$, which effectively restarts the procedure of generating the directions from scratch (that is, from the steepest descent direction).
- Restart every N iterations.
- With a quadratic objective function and exact line searches, consecutive gradients are orthogonal, $r_i^T r_j = 0$ if $i \ne j$. The procedure can thus be restarted when $\dfrac{|r_{k+1}^T r_k|}{\|r_{k+1}\|\,\|r_k\|} > c \approx 0.1$, which in practice happens rather frequently.
- The Polak-Ribière method restarts anyway when the computed correction term is negative (rather infrequently).
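A sketch of the nonlinear CG loop with both coefficient choices, periodic restarts and SciPy's strong-Wolfe line search (`scipy.optimize.line_search`); the restart period and the tolerances are illustrative choices.

```python
import numpy as np
from scipy.optimize import line_search

def nonlinear_cg(f, grad, x0, variant="PR", tol=1e-6, max_iter=500, restart_every=50):
    """Sketch of nonlinear CG with Fletcher-Reeves or Polak-Ribiere coefficients,
    a strong-Wolfe line search (c2 < 0.5) and periodic restarts (eta := 0)."""
    x = np.asarray(x0, dtype=float)
    r = grad(x)
    d = -r
    for k in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        s = line_search(f, grad, x, d, c1=1e-4, c2=0.4)[0]
        if s is None:                         # line search failed: restart from steepest descent
            d = -r
            s = line_search(f, grad, x, d, c1=1e-4, c2=0.4)[0] or 1e-8
        x = x + s * d
        r_new = grad(x)
        if variant == "FR":
            eta = (r_new @ r_new) / (r @ r)
        else:                                 # Polak-Ribiere with the max(0, .) safeguard
            eta = max(0.0, r_new @ (r_new - r) / (r @ r))
        if (k + 1) % restart_every == 0:      # periodic restart
            eta = 0.0
        d = -r_new + eta * d
        r = r_new
    return x

# Usage: Rosenbrock banana function
f = lambda p: 100 * (p[1] - p[0] ** 2) ** 2 + (p[0] - 1) ** 2
g = lambda p: np.array([-400 * p[0] * (p[1] - p[0] ** 2) + 2 * (p[0] - 1),
                        200 * (p[1] - p[0] ** 2)])
print(nonlinear_cg(f, g, [-1.2, 1.0]))
```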

40 Outline: 5. Newton methods (inexact Newton methods, modified Newton methods).

41 Newton methods. The Newton direction
$d_k := -\left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)$
is based on the quadratic approximation to the objective function:
$f(x_k + \Delta x) \approx f(x_k) + \Delta x^T \nabla f(x_k) + \tfrac{1}{2} \Delta x^T \nabla^2 f(x_k)\, \Delta x$.
If the Hessian $\nabla^2 f(x_k)$ is positive definite (the approximation is convex), the minimum is found by solving
$\nabla f(x_k + d_k) \approx \nabla f(x_k) + \nabla^2 f(x_k)\, d_k = 0$.
- The Hessian has to be computed (a 2nd order method).
- Inverting a large Hessian is time-consuming.
- Far from the minimum the Hessian may not be positive definite and $d_k$ may be an ascent direction.
- Quick quadratic convergence near the minimum.
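A minimal sketch of the plain Newton iteration, solving the linear system rather than inverting the Hessian; it assumes a starting point close enough to the minimum, with no safeguards:

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Plain Newton iteration: solve  hess(x_k) d_k = -grad(x_k)  instead of
    forming the inverse explicitly.  No safeguards in this sketch."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)   # Newton direction
        x = x + d                          # full Newton step (no line search)
    return x

# Usage: Rosenbrock banana function with analytic gradient and Hessian
grad = lambda p: np.array([-400 * p[0] * (p[1] - p[0] ** 2) + 2 * (p[0] - 1),
                           200 * (p[1] - p[0] ** 2)])
hess = lambda p: np.array([[-400 * (p[1] - p[0] ** 2) + 800 * p[0] ** 2 + 2, -400 * p[0]],
                           [-400 * p[0], 200.0]])
print(newton_minimize(grad, hess, [0.95, 0.9]))   # quadratic convergence to (1, 1)
```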

42 Newton methods. Far from the minimum the Hessian may not be positive definite and the exact Newton direction may be an ascent direction. In practice the Newton method is implemented as:
- Inexact Newton methods, which solve $\nabla^2 f(x_k)\, d_k = -\nabla f(x_k)$ inexactly to assure that $d_k$ is a descent direction (Newton conjugate gradient method).
- Modified Newton methods, which modify the Hessian matrix $\nabla^2 f(x_k)$ so that it becomes positive definite. The solution to $H_k d_k = -\nabla f(x_k)$, where $H_k$ is the modified Hessian, is then a descent direction.

43 Inexact Newton methods. Inexact solution of $\nabla^2 f(x_k)\, d_k = -\nabla f(x_k)$:
- is quicker, since the Hessian is not inverted;
- can guarantee that the inexact solution is a descent direction.
The stop condition is usually based on the norm of the residuum, normalized with respect to the norm of the RHS ($-\nabla f(x_k)$):
$\dfrac{\|r_k\|}{\|\nabla f(x_k)\|} = \dfrac{\|\nabla^2 f(x_k)\, d_k + \nabla f(x_k)\|}{\|\nabla f(x_k)\|} \le \alpha_k$,
where at least $\alpha_k < \alpha < 1$, or preferably $\alpha_k \to 0$, for example
$\alpha_k = \min\left[\tfrac{1}{2}, \|\nabla f(x_k)\|\right]$ or $\alpha_k = \min\left[\tfrac{1}{2}, \sqrt{\|\nabla f(x_k)\|}\right]$.

44 Inexact Newton methods: Newton conjugate gradient. The Newton conjugate gradient method solves in each step $\nabla^2 f(x_k)\, d_k = -\nabla f(x_k)$ iteratively (an inner iteration in each step of the outer iteration) using the linear conjugate gradient method, with the stop conditions:
- a direction $p_{i+1}$ of negative curvature is generated, $p_{i+1}^T \nabla^2 f(x_k)\, p_{i+1} \le 0$, or
- the Newton equation is inexactly solved, $\dfrac{\|r_k\|}{\|\nabla f(x_k)\|} \le \alpha_k$, where $\alpha_k = \min\left[\tfrac{1}{2}, \sqrt{\|\nabla f(x_k)\|}\right]$.
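A sketch of the inner truncated-CG solve, assuming only Hessian-vector products are available; the fallback to the steepest descent direction on immediate negative curvature is a common choice, not something specified on the slide.

```python
import numpy as np

def newton_cg_direction(g, hess_vec, max_inner=None):
    """Truncated-CG (Newton-CG) inner solve of  H d = -g  using only
    Hessian-vector products, stopped on negative curvature or on the
    forcing condition ||H d + g|| <= alpha_k ||g||,
    alpha_k = min(1/2, sqrt(||g||))."""
    alpha_k = min(0.5, np.sqrt(np.linalg.norm(g)))
    d = np.zeros_like(g)
    r = g.copy()                  # residual of H d + g, equals g since d = 0
    p = -r
    for _ in range(max_inner or 10 * g.size):
        Hp = hess_vec(p)
        if p @ Hp <= 0:           # negative curvature: fall back to steepest descent
            return -g if np.allclose(d, 0) else d
        s = (r @ r) / (p @ Hp)
        d = d + s * p
        r_new = r + s * Hp
        if np.linalg.norm(r_new) <= alpha_k * np.linalg.norm(g):
            return d
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return d

# Usage with an explicit (positive definite) Hessian
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
d = newton_cg_direction(g, lambda v: H @ v)
print(d, H @ d + g)               # residual within the forcing tolerance
```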

45 Modified Newton methods. Modified Newton methods modify the Hessian matrix $\nabla^2 f(x_k)$ (as little as possible) by $H_k = \nabla^2 f(x_k) + E_k$, so that it becomes sufficiently positive definite and the solution to the modified equation
$\left[\nabla^2 f(x_k) + E_k\right] d_k = -\nabla f(x_k)$
is a descent direction. There are several possibilities to choose $E_k$:
1. A multiple of identity, $E_k = \tau I$.
2. Obtained during Cholesky factorisation of the Hessian with immediate increase of the diagonal elements (Cholesky modification).
3. Others (Gershgorin modification, etc.).
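A minimal sketch of the first option ($E_k = \tau I$): the shift $\tau$ is grown until a Cholesky factorization succeeds, and the factors are then reused for the solve. The initial shift and the growth factor are illustrative choices.

```python
import numpy as np

def modified_newton_direction(hess, g, tau0=1e-3):
    """Modified Newton direction with E_k = tau*I: increase tau until the
    Cholesky factorization of  hess + tau*I  succeeds, then solve
    (hess + tau*I) d = -g."""
    H = np.asarray(hess, dtype=float)
    n = H.shape[0]
    tau = 0.0
    while True:
        try:
            L = np.linalg.cholesky(H + tau * np.eye(n))   # fails if not positive definite
            break
        except np.linalg.LinAlgError:
            tau = max(2 * tau, tau0)                      # grow the shift and retry
    # two triangular solves: L L^T d = -g
    d = np.linalg.solve(L.T, np.linalg.solve(L, -g))
    return d, tau

# Usage: an indefinite "Hessian" far from the minimum
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([1.0, 1.0])
d, tau = modified_newton_direction(H, g)
print(d, tau, d @ g < 0)          # d is a descent direction
```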

46 Outline: 6. Quasi-Newton methods (approximating the Hessian, DFP method, BFGS method, Broyden class and SR1 method).

47 Quasi-Newton methods. All Newton methods require the Hessian to be computed, and some of them (modified Newton) invert or factorize it. Both tasks are time-consuming and error-prone. Quasi-Newton methods compute the search direction using a gradient-based approximation to the Hessian (or to its inverse):
$d_k = -H_k^{-1} \nabla f(x_k)$ or $d_k = -B_k \nabla f(x_k)$.
Quasi-Newton methods are useful in large problems (the Hessian is dense and too large), with severely ill-conditioned Hessians, and when second derivatives are unavailable.

48 Quasi-Newton methods: approximating the Hessian. The approximation is based on gradients and updated after each step using the previous step length and the change of the gradient. Approximate the function around the last two iterates:
$f_{k-1}(\Delta x) := f(x_{k-1} + \Delta x) \approx f(x_{k-1}) + \Delta x^T \nabla f(x_{k-1}) + \tfrac{1}{2} \Delta x^T H_{k-1} \Delta x$,
$f_k(\Delta x) := f(x_k + \Delta x) \approx f(x_k) + \Delta x^T \nabla f(x_k) + \tfrac{1}{2} \Delta x^T H_k \Delta x$,
where $x_k := x_{k-1} + s_{k-1} d_{k-1}$. Assume the previous-step approximate Hessian $H_{k-1}$ is known. The next approximation $H_k$ can be obtained by comparing the gradients at $x_{k-1}$:
$\nabla f_k(-s_{k-1} d_{k-1}) = \nabla f_{k-1}(0)$.

49 Quasi-Newton methods: approximating the Hessian. The requirement $\nabla f_k(-s_{k-1} d_{k-1}) = \nabla f_{k-1}(0)$ leads to
$H_k (x_k - x_{k-1}) = \nabla f(x_k) - \nabla f(x_{k-1})$,
which is usually stated in the shorter form and called the secant equation,
$H_k y_k = g_k$, where $y_k := x_k - x_{k-1}$, $g_k := \nabla f(x_k) - \nabla f(x_{k-1})$.
Such a symmetric positive definite $H_k$ exists (non-uniquely) only if
$y_k^T H_k y_k = y_k^T g_k > 0$,
which is guaranteed if the step length $s_{k-1}$ satisfies the Wolfe conditions.

50 Quasi-Newton methods: approximating the Hessian. Since the solution is non-unique, $H_k$ should be chosen so that it is the closest approximation to the last-step approximation $H_{k-1}$: find a symmetric positive definite matrix $H_k$ which satisfies the secant equation $H_k y_k = g_k$ and minimizes $\|H_k - H_{k-1}\|$ for a given norm. The approximation is used to find the quasi-Newton direction $d_k = -H_k^{-1} \nabla f(x_k)$, which requires inverting the approximated Hessian. The task is thus often reformulated to find an approximation to the inverse, $B_k = H_k^{-1}$: find a symmetric positive definite matrix $B_k$ which satisfies the inverse secant equation $B_k g_k = y_k$ and minimizes $\|B_k - B_{k-1}\|$ for a given norm.

51 Quasi-Newton methods: approximating the Hessian. In both cases, the solution is easy to find if the weighted Frobenius norm⁷ is used,
$\|A\|_W = \|W^{1/2} A W^{1/2}\|_F$,
where the weighting matrix $W$ satisfies the inverse (or direct) secant equation. If $W^{-1}$ (or $W$) is the average Hessian over the last step,
$\int_0^1 \nabla^2 f(x_{k-1} + \tau s_{k-1} d_{k-1})\, d\tau$,
then two updating formulas are obtained: the DFP (Davidon, Fletcher, Powell) formula for Hessian updating and the BFGS (Broyden, Fletcher, Goldfarb, Shanno) formula for inverse Hessian updating. Both formulas use rank-two updates to the previous-step matrix.
⁷ The Frobenius norm of a matrix $A$ is defined as $\|A\|_F^2 = \sum_{i,j} a_{ij}^2$.

52 Quasi-Newton methods: DFP method. DFP (Davidon, Fletcher, Powell) Hessian updating formula:
$H_k^{\mathrm{DFP}} = \left(I - \dfrac{g_k y_k^T}{g_k^T y_k}\right) H_{k-1} \left(I - \dfrac{y_k g_k^T}{g_k^T y_k}\right) + \dfrac{g_k g_k^T}{g_k^T y_k}$.
The corresponding formula for updating the inverse Hessian:
$B_k^{\mathrm{DFP}} = B_{k-1} - \dfrac{B_{k-1} g_k g_k^T B_{k-1}}{g_k^T B_{k-1} g_k} + \dfrac{y_k y_k^T}{g_k^T y_k}$.

53 Quasi-Newton methods: BFGS method. BFGS (Broyden, Fletcher, Goldfarb, Shanno) inverse Hessian updating formula:
$B_k^{\mathrm{BFGS}} = \left(I - \dfrac{y_k g_k^T}{g_k^T y_k}\right) B_{k-1} \left(I - \dfrac{g_k y_k^T}{g_k^T y_k}\right) + \dfrac{y_k y_k^T}{g_k^T y_k}$.
The corresponding formula for updating the Hessian:
$H_k^{\mathrm{BFGS}} = H_{k-1} - \dfrac{H_{k-1} y_k y_k^T H_{k-1}}{y_k^T H_{k-1} y_k} + \dfrac{g_k g_k^T}{g_k^T y_k}$.
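A minimal BFGS sketch using the inverse-Hessian update above (in the slide's notation, $y_k$ is the step and $g_k$ the gradient change). A backtracking Armijo search stands in for the Wolfe line search that would normally guarantee the curvature condition $g_k^T y_k > 0$, so the update is simply skipped when that condition fails.

```python
import numpy as np

def bfgs_update(B_prev, y, g):
    """BFGS update of the inverse-Hessian approximation B, using the slide's
    notation: y = x_k - x_{k-1} (the step), g = grad f(x_k) - grad f(x_{k-1})."""
    rho = 1.0 / (g @ y)
    V = np.eye(len(y)) - rho * np.outer(y, g)
    return V @ B_prev @ V.T + rho * np.outer(y, y)

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS sketch with a backtracking Armijo line search."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                       # initial inverse-Hessian approximation
    r = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        d = -B @ r                           # quasi-Newton direction
        s, fx = 1.0, f(x)
        while f(x + s * d) > fx + 1e-4 * s * (r @ d):
            s *= 0.5
        x_new = x + s * d
        r_new = grad(x_new)
        y, g = x_new - x, r_new - r
        if g @ y > 1e-12:                    # curvature condition, skip update otherwise
            B = bfgs_update(B, y, g)
        x, r = x_new, r_new
    return x

# Usage: Rosenbrock banana function
f = lambda p: 100 * (p[1] - p[0] ** 2) ** 2 + (p[0] - 1) ** 2
g = lambda p: np.array([-400 * p[0] * (p[1] - p[0] ** 2) + 2 * (p[0] - 1),
                        200 * (p[1] - p[0] ** 2)])
print(bfgs_minimize(f, g, [-1.2, 1.0]))
```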

54 Quasi-Newton methods: Broyden class. The DFP and the BFGS formulas for updating the Hessian can be linearly combined to form a general updating formula for the Broyden class methods:
$H_k = (1 - \phi) H_k^{\mathrm{BFGS}} + \phi\, H_k^{\mathrm{DFP}}$.
The restricted Broyden class is defined by $0 \le \phi \le 1$.

55 Quasi-Newton methods: SR1 method. A member of the Broyden class is the SR1 (symmetric rank one) method, which yields rank-one updates to the Hessian:
$\phi_k^{\mathrm{SR1}} = \dfrac{y_k^T g_k}{y_k^T (g_k - H_{k-1} y_k)}$.
The weighting coefficient $\phi_k^{\mathrm{SR1}}$ may fall outside the $[0, 1]$ interval. The SR1 formula approximates the Hessian well, but the approximated Hessian might not be positive semidefinite, and the denominator in the above formula can vanish or be small. It is thus usually used with modified Newton methods (trust region), and sometimes the update is skipped.

56 Outline: 7. Least-squares problems (Hessian approximation, Gauss-Newton method, Levenberg-Marquardt method).

57 Least-squares problems. In many problems (like parameter fitting, structural optimization, etc.) the objective function is a sum of squares (residuals):
$f(x) = \tfrac{1}{2} r(x)^T r(x) = \tfrac{1}{2} \sum_{i=1}^{n} r_i^2(x)$.
Then
$\nabla f(x) = \sum_{i=1}^{n} r_i(x)\, \nabla r_i(x) = J(x)^T r(x)$,
$\nabla^2 f(x) = \sum_{i=1}^{n} \nabla r_i(x)\, \nabla r_i(x)^T + \sum_{i=1}^{n} r_i(x)\, \nabla^2 r_i(x) = J(x)^T J(x) + \sum_{i=1}^{n} r_i(x)\, \nabla^2 r_i(x)$,
where $J(x)$ is the Jacobi matrix of the residuals, $J(x) = \left[\partial r_i(x) / \partial x_j\right]_{i,j}$.

58 Least-squares problems: Hessian approximation. If the residuals $r_i(x)$, at least near the minimum, vanish (the optimum point yields a good fit between the model and the data), that is $r_i(x) \approx 0$, or are linear functions of $x$, that is $\nabla^2 r_i(x) \approx 0$, then the Hessian simplifies to
$\nabla^2 f(x) = J(x)^T J(x) + \sum_{i=1}^{n} r_i(x)\, \nabla^2 r_i(x) \approx J(x)^T J(x)$,
which is always positive semidefinite and can be computed using the first derivatives of the residuals only.

59 Least-squares problems: Gauss-Newton method. Substitution of the approximation into the Newton formula, $\nabla^2 f(x_k)\, d_k = -\nabla f(x_k)$, yields the Gauss-Newton formula
$J(x_k)^T J(x_k)\, d_k = -J(x_k)^T r(x_k)$,
which may be problematic if $J(x_k)^T J(x_k)$ is not a good approximation to the Hessian.
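A minimal Gauss-Newton sketch; the normal equations above are solved here through a least-squares solve on $J d_k = -r$, which is an equivalent but better-conditioned formulation. The exponential model and the data in the usage example are hypothetical.

```python
import numpy as np

def gauss_newton(residual, jac, x0, n_iter=20):
    """Plain Gauss-Newton sketch: solve  J^T J d = -J^T r  via lstsq on J d = -r."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        r, J = residual(x), jac(x)
        d = np.linalg.lstsq(J, -r, rcond=None)[0]   # Gauss-Newton direction
        x = x + d                                   # full step, no damping
    return x

# Usage: fit y = a*exp(b*t) to hypothetical data (t_i, y_i)
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(1.5 * t)
residual = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
print(gauss_newton(residual, jac, [1.0, 1.0]))      # approaches (2.0, 1.5)
```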

60 Least-squares problems: Levenberg-Marquardt method. A modified version of the Gauss-Newton method is called the Levenberg-Marquardt method⁸. Levenberg-Marquardt formula:
$\left[J(x_k)^T J(x_k) + \lambda I\right] d_k = -J(x_k)^T r(x_k)$.
The parameter $\lambda$ allows the step length to be smoothly controlled:
- For small $\lambda$, the steps are Gauss-Newton (or Newton) in character and take advantage of the superlinear convergence near the minimum.
- For large $\lambda$, the identity matrix dominates and the steps are similar to steepest descent steps.
Sometimes a heuristic is advocated which uses the diagonal of the approximated Hessian $J(x_k)^T J(x_k)$ instead of the identity matrix.
⁸ Notice that this is essentially a trust-region approach.

61 Least-squares problems: Levenberg-Marquardt method. [Figure: (quadratically) approximated objective function with steps for the steepest descent, the identity-matrix and the diagonal-heuristic variants.]

62 Least-squares problems: Levenberg-Marquardt method. The rule with the diagonal of the Hessian matrix (instead of the identity) is a heuristic only. It can be quicker, but sometimes (and not so rarely) it may also be much slower. [Figure: steepest descent, identity matrix, diagonal heuristic.]

63 Least-squares problems: Levenberg-Marquardt method. In the Levenberg-Marquardt formula,
$\left[J(x_k)^T J(x_k) + \lambda I\right] d_k = -J(x_k)^T r(x_k)$,
the parameter $\lambda$ controls the step length. It is updated in each optimization step based on the accuracy
$\rho = \dfrac{f(x_k) - f(x_k + d_k)}{f_k(0) - f_k(d_k)} = \dfrac{\text{actual decrease}}{\text{predicted decrease}}$
of the quadratic approximation $f_k$ to the objective function,
$f(x_k + \Delta x) \approx f_k(\Delta x) := f(x_k) + \Delta x^T J(x_k)^T r(x_k) + \tfrac{1}{2} \Delta x^T J(x_k)^T J(x_k)\, \Delta x$.
The denominator in the formula for $\rho$ simplifies to
$f_k(0) - f_k(d_k) = \tfrac{1}{2} d_k^T \left[\lambda d_k - \nabla f(x_k)\right]$.

64 Least-squares problems: Levenberg-Marquardt method. Given the coefficients $\rho_{\min} \approx 0.1$, $\rho_{\max} \approx 0.75$ and $k \approx 25$. Single optimization step of the Levenberg-Marquardt method: compute $r(x_k)$, $J(x_k)$, $J(x_k)^T J(x_k)$ and $\nabla f(x_k)$; then do:
1. Solve $\left[J(x_k)^T J(x_k) + \lambda I\right] d_k = -J(x_k)^T r(x_k)$.
2. Compute $f(x_k + d_k)$.
3. Compute $\rho = \dfrac{f(x_k) - f(x_k + d_k)}{f_k(0) - f_k(d_k)} = \dfrac{\text{actual decrease}}{\text{predicted decrease}}$.
4. If $\rho < \rho_{\min}$, then $\lambda := k\lambda$; else, if $\rho > \rho_{\max}$, then $\lambda := \lambda / k$;
while $\rho < 0$.
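A sketch of one such step; the $\lambda$-update factor below is set to 10 as an illustrative choice, and the usage example reuses the hypothetical exponential fit from the Gauss-Newton sketch.

```python
import numpy as np

def lm_step(residual, jac, x, lam, rho_min=0.1, rho_max=0.75, factor=10.0):
    """One Levenberg-Marquardt step following the slide's scheme; lambda is
    increased while the step is rejected (rho < 0), and the accepted point is
    returned together with the updated lambda."""
    x = np.asarray(x, dtype=float)
    r, J = residual(x), jac(x)
    g = J.T @ r                                  # gradient of f = 1/2 r^T r
    if np.linalg.norm(g) < 1e-12:                # already at a stationary point
        return x, lam
    f_old = 0.5 * r @ r
    while True:
        d = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -g)
        rn = residual(x + d)
        f_new = 0.5 * rn @ rn
        predicted = 0.5 * d @ (lam * d - g)      # f_k(0) - f_k(d_k)
        rho = (f_old - f_new) / predicted
        if rho < rho_min:
            lam *= factor                        # poor model agreement: damp more
        elif rho > rho_max:
            lam /= factor                        # very good agreement: damp less
        if rho >= 0:                             # accept once f has not increased
            return x + d, lam

# Usage: hypothetical exponential fit y = a*exp(b*t)
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(1.5 * t)
residual = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
x, lam = np.array([1.0, 1.0]), 1e-2
for _ in range(30):
    x, lam = lm_step(residual, jac, x, lam)
print(x, lam)                                    # x approaches (2.0, 1.5)
```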
