A Primer on Multidimensional Optimization


1 A Primer on Multidimensional Optimization Prof. Dr. Florian Rupp German University of Technology in Oman (GUtech) Introduction to Numerical Methods for ENG & CS (Mathematics IV) Spring Term 2016

2 Exercise Session

3 Reviewing the highlights from last time (1/2)

Page 123, exercise 1 (reformulated): Find where the graphs of y = 3x and y = exp(x) intersect by finding solutions of exp(x) − 3x = 0, correct to four decimal digits, with the secant method.

Page 149, exercise 4: Application of the secant method to f(x) = 2 − e^x with x_0 = 0 and x_1 = 1 leads to the following sequence of iterates: x_{n+1} = x_n + (2 − e^{x_n})(x_n − x_{n−1})(e^{x_n} − e^{x_{n−1}})^{−1}. What is lim_{n→∞} x_n?

Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 3 / 83
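As a quick check of these two review exercises, here is a minimal secant-method sketch (the helper names are my own, not from the textbook). For the second exercise the iterates converge to the root of 2 − e^x = 0, i.e., lim x_n = ln 2:

```python
import math

def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Secant method: approximate a root of f starting from x0, x1."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 - f0 == 0:
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

# Exercise p. 123: solve exp(x) - 3x = 0 (intersection of y = 3x and y = exp(x))
root = secant(lambda x: math.exp(x) - 3 * x, 0.0, 1.0)

# Exercise p. 149: f(x) = 2 - exp(x); the iterates converge to ln 2
root2 = secant(lambda x: 2 - math.exp(x), 0.0, 1.0)
```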

4 Reviewing the highlights from last time (2/2)

Page 151, computer exercise 12: Test numerically whether Olver's method, given by the update formula

x_{n+1} = x_n − f(x_n)/f'(x_n) − (1/2) · (f''(x_n)/f'(x_n)) · (f(x_n)/f'(x_n))^2,

is cubically convergent to a root of f. Try to establish that it is.

5 Introduction & Today's Scope

6 Disclaimer

This will be a rather theoretical and not very interactive lecture. It discusses root finding for the gradient ∇J of functions J : R^n → R, n ≥ 1. The purpose of this lecture is to give you a high-level outlook on optimization methods that are based on linear and quadratic approximations. Thus, it is a review of what we discussed for root finding, in a slightly different context.

7 About optimization and calculus

An important application of calculus is the problem of finding the local minima and maxima of a function. Problems of maximization are covered by the theory of minimization, because the maxima of F are the minima of −F. In calculus, the principal technique for minimization is to differentiate the function whose minimum is sought, set the derivative equal to zero, and locate the points that simultaneously satisfy the resulting equations, e.g.,

∂F(x_1, x_2, x_3)/∂x_1 = ∂F(x_1, x_2, x_3)/∂x_2 = ∂F(x_1, x_2, x_3)/∂x_3 = 0.

This procedure cannot be readily accepted as a general-purpose numerical method, as it requires differentiation followed by the solution of one or many equations in one or many variables using the methods from last time. This task may be as difficult to carry out as a direct frontal attack on the original problem.

8 Unconstrained & constrained minimization problems (1/2)

The minimization problem has two forms: the unconstrained and the constrained. In an unconstrained minimization problem, a function F : R^n → R is given and a point z ∈ R^n is sought with the property F(z) ≤ F(x) for all x ∈ R^n. In a constrained minimization problem, a subset K ⊆ R^n is prescribed, and a point z ∈ K is sought with the property F(z) ≤ F(x) for all x ∈ K. Such problems are more difficult because of the need to keep the points within the set K, which can be defined in a complicated way (you may have seen such problems already; think of the Lagrange method!).

9 Unconstrained & constrained minimization problems (2/2)

Example 20: Consider the elliptic paraboloid F(x_1, x_2) = (x_1 − 1)^2 + (x_2 − 1)^2 + 2. The unconstrained minimum occurs at (1, 1) with F(1, 1) = 2, whereas if K = {(x_1, x_2) : x_1, x_2 ≤ 0}, the constrained minimum is attained at (0, 0) with F(0, 0) = 4. (Figure.)

10 Today, we will focus on multidimensional unconstrained optimization

Today's topics:
- The gradient/steepest descent method and its step-size conditions
- Newton's method revisited and the Quasi-Newton method
- The Trust-Region method
- Penalty & barrier functions (in 1D)
- Simulated Annealing

Corresponding textbook chapters: 13.1 and 13.2

11 The Gradient/ Steepest Descent Method

12 The negative gradient vector points in the direction of the steepest descent

From calculus we know that the negative gradient vector −∇J of a function J points in the direction of the steepest descent.

13 A first gedanken experiment...

Let us try to find the minimum of the "peaks"-function J : R^2 → R, and assume therefore a small ball at the position (x, J(x)) on the graph of J that moves with speed ‖∇J(x)‖_2 in direction −∇J(x). (Figure.)

14 ... immediately leads to four core problems in optimization

We see that we have found a minimum as soon as our ball is in a kind of basin and does not move anymore (∇J = 0). However, this intuition is not so easy to translate into mathematics:
1. What does it mean to be in a basin, and is ∇J = 0 the right mathematical property (after all, saddle points have this property, too)?
2. Can it happen that the ball has so much energy that it simply runs through the basin without stopping, or so little energy that it approaches the basin too slowly?
3. Which direction must the ball take to eventually come to a basin?
4. How long does it take the ball to reach a minimum? (Speed of convergence)

15 Solving problem 1: strictly convex functions (1/2)

Problem 1 ("What does it mean to be in a basin, and is ∇J = 0 the right property?") can be solved in two ways:

First, by introducing the notion of curvature and establishing that the curvature is positive at a minimum of J : R^n → R. This is the usual way in calculus, where you go for a second-order Taylor expansion around x*,

J(x* + h) = J(x*) + h^T ∇J(x*) + (1/2) h^T H_J(x*) h + o(‖h‖^2),

and study whether the Hessian matrix H_J(x) = (∂_{x_i} ∂_{x_j} J(x_1, ..., x_n))_{i,j = 1,...,n} is positive definite at x = x*, i.e., whether all eigenvalues of H_J(x*) are strictly positive.

Second, by encoding such information into the function you want to minimize, e.g., by only allowing (strictly) convex functions as objectives.

16 Solving problem 1: strictly convex functions (2/2)

Definition (Strictly Convex Function): Let F ⊆ R^n be a convex set. A function J : F → R is called convex if for all x, y ∈ F with x ≠ y and λ ∈ (0, 1) it holds that J(λx + (1 − λ)y) ≤ λJ(x) + (1 − λ)J(y). The function J : F → R is called strictly convex if strict inequality holds, i.e., J(λx + (1 − λ)y) < λJ(x) + (1 − λ)J(y).

Strictly convex functions have at most one minimum, which makes them perfect candidates for optimization purposes.

Remark: Flipping the inequality sign gives the definition of a (strictly) concave function.

17 Examples of convex and concave functions

(Figure: illustrations of convex, concave, linear, and non-convex functions.)

For a convex function J over an interval [a, b], any point c ∈ [a, b] takes a value J(c) on or below the chord connecting (a, J(a)) and (b, J(b)).

18 The key idea of the gradient method (1/2)

Next to problem 3: "Which direction must the ball take to eventually come to a basin?"

If the ball always continues to roll in that direction s ∈ R^n, ‖s‖_2 = 1, for which J(x + s) is less than J(x) at the current point, i.e., at best J(x) > J(x + s) with |J(x + s) − J(x)| maximal, then it will inevitably reach a minimum (if one exists). Applying a first-order Taylor approximation J(x + s) ≈ J(x) + s^T ∇J(x) leads, provided it exists, to a direction s* ∈ R^n of steepest descent (of norm one) via the relation

s* = argmax_{s ∈ R^n, ‖s‖_2 = 1} |s^T ∇J(x)| ≈ argmax_{s ∈ R^n, ‖s‖_2 = 1} |J(x + s) − J(x)|.

19 The key idea of the gradient method (2/2)

The scalar product s^T ∇J(x) attains its maximal absolute value if s and ∇J(x) are collinear. Since we require J(x + s) − J(x) ≈ s^T ∇J(x) < 0, and, as we have already seen in the discussion of the gradient, a direction of the kind +∇J(x) would lead us to an ascent, this gives us s* := −∇J(x)/‖∇J(x)‖_2. This motivates the following algorithm (attributed to Cauchy, 1847).

20 The algorithm of the gradient method

Algorithm (Gradient Method)
Starting point: Choose x_0 ∈ R^n and compute s_0 := −∇J(x_0). Set k = 0.
Iteration: If s_k = 0: STOP, the optimal solution is x_k. Else, choose α_k ∈ (0, ∞) such that J(x_k) > J(x_k + α_k s_k). (The parameter α_k determines how far we want to descend in the direction s_k.)
Update data: Set x_{k+1} := x_k + α_k s_k and compute s_{k+1} := −∇J(x_{k+1}). Set k := k + 1 and continue with the iteration step.
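The algorithm above fits in a few lines; this sketch fixes α_k to a constant for simplicity (the lecture's adaptive step-size choices come later), and all names are illustrative. Applied to the elliptic paraboloid of Example 20 it walks straight to the minimizer (1, 1):

```python
import numpy as np

def gradient_method(grad_J, x0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Gradient method with a fixed step-size alpha (a simplification;
    the lecture chooses alpha_k adaptively so that J decreases)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = -grad_J(x)                  # direction of steepest descent
        if np.linalg.norm(s) < tol:     # s_k = 0: stop at a stationary point
            break
        x = x + alpha * s
    return x

# Elliptic paraboloid F(x1, x2) = (x1 - 1)^2 + (x2 - 1)^2 + 2 from Example 20
grad_F = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] - 1)])
xmin = gradient_method(grad_F, [-1.9, 0.5])
```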

21 A geometric view on the gradient...

The gradient at x_k is orthogonal to the tangential plane of the graph at x_k, and the projection of the gradient at x_k onto the argument space is orthogonal to the level set of the function at x_k. This leads to the gradient method's typical zig-zag pattern of the iterates in the argument space.

22 ... explains the typical zig-zag pattern of the gradient method

(Figure: application of the gradient method with starting point x_0 = (−1.9, 0.5)^T.)

23 The gradient method can be generalized to a descent method

In general, a sufficiently good descent direction is enough to eventually reach a minimum, e.g., a direction s_k ∈ R^n \ {0} such that J(x_k) > J(x_k + α_k s_k), k ∈ N_0. This generalizes the gradient method to a descent or gradient-like method. For the direction of descent in such methods, the following uniform angle condition is demanded in order to ensure a sufficiently large descent: there is a ν such that

0 < ν ≤ ν_k := cos ∠(−∇J(x_k), s_k) = −s_k^T ∇J(x_k) / (‖s_k‖_2 ‖∇J(x_k)‖_2).

The global postulate ν ≤ ν_k for all k ensures during the descent that the angle between s_k and −∇J(x_k) stays uniformly less than 90 degrees, and thus that at each step s_k is a sufficiently good direction of descent.

24 Visualization of the uniform angle condition

(Figure.) Visualization of the uniform angle condition: the green direction of descent s_k must lie in the red cone.

25 Step-Size Conditions for the Gradient/ Steepest Descent Method

26 How do we control the (step-)size of the direction of descent?

Problem 2 ("Can it happen that the ball has so much energy that it simply runs through the basin without stopping, or so little energy that it approaches the basin too slowly?") leads to the issue of controlling the step-size, and thus the parameter α, in the gradient or gradient-like methods. One way is to obtain an optimal α := α_opt as the solution of the minimization problem

J(x + α_opt s) = min_{α > 0} J(x + αs).

This is an effective way, but it may not be efficient, as the minimization may not terminate in finitely many steps unless we are dealing with quadratic functions.

27 The Armijo-Goldstein step-size condition

The Armijo-Goldstein Condition: For the whole gradient or gradient-like method, let a number σ ∈ (0, 1) and a strictly decreasing sequence {β_l}_{l ∈ N_0} be given such that β_l ∈ (0, 1), l = 0, 1, 2, ... (β_l > β_{l+1}). For x ∈ R^n and s ∈ R^n \ {0}, ‖s‖_2 = 1, the largest number α among the β_l has to be determined such that

ϕ(α) := J(x + αs) ≤ J(x) + σ α ∇J(x)^T s = ϕ(0) + σ α ϕ'(0).

Thus, for determining the Armijo-Goldstein step-size α, we test this inequality successively for β_0, β_1, β_2, ..., until we reach an index l where it holds for the first time.
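A minimal sketch of the Armijo-Goldstein test with β_l = β^l (a common choice; the values of σ and β and all names here are my own illustrative choices):

```python
import numpy as np

def armijo_step_size(J, grad, x, s, sigma=0.1, beta=0.5, max_trials=50):
    """Armijo-Goldstein step-size: test alpha = beta^l for l = 0, 1, 2, ...
    and return the first (i.e. largest) alpha with
    phi(alpha) <= phi(0) + sigma * alpha * phi'(0)."""
    phi0 = J(x)
    dphi0 = grad(x) @ s       # phi'(0) = grad J(x)^T s, negative for descent
    alpha = 1.0               # beta^0
    for _ in range(max_trials):
        if J(x + alpha * s) <= phi0 + sigma * alpha * dphi0:
            return alpha
        alpha *= beta         # move on to the next beta^l
    return alpha

# On J(x) = ||x||^2 with x = (1, 1) and the steepest-descent direction:
J = lambda x: float(x @ x)
grad = lambda x: 2 * x
x = np.array([1.0, 1.0])
s = -grad(x)
alpha = armijo_step_size(J, grad, x, s)
```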

28 Illustration of the Armijo-Goldstein step-size condition

(Figure: the graph of ϕ and the Armijo-Goldstein line ϕ(0) + σαϕ'(0).) We consider the argument set of ϕ, and there only those values (AG) for which the graph Γ_ϕ lies under the Armijo-Goldstein line through (0, ϕ(0)) with slope σϕ'(0). The Armijo-Goldstein step-size α_AG is the largest of our predefined numbers β_l, l = 0, 1, 2, ..., therein. Thus, we are not descending too far.

29 The Wolfe-Powell step-size condition...

Wolfe-Powell Condition: For the whole gradient or gradient-like method, let the numbers σ ∈ (0, 1/2) and ρ ∈ (σ, 1) be given. For x ∈ R^n and s ∈ R^n \ {0}, ‖s‖_2 = 1, such that s^T ∇J(x) < 0, determine a number α > 0 such that both the Armijo-Goldstein inequality

ϕ(α) := J(x + αs) ≤ J(x) + σ α ∇J(x)^T s = ϕ(0) + σ α ϕ'(0)

and the curvature condition

ϕ'(α) = ∇J(x + αs)^T s ≥ ρ ∇J(x)^T s = ρ ϕ'(0)

hold. The restriction σ < 1/2 is motivated by the wish to accept the exact minimizer of a quadratic function ϕ as Wolfe-Powell step-size.
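Both Wolfe-Powell inequalities are cheap to test once the gradient is available; a sketch (with illustrative choices σ = 0.1 < 1/2 and ρ = 0.9 ∈ (σ, 1)):

```python
import numpy as np

def satisfies_wolfe_powell(J, grad, x, s, alpha, sigma=0.1, rho=0.9):
    """Check both Wolfe-Powell inequalities for a trial step-size alpha."""
    phi0, dphi0 = J(x), grad(x) @ s
    armijo = J(x + alpha * s) <= phi0 + sigma * alpha * dphi0
    curvature = grad(x + alpha * s) @ s >= rho * dphi0
    return bool(armijo and curvature)

# On J(x) = x^2 with x = 2 and the steepest-descent direction s = -4:
J = lambda x: float(x @ x)
grad = lambda x: 2 * x
x = np.array([2.0])
s = -grad(x)
ok_exact = satisfies_wolfe_powell(J, grad, x, s, alpha=0.5)   # exact minimizer
too_small = satisfies_wolfe_powell(J, grad, x, s, alpha=0.01) # violates curvature
```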

30 ... and its interpretation

We choose the Wolfe-Powell step-size from the domain WP ⊆ R for which, on the one hand, the graph of ϕ lies under the Armijo-Goldstein line (this prohibits going too far in direction s), and, on the other hand, the graph of ϕ on WP does not increase or decrease as steeply as it does in a neighborhood of α = 0, due to the damping factor ρ in the second Wolfe-Powell inequality (this ensures sufficiently large progress). (Figure.)

31 A word of caution

The gradient and gradient-like methods can take a long time to actually reach the minimum if the function is rather flat, as in the non-convex Rosenbrock function with its banana-shaped valley: the global minimum of the Rosenbrock function f(x, y) = (1 − x)^2 + 100 (y − x^2)^2 is at the point (1, 1). (Figure.)

32 Newton's Method Revisited and the Quasi-Newton Method

33 Newton's method revisited

As we have seen, the gradient method uses a linear approximation of the function J and reaches the vicinity of a stationary point x* with ∇J(x*) = 0. This method can be improved by considering the quadratic approximation of J at a point x_0,

q(x) := J(x_0) + (x − x_0)^T ∇J(x_0) + (1/2) (x − x_0)^T H_J(x_0) (x − x_0).

Since a point with a vanishing gradient and a positive definite Hessian matrix is a minimum of the discussed function, we set

0 = ∇q(x) = ∇J(x_0) + H_J(x_0)(x − x_0),

and thus the Newton identity

x = x_0 − (H_J(x_0))^{−1} ∇J(x_0) = x_0 + s

follows, where s is the solution of the linear system H_J(x_0) s = −∇J(x_0).
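The Newton identity translates directly into code: rather than inverting H_J, one solves the linear system (the names here are illustrative). For a quadratic J(x) = ½ x^T A x − b^T x, whose gradient is Ax − b and whose Hessian is A, a single step lands exactly on the minimizer A^{−1}b:

```python
import numpy as np

def newton_step(grad_J, hess_J, x0):
    """One Newton step: solve H_J(x0) s = -grad J(x0) and return x0 + s."""
    s = np.linalg.solve(hess_J(x0), -grad_J(x0))
    return x0 + s

# Quadratic test problem with positive definite A:
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad_J = lambda x: A @ x - b
hess_J = lambda x: A
x1 = newton_step(grad_J, hess_J, np.array([5.0, -7.0]))
```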

34 The local Newton's method for unconstrained minimization

Algorithm (Local Newton's Method)
Starting point: Choose x_0 ∈ R^n and compute s_0 := −∇J(x_0). Set k = 0.
Iteration: If s_k = 0: STOP, the optimal solution is x_k. Else, determine s_{k+1} ∈ R^n as the solution of the linear Newton system H_J(x_k) s_{k+1} = s_k.
Update data: Set x_{k+1} := x_k + s_{k+1} and compute s_{k+1} := −∇J(x_{k+1}). Set k := k + 1 and continue with the iteration step.

35 Application of the local Newton's method to the Rosenbrock function

(Figure.)

36 Some remarks on the local Newton's method

For a quadratic function, the local Newton's method terminates in one step. The local Newton's method requires a positive definite Hessian matrix (which can only be ensured close to the actual minimum x*) and, as we already know from last time, starting points very close to the actual minimum. Combining the local Newton's method with the gradient method, for instance, leads to a globalization of the local Newton's method.

(Figure: quadratic best approximation of the exponential function at the origin; the local character of this approximation is clearly visible.)

37 The key ideas of the Newton-Gradient method

We combine the gradient method and the local Newton's method such that, in case the Newton iteration cannot be performed (i.e., if H_J(x) is not positive definite), a sequence of descending points is generated via the gradient method until the Newton iteration can be performed again. Of course, the linear Newton system has a solution for all regular Hessian matrices; thus it has to be checked whether we really obtain a direction of descent. In particular, the Newton method's direction (as well as the gradient method's direction) is taken as a new direction of descent, the step-size of which is determined by the Armijo-Goldstein condition.

38 The Newton-Gradient method (1/2)

Algorithm (Newton-Gradient Method)
Starting point: Choose x_0 ∈ R^n, ρ > 0, p > 2, and compute s_0 := −∇J(x_0). Set k = 0.
Iteration: If s_k = 0: STOP, the optimal solution is x_k. Else, determine s_{k+1} ∈ R^n as the solution of the linear Newton system H_J(x_k) s_{k+1} = s_k. If this system has no solution, or if the condition for a suitably good direction of descent

−s_{k+1}^T ∇J(x_k) ≥ ρ ‖s_{k+1}‖^p

is violated, then set s_{k+1} := −∇J(x_k).

39 The Newton-Gradient method (2/2)

Algorithm (Newton-Gradient Method) [cont.]
New step-size: Determine the new step-size α_{k+1} with the Armijo-Goldstein method.
Update data: Set x_{k+1} := x_k + α_{k+1} s_{k+1} and compute s_{k+1} := −∇J(x_{k+1}). Set k := k + 1 and continue with the iteration step.

Remark: It would be sufficient to use difference quotients of the gradients to approximate the Hessian matrix.

40 The key idea of Quasi-Newton methods (1/3)

Instead of going for the exact inverse Hessian matrix, the so-called Quasi-Newton methods use an approximation of it, and thus avoid in each step the computationally expensive exact set-up of the Hessian matrix and the solution of a linear system to get the direction of descent. Due to Taylor's theorem we have

H_J(x_k)(x_{k+1} − x_k) ≈ ∇J(x_{k+1}) − ∇J(x_k).

Thus, any matrix A_{k+1} that satisfies the Quasi-Newton condition

A_{k+1}(x_{k+1} − x_k) = ∇J(x_{k+1}) − ∇J(x_k)

can be considered an approximation of the Hessian matrix H_J(x_k).

41 The key idea of Quasi-Newton methods (2/3)

With the Quasi-Newton condition A_{k+1}(x_{k+1} − x_k) = ∇J(x_{k+1}) − ∇J(x_k), we have that any matrix B_{k+1} that satisfies the inverse Quasi-Newton condition

B_{k+1} (∇J(x_{k+1}) − ∇J(x_k)) = x_{k+1} − x_k, with g_k := ∇J(x_{k+1}) − ∇J(x_k) and y_k := x_{k+1} − x_k,

can be considered an approximation of the inverse of the Hessian matrix. In the Quasi-Newton algorithm this approximation B_{k+1} is updated in each step with the help of specific update formulas (no exact computation of the Hessian or the inverse Hessian is performed!), such that directions of descent are gained just by updating.

42 The key idea of Quasi-Newton methods (3/3)

Note that the direction of steepest descent depends on the norm. Just consider a convex function with a unique minimum and then overlay it with the unit cells of different norms: in each norm, another steepest-descent path will guide you to the minimum. Using Newton methods, it suggests itself to choose a norm that is related to the Hessian H or its inverse, for instance

‖x‖_H := ‖H^{1/2} x‖_2 = ⟨H^{1/2} x, H^{1/2} x⟩^{1/2},

where ‖·‖_2 denotes the Euclidean norm in R^n and x ∈ R^n. This allows us to view the Newton direction as the direction of steepest descent with respect to the norm ‖·‖_H.

43 The BFGS-method

One of the most used update schemes was found around 1970, nearly at the same time, by Broyden, Fletcher, Goldfarb and Shanno. Their BFGS-update formula reads as

B_{k+1} = B_k + [(y_k − B_k g_k) y_k^T + y_k (y_k − B_k g_k)^T] / (g_k^T y_k) − [(y_k − B_k g_k)^T g_k / (g_k^T y_k)^2] · y_k y_k^T.

In particular, let y_k, g_k ∈ R^n be such that y_k^T g_k > 0, and let B_k ∈ R^{n×n} be symmetric and positive definite. Then one can show that the BFGS-update matrix B_{k+1} ∈ R^{n×n} is again symmetric and positive definite. The same holds for the following globalized BFGS-minimization algorithm.
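The update formula, written in the lecture's notation (y_k the step, g_k the gradient difference), can be checked to reproduce the inverse Quasi-Newton condition B_{k+1} g_k = y_k; the numbers below are arbitrary test data:

```python
import numpy as np

def bfgs_update(B, y, g):
    """BFGS update of the inverse-Hessian approximation B, in the lecture's
    notation: y = x_{k+1} - x_k, g = grad J(x_{k+1}) - grad J(x_k)."""
    w = y - B @ g                       # y_k - B_k g_k
    gy = g @ y                          # g_k^T y_k, must be > 0
    return (B
            + (np.outer(w, y) + np.outer(y, w)) / gy
            - (w @ g) * np.outer(y, y) / gy**2)

# Arbitrary data with g^T y > 0 and B symmetric positive definite:
B = np.eye(2)
y = np.array([0.5, -0.2])
g = np.array([1.0, 0.3])
B_new = bfgs_update(B, y, g)
```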

44 The globalized BFGS method (1/2)

Algorithm (Globalized BFGS Method)
Starting point: Choose x_0 ∈ R^n and B_0 ∈ R^{n×n} symmetric and positive definite. Set k = 0.
Iteration (new Quasi-Newton direction): If ∇J(x_k) = 0: STOP, the optimal solution is x_k. Else, compute s_k = −B_k ∇J(x_k).
New step-size: Determine the new step-size α_k with the Wolfe-Powell method.
Update data: Set x_{k+1} := x_k + α_k s_k, and compute y_k := x_{k+1} − x_k and g_k := ∇J(x_{k+1}) − ∇J(x_k). Compute B_{k+1} via the BFGS-update formula. Set k := k + 1 and continue with the iteration step.

45 Some remarks on Quasi-Newton methods and the BFGS-method

In each step of a Quasi-Newton method, the approximation of the Hessian and its inverse changes. Thus, the Quasi-Newton direction is in each step the steepest descent with respect to a changed norm. This leads to the designation of Quasi-Newton methods as variable metric methods. As an important convergence result, let us finally note that the BFGS-method converges towards a minimum x*, and that its speed of convergence in a suitable neighborhood of x* is even super-linear.

46 The Trust-Region Method

47 The key idea of the trust-region method (1/2)

The idea of all optimization algorithms discussed so far was to determine the direction of descent s via the unconstrained optimization problem min_s q_k(s), where

q_k(s) := J(x_k) + ∇J(x_k)^T s + (1/2) s^T H_k s

is the quadratic model of the function J : R^n → R at the point x_k, and H_k is a suitably good approximation of the Hessian matrix at x_k. As discussed several times, the quadratic approximation is not that well suited for global optimization: although we consider the quadratic approximation q_k a good model only locally around x_k, we still take its global minimum as the new direction of descent for our original non-linear problem.

48 The key idea of the trust-region method (2/2)

This is where the trust-region method provides a better ansatz: instead of finding the global minimum of q_k, we determine a minimum in a certain region of trustworthiness (trust region), which allows for the local character of the quadratic model. This leads to the constrained sub-problem

min q_k(s) subject to ‖s‖_2 ≤ Δ_k,

where Δ_k denotes the radius of the trust region. This sub-problem can be solved, for instance, with Newton's method. In particular, due to its local character, the step-size determination is omitted in the trust-region method.

49 Illustration of the trust-region sub-problem

(Figure: level sets of the quadratic model, the Newton step, the trust-region radius around x_k, the new iterate x_{k+1}, and the direction of the negative gradient.)

50 Discussion of the trust-region radius

It is obvious that the choice of the trust-region radius Δ_k is the essential part of this method. Δ_k is adjusted by comparing the decrease of the actual objective function J with that of its quadratic approximation q_k. This comparison is carried out with the help of the following quotient:

r_k := (J(x_k) − J(x_k + s)) / (J(x_k) − q_k(s)).

Depending on r_k, the trust-region radius is enlarged or reduced (see the algorithm for details).

51 A trust-region Newton algorithm (1/2)

Algorithm (Trust-Region Newton Method)
Starting point: Choose x_0 ∈ R^n, Δ_0 > 0, Δ_min > 0, 0 < ρ_1 < ρ_2 < 1 and 0 < σ_1 < 1 < σ_2. Set k = 0.
Iteration (trust-region sub-problem): If ∇J(x_k) = 0: STOP, the optimal solution is x_k. Else, determine s_k ∈ R^n as the solution of the trust-region sub-problem (e.g., as a penalized unconstrained optimization problem using Newton's method), and compute the trust-region quotient r_k.

52 A trust-region Newton algorithm (2/2)

Algorithm (Trust-Region Newton Method) [cont.]
New point of descent: If r_k ≥ ρ_1 (the k-th iteration was successful), set x_{k+1} := x_k + s_k. Else, set x_{k+1} := x_k (do nothing and adjust the trust-region radius).
Update data (new trust-region radius):
- If r_k < ρ_1, set Δ_{k+1} := σ_1 Δ_k (reduce the trust-region radius).
- If r_k ∈ [ρ_1, ρ_2), set Δ_{k+1} := max{Δ_min, Δ_k}.
- If r_k ≥ ρ_2, set Δ_{k+1} := max{Δ_min, σ_2 Δ_k} (enlarge the trust-region radius).
Set k := k + 1 and continue with the iteration step.
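A compact sketch of the whole loop (with a deliberately crude sub-problem solver: the Newton step is simply scaled back to the trust-region boundary when it is too long, where an exact or dogleg solver would normally be used; all parameter values and names are illustrative). On the elliptic paraboloid of Example 20 it reaches the minimizer (1, 1):

```python
import numpy as np

def trust_region_newton(J, grad_J, hess_J, x0, delta0=1.0, delta_min=1e-3,
                        rho1=0.25, rho2=0.75, sigma1=0.5, sigma2=2.0,
                        max_iter=100):
    """Trust-region Newton sketch; the sub-problem is solved crudely by
    clipping the Newton step to the trust-region boundary."""
    x = np.asarray(x0, dtype=float)
    delta = delta0
    for _ in range(max_iter):
        g = grad_J(x)
        if np.linalg.norm(g) < 1e-10:
            break                               # grad J(x_k) = 0: stop
        H = hess_J(x)
        s = np.linalg.solve(H, -g)              # Newton step
        if np.linalg.norm(s) > delta:
            s *= delta / np.linalg.norm(s)      # clip to the trust region
        predicted = -(g @ s + 0.5 * s @ H @ s)  # J(x_k) - q_k(s)
        r = (J(x) - J(x + s)) / predicted       # trust-region quotient r_k
        if r >= rho1:
            x = x + s                           # successful iteration
        if r < rho1:
            delta = sigma1 * delta              # reduce the radius
        elif r >= rho2:
            delta = max(delta_min, sigma2 * delta)  # enlarge the radius
        else:
            delta = max(delta_min, delta)
    return x

# Elliptic paraboloid from Example 20, minimum at (1, 1):
J = lambda x: (x[0] - 1)**2 + (x[1] - 1)**2 + 2
grad_J = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] - 1)])
hess_J = lambda x: 2 * np.eye(2)
xmin = trust_region_newton(J, grad_J, hess_J, [-1.9, 0.5])
```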

53 Solving constrained minimization problems

The trust-region method requires us to solve a constrained minimization problem subject to the trust region. There are two rather easy ways to transform a constrained minimization problem into an unconstrained one, and thus to be able to apply the methods discussed:
1. Barrier functions prohibit leaving the admissible region (e.g., by introducing some kind of singularity at the boundaries), and
2. Penalty functions simply penalize leaving the admissible region.
(There are more efficient, but more elaborate and difficult, methods for approaching constrained optimization problems, but they are beyond what we'll cover in this course.)

54 Examples of Penalty & Barrier Functions in 1D

55 Constrained 1D minimization with penalty functions (1/4)

Example: Given the constrained 1D minimization problem

min J(x) subject to x ≥ 1, for J(x) = x^4.

First, we define a twice continuously differentiable penalty function

φ_k(x) := 0 for x ≤ 0, and φ_k(x) := k x^3 for x > 0,

for some k ≥ 1, e.g., k = 100. (Figure: the penalty function φ_k(x) for k = 100, penalizing all values x > 0.)

56 Constrained 1D minimization with penalty functions (2/4)

Example [cont.]: As the constraint x ≥ 1 is equivalent to 1 − x ≤ 0, we define the modified, penalized objective function

J_k(x) := J(x) + φ_k(1 − x) = x^4 + φ_k(1 − x),

which is identical to J for x ≥ 1 but rises sharply for x < 1. The additional term φ_k(1 − x) penalizes an optimization algorithm for choosing x < 1. (Figure: graph of the original objective function (solid blue) and of the modified, penalized objective function (dotted red).)
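The whole construction fits in a few lines of code (a sketch: the plain fixed-step gradient descent stands in for whichever unconstrained solver one prefers, and all names are my own). The computed minimizer of J_100 sits a little below the constrained solution x = 1:

```python
# Penalty approach for: min x^4 subject to x >= 1.
# The C^2 penalty phi_k(t) = k * t^3 for t > 0 (else 0) is applied to 1 - x.

def phi(t, k):
    return k * t**3 if t > 0 else 0.0

def dphi(t, k):
    return 3 * k * t**2 if t > 0 else 0.0

def J_pen(x, k):
    return x**4 + phi(1 - x, k)

def dJ_pen(x, k):
    return 4 * x**3 - dphi(1 - x, k)

def minimize_1d(dJ, x, step=1e-3, iters=200_000):
    """Crude fixed-step gradient descent, standing in for any solver."""
    for _ in range(iters):
        x -= step * dJ(x)
    return x

x100 = minimize_1d(lambda x: dJ_pen(x, 100.0), 2.0)
```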

57 Constrained 1D minimization with penalty functions (3/4)

Example [cont.]: We can approximately minimize J(x) subject to x ≥ 1 by running an unconstrained optimization algorithm on J_k(x); the penalty term will strongly encourage the unconstrained algorithm to choose the best x ≥ 1. The penalty function φ_k is C^2, so it does not cause any trouble in an optimization algorithm which relies on first or second derivatives. The computed minimum of J_100(x) lies slightly below the constrained minimizer x = 1, since the penalty permits a small constraint violation. (Figure: zoom into the graph of the original objective function (solid blue) and of the modified, penalized objective function (dotted red).)

58 Constrained 1D minimization with penalty functions (4/4)

Example [cont.]: Increasing the penalty parameter k enforces the constraints more rigorously, while using the previous final iterate as an initial guess speeds up convergence of the unconstrained optimization algorithm (as we expect the minimum for a larger value of k to be near the minimum for the previous value of k). (Figure: zoom into the graph of the original objective function J(x) (solid blue), of the modified, penalized objective function J_100(x) (dotted red), and of the penalized objective for a larger penalty parameter (dotted black).)

59 Pros & cons of penalty functions

The penalty function approach is easily generalized to higher dimensions. It is a hands-off method for converting constrained problems of any type into unconstrained problems. We don't have to worry about finding an initial feasible point (sometimes a problem in itself). Many constraints in the real world are soft, in the sense that they need not be satisfied precisely. The penalty function approach is well-suited to this type of problem.

The drawback of penalty function methods is that the solution of the unconstrained, penalized problem will not be an exact solution of the original problem (except in the limit as described above). In some cases penalty methods can't be applied, because the objective function is actually undefined outside the feasible set. Also, as we increase the penalty parameters to enforce the constraints more strictly, the unconstrained formulation becomes very ill-conditioned, with large gradients and abrupt function changes.

60 The key idea of the barrier function method (1/2)

Barrier function methods are closely related to penalty function methods, and in fact might as well be considered a type of penalty function method. These methods are generally applicable only to inequality-constrained optimization problems. Barrier methods have the advantage that they always maintain feasible iterates, unlike the penalty methods above. The most common is the log-barrier method. Suppose we have an objective function J(x) on R^n with inequality constraints g_i(x) ≤ 0 for i = 1, 2, ..., m. We transform this into a modified, penalized objective function

J_b(x) = J(x) − Σ_{i=1}^m r_i ln(−g_i(x)), with all r_i > 0.

J_b(x) is undefined if any g_i(x) ≥ 0, so we can only evaluate J_b(x) in the interior of the feasible region. However, even inside the feasible region the penalty term is non-zero (but it becomes an anti-penalty if −g_i(x) > 1).

61 The key idea of the barrier function method (2/2)

In general, a barrier method works in a similar way to the penalty methods. We start with some choice for the r_i and with an initial feasible point x_0 (which may be hard to find), and minimize

J_b(x) = J(x) − Σ_{i=1}^m r_i ln(−g_i(x)), with all r_i > 0,

by applying an unconstrained optimization algorithm. The terminal point x_k must be a feasible point, because the log terms in the definition of J_b(x) form a barrier of infinite height which prevents the optimization routine from leaving the interior of the feasible region. Next, we decrease the values of the r_i and re-optimize, using the final iterate x_k as an initial guess for the new problem. We continue this until an acceptable minimum is found.
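In code, the log-barrier scheme looks like this (a sketch for the 1D example min x^4 subject to x ≥ 1, i.e. g(x) = 1 − x ≤ 0; the crude safeguarded gradient step and all names are illustrative choices):

```python
# Log barrier for: min x^4 subject to x >= 1, so J_b(x) = x^4 - r * ln(x - 1).
# Iterates stay strictly feasible (x > 1) throughout.

def dJ_b(x, r):
    return 4 * x**3 - r / (x - 1)

def minimize_barrier(r, x, step=1e-4, iters=100_000):
    """Crude safeguarded gradient descent on the barrier objective."""
    for _ in range(iters):
        x_new = x - step * dJ_b(x, r)
        if x_new <= 1.0:              # never leave the feasible region
            x_new = (x + 1.0) / 2
        x = x_new
    return x

# Decrease r, warm-starting each solve from the previous solution:
x = 1.5
for r in [2.0, 0.5, 0.1, 0.01]:
    x = minimize_barrier(r, x)
```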

62 An example for the application of barrier functions

Example: Given the constrained 1D minimization problem

min J(x) subject to x ≥ 1, for J(x) = x^4,

a modified objective function may look like

J_b(x) = x^4 − 2 ln(x − 1)

for r_1 = 2. (Figure: graph of the original objective function J(x) (solid blue) and of the modified objective function J_b(x) (dotted red).)

63 Simulated Annealing

64 Finding the global minimum in the presence of many local ones

65 The method of simulated annealing (1/2)

Simulated annealing exploits an analogy between the way in which a metal cools and freezes into a minimum-energy crystalline structure (the annealing process) and the search for a minimum in a more general system. It has been proposed for, and found effective in, the minimization of difficult functions, especially those that may have many purely local minimum points. It involves no derivatives or line searches; indeed, it has found great success in minimizing discrete functions, such as arise in the traveling salesman problem. Suppose we are given a real-valued function of n variables, J : R^n → R, such that we are able to compute the values J(x) for any x ∈ R^n. It is desired to locate a global minimum point of J, which is a point x* such that J(x*) ≤ J(x) for all x ∈ R^n. In other words, J(x*) is equal to inf_{x ∈ R^n} J(x).

66 The method of simulated annealing (2/2)

The simulated annealing algorithm is based upon that of Metropolis et al., which was originally proposed as a means of finding the equilibrium configuration of a collection of atoms at a given temperature. Simulated annealing's major advantage over other methods is its ability to avoid becoming trapped at local minima. The algorithm employs a random search which not only accepts changes that decrease the objective function, but also some changes that increase it. The latter are accepted with a certain probability depending on a control parameter which, by analogy with the original application, is known as the system temperature, irrespective of the objective function involved.
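A compact simulated-annealing sketch with the Metropolis acceptance rule (the cooling schedule, step width, and test function are my own illustrative choices). Started in the basin of the local minimum near x ≈ 1.1, the temperature lets it hop the barrier and find the global minimum near x ≈ −1.3:

```python
import math
import random

def simulated_annealing(J, x0, T0=2.0, cooling=0.999, step=0.5,
                        iters=20_000, seed=0):
    """Simulated annealing with Metropolis acceptance: downhill moves are
    always accepted, uphill moves with probability exp(-dJ / T)."""
    rng = random.Random(seed)
    x, T = x0, T0
    best_x, best_val = x0, J(x0)
    for _ in range(iters):
        x_new = x + rng.gauss(0.0, step)       # random perturbation
        dJ = J(x_new) - J(x)
        if dJ <= 0 or rng.random() < math.exp(-dJ / T):
            x = x_new                          # Metropolis acceptance
        if J(x) < best_val:
            best_x, best_val = x, J(x)
        T *= cooling                           # cool the system down
    return best_x, best_val

# 1D test function with a local minimum near x = 1.1 and the global one
# near x = -1.3; start in the wrong basin:
J = lambda x: x**4 - 3 * x**2 + x
best_x, best_val = simulated_annealing(J, x0=2.0)
```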

67 Illustration of the method of simulated annealing (1/ 9) [Figure: graph of J(x) with three local minima and the global minimum, separated by a high "energy" barrier; arrows labeled "perturb" indicate random jumps between the basins.] Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 67 / 83

68 Illustration of the method of simulated annealing (2/ 9) [Figure: graph of J(x).] Situation: We seem to be trapped in a local minimum; how can we escape? Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 68 / 83

69 Illustration of the method of simulated annealing (3/ 9) [Figure: graph of J(x).] Situation: We seem to be trapped in a local minimum; how can we escape? Solution: Increase the energy or temperature so that the "particle" can deviate from the local minimum. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 69 / 83

70 Illustration of the method of simulated annealing (4/ 9) [Figure: graph of J(x).] Situation: We seem to be trapped in a local minimum; how can we escape? Solution: Increase the energy or temperature so that the "particle" can deviate from the local minimum. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 70 / 83

71 Illustration of the method of simulated annealing (5/ 9) [Figure: graph of J(x).] Situation: We seem to be trapped in a local minimum; how can we escape? Solution: Increase the energy or temperature so that the "particle" can deviate from the local minimum. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 71 / 83

72 Illustration of the method of simulated annealing (6/ 9) [Figure: graph of J(x).] Situation: We seem to be trapped in a local minimum; how can we escape? Solution: Increase the energy or temperature so that the "particle" can deviate from the local minimum. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 72 / 83

73 Illustration of the method of simulated annealing (7/ 9) [Figure: graph of J(x).] Situation: We seem to be trapped in a local minimum; how can we escape? Solution: Increase the energy or temperature so that the "particle" can deviate from the local minimum, and reduce it again... Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 73 / 83

74 Illustration of the method of simulated annealing (8/ 9) [Figure: graph of J(x).] Situation: We seem to be trapped in a local minimum; how can we escape? Solution: Increase the energy or temperature so that the "particle" can deviate from the local minimum, and reduce it again... Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 74 / 83

75 Illustration of the method of simulated annealing (9/ 9) [Figure: graph of J(x).] Situation: We seem to be trapped in a local minimum; how can we escape? Solution: Increase the energy or temperature so that the "particle" can deviate from the local minimum, and reduce it again... Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 75 / 83

76 Mathematical formulation of the simulated annealing algorithm (1/ 2) The simulated annealing algorithm generates a sequence of points x_1, x_2, ..., and one hopes that the following convergence can be established: min_{j ≤ k} J(x_j) → J(x*) for k → ∞, where J(x*) is equal to inf_{x ∈ R^n} J(x). In describing the computation that leads to x_{k+1}, assuming that x_k has been computed, we begin by generating a modest number of random points u_1, u_2, ..., u_m in a large neighborhood of x_k. For each of these points, the value J(u_i) (i = 1, 2, ..., m) must be computed, and the next point x_{k+1} in our sequence is actually one of the points u_1, u_2, ..., u_m. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 76 / 83

77 Mathematical formulation of the simulated annealing algorithm (2/ 2) The choice of the next point x_{k+1} in our sequence is made as follows: Select an index j such that J(u_j) = min{J(u_1), J(u_2), ..., J(u_m)}. If J(u_j) < J(x_k), set x_{k+1} := u_j. Else, for each i = 1, 2, ..., m assign a probability p_i := exp(α(J(x_k) − J(u_i))) / Σ_{l=1}^{m} exp(α(J(x_k) − J(u_l))) ∈ [0, 1] to each u_i, where α > 0 is a parameter chosen upfront. Finally, a random choice is made among the points u_1, u_2, ..., u_m based on the probabilities p_1, p_2, ..., p_m. The randomly chosen u_i then becomes x_{k+1}. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 77 / 83
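The selection rule above translates directly into code. In the sketch below (1D, stdlib Python) the function name, the uniform neighborhood sampling, and the default parameters are illustrative assumptions — the slides leave the sampling distribution for the trial points open.

```python
import math
import random

def sa_step(J, x_k, m=20, radius=1.0, alpha=1.0, rng=random):
    """One step of the scheme above: sample m trial points u_1..u_m around
    x_k; if the best trial strictly improves on J(x_k), take it (greedy);
    otherwise choose among the trials at random with probabilities
    p_i proportional to exp(alpha * (J(x_k) - J(u_i)))."""
    u = [x_k + rng.uniform(-radius, radius) for _ in range(m)]
    values = [J(ui) for ui in u]
    j = min(range(m), key=values.__getitem__)
    if values[j] < J(x_k):
        return u[j]                      # strict improvement: take best trial
    weights = [math.exp(alpha * (J(x_k) - v)) for v in values]
    total = sum(weights)
    probs = [w / total for w in weights] # the p_i of the slide's formula
    return rng.choices(u, weights=probs, k=1)[0]
```

Note how alpha plays the role of an inverse temperature: a small alpha makes the probabilities p_i nearly uniform (hot, exploratory), while a large alpha concentrates them on the least-bad trial point (cold, greedy).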

78 Another illustration of simulated annealing (1/ 4) [Figure: graph of J(x) with the starting point(s) marked.] Start with a given point or distribution of points. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 78 / 83

79 Another illustration of simulated annealing (2/ 4) [Figure: graph of J(x) with the ranges where the test points are randomly set marked around the starting points.] Generate a modest number of random test points in an interval around the starting point(s). The length of this interval can be interpreted as the energy or temperature we used in our first analogy. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 79 / 83

80 Another illustration of simulated annealing (3/ 4) [Figure: graph of J(x) with the test points and their function values marked.] Evaluate the function at these test points. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 80 / 83

81 Another illustration of simulated annealing (4/ 4) [Figure: graph of J(x); labels "random choice" and "unique choice" mark how the new point is selected.] Finally, decide upon the new starting point(s). To avoid too much diffusion, one adjusts the energy/temperature in each step and thus controls how far away from the starting point(s) the next test points lie. Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 81 / 83
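Putting the pieces of both illustrations together, a complete toy annealing loop might look as follows. The cooling schedule — shrink the sampling window and raise the inverse temperature α by a fixed factor each iteration — is one common choice among many, and all names and parameter values here are illustrative assumptions.

```python
import math
import random

def simulated_annealing(J, x0, iters=200, m=20, radius=2.0,
                        alpha=1.0, shrink=0.97, seed=42):
    """Toy 1D simulated annealing: in each iteration, sample m trial points
    in a window around the current point, move to the best one if it
    improves, otherwise jump randomly with Boltzmann-like weights. The
    window shrinks over time ("cooling"), so late iterations refine the
    current basin instead of exploring."""
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(iters):
        u = [x + rng.uniform(-radius, radius) for _ in range(m)]
        vals = [J(ui) for ui in u]
        j = min(range(m), key=vals.__getitem__)
        if vals[j] < J(x):
            x = u[j]
        else:
            w = [math.exp(alpha * (J(x) - v)) for v in vals]
            x = rng.choices(u, weights=w, k=1)[0]
        if J(x) < J(best):
            best = x                 # remember the best point ever visited
        radius *= shrink             # cool down: smaller exploration window
        alpha /= shrink              # ... and uphill moves become less likely
    return best

# A multimodal test function with its global minimum at x = 0 and
# several local minima nearby; a pure descent method started at x0 = 6
# would typically get stuck in one of the outer basins.
J = lambda x: x * x + 10.0 * (1.0 - math.cos(2.0 * x))
print(simulated_annealing(J, x0=6.0))
```

Returning the best point ever visited (rather than the final iterate) is a standard safeguard: the random uphill jumps that make the method robust can also carry the current iterate away from a good point it has already found.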

82 Summary & Outlook

83 Major concepts covered: unconstrained optimization techniques "Although this may seem a paradox, all exact science is dominated by the idea of approximation." (Bertrand Russell) The gradient/steepest descent method and its step-size conditions (Armijo-Goldstein & Wolfe-Powell) Newton's method revisited, the Gradient-Newton method and the Quasi-Newton method incl. the BFGS method The Trust-Region method Penalty & barrier functions (in 1D) Simulated Annealing Prof. Dr. Florian Rupp GUtech 2016: Numerical Methods 83 / 83


More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way. AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

AM 205: lecture 18. Last time: optimization methods Today: conditions for optimality

AM 205: lecture 18. Last time: optimization methods Today: conditions for optimality AM 205: lecture 18 Last time: optimization methods Today: conditions for optimality Existence of Global Minimum For example: f (x, y) = x 2 + y 2 is coercive on R 2 (global min. at (0, 0)) f (x) = x 3

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

More information

Multivariate Newton Minimanization

Multivariate Newton Minimanization Multivariate Newton Minimanization Optymalizacja syntezy biosurfaktantu Rhamnolipid Rhamnolipids are naturally occuring glycolipid produced commercially by the Pseudomonas aeruginosa species of bacteria.

More information

Bregman Divergence and Mirror Descent

Bregman Divergence and Mirror Descent Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file

More information

Deep Learning. Authors: I. Goodfellow, Y. Bengio, A. Courville. Chapter 4: Numerical Computation. Lecture slides edited by C. Yim. C.

Deep Learning. Authors: I. Goodfellow, Y. Bengio, A. Courville. Chapter 4: Numerical Computation. Lecture slides edited by C. Yim. C. Chapter 4: Numerical Computation Deep Learning Authors: I. Goodfellow, Y. Bengio, A. Courville Lecture slides edited by 1 Chapter 4: Numerical Computation 4.1 Overflow and Underflow 4.2 Poor Conditioning

More information

Tangent spaces, normals and extrema

Tangent spaces, normals and extrema Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent

More information

Computer Problems for Taylor Series and Series Convergence

Computer Problems for Taylor Series and Series Convergence Computer Problems for Taylor Series and Series Convergence The two problems below are a set; the first should be done without a computer and the second is a computer-based follow up. 1. The drawing below

More information

CLASS NOTES Computational Methods for Engineering Applications I Spring 2015

CLASS NOTES Computational Methods for Engineering Applications I Spring 2015 CLASS NOTES Computational Methods for Engineering Applications I Spring 2015 Petros Koumoutsakos Gerardo Tauriello (Last update: July 2, 2015) IMPORTANT DISCLAIMERS 1. REFERENCES: Much of the material

More information

Optimization for neural networks

Optimization for neural networks 0 - : Optimization for neural networks Prof. J.C. Kao, UCLA Optimization for neural networks We previously introduced the principle of gradient descent. Now we will discuss specific modifications we make

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information