Static unconstrained optimization

In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e.

    \min_{x \in X_{ad}} f(x)    (2.1)

with $X_{ad} \subseteq \mathbb{R}^n$ the set of admissible decisions.

2.1 Optimality conditions

In the following, conditions are derived to determine (local) minimizers $x^*$ of $f(x)$ in $X_{ad}$ so that

    f(x) \geq f(x^*)    (2.2)

for $x \in X_{ad}$ or $x \in U_\epsilon \cap X_{ad}$ with $U_\epsilon$ a sufficiently small $\epsilon$-neighborhood of $x^*$ (cf. Definition 1.2). For this, recall the mean value theorem (Theorem 1.3) and assume that $f(x) \in C^1(X_{ad})$ and that the line segment $[x^*, x^* + \delta x]$ lies in $X_{ad}$ for sufficiently small $\delta x$. Then taking into account (1.24) we have

    f(x^* + \delta x) = f(x^*) + (\delta x)^T (\nabla f)(x^* + (1 - r)\delta x)

for some $r \in [0, 1]$. In case of a minimizer $x^*$ the inequality (2.2) with $x = x^* + \delta x$ has to be fulfilled for all $\delta x$ sufficiently small. This yields

    f(x^*) + (\delta x)^T (\nabla f)(x^* + (1 - r)\delta x) \geq f(x^*)

or equivalently $(\delta x)^T (\nabla f)(x^* + (1 - r)\delta x) \geq 0$. It will be shown by contradiction that the latter inequality implies $(\nabla f)(x^*) = 0$. Hence, let first $(\nabla f)(x^*) \neq 0$ and define $\delta x = -(\nabla f)(x^*)$. Then

    (\delta x)^T (\nabla f)(x^*) = -\|(\nabla f)(x^*)\|^2 < 0.

Since $\nabla f$ is by assumption continuous in a neighborhood of $x^*$, there exists a scalar $\tau > 0$ such that

    (\delta x)^T (\nabla f)(x^* + t\,\delta x) < 0    (2.3)

for all $t \in [0, \tau]$. Hence, for any $\bar{t} \in (0, \tau]$ the mean value theorem implies

    f(x^* + \bar{t}\,\delta x) = f(x^*) + \bar{t}\,(\delta x)^T (\nabla f)(x^* + (1 - r)\bar{t}\,\delta x)    (2.4)

for some $r \in [0, 1]$. Noting that $r \in [0, 1]$ and hence $t := (1 - r)\bar{t} \in [0, \tau]$, substitution of (2.3) into (2.4) yields

    f(x^* + \bar{t}\,\delta x) < f(x^*)

for all $\bar{t} \in (0, \tau]$. With this, a direction away from $x^*$ is obtained along which $f(x)$ decreases. Thus $x^*$ is not a local minimizer and a contradiction is obtained, which implies the following result.

Theorem 2.1: First order necessary optimality condition
Let $X_{ad} \subseteq \mathbb{R}^n$ be the set of admissible decisions and assume $f(x) \in C^1(X_{ad})$. If $x^* \in X_{ad}$ is a local minimizer, then

    (\nabla f)(x^*) = 0.    (2.5)

Example 2.1. Consider the minimization problem

    \min_{x \in X_{ad}} f(x) = \sin(x_1 x_2) - \tan(x_1)    (2.6)

with $X_{ad} = [-1, 1] \times [-1, 1]$. Evaluation of (2.5) provides

    (\nabla f)(x) = \begin{bmatrix} x_2 \cos(x_1 x_2) - \frac{1}{\cos^2(x_1)} \\ x_1 \cos(x_1 x_2) \end{bmatrix} = 0,

which results in $x^* = [0, 1]^T$. In order to ensure that $x^*$ is a local minimizer it is required to show that $0 = f(x^*) \leq f(x)$ for all $x$ in a neighborhood of $x^*$. Therefore let $x = x^* + [\epsilon, 0]^T$ for small $\epsilon$ and evaluate

    f(x) = f(x^* + [\epsilon, 0]^T) = \sin(\epsilon) - \tan(\epsilon).

Thus, $f(x) < f(x^*)$ for $x_1 \in (0, \epsilon]$ and $f(x) > f(x^*)$ for $x_1 \in [-\epsilon, 0)$ given $x_2 = 1$, which shows that $x^*$ satisfying (2.5) is not a local minimizer.

This example illustrates that the conditions of Theorem 2.1 are only necessary but not sufficient. In addition, (2.5) only implies that $x^*$ is an extremum, subsequently often called stationary point, and is fulfilled for a minimum but also for a maximum and a saddle point (cf. Figure 2.1).

    Figure 2.1: Examples of (a) minimum, (b) maximum and (c) saddle point.

By taking into account higher order terms, assuming that $f(x)$ is at least twice continuously differentiable in $X_{ad}$, the higher order mean value theorem (1.25) can be considered to improve Theorem 2.1.

Theorem 2.2: Second order necessary optimality conditions
Let $X_{ad} \subseteq \mathbb{R}^n$ be the set of admissible decisions and assume $f(x) \in C^2(X_{ad})$. If $x^* \in X_{ad}$ is a local minimizer, then

    (\nabla f)(x^*) = 0 \quad and \quad (\nabla^2 f)(x^*) \geq 0.    (2.7)

The second condition in (2.7) refers to the Hessian of $f(x)$ being positive semidefinite at the point $x^*$. The proof of Theorem 2.2 is left as an exercise to the reader.

Example 2.2. Consider the minimization problem

    \min_{x \in X_{ad}} f(x) = x_1^2 - 4x_1 + 3x_2^2 - 6x_2    (2.8)

with $X_{ad} = \{x \in \mathbb{R}^2 : x_1 \geq 0,\ x_2 \geq 0\}$. Evaluation of (2.7) provides

    (\nabla f)(x) = \begin{bmatrix} 2x_1 - 4 \\ 6x_2 - 6 \end{bmatrix} = 0,

which is satisfied for $x^* = [2, 1]^T$, and

    (\nabla^2 f)(x^*) = \begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix}.

Since the Hessian matrix is positive definite (eigenvalues at $\lambda_1 = 2$ and $\lambda_2 = 6$), the objective function $f(x)$ does fulfill the necessary optimality conditions of Theorem 2.2.

Example 2.3. Consider the minimization problem

    \min_{x \in X_{ad}} f(x) = x_1^2 + x_1 x_2 - 2x_2^2    (2.9)

with $X_{ad} = \mathbb{R}^2$. Evaluation of (2.7) results in

    (\nabla f)(x) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 - 4x_2 \end{bmatrix} = 0,

which yields $x^* = 0$ and hence

    (\nabla^2 f)(x^*) = \begin{bmatrix} 2 & 1 \\ 1 & -4 \end{bmatrix}.

The Hessian matrix is indefinite (one positive and one negative real eigenvalue) so that neither the necessary optimality conditions for a minimum nor for a maximum are fulfilled. It is left to the reader to show that $x^* = 0$ is a saddle point as depicted in Figure 2.1(c). The following result provides sufficient conditions that guarantee that a point $x^*$ interior to $X_{ad}$ is a strict local minimizer [9].

Theorem 2.3: Second order sufficient optimality condition
Let $X_{ad} \subseteq \mathbb{R}^n$ be the set of admissible decisions and let $f(x) \in C^2(X_{ad})$. If $x^* \in X_{ad}$ and the conditions

    (\nabla f)(x^*) = 0 \quad and \quad (\nabla^2 f)(x^*) > 0    (2.10)

are fulfilled, then $x^*$ is a strict local minimizer of $f(x)$.

If the objective function $f(x)$ in (2.1) is a convex function, local and global minimizers can be easily characterized as is summarized below [9].

Theorem 2.4
If $f(x)$ is a convex function on the convex set $X_{ad}$, then any local minimizer $x^*$ is a global minimizer and the set of minima $G = \arg\min\{f(x) : x \in X_{ad}\}$ is convex. If in addition $f(x) \in C^1(X_{ad})$, then any stationary point $x^* \in X_{ad}$ is a global minimizer.

The proof of Theorem 2.4 is left to the reader and can be, e.g., found in [2].

Example 2.4. Consider a minimization problem involving a quadratic form, i.e.

    \min_{x \in X_{ad}} f(x) = \frac{1}{2} x^T P x + q^T x + r

for $X_{ad} = \mathbb{R}^n$ with $P = P^T \in \mathbb{R}^{n \times n}$, $q \in \mathbb{R}^n$ and $r \in \mathbb{R}$. From Example 1.6 the gradient and Hessian matrix follow as

    (\nabla f)(x) = P x + q, \quad (\nabla^2 f)(x) = P.

We note that $f(x)$ is strictly convex if $P$ is positive definite. There is a unique stationary point $x^* = -P^{-1} q$, which, due to the differentiability of $f(x)$, is a global minimizer by Theorem 2.4.

2.2 Numerical minimization algorithms

The necessary optimality conditions require the determination of stationary points $x^*$ as solutions to an in general nonlinear system of $n$ coupled equations given by $(\nabla f)(x^*) = 0$. As a result, an analytical solution can be expected only in special cases, so that numerical techniques are needed to accurately approximate stationary points $x^*$. For this, various algorithms are available, which in principle are based on the computation of a sequence of values $(x_k)_{k \in \mathbb{N}}$ starting at an initial point $x_0$ such that $f(x)$ is decreased in each iteration step, i.e.

    f(x_{k+1}) < f(x_k), \quad k = 0, 1, \ldots    (2.11)

with the desire to achieve convergence of the sequence to the (local) minimizer

    \lim_{k \to \infty} x_k = x^*.    (2.12)

The algorithms are often referred to as iterative descent algorithms.
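The computation in Example 2.4 is easily reproduced numerically. The following minimal Python/NumPy sketch (the data for $P$ and $q$ is illustrative and not taken from the text) determines the stationary point $x^* = -P^{-1} q$ by solving a linear system and checks the first order condition (2.5):

    import numpy as np

    # Quadratic objective f(x) = 0.5 x^T P x + q^T x + r from Example 2.4;
    # P and q are illustrative data with P symmetric and positive definite.
    P = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
    q = np.array([-1.0, 2.0])

    # Unique stationary point x* = -P^{-1} q, a global minimizer by Theorem 2.4.
    x_star = np.linalg.solve(P, -q)

    # First order necessary condition (2.5): the gradient P x* + q must vanish.
    print(x_star, np.linalg.norm(P @ x_star + q))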

Remark 2.1
It should be mentioned that also nonmonotone algorithms exist that do not require the decrease of $f(x)$ in every iteration but only after a certain prescribed number of iterations. Also information of earlier iterates $x_0, x_1, \ldots, x_k$ can be used to determine $x_{k+1}$.

In the following, some preliminaries from numerical analysis are summarized, which are required to properly analyze so called line search and trust region methods. Finally so called direct search strategies are briefly introduced.

2.2.1 Preliminaries

Convergence is the essential question and preliminary in any iterative technique. For a proper definition, the contraction property of a mapping in a suitable complete space has to be taken into account by defining a suitable metric, i.e. a measure of distance from the iterate to the fixed point of the mapping. The reader is referred to [3, ] for further details. Subsequently, only the notion of convergence order is introduced as a measure of convergence speed.

Definition 2.1: Order of convergence
Let $(x_k)_{k \in \mathbb{N}}$ be a sequence converging towards the limit $x^*$. The order of convergence of the sequence $(x_k)_{k \in \mathbb{N}}$ is the supremum of all nonnegative numbers $p$ for which

    \lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^p} = \mu < \infty.    (2.13)

The constant $\mu$ is called the asymptotic error constant.

It is obvious from (2.13) that larger values of $p$ correspond to a higher speed of convergence since the distance of the iterate $x_{k+1}$ to $x^*$ is for large $k$ reduced by the $p$-th power.

Example 2.5. The sequence $(\sqrt{k+1} - \sqrt{k})_{k \in \mathbb{N}}$ converges to $0$ with the order of convergence $p = 1$ since

    \lim_{k \to \infty} \frac{\sqrt{k+2} - \sqrt{k+1}}{\sqrt{k+1} - \sqrt{k}} = \lim_{k \to \infty} \frac{\sqrt{\frac{k+1}{k+2}} + \sqrt{\frac{k}{k+2}}}{\sqrt{\frac{k+1}{k+2}} + 1} = 1.

One typically distinguishes between the two major cases

    p = 1, \mu \in (0, 1): linear convergence
    p = 2, \mu < \infty: quadratic convergence

and

    p = 1, \mu = 0: superlinear convergence
    p = 1, \mu = 1: sublinear convergence.

This in particular illustrates that any algorithm with convergence order $p > 1$ is superlinear.
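The limit (2.13) can also be estimated numerically from consecutive error quotients, e.g. via $p \approx \ln(e_{k+1}/e_k)/\ln(e_k/e_{k-1})$ with $e_k = \|x_k - x^*\|$. The following Python sketch applies this estimate; the test sequence (Newton's iteration for $\sqrt{2}$) is only an illustrative assumption:

    import numpy as np

    # Empirical estimate of the convergence order p in (2.13) from the
    # error quotients of three consecutive iterates.
    def estimate_order(errors):
        e = np.asarray(errors, dtype=float)
        return np.log(e[2:] / e[1:-1]) / np.log(e[1:-1] / e[:-2])

    # Test sequence: Newton's iteration for sqrt(2), converging quadratically.
    x, errors = 2.0, []
    for _ in range(4):
        x = 0.5 * (x + 2.0 / x)
        errors.append(abs(x - np.sqrt(2.0)))
    print(estimate_order(errors))   # the estimates approach p = 2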

Exercise 2.2. Determine the convergence order $p$ and asymptotic error constant $\mu$ of the sequence $(k^{-k})_{k \in \mathbb{N}}$.

Solution 2.2. The sequence converges superlinearly to zero.

When analyzing a sequence of vectors $(x_k)_{k \in \mathbb{N}}$ converging to a limit $x^*$, as is the case in the considered minimization algorithms, the determination of the rate of convergence requires the proper mapping of this sequence into a sequence of scalars. If $f(x)$ is the objective function according to (2.1), then typically the convergence of the sequence $(f(x_k))_{k \in \mathbb{N}}$ to $f(x^*)$ is analyzed. In this context $f(x)$ is also referred to as error function. Alternatively also the norm $\|x_k - x^*\|$ can be considered or another suitable map from $\mathbb{R}^n$ to $\mathbb{R}$. However, the rate of convergence of a vector valued sequence is in general independent of the choice of the error function.

2.2.2 Line search methods

The principle operation of line search methods is illustrated in Figure 2.2. In each iteration of a line search method a search direction $s_k$ is computed and the algorithm decides how far to move into this direction by determining a suitable step length $\alpha_k > 0$, i.e.

    x_{k+1} = x_k + \alpha_k s_k.    (2.14)

Most line search algorithms require $s_k$ to be a descent direction, i.e. one for which $s_k^T (\nabla f)(x_k) < 0$, since this property guarantees that $f(x)$ can be reduced along this direction such that

    f(x_{k+1}) = f(x_k + \alpha_k s_k) < f(x_k).    (2.15)

To illustrate this, the following proposition is proved subsequently.

Proposition 2.1 (Direction of steepest descent). The search direction $s_k = -(\nabla f)(x_k)$ is the direction of steepest descent, i.e. among all directions at $x_k$ it is the one along which $f(x)$ decreases most rapidly.

Proof. Let $f \in C^2(X_{ad})$. Then the mean value theorem (1.25) implies that there exists an $r \in [0, 1]$ such that

    f(x_{k+1}) = f(x_k + \alpha_k s_k) = f(x_k) + \alpha_k s_k^T (\nabla f)(x_k) + \frac{1}{2}\alpha_k^2 s_k^T (\nabla^2 f)(x_k + (1 - r)\alpha_k s_k) s_k.

Herein, $t_k := (1 - r)\alpha_k \in [0, \alpha_k]$ since $r \in [0, 1]$. The rate of change in $f$ is the coefficient of the term linear in $\alpha_k$, i.e. $s_k^T (\nabla f)(x_k)$. Hence, the unit direction $s_k$ of most rapid decrease is the solution to the minimization problem

    \min_{s_k \in \mathbb{R}^n} s_k^T (\nabla f)(x_k) \quad subject\ to \quad \|s_k\| = 1.

Evaluation of the scalar product yields

    s_k^T (\nabla f)(x_k) = \underbrace{\|s_k\|}_{=1} \|(\nabla f)(x_k)\| \cos\theta

with $\theta$ the angle between $s_k$ and $(\nabla f)(x_k)$. The desired minimum is obviously attained for $\cos\theta = -1$ so that

    s_k = -\frac{(\nabla f)(x_k)}{\|(\nabla f)(x_k)\|}

is the (unit) direction of steepest descent starting at $x_k$.

    Figure 2.2: Illustration of a line search method.

This also illustrates that $f(x)$ can be reduced along any direction $s_k$ fulfilling the property that $s_k^T (\nabla f)(x_k) < 0$. Depending on the selection of the search direction $s_k$ different algorithms can be distinguished, which are summarized below. These, moreover, depend on the suitable determination of the second degree of freedom, namely the step length $\alpha_k > 0$. For this, it would be ideal to find the global minimizer of the scalar minimization problem

    \min_{\alpha_k > 0} g(\alpha_k) = f(x_k + \alpha_k s_k)    (2.16)

for fixed $x_k$ and $s_k$. However, this is in general computationally too expensive, so that other techniques have to be taken into account to locally address (2.16). The schematic realization of line search methods is summarized in Algorithm 1 below.

Algorithm 1: Schematic line search method.
  input: x_0 (starting value), \epsilon (stopping criteria)
  initialize: k = 0
  repeat
      Compute search direction s_k
      Find an appropriate step length \alpha_k
      Compute x_{k+1} = x_k + \alpha_k s_k
      Update k = k + 1
  until \|(\nabla f)(x_{k+1})\| \leq \epsilon or \|x_{k+1} - x_k\| \leq \epsilon;
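A minimal Python rendering of Algorithm 1 could look as follows; the search direction and step length rules are passed as exchangeable callables, and the placeholders used here (steepest descent with a constant step) are illustrative simplifications rather than recommended choices:

    import numpy as np

    def line_search_method(f, grad, x0, direction, step_length,
                           eps=1e-8, max_iter=1000):
        # Schematic line search loop following Algorithm 1.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= eps:        # stopping criterion
                break
            s = direction(x, g)                 # compute search direction s_k
            alpha = step_length(f, grad, x, s)  # find an appropriate step length
            x = x + alpha * s                   # x_{k+1} = x_k + alpha_k s_k
        return x

    # Placeholder rules: steepest descent direction and a fixed small step.
    steepest = lambda x, g: -g
    const_step = lambda f, grad, x, s: 1e-2

    x_min = line_search_method(lambda x: x @ x, lambda x: 2 * x,
                               [1.0, -2.0], steepest, const_step)
    print(x_min)   # approaches the minimizer [0, 0]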

Determination of step length

It should be mentioned that simply asking for (2.15), i.e. $f(x_k + \alpha_k s_k) < f(x_k)$, is not enough to achieve convergence to the minimizer $x^*$. As the following example illustrates, sufficient decrease conditions are required to solve (2.16).

Example 2.6. Let $f(x) = (x - 1)^2$ and consider the sequence $(x_k)_{k \in \mathbb{N}}$ with

    x_k = 1 + (-1)^k \sqrt{2/k + 1}.

Then $f(x_k) = 1 + 2/k$ so that $f(x_{k+1}) < f(x_k)$, but as $k \to \infty$ the sequence $f(x_k)$ approaches $1$ since $x_k$ will start alternating between $0$ and $2$. However, the minimum $f(x^*) = 0$ for $x^* = 1$ is not reached.

Subsequently, different conditions and related algorithms are provided, which enable to determine an appropriate step length $\alpha_k$ in the line search method assuming that the starting point $x_k$ of the line search and a search direction (descent direction) $s_k$ are given.

Armijo conditions. The Taylor series of $g(\alpha_k) = f(x_k + \alpha_k s_k)$ around $\alpha_k = 0$ results in

    g(\alpha_k) = g(0) + g'(0)\alpha_k + O^2(\alpha_k) = f(x_k) + \alpha_k s_k^T (\nabla f)(x_k) + O^2(\alpha_k).

In the Armijo condition the step length $\alpha_k$, the directional derivative $s_k^T (\nabla f)(x_k)$ and the reduction in $f(\cdot)$ are connected by the inequality

    f(x_k + \alpha_k s_k) \leq f(x_k) + \epsilon_1 \alpha_k s_k^T (\nabla f)(x_k)    (2.17)

for some constant $\epsilon_1 \in (0, 1)$, typically chosen small, e.g. $\epsilon_1 = 0.01$. With this, an upper bound on the step length is imposed. To ensure that $\alpha_k$ does not become too small an additional inequality is introduced

    f(x_k + \alpha_k s_k) \geq f(x_k) + \bar{\epsilon}\,\epsilon_1 \alpha_k s_k^T (\nabla f)(x_k)    (2.18)

with the parameter $\bar{\epsilon} > 1$. Figure 2.3(a) shows a graphical illustration of (2.17) and (2.18). Herein recall that $s_k$ is by assumption a descent direction with $s_k^T (\nabla f)(x_k) < 0$. In practice one starts for fixed $x_k$ and $s_k$ with an initial choice of $\alpha_k = \alpha_k^{(0)}$:

(i) If the initial value satisfies (2.17), then $\alpha_k$ is successively increased by a factor $\hat{\epsilon} > 1$ until, at say $\alpha_k^{(j+1)}$, condition (2.17) is violated.
(ii) If the initial value does not satisfy (2.18), then successively decrease $\alpha_k$ by a factor $\hat{\epsilon} > 1$ until $\alpha_k^{(j)} = \alpha_k^{(j-1)}/\hat{\epsilon}$ fulfills (2.18).
(iii) Finally assign the determined $\alpha_k = \alpha_k^{(j)}$ as step length for the line search algorithm.

Wolfe conditions. A slight modification of the Armijo conditions leads to the so called Wolfe conditions. Besides (2.17) a curvature condition is introduced different from (2.18) to exclude unacceptably small values of $\alpha_k$, i.e.

    g'(\alpha_k) \geq \epsilon_2 g'(0) \quad or\ equivalently \quad s_k^T (\nabla f)(x_k + \alpha_k s_k) \geq \epsilon_2 s_k^T (\nabla f)(x_k)

for some constant $\epsilon_2 \in (\epsilon_1, 1)$. This condition ensures that the slope of $g(\cdot)$ at $\alpha_k$ is $\epsilon_2$ times greater than the initial slope at $\alpha_k = 0$.

    Figure 2.3: Illustration of (a) Armijo conditions (2.17), (2.18) and (b) Wolfe conditions (2.19). Admissible areas are marked by the double arrows.

Figure 2.3(b) provides a graphical illustration and confirms that this selection is useful, since if the slope $g'(\alpha_k) = s_k^T (\nabla f)(x_k + \alpha_k s_k)$ is strongly negative, then $f$ can be further reduced by moving along the search direction $s_k$ with $\alpha_k$. On the other hand, if $g'(\alpha_k)$ is only slightly negative or positive, then one can in general no longer assume that $f$ can be further reduced in this search direction, so that the line search can be terminated with this $s_k$. In summary, the introduced two sufficient conditions are known as the Wolfe conditions and read

    f(x_k + \alpha_k s_k) \leq f(x_k) + \epsilon_1 \alpha_k s_k^T (\nabla f)(x_k)    (2.19a)
    s_k^T (\nabla f)(x_k + \alpha_k s_k) \geq \epsilon_2 s_k^T (\nabla f)(x_k)    (2.19b)

for constants $\epsilon_1 \in (0, 1)$ and $\epsilon_2 \in (\epsilon_1, 1)$. Typical values of $\epsilon_2$ are $0.9$ when the search direction $s_k$ is determined by a Newton or quasi-Newton method and $0.1$ if a nonlinear conjugate gradient method is chosen to obtain $s_k$. The so called strong Wolfe conditions are obtained by modifying the curvature condition, i.e.

    f(x_k + \alpha_k s_k) \leq f(x_k) + \epsilon_1 \alpha_k s_k^T (\nabla f)(x_k)    (2.20a)
    |s_k^T (\nabla f)(x_k + \alpha_k s_k)| \leq \epsilon_2 |s_k^T (\nabla f)(x_k)|    (2.20b)

for constants $\epsilon_1 \in (0, 1)$ and $\epsilon_2 \in (\epsilon_1, 1)$. This more restrictive formulation enforces that $\alpha_k$ attains a value so that $x_{k+1} = x_k + \alpha_k s_k$ lies in (at least) a large neighborhood of a local minimizer or stationary point.

Remark 2.2
It can be shown under the assumption of continuous differentiability of $f(x)$ that there always exist step lengths $\alpha_k$ satisfying the Wolfe and the strong Wolfe conditions. For further details the reader is referred to, e.g., [9].
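For illustration, the Wolfe conditions (2.19) are easily checked programmatically for a given candidate step length; the following Python sketch (with an illustrative quadratic test function) does exactly this:

    import numpy as np

    def satisfies_wolfe(f, grad, x, s, alpha, eps1=1e-4, eps2=0.9):
        # Check the Wolfe conditions (2.19) for a candidate step length alpha,
        # assuming s is a descent direction, i.e. s^T (grad f)(x) < 0.
        slope0 = s @ grad(x)
        sufficient_decrease = f(x + alpha * s) <= f(x) + eps1 * alpha * slope0
        curvature = s @ grad(x + alpha * s) >= eps2 * slope0
        return sufficient_decrease and curvature

    f = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2
    grad = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * x[1]])
    x = np.array([0.0, 0.0])
    s = -grad(x)                        # steepest descent direction
    print([satisfies_wolfe(f, grad, x, s, a) for a in (0.1, 0.5, 1.0)])
    # -> [True, True, False]: the overlong step alpha = 1.0 violates (2.19a)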

Goldstein conditions. The so called Goldstein conditions are rather similar to the Wolfe conditions and read as

    f(x_k) + (1 - \epsilon)\alpha_k s_k^T (\nabla f)(x_k) \leq f(x_k + \alpha_k s_k) \leq f(x_k) + \epsilon \alpha_k s_k^T (\nabla f)(x_k)    (2.21)

for a constant $\epsilon \in (0, 1/2)$. The Goldstein conditions are often used in Newton type methods but show the disadvantage compared to the Wolfe conditions that the first inequality may exclude all minimizers of $g(\alpha_k)$.

Backtracking. As argued above, the decrease condition (2.19a) alone is not sufficient to guarantee that the algorithm makes reasonable progress in the considered search direction. Nevertheless, if the candidate step lengths are chosen appropriately by using a so called backtracking approach, then the curvature condition (2.19b) can be neglected and only (2.19a) may be used to terminate the line search procedure. The most basic form of this technique is summarized in Algorithm 2.

Algorithm 2: Backtracking algorithm.
  input: \alpha_k^0 > 0 (starting value), \rho \in (0, 1) (backtracking parameter), \epsilon_1 \in (0, 1) (descent parameter)
  initialize: \alpha_k = \alpha_k^0
  repeat
      \alpha_k \leftarrow \rho\,\alpha_k
  until f(x_k + \alpha_k s_k) \leq f(x_k) + \epsilon_1 \alpha_k s_k^T (\nabla f)(x_k);

The initial step $\alpha_k^0$ is chosen to be $1$ in Newton and quasi-Newton methods but can have different values in other algorithms such as steepest descent or conjugate gradient. On the one hand the backtracking algorithm ensures that $\alpha_k$ will in a finite number of trials become sufficiently small so that the decrease condition (2.19a) is fulfilled. On the other hand, $\alpha_k$ will not become too small, preventing progress of the algorithm, due to the successive reduction by $\rho \in (0, 1)$. Applications illustrate that backtracking is well suited for Newton's method but less appropriate for quasi-Newton and conjugate gradient methods.
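A Python sketch of Algorithm 2 might read as follows (the test function is illustrative; note that termination presumes $s_k$ is a descent direction):

    import numpy as np

    def backtracking(f, grad, x, s, alpha0=1.0, rho=0.5, eps1=1e-4):
        # Backtracking line search (Algorithm 2): shrink alpha by the factor
        # rho until the sufficient decrease condition (2.19a) holds.
        alpha, fx, slope0 = alpha0, f(x), s @ grad(x)
        while f(x + alpha * s) > fx + eps1 * alpha * slope0:
            alpha *= rho
        return alpha

    f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
    grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
    x = np.array([1.0, 1.0])
    alpha = backtracking(f, grad, x, -grad(x))
    print(alpha)   # a step accepted by (2.19a), well below the initial 1.0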

Nested intervals. A less heuristic technique for the determination of the step length $\alpha_k$ minimizing (2.16) is provided by nested intervals. The underlying idea is illustrated in Figure 2.4. For this, it is assumed that $g(\alpha_k)$ is unimodal in an interval $\alpha_k \in [l_0, r_0]$ so that $g(\alpha_k)$ has a unique local minimum in the open interval $(l_0, r_0)$. (The function $f(x)$ is called unimodal for $x \in X$ if it has a unique local minimum in $X$.) To determine the interval $[l_0, r_0]$, start from a sufficiently small $l_0$ and increase the value of the right interval boundary $r$ until $g(r)$ starts increasing for some $r = r_0$. Interval nesting is an iterative procedure to successively decrease the interval $[l_j, r_j]$ including the local minimum of $g(\alpha_k)$ as $j$ increases. Consider now the $j$-th iteration step. Based on $l_j$ and $r_j$ new interval boundaries $l_j^+, r_j^+$ with $l_j < l_j^+ < r_j^+ < r_j$ are computed using

    l_j^+ = l_j + (1 - \epsilon)(r_j - l_j)    (2.22a)
    r_j^+ = l_j + \epsilon (r_j - l_j)    (2.22b)

with the parameter $\epsilon \in (1/2, 1)$.

    Figure 2.4: Example of nested intervals: (a) step j; (b) step j+1.

The remaining procedure is based on the following lemma.

Lemma 2.1
Let $l_j < l_j^+ < r_j^+ < r_j$ and let $g(\alpha_k)$ be a unimodal function on the interval $[l_j, r_j]$. Let $\alpha_k^*$ denote the local minimum of $g(\alpha_k)$ in $(l_j, r_j)$. Then $\alpha_k^* \in [l_j, r_j^+]$ if $g(l_j^+) \leq g(r_j^+)$, or $\alpha_k^* \in [l_j^+, r_j]$ if $g(l_j^+) \geq g(r_j^+)$.

Proof. Consider the case $g(l_j^+) \leq g(r_j^+)$. We follow a contradiction argument assuming that the local minimizer satisfies $\alpha_k^* > r_j^+$, which implies that $l_j^+ < \alpha_k^*$. Since $g(l_j^+) \leq g(r_j^+)$ there exists a point $\bar{\alpha}_k \in (l_j^+, \alpha_k^*)$ such that $g(\bar{\alpha}_k) = \max_{\alpha_k \in [l_j^+, \alpha_k^*]} g(\alpha_k)$. Hence $\bar{\alpha}_k$ denotes a local maximizer in the interval $[l_j, r_j]$, which contradicts the assumption that $g(\alpha_k)$ is unimodal in $[l_j, r_j]$. The case $g(l_j^+) \geq g(r_j^+)$ follows analogously.

Lemma 2.1 implies that $r_j^+$ is dropped for the iteration step $j+1$ if $g(l_j^+) \leq g(r_j^+)$, so that the new interval $[l_{j+1}, r_{j+1}]$ is given by $l_{j+1} = l_j$ and $r_{j+1} = r_j^+$. This case is shown in Figure 2.4. If $g(l_j^+) \geq g(r_j^+)$, then the new interval $[l_{j+1}, r_{j+1}]$ is obtained as $l_{j+1} = l_j^+$ and $r_{j+1} = r_j$. For the scenario of Figure 2.4 evaluate (2.22) for the step $j+1$, which yields

    l_{j+1}^+ = l_j + \epsilon(1 - \epsilon)(r_j - l_j), \quad r_{j+1}^+ = l_j + \epsilon^2 (r_j - l_j).    (2.23)

By imposing the constraint

    \epsilon^2 = 1 - \epsilon, \quad i.e. \quad \epsilon = \frac{\sqrt{5} - 1}{2} \approx 0.618,    (2.24)

the equality $r_{j+1}^+ = l_j^+$ is obtained, so that in each iteration only one new boundary has to be computed. Note that the fraction $1/\epsilon \approx 1.618$ is also known as the golden ratio. If $g(l_j^+) \geq g(r_j^+)$, then (2.24) similarly ensures $l_{j+1}^+ = r_j^+$ to reduce the number of computational steps. The local minimizer $\alpha_k^*$ is finally obtained by averaging the final iteration results, i.e. $\alpha_k^* \approx (l_k + r_k)/2$, or by quadratic interpolation (see below) using the three smallest of the four values of $g$ at $l_j$, $r_j$, $l_j^+$, and $r_j^+$. The method of nested intervals is an easily implementable and numerically robust procedure to compute $\alpha_k$ at the cost of a typically larger number of iteration steps.
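The procedure translates into a few lines of Python; the following sketch implements the interval nesting with the golden ratio parameter (2.24), reusing one interior point per iteration as discussed above (the test function is an illustrative unimodal example):

    import numpy as np

    def golden_section(g, l, r, tol=1e-6):
        # Nested intervals with the golden ratio parameter eps from (2.24),
        # so that only one new interior point is evaluated per step (2.22).
        eps = (np.sqrt(5.0) - 1.0) / 2.0       # ~ 0.618
        lp, rp = l + (1 - eps) * (r - l), l + eps * (r - l)
        gl, gr = g(lp), g(rp)
        while r - l > tol:
            if gl <= gr:                       # minimizer in [l, rp], Lemma 2.1
                r, rp, gr = rp, lp, gl
                lp = l + (1 - eps) * (r - l)
                gl = g(lp)
            else:                              # minimizer in [lp, r]
                l, lp, gl = lp, rp, gr
                rp = l + eps * (r - l)
                gr = g(rp)
        return 0.5 * (l + r)                   # average of the final interval

    g = lambda a: (a - 0.7) ** 2 + 1.0         # unimodal on [0, 2]
    print(golden_section(g, 0.0, 2.0))         # ~ 0.7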

Quadratic interpolation. One very efficient method to solve the minimization problem (2.16) is given by quadratic interpolation. For this, choose three pairwise distinct values $\alpha_k^1$, $\alpha_k^2$ and $\alpha_k^3$ and evaluate $g_j = g(\alpha_k^j)$. The quadratic interpolation function passing through these three points is given by

    q(\alpha_k) = \sum_{j=1}^{3} g_j \prod_{i \neq j} \frac{\alpha_k - \alpha_k^i}{\alpha_k^j - \alpha_k^i}.    (2.25)

The minimizer $\alpha_k^*$ of $q(\alpha_k)$ follows as

    \alpha_k^* = \frac{1}{2}\, \frac{g_1 (\alpha_k^2 - \alpha_k^3)(\alpha_k^2 + \alpha_k^3) + g_2 (\alpha_k^3 - \alpha_k^1)(\alpha_k^3 + \alpha_k^1) + g_3 (\alpha_k^1 - \alpha_k^2)(\alpha_k^1 + \alpha_k^2)}{g_1 (\alpha_k^2 - \alpha_k^3) + g_2 (\alpha_k^3 - \alpha_k^1) + g_3 (\alpha_k^1 - \alpha_k^2)}.    (2.26)

Determination of the search direction

The convergence of the line search methods not only depends on the selection of the step length $\alpha_k$ but also on the chosen search direction $s_k$, which has to be a descent direction such that $s_k^T (\nabla f)(x_k) < 0$. In the following, different approaches for the proper choice of $s_k$ are presented together with the resulting convergence rates.

Steepest descent or gradient method. Proposition 2.1 shows that the search direction

    s_k = -(\nabla f)(x_k)    (2.27)

is the direction of steepest descent, i.e. among all directions at $x_k$ it is the direction along which $f(x)$ decreases most rapidly. For the analysis of convergence of the steepest descent method

    x_{k+1} = x_k - \alpha_k (\nabla f)(x_k)    (2.28)

with (2.27) consider first the quadratic minimization problem

    \min_{x \in \mathbb{R}^n} f(x) = \frac{1}{2} x^T P x - b^T x    (2.29)

for $P$ symmetric and positive definite. It was shown in Example 1.6 that $f(x)$ is strictly convex since $(\nabla^2 f)(x) = P$ is positive definite so that Property (iv) of convex functions applies. Taking into account Theorems 2.2 and 2.4 it follows from $(\nabla f)(x) = P x - b = 0$ that $x^* = P^{-1} b$ is a global minimizer of (2.29). Given (2.29) the method of steepest descent (2.28) evaluates to

    x_{k+1} = x_k - \alpha_k (P x_k - b).    (2.30)

The minimizer of (2.16), i.e. $\min_{\alpha_k > 0} g(\alpha_k) = f(x_k + \alpha_k s_k)$, and hence the optimal step length $\alpha_k$ can be computed explicitly from

    \min_{\alpha_k > 0} \frac{1}{2}\big(x_k - \alpha_k (P x_k - b)\big)^T P \big(x_k - \alpha_k (P x_k - b)\big) - b^T \big(x_k - \alpha_k (P x_k - b)\big),

where $P x_k - b = (\nabla f)(x_k)$, by

taking the derivative of $f(x_k + \alpha_k s_k)$ with respect to $\alpha_k$. This yields

    \alpha_k = \frac{(\nabla f)^T(x_k)(\nabla f)(x_k)}{(\nabla f)^T(x_k)\, P\, (\nabla f)(x_k)}.    (2.31)

Exercise 2.3. Verify (2.31).

With $\alpha_k$ as above the steepest descent method for the quadratic minimization problem reads

    x_{k+1} = x_k - \frac{(\nabla f)^T(x_k)(\nabla f)(x_k)}{(\nabla f)^T(x_k)\, P\, (\nabla f)(x_k)}\, (\nabla f)(x_k).    (2.32)

For the convergence analysis, introduce a suitably weighted norm by defining $\|x\|_P^2 = x^T P x$. This in particular implies with $x^* = P^{-1} b$ that

    \frac{1}{2}\|x - x^*\|_P^2 = f(x) - f(x^*).    (2.33)

The introduced norm is a measure of the difference between the current objective function and the minimal value.

Exercise 2.4. Verify (2.33).

Consider the weighted distance of $x_{k+1}$ defined in (2.32) and the minimizer, i.e. $\|x_{k+1} - x^*\|_P$, which evaluates to

    \|x_{k+1} - x^*\|_P^2 = \Big\| x_k - \frac{(\nabla f)^T(x_k)(\nabla f)(x_k)}{(\nabla f)^T(x_k) P (\nabla f)(x_k)} (\nabla f)(x_k) - x^* \Big\|_P^2
    = \|x_k - x^*\|_P^2 - \frac{\big[(\nabla f)^T(x_k)(\nabla f)(x_k)\big]^2}{(\nabla f)^T(x_k) P (\nabla f)(x_k)}
    = \bigg( 1 - \frac{\big[(\nabla f)^T(x_k)(\nabla f)(x_k)\big]^2}{\big[(\nabla f)^T(x_k) P (\nabla f)(x_k)\big]\big[(\nabla f)^T(x_k) P^{-1} (\nabla f)(x_k)\big]} \bigg) \|x_k - x^*\|_P^2.    (2.34)

Herein, $(\nabla f)(x_k) = P(x_k - x^*)$ is used, which follows from $x^* = P^{-1} b$ and hence $b = P x^*$; in particular $\|x_k - x^*\|_P^2 = (\nabla f)^T(x_k) P^{-1} (\nabla f)(x_k)$. The term in parentheses describes the decrease in each iteration step, so that the convergence properties of the steepest descent method can be deduced from this expression.
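Before turning to the convergence analysis, the iteration (2.32) with the exact step length (2.31) can be stated compactly in Python; the matrix $P$ and vector $b$ below are illustrative choices with spectral condition number $\kappa = 10$:

    import numpy as np

    # Steepest descent (2.32) with the exact step length (2.31) for the
    # quadratic objective (2.29); P and b are illustrative data (kappa = 10).
    P = np.array([[10.0, 0.0],
                  [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    x, x_star = np.zeros(2), np.linalg.solve(P, b)

    for k in range(200):
        g = P @ x - b                          # gradient of (2.29)
        if np.linalg.norm(g) < 1e-10:
            break
        alpha = (g @ g) / (g @ (P @ g))        # optimal step length (2.31)
        x = x - alpha * g                      # iteration (2.32)

    print(k, np.linalg.norm(x - x_star))       # linear convergence, cf. (2.36)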

For the interpretation of (2.34), Kantorovich's inequality is used.

Lemma 2.2: Kantorovich's inequality
Let $P \in \mathbb{R}^{n \times n}$ be a symmetric positive definite matrix. For every $x \in \mathbb{R}^n$ the inequality

    \frac{(x^T x)^2}{(x^T P x)(x^T P^{-1} x)} \geq \frac{4\lambda_{min}\lambda_{max}}{(\lambda_{min} + \lambda_{max})^2}    (2.35)

holds with $\lambda_{min}$ and $\lambda_{max}$ referring to the smallest and largest eigenvalues of $P$. Note that the eigenvalues of a symmetric and positive definite matrix are real and positive.

Exercise 2.5. Prove Lemma 2.2.

These preliminaries allow to conclude the following theorem [7].

Theorem 2.5: Convergence of steepest descent for quadratic objective function
For any initial value $x_0 \in \mathbb{R}^n$ the steepest descent method (2.32) converges linearly to the global minimum of the strictly convex objective function (2.29) with the error norm satisfying

    \|x_{k+1} - x^*\|_P^2 \leq \Big(\frac{\kappa - 1}{\kappa + 1}\Big)^2 \|x_k - x^*\|_P^2    (2.36)

with $\kappa = \lambda_{max}/\lambda_{min}$ the spectral condition number of $P$.

Proof. The result is a direct consequence of (2.35) applied to (2.34), i.e.

    \frac{\|x_{k+1} - x^*\|_P^2}{\|x_k - x^*\|_P^2} = 1 - \frac{\big[(\nabla f)^T(x_k)(\nabla f)(x_k)\big]^2}{\big[(\nabla f)^T(x_k) P (\nabla f)(x_k)\big]\big[(\nabla f)^T(x_k) P^{-1} (\nabla f)(x_k)\big]} \leq 1 - \frac{4\lambda_{min}\lambda_{max}}{(\lambda_{min} + \lambda_{max})^2}

with $\lambda_{min}$ and $\lambda_{max}$ referring to the smallest and largest eigenvalues of $P$. Hence,

    \frac{\|x_{k+1} - x^*\|_P^2}{\|x_k - x^*\|_P^2} \leq \frac{(\lambda_{max} - \lambda_{min})^2}{(\lambda_{min} + \lambda_{max})^2} = \Big(\frac{\kappa - 1}{\kappa + 1}\Big)^2,

which equals (2.36).

This result admits a geometric interpretation. At first, it is obvious that convergence is achieved in a single step if $\kappa = 1$, i.e. if all eigenvalues $\lambda_j = \lambda$ of $P$ coincide so that $P = \lambda E$. In this case the contours of the objective function $f(x) = \frac{1}{2}x^T P x - b^T x$ are circles and the steepest descent direction always points at the global minimizer. This case is visualized in Figure 2.5(a). If $\kappa$ increases, then the contours approach elongated ellipsoids and convergence degrades due to a zigzagging behavior of the line search algorithm with steepest descent, as is shown in Figure 2.5(b). Note that the zigzagging will increase with the spectral condition number $\kappa$. The rate of convergence remains in principle unchanged if the minimization problem (2.1) is considered with a general objective function $f(x)$ [9].

Theorem 2.6: Convergence of steepest descent for general objective function
Let $f(x) \in C^2(\mathbb{R}^n)$ and let $x^*$ denote the local minimizer of (2.1). Moreover, assume that $(\nabla^2 f)(x^*)$ is positive definite and let $\lambda_{min}$ and $\lambda_{max}$ denote its smallest and largest (positive real) eigenvalues. Assume that the sequence of iterates $(x_k)_{k \in \mathbb{N}}$ generated by the steepest descent method

    x_{k+1} = x_k - \alpha_k (\nabla f)(x_k)

converges to the local minimizer $x^*$ for suitable step lengths $\alpha_k$. Then the sequence $(f(x_k))_{k \in \mathbb{N}}$ converges linearly to $f(x^*)$ with a rate of convergence constant larger than $(\kappa - 1)^2/(\kappa + 1)^2$, where $\kappa = \lambda_{max}/\lambda_{min}$ is the spectral condition number of the Hessian matrix.

    Figure 2.5: Line search with steepest descent for a quadratic strictly convex objective function: (a) ideal conditioning with $\kappa = 1$; (b) conditioning with $\kappa \gg 1$.

For poorly conditioned problems with large $\kappa$ an appropriate scaling might be used to improve the iterations. This approach exploits the fact that the determination of the minimum of the objective function $f(x)$ is equivalent to the determination of the minimum of the objective function $g(z) = f(Vz)$ with $x = Vz$ and $V$ regular. With this, the minimizer $x^*$ is mapped according to $z^* = V^{-1} x^*$. Hence, in the new state $z$ the gradient and Hessian of $g(z)$ are related to those of $f(x)$ by

    (\nabla g)(z) = V^T (\nabla f)(Vz), \quad (\nabla^2 g)(z) = V^T (\nabla^2 f)(Vz)\, V,    (2.37)

which in particular implies $(\nabla g)(z^*) = V^T (\nabla f)(x^*)$ and $(\nabla^2 g)(z^*) = V^T (\nabla^2 f)(x^*) V$. The proper selection of the transformation matrix $V$ may lead to an improvement of the spectral condition number of the Hessian matrix $(\nabla^2 g)(z)$ compared to $(\nabla^2 f)(x)$. Nevertheless, these so called pre-conditioning techniques should only be applied with caution, as is remarked, e.g., in [, p. 34f].

Pros and cons of line search with steepest descent or gradient method can be summarized as follows:

(+) Simple with low computational burden since the explicit evaluation of the Hessian matrix $(\nabla^2 f)(x_k)$ is not needed;
(+) Convergence can be achieved also for starting values $x_0$ not close to the local minimizer $x^*$;
(−) Slow convergence depending on the conditioning (and scaling);
(−) Linear convergence only.

Conjugated gradient method. The conjugated gradient (CG) method aims at combining quadratic convergence (as in Newton's method below) with the low computational burden of the steepest descent method. Herein, information of the present and the previous iteration is used to appropriately determine the search direction, i.e.

    s_k = -(\nabla f)(x_k) + \beta_k s_{k-1}, \quad k \geq 1, \qquad s_0 = -(\nabla f)(x_0).    (2.38)

Different formulas exist for the determination of the parameter $\beta_k$. One version is given by the Fletcher-Reeves formula, where

    \beta_k^{FR} = \frac{(\nabla f)^T(x_k)(\nabla f)(x_k)}{(\nabla f)^T(x_{k-1})(\nabla f)(x_{k-1})}.    (2.39)

Moreover, the Polak-Ribière formula should be mentioned in this context, where

    \beta_k^{PR} = \frac{(\nabla f)^T(x_k)\big[(\nabla f)(x_k) - (\nabla f)(x_{k-1})\big]}{(\nabla f)^T(x_{k-1})(\nabla f)(x_{k-1})}.    (2.40)

While the convergence properties of CG methods are well understood for linear and quadratic problems, in the general nonlinear setting surprising convergence properties can be observed, as is, e.g., pointed out in [9]. The reader is referred to this reference or [] for further details and analysis.
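For the quadratic problem (2.29), where the exact step length along $s_k$ is available in closed form, a CG iteration with the Fletcher-Reeves parameter (2.39) may be sketched in Python as follows (data as in the earlier steepest descent sketch); in exact arithmetic it terminates after $n$ steps:

    import numpy as np

    # Conjugated gradient method with the Fletcher-Reeves parameter (2.39),
    # applied to the quadratic objective (2.29) with illustrative data.
    P = np.array([[10.0, 0.0],
                  [0.0, 1.0]])
    b = np.array([1.0, 1.0])

    x = np.zeros(2)
    g = P @ x - b
    s = -g                                     # s_0 = -(grad f)(x_0), cf. (2.38)
    for k in range(10):
        if np.linalg.norm(g) < 1e-12:
            break
        alpha = -(s @ g) / (s @ (P @ s))       # exact minimizer of g(alpha_k)
        x = x + alpha * s
        g_new = P @ x - b
        beta = (g_new @ g_new) / (g @ g)       # Fletcher-Reeves formula (2.39)
        s = -g_new + beta * s
        g = g_new
    print(k, x)   # terminates after n = 2 steps (up to rounding) at x* = P^{-1} b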

Newton's method. Newton's iterative method is based on the analysis of $f(x_{k+1})$ for $x_{k+1} = x_k + s_k$, i.e. unit step length $\alpha_k = 1$. Evaluation of the Taylor series at $x_k$ neglecting terms of order 3 and larger yields

    f(x_{k+1}) = f(x_k) + s_k^T (\nabla f)(x_k) + \frac{1}{2} s_k^T (\nabla^2 f)(x_k) s_k.    (2.41)

The search direction, also called Newton direction, is obtained by minimizing the right-hand side of (2.41) with respect to $s_k$. Taking into account Theorem 2.1 and noting that the right-hand side is a quadratic form in $s_k$ implies

    (\nabla_{s_k} f)(x_{k+1}) = (\nabla f)(x_k) + (\nabla^2 f)(x_k) s_k = 0

so that

    s_k = -(\nabla^2 f)^{-1}(x_k)(\nabla f)(x_k).    (2.42)

Hence, Newton's method can be interpreted as minimizing the quadratic approximation of the objective function $f(x)$. For $x_k$ in a sufficiently small neighborhood of a strict local minimizer $x^*$ it follows from Theorem 2.3 that the Hessian matrix $(\nabla^2 f)(x_k)$ is positive definite and hence invertible. In this case, Newton's method is well defined and (2.42) defines a descent direction.

Theorem 2.7: Convergence of Newton's method
Let $f \in C^2(\mathbb{R}^n)$ and let $(\nabla^2 f)(x)$ be locally Lipschitz continuous in a neighborhood of $x^*$ for which the second order sufficient optimality conditions (2.10) are satisfied. If the starting point $x_0$ is sufficiently close to the minimizer $x^*$, then the Newton iteration

    x_{k+1} = x_k - (\nabla^2 f)^{-1}(x_k)(\nabla f)(x_k)    (2.43)

converges to $x^*$ with an order of convergence $p$ of at least 2. In addition, the sequence of gradient norms $(\|(\nabla f)(x_k)\|)_{k \in \mathbb{N}}$ converges quadratically to zero.

The proof of this theorem is omitted but can be, e.g., found in [9, Chap. 3.3].

Remark 2.3
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ satisfy the inequality

    \|f(x_1) - f(x_2)\| \leq L \|x_1 - x_2\|, \quad L \in (0, \infty)    (2.44)

for all $x_1, x_2 \in B_r(y) = \{x \in \mathbb{R}^n : \|x - y\| \leq r\}$. Then $f(x)$ is called locally Lipschitz continuous on $B_r(y) \subseteq \mathbb{R}^n$. If the inequality holds for all $x_1, x_2 \in \mathbb{R}^n$, then $f(x)$ is called globally Lipschitz continuous. Note that if $f(x)$ and $(\nabla f)(x)$ are continuous on $B_r(y) \subseteq \mathbb{R}^n$, then $f(x)$ is locally Lipschitz continuous. In view of Theorem 2.7 and the considered scalar case with $f(x)$, the local Lipschitz continuity of $(\nabla^2 f)(x)$ in a neighborhood of $x^*$ is given provided that $f(x) \in C^3(\mathbb{R}^n)$.

For the practical implementation typically a certain step length $\alpha_k$ is introduced so that (2.43) is replaced by

    x_{k+1} = x_k - \alpha_k (\nabla^2 f)^{-1}(x_k)(\nabla f)(x_k).    (2.45)

Herein $\alpha_k$ is also referred to as damping coefficient and the damped Newton method is often called Newton-Raphson method. Strategies for the suitable determination of $\alpha_k$ are discussed in the section on the step length determination above. It is crucial to observe that the positive definiteness of the Hessian matrix $(\nabla^2 f)(x_k)$ might be lost if $x_k$ is not sufficiently close to $x^*$. In this case, $s_k$ defined in (2.42) is no longer a descent direction and $(\nabla^2 f)(x_k)$ is not necessarily invertible. To address this issue, the search direction is modified so that the iteration rule reads

    x_{k+1} = x_k - \alpha_k N_k^{-1} (\nabla f)(x_k), \quad N_k = (\nabla^2 f)(x_k) + \epsilon_k E    (2.46)

with the unit matrix $E \in \mathbb{R}^{n \times n}$ and a suitable $\epsilon_k \geq 0$. For $\epsilon_k = 0$ Newton's method is recovered, while for large $\epsilon_k$ the iteration (2.46) approaches the method of steepest descent. The proper selection of $\epsilon_k$ is not trivial. One typically begins with a starting value and successively increases $\epsilon_k$ until $N_k$ is positive definite. According to Theorem 1.2 definiteness can be checked, e.g., by computing the eigenvalues of $N_k$. Numerically more efficient techniques such as the Cholesky factorization can be used, which imply positive definiteness if and only if the matrix can be factorized into $N_k = D_k D_k^T$ with $D_k$ a lower triangular matrix with strictly positive entries on its diagonal [3].
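A compact Python sketch of the damped and regularized Newton iteration (2.45), (2.46) is given below; the regularization parameter $\epsilon_k$ is increased until the Cholesky factorization succeeds, and the Rosenbrock function used for testing anticipates the benchmark example of Section 2.3 (the update schedule for $\epsilon_k$ is an illustrative choice):

    import numpy as np

    def damped_newton(f, grad, hess, x0, alpha=1.0, iters=100):
        # Newton iteration (2.45) with the regularization (2.46): eps_k is
        # increased until N_k is positive definite, tested via Cholesky.
        x = np.asarray(x0, dtype=float)
        n = len(x)
        for _ in range(iters):
            g = grad(x)
            if np.linalg.norm(g) < 1e-10:
                break
            N, eps = hess(x), 0.0
            while True:
                try:
                    np.linalg.cholesky(N + eps * np.eye(n))
                    break
                except np.linalg.LinAlgError:
                    eps = max(2 * eps, 1e-6)   # increase eps_k until N_k > 0
            # Solve N_k s_k = -(grad f)(x_k) instead of inverting N_k.
            s = np.linalg.solve(N + eps * np.eye(n), -g)
            x = x + alpha * s
        return x

    f = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
    grad = lambda x: np.array(
        [-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
         200 * (x[1] - x[0] ** 2)])
    hess = lambda x: np.array(
        [[2 - 400 * (x[1] - 3 * x[0] ** 2), -400 * x[0]],
         [-400 * x[0], 200.0]])
    print(damped_newton(f, grad, hess, [-1.0, 1.0]))   # converges to [1, 1]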

Exercise 2.6. Verify that line search with Newton's method converges in a single step, independent of the starting point $x_0$, for the quadratic minimization problem

    \min_{x \in \mathbb{R}^n} f(x) = \frac{1}{2} x^T P x - b^T x

with $P$ positive definite.

Pros and cons of line search with Newton's method can be summarized as follows:

(+) Quadratic convergence if the Hessian matrix $(\nabla^2 f)(x_k)$ is positive definite;
(−) Loss of positive definiteness of the Hessian matrix $(\nabla^2 f)(x_k)$ if $x_k$ is not in a sufficiently small neighborhood of the minimizer $x^*$;
(−) Requires evaluation of the Hessian matrix $(\nabla^2 f)(x_k)$ and the computation of its inverse (not explicitly but by solving a linear system of equations at each $x_k$).

Quasi-Newton methods. In quasi-Newton methods the evaluation and in particular the inversion of the Hessian matrix $(\nabla^2 f)(x_k)$ is replaced by an iterative procedure, which makes the approach suitable also for medium and large scale systems with $n \gg 1$. The underlying idea makes use of (2.41), i.e. a quadratic model of the objective function given by

    f(x_{k+1}) = f(x_k) + s_k^T (\nabla f)(x_k) + \frac{1}{2} s_k^T B_k s_k,    (2.47)

with the difference that $(\nabla^2 f)(x_k)$ is replaced by the $(n \times n)$ matrix $B_k$, which is assumed symmetric and positive definite. Proceeding as in Newton's method, the search direction is chosen as

    s_k = -B_k^{-1} (\nabla f)(x_k)    (2.48)

and minimizes the quadratic (convex) approximation (2.47). With this, the next iterate is

    x_{k+1} = x_k - \alpha_k B_k^{-1} (\nabla f)(x_k)    (2.49)

with the step length chosen to satisfy the Wolfe conditions (2.19). The crucial point is now to determine $B_{k+1}$ from the knowledge of $B_k$, $(\nabla f)(x_k)$ and $(\nabla f)(x_{k+1})$. For this, let $f(x) \in C^2(\mathbb{R}^n)$ and recall the integral mean value theorem (1.26), which implies

    (\nabla f)(x_{k+1}) - (\nabla f)(x_k) = \int_0^1 (\nabla^2 f)(x_k + r(x_{k+1} - x_k))(x_{k+1} - x_k)\,dr \approx (\nabla^2 f)(x_{k+1})(x_{k+1} - x_k).

In view of the approximation of the Hessian matrix $(\nabla^2 f)(x_{k+1})$ by $B_{k+1}$ this motivates

    (\nabla f)(x_{k+1}) - (\nabla f)(x_k) = B_{k+1}(x_{k+1} - x_k).

From a numerical point of view it is advantageous to select the approximation of the Hessian matrix so that $\mathrm{rank}(B_{k+1} - B_k)$ is small [3]. Quasi-Newton methods can hence be characterized by the following three properties

    B_k (x_{k+1} - x_k) = -(\nabla f)(x_k)    (2.50a)

    B_{k+1}(x_{k+1} - x_k) = (\nabla f)(x_{k+1}) - (\nabla f)(x_k)    (2.50b)
    B_{k+1} = B_k + \Delta B_k, \quad \mathrm{rank}\,\Delta B_k = m    (2.50c)

for $k \in \mathbb{N} \cup \{0\}$. Eqn. (2.50b) is also known as the secant condition. The idea behind (2.50c) is to minimize the distance between $B_{k+1}$ and $B_k$ in some suitable norm. Typically $m = 1$ or $m = 2$ is chosen, leading to so called rank-1 and rank-2 corrections $\Delta B_k$. Introducing $p_k = x_{k+1} - x_k$ and $q_k = (\nabla f)(x_{k+1}) - (\nabla f)(x_k)$, properties (2.50) imply the frequently used relations

    B_{k+1} p_k = q_k    (2.51a)
    (B_{k+1} - B_k) p_k = (\nabla f)(x_{k+1})    (2.51b)
    (\nabla f)(x_{k+1}) = q_k - B_k p_k.    (2.51c)

Since $B_k$ is assumed positive definite and as such is invertible for any $k = 0, 1, \ldots$, it is reasonable to impose that the rank perturbation $\Delta B_k$ does not interfere with this assumption (for a detailed discussion on this topic the reader is referred to the analysis of matrix perturbations). Hence, instead of determining $B_{k+1} = B_k + \Delta B_k$ we will seek $H_{k+1} = H_k + \Delta H_k$ inverting $B_{k+1}$. A straightforward rank-1 correction is obtained using $\Delta H_k = \gamma_k z_k z_k^T$ since the dyadic product $z_k z_k^T$ is at most of rank 1. Substitution into (2.51a) results in

    p_k = H_{k+1} q_k = H_k q_k + \gamma_k z_k z_k^T q_k.    (2.52)

From this one obtains

    (p_k - H_k q_k)(p_k - H_k q_k)^T = \gamma_k^2 z_k z_k^T q_k q_k^T z_k z_k^T = \gamma_k^2 (z_k^T q_k)^2 z_k z_k^T = \gamma_k (z_k^T q_k)^2 \Delta H_k.

Solving for $\Delta H_k$ hence yields

    H_{k+1} = H_k + \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{\gamma_k (z_k^T q_k)^2}.    (2.53)

This expression can be further simplified by taking the scalar product of (2.52) with $q_k^T$, i.e.

    q_k^T p_k = q_k^T H_k q_k + \gamma_k q_k^T z_k z_k^T q_k = q_k^T H_k q_k + \gamma_k (z_k^T q_k)^2.

Solving for the latter term and substituting into (2.53) results in the so called good Broyden method

    H_{k+1} = H_k + \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)}.    (2.54)

Various convergence results are available for Broyden's method proving superlinear convergence under certain conditions. For details, the reader is referred to, e.g., [3, 5, 6]. The main problem with (2.54) is that positive definiteness of $H_{k+1}$ is only preserved if $q_k^T (p_k - H_k q_k) > 0$. One of the most elegant techniques to ensure this property is provided by the Davidon-Fletcher-Powell (DFP) method. The technique is summarized in Algorithm 3 below and essentially relies on the initialization of the algorithm with a positive definite matrix $H_0$.

It can be shown that $H_k$ remains positive definite as long as $H_0$ is positive definite and the condition $q_k^T p_k > 0$ is satisfied.

Algorithm 3: Quasi-Newton method with DFP update.
  input: H_0 (symmetric, positive definite matrix), x_0 (starting value), \epsilon_x, \epsilon_f (stopping criteria)
  initialize: k = 0
  repeat
      Compute search direction s_k = -H_k (\nabla f)(x_k)
      Apply line search to solve \min_{\alpha_k} f(x_k + \alpha_k s_k) (taking into account the Wolfe conditions (2.19))
      Compute x_{k+1} = x_k + \alpha_k s_k, p_k = x_{k+1} - x_k and q_k = (\nabla f)(x_{k+1}) - (\nabla f)(x_k)
      Update using

          H_{k+1} = H_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k q_k^T H_k}{q_k^T H_k q_k}    (2.55)

  until \|x_{k+1} - x_k\| \leq \epsilon_x \lor |f(x_{k+1}) - f(x_k)| \leq \epsilon_f;

Since the approximation of the inverse Hessian matrix is in each step corrected by two rank-1 matrices, one refers in case of the DFP update also to a rank-2 correction. An alternative to the DFP update is given by the so called Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. Herein, the iterative determination of the inverse Hessian matrix in Algorithm 3 is replaced by

    H_{k+1} = \Big( E - \frac{p_k q_k^T}{q_k^T p_k} \Big) H_k \Big( E - \frac{q_k p_k^T}{q_k^T p_k} \Big) + \frac{p_k p_k^T}{q_k^T p_k}.    (2.56)

In general superlinear convergence is achieved by making use of quasi-Newton methods involving the DFP or the BFGS update laws. While convergence of Newton's method is faster, its cost per iteration is higher due to the need for second order derivatives and the explicit inversion of the Hessian matrix. For further analysis and information regarding implementation of the quasi-Newton method the reader is referred to, e.g., [9].
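A Python sketch of a quasi-Newton iteration with the BFGS update (2.56) might read as follows; for simplicity the step length is determined by plain backtracking on (2.19a) instead of a full Wolfe line search, and the update is skipped whenever the curvature condition $q_k^T p_k > 0$ fails (both are pragmatic simplifications):

    import numpy as np

    def quasi_newton_bfgs(f, grad, x0, iters=200):
        # Quasi-Newton method with the BFGS update (2.56) of the inverse
        # Hessian approximation H_k.
        x = np.asarray(x0, dtype=float)
        n = len(x)
        H, E = np.eye(n), np.eye(n)
        g = grad(x)
        for _ in range(iters):
            if np.linalg.norm(g) < 1e-8:
                break
            s = -H @ g                         # search direction (2.48)
            alpha = 1.0
            while f(x + alpha * s) > f(x) + 1e-4 * alpha * (s @ g):
                alpha *= 0.5                   # backtracking on (2.19a)
            x_new = x + alpha * s
            g_new = grad(x_new)
            p, q = x_new - x, g_new - g
            if q @ p > 1e-12:                  # curvature condition q_k^T p_k > 0
                rho = 1.0 / (q @ p)
                H = (E - rho * np.outer(p, q)) @ H @ (E - rho * np.outer(q, p)) \
                    + rho * np.outer(p, p)     # BFGS update (2.56)
            x, g = x_new, g_new
        return x

    f = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
    grad = lambda x: np.array(
        [-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
         200 * (x[1] - x[0] ** 2)])
    print(quasi_newton_bfgs(f, grad, [-1.0, 1.0]))   # approaches [1, 1]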

2.2.3 Trust region methods

Trust region methods are somewhat similar to line search methods in the sense that both generate steps based on a quadratic model of the objective function. They, however, differ in the way the model is exploited. While line search methods rely on the determination of a search (descent) direction and a suitable step length to move along this direction, trust region methods define a region around the current iterate in which the quadratic model is trusted to be an adequate approximation of the objective function, i.e.

    m(s_k) = f(x_k) + s_k^T (\nabla f)(x_k) + \frac{1}{2} s_k^T B_k s_k \approx f(x_k + s_k)    (2.57)

with $B_k$ an appropriate symmetric and uniformly bounded matrix. Application of Taylor's formula (1.27) reveals that the error of approximation is of the order $\|s_k\|^2$, or even $\|s_k\|^3$ if $B_k = (\nabla^2 f)(x_k)$. The trust region around the iterate $x_k$, which is subsequently characterized by the parameter $\Delta_k$, can be interpreted as the region where $f(x_k + s_k)$ is supposed to be sufficiently accurately represented by $m(s_k)$. In trust region methods, the minimization problem

    \min_{s_k \in \mathbb{R}^n} m(s_k) = f(x_k) + s_k^T (\nabla f)(x_k) + \frac{1}{2} s_k^T B_k s_k \quad s.t. \quad \|s_k\| \leq \Delta_k    (2.58)

is solved in each iteration $k$ for a suitable trust region radius $\Delta_k$. The solution $s_k$ of (2.58) is hence the minimizer of $m(s_k)$ in the ball of radius $\Delta_k$. Contrary to line search, both search direction and step length are determined simultaneously. The proper choice of the degree of freedom $\Delta_k$ is crucial in a trust region method. For this, the agreement between the model function $m(s_k)$ and the objective function $f(x_k)$ at previous iterations is considered in terms of the ratio

    \varrho(s_k) = \frac{f(x_k) - f(x_k + s_k)}{m(0) - m(s_k)}.    (2.59)

Herein, the numerator is called the actual reduction and the denominator is the predicted reduction. Note that the predicted reduction is always nonnegative since $s_k$ minimizes $m(s_k)$ inside the trust region that includes $s_k = 0$. As a result, if $\varrho(s_k) < 0$, then the new value $f(x_k + s_k)$ of the objective function is greater than the current value $f(x_k)$, so that the step must be rejected and the trust region must be shrunk. For $\varrho(s_k) \approx 1$ the agreement between model and objective function is good, so that the trust region may be expanded for the next iteration. If $0 < \varrho(s_k) \ll 1$, then the trust region is shrunk in the next iteration by reducing $\Delta_k$. The principal process is summarized in Algorithm 4 [9]. Thereby, $\bar{\Delta}$ refers to the overall bound on the trust region radius. The radius is increased only if $\|s_k\|$ reaches the boundary of the trust region, i.e. when $\|s_k\| = \Delta_k$.

Algorithm 4: Trust region method.
  input: \bar{\Delta} > 0, \Delta_0 \in (0, \bar{\Delta}) (starting trust region radius), \eta \in [0, 1/4), \epsilon_x, \epsilon_f (stopping criteria)
  initialize: k = 0
  repeat
      Determine s_k by (approximately) solving (2.58)
      Evaluate \varrho(s_k) from (2.59)
      if \varrho(s_k) < 1/4 then
          \Delta_{k+1} = \frac{1}{4}\Delta_k
      else
          if \varrho(s_k) > 3/4 and \|s_k\| = \Delta_k then
              \Delta_{k+1} = \min\{2\Delta_k, \bar{\Delta}\}
          else
              \Delta_{k+1} = \Delta_k
          end
      end
      if \varrho(s_k) > \eta then
          x_{k+1} = x_k + s_k (next iterate)
          B_{k+1} = B_k + \ldots (update Hessian matrix)
      else
          x_{k+1} = x_k (repeat iteration with \Delta_{k+1} < \Delta_k)
      end
      k = k + 1
  until \|x_{k+1} - x_k\| \leq \epsilon_x \lor |f(x_{k+1}) - f(x_k)| \leq \epsilon_f;

For the implementation of trust region methods and the update of the Hessian matrix $B_{k+1}$ the reader is referred to [9, ].
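The radius update logic of Algorithm 4 can be sketched in Python as follows. As an illustrative simplification, (2.58) is solved only approximately by the so called Cauchy point, i.e. the minimizer of the model along the steepest descent direction clipped to the trust region; practical implementations use more accurate subproblem solvers:

    import numpy as np

    def cauchy_point(g, B, delta):
        # Approximate solution of (2.58): minimize the model along the
        # steepest descent direction, clipped to the trust region radius.
        gBg = g @ (B @ g)
        tau = 1.0 if gBg <= 0 else min(np.linalg.norm(g) ** 3 / (delta * gBg), 1.0)
        return -tau * delta / np.linalg.norm(g) * g

    def trust_region(f, grad, hess, x0, delta0=1.0, delta_max=10.0, eta=0.1,
                     iters=200):
        x, delta = np.asarray(x0, dtype=float), delta0
        for _ in range(iters):
            g, B = grad(x), hess(x)
            if np.linalg.norm(g) < 1e-8:
                break
            s = cauchy_point(g, B, delta)
            pred = -(s @ g + 0.5 * s @ (B @ s))   # m(0) - m(s_k) >= 0
            rho = (f(x) - f(x + s)) / pred        # ratio (2.59)
            if rho < 0.25:
                delta = 0.25 * delta              # shrink the trust region
            elif rho > 0.75 and np.isclose(np.linalg.norm(s), delta):
                delta = min(2 * delta, delta_max) # expand the trust region
            if rho > eta:
                x = x + s                         # accept the step
        return x

    P = np.array([[10.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    x = trust_region(lambda x: 0.5 * x @ P @ x - b @ x,
                     lambda x: P @ x - b, lambda x: P, np.zeros(2))
    print(x)   # approaches x* = P^{-1} b = [0.1, 1]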

2.2.4 Direct search methods

Direct (derivative free) methods are characterized by the fact that no explicit knowledge of the gradient or the Hessian matrix of the objective function $f(x)$ is needed to compute the minimum. Herein, a series of function values is computed for a set of samples to determine the subsequent iteration point. One of the most famous methods in this context is the so called simplex method of Nelder and Mead [8]. In the case of two decision variables $x \in \mathbb{R}^2$ a simplex is a triangle and the method makes use of the comparison of function values at the triangle's three vertices. The worst vertex, characterized by the largest value of the objective function $f(x)$, is rejected and replaced with a new vertex. With this, a new triangle is formed to continue the search. In the course of the process a sequence of triangles, in general of different shape, is generated with decreasing function values at the vertices. Since the size of the triangles is reduced in each step, the coordinates of the minimizer can be found.

Remark 2.4
The simplex algorithm of Nelder and Mead should not be confused with the conceptually different simplex method introduced by G.B. Dantzig in linear programming [4].

In the $n$-dimensional setting a simplex is the convex hull (cf. Definition 1.7) spanned by $n + 1$ points $x_{k,j}$, $j = 0, \ldots, n$, in the $k$-th iteration. (It reduces to a straight line if $n = 1$, a triangle for $n = 2$, a tetrahedron for $n = 3$, etc.) Denote by $x_{k,min}$ and $x_{k,max}$ those points $x_{k,j}$, $j = 0, \ldots, n$, where the objective function attains its minimum or maximum, i.e.

    f(x_{k,min}) = \min_{j=0,\ldots,n} f(x_{k,j}), \quad f(x_{k,max}) = \max_{j=0,\ldots,n} f(x_{k,j}).    (2.60)

The centroid $\bar{x}_k$ of the simplex is defined by

    \bar{x}_k = \frac{1}{n}\Big( \sum_{j=0}^{n} x_{k,j} - x_{k,max} \Big).    (2.61)

The algorithm replaces the point $x_{k,max}$ in the simplex by another point with a lower value of the objective function. In particular, $x_{k,max}$ is replaced by a new point on the line

    x_k^{ref} = \bar{x}_k + \alpha(\bar{x}_k - x_{k,max})    (2.62)

depending on $\alpha$. For this, various operations on the simplex are defined, which are summarized in Figure 2.6. During the iteration the simplex moves in the direction of the minimizer and is thereby successively contracted. Algorithm 5 summarizes the general procedure. Implementations of the Nelder-Mead simplex algorithm are available, e.g., in MATLAB and OCTAVE in terms of the function fminsearch.

    Figure 2.6: Operations involved in the simplex algorithm of Nelder and Mead: (a) reflection; (b) expansion; (c) outer contraction; (d) inner contraction; (e) shrinkage.

Convergence of the simplex algorithm of Nelder and Mead cannot be guaranteed in general and the algorithm might even approach a non-minimizer. However, in practical applications the simplex algorithm yields good results at the cost of a rather slow convergence.

2.3 Benchmark example

For the evaluation of the different techniques, subsequently Rosenbrock's problem is considered as a benchmark example []. Herein, the minimization problem is considered for the objective function

    \min_{x \in \mathbb{R}^2} f(x) = 100\big(x_2 - x_1^2\big)^2 + \big(1 - x_1\big)^2.    (2.63)

Figure 2.7 shows the profile of $f(x)$ and the corresponding isoclines.

    Figure 2.7: Rosenbrock's banana or valley function: profile and isoclines.

Exercise 2.7. Verify that $x^* = [1, 1]^T$ is a local minimizer of (2.63). Analyze whether this minimizer is global and unique. Is $f(x)$ a convex function?

Algorithm 5: Simplex algorithm of Nelder and Mead.
  input: x_{0,j}, j = 0, \ldots, n (initial simplex)
         \alpha_{ref} > 0 (reflection coefficient [\alpha_{ref} = 1])
         \alpha_{exp} > 0 (expansion coefficient [\alpha_{exp} = 1])
         \alpha_{con} \in (0, 1) (contraction coefficient [\alpha_{con} = 1/2])
         \epsilon_x, \epsilon_f (stopping criteria)
  initialize: k = 0
  repeat
      Compute x_{k,min}, x_{k,max}
      Compute centroid \bar{x}_k
      Reflection step x_{k,ref} = \bar{x}_k + \alpha_{ref}(\bar{x}_k - x_{k,max})
      if f(x_{k,ref}) < f(x_{k,min}) then
          Expansion step x_{k,exp} = x_{k,ref} + \alpha_{exp}(x_{k,ref} - \bar{x}_k)
          if f(x_{k,exp}) < f(x_{k,ref}) then x_{k,new} = x_{k,exp} else x_{k,new} = x_{k,ref} end
      else
          if f(x_{k,ref}) > \max_{j=0,\ldots,n,\ x_{k,j} \neq x_{k,max}} f(x_{k,j}) then
              if f(x_{k,max}) \leq f(x_{k,ref}) then
                  Inner contraction x_{k,new} = \alpha_{con} x_{k,max} + (1 - \alpha_{con}) \bar{x}_k
              else
                  Outer contraction x_{k,new} = \alpha_{con} x_{k,ref} + (1 - \alpha_{con}) \bar{x}_k
              end
          else
              Preserve reflection point x_{k,new} = x_{k,ref}
          end
      end
      if f(x_{k,new}) \geq f(x_{k,max}) then
          Shrinkage step x_{k+1,j} = \frac{1}{2}(x_{k,j} + x_{k,min}), j = 0, \ldots, n
      else
          x_{k,max} = x_{k,new}, \quad x_{k+1,j} = x_{k,j}, j = 0, \ldots, n
      end
      k = k + 1
  until \|x_{k+1} - x_k\| \leq \epsilon_x \lor |f(x_{k+1}) - f(x_k)| \leq \epsilon_f;

In the following, it is desired to evaluate the properties and convergence behavior of the line search, trust region and direct search methods introduced in the paragraphs above. For this, the Optimization Toolbox of MATLAB provides the two functions

    fminunc, implementing quasi-Newton as line search method as well as a trust region method;
    fminsearch, implementing the simplex method of Nelder and Mead.
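A rough Python analogue of this experiment can be set up with SciPy (assuming the scipy package is available; the starting point used below is an illustrative choice):

    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

    # Rosenbrock problem (2.63) solved with several of the discussed methods;
    # scipy.optimize plays the role of MATLAB's fminunc/fminsearch here.
    x0 = np.array([-1.0, 1.0])
    for method in ("CG", "BFGS", "Newton-CG", "trust-ncg", "Nelder-Mead"):
        res = minimize(
            rosen, x0, method=method,
            jac=rosen_der if method != "Nelder-Mead" else None,
            hess=rosen_hess if method in ("Newton-CG", "trust-ncg") else None)
        print(f"{method:12s} iter={res.nit:4d} f={res.fun:.2e} nfev={res.nfev}")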

Similarly, the Optim Package of OCTAVE enables, e.g., the use of the functions

    d2_min, implementing Newton's method;
    minimize, implementing Newton's method as well as the BFGS method as an example of quasi-Newton methods;
    fminsearch and nelder_mead_min, implementing the simplex method of Nelder and Mead.

The reader is also referred to the user supplied function minfunc, which can be obtained from [2] and provides a large selection of line search methods including those discussed in the previous sections. This function is used subsequently to evaluate the different line search methods for the Rosenbrock problem. Herein, the strong Wolfe conditions (2.20) are used by default for the step length determination provided that the user does not manually set a different option.

    Item  Method                                 Iter.  f(x*)  \|(\nabla f)(x^*)\|_2  #eval(f)
    1     Line search: steepest descent
    2     Line search: conjugated gradient
    3     Line search: Newton
    4     Line search: quasi-Newton with BFGS
    5     Trust region
    6     Direct method: Nelder-Mead

    Table 2.1: Comparison of line search, trust region and direct search methods for the Rosenbrock problem (2.63). Line search methods are evaluated using the function minfunc [2], fminunc is used for the trust region approach and fminsearch for the simplex algorithm of Nelder and Mead.

Table 2.1 summarizes the results of a comparison of the different algorithms using the functions minfunc, fminunc and fminsearch. The initial value is always set to the same point $x_0$. The corresponding behavior of the iterates is depicted in Figure 2.8. The weak performance of steepest descent is directly visible. In particular the local minimizer $x^* = [1, 1]^T$ is not even closely reached within the allotted iterations. This behavior is illustrated in Figure 2.9, where the progress in the successive iterations is depicted for 25 iterations. The steepest descent direction is orthogonal to the respective isocline, with the gradient $(\nabla f)(x_k)$ still attaining a reasonable magnitude.

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

1 Numerical optimization

1 Numerical optimization Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex

More information

EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science

EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Multidimensional Unconstrained Optimization Suppose we have a function f() of more than one

More information

Unconstrained Multivariate Optimization

Unconstrained Multivariate Optimization Unconstrained Multivariate Optimization Multivariate optimization means optimization of a scalar function of a several variables: and has the general form: y = () min ( ) where () is a nonlinear scalar-valued

More information

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

Statistics 580 Optimization Methods

Statistics 580 Optimization Methods Statistics 580 Optimization Methods Introduction Let fx be a given real-valued function on R p. The general optimization problem is to find an x ɛ R p at which fx attain a maximum or a minimum. It is of

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

Lecture V. Numerical Optimization

Lecture V. Numerical Optimization Lecture V Numerical Optimization Gianluca Violante New York University Quantitative Macroeconomics G. Violante, Numerical Optimization p. 1 /19 Isomorphism I We describe minimization problems: to maximize

More information

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived

More information

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23 Optimization: Nonlinear Optimization without Constraints Nonlinear Optimization without Constraints 1 / 23 Nonlinear optimization without constraints Unconstrained minimization min x f(x) where f(x) is

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

Optimization Methods

Optimization Methods Optimization Methods Decision making Examples: determining which ingredients and in what quantities to add to a mixture being made so that it will meet specifications on its composition allocating available

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS) Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Review of Classical Optimization

Review of Classical Optimization Part II Review of Classical Optimization Multidisciplinary Design Optimization of Aircrafts 51 2 Deterministic Methods 2.1 One-Dimensional Unconstrained Minimization 2.1.1 Motivation Most practical optimization

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

MATH 4211/6211 Optimization Quasi-Newton Method

MATH 4211/6211 Optimization Quasi-Newton Method MATH 4211/6211 Optimization Quasi-Newton Method Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 Quasi-Newton Method Motivation:

More information

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION by Darin Griffin Mohr An Abstract Of a thesis submitted in partial fulfillment of the requirements for the Doctor of

More information

Quasi-Newton methods for minimization

Quasi-Newton methods for minimization Quasi-Newton methods for minimization Lectures for PHD course on Numerical optimization Enrico Bertolazzi DIMS Universitá di Trento November 21 December 14, 2011 Quasi-Newton methods for minimization 1

More information

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,

More information

Optimization Methods for Circuit Design

Optimization Methods for Circuit Design Technische Universität München Department of Electrical Engineering and Information Technology Institute for Electronic Design Automation Optimization Methods for Circuit Design Compendium H. Graeb Version

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

Numerical Optimization: Basic Concepts and Algorithms

Numerical Optimization: Basic Concepts and Algorithms May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some

More information

Unconstrained Optimization

Unconstrained Optimization 1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation

More information

Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

More information

2. Quasi-Newton methods

2. Quasi-Newton methods L. Vandenberghe EE236C (Spring 2016) 2. Quasi-Newton methods variable metric methods quasi-newton methods BFGS update limited-memory quasi-newton methods 2-1 Newton method for unconstrained minimization

More information

ECS550NFB Introduction to Numerical Methods using Matlab Day 2

ECS550NFB Introduction to Numerical Methods using Matlab Day 2 ECS550NFB Introduction to Numerical Methods using Matlab Day 2 Lukas Laffers lukas.laffers@umb.sk Department of Mathematics, University of Matej Bel June 9, 2015 Today Root-finding: find x that solves

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

Numerical Methods for Large-Scale Nonlinear Systems

Numerical Methods for Large-Scale Nonlinear Systems Numerical Methods for Large-Scale Nonlinear Systems Handouts by Ronald H.W. Hoppe following the monograph P. Deuflhard Newton Methods for Nonlinear Problems Springer, Berlin-Heidelberg-New York, 2004 Num.

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x) Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not

More information

Optimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng

Optimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng Optimization 2 CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Optimization 2 1 / 38

More information

Fitting The Unknown 1/28. Joshua Lande. September 1, Stanford

Fitting The Unknown 1/28. Joshua Lande. September 1, Stanford 1/28 Fitting The Unknown Joshua Lande Stanford September 1, 2010 2/28 Motivation: Why Maximize It is frequently important in physics to find the maximum (or minimum) of a function Nature will maximize

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

More information

Lecture 7 Unconstrained nonlinear programming

Lecture 7 Unconstrained nonlinear programming Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Math 273a: Optimization Netwon s methods

Math 273a: Optimization Netwon s methods Math 273a: Optimization Netwon s methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 some material taken from Chong-Zak, 4th Ed. Main features of Newton s method Uses both first derivatives

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

NonlinearOptimization

NonlinearOptimization 1/35 NonlinearOptimization Pavel Kordík Department of Computer Systems Faculty of Information Technology Czech Technical University in Prague Jiří Kašpar, Pavel Tvrdík, 2011 Unconstrained nonlinear optimization,

More information

Numerical solutions of nonlinear systems of equations

Numerical solutions of nonlinear systems of equations Numerical solutions of nonlinear systems of equations Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan E-mail: min@math.ntnu.edu.tw August 28, 2011 Outline 1 Fixed points

More information

Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming Problems

Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming Problems International Journal of Scientific and Research Publications, Volume 3, Issue 10, October 013 1 ISSN 50-3153 Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming

More information

Handling nonpositive curvature in a limited memory steepest descent method

Handling nonpositive curvature in a limited memory steepest descent method IMA Journal of Numerical Analysis (2016) 36, 717 742 doi:10.1093/imanum/drv034 Advance Access publication on July 8, 2015 Handling nonpositive curvature in a limited memory steepest descent method Frank

More information

Scientific Computing: Optimization

Scientific Computing: Optimization Scientific Computing: Optimization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 March 8th, 2011 A. Donev (Courant Institute) Lecture

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Search Directions for Unconstrained Optimization

Search Directions for Unconstrained Optimization 8 CHAPTER 8 Search Directions for Unconstrained Optimization In this chapter we study the choice of search directions used in our basic updating scheme x +1 = x + t d. for solving P min f(x). x R n All

More information

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by: Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion

More information

FALL 2018 MATH 4211/6211 Optimization Homework 4

FALL 2018 MATH 4211/6211 Optimization Homework 4 FALL 2018 MATH 4211/6211 Optimization Homework 4 This homework assignment is open to textbook, reference books, slides, and online resources, excluding any direct solution to the problem (such as solution

More information

Lecture 7: Minimization or maximization of functions (Recipes Chapter 10)

Lecture 7: Minimization or maximization of functions (Recipes Chapter 10) Lecture 7: Minimization or maximization of functions (Recipes Chapter 10) Actively studied subject for several reasons: Commonly encountered problem: e.g. Hamilton s and Lagrange s principles, economics

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Quasi Newton Methods Barnabás Póczos & Ryan Tibshirani Quasi Newton Methods 2 Outline Modified Newton Method Rank one correction of the inverse Rank two correction of the

More information

Introduction to unconstrained optimization - direct search methods

Introduction to unconstrained optimization - direct search methods Introduction to unconstrained optimization - direct search methods Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Structure of optimization methods Typically Constraint handling converts the

More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 15: Nonlinear optimization Prof. John Gunnar Carlsson November 1, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 1, 2010 1 / 24

More information

Math 164: Optimization Barzilai-Borwein Method

Math 164: Optimization Barzilai-Borwein Method Math 164: Optimization Barzilai-Borwein Method Instructor: Wotao Yin Department of Mathematics, UCLA Spring 2015 online discussions on piazza.com Main features of the Barzilai-Borwein (BB) method The BB

More information

Motivation: We have already seen an example of a system of nonlinear equations when we studied Gaussian integration (p.8 of integration notes)

Motivation: We have already seen an example of a system of nonlinear equations when we studied Gaussian integration (p.8 of integration notes) AMSC/CMSC 460 Computational Methods, Fall 2007 UNIT 5: Nonlinear Equations Dianne P. O Leary c 2001, 2002, 2007 Solving Nonlinear Equations and Optimization Problems Read Chapter 8. Skip Section 8.1.1.

More information

EECS260 Optimization Lecture notes

EECS260 Optimization Lecture notes EECS260 Optimization Lecture notes Based on Numerical Optimization (Nocedal & Wright, Springer, 2nd ed., 2006) Miguel Á. Carreira-Perpiñán EECS, University of California, Merced May 2, 2010 1 Introduction

More information

Gradient-Based Optimization

Gradient-Based Optimization Multidisciplinary Design Optimization 48 Chapter 3 Gradient-Based Optimization 3. Introduction In Chapter we described methods to minimize (or at least decrease) a function of one variable. While problems

More information

Numerical Methods I Solving Nonlinear Equations

Numerical Methods I Solving Nonlinear Equations Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Multivariate Newton Minimanization

Multivariate Newton Minimanization Multivariate Newton Minimanization Optymalizacja syntezy biosurfaktantu Rhamnolipid Rhamnolipids are naturally occuring glycolipid produced commercially by the Pseudomonas aeruginosa species of bacteria.

More information

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 5. Nonlinear Equations

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 5. Nonlinear Equations Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T Heath Chapter 5 Nonlinear Equations Copyright c 2001 Reproduction permitted only for noncommercial, educational

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

ORIE 6326: Convex Optimization. Quasi-Newton Methods

ORIE 6326: Convex Optimization. Quasi-Newton Methods ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted

More information

Lecture 7: Optimization methods for non linear estimation or function estimation

Lecture 7: Optimization methods for non linear estimation or function estimation Lecture 7: Optimization methods for non linear estimation or function estimation Y. Favennec 1, P. Le Masson 2 and Y. Jarny 1 1 LTN UMR CNRS 6607 Polytetch Nantes 44306 Nantes France 2 LIMATB Université

More information

Notes on Numerical Optimization

Notes on Numerical Optimization Notes on Numerical Optimization University of Chicago, 2014 Viva Patel October 18, 2014 1 Contents Contents 2 List of Algorithms 4 I Fundamentals of Optimization 5 1 Overview of Numerical Optimization

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

Optimization Methods for Machine Learning

Optimization Methods for Machine Learning Optimization Methods for Machine Learning Sathiya Keerthi Microsoft Talks given at UC Santa Cruz February 21-23, 2017 The slides for the talks will be made available at: http://www.keerthis.com/ Introduction

More information

Step lengths in BFGS method for monotone gradients

Step lengths in BFGS method for monotone gradients Noname manuscript No. (will be inserted by the editor) Step lengths in BFGS method for monotone gradients Yunda Dong Received: date / Accepted: date Abstract In this paper, we consider how to directly

More information

Maria Cameron. f(x) = 1 n

Maria Cameron. f(x) = 1 n Maria Cameron 1. Local algorithms for solving nonlinear equations Here we discuss local methods for nonlinear equations r(x) =. These methods are Newton, inexact Newton and quasi-newton. We will show that

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Global Convergence of Perry-Shanno Memoryless Quasi-Newton-type Method. 1 Introduction

Global Convergence of Perry-Shanno Memoryless Quasi-Newton-type Method. 1 Introduction ISSN 1749-3889 (print), 1749-3897 (online) International Journal of Nonlinear Science Vol.11(2011) No.2,pp.153-158 Global Convergence of Perry-Shanno Memoryless Quasi-Newton-type Method Yigui Ou, Jun Zhang

More information