Review of Classical Optimization


Part II
Review of Classical Optimization

Multidisciplinary Design Optimization of Aircrafts 51

2 Deterministic Methods

2.1 One-Dimensional Unconstrained Minimization

Motivation

Most practical optimization problems involve many variables, so the study of single-variable minimization may seem academic. However, the optimization of multi-variable functions can be broken into two parts: 1. finding a suitable search direction; 2. minimizing along that direction. The second part of this strategy, the so-called line search, is the motivation for studying single-variable minimization.

Consider a scalar function, f, that depends on a single independent variable, x. Suppose we want to find the value of x where f(x) is a minimum:

minimize f(x) by varying x ∈ ℝ (2.1)

Furthermore, we want to do this with low computational cost (few iterations and low cost per iteration), low memory requirements, and a low failure rate. Often the computational effort is dominated by the computation of f and its derivatives, so some of these requirements can be translated into: evaluate f(x) and df/dx as few times as possible.

You end up having a few choices:
- Choose methods that do or do not require the evaluation of function derivatives (if you can compute derivatives cheaply, you may want to use them);
- If the function is pathologically badly behaved, you may want to avoid the use of derivatives;
- When using bracketing methods, choose the approach that provides faster rates of convergence in general;
- In multi-dimensional cases, choose between methods that require order-N and order-N² storage.

2.1.2 Types of Minima

The point x* is a:
- strong local minimizer, if f(x*) < f(x) for all x near x*;
- weak local minimizer, if f(x*) ≤ f(x) for all x near x*;
- strong global minimizer, if f(x*) < f(x), for all x;
- weak global minimizer, if f(x*) ≤ f(x), for all x.

If a minimum does not exist, the function is not bounded below.

2.1.3 Optimality Conditions

Taylor's theorem is useful for identifying local minima.

Theorem [Taylor's theorem]. If f(x) is n times differentiable, then there exists θ (0 ≤ θ ≤ 1) such that

f(x + h) = f(x) + h f′(x) + (h²/2!) f″(x) + ... + (h^(n−1)/(n−1)!) f^(n−1)(x) + (h^n/n!) f^(n)(x + θh),

where the last term is O(h^n).

Assuming f is twice-continuously differentiable and a minimum of f exists at x*, Taylor's theorem with n = 2 and x = x* leads to

f(x* + ε) = f(x*) + ε f′(x*) + (ε²/2) f″(x* + θε). (2.2)

For a local minimum at x*, it is required that f(x* + ε) ≥ f(x*) for a range −δ ≤ ε ≤ δ, where δ is a positive number. Given this definition and the Taylor series expansion (2.2), a local minimum requires

ε f′(x*) + (ε²/2) f″(x* + θε) ≥ 0.

If f′(x*) ≠ 0, then for any finite value of f″, ε can always be chosen small enough that |ε f′(x*)| > (ε²/2) |f″(x* + θε)|, so the first-order term dominates the sign of the left-hand side.

For ε f′(x*) to be non-negative for both signs of ε, we must have f′(x*) = 0, because the sign of ε is arbitrary. This is the first-order optimality condition. A point that satisfies the first-order optimality condition is called a stationary point. Besides minima, other types of stationary points include maxima and inflection points.

Because the first-derivative term is zero, the second-derivative term must be considered. This term must be non-negative for a local minimum at x*. Since ε² is always positive, f″(x*) ≥ 0. This is the second-order optimality condition. Higher-order terms can always be made smaller than the second-order term by choosing a small enough ε.

Discontinuities: many optimizers fail in the presence of discontinuities. This is especially critical for gradient-based optimizers and others that look only at the local region of the design space.

Necessary conditions (for a local minimum):

f′(x*) = 0; f″(x*) ≥ 0 (2.3)

Sufficient conditions (for a strong local minimum):

f′(x*) = 0; f″(x*) > 0 (2.4)

The optimality conditions can be used to:
- verify that a point is a minimum (sufficient conditions);
- realize that a point is not a minimum (necessary conditions);
- define equations that can be solved to find a minimum (in simple cases).

Gradient-based minimization methods find a local minimum by finding points that satisfy the optimality conditions.

2.1.4 Rate of Convergence

The rate of convergence is a measure of how fast an iterative method converges to the numerical solution. An iterative method is said to converge with order r when r > 0 is the largest number such that

0 < lim_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖^r < ∞, (2.5)

where k is the iteration number. This is to say that the above limit must be a positive, finite constant. This constant is the asymptotic error constant, γ:

lim_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖^r = γ. (2.6)

If the limit is zero when r = 1, we have a special case called superlinear convergence. When r = 2, the sequence converges quadratically, meaning that the number of correct figures roughly doubles with each iteration. When solving real problems, the exact x* is not known in advance, but it is useful to plot ‖x_{k+1} − x_k‖ and ‖g_k‖ (the norm of the gradient) versus k on a log-axis plot.

Some examples from Gill et al. [29]:

Example 2.1: x_k = c^(2^k), for 0 ≤ c < 1. Each member is the square of the previous one, and the limit is zero. Since

|x_{k+1} − 0| / |x_k − 0|² = c^(2^(k+1)) / c^(2·2^k) = 1,

r = 2 (quadratic convergence) with γ = 1.

Example 2.2: y_k = c^(2^(−k)), for c ≠ 0. Each member is the square root of the previous one, and the limit is 1. Since

|y_{k+1} − 1| / |y_k − 1| = (c^(2^(−(k+1))) − 1) / (c^(2^(−k)) − 1) → 1/2,

r = 1 (linear convergence) and γ = 1/2.
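These definitions can also be checked numerically. Below is a minimal Python sketch (the helper name estimate_order is illustrative, not from the text) that recovers the order r of Example 2.1 from three successive errors:

```python
# Estimate the convergence order r from three successive errors, using
# r ≈ log(e_next/e_curr) / log(e_curr/e_prev) for errors e_k = |x_k - x*|.
import math

def estimate_order(e_prev, e_curr, e_next):
    """Return the empirical convergence order from three successive errors."""
    return math.log(e_next / e_curr) / math.log(e_curr / e_prev)

# Example 2.1: x_k = c**(2**k) with c = 0.5 converges quadratically to 0.
errors = [0.5 ** (2 ** k) for k in range(5)]
r = estimate_order(errors[1], errors[2], errors[3])
```

For this sequence the estimate returns r = 2, matching the quadratic convergence derived above.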

2.1.5 Unimodality and Bracketing the Minimum

Line search methods using bracketing require the function f to be unimodal, that is, it monotonically decreases as we approach x* from the left and then monotonically increases to the right of x* (it has a single local minimum).

Example of a unimodal function

The first step in the process of finding the minimum is to bracket it in an interval.

Input: function f, starting point x_1, step size Δ, expansion parameter γ ≥ 1
Output: three-point pattern x_1, x_2, x_3 such that f_1 ≥ f_2 < f_3

begin
    set x_2 ← x_1 + Δ
    evaluate f_1 and f_2
    if f_2 > f_1 then
        interchange f_1 and f_2, x_1 and x_2, and set Δ ← −Δ
    end
    repeat
        if f_3 not null then
            rename f_2 as f_1, f_3 as f_2, x_2 as x_1, x_3 as x_2
        end
        set Δ ← γΔ, x_3 ← x_2 + Δ, and evaluate f_3
    until f_3 > f_2
end

Pseudo-code 1: Bracketing Algorithm

Common values for γ are 2 (step size doubled at each successive iteration) or the golden section ratio (≈ 1.618). The three-point pattern is needed for all interval reduction methods.
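Pseudo-code 1 can be sketched in Python as follows; the function name, default step size, and iteration cap are illustrative choices, not part of the original algorithm:

```python
def bracket_minimum(f, x1, step=0.1, gamma=2.0, max_iter=100):
    """Expand a step from x1 until a three-point pattern f1 >= f2 < f3 is
    found. A sketch of Pseudo-code 1 (bracketing algorithm)."""
    f1 = f(x1)
    x2 = x1 + step
    f2 = f(x2)
    if f2 > f1:                       # wrong direction: swap and reverse step
        x1, x2, f1, f2 = x2, x1, f2, f1
        step = -step
    for _ in range(max_iter):
        step *= gamma                 # expand the step
        x3 = x2 + step
        f3 = f(x3)
        if f3 > f2:                   # minimum is bracketed by (x1, x2, x3)
            return x1, x2, x3
        x1, x2, f1, f2 = x2, x3, f2, f3
    raise RuntimeError("no bracket found")

a, b, c = bracket_minimum(lambda x: (x - 4.0) ** 2, 0.0)
```

For the quadratic above, the returned triple brackets the minimizer x = 4.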

2.1.6 Interval Reduction Methods

These methods for function minimization start with an interval of uncertainty containing the minimum, which could have been determined using the bracketing algorithm, and successively reduce its size to a desired tolerance. These methods should be robust and efficient, that is, they should converge to the minimum using a reduced number of function evaluations. Among the most common methods are:
- Fibonacci method;
- Golden section method;
- Polynomial-based methods.

Fibonacci Method

The Fibonacci method is the strategy that yields the maximum reduction in the interval of uncertainty for a given number of function evaluations. Leonardo of Pisa (nicknamed Fibonacci) found a sequence of numbers that describes the evolution of a population of rabbits:

Rabbit population and Fibonacci numbers

The first few numbers of this sequence are 1, 1, 2, 3, 5, 8, 13, .... In general, the sequence of Fibonacci numbers can be generated using

F_0 = F_1 = 1 (2.7)
F_k = F_{k−1} + F_{k−2}, k = 2, ..., n (2.8)

Say we have an interval of uncertainty and the function has been evaluated at its boundaries. To reduce the interval of uncertainty, we have to evaluate two new points inside the interval. Then (assuming the function is unimodal) the new interval of uncertainty is the one that contains the interior point with the lower function value. The most efficient way of reducing the size of the interval is to: (1) ensure that the two possible intervals of uncertainty are the same size, and (2) reuse the interior point of the chosen interval, so that only one more function evaluation is required to select the next interval of uncertainty.

Fibonacci: Sequence of intervals

The interval sizes, I_k, are such that

I_1 = I_2 + I_3
I_2 = I_3 + I_4
...
I_k = I_{k+1} + I_{k+2} (2.9)
...
I_{N−4} = I_{N−3} + I_{N−2} = 8 I_N
I_{N−3} = I_{N−2} + I_{N−1} = 5 I_N
I_{N−2} = I_{N−1} + I_N = 3 I_N
I_{N−1} = 2 I_N

Recognizing the Fibonacci numbers, the following relation holds:

I_{n−j} = F_{j+1} I_n, j = 1, 2, ..., n − 1 (2.10)

To find the successive interval sizes, we need to start from the last interval and work in reverse order; only after this can we start the search. When using the Fibonacci search, we have to decide on the number of function evaluations a priori. This is not always convenient, as the termination criterion is often the variation of the function values in the interval of uncertainty. Furthermore, this method requires that the sequence be stored. Fibonacci search is the optimum because, in addition to yielding two intervals that are the same size and reusing one point, the interval between the two interior points converges to zero, and in the final iteration the interval is divided into two almost exactly equal halves, which is the optimum strategy for the last iteration. A detailed description of this method can be found in [10], pp.

Input: function f, starting values x_1 and x_4 bracketing the minimum, tolerance ε or number of evaluations N (condition: x_1 < x_4)
Output: interval of size smaller than ε that contains the minimum of f(x)

begin
    I_1 ← x_4 − x_1
    if ε is given then
        N ← smallest N such that I_N = I_1/F_N ≤ ε
    end
    I_2 ← (F_{N−1}/F_N) I_1
    x_2 ← x_4 − I_2 ; x_3 ← x_1 + I_2
    evaluate f_2 ← f(x_2) and f_3 ← f(x_3)
    for k = 2 to N − 1 do
        I_{k+1} ← I_{k−1} − I_k
        if f_2 < f_3 then                  (minimum in [x_1, x_3])
            x_4 ← x_3 ; x_3 ← x_2 ; f_3 ← f_2
            x_2 ← x_4 − I_{k+1} ; f_2 ← f(x_2)
        else                               (minimum in [x_2, x_4])
            x_1 ← x_2 ; x_2 ← x_3 ; f_2 ← f_3
            x_3 ← x_1 + I_{k+1} ; f_3 ← f(x_3)
        end
    end
    output [x_1, x_4] as the final interval
end

Pseudo-code 2: Fibonacci Algorithm
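A compact Python sketch of the Fibonacci search follows. It is illustrative rather than definitive: the loop stops just before the final degenerate halving step, which textbook implementations handle with a small ε offset, so it performs n − 1 rather than n evaluations:

```python
def fibonacci_search(f, a, b, n):
    """Shrink the bracket [a, b] around the minimum of a unimodal f using
    Fibonacci interval ratios (a sketch of Pseudo-code 2). The final
    degenerate halving step is skipped for simplicity."""
    fib = [1, 1]
    while len(fib) < n + 1:
        fib.append(fib[-1] + fib[-2])
    x1 = a + fib[n - 2] / fib[n] * (b - a)   # left interior point
    x2 = a + fib[n - 1] / fib[n] * (b - a)   # right interior point
    f1, f2 = f(x1), f(x2)
    for k in range(1, n - 2):
        if f1 < f2:                          # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1           # reuse x1 as new right point
            x1 = a + fib[n - k - 2] / fib[n - k] * (b - a)
            f1 = f(x1)
        else:                                # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2           # reuse x2 as new left point
            x2 = a + fib[n - k - 1] / fib[n - k] * (b - a)
            f2 = f(x2)
    return a, b

lo, hi = fibonacci_search(lambda x: (x - 1.0) ** 2, 0.0, 3.0, 10)
```

Each iteration reuses one interior point, so only one new function evaluation is needed per reduction, exactly as argued in the text.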

Golden Section Method

In the golden section search, the interval reduction strategy is uniform and thus independent of the number of iterations. The interval sizes, I_k, are such that

I_1 = I_2 + I_3
I_2 = I_3 + I_4
...

Then, imposing a constant reduction ratio,

I_2/I_1 = I_3/I_2 = I_4/I_3 = ... = τ.

Substituting into I_1 = I_2 + I_3 and dividing by I_1 gives

τ² + τ − 1 = 0. (2.11)

The positive solution of this equation is the golden section ratio, τ = (√5 − 1)/2 ≈ 0.618. Moreover,

τ = lim_{k→∞} F_{k−1}/F_k, (2.12)

therefore the Fibonacci search also reaches this value in the limit.

The basic golden section algorithm is similar to the Fibonacci algorithm, except that the interval ratio F_{N−1}/F_N is replaced by the constant τ. Both the Fibonacci and golden section methods always yield two equal intervals and reuse a previous interior point, but the latter does not use an optimal strategy for the last iteration: there is no last iteration, since the interval is always divided in the same proportions. Assume the initial uncertainty interval is I_1 = [0, 1]. The function is then evaluated at 1 − τ and τ. The two possible intervals are [0, τ] and [1 − τ, 1], and they are of the same size. If, say, [0, τ] is selected, then the next two interior points would be τ(1 − τ) and τ·τ, but τ² = 1 − τ, which has already been evaluated.

Golden section: Sequence of intervals
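The reuse argument above translates directly into code. A minimal sketch (names and default tolerance are illustrative):

```python
def golden_section(f, a, b, tol=1e-6):
    """Golden-section search on a unimodal f over [a, b]. Each iteration
    evaluates f once, reusing the surviving interior point."""
    tau = (5 ** 0.5 - 1) / 2          # golden section ratio, ~0.618
    x1 = b - tau * (b - a)            # left interior point
    x2 = a + tau * (b - a)            # right interior point
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                   # minimum in [a, x2]
            b, x2, f2 = x2, x1, f1    # old x1 becomes the new right point
            x1 = b - tau * (b - a)
            f1 = f(x1)
        else:                         # minimum in [x1, b]
            a, x1, f1 = x1, x2, f2    # old x2 becomes the new left point
            x2 = a + tau * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

xmin = golden_section(lambda x: (x - 2.0) ** 2 + 1.0, 0.0, 5.0)
```

The identity τ² = 1 − τ is what guarantees that the retained interior point lands exactly where the next iteration needs it.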

Similarly to the Fibonacci method, the golden section method has linear convergence, meaning that successive significant figures are gained linearly with additional function evaluations. This method can also be integrated with the three-point bracketing algorithm by choosing the expansion parameter as 1 + τ. A detailed description of this method can be found in [10], pp.

Polynomial-Based Methods

More efficient procedures use information about f gathered during the iterations. One way of using this information is to produce an estimate of the function that we can easily minimize. The lowest-order function that we can use for this purpose is a quadratic, since a linear function does not have a minimum. Suppose we approximate f by

f̃ = ½ a x² + b x + c. (2.13)

If a > 0, the minimum of this function is at x* = −b/a.

To generate a quadratic approximation, three independent pieces of information are needed. For example, if we have the value of the function, its first derivative, and its second derivative at point x_k, we can write a quadratic approximation of the function value at x as the first three terms of a Taylor series,

f(x) ≈ f(x_k) + f′(x_k)(x − x_k) + ½ f″(x_k)(x − x_k)². (2.14)

If f″(x_k) is not zero, minimizing this quadratic and setting x = x_{k+1} yields

x_{k+1} = x_k − f′(x_k) / f″(x_k). (2.15)

This is Newton's method used to find a zero of the first derivative. Robust algorithms are obtained when polynomial-fit and sectioning ideas are merged, such as in Brent's quadratic fit-sectioning algorithm.
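Iterating Eq. (2.15) gives a minimization loop. A hedged sketch (function names and the test problem are illustrative; recall the method assumes f″ ≠ 0 near the solution and is not globally convergent):

```python
def newton_minimize(fprime, fsecond, x0, tol=1e-10, max_iter=50):
    """Minimize a function by applying iteration (2.15) to its first
    derivative. Illustrative sketch; no safeguards for f'' <= 0."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("did not converge")

# f(x) = x^4 - 4x has f'(x) = 4x^3 - 4 and f''(x) = 12x^2; minimum at x = 1.
xmin = newton_minimize(lambda x: 4 * x ** 3 - 4,
                       lambda x: 12 * x ** 2, 2.0)
```

Starting from x = 2, the iterates converge quadratically toward the minimizer x = 1.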

Brent's Quadratic Fit-Sectioning Algorithm

Brent [11] devised a method that fits a quadratic polynomial and accepts the quadratic minimum when the function is cooperative, and uses the golden section method otherwise. At any particular stage, Brent's algorithm keeps track of six points (not necessarily all distinct), a, b, u, v, w and x, defined as follows:
- the minimum is bracketed between a and b;
- x is the point with the least function value found so far (or the most recent one in case of a tie);
- w is the point with the second least function value;
- v is the previous value of w;
- u is the point at which the function was evaluated most recently.

The general idea is the following: parabolic interpolation is attempted, fitting through the points x, v, and w. To be acceptable, the parabolic step must (1) fall within the bounding interval (a, b), and (2) imply a movement from the best current value x that is less than half the movement of the step before last. This second criterion ensures that the parabolic steps are converging, rather than, say, bouncing around in some non-convergent limit cycle. The minimum of the quadratic that fits f(x), f(v) and f(w) is

u = x − ½ [(x − w)² (f(x) − f(v)) − (x − v)² (f(x) − f(w))] / [(x − w)(f(x) − f(v)) − (x − v)(f(x) − f(w))]. (2.16)

Brent's method converges superlinearly, meaning that the rate at which successive significant figures are gained increases with each successive function evaluation. A detailed description of this method can be found in [10], pp.
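The parabolic step of Eq. (2.16) is easy to check in isolation. The sketch below (helper name illustrative) computes u for three sample points; for an exactly quadratic function it recovers the true minimizer in one step:

```python
def parabolic_minimum(x, w, v, fx, fw, fv):
    """Minimum of the parabola through (x, fx), (w, fw), (v, fv), per
    Eq. (2.16). Returns None if the points are collinear."""
    num = (x - w) ** 2 * (fx - fv) - (x - v) ** 2 * (fx - fw)
    den = (x - w) * (fx - fv) - (x - v) * (fx - fw)
    if den == 0.0:
        return None                   # degenerate fit: fall back to sectioning
    return x - 0.5 * num / den

f = lambda t: (t - 1.5) ** 2 + 2.0    # parabola with minimum at t = 1.5
u = parabolic_minimum(0.0, 1.0, 3.0, f(0.0), f(1.0), f(3.0))
```

In Brent's algorithm this u is accepted only when it passes the two safeguards described above; otherwise a golden section step is taken instead.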

Input: function f, three-point pattern a, b and x bracketing the minimum, tolerance ε (condition: f(a) ≥ f(x) < f(b))
Output: interval of size smaller than 2ε that contains the minimum of f(x)

begin
    w, v ← x
    repeat
        if x, w and v are distinct then
            try a quadratic fit through x, w and v, and determine its minimum u using (2.16)
            if u is close to a, b or x, adjust it into the larger of [a, x] or [x, b] so it stays ε away from x
        else
            calculate u by golden sectioning of the larger of [a, x] or [x, b]
        end
        evaluate f(u); among a, b, x, w, v, u determine the new a, b, x, w, v
    until the larger of [a, x] and [x, b] is smaller than 2ε
end

Pseudo-code 3: Brent's Algorithm for a Minimum

Akima Splines

Polynomial interpolation often leads to spurious oscillations, especially when the original function exhibits abrupt changes in curvature. Hiroshi Akima, in 1970, published a one-dimensional fitting method that has some very desirable properties [2]. Akima claims that his method is closer to a manually drawn curve than those drawn by other mathematical methods. In 1991, Akima published an update to his algorithm [4, 3] addressing some shortcomings of the original. The approach uses a cubic fit between the data points, so the slope is required at each data point in addition to the value of the point itself. The interpolating polynomial between the i-th and (i+1)-th data points is written as

y = a_0 + a_1 (x − x_i) + a_2 (x − x_i)² + a_3 (x − x_i)³, (2.17)

with coefficients defined by

a_0 = y_i (2.18)
a_1 = y′_i
a_2 = (3 m_i − 2 y′_i − y′_{i+1}) / (x_{i+1} − x_i)
a_3 = (y′_i + y′_{i+1} − 2 m_i) / (x_{i+1} − x_i)²

and

m_i = (y_{i+1} − y_i) / (x_{i+1} − x_i), (2.19)

which is the slope of the line segment passing through the points. The method of determining the derivatives, y′, is what makes the Akima methods unique. In the 1991 method, the derivative is

y′_i = (Σ_k ω_k f_k) / (Σ_k ω_k), (2.20)

where f_k is the computed derivative at P_i of a third-order polynomial passing through P_i and three other nearby points:

f_1 = F(P_{i−3}, P_{i−2}, P_{i−1}, P_i) (2.21)
f_2 = F(P_{i−2}, P_{i−1}, P_i, P_{i+1})
f_3 = F(P_{i−1}, P_i, P_{i+1}, P_{i+2})
f_4 = F(P_i, P_{i+1}, P_{i+2}, P_{i+3})

The weights are inversely proportional to the product of what Akima calls a volatility measure and a distance measure,

ω_k = 1 / (v_k d_k). (2.22)

The distance factor is the sum of squares of the distances from P_i to the other three points:

d_1 = (x_{i−3} − x_i)² + (x_{i−2} − x_i)² + (x_{i−1} − x_i)² (2.23)
d_2 = (x_{i−2} − x_i)² + (x_{i−1} − x_i)² + (x_{i+1} − x_i)²
d_3 = (x_{i−1} − x_i)² + (x_{i+1} − x_i)² + (x_{i+2} − x_i)²
d_4 = (x_{i+1} − x_i)² + (x_{i+2} − x_i)² + (x_{i+3} − x_i)²

The volatility factor, v_k, is the sum of squares of the deviations from a least-squares linear fit of the four points.
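The segment coefficients (2.18)-(2.19) are the standard cubic Hermite coefficients; only the derivative estimate (2.20)-(2.23) is Akima-specific. The sketch below checks the coefficient formulas alone, with arbitrary endpoint values and slopes (the helper name and test data are illustrative, and the Akima weighting itself is not reproduced):

```python
def hermite_segment(xi, xi1, yi, yi1, di, di1):
    """Coefficients of the cubic (2.17) between points i and i+1, given
    endpoint values y and endpoint derivatives d, per Eqs. (2.18)-(2.19)."""
    h = xi1 - xi
    m = (yi1 - yi) / h                # chord slope, Eq. (2.19)
    a0 = yi
    a1 = di
    a2 = (3 * m - 2 * di - di1) / h
    a3 = (di + di1 - 2 * m) / h ** 2
    return a0, a1, a2, a3

# Segment from (0, 1) with slope 0 to (2, 5) with slope 1.
a0, a1, a2, a3 = hermite_segment(0.0, 2.0, 1.0, 5.0, 0.0, 1.0)
y_end = a0 + a1 * 2.0 + a2 * 2.0 ** 2 + a3 * 2.0 ** 3   # value at x_{i+1}
```

Evaluating the cubic at the far endpoint reproduces y_{i+1}, confirming the coefficients interpolate both values and slopes.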

2.1.7 Zero of a Function

Solving the first-order optimality condition, that is, finding x* such that g(x*) = f′(x*) = 0, is equivalent to finding the roots of the first derivative of the function to be minimized. In addition, in constrained optimization, zero-finding problems also occur due to constraints, i.e., h(x) = 0. Therefore, root-finding methods can be used to find stationary points and are useful in function minimization. Zero-finding algorithms can be classified in two basic categories, depending on the starting guess:
- Interval of uncertainty: bisection method;
- Arbitrary point: Newton's method, secant method.

Bisection Method

This method for finding the zero of a function f starts with two guesses forming an initial bracket [a, b] containing the root, for which the function values f(a) and f(b) have opposite signs. A new guess is then chosen at the midpoint, c = ½(a + b). The procedure is repeated with the new guess and whichever previous guess still brackets the root, until the desired accuracy is obtained.

If [a, b] is the initial interval and N is the number of iterations, the final interval size is

δ = |a − b| / 2^N  ⟺  2^N = |a − b| / δ  ⟺  N = log₂(|a − b| / δ), (2.24)

therefore this method is guaranteed to find the zero to a specified tolerance δ in about log₂(|a − b|/δ) function evaluations. Bisection yields the smallest interval of uncertainty for a specified number of function evaluations. It has the advantage that it always converges, provided that the initial interval contains a zero. Because it is a bracketing method, it generates a set of nested intervals. The only drawback is that the rate of convergence is rather slow: since δ_{k+1} = δ_k / 2, from the definition of rate of convergence, for r = 1,

lim_{k→∞} δ_{k+1} / δ_k = 1/2,

therefore the bisection algorithm exhibits a linear rate of convergence (r = 1) with asymptotic error constant 1/2.

To find the minimum of a function using bisection, we would evaluate the derivative of f at each iteration instead of the function value. Using finite machine precision, it is not possible to find the exact zero, so we will be satisfied with finding an x* that belongs to an interval [a, b] such that the function g (≡ f′) satisfies

g(a) g(b) < 0 and |a − b| < δ,

where δ is a small tolerance. This tolerance might be dictated by the machine representation (double precision carries roughly 16 significant digits), the precision of the function evaluation, or a limit on the number of iterations we want to perform with the root-finding algorithm.

Input: function f, endpoint values a and b, tolerance ε or maximum iterations N (conditions: a < b and f(a) f(b) < 0)
Output: value that differs from a root of f(x) = 0 by less than ε

begin
    k ← 1
    while k ≤ N do
        c ← (a + b)/2
        if f(c) = 0 or (b − a)/2 < ε then
            return c and stop
        end
        k ← k + 1
        if sign(f(c)) = sign(f(a)) then a ← c else b ← c end
    end
    output "Method failed"
end

Pseudo-code 4: Bisection Method
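Pseudo-code 4 maps almost line for line to Python; the sketch below is illustrative (names and defaults are choices, not part of the pseudocode):

```python
def bisect(f, a, b, tol=1e-8, max_iter=100):
    """Bisection root finding on [a, b], assuming f(a) f(b) < 0.
    A sketch of Pseudo-code 4."""
    fa = f(a)
    for _ in range(max_iter):
        c = 0.5 * (a + b)
        fc = f(c)
        if fc == 0.0 or 0.5 * (b - a) < tol:
            return c
        if (fc > 0) == (fa > 0):     # root lies in [c, b]
            a, fa = c, fc
        else:                        # root lies in [a, c]
            b = c
    raise RuntimeError("maximum iterations exceeded")

root = bisect(lambda x: x ** 2 - 2.0, 0.0, 2.0)
```

Per Eq. (2.24), reaching tol = 1e-8 from an interval of width 2 takes about log₂(2/1e-8) ≈ 28 evaluations, comfortably within the iteration cap.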

Newton-Raphson Method

Newton's method for finding a zero can be derived from the Taylor series expansion of the function about some initial guess x_k,

f(x_{k+1}) = f(x_k) + (x_{k+1} − x_k) f′(x_k) + O((x_{k+1} − x_k)²),

where x_{k+1} = x_k + Δx. Setting the function to zero and ignoring the terms of second and higher order results in

f(x_k) + (x_{k+1} − x_k) f′(x_k) ≈ 0.

Solving for the new estimate, x_{k+1}, yields

x_{k+1} = x_k − f(x_k) / f′(x_k). (2.25)

This iterative procedure converges quadratically, so

lim_{k→∞} |x_{k+1} − x*| / |x_k − x*|² = const.

While quadratic convergence is a great property, this method is not guaranteed to converge, and it only works under certain conditions. To minimize a function using Newton's method, we simply replace the function by its first derivative and the first derivative by the second derivative,

x_{k+1} = x_k − f′(x_k) / f″(x_k). (2.26)

Input: function f, starting value x_0, tolerance ε, maximum iterations N
Output: value that differs from a root of f(x) = 0 by less than ε

begin
    k ← 1
    while k ≤ N do
        x_{k+1} ← x_k − f(x_k)/f′(x_k)
        if |x_{k+1} − x_k| < ε or |f(x_{k+1})| < ε then
            return x_{k+1} and stop
        end
        x_k ← x_{k+1}
        k ← k + 1
    end
    output "Method failed"
end

Pseudo-code 5: Newton-Raphson Method
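A minimal Python sketch of Pseudo-code 5 for root finding (names and defaults illustrative; recall convergence is only local):

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration (2.25). Converges quadratically near a
    simple root, but is not globally guaranteed to converge."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("did not converge")

# Root of f(x) = x^2 - 2, i.e. sqrt(2), starting from x0 = 1.
root = newton(lambda x: x ** 2 - 2.0, lambda x: 2.0 * x, 1.0)
```

Starting from x = 1, the iterates 1.5, 1.4167, 1.41422, ... show the characteristic doubling of correct digits per step.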

Example 2.3: Function Minimization Using Newton's Method

Solve the single-variable optimization problem

minimize f(x) = (x − 3) x³ (x − 6)⁴ w.r.t. x

using Newton's method with several different initial guesses. The x_k are the Newton iterates and x_N is the converged solution.

Secant Method

Newton's method requires the first derivative at each iteration (and the second derivative when applied to minimization). In some practical applications, it might not be possible to obtain this derivative analytically, or it might just be troublesome. If we use a backward-difference approximation for f′(x_k),

f′(x_k) ≈ (f(x_k) − f(x_{k−1})) / (x_k − x_{k−1}),

and substitute into Newton's method, we obtain

x_{k+1} = x_k − f(x_k) (x_k − x_{k−1}) / (f(x_k) − f(x_{k−1})), (2.27)

which is the secant method ("the poor man's Newton method"). Under favorable conditions, this method has superlinear convergence (1 < r < 2), with r ≈ 1.618 (the golden ratio).

Input: function f, starting values x_0 and x_1 near the root, tolerance ε, maximum iterations N
Output: value that differs from a root of f(x) = 0 by less than ε

begin
    k ← 1
    while k ≤ N do
        if |f(x_{k−1})| < |f(x_k)| then
            swap x_{k−1} and x_k
        end
        x_{k+1} ← x_k − f(x_k) (x_k − x_{k−1}) / (f(x_k) − f(x_{k−1}))
        if |x_{k+1} − x_k| < ε or |f(x_{k+1})| < ε then
            return x_{k+1} and stop
        end
        x_{k−1} ← x_k ; x_k ← x_{k+1}
        k ← k + 1
    end
    output "Method failed"
end

Pseudo-code 6: Secant Method
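A Python sketch of the secant iteration (2.27) follows; names and defaults are illustrative, and the swap step of Pseudo-code 6 is omitted for brevity:

```python
def secant(f, x0, x1, tol=1e-10, max_iter=100):
    """Secant iteration (2.27): Newton's method with the derivative
    replaced by a backward-difference approximation."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1              # shift the two-point history
        x1, f1 = x2, f(x2)
    raise RuntimeError("did not converge")

# Root of x^3 - x - 2 = 0, near 1.52.
root = secant(lambda x: x ** 3 - x - 2.0, 1.0, 2.0)
```

Only one new function evaluation is needed per iteration, since f(x_{k−1}) is carried over from the previous step.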

Linear Interpolation Method

The bisection method is very simple but generally quite inefficient, in part because it only makes use of the sign of the function f(x) at each evaluation while ignoring its magnitude. It thus ignores significant information that could be used to accelerate the finding of the root. A method based on interpolation makes use of this information by approximating the function on the interval [x_1, x_2] by the chord joining the points (x_1, f(x_1)) and (x_2, f(x_2)), that is, the straight line

(y − y_1) / (x − x_1) = (y_2 − y_1) / (x_2 − x_1). (2.28)

Solving this linear equation for y = 0 yields the new interval endpoint within the interval [x_1, x_2]:

x_3 = x_1 − f(x_1) (x_2 − x_1) / (f(x_2) − f(x_1)). (2.29)

The choice between the two intervals [x_1, x_3] and [x_3, x_2] is decided by evaluating f(x_3) and discarding the interval whose endpoints have the same sign, as was done in the bisection method. This iteration process is repeated, but it converges more quickly than the bisection method, since the information about the magnitude of f(x) pushes x_3 more quickly towards the actual root.

Input: function f, starting values x_0 and x_1 bracketing the root, tolerance ε, maximum iterations N (condition: f(x_0) f(x_1) < 0)
Output: value that differs from a root of f(x) = 0 by less than ε

begin
    k ← 1
    while k ≤ N do
        x_{k+1} ← x_k − f(x_k) (x_k − x_{k−1}) / (f(x_k) − f(x_{k−1}))
        if |x_{k+1} − x_k| < ε or |f(x_{k+1})| < ε then
            return x_{k+1} and stop
        end
        if f(x_{k−1}) f(x_{k+1}) < 0 then
            x_k ← x_{k+1}
        else
            x_{k−1} ← x_{k+1}
        end
        k ← k + 1
    end
    output "Method failed"
end

Pseudo-code 7: Linear Interpolation Method
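The method (also known as false position, or regula falsi) can be sketched in Python as follows; names and tolerances are illustrative choices:

```python
def regula_falsi(f, a, b, tol=1e-10, max_iter=200):
    """Linear-interpolation (false position) root finding via Eq. (2.29):
    the bracket endpoint whose sign matches the new point is replaced."""
    fa, fb = f(a), f(b)
    for _ in range(max_iter):
        c = a - fa * (b - a) / (fb - fa)   # chord crosses zero here
        fc = f(c)
        if abs(fc) < tol:
            return c
        if (fc > 0) == (fa > 0):
            a, fa = c, fc                  # root remains in [c, b]
        else:
            b, fb = c, fc                  # root remains in [a, c]
    raise RuntimeError("did not converge")

root = regula_falsi(lambda x: x ** 2 - 2.0, 0.0, 2.0)
```

Unlike bisection, the bracket need not shrink to zero width (one endpoint can become stuck on convex functions), so the stopping test here is on |f(c)| rather than on the interval size.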

2.1.8 Line Search Techniques

Line search methods are related to single-variable optimization methods, as they address the problem of minimizing a multi-variable function along a line, which is a subproblem in many gradient-based optimization methods. After a gradient-based optimizer has computed a search direction p_k, it must decide how far to move along that direction. The step can be written as

x_{k+1} = x_k + α_k p_k, (2.30)

where the positive scalar α_k is the step length.

Most algorithms require that p_k be a descent direction, i.e., that p_k have a negative projection onto the gradient g_k, so that p_kᵀ g_k < 0. This guarantees that f can be reduced by stepping along this direction. We want to compute a step length α_k that yields a substantial reduction in f, but we do not want to spend too much computational effort in making the choice. Ideally, we would find the global minimum of f(x_k + α_k p_k) with respect to α_k, but in general it is too expensive to compute this value. Even finding a local minimizer usually requires too many evaluations of the objective function f and possibly its gradient g. More practical methods perform an inexact line search that achieves adequate reductions of f at reasonable cost.

Wolfe Conditions

A typical line search involves trying a sequence of step lengths, accepting the first that satisfies certain conditions. A common condition requires that α_k yield a sufficient decrease of f, as given by the inequality

f(x_k + α p_k) ≤ f(x_k) + µ_1 α g_kᵀ p_k (2.31)

for a constant 0 < µ_1 < 1. In practice, this constant is small, say µ_1 = 10⁻⁴. Any sufficiently small step can satisfy the sufficient decrease condition, so in order to prevent steps that are too small we need a second requirement, called the curvature condition, which can be stated as

g(x_k + α p_k)ᵀ p_k ≥ µ_2 g_kᵀ p_k, (2.32)

where µ_1 < µ_2 < 1, and g(x_k + α p_k)ᵀ p_k is the derivative of f(x_k + α p_k) with respect to α_k. This condition requires that the slope of the univariate function at the new point be greater than µ_2 times the initial slope. Since we start with a negative slope, the gradient at the new point must be either less negative or positive. Typical values of µ_2 are 0.9 when using a Newton-type method and 0.1 when a conjugate gradient method is used.

The sufficient decrease (2.31) and curvature (2.32) conditions are known collectively as the Wolfe conditions. We can also modify the curvature condition to force α_k to lie in a broad neighborhood of a local minimizer or stationary point and obtain the strong Wolfe conditions

f(x_k + α p_k) ≤ f(x_k) + µ_1 α g_kᵀ p_k, (2.33)
|g(x_k + α p_k)ᵀ p_k| ≤ µ_2 |g_kᵀ p_k|, (2.34)

where 0 < µ_1 < µ_2 < 1. The only difference when comparing with the Wolfe conditions is that these conditions do not allow points where the derivative has a positive value that is too large, and therefore exclude points that are far from the stationary points. If µ_2 = 0, then we require g(x_k + α p_k)ᵀ p_k = 0, and we have an exact line search.

Figure 2.1: Acceptable steps for the Wolfe conditions

Sufficient Decrease and Backtracking

The curvature condition can be ignored by performing backtracking, i.e., by executing the following algorithm:
1. Choose a starting step length 0 < ᾱ ≤ 1 and a reduction ratio 0 < ρ < 1; set α = ᾱ.
2. If f(x_k + α p_k) ≤ f(x_k) + µ_1 α g_kᵀ p_k, then set α_k = α and stop.
3. Set α = ρα.
4. Return to 2.

When using Newton or quasi-Newton methods, the starting step length ᾱ is usually set to 1. The step-size reduction ratio, ρ, sometimes varies during the optimization process and is such that 0 < ρ < 1; in practice, ρ is not set too close to 0 or 1. Steepest descent and conjugate gradient methods, which do not produce well-scaled search directions, need to use other information to guess a step length. One strategy is to assume that the first-order change in x_k will be the same as the one obtained in the previous step, i.e., that ᾱ g_kᵀ p_k = α_{k−1} g_{k−1}ᵀ p_{k−1}, and therefore

ᾱ = α_{k−1} (g_{k−1}ᵀ p_{k−1}) / (g_kᵀ p_k). (2.35)
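The backtracking loop above can be sketched in a few lines of Python. This is illustrative only (list-based vectors, no safeguards against a non-descent direction), with typical constants:

```python
def backtracking(f, grad_f, x, p, alpha=1.0, rho=0.5, mu1=1e-4):
    """Backtracking line search enforcing the sufficient-decrease
    condition (2.31). Assumes p is a descent direction (g^T p < 0)."""
    fx = f(x)
    slope = sum(gi * pi for gi, pi in zip(grad_f(x), p))   # g^T p
    while f([xi + alpha * pi for xi, pi in zip(x, p)]) > fx + mu1 * alpha * slope:
        alpha *= rho                  # shrink the step and try again
    return alpha

# Quadratic bowl f(x) = x1^2 + 10 x2^2, steepest-descent step from (1, 1).
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
g = lambda x: [2.0 * x[0], 20.0 * x[1]]
x0 = [1.0, 1.0]
p0 = [-gi for gi in g(x0)]            # descent direction
alpha = backtracking(f, g, x0, p0)
```

Because µ_1 is tiny, almost any step that actually decreases f is accepted; the halving simply discards steps that overshoot the bowl.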

Line Search Algorithm Using the Strong Wolfe Conditions

This procedure is guaranteed to find a step length satisfying the strong Wolfe conditions for any parameters µ_1 and µ_2. It has two stages:
1. It begins with a trial α_1 and keeps increasing it until it finds either an acceptable step length or an interval that brackets the desired step lengths.
2. In the latter case, a second stage (the zoom algorithm) is performed that decreases the size of the interval until an acceptable step length is found.

Define the univariate function φ(α) = f(x_k + α p_k), so that φ(0) = f(x_k). Accordingly, φ′(α_i) is the derivative of f in the line direction, taken with respect to α at α_i.

The first stage is as follows:
1. Set α_0 = 0, choose α_1 > 0 and α_max. Set i = 1.
2. Evaluate φ(α_i).
3. If [φ(α_i) > φ(0) + µ_1 α_i φ′(0)] or [φ(α_i) > φ(α_{i−1}) and i > 1], then set α = zoom(α_{i−1}, α_i) and stop (a local minimum has been bracketed).
4. Evaluate φ′(α_i).
5. If |φ′(α_i)| ≤ µ_2 |φ′(0)|, set α = α_i and stop.
6. If φ′(α_i) ≥ 0, set α = zoom(α_i, α_{i−1}) and stop.
7. Choose α_{i+1} such that α_i < α_{i+1} < α_max.
8. Set i = i + 1.
9. Return to 2.

The second stage, the zoom(α_lo, α_hi) function:
1. Interpolate (using quadratic, cubic, or bisection) to find a trial step length α_j between α_lo and α_hi.
2. Evaluate φ(α_j).
3. If φ(α_j) > φ(0) + µ_1 α_j φ′(0) or φ(α_j) > φ(α_lo), set α_hi = α_j.
4. Else:
   (a) Evaluate φ′(α_j).
   (b) If |φ′(α_j)| ≤ µ_2 |φ′(0)|, set α = α_j and stop.
   (c) If φ′(α_j)(α_hi − α_lo) ≥ 0, set α_hi = α_lo.
   (d) Set α_lo = α_j.
5. Return to 1.

Implementing an algorithm based on the strong Wolfe conditions (as opposed to the plain Wolfe conditions) has the advantage that, by decreasing µ_2, we can force α to lie closer to the local minimum. More details can be found in Nocedal and Wright [47], pp.

Example 2.4: Line Search Algorithm Using Strong Wolfe Conditions

[Figure: the line search algorithm iterations; the first stage is marked with square labels and the zoom stage with circles.]

2.2 Unconstrained Gradient-Based Minimization

Many engineering problems involve the unconstrained minimization of a function of several variables. Unconstrained problems also arise when the constraints are eliminated and accounted for by suitable penalty functions. All these problems are of the form

    minimize f(x) by varying x ∈ R^n.

The point x* is a

- strong local minimum if f(x*) < f(x) for all x near x*;
- weak local minimum if f(x*) ≤ f(x) for all x near x*;
- strong global minimum if f(x*) < f(x) for all x;
- weak global minimum if f(x*) ≤ f(x) for all x.

Note on convention: lowercase bold roman letters are vectors, lowercase Greek letters are scalars, and uppercase roman letters are matrices.

2.2.1 Gradient Vector and Hessian Matrix of a Multivariable Function

Let f(x) be a real function, where x = [x_1, x_2, ..., x_n]^T is a column vector of n real-valued design variables. The gradient vector of f(x) is given by the partial derivatives with respect to each of the independent variables,

    ∇f(x) ≡ g(x) ≡ [∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n]^T.    (2.36)

In the multivariate case, the gradient vector is perpendicular to the hyperplane tangent to the contour surfaces of constant f. Let the tangent direction be t = [∂x_1/∂s, ∂x_2/∂s, ..., ∂x_n/∂s]^T, where s is a coordinate along a contour or isosurface. Then

    f(x) = const  ⇒  df/ds = 0,
    df/ds = (∂f/∂x_1)(∂x_1/∂s) + (∂f/∂x_2)(∂x_2/∂s) + ... + (∂f/∂x_n)(∂x_n/∂s) = ∇f^T t = 0;

therefore, the dot product of the gradient with the tangent to the contour surface is zero.
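A quick numerical check of this orthogonality, for an illustrative function and point of our own choosing: the directional derivative of f along a direction tangent to its contour should vanish.

```python
import numpy as np

# f(x) = x1^2 + 4 x2^2 has elliptical contours (function chosen for illustration)
f = lambda x: x[0]**2 + 4 * x[1]**2
grad = lambda x: np.array([2 * x[0], 8 * x[1]])

x = np.array([1.0, 0.5])
g = grad(x)                         # gradient at x
t = np.array([-g[1], g[0]])         # rotate the gradient 90 degrees
t = t / np.linalg.norm(t)           # unit tangent to the contour through x

# central-difference directional derivative of f along the contour tangent
h = 1e-6
dfds = (f(x + h * t) - f(x - h * t)) / (2 * h)   # should be ~0
```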

Higher derivatives of multivariable functions are defined as in the single-variable case, but note that the number of derivative components increases by a factor of n with each differentiation. While the gradient of a function of n variables is an n-vector, the second derivative of an n-variable function is defined by n² partial derivatives (the derivatives of the n first partial derivatives with respect to the n variables):

    ∂²f/∂x_i∂x_j for i ≠ j, and ∂²f/∂x_i² for i = j.

If the partial derivatives ∂f/∂x_i, ∂f/∂x_j, and ∂²f/∂x_i∂x_j are continuous and f is single valued, then ∂²f/∂x_j∂x_i exists and ∂²f/∂x_i∂x_j = ∂²f/∂x_j∂x_i. Therefore the second-order partial derivatives can be represented by a square symmetric matrix called the Hessian matrix,

    ∇²f(x) ≡ H(x) ≡ [ ∂²f/∂x_1²     ...  ∂²f/∂x_1∂x_n ]
                    [ ...           ...  ...          ]    (2.37)
                    [ ∂²f/∂x_n∂x_1  ...  ∂²f/∂x_n²    ],

which contains n(n + 1)/2 independent elements. If f is quadratic, the Hessian of f is constant, and the function can be expressed as

    f(x) = ½ x^T H x + g^T x + α.    (2.38)
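For a quadratic of the form (2.38) the Hessian is the constant matrix H; a central-difference check (the helper, matrix, and evaluation point below are illustrative choices) confirms both this and the symmetry ∂²f/∂x_i∂x_j = ∂²f/∂x_j∂x_i:

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T H x + g^T x + alpha, as in Eq. (2.38)
H = np.array([[3.0, -2.0], [-2.0, 2.0]])    # illustrative symmetric matrix
g = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ H @ x + g @ x + 7.0

def hessian_fd(f, x, h=1e-4):
    """Second derivatives by central finite differences (illustrative helper)."""
    n = len(x)
    Hfd = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            Hfd[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                         - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return Hfd

# For a quadratic, the result matches H at any point and is symmetric
Hfd = hessian_fd(f, np.array([0.3, -0.8]))
```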

2.2.2 Optimality Conditions

As in the single-variable case, the optimality conditions can be derived from the Taylor-series expansion of f about x*:

    f(x* + εp) = f(x*) + ε p^T g(x*) + ½ ε² p^T H(x* + εθp) p,    (2.39)

where 0 ≤ θ ≤ 1, ε is a scalar, and p is an n-vector.

For x* to be a local minimum, there must be, for any vector p, a finite ε such that f(x* + εp) ≥ f(x*), i.e., there is a neighborhood in which this condition holds. If this condition is satisfied, then f(x* + εp) − f(x*) ≥ 0, and the first- and second-order terms in the Taylor-series expansion must be greater than or equal to zero.

As in the single-variable case, and for the same reason, the first-order terms are considered first. Since p is an arbitrary vector and ε can be positive or negative, every component of the gradient vector g(x*) must be zero.

A point that satisfies ∇f(x*) ≡ g(x*) = 0 is called a stationary point; it can be a minimum, a maximum, or a saddle point.

[Figure: stationary points]

Regarding the second-order term, ½ ε² p^T H(x* + εθp) p: for this term to be non-negative, H(x* + εθp) has to be positive semi-definite, and by continuity the Hessian at the optimum, H(x*), must also be positive semi-definite.

Necessary conditions (for a local minimum):

    g(x*) = 0 and H(x*) is positive semi-definite.    (2.40)

Sufficient conditions (for a strong local minimum):

    g(x*) = 0 and H(x*) is positive definite.    (2.41)

Some definitions from linear algebra that might be helpful:

- The matrix H ∈ R^{n×n} is positive definite if p^T H p > 0 for all nonzero vectors p ∈ R^n (if H = H^T, then all the eigenvalues of H are strictly positive) → convex function.
- The matrix H ∈ R^{n×n} is positive semi-definite if p^T H p ≥ 0 for all vectors p ∈ R^n (if H = H^T, then the eigenvalues of H are positive or zero) → convex and flat function.
- The matrix H ∈ R^{n×n} is indefinite if there exist p, q ∈ R^n such that p^T H p > 0 and q^T H q < 0 (if H = H^T, then H has eigenvalues of mixed sign) → saddle point.
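For symmetric matrices, these definitions can be checked through the eigenvalues, as the parenthetical remarks indicate. A small helper (function name, tolerance, and test matrices are illustrative choices):

```python
import numpy as np

def classify(H, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(H)           # eigenvalues of a symmetric matrix
    if np.all(w > tol):
        return "positive definite"
    if np.all(w >= -tol):
        return "positive semi-definite"
    if np.all(w < -tol):
        return "negative definite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))    # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, 0.0]])))    # positive semi-definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))   # indefinite
```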

Example 2.5: Find all stationary points of

    f(x) = 1.5 x_1² + x_2² − 2 x_1 x_2 + 2 x_1³ + 0.5 x_1⁴.

Solving ∇f(x) = 0 gives three solutions:

    (0, 0),             f = 0         → local minimum
    ½(−3 − √7)(1, 1),   f ≈ −9.2551   → global minimum
    ½(−3 + √7)(1, 1),   f ≈ 0.0051    → saddle point

To establish the type of each point, we determine whether the Hessian there is positive definite and compare the values of the function at the points.
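The three stationary points can be verified numerically; the code below restates the function and its derivatives, and the rounding in the printout is our own choice:

```python
import numpy as np

# f(x) = 1.5 x1^2 + x2^2 - 2 x1 x2 + 2 x1^3 + 0.5 x1^4  (Example 2.5)
f = lambda x: (1.5 * x[0]**2 + x[1]**2 - 2 * x[0] * x[1]
               + 2 * x[0]**3 + 0.5 * x[0]**4)
grad = lambda x: np.array([3 * x[0] - 2 * x[1] + 6 * x[0]**2 + 2 * x[0]**3,
                           2 * x[1] - 2 * x[0]])
hess = lambda x: np.array([[3 + 12 * x[0] + 6 * x[0]**2, -2.0],
                           [-2.0, 2.0]])

r7 = np.sqrt(7.0)
points = [np.zeros(2),
          0.5 * (-3 - r7) * np.ones(2),    # global minimum
          0.5 * (-3 + r7) * np.ones(2)]    # saddle point
for x in points:
    eig = np.linalg.eigvalsh(hess(x))
    kind = "positive definite" if eig.min() > 0 else "indefinite"
    print(np.round(x, 4), "f =", round(f(x), 4), kind)
```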

2.2.3 General Algorithm for Smooth Functions

All algorithms for unconstrained gradient-based optimization can be described as follows:

1. Initial guess. Start with iteration number k = 0 and a starting point x_0.
2. Test for convergence. If the conditions for convergence are satisfied, stop; x_k is the solution.
3. Compute a search direction. Compute the vector p_k that defines the direction in n-space along which to search.
4. Compute the step length. Find a positive scalar α_k such that f(x_k + α_k p_k) < f(x_k).
5. Update the design variables. Set x_{k+1} = x_k + α_k p_k, k = k + 1, and go back to 2.

Each major iteration of this type of algorithm contains two subproblems: computing the search direction p_k and finding the step size (controlled by α_k). The difference between the various gradient-based algorithms lies in the method used to compute the search direction.

Caution: in non-convex problems with multiple local minima, gradient methods only find a local minimum near the starting point, not necessarily the global one.

2.2.4 Steepest Descent Method

The earliest reference to this method is by Cauchy in 1847 [14]. The steepest descent method uses the gradient vector at each point as the search direction for each iteration. The gradient vector at a point, g(x_k), is the direction of maximum rate of change (maximum increase) of the function at that point, and this rate of change is given by the norm ‖g(x_k)‖. As mentioned previously, the gradient vector is orthogonal to the plane tangent to the isosurfaces of the function.

If we use an exact line search, the steepest descent direction at each iteration is orthogonal to the previous one:

    df(x_{k+1})/dα = ∇f(x_{k+1})^T ∂x_{k+1}/∂α = ∇f(x_{k+1})^T p_k = 0
    ⇒ g(x_{k+1})^T g(x_k) = 0.    (2.42)

Therefore the method zigzags in the design space and is rather inefficient. Although a substantial decrease may be observed in the first few iterations, the method is usually very slow after that. In particular, while the algorithm is guaranteed to converge, it may take an infinite number of iterations. The rate of convergence is linear.

Input: function f, starting point x_0, and convergence parameters ε_g, ε_a, and ε_r
Output: local minimum of f

begin
  repeat
    compute g(x_k) ≡ ∇f(x_k)
    if ‖g(x_k)‖ ≤ ε_g then converged
    else compute the normalized search direction p_k = −g(x_k)/‖g(x_k)‖
    perform line search to find step length α_k in the direction of p_k
    update the current point: x_{k+1} = x_k + α_k p_k
    evaluate f(x_{k+1})
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| is satisfied for two successive iterations
      then converged
    else set k = k + 1
  until converged
end

Pseudo-code 8: Steepest descent algorithm
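A minimal implementation of Pseudo-code 8, with a backtracking line search and a gradient-norm convergence test only (these simplifications and the quadratic test problem are our own choices):

```python
import numpy as np

def steepest_descent(f, grad, x0, eps_g=1e-6, max_iter=10_000):
    """Steepest descent with a simple backtracking line search (a sketch of
    Pseudo-code 8; the line-search details are illustrative choices)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:
            break
        p = -g / np.linalg.norm(g)          # normalized descent direction
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(x + alpha * p) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5                    # backtrack until sufficient decrease
        x = x + alpha * p
    return x

# Ill-conditioned quadratic: minimum at (1, -2); expect a zigzagging path
xmin = steepest_descent(lambda x: (x[0] - 1)**2 + 10 * (x[1] + 2)**2,
                        lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)]),
                        [0.0, 0.0])
```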

Here, |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| is a check on the successive reductions of f. ε_a is the absolute tolerance on the change in function value (usually a small value such as 10⁻⁶) and ε_r is the relative tolerance (usually set to 0.01). If f is of order 1, then ε_r dominates; if f gets very small, the absolute tolerance takes over.

For steepest descent and other gradient methods that do not produce well-scaled search directions, we need to use other information to guess a step length. One strategy is to assume that the first-order change in x_k will be the same as the one obtained in the previous step, i.e., that ᾱ g_k^T p_k = α_{k-1} g_{k-1}^T p_{k-1}, and therefore

    ᾱ = α_{k-1} (g_{k-1}^T p_{k-1}) / (g_k^T p_k).    (2.43)

Since steepest descent relies only on first-order information, which is useful locally, it takes into account neither previous iterations nor second-order information, which would help in getting the bigger picture.

Example 2.6: Steepest Descent Applied to a Quadratic Function

Figure 2.2: Solution path of the steepest descent method

2.2.5 Conjugate Gradient Method

First presented by Fletcher and Reeves [27], this method is a small modification of the steepest descent method that takes into account the history of the gradients to move more directly towards the optimum. It can find the minimum of a quadratic function of n variables in n iterations.

Consider the problem of minimizing a convex quadratic function

    f(x) = ½ x^T A x − c^T x,    (2.44)

where A is an n × n symmetric positive-definite matrix. Differentiating with respect to x yields

    ∇f(x) = A x − c.    (2.45)

Thus, minimizing this quadratic is equivalent to solving a linear system, and the conjugate gradient method is an iterative method for solving linear systems of equations such as this one.

A set of nonzero vectors {p_0, p_1, ..., p_{n-1}} is conjugate with respect to A if

    p_i^T A p_j = 0 for all i ≠ j.    (2.46)

Conjugate vectors are linearly independent.

Suppose that we start from a point x_0 and a set of conjugate directions {p_0, p_1, ..., p_{n-1}}. In this method, the gradients of f are used to generate the conjugate directions. Let g_k ≡ ∇f(x_k) = A x_k − c, where x_k is the current point at iteration k. The first direction is chosen as the steepest-descent direction,

    p_0 = −g_0.    (2.47)

The sequence {x_k} is generated by minimizing f along p_k, thus

    x_{k+1} = x_k + α_k p_k,    (2.48)

where α_k is obtained from the line search problem

    minimize f(α) = f(x_k + α p_k).    (2.49)

Setting df(α)/dα = 0 yields

    p_k^T A p_k α_k + p_k^T (A x_k − c) = 0  ⇒  α_k = − p_k^T g_k / (p_k^T A p_k).    (2.50)

The exact line search condition df(α)/dα = 0 also yields

    p_k^T g_{k+1} = 0.    (2.51)

Now, the key step: choosing p_{k+1} to be of the form

    p_{k+1} = −g_{k+1} + β_k p_k,    (2.52)

where β_k introduces a deflection in the steepest-descent direction. Requiring p_{k+1} to be conjugate to p_k,

    p_{k+1}^T A p_k = −g_{k+1}^T A p_k + β_k p_k^T A p_k = 0.    (2.53)

Manipulating x_{k+1} = x_k + α_k p_k leads to

    A p_k = (g_{k+1} − g_k)/α_k.    (2.54)

Rearranging these equations yields

    β_k = g_{k+1}^T (g_{k+1} − g_k) / (α_k p_k^T A p_k).    (2.55)

Taking the dot product of (2.52) with g_{k+1} and using (2.51) results in

    p_k^T g_k = −g_k^T g_k.    (2.56)

Substituting (2.56) into (2.50) leads to

    α_k = g_k^T g_k / (p_k^T A p_k).    (2.57)

Finally, replacing (2.57) in (2.55),

    β_k = g_{k+1}^T (g_{k+1} − g_k) / (g_k^T g_k).    (2.58)

For any x_0, the sequence {x_k} generated by the conjugate direction algorithm converges to the solution of the linear system in at most n steps, but only for quadratic functions. This is referred to as the (linear) Polak–Ribière algorithm. Convergence is also only guaranteed with exact line searches and no round-off errors. For general functions, a restart is made every n iterations, wherein a steepest-descent step is taken for computational stability.
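For the quadratic case, equations (2.45), (2.57), and (2.58) translate directly into the linear CG iteration; a sketch on a small symmetric positive-definite system (the matrix and right-hand side are arbitrary examples):

```python
import numpy as np

def linear_cg(A, c, x0, tol=1e-10):
    """Linear conjugate gradient for min 0.5 x^T A x - c^T x, i.e. A x = c."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - c                              # gradient of the quadratic
    p = -g                                     # first direction, Eq. (2.47)
    for _ in range(len(c)):
        alpha = (g @ g) / (p @ A @ p)          # exact step, Eq. (2.57)
        x = x + alpha * p
        g_new = A @ x - c
        beta = (g_new @ (g_new - g)) / (g @ g) # deflection, Eq. (2.58)
        g, p = g_new, -g_new + beta * p        # new conjugate direction
        if np.linalg.norm(g) < tol:
            break
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, 2.0])
x = linear_cg(A, c, np.zeros(2))   # converges in at most n = 2 steps
```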

If we consider, for a quadratic,

    g_{k+1}^T g_k = g_{k+1}^T (−p_k + β_{k-1} p_{k-1}) = β_{k-1} g_{k+1}^T p_{k-1}
                  = β_{k-1} (g_k^T + α_k p_k^T A) p_{k-1} = 0,

then substituting in (2.58), we obtain

    β_k = g_{k+1}^T g_{k+1} / (g_k^T g_k),    (2.59)

which is the nonlinear CG algorithm, also known as the Fletcher–Reeves method.

The only difference of CG relative to steepest descent is that each descent direction is modified by adding a contribution from the previous direction. The rate of convergence is linear, but can be superlinear; the method converges in n to 5n iterations, usually about 2n.

Several variants of the Fletcher–Reeves CG method have been proposed. Most of these variants differ in their definition of β_k. For example, Dai and Yuan [16] proposed

    β_k = ‖g_{k+1}‖² / ((g_{k+1} − g_k)^T p_k).    (2.60)

Input: function f, starting point x_0, and convergence parameters ε_g, ε_a, and ε_r
Output: local minimum of f

begin
  set k = 0
  compute g(x_k) ≡ ∇f(x_k)
  if ‖g(x_k)‖ ≤ ε_g then converged
  repeat
    compute the conjugate gradient direction p_k = −g_k + β_k p_{k-1}, where
      β_k = g_k^T g_k / (g_{k-1}^T g_{k-1})
    perform line search to find step length α_k in the direction of p_k
    update the current point: x_{k+1} = x_k + α_k p_k
    evaluate f(x_{k+1})
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| is satisfied for two successive iterations
      then converged
    else set k = k + 1
  until converged
end

Pseudo-code 9: Nonlinear conjugate gradient algorithm
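A sketch of Pseudo-code 9 with a backtracking line search, a steepest-descent restart every n iterations, and a descent-direction safeguard (these implementation details and the test problem are our own choices):

```python
import numpy as np

def fletcher_reeves(f, grad, x0, eps_g=1e-8, max_iter=2000):
    """Nonlinear CG with the Fletcher-Reeves beta of Eq. (2.59)."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    g = grad(x)
    p = -g                                     # first direction
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps_g:
            break
        if g @ p >= 0:                         # safeguard: restart if not descent
            p = -g
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(x + alpha * p) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5                       # backtracking line search
        x = x + alpha * p
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)       # Fletcher-Reeves
        # restart with steepest descent every n iterations
        p = -g_new if (k + 1) % n == 0 else -g_new + beta * p
        g = g_new
    return x

xmin = fletcher_reeves(lambda x: (x[0] - 1)**2 + 5 * (x[1] + 2)**2,
                       lambda x: np.array([2 * (x[0] - 1), 10 * (x[1] + 2)]),
                       [0.0, 0.0])
```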

Example 2.7: Conjugate Gradient Applied to a Quadratic Function

Figure 2.3: Solution path of the nonlinear conjugate gradient method

2.2.6 Newton Methods

Even though Newton's method lacks robustness for optimization, its concepts lay the basis for other powerful methods discussed subsequently. While the steepest descent and conjugate gradient methods only use first-order information (the function gradient, or first-derivative term in the Taylor series) to obtain a local model of the function, Newton methods use a second-order Taylor-series expansion of the function about the current design point, i.e., a quadratic model

    f(x_k + d_k) ≈ f_k + g_k^T d_k + ½ d_k^T H_k d_k,    (2.61)

where d_k is the step to the minimum. Differentiating this with respect to d_k and setting the result to zero, we obtain the step that minimizes the quadratic:

    H_k d_k = −g_k.    (2.62)

This is a linear system whose solution yields the Newton step d_k. Thus, the Newton method gives both the search direction and the step size, i.e., p_k = d_k and α_k = 1.

When it converges, this method converges at a faster rate than first-order methods. If the function f is quadratic with a positive-definite Hessian matrix H_k, the method converges in one step. For general nonlinear functions, Newton's method converges quadratically if x_0 is sufficiently close to x* and the Hessian is positive definite at x*.

Despite the excellent convergence rate, this method has two main disadvantages:

- As in the single-variable case, difficulties and even failure may occur when the quadratic model is a poor approximation of f. If H_k is not positive definite, the quadratic model might not have a minimum or even a stationary point. For some nonlinear functions, the Newton step might be such that f(x_k + d_k) > f(x_k), and the method is not guaranteed to converge.
- Newton's method requires computing not only the gradient, but also the Hessian, which contains n(n + 1)/2 independent second-order derivatives.

Input: function f, starting point x_0, and convergence parameters ε_g, ε_a, and ε_r
Output: local minimum of f

begin
  set k = 0
  repeat
    compute g_k ≡ ∇f(x_k)
    if ‖g_k‖ ≤ ε_g then converged
    compute H_k ≡ ∇²f(x_k)
    compute the Newton step d_k from H_k d_k = −g_k
    update the current point: x_{k+1} = x_k + d_k
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| is satisfied for two successive iterations
      then converged
    else set k = k + 1
  until converged
end

Pseudo-code 10: Newton's method
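A minimal implementation of Pseudo-code 10 (the convex test function is our own choice; recall that the pure Newton iteration is only locally convergent in general):

```python
import numpy as np

def newton(f, grad, hess, x0, eps_g=1e-10, max_iter=50):
    """Pure Newton iteration: solve H_k d_k = -g_k and take alpha_k = 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:
            break
        d = np.linalg.solve(hess(x), -g)   # Newton step, Eq. (2.62)
        x = x + d
    return x

# f(x) = exp(x1) - x1 + x2^2 is convex with its minimum at (0, 0)
xmin = newton(lambda x: np.exp(x[0]) - x[0] + x[1]**2,
              lambda x: np.array([np.exp(x[0]) - 1, 2 * x[1]]),
              lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]]),
              [1.0, 1.0])
```

The quadratic x_2 coordinate is solved in a single step, while the x_1 coordinate converges quadratically in a handful of iterations.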

Modified Newton's Method

To address the two main disadvantages of Newton's method mentioned above, two modifications can be made.

First, ensure that the search direction is a descent direction of f at x_k, that is, ensure that ∇f(x_k)^T d_k < 0, which using (2.62) means

    −∇f(x_k)^T [∇²f(x_k)]⁻¹ ∇f(x_k) < 0.    (2.63)

For this to be satisfied, the Hessian of f has to be positive definite. One strategy is to replace the true Hessian with a symmetric positive-definite matrix F_k defined by

    F_k = H_k + γ I,    (2.64)

where γ is chosen such that all the eigenvalues of F_k are greater than a scalar δ > 0. The direction vector d_k is then determined from the solution of

    F_k d_k = −g_k.    (2.65)

Second, a step size parameter α_k can be introduced to improve the approximation for highly nonlinear functions. The step size α_k is obtained from a line search, minimize f(x_k + α_k d_k), and the new point is then

    x_{k+1} = x_k + α_k d_k.

When using Newton or quasi-Newton methods, the starting step length ᾱ is usually set to 1, since Newton's method already provides a good guess for the step size. The step size reduction ratio (ρ in the backtracking line search) sometimes varies during the optimization process, always with 0 < ρ < 1; in practice ρ is not set too close to 0 or 1.

Input: function f, starting point x_0, scalar δ > 0, and convergence parameters ε_g, ε_a, and ε_r
Output: local minimum of f

begin
  set k = 0
  repeat
    compute g_k ≡ ∇f(x_k)
    if ‖g_k‖ ≤ ε_g then converged
    compute H_k ≡ ∇²f(x_k) and F_k = H_k + γ I
    compute the search direction d_k from F_k d_k = −g_k
    compute the step size α_k from: minimize f(x_k + α_k d_k)
    update the current point: x_{k+1} = x_k + α_k d_k
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| is satisfied for two successive iterations
      then converged
    else set k = k + 1
  until converged
end

Pseudo-code 11: Modified Newton's method
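One way to pick γ in (2.64) is from the smallest eigenvalue of H_k (an illustrative choice; cheaper strategies based on modified factorizations exist):

```python
import numpy as np

def make_positive_definite(H, delta=1e-3):
    """Return F = H + gamma*I with all eigenvalues > delta, per Eq. (2.64).
    gamma is derived from the smallest eigenvalue (illustrative strategy)."""
    lam_min = np.linalg.eigvalsh(H).min()
    gamma = 0.0 if lam_min > delta else delta - lam_min
    return H + gamma * np.eye(H.shape[0])

H = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite: eigenvalues 3 and -1
F = make_positive_definite(H)
print(np.linalg.eigvalsh(F))             # all eigenvalues are now >= delta
```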

Example 2.8: Modified Newton's Method Applied to a Quadratic Function

Figure 2.4: Solution path of the modified Newton's method

2.2.7 Quasi-Newton Methods

This class of methods uses first-order information only, but builds second-order information (an approximate Hessian) from the sequence of function values and gradients of previous iterations. Most of these methods also force the approximation to be symmetric and positive definite, which can greatly improve their convergence properties.

Key to the success of Newton's method is the use of the n-dimensional curvature information given by the Hessian, which allows a local quadratic model of f. Quasi-Newton methods contrast with Newton's method in that the latter computes all gradient and curvature terms at a single point, whereas quasi-Newton methods accumulate curvature information across iterations.

The update formula is now

    x_{k+1} = x_k − α_k V_k ∇f(x_k),    (2.66)

where V_k is the inverse of the Hessian approximation, V_k ≈ F_k⁻¹, and the step size α_k is determined by minimizing f(x_k + α d_k) with respect to α, where d_k = −V_k ∇f(x_k).

When using quasi-Newton methods, the inverse Hessian approximation is initialized to the identity matrix, V_0 = I. The update at each iteration, written V̂_k, is added to the current approximation,

    V_{k+1} = V_k + V̂_k.    (2.67)

Consider the Taylor-series expansion of the gradient about x_k,

    g(x_{k+1}) = g_k + H_k s_k + ...,    (2.68)

where s_k = x_{k+1} − x_k. Neglecting the higher-order terms in this series yields

    H_k s_k = y_k,    (2.69)

where y_k = g(x_{k+1}) − g(x_k). The new approximation to the inverse of the Hessian, V_{k+1}, must then satisfy the quasi-Newton condition,

    V_{k+1} y_k = s_k.    (2.70)

Quasi-Newton methods are the most widely used of the gradient-based optimization methods.

Davidon–Fletcher–Powell (DFP) Method

One of the first quasi-Newton methods was devised by Davidon (1959) [20] and modified by Fletcher and Powell (1963) [26]. Instead of computing V_k from scratch at every iteration, a quasi-Newton method updates it in a way that accounts for the curvature measured during the most recent step. The DFP update for the inverse Hessian approximation can be shown to be

    V_{k+1}^{DFP} = V_k − (V_k y_k y_k^T V_k)/(y_k^T V_k y_k) + (s_k s_k^T)/(s_k^T y_k).    (2.71)

Notice that V_{k+1} remains symmetric, and it can also be shown that it remains positive definite (assuming V_k is positive definite and s_k^T y_k > 0). When applied to quadratic functions, the update formula results in the exact inverse of the Hessian matrix after n iterations, which implies convergence at the end of n iterations (same as the CG method). For large problems, the storage and update of V may be a disadvantage of quasi-Newton methods compared to the conjugate gradient method.

The DFP Algorithm

1. Select a starting point x_0 and convergence parameter ε_g. Set k = 0 and V_0 = I.
2. Compute g(x_k) ≡ ∇f(x_k). If ‖g(x_k)‖ ≤ ε_g, stop. Otherwise, continue.
3. Compute the search direction, p_k = −V_k g_k.
4. Perform a line search to find the step length α_k in the direction of p_k (start with α_k = 1).
5. Update the current point, x_{k+1} = x_k + α_k p_k, set s_k = α_k p_k, and compute the change in the gradient, y_k = g_{k+1} − g_k.
6. Update the inverse Hessian approximation:

    A_k = (V_k y_k y_k^T V_k)/(y_k^T V_k y_k),  B_k = (s_k s_k^T)/(s_k^T y_k),  V_{k+1} = V_k − A_k + B_k.

7. Set k = k + 1 and return to step 2.

Broyden–Fletcher–Goldfarb–Shanno (BFGS) Method

The DFP update was soon superseded by the BFGS formula [12, 13, 25, 31, 54], which is generally considered the most effective quasi-Newton update. The BFGS update for the inverse Hessian approximation can be shown to be

    V_{k+1}^{BFGS} = V_k − (s_k y_k^T V_k + V_k y_k s_k^T)/(s_k^T y_k)
                     + (1 + (y_k^T V_k y_k)/(s_k^T y_k)) (s_k s_k^T)/(s_k^T y_k).    (2.72)

The relative performance of the DFP and BFGS methods is problem dependent, but the BFGS update is better suited than the DFP update when using approximate line searches.
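A sketch of BFGS using the inverse update (2.72), a backtracking line search, and a curvature safeguard s_k^T y_k > 0 before updating (the safeguard, line-search details, and quadratic test problem are our own choices):

```python
import numpy as np

def bfgs(f, grad, x0, eps_g=1e-8, max_iter=200):
    """BFGS with the inverse-Hessian update of Eq. (2.72)."""
    x = np.asarray(x0, dtype=float)
    V = np.eye(len(x))                  # V_0 = I
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps_g:
            break
        p = -V @ g                      # quasi-Newton direction
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(x + alpha * p) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5                # backtracking line search
        s = alpha * p
        x = x + s
        g_new = grad(x)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                  # curvature safeguard
            Vy = V @ y
            V = (V - (np.outer(s, Vy) + np.outer(Vy, s)) / sy
                   + (1 + (y @ Vy) / sy) * np.outer(s, s) / sy)
        g = g_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, 2.0])
xmin = bfgs(lambda x: 0.5 * x @ A @ x - c @ x, lambda x: A @ x - c, np.zeros(2))
```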

Example 2.9: BFGS Applied to a Quadratic Function

Figure 2.5: Solution path of the BFGS method

2.2.8 Trust Region Methods

Trust region, or restricted-step, methods are a different approach to resolving the weaknesses of the pure form of Newton's method, which arise from a Hessian that is not positive definite or from a highly nonlinear function. One may interpret these problems as arising from minimizing the quadratic approximation of f,

    minimize q(x) = f(x_k) + ∇f(x_k)^T (x − x_k) + ½ (x − x_k)^T ∇²f(x_k) (x − x_k),

in a region that is outside the validity region of the quadratic approximation. These difficulties can be overcome by minimizing the function within a region around x_k in which the second-order Taylor-series approximation is valid, that is to say, where there is trust in the quadratic model. This region is called the trust region and can be denoted by

    Ω_k = {x : ‖x − x_k‖ ≤ h_k},

where h_k is the size of the trust region, which is dynamically adjusted.

The quadratic approximation q is minimized within Ω_k:

    minimize   q(s_k) = f(x_k) + g(x_k)^T s_k + ½ s_k^T H(x_k) s_k
    w.r.t.     s_k                                                    (2.73)
    s.t.       −h_k ≤ (s_k)_i ≤ h_k,  i = 1, ..., n,

which is a constrained minimization problem involving a quadratic objective function and linear constraints. This class of problems is called a quadratic programming (QP) problem, and its solution is discussed in future sections.

After obtaining s_k, the actual and predicted changes in the objective function can be computed as

    Δf = f(x_k) − f(x_k + s_k),    (2.74)
    Δq = f(x_k) − q(s_k).    (2.75)

The accuracy with which q(s_k) approximates f(x_k + s_k) can then be measured by the ratio

    r_k = Δf / Δq.    (2.76)

The closer r_k is to unity, the better the agreement.

The size of the trust region is updated based on this ratio as follows:

    h_{k+1} = ‖s_k‖/4   if r_k < 0.25,
    h_{k+1} = 2 h_k     if r_k > 0.75 and h_k = ‖s_k‖,    (2.77)
    h_{k+1} = h_k       otherwise.

The initial value of h is usually taken as h_0 = 1. The quadratic model is considered reasonable when q(s_k) is close to the true function value f(x_k + s_k).

A particular advantage of trust region methods is that they are not very sensitive to scaling, because of the dynamic adjustment of the size of the trust region. They are also very robust.

Input: function f, starting point x_0, convergence parameters ε_g, ε_a, and ε_r, and initial trust region size h_0
Output: local minimum of f

begin
  set k = 0
  repeat
    compute g_k ≡ ∇f(x_k)
    if ‖g_k‖ ≤ ε_g then converged
    compute H_k ≡ ∇²f(x_k) and solve the quadratic subproblem (2.73) for s_k
    evaluate f(x_k + s_k) and compute the quadratic model accuracy ratio r_k from (2.76)
    compute the size of the new trust region using (2.77)
    determine the new point: x_{k+1} = x_k if r_k ≤ 0; x_{k+1} = x_k + s_k otherwise
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| is satisfied for two successive iterations
      then converged
    else set k = k + 1
  until converged
end

Pseudo-code 12: Trust region algorithm
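A simplified sketch of Pseudo-code 12: instead of solving the QP subproblem (2.73) exactly, the Newton step is clipped to the box (a crude approximation of our own), combined with the radius update (2.77) and the acceptance rule r_k > 0:

```python
import numpy as np

def trust_region(f, grad, hess, x0, h0=1.0, eps_g=1e-8, max_iter=500):
    """Trust-region sketch; the subproblem is only solved approximately."""
    x = np.asarray(x0, dtype=float)
    h = h0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:
            break
        H = hess(x)
        try:
            s = np.linalg.solve(H, -g)        # Newton step
        except np.linalg.LinAlgError:
            s = -g                            # fall back to steepest descent
        s = np.clip(s, -h, h)                 # enforce the box |s_i| <= h
        df = f(x) - f(x + s)                  # actual reduction, Eq. (2.74)
        dq = -(g @ s + 0.5 * s @ H @ s)       # predicted reduction, Eq. (2.75)
        r = df / dq if dq != 0 else 0.0       # accuracy ratio, Eq. (2.76)
        if r < 0.25:
            h = np.max(np.abs(s)) / 4         # shrink the trust region
        elif r > 0.75 and np.max(np.abs(s)) == h:
            h = 2 * h                         # expand the trust region
        if r > 0:
            x = x + s                         # accept the step
    return x

xmin = trust_region(lambda x: (x[0] - 1)**2 + 10 * (x[1] + 2)**2,
                    lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)]),
                    lambda x: np.array([[2.0, 0.0], [0.0, 20.0]]),
                    [5.0, 5.0])
```

On this quadratic the model is exact, so r_k = 1 at every iteration and the region expands until the full Newton step fits inside the box.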

Example 2.10: Minimization of the Rosenbrock Function

Minimize Rosenbrock's function,

    f(x) = 100 (x_2 − x_1²)² + (1 − x_1)²,

starting from x_0 = [ ].
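The starting point in the example statement is not legible in our copy; the sketch below applies the pure Newton iteration to Rosenbrock's function from the classic benchmark start x_0 = (−1.2, 1), which is our assumption:

```python
import numpy as np

# Rosenbrock's function with analytic gradient and Hessian
f = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
grad = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                           200 * (x[1] - x[0]**2)])
hess = lambda x: np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                           [-400 * x[0], 200.0]])

x = np.array([-1.2, 1.0])                 # assumed start, not from the text
for k in range(50):
    g = grad(x)
    if np.linalg.norm(g) <= 1e-10:
        break
    x = x + np.linalg.solve(hess(x), -g)  # full Newton step, alpha = 1

print(x)   # converges to the minimum at (1, 1)
```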

Figure 2.6: Solution path of the steepest descent and conjugate gradient methods

Figure 2.7: Solution path of the modified Newton and BFGS methods


More information

Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

More information

GENG2140, S2, 2012 Week 7: Curve fitting

GENG2140, S2, 2012 Week 7: Curve fitting GENG2140, S2, 2012 Week 7: Curve fitting Curve fitting is the process of constructing a curve, or mathematical function, f(x) that has the best fit to a series of data points Involves fitting lines and

More information

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

Chapter III. Unconstrained Univariate Optimization

Chapter III. Unconstrained Univariate Optimization 1 Chapter III Unconstrained Univariate Optimization Introduction Interval Elimination Methods Polynomial Approximation Methods Newton s Method Quasi-Newton Methods 1 INTRODUCTION 2 1 Introduction Univariate

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

OPER 627: Nonlinear Optimization Lecture 14: Mid-term Review

OPER 627: Nonlinear Optimization Lecture 14: Mid-term Review OPER 627: Nonlinear Optimization Lecture 14: Mid-term Review Department of Statistical Sciences and Operations Research Virginia Commonwealth University Oct 16, 2013 (Lecture 14) Nonlinear Optimization

More information

Chapter 3: Root Finding. September 26, 2005

Chapter 3: Root Finding. September 26, 2005 Chapter 3: Root Finding September 26, 2005 Outline 1 Root Finding 2 3.1 The Bisection Method 3 3.2 Newton s Method: Derivation and Examples 4 3.3 How To Stop Newton s Method 5 3.4 Application: Division

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science

EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Multidimensional Unconstrained Optimization Suppose we have a function f() of more than one

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are

, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are Quadratic forms We consider the quadratic function f : R 2 R defined by f(x) = 2 xt Ax b T x with x = (x, x 2 ) T, () where A R 2 2 is symmetric and b R 2. We will see that, depending on the eigenvalues

More information

UNCONSTRAINED OPTIMIZATION

UNCONSTRAINED OPTIMIZATION UNCONSTRAINED OPTIMIZATION 6. MATHEMATICAL BASIS Given a function f : R n R, and x R n such that f(x ) < f(x) for all x R n then x is called a minimizer of f and f(x ) is the minimum(value) of f. We wish

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

MATH 4211/6211 Optimization Basics of Optimization Problems

MATH 4211/6211 Optimization Basics of Optimization Problems MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization

More information

x 2 x n r n J(x + t(x x ))(x x )dt. For warming-up we start with methods for solving a single equation of one variable.

x 2 x n r n J(x + t(x x ))(x x )dt. For warming-up we start with methods for solving a single equation of one variable. Maria Cameron 1. Fixed point methods for solving nonlinear equations We address the problem of solving an equation of the form (1) r(x) = 0, where F (x) : R n R n is a vector-function. Eq. (1) can be written

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

Line Search Methods. Shefali Kulkarni-Thaker

Line Search Methods. Shefali Kulkarni-Thaker 1 BISECTION METHOD Line Search Methods Shefali Kulkarni-Thaker Consider the following unconstrained optimization problem min f(x) x R Any optimization algorithm starts by an initial point x 0 and performs

More information

Computational Finance

Computational Finance Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

Unconstrained Optimization

Unconstrained Optimization 1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation

More information

Unit 2: Solving Scalar Equations. Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright

Unit 2: Solving Scalar Equations. Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright cs416: introduction to scientific computing 01/9/07 Unit : Solving Scalar Equations Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright 1 Introduction We now

More information

Introduction to gradient descent

Introduction to gradient descent 6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

Static unconstrained optimization

Static unconstrained optimization Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 5 Nonlinear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

AM 205: lecture 18. Last time: optimization methods Today: conditions for optimality

AM 205: lecture 18. Last time: optimization methods Today: conditions for optimality AM 205: lecture 18 Last time: optimization methods Today: conditions for optimality Existence of Global Minimum For example: f (x, y) = x 2 + y 2 is coercive on R 2 (global min. at (0, 0)) f (x) = x 3

More information

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 5. Nonlinear Equations

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 5. Nonlinear Equations Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T Heath Chapter 5 Nonlinear Equations Copyright c 2001 Reproduction permitted only for noncommercial, educational

More information

Optimality Conditions

Optimality Conditions Chapter 2 Optimality Conditions 2.1 Global and Local Minima for Unconstrained Problems When a minimization problem does not have any constraints, the problem is to find the minimum of the objective function.

More information

Numerical Optimization

Numerical Optimization Numerical Optimization Unit 2: Multivariable optimization problems Che-Rung Lee Scribe: February 28, 2011 (UNIT 2) Numerical Optimization February 28, 2011 1 / 17 Partial derivative of a two variable function

More information

Lecture 7: Minimization or maximization of functions (Recipes Chapter 10)

Lecture 7: Minimization or maximization of functions (Recipes Chapter 10) Lecture 7: Minimization or maximization of functions (Recipes Chapter 10) Actively studied subject for several reasons: Commonly encountered problem: e.g. Hamilton s and Lagrange s principles, economics

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file

More information

Multivariate Newton Minimanization

Multivariate Newton Minimanization Multivariate Newton Minimanization Optymalizacja syntezy biosurfaktantu Rhamnolipid Rhamnolipids are naturally occuring glycolipid produced commercially by the Pseudomonas aeruginosa species of bacteria.

More information

Gradient Descent. Sargur Srihari

Gradient Descent. Sargur Srihari Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors

More information

Constrained optimization. Unconstrained optimization. One-dimensional. Multi-dimensional. Newton with equality constraints. Active-set method.

Constrained optimization. Unconstrained optimization. One-dimensional. Multi-dimensional. Newton with equality constraints. Active-set method. Optimization Unconstrained optimization One-dimensional Multi-dimensional Newton s method Basic Newton Gauss- Newton Quasi- Newton Descent methods Gradient descent Conjugate gradient Constrained optimization

More information

Unconstrained optimization I Gradient-type methods

Unconstrained optimization I Gradient-type methods Unconstrained optimization I Gradient-type methods Antonio Frangioni Department of Computer Science University of Pisa www.di.unipi.it/~frangio frangio@di.unipi.it Computational Mathematics for Learning

More information

Introduction to unconstrained optimization - direct search methods

Introduction to unconstrained optimization - direct search methods Introduction to unconstrained optimization - direct search methods Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Structure of optimization methods Typically Constraint handling converts the

More information

Nonlinear Equations. Chapter The Bisection Method

Nonlinear Equations. Chapter The Bisection Method Chapter 6 Nonlinear Equations Given a nonlinear function f(), a value r such that f(r) = 0, is called a root or a zero of f() For eample, for f() = e 016064, Fig?? gives the set of points satisfying y

More information

Numerical Methods. Root Finding

Numerical Methods. Root Finding Numerical Methods Solving Non Linear 1-Dimensional Equations Root Finding Given a real valued function f of one variable (say ), the idea is to find an such that: f() 0 1 Root Finding Eamples Find real

More information

CHAPTER 4 ROOTS OF EQUATIONS

CHAPTER 4 ROOTS OF EQUATIONS CHAPTER 4 ROOTS OF EQUATIONS Chapter 3 : TOPIC COVERS (ROOTS OF EQUATIONS) Definition of Root of Equations Bracketing Method Graphical Method Bisection Method False Position Method Open Method One-Point

More information

Mathematical optimization

Mathematical optimization Optimization Mathematical optimization Determine the best solutions to certain mathematically defined problems that are under constrained determine optimality criteria determine the convergence of the

More information

Scientific Computing: Optimization

Scientific Computing: Optimization Scientific Computing: Optimization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 March 8th, 2011 A. Donev (Courant Institute) Lecture

More information

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23 Optimization: Nonlinear Optimization without Constraints Nonlinear Optimization without Constraints 1 / 23 Nonlinear optimization without constraints Unconstrained minimization min x f(x) where f(x) is

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x) Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not

More information

Math 409/509 (Spring 2011)

Math 409/509 (Spring 2011) Math 409/509 (Spring 2011) Instructor: Emre Mengi Study Guide for Homework 2 This homework concerns the root-finding problem and line-search algorithms for unconstrained optimization. Please don t hesitate

More information

Outline. Scientific Computing: An Introductory Survey. Nonlinear Equations. Nonlinear Equations. Examples: Nonlinear Equations

Outline. Scientific Computing: An Introductory Survey. Nonlinear Equations. Nonlinear Equations. Examples: Nonlinear Equations Methods for Systems of Methods for Systems of Outline Scientific Computing: An Introductory Survey Chapter 5 1 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

Chapter 6: Derivative-Based. optimization 1

Chapter 6: Derivative-Based. optimization 1 Chapter 6: Derivative-Based Optimization Introduction (6. Descent Methods (6. he Method of Steepest Descent (6.3 Newton s Methods (NM (6.4 Step Size Determination (6.5 Nonlinear Least-Squares Problems

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

Math 273a: Optimization Netwon s methods

Math 273a: Optimization Netwon s methods Math 273a: Optimization Netwon s methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 some material taken from Chong-Zak, 4th Ed. Main features of Newton s method Uses both first derivatives

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

15 Nonlinear Equations and Zero-Finders

15 Nonlinear Equations and Zero-Finders 15 Nonlinear Equations and Zero-Finders This lecture describes several methods for the solution of nonlinear equations. In particular, we will discuss the computation of zeros of nonlinear functions f(x).

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

1. Method 1: bisection. The bisection methods starts from two points a 0 and b 0 such that

1. Method 1: bisection. The bisection methods starts from two points a 0 and b 0 such that Chapter 4 Nonlinear equations 4.1 Root finding Consider the problem of solving any nonlinear relation g(x) = h(x) in the real variable x. We rephrase this problem as one of finding the zero (root) of a

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Solution of Nonlinear Equations

Solution of Nonlinear Equations Solution of Nonlinear Equations (Com S 477/577 Notes) Yan-Bin Jia Sep 14, 017 One of the most frequently occurring problems in scientific work is to find the roots of equations of the form f(x) = 0. (1)

More information

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived

More information

LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION

LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION 15-382 COLLECTIVE INTELLIGENCE - S19 LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION TEACHER: GIANNI A. DI CARO WHAT IF WE HAVE ONE SINGLE AGENT PSO leverages the presence of a swarm: the outcome

More information

SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS BISECTION METHOD

SOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS BISECTION METHOD BISECTION METHOD If a function f(x) is continuous between a and b, and f(a) and f(b) are of opposite signs, then there exists at least one root between a and b. It is shown graphically as, Let f a be negative

More information

PART I Lecture Notes on Numerical Solution of Root Finding Problems MATH 435

PART I Lecture Notes on Numerical Solution of Root Finding Problems MATH 435 PART I Lecture Notes on Numerical Solution of Root Finding Problems MATH 435 Professor Biswa Nath Datta Department of Mathematical Sciences Northern Illinois University DeKalb, IL. 60115 USA E mail: dattab@math.niu.edu

More information

Numerical Optimization: Basic Concepts and Algorithms

Numerical Optimization: Basic Concepts and Algorithms May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some

More information

Numerical Optimization Techniques

Numerical Optimization Techniques Numerical Optimization Techniques Léon Bottou NEC Labs America COS 424 3/2/2010 Today s Agenda Goals Representation Capacity Control Operational Considerations Computational Considerations Classification,

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by: Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers

More information

Chapter 1. Root Finding Methods. 1.1 Bisection method

Chapter 1. Root Finding Methods. 1.1 Bisection method Chapter 1 Root Finding Methods We begin by considering numerical solutions to the problem f(x) = 0 (1.1) Although the problem above is simple to state it is not always easy to solve analytically. This

More information

Solution of Algebric & Transcendental Equations

Solution of Algebric & Transcendental Equations Page15 Solution of Algebric & Transcendental Equations Contents: o Introduction o Evaluation of Polynomials by Horner s Method o Methods of solving non linear equations o Bracketing Methods o Bisection

More information