Review of Classical Optimization
Part II: Review of Classical Optimization
Multidisciplinary Design Optimization of Aircrafts
2 Deterministic Methods

2.1 One-Dimensional Unconstrained Minimization

Motivation

Most practical optimization problems involve many variables, so the study of single-variable minimization may seem academic. However, the optimization of multi-variable functions can be broken into two parts:
1. Finding a suitable search direction;
2. Minimizing along that direction.
The second part of this strategy, the so-called line search, is the motivation for studying single-variable minimization.
Consider a scalar function f that depends on a single independent variable x. Suppose we want to find the value of x where f(x) is a minimum:

    minimize f(x) by varying x ∈ R.    (2.1)

Furthermore, we want to do this with low computational cost (few iterations and low cost per iteration), low memory requirements, and a low failure rate. The computational effort is often dominated by the evaluation of f and its derivatives, so some of these requirements translate into: evaluate f(x) and df/dx as few times as possible.
You end up having a few choices:
- Choose between methods that do or do not need function derivatives (if you can compute derivatives cheaply, you may want to use them);
- If the function is pathologically badly behaved, you may want to avoid the use of derivatives;
- When using bracketing methods, choose the approach that generally provides the faster rate of convergence;
- In multi-dimensional cases, choose between methods that require order-n and order-n² storage.
2.1.2 Types of Minima

The point x* is a
- strong local minimizer if f(x*) < f(x) for all x near x*;
- weak local minimizer if f(x*) ≤ f(x) for all x near x*;
- strong global minimizer if f(x*) < f(x) for all x;
- weak global minimizer if f(x*) ≤ f(x) for all x.

If the function is not bounded below, no minimum exists.
2.1.3 Optimality Conditions

Taylor's theorem is useful for identifying local minima.

Theorem (Taylor). If f(x) is n times differentiable, then there exists θ (0 ≤ θ ≤ 1) such that

    f(x + h) = f(x) + h f'(x) + (1/2) h² f''(x) + ... + (1/(n−1)!) h^(n−1) f^(n−1)(x) + (1/n!) h^n f^(n)(x + θh),

where the last term is O(h^n).

Assuming f is twice-continuously differentiable and a minimum of f exists at x*, Taylor's theorem with n = 2 and x = x* leads to

    f(x* + ε) = f(x*) + ε f'(x*) + (1/2) ε² f''(x* + θε).    (2.2)

A local minimum at x* requires f(x* + ε) ≥ f(x*) for a range −δ ≤ ε ≤ δ, where δ is a positive number. Given this definition and the Taylor series expansion (2.2), for x* to be a local minimum it is required that

    ε f'(x*) + (1/2) ε² f''(x* + θε) ≥ 0.

For any finite values of f'(x*) and f''(x*), ε can always be chosen small enough that |ε f'(x*)| > |(1/2) ε² f''(x* + θε)|, so the first-derivative term dominates.
For ε f'(x*) to be non-negative for both signs of ε, we must have f'(x*) = 0, because the sign of ε is arbitrary. This is the first-order optimality condition. A point that satisfies the first-order optimality condition is called a stationary point. Besides minima, other types of stationary points include maxima and inflection points.

Because the first-derivative term is zero, the second-derivative term must be considered. This term must be non-negative for a local minimum at x*. Since ε² is always positive, this means f''(x*) ≥ 0. This is the second-order optimality condition. Higher-order terms can always be made smaller than the second-order term by choosing a small enough ε.

Discontinuities: many optimizers fail in the presence of discontinuities. This is especially critical for gradient-based optimizers and others that look only at a local region of the design space.
Necessary conditions (for a local minimum):

    f'(x*) = 0;  f''(x*) ≥ 0.    (2.3)

Sufficient conditions (for a strong local minimum):

    f'(x*) = 0;  f''(x*) > 0.    (2.4)

The optimality conditions can be used to:
- Verify that a point is a minimum (sufficient conditions);
- Recognize that a point is not a minimum (necessary conditions);
- Define equations that can be solved to find a minimum (in simple cases).

Gradient-based minimization methods find local minima by finding points that satisfy the optimality conditions.
2.1.4 Rate of Convergence

The rate of convergence is a measure of how fast an iterative method converges to the numerical solution. An iterative method is said to converge with order r when r > 0 is the largest number such that

    0 ≤ lim_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖^r < ∞,    (2.5)

where k is the iteration number. This is to say that the above limit must be a finite constant; this constant is the asymptotic error constant,

    lim_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖^r = γ.    (2.6)

When r = 1 the convergence is linear; if in addition the limit is zero, we have a special case called superlinear convergence. When r = 2, the sequence converges quadratically, which means that the number of correct figures roughly doubles with each iteration. When solving real problems, the exact x* is not known in advance, but it is useful to plot ‖x_{k+1} − x_k‖ and ‖g_k‖ (the norm of the gradient) versus k on a logarithmic axis.
Some examples from Gill et al. [29]:

Example 2.1: x_k = c^(2^k), for 0 ≤ c < 1. Each member is the square of the previous one, and the limit is zero. Since

    |x_{k+1} − 0| / |x_k − 0|² = c^(2^(k+1)) / c^(2·2^k) = 1,

we have r = 2 (quadratic convergence) and γ = 1.

Example 2.2: y_k = c^(2^(−k)), for c > 0. Each member is the square root of the previous one, and the limit is 1. Since

    lim_{k→∞} (c^(2^(−(k+1))) − 1) / (c^(2^(−k)) − 1) = lim_{k→∞} 1 / (c^(2^(−(k+1))) + 1) = 1/2,

we have r = 1 (linear convergence) and γ = 1/2.
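The two example sequences can be checked numerically. The sketch below (helper name `conv_ratios` is mine) computes the convergence ratios from the definition (2.5); for the squaring sequence the ratio with r = 2 works out to exactly 1, and for the square-root sequence the ratio with r = 1 tends to 1/2:

```python
def conv_ratios(seq, limit, r):
    """Ratios |x_{k+1} - x*| / |x_k - x*|^r from the definition of order of convergence."""
    return [abs(seq[k + 1] - limit) / abs(seq[k] - limit) ** r
            for k in range(len(seq) - 1)]

c = 0.5
x = [c ** (2 ** k) for k in range(6)]              # Example 2.1: each term is the square of the previous
y = [16.0 ** (2.0 ** -k) for k in range(1, 22)]    # Example 2.2: each term is the square root of the previous

quad = conv_ratios(x, 0.0, 2)   # r = 2 ratios: all equal to 1 for this sequence
lin = conv_ratios(y, 1.0, 1)    # r = 1 ratios: approach gamma = 1/2
```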
2.1.5 Unimodality and Bracketing the Minimum

Line search methods using bracketing require the function f to be unimodal, that is, it monotonically decreases as we approach x* from the left and then monotonically increases to the right of x* (it has a single local minimum).

(Figure: example of a unimodal function)
The first step in the process of finding the minimum is to bracket it in an interval.

Input: function f, starting point x1, step size Δ, expansion parameter γ ≥ 1
Output: three-point pattern x1, x2, x3 such that f1 ≥ f2 < f3
begin
    set x2 = x1 + Δ
    evaluate f1 and f2
    if f2 > f1 then
        interchange f1 and f2, x1 and x2, and set Δ = −Δ
    end
    repeat
        if f3 is not null then
            rename f2 as f1, f3 as f2, x2 as x1, x3 as x2
        end
        set Δ = γΔ, x3 = x2 + Δ, and evaluate f3
    until f3 > f2
end
Pseudo-code 1: Bracketing Algorithm

Common values for γ are 2 (step size doubled at each successive iteration) or 1 + τ ≈ 1.618 (related to the golden section ratio). The three-point pattern is needed for all interval reduction methods.
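A minimal sketch of the bracketing algorithm above (function and parameter names are mine; defaults are illustrative):

```python
def bracket_minimum(f, x1, delta=0.1, gamma=2.0, max_iter=100):
    """Expand the step until a three-point pattern f1 >= f2 < f3 is found."""
    x2 = x1 + delta
    f1, f2 = f(x1), f(x2)
    if f2 > f1:                           # going uphill: turn around
        x1, x2, f1, f2 = x2, x1, f2, f1
        delta = -delta
    for _ in range(max_iter):
        delta *= gamma                    # expand the step
        x3 = x2 + delta
        f3 = f(x3)
        if f3 > f2:                       # function turned upward: bracketed
            return x1, x2, x3
        x1, x2, f1, f2 = x2, x3, f2, f3   # shift the pattern forward
    raise RuntimeError("no bracket found")

xa, xb, xc = bracket_minimum(lambda x: (x - 2.0) ** 2, 0.0)
```

For f(x) = (x − 2)², starting at 0, the bracket produced contains the minimizer x* = 2 with f(xb) below both endpoint values.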
2.1.6 Interval Reduction Methods

These methods for function minimization start with an interval of uncertainty containing the minimum, which could have been determined using the bracketing algorithm, and successively reduce its size to a desired tolerance. These methods should be robust and efficient, that is, they should converge to the minimum using a small number of function evaluations. Among the most common methods are:
- the Fibonacci method;
- the golden section method;
- polynomial-based methods.
Fibonacci Method

The Fibonacci method is the strategy that yields the maximum reduction in the interval of uncertainty for a given number of function evaluations. Leonardo of Pisa (nicknamed Fibonacci) found a sequence of numbers that describes the evolution of a population of rabbits.

(Figure: rabbit population and Fibonacci numbers)

The first few numbers of this sequence are 1, 1, 2, 3, 5, 8, 13, ... In general, the sequence of Fibonacci numbers can be generated using

    F_0 = F_1 = 1,    (2.7)
    F_k = F_{k−1} + F_{k−2},  k = 2, ..., n.    (2.8)
Say we have an interval of uncertainty and the function has been evaluated at its boundaries. In order to reduce the interval of uncertainty, we have to evaluate two new points inside the interval. Then (assuming the function is unimodal) the interval that has the lower function value inside its boundaries is the new interval of uncertainty. The most efficient way of reducing the size of the interval is to: (1) ensure that the two possible intervals of uncertainty are the same size, and (2) reuse the point that remains inside the new interval, so that only one more function evaluation is required for selecting the next interval of uncertainty.

(Figure: Fibonacci sequence of intervals)
The interval sizes I_k are such that

    I_1 = I_2 + I_3
    I_2 = I_3 + I_4
    ...
    I_k = I_{k+1} + I_{k+2}    (2.9)
    ...
    I_{N−4} = I_{N−3} + I_{N−2} = 8 I_N
    I_{N−3} = I_{N−2} + I_{N−1} = 5 I_N
    I_{N−2} = I_{N−1} + I_N = 3 I_N
    I_{N−1} = 2 I_N

Recognizing the Fibonacci numbers, the following relation holds:

    I_{n−j} = F_{j+1} I_n,  j = 1, 2, ..., n − 1.    (2.10)
To find the successive interval sizes, we need to start from the last interval and work in reverse order; only after this can we start the search. When using the Fibonacci search, we have to decide on the number of function evaluations a priori. This is not always convenient, as the termination criterion is often the variation of the function values in the interval of uncertainty. Furthermore, this method requires that the sequence be stored.

The Fibonacci search is optimal because, in addition to yielding two intervals of the same size and reusing one point, the interval between the two interior points converges to zero, and in the final iteration the interval is divided in two (almost exactly), which is the optimal strategy for the last iteration. A detailed description of this method can be found in [10].
Input: function f, starting values x1 and x4 bracketing the minimum, tolerance ε or maximum number of iterations N (condition: x1 < x4)
Output: interval of size smaller than ε that contains the minimum of f(x)
begin
    I_1 ← x4 − x1
    if ε is given then
        N ← min { n : I_n = I_1 / F_n ≤ ε }
    end
    I_2 ← (F_{N−1} / F_N) I_1
    x2 ← x4 − I_2
    for k = 1 to N − 1 do
        I_{k+2} ← I_k − I_{k+1}
        x3 ← x1 + I_{k+2}
        if f(x2) < f(x3) then
            x4 ← x1; x1 ← x3
        else
            x1 ← x2; x2 ← x3
        end
    end
    output [x1, x4] as the final interval
end
Pseudo-code 2: Fibonacci Algorithm
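A compact sketch of the Fibonacci search (function and variable names are mine; the interior points are placed using the ratio I_{k+1}/I_k = F_{N−k}/F_{N−k+1} and one point is reused at every iteration):

```python
def fibonacci_search(f, a, b, n):
    """Shrink [a, b] around the minimum of a unimodal f using n function evaluations."""
    F = [1, 1]
    while len(F) < n + 1:
        F.append(F[-1] + F[-2])
    # first two interior points: I_2 / I_1 = F_{n-1} / F_n
    c = b - (F[n - 1] / F[n]) * (b - a)
    d = a + b - c                        # symmetric interior point
    fc, fd = f(c), f(d)
    for k in range(1, n - 1):
        r = F[n - k - 1] / F[n - k]      # next Fibonacci interval ratio
        if fc < fd:                      # minimum lies in [a, d]; reuse c
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:                            # minimum lies in [c, b]; reuse d
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return a, b

lo, hi = fibonacci_search(lambda x: (x - 0.3) ** 2, 0.0, 1.0, n=10)
```

With n = 10 evaluations the final interval is a small fraction of the original (on the order of I_1/F_10) and still contains the minimizer.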
Golden Section Method

In the golden section search, the interval reduction strategy is uniform and thus independent of the number of iterations. The interval sizes I_k are such that

    I_1 = I_2 + I_3
    I_2 = I_3 + I_4
    ...

with

    I_2/I_1 = I_3/I_2 = I_4/I_3 = ... = τ.

Dividing I_1 = I_2 + I_3 by I_2 gives 1/τ = 1 + τ, that is,

    τ² + τ − 1 = 0.    (2.11)

The positive solution of this equation is the golden section ratio, τ = (√5 − 1)/2 ≈ 0.618, and

    τ = lim_{k→∞} F_{k−1} / F_k.    (2.12)

Therefore, the Fibonacci search also reaches this ratio in the limit.
The basic golden section algorithm is similar to the Fibonacci algorithm, except that the interval ratio F_{N−1}/F_N is replaced by the constant τ. Both the Fibonacci and golden section methods always produce two equal intervals and reuse a previous interior point, but the latter does not use an optimal strategy for the last iteration: there is no last iteration, since the interval is always divided in the same proportions.

Assume the initial uncertainty interval is I_1 = [0, 1]. The function is then evaluated at 1 − τ and τ. The two possible intervals are [0, τ] and [1 − τ, 1], and they are of the same size. If, say, [0, τ] is selected, then the next two interior points would be τ(1 − τ) and τ·τ, but τ² = 1 − τ, which has already been evaluated.

(Figure: golden section sequence of intervals)
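A minimal sketch of the golden section search (names are mine); note how the reused interior point means only one new function evaluation per iteration:

```python
def golden_section(f, a, b, tol=1e-6):
    """Golden section search for the minimum of a unimodal f on [a, b]."""
    tau = (5.0 ** 0.5 - 1.0) / 2.0       # golden section ratio, about 0.618
    c = b - tau * (b - a)                # lower interior point
    d = a + tau * (b - a)                # upper interior point
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:                      # minimum in [a, d]; old c becomes new d
            b, d, fd = d, c, fc
            c = b - tau * (b - a)
            fc = f(c)
        else:                            # minimum in [c, b]; old d becomes new c
            a, c, fc = c, d, fd
            d = a + tau * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

xmin = golden_section(lambda x: (x - 1.5) ** 2, 0.0, 4.0)
```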
Similarly to the Fibonacci method, the golden section method has linear convergence, meaning that successive significant figures are gained linearly with additional function evaluations. This method can also be integrated with the three-point bracketing algorithm by choosing the expansion parameter as 1 + τ. A detailed description of this method can be found in [10].
Polynomial-Based Methods

More efficient procedures use information about f gathered during the iterations. One way of using this information is to produce an estimate of the function that we can easily minimize. The lowest-order function that we can use for this purpose is a quadratic, since a linear function does not have a minimum. Suppose we approximate f by

    f̃ = (1/2) a x² + b x + c.    (2.13)

If a > 0, the minimum of this function is at x* = −b/a.
To generate a quadratic approximation, three independent pieces of information are needed. For example, if we have the value of the function, its first derivative, and its second derivative at a point x_k, we can write a quadratic approximation of the function value at x as the first three terms of a Taylor series,

    f̃(x) = f(x_k) + f'(x_k)(x − x_k) + (1/2) f''(x_k)(x − x_k)².    (2.14)

If f''(x_k) is not zero, setting the derivative of (2.14) to zero and solving for x = x_{k+1} yields

    x_{k+1} = x_k − f'(x_k) / f''(x_k).    (2.15)

This is Newton's method used to find a zero of the first derivative. Robust algorithms are obtained when polynomial fit and sectioning ideas are merged, as in Brent's quadratic fit-sectioning algorithm.
Brent's Quadratic Fit-Sectioning Algorithm

Brent [11] devised a method that fits a quadratic polynomial and accepts the quadratic minimum when the function is cooperative, and uses the golden section method otherwise. At any particular stage, Brent's algorithm keeps track of six points (not necessarily all distinct), a, b, u, v, w and x, defined as follows:
- the minimum is bracketed between a and b;
- x is the point with the least function value found so far (or the most recent one in case of a tie);
- w is the point with the second least function value;
- v is the previous value of w;
- u is the point at which the function was evaluated most recently.
The general idea is the following: parabolic interpolation is attempted, fitting through the points x, v, and w. To be acceptable, the parabolic step must (1) fall within the bounding interval (a, b), and (2) imply a movement from the best current value x that is less than half the movement of the step before last. The second criterion ensures that the parabolic steps are converging, rather than, say, bouncing around in some non-convergent limit cycle. The minimum of the quadratic that fits f(x), f(v) and f(w) is

    u = x − (1/2) [ (x − w)² (f(x) − f(v)) − (x − v)² (f(x) − f(w)) ] / [ (x − w)(f(x) − f(v)) − (x − v)(f(x) − f(w)) ].    (2.16)

Brent's method converges superlinearly, meaning that the rate at which successive significant figures are gained increases with each successive function evaluation. A detailed description of this method can be found in [10].
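The parabolic step (2.16) can be sanity-checked on a pure quadratic, where it must land exactly on the minimizer in a single step (function names are mine):

```python
def parabolic_step(x, fx, v, fv, w, fw):
    """Trial point u from the parabola through (x, fx), (v, fv), (w, fw), eq. (2.16)."""
    num = (x - w) ** 2 * (fx - fv) - (x - v) ** 2 * (fx - fw)
    den = (x - w) * (fx - fv) - (x - v) * (fx - fw)
    return x - 0.5 * num / den

f = lambda t: (t - 3.0) ** 2 + 1.0        # quadratic with minimum at t = 3
u = parabolic_step(2.0, f(2.0), 5.0, f(5.0), 4.0, f(4.0))
```

For a general function this step is only accepted when it falls inside (a, b) and shrinks fast enough; otherwise the algorithm falls back on golden sectioning.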
Input: function f, three-point pattern a, b and x bracketing the minimum, tolerance ε (condition: f(a) ≥ f(x) < f(b))
Output: interval of size smaller than 2ε that contains the minimum of f(x)
begin
    w ← x; v ← x
    repeat
        if x, w and v are distinct then
            try a quadratic fit through x, w and v, and determine its minimum u using (2.16)
            if u is close to a, b or x, adjust it towards the larger of [a, x] and [x, b] so it stays at least ε away from x
        else
            calculate u by golden sectioning of the larger of the intervals [a, x] and [x, b]
        end
        evaluate f(u)
        among a, b, x, w, v, u, determine the new a, b, x, w, v
    until the larger of the intervals [a, x] and [x, b] is smaller than 2ε
end
Pseudo-code 3: Brent's Algorithm for the Minimum
Akima Splines

Polynomial interpolation often leads to spurious oscillations, especially when the original function exhibits abrupt changes in curvature. In 1970, Hiroshi Akima published a one-dimensional fitting method that has some very desirable properties [2]. Akima claims that his method is closer to a manually drawn curve than those produced by other mathematical methods. In 1991, Akima published an update to his algorithm [4, 3] addressing some shortcomings of the original.

The approach uses a cubic fit between the data points, so the slope is required at each data point in addition to the value of the point itself. The interpolating polynomial between the ith and (i+1)th data points is written as

    y = a_0 + a_1 (x − x_i) + a_2 (x − x_i)² + a_3 (x − x_i)³,    (2.17)
with coefficients defined by

    a_0 = y_i    (2.18)
    a_1 = y'_i
    a_2 = (3 m_i − 2 y'_i − y'_{i+1}) / (x_{i+1} − x_i)
    a_3 = (y'_i + y'_{i+1} − 2 m_i) / (x_{i+1} − x_i)²

and

    m_i = (y_{i+1} − y_i) / (x_{i+1} − x_i),    (2.19)

which is the slope of the line segment passing through the points. The method of determining the derivatives y' is what makes the Akima methods unique. In the 1991 method, the derivative is

    y'_i = Σ_k ω_k f'_k / Σ_k ω_k,    (2.20)

where f'_k is the computed derivative at P_i of a third-order polynomial passing through P_i and three other nearby points:
    f'_1 = F'(P_{i−3}, P_{i−2}, P_{i−1}, P_i)    (2.21)
    f'_2 = F'(P_{i−2}, P_{i−1}, P_i, P_{i+1})
    f'_3 = F'(P_{i−1}, P_i, P_{i+1}, P_{i+2})
    f'_4 = F'(P_i, P_{i+1}, P_{i+2}, P_{i+3})

The weights are inversely proportional to the product of what Akima calls a volatility measure and a distance measure,

    ω_k = 1 / (v_k d_k).    (2.22)

The distance factor is the sum of squares of the distances from P_i to the other three points:

    d_1 = (x_{i−3} − x_i)² + (x_{i−2} − x_i)² + (x_{i−1} − x_i)²    (2.23)
    d_2 = (x_{i−2} − x_i)² + (x_{i−1} − x_i)² + (x_{i+1} − x_i)²
    d_3 = (x_{i−1} − x_i)² + (x_{i+1} − x_i)² + (x_{i+2} − x_i)²
    d_4 = (x_{i+1} − x_i)² + (x_{i+2} − x_i)² + (x_{i+3} − x_i)²

The volatility factor v_k is the sum of squares of the deviations from a least-squares linear fit of the four points.
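The coefficient formulas (2.18) can be checked numerically: whatever procedure produced the slopes y'_i, the cubic built from (2.18) must reproduce the endpoint values and slopes of the segment (helper names are mine):

```python
def segment_coeffs(xi, xi1, yi, yi1, dyi, dyi1):
    """Cubic coefficients (2.18) for one interval [x_i, x_{i+1}]."""
    h = xi1 - xi
    m = (yi1 - yi) / h                       # secant slope (2.19)
    a0, a1 = yi, dyi
    a2 = (3.0 * m - 2.0 * dyi - dyi1) / h
    a3 = (dyi + dyi1 - 2.0 * m) / h ** 2
    return a0, a1, a2, a3

def eval_cubic(coeffs, xi, x):
    """Evaluate the segment polynomial (2.17) at x."""
    a0, a1, a2, a3 = coeffs
    s = x - xi
    return a0 + a1 * s + a2 * s ** 2 + a3 * s ** 3

# segment from (1, 0) with slope 1 to (3, 4) with slope 0
coeffs = segment_coeffs(1.0, 3.0, 0.0, 4.0, 1.0, 0.0)
```

Substituting s = h into (2.17) with these coefficients collapses to y_{i+1}, and the derivative at s = h collapses to y'_{i+1}, which is the Hermite interpolation property the Akima construction relies on.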
2.1.7 Zero of a Function

Solving the first-order optimality condition, that is, finding x* such that g(x*) = f'(x*) = 0, is equivalent to finding the roots of the first derivative of the function to be minimized. In addition, in constrained optimization, zero-finding problems also occur due to constraints, i.e., h(x) = 0. Therefore, root-finding methods can be used to find stationary points and are useful in function minimization. Zero-finding algorithms can be classified into two basic categories, depending on the starting guess:
- Interval of uncertainty: bisection method;
- Arbitrary point: Newton's method, secant method.
Bisection Method

This method for finding the zero of a function f starts with two guesses forming an initial bracket [a, b] that contains the root, for which the function values f(a) and f(b) have opposite signs. A new guess is then chosen at the midpoint, c = (a + b)/2. The procedure is repeated using the new guess and whichever old guess still brackets the root. This process continues until the desired accuracy is obtained.
If [a, b] is the initial interval and N is the number of iterations, the final interval has size

    δ = |a − b| / 2^N  ⟹  2^N = |a − b| / δ  ⟹  N = log₂(|a − b| / δ),    (2.24)

so this method is guaranteed to find the zero to a specified tolerance δ in about log₂(|a − b|/δ) function evaluations. Bisection yields the smallest interval of uncertainty for a specified number of function evaluations. It has the advantage that it always converges provided the initial interval contains a zero. Because it is a bracketing method, it generates a set of nested intervals. The only drawback is that the rate of convergence is rather slow: since δ_{k+1} = δ_k / 2, from the definition of rate of convergence with r = 1,

    lim_{k→∞} δ_{k+1} / δ_k = 1/2,

so the bisection algorithm exhibits a linear rate of convergence (r = 1) with asymptotic error constant 1/2.
To find the minimum of a function using bisection, we evaluate the derivative of f at each iteration instead of the function value. In finite-precision arithmetic it is not possible to find the exact zero, so we will be satisfied with finding an x* that belongs to an interval [a, b] such that the function g (≡ f') satisfies

    g(a) g(b) < 0 and |a − b| < δ,

where δ is a small tolerance. This tolerance might be dictated by the machine representation (for double precision, on the order of 10⁻¹⁶ in relative terms), the precision of the function evaluation, or a limit on the number of iterations we want to perform with the root-finding algorithm.
Input: function f, endpoint values a and b, tolerance ε or maximum iterations N (conditions: a < b and f(a) f(b) < 0)
Output: value that differs from a root of f(x) = 0 by less than ε
begin
    k ← 1
    while k ≤ N do
        c ← (a + b)/2
        if f(c) = 0 or (b − a)/2 < ε then
            return c and stop
        end
        k ← k + 1
        if sign(f(c)) = sign(f(a)) then a ← c else b ← c end
    end
    output "Method failed"
end
Pseudo-code 4: Bisection Method
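A minimal sketch of bisection as a root finder (names are mine), here applied to cos(x), which changes sign at π/2 on [1, 2]:

```python
import math

def bisect(f, a, b, tol=1e-12, max_iter=100):
    """Halve the bracket [a, b] until it is smaller than tol."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "the bracket must straddle the root"
    for _ in range(max_iter):
        c = 0.5 * (a + b)
        fc = f(c)
        if fc == 0.0 or 0.5 * (b - a) < tol:
            return c
        if (fc > 0) == (fa > 0):      # same sign as f(a): root lies in [c, b]
            a, fa = c, fc
        else:
            b = c
    return 0.5 * (a + b)

root = bisect(math.cos, 1.0, 2.0)
```

Per (2.24), reaching tol = 10⁻¹² from an interval of length 1 takes about log₂(10¹²) ≈ 40 iterations, well within the default limit.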
Newton-Raphson Method

Newton's method for finding a zero can be derived from the Taylor series expansion of the function about some initial guess x_k,

    f(x_{k+1}) = f(x_k) + (x_{k+1} − x_k) f'(x_k) + O((x_{k+1} − x_k)²),

where x_{k+1} = x_k + Δx. Setting the function to zero and ignoring the terms of second and higher order results in

    f(x_k) + (x_{k+1} − x_k) f'(x_k) ≈ 0.

Solving for the new estimate x_{k+1} yields

    x_{k+1} = x_k − f(x_k) / f'(x_k).    (2.25)
This iterative procedure converges quadratically, so

    lim_{k→∞} |x_{k+1} − x*| / |x_k − x*|² = const.

While quadratic convergence is a great property, this method is not guaranteed to converge, and it only works under certain conditions. To minimize a function using Newton's method, we simply replace the function with its first derivative and the first derivative with the second derivative,

    x_{k+1} = x_k − f'(x_k) / f''(x_k).    (2.26)
Input: function f, starting value x_0, tolerance ε, maximum iterations N
Output: value that differs from a root of f(x) = 0 by less than ε
begin
    k ← 1
    while k ≤ N do
        x_{k+1} ← x_k − f(x_k) / f'(x_k)
        if |x_{k+1} − x_k| < ε or |f(x_{k+1})| < ε then
            return x_{k+1} and stop
        end
        x_k ← x_{k+1}
        k ← k + 1
    end
    output "Method failed"
end
Pseudo-code 5: Newton-Raphson Method
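A minimal sketch of the Newton-Raphson iteration (2.25) as a root finder (names are mine), here computing the cube root of 2 as the root of x³ − 2:

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)      # fails if df(x) is zero, as the method requires df != 0
        x -= step
        if abs(step) < tol or abs(f(x)) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

r = newton(lambda x: x ** 3 - 2.0, lambda x: 3.0 * x ** 2, x0=1.0)
```

Quadratic convergence means only a handful of iterations are needed here; but a poor starting guess can send the iterates far away, which is the lack of global convergence noted above.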
Example 2.3: Function Minimization Using Newton's Method

Solve the single-variable optimization problem

    minimize f(x) = (x − 3) x³ (x − 6)⁴  with respect to x

using Newton's method with several different initial guesses.

(Figure: Newton iterates x_k for several initial guesses; x_N is the converged solution.)
Secant Method

Newton's method requires the first derivative at each iteration (and the second derivative when applied to minimization). In some practical applications it might not be possible to obtain this derivative analytically, or it might simply be troublesome. If we use a backward-difference approximation for f'(x_k),

    f'(x_k) ≈ (f(x_k) − f(x_{k−1})) / (x_k − x_{k−1}),

and substitute it into Newton's method, we obtain

    x_{k+1} = x_k − f(x_k) (x_k − x_{k−1}) / (f(x_k) − f(x_{k−1})),    (2.27)

which is the secant method ("the poor man's Newton method"). Under favorable conditions, this method has superlinear convergence (1 < r < 2), with r = (1 + √5)/2 ≈ 1.618.
Input: function f, starting values x_0 and x_1 near the root, tolerance ε, maximum iterations N
Output: value that differs from a root of f(x) = 0 by less than ε
begin
    k ← 1
    while k ≤ N do
        if |f(x_{k−1})| < |f(x_k)| then
            swap x_{k−1} and x_k
        end
        x_{k+1} ← x_k − f(x_k) (x_k − x_{k−1}) / (f(x_k) − f(x_{k−1}))
        if |x_{k+1} − x_k| < ε or |f(x_{k+1})| < ε then
            return x_{k+1} and stop
        end
        x_{k−1} ← x_k; x_k ← x_{k+1}
        k ← k + 1
    end
    output "Method failed"
end
Pseudo-code 6: Secant Method
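A minimal sketch of the secant iteration (2.27) (names are mine); no derivative is required, only two starting points:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: Newton with a backward-difference slope."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # eq. (2.27)
        if abs(x2 - x1) < tol or abs(f(x2)) < tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)     # keep the last two iterates
    raise RuntimeError("secant iteration did not converge")

s = secant(lambda x: x ** 2 - 2.0, 1.0, 2.0)   # converges to sqrt(2)
```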
Linear Interpolation Method

The bisection method is very simple but generally quite inefficient, in part because it only makes use of the sign of the function f(x) at each evaluation while ignoring its magnitude. It thus ignores significant information that could be used to accelerate the search for the root. A method based on interpolation makes use of this information by approximating the function on the interval [x_1, x_2] by the chord joining the points (x_1, f(x_1)) and (x_2, f(x_2)), that is, the straight line

    (y − y_1) / (x − x_1) = (y_2 − y_1) / (x_2 − x_1).    (2.28)

Solving this linear equation for y = 0 yields the new endpoint within the interval [x_1, x_2]:

    x_3 = x_1 − f(x_1) (x_2 − x_1) / (f(x_2) − f(x_1)).    (2.29)

The choice between the two intervals [x_1, x_3] and [x_3, x_2] is decided by evaluating f(x_3) and discarding the interval whose endpoints have function values of the same sign, as was done in the bisection method. This iteration process is repeated, but it converges more quickly than the bisection method, since the information about the magnitude of f(x) pushes x_3 more quickly towards the actual root.
Input: function f, starting values x_0 and x_1 bracketing the root, tolerance ε, maximum iterations N (condition: f(x_0) f(x_1) < 0)
Output: value that differs from a root of f(x) = 0 by less than ε
begin
    k ← 1
    while k ≤ N do
        x_{k+1} ← x_k − f(x_k) (x_k − x_{k−1}) / (f(x_k) − f(x_{k−1}))
        if |x_{k+1} − x_k| < ε or |f(x_{k+1})| < ε then
            return x_{k+1} and stop
        end
        if f(x_{k−1}) f(x_{k+1}) < 0 then
            x_k ← x_{k+1}
        else
            x_{k−1} ← x_{k+1}
        end
        k ← k + 1
    end
    output "Method failed"
end
Pseudo-code 7: Linear Interpolation Method
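A minimal sketch of the linear interpolation (regula falsi) method (names are mine); unlike the secant method it always keeps a sign-changing bracket:

```python
def false_position(f, a, b, tol=1e-12, max_iter=200):
    """Regula falsi: replace the endpoint on the same side as the chord root."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "the bracket must straddle the root"
    c = a
    for _ in range(max_iter):
        c = a - fa * (b - a) / (fb - fa)   # chord intersection (2.29)
        fc = f(c)
        if abs(fc) < tol:
            return c
        if fa * fc < 0:                    # root in [a, c]
            b, fb = c, fc
        else:                              # root in [c, b]
            a, fa = c, fc
    return c

rt = false_position(lambda x: x ** 2 - 2.0, 0.0, 2.0)
```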
2.1.8 Line Search Techniques

Line search methods are related to single-variable optimization methods in that they address the problem of minimizing a multi-variable function along a line, which is a subproblem in many gradient-based optimization methods. After a gradient-based optimizer has computed a search direction p_k, it must decide how far to move along that direction. The step can be written as

    x_{k+1} = x_k + α_k p_k,    (2.30)

where the positive scalar α_k is the step length.

Most algorithms require p_k to be a descent direction, i.e., p_kᵀ g_k < 0. This guarantees that f can be reduced by stepping some distance along this direction. We want to compute a step length α_k that yields a substantial reduction in f, but we do not want to spend too much computational effort in making the choice. Ideally, we would find the global minimum of f(x_k + α_k p_k) with respect to α_k, but in general it is too expensive to compute this value. Even finding a local minimizer usually requires too many evaluations of the objective function f and possibly its gradient g. More practical methods perform an inexact line search that achieves adequate reductions of f at reasonable cost.
Wolfe Conditions

A typical line search involves trying a sequence of step lengths, accepting the first that satisfies certain conditions. A common condition requires that α_k yield a sufficient decrease of f, as given by the inequality

    f(x_k + α p_k) ≤ f(x_k) + μ₁ α g_kᵀ p_k    (2.31)

for a constant 0 < μ₁ < 1. In practice, this constant is small, say μ₁ = 10⁻⁴. Any sufficiently small step can satisfy the sufficient decrease condition, so in order to prevent steps that are too small we need a second requirement, called the curvature condition, which can be stated as

    g(x_k + α p_k)ᵀ p_k ≥ μ₂ g_kᵀ p_k,    (2.32)

where μ₁ < μ₂ < 1, and g(x_k + α p_k)ᵀ p_k is the derivative of f(x_k + α p_k) with respect to α. This condition requires that the slope of the univariate function at the new point be greater than at the start. Since we start with a negative slope, the gradient at the new point must be either less negative or positive. Typical values of μ₂ are 0.9 when using a Newton-type method and 0.1 when a conjugate gradient method is used.
The sufficient decrease (2.31) and curvature (2.32) conditions are known collectively as the Wolfe conditions. We can also modify the curvature condition to force α_k to lie in a broad neighborhood of a local minimizer or stationary point and obtain the strong Wolfe conditions

    f(x_k + α p_k) ≤ f(x_k) + μ₁ α g_kᵀ p_k,    (2.33)
    |g(x_k + α p_k)ᵀ p_k| ≤ μ₂ |g_kᵀ p_k|,    (2.34)

where 0 < μ₁ < μ₂ < 1. The only difference with respect to the Wolfe conditions is that points where the derivative has a positive value that is too large are no longer allowed, therefore excluding points that are far from stationary points. In the limit μ₂ → 0, the condition requires g(x_k + α p_k)ᵀ p_k = 0, and we have an exact line search.

Figure 2.1: Acceptable steps for the Wolfe conditions
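The strong Wolfe conditions (2.33)-(2.34) are easy to check directly. A one-dimensional sketch (names are mine): on f(x) = x², starting at x = −1 with descent direction p = 1, the full step to the minimum satisfies both conditions, while a tiny step passes sufficient decrease but fails the curvature test, which is exactly what (2.34) is there to prevent:

```python
def strong_wolfe_ok(f, g, x, p, alpha, mu1=1e-4, mu2=0.9):
    """Check the strong Wolfe conditions (2.33)-(2.34) for a scalar problem."""
    sufficient = f(x + alpha * p) <= f(x) + mu1 * alpha * g(x) * p   # (2.33)
    curvature = abs(g(x + alpha * p) * p) <= mu2 * abs(g(x) * p)     # (2.34)
    return sufficient and curvature

f = lambda x: x * x          # simple quadratic, minimum at 0
g = lambda x: 2.0 * x        # its derivative
full_step = strong_wolfe_ok(f, g, -1.0, 1.0, 1.0)    # lands on the minimum
tiny_step = strong_wolfe_ok(f, g, -1.0, 1.0, 0.01)   # too small: slope barely changes
```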
Sufficient Decrease and Backtracking

The curvature condition can be ignored by performing backtracking, i.e., by executing the following algorithm:
1. Choose a starting step length ᾱ > 0 and a reduction ratio 0 < ρ < 1; set α = ᾱ.
2. If f(x_k + α p_k) ≤ f(x_k) + μ₁ α g_kᵀ p_k, then set α_k = α and stop.
3. α = ρ α.
4. Return to 2.

When using Newton or quasi-Newton methods, the starting step length ᾱ is usually set to 1. The step-size reduction ratio ρ sometimes varies during the optimization process; in practice it is not set too close to 0 or 1.

Steepest descent and conjugate gradient methods do not produce well-scaled search directions, so we need to use other information to guess a step length. One strategy is to assume that the first-order change in x_k will be the same as the one obtained in the previous step, i.e., that ᾱ g_kᵀ p_k = α_{k−1} g_{k−1}ᵀ p_{k−1}, and therefore

    ᾱ = α_{k−1} (g_{k−1}ᵀ p_{k−1}) / (g_kᵀ p_k).    (2.35)
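The backtracking loop above can be sketched in a few lines (names and defaults are mine; `slope` is the directional derivative g_kᵀ p_k, which must be negative for a descent direction):

```python
def backtracking(f, x, p, slope, alpha0=1.0, rho=0.5, mu1=1e-4, max_iter=50):
    """Shrink alpha by rho until the sufficient-decrease condition (2.31) holds."""
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * p) <= f(x) + mu1 * alpha * slope:
            return alpha
        alpha *= rho                 # backtrack
    return alpha

f = lambda x: x * x
# from x = 1 with an overshooting direction p = -4 (slope = f'(1) * p = -8):
alpha = backtracking(f, 1.0, -4.0, slope=-8.0)
```

Here the full step and the half step both overshoot the minimum, so two reductions are needed before sufficient decrease is achieved at α = 0.25.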
Line Search Algorithm Using the Strong Wolfe Conditions

This procedure is guaranteed to find a step length satisfying the strong Wolfe conditions for any parameters μ₁ and μ₂ with 0 < μ₁ < μ₂ < 1. It has two stages:
1. It begins with a trial α₁ and keeps increasing it until it finds either an acceptable step length or an interval that brackets the desired step lengths.
2. In the latter case, a second stage (the zoom algorithm) is performed that decreases the size of the interval until an acceptable step length is found.

Define the univariate function φ(α) = f(x_k + α p_k), so that φ(0) = f(x_k). Accordingly, φ'(α_i) is the derivative of f along the line direction, taken with respect to α at α_i.
The first stage is as follows:
1. Set α₀ = 0, choose α₁ > 0 and α_max. Set i = 1.
2. Evaluate φ(α_i).
3. If [φ(α_i) > φ(0) + μ₁ α_i φ'(0)] or [φ(α_i) > φ(α_{i−1}) and i > 1], then set α = zoom(α_{i−1}, α_i) and stop (a local minimum is bracketed).
4. Evaluate φ'(α_i).
5. If |φ'(α_i)| ≤ μ₂ |φ'(0)|, set α = α_i and stop.
6. If φ'(α_i) ≥ 0, set α = zoom(α_i, α_{i−1}) and stop.
7. Choose α_{i+1} such that α_i < α_{i+1} < α_max.
8. Set i = i + 1.
9. Return to 2.
The second stage, the zoom(α_lo, α_hi) function:
1. Interpolate (using quadratic, cubic, or bisection) to find a trial step length α_j between α_lo and α_hi.
2. Evaluate φ(α_j).
3. If φ(α_j) > φ(0) + μ₁ α_j φ'(0) or φ(α_j) > φ(α_lo), set α_hi = α_j.
4. Else:
   (a) Evaluate φ'(α_j).
   (b) If |φ'(α_j)| ≤ μ₂ |φ'(0)|, set α = α_j and stop.
   (c) If φ'(α_j)(α_hi − α_lo) ≥ 0, set α_hi = α_lo.
   (d) Set α_lo = α_j.
5. Return to 1.

Implementing an algorithm based on the strong Wolfe conditions (as opposed to the plain Wolfe conditions) has the advantage that by decreasing μ₂ we can force α to lie closer to the local minimum. More details can be found in Nocedal and Wright [47].
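The two stages above can be sketched for the univariate case as follows (names are mine; the zoom stage uses plain bisection as its interpolation rule, which is the simplest of the three options listed):

```python
def line_search_strong_wolfe(phi, dphi, mu1=1e-4, mu2=0.9,
                             alpha1=1.0, alpha_max=16.0, max_iter=30):
    """Two-stage line search returning an alpha satisfying (2.33)-(2.34).

    phi(a) = f(x_k + a p_k) and dphi(a) its derivative; dphi(0) must be < 0.
    """
    phi0, dphi0 = phi(0.0), dphi(0.0)

    def zoom(alo, ahi):
        for _ in range(60):
            aj = 0.5 * (alo + ahi)                       # bisection interpolation
            if phi(aj) > phi0 + mu1 * aj * dphi0 or phi(aj) > phi(alo):
                ahi = aj                                 # sufficient decrease fails: shrink
            else:
                if abs(dphi(aj)) <= mu2 * abs(dphi0):
                    return aj                            # strong Wolfe satisfied
                if dphi(aj) * (ahi - alo) >= 0.0:
                    ahi = alo
                alo = aj
        return alo

    a_prev, a = 0.0, alpha1
    for i in range(max_iter):
        if phi(a) > phi0 + mu1 * a * dphi0 or (i > 0 and phi(a) > phi(a_prev)):
            return zoom(a_prev, a)                       # minimum bracketed
        if abs(dphi(a)) <= mu2 * abs(dphi0):
            return a                                     # acceptable as-is
        if dphi(a) >= 0.0:
            return zoom(a, a_prev)
        a_prev, a = a, min(2.0 * a, alpha_max)           # keep expanding
    return a

phi = lambda a: (a - 2.0) ** 2        # univariate slice with minimum at a = 2
dphi = lambda a: 2.0 * (a - 2.0)      # dphi(0) = -4 < 0, so a = 0 is a descent start
alpha = line_search_strong_wolfe(phi, dphi)
```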
50 Example 2.4: Line Search Algorithm Using Strong Wolfe Conditions The line search algorithm iterations. The first stage is marked with square labels and the zoom stage is marked with circles. Multidisciplinary Design Optimization of Aircrafts 100
51 2.2 Unconstrained Gradient-Based Minimization

Many engineering problems involve the unconstrained minimization of a function of several variables. Unconstrained problems also arise when the constraints are eliminated and accounted for by suitable penalty functions. All these problems are of the form

minimize f(x) by varying x ∈ R^n

The point x* is a
strong local minimum, if f(x*) < f(x) for all x near x*;
weak local minimum, if f(x*) ≤ f(x) for all x near x*;
strong global minimum, if f(x*) < f(x), for all x;
weak global minimum, if f(x*) ≤ f(x), for all x.

Note on convention: lowercase bold roman letters are vectors, lowercase Greek letters are scalars, and uppercase roman letters are matrices.

Multidisciplinary Design Optimization of Aircrafts 101
52 2.2.1 Gradient Vector and Hessian Matrix of a Multivariable Function

Let f(x) be a real function where x = [x_1, x_2, ..., x_n]^T is a column vector of n real-valued design variables. The gradient vector of the function f(x) is given by the partial derivatives with respect to each of the independent variables,

∇f(x) ≡ g(x) ≡ [∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n]^T  (2.36)

In the multivariate case, the gradient vector is perpendicular to the hyperplane tangent to the contour surfaces of constant f: let the tangent plane be defined by t = [∂x_1/∂s, ∂x_2/∂s, ..., ∂x_n/∂s]^T, where s is along a contour or isosurface; then,

f(x) = const ⇒ df/ds = 0
df/ds = (∂f/∂x_1)(∂x_1/∂s) + (∂f/∂x_2)(∂x_2/∂s) + ... + (∂f/∂x_n)(∂x_n/∂s) = 0 ⇒ ∇f^T t = 0

therefore, the dot product of the gradient with the tangent to the contour surface is zero.

Multidisciplinary Design Optimization of Aircrafts 102
53 Higher derivatives of multi-variable functions are defined as in the single-variable case, but note that the number of gradient components increases by a factor of n for each differentiation. While the gradient of a function of n variables is an n-vector, the second derivative of an n-variable function is defined by n² partial derivatives (the derivatives of the n first partial derivatives with respect to the n variables):

∂²f/(∂x_i ∂x_j) for i ≠ j, and ∂²f/∂x_i² for i = j.

If the partial derivatives ∂f/∂x_i, ∂f/∂x_j and ∂²f/(∂x_i ∂x_j) are continuous and f is single valued, then ∂²f/(∂x_i ∂x_j) exists and ∂²f/(∂x_i ∂x_j) = ∂²f/(∂x_j ∂x_i). Therefore the second-order partial derivatives can be represented by a square symmetric matrix called the Hessian matrix,

∇²f(x) ≡ H(x) ≡
[ ∂²f/∂x_1²        ...  ∂²f/(∂x_1 ∂x_n) ]
[      ⋮            ⋱        ⋮          ]  (2.37)
[ ∂²f/(∂x_n ∂x_1)  ...  ∂²f/∂x_n²       ]

which contains n(n + 1)/2 independent elements. If f is quadratic, the Hessian of f is constant, and the function can be expressed as

f(x) = ½ x^T H x + g^T x + α.  (2.38)

Multidisciplinary Design Optimization of Aircrafts 103
54 2.2.2 Optimality Conditions

As in the single-variable case, the optimality conditions can be derived from the Taylor-series expansion of f about x*:

f(x* + εp) = f(x*) + εp^T g(x*) + ½ ε² p^T H(x* + εθp) p,  (2.39)

where 0 ≤ θ ≤ 1, ε is a scalar, and p is an n-vector.

For x* to be a local minimum, for any vector p there must be a finite ε such that f(x* + εp) ≥ f(x*), i.e., there is a neighborhood in which this condition holds. If this condition is satisfied, then f(x* + εp) − f(x*) ≥ 0 and the first- and second-order terms in the Taylor-series expansion must be greater than or equal to zero.

As in the single-variable case, and for the same reason, the first-order terms are considered first. Since p is an arbitrary vector and ε can be positive or negative, every component of the gradient vector g(x*) must be zero.

Multidisciplinary Design Optimization of Aircrafts 104
55 A point that satisfies ∇f(x*) ≡ g(x*) = 0 is called a stationary point, and it can be a minimum, maximum or saddle point.

Stationary points

Regarding the second-order term, ½ ε² p^T H(x* + εθp) p: for this term to be non-negative, H(x* + εθp) has to be positive semi-definite, and by continuity, the Hessian at the optimum, H(x*), must also be positive semi-definite.

Multidisciplinary Design Optimization of Aircrafts 105
56 Necessary conditions (for a local minimum):

g(x*) = 0 and H(x*) is positive semi-definite.  (2.40)

Sufficient conditions (for a strong local minimum):

g(x*) = 0 and H(x*) is positive definite.  (2.41)

Some definitions from linear algebra that might be helpful:
The matrix H ∈ R^{n×n} is positive definite if p^T H p > 0 for all nonzero vectors p ∈ R^n. (If H = H^T, then all the eigenvalues of H are strictly positive.) convex function
The matrix H ∈ R^{n×n} is positive semi-definite if p^T H p ≥ 0 for all vectors p ∈ R^n. (If H = H^T, then the eigenvalues of H are positive or zero.) convex and flat function
The matrix H ∈ R^{n×n} is indefinite if there exist p, q ∈ R^n such that p^T H p > 0 and q^T H q < 0. (If H = H^T, then H has eigenvalues of mixed sign.) saddle point

Multidisciplinary Design Optimization of Aircrafts 106
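These definiteness classes can be checked numerically via the eigenvalues of a symmetric matrix; a minimal sketch (the example matrices and the tolerance are assumptions for illustration):

```python
import numpy as np

def classify(H, tol=1e-12):
    """Classify a symmetric matrix by its eigenvalues:
    positive definite, positive semi-definite, or indefinite."""
    eig = np.linalg.eigvalsh(H)        # eigenvalues of a symmetric matrix
    if np.all(eig > tol):
        return "positive definite"     # candidate strong local minimum
    if np.all(eig >= -tol):
        return "positive semi-definite"
    if eig.min() < -tol and eig.max() > tol:
        return "indefinite"            # saddle point
    return "negative (semi-)definite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))   # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```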
57 Example 2.5: Find all stationary points of

f(x) = 1.5x_1² + x_2² − 2x_1x_2 + 2x_1³ + 0.5x_1⁴.

Solving ∇f(x) = 0 gives three solutions:
(0, 0): f = 0, local minimum;
½(−3 − √7)(1, 1): global minimum;
½(−3 + √7)(1, 1): saddle point.

To establish the type of point, we have to determine if the Hessian is positive definite and compare the values of the function at the points.

Multidisciplinary Design Optimization of Aircrafts 107
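The three stationary points can be verified numerically; a minimal sketch (the polynomial, written out in the gradient below, is reconstructed from the stationary points quoted in the example):

```python
import numpy as np

def grad(x):
    """Gradient of f = 1.5 x1^2 + x2^2 - 2 x1 x2 + 2 x1^3 + 0.5 x1^4."""
    x1, x2 = x
    return np.array([3*x1 - 2*x2 + 6*x1**2 + 2*x1**3,
                     2*x2 - 2*x1])

r7 = np.sqrt(7.0)
points = [np.array([0.0, 0.0]),
          0.5 * (-3 - r7) * np.ones(2),
          0.5 * (-3 + r7) * np.ones(2)]

for p in points:
    assert np.allclose(grad(p), 0.0)   # gradient vanishes at each stationary point
```

The second equation forces x_2 = x_1, and the first then reduces to x_1(2x_1² + 6x_1 + 1) = 0, whose roots are 0 and ½(−3 ± √7).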
58 2.2.3 General Algorithm for Smooth Functions

All algorithms for unconstrained gradient-based optimization can be described as follows:
1. Initial guess. Start with iteration number k = 0 and a starting point, x_0.
2. Test for convergence. If the conditions for convergence are satisfied, then we can stop and x_k is the solution.
3. Compute a search direction. Compute the vector p_k that defines the direction in n-space along which we will search.
4. Compute the step length. Find a positive scalar, α_k, such that f(x_k + α_k p_k) < f(x_k).
5. Update the design variables. Set x_{k+1} = x_k + α_k p_k, k = k + 1 and go back to 2.

There are two subproblems in this type of algorithm at each major iteration: computing the search direction p_k and finding the step size (controlled by α_k). The difference between the various types of gradient-based algorithms is the method used for computing the search direction.

Caution: in non-convex problems with multiple local minima, gradient methods will only find a local minimum, typically one near the starting point.

Multidisciplinary Design Optimization of Aircrafts 108
59 2.2.4 Steepest Descent Method

The earliest reference to this method is given by Cauchy in 1847 [14]. The steepest descent method uses the gradient vector at each point as the search direction for each iteration. The gradient vector at a point, g(x_k), is also the direction of maximum rate of change (maximum increase) of the function at that point. This rate of change is given by the norm, ‖g(x_k)‖.

As mentioned previously, the gradient vector is orthogonal to the plane tangent to the isosurfaces of the function. If we use an exact line search, the steepest descent direction at each iteration is orthogonal to the previous one, i.e.,

df(x_{k+1})/dα = ∇f(x_{k+1})^T ∂x_{k+1}/∂α = ∇f(x_{k+1})^T p_k = 0 ⇒ g^T(x_{k+1}) g(x_k) = 0  (2.42)

Therefore the method zigzags in the design space and is rather inefficient. Although a substantial decrease may be observed in the first few iterations, the method is usually very slow after that. In particular, while the algorithm is guaranteed to converge, it may take an infinite number of iterations. The rate of convergence is linear.

Multidisciplinary Design Optimization of Aircrafts 109
60 Input: function f, starting point x_0 and convergence parameters ε_g, ε_a and ε_r
Output: local minimum of f
begin
  repeat
    compute g(x_k) ≡ ∇f(x_k)
    if ‖g(x_k)‖ ≤ ε_g then converged
    else compute normalized search direction p_k = −g(x_k)/‖g(x_k)‖
    end
    perform line search to find step length α_k in the direction of p_k
    update the current point, x_{k+1} = x_k + α_k p_k
    evaluate f(x_{k+1})
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| satisfied for two successive iterations then converged
    else set k = k + 1, x_k = x_{k+1}
    end
  until converged
  stop
end
Pseudo-code 8: Steepest Descent Algorithm
Multidisciplinary Design Optimization of Aircrafts 110
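A minimal Python sketch of Pseudo-code 8, paired with a backtracking line search for the step length (the ill-conditioned quadratic test function and all tolerances are assumptions for illustration):

```python
import numpy as np

def steepest_descent(f, grad, x0, eps_g=1e-8, max_iter=10000):
    """Steepest descent with a simple backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:          # gradient-norm convergence test
            break
        p = -g / np.linalg.norm(g)              # normalized descent direction
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(x + alpha * p) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5                        # backtrack until sufficient decrease
        x = x + alpha * p
    return x

# Ill-conditioned quadratic: minimum at the origin; expect slow zigzagging
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star = steepest_descent(f, grad, [2.0, 1.0])
```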
61 Here, |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| is a check for the successive reductions of f. ε_a is the absolute tolerance on the change in function value (usually small, ≈ 10⁻⁶) and ε_r is the relative tolerance (usually set to 0.01). If f is of order 1, then ε_r dominates; if f gets too small, then the absolute tolerance takes over.

For steepest descent and other gradient methods that do not produce well-scaled search directions, we need to use other information to guess a step length. One strategy is to assume that the first-order change in x_k will be the same as the one obtained in the previous step, i.e., that ᾱ g_k^T p_k = α_{k−1} g_{k−1}^T p_{k−1}, and therefore

ᾱ = α_{k−1} (g_{k−1}^T p_{k−1}) / (g_k^T p_k).  (2.43)

Since steepest descent relies only on first-order information, which is useful locally, it takes into account neither previous iterations nor second-order information, which would help in getting the bigger picture.

Multidisciplinary Design Optimization of Aircrafts 111
62 Example 2.6: Steepest Descent Applied to a Quadratic Function Figure 2.2: Solution path of the steepest descent method Multidisciplinary Design Optimization of Aircrafts 112
63 2.2.5 Conjugate Gradient Method

First presented by Fletcher and Reeves [27], this method includes a small modification to the steepest descent method that takes into account the history of the gradients to move more directly towards the optimum. It can find the minimum of a quadratic function of n variables in n iterations.

Consider the problem of minimizing a (convex) quadratic function

f(x) = ½ x^T A x − c^T x  (2.44)

where A is an n × n matrix that is symmetric and positive definite. Differentiating with respect to x yields

∇f(x) = Ax − c  (2.45)

Thus, minimizing the quadratic is equivalent to solving the linear system Ax = c. The conjugate gradient method is an iterative method for solving linear systems of equations such as this one.

A set of nonzero vectors {p_0, p_1, ..., p_{n−1}} is conjugate with respect to A if

p_i^T A p_j = 0, for all i ≠ j.  (2.46)

Conjugate vectors are linearly independent.

Multidisciplinary Design Optimization of Aircrafts 113
64 Suppose that we start from a point x_0 and a set of conjugate directions {p_0, p_1, ..., p_{n−1}}. In this method, the gradients of f are used to generate the conjugate directions.

Let g_k = ∇f(x_k) = Ax_k − c, where x_k is the current point at iteration k. The first direction is chosen as the steepest-descent direction,

p_0 = −g_0.  (2.47)

The sequence {x_k} is generated by minimizing f(x) along p_k, thus

x_{k+1} = x_k + α_k p_k  (2.48)

where α_k is obtained from the line search problem

minimize f(α) = f(x_k + αp_k)  (2.49)

Setting df(α)/dα = 0 yields

p_k^T A p_k α_k + p_k^T (A x_k − c) = 0 ⇒ α_k = −p_k^T g_k / (p_k^T A p_k)  (2.50)

Also, the exact line search condition df(α)/dα = 0 yields

p_k^T g_{k+1} = 0  (2.51)

Multidisciplinary Design Optimization of Aircrafts 114
65 Now, the key step: choosing p_{k+1} to be of the form

p_{k+1} = −g_{k+1} + β_k p_k  (2.52)

where β_k introduces a deflection in the steepest-descent direction. Requiring p_{k+1} to be conjugate to p_k,

p_{k+1}^T A p_k = −g_{k+1}^T A p_k + β_k p_k^T A p_k = 0  (2.53)

Manipulating x_{k+1} = x_k + α_k p_k leads to

A p_k = (g_{k+1} − g_k)/α_k.  (2.54)

Rearranging these equations yields

β_k = g_{k+1}^T (g_{k+1} − g_k) / (α_k p_k^T A p_k)  (2.55)

Taking the inner product of (2.52) at the previous iteration (p_k = −g_k + β_{k−1} p_{k−1}) with g_k and using (2.51) results in

p_k^T g_k = −g_k^T g_k  (2.56)

Multidisciplinary Design Optimization of Aircrafts 115
66 Substituting (2.56) into (2.50) leads to

α_k = g_k^T g_k / (p_k^T A p_k)  (2.57)

Finally, replacing (2.57) in (2.55),

β_k = g_{k+1}^T (g_{k+1} − g_k) / (g_k^T g_k)  (2.58)

For any x_0, the sequence {x_k} generated by the conjugate direction algorithm converges to the solution of the linear system in at most n steps, but only for quadratic functions. This is referred to as the (linear) Polak–Ribière algorithm. Convergence is also only guaranteed with exact line search and no round-off errors. In the case of general functions, a restart is made every n iterations, wherein a steepest descent step is taken for computational stability.

Multidisciplinary Design Optimization of Aircrafts 116
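The n-step convergence on a quadratic can be demonstrated directly; a minimal sketch of the linear CG recurrence (2.47)–(2.58), on an assumed 4 × 4 symmetric positive definite system:

```python
import numpy as np

def linear_cg(A, c, x0, tol=1e-12):
    """Linear conjugate gradient for min 0.5 x^T A x - c^T x, i.e. A x = c."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - c                  # g_k = grad f(x_k) = A x_k - c
    p = -g                         # p_0 = -g_0, eq. (2.47)
    for _ in range(len(c)):
        alpha = (g @ g) / (p @ A @ p)           # eq. (2.57)
        x = x + alpha * p
        g_new = A @ x - c
        beta = (g_new @ (g_new - g)) / (g @ g)  # eq. (2.58)
        g, p = g_new, -g_new + beta * p         # eq. (2.52)
        if np.linalg.norm(g) < tol:
            break
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)        # symmetric positive definite (assumed data)
c = rng.standard_normal(4)
x = linear_cg(A, c, np.zeros(4))
```

After at most n = 4 iterations the residual Ax − c is at round-off level.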
67 If we consider

g_{k+1}^T g_k = g_{k+1}^T (−p_k + β_{k−1} p_{k−1}) = β_{k−1} g_{k+1}^T p_{k−1} = β_{k−1} (g_k^T + α_k p_k^T A) p_{k−1} = 0

and substitute this in (2.58), we obtain

β_k = g_{k+1}^T g_{k+1} / (g_k^T g_k)  (2.59)

which is the nonlinear CG algorithm, also known as the Fletcher–Reeves method.

The only difference of CG relative to steepest descent is that each descent direction is modified by adding a contribution from the previous direction.

The rate of convergence is linear, but can be superlinear; the method typically converges in n to 5n iterations, usually about 2n.

Several variants of the Fletcher–Reeves CG method have been proposed. Most of these variants differ in their definition of β_k. For example, Dai and Yuan [16] proposed

β_k = ‖g_{k+1}‖² / ((g_{k+1} − g_k)^T p_k).  (2.60)

Multidisciplinary Design Optimization of Aircrafts 117
68 Input: function f, starting point x_0 and convergence parameters ε_g, ε_a and ε_r
Output: local minimum of f
begin
  set k = 0
  compute g(x_k) ≡ ∇f(x_k)
  if ‖g(x_k)‖ ≤ ε_g then converged
  end
  repeat
    compute conjugate gradient direction p_k = −g_k + β_k p_{k−1}, where β_k = (g_k^T g_k)/(g_{k−1}^T g_{k−1})
    perform line search to find step length α_k in the direction of p_k
    update the current point, x_{k+1} = x_k + α_k p_k
    evaluate f(x_{k+1})
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| satisfied for two successive iterations then converged
    else set k = k + 1, x_k = x_{k+1}
    end
  until converged
end
Pseudo-code 9: Nonlinear Conjugate Gradient Algorithm
Multidisciplinary Design Optimization of Aircrafts 118
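Nonlinear CG of this family is available in SciPy as `method='CG'` (a Polak–Ribière-type variant rather than pure Fletcher–Reeves); a minimal usage sketch on an assumed smooth convex test function with minimum at the origin:

```python
import numpy as np
from scipy.optimize import minimize

# Convex test function (an assumption): f = sum(exp(x_i) - x_i), minimum at x = 0
def f(x):
    return np.sum(np.exp(x) - x)

def grad(x):
    return np.exp(x) - 1.0

x0 = np.array([1.0, -0.5, 2.0])
res = minimize(f, x0, jac=grad, method="CG", options={"gtol": 1e-10})
```

`res.x` converges to the origin and `res.nit` reports the number of CG iterations taken.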
69 Example 2.7: Conjugate Gradient Applied to a Quadratic Function Figure 2.3: Solution path of the nonlinear conjugate gradient method Multidisciplinary Design Optimization of Aircrafts 119
70 2.2.6 Newton Methods

Even though Newton's method lacks robustness for optimization, its concepts lay down the basis for other powerful methods discussed subsequently.

While the steepest descent and conjugate gradient methods only use first-order information (the function gradient, i.e., the first-derivative term in the Taylor series) to obtain a local model of the function, Newton methods use a second-order Taylor series expansion of the function about the current design point, i.e., a quadratic model

f(x_k + d_k) ≈ f_k + g_k^T d_k + ½ d_k^T H_k d_k,  (2.61)

where d_k is the step to the minimum. Differentiating this with respect to d_k and setting the result to zero, we obtain the step that minimizes this quadratic,

H_k d_k = −g_k.  (2.62)

This is a linear system whose solution yields the Newton step, d_k. Thus, the Newton method gives both the search direction and the step size, i.e., p_k = d_k and α_k = 1.

Multidisciplinary Design Optimization of Aircrafts 120
71 When it converges, this method converges at a faster rate than first-order methods. If the function f is quadratic with a positive definite Hessian matrix H_k, then the method converges in one step. For general nonlinear functions, Newton's method converges quadratically if x_0 is sufficiently close to x* and the Hessian is positive definite at x*.

Despite the excellent convergence rate, this method has two main disadvantages:
As in the single-variable case, difficulties and even failure may occur when the quadratic model is a poor approximation of the function f.
If H_k is not positive definite, the quadratic model might not have a minimum or even a stationary point. For some nonlinear functions, the Newton step might be such that f(x_k + d_k) > f(x_k) and the method is not guaranteed to converge.

Another disadvantage of Newton's method is the need to compute not only the gradient, but also the Hessian, which contains n(n + 1)/2 second-order derivatives.

Multidisciplinary Design Optimization of Aircrafts 121
72 Input: function f, starting point x_0 and convergence parameters ε_g, ε_a and ε_r
Output: local minimum of f
begin
  set k = 0
  repeat
    compute g_k ≡ ∇f(x_k)
    if ‖g(x_k)‖ ≤ ε_g then converged
    end
    compute H_k ≡ ∇²f(x_k)
    compute Newton step d_k from H_k d_k = −g_k
    update the current point, x_{k+1} = x_k + d_k
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| satisfied for two successive iterations then converged
    else set k = k + 1, x_k = x_{k+1}
    end
  until converged
  stop
end
Pseudo-code 10: Newton's Method
Multidisciplinary Design Optimization of Aircrafts 122
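A minimal Python sketch of Pseudo-code 10 (the test function, together with its analytic gradient and Hessian, is an assumption for illustration):

```python
import numpy as np

def newton(grad, hess, x0, eps_g=1e-10, max_iter=50):
    """Pure Newton iteration: solve H_k d_k = -g_k and step with alpha = 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:     # gradient-norm convergence test
            break
        d = np.linalg.solve(hess(x), -g)   # Newton step from H_k d_k = -g_k
        x = x + d
    return x

# Assumed test function: f = x1^4 + x1*x2 + (1 + x2)^2
grad = lambda x: np.array([4*x[0]**3 + x[1], x[0] + 2*(1 + x[1])])
hess = lambda x: np.array([[12*x[0]**2, 1.0], [1.0, 2.0]])
x_star = newton(grad, hess, np.array([0.75, -1.25]))
```

Starting close to the minimizer, the iteration converges in only a few steps, illustrating the quadratic local convergence.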
73 Modified Newton's Method

To address the two main disadvantages of Newton's method mentioned above, two modifications can be made:

1. Ensure that the search direction is a descent direction of f at x_k. That is, ensure that ∇f(x_k)^T d_k < 0, which using (2.62) means

−∇f(x_k)^T [∇²f(x_k)]⁻¹ ∇f(x_k) < 0.  (2.63)

For the above to be satisfied, the Hessian of f has to be positive definite. One strategy is to replace the real Hessian with a symmetric positive definite matrix F_k defined by

F_k = H_k + γI,  (2.64)

where γ is chosen such that all the eigenvalues of F_k are greater than a scalar δ > 0. The direction vector d_k is now determined from the solution of

F_k d_k = −g_k.  (2.65)

Multidisciplinary Design Optimization of Aircrafts 123
74 2. A step size parameter α_k can be introduced to improve the approximation of highly nonlinear functions. The step size α_k is obtained from the line search: minimize f(x_k + α_k d_k). The new point is then x_{k+1} = x_k + α_k d_k.

When using Newton or quasi-Newton methods, the starting step length ᾱ is usually set to 1, since Newton's method already provides a good guess for the step size. The step size reduction ratio (ρ in the backtracking line search) sometimes varies during the optimization process and is such that 0 < ρ < 1. In practice, ρ is set neither too close to 0 nor too close to 1.

Multidisciplinary Design Optimization of Aircrafts 124
75 Input: function f, starting point x_0, scalar δ > 0 and convergence parameters ε_g, ε_a and ε_r
Output: local minimum of f
begin
  set k = 0
  repeat
    compute g_k ≡ ∇f(x_k)
    if ‖g(x_k)‖ ≤ ε_g then converged
    end
    compute H_k ≡ ∇²f(x_k) and F_k = H_k + γI
    compute search direction d_k from F_k d_k = −g_k
    compute step size α_k from: minimize f(x_k + α_k d_k)
    update current point, x_{k+1} = x_k + α_k d_k
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| satisfied for two successive iterations then converged
    else set k = k + 1, x_k = x_{k+1}
    end
  until converged
  stop
end
Pseudo-code 11: Modified Newton's Method
Multidisciplinary Design Optimization of Aircrafts 125
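The Hessian shift γ in (2.64) can be computed from the smallest eigenvalue; a minimal sketch (the choice of δ and the indefinite test matrix are assumptions for illustration):

```python
import numpy as np

def regularize(H, delta=1e-6):
    """Return F = H + gamma*I whose eigenvalues all exceed delta > 0."""
    lam_min = np.linalg.eigvalsh(H).min()   # smallest eigenvalue of symmetric H
    gamma = max(0.0, delta - lam_min)       # shift only if needed
    return H + gamma * np.eye(H.shape[0])

# Indefinite Hessian (a saddle point): eigenvalues 2 and -1
H = np.array([[2.0, 0.0], [0.0, -1.0]])
F = regularize(H)
```

The resulting F is positive definite, so the direction solved from F d = −g is guaranteed to be a descent direction.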
76 Example 2.8: Modified Newton's Method Applied to a Quadratic Function Figure 2.4: Solution path of the modified Newton's method Multidisciplinary Design Optimization of Aircrafts 126
77 2.2.7 Quasi-Newton Methods

This class of methods uses first-order information only, but builds second-order information (an approximate Hessian) based on the sequence of function values and gradients from previous iterations. Most of these methods also force the Hessian approximation to be symmetric and positive definite, which can greatly improve their convergence properties.

Key to the success of Newton's method is the use of the n-dimensional curvature information given by the Hessian, which allows a local quadratic model of f. Quasi-Newton methods contrast with Newton's method in that, with Newton, all gradients and all curvature terms are computed at a single point.

The update formula is now

x_{k+1} = x_k − α_k V_k ∇f(x_k)  (2.66)

where V_k is the inverse of the Hessian approximation, V_k ≈ F_k⁻¹, and the step size α_k is determined by minimizing f(x_k + αd_k) with respect to α, where d_k = −V_k ∇f(x_k).

Multidisciplinary Design Optimization of Aircrafts 127
78 When using quasi-Newton methods, the inverse Hessian approximation is initialized to the identity matrix, V_0 = I. The update at each iteration is written as V̂_k and is added to the current one,

V_{k+1} = V_k + V̂_k.  (2.67)

Considering the Taylor-series expansion of the gradient function about x_k,

g(x_{k+1}) = g_k + H_k s_k + ···  (2.68)

where s_k = x_{k+1} − x_k. Neglecting higher-order terms in this series yields

H_k s_k = y_k,  (2.69)

where y_k = g(x_{k+1}) − g(x_k).

Then, the new approximate inverse of the Hessian, V_{k+1}, must satisfy the quasi-Newton condition,

V_{k+1} y_k = s_k.  (2.70)

The quasi-Newton methods are the most widely used of the gradient optimization methods.

Multidisciplinary Design Optimization of Aircrafts 128
79 Davidon–Fletcher–Powell (DFP) Method

One of the first quasi-Newton methods was devised by Davidon (1959) [20] and modified by Fletcher and Powell (1963) [26]. Instead of computing V_k from scratch at every iteration, a quasi-Newton method updates it in a way that accounts for the curvature measured during the most recent step. The DFP update for the inverse of the Hessian approximation can be shown to be

V_{k+1}^{DFP} = V_k − (V_k y_k y_k^T V_k)/(y_k^T V_k y_k) + (s_k s_k^T)/(s_k^T y_k)  (2.71)

Notice that V_{k+1} remains symmetric, and it can also be shown that it remains positive definite (assuming V_k is positive definite).

When applied to quadratic functions, the update formula results in the exact inverse of the Hessian matrix after n iterations; this implies convergence at the end of n iterations (same as the CG method). For large problems, the storage and update of V may be a disadvantage of quasi-Newton methods compared to the conjugate gradient method.

Multidisciplinary Design Optimization of Aircrafts 129
80 The DFP Algorithm
1. Select starting point x_0, and convergence parameter ε_g. Set k = 0 and V_0 = I.
2. Compute g(x_k) ≡ ∇f(x_k). If ‖g(x_k)‖ ≤ ε_g then stop. Otherwise, continue.
3. Compute the search direction, p_k = −V_k g_k.
4. Perform line search to find step length α_k in the direction of p_k (start with α_k = 1).
5. Update the current point, x_{k+1} = x_k + α_k p_k, set s_k = α_k p_k, and compute the change in the gradient, y_k = g_{k+1} − g_k.
6. Update V_{k+1} by computing

A_k = (V_k y_k y_k^T V_k)/(y_k^T V_k y_k),  B_k = (s_k s_k^T)/(s_k^T y_k),  V_{k+1} = V_k − A_k + B_k

7. Set k = k + 1 and return to step 2.

Multidisciplinary Design Optimization of Aircrafts 130
81 Broyden–Fletcher–Goldfarb–Shanno (BFGS) Method

The DFP update was soon superseded by the BFGS formula [12, 13, 25, 31, 54], which is generally considered to be the most effective quasi-Newton update. The BFGS update formula for the inverse of the Hessian approximation can be shown to be

V_{k+1}^{BFGS} = V_k − (s_k y_k^T V_k + V_k y_k s_k^T)/(s_k^T y_k) + (1 + (y_k^T V_k y_k)/(s_k^T y_k)) (s_k s_k^T)/(s_k^T y_k)  (2.72)

The relative performance between the DFP and BFGS methods is problem dependent. The BFGS update is better suited than the DFP update when using an approximate line search.

Multidisciplinary Design Optimization of Aircrafts 131
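The inverse-Hessian update (2.72) is only a few lines of code; a minimal sketch that also checks the quasi-Newton condition V_{k+1} y_k = s_k on an assumed step/gradient-change pair:

```python
import numpy as np

def bfgs_update(V, s, y):
    """BFGS update of the inverse Hessian approximation, eq. (2.72)."""
    sy = s @ y                                  # curvature s_k^T y_k (must be > 0)
    Vy = V @ y
    return (V
            - (np.outer(s, Vy) + np.outer(Vy, s)) / sy
            + (1.0 + (y @ Vy) / sy) * np.outer(s, s) / sy)

# Arbitrary step and gradient change with positive curvature (assumed data)
V = np.eye(3)                                   # V_0 = I
s = np.array([0.1, -0.2, 0.3])
y = np.array([0.3, -0.1, 0.5])
V_new = bfgs_update(V, s, y)
```

By construction, the updated matrix is symmetric and satisfies V_new @ y == s exactly, which is the quasi-Newton condition (2.70).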
82 Example 2.9: BFGS Applied to a Quadratic Function Figure 2.5: Solution path of the BFGS method Multidisciplinary Design Optimization of Aircrafts 132
83 2.2.8 Trust Region Methods

Trust region, or restricted-step, methods are a different approach to resolving the weaknesses of the pure form of Newton's method, arising from a Hessian that is not positive definite or a highly nonlinear function. One may interpret these problems as arising from trying to minimize the quadratic approximation of f(x),

minimize q(x) = f(x_k) + ∇f(x_k)^T (x − x_k) + ½ (x − x_k)^T ∇²f(x_k)(x − x_k)

in a region which is outside the validity region of the quadratic approximation.

These difficulties can be overcome by minimizing the function within a region around x_k wherein the second-order Taylor series approximation is valid, that is to say, where there is trust in the quadratic model. This region is called the trust region and can be denoted by

Ω_k = {x : ‖x − x_k‖ ≤ h_k}

where h_k is the size of the trust region, which is dynamically adjusted.

Multidisciplinary Design Optimization of Aircrafts 133
84 The quadratic approximation q(x) is minimized within Ω_k:

minimize q(s_k) = f(x_k) + g(x_k)^T s_k + ½ s_k^T H(x_k) s_k  w.r.t. s_k  (2.73)
subject to −h_k ≤ (s_k)_i ≤ h_k, i = 1, ..., n

which is a constrained minimization problem involving a quadratic objective function and linear constraints. This class of problems is called a quadratic programming (QP) problem, and its solution is discussed in future sections.

After obtaining s_k, the actual and predicted changes in the objective function can be computed as

Δf = f(x_k) − f(x_k + s_k)  (2.74)
Δq = f(x_k) − q(s_k)  (2.75)

Then the accuracy with which q(s_k) approximates f(x_k + s_k) can be measured by the ratio

r_k = Δf / Δq  (2.76)

The closer r_k is to unity, the better the agreement.

Multidisciplinary Design Optimization of Aircrafts 134
85 The size of the trust region is updated based on this ratio as follows:

h_{k+1} = ‖s_k‖/4   if r_k < 0.25,
h_{k+1} = 2h_k      if r_k > 0.75 and h_k = ‖s_k‖,  (2.77)
h_{k+1} = h_k       otherwise.

The initial value of h is usually taken as h_0 = 1. The quadratic model is reasonable when q(s_k) is close to the real value of the function f(x_k + s_k).

A particular advantage of trust region methods is that they are not very sensitive to scaling, because of the dynamic adjustment of the size of the trust region. They are also very robust.

Multidisciplinary Design Optimization of Aircrafts 135
86 Input: function f, starting point x_0, convergence parameters ε_g, ε_a and ε_r, and initial size of trust region h_0
Output: local minimum of f
begin
  set k = 0
  repeat
    compute g_k ≡ ∇f(x_k)
    if ‖g(x_k)‖ ≤ ε_g then converged
    end
    compute H_k ≡ ∇²f(x_k) and solve the quadratic subproblem (2.73) for s_k
    evaluate f(x_k + s_k) and compute the quadratic model accuracy ratio r_k (2.76)
    compute the size of the new trust region using (2.77)
    determine the new point: x_{k+1} = x_k if r_k ≤ 0; x_{k+1} = x_k + s_k otherwise
    if |f(x_{k+1}) − f(x_k)| ≤ ε_a + ε_r |f(x_k)| satisfied for two successive iterations then converged
    else set k = k + 1, x_k = x_{k+1}
    end
  until converged
end
Pseudo-code 12: Trust Region algorithm
Multidisciplinary Design Optimization of Aircrafts 136
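The radius-update rule (2.77) is easy to isolate and test on its own; a minimal sketch (the function name and the sample step are assumptions for illustration):

```python
import numpy as np

def update_radius(h, s, r):
    """Trust-region size update, eq. (2.77)."""
    if r < 0.25:
        return np.linalg.norm(s) / 4.0      # poor model agreement: shrink
    if r > 0.75 and np.isclose(h, np.linalg.norm(s)):
        return 2.0 * h                      # good model, step at the boundary: expand
    return h                                # otherwise keep the radius

s = np.array([0.6, 0.8])                    # step of norm 1.0
print(update_radius(1.0, s, r=0.9))         # step hit the boundary with good ratio
print(update_radius(1.0, s, r=0.1))         # poor ratio: shrink to |s|/4
```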
87 Example 2.10: Minimization of the Rosenbrock Function

Minimize Rosenbrock's function,

f(x) = 100(x_2 − x_1²)² + (1 − x_1)²,

starting from x_0 = [ ].

Multidisciplinary Design Optimization of Aircrafts 137
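A comparison of this kind can be reproduced with SciPy's built-in Rosenbrock helpers; a minimal sketch (the starting point is an assumption, since it is omitted above; [−1.2, 1] is the classic benchmark choice):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])    # assumed starting point (classic benchmark value)

# Nonlinear conjugate gradient vs. BFGS on the same problem
res_cg = minimize(rosen, x0, jac=rosen_der, method="CG")
res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

for res in (res_cg, res_bfgs):
    print(res.x, res.nit)     # both should reach the minimum at (1, 1)
```

Comparing `res.nit` between the two runs illustrates the relative efficiency of the search-direction choices discussed in this section.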
88 Figure 2.6: Solution path of the steepest descent and conjugate gradient methods Multidisciplinary Design Optimization of Aircrafts 138
89 Figure 2.7: Solution path of the modified Newton and BFGS methods Multidisciplinary Design Optimization of Aircrafts 139
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428
More information17 Solution of Nonlinear Systems
17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m
More informationOPER 627: Nonlinear Optimization Lecture 14: Mid-term Review
OPER 627: Nonlinear Optimization Lecture 14: Mid-term Review Department of Statistical Sciences and Operations Research Virginia Commonwealth University Oct 16, 2013 (Lecture 14) Nonlinear Optimization
More informationChapter 3: Root Finding. September 26, 2005
Chapter 3: Root Finding September 26, 2005 Outline 1 Root Finding 2 3.1 The Bisection Method 3 3.2 Newton s Method: Derivation and Examples 4 3.3 How To Stop Newton s Method 5 3.4 Application: Division
More informationAM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality
More informationLine Search Methods for Unconstrained Optimisation
Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic
More informationSeptember Math Course: First Order Derivative
September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which
More informationEAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science
EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Multidimensional Unconstrained Optimization Suppose we have a function f() of more than one
More informationUniversity of Houston, Department of Mathematics Numerical Analysis, Fall 2005
3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider
More information, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are
Quadratic forms We consider the quadratic function f : R 2 R defined by f(x) = 2 xt Ax b T x with x = (x, x 2 ) T, () where A R 2 2 is symmetric and b R 2. We will see that, depending on the eigenvalues
More informationUNCONSTRAINED OPTIMIZATION
UNCONSTRAINED OPTIMIZATION 6. MATHEMATICAL BASIS Given a function f : R n R, and x R n such that f(x ) < f(x) for all x R n then x is called a minimizer of f and f(x ) is the minimum(value) of f. We wish
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationMATH 4211/6211 Optimization Basics of Optimization Problems
MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization
More informationx 2 x n r n J(x + t(x x ))(x x )dt. For warming-up we start with methods for solving a single equation of one variable.
Maria Cameron 1. Fixed point methods for solving nonlinear equations We address the problem of solving an equation of the form (1) r(x) = 0, where F (x) : R n R n is a vector-function. Eq. (1) can be written
More informationThe Steepest Descent Algorithm for Unconstrained Optimization
The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem
More informationLine Search Methods. Shefali Kulkarni-Thaker
1 BISECTION METHOD Line Search Methods Shefali Kulkarni-Thaker Consider the following unconstrained optimization problem min f(x) x R Any optimization algorithm starts by an initial point x 0 and performs
More informationComputational Finance
Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples
More informationNumerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09
Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods
More informationUnconstrained Optimization
1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation
More informationUnit 2: Solving Scalar Equations. Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright
cs416: introduction to scientific computing 01/9/07 Unit : Solving Scalar Equations Notes prepared by: Amos Ron, Yunpeng Li, Mark Cowlishaw, Steve Wright Instructor: Steve Wright 1 Introduction We now
More informationIntroduction to gradient descent
6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our
More informationNonlinear Optimization: What s important?
Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global
More informationStatic unconstrained optimization
Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 5 Nonlinear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationAM 205: lecture 18. Last time: optimization methods Today: conditions for optimality
AM 205: lecture 18 Last time: optimization methods Today: conditions for optimality Existence of Global Minimum For example: f (x, y) = x 2 + y 2 is coercive on R 2 (global min. at (0, 0)) f (x) = x 3
More informationLecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 5. Nonlinear Equations
Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T Heath Chapter 5 Nonlinear Equations Copyright c 2001 Reproduction permitted only for noncommercial, educational
More informationOptimality Conditions
Chapter 2 Optimality Conditions 2.1 Global and Local Minima for Unconstrained Problems When a minimization problem does not have any constraints, the problem is to find the minimum of the objective function.
More informationNumerical Optimization
Numerical Optimization Unit 2: Multivariable optimization problems Che-Rung Lee Scribe: February 28, 2011 (UNIT 2) Numerical Optimization February 28, 2011 1 / 17 Partial derivative of a two variable function
More informationLecture 7: Minimization or maximization of functions (Recipes Chapter 10)
Lecture 7: Minimization or maximization of functions (Recipes Chapter 10) Actively studied subject for several reasons: Commonly encountered problem: e.g. Hamilton s and Lagrange s principles, economics
More informationQuasi-Newton Methods
Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications
More informationChapter 4. Unconstrained optimization
Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file
More informationMultivariate Newton Minimanization
Multivariate Newton Minimanization Optymalizacja syntezy biosurfaktantu Rhamnolipid Rhamnolipids are naturally occuring glycolipid produced commercially by the Pseudomonas aeruginosa species of bacteria.
More informationGradient Descent. Sargur Srihari
Gradient Descent Sargur srihari@cedar.buffalo.edu 1 Topics Simple Gradient Descent/Ascent Difficulties with Simple Gradient Descent Line Search Brent s Method Conjugate Gradient Descent Weight vectors
More informationConstrained optimization. Unconstrained optimization. One-dimensional. Multi-dimensional. Newton with equality constraints. Active-set method.
Optimization Unconstrained optimization One-dimensional Multi-dimensional Newton s method Basic Newton Gauss- Newton Quasi- Newton Descent methods Gradient descent Conjugate gradient Constrained optimization
More informationUnconstrained optimization I Gradient-type methods
Unconstrained optimization I Gradient-type methods Antonio Frangioni Department of Computer Science University of Pisa www.di.unipi.it/~frangio frangio@di.unipi.it Computational Mathematics for Learning
More informationIntroduction to unconstrained optimization - direct search methods
Introduction to unconstrained optimization - direct search methods Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Structure of optimization methods Typically Constraint handling converts the
More informationNonlinear Equations. Chapter The Bisection Method
Chapter 6 Nonlinear Equations Given a nonlinear function f(), a value r such that f(r) = 0, is called a root or a zero of f() For eample, for f() = e 016064, Fig?? gives the set of points satisfying y
More informationNumerical Methods. Root Finding
Numerical Methods Solving Non Linear 1-Dimensional Equations Root Finding Given a real valued function f of one variable (say ), the idea is to find an such that: f() 0 1 Root Finding Eamples Find real
More informationCHAPTER 4 ROOTS OF EQUATIONS
CHAPTER 4 ROOTS OF EQUATIONS Chapter 3 : TOPIC COVERS (ROOTS OF EQUATIONS) Definition of Root of Equations Bracketing Method Graphical Method Bisection Method False Position Method Open Method One-Point
More informationMathematical optimization
Optimization Mathematical optimization Determine the best solutions to certain mathematically defined problems that are under constrained determine optimality criteria determine the convergence of the
More informationScientific Computing: Optimization
Scientific Computing: Optimization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 March 8th, 2011 A. Donev (Courant Institute) Lecture
More informationOptimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23
Optimization: Nonlinear Optimization without Constraints Nonlinear Optimization without Constraints 1 / 23 Nonlinear optimization without constraints Unconstrained minimization min x f(x) where f(x) is
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More information(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)
Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not
More informationMath 409/509 (Spring 2011)
Math 409/509 (Spring 2011) Instructor: Emre Mengi Study Guide for Homework 2 This homework concerns the root-finding problem and line-search algorithms for unconstrained optimization. Please don t hesitate
More informationOutline. Scientific Computing: An Introductory Survey. Nonlinear Equations. Nonlinear Equations. Examples: Nonlinear Equations
Methods for Systems of Methods for Systems of Outline Scientific Computing: An Introductory Survey Chapter 5 1 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign
More informationAlgorithms for Constrained Optimization
1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic
More informationChapter 6: Derivative-Based. optimization 1
Chapter 6: Derivative-Based Optimization Introduction (6. Descent Methods (6. he Method of Steepest Descent (6.3 Newton s Methods (NM (6.4 Step Size Determination (6.5 Nonlinear Least-Squares Problems
More informationMethods for Unconstrained Optimization Numerical Optimization Lectures 1-2
Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods
More informationMath 273a: Optimization Netwon s methods
Math 273a: Optimization Netwon s methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 some material taken from Chong-Zak, 4th Ed. Main features of Newton s method Uses both first derivatives
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More information15 Nonlinear Equations and Zero-Finders
15 Nonlinear Equations and Zero-Finders This lecture describes several methods for the solution of nonlinear equations. In particular, we will discuss the computation of zeros of nonlinear functions f(x).
More informationCS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares
CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search
More information1. Method 1: bisection. The bisection methods starts from two points a 0 and b 0 such that
Chapter 4 Nonlinear equations 4.1 Root finding Consider the problem of solving any nonlinear relation g(x) = h(x) in the real variable x. We rephrase this problem as one of finding the zero (root) of a
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationAM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)
More informationOptimization Tutorial 1. Basic Gradient Descent
E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.
More informationSolution of Nonlinear Equations
Solution of Nonlinear Equations (Com S 477/577 Notes) Yan-Bin Jia Sep 14, 017 One of the most frequently occurring problems in scientific work is to find the roots of equations of the form f(x) = 0. (1)
More informationMethods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent
Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived
More informationLECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION
15-382 COLLECTIVE INTELLIGENCE - S19 LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION TEACHER: GIANNI A. DI CARO WHAT IF WE HAVE ONE SINGLE AGENT PSO leverages the presence of a swarm: the outcome
More informationSOLUTION OF ALGEBRAIC AND TRANSCENDENTAL EQUATIONS BISECTION METHOD
BISECTION METHOD If a function f(x) is continuous between a and b, and f(a) and f(b) are of opposite signs, then there exists at least one root between a and b. It is shown graphically as, Let f a be negative
More informationPART I Lecture Notes on Numerical Solution of Root Finding Problems MATH 435
PART I Lecture Notes on Numerical Solution of Root Finding Problems MATH 435 Professor Biswa Nath Datta Department of Mathematical Sciences Northern Illinois University DeKalb, IL. 60115 USA E mail: dattab@math.niu.edu
More informationNumerical Optimization: Basic Concepts and Algorithms
May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some
More informationNumerical Optimization Techniques
Numerical Optimization Techniques Léon Bottou NEC Labs America COS 424 3/2/2010 Today s Agenda Goals Representation Capacity Control Operational Considerations Computational Considerations Classification,
More information1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0
Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =
More information1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:
Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationChapter 1. Root Finding Methods. 1.1 Bisection method
Chapter 1 Root Finding Methods We begin by considering numerical solutions to the problem f(x) = 0 (1.1) Although the problem above is simple to state it is not always easy to solve analytically. This
More informationSolution of Algebric & Transcendental Equations
Page15 Solution of Algebric & Transcendental Equations Contents: o Introduction o Evaluation of Polynomials by Horner s Method o Methods of solving non linear equations o Bracketing Methods o Bisection
More information