Numerical Optimization Algorithms

1. Overview
2. Calculus of Variations
3. Linearized Supersonic Flow
4. Steepest Descent
5. Smoothed Steepest Descent
Overview 1

Two Main Categories of Optimization Algorithms:
- Gradient Based
- Non-Gradient Based
Overview 2

Non-Gradient Based
- Only objective function evaluations are used to find the optimum point; the gradient and Hessian of the objective function are not needed.
- May be able to find the global minimum, BUT requires a large number of design cycles.
- Non-gradient based family of methods: genetic algorithms, grid searchers, stochastic methods, nonlinear simplex, etc.
- In the case of Genetic Algorithms:
  - Evaluation of the objective function on an initial set of solutions starts the design process. The initial set is typically very LARGE.
  - Able to handle integer variables such as the number of vertical tails, the number of engines, and other integer parameters.
  - Able to seek the optimum point for objective functions that do not have smooth first or second derivatives.
Overview 3

Gradient Based
- Requires the existence of continuous first derivatives of the objective function, and possibly higher derivatives.
- Generally requires a much smaller number of design cycles to converge to an optimum than non-gradient based methods. However, only convergence to a local minimum is guaranteed.
- Simple gradient-based methods require only the gradient of the objective function, but usually take N iterations or more, where N is the number of design variables.
- Methods that use the Hessian (Quasi-Newton) generally require only about N iterations.
Overview 4

Gradient or Non-Gradient Based? Two-Step Approach:
1. Use a low-fidelity method (Panel Method, Euler) together with a non-gradient based method in the Conceptual Design Stage.
2. Use a higher-fidelity method (Navier-Stokes) together with a gradient based method to refine the design.

The proper combination of the different flow solvers with the various optimization algorithms is still an OPEN research topic.
Calculus of Variations 1

Consider a class of optimization problems for which a curve y(x) is to be chosen to minimize a cost function described by

    I = \int_{x_0}^{x_1} F(x, y, y') \, dx,

where F is an arbitrary function that is continuous and twice differentiable. The function F depends on x, y, and y', where y(x) is the trajectory to be optimized, assumed continuous and differentiable, and y' denotes the derivative of y. Under a variation \delta y, the first variation of the cost function can be expressed as

    \delta I = \int_{x_0}^{x_1} \left( \frac{\partial F}{\partial y}\,\delta y + \frac{\partial F}{\partial y'}\,\delta y' \right) dx.

Expand the equation by integrating the second term by parts:

    \delta I = \int_{x_0}^{x_1} \frac{\partial F}{\partial y}\,\delta y \, dx + \left[ \frac{\partial F}{\partial y'}\,\delta y \right]_{x_0}^{x_1} - \int_{x_0}^{x_1} \frac{d}{dx}\left( \frac{\partial F}{\partial y'} \right) \delta y \, dx.
Calculus of Variations 2

Assuming fixed end points, the variations of y at x_0 and x_1 are zero, \delta y(x_0) = \delta y(x_1) = 0, so that

    \delta I = \int_{x_0}^{x_1} \left( \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} \right) \delta y \, dx = \int_{x_0}^{x_1} G \, \delta y \, dx,

where G may be recognized as the gradient of the cost function and is expressed as

    G = \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}.

A further variation of the gradient then results in the expression

    \delta G = \frac{\partial G}{\partial y}\,\delta y + \frac{\partial G}{\partial y'}\,\delta y' + \frac{\partial G}{\partial y''}\,\delta y''  or  \delta G = A\,\delta y,

where A is the Hessian operator. Thus the Hessian can be expressed as the differential operator

    A = \frac{\partial G}{\partial y} + \frac{\partial G}{\partial y'}\frac{d}{dx} + \frac{\partial G}{\partial y''}\frac{d^2}{dx^2}.    (1)
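As a numerical sanity check of the gradient formula, the first variation \delta I = \int G \delta y dx can be verified for a sample functional. The sketch below uses F = (1/2) y'^2, for which G = -y''; the functional, mesh, and variation are illustrative choices, not taken from the slides.

```python
import numpy as np

# Check dI = integral of G * dy dx for the sample functional
# F = (1/2) y'^2, whose gradient is G = dF/dy - d/dx(dF/dy') = -y''.
# Mesh, base curve, and variation are illustrative choices.
x = np.linspace(0.0, 1.0, 4001)
y = np.sin(np.pi * x)              # base curve
dy = 1e-5 * x * (1.0 - x)          # variation with dy = 0 at both end points

def trapezoid(f):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def cost(y):
    yp = np.gradient(y, x)
    return trapezoid(0.5 * yp**2)  # I = integral of (1/2) y'^2 dx

G = -np.gradient(np.gradient(y, x), x)   # gradient G = -y''
predicted = trapezoid(G * dy)            # first variation via the gradient
actual = cost(y + dy) - cost(y)          # direct change in the cost
print(predicted, actual)                 # the two values agree closely
```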
Linearized Supersonic Flow 1

In this example, we explore this concept by deriving the gradient and Hessian operator for linearized supersonic flow. Consider a linearized supersonic flow over a profile of height y(x), where y is continuous and twice differentiable. The surface pressure can be defined as

    p - p_\infty = \frac{\rho q^2}{\sqrt{M^2 - 1}}\,\frac{dy}{dx},

where \frac{\rho q^2}{\sqrt{M^2 - 1}} is a constant and p_\infty is the freestream pressure. Next consider an inverse problem with cost function

    I = \frac{1}{2} \int_B (p - p_t)^2 \, dx,

where p_t is the target surface pressure. The variations of the surface pressure and of the cost function under a profile variation \delta y are

    \delta p = \frac{\rho q^2}{\sqrt{M^2 - 1}}\,\frac{d}{dx}\delta y  and  \delta I = \int_B (p - p_t)\,\delta p \, dx.
Linearized Supersonic Flow 2

Substitute the variation of the pressure into the equation for the variation of the cost function and integrate by parts to obtain

    \delta I = \int_B (p - p_t)\,\frac{\rho q^2}{\sqrt{M^2 - 1}}\,\frac{d}{dx}\delta y \, dx = -\int_B \frac{\rho q^2}{\sqrt{M^2 - 1}}\,\frac{d}{dx}(p - p_t)\,\delta y \, dx.

The gradient can then be defined as

    g = -\frac{\rho q^2}{\sqrt{M^2 - 1}}\,\frac{d}{dx}(p - p_t).
Linearized Supersonic Flow 3

To form the Hessian, take a variation of the gradient and substitute the expression for \delta p:

    \delta g = -\frac{\rho q^2}{\sqrt{M^2 - 1}}\,\frac{d}{dx}\delta p = -\frac{\rho^2 q^4}{M^2 - 1}\,\frac{d^2}{dx^2}\delta y.

Thus the Hessian for the inverse design of the linearized supersonic flow problem can be expressed as the differential operator

    A = -\frac{\rho^2 q^4}{M^2 - 1}\,\frac{d^2}{dx^2}.    (2)
Brachistochrone 1

Brachistochrone Problem: find the minimum time taken by a particle traversing a path y(x) connecting initial and final points (x_0, y_0) and (x_1, y_1), subject only to the force of gravity. The total time is given by

    T = \int_{x_0}^{x_1} \frac{ds}{v},

where the velocity of a particle starting from rest and falling under the influence of gravity is

    v = \sqrt{2gy},

and the arc length element is

    ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx = \sqrt{1 + y'^2}\,dx.
Brachistochrone 2

Substituting for v and ds yields

    T = \int_{x_0}^{x_1} \sqrt{\frac{1 + y'^2}{2gy}}\,dx = \frac{1}{\sqrt{2g}} \int_{x_0}^{x_1} \sqrt{\frac{1 + y'^2}{y}}\,dx = \frac{I}{\sqrt{2g}}.

Therefore,

    I = \int_{x_0}^{x_1} \sqrt{\frac{1 + y'^2}{y}}\,dx = \int_{x_0}^{x_1} F(y, y')\,dx.

From the calculus of variations,

    G = \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}.

Computing the partial derivatives of F with respect to y and y' and substituting into the gradient formula produces

    G = -\frac{1}{2}\sqrt{\frac{1 + y'^2}{y^3}} - \frac{d}{dx}\left( \frac{y'}{\sqrt{y(1 + y'^2)}} \right).
Brachistochrone 3

The expression for the gradient can then be simplified to

    G = -\frac{1 + y'^2 + 2yy''}{2\left( y(1 + y'^2) \right)^{3/2}}.

Since F does not depend explicitly on x,

    y'G = y'\frac{\partial F}{\partial y} - y'\frac{d}{dx}\frac{\partial F}{\partial y'} = \frac{dF}{dx} - \frac{\partial F}{\partial y'}\,y'' - y'\frac{d}{dx}\frac{\partial F}{\partial y'} = \frac{d}{dx}\left( F - y'\frac{\partial F}{\partial y'} \right).

On an optimal path, G = 0, so

    y'G = \frac{d}{dx}\left( F - y'\frac{\partial F}{\partial y'} \right) = 0,

or

    F - y'\frac{\partial F}{\partial y'} = \text{const}.
Brachistochrone 4

The expression can then be expanded to

    F - y'\frac{\partial F}{\partial y'} = \sqrt{\frac{1 + y'^2}{y}} - \frac{y'^2}{\sqrt{y(1 + y'^2)}} = \frac{1}{\sqrt{y(1 + y'^2)}} = \text{const},

so that

    y(1 + y'^2) = \text{const}.

The classical solution can be obtained by the substitution

    y(t) = C \sin^2\frac{t}{2}

into the above equation, where C is a constant:

    y(1 + y'^2) = C \quad\Rightarrow\quad y' = \sqrt{\frac{C - C\sin^2(t/2)}{C\sin^2(t/2)}} = \frac{\cos(t/2)}{\sin(t/2)} = \cot\frac{t}{2}.
Brachistochrone 5

Finally, the optimal path can be derived by substituting y' = \cot(t/2) into dy/dx:

    \frac{dy}{dx} = \cot\frac{t}{2} \quad\Rightarrow\quad dx = \tan\frac{t}{2}\,dy = \tan\frac{t}{2}\,\frac{dy}{dt}\,dt = \tan\frac{t}{2}\,C\sin\frac{t}{2}\cos\frac{t}{2}\,dt = C\sin^2\frac{t}{2}\,dt,

which integrates to

    x(t) = \frac{1}{2}C\,(t - \sin t).
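The cycloid solution can be checked numerically against the first integral y(1 + y'^2) = const = C; the value of C and the sample parameter values t below are illustrative.

```python
import math

# Check that the cycloid x(t) = (1/2) C (t - sin t), y(t) = C sin^2(t/2)
# satisfies the first integral y (1 + y'^2) = C. The constant C and the
# sample parameter values t are illustrative.
C = 2.0
residuals = []
for t in (0.5, 1.0, 1.5, 2.0, 2.5):
    dydt = 0.5 * C * math.sin(t)             # dy/dt
    dxdt = 0.5 * C * (1.0 - math.cos(t))     # dx/dt
    yp = dydt / dxdt                         # y' = dy/dx = cot(t/2)
    y = C * math.sin(0.5 * t) ** 2
    residuals.append(abs(y * (1.0 + yp**2) - C))
print(max(residuals))                        # ~0: first integral holds
```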
Brachistochrone 6: Continuous Gradient

Let the trajectory be represented by the discrete values y_j = y(x_j) at x_j = j\Delta x, where \Delta x is the mesh interval, 0 \le j \le N + 1, and N is the number of design variables, which is also the number of interior mesh points. From the gradient obtained through the calculus of variations, the continuous gradient can be computed as

    G_j = -\frac{1 + y_j'^2 + 2 y_j y_j''}{2\left( y_j (1 + y_j'^2) \right)^{3/2}},

where y_j' and y_j'' can be evaluated at the discrete points using second-order finite difference approximations:

    y_j' = \frac{y_{j+1} - y_{j-1}}{2\Delta x}  and  y_j'' = \frac{y_{j+1} - 2y_j + y_{j-1}}{\Delta x^2}.
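A minimal sketch of evaluating the continuous gradient G_j = -(1 + y'^2 + 2yy'') / (2(y(1 + y'^2))^{3/2}) with these central differences, on a quadratic test path (an illustrative choice). Since central differences are exact for quadratics, the finite-difference values must match the closed form to round-off.

```python
import math

# Evaluate G_j = -(1 + y'^2 + 2 y y'') / (2 (y (1 + y'^2))^(3/2)) with
# central differences on the quadratic test path y = 1 + x(1 - x)
# (an illustrative choice). Central differences are exact for quadratics,
# so the discrete values must match the closed form to round-off.
N = 20
dx = 1.0 / (N + 1)
y = [1.0 + (j * dx) * (1.0 - j * dx) for j in range(N + 2)]

errs = []
for j in range(1, N + 1):
    yp = (y[j + 1] - y[j - 1]) / (2.0 * dx)            # y'_j, central
    ypp = (y[j + 1] - 2.0 * y[j] + y[j - 1]) / dx**2   # y''_j, central
    G = -(1.0 + yp**2 + 2.0 * y[j] * ypp) / (2.0 * (y[j] * (1.0 + yp**2))**1.5)
    xj = j * dx
    yp_ex, ypp_ex = 1.0 - 2.0 * xj, -2.0               # exact derivatives
    G_ex = -(1.0 + yp_ex**2 + 2.0 * y[j] * ypp_ex) / (
        2.0 * (y[j] * (1.0 + yp_ex**2))**1.5)
    errs.append(abs(G - G_ex))
print(max(errs))                                       # round-off level
```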
Brachistochrone 7: Discrete Gradient

In the discrete approach, I can be approximated using the rectangle rule of integration:

    I = \sum_{j=0}^{N} F_{j+1/2}\,\Delta x,  where  F_{j+1/2} = \sqrt{\frac{1 + y_{j+1/2}'^2}{y_{j+1/2}}},

with

    y_{j+1/2} = \frac{1}{2}(y_{j+1} + y_j)  and  y_{j+1/2}' = \frac{y_{j+1} - y_j}{\Delta x}.

Now the discrete gradient can be evaluated as

    G_j = \frac{\partial I}{\partial y_j} = \frac{\Delta x}{2}\left( A_{j+1/2} + A_{j-1/2} \right) - \left( B_{j+1/2} - B_{j-1/2} \right),

where

    A_{j+1/2} = \frac{\partial F}{\partial y}\Big|_{j+1/2} = -\frac{1}{2}\sqrt{\frac{1 + y_{j+1/2}'^2}{y_{j+1/2}^3}}  and  B_{j+1/2} = \frac{\partial F}{\partial y'}\Big|_{j+1/2} = \frac{y_{j+1/2}'}{\sqrt{y_{j+1/2}\left(1 + y_{j+1/2}'^2\right)}}.
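The discrete gradient can be verified against a brute-force finite difference of I itself, since G_j is by definition the exact partial derivative of the discrete sum. The path and mesh size below are illustrative choices.

```python
import math

# Verify the discrete gradient
#   G_j = (dx/2)(A_{j+1/2} + A_{j-1/2}) - (B_{j+1/2} - B_{j-1/2}),
# with A = dF/dy and B = dF/dy' at the half points, against brute-force
# finite differences of I. Path and mesh size are illustrative choices.
N = 8
dx = 1.0 / (N + 1)
y = [1.0 + math.sin(math.pi * j * dx) for j in range(N + 2)]

def halves(ya, yb):
    return 0.5 * (ya + yb), (yb - ya) / dx      # y_{j+1/2}, y'_{j+1/2}

def cost(y):
    total = 0.0
    for j in range(N + 1):                      # rectangle rule for I
        ym, yp = halves(y[j], y[j + 1])
        total += math.sqrt((1.0 + yp * yp) / ym) * dx
    return total

def A(ya, yb):                                  # dF/dy at the half point
    ym, yp = halves(ya, yb)
    return -0.5 * math.sqrt((1.0 + yp * yp) / ym**3)

def B(ya, yb):                                  # dF/dy' at the half point
    ym, yp = halves(ya, yb)
    return yp / math.sqrt(ym * (1.0 + yp * yp))

errs = []
for j in range(1, N + 1):
    G = (0.5 * dx * (A(y[j], y[j + 1]) + A(y[j - 1], y[j]))
         - (B(y[j], y[j + 1]) - B(y[j - 1], y[j])))
    h = 1e-6
    up = list(y); up[j] += h
    dn = list(y); dn[j] -= h
    fd = (cost(up) - cost(dn)) / (2.0 * h)      # brute-force dI/dy_j
    errs.append(abs(G - fd))
print(max(errs))                                # close agreement
```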
Steepest Descent 1

Line search methods choose a direction p and search along it from the current iterate for a new iterate with a lower function value. Once the direction is chosen, a step length \alpha multiplies the search direction to advance the optimization to the next iterate. To obtain the search direction p and the step length \alpha, we may employ Taylor's theorem. First, define the objective function as f(x); the optimization problem can then be stated as

    \min_x f(x),

where x \in \mathbb{R}^n is a real vector with n \ge 1 components and f : \mathbb{R}^n \to \mathbb{R} is a smooth function. Let p be the search direction. Then by Taylor's theorem,

    f(x + \alpha p) = f(x) + \alpha\,p^T \nabla f + \frac{1}{2}\alpha^2\,p^T \nabla^2 f(x + tp)\,p + \dots,

for some t between 0 and \alpha.
Steepest Descent 2

In the Taylor expansion, the second term p^T \nabla f is the rate of change of f along the search direction p, and the last term contains \nabla^2 f(x + tp), the Hessian matrix. The value of p that provides the most rapid decrease in the objective function f(x) is the solution of the following optimization problem:

    \min_p \; p^T \nabla f,  subject to  \|p\| = 1.

With \|p\| = 1, the expression p^T \nabla f can be written as

    p^T \nabla f = \|p\|\,\|\nabla f\| \cos\theta = \|\nabla f\| \cos\theta,

where \theta is the angle between the search direction p and the gradient \nabla f. The above expression attains its minimum value when \cos\theta takes on the value -1.
Steepest Descent 3

Therefore, the equation can be further simplified to yield an expression for the steepest descent search direction p:

    p^T \nabla f = -\|\nabla f\| \quad\Rightarrow\quad p = -\frac{\nabla f}{\|\nabla f\|}.

Accordingly, a simple optimization algorithm can be defined by setting the search direction p to the negative of the gradient at every iteration:

    p = -\nabla f.

With a line search method the step size \alpha is chosen such that the maximum reduction of the objective function f(x) is attained. The vector x is then updated by the following expression:

    x^{n+1} = x^n - \alpha \nabla f.
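A minimal steepest-descent sketch on a 2-D quadratic f(x) = (1/2) x^T Q x, where the exact line-search step \alpha = g^T g / g^T Q g is available in closed form; the matrix Q and starting point are illustrative, not from the slides.

```python
# Steepest descent sketch on the 2-D quadratic f(x) = (1/2) x^T Q x, whose
# gradient is Q x. For a quadratic, the exact line-search step is
# alpha = (g.g) / (g.Qg). Q and the starting point are illustrative.
def matvec(Q, v):
    return [Q[0][0] * v[0] + Q[0][1] * v[1],
            Q[1][0] * v[0] + Q[1][1] * v[1]]

def steepest_descent(Q, x, iters=100):
    for _ in range(iters):
        g = matvec(Q, x)                     # gradient of f at x
        gg = g[0] * g[0] + g[1] * g[1]
        if gg == 0.0:
            break                            # already at the minimum
        Qg = matvec(Q, g)
        alpha = gg / (g[0] * Qg[0] + g[1] * Qg[1])      # exact line search
        x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]  # x <- x - alpha g
    return x

x_min = steepest_descent([[2.0, 0.0], [0.0, 10.0]], [1.0, 1.0])
print(x_min)                                 # close to the minimizer [0, 0]
```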
Steepest Descent 4

An alternative approach is to try to follow the continuous path of steepest descent in a sequence of many small steps. The update above can be rearranged as

    \frac{x^{n+1} - x^n}{\alpha} = -\nabla f.

In the limit as \alpha \to 0, this reduces to

    \frac{\partial x}{\partial t} = -\nabla f,    (3)

where \alpha is the time step in a forward Euler discretization.
Smoothed Steepest Descent 1

Let x represent the design variable and \nabla f the gradient. Instead of making the step \delta x = \alpha p = -\alpha \nabla f, we replace the gradient \nabla f by a smoothed value \overline{\nabla f}. To apply smoothing in the x direction, the smoothed gradient \overline{\nabla f} may be calculated from a discrete approximation to

    \overline{\nabla f} - \frac{\partial}{\partial x}\,\epsilon\,\frac{\partial}{\partial x}\overline{\nabla f} = \nabla f,    (4)

where \epsilon is the smoothing parameter. Then the first-order change in the cost function is

    \delta f = \int \nabla f\,\delta x\,dx = -\alpha \int \left( \overline{\nabla f} - \frac{\partial}{\partial x}\,\epsilon\,\frac{\partial}{\partial x}\overline{\nabla f} \right) \overline{\nabla f}\,dx = -\alpha \int \overline{\nabla f}^{\,2}\,dx + \alpha \int \left( \frac{\partial}{\partial x}\,\epsilon\,\frac{\partial \overline{\nabla f}}{\partial x} \right) \overline{\nabla f}\,dx.
Smoothed Steepest Descent 2

Now, integrating the second integral by parts,

    \delta f = -\alpha \int \overline{\nabla f}^{\,2}\,dx + \alpha \left[ \epsilon\,\overline{\nabla f}\,\frac{\partial \overline{\nabla f}}{\partial x} \right] - \alpha \int \epsilon \left( \frac{\partial \overline{\nabla f}}{\partial x} \right)^2 dx = -\alpha \int \left( \overline{\nabla f}^{\,2} + \epsilon \left( \frac{\partial \overline{\nabla f}}{\partial x} \right)^2 \right) dx < 0,

where the boundary term vanishes if the smoothed gradient is assigned zero values at the end points. If \epsilon is positive, the variation of the objective function is less than zero, which assures an improvement for positive \alpha unless \nabla f, and hence \overline{\nabla f}, are zero.

Smoothing ensures that each new shape in the optimization sequence remains smooth. It also acts as a preconditioner, which allows the use of much larger steps and leads to a large reduction in the number of design iterations needed for convergence. A larger smoothing parameter allows a larger time step to be used and thus accelerates the convergence.
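A sketch of one smoothed step: the smoothed gradient is obtained from a tridiagonal solve of a discrete form of equation (4) with constant \epsilon, and the first-order change \delta f = -\alpha \int \nabla f \cdot \overline{\nabla f}\,dx is confirmed to be negative. The noisy sample gradient and the parameter values are illustrative choices.

```python
import math

# Solve the tridiagonal discrete smoothing equation
#   -e gbar[k-1] + (1 + 2e) gbar[k] - e gbar[k+1] = g[k],  e = eps / dx^2,
# with gbar = 0 at the end points (Thomas algorithm), then confirm that
# delta_f = -alpha * sum(g * gbar) dx is negative. The noisy sample
# gradient and the parameter values are illustrative.
N = 101
dx = 1.0 / (N - 1)
eps, alpha = 0.01, 0.1
g = [math.sin(math.pi * j * dx) + 0.3 * math.sin(40.0 * j * dx)
     for j in range(N)]

e = eps / dx**2
n = N - 2                       # number of interior unknowns
a = [-e] * n                    # sub-diagonal
b = [1.0 + 2.0 * e] * n         # diagonal
c = [-e] * n                    # super-diagonal
d = g[1:-1]                     # right-hand side (interior values of g)
for k in range(1, n):           # Thomas algorithm: forward elimination
    m = a[k] / b[k - 1]
    b[k] -= m * c[k - 1]
    d[k] -= m * d[k - 1]
gbar = [0.0] * N                # smoothed gradient, zero at the ends
gbar[n] = d[n - 1] / b[n - 1]   # back substitution
for k in range(n - 2, -1, -1):
    gbar[k + 1] = (d[k] - c[k] * gbar[k + 2]) / b[k]

delta_f = -alpha * sum(gi * gbi for gi, gbi in zip(g, gbar)) * dx
print(delta_f)                  # negative: the smoothed step reduces f
```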
Smoothed Steepest Descent 3

Jameson and Vassberg have shown that the implicit smoothing technique corresponds to an implicit time stepping scheme for the descent equation (3) if the smoothing parameter \epsilon is chosen appropriately. Consider a parabolic equation of the form

    \frac{\partial x}{\partial t} = \frac{\partial^2 x}{\partial y^2}.

A second order implicit discretization is

    -\phi\,\delta x_{k-1} + (1 + 2\phi)\,\delta x_k - \phi\,\delta x_{k+1} = \phi \left( x_{k-1}^n - 2x_k^n + x_{k+1}^n \right),

where \phi = \Delta t / \Delta y^2. This corresponds exactly to smoothing the correction with the formula (4) with \epsilon = \phi. Their results show that the number of iterations required by the smoothing technique is similar to that of the implicit time stepping scheme, and both approaches perform better than the simple steepest descent and Quasi-Newton methods by a large amount.
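The stability benefit can be illustrated on this model parabolic equation: with \phi well beyond the explicit limit \phi \le 1/2, a forward Euler step amplifies the sawtooth mode, while the implicit (smoothed) step damps it. The mesh size and value of \phi below are illustrative.

```python
import numpy as np

# Model problem x_t = x_yy with phi = dt/dy^2 = 5, far beyond the
# explicit stability limit phi <= 1/2. A forward Euler step amplifies
# the sawtooth mode, while the implicit (smoothed) step
#   -phi dx[k-1] + (1 + 2 phi) dx[k] - phi dx[k+1]
#       = phi (x[k-1] - 2 x[k] + x[k+1])
# damps it. Mesh size and phi are illustrative.
N = 51
phi = 5.0
x = np.array([(-1.0) ** k for k in range(N)])   # sawtooth, worst-case mode
x[0] = x[-1] = 0.0

r = np.zeros(N)
r[1:-1] = x[:-2] - 2.0 * x[1:-1] + x[2:]        # undivided second difference

x_explicit = x + phi * r                        # forward Euler update

n = N - 2                                       # implicit update, interior
M = (np.diag((1.0 + 2.0 * phi) * np.ones(n))
     + np.diag(-phi * np.ones(n - 1), 1)
     + np.diag(-phi * np.ones(n - 1), -1))
dx_imp = np.zeros(N)
dx_imp[1:-1] = np.linalg.solve(M, phi * r[1:-1])
x_implicit = x + dx_imp

print(np.max(np.abs(x_explicit)), np.max(np.abs(x_implicit)))
# the explicit step blows up; the implicit step damps the sawtooth
```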
Smoothed Steepest Descent 4

For some problems, such as those of the calculus of variations, the implicit smoothing technique can be used to implement the Newton method. In a Newton method, the gradient is driven to zero based on the linearization

    g(y + \delta y) = g(y) + A\,\delta y,

where A is the Hessian. In the case of the calculus of variations, a Newton step can be achieved by solving

    A\,\delta y = \left( \frac{\partial G}{\partial y} + \frac{\partial G}{\partial y'}\frac{d}{dx} + \frac{\partial G}{\partial y''}\frac{d^2}{dx^2} \right) \delta y = -g,

since the Hessian can be represented by the differential operator (1). Thus the correct choice of smoothing in equation (4) approximates the Newton step, resulting in quadratic convergence, independent of the number of mesh intervals.