Numerical Optimization Algorithms


1. Overview
2. Calculus of Variations
3. Linearized Supersonic Flow
4. Steepest Descent
5. Smoothed Steepest Descent

Overview 1
Two main categories of optimization algorithms:
- Gradient based
- Non-gradient based

Overview 2
Non-gradient based:
- Only objective function evaluations are used to find the optimum point. The gradient and Hessian of the objective function are not needed.
- May be able to find the global minimum, but requires a large number of design cycles.
- Non-gradient based family of methods: genetic algorithms, grid searches, stochastic methods, nonlinear simplex, etc.
- In the case of genetic algorithms:
  - The design process starts with evaluations of the objective function for an initial set of solutions. The initial set is typically very large.
  - Able to handle integer variables such as the number of vertical tails, the number of engines, and other integer parameters.
  - Able to seek the optimum point for objective functions that do not have smooth first or second derivatives.

Overview 3
Gradient based:
- Requires the existence of continuous first derivatives of the objective function, and possibly higher derivatives.
- Generally requires a much smaller number of design cycles to converge to an optimum than non-gradient based methods.
- However, only convergence to a local minimum is guaranteed.
- Simple gradient-based methods require only the gradient of the objective function, but usually require $N^2$ iterations or more, where $N$ is the number of design variables.
- Methods that use Hessian information (quasi-Newton) generally require only $N$ iterations.

Overview 4
Gradient or non-gradient based? A two-step approach:
1. Use a low-fidelity method (panel method, Euler) together with a non-gradient based method in the conceptual design stage.
2. Use a higher-fidelity method (Navier-Stokes) together with a gradient based method to refine the design.
The proper combination of the different flow solvers with the various optimization algorithms is still an open research topic.

Calculus of Variations 1
Consider a class of optimization problems in which a curve $y(x)$ is to be chosen to minimize a cost function
$$I = \int_{x_0}^{x_1} F(x, y, y')\,dx,$$
where $F$ is an arbitrary function that is continuous and twice differentiable. $F$ depends on $x$, $y$, and $y'$, where $y(x)$ is the trajectory to be optimized, assumed continuous and differentiable, and $y'$ is the derivative of $y$. Under a variation $\delta y$, the first variation of the cost function can be expressed as
$$\delta I = \int_{x_0}^{x_1} \left( \frac{\partial F}{\partial y}\,\delta y + \frac{\partial F}{\partial y'}\,\delta y' \right) dx.$$
Integrating the second term by parts gives
$$\delta I = \int_{x_0}^{x_1} \frac{\partial F}{\partial y}\,\delta y\,dx + \left[ \frac{\partial F}{\partial y'}\,\delta y \right]_{x_0}^{x_1} - \int_{x_0}^{x_1} \frac{d}{dx}\!\left( \frac{\partial F}{\partial y'} \right) \delta y\,dx.$$

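Calculus of Variations 2
Assuming fixed end points, the variations of $y$ at $x_0$ and $x_1$ are zero, $\delta y(x_0) = \delta y(x_1) = 0$, so that
$$\delta I = \int_{x_0}^{x_1} \left( \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} \right) \delta y\,dx = \int_{x_0}^{x_1} G\,\delta y\,dx,$$
where $G$ may be recognized as the gradient of the cost function and is expressed as
$$G = \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}.$$
A further variation of the gradient then results in the expression
$$\delta G = \frac{\partial G}{\partial y}\,\delta y + \frac{\partial G}{\partial y'}\,\delta y' + \frac{\partial G}{\partial y''}\,\delta y'' \quad \text{or} \quad \delta G = A\,\delta y,$$
where $A$ is the Hessian operator. Thus the Hessian can be expressed as the differential operator
$$A = \frac{\partial G}{\partial y} + \frac{\partial G}{\partial y'}\frac{d}{dx} + \frac{\partial G}{\partial y''}\frac{d^2}{dx^2}. \tag{1}$$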
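As a quick worked check of the gradient and Hessian formulas, consider the illustrative integrand $F = \tfrac{1}{2}(y'^2 + y^2)$. This integrand is an assumption chosen only because its derivatives are easy to read off; it does not appear in the original slides.

```latex
\begin{aligned}
F &= \tfrac{1}{2}\left(y'^2 + y^2\right), \qquad
\frac{\partial F}{\partial y} = y, \qquad
\frac{\partial F}{\partial y'} = y', \\
G &= \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = y - y'', \\
\frac{\partial G}{\partial y} &= 1, \qquad
\frac{\partial G}{\partial y'} = 0, \qquad
\frac{\partial G}{\partial y''} = -1, \\
A &= \frac{\partial G}{\partial y} + \frac{\partial G}{\partial y'}\frac{d}{dx}
   + \frac{\partial G}{\partial y''}\frac{d^2}{dx^2} = 1 - \frac{d^2}{dx^2}.
\end{aligned}
```

Because $G$ is linear in $y$ for this integrand, the Hessian operator $A$ has constant coefficients and $\delta G = A\,\delta y$ holds exactly rather than only to first order.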

Linearized Supersonic Flow 1
In this example, we explore this concept by deriving the gradient and Hessian operator for linearized supersonic flow. Consider a linearized supersonic flow over a profile with height $y(x)$, where $y$ is continuous and twice differentiable. The surface pressure can be defined as
$$p - p_\infty = \frac{\rho q^2}{\sqrt{M_\infty^2 - 1}}\,\frac{dy}{dx},$$
where $\rho q^2 / \sqrt{M_\infty^2 - 1}$ is a constant and $p_\infty$ is the freestream pressure. Next consider an inverse problem with cost function
$$I = \frac{1}{2} \int_B (p - p_t)^2\,dx,$$
where $p_t$ is the target surface pressure. The variations of the surface pressure and of the cost function under a profile variation $\delta y$ are
$$\delta p = \frac{\rho q^2}{\sqrt{M_\infty^2 - 1}}\,\frac{d}{dx}\,\delta y \quad \text{and} \quad \delta I = \int_B (p - p_t)\,\delta p\,dx.$$

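Linearized Supersonic Flow 2
Substitute the variation of the pressure into the equation for the variation of the cost function and integrate by parts to obtain
$$\delta I = \int_B (p - p_t)\,\frac{\rho q^2}{\sqrt{M_\infty^2 - 1}}\,\frac{d}{dx}\,\delta y\,dx = -\int_B \frac{\rho q^2}{\sqrt{M_\infty^2 - 1}}\,\frac{d}{dx}(p - p_t)\,\delta y\,dx.$$
The gradient can then be defined as
$$g = -\frac{\rho q^2}{\sqrt{M_\infty^2 - 1}}\,\frac{d}{dx}(p - p_t).$$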

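Linearized Supersonic Flow 3
To form the Hessian, take a variation of the gradient and substitute the expression for $\delta p$:
$$\delta g = -\frac{\rho q^2}{\sqrt{M_\infty^2 - 1}}\,\frac{d}{dx}\,\delta p = -\frac{\rho^2 q^4}{M_\infty^2 - 1}\,\frac{d^2}{dx^2}\,\delta y.$$
Thus the Hessian for the inverse design of the linearized supersonic flow problem can be expressed as the differential operator
$$A = -\frac{\rho^2 q^4}{M_\infty^2 - 1}\,\frac{d^2}{dx^2}. \tag{2}$$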
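The gradient formula for the inverse problem can be checked numerically. The sketch below is a minimal illustration under assumed values: the constant `c` stands in for $\rho q^2/\sqrt{M_\infty^2-1}$, and the mesh size, profile, and target pressure are arbitrary choices that are not taken from the slides. It discretizes the cost on cell midpoints and compares the discrete form of $g = -c\,\frac{d}{dx}(p - p_t)$ with central finite differences of the cost.

```python
import numpy as np

def residual(y, dx, c, pt):
    """(p - p_t) at cell midpoints, using p - p_inf = c * dy/dx.
    pt holds the target measured from the freestream, p_t - p_inf."""
    return c * np.diff(y) / dx - pt

def cost(y, dx, c, pt):
    """Discrete form of I = 1/2 * integral of (p - p_t)^2 dx."""
    r = residual(y, dx, c, pt)
    return 0.5 * np.sum(r**2) * dx

def gradient(y, dx, c, pt):
    """Discrete form of g = -c * d/dx (p - p_t) at the interior nodes."""
    return -c * np.diff(residual(y, dx, c, pt)) / dx

# Assumed test setup (all values are illustrative).
n = 20                                   # number of interior nodes
x = np.linspace(0.0, 1.0, n + 2)
dx = x[1] - x[0]
c = 1.7                                  # stands in for rho q^2 / sqrt(M^2 - 1)
y = 0.1 * np.sin(np.pi * x)              # profile with fixed end points
x_mid = 0.5 * (x[1:] + x[:-1])
pt = 0.2 * np.cos(2.0 * np.pi * x_mid)   # arbitrary target pressure

g = gradient(y, dx, c, pt)

# Central finite differences of the cost with respect to each interior node.
# Dividing by dx converts dI/dy_j into the pointwise gradient g_j.
h = 1e-6
g_fd = np.empty(n)
for j in range(1, n + 1):
    e = np.zeros(n + 2)
    e[j] = h
    g_fd[j - 1] = (cost(y + e, dx, c, pt) - cost(y - e, dx, c, pt)) / (2.0 * h * dx)

max_err = np.max(np.abs(g - g_fd))
print(f"max |g - g_fd| = {max_err:.2e}")
```

Because the cost is quadratic in $y$, the central difference is exact up to rounding, so the two gradients should agree closely.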

Brachistochrone 1
Brachistochrone problem: find the minimum time taken by a particle traversing a path $y(x)$ connecting initial and final points $(x_0, y_0)$ and $(x_1, y_1)$, subject only to the force of gravity. The total time is given by
$$T = \int_{x_0}^{x_1} \frac{ds}{v},$$
where the velocity of a particle starting from rest and falling under the influence of gravity (with $y$ measured downward from the starting height) is
$$v = \sqrt{2 g y},$$
and
$$ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx = \sqrt{1 + y'^2}\,dx.$$

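Brachistochrone 2
Substituting for $v$ and $ds$ yields
$$T = \int_{x_0}^{x_1} \sqrt{\frac{1 + y'^2}{2 g y}}\,dx = \frac{1}{\sqrt{2g}} \int_{x_0}^{x_1} \sqrt{\frac{1 + y'^2}{y}}\,dx = \frac{I}{\sqrt{2g}}.$$
Therefore,
$$I = \int_{x_0}^{x_1} \sqrt{\frac{1 + y'^2}{y}}\,dx = \int_{x_0}^{x_1} F(y, y')\,dx.$$
From the calculus of variations,
$$G = F_y - \frac{d}{dx} F_{y'} = \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}.$$
Computing the partial derivatives of $F$ with respect to $y$ and $y'$ and substituting them into the gradient formula produces
$$G = -\frac{\sqrt{1 + y'^2}}{2 y^{3/2}} - \frac{d}{dx}\left( \frac{y'}{\sqrt{y(1 + y'^2)}} \right).$$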

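Brachistochrone 3
The expression for the gradient can then be simplified to
$$G = -\frac{1 + y'^2 + 2 y y''}{2 \left( y (1 + y'^2) \right)^{3/2}}.$$
Since $F$ does not depend on $x$, the term $F_{y'x}$ vanishes and
$$G = F_y - \frac{d}{dx} F_{y'} = F_y - F_{y'x} - F_{y'y}\,y' - F_{y'y'}\,y'',$$
$$y' G = F_y\,y' - F_{y'y}\,y'^2 - F_{y'y'}\,y' y'' = \frac{d}{dx}\left( F - y' F_{y'} \right).$$
On an optimal path, $G = 0$, so
$$y' G = \frac{d}{dx}\left( F - y' F_{y'} \right) = 0 \quad \text{or} \quad F - y' F_{y'} = \text{const}.$$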

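Brachistochrone 4
The expression can then be expanded to
$$F - y' F_{y'} = \sqrt{\frac{1 + y'^2}{y}} - \frac{y'^2}{\sqrt{y(1 + y'^2)}} = \text{const},$$
$$\frac{1}{\sqrt{y (1 + y'^2)}} = \text{const},$$
$$y (1 + y'^2) = \text{const}.$$
The classical solution can be obtained by substituting $y(t) = C \sin^2(t/2)$ into the above equation, where $C$ is a constant:
$$y (1 + y'^2) = C, \qquad y'^2 = \frac{C - y}{y} = \frac{C - C \sin^2(t/2)}{C \sin^2(t/2)} = \frac{\cos^2(t/2)}{\sin^2(t/2)} = \cot^2\frac{t}{2}.$$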

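Brachistochrone 5
Finally, the optimal path can be derived by replacing $y'$ in the previous equation with $dy/dx$ to yield
$$\frac{dy}{dx} = \cot\frac{t}{2},$$
$$x = \int \frac{dy}{\cot(t/2)} = \int \tan\frac{t}{2}\,\frac{dy}{dt}\,dt = C \int \sin^2\frac{t}{2}\,dt,$$
$$x(t) = \frac{1}{2} C\,(t - \sin t),$$
taking $x = 0$ at $t = 0$.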
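The parametric solution can be sanity-checked against the first integral $y(1 + y'^2) = C$. The snippet below is a small sketch, assuming the cycloid $x(t) = \tfrac{1}{2}C(t - \sin t)$, $y(t) = C\sin^2(t/2)$ with an arbitrary constant `C` and an arbitrary parameter range; the slope is formed as $(dy/dt)/(dx/dt)$.

```python
import numpy as np

C = 2.0                               # arbitrary constant, chosen for illustration
t = np.linspace(0.2, np.pi, 50)       # parameter range away from the cusp at t = 0

# Parametric cycloid.
x = 0.5 * C * (t - np.sin(t))
y = C * np.sin(t / 2.0) ** 2

# Slope dy/dx from the parametric derivatives.
dx_dt = 0.5 * C * (1.0 - np.cos(t))
dy_dt = C * np.sin(t / 2.0) * np.cos(t / 2.0)
yp = dy_dt / dx_dt

# First integral of the Euler-Lagrange equation; should equal C everywhere.
first_integral = y * (1.0 + yp**2)
print(first_integral.min(), first_integral.max())
```

The slope computed this way also equals $\cot(t/2)$, which is the relation used to integrate for $x(t)$.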

Brachistochrone 6: Continuous Gradient
Let the trajectory be represented by the discrete values $y_j = y(x_j)$ at $x_j = j\,\Delta x$, where $\Delta x$ is the mesh interval, $0 \le j \le N + 1$, and $N$ is the number of design variables, which is also the number of interior mesh points. From the gradient obtained through the calculus of variations, the continuous gradient can be computed as
$$G_j = -\frac{1 + y_j'^2 + 2 y_j y_j''}{2 \left( y_j (1 + y_j'^2) \right)^{3/2}},$$
where $y_j'$ and $y_j''$ are evaluated at the discrete points using second-order finite difference approximations,
$$y_j' = \frac{y_{j+1} - y_{j-1}}{2 \Delta x} \quad \text{and} \quad y_j'' = \frac{y_{j+1} - 2 y_j + y_{j-1}}{\Delta x^2}.$$
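A minimal sketch of the continuous gradient on a uniform mesh follows. The function name `continuous_gradient`, the mesh size, and the test curves are assumptions made for illustration. As a check, the gradient is evaluated on a sampled cycloid (where it should be close to zero, up to finite-difference error) and on a straight line between the same end points (where it should not be).

```python
import numpy as np

def continuous_gradient(y, dx):
    """G_j at the interior nodes, using second-order central differences."""
    yj = y[1:-1]
    yp = (y[2:] - y[:-2]) / (2.0 * dx)
    ypp = (y[2:] - 2.0 * yj + y[:-2]) / dx**2
    return -(1.0 + yp**2 + 2.0 * yj * ypp) / (2.0 * (yj * (1.0 + yp**2)) ** 1.5)

# Sample a cycloid segment densely in the parameter t, away from the cusp at t = 0.
C = 1.0
t_dense = np.linspace(1.0, 3.0, 20001)
x_dense = 0.5 * C * (t_dense - np.sin(t_dense))
y_dense = C * np.sin(t_dense / 2.0) ** 2

# Resample onto a uniform mesh in x (linear interpolation of the dense samples).
N = 199                                  # number of interior points (illustrative)
x = np.linspace(x_dense[0], x_dense[-1], N + 2)
dx = x[1] - x[0]
y_cycloid = np.interp(x, x_dense, y_dense)

# Straight line between the same end points, for comparison.
y_line = np.linspace(y_cycloid[0], y_cycloid[-1], N + 2)

G_cycloid = continuous_gradient(y_cycloid, dx)
G_line = continuous_gradient(y_line, dx)
print(np.max(np.abs(G_cycloid)), np.min(np.abs(G_line)))
```

The residual gradient on the cycloid reflects only truncation and interpolation error, so it shrinks as the mesh is refined.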

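Brachistochrone 7: Discrete Gradient
In the discrete approach, $I$ can be approximated using the rectangle rule of integration,
$$I = \sum_{j=0}^{N} F_{j+\frac{1}{2}}\,\Delta x, \quad \text{where} \quad F_{j+\frac{1}{2}} = \sqrt{\frac{1 + y_{j+\frac{1}{2}}'^2}{y_{j+\frac{1}{2}}}},$$
$$y_{j+\frac{1}{2}} = \frac{1}{2}\left( y_{j+1} + y_j \right) \quad \text{and} \quad y_{j+\frac{1}{2}}' = \frac{y_{j+1} - y_j}{\Delta x}.$$
The discrete gradient can now be evaluated as
$$G_j = \frac{\partial I}{\partial y_j} = -\frac{\Delta x}{2}\left( A_{j+\frac{1}{2}} + A_{j-\frac{1}{2}} \right) - \left( B_{j+\frac{1}{2}} - B_{j-\frac{1}{2}} \right),$$
where
$$A_{j+\frac{1}{2}} = \frac{\sqrt{1 + y_{j+\frac{1}{2}}'^2}}{2\,y_{j+\frac{1}{2}}^{3/2}} \quad \text{and} \quad B_{j+\frac{1}{2}} = \frac{y_{j+\frac{1}{2}}'}{\sqrt{y_{j+\frac{1}{2}}\left( 1 + y_{j+\frac{1}{2}}'^2 \right)}}.$$
These are the discrete counterparts of the two terms of the continuous gradient $G = \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}$, with
$$\frac{\partial F}{\partial y} = -\frac{\sqrt{1 + y'^2}}{2 y^{3/2}} \quad \text{and} \quad \frac{\partial F}{\partial y'} = \frac{y'}{\sqrt{y (1 + y'^2)}}.$$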
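The discrete gradient is the exact derivative of the discretized cost, which makes it easy to verify. The sketch below assumes $A_{j+1/2} = \sqrt{1 + y'^2}/(2 y^{3/2})$ and $B_{j+1/2} = y'/\sqrt{y(1 + y'^2)}$ evaluated at the midpoints, so that $\partial I/\partial y_j = -\tfrac{\Delta x}{2}(A_{j+1/2} + A_{j-1/2}) - (B_{j+1/2} - B_{j-1/2})$; the function names and the test profile are illustrative.

```python
import numpy as np

def discrete_cost(y, dx):
    """I = sum of F at the midpoints times dx (rectangle rule)."""
    ym = 0.5 * (y[1:] + y[:-1])          # y at j + 1/2
    yp = np.diff(y) / dx                 # y' at j + 1/2
    return np.sum(np.sqrt((1.0 + yp**2) / ym)) * dx

def discrete_gradient(y, dx):
    """dI/dy_j at the interior nodes j = 1..N."""
    ym = 0.5 * (y[1:] + y[:-1])
    yp = np.diff(y) / dx
    A = np.sqrt(1.0 + yp**2) / (2.0 * ym**1.5)    # equals -dF/dy at midpoints
    B = yp / np.sqrt(ym * (1.0 + yp**2))          # equals dF/dy' at midpoints
    return -0.5 * dx * (A[1:] + A[:-1]) - (B[1:] - B[:-1])

# Illustrative positive profile with fixed end points.
N = 10
x = np.linspace(0.0, 1.0, N + 2)
dx = x[1] - x[0]
y = 0.2 + x + 0.05 * np.sin(np.pi * x)

G = discrete_gradient(y, dx)

# Central finite differences of the discrete cost.
h = 1e-6
G_fd = np.empty(N)
for j in range(1, N + 1):
    e = np.zeros(N + 2)
    e[j] = h
    G_fd[j - 1] = (discrete_cost(y + e, dx) - discrete_cost(y - e, dx)) / (2.0 * h)

max_err = np.max(np.abs(G - G_fd))
print(f"max |G - G_fd| = {max_err:.2e}")
```

Note that this $G_j$ is the derivative of the summed cost, so it carries a factor of $\Delta x$ relative to the pointwise continuous gradient.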

Steepest Descent 1
Line search methods require the algorithm to choose a direction $p$ and search along this direction from the current iterate to obtain a new iterate with a lower function value. Once the direction is chosen, the search direction is multiplied by a step length $\alpha$ to advance the optimization to the next iterate. To obtain the search direction $p$ and the step length $\alpha$, we may employ Taylor's theorem. First, define the objective function as $f(x)$; the optimization problem can then be stated as
$$\min_x f(x),$$
where $x \in \mathbb{R}^n$ is a real vector with $n \ge 1$ components and $f: \mathbb{R}^n \to \mathbb{R}$ is a smooth function. Let $p$ be the search direction. Then by Taylor's theorem
$$f(x + \alpha p) = f(x) + \alpha\,p^T \nabla f + \frac{1}{2} \alpha^2\,p^T \nabla^2 f(x + t p)\,p$$
for some $t \in (0, \alpha)$.

Steepest Descent 2
In the Taylor expansion, the coefficient $p^T \nabla f$ in the second term is the rate of change of $f$ along the search direction $p$. The last term contains the expression $\nabla^2 f(x + t p)$, which corresponds to the Hessian matrix. The value of $p$ that provides the most rapid decrease in the objective function $f(x)$ is the solution of the optimization problem
$$\min_p\; p^T \nabla f, \quad \text{subject to } \|p\| = 1.$$
With $\|p\| = 1$, the expression $p^T \nabla f$ can be simplified to
$$p^T \nabla f = \|p\|\,\|\nabla f\| \cos\theta = \|\nabla f\| \cos\theta,$$
where $\theta$ is the angle between the search direction $p$ and the gradient $\nabla f$. This expression attains its minimum value when $\cos\theta = -1$.

Steepest Descent 3
Therefore, the equation can be further simplified to yield an expression for the search direction $p$ of steepest descent:
$$p^T \nabla f = -\|\nabla f\|, \qquad p = -\frac{\nabla f}{\|\nabla f\|}.$$
Accordingly, a simple optimization algorithm can be defined by setting the search direction $p$ to the negative of the gradient at every iteration:
$$p = -\nabla f.$$
With a line search method, the step size $\alpha$ is chosen such that the maximum reduction of the objective function $f(x)$ is attained. The vector $x$ is then updated by
$$x^{n+1} = x^n - \alpha \nabla f.$$

Steepest Descent 4
An alternative approach is to try to follow the continuous path of steepest descent in a sequence of many small steps. The update equation can be rearranged as
$$\frac{x^{n+1} - x^n}{\alpha} = -\nabla f.$$
In the limit as $\alpha \to 0$, this reduces to
$$\frac{\partial x}{\partial t} = -\nabla f, \tag{3}$$
where $\alpha$ is the time step in a forward Euler discretization.
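The steepest descent update with a line search can be sketched on a small quadratic test problem. Everything specific here is an assumption for illustration: the objective $f(x) = \tfrac{1}{2}x^T Q x - b^T x$, the matrix `Q`, the vector `b`, and the function name. For a quadratic, the step that gives the maximum reduction along $p = -\nabla f$ has the closed form $\alpha = g^T g / (g^T Q g)$ with $g = \nabla f$, which is what the code uses in place of a general line search.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=10_000):
    """Minimize 0.5 x^T Q x - b^T x (Q symmetric positive definite)
    by steepest descent with an exact line search."""
    x = x0.astype(float).copy()
    for k in range(max_iter):
        g = Q @ x - b                        # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = (g @ g) / (g @ (Q @ g))      # exact minimizer along -g
        x = x - alpha * g                    # x^{n+1} = x^n - alpha * grad f
    return x, max_iter

# Illustrative 2x2 problem.
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

x_min, iters = steepest_descent_quadratic(Q, b, np.zeros(2))
print(x_min, iters)
```

Replacing the exact step with a small fixed `alpha` turns the loop into the forward Euler discretization of the descent equation (3).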

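Smoothed Steepest Descent 1
Let $x$ represent the design variable and $\nabla f$ the gradient. Instead of making the step
$$\delta x = \alpha p = -\alpha \nabla f,$$
we replace the gradient $\nabla f$ by a smoothed value $\overline{\nabla f}$. To apply smoothing in the $x$ direction, the smoothed gradient $\overline{\nabla f}$ may be calculated from a discrete approximation to
$$\overline{\nabla f} - \frac{\partial}{\partial x}\left( \epsilon\,\frac{\partial}{\partial x} \overline{\nabla f} \right) = \nabla f, \tag{4}$$
where $\epsilon$ is the smoothing parameter. Then the first-order change in the cost function is
$$\delta f = \int \nabla f\,\delta x\,dx = -\alpha \int \left( \overline{\nabla f} - \frac{\partial}{\partial x}\,\epsilon\,\frac{\partial \overline{\nabla f}}{\partial x} \right) \overline{\nabla f}\,dx = -\alpha \int \overline{\nabla f}^{\,2}\,dx + \alpha \int \frac{\partial}{\partial x}\left( \epsilon\,\frac{\partial \overline{\nabla f}}{\partial x} \right) \overline{\nabla f}\,dx.$$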

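Smoothed Steepest Descent 2
Now, integrating the second integral by parts,
$$\delta f = -\alpha \int \overline{\nabla f}^{\,2}\,dx + \left[ \alpha\,\overline{\nabla f}\,\epsilon\,\frac{\partial \overline{\nabla f}}{\partial x} \right] - \alpha \int \epsilon \left( \frac{\partial \overline{\nabla f}}{\partial x} \right)^2 dx$$
$$\phantom{\delta f} = -\alpha \int \left( \overline{\nabla f}^{\,2} + \epsilon \left( \frac{\partial \overline{\nabla f}}{\partial x} \right)^2 \right) dx < 0,$$
where the second term in the first line of the equation is zero if the end points of the new gradient vector are assigned zero values. If $\epsilon$ is positive, the variation of the objective function is less than zero, and this assures an improvement if $\alpha$ is positive, unless $\overline{\nabla f}$, and hence $\nabla f$, is zero.
Smoothing ensures that each new shape in the optimization sequence remains smooth. It also acts as a preconditioner, which allows the use of much larger steps and leads to a large reduction in the number of design iterations needed for convergence. A larger smoothing parameter allows a larger time step to be used and thus accelerates the convergence.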
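A minimal sketch of the smoothing step follows, assuming a uniform mesh, a constant smoothing parameter, a second-order central discretization of the smoothing equation (4), and zero values of the smoothed gradient at the end points. The function name, the noisy test gradient, and the values of `eps` and `alpha` are illustrative. A dense solve is used for brevity; a tridiagonal solver would be the natural choice for large meshes.

```python
import numpy as np

def smooth_gradient(g, dx, eps):
    """Solve g_bar - eps * d^2(g_bar)/dx^2 = g on the interior nodes,
    with g_bar = 0 at both end points (central differences)."""
    n = g.size
    phi = eps / dx**2
    M = ((1.0 + 2.0 * phi) * np.eye(n)
         - phi * np.eye(n, k=1)
         - phi * np.eye(n, k=-1))
    return np.linalg.solve(M, g)

# Illustrative noisy gradient on n interior nodes of [0, 1].
rng = np.random.default_rng(0)
n = 50
dx = 1.0 / (n + 1)
xs = np.linspace(dx, 1.0 - dx, n)
g = np.sin(np.pi * xs) + 0.5 * rng.standard_normal(n)

eps = 0.01      # smoothing parameter (assumed value)
alpha = 0.1     # step length (assumed value)
g_bar = smooth_gradient(g, dx, eps)

# First-order change in the cost for the step delta_x = -alpha * g_bar.
delta_f = -alpha * np.sum(g * g_bar) * dx

# Discrete counterpart of -alpha * integral(g_bar^2 + eps * (d g_bar/dx)^2) dx.
padded = np.concatenate(([0.0], g_bar, [0.0]))
bound = -alpha * (np.sum(g_bar**2) * dx
                  + eps * np.sum((np.diff(padded) / dx) ** 2) * dx)

def roughness(v):
    """Total variation, used here as a crude measure of smoothness."""
    return np.sum(np.abs(np.diff(v)))

print(delta_f, bound, roughness(g), roughness(g_bar))
```

The two expressions for the first-order change agree to rounding, which is the discrete analogue of the integration by parts above, and the smoothed gradient is far less oscillatory than the raw one.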

Smoothed Steepest Descent 3
Jameson and Vassberg have shown that the implicit smoothing technique corresponds to an implicit time stepping scheme for the descent equation (3) if the smoothing parameter $\epsilon$ is chosen appropriately. Consider a parabolic equation of the form
$$\frac{\partial x}{\partial t} = \pi\,\frac{\partial^2 x}{\partial y^2}.$$
A second-order implicit discretization is
$$-\phi\,\delta x_{k-1} + (1 + 2\phi)\,\delta x_k - \phi\,\delta x_{k+1} = \phi \left( x^n_{k-1} - 2 x^n_k + x^n_{k+1} \right),$$
where $\phi = \pi\,\Delta t / \Delta y^2$. This corresponds exactly to smoothing the correction with $\epsilon$ set by $\pi$: matching coefficients with a central discretization of equation (4) on the same mesh, with step $\alpha = \Delta t$, gives $\epsilon = \pi\,\Delta t$.
Their results show that the number of iterations required by the smoothing technique is similar to that of the implicit time stepping scheme, and that both approaches outperform simple steepest descent and quasi-Newton methods by a large margin.
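The correspondence between the implicit scheme and gradient smoothing can be illustrated numerically. In the sketch below, `kappa` stands in for the coefficient written $\pi$ in the text (renamed to avoid confusion with the number $\pi$), the end values of $x$ are assumed fixed at zero, and the smoothing parameter is assumed to be $\epsilon = \kappa\,\Delta t$ with step $\alpha = \Delta t$. Mesh size, time step, and initial data are arbitrary.

```python
import numpy as np

n = 30                         # interior nodes (illustrative)
dy = 1.0 / (n + 1)             # mesh interval in the y coordinate
dt = 0.05                      # time step, also used as the step length alpha
kappa = 0.8                    # stands in for the coefficient pi in the text
phi = kappa * dt / dy**2

# K is minus the second-difference operator with zero end values.
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
I = np.eye(n)

yk = np.linspace(dy, 1.0 - dy, n)
x = np.sin(np.pi * yk) + 0.3 * np.sin(5.0 * np.pi * yk)

# Implicit step: -phi*dx[k-1] + (1+2*phi)*dx[k] - phi*dx[k+1]
#                = phi*(x[k-1] - 2*x[k] + x[k+1]).
dx_implicit = np.linalg.solve(I + phi * K, -phi * (K @ x))

# Smoothed descent step: the gradient is g = -kappa * d2x/dy2, so dx/dt = -g.
g = kappa * (K @ x) / dy**2
eps = kappa * dt               # assumed smoothing parameter
g_bar = np.linalg.solve(I + (eps / dy**2) * K, g)
dx_smoothed = -dt * g_bar

print(np.max(np.abs(dx_implicit - dx_smoothed)))
```

With these choices the two corrections solve the same linear system, so they agree to rounding error regardless of how large `dt` is.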

Smoothed Steepest Descent 4
For some problems, such as those in the calculus of variations, the implicit smoothing technique can be used to implement the Newton method. In a Newton method, the gradient is driven to zero based on the linearization
$$g(y + \delta y) = g(y) + A\,\delta y,$$
where $A$ is the Hessian. In the case of the calculus of variations, a Newton step can be achieved by solving
$$A\,\delta y = \left( \frac{\partial G}{\partial y} + \frac{\partial G}{\partial y'}\frac{d}{dx} + \frac{\partial G}{\partial y''}\frac{d^2}{dx^2} \right) \delta y = -g,$$
since the Hessian can be represented by the differential operator (1). Thus the correct choice of smoothing in equation (4) approximates the Newton step, resulting in quadratic convergence, independent of the number of mesh intervals.
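To illustrate what solving $A\,\delta y = -g$ involves, the sketch below applies one Newton step to the linearized supersonic inverse problem, whose Hessian (2) is the constant-coefficient operator $A = -c^2\,d^2/dx^2$ with $c = \rho q^2/\sqrt{M_\infty^2 - 1}$. This is an illustration under assumptions: the value of `c`, the mesh, the starting profile, and the target pressure are arbitrary, and the operator is discretized with central differences and fixed end points.

```python
import numpy as np

n = 20                                   # interior nodes (illustrative)
x = np.linspace(0.0, 1.0, n + 2)
dx = x[1] - x[0]
c = 1.7                                  # stands in for rho q^2 / sqrt(M^2 - 1)
x_mid = 0.5 * (x[1:] + x[:-1])
pt = 0.2 * np.cos(2.0 * np.pi * x_mid)   # arbitrary target, p_t - p_inf

def gradient(y):
    """Discrete form of g = -c * d/dx (p - p_t) at the interior nodes."""
    r = c * np.diff(y) / dx - pt         # p - p_t at midpoints
    return -c * np.diff(r) / dx

# Discrete Hessian: A = -c^2 d^2/dx^2 with fixed end points.
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = c**2 * K / dx**2

y = 0.1 * np.sin(np.pi * x)              # starting profile
g0 = gradient(y)

# Newton step: solve A * dy = -g, then update the interior nodes only.
dy = np.linalg.solve(A, -g0)
y_new = y.copy()
y_new[1:-1] += dy
g1 = gradient(y_new)

print(np.max(np.abs(g0)), np.max(np.abs(g1)))
```

Because this cost is quadratic, the gradient is linear in $y$ and a single Newton step drives it to zero up to rounding; for a nonlinear problem such as the brachistochrone the step would be repeated.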