Introduction to Nonlinear Optimization Paul J. Atzberger

Comments should be sent to: atzberg@math.ucsb.edu

Introduction

We shall discuss in these notes a brief introduction to nonlinear optimization concepts, with wide applicability in finance and other applied fields. The basic problem is to solve

    min_x f(x)    (1)

where typically x ∈ R^n, but the minimization may also be subject to constraints. To numerically approximate the solution x* (if it exists), we shall construct a sequence {x_k}_{k=1}^∞ such that x_k → x*, where f(x*) = min_x f(x). There are many algorithms which attempt to construct such a sequence, with different approaches appropriate depending on the problem. A good reference on nonlinear optimization methods is Numerical Optimization by Nocedal and Wright. In these notes we shall discuss primarily line search methods and handle any constraints that arise using penalty terms.

In line search algorithms the sequence {x_k}_{k=1}^∞ is constructed iteratively, at each step choosing a search direction p_k and attempting to minimize the objective function along the line (ray) in this direction. This essentially reduces the problem to a sequence of one dimensional problems, where x_{k+1} is given by the basic recurrence

    x_{k+1} = x_k + α_k p_k    (2)

where the step length α_k > 0 must be estimated. The key to a sequence {x_k} which converges rapidly is an algorithm which makes effective use of the information about f(x) from the previous iterates to choose a good p_k and step length α_k.

To ensure that progress can be made each iteration, a natural condition to impose on p_k is that the function decrease, at least locally, in this direction. We call p_k a descent direction if ∇f_k^T p_k < 0. Typically, the descent direction has the form p_k = −B_k^{−1} ∇f_k. In the case that B_k is positive definite this ensures that p_k is a descent direction, since

    ∇f_k^T p_k = −∇f_k^T B_k^{−1} ∇f_k < 0.

Steepest descent sets B_k = I, the identity matrix, so that p_k = −∇f_k. Newton's method sets B_k = ∇²f_k, the Hessian, so that p_k = −(∇²f_k)^{−1} ∇f_k. It is important to note that in Newton's method p_k is only ensured to be a descent direction if the Hessian is positive definite.
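To make this concrete, the short sketch below (Python with NumPy; the quadratic objective and the specific numbers are hypothetical choices for illustration, not from the notes) verifies that both the steepest descent and Newton directions satisfy the descent condition ∇f^T p < 0 when B is positive definite.

```python
import numpy as np

def is_descent_direction(grad, p):
    """p is a descent direction at x if grad(x)^T p < 0."""
    return float(grad @ p) < 0.0

# Hypothetical quadratic objective f(x) = 0.5 x^T Q x - b^T x, for illustration.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
x = np.array([2.0, -1.0])
grad = Q @ x - b                          # gradient of f at x

# Steepest descent: B = I, so p = -grad f.
p_sd = -grad
# Newton: B = Hessian = Q, so p = -Q^{-1} grad f.
p_newton = -np.linalg.solve(Q, grad)

assert is_descent_direction(grad, p_sd)
assert is_descent_direction(grad, p_newton)
```

Since Q is positive definite, grad^T Q^{-1} grad > 0 whenever the gradient is nonzero, so the Newton direction passes the check automatically; for an indefinite Hessian the same check can fail.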
The Hessian, however, is often expensive to compute numerically. Quasi-Newton methods construct a B_k which approximates the Hessian by making use of the previous evaluations of f and ∇f; see Nocedal and Wright for a further discussion.

In designing a good numerical optimization method some care must be taken in choosing not only the direction p_k but also the step length α_k, and this will usually depend on the function f(x) being optimized. We shall assume that p_k is some chosen descent direction and now discuss how to choose the step length α_k.

Wolfe Conditions: Choosing a Step-Length α

Let φ(α) = f(x_k + α p_k). One choice of α which might seem ideal would be to minimize φ(α) globally over all α > 0. However, it is typically computationally expensive to find the global optimum even of this one dimensional problem. Since the ray along p_k will not typically contain the minimizer of the full problem, multiple

iterations will almost always be required. This suggests that in designing an algorithm there should be a balance between making progress in minimizing the objective function f along each search direction p_k and iterating over a number of different search directions.

To ensure our algorithm converges we must be more mathematically precise about what we mean by making progress over each search direction. The simplest criterion we might consider, to determine whether progress has been made over a particular search direction, is that the function value be reduced: f(x_k + α_k p_k) < f(x_k). However, this is not sufficient to ensure convergence to the minimizer. For example, consider f(x, y) = x² + y², and suppose the algorithm has the output x_{k+1} = x_k + α_k p_k where α_k = 1 and p_k = [−1/2^{k+1}, 0]^T. Then f(x_{k+1}) < f(x_k) at every step, but

    x_{k+1} = x_0 − ( Σ_{j=1}^{k+1} 2^{−j} ) [1, 0]^T,

so the total displacement never exceeds 1. If the first component satisfies [x_0]_1 > 2, then x_k does not converge to the minimizer 0. In other words, if the progress made each iteration is not sufficiently large, the sequence may converge prematurely to a non-optimal value.

Figure 1: Sufficient decrease condition (the new point must lie below the line).

Figure 2: Curvature condition (the new slope must be greater than that of the line).

To ensure that sufficient progress is made each iteration in the line search we shall require the following two conditions to hold:

Wolfe Conditions:

(i) (sufficient decrease) f(x_k + α p_k) ≤ f(x_k) + c_1 α ∇f_k^T p_k, where c_1 ∈ (0, 1).

(ii) (curvature condition) ∇f(x_k + α p_k)^T p_k ≥ c_2 ∇f_k^T p_k, where c_2 ∈ (c_1, 1).

(See Figures 1 and 2 for a geometric interpretation.) We now prove that step lengths satisfying the Wolfe Conditions exist.

Convergence of Line Search Optimization Methods

Lemma: Let f : R^n → R be continuously differentiable and let p_k be a descent direction at x_k. If f is bounded below along the ray {x_k + α p_k | α > 0}, then there exist intervals of step lengths [α^(1), α^(2)] on which the Wolfe Conditions are satisfied.

Proof: Since φ(α) = f(x_k + α p_k) is bounded below for all α > 0 and 0 < c_1 < 1, the line l(α) = f(x_k) + α c_1 ∇f_k^T p_k must intersect the graph of φ(α) at least once. Let α′ > 0 be the smallest such intersection, that is,

    f(x_k + α′ p_k) = f(x_k) + α′ c_1 ∇f_k^T p_k.

The sufficient decrease condition (i) then clearly holds for all α < α′. By the differentiability of f, the mean value theorem gives an α″ ∈ (0, α′) such that

    f(x_k + α′ p_k) − f(x_k) = α′ ∇f(x_k + α″ p_k)^T p_k.

Substituting the sufficient-decrease equality above for f(x_k + α′ p_k), we obtain

    ∇f(x_k + α″ p_k)^T p_k = c_1 ∇f_k^T p_k > c_2 ∇f_k^T p_k,

since c_1 < c_2 and ∇f_k^T p_k < 0. Thus α″ satisfies the curvature condition (ii). Since also α″ < α′, both Wolfe Conditions (i) and (ii) hold simultaneously in a neighborhood of α″.

We now discuss the convergence of line search algorithms when the Wolfe Conditions are satisfied.

Theorem (Zoutendijk's Condition): Assume {x_k} is generated by a line search algorithm satisfying the Wolfe Conditions. Suppose that f : R^n → R is bounded below and that ∇f is Lipschitz continuous, ‖∇f(x) − ∇f(y)‖ ≤ L ‖x − y‖. Then

    Σ_k cos²(θ_k) ‖∇f_k‖² < ∞,

where cos(θ_k) = −∇f_k^T p_k / (‖∇f_k‖ ‖p_k‖). We remark that θ_k is the angle between the steepest descent direction −∇f_k and the search direction p_k.

Proof: From the curvature condition (ii) we have

    (∇f_{k+1} − ∇f_k)^T p_k ≥ (c_2 − 1) ∇f_k^T p_k.

From Lipschitz continuity we have

    (∇f_{k+1} − ∇f_k)^T p_k ≤ α_k L ‖p_k‖².

By combining both relations we have

    α_k ≥ ((c_2 − 1)/L) (∇f_k^T p_k / ‖p_k‖²).

By substituting this in (i) we have

    f_{k+1} ≤ f_k − (c_1 (1 − c_2)/L) ((∇f_k^T p_k)² / ‖p_k‖²).

From the definition of cos(θ_k) we have

    f_{k+1} ≤ f_k − c cos²(θ_k) ‖∇f_k‖²,  where c = c_1 (1 − c_2)/L.

By summing over all indices up to k we have

    f_{k+1} ≤ f_0 − c Σ_{j=0}^{k} cos²(θ_j) ‖∇f_j‖².

Since f(x) is bounded below we must have

    Σ_{j=0}^{∞} cos²(θ_j) ‖∇f_j‖² < ∞.

Using the standard facts about convergence of series, we have cos²(θ_k) ‖∇f_k‖² → 0 as k → ∞.

This result, while appearing somewhat technical, is actually quite useful. For if we construct a line search method such that cos(θ_k) ≥ δ > 0, then the theorem above implies that lim_{k→∞} ‖∇f_k‖ = 0. Thus if the line search sequence {x_k} remains bounded there will be a limit point x* which is a critical point, ∇f(x*) = 0. Given the sufficient decrease condition this point will likely be a minimizer, although one must be careful to check this when no special structure is assumed about f.

To further describe the convergence, we shall say that a method converges in f with rate p if

    f(x_{k+1}) − f(x*) ≤ c (f(x_k) − f(x*))^p.

We say that a method converges in x with rate p if

    ‖x_{k+1} − x*‖ ≤ c ‖x_k − x*‖^p.

We shall now discuss the convergence of two common optimization methods, namely steepest descent and Newton's method.

Convergence of Steepest Descent

Steepest Descent: In this case p_k = −∇f_k at each step. Let us consider a problem which we can readily analyze, with an objective function of the form

    f(x) = (1/2) x^T Q x − b^T x,

where Q = Q^T is positive definite. The minimizer x* of this problem solves Qx = b. In this example we can explicitly compute the step length α_k which minimizes φ(α) at each iteration. Writing g_k = ∇f(x_k), this amounts to minimizing

    f(x_k − α g_k) = (1/2)(x_k − α g_k)^T Q (x_k − α g_k) − b^T (x_k − α g_k)

over α > 0.

Since ∇f(x) = Qx − b, the minimizing step length satisfies

    g_k^T ∇f(x_k − α g_k) = g_k^T (Q (x_k − α g_k) − b) = 0.

This gives

    α_k = (g_k^T g_k) / (g_k^T Q g_k),

and since g_k = ∇f_k,

    α_k = (∇f_k^T ∇f_k) / (∇f_k^T Q ∇f_k).

Thus the next iterate is given by (when minimizing in the line search)

    x_{k+1} = x_k − (∇f_k^T ∇f_k / (∇f_k^T Q ∇f_k)) ∇f_k.

From the expression above it can be shown that

    f(x_{k+1}) − f(x*) = [ 1 − (∇f_k^T ∇f_k)² / ((∇f_k^T Q ∇f_k)(∇f_k^T Q^{−1} ∇f_k)) ] (f(x_k) − f(x*)).

This can be expressed in terms of the eigenvalues of Q as

    f(x_{k+1}) − f(x*) ≤ ((λ_n − λ_1)/(λ_n + λ_1))² (f(x_k) − f(x*)),

where 0 < λ_1 ≤ λ_2 ≤ ··· ≤ λ_n are the eigenvalues of Q. (A proof is given elsewhere; see Luenberger 1984.) We remark that for nonlinear objective functions we can approximate the function near a minimizer by truncating the Taylor expansion at the quadratic term. In this case the analysis above applies up to the truncation error, with Q set to the Hessian, Q = ∇²f. From the analysis we find that steepest descent has a rate of convergence in f of first order (linear), even in this ideal case.

Convergence of Newton's Method

Newton's Method: Let us now consider the case when the search direction is p_k^(N) = −(∇²f_k)^{−1} ∇f_k. As we mentioned above, we must be a little careful since the Hessian ∇²f_k may not be positive definite, and then the search direction may not be a descent direction. For the example above with Q positive definite, the search direction is p_k^(N) = −Q^{−1} ∇f_k. Analysis similar to that done in the case of steepest descent can be carried out to show that the rate of convergence is quadratic. In particular, in the case of a quadratic objective function it can be shown that the optimal step length is α_k = 1 at each iteration. Taking α_k = 1 gives

    x_k + p_k^(N) − x* = x_k − x* − (∇²f_k)^{−1} ∇f_k = (∇²f_k)^{−1} [ ∇²f_k (x_k − x*) − (∇f_k − ∇f*) ]

where ∇f* = ∇f(x*) = 0. Now using that

    ∇f_k − ∇f* = ∫₀¹ ∇²f(x_k + t(x* − x_k)) (x_k − x*) dt,

we have

    ‖∇²f(x_k)(x_k − x*) − (∇f_k − ∇f*)‖
        = ‖ ∫₀¹ [ ∇²f(x_k) − ∇²f(x_k + t(x* − x_k)) ] (x_k − x*) dt ‖
        ≤ ∫₀¹ ‖∇²f(x_k) − ∇²f(x_k + t(x* − x_k))‖ ‖x_k − x*‖ dt
        ≤ ‖x_k − x*‖² ∫₀¹ L t dt = (L/2) ‖x_k − x*‖²,

where L is the Lipschitz constant for ∇²f(x) for x near x*. Therefore

    ‖x_k + p_k^(N) − x*‖ ≤ (L/2) ‖∇²f(x_k)^{−1}‖ ‖x_k − x*‖².

Therefore, the method converges in x quadratically, p = 2.

A Line Search Algorithm

We now discuss an algorithm to find a step length α which satisfies the Wolfe conditions. The basic strategy is to use interpolation and a bisection-style search, determining which part of an interval contains points satisfying the Wolfe conditions. We state a general algorithm here in pseudocode.

Algorithm: LineSearch (Input: x_k, p_k, α_max. Output: α*.)
    α_0 ← 0; choose α_1 > 0 and α_max; i ← 1;
    repeat:
        Evaluate φ(α_i);
        if φ(α_i) > φ(0) + c_1 α_i φ′(0), or [φ(α_i) ≥ φ(α_{i−1}) and i > 1]:
            α* ← zoom(α_{i−1}, α_i) and stop;
        Evaluate φ′(α_i);
        if |φ′(α_i)| ≤ −c_2 φ′(0): set α* ← α_i and stop;
        if φ′(α_i) ≥ 0: set α* ← zoom(α_i, α_{i−1}) and stop;
        Choose α_{i+1} ∈ (α_i, α_max); i ← i + 1;
    end (repeat)

Algorithm: Zoom (Input: α_lo, α_hi. Output: α*.)
    repeat:
        Interpolate (using quadratic, cubic, or bisection) to find a trial step length α_j between α_lo and α_hi;
        Evaluate φ(α_j);
        if φ(α_j) > φ(0) + c_1 α_j φ′(0), or φ(α_j) ≥ φ(α_lo):
            α_hi ← α_j;

        else:
            Evaluate φ′(α_j);
            if |φ′(α_j)| ≤ −c_2 φ′(0): set α* ← α_j and stop;
            if φ′(α_j)(α_hi − α_lo) ≥ 0: α_hi ← α_lo;
            α_lo ← α_j;
    end (repeat)

For a more in-depth discussion and motivation of these algorithms see Nocedal and Wright, Numerical Optimization, pages 58-61.

Solving Nonlinear Unconstrained Optimization Problems

For unconstrained optimization problems of the form

    min_x f(x)

the basic line search algorithm is as follows:

Unconstrained Line Search Optimization Algorithm: (Input: x_0, α_max, ε. Output: x*.)
    k ← 0; x_k ← x_0;
    Evaluate ∇f_k ← ∇f(x_k);
    repeat:
        p_k ← −B_k^{−1} ∇f_k;
        α_k ← LineSearch(x_k, p_k, α_max);
        x_{k+1} ← x_k + α_k p_k;
        Evaluate ∇f_{k+1} ← ∇f(x_{k+1});
        if ‖∇f_{k+1}‖ < ε then x* ← x_{k+1} and stop;
        k ← k + 1;
    end (repeat)

Solving Nonlinear Constrained Optimization Problems

For constrained optimization problems of the form

    min_x f(x)
    subject to c_i(x) = 0, i ∈ E,
               c_i(x) ≥ 0, i ∈ I,

a line search algorithm can also be constructed. To handle the constraints we formulate an unconstrained optimization problem which incorporates the constraints through penalty terms. The strategy is then to solve each unconstrained problem with penalty parameter μ_j up to a tolerance ε_j. By repeatedly solving this problem, using as starting point the solution x*_j of the previous iteration, and by successively reducing the values of μ_j and ε_j, a sequence of solutions x*_j → x* can be generated. One must take some care in how μ_j and ε_j are reduced each iteration to ensure convergence and efficient use of computational resources. These issues are further discussed in

Nocedal and Wright. The basic algorithm is:

A Constrained Line Search Optimization Algorithm: (Input: x_0, α_max, ε, δ. Output: x*.)
    j ← 0; x*_j ← x_0; μ_j ← μ_0;
    repeat:
        Let F(x, μ_j) = f(x) + (1/(2 μ_j)) Σ_{i∈E} c_i²(x) − μ_j Σ_{i∈I} log(c_i(x));
        x*_j ← UnconstrainedOptimization(x*_j, α_max, ε_j);
        Evaluate ∇F_j ← ∇_x F(x*_j, μ_j);
        if ‖∇F_j‖ < ε and μ_j < δ then x* ← x*_j and stop;
        j ← j + 1;
        reduce the value of μ_j;
        reduce the value of ε_j;
    end (repeat)
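As an illustration of the overall strategy, here is a minimal Python sketch of the penalty/barrier loop on a small hypothetical problem. For brevity the inner solver is a plain Newton iteration with a crude backtracking safeguard rather than the full Wolfe line search described above; the problem, the halving schedule for μ and ε, and the tolerances are all illustrative choices, not part of the original notes.

```python
import numpy as np

# Hypothetical problem: min x1^2 + x2^2  s.t.  x1 + x2 = 1 (E),  x1 >= 0.1 (I).
# Penalty/barrier objective: F(x, mu) = f(x) + c_eq^2/(2 mu) - mu log(c_ineq).
def F(x, mu):
    ceq, ci = x[0] + x[1] - 1.0, x[0] - 0.1
    if ci <= 0.0:
        return np.inf                       # log-barrier: iterates stay feasible
    return x[0]**2 + x[1]**2 + ceq**2 / (2.0 * mu) - mu * np.log(ci)

def grad_F(x, mu):
    ceq, ci = x[0] + x[1] - 1.0, x[0] - 0.1
    return np.array([2.0 * x[0] + ceq / mu - mu / ci,
                     2.0 * x[1] + ceq / mu])

def hess_F(x, mu):
    ci = x[0] - 0.1
    return np.array([[2.0 + 1.0 / mu + mu / ci**2, 1.0 / mu],
                     [1.0 / mu, 2.0 + 1.0 / mu]])

def unconstrained_optimization(x, mu, eps, max_iter=50):
    """Inner solver: Newton's method with a simple backtracking safeguard."""
    for _ in range(max_iter):
        g = grad_F(x, mu)
        if np.linalg.norm(g) < eps:
            break
        p = -np.linalg.solve(hess_F(x, mu), g)  # Newton direction
        a = 1.0
        for _ in range(60):                     # halve a until F decreases
            if F(x + a * p, mu) <= F(x, mu):
                break
            a *= 0.5
        x = x + a * p
    return x

x, mu, eps = np.array([0.5, 0.0]), 1.0, 1e-2    # strictly feasible start
for _ in range(12):
    x = unconstrained_optimization(x, mu, eps)
    mu *= 0.5                                   # tighten penalty and barrier
    eps *= 0.5                                  # and the inner tolerance

# The constrained minimizer is (0.5, 0.5); the inequality is inactive there.
assert np.allclose(x, [0.5, 0.5], atol=1e-2)
```

Note how the solve at each μ_j warm-starts from the previous solution, as in the pseudocode above; the equality constraint is violated by O(μ_j) at each outer iterate and is tightened as μ_j is reduced.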