E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

Lecture 8: Smooth convex unconstrained and equality-constrained minimization. A. Forsgren, KTH, Convex optimization 2006/2007.

Unconstrained convex program

Consider a convex optimization problem of the form

    (CP)    $\mathrm{minimize}_{x \in \mathbb{R}^n}\; f(x)$,

where $f : \mathbb{R}^n \to \mathbb{R}$ is convex and twice continuously differentiable.

Proposition. Let $p^*$ denote the optimal value of (CP). Let $x \in \mathbb{R}^n$, let $m = \eta_{\min}(\nabla^2 f(x))$ and let $M = \eta_{\max}(\nabla^2 f(x))$. Then

    $f(x) - \frac{1}{2m} \|\nabla f(x)\|_2^2 \;\le\; p^* \;\le\; f(x) - \frac{1}{2M} \|\nabla f(x)\|_2^2.$

Note that $\tilde{y} = x - (1/m) \nabla f(x)$ minimizes $\nabla f(x)^T (y - x) + \frac{m}{2} \|y - x\|_2^2$. The direction $-(1/m) \nabla f(x)$ is a steepest-descent step.
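As a small numeric sanity check of these bounds (not part of the slides), the sketch below evaluates them on a strongly convex quadratic; the matrix $H$, vector $c$ and test point $x$ are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative strongly convex quadratic f(x) = 0.5 x'Hx + c'x (assumed data).
H = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ H @ x + c @ x
grad = lambda x: H @ x + c

x_star = np.linalg.solve(H, -c)        # unique minimizer, H x* = -c
p_star = f(x_star)                     # optimal value p*

x = np.array([2.0, -1.0])              # arbitrary test point
eigs = np.linalg.eigvalsh(H)
m, M = eigs[0], eigs[-1]               # smallest and largest eigenvalues
g2 = grad(x) @ grad(x)

lower = f(x) - g2 / (2.0 * m)          # f(x) - ||grad f(x)||^2 / (2m)
upper = f(x) - g2 / (2.0 * M)          # f(x) - ||grad f(x)||^2 / (2M)
print(lower <= p_star <= upper)        # expected: True
```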

Iterative methods

    (CP)    $\mathrm{minimize}\; f(x)$  subject to  $x \in \mathbb{R}^n$,

where $f \in C^2$ and $f$ is convex on $\mathbb{R}^n$. An iterative method generates $x_0, x_1, x_2, \ldots$ such that $\lim_{k \to \infty} x_k = x^*$, where $\nabla f(x^*) = 0$. The method terminates when a suitable convergence criterion is fulfilled, e.g., $\|\nabla f(x_k)\| < \epsilon$.

Linesearch methods

A linesearch method generates in each iteration a search direction and performs a linesearch along that direction. Iteration $k$ takes the following form at $x_k$.

- Compute a search direction $p_k$ such that $\nabla f(x_k)^T p_k < 0$.
- Approximately solve $\min_{\alpha \ge 0} f(x_k + \alpha p_k)$, which gives $\alpha_k$.
- $x_{k+1} \leftarrow x_k + \alpha_k p_k$.

Different methods vary in the choice of $p_k$ and $\alpha_k$.
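A minimal code skeleton of this iteration (an illustrative sketch, not from the slides; the rules for choosing the direction and the step are supplied by the caller):

```python
import numpy as np

def linesearch_method(f, grad, x0, direction, step, tol=1e-8, max_iter=100):
    """Generate x_{k+1} = x_k + alpha_k p_k until ||grad f(x_k)|| < tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # convergence criterion
            break
        p = direction(x, g)              # must satisfy grad f(x)^T p < 0
        alpha = step(f, grad, x, p)      # approximately solve min_{a>=0} f(x + a p)
        x = x + alpha * p
    return x
```

For example, `direction = lambda x, g: -g` together with a backtracking `step` rule gives steepest descent.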

Classes of linesearch methods

We will initially consider two fundamental methods:

- the steepest-descent method, where $p_k = -\nabla f(x_k)$, and
- Newton's method, where $\nabla^2 f(x_k) p_k = -\nabla f(x_k)$.

Steepest descent: (+) search direction inexpensive to compute, (-) slow convergence. Newton's method: (-) search direction more expensive to compute, (+) faster convergence. There are methods in between, e.g., quasi-Newton methods, which aim at mimicking Newton's method without computing second derivatives.

Quadratic objective function

Consider the model problem with quadratic objective function

    (QP)    $\mathrm{minimize}_{x \in \mathbb{R}^n}\; f(x) = \tfrac{1}{2} x^T H x + c^T x$,

where $H \succeq 0$.

Proposition. The following holds for (QP), depending on $H$ and $c$:

(i) If $H \succ 0$, then (QP) has a unique minimizer $x^*$ given by $H x^* = -c$.
(ii) If $H \succeq 0$ but $H \not\succ 0$, each $x^*$ that fulfills $H x^* = -c$ is a global minimizer of (QP). (There may possibly be no such $x^*$.)

Proof. The condition $\nabla f(x^*) = 0$ gives the results. We assume $H \succ 0$ in the discussion.

Linesearch method on quadratic objective function

Consider (QP) with $f(x) = \tfrac{1}{2} x^T H x + c^T x$, where $H \succ 0$. We obtain: $x^*$ is the minimizer of (QP) if and only if $0 = \nabla f(x^*) = H x^* + c$. Suppose the search direction $p_k$ satisfies $\nabla f(x_k)^T p_k < 0$. Let

    $\varphi(\alpha) = f(x_k + \alpha p_k) = f(x_k) + \alpha \nabla f(x_k)^T p_k + \frac{\alpha^2}{2} p_k^T H p_k.$

Then

    $\alpha_k = -\frac{\nabla f(x_k)^T p_k}{p_k^T H p_k}$

gives the minimizer of $\min_{\alpha \ge 0} f(x_k + \alpha p_k)$, i.e., we can perform exact linesearch.
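The closed-form exact step can be checked numerically; the sketch below verifies that $\varphi'(\alpha_k) = 0$ on an assumed small quadratic (the data are illustrative).

```python
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # H > 0 (assumed test data)
c = np.array([1.0, -2.0])
grad = lambda x: H @ x + c

x = np.array([2.0, -1.0])
p = -grad(x)                              # any descent direction works here
alpha = -(grad(x) @ p) / (p @ H @ p)      # exact linesearch step from the slide

phi_prime = grad(x + alpha * p) @ p       # phi'(alpha_k) should vanish
print(abs(phi_prime) < 1e-12)             # expected: True
```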

Steepest descent on quadratic objective function

Assume that $f(x) = \tfrac{1}{2} x^T H x + c^T x$, where $H \succ 0$. Further assume that steepest descent with exact linesearch is used, i.e., $p_k = -\nabla f(x_k)$ and

    $\alpha_k = -\frac{\nabla f(x_k)^T p_k}{p_k^T H p_k}.$

Then it can be shown that

    $f(x_{k+1}) - f(x^*) \le \left( \frac{\mathrm{cond}(H) - 1}{\mathrm{cond}(H) + 1} \right)^2 \left( f(x_k) - f(x^*) \right).$

When $\mathrm{cond}(H)$ is large, $\frac{\mathrm{cond}(H) - 1}{\mathrm{cond}(H) + 1} \approx 1$, i.e., slow linear convergence. For a nonlinear function, we typically get slow linear convergence, where $H$ is replaced by $\nabla^2 f(x_k)$.
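The slow linear rate is easy to observe numerically. The sketch below runs steepest descent with exact linesearch on an assumed ill-conditioned quadratic and compares the per-iteration reduction of $f(x_k) - f(x^*)$ with the bound above.

```python
import numpy as np

H = np.diag([1.0, 100.0])                 # cond(H) = 100 (assumed test data)
f = lambda x: 0.5 * x @ H @ x             # c = 0, so x* = 0 and f(x*) = 0
grad = lambda x: H @ x

kappa = np.linalg.cond(H)
bound = ((kappa - 1.0) / (kappa + 1.0)) ** 2

x = np.array([100.0, 1.0])
for k in range(10):
    g = grad(x)
    alpha = (g @ g) / (g @ H @ g)         # exact step for p = -g
    x_new = x - alpha * g
    ratio = f(x_new) / f(x)               # reduction of f(x_k) - f(x*)
    print(f"iter {k}: reduction {ratio:.4f} (bound {bound:.4f})")
    x = x_new
```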

Speed of convergence

Definition. Assume that $x_k \in \mathbb{R}^n$, $k = 0, 1, \ldots$, and assume that $\lim_{k \to \infty} x_k = x^*$. We say that $\{x_k\}_{k=0}^{\infty}$ converges to $x^*$ with speed of convergence $r$ if

    $\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^r} = C$,  where $C < \infty$.

Asymptotically we then have $\|x_{k+1} - x^*\| \approx C \|x_k - x^*\|^r$. We want $r$ large (and $C$ close to zero). Of interest:

- $r = 1$, $0 < C < 1$: linear convergence. (Steepest descent.)
- $r = 1$, $C = 0$: superlinear convergence. (Quasi-Newton.)
- $r = 2$: quadratic convergence. (Newton's method.)

Newton's method for solving a nonlinear equation

Consider solving the nonlinear equation $\nabla f(u) = 0$, where $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^2$. Then,

    $\nabla f(u + p) = \nabla f(u) + \nabla^2 f(u) p + o(\|p\|).$

The linearization is given by $\nabla f(u) + \nabla^2 f(u) p$. Choose $p$ so that $\nabla f(u) + \nabla^2 f(u) p = 0$, i.e., solve $\nabla^2 f(u) p = -\nabla f(u)$. A Newton iteration takes the following form for a given $u$.

- $p$ solves $\nabla^2 f(u) p = -\nabla f(u)$.
- $u \leftarrow u + p$.

(The nonlinear equation need not be a gradient.)

Speed of convergence for Newton's method

Theorem. Assume that $f \in C^3$ and that $\nabla^2 f(u^*)$ is nonsingular, where $\nabla f(u^*) = 0$. Then, if Newton's method (with steplength one) is started at a point sufficiently close to $u^*$, it is well defined and converges to $u^*$ with convergence rate at least two, i.e., there is a constant $C$ such that $\|u_{k+1} - u^*\| \le C \|u_k - u^*\|^2$.

The proof can be given by studying a Taylor-series expansion,

    $u_{k+1} - u^* = u_k - \nabla^2 f(u_k)^{-1} \nabla f(u_k) - u^* = \nabla^2 f(u_k)^{-1} \bigl( \nabla f(u^*) - \nabla f(u_k) - \nabla^2 f(u_k)(u^* - u_k) \bigr).$

For $u_k$ sufficiently close to $u^*$, $\|\nabla f(u^*) - \nabla f(u_k) - \nabla^2 f(u_k)(u^* - u_k)\| \le C \|u_k - u^*\|^2$.

One-dimensional example for Newton's method

For a positive number $d$, consider computing $1/d$ by minimizing $f(u) = du - \ln u$. Then,

    $f'(u) = d - \frac{1}{u}, \qquad f''(u) = \frac{1}{u^2}.$

We see that $u^* = \frac{1}{d}$. The Newton iteration is

    $u_{k+1} = u_k - \frac{f'(u_k)}{f''(u_k)} = u_k - \frac{d - \frac{1}{u_k}}{\frac{1}{u_k^2}} = 2 u_k - u_k^2 d.$

Then,

    $u_{k+1} - \frac{1}{d} = 2 u_k - u_k^2 d - \frac{1}{d} = -d \left( u_k - \frac{1}{d} \right)^2.$
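The quadratic error recursion can be observed directly. A small sketch of the iteration for $d = 2$ (the starting point is an illustrative choice; the recursion converges when $0 < u_0 < 2/d$):

```python
d = 2.0
u = 0.3                          # illustrative start; need 0 < u_0 < 2/d
for k in range(6):
    print(f"k={k}: u={u:.12f}, error={abs(u - 1.0 / d):.2e}")
    u = 2.0 * u - d * u * u      # Newton step u_{k+1} = 2 u_k - d u_k^2
```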

One-dimensional example for Newton's method, cont.

Graphical illustration of the iteration for $d = 2$ (figure not reproduced in this transcription).

Sufficient descent direction

For a direction $p_k$ to ensure convergence, it must be a sufficient descent direction. A typical condition is

    $-\frac{\nabla f(x_k)^T p_k}{\|\nabla f(x_k)\| \, \|p_k\|} \ge \sigma,$

where $\sigma$ is a positive constant. This means that $p_k$ must be sufficiently similar to the negative gradient. For a search direction $p_k$ from $B_k p_k = -\nabla f(x_k)$, this is ensured by requiring that $\|B_k\| \le M$ and $\|B_k^{-1}\| \le m$, where $m$ and $M$ are positive constants. In a modified Newton method, modifications of $\nabla^2 f(x_k)$ can be made, if needed, by a modified Cholesky factorization.

Linesearch

In the linesearch, $\alpha_k$ is determined as an approximate solution to $\min_{\alpha \ge 0} \varphi(\alpha)$, where $\varphi(\alpha) = f(x_k + \alpha p_k)$. We want $f(x_{k+1}) < f(x_k)$, i.e., $\varphi(\alpha_k) < \varphi(0)$. This alone is not sufficient to ensure convergence.

Example requirement for the step not to be too long: $\varphi(\alpha) \le \varphi(0) + \mu \alpha \varphi'(0)$, i.e.,

    $f(x_k + \alpha p_k) \le f(x_k) + \mu \alpha \nabla f(x_k)^T p_k$, where $\mu \in (0, \tfrac{1}{2})$.  (Armijo condition)

Example requirement for the step not to be too short: $\varphi'(\alpha) \ge \eta \varphi'(0)$, i.e.,

    $\nabla f(x_k + \alpha p_k)^T p_k \ge \eta \nabla f(x_k)^T p_k$, where $\eta \in (\mu, 1)$.  (Wolfe condition)

Alternative requirement for the step not to be too short: take the smallest nonnegative integer $i$ such that $f(x_k + 2^{-i} p_k) \le f(x_k) + \mu 2^{-i} \nabla f(x_k)^T p_k$.  (Backtracking)
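A minimal backtracking linesearch of the kind described above (an illustrative sketch; the test function at the end is an assumed example, and $\mu = 10^{-4}$ is a common practical choice within the stated interval):

```python
import numpy as np

def backtracking(f, grad, x, p, mu=1e-4, max_halvings=50):
    """Return the largest step 2^{-i} satisfying the Armijo condition."""
    g_dot_p = grad(x) @ p
    assert g_dot_p < 0, "p must be a descent direction"
    alpha = 1.0
    for _ in range(max_halvings):
        if f(x + alpha * p) <= f(x) + mu * alpha * g_dot_p:
            return alpha
        alpha *= 0.5                       # try 2^{-i} for i = 0, 1, 2, ...
    return alpha

# Assumed convex test function.
f = lambda x: np.exp(x[0]) + x[0] ** 2 + x[1] ** 2
grad = lambda x: np.array([np.exp(x[0]) + 2.0 * x[0], 2.0 * x[1]])
x = np.array([1.0, 1.0])
print(backtracking(f, grad, x, -grad(x)))
```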

Illustration of linesearch conditions

The linesearch conditions of Wolfe-Armijo type can be illustrated graphically (figure not reproduced in this transcription).

Linesearch conditions

To find $\bar{\alpha}$ such that the Wolfe and Armijo conditions are fulfilled, we may consider $\hat{\varphi}(\alpha) = \varphi(\alpha) - \varphi(0) - \mu \alpha \varphi'(0)$. Then $\hat{\varphi}(0) = 0$ and $\hat{\varphi}'(0) < 0$. In addition, there must exist $\bar{\alpha} > 0$ such that $\hat{\varphi}(\bar{\alpha}) = 0$, otherwise $\varphi$ is unbounded from below. By the mean-value theorem there is an $\hat{\alpha} \in (0, \bar{\alpha})$ such that $\hat{\varphi}(\hat{\alpha}) < 0$ and $\hat{\varphi}'(\hat{\alpha}) = 0$. Since $\mu < \eta$, we obtain $\varphi(\alpha) \le \varphi(0) + \mu \alpha \varphi'(0)$ and $\varphi'(\alpha) \ge \eta \varphi'(0)$ for $\alpha$ in a neighborhood of $\hat{\alpha}$. For example, bisection in combination with polynomial interpolation can be used on $\hat{\varphi}$ to find a suitable $\alpha$.

Newton's method and steepest descent

The steepest-descent direction solves

    $\mathrm{minimize}_{p \in \mathbb{R}^n}\; \nabla f(x)^T p$  subject to  $p^T p \le 1$.

The Newton direction solves

    $\mathrm{minimize}_{p \in \mathbb{R}^n}\; \nabla f(x)^T p$  subject to  $p^T \nabla^2 f(x) p \le 1$.

The Newton step is a steepest-descent step in the norm defined by $\nabla^2 f(x)$, i.e., $\|u\|_{\nabla^2 f(x)} = (u^T \nabla^2 f(x) u)^{1/2}$.

Self-concordant functions

When proving polynomial complexity of interior methods for convex optimization, the notion of self-concordant functions is an important concept.

Definition. A three times differentiable function $f : C \to \mathbb{R}$, which is convex on the convex set $C$, is self-concordant if $|f'''(x)| \le 2 f''(x)^{3/2}$.

In essence, this means that the third derivatives are not too large. An important self-concordant function is $f(x) = -\ln x$ for $x > 0$.
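As a quick verification of this example (a standard computation, not spelled out on the slide):

    $f(x) = -\ln x, \qquad f''(x) = \frac{1}{x^2}, \qquad f'''(x) = -\frac{2}{x^3}, \qquad |f'''(x)| = \frac{2}{x^3} = 2 \bigl( f''(x) \bigr)^{3/2},$

so the defining inequality holds with equality for all $x > 0$.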

Suggested reading

Suggested reading in the textbook:

- Sections 9.1-9.7.

Equality-constrained convex program

Consider a convex optimization problem of the form

    (CP_=)    $\mathrm{minimize}_{x \in \mathbb{R}^n}\; f(x)$  subject to  $Ax = b$,

where $f : \mathbb{R}^n \to \mathbb{R}$ is convex and twice continuously differentiable. If the Lagrangian function is defined as $\ell(x, \lambda) = f(x) - \lambda^T (Ax - b)$, the first-order optimality conditions are $\nabla \ell(x, \lambda) = 0$. We write them as

    $\begin{pmatrix} \nabla_x \ell(x, \lambda) \\ \nabla_\lambda \ell(x, \lambda) \end{pmatrix} = \begin{pmatrix} \nabla f(x) - A^T \lambda \\ -(Ax - b) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$

Newton iteration

A Newton iteration on the optimality conditions takes the form

    $\begin{pmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} p \\ -\nu \end{pmatrix} = - \begin{pmatrix} \nabla f(x) - A^T \lambda \\ Ax - b \end{pmatrix}.$

We may use

    $\left\| \begin{pmatrix} \nabla f(x) - A^T \lambda \\ Ax - b \end{pmatrix} \right\|_2$

as merit function, i.e., to measure how good a point is.
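Such a Newton iteration can be coded directly from the block system above. The sketch below is illustrative: the test problem ($f(x) = \sum_i e^{x_i} + \tfrac{1}{2}\|x\|_2^2$ with a single linear constraint $Ax = b$) is an assumption, not from the lecture.

```python
import numpy as np

f_grad = lambda x: np.exp(x) + x              # gradient of the assumed f
f_hess = lambda x: np.diag(np.exp(x) + 1.0)   # Hessian of the assumed f
A = np.array([[1.0, 2.0, 3.0]])
b = np.array([1.0])

def merit(x, lam):
    """Norm of the optimality-condition residual, as on the slide."""
    return np.linalg.norm(np.concatenate([f_grad(x) - A.T @ lam, A @ x - b]))

def newton_step(x, lam):
    n, m = x.size, b.size
    K = np.block([[f_hess(x), A.T], [A, np.zeros((m, m))]])
    rhs = -np.concatenate([f_grad(x) - A.T @ lam, A @ x - b])
    sol = np.linalg.solve(K, rhs)             # unknowns are (p, -nu)
    p, nu = sol[:n], -sol[n:]
    return x + p, lam + nu

x, lam = np.zeros(3), np.zeros(1)
print(merit(x, lam))                          # before the step
x, lam = newton_step(x, lam)
print(merit(x, lam))                          # smaller after the step
```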

Variable elimination

Note that for a feasible point $\bar{x}$, it holds that $A(x - \bar{x}) = 0$ for all feasible $x$. Let $Z$ be a matrix whose columns form a basis for $\mathrm{null}(A)$. Then $x = \bar{x} + Zv$, with a one-to-one correspondence between $x$ and $v$. Let $\varphi(v) = f(\bar{x} + Zv)$. We may then rewrite the problem as

    (CP'_=)    $\mathrm{minimize}_{v \in \mathbb{R}^{n-m}}\; \varphi(v)$.

Differentiation gives $\nabla \varphi(v) = Z^T \nabla f(\bar{x} + Zv)$ and $\nabla^2 \varphi(v) = Z^T \nabla^2 f(\bar{x} + Zv) Z$. This is an unconstrained problem. We may solve (CP'_=) and identify $x^* = \bar{x} + Z v^*$, where $v^*$ is the optimal solution of (CP'_=). $Z^T \nabla f(x)$ is called the reduced gradient of $f$ at $x$, and $Z^T \nabla^2 f(x) Z$ is called the reduced Hessian of $f$ at $x$.
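A sketch of the variable-elimination approach on the same illustrative problem as above: a basis $Z$ for $\mathrm{null}(A)$ is taken from an SVD of $A$, and Newton's method is run on the reduced problem using the reduced gradient and reduced Hessian.

```python
import numpy as np

f_grad = lambda x: np.exp(x) + x              # same assumed problem as above
f_hess = lambda x: np.diag(np.exp(x) + 1.0)
A = np.array([[1.0, 2.0, 3.0]])
b = np.array([1.0])

xbar = np.linalg.lstsq(A, b, rcond=None)[0]   # some point with A xbar = b
Z = np.linalg.svd(A)[2][A.shape[0]:].T        # columns span null(A)

v = np.zeros(Z.shape[1])
for _ in range(8):
    x = xbar + Z @ v
    red_grad = Z.T @ f_grad(x)                # reduced gradient  Z' grad f
    red_hess = Z.T @ f_hess(x) @ Z            # reduced Hessian   Z' hess f Z
    v = v - np.linalg.solve(red_hess, red_grad)   # Newton step in v

x_opt = xbar + Z @ v
print(A @ x_opt - b, np.linalg.norm(Z.T @ f_grad(x_opt)))   # both ~ 0
```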

First-order optimality conditions as a system of equations, cont.

The resulting Newton system may equivalently be written as

    $\begin{pmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} p \\ -(\lambda + \nu) \end{pmatrix} = \begin{pmatrix} -\nabla f(x) \\ -(Ax - b) \end{pmatrix},$

or alternatively

    $\begin{pmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} p \\ -\lambda_+ \end{pmatrix} = \begin{pmatrix} -\nabla f(x) \\ -(Ax - b) \end{pmatrix}.$

We prefer the form with $\lambda_+$, since it can be directly generalized to problems with inequality constraints.

Quadratic programming with equality constraints

Compare with an equality-constrained quadratic programming problem

    (EQP)    $\mathrm{minimize}_{p \in \mathbb{R}^n}\; \tfrac{1}{2} p^T H p + c^T p$  subject to  $Ap = b$,

where the unique optimal solution $p^*$ and multiplier vector $\lambda_+$ are given by

    $\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} p^* \\ -\lambda_+ \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix},$

if $Z^T H Z \succ 0$ and $A$ has full row rank, where $Z$ is a matrix whose columns form a basis for $\mathrm{null}(A)$.

Newton iteration and equality-constrained quadratic program

Compare

    $\begin{pmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} p \\ -\lambda_+ \end{pmatrix} = \begin{pmatrix} -\nabla f(x) \\ -(Ax - b) \end{pmatrix}$

with

    $\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} p \\ -\lambda_+ \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix}.$

Identify: $\nabla^2 f(x) \leftrightarrow H$, $\nabla f(x) \leftrightarrow c$, $A \leftrightarrow A$, $-(Ax - b) \leftrightarrow b$.

Newton iteration as a QP problem

A Newton iteration for solving the first-order necessary optimality conditions of (CP_=) may be viewed as solving the QP problem

    (QP_=)    $\mathrm{minimize}_{p \in \mathbb{R}^n}\; \tfrac{1}{2} p^T \nabla^2 f(x) p + \nabla f(x)^T p$  subject to  $Ap = -(Ax - b)$,

and letting $x_+ = x + p$, where $\lambda_+$ is given by the multipliers of (QP_=). Problem (QP_=) is well defined with unique optimal solution $p$ and multiplier vector $\lambda_+$ if $Z^T \nabla^2 f(x) Z \succ 0$ and $A$ has full row rank, where $Z$ is a matrix whose columns form a basis for $\mathrm{null}(A)$.

An SQP iteration for problems with equality constraints

Given $x$, $\lambda$ such that $Z^T \nabla^2 f(x) Z \succ 0$ and $A$ has full row rank, a Newton iteration takes the following form.

- Compute the optimal solution $p$ and multiplier vector $\lambda_+$ of

    (QP_=)    $\mathrm{minimize}_{p \in \mathbb{R}^n}\; \tfrac{1}{2} p^T \nabla^2 f(x) p + \nabla f(x)^T p$  subject to  $Ap = -(Ax - b)$.

- $x \leftarrow x + p$, $\lambda \leftarrow \lambda_+$.

We call this method sequential quadratic programming (SQP). NB! (QP_=) is solved by solving a system of linear equations. NB! $x$ and $\lambda$ have given numerical values in (QP_=).
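A sketch of this SQP iteration on the same illustrative problem used earlier (each pass solves (QP_=) via its KKT system and takes a full step); the printed merit values shrink roughly quadratically, in line with the theorem on the next slide.

```python
import numpy as np

f_grad = lambda x: np.exp(x) + x              # assumed test problem, as before
f_hess = lambda x: np.diag(np.exp(x) + 1.0)
A = np.array([[1.0, 2.0, 3.0]])
b = np.array([1.0])

x, lam = np.zeros(3), np.zeros(1)
for k in range(5):
    res = np.linalg.norm(np.concatenate([f_grad(x) - A.T @ lam, A @ x - b]))
    print(f"iteration {k}: merit {res:.2e}")
    K = np.block([[f_hess(x), A.T], [A, np.zeros((1, 1))]])
    rhs = np.concatenate([-f_grad(x), -(A @ x - b)])
    sol = np.linalg.solve(K, rhs)             # unknowns are (p, -lambda_+)
    p, lam = sol[:3], -sol[3:]                # lambda_+ = multipliers of (QP_=)
    x = x + p
```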

Speed of convergence for the SQP method for equality-constrained problems

Theorem. Assume that $f \in C^3$ is convex on $\mathbb{R}^n$ and that $A \in \mathbb{R}^{m \times n}$ has full row rank. Further, assume that $x^*$ is a minimizer of (CP_=) such that $Z^T \nabla^2 f(x^*) Z \succ 0$, where $Z$ is a matrix whose columns form a basis for $\mathrm{null}(A)$. If the SQP method (with steplength one) is started at a point sufficiently close to $x^*$, $\lambda^*$, then it is well defined and converges to $x^*$, $\lambda^*$ with convergence rate at least two.

Proof. In a neighborhood of $x^*$, $\lambda^*$ it holds that $Z^T \nabla^2 f(x) Z \succ 0$ and

    $\begin{pmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{pmatrix}$

is nonsingular. The subproblem (QP_=) is hence well defined, and the result follows from the quadratic rate of convergence of Newton's method.