IE 5531: Engineering Optimization I
Lecture 15: Nonlinear optimization
Prof. John Gunnar Carlsson
November 1, 2010

Administrivia
Midterms returned 11/01
11/01 office hours moved
No class next week (INFORMS)

Recap
Algorithms for unconstrained minimization:
Introduction
Bisection search (root-finding)
Golden section search (unimodal minimization)
Line search (minimization): Wolfe and Goldstein conditions

Today
Gradient method (steepest descent) example
Newton's method
Constrained problems and the ellipsoid method

Steepest (gradient) descent example
Recall that in the method of steepest descent, we set d_k = -∇f(x_k)
Consider the case where we want to minimize f(x) = c^T x + (1/2) x^T Q x, where Q is a symmetric positive definite matrix
Clearly, the unique minimizer lies where ∇f(x*) = 0, which occurs precisely when Q x* = -c
The descent direction will be d = -∇f(x) = -(c + Q x)

Steepest descent example
The iteration scheme x_{k+1} = x_k + α_k d_k becomes x_{k+1} = x_k - α_k (c + Q x_k)
We need to choose a step size α_k, so we consider φ(α) = f(x_k - α (c + Q x_k))

Steepest descent example
Note that we can find the optimal α analytically, which automatically satisfies the Wolfe conditions:
φ(α) = f(x_k - α (c + Q x_k)) = c^T (x_k - α (c + Q x_k)) + (1/2) (x_k - α (c + Q x_k))^T Q (x_k - α (c + Q x_k))
Since φ(α) is a strictly convex quadratic function of α, it is not hard to see that its minimizer occurs where
c^T d_k + x_k^T Q d_k + α d_k^T Q d_k = 0
and thus, with d_k = -(c + Q x_k), we set
α_k = (d_k^T d_k) / (d_k^T Q d_k)

Steepest descent example
The recursion for the steepest descent method is therefore
x_{k+1} = x_k + ((d_k^T d_k) / (d_k^T Q d_k)) d_k,   where d_k = -(c + Q x_k)
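As a concrete illustration, here is a minimal NumPy sketch of this recursion for the quadratic objective above. The function name, tolerance, and small test problem are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def steepest_descent_quadratic(Q, c, x0, tol=1e-8, max_iter=1000):
    """Minimize f(x) = c^T x + 0.5 x^T Q x (Q symmetric positive definite)
    by steepest descent with the exact line search derived above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -(c + Q @ x)                # d_k = -grad f(x_k)
        if np.linalg.norm(d) < tol:     # stationary point reached
            break
        alpha = (d @ d) / (d @ Q @ d)   # exact minimizer of phi(alpha)
        x = x + alpha * d
    return x

# Usage example: the minimizer satisfies Q x* = -c, i.e. x* = (1, 1) here
Q = np.array([[2.0, 0.0], [0.0, 10.0]])
c = np.array([-2.0, -10.0])
print(steepest_descent_quadratic(Q, c, np.zeros(2)))  # approximately [1. 1.]
```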

Convergence of steepest descent
Theorem. Let f(x) be a given continuously differentiable function. Let x_0 ∈ R^n be a point for which the sub-level set X_0 = {x ∈ R^n : f(x) ≤ f(x_0)} is bounded. Let {x_k} be the sequence of points generated by the steepest descent method initiated at x_0, using either the Wolfe or Goldstein line search conditions. Then {x_k} converges to a stationary point of f(x).
The above theorem gives what is called the global convergence property of the steepest descent method: no matter how far away x_0 is, the method must converge to a stationary point. It may, however, be very slow to reach that point.

Newton's method
Minimizing a function f(x) can be thought of as finding a solution to the nonlinear system of equations ∇f(x) = 0
Suppose we begin at a point x_0 that is thought to be close to a minimizer x*
We may consider the problem of finding a solution to ∇f(x) = 0 that is close to x_0 (we're assuming that there aren't any maximizers that are closer to x_0)
Newton's method is a general method for solving a system of equations g(x) = 0 (to minimize/maximize, set g(x) := ∇f(x))

Univariate Newton's method
Newton's method is an iterative method that follows the following scheme:
1. At a given iterate x_k, make a linear approximation L(x) to g(x) at x_k by differentiating g(x)
2. Set x_{k+1} to be the solution to the linear equation L(x) = 0
It is not hard to show that, in the univariate case, the iteration is
x_{k+1} = x_k - g(x_k) / g'(x_k)
which is well-defined provided g'(x_k) exists and is nonzero at each step
Note that the iteration terminates if g(x_k) = 0
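A minimal sketch of the univariate iteration, assuming the caller supplies both g and g'; the stopping tolerance and the worked example (minimizing f(x) = x^2 - 2x by solving f'(x) = 0) are illustrative assumptions.

```python
def newton_univariate(g, g_prime, x0, tol=1e-10, max_iter=50):
    """Root-finding Newton iteration x_{k+1} = x_k - g(x_k) / g'(x_k)."""
    x = x0
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:       # g(x_k) is (numerically) zero: stop
            break
        gpx = g_prime(x)
        if gpx == 0.0:          # iteration is undefined at this point
            raise ZeroDivisionError("g'(x_k) = 0")
        x = x - gx / gpx
    return x

# Usage example: minimize f(x) = x^2 - 2x by solving g(x) = f'(x) = 2x - 2 = 0
print(newton_univariate(lambda x: 2.0 * x - 2.0, lambda x: 2.0, x0=5.0))  # 1.0
```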

Graphical interpretation (figure)

Conditions for convergence
Without further conditions imposed, Newton's method is not globally convergent: the function g(x) = x^{1/3} has a root at x* = 0, but the iteration diverges from any nonzero starting point.
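To see why, note that g(x) = x^{1/3} gives g'(x) = (1/3) x^{-2/3}, so the Newton step is x_{k+1} = x_k - 3 x_k = -2 x_k: each iterate doubles in magnitude and flips sign. A small numerical sketch (illustrative code, not from the slides):

```python
import math

def g(x):        # g(x) = x^(1/3), defined for negative x via the sign
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def g_prime(x):  # g'(x) = (1/3) x^(-2/3)
    return (1.0 / 3.0) * abs(x) ** (-2.0 / 3.0)

x = 0.1
for k in range(6):
    print(k, x)                # iterates: 0.1, -0.2, 0.4, -0.8, ... (diverging)
    x = x - g(x) / g_prime(x)  # equals -2 * x
```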

Convergence conditions
Theorem. If g(x) is twice continuously differentiable and x* is a root of g(x) at which g'(x*) ≠ 0, then, provided |x_0 - x*| is sufficiently small, the sequence generated by the Newton iterations
x_{k+1} = x_k - g(x_k) / g'(x_k)
converges quadratically to x*, with rate constant C = |g''(x*)| / (2 |g'(x*)|).
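As a quick numerical illustration of the quadratic rate (test function assumed, not from the slides): for g(x) = x^2 - 2 with root sqrt(2), the error roughly squares at each step.

```python
import math

g = lambda x: x * x - 2.0
g_prime = lambda x: 2.0 * x

x, root = 1.5, math.sqrt(2.0)
for k in range(5):
    print(k, abs(x - root))    # errors ~ 8.6e-2, 2.5e-3, 2.1e-6, 1.6e-12, ...
    x = x - g(x) / g_prime(x)
```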

Multiple dimensions
Consider the problem of solving g(x) = 0, where g : R^n → R^n
Define the Jacobian matrix J = ∇g by [J]_{ij} = ∂g_i(x)/∂x_j (the rows of J are just the gradients ∇g_i(x)^T)
The iteration becomes x_{k+1} = x_k - J(x_k)^{-1} g(x_k)
The necessary conditions for convergence are somewhat more detailed and involved, and we will not go into them
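A minimal NumPy sketch of the multivariate iteration, solving the linear system J(x_k) Δx = -g(x_k) at each step rather than forming the inverse explicitly; the function names and the (linear) test problem are illustrative assumptions.

```python
import numpy as np

def newton_system(g, jacobian, x0, tol=1e-10, max_iter=50):
    """Solve g(x) = 0 by Newton's method: x_{k+1} = x_k - J(x_k)^{-1} g(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) < tol:
            break
        x = x + np.linalg.solve(jacobian(x), -gx)   # Newton step
    return x

# Usage example: stationary point of f(x) = c^T x + 0.5 x^T Q x, i.e. g(x) = c + Q x
Q = np.array([[2.0, 0.0], [0.0, 10.0]])
c = np.array([-2.0, -10.0])
print(newton_system(lambda x: c + Q @ x, lambda x: Q, np.zeros(2)))  # [1. 1.] in one step
```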

Computational issues
As an optimization procedure, Newton's method requires all first and second derivatives of the objective function
This can be very time-consuming if the objective function is expensive to compute
For this reason, quasi-Newton approaches are often used, in which the second derivative is approximated
Trade-off: steepest descent requires more iterations, but each iteration is fast; Newton's method requires fewer iterations, but each one takes longer

Constrained optimization
In a constrained optimization problem
minimize f(x) subject to x ∈ F
we have to worry about feasibility as well as optimality; for example, a descent direction must also be feasible
Gradient projection: project the gradient vector onto F and move in that direction (more on this later)
The ellipsoid method: a completely different approach, developed in the 1960s and 1970s in the Soviet Union
The idea is to enclose the region of interest in a sequence of ellipsoids of decreasing size
This is similar to the bisection method in two dimensions

Ellipsoid method
The ellipsoid method is best introduced by considering the problem of finding an element of a solution set X given by a system of linear inequalities:
X = {x ∈ R^n : a_i^T x ≤ b_i, i = 1, ..., m}
Instead of restricting ourselves to linear inequalities, we can allow convex inequalities, i.e. g_i(x) ≤ 0 with each g_i(x) convex, although this makes the exposition more challenging
As we saw in an earlier problem set, solving a linear program is equivalent to finding a feasible solution to a set of linear inequalities
We make two technical assumptions to start:
1. X is contained in a ball centered at the origin with radius R > 0
2. The volume of X is at least ε^n vol(B_0), where B_0 is the unit ball in R^n

Ellipsoid representation
An ellipsoid is just a set of the form
E_k = {x ∈ R^n : (x - x_k)^T B_k^{-1} (x - x_k) ≤ 1}
where x_k is the center of the ellipsoid and B_k is a symmetric positive definite matrix of dimension n
In two dimensions this is just
E_k = {(x, y) ∈ R^2 : (x - x_0, y - y_0) [[a, b/2], [b/2, c]] (x - x_0, y - y_0)^T ≤ 1}
i.e. a (x - x_0)^2 + b (x - x_0)(y - y_0) + c (y - y_0)^2 ≤ 1
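As a small illustration (assumed helper, not from the slides), testing whether a point lies in E_k is a single quadratic-form evaluation:

```python
import numpy as np

def in_ellipsoid(x, center, B):
    """True if (x - center)^T B^{-1} (x - center) <= 1."""
    d = np.asarray(x, dtype=float) - center
    return d @ np.linalg.solve(B, d) <= 1.0

# 2D example: B = diag(4, 1) gives the ellipse (x/2)^2 + y^2 <= 1 at the origin
B = np.diag([4.0, 1.0])
print(in_ellipsoid([1.9, 0.0], np.zeros(2), B))  # True
print(in_ellipsoid([2.1, 0.0], np.zeros(2), B))  # False
```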

Volume of ellipsoid and cutting plane
It is not hard to show that vol(E_k) = sqrt(det B_k) · vol(B_0)
At the k-th iteration, we know that X ⊆ E_k; we're going to shrink these ellipsoids by a constant factor at each iteration until a feasible point is found
We check whether the center point x_k is in X:
If x_k ∈ X, then we're done
If not, then at least one constraint is violated, say a_j^T x_k > b_j
In that case, X lies in the half-ellipsoid E_k^half := {x ∈ E_k : a_j^T x ≤ a_j^T x_k}

Illustration (figure)

Constructing a new ellipsoid
At a given iteration k, with x_k and E_k, we construct E_{k+1} as follows: define
τ = 1/(n+1),   δ = n^2/(n^2 - 1),   σ = 2τ
We set
x_{k+1} = x_k - (τ / sqrt(a_j^T B_k a_j)) B_k a_j
B_{k+1} = δ (B_k - σ (B_k a_j a_j^T B_k) / (a_j^T B_k a_j))
It is rather cumbersome, but it turns out that E_{k+1} is the minimum-volume ellipsoid that contains E_k^half := {x ∈ E_k : a_j^T x ≤ a_j^T x_k}
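Putting the pieces together, here is a compact Python sketch of the feasibility version of the method for X = {x : Ax ≤ b}. The choice of the first violated constraint as the cut, the iteration cap, and the small test system are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def ellipsoid_feasibility(A, b, R, max_iter=1000):
    """Look for a point of X = {x : A x <= b}, assumed to lie in the ball of
    radius R about the origin, using the central-cut update above."""
    n = A.shape[1]
    x = np.zeros(n)                 # center of E_0
    B = (R ** 2) * np.eye(n)        # E_0 is the ball of radius R
    tau, delta, sigma = 1.0 / (n + 1), n**2 / (n**2 - 1.0), 2.0 / (n + 1)
    for _ in range(max_iter):
        violated = np.where(A @ x > b)[0]
        if violated.size == 0:
            return x                # the current center is feasible
        a = A[violated[0]]          # cut: X lies in {x in E_k : a^T x <= a^T x_k}
        Ba = B @ a
        aBa = a @ Ba
        x = x - tau * Ba / np.sqrt(aBa)
        B = delta * (B - sigma * np.outer(Ba, Ba) / aBa)
    return None                     # no feasible point found within max_iter

# Usage example: x1 >= 1, x2 >= 1, x1 + x2 <= 3 (feasible region inside radius 10)
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([-1.0, -1.0, 3.0])
print(ellipsoid_feasibility(A, b, R=10.0))
```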

Convergence
Theorem. The ellipsoid E_{k+1} defined on the preceding slide is the minimum-volume ellipsoid that contains E_k^half := {x ∈ E_k : a_j^T x ≤ a_j^T x_k}. Moreover,
vol(E_{k+1}) / vol(E_k) = (n^2/(n^2 - 1))^{(n-1)/2} · (n/(n+1)) < exp(-1/(2(n+1))) < 1
This establishes that the volume of the ellipsoid decreases by a constant factor at each iteration. It can be shown that the ellipsoid method solves linear programs in O(n^2 log(R/ε)) iterations.
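A quick numerical check of the stated volume-ratio formula and bound for a few small dimensions (illustrative script):

```python
import math

for n in range(2, 6):
    ratio = (n**2 / (n**2 - 1.0)) ** ((n - 1) / 2.0) * (n / (n + 1.0))
    bound = math.exp(-1.0 / (2.0 * (n + 1)))
    print(n, round(ratio, 4), round(bound, 4), ratio < bound < 1.0)  # always True
```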

Comments
The ellipsoid method can solve any convex problem as well, as long as we can generate a hyperplane through the center of E_k such that X lies entirely on one side of it
In practice, the ellipsoid method is usually slower than the simplex method; it exists primarily as a theoretical tool for proving the complexity of solving linear (or convex) programs