IE 5531: Engineering Optimization I


IE 5531: Engineering Optimization I Lecture 14: Unconstrained optimization Prof. John Gunnar Carlsson October 27, 2010

Administrivia: Midterms returned 11/01; 11/01 office hours moved; PS5 posted this evening.

Recap: Applications of KKT conditions: portfolio optimization; public good allocation; communication channel power allocation (water-filling); Fisher's exchange market.

Today: Algorithms for unconstrained minimization: introduction; bisection search; golden section search; line search; Wolfe and Goldstein conditions; gradient method (steepest descent).

Introduction: Today's lecture is focused on solving the unconstrained problem: minimize $f(x)$ for $x \in \mathbb{R}^n$. Ideally, we would like to find a global minimizer, i.e. a point $x^*$ such that $f(x^*) \leq f(x)$ for all $x \in \mathbb{R}^n$. In general, as we have seen with the KKT conditions, we have to settle for a local minimizer, i.e. a point $x^*$ such that $f(x^*) \leq f(x)$ for all $x$ in a local neighborhood $N(x^*)$. If $f(x)$ is convex, these two notions are the same.

Necessary and sufficient conditions: If $x^*$ is a local minimizer, then there must be no descent direction, i.e. no direction $d$ such that $\nabla f(x^*)^T d < 0$. This immediately implies that $\nabla f(x^*) = 0$. We also need to distinguish between local maximizers and local minimizers, so we also require that $H \succeq 0$, where $h_{ij} = \partial^2 f(x^*) / \partial x_i \partial x_j$. The stronger condition $H \succ 0$ (together with $\nabla f(x^*) = 0$) is a sufficient condition for $x^*$ to be a local minimizer. Again, if $f(x)$ is convex (and continuously differentiable), then $\nabla f(x^*) = 0$ is a necessary and sufficient condition for a global minimizer.
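As a concrete illustration (not from the slides), the following Python sketch checks these conditions numerically at a candidate point for the hypothetical function $f(x) = x_1^2 + 3x_2^2$: the gradient should vanish, and the Hessian eigenvalues should be nonnegative (necessary) or strictly positive (sufficient).

```python
import numpy as np

# Illustrative check of the first- and second-order conditions at a candidate
# point x_star, for the hypothetical function f(x) = x_1^2 + 3*x_2^2.
def grad_f(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

def hess_f(x):
    return np.array([[2.0, 0.0], [0.0, 6.0]])

x_star = np.array([0.0, 0.0])
eigenvalues = np.linalg.eigvalsh(hess_f(x_star))

print("gradient is (near) zero:", np.allclose(grad_f(x_star), 0.0))
print("H is positive semidefinite (necessary):", np.all(eigenvalues >= 0))
print("H is positive definite (sufficient):", np.all(eigenvalues > 0))
```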

Overview: Optimization algorithms tend to be iterative procedures: starting at a given point $x_0$, they generate a sequence $\{x_k\}$ of iterates. This sequence terminates either when no more progress can be made (out of memory, etc.) or when a solution point has been approximated satisfactorily. At any given iterate $x_k$, we generally want $x_{k+1}$ to satisfy $f(x_{k+1}) < f(x_k)$. Furthermore, we want the sequence to converge to a local minimizer $x^*$. The general approach is a line search: at any given iterate $x_k$, choose a direction $d_k$ and then set $x_{k+1} = x_k + \alpha_k d_k$ for some scalar $\alpha_k > 0$.

Convergent sequences: Definition. Let $\{x_k\}$ be a sequence of real numbers. Then $\{x_k\}$ converges to $x^*$ if and only if for all real numbers $\epsilon > 0$, there exists a positive integer $K$ such that $|x_k - x^*| < \epsilon$ for all $k \geq K$. Examples of convergent sequences: $x_k = 1/k$; $x_k = (1/2)^k$; $x_k = \left[\frac{1}{\log(k+1)}\right]^k$.

Searching in one variable: root-finding. Intermediate value theorem: given a continuous single-variable function $f(x)$ and a pair of points $x_l$ and $x_r$ such that $f(x_l) < 0$ and $f(x_r) > 0$, there exists a point $x^* \in [x_l, x_r]$ such that $f(x^*) = 0$. A simpler question to motivate: how can we find $x^*$ (or a point within $\epsilon$ of $x^*$)?

Bisection:
1. Choose $x_{mid} = \frac{x_l + x_r}{2}$ and evaluate $f(x_{mid})$.
2. If $f(x_{mid}) = 0$, then $x^* = x_{mid}$ and we're done.
3. Otherwise:
   (a) If $f(x_{mid}) < 0$, then set $x_l = x_{mid}$.
   (b) If $f(x_{mid}) > 0$, then set $x_r = x_{mid}$.
4. If $x_r - x_l < \epsilon$, we're done; otherwise, go to step 1.
The algorithm above divides the search interval in half at every iteration; thus, to approximate $x^*$ to within $\epsilon$ we require at most $\log_2\left(\frac{x_r - x_l}{\epsilon}\right)$ iterations.
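A minimal Python sketch of this bisection procedure, assuming $f$ is continuous with $f(x_l) < 0 < f(x_r)$; the function name bisection and the default tolerance are illustrative, not from the lecture.

```python
# A minimal sketch of the bisection root-finding steps above, assuming
# f is continuous with f(x_l) < 0 < f(x_r) on the starting bracket.
def bisection(f, x_l, x_r, eps=1e-8):
    while x_r - x_l >= eps:
        x_mid = (x_l + x_r) / 2.0
        f_mid = f(x_mid)
        if f_mid == 0.0:          # exact root found
            return x_mid
        elif f_mid < 0.0:         # root lies in [x_mid, x_r]
            x_l = x_mid
        else:                     # root lies in [x_l, x_mid]
            x_r = x_mid
    return (x_l + x_r) / 2.0

# Example: the root of f(x) = x^2 - 2 on [0, 2] is sqrt(2).
print(bisection(lambda x: x**2 - 2, 0.0, 2.0))
```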

Golden section search: Consider a unimodal function $f(x)$ defined on an interval $[x_l, x_r]$. Unimodal: $f(x)$ has only one local minimizer $x^*$ in $[x_l, x_r]$. How can we find $x^*$ (or a point within $\epsilon$ of $x^*$)? Hint: we can do this without derivatives. Hint: we need to sample two points $\bar{x}_l, \bar{x}_r$ in $[x_l, x_r]$.

Golden section search: Assume without loss of generality that $x_l = 0$ and $x_r = 1$; set $\psi = \frac{3 - \sqrt{5}}{2}$.
1. Set $\bar{x}_l = \psi$ and $\bar{x}_r = 1 - \psi$.
2. If $f(\bar{x}_l) < f(\bar{x}_r)$, then the minimizer must lie in the interval $[x_l, \bar{x}_r]$, so set $x_r = \bar{x}_r$.
3. Otherwise, the minimizer must lie in the interval $[\bar{x}_l, x_r]$, so set $x_l = \bar{x}_l$.
4. If $x_r - x_l < \epsilon$, we're done; otherwise, go to step 1.
By setting $\psi = \frac{3 - \sqrt{5}}{2}$ we decrease the search interval by a constant factor $1 - \psi \approx 0.618$ at every iteration.
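The following Python sketch implements the procedure above on a general interval $[x_l, x_r]$; the name golden_section and the default tolerance are illustrative. For clarity it re-evaluates $f$ at both interior points every iteration, whereas a more careful implementation reuses one of the two function values.

```python
import math

# A minimal sketch of golden section search for a unimodal f on [x_l, x_r].
def golden_section(f, x_l, x_r, eps=1e-8):
    psi = (3.0 - math.sqrt(5.0)) / 2.0        # psi ~ 0.382, so 1 - psi ~ 0.618
    while x_r - x_l >= eps:
        xl_bar = x_l + psi * (x_r - x_l)      # interior points of the current interval
        xr_bar = x_l + (1.0 - psi) * (x_r - x_l)
        if f(xl_bar) < f(xr_bar):             # minimizer lies in [x_l, xr_bar]
            x_r = xr_bar
        else:                                 # minimizer lies in [xl_bar, x_r]
            x_l = xl_bar
    return (x_l + x_r) / 2.0

# Example: f(x) = (x - 1)^2 is unimodal on [0, 3] with minimizer x* = 1.
print(golden_section(lambda x: (x - 1.0)**2, 0.0, 3.0))
```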

Line search: step length. Consider the multi-dimensional problem: minimize $f(x)$ for $x \in \mathbb{R}^n$. At each iterate $x_k$ we set $d_k = -\nabla f(x_k)$ and set $x_{k+1} = x_k + \alpha_k d_k$ for an appropriately chosen $\alpha_k$. Ideally, we would like $\alpha_k$ to be the minimizer of the univariate function $\phi(\alpha) := f(x_k + \alpha d_k)$, but this is time-consuming. In the big picture, we want $\alpha_k$ to give us a sufficient reduction in $f(x)$ without spending too much time on it. Two conditions we can impose are the Wolfe and Goldstein conditions.

Armijo condition: Clearly the step length $\alpha_k$ should guarantee a sufficient decrease in $f(x)$, so we require
$$\phi(\alpha) = f(x_k + \alpha d_k) \leq f(x_k) + c_1 \alpha \nabla f(x_k)^T d_k$$
with $c_1 \in (0, 1)$. The right-hand side is linear in $\alpha$. Note that this is satisfied for all $\alpha$ that are sufficiently small. In practice, we often set $c_1 \approx 10^{-4}$.
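The lecture states the Armijo condition itself; a common way to enforce it in practice (not spelled out on this slide) is backtracking: start from a trial step and shrink it until the condition holds. A Python sketch, with the names backtracking_armijo and beta chosen for illustration:

```python
import numpy as np

# Backtracking scheme for the Armijo sufficient-decrease condition.
# The halving factor beta and the starting step alpha0 are illustrative choices.
def backtracking_armijo(f, grad_f, x_k, d_k, c1=1e-4, beta=0.5, alpha0=1.0):
    alpha = alpha0
    fx = f(x_k)
    slope = grad_f(x_k) @ d_k          # directional derivative phi'(0); negative for a descent direction
    while f(x_k + alpha * d_k) > fx + c1 * alpha * slope:
        alpha *= beta                  # shrink the step until sufficient decrease holds
    return alpha

# Example on f(x) = ||x||^2 with the steepest descent direction d = -grad f(x).
f = lambda x: x @ x
grad_f = lambda x: 2 * x
x = np.array([3.0, -4.0])
print(backtracking_armijo(f, grad_f, x, -grad_f(x)))
```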

Curvature condition: The preceding condition is not sufficient on its own, because an arbitrarily small $\alpha$ satisfies it, which means that $\{x_k\}$ may not converge to a minimizer. One way to get around this is to impose the additional condition
$$\phi'(\alpha) = \nabla f(x_k + \alpha d_k)^T d_k \geq c_2 \nabla f(x_k)^T d_k$$
where $c_2 \in (c_1, 1)$. This condition just says that the slope of $\phi$ at $\alpha$ has to be at least $c_2$ times the slope at $0$. Typically we choose $c_2 \approx 0.9$. If the slope at $\alpha$ were still strongly negative, it would mean that our step size wasn't chosen very well (we could continue in that direction and decrease the function further). The Armijo condition and the curvature condition, when combined, are called the Wolfe conditions.
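For illustration (not from the slides), here is a small helper that checks whether a trial step length satisfies both Wolfe conditions; the function name wolfe_conditions_hold and the example function are assumptions.

```python
import numpy as np

# Check whether a trial step length alpha satisfies the Wolfe conditions
# (Armijo sufficient decrease + curvature) along a descent direction d_k.
def wolfe_conditions_hold(f, grad_f, x_k, d_k, alpha, c1=1e-4, c2=0.9):
    slope0 = grad_f(x_k) @ d_k                                        # phi'(0)
    armijo = f(x_k + alpha * d_k) <= f(x_k) + c1 * alpha * slope0
    curvature = grad_f(x_k + alpha * d_k) @ d_k >= c2 * slope0
    return armijo and curvature

# Example: check a half step on f(x) = ||x||^2 from x = (1, 1) along -grad f.
f = lambda x: x @ x
grad_f = lambda x: 2 * x
x = np.array([1.0, 1.0])
print(wolfe_conditions_hold(f, grad_f, x, -grad_f(x), alpha=0.5))
```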

Goldstein conditions: An alternative to the Wolfe conditions is the Goldstein conditions:
$$f(x_k) + (1 - c)\,\alpha \nabla f(x_k)^T d_k \leq f(x_k + \alpha d_k) \leq f(x_k) + c\,\alpha \nabla f(x_k)^T d_k$$
with $c \in (0, 1/2)$. The second inequality is just the sufficient decrease condition. The first inequality bounds the step length from below. One disadvantage is that the local minimizers of $\phi(\alpha)$ may be excluded by this search.

Steepest (gradient) descent example: Recall that in the method of steepest descent, we set $d_k = -\nabla f(x_k)$. Consider the case where we want to minimize $f(x) = c^T x + \frac{1}{2} x^T Q x$, where $Q$ is a symmetric positive definite matrix. Clearly, the unique minimizer lies where $\nabla f(x^*) = 0$, which occurs precisely when $Q x^* = -c$. The descent direction will be $d = -\nabla f(x) = -(c + Q x)$.

Steepest descent example: The iteration scheme $x_{k+1} = x_k + \alpha_k d_k$ is given by $x_{k+1} = x_k - \alpha_k (c + Q x_k)$. We need to choose a step size $\alpha_k$, so we consider $\phi(\alpha) = f(x_k - \alpha(c + Q x_k))$.

Steepest descent example: Note that we don't even need the Wolfe or Goldstein conditions, as we can find the optimal $\alpha$ analytically:
$$\phi(\alpha) = f(x_k - \alpha(c + Q x_k)) = c^T(x_k - \alpha(c + Q x_k)) + \frac{1}{2}(x_k - \alpha(c + Q x_k))^T Q (x_k - \alpha(c + Q x_k))$$
Since $\phi(\alpha)$ is a strictly convex quadratic function of $\alpha$, it is not hard to see that its minimizer occurs where
$$c^T d_k + x_k^T Q d_k + \alpha\, d_k^T Q d_k = 0$$
and thus, with $d_k = -(c + Q x_k)$, we set
$$\alpha_k = \frac{d_k^T d_k}{d_k^T Q d_k}$$

Steepest descent example: The recursion for the steepest descent method is therefore
$$x_{k+1} = x_k + \left(\frac{d_k^T d_k}{d_k^T Q d_k}\right) d_k, \qquad d_k = -(c + Q x_k)$$
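Putting the pieces together, here is a minimal Python sketch of steepest descent with this exact step size for the quadratic $f(x) = c^T x + \frac{1}{2} x^T Q x$; the function name, stopping rule, and example data are illustrative.

```python
import numpy as np

# Steepest descent with the exact step size alpha_k = (d^T d) / (d^T Q d)
# for f(x) = c^T x + (1/2) x^T Q x with Q symmetric positive definite.
def steepest_descent_quadratic(Q, c, x0, tol=1e-10, max_iter=10000):
    x = x0.astype(float)
    for _ in range(max_iter):
        d = -(c + Q @ x)                 # steepest descent direction -grad f(x)
        if np.linalg.norm(d) < tol:      # gradient (almost) zero: stationary point
            break
        alpha = (d @ d) / (d @ Q @ d)    # exact minimizer of phi(alpha)
        x = x + alpha * d
    return x

# Example: Q = [[2, 0], [0, 10]], c = [-2, -10]; the minimizer solves Qx = -c, i.e. x* = (1, 1).
Q = np.array([[2.0, 0.0], [0.0, 10.0]])
c = np.array([-2.0, -10.0])
print(steepest_descent_quadratic(Q, c, np.zeros(2)))
```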

Convergence of steepest descent: Theorem. Let $f(x)$ be a given continuously differentiable function. Let $x_0 \in \mathbb{R}^n$ be a point for which the sub-level set $X_0 = \{x \in \mathbb{R}^n : f(x) \leq f(x_0)\}$ is bounded. Let $\{x_k\}$ be a sequence of points generated by the steepest descent method initiated at $x_0$, using either the Wolfe or Goldstein line search conditions. Then $\{x_k\}$ converges to a stationary point of $f(x)$. The above theorem gives what is called the global convergence property of the steepest descent method: no matter how far away $x_0$ is, the method must converge to a stationary point. The steepest descent method may, however, be very slow to reach that point.