3E4: Modelling Choice. Introduction to nonlinear programming. Announcements


3E4: Modelling Choice
Lecture 7: Introduction to nonlinear programming

Announcements
- Solutions to the Lecture 4-6 homework will be available from http://www.eng.cam.ac.uk/~dr241/3e4
- Looking ahead to Lecture 8: please email me any particularly hard questions for the review part of the lecture. I'll hand out a copy of a sample exam paper.

3E4 Lecture Outline
- Lecture 1. Management Science & Optimisation Modelling: Linear Programming
- Lecture 2. LP: Spreadsheets and the Simplex Method
- Lecture 3. LP: Sensitivity & shadow prices; reduced cost & shadow price formulae
- Lecture 4. Integer LP: branch & bound
- Lecture 5. Network flow problems
- Lecture 6. Multiobjective LP
- Lectures 7-8. Introduction to nonlinear programming

A Simple Nonlinear Program
The Economic Order Quantity (EOQ) problem for managing inventory:
- involves determining the optimal quantity to purchase when orders are placed
- deals with the trade-off between carrying (holding) cost and ordering cost
Small orders result in low inventory levels & carrying costs, but frequent orders & higher ordering costs. Large orders result in higher inventory levels & carrying costs, but infrequent orders & lower ordering costs.

[Figure: sample inventory profiles over 12 months. With annual usage 150 and order size 50, there are 3 orders per year and average inventory is 25; with order size 25, there are 6 orders per year and average inventory is 12.5.]

The Standard EOQ Model
Data:
- D = annual demand for the item
- C = unit purchase cost for the item
- S = fixed cost of placing an order
- i = cost of holding inventory for a year (expressed as a percentage of C)
Variable:
- Q = order quantity
Minimise annual cost: MIN DC + (D/Q)S + (Q/2)iC

[Figure: EOQ cost relationships - ordering cost falls and carrying cost rises with order quantity; their sum (total cost) is minimised at the EOQ.]

Variations of the EOQ Model
An A-level calculus exercise shows that the optimal order quantity Q is given by
  Q* = sqrt(2DS / (Ci))
More realistic variations on the basic EOQ model need to account for:
- quantity discounts
- storage restrictions
- etc.
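The closed-form optimum can be checked numerically against the cost function. A minimal sketch (the data values D, C, S, i below are illustrative assumptions, not from the lecture):

```python
import math

# EOQ total annual cost from the model above: T(Q) = DC + (D/Q)S + (Q/2)iC
# Illustrative data (assumed for this sketch, not from the lecture):
D = 150.0   # annual demand
C = 10.0    # unit purchase cost
S = 20.0    # fixed cost per order
i = 0.25    # annual holding rate as a fraction of C

def total_cost(Q):
    return D * C + (D / Q) * S + (Q / 2) * i * C

# Closed-form optimum from setting dT/dQ = 0: Q* = sqrt(2DS / (Ci))
Q_star = math.sqrt(2 * D * S / (C * i))

# Q* should beat nearby order quantities
assert total_cost(Q_star) < total_cost(0.9 * Q_star)
assert total_cost(Q_star) < total_cost(1.1 * Q_star)
print(round(Q_star, 2))
```

For this data Q* comes out near 49 units per order; the assertions confirm that perturbing the order size in either direction raises total cost.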

Linear Functions
A linear (or, more precisely, affine) function has the form
  f(x1, ..., xn) = a1 x1 + ... + an xn + c
where a1, ..., an and c are data and x1, ..., xn are variables. In spreadsheets, you are not leaving the realm of linear formulas if you multiply linear formulas by data or constants, or add them up.

Nonlinear Functions
You may be leaving the realm of linear formulas if you:
- multiply two formulas together, or divide one by another
- use the MAX, MIN, or ABS functions
- use the IF or ROUND functions
If you do any of the above, your formula represents a nonlinear function.

Levels of Nonlinearity
Differentiable functions:
- If f(x1, ..., xn) and g_i(y1, ..., ym) for i = 1, ..., n are differentiable, then f(g1(y), ..., gn(y)) is a differentiable function of y.
- You are not leaving the realm of differentiable functions if you apply LN, EXP, ^, SIN, etc. to differentiable formulas.
- Typically, "nonlinear" means differentiable (so LP is a subclass of NLP).
Non-differentiable functions:
- You may leave the realm of differentiable functions if you use MAX, MIN, ABS.
Discontinuous functions:
- You may leave the realm of continuous functions if you use IF, ROUND, CEILING, FLOOR, INT, LOOKUP.

Nonlinear Programming (NLP)
NLP problems have a nonlinear objective function and/or one or more nonlinear constraints. NLP problems are formulated in virtually the same way as linear problems, but the mathematics involved in solving general NLPs is quite different from that for LPs. Solver tends to hide this difference, but it is important to understand the difficulties that may be encountered when solving NLPs.

Solver on Simple NLPs
- Min cos(πx), starting from x = 0.1, x = 0, x = -0.1
- Min (x - 1)², unconstrained, or subject to x >= 0

General Form of an Optimisation Problem
MAX (or MIN): f(X1, X2, ..., Xn)
Subject to:
  g1(X1, X2, ..., Xn) <= b1
  ...
  gk(X1, X2, ..., Xn) >= bk
  ...
  gm(X1, X2, ..., Xn) = bm
Here n and m are fixed dimensions: n = number of variables (Xi), m = number of constraints. Each of f, g1, ..., gm is a function. If at least one of these functions is nonlinear, the problem is a Nonlinear Program (NLP).

Unconstrained Optimization
The unconstrained minimisation problem:
  min f(x) over x ∈ R^n
where f : R^n → R.
- x* is a global minimum if f(x*) <= f(x) for all x ∈ R^n.
- x* is a local minimum if there exists an open neighbourhood B(x*) of x* such that f(x*) <= f(x) for all x ∈ B(x*).

Unconstrained Optimization: Optimality Conditions
A vector d ∈ R^n is a descent direction of f at x* if there exists α* > 0 such that f(x* + αd) < f(x*) for all α ∈ (0, α*).
First order necessary condition: if x* is a local minimum then (there is no descent direction at x*) ∇f(x*) = 0.
Second order necessary condition: if x* is a local minimum then ∇f(x*) = 0 and xᵀ∇²f(x*)x >= 0 for all x ∈ R^n.

Unconstrained Optimization: Optimality Conditions
Any point x ∈ R^n satisfying the first order optimality condition is called stationary.
Second order sufficient condition: if x* is a stationary point and xᵀ∇²f(x*)x > 0 for all x ∈ R^n with x ≠ 0, then x* is a strict local minimum.

Typical Optimisation Procedure
Aim at seeking stationary points, based on the idea of steepest descent for minimising an unconstrained nonlinear function:
- At a given feasible point, find a direction along which you can improve your objective value (search direction).
- Then search along that direction (line search) until you find a better point, and re-iterate from this point.
- BUT if that direction leads out of the feasible set, then you must correct it (reduced direction).
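These conditions can be sanity-checked numerically with finite differences. A sketch for the one-variable example f(x) = (x - 1)² from the Solver slide above (the difference step sizes and tolerances are my assumptions):

```python
# Numerical check of the first and second order conditions for
# f(x) = (x - 1)^2. Central differences approximate f'(x) and f''(x).

def f(x):
    return (x - 1.0) ** 2

def fprime(x, h=1e-6):
    # central difference for the first derivative
    return (f(x + h) - f(x - h)) / (2 * h)

def fsecond(x, h=1e-4):
    # central difference for the second derivative
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

x_star = 1.0
# First order necessary condition: the derivative vanishes at x*
assert abs(fprime(x_star)) < 1e-8
# Second order sufficient condition: f''(x*) > 0, so x* is a strict local minimum
assert fsecond(x_star) > 0.0
```

Here f''(x*) evaluates to about 2, matching the exact second derivative of (x - 1)², so the stationary point x* = 1 is a strict local (in fact global) minimum.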

Steepest Descent for Unconstrained Optimization
Let f be a continuously differentiable function of several variables, f : R^n → R. Consider the unconstrained problem: min f(x).
At a given point x, the steepest descent direction is d = -∇f(x), where the gradient of f at x is the vector of partial derivatives:
  ∇f(x) = (∂f/∂x1, ..., ∂f/∂xn) ∈ R^n
To decrease f, search along this direction: find a step size t > 0 such that f(x + td) < f(x). Replace x by x + td and repeat with the new steepest descent direction.

In-Class Exercise: Steepest Descent Directions
What is the steepest descent direction for
1. cos(πx) at x = 0?
2. x1/(1 + x2²) at x = (1, 0)?
3. 5x1 - 2x2 + x3?
For case 1 above, perform 2 steepest descent steps starting from x = 1/2 with unit stepsize (t = 1).

Steepest Descent Method for Unconstrained Optimization
Consider the unconstrained problem: min f(x).
Initially:
- Require a starting vector x_0.
- Set iteration counter k = 0.
At iteration k:
- Descent direction: let d = -∇f(x_k).
- Linesearch: find a step size t = t_k > 0 such that f(x_k + td) < f(x_k).
- Update: x_{k+1} = x_k + t_k d, k = k + 1, and repeat from above.

What does the Steepest Descent Method achieve?
Steepest Descent Convergence Theorem. Given an initial iterate (vector) x_0 and iteration counter k = 0: if f is bounded below and the step size t_k > 0 is properly chosen for each iterate x_k, then the steepest descent method produces an infinite sequence {x_k} where
- ∇f(x_k) → 0 (the zero vector) on some subsequence of {x_k};
- if any subsequence of {x_k} converges to a vector x* (a limit point), then ∇f(x*) = 0.
The linesearch must not choose t_k too large or too small. Linesearch methods will not be covered here.
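The loop above can be sketched in a few lines on the lecture's one-variable example cos(πx). The backtracking linesearch is my assumption; the lecture leaves the choice of t_k unspecified:

```python
import math

# Steepest descent on f(x) = cos(pi * x), the lecture's example.
# A simple backtracking linesearch stands in for "properly chosen" t_k.

def f(x):
    return math.cos(math.pi * x)

def grad(x):
    return -math.pi * math.sin(math.pi * x)

def steepest_descent(x, iters=50):
    for _ in range(iters):
        d = -grad(x)                 # steepest descent direction
        t = 1.0
        # halve t until the step actually decreases f
        while f(x + t * d) >= f(x) and t > 1e-12:
            t *= 0.5
        x = x + t * d
    return x

x_star = steepest_descent(0.1)
# cos(pi*x) is minimised at odd integers; starting near 0 the iterates
# approach the stationary point x = 1, where grad is zero.
```

Note that starting exactly at x = 0, where the gradient is zero, the method cannot move at all: it stalls at a stationary point that is a maximum, not a minimum.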

Steepest Descent Method for Unconstrained Optimization
That is, the steepest descent method tends to find stationary points of f. Recall that all minimisers of f are stationary points.
What happens if ∇f(x_k) = 0 (the zero vector) for some iteration k?

Constrained Optimization
The constrained minimisation problem:
  min f(x) over x ∈ R^n
  subject to g(x) <= 0, h(x) = 0
where f : R^n → R, g : R^n → R^m, h : R^n → R^p.
- x* ∈ F = {x ∈ R^n : g(x) <= 0 and h(x) = 0} is a global minimum if f(x*) <= f(x) for all x ∈ F.
- x* ∈ F is a local minimum if there exists an open neighbourhood B(x*) of x* such that f(x*) <= f(x) for all x ∈ F ∩ B(x*).

Constrained Optimization: Optimality Conditions
We need to define feasible descent directions at any point x ∈ F. We will see this next lecture! Let's focus first on the practical side.

Constrained Optimization: Optimality Conditions and Descent Methods
Any local minimum satisfies certain necessary conditions, called stationarity conditions. Any point satisfying the stationarity conditions is called stationary. Iterative (descent) methods look for stationary points.

Constrained Steepest Descent: Projected Gradient Method
Let f : R^n → R be continuously differentiable. Consider the problem with LP constraints:
  min f(x) subject to Ax = b, x >= 0
The reduced gradient or projected gradient method is just a version of steepest descent for optimization problems with constraints.
Given a feasible x, let d = -∇f(x).
Motivation: we want to decrease f by moving along the ray x + td, where t starts at 0 and increases.
Problem: x + td may be infeasible.

Constrained Steepest Descent: Projected Gradient Method
The solution is to convert any point of the ray x + td to a feasible point by projection onto the feasible set: for any step size t > 0, find the nearest point y(t) in the feasible set (a polyhedron) to x + td. Of course, y(t) is feasible.

[Figure: projected gradient method on a polyhedral feasible region - negative gradients of the objective, decreasing level curves, the starting point, a first iterate along the reduced gradient, and the optimum.]

Constrained Steepest Descent: Projected Gradient Method - In-Class Exercise
Let the feasible set be the box 0 <= x <= 1 in R², x = (1, 0), and f(x) = 5x1 - 2x2.
1. Sketch the steepest descent ray x + td and the projected gradient path y(t).
2. Give a formula (or formulae) for y(t).

Why does the Projected Gradient Method Work?
Result I (Projected Gradient Descent): either y(t) = x for all t > 0, or f(y(t)) < f(x) for small t > 0.
Result II (Projected Gradient at a Minimum): if x is a local minimum then y(t) = x for all t > 0.
These results say why the projected gradient path y(t) is useful. A useful by-product:
Definition: a point x is stationary for our constrained minimisation problem if y(t) = x for all t > 0.

Formally: Projected Gradient Method
Consider the problem with LP constraints: min f(x) subject to Ax = b, x >= 0.
Initially:
- Require a feasible starting vector x_0.
- Set iteration counter k = 0.
At iteration k:
- Descent direction: let d_k = -∇f(x_k).
- Projected gradient linesearch: find a step size t = t_k > 0 such that f(y_k(t)) < f(x_k), where y_k(t) is the projection of x_k + t d_k onto the feasible set.
- Update: x_{k+1} = y_k(t_k), k = k + 1, and repeat from above.
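For the box-constrained in-class exercise, the projection is just componentwise clipping, so the method is easy to sketch. The fixed stepsize t = 1/3 follows the homework; this sketch skips the linesearch:

```python
# Projected gradient method on the in-class exercise:
# minimise f(x) = 5*x1 - 2*x2 over the box 0 <= x <= 1 in R^2.
# Projection onto a box is componentwise clipping.

GRAD = (5.0, -2.0)   # gradient of a linear objective is constant

def f(x):
    return 5.0 * x[0] - 2.0 * x[1]

def project(v):
    # nearest point of the box [0,1]^2 to v
    return tuple(min(max(vi, 0.0), 1.0) for vi in v)

def projected_gradient(x, t=1.0 / 3.0, iters=10):
    for _ in range(iters):
        step = tuple(xi - t * gi for xi, gi in zip(x, GRAD))  # move along -grad
        x = project(step)                                      # back onto the box
    return x

x_final = projected_gradient((1.0, 0.0))
# From (1, 0) the iterates move to (0, 2/3) and then to (0, 1),
# where y(t) = x for all t > 0: a stationary (here, optimal) point.
```

This reproduces the behaviour in Results I and II: the objective strictly decreases until the iterate reaches a point the projection no longer moves.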

What does the Projected Gradient Method achieve?
It is a direct extension of steepest descent.
Projected Gradient Convergence Theorem. Given an initial iterate (vector) x_0 and iteration counter k = 0: if f is bounded below and the step size t_k > 0 is properly chosen for each iterate x_k, then the projected gradient method produces an infinite sequence {x_k} where
- y_k(t_k) - x_k → 0 (the zero vector) on some subsequence of {x_k};
- if any subsequence of {x_k} converges to a vector x*, then x* is (feasible and) stationary for the constrained problem.

Summary of the Projected Gradient Method
For min f(x) subject to Ax = b, x >= 0, the reduced gradient or projected gradient method:
- Given a feasible x, let d = -∇f(x).
- For any step size t > 0, find the nearest point y(t) in the feasible set (polyhedron) to x + td.
Fact: either f(y(t)) < f(x) for all small enough t > 0, or y(t) = x for all t > 0, in which case we say x is a stationary point of the above problem.
Note that all minimisers of the above problem are stationary, so the projected gradient method makes sense: one can show that limit points of the iteration sequence are stationary.

Doing it in Excel: Solver
Solver uses a version of the projected gradient method, called the Generalized Reduced Gradient (GRG) algorithm, to solve NLPs. GRG will attempt to track nonlinear feasible sets, which is generally much more difficult than dealing with LP feasible sets. GRG can also be used on LPs (if you don't select "assume linear model") but is generally slower than the Simplex method.

Starting Points
The local solution produced depends on the starting point. If you think there may be a better point, try another starting point. Instead of assigning arbitrary values to the variable cells, start with values that are representative of those you expect in the solution. Start close to where you expect the optimum to be, if that information is available.

A Note on Convergence
In contrast to the simplex method, NLP methods often produce an infinite sequence of iteration points that converges to a (local) optimum. The sequence needs to be stopped! In Excel 8.0 and beyond, the Convergence field in the Solver Options dialog box can be adjusted:
- increase it to avoid squillions of iterations
- decrease it to avoid early stopping at sub-optimal solutions

Sensitivity Analysis
The sensitivity report provides:
- Reduced gradients
- Lagrange multipliers
The reduced gradient information is the equivalent of reduced costs in LP. Lagrange multipliers are the equivalent of shadow prices in LP.

Lagrange Multipliers
A Lagrange multiplier is the amount by which the objective function would improve if the RHS of the constraint were relaxed by one unit. For equality constraints: the amount by which the objective function would improve if the RHS of the constraint were increased by one unit.
Again: for NLP this is derivative information, and therefore it is not possible to give a range of RHS values over which the Lagrange multiplier (shadow price) is fixed.

Lagrange Multipliers & Shadow Prices I
Suppose x* is a stationary point of the NLP
  min f(x) subject to Ax = b, x >= 0.
Then x* solves the LP
  min ∇f(x*)ᵀx subject to Ax = b, x >= 0.
The Lagrange multiplier for the NLP at x* coincides with the shadow cost for the LP (if one is unique then so is the other).

Lagrange Multipliers & Shadow Prices II
Suppose x* is a stationary point of the NLP
  min f(x) subject to h(x) = 0, x >= 0
where f, h : R^n → R (a single equality constraint). Define
  c = ∇f(x*), a = ∇h(x*), b = ∇h(x*)ᵀx* (∈ R).
Then x* solves the following LP with a single equality constraint:
  min cᵀx subject to aᵀx = b, x >= 0.
The Lagrange multiplier for the NLP at x* coincides with the shadow cost for the LP.
Technical requirement: ∂f(x*)/∂x_i ≠ 0 for some nonzero component x_i of x*.

Lagrange Multipliers & Shadow Prices III
Suppose x* is a stationary point of the NLP
  min f(x) subject to h(x) = 0, x >= 0
where h : R^n → R^m (m equality constraints), i.e. h(x) = (h1(x), ..., hm(x)).
Q: How do we define the associated LP at x*?
A: Define c = ∇f(x*), A = the matrix with m rows ∇h1(x*)ᵀ, ..., ∇hm(x*)ᵀ, and b = ∇h(x*)ᵀx* (∈ R^m).
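The recipe in slides II-III can be traced on a small hypothetical example (deliberately not the homework problem): min x1² + x2² subject to x1 + x2 = 2, x >= 0, whose stationary point x* = (1, 1) has Lagrange multiplier λ = 2.

```python
# Tracing the slide's LP construction on a hypothetical single-constraint NLP:
#   minimise f(x) = x1^2 + x2^2  subject to  h(x) = x1 + x2 - 2 = 0, x >= 0.
# At the stationary point x* = (1, 1), grad f = lambda * grad h gives lambda = 2.

x_star = (1.0, 1.0)

c = (2.0 * x_star[0], 2.0 * x_star[1])      # c = grad f(x*) = (2, 2)
a = (1.0, 1.0)                               # a = grad h(x*)
b = a[0] * x_star[0] + a[1] * x_star[1]      # b = grad h(x*)^T x* = 2

# Associated LP: min c^T x subject to a^T x = b, x >= 0.
# Here every feasible point is optimal and the optimal value is 2*b,
# so the shadow cost d(opt)/db equals 2 = the NLP's Lagrange multiplier.
def lp_opt_value(rhs):
    return 2.0 * rhs

shadow_price = lp_opt_value(b + 1.0) - lp_opt_value(b)  # unit RHS increase
assert shadow_price == 2.0
```

The point of the exercise is that the shadow cost of the linearised LP, not the original nonlinear data, is what reproduces the NLP's Lagrange multiplier at x*.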

Automatic Scaling
Use the automatic scaling option, since poor scaling can result in breakdown of the method, particularly in NLP. For automatic scaling to work properly, it is important that your initial variables have typical values.

Solver Message 1
"Solver found a solution. All constraints and optimality conditions are satisfied."
This means Solver found a stationary solution at which the necessary optimality conditions are satisfied. In most cases such stationary solutions will be locally optimal, but there is no guarantee, and it is not at all guaranteed that the solution is globally optimal. Run Solver from several different starting points to increase the chances of finding the global optimal solution to your problem.

Solver Messages 2-4 relate to various difficulties. We will not go over these; see the next three slides for descriptions.

Solver Message 2
"Solver has converged to the current solution. All constraints are satisfied."
This means the objective function value changed very slowly over the last few iterations. If you suspect the solution is not locally optimal, your problem may be poorly scaled.

Solver Message 3
"Solver cannot improve the current solution. All constraints are satisfied."
This rare message means that your model is degenerate and Solver is cycling. Degeneracy can sometimes be eliminated by removing redundant constraints from a model.

Solver Message 4
"Solver could not find a feasible solution."
In NLP this can occur even for feasible problems: Solver is not able to reduce the sum of infeasibilities. Possible remedies: try another starting point, enable automatic scaling, or decrease the convergence tolerance.

Lecture 7 3E4 Homework
1. Find all stationary points of
  a) cos(πx)
  b) x1/(1 + x2²)
  c) 5x1 - 2x2 + x3
2. What is the steepest descent direction for 5x1 - 2x2 + x3?
3. Perform two steepest descent steps on the function cos(πx) starting from 1/2 with unit stepsize (t = 1), i.e. take x_0 = 1/2 and calculate x_1 and x_2.
4. Suppose we want to minimise f(x) = 5x1 - 2x2 subject to x in the box 0 <= x <= 1 in R². Starting from x_0 = (1, 0), take two steps of the projected gradient method with stepsize t = 1/3. Show x_0, x_1, x_2 on a sketch of the feasible set.

5. Consider minimising 5x1 - 2x2 + x3 subject to nonnegative variables and x1² + x2² + x3² = 4.
  a) Verify that x* = (0, 2, 0) is optimal for the LP associated with this point.
  b) Find the Lagrange multiplier at x* by calculating the shadow price of the LP in part (a) at this point.
  c) How will the optimal value of the NLP change if the RHS value is changed from 4 to 4.5?
  d) Check your answer to part (b) using the Excel sensitivity report.
  e) Check your answer to part (c) by re-solving with Excel using RHS 4.5. Is there a difference? Comment briefly on this.

Hints for Q5, parts (a) and (b): write down the linear program corresponding to x*, as described previously. Rewrite the LP as a max problem and analyse it as described in Lecture 3:
- find the reduced cost of x* to show optimality
- find the shadow cost (How is the shadow cost of the max LP related to the shadow cost of the min LP, and hence to the Lagrange multiplier of the NLP at x*?)