SEEM 3470: Dynamic Optimization and Applications, 2013-14 Second Term
Handout 1: Introduction to Dynamic Programming
Instructor: Shiqian Ma
January 6, 2014

Suggested Reading: Sections 1.1-1.5 of Chapter I of Richard Bellman, Dynamic Programming, Dover Publications, Inc., 2003. Also review material from SEEM 3440: Operations Research II.

1 Dynamic Programming: Introduction and Examples

Operations Research: a science about decision making.

- Operations: activities carried out in an organization, related to the attainment of its goals; decision making among different options (example: shortest path).
- Research: scientific methods to study the operations.
- Operations Research: develop scientific methods to help people make decisions about activities so as to achieve a specific objective.

Two features:
- Decision making: which path?
- Achieving some objective, e.g., maximize profits or minimize costs.

Deterministic model: all information and data are deterministic. Example: produce chairs and tables using two materials.

Stochastic model: some information and data are stochastic. Example: the lifespan of a USB drive is random; when should I replace it?

Where is operations research used?
- Airlines: scheduling aircraft and crews (minimum number of crews).
- Logistics and supply chain: inventory (how many units to order, given demand, ordering cost, and inventory cost).
- Revenue management: pricing (a retailer selects products to display).
- Financial industry: portfolio selection, asset allocation.
- Civil engineering: traffic analysis and transportation system design (the routes and frequency of buses, emergency evacuation systems).

Dynamic programming: multi-stage optimization, in which we take advantage of the new information in each stage to make a new decision. Examples:

- Scheduling (shortest path)
- Inventory control
- Two-game chess match
- Machine replacement

2 Basic Terminologies in Optimization

An optimization problem typically takes the form

$$\text{minimize } f(x) \quad \text{subject to } x \in X. \qquad \text{(P)}$$

Here, $f : \mathbb{R}^n \to \mathbb{R}$ is called the objective function, and $X \subseteq \mathbb{R}^n$ is called the feasible region. Thus, $x = (x_1, \ldots, x_n)$ is an $n$-dimensional vector, and we shall agree that it is represented in column form; in other words, we treat $x$ as an $n \times 1$ matrix. The entries $x_1, \ldots, x_n$ are called the decision variables of (P). If $X = \mathbb{R}^n$, then (P) is called an unconstrained optimization problem; otherwise, it is called a constrained optimization problem.

As the above formulation suggests, we are interested in an optimal solution to (P), which is defined as a point $x^* \in X$ such that $f(x^*) \le f(x)$ for all $x \in X$. We call $f(x^*)$ the optimal value of (P).

To illustrate the above concepts, let us consider the following example:

Example 1 Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is given by $f(x_1, x_2) = x_1^2 + 2x_2^2$, and $X = \{(x_1, x_2) \in \mathbb{R}^2 : 0 \le x_1 \le 1,\ 1 \le x_2 \le 3\}$. Then, we can write (P) as

$$\text{(P)} \qquad \text{minimize } x_1^2 + 2x_2^2 \quad \text{subject to } 0 \le x_1 \le 1,\ 1 \le x_2 \le 3.$$

This is a constrained optimization problem, and it is easy to verify that $f(x_1, x_2) \ge f(0, 1) = 2$ for all $(x_1, x_2) \in X$. Thus, we say that $(0, 1)$ is an optimal solution to (P), and $f(0, 1) = 2$ is the optimal value. It is worth computing the derivative of $f$ at $(0, 1)$:

$$\nabla f(x_1, x_2) = \begin{bmatrix} 2x_1 \\ 4x_2 \end{bmatrix}, \quad \text{so that} \quad \nabla f(0, 1) = \begin{bmatrix} 0 \\ 4 \end{bmatrix}.$$

This shows that for a constrained optimization problem, the derivative at the optimal solution need not be zero.

The different structures of $f$ and $X$ in (P) give rise to different classes of optimization problems. Some important classes include:

1. discrete optimization problems, when the set $X$ consists of countably many points;

2. linear optimization problems, when $f$ takes the form $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$ for some given $a_1, \ldots, a_n$, and $X$ is a set defined by linear inequalities;

3. nonlinear optimization problems, when $f$ is nonlinear or $X$ cannot be defined by linear inequalities alone;

4. stochastic optimization problems, where $f$ takes the form $f(x) = \mathbb{E}_Z[F(x, Z)]$, where $Z$ is a random parameter.

To illustrate the above concepts, let us consider the following problem, which will serve as our running example:

Resource Allocation Problem. Suppose that we have an initial wealth of $S_0$ dollars, and we want to allocate it to two investment options. By allocating $x_0$ to the first option, one earns a return of $g(x_0)$. The remaining $S_0 - x_0$ dollars will earn a return of $h(S_0 - x_0)$. Here, we are assuming that $0 \le x_0 \le S_0$, so that we are not borrowing extra money to fund our investments. Now, a natural goal is to choose the allocation amount $x_0$ to maximize our total return, which is given by $f(x_0) = g(x_0) + h(S_0 - x_0)$. In our notation, the resource allocation problem is nothing but the following optimization problem:

$$\text{maximize } g(x_0) + h(S_0 - x_0) \quad \text{subject to } x_0 \in X = [0, S_0]. \qquad \text{(RAP)}$$

Consider the following scenarios:

1. Suppose that both $g$ and $h$ are linear, i.e., $g(x) = ax + b$ and $h(x) = cx + d$ for some $a, b, c, d \in \mathbb{R}$. Then, (RAP) becomes

$$\text{maximize } (a - c)x_0 + b + d + cS_0, \qquad \text{(RAP-L)}$$

which is a linear optimization problem. In this case, the optimal solution to (RAP-L) can be determined explicitly. Indeed, if $a - c \ge 0$, then it is profitable to make $x_0$ as large as possible, and hence the optimal solution is $x_0^* = S_0$. On the other hand, if $a - c < 0$, then a similar argument shows that the optimal solution should be $x_0^* = 0$.

Suppose that we change the constraint in (RAP-L) from $x_0 \in [0, S_0]$ to
$$x_0 \in X = \left\{0, \frac{S_0}{M}, \frac{2S_0}{M}, \ldots, S_0\right\},$$
where $M \ge 2$ is some integer. Then, the problem becomes a discrete optimization problem, as the feasible region $X$ now consists of only a finite number of points.

2. Suppose that $g(x) = a \log x$ and $h(x) = b \log x$ for some $a, b > 0$. Then, (RAP) becomes

$$\text{maximize } a \log x_0 + b \log(S_0 - x_0), \qquad \text{(RAP-LOG)}$$

which is a nonlinear optimization problem. Observe that if $x_0^*$ is an optimal solution to (RAP-LOG), then we must have $0 < x_0^* < S_0$. In other words, the boundary points $x_0 = 0$ and $x_0 = S_0$ cannot be optimal for (RAP-LOG). This implies that the optimal solution $x_0^*$ can be found by differentiating the objective function and setting the derivative to zero; i.e., $x_0^*$ satisfies
$$\frac{df}{dx_0} = \frac{a}{x_0} - \frac{b}{S_0 - x_0} = 0.$$
In particular, we obtain $x_0^* = aS_0/(a + b)$.

3. Let $Z$ be a random variable with
$$\Pr(Z = 1) = \frac{1}{4}, \qquad \Pr(Z = -1) = \frac{3}{4}.$$
Consider the functions $G$ and $g$ defined by
$$G(x, Z) = Zx + b, \qquad g(x) = \mathbb{E}_Z[G(x, Z)],$$
where $b \in \mathbb{R}$ is a given constant. Furthermore, suppose that $h(x) = cx + d$, where $c, d \in \mathbb{R}$ are given. Then, (RAP) becomes

$$\text{maximize } \mathbb{E}_Z[G(x, Z)] + cx + d, \qquad \text{(RAP-S)}$$

which is a stochastic optimization problem. Note that by the definition of expectation, we have
$$\mathbb{E}_Z[G(x, Z)] = G(x, -1)\Pr(Z = -1) + G(x, 1)\Pr(Z = 1) = \frac{3}{4}(-x + b) + \frac{1}{4}(x + b) = -\frac{1}{2}x + b$$
for any $x$. Hence, (RAP-S) can be written as
$$\text{maximize } \left(c - \frac{1}{2}\right)x + b + d,$$
which is a simple linear optimization problem.

3 Introduction to Dynamic Programming

Observe that all the optimization problems introduced in the previous section involve only a one-stage decision, namely, to choose a point $x$ in the feasible region $X$ to minimize an objective function $f$. However, in reality, information is often released in stages, and we are allowed to take advantage of the new information in each stage to make a new decision. This gives rise to multi-stage optimization problems, which we shall refer to as dynamic programming or dynamic optimization problems.
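Each of the three scenarios above admits a closed-form solution, which makes them convenient sanity checks for numerical code. The sketch below verifies the boundary rule for (RAP-L), the interior optimum $x_0^* = aS_0/(a+b)$ for (RAP-LOG), and the expectation computed for (RAP-S) by brute-force grid search; all parameter values are illustrative assumptions, not from the handout.

```python
# Grid-search sanity checks for the three single-stage (RAP) scenarios.
# All parameter values below are illustrative assumptions, not from the text.
import math

S0 = 10.0
steps = 10**5
grid = [S0 * i / steps for i in range(steps + 1)]   # fine grid on [0, S0]

# Scenario 1 (RAP-L): g(x) = a*x + b, h(x) = c*x + d with a - c >= 0,
# so theory predicts the boundary solution x0* = S0.
a, b, c, d = 2.0, 1.0, 0.5, 1.0
x_lin = max(grid, key=lambda x: (a * x + b) + (c * (S0 - x) + d))
assert abs(x_lin - S0) < 1e-3

# Scenario 2 (RAP-LOG): g(x) = a*log x, h(x) = b*log x, with interior
# optimum x0* = a*S0/(a + b) from setting the derivative to zero.
x_log = max(grid[1:-1], key=lambda x: a * math.log(x) + b * math.log(S0 - x))
assert abs(x_log - a * S0 / (a + b)) < 1e-3

# Scenario 3 (RAP-S): E_Z[Z*x + b] with Pr(Z=1) = 1/4, Pr(Z=-1) = 3/4
# collapses to the deterministic linear function -x/2 + b.
expected = lambda x: 0.25 * (x + b) + 0.75 * (-x + b)
assert abs(expected(3.0) - (-0.5 * 3.0 + b)) < 1e-12
```

Note that in Scenario 2 the boundary points are excluded from the grid, matching the observation above that they cannot be optimal.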

Before we introduce the theory of dynamic programming, let us study an example and understand some of the difficulties of dynamic optimization. Consider a two-stage generalization of the resource allocation problem, in which the first stage proceeds as before. However, as a price of obtaining the return $g(x_0)$, the original allocation $x_0$ to the first option is reduced to $ax_0$, where $0 < a < 1$. Similarly, the allocation $S_0 - x_0$ for obtaining the return $h(S_0 - x_0)$ is reduced to $b(S_0 - x_0)$, where $0 < b < 1$. In particular, at the end of the first stage, the available wealth for investment in the next stage is
$$S_1 = ax_0 + b(S_0 - x_0).$$
Now, in the second stage, one can again split the $S_1$ dollars into the two investment options, obtaining a return of $g(x_1) + h(S_1 - x_1)$ if $x_1$ dollars is allocated to the first option and the remaining amount $S_1 - x_1$ is allocated to the second option. The goal now is to choose the allocation amounts $x_0$ and $x_1$ in both stages to maximize the total return
$$f_{S_0}(x_0, x_1) = g(x_0) + h(S_0 - x_0) + g(x_1) + h(S_1 - x_1).$$
In other words, we can formulate the two-stage resource allocation problem as follows:

$$\begin{array}{ll} \text{maximize} & g(x_0) + h(S_0 - x_0) + g(x_1) + h(S_1 - x_1) \\ \text{subject to} & 0 \le x_0 \le S_0,\ 0 \le x_1 \le S_1, \\ & S_1 = ax_0 + b(S_0 - x_0). \end{array} \qquad \text{(RAP-2)}$$

Of course, there is no reason to stop at a second-stage problem. By iterating the above process, we have an $N$-stage resource allocation problem, where at the end of the $k$-th stage (for $k = 1, 2, \ldots, N-1$), the available wealth would be
$$S_k = ax_{k-1} + b(S_{k-1} - x_{k-1}),$$
where $x_{k-1}$ is the amount allocated to the first option in the $k$-th stage. Mathematically, the $N$-stage problem can be formulated as follows:

$$\begin{array}{ll} \text{maximize} & \sum_{k=0}^{N-1} \left[ g(x_k) + h(S_k - x_k) \right] \\ \text{subject to} & 0 \le x_k \le S_k \text{ for } k = 0, 1, \ldots, N-1, \\ & S_k = ax_{k-1} + b(S_{k-1} - x_{k-1}) \text{ for } k = 1, \ldots, N-1. \end{array} \qquad \text{(RAP-N)}$$

Now, an important question is, how would one solve (RAP-N)? If $g, h$ are linear, then (RAP-N) is a linear optimization problem, and hence it can in principle be solved by, say, the simplex method.
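For small instances, (RAP-2) can also be attacked by brute force: discretize both $x_0$ and $x_1$ and enumerate every pair. A sketch, using illustrative square-root return functions that are an assumption, not from the handout:

```python
# Brute-force enumeration of the two-stage problem (RAP-2) on a grid.
# The return functions g, h and all parameters are illustrative assumptions.
import math

def solve_rap2(g, h, a, b, S0, steps=400):
    """Maximize g(x0) + h(S0 - x0) + g(x1) + h(S1 - x1), where
    S1 = a*x0 + b*(S0 - x0), by enumerating all grid pairs (x0, x1)."""
    best_val, best_pair = -math.inf, None
    for i in range(steps + 1):
        x0 = S0 * i / steps
        S1 = a * x0 + b * (S0 - x0)
        stage1 = g(x0) + h(S0 - x0)
        for j in range(steps + 1):
            x1 = S1 * j / steps
            val = stage1 + g(x1) + h(S1 - x1)
            if val > best_val:
                best_val, best_pair = val, (x0, x1)
    return best_val, best_pair

g = lambda x: math.sqrt(x)        # assumed return from option 1
h = lambda x: 2.0 * math.sqrt(x)  # assumed return from option 2
val, (x0, x1) = solve_rap2(g, h, a=0.8, b=0.9, S0=10.0)
```

Enumeration works here because there are only two stages: with $N$ stages and $m$ grid points per stage, the loop would visit $m^N$ combinations, a blow-up that motivates the sequential approach developed below.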
However, the problem becomes more difficult if $g, h$ are nonlinear. One possibility is to use calculus. Towards that end, suppose that the optimal solution $(x_0^*, x_1^*, \ldots, x_{N-1}^*)$ to (RAP-N) satisfies $0 < x_k^* < S_k$ for $k = 0, 1, \ldots, N-1$. Let
$$f_{S_0}(x_0, x_1, \ldots, x_{N-1}) = \sum_{k=0}^{N-1} \left[ g(x_k) + h(S_k - x_k) \right].$$
Then, we set all the partial derivatives of $f_{S_0}$ to zero and solve for $x_0, x_1, \ldots, x_{N-1}$:
$$\frac{\partial f_{S_0}}{\partial x_{N-1}} = g'(x_{N-1}) - h'(S_{N-1} - x_{N-1}) = 0,$$
$$\frac{\partial f_{S_0}}{\partial x_{N-2}} = g'(x_{N-2}) - h'(S_{N-2} - x_{N-2}) + (a - b)h'(S_{N-1} - x_{N-1}) = 0,$$
$$\vdots$$

This approach requires us to solve a system of $N$ nonlinear equations in $N$ unknowns, which in general is not an easy task. Worse yet, we also have to check the boundary points $x_k = 0$ and $x_k = S_k$ for optimality.

Fortunately, not all is lost. Observe that in the above approach, we have not taken into account the sequential nature of the problem, i.e., the allocations $x_0, x_1, \ldots, x_{N-1}$ should be determined sequentially. This motivates us to consider approaches that can take advantage of such a structure. Towards that end, observe that the maximum total return of the $N$-stage resource allocation problem depends only on $N$ and the initial wealth $S_0$. Hence, we can define a function $q_N$ by
$$q_N(S_0) = \max\{f_{S_0}(x_0, x_1, \ldots, x_{N-1}) : 0 \le x_k \le S_k \text{ for } k = 0, 1, \ldots, N-1\}. \qquad (1)$$
In words, $q_N(S_0)$ is the maximum return of the $N$-stage resource allocation problem if the initial wealth is $S_0$. For instance, we have
$$q_1(S_0) = \max\{g(x_0) + h(S_0 - x_0) : 0 \le x_0 \le S_0\}, \qquad (2)$$
which coincides with (RAP).

Now, although we can use the definition of $q_2(S_0)$ as given in (1), we can also express it in terms of $q_1$. To see this, recall that the total return of the 2-stage problem is the first-stage return plus the second-stage return. Clearly, whatever we choose the first-stage allocation $x_0$ to be, the wealth available at the end of the first stage, i.e., $S_1 = ax_0 + b(S_0 - x_0)$, must be allocated optimally in the second stage if we wish to maximize the total return. Thus, if $x_0$ is our allocation in the first stage, then we will obtain a return of $q_1(S_1)$ in the second stage by choosing $x_1$ optimally. It follows that
$$q_2(S_0) = \max\{g(x_0) + h(S_0 - x_0) + q_1(ax_0 + b(S_0 - x_0)) : 0 \le x_0 \le S_0\}. \qquad (3)$$
More generally, by using the same idea, we obtain the following recurrence relation for $q_N(S_0)$:
$$q_N(S_0) = \max\{g(x_0) + h(S_0 - x_0) + q_{N-1}(ax_0 + b(S_0 - x_0)) : 0 \le x_0 \le S_0\}. \qquad (4)$$

An important feature of (4) is that it has only one decision variable (i.e., $x_0$), as opposed to the $N$ decision variables (i.e., $x_0, x_1, \ldots, x_{N-1}$) in the definition of $q_N(S_0)$ as given by (1). Now, starting with $q_1(S_0)$, as given by (2), we can use (3) to compute $q_2(S_0)$, which in turn can be used to compute $q_3(S_0)$, and so on using (4). Thus, the formulation (4) allows us to turn the original $N$-variable formulation (RAP-N) into $N$ one-dimensional problems. We shall see the computational advantage of such a formulation later in the course. As an illustration, consider the following example:

Example 2 Consider the 2-stage resource allocation problem, where $g(x) = a \log x$ and $h(x) = b \log x$ for some $a, b > 0$, and the initial wealth is $S_0$. Recall that the maximum total return of this problem is given by
$$q_2(S_0) = \max\{g(x_0) + h(S_0 - x_0) + q_1(ax_0 + b(S_0 - x_0)) : 0 \le x_0 \le S_0\}.$$
To determine $q_2(S_0)$, we start with $q_1(S_1)$, where $S_1 = ax_0 + b(S_0 - x_0)$. By definition, we have
$$q_1(S_1) = \max\{a \log x + b \log(S_1 - x) : 0 \le x \le S_1\}.$$
Observe that the optimal solution $x^*$ to $q_1(S_1)$ must satisfy $0 < x^* < S_1$. Hence, by differentiating the objective function and setting the derivative to zero, we obtain
$$\frac{a}{x} - \frac{b}{S_1 - x} = 0 \quad \Longrightarrow \quad x^* = \frac{a}{a + b} S_1.$$

In particular,
$$q_1(S_1) = a \log(rS_1) + b \log((1 - r)S_1), \quad \text{where } r = \frac{a}{a + b}.$$
Upon substituting this into $q_2(S_0)$, we have
$$q_2(S_0) = \max\{a \log x_0 + b \log(S_0 - x_0) + a \log(rS_1) + b \log((1 - r)S_1) : 0 \le x_0 \le S_0\}.$$
Again, the optimal solution $x_0^*$ to $q_2(S_0)$ must satisfy $0 < x_0^* < S_0$. Hence, by differentiating the objective function and setting the derivative to zero, we have
$$\frac{a}{x_0} - \frac{b}{S_0 - x_0} + \frac{a(a - b)}{ax_0 + b(S_0 - x_0)} + \frac{b(a - b)}{ax_0 + b(S_0 - x_0)} = 0.$$
This is just a quadratic equation in $x_0$, and hence the optimal solution $x_0^*$ can be found easily. We leave this as an exercise to the reader.
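The recurrence (4) translates almost line by line into code: for the log returns of Example 2, $q_1$ is known in closed form, and each further stage is a one-dimensional maximization over $x_0$. A minimal sketch, where the return coefficients and reduction factors are illustrative assumptions:

```python
# Numerical dynamic programming via recurrence (4), specialized to the
# setting of Example 2: g(x) = a*log(x), h(x) = b*log(x).
# All parameter values are illustrative assumptions.
import math

a_ret, b_ret = 2.0, 1.0   # coefficients of the log returns (a, b > 0)
a_red, b_red = 0.8, 0.9   # wealth-reduction factors (0 < a, b < 1)

def q1(S):
    """Closed-form q_1(S): the optimal single-stage split is x* = a/(a+b) * S."""
    r = a_ret / (a_ret + b_ret)
    return a_ret * math.log(r * S) + b_ret * math.log((1.0 - r) * S)

def q_next(q_prev, S, steps=2000):
    """One application of recurrence (4): a one-dimensional maximization over
    x0 on a grid in (0, S); boundaries are excluded since log(0) = -inf."""
    best = -math.inf
    for i in range(1, steps):
        x0 = S * i / steps
        S1 = a_red * x0 + b_red * (S - x0)
        val = a_ret * math.log(x0) + b_ret * math.log(S - x0) + q_prev(S1)
        best = max(best, val)
    return best

S0 = 10.0
q2_dp = q_next(q1, S0)   # q_2(S_0), computed one stage at a time
```

Iterating `q_next` produces $q_3, q_4, \ldots$; for larger $N$ one would tabulate each $q_{N-1}$ on a grid of wealth levels instead of re-evaluating it recursively, which is the computational advantage alluded to above.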