SHORT INTRODUCTION TO DYNAMIC PROGRAMMING


1. Example

We consider different stages (discrete time events) given by k = 0, ..., N. Let x_k be the amount of money owned by a consumer at stage k. At each stage k, the consumer decides the fraction u_k of the capital x_k that he will use. The amount consumed at stage k is therefore given by c_k = u_k x_k. The rest is saved at a given interest rate. The evolution of the capital is thus given by the equation

x_{k+1} = ρ(1 − u_k) x_k

where ρ > 1. We consider the utility function U(c) = √c. In particular, the function U is strictly increasing and also strictly concave. It is increasing because the consumer spends his money on something which increases his satisfaction. However, the marginal increase in satisfaction decreases with the amount of money which is spent. For example, the increase in satisfaction obtained by spending 10 extra kroner is smaller if it is added to 1000 kroner than if it is added to 10 kroner. The concavity assumption takes this fact into account. The consumer wants to maximize

J = Σ_{k=0}^{N} U(c_k) = Σ_{k=0}^{N} √(x_k u_k).

The consumer has to find a good balance between his immediate satisfaction and his future satisfaction. To illustrate this, let us consider two opposite strategies:

- The consumer uses all his money at the first stage: u_0 = 1 and u_k arbitrary for k = 1, ..., N. Then x_k = 0 for k = 1, ..., N and J = √x_0.
- The consumer saves his money until the last stage, where he spends it all: u_k = 0 for k = 0, ..., N − 1 and u_N = 1. Then x_k = ρ^k x_0 and J = √(ρ^N x_0).

The first strategy is not optimal because the consumer at stage 0 does not take into account the satisfaction he can get in the future by saving. In the second strategy, the consumer takes this fact into account and manages to get the highest capital possible, but this capital is not used in the optimal way. Indeed, due to the concavity of the utility function, it is not optimal to spend a lot of money at the same time.
The optimal strategy is a balance between these two strategies, which we are going to compute.
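The two extreme strategies can be compared numerically. A minimal sketch, assuming illustrative values ρ = 1.05, x_0 = 1000 and N = 10 which are not from the text:

```python
import math

def total_utility(u, x0, rho):
    """Total utility J = sum of sqrt(x_k * u_k) under the dynamics
    x_{k+1} = rho * (1 - u_k) * x_k."""
    J, x = 0.0, x0
    for uk in u:
        J += math.sqrt(x * uk)
        x = rho * (1 - uk) * x
    return J

rho, x0, N = 1.05, 1000.0, 10          # illustrative values, not from the text

spend_now = [1.0] + [0.0] * N          # u_0 = 1: J = sqrt(x_0) ~ 31.6
spend_last = [0.0] * N + [1.0]         # u_N = 1: J = sqrt(rho^N * x_0) ~ 40.4
balanced = [0.5] * (N + 1)             # some intermediate strategy

print(total_utility(spend_now, x0, rho))
print(total_utility(spend_last, x0, rho))
print(total_utility(balanced, x0, rho))   # beats both extremes
```

Because the utility is concave, even the crude constant fraction u_k = 0.5 dominates both extreme strategies here, which is exactly the balance the text describes.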

2. Terminology and statement of the problem

We consider a system where the events happen in stages and the total number of stages is fixed. At each stage k (where k = 0, ..., N), the state variable x_k gives a description of the system. The evolution of the system is given by a governing equation of the form

(1) x_{k+1} = g_k(x_k, u_k)

where g_k is a given function and u_k is a control variable. By choosing the control variable, we influence the evolution of the state variable (we cannot, in general, set the state variable directly; some inertia in the system can be modelled by (1)). At each stage, there is a profit (or cost) given as a function f_k(x_k, u_k) of the state variable and of the control variable. The total value (or total cost) is thus

(2) J = Σ_{k=0}^{N} f_k(x_k, u_k).

We want to find the optimal values for u_k (k = 0, ..., N) which maximize or minimize J. We denote the optimal value of J by J*. We will consider the case where x_k and u_k belong to R (the scalar case), but the results can be readily extended to the case where x_k ∈ R^n and u_k ∈ R^p for some integers n and p (the vector case). In general, the state and control variables cannot just take any value in R, and we have u_k ∈ U where U, the control space, is a given subset of R. It is standard to take U compact (bounded and closed) so that the optimization problems have a solution. One can also consider a set U_k which depends on the stage. We can finally formulate the problem as follows: given x_0 ∈ R, find the optimal sequence, which we denote {u_k*}_{k=0}^{N}, such that u_k* ∈ U_k for k = 0, ..., N and

(3) J* = Σ_{k=0}^{N} f_k(x_k*, u_k*) = max over {u_k}_{k=0}^{N} of Σ_{k=0}^{N} f_k(x_k, u_k), where

(4) x_{k+1} = g_k(x_k, u_k).

3. The DP algorithm

We define the value function J_k(x) for each stage k as follows.

Definition 1. For any x ∈ R, we define J_k(x) as

(5) J_k(x) = max over {u_i}_{i=k}^{N} of Σ_{i=k}^{N} f_i(x_i, u_i), where x_k = x and x_{i+1} = g_i(x_i, u_i) for i = k, ..., N − 1.

It is clear from the definition of the optimal control (3) that we have J* = J_0(x_0). We want to compute the functions J_k(x). We can see that J_N(x) is easy to obtain. Indeed, we have

J_N(x) = max_{u ∈ U_N} f_N(x, u)

so that J_N(x) is obtained by solving a standard maximization problem (the only unknown is u). The idea is then to compute J_k(x) for all values of x by going backwards: we assume that J_{k+1}(x) is given and then compute J_k(x) by using the following proposition, which constitutes the fundamental principle in dynamic programming.

Fundamental principle in dynamic programming. We have

(6) J_k(x) = max_{u ∈ U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ).

Proof. Given x ∈ R, for any sequence of controls {u_i}_{i=k}^{N} (with x_k = x), we have

(7) Σ_{i=k}^{N} f_i(x_i, u_i) = f_k(x_k, u_k) + Σ_{i=k+1}^{N} f_i(x_i, u_i)
    ≤ f_k(x_k, u_k) + J_{k+1}(x_{k+1})  (by definition of J_{k+1})
    = f_k(x, u_k) + J_{k+1}(g_k(x, u_k))  (by (4))
    ≤ max_{u ∈ U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ).

The right-hand side of (7) is a number which does not depend on the sequence {u_i}_{i=k}^{N}. We take the maximum over all the sequences u_i on the left-hand side and obtain that

J_k(x) ≤ max_u ( f_k(x, u) + J_{k+1}(g_k(x, u)) ).

It remains to prove the inequality in the other direction. From now on, we assume that the maxima are always attained. Consider u* which maximizes f_k(x, u) + J_{k+1}(g_k(x, u)), and then u_i* (i = k + 1, ..., N) which maximizes Σ_{i=k+1}^{N} f_i(x_i, u_i) where x_{k+1} = g_k(x, u*) and x_{i+1} = g_i(x_i, u_i*). Hence,

max_u ( f_k(x, u) + J_{k+1}(g_k(x, u)) ) = f_k(x, u*) + J_{k+1}(g_k(x, u*))
    = f_k(x, u*) + Σ_{i=k+1}^{N} f_i(x_i*, u_i*)
    ≤ J_k(x).

The last inequality follows from the definition of J_k as a maximum, see (5).

To solve the problem, we can use the following algorithm.

DP algorithm. By using the fundamental principle of dynamic programming, we compute J_k(x) for k = N, ..., 0 (going backwards in k). The optimal value for J is given by J* = J_0(x_0). The optimal control sequence {u_k*}_{k=0}^{N} is given by
u_k* = argmax_{u ∈ U_k} ( f_k(x_k*, u) + J_{k+1}(g_k(x_k*, u)) ) for k = 0, ..., N (with the convention J_{N+1} = 0), where x_0* = x_0 and x_{k+1}* = g_k(x_k*, u_k*).
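For finite state and control sets, the DP algorithm can be sketched as follows. This is a simplification of the continuous setting of the text, and `dp_solve` and its arguments are our own names:

```python
def dp_solve(states, controls, f, g, N):
    """Backward induction: J_k(x) = max_u ( f(k, x, u) + J_{k+1}(g(k, x, u)) ),
    starting from J_N(x) = max_u f(N, x, u). The transition g(k, x, u) must
    return a value that is again in `states`. Returns J_0 and the policy."""
    J = {x: max(f(N, x, u) for u in controls) for x in states}
    policy = [None] * (N + 1)
    policy[N] = {x: max(controls, key=lambda u: f(N, x, u)) for x in states}
    for k in range(N - 1, -1, -1):           # go backwards in k
        Jk, pk = {}, {}
        for x in states:
            best = max(controls, key=lambda u: f(k, x, u) + J[g(k, x, u)])
            pk[x] = best
            Jk[x] = f(k, x, best) + J[g(k, x, best)]
        J, policy[k] = Jk, pk
    return J, policy                         # J is J_0; J_0(x_0) is J*

# Tiny check: one state, controls {0, 1}, stage profit f = u, N = 2.
# The optimum picks u = 1 at every stage, so J_0 = 3.
J0, policy = dp_solve([0], [0, 1], lambda k, x, u: u, lambda k, x, u: x, 2)
print(J0[0])  # → 3
```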

Let us use the DP algorithm to solve the example of the first section. We compute J_N(x); we have

J_N(x) = max_{u ∈ [0,1]} √(xu) = √x.

At the last stage, the consumer spends all the money left. We compute J_{N−1}(x); we have

J_{N−1}(x) = max_{u ∈ [0,1]} ( √(xu) + J_N(g(x, u)) )

so that

J_{N−1}(x) = max_{u ∈ [0,1]} ( √(xu) + √(ρ(1 − u)x) ) = √x max_{u ∈ [0,1]} ( √u + √(ρ(1 − u)) ).

We want to maximize the function φ(u) = √u + √(ρ(1 − u)). We have

φ'(u) = 1/(2√u) − √ρ/(2√(1 − u))

and φ'(u*) = 0 if and only if u* = 1/(1 + ρ). Then we have φ(u*) = √(1 + ρ). Since u* is the only critical point in (0, 1), and φ(u*) ≥ φ(0) = √ρ and φ(u*) ≥ φ(1) = 1, u* is the maximum point. Hence,

J_{N−1}(x) = √((1 + ρ)x).

We compute J_{N−2}(x); we have

J_{N−2}(x) = max_{u ∈ [0,1]} ( √(xu) + J_{N−1}(g(x, u)) )

so that

J_{N−2}(x) = max_{u ∈ [0,1]} ( √(xu) + √((1 + ρ)ρ(1 − u)x) ) = √x max_{u ∈ [0,1]} ( √u + √((ρ + ρ²)(1 − u)) ).

We want to maximize the function φ̃(u) = √u + √((ρ + ρ²)(1 − u)). We observe that φ̃ is obtained from φ by replacing ρ by ρ + ρ². Hence, the maximum of φ̃ is equal to √(1 + ρ + ρ²) and is reached for u* = 1/(1 + ρ + ρ²). Thus,

J_{N−2}(x) = √((1 + ρ + ρ²)x).

By induction, we prove that

J_{N−p}(x) = √((1 + ρ + ρ² + ... + ρ^p)x) = √( ((ρ^{p+1} − 1)/(ρ − 1)) x ).

Hence, the optimal value is

J* = √( ((ρ^{N+1} − 1)/(ρ − 1)) x_0 )

and is obtained by choosing

u*_{N−p} = (ρ − 1)/(ρ^{p+1} − 1).
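The induction can be checked numerically: writing J_{N−p}(x) = √(a_p x), the computation above gives the recursion a_p = 1 + ρ a_{p−1}, with maximizer u* = 1/(1 + ρ a_{p−1}). A minimal sketch (ρ = 1.05 and p = 5 are arbitrary test values and `value_coefficient` is our own name):

```python
import math

def value_coefficient(rho, p):
    """Backward induction on the coefficient a_p in J_{N-p}(x) = sqrt(a_p * x):
    a_0 = 1 and a_p = ( max over u of sqrt(u) + sqrt(rho * a_{p-1} * (1 - u)) )^2,
    where the maximizer is u* = 1 / (1 + rho * a_{p-1})."""
    a = 1.0
    for _ in range(p):
        u = 1.0 / (1.0 + rho * a)                       # optimal fraction u*
        a = (math.sqrt(u) + math.sqrt(rho * a * (1.0 - u))) ** 2
    return a

rho, p = 1.05, 5                                        # arbitrary test values
closed_form = (rho ** (p + 1) - 1.0) / (rho - 1.0)      # 1 + rho + ... + rho^p
print(value_coefficient(rho, p), closed_form)           # the two agree
```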

4. The shortest path problem

We consider N nodes. Some of the nodes are connected, and the lengths between the connected nodes are given. There is a starting node that we denote s and an ending node that we denote t. The shortest path problem consists of finding the shortest path between s and t. Figure 1 gives an example of such a graph; the length between two connected nodes is indicated in the figure.

Figure 1. Example of a graph for a shortest path problem.

We order the nodes and give them a number from 1 to N. Let f(i, j) be the length between the connected nodes i and j. A path of p nodes is a sequence of nodes x_k ∈ {1, ..., N} for k = 1, ..., p such that x_k and x_{k+1} are connected, x_1 = s and x_p = t. The length of the path {x_k}_{k=1}^{p} is given by

(8) Σ_{k=1}^{p−1} f(x_k, x_{k+1}).

The solution of the shortest path problem is a path {x_k}_{k=1}^{p} which minimizes the length given by (8). We now want to rewrite the shortest path problem as a DP problem. We extend the definition of f(i, j) to any pair of nodes (not just the connected ones) by setting f(i, j) = ∞ if the nodes i and j are not connected and f(i, i) = 0. We make the following assumptions:

- There does not exist any cyclic path of negative length. If this assumption is not fulfilled and such a cycle can be reached from s, then the problem does not admit a solution, as any path can always be improved by taking loops in this cycle.
- There exists at least one path of finite length which connects s to t.

With these assumptions, it is clear that an optimal path exists and that it visits at most N nodes. We consider the DP problem given by minimizing

(9) J = Σ_{k=1}^{N−1} f(x_k, u_k), where x_1 = s and x_{k+1} = u_k,

and we take u_k ∈ U_k where U_k = {1, ..., N} for k = 1, ..., N − 2 and U_{N−1} = {t}. This DP problem is equivalent to the shortest path problem. In the DP formulation (9), we have a fixed number of stages N, while p was variable in the shortest path formulation (8). We find the actual number of nodes in the optimal path by removing the repeated nodes in the solution of the DP problem.

Let us consider the example above with s = 1 and t = 6. The function f is given in Figure 2. By using (6), we compute recursively the values of J_k(i) for k = 1, ..., 5 and i = 1, ..., 6; the results are given in Figure 3. For illustration purposes, let us consider in detail the computation of J_4(1). We have

J_4(1) = min_x ( f(1, x) + J_5(x) ) = 9,

where the values f(1, x) and J_5(x) are read off from Figures 2 and 3. From the results in Figure 3, we get that the optimal length is J* = J_1(s) = J_1(1) = 5. To find the optimal path, we have to solve

(10) x_{k+1} = argmin_{x ∈ {1, ..., N}} ( f(x_k, x) + J_{k+1}(x) ).

Hence, we get x_1 = 1, x_2 = 1, x_3 = 3, x_4 = 5, x_5 = 4, x_6 = 6. There is one repeated node (x_1 = x_2), so the optimal path visits five nodes.

To compute the shortest path, a straightforward alternative is to consider all the paths, compute the length of each of them and find the smallest. Since there are N nodes, there exist of the order of (N − 2)! paths, and we can roughly estimate by N · N! the number of operations needed to compute all the lengths. The question is how this method compares with the DP algorithm. In the DP algorithm, we have to find the functions J_k, that is, compute J_k(x_i) for k = 1, ..., N − 1 and i = 1, ..., N (in total (N − 1) · N values to compute). To compute each J_k(x_i), we need to solve a minimization problem, which requires N operations. Finally, for the DP algorithm, we have a number of operations of order N³. Since, for N large, N³ is smaller than N · N!, the DP algorithm is computationally advantageous. However, it requires a lot of memory (all the J_k have to be stored), which is not the case in the first approach.

Figure 2.
The value of f(i, j) is given by the element (i, j) in the table.
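The recursion (6) applied to the formulation (9) can be sketched as follows. Since the entries of the table in Figure 2 did not survive transcription, the 4-node length matrix below is an illustrative assumption, not the graph of Figure 1 (nodes are numbered from 0 here):

```python
INF = float("inf")

def shortest_path_dp(f, s, t):
    """Shortest s-t path via J_k(i) = min_j ( f[i][j] + J_{k+1}(j) ), going
    backwards in k; f[i][j] is the length matrix with f[i][i] = 0 and INF
    for unconnected pairs. Any optimal path visits at most N nodes."""
    N = len(f)
    J = [0 if i == t else INF for i in range(N)]     # final stage
    stages = [J]
    for _ in range(N - 1):                           # go backwards in k
        J = [min(f[i][j] + J[j] for j in range(N)) for i in range(N)]
        stages.append(J)
    stages.reverse()                                 # stages[k] holds J_k
    # Forward pass (10): x_{k+1} = argmin_x ( f[x_k][x] + J_{k+1}(x) );
    # repeated nodes (cost-0 self-loops) are dropped from the reported path.
    path, x = [s], s
    for k in range(N - 1):
        x = min(range(N), key=lambda j, x=x: f[x][j] + stages[k + 1][j])
        if x != path[-1]:
            path.append(x)
    return stages[0][s], path

# Illustrative symmetric length matrix on 4 nodes (an assumption, not Figure 2).
f = [[0,   4,   1,   INF],
     [4,   0,   1,   1],
     [1,   1,   0,   5],
     [INF, 1,   5,   0]]
print(shortest_path_dp(f, 0, 3))  # → (3, [0, 2, 1, 3])
```

Note the memory trade-off mentioned in the text: all the stage tables `stages[k]` are kept so that the forward pass (10) can recover the path.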

Figure 3. Computation of J_k(i) for k = 1, ..., 5 and i = 1, ..., 6.