SHORT INTRODUCTION TO DYNAMIC PROGRAMMING

1. Example

We consider different stages (discrete time events) given by k = 0, ..., N. Let x_k be the amount of money owned by a consumer at stage k. At each stage k, the consumer decides the fraction u_k of the capital x_k that he will use. The amount consumed at stage k is therefore c_k = u_k x_k. The rest is saved at a given interest rate, so that the evolution of the capital is given by the equation

    x_{k+1} = ρ (1 − u_k) x_k,

where ρ > 1. We consider the utility function U(c) = √c. In particular, the function U is strictly increasing and also strictly concave. It is increasing because the consumer spends his money on something which increases his satisfaction. However, the marginal increase in satisfaction decreases with the amount of money which is spent: for example, the increase in satisfaction obtained by spending 10 extra kroner is smaller when they are added to 1000 kroner than when they are added to 10 kroner. The concavity assumption takes this fact into account. The consumer wants to maximize

    J = Σ_{k=0}^{N} U(c_k) = Σ_{k=0}^{N} √(x_k u_k).

The consumer has to find a good balance between his immediate satisfaction and his future satisfaction. To illustrate this, let us consider two opposite strategies:

- The consumer uses all his money at the first stage: u_0 = 1 and u_k arbitrary for k = 1, ..., N. Then x_k = 0 for k = 1, ..., N and J = √(x_0).
- The consumer saves his money until the last stage, where he spends it all: u_k = 0 for k = 0, ..., N − 1 and u_N = 1. Then x_k = ρ^k x_0 and J = √(ρ^N x_0).

The first strategy is not optimal because the consumer does not take into account the satisfaction he can get in the future by saving. In the second strategy, the consumer takes this fact into account and reaches the highest capital possible, but this capital is not used in the optimal way: due to the concavity of the utility function, it is not optimal to spend a lot of money at a single stage.
The optimal strategy is a balance between these two strategies, which we are going to compute.
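These two extreme strategies can be compared with a small numerical sketch (the values ρ = 1.1, x_0 = 100 and N = 10 below are illustrative choices, not taken from the notes):

```python
import math

def total_utility(x0, rho, controls):
    """Total utility J = sum_k sqrt(x_k * u_k) for a control sequence u_0..u_N."""
    x, J = x0, 0.0
    for u in controls:
        J += math.sqrt(x * u)      # consume the fraction u of the current capital
        x = rho * (1 - u) * x      # the rest grows at rate rho
    return J

rho, x0, N = 1.1, 100.0, 10
spend_now = [1.0] + [0.0] * N      # u_0 = 1: spend everything at the first stage
spend_last = [0.0] * N + [1.0]     # u_N = 1: save everything until the last stage
print(total_utility(x0, rho, spend_now))    # sqrt(x0) = 10
print(total_utility(x0, rho, spend_last))   # sqrt(rho^N * x0) ≈ 16.1
```

With these numbers, saving everything until the last stage beats spending everything immediately, but neither extreme is optimal, as the rest of the notes show.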
2. Terminology and statement of the problem

We consider a system where the events happen in stages and the total number of stages is fixed. At each stage k (where k = 0, ..., N), the state variable x_k gives a description of the system. The evolution of the system is given by a governing equation of the form

(1)    x_{k+1} = g_k(x_k, u_k),

where g_k is a given function and u_k is a control variable. By choosing the control variable, we influence the evolution of the state variable (we cannot, in general, set the state variable directly; some inertia in the system can be modelled by (1)). At each stage, there is a profit (or cost) given as a function f_k(x_k, u_k) of the state variable and of the control variable. The total profit (or total cost) is thus

(2)    J = Σ_{k=0}^{N} f_k(x_k, u_k).

We want to find the optimal values of u_k (k = 0, ..., N) which maximize or minimize J. We denote the optimal value of J by J*. We will consider the case where x_k and u_k belong to R (the scalar case), but the results can be readily extended to the case where x_k ∈ R^n and u_k ∈ R^p for some integers n and p (the vector case). In general, the state and control variables cannot take just any value in R, and we have u_k ∈ U, where U, the control space, is a given subset of R. It is standard to take U compact (bounded and closed) so that the optimization problems have a solution. One can also consider a set U_k which depends on the stage. We can finally formulate the problem as follows: given x_0 ∈ R, find the optimal sequence, which we denote {u_k*}_{k=0}^{N}, such that u_k* ∈ U_k for k = 0, ..., N and

(3)    J* = Σ_{k=0}^{N} f_k(x_k*, u_k*) = max over {u_k}_{k=0}^{N} of Σ_{k=0}^{N} f_k(x_k, u_k),

where

(4)    x_{k+1} = g_k(x_k, u_k).

3. The DP algorithm

We define the value function J_k(x) for each stage k as follows.

Definition. For any x ∈ R, we define J_k(x) as

(5)    J_k(x) = max over {u_i}_{i=k}^{N} of Σ_{i=k}^{N} f_i(x_i, u_i),

where x_k = x and x_{i+1} = g_i(x_i, u_i) for i = k, ..., N − 1.
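For a toy instance of the problem (2)–(4), the optimal value (3) can be found by brute force, enumerating every control sequence. The control set, the governing equation g and the profit f in the sketch below are made-up illustrations:

```python
from itertools import product

# Illustrative toy instance: U = {0, 1}, N = 3,
# governing equation g(x, u) = x + u, stage profit f(x, u) = x*u - u.
N, U = 3, (0, 1)
g = lambda x, u: x + u
f = lambda x, u: x * u - u

def evaluate(x0, controls):
    """Total profit J = sum_k f(x_k, u_k) along the trajectory given by (4)."""
    x, J = x0, 0
    for u in controls:
        J += f(x, u)
        x = g(x, u)
    return J

# Definition (3): maximize J over all control sequences {u_k}, k = 0..N.
best = max(product(U, repeat=N + 1), key=lambda us: evaluate(2, us))
print(best, evaluate(2, best))
```

This enumeration grows exponentially with N; the DP algorithm of the next section avoids it.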
It is clear from the definition (3) of the optimal control that we have J* = J_0(x_0). We want to compute the functions J_k(x). The function J_N(x) is easy to obtain. Indeed, we have

    J_N(x) = max_{u ∈ U_N} f_N(x, u),

so that J_N(x) is obtained by solving a standard maximization problem (the only unknown is u). The idea is then to compute J_k(x) for all values of x by going backwards: we assume that J_{k+1}(x) is given and then compute J_k(x) by using the following proposition, which constitutes the fundamental principle of dynamic programming.

Fundamental principle of dynamic programming. We have

(6)    J_k(x) = max_{u ∈ U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ).

Proof. Given x ∈ R, for any sequence of controls {u_i}_{i=k}^{N} with x_k = x, we have

(7)    Σ_{i=k}^{N} f_i(x_i, u_i) = f_k(x_k, u_k) + Σ_{i=k+1}^{N} f_i(x_i, u_i)
                                ≤ f_k(x, u_k) + J_{k+1}(x_{k+1})              (by definition of J_{k+1})
                                = f_k(x, u_k) + J_{k+1}(g_k(x, u_k))          (by (4))
                                ≤ max_{u ∈ U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ).

The right-hand side of (7) is a number which does not depend on the sequence {u_i}_{i=k}^{N}. We take the maximum over all the sequences {u_i} on the left-hand side and obtain

    J_k(x) ≤ max_{u ∈ U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ).

It remains to prove the inequality in the other direction. From now on, we assume that the maxima are always attained. Consider u* which maximizes f_k(x, u) + J_{k+1}(g_k(x, u)), and then u_i* (i = k + 1, ..., N) which maximize Σ_{i=k+1}^{N} f_i(x_i, u_i), where x_{k+1} = g_k(x, u*) and x_{i+1} = g_i(x_i, u_i*). Hence,

    max_{u ∈ U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ) = f_k(x, u*) + J_{k+1}(g_k(x, u*))
                                                     = f_k(x, u*) + Σ_{i=k+1}^{N} f_i(x_i, u_i*)
                                                     ≤ J_k(x).

The last inequality follows from the definition of J_k as a maximum, see (5).

To solve the problem, we can use the following algorithm.

DP algorithm. By using the fundamental principle of dynamic programming, we compute J_k(x) for k = N, ..., 0 (going backwards in k). The optimal value of J is given by J* = J_0(x_0). The optimal control sequence {u_k*}_{k=0}^{N} is given, for k = 0, ..., N (where x_0* = x_0 and x_{k+1}* = g_k(x_k*, u_k*)), by
    u_k* = argmax_{u ∈ U_k} ( f_k(x_k*, u) + J_{k+1}(g_k(x_k*, u)) ),

with the convention J_{N+1} ≡ 0.
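The DP algorithm can be sketched in the case of finitely many states and controls; everything in the instance below (state set, control set, g, f) is an illustrative placeholder, not taken from the notes:

```python
def dp_solve(states, controls, g, f, N):
    """Backward DP: J[k][x] is the best total profit from stage k onward,
    starting in state x; pi[k][x] is a maximizing control (the argmax)."""
    J = [dict() for _ in range(N + 1)]
    pi = [dict() for _ in range(N + 1)]
    for x in states:                      # stage N: no future term
        pi[N][x], J[N][x] = max(((u, f(N, x, u)) for u in controls),
                                key=lambda t: t[1])
    for k in range(N - 1, -1, -1):        # recursion (6), going backwards in k
        for x in states:
            pi[k][x], J[k][x] = max(
                ((u, f(k, x, u) + J[k + 1][g(k, x, u)]) for u in controls),
                key=lambda t: t[1])
    return J, pi

# Illustrative toy instance: states 0..5, controls {0, 1},
# governing equation x_{k+1} = min(x + u, 5), stage profit f = x*u, N = 3.
J, pi = dp_solve(range(6), (0, 1),
                 g=lambda k, x, u: min(x + u, 5),
                 f=lambda k, x, u: x * u,
                 N=3)
print(J[0][0])   # optimal value J* starting from x_0 = 0
```

Rolling the policy forward via x_{k+1}* = g(k, x_k*, pi[k][x_k*]) recovers the optimal control sequence {u_k*}.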
Let us use the DP algorithm to solve the example of the first section. We compute J_N(x); we have

    J_N(x) = max_{u ∈ [0,1]} √(xu) = √x.

At the last stage, the consumer spends all the money left. We compute J_{N−1}(x); we have

    J_{N−1}(x) = max_{u ∈ [0,1]} ( √(xu) + J_N(g(x, u)) ),

so that

    J_{N−1}(x) = max_{u ∈ [0,1]} ( √(xu) + √(ρ(1 − u)x) ) = √x · max_{u ∈ [0,1]} ( √u + √ρ √(1 − u) ).

We want to maximize the function φ(u) = √u + √ρ √(1 − u). We have

    φ′(u) = 1/(2√u) − √ρ/(2√(1 − u)),

and φ′(u*) = 0 if and only if u* = 1/(1 + ρ). Then, we have φ(u*) = √(1 + ρ). Since u* is the only critical point in (0, 1), and φ(u*) ≥ φ(0) = √ρ and φ(u*) ≥ φ(1) = 1, the point u* is the maximum. Hence,

    J_{N−1}(x) = √((1 + ρ) x).

We compute J_{N−2}(x); we have

    J_{N−2}(x) = max_{u ∈ [0,1]} ( √(xu) + J_{N−1}(g(x, u)) ),

so that

    J_{N−2}(x) = max_{u ∈ [0,1]} ( √(xu) + √((1 + ρ) ρ(1 − u)x) ) = √x · max_{u ∈ [0,1]} ( √u + √(ρ + ρ²) √(1 − u) ).

We want to maximize the function φ̃(u) = √u + √(ρ + ρ²) √(1 − u). We observe that φ̃ is obtained from φ by replacing ρ by ρ + ρ². Hence, the maximum of φ̃ is equal to √(1 + ρ + ρ²) and is reached for u* = 1/(1 + ρ + ρ²). Thus,

    J_{N−2}(x) = √((1 + ρ + ρ²) x).

By induction, we prove that

    J_{N−p}(x) = √((1 + ρ + ρ² + ... + ρ^p) x) = √( (ρ^{p+1} − 1)/(ρ − 1) · x ).

Hence, the optimal value is

    J* = J_0(x_0) = √( (ρ^{N+1} − 1)/(ρ − 1) · x_0 ),

and it is obtained by choosing

    u*_{N−p} = 1/(1 + ρ + ... + ρ^p) = (ρ − 1)/(ρ^{p+1} − 1).
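The closed-form solution can be verified numerically: applying the controls u*_{N−p} = (ρ − 1)/(ρ^{p+1} − 1) should reproduce J* = √((ρ^{N+1} − 1)/(ρ − 1) · x_0). The values of ρ, x_0 and N below are arbitrary test choices:

```python
import math

rho, x0, N = 1.05, 50.0, 8   # arbitrary test values

# Optimal controls: at stage k = N - p, spend the fraction (rho-1)/(rho^(p+1)-1),
# i.e. 1/(1 + rho + ... + rho^p).
controls = [(rho - 1) / (rho ** (N - k + 1) - 1) for k in range(N + 1)]

x, J = x0, 0.0
for u in controls:
    J += math.sqrt(x * u)    # utility collected at this stage
    x = rho * (1 - u) * x    # capital saved for the next stage

J_closed = math.sqrt((rho ** (N + 1) - 1) / (rho - 1) * x0)
print(J, J_closed)           # the two values agree
```

Note that at the last stage (p = 0) the formula gives u*_N = 1, as expected: the consumer spends all the money left.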
4. The shortest path problem

We consider N nodes. Some of the nodes are connected, and the lengths between the connected nodes are given. There is a starting node, which we denote s, and an ending node, which we denote t. The shortest path problem consists of finding the shortest path between s and t. Figure 1 gives an example of such a graph; the length between two connected nodes is indicated in the figure.

Figure 1. Example of a graph for a shortest path problem

We order the nodes and give them a number from 1 to N. Let f(i, j) be the length between the connected nodes i and j. A path of p nodes is a sequence of nodes x_k ∈ {1, ..., N} for k = 1, ..., p such that x_k and x_{k+1} are connected, x_1 = s and x_p = t. The length of the path {x_k}_{k=1}^{p} is given by

(8)    Σ_{k=1}^{p−1} f(x_k, x_{k+1}).

The solution of the shortest path problem is a path {x_k}_{k=1}^{p} which minimizes the length given by (8). We now want to rewrite the shortest path problem as a DP problem. We extend the definition of f(i, j) to any pair of nodes (not just the connected ones) by setting f(i, j) = ∞ if the nodes i and j are not connected, and f(i, i) = 0. We make the following assumptions:

- There does not exist any cyclic path of negative length. If this assumption is not fulfilled and a cycle of negative length can be reached from s, then the problem does not admit a solution, as any path can always be improved by taking extra loops around this cycle.
- There exists at least one path of finite length which connects s to t.

With these assumptions, it is clear that an optimal path exists and that it visits at most N nodes. We consider the DP problem given by minimizing

(9)    J = Σ_{k=1}^{N−1} f(x_k, u_k), where x_1 = s and x_{k+1} = u_k,
and we take u_k ∈ U_k, where U_k = {1, ..., N} for k = 1, ..., N − 2 and U_{N−1} = {t}. This DP problem is equivalent to the shortest path problem. In the DP formulation (9), we have a fixed number of stages N, while the number of nodes p was variable in the shortest path formulation (8). We find the actual number of nodes in the optimal path by removing the repeated nodes in the solution of the DP problem.

Let us consider the example above with s = 1 and t = 6. The function f is given in Figure 2. By using (6), we compute recursively the values of J_k(i) for k = 1, ..., 5 and i = 1, ..., 6; the results are given in Figure 3. For illustration purposes, let us consider in detail the computation of J_4(1). We have

    J_4(1) = min_x ( f(1, x) + J_5(x) ).

Evaluating f(1, x) + J_5(x) for each node x (the values of J_5 are given in Figure 3), we find J_4(1) = 9. From the results in Figure 3, we get that the optimal length is J* = J_1(s) = J_1(1) = 15. To find the optimal path, we have to solve

(10)    x_{k+1} = argmin_{x ∈ {1, ..., N}} ( f(x_k, x) + J_{k+1}(x) ).

Hence, we get x_1 = 1, x_2 = 1, x_3 = 3, x_4 = 5, x_5 = 4, x_6 = 6. There is one repeated node (x_1 = x_2), so the optimal path visits the nodes 1, 3, 5, 4, 6.

To compute the shortest path, a straightforward method is to consider all the paths, compute the length of each of them and find the smallest. Since there are N nodes, there exist of the order of (N − 2)! paths, and we can roughly estimate by N · N! the number of operations needed to compute all the lengths. The question is how this method compares with the DP algorithm. In the DP algorithm, we have to find the functions J_k, that is, compute J_k(x_i) for k = 1, ..., N − 1 and i = 1, ..., N (in total (N − 1)N values to compute). To compute each J_k(x_i), we need to solve a minimization problem which requires of the order of N operations. Finally, for the DP algorithm, we have a number of operations of order N³. Since, for N large, N³ is much smaller than N · N!, the DP algorithm is computationally advantageous. However, it requires a lot of memory (all the functions J_k have to be stored), which is not the case for the first approach.

Figure 2.
The value of f(i, j) is given by the element (i, j) in the table.
Figure 3. Computation of J_k(i) for k = 1, ..., 5 and i = 1, ..., 6
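The backward recursion of this section can be sketched as follows. The 6-node graph below is a made-up example (it is not the graph of Figures 1 and 2), with nodes numbered 0 to 5 so that s = 0 and t = 5:

```python
INF = float("inf")

# Hypothetical 6-node graph (made-up edge lengths, for illustration only).
N, s, t = 6, 0, 5
edges = {(0, 1): 2, (0, 2): 2, (1, 3): 6, (2, 3): 1, (2, 4): 3,
         (3, 5): 2, (4, 5): 4}

# f[i][j]: edge length, 0 on the diagonal, infinity if not connected.
f = [[0 if i == j else INF for j in range(N)] for i in range(N)]
for (i, j), d in edges.items():
    f[i][j] = f[j][i] = d

# Backward recursion (6): J[k][i] = min_x f(i, x) + J[k+1][x],
# with J[N-1][i] = f(i, t) at the last stage.
J = [[INF] * N for _ in range(N)]
J[N - 1] = [f[i][t] for i in range(N)]
for k in range(N - 2, 0, -1):
    J[k] = [min(f[i][x] + J[k + 1][x] for x in range(N)) for i in range(N)]
print(J[1][s])             # length of the shortest path from s to t

# Path recovery via (10), skipping repeated nodes as described in the text.
path, k = [s], 1
while path[-1] != t and k < N - 1:
    i = path[-1]
    nxt = min(range(N), key=lambda x: f[i][x] + J[k + 1][x])
    if nxt != i:           # a repeated node means "stay put" for one stage
        path.append(nxt)
    k += 1
if path[-1] != t:
    path.append(t)         # last stage: the control space is {t}
print(path)                # shortest path as a list of nodes
```

Storing the whole table J (N² values) is exactly the memory cost discussed above; the brute-force enumeration needs no such table but far more operations.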