SEEM 3470: Dynamic Optimization and Applications, 2013–14 Second Term
Handout 1: Introduction to Dynamic Programming
Instructor: Shiqian Ma
January 6, 2014

Suggested Reading: Sections 1.1–1.5 of Chapter I of Richard Bellman, Dynamic Programming, Dover Publications, Inc., 2003. Also review material from SEEM 3440: Operations Research II.

1 Dynamic Programming: Introduction and Examples

Operations Research: a science of decision making.

- Operations: activities carried out in an organization to attain its goals, involving decision making among different options (example: shortest path).
- Research: scientific methods to study the operations.
- Operations Research: the development of scientific methods to help people make decisions about activities so as to achieve a specific objective.

Two features:

- Decision making: which path to take?
- Achieving some objective, e.g., maximizing profits or minimizing costs.

Two types of models:

- Deterministic model: all information and data are deterministic. Example: producing chairs and tables from two materials.
- Stochastic model: some information and data are stochastic. Example: the lifespan of a USB drive is random; when should I replace it?

Where is operations research used?

- Airlines: scheduling aircraft and crews (minimum number of crews).
- Logistics and supply chain: inventory control (how many units to order, given demand, ordering cost, and inventory cost).
- Revenue management: pricing (e.g., a retailer selecting which products to display).
- Financial industry: portfolio selection, asset allocation.
- Civil engineering: traffic analysis and transportation system design (routes and frequencies of buses, emergency evacuation systems).

Dynamic programming is multi-stage optimization: we take advantage of the new information revealed in each stage to make a new decision. Examples:
- Scheduling (shortest path)
- Inventory control
- Two-game chess match
- Machine replacement

2 Basic Terminologies in Optimization

An optimization problem typically takes the form

    minimize f(x)
    subject to x ∈ X.        (P)

Here, f : R^n → R is called the objective function, and X ⊆ R^n is called the feasible region. Thus, x = (x_1, ..., x_n) is an n-dimensional vector, and we shall agree that it is represented in column form; in other words, we treat x as an n × 1 matrix. The entries x_1, ..., x_n are called the decision variables of (P). If X = R^n, then (P) is called an unconstrained optimization problem. Otherwise, it is called a constrained optimization problem. As the above formulation suggests, we are interested in an optimal solution to (P), which is defined as a point x* ∈ X such that f(x*) ≤ f(x) for all x ∈ X. We call f(x*) the optimal value of (P).

To illustrate the above concepts, let us consider the following example:

Example 1. Suppose that f : R^2 → R is given by f(x_1, x_2) = x_1^2 + 2x_2^2, and X = {(x_1, x_2) ∈ R^2 : 0 ≤ x_1 ≤ 1, 1 ≤ x_2 ≤ 3}. Then, we can write (P) as

    minimize x_1^2 + 2x_2^2
    subject to 0 ≤ x_1 ≤ 1, 1 ≤ x_2 ≤ 3.        (P)

This is a constrained optimization problem, and it is easy to verify that f(x_1, x_2) ≥ f(0, 1) = 2 for all (x_1, x_2) ∈ X. Thus, we say that (0, 1) is an optimal solution to (P), and f(0, 1) = 2 is the optimal value. It is worth computing the derivative of f at (0, 1):

    ∇f(x_1, x_2) = [2x_1, 4x_2]^T, so ∇f(0, 1) = [0, 4]^T ≠ [0, 0]^T.

This shows that for a constrained optimization problem, the derivative at the optimal solution need not be zero.

The different structures of f and X in (P) give rise to different classes of optimization problems. Some important classes include:

1. discrete optimization problems, when the set X consists of countably many points;
2. linear optimization problems, when f takes the form a_1 x_1 + a_2 x_2 + ... + a_n x_n for some given a_1, ..., a_n, and X is a set defined by linear inequalities;

3. nonlinear optimization problems, when f is nonlinear or X cannot be defined by linear inequalities alone;

4. stochastic optimization problems, when f takes the form

    f(x) = E_Z[F(x, Z)],

where Z is a random parameter.

To illustrate the above concepts, let us consider the following problem, which will serve as our running example:

Resource Allocation Problem. Suppose that we have an initial wealth of S_0 dollars, and we want to allocate it to two investment options. By allocating x_0 dollars to the first option, one earns a return of g(x_0). The remaining S_0 − x_0 dollars will earn a return of h(S_0 − x_0). Here, we are assuming that 0 ≤ x_0 ≤ S_0, so that we are not borrowing extra money to fund our investments. Now, a natural goal is to choose the allocation amount x_0 to maximize our total return, which is given by

    f(x_0) = g(x_0) + h(S_0 − x_0).

In our notation, the resource allocation problem is nothing but the following optimization problem:

    maximize g(x_0) + h(S_0 − x_0)
    subject to x_0 ∈ X = [0, S_0].        (RAP)

Consider the following scenarios:

1. Suppose that both g and h are linear, i.e., g(x) = ax + b and h(x) = cx + d for some a, b, c, d ∈ R. Then, (RAP) becomes

    maximize (a − c)x_0 + b + d + cS_0
    subject to x_0 ∈ [0, S_0],        (RAP-L)

which is a linear optimization problem. In this case, the optimal solution to (RAP-L) can be determined explicitly. Indeed, if a − c ≥ 0, then it is profitable to make x_0 as large as possible, and hence the optimal solution is x_0* = S_0. On the other hand, if a − c < 0, then a similar argument shows that the optimal solution should be x_0* = 0.

Suppose that we change the constraint in (RAP-L) from x_0 ∈ [0, S_0] to

    x_0 ∈ X = {0, S_0/M, 2S_0/M, ..., S_0},

where M ≥ 2 is some integer. Then, the problem becomes a discrete optimization problem, as the feasible region X now consists of only a finite number of points.

2.
Suppose that g(x) = a log x and h(x) = b log x for some a, b > 0. Then, (RAP) becomes

    maximize a log x_0 + b log(S_0 − x_0)        (RAP-LOG)
which is a nonlinear optimization problem. Observe that if x_0* is an optimal solution to (RAP-LOG), then we must have 0 < x_0* < S_0; in other words, the boundary points x_0 = 0 and x_0 = S_0 cannot be optimal for (RAP-LOG). This implies that the optimal solution x_0* can be found by differentiating the objective function and setting the derivative to zero; i.e., x_0* satisfies

    df/dx_0 = a/x_0 − b/(S_0 − x_0) = 0.

In particular, we obtain x_0* = aS_0/(a + b).

3. Let Z be a random variable with

    Pr(Z = 1) = 1/4,  Pr(Z = −1) = 3/4.

Consider the functions G and g defined by

    G(x, Z) = Zx + b,  g(x) = E_Z[G(x, Z)],

where b ∈ R is a given constant. Furthermore, suppose that h(x) = cx + d, where c, d ∈ R are given. Then, (RAP) becomes

    maximize E_Z[G(x, Z)] + cx + d        (RAP-S)

which is a stochastic optimization problem. Note that by the definition of expectation, we have

    E_Z[G(x, Z)] = G(x, 1) Pr(Z = 1) + G(x, −1) Pr(Z = −1)
                 = (1/4)(x + b) + (3/4)(−x + b)
                 = −(1/2)x + b

for any x. Hence, (RAP-S) can be written as

    maximize (c − 1/2)x + b + d,

which is a simple linear optimization problem.

3 Introduction to Dynamic Programming

Observe that all the optimization problems introduced in the previous section involve only a one-stage decision, namely, to choose a point x in the feasible region X to optimize an objective function f. However, in reality, information is often released in stages, and we are allowed to take advantage of the new information in each stage to make a new decision. This gives rise to multi-stage optimization problems, which we shall refer to as dynamic programming or dynamic optimization problems.
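Before moving on to the multi-stage setting, the closed-form answers in scenarios 2 and 3 above are easy to verify numerically. The sketch below is our own sanity check, with parameter values of our own choosing (a = 2, b = 3, S_0 = 10); it is not part of the handout's material.

```python
import numpy as np

# Hypothetical parameters for the one-stage scenarios (our choice, for illustration only)
a, b, S0 = 2.0, 3.0, 10.0

# Scenario 2: g(x) = a*log(x), h(x) = b*log(x); closed form x0* = a*S0/(a + b)
def total_return(x0):
    return a * np.log(x0) + b * np.log(S0 - x0)

grid = np.linspace(1e-6, S0 - 1e-6, 100001)   # avoid log(0) at the boundary
x0_numeric = grid[np.argmax(total_return(grid))]
x0_closed = a * S0 / (a + b)
print(x0_numeric, x0_closed)                  # both near 4.0

# Scenario 3: E_Z[Zx + b] with Pr(Z = 1) = 1/4, Pr(Z = -1) = 3/4 collapses to -x/2 + b
b_const, x = 1.0, 5.0
expectation = 0.25 * (x + b_const) + 0.75 * (-x + b_const)
print(expectation, -0.5 * x + b_const)        # identical, by the calculation above
```

The grid maximizer agrees with x_0* = aS_0/(a + b) = 4 to within the grid spacing, and the two-point expectation matches the linear form derived in scenario 3.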
Before we introduce the theory of dynamic programming, let us study an example and understand some of the difficulties of dynamic optimization. Consider a two-stage generalization of the resource allocation problem, in which the first stage proceeds as before. However, as a price of obtaining the return g(x_0), the original allocation x_0 to the first option is reduced to ax_0, where 0 < a < 1. Similarly, the allocation S_0 − x_0 for obtaining the return h(S_0 − x_0) is reduced to b(S_0 − x_0), where 0 < b < 1. In particular, at the end of the first stage, the available wealth for investment in the next stage is

    S_1 = ax_0 + b(S_0 − x_0).

Now, in the second stage, one can again split the S_1 dollars between the two investment options, obtaining a return of g(x_1) + h(S_1 − x_1) if x_1 dollars are allocated to the first option and the remaining amount S_1 − x_1 is allocated to the second option. The goal now is to choose the allocation amounts x_0 and x_1 in both stages to maximize the total return

    f_{S_0}(x_0, x_1) = g(x_0) + h(S_0 − x_0) + g(x_1) + h(S_1 − x_1).

In other words, we can formulate the two-stage resource allocation problem as follows:

    maximize g(x_0) + h(S_0 − x_0) + g(x_1) + h(S_1 − x_1)
    subject to 0 ≤ x_0 ≤ S_0, 0 ≤ x_1 ≤ S_1,
               S_1 = ax_0 + b(S_0 − x_0).        (RAP-2)

Of course, there is no reason to stop at a two-stage problem. By iterating the above process, we obtain an N-stage resource allocation problem, where at the end of the k-th stage (for k = 1, 2, ..., N − 1), the available wealth is

    S_k = ax_{k−1} + b(S_{k−1} − x_{k−1}),

where x_{k−1} is the amount allocated to the first option in the k-th stage. Mathematically, the N-stage problem can be formulated as follows:

    maximize Σ_{k=0}^{N−1} [ g(x_k) + h(S_k − x_k) ]
    subject to 0 ≤ x_k ≤ S_k for k = 0, 1, ..., N − 1,
               S_k = ax_{k−1} + b(S_{k−1} − x_{k−1}) for k = 1, ..., N − 1.        (RAP-N)

Now, an important question is: how would one solve (RAP-N)? If g and h are linear, then (RAP-N) is a linear optimization problem, and hence it can in principle be solved by, say, the simplex method.
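To get a concrete feel for the brute-force alternative, the sketch below solves a small two-stage instance by discretizing both decision variables. All parameter values, and the choice g(x) = A·log(1 + x) (shifted so that the return is finite at x = 0), are our own illustrative assumptions, not from the handout. With M grid points per stage, the analogous N-stage enumeration needs on the order of M^N objective evaluations, which quickly becomes impractical.

```python
import math

# Illustrative parameters (our choice): return coefficients and wealth-reduction factors
A_RET, B_RET = 2.0, 3.0          # g(x) = A_RET*log(1+x), h(x) = B_RET*log(1+x)
a, b = 0.6, 0.5                  # wealth reduction factors, 0 < a, b < 1
S0 = 10.0

def g(x): return A_RET * math.log(1.0 + x)
def h(x): return B_RET * math.log(1.0 + x)

M = 400                          # grid points per stage; N stages would cost ~M**N work
best = -float("inf")
for i in range(M + 1):
    x0 = S0 * i / M
    S1 = a * x0 + b * (S0 - x0)  # wealth available in the second stage
    stage1 = g(x0) + h(S0 - x0)
    for j in range(M + 1):
        x1 = S1 * j / M
        best = max(best, stage1 + g(x1) + h(S1 - x1))
print(round(best, 4))
```

Even for N = 2 the double loop already costs (M + 1)^2 evaluations; this is the scaling problem that the dynamic programming recursion developed below is designed to avoid.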
However, the problem becomes more difficult if g and h are nonlinear. One possibility is to use calculus. Towards that end, suppose that the optimal solution (x_0*, x_1*, ..., x_{N−1}*) to (RAP-N) satisfies 0 < x_k* < S_k for k = 0, 1, ..., N − 1. Let

    f_{S_0}(x_0, x_1, ..., x_{N−1}) = Σ_{k=0}^{N−1} [ g(x_k) + h(S_k − x_k) ].

Then, we set all the partial derivatives of f_{S_0} to zero and solve for x_0, x_1, ..., x_{N−1}:

    ∂f_{S_0}/∂x_{N−1} = g′(x_{N−1}) − h′(S_{N−1} − x_{N−1}) = 0,
    ∂f_{S_0}/∂x_{N−2} = g′(x_{N−2}) − h′(S_{N−2} − x_{N−2}) + (a − b)h′(S_{N−1} − x_{N−1}) = 0,
    ⋮
This approach requires us to solve a system of N nonlinear equations in N unknowns, which in general is not an easy task. Worse yet, we also have to check the boundary points x_k = 0 and x_k = S_k for optimality.

Fortunately, not all is lost. Observe that the above approach does not take into account the sequential nature of the problem, i.e., that the allocations x_0, x_1, ..., x_{N−1} should be determined sequentially. This motivates us to consider approaches that can take advantage of such a structure. Towards that end, observe that the maximum total return of the N-stage resource allocation problem depends only on N and the initial wealth S_0. Hence, we can define a function q_N by

    q_N(S_0) = max { f_{S_0}(x_0, x_1, ..., x_{N−1}) : 0 ≤ x_k ≤ S_k for k = 0, 1, ..., N − 1 }.        (1)

In words, q_N(S_0) is the maximum return of the N-stage resource allocation problem when the initial wealth is S_0. For instance, we have

    q_1(S_0) = max { g(x_0) + h(S_0 − x_0) : 0 ≤ x_0 ≤ S_0 },        (2)

which coincides with (RAP). Now, although we can use the definition of q_2(S_0) as given in (1), we can also express it in terms of q_1. To see this, recall that the total return of the 2-stage problem is the first-stage return plus the second-stage return. Clearly, whatever we choose the first-stage allocation x_0 to be, the wealth available at the end of the first stage, i.e., S_1 = ax_0 + b(S_0 − x_0), must be allocated optimally in the second stage if we wish to maximize the total return. Thus, if x_0 is our allocation in the first stage, then we will obtain a return of q_1(S_1) in the second stage by choosing x_1 optimally. It follows that

    q_2(S_0) = max { g(x_0) + h(S_0 − x_0) + q_1(ax_0 + b(S_0 − x_0)) : 0 ≤ x_0 ≤ S_0 }.        (3)

More generally, by the same idea, we obtain the following recurrence relation for q_N(S_0):

    q_N(S_0) = max { g(x_0) + h(S_0 − x_0) + q_{N−1}(ax_0 + b(S_0 − x_0)) : 0 ≤ x_0 ≤ S_0 }.        (4)

An important feature of (4) is that it involves only one decision variable (namely, x_0), as opposed to the N decision variables x_0, x_1, ..., x_{N−1} in the definition of q_N(S_0) given by (1). Now, starting with q_1(S_0) as given by (2), we can use (3) to compute q_2(S_0), which in turn can be used to compute q_3(S_0), and so on using (4). Thus, the recurrence (4) allows us to turn the original N-variable formulation (RAP-N) into N one-dimensional problems. We shall see the computational advantage of such a formulation later in the course.

As an illustration, consider the following example:

Example 2. Consider the 2-stage resource allocation problem, where g(x) = a log x and h(x) = b log x for some a, b > 0, and the initial wealth is S_0. Recall that the maximum total return of this problem is given by

    q_2(S_0) = max { g(x_0) + h(S_0 − x_0) + q_1(ax_0 + b(S_0 − x_0)) : 0 ≤ x_0 ≤ S_0 }.

To determine q_2(S_0), we start with q_1(S_1), where S_1 = ax_0 + b(S_0 − x_0). By definition, we have

    q_1(S_1) = max { a log x + b log(S_1 − x) : 0 ≤ x ≤ S_1 }.

Observe that the optimal solution x* of q_1(S_1) must satisfy 0 < x* < S_1. Hence, by differentiating the objective function and setting the derivative to zero, we obtain

    a/x − b/(S_1 − x) = 0  ⟹  x* = (a/(a + b)) S_1.
In particular,

    q_1(S_1) = a log(rS_1) + b log((1 − r)S_1),  where r = a/(a + b).

Upon substituting this into q_2(S_0), we have

    q_2(S_0) = max { a log x_0 + b log(S_0 − x_0) + a log(rS_1) + b log((1 − r)S_1) : 0 ≤ x_0 ≤ S_0 }.

Again, the optimal solution x_0* of q_2(S_0) must satisfy 0 < x_0* < S_0. Hence, by differentiating the objective function and setting the derivative to zero, we have

    a/x_0 − b/(S_0 − x_0) + a(a − b)/(ax_0 + b(S_0 − x_0)) + b(a − b)/(ax_0 + b(S_0 − x_0)) = 0.

This is just a quadratic equation in x_0, and hence the optimal solution x_0* can be found easily. We leave this as an exercise to the reader.
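The recurrence (4) is easy to check numerically for Example 2. The sketch below computes q_2(S_0) in two ways: via the one-variable recursion, using the closed form for q_1 derived above, and via brute force over both decision variables (x_0, x_1), as in definition (1); the two answers agree. Parameter values are our own illustrative choice, and we write alpha, beta for the wealth-reduction factors to avoid clashing with the log coefficients a, b.

```python
import math

# Illustrative parameters (our choice): log coefficients and wealth-reduction factors
a, b, S0 = 2.0, 3.0, 10.0
alpha, beta = 0.6, 0.5

r = a / (a + b)

def q1(S):
    # closed form derived above: q1(S) = a*log(r*S) + b*log((1 - r)*S)
    return a * math.log(r * S) + b * math.log((1 - r) * S)

def S_next(x0):
    # wealth available in the second stage
    return alpha * x0 + beta * (S0 - x0)

# One-dimensional maximization in the recurrence (4), over a fine grid of x0
M = 20000
q2_dp = max(a * math.log(x0) + b * math.log(S0 - x0) + q1(S_next(x0))
            for x0 in (S0 * i / M for i in range(1, M)))

# Direct brute force over both decision variables, as in definition (1)
K = 1000
q2_bf = -float("inf")
for i in range(1, K):
    x0 = S0 * i / K
    S1 = S_next(x0)
    first = a * math.log(x0) + b * math.log(S0 - x0)
    for j in range(1, K):
        x1 = S1 * j / K
        q2_bf = max(q2_bf, first + a * math.log(x1) + b * math.log(S1 - x1))
print(round(q2_dp, 3), round(q2_bf, 3))
```

The recursion does roughly O(M) work per stage (given the previous value function), while the direct formulation needs O(K^2) evaluations for just two stages; this gap is exactly the computational advantage of (4).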