ECON 582: Dynamic Programming (Chapter 6, Acemoglu) Instructor: Dmytro Hryshko

Indirect Utility

Recall static consumer theory: $J$ goods, $p_j$ is the price of good $j$ ($j = 1, \ldots, J$), $c_j$ is consumption of good $j$ ($j = 1, \ldots, J$), and $I$ is income (a scalar). The consumer solves
$$\max_{c_1 \geq 0, \ldots, c_J \geq 0} u(c_1, \ldots, c_J) \quad \text{s.t.} \quad \sum_{j=1}^{J} p_j c_j = I.$$
Denote the Lagrange multiplier on the budget constraint by $\lambda$. The F.O.C. with respect to each $c_j$ is $u_j(c_1, \ldots, c_J) = \lambda p_j$, where $u_j$ is the first derivative of $u$ with respect to $c_j$.

The optimal $c_j$ should be a function of $p_1, \ldots, p_J$ and $I$: $c_j^* = c_j^*(p_1, \ldots, p_J, I)$. The Lagrangian of the problem at the optimum is
$$V(p_1, \ldots, p_J, I) = u\big(c_1^*(p_1, \ldots, p_J, I), \ldots, c_J^*(p_1, \ldots, p_J, I)\big) + \lambda \Big(I - \sum_{j=1}^{J} p_j c_j^*(p_1, \ldots, p_J, I)\Big).$$
$V(p_1, \ldots, p_J, I)$ is an indirect utility function: the maximized level of utility from the current "state," $p_1, \ldots, p_J, I$.

What happens if $I$ increases marginally? Overall utility will increase by
$$V_I(p_1, \ldots, p_J, I) = \sum_{j=1}^{J} \underbrace{(u_j - \lambda p_j)}_{=0 \text{ by F.O.C.}} \frac{\partial c_j^*}{\partial I} + \lambda = \lambda = \frac{u_j(c_1^*, \ldots, c_J^*)}{p_j}, \quad \forall j.$$
This is an application of the Envelope theorem.

Thus, the indirect utility function summarizes the value of the household's problem and allows us to determine the marginal value of income without knowing the optimal consumption functions. We need to do better, though: this theory is static, and it ignores savings and uncertainty about the future.
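As a quick numerical check of the envelope result $V_I = \lambda$ (an illustration added here, not part of the original slides), consider Cobb-Douglas utility $u(c_1, c_2) = a \log c_1 + (1-a) \log c_2$, for which the demands are $c_1^* = aI/p_1$ and $c_2^* = (1-a)I/p_2$, and the multiplier is $\lambda = 1/I$:

```python
import numpy as np

# Check V_I = lambda for u(c1, c2) = a*log(c1) + (1-a)*log(c2):
# demands are c1* = a*I/p1, c2* = (1-a)*I/p2, and lambda = 1/I.
def indirect_utility(p1, p2, I, a=0.3):
    c1, c2 = a * I / p1, (1 - a) * I / p2        # optimal demands
    return a * np.log(c1) + (1 - a) * np.log(c2)

p1, p2, I, h = 2.0, 3.0, 10.0, 1e-6
# numerical derivative of the maximized objective with respect to income
V_I = (indirect_utility(p1, p2, I + h) - indirect_utility(p1, p2, I - h)) / (2 * h)
print(V_I, 1.0 / I)                              # both approx 0.1
```

The two printed numbers agree: the derivative of the maximized objective with respect to income equals the multiplier, with no need to differentiate the demand functions.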

Discrete-time infinite-horizon optimization

$$\sup_{\{y(t)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t \tilde{U}(t, x(t), y(t))$$
s.t. $y(t) \in \tilde{G}(t, x(t))$, $\forall t \geq 0$; $x(t+1) = \tilde{f}(t, x(t), y(t))$, $\forall t \geq 0$; $x(0)$ given.

$\beta \in [0, 1)$ is the time discount factor; $t = 0, 1, \ldots$ is time; $x(t) \in X \subset \mathbb{R}^{K_x}$, $y(t) \in Y \subset \mathbb{R}^{K_y}$, $K_x, K_y \geq 1$. $x(t)$ are the state variables and $y(t)$ the control variables at time $t$. $\tilde{U} : \mathbb{Z}_+ \times X \times Y \to \mathbb{R}$ is the instantaneous payoff function; $\sum_{t=0}^{\infty} \beta^t \tilde{U}(t, x(t), y(t))$ is the objective function.

Eliminate $y(t)$ and write the optimization problem as Problem A0:
$$V(0, x(0)) = \sup_{\{x(t+1)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t U(t, x(t), x(t+1))$$
s.t. $x(t+1) \in G(t, x(t))$, $\forall t \geq 0$; $x(0)$ given.

$x(t)$ is the state vector and $x(t+1)$ is the control vector at time $t$. $G : \mathbb{Z}_+ \times X \rightrightarrows X$ is the constraint correspondence; the value function $V : \mathbb{Z}_+ \times X \to \mathbb{R}$ is the supremum (highest possible value) the objective function can attain, starting with some $x(0)$ at time 0. "sup" is used to denote that there is no guarantee that the maximal value is attained by any feasible plan; otherwise we use the "max" operator.

When the maximal value is attained by some sequence $\{x^*(t+1)\}_{t=0}^{\infty}$, it is called an optimal plan. $V(t, x)$ is the value function: the value of pursuing the optimal strategy starting with initial state $x$ at time $t$. We want to characterize the optimal plan $\{x^*(t+1)\}_{t=0}^{\infty}$ and the value function $V(0, x(0))$.

Example. The optimal growth problem:
$$\max_{\{c(t), k(t+1)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u(c(t))$$
s.t. $y(t) = f(k(t)) = c(t) + i(t)$; $k(t+1) = k(t) - \delta k(t) + i(t)$; $k(t) \geq 0$; $k(0) > 0$ given, where $\delta$ is the depreciation rate and $i(t)$, $k(t)$ and $c(t)$ are investment, capital, and consumption per capita. Plug $i(t) = f(k(t)) - c(t)$ from the first constraint into the second to obtain the law of motion of capital:
$$k(t+1) = f(k(t)) + (1 - \delta)k(t) - c(t).$$

Mapping into the general formulation: $x(t) = k(t)$, $x(t+1) = k(t+1)$. From the law of motion of capital, $c(t) = f(k(t)) - k(t+1) + (1 - \delta)k(t)$. The objective function becomes
$$\max_{\{k(t+1)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t u\left[f(k(t)) - k(t+1) + (1 - \delta)k(t)\right].$$
Thus, $U(t, x(t), x(t+1)) \equiv U(t, k(t), k(t+1)) = u\left[f(k(t)) - k(t+1) + (1 - \delta)k(t)\right]$.

The constraint correspondence $G(t, k(t))$ is given by $k(t+1) \in [0, f(k(t)) + (1 - \delta)k(t)]$. The lower bound is obtained when $c(t) = f(k(t)) + (1 - \delta)k(t)$; the upper bound is obtained when $c(t) = 0$. Note that $U$ and $G$ do not explicitly depend on time. A stationary problem is one whose objective function is a discounted sum and whose $U$ and $G$ do not explicitly depend on time.

Stationary dynamic programming

Problem A1:
$$V(x(0)) = \sup_{\{x(t+1)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t U(x(t), x(t+1))$$
s.t. $x(t+1) \in G(x(t))$, $\forall t \geq 0$; $x(0)$ given.

Note that there is no time argument in $V$, $G$, and $U$; e.g., $U : X \times X \to \mathbb{R}$. Problems A0 and A1 are called sequence problems: they involve choosing an infinite sequence $\{x(t)\}_{t=0}^{\infty}$.

Dynamic programming: turn the sequence problem into a functional equation; that is, transform the problem into finding a function rather than a sequence. Problem A2 (Bellman equation):
$$V(x) = \sup_{y \in G(x)} \left[U(x, y) + \beta V(y)\right], \quad \forall x \in X,$$
where $V : X \to \mathbb{R}$.

Notes: Intuitively, instead of choosing $\{x(t)\}_{t=0}^{\infty}$, we choose a policy, which determines the value of the control $x(t+1)$ for a given value of the state $x(t)$. Since $U(\cdot, \cdot)$ does not explicitly depend on $t$, the policy is time-independent. The state is $x$, the control is $y$: we want to choose $y \in G(x)$ for any $x$, i.e., to maximize the right-hand side for any $x$. $V$ appears on both sides of the functional equation; hence the problem is said to be in recursive formulation.

Benefits of the recursive formulation: it parallels the logic of comparing today to tomorrow: $U(x, y)$ is the "return for today," and $\beta V(y)$ is the continuation return from the next period onwards (the "return for tomorrow"). Thus, we can use our intuitions from two-period maximization problems. Consider Problem A1 and a maximum attained from $x(0)$, denoted $\{x^*(t)\}_{t=0}^{\infty}$, with $x^*(0) = x(0)$.

Under some technical conditions,
$$V(x(0)) = \sum_{t=0}^{\infty} \beta^t U(x^*(t), x^*(t+1)) = U(x^*(0), x^*(1)) + \beta \sum_{s=0}^{\infty} \beta^s U(x^*(s+1), x^*(s+2)) = U(x^*(0), x^*(1)) + \beta V(x^*(1)).$$
The Principle of Optimality: the optimal plan can be broken into two parts, what is optimal to do today and the optimal continuation path.

For stationary dynamic programming, the solution can be represented by a time-invariant policy function $\pi : X \to X$; that is, $y(t) = x(t+1) = \pi(x(t))$, $\forall t$, or $y = x' = \pi(x)$, where $x'$ is the value of $x$ attained in the next period.

For the optimal policy function $y = \pi(x)$ it must be the case that
$$V(x) = U(x, \pi(x)) + \beta V(\pi(x)), \quad \forall x \in X.$$

Stationary dynamic programming theorems

Consider a sequence $\{x^*(t)\}_{t=0}^{\infty}$ that attains the supremum in Problem A1. Our main purpose is to ensure that this sequence satisfies the recursive formulation A2:
$$V(x^*(t)) = U(x^*(t), x^*(t+1)) + \beta V(x^*(t+1)), \quad \forall t = 0, 1, \ldots$$
We also want any solution to A2 to be a solution to A1 (to attain the supremum of the sequence problem). To this end, we need to make several assumptions.

List of assumptions

- $G(x)$ is nonempty for all $x \in X$; and for all $x(0) \in X$, $\lim_{n \to \infty} \sum_{t=0}^{n} \beta^t U(x(t), x(t+1))$ exists and is finite (the attained value of the problem is a finite number) (*).
- $X$ is compact, $G$ is compact-valued and continuous, and $U$ is continuous.
- $U$ is concave.
- For each $y \in X$, $U(x, y)$ is strictly increasing in $x$ (which can be a vector).
- $U$ is continuously differentiable on the interior of its domain.

Principle of optimality

Let assumption (*) hold, and let $x^* = \{x(0), x^*(1), x^*(2), \ldots\}$ be a feasible plan that attains $V(x(0))$ in Problem A1. Then
$$V(x^*(t)) = U(x^*(t), x^*(t+1)) + \beta V(x^*(t+1)),$$
for all $t = 0, 1, \ldots$, with $x^*(0) = x(0)$. Moreover, if any feasible $x^* = \{x(0), x^*(1), x^*(2), \ldots\}$ satisfies the above equation, then it attains the optimal value in Problem A1.

Comments

The optimal plan can be broken into two parts: the current return, $U(x^*(t), x^*(t+1))$, and the continuation return $\beta V(x^*(t+1))$, the discounted value of the problem starting from state $x^*(t+1)$ onwards. We can go from the solution of the recursive problem to the solution of the original sequence problem. Under certain assumptions, we can show that the policy function $x^*(t+1) = \pi(x^*(t))$ is continuous, and that the value function $V : X \to \mathbb{R}$ is concave, monotone, and differentiable.

Basic equations

The functional equation of Problem A2:
$$V(x) = \max_{y \in G(x)} \left[U(x, y) + \beta V(y)\right], \quad \text{for all } x \in X.$$
The F.O.C. of the right-hand side with respect to $y$:
$$\frac{\partial U(x, y^*)}{\partial y} + \beta V'(y^*) = 0.$$
At the optimum, $V(x) = U(x, y^*) + \beta V(y^*)$. Differentiate the value function to obtain
$$V'(x) = \underbrace{\left[\frac{\partial U(x, y^*)}{\partial y} + \beta V'(y^*)\right] \frac{\partial y^*}{\partial x}}_{=0 \text{ by F.O.C.}} + \frac{\partial U(x, y^*)}{\partial x} = \frac{\partial U(x, y^*)}{\partial x}.$$
This is an application of the Envelope theorem.

Let's take the F.O.C. and the envelope condition at time $t$:
$$\frac{\partial U(x(t), x(t+1))}{\partial x(t+1)} + \beta V'(x(t+1)) = 0,$$
$$V'(x(t)) = \frac{\partial U(x(t), x(t+1))}{\partial x(t)}.$$
From the second equation,
$$V'(x(t+1)) = \frac{\partial U(x(t+1), x(t+2))}{\partial x(t+1)}.$$
Plug this result into the first equation to obtain the Euler equation:
$$\frac{\partial U(x^*(t), x^*(t+1))}{\partial x(t+1)} + \beta \frac{\partial U(x^*(t+1), x^*(t+2))}{\partial x(t+1)} = 0.$$
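To see what this delivers in the growth example above (a worked instance added here for concreteness), take $U(k(t), k(t+1)) = u\big(f(k(t)) + (1-\delta)k(t) - k(t+1)\big)$. Then
$$\frac{\partial U(k(t), k(t+1))}{\partial k(t+1)} = -u'(c(t)), \qquad \frac{\partial U(k(t+1), k(t+2))}{\partial k(t+1)} = u'(c(t+1))\left[f'(k(t+1)) + 1 - \delta\right],$$
and the Euler equation becomes the familiar consumption Euler equation
$$u'(c(t)) = \beta u'(c(t+1))\left[f'(k(t+1)) + 1 - \delta\right].$$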

In infinite-horizon problems, we also need the transversality condition to hold to ensure optimality:
$$\lim_{t \to \infty} \beta^t \frac{\partial U(x^*(t), x^*(t+1))}{\partial x(t)} \, x^*(t) = 0.$$

Example

Consider the following optimal growth problem with log preferences, Cobb-Douglas technology, and full depreciation of the capital stock:
$$\max_{\{c(t), k(t+1)\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^t \log(c(t))$$
s.t. $k(t+1) = k(t)^{\alpha} - c(t)$; $k(0) > 0$ given.

Recursive formulation:
$$V(k) = \max_{0 \leq k' \leq k^{\alpha}} \left[\log(k^{\alpha} - k') + \beta V(k')\right],$$
where $k$ is the current level of physical capital and $k'$ is the amount of physical capital in the next period. In terms of our previous notation, $x \equiv k$, $y \equiv k'$, $U(x, y) \equiv \log(k^{\alpha} - k')$. We want to find the optimal policy function $y = \pi(x)$, or $k' = \pi(k)$. It determines the level of tomorrow's capital stock as a function of today's capital stock. Consumption is then given by $c = k^{\alpha} - k'$ in each period.

The F.O.C. of the right-hand side is
$$-\frac{1}{k^{\alpha} - k'} + \beta V'(k') = 0.$$
The Envelope condition is
$$V'(k) = \frac{\alpha k^{\alpha - 1}}{k^{\alpha} - k'}.$$
Iterating the envelope condition forward,
$$V'(k') = \frac{\alpha (k')^{\alpha - 1}}{(k')^{\alpha} - \hat{k}},$$
where $\hat{k}$ is the capital stock two periods ahead. Plugging this result into the first equation, our Euler equation is
$$\frac{1}{k^{\alpha} - k'} = \beta \frac{\alpha (k')^{\alpha - 1}}{(k')^{\alpha} - \hat{k}}.$$

This equation is equivalent to the familiar Euler equation $u'(c(t)) = \beta R(t+1) u'(c(t+1))$, where $R(t+1) = \alpha (k(t+1))^{\alpha - 1}$. Since $k'$ is a function of $k$, and $\hat{k}$ is a function of $k'$, the Euler equation is a functional equation. Let's guess and verify that $k' = a k^{\alpha}$ (invest a constant fraction, $a$, of output every period). Plug it into the Euler equation:
$$\frac{1}{k^{\alpha} - a k^{\alpha}} = \beta \frac{\alpha (a k^{\alpha})^{\alpha - 1}}{(a k^{\alpha})^{\alpha} - a^{1+\alpha} k^{\alpha^2}} = \beta \frac{\alpha}{a k^{\alpha} - a^2 k^{\alpha}}.$$
For the LHS and RHS to be equal, we require $a = \alpha\beta$. Thus, $k(t+1) = \alpha\beta k(t)^{\alpha}$ and $c(t) = k(t)^{\alpha} - \alpha\beta k(t)^{\alpha} = (1 - \alpha\beta) k(t)^{\alpha}$. The steady state will occur when $k^* = k(t+1) = k(t)$, or when $k^* = (\alpha\beta)^{\frac{1}{1-\alpha}}$.
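One step worth making explicit (added here; the slides skip it) is that the guessed solution also satisfies the transversality condition. Along $k(t+1) = \alpha\beta k(t)^{\alpha}$, consumption is $c(t) = (1 - \alpha\beta)k(t)^{\alpha}$, so
$$\beta^t \frac{\partial U(k^*(t), k^*(t+1))}{\partial k(t)} \, k^*(t) = \beta^t \frac{\alpha k^*(t)^{\alpha - 1}}{(1 - \alpha\beta) k^*(t)^{\alpha}} \, k^*(t) = \frac{\alpha}{1 - \alpha\beta} \, \beta^t \to 0$$
as $t \to \infty$, since $\beta < 1$. The candidate plan is therefore optimal.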

More on the transversality condition

Consider the finite-horizon sequence problem:
$$\max_{\{x(t+1)\}_{t=0}^{T}} \sum_{t=0}^{T} \beta^t U(x(t), x(t+1)),$$
with $x(t+1) \geq 0$ and $x(0)$ given. Let $\beta^T U(x(T), x(T+1))$ be the last period's utility (the "salvage value"). Assume interior solutions, $x^*(t) > 0$. The Euler equation:
$$\frac{\partial U(x^*(t), x^*(t+1))}{\partial y} + \beta \frac{\partial U(x^*(t+1), x^*(t+2))}{\partial x} = 0, \quad \text{for } 0 \leq t \leq T - 1.$$

Maximizing the salvage value with respect to $x(T+1)$ yields the following boundary condition:
$$\beta^T \frac{\partial U(x^*(T), x^*(T+1))}{\partial y} \, x^*(T+1) = 0, \qquad x^*(T+1) \geq 0.$$
In the optimal growth problem, $U(x(t), x(t+1)) = u\big(f(k(t)) + (1 - \delta)k(t) - k(t+1)\big)$. At the last date $T$,
$$\beta^T \frac{\partial U(k(T), k(T+1))}{\partial k(T+1)} \, k^*(T+1) = -\beta^T \underbrace{u'(c(T))}_{>0} \, k^*(T+1) = 0,$$
where $c(T) = f(k(T)) + (1 - \delta)k(T) - k(T+1)$.

Thus, $k^*(T+1) = 0$: there should be no capital left at the end of the world. Otherwise, utility could be increased by consuming the leftover resources at the last date or earlier.

Heuristic derivation of the transversality condition

Take the limit of the salvage-value condition:
$$\lim_{T \to \infty} \beta^T \frac{\partial U(x^*(T), x^*(T+1))}{\partial y} \, x^*(T+1) = 0.$$
The Euler equation linking periods $T$ and $T+1$:
$$\beta^T \frac{\partial U(x^*(T), x^*(T+1))}{\partial y} + \beta^{T+1} \frac{\partial U(x^*(T+1), x^*(T+2))}{\partial x} = 0.$$
Thus,
$$\lim_{T \to \infty} \beta^{T+1} \frac{\partial U(x^*(T+1), x^*(T+2))}{\partial x} \, x^*(T+1) = 0,$$
or
$$\lim_{T \to \infty} \beta^T \frac{\partial U(x^*(T), x^*(T+1))}{\partial x} \, x^*(T) = 0.$$

Under certain assumptions, the solution is approached in the limit as $j \to \infty$ of the iteration
$$V^{j+1}(x) = \max_{y \in G(x)} \left[U(x, y) + \beta V^j(y)\right], \quad x \text{ given},$$
where $V^0$ is bounded and continuous.

Computational methods for solving dynamic programming problems

1. Value function iteration: choose $V^0(x) = 0$ for all $x$, and iterate on $V^{j+1}(x) = \max_{y \in G(x)} \left[U(x, y) + \beta V^j(y)\right]$, $x$ given, until $V^j$ converges (a minimal sketch is given below).
2. Guess and verify: guess the policy or value function (we did this above, guessing the policy function).
3. Policy function iteration.
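Here is a minimal value function iteration sketch for the log/Cobb-Douglas example above, added for illustration (it is not from the slides); the parameter values $\alpha = 0.3$, $\beta = 0.95$ and the capital grid are assumptions. It checks the computed policy against the closed form $k' = \alpha\beta k^{\alpha}$ derived earlier:

```python
import numpy as np

# Value function iteration for V(k) = max_{k'} [log(k^alpha - k') + beta*V(k')]
# (log utility, Cobb-Douglas output, full depreciation); parameters are illustrative.
alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.5, 500)       # grid for capital today (k) and tomorrow (k')
c = grid[:, None] ** alpha - grid[None, :]            # c[i, j] = k_i^alpha - k'_j
returns = np.where(c > 0, np.log(np.maximum(c, 1e-300)), -np.inf)  # -inf if infeasible
V = np.zeros(len(grid))                  # initial guess V^0(x) = 0

for _ in range(2000):
    V_new = np.max(returns + beta * V[None, :], axis=1)   # apply the Bellman operator
    if np.max(np.abs(V_new - V)) < 1e-8:                  # sup-norm convergence check
        break
    V = V_new

# Policy: argmax of the converged right-hand side, compared to k' = alpha*beta*k^alpha
policy = grid[np.argmax(returns + beta * V[None, :], axis=1)]
print(np.max(np.abs(policy - alpha * beta * grid ** alpha)))  # small, up to grid error
```

On this grid the discrepancy is on the order of the grid spacing. Since the Bellman operator is a contraction with modulus $\beta$, the error shrinks by a factor $\beta$ per iteration; policy function iteration (method 3) typically reaches the same answer in far fewer passes.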