Lecture 1: Dynamic Programming
Fatih Guvenen
November 2, 2016
Goal

Solve

    V(k, z) = max_{c, k'} { u(c) + β E[V(k', z') | z] }
    s.t.  c + k' = (1 + r) k + z
          z' = ρ z + ε'

Questions:
1. Does a solution exist?
2. Is it unique?
3. If the answers to (1) and (2) are yes: how do we find this solution?
Contraction Mapping Theorem

Definition (Contraction Mapping). Let (S, d) be a metric space and T : S → S be a mapping of S into itself. T is a contraction mapping with modulus β if, for some β ∈ (0, 1),

    d(Tv₁, Tv₂) ≤ β d(v₁, v₂)   for all v₁, v₂ ∈ S.

Contraction Mapping Theorem: Let (S, d) be a complete metric space and suppose that T : S → S is a contraction mapping. Then T has a unique fixed point v* ∈ S such that

    Tv* = v* = lim_{N→∞} T^N v₀   for all v₀ ∈ S.

The beauty of the CMT is that it is a constructive theorem: it not only establishes the existence and uniqueness of v*, it also shows us how to find it!
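The constructive side of the theorem can be seen in a minimal sketch (my addition, not from the slides): T(v) = βv + 1 is a contraction on (ℝ, |·|) with modulus β, so iterating T from any starting guess converges to the unique fixed point v* = 1/(1 − β).

```python
# Successive approximation: iterate v_{n+1} = T(v_n) until the step is tiny.
# For the contraction T(v) = beta*v + 1, the fixed point is v* = 1/(1 - beta),
# reached from ANY initial guess v0.

def iterate_to_fixed_point(T, v0, tol=1e-12, max_iter=100_000):
    """Return the (approximate) fixed point of the contraction T."""
    v = v0
    for _ in range(max_iter):
        v_next = T(v)
        if abs(v_next - v) < tol:
            return v_next
        v = v_next
    raise RuntimeError("did not converge")

beta = 0.9
v_star = iterate_to_fixed_point(lambda v: beta * v + 1.0, v0=-50.0)
# v* = 1/(1 - beta) = 10, regardless of v0
```

The same loop, with T replaced by the Bellman operator, is exactly value function iteration.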
Qualitative Properties of v*

We cannot apply the CMT in certain cases, because the particular set we are interested in is not a complete metric space. The following corollary comes in handy in those cases.

Corollary: Let (S, d) be a complete metric space and T : S → S be a contraction mapping with Tv* = v*.
a. If S' is a closed subset of S and T(S') ⊆ S', then v* ∈ S'.
b. If, in addition, T(S') ⊆ S'' ⊆ S', then v* ∈ S''.

Example: S'' = {continuous, bounded, strictly concave functions} is not a complete metric space, but S' = {continuous, bounded, weakly concave functions} is. So we need to establish that T maps elements of S' into S''.
A Prototype Problem

    V(k, z) = max_{c, k'} { u(c) + β ∫ V(k', z') f(z' | z) dz' }
    s.t.  c + k' = (1 + r) k + z
          z' = ρ z + ε'

The CMT tells us to start with a guess V⁰ and then repeatedly solve the problem on the RHS. But:
- How do we evaluate the conditional expectation?
  - Several non-trivial issues that are easy to underestimate.
- How do we do constrained optimization (especially in multiple dimensions)?

This course will focus on methods that are especially suitable for incomplete-markets/heterogeneous-agent models.
Algorithm 1: STANDARD VALUE FUNCTION ITERATION

1. Set n = 0. Choose an initial guess V⁰ ∈ S.
2. Obtain V^{n+1} by applying the mapping V^{n+1} = T V^n, which entails maximizing the right-hand side of the Bellman equation.
3. Stop if the convergence criterion is satisfied: ‖V^{n+1} − V^n‖ < toler. Otherwise, increase n and return to step 2.
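A bare-bones implementation of Algorithm 1 (my sketch; the grid and parameter values are illustrative, not from the lecture) for the deterministic growth model with log utility and full depreciation, maximizing over a discrete capital grid:

```python
import math

# Standard VFI on a grid. Model: V(k) = max_{k'} log(A k^a - k') + b V(k').
A, alpha, beta = 1.0, 0.3, 0.9
grid = [0.01 + 0.002 * i for i in range(100)]

def bellman(V):
    """One application of the Bellman operator T; returns (TV, policy)."""
    TV, pol = [], []
    for k in grid:
        y = A * k ** alpha                      # resources today
        best_val, best_kp = -math.inf, grid[0]
        for j, kp in enumerate(grid):
            if kp < y:                          # feasibility: c > 0
                val = math.log(y - kp) + beta * V[j]
                if val > best_val:
                    best_val, best_kp = val, kp
        TV.append(best_val)
        pol.append(best_kp)
    return TV, pol

V = [0.0] * len(grid)
for n in range(2000):
    TV, pol = bellman(V)
    if max(abs(a - b) for a, b in zip(TV, V)) < 1e-8:   # sup-norm criterion
        break
    V = TV
# The analytic policy of this model is k' = alpha*beta*A*k^alpha;
# the grid solution should approximate it at interior points.
```

Note the sup-norm stopping rule in step 3; the error-bound slides below discuss how conservative it is.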
Simple Analytical Example: The Neoclassical Growth Model

Consider the special case with log utility, Cobb-Douglas production, and full depreciation:

    V(k) = max_{c, k'} { log c + β V(k') }
    s.t.  c = A k^α − k'

Rewrite the Bellman equation as:

    V(k) = max_{k'} { log(A k^α − k') + β V(k') }

Our goal is to find V(k) and a decision rule g such that k' = g(k).
I. Backward Induction (Brute Force)

If t = T < ∞, in the last period we would have V₀(k) ≡ 0 for all k. Therefore:

    V₁(k) = max_{k'} { log(A k^α − k') + β V₀(k') }
          ⟹  k' = 0  ⟹  V₁(k) = log A + α log k

Substitute V₁ into the RHS to get V₂:

    V₂(k) = max_{k'} { log(A k^α − k') + β (log A + α log k') }

    FOC:  1 / (A k^α − k') = αβ / k'   ⟹   k' = αβ A k^α / (1 + αβ)

Substitute k' back in to obtain V₂. We can keep iterating to find the solution.
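The backward induction above implies a simple recursion on the log-k coefficient: if V_n(k) = a_n + b_n log k, each substitution step maps b_{n+1} = α(1 + β b_n). This tiny check (my addition) confirms it converges to b = α/(1 − αβ), the coefficient found by guess-and-verify on the next slide:

```python
# Backward induction on the log-k coefficient: V_1 gives b_1 = alpha,
# and each substitution step maps b -> alpha * (1 + beta * b).
alpha, beta = 0.3, 0.96
b = alpha                                # b_1 from V_1(k) = log A + alpha log k
for _ in range(200):
    b = alpha * (1.0 + beta * b)
b_limit = alpha / (1.0 - alpha * beta)   # fixed point of the recursion
```

The recursion contracts at rate αβ, so convergence is fast even when β is close to 1.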
II. Guess and Verify (Value Function)

The first method suggests a more direct approach. Note that V₂ is also of the form a + b log k, as was V₁. Conjecture that the solution is V(k) = a + b log k, where a and b are coefficients that need to be determined:

    a + b log k = max_{k'} { log(A k^α − k') + β (a + b log k') }

    FOC:  1 / (A k^α − k') = βb / k'   ⟹   k' = (βb / (1 + βb)) A k^α
II. Guess and Verify (Value Function)

Let LHS = a + b log k. Plug the expression for k' into the RHS:

    RHS = log( A k^α − (βb/(1 + βb)) A k^α ) + βa + βb log( (βb/(1 + βb)) A k^α )
        = (1 + βb) log A + log(1/(1 + βb)) + βa + βb log(βb/(1 + βb)) + α(1 + βb) log k

Imposing the condition that LHS ≡ RHS for all k, we find a and b:

    b = α / (1 − αβ)
    a = (1/(1 − β)) [ log(A(1 − αβ)) + (αβ/(1 − αβ)) log(αβA) ]

We have solved the model!
Guess and Verify as a Numerical Tool

Although this was a very special example, the same logic can be used for numerical analysis.

As long as the true value function is well-behaved (smooth, etc.), we can choose a sufficiently flexible functional form that has a finite (ideally small) number of parameters. Then we can apply the same logic as above and solve for the unknown coefficients, which gives us the complete solution.

Many solution methods rely on various versions of this general idea (perturbation methods, collocation methods, parameterized expectations).
III. Guess and Verify (Policy Functions)

a.k.a. the Euler equation approach. Let the policy rule for savings be k' = g(k). The Euler equation is:

    1 / (A k^α − g(k)) = β α A g(k)^{α−1} / (A g(k)^α − g(g(k)))   for all k,

which is a functional equation in g(k). Guess g(k) = s A k^α and substitute above:

    1 / ((1 − s) A k^α) = β α A (s A k^α)^{α−1} / ((1 − s) A (s A k^α)^α)

As can be seen, k cancels out, and we get s = αβ. With a very flexible choice of g(·), this method too can be used to solve very general models.
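A quick numerical verification (my sketch; the parameter values are arbitrary) that g(k) = αβ·A·k^α satisfies the Euler equation at every k, i.e. that k really does cancel out:

```python
# Verify 1/(A k^a - g(k)) = beta*alpha*A*g(k)^(a-1) / (A g(k)^a - g(g(k)))
# at several arbitrary capital levels for the guess g(k) = s*A*k^a, s = alpha*beta.
alpha, beta, A = 0.36, 0.95, 1.2
s = alpha * beta

def g(k):
    return s * A * k ** alpha

max_err = 0.0
for k in [0.1, 0.7, 2.5]:
    lhs = 1.0 / (A * k ** alpha - g(k))
    rhs = beta * alpha * A * g(k) ** (alpha - 1) / (A * g(k) ** alpha - g(g(k)))
    max_err = max(max_err, abs(lhs / rhs - 1.0))
# max_err sits at floating-point noise: the Euler equation holds for all k
```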
Back to VFI

VFI can be very slow when β ≈ 1. Three ways to accelerate:

1. (Howard's) policy iteration algorithm (together with its modified version)
2. MacQueen-Porteus (MQP) error bounds
3. Endogenous Grid Method (EGM)

In general, basic VFI should never be used without at least one of these add-ons.
- EGM is your best bet when it is applicable. But in certain cases, it is not.
- In those cases, a combination of Howard's algorithm and MQP bounds can be very useful.
Howard's Policy Iteration

Consider the neoclassical growth model:

    V(k, z) = max_{c, k'} { c^{1−σ}/(1−σ) + β E[V(k', z') | z] }          (P1)
    s.t.  c + k' = e^z k^α + (1 − δ) k
          z' = ρ z + ε',   k' ≥ k̲.

In stage n of the VFI algorithm, we first maximize the RHS and solve for the policy rule:

    s_n(k_i, z_j) = argmax_s { (e^{z_j} k_i^α + (1 − δ) k_i − s)^{1−σ}/(1−σ) + β E[V^n(s, z') | z_j] }.   (1)

Second, we update:

    V^{n+1} = T_{s_n} V^n.   (2)
Policy Iteration

The maximization step can be very time consuming. So it seems like a waste to use the new policy for only one period when updating to V^{n+1}.
- A simple but key insight is that (2) is also a contraction with modulus β. Therefore, if we were to apply T_{s_n} repeatedly, it would converge to a fixed point of its own, at rate β.
  - Of course, this fixed point would not be the solution of the original Bellman equation we would like to solve.
  - But T_{s_n} is an operator that is much cheaper to apply. So we may want to apply it more than once.
Two Properties of Howard's Algorithm

Puterman and Brumelle (1979) show that policy iteration is equivalent to the Newton-Kantorovich method applied to dynamic programming. Thus, just like Newton's method, it has two properties:
1. it is guaranteed to converge to the true solution when the initial point, V⁰, is in the domain of attraction of V*, and
2. when (1) is satisfied, it converges at a quadratic rate in the iteration index n.

Bad news: no more global convergence as with VFI (unless the state space is discrete).
Good news: potentially very fast convergence.
Algorithm 2: VFI WITH POLICY ITERATION

1. Set n = 0. Choose an initial guess v⁰ ∈ S.
2. Obtain s_n as in (1) and take the updated value function to be

       v^{n+1} = lim_{m→∞} T^m_{s_n} v^n,

   which is the (fixed-point) value function resulting from using policy s_n forever.
3. Stop if the convergence criterion is satisfied: ‖v^{n+1} − v^n‖ < toler. Otherwise, increase n and return to step 2.
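The algorithm can be sketched as follows (my implementation on a discrete capital grid for the deterministic log-utility model; on a grid the inner limit in step 2 can be computed by iterating T_{s_n} to a tight tolerance):

```python
import math

# Policy iteration: greedy_policy is the maximization step (1); evaluate_policy
# computes v = lim_m T_{s_n}^m v, the value of following s_n forever.
A, alpha, beta = 1.0, 0.3, 0.9
grid = [0.02 + 0.002 * i for i in range(80)]

def greedy_policy(V):
    pol = []
    for k in grid:
        y = A * k ** alpha
        best, best_j = -math.inf, 0
        for j, kp in enumerate(grid):
            if kp < y:
                val = math.log(y - kp) + beta * V[j]
                if val > best:
                    best, best_j = val, j
        pol.append(best_j)
    return pol

def evaluate_policy(pol, V, tol=1e-12):
    while True:                 # apply T_{s_n} until (near) convergence
        Vn = [math.log(A * grid[i] ** alpha - grid[pol[i]]) + beta * V[pol[i]]
              for i in range(len(grid))]
        if max(abs(a - b) for a, b in zip(Vn, V)) < tol:
            return Vn
        V = Vn

V = [0.0] * len(grid)
for n in range(50):
    pol = greedy_policy(V)
    Vn = evaluate_policy(pol, V)
    if max(abs(a - b) for a, b in zip(Vn, V)) < 1e-10:
        break
    V = Vn
# Converges in a handful of outer iterations, versus hundreds for plain VFI.
```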
Modified Policy Iteration

Caution: quadratic convergence is a bit misleading: this is the rate in n. In contrast to VFI, Howard's algorithm takes a lot of time to evaluate step 2, so overall it may not be much faster when the state space is large.

Second, the basin of attraction can be small.

Both issues can be fixed by slightly modifying the algorithm.
Error Bounds: Background

In iterative numerical algorithms, we need a stopping rule. In dynamic programming, we want to know how far we are from the true solution at each iteration.

The contraction mapping theorem can be used to show:

    ‖v* − v^k‖∞ ≤ (1/(1 − β)) ‖v^{k+1} − v^k‖∞.

So if we want to stop when we are ε away from the true solution, stop when:

    ‖v^{k+1} − v^k‖∞ < ε (1 − β).
Algorithm 3: VFI WITH MODIFIED POLICY IMPROVEMENT

1. Modify step 2 of Howard's algorithm: obtain s_n as in (1) and take the updated value function to be

       v^{n+1} = T^m_{s_n} v^n,

   which entails m applications of Howard's mapping to update to v^{n+1}.
Error Bounds: Background

Remarks:
1. This bound is for the worst-case scenario (sup-norm). If v varies over a wide range, the bound may be misleading.
   - Consider u(c) = c^{1−σ}/(1−σ) with σ = 10. Typically, v will cover an enormous range of values, so the bound might be too pessimistic.
2. ASIDE: In general, it may be hard to judge what an ε deviation in v means in economic terms.
   - Another approach is to define the stopping rule in the policy space. It is easier to judge what it means to consume x% less than optimal.
   - Typically, policies converge faster than values, so this might allow stopping sooner.
MacQueen-Porteus Bounds

Consider a different formulation of a dynamic programming problem:

    V(x_i) = max_{y ∈ Γ(x_i)} [ U(x_i, y) + β Σ_{j=1}^J π_ij(y) V(x_j) ].   (3)

The state space is discrete, but choices are continuous.

This allows for simple modeling of interesting problems. It is a very common formulation in other fields using dynamic programming. (See, e.g., all books by Bertsekas, with Shreve, or Ozdaglar, etc.)
MacQueen-Porteus Bounds

Theorem [MacQueen-Porteus bounds]: Consider

    V(x_i) = max_{y ∈ Γ(x_i)} [ U(x_i, y) + β Σ_{j=1}^J π_ij(y) V(x_j) ],   (4)

and define

    c̲_n = (β/(1 − β)) min [V^n − V^{n−1}]
    c̄_n = (β/(1 − β)) max [V^n − V^{n−1}].   (5)

Then, for all x ∈ X, we have:

    T^n V⁰(x) + c̲_n ≤ V*(x) ≤ T^n V⁰(x) + c̄_n.   (6)

Furthermore, with each iteration, the two bounds approach the true solution monotonically.
MQP Bounds: Comments

MQP bounds can be quite tight. Example: suppose V^n(x) − V^{n−1}(x) = Δ for all x, with Δ = 100 (a large number). The usual bound implies:

    ‖V* − V^n‖∞ ≤ (1/(1 − β)) ‖V^n − V^{n−1}‖∞ = Δ/(1 − β),

so we would keep iterating. MQP implies c̲_n = c̄_n = βΔ/(1 − β), which then implies

    V*(x) − T^n V⁰(x) = βΔ/(1 − β).

We find V*(x) = V^n(x) + βΔ/(1 − β), in one step!

MQP gives both a lower and an upper bound on the signed difference; the sup-norm bound gives only a one-sided bound on its magnitude.
Algorithm 4: VFI WITH MACQUEEN-PORTEUS ERROR BOUNDS

[Step 2:] Stop when c̄_n − c̲_n < toler. Then take the final estimate of V* to be either the median

    Ṽ = T^n V⁰ + (c̲_n + c̄_n)/2

or the mean (i.e., average error bound across states):

    V̂ = T^n V⁰ + (β/(N(1 − β))) Σ_{i=1}^N [ T^n V⁰(x_i) − T^{n−1} V⁰(x_i) ].
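A sketch of the bounds in action (my implementation, deterministic log-utility growth model on a discrete grid, with one-period rewards precomputed for speed). The loop stops as soon as the MQP bracket is tight and then applies the median correction:

```python
import math

# VFI with MacQueen-Porteus error bounds and the median final estimate.
A, alpha, beta = 1.0, 0.3, 0.95
grid = [0.02 + 0.002 * i for i in range(80)]
N = len(grid)
# precomputed rewards R[i][j] = log(A k_i^a - k_j), -inf when infeasible
R = [[math.log(A * k ** alpha - kp) if kp < A * k ** alpha else -math.inf
      for kp in grid] for k in grid]

def T(V):
    return [max(R[i][j] + beta * V[j] for j in range(N)) for i in range(N)]

V = [0.0] * N
for n in range(20_000):
    TV = T(V)
    d = [a - b for a, b in zip(TV, V)]
    c_lo = beta / (1.0 - beta) * min(d)     # lower MQP correction
    c_hi = beta / (1.0 - beta) * max(d)     # upper MQP correction
    V = TV
    if c_hi - c_lo < 1e-6:                  # bracket around V* is tight
        break
V_mqp = [v + 0.5 * (c_lo + c_hi) for v in V]   # median estimate of V*
```

Because the stopping rule looks at the spread of V^n − V^{n−1} rather than its level, the loop exits long before the plain sup-norm criterion would.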
MQP: Convergence Rate

Bertsekas (1987) derives the convergence rate of the MQP bounds algorithm: it is proportional to the subdominant eigenvalue of π_ij(y*) (the transition matrix evaluated at the optimal policy).

VFI's rate is proportional to the dominant eigenvalue, which is always 1; multiplied by β, this gives the convergence rate β.

The subdominant (second-largest) eigenvalue, λ₂, is sometimes 1 and sometimes not:
- AR(1) process, discretized: λ₂ = ρ (the persistence parameter)
- More than one ergodic set: λ₂ = 1.

When persistence is low, this can lead to substantial improvements in speed.
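The role of the subdominant eigenvalue can be seen in a two-state example (my sketch): for P = [[p, 1−p], [1−q, q]], λ₂ = p + q − 1, and the spread max(v) − min(v) of the iterates v_{n+1} = P v_n — the quantity the MQP bounds track — contracts at exactly |λ₂| per step, while the level contracts at the dominant eigenvalue, 1.

```python
# Two-state Markov chain: the spread of P-iterates shrinks at rate
# lambda_2 = p + q - 1 (subdominant eigenvalue), not at rate 1, because
# (Pv)_0 - (Pv)_1 = (p + q - 1) * (v_0 - v_1).
p, q = 0.9, 0.8
lam2 = p + q - 1.0
v = [1.0, 0.0]
for _ in range(50):
    v = [p * v[0] + (1 - p) * v[1],
         (1 - q) * v[0] + q * v[1]]
spread = max(v) - min(v)
# spread equals lam2**50 up to floating-point noise
```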
Endogenous Grid Method
Endogenous Grid Method

Under standard VFI, the FOC is c^{−σ} = β E[V_k(k', z') | z_j]. Substituting out consumption using the budget constraint, this can be rewritten as

    (z_j k_i^α + (1 − δ) k_i − k')^{−σ} = β E[V_k(k', z') | z_j].   (7)

In VFI, we solve for k' at each grid point today, (k_i, z_j). This is slow for three reasons:
- (7) is a non-linear equation in k'.
- For every trial value of k', we need to re-evaluate the conditional expectation (since k' appears inside the expectation).
- V(k_i, z_j) is stored at grid points defined over k, so for every trial value of k' we need to interpolate to obtain the off-grid values V(k', z'_j) for each z'_j.
EGM

View the problem differently:

    V(k, z_j) = max_{c, k'_i} { c^{1−σ}/(1−σ) + β E[V(k'_i, z') | z_j] }          (P3)
    s.t.  c + k'_i = z_j k^α + (1 − δ) k
          ln z' = ρ ln z_j + ε',   k'_i ≥ k̲.

The FOC is the same as before:

    (z_j k^α + (1 − δ) k − k'_i)^{−σ} = β E[V_k(k'_i, z') | z_j],   (8)

but now solve for k as a function of k'_i and z_j:

    z_j k^α + (1 − δ) k = ( β E[V_k(k'_i, z') | z_j] )^{−1/σ} + k'_i.
EGM

Define Y ≡ z k^α + (1 − δ) k and rewrite the Bellman equation as:

    V(Y, z) = max_{k'} { (Y − k')^{1−σ}/(1−σ) + β E[V(Y', z') | z_j] }   (9)
    s.t.  ln z' = ρ ln z_j + ε'.

The key observation is that Y' is a function only of k'_i and z', so we can write the conditional expectation on the right-hand side as:

    𝒱(k'_i, z_j) ≡ β E[ V(Y'(k'_i, z'), z') | z_j ].
EGM

    V(Y, z) = max_{k'} { (Y − k')^{1−σ}/(1−σ) + 𝒱(k'_i, z_j) }

Now the FOC of this new problem becomes:

    c*(k'_i, z_j)^{−σ} = 𝒱_{k'}(k'_i, z_j).   (10)

Having obtained c*(k'_i, z_j) from this expression, we use the resource constraint to compute today's end-of-period resources, Y*(k'_i, z_j) = c*(k'_i, z_j) + k'_i, as well as

    V(Y*(k'_i, z_j), z_j) = c*(k'_i, z_j)^{1−σ}/(1−σ) + 𝒱(k'_i, z_j).
EGM: The Algorithm

0. Set n = 0. Construct a grid over tomorrow's capital and today's shock, (k'_i, z_j). Choose an initial guess 𝒱⁰(k'_i, z_j).
1. For all i, j, obtain

       c*(k'_i, z_j) = [ 𝒱^n_{k'}(k'_i, z_j) ]^{−1/σ}.

2. Obtain today's end-of-period resources as a function of tomorrow's capital and today's shock, Y*(k'_i, z_j) = c*(k'_i, z_j) + k'_i, and today's updated value function,

       V^{n+1}(Y*(k'_i, z_j), z_j) = c*(k'_i, z_j)^{1−σ}/(1−σ) + 𝒱^n(k'_i, z_j),

   by plugging the consumption decision into the RHS.
EGM: The Algorithm (Cont'd)

3. Interpolate V^{n+1} to obtain its values on a grid of tomorrow's end-of-period resources: Y' = z' (k'_i)^α + (1 − δ) k'_i.
4. Obtain

       𝒱^{n+1}(k'_i, z_j) = β E[ V^{n+1}(Y'(k'_i, z'), z') | z_j ].

5. Stop if the convergence criterion is satisfied, and obtain beginning-of-period capital, k, by solving the nonlinear equation

       Y*_n(i, j) = z_j k^α + (1 − δ) k,   for all i, j.

   Otherwise, go to step 1.
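A pared-down sketch of the idea (my implementation; deterministic case, log utility, full depreciation, so Y' = A k'^α, and all parameter values are illustrative). It iterates on consumption as a function of end-of-period resources, using the envelope condition V_Y = 1/c in place of storing the value function. The exogenous grid is on tomorrow's capital k'; today's grid on Y is endogenous, Y*(k') = c*(k') + k', so no root-finding is needed inside the loop:

```python
# EGM / time iteration for the deterministic growth model with log utility
# and delta = 1. Closed form: k' = alpha*beta*Y, i.e. c(Y) = (1 - alpha*beta)*Y.
A, alpha, beta = 1.0, 0.3, 0.95
kp_grid = [0.02 + 0.015 * i for i in range(40)]       # exogenous grid on k'

def interp(xs, ys, x):
    """Piecewise-linear interpolation with flat extrapolation."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = next(i for i in range(1, len(xs)) if xs[i] >= x)
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return (1 - w) * ys[j - 1] + w * ys[j]

Y_endo, c_endo = kp_grid[:], kp_grid[:]               # initial guess: c(Y) = Y
for it in range(2000):
    Yp = [A * kp ** alpha for kp in kp_grid]          # tomorrow's resources Y'
    cp = [interp(Y_endo, c_endo, y) for y in Yp]      # c(Y') from last iterate
    # Euler + envelope: 1/c = beta * (1/c') * dY'/dk',  dY'/dk' = alpha*A*kp^(alpha-1)
    c = [cp[i] / (beta * alpha * A * kp ** (alpha - 1.0))
         for i, kp in enumerate(kp_grid)]
    done = max(abs(a - b) for a, b in zip(c, c_endo)) < 1e-12
    c_endo = c
    Y_endo = [c[i] + kp_grid[i] for i in range(len(kp_grid))]   # endogenous grid
    if done:
        break
```

Note what is absent: no nonlinear solve for k' and no expectation re-evaluated inside an optimizer — exactly the three bottlenecks of standard VFI listed above.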
Is This Worth the Trouble?

  Utility (RRA = 2)             β = 0.95   0.98   0.99   0.995
  CRRA                             28.9     74     119    247
  CRRA + PI                         7.17    18.2    29.5   53
  CRRA + PI + MQP                   7.17    16.5    26     38
  CRRA + PI + MQP + 100grid         2.15     5.2     8.2   12
  Endog Grid (grid curv = 2)        0.38     0.94    1.92   4

Table: Time for convergence (seconds): dim k = 300, σ = 3.0
Another Benchmark

                      ρ = 0.95             ρ = 0.99             ρ = 0.999
  MQP \ N_max       0     50    500      0     50    500      0     50    500

  σ = 1:
  no              14.99  1.07  1.00   26.48  1.28  1.00   33.29  1.41  1.00
  yes              0.32  0.60  0.79    0.10  0.23  0.27    0.01  0.03  0.04

  σ = 5:
  no              13.03  0.96  1.00   26.77  1.28  1.00   33.37  1.45  1.00
  yes              0.67  0.67  0.69    0.14  0.24  0.30    0.02  0.04  0.06

Table: MacQueen-Porteus bounds and policy iteration