Practical Dynamic Programming: An Introduction. Associated programs: dpexample.m (deterministic), dpexample2.m (stochastic).

Outline
1. Specific problem: stochastic model of accumulation from a DP perspective
2. Value Function Iteration: An Introduction
3. Value Function Iteration: An Example
4. Value Function Iteration: Discretization
5. The curse of dimensionality
6. The importance of initial conditions
7. Adding uncertainty

1. Stochastic growth model from a DP perspective. Consider a model in which the production function is subject to shocks to total factor productivity:

y_t = a_t f(k_t)

These productivity shocks are assumed to be governed by a Markov process, so that a_t = a(ς_t) with prob(ς_{t+1} = ς' | ς_t = ς) = γ(ς', ς), and additional restrictions are imposed so that productivity is stationary (a common initial assumption, but one that seems counterfactual given Solow-style accounting).

Accumulation of capital. The endogenous state evolution equation, in the sense of our earlier presentation of DP; the control is consumption:

k_{t+1} = k_t + g(k_t, a_t, c_t)
        = k_t + a_t f(k_t) - δ k_t - c_t
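The exogenous productivity process and the accumulation equation can be simulated directly. Below is a minimal MATLAB sketch (it is not part of dpexample.m or dpexample2.m): the three-point grid for a, the transition matrix, the Cobb-Douglas technology, and the constant-saving-rate rule for consumption are all illustrative assumptions, used only to show how the two state evolution equations fit together.

% Minimal simulation sketch (not part of dpexample.m/dpexample2.m).
% The productivity grid, transition matrix Pi, technology, and the
% constant-saving-rate rule for c are illustrative assumptions only.
alpha = 1/3; delta = 0.1; s = 0.2;               % technology, depreciation, saving rate
avals = [0.95 1.00 1.05];                        % assumed productivity states a(s)
Pi = [0.5 0.4 0.1; 0.2 0.6 0.2; 0.1 0.4 0.5];    % assumed Markov transition matrix
T = 200; k = zeros(T,1); ia = zeros(T,1);
k(1) = 1; ia(1) = 2;                             % initial capital and exogenous state
for t = 1:T-1
    a = avals(ia(t));
    y = a*k(t)^alpha;                            % y = a*f(k)
    c = (1-s)*y;                                 % consume a fixed share of output
    k(t+1) = k(t) + y - delta*k(t) - c;          % k' = k + g(k, a, c)
    ia(t+1) = find(rand <= cumsum(Pi(ia(t),:)), 1);  % draw next exogenous state
end
plot(1:T, k); xlabel('t'); ylabel('k_t');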

Objective. Discounted utility from consumption (with the infinite horizon as a special case):

General:  E_0 Σ_{t=0}^{T} β^t u(k_t, x_t, c_t)
Specific: E_0 Σ_{t=0}^{T} β^t u(c_t)

Explicit horizon. To allow consideration of iterative schemes in theory and computation. To allow for finite-horizon economies. Requires some additional notation in terms of keeping track of time (we lose time invariance because optimal behavior depends on "how long is left").

2. Value Function Iteration. We now describe an iterative method of determining value functions for a series of problems with increasing horizon. This approach is a practical one, introduced by Bellman in his original work on dynamic programming.

Review of Bellman's core ideas. Focused on finding the policy function and the value function, both of which depend on the states (endogenous and exogenous). Subdivided complicated intertemporal problems into many two-period problems, in which the trade-off is between the present ("now") and the future ("later"). The working principle was to find the optimal control and state now, taking as given that later behavior would itself be optimal.

The Principle of Optimality. "An optimal policy has the property that, whatever the state and optimal first decision may be, the remaining decisions constitute an optimal policy with respect to the state originating from the first decision." Bellman (1957, p. 83)

Following the principle, the natural maximization problem is

max_{c_t, k_{t+1}} { u(c_t) + β E V(k_{t+1}, ς_{t+1}, T - t) }
s.t. k_{t+1} = k_t + g(k_t, a(ς_t), c_t)

where the right-hand side is the current momentary objective (u) plus the consequences (V) for the discounted objective of behaving optimally in the future, given that there are h = T - t periods to go (h is the length of the horizon).

Noting that time does not enter in an essential way other than through the horizon, we can write this as (with ' again denoting next period)

max_{c, k'} { u(c) + β E[ V(k', ς', h') | (k, ς) ] }
s.t. k' = k + g(k, a(ς), c)

So the Bellman equation is written as (with the understanding that h = h' + 1)

V(k, ς, h) = max_{c, k'} { u(c) + β E[ V(k', ς', h') | (k, ς) ] }
s.t. k' = k + g(k, a(ς), c)

After the maximization we know the optimal policy and can calculate the associated value, so that there is now a Bellman equation of the form

V(k, ς, h) = u(c(k, ς, h)) + β E[ V( k + g(k, a(ς), c(k, ς, h)), ς', h' ) | (k, ς) ]

Note that the policy function and the value function both depend on the states and the horizon. The Bellman equation provides a mapping from one value function (indexed by h' = h - 1) to another (indexed by h).

3. An example. Suppose that the utility function is u(c) = log(c), that the production function is Cobb-Douglas with capital parameter α, and that there is complete depreciation (δ = 1). These assumptions allow us to find the policy and value functions as loglinear equations. The accumulation equation is then

k' = a(ς) k^α - c

One-period problem. Suppose that there is no tomorrow, so that the value of future capital is zero and thus it is optimal to choose k' = 0. Then the optimal policy and associated value function are

c(k, ς, 1) = a(ς) k^α
v(k, ς, 1) = u(c) = log(c) = log a(ς) + α log k

Two-period problem. Bellman equation:

V(k, ς, 2) = max_{c, k'} { log(c) + β E[ log a(ς') + α log(k') | ς ] }
s.t. k' = a(ς) k^α - c

Optimal policy from the Euler equation:

0 = 1/c - βα/(a k^α - c)
c(k, ς, 2) = [1/(1 + βα)] a(ς) k^α
k'(k, ς, 2) = [βα/(1 + βα)] a(ς) k^α

Under the optimal policies,

V(k, ς, 2) = log( [1/(1 + βα)] a(ς) k^α ) + β E[ log a(ς') | ς ] + βα log( [βα/(1 + βα)] a(ς) k^α )
           = β E[ log a(ς') | ς ] + (1 + βα) log a(ς) + (1 + βα) α log(k)
             + log( 1/(1 + βα) ) + βα log( βα/(1 + βα) )

A candidate solution:

v(k, ς, h) = θ_h log(k) + g_h(ς) + η_h

v(k, ς, h+1) = max_{c, k'} { log(c) + β E[ θ_h log(k') + g_h(ς') + η_h | (k, ς) ] }
s.t. k' = a(ς) k^α - c

Finding the optimal policy and value function.

Euler equation: 0 = 1/c - βθ_h/(a k^α - c)

c(k, ς, h+1) = [1/(1 + βθ_h)] a(ς) k^α

v(k, ς, h+1) = log( [1/(1 + βθ_h)] a(ς) k^α ) + β E[ θ_h log( [βθ_h/(1 + βθ_h)] a(ς) k^α ) + g_h(ς') + η_h ]
             = [ α(1 + βθ_h) log(k) ]
               + [ (1 + βθ_h) log a(ς) + β E[ g_h(ς') | ς ] ]
               + [ log( 1/(1 + βθ_h) ) + βθ_h log( βθ_h/(1 + βθ_h) ) + βη_h ]

Comments. The more importance (value) is attached to future capital, the smaller is consumption's share of output. One value function which is loglinear in k generates another that is loglinear in k. A recursion governs the capital coefficient, and can be used to show that it increases with the horizon:

θ_{h+1} = α(1 + βθ_h) = α[ 1 + (αβ) + ... + (αβ)^h ]
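To see this recursion at work, here is a minimal MATLAB sketch that iterates θ_{h+1} = α(1 + βθ_h) and compares it with its limit α/(1 - αβ), which is derived for the infinite-horizon problem below. The parameter values (α = 1/3, β = 1/1.06, taken from the numerical example later in the notes) are used only for illustration.

% Minimal sketch: iterate the capital-coefficient recursion and compare it
% with the infinite-horizon limit. Parameter values are illustrative only.
alpha = 1/3; beta = 1/1.06;
H = 30; theta = zeros(H,1);
theta(1) = alpha;                              % one-period problem: v = log(a) + alpha*log(k)
for h = 1:H-1
    theta(h+1) = alpha*(1 + beta*theta(h));    % one more period added to the horizon
end
theta_inf = alpha/(1 - alpha*beta);            % fixed point of the recursion
fprintf('theta(%d) = %.6f, limit = %.6f\n', H, theta(H), theta_inf);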

Comments. Other, more complicated recursions govern the remaining coefficients (these are harder to solve, since the associated difference equations are more involved). The example illustrates the general idea of calculating a sequence of value functions and associated policy functions.

Infinite-horizon problem:

v(k, ς) = θ log(k) + g(ς) + η

v(k, ς) = max_{c, k'} { log(c) + β E[ θ log(k') + g(ς') + η | (k, ς) ] }
s.t. k' = a(ς) k^α - c

Euler equation: 0 = 1/c - βθ/(a(ς) k^α - c), so c = [1/(1 + βθ)] a(ς) k^α

Value function implication. With v(k, ς) = θ log(k) + g(ς) + η, the Bellman equation gives

v(k, ς) = α(1 + βθ) log(k) + (1 + βθ) log a(ς) + β E[ g(ς') | ς ]
          + log( 1/(1 + βθ) ) + βθ log( βθ/(1 + βθ) ) + βη

Matching the coefficient on log(k): θ = α(1 + βθ), so θ = α/(1 - αβ).
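Substituting this fixed point back into the consumption rule c = [1/(1 + βθ)] a(ς) k^α gives the familiar closed-form policies (a short derivation added here for completeness):

1 + βθ = 1 + αβ/(1 - αβ) = 1/(1 - αβ)
c(k, ς) = (1 - αβ) a(ς) k^α
k'(k, ς) = αβ a(ς) k^α

so, in the infinite-horizon problem, the optimal saving rate out of output is the constant αβ.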

Comments. The value function of the infinite-horizon problem is the limit of the finite-horizon functions as h goes to infinity. We can see this for the capital coefficient using the standard formula for a geometric sum (the other coefficients are more complicated). The example suggests that one strategy for approximating the infinite-horizon value function is to construct the sequence of finite-horizon value functions (this would be very inefficient here, though, because the infinite-horizon function is so easy to compute directly).

Comments (contd). Such approximation strategies motivate the study of the properties of the Bellman equation as a mapping on a space of (value) functions. Economics of the problem: the distribution of uncertainty does not affect saving (accumulation) in this setting, for reasons related to Sandmo's discussion of offsetting substitution and income effects of rate-of-return uncertainty.

4. Discretization. Let's suppose that there is only a finite set of capital levels which can be chosen, rather than treating the level of capital as a continuous variable. Similarly, we assume that productivity can only take on a finite set of values. Let's suppose that consumption is continuous and, to simplify the discussion below, let's think of substituting in the accumulation constraint, so that there is a reduced-form objective (as in our discussion of the calculus of variations).

Discrete capital and productivity grids:

k ∈ { κ_1 < κ_2 < ... < κ_m }
a ∈ { η_1 < η_2 < ... < η_n }

Reduced-form objective: z(k, k', a) = u( a f(k) + (1 - δ) k - k' )

Hence we can form a (big) list of z numbers: utility for all of the different (k, a, k') triples. Similarly, the value function is just a list of numbers giving the value for each discrete state.

DP iterations under certainty. Start with a given value function v(k). In each state k, find the k' value that maximizes z(k, k') + βv(k') and compute the associated value v(k) = max_{k'} { z(k, k') + βv(k') }. Collecting up the results across states produces a new v(k). Continue until the change in v is small. (A minimal MATLAB sketch of this loop follows.)
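The following is a minimal MATLAB sketch of these iterations, not the actual dpexample.m. The grid bounds (10% to 110% of the steady state, matching the figures below), the number of grid points, and the convergence tolerance are assumptions made for illustration; the preference and technology parameters are those of the example on the next slide.

% Minimal value-function-iteration sketch for the deterministic model
% (not the actual dpexample.m). Grid size and tolerance are illustrative.
beta = 1/1.06; sigma = 3; alpha = 1/3; a = 1; delta = 0.1;
kss = (alpha*a/(1/beta - 1 + delta))^(1/(1-alpha));      % steady-state capital
kgrid = linspace(0.1*kss, 1.1*kss, 200)';                % capital grid (column vector)
N = length(kgrid);
% Return matrix: z(i,j) = u(a*f(k_i) + (1-delta)*k_i - k_j), -Inf if infeasible
C = repmat(a*kgrid.^alpha + (1-delta)*kgrid, 1, N) - repmat(kgrid', N, 1);
z = -Inf(N, N);
z(C > 0) = C(C > 0).^(1-sigma)/(1-sigma);                % CRRA utility, sigma = 3
v = zeros(N, 1); tol = 1e-6; gap = Inf;
while gap > tol
    [vnew, pol] = max(z + repmat(beta*v', N, 1), [], 2); % optimize over k' for each k
    gap = max(abs(vnew - v));
    v = vnew;
end
kprime = kgrid(pol);                                     % policy function k'(k)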

Example. Look at the iterative scheme in practice using a simple MATLAB program. Model parameters:

β = 1/(1 + .06)
u(c) = [1/(1 - σ)] c^{1-σ} with σ = 3
y = a k^α with a = 1 and α = 1/3
δ = .1

Early iteration. [Figure: "Value Function Updating at Iteration 1" (value, v vs. vnew) and "Policy Function Updating at Iteration 1" (k', pol vs. polnew), both plotted against capital as a fraction of the steady state.]

Later iteration. [Figure: "Value Function Updating at Iteration 19" and "Policy Function Updating at Iteration 19", same layout as above.]

Lessons: Specific and General. The value function converges more slowly than the policy function: welfare is more sensitive to the horizon than behavior is. This has motivated work on policy iteration, where one starts with an initial policy (say, a linear approximation policy); evaluates its value function; finds a new policy that maximizes z(k, k') + βv(k'); and then iterates until the policy converges (a minimal sketch follows below). It has also motivated the development of acceleration algorithms. Example: evaluate the value under some feasible policy (say, a constant-saving-rate policy) and then use this as the starting point for value iteration. The policy function looks pretty linear. This has comforted some in the use of linear approximation methods, and has motivated others to look for piecewise-linear policy functions.
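Here is a minimal MATLAB sketch of policy iteration for the same discretized deterministic model. It is not part of the associated programs; the grid and parameters simply mirror the value-iteration sketch above, and the initial policy ("keep capital unchanged") is an arbitrary illustrative choice.

% Minimal policy-iteration (Howard improvement) sketch for the discretized
% deterministic model; grid, parameters, and initial policy are illustrative.
beta = 1/1.06; sigma = 3; alpha = 1/3; a = 1; delta = 0.1;
kss = (alpha*a/(1/beta - 1 + delta))^(1/(1-alpha));
kgrid = linspace(0.1*kss, 1.1*kss, 200)'; N = length(kgrid);
C = repmat(a*kgrid.^alpha + (1-delta)*kgrid, 1, N) - repmat(kgrid', N, 1);
z = -Inf(N, N); z(C > 0) = C(C > 0).^(1-sigma)/(1-sigma);
pol = (1:N)';                                     % initial policy: keep k unchanged
done = false;
while ~done
    % Policy evaluation: solve v(i) = z(i,pol(i)) + beta*v(pol(i)) exactly
    P = sparse((1:N)', pol, 1, N, N);             % selects k' = kgrid(pol(i)) in row i
    zpol = z(sub2ind([N N], (1:N)', pol));
    v = (speye(N) - beta*P) \ zpol;
    % Policy improvement: re-optimize k' against the evaluated value function
    [~, polnew] = max(z + repmat(beta*v', N, 1), [], 2);
    done = isequal(polnew, pol);
    pol = polnew;
end
kprime = kgrid(pol);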

5. The curse of dimensionality. We have studied a model with one state, which can take on N different values. Our brute-force optimization approach involves calculating z(k, k') at each k for each possible k' and then picking the largest one. Hence, we have to compute z(k, k') a total of N*N times. If we have a good method of finding the optimal policy at each point k, such as using properties of the objective, then the problem might only involve N*M computations (with M < N being the smaller number of steps in the maximization step). Suppose that there were two states, each of which can take on N different values. We would have to find an optimal state evolution at each of N*N points, and a blind search would require evaluating z at all of the possible actions for each state. A more sophisticated search would require on the order of N*N*M evaluations, as we would be doing M computations at each of the N*N points in the state grid.

Curse. Even if computation is maximally efficient (M = 1, so that one does only one evaluation of z at each point in the state grid), one can see that this state grid gets large very fast: if there are j states then there are N^j points in the grid; with 100 points for each state and 4 state variables, one is looking at a grid of 10^8 points. Discretization methods are not feasible for complicated models with many states. This has motivated the study of many other computational procedures for dynamic programming.

6. The importance of initial conditions. We started the example dynamic programs with a future v = 0 and with v(k,a,0) = u(a f(k) + (1-δ)k). Theorems in the literature on dynamic programming guarantee the convergence of value function iteration, under specified conditions on u, f, a, from any initial condition. In this sense, the initial condition is irrelevant. Such theorems will be developed in Miao's Economic Dynamics course in the spring. More precisely, the theorems indicate that after a large number of iterations, results from any two starting points are (i) arbitrarily close to each other and (ii) arbitrarily close to the solution for the infinite-horizon case. In a practical sense, however, initial conditions can be critical. Imagine that, by accident or design, one started off with exactly the correct value function for the infinite-horizon case, but did not know the policy function. In this situation, there would be only one iteration, because the infinite-horizon Bellman equation means that the new v will equal the old v.

7. Adding uncertainty Since we have discretized the state space, it is also natural to discretize the shock variable. The discretization is a Markov chain, with fixed probabilities of making transitions from one exogenous state to another.

Transition matrix for the Markov chain (elements are prob(a'|a) = π_ij):

               Later: High   Later: Medium   Later: Low
Now: High          .5             .4             .1
Now: Medium        .2             .6             .2
Now: Low           .1             .4             .5

DP iterations with uncertainty. At each grid point (k = κ_l, a = η_i), the Bellman equation is

V(κ_l, η_i, h) = max_{k'} { z(κ_l, η_i, k') + β Σ_{j=1}^{J} π_ij V(k', η_j, h') }

DP iterations with uncertainty therefore require calculating the expected value in addition to the computations discussed above. With a Markov chain, this just means adding up the value function at the possible future exogenous states, weighted by the transition probabilities. The grid for v now must depend on (k, a), which adds to the curse of dimensionality. (A minimal sketch of one such update follows.)
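The sketch below shows one update of the stochastic Bellman operator in MATLAB. It is not the actual dpexample2.m: the productivity values attached to the high/medium/low states and the capital grid bounds are illustrative assumptions; the transition matrix is the one in the table above, and the preference and technology parameters follow the earlier example.

% One update of the stochastic Bellman operator (not the actual dpexample2.m).
% Productivity values and grid bounds are illustrative assumptions.
beta = 1/1.06; sigma = 3; alpha = 1/3; delta = 0.1;
avals = [1.05 1.00 0.95];                          % assumed high, medium, low a
Pi = [0.5 0.4 0.1; 0.2 0.6 0.2; 0.1 0.4 0.5];      % transition matrix from the table
kgrid = linspace(0.3, 3.3, 200)'; N = length(kgrid); J = length(avals);
v = zeros(N, J);                                   % value function now depends on (k, a)
vnew = zeros(N, J); pol = zeros(N, J);
EV = v*Pi';                                        % EV(:,i) = sum_j Pi(i,j)*v(:,j)
for i = 1:J
    C = repmat(avals(i)*kgrid.^alpha + (1-delta)*kgrid, 1, N) - repmat(kgrid', N, 1);
    z = -Inf(N, N); z(C > 0) = C(C > 0).^(1-sigma)/(1-sigma);
    [vnew(:,i), pol(:,i)] = max(z + repmat(beta*EV(:,i)', N, 1), [], 2);
end
% repeat the update until max(abs(vnew(:) - v(:))) is small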

Stochastic dynamic program. There are three value functions in the graph, one for each of the three values of a, and three policy functions, one for each of the three values of a. We no longer show the improvements for all three, just for the middle-a function, to avoid graphical clutter (it is the dashed line).

Early iteration. [Figure: "Value Function Updating at Iteration 1" and "Policy Function Updating at Iteration 1" for the stochastic model; value and k' plotted against capital as a fraction of the steady state, with legends v (a=1)/vnew and pol (a=1)/polnew.]

Later iteration. [Figure: "Value Function Updating at Iteration 19" and "Policy Function Updating at Iteration 19" for the stochastic model, same layout as above.]