Lecture Notes 10: Dynamic Programming

University of Warwick, EC9A0 Maths for Economists
Lecture Notes 10: Dynamic Programming
Peter J. Hammond
2018 September 28th

Outline

Stochastic Linear Difference Equations in One Variable
- Explicit Solution
- Gaussian Disturbances

Optimal Saving
- Preferences and Constraints
- The Two Period Problem
- The T Period Problem
- A General Savings Problem

General Problems
- Finite Horizon Case
- Infinite Time Horizon
- Stationarity and the Bellman Equation
- Finding a Fixed Function
- Successive Approximation and Policy Improvement
- Unboundedness

Basic Equation

A simple stochastic linear difference equation of the first order in one variable takes the form

x_t = a x_{t-1} + ɛ_t  (t ∈ N)

Here a is a real parameter, and each ɛ_t is a real random disturbance. Assume that:
1. there is a given or pre-determined initial state x_0;
2. the random variables ɛ_t are independent and identically distributed (IID) with mean E ɛ_t = 0 and variance E ɛ_t² = σ².
A special case is when the disturbances are all normally distributed, i.e., ɛ_t ~ N(0, σ²).
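A process of this form is straightforward to simulate; a minimal sketch in Python (the function name and all parameter values are illustrative, not part of the notes):

```python
import numpy as np

def simulate_ar1(a, sigma, x0, T, rng):
    """Simulate x_t = a*x_{t-1} + eps_t with IID N(0, sigma^2) disturbances."""
    x = np.empty(T + 1)
    x[0] = x0                      # given initial state
    eps = rng.normal(0.0, sigma, size=T)
    for t in range(1, T + 1):
        x[t] = a * x[t - 1] + eps[t - 1]
    return x

rng = np.random.default_rng(0)
path = simulate_ar1(a=0.9, sigma=1.0, x0=5.0, T=100, rng=rng)
```

With sigma = 0 the recursion is deterministic and the path is exactly a^t x_0, which is a useful sanity check on the code.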

Explicit Solution and Conditional Mean

For each fixed outcome ɛ^N = (ɛ_t)_{t∈N} of the random sequence, there is a unique solution, which can be written as

x_t = a^t x_0 + Σ_{s=1}^t a^{t-s} ɛ_s

The main stable case occurs when |a| < 1. Then each term of the sum a^t x_0 + Σ_{s=1}^t a^{t-s} ɛ_s converges to 0 as t → ∞. This is what econometricians or statisticians call a first-order autoregressive (or AR(1)) process. In fact, given x_0 at time 0, our assumption that E ɛ_s = 0 for all s = 1, 2, ..., t implies that the conditional mean of x_t is

m_t := E[x_t | x_0] = E[ a^t x_0 + Σ_{s=1}^t a^{t-s} ɛ_s | x_0 ] = a^t x_0

Conditional Variance

The conditional variance, however, is given by

v_t := E[ (x_t − m_t)² | x_0 ] = E[ (x_t − a^t x_0)² | x_0 ] = E[ ( Σ_{s=1}^t a^{t-s} ɛ_s )² | x_0 ]

In the case we are considering, with independently distributed disturbances ɛ_s, the variance of a sum is the sum of the variances. Hence

v_t = Σ_{s=1}^t E[ a^{t-s} ɛ_s ]² = Σ_{s=1}^t a^{2(t-s)} E ɛ_s² = σ² Σ_{s=1}^t a^{2(t-s)}

Using the rule for summing the geometric series Σ_{s=1}^t a^{2(t-s)}, we finally obtain

v_t = [(1 − a^{2t})/(1 − a²)] σ²
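A quick Monte Carlo check of the conditional-mean and conditional-variance formulas; the parameter values (a = 0.8, σ = 1, x_0 = 2, t = 20) are illustrative:

```python
import numpy as np

a, sigma, x0, t = 0.8, 1.0, 2.0, 20
rng = np.random.default_rng(42)

# Simulate many independent paths of x_t = a*x_{t-1} + eps_t up to time t.
n_paths = 200_000
x = np.full(n_paths, x0)
for _ in range(t):
    x = a * x + rng.normal(0.0, sigma, size=n_paths)

# Theoretical conditional moments derived above.
m_t = a**t * x0                                  # conditional mean a^t x_0
v_t = (1 - a**(2 * t)) / (1 - a**2) * sigma**2   # conditional variance
```

The cross-sectional mean and variance of `x` should then be close to `m_t` and `v_t` up to Monte Carlo error.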


Sums of Normally Distributed Random Variables, I

Recall that if X ~ N(µ, σ²), then the characteristic function defined by φ_X(t) = E[e^{iXt}] takes the form

φ_X(t) = E[e^{iXt}] = ∫_{−∞}^{+∞} e^{ixt} (1/√(2πσ²)) exp( −(1/2) ((x − µ)/σ)² ) dx

This reduces to φ_X(t) = exp( itµ − (1/2) σ² t² ).

Hence, if Z = X + Y, where X ~ N(µ_X, σ²_X) and Y ~ N(µ_Y, σ²_Y) are independent random variables, then

φ_Z(t) = E[e^{iZt}] = E[e^{i(X+Y)t}] = E[e^{iXt} e^{iYt}] = E[e^{iXt}] E[e^{iYt}]

Sums of Normally Distributed Random Variables, II

So

φ_Z(t) = exp( itµ_X − (1/2) σ²_X t² ) exp( itµ_Y − (1/2) σ²_Y t² )
       = exp( it(µ_X + µ_Y) − (1/2)(σ²_X + σ²_Y) t² )
       = exp( itµ_Z − (1/2) σ²_Z t² )

where µ_Z = µ_X + µ_Y = E(X + Y) is the mean of X + Y, and σ²_Z = σ²_X + σ²_Y is the variance of X + Y. It follows that t ↦ φ_Z(t) is the characteristic function of a random variable Z ~ N(µ_Z, σ²_Z) where µ_Z = µ_X + µ_Y and σ²_Z = σ²_X + σ²_Y. That is, the sum Z = X + Y of two independent normally distributed random variables X and Y is also normally distributed, with:
1. mean equal to the sum of the means;
2. variance equal to the sum of the variances.

The Gaussian Case and the Asymptotic Distribution

In the particular case when each ɛ_t is normally distributed as well as IID, x_t is also normally distributed with mean m_t and variance v_t. As t → ∞, the conditional mean m_t = a^t x_0 → 0 and the conditional variance

v_t = [(1 − a^{2t})/(1 − a²)] σ² → v_∞ := σ²/(1 − a²)

In the case when each ɛ_t is normally distributed, this implies that the asymptotic distribution of x_t is also normal, with mean 0 and variance v_∞ = σ²/(1 − a²).

Stationarity

Now suppose that x_0 itself has this asymptotic normal distribution, i.e., suppose that x_0 ~ N(0, σ²/(1 − a²)). This is what the distribution of x_0 would be if the process had started at t = −∞ instead of at t = 0. Then the unconditional mean of each x_t is E x_t = a^t E x_0 = 0. On the other hand, because x_{t+k} = a^k x_t + Σ_{s=1}^k a^{k-s} ɛ_{t+s}, the unconditional covariance of x_t and x_{t+k} is

E(x_{t+k} x_t) = E[a^k x_t²] = a^k v_∞ = [a^k/(1 − a²)] σ²  (k = 0, 1, 2, ...)

In fact, given any t, the joint distribution of the r random variables x_t, x_{t+1}, ..., x_{t+r−1} is multivariate normal, with variance-covariance matrix having elements E(x_{t+k} x_t) = a^k σ²/(1 − a²), independent of t. Because of this independence, the process is said to be stationary.
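The claimed autocovariance structure E(x_{t+k} x_t) = a^k σ²/(1 − a²) can be checked against sample autocovariances of a simulated stationary path (parameter values illustrative):

```python
import numpy as np

a, sigma = 0.7, 1.0
v_inf = sigma**2 / (1 - a**2)   # stationary variance sigma^2 / (1 - a^2)
rng = np.random.default_rng(7)

# Draw x_0 from the stationary distribution so the whole path is stationary.
T = 200_000
x = np.empty(T)
x[0] = rng.normal(0.0, np.sqrt(v_inf))
eps = rng.normal(0.0, sigma, size=T - 1)
for t in range(1, T):
    x[t] = a * x[t - 1] + eps[t - 1]

# Sample autocovariances at lags k = 0, 1, 2 versus the formula a^k * v_inf.
sample_cov = [float(np.mean(x * x))] + [float(np.mean(x[:-k] * x[k:])) for k in (1, 2)]
theory_cov = [a**k * v_inf for k in range(3)]
```

The sample values should track the geometric decay a^k of the theoretical autocovariances, up to sampling noise.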


Intertemporal Utility

Consider a household which at time s is planning its intertemporal consumption stream c^T_s := (c_s, c_{s+1}, ..., c_T) over periods t in the set {s, s+1, ..., T}. Its intertemporal utility function R^{T−s+1} ∋ c^T_s ↦ U^T_s(c^T_s) ∈ R is assumed to take the additively separable form

U^T_s(c^T_s) := Σ_{t=s}^T u_t(c_t)

where the one-period felicity functions c ↦ u_t(c) are differentiably increasing and strictly concave (DISC), i.e., u_t′(c) > 0 and u_t″(c) < 0 for all t and all c > 0. As before, the household faces:
1. fixed initial wealth w_s;
2. a terminal wealth constraint w_{T+1} ≥ 0.

Risky Wealth Accumulation

Also as before, we assume a wealth accumulation equation w_{t+1} = r̃_t(w_t − c_t), where r̃_t is the household's gross rate of return on its wealth in period t. It is assumed that:
1. the return r̃_t in each period t is a random variable with positive values;
2. the return distributions for different times t are stochastically independent;
3. starting with predetermined wealth w_s at time s, the household seeks to maximize the expectation E_s[U^T_s(c^T_s)] of its intertemporal utility.


Two Period Case

We work backwards from the last period, when s = T. In this last period the household will obviously choose c_T = w_T, yielding a maximized utility equal to V_T(w_T) = u_T(w_T).

Next, consider the penultimate period, when s = T − 1. The consumer will want to choose c_{T−1} in order to maximize

u_{T−1}(c_{T−1}) + E_{T−1}[V_T(w̃_T)]

where the first term is period T − 1 felicity and the second is the expected result of an optimal policy in period T, subject to the wealth constraint

w̃_T = r̃_{T−1}(w_{T−1} − c_{T−1})

in which r̃_{T−1} is the random gross return and w_{T−1} − c_{T−1} is saving.

First-Order Condition

Substituting both the function V_T(w_T) = u_T(w_T) and the wealth constraint into the objective reduces the problem to

max_{c_{T−1}} { u_{T−1}(c_{T−1}) + E_{T−1}[ u_T( r̃_{T−1}(w_{T−1} − c_{T−1}) ) ] }

subject to 0 ≤ c_{T−1} ≤ w_{T−1} and c̃_T := r̃_{T−1}(w_{T−1} − c_{T−1}). Assume we can differentiate under the integral sign, and that there is an interior solution with 0 < c_{T−1} < w_{T−1}. Then the first-order condition (FOC) is

0 = u′_{T−1}(c_{T−1}) − E_{T−1}[ r̃_{T−1} u′_T( r̃_{T−1}(w_{T−1} − c_{T−1}) ) ]

The Stochastic Euler Equation

Rearranging the first-order condition, while recognizing that c̃_T := r̃_{T−1}(w_{T−1} − c_{T−1}), one obtains

u′_{T−1}(c_{T−1}) = E_{T−1}[ r̃_{T−1} u′_T( r̃_{T−1}(w_{T−1} − c_{T−1}) ) ]

Dividing by u′_{T−1}(c_{T−1}) gives the stochastic Euler equation

1 = E_{T−1}[ r̃_{T−1} u′_T(c̃_T)/u′_{T−1}(c_{T−1}) ] = E_{T−1}[ r̃_{T−1} MRS^T_{T−1}(c_{T−1}; c̃_T) ]

involving the marginal rate of substitution function

MRS^T_{T−1}(c_{T−1}; c̃_T) := u′_T(c̃_T)/u′_{T−1}(c_{T−1})

The CES Case

For the marginal utility function c ↦ u′(c), its elasticity is defined for all c > 0 by η(c) := −d ln u′(c)/d ln c. Then η(c) is both the degree of relative risk aversion and the degree of relative fluctuation aversion. A constant elasticity of substitution (or CES) utility function satisfies d ln u′(c)/d ln c = −ɛ < 0 for all c > 0. The marginal rate of substitution then satisfies u′(c)/u′(c̄) = (c/c̄)^{−ɛ} for all c, c̄ > 0.

Normalized Utility

Normalize by putting u′(1) = 1, implying that u′(c) ≡ c^{−ɛ}. Then integrating gives

u(c; ɛ) = u(1) + ∫_1^c x^{−ɛ} dx = u(1) + (c^{1−ɛ} − 1)/(1 − ɛ) if ɛ ≠ 1, and u(1) + ln c if ɛ = 1.

Introduce the final normalization u(1) = 1/(1 − ɛ) if ɛ ≠ 1, and u(1) = 0 if ɛ = 1. The utility function is then reduced to

u(c; ɛ) = c^{1−ɛ}/(1 − ɛ) if ɛ ≠ 1;  u(c; ɛ) = ln c if ɛ = 1.

The Stochastic Euler Equation in the CES Case

Consider the CES case when u′_t(c) ≡ δ_t c^{−ɛ}, where each δ_t is the discount factor for period t.

Definition. The one-period discount factor in period t is defined as β_t := δ_{t+1}/δ_t.

Then the stochastic Euler equation takes the form

1 = E_{T−1}[ r̃_{T−1} β_{T−1} (c̃_T/c_{T−1})^{−ɛ} ]

Because c_{T−1} is being chosen at time T − 1, this implies that

(c_{T−1})^{−ɛ} = E_{T−1}[ r̃_{T−1} β_{T−1} (c̃_T)^{−ɛ} ]

The Two Period Problem in the CES Case

In the two-period case, we know that c̃_T = w̃_T = r̃_{T−1}(w_{T−1} − c_{T−1}) in the last period, so the Euler equation becomes

(c_{T−1})^{−ɛ} = E_{T−1}[ r̃_{T−1} β_{T−1} (c̃_T)^{−ɛ} ] = β_{T−1} (w_{T−1} − c_{T−1})^{−ɛ} E_{T−1}[ (r̃_{T−1})^{1−ɛ} ]

Take the (−1/ɛ)th power of each side and define

ρ_{T−1} := ( β_{T−1} E_{T−1}[ (r̃_{T−1})^{1−ɛ} ] )^{−1/ɛ}

to reduce the Euler equation to c_{T−1} = ρ_{T−1}(w_{T−1} − c_{T−1}), whose solution is evidently c_{T−1} = γ_{T−1} w_{T−1}, where γ_{T−1} := ρ_{T−1}/(1 + ρ_{T−1}) and 1 − γ_{T−1} = 1/(1 + ρ_{T−1}) are respectively the optimal consumption and savings ratios. It follows that ρ_{T−1} = γ_{T−1}/(1 − γ_{T−1}) is the consumption/savings ratio.
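The closed-form solution can be checked by brute force. A sketch under illustrative assumptions (β_{T−1} = 0.95, ɛ = 2, and a two-point distribution for the gross return r̃_{T−1}; none of these values come from the notes):

```python
import numpy as np

# Two-period CES problem: maximize u(c) + beta*E[u(r*(w - c))],
# with u(c) = c^(1-eps)/(1-eps).
beta, eps = 0.95, 2.0
returns = np.array([0.9, 1.3])   # assumed two-point gross return distribution
probs = np.array([0.5, 0.5])
w = 10.0                         # initial wealth w_{T-1}

def u(c):
    return c ** (1 - eps) / (1 - eps)

# Closed form from the slide: rho = (beta * E[r^(1-eps)])^(-1/eps)
R_pow = probs @ returns ** (1 - eps)
rho = (beta * R_pow) ** (-1 / eps)
gamma = rho / (1 + rho)          # optimal consumption ratio

# Brute-force grid search over c_{T-1}
c_grid = np.linspace(1e-3, w - 1e-3, 200_000)
objective = u(c_grid) + beta * (probs @ u(np.outer(returns, w - c_grid)))
c_star = c_grid[np.argmax(objective)]
```

The grid maximizer `c_star` should coincide with the closed-form choice `gamma * w` up to the grid resolution.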

Optimal Discounted Expected Utility

The optimal policy in periods T and T − 1 is c_t = γ_t w_t, where γ_T = 1 and γ_{T−1} has just been defined. In this CES case, the discounted utility of consumption in period T is V_T(w_T) := δ_T u(w_T; ɛ). The discounted expected utility at time T − 1 of consumption in periods T − 1 and T together is

V_{T−1}(w_{T−1}) = δ_{T−1} u(γ_{T−1} w_{T−1}; ɛ) + δ_T E_{T−1}[ u(w̃_T; ɛ) ]

where w̃_T = r̃_{T−1}(1 − γ_{T−1}) w_{T−1}.

Discounted Expected Utility in the Logarithmic Case

In the logarithmic case when ɛ = 1, one has

V_{T−1}(w_{T−1}) = δ_{T−1} ln(γ_{T−1} w_{T−1}) + δ_T E_{T−1}[ ln( r̃_{T−1}(1 − γ_{T−1}) w_{T−1} ) ]

It follows that V_{T−1}(w_{T−1}) = α_{T−1} + (δ_{T−1} + δ_T) u(w_{T−1}; ɛ), where

α_{T−1} := δ_{T−1} ln γ_{T−1} + δ_T { ln(1 − γ_{T−1}) + E_{T−1}[ln r̃_{T−1}] }

Discounted Expected Utility in the CES Case

In the CES case when ɛ ≠ 1, one has

(1 − ɛ) V_{T−1}(w_{T−1}) = δ_{T−1} (γ_{T−1} w_{T−1})^{1−ɛ} + δ_T [(1 − γ_{T−1}) w_{T−1}]^{1−ɛ} E_{T−1}[ (r̃_{T−1})^{1−ɛ} ]

so V_{T−1}(w_{T−1}) = v_{T−1} u(w_{T−1}; ɛ), where

v_{T−1} := δ_{T−1} (γ_{T−1})^{1−ɛ} + δ_T (1 − γ_{T−1})^{1−ɛ} E_{T−1}[ (r̃_{T−1})^{1−ɛ} ]

In both cases, one can write V_{T−1}(w_{T−1}) = α_{T−1} + v_{T−1} u(w_{T−1}; ɛ) for a suitable additive constant α_{T−1} (which is 0 in the CES case) and a suitable multiplicative constant v_{T−1}.


The Time Line

In each period t, suppose:
1. the consumer starts with known wealth w_t;
2. then the consumer chooses consumption c_t, along with savings or residual wealth w_t − c_t;
3. there is a cumulative distribution function F_t(r) on R that determines the gross return r̃_t as a positive-valued random variable.
After these three steps have been completed, the problem starts again in period t + 1, with the consumer's wealth known to be w_{t+1} = r_t(w_t − c_t).

Expected Conditionally Expected Utility

Starting at any t, suppose the consumer's choices, together with the random returns, jointly determine a cdf F^T_t over the space of intertemporal consumption streams c^T_t. The associated expected utility is E_t[U^T_t(c^T_t)], using the shorthand E_t to denote integration w.r.t. the cdf F^T_t. Then, given that the consumer has chosen c_t at time t, let E_{t+1}[ · | c_t ] denote the conditional expected utility. This is found by integrating w.r.t. the conditional cdf F^T_{t+1}(c^T_{t+1} | c_t). The law of iterated expectations allows us to write the unconditional expectation E_t[U^T_t(c^T_t)] as the expectation E_t[ E_{t+1}[U^T_t(c^T_t) | c_t] ] of the conditional expectation.

The Expectation of Additively Separable Utility

Our hypothesis is that the intertemporal von Neumann-Morgenstern utility function takes the additively separable form

U^T_t(c^T_t) = Σ_{τ=t}^T u_τ(c_τ)

The conditional expectation given c_t must then be

E_{t+1}[ U^T_t(c^T_t) | c_t ] = u_t(c_t) + E_{t+1}[ Σ_{τ=t+1}^T u_τ(c_τ) | c_t ]

whose expectation is

E_t[ Σ_{τ=t}^T u_τ(c_τ) ] = u_t(c_t) + E_t[ E_{t+1}[ Σ_{τ=t+1}^T u_τ(c_τ) | c_t ] ]

The Continuation Value

Let V_{t+1}(w_{t+1}) be the state valuation function expressing the maximum of the continuation value

E_{t+1}[ U^T_{t+1}(c^T_{t+1}) | w_{t+1} ] = E_{t+1}[ Σ_{τ=t+1}^T u_τ(c_τ) | w_{t+1} ]

as a function of the wealth level or state w_{t+1}. Assume this maximum value is achieved by following an optimal policy from period t + 1 on. Then total expected utility at time t reduces to

E_t[ U^T_t(c^T_t) | c_t ] = u_t(c_t) + E_t[ E_{t+1}[ Σ_{τ=t+1}^T u_τ(c_τ) | w_{t+1} ] | c_t ]
                          = u_t(c_t) + E_t[ V_{t+1}(w̃_{t+1}) | c_t ]
                          = u_t(c_t) + E_t[ V_{t+1}( r̃_t(w_t − c_t) ) ]

The Principle of Optimality

Maximizing E_s[U^T_s(c^T_s)] w.r.t. c_s, taking as fixed the optimal consumption plans c_t(w_t) at times t = s + 1, ..., T, therefore requires choosing c_s to maximize

u_s(c_s) + E_s[ V_{s+1}( r̃_s(w_s − c_s) ) ]

Let c_s(w_s) denote a solution to this maximization problem. Then the value of an optimal plan (c_t(w_t))_{t=s}^T that starts with wealth w_s at time s is

V_s(w_s) := u_s(c_s(w_s)) + E_s[ V_{s+1}( r̃_s(w_s − c_s(w_s)) ) ]

Together, these two properties can be expressed as

V_s(w_s) = max_{0 ≤ c_s ≤ w_s} { u_s(c_s) + E_s[ V_{s+1}( r̃_s(w_s − c_s) ) ] }
c_s(w_s) ∈ arg max_{0 ≤ c_s ≤ w_s} { u_s(c_s) + E_s[ V_{s+1}( r̃_s(w_s − c_s) ) ] }

which can be described as the principle of optimality.

An Induction Hypothesis

Consider once again the case when u_t(c) ≡ δ_t u(c; ɛ) for the CES (or logarithmic) utility function that satisfies u′(c; ɛ) ≡ c^{−ɛ} and, specifically, u(c; ɛ) = c^{1−ɛ}/(1 − ɛ) if ɛ ≠ 1, and u(c; ɛ) = ln c if ɛ = 1.

Inspired by the solution we have already found for the final period T and penultimate period T − 1, we adopt the induction hypothesis that there are constants α_t, γ_t, v_t (t = T, T − 1, ..., s + 1, s) for which

c_t(w_t) = γ_t w_t and V_t(w_t) = α_t + v_t u(w_t; ɛ)

In particular, the consumption ratio γ_t and savings ratio 1 − γ_t are both independent of the wealth level w_t.

Applying Backward Induction

Under the induction hypotheses that c_t(w_t) = γ_t w_t and V_t(w_t) = α_t + v_t u(w_t; ɛ), the maximand takes the form

u_s(c_s) + E_s[ V_{s+1}( r̃_s(w_s − c_s) ) ] = δ_s u(c_s; ɛ) + E_s[ α_{s+1} + v_{s+1} u( r̃_s(w_s − c_s); ɛ) ]

The first-order condition for this to be maximized w.r.t. c_s is

0 = δ_s u′(c_s; ɛ) − v_{s+1} E_s[ r̃_s u′( r̃_s(w_s − c_s); ɛ) ]

or, equivalently,

δ_s (c_s)^{−ɛ} = v_{s+1} E_s[ r̃_s ( r̃_s(w_s − c_s) )^{−ɛ} ] = v_{s+1} (w_s − c_s)^{−ɛ} E_s[ (r̃_s)^{1−ɛ} ]

Solving the Logarithmic Case

When ɛ = 1, and so u(c; ɛ) = ln c, the first-order condition reduces to δ_s (c_s)^{−1} = v_{s+1} (w_s − c_s)^{−1}. Its solution is indeed c_s = γ_s w_s, where δ_s (γ_s)^{−1} = v_{s+1} (1 − γ_s)^{−1}, implying that γ_s = δ_s/(δ_s + v_{s+1}). The state valuation function then becomes

V_s(w_s) = δ_s u(γ_s w_s; ɛ) + α_{s+1} + v_{s+1} E_s[ u( r̃_s(1 − γ_s) w_s; ɛ) ]
         = δ_s ln(γ_s w_s) + α_{s+1} + v_{s+1} E_s[ ln( r̃_s(1 − γ_s) w_s ) ]
         = δ_s ln(γ_s w_s) + α_{s+1} + v_{s+1} { ln((1 − γ_s) w_s) + ln R_s }

where we define the geometric mean certainty equivalent return R_s so that ln R_s := E_s[ln r̃_s].

The State Valuation Function

The formula

V_s(w_s) = δ_s ln(γ_s w_s) + α_{s+1} + v_{s+1} { ln((1 − γ_s) w_s) + ln R_s }

reduces to the desired form V_s(w_s) = α_s + v_s ln w_s provided we take v_s := δ_s + v_{s+1}, which implies that γ_s = δ_s/v_s, and also

α_s := δ_s ln γ_s + α_{s+1} + v_{s+1} { ln(1 − γ_s) + ln R_s }
     = δ_s ln(δ_s/v_s) + α_{s+1} + v_{s+1} { ln(v_{s+1}/v_s) + ln R_s }
     = δ_s ln δ_s + α_{s+1} − v_s ln v_s + v_{s+1} { ln v_{s+1} + ln R_s }

This confirms the induction hypothesis for the logarithmic case. The relevant constants v_s are found by summing backwards, starting with v_T = δ_T, implying that v_s = Σ_{τ=s}^T δ_τ.

The Stationary Logarithmic Case

In the stationary logarithmic case:
1. the felicity function in each period t is β^t ln c_t, so the one-period discount factor is the constant β;
2. the certainty equivalent return R_t is also a constant R.
Then v_s = Σ_{τ=s}^T δ_τ = Σ_{τ=s}^T β^τ = (β^s − β^{T+1})/(1 − β), implying that

γ_s = β^s/v_s = β^s (1 − β)/(β^s − β^{T+1})

It follows that

c_s = γ_s w_s = (1 − β) w_s/(1 − β^{T−s+1}) = (1 − β) w_s/(1 − β^{H+1})

when there are H := T − s periods left before the horizon T. As H → ∞, this solution converges to c_s = (1 − β) w_s, so the savings ratio equals the constant discount factor β. Remarkably, this is also independent of the gross return to saving.
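The backward recursion v_T = δ_T, v_s = δ_s + v_{s+1}, γ_s = δ_s/v_s is easy to run numerically and compare with the closed form for γ_s; the values of β and T below are illustrative:

```python
beta, T = 0.95, 200   # illustrative discount factor and horizon

# Backward recursion from the slides, with delta_t = beta^t:
# v_T = delta_T, then v_s = delta_s + v_{s+1} and gamma_s = delta_s / v_s.
v_next = beta ** T
gammas = {T: 1.0}
for s in range(T - 1, -1, -1):
    v_s = beta ** s + v_next
    gammas[s] = beta ** s / v_s
    v_next = v_s

# Closed form derived above: gamma_s = beta^s (1 - beta) / (beta^s - beta^(T+1))
def gamma_closed(s):
    return beta ** s * (1 - beta) / (beta ** s - beta ** (T + 1))
```

As H = T − s grows, gammas[s] approaches the limiting consumption ratio 1 − β, independently of the return distribution.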

First-Order Condition in the CES Case

Recall that the first-order condition in the CES case is

δ_s (c_s)^{−ɛ} = v_{s+1} (w_s − c_s)^{−ɛ} E_s[ (r̃_s)^{1−ɛ} ] = v_{s+1} (w_s − c_s)^{−ɛ} R_s^{1−ɛ}

where we have defined the certainty equivalent return R_s as the solution to R_s^{1−ɛ} := E_s[ (r̃_s)^{1−ɛ} ]. The first-order condition indeed implies that c_s(w_s) = γ_s w_s, where δ_s (γ_s)^{−ɛ} = v_{s+1} (1 − γ_s)^{−ɛ} R_s^{1−ɛ}. This implies that

(1 − γ_s)/γ_s = ( v_{s+1} R_s^{1−ɛ}/δ_s )^{1/ɛ}

or

γ_s = ( v_{s+1} R_s^{1−ɛ}/δ_s )^{−1/ɛ} / [ 1 + ( v_{s+1} R_s^{1−ɛ}/δ_s )^{−1/ɛ} ] = ( v_{s+1} R_s^{1−ɛ} )^{−1/ɛ} / [ (δ_s)^{−1/ɛ} + ( v_{s+1} R_s^{1−ɛ} )^{−1/ɛ} ]

Completing the Solution in the CES Case

Under the induction hypothesis that V_{s+1}(w) = v_{s+1} w^{1−ɛ}/(1 − ɛ), one also has

(1 − ɛ) V_s(w_s) = δ_s (γ_s w_s)^{1−ɛ} + v_{s+1} E_s[ ( r̃_s(1 − γ_s) w_s )^{1−ɛ} ]

This reduces to the desired form (1 − ɛ) V_s(w_s) = v_s (w_s)^{1−ɛ}, where

v_s := δ_s (γ_s)^{1−ɛ} + v_{s+1} E_s[ (r̃_s)^{1−ɛ} ] (1 − γ_s)^{1−ɛ}
     = [ δ_s ( v_{s+1} R_s^{1−ɛ} )^{1−1/ɛ} + v_{s+1} R_s^{1−ɛ} (δ_s)^{1−1/ɛ} ] / [ (δ_s)^{−1/ɛ} + ( v_{s+1} R_s^{1−ɛ} )^{−1/ɛ} ]^{1−ɛ}
     = δ_s v_{s+1} R_s^{1−ɛ} [ (δ_s)^{−1/ɛ} + ( v_{s+1} R_s^{1−ɛ} )^{−1/ɛ} ] / [ (δ_s)^{−1/ɛ} + ( v_{s+1} R_s^{1−ɛ} )^{−1/ɛ} ]^{1−ɛ}
     = δ_s v_{s+1} R_s^{1−ɛ} [ (δ_s)^{−1/ɛ} + ( v_{s+1} R_s^{1−ɛ} )^{−1/ɛ} ]^{ɛ}

This confirms the induction hypothesis for the CES case. Again, the relevant constants are found by working backwards.
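The algebra above can be verified numerically for one recursion step: compute γ_s from the first-order condition, evaluate v_s from its definition, and compare with the closed form (the parameter values are arbitrary but illustrative; any positive values should work):

```python
# One step of the CES recursion, checked numerically.
delta, v_next, R, eps = 0.9, 1.7, 1.04, 2.5   # delta_s, v_{s+1}, R_s, epsilon

vR = v_next * R ** (1 - eps)                  # shorthand for v_{s+1} * R_s^(1-eps)
gamma = vR ** (-1 / eps) / (delta ** (-1 / eps) + vR ** (-1 / eps))

# v_s from its definition, and from the closed form on this slide
v_def = delta * gamma ** (1 - eps) + vR * (1 - gamma) ** (1 - eps)
v_closed = delta * vR * (delta ** (-1 / eps) + vR ** (-1 / eps)) ** eps
```

Agreement of `v_def` and `v_closed` up to rounding confirms the displayed simplification.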


Histories and Strategies

For each time t = s, s + 1, ..., T between the start s and the horizon T, let h^t denote a known history (w_τ, c_τ, r_τ)_{τ=s}^t of the triples (w_τ, c_τ, r_τ) at successive times τ = s, s + 1, ..., t up to time t. A general policy the consumer can choose involves a measurable function h^t ↦ ψ_t(h^t) mapping each known history up to time t, which determines the consumer's information set, into a consumption level at that time. The collection of successive functions ψ^T_s = (ψ_t)_{t=s}^T is what a game theorist would call the consumer's strategy in the extensive-form game against nature.

Markov Strategies

We found an optimal solution for the two-period problem when t = T − 1. It took the form of a Markov strategy ψ_t(h^t) := c_t(w_t), which depends only on w_t as the particular state variable. The following analysis will demonstrate in particular that at each time t = s, s + 1, ..., T, under the induction hypothesis that the consumer will follow a Markov strategy in periods τ = t + 1, t + 2, ..., T, there exists a Markov strategy that is optimal in period t. It will follow by backward induction that there exists an optimal strategy h^t ↦ ψ_t(h^t) for every period t = s, s + 1, ..., T that takes the Markov form h^t ↦ w_t ↦ c_t(w_t). This treats history as irrelevant, except insofar as it determines current wealth w_t at the time when c_t has to be chosen.

A Stochastic Difference Equation

Accordingly, suppose that the consumer pursues a Markov strategy taking the form w_t ↦ c_t(w_t). Then the Markov state variable w_t will evolve over time according to the stochastic difference equation

w_{t+1} = φ_t(w_t, r̃_t) := r̃_t(w_t − c_t(w_t))

Starting at any time t, conditional on initial wealth w_t, this equation will have a random solution w̃^T_{t+1} = (w̃_τ)_{τ=t+1}^T described by a unique joint conditional cdf F^T_{t+1}(w^T_{t+1} | w_t) on R^{T−t}. Combined with the Markov strategy w_t ↦ c_t(w_t), this generates a random consumption stream c̃^T_{t+1} = (c̃_τ)_{τ=t+1}^T described by a unique joint conditional cdf G^T_{t+1}(c^T_{t+1} | w_t) on R^{T−t}.


General Finite Horizon Problem

Consider the objective of choosing the sequence (y_s, y_{s+1}, ..., y_{T−2}, y_{T−1}) of controls in order to maximize

E_s[ Σ_{t=s}^{T−1} u_t(x_t, y_t) + φ_T(x_T) ]

subject to the law of motion x_{t+1} = ξ_t(x_t, y_t, ɛ_t), where the random shocks ɛ_t at different times t = s, s + 1, s + 2, ..., T − 1 are conditionally independent given x_t, y_t. Here x_T ↦ φ_T(x_T) is the terminal state valuation function. The stochastic law of motion can also be expressed through successive conditional probabilities P_{t+1}(x_{t+1} | x_t, y_t). The choices of y_t at successive times determine a controlled Markov process governing the stochastic transition from each state x_t to its immediate successor x_{t+1}.

Backward Recurrence Relation

To find the optimal solution, solve the backward recurrence relation

V_s(x_s) = max_{y_s ∈ F_s(x_s)} { u_s(x_s, y_s) + E_s[ V_{s+1}(x_{s+1}) | x_s, y_s ] }
y_s(x_s) ∈ arg max_{y_s ∈ F_s(x_s)} { u_s(x_s, y_s) + E_s[ V_{s+1}(x_{s+1}) | x_s, y_s ] }

where, for each start time s,
1. x_s denotes the inherited state at time s;
2. V_s(x_s) is the current value in state x_s of the state value function X ∋ x ↦ V_s(x) ∈ R;
3. X ∋ x ↦ F_s(x) ⊆ Y is the feasible set correspondence, with graph G_s := {(x, y) | x ∈ X & y ∈ F_s(x)};
4. G_s ∋ (x, y) ↦ u_s(x, y) denotes the immediate return function;
5. X ∋ x ↦ y_s(x) ∈ F_s(x) is the optimal strategy or policy function;
6. the relevant terminal condition is that V_T(x_T) is given by the exogenously specified function φ_T(x_T).
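This backward recurrence can be implemented directly on a discretized savings problem. A minimal sketch assuming log felicity, a constant discount factor, and a two-point i.i.d. return distribution (the horizon, grids, and all parameter values are illustrative assumptions); in the logarithmic case the computed time-0 policy should match the formula c_s = (1 − β)w_s/(1 − β^{H+1}) from the stationary logarithmic case:

```python
import numpy as np

# Backward induction on a discretized finite-horizon savings problem
# with log felicity u(c) = ln c and V_T(w) = ln w as terminal condition.
beta = 0.95
returns, probs = np.array([0.9, 1.2]), np.array([0.5, 0.5])
T = 5
grid = np.linspace(0.05, 20.0, 400)       # wealth grid

V = np.log(grid)                          # terminal condition V_T(w) = ln w

def bellman_rhs(w, V_next):
    """Candidate consumptions c and the maximand u(c) + beta*E[V_next(r(w-c))]."""
    c = np.linspace(1e-4 * w, (1 - 1e-4) * w, 400)
    cont = sum(p * np.interp(r * (w - c), grid, V_next)
               for p, r in zip(probs, returns))
    return c, np.log(c) + beta * cont

for s in range(T - 1, 0, -1):             # compute V_{T-1}, ..., V_1 in turn
    V = np.array([bellman_rhs(w, V)[1].max() for w in grid])

# Time-0 policy at one wealth level, versus the stationary-log-case formula
w0 = 5.0
c_cand, vals = bellman_rhs(w0, V)
c0 = c_cand[np.argmax(vals)]
c_theory = (1 - beta) * w0 / (1 - beta ** (T + 1))
```

The agreement is only up to grid and interpolation error, but it illustrates how the recurrence, the terminal condition, and the policy function fit together.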


An Infinite Horizon Savings Problem

Game theorists speak of the one-shot deviation principle. This states that if any deviation from a particular policy or strategy improves a player's payoff, then there exists a one-shot deviation that improves the payoff.

We consider the infinite horizon extension of the consumption/investment problem already considered. Given the initial time s and initial wealth w_s, this takes the form of choosing a consumption policy c_t(w_t) at times t = s, s+1, s+2, \ldots in order to maximize the discounted sum of total utility, given by

\[ \sum_{t=s}^{\infty} \beta^{t-s} u(c_t) \]

subject to the accumulation equation w_{t+1} = \tilde{r}_t(w_t - c_t), as well as the inequality constraint w_t \ge 0 for t = s+1, s+2, \ldots.

Some Assumptions

The parameter \beta \in (0, 1) is the constant discount factor.

Note that the utility function \mathbb{R} \ni c \mapsto u(c) is independent of t; its first two derivatives are assumed to satisfy the inequalities u'(c) > 0 and u''(c) < 0 for all c \in \mathbb{R}_+.

The investment returns \tilde{r}_t in successive periods are assumed to be i.i.d. random variables.

It is assumed that w_t in each period t is known at time t, but not before.

Terminal Constraint

There has to be an additional constraint that imposes a lower bound on wealth at some time t. Otherwise there would be no optimal policy: the consumer can always gain by increasing debt (negative wealth), no matter how large existing debt may be.

In the finite horizon case, there was a constraint w_T \ge 0 on terminal wealth. But here T is effectively infinite. One might try an alternative like

\[ \liminf_{t \to \infty} \beta^t w_t \ge 0 \]

But this places no limit on wealth at any finite time. We use the alternative constraint requiring that w_t \ge 0 for all time.


The Stationary Problem

Our modified problem can be written in the following form that is independent of s:

\[ \max_{c_0, c_1, \ldots, c_t, \ldots} \sum_{t=0}^{\infty} \beta^t u(c_t) \]

subject to the constraints 0 \le c_t \le w_t and w_{t+1} = \tilde{r}_t(w_t - c_t) for all t = 0, 1, 2, \ldots, with w_0 = w, where w is given.

Because the starting time s is irrelevant, this is a stationary problem. Define the state valuation function w \mapsto V(w) as the maximum value of the objective, as a function of initial wealth w. It is independent of s because the problem is stationary.
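The feasibility structure of this stationary problem is easy to see in simulation: under any policy with 0 \le c_t \le w_t and positive gross returns, wealth stays non-negative automatically. The sketch below uses an illustrative linear policy c_t = \gamma w_t and a hypothetical two-point return distribution; all numbers are assumptions, not from the notes.

```python
import math
import random

# Simulate the accumulation constraint w_{t+1} = r_t * (w_t - c_t)
# under the linear consumption policy c_t = gamma * w_t.
random.seed(1)
beta, gamma, w = 0.95, 0.3, 10.0
total, discount = 0.0, 1.0
for t in range(200):
    c = gamma * w                                # consume a fixed fraction of wealth
    total += discount * math.log(c)              # accumulate discounted log utility
    discount *= beta
    w = random.choice([1.1, 0.95]) * (w - c)     # i.i.d. gross return r_t
    assert w > 0                                 # wealth never goes negative
```

Because 0 < \gamma < 1 and the returns are strictly positive, the constraint w_t \ge 0 never binds along the simulated path.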

Bellman's Equation

For the finite horizon problem, the principle of optimality was

\[ V_s(w_s) = \max_{0 \le c_s \le w_s} \{ u_s(c_s) + \mathbb{E}_s[V_{s+1}(\tilde{r}_s(w_s - c_s))] \} \]
\[ c_s^*(w_s) = \arg\max_{0 \le c_s \le w_s} \{ u_s(c_s) + \mathbb{E}_s[V_{s+1}(\tilde{r}_s(w_s - c_s))] \} \]

For the stationary infinite horizon problem, however, the starting time s is irrelevant. So the principle of optimality can be expressed as

\[ V(w) = \max_{0 \le c \le w} \{ u(c) + \beta \mathbb{E}[V(\tilde{r}(w - c))] \} \]
\[ c^*(w) = \arg\max_{0 \le c \le w} \{ u(c) + \beta \mathbb{E}[V(\tilde{r}(w - c))] \} \]

The state valuation function w \mapsto V(w) appears on both the left- and right-hand sides of this equation. Solving it therefore involves finding a fixed point, or fixed function, in an appropriate function space.

Isoelastic Case

We consider yet again the isoelastic case with a CES (or logarithmic) utility function that satisfies u'(c; \epsilon) \equiv c^{-\epsilon} and, specifically,

\[ u(c; \epsilon) = \begin{cases} c^{1-\epsilon}/(1-\epsilon) & \text{if } \epsilon \ne 1; \\ \ln c & \text{if } \epsilon = 1. \end{cases} \]

Recall the corresponding finite horizon case, where we found that the solution to the corresponding equations

\[ V_s(w_s) = \max_{0 \le c_s \le w_s} \{ u_s(c_s) + \beta \mathbb{E}_s[V_{s+1}(\tilde{r}_s(w_s - c_s))] \} \]
\[ c_s^*(w_s) = \arg\max_{0 \le c_s \le w_s} \{ u_s(c_s) + \beta \mathbb{E}_s[V_{s+1}(\tilde{r}_s(w_s - c_s))] \} \]

takes the form:
(i) V_s(w) = \alpha_s + v_s u(w; \epsilon) for suitable real constants \alpha_s and v_s > 0, where \alpha_s = 0 if \epsilon \ne 1;
(ii) c_s^*(w_s) = \gamma_s w_s for a suitable constant \gamma_s \in (0, 1).

First-Order Condition

Accordingly, we look for a solution to the stationary problem

\[ V(w) = \max_{0 \le c \le w} \{ u(c; \epsilon) + \beta \mathbb{E}[V(\tilde{r}(w - c))] \} \]
\[ c^*(w) = \arg\max_{0 \le c \le w} \{ u(c; \epsilon) + \beta \mathbb{E}[V(\tilde{r}(w - c))] \} \]

taking the isoelastic form V(w) = \alpha + v u(w; \epsilon) for suitable real constants \alpha and v > 0, where \alpha = 0 if \epsilon \ne 1.

The first-order condition for solving this concave maximization problem is

\[ c^{-\epsilon} = \beta \mathbb{E}[\tilde{r} (\tilde{r}(w - c))^{-\epsilon}] = \zeta^{\epsilon} (w - c)^{-\epsilon} \]

where \zeta^{\epsilon} := \beta R^{1-\epsilon}, with R as the certainty equivalent return defined by R^{1-\epsilon} := \mathbb{E}[\tilde{r}^{1-\epsilon}].

Hence c = \gamma w where \gamma^{-\epsilon} = \zeta^{\epsilon} (1 - \gamma)^{-\epsilon}, implying that \zeta = (1 - \gamma)/\gamma, the savings/consumption ratio. Then \gamma = 1/(1 + \zeta), so 1 - \gamma = \zeta/(1 + \zeta).
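The constants \zeta and \gamma implied by this first-order condition are straightforward to compute. A small sketch following the text's formulas, with an illustrative two-point return distribution (the distribution and parameter values are assumptions, not from the notes):

```python
# zeta^eps = beta * R^(1-eps) with R^(1-eps) = E[r^(1-eps)], then gamma = 1/(1+zeta).
beta, eps = 0.95, 2.0
returns = [1.1, 0.9]                                        # equally likely gross returns
R_pow = sum(r**(1 - eps) for r in returns) / len(returns)   # E[r^(1-eps)]
zeta = (beta * R_pow) ** (1 / eps)
gamma = 1 / (1 + zeta)                                      # consumption ratio c = gamma * w

# The FOC c^(-eps) = zeta^eps * (w - c)^(-eps) then holds with c = gamma * w:
w = 10.0
c = gamma * w
assert abs(c**(-eps) - zeta**eps * (w - c)**(-eps)) < 1e-9
```

The final assertion just restates the algebra in the text: with \gamma = 1/(1+\zeta), the ratio (1-\gamma)/\gamma equals \zeta exactly.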

Solution in the Logarithmic Case

When \epsilon = 1 and so u(c; \epsilon) = \ln c, one has

\[ V(w) = u(\gamma w; \epsilon) + \beta \{ \alpha + v \mathbb{E}[u(\tilde{r}(1-\gamma)w; \epsilon)] \}
        = \ln(\gamma w) + \beta \{ \alpha + v \mathbb{E}[\ln(\tilde{r}(1-\gamma)w)] \}
        = \ln \gamma + (1 + \beta v) \ln w + \beta \{ \alpha + v \ln(1-\gamma) + v \mathbb{E}[\ln \tilde{r}] \} \]

This is consistent with V(w) = \alpha + v \ln w in case:
1. v = 1 + \beta v, implying that v = (1 - \beta)^{-1};
2. and also \alpha = \ln \gamma + \beta \{ \alpha + v \ln(1-\gamma) + v \mathbb{E}[\ln \tilde{r}] \}, which implies that

\[ \alpha = (1 - \beta)^{-1} \big[ \ln \gamma + \beta (1 - \beta)^{-1} \{ \ln(1-\gamma) + \mathbb{E}[\ln \tilde{r}] \} \big] \]

This confirms the solution for the logarithmic case.
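The two consistency conditions can be checked numerically. This sketch assumes a deterministic return r (so that \mathbb{E}[\ln \tilde{r}] = \ln r) and an arbitrary consumption ratio \gamma \in (0, 1); all numbers are illustrative.

```python
import math

# Verify that V(w) = alpha + v*ln(w), with v = 1/(1-beta) and the stated alpha,
# satisfies V(w) = ln(gamma*w) + beta*(alpha + v*ln(r*(1-gamma)*w)) for every w > 0.
beta, r, gamma = 0.9, 1.05, 0.3
v = 1 / (1 - beta)
alpha = v * (math.log(gamma) + beta * v * (math.log(1 - gamma) + math.log(r)))
for w in (0.5, 1.0, 7.3):
    lhs = alpha + v * math.log(w)
    rhs = math.log(gamma * w) + beta * (alpha + v * math.log(r * (1 - gamma) * w))
    assert abs(lhs - rhs) < 1e-9
```

The identity holds for every w because the coefficient on \ln w matches (v = 1 + \beta v) and the constants match by the choice of \alpha.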

Solution in the CES Case

When \epsilon \ne 1 and so u(c; \epsilon) = c^{1-\epsilon}/(1-\epsilon), the equation V(w) = u(\gamma w; \epsilon) + \beta v \mathbb{E}[u(\tilde{r}(1-\gamma)w; \epsilon)] implies that

\[ (1-\epsilon) V(w) = (\gamma w)^{1-\epsilon} + \beta v \mathbb{E}[(\tilde{r}(1-\gamma)w)^{1-\epsilon}] = v w^{1-\epsilon} \]

where v = \gamma^{1-\epsilon} + \beta v (1-\gamma)^{1-\epsilon} R^{1-\epsilon}, and so

\[ v = \frac{\gamma^{1-\epsilon}}{1 - \beta (1-\gamma)^{1-\epsilon} R^{1-\epsilon}} = \frac{\gamma^{1-\epsilon}}{1 - (1-\gamma)^{1-\epsilon} \zeta^{\epsilon}} \]

But optimality requires \gamma = 1/(1+\zeta), implying finally that

\[ v = \frac{(1+\zeta)^{\epsilon-1}}{1 - \zeta (1+\zeta)^{\epsilon-1}} = \frac{1}{(1+\zeta)^{1-\epsilon} - \zeta} \]

This confirms the solution for the CES case.
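The two closed forms for v above should agree at \gamma = 1/(1+\zeta), which is easy to confirm numerically; the values of \epsilon and \zeta below are illustrative.

```python
# Consistency check of the two closed forms for v in the CES case:
# gamma^(1-eps) / (1 - (1-gamma)^(1-eps) * zeta^eps), evaluated at
# gamma = 1/(1+zeta), should equal 1 / ((1+zeta)^(1-eps) - zeta).
eps, zeta = 2.0, 0.5
gamma = 1 / (1 + zeta)
v1 = gamma**(1 - eps) / (1 - (1 - gamma)**(1 - eps) * zeta**eps)
v2 = 1 / ((1 + zeta)**(1 - eps) - zeta)
assert abs(v1 - v2) < 1e-9 and v1 > 0
```

With \epsilon = 2 and \zeta = 0.5 both formulas give v = 6, and v stays positive as long as (1+\zeta)^{1-\epsilon} > \zeta.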


Uniformly Bounded Returns

Suppose that the stochastic transition from each state x to the immediately succeeding state \tilde{x} is specified by a conditional probability measure B \mapsto P(\tilde{x} \in B \mid x, u) on a \sigma-algebra of the state space.

Consider the stationary problem of choosing a policy x \mapsto u^*(x) in order to maximize the infinite discounted sum of utility

\[ \mathbb{E} \sum_{t=1}^{\infty} \beta^{t-1} f(x_t, u_t) \]

where 0 < \beta < 1, with x_1 given and subject to u_t \in U(x_t) for t = 1, 2, \ldots.

The return function (x, u) \mapsto f(x, u) \in \mathbb{R} is uniformly bounded provided there exist a uniform lower bound \underline{M} and a uniform upper bound \overline{M} such that

\[ \underline{M} \le f(x, u) \le \overline{M} \quad \text{for all } (x, u) \]

The Function Space

The boundedness assumption \underline{M} \le f(x, u) \le \overline{M} for all (x, u) ensures that, because 0 < \beta < 1 and so \sum_{t=1}^{\infty} \beta^{t-1} = \frac{1}{1-\beta}, the infinite discounted sum of utility

\[ W := \mathbb{E} \sum_{t=1}^{\infty} \beta^{t-1} f(x_t, u_t) \]

satisfies (1-\beta) W \in [\underline{M}, \overline{M}].

This makes it natural to consider the linear space \mathcal{V} of all bounded functions X \ni x \mapsto V(x) \in \mathbb{R} equipped with its sup norm defined by \|V\| := \sup_{x \in X} |V(x)|.

We will pay special attention to the subset

\[ \mathcal{V}_M := \{ V \in \mathcal{V} \mid \forall x \in X : (1-\beta) V(x) \in [\underline{M}, \overline{M}] \} \]

of state valuation functions whose values V(x) all lie within the range of the possible values of W.
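The bound (1-\beta)W \in [\underline{M}, \overline{M}] follows from the geometric series and can be illustrated directly; the bounds, discount factor, and random stream below are assumptions for illustration.

```python
import random

# Any discounted stream of per-period returns f_t in [M_lo, M_hi] gives a total W
# with (1 - beta) * W in [M_lo, M_hi].  The horizon is truncated at 200 periods,
# which leaves a negligible tail for beta = 0.9.
random.seed(2)
beta, M_lo, M_hi = 0.9, -1.0, 2.0
fs = [random.uniform(M_lo, M_hi) for _ in range(200)]
W = sum(beta**t * f for t, f in enumerate(fs))      # sum_{t>=1} beta^(t-1) f_t
assert M_lo <= (1 - beta) * W <= M_hi
```

Scaling by (1-\beta) simply undoes the geometric factor \sum_t \beta^{t-1} = (1-\beta)^{-1}, which is why \mathcal{V}_M is defined through (1-\beta)V(x) rather than V(x) itself.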

Existence and Uniqueness Theorem

Consider the Bellman equation system

\[ V(x) = \max_{u \in F(x)} \{ f(x, u) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u] \} \]
\[ u^*(x) \in \arg\max_{u \in F(x)} \{ f(x, u) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u] \} \]

Under the assumption of uniformly bounded returns satisfying \underline{M} \le f(x, u) \le \overline{M} for all (x, u):
1. among the set \mathcal{V}_M of state valuation functions that satisfy the inequalities \underline{M} \le (1-\beta) V(x) \le \overline{M} for all x, there is a unique state valuation function x \mapsto V(x) that satisfies the Bellman equation system;
2. any associated policy solution x \mapsto u^*(x) determines an optimal policy that is stationary, i.e., independent of time.

Two Mappings

Given any measurable policy function X \ni x \mapsto u(x), denoted by u, define the mapping T_u : \mathcal{V}_M \to \mathcal{V} by

\[ [T_u V](x) := f(x, u(x)) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u(x)] \]

When the state is x, this gives the value [T_u V](x) of choosing the policy u(x) for one period, and then experiencing a future discounted return V(\tilde{x}) after reaching each possible subsequent state \tilde{x} \in X.

Define also the mapping T^* : \mathcal{V}_M \to \mathcal{V} by

\[ [T^* V](x) := \max_{u \in F(x)} \{ f(x, u) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u] \} \]

These definitions allow the Bellman equation system to be rewritten as

\[ V(x) = [T^* V](x), \qquad u^*(x) \in \arg\max_{u \in F(x)} [T_u V](x) \]

Two Mappings of \mathcal{V}_M into Itself

For all V \in \mathcal{V}_M, policies u, and x \in X, we have defined

\[ [T_u V](x) := f(x, u(x)) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u(x)] \]
\[ [T^* V](x) := \max_{u \in F(x)} \{ f(x, u) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u] \} \]

Recall the uniform boundedness condition \underline{M} \le f(x, u) \le \overline{M}, together with the assumption that V belongs to the domain \mathcal{V}_M of functions satisfying \underline{M} \le (1-\beta) V(\tilde{x}) \le \overline{M} for all \tilde{x}.

So these two definitions jointly imply that

\[ [T_u V](x) \le \overline{M} + \beta (1-\beta)^{-1} \overline{M} = (1-\beta)^{-1} \overline{M} \]
\[ [T_u V](x) \ge \underline{M} + \beta (1-\beta)^{-1} \underline{M} = (1-\beta)^{-1} \underline{M} \]

Similarly, given any V \in \mathcal{V}_M, one has \underline{M} \le (1-\beta) [T^* V](x) \le \overline{M} for all x \in X.

Therefore both V \mapsto T_u V and V \mapsto T^* V map \mathcal{V}_M into itself.

A First Contraction Mapping

The definition [T_u V](x) := f(x, u(x)) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u(x)] implies that for any two functions V_1, V_2 \in \mathcal{V}_M, one has

\[ [T_u V_1](x) - [T_u V_2](x) = \beta \mathbb{E}[V_1(\tilde{x}) - V_2(\tilde{x}) \mid x, u(x)] \]

The definition of the sup norm therefore implies that

\[ \|T_u V_1 - T_u V_2\| = \sup_{x \in X} |[T_u V_1](x) - [T_u V_2](x)|
   = \beta \sup_{x \in X} |\mathbb{E}[V_1(\tilde{x}) - V_2(\tilde{x}) \mid x, u(x)]|
   \le \beta \sup_{\tilde{x} \in X} |V_1(\tilde{x}) - V_2(\tilde{x})| = \beta \|V_1 - V_2\| \]

Hence \mathcal{V}_M \ni V \mapsto T_u V \in \mathcal{V}_M is a contraction mapping with factor \beta < 1.
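The contraction property is easy to observe numerically. This sketch builds a toy finite-state operator T_u (the chain, returns, and transition matrix are illustrative assumptions) and checks the sup-norm inequality:

```python
import random

# Check that T_u is a beta-contraction in the sup norm on a toy finite chain:
# ||T_u V1 - T_u V2|| <= beta * ||V1 - V2||.
random.seed(0)
beta, n = 0.9, 5
f = [random.uniform(-1, 1) for _ in range(n)]        # f(x, u(x)) for each state x
P = [[1.0 / n] * n for _ in range(n)]                # transition probabilities under u

def T_u(V):
    return [f[x] + beta * sum(P[x][y] * V[y] for y in range(n)) for x in range(n)]

V1 = [random.uniform(-5, 5) for _ in range(n)]
V2 = [random.uniform(-5, 5) for _ in range(n)]
gap = max(abs(a - b) for a, b in zip(V1, V2))
new_gap = max(abs(a - b) for a, b in zip(T_u(V1), T_u(V2)))
assert new_gap <= beta * gap + 1e-12
```

Since T_u V_1 - T_u V_2 = \beta P (V_1 - V_2) here, one application of T_u shrinks the sup-norm gap by at least the factor \beta.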

Applying the Contraction Mapping Theorem, I

For each fixed policy u, the contraction mapping \mathcal{V}_M \ni V \mapsto T_u V \in \mathcal{V}_M has a unique fixed point in the form of a function V_u \in \mathcal{V}_M.

Furthermore, given any initial function V \in \mathcal{V}_M, consider the infinite sequence of mappings [T_u]^k V (k \in \mathbb{N}) that result from applying the operator T_u iteratively k times. The contraction mapping property of T_u implies that \|[T_u]^k V - V_u\| \to 0 as k \to \infty.

Characterizing the Fixed Point, I

Starting from V^0 = 0 and given any initial state x \in X, note that

\[ [T_u]^k V^0(x) = [T_u]\big([T_u]^{k-1} V^0\big)(x) = f(x, u(x)) + \beta \mathbb{E}\big[\big([T_u]^{k-1} V^0\big)(\tilde{x}) \mid x, u(x)\big] \]

It follows by induction on k that [T_u]^k V^0(\bar{x}) equals the expected discounted total payoff \mathbb{E} \sum_{t=1}^{k} \beta^{t-1} f(x_t, u_t) of starting from x_1 = \bar{x} and then following the policy x \mapsto u(x) for k subsequent periods.

Taking the limit as k \to \infty, it follows that for any state \bar{x} \in X, the value V_u(\bar{x}) of the fixed point in \mathcal{V}_M is the expected discounted total payoff \mathbb{E} \sum_{t=1}^{\infty} \beta^{t-1} f(x_t, u_t) of starting from x_1 = \bar{x} and then following the policy x \mapsto u(x) for ever thereafter.

A Second Contraction Mapping

Recall the definition [T^* V](x) := \max_{u \in F(x)} \{ f(x, u) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u] \}.

Given any state x \in X and any two functions V_1, V_2 \in \mathcal{V}_M, define u_1, u_2 \in F(x) so that for k = 1, 2 one has

\[ [T^* V_k](x) = f(x, u_k) + \beta \mathbb{E}[V_k(\tilde{x}) \mid x, u_k] \]

Note that [T^* V_2](x) \ge f(x, u_1) + \beta \mathbb{E}[V_2(\tilde{x}) \mid x, u_1], implying that

\[ [T^* V_1](x) - [T^* V_2](x) \le \beta \mathbb{E}[V_1(\tilde{x}) - V_2(\tilde{x}) \mid x, u_1] \le \beta \|V_1 - V_2\| \]

Similarly, interchanging 1 and 2 in the above argument gives [T^* V_2](x) - [T^* V_1](x) \le \beta \|V_1 - V_2\|.

Hence \|T^* V_1 - T^* V_2\| \le \beta \|V_1 - V_2\|, so T^* is also a contraction.

Applying the Contraction Mapping Theorem, II

Similarly the contraction mapping V \mapsto T^* V has a unique fixed point in the form of a function V^* \in \mathcal{V}_M such that V^*(\bar{x}) is the maximized expected discounted total payoff of starting in state x_1 = \bar{x} and following an optimal policy for ever thereafter.

Moreover, V^* = T^* V^* = T_{u^*} V^*. This implies that V^* is also the value of following the policy x \mapsto u^*(x) throughout, which must therefore be an optimal policy.

Characterizing the Fixed Point, II

Starting from V^0 = 0 and given any initial state x \in X, note that

\[ [T^*]^k V^0(x) = [T^*]\big([T^*]^{k-1} V^0\big)(x) = \max_{u \in F(x)} \big\{ f(x, u) + \beta \mathbb{E}\big[\big([T^*]^{k-1} V^0\big)(\tilde{x}) \mid x, u\big] \big\} \]

It follows by induction on k that [T^*]^k V^0(\bar{x}) equals the maximum possible expected discounted total payoff \mathbb{E} \sum_{t=1}^{k} \beta^{t-1} f(x_t, u_t) of starting from x_1 = \bar{x} and then following the backward sequence of optimal policies (u_k^*, u_{k-1}^*, u_{k-2}^*, \ldots, u_2^*, u_1^*), where for each k the policy x \mapsto u_k^*(x) is optimal when k periods remain.


Method of Successive Approximation

The method of successive approximation starts with an arbitrary function V^0 \in \mathcal{V}_M. For k = 1, 2, \ldots, it then repeatedly solves the pair of equations

\[ V^k = T^* V^{k-1} = T_{u^k} V^{k-1} \]

to construct sequences of:
1. state valuation functions X \ni x \mapsto V^k(x) \in \mathbb{R};
2. policies X \ni x \mapsto u^k(x) \in F(x) that are optimal given that one applies the preceding state valuation function X \ni \tilde{x} \mapsto V^{k-1}(\tilde{x}) \in \mathbb{R} to each immediately succeeding state \tilde{x}.

Because the operator V \mapsto T^* V on \mathcal{V}_M is a contraction mapping, the method produces a convergent sequence (V^k)_{k=1}^{\infty} of state valuation functions whose limit satisfies V^* = T^* V^* = T_{u^*} V^* for a suitable policy X \ni x \mapsto u^*(x) \in F(x).
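Successive approximation is exactly value iteration. The sketch below runs it on a tiny deterministic three-state problem; the transition structure and payoffs are assumptions for illustration, not from the notes.

```python
# Value iteration: iterate V_k = T* V_{k-1} until (numerical) convergence.
beta, n = 0.9, 3
# f[x][u]: immediate return from control u in state x; nxt[x][u]: successor state.
f = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
nxt = [[0, 1], [2, 0], [1, 2]]

def T_star(V):
    return [max(f[x][u] + beta * V[nxt[x][u]] for u in range(2)) for x in range(n)]

V = [0.0] * n
for _ in range(500):          # contraction factor beta makes the error ~ beta^500
    V = T_star(V)
# V is now (numerically) a fixed point of T*:
assert max(abs(a - b) for a, b in zip(V, T_star(V))) < 1e-6
```

Because T^* is a \beta-contraction, the iterates converge geometrically to the unique fixed point regardless of the starting function V^0.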

Monotonicity

For all functions V \in \mathcal{V}_M, policies u, and states x \in X, we have defined

\[ [T_u V](x) := f(x, u(x)) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u(x)] \]
\[ [T^* V](x) := \max_{u \in F(x)} \{ f(x, u) + \beta \mathbb{E}[V(\tilde{x}) \mid x, u] \} \]

Notation: Given any pair V_1, V_2 \in \mathcal{V}_M, we write V_1 \ge V_2 to indicate that the inequality V_1(x) \ge V_2(x) holds for all x \in X.

Definition: An operator \mathcal{V}_M \ni V \mapsto TV \in \mathcal{V}_M is monotone just in case whenever V_1, V_2 \in \mathcal{V}_M satisfy V_1 \ge V_2, one has TV_1 \ge TV_2.

Theorem: The following operators on \mathcal{V}_M are monotone:
1. V \mapsto T_u V for all policies u;
2. V \mapsto T^* V.

Proof that T_u is Monotone

Given any state x \in X and any two functions V_1, V_2 \in \mathcal{V}_M, the definition of T_u implies that

\[ [T_u V_1](x) := f(x, u(x)) + \beta \mathbb{E}[V_1(\tilde{x}) \mid x, u(x)] \]
\[ [T_u V_2](x) := f(x, u(x)) + \beta \mathbb{E}[V_2(\tilde{x}) \mid x, u(x)] \]

Subtracting the second equation from the first implies that

\[ [T_u V_1](x) - [T_u V_2](x) = \beta \mathbb{E}[V_1(\tilde{x}) - V_2(\tilde{x}) \mid x, u(x)] \]

If V_1 \ge V_2, and so the inequality V_1(\tilde{x}) \ge V_2(\tilde{x}) holds for all \tilde{x} \in X, it follows that [T_u V_1](x) \ge [T_u V_2](x). Since this holds for all x \in X, we have proved that T_u V_1 \ge T_u V_2.

Proof that T^* is Monotone

Given any state x \in X and any two functions V_1, V_2 \in \mathcal{V}_M, define u_1, u_2 \in F(x) so that for k = 1, 2 one has

\[ [T^* V_k](x) = \max_{u \in F(x)} \{ f(x, u) + \beta \mathbb{E}[V_k(\tilde{x}) \mid x, u] \} = f(x, u_k) + \beta \mathbb{E}[V_k(\tilde{x}) \mid x, u_k] \]

It follows that

\[ [T^* V_1](x) \ge f(x, u_2) + \beta \mathbb{E}[V_1(\tilde{x}) \mid x, u_2] \quad \text{and} \quad [T^* V_2](x) = f(x, u_2) + \beta \mathbb{E}[V_2(\tilde{x}) \mid x, u_2] \]

Subtracting the second equation from the first inequality gives

\[ [T^* V_1](x) - [T^* V_2](x) \ge \beta \mathbb{E}[V_1(\tilde{x}) - V_2(\tilde{x}) \mid x, u_2] \]

If V_1 \ge V_2, and so the inequality V_1(\tilde{x}) \ge V_2(\tilde{x}) holds for all \tilde{x} \in X, it follows that [T^* V_1](x) \ge [T^* V_2](x). Since this holds for all x \in X, we have proved that T^* V_1 \ge T^* V_2.

Starting Policy Improvement

The method of policy improvement starts with any fixed policy u^0, or X \ni x \mapsto u^0(x) \in F(x), along with the value V_{u^0} \in \mathcal{V}_M of following that policy for ever. The value V_{u^0} is the unique fixed point satisfying V_{u^0} = T_{u^0} V_{u^0} which belongs to the domain \mathcal{V}_M of suitably bounded functions.

At each step k = 1, 2, \ldots, given the previous policy u^{k-1} and associated value V_{u^{k-1}} satisfying V_{u^{k-1}} = T_{u^{k-1}} V_{u^{k-1}}:
1. the policy u^k is chosen so that T^* V_{u^{k-1}} = T_{u^k} V_{u^{k-1}};
2. the state valuation function x \mapsto V^k(x) is chosen as the unique fixed point in \mathcal{V}_M of the operator T_{u^k}.

Policy Improvement Theorem

Theorem: The infinite sequence (u^k, V_{u^k})_{k \in \mathbb{N}} consisting of pairs of policies u^k with their associated valuation functions V_{u^k} \in \mathcal{V}_M satisfies
1. V_{u^k} \ge V_{u^{k-1}} for all k \in \mathbb{N} (policy improvement);
2. \|V_{u^k} - V^*\| \to 0 as k \to \infty, where V^* is the infinite-horizon optimal state valuation function in \mathcal{V}_M that satisfies T^* V^* = V^*.

Proof of Policy Improvement

By definition of the optimality operator T^*, one has T^* V \ge T_u V for all functions V \in \mathcal{V}_M and all policies u. So at each step k of the policy improvement routine, one has

\[ T_{u^k} V_{u^{k-1}} = T^* V_{u^{k-1}} \ge T_{u^{k-1}} V_{u^{k-1}} = V_{u^{k-1}} \]

In particular, T_{u^k} V_{u^{k-1}} \ge V_{u^{k-1}}. Now, applying successive iterations of the monotone operator T_{u^k} implies that

\[ V_{u^{k-1}} \le T_{u^k} V_{u^{k-1}} \le [T_{u^k}]^2 V_{u^{k-1}} \le \cdots \le [T_{u^k}]^r V_{u^{k-1}} \le [T_{u^k}]^{r+1} V_{u^{k-1}} \le \cdots \]

But the definition of V_{u^k} implies that for all V \in \mathcal{V}_M, including V = V_{u^{k-1}}, one has \|[T_{u^k}]^r V - V_{u^k}\| \to 0 as r \to \infty.

Hence V_{u^k} = \sup_r [T_{u^k}]^r V_{u^{k-1}} \ge V_{u^{k-1}}, thus confirming that the policy u^k does improve on u^{k-1}.

Proof of Convergence

Recall that at each step k of the policy improvement routine, one has T_{u^k} V_{u^{k-1}} = T^* V_{u^{k-1}} and also T_{u^k} V_{u^k} = V_{u^k}.

Now, for each state x \in X, define \hat{V}(x) := \sup_{k \in \mathbb{N}} V_{u^k}(x). Because V_{u^k} \ge V_{u^{k-1}} and T_{u^k} is monotone, one has

\[ V_{u^k} = T_{u^k} V_{u^k} \ge T_{u^k} V_{u^{k-1}} = T^* V_{u^{k-1}} \]

Next, because T^* is monotone, it follows that

\[ \hat{V} = \sup_k V_{u^k} \ge \sup_k T^* V_{u^{k-1}} = T^* \big( \sup_k V_{u^{k-1}} \big) = T^* \hat{V} \]

Similarly, monotonicity of T^* and its definition together imply that

\[ \hat{V} = \sup_k V_{u^k} = \sup_k T_{u^k} V_{u^k} \le \sup_k T^* V_{u^k} = T^* \big( \sup_k V_{u^k} \big) = T^* \hat{V} \]

Hence \hat{V} = T^* \hat{V} = V^*, because T^* has a unique fixed point. Therefore V^* = \sup_k V_{u^k} and so, because the sequence V_{u^k}(x) is non-decreasing, one has V_{u^k}(x) \uparrow V^*(x) for each x \in X.
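The policy improvement routine can be sketched on the same kind of toy finite problem: evaluate the current policy, choose the greedy policy against its value, and stop when nothing changes. The transition structure and payoffs below are illustrative assumptions, not from the notes.

```python
# Policy improvement on a tiny deterministic three-state problem.
beta, n = 0.9, 3
f = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]     # f[x][u]: immediate return
nxt = [[0, 1], [2, 0], [1, 2]]               # nxt[x][u]: successor state

def evaluate(pol):                           # fixed point of T_u, by iteration
    V = [0.0] * n
    for _ in range(2000):
        V = [f[x][pol[x]] + beta * V[nxt[x][pol[x]]] for x in range(n)]
    return V

pol = [0] * n
while True:
    V = evaluate(pol)                        # step: value of the current policy
    new_pol = [max(range(2), key=lambda u, x=x: f[x][u] + beta * V[nxt[x][u]])
               for x in range(n)]            # step: greedy (improving) policy
    if new_pol == pol:
        break
    pol = new_pol
# At termination the value satisfies the Bellman equation V = T* V:
assert all(abs(V[x] - max(f[x][u] + beta * V[nxt[x][u]] for u in range(2))) < 1e-6
           for x in range(n))
```

On a finite problem the routine terminates after finitely many strict improvements, exactly as the policy improvement theorem predicts.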


Unbounded Utility

In economics the boundedness condition \underline{M} \le f(x, u) \le \overline{M} is rarely satisfied! Consider for example the isoelastic utility function

\[ u(c; \epsilon) = \begin{cases} \dfrac{c^{1-\epsilon}}{1-\epsilon} & \text{if } \epsilon > 0 \text{ and } \epsilon \ne 1 \\ \ln c & \text{if } \epsilon = 1 \end{cases} \]

This function is obviously:
1. bounded below but unbounded above in case 0 < \epsilon < 1;
2. unbounded both above and below in case \epsilon = 1;
3. bounded above but unbounded below in case \epsilon > 1.

Also commonly used is the negative exponential utility function defined by u(c) = -e^{-\alpha c}, where \alpha is the constant absolute rate of risk aversion (CARA). This function is bounded above and, provided that c \ge 0, also below.

Warning Example: Statement of Problem

The following example shows that there can be irrelevant unbounded solutions to the Bellman equation.

Example: Consider the problem of maximizing \sum_{t=0}^{\infty} \beta^t (1 - u_t) where u_t \in [0, 1], 0 < \beta < 1, and

\[ x_{t+1} = \tfrac{1}{\beta}(x_t + u_t), \quad \text{with } x_0 > 0. \]

Notice that x_{t+1} \ge \tfrac{1}{\beta} x_t, implying that x_t \ge \beta^{-t} x_0 \to \infty as t \to \infty.

Of course the return function [0, 1] \ni u \mapsto f(x, u) = 1 - u \in [0, 1] is uniformly bounded.

Warning Example: Unbounded Spurious Solution

The Bellman equation is

\[ J(x) = \max_{u \in [0,1]} \big\{ 1 - u + \beta J\big(\tfrac{1}{\beta}(x + u)\big) \big\} \]
\[ u^*(x) = \arg\max_{u \in [0,1]} \big\{ 1 - u + \beta J\big(\tfrac{1}{\beta}(x + u)\big) \big\} \]

Even though the return function is uniformly bounded, this Bellman equation has an unbounded spurious solution. Indeed, we find a spurious solution with J(x) \equiv \gamma + x for a suitable constant \gamma. The condition for this to solve the Bellman equation is that

\[ \gamma + x = \max_{u \in [0,1]} \big\{ 1 - u + \beta \big[\gamma + \tfrac{1}{\beta}(x + u)\big] \big\} = \max_{u \in [0,1]} \{ 1 + \beta\gamma + x \} = 1 + \beta\gamma + x \]

which is true iff \gamma = 1 + \beta\gamma, and so \gamma = (1 - \beta)^{-1}.

Warning Example: True Solution

The problem is to maximize \sum_{t=0}^{\infty} \beta^t (1 - u_t) where u_t \in [0, 1], 0 < \beta < 1, and x_{t+1} = \tfrac{1}{\beta}(x_t + u_t), with x_0 > 0.

The obvious optimal policy is to choose u_t = 0 for all t, giving the maximized value J(x) = \sum_{t=0}^{\infty} \beta^t = (1 - \beta)^{-1}.

Indeed the bounded function J(x) = (1 - \beta)^{-1}, together with u^* = 0, both independent of x, solve the Bellman equation

\[ J(x) = \max_{u \in [0,1]} \big\{ 1 - u + \beta J\big(\tfrac{1}{\beta}(x + u)\big) \big\} = \max_{u \in [0,1]} \big\{ 1 - u + \beta(1 - \beta)^{-1} \big\} = 1 + \frac{\beta}{1 - \beta} = \frac{1}{1 - \beta} \]
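Both candidate solutions of the warning example can be checked numerically: the spurious unbounded J(x) = \gamma + x and the true constant J(x) = (1-\beta)^{-1} each satisfy the Bellman equation pointwise. The discount factor and the control grid below are illustrative assumptions.

```python
# Evaluate the right-hand side of J(x) = max_{u in [0,1]} {1 - u + beta*J((x+u)/beta)}
# for a candidate J, maximizing over a fine grid of controls.
beta = 0.5
gamma = 1 / (1 - beta)          # gamma = 1 + beta*gamma

def rhs(J, x, grid=51):
    return max(1 - u + beta * J((x + u) / beta)
               for u in (i / (grid - 1) for i in range(grid)))

for x in (0.5, 1.0, 3.0):
    assert abs(rhs(lambda y: gamma + y, x) - (gamma + x)) < 1e-9   # spurious solution
    assert abs(rhs(lambda y: gamma, x) - gamma) < 1e-9             # true solution
```

For the spurious candidate the objective 1 - u + \beta\gamma + x + u is independent of u, so every control is a maximizer; for the true candidate the maximum is attained at u = 0, matching the obvious optimal policy.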