Basic Deterministic Dynamic Programming
1 Basic Deterministic Dynamic Programming Timothy Kam, School of Economics & CAMA, Australian National University. ECON8022. This version: March 17, 2008.
2 Outline Motivation: what do we do? Deterministic IHDP; Deterministic IHDP: example; From sequence to recursive problem; Histories, strategies, and the value function; Principle of Optimality; Existence and uniqueness of the value function; Useful results; Strategy space; Backing out strategies: existence and uniqueness? Stationary optimal strategies; Example: the RCK model (existence of an S.O.S., properties of v, uniqueness of π, dynamic properties).
3 Motivation, Plan of Attack Previously, we heuristically motivated the ∞-horizon (deterministic) planning problem as the limiting case of the smaller, finite T-horizon problem. But... that was not precise enough!
4 Motivation, Plan of Attack In this lecture... We show rigorously how to characterize the ∞-horizon (deterministic) planning problem as a recursive infinite-horizon dynamic programming (IHDP) problem. This leads to the analysis of the solution to a functional equation: the Bellman equation, or the Bellman operator, of the form: B : C_b(X) → C_b(X), where C_b(X) := { w : X → R : w is continuous and bounded }. We look for a fixed-point solution to the mapping w ↦ B(w), defined on a space of functions. (So we can think of functions w : X → R as points living in such a space, just like points x ∈ R.) Economic meaning of the value function w?
5 Motivation, Plan of Attack In this lecture (cont'd)... Step 1. Characterize the fixed-point solution v = B(v): existence of v; uniqueness of v; properties of v (does it inherit continuity, boundedness...?). Step 2. Knowing v, reverse-engineer the optimal strategies that support the optimal value v(x_0) of the decision process starting from state x_0 ∈ X. Focus on strategies π = (π_0, π_1, ...) induced by stationary policy/decision functions x_t ↦ π_t(x_t) such that π_t(x) = π_s(x) = π(x) for all t, s. Existence of π? Uniqueness of the optimal strategy π = {π(x_t)}_{t=0}^∞? Concrete example: characterizing and solving the IHDP problem for the RCK model.
6 Setting Up: Deterministic IHDP Key objects in the infinite-horizon discounted optimization problem {X, A, Γ, f, U, β}: 1. X is the state space. 2. A is the action space. 3. Γ : X → 2^A is the feasible-action correspondence. 4. f : X × A → X is the state transition law. 5. U : X × A → R is the per-period payoff/reward/utility. 6. β is the constant subjective discount factor.
7 Setting Up: Example In the RCK optimal growth model, {X, A, Γ, f, U, β} is: 1. X = R_+, the space of possible states of the capital stock. 2. A = R_+, the space of saving/consumption choices in RCK. 3. For each current state k ∈ X, the feasible action set is Γ(k) = { k' ∈ A : 0 ≤ k' ≤ F(k) + (1 − δ)k }. 4. The state transition law is k' = F(k) + (1 − δ)k − c =: f(k, c). 5. Per-period payoff/reward/utility: U(c) ∈ R. 6. β ∈ (0, 1) is the constant subjective discount factor.
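The RCK primitives {X, A, Γ, f, U, β} can be encoded directly in code. The functional forms below are assumptions chosen purely for illustration (F(k) = √k, log utility, δ = 0.1); the slides leave F, U, and δ general.

```python
import math

beta, delta = 0.95, 0.1          # assumed parameter values, for illustration only

def F(k):
    """Production function (an assumed form)."""
    return math.sqrt(k)

def Gamma(k):
    """Feasible action set: consumption c in [0, F(k) + (1 - delta) * k]."""
    return (0.0, F(k) + (1.0 - delta) * k)

def f(k, c):
    """State transition law: k' = F(k) + (1 - delta) * k - c."""
    return F(k) + (1.0 - delta) * k - c

def U(c):
    """Per-period payoff (an assumed form)."""
    return math.log(c)
```

Any feasible action c ∈ Γ(k) then yields a next-period state f(k, c) that is again nonnegative, so the objects really do form a well-posed transition system.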
8 From Sequence to Recursive Problem The sequence problem in general: v(x_0) = sup { Σ_{t=0}^∞ β^t U(x_t, u_t) : {u_t ∈ Γ(x_t)}_{t=0}^∞, x_{t+1} = f(x_t, u_t), given x_0 ∈ X }. (P1) Remarks: v(x_0) is the l.u.b., or maximal lifetime payoff, w.r.t. the initial state x_0, if the planner follows an optimal path of actions that solves the RHS maximal problem. v has an indirect-utility interpretation. Recall the duality theorem of optimization in, e.g., consumer theory? Why sup and not max for the RHS problem in (P1)?
9 Histories, strategies To understand (P1) and the recursive value-function approach, describe the sequential decision-making problem as follows. Definition. A t-history is h^t = {x_0, u_0, ..., x_{t−1}, u_{t−1}, x_t}. Sets of all possible t-histories: let H^0 = X; then H^t is the set of all t-histories h^t, for t = 1, 2, .... The period-t state under history h^t is x_t[h^t]. A (feasible) strategy σ = {σ_t(h^t)}_{t=0}^∞ is a plan of action such that σ_t : H^t → A and the actions are feasible for each history: σ_t(h^t) ∈ Γ(x_t[h^t]). The set of all feasible strategies is Σ = { σ : σ_t(h^t) ∈ Γ(x_t[h^t]), for all h^t ∈ H^t, t ∈ N }. Note: distinguish between the action actually taken, u_t, and the component of the strategy, σ_t.
10 Histories, strategies Few will have the greatness to bend history itself; but each of us can work to change a small portion of events, and in the total of all those acts will be written the history of this generation. J.F.K.
11 Idea: Histories, strategies The planner (decision maker) stands at an arbitrary initial date t = 0. The planner looks ahead; she knows: the current state, x = x_0; the transition law f; the reward function U; and the discount factor β. The planner considers a (possibly arbitrary) strategy σ = {σ_t}. Fix this plan. The strategy σ would then induce a fixed path for the state, {x_t(σ, x_0)}_{t∈N}. Under strategy σ, at t = 0: h^0 = x_0(σ, x_0) = x_0, u_0(σ, x_0) = σ_0(h^0(σ, x_0)), and x_1(σ, x_0) = f(x_0, u_0(σ, x_0)). Under strategy σ, at t = 1: h^1(σ, x_0) = (x_0, u_0(σ, x_0), x_1(σ, x_0)), u_1(σ, x_0) = σ_1(h^1(σ, x_0)), and x_2(σ, x_0) = f(x_1(σ, x_0), u_1(σ, x_0)).
12 Histories, strategies, and reward By induction, for each fixed strategy σ ∈ Σ and given x_0, we have for all t ∈ N: h^t(σ, x_0) = {x_0, u_0(σ, x_0), ..., x_t(σ, x_0)}; u_t(σ, x_0) = σ_t(h^t(σ, x_0)); x_{t+1}(σ, x_0) = f(x_t(σ, x_0), u_t(σ, x_0)). So we can generate the infinite sequence of states and actions {x_t(σ, x_0), u_t(σ, x_0)}_{t∈N} induced by the strategy σ.
13 Histories, strategies, and value function Each period-t action u_t(σ, x_0), consistent with strategy σ and starting from initial state x_0, induces a period-t payoff: U_t(σ)(x_0) = U[x_t(σ, x_0), u_t(σ, x_0)]. Then the total discounted payoff/reward generated under (σ, x_0) is W(σ)(x_0) = Σ_{t=0}^∞ β^t U_t(σ)(x_0). Definition: the value function is the maximal total discounted payoff across all possible (i.e. feasible) strategies: v(x_0) = sup_{σ∈Σ} W(σ)(x_0).
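For one fixed stationary strategy, the total discounted payoff W(σ)(x_0) can be approximated by truncating the infinite sum. The model and strategy below are an assumed illustration (f(k) = k^α, U = ln, "save half of output"): the strategy is feasible but not claimed to be optimal.

```python
import math

alpha, beta = 0.3, 0.95   # assumed parameters, for illustration

def W(k0, T):
    """Truncated sum  sum_{t<T} beta^t * U(c_t)  along the path induced by
    the fixed stationary rule c_t = 0.5 * k_t**alpha."""
    k, total, disc = k0, 0.0, 1.0
    for _ in range(T):
        y = k ** alpha
        c = 0.5 * y               # the fixed strategy: consume half of output
        total += disc * math.log(c)
        disc *= beta
        k = y - c                 # transition: k' = f(k) - c
    return total
```

Because per-period payoffs are bounded on this path, the truncation error is at most β^T times a constant, so partial sums at large T agree to high precision.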
14 Sequence Problem to Recursive Problem First, we need to be able to compare W(σ)(x_0) and W(σ̃)(x_0) for any σ ≠ σ̃. This is not possible if W is not bounded. Assumption 1. There exists K < +∞ such that −K ≤ U(x_t, u_t) ≤ K for all (x_t, u_t) ∈ X × A. This buys us: Lemma 1. v : X → R is bounded. And: Lemma 2. For any initial state x_0 ∈ X and any ε > 0, there is a strategy σ such that W(σ)(x_0) ≥ v(x_0) − ε.
15 The Bellman Principle of Optimality So now we are ready to show that the sequence problem (P1), of picking an optimal strategy σ, yields an optimal value function v that satisfies a recursive representation taking the form of a Bellman functional equation. Solving the Bellman equation for v then also implies we have solved for the value function of (P1), and in the course of solving the Bellman equation we can recursively recover the supporting optimal strategy/strategies.
16 The Bellman Principle of Optimality Theorem (Bellman principle of optimality). Let x denote the current state and x' the next-period state. For each x ∈ X, the value function v : X → R of (P1) satisfies v(x) = sup_{u∈Γ(x)} {U(x, u) + βv(x')} s.t. x' = f(x, u). (1) Remark. The Bellman equation says that one's actions or decisions along the optimal path have to be time consistent. That is, once we are on this path, there is no incentive to deviate from it at any future decision node.
17 Proof. Let W : X → R be such that W(x) = sup_{u∈Γ(x)} {U(x, u) + βv(f(x, u))} for any x ∈ X. Trick: we want to show (i) v(x) ≥ W(x), and (ii) v(x) ≤ W(x), so that v(x) = W(x) for any x ∈ X. So we will break this proof down into these TWO steps.
18 Proof (continued). Step 1. Show v(x) ≥ W(x). Pick any u ∈ Γ(x), and note x' = f(x, u). By Lemma 2, there is a strategy σ such that the continuation value satisfies W(σ)(x') ≥ v(x') − ε. So then v(x) ≥ U(x, u) + βW(σ)(f(x, u)) ≥ U(x, u) + βv(f(x, u)) − βε. Since this holds for all u ∈ Γ(x), we have v(x) ≥ sup_{u∈Γ(x)} {U(x, u) + βv(f(x, u))} − βε; and since ε > 0 is arbitrary, v(x) ≥ sup_{u∈Γ(x)} {U(x, u) + βv(f(x, u))} = W(x).
19 Proof (continued). Step 2. Show v(x) ≤ W(x). Fix any x ∈ X and ε > 0. Let σ be such that W(σ)(x) ≥ v(x) − ε. Starting at x, pick u = σ_0(x), so that x_1 = f(x, u). Let σ^1 denote the continuation strategy under σ following σ_0(x). Since σ^1 ∈ Σ, the continuation value it supports is either optimal or it is not, so v(x_1) ≥ W(σ^1)(x_1). So then v(x) − ε ≤ W(σ)(x) = U(x, u) + βW(σ^1)[f(x, u)] ≤ U(x, u) + βv(f(x, u)) ≤ sup_{u∈Γ(x)} {U(x, u) + βv(f(x, u))} = W(x). Since ε > 0 is arbitrary, v(x) ≤ W(x).
20 Existence and Uniqueness of value function Notes: We moved from a problem (P1) of picking an infinite plan of action σ to one of solving recursively for the optimal value function v : X → R. The problem then becomes one of not knowing what the value function v : X → R looks like. Idea: we can start with any guess of v(x) for all x ∈ X and apply the Bellman operator to produce another guess of v(x). The contraction-mapping theorem tells us that these successive approximations of v will eventually converge to the unique value function v : X → R that satisfies both sides of the Bellman equation.
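The successive-approximation idea can be sketched numerically. The model is an assumed illustration (not from the slides): log utility U(c) = ln c, technology f(k) = k^α with full depreciation, on a discretized state space. Starting from an arbitrary guess v = 0, repeated application of the Bellman operator should shrink the sup-norm distance between successive iterates geometrically at rate β.

```python
import numpy as np

alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.4, 120)        # discretized state space X
output = grid ** alpha                    # f(k) on the grid

def bellman(v):
    """One application of T: (Tv)(k) = max_{k'} { ln(f(k) - k') + beta * v(k') }."""
    # c[i, j] = f(k_i) - k_j; on this grid every k' is feasible (c > 0 throughout)
    c = output[:, None] - grid[None, :]
    return np.max(np.log(c) + beta * v[None, :], axis=1)

v = np.zeros_like(grid)                   # arbitrary initial guess
errors = []
for _ in range(600):
    v_new = bellman(v)
    errors.append(np.max(np.abs(v_new - v)))   # sup-norm distance between iterates
    v = v_new
```

The recorded errors obey the contraction bound d(v_{n+1}, v_n) ≤ β d(v_n, v_{n−1}), so convergence is geometric from any starting guess.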
21 Existence and Uniqueness of value function Application of the Contraction Mapping Principle (a.k.a. the Banach* Fixed-Point Theorem). *Stefan Banach (1922), "Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales," Fundamenta Mathematicae, 3.
22 Existence and Uniqueness of value function Let B(X) denote the set of all bounded functions from X to R. So then v ∈ B(X). Use the sup-norm metric to measure how close two functions v, w ∈ B(X) are: d_∞(v, w) = sup_{x∈X} |v(x) − w(x)|. Let the map T on B(X) be defined as follows: for w ∈ B(X), Tw(x) = sup_{u∈Γ(x)} {U(x, u) + βw(f(x, u))} at any x ∈ X. Since U, w ∈ B(X), then Tw ∈ B(X). So our Bellman operator is T : B(X) → B(X). A fixed point of this operator will give us the value function of (P1).
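The contraction property of T in the sup-norm metric can be checked numerically on a discretized version of the operator. The model is the same assumed log-utility illustration (U(c) = ln c, transition k' = k^α − c), and v, w are two arbitrary bounded "functions" on the grid.

```python
import numpy as np

alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.4, 80)
# payoff[i, j] = ln(f(k_i) - k'_j); positive consumption everywhere on this grid
payoff = np.log(grid[:, None] ** alpha - grid[None, :])

def T(w):
    """Discretized Bellman operator on B(X)."""
    return np.max(payoff + beta * w[None, :], axis=1)

def d_sup(v, w):
    """Sup-norm metric d(v, w) = max_x |v(x) - w(x)|."""
    return np.max(np.abs(v - w))

rng = np.random.default_rng(0)
v = rng.normal(size=grid.size)    # two arbitrary bounded functions on X
w = rng.normal(size=grid.size)
lhs = d_sup(T(v), T(w))           # distance after applying T
rhs = beta * d_sup(v, w)          # beta times the distance before
```

For any pair v, w the inequality d(Tv, Tw) ≤ β d(v, w) holds, which is exactly the contraction property the Banach theorem needs.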
23 Useful results Warning. You should have pre-read the material on real analysis listed prior to Semester 1 (see e.g. SLP, Chapter 3). Concepts: metric spaces, sequences, limits, Cauchy sequences, vector spaces, Banach spaces. We say a metric space (X, d) is complete if every Cauchy sequence in the set X converges to a limit, and the limit is in X. Lemma. The metric space (B(X), d_∞) is complete. Definition. Let (S, d) be a metric space and T : S → S a map; let T(w) := Tw denote the value of T at w ∈ S. T is a contraction with modulus β, 0 ≤ β < 1, if d(Tw, Tv) ≤ βd(w, v) for all w, v ∈ S.
24 Useful results Theorem (Banach Fixed-Point Theorem). If (S, d) is a complete metric space and T : S → S is a contraction, then T has a fixed point and it is unique. Proof. See lecture notes. You need to know how this works! Existence: use the completeness of (S, d) and the triangle inequality to show there exists at least one fixed point. Uniqueness: then prove by contradiction that there can be only one such fixed point.
25 Useful Results We can make use of the following result to verify whether T is a contraction mapping. Lemma (Blackwell's sufficient conditions for a contraction). Let M : B(X) → B(X) be any map satisfying: 1. Monotonicity: for any v, w ∈ B(X) such that w ≥ v, Mw ≥ Mv. 2. Discounting: there exists 0 ≤ β < 1 such that M(w + c) ≤ Mw + βc for all w ∈ B(X) and constants c ≥ 0. (Define (f + c)(x) = f(x) + c.) Then M is a contraction with modulus β.
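Blackwell's two conditions can be verified numerically for the discretized Bellman operator of the assumed log-utility illustration used above. For a max-form operator, discounting in fact holds with equality: T(w + c) = Tw + βc.

```python
import numpy as np

alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.4, 80)
payoff = np.log(grid[:, None] ** alpha - grid[None, :])

def T(w):
    """Discretized Bellman operator."""
    return np.max(payoff + beta * w[None, :], axis=1)

rng = np.random.default_rng(1)
w = rng.normal(size=grid.size)
v = w - np.abs(rng.normal(size=grid.size))     # construct v <= w pointwise

# Monotonicity: v <= w should imply Tv <= Tw pointwise.
mono_ok = np.all(T(v) <= T(w) + 1e-12)

# Discounting: adding a constant c to w shifts Tw by beta * c.
c = 2.0
disc_gap = np.max(np.abs(T(w + c) - (T(w) + beta * c)))
```

Both conditions holding confirms, via Blackwell's lemma, that T is a β-contraction on the discretized function space.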
26 Application: Banach Fixed-Point Theorem Theorem (Existence and Uniqueness of the value function). v : X → R is the unique fixed point of the operator T : B(X) → B(X); that is, if any w ∈ B(X) satisfies w(x) = sup_{u∈Γ(x)} {U(x, u) + βw(f(x, u))} at any x ∈ X, then it must be that w = v. Proof. T : B(X) → B(X) is a contraction with modulus β. Since (B(X), d_∞) is a complete metric space, v : X → R is the unique fixed point of T by Theorem 4.
27 Backing out Strategies OK, so far we have done the following: 1. Shown the Bellman Principle of Optimality: from the sequence problem to the recursive problem. 2. Characterized the solution to the Bellman equation: finding the optimal value function v as the fixed point of the Bellman operator. 3. Studied basic conditions for the existence and uniqueness of the solution v ∈ B(X), i.e. in value-function space. 4. What does finding v imply about the supporting optimal strategy or strategies in the strategy space? 5. It turns out we get existence of optimal strategies for free, with our existing set of assumptions. 6. But we need further structure on the model's primitives before we can conclude uniqueness of the optimal strategy.
28 Optimal strategy Theorem. If U is bounded (Assumption 1), a strategy σ is optimal if and only if W(σ) satisfies the Bellman equation W(σ)(x) = sup_{u∈Γ(x)} {U(x, u) + βW(σ)(f(x, u))} at each x ∈ X.
29 Backing out Strategies: Stationary Strategies Often we want to be able to say more about the optimal strategies. Focus on stationary optimal strategies. Definition. A Markovian strategy π for {X, A, U, f, Γ, β} is a strategy π = {π_t}_{t∈N} with π_t = π_t(x_t[h^t]), where for each t, π_t : X → A is such that π_t(x_t) ∈ Γ(x_t). Definition. A Markovian strategy π = {π_t}_{t∈N} with the further property that π_t(x) = π_τ(x) = π(x) for all t, τ and all x ∈ X is called a stationary strategy.
30 Stationary optimal strategies: existence We need more structure: Assumption 2. U is continuous on X × A. Assumption 3. f is continuous on X × A. Assumption 4. Γ is a continuous, compact-valued correspondence on X.
31 Stationary optimal strategies: existence Together with the assumption that U is bounded on X × A, we conclude: Step 1. Existence of a unique continuous and bounded value function that satisfies the Bellman Principle of Optimality. Step 2. Existence of a well-defined feasible-action correspondence admitting a stationary optimal strategy that satisfies the Bellman Principle of Optimality. Step 3. This stationary strategy delivers a total discounted payoff that is equal to the value function, and is indeed an optimal strategy.
32 Stationary optimal strategies: existence Notice that with Assumptions 1-4 we can now focus on the space of bounded and continuous functions from X to R, denoted C_b(X). Previously we defined the Bellman operator T on B(X); now the space in which our candidate value functions live is C_b(X). Define the operator T : C_b(X) → C_b(X) by Tw(x) = max_{u∈Γ(x)} {U(x, u) + βw(f(x, u))} for each x ∈ X.
33 Stationary optimal strategies: Step 1 Lemma. T : C_b(X) → C_b(X) is a contraction with modulus β. Finally, by Banach's fixed-point theorem, we can show the existence of a unique continuous and bounded value function that satisfies the Bellman Principle of Optimality. Theorem. There exists a unique w* ∈ C_b(X) such that for each x ∈ X, w*(x) = max_{u∈Γ(x)} {U(x, u) + βw*(f(x, u))}.
34 Stationary optimal strategies: Step 2 1. Define G* : X → P(A) by G*(x) = arg max_{u∈Γ(x)} {U(x, u) + βw*(f(x, u))}. 2. By the Maximum Theorem, G* is a nonempty, upper-semicontinuous correspondence. 3. Thus there exists a function π* : X → A such that for each x ∈ X, π*(x) ∈ G*(x) ⊆ Γ(x). 4. By construction, at all x ∈ X, w*(x) = max_{u∈Γ(x)} {U(x, u) + βw*[f(x, u)]} = U(x, π*(x)) + βw*[f(x, π*(x))] ≥ U(x, ũ) + βw*[f(x, ũ)] for any ũ ∈ Γ(x). 5. So the function π* : X → A defines a stationary optimal strategy.
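Step 2 can be sketched numerically: iterate T to (near) convergence and then take the argmax of U(x, u) + βw*(f(x, u)) at each state. The model is the assumed log-utility illustration; for that special case the continuous-state optimal policy has the known closed form k' = αβk^α (a standard textbook result, assumed here for comparison, not derived in the slides).

```python
import numpy as np

alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.4, 200)
payoff = np.log(grid[:, None] ** alpha - grid[None, :])

# Iterate the Bellman operator to approximate the fixed point w*.
v = np.zeros_like(grid)
for _ in range(600):
    v = np.max(payoff + beta * v[None, :], axis=1)

# Back out the stationary policy as the maximizer at each state.
policy_idx = np.argmax(payoff + beta * v[None, :], axis=1)
policy = grid[policy_idx]                    # pi*(k): next period's capital

# Closed-form benchmark for this special case (assumed, for comparison only).
pi_closed = alpha * beta * grid ** alpha
```

On a fine enough grid the extracted policy tracks the closed form, and monotonicity of the maximizer in the state shows up directly in the index array.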
35 Stationary optimal strategies: Step 3 1. Fix any initial state x ∈ X. 2. Generate the sequence {x_t(x, π*), u_t(x, π*)} under the s.o.s. π*, starting from x. 3. The period-t payoff from π* beginning at x is U_t(π*)(x) := U[x_t(x, π*), u_t(x, π*)]. 4. Shorthand: x_t := x_t(x, π*) and u_t := u_t(x, π*). By definition, W(π*)(x) = Σ_{t=0}^∞ β^t U_t(π*)(x). Show: w*(x) = U(x, π*(x)) + βw*[f(x, π*(x))] = lim_{T→∞} ( Σ_{t=0}^{T−1} β^t U_t(π*)(x) + β^T w*[x_T(π*, x)] ) = W(π*)(x). 5. Since w* is the unique fixed point satisfying the Bellman equation, W(π*)(x) = max_{u∈Γ(x)} {U(x, u) + βW(π*)[f(x, u)]}.
36 Stationary optimal strategies So Steps 1-3 give us the result: Theorem. If the stationary dynamic programming problem {X, A, Γ, f, U, β} satisfies Assumptions 1-4, then there exists a stationary optimal policy π*. Furthermore, the value function v = W(π*) is bounded and continuous on X, and satisfies, for each x ∈ X, v(x) = max_{u∈Γ(x)} {U(x, u) + βv(f(x, u))} = U(x, π*(x)) + βW(π*)(f(x, π*(x))). Note: with additional strict-concavity assumptions on U and f, an optimal strategy not only exists but is also unique, i.e. the s.o.s. is unique.
37 A concrete example: The (T = ∞) RCK model See lecture notes for details. Here we will add a few more assumptions. Things to look out for: 1. Strict concavity of U buys monotonicity of the optimal saving/consumption path. 2. Strict concavity of U and quasi-concavity of f buy: monotonicity of the optimal saving/consumption path; steady-state values that are independent of preferences U and depend only on technology; and a unique optimal strategy (a unique s.o.s.). If, in addition, U, f ∈ C^1 and f is also strictly concave, then v is continuously differentiable at k ∈ int(X), and the optimal path can be described by Euler equations!
38 A concrete example: The (T = ∞) RCK model The sequence problem in the RCK model, seen previously, can be written as: max_{{c_t, k_{t+1}}_{t∈N}} Σ_{t=0}^∞ β^t U(c_t) subject to k_0 = k given; f(k_t) = c_t + k_{t+1} for all t ∈ N; and 0 ≤ k_{t+1} ≤ f(k_t).
39 A concrete example: The (T = ∞) RCK model {X, A, Γ, U, g, β} fully describes the stationary discounted dynamic programming problem in the RCK model. State space: X = R_+, with state variable k ∈ X. Action space: A = R_+. State transition function g : X × A → X, such that for each (k, c) ∈ X × A, the next period's state is k' = g(k, c) = f(k) − c. Feasible action correspondence: Γ(k) = [0, f(k)]. Per-period payoff from action c ∈ Γ(k) given state k: U(k, c) = U(c).
40 We would like to check the following items for this example model: 1. When do (stationary) optimal strategies exist? 2. What are the properties of the value function? 3. Is an optimal strategy unique here? 4. What are the dynamic properties, i.e. the trajectory of {c_t, k_t}_{t∈N} under the optimal strategy? What is the behavior of the transitional path (short run)? The steady state (long run)?
41 A concrete example: The (T = ∞) RCK model Alternative restrictions: Assumption. Instead of U bounded, let X = A = [0, k̄], where k̄ < ∞, so that the state and action spaces are compact, and let U be continuous on X × A. Then indirectly U (restricted to compact X × A) will be bounded. Specifically, in this model U : A → R. Assumption. f : X → R_+ is continuous and nondecreasing on X, and f is bounded.
42 RCK model: 1. Existence of S.O.S. Theorem. There exists a stationary optimal strategy π : X → A for the optimal growth model given by {X, A, Γ, U, g, β}, such that v(π)(k) = max_{k'∈Γ(k)} {U(f(k) − k') + βv(π)[k']} = U(f(k) − π(k)) + βv(π)[π(k)].
43 RCK model: 2. Value function inherits primitives Theorem. v : X → R is a nondecreasing function on X. Proof. Define T : C_b(X) → C_b(X) by Tw(k) = max_{k'∈[0, f(k)]} {U(f(k) − k') + βw(k')}. T can be shown to be a contraction on (C_b(X), d_∞). Since C_b(X) is complete, T : C_b(X) → C_b(X) has a unique fixed point v ∈ C_b(X). Since f is nondecreasing on X and c ∈ Γ(k) = [0, f(k)], the transition k' = g(k, c) = f(k) − c is also nondecreasing in k on X. Starting from any nondecreasing w on X, and given that g is nondecreasing, Tw is also nondecreasing on X. Therefore v is nondecreasing on X.
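The inherited monotonicity can be checked directly in the assumed log-utility illustration: compute the fixed point of the discretized operator and verify that v is nondecreasing along the capital grid.

```python
import numpy as np

alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.4, 100)
# payoff[i, j] = U(f(k_i) - k'_j) with U = ln and f(k) = k**alpha (assumed forms)
payoff = np.log(grid[:, None] ** alpha - grid[None, :])

v = np.zeros_like(grid)
for _ in range(600):
    v = np.max(payoff + beta * v[None, :], axis=1)

# f nondecreasing and U increasing imply v nondecreasing on X; on this grid
# the fixed point is in fact strictly increasing.
is_increasing = np.all(np.diff(v) > 0)
```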
44 RCK model: 3. Uniqueness of optimal strategy Theorem. Suppose: 1. all assumptions as before, and 2. U is strictly increasing and strictly concave on A. Then: 1. the optimal savings level π(k) := f(k) − c(k) under the optimal strategy π, with k' = π(k), is nondecreasing on X; 2. if f is (weakly) concave on X, then the value function v is (weakly) concave on X; and 3. the correspondence G* : X → 2^A, G*(k) = arg max_{k'∈Γ(k)} {U(f(k) − k') + βw*(k')}, k ∈ X, is a singleton (a set of only one maximizer k') for each state k ∈ X. Therefore G* admits a unique optimal strategy π. Furthermore, π is a continuous function on X.
45 RCK model: 4. Dynamic properties Theorem. Suppose: 1. all assumptions as before, 2. U ∈ C^1((0, ∞)) and lim_{c→0} U'(c) = ∞, and 3. f ∈ C^1((0, ∞)) and lim_{k→0} f'(k) > 1/β. Then: 1. the solution k' = π(k) is such that π(k) ∈ (0, f(k)) for all k ∈ X; 2. (Benveniste and Scheinkman) v : X → R is a C^1 function: v is continuously differentiable at any feasible k ∈ int(X), with derivative v'(k) = U'(f(k) − π(k)) f'(k); 3. so the optimal path can be described by the Euler equations: U_c[f(k) − π(k)] = βU_c[f(k') − π(k')] f_k(π(k)), where k' = π(k).
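The Euler equation U'(c_t) = βU'(c_{t+1}) f'(k_{t+1}) can be checked against the known closed-form policy for one special case: with U(c) = ln c and f(k) = k^α (full depreciation), the optimal policy is k' = π(k) = αβk^α. This closed form is a standard textbook result assumed here for illustration, not derived in the slides.

```python
alpha, beta = 0.3, 0.95

def f(k):        # assumed technology
    return k ** alpha

def f_prime(k):  # its derivative
    return alpha * k ** (alpha - 1.0)

def pi(k):       # assumed closed-form optimal savings for this special case
    return alpha * beta * f(k)

def U_prime(c):  # U(c) = ln(c) implies U'(c) = 1/c
    return 1.0 / c

k = 0.2
k_next = pi(k)
c_now = f(k) - k_next
c_next = f(k_next) - pi(k_next)

# Euler residual: U'(c_t) - beta * U'(c_{t+1}) * f'(k_{t+1}); zero at the optimum.
residual = U_prime(c_now) - beta * U_prime(c_next) * f_prime(k_next)
```

Substituting c_t = (1 − αβ)k_t^α and k_{t+1} = αβk_t^α into the Euler equation makes both sides equal to 1/((1 − αβ)k_t^α), so the residual vanishes identically along the closed-form path.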
46 RCK model: 4. Dynamic properties Theorem. Under the assumptions above, the optimal saving decision function is increasing on X. That is, for k > k̃, π(k) > π(k̃). Theorem. Given any initial condition k ∈ X, the sequence of states {k_{t+1}(k)}_{t∈N} under the optimal policy function π : X → A and the sequence of consumption levels {c_t(k)}_{t∈N} converge monotonically to k* and c* respectively. Furthermore, k* and c* are unique.
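The monotone convergence to the steady state can be illustrated with the same assumed closed-form policy k' = αβk^α (the log-utility special case). Its fixed point is k* = (αβ)^{1/(1−α)}, and a path started below k* should rise monotonically toward it.

```python
import numpy as np

alpha, beta = 0.3, 0.95
k_star = (alpha * beta) ** (1.0 / (1.0 - alpha))   # steady state of the policy map

path = [0.05]                                      # initial capital below k*
for _ in range(200):
    path.append(alpha * beta * path[-1] ** alpha)  # k_{t+1} = pi(k_t)
path = np.array(path)
```

Because the policy map is increasing and its fixed point is unique, the transition is monotone in the short run and settles at k* in the long run, exactly as the theorem asserts.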
Notes for ECON 970 and ECON 973 Loris Rubini University of New Hampshire 1 Introduction Economics studies resource allocation problems. In macroeconomics, we study economywide resource allocation problems.
More informationd(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N
Problem 1. Let f : A R R have the property that for every x A, there exists ɛ > 0 such that f(t) > ɛ if t (x ɛ, x + ɛ) A. If the set A is compact, prove there exists c > 0 such that f(x) > c for all x
More informationContents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.
Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological
More information1 Stochastic Dynamic Programming
1 Stochastic Dynamic Programming Formally, a stochastic dynamic program has the same components as a deterministic one; the only modification is to the state transition equation. When events in the future
More informationNeoclassical Growth Model / Cake Eating Problem
Dynamic Optimization Institute for Advanced Studies Vienna, Austria by Gabriel S. Lee February 1-4, 2008 An Overview and Introduction to Dynamic Programming using the Neoclassical Growth Model and Cake
More informationLecture 1: Dynamic Programming
Lecture 1: Dynamic Programming Fatih Guvenen November 2, 2016 Fatih Guvenen Lecture 1: Dynamic Programming November 2, 2016 1 / 32 Goal Solve V (k, z) =max c,k 0 u(c)+ E(V (k 0, z 0 ) z) c + k 0 =(1 +
More informationDynamic Optimization with a Nonsmooth, Nonconvex Technology: The Case of a Linear Objective Function
Dynamic Optimization with a Nonsmooth, Nonconvex Technology: The Case of a Linear Objective Function Takashi Kamihigashi* RIEB Kobe University tkamihig@rieb.kobe-u.ac.jp Santanu Roy Department of Economics
More informationMarkov Decision Processes and Dynamic Programming
Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course In This Lecture A. LAZARIC Markov Decision Processes
More informationLecture notes for Macroeconomics I, 2004
Lecture notes for Macroeconomics I, 2004 Per Krusell Please do NOT distribute without permission! Comments and suggestions are welcome. 1 Chapter 3 Dynamic optimization There are two common approaches
More informationAdvanced Economic Growth: Lecture 21: Stochastic Dynamic Programming and Applications
Advanced Economic Growth: Lecture 21: Stochastic Dynamic Programming and Applications Daron Acemoglu MIT November 19, 2007 Daron Acemoglu (MIT) Advanced Growth Lecture 21 November 19, 2007 1 / 79 Stochastic
More informationEconomics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011
Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Section 2.6 (cont.) Properties of Real Functions Here we first study properties of functions from R to R, making use of the additional structure
More informationNumerical Methods in Economics
Numerical Methods in Economics MIT Press, 1998 Chapter 12 Notes Numerical Dynamic Programming Kenneth L. Judd Hoover Institution November 15, 2002 1 Discrete-Time Dynamic Programming Objective: X: set
More informationHow much should the nation save?
How much should the nation save? Econ 4310 Lecture 2 Asbjorn Rodseth University of Oslo August 21, 2013 Asbjorn Rodseth (University of Oslo) How much should the nation save? August 21, 2013 1 / 13 Outline
More informationMathematics II, course
Mathematics II, course 2013-2014 Juan Pablo Rincón Zapatero October 24, 2013 Summary: The course has four parts that we describe below. (I) Topology in Rn is a brief review of the main concepts and properties
More informationDynamic Optimization Problem. April 2, Graduate School of Economics, University of Tokyo. Math Camp Day 4. Daiki Kishishita.
Discrete Math Camp Optimization Problem Graduate School of Economics, University of Tokyo April 2, 2016 Goal of day 4 Discrete We discuss methods both in discrete and continuous : Discrete : condition
More informationMS&E338 Reinforcement Learning Lecture 1 - April 2, Introduction
MS&E338 Reinforcement Learning Lecture 1 - April 2, 2018 Introduction Lecturer: Ben Van Roy Scribe: Gabriel Maher 1 Reinforcement Learning Introduction In reinforcement learning (RL) we consider an agent
More informationDepartment of Economics Working Paper Series
Department of Economics Working Paper Series On the Existence and Characterization of Markovian Equilibrium in Models with Simple Non-Paternalistic Altruism Olivier F. Morand University of Connecticut
More informationDeterministic Dynamic Programming in Discrete Time: A Monotone Convergence Principle
Deterministic Dynamic Programming in Discrete Time: A Monotone Convergence Principle Takashi Kamihigashi Masayuki Yao March 30, 2015 Abstract We consider infinite-horizon deterministic dynamic programming
More informationOrganizational Equilibrium with Capital
Organizational Equilibrium with Capital Marco Bassetto, Zhen Huo, and José-Víctor Ríos-Rull FRB of Chicago, Yale University, University of Pennsylvania, UCL, CAERP Fiscal Policy Conference Mar 20, 2018
More informationIntroduction to Dynamic Programming Lecture Notes
Introduction to Dynamic Programming Lecture Notes Klaus Neusser November 30, 2017 These notes are based on the books of Sargent (1987) and Stokey and Robert E. Lucas (1989). Department of Economics, University
More informationLecture notes for Macroeconomics I, 2004
Lecture notes for Macroeconomics I, 2004 Per Krusell Please do NOT distribute without permission Comments and suggestions are welcome! 1 2 Chapter 1 Introduction These lecture notes cover a one-semester
More informationEconomics 2010c: Lectures 9-10 Bellman Equation in Continuous Time
Economics 2010c: Lectures 9-10 Bellman Equation in Continuous Time David Laibson 9/30/2014 Outline Lectures 9-10: 9.1 Continuous-time Bellman Equation 9.2 Application: Merton s Problem 9.3 Application:
More informationOptimal Control. Macroeconomics II SMU. Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112
Optimal Control Ömer Özak SMU Macroeconomics II Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112 Review of the Theory of Optimal Control Section 1 Review of the Theory of Optimal Control Ömer
More informationExhaustible Resources and Economic Growth
Exhaustible Resources and Economic Growth Cuong Le Van +, Katheline Schubert + and Tu Anh Nguyen ++ + Université Paris 1 Panthéon-Sorbonne, CNRS, Paris School of Economics ++ Université Paris 1 Panthéon-Sorbonne,
More informationLecture 6: Contraction mapping, inverse and implicit function theorems
Lecture 6: Contraction mapping, inverse and implicit function theorems 1 The contraction mapping theorem De nition 11 Let X be a metric space, with metric d If f : X! X and if there is a number 2 (0; 1)
More informationUNIVERSITY OF VIENNA
WORKING PAPERS Cycles and chaos in the one-sector growth model with elastic labor supply Gerhard Sorger May 2015 Working Paper No: 1505 DEPARTMENT OF ECONOMICS UNIVERSITY OF VIENNA All our working papers
More informationProblem Set #4 Answer Key
Problem Set #4 Answer Key Economics 808: Macroeconomic Theory Fall 2004 The cake-eating problem a) Bellman s equation is: b) If this policy is followed: c) If this policy is followed: V (k) = max {log
More informationA Quick Introduction to Numerical Methods
Chapter 5 A Quick Introduction to Numerical Methods One of the main advantages of the recursive approach is that we can use the computer to solve numerically interesting models. There is a wide variety
More informationMathematical Methods in Economics (Part I) Lecture Note
Mathematical Methods in Economics (Part I) Lecture Note Kai Hao Yang 09/03/2018 Contents 1 Basic Topology and Linear Algebra 4 1.1 Review of Metric Space and Introduction of Topological Space........ 4
More informationDistributed Optimization. Song Chong EE, KAIST
Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links
More informationEcon 504, Lecture 1: Transversality and Stochastic Lagrange Multipliers
ECO 504 Spring 2009 Chris Sims Econ 504, Lecture 1: Transversality and Stochastic Lagrange Multipliers Christopher A. Sims Princeton University sims@princeton.edu February 4, 2009 0 Example: LQPY The ordinary
More informationProblem Set 2: Solutions Math 201A: Fall 2016
Problem Set 2: s Math 201A: Fall 2016 Problem 1. (a) Prove that a closed subset of a complete metric space is complete. (b) Prove that a closed subset of a compact metric space is compact. (c) Prove that
More informationDevelopment Economics (PhD) Intertemporal Utility Maximiza
Development Economics (PhD) Intertemporal Utility Maximization Department of Economics University of Gothenburg October 7, 2015 1/14 Two Period Utility Maximization Lagrange Multiplier Method Consider
More informationECON607 Fall 2010 University of Hawaii Professor Hui He TA: Xiaodong Sun Assignment 1 Suggested Solutions
ECON607 Fall 200 University of Hawaii Professor Hui He TA: Xiaodong Sun Assignment Suggested Solutions The due date for this assignment is Thursday, Sep. 23.. Consider an stochastic optimal growth model
More informationADVANCED MACROECONOMICS 2015 FINAL EXAMINATION FOR THE FIRST HALF OF SPRING SEMESTER
ADVANCED MACROECONOMICS 2015 FINAL EXAMINATION FOR THE FIRST HALF OF SPRING SEMESTER Hiroyuki Ozaki Keio University, Faculty of Economics June 2, 2015 Important Remarks: You must write all your answers
More informationA novel approach to Banach contraction principle in extended quasi-metric spaces
Available online at www.tjnsa.com J. Nonlinear Sci. Appl. 9 (2016), 3858 3863 Research Article A novel approach to Banach contraction principle in extended quasi-metric spaces Afrah A. N. Abdou a, Mohammed
More informationLecture 6: Discrete-Time Dynamic Optimization
Lecture 6: Discrete-Time Dynamic Optimization Yulei Luo Economics, HKU November 13, 2017 Luo, Y. (Economics, HKU) ECON0703: ME November 13, 2017 1 / 43 The Nature of Optimal Control In static optimization,
More informationNotes on the Thomas and Worrall paper Econ 8801
Notes on the Thomas and Worrall paper Econ 880 Larry E. Jones Introduction The basic reference for these notes is: Thomas, J. and T. Worrall (990): Income Fluctuation and Asymmetric Information: An Example
More informationAsymmetric Information in Economic Policy. Noah Williams
Asymmetric Information in Economic Policy Noah Williams University of Wisconsin - Madison Williams Econ 899 Asymmetric Information Risk-neutral moneylender. Borrow and lend at rate R = 1/β. Strictly risk-averse
More informationInfinite-Horizon Discounted Markov Decision Processes
Infinite-Horizon Discounted Markov Decision Processes Dan Zhang Leeds School of Business University of Colorado at Boulder Dan Zhang, Spring 2012 Infinite Horizon Discounted MDP 1 Outline The expected
More informationApplied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.
Applied Analysis APPM 44: Final exam 1:3pm 4:pm, Dec. 14, 29. Closed books. Problem 1: 2p Set I = [, 1]. Prove that there is a continuous function u on I such that 1 ux 1 x sin ut 2 dt = cosx, x I. Define
More information5 Compact linear operators
5 Compact linear operators One of the most important results of Linear Algebra is that for every selfadjoint linear map A on a finite-dimensional space, there exists a basis consisting of eigenvectors.
More informationMacro I - Practice Problems - Growth Models
Macro I - Practice Problems - Growth Models. Consider the infinitely-lived agent version of the growth model with valued leisure. Suppose that the government uses proportional taxes (τ c, τ n, τ k ) on
More informationHOMEWORK #1 This homework assignment is due at 5PM on Friday, November 3 in Marnix Amand s mailbox.
Econ 50a (second half) Yale University Fall 2006 Prof. Tony Smith HOMEWORK # This homework assignment is due at 5PM on Friday, November 3 in Marnix Amand s mailbox.. Consider a growth model with capital
More informationValue Function Iteration
Value Function Iteration (Lectures on Solution Methods for Economists II) Jesús Fernández-Villaverde 1 and Pablo Guerrón 2 February 26, 2018 1 University of Pennsylvania 2 Boston College Theoretical Background
More informationThe Growth Model in Continuous Time (Ramsey Model)
The Growth Model in Continuous Time (Ramsey Model) Prof. Lutz Hendricks Econ720 September 27, 2017 1 / 32 The Growth Model in Continuous Time We add optimizing households to the Solow model. We first study
More informationResearch Article Some Generalizations of Fixed Point Results for Multivalued Contraction Mappings
International Scholarly Research Network ISRN Mathematical Analysis Volume 2011, Article ID 924396, 13 pages doi:10.5402/2011/924396 Research Article Some Generalizations of Fixed Point Results for Multivalued
More informationDeterministic Dynamic Programming
Deterministic Dynamic Programming 1 Value Function Consider the following optimal control problem in Mayer s form: V (t 0, x 0 ) = inf u U J(t 1, x(t 1 )) (1) subject to ẋ(t) = f(t, x(t), u(t)), x(t 0
More informationAlgorithms for MDPs and Their Convergence
MS&E338 Reinforcement Learning Lecture 2 - April 4 208 Algorithms for MDPs and Their Convergence Lecturer: Ben Van Roy Scribe: Matthew Creme and Kristen Kessel Bellman operators Recall from last lecture
More informationOptimal Growth Models and the Lagrange Multiplier
CORE DISCUSSION PAPER 2003/83 Optimal Growth Models and the Lagrange Multiplier Cuong Le Van, H. Cagri Saglam November 2003 Abstract We provide sufficient conditions on the objective functional and the
More information