Stochastic Dynamic Programming

Jesús Fernández-Villaverde
University of Pennsylvania

Introducing Uncertainty in Dynamic Programming

Stochastic dynamic programming provides a very flexible framework to handle a multitude of problems in economics.

We generalize the results of deterministic dynamic programming.

Problem: taking care of measurability.

References

Read Chapter 9 of SLP!

Problem of SLP: the treatment is based on Borel sets, which raises issues of measurability. See pages 253 and 254 of SLP.

Bertsekas and Shreve (Stochastic Optimal Control, 1978) redo much of the theory with universal measurability.

Read Chapter 10 of SLP: it is full of economic applications.

Environment

$(X, \mathcal{X})$: universally measurable space for the endogenous state.

$(Z, \mathcal{Z})$: universally measurable space for the exogenous state.

$(S, \mathcal{S}) = (X, \mathcal{X}) \times (Z, \mathcal{Z})$.

$Q$: stationary transition function for $(Z, \mathcal{Z})$.

$\Gamma: X \times Z \to X$: constraint correspondence.

$A = \{(x, y, z) \in X \times X \times Z : y \in \Gamma(x, z)\}$: graph of $\Gamma$.

$F: A \to \mathbb{R}$: one-period return function.

$\beta$: discount factor.

Plans

$\pi_t: Z^t \to X$ for $t = 1, 2, \ldots$: sequence of measurable functions.

$\pi = (\pi_0 \in X, \{\pi_t\})$: plan.

Interpretation of a plan: contingent decision rules.

A plan $\pi$ is feasible from $s_0 \in S$ if:

1. $\pi_0 \in \Gamma(s_0)$.

2. $\pi_t(z^t) \in \Gamma(\pi_{t-1}(z^{t-1}), z_t)$ for all $z^t \in Z^t$, $t = 1, 2, \ldots$

$\Pi(s_0)$: set of all feasible plans from $s_0 \in S$.

If $\pi_t$ does not depend on the whole history $z^t$ but only on $z_t$, we call the plan stationary or Markov.

Some Preliminary Results I

Assumption 1:

1. $\Gamma$ is non-empty valued.

2. $A$, the graph of $\Gamma$, is $(\mathcal{X} \times \mathcal{X} \times \mathcal{Z})$-measurable.

3. There exists a measurable selection $h: S \to X$ such that $h(s) \in \Gamma(s)$ for all $s \in S$.

Lemma 1: under the previous assumption, $\Pi(s_0)$ is nonempty for all $s_0 \in S$.

Lemma 2: $\mathcal{A} = \{C \in \mathcal{X} \times \mathcal{X} \times \mathcal{Z} : C \subseteq A\}$ is a $\sigma$-algebra.

Corollary 1: $F(\pi_{t-1}(z^{t-1}), \pi_t(z^t), z_t)$ is $\mathcal{Z}^t$-measurable.

Some Preliminary Results II

Given $Q$ on $(Z, \mathcal{Z})$ and $s_0 \in S$, let $\mu^t(z_0, \cdot): \mathcal{Z}^t \to [0, 1]$, $t = 1, 2, \ldots$ denote the induced probability measures on partial histories $z^t$.

Assumption 2: $F: A \to \mathbb{R}$ is $\mathcal{A}$-measurable and either (a) or (b) holds:

a. $F \geq 0$ or $F \leq 0$.

b. For each $(x_0, z_0) = s_0 \in S$ and each plan $\pi \in \Pi(s_0)$, $F(\pi_{t-1}(z^{t-1}), \pi_t(z^t), z_t)$ is $\mu^t(z_0, \cdot)$-integrable, $t = 1, 2, \ldots$, and the limit

$$F(x_0, \pi_0, z_0) + \lim_{n \to \infty} \sum_{t=1}^{n} \int_{Z^t} \beta^t F(\pi_{t-1}(z^{t-1}), \pi_t(z^t), z_t) \, \mu^t(z_0, dz^t)$$

exists (though it may be plus or minus infinity).

Sequential Problem

Define $u_n(\cdot, s_0): \Pi(s_0) \to \mathbb{R}$, $n = 0, 1, \ldots$ by:

$$u_0(\pi, s_0) = F(x_0, \pi_0, z_0)$$

$$u_n(\pi, s_0) = F(x_0, \pi_0, z_0) + \sum_{t=1}^{n} \int_{Z^t} \beta^t F(\pi_{t-1}(z^{t-1}), \pi_t(z^t), z_t) \, \mu^t(z_0, dz^t)$$

Define $u(\cdot, s_0): \Pi(s_0) \to \overline{\mathbb{R}}$ by:

$$u(\pi, s_0) = \lim_{n \to \infty} u_n(\pi, s_0)$$

Define $v^*: S \to \overline{\mathbb{R}}$ by:

$$v^*(s_0) = \sup_{\pi \in \Pi(s_0)} u(\pi, s_0)$$

Recursive Problem

Functional equation:

$$v(s) = v(x, z) = \sup_{y \in \Gamma(x, z)} \left\{ F(x, y, z) + \beta \int v(y, z') \, Q(z, dz') \right\}$$

Associated with the functional equation, we have a policy correspondence:

$$G(x, z) = \left\{ y \in \Gamma(x, z) : v(x, z) = F(x, y, z) + \beta \int v(y, z') \, Q(z, dz') \right\}$$

If $G$ is nonempty and if there is a sequence of measurable selections $g_0, g_1, \ldots$ from $G$, we have the plan generated by $G$ from $s_0$:

$$\pi_0 = g_0(s_0)$$

$$\pi_t(z^t) = g_t\left[\pi_{t-1}(z^{t-1}), z_t\right], \quad \forall z^t \in Z^t, \ t = 1, 2, \ldots$$
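To make the functional equation concrete, here is a minimal numerical sketch for a discretized stochastic growth model. All functional forms, grids, and parameter values are illustrative assumptions, not part of the lecture: log utility with full depreciation, so $F(x, y, z) = \ln(z x^{\alpha} - y)$, a capital grid for $X$, and a two-state Markov matrix standing in for $Q$.

```python
import numpy as np

# Illustrative sketch: Bellman operator for a discretized stochastic
# growth model (log utility, full depreciation). Parameters are assumed.
beta, alpha = 0.95, 0.33                     # discount factor, capital share
k_grid = np.linspace(0.05, 0.5, 200)         # endogenous state X: capital grid
z_grid = np.array([0.9, 1.1])                # exogenous state Z: productivity
Q = np.array([[0.8, 0.2],                    # transition function Q(z, dz')
              [0.2, 0.8]])                   # as a Markov matrix

def bellman_operator(v):
    """(Tv)(x, z) = sup_{y in Gamma(x,z)} F(x, y, z) + beta * E[v(y, z') | z]."""
    Tv = np.empty_like(v)                    # v has shape (n_k, n_z)
    policy = np.empty(v.shape, dtype=int)    # argmax indices: a selection from G
    Ev = v @ Q.T                             # Ev[j, i] = E[v(k_j, z') | z_i]
    for i, z in enumerate(z_grid):
        for j, k in enumerate(k_grid):
            c = z * k**alpha - k_grid        # consumption for each feasible y = k'
            vals = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)
            vals = vals + beta * Ev[:, i]
            policy[j, i] = int(np.argmax(vals))
            Tv[j, i] = vals[policy[j, i]]
    return Tv, policy
```

For this particular example the optimal policy has the known closed form $g(k, z) = \alpha \beta z k^{\alpha}$, which gives a benchmark against which the numerical selection from $G$ can be checked.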

Transversality Condition

In general, dynamic programming problems require two boundary conditions: an initial condition and a final condition.

The transversality condition plays the role of the second condition.

To ensure the equivalence of the sequential and recursive problems, we then also need a transversality condition:

$$\lim_{t \to \infty} \beta^t \int_{Z^t} v\left(\pi_{t-1}(z^{t-1}), z_t\right) \mu^t(z_0, dz^t) = 0, \quad \forall \pi \in \Pi(s_0), \ s_0 \in S$$

Equivalence of Sequential and Recursive Problems

Under our previous assumptions:

1. $v = v^*$.

2. Any plan generated by $G$ attains the supremum in $v^*(s_0) = \sup_{\pi \in \Pi(s_0)} u(\pi, s_0)$.

Under our previous assumptions and an additional boundedness condition, a plan is optimal only if it is generated a.e. by $G$.

These results are the counterparts of Theorems 4.2-4.5 in SLP for the deterministic case.

Bounded Returns

As in the deterministic case, we want to establish further results.

Assumptions:

1. $F$ is bounded and continuous.

2. $\beta < 1$.

3. $X$ is a compact set in $\mathbb{R}^l$ and $\mathcal{X}$ is its universally measurable $\sigma$-algebra.

4. $Z$ is a compact set in $\mathbb{R}^k$ and $\mathcal{Z}$ is its universally measurable $\sigma$-algebra.

5. $Q$ has the Feller property.

Intuition: integration will preserve properties of the return function.

Results I

Under these assumptions, we can prove that:

1. The Bellman operator

$$(Tf)(x, z) = \sup_{y \in \Gamma(x, z)} \left\{ F(x, y, z) + \beta \int f(y, z') \, Q(z, dz') \right\}$$

has a unique fixed point $v$.

2. Contractivity: $\|T^n v_0 - v\| \leq \beta^n \|v_0 - v\|$, $n = 1, 2, \ldots$

3. The policy correspondence

$$G(x, z) = \left\{ y \in \Gamma(x, z) : v(x, z) = F(x, y, z) + \beta \int v(y, z') \, Q(z, dz') \right\}$$

is non-empty, compact-valued, and u.h.c.

4. The value function will inherit increasing properties from $F$ and $Q$.
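Continuing the growth-model sketch above (and reusing its assumed grids and bellman_operator), a few lines of value function iteration illustrate both the unique fixed point and the geometric convergence rate implied by contractivity:

```python
# Continues the sketch above: value function iteration.
# Successive T-iterates contract at rate beta in the sup norm.
v = np.zeros((len(k_grid), len(z_grid)))     # any bounded v_0 works
gaps = []
for n in range(1000):
    Tv, policy = bellman_operator(v)
    gaps.append(np.max(np.abs(Tv - v)))      # ||T^{n+1} v_0 - T^n v_0||
    v = Tv
    if gaps[-1] < 1e-9:                      # v is now (nearly) the fixed point
        break

# Contractivity implies each successive gap shrinks by at least a factor beta:
ratios = [g1 / g0 for g0, g1 in zip(gaps, gaps[1:]) if g0 > 0]
print(f"iterations: {len(gaps)}, max gap ratio: {max(ratios):.4f} <= beta = {beta}")
```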

Concavity

Assumption Concavity 1: for each $z \in Z$, $F(\cdot, \cdot, z): A_z \to \mathbb{R}$ (where $A_z$ is the $z$-section of $A$) satisfies:

$$F\left(\lambda (x, y) + (1 - \lambda)(x', y'), z\right) \geq \lambda F(x, y, z) + (1 - \lambda) F(x', y', z), \quad \forall \lambda \in (0, 1), \ \forall (x, y), (x', y') \in A_z$$

and the inequality is strict if $x \neq x'$.

Assumption Concavity 2: for all $z \in Z$ and all $x, x' \in X$,

$$y \in \Gamma(x, z) \text{ and } y' \in \Gamma(x', z) \text{ imply } \lambda y + (1 - \lambda) y' \in \Gamma\left(\lambda x + (1 - \lambda) x', z\right), \quad \forall \lambda \in (0, 1)$$

Results II

1. Under the previous assumptions, $v(\cdot, z): X \to \mathbb{R}$ is strictly concave and $G(\cdot, z): X \to X$ is a continuous, single-valued function $g$.

2. Let $v_n = T v_{n-1}$ and

$$g_n(x, z) = \arg\max_{y \in \Gamma(x, z)} \left\{ F(x, y, z) + \beta \int v_n(y, z') \, Q(z, dz') \right\}$$

for $n = 1, 2, \ldots$ Then $g_n \to g$ uniformly.

3. If $x_0 \in \operatorname{int}(X)$ and $g(x_0, z_0) \in \operatorname{int}(\Gamma(x_0, z_0))$, then $v(\cdot, z_0)$ is continuously differentiable in $x$ at $x_0$, with derivatives given by:

$$v_i(x_0, z_0) = F_i\left[x_0, g(x_0, z_0), z_0\right], \quad i = 1, \ldots, l$$
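The envelope formula in point 3 is easy to check numerically. A minimal sketch, reusing the converged v and policy from the iteration above: compare a centered finite difference of $v(\cdot, z)$ with $F_1$ evaluated at the optimal policy; the two agree up to grid and differencing error.

```python
# Continues the sketches above: check v_x(k, z) = F_1(k, g(k, z), z)
# at an interior grid point, using a centered finite difference for v_x.
i, j = 0, 100                                # a z-state and an interior k-point
k, z = k_grid[j], z_grid[i]
kp = k_grid[policy[j, i]]                    # y = g(k, z), the optimal choice

dv = (v[j + 1, i] - v[j - 1, i]) / (k_grid[j + 1] - k_grid[j - 1])
F1 = alpha * z * k**(alpha - 1) / (z * k**alpha - kp)  # d/dx of ln(z x^a - y)
print(f"finite-difference v_x: {dv:.4f}   envelope F_1: {F1:.4f}")
```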

Unbounded Returns

What if returns are unbounded, as in most applications of interest in economics?

This was already an issue in the deterministic set-up.

We can get most of the substance of the previous results if $F$ exhibits constant returns to scale.

In the case of CRRA utility functions, we need to do some ad hoc work.

Policy Functions and Transition Functions I

Let us imagine that the decision maker follows $g(x, z)$ given an initial condition $s_0$.

The policy function generates a sequence $\{s_t\}$.

What do we know about $\{s_t\}$?

Read Chapters 11-14 of SLP.

Policy Functions and Transition Functions II

Let $(X, \mathcal{X})$, $(Z, \mathcal{Z})$, and $(S, \mathcal{S}) = (X, \mathcal{X}) \times (Z, \mathcal{Z})$ be universally measurable spaces; let $Q$ be a transition function on $(Z, \mathcal{Z})$; and let $g: S \to X$ be a measurable function. Then:

$$P[(x, z), A \times B] = \begin{cases} Q(z, B) & \text{if } g(x, z) \in A \\ 0 & \text{otherwise} \end{cases}$$

for all $x \in X$, $z \in Z$, $A \in \mathcal{X}$, and $B \in \mathcal{Z}$, defines a transition function on $(S, \mathcal{S})$.

If $g$ is continuous, then $P$ has the Feller property.

This is the tool for characterizing the long-run behavior of the model.
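A short sketch of this construction, once more reusing the converged v and policy from the sketches above: drawing $z'$ from $Q$ and moving the endogenous state with $g$ simulates the chain $\{s_t\} = \{(x_t, z_t)\}$ under $P$, and averaging realized discounted returns across paths approximates $v(s_0)$, in line with the equivalence result. Path length, number of paths, and the initial condition are arbitrary assumptions.

```python
# Continues the sketches above: simulate {s_t} under the induced
# transition P and compare realized discounted utility with v(s_0).
rng = np.random.default_rng(0)
T, n_paths = 400, 2000
j0, i0 = 100, 0                              # initial condition s_0 = (k, z)
totals = np.zeros(n_paths)
for p in range(n_paths):
    j, i = j0, i0
    for t in range(T):
        jp = policy[j, i]                    # x_{t+1} = g(x_t, z_t)
        c = z_grid[i] * k_grid[j]**alpha - k_grid[jp]
        totals[p] += beta**t * np.log(c)     # accumulate discounted F
        j = jp
        i = rng.choice(len(z_grid), p=Q[i])  # z_{t+1} ~ Q(z_t, .)
print(f"Monte Carlo u(pi, s_0): {totals.mean():.4f}   v(s_0): {v[j0, i0]:.4f}")
```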