4.4. Cost functions. The cost function g(x) gives the price you must pay to continue from state x. If T is your stopping time then X_T is your stopping state and f(X_T) is your payoff. But your cost to play to that point was

    cost = g(X_0) + g(X_1) + \cdots + g(X_{T-1}) = \sum_{j=0}^{T-1} g(X_j)

So, your net gain is

    net = f(X_T) - \sum_{j=0}^{T-1} g(X_j)

The value function v(x) is the expected net gain when using the optimal stopping time starting at state x:

    v(x) = E( f(X_T) - \sum_{j=0}^{T-1} g(X_j) | X_0 = x )

It satisfies the equation

    v(x) = \max( f(x), (Pv)(x) - g(x) )

Proof: In state x you should either stop or continue.

(1) If you stop, you get f(x).
(2) If you continue and use the optimal strategy after that, then you get v(y) with probability p(x, y), but you have to pay g(x). So, you would expect to get

    \sum_y p(x, y) v(y) - g(x) = (Pv)(x) - g(x)

You should pick whichever of the two gives you the higher expected net. So, v(x) is the maximum of these two numbers.

4.4.1. Nonstochastic case. The first example was a dice problem. You toss two dice and let x = the sum. Then 2 ≤ x ≤ 12 with the probabilities indicated below.

    x    =  2     3     4     5     6     7     8     9     10    11    12
    p_x  =  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
    f(x) =  2     3     4     5     6     0     8     9     10    11    12
    g(x) =  2     2     2     2     1     12    1     1     1     1     1

There was a question of what g(7) should be. It does not make sense to talk about the cost of continuing from 7 if you are not allowed to continue. So I decided that, in order to have a well-defined function (g(x) needs to be defined for every x in S in order for g to be a function), we should allow the player to pay max f(x) = 12 to continue from 7. It doesn't make sense to pay 12 to play a game where the maximum gain is 12, so this has the effect of making 7 recurrent: no sensible player will ever continue from 7.
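As a computational aside (not part of the original lecture), the fixed-point equation v(x) = max(f(x), (Pv)(x) - g(x)) can be solved numerically by repeated substitution. Here is a minimal sketch in Python/NumPy; the function name `value_iteration` and the starting guess u_1 = f are my own choices:

```python
import numpy as np

def value_iteration(P, f, g, tol=1e-12, max_iter=100_000):
    """Iterate u_{n+1}(x) = max(f(x), (P u_n)(x) - g(x)).

    P: transition matrix (rows sum to 1), f: payoff vector, g: cost vector.
    Starting from u_1 = f, the sequence u_n increases toward the value
    function v for examples like the ones below; the optimal strategy
    is to stop exactly at the states where v(x) = f(x).
    """
    u = np.asarray(f, dtype=float)
    for _ in range(max_iter):
        u_next = np.maximum(f, P @ u - g)
        if np.max(np.abs(u_next - u)) < tol:
            break
        u = u_next
    return u_next
```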

The problem is to find the value function v(x) and the optimal strategy (the formula for T). I pointed out that this Markov chain is actually not stochastic in the sense that the probabilities do not change: continuing costs g(x), but it leads to the same distribution over states no matter where you are, since each roll is independent of the last. This implies that the value function v(x), which is a vector with 10 unknown coordinates (11 coordinates, of which we know only v(7) = 0),

    v = ( v(2), v(3), ..., v(6), v(7) = 0, v(8), ..., v(12) )

is determined by one number:

    E = the expected payoff if you continue.

Then your expected net if you continue is E - g(x), so

    v(x) = \max( f(x), E - g(x) )   if x ≠ 7.

And E is given in terms of v by

    E = \sum_{x ≠ 7} p_x v(x)

When you do the iteration algorithm you compute

    E_n = \sum_{x ≠ 7} p_x u_n(x)

and you get a sequence of numbers E_1, E_2, E_3, .... All you need is the single number

    E = \lim_{n → ∞} E_n.
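Because every row of the transition matrix (away from state 7) is the same vector p, the vector iteration collapses to a scalar map, E_{n+1} = Σ_{x≠7} p_x max(f(x), E_n - g(x)). A sketch of this shortcut; the helper name `dice_E` is hypothetical and is reused in the checks below:

```python
import numpy as np

x = np.arange(2, 13)                                 # possible sums 2..12
p = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]) / 36

def dice_E(f, g, E=12.0, n_steps=200):
    """Scalar iteration for the dice game, starting from the
    optimistic guess E_0 = max f(x) = 12."""
    for _ in range(n_steps):
        u = np.maximum(f, E - g)   # stop vs. continue at each state
        u[x == 7] = 0.0            # rolling 7 ends the game with payoff 0
        E = p @ u                  # E_{n+1} = sum over x of p_x u(x)
    return E
```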

No cost. First I did this in the no-cost case (g = 0). When n = 1 you take the most optimistic view: hope to get x = 12. But you have a probability p_7 = 6/36 = 1/6 of rolling 7 and losing. So

    E_1 = \sum_{x ≠ 7} p_x u_1(x) = \sum_{x ≠ 7} p_x \cdot 12 = (5/6) \cdot 12 = 10.

Then u_2(x) = max(f(x), E_1) if x ≠ 7 (but make sure to put u_2(7) = 0):

    x      =  2   3   4   5   6   7   8   9   10  11  12
    f(x)   =  2   3   4   5   6   0   8   9   10  11  12
    E_1    =  10  10  10  10  10  10  10  10  10  10  10
    u_2(x) =  10  10  10  10  10  0   10  10  10  11  12

If you take the weighted average of u_2(x) you get E_2:

    E_2 = \sum_{x ≠ 7} p_x u_2(x) = 8.4444...

Repeating this process you get

    E_3 = 7.4691,  E_4 = 7.0010,  E_5 = 6.8060,  E_6 = 6.7247,  ...,  E_∞ = 6.6667.

Once you realize that E is somewhere between 6 and 7, you know the winning strategy: you need to continue if you roll 6 or less and stop if you roll 8 or more. So

    v(x) = ( E, E, E, E, E, 0, 8, 9, 10, 11, 12 )

which makes

    E = (1/36)( E + 2E + 3E + 4E + 5E + 5(8) + 4(9) + 3(10) + 2(11) + 12 )
      = (1/36)( 15E + 140 )

    36E = 15E + 140
    E = 140/21 = 20/3 = 6 2/3

So the value function is the vector

    v = ( 20/3, 20/3, 20/3, 20/3, 20/3, 0, 8, 9, 10, 11, 12 ).
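A quick numerical check of the no-cost answer, using the `dice_E` sketch above:

```python
f = np.array([2, 3, 4, 5, 6, 0, 8, 9, 10, 11, 12], dtype=float)
E = dice_E(f, np.zeros(11))    # no cost: g = 0
print(E, 20 / 3)               # 6.6666...  6.6666...
v = np.maximum(f, E)           # v(x) = max(f(x), E) for x != 7
v[x == 7] = 0.0
print(v)                       # [20/3]*5 + [0, 8, 9, 10, 11, 12]
```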

With cost. The iteration algorithm starts with (I don't remember what I said, but you have to remember that u_n(x) ≥ f(x) at all times):

    u_1(x) = { 0                       if x = 7
             { f(x)                    if f(x) ≥ \max_y f(y) - g(x)
             { \max_y f(y) - g(x)      otherwise

where max_y f(y) is "hope for the best" and g(x) is the cost of one step. This gives

    u_1 = ( 10, 10, 10, 10, 11, 0, 11, 11, 11, 11, 12 )

The weighted average of these numbers is

    E_1 = \sum_{x ≠ 7} p_x u_1(x) = 8.917

Then u_2(x) = max(f(x), E_1 - g(x)):

    u_2 = ( 6.917, 6.917, 6.917, 6.917, 7.917, 0, 8, 9, 10, 11, 12 )

with average

    E_2 = \sum_{x ≠ 7} p_x u_2(x) = 6.910

    E_3 = 6.096
    E_4 = 5.960
    ...
    E_10 = 5.939393
    E_11 = 5.939393

We just needed to know that E is between 5 and 6. This tells us that the optimal strategy is to continue if you roll 2 or 3 and stop if you roll 4 or more. After you determine the optimal strategy, you can find the exact values of both E and the value function v(x). First you find v(x) in terms of E:

    v(x) = ( E - 2, E - 2, 4, 5, 6, 0, 8, 9, 10, 11, 12 )

The weighted average of these numbers is E. So

    E = (E - 2)(3/36) + 202/36
    36E = 3E + 196
    E = 196/33 = 5 31/33 = 5 93/99 = 5.939393...

    v(x) = ( 3 31/33, 3 31/33, 4, 5, 6, 0, 8, 9, 10, 11, 12 )
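The same check for the with-cost game, using the cost vector from the table above (g = 2 on states 2 through 5, 1 elsewhere, and 12 at 7):

```python
g = np.array([2, 2, 2, 2, 1, 12, 1, 1, 1, 1, 1], dtype=float)
E = dice_E(f, g)
print(E, 196 / 33)             # 5.93939...  5.93939...
v = np.maximum(f, E - g)       # v(x) = max(f(x), E - g(x)) for x != 7
v[x == 7] = 0.0
print(v)                       # [3+31/33, 3+31/33, 4, 5, 6, 0, 8, 9, 10, 11, 12]
```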

4.4.2. Random walk with absorbing walls. In the general (stochastic) case the value function is the solution of the equation

    v(x) = \max( f(x), \sum_y p(x, y) v(y) - g(x) )

The maximum with f(x) is what we had before; in the case of the simple random walk the sum is

    \sum_y p(x, y) v(y) = ( v(x-1) + v(x+1) ) / 2

So v(x) is the smallest function such that

    v(x) ≥ f(x)   and   v(x) ≥ ( v(x-1) + v(x+1) ) / 2 - g(x)

Example 4.8. Suppose the payoff and cost functions are:

    states x =  0  1  2   3   4   5
    f(x)     =  0  5  11  11  15  0
    g(x)     =  0  1  1   1   1   0

[Figure: the graph of f(x) with its convex hull drawn in thin lines. The labels mark f(1) = 5, w(1) = 4 1/2, and (v(0) + v(2))/2 = 5 1/2 at x = 1, and f(3) = 11, (11 + 15)/2 = 13, and w(3) = 12 at x = 3.]

In the graph, the thin lines give the convex hull of the function f(x). This would be the answer if there were no cost. Since the cost is 1, we have to go one step below the average. I called this function w(x):

    w(x) := ( f(x-1) + f(x+1) ) / 2 - g(x)

Since v(x) is ≥ f(x), it must also be ≥ w(x):

    v(x) ≥ ( v(x-1) + v(x+1) ) / 2 - g(x) ≥ ( f(x-1) + f(x+1) ) / 2 - g(x) = w(x)

This is a simple case in which the gaps have length 2, so we can just compare f(x) and w(x) to get the value function v(x). If a gap is longer than two, the equation becomes more complicated.

I think I forgot to say this: for the iteration algorithm we can start with the value function that we know how to calculate when there is no cost:

    u_1(x) = ( 0, 5 1/2, 11, 13, 15, 0 )

Then

    u_2(x) = \max( f(x), ( u_1(x-1) + u_1(x+1) ) / 2 - g(x) )

    u_2 = ( 0, 5, 11, 12, 15, 0 )

If you do it again, you get the same thing: u_3 = u_2. So this is also equal to the value function:

    v = ( 0, 5, 11, 12, 15, 0 )

So the optimal strategy is to continue when x = 3 (since that is the only point where v(x) > f(x)) but stop at any other point.
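The two-step convergence can also be checked numerically. A minimal sketch (the vectorized NumPy update is my own phrasing of the iteration):

```python
import numpy as np

f = np.array([0, 5, 11, 11, 15, 0], dtype=float)
g = np.array([0, 1, 1, 1, 1, 0], dtype=float)

u = np.array([0, 5.5, 11, 13, 15, 0])   # u_1 = the no-cost value function
for _ in range(10):
    # interior states: u(x) <- max(f(x), (u(x-1) + u(x+1))/2 - g(x))
    u[1:-1] = np.maximum(f[1:-1], 0.5 * (u[:-2] + u[2:]) - g[1:-1])
print(u)                                # [0, 5, 11, 12, 15, 0] = v
```

The walls x = 0 and x = 5 are absorbing, so u(0) and u(5) are never updated.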