Continuous-Time Markov Decision Processes: Discounted and Average Optimality Conditions
Xianping Guo, Zhongshan University
Outline
- The control model
- The existing works
- Our conditions and results
- Examples
1. The control model

The model of continuous-time Markov decision processes:
$$\{S,\ (A(x)\subset A,\ x\in S),\ q(\cdot\mid x,a),\ r(x,a)\},\qquad(1)$$
where
- $S$: the state space, a Polish space;
- $A(x)$: the set of admissible actions in state $x$; $A$: a Polish space of actions;
- $q(\cdot\mid x,a)$: the transition rates, $a\in A(x)$;
- $r(x,a)$: the reward/cost rates, $a\in A(x)$.

A Markov policy $\pi$: a family $(\pi_t,\ t\ge 0)$ of stochastic kernels on $A$ given $S$.

A stationary policy $f$: a measurable function on $S$ with $f(x)\in A(x)$ for all $x\in S$.

For each Markov policy $\pi=(\pi_t,\ t\ge 0)$, define the transition rates
$$Q(D\mid x,\pi_t):=\int_{A(x)} q(D\mid x,a)\,\pi_t(da\mid x).\qquad(2)$$
To guarantee the existence of a Q-process with the transition rates $Q(D\mid x,\pi_t)$, we introduce admissible policies.

Definition 1.1 (Admissible Policies). A Markov policy $(\pi_t)$ is called admissible if $Q(D\mid x,\pi_t)$ is continuous in $t\ge 0$. Let $\Pi$ be the class of admissible policies.

Under suitable conditions, we can define the expected discounted and average criteria:
$$J(s,x,\pi)=\int_s^\infty \tilde{E}^\pi_{s,x}\big[e^{-\alpha(t-s)}\,r(\xi(t),\eta(t))\big]\,dt,$$
$$V(x,\pi)=\liminf_{T\to\infty}\frac{1}{T}\int_0^T \tilde{E}^\pi_{0,x}\big[r(\xi(t),\eta(t))\big]\,dt.$$

Definition 1.2. A policy $\pi^*$ is said to be discounted optimal if $J(s,x,\pi^*)\ge J(s,x,\pi)$ for all $\pi\in\Pi$, $x\in S$ and $s\ge 0$. Similarly, we can define an average optimal policy, with $V$ in place of $J$.

Main aims:
- conditions for the existence of optimal policies;
- algorithms for computing optimal policies;
- the characterization of optimal policies, and applications.
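When the state space is finite and the rates are bounded, the discounted value of a fixed stationary policy $f$ can be computed directly: it solves the linear system $(\alpha I - Q_f)u = r_f$, where $Q_f$ is the generator under $f$. A minimal sketch, with a made-up three-state generator and reward vector purely for illustration:

```python
import numpy as np

# Hypothetical 3-state CTMDP under a fixed stationary policy f.
# Q_f: generator (transition-rate) matrix; each row sums to zero.
Q_f = np.array([[-2.0,  1.5,  0.5],
                [ 1.0, -3.0,  2.0],
                [ 0.5,  0.5, -1.0]])
r_f = np.array([1.0, 4.0, 2.0])   # reward rates under f
alpha = 0.1                        # discount rate, alpha > 0

# The discounted value u(x) = E_x int_0^inf e^{-alpha t} r(x(t)) dt
# satisfies (alpha * I - Q_f) u = r_f, which we solve directly.
u = np.linalg.solve(alpha * np.eye(3) - Q_f, r_f)
print(u)
```

Since $\alpha I - Q_f$ is a nonsingular M-matrix for $\alpha>0$, the system always has a unique solution, and the solution is positive when the rewards are.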
2. The Existing Works

Let $\|r\|:=\sup_{(x,a)\in K}|r(x,a)|$ and $\|q\|:=\sup_{(x,a)\in K}\big(-q(\{x\}\mid x,a)\big)$, where $K:=\{(x,a):x\in S,\ a\in A(x)\}$.

When $S$ is denumerable:
- Case 1, $\|r\|<\infty$, $\|q\|<\infty$: Kakumanu (1972, 1975); Dong (1979); Cao (2002).
- Case 2, $\|r\|<\infty$, $\|q\|=\infty$: Doshi (1975); Song (1987); Guo & Liu (2001).
- Case 3, $\|r\|=\infty$, $\|q\|<\infty$: Yushkevich (1979); Puterman (1994); Haviv (1998); Sennott (1999); Lewis & Puterman (2000).
- Case 4, $\|r\|=\infty$, $\|q\|=\infty$: Guo & Zhu (2001, 2002); Guo & Hernandez-Lerma (2003a, 2003b).

When $S$ is not denumerable:
- Case 1, $\|r\|<\infty$, $\|q\|=\infty$: Doshi (1975) (finite $A(x)$).
- Case 2, $r(x,a)\ge 0$, $\|q\|=\infty$: Hernandez-Lerma & Govindan (2001) (assuming the existence of a solution to the optimality equation!).
- Case 3, $\|r\|=\infty$, $\|q\|=\infty$: open.
3. On the Discounted Criterion

To ensure the regularity of a possibly nonhomogeneous Q-process, we use the following drift conditions.

Assumption A. There exist a measurable function $w_1\ge 1$ on $S$ and constants $c_1\ne 0$, $b_1\ge 0$ and $M_q>0$ such that
(1) $\int_S w_1(y)\,q(dy\mid x,a)\le c_1 w_1(x)+b_1$ for all $(x,a)\in K$;
(2) $q(x)\le M_q\,w_1(x)$, where $q(x):=\sup_{a\in A(x)}[-q(\{x\}\mid x,a)]$.

Theorem 3.1. Let $w$ be any nonnegative measurable function on $S$, and let $c$ ($\ne 0$) and $b$ ($\ge 0$) be two constants. Then for each $\pi\in\Pi$, the following statements are equivalent:
(a) $\int_S w(y)\,q(dy\mid x,\pi_t)\le c\,w(x)+b$ for all $x\in S$, $t\ge 0$;
(b) $\int_S w(y)\,p^{\min}_\pi(s,x,t,dy)\le e^{c(t-s)}w(x)+\dfrac{b}{c}\big[e^{c(t-s)}-1\big]$ for all $x\in S$, $t\ge s\ge 0$,
where $p^{\min}_\pi(s,x,t,dy)$ is the minimum Q-process with the transition rates $Q(D\mid x,\pi_t)$.

Theorem 3.2. Suppose that Assumption A holds. Then for each $\pi\in\Pi$, $x\in S$, $t\ge s\ge 0$:
(a) the Q-process is regular (i.e., $p^{\min}_\pi(s,x,t,S)\equiv 1$);
(b) $E^\pi_{s,x}\,w_1(x(t,\pi))\le e^{c_1(t-s)}w_1(x)+\dfrac{b_1}{c_1}\big(e^{c_1(t-s)}-1\big)$.
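The moment bound of Theorem 3.2(b) can be checked by hand on a simple uncontrolled example. For a linear birth-death chain with birth rate $\lambda i$ and death rate $\mu i$ ($\mu>\lambda$), the choice $w_1(i)=i+1$ gives $\int w_1\,dq = (\lambda-\mu)i = c_1 w_1(i)+b_1$ with $c_1=\lambda-\mu<0$, $b_1=\mu-\lambda$, and $E_x[x(t)]=x\,e^{(\lambda-\mu)t}$ exactly. A small numerical sketch (the rates and horizon are made-up):

```python
import numpy as np

# Made-up linear birth-death instance: birth rate lam*i, death rate mu*i.
# With w1(i) = i + 1, Assumption A(1) holds with c1 = lam - mu < 0 and
# b1 = mu - lam.
lam, mu, x0, t = 1.0, 3.0, 10, 0.7
c1, b1 = lam - mu, mu - lam

# Exact expectation for this chain: E[x(t)] = x0 * exp((lam - mu) t),
# hence E[w1(x(t))] = x0 * exp(c1 t) + 1.
lhs = x0 * np.exp(c1 * t) + 1
# Bound from Theorem 3.2(b): e^{c1 t} w1(x0) + (b1/c1)(e^{c1 t} - 1).
rhs = np.exp(c1 * t) * (x0 + 1) + (b1 / c1) * (np.exp(c1 * t) - 1)
print(lhs, rhs)  # for this linear chain the bound holds with equality
```

Here $b_1/c_1=-1$, so the right-hand side simplifies to $x_0 e^{c_1 t}+1$: the drift bound is tight for this instance.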
To ensure the finiteness of the discounted criterion $J(s,x,\pi)$, by Theorem 3.2(b) it is natural to propose the following conditions.

Assumption B.
(1) $|r(x,a)|\le M_1 w_1(x)$ for all $(x,a)\in K$.
(2) $\alpha-c_1>0$.

To ensure the existence of optimal stationary policies, in addition to Assumptions A and B we propose the following.

Assumption C.
(1) $A(x)$ is a compact set for each $x\in S$;
(2) $r(x,a)$ is continuous in $a\in A(x)$, for each fixed $x\in S$;
(3) for each $x\in S$, the function $\int_S u(y)\,q(dy\mid x,a)$ is continuous in $a\in A(x)$ for every bounded measurable function $u$ on $S$, and also for $u=w_1$;
(4) there exist a nonnegative measurable function $w_2$ on $S$ and constants $c_2>0$, $b_2\ge 0$ and $M_2>0$ such that
$$q(x)\,w_1(x)\le M_2 w_2(x),\qquad \int_S w_2(y)\,q(dy\mid x,a)\le c_2 w_2(x)+b_2.$$

Let $B_{w_1}(S):=\Big\{u:\ \sup_{x\in S}\dfrac{|u(x)|}{w_1(x)}<\infty\Big\}$.

Theorem 3.3. Suppose that Assumptions A, B and C hold.
(a) $|J(s,x,\pi)|\le \dfrac{b_1 M_1}{\alpha(\alpha-c_1)}+\dfrac{M_1 w_1(x)}{\alpha-c_1}$.
(b) There exists a function $u^*\in B_{w_1}(S)$ satisfying the optimality equation
$$\alpha u^*(x)=\sup_{a\in A(x)}\Big\{r(x,a)+\int_S u^*(y)\,q(dy\mid x,a)\Big\},\qquad x\in S.$$
(c) $u^*(x)=\sup_{\pi\in\Pi}J(s,x,\pi)=:J^*_\alpha(x)$ for all $s\ge 0$.
(d) There exists an optimal stationary policy.

Theorem 3.3 ensures the existence of an optimal stationary policy. Under the hypotheses of Theorem 3.3, for each $f\in F$ we define a stochastic process
$$M(\tau,f)=\int_0^\tau e^{-\alpha t}\,r(x(t,f),f(x(t,f)))\,dt+e^{-\alpha\tau}u^*(x(\tau,f)).$$
Theorem 3.4. Suppose that Assumptions A, B and C hold. Then the following statements are equivalent:
(a) $f^*\in F$ is discounted optimal;
(b) for each $x\in S$, $\{M(\tau,f^*)\}$ is a $P^{f^*}_x$-martingale with respect to $\mathcal{F}_t=\sigma\{x(s,f^*):s\le t\}$.

Theorem 3.4 gives a martingale characterization of discounted optimal stationary policies.
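For a finite state space with bounded rates, the discounted optimality equation of Theorem 3.3(b) can be solved by uniformization: choosing $\Lambda\ge\sup_x q(x)$, the equation is equivalent to the fixed-point relation $u=\max_a\{r_a+\Lambda P_a u\}/(\alpha+\Lambda)$ with stochastic matrices $P_a=I+Q_a/\Lambda$, a contraction with modulus $\Lambda/(\alpha+\Lambda)<1$. A sketch with made-up two-state, two-action data (not from the talk):

```python
import numpy as np

# Illustrative finite CTMDP: 2 states, 2 actions; q[a] is the generator
# under action a (rows sum to zero), r[a] the reward-rate vector.
q = {0: np.array([[-1.0, 1.0], [2.0, -2.0]]),
     1: np.array([[-3.0, 3.0], [0.5, -0.5]])}
r = {0: np.array([1.0, 0.0]), 1: np.array([2.0, -0.5])}
alpha = 0.2

# Uniformization constant: at least the maximal total exit rate.
Lam = max(np.max(-np.diag(q[a])) for a in q)

u = np.zeros(2)
for _ in range(500):
    # Fixed-point step: (alpha + Lam) u = max_a { r_a + (Lam I + Q_a) u },
    # since Lam * P_a = Lam * I + Q_a.
    candidates = [(r[a] + (Lam * np.eye(2) + q[a]) @ u) / (alpha + Lam)
                  for a in q]
    u_new = np.max(candidates, axis=0)
    if np.max(np.abs(u_new - u)) < 1e-12:
        u = u_new
        break
    u = u_new

# u now approximates the solution of alpha*u = max_a { r_a + Q_a u }.
print(u)
```

Because $\Lambda u$ does not depend on the action, the fixed point of this iteration satisfies exactly $\alpha u(x)=\max_a\{r(x,a)+\sum_y q(y\mid x,a)u(y)\}$, and the maximizing action in each state yields a discounted optimal stationary policy.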
4. On the Average Criterion

To prove the existence of average optimal policies, we give the following conditions.

Assumption D.
(1) $c_1<0$, with $c_1$ as in Assumption A.
(2) There exist functions $v_1,v_2\in B_{w_1}(S)$ and a state $x_0\in S$ satisfying
$$v_1(x)\le h_\alpha(x)\le v_2(x)\qquad\text{for all }x\in S\text{ and }\alpha>0,$$
where $h_\alpha(x):=J^*_\alpha(x)-J^*_\alpha(x_0)$.
To verify Assumption D, we provide sufficient conditions.

Proposition 4.1. Under Assumptions A and B, each one of the following conditions (a) and (b) implies Assumption D.
(a) There exist constants $R>0$ and $\rho>0$ such that
$$\sup_{|u|\le w_1}\Big|E^f_x[u(x(t))]-\int_S u(y)\,\mu_f(dy)\Big|\le R\,e^{-\rho t}\,w_1(x)$$
for all $t\ge 0$ and $f\in F$, where $\mu_f$ is the invariant distribution of the process under $f$.
(b) $S=[0,\infty)^d$ for some integer $d\ge 1$, and for each $f\in F$:
(b1) the function $w_1$ in Assumption A is nondecreasing in each component, and
$$\int_S w_1(y)\,q(dy\mid x,f(x))\le c_1 w_1(x)+b_1 I_{\{0^d\}}(x);$$
(b2) the process $x(t)$ is stochastically monotone.

We now state our main results about average optimality.

Theorem 4.2. Suppose that Assumptions A, B, C and D hold.
(a) There exist a constant $g^*$, functions $u_1,u_2\in B_{w_1}(S)$ and a stationary policy $f^*\in F$ satisfying the optimality inequalities
$$g^*\ \ge\ \max_{a\in A(x)}\Big\{r(x,a)+\int_S u_1(y)\,q(dy\mid x,a)\Big\},\qquad x\in S;$$
$$g^*\ \le\ \max_{a\in A(x)}\Big\{r(x,a)+\int_S u_2(y)\,q(dy\mid x,a)\Big\}=r(x,f^*(x))+\int_S u_2(y)\,q(dy\mid x,f^*(x)),\qquad x\in S.$$
(b) $g^*=\sup_{\pi\in\Pi}V(x,\pi)$ for all $x\in S$.
(c) The policy $f^*$ in (a) is average optimal.

Theorem 4.2 ensures the existence of an average optimal stationary policy.

For each $f\in F$, $x\in S$, $u\in B_{w_1}(S)$ and any constant $g$, let
$$\Delta(x;f,u,g):=r(x,f(x))+\int_S u(y)\,q(dy\mid x,f(x))-g,$$
and then define a stochastic process
$$M_t(f,u,g):=\int_0^t r(x(s),f(x(s)))\,ds+u(x(t))-tg,\qquad t\ge 0.$$

Theorem 4.3. Suppose that Assumptions A, B, C and D hold.
(a) Let $f^*$ be the average optimal policy obtained in Theorem 4.2, with $u_1$, $u_2$, $g^*$ as in Theorem 4.2. Then
(a1) $M_t(f^*,u_2,g^*)$ is a $P^{f^*}_x$-submartingale for all $x\in S$;
(a2) $M_t(f,u_1,g^*)$ is a $P^f_x$-supermartingale for all $f\in F$ and $x\in S$.
(b) Conversely, if there exist a policy $f^*\in F$, functions $u_1,u_2\in B_{w_1}(S)$ and a constant $g$ such that for each $f\in F$ and $x\in S$,
(b1) $M_t(f^*,u_2,g)$ is a $P^{f^*}_x$-submartingale for all $x\in S$, and
(b2) $M_t(f,u_1,g)$ is a $P^f_x$-supermartingale,
then $f^*$ is average optimal.

Theorem 4.3 gives a semi-martingale characterization of average optimal stationary policies.
5. Examples

5.1. A controlled generalized potlatch process.

Take $S:=[0,\infty)^d$ with $d\ge 1$, and let $(p_{ij})$ be a transition probability matrix on $\{1,2,\ldots,d\}$. The generalized potlatch process is generated by
$$Lu(x):=\sum_{i=1}^d\int_0^\infty\Big[u\Big(x-x_i e_i+y\sum_{j=1}^d p_{ij}x_i e_j\Big)-u(x)\Big]\,dF(y,\lambda).$$
Let
$$r(x,a):=\sum_{i=1}^d q_i\sum_{j=1}^d p_{ij}x_j-\lambda(x_1+\cdots+x_d),\qquad(4)$$
where $(q_1,\ldots,q_d)=:a$ will be interpreted as control actions. Thus, the transition rates $q(\cdot\mid x,a)$ are given by
$$q(D\mid x,a):=\sum_{i=1}^d\int_0^\infty I_{D\setminus\{x\}}\Big(x-x_i e_i+y\sum_{j=1}^d p_{ij}x_i e_j\Big)\lambda e^{-\lambda y}\,dy,$$
$$q(\{x\}\mid x,a):=-q(S\setminus\{x\}\mid x,a).$$

Conclusion: All of Assumptions A, B, C and D hold if $A(x)$ is compact for each $x\in S$ and $\lambda>1$.
5.2. A controlled generalized birth-death system.

For $i=0$ and each $a:=(a_1,a_2)\in A(0)$,
$$q(1\mid 0,a):=-q(0\mid 0,a):=h_2(0,a_2)>0,$$
and for $i\ge 1$ and all $a:=(a_1,a_2)\in A(i)$,
$$q(j\mid i,a):=\begin{cases}\mu i+h_1(i,a_1) & \text{if } j=i-1,\\[2pt] -(\mu+\lambda)i-h_1(i,a_1)-h_2(i,a_2) & \text{if } j=i,\\[2pt] \lambda i+h_2(i,a_2) & \text{if } j=i+1,\\[2pt] 0 & \text{otherwise};\end{cases}$$
$$r(i,a):=p\,i+r(i,a_2)-c(i,a_1).$$

Consider the following conditions:

E$_1$: (a) $\mu-\lambda>0$;
(b) either $\kappa:=\mu-\lambda+\bar h_2-\underline h_1\le 0$, or $\mu-\lambda>\bar h_2-\underline h_1$ when $\kappa>0$, where $\bar h_2:=\sup_{a_2\in A_2(i),\,i\ge 1}h_2(i,a_2)$ and $\underline h_1:=\inf_{a_1\in A_1(i),\,i\ge 1}h_1(i,a_1)$.

E$_2$: $h_1(i,\cdot)$, $h_2(i,\cdot)$, $c(i,\cdot)$ and $r(i,\cdot)$ are all continuous.

E$_3$: (a) $|c(i,a_1)|\le L_1(i+1)$ and $|r(i,a_2)|\le L_2(i+1)$;
(b) $\bar h_k:=\sup_{i\in S,\,a_k\in A_k(i)}h_k(i,a_k)<\infty$.

Proposition 5.1. Under E$_1$, E$_2$ and E$_3$, the birth-death system satisfies Assumptions A, B, C and D.
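The birth-death rates above are easy to experiment with numerically under a fixed stationary policy. A sketch that builds the generator on a truncated state space $\{0,\ldots,N\}$ and evaluates the long-run average reward via the stationary distribution; the particular choices of $h_1$, $h_2$, the cost/reward terms, the policy and the truncation level are all made-up for illustration, not taken from the talk:

```python
import numpy as np

# Truncated controlled birth-death chain under a fixed stationary policy.
N, mu, lam, p = 50, 3.0, 1.0, 2.0
h1 = lambda i, a1: a1          # hypothetical controlled extra service rate
h2 = lambda i, a2: a2          # hypothetical controlled extra arrival rate
f = lambda i: (1.0, 0.5)       # a fixed stationary policy f(i) = (a1, a2)

Q = np.zeros((N + 1, N + 1))
r = np.zeros(N + 1)
for i in range(N + 1):
    a1, a2 = f(i)
    up = lam * i + h2(i, a2) if i >= 1 else h2(0, a2)
    down = mu * i + h1(i, a1) if i >= 1 else 0.0
    if i < N:
        Q[i, i + 1] = up       # birth rate (dropped at the truncation level)
    if i > 0:
        Q[i, i - 1] = down     # death rate
    Q[i, i] = -Q[i].sum()      # diagonal entry makes the row sum to zero
    r[i] = p * i + a2 - a1     # reward rate p*i + r(i,a2) - c(i,a1), made-up

# Stationary distribution: solve pi Q = 0 together with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(N + 1)])
b = np.append(np.zeros(N + 1), 1.0)
pi_stat, *_ = np.linalg.lstsq(A, b, rcond=None)
g = pi_stat @ r                # long-run average reward of the policy f
```

Since $\mu>\lambda$, the chain is strongly mean-reverting and the truncation at a moderate $N$ barely affects the stationary distribution, which is concentrated near $0$.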
Bibliography

[1] Altman, E., Constrained Markov Decision Processes, Chapman & Hall/CRC, 1999.
[2] Borkar, V.S., Topics in Controlled Markov Chains, Pitman Research Notes in Mathematics No. 240, Longman Scientific and Technical, Harlow.
[3] Derman, C., Finite State Markovian Decision Processes, Academic Press, New York.
[4] Dynkin, E.B. and Yushkevich, A.A., Controlled Markov Processes, Springer-Verlag, New York.
[5] Feinberg, E.A. and Shwartz, A., Handbook of Markov Decision Processes, Kluwer Academic Publishers, Boston/Dordrecht/London.
[6] Filar, J.A. and Vrieze, K., Competitive Markov Decision Processes, Springer-Verlag, New York.
[7] Hernandez-Lerma, O. and Lasserre, J.B., Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York.
[8] Hernandez-Lerma, O. and Lasserre, J.B., Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York.
[9] Hernandez-Lerma, O., Adaptive Markov Control Processes, Springer-Verlag, New York.
[10] Hinderer, K., Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, Lecture Notes in Operations Research, Springer-Verlag, New York.
[11] Hou, Z.T. and Guo, X.P., Markov Decision Processes, Science and Technology Press of Hunan. (In Chinese.)
[12] Hordijk, A., Dynamic Programming and Markov Potential Theory, Mathematical Centre Tract No. 51, Mathematisch Centrum, Amsterdam.
[13] Howard, R.A., Dynamic Programming and Markov Processes, MIT Press, Cambridge.
[14] Kallenberg, L.C.M., Linear Programming and Finite Markovian Control Problems, Mathematical Centre Tract 148, Mathematical Centre, Amsterdam.
[15] Piunovskiy, A.B., Optimal Control of Random Sequences in Problems with Constraints, Kluwer Academic Publishers, 1997.
[16] Puterman, M.L., Markov Decision Processes, Wiley, New York.
[17] Ross, S.M., Introduction to Stochastic Dynamic Programming, Academic Press, New York.
[18] Sennott, L.I., Stochastic Dynamic Programming and the Control of Queueing Systems, Wiley, New York.
[19] Tijms, H.C. and Wessels, J., Markov Decision Theory, Mathematical Centre Tract 93, Mathematical Centre, Amsterdam.
[20] White, D.J., Markov Decision Processes, John Wiley & Sons, Chichester.

THANK YOU!!!
On Ergodic Impulse Control with Constraint Maurice Robin Based on joint papers with J.L. Menaldi University Paris-Sanclay 9119 Saint-Aubin, France (e-mail: maurice.robin@polytechnique.edu) IMA, Minneapolis,
More informationINEQUALITY FOR VARIANCES OF THE DISCOUNTED RE- WARDS
Applied Probability Trust (5 October 29) INEQUALITY FOR VARIANCES OF THE DISCOUNTED RE- WARDS EUGENE A. FEINBERG, Stony Brook University JUN FEI, Stony Brook University Abstract We consider the following
More informationLIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974
LIMITS FOR QUEUES AS THE WAITING ROOM GROWS by Daniel P. Heyman Ward Whitt Bell Communications Research AT&T Bell Laboratories Red Bank, NJ 07701 Murray Hill, NJ 07974 May 11, 1988 ABSTRACT We study the
More informationMarkov Decision Processes and Dynamic Programming
Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course How to model an RL problem The Markov Decision Process
More informationLet (Ω, F) be a measureable space. A filtration in discrete time is a sequence of. F s F t
2.2 Filtrations Let (Ω, F) be a measureable space. A filtration in discrete time is a sequence of σ algebras {F t } such that F t F and F t F t+1 for all t = 0, 1,.... In continuous time, the second condition
More informationfor all f satisfying E[ f(x) ] <.
. Let (Ω, F, P ) be a probability space and D be a sub-σ-algebra of F. An (H, H)-valued random variable X is independent of D if and only if P ({X Γ} D) = P {X Γ}P (D) for all Γ H and D D. Prove that if
More informationLinear Programming Methods
Chapter 11 Linear Programming Methods 1 In this chapter we consider the linear programming approach to dynamic programming. First, Bellman s equation can be reformulated as a linear program whose solution
More informationMAXIMAL COUPLING OF EUCLIDEAN BROWNIAN MOTIONS
MAXIMAL COUPLING OF EUCLIDEAN BOWNIAN MOTIONS ELTON P. HSU AND KAL-THEODO STUM ABSTACT. We prove that the mirror coupling is the unique maximal Markovian coupling of two Euclidean Brownian motions starting
More informationThe Optimal Stopping of Markov Chain and Recursive Solution of Poisson and Bellman Equations
The Optimal Stopping of Markov Chain and Recursive Solution of Poisson and Bellman Equations Isaac Sonin Dept. of Mathematics, Univ. of North Carolina at Charlotte, Charlotte, NC, 2822, USA imsonin@email.uncc.edu
More informationA relative entropy characterization of the growth rate of reward in risk-sensitive control
1 / 47 A relative entropy characterization of the growth rate of reward in risk-sensitive control Venkat Anantharam EECS Department, University of California, Berkeley (joint work with Vivek Borkar, IIT
More informationVerona Course April Lecture 1. Review of probability
Verona Course April 215. Lecture 1. Review of probability Viorel Barbu Al.I. Cuza University of Iaşi and the Romanian Academy A probability space is a triple (Ω, F, P) where Ω is an abstract set, F is
More informationOn the convergence rates of genetic algorithms
Theoretical Computer Science 229 (1999) 23 39 www.elsevier.com/locate/tcs On the convergence rates of genetic algorithms Jun He a;, Lishan Kang b a Department of Computer Science, Northern Jiaotong University,
More informationA concentration theorem for the equilibrium measure of Markov chains with nonnegative coarse Ricci curvature
A concentration theorem for the equilibrium measure of Markov chains with nonnegative coarse Ricci curvature arxiv:103.897v1 math.pr] 13 Mar 01 Laurent Veysseire Abstract In this article, we prove a concentration
More information1 Markov decision processes
2.997 Decision-Making in Large-Scale Systems February 4 MI, Spring 2004 Handout #1 Lecture Note 1 1 Markov decision processes In this class we will study discrete-time stochastic systems. We can describe
More informationSection Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018
Section Notes 9 Midterm 2 Review Applied Math / Engineering Sciences 121 Week of December 3, 2018 The following list of topics is an overview of the material that was covered in the lectures and sections
More informationProcedia Computer Science 00 (2011) 000 6
Procedia Computer Science (211) 6 Procedia Computer Science Complex Adaptive Systems, Volume 1 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science and Technology 211-
More information4.6 Example of non-uniqueness.
66 CHAPTER 4. STOCHASTIC DIFFERENTIAL EQUATIONS. 4.6 Example of non-uniqueness. If we try to construct a solution to the martingale problem in dimension coresponding to a(x) = x α with
More informationSUPPLEMENT TO CONTROLLED EQUILIBRIUM SELECTION IN STOCHASTICALLY PERTURBED DYNAMICS
SUPPLEMENT TO CONTROLLED EQUILIBRIUM SELECTION IN STOCHASTICALLY PERTURBED DYNAMICS By Ari Arapostathis, Anup Biswas, and Vivek S. Borkar The University of Texas at Austin, Indian Institute of Science
More informationA Barrier Version of the Russian Option
A Barrier Version of the Russian Option L. A. Shepp, A. N. Shiryaev, A. Sulem Rutgers University; shepp@stat.rutgers.edu Steklov Mathematical Institute; shiryaev@mi.ras.ru INRIA- Rocquencourt; agnes.sulem@inria.fr
More informationDynamic Control of a Tandem Queueing System with Abandonments
Dynamic Control of a Tandem Queueing System with Abandonments Gabriel Zayas-Cabán 1 Jungui Xie 2 Linda V. Green 3 Mark E. Lewis 1 1 Cornell University Ithaca, NY 2 University of Science and Technology
More information1 Stochastic Dynamic Programming
1 Stochastic Dynamic Programming Formally, a stochastic dynamic program has the same components as a deterministic one; the only modification is to the state transition equation. When events in the future
More informationPoisson Jumps in Credit Risk Modeling: a Partial Integro-differential Equation Formulation
Poisson Jumps in Credit Risk Modeling: a Partial Integro-differential Equation Formulation Jingyi Zhu Department of Mathematics University of Utah zhu@math.utah.edu Collaborator: Marco Avellaneda (Courant
More information