
UNIFORMIZATION IN MARKOV DECISION PROCESSES

OGUZHAN ALAGOZ, MEHMET U.S. AYVACI
Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin

Wiley Encyclopedia of Operations Research and Management Science, edited by James J. Cochran. Copyright 2010 John Wiley & Sons, Inc.

Most Markov decision process (MDP) models consider problems with decisions occurring at discrete time points. On the other hand, there are several real-life applications, particularly in queueing systems, in which the decision maker chooses actions at random times over a continuous-time interval. Such problems can be modeled using continuous-time models. Semi-Markov decision processes (SMDPs) (see Semi-Markov Decision Processes), a class of continuous-time models, generalize discrete-time Markov decision processes (DTMDPs) by allowing state changes to occur randomly over continuous time and letting or requiring decisions to be taken whenever the system state changes [1,2]. In SMDPs, the stochastic process defined by the state transitions follows a discrete-time Markov chain, while the time between transitions is drawn from a general distribution, independent of the transitions [1,3]. Continuous-time Markov decision processes (CTMDPs) constitute a special type of SMDP in which the transition times between decisions are exponentially distributed and actions are taken at every transition [2]. Uniformization is a tool used to convert a CTMDP into an equivalent DTMDP. Although uniformization has long been used to analyze continuous-time Markov processes [4-7], Serfozo [8] formalized the use of uniformization in the context of countable-state CTMDPs.

In this article, we describe uniformization in CTMDPs. Although we consider CTMDPs with stationary transition probabilities and reward functions, bounded reward functions, and finite state and action spaces, the results can easily be extended to CTMDPs with countable state and action spaces as well as to more general spaces under appropriate measurability conditions [8]. While we focus on uniformization only for infinite-horizon CTMDPs with the total expected discounted reward criterion, uniformization can also be utilized in CTMDPs with the average reward criterion [2]. Assuming a unichain transition probability matrix for every stationary policy, the transformation and modeling scheme for solving CTMDPs with the average reward criterion (see Average Reward MDPs: Solution Techniques) is identical to the one described in this article except for the computation of the reward function. The results apply to multichain cases as well, with slight modifications. More information on uniformization in CTMDPs with the average reward criterion is available elsewhere [2].

The remainder of this article is organized as follows: we next summarize how uniformization is used to convert a continuous-time Markov chain (CTMC) into an equivalent discrete-time Markov chain. Then, we describe the use of uniformization in CTMDPs. Finally, we present two examples of uniformization.

UNIFORMIZATION IN CONTINUOUS-TIME MARKOV CHAINS

CTMCs are formally defined as follows [9] (see the section titled Continuous-Time Markov Chains (CTMCs) in this encyclopedia): a continuous-time stochastic process $\{X(t), t \ge 0\}$ is a CTMC if for all $s, t \ge 0$ and nonnegative integers $i$, $j$, $x(u)$, $0 \le u < t$,

$$P\{X(t+s) = j \mid X(t) = i,\ X(u) = x(u),\ 0 \le u < t\} = P\{X(t+s) = j \mid X(t) = i\}.$$

A CTMC is a stochastic process with the Markovian property; that is, the conditional distribution of the future state $X(t+s)$, given the present state $X(t)$ and the past states $X(u)$, $0 \le u < t$, is independent of the past states and depends only on the current state $X(t)$ (see Definition and Examples of CTMCs).

Consider a CTMC in which the time to make a transition from its current state to a different state is exponentially distributed with rate $\beta$ for all states. Let $P_{ij}(t)$ denote the probability of being in state $j$ at time $t$ starting from state $i$ at time $0$. Note that the number of transitions by time $t$, $\{N(t), t \ge 0\}$, is a Poisson process with rate $\beta$ [9]. Therefore, $P_{ij}(t)$ can be recast by conditioning on the number of transitions by time $t$ as follows:

$$P_{ij}(t) = P\{X(t) = j \mid X(0) = i\} = \sum_{n=0}^{\infty} P\{X(t) = j \mid X(0) = i, N(t) = n\}\, P\{N(t) = n \mid X(0) = i\} = \sum_{n=0}^{\infty} P^n_{ij}\, e^{-\beta t}\,\frac{(\beta t)^n}{n!}, \qquad (1)$$

where $P^n_{ij}$ represents the $n$-step stationary transition probability of an equivalent discrete-time Markov chain with transition probabilities $P_{ij}$. That is,

$$P\{X(t) = j \mid X(0) = i, N(t) = n\} = P^n_{ij}. \qquad (2)$$

Equation (1) follows from the assumption that the time spent in every state is exponentially distributed with the same rate $\beta$. More specifically, because the sojourn times are identically distributed, the number of transitions by time $t$ carries no information about which states were visited; hence, given $N(t) = n$, the probability of being in state $j$ at time $t$ equals the $n$-step transition probability of the embedded chain. Therefore, Equation (2) can be applied only if all states have identical sojourn-time distributions.

To convert a CTMC with different transition rates into a discrete-time Markov chain, we use uniformization. Suppose the mean sojourn time in state $i$ is $1/\beta_i$ and there exists a finite constant $\beta$ such that $\beta_i \le \beta$ for all $i$. Under the new scheme, we assign the same transition rate $\beta$ to all states $i$ and divide the transition process into two parts: fictitious transitions from a state to itself and transitions to other states. To match the actual process, we let the process remain in each state for an exponential amount of time with rate $\beta$ and define the new transition probabilities as

$$\tilde P_{ij} = \begin{cases} 1 - \dfrac{\beta_i}{\beta}, & j = i,\\[4pt] \dfrac{\beta_i}{\beta}\, P_{ij}, & j \ne i. \end{cases}$$

Applying the new transition probabilities to Equation (1), we obtain

$$P_{ij}(t) = \sum_{n=0}^{\infty} \big(\tilde P^n\big)_{ij}\, e^{-\beta t}\,\frac{(\beta t)^n}{n!}.$$

Figure 1 shows the schematic of a simple uniformization example.

[Figure 1. An illustrative example for the uniformization of a CTMC through the use of fictitious self-transitions: a two-state CTMC with out-of-state rates $\nu(0) = \beta_0$, $\nu(1) = \beta_1$ and jump probabilities $P_{01} = P_{10} = 1$ is converted to an equivalent chain with uniform rate $\beta = \beta_0 + \beta_1$ and transition probabilities $\tilde P_{00} = \tilde P_{10} = \beta_1/\beta$, $\tilde P_{01} = \tilde P_{11} = \beta_0/\beta$.]

In summary, uniformization enables us to convert a CTMC with state-dependent out-of-state transition rates into an analytically equivalent CTMC with uniform transition rates. This new system can be treated as a discrete-time Markov chain for the purposes of analysis [9].
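
To make Equations (1) and (2) and the fictitious-self-transition construction concrete, the following Python sketch uniformizes the two-state birth-death CTMC of Figure 1 and evaluates $P_{ij}(t)$ by truncating the Poisson series. The numerical rates, the truncation level, and the cross-check against the matrix exponential are our own illustrative assumptions, not part of the article.

```python
import numpy as np
from scipy.linalg import expm  # only used to cross-check the series

# Illustrative two-state CTMC (the birth-death chain of Figure 1):
# state 0 leaves at rate beta0, state 1 leaves at rate beta1.
beta_i = np.array([2.0, 5.0])                 # out-of-state rates beta_i
P = np.array([[0.0, 1.0],                     # embedded jump probabilities P_ij
              [1.0, 0.0]])

beta = beta_i.sum()                           # uniform rate, beta_i <= beta for all i

# Uniformized chain: P~_ii = 1 - beta_i/beta, P~_ij = (beta_i/beta) P_ij for j != i
P_tilde = (beta_i / beta)[:, None] * P
np.fill_diagonal(P_tilde, 1.0 - beta_i / beta)

def transient_probs(t, n_max=200):
    """P_ij(t) = sum_n (P~^n)_ij e^{-beta t} (beta t)^n / n!, truncated at n_max."""
    out = np.zeros_like(P_tilde)
    term = np.eye(len(beta_i))                # P~^0
    weight = np.exp(-beta * t)                # Poisson(beta*t) pmf at n = 0
    for n in range(n_max + 1):
        out += weight * term
        term = term @ P_tilde                 # P~^{n+1}
        weight *= beta * t / (n + 1)          # Poisson pmf at n + 1
    return out

t = 0.3
print(transient_probs(t))
# Cross-check against the matrix exponential of the generator Q.
Q = np.diag(-beta_i) + beta_i[:, None] * P
print(expm(Q * t))
```

The two printed matrices agree to within the series truncation error, which is the practical content of Equation (1).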

UNIFORMIZATION IN CONTINUOUS-TIME MARKOV DECISION PROCESSES

In this section, we describe the uniformization process in CTMDPs. We start with the simpler case, where the transition rates are uniform, and then extend the results to the more general setting, where the transition rates are state- and action-dependent.

Uniform Transition Rates

Consider an infinite-horizon discounted CTMDP with the reward (cost) function

$$\lim_{n\to\infty} E_s\!\left[\int_0^{t_n} e^{-\alpha t}\, g(s(t), a(t))\, dt\right],$$

where $t_n$ represents the time of the $n$th transition, $n = 1, 2, \ldots$; $\alpha > 0$ is the continuous-time discount rate; and $g(s(t), a(t))$ is the reward obtained when action $a(t)$ is selected in state $s(t)$. If we let $s_n$ and $a_n$ denote the state and the action selected at time $t_n$, respectively, then $s(t) = s_n$ and $a(t) = a_n$ hold for $t_n \le t < t_{n+1}$. Suppose $g(s(t), a(t))$ consists of two parts: $K(s(t), a(t))$, the lump reward obtained when a new state-action pair is observed, and $C(s(t), a(t))$, the continuous reward accrued while the state is $s(t)$ and action $a(t)$ was taken at the last decision epoch. The state of a CTMDP does not change between decision epochs; therefore, the value of a given policy $\pi$ for a CTMDP, $v^\pi_\alpha$, that is, the total expected discounted reward over the infinite horizon for $\pi$, is calculated as follows:

$$v^\pi_\alpha = E^\pi_s\!\left[\sum_{n=0}^{\infty} e^{-\alpha t_n}\!\left(K(s_n, a_n) + \int_{t_n}^{t_{n+1}} e^{-\alpha(t - t_n)}\, C(s_n, a_n)\, dt\right)\right]. \qquad (3)$$

Let $\tau_{n+1} = t_{n+1} - t_n$ (with $\tau_0 = t_0 = 0$) denote the time the process remains in $s_n$, which is exponentially distributed with parameter $\beta$ for all states. Then, we can rewrite Equation (3) as follows:

$$\begin{aligned}
v^\pi_\alpha &= E^\pi_s\!\left[\sum_{n=0}^{\infty} e^{-\alpha t_n} K(s_n, a_n)\right] + E^\pi_s\!\left[\sum_{n=0}^{\infty} e^{-\alpha t_n} C(s_n, a_n) \int_0^{\tau_{n+1}} e^{-\alpha t}\, dt\right]\\
&= E^\pi_s\!\left[\sum_{n=0}^{\infty} e^{-\alpha(\tau_1 + \cdots + \tau_n)} K(s_n, a_n)\right] + E^\pi_s\!\left[\sum_{n=0}^{\infty} e^{-\alpha(\tau_1 + \cdots + \tau_n)}\, \frac{1}{\alpha}\big(1 - e^{-\alpha\tau_{n+1}}\big)\, C(s_n, a_n)\right]\\
&= \sum_{n=0}^{\infty} \big(E^\pi_s\big[e^{-\alpha\tau_1}\big]\big)^n E^\pi_s\big[K(s_n, a_n)\big] + \sum_{n=0}^{\infty} \big(E^\pi_s\big[e^{-\alpha\tau_1}\big]\big)^n E^\pi_s\big[C(s_n, a_n)\big]\, \frac{1}{\alpha}\big(1 - E^\pi_s\big[e^{-\alpha\tau_1}\big]\big), \qquad (4)
\end{aligned}$$

where Equation (4) follows from the assumption that $\tau_1, \tau_2, \ldots, \tau_{n+1}$ are independent and identically distributed exponential random variables and that $s_n$ is independent of $\tau_1, \tau_2, \ldots, \tau_{n+1}$. Evaluating the expectation of the exponential,

$$E^\pi_s\big[e^{-\alpha\tau_1}\big] = \int_0^{\infty} e^{-\alpha t}\, \beta e^{-\beta t}\, dt = \frac{\beta}{\alpha + \beta} =: \lambda.$$
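
As a small numerical aside (not part of the article), the identity $E[e^{-\alpha\tau}] = \beta/(\alpha+\beta)$ for an exponential sojourn time $\tau$ is easy to confirm by simulation; the values of $\alpha$ and $\beta$ below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.1, 4.0            # illustrative discount rate and transition rate

tau = rng.exponential(scale=1.0 / beta, size=1_000_000)   # sojourn times ~ Exp(beta)
mc_estimate = np.exp(-alpha * tau).mean()                  # Monte Carlo E[e^{-alpha*tau}]
closed_form = beta / (alpha + beta)                        # the discount factor lambda

print(f"Monte Carlo: {mc_estimate:.5f}   beta/(alpha+beta): {closed_form:.5f}")
```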

Substituting $\lambda$ into Equation (4), we obtain

$$v^\pi_\alpha = E^\pi_s\!\left[\sum_{n=0}^{\infty} \lambda^n \left(K(s_n, a_n) + \frac{C(s_n, a_n)}{\alpha + \beta}\right)\right] = E^\pi_s\!\left[\sum_{n=0}^{\infty} \lambda^n\, r(s_n, a_n)\right],$$

which has the same form as the value of an equivalent DTMDP if we redefine $r(s_n, a_n) = K(s_n, a_n) + C(s_n, a_n)/(\alpha + \beta)$ as the total expected discounted reward between two decision epochs for the pair $(s_n, a_n)$. Note that this was achieved because the sojourn times in each state are assumed to be independent and identically distributed exponential random variables.

To summarize, a CTMDP with the reward function

$$\lim_{n\to\infty} E_s\!\left[\int_0^{t_n} e^{-\alpha t}\, g(s(t), a(t))\, dt\right]$$

and a transition rate $\beta$ for all states and actions is equivalent to a DTMDP with discount factor $\lambda = \beta/(\alpha + \beta)$, in which the total expected discounted reward between two decision epochs is given by

$$r(s, a) = K(s, a) + \frac{C(s, a)}{\alpha + \beta}, \qquad (5)$$

where the reward functions $K$ and $C$ are defined as above. Let $P(j \mid s, a)$ represent the probability that the state at the next decision epoch will be $j$, given that the state is $s$ and action $a$ is taken at the current decision epoch. Then, the Bellman equations can be rewritten as

$$v(s) = \max_{a \in A_s}\left\{ r(s, a) + \lambda \sum_{j} P(j \mid s, a)\, v(j)\right\}$$

and solved as a DTMDP, where $A_s$ and $v(j)$ represent the set of available actions in state $s$ and the optimal total expected discounted reward that can be obtained when the process starts in state $j$, respectively.

Nonuniform Transition Rates

A major limiting assumption behind the above result is that the transition rates are identical across all states and actions. In this section, we show that by allowing fictitious transitions from a state to itself, as in the previous section, we can extend the results for CTMDPs with uniform transition rates to those with nonuniform transition rates. Let $\beta(s, a)$ denote the transition rate out of state $s$ when action $a$ is taken, and let $\beta$ be a uniform transition rate satisfying $\beta(s, a) \le \beta$ for all $s \in S$ and $a \in A_s$, where $A_s$ represents the action space for state $s$. We can then modify the transition probabilities as

$$\tilde P(j \mid s, a) = \begin{cases} 1 - \dfrac{\beta(s, a)}{\beta}, & j = s,\\[4pt] \dfrac{\beta(s, a)}{\beta}\, P(j \mid s, a), & j \ne s. \end{cases} \qquad (6)$$

By creating fictitious transitions, we create a stochastically equivalent process in which transitions occur more often. For example, when the process is in state $s$, it leaves $s$ at the faster rate $\beta$ but returns to the same state with probability $1 - \beta(s, a)/\beta$. Probabilistically, the new process moves to a different state at the same rate as the original one. As a result of uniformizing the nonidentical transition rates, we can use the results for CTMDPs with uniform transition rates. To summarize, we can analyze a CTMDP with exponential transition rates $\beta(s, a)$, transition probabilities $P(j \mid s, a)$, and reward function

$$\lim_{N\to\infty} E_s\!\left[\int_0^{t_N} e^{-\alpha t}\, g(s(t), a(t))\, dt\right]$$

by converting it into an equivalent DTMDP with discount factor $\lambda = \beta/(\alpha + \beta)$, where $\beta(s, a) \le \beta$ for all $s \in S$, $a \in A_s$, and the transition probabilities are given by Equation (6). The total expected discounted reward between two decision epochs is given by Equation (5).
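
The conversion summarized above is mechanical enough to state as a short routine. The sketch below is a minimal illustration of Equations (5) and (6) under our own assumptions about the data layout and parameter names (beta_sa, P, K, C, alpha); it is not code from the article.

```python
import numpy as np

def uniformize_ctmdp(beta_sa, P, K, C, alpha):
    """Convert a CTMDP into an equivalent DTMDP via uniformization.

    beta_sa : (S, A) array of transition rates beta(s, a)
    P       : (S, A, S) array, P[s, a, j] = P(j | s, a), with P[s, a, s] = 0
    K, C    : (S, A) arrays of lump and continuous rewards
    alpha   : continuous-time discount rate (> 0)
    """
    beta = beta_sa.max()                      # uniform rate, beta(s, a) <= beta
    lam = beta / (alpha + beta)               # discount factor of the equivalent DTMDP

    # Equation (6): fictitious self-transitions.
    P_tilde = (beta_sa / beta)[:, :, None] * P
    s_idx = np.arange(P.shape[0])
    P_tilde[s_idx, :, s_idx] = 1.0 - beta_sa / beta

    # Equation (5): total expected discounted reward between decision epochs.
    r = K + C / (alpha + beta)
    return lam, P_tilde, r
```

Any standard DTMDP algorithm can then be applied to the returned triple (lam, P_tilde, r); a value iteration sketch along these lines appears after the second example below.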

The optimality equation is then written as

$$v(s) = \max_{a \in A_s}\left\{ K(s, a) + \frac{C(s, a)}{\alpha + \beta} + \frac{\beta}{\alpha + \beta} \sum_{j} \tilde P(j \mid s, a)\, v(j)\right\} \qquad (7)$$

$$\;\;= \max_{a \in A_s}\left\{ r(s, a) + \lambda \sum_{j} \tilde P(j \mid s, a)\, v(j)\right\} \qquad (8)$$

and can be analyzed as a DTMDP. It can easily be shown [1] that, after a few simple algebraic manipulations, Equations (7) and (8) can also be written as

$$v(s) = \frac{1}{\alpha + \beta} \max_{a \in A_s}\left\{ (\alpha + \beta)\, K(s, a) + C(s, a) + \big(\beta - \beta(s, a)\big)\, v(s) + \beta(s, a) \sum_{j} P(j \mid s, a)\, v(j)\right\}.$$

The optimality equations given by Equation (8) provide a compact form that is very similar to the conventional optimality equations for DTMDPs and are, therefore, easier to comprehend.

EXAMPLES

In this section, we present two simple examples from queueing systems to illustrate the use of uniformization in continuous-time Markov models. More examples of the application of uniformization to CTMDPs are available elsewhere [2,10,11].

Meeting the Professor. Students come in randomly during Professor Smith's office hours and, on some occasions, find the professor busy with other students, in which case they leave and return later. The interarrival times of the students are independent and identically distributed exponential random variables with rate $\omega$, and it takes an exponential amount of time with rate $\mu$ for Professor Smith to finish with a student. A student arrives at the office and finds the professor busy with another student. We compute the probability that the professor will be available if the student comes back at time $t$. We model the process as a birth-and-death process, where states $0$ and $1$ represent the professor being available and busy with another student, respectively. We could solve a set of differential equations to calculate the probability in question; instead, we solve this problem using uniformization. Note that this problem is essentially an M/M/1/1 queue; the reader can refer to Ref. 9 for an analysis of the model that derives the probability in question. The process has the parameters $\beta_0 = \omega$, $\beta_1 = \mu$, and $P_{01} = P_{10} = 1$. By defining $\beta = \omega + \mu$, we can uniformize the CTMC to obtain

$$\tilde P_{00} = \frac{\mu}{\omega + \mu} = 1 - \tilde P_{01}, \qquad \tilde P_{10} = \frac{\mu}{\omega + \mu} = 1 - \tilde P_{11}.$$

This creates a new transition matrix with identical entries within each column:

$$\tilde P = \begin{pmatrix} \dfrac{\mu}{\omega + \mu} & \dfrac{\omega}{\omega + \mu}\\[6pt] \dfrac{\mu}{\omega + \mu} & \dfrac{\omega}{\omega + \mu} \end{pmatrix} = \tilde P^n, \qquad n = 1, 2, \ldots$$

Hence, using the uniformization result for CTMCs,

$$P_{11}(t) = \sum_{n=0}^{\infty} \tilde P^n_{11}\, e^{-(\omega + \mu)t}\, \frac{((\omega + \mu)t)^n}{n!} = e^{-(\omega + \mu)t} + \frac{\omega}{\omega + \mu} \sum_{n=1}^{\infty} e^{-(\omega + \mu)t}\, \frac{((\omega + \mu)t)^n}{n!} = e^{-(\omega + \mu)t} + \frac{\omega}{\omega + \mu}\big(1 - e^{-(\omega + \mu)t}\big) = \frac{\omega}{\omega + \mu} + \frac{\mu}{\omega + \mu}\, e^{-(\omega + \mu)t}.$$

The required probability is then

$$P_{10}(t) = 1 - P_{11}(t) = \frac{\mu}{\omega + \mu}\big(1 - e^{-(\omega + \mu)t}\big).$$
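
As a quick numerical sanity check of this example, one can truncate the uniformization series for $P_{10}(t)$ and compare it with the closed form just derived; the arrival rate, service rate, and time point below are illustrative choices of ours.

```python
import numpy as np
from math import exp, factorial

omega, mu, t = 3.0, 2.0, 0.5        # illustrative arrival rate, service rate, and time t
beta = omega + mu

# Uniformized transition matrix of the M/M/1/1 example (states: 0 = available, 1 = busy).
P_tilde = np.array([[mu / beta, omega / beta],
                    [mu / beta, omega / beta]])

# P_10(t) from the truncated series sum_n (P~^n)_10 e^{-beta t} (beta t)^n / n!
series = sum(np.linalg.matrix_power(P_tilde, n)[1, 0]
             * exp(-beta * t) * (beta * t) ** n / factorial(n)
             for n in range(60))

closed_form = (mu / beta) * (1.0 - exp(-beta * t))
print(f"series: {series:.6f}   closed form: {closed_form:.6f}")
```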

Professor's Dilemma. Consider a slightly modified version of the above example, in which we model the professor's decision about how fast he should answer a student's questions. Suppose the professor has only three chairs in the office, and students coming to office hours enter only if there is a vacant seat. Every time a student comes in, the professor sets his pace in answering questions so that he can be fair to all who are waiting. His pace, expressed through the time he expects to spend with a student, is exponentially distributed with a mean that lies in the interval $[1/\bar\mu, 1/\underline\mu]$; equivalently, he chooses a service rate $\mu \in [\underline\mu, \bar\mu]$. Each time a student comes in, the professor accrues a reward of $U(s, \mu)$, the immediate utility he obtains when the total number of students in the office is $s$ and he chooses pace $\mu$. The utility depends on the number of people waiting as well as on the pace, which reflects the quality of the time he expects to spend with the students. In addition, he accrues utility at rate $u(s, \mu)$ while he is answering the student's questions, continuously discounted at rate $\alpha$, reflecting the fact that the more time he spends with one student, the less he can spend with others.

We now write the optimality equations for the professor's pace decision problem. We define the state space $S = \{0, 1, 2, 3\}$, representing the number of students in the office. The transition rates are

$$\beta(s, \mu) = \begin{cases} \omega, & s = 0,\\ \omega + \mu, & s = 1, 2,\\ \mu, & s = 3. \end{cases}$$

The maximum possible transition rate is $\beta = \omega + \bar\mu$. The new transition probability matrix for a given $\mu$ is

$$\tilde P = \begin{pmatrix} 1 - \dfrac{\omega}{\beta} & \dfrac{\omega}{\beta} & 0 & 0\\[4pt] \dfrac{\mu}{\beta} & 1 - \dfrac{\omega + \mu}{\beta} & \dfrac{\omega}{\beta} & 0\\[4pt] 0 & \dfrac{\mu}{\beta} & 1 - \dfrac{\omega + \mu}{\beta} & \dfrac{\omega}{\beta}\\[4pt] 0 & 0 & \dfrac{\mu}{\beta} & 1 - \dfrac{\mu}{\beta} \end{pmatrix},$$

where the $(s, j)$ entry of $\tilde P$ represents $\tilde P(j \mid s, a = \mu)$. Uniformizing the decision process using $\beta$ and the above transition matrix, followed by an application of Equation (7), leads to the optimality equations

$$v(s) = \max_{\mu \in [\underline\mu, \bar\mu]}\left\{ U(s, \mu) + \frac{u(s, \mu)}{\alpha + \beta} + \frac{\beta}{\alpha + \beta} \sum_{j} \tilde P(j \mid s, \mu)\, v(j)\right\}.$$

As discussed in the previous section, these equations can be converted to the form of Equation (8) and therefore solved using conventional DTMDP solution techniques such as value iteration, policy iteration, or linear programming (see Total Expected Discounted Reward MDPs: Value Iteration Algorithm, Total Expected Discounted Reward MDPs: Policy Iteration Algorithm, and Linear Programming Formulations of MDPs).
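
To indicate how these optimality equations might be solved in practice, the sketch below runs value iteration on a discretized version of the pace decision. The rates, the forms of $U$ and $u$, the grid of candidate paces, and the convergence tolerance are all illustrative assumptions of ours; only the uniformized birth-death transition structure and the form of Equation (7) come from the article.

```python
import numpy as np

# Illustrative data: arrival rate, pace bounds, discount rate, discretized pace choices.
omega, mu_lo, mu_hi, alpha = 1.0, 0.5, 3.0, 0.1
paces = np.linspace(mu_lo, mu_hi, 11)          # candidate values of mu in [mu_lo, mu_hi]
states = range(4)                               # number of students in the office
beta = omega + mu_hi                            # uniform rate beta = omega + mu_hi
lam = beta / (alpha + beta)                     # DTMDP discount factor

# Illustrative utilities: a lump reward U(s, mu) and a continuous rate u(s, mu).
U = lambda s, mu: 2.0 * np.sqrt(s) / mu if s > 0 else 0.0
u = lambda s, mu: -0.5 * s * mu

def uniformized_row(s, mu):
    """Row of P~( . | s, mu) for the four-state birth-death chain, per Eq. (6)."""
    row = np.zeros(4)
    if s < 3:
        row[s + 1] += omega / beta              # arrival
    if s > 0:
        row[s - 1] += mu / beta                 # service completion
    row[s] += 1.0 - row.sum()                   # fictitious self-transition
    return row

v = np.zeros(4)
for _ in range(500):                            # value iteration on Eq. (7)
    v_new = np.array([
        max(U(s, mu) + u(s, mu) / (alpha + beta)
            + lam * uniformized_row(s, mu) @ v for mu in paces)
        for s in states
    ])
    if np.max(np.abs(v_new - v)) < 1e-9:
        break
    v = v_new

print(np.round(v, 4))
```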

Acknowledgments

This article was supported in part by National Science Foundation grant CMMI. The authors thank Jeffrey Kharoufeh and two anonymous referees for their suggestions and insights, which improved this manuscript.

REFERENCES

1. Heyman DP, Sobel MJ. Stochastic models. New York: Elsevier Science Publications.
2. Puterman ML. Markov decision processes: discrete stochastic dynamic programming. New York: John Wiley & Sons, Inc.
3. Cinlar E. Introduction to stochastic processes. Englewood Cliffs (NJ): Prentice Hall.
4. Howard R. Dynamic programming and Markov processes. Cambridge (MA): MIT Press.
5. Jensen A. Markov chains as an aid in the study of Markov processes. Skand Aktuarietidskr 1953;34(3).
6. Lippman SA. Applying a new device in the optimization of exponential queuing systems. Oper Res 1975;23(4).
7. Veinott AF. Discrete dynamic programming with sensitive discount optimality. Ann Math Stat 1969;40.
8. Serfozo R. An equivalence between discrete and continuous time Markov decision processes. Oper Res 1979;27.
9. Ross SM. Introduction to probability models. New York: Academic Press.
10. Bertsekas DP. Dynamic programming and stochastic control, Vols. 1 and 2. Belmont (MA): Athena Scientific.
11. Walrand J. An introduction to queueing networks. Englewood Cliffs (NJ): Prentice Hall; 1988.


Abstract: Continuous-time Markov decision processes (CTMDPs) may be viewed as a special case of semi-Markov decision processes (SMDPs) in which the intertransition times are exponentially distributed and the decision maker is allowed to choose actions whenever the system state changes. When the transition rates are identical for each state-action pair, one can convert a CTMDP into an equivalent discrete-time Markov decision process (DTMDP), which is easier to analyze and solve. In this article, we describe uniformization, which uses fictitious transitions from a state to itself and hence enables the conversion of a CTMDP with nonidentical transition rates into an equivalent DTMDP. We first demonstrate the use of uniformization in converting a continuous-time Markov chain into an equivalent discrete-time Markov chain, and then describe how it is used in the context of CTMDPs with the discounted reward criterion. We also present examples of the use of uniformization in continuous-time Markov models.

Keywords: MDP; DTMDP; CTMDP; discounted reward; uniformization
