
UNIFORMIZATION IN MARKOV DECISION PROCESSES

OGUZHAN ALAGOZ, MEHMET U.S. AYVACI
Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin

Most Markov decision process (MDP) models consider problems with decisions occurring at discrete time points. On the other hand, there are several real-life applications, particularly in queuing systems, in which the decision maker chooses actions at random times over a continuous-time interval. Such problems can be modeled using continuous-time models. Semi-Markov decision processes (SMDPs) (see Semi-Markov Decision Processes), a class of continuous-time models, generalize discrete-time Markov decision processes (DTMDPs) by allowing state changes to occur randomly over continuous time and by letting or requiring decisions to be taken whenever the system state changes [1,2]. In SMDPs, the stochastic process defined by the state transitions follows a discrete-time Markov chain, while the time between transitions is drawn from a general distribution, independent of the transitions [1,3]. Continuous-time Markov decision processes (CTMDPs) constitute a special type of SMDP in which the transition times between decisions are exponentially distributed and actions are taken at every transition [2]. Uniformization is a tool used to convert a CTMDP into an equivalent DTMDP. Although uniformization has been used to analyze continuous-time Markov processes for a long time [4-7], Serfozo [8] formalized the use of uniformization in the context of countable-state CTMDPs.

In this article, we describe uniformization in CTMDPs. Although we consider CTMDPs with stationary transition probabilities and reward functions, bounded reward functions, and finite state and action spaces, the results can easily be extended to CTMDPs with countable state and action spaces, as well as to more general spaces under appropriate measurability conditions [8]. While we focus on uniformization only for infinite-horizon CTMDPs with the total expected discounted reward criterion, uniformization can also be utilized in CTMDPs with the average reward criterion [2]. Assuming a unichain transition probability matrix for every stationary policy, the transformation and modeling scheme for solving CTMDPs with the average reward criterion (see Average Reward MDPs: Solution Techniques) is identical to the one described in this article, except for the computation of the reward function. The results apply to multichain cases as well, with slight modifications. More information on uniformization in CTMDPs with the average reward criterion is available elsewhere [2].

The remainder of this article is organized as follows. We first summarize how uniformization is used to convert a continuous-time Markov chain (CTMC) into an equivalent discrete-time Markov chain. Then, we describe the use of uniformization in CTMDPs. Finally, we present two examples of uniformization.

UNIFORMIZATION IN CONTINUOUS-TIME MARKOV CHAINS

CTMCs are formally defined as follows [9] (see the section titled Continuous-Time Markov Chains (CTMCs) in this encyclopedia): a continuous-time stochastic process {X(t), t ≥ 0} is a CTMC if, for all s, t ≥ 0 and nonnegative integers i, j, x(u), 0 ≤ u < t,

P{X(t + s) = j | X(t) = i, X(u) = x(u), 0 ≤ u < t} = P{X(t + s) = j | X(t) = i}.

That is, a CTMC is a stochastic process with the Markovian property: the conditional distribution of the future state X(t + s) is independent of the past states X(u), 0 ≤ u < t, and depends only on the current state X(t) (see Definition and Examples of CTMCs).

Consider a CTMC in which the time to make a transition from the current state to a different state is exponentially distributed with rate β for all states. Let P_ij(t) denote the probability of being in state j at time t starting from state i at time 0. Note that the number of transitions by time t, {N(t), t ≥ 0}, is a Poisson process with rate β [9]. Therefore, P_ij(t) can be recast by conditioning on the number of transitions by time t as follows:

P_{ij}(t) = P\{X(t) = j \mid X(0) = i\}
= \sum_{n=0}^{\infty} P\{X(t) = j \mid X(0) = i, N(t) = n\}\, P\{N(t) = n \mid X(0) = i\}
= \sum_{n=0}^{\infty} P\{X(t) = j \mid X(0) = i, N(t) = n\}\, e^{-\beta t}\frac{(\beta t)^{n}}{n!}
= \sum_{n=0}^{\infty} P^{n}_{ij}\, e^{-\beta t}\frac{(\beta t)^{n}}{n!},    (1)

where P^n_{ij} represents the n-step stationary transition probability of an equivalent discrete-time Markov chain with transition probabilities P_ij; that is,

P\{X(t) = j \mid X(0) = i, N(t) = n\} = P^{n}_{ij}.    (2)

Equation (1) follows from the assumption that the time spent in every state is exponentially distributed with rate β. More specifically, given that n transitions have occurred by time t, the probability of being in state j is simply the probability of moving from i to j in n steps, because the number of transitions by time t gives no information about which states were visited when all sojourn times have the same distribution. Therefore, Equation (2) can be applied only if all states have identical sojourn time distributions.

In order to convert a CTMC with different transition rates into a discrete-time Markov chain, we use uniformization. Suppose the mean sojourn time in state i is 1/β_i and there exists a finite constant β such that β_i ≤ β for all i. Under the new scheme, we assign the same transition rate β to every state i and divide the transition process into two parts: fictitious transitions from a state to itself and transitions to other states. To match the actual process, we let the process remain in each state for an exponential amount of time with rate β and define the new transition probabilities as

\tilde{P}_{ij} = \begin{cases} 1 - \dfrac{\beta_i}{\beta}, & j = i, \\[4pt] \dfrac{\beta_i}{\beta}\, P_{ij}, & j \neq i. \end{cases}

Applying the new transition probabilities in Equation (1), we obtain

P_{ij}(t) = \sum_{n=0}^{\infty} \big(\tilde{P}^{\,n}\big)_{ij}\, e^{-\beta t}\frac{(\beta t)^{n}}{n!}.

Figure 1 shows the schematic of a simple uniformization example.

[Figure 1. An illustrative example of the uniformization of a CTMC through the use of fictitious self-transitions: a two-state chain with out-of-state rates ν(0) = β_0, ν(1) = β_1 and jump probabilities P_01 = P_10 = 1 becomes an equivalent chain with uniform rate ν(0) = ν(1) = β = β_0 + β_1 and transition probabilities P̃_01 = β_0/β, P̃_00 = β_1/β, P̃_10 = β_1/β, P̃_11 = β_0/β.]

In summary, uniformization enables us to convert a CTMC with state-dependent out-of-state transition rates into an analytically equivalent CTMC with uniform transition rates. This new system can be treated as a discrete-time Markov chain for the purposes of analysis [9].
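This construction is straightforward to check numerically. The following Python sketch is purely illustrative and not part of the original article: it assumes a two-state CTMC with made-up rates b0 and b1, builds the uniformized matrix P̃, and compares a truncated version of the series in Equation (1) against the matrix exponential of the generator.

```python
import numpy as np
from scipy.linalg import expm

# Two-state CTMC: state 0 leaves at rate b0, state 1 leaves at rate b1 (assumed values).
b0, b1 = 2.0, 5.0
beta = b0 + b1                      # uniformization rate, beta >= beta_i for all i

# Original jump probabilities (no self-transitions in the original chain).
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
rates = np.array([b0, b1])

# Uniformized chain: fictitious self-transitions with probability 1 - beta_i / beta.
P_tilde = (rates[:, None] / beta) * P + np.diag(1.0 - rates / beta)

# Generator of the CTMC: Q[i, j] = beta_i * P[i, j] for j != i, Q[i, i] = -beta_i.
Q = rates[:, None] * P - np.diag(rates)

t = 0.3
N_TERMS = 200                       # truncation level for the Poisson-weighted series
series = np.zeros_like(P)
poisson_weight = np.exp(-beta * t)  # weight for n = 0
P_power = np.eye(2)                 # P_tilde ** 0
for n in range(N_TERMS):
    series += poisson_weight * P_power
    P_power = P_power @ P_tilde
    poisson_weight *= beta * t / (n + 1)

print(np.allclose(series, expm(Q * t)))   # True: both recover P_ij(t)
```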

UNIFORMIZATION IN CONTINUOUS-TIME MARKOV DECISION PROCESSES

In this section, we describe the uniformization process in CTMDPs. We start with the simpler case, where the transition rates are uniform, and then extend this to the more general form, where the transition rates are state- and action-dependent.

Uniform Transition Rates

Consider an infinite-horizon discounted CTMDP with the following reward (cost) function:

\lim_{n \to \infty} E_{s}\left[\int_{0}^{t_n} e^{-\alpha t} g(s(t), a(t))\, dt\right],

where t_n represents the time of the nth transition, n = 1, 2, ...; α > 0 is the continuous-time discount rate; and g(s(t), a(t)) is the reward obtained when action a(t) is selected in state s(t). If we let s_n and a_n denote the state and the action selected at time t_n, respectively, then s(t) = s_n and a(t) = a_n for t_n ≤ t < t_{n+1}. Suppose g(s(t), a(t)) consists of two parts: K(s(t), a(t)), the lump reward obtained when a new state and action pair is observed, and C(s(t), a(t)), the continuous reward accumulated while the state is s(t) and action a(t) was taken at the last decision epoch. The state of a CTMDP does not change between decision epochs; therefore, the value of a given policy π for a CTMDP, v^π_α, that is, the total expected discounted reward over the infinite horizon under π, is calculated as follows:

v^{\pi}_{\alpha} = E^{\pi}_{s}\left[\sum_{n=0}^{\infty} e^{-\alpha t_n}\left(K(s_n, a_n) + \int_{t_n}^{t_{n+1}} e^{-\alpha(t - t_n)} C(s_n, a_n)\, dt\right)\right].    (3)

Let τ_{n+1} = t_{n+1} - t_n (with τ_0 = t_0 = 0) denote the time the process remains in s_n, which is exponentially distributed with parameter β for all states. Then, we can rewrite Equation (3) as follows:

v^{\pi}_{\alpha} = E^{\pi}_{s}\left[\sum_{n=0}^{\infty} e^{-\alpha t_n} K(s_n, a_n)\right] + E^{\pi}_{s}\left[\sum_{n=0}^{\infty} e^{-\alpha t_n} C(s_n, a_n) \int_{0}^{\tau_{n+1}} e^{-\alpha t}\, dt\right]
= E^{\pi}_{s}\left[\sum_{n=0}^{\infty} e^{-\alpha(\tau_1 + \cdots + \tau_n)} K(s_n, a_n)\right] + E^{\pi}_{s}\left[\sum_{n=0}^{\infty} e^{-\alpha(\tau_1 + \cdots + \tau_n)} \frac{1}{\alpha}\left(1 - e^{-\alpha\tau_{n+1}}\right) C(s_n, a_n)\right]
= \sum_{n=0}^{\infty} E^{\pi}_{s}\left[K(s_n, a_n)\right]\left(E^{\pi}_{s}\left[e^{-\alpha\tau_1}\right]\right)^{n} + \sum_{n=0}^{\infty} E^{\pi}_{s}\left[C(s_n, a_n)\right]\frac{1}{\alpha}\left(E^{\pi}_{s}\left[e^{-\alpha\tau_1}\right]\right)^{n}\left(1 - E^{\pi}_{s}\left[e^{-\alpha\tau_1}\right]\right),    (4)

where Equation (4) follows from the assumption that τ_1, τ_2, ..., τ_{n+1} are independent and identically distributed exponential random variables and that s_n is independent of τ_1, τ_2, ..., τ_{n+1}. Evaluating the expectation of the exponential,

E^{\pi}_{s}\left[e^{-\alpha\tau_1}\right] = \int_{0}^{\infty} e^{-\alpha t}\, \beta e^{-\beta t}\, dt = \frac{\beta}{\alpha + \beta} := \lambda.
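As an aside, the integral above is easy to verify symbolically; the snippet below is only an illustrative check (it assumes sympy is available) and is not part of the original article.

```python
import sympy as sp

alpha, beta, t = sp.symbols('alpha beta t', positive=True)

# E[e^{-alpha * tau}] for tau ~ Exp(beta): integrate e^{-alpha t} * beta e^{-beta t} over [0, oo)
lam = sp.integrate(sp.exp(-alpha * t) * beta * sp.exp(-beta * t), (t, 0, sp.oo))
print(sp.simplify(lam))   # beta/(alpha + beta), the discrete discount factor lambda
```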

Rewriting Equation (4) with λ substituted, we obtain

v^{\pi}_{\alpha} = E^{\pi}_{s}\left[\sum_{n=0}^{\infty} \lambda^{n}\left(K(s_n, a_n) + \frac{C(s_n, a_n)}{\alpha + \beta}\right)\right] = E^{\pi}_{s}\left[\sum_{n=0}^{\infty} \lambda^{n} r(s_n, a_n)\right],

which has the same form as the value of an equivalent DTMDP if we redefine r(s_n, a_n) = K(s_n, a_n) + C(s_n, a_n)/(α + β) as the total expected discounted reward between two decision epochs for the pair (s_n, a_n). Note that this is achieved because the sojourn times in all states are assumed to be independent and identically distributed exponential random variables.

To summarize, a CTMDP with the reward function

\lim_{n \to \infty} E_{s}\left[\int_{0}^{t_n} e^{-\alpha t} g(s(t), a(t))\, dt\right]

and a transition rate β for all states and actions is equivalent to a DTMDP with discount factor λ = β/(α + β), and the total expected discounted reward between two decision epochs is given by

r(s, a) = K(s, a) + \frac{C(s, a)}{\alpha + \beta},    (5)

where the reward functions K and C are defined as above. Let P(j | s, a) represent the probability that the state at the next decision epoch will be j, given that the state is s and action a is taken at the last decision epoch. Then, the Bellman equations can be rewritten as

v(s) = \max_{a \in A_s}\left\{ r(s, a) + \lambda \sum_{j} P(j \mid s, a)\, v(j) \right\}

and solved as a DTMDP, where A_s and v(j) represent the set of available actions in state s and the optimal total expected discounted reward that can be obtained when the process starts in state j, respectively.

Nonuniform Transition Rates

A major limiting assumption in the above result is that of identical transition rates across all states and actions. In this section, we show that, by allowing fictitious transitions from a state to itself as in the previous section, we can extend the results for CTMDPs with uniform transition rates to those with nonuniform transition rates. Let β(s, a) denote the transition rate out of state s when action a is taken, and let β represent a uniform transition rate satisfying β(s, a) ≤ β for all s ∈ S and a ∈ A_s, where A_s represents the action space for state s. We can then modify the transition probabilities as

\tilde{P}(j \mid s, a) = \begin{cases} 1 - \dfrac{\beta(s, a)}{\beta}, & j = s; \\[4pt] \dfrac{\beta(s, a)}{\beta}\, P(j \mid s, a), & j \neq s. \end{cases}    (6)

By creating fictitious transitions, we create a stochastically equivalent process in which transitions occur more often. For example, when the process is in state s, it leaves s at the faster rate β but returns to the same state with probability 1 - β(s, a)/β. Probabilistically, the new process moves to another state at the same rate as the original one. As a result of the uniformization of the nonidentical transition rates, we can use the results for CTMDPs with uniform transition rates.

To summarize, we can analyze a CTMDP with exponential transition rates β(s, a), transition probabilities P(j | s, a), and a reward function of

\lim_{N \to \infty} E_{s}\left[\int_{0}^{t_N} e^{-\alpha t} g(s(t), a(t))\, dt\right]

by converting it into an equivalent DTMDP with discount factor λ = β/(α + β), where β(s, a) ≤ β for all s ∈ S and a ∈ A_s, and the transition probabilities are given by Equation (6). The total expected discounted reward between two decision epochs is given by Equation (5).
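To make the conversion recipe concrete, the following sketch (an illustration only; the state and action sets, rates β(s, a), and reward data are invented for the example) uniformizes a small CTMDP via Equation (6), forms r(s, a) as in Equation (5), and solves the equivalent DTMDP by value iteration.

```python
import numpy as np

# A small CTMDP with |S| = 3 states and |A| = 2 actions (all data below are assumed).
S, A = 3, 2
alpha = 0.1                                     # continuous-time discount rate
rate = np.array([[1.0, 2.0],                    # beta(s, a): rate out of state s under action a
                 [3.0, 1.5],
                 [2.0, 4.0]])
K = np.random.default_rng(0).uniform(0, 1, (S, A))   # lump reward at a decision epoch
C = np.random.default_rng(1).uniform(0, 1, (S, A))   # continuous reward rate

# P[s, a, j]: original transition probabilities (no self-transitions here).
P = np.zeros((S, A, S))
for s in range(S):
    for a in range(A):
        others = [j for j in range(S) if j != s]
        P[s, a, others] = 1.0 / len(others)

beta = rate.max()                               # uniformization constant, beta >= beta(s, a)
lam = beta / (alpha + beta)                     # discount factor of the equivalent DTMDP

# Equation (6): uniformized transition probabilities with fictitious self-transitions.
P_tilde = (rate[:, :, None] / beta) * P
P_tilde[np.arange(S), :, np.arange(S)] += 1.0 - rate / beta

# Equation (5): expected discounted reward between two decision epochs.
r = K + C / (alpha + beta)

# Value iteration on the equivalent DTMDP.
v = np.zeros(S)
for _ in range(10_000):
    q = r + lam * np.einsum('saj,j->sa', P_tilde, v)
    v_new = q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
print("optimal values:", v, "optimal actions:", q.argmax(axis=1))
```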

The optimality equation is then written as

v(s) = \max_{a \in A_s}\left\{ K(s, a) + \frac{C(s, a)}{\alpha + \beta} + \frac{\beta}{\alpha + \beta} \sum_{j} \tilde{P}(j \mid s, a)\, v(j) \right\}    (7)
     = \max_{a \in A_s}\left\{ r(s, a) + \lambda \sum_{j} \tilde{P}(j \mid s, a)\, v(j) \right\}    (8)

and can be analyzed as a DTMDP. It can easily be shown [1] that, after several simple algebraic manipulations, Equations (7) and (8) can also be written as

v(s) = \frac{1}{\alpha + \beta} \max_{a \in A_s}\left\{ (\alpha + \beta)\, K(s, a) + C(s, a) + \big(\beta - \beta(s, a)\big) v(s) + \beta(s, a) \sum_{j} P(j \mid s, a)\, v(j) \right\}.

The optimality equations given by Equation (8) provide a compact form that is very similar to the conventional optimality equations for DTMDPs and are therefore easier to comprehend.

EXAMPLES

In this section, we present two simple examples from queuing systems to illustrate the use of uniformization in continuous-time Markov models. More examples of the application of uniformization to CTMDPs are available elsewhere [2,10,11].

Meeting the Professor. Students come in randomly during Professor Smith's office hours and, on some occasions, they find the professor busy with other students, in which case they leave and return later. The interarrival times of the students are independent and identically distributed exponential random variables with rate ω, and it takes an exponential amount of time with rate μ for Professor Smith to finish with a student. A student arrives at the office and finds the professor busy with another student. We compute the probability that the professor will be available if the student comes back at time t.

We model the process as a birth and death process, where states 0 and 1 represent the professor being available and busy with another student, respectively. We could solve a set of differential equations to calculate the probability in question; however, we will solve this problem using uniformization. Note that this problem is essentially an M/M/1/1 queue. The reader can refer to Ref. 9 for an analysis of the model that derives the probability in question. The process has the following parameters: β_0 = ω, β_1 = μ, and P_01 = P_10 = 1. By defining β = ω + μ, we can uniformize the CTMC to obtain

\tilde{P}_{00} = \frac{\mu}{\omega + \mu} = 1 - \tilde{P}_{01}, \qquad \tilde{P}_{10} = \frac{\mu}{\omega + \mu} = 1 - \tilde{P}_{11}.

This creates a new transition matrix with identical entries in each column:

\tilde{P} = \begin{pmatrix} \dfrac{\mu}{\omega + \mu} & \dfrac{\omega}{\omega + \mu} \\[6pt] \dfrac{\mu}{\omega + \mu} & \dfrac{\omega}{\omega + \mu} \end{pmatrix}, \qquad \tilde{P}^{n}_{ij} = \tilde{P}_{ij}, \quad i, j \in \{0, 1\},\ n = 1, 2, \ldots

Hence, using the uniformization formula for CTMCs,

P_{11}(t) = \sum_{n=0}^{\infty} \tilde{P}^{n}_{11}\, e^{-(\omega + \mu)t}\frac{((\omega + \mu)t)^{n}}{n!}
= e^{-(\omega + \mu)t} + \frac{\omega}{\omega + \mu} \sum_{n=1}^{\infty} e^{-(\omega + \mu)t}\frac{((\omega + \mu)t)^{n}}{n!}
= e^{-(\omega + \mu)t} + \frac{\omega}{\omega + \mu}\left(1 - e^{-(\omega + \mu)t}\right)
= \frac{\omega}{\omega + \mu} + \frac{\mu}{\omega + \mu}\, e^{-(\omega + \mu)t}.

The required probability is then

P_{10}(t) = 1 - P_{11}(t) = \frac{\mu}{\omega + \mu}\left(1 - e^{-(\omega + \mu)t}\right).
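The closed-form answer can be cross-checked numerically; the short snippet below (illustrative only, with arbitrary values assumed for ω, μ, and t) compares it with the transition probability obtained from the matrix exponential of the chain's generator.

```python
import numpy as np
from scipy.linalg import expm

omega, mu, t = 1.5, 4.0, 0.7     # assumed values for illustration

# Generator of the two-state chain: state 0 = available, 1 = busy.
Q = np.array([[-omega, omega],
              [mu, -mu]])
P_t = expm(Q * t)

closed_form = mu / (omega + mu) * (1 - np.exp(-(omega + mu) * t))
print(np.isclose(P_t[1, 0], closed_form))   # True: probability busy -> available by time t
```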

Professor's Dilemma. Consider a slightly modified version of the above example. Namely, we now model the professor's decision on how fast he should answer a student's questions. Suppose the professor has only three chairs in the office, and students coming to the office hours get in only if there is a vacant seat. Every time a student comes in, the professor sets his pace in answering questions so that he can be fair to all who are waiting. His pace, in terms of the time he expects to spend with a student, is exponentially distributed with a mean that lies in the interval [1/\overline{\mu}, 1/\underline{\mu}]. Each time a student comes in, the professor accrues a reward of U(s, μ), the immediate utility he gets when the total number of students in the office is s and he chooses his pace as μ. The utility depends on the number of people waiting as well as on the pace, which reflects the quality of time he expects to spend with the students. In addition, he accrues a utility of u(s, μ) while he is answering the students' questions, and this is continuously discounted at rate α, reflecting the fact that the more time he spends with one student, the less he can spend with others. We will write the optimality equations for the professor's pace decision problem.

We define the state space S = {0, 1, 2, 3}, representing the number of students in the office. The transition rates are

\beta(s, \mu) = \begin{cases} \omega, & s = 0; \\ \omega + \mu, & s = 1, 2; \\ \mu, & s = 3. \end{cases}

The maximum possible transition rate is β = ω + \overline{\mu}. The new transition probability matrix for a given μ is

\tilde{P} = \frac{1}{\omega + \overline{\mu}} \begin{pmatrix} \overline{\mu} & \omega & 0 & 0 \\ \mu & \overline{\mu} - \mu & \omega & 0 \\ 0 & \mu & \overline{\mu} - \mu & \omega \\ 0 & 0 & \mu & \omega + \overline{\mu} - \mu \end{pmatrix},

where the (s, j) entry of \tilde{P} represents \tilde{P}(j | s, a = μ). Uniformizing the decision process with β and the above transition matrix, followed by an application of Equation (7), leads to the optimality equations

v(s) = \max_{\mu \in [\underline{\mu}, \overline{\mu}]} \left\{ U(s, \mu) + \frac{u(s, \mu)}{\alpha + \beta} + \frac{\beta}{\alpha + \beta} \sum_{j} \tilde{P}(j \mid s, \mu)\, v(j) \right\}.

As discussed in the previous section, these equations can be converted to the form of Equation (8) and can therefore be solved using conventional DTMDP solution techniques such as value iteration, policy iteration, or linear programming (see Total Expected Discounted Reward MDPs: Value Iteration Algorithm, Total Expected Discounted Reward MDPs: Policy Iteration Algorithm, and Linear Programming Formulations of MDPs).
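In practice, these optimality equations could be solved by discretizing the pace set and running value iteration on the uniformized process. The sketch below is only an illustration: the utility functions U and u, the parameter values, and the discretization are all assumptions made for this example, not part of the article's model.

```python
import numpy as np

omega = 1.0                       # student arrival rate (assumed)
mu_lo, mu_hi = 0.5, 3.0           # admissible range of the professor's pace (assumed)
alpha = 0.2                       # continuous-time discount rate (assumed)
paces = np.linspace(mu_lo, mu_hi, 26)   # discretized action set
S = [0, 1, 2, 3]                  # number of students in the office
beta = omega + mu_hi              # uniformization constant
lam = beta / (alpha + beta)

def U(s, mu):                     # lump utility when a decision is made (assumed form)
    return 2.0 - 0.3 * s - 0.2 * mu

def u(s, mu):                     # continuous utility rate (assumed form, independent of mu here)
    return -0.5 * s

def transition_row(s, mu):
    """Uniformized transition probabilities P_tilde(. | s, mu)."""
    row = np.zeros(4)
    arr = omega if s < 3 else 0.0          # arrivals blocked when all chairs are taken
    dep = mu if s > 0 else 0.0             # no departure from an empty office
    if s < 3:
        row[s + 1] = arr / beta
    if s > 0:
        row[s - 1] = dep / beta
    row[s] = 1.0 - (arr + dep) / beta      # fictitious self-transition
    return row

v = np.zeros(4)
for _ in range(5000):
    v_new = np.empty(4)
    for s in S:
        best = -np.inf
        for mu in paces:
            val = U(s, mu) + u(s, mu) / (alpha + beta) \
                  + lam * transition_row(s, mu) @ v
            best = max(best, val)
        v_new[s] = best
    if np.max(np.abs(v_new - v)) < 1e-9:
        v = v_new
        break
    v = v_new
print("value function:", v)
```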

Acknowledgments

This article was supported in part by National Science Foundation grant CMMI-794. The authors thank Jeffrey Kharoufeh and two anonymous referees for their suggestions and insights, which improved this manuscript.

REFERENCES

1. Heyman DP, Sobel MJ. Stochastic models. New York: Elsevier Science Publications; 1990.
2. Puterman ML. Markov decision processes: discrete stochastic dynamic programming. New York: John Wiley & Sons, Inc.; 1994.
3. Cinlar E. Introduction to stochastic processes. Englewood Cliffs (NJ): Prentice Hall; 1975.
4. Howard R. Dynamic programming and Markov processes. Cambridge (MA): MIT Press; 1960.
5. Jensen A. Markov chains as an aid in the study of Markov processes. Skand Aktuarietidskr 1953;34(3):87-91.
6. Lippman SA. Applying a new device in the optimization of exponential queuing systems. Oper Res 1975;23(4):687-710.
7. Veinott AF. Discrete dynamic programming with sensitive discount optimality. Ann Math Stat 1969;40:1635-1660.
8. Serfozo R. An equivalence between discrete and continuous time Markov decision processes. Oper Res 1979;27:616-620.
9. Ross SM. Introduction to probability models. New York: Academic Press; 2007.
10. Bertsekas DP. Dynamic programming and stochastic control, Volumes 1 and 2. Belmont (MA): Athena Scientific; 21.
11. Walrand J. An introduction to queueing networks. Englewood Cliffs (NJ): Prentice Hall; 1988.


Abstract: Continuous-time Markov decision processes (CTMDPs) may be viewed as a special case of semi-Markov decision processes (SMDPs) in which the intertransition times are exponentially distributed and the decision maker is allowed to choose actions whenever the system state changes. When the transition rates are identical for each state and action pair, one can convert a CTMDP into an equivalent discrete-time Markov decision process (DTMDP), which is easier to analyze and solve. In this article, we describe uniformization, which uses fictitious transitions from a state to itself and hence enables the conversion of a CTMDP with nonidentical transition rates into an equivalent DTMDP. We first demonstrate the use of uniformization in converting a continuous-time Markov chain into an equivalent discrete-time Markov chain, and then describe how it is used in the context of CTMDPs with the discounted reward criterion. We also present examples of the use of uniformization in continuous-time Markov models.

Keywords: MDP; DTMDP; CTMDP; discounted reward; uniformization