On Optimization of the Total Present Value of Profits under Semi-Markov Conditions


Katehakis, Michael N.
Rutgers Business School, Department of MSIS
180 University, Newark, New Jersey 07102, U.S.A.

Abstract: In this paper we survey theory related to the optimization of semi-Markov processes and we apply these techniques to a simple dynamic ferry dispatch problem, in which customers arrive at a ferry according to a Poisson process with rate λ > 0. We formulate this dispatch problem as a two-action semi-Markov decision process and we illustrate computationally that an optimal dispatch policy is characterized by a single critical number x_0 such that it is optimal to wait until there are at least x_0 customers on the ferry before a departure occurs.

Key Words: Semi-Markov Optimization, Dynamic Scheduling, Markov Chains.

1 Introduction

In this paper we survey theory related to the optimization of semi-Markov processes and we apply these techniques to a simple ferry dispatch problem, in which customers arrive at a ferry according to a Poisson process with rate λ > 0. We formulate this dispatch problem as a two-action semi-Markov decision process and we illustrate computationally that an optimal dispatch policy is characterized by a single critical number x_0 such that it is optimal to wait until there are at least x_0 customers on the ferry before a departure occurs.

For related work in this area of dynamic scheduling we refer the reader to Ungureanu et al. [3] through [6]. Further related work can be found in Zhao and Katehakis [9] and Zhou et al. [10].

The paper is organized as follows. In Section 2 we survey the concepts of future and present rewards under continuous discounting. In Section 3 we present the main tools for the optimization of a system that can be modelled as a semi-Markov process; here we follow Derman [1] and Ross [2]. In Section 4 we present the ferry dispatch model and compute optimal dispatch policies.

2 Future and Present Values of Rewards.

2.1 Future Values with Compounding.

With compounding, accumulated interest is added back to the principal, so that from that moment on interest is earned on interest. For example, a loan with a $100 principal and a monthly interest rate of 1% that has its interest compounded every month would have a balance of $101 at the end of the first month, $102.01 at the end of the second month, and so on.

Let t denote the total time in years and n the number of compounding periods per year; note that the total number of compounding periods (for example, a period may be a month) in t years is nt. Let ρ be the nominal annual interest rate, expressed as a decimal (e.g., 12% = 0.12). Then ρ/n is the per-period interest rate, and the future value at time t of an initial capital R_0 is:

    R_{cn}(t) = R_0 (1 + ρ/n)^{nt}.

(If simple interest is used instead, the future value is R_s(t) = R_0 (1 + tρ).)
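As a quick numerical check of R_{cn}(t), the following Python sketch (our own illustration; the parameter values are not from the paper) computes the future value for increasing compounding frequencies and compares it with the continuous limit R_0 e^{ρt} discussed next:

    import math

    def future_value(R0, rho, n, t):
        """Future value of capital R0 after t years, with nominal annual
        rate rho compounded n times per year: R0 * (1 + rho/n)**(n*t)."""
        return R0 * (1.0 + rho / n) ** (n * t)

    # Example: $100 at a 12% nominal annual rate for 2 years.
    R0, rho, t = 100.0, 0.12, 2.0
    for n in (1, 12, 365, 10_000):
        print(n, round(future_value(R0, rho, n, t), 4))
    print("continuous limit:", round(R0 * math.exp(rho * t), 4))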

Note that when the compounding frequency is annual, n = 1 and R_{c1}(t) = R_0 (1 + ρ)^t. Since the principal R_0 is a constant coefficient, it is often dropped for simplicity, and the resulting accumulation function is used in interest theory instead. The accumulation function, b_{cn}(t), is:

    b_{cn}(t) = (1 + ρ/n)^{nt}.

As n increases, b_{cn}(t) approaches an upper limit of e^{ρt}, cf. Figure 1. This limit e^{ρt} is called the continuous compounding factor at rate ρ.

Figure 1: Convergence of b_{cn}(t) to e^{ρt} as n → ∞.

2.2 Present Values with Compounding.

With discounting, accumulated interest is subtracted from the balance. For example, a loan with a monthly interest rate of 1% that has its interest compounded every month and a balance of $102.01 at the end of the second month would have had a balance of $101 at the end of the first month, and a present value of $100 at the start of the time horizon.

As before, let t denote the total time in years and n the number of compounding periods per year, and let ρ be the nominal annual interest rate, expressed as a decimal (e.g., 12% = 0.12). Then ρ/n is the per-period interest rate, and under compounded discounting the present value at time t of a capital R_0 is:

    R_{cn}(t) = R_0 d_{cn}(t),

where the discount function d_{cn}(t) is:

    d_{cn}(t) = (1 - ρ/n)^{nt}.

As n increases, the discount function d_{cn}(t) approaches an upper limit of e^{-ρt}. This function e^{-ρt} is called the continuous discounting factor at rate ρ.
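A companion sketch (again with illustrative values of our own choosing) shows d_{cn}(t) increasing toward the continuous discounting factor e^{-ρt}:

    import math

    def discount_factor(rho, n, t):
        """The text's discount function: d_cn(t) = (1 - rho/n)**(n*t)."""
        return (1.0 - rho / n) ** (n * t)

    rho, t = 0.12, 2.0
    for n in (1, 12, 365, 10_000):
        print(n, round(discount_factor(rho, n, t), 6))
    print("continuous limit:", round(math.exp(-rho * t), 6))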

3 Optimization in Semi-Markov Processes.

In this section we survey the most important features of a sequential decision process for which the times between transitions are random. Such a process is observed at time 0 and its state is classified as one element of the set X = {0, 1, 2, ...}. If the process has just entered state x at time 0, an action a from a set of available actions A_x must be chosen. Then, as a result of this state-action pair (x, a), the following events unfold:

(i) The time spent in state x (the sojourn time in state x), conditional on the event that the next state visited when the process leaves state x is state y, is a random variable S_{xa} with probability distribution F_{xy,a}.

(ii) The probability that the next state is state y is p_{xy,a}, where Σ_{y∈X} p_{xy,a} = 1.

After the process leaves state x and enters a new state y, a new action a′ from the set of allowable actions in state y, A_y, must be chosen, and steps (i) and (ii) above are repeated ad infinitum.

It is further supposed that there is a reward structure associated with the states visited and actions chosen. If action a is chosen when the process enters state x, then: (i) an immediate reward R(x, a) is earned; (ii) additional rewards accumulate at a rate r(x, a) per unit of time the process stays in state x. Thus, when the process stays in state x for t units of time, the total reward associated with the state-action pair (x, a) is R(x, a) + t r(x, a), and its present value under continuous discounting is equal to:

    R(x, a; t) = R(x, a) + ∫_0^t e^{-ρs} r(x, a) ds.

Remark. When the transition times are identically one, the above is just a Markov decision process; in the general case, it is called a semi-Markov decision process. We also note that if a stationary policy is employed, then the process {X(t), t ≥ 0} is a semi-Markov process, where X(t) represents the state of the process at time t.

To avoid trivialities, we make the following assumption.

Assumption I. (i) The reward functions R(x, a) and r(x, a) are bounded. (ii) There exist constants δ > 0 and ε > 0 such that

    P(S_{xa} > δ) ≥ ε, for all (x, a).    (1)

Note that P(S_{xa} > δ) = Σ_{y∈X} p_{xy,a} F̄_{xy,a}(δ), where we use the notation F̄_{xy,a} = 1 - F_{xy,a}. Thus, Assumption I(ii) states that for every state x and action a there is a positive probability of at least ε that the sojourn time in state x will be greater than δ. Hence, an infinite number of transitions cannot occur in a finite interval.

3.1 Optimization of Present Values.

We assume that rewards are continuously discounted, and the objective is to maximize the expected total present value of a stream of rewards. Note that a reward R received at time t has equivalent present value (at time 0) equal to Re^{-ρt}. Most of these results are well known, cf. [1] and [2], and the theorems will be stated without proof.
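The discounted sojourn reward R(x, a; t) introduced above integrates to the closed form R(x, a) + r(x, a)(1 - e^{-ρt})/ρ. As a minimal sanity check (illustrative values of our own), a numerical integration agrees with this closed form:

    import math

    def sojourn_reward_pv(R, r, rho, t, steps=100_000):
        """Present value R + ∫_0^t e^{-ρs} r ds, via a left Riemann sum."""
        ds = t / steps
        return R + sum(math.exp(-rho * i * ds) * r * ds for i in range(steps))

    R, r, rho, t = 5.0, 2.0, 0.1, 3.0
    closed_form = R + r * (1.0 - math.exp(-rho * t)) / rho
    print(sojourn_reward_pv(R, r, rho, t), closed_form)  # both ≈ 10.1836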

Let L_S(ρ) denote the Laplace transform of a random variable S, i.e.,

    L_S(ρ) = E[e^{-ρS}].

Notice that

    L_{S_{xa}}(ρ) = Σ_{y∈X} p_{xy}(a) ∫_0^∞ e^{-ρt} dF_{xy,a}(t).    (2)

We also define

    r(x, a; S_{xa}) = ∫_0^{S_{xa}} r(x, a) e^{-ρs} ds.    (3)

Using Eqs. (2) and (3) above, we obtain the following expression for the expected discounted reward R̄(x, a) during the sojourn time S_{xa} in state x when action a is taken:

    R̄(x, a) = R(x, a) + r̄(x, a),    (4)

where

    r̄(x, a) = E[r(x, a; S_{xa})] = r(x, a)(1 - E[e^{-ρS_{xa}}])/ρ = r(x, a)(1 - L_{S_{xa}}(ρ))/ρ.    (5)

Let X_n and A_n be, respectively, the n-th state of the process and the n-th action chosen, n = 0, 1, 2, .... Now, for any deterministic policy π (i.e., a rule for choosing actions as a function of the past observations of states and times) and ρ > 0, the expected total discounted reward over an infinite horizon when policy π is employed, w_{ρ,π}(x), is equal to:

    w_{ρ,π}(x) = E_π( Σ_{n=0}^∞ e^{-ρ Σ_{ν=0}^{n-1} S_{X_ν A_ν}} R̄(X_n, A_n) | X_0 = x )
               = R̄(x, π(x)) + L_{S_{xπ(x)}}(ρ) Σ_{y∈X} p_{xy}(π(x)) w_{ρ,π}(y),    (6)

where the second equality holds when π is stationary. The value function is defined as follows:

    v_ρ(x) = sup_π { w_{ρ,π}(x) }.    (7)

A policy π is optimal if

    w_{ρ,π}(x) = v_ρ(x), for all x ∈ X.    (8)

The following classic theorems are used to specify the optimal value function and establish the existence of a simple optimal policy.

Theorem 1. Under Assumption I, the value function v_ρ(x) is the unique solution to the following system of equations:

    v_ρ(x) = max_{a∈A(x)} { R̄(x, a) + L_{S_{xa}}(ρ) Σ_{y∈X} p_{xy}(a) v_ρ(y) }.    (9)
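Before turning to policies, a small sketch (our own illustration) of how (4)-(5) are evaluated in practice: when the sojourn time is exponential with rate λ, as in the ferry model of Section 4, the Laplace transform in (2) is λ/(λ + ρ), so R̄(x, a) is available in closed form:

    def laplace_exponential(lam, rho):
        """Laplace transform E[e^{-ρS}] of an exponential(λ) sojourn time."""
        return lam / (lam + rho)

    def one_step_reward(R, r, lam, rho):
        """R̄(x,a) = R(x,a) + r(x,a)(1 - L_S(ρ))/ρ, per Eqs. (4)-(5)."""
        L = laplace_exponential(lam, rho)
        return R + r * (1.0 - L) / rho

    # Illustrative values: lump reward 5, reward rate 2, λ = 1, ρ = 0.1.
    print(one_step_reward(R=5.0, r=2.0, lam=1.0, rho=0.1))  # ≈ 6.8182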

Further, if we define a policy π_0 that for every x ∈ X chooses the action π_0(x) given by:

    π_0(x) = argmax_{a∈A(x)} { R̄(x, a) + L_{S_{xa}}(ρ) Σ_{y∈X} p_{xy}(a) v_ρ(y) },    (10)

then we have the following theorem.

Theorem 2. The policy π_0 is optimal, i.e.,

    w_{ρ,π_0}(x) = v_ρ(x), for all x ∈ X.    (11)

Theorem 3. Under Assumption I, the iterates v^ν_ρ(x) produced by (12) below converge to the value function v_ρ(x) as ν → ∞:

    v^ν_ρ(x) = max_{a∈A(x)} { R̄(x, a) + L_{S_{xa}}(ρ) Σ_{y∈X} p_{xy}(a) v^{ν-1}_ρ(y) },    (12)

for arbitrary initial values v^0_ρ(x).

4 An Application: The Optimal Ferry Dispatch Problem.

Suppose that customers arrive at a ferry according to a Poisson process with rate λ > 0. At any time t, the decision maker (the captain) may depart at a cost of K + tk units, where K is a fixed cost and k is a cost proportional to the delay from the nominal departure time. Suppose also that there is a revenue of R(x) if the ferry picks up all x customers, where R(x) is a bounded, increasing, nonnegative function. The process is assumed to go on indefinitely, and the problem is to select a policy which maximizes the total expected discounted profit for the ferry.

This problem can be formulated as a two-action semi-Markov decision process with state space X = {1, 2, ..., S}, where state x means that there are x customers currently on board and S is the capacity of the ferry. Let a_1 denote the action "depart" and let a_0 denote the action "wait". We assume that the process repeats without delay. The parameters of the problem are:

1. Under action a_1: p_{x1}(a_1) = 1, F_{x1,a_1}(t) = 1 - e^{-λt}, L_{S_{xa_1}}(ρ) = λ/(λ + ρ), and R̄(x, a_1) = R(x) - K.

2. Under action a_0: p_{x,x+1}(a_0) = 1, F_{x,x+1,a_0}(t) = 1 - e^{-λt}, L_{S_{xa_0}}(ρ) = λ/(λ + ρ), and R̄(x, a_0) = -k̄, where, since the delay cost accrues at rate k while waiting, (5) gives k̄ = k(1 - λ/(λ + ρ))/ρ = k/(λ + ρ).

3. Also, A(x) = {a_1, a_0} for x = 1, ..., S - 1, and A(S) = {a_1}.

Thus Bellman's optimality conditions of Theorem 1 for x < S are:

    v_ρ(x) = max{ T_1(x, a_1), T_0(x, a_0) },    (13)

where

    T_1(x, a_1) = R(x) - K + (λ/(λ + ρ)) v_ρ(1)

and

    T_0(x, a_0) = -k̄ + (λ/(λ + ρ)) v_ρ(x + 1),

and for x = S they are:

    v_ρ(S) = T_1(S, a_1) = R(S) - K + (λ/(λ + ρ)) v_ρ(1).    (14)

It follows that it is optimal to depart when there are x customers present if

    R(x) - K + (λ/(λ + ρ)) v_ρ(1) > -k̄ + (λ/(λ + ρ)) v_ρ(x + 1).

Using the value iteration of Theorem 3, we have done indicative computations using the values λ = 1, ρ = 0.1, K = 20, k = 0.1, S = 40, and R(x) = rx, where r = 1.5. In Figure 2 we plot v^ν_ρ(25) and v^ν_ρ(30) versus ν in order to illustrate the convergence of v^ν_ρ(x) to v_ρ(x) as ν → ∞.

Figure 2: Convergence of v^ν_ρ(x) to v_ρ(x) as ν → ∞.

In Figure 3 we illustrate the form of the optimal policy, where we observe that there exists a fixed critical constant x_0 such that π_0(x) = a_0 (wait) for x < x_0 and π_0(x) = a_1 (depart) for x ≥ x_0.

Figure 3: Optimal actions π_0(x).
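Computations of this kind can be reproduced with a short program. The sketch below is our own implementation of the value iteration (12) applied to (13)-(14) with the paper's parameter values; the tolerance, iteration cap, and variable names are our choices, and the extraction of x_0 assumes the threshold structure reported above:

    # Value iteration (Eq. 12) for the ferry dispatch model (Eqs. 13-14).
    # Paper's parameters: λ = 1, ρ = 0.1, K = 20, k = 0.1, S = 40, R(x) = 1.5x.
    lam, rho, K, k, S, r = 1.0, 0.1, 20.0, 0.1, 40, 1.5

    beta = lam / (lam + rho)      # L_{S_xa}(ρ) for exponential(λ) sojourns
    k_bar = k / (lam + rho)       # discounted waiting cost k̄ per sojourn

    def revenue(x):
        return r * x              # R(x) = r x

    v = [0.0] * (S + 1)           # v[x] for x = 1..S; index 0 is unused
    for _ in range(2000):         # iterate (12) until numerical convergence
        w = [0.0] * (S + 1)
        for x in range(1, S):     # x < S: max of depart (T1) and wait (T0)
            depart = revenue(x) - K + beta * v[1]
            wait = -k_bar + beta * v[x + 1]
            w[x] = max(depart, wait)
        w[S] = revenue(S) - K + beta * v[1]   # x = S: must depart, Eq. (14)
        done = max(abs(w[x] - v[x]) for x in range(1, S + 1)) < 1e-10
        v = w
        if done:
            break

    # Recover π0 of Eq. (10): depart as soon as T1 dominates T0.
    policy = ["depart" if revenue(x) - K + beta * v[1] >=
              -k_bar + beta * v[x + 1] else "wait" for x in range(1, S)]
    policy.append("depart")       # state S admits only a_1
    x0 = 1 + policy.index("depart")
    print("critical number x0 =", x0)
    print("v(25) =", round(v[25], 4), " v(30) =", round(v[30], 4))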

References:

[1] Derman, C. (1970). Finite State Markovian Decision Processes. Academic Press.

[2] Ross, S. M. (1970). Applied Probability Models with Optimization Applications. Holden-Day, San Francisco, CA.

[3] Ungureanu, V., Melamed, B., Katehakis, M. N. and Bradford, P. G. (2006). Deferred Assignment Scheduling in Cluster-based Servers. Cluster Computing 9(1).

[4] Ungureanu, V., Melamed, B., Katehakis, M. N. and Bradford, P. G. (2006). Class-Dependent Assignment in Cluster-based Servers. SAC 2004.

[5] Ungureanu, V., Melamed, B. and Katehakis, M. N. (2004). The LC Assignment Policy for Cluster-Based Servers. NCA 2004.

[6] Ungureanu, V., Melamed, B. and Katehakis, M. N. (2004). Performance Comparison of Assignment Policies on Cluster-based E-Commerce Servers. WSEAS Transactions. Also in Proceedings of the International Conference on Software Engineering, Parallel and Distributed Systems, February 13-15, 2004, Salzburg, Austria.

[7] Ungureanu, V., Melamed, B. and Katehakis, M. N. (2003). Towards an Efficient Cluster-Based E-Commerce Server. CLUSTER 2003.

[8] Veinott, A. F. (1966). On the optimality of (s, S) inventory policies: new conditions and a new proof. SIAM J. Appl. Math.

[9] Zhao, Y. and Katehakis, M. N. (2006). On the structure of optimal ordering policies for stochastic inventory systems with minimum order quantity. Probability in the Engineering and Informational Sciences 20, 257-270.

[10] Zhou, B., Katehakis, M. N. and Zhao, Y. (2007). Effective control policies for stochastic inventory systems with minimum order quantity and linear costs. International Journal of Production Economics 106(2).
