On Optimization of the Total Present Value of Profits under Semi Markov Conditions
Katehakis, Michael N.
Rutgers Business School, Department of MSIS
180 University, Newark, New Jersey 07102, U.S.A.

Abstract: In this paper we survey theory related to the optimization of semi-Markov processes and apply these techniques to a simple dynamic ferry dispatch problem, in which customers arrive at a ferry according to a Poisson process with rate λ > 0. We formulate this dispatch problem as a two-action semi-Markov decision process and illustrate computationally that an optimal dispatch policy is characterized by a single critical number x_0 such that it is optimal to wait until there are at least x_0 customers on the ferry before a departure occurs.

Key Words: Semi-Markov Optimization, Dynamic Scheduling, Markov Chains.

1 Introduction

In this paper we survey theory related to the optimization of semi-Markov processes and apply these techniques to a simple ferry dispatch problem, in which customers arrive at a ferry according to a Poisson process with rate λ > 0. We formulate this dispatch problem as a two-action semi-Markov decision process and illustrate computationally that an optimal dispatch policy is characterized by a single critical number x_0 such that it is optimal to wait until there are at least x_0 customers on the ferry before a departure occurs.

For related work in this area of dynamic scheduling we refer the reader to Ungureanu et al. [3] through Ungureanu et al. [6]. Further related work can be found in Zhao and Katehakis (2006) and Zhou and Katehakis (2008).

The paper is organized as follows. In Section 2 we survey the concepts of future and present rewards under continuous discounting. In Section 3 we present the main tools for the optimization of a system that can be modelled as a semi-Markov process; herein we follow Derman (1970) and Ross (1970).
In Section 4 we present the model for the ferry dispatch problem and present computationally optimal dispatch policies.

2 Future and Present Values of Rewards.

2.1 Future Values with Compounding.

With compounding, accumulated interest is added back to the principal, so that interest is earned on interest from that moment on. For example, a loan with $100 principal and a monthly interest rate of 1% that has its interest compounded every month would have a balance of $101 at the end of the first month, $102.01 at the end of the second month, etc.

Let t denote the total time in years and n the number of compounding periods per year; note that the total number of compounding periods (for example, a period is a month) in t years is nt. Let ρ be the nominal annual interest rate, expressed as a decimal, e.g., 12% = 0.12. Then ρ/n is the per-period interest rate, and the future value at time t of an initial capital R_0 is as follows (footnote 1: sometimes simple interest is used, in which case the future value is R_s(t) = R_0(1 + tρ)).
R_cn(t) = R_0 (1 + ρ/n)^{nt}.

Note that when the compounding frequency is annual, n is 1 and R_c1(t) = R_0 (1 + ρ)^t. Since the principal R_0 is a coefficient, it is often dropped for simplicity, and the resulting accumulation function is used in interest theory instead. The accumulation function b_cn(t) is:

b_cn(t) = (1 + ρ/n)^{nt}.

As n increases, b_cn(t) approaches an upper limit of e^{ρt}, cf. Figure 1. This limit is called the continuous compounding factor at rate ρ.

Figure 1: Convergence of b_cn(t) to e^{ρt} as n → ∞.

2.2 Present Values with Compounding.

With compound discounting, accumulated interest is subtracted from the balance; so, for example, a loan with a monthly interest rate of 1% that has its interest compounded every month and a balance of $102.01 at the end of the second month would have a balance of $101 at the end of the first month, and a present value of $100 at the start of the time horizon. As before, let t denote the total time in years, n the number of compounding periods per year, and let ρ be the nominal annual interest rate, expressed as a decimal, e.g., 12% = 0.12. Then ρ/n is the per-period interest rate, and under compound discounting the present value at time t of a capital R_0 is:

R_cn(t) = R_0 d_cn(t),

where the discount function d_cn(t) is:

d_cn(t) = (1 − ρ/n)^{nt}.

As n increases, the discount function d_cn(t) approaches an upper limit of e^{−ρt}. The function e^{−ρt} is called the continuous discounting factor at rate ρ.
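The convergence of b_cn(t) to e^{ρt} and of d_cn(t) to e^{−ρt} can be checked numerically. The following is a minimal sketch (the function names are ours, not from the paper):

```python
import math

def future_factor(rho: float, n: int, t: float) -> float:
    """Accumulation factor b_cn(t) = (1 + rho/n)**(n*t)."""
    return (1.0 + rho / n) ** (n * t)

def discount_factor(rho: float, n: int, t: float) -> float:
    """Discount function d_cn(t) = (1 - rho/n)**(n*t)."""
    return (1.0 - rho / n) ** (n * t)

rho, t = 0.12, 1.0
for n in (1, 12, 365, 10_000):
    print(n, future_factor(rho, n, t), discount_factor(rho, n, t))

# The limits are the continuous compounding and discounting factors:
print(math.exp(rho * t), math.exp(-rho * t))
```

For ρ = 0.12 and t = 1 the factors move from 1.12 and 0.88 at n = 1 toward e^{0.12} and e^{−0.12} as n grows.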
3 Optimization in Semi-Markov Processes.

In this section we survey the most important features of a sequential decision process for which the times between transitions are random. Such a process is observed at time 0 and its state is classified as an element of the set X = {0, 1, 2, ...}. If the process has just entered state x at time 0, an action a from a set of available actions A_x must be chosen. Then, as a result of this state-action pair (x, a), the following events unfold:

(i) The time spent in state x (the sojourn time in state x), conditional on the event that the next state visited when the process leaves state x is state y, is a random variable S_{xa} with probability distribution F_{xy,a}.

(ii) The probability that the next state is state y is p_{xy,a}, where ∑_{y∈X} p_{xy,a} = 1.

After the process leaves state x and upon entering a new state y, a new action a from the set of allowable actions in state y, A_y, must be chosen, and steps (i) and (ii) above are repeated ad infinitum. It is further supposed that there is a reward structure associated with the states visited and actions chosen. If action a is chosen when the process enters state x, then:

(i) an immediate reward R(x, a) is earned;

(ii) additional rewards accumulate at a rate r(x, a) per unit of time that the process stays in state x.

Thus, the total reward associated with the state-action pair (x, a) when the process stays in state x for t units of time is given by R(x, a) + t r(x, a), and its present value under continuous discounting is equal to:

R(x, a; t) = R(x, a) + ∫_0^t e^{−ρs} r(x, a) ds.

Remark. When the transition times are identically one, the above is just a Markov decision process; in the general case, it is called a semi-Markov decision process. We also note that if a stationary policy is employed, then the process {X(t), t ≥ 0} is a semi-Markov process, where X(t) represents the state of the process at time t. To avoid trivialities, we will make the following assumption.

Assumption I.
(i) The reward functions R(x, a) and r(x, a) are bounded.

(ii) There exist constants δ > 0 and ε > 0 such that

P(S_{xa} > δ) ≥ ε, for all (x, a). (1)

Note that P(S_{xa} > δ) = ∑_{y∈X} p_{xy,a} F̄_{xy,a}(δ), where we use the notation F̄_{xy,a} = 1 − F_{xy,a}. Thus, Assumption I(ii) states that for every state x and action a there is a positive probability of at least ε that the sojourn time in state x will be greater than δ. Hence, an infinite number of transitions cannot occur in a finite interval.

3.1 Optimization of Present Values.

We assume that rewards are continuously discounted, and the objective is to maximize the expected total present value of a stream of rewards. Note that a reward R received at time t has equivalent present value (at time 0) equal to Re^{−ρt}. Most of these results are well known, cf. [1] and [2], and the theorems will be stated without proof. Let L_S(ρ) denote the Laplace transform of a random variable S, i.e.,
L_S(ρ) = E e^{−ρS}.

Notice that

L_{S_{xa}}(ρ) = ∑_{y∈X} p_{xy}(a) ∫_0^∞ e^{−ρt} dF_{xy,a}(t). (2)

We also define

r(x, a; S_{xa}) = ∫_0^{S_{xa}} r(x, a) e^{−ρs} ds. (3)

Using Eqs. (2) and (3) above, we obtain the following expression for the expected discounted reward R̄(x, a) during the sojourn time S_{xa} in state x when action a is taken:

R̄(x, a) = R(x, a) + r̄(x, a), (4)

where

r̄(x, a) = E r(x, a; S_{xa}) = r(x, a)(1 − E e^{−ρ S_{xa}})/ρ = r(x, a)(1 − L_{S_{xa}}(ρ))/ρ. (5)

Let X_n and A_n be, respectively, the n-th state of the process and the n-th action chosen, n = 1, 2, .... Now, for any deterministic policy π (i.e., a rule for choosing actions as a function of the past observations of states and times) and ρ > 0, the expected total discounted reward over an infinite horizon, w_{ρ,π}(x), when policy π is employed is equal to:

w_{ρ,π}(x) = E_π ( ∑_{n=0}^∞ e^{−ρ ∑_{ν=0}^{n−1} S_{X_ν A_ν}} R̄(X_n, A_n) | X_0 = x )
          = R̄(x, π(x)) + L_{S_{x π(x)}}(ρ) ∑_{y∈X} p_{xy}(π(x)) w_{ρ,π}(y). (6)

The value function is defined as follows:

v_ρ(x) = sup_π { w_{ρ,π}(x) }. (7)

A policy π is optimal if

w_{ρ,π}(x) = v_ρ(x), for all x ∈ X. (8)

The following classic theorems are used to specify the optimal value function and the existence of a simple optimal policy.

Theorem 1 Under Assumption I, the value function v_ρ(x) is the unique solution of the following system of equations:

v_ρ(x) = max_{a∈A(x)} { R̄(x, a) + L_{S_{xa}}(ρ) ∑_{y∈X} p_{xy}(a) v_ρ(y) }. (9)
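Since equation (9) is a fixed-point equation, it can be solved numerically by successive approximation. The following is a minimal sketch on a hypothetical two-state, two-action instance with exponential sojourn times, so that L_{S_{xa}}(ρ) = λ_{xa}/(λ_{xa} + ρ); all numerical values are illustrative and not taken from the paper:

```python
# Successive approximation for equation (9) on a small hypothetical instance.
rho = 0.1

# (state, action) -> (one-step discounted reward Rbar, sojourn rate lam,
#                     transition probabilities {next state: prob})
model = {
    (0, "a"): (1.0, 1.0, {0: 0.5, 1: 0.5}),
    (0, "b"): (0.5, 2.0, {1: 1.0}),
    (1, "a"): (2.0, 1.0, {0: 1.0}),
}

v = {0: 0.0, 1: 0.0}
for _ in range(500):
    # One Bellman backup: v(x) <- max_a { Rbar + L(rho) * sum_y p(y) v(y) }
    v = {
        x: max(
            Rbar + lam / (lam + rho) * sum(p * v[y] for y, p in trans.items())
            for (s, _a), (Rbar, lam, trans) in model.items()
            if s == x
        )
        for x in v
    }

print(v)  # approximate solution of equation (9)
```

Because L_{S_{xa}}(ρ) < 1 under Assumption I, the backup operator is a contraction and the iterates converge geometrically to the unique fixed point.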
Further, if we define a policy π_0 such that for all x ∈ X it chooses the action π_0(x) defined below:

π_0(x) = argmax_{a∈A(x)} { R̄(x, a) + L_{S_{xa}}(ρ) ∑_{y∈X} p_{xy}(a) v_ρ(y) }, (10)

then we have the following theorem.

Theorem 2 The policy π_0 is optimal, i.e.,

w_{ρ,π_0}(x) = v_ρ(x), for all x ∈ X. (11)

Theorem 3 Under Assumption I, the iterates v^ν_ρ(x) produced by (12) below converge to the value function v_ρ(x) as ν → ∞:

v^ν_ρ(x) = max_{a∈A(x)} { R̄(x, a) + L_{S_{xa}}(ρ) ∑_{y∈X} p_{xy}(a) v^{ν−1}_ρ(y) }, (12)

for arbitrary initial values v^0_ρ(x).

4 An Application: The Optimal Ferry Dispatch Problem.

Suppose that customers arrive at a ferry according to a Poisson process with rate λ > 0. At any time t, the decision maker (captain) may depart at a cost of K + tk units, where K is a fixed cost and k is a cost proportional to the delay from the nominal departure time. Suppose also that there is a revenue of R(x) if the ferry picks up all x customers, where R(x) is a bounded, increasing, nonnegative function. The process is assumed to go on indefinitely, and the problem is to select a policy that maximizes the total expected discounted profit for the ferry.

This problem can be formulated as a two-action semi-Markov decision process with states X = {1, 2, ..., S}, where state x means that there are x customers currently on board and S is the capacity of the ferry. Let a_1 denote the action "depart" and let a_0 denote the action "wait". We assume that the process repeats without delay. The parameters of the problem are:

1. Under action a_1: p_{x1}(a_1) = 1, F_{x1,a_1}(t) = 1 − e^{−λt}, L_{S_{x a_1}}(ρ) = λ/(λ + ρ) and R̄(x, a_1) = R(x) − K.

2. Under action a_0: p_{x,x+1}(a_0) = 1, F_{x,x+1,a_0}(t) = 1 − e^{−λt}, L_{S_{x a_0}}(ρ) = λ/(λ + ρ) and R̄(x, a_0) = −k.

3. Also, A(x) = {a_1, a_0} for x = 1, ..., S − 1, and A(S) = {a_1}.

Thus the Bellman optimality conditions of Theorem 1 for x < S are:

v_ρ(x) = max{ T_1(x, a_1), T_0(x, a_0) }, (13)

where

T_1(x, a_1) = R(x) − K + (λ/(λ + ρ)) v_ρ(1)
and

T_0(x, a_0) = −k + (λ/(λ + ρ)) v_ρ(x + 1),

and for x = S they are

v_ρ(S) = T_1(S, a_1) = R(S) − K + (λ/(λ + ρ)) v_ρ(1). (14)

It follows that it is optimal to depart when there are x customers present whenever

R(x) − K + (λ/(λ + ρ)) v_ρ(1) > −k + (λ/(λ + ρ)) v_ρ(x + 1).

Using Theorem 1, we have done indicative computations using the values λ = 1, ρ = 0.1, K = 20, k = 0.1, S = 40, and R(x) = rx, where r = 1.5. In Figure 2 we plot v^ν_ρ(25) and v^ν_ρ(30) versus ν in order to illustrate the convergence of v^ν_ρ(x) to v_ρ(x) as ν → ∞.

Figure 2: Convergence of v^ν_ρ(x) to v_ρ(x) as ν → ∞.

In Figure 3 we illustrate the form of the optimal policy, where we observe that there exists a fixed critical constant x_0 such that π_0(x) = 0 for x < x_0 and π_0(x) = 1 for x ≥ x_0.

Figure 3: Optimal actions π_0(x).

References:

[1] Derman, C. (1970). Finite State Markovian Decision Processes, Academic Press.

[2] Ross, S. M. (1970). Applied Probability Models with Optimization Applications, Holden-Day, San Francisco, CA.

[3] Ungureanu V., Melamed B., Katehakis M.N. and Bradford P.G. (2006). Deferred Assignment Scheduling in Cluster-based Servers. Cluster Computing 9(1).

[4] Ungureanu V., Melamed B., Katehakis M.N. and Bradford P.G. (2006). Class-Dependent Assignment in Cluster-based Servers. SAC 2004.
[5] Ungureanu V., Melamed B. and Katehakis M.N. (2004). The LC Assignment Policy for Cluster-Based Servers. NCA 2004.

[6] Ungureanu V., Melamed B. and Katehakis M.N. (2004). Performance Comparison of Assignment Policies on Cluster-based E-Commerce Servers. WSEAS Transactions. Also in Proceedings of the International Conference on Software Engineering, Parallel and Distributed Systems, February 13-15, 2004, Salzburg, Austria.

[7] Ungureanu V., Melamed B. and Katehakis M.N. (2003). Towards an Efficient Cluster-Based E-Commerce Server. CLUSTER 2003.

[8] Veinott A. F. (1966). On the optimality of (s, S) inventory policies: new conditions and a new proof. SIAM J. Appl. Math.

[9] Zhao Y. and Katehakis M. N. (2006). On the structure of optimal ordering policies for stochastic inventory systems with minimum order quantity. Probability in the Engineering and Informational Sciences.

[10] Zhou B., Katehakis M. N. and Zhao Y. (2007). Effective control policies for stochastic inventory systems with minimum order quantity and linear costs. International Journal of Production Economics, Vol. 106(2).
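As a complement, the indicative computation of Section 4 can be reproduced with a short value-iteration script (a sketch under the paper's parameter values λ = 1, ρ = 0.1, K = 20, k = 0.1, S = 40, R(x) = 1.5x; the variable names and the fixed-iteration stopping rule are ours):

```python
# Value iteration (Theorem 3) for the ferry dispatch equations (13)-(14).
lam, rho = 1.0, 0.1
K, k, S, r = 20.0, 0.1, 40, 1.5
beta = lam / (lam + rho)  # L_{S_xa}(rho) for an Exp(lam) sojourn time

v = [0.0] * (S + 1)  # v[1..S]; index 0 is unused
for _ in range(2000):
    w = v[:]
    for x in range(1, S):
        depart = r * x - K + beta * v[1]   # T_1(x, a_1)
        wait = -k + beta * v[x + 1]        # T_0(x, a_0)
        w[x] = max(depart, wait)
    w[S] = r * S - K + beta * v[1]         # equation (14)
    v = w

# Recover the optimal policy: 1 = depart, 0 = wait.
policy = [0] * (S + 1)
for x in range(1, S):
    policy[x] = 1 if r * x - K + beta * v[1] >= -k + beta * v[x + 1] else 0
policy[S] = 1
x0 = policy.index(1, 1)  # smallest state in which departing is optimal
print("critical number x_0 =", x0)
```

Since β = λ/(λ + ρ) < 1, the iteration is a contraction and converges to v_ρ; the printed x_0 is the critical number illustrated in Figure 3.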
A Queueing System with Queue Length Dependent Service Times, with Applications to Cell Discarding in ATM Networks by Doo Il Choi, Charles Knessl and Charles Tier University of Illinois at Chicago 85 South
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More informationTHE ON NETWORK FLOW EQUATIONS AND SPLITTG FORMULAS TRODUCTION FOR SOJOURN TIMES IN QUEUEING NETWORKS 1 NO FLOW EQUATIONS
Applied Mathematics and Stochastic Analysis 4, Number 2, Summer 1991, III-I16 ON NETWORK FLOW EQUATIONS AND SPLITTG FORMULAS FOR SOJOURN TIMES IN QUEUEING NETWORKS 1 HANS DADUNA Institut flit Mathematische
More informationLink Models for Packet Switching
Link Models for Packet Switching To begin our study of the performance of communications networks, we will study a model of a single link in a message switched network. The important feature of this model
More informationREINFORCEMENT LEARNING
REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents
More informationStochastic process. X, a series of random variables indexed by t
Stochastic process X, a series of random variables indexed by t X={X(t), t 0} is a continuous time stochastic process X={X(t), t=0,1, } is a discrete time stochastic process X(t) is the state at time t,
More informationInventory Ordering Control for a Retrial Service Facility System Semi- MDP
International Journal of Engineering Science Invention (IJESI) ISS (Online): 239 6734, ISS (Print): 239 6726 Volume 7 Issue 6 Ver I June 208 PP 4-20 Inventory Ordering Control for a Retrial Service Facility
More informationSome notes on Markov Decision Theory
Some notes on Markov Decision Theory Nikolaos Laoutaris laoutaris@di.uoa.gr January, 2004 1 Markov Decision Theory[1, 2, 3, 4] provides a methodology for the analysis of probabilistic sequential decision
More informationOn Stability and Sojourn Time of Peer-to-Peer Queuing Systems
On Stability and Sojourn Time of Peer-to-Peer Queuing Systems Taoyu Li Minghua Chen Tony Lee Xing Li Tsinghua University, Beijing, China. {ldy03@mails.tsinghua.edu.cn,xing@cernet.edu.cn} The Chinese University
More informationChapter 2. Poisson Processes. Prof. Shun-Ren Yang Department of Computer Science, National Tsing Hua University, Taiwan
Chapter 2. Poisson Processes Prof. Shun-Ren Yang Department of Computer Science, National Tsing Hua University, Taiwan Outline Introduction to Poisson Processes Definition of arrival process Definition
More informationPBW 654 Applied Statistics - I Urban Operations Research
PBW 654 Applied Statistics - I Urban Operations Research Lecture 2.I Queuing Systems An Introduction Operations Research Models Deterministic Models Linear Programming Integer Programming Network Optimization
More informationPractical Dynamic Programming: An Introduction. Associated programs dpexample.m: deterministic dpexample2.m: stochastic
Practical Dynamic Programming: An Introduction Associated programs dpexample.m: deterministic dpexample2.m: stochastic Outline 1. Specific problem: stochastic model of accumulation from a DP perspective
More information1 Basic concepts from probability theory
Basic concepts from probability theory This chapter is devoted to some basic concepts from probability theory.. Random variable Random variables are denoted by capitals, X, Y, etc. The expected value or
More informationDES and RES Processes and their Explicit Solutions
DES and RES Processes and their Explicit Solutions Michael N Katehakis Dept of Management Science and Information Systems, Rutgers Business School - Newark and New Brunswick, 1 Washington Park Newark,
More informationIntroduction to queuing theory
Introduction to queuing theory Queu(e)ing theory Queu(e)ing theory is the branch of mathematics devoted to how objects (packets in a network, people in a bank, processes in a CPU etc etc) join and leave
More informationSession-Based Queueing Systems
Session-Based Queueing Systems Modelling, Simulation, and Approximation Jeroen Horters Supervisor VU: Sandjai Bhulai Executive Summary Companies often offer services that require multiple steps on the
More information2905 Queueing Theory and Simulation PART III: HIGHER DIMENSIONAL AND NON-MARKOVIAN QUEUES
295 Queueing Theory and Simulation PART III: HIGHER DIMENSIONAL AND NON-MARKOVIAN QUEUES 16 Queueing Systems with Two Types of Customers In this section, we discuss queueing systems with two types of customers.
More informationCPSC 531: System Modeling and Simulation. Carey Williamson Department of Computer Science University of Calgary Fall 2017
CPSC 531: System Modeling and Simulation Carey Williamson Department of Computer Science University of Calgary Fall 2017 Motivating Quote for Queueing Models Good things come to those who wait - poet/writer
More informationQUEUING MODELS AND MARKOV PROCESSES
QUEUING MODELS AND MARKOV ROCESSES Queues form when customer demand for a service cannot be met immediately. They occur because of fluctuations in demand levels so that models of queuing are intrinsically
More informationThis question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer.
This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. 1. Suppose you have a policy and its action-value function, q, then you
More informationMarkov Processes Cont d. Kolmogorov Differential Equations
Markov Processes Cont d Kolmogorov Differential Equations The Kolmogorov Differential Equations characterize the transition functions {P ij (t)} of a Markov process. The time-dependent behavior of the
More informationOptimal Control of an Inventory System with Joint Production and Pricing Decisions
Optimal Control of an Inventory System with Joint Production and Pricing Decisions Ping Cao, Jingui Xie Abstract In this study, we consider a stochastic inventory system in which the objective of the manufacturer
More informationBayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem
Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Parisa Mansourifard 1/37 Bayesian Congestion Control over a Markovian Network Bandwidth Process:
More informationMarkov Decision Processes and their Applications to Supply Chain Management
Markov Decision Processes and their Applications to Supply Chain Management Jefferson Huang School of Operations Research & Information Engineering Cornell University June 24 & 25, 2018 10 th Operations
More informationInventory Control with Convex Costs
Inventory Control with Convex Costs Jian Yang and Gang Yu Department of Industrial and Manufacturing Engineering New Jersey Institute of Technology Newark, NJ 07102 yang@adm.njit.edu Department of Management
More informationPage 0 of 5 Final Examination Name. Closed book. 120 minutes. Cover page plus five pages of exam.
Final Examination Closed book. 120 minutes. Cover page plus five pages of exam. To receive full credit, show enough work to indicate your logic. Do not spend time calculating. You will receive full credit
More informationproblem. max Both k (0) and h (0) are given at time 0. (a) Write down the Hamilton-Jacobi-Bellman (HJB) Equation in the dynamic programming
1. Endogenous Growth with Human Capital Consider the following endogenous growth model with both physical capital (k (t)) and human capital (h (t)) in continuous time. The representative household solves
More information