Data-Based Resource-View of Service Networks: Performance Analysis, Delay Prediction and Asymptotics. PhD Proposal. Nitzan Carmeli.

Advisers: Prof. Avishai Mandelbaum, Dr. Galit Yom-Tov. Technion - Israel Institute of Technology. February 6, 2017.

Contents

1 Introduction
2 Literature Review
  2.1 Wait Time and Sojourn Time Predictors
    2.1.1 Steady State Approximations of Waiting Times in Queueing Networks
    2.1.2 Heavy-Traffic Approximations
    2.1.3 Real-Time Estimators
    2.1.4 Delay Announcements
  2.2 Activity networks
3 Wait Time Estimators for Two Queues in Tandem
  3.1 Wait Time Distribution Given Number of Customers in Both Queues: Exact Analysis
    Sojourn Time
    Wait Times
  3.2 Many-Server Heavy Traffic Approximations, Time Varying
    Fluid Approximation
    Diffusion Approximation
4 Resource-Driven Activity Networks
  Fluid RDAN: The Static Model
  Theoretical Example Leading to a Limit with Two Time-Scales
  Two Coupled Membership Stations
  Data-Based Creation of Resource-Driven Activity Networks
  Data-Based Example of Two-Scale Model
5 Research Goals
A Data-Based Creation of RDAN

List of Figures

1.1 Two snapshots of customer and server flows in a call center. The left graph shows the customer flow, while the right one shows the server flow. Animation: Customer / Server Networks
1.2 A snapshot of a resource-driven activity network of a call center, including both customers and servers. Animation: RDAN animation
Flow chart of an ED process
Sojourn time at the nurse admission station, April 2014 to May 2015, All days
Sojourn time at the doctor station, April 2014 to May 2015, All days
Transition diagram, two M/M/1 queues in tandem
Simulation results, CDF of sojourn time in the system
Simulation results, CDF of wait time at the second queue. (Simulation (blue), estimation given $T_1 < T_2$ (red).)
The waiting time diffusion term, taken from Mandelbaum et al. [26]
Schematic description of the η membership model
X η transition rate diagram
Modified transition diagram
SBR, routing of inbound calls to top-level agent groups. The ellipses represent different types of queues; the rectangles represent different types of top-level agent groups
Occupancy profile of an exam room in an outpatient hospital
Occupancy profile of an infusion chair in an outpatient hospital

Chapter 1 Introduction

The proposed research focuses on developing a theory for analyzing queueing networks. Our primary motivation comes from healthcare systems, such as Emergency Departments (EDs); these are complex networks that involve multiple resource types (physicians, nurses, etc.), multiple patient types (differentiated by severity and medical specialty), highly variable and long processes, and a time-varying environment [4]. Such characteristics often result in patients experiencing long waiting times. Patients in emergency departments, clinics and other healthcare facilities often feel confused, stressed and helpless, as a result of the ambiguity of their service processes and waiting durations. This stress puts high pressure on personnel as well. Initial research [2, 3, 33] suggests that providing patients with real-time information on process progress and durations might improve patient satisfaction and reduce stress. Furthermore, it was shown in [12] that providing delay information influences customer choices, which in turn impact system performance. In particular, the authors showed that there is substantial growth in Internet queries regarding ED wait times, and that patients take this information into consideration when choosing which ED to turn to. This results in a pooling effect: a balance between the ED loads in different hospitals. Hence, if delay estimations are accurate, the average waiting times in all hospitals (in a certain area) should decrease. We aim to develop the theory and tools required to provide such information by predicting delays in a complex queueing network. Therefore, our first goal is to develop wait-time predictors which will be based on individual data as well as the system state, all of which will be updated in real time.
This is part of a multidisciplinary research project which aims to test the effects of providing real-time, personalized information to patients on the behavior and feelings of patients, and on the operational efficiency of a hospital Emergency Department (ED). The research includes the design of a platform for information transparency regarding medical processes, anticipated waiting time, duration and completion of tests and treatment, which will be implemented in a large Israeli hospital. It is therefore necessary to develop wait time and sojourn time estimators which account for the complexity of healthcare systems. Current research probes prediction methods that are based on classical queueing theory [19-21, 28, 31], process mining techniques [13, 37], statistical learning methods, or combinations of these [3]. We shall focus on queueing theory predictors and on how to extend these predictors to queueing networks (rather than a single station) with different, and sometimes unknown, priority policies. We are interested in individual, real-time wait time estimation, as opposed to the average wait time of all customers, which is usually based on a steady-state assumption. We focus on estimating wait time distributions, rather than merely averages, as it may be beneficial to inform patients of higher percentiles of their waiting time, or even to present it as a range. For example, in [3] it was shown that patient satisfaction deteriorates sharply as the waiting time exceeds the announced average waiting time, which may result in anger and violence towards the medical staff. Moreover, a preliminary user study, conducted as part of the multidisciplinary research, revealed that customers clearly prefer delay estimators which are given as a range ("your wait will be between 45 and 60 minutes"), and find them more understandable and credible than delay estimators which are given as "your wait will be approximately 45 minutes" or "your wait will be at least 45 minutes".

Figure 1.1: Two snapshots of customer and server flows in a call center. The left graph shows the customer flow, while the right one shows the server flow. Animation: Customer / Server Networks.

The exploration of complicated service networks, such as healthcare systems, has led to the realization that a new framework for modeling and analyzing queueing networks is greatly needed. Healthcare systems, as well as other service systems (e.g. call centers), have become extremely large in the last decade. For example, one of the healthcare systems which provides the data for our research is a US outpatient hospital. On average, 1 patients visit this hospital every day, and receive treatment by 35 staff members. The essence of a large number of servers that grows with the arrival rate is captured in the many-server heavy-traffic theory. In [14] it was shown that in the many-server regime, as opposed to conventional heavy traffic, servers may not always be busy, and the probability that an arriving customer will have to wait is strictly less than one. Therefore, not only do customers wait for servers, but servers also wait for customers. This is also true in small-scale systems, in which agent efficiency must be reduced in order to provide a satisfactory level of service. Additionally, although servers are present in the system, they might be occupied with other activities and hence unable to provide service. Consider, for example, an ED physician who is called to treat a patient in an internal ward, and hence becomes unavailable for ED patients. This dynamic was introduced in the Erlang-S model ("S" for Servers) presented in [6], and is indeed a natural dynamic in service systems which is also evident through data.
An example of this is seen in Figure 1.1, taken from Momčilović et al. [29], which shows a time snapshot of both the customer flow within a call center (on the left) and the server flow within that system (on the right); see Footnote 1. (An animation representing the time evolution of the two networks can be found at Customer / Server Networks.) This apparent symmetry between customers and servers is the motivation for our novel framework for queueing networks. Traditionally, when modeling or analyzing queueing networks, we focus on one entity in the service system, usually either customers or servers, and analyze the system where each of the entities has its own role. Following [29], we propose a unifying framework under which each entity is equally considered as a resource. The queueing model will then be activity-oriented, where each activity requires a subset of the resources and produces others. We therefore refer to it as a Resource-Driven Activity Network, or RDAN. Figure 1.2 is an example of such a network, built from the same data as Figure 1.1. Here both customers and servers are shown in a single network, and both play a symmetric role. (See RDAN animation for the corresponding animation.) Our proposed framework is also natural when more than one type of server is needed in order to complete a service, which is exactly the case in healthcare systems, such as an ED or the outpatient hospital we are partnering with. For example, in order for an infusion process to take place, a patient, an available nurse, a vacant bed and the drug must all be present; all function as resources for the same activity. We note that there are other natural settings for which our framework would be beneficial; for example, the Uber application, which matches customers and taxi drivers based on their location and requirements. Our second goal is therefore to develop methodologies for modeling service networks via data-driven RDANs, which will then be used to support design, staffing, and control of service networks, and will inspire novel prediction methods. Two healthcare systems provide the data for our research: an ED of a large Israeli hospital, and a US outpatient hospital.

Footnote 1: Rectangles in both graphs represent activities. Green rectangles correspond to activities that involve both customers and servers. The vertical bars on top of activities represent the number of customers/servers involved in a particular activity at a given time. Individual customers and servers are shown with small discs. A disc moves between two activities while the corresponding customer/server is engaged in the activity that corresponds to the egress node.

Figure 1.2: A snapshot of a resource-driven activity network of a call center, including both customers and servers. Animation: RDAN animation.
In the US hospital, each patient, staff member and piece of equipment is fitted with an RTLS (Real-Time Location System) device, which provides extraordinary transaction-level data. The Israeli ED database was characterized and established as one of the first steps in our research. It contains unique transaction-level data and is planned to become a database that updates in real time. These two systems are very different: the outpatient hospital processes are all planned in advance, while an ED environment is much more stochastic. We will test the methodologies developed in this research on both environments.

This research proposal is structured as follows: in Chapter 2 we survey existing literature; in Chapter 3 we present two approaches for real-time estimation of the wait time distribution, via exact analysis and via heavy-traffic approximations. Chapter 4 presents the general framework of RDANs, followed by an example of a two time-scale limit arising naturally from the model, and an example of data-based creation of such RDANs. We conclude with Chapter 5, where we summarize our research goals.

Chapter 2 Literature Review

We divide the literature review into two parts. First, in Section 2.1, we survey known wait time estimators and review existing research on the influence of delay announcements on customer behavior and system operation. Then, in Section 2.2, we review the theoretical grounding for our Resource-Driven Activity Networks.

2.1 Wait Time and Sojourn Time Predictors

There is ample literature on wait time and sojourn time estimators for service systems. Each of these estimators, or prediction methods, is appropriate under different circumstances. We start with steady state approximations for queueing networks, followed by heavy-traffic approximations, both conventional heavy traffic and many-server heavy traffic (QED regime). We then survey the literature on real-time wait time estimators for a single station, as well as real-time estimators for queueing networks.

2.1.1 Steady State Approximations of Waiting Times in Queueing Networks

A classic result by Reich [34, 35] shows that in a system of M/M/1 queues in tandem, in steady state, the sojourn times (waiting time + service time) in each of the stations are independent and are exponentially distributed with rate $(\mu_i - \lambda)$, where $\lambda$ is the arrival rate and $\mu_i$ is the service rate at station $i$. However, Burke et al. [1] showed that the waiting times in a system of M/M/1 queues in tandem are in fact dependent. For example, if $W_1$ is the waiting time at the first station, and $W_2$ is the waiting time at the second station, then it was shown that $P(W_2 = 0 \mid W_1 = 0) > P(W_2 = 0)$. Burke [9] also showed that although the sojourn times in two M/M/s queues in tandem are independent, this does not hold for more than two stations. It was shown that a necessary condition for the independence of the sojourn times in a series of $k$ M/M/s queues in tandem is that the number of servers in each of stations $2, \ldots, k-1$ is equal to 1.
This means that the first and the last stations could have multiple servers; however, the stations in the middle must be single server stations. This is actually the case in the outpatient hospital we are partnering with. A patient path within the hospital usually starts with a blood draw, where the patient can receive treatment from a pool of nurses. The second station is a doctor exam; here each patient has her own doctor, and therefore this station could be modeled as a single server queue. The third station is usually infusion; here the patient is assigned to one of several beds/chairs and can then receive treatment from a pool of nurses. Although the above results are based on the assumption of Poisson arrivals and exponential service times in steady state, it will be interesting to examine these estimators on our outpatient hospital data with the three stations described here.
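To make Reich's result concrete, the following is a minimal sketch (not part of the proposal) of how it could yield a range-type announcement for two M/M/1 queues in tandem. It assumes steady state, so the total sojourn time is the sum of two independent exponentials with rates $\mu_1 - \lambda$ and $\mu_2 - \lambda$. The function names and the numerical parameters are hypothetical.

```python
import math


def tandem_sojourn_cdf(t, lam, mu1, mu2):
    """CDF of the total steady-state sojourn time in two M/M/1 queues in tandem.

    Assumes (Reich's result) independent exponential sojourn times with
    rates mu1 - lam and mu2 - lam at the two stations.
    """
    if not (lam < mu1 and lam < mu2):
        raise ValueError("stability requires lam < mu1 and lam < mu2")
    if t <= 0:
        return 0.0
    a, b = mu1 - lam, mu2 - lam
    if math.isclose(a, b):
        # Equal rates: the sum is Erlang-2 distributed.
        return 1.0 - math.exp(-a * t) * (1.0 + a * t)
    # Distinct rates: hypoexponential distribution.
    return 1.0 - (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)


def tandem_sojourn_quantile(p, lam, mu1, mu2, tol=1e-9):
    """Invert the CDF by bisection to obtain the p-th quantile."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while tandem_sojourn_cdf(hi, lam, mu1, mu2) < p:
        hi *= 2.0  # expand the bracket until it contains the quantile
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tandem_sojourn_cdf(mid, lam, mu1, mu2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical rates (customers per unit time).
    lam, mu1, mu2 = 1.0, 2.0, 3.0
    low = tandem_sojourn_quantile(0.5, lam, mu1, mu2)
    high = tandem_sojourn_quantile(0.9, lam, mu1, mu2)
    print(f"announce: between {low:.2f} and {high:.2f} time units")
```

This sketch is a steady-state calculation only; it does not condition on the system state at arrival, which is the real-time setting the proposal targets.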

Lemoine [25] developed a set of recursive equations for the Laplace transform of the residual sojourn time within a service network. The model he analyzed is a Jackson network of N single server stations, with a FCFS priority policy. The goal was to derive the sojourn time moments for a customer entering the system at node i when the system is in steady state. Our goal is to extend this kind of analysis, which provides not just the expected value of the remaining sojourn time in the system but also moments of any order (or, even better, the distribution), to real-time estimators (rather than steady state estimators) for customers' remaining sojourn time in the system. Boxma [7] considered a service network of two M/G/1 queues in tandem, where the service time distribution of the first queue has support $(0, b]$, and the service time distribution of the second queue has support $[b, \infty)$. This means that the probability that the service time in the first queue is shorter than the service time in the second queue is equal to 1. For the steady state case, he derived the sojourn time distribution at the second station, the joint distribution of the waiting time in both queues and the total waiting time. It follows from his analysis that when longer services are performed first the waiting time at the second station is zero. While this is true in theory, in practice we find that non-bottleneck stations also have queues. We note that the results provided here are also presented in an extensive survey of product-form and non-product-form results for sojourn times in queueing networks given by Boxma and Daduna [8].

2.1.2 Heavy-Traffic Approximations

Conventional heavy-traffic

Reiman [36] studied heavy-traffic diffusion approximations for sojourn time in Jackson networks with K single server stations.
Defining $\lambda_i$ as the exogenous arrival rate to node $i$, $\mu_i$ as the service rate at node $i$, $P$ as the routing matrix, and denoting $R = (I - P)^{-1}$ and $\gamma = \lambda R$, the traffic intensity in node $i$ is then defined as $\rho_i = \gamma_i / \mu_i$. The conventional heavy-traffic approximation is derived by taking a sequence of Jackson networks indexed by $n \ge 1$, such that arrival rates, service rates, and the routing matrix depend on $n$, and $\rho_i(n) \to 1$ as $n \to \infty$. The main result of [36] is the snapshot principle: "It is as if the customer takes a snapshot of the network when he enters and all queues remain at the same value during the customer's sojourn through the network." To elaborate on that, let $h$ be a visiting vector, that is, a vector associated with the customer's route through the system; $h_i$ denotes the number of times the customer enters station $i$ during a visit in the system. The snapshot principle states that the diffusion approximation of the stationary sojourn time distribution, for a customer entering the system at node $k$ and following a visiting vector $h$, is given by

$$W_{k,h} \approx \sum_{i=1}^{K} h_i \tau_i,$$

where the $\tau_i$ are exponentially distributed with mean $(\mu_i - \gamma_i)^{-1}$. This implies that the diffusion approximation depends only on the visiting vector $h$ and not on the specific path within the system. It also implies that the customer waiting time at station $i$ is the same for every visit to station $i$ along the path. Snapshot principle-based estimators follow the assumption that changes in queue lengths are negligible during a customer visit within the system; this gave rise to the Last-to-Enter-Service and Head-of-the-Line estimators (see Subsection 2.1.3), which approximate the waiting time of an arriving customer by the waiting time of the last customer to enter service, or by the elapsed waiting time of the customer who is currently at the head of the queue. Huang et al. [17], for example, used the snapshot principle to approximate queue length, virtual waiting time and sojourn times in a single server multi-class queue.
They considered an emergency department physician treating two types of patients: triage (arriving) patients and in-process (IP) patients, where each of these types may be further divided into priority classes. The snapshot principle suggests that when a patient of class $k$ completes service, the queue of class $k$ patients approximately equals the number of patients that arrived during this patient's waiting time (note that in conventional heavy traffic, service time is negligible compared to the waiting time, hence there is no need to consider the patient's service time). Define $Q_k(t)$ as the queue length of class $k$ patients at time $t$, $\lambda_k$ as the arrival rate of class $k$ patients, $\omega_k(t)$ as the virtual waiting time of a patient of class $k$ arriving to the queue at time $t$, and $\tau_k(t)$ as the age of the patient at the head of the class $k$ queue at time $t$. Then, by the snapshot principle, it was shown that

$$Q_k(t) \approx \lambda_k \tau_k(t),$$

and by Little's law, it follows that

$$\omega_k(t) \approx \tau_k(t).$$

If a patient enters the ED at time $t$, and has a predetermined route which is characterized by the visiting vector $h$ (as defined in [36]), then an approximation for the patient's sojourn time in the ED will simply be

$$\sum_{k=1}^{K} h_k \tau_k(t).$$

The snapshot principle based estimators are appropriate in critically loaded systems ($\rho \approx 1$) and when the priority policy within each class is FCFS. These criteria may not hold in healthcare systems, which often alternate between overloaded and underloaded periods, and in which the priority policy is rarely FCFS. Furthermore, patients' priorities are usually unknown in advance and may change during their stay in the system. Therefore, although the snapshot principle provides simple estimators for wait times and sojourn times in service networks, it may not be adequate in our case. We are now in the process of testing the snapshot estimators on the two healthcare data sets we acquired for this research.

Many-server heavy-traffic

Mandelbaum et al.
[27] developed limit theorems for a variety of queueing networks, including classical Jackson networks, queues with abandonment and retrials, priority queues and Jackson networks with state dependent routing. The many-server heavy-traffic approximations are derived by taking a sequence of systems, indexed by $n \ge 1$, such that the arrival rates and the number of servers in the $n$-th system are scaled by $n$, while service rates remain unscaled. The authors developed both fluid and diffusion approximations for the queue length processes at each station (queue length was considered as the number of customers in queue and in service). In Section 3.2 we follow their theory, and use their functional strong law of large numbers and functional central limit theorems to derive wait time and sojourn time approximations. In subsequent papers [26, 28], they focus on a queue with abandonment and retrials and develop fluid and diffusion approximations for the queue length and the waiting time process. We shall elaborate on their approach as we follow it in Section 3.2 to develop wait time estimators for a system of two queues in tandem.

2.1.3 Real-Time Estimators

Single station

Whitt [4] presented the value of real-time estimators over steady state estimators, by showing that adding information on the system state (such as the number of customers in the system) can only reduce the variance of the estimator. As stated in the introduction, a preliminary user study suggests that customers perceive wait time estimators which are given as ranges as more credible. Thus, reducing the variance using information on the system state is important, as one may then provide wait time estimators which include a certain range around the average but are still informative enough. Whitt also emphasized the importance of estimating the full CDF (or distribution) of the wait time, as it supports the interesting question of what should be announced to customers, and allows one to provide more information to customers than merely the average waiting time. He also provided exact and approximate predictors for the wait time distribution in a single station, which we intend to extend in our research to service networks. Ibrahim and Whitt [19, 20, 21] proposed a series of real-time wait time estimators for a single queue with abandonments, time varying arrival rate and capacity, and general service, inter-arrival and patience distributions. Their predictors can be categorized into two types: queue-length-based (QL) predictors, and delay-history-based predictors. The simple QL predictor states that if a customer observes $n$ customers in queue upon arrival, the mean service duration is $m$ and there are $s$ servers in the system, then the average waiting time can be approximated by

$$\theta_{QL}(n) = \frac{(n+1)\,m}{s}, \tag{2.1}$$

as the customer will have to wait for the completion of $n+1$ services before reaching service. This predictor can then be refined to account for abandonments (which will shorten the queue during the customer's wait) and time varying capacity. Examples of delay-history-based predictors are the Head-of-the-Line (HOL) predictor and the Last-to-Enter-Service (LES) predictor. The simple form of these predictors states that the waiting time of an arriving customer will equal the elapsed waiting time of the customer who is currently at the head of the queue, or the waiting time of the last customer to enter service. These predictors are also refined to account for the case of a time varying arrival rate. The authors also suggested a method for estimating the queue length using the wait time of the head-of-the-line customer, and a method for estimating the wait time of the head-of-the-line customer from the queue length. Ibrahim et al.
[18] assumed that a customer arriving at time t is provided with the LES estimator, and defined the initial conditions which guarantee the asymptotic relative accuracy of this announced delay. They considered both the Quality and Efficiency Driven (QED) regime and the Efficiency Driven (ED) regime. In the QED regime, the condition for the relative accuracy of the LES estimator is that the initial number of idle servers and the number of customers in queues are not too large. In order for the LES delay announcement to be relatively accurate, the difference between the queue length seen at the customer's arrival and the queue length seen at the arrival of the last customer to enter service should be negligible. In the ED regime, waiting times are very long; hence the state of the queue may change significantly during a customer's stay in the queue. The main condition for the relative accuracy of the LES announcement is therefore the convergence of the initial number of customers in the system to its fluid limit. Thiongane et al. [39] extended the delay-history-based predictors to a weighted average form that can cope with multi-class queues. They estimated the arriving customer's wait time using the wait times of customers of the same type that are either currently in queue (which can be considered a weighted HOL) or that completed service (which can be considered a weighted LES). They also suggested a predictor which is based on the wait times of customers of the same type who found the same queue length at their arrival. Although these real-time predictors cover a wide range of service systems, they only account for single stations and provide solely an estimate of the average waiting time. Nakibly [31] focused on estimating the wait time distribution given the system state at the time of arrival, in single station service systems with skills-based routing (the motivation came from call centers).
In order to derive the moment generating function of the waiting time in the system, they used first step analysis and solved it using difference equations. They also suggested estimating the wait time distribution using the Phase-type representation of the system and defining the absorbing state such that the wait time of a specific customer type will be given by the time to absorption. We shall use a similar approach in our exact analysis, given in Section 3.1.
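The simple forms of the single-station predictors described above can be sketched in a few lines. This is an illustrative sketch only: it implements the unrefined QL, HOL and LES predictors (no abandonment or time-varying corrections), and the function and parameter names are hypothetical rather than taken from the cited papers.

```python
def ql_predictor(n_in_queue, mean_service, servers):
    """Simple queue-length-based predictor: (n + 1) * m / s.

    The arriving customer waits for n + 1 service completions, which occur
    at an aggregate rate of s / m while all servers are busy.
    """
    if servers <= 0:
        raise ValueError("servers must be positive")
    return (n_in_queue + 1) * mean_service / servers


def hol_predictor(now, hol_arrival_time):
    """Head-of-the-Line predictor: elapsed wait of the customer at the head
    of the queue. Returns 0.0 when the queue is empty (hol_arrival_time is None).
    """
    if hol_arrival_time is None:
        return 0.0
    return now - hol_arrival_time


def les_predictor(les_arrival_time, les_service_start):
    """Last-to-Enter-Service predictor: the realized wait of the most recent
    customer to have entered service.
    """
    return les_service_start - les_arrival_time


if __name__ == "__main__":
    # Hypothetical state: 9 customers waiting, 5-minute mean service, 2 servers.
    print(ql_predictor(9, 5.0, 2))
    # Head-of-line customer arrived at minute 88; the current time is minute 100.
    print(hol_predictor(100.0, 88.0))
    # Last customer to enter service arrived at minute 80 and started at minute 95.
    print(les_predictor(80.0, 95.0))
```

All three return a single number, which matches the limitation noted above: they estimate an average delay rather than a distribution.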

Queueing networks

Ang et al. [3] focused on predicting the time-to-treatment in an ED. This is the time elapsed between the patient's registration and the first encounter with a physician or a nurse practitioner (this time includes the triage process). The authors showed that incorporating information about the ED state (the number of patients in queue, the processing rate or the number of providers), as well as fluid model estimators (such as the one given in (2.1)), in statistical learning methods (e.g. regularized linear regression) significantly improves the accuracy of the time-to-treatment estimator. The authors suggested that their prediction method can be applied not just to the entire time-to-treatment (from registration to first treatment) but also to the remaining time-to-treatment, which is the time between the triage process and the first treatment. After the triage stage, patients are given their acuity level, which defines their priority in the system; hence one needs to restrict the training data set to patients of the same triage level. We note here that Senderovich et al. [37, 38] also showed that when applying statistical learning methods, enriching the set of possible predictors with queueing perspective variables, such as the queue length, results in better prediction models. Gal et al. [13] tackled the issue of predicting bus travel times given a scheduled bus journey. Their predictors are a function of the source/destination pair and the time at which the prediction is made. The authors modeled road segments between different bus stops as a network of queues, where the buses are assumed to be customers that travel through the network. For each segment, they defined a snapshot-based predictor, Last-Bus-to-Travel-Segment, and then extended it to the network setting, as it was shown in [36] that the snapshot principle holds for networks when the route is predetermined.
This approach was compared with machine learning techniques such as random forest and extremely randomized trees. These are boosting algorithms which construct sequences of models, each based on the previous one. Thus, the final stage was to combine both methods by using the snapshot-based model as the baseline model for the machine learning techniques. The three types of prediction methods were evaluated against real data, where the last (the combination of snapshot principle and machine learning) proved to be the most promising. The bus travel time prediction is somewhat similar to the familiar Waze application. Although the Waze application has the freedom to choose among different routes, it receives a source, a destination, and a time stamp, and produces a travel time estimate (after choosing the best route). The application's algorithms are not published. However, according to their website (Waze), they break the route into segments and use the current real-time road speed, provided by other Waze users on a specific road segment, to predict its travel time. It is therefore assumed that the speed of another Waze user ahead of you in a specific segment will be the same as your own speed in the segment, which is basically the snapshot principle. If there are no current observations, or not enough observations of travelers along a particular segment, Waze will use historic data, according to the time of the day, day of the week, month, special dates, and so on. This effectively applies machine learning techniques, which use historic data to formalize a prediction method. We note that our goal of predicting waiting times and sojourn times within a service network is very similar to the goals of the time-to-treatment estimation and the travel time estimation. In a sense, we are aiming to provide "Waze for healthcare systems". However, unlike in the methods surveyed here, in our case customers' routes may not be known in advance, nor even the destination point.
This is a problem that we do not address in this proposal but is part of our research plan. Additionally, the methods specified above result in a single-number estimate, while we are interested in the full distribution or at least the first and second moments. As part of our research we shall use our ED and outpatient hospital data to test the methods surveyed here, specifically those that combine machine learning techniques with queueing theory estimators.
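As an illustration of the snapshot-principle estimators surveyed above, the following sketch computes the network sojourn time approximation $\sum_k h_k \tau_k(t)$ for a customer with a predetermined route, together with the companion queue-length approximation $Q_k(t) \approx \lambda_k \tau_k(t)$. The station names, the dictionary-based interface and the numbers are hypothetical; the sketch assumes that the head-of-line ages are observable at the prediction time.

```python
def snapshot_sojourn_estimate(visits, hol_ages):
    """Snapshot-principle sojourn time estimate: sum over stations of h_k * tau_k(t).

    visits   -- maps station name to h_k, the number of planned visits
    hol_ages -- maps station name to tau_k(t), the current age of the
                head-of-line customer at that station
    """
    missing = [station for station in visits if station not in hol_ages]
    if missing:
        raise KeyError(f"no head-of-line age observed for: {missing}")
    return sum(h * hol_ages[station] for station, h in visits.items())


def snapshot_queue_length(arrival_rate, hol_age):
    """Snapshot approximation of the queue length: Q_k(t) ~ lambda_k * tau_k(t)."""
    return arrival_rate * hol_age


if __name__ == "__main__":
    # Hypothetical route: one nurse visit, two doctor visits, one imaging visit.
    route = {"nurse": 1, "doctor": 2, "imaging": 1}
    # Hypothetical head-of-line ages in minutes, observed at the prediction time.
    ages = {"nurse": 10.0, "doctor": 25.0, "imaging": 40.0, "lab": 5.0}
    print(snapshot_sojourn_estimate(route, ages))
    print(snapshot_queue_length(0.2, 25.0))
```

The estimate depends on the route only through the visit counts, which mirrors the observation that the approximation depends on the visiting vector and not on the specific path. It presupposes a known route, which is exactly the assumption that may fail in the ED setting.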

2.1.4 Delay Announcements

As stated before, our research on wait time and sojourn time estimation in service networks, and in particular healthcare systems, is part of a multidisciplinary research project that aims to test the effect of providing such estimators to patients. One of our goals is to develop estimators for the wait/sojourn time distribution which could then provide different delay announcements to patients. We are currently collaborating with a large Israeli hospital in developing a platform which will provide, along with process information, individual real-time delay announcements to patients and their escorts. The effect of these delay announcements on patients' feelings and behavior, and on the system operation, will then be tested using a field study. Interesting questions are therefore: what should be announced to patients? How will the delay announcements influence patients' behavior and, in turn, system operation? These issues will not be addressed in this research proposal. However, we note that there is extensive literature on this subject, of which we review only the few works that are most relevant to our study. Dong et al. [12] found that patients use information provided on the Internet regarding expected ED delays in order to choose the ED they will turn to. It was shown that accurate delay announcements will increase synchronization between nearby hospitals, which means that the loads in proximate hospitals will be balanced, hence causing a desirable pooling effect. However, if delay estimations are not accurate, pooling will be partial and hence network resources will not be fully utilized. Moreover, it was shown that the accuracy of the estimator is not the only parameter influencing the system efficiency, but also the method by which the estimation is made. Using simulation, they found that a moving average estimator (based on the last 4 hours) may cause the opposite effect of desynchronization.
(This effect did not appear when using an estimator based on the true waiting time plus normally distributed white noise.) Although both estimators have the same accuracy, the former causes self-oscillation. Thus, the accuracy of a delay announcement is not the only measure to be considered; its time-bias matters as well.

Jouini et al. [23] proposed a model in which customers are informed of a certain percentile of the estimated delay distribution, and may either balk or abandon. They presented the tradeoff in choosing the desired percentile. When high percentiles are announced, more customers are likely to balk, which may cause the system to be less efficient than it could be. On the other hand, announcing a low percentile may reduce balking but cause more abandonments, which are accompanied by dissatisfaction with the organization and a reduction in its perceived reliability. To choose the ideal percentile the authors used numerical analysis. In [22] it was assumed that the costs of overestimating (choosing high percentiles) and underestimating (choosing low percentiles) are not equal; hence, a newsvendor model was used to choose the announced percentile. In Jouini et al. [24] the virtual wait time distribution for a multi-class M/M/s queue with two priority classes was found, at first with no delay announcements. Then, it was assumed that customers are provided with delay announcements based on a certain percentile of the estimated delay distribution, and that customers have a balking probability given the announced delay. The mean and variance of the virtual delay for both customer priorities, given the balking probability, were then calculated. This was done by representing the virtual delay as the time to absorption in a two-dimensional transient Markov chain.

Armony et al. [5] also assumed that customers react to delay announcements in two ways: by balking or by adjusting their patience according to the expected delay.
However, they noted that system performance is influenced by customers' response to delay announcements; hence the actual delay may change. This leads to a need to find the equilibrium delay announcement, which is consistent with the actual delay given customers' reaction to it. In this work, conditions for the existence and uniqueness of an equilibrium delay announcement were provided. The equilibrium delay announcement was obtained for the fluid model of an M/M/s + M queue with an exponential balking function and an exponential conditional patience distribution.
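The percentile-based announcements reviewed above can be illustrated with the textbook newsvendor rule. The sketch below is only an illustration under assumptions that are not taken from [22]: the function name `newsvendor_announcement`, the cost parameters `cost_under` and `cost_over`, and the use of an empirical sample of past delays are all hypothetical. With a unit cost `cost_under` for announcing less than the realized delay and `cost_over` for announcing more, the textbook rule announces the quantile at level `cost_under / (cost_under + cost_over)`.

```python
import math


def newsvendor_announcement(delays, cost_under, cost_over):
    """Return the delay to announce under the textbook newsvendor rule.

    delays     -- sample of past delays (hypothetical input data)
    cost_under -- unit cost of announcing LESS than the realized delay
    cost_over  -- unit cost of announcing MORE than the realized delay
    """
    if not delays:
        raise ValueError("need at least one delay observation")
    if cost_under <= 0 or cost_over <= 0:
        raise ValueError("costs must be positive")
    # Critical ratio: the percentile level that balances the two costs.
    level = cost_under / (cost_under + cost_over)
    ordered = sorted(delays)
    # Smallest order statistic whose empirical CDF is at least `level`.
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]
```

When underestimation is the costlier error, the critical ratio exceeds one half and the announced value moves toward the upper percentiles of the delay sample; equal costs lead to announcing the median.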

Allon et al. [1] assumed that the service organization and the customers are strategic players, gaining (different) rewards from service and paying (differently) for the time spent in the system. The organization chooses a certain delay announcement given the queue length (which is not observed by customers), and customers choose whether to join or balk given the announcement. The organization may provide various types of delay announcements, which may be concrete or intangible, quantifiable or not, and even misleading. Although customers cannot verify the information they receive, they are not naive, and thus will make decisions based on their judgment of the information given. The authors characterized the resulting equilibrium, which depends on the queueing dynamics and the strategic nature of the players. It was shown that in equilibrium, if the customers and the organization are not perfectly aligned (perfectly aligned means that the customers and the organization agree on the preferred action for each state of the system), then the organization will impose intentional vagueness. The purpose of this vagueness is to convince customers to join the queue when the system congestion is not too high, in cases for which, under full information, the customers would not have joined it. This vagueness is in line with our approach that wait time announcements should provide customers with a range of waiting times and not a single estimate of the mean.

2.2 Activity networks

In Chapter 4 we present our resource-view approach for modeling service networks, which we refer to as RDAN. Our model relies on Harrison's stochastic processing networks (SPNs) and activity analysis [15, 16], which we now review. In SPNs, resources interact with materials via activities. One can think of resources as servers, materials as customers, and activities as services. Materials arrive to the network according to specified arrival rates and follow routes within the network.
The model inputs are $k$ different materials, each with its own arrival rate, $n$ different resources with their given capacities, and $m$ activities. The model output is a plan vector $x$, which specifies the rate at which each activity is conducted. Harrison [15] presented both a static fluid model and a stochastic, dynamic model. To explain the SPN approach and its difference from our RDAN we focus on the static model. The model defines a capacity consumption matrix $A$, where $A_{i,j}$ represents the rate at which activity $j$ consumes the capacity of resource $i$. It also defines an input-output matrix $R$, where $R_{l,j}$ represents the rate at which material $l$ is either produced ($R_{l,j} < 0$) or consumed ($R_{l,j} > 0$) by activity $j$. Defining $b$ as the vector of resource capacities and $\lambda$ as the vector of material arrival rates, a plan $x$ is feasible if it satisfies the capacity constraint $Ax \le b$, the material-balance condition linking $Rx$ to $\lambda$, and the nonnegativity constraint $x \ge 0$. (2.2)

There are a few fundamental differences between SPNs and our RDANs; we review them here, as they were presented in [29]. First, the SPN fluid model is based on a conventional heavy-traffic regime, which means that the arrival rates and individual service rates are scaled (exhibit time speed-up) while the number of servers remains unscaled. Thus, resource pools can be considered as single-server queues with high processing rates; that is, individual service times are negligibly short. As a result, for example, non-bottleneck nodes can be removed from the network for analysis purposes. In contrast, our model is based on a many-server regime, which means that there is no time speed-up for activity durations. Consequently, all activities in the network must be considered, and one cannot focus on bottleneck nodes only. Additionally, in our model constraints on activity rates are due to a limited number of resources, which are considered to be both customers and servers, while in SPNs constraints on activity rates arise from the finite processing capacities of servers only.
Finally, we are interested in the symmetry between customers and servers; however, we note that there is an inherent asymmetry between the two entities. While customers arrive to the system exogenously and leave the system after completing their service (in Figure 1.1 this is depicted as discs moving along edges emanating from activities and ending with a dot), servers stay in the system forever. Taking that into account, we focus on closed queueing networks (rather than open ones like SPNs), modeling exogenous arrivals and departures as an activity that produces arriving customers and to which all leaving customers are routed. Note that analyzing closed networks is both more challenging and more general than analyzing open networks, as every open network can be modeled as a closed one, simply by modeling the outside world as a node in the system.
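To make the static feasibility condition (2.2) of the SPN model concrete, the following sketch checks a candidate plan in plain Python. It is a hedged illustration only: the function name `is_feasible_plan`, the tolerance `tol`, and the two-activity tandem data are hypothetical, and the material-balance condition is assumed here to take the equality form $Rx = \lambda$, which is an assumption of this sketch rather than a statement taken from the text.

```python
def is_feasible_plan(A, R, b, lam, x, tol=1e-9):
    """Check a static plan x against capacity, balance and nonnegativity.

    A[i][j] -- capacity of resource i consumed per unit rate of activity j
    R[l][j] -- material l consumed (> 0) or produced (< 0) by activity j
    b[i]    -- capacity of resource i
    lam[l]  -- exogenous arrival rate of material l
    ASSUMPTION: material balance is imposed as R x = lam (illustrative choice).
    """
    def matvec(M, v):
        return [sum(m * v_j for m, v_j in zip(row, v)) for row in M]

    nonnegative = all(x_j >= -tol for x_j in x)
    within_capacity = all(u <= cap + tol for u, cap in zip(matvec(A, x), b))
    balanced = all(abs(r - l) <= tol for r, l in zip(matvec(R, x), lam))
    return nonnegative and within_capacity and balanced


# Hypothetical tandem example: activity 1 serves material 1 and turns it into
# material 2, which is then served by activity 2; one resource per activity.
A = [[0.5, 0.0],    # resource 1 spends 0.5 capacity units per unit of activity 1
     [0.0, 0.25]]   # resource 2 spends 0.25 capacity units per unit of activity 2
R = [[1.0, 0.0],    # activity 1 consumes material 1
     [-1.0, 1.0]]   # activity 1 produces material 2, activity 2 consumes it
b = [1.0, 1.0]
lam = [1.5, 0.0]    # only material 1 arrives exogenously

print(is_feasible_plan(A, R, b, lam, [1.5, 1.5]))
```

In this toy instance the plan `[1.5, 1.5]` processes both materials at the arrival rate while using 75% and 37.5% of the two resource capacities.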

Chapter 3

Wait Time Estimators for Two Queues in Tandem

Healthcare systems are complex queueing networks. As illustrated in Figure 3.1, a patient visit to an ED includes registration and preliminary exams by a nurse and then a doctor, which might be followed by a series of tests (lab tests, imaging, etc.) and sometimes by checkups or treatments provided by a nurse. The doctor then decides whether to release or hospitalize the patient. Figure 3.1 presents a scheme of the ED process. We immediately observe that the resulting queueing network consists of queues in tandem, queues in parallel, and fork-join queues. While some of the stations are single-server stations (e.g. nurse admission), others have multiple servers (e.g. lab tests). The fact that most patients return to the ED physician more than once imposes yet another challenge, as noted in Yom-Tov and Mandelbaum [41]. Huang et al. [17], for example, distinguished between two types of patients: triage (arriving) patients and in-process (IP) patients, where each of these types can be divided into several classes according to the medical severity of the patient. They showed that the priority policy the physician should apply depends on the triage score as well as on how advanced the patient is in the entire process. Even though most EDs do not apply such a complicated priority scheme, prioritizing patients according to severity is very common. Note that in reality, patients' priorities, as well as their paths through the ED process, may not be known in advance and may change during their ED visit.

Another characteristic of this environment is its time-variability, as was shown clearly in Armony et al. [4]. As a result of this variability the system may alternate between overloaded, critically loaded and underloaded periods, or equivalently between high, medium, and no wait for specific activities during one's stay. As discussed in Senderovich et al. [37], different wait time predictors are appropriate under different circumstances. For example, they compared queue-length based predictors with snapshot principle predictors (Last-to-Enter-Service, Head-of-the-Line) and with predictors based on past delays of served customers (denoted by PTS). They showed that queue-length based predictors performed poorly in heavily loaded systems compared to the other predictors. However, in moderate loads, the queue-length based predictors outperformed the PTS predictors. The snapshot principle predictors were superior in both heavy and moderate loads. It is important to note that the predictors were tested in a call center environment, in which the assumption that the queue length does not change significantly during the wait time of a single customer is fairly reasonable. Our preliminary ED data analysis revealed that the average length of stay in the ED is around 5 hours (and actually depends on the time of arrival, with a substantial decrease for patients arriving around 8:00). As shown in Armony et al. [4], over such a long time the number of patients in the ED may change significantly; hence the snapshot principle may not be adequate in this case. The aforementioned properties of a healthcare system therefore imply that it is necessary to examine several types of wait time predictors, as there may not be one unified predictor that is appropriate for the whole system at all times.

Following our first goal of developing wait time estimators for service networks, we start by creating predictors for the real-time wait time distribution in a network of two tandem queues; those queues may be time-varying and multi-server with different loads (Section 3.2). Other features that were discussed in Chapter 1 (such as different priorities and unknown patient routes) are not considered here but will be part of our planned research. We start with a simple example of two queues in tandem. These two queues may correspond to the first two stations in the ED (after registration): the preliminary exams by a nurse and then by a doctor. Exploratory analysis of our ED data revealed that the sojourn time in both stations is well described by a log-normal distribution, as seen in Figures 3.2 and 3.3 (this is actually also true for the X-Ray station).

Figure 3.1: Flow chart of an ED process. (Diagram labels: Entrance, Reception, Triage Queue, Triage (nurse), Doctor Queue, Doctor Examination, Nurse Queue, Nurse, Lab tests Queue, Lab, Imaging Queue, Imaging CT/US/X-Ray, Interpretation Queue, Imaging Interpretation, Consultant Queue, Consultants, External exams Queue, External exams Eco/Gastro/Exercise Stress Test, External exam Interpretation, Treatment, Waiting for Hospitalization, Discharge, Exit ED; legend: Resource queue, Synchronization queue.)

3.1 Wait Time Distribution Given Number of Customers in Both Queues: Exact Analysis

In this section we take an exact approach to estimating the waiting time distribution for a customer reaching a network of two $M_t/M/1$ queues in tandem, with FCFS service policy, and encountering $x_1$ customers in the first station (queue+service) and $x_2$ customers in the second station (queue+service). We shall call this customer the target customer. Let $\mu_1$ be the service rate at the first station and $\mu_2$ the service rate at the second station.
Following the approach in [31], we can describe the system as a Markov jump process with states $(i, j) \in S$, where $i$ represents the number of customers in station 1 and $j$ represents the number of customers in station 2. Since we are assuming FCFS policy, we can stop all exogenous arrivals after the target customer's arrival. State $(x_1 + 1, x_2)$ will be the initial state, and $(0, 0)$ the absorbing state; all other states are hence transient. The sojourn time in the system will then have a phase-type distribution, namely the distribution of the absorption time when starting at $(x_1 + 1, x_2)$.

Figure 3.2: Sojourn time at the nurse admission station, April 2014 to May 2015, all days.

Figure 3.3: Sojourn time at the doctor station, April 2014 to May 2015, all days.

Let the matrix $R$ be the transient part of the infinitesimal generator, and $q$ the initial distribution, $q = (1, 0, \ldots, 0)$. The system moves from state $(i, j)$ to state $(i-1, j+1)$ when a service completion in station 1 occurs before a service completion in station 2. This happens with rate $\mu_1$. The system moves from state $(i, j)$ to state $(i, j-1)$ when a service completion in station 2 occurs before a service completion in station 1. This happens with rate $\mu_2$. Thus, the off-diagonal entries of $R$ are of the form:

$$R_{(i,j),(i-1,j+1)} = \mu_1, \quad i = 1, \ldots, x_1 + 1, \quad j = 0, \ldots, x_2 + x_1 + 1 - i,$$
$$R_{(i,j),(i,j-1)} = \mu_2, \quad i = 0, \ldots, x_1 + 1, \quad j = 1, \ldots, x_2 + x_1 + 1 - i,$$

with each diagonal entry equal to minus the total rate out of the corresponding state. The resulting transition diagram is given in Figure 3.4.

3.1.1 Sojourn Time

Let $T$ be the absorption time, or equivalently, the sojourn time in the system for a target customer arriving when there are $x_1$ customers in station 1 and $x_2$ customers in station 2. The cumulative distribution function of $T$ is therefore:

$$F_T(t) = 1 - q \exp(tR)\, \mathbf{1},$$


where $\mathbf{1}$ is a vector of ones.

Figure 3.4: Transition diagram, two M/M/1 queues in tandem.

We simulated four scenarios of the two queues in tandem; in all four we set $\lambda = 1$. The time spent in each of the queues depends on the service rate in each of them and on the number of customers already in the system at the time of arrival, $(x_1, x_2)$. Thus, the four scenarios correspond to:

Scenario 1 (denoted S1): $\mu_1 > \mu_2$, $x_1 > x_2$.
Scenario 2 (denoted S2): $\mu_1 < \mu_2$, $x_1 > x_2$.
Scenario 3 (denoted S3): $\mu_1 > \mu_2$, $x_1 < x_2$.
Scenario 4 (denoted S4): $\mu_1 < \mu_2$, $x_1 < x_2$.

We chose the service rate parameters to be close to the arrival rate (which is 1) so that the simulation easily reaches reasonable queue lengths in both stations. Hence, we chose $\mu_1 = 1.2$ and $\mu_2 = 1.5$, and vice versa. The number of customers in each of the stations was chosen to be $x_1 = 12$ and $x_2 = 5$, and vice versa. These values were chosen in order to create a substantial difference between the two queues, yet simulate a system of a size that is reasonable for an ED, for example. In each of the cases the simulation ran for more than 2 iterations. Figure 3.5 presents the CDF of the total sojourn time of the target customer (in blue) and the CDF of the absorption time in the corresponding phase-type distribution (in red). We see that, as expected, in all four cases there is an excellent fit.

3.1.2 Wait Times

Let $W_1$ be the customer's waiting time in the first queue. A customer who finds $x_1$ customers in front of her, assuming FCFS policy, needs to wait a total of $x_1$ service times, each exponentially distributed with rate $\mu_1$. Hence,

$$(W_1 \mid Q_1 = x_1, Q_2 = x_2) \sim \mathrm{Gamma}(x_1, \mu_1).$$
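As a numerical illustration of the two distributions above, the following sketch evaluates the phase-type CDF $F_T(t)$ of the sojourn time and the Gamma CDF of $W_1$ in plain Python. It is only a sketch under stated assumptions: instead of forming the matrix exponential $\exp(tR)$ directly, it uses uniformization, a standard equivalent way of computing the transient distribution of a finite Markov chain, and the names `sojourn_cdf` and `erlang_cdf` as well as the truncation tolerance `tol` are hypothetical choices made for this illustration.

```python
import math


def sojourn_cdf(t, x1, x2, mu1, mu2, tol=1e-12):
    """P(T <= t) for a target customer who finds x1 and x2 customers in
    stations 1 and 2; exogenous arrivals are stopped after her arrival."""
    rate = mu1 + mu2                      # uniformization rate
    mass = {(x1 + 1, x2): 1.0}            # mass on transient states (i, j)
    poisson = math.exp(-rate * t)         # Poisson(rate * t) weight of n = 0
    survival, covered, n = 0.0, 0.0, 0
    while covered < 1.0 - tol and n < 100000:
        survival += poisson * sum(mass.values())
        covered += poisson
        nxt = {}                          # one step of P = I + R / rate
        for (i, j), p in mass.items():
            out = (mu1 if i > 0 else 0.0) + (mu2 if j > 0 else 0.0)
            if i > 0:                     # service completion at station 1
                key = (i - 1, j + 1)
                nxt[key] = nxt.get(key, 0.0) + p * mu1 / rate
            if j > 0 and (i, j - 1) != (0, 0):   # completion at station 2
                key = (i, j - 1)
                nxt[key] = nxt.get(key, 0.0) + p * mu2 / rate
            stay = (rate - out) / rate    # fictitious self-transition
            if stay > 0.0:
                nxt[(i, j)] = nxt.get((i, j), 0.0) + p * stay
        mass = nxt
        n += 1
        poisson *= rate * t / n
    return 1.0 - survival


def erlang_cdf(t, k, mu):
    """P(Gamma(k, mu) <= t) for an integer shape k >= 0 (k = 0: no wait)."""
    if k == 0:
        return 1.0
    term, tail = math.exp(-mu * t), 0.0
    for n in range(k):
        tail += term
        term *= mu * t / (n + 1)
    return 1.0 - tail


# Scenario with mu1 = 1.2, mu2 = 1.5, x1 = 12, x2 = 5, evaluated at t = 15.
print(sojourn_cdf(15.0, 12, 5, 1.2, 1.5), erlang_cdf(15.0, 12, 1.2))
```

Uniformization avoids any linear-algebra dependency and only touches the states reachable from $(x_1 + 1, x_2)$. For long horizons, where `rate * t` is in the hundreds, the initial Poisson weight underflows and a scaled variant would be needed.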

Figure 3.5: Simulation results, CDF of sojourn time in the system.

We define $T_1$ as the time elapsed from the customer's arrival until her exit from the first station (given $x_1$ customers in the first station at her arrival). It is given by

$$T_1 \sim \mathrm{Gamma}(x_1 + 1, \mu_1).$$

We define $T_2$ as the time required for the server in station 2 to serve all the $x_2$ customers in the second station; then

$$T_2 \sim \mathrm{Gamma}(x_2, \mu_2).$$

Let $T_3$ be the time required for the server in station 2 to serve all $x_1 + x_2$ customers in the system, if they were all present in the second station; then

$$T_3 \sim \mathrm{Gamma}(x_1 + x_2, \mu_2).$$

We note that there might be cases in which the server at station 2 is idle but there are still customers in station 1 (in the transition diagram in Figure 3.4, these are states $(i, 0)$, $i > 0$). We define $K$ as the number of time lags that occur whenever the server at station 2 is idle and the server at station 1 is busy. We can think of $K$ as the number of states of the form $(i, 0)$, $i > 0$, visited in the path from state $(x_1 + 1, x_2)$ to state $(0, 0)$; thus $K \le x_1 + 1$. From the memoryless property of the exponential distribution, each of these time lags is exponentially distributed with rate $\mu_1$. We denote by $L$ the sum of these time lags; then

$$L_k := (L \mid K = k) \sim \mathrm{Gamma}(k, \mu_1).$$

Let $W_2$ be the target customer's waiting time in the second queue; then

$$
\begin{aligned}
P(W_2 > t \mid Q_1 = x_1, Q_2 = x_2) &= P(T_3 + L - T_1 > t) \\
&= \sum_{k=0}^{x_1+1} P(T_3 + L - T_1 > t \mid K = k)\, P(K = k) \\
&= \sum_{k=0}^{x_1+1} P(T_3 > t + T_1 - L_k \mid K = k)\, P(K = k).
\end{aligned}
$$

We note that for each $k = 0, 1, \ldots, x_1 + 1$:

$$T_1 \sim L_k + L_{x_1+1-k}.$$

Hence,

$$P(T_3 > t + T_1 - L_k \mid K = k) = P(T_3 > t + L_{x_1+1-k}), \qquad L_{x_1+1-k} \sim \mathrm{Gamma}(x_1 + 1 - k, \mu_1),$$

and we get

$$
\begin{aligned}
P(W_2 > t \mid Q_1 = x_1, Q_2 = x_2) &= \sum_{k=0}^{x_1+1} \left[ \int_0^\infty P(T_3 > t + L_{x_1+1-k} \mid L_{x_1+1-k} = s)\, f_{L_{x_1+1-k}}(s)\, ds \right] P(K = k) \\
&= \sum_{k=0}^{x_1+1} \left[ \int_0^\infty \sum_{i=0}^{x_1+x_2-1} \frac{((t+s)\mu_2)^i e^{-(t+s)\mu_2}}{i!} \cdot \frac{\mu_1^{x_1+1-k}}{\Gamma(x_1+1-k)}\, s^{x_1-k} e^{-\mu_1 s}\, ds \right] P(K = k).
\end{aligned}
\tag{3.1}
$$

It is left to find the distribution of $K$; then, by the law of total probability, the calculation of the waiting time distribution at the second queue will be complete. We note that entry $(x, y)$ in the fundamental matrix

$$N = [I - R]^{-1}$$

represents the probability to visit state $x$ given that we have started at state $y$. Hence, we can use the fundamental matrix to find the probability of reaching each of the states $(i, 0)$, $i > 0$, starting at $(x_1 + 1, x_2)$. Nevertheless, using this method to find the distribution of $K$ may still be somewhat tedious. Our aim is to find a simple expression for the distribution of $K$. We shall also examine ways to bypass the conditioning on the value of $K$, for example, by replacing it with the expected value of $K$. In some cases we can greatly simplify the calculations above and eliminate the need for working with the phase-type matrices.
By conditioning on the order of $T_1$ and $T_2$, for example, we get

$$
\begin{aligned}
P(W_2 > t \mid Q_1 = x_1, Q_2 = x_2) ={}& P(W_2 > t \mid Q_1 = x_1, Q_2 = x_2, T_1 < T_2)\, P(T_1 < T_2 \mid Q_1 = x_1, Q_2 = x_2) \\
&+ P(W_2 > t \mid Q_1 = x_1, Q_2 = x_2, T_1 > T_2)\, P(T_1 > T_2 \mid Q_1 = x_1, Q_2 = x_2).
\end{aligned}
$$

Given $T_2 > T_1$, it is clear that $K = 0$, since this means that while server 2 serves the $x_2$ customers who are already there, all $x_1$ customers and the target customer move from station 1 to station 2; hence there is no idleness in station 2. Therefore,

$$P(W_2 > t \mid Q_1 = x_1, Q_2 = x_2, T_1 < T_2) = \int_0^\infty P(T_3 - T_1 > t \mid T_1 = s)\, f_{T_1}(s)\, ds. \tag{3.2}$$

Thus, we can first calculate $P(T_1 < T_2 \mid Q_1 = x_1, Q_2 = x_2)$ and, if it is close to 1, use (3.2) instead of (3.1) to find the wait time distribution at the second queue. Figure 3.6 presents, for the four scenarios specified above, the empirical CDF of the wait time of the target customer at the second queue (in blue), and the results of

$$1 - \int_0^\infty P(T_3 - T_1 > t \mid T_1 = s)\, f_{T_1}(s)\, ds$$

(in red), which is the wait time estimator for the second queue given $T_1 < T_2$. Since the chosen service rates in both stations were quite close, it turns out that the initial number of customers in each station had more effect on the probability that $T_1 < T_2$. We can see that in scenarios 3 and 4, where $x_1 = 5$ and $x_2 = 12$, the probability that $T_2 > T_1$ is close to one; hence there is a very good fit. However, in the other two scenarios, $1 - \int_0^\infty P(T_3 > t + s)\, f_{T_1}(s)\, ds$ results in higher values than the actual CDF. This is due to the fact that, without considering the time lags in which server 2 is idle, the total time it takes server 2 to clear all the customers in the system ($x_1 + x_2$) is estimated to be shorter than it actually is. We would have expected the worst fit to be obtained for scenario 2, where the service rate at station 2 is higher and there are more customers at station 1 at the time of arrival. However, it seems that scenario 1 presents the worst fit, although in this case station 1 has the higher service rate.

To summarize this section, we note that the exact analysis approach has some limitations:

1. When the first station has more than 1 server, skips and slips during service might occur. This means that the target customer may enter the second queue before customers who were ahead of her at the first queue; in addition, customers arriving after the target customer might enter the second queue before her. One may assume that, on average, the number of skips equals the number of slips; hence the above analysis would hold for the case of more than 1 server at the first station. We shall examine this using simulation.

2. When $P(T_2 > T_1) < 1$, one needs to use a matrix exponential in order to either find the phase-type distribution of the time to absorption, or to find the distribution of $K$ (the number of time lags in which server 2 is idle). This approach suffers from the curse of dimensionality as the number of stations in the system increases, or even if the initial state $(x_1, x_2)$ is large.

3. This analysis allows only the arrivals to be time-varying; it does not support time-varying service rates or staffing.

Given the above limitations of the exact analysis approach, we now turn to examine heavy-traffic approximations for the system of two queues in tandem. Our goal is to find approximations for the wait time distribution at each of the queues, as well as an approximation for the total sojourn time distribution.
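A minimal Monte Carlo sketch, under the same assumptions as the exact analysis of Section 3.1 (single-server FCFS stations, exponential service, no arrivals after the target customer), can be used to check the quantities discussed in this section: the distribution of $K$, the probability that $T_1 < T_2$, and the tail of $W_2$. The function names `simulate_target` and `estimate`, the seed, and the number of runs are hypothetical choices for illustration; this is not the simulation that produced Figures 3.5 and 3.6.

```python
import random


def simulate_target(x1, x2, mu1, mu2, rng):
    """Simulate one path from state (x1 + 1, x2) until the target customer
    enters service at station 2. Returns (W2, K, T1 < T2)."""
    # Times at which the x1 customers ahead, and then the target, leave
    # station 1 and join station 2.
    clock, arrivals2 = 0.0, []
    for _ in range(x1 + 1):
        clock += rng.expovariate(mu1)
        arrivals2.append(clock)
    t1 = arrivals2[-1]
    # Server 2 first clears the x2 customers already present (this is T2).
    t2 = sum(rng.expovariate(mu2) for _ in range(x2))
    free_at, lags = t2, 0
    # Customers ahead of the target are served FCFS at station 2.
    for arrival in arrivals2[:-1]:
        if arrival > free_at:      # server 2 was idle while station 1 was busy
            lags += 1
        free_at = max(arrival, free_at) + rng.expovariate(mu2)
    if t1 > free_at:               # the target herself finds server 2 idle
        lags += 1
    return max(0.0, free_at - t1), lags, t1 < t2


def estimate(x1, x2, mu1, mu2, t, runs=20000, seed=0):
    """Estimate P(W2 > t), P(T1 < T2) and the distribution of K."""
    rng = random.Random(seed)
    exceed = ordered = 0
    k_counts = {}
    for _ in range(runs):
        w2, k, t1_first = simulate_target(x1, x2, mu1, mu2, rng)
        exceed += w2 > t
        ordered += t1_first
        k_counts[k] = k_counts.get(k, 0) + 1
    k_dist = {k: c / runs for k, c in sorted(k_counts.items())}
    return exceed / runs, ordered / runs, k_dist


# Scenario with mu1 = 1.2, mu2 = 1.5, x1 = 12, x2 = 5, evaluated at t = 1.
print(estimate(12, 5, 1.2, 1.5, t=1.0))
```

On every simulated path where $T_1 < T_2$ the sketch returns $K = 0$, in line with the argument leading to (3.2); the empirical distribution of $K$ gives a quick indication of how much probability mass the simplified estimator ignores in a given scenario.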

Figure 3.6: Simulation results, CDF of wait time at the second queue. (Simulation (blue), estimation given $T_1 < T_2$ (red).)

3.2 Many-Server Heavy Traffic Approximations, Time Varying

Following Mandelbaum et al. [27] in analyzing the dynamics of queueing networks, we consider a network of two $M_t/M_t/n_t$ queues in tandem, with FCFS policy. We start with fluid analysis, in which we establish the fluid limits of the queue length processes and derive the fluid limits of the wait time processes in each of the two queues. The fluid limit will provide an approximation for the expected wait time at each of the queues, and an approximation for the expected sojourn time. We shall then use diffusion analysis to derive a Gaussian approximation for the wait time in both queues. Let $Q_1(t)$ and $Q_2(t)$ be the number of customers at time $t$ in the first and second station (queue+service), respectively. The following notation will be of use:

$\lambda_t$ - Arrival rate at the first station, at time $t$.
$\mu_{1t}$ - Individual service rate at station 1, at time $t$.
$\mu_{2t}$ - Individual service rate at station 2, at time $t$.
