Blind Fair Routing in Large-Scale Service Systems with Heterogeneous Customers and Servers


OPERATIONS RESEARCH, Vol. 61, No. 1, January-February 2013, ISSN 0030-364X (print), 2013 INFORMS

Amy R. Ward, Marshall School of Business, University of Southern California, Los Angeles, California 90089
Mor Armony, Stern School of Business, New York University, New York, New York 10012

In a call center, arriving customers must be routed to available servers, and servers that have just become available must be scheduled to help waiting customers. These dynamic routing and scheduling decisions are very difficult, because customers have different needs and servers have different skill levels. A further complication is that these decisions should preferably be made blindly; that is, they should depend only on the system state and not on system parameter information such as call arrival rates and service speeds, because this information is generally not known with certainty. Ideally, a dynamic control policy for making routing and scheduling decisions balances customer and server needs by keeping customer delays low while still dividing the workload fairly among the various servers. In this paper, we propose a blind dynamic control policy for parallel-server systems with multiple customer classes and server pools that is based on the number of customers waiting and the number of agents idling. We show that in the Halfin-Whitt many-server heavy-traffic limiting regime, our proposed blind policy performs extremely well when the objective is to minimize customer holding costs subject to server fairness, as defined by how the system idleness is divided among servers. To do this, we formulate an approximating diffusion control problem (DCP) and compare the performance of the nonblind DCP solution to a feasible policy for the DCP that is blind. We establish that the increase in the DCP objective function value is small over a wide range of parameter values.
We then use simulation to validate that a small increase in the DCP objective function value is indicative of our proposed blind policy performing very well.

Subject classifications: probability; diffusion: stochastic model applications; queues; approximations; diffusion models.
Area of review: Stochastic Models.
History: Received February 2; revisions received October 2, April 2012, June 2012; accepted September 2012. Published online in Articles in Advance February 8, 2013.

1. Introduction

Large-scale call centers and other parallel-server service systems with heterogeneous customer and server populations have created the need for skill-based control policies that dynamically match waiting customers and available servers. These dynamic control policies have two components: a routing component that specifies what to do when a customer arrives to the system, and a scheduling component that specifies what to do when a server completes service and becomes available. Many such dynamic control policies have been developed in industry and academia over the last two decades. Traditionally, these policies have been customer-centric in the sense that they have focused on customer-related goals, such as assigning customers to servers whose expertise best matches the needs of these customers, and minimizing customer wait time. More recently, the recognition that employee satisfaction is also important for business success has led to policies that also consider server-centric goals, such as considering server preference when assigning customers to servers and making sure all servers have some idle time; see, for example, §5 in Aksin et al. (2007) and Whitt (2006). This leads to the desire for dynamic control policies that balance customer and server goals.
For example, the control policies in Armony (2005), Gurvich and Whitt (2009b), and Dai and Tezcan (2008) asymptotically minimize customer delay cost, but this is at the expense of the faster servers having a heavier workload, and the slower servers experiencing almost all of the idle time. This is unfair. At the same time, fairness has been shown by psychologists to be a key component in employee satisfaction (Cohen-Charash and Spector 2001, Colquitt et al. 2001). This motivates defining a fair policy and having a revised goal of minimizing customer delay cost within the smaller class of fair policies, as observed in Armony and Ward (2010).

It is also the case that model parameters may not be known. In particular, parameters such as arrival rates and mean service times can have large forecast errors associated with them. However, many of the dynamic control

policies for parallel-server systems that have been proposed rely on the assumption that these parameters are known. It is well recognized that this issue calls for policies that either estimate these parameters in real time, or do not use parameter information at all, which is true of the QIR policies in Gurvich and Whitt (2009a). As in Atar et al. (2011), we refer to the latter as blind policies.

We are interested in finding a blind dynamic control policy that has small customer holding or delay cost and also satisfies fairness constraints with respect to how the system idleness is divided among servers. We would also like the policy to be simple and easily implementable. The first issue is that finding a dynamic control policy that minimizes customer holding or delay cost subject to fairness constraints is very difficult, even when there is no requirement that the policy is blind. This is because finding dynamic control policies for parallel-server systems that solve some defined optimization problem is notoriously difficult, due to the curse of dimensionality (many call centers have hundreds to thousands of agents, and the time required to solve this problem numerically becomes prohibitively long as the number of agents becomes large). Furthermore, optimal policies that are simple and easily implementable can, in general, only be found for relatively simple models. For larger and more complex models, there is a need for simple and easily implementable policies that are approximately optimal. Therefore, we follow the approach of looking for control policies, and evaluating their performance, in the Halfin-Whitt many-server heavy-traffic limit regime.
In relation to our first paper on the topic of server fairness (Armony and Ward 2010), which found an asymptotically optimal fair policy in the Halfin-Whitt limit regime, our main focus in this paper is to evaluate the performance of fair policies that are blind. We consider the diffusion control problem (DCP) that approximates the optimization problem whose objective is to minimize customer holding cost subject to a fairness constraint that is based on the number of idle servers. This DCP also approximates the optimization problem whose objective is to minimize customer delay cost subject to a fairness constraint that is based on agent idle time. We solve the DCP by proving that it is separable; that is, its solution can be found by solving two simpler DCPs, one that involves only routing and one that involves only scheduling, both of whose solutions can be found in the literature.

However, the solution to the DCP is not blind. There is a very simple, intuitive blind control policy for the DCP that keeps the idleness ratios fixed (as opposed to the DCP solution, under which the idleness ratios depend on the diffusion's state). We show that the percentage increase in the DCP objective function value over its minimum when the blind control is used can be calculated by considering a two-pool inverted-V model (that is, a system with one customer class and servers that can serve at one of two speeds). This provides a closed-form analytic expression for the percentage increase in the DCP objective function value when the blind control is used instead of the optimal control. We show that this percentage increase is small in a large part of the parameter space. The fact that the impact of using a blind control for the DCP is small suggests that the impact of using a blind control policy in the parallel-server system model is also small. We test the accuracy of this statement using simulation and find supporting evidence for it.
We complete this introduction with a brief review of the most relevant literature. Then, in §2, we present our model and problem formulation. We solve the relevant DCP in §3. In §4, we propose a blind control policy that is feasible for the DCP, and evaluate the percentage increase in the DCP objective function value over its minimum when that blind control is used. In §5, we propose a blind control policy for our model that is based on the number of customers waiting and the number of servers idling, and establish via simulation that a small increase in the DCP objective function value is indicative of our proposed blind policy performing very well. In §6, we present an analogous problem formulation that is expressed in terms of the customer waiting time and server idling time, and propose a corresponding blind policy. A finite time horizon problem formulation that requires fairness at all times is proposed in §7. Our proposed blind policy is shown to be asymptotically optimal with respect to this finite-horizon formulation. Finally, we make concluding remarks in §8. An electronic companion to this paper is available as part of the online version. All the proofs are presented in the e-companion, as well as some supporting technical material for arguments in the main paper body.

Literature Review

The literature on skill-based routing is extensive. A recent survey of this literature is included in Aksin et al. (2007). Here we survey only the most closely related stream of literature. We begin with the paper by Gurvich and Whitt (2009b). That paper shows that a generalized cμ (Gcμ) scheduling rule combined with a fastest-server-first routing rule asymptotically stochastically minimizes customer delay costs over a finite horizon in a parallel-server system with multiple customer classes and multiple server pools, when the delay cost functions are strictly convex, in the Halfin-Whitt many-server limiting regime.
The Gcμ scheduling rule follows Van Mieghem (1995) and Mandelbaum and Stolyar (2004), who proved asymptotic optimality of this scheduling rule in the conventional heavy-traffic limit. The Gcμ scheduling rule is interesting in large part because it is blind when the service rate depends on the server pool only, and not also on the customer class (as is the case in Gurvich and Whitt 2009b). However, under fastest-server-first routing, nearly all of the system idle time is experienced by the servers from the pool with the slowest service speed. From the server perspective, this is unfair.

It is also true in the closely related work by Dai and Tezcan (2008) that nearly all the system idle time is experienced by the servers from the pool with the slowest service speed. Their model is a parallel-server system model that also allows for customer reneging, and they show an asymptotic optimality result that is similar to Gurvich and Whitt (2009b), but uses different proof techniques. The holding and reneging costs are restricted to be linear, and the reneging rates are assumed to be ordered so that classes that are less expensive to hold also have higher reneging rates.

There are several recent papers that address the server fairness issue, all in the context of the inverted-V model. Armony and Ward (2010) propose a threshold routing rule, where customers are assigned to server pools based on a set of thresholds on the total number of customers in the system. Atar (2008) proposes a blind policy that routes newly arrived customers to the server that has been idle the longest. Atar et al. (2011) analyze the blind policy that routes newly arrived customers to the pool that has the longest weighted cumulative idleness process, where the cumulative idleness process for a particular pool is defined by summing over the idle times each server in that pool has experienced since time 0. A similar policy is analyzed by Reed and Shaki (2012) under more general assumptions on the service time distribution. The policy proposed in Mandelbaum et al. (2012) balances the number of idle servers from each pool by routing newly arrived customers randomly, with the probability that a given pool is chosen being proportional to the number of idle servers in that pool. To the best of our knowledge, this is the first paper that considers the server fairness issue in a more general parallel-server system that has both multiple server pools and multiple customer classes.

2. Model and Problem Formulation

Consider a parallel-server system (PSS), as shown in Figure 1, with fixed sets $\mathcal{I} = \{1, \ldots, I\}$ that represents customer classes and $\mathcal{J} = \{1, \ldots, J\}$ that represents server types (each type in its own server pool). We let $A_i$, $i \in \mathcal{I}$, be independent, unit-rate Poisson processes, and $A_i(\lambda_i t)$ represent the cumulative number of class $i$ arrivals up to time $t > 0$, so that $\lambda_i > 0$ is the class $i$ arrival rate. The total arrival rate to the system is $\lambda = \sum_{i \in \mathcal{I}} \lambda_i$. Service times are independent and exponential, and the average service time of a customer served by a server from pool $j$ is $1/\mu_j$. There are $N_j$ servers in pool $j$. We let $N = \sum_{j \in \mathcal{J}} N_j$ be the total number of servers and $\vec{N} = (N_1, \ldots, N_J)$ denote the staffing vector. Here and elsewhere, $\vec{x}$ is used to denote a vector whose elements are $x_1, x_2, \ldots$. Note that we follow the notation in Gurvich and Whitt (2009b).

The set of possible assignments of customers to servers in this system can be represented as the set of edges in the bipartite graph formed from the set of nodes $\mathcal{I} \cup \mathcal{J}$ having allowable edges $\mathcal{E} = \{(i, j) \in \mathcal{I} \times \mathcal{J}\}$. An edge $(i, j) \in E$ corresponds to pool $j$ being able to serve class $i$.

[Figure 1. An example parallel-server system.]

Given the assignment graph $G = (\mathcal{I} \cup \mathcal{J}, E)$ with $E \subseteq \mathcal{E}$, we let

$$I(j) = \{ i \in \mathcal{I} : (i, j) \in E \}, \qquad J(i) = \{ j \in \mathcal{J} : (i, j) \in E \}.$$

In words, $I(j)$ is the set of classes that a pool $j$ server can serve, and $J(i)$ is the set of all server pools that can serve class $i$.

A control policy has two components: a routing component that specifies what to do when a customer arrives to the system, and a scheduling component that specifies what to do when a server completes service and becomes available. Denote by $\pi = \pi(\lambda, \vec{N})$ a policy that operates in the system with arrival rate $\lambda$ and staffing vector $\vec{N}$ (in general, we omit the arguments $\lambda$ and $\vec{N}$ when it is clear from the context which arguments should be used). Let $t$ be an arbitrary time point. We denote the queue length of class $i$ customers by $Q_i(t)$ and the number of idle servers in pool $j$ by $I_j(t)$.
Then, the total number of customers queueing is $Q(t) = \sum_{i \in \mathcal{I}} Q_i(t)$, and the total number of idle servers is $I(t) = \sum_{j \in \mathcal{J}} I_j(t)$. Let $Z_j(t)$ be the number of busy servers in pool $j$. (Note that we do not need to track which servers are serving which customer classes, because we have assumed that the rate at which a server serves a customer depends only on the server's pool, and not on the customer's class.) It follows that $I_j(t) = N_j - Z_j(t)$ and $I(t) = N - \sum_{j \in \mathcal{J}} Z_j(t)$. We let $S_j$, $j \in \mathcal{J}$, be independent, unit-rate Poisson processes, also independent of $A_i$, $i \in \mathcal{I}$, so that

$$D_j(t) = S_j\left( \mu_j \int_0^t Z_j(s)\, ds \right) = S_j\left( \mu_j \int_0^t \big( N_j - I_j(s) \big)\, ds \right)$$

represents the cumulative number of service completions by pool $j$ servers. The overall number of customers in the system is

$$X(t) = \sum_{i \in \mathcal{I}} Q_i(t) + \sum_{j \in \mathcal{J}} Z_j(t) = X(0) + \sum_{i \in \mathcal{I}} A_i(\lambda_i t) - \sum_{j \in \mathcal{J}} D_j(t).$$

We omit the time argument when we refer to an entire process. Also, we omit $\pi$ from the notation unless it is necessary to avoid confusion between different routing policies.

Let $\Pi$ be the set of nonanticipating, nonpreemptive policies, under which a steady state exists for $X$, $Q_i$, $i \in \mathcal{I}$, and $Z_j$, $j \in \mathcal{J}$. Nonanticipating (roughly speaking) means that the policy cannot require knowledge of the future. By nonpreemptive, we mean that once a service has begun by a particular server, it cannot be reassigned to another server of a different pool, nor can it be preempted by another customer. We also assume that any policy $\pi \in \Pi$ serves customers first-come first-served within each customer class and, similarly, that servers become busy in accordance with a first-idle first-busy policy within each server pool. Additionally, we consider policies that obtain a unique steady-state distribution with a finite mean for all the above relevant processes. Accordingly, we use $\infty$ as the time argument for the various processes to denote the process in steady state. We call a policy $\pi \in \Pi$ an admissible policy and restrict our attention to such policies.

Our stated objective in the introduction is to find a blind policy for scheduling and routing that minimizes customer holding costs subject to the requirement that in the long run each server pool experiences a proportionate share of the server idleness. In order to do this, it is helpful to first formulate the relevant stochastic minimization problem when we ignore the desire to have a blind policy. Let $C_i : \mathbb{R}_+ \to \mathbb{R}_+$ for $i \in \mathcal{I}$ be functions that represent the class $i$ holding cost. Let $f_j \in [0, 1]$ be such that $\sum_{j=1}^{J} f_j = 1$.
Then, the relevant stochastic minimization problem is

$$\begin{aligned} \underset{\pi \in \Pi}{\text{minimize}} \quad & E\Big[ \sum_{i \in \mathcal{I}} C_i\big(Q_i(\infty)\big) \Big] \\ \text{subject to:} \quad & E\big[I_j(\infty)\big] = f_j\, E\big[I(\infty)\big], \quad j \in \mathcal{J}. \end{aligned} \tag{1}$$

We call the constraints in (1) the fairness constraints, and the parameters $f_j$, $j \in \mathcal{J}$, the fairness parameters.

The problem formulation (1) is based on the number of customers in each queue and the number of idle servers in each pool. One natural question is whether it is preferable to pose the problem formulation in terms of individual customers' delay times and individual servers' idling times instead. We show that these two problem formulations are equivalent in §6. We begin with the problem formulation (1) because the presentation is cleaner. It is also true that the problem formulation (1) is a steady-state formulation, as is common in the literature (see, for example, Garnett et al. 2002, Mandelbaum and Zeltyn 2009, Gurvich et al. 2008). An interesting question is whether the fairness constraints should require that fairness be achieved at every point in time. We delay discussion of this issue to §7.

The ideal is to first solve (1), and to then search for a blind policy having average customer holding cost that is not too much larger than the optimal value of (1). However, it does not appear possible to solve (1) exactly. Instead, we formulate and solve the diffusion control problem (DCP) associated with (1) that arises in the Halfin-Whitt many-server heavy-traffic limiting regime. We then propose a blind policy that is feasible for the DCP, and develop bounds on the performance of that blind policy relative to the DCP solution.

2.1. The Many-Server Heavy-Traffic Limiting Regime

Our approach in solving (1) is to solve it asymptotically in the Halfin-Whitt many-server heavy-traffic limit regime (Halfin and Whitt 1981); i.e., for large systems (systems with many servers and large demand) that are heavily loaded. Specifically, we consider a family of systems indexed by the aggregate arrival rate $\lambda$ and let $\lambda \to \infty$.
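The service rates $\mu_j$, $j \in \mathcal{J}$, the routing graph $G$, and the ratios $a_i = \lambda_i / \lambda$ are all held fixed. The associated family of staffing vectors is $\vec{N}^\lambda = (N_1^\lambda, \ldots, N_J^\lambda)$. Our convention is to superscript all processes and quantities associated with the system having arrival rate $\lambda$ by $\lambda$. We also define the scaled processes

$$\hat{Q}_i^\lambda(t) = \frac{Q_i^\lambda(t)}{\sqrt{\lambda}}, \ i \in \mathcal{I}, \qquad \hat{I}_j^\lambda(t) = \frac{I_j^\lambda(t)}{\sqrt{\lambda}}, \ j \in \mathcal{J},$$

and

$$\hat{Q}^\lambda(t) = \frac{Q^\lambda(t)}{\sqrt{\lambda}}, \qquad \hat{I}^\lambda(t) = \frac{I^\lambda(t)}{\sqrt{\lambda}}, \qquad \hat{X}^\lambda(t) = \frac{X^\lambda(t) - N^\lambda}{\sqrt{\lambda}}.$$

We assume that the system is heavily loaded, that each customer class has access to more than the minimal capacity it requires, and that the cost functions are strictly convex. These assumptions are analogous to assumptions in Gurvich and Whitt (2009b), and we refer the reader there for more discussion.

Assumption 1 (Square-Root Safety Capacity). There is a strictly positive vector $\vec{\nu}$ that satisfies $\sum_{j=1}^{J} \nu_j \mu_j = 1$ such that

$$\lim_{\lambda \to \infty} \frac{N_j^\lambda}{\lambda} = \nu_j, \quad j \in \mathcal{J}.$$

That is, $\nu_j \mu_j$ is the limiting proportion of pool $j$ service capacity out of the total system capacity. Furthermore,

$$\lim_{\lambda \to \infty} \frac{N_j^\lambda - \nu_j \lambda}{\sqrt{\lambda}} = \theta_j, \quad j \in \mathcal{J},$$

for $\vec{\theta} \in \mathbb{R}^J$ such that $\beta = \sum_{j \in \mathcal{J}} \mu_j \theta_j > 0$.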
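The square-root safety capacity condition can be illustrated with a small staffing calculation. The sketch below assumes the reading $N_j^\lambda \approx \nu_j \lambda + \theta_j \sqrt{\lambda}$ with $\sum_j \nu_j \mu_j = 1$; the symbol names `nu`, `theta`, and `beta`, the function names, and the numeric values are illustrative assumptions, not taken from the paper.

```python
import math

def square_root_staffing(lam, mu, nu, theta):
    """Hypothetical helper: pool sizes N_j = ceil(nu_j*lam + theta_j*sqrt(lam)).

    Assumes sum_j nu_j*mu_j == 1, so that nu_j*mu_j is pool j's share of capacity.
    """
    if abs(sum(n * m for n, m in zip(nu, mu)) - 1.0) > 1e-9:
        raise ValueError("need sum_j nu_j * mu_j == 1")
    root = math.sqrt(lam)
    return [math.ceil(n * lam + t * root) for n, t in zip(nu, theta)]

def scaled_safety_capacity(lam, mu, staffing):
    """(sum_j mu_j*N_j - lam) / sqrt(lam); under the assumption above it tends to beta."""
    return (sum(m * n for m, n in zip(mu, staffing)) - lam) / math.sqrt(lam)

# Illustrative two-pool example (all numbers are assumptions).
mu = [1.0, 2.0]        # service rates, slowest pool first
nu = [0.5, 0.25]       # 0.5*1 + 0.25*2 = 1
theta = [0.5, 0.25]    # beta = 0.5*1 + 0.25*2 = 1
for lam in (100.0, 10_000.0, 1_000_000.0):
    staffing = square_root_staffing(lam, mu, nu, theta)
    print(lam, staffing, round(scaled_safety_capacity(lam, mu, staffing), 4))
```

As the arrival rate grows, the excess capacity grows only like $\sqrt{\lambda}$, which is what keeps both queueing and idleness at the diffusion scale.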

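Assumption 2 (Resource Pooling). There exists a vector $x \in \mathbb{R}_+^{I \times J}$ that satisfies

$$\sum_{j \in J(i)} \mu_j x_{ij} \nu_j = a_i \ \text{ for all } i \in \mathcal{I} \qquad \text{and} \qquad \sum_{i \in I(j)} x_{ij} = 1 \ \text{ for all } j \in \mathcal{J},$$

and such that the graph having allowable edges $\{(i, j) \in \mathcal{I} \times \mathcal{J} : x_{ij} > 0\}$ is a connected graph.

Assumption 3. There exist continuously differentiable, strictly increasing, strictly convex functions $C_i$, $i \in \mathcal{I}$, having $C_i(0) = C_i'(0) = 0$ such that $C_i^\lambda(x) = C_i(x / \sqrt{\lambda})$ for each $i \in \mathcal{I}$.

Given this setup, the problem (1), for every fixed value of $\lambda$, is equivalently stated as

$$\begin{aligned} \underset{\pi \in \Pi}{\text{minimize}} \quad & E\Big[ \sum_{i \in \mathcal{I}} C_i\big(\hat{Q}_i^\lambda(\infty)\big) \Big] \\ \text{subject to:} \quad & E\big[\hat{I}_j^\lambda(\infty)\big] = f_j\, E\big[\hat{I}^\lambda(\infty)\big], \quad j \in \mathcal{J}. \end{aligned} \tag{2}$$

3. The Approximating Diffusion Control Problem (DCP)

The diffusion control problem (DCP) that arises as an approximation to (2) as $\lambda$ becomes large is

$$\begin{aligned} \underset{p,\, v}{\text{minimize}} \quad & E\Big[ \sum_{i \in \mathcal{I}} C_i\big( p_i(\hat{X}(\infty)^+)\, \hat{X}(\infty)^+ \big) \Big] \\ \text{subject to:} \quad & E\big[ v_j(\hat{X}(\infty)^-)\, \hat{X}(\infty)^- \big] = f_j\, E\big[ \hat{X}(\infty)^- \big] \ \text{ for all } j \in \mathcal{J}, \end{aligned} \tag{3}$$

for $\hat{X}$ that solves

$$\hat{X}(t) = \hat{X}(0) + \int_0^t m\big(\hat{X}(s), v\big)\, ds + \sqrt{2}\, B(t), \tag{4}$$

where

$$m(x, v) = -\beta + \sum_{j=1}^{J} \mu_j\, v_j(x^-)\, x^-. \tag{5}$$

That is, $\hat{X}$ is a diffusion process with infinitesimal variance 2 and infinitesimal drift (5). We heuristically motivate the DCP formulation in the e-companion (§EC.1). Roughly speaking, $\hat{X}^+$ and $\hat{X}^-$ approximate the scaled total queue length and total system idleness, respectively. Moreover, the functions $p$ and $v$ specify how to divide the queue length between the customer classes and the total idleness between the server pools, respectively. Finally, recall from our terminology for the prelimit system that we call the constraints in (3) the fairness constraints, and that the fairness parameters $f_j$, $j \in \mathcal{J}$, dictate the proportion of idleness experienced by pool $j$ servers.

Next, in §3.1, we establish that the DCP (3) separates into two simpler problems: one that involves only scheduling and one that involves only routing. The implication is that the routing control policy determines the total number of customers waiting in queue, and the scheduling control policy determines how to split them among the various classes. The section culminates in §3.2 with the DCP solution.

3.1. Separability of the DCP

The key to solving the DCP (3) is to observe that the routing and the scheduling components of the control policy that solves (3) may be found separately. To see this, first observe that (3) is alternatively expressed as

$$\begin{aligned} \underset{p,\, v}{\text{minimize}} \quad & E\Big[ \sum_{i \in \mathcal{I}} C_i\big( p_i(\hat{X}(\infty)^+)\, \hat{X}(\infty)^+ \big) \,\Big|\, \hat{X}(\infty) > 0 \Big]\, P\big( \hat{X}(\infty) > 0 \big) \\ \text{subject to:} \quad & E\big[ v_j(\hat{X}(\infty)^-)\, \hat{X}(\infty)^- \big] = f_j\, E\big[ \hat{X}(\infty)^- \big] \ \text{ for all } j \in \mathcal{J}. \end{aligned}$$

The term $E\big[ \sum_{i \in \mathcal{I}} C_i( p_i(\hat{X}(\infty)^+) \hat{X}(\infty)^+ ) \mid \hat{X}(\infty) > 0 \big]$ in the above is not a function of the routing component $v$. In fact, $[\hat{X}(t)^+ \mid \hat{X}(t) > 0]$ is a reflected Brownian motion (RBM) with drift $-\beta$ and infinitesimal variance 2, so that $[\hat{X}(\infty)^+ \mid \hat{X}(\infty) > 0] \sim \exp(\beta)$, where $\sim$ denotes "is distributed as." Next, since the drift term (5) for $\hat{X}$ does not depend on $p$, $P(\hat{X}(\infty) > 0)$ does not depend on $p$, and also the constraints in (3) do not depend on $p$. Therefore, it follows that (3) is equivalently written as

$$\left( \underset{p}{\text{minimize}} \int_0^\infty \sum_{i \in \mathcal{I}} C_i\big( p_i(x)\, x \big)\, \beta e^{-\beta x}\, dx \right) \times \left( \begin{aligned} \underset{v}{\text{minimize}} \quad & P\big( \hat{X}(\infty) > 0 \big) \\ \text{subject to:} \quad & E\big[ v_j(\hat{X}(\infty)^-)\, \hat{X}(\infty)^- \big] = f_j\, E\big[ \hat{X}(\infty)^- \big] \ \text{ for all } j \in \mathcal{J} \end{aligned} \right). \tag{6}$$

We can separate (6) into two problems. In particular, for any fixed routing component $v$, the scheduling component $p$ that solves the problem

$$\underset{p}{\text{minimize}} \int_0^\infty \sum_{i \in \mathcal{I}} C_i\big( p_i(x)\, x \big)\, \beta e^{-\beta x}\, dx \tag{7}$$

is also an optimal scheduling component for (3). Furthermore, for any fixed scheduling component $p$, the routing component $v$ that solves the problem

$$\begin{aligned} \underset{v}{\text{minimize}} \quad & P\big( \hat{X}(\infty) > 0 \big) \\ \text{subject to:} \quad & E\big[ v_j(\hat{X}(\infty)^-)\, \hat{X}(\infty)^- \big] = f_j\, E\big[ \hat{X}(\infty)^- \big] \ \text{ for all } j \in \mathcal{J} \end{aligned} \tag{8}$$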

is also an optimal routing component for (3). In particular, $p$ that solves (7) is not a function of the particular choice of $v$, and $v$ that solves (8) is not a function of the particular choice of $p$. Hence, we have established the following result.

Proposition 1 (Separability). The optimal solution $(p^\star, v^\star)$ to the problem (3) has $p^\star$ that solves (7) and $v^\star$ that solves (8).

The implication of Proposition 1 is that the solution of the DCP in (3) can be found by solving one problem that involves only scheduling, ignoring the fairness constraints, and one that involves only routing, ignoring the cost functions $C_i$, $i \in \mathcal{I}$. The routing problem (8) determines $E[\hat{X}(\infty)^+] = P(\hat{X}(\infty) > 0) / \beta$, which approximates the total number of customers waiting in queue, and the scheduling problem (7) finds the division of waiting customers into classes that minimizes delay costs (as a function of the total number of customers waiting). Moreover, $p^\star$ is the same for all underlying PSS network structures having the same number of customer classes $I$ and the same cost functions $C_i$. Similarly, $v^\star$ is the same for all underlying PSS network structures having the same number of server pools $J$ and desired fairness fractions $f_j$. Hence, we can think of the solution $(p^\star, v^\star)$ as combining the scheduling control policy $p^\star$ that solves the approximating DCP for the V model with multiple customer classes and a single server pool (in which the fairness constraint disappears), and the routing control policy $v^\star$ that solves the approximating DCP for the inverted-V model with a single customer class and multiple server pools (in which the objective function reduces to minimizing the wait time).

3.2. The Diffusion Control Problem Solution

We use Proposition 1 to solve the DCP (3). To do this, we first specify the scheduling and routing components, $p^\star$ and $v^\star$, separately. These are given in Gurvich and Whitt (2009b) and Armony and Ward (2010). For completeness, we also write the solutions here.

3.2.1. The DCP Solution: Scheduling.
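Consider the problem

$$\begin{aligned} \text{minimize} \quad & \sum_{i \in \mathcal{I}} C_i\big( p_i(x)\, x \big) \\ \text{subject to:} \quad & \sum_{i \in \mathcal{I}} p_i(x) = 1, \quad p_i(x) \ge 0 \ \text{ for all } i \in \mathcal{I}. \end{aligned} \tag{9}$$

This nonlinear program appears in Gurvich and Whitt (2009b). It is a separable, continuous, nonlinear resource allocation problem, as in Ibaraki and Katoh (1988), Patriksson (2006), and Zipkin (1980). It follows from Gurvich and Whitt (2009b) that we can choose the function $p^\star$ such that it is locally Hölder continuous on the open interval $(0, \infty)$, and we make this assumption. It follows from the associated KKT conditions that the solution $p^\star(x) = (p_1^\star(x), \ldots, p_I^\star(x))$, $x > 0$, to (9) is the unique solution to the parameterized set of equations

$$C_i'\big( p_i(x)\, x \big) = C_k'\big( p_k(x)\, x \big) \ \text{ for all } i, k \in \mathcal{I}, \qquad \sum_{i \in \mathcal{I}} p_i(x) = 1, \qquad p_i(x) \ge 0 \ \text{ for all } i \in \mathcal{I}.$$

Because $p^\star(x)$ solves (9) for any value of $x > 0$, we conclude that $p^\star$ solves (7).

3.2.2. The DCP Solution: Routing.

The solution to (8) is a threshold control policy, as defined below.

Definition 1. A threshold control policy is defined by a $(J+1)$-dimensional vector $L = (L_0, L_1, \ldots, L_J)$ having $L_0 = 0 < L_1 < L_2 < \cdots < L_{J-1} < L_J = \infty$, and has $v_j(x) = \mathbf{1}\{L_{j-1} \le x < L_j\}$ for all $x \ge 0$ and $j \in \mathcal{J}$.

It is also useful to note that the infinitesimal drift associated with the threshold control policy at levels $L$ is

$$m_L(x) = \begin{cases} -\beta, & x \ge 0, \\ -\beta + \sum_{j=1}^{J} \mu_j\, x^-\, \mathbf{1}\{L_{j-1} \le x^- < L_j\}, & x < 0. \end{cases} \tag{10}$$

Let $v^\star$ be the threshold control policy, and

$$L_1^\star = L_1(\beta, \mu_1, \ldots, \mu_J, f_1, \ldots, f_J), \ \ldots, \ L_{J-1}^\star = L_{J-1}(\beta, \mu_1, \ldots, \mu_J, f_1, \ldots, f_J)$$

be the associated threshold levels, under which $\hat{X}^\star$ solves (4), and

$$E\big[ \hat{X}^\star(\infty)^-\, \mathbf{1}\{ L_{j-1}^\star \le \hat{X}^\star(\infty)^- < L_j^\star \} \big] = f_j\, E\big[ \hat{X}^\star(\infty)^- \big] \ \text{ for all } j \in \mathcal{J}. \tag{11}$$

Such a control policy exists by Lemma 5 in Armony and Ward (2010), and the threshold levels can be found according to the search procedure described in that paper, which is a sequence of one-dimensional searches. Armony and Ward (2010) establish that the proposed threshold control policy $v^\star$ with threshold levels $L_1^\star, \ldots, L_{J-1}^\star$ is an optimal solution to the problem (8).

3.2.3. The DCP Solution.

The following theorem is immediate from Proposition 1, the solution to (7) provided in §3.2.1, and the threshold control policy solution to (8).

Theorem 1. Let $\hat{X}^\star$ satisfy (4) under the routing control $v^\star$. Then, for any $(p, v)$, for $\hat{X}$ that satisfies (4) under control $v$ and has

$$E\big[ v_j(\hat{X}(\infty)^-)\, \hat{X}(\infty)^- \big] = f_j\, E\big[ \hat{X}(\infty)^- \big] \ \text{ for all } j \in \mathcal{J},$$

it is also true that

$$E\Big[ \sum_{i \in \mathcal{I}} C_i\big( p_i^\star(\hat{X}^\star(\infty)^+)\, \hat{X}^\star(\infty)^+ \big) \Big] \le E\Big[ \sum_{i \in \mathcal{I}} C_i\big( p_i(\hat{X}(\infty)^+)\, \hat{X}(\infty)^+ \big) \Big].$$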
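To make the threshold routing control concrete, the following sketch numerically evaluates a two-pool threshold policy: it integrates the stationary density of a diffusion with infinitesimal variance 2, drift $-\beta$ on the positive half-line, and drift $-\beta + \mu_1 x^-$ or $-\beta + \mu_2 x^-$ on the negative half-line depending on whether the idleness $x^-$ is below or above a threshold $L$. It is a numerical illustration under those assumptions, not the search procedure of Armony and Ward; the function names and the bisection (which relies only on the slow-pool share being continuous in $L$) are hypothetical.

```python
import math

def simpson(fn, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = fn(a) + fn(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * fn(a + k * h)
    return total * h / 3.0

def threshold_stats(beta, mu1, mu2, L):
    """Return (P(X > 0), slow-pool share of mean idleness) for threshold L.

    Pool 1 (rate mu1 < mu2) holds all idleness while idleness y is below L;
    pool 2 holds it otherwise.
    """
    def g(y):
        # Unnormalised stationary density at x = -y (density equals 1 at x = 0).
        if y < L:
            return math.exp(beta * y - 0.5 * mu1 * y * y)
        return math.exp(beta * y - 0.5 * mu1 * L * L - 0.5 * mu2 * (y * y - L * L))

    def yg(y):
        return y * g(y)

    tail1 = beta / mu1 + 10.0 / math.sqrt(mu1)  # mu1-piece is negligible beyond this
    cut = min(L, tail1)                         # split the grid at the kink
    upper = max(cut, beta / mu2) + 10.0 / math.sqrt(mu2)
    mass_neg = simpson(g, 0.0, cut) + simpson(g, cut, upper)
    idle_low, idle_high = simpson(yg, 0.0, cut), simpson(yg, cut, upper)
    slow = idle_low + (idle_high if L > cut else 0.0)
    p_wait = (1.0 / beta) / (1.0 / beta + mass_neg)  # positive side has mass 1/beta
    return p_wait, slow / (idle_low + idle_high)

def threshold_for_share(beta, mu1, mu2, f1, iters=40):
    """Bisection for a threshold L whose slow-pool idleness share equals f1."""
    lo, hi = 0.0, beta / mu1 + 10.0 / math.sqrt(mu1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if threshold_stats(beta, mu1, mu2, mid)[1] < f1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative parameters (assumptions): beta = 1, mu1 = 1, mu2 = 2, f1 = 0.5.
L_half = threshold_for_share(1.0, 1.0, 2.0, 0.5)
print(L_half, threshold_stats(1.0, 1.0, 2.0, L_half))
```

Setting `L = math.inf` places all idleness in the slow pool, and `L = 0.0` places it all in the fast pool; intermediate thresholds trade a higher delay probability for a more even split of idleness.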

In summary, we have solved the DCP (3) that approximates our original stochastic minimization problem (2) in the Halfin-Whitt many-server heavy-traffic regime. However, the solution $(p^\star, v^\star)$ requires knowledge of the parameters $\beta, \mu_1, \ldots, \mu_J$, because the threshold levels $L_1^\star, \ldots, L_{J-1}^\star$ depend on those parameters. In particular, this policy is not blind. This is an issue because our stated objective is to propose a blind policy that has an associated cost that is not too much larger than the minimum achievable cost. Therefore, in the next section, we consider a blind policy that is feasible for the DCP in (3), and calculate the percentage cost increase in the DCP objective function (3) under that blind policy relative to its value under $(p^\star, v^\star)$.

4. A Blind Policy That Is Feasible for the DCP

In this section, we introduce our proposed blind policy, and evaluate the percentage increase in the DCP objective function value when that blind policy is used instead of the optimal policy $(p^\star, v^\star)$. Our goal is to show that this percentage increase is small. In §4.1, we write the expression for the percentage cost increase, and observe that it does not depend on the scheduling component $p$. Next, we establish that it is sufficient to study a two-pool model with a single customer class ($J = 2$ and $I = 1$, an inverted-V model) in order to approximate the aforementioned percentage increase in any PSS model, with any number of server pools and customer classes. In particular, to calculate this percentage increase in any PSS model, it is required only to know the service rates of the servers in the slowest and fastest pools; it is not necessary to know the cost functions, the number of customer classes, or the number of server pools. Finally, we end in §4.2 by using the convenient analytic expressions associated with the two-pool inverted-V model to show that the percentage increase in the objective function value is under % in a large portion of the parameter space.

4.1. The Proposed Blind Policy

One very simple routing policy that satisfies the constraints of the DCP in (3) is the fixed fairness ratios (FFR) policy

$$v_j(x) = f_j \ \text{ for } x \ge 0 \text{ and } j \in \mathcal{J}. \tag{12}$$

The routing control policy $v^{FFR}$ defined by (12) has the very attractive feature that it does not require knowledge of the parameters $\beta, \mu_1, \ldots, \mu_J$, and so is blind. However, $v^{FFR}$ is more restrictive than the optimal routing policy $v^\star$, because it requires that the fairness constraint in (3) is satisfied at all times. In other words, each pool always has the desired proportion of idle servers, instead of that desired proportion being achieved in the long run, but not necessarily at any single point in time. The optimal scheduling control policy $p^\star$ is blind, and so our proposed scheduling policy remains the same as in §3.

The percentage increase in the objective function value in (3) when the blind control policy $(p^\star, v^{FFR})$ is used instead of the optimal control policy $(p^\star, v^\star)$ is

$$\frac{ E\Big[ \sum_{i \in \mathcal{I}} C_i\big( p_i^\star(\hat{X}^{FFR}(\infty)^+)\, \hat{X}^{FFR}(\infty)^+ \big) \Big] - E\Big[ \sum_{i \in \mathcal{I}} C_i\big( p_i^\star(\hat{X}^\star(\infty)^+)\, \hat{X}^\star(\infty)^+ \big) \Big] }{ E\Big[ \sum_{i \in \mathcal{I}} C_i\big( p_i^\star(\hat{X}^\star(\infty)^+)\, \hat{X}^\star(\infty)^+ \big) \Big] }, \tag{13}$$

where $\hat{X}^{FFR}$ satisfies (4) under the routing control policy in (12), and $\hat{X}^\star$ satisfies (4) under the routing control policy $v^\star$. Because any $\hat{X}$ that satisfies (4) has $[\hat{X}(\infty)^+ \mid \hat{X}(\infty) > 0] \sim \exp(\beta)$,

$$E\Big[ \sum_{i \in \mathcal{I}} C_i\big( p_i^\star(\hat{X}(\infty)^+)\, \hat{X}(\infty)^+ \big) \Big] = P\big( \hat{X}(\infty) > 0 \big) \int_0^\infty \sum_{i \in \mathcal{I}} C_i\big( p_i^\star(x)\, x \big)\, \beta e^{-\beta x}\, dx.$$

Therefore, the percentage cost increase in (13) is equal to

$$\frac{ P\big( \hat{X}^{FFR}(\infty) > 0 \big) - P\big( \hat{X}^\star(\infty) > 0 \big) }{ P\big( \hat{X}^\star(\infty) > 0 \big) }, \tag{14}$$

so that the increase in the objective function value is independent of the scheduling control policy. Moreover, the increase in the objective function value does not depend on either the cost functions $C_i$, $i \in \mathcal{I}$, or the number of customer classes $I$. Furthermore, the percentage increase in the objective function value in the DCP (8) that considers only routing is exactly the percentage increase in the objective function value in the DCP (3) that includes both routing and scheduling. The reason is the separability shown in §3.1.
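Under the fixed fairness ratios policy the diffusion behaves like a single-pool model whose service rate is the idleness-weighted average $\bar{\mu} = \sum_j f_j \mu_j$, which makes its delay probability easy to evaluate. The sketch below assumes infinitesimal variance 2 and drift $-\beta + \bar{\mu} x^-$, and uses the resulting stationary-density formula $P(\hat{X}(\infty) > 0) = [1 + z\,\Phi(z)/\phi(z)]^{-1}$ with $z = \beta/\sqrt{\bar{\mu}}$. This formula is derived here from those assumptions rather than copied from the e-companion, and all function names are hypothetical. Because placing all idleness in the slowest pool gives the smallest delay probability among routing controls in this model, the fastest-server-first value gives an upper bound on the percentage increase relative to the fair optimum.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_wait_single_rate(beta, rate):
    """P(X > 0) when idle capacity is lost at one effective rate (assumed model)."""
    z = beta / math.sqrt(rate)
    return 1.0 / (1.0 + z * std_normal_cdf(z) / std_normal_pdf(z))

def p_wait_ffr(beta, mu, f):
    """P(X > 0) under fixed fairness ratios f (nonnegative, summing to 1)."""
    if min(f) < 0.0 or abs(sum(f) - 1.0) > 1e-9:
        raise ValueError("fairness ratios must be nonnegative and sum to 1")
    effective_rate = sum(fj * mj for fj, mj in zip(f, mu))
    return p_wait_single_rate(beta, effective_rate)

def percentage_increase(p_blind, p_benchmark):
    """Relative increase in delay probability, in percent."""
    return 100.0 * (p_blind - p_benchmark) / p_benchmark

# Illustrative parameters (assumptions): beta = 1, mu = (1, 2), equal idleness split.
beta, mu, f = 1.0, [1.0, 2.0], [0.5, 0.5]
p_ffr = p_wait_ffr(beta, mu, f)
p_fsf = p_wait_single_rate(beta, min(mu))  # all idleness in the slowest pool
print(p_ffr, p_fsf, percentage_increase(p_ffr, p_fsf))
```

The comparison against the fair optimum itself needs the delay probability under the threshold policy, which can be passed to `percentage_increase` as `p_benchmark` once it has been computed.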
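We would like to use the expression in (14) to understand when using $v^{FFR}$ instead of $v^\star$ results in a small percentage increase in the objective function value. This appears possible because the expression in (14) can be written in terms of the standard normal pdf and cdf using the results in Browne and Whitt (1995, §8.4). In the case that $J = 2$, for given $0 < \mu_1 < \mu_2$, $f_1$, and $f_2 = 1 - f_1$,

$$P\big( \hat{X}^{FFR}(\infty) > 0 \big) = P^{FFR}(\mu_1, \mu_2, f_1) \quad \text{and} \quad P\big( \hat{X}^\star(\infty) > 0 \big) = P\big( \mu_1, \mu_2, L(f_1) \big), \tag{15}$$

where $P^{FFR}(\mu_1, \mu_2, f)$ and $P(\mu_1, \mu_2, L)$ are given in the e-companion (§EC.2). However, the expression in (14) becomes algebraically complicated for $J$ larger than 2. Fortunately, for the purposes of estimating

$$\max_{f_1, \ldots, f_J :\ f_j \ge 0,\ \sum_{j=1}^{J} f_j = 1} \ \frac{ P\big( \hat{X}^{FFR}(\infty) > 0 \big) - P\big( \hat{X}^\star(\infty) > 0 \big) }{ P\big( \hat{X}^\star(\infty) > 0 \big) },$$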

8 Operations Research 6(), pp , 23 INFORMS 235 the maximum percentage cost increase over all fairness constraints for a given set of parameters, J, J, it is enough to find P FFR max J f P J Lf f Lf P J that is, it is sufficient to study the two-pool system with the slowest and fastest service rates. Theorem 2. Consider a system with J 3 pools and with fixed parameters, 2 J. Then, P ˆX FFR > P ˆX > max f f J f j P ˆX > J j= f j = P FFR max J f P J Lf f Lf P J The key driver of this result is the fact that any convex combination of 2 J may be represented as a convex combination of and J only. Therefore, the diffusion process associated with any feasible policy for the J pool system, is identical to a diffusion process associated with a feasible policy for the 2-pool system, with appropriate chosen fairness ratios. This includes, in particular, the J -pool FFR and the threshold policies. At the same time, the 2-pool threshold policy is optimal for the 2-pool system, so that the percentage cost reduction is greater there Numerical Evaluation We desire to understand the consequences of using the blind routing control policy v FFR instead of the optimal routing control policy v. More specifically, we would like to compute the percentage increase in the objective function value in order to understand the region of the parameter space in which the percentage increase is small. Theorem 2 establishes that to do this it is enough to study the fraction P FFR f P Lf /P Lf for f. For a given f, Lf does not have a convenient closedform expression. However, for a given L, f L= E[ ˆX ˆX < L ] P ˆX < L E [ (6) ˆX ] has a convenient closed-form expression given in the e-companion ( EC.2). 
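The key driver behind Theorem 2 can be illustrated with a short numerical sketch. It is not taken from the paper: the function names are hypothetical, and the formula for $P(\hat X^{FFR} > 0)$ assumes a diffusion with infinitesimal variance 2 and drift $-\beta$ for $x > 0$ and $-\beta - \bar\mu x$ for $x \le 0$, where $\bar\mu = \sum_j f_j \mu_j$. Under that assumption the FFR diffusion depends on the pools only through $\bar\mu$, so a three-pool system and a suitably weighted two-pool system (slowest and fastest rates only) give the same probability.

```python
import math


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_rate(mus, fs):
    """Convex combination sum_j f_j * mu_j of the service rates."""
    return sum(f * mu for f, mu in zip(fs, mus))


def two_pool_fraction(mus, fs):
    """Weight f on the slowest rate with f*mu_min + (1-f)*mu_max == mean_rate."""
    lo, hi = min(mus), max(mus)
    if hi == lo:
        return 1.0
    return (hi - mean_rate(mus, fs)) / (hi - lo)


def p_positive_ffr(beta, mu_bar):
    """P(X > 0) in steady state under FFR.

    ASSUMPTION (not from the source): drift -beta for x > 0,
    drift -beta - mu_bar * x for x <= 0, infinitesimal variance 2.
    """
    z = beta / math.sqrt(mu_bar)
    return 1.0 / (1.0 + z * Phi(z) / phi(z))


# Hypothetical three-pool example.
mus = [1.0, 2.0, 5.0]
fs = [0.2, 0.5, 0.3]
f2 = two_pool_fraction(mus, fs)
three_pool = p_positive_ffr(1.0, mean_rate(mus, fs))
two_pool = p_positive_ffr(1.0, mean_rate([1.0, 5.0], [f2, 1.0 - f2]))
```

With equal service rates of 1 the assumed formula reduces to the classical Halfin-Whitt delay probability $[1 + \beta\Phi(\beta)/\phi(\beta)]^{-1}$, which is a useful sanity check on the normalization.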
Because for a given $L > 0$, the drift of the fixed fairness ratios diffusion that satisfies the identical fairness constraint is easily expressed in terms of $f(L)$ (for $x \le 0$, the state-dependent part of the drift is $-x\big(\mu_1 f(L) + \mu_2 (1 - f(L))\big)$), we have the following closed-form expression for the percentage increase in the objective function value under the control policy $v^{FFR}$ instead of $v^*$:

$$\Delta(L) = \frac{P^{FFR}(f(L)) - P(L)}{P(L)}$$

( ( ( = / L + ) 2 ( h 2 L + ) + ( ) ( 2 ( L + ))) ) + ( f L + f L 2 ( ) h + ) f L + f L 2

where $h(x) = \phi(x)/\Phi(x)$, so that $h(-x) = \phi(x)/(1 - \Phi(x))$ is the hazard-rate function for the standard normal distribution.

Figure 2(a) shows the behavior of $\Delta$ when $\beta = 1$, $\mu_1 = 1$, and for various values of $\mu_2$. The maximum percentage increase in the objective function value never exceeds %, and is below % for all values of $\mu_2 = 2$ and $\mu_2 = 3$.

Figure 2. The percentage increase in the objective function value under the blind control policy $v^{FFR}$ instead of (a) the optimal control policy $v^*$ and (b) the optimal control policy when the fairness constraint is ignored, $v^{FSF}$. Panels: (a) $\Delta(L(f))$, (b) $\Delta^{FSF}(f)$. Horizontal axis: slow server idleness fraction $f$. Curves: $\mu_2 = 2, 3, 4, 5$.

For comparison purposes, it is useful to also calculate the percentage increase in the objective function value under $v^{FFR}$ as compared to the optimal control policy when the fairness constraint is ignored,

$$v_j^{FSF}(x) = \begin{cases} 1 & \text{if } j = 1, \\ 0 & \text{otherwise,} \end{cases} \quad \text{for all } x \le 0. \tag{17}$$

In particular, $v^{FSF}$ solves the problem of minimizing $P(\hat X > 0)$ over routing controls $v$, and so, by separability, is the routing control policy that minimizes the objective function in (3), ignoring the constraint. Then, letting $\hat X^{FSF}$ satisfy (4) under the routing control policy $v^{FSF}$, the function

$$\Delta^{FSF}(f) = \frac{P(\hat X^{FFR} > 0) - P(\hat X^{FSF} > 0)}{P(\hat X^{FSF} > 0)},$$

graphed in Figure 2(b) for various values of $\mu_2$, provides information on the impact of adding the fairness constraint with respect to the increase in the objective function value. We conclude from Figure 2(b) that whenever $f$ is not too close to 1, meaning the faster servers are experiencing some nontrivial proportion of the idle time, the percentage increase in the objective function value when using $v^{FFR}$ instead of $v^{FSF}$ is much larger than that when using $v^{FFR}$ instead of $v^*$. This suggests that the impact of requiring a blind policy is tiny compared to the impact of requiring a fair policy.

Figure 3 graphs $\Delta$ when $\mu_1 = 1$ for various values of $\beta$, and for $\mu_2 = 2$ (Figure 3(a)) and $\mu_2 = 5$ (Figure 3(b)). Figure 3 shows that the maximum percentage increase in the objective function increases as $\beta$ increases and that the impact of $\mu_2$ is small compared to the impact of $\beta$. This suggests that caution is warranted when deciding to use the blind control policy $v^{FFR}$ if there is a sense that the value of $\beta$ may be large (exceed ). In other words, even though the control policy $v^{FFR}$ itself is blind, it is necessary to have some partial knowledge of the parameters in order to evaluate its performance.

5. The Proposed Blind (and Fair) Policy

There is a natural translation from the blind policy $v^{FFR}$ that is feasible for the DCP (3) to a blind policy that is feasible for the original stochastic minimization problem (2).
In particular, the separability of the DCP shown in Proposition 1 implies that the scheduling and routing can be decided on in isolation. Scheduling should be in accordance with the Gcμ scheduling rule that was shown to be asymptotically optimal in Gurvich and Whitt (2009b, Theorem 3.2) for a finite time horizon version of (2) in the absence of fairness constraints. Routing should be in accordance with the QIR policy in Gurvich and Whitt (2009a) that makes decisions based on always maintaining the proportion of idle servers in each pool $j$ to be $f_j$, so that, in accordance with $v^{FFR}$, the fairness constraint is satisfied at all times in the Halfin-Whitt limit regime. We call our proposed blind policy the fair fixed idleness ratio and generalized cμ scheduling (FFIR-Gcμ) policy.

Figure 3. The percentage increase in the objective function value, $\Delta(L(f))$, under the blind control policy $v^{FFR}$ instead of the optimal control policy $v^*$. Panels: (a) $\mu_1 = 1$, $\mu_2 = 2$; (b) $\mu_1 = 1$, $\mu_2 = 5$. Horizontal axis: slow server idleness fraction $f$. Curves: $\beta = 0.5, 1.0, 1.5, 2.0$.

Definition 2 (The Fair Fixed Idleness Ratio and Generalized cμ Scheduling (FFIR-Gcμ) Policy).

1. Upon the arrival of a class $i$ customer at time $t$, the customer will be routed to an available server in pool $j^*$, where

$$j^* = j^*(t) \in \arg\max_{j \in \mathcal{J}(i):\ I_j(t) > 0} \big\{ \hat I_j(t) - f_j [\hat X(t)]^- \big\};$$

i.e., the customer will be routed to the server pool that has the greatest difference between the number of servers in that pool that are currently idling and the number of idle servers in that pool that maintains the fairness parameter $f_j$. In particular, a larger value of $f_j$ implies lower priority for the corresponding server pool. If there are no servers available, the customer waits in queue $i$ to be served in the order of arrival.

2. Upon service completion by a type $j$ server at time $t$, the server will admit to service the customer from the head of queue $i^*$, where

$$i^* = i^*(t) \in \arg\max_{i \in \mathcal{I}(j):\ Q_i(t) > 0} C_i'\big(\hat Q_i(t)\big);$$

i.e., the server prioritizes classes using a generalized cμ rule. If there are no customers waiting, the server will idle.

The FFIR-Gcμ policy is blind because it only requires knowledge of the fairness parameters $f_j$, $j \in \mathcal{J}$. Also note that this policy is a QIR policy (see Gurvich and Whitt 2009a, Definition 2.3). This includes the Gcμ scheduling as observed in Gurvich and Whitt (2009b). The following theorem, which is an immediate consequence of Gurvich and Whitt (2009a, Theorem 4.1), suggests that in the Halfin-Whitt limit regime the FFIR-Gcμ policy has similar performance to the blind control for the DCP, $v^{FFR}$.

Theorem 3. Let $\hat X^{FFR}$ satisfy (4) under the routing control $v^{FFR}$, and let $p^*$ be the scheduling control policy that solves (7) and is the optimal scheduling control policy for the DCP in (3). Assume $\hat X^\lambda(0) \Rightarrow \hat X^{FFR}(0)$ as $\lambda \to \infty$. Then, under the FFIR-Gcμ policy,

$$\big(\hat X^\lambda(t), \hat Q_1^\lambda(t), \ldots, \hat Q_I^\lambda(t), \hat I_1^\lambda(t), \ldots, \hat I_J^\lambda(t)\big) \Rightarrow \big(\hat X^{FFR}(t),\ p_1^*([\hat X^{FFR}(t)]^+)[\hat X^{FFR}(t)]^+, \ldots, p_I^*([\hat X^{FFR}(t)]^+)[\hat X^{FFR}(t)]^+,\ f_1 [\hat X^{FFR}(t)]^-, \ldots, f_J [\hat X^{FFR}(t)]^-\big)$$

in $D$ as $\lambda \to \infty$.

Theorem 3 suggests that

$$E\big[\hat I_j^\lambda(\infty)\big] \to f_j\, E\big[[\hat X^{FFR}(\infty)]^-\big] \quad \text{as } \lambda \to \infty \text{ for all } j \in \mathcal{J},$$

so that the FFIR-Gcμ policy satisfies the constraints of the DCP as $\lambda$ becomes large. The convergence in the previous display is true when both the limit interchange required to show the weak convergence of the steady-state random variables

$$\big(\hat X^\lambda(\infty), \hat Q_1^\lambda(\infty), \ldots, \hat Q_I^\lambda(\infty), \hat I_1^\lambda(\infty), \ldots, \hat I_J^\lambda(\infty)\big) \Rightarrow \big(\hat X^{FFR}(\infty),\ p_1^*([\hat X^{FFR}(\infty)]^+)[\hat X^{FFR}(\infty)]^+, \ldots, p_I^*([\hat X^{FFR}(\infty)]^+)[\hat X^{FFR}(\infty)]^+,\ f_1 [\hat X^{FFR}(\infty)]^-, \ldots, f_J [\hat X^{FFR}(\infty)]^-\big)$$

as $\lambda \to \infty$ is valid and the sequence of steady-state random variables is uniformly integrable. (Note that for a general PSS model, it is nontrivial to establish the aforementioned limit interchange and uniform integrability. One step in this direction is Stolyar and Yudovina 2.)

We next perform a simulation study to investigate the performance of the FFIR-Gcμ policy. The purpose of our simulation study is to show that the results in §4 on the percentage increase in the objective function value in the DCP (3) when the blind control $(p^*, v^{FFR})$ is used instead of the optimal control $(p^*, v^*)$ translate to the prelimit. That is, the objective function value in (2) when the FFIR-Gcμ policy is used should be close to its minimum whenever the objective function value in (3) when $(p^*, v^{FFR})$ is used is close to its minimum.
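The two decision rules of Definition 2 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based state, and the example numbers are all hypothetical. The sketch works with unscaled counts for routing, which leaves the arg max unchanged because every term is divided by the same positive scaling factor.

```python
def ffir_route(idle, f, eligible_pools, x_minus):
    """Routing rule: choose the eligible pool with idle servers that
    maximizes I_j - f_j * [X]^-.

    idle:           dict pool -> number of idle servers (unscaled)
    f:              dict pool -> fairness parameter f_j
    eligible_pools: pools able to serve the arriving class
    x_minus:        [X]^- in the same (unscaled) units as idle
    Returns a pool, or None if the customer must wait in queue.
    """
    candidates = [j for j in eligible_pools if idle[j] > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda j: idle[j] - f[j] * x_minus)


def gcmu_schedule(queues, marginal_cost, eligible_classes, scale):
    """Scheduling rule: choose the nonempty eligible queue that
    maximizes C_i'(Q_i / scale).

    queues:           dict class -> queue length (unscaled)
    marginal_cost:    dict class -> derivative C_i' of the cost function
    eligible_classes: classes the freed server can serve
    scale:            scaling factor (hypothetical; e.g. sqrt(lambda))
    Returns a class, or None if the server should idle.
    """
    candidates = [i for i in eligible_classes if queues[i] > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda i: marginal_cost[i](queues[i] / scale))


# Hypothetical state: pool 1 holds more idleness than its target share.
pool = ffir_route({1: 6, 2: 4}, {1: 0.8, 2: 0.2}, [1, 2], x_minus=10)

# Hypothetical quadratic costs C_1(x) = 2x^2 and C_2(x) = x^2.
derivs = {1: lambda x: 4.0 * x, 2: lambda x: 2.0 * x}
served = gcmu_schedule({1: 3, 2: 5}, derivs, [1, 2], scale=40.0)
```

In the routing example the scores are 6 - 8 = -2 for pool 1 and 4 - 2 = 2 for pool 2, so the customer goes to pool 2 even though pool 1 has more idle servers, which reflects the remark that a larger $f_j$ implies lower priority. Both rules use only the observed state and the fairness parameters, which is what makes the policy blind.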
We describe the setup for our simulation study in §5.1 and present our simulation results in §5.2.

5.1. The Setup for Our Simulation Study

We study the performance of the FFIR-Gcμ policy in the N-model and the two-pool inverted-V model shown in Figure 4.

Figure 4. The PSS models used in our simulation study. (a) N-model: $\lambda_1 = 800$, $\lambda_2 = 800$, $\mu_1 = 1$, $N_1 = 340$, $\mu_2 = 2$, $N_2 = 650$. (b) Inverted-V model: $\lambda = 1{,}600$, $\mu_1 = 1$, $N_1 = 340$, $\mu_2 = 2$, $N_2 = 650$.

We use the control policy that is optimal for the DCP (3), $(p^*, v^*)$, to motivate a comparison policy to FFIR-Gcμ. The comparison policy schedules in accordance with the Gcμ rule, and the routing translates the thresholds that define $v^*$ to a threshold-routing policy exactly as in §5 in Armony and Ward (2010). Hence, we term the comparison control the TR-Gcμ control. Note that the parameters in Figure 4 are consistent with = 2 for the N-model, = /5, 2 = 4/, = /2, and 2 = /4, so that = = (and N = + = 34, N 2 = = 65). Then, for any given value of $f$, the threshold $L$ (which depends on $\mu_1$, $\mu_2$, $f_1$, and $f_2$) is well defined.

The TR-Gcμ is a natural comparison policy to FFIR-Gcμ because its continuous modification has been shown to be asymptotically optimal for the inverted-V model in Armony and Ward (2010). At the same time, TR-Gcμ is not blind, so one can consider this comparison as a "price of blindness" indicator. Showing that the TR-Gcμ policy is asymptotically optimal for a general PSS model is technically challenging, and beyond the scope of this paper, because it requires knowing the limiting behavior of the sequence of steady-state random variables associated with the system processes, and the uniform integrability of that sequence under any admissible policy (which, as mentioned after Theorem 3, is difficult).

The routing of an incoming class 1 customer at time $t > 0$ in the N-model and an incoming customer at time $t > 0$ in the inverted-V model is the same. The TR-Gcμ policy routes such a customer to pool

$$j^*(t) = \begin{cases} 1 & \text{if } X(t) \le L^\lambda \text{ and } I_1(t) > 0, \\ 2 & \text{if } X(t) > L^\lambda \text{ and } I_2(t) > 0, \\ 2 & \text{if } X(t) \le L^\lambda \text{ and } I_1(t) = 0, \\ 1 & \text{if } X(t) > L^\lambda \text{ and } I_2(t) = 0, \end{cases}$$

where $L^\lambda = N - L\sqrt{\lambda} = 990 - L\sqrt{1{,}600}$. Otherwise, if $I_1(t) = I_2(t) = 0$, the customer waits in queue.

Scheduling is only relevant for the N-model and is the same under both the FFIR-Gcμ and TR-Gcμ policies, but depends on the cost function. For the cost functions

$$C_1(x) = 2x^2 \quad \text{and} \quad C_2(x) = x^2 \quad \text{for } x > 0, \tag{18}$$

the scheduling has a pool 2 server that becomes available at time $t > 0$, when both $Q_1(t) > 0$ and $Q_2(t) > 0$, serve the class 1 customer that has been waiting the longest if

$$C_1'\big(\hat Q_1(t)\big) > C_2'\big(\hat Q_2(t)\big) \iff 2 Q_1(t) > Q_2(t),$$

and otherwise serve the class 2 customer that has been waiting the longest. If $Q_1(t) > 0$ but $Q_2(t) = 0$ ($Q_2(t) > 0$ but $Q_1(t) = 0$), the server serves the class 1 (class 2) customer that has been waiting the longest. The server idles only if $Q_1(t) = Q_2(t) = 0$.

We choose the cost function for the inverted-V model so that the predicted mean cost is exactly the same as for the N-model under both the FFIR-Gcμ and TR-Gcμ policies, for easy comparison. To do this, first note that in the N-model, the condition $C_1'(p_1 x) = C_2'(p_2 x)$ implies $p_1^*(x) = 1/3$ and $p_2^*(x) = 2/3$ for all $x$, so that

$$E\left[C_1\!\left(\frac{Q_1^{\pi}}{\sqrt{\lambda}}\right)\right] + E\left[C_2\!\left(\frac{Q_2^{\pi}}{\sqrt{\lambda}}\right)\right] \approx \left(2\left(\frac{1}{3}\right)^2 + \left(\frac{2}{3}\right)^2\right) E\Big[\big([\hat X^{(p,v)}]^+\big)^2\Big]$$

for either $\pi$ = TR-Gcμ and $(p, v) = (p^*, v^*)$ or $\pi$ = FFIR-Gcμ and $(p, v) = (p^*, v^{FFR})$. Then, the cost function

$$C(x) = \left(2\left(\frac{1}{3}\right)^2 + \left(\frac{2}{3}\right)^2\right) x^2 \tag{19}$$

in the inverted-V model produces the same predicted mean cost because

$$E\left[C\!\left(\frac{Q^{\pi}}{\sqrt{\lambda}}\right)\right] \approx E\Big[C\big([\hat X^{(p,v)}]^+\big)\Big] = \left(2\left(\frac{1}{3}\right)^2 + \left(\frac{2}{3}\right)^2\right) E\Big[\big([\hat X^{(p,v)}]^+\big)^2\Big]$$

for either $\pi$ = TR-Gcμ and $(p, v) = (p^*, v^*)$ or $\pi$ = FFIR-Gcμ and $(p, v) = (p^*, v^{FFR})$.

5.2. Simulation Results

Figure 5 confirms that the mean cost is predicted within % relative error in both the N-model and the inverted-V model. Moreover, observe that the mean cost under the FFIR-Gcμ control is not much larger than under the TR-Gcμ control. This is to be expected because the maximum percentage cost increase as predicted by $\Delta(L(f))$ defined in §4.2 over the range of $f$ is just under 8%.
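The arithmetic behind the matched cost functions can be checked with exact fractions. The sketch below is illustrative only and assumes quadratic costs of the form $C_i(x) = c_i x^2$ with $c_1 = 2$ and $c_2 = 1$; the function names are hypothetical. It recovers the split $p_1 = 1/3$, $p_2 = 2/3$ from the first-order condition, the coefficient of the pooled cost, and the pool 2 scheduling comparison $2Q_1 > Q_2$.

```python
from fractions import Fraction


def optimal_split(c1, c2):
    """Split (p1, p2), p1 + p2 = 1, equalizing marginal costs of
    C_1(x) = c1 * x^2 and C_2(x) = c2 * x^2.

    The condition C_1'(p1 * x) = C_2'(p2 * x) reads
    2 * c1 * p1 * x = 2 * c2 * p2 * x, hence p1 = c2 / (c1 + c2).
    """
    p1 = Fraction(c2, c1 + c2)
    return p1, 1 - p1


def pooled_coefficient(c1, c2):
    """Coefficient k such that C_1(p1 * x) + C_2(p2 * x) = k * x^2."""
    p1, p2 = optimal_split(c1, c2)
    return c1 * p1 ** 2 + c2 * p2 ** 2


def serve_class_1(q1, q2, c1=2, c2=1):
    """Scheduling choice when both queues are nonempty: serve class 1
    iff C_1'(q1) > C_2'(q2). The common scaling of the queue lengths
    cancels because the derivatives are linear."""
    return 2 * c1 * q1 > 2 * c2 * q2


split = optimal_split(2, 1)
coefficient = pooled_coefficient(2, 1)
```

The coefficient equals $2(1/3)^2 + (2/3)^2 = 2/3$, so the inverted-V cost function in the text is $(2/3)x^2$ under these assumptions.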
When the mean percentage cost increase is so low, the variability in the simulation dominates the cost increase that occurs because the FFIR-Gcμ control is used instead of the TR-Gcμ control, as shown in Table 1. According to our theory in §4, this observation should be true for any cost function, and so in the table we also include the cost function C(x) = x 2 for the inverted-V model.

Figure 5. The mean cost for the N-model when $C_1(x) = 2x^2$ and $C_2(x) = x^2$ and for the inverted-V model when $C(x) = \big(2(1/3)^2 + (2/3)^2\big)x^2$. Panels: (a) mean cost under TR-Gcμ, (b) mean cost under FFIR-Gcμ. Horizontal axis: slow server idleness fraction $f$. Series: predicted; N-model, simulated; inverted-V model, simulated.

Table 1. The percentage increase in the objective function value in (2) under FFIR-Gcμ instead of TR-Gcμ. Columns: $f$; predicted (%); N-model, $C_1$ and $C_2$ given in (18) (%); inverted-V, $C$ given in (19) (%); inverted-V, given C(x) = x 2 (%).

It is informative to compare the simulated mean cost of the FFIR-Gcμ control to the simulated mean cost of the control that we expect asymptotically minimizes the objective function in (2) when the fairness constraint is ignored (equivalently, when $f_1 = 1$ and $f_j = 0$ for all $j \in \{2, 3, \ldots, J\}$). That control is the FSF-Gcμ control that routes by sending an arriving customer to the fastest server available and schedules according to the Gcμ rule. Table 2 shows the simulated percentage cost increase in the objective function value in (2) when the FFIR-Gcμ control is used instead of the FSF-Gcμ control. We conclude that for smaller values of $f$, meaning that idle servers will not almost always be slow servers, the impact of using the blind FFIR-Gcμ control, instead of the control TR-Gcμ that we conjecture asymptotically solves (2), is small compared to the impact of requiring fairness. Note that when $f$ is close to 1, the FFIR-Gcμ, TR-Gcμ, and FSF-Gcμ controls all perform very similarly, and this is reflected in the numbers in the above two tables. Simulation results that confirm the separability of the routing and scheduling controls are available in Section EC.3 in the e-companion.

Table 2. The percentage increase in the objective function value in (2) under FFIR-Gcμ instead of FSF-Gcμ. Columns: $f$; N-model, $C_1$ and $C_2$ given in (18) (%); inverted-V, $C$ given in (19) (%); inverted-V, given C(x) = x 2 (%).

6. Delay and Idling Time Problem Formulation

The problem formulation (2) is in terms of holding costs and a fairness constraint based on the average proportion of idle servers in each pool. Another natural problem formulation seeks to minimize customer delay costs subject to a fairness constraint on server idling time. Our purpose in this section is to show that our results in §4 that are relevant for the DCP (3) that approximates (2) are also relevant when (2) is replaced by a problem formulation based on customer delay and server idling time. To do this, in §6.1, we present our delay and idling-time based problem formulation, and show that it is asymptotically equivalent to our original problem formulation (2). However, the FFIR-Gcμ policy is not blind for our delay and idling-time based problem formulation. Therefore, in §6.2 we propose a blind policy that we establish is asymptotically equivalent to the FFIR-Gcμ policy.

6.1. The Delay and Idling-Time Based Problem Formulation

Let $V_i(t)$, $i \in \mathcal{I}$, be the amount of time a class $i$ customer that arrives at time $t > 0$ must wait for service, and $U_j(t)$, $j \in \mathcal{J}$, be the amount of time a pool $j$ server that becomes idle at time $t > 0$ must wait before again becoming busy (if all $N_j$ servers are idle at time $t$, then $U_j(t)$ is defined as the idling time of a hypothetical pool $j$ server). That is, $V_i$ and $U_j$ are the processes that track the virtual class $i$ waiting time and the virtual pool $j$ idle time, respectively. Let $U(t) = \sum_{j=1}^{J} U_j(t)$. Define the scaled processes ˆV i = V U = U i i I Û j = U j j and

Let $C_i^a(x)$ represent the cost of delaying a class $i \in \mathcal{I}$ customer for $x$ time units and assume $C_i^a(x) = C_i(a_i x)/a_i$ for $C_i$, $i \in \mathcal{I}$, cost functions that satisfy Assumption 3. We are interested in studying the following stochastic minimization problem

minimize $\sum_{i \in \mathcal{I}} a_i E\big[C_i^a(\hat V_i)\big]$
subject to: EÛ j = j EÛ J (20)

where j and J j= j =.

The question arises: why is (20) the appropriate formulation if we are concerned with individual customer delay times and individual server idling times? The PASTA (Poisson Arrivals See Time Averages) property ensures that arriving customers see time averages, meaning that arriving customers see the steady-state distribution of the virtual waiting-time process. This explains the objective function in (20). The successive times at which servers become idle do not form a Poisson process, and so it is not clear that the same logic applies to explain the constraints in (20). Also, the constraints in (20) are given in terms of the virtual idle time, which is difficult to verify because the routing part of a policy will be based on the actual server idle times. Fortunately, under mild assumptions, as $\lambda$ becomes large, servers becoming idle do see time averages, as the following theorem shows.

Theorem 4 (Server Asymptotic ASTA (Arrivals See Time Averages)).
Let 2 be the sequence of service completion times of servers in pool $j$. Suppose that the sequence of systems indexed by $\lambda$ is such that Û j Û j and I j /Nj in D. Then, for any fixed $T > 0$ and $x \in \mathbb{R}$,

D j T Dj T k= as j Û j k x T Û T j s x ds as (21)
