A Little Flexibility is All You Need: Optimality of Tailored Chaining and Pairing*

Achal Bassamboo, Kellogg School of Management, Northwestern University
Ramandeep S. Randhawa, McCombs School of Business, University of Texas at Austin
Jan A. Van Mieghem, Kellogg School of Management, Northwestern University

April 16, 2008

Deciding on the appropriate type and amount of flexibility is a classic management problem. The literature has shown that the choice between specialization and flexibility is not an all-or-nothing proposition. It is typically better to use a tailored portfolio of dedicated and flexible resources, and a little flexibility goes a long way. Let level-k flexibility refer to a resource's ability to process k different types of products. Simulations have shown that using only level-2 flexible resources in a special configuration called chaining achieves almost all the benefits of total flexibility. In this paper, we introduce tailored pairing, which merges and extends the concepts of chaining and tailoring in dynamic processing systems. We optimize the type and amount of flexibility using a Brownian approximation that is asymptotically correct. We show analytically that, for symmetric systems and most practical flexibility cost structures, the optimal flexibility configuration invests a lot in dedicated resources, a little in only bi-level flexibility, but nothing in level-$k > 2$ flexibility, let alone full flexibility. Dedicated resources provide base capacity to serve the majority of the demand, while only a small amount of bi-level flexibility is sufficient to serve the variable demand: dedicated capacity is sized roughly proportional to demand, while flexible capacity is roughly proportional to the square root of demand and to its coefficient of variation. Our main result can be restated as saying that tailored pairing is optimal for symmetric systems. We investigate the accuracy and robustness of our results in asymmetric systems.
It is obvious that the tailored flexible configuration will mirror the asymmetry in the demand. Yet our main result remains: even asymmetric systems do not seem to need flexible resources of level $k > 2$, and complete resource pooling is suboptimal.

* The authors would like to thank the participants of the IEMS-Kellogg Operations seminar, especially Seyed Iravani.

1. Introduction

Deciding on the appropriate type and amount of flexibility is a classic management problem: should different types of products or customers be processed or served with dedicated or flexible capacity? And how much flexibility is needed to effectively match demand and supply? The extant literature on flexibility refers to the ability of a resource to process multiple types of products as mix- (Chod et al., 2008), process- (Sethi and Sethi, 1990), product- (Fine and Freund, 1990) or scope-flexibility (Van Mieghem, 2008). Substantial progress has been made in our understanding of flexibility over the last 15 years. One important insight that is relevant to our work is that the choice between specialization and flexibility is not an all-or-nothing proposition. Two papers in this journal have demonstrated two different interpretations of this insight: tailoring and chaining.

Van Mieghem (1998) showed that it is typically optimal to invest in a portfolio of specialized and flexible resources. In a 2-product setting, such a portfolio consists of two dedicated resources and one fully flexible resource, where the dedicated resources act as base capacity and the flexible resource serves as an optimal cost/benefit response to demand variability. We will refer to such a portfolio approach of fitting or optimizing the type and amount of flexibility to demand profiles as tailored flexibility.

Figure 1. [Panels, left to right: Fully Dedicated; Tailored Chaining = Tailored Pairing; Fully Flexible.] For N = 3 product types, tailored chaining is identical to tailored pairing and is the optimal flexibility configuration for a symmetric system. The optimal capacity portfolio invests a lot in dedicated resources, a little in bi-level flexibility, but nothing in fully flexible resources.

While tailored flexibility is well understood in a 2-product setting, finding desirable flexible processing systems for $N > 2$ products is much more difficult because the choice set of tailored flexibility increases exponentially. Indeed, the capacity portfolio can now consist of $2^N - 1$ level-k flexible resources, which are resources that are able to process $k \in \{1, 2, \ldots, N\}$ products. (Thus, dedicated resources correspond to $k = 1$ and fully flexible resources correspond to $k = N$.) Consequently, the number of possible assignments of products to resources grows exponentially in N.

In their seminal paper, Jordan and Graves (1995) showed that a little flexibility can achieve almost all the benefits of total flexibility by using only level-2 flexible resources in a special configuration called chaining. A chain is a group of products and resources which are all connected, directly or indirectly, by product assignment decisions. Specifically, one deploys N bi-flexible resources to form a connected chain. With N = 3, for example, chaining would assign products 1 and 2 to bi-flexible resource 1, products 2 and 3 to bi-flexible resource 2, and products 3 and 1 to bi-flexible resource 3, as shown by the dashed allocation in the middle panel of Figure 1. Chaining allows for shifting capacity from products with lower than expected demand to those with higher than expected demand.
Jordan and Graves consider a single-period model where random demand is allocated ex-post to pre-fixed capacity. Excess demand is assumed lost and the allocation objective is to minimize the corresponding shortfall. Using simulation and providing some analytical justification, Jordan and Graves demonstrated that the expected shortfall and capacity utilization of chained bi-level flexible resources are close to the expected shortfall and utilization of fully flexible resources with the same capacity. Graves and Tomlin (2003) showed that chaining provides similar benefits in multi-stage systems.

In this paper we introduce the concept of tailored pairing, which merges and extends the concepts of chaining and tailoring. Pairing is a configuration of level-2 flexible resources such that every two classes are linked by exactly one resource. This is typically different from the chaining configurations of Jordan and Graves (1995), where classes are connected in a closed chain by N level-2 flexible resources. Only for $N \le 3$ products does pairing reduce to chaining, with the flexible resources connected via a single chain, as shown in Figure 1. For $N > 3$, however, pairing uses $N(N-3)/2$ more bi-flexible resource types than chaining. Figure 2 shows that for N = 4 pairing uses 6 bi-flexible resources, that is, two additional links compared to chaining (this is in agreement with principle 3 of Jordan and Graves).

Table 1 summarizes the different flexibility terminology that we use in this paper. Tailoring refers to configurations where dedicated capacity is used to process the base demand, and flexible resources handle variability. Hence, a tailored pairing configuration is one where only dedicated and level-2 flexible resources are used; dedicated resources cater to base demand, while the level-2 flexible resources, organized in a pairing configuration, cater to the variability.

Table 1. Different flexible configurations: Terminology.

                             Type of resources used
Flexibility configuration    Dedicated    Bi-flexible    Configuration of level-2 flexible resources
Chaining                     No           Yes            Chained: each class is connected to its neighbor via a flexible resource
Pairing                      No           Yes            Pair-wise connected: each class is paired with every other class via a flexible resource
Tailored chaining            Yes          Yes            Chained
Tailored pairing             Yes          Yes            Paired

We show the optimality of tailored pairing in a dynamic allocation model where the type and amount of flexibility is optimized analytically. While the literature compares various systems numerically or via simulation, we show analytically that a little flexibility not only goes a long way, it is all you need.
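The difference between chaining and pairing described above can be made concrete with a small enumeration. The following is a minimal sketch, not taken from the paper: the function names `chaining`, `pairing` and `as_links` are hypothetical, and classes are simply labeled 1 to N.

```python
from itertools import combinations


def chaining(n):
    """Closed chain: bi-flexible resource j serves classes j and j+1 (wrapping n -> 1)."""
    if n < 3:
        raise ValueError("a closed chain of distinct resources needs n >= 3")
    return [(j, j % n + 1) for j in range(1, n + 1)]


def pairing(n):
    """Pairing: exactly one bi-flexible resource for every pair of classes."""
    return list(combinations(range(1, n + 1), 2))


def as_links(config):
    """Forget orientation so that (3, 1) and (1, 3) count as the same link."""
    return {frozenset(link) for link in config}


for n in (3, 4, 5):
    extra = len(pairing(n)) - len(chaining(n))
    print(f"N={n}: chaining uses {len(chaining(n))}, pairing uses {len(pairing(n))}, extra={extra}")
```

For N = 3 the two link sets coincide, and for N = 4 pairing yields 6 bi-flexible resources against 4 for chaining, in line with the count $N(N-3)/2$ stated in the text.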
Specifically, we consider a processing or queuing system with N classes of customer streams, each requiring a different type of service. The system manager can invest in a portfolio of level-$k \in \{1, 2, \ldots, N\}$ flexible resources. The trade-off is simple: higher levels of flexibility reduce holding costs and waiting times but come at a higher cost. While such systems are not amenable to exact analysis, we provide an approximate mode of analysis for symmetric systems, exactly where flexibility is most valuable. We show that in high-volume settings it is optimal to size capacity such that it is highly utilized. In other words, economic optimization results in a heavy traffic regime for which we provide a Brownian approximation that is asymptotically correct (and very accurate, as we will show).

We show analytically that for most practical flexibility cost structures, the optimal flexibility configuration invests a lot in dedicated resources, a little in only bi-level flexibility, but nothing in level-$k > 2$ flexibility, let alone full flexibility. Dedicated resources provide base capacity to serve the majority of the demand, while only a small amount of bi-level flexibility is sufficient to serve the variable demand. Dedicated capacity is sized roughly proportional to demand, while flexible capacity is roughly proportional to the square root of demand and to its coefficient of variation. Our main result can be summarized as saying that tailored pairing is optimal for symmetric systems.

In addition to this managerial contribution, our analysis also shows that it is optimal to serve the longest queue. This gives rise to a multi-dimensional reflected Brownian motion with state-dependent drift that is not amenable to direct analysis. To circumvent this issue, we use a novel folding technique. This technique involves folding the state-space and studying the order statistics of the limiting queue-length. This ordered queue-length process behaves as a reflected Brownian motion in a wedge. For symmetric systems, using Williams (1987) then allows us to specify the stationary distribution and expected holding costs in closed form and to optimize capacity analytically. This shows that the optimal amount of bi-level flexibility in symmetric systems is proportional to the total coefficient of variation in inter-arrival times and service times and to the square root of the arrival rate. This suggests that, for asymmetric systems with correlated demands, flexibility should be sized proportional to the standard deviation of the total demand it serves. To our knowledge, no such closed-form analytical expressions exist, not even for static models.

Figure 2. [Tailored Pairing (N = 4). Resource flexibility: Level 1 (dedicated); Level 2 (little flex); no Level 3 or 4.] The optimal flexibility configuration for a symmetric system with N = 4 product types consists of dedicated resources that serve the majority of the demand, and a small amount of bi-level flexibility that is sufficient to serve the uncertain demand. The optimal capacity portfolio does not invest in level 3 or 4 flexibility. For N = 4, pairing thus uses 6 bi-level flexible resources while chaining would only use 4.

Given that our analytic results arise from a Brownian approximation that crucially hinges on system symmetry, we investigate the accuracy and the robustness of our results. Simulations show that our analytic prescriptions are quite accurate and within a few percent of optimal. We also investigate the robustness of our results in asymmetric systems and to various flexibility cost structures. It is obvious that the tailored pairing configuration will mirror the asymmetry in the demand. (Indeed, our analytic results suggest that, for asymmetric systems with correlated demands, flexibility should be sized proportional to the standard deviation of the total demand it serves.) Yet our main result remains: optimizing a 3-class system using simulation shows that even asymmetric systems do not seem to need flexible resources of level $k > 2$.
We expect this robustness to extend to asymmetric systems with $N > 3$ products, based on similar observations that chaining performs very well in other simulations (e.g., Jordan and Graves, 1995, Hopp et al., 2004 and Iravani et al., 2005) and numerical studies (e.g., Aksin and Karaesmen, 2008 and Gurumurthi and Benjaafar, 2004) of flexible processing systems.

Our results hold when capacity costs are convex, linear, or reasonably concave in the level of flexibility. What does matter is the marginal cost structure of flexibility. It is obvious that if cost is very concave in the level of flexibility, our result breaks down: indeed, if flexibility were costless, full flexibility would clearly be optimal. Nevertheless, our robustness study shows that our results hold for most reasonable levels of concavity in flexibility. We do, however, assume that capacity costs are linear in capacity size. Clearly, with sufficiently large economies of scale in capacity costs, it is optimal to have fewer servers than our results predict.

Finally, our results also contribute to the growing literature on flexible queuing networks in heavy traffic, which typically has used flexibility in such a way that it leads to complete resource pooling (CRP). CRP amounts to assuming that the servers have sufficiently overlapping flexibility, and that they work collectively to the extent that they act as a single super-server in the heavy traffic limit. That is, processing capacities of the various resources are completely exchangeable in the heavy traffic limit and single-dimensional dynamics result. The complete resource pooling assumption obviously leads to excellent system performance; see for example Harrison (1998), Harrison and Lopez (1999), Williams (2000), Stolyar (2004), Mandelbaum and Stolyar (2004), Ata and Kumar (2005), and references therein. We show that CRP is suboptimal in our setting: it simply is not economical to invest in the amount of flexibility that is needed for CRP. In precise technical terms, while CRP could be obtained using bi-level flexibility only, we show that the optimal amount of bi-level flexibility is of the order of the square root of the demand rates, while CRP requires a higher order. It appears that Ata and Van Mieghem (2008) is the only other paper that investigates partial resource pooling, in an asymmetric system with N = 2 classes.

The outline of the paper is as follows. We start by introducing the model and the essence of our mode of analysis by reviewing capacity sizing for the familiar single-class G/G/1 queue, i.e., N = 1. Then, in Section 3, we describe the basic framework for flexibility that we use in the rest of the paper. In Section 4, we analyze tailored flexibility for N = 2 products and explain the technique of folding to derive the stationary distribution and optimize the expected cost in closed form. Section 5 uses this technique to demonstrate our main result that tailored pairing is optimal. We investigate the robustness of our results by considering asymmetric systems and general cost structures in Section 6. Finally, we conclude with a summary and discussion of limitations and extensions in Section 7. All proofs are relegated to the online Appendix.
2. The Single-Class Reference Model

To introduce the model and the essence of our mode of analysis, we consider capacity sizing for the familiar single-class G/G/1 queue. We start time at $t = 0$ and denote by $A(t)$ the number of customer or job arrivals until time $t$; $A(t)$ is the counting process associated with the inter-arrival times. We assume the latter are i.i.d. random variables (hence $A$ is a renewal process) with mean $1/\lambda$, variance $\sigma_a^2$, and coefficient of variation $c_a = \lambda\sigma_a$. Each customer or job embodies a nominal workload for the system, which is assumed to be i.i.d. with mean $m$, variance $\sigma_s^2$, and coefficient of variation $c_s = \sigma_s/m$. The actual amount of time that the server needs to devote to a customer or job depends on the capacity of the server, which is assumed to process workload at a fixed (deterministic) rate $\mu$ when there is work in the system. Hence, the actual average service time is $m/\mu$ and its variance is $\sigma_s^2/\mu^2$; equivalently, the nominal workload is the service time under nominal capacity $\mu = 1$. Similar to $A(t)$, let $S(t)$ denote the counting process associated with the i.i.d. workload times. Thus $S(t)$ denotes the number of jobs processed until time $t$ if the server works at unit rate uninterrupted until time $t$. We define $S_\mu(t) = S(\mu t)$ for all $t \ge 0$.

The system manager sizes the service rate $\mu$ in order to minimize the total expected cost incurred. We assume there is a capacity cost of $c per unit capacity per unit time and a holding cost of $h incurred per customer per time unit spent in the system (waiting and service). Hence, for a capacity choice of $\mu$, we obtain a total cost of $hEQ + c\mu$, where $Q$ represents the number in system in steady state and $E$ denotes expectation. Thus, the optimization problem is given by

$$\min_{\mu}\; hEQ + c\mu. \tag{1}$$

For general distributions, there is no explicit formula for $EQ$, which makes exact optimization inaccessible beyond numerical means.
However, for large systems operating with high arrival rates, one can utilize diffusion approximations to obtain asymptotically exact estimates of the steady-state number of customers in the system. To do so we consider a system with arrival process denoted by $A^\lambda(t)$, where $A^\lambda(t)$ is a renewal process with rate $\lambda \in \mathbb{R}_+$ and coefficient of variation $c_a$. One can interpret $A^\lambda(t) = A(\lambda t)$. We are interested in developing approximations when $\lambda$ is large. We denote by $T^\lambda(t)$ the cumulative time that the server is busy until time $t$, where the server operates at a fixed rate $\mu$. Then the number of service completions until time $t$ equals $S_\mu(T^\lambda(t))$ and the exact dynamics of the queue-length process $Q^\lambda(t)$ are

$$Q^\lambda(t) = Q^\lambda(0) + A^\lambda(t) - S_\mu(T^\lambda(t)). \tag{2}$$

It will be useful to express the dynamics as

$$Q^\lambda(t) = X^\lambda(t) + Y^\lambda(t), \tag{3}$$

where

$$X^\lambda(t) = Q^\lambda(0) + \left[A^\lambda(t) - \lambda t\right] - \left[S_\mu(T^\lambda(t)) - \frac{\mu}{m}T^\lambda(t)\right] + \left(\lambda - \frac{\mu}{m}\right)t, \tag{4}$$

$$Y^\lambda(t) = \frac{\mu}{m}\left[t - T^\lambda(t)\right]. \tag{5}$$

Given that $t - T^\lambda(t)$ is the cumulative time that the server is idle, $Y^\lambda$ is called the cumulative idleness process. Under any work-conserving policy, $Y^\lambda$ increases only at those times $t$ at which $Q^\lambda(t) = 0$, and it can be shown that $Y^\lambda(t) = \sup_{0 \le s \le t}\left[-X^\lambda(s)\right]^+$ (e.g., Iglehart, 1973 and p. 19, Harrison). The exact queue dynamics are thus only a function of the process $X^\lambda$. Unfortunately, the process $X^\lambda$ is not tractable in general, and that is where our approximation comes in.

The essence of this approximation can now be summarized as follows. First, approximate the time allocation process $T^\lambda$ in (4) by its average $\rho^\lambda t$, where $\rho^\lambda = \lambda m/\mu$ is the average fraction of time that the server is busy. Second, recall that scaled, centered renewal processes converge to driftless Brownian motion: for large $\lambda$,

$$\frac{A^\lambda(t) - \lambda t}{\sqrt{\lambda}} \approx_d B_a(t),$$

where $B_a(t)$ is a driftless Brownian motion with variance $c_a^2$, and $\approx_d$ denotes approximately equal in distribution. (The equality is exact in the limit as $\lambda \to \infty$.) Similarly, we can approximate

$$\frac{1}{\sqrt{\lambda}}\left(S_\mu(T^\lambda(t)) - \frac{\mu}{m}T^\lambda(t)\right) \approx \frac{1}{\sqrt{\lambda}}\left(S(\mu\rho^\lambda t) - \frac{\mu}{m}\rho^\lambda t\right) \approx_d B_s(t),$$

where $B_s(t)$ is a driftless Brownian motion with variance $c_s^2$. In summary, we approximate

$$X^\lambda(t) \approx_d X(t) = Q^\lambda(0) + \sqrt{\lambda}\,B_a(t) - \sqrt{\lambda}\,B_s(t) + \left(\lambda - \frac{\mu}{m}\right)t,$$

where $X(t)$ is a Brownian motion with drift $\lambda - \frac{\mu}{m} = \frac{\mu}{m}(\rho^\lambda - 1)$ and variance $\lambda(c_a^2 + c_s^2)$. We will denote $\sigma^2 := (c_a^2 + c_s^2)/2$. It then follows that $Q^\lambda$ can be approximated by the Brownian motion $X$ reflected at zero.
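The reflection relation $Y(t) = \sup_{0 \le s \le t}[-X(s)]^+$ used above is easy to illustrate on a discretely sampled path. The sketch below is an illustration only: the function name `reflect_at_zero` is hypothetical, and the path is a plain list of samples rather than a continuous process.

```python
def reflect_at_zero(x):
    """One-sided reflection of a sampled path x.

    Returns (q, y) with q[t] = x[t] + y[t] and
    y[t] = max(0, max_{s <= t} -x[s]), the smallest non-decreasing
    'push' that keeps q non-negative.
    """
    q, y = [], []
    running = 0.0  # running maximum of -x, floored at zero
    for xs in x:
        running = max(running, -xs)
        y.append(running)
        q.append(xs + running)
    return q, y


path = [1.0, 0.0, -2.0, -1.0, -3.0, 0.0]  # free process X (illustrative numbers)
queue, idleness = reflect_at_zero(path)
print(queue)     # reflected path, never negative
print(idleness)  # cumulative push, grows only while the queue is at zero
```

In this discrete analogue the idleness term increases only at samples where the reflected path sits at zero, which mirrors the work-conserving property stated for $Y$.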
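If $\rho^\lambda < 1$, the limiting distribution of $Q^\lambda(\infty)$ is thus exponential with rate $\frac{\mu}{m}(1-\rho^\lambda)/(\lambda\sigma^2) = (1-\rho^\lambda)/(\rho^\lambda\sigma^2)$ (cf. page 15, Harrison). Thus, we have

$$\Pr\left(Q^\lambda(\infty) > x\right) \approx \exp\left(-\frac{(1-\rho^\lambda)\,x}{\rho^\lambda\sigma^2}\right) \quad\text{so that}\quad EQ^\lambda(\infty) \approx \frac{\rho^\lambda}{1-\rho^\lambda}\,\sigma^2,$$

which is the generalization of the familiar Pollaczek-Khintchine formula.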
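As a quick numerical companion to this approximation, the sketch below evaluates $EQ \approx \rho\sigma^2/(1-\rho)$ with $\sigma^2 = (c_a^2 + c_s^2)/2$. The function names are hypothetical and the parameter values are illustrative. As a sanity check, with exponential inter-arrival and workload times ($c_a = c_s = 1$) the expression reduces to $\rho/(1-\rho)$, the standard M/M/1 mean number in system.

```python
def sigma_squared(c_a, c_s):
    """sigma^2 = (c_a^2 + c_s^2) / 2, as defined in the text."""
    return (c_a ** 2 + c_s ** 2) / 2.0


def approx_mean_queue(lam, m, mu, c_a, c_s):
    """Brownian approximation of the steady-state mean number in system."""
    rho = lam * m / mu  # average fraction of time the server is busy
    if rho >= 1.0:
        raise ValueError("the approximation requires rho < 1")
    return rho / (1.0 - rho) * sigma_squared(c_a, c_s)


# Illustrative numbers: arrival rate 9, unit mean workload, capacity 10.
print(approx_mean_queue(lam=9.0, m=1.0, mu=10.0, c_a=1.0, c_s=1.0))
print(approx_mean_queue(lam=9.0, m=1.0, mu=10.0, c_a=1.0, c_s=0.0))
```

The second call uses deterministic workloads ($c_s = 0$), which halves $\sigma^2$ and hence the approximate queue length relative to the first call.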

2.1. Optimal capacity sizing

We now seek the optimal capacity $\mu$ that minimizes the total cost

$$\Pi^\lambda(\mu) \equiv hEQ^\lambda(\infty) + c\mu \tag{6}$$

$$\approx \frac{h\lambda m}{\mu - \lambda m}\,\sigma^2 + c\mu. \tag{7}$$

As the approximate cost in (7) is convex in $\mu$ for $\mu > \lambda m$, we can use the first-order condition to obtain the minimizer

$$\mu^* = \lambda m + \sigma\sqrt{\frac{h\lambda m}{c}}. \tag{8}$$

The optimal cost becomes $\Pi(\mu^*) = c\lambda m + 2\sigma\sqrt{ch\lambda m}$. The optimal capacity is the sum of two parts: base capacity $\lambda m$, which matches the average arriving workload, plus safety capacity $\sigma\sqrt{\lambda mh/c}$ to accommodate variability in the arriving workload. The capacity sizing equation (8) shows that the optimal safety capacity increases linearly with the standard deviation $\sigma$ and exhibits economies of scale. Indeed, the capacity per unit of demand rate is

$$\frac{\mu^*}{\lambda} = m + \sigma\sqrt{\frac{hm}{c\lambda}},$$

where the safety capacity per unit decreases in $\lambda$, as does the optimal cost per unit. Notice that these expressions are similar to results of capacity sizing in a newsvendor setting with normal demand. The following result formalizes the analysis of this section.

Proposition 1. (a) The optimal solution to (1), $\mu^\lambda_{opt}$, satisfies $\mu^\lambda_{opt}/\lambda \to m$ as $\lambda \to \infty$. (b) The solution $\mu^*$ given in (8) is asymptotically optimal for the optimization problem (1) in the sense that for large $\lambda$

$$\Pi^\lambda(\mu^*) = \Pi^\lambda_* + o(\sqrt{\lambda}), \tag{9}$$

where $\Pi^\lambda_*$ is the optimal value of (1).

This result is a specific version of our main result, which we will state later in Theorem 3. Proposition 1 states that the performance of the prescription $\mu^*$ derived from approximate analysis is not far from optimal: the optimality gap is of a smaller order than $\sqrt{\lambda}$. Part (a) of the result is very important as it establishes that the regime in which the approximations hold is indeed the right regime to consider from an economic perspective.

In the remainder of this paper, we will generalize the analysis above to a system with multiple customer classes and various types of flexible servers. Before doing so, we first describe the system with N customer classes and develop some notation that we will use throughout the paper.
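The single-class sizing rule (8) and the approximate cost (7) can be checked numerically. The following is a minimal sketch under illustrative parameter values; the function names `approx_cost` and `optimal_capacity` are hypothetical and not part of the paper.

```python
import math


def approx_cost(mu, lam, m, sigma, h, c):
    """Approximate total cost (7): holding cost plus capacity cost."""
    if mu <= lam * m:
        raise ValueError("capacity must exceed the offered load lam * m")
    return h * lam * m * sigma ** 2 / (mu - lam * m) + c * mu


def optimal_capacity(lam, m, sigma, h, c):
    """Sizing rule (8): base capacity plus square-root safety capacity."""
    return lam * m + sigma * math.sqrt(h * lam * m / c)


lam, m, sigma, h, c = 100.0, 1.0, 1.0, 1.0, 1.0  # illustrative values
mu_star = optimal_capacity(lam, m, sigma, h, c)
print(mu_star)                                   # base 100 plus safety 10
print(approx_cost(mu_star, lam, m, sigma, h, c)) # c*lam*m + 2*sigma*sqrt(c*h*lam*m)
print(mu_star / lam)                             # capacity per unit of demand
```

Increasing `lam` while holding the other inputs fixed shrinks `mu_star / lam` toward `m`, which is the economies-of-scale effect described in the text.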
3. Model Primitives and Basic Setup For Flexibility

We will denote classes by $i = 1, 2, \ldots, N$ and the arrival process of class $i$ customers by $A_i^\lambda(t)$. We assume that all arrival processes are independent renewal processes with common rate $\lambda > 0$. Let $\sigma_a$ denote the standard deviation of the inter-arrival times. Each arriving job has a service requirement that is independent and identically distributed across all the jobs with mean $m$ and variance $\sigma_s^2$. The coefficient of variation of service times is denoted by $c_s = \sigma_s/m$, while that of the inter-arrival times is $c_a = \lambda\sigma_a$. We assume that $c_a$ is a constant independent of the rate $\lambda$ and will henceforth denote $\sigma^2 = (c_a^2 + c_s^2)/2$.

As our system is completely symmetric, we consider only symmetric capacity assignments. That is, we assume that each class has a dedicated server assigned to it that operates at a fixed rate $\mu_1$ that is the same for each class. Further, note that for each level-k flexible resource, there are $\binom{N}{k}$ different configurations of classes that it can handle. (We use the notation $\binom{p}{q} = \frac{p!}{(p-q)!\,q!}$ if $p \ge q$, and 0 otherwise.) Thus, there are a total of $\sum_{k=1}^{N}\binom{N}{k} = 2^N - 1$ different resources in the system. Due to the symmetry in the system, each of the $\binom{N}{k}$ level-k flexible resources is assumed to have the same capacity, which we will denote by $\mu_k$.

In addition to a unit holding rate $h$, the system incurs a capacity cost rate that depends on capacity size and flexibility type. We assume capacity costs are linear in size and affine in type. The cost rate of capacity size $\mu_k$ of a level-k flexible resource is $c_k\mu_k$, where $c_k = c(1 + (k-1)\delta)$. Thus, $c > 0$ is the unit cost of dedicated capacity and $\delta > 0$ is the constant flexibility premium. Section 6 will address non-linear flexibility cost functions and asymmetric systems.

Let $Q_i^\lambda(t)$ denote the queue-length at time $t$ in the queue of class $i$ and $EQ_i^\lambda(\infty)$ its steady-state expected value. Using the holding cost of $h per job per unit time, we obtain the total cost rate of a capacity portfolio $\mu = (\mu_1, \mu_2, \ldots, \mu_N)$ as

$$\Pi^\lambda(\mu) = h\sum_{i=1}^{N} EQ_i^\lambda(\infty) + \sum_{k=1}^{N} c_k\binom{N}{k}\mu_k.$$

We seek the capacity portfolio that minimizes costs:

$$\min_{\mu}\; \Pi^\lambda(\mu). \tag{10}$$

We will solve this optimization problem asymptotically when $\lambda$ is large. The following result will be useful in characterizing the regime that we need to focus on to solve this problem.

Theorem 1. Any optimal solution $(\mu_1, \ldots, \mu_N)$ to the optimization problem (10) satisfies

$$\mu_1 = \lambda m + \sqrt{\lambda}\,\hat\mu_1 + o(\sqrt{\lambda}), \text{ and} \tag{11}$$

$$\mu_k = \sqrt{\lambda}\,\hat\mu_k + o(\sqrt{\lambda}) \text{ for } k \ge 2, \tag{12}$$

for some $\hat\mu_1, \ldots, \hat\mu_N \in \mathbb{R}$ with $\hat\mu_k \ge 0$ for $k \ge 2$ and $\sum_{k=1}^{N}\binom{N}{k}\hat\mu_k > 0$.
This result shows that economic optimality will size the dedicated servers on the order of the arrival rate $\lambda$, while the flexible capacity is much smaller and proportional to the standard deviation, which is $O(\sqrt{\lambda})$. This implies that to characterize an approximate solution to the optimization problem (10), it suffices to restrict attention to capacity portfolios of the form $(\lambda m + \sqrt{\lambda}\hat\mu_1, \sqrt{\lambda}\hat\mu_2, \ldots, \sqrt{\lambda}\hat\mu_N)$, where $\hat\mu_k \ge 0$ for $k \ge 2$ and $\sum_{k=1}^{N}\binom{N}{k}\hat\mu_k > 0$. The latter condition is essential for stability as it ensures that the total demand $N\lambda m$ does not exceed total capacity, i.e., $N\lambda m < \sum_{k=1}^{N}\binom{N}{k}\mu_k$. Equivalently, stability requires that we have positive safety capacity, $\sum_{k=1}^{N}\binom{N}{k}\hat\mu_k > 0$. The corresponding server cost is $Nc_1(\lambda m + \sqrt{\lambda}\hat\mu_1) + \sqrt{\lambda}\sum_{k=2}^{N} c_k\binom{N}{k}\hat\mu_k$. Hence, focusing on this regime, we can rewrite the optimization problem (10) as

$$\min_{\{\hat\mu:\ \sum_{k=1}^{N}\binom{N}{k}\hat\mu_k > 0,\ \hat\mu_2, \ldots, \hat\mu_N \ge 0\}}\; Nc\lambda m + \sqrt{\lambda}\,\hat\Pi(\hat\mu), \tag{13}$$

where $\hat\Pi(\hat\mu) = h\sum_{i=1}^{N} EQ_i^\lambda(\infty)/\sqrt{\lambda} + \sum_{k=1}^{N} c_k\binom{N}{k}\hat\mu_k$. We will refer to (13) as the second-order optimization problem. Although we can solve this second-order optimization problem for any finite $\lambda$ through simulation, to derive structural insights we will consider an analytical asymptotic analysis that is correct when the arrival rate $\lambda \to \infty$. To illustrate our mode of analysis, we begin by considering the N = 2 class setting. In particular, we will demonstrate the novel folding approach that allows tractability, and even closed-form solutions. The general N case will be analyzed in a similar manner and the detailed treatment is presented in Section 5.

To formalize the mode of analysis, the following terminology will be useful. All random elements in this paper are defined on the probability space $(\Omega, \mathcal{F}, P)$. Further, we assume all stochastic processes to lie in the space of functions that are right continuous and possess left limits. For a collection of probability measures $P_n$ and $P$ defined on $(A, \mathcal{A})$, where $A$ is a general metric space and $\mathcal{A}$ its Borel σ-field, we say that as $n \to \infty$, $P_n \Rightarrow P$, i.e., $P_n$ weakly converges to $P$, if and only if $\int_A f\,dP_n \to \int_A f\,dP$ for all bounded, continuous real-valued functions $f$ on $A$. Further, if $X_n$ and $X$ are random elements of this space such that $P_n$ and $P$ are the probability measures associated with $X_n$ and $X$ respectively, then $X_n \Rightarrow X$ if and only if $P_n \Rightarrow P$.

4. A Two-Class Symmetric Model: Optimal Flexibility

In this section, we analyze the optimal system configuration in a symmetric system with two classes of incoming jobs. Such systems can use dedicated servers, and flexible servers that can serve either class. We are interested in studying a high-volume setting where there are a large number of arrivals. We begin by prescribing a routing rule that determines the order in which jobs are allocated to the different servers. Though computing an optimal routing policy is fairly involved in general, we can use the symmetry in this setting to characterize the following strict priority policy, which we will simply refer to as a longest queue (LQ) policy and which is optimal: when a dedicated server completes a service request, it next processes any job in the system of its own class; if there is no job corresponding to its class, it idles.
If we only allow non-preemptive policies, it is optimal for the flexible server to serve the class with the longer queue due to the symmetry in the system. Noting that the regime under consideration places the system in heavy traffic, the system performance remains the same even when the flexible server serves the longer queue pre-emptively. Thus, our mode of analysis is too crude to differentiate between preemptive and non-preemptive policies. For ease of analysis, we will focus on the pre-emptive version of this policy.

4.1. The folding method

Asymptotically, we expect the scaled queue-length processes to behave as diffusions. Typically, in such systems, one encounters a state-space collapse wherein the multi-dimensional diffusion collapses to a lower dimension. This collapse requires sufficient flexible capacity, in particular at a scale greater than $O(\sqrt{\lambda})$, which keeps the system state in a lower dimensional manifold. However, Theorem 1 rules out the state-space collapse, and hence the limiting system behavior is a genuine two-dimensional diffusion process that we characterize as follows.

Lemma 1. As $\lambda \to \infty$, if $Q^\lambda(0)/\sqrt{\lambda} \Rightarrow \hat Q(0)$, then $Q^\lambda(\cdot)/\sqrt{\lambda} \Rightarrow \hat Q(\cdot)$, where $\hat Q$ is given by

$$\hat Q_1(t) = \hat Q_1(0) - \frac{1}{m}\int_0^t \left(\hat\mu_1 + 1\{\hat Q_1(s) \ge \hat Q_2(s)\}\,\hat\mu_2\right)ds + \sqrt{2}\,\sigma B_1(t) + L_1(t),$$

$$\hat Q_2(t) = \hat Q_2(0) - \frac{1}{m}\int_0^t \left(\hat\mu_1 + 1\{\hat Q_2(s) > \hat Q_1(s)\}\,\hat\mu_2\right)ds + \sqrt{2}\,\sigma B_2(t) + L_2(t), \tag{14}$$

where $B_1$ and $B_2$ are two standard independent Brownian motions, and $L_i$ are non-decreasing, continuous processes such that $L_1(0) = L_2(0) = 0$, $\hat Q_i(t) \ge 0$ and $\int_0^t \hat Q_i(s)\,dL_i(s) = 0$ for all $t > 0$.

The limiting diffusion characterized in (14) is not directly amenable to analysis. The key reason is that the drift of this reflected Brownian motion $(\hat Q_1, \hat Q_2)$ is not continuous. This discontinuity stems from the optimal LQ routing policy, where the flexible server serves the longer queue in a pre-emptive fashion. This causes the drift of the diffusion to change when a queue switches from being the longer to the shorter, or vice versa, as depicted in Figure 3(a).

Figure 3. A pictorial representation of the drifts of the limiting queueing dynamics $\hat Q$ (left). The order statistics $(\hat Q_{\min}, \hat Q_{\max})$ live in the folded state space with constant drift (right). [Panels: (a) the original two-dimensional diffusion $\hat Q$, with axes $\hat Q_1$ and $\hat Q_2$ and drift components $\hat\mu_1$ and $\hat\mu_1 + \hat\mu_2$; (b) the diffusion $\hat Q$ after folding, with axes $\hat Q_{\min}$ and $\hat Q_{\max}$.]

Luckily, we can transform the diffusion $\hat Q$ into one with constant drift and recover analytic tractability by monitoring the order statistics of the queue-length processes and folding the state-space. Notice that $\hat Q_1(t) + \hat Q_2(t) = \hat Q_{\max}(t) + \hat Q_{\min}(t)$, where $\hat Q_{\max}(t) = \max(\hat Q_1(t), \hat Q_2(t))$ and $\hat Q_{\min}(t) = \min(\hat Q_1(t), \hat Q_2(t))$. The benefit of considering the maximum and minimum queue-lengths is that the drifts of these ordered queues are constant, which allows the simpler dynamics of Proposition 2.

Proposition 2. As $\lambda \to \infty$, if $Q^\lambda(0)/\sqrt{\lambda} \Rightarrow \hat Q(0)$, then $\frac{1}{\sqrt{\lambda}}\left(Q^\lambda_{\max}(\cdot), Q^\lambda_{\min}(\cdot)\right) \Rightarrow \hat Q(\cdot)$, where $\hat Q$ is given by

$$\hat Q_{\max}(t) = \hat Q_{\max}(0) - \frac{\hat\mu_1 + \hat\mu_2}{m}\,t + \sqrt{2}\,\sigma B_1(t) + Y_1(t),$$

$$\hat Q_{\min}(t) = \hat Q_{\min}(0) - \frac{\hat\mu_1}{m}\,t + \sqrt{2}\,\sigma B_2(t) - Y_1(t) + Y_2(t),$$

where $B_1$ and $B_2$ are two independent Brownian motions, and $Y_1$, $Y_2$ are two non-decreasing continuous processes such that $Y_1(0) = Y_2(0) = 0$, $\hat Q_{\max}(t) \ge \hat Q_{\min}(t) \ge 0$, $\int_0^t \left(\hat Q_{\max}(s) - \hat Q_{\min}(s)\right)dY_1(s) = 0$ and $\int_0^t \hat Q_{\min}(s)\,dY_2(s) = 0$ for all $t \ge 0$.

Our next step involves computing the steady-state distribution of the process $(\hat Q_{\max}, \hat Q_{\min})$. To do so, we unfold the state-space and consider this process (with constant drift) on the entire positive orthant. Given that it then simplifies to two independent Brownian motions in a quadrant, its limiting distribution will be a simple product form of exponentials. When folding the state-space into the upper triangle (or wedge) in Figure 3(b), owing to the normal reflection, we still obtain a product form of exponentials.
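Defining $G = \{(x, y) \in \mathbb{R}_+^2 : x \ge y\}$, we characterize the steady-state distribution of the process $(\hat Q_{\max}(\cdot), \hat Q_{\min}(\cdot))$ in the following result.

Proposition 3. The steady-state distribution of the process $(\hat Q_{\max}, \hat Q_{\min})$ on $G$ has the density

$$\pi(x, y) = \alpha\exp\left(-\frac{\hat\mu_1 + \hat\mu_2}{\sigma^2 m}\,x - \frac{\hat\mu_1}{\sigma^2 m}\,y\right),$$

where $\alpha^{-1} = \int_G \exp\left(-\frac{\hat\mu_1 + \hat\mu_2}{m\sigma^2}\,x - \frac{\hat\mu_1}{m\sigma^2}\,y\right)dx\,dy$ is a normalizing constant. Further, the corresponding expected queue-lengths are

$$E\hat Q_{\min}(\infty) = \frac{\sigma^2 m}{2\hat\mu_1 + \hat\mu_2} \quad\text{and}\quad E\hat Q_{\max}(\infty) = E\hat Q_{\min}(\infty) + \frac{\sigma^2 m}{\hat\mu_1 + \hat\mu_2}. \tag{15}$$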
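The expected queue-lengths in (15) can be cross-checked against a direct numerical integration of the product-form density over the wedge $G$. The sketch below is only a sanity check under illustrative parameter values; the function names, the midpoint-rule discretization, and the truncation of the wedge at a finite upper bound are all assumptions made for this example.

```python
import math


def folded_means_closed_form(mu1, mu2, sigma, m):
    """Expected (min, max) queue lengths from (15)."""
    return (sigma ** 2 * m / (2 * mu1 + mu2),
            sigma ** 2 * m / (2 * mu1 + mu2) + sigma ** 2 * m / (mu1 + mu2))


def folded_means_numeric(mu1, mu2, sigma, m, step=0.02, upper=12.0):
    """Midpoint-rule integration of the density over {x >= y >= 0}, truncated at `upper`."""
    a = (mu1 + mu2) / (sigma ** 2 * m)  # decay rate in x (the longer queue)
    b = mu1 / (sigma ** 2 * m)          # decay rate in y (the shorter queue)
    n = int(upper / step)
    mass = sum_x = sum_y = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        for j in range(i + 1):
            y = (j + 0.5) * step
            weight = 0.5 if j == i else 1.0  # diagonal cells lie half inside the wedge
            d = weight * math.exp(-a * x - b * y)
            mass += d
            sum_x += d * x
            sum_y += d * y
    return sum_y / mass, sum_x / mass


print(folded_means_closed_form(1.0, 1.0, 1.0, 1.0))
print(folded_means_numeric(1.0, 1.0, 1.0, 1.0))
```

With these unit parameters the closed form gives 1/3 for the shorter queue and 5/6 for the longer one, and the numerical integral should agree up to discretization error.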

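Using this steady-state characterization, the optimization problem (13) can be expressed as
$$\min_{\{(\hat\mu_1,\hat\mu_2)\,:\,\hat\mu_2 \ge 0,\ 2\hat\mu_1 + \hat\mu_2 > 0\}} \hat\Pi(\hat\mu_1, \hat\mu_2) = \left(\frac{1}{\hat\mu_1 + \hat\mu_2} + \frac{2}{2\hat\mu_1 + \hat\mu_2}\right)\frac{\sigma^2 hm}{2} + \bigl(2c\hat\mu_1 + c(1+\delta)\hat\mu_2\bigr). \tag{16}$$
Using the first-order conditions for optimality, we obtain the following closed-form characterization of the solution:

Proposition 4. For $N = 2$, the optimal safety capacity that solves (16) is
$$(\hat\mu_1^*, \hat\mu_2^*) = \begin{cases}
\sigma\sqrt{\dfrac{hm}{2c}}\,\sqrt{\dfrac{3 + \frac{1}{1+\gamma}}{(2+\gamma)(2+\gamma(1+\delta))}}\;(-1,\,-\gamma) & \text{if } \delta < 0.2,\\[2ex]
\sigma\sqrt{\dfrac{hm}{2c}}\,\left(0,\ \sqrt{\dfrac{3}{1+\delta}}\right) & \text{if } \delta = 0.2,\\[2ex]
\sigma\sqrt{\dfrac{hm}{2c}}\,\sqrt{\dfrac{3 + \frac{1}{1+\gamma}}{(2+\gamma)(2+\gamma(1+\delta))}}\;(1,\,\gamma) & \text{if } 0.2 < \delta < 0.5,\\[2ex]
\sigma\sqrt{\dfrac{hm}{2c}}\,(1,\,0) & \text{if } \delta \ge 0.5,
\end{cases} \tag{17}$$
where $\gamma$ is defined as follows:
$$\gamma = \begin{cases}
\dfrac{2\left(1 - 3\delta + \sqrt{\delta - \delta^2}\right)}{5\delta - 1} & \text{if } \delta < 0.5,\ \delta \ne 0.2,\\[2ex]
0 & \text{if } \delta \ge 0.5.
\end{cases} \tag{18}$$
Further, at the optimal solution, the safety capacity cost, $2c\hat\mu_1^* + c(1+\delta)\hat\mu_2^*$, equals the holding cost, $hE[\hat{Q}_1(\infty) + \hat{Q}_2(\infty)]$.

Notice that $(\hat\mu_1^*, \hat\mu_2^*)\,\sigma^{-1}\sqrt{c/(mh)}$ depends only on the flexibility premium $\delta$. Hence, for a fixed $\delta$ value, the optimal safety capacities scale with the standard deviation, as expected. Having characterized the solution to the limiting problem, we formally construct a prescription for our original system and state its optimality property in the following result.

Proposition 5. The capacity portfolio $(\lambda m + \sqrt{\lambda}\,\hat\mu_1^*,\ \sqrt{\lambda}\,\hat\mu_2^*)$, with $\hat\mu_1^*, \hat\mu_2^*$ given by (17), is asymptotically optimal for the optimization problem (1 in the sense that
$$\lim_{\lambda\to\infty} \frac{\Pi^{\lambda}(\lambda m + \sqrt{\lambda}\,\hat\mu_1^*,\ \sqrt{\lambda}\,\hat\mu_2^*) - \Pi^{\lambda}_*}{\sqrt{\lambda}} = 0, \tag{19}$$
where $\Pi^{\lambda}_*$ is the solution to (1.

This result states that the loss in optimality incurred by using the prescription $(\lambda m + \sqrt{\lambda}\,\hat\mu_1^*,\ \sqrt{\lambda}\,\hat\mu_2^*)$ is negligible at the $O(\sqrt{\lambda})$ scale.

4.2. Discussion of results: type and amount of flexibility

The explicit characterization of the asymptotic solution yields some interesting insights. Figure 4(a) depicts the optimal safety capacities where we normalize the scale factor $\sigma\sqrt{hm/c} = 1$. Proposition 4 prescribes that it is never optimal to use any flexibility if the flexibility premium exceeds 50%, i.e., $\delta \ge 0.5$.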
As the flexibility premium decreases, it becomes optimal to use flexibility, and the corresponding flexible capacity increases as expected. When the premium falls below 20%, we obtain $\hat\mu_1^* < 0$, which implies that the optimal dedicated capacity is less than the nominal level, and thus the flexibility is used for maintaining the stability of the system as well. Figure 4(b) shows how the investment cost in flexible and total capacity varies with the flexibility cost premium $\delta$ at a fixed dedicated capacity cost of $c = 1$. As expected, an increase in the premium

leads to an increase in the total capacity cost and a decrease in the investment in flexible capacity. The latter entails lesser pooling benefits and hence an increase in the total safety capacity needed, as depicted in Figure 4(a). We observe that as the flexibility premium increases, the optimal flexible capacity decreases and is substituted by dedicated capacity. However, this substitution is not perfect: as shown in the figure, we over-substitute, and the total safety capacity, and hence also the total capacity, increases as a function of $\delta$. Though similar sizing substitution effects have been observed (see, for example, Van Mieghem (1998)), the benefit of our analysis is that we find these sizing results analytically, which cannot be done in newsvendor models.

Figure 4. The optimal capacity portfolio (left) and investment cost (right) as a function of the flexibility premium $\delta$. Panels: (a) optimal safety capacity levels (dedicated $\hat\mu_1$, flexible $\hat\mu_2$, and total safety capacity); (b) optimal safety capacity costs (flexible capacity and total safety capacity). Horizontal axis in both panels: flexibility cost premium $\delta$.

The dependence of the optimal solution on the variability and holding cost is also worth pointing out. We can think of the solution $(\lambda m + \sqrt{\lambda}\,\hat\mu_1^*,\ \sqrt{\lambda}\,\hat\mu_2^*)$ as the analog of a safety capacity refinement around the mean demand in a standard newsvendor problem with normal demand. Our safety capacity $(\hat\mu_1^*, \hat\mu_2^*)$ is also proportional to the underlying standard deviation $\sigma$. As the safety capacity cost is equal to the holding cost, similar to the economic order quantity (EOQ) model, we also obtain that the optimal safety capacities are proportional to $\sigma\sqrt{hm/c}$, in particular to the square root of the holding cost.
Thus, as the variability in the system (or the holding cost) increases, one would require higher dedicated safety capacity $\hat\mu_1^*$ as well as higher flexible capacity $\hat\mu_2^*$.

4.3. Accuracy of prescriptions

To study the accuracy of the asymptotic solution derived in Proposition 5, we compare it with the actual optimal capacities derived via simulation and discrete search. Specifically, we consider Poisson arrivals with rates $\lambda$ = 5, 1, 4 and mean service time $m = 1$, unit dedicated capacity cost $c = 1$, and holding cost $h = 1$. To study the effect of variability in service times, we study three different service time distributions: deterministic, normal (standard deviation = .5), and exponential. In each case, we compare the optimal cost with the expected total cost of the system when operating with our proposed solution. The optimal cost is derived via simulation and discrete search over a capacity grid for $(\mu_1, \mu_2)$. For each capacity level in this grid, we used a simulation run length of 1, time units to estimate the expected queue length of the system. A grid search then allows us to compute the optimal total expected cost for $\delta \in$ (, .45]. Our analytic estimate for the total cost under any capacity level is less than .% above optimality (95% confidence).
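The prescription being benchmarked in this comparison is cheap to evaluate. The sketch below codes the closed form of Proposition 4 under the assumption that the limiting cost (16) reads $\left(\frac{1}{\hat\mu_1+\hat\mu_2} + \frac{2}{2\hat\mu_1+\hat\mu_2}\right)\frac{\sigma^2 hm}{2} + 2c\hat\mu_1 + c(1+\delta)\hat\mu_2$, with regime changes at $\delta = 0.2$ and $\delta = 0.5$. The function names and default parameter values are illustrative and are not the authors' code.

```python
import math


def limiting_cost(mu1, mu2, delta, sigma=1.0, h=1.0, m=1.0, c=1.0):
    """Assumed form of the limiting cost (16): holding cost plus safety capacity cost."""
    if mu1 + mu2 <= 0 or 2 * mu1 + mu2 <= 0:
        raise ValueError("unstable portfolio")
    holding = (1.0 / (mu1 + mu2) + 2.0 / (2 * mu1 + mu2)) * sigma**2 * h * m / 2.0
    capacity = 2 * c * mu1 + c * (1 + delta) * mu2
    return holding + capacity


def gamma_ratio(delta):
    """Ratio mu2/mu1 at the optimum, as in (18); undefined at delta = 0.2."""
    if delta >= 0.5:
        return 0.0
    return 2.0 * (1 - 3 * delta + math.sqrt(delta - delta**2)) / (5 * delta - 1)


def optimal_safety_capacity(delta, sigma=1.0, h=1.0, m=1.0, c=1.0):
    """Closed-form (mu1, mu2) following the four regimes of (17)."""
    scale = sigma * math.sqrt(h * m / (2 * c))
    if delta >= 0.5:                      # flexibility too expensive: dedicated only
        return scale, 0.0
    if abs(delta - 0.2) < 1e-12:          # boundary case: zero dedicated safety capacity
        return 0.0, scale * math.sqrt(3.0 / (1 + delta))
    g = gamma_ratio(delta)
    k = scale * math.sqrt((3 + 1 / (1 + g)) / ((2 + g) * (2 + g * (1 + delta))))
    sign = -1.0 if delta < 0.2 else 1.0   # below 0.2 the dedicated safety capacity is negative
    return sign * k, sign * g * k
```

With the defaults, `optimal_safety_capacity(0.3)` returns a portfolio with positive dedicated and flexible safety capacity, and `limiting_cost` evaluated there splits evenly between holding and capacity cost, in line with the cost-equality property stated in Proposition 4.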

Figure 5. The accuracy of analytical results (represented by solid lines) was investigated by comparing its total cost to the one found through optimization by simulation using Poisson arrivals. To investigate the convergence rates and the impact of service time variability, three different arrival rates ($\lambda$ = 5, 1, 4) and three different service time distributions are shown. Panels: (a) deterministic service times ($c_s = 0$); (b) normally distributed service times ($c_s$ = .5); (c) exponentially distributed service times ($c_s = 1$), where the curves for $\lambda$ = 5, 1, and 4 are basically indistinguishable from the asymptote. Axes in each panel: cost versus flexibility cost premium $\delta$.

Figures 5(a)-5(c) show the cost function centered by the nominal capacity cost and scaled by $\sqrt{\lambda}$ as a function of the flexibility premium $\delta$. The solid lines represent the prescribed solution, while the dashed lines represent the optimal values obtained via simulation. Observe that in all cases the prescription is very close to the optimal cost. In fact, the worst case occurs in Figure 5(b) for $\lambda$ = 5. Even here the optimality gap is less than 3.5%. For $\lambda$ = 4 the optimality gap in all cases is less than .5%. The simulated cost also converges to the asymptote. For the exponential distribution (Figure 5(c)), the cost obtained via simulation is quite close to the asymptotic value even for $\lambda$ = 5. Finally, observe that total costs increase as variability increases from (a) to (b) to (c).

We now extend our model to the case of $N > 2$, where we have the option of $N$ different kinds of flexible resources. Herein, we demonstrate the key result of the paper that tailored pairing, which utilizes only dedicated and level-2 flexible resources, is optimal.

5. Optimality of Tailored Pairing Solutions

In this section, we generalize our analysis to symmetric processing systems of $N$ customer classes.
As described in Section 3, the system can invest in a portfolio of level-$k$ flexible resources ($1 \le k \le N$). As before, such systems are intractable so we resort to an approximate analysis for large arrival

rates that is asymptotically correct as $\lambda \to \infty$. We assume that an LQ policy is used to route jobs to different servers. Specifically, any flexible resource serves the class with the largest number of customers in the system among the classes it can serve. Let $Q^{\lambda}_{[\cdot]}(t) := (Q^{\lambda}_{[1]}(t), \dots, Q^{\lambda}_{[N]}(t))$ be the order statistics for the number of customers in the various classes, where $Q^{\lambda}_{[1]}(t) \ge Q^{\lambda}_{[2]}(t) \ge \dots \ge Q^{\lambda}_{[N]}(t)$. Under the LQ policy, the longest queue $Q_{[1]}$ is served by all servers that can process it, and hence is processed at rate $\mu_1 + (N-1)\mu_2 + \dots + (N-1)\mu_{N-1} + \mu_N$.

Consider class $[i]$ with $i > 1$. We can compute the number of level-$k$ flexible resources that will serve this class in the following manner. A level-$k$ flexible resource will serve class $[i]$ only if class $[i]$ has the longest queue among all classes that can be handled by the resource. Thus, if $k > N - i + 1$, no level-$k$ flexible resource will serve class $[i]$. However, if $k \le N - i + 1$, the level-$k$ flexible resources for which class $[i]$ is the longest queue will serve it. Their number is obtained by selecting the remaining $k - 1$ classes from the $N - i$ classes left after removing the top $i$ ranked classes, i.e., $\binom{N-i}{k-1}$. Hence, the total processing rate for class $[i]$ equals $\sum_{k=1}^{N-i+1} \binom{N-i}{k-1}\mu_k$.

The system symmetry thus implies that we only need to keep track of the order statistics of the queue lengths, which greatly simplifies analysis. Then, using the folding concept introduced in Section 4.1, we obtain the following limiting system characterization.

Proposition 6. As $\lambda \to \infty$, if $\hat{Q}^{\lambda}(0) \Rightarrow \hat{Q}(0)$, then $\hat{Q}^{\lambda}_{[\cdot]}(\cdot) \Rightarrow \hat{Q}_{[\cdot]}(\cdot)$, where $\hat{Q}_{[\cdot]}$ is given by
$$\hat{Q}_{[i]}(t) = \hat{Q}_{[i]}(0) - \frac{1}{m}\sum_{k=1}^{N-i+1}\binom{N-i}{k-1}\hat\mu_k\,t + \sigma B_i(t) - Y_{i-1}(t) + Y_i(t), \tag{20}$$
for $i = 1, \dots, N$, where $B_i$ are $N$ independent Brownian motions, $Y_0 \equiv 0$, and $Y_i$ for $i = 1, \dots, N$ are non-decreasing continuous processes such that $Y_i(0) = 0$ for $i = 1, \dots, N$, $\hat{Q}_{[1]}(t) \ge \hat{Q}_{[2]}(t) \ge \dots \ge \hat{Q}_{[N]}(t) \ge 0$,
$$\int_0^t \bigl(\hat{Q}_{[i]}(s) - \hat{Q}_{[i+1]}(s)\bigr)\,dY_i(s) = 0 \ \text{ for } i = 1, \dots, N-1, \quad\text{and}\quad \int_0^t \hat{Q}_{[N]}(s)\,dY_N(s) = 0 \ \text{ for all } t > 0.$$
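The counting argument behind the drift of class $[i]$ can be checked by brute force. The sketch below enumerates every level-$k$ resource type as a $k$-subset of the ranked classes and counts those whose highest-ranked (longest) class is $[i]$; it then compares the count with $\binom{N-i}{k-1}$. The function names are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from math import comb


def count_serving_resources(n_classes, level, rank):
    """Count level-`level` resource types for which the class at `rank` is the longest queue.

    Classes are labelled by rank 1..n_classes, with rank 1 the longest queue.
    A resource type is the set of classes it can serve; under LQ routing it
    serves the best-ranked class in that set, i.e. the minimum rank.
    """
    ranks = range(1, n_classes + 1)
    return sum(1 for served in combinations(ranks, level) if min(served) == rank)


def processing_rate(mu, rank):
    """Total processing rate of the class at `rank`; mu[k-1] is the level-k capacity."""
    n_classes = len(mu)
    return sum(comb(n_classes - rank, k - 1) * mu[k - 1]
               for k in range(1, n_classes + 1))
```

Because `math.comb` returns zero when the lower index exceeds the upper one, the sum in `processing_rate` can run over all levels without an explicit cutoff at $k = N - i + 1$.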
Defining $G_N = \{x \in \mathbb{R}^N_+ : x_1 \ge x_2 \ge \dots \ge x_N\}$, we can characterize the steady-state distribution of the $\hat{Q}_{[\cdot]}$ process as follows.

Proposition 7. The steady-state distribution of the process $\hat{Q}_{[\cdot]}(\cdot)$ on $G_N$ has the density
$$\pi(x) = \alpha \exp\left(-\sum_{i=1}^{N} \frac{2\sum_{k=1}^{N-i+1}\binom{N-i}{k-1}\hat\mu_k}{\sigma^2 m}\,x_i\right),$$
where $\alpha = \left(\int_{G_N} \exp\left(-\sum_{i=1}^{N} \frac{2\sum_{k=1}^{N-i+1}\binom{N-i}{k-1}\hat\mu_k}{\sigma^2 m}\,x_i\right) dx\right)^{-1}$ is the normalizing constant. Further, for $i = 1, \dots, N$, we have
$$E\hat{Q}_{[i]}(\infty) = \sum_{l=1}^{N-i+1} \frac{\sigma^2 m}{2\sum_{k=1}^{N}\sum_{j=l-1}^{N-1}\binom{j}{k-1}\hat\mu_k}.$$

Proposition 7 allows us to express the second-order expected steady-state cost rate as a function of the choice of dedicated and flexible servers as
$$\hat\Pi(\hat\mu) = \sum_{i=1}^{N} \frac{N-i+1}{\sum_{k=1}^{N}\sum_{j=i-1}^{N-1}\binom{j}{k-1}\hat\mu_k}\,\frac{\sigma^2 hm}{2} + \sum_{k=1}^{N}\binom{N}{k}\hat\mu_k c_k. \tag{21}$$
We now characterize the solution $\hat\mu^*$ to the main optimization problem (13).

Theorem 2 (Main Result). Tailored pairing is optimal for symmetric processing systems: it is never optimal to invest in level-$k > 2$ flexible capacity. That is, any solution to (13) has $\hat\mu_k^* = 0$ for $2 < k \le N$.
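To experiment with the cost expression (21), one can code it directly. The sketch below assumes the form $\hat\Pi(\hat\mu) = \sum_{i=1}^{N} \frac{N-i+1}{\sum_k \sum_{j=i-1}^{N-1}\binom{j}{k-1}\hat\mu_k}\frac{\sigma^2 hm}{2} + \sum_k \binom{N}{k}\hat\mu_k c_k$ together with the affine cost structure $c_k = c\,[1 + \delta(k-1)]$. The function name and defaults are illustrative assumptions, not the authors' implementation.

```python
from math import comb


def asymptotic_cost(mu_hat, sigma=1.0, h=1.0, m=1.0, c=1.0, delta=0.3):
    """Second-order cost rate for an N-class symmetric system.

    mu_hat[k-1] is the safety capacity of each level-k resource, k = 1..N,
    where N = len(mu_hat). Capacity costs are assumed affine in the level.
    """
    n = len(mu_hat)
    holding = 0.0
    for i in range(1, n + 1):
        # Pooled drift seen by the sum of the (n - i + 1) shortest queues.
        pooled = sum(
            mu_hat[k - 1] * sum(comb(j, k - 1) for j in range(i - 1, n))
            for k in range(1, n + 1)
        )
        if pooled <= 0:
            raise ValueError("portfolio does not stabilize the system")
        holding += (n - i + 1) / pooled
    holding *= sigma**2 * h * m / 2.0
    capacity = sum(
        comb(n, k) * mu_hat[k - 1] * c * (1 + delta * (k - 1))
        for k in range(1, n + 1)
    )
    return holding + capacity
```

Passing a two-element list reproduces the two-class cost (16), and a vector of the form `[mu1, mu2, 0.0]` gives the three-class cost with no fully flexible capacity. Comparing `asymptotic_cost` at portfolios with and without a positive level-3 entry is one simple way to probe Theorem 2 numerically.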

It is remarkable that it is sufficient to only use dedicated and level-2 flexible resources regardless of the number of customer classes. The interpretation in terms of base and safety capacity is as before: dedicated capacity is sized to serve the majority of the demand ($O(\lambda)$). The second-order capacities, including $\frac{N(N-1)}{2}\hat\mu_2^*$ of level-2 flexible capacity, represent safety capacity to deal with stochastic fluctuations. Notice that we need $\binom{N}{2}$ level-2 flexible resources, which means that tailored chaining that uses only $N$ bi-flexible resources is suboptimal for $N > 3$ and must be extended by $\frac{N(N-3)}{2}$ additional bi-flexible resources to achieve the optimal tailored pairing configuration.

Using Theorem 2, we can rewrite $\hat\Pi$ as follows:
$$\hat\Pi(\hat\mu_1, \hat\mu_2, 0, \dots, 0) = \sum_{k=1}^{N} \frac{N-k+1}{(N-(k-1))\hat\mu_1 + \frac{N(N-1) - (k-1)(k-2)}{2}\hat\mu_2}\,\frac{\sigma^2 hm}{2} + N\hat\mu_1 c + \frac{N(N-1)}{2}\hat\mu_2\,c(1+\delta).$$
The optimization problem is then equivalent to
$$\min_{\{\hat\mu\,:\,N\hat\mu_1 + \frac{N(N-1)}{2}\hat\mu_2 > 0,\ \hat\mu_2 \ge 0\}} \hat\Pi(\hat\mu_1, \hat\mu_2, 0, \dots, 0). \tag{22}$$
The formal optimality property follows in a similar fashion as Proposition 5.

Theorem 3. The capacity portfolio $\mu^{\lambda} = (\lambda m + \sqrt{\lambda}\,\hat\mu_1^*,\ \sqrt{\lambda}\,\hat\mu_2^*,\ 0, \dots, 0)$, where $(\hat\mu_1^*, \hat\mu_2^*)$ denotes an optimizer of (22), is asymptotically optimal for the optimization problem (1 in the sense that
$$\lim_{\lambda\to\infty} \frac{\Pi^{\lambda}(\mu^{\lambda}) - \Pi^{\lambda}_*}{\sqrt{\lambda}} = 0, \tag{23}$$
where $\Pi^{\lambda}_*$ is the solution to (1.

Unlike the cases $N = 1, 2$ (see Sections 2 and 4) and $N = 3$ (see below), there is no explicit closed-form solution for the capacity portfolio when the number of classes $N \ge 4$. This is due to the fact that the first-order conditions entail solving a polynomial of order $N + 1$, which is greater than 4. Obviously, these conditions are easily solved numerically for given parameter values. For the case $N = 3$, the first-order condition requires solving a quartic, which is the highest-order polynomial equation with an explicit solution, and we investigate this case further in the next subsection. For all systems with $N$ classes, we can however characterize the maximum flexibility premium beyond which it is never optimal to invest in flexible resources.

Proposition 8.
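For flexibility premiums $\delta \ge 0.5$, it is optimal to only use dedicated capacity, i.e., $\hat\mu_i^* = 0$ for all $2 \le i \le N$.

5.1. Explicit solutions for a symmetric 3-class system

When $N = 3$, the optimization problem (22) reduces to
$$\min_{\{(\hat\mu_1,\hat\mu_2)\,:\,\hat\mu_2 \ge 0,\ \hat\mu_1 + \hat\mu_2 > 0\}} \hat\Pi(\hat\mu_1, \hat\mu_2, 0) = \left(\frac{1}{\hat\mu_1 + 2\hat\mu_2} + \frac{2}{2\hat\mu_1 + 3\hat\mu_2} + \frac{1}{\hat\mu_1 + \hat\mu_2}\right)\frac{\sigma^2 hm}{2} + 3c\hat\mu_1 + 3c(1+\delta)\hat\mu_2, \tag{24}$$
and the first-order optimality equations involve solving a quartic, which allows the following explicit solution: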

Theorem 4 (Main Result for $N = 3$). Tailored pairing, which reduces to tailored chaining for 3-class systems, is optimal for symmetric systems. The optimal capacity portfolio is $(\lambda m + \sqrt{\lambda}\,\hat\mu_1^*,\ \sqrt{\lambda}\,\hat\mu_2^*,\ 0)$, where
$$(\hat\mu_1^*, \hat\mu_2^*) = \begin{cases}
\sigma\sqrt{\dfrac{hm}{2c}}\,\bigl(-\xi(\gamma_{\min}),\ -\gamma_{\min}\,\xi(\gamma_{\min})\bigr) & \text{if } \delta < \frac{17}{61},\\[2ex]
\sigma\sqrt{\dfrac{hm}{2c}}\,\left(0,\ \sqrt{\dfrac{13}{18(1+\delta)}}\right) & \text{if } \delta = \frac{17}{61},\\[2ex]
\sigma\sqrt{\dfrac{hm}{2c}}\,\bigl(\xi(\gamma_{\max}),\ \gamma_{\max}\,\xi(\gamma_{\max})\bigr) & \text{if } \frac{17}{61} < \delta < \frac{1}{2},\\[2ex]
\sigma\sqrt{\dfrac{hm}{2c}}\,(1,\ 0) & \text{if } \delta \ge \frac{1}{2},
\end{cases} \tag{25}$$
where
$$\xi(\gamma) = \sqrt{\frac{18 + 54\gamma + 39\gamma^2}{(1+2\gamma)(2+3\gamma)(3+3\gamma)(3+3(1+\delta)\gamma)}}$$
and $\gamma_{\min}$ and $\gamma_{\max}$ respectively denote the smallest and largest real root of
$$\delta = \frac{6 + 32\gamma + 63\gamma^2 + 54\gamma^3 + 17\gamma^4}{12 + 72\gamma + 162\gamma^2 + 162\gamma^3 + 61\gamma^4}. \tag{26}$$

Figure 6. Optimal safety capacity levels for symmetric systems with three classes, $N = 3$. Curves: bi-level flexible capacity, fully flexible capacity, and dedicated safety capacity, plotted against the flexibility cost premium $\delta$.

Recall that 3-class systems have 3 bi-flexible resources, and hence tailored pairing is equivalent to tailored chaining, as both tailored chaining and pairing require three level-2 resources. In other words, Theorem 4 shows that chaining of bi-flexible resources (as suggested by Jordan and Graves, 1995) together with dedicated resources is the optimal flexibility configuration. Theorem 3 shows that for $N > 3$, this tailored chain of $N$ bi-flexible servers must be extended by $\frac{N(N-3)}{2}$ other bi-flexible servers to obtain the tailored pairing configuration.

As for the 2-class solution, the optimal portfolio for the 3-class (and even $N$-class) system is proportional to $\sigma\sqrt{hm/c}$. As the flexibility premium increases, it is optimal to shift or substitute the flexible safety capacity to dedicated, as shown in Figure 6. Notice that $\hat\mu_2^* = 0$ for $\delta \ge 0.5$ and using only dedicated servers is optimal, in agreement with Proposition 8.

6. Robustness of Tailored Pairing Solutions

In this section, we investigate the robustness of our main result when two key assumptions do not hold.
In particular, in Section 6.1, we relax the constant flexibility cost premium and demonstrate that tailored pairing remains optimal even when capacity costs are fairly concave in the level of flexibility. In Section 6.2, we relax the symmetry assumption on arrivals and show that tailored pairing remains optimal in a simulation study with asymmetric arrivals.

Figure 7. Our main result remains optimal even when the cost structure is fairly concave in flexibility. The figure shows the affine cost structure (dashed lines) that is assumed for our main analytic result for $N = 5$ and when $c_1 = \delta_1 = 1$ and $\delta_2$ takes on three values: .1, .5, .5. The solid lines show the corresponding maximal concavity in flexibility costs for which tailored pairing continues to be optimal. Axes: cost of flexibility versus level of flexibility.

6.1. Non-linear flexibility cost structure

Our main result (Theorem 2) shows that tailored pairing is optimal when capacity costs are affine in the level of flexibility. Assuming that the cost of one unit capacity of a resource with $k$-level flexibility is $c_k = c_1[1 + \delta(k-1)]$ means that the marginal cost of flexibility is constant and equal to $\delta$. Remarkably, under this affine cost structure, the optimality of tailored flexibility is independent of the magnitude of $\delta$, as long as it is positive. Obviously, if additional flexibility were costless ($\delta = 0$), full flexibility would dominate and our main result breaks down.

Here we investigate the robustness of our result for non-affine flexibility cost structures. Clearly, when the capacity cost is convex in the level of flexibility ($c_k > c_{k-1} + \delta c_1$), higher levels of flexibility become even less attractive and our main result holds. So let us investigate how concave the cost structure can be for tailored pairing to remain optimal. Let $\delta_k$ denote the marginal cost to increase the level of flexibility of one capacity unit from level $k-1$ to $k$. Then, $c_k = c_1\left[1 + \sum_{l=2}^{k}\delta_l\right]$. We study an $N = 5$ class system where we normalize $c_1 = \delta_1 = 1$ and $\sigma\sqrt{mh/c} = 1$ and consider some fixed values of $\delta_2$ = .1, .5, and .5. For each fixed value of $\delta_2$, we solve for the smallest marginal cost of flexibility values $\delta_k$, $k = 3, 4, 5$, for which it remains optimal not to invest in level-$k > 2$ flexibility.
The results of this numerical computation of the asymptotic cost expression (21) are displayed in Figure 7. Our main result continues to hold for any concave flexibility cost structures above the solid frontiers. To put this into perspective, one can model this frontier using a power function that is often used to model economies of scale in capacity investment: $c_k = c_1 + \beta(k-1)^{\alpha}$ for some $0 < \alpha < 1$ and $\beta > 0$ (e.g., see Van Mieghem, 2008). A simple regression analysis on the data in Figure 7 gives us a corresponding parameter $\alpha$ = .68, .53, .45 for the cases $\delta_2$ = .5, .5, .1, respectively. This would suggest that our main result is robust to the cost structure of flexibility, given that the parameter $\alpha$ is virtually always between .6 and 1 in practice. However, we must exercise some caution here because we are measuring economies of scope instead of economies of scale (more on the latter in the next section).

6.2. Asymmetric arrival rates

Our analysis has assumed symmetry in arrival rates, which is exactly when the benefits of flexibility are highest (Jordan and Graves, 1995, and p. 17, Van Mieghem, 2008). This suggests that investment in flexibility levels higher than 2 would remain suboptimal in asymmetric systems. It
