Approximate Dynamic Programming for Networks: Fluid Models and Constraint Reduction


Michael H. Veatch
Department of Mathematics, Gordon College
April 1, 2005

Abstract

This paper demonstrates the feasibility of using approximate linear programming (ALP) to compute nearly optimal average cost for multiclass queueing network control problems. ALP works directly with the LP form of the optimality equations, but approximates the differential cost by a linear form. We use quadratics, piece-wise quadratic value functions from fluid models, and other approximating functions based on the structure of the optimality equations and numerical experience. The ALP contains one constraint for every state-action pair; however, for quadratics and certain other basis functions, the constraints can be reduced algebraically to a smaller equivalent set. On examples with two to six buffers, bounds on average cost were 14-26% below optimal using small LPs; tighter bounds can be achieved by using more approximating functions and larger LPs. Although the size of the LP is exponential in the number of buffers, for a given level of accuracy, the method requires much less computation than standard value iteration or policy iteration, making accurate bounds obtainable for somewhat larger networks. The ability to compute near-optimal performance measures using LP-based techniques could be very useful in the development and testing of heuristic control policies.

1 Introduction

The increasing size, complexity, and flexibility of manufacturing processes, supply chains, communications, and computer systems have made them increasingly difficult to model and to operate efficiently. Recurrent themes seen in many industries are a large number of interacting processes and significant randomness due to customer demand, which must be rapidly served, in addition to uncertainties in the service process. The natural modeling framework for these systems is a multiclass queueing network (MQNET).
Even under the simplest assumptions of exponentially distributed service and interarrival times and linear holding costs, MQNET control problems are NP-hard, so we cannot hope to solve large problems exactly [18]. Standard policy iteration or value iteration is too computationally intensive to use even on moderate-size problems, particularly in heavy traffic. Many heuristics for scheduling and controlling these systems have been proposed, and some implemented, that have the potential to improve performance. However, the justification of these

policies has been less than satisfactory. One would like to know that a proposed heuristic is stabilizing and robustly near optimal. Significant progress has been made in the analysis of stability, but relatively little is known about suboptimality because of the inability to solve or tightly bound the optimal control problem. The commonly used bounds [3], [2] tend to be loose under balanced heavy traffic, which are often the conditions of most interest. Some heuristics have been shown to be asymptotically optimal for the limiting Brownian control problem in heavy traffic or for the fluid control problem, which essentially considers large buffer contents. While asymptotic optimality provides some guidance, it is generally too loose a criterion for designing near-optimal policies. This paper demonstrates the feasibility of using approximate linear programming (ALP) to compute nearly optimal average cost for MQNETs dramatically faster than exact DP methods. ALP works directly with the LP form of the optimality equations. Compared to approximate value iteration or approximate policy iteration, the ALP approach has stronger theoretical results, uses only standard LP, and is potentially faster [6]. To make the number of variables tractable, the value function is approximated by a linear form. Theory and numerical results suggest that if an accurate class of approximating functions can be found, then the ALP will also be accurate, in the sense explained below. Thus, one key to using ALP is selecting a compact but accurate class of approximating functions, or basis. For MQNETs, we exploit knowledge of the value functions from fluid models, the structure of the optimality equations, and numerical experience to construct the appropriate form of value function. A natural starting point in approximating the differential cost function is the cost to drain of the associated fluid model. Fluid cost has been used to initialize the value iteration algorithm [4].
The fluid cost function is either quadratic or piece-wise quadratic and captures the quadratic growth of the differential cost in each direction in the state space. In this paper we find and use these quadratic regions for a three-class example. For larger examples, the two-station fluid relaxation in [14] could be used to form a piece-wise quadratic approximation. We also use approximating functions motivated by an analysis of the optimality equations similar to [15]. The second key to using the ALP method is reducing the number of constraints. The ALP contains one constraint for every state-action pair, which is impractical for the problems of interest. Using a different quadratic on each set of states defined by which buffers are empty, [16] and [17] show that the constraints can be reduced to a finite set. Their work differs from ours in that they focus on upper and lower bounds for a specific policy. We develop constraint reduction and approximation methods for additional approximating functions, leading to improved bounds. Numerical tests were conducted on networks with two to six buffers. The ALP bound on average cost was 14-26% below optimal; however, the bound requires only the solution of a small LP for these examples. Accuracy improves as more basis functions are added. For example, simply adding indicator functions starting in states with small queue lengths guarantees that the sequence of ALPs converges to optimal. The rate of convergence was tested for a two-station series queue with traffic intensity of 0.8. An error of less than 1% was achieved using 113 basis functions. Solving this LP is much faster than value iteration, which requires a state space with 28^2 = 784 states and many iterations to achieve the same accuracy. It seems possible to identify much more efficient bases, increasing the rate of convergence and making accurate bounds obtainable for networks of up to eight or ten buffers.
Although many real systems require larger models, most networks analyzed by researchers are of this size. The ability to compute near-optimal performance measures using LP-based techniques could be very valuable in the development and testing of heuristics. In addition to giving bounds on average cost, ALPs might be useful for obtaining near-optimal policies. For discounted problems, [6] provides an error bound for the ALP value function. In

particular, a suitable weighted norm of the error is bounded by the minimum of this error norm over all functions in the approximating class, multiplied by a constant that does not depend on problem size. Similar bounds are given on the performance of the policy implied by the ALP value function. For average cost problems, the theory is not as clear. Policies derived from the ALP can have arbitrarily poor performance or even be unstable [5]. A modification of the ALP method is presented in [8] for which a performance bound is provided. We do not address performance in this paper, but we conjecture that performance of the ALP policy will be fairly good for the network problems and approximating functions that we consider. The ALP approach was originally proposed by [1]. It is applied to discounted network problems in [6] using quadratic value function approximations. Instead of constraint reduction, they use importance sampling of constraints, which is shown to be probabilistically accurate in [7]. Bounds have also been obtained using the achievable region method [3], [2] and duality [10]. The rest of this paper is organized as follows. Section 2 defines the MQNET sequencing problem and the associated fluid control problem, and Section 3 describes average cost ALPs. In Section 4 a variety of differential cost approximations are presented, including detailed analysis of a series queue and a reentrant line. Numerical results on the accuracy of various ALPs are presented and the differential cost function examined in Section 5. Convergence of a sequence of ALPs to the optimal average cost is shown in Section 6. Some open questions are discussed in Section 7.

2 Open MQNET sequencing: Discrete and fluid models

In this section we describe the standard MQNET model and the fluid model associated with it. There are n job classes and m resources, or stations, each of which serves one or more classes. Associated with each class is a buffer in which jobs wait for processing.
Let x_i(t) be the number of class i jobs at time t, including any that are being processed. Class i jobs are served by station σ(i). The topology of the network is described by the routing matrix P = [p_ij], where p_ij is the probability that a job finishing service at class i will be routed to class j, independent of all other history, and the m × n constituency matrix C with entries C_ji = 1 if station j serves class i and C_ji = 0 otherwise. If routing is deterministic, then p_{i,s(i)} = 1, where s(i) is the successor of class i. Exogenous arrivals occur at one or more classes according to independent Poisson processes with rate α_i in class i. Processing times are assumed to be independently exponentially distributed with mean m_i = 1/μ_i in class i. To create an open MQNET, the routing matrix P is assumed to be transient, i.e., I + P + P^2 + ... is convergent. As a result, there will be a unique solution to the traffic equation

λ = α + P'λ, i.e., λ = (I − P')^{-1} α.

Here λ_i is the effective arrival rate to class i, including exogenous arrivals and routing from other classes, and vectors are formed in the usual way. The traffic intensity is given by

ρ = C diag(m_1, ..., m_n) λ;

that is, ρ_j is the traffic intensity at station j. Stability requires that ρ < 1.
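As a concrete check of these definitions, the traffic equation and station loads can be computed directly. The sketch below uses a two-class tandem topology (class 1 routed to class 2) with the series-queue rates that appear later in Table 3; NumPy is an assumed dependency.

```python
import numpy as np

# Tandem line: class 1 feeds class 2, then jobs exit (deterministic routing).
alpha = np.array([1.0, 0.0])           # exogenous arrival rates alpha_i
mu = np.array([1.5, 1.25])             # service rates; m_i = 1/mu_i
P = np.array([[0.0, 1.0],              # routing matrix: p_12 = 1
              [0.0, 0.0]])
C = np.array([[1, 0],                  # station 1 serves class 1
              [0, 1]])                 # station 2 serves class 2

# Traffic equation: lambda = alpha + P' lambda  =>  lambda = (I - P')^{-1} alpha
lam = np.linalg.solve(np.eye(2) - P.T, alpha)

# Traffic intensity per station: rho = C diag(m_1, ..., m_n) lambda
rho = C @ (lam / mu)
print(lam, rho)    # effective rates [1, 1]; loads [2/3, 0.8]
```

Here ρ = (2/3, 0.8) < 1 componentwise, so the stability condition holds.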

The network has sequencing control: each server must decide which job class to work on next, or possibly to idle. Preemption is allowed. Let u_i(t) = 1 if class i is served at time t and 0 otherwise. Admissible controls are nonanticipating and have

Cu(t) ≤ 1, u_i(t) ≤ x_i(t).

The first constraint states that a server's allocations cannot exceed one; the second prevents serving an empty buffer. The objective is to minimize the long-run average cost

J(x, u) = limsup_{T→∞} (1/T) E_{x,u} ∫_0^T c'x(t) dt.

Here E_{x,u} denotes expectation given the initial state x(0) = x and policy u. We consider only stationary Markov policies and write u(t) = u(x(t)). We use the uniformized, discrete-time Markov chain and assume that the potential event rate is Σ_{i=1}^n (α_i + μ_i) = 1. Let P_u = [p_u(x, y)] be the transition probability matrix under policy u. It is convenient to introduce the one-step operator T_u, defined by

(T_u h)(x) = c'x + Σ_y p_u(x, y) h(y).

Due to the linearity of T_u with respect to u, only extreme points u_i = 0 and u_i = 1 need be considered. Let A(x) be the set of feasible extreme point controls in state x. Under the condition ρ < 1, the control problem has several desirable properties:

1. An optimal policy exists and its average cost is constant, J* = min_u J(x, u) for all x.

2. There is a solution J and h to the average cost optimality equation

J + h(x) = min_{u ∈ A(x)} (T_u h)(x). (1)

3. Under the additional condition that h is bounded below by a constant and above by a quadratic, there is a unique solution J* and h* to (1) satisfying h*(0) = 0. Furthermore, J* is the optimal average cost, any policy

u*(x) = arg min_u (T_u h*)(x)

is optimal, and h* is the differential cost of this policy,

h*(x) = limsup_{T→∞} [ E_{x,u*} ∫_0^T c'x(t) dt − E_{0,u*} ∫_0^T c'x(t) dt ]. (2)

Properties (1) and (2) can be established using general results for MDPs as in [2, Theorems 7.2.3 and 7.5.6].
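For concreteness, the one-step operator T_u of the uniformized chain can be written out for a two-queue series system. The rates below are illustrative choices satisfying α + μ_1 + μ_2 = 1, not parameters from the paper; unused rate mass becomes a self-transition.

```python
import numpy as np

# One-step operator T_u for the uniformized two-queue series system.
alpha, mu1, mu2 = 0.3, 0.4, 0.3        # alpha + mu1 + mu2 = 1
c = np.array([1.0, 1.5])               # holding costs (illustrative)

def T(h, x, u):
    """(T_u h)(x) = c'x + sum_y p_u(x, y) h(y)."""
    x1, x2 = x
    u1, u2 = u
    val = c @ np.array(x, dtype=float)
    val += alpha * h((x1 + 1, x2))     # arrival to queue 1
    # service at station 1 routes a job to queue 2; else self-transition
    val += mu1 * (h((x1 - 1, x2 + 1)) if u1 and x1 > 0 else h((x1, x2)))
    # service at station 2 removes a job from the system
    val += mu2 * (h((x1, x2 - 1)) if u2 and x2 > 0 else h((x1, x2)))
    return val

h = lambda x: x[0] ** 2 + x[1] ** 2    # arbitrary trial function
print(T(h, (2, 1), (1, 1)))            # -> 9.7 for this h and state
```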
The key conditions are that cost is norm-like (so there are only finitely many low-cost states), state 0 is reached in finite mean time from any state under some policy, and any low-cost

state can be reached from 0 in finite mean time. For networks, properties (1) and (2) are shown in [1, Theorem 7]; (3) is obtained by applying standard verification theorems to networks; see [11, Theorem 2.1 and Section 7]. A more comprehensive treatment is [13, Theorem 10.7]. A natural starting point in approximating the differential cost function is the associated fluid model. In this model all transitions are replaced by their mean rates and a continuous state q_i(t) ∈ R_+ is used. In a fluid control problem, for each initial state there is a time horizon T such that q(t) = 0 for all t ≥ T. The fluid control problem corresponding to (1) is

(FCP) V(x) = min ∫_0^T c'q(t) dt
s.t. q̇(t) = α + Bu(t)
Cu(t) ≤ 1
q(0) = x
q(t) ≥ 0, u(t) ≥ 0,

where α = (α_1, ..., α_n)' and B = (P' − I) diag(μ_1, ..., μ_n). An optimal u(t) can be chosen so that it is piece-wise constant, making q(t) piece-wise linear with q̇(t) existing except on a set of zero measure. We will use the fluid cost to drain V(x) to guide our approximation of h(x). The motivation for this approximation is [1, Theorem 7(iv)], based on [11, Theorem 5.2]. It establishes the following connection between the discrete and fluid cost functions:

lim_{κ→∞} h*(κx) / V(κx) = 1. (3)

The optimal fluid policy u(x) partitions R^n_+ into a finite number of control switching sets where the control is constant. Each region is a convex polyhedral cone emanating from the origin. In particular, if a region contains x, it also contains κx, κ > 0. These regions can be subdivided according to the sequence of switching sets that a trajectory enters next. In a region, say S_k, where a certain sequence of switching sets will be visited, V is quadratic:

V(x) = ½ x'Q_k x, x ∈ S_k. (4)

Thus, (3) implies that

h*(x) = ½ x'Q_k x + o(|x|^2), x ∈ S_k, (5)

and the differential cost is dominated by the fluid cost as queue lengths increase. Similar comparisons can be drawn between the optimal policies of the two problems; see [4].
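The cost to drain is easiest to see in the single-queue case: the fluid q(t) = x − (μ − α)t drains linearly, so V(x) = c x^2 / (2(μ − α)). A minimal numerical sketch (illustrative parameters, not from the paper's examples):

```python
import numpy as np

def fluid_cost(x, c, alpha, mu):
    """Cost to drain a single fluid queue: V(x) = c x^2 / (2 (mu - alpha))."""
    assert alpha < mu, "stability requires alpha < mu"
    return c * x ** 2 / (2.0 * (mu - alpha))

# Cross-check by integrating the holding cost along the draining trajectory.
c, alpha, mu, x0 = 1.0, 0.4, 1.0, 3.0
T = x0 / (mu - alpha)                       # time to drain
t = np.linspace(0.0, T, 100001)
q = x0 - (mu - alpha) * t                   # fluid buffer content
numeric = np.sum(0.5 * (q[1:] + q[:-1]) * np.diff(t))   # trapezoidal rule
print(numeric, fluid_cost(x0, c, alpha, mu))            # both 7.5
```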
Although efficient algorithms exist for computing an optimal fluid trajectory, computing the regions S_k requires knowing the optimal policy and suffers from the same combinatorial complexity inherent in the original problem. In this paper we use small examples where the S_k are known. For larger examples, a two-station fluid relaxation as in [14] could be used to form a piece-wise quadratic approximation.

3 Approximate LP: Average cost bounds

In this section we describe a general method for constructing a linear program in a small number of variables that approximates the differential cost and places a lower bound on average cost.

Although value function approximation can also be used with iterative methods, the direct LP method described here appears the most promising [20], [8]. It is well known that, for finite state spaces, an inequality relaxation of Bellman's equation gives an equivalent LP in the same variables,

(LP) max J
s.t. (T_u h)(x) ≥ J + h(x) for all x ∈ Z^n_+, u ∈ A(x)
h(0) = 0.

For countable state spaces, an additional condition is needed on h. A suitable condition for networks is suggested by (5): for some K > 0 and L > 0,

−L ≤ h(x) ≤ K(1 + |x|^2). (6)

A standard argument for the equivalence of (LP) with (6) is as follows. Using the constraints for the optimal policy and letting x_k denote the state after k transitions (including self-transitions of the uniformized chain),

J ≤ c'x_k + E_{u*}[h(x_{k+1}) | x_k] − h(x_k).

Summing, telescoping, and taking expectations yields

J ≤ (1/N) Σ_{k=0}^{N−1} E_{x,u*} c'x_k + (1/N) E_{x,u*} h(x_N) − (1/N) h(x_0).

We need to show that

lim_{N→∞} (1/N) E_{x,u*} h(x_N) = 0 (7)

so that taking the limit as N → ∞ leaves J ≤ J* for any feasible h. Then, since (J*, h*) are feasible, they are optimal for (LP) with (6). To show (7), use the fact that, for all policies u with finite J(x, u),

lim_{N→∞} (1/N) E_{x,u} |x_N|^2 = 0 (8)

[10, Theorem 1]. Although they assume nonidling policies, (8) also holds for weakly nonidling policies where u(t) ≠ 0 if x(t) ≠ 0, which includes u*. Their result also assumes x is in the recurrent class, but for the optimal policy this extends easily to all states. Combining (8) and (6) gives (7). This exact LP has one variable for every state. To create a tractable LP, the differential cost can be approximated by a linear form

h(x) ≈ Σ_{k=1}^K r_k φ_k(x) = (Φr)(x) (9)

using some small set of basis functions φ_k and variables r_k [1]. Assume that φ_k(0) = 0. The resulting approximate LP is

(ALP) J̄ = max J
s.t. (T_u Φr)(x) ≥ J + (Φr)(x) for all x ∈ Z^n_+, u ∈ A(x)
−L ≤ (Φr)(x) ≤ K(1 + |x|^2) for all x.
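The exact LP is easy to set up for a single M/M/1 queue, where its optimal value can be checked against the known average cost cρ/(1 − ρ). The sketch below truncates the state space at N (blocking arrivals there) and uses scipy.optimize.linprog as an assumed dependency; it is an illustration of (LP), not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

# (LP) for a uniformized M/M/1 queue (lam + mu = 1), truncated at state N.
# Variables: z = (J, h(1), ..., h(N)); h(0) = 0 is fixed by normalization.
lam, mu, c, N = 0.4, 0.6, 1.0, 60
A_ub, b_ub = [], []
for x in range(N + 1):
    row = np.zeros(N + 1)
    row[0] = 1.0                      # coefficient of J
    if x >= 1:
        row[x] += 1.0                 # + h(x)
    up = min(x + 1, N)                # arrival; blocked at the truncation
    row[up] -= lam                    # - lam h(x+1)
    if x >= 2:
        row[x - 1] -= mu              # - mu h(x-1); x = 1 services to h(0) = 0
    A_ub.append(row)
    b_ub.append(c * x)                # (T h)(x) >= J + h(x)  <=>  row.z <= c x
res = linprog([-1.0] + [0.0] * N, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (N + 1))
J = -res.fun
print(J)   # approximately c*rho/(1 - rho) = 2.0 with rho = lam/mu = 2/3
```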

The bounds K and L may depend on r; all that is needed is that the bound applies to each φ_k. Since (ALP) is equivalent to the exact LP with the constraints (9) added, the exact LP is a relaxation. Hence, (ALP) gives a lower bound, J̄ ≤ J*. Clearly (ALP) has an optimal solution, say r*. The approximate LP is still not a manageable size because it has one constraint for each state-action pair. In Section 4, various constraint sets are algebraically reduced to a smaller, but still exponentially large, set of constraints. Reduction will also be combined with approximate methods that choose constraints. The relationship between errors in the differential cost (Φr*)(x) − h*(x) and performance of the associated policy is not obvious. The maximum differential cost error is likely to be unbounded; however, one might hope that a suitable norm of this error could be used to bound error in average cost. A differential cost approximation h defines a myopic policy

u_h(x) = arg min_{u ∈ A(x)} (T_u h)(x).

We would like to bound J_ũ − J* in terms of ||(Φr*)(x) − h*(x)|| for some norm, where ũ is the myopic policy found by (ALP). Unfortunately, for general MDPs [5] shows that the system might be unstable under ũ. However, it is clear when the basis is sufficient.

Proposition 1 If (ALP) has a binding constraint for each state x then (i) J̄ = J* and h* = Φr*. (ii) If {φ_k} are linearly independent on Z^n_+, then (ALP) has a unique optimal solution.

Proof. For each x, J̄ and r* satisfy (1), with the minimum achieved by the action u(x) with the binding constraint in (ALP). By assumption, (Φr*)(0) = 0 and uniqueness of solutions to (1) implies (i); (ii) follows.

4 Value function approximation and constraint reduction

This section considers several bases to approximate the differential cost and demonstrates how the constraints of the resulting ALP can be algebraically reduced to a small, or at least more easily approximated, set. In Section 4.1, constraint reduction is given for quadratic approximations. Section 4.2
analyzes a series queue and demonstrates the role of additional approximating functions. A method of reducing piece-wise quadratic approximations, which are suggested by fluid models, is presented in Section 4.3.

4.1 Quadratic approximation

Consider the quadratic differential cost approximation

h(x) = ½ x'Qx + p'x (10)

where Q = [q_ij] is symmetric. This approximation is motivated by (5). It is also interesting to note that for a single uncontrolled queue,

h*(x) = (c / (2(μ − α))) (x^2 + x). (11)
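Equation (11) can be verified directly from Poisson's equation, c x + α[h(x+1) − h(x)] + μ[h(x−1) − h(x)] = J for x ≥ 1. A quick numerical check (illustrative rates, uniformized so α + μ = 1):

```python
# h*(x) = c (x^2 + x) / (2 (mu - alpha)) solves Poisson's equation for the
# uncontrolled single queue, with average cost J = c alpha / (mu - alpha).
alpha, mu, c = 0.4, 0.6, 1.0
h = lambda x: c * (x * x + x) / (2.0 * (mu - alpha))
J = c * alpha / (mu - alpha)           # = c rho / (1 - rho) = 2.0 here

for x in range(1, 200):
    residual = c * x + alpha * (h(x + 1) - h(x)) + mu * (h(x - 1) - h(x)) - J
    assert abs(residual) < 1e-8        # Poisson's equation holds exactly
print("Poisson's equation holds; J =", J)
```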

The quadratic term in (11) is from the fluid, so h*(x) − V(x) is linear; the effect of randomness is to shift the fluid value function 1/2 unit to the left. In [15], (11) is used as a one-dimensional relaxation of the bottleneck station in (1) and shown to be accurate in an asymptotic sense as traffic intensity approaches one. The constraints of (ALP) can be reduced to a finite set for quadratic h [16, Appendix A]. To simplify notation, consider only deterministic routing. First, we write the constraints as

J ≤ Σ_i (c_i x_i + α_i [h(x + e_i) − h(x)] + u_i μ_i [h(x − e_i + e_{s(i)}) − h(x)]). (12)

Unlike a discounted model, only differences in h appear in these constraints, simplifying the analysis. It is convenient to let x = z + u, so that a control u is feasible for all z ∈ Z^n_+. Substituting (10) into (12) yields

J ≤ d^u + c^{u'}z (13)

where

c^u_i = c_i + Σ_j [α_j q_ij + u_j μ_j (q_{i,s(j)} − q_ij)]
d^u = Σ_i [u_i (c^u_i + μ_i (½ q_ii + ½ q_{s(i),s(i)} − q_{i,s(i)} + p_{s(i)} − p_i)) + α_i (½ q_ii + p_i)]

and c^u = [c^u_i]. But (13) is equivalent to

J ≤ d^u (14)
c^u_i ≥ 0 (15)

for all i and u. If the optimal policy is nonidling, then for a given control u, (15) is only needed for i in

N(u) = { i : Σ_{j: σ(i)=σ(j)} u_j = 1 },

i.e., the classes served by busy stations. Under nonidling there are only |N(u)| + 1 constraints for each u instead of n + 1.

4.2 Series queue: Approximating Poisson's equation

Even when the fluid value function is quadratic, for most networks it only captures the behavior at large queue lengths, and quadratic approximation of the differential cost is inadequate. Numerical evidence of this is presented in Section 5. We propose additional functions φ_k that modify h near the boundaries x_i = 0. The choice of functions is based on an analysis of Poisson's equation for reasonable controls near the boundary. We choose functions that are either in the domain of the generator of the process or that can potentially correct for error in Poisson's equation under the quadratic approximation.
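The boundary error that motivates these extra functions can be seen numerically: with a purely quadratic h, the Poisson-equation error E(x) = (T_u h)(x) − h(x) − J grows linearly along rays in the state space. A sketch for the two-queue series system (all parameter values, the trial h, and the guess for J are illustrative, not from the paper):

```python
import numpy as np

# Poisson-equation error E(x) = (T_u h)(x) - h(x) - J for a quadratic h
# on the two-queue series system (uniformized: alpha + mu1 + mu2 = 1).
alpha, mu1, mu2, c1, c2 = 0.3, 0.4, 0.3, 1.0, 1.5

def h(x1, x2):                         # quadratic trial differential cost
    return 0.5 * (2.0 * x1 * x1 + 1.0 * x1 * x2 + 3.0 * x2 * x2) \
        + 0.2 * x1 + 0.1 * x2

def E(x1, x2, u1, u2, J):
    Tuh = (c1 * x1 + c2 * x2
           + alpha * (h(x1 + 1, x2) - h(x1, x2))
           + u1 * mu1 * (h(x1 - 1, x2 + 1) - h(x1, x2))
           + u2 * mu2 * (h(x1, x2 - 1) - h(x1, x2)))
    return Tuh - J

J = 2.0                                # illustrative average-cost guess
# Along the boundary x2 = 0 (serving class 1 only), E is affine in x1,
# so a quadratic alone cannot satisfy Poisson's equation everywhere:
errs = [E(x, 0, 1, 0, J) for x in range(1, 5)]
diffs = np.diff(errs)
assert np.allclose(diffs, diffs[0])    # constant first differences
print(errs)
```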
Consider a series queue with arrivals at rate α to the first queue (Figure 1). For this problem, (12) is

J ≤ c'x + α [h(x + e_1) − h(x)] + u_1 μ_1 [h(x − e_1 + e_2) − h(x)] + u_2 μ_2 [h(x − e_2) − h(x)]. (16)

Figure 1: Two-stage series system

We consider the case c_1 < c_2, so that station 1 might idle (station 2 is always nonidling), and ρ < 1, so that the fluid policy is: idle station 1 when x_2 > 0 [1]. This policy is greedy and the value function is quadratic:

V(x) = ½ (c_1 / (μ_1 − α)) (x_1 + x_2)^2 + ½ ((c_2 − c_1) / (μ_2 − α)) x_2^2. (17)

Let P_1 denote the transition probability matrix when serving class 1 (u_1 = 1 unless x_1 = 0). Define the functions

φ_1(x) = β^{x_2}, φ_2(x) = x_1 β^{x_2}, φ_3(x) = x_2 β^{x_2},

where β = μ_2 / μ_1. Observe that for x_2 > 0,

(P_1 φ_1)(x) = φ_1(x);

i.e., φ_1 is in the domain of the generator of P_1. Since φ_1 is small when x_2 is large, the error introduced when idling should be fairly small. The other functions are suggested by studying the error in Poisson's equation, defined as

E(x) = (T_u h)(x) − h(x) − J.

Using a quadratic h results in linear error in at least some regions. For example, [15] uses the approximation h(x) = V(x) + ½ (c_1 / (μ_1 − α)) (x_1 + x_2) and J = c_1 α / (μ_1 − α). The resulting error is

E(x) = (1 − u_2) c_1 x_1 + u_1 μ_1 (c_2 − c_1) (x_2 + 1);

i.e., error is zero above the switching curve but is linear in x_2 below the curve when x_2 > 0. A different quadratic makes the error constant below the switching curve when x_2 > 0. Now consider the contribution of φ_2 and φ_3 to the error. For φ_2,

(P_1 φ_2)(x) − φ_2(x) = (α − μ_2) β^{x_2}, x_2 > 0;
(P_1 φ_2)(x) − φ_2(x) = (μ_2 − μ_1) x_1 + (α − μ_2), x_2 = 0

when serving class 1, and

(P_0 φ_2)(x) − φ_2(x) = [(μ_1 − μ_2) x_1 + α] β^{x_2}

when idle. For φ_3,

(P_1 φ_3)(x) − φ_3(x) = (μ_2 − μ_1) β^{x_2}, x_2 > 0;
(P_1 φ_3)(x) − φ_3(x) = μ_2, x_2 = 0

when serving class 1, and

(P_0 φ_3)(x) − φ_3(x) = [(μ_1 − μ_2) x_2 − μ_1] β^{x_2}

when idle. Suitable multiples of φ_2 and φ_3 could eliminate the linear error when x_2 = 0 and compensate for it somewhat in the small-x_2 states. Based on these observations, consider the differential cost approximation

h(x) = ½ x'Qx + p'x + r_1 φ_1(x) + r_2 φ_2(x) + r_3 φ_3(x). (18)

Another advantage of using φ_1 in the approximation is that it gives the desired shape of switching curve (logarithmic in x_1), suggested by numerical experience and [15]. Now we reduce the constraints (16) as far as possible. Substituting (18) into (16) and again using x = z + u, (16) has the form

J ≤ d^u + c^{u'}z + (γ^u + δ^{u'}z) β^{z_2 + u_2} (19)

for all z ∈ Z^2_+ and all u that are nonidling at station 2. Here d^u, c^u, γ^u, and δ^u are linear functions of the variables p, Q, and r; see the Appendix. For u = (1, 1), δ^{(1,1)} = 0, so as z_i → ∞ we must have

c^{(1,1)}_i ≥ 0, i = 1, 2. (20)

Given (20), (19) is tightest at z_1 = 0 for each z_2. However, depending on the value of γ^{(1,1)}, it could be tightest at any z_2. Thus, we will approximate (ALP) by including (19) at z_1 = 0, z_2 = 0, ..., N − 1 for some N. Now consider u = (0, 1). For each z_2, the z_1 coefficient must be nonnegative,

c^{(0,1)}_1 + δ^{(0,1)}_1 β^{z_2 + 1} ≥ 0.

Because of the monotonicity in z_2, this is equivalent to

c^{(0,1)}_1 + δ^{(0,1)}_1 β ≥ 0 (21)

and c^{(0,1)}_1 ≥ 0. Letting z_2 → ∞ in (19) gives another constraint, so we have

c^{(0,1)}_i ≥ 0, i = 1, 2. (22)

Given (21) and (22), (19) is tightest at z_1 = 0 but, depending on γ^{(0,1)} and δ^{(0,1)}, could be tightest at any z_2, so we include (19) at u = (0, 1), z_1 = 0, and z_2 = 0, ..., N − 1. Next, for u = (1, 0) we must have z_2 = 0. For (19) to hold as z_1 → ∞, we must have

c^{(1,0)}_1 ≥ 0. (23)

In light of (23), (19) is tightest at z_1 = 0, so we include (19) at u = (1, 0) and z = (0, 0). Finally, we include (19) at u = z = (0, 0).

Figure 2: Three-class, two-station reentrant line.

To summarize, the approximate reduced ALP contains the 2N + 8 constraints: (19) at u = (1, 1), z_1 = 0, z_2 = 0, ..., N − 1; u = (0, 1), z_1 = 0, z_2 = 0, ..., N − 1; u = (1, 0), z = (0, 0); and u = z = (0, 0); plus (20)-(23). Call this relaxation ALP(N). We end this section with a statement about the relaxation.

Proposition 2 Let J̄_N be the optimal value of ALP(N) for the series queue. For some M, J̄_N = J̄ for all N ≥ M.

The proposition follows from the fact that the limiting constraints as x → ∞, namely (20)-(23), are incorporated in ALP(N).

4.3 Fluid approximation for a reentrant line

The network in Section 4.2 has a greedy fluid policy. In contrast, this section considers the reentrant line studied by Weiss [25]. Its fluid policy has a switching curve that depends on the relative size of the queue lengths, not just whether a queue is empty, making the fluid information richer. Such policies have piece-wise quadratic V(x), and may perform better [3]. Moreover, under natural assumptions about the nature of the optimal policy, their switching surfaces give the correct asymptotic slopes of the switching surfaces of the original problem [4]. We formulate the ALP using the piece-wise quadratic regions from the fluid and propose an approximate constraint reduction. Figure 2 shows a three-class, two-station line where jobs arrive at rate α to class 1. Station 2 serves only class 2 and is the bottleneck, m_2 > m_1 + m_3, where m_i = 1/μ_i is the mean service time for class i. Costs are constant, c_i = 1, so the only decision is whether to serve class 1 or 3 at station 1. As Weiss shows, when x_2 = 0 the fluid policy makes a trade-off between serving class 3, which starves the bottleneck, and serving class 1, which feeds the bottleneck. Class 3 is given priority when x_3 ≤ θ x_1, where

θ = (m_2 − m_1 − m_3) / (m_1 + m_3).

When x_2 > 0 class 3 is served.
Although the control is constant on x_2 > 0, V(x) divides this region into three quadratic regions according to which of three controls will be used next on a trajectory starting from x. The correspondence between quadratic regions and control regions is shown in Table 1. One can verify that these are the three quadratic regions by following the trajectories. Trajectories in the first control

region in Table 1 enter the second control region next; the second and third regions feed into the control region x_2 = 0 and x_3 = 0, which leads to x = 0. Note that this optimal policy idles station 1 in the third control region, but this is just for convenience; a nonidling optimal policy also exists.

Quadratic region (x_2 > 0)                        Control region visited next
                                                  State                  Station 1 serves   q̇
S_1: x_3 > θ x_1 + η x_2                          x_2 = 0, x_3 > θ x_1   3                  (α, 0, −μ_3)
S_2: x_3 ≤ θ x_1 + η x_2 and
     x_3 > ((μ_3 − μ_2)/μ_2) x_2                  x_2 = 0, x_3 ≤ θ x_1   1 and 3            (α − μ_1 u_1, 0, μ_1 u_1 − μ_3 (1 − u_1))
S_3: x_3 ≤ ((μ_3 − μ_2)/μ_2) x_2                  x_2 > 0, x_3 = 0       3 and idle         (α, −μ_2, 0)

Table 1: Quadratic regions of the fluid cost in the reentrant line example (η = (θα + μ_3 − μ_2)/μ_2)

Since V(x) is continuous (in fact it is C^1), the regions S_k can be extended to include x_2 = 0, fully defining V. Hence, we approximate h as quadratic on each of these regions:

h(x) = ½ x'Q_k x + p^{k'}x + f_k, x ∈ S_k, (24)

where S_k, k = 1, ..., 3 are defined in Table 1. Note that we do not require h to be continuous, so the treatment of boundaries matters. The assignment of boundaries in Table 1 was chosen for consistency with the control regions. Now consider constraint reduction for a general network using (24). Assume that there are finitely many S_k, each a polyhedral cone from the origin of full dimension. The dual method of [17, Section 3.4] can be extended to reduce the constraints to a finite set; however, certain approximations are needed to make the constraints tractable. For each control, this approach defines sets of states in which the form of the constraints (12) is constant. Given the control, the transition probabilities are translation invariant. Let {ξ^l}, l = 1, ..., q be the transitions, i.e., p_u(x, x + ξ^l) > 0 for some u and all x such that u ∈ A(x). Let ν = (u, k_0, k_1, ..., k_q) and define

X_ν = { x ∈ S_{k_0} ∩ Z^n_+ : x + ξ^l ∈ S_{k_l} for all l such that p_u(x, x + ξ^l) > 0 }.

There is one index ν for each combination of control u, quadratic region S_{k_0} of the current state, and quadratic region S_{k_l} of possible next states.
If transition l does not occur under u, then k_l = k_0. To illustrate these definitions in the reentrant line example, number the service transitions l = 1, 2, 3 and the arrival transition l = 4. Consider, for example, u = (0, 1, 1) and k_0 = 3, i.e., x ∈ S_3 (see Table 1). Then k_1 = 3 because class 1 is not served, and k_3 = k_4 = 3 because these transitions cannot leave S_3. However, a class 2 service completion could stay in S_3 (k_2 = 3), enter S_2 (k_2 = 2), or, for certain parameter values, enter S_1 (k_2 = 1). Specifically, for certain parameter values, x = (0, 1, 1) ∈ S_3 while x + ξ^2 = (0, 0, 2) ∈ S_1, i.e., x ∈ X_ν where ν = (u, 3, 3, 1, 3, 3). Because S_1 and S_3 only meet at the origin, X_ν can contain only states near the origin. In general, if ν contains k_l ≠ k_0 then X_ν lies within one transition of the hyperplane separating S_{k_0} and S_{k_l}. Again using x = z + u, let Z_ν = {z : z + u ∈ X_ν}. The constraints have the form

J ≤ d_ν + c_ν'z + ½ z'M_ν z, z ∈ Z_ν, (25)

13 where d, c, and M are linear functions of Q k, p k, and f k. The quadratic term M is symmetric. It appears because of transitions between regions S k. The rst approximation is to remove the integer restriction by allowing z Z, where Z is a polyhedron, say fz R n : A z b ; z 0g, whose lattice points are (nearly) the set Z. For simplicity, we allow lattice points on the boundary of Z that are not in Z. This overlap could be avoided by adding more cutting planes. Also, because S k has full dimension and the control u is feasible at all z 0, there is no need for equality constraints in Z. If nonidling controls are desired, constraints of the form z i = 0 can be enforced by removing these variables. Checking (5) exactly for a given d, c, and M is related to determining if M is copositive; instead, following [17], we impose the stronger, simpler conditions and M 0 (6) J d + c 0 z A z b (7) z 0: The key observation is that these constraints are colinear in z and the ALP variables. A dual can be constructed that separates z. Fixed values of J, Q k, p k, and f k satisfy (7) if and only if, for each, the LP min s.t. c 0 z A z b z 0 has optimal value w J d, or equivalently, so does its dual max b 0 y s.t. A 0 y c (8) y 0: Thus, (7) is equivalent to (8) and w J d for all. Reintroducing J, Q k, p k, and f k as variables, the dual form of (ALP) is (ALPD) max J s.t. A 0 y c b 0 y J d M 0 y 0: The two approximations made were restrictions of (ALP); hence, the optimal value J D of (ALPD) is also a lower bound, J D J. 13

Region    Extreme directions
S_1       (0, 0, 1), (0, μ_2, θα + μ_3 − μ_2), (1, 0, θ)
S_2       (1, 0, 0), (0, μ_2, θα + μ_3 − μ_2), (1, 0, θ), (0, μ_2, μ_3 − μ_2)
S_3       (1, 0, 0), (0, 1, 0), (0, μ_2, μ_3 − μ_2)

Table 2: Edges of the quadratic regions in the reentrant line example

(ALPD) has many more variables than (ALP) due to the large number of regions indexed by ν. We propose two reductions. First, if nonidling is assumed, then z_i = 0 for i ∉ N(u), and these z_i can be eliminated before forming the dual. Second, (27) can be interpreted geometrically as checking J ≤ d_ν + c_ν'z at every extreme point z and c_ν'δ ≥ 0 in every extreme direction δ of Z̄_ν. Because the hyperplanes bounding each S_k pass through the origin, the ones bounding Z̄_ν pass within roughly one transition of the origin (there are points in Z_ν within one transition of the S_k boundary). Thus, in a certain sense, the extreme points of Z̄_ν lie near the origin. Also, finding the extreme directions is made easier by the fact that the extreme directions of Z̄_ν are a subset of the extreme directions of S_{k_0}. In particular, Z̄_ν has the ones contained in the common boundary of S_{k_0} and all S_{k_l} (because there are transitions into S_{k_l} from Z_ν). The extreme directions for the example in this section are listed in Table 2. Checking extreme directions in the linear constraints (27) is an exact method; however, we will apply it as an approximate check of (25). Requiring (25) at z = tδ for all t ≥ 0 results in quadratic constraints. For simplicity, we use the stronger conditions c_ν'δ ≥ 0 and δ'M_ν δ ≥ 0. These observations suggest the following approximation to (25). Find the extreme directions {δ^{ν,l}} of Z̄_ν. The relaxation ALP(N) contains the constraints (25) for z ∈ Z_ν with z_i ≤ N − 1, together with

c_ν'δ^{ν,l} ≥ 0 (29)
(δ^{ν,l})'M_ν δ^{ν,l} ≥ 0 (30)

for all ν and all directions δ^{ν,l}. The constraints (25) address the extreme points, while the limiting constraints (29) and (30) in the extreme directions allow faster convergence over N.
Notice that ALP(N) is based on the exact constraints, not the linearization (27), suggesting that ALP(N) might give a tighter bound than (ALPD). However, because of the approximate treatment of the limiting constraints, ALP(N) may not converge to (ALP).

4.4 Additional examples

Two other examples using slightly different value function approximations have been tested. The first is Hajek's arrival routing problem [9]. Arrivals at rate λ must be immediately routed to one of two servers (see Figure 3). The differential cost approximation used is

h(x) = (1/2) x'Qx + p'x + r1 x2 ρ1^{x1} + r2 x1 ρ2^{x2} + r3 ρ1^{x1} + r4 ρ2^{x2},  (31)

which is analogous to (18). The quadratic approximation was also tested on the six-class, two-station network of Figure 4. This example is the largest for which DP results are available; they are taken from [19].

Figure 3: Arrival routing system.

Figure 4: A six-class network.

5 Numerical Results

                  λ              μ                            c                  ρmax
Series queue      1.2            1.5; 1.5                     1; 2               0.8
Arrival routing   1              0.65; 0.65                   1; 2               0.77
Reentrant line    9              22; 10; 22                   1; 1; 1            0.9
6-class network   6/140; 6/140   1/4; 1; 1/8; 1/6; 1/2; 1/7   1; 1; 1; 1; 1; 1   0.6

Table 3: Parameters of the examples

The tightness of the ALP bound was tested by computing the optimal average cost using DP value iteration on a truncated state space. The baseline parameters for each of the four examples are shown in Table 3. Note that these parameters have not yet been scaled; for example, the reentrant line λ and μ_i must be divided by their sum of 63. First, the effect of traffic intensity on the quadratic approximation was tested. In Figure 5, ρ = λ/μ was varied while keeping the μ_i fixed in the series queue. The percent error vanishes in light traffic. This is not surprising, since the five variables in (10) can fit h* exactly in the six states with x1 + x2 ≤ 2, which are a sufficient state truncation in light traffic. Although the DP can only be solved up to a certain ρ, the data suggests that the percent error also vanishes in heavy traffic. A data point has been added at ρ = 1 to show this. Traffic intensity affects error in a similar way when the piecewise quadratic approximation (24) is used for the reentrant line (Figure 6). In this example, as ρ → 1, the geometry of the quadratic regions changes and large ALPs (large truncations N) are required. It should be noted that each of these examples has a single bottleneck station.

The accuracy of the various differential cost approximations is reported in Table 4. The column labelled improved ALP uses the differential cost approximations (18) for the series queue, (31) for arrival routing, and a quadratic with indicator functions for the events x_i = 0 and x_i = 1 for the 6-class network. The optimal average cost for the 6-class network is taken from [19]. Although the bounds on average cost are fairly loose, the LPs to obtain them are quite small. For the ALPs that require truncation, the use of limiting constraints is very effective.
ALP(N) for the series queue gives the same solution for all N ≥ 3; the sizes shown in Table 4 are for the smallest N that achieves this constant result. We tried removing the limiting constraints from ALP(N) for the series queue and found that it converged very slowly, requiring a prohibitively large N to approach the same average cost. At least in some cases, accuracy improves significantly when more variables are added. Adding three variables to the series queue ALP cut the error from 40% to 26%. If additional, similarly effective basis functions could be identified, it appears that accurate bounds could be obtained for relatively little computational effort. Even the LPs with a large number of constraints can be rapidly solved using dual-based methods. Note also that these bounds will be tighter not only in light traffic but in (very) heavy traffic.

The form of the optimal differential cost, h*, was also investigated numerically. The dominant feature of h* is its quadratic growth. To view the other features of h*, a quadratic function was fit to it using least squares over the points 0 ≤ x_i ≤ 10. The result for the series queue is shown in Figure 7. The residuals, plotted on the z axis, are small compared to h*; the largest value of h* on this grid is over 500. The percent residual is larger when x is small and particularly when x2 is small. The residual function has a complex shape, giving further evidence that the r1 and r2 terms in (18) are useful. The graph also suggests that higher order terms, such as x1 x2^2 and x1^2 x2, might be effective; however, adding these terms to the ALP only reduced the error in average cost from 26.5% to 25.7%.

Figure 5: Tightness of average cost bound vs. traffic intensity, series queue.

Figure 6: Tightness of average cost bound vs. traffic intensity, reentrant line.

                  Optimal J*   Percent error in ALP average cost (size of LP)
                               Quadratic       Piece-wise quadratic   Improved ALP
Series queue      9.31         40% (13 x 6)    n/a                    26% (15 x 9)
Arrival routing   5.54         19% (13 x 6)    n/a                    14% (75 x 10)
Reentrant line    11.93        20% (18 x 10)   17% (1337 x 22)
6-class network   2.56         19% (89 x 28)   19% (80 x 34)

Table 4: Accuracy of the ALP average cost

6 ALP Sequences

The numerical results in Section 5 suggest that a sequence of ALPs with larger bases could be solved until the desired accuracy is obtained. However, it is not obvious which sequences of basis functions will work. This section establishes that it is possible to select a sequence of basis functions so that the ALP bound converges to the optimal average cost.

Let the sequence (x^k) be any ordering of the states Z^n_+ that is increasing in total queue length, i.e., if i < j then |x^i| ≤ |x^j|. Assign an indicator function to each state, φ_k(x) = 1 if x = x^k and 0 otherwise. Also define N = |x^K| and φ_0(x) = 1{|x| > N} e^{γ(|x| − N)}. Let ALP[K] be the ALP that uses the functions {φ_0, φ_1, ..., φ_K} in the same manner as (9). Denote the optimal value of ALP[K] by J(K). The proof that J(K) converges to optimal is based on Sennott's approximating sequence method for infinite state space MDPs [22]. The basic idea is to construct an approximating sequence (Δ_N) of MDPs by limiting the total queue length to N through turning off arrivals when |x| = N. Let J_N and h_N(x) be a solution to (1) for Δ_N. Such a solution exists and is in fact the optimal average and differential cost, respectively [22, Proposition and Theorem 6.4]. Sennott gives conditions under which J_N converges to J*. First, we show that these conditions hold for our approximating sequence.

Lemma 3 The truncated MDP converges, lim_{N→∞} J_N = J*.

Proof. We need to verify the (AC) assumptions [22, p. 169]. The first assumption, that (1) has a solution for each N, was addressed above.
Since arrivals can only increase cost, 0 ≤ h_N(x) ≤ h*(x), establishing (AC2) and (AC3), and J_N ≤ J* < ∞, establishing (AC4). Convergence of the ALP sequence follows from the lemma.

Theorem 4 For sufficiently large γ, the ALP sequence converges, lim_{K→∞} J(K) = J*.

Proof. Set r_k = h_N(x^k), so that (Φr)(x) = h_N(x) for |x| ≤ N. We will show that, for sufficiently large N, γ, and r_0, J_N and r are feasible for ALP[K]. That makes J(K) ≥ J_N and lim_{K→∞} J(K) ≥ lim_{N→∞} J_N = J*. But the ALP gives a lower bound, J(K) ≤ J*, so lim_{K→∞} J(K) = J*. Recall that the ALP constraints are

c'x + Σ_y p_u(x, y)(Φr)(y) ≥ J + (Φr)(x).  (32)

Figure 7: Residual (z) in the best quadratic fit to h* for the series queue.

The constraints with |x| ≤ N − 1 are the same as the corresponding optimality equations for Δ_N; hence, they are satisfied by J_N and r defined above. Fix N and choose r_0 = max_x h_N(x). For |x| = N, the left side of (32) differs from the corresponding optimality equation for Δ_N by

P{next transition is an arrival}(r_0 − h_N(x)) ≥ 0,

so (32) is satisfied for these states. Now choose N large enough that min{c'x : |x| > N} ≥ J_N and γ large enough so that, for any x and action u,

Σ_{|y|=|x|} p_u(x, y) + e^γ Σ_{|y|=|x|+1} p_u(x, y) ≥ 1.  (33)

Note that N is chosen so that there are no low-cost states outside of Δ_N and that (33) is a Lyapunov drift condition on φ_0 in the states |x| > N. Such a γ will exist because there are uncontrolled arrivals, making the second sum in (33) nonzero. Then for |x| > N,

Σ_y p_u(x, y)(Φr)(y) ≥ (Φr)(x)

and (32) is satisfied in these states as well.

Any fixed set of basis functions can be added to the ALPs in Theorem 4. The rate of convergence was tested for the series queue, using (18) and the indicator functions. An error of less than 1%, compared to 26% for (18) alone, was achieved using the 104 indicator functions with |x| < 13 (a total of 113 basis functions). An error of 3% was achieved using 44 indicator functions. To achieve 1% accuracy using DP, a state space of 28^2 = 784 states is required if arrivals are ignored at the upper boundary. Considering also the large number of iterations required by value iteration, the ALP achieves the same accuracy with much less computation. Although the proof uses a very specific set of basis functions, Section 5 suggests that convergence will occur for other functions and that it should be possible to preselect more efficient functions, increasing the rate of convergence.

7 Summary and Future Work

We have demonstrated the practicality of computing a tight bound on average cost for small to moderate size networks. Unlike other bounds, the quality of the bound does not degrade in heavy traffic, although the size of the LP used to compute it may grow. The method requires only the selection of approximating functions and the solution of an LP. For some approximating functions, constraint reduction can be applied so that the number of constraints in the LP grows linearly in the number of control actions and the number of buffers. Typically the number of actions is exponential in the number of buffers, but with a small base, and the resulting LP is tractable for much larger systems than can be solved exactly by DP. Even when the constraints cannot be reduced to an equivalent finite set, truncation methods can be effective in selecting a small approximate constraint set. As the number of approximating functions increases, the bound becomes tighter fairly rapidly. More work is needed to determine larger sets of approximating functions that give tighter bounds. Two possibilities are:

- A larger set of exponential decay functions, similar to those in Section 4.2.
- Using the principle that states with at least one small buffer are more important, states could be aggregated with partitions at x_i = m.
Functions that are linear in, say, the largest x_i in each partition and zero elsewhere might be effective.

Several questions of interest remain open:

- Can it be shown that the percent error in the average cost bound vanishes in heavy traffic with a single bottleneck, as suggested by the numerical results? How does it perform in balanced heavy traffic?
- Do the policies recovered from the ALP have good performance? For general MDPs, [5] shows that the answer is no, leading the authors to propose a modified algorithm [8]. However, for the class of network problems considered here a positive answer seems possible.
- Can a comparable upper bound be constructed based on the ALP? Previous upper bounds are on the worst-case performance over a broad class of policies, such as nonidling.

Acknowledgements

Much of the numerical work in this paper was done by Michael Frechette, Melissa LeClair, and Jonathan Senning; Daniel Stahl and Anna Moore also assisted. I would also like to thank Sean Meyn for his many suggestions.

References

[1] F. Avram, D. Bertsimas, and M. Ricard. Fluid models of sequencing problems in open queueing networks: an optimal control approach. In F. P. Kelly and R. J. Williams, editors, Stochastic Networks, Vol. 71 of the IMA Volumes in Mathematics and its Applications. Springer-Verlag, New York, 1995.
[2] D. Bertsimas, D. Gamarnik, and J.N. Tsitsiklis. Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Ann. Appl. Probab., 11, 2001.
[3] D. Bertsimas, I.Ch. Paschalidis, and J.N. Tsitsiklis. Optimization of multiclass queueing networks: polyhedral and nonlinear characterizations of achievable performance. Ann. Appl. Probab., 4:43-75, 1994.
[4] R-R. Chen and S.P. Meyn. Value iteration and optimization of multiclass queueing networks. Queueing Systems Theory and Appl., 32(1-3):65-97, 1999.
[5] D.P. de Farias and B. Van Roy. Approximate linear programming for average-cost dynamic programming. In Advances in Neural Information Processing Systems 15. MIT Press, 2003.
[6] D.P. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Oper. Res., 51(6), 2003.
[7] D.P. de Farias and B. Van Roy. On constraint sampling for the linear programming approach to approximate dynamic programming. Math. Oper. Res., 29(3):462-478, 2004.
[8] D.P. de Farias and B. Van Roy. A linear program for Bellman error minimization with performance guarantees. In Advances in Neural Information Processing Systems 17. MIT Press, 2005.
[9] B. Hajek. Optimal control of two interacting service stations. IEEE Trans. Automat. Control, AC-29, 1984.
[10] P.R. Kumar and S.P. Meyn. Duality and linear programs for stability and performance analysis of queueing networks and scheduling policies. IEEE Trans. Automat. Control, 41(1):4-17, 1996.
[11] S.P. Meyn. The policy iteration algorithm for Markov decision processes with general state space. IEEE Trans. Automat. Control, AC-42, 1997.
[12] S.P. Meyn. Sequencing and routing in multiclass queueing networks. Part I: Feedback regulation. SIAM J. Control Optim., 40, 2001.
[13] S.P. Meyn. Stability, performance evaluation and optimization. In E. Feinberg and A. Shwartz, editors, Handbook of Markov Decision Processes: Methods and Applications. Kluwer, 2001.
[14] S.P. Meyn. Sequencing and routing in multiclass queueing networks. Part II: Workload relaxations. SIAM J. Control Optim., 42(1):178-217, 2003.
[15] S.P. Meyn. Dynamic safety-stocks for asymptotic optimality in stochastic networks. Dept. of Electrical and Computer Eng., University of Illinois at Urbana-Champaign.

[16] J.R. Morrison and P.R. Kumar. New linear program performance bounds for queueing networks. J. Optim. Theory Appl., 100(3), 1999.
[17] J.R. Morrison and P.R. Kumar. Linear programming performance bounds for Markov chains with polyhedrally translation invariant transition probabilities and applications to unreliable manufacturing systems and enhanced wafer fab models. In Proceedings of IMECE 2002, New Orleans, LA, 2002. Full-length version available at uiuc.edu/prkumar.
[18] C.H. Papadimitriou and J.N. Tsitsiklis. The complexity of optimal queueing network control. Math. Oper. Res., 24:293-305, 1999.
[19] I.Ch. Paschalidis, C. Su, and M.C. Caramanis. Target-pursuing scheduling and routing policies for multiclass queueing networks. IEEE Trans. Automat. Control, 49(10), 2004.
[20] D. Schuurmans and R. Patrascu. Direct value-approximation for factored MDPs. In Advances in Neural Information Processing Systems 14. MIT Press, 2001.
[21] P. Schweitzer and A. Seidmann. Generalized polynomial approximations in Markovian decision processes. J. of Mathematical Analysis and Applications, 110:568-582, 1985.
[22] L.I. Sennott. Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, New York, 1999.
[23] M.H. Veatch. Fluid analysis of arrival routing. IEEE Trans. Automat. Control, 46, 2001.
[24] M.H. Veatch. Using fluid solutions in dynamic scheduling. In S. B. Gershwin, Y. Dallery, C. T. Papadopoulos, and J. M. Smith, editors, Analysis and Modeling of Manufacturing Systems, New York, 2002. Kluwer.
[25] G. Weiss. On optimal draining of fluid reentrant lines. In F. P. Kelly and R. J. Williams, editors, Stochastic Networks, Vol. 71 of the IMA Volumes in Mathematics and its Applications, New York, 1995. Springer-Verlag.

Appendix: ALP Constraints for the Series Queue

For the series queue of Section 4.2 and the differential cost approximation (18), this appendix gives the ALP constraints in terms of z, where x = z + u.
The terms in (16) are

h(x + e1) − h(x) = q11 x1 + q12 x2 + (1/2)q11 + p1 + r2 ρ2^{x2}

h(x − e1 + e2) − h(x) = (−q11 + q12)x1 + (q22 − q12)x2 + (1/2)q11 + (1/2)q22 − q12 − p1 + p2
    − r1(1 − ρ2)ρ2^{x2} − r2[(1 − ρ2)x1 + ρ2]ρ2^{x2} − r3[(1 − ρ2)x2 − ρ2]ρ2^{x2}

h(x − e2) − h(x) = −q12 x1 − q22 x2 + (1/2)q22 − p2
    + r1(ρ2^{−1} − 1)ρ2^{x2} + r2 x1(ρ2^{−1} − 1)ρ2^{x2} + r3[(ρ2^{−1} − 1)x2 − ρ2^{−1}]ρ2^{x2}.

Using these, (16) can be written as (19), which we restate here:

J ≤ d_u + c_u'z + (φ_u + ψ_u'z) ρ2^{z2+u2}.

For the control u = (1, 1), ψ^{(1,1)} = 0 and

c1^{(1,1)} = c1 − (μ1 − λ)q11 + (μ1 − μ2)q12
c2^{(1,1)} = c2 − (μ1 − λ)q12 + (μ1 − μ2)q22
d^{(1,1)} = c1^{(1,1)} + c2^{(1,1)} + (1/2)(λ + μ1)q11 − μ1 q12 + (1/2)(μ1 + μ2)q22 + (λ − μ1)p1 + (μ1 − μ2)p2
φ^{(1,1)} = r2(λ − μ2) − r3(μ1 − μ2).

For u = (0, 1),

c1^{(0,1)} = c1 + λq11 − μ2 q12
c2^{(0,1)} = c2 + λq12 − μ2 q22
d^{(0,1)} = c2^{(0,1)} + (1/2)λ q11 + (1/2)μ2 q22 + λp1 − μ2 p2
ψ1^{(0,1)} = r2(μ1 − μ2)
ψ2^{(0,1)} = r3(μ1 − μ2)
φ^{(0,1)} = ψ2^{(0,1)} + r1(μ1 − μ2) + λr2 − μ1 r3.

For u = (1, 0), we must have z2 = 0 and ψ1^{(1,0)} = ψ2^{(1,0)} = 0,

c1^{(1,0)} = c1 − (μ1 − λ)q11 + μ1 q12 − μ1 r2(1 − ρ2)
d^{(1,0)} = c1^{(1,0)} + (1/2)(λ + μ1)q11 − μ1 q12 + (1/2)μ1 q22 + (λ − μ1)p1 + μ1 p2 − μ1 r1(1 − ρ2) − r2(μ1 ρ2 − λ) + μ1 ρ2 r3.

Finally, for u = x = (0, 0), φ^{(0,0)} = 0 and

d^{(0,0)} = (1/2)λ q11 + λp1 + λr2.
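The finite differences used in this appendix can be checked numerically. The sketch below assumes the quadratic-plus-geometric form h(x) = (1/2)x'Qx + p'x + (r1 + r2 x1 + r3 x2)ρ^{x2}; this reading of (18), and every coefficient value below, is an illustrative assumption rather than fitted data:

```python
import random

# Assumed form: h(x) = 0.5 x'Qx + p'x + (r1 + r2*x1 + r3*x2) * rho**x2,
# with arbitrary illustrative coefficients.
q11, q12, q22 = 1.3, -0.4, 2.1
p1, p2 = 0.7, 1.9
r1, r2, r3 = 2.0, -1.0, 0.5
rho = 0.8

def h(x1, x2):
    quad = 0.5 * q11 * x1 * x1 + q12 * x1 * x2 + 0.5 * q22 * x2 * x2
    return quad + p1 * x1 + p2 * x2 + (r1 + r2 * x1 + r3 * x2) * rho ** x2

random.seed(0)
for _ in range(100):
    x1, x2 = random.randint(1, 30), random.randint(1, 30)
    # Arrival difference: linear in x plus a single geometric term.
    lhs = h(x1 + 1, x2) - h(x1, x2)
    rhs = q11 * x1 + q12 * x2 + 0.5 * q11 + p1 + r2 * rho ** x2
    assert abs(lhs - rhs) < 1e-8
    # Station-1 service completion difference (x1 down, x2 up).
    lhs = h(x1 - 1, x2 + 1) - h(x1, x2)
    rhs = ((-q11 + q12) * x1 + (q22 - q12) * x2
           + 0.5 * q11 + 0.5 * q22 - q12 - p1 + p2
           - (r1 * (1 - rho) + r2 * ((1 - rho) * x1 + rho)
              + r3 * ((1 - rho) * x2 - rho)) * rho ** x2)
    assert abs(lhs - rhs) < 1e-8
print("identities hold")
```

Because each one-step difference stays linear in x apart from a ρ^{x2} factor, the ALP constraint for each control splits into a linear part and a geometric part, which is exactly the d, c, φ, ψ bookkeeping above.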


More information

On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming

On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 462 478 issn 0364-765X eissn 1526-5471 04 2903 0462 informs doi 10.1287/moor.1040.0094 2004 INFORMS On Constraint Sampling in the Linear

More information

Distributed Optimization. Song Chong EE, KAIST

Distributed Optimization. Song Chong EE, KAIST Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

Approximate Linear Programming for Network Control: Column Generation and Subproblems

Approximate Linear Programming for Network Control: Column Generation and Subproblems Approximate Linear Programming for Network Control: Column Generation and Subproblems Michael H. Veatch and Nathan Walker Department of Mathematics, Gordon College, Wenham, MA 01984 mike.veatch@gordon.edu

More information

Production Capacity Modeling of Alternative, Nonidentical, Flexible Machines

Production Capacity Modeling of Alternative, Nonidentical, Flexible Machines The International Journal of Flexible Manufacturing Systems, 14, 345 359, 2002 c 2002 Kluwer Academic Publishers Manufactured in The Netherlands Production Capacity Modeling of Alternative, Nonidentical,

More information

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails GMM-based inference in the AR() panel data model for parameter values where local identi cation fails Edith Madsen entre for Applied Microeconometrics (AM) Department of Economics, University of openhagen,

More information

Applications. Stephen J. Stoyan, Maged M. Dessouky*, and Xiaoqing Wang

Applications. Stephen J. Stoyan, Maged M. Dessouky*, and Xiaoqing Wang Introduction to Large-Scale Linear Programming and Applications Stephen J. Stoyan, Maged M. Dessouky*, and Xiaoqing Wang Daniel J. Epstein Department of Industrial and Systems Engineering, University of

More information

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 43, NO. 3, MARCH

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 43, NO. 3, MARCH IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 43, NO. 3, MARCH 1998 315 Asymptotic Buffer Overflow Probabilities in Multiclass Multiplexers: An Optimal Control Approach Dimitris Bertsimas, Ioannis Ch. Paschalidis,

More information

ECON2285: Mathematical Economics

ECON2285: Mathematical Economics ECON2285: Mathematical Economics Yulei Luo Economics, HKU September 17, 2018 Luo, Y. (Economics, HKU) ME September 17, 2018 1 / 46 Static Optimization and Extreme Values In this topic, we will study goal

More information

Resource Pooling for Optimal Evacuation of a Large Building

Resource Pooling for Optimal Evacuation of a Large Building Proceedings of the 47th IEEE Conference on Decision and Control Cancun, Mexico, Dec. 9-11, 28 Resource Pooling for Optimal Evacuation of a Large Building Kun Deng, Wei Chen, Prashant G. Mehta, and Sean

More information

Optimal scaling of average queue sizes in an input-queued switch: an open problem

Optimal scaling of average queue sizes in an input-queued switch: an open problem DOI 10.1007/s11134-011-9234-1 Optimal scaling of average queue sizes in an input-queued switch: an open problem Devavrat Shah John N. Tsitsiklis Yuan Zhong Received: 9 May 2011 / Revised: 9 May 2011 Springer

More information

Section Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018

Section Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018 Section Notes 9 Midterm 2 Review Applied Math / Engineering Sciences 121 Week of December 3, 2018 The following list of topics is an overview of the material that was covered in the lectures and sections

More information

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti 1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early

More information

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank David Glickenstein November 3, 4 Representing graphs as matrices It will sometimes be useful to represent graphs

More information

LIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974

LIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974 LIMITS FOR QUEUES AS THE WAITING ROOM GROWS by Daniel P. Heyman Ward Whitt Bell Communications Research AT&T Bell Laboratories Red Bank, NJ 07701 Murray Hill, NJ 07974 May 11, 1988 ABSTRACT We study the

More information

OPTIMAL CONTROL OF STOCHASTIC NETWORKS - AN APPROACH VIA FLUID MODELS

OPTIMAL CONTROL OF STOCHASTIC NETWORKS - AN APPROACH VIA FLUID MODELS OPTIMAL CONTROL OF STOCHASTIC NETWORKS - AN APPROACH VIA FLUID MODELS Nicole Bäuerle Department of Mathematics VII, University of Ulm D-8969 Ulm, Germany, baeuerle@mathematik.uni-ulm.de Abstract We consider

More information

8 Periodic Linear Di erential Equations - Floquet Theory

8 Periodic Linear Di erential Equations - Floquet Theory 8 Periodic Linear Di erential Equations - Floquet Theory The general theory of time varying linear di erential equations _x(t) = A(t)x(t) is still amazingly incomplete. Only for certain classes of functions

More information

Information Relaxation Bounds for Infinite Horizon Markov Decision Processes

Information Relaxation Bounds for Infinite Horizon Markov Decision Processes Information Relaxation Bounds for Infinite Horizon Markov Decision Processes David B. Brown Fuqua School of Business Duke University dbbrown@duke.edu Martin B. Haugh Department of IE&OR Columbia University

More information

LP Duality: outline. Duality theory for Linear Programming. alternatives. optimization I Idea: polyhedra

LP Duality: outline. Duality theory for Linear Programming. alternatives. optimization I Idea: polyhedra LP Duality: outline I Motivation and definition of a dual LP I Weak duality I Separating hyperplane theorem and theorems of the alternatives I Strong duality and complementary slackness I Using duality

More information

MC3: Econometric Theory and Methods. Course Notes 4

MC3: Econometric Theory and Methods. Course Notes 4 University College London Department of Economics M.Sc. in Economics MC3: Econometric Theory and Methods Course Notes 4 Notes on maximum likelihood methods Andrew Chesher 25/0/2005 Course Notes 4, Andrew

More information

MULTIPLE CHOICE QUESTIONS DECISION SCIENCE

MULTIPLE CHOICE QUESTIONS DECISION SCIENCE MULTIPLE CHOICE QUESTIONS DECISION SCIENCE 1. Decision Science approach is a. Multi-disciplinary b. Scientific c. Intuitive 2. For analyzing a problem, decision-makers should study a. Its qualitative aspects

More information

Dynamic Control of a Tandem Queueing System with Abandonments

Dynamic Control of a Tandem Queueing System with Abandonments Dynamic Control of a Tandem Queueing System with Abandonments Gabriel Zayas-Cabán 1 Jungui Xie 2 Linda V. Green 3 Mark E. Lewis 1 1 Cornell University Ithaca, NY 2 University of Science and Technology

More information

Value and Policy Iteration

Value and Policy Iteration Chapter 7 Value and Policy Iteration 1 For infinite horizon problems, we need to replace our basic computational tool, the DP algorithm, which we used to compute the optimal cost and policy for finite

More information

Solving Extensive Form Games

Solving Extensive Form Games Chapter 8 Solving Extensive Form Games 8.1 The Extensive Form of a Game The extensive form of a game contains the following information: (1) the set of players (2) the order of moves (that is, who moves

More information

On the Approximate Linear Programming Approach for Network Revenue Management Problems

On the Approximate Linear Programming Approach for Network Revenue Management Problems On the Approximate Linear Programming Approach for Network Revenue Management Problems Chaoxu Tong School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853,

More information

Quadratic and Copositive Lyapunov Functions and the Stability of Positive Switched Linear Systems

Quadratic and Copositive Lyapunov Functions and the Stability of Positive Switched Linear Systems Proceedings of the 2007 American Control Conference Marriott Marquis Hotel at Times Square New York City, USA, July 11-13, 2007 WeA20.1 Quadratic and Copositive Lyapunov Functions and the Stability of

More information

A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem

A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem A Hierarchy of Suboptimal Policies for the Multi-period, Multi-echelon, Robust Inventory Problem Dimitris J. Bertsimas Dan A. Iancu Pablo A. Parrilo Sloan School of Management and Operations Research Center,

More information

Microeconomics, Block I Part 1

Microeconomics, Block I Part 1 Microeconomics, Block I Part 1 Piero Gottardi EUI Sept. 26, 2016 Piero Gottardi (EUI) Microeconomics, Block I Part 1 Sept. 26, 2016 1 / 53 Choice Theory Set of alternatives: X, with generic elements x,

More information

STABILITY OF QUEUEING NETWORKS AND SCHEDULING POLICIES. P. R. Kumar and Sean P. Meyn y. Abstract

STABILITY OF QUEUEING NETWORKS AND SCHEDULING POLICIES. P. R. Kumar and Sean P. Meyn y. Abstract STABILITY OF QUEUEING NETWORKS AND SCHEDULING POLICIES P. R. Kumar and Sean P. Meyn y Abstract Usually, the stability of queueing networks is established by explicitly determining the invariant distribution.

More information

Stability of the two queue system

Stability of the two queue system Stability of the two queue system Iain M. MacPhee and Lisa J. Müller University of Durham Department of Mathematical Science Durham, DH1 3LE, UK (e-mail: i.m.macphee@durham.ac.uk, l.j.muller@durham.ac.uk)

More information

QUALIFYING EXAM IN SYSTEMS ENGINEERING

QUALIFYING EXAM IN SYSTEMS ENGINEERING QUALIFYING EXAM IN SYSTEMS ENGINEERING Written Exam: MAY 23, 2017, 9:00AM to 1:00PM, EMB 105 Oral Exam: May 25 or 26, 2017 Time/Location TBA (~1 hour per student) CLOSED BOOK, NO CHEAT SHEETS BASIC SCIENTIFIC

More information

1. Introduction. Consider a single cell in a mobile phone system. A \call setup" is a request for achannel by an idle customer presently in the cell t

1. Introduction. Consider a single cell in a mobile phone system. A \call setup is a request for achannel by an idle customer presently in the cell t Heavy Trac Limit for a Mobile Phone System Loss Model Philip J. Fleming and Alexander Stolyar Motorola, Inc. Arlington Heights, IL Burton Simon Department of Mathematics University of Colorado at Denver

More information

Robust Solutions to Multi-Objective Linear Programs with Uncertain Data

Robust Solutions to Multi-Objective Linear Programs with Uncertain Data Robust Solutions to Multi-Objective Linear Programs with Uncertain Data M.A. Goberna yz V. Jeyakumar x G. Li x J. Vicente-Pérez x Revised Version: October 1, 2014 Abstract In this paper we examine multi-objective

More information

Multicommodity Flows and Column Generation

Multicommodity Flows and Column Generation Lecture Notes Multicommodity Flows and Column Generation Marc Pfetsch Zuse Institute Berlin pfetsch@zib.de last change: 2/8/2006 Technische Universität Berlin Fakultät II, Institut für Mathematik WS 2006/07

More information

On Finding Optimal Policies for Markovian Decision Processes Using Simulation

On Finding Optimal Policies for Markovian Decision Processes Using Simulation On Finding Optimal Policies for Markovian Decision Processes Using Simulation Apostolos N. Burnetas Case Western Reserve University Michael N. Katehakis Rutgers University February 1995 Abstract A simulation

More information

In search of sensitivity in network optimization

In search of sensitivity in network optimization In search of sensitivity in network optimization Mike Chen, Charuhas Pandit, and Sean Meyn June 13, 2002 Abstract This paper concerns policy synthesis in large queuing networks. The results provide answers

More information

Electronic Companion Fluid Models for Overloaded Multi-Class Many-Server Queueing Systems with FCFS Routing

Electronic Companion Fluid Models for Overloaded Multi-Class Many-Server Queueing Systems with FCFS Routing Submitted to Management Science manuscript MS-251-27 Electronic Companion Fluid Models for Overloaded Multi-Class Many-Server Queueing Systems with FCFS Routing Rishi Talreja, Ward Whitt Department of

More information

Basics of reinforcement learning

Basics of reinforcement learning Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system

More information

Asymptotics for Polling Models with Limited Service Policies

Asymptotics for Polling Models with Limited Service Policies Asymptotics for Polling Models with Limited Service Policies Woojin Chang School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205 USA Douglas G. Down Department

More information

Efficient Implementation of Approximate Linear Programming

Efficient Implementation of Approximate Linear Programming 2.997 Decision-Making in Large-Scale Systems April 12 MI, Spring 2004 Handout #21 Lecture Note 18 1 Efficient Implementation of Approximate Linear Programming While the ALP may involve only a small number

More information

LECTURE 12 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT

LECTURE 12 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT MARCH 29, 26 LECTURE 2 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT (Davidson (2), Chapter 4; Phillips Lectures on Unit Roots, Cointegration and Nonstationarity; White (999), Chapter 7) Unit root processes

More information

Simple Estimators for Semiparametric Multinomial Choice Models

Simple Estimators for Semiparametric Multinomial Choice Models Simple Estimators for Semiparametric Multinomial Choice Models James L. Powell and Paul A. Ruud University of California, Berkeley March 2008 Preliminary and Incomplete Comments Welcome Abstract This paper

More information

and to estimate the quality of feasible solutions I A new way to derive dual bounds:

and to estimate the quality of feasible solutions I A new way to derive dual bounds: Lagrangian Relaxations and Duality I Recall: I Relaxations provide dual bounds for the problem I So do feasible solutions of dual problems I Having tight dual bounds is important in algorithms (B&B), and

More information

On the Partitioning of Servers in Queueing Systems during Rush Hour

On the Partitioning of Servers in Queueing Systems during Rush Hour On the Partitioning of Servers in Queueing Systems during Rush Hour This paper is motivated by two phenomena observed in many queueing systems in practice. The first is the partitioning of server capacity

More information

Optimal Rejuvenation for. Tolerating Soft Failures. Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi.

Optimal Rejuvenation for. Tolerating Soft Failures. Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi. Optimal Rejuvenation for Tolerating Soft Failures Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi Abstract In the paper we address the problem of determining the optimal time

More information

A Starvation-free Algorithm For Achieving 100% Throughput in an Input- Queued Switch

A Starvation-free Algorithm For Achieving 100% Throughput in an Input- Queued Switch A Starvation-free Algorithm For Achieving 00% Throughput in an Input- Queued Switch Abstract Adisak ekkittikul ick ckeown Department of Electrical Engineering Stanford University Stanford CA 9405-400 Tel

More information

Linear Programming. Scheduling problems

Linear Programming. Scheduling problems Linear Programming Scheduling problems Linear programming (LP) ( )., 1, for 0 min 1 1 1 1 1 11 1 1 n i x b x a x a b x a x a x c x c x z i m n mn m n n n n! = + + + + + + = Extreme points x ={x 1,,x n

More information

15 Closed production networks

15 Closed production networks 5 Closed production networks In the previous chapter we developed and analyzed stochastic models for production networks with a free inflow of jobs. In this chapter we will study production networks for

More information

Simplex Algorithm for Countable-state Discounted Markov Decision Processes

Simplex Algorithm for Countable-state Discounted Markov Decision Processes Simplex Algorithm for Countable-state Discounted Markov Decision Processes Ilbin Lee Marina A. Epelman H. Edwin Romeijn Robert L. Smith November 16, 2014 Abstract We consider discounted Markov Decision

More information

Heuristic Search Algorithms

Heuristic Search Algorithms CHAPTER 4 Heuristic Search Algorithms 59 4.1 HEURISTIC SEARCH AND SSP MDPS The methods we explored in the previous chapter have a serious practical drawback the amount of memory they require is proportional

More information

Distributionally Robust Convex Optimization

Distributionally Robust Convex Optimization Submitted to Operations Research manuscript OPRE-2013-02-060 Authors are encouraged to submit new papers to INFORMS journals by means of a style file template, which includes the journal title. However,

More information

Approximation Metrics for Discrete and Continuous Systems

Approximation Metrics for Discrete and Continuous Systems University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science May 2007 Approximation Metrics for Discrete Continuous Systems Antoine Girard University

More information

Positive Harris Recurrence and Diffusion Scale Analysis of a Push Pull Queueing Network. Haifa Statistics Seminar May 5, 2008

Positive Harris Recurrence and Diffusion Scale Analysis of a Push Pull Queueing Network. Haifa Statistics Seminar May 5, 2008 Positive Harris Recurrence and Diffusion Scale Analysis of a Push Pull Queueing Network Yoni Nazarathy Gideon Weiss Haifa Statistics Seminar May 5, 2008 1 Outline 1 Preview of Results 2 Introduction Queueing

More information