Using Static Flow Patterns in Time-Staged Resource Allocation Problems
Arun Marar, Warren B. Powell, Hugo P. Simão
Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ
September 6, 2006
Abstract

We address the problem of combining a cost-based simulation model, which makes decisions over time by minimizing a cost model, and rule-based policies, where a knowledgeable user would like certain types of decisions to happen with a specified frequency when averaged over the entire simulation. These rules are designed to capture issues that are difficult to quantify as costs, but which produce more realistic behaviors in the judgment of a knowledgeable user. We consider patterns that are specified as averages over time, which have to be enforced in a model that makes decisions while stepping through time (for example, while optimizing the assignment of resources to tasks). We show how an existing simulation, as long as it uses a cost-based optimization model while stepping through time, can be modified to more closely match exogenously specified patterns.
Introduction

Frequently, we find that optimization models of complex operational problems produce results which run against the insights of knowledgeable experts. It is nice when these differences represent improvements that save money, but it is frequently the case that the differences simply reflect missing or incomplete information about the real problem. For example, a truckload carrier may need to assign longer loads to drivers who own their tractors (as opposed to drivers who use company-owned equipment) because these drivers need to make more money to cover the equipment costs. We may not be able to quantify the cost of assigning a driver to a shorter load, but we do know that we are happy if the average length of loads to which these drivers are assigned matches a corporate goal. Making optimization models match corporate goals (as opposed to simply minimizing costs) is very common in engineering practice, and it is usually achieved through the inclusion of soft bonuses and penalties to encourage the model to produce certain behaviors. Tuning these soft parameters is typically ad hoc and can be quite time consuming. A more formal strategy, introduced by Marar et al. (2006), is to add a penalty term to produce a modified objective function of the form

min_{x ∈ X} C(x) + θ‖x − x^p‖,

where x is the flow produced by the model, and x^p is a flow that we are trying to match using an exogenously specified pattern. The resulting problem is a nonlinear programming problem that can be solved using standard algorithms. We often encounter time-staged problems where the same challenge of meeting corporate operating statistics arises. The problems may be stochastic, or we may be using a temporal decomposition simply because the problems are too large. For example, we may be simulating the assignment of drivers to loads over a planning horizon.
We know the cost of assigning a driver to a load (we can minimize these costs at a point in time), but by the end of the simulation, we want the model to produce statistics that meet certain goals when averaged over the entire simulation.
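As a concrete illustration of the penalized objective min_{x ∈ X} C(x) + θ‖x − x^p‖, the following sketch treats the special case where the cost is linear and the feasible region reduces to simple bounds, so the penalized problem separates by component and has a closed-form solution. The function name and the data are invented for the example; the full model with flow-balance constraints requires a general nonlinear programming solver.

```python
def match_pattern_flow(c, x_pattern, theta, upper):
    # Minimize sum_i c[i]*x[i] + theta*(x[i] - x_pattern[i])**2
    # subject to 0 <= x[i] <= upper[i].  The objective is separable and
    # convex, so each component has the closed-form minimizer
    # clip(x_pattern[i] - c[i]/(2*theta), 0, upper[i]).
    return [min(max(xp - ci / (2.0 * theta), 0.0), ub)
            for ci, xp, ub in zip(c, x_pattern, upper)]

# A larger theta pulls the flow toward the exogenous pattern x_pattern.
weak = match_pattern_flow([4.0, 1.0], [10.0, 10.0], theta=0.1, upper=[20.0, 20.0])
strong = match_pattern_flow([4.0, 1.0], [10.0, 10.0], theta=100.0, upper=[20.0, 20.0])
```

With a small θ the costs dominate and the flow drifts away from the pattern; with a large θ the solution is pulled close to x^p.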
In our applications, these corporate goals are always expressed as static patterns. This means that while we are solving the problem using a method that steps through time, the decisions, when aggregated over the entire simulation, need to match specific targets. This challenge arises in virtually every project we encounter with the sponsors of CASTLE Laboratory at Princeton. Examples of specific projects (all of which have been solved using the techniques in this paper) include:

- Locomotive management at a railroad - One pattern was to assign a particular type of locomotive (e.g. by horsepower) to a particular train (e.g. intermodal trains) 80 percent of the time (intermodal trains need to move quickly to compete with trucks).
- Routing and scheduling for cryogenic gases - The pattern specified that drivers who just delivered gases at a particular customer would then move an average distance to the next customer (this helped provide more realistic clustering of customers).
- Managing drivers at a major less-than-truckload carrier - One pattern specified that drivers in Chicago, with a home domicile of Cleveland, might be assigned to a load going to Indianapolis 10 percent of the time (this tells the model that it is possible, but not common, to send drivers in this direction).
- Military airlift problems - The pattern might specify that C-5 aircraft should be assigned to move cargo into the Middle East 7 percent of the time (bases in the Middle East might not have good repair facilities for this type of aircraft).
- Truckload motor carriers - Team drivers (drivers moving in pairs) should be assigned to loads between 700 and 800 miles in length 20 percent of the time (this helped the model match average length of haul statistics).
- Managing boxcars for a railroad - Customers requesting boxcars would receive empties from a particular location 40 percent of the time (sometimes a customer had special needs that were met by cars from a specific location).
All of these problems were solved using methods that stepped through time using the techniques of approximate dynamic programming (see, for example, Topaloglu & Powell (2006), Powell & Van Roy (2004) or Powell et al. (2005)). In each case, our ability to gain user acceptance was significantly improved by our ability to match user-specified patterns to obtain more realistic behaviors from the model. This required the ability to solve a problem at a point in time, while matching statistics that were measured over the entire simulation. In this paper, we assume that we are solving a dynamic model one time period at a time, stepping forward through time (for example, using a myopic simulation, a rolling horizon procedure or a more advanced technique such as approximate dynamic programming). We assume that we are given static flow patterns that we wish to use to guide the behavior of a dynamic model. Thus, we might like to assign a particular type of driver to long loads 70 percent of the time, but in any one time period we may not be able to meet this target. It is not necessary (and often may not be possible) to match the pattern at any one point in time. The goal is to match it over time. Although our original motivation was to match exogenous patterns to improve user acceptance, there is another use of static patterns which we investigate in this paper. All of our problems are defined over a time horizon and are too hard to solve as a single optimization problem, either because the problem is stochastic or because the problem is simply too large. As a result, we are forced to use some sort of approximation. It is typically the case, however, that we can solve static versions of the same model using commercial solvers. We can view the optimal solution of the static model as an exogenous pattern, and test whether this improves the quality of the solution produced by our dynamic approximation. We propose an algorithm that modifies an existing (typically approximate) algorithm which steps through time, producing results that more closely match a static pattern. We establish the following properties for the algorithm.
1) For the case of continuous, nonreusable resources (resources are consumed in each time period), we introduce a modified model to be solved at each point in time that guarantees that the deviation from a static flow pattern is reduced after each time period.
2) For the case of reusable resources, we introduce an iterative algorithm which adapts to static patterns.
3) We show experimentally that using the optimal solution of a static problem (which is much smaller) as a pattern to guide an approximate solution of a dynamic problem improves overall solution quality.
The organization of the paper is as follows. In section 1 we present the dynamic resource allocation problem which is our motivating application. We present the dynamic resource allocation model in two settings: reusable resources, which arise in the context of fleet management, and nonreusable resources, which arise in the context of production planning. In section 2 we introduce our approach for incorporating static flow patterns in the optimization model. This approach combines a traditional cost function with a proximal term called the pattern metric, which measures the deviation between static flow patterns and the patterns generated from solving the time-staged approximation. The technique is then developed for two major problem classes. The first, presented in section 3, assumes that resources are nonreusable, which is to say that decisions made about resources in one time period do not affect the resources available in the next time period. This special case is easily solved to optimality in a time-staged manner (since each time period is independent), allowing us to focus on the challenge of making decisions over time that match a static pattern. We are able to prove specific convergence results for this problem class. Then, section 4 introduces the problem of reusable resources, where decisions made in one time period need to consider the downstream impact on future time periods. Section 5 describes a specific resource allocation problem as an instance of the more difficult case of reusable resources, for which we want to demonstrate that static flow patterns can improve the solution obtained by approximate policies that are applied over time in a simulation. Experimental results in section 6 show that we can improve the overall solution quality when we introduce static flow patterns. We present our conclusions in section 7.
1 The Dynamic Resource Allocation Problem

We begin by presenting a model of a resource allocation problem where the resources are reusable. Our work is motivated by problems in freight transportation which involve the management of vehicles (aircraft, tractors, trailers, box cars, containers) which have to be moved over space and time. After finishing a move, the vehicle becomes empty and available to be assigned to a new load of freight or to be repositioned (empty) at another location. To model this problem, we use the following notation. Our problem is modeled in discrete time over the set T = {1, ..., T}. Resources are modeled using:

a = vector of attributes of a resource.
A = attribute space of a.
R_ta = number of resources with attribute vector a at time t.
R_t = (R_ta)_{a ∈ A}, known as the resource state vector.

Decisions and costs are given by:

D_a = set of decisions that can be applied to resources with attribute vector a.
x_tad = number of resources with attribute vector a acted on by decision d ∈ D_a at time t.
x_t = (x_tad)_{a ∈ A, d ∈ D_a}.
c_tad = cost of making decision d ∈ D_a on resources with attribute vector a at time t.
c_t = (c_tad)_{a ∈ A, d ∈ D_a}.

The optimization problem over a finite horizon is written as

min Σ_{t ∈ T} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad    (1)

subject to, for all t ∈ T,

A_t x_t − R_t = 0,    (2)
B_t x_t − R_{t+1} = 0,    (3)
x_t ≥ 0.    (4)

The problem in (1) can be hard to solve because of complexities such as uncertainty, integrality constraints, time windows on tasks and a high level of detail in defining actual operations.
It is common to solve time-staged problems such as (1) using techniques that step through time. Let:

X^π_t(R_t) = vector of decisions returned by a policy π ∈ Π given the resource state R_t.

There are several classes of policies that illustrate this function. A myopic policy uses the rule

X^π_t(R_t) = argmin_{x_t ∈ X_t} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad,  t ∈ T,    (5)

where X_t is the feasible set defined by the constraints (2)-(4). A rolling horizon policy would plan events over a planning horizon T^ph < T in the future and is given by

X^π_t(R_t) = argmin_{x_t, ..., x_{t+T^ph}} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad + Σ_{t'=t+1}^{t+T^ph} Σ_{a ∈ A} Σ_{d ∈ D_a} c_t'ad x_t'ad,  t ∈ {1, ..., T − T^ph},

where we optimize over x_t, ..., x_{t+T^ph} but only implement x_t. Finally, we might use a dynamic programming policy

X^π_t(R_t) = argmin_{x_t ∈ X_t} ( Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad + V̄_{t+1}(R_{t+1}) ),    (6)

where V̄_{t+1} is an approximation of the value of being in resource state R_{t+1} = B_t x_t. For simplicity of notation, we have presented our model assuming single-period transformation times, that is, resources which are acted on in period t reappear in period t+1. An important special case arises when resources are not reusable, which we would represent using B_t = 0. Our goal is to obtain flows x_t at a point in time which, when averaged over time, closely match the static flow patterns. In the next section we introduce the basis of our methodology that allows us to make decisions that match the static flow patterns.
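The myopic policy (5) can be sketched in a few lines for the special case where the feasible set X_t reduces to the supply constraint (2), so the linear program decomposes by attribute. The dictionary-based encoding and the names below are illustrative only; with joint capacity constraints a real LP solver would be needed.

```python
def myopic_policy(R_t, costs):
    # One time period of the myopic policy (5).  With only the supply
    # constraint sum_d x[a][d] = R_t[a] active, the LP decomposes by
    # attribute a: put all resources of attribute a on the cheapest
    # decision available to it.
    x_t = {}
    for a, supply in R_t.items():
        d_best = min(costs[a], key=costs[a].get)
        x_t[a] = {d: (supply if d == d_best else 0.0) for d in costs[a]}
    return x_t

# Five drivers, two candidate loads; the cheaper assignment wins everything.
x = myopic_policy({"driver": 5.0}, {"driver": {"load1": 3.0, "load2": 1.5}})
```

This all-or-nothing behavior is exactly what makes a pure cost model drift away from user-specified frequencies, motivating the pattern metric of the next section.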
2 Representation of Static Flow Patterns

We first develop the notation to represent information pertaining to static flow patterns. We assume that exogenous patterns are specified in the form

ρ^s_ad = the fraction of time that resources with attribute a are acted on with decisions of type d.

Thus the vector ρ^s_a = (ρ^s_ad)_{d ∈ D_a} represents the probability mass function of the decisions d acting on the resource attribute vector a. In practice, it is typically the case that attributes (and decisions) are expressed at some level of aggregation, although we do not consider this possibility in this paper. To compare with static flow patterns, we normalize the decisions made by the model over the entire time horizon as shown below:

ρ_ad(x) = ( Σ_{t ∈ T} x_tad ) / ( Σ_{t ∈ T} Σ_{d' ∈ D_a} x_tad' ),  d ∈ D_a, a ∈ A.    (7)

We now present the optimization model in the following form:

argmin [ ( Σ_{t ∈ T} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad ) + θ H(ρ(x), ρ^s) ]    (8)

subject to

A_t x_t = R_t,  x_t ≥ 0,  t ∈ T,    (9)

where H is a penalty function known as the pattern metric that penalizes deviations of the vector ρ(x) = (ρ_ad(x))_{a ∈ A, d ∈ D_a} from the static flow patterns ρ^s = (ρ^s_ad)_{a ∈ A, d ∈ D_a}. This penalty is weighted by a positive scaling factor θ. The formulation in (8) holds true for both reusable and nonreusable resources if we note that in the case of nonreusable resources B_t = 0. In the next paragraphs we derive the functional form of the pattern metric from a goodness-of-fit metric used widely in statistics.

We adopt a quadratic form of the pattern metric in (8) motivated by the popular Pearson goodness-of-fit metric (Read & Cressie (1988), Pearson (1900)). The Pearson goodness-of-fit metric is a popular statistical test of whether a particular sample of data might have been drawn from a hypothesized probability distribution denoted by H_0. Consider observing a random variable which can take one of the possible outcomes in the set {d_i}_{i ∈ I}, where ρ_i is the probability of outcome d_i. The outcomes {d_i}_{i ∈ I} are mutually exclusive and Σ_{i ∈ I} ρ_i = 1. We assume ρ_i > 0 for all i ∈ I. Consider a scenario where we observe N realizations of this random variable. We can summarize our observations using the vector (ρ̂_i)_{i ∈ I} where ρ̂_i denotes the fraction of the sample that is observed with outcome d_i, i ∈ I. We hypothesize a probability vector for the null model using H_0: ρ = (ρ_i)_{i ∈ I} where ρ_i > 0 for all i ∈ I. If the observations are independent and identically distributed, the Pearson goodness-of-fit metric is a chi-squared statistic given by

χ² = Σ_{i ∈ I} (N / ρ_i) (ρ̂_i − ρ_i)².    (10)

The null hypothesis H_0 (that is, the observation of the random variable follows the distribution ρ) is rejected if the Pearson goodness-of-fit metric in (10) exceeds a certain threshold. The Pearson goodness-of-fit metric in its original form has a disadvantage because of the presence of the probabilities in the denominator of the function. This is particularly inconvenient because we do not require the time-staged model to prohibit decisions that do not occur in the static flow pattern. Thus we adopt a simple variant of the Pearson goodness-of-fit metric as our functional form of the pattern metric, giving the model

min_{x_t ∈ X_t, t ∈ T} Σ_{t=1}^T Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad + θ Σ_{a ∈ A} Σ_{d ∈ D_a} R_a ( ( Σ_{t=1}^T x_tad ) / R_a − ρ^s_ad )²,    (11)

where X_t is the feasible region defined by constraints (2)-(4) for time t and R_a = Σ_{t ∈ T} R_ta is the total number of resources with attribute a over the entire horizon.

We first develop our methodology for solving the model with a pattern metric in (11) in a setting with nonreusable resources. In this setting, each time period represents a separate optimization problem with no coupling across time periods. Thus, if we do not consider static flow patterns, we can obtain the overall optimal solution simply by optimizing each time period. Introducing static flow patterns requires that we make decisions which, over time, minimize deviations from the exogenous pattern.

3 Static Flow Patterns with Nonreusable Resources

In this section, we focus on the problem of nonreusable resources, by which we mean that resources in time period t are not carried forward to the next time period. If we did not face the challenge of matching a static flow pattern (which applies to activities over all time periods) we would be able to solve each time period independently. Such models tend to arise in strategic planning settings where the time periods are fairly large. In subsection 3.1 we present an algorithm for the case of continuous resources. Subsection 3.2 proves convergence of the algorithm. Finally, subsection 3.3 shows how to adapt the algorithm for the case of discrete resources.

3.1 The Continuous Case

A dynamic resource allocation model with nonreusable resources is solved as a sequence of models over the set T given by

x̄_t = argmin_{x_t ∈ X_t} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad,  t ∈ T.    (12)

Our goal is to develop a methodology that solves the model in (11) in a time-staged manner compatible with the techniques introduced in section 1. We let the optimal solution of our objective function with the pattern metric be

x*_t(θ) = argmin_{x_t ∈ X_t} [ ( Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad ) + θ H_t(x_t) ],  t ∈ T,    (13)

where H_t is a function whose specific form we derive using the pattern metric H(ρ(x), ρ^s) later in this subsection. Thus, x*_t(θ) is our solution with the pattern metric while x̄_t = x*_t(0) is the solution obtained using only the cost function. With the application of the policy in (13) in the case of nonreusable resources we can show that

H(ρ(x*(θ)), ρ^s) ≤ H(ρ(x̄), ρ^s),  θ > 0,    (14)

where x*(θ) = (x*_t(θ))_{t ∈ T} and x̄ = (x̄_t)_{t ∈ T}. The rest of this subsection is devoted to deriving the functional form of H_t. The normalized decision variables over the entire time horizon are given by

ρ_ad(x) = ( Σ_{t=1}^T x_tad ) / R_a,  a ∈ A, d ∈ D_a.    (15)

We suppress the dependence of x on θ to simplify notation. The pattern metric proposed in (11) is given by

H(ρ(x), ρ^s) = Σ_{a ∈ A} Σ_{d ∈ D_a} R_a (ρ_ad(x) − ρ^s_ad)².    (16)

We can define the normalized decision variables specific to a stage t using

ρ_tad = x_tad / R_ta,  a ∈ A, d ∈ D_a, t ∈ {1, ..., T}.

Analogous to the decision variable x_tad we define

x̄_tad = number of resources with attribute vector a acted on by decision d ∈ D_a at time t in the optimal solution of (12).
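The normalization (15) and the pattern metric (16) can be computed directly from a set of model flows. The following sketch assumes the R_a-weighted quadratic form consistent with (11); the flow data and dictionary encoding are invented for the example.

```python
def pattern_metric(flows, rho_s):
    # Pattern metric (16): H = sum_a R_a * sum_d (rho_ad - rho_s[a][d])**2,
    # where rho_ad normalizes flows over the whole horizon as in (15) and
    # R_a = sum_t sum_d flows[t][a][d] is the total flow out of attribute a.
    H = 0.0
    for a in rho_s:
        R_a = sum(x_t[a][d] for x_t in flows for d in x_t[a])
        for d in rho_s[a]:
            rho_ad = sum(x_t[a][d] for x_t in flows) / R_a
            H += R_a * (rho_ad - rho_s[a][d]) ** 2
    return H

# Two time periods; over the horizon d1 and d2 each carry half the flow.
flows = [{"a1": {"d1": 3.0, "d2": 1.0}}, {"a1": {"d1": 1.0, "d2": 3.0}}]
H_match = pattern_metric(flows, {"a1": {"d1": 0.5, "d2": 0.5}})  # pattern met
H_off = pattern_metric(flows, {"a1": {"d1": 1.0, "d2": 0.0}})    # pattern violated
```

Note that the metric is zero even though neither single period has a 50/50 split; only the frequencies aggregated over the horizon matter, which is exactly the sense in which the patterns are static.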
Using the same notation for ρ̄ we let

ρ̄_tad = x̄_tad / R_ta,  a ∈ A, d ∈ D_a, t ∈ {1, ..., T}.    (17)

Similar to the expression in (15) we define the normalized solution to the problem in (12) using

ρ̄_ad = ( Σ_{t=1}^T x̄_tad ) / R_a,  a ∈ A, d ∈ D_a,

which we may rewrite as

ρ̄_ad = Σ_{t=1}^T (R_ta / R_a) ρ̄_tad,    (18)

where the last step uses the substitution in equation (17). We denote the gradient of H(ρ(x), ρ^s) with respect to the normalized decision variable ρ_ad at the value ρ̄_ad as h_ad, which is found using

h_ad = ∂H/∂ρ_ad |_{ρ_ad = ρ̄_ad} = 2 R_a (ρ̄_ad − ρ^s_ad),  a ∈ A, d ∈ D_a.    (19)

Using equation (18) and the relation Σ_{t=1}^T R_ta = R_a we can rewrite equation (19) as

h_ad = 2 R_a ( Σ_{t=1}^T (R_ta / R_a) ρ̄_tad − ρ^s_ad ),  a ∈ A, d ∈ D_a.    (20)

When we solve a subproblem at time t using equation (13) we have already obtained the solution vectors x_t' for all 1 ≤ t' < t. Our static flow pattern may be telling us to send 30 percent of a particular type of vehicle to a particular location, whereas if we look at the time periods before t, we may be doing this only 20 percent of the time. This information could be used as we progress through time to help us match the static flow pattern, but is ignored in the expression for the gradient of the pattern metric in (20). We incorporate information regarding prior decisions by adopting a Gauss-Seidel strategy (see Strang (1988), p. 381). We first define

ρ^GS_tad = Σ_{t'=1}^{t−1} (x_t'ad / R_a)  [term I]  +  Σ_{t'=t}^{T} (x̄_t'ad / R_a)  [term II],  a ∈ A, d ∈ D_a, t ∈ {1, ..., T},    (21)

where term I uses the decisions already made in periods before t and term II uses the reference solution from (12). The Gauss-Seidel gradient of the pattern metric is given by

h^GS_tad = 2 R_a (ρ^GS_tad − ρ^s_ad),  a ∈ A, d ∈ D_a, t ∈ {1, ..., T}.    (22)

The pattern metric itself can be calculated at the beginning of every subproblem using

H^GS_{t−1} = Σ_{a ∈ A} Σ_{d ∈ D_a} R_a (ρ^GS_tad − ρ^s_ad)²,  t ∈ {1, ..., T+1}.

Note that H^GS_0 is simply the pattern metric that evaluates the optimal solution x̄ of model (12) and H^GS_T is the pattern metric that evaluates the solution x* of model (13), which incorporates the static flow patterns.

3.2 Convergence Results

We establish two useful results. The first shows that the Gauss-Seidel version of the algorithm monotonically improves the pattern metric as we step forward in time during a single iteration. We then establish overall convergence of the algorithm. The following theorem establishes monotonic improvement of the pattern metric within an iteration:

Theorem 1 For all t, t ∈ {1, ..., T}, if we solve the following quadratic programming problem:

x*_t(θ) = argmin Σ_{a ∈ A} Σ_{d ∈ D_a} ( c_tad x_tad + (θ / R_a) ∫_0^{x_tad} [ h^GS_tad + 2(u − x̄_tad) ] du )    (23)

subject to

A_t x_t = R_t,  x_t ≥ 0,

then we obtain

H^GS_T ≤ H^GS_{T−1} ≤ ... ≤ H^GS_0.    (24)

Thus, the pattern metric evaluated after solving each subproblem in (23) forms a monotonically decreasing sequence in time t. Consequently the function H_t(x_t) that we adopt in the formulation given in (13) is given by:

H_t(x_t) = Σ_{a ∈ A} Σ_{d ∈ D_a} (1 / R_a) ∫_0^{x_tad} [ h^GS_tad + 2(u − x̄_tad) ] du.

Proof: See appendix.

Theorem 1 proves the expression in (14), thus validating our approach of solving the time-staged sequence of models stated in (13). We next show that the decisions produced by equation (23) produce the optimal solution to the objective function given in equation (11). The proof of convergence is obtained by showing that the model in (23) is identical to solving the model in (11) using an iterative method known as the block coordinate descent (BCD) method. The proof uses existing convergence results for this class of algorithms. The block coordinate descent method is a popular technique for minimizing a real-valued continuously differentiable function f of m real variables subject to upper bounding constraints. In this method the coordinates of f are partitioned into M blocks and at each iteration, f is minimized with respect to one of the coordinate blocks while the other coordinates are held fixed. This method is closely related to Gauss-Seidel methods for equation solving (Ortega & Rheinboldt (1970) and Warga (1963)). Convergence of the block coordinate descent method typically requires that f be strictly convex, differentiable and, taking into account the bounded constraints, has bounded level sets (Sargent & Sebastian (1973), Warga (1963)).
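The forward pass of Theorem 1 can be illustrated numerically. The sketch below is a constructed special case, not the paper's implementation: a single attribute with two decisions, so the supply constraint x0 + x1 = R[t] leaves one free variable and the quadratic subproblem (23) has a closed-form clipped solution obtained from its first-order condition. All data and names are invented for the example.

```python
def gauss_seidel_pass(c, R, rho_s, x_bar, theta):
    # One forward pass of (23): single attribute, two decisions.
    R_a = sum(R)
    T = len(R)
    x = [list(xt) for xt in x_bar]   # periods >= t still hold the reference flows
    H = []                           # pattern metric after each subproblem
    for t in range(T):
        # Gauss-Seidel frequencies (21): decisions already made for t' < t,
        # reference flows x_bar for t' >= t (x is updated in place).
        rho_gs = [sum(x[s][d] for s in range(T)) / R_a for d in (0, 1)]
        h = [2.0 * R_a * (rho_gs[d] - rho_s[d]) for d in (0, 1)]  # gradient (22)
        # Stationary point of c0*x0 + c1*(R[t]-x0) plus the integral penalty
        # terms of (23), clipped to the feasible interval [0, R[t]].
        x0 = (-(c[0] - c[1]) * R_a / theta - (h[0] - h[1])
              + 2.0 * x_bar[t][0] - 2.0 * x_bar[t][1] + 2.0 * R[t]) / 4.0
        x0 = min(max(x0, 0.0), R[t])
        x[t] = [x0, R[t] - x0]
        rho = [sum(x[s][d] for s in range(T)) / R_a for d in (0, 1)]
        H.append(R_a * sum((rho[d] - rho_s[d]) ** 2 for d in (0, 1)))
    return x, H

# Costs favor decision 0; the pattern asks for a 50/50 split over the horizon.
x_bar = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0]]   # pure cost-minimizing flows (12)
x, H = gauss_seidel_pass([1.0, 1.2], [4.0, 4.0, 4.0], [0.5, 0.5], x_bar, theta=5.0)
```

Starting from the all-on-decision-0 reference solution (whose pattern metric is 6.0 here), each subproblem shifts some flow to the more expensive decision and the recorded metric decreases monotonically in t, as (24) guarantees.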
We formally describe the BCD algorithm below using the notation developed in Tseng (2000):

Initialization. Choose any x^0 = (x^0_1, ..., x^0_M) ∈ X.

Iteration n, n ≥ 1. Given x^{n−1} = (x^{n−1}_1, ..., x^{n−1}_M) ∈ X, choose an index s ∈ {1, ..., M} and compute a new iterate x^n = (x^n_1, ..., x^n_M) ∈ X satisfying

x^n_s = argmin_{x_s} f(x^{n−1}_1, ..., x^{n−1}_{s−1}, x_s, x^{n−1}_{s+1}, ..., x^{n−1}_M),    (25)
x^n_j = x^{n−1}_j,  j ≠ s, j ∈ {1, ..., M}.

The minimization in (25) is attained if the set {x : f(x) ≤ f(x^0)} is bounded and f is lower semicontinuous on this compact set (Rockafellar (1972)). To ensure convergence of the algorithm it is further required that each coordinate block is chosen sufficiently often in the method. One of the most commonly used methods to achieve this is the cyclic rule. According to the cyclic rule there exists a constant M̄ ≥ M such that every index j ∈ {1, ..., M} is chosen at least once between the n-th iteration and the (n + M̄ − 1)-th iteration. A well-known case of this rule is when M̄ = M, according to which an index s is set to k ∈ {1, ..., M} at iterations k, k + M, k + 2M, ....

It is obvious why the BCD method is attractive for solving the model with a pattern metric given in (11) in the case where the resources are nonreusable. The number of blocks is equal to the number of time periods T. By fixing the values for T − 1 blocks at any iteration we only need to optimize over the decision variables representing one time period, say index t. In the case where the resources are nonreusable the advantage of the BCD method is realized because we can optimize over the feasible region X_t ignoring all other constraints. This is exactly what we exploited in developing our algorithm in (23). It should be noted that we do not require the initial solution x^0 to be the optimal solution of the optimization model solved without the pattern metric. We used this as our initial solution in theorem 1 only to validate our approach in capturing information in an optimization model. If we adopt the cyclic rule in the BCD methodology applied to our optimization model in (11), then at any iteration n ≥ 1 the time period (block) t that we minimize over is given by

t = n − ⌊(n−1)/T⌋ T,

where ⌊x⌋ denotes the greatest integer less than or equal to x. The key to understanding the connection between the BCD methodology and our problem is that a subproblem solved at time period t is an iteration of the BCD methodology. The Gauss-Seidel gradient of the pattern metric given in (22) after iteration n can be expressed by

h^{GS,n}_tad = 2 R_a ( ( Σ_{t'=1}^T x^{*,n}_t'ad ) / R_a − ρ^s_ad ),  n ≥ 1, a ∈ A, d ∈ D_a, t ∈ T.    (26)

We compute x^{*,n}_t as follows. If t = n − ⌊(n−1)/T⌋ T, then x^{*,n}_t = argmin_{x ∈ X_t} f^n(x) where

f^n(x) = Σ_{a ∈ A} Σ_{d ∈ D_a} ( c_tad x_tad + (θ / R_a) ∫_0^{x_tad} [ h^{GS,n−1}_tad + 2(u − x^{*,n−1}_tad) ] du ).    (27)

Otherwise, we simply set x^{*,n}_t = x^{*,n−1}_t. Any feasible solution in X can be used to initialize x^{*,0}, so we may use x^{*,0}_t = x̄_t.

A direct application of the BCD methodology suggests the following procedure. At iteration n ≥ 1, if t = n − ⌊(n−1)/T⌋ T, then x^{*,n}_t = argmin_{x ∈ X_t} f^{BCD,n}(x) where

f^{BCD,n}(x) = Σ_{t'=1, t'≠t}^T Σ_{a ∈ A} Σ_{d ∈ D_a} c_t'ad x^{*,n−1}_t'ad + Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad + θ Σ_{a ∈ A} Σ_{d ∈ D_a} R_a ( ( Σ_{t'=1, t'≠t}^T x^{*,n−1}_t'ad + x_tad ) / R_a − ρ^s_ad )².    (28)

Otherwise, we simply set x^{*,n}_t = x^{*,n−1}_t.

We conclude this subsection by showing that our methodology in (23) is a provably convergent algorithm for solving the optimization model with a pattern metric. We first show that the application of the BCD method to the optimization model with a pattern metric given in (11) and our methodology in (23) are exactly the same, that is, we prove the following:

Theorem 2 The minimizers of f^{BCD,n} given in (28) and f^n given in (27) are identical, that is:

argmin_{x ∈ X} f^{BCD,n}(x) = argmin_{x ∈ X} f^n(x),  n ≥ 1, t = n − ⌊(n−1)/T⌋ T.

Proof: The proof is provided in the appendix.

The proof of convergence follows directly from the properties of the optimization model with a pattern metric. In Warga (1963) it is shown that the application of the BCD methodology to a convex function does converge to the optimal solution if the following statements are true:

- The optimization model with a pattern metric given by equation (11) is continuously differentiable in some neighborhood (relative to X = X_1 × ... × X_T) at every stationary point of this function.
- For every t, t = 1, ..., T, f^{BCD,n}(x_t) (or f^n(x_t)) is a strictly convex function of x_t for all iterations n ≥ 1.
- X is compact.

The first condition holds since the model with a pattern metric is differentiable everywhere. The second condition of strict convexity also holds because of the quadratic form of the pattern metric (see appendix). The feasible region X is compact in most real-world applications.

3.3 The Discrete Case

Many operational problems are characterized by integrality constraints on the decision variables, as is indicated by the wide application of integer resource allocation problems. Such applications arise in airline fleet assignment (Barnhart et al. (2000), Hane et al. (1995)), air traffic control (Bertsimas & Patterson (2000)), railcar management (Holmberg et al. (1998), Jordan & Turnquist (1983)), container distribution (Crainic et al. (1993)) and general fleet management (Powell & Carvalho (1997)). In this subsection we see how we can approximate the model in (23) to generate integer solutions. Moreover we see that we can solve the resulting problem as a network if the original structure of the problem (that is, the cost function without the pattern metric) is a network. There is a literature on solving quadratic cost functions and more general convex cost problems as network flow problems. Minoux (1984) developed a polynomial-time algorithm for obtaining a real-valued optimal solution of a quadratic form of the objective function similar to the model objective in (23). It is further shown in Minoux (1986) that this method can be used to obtain an integer optimal solution to the general convex flow problem. We use a method (see Ahuja et al. (1993)) that approximates a quadratic function using a piecewise linear model. We then show that this formulation can be solved as a network, and use the well-known fact that solving a network with integer data as a linear program yields integer solutions.

The objective function in (23) can be expressed as Σ_{a ∈ A} Σ_{d ∈ D_a} C_tad(x_tad) where

C_tad(x_tad) = c_tad x_tad + (θ / R_a) ∫_0^{x_tad} [ h^GS_tad + 2(u − x̄_tad) ] du.

Since x_tad cannot exceed the number of resources with attribute a at time t, denoted by R_ta, we can approximate C_tad(x_tad) by at most R̄_ta = ⌊R_ta⌋ + I_{{R_ta − ⌊R_ta⌋ > 0}} linear segments. The set {0, 1, ..., R̄_ta} denotes the breakpoints of the piecewise linear approximation. The linear cost coefficient on any interval [u − 1, u], u ∈ {1, ..., R̄_ta}, is obtained by taking the gradient of C_tad(x_tad) evaluated at x_tad = u, which is given by c_tad + (θ / R_a) [ h^GS_tad + 2(u − x̄_tad) ]. Let Σ_{u=1}^{R̄_ta} y^u_tad = x_tad where 0 ≤ y^u_tad ≤ 1, u ∈ {1, ..., R̄_ta}.
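The per-unit arc costs of the piecewise-linear model can be generated directly from the gradient formula above. The helper below is a small sketch with invented data; it assumes the coefficient form c_tad + (θ/R_a)[h^GS_tad + 2(u − x̄_tad)] described in the text.

```python
def unit_arc_costs(c_tad, h_gs, x_bar_tad, R_a, R_ta_bar, theta):
    # Cost of the u-th unit arc for pattern (a, d): the gradient of C_tad
    # evaluated at u, for u = 1, ..., R_ta_bar (the breakpoint count).
    return [c_tad + (theta / R_a) * (h_gs + 2.0 * (u - x_bar_tad))
            for u in range(1, R_ta_bar + 1)]

# Negative h_gs (the pattern wants more flow on this decision) discounts
# the first units; convexity makes the unit costs increase with u.
arcs = unit_arc_costs(c_tad=1.0, h_gs=-3.0, x_bar_tad=2.0,
                      R_a=10.0, R_ta_bar=4, theta=2.0)
```

Because C_tad is convex, the unit costs are increasing in u, so a min-cost flow naturally fills the arcs y^1, y^2, ... in order, which is exactly what the flow decomposition Σ_u y^u_tad = x_tad requires.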
Using the piecewise linear approximation of C (x ) we can represent the quadratic formulation in (23) using yt (θ) = arg min y t Y t R ta a A u=1 ( c + θ [ h + 2(u x ) ]) y u. (29) Y t is the feasible region obtained from transforming the feasible region X using the equations 17
20 Rta u=1 yu = x and the constraints 0 y u 1, u {1,..., R ta }. If the feasible region X t for all t T defines network flow constraints, we see that the formulation in (29) retains the network structure. In the presence of integer data this formulation yields integer solutions. The disadvantage of the network formulation in (29) is that we need to replace a single arc representing the pattern (a, d) with multiple arcs each of whose upper bound is one unit of flow and the cardinality of the number of arcs associated with a particular pattern (a, d) is given by R ta. A simpler version of our piecewise linear approximation is simply to use a linear approximation as shown below: x t (θ) = arg min x t X t a A (c + θra h ) x. (30) In the next section we extend the algorithm developed in this section to the reusable resource case. 4 Extension to Reusable Resources When time periods are relatively short, the decisions to act on resources in one time period impact the resources available in a later time period. In this case, the time periods are coupled. A natural algorithmic strategy is to use approximate dynamic programming methods. Decisions made in time period t can capture the impact on time period t + 1 by using an approximate value function V t+1 (R t+1 ) where R t+1 = B t x t, as presented in equation (6). When we allocate resources, our decisions (x ) a A,d D must be chosen subject to the resource constraint: R n ta = x n a A, t T. In the case of reusable resources, the resource vector R t = (R ta ) a A, for t 1, depends on decisions made in earlier time periods. We let V t (R t ) be the function that describes (at least 18
21 approximately) the optimal value of having R t resources at the beginning of time period t for the remainder of the horizon. An outline of the basic algorithm is given in figure 1. We use U V to denote an updating function that updates the value function approximations for the resource state Rt n = {Rta} n a A, t T at every iteration n 1. Examples of such approximations for resource allocation problems can be found in Powell et al. (2002), Godfrey & Powell (2001) and Godfrey & Powell (2002). A general treatment of approximate dynamic programming methods can be found in Bertsekas & Tsitsiklis (1996) and Si et al. (2004). In practice, these methods do not produce optimal solutions for most problems, and as a result we lose our ability to prove overall convergence of the algorithm. However, we can show that our pattern matching algorithm improves our ability to match an exogenously specified pattern. In addition, we can show experimentally that we can improve overall solution quality when the exogenous pattern is based on solving a static model to optimality. Step 0 Initialization: Set iteration counter n = 1. Choose an approximation V 0 t (.) for V t (.), t T. Step 1 Forward Pass: Step 1.0 Initialize forward pass: Initialize R 1 1. Set t = 1. Step 1.1 Solve subproblem: For time period t solve equation (6) to get solution vector x n t. Step 1.2 Apply system dynamics to update resource attributes after transformation. Step 1.3 Advance time t = t + 1: If t T go to Step 1.1. Step 2 Value function update: Set V t n (.) U V n 1 ( V t (.), Rt n ), t T. Step 3 Advance iteration counter: Stop if convergence is satisfied. If not set n = n + 1 and go to Step 1. Figure 1: Value iteration methodology for dynamic resource allocation problems with reusable resources. In this section we show how we can apply the optimization model introduced in subsection 3.3 to the iterative setting represented by figure 1. We define the normalized model flows as 19
shown below:

\rho^n_{ad}(x) = \frac{\sum_{t \in T} x^n_{tad}}{R^n_a},   a \in A, d \in D_a.

Because the model decision variables change with each iteration, we must define the pattern metric given by equation (16) at every iteration as a function of the normalized model flows obtained from the previous iteration. We denote the pattern metric at the beginning of iteration n by

H^{n-1} = H(\rho^{n-1}(x), \rho^s, R^{n-1}),

where H is given by the expression in equation (16). Note that we use the additional argument R^{n-1} in denoting the pattern metric H^{n-1} to account for the fact that, when we have reusable resources, the number of resources with attribute a varies across iterations. We let

R^n_a = \sum_{t \in T} R^n_{ta},   a \in A.

We assume we have the initialized values \rho^0 and R^0. We denote the gradient of H^{n-1} with respect to the normalized decision variable \rho_{ad}, evaluated at \rho^{n-1}_{ad}, by

h^n_{ad} = \left. \frac{\partial H^{n-1}}{\partial \rho_{ad}} \right|_{\rho_{ad} = \rho^{n-1}_{ad}},   a \in A, d \in D_a,

from which we obtain

h^n_{ad} = 2 R^{n-1}_a (\rho^{n-1}_{ad} - \rho^s_{ad}),   a \in A, d \in D_a.

The Gauss-Seidel variant of the gradient of the pattern metric, denoted by \bar{h}^{n,t}_{ad}, is given by

\bar{h}^{n,t}_{ad} = 2 R^{n-1}_a (\bar{\rho}^{n,t}_{ad} - \rho^s_{ad}),   a \in A, d \in D_a, t \in \{1, \ldots, T\},   (31)

where, as before, we define

\bar{\rho}^{n,t}_{ad} = \frac{\sum_{t'=1}^{t-1} x^n_{t'ad} + \sum_{t'=t}^{T} x^{n-1}_{t'ad}}{R^{n-1}_a},   a \in A, d \in D_a, t \in \{1, \ldots, T\}.
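The Gauss-Seidel update in equation (31) can be sketched as follows; this is a minimal illustration with hypothetical names, assuming a single pattern (a, d) held as lists of per-period flows:

```python
# Sketch of the Gauss-Seidel gradient of equation (31): flows for periods
# before t are taken from the current iteration n, flows for t onward from
# the previous iteration n-1. Names are illustrative, not the paper's code.

def gs_gradient(t, x_curr, x_prev, rho_s, R_prev):
    """x_curr[t'-1], x_prev[t'-1]: flow on one pattern (a, d) in period t'
    (lists of length T); rho_s: static pattern value; R_prev: R_a^{n-1}."""
    T = len(x_prev)
    mixed = sum(x_curr[tp - 1] for tp in range(1, t)) \
          + sum(x_prev[tp - 1] for tp in range(t, T + 1))
    rho_gs = mixed / R_prev            # the mixed normalized flow
    return 2.0 * R_prev * (rho_gs - rho_s)

# When the current and previous iterations coincide, the mixed estimate
# reduces to the plain normalized flow, so the gradient equals
# h = 2 R (rho - rho_s).
g = gs_gradient(t=2, x_curr=[1.0, 0.0, 1.0], x_prev=[1.0, 0.0, 1.0],
                rho_s=0.25, R_prev=8.0)
assert abs(g - 2.0 * 8.0 * (2.0 / 8.0 - 0.25)) < 1e-12
```

Because periods before t already reflect iteration n, the gradient can react within a single forward pass rather than waiting for the next iteration, which is the source of the fast convergence reported in section 6.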
Note that the Gauss-Seidel gradient of the pattern metric is indexed by n because it reflects the use of decision variables from prior time periods obtained at iteration n. Within the approximate dynamic programming technique proposed to solve this problem, we adopt a linear value function approximation

\bar{V}_t(R_t) = \sum_{a \in A} \bar{v}^n_{ta} R_{ta},

where \bar{v}^n_{ta} is an approximation of the marginal value of resources of type a at time t. Let the attribute transition function be defined using

a^M(a, d) = the attribute of a resource produced by acting on a resource with attribute a using decision d.

The slope \bar{v}^n_{t+1, a^M(a,d)} represents the future (marginal) value at time t+1 of a decision d acting at time t on a resource with attribute vector a. If we use a linear value function approximation, then our subproblem at time t becomes

\min_{y^n_t \in Y^n_t} \sum_{a \in A} \sum_{d \in D_a} \sum_{u=1}^{R^n_{ta}} \left( c_{tad} + \bar{v}^{n-1}_{t+1, a^M(a,d)} + \theta \left[ \frac{\bar{h}^{n,t}_{ad}}{R^{n-1}_a} + 2(u - x^{n-1}_{ad}) \right] \right) y^{u,n}_{tad}.   (32)

Y^n_t denotes the feasible region at iteration n and time t. We use the notation y^{u,n}_{tad} to indicate the iteration-specific dependence of the flow decomposition variables. We obtain a myopic policy by simply setting \bar{v}^{n-1}_{t+1,a} = 0, giving us

\min_{y^n_t \in Y^n_t} \sum_{a \in A} \sum_{d \in D_a} \sum_{u=1}^{R^n_{ta}} \left( c_{tad} + \theta \left[ \frac{\bar{h}^{n,t}_{ad}}{R^{n-1}_a} + 2(u - x^{n-1}_{ad}) \right] \right) y^{u,n}_{tad}.   (33)

We can obtain an estimate of v^n_{ta} by letting \hat{v}^n_{ta} be the dual variable of the supply constraint (equation (2)) for resource attribute a in the subproblem solved at time t in iteration n. Since these duals fluctuate randomly (even for deterministic problems), we update our estimates \bar{v}^n_{ta} using

\bar{v}^n_{ta} = (1 - \alpha_n) \bar{v}^{n-1}_{ta} + \alpha_n \hat{v}^n_{ta},   a \in A, t \in T,
where \alpha_n \in (0, 1) is a smoothing factor.

5 A Resource Allocation Problem

There are two applications of our pattern-matching logic that we would like to test. First, we wish to demonstrate the degree to which our algorithm can improve our ability to match exogenously specified patterns; this ability improves user acceptance of these complex models. Second, we wish to measure the value of using the optimal solution of a static model to guide the approximate solution of a dynamic model in the more difficult context of reusable resources. To demonstrate the usefulness of the approach, we use as our test setting a problem known as the military airlift problem, which requires managing different types of cargo aircraft over time to move a set of loads ("requirements") within a network of airbases. Cargo aircraft can be moved loaded or repositioned empty. The problem was chosen in part because, while it exhibits the difficult time-staged nature of all of our problems, it is still small enough that we can solve the dynamic version of the model using a commercial solver. This ability allows us to evaluate all of our solutions relative to the optimal solution. We first present the multicommodity flow problem in section 5.1. In section 5.2 we detail the static model that we solve to generate the static flow patterns. Section 5.3 then presents the dynamic model and shows how we formulate the decision to hold a resource until the next time period, a decision that is absent from the static model. The results from the actual experiments are reported in section 6.

5.1 The Multicommodity Flow Problem

Our experimental design is centered around a dynamic, multicommodity flow problem where resources are assigned to tasks that are moved from one location to another. On completion of these tasks, the resources may cover other tasks starting from that location or move empty to a different location to cover other tasks. Typically, tasks have a time window during which they are available for assignment. There is a reward for covering a
task based on the type of resource assigned to it. In addition, there is a cost of moving empty between two locations. The data for our experiment is motivated by the military airlift problem, where a fleet of cargo aircraft is used to move loads of freight over time. We consider five types of aircraft and five types of tasks. We conducted experiments with five sets of data. Each dataset is characterized by a label L-A(#)-T(#)-TP, where L denotes the number of locations, A the number of aircraft, T the number of tasks and TP the number of time periods. For the same number of aircraft we have different data sets characterizing the attributes of the aircraft, and this difference is indicated by the counter A(#) for aircraft (we use T(#) for tasks). For example, 20-200(1)-2000(1)-30 indicates an experiment that has 20 locations, 200 aircraft characterized by dataset 1, 2000 tasks characterized by dataset 1 and 30 time periods. Each task is characterized by an origin, a destination and a type. A negative cost (a reward) is generated for covering a task, and this reward is a function of the type of the task and the type of resource assigned to the task. Each task is associated with a value specified in dollars, and the reward for covering this task with a resource is based on a compatibility matrix of dimensionality 5 x 5 that indicates the fraction of the reward received when covering a particular task type with a specific resource type. The compatibility matrix for our experiment is shown in table 1.

Table 1: Compatibility matrix.

There is an empty cost in dollars per mile associated with moving empty from one location to another; the empty cost is the same for all resource types. The data set is generated so that the number of demands going out of a location is negatively correlated with the number of demands going into that location at a certain time period. This results in more empty repositioning moves and more temporal flow imbalances.
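The reward computation described above can be sketched as follows; the 2 x 2 matrix and the task and aircraft type names are invented for illustration (the experiment uses the 5 x 5 matrix of table 1):

```python
# Covering a task generates a negative cost (a reward): the task's dollar
# value scaled by the compatibility entry for its (task type, aircraft
# type) pair. These matrix entries are hypothetical placeholders.

COMPAT = {('bulk', 'type1'): 1.0, ('bulk', 'type2'): 0.8,
          ('oversize', 'type1'): 0.6, ('oversize', 'type2'): 1.0}

def cover_reward(task_value, task_type, aircraft_type):
    # Negative cost = reward in the min-cost formulation.
    return -task_value * COMPAT[(task_type, aircraft_type)]

assert cover_reward(1000.0, 'oversize', 'type2') == -1000.0
```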
The resource attribute vector a is given by

a = {location, aircraft-type}.   (34)

We denote the set of locations in the network by J, and let a_location be the location attribute of the resource attribute vector a. For any location j \in J we define L(j) as the set of tasks whose origin is j. A task expires from the system if it has not been assigned at the time it is available for assignment; that is, we do not assume time windows on tasks in the dynamic model. There is no reward generated for expired tasks. The decision set for a resource with attribute a at time t is given by

D_a = \{move assigned with task l \in L(a_location)\} \cup \{move empty to location j \in J\},   a \in A.

We let D^e_a be the set of decisions to move a vehicle empty.

5.2 The Static Model

We solve the static model characterizing the resource allocation problem presented in the subsection above as a flow-balancing network model. To denote the transformation of resources in the static model we define the following indicator variable:

\delta_{a'}(a, d) = 1 if decision d \in D_a transforms the resource with attribute vector a \in A to the state a' \in A, and 0 otherwise.

The static flow-balancing network model is given by

x^s = \arg\min_{x \in X} \sum_{a \in A} \sum_{d \in D_a} c_{ad} x_{ad}   (35)

where we let c_{ad} be the unit cost of transforming a resource with attribute vector a \in A
using a decision d \in D_a. The feasible region X is defined by the constraints

\sum_{d \in D_a} x_{ad} - \sum_{a' \in A} \sum_{d' \in D_{a'}} \delta_a(a', d') x_{a'd'} = 0,   a \in A,
x_{ad} \ge 0,   a \in A, d \in D_a.

The cost vector c consists of negative values (rewards) for covering a task and positive values (costs) for moving empty between locations. We represent the normalized optimal flows of empties from the static model as static flow patterns in the time-staged resource allocation model. The normalized static flow patterns \rho^s_{ad} are derived from the flow of empties using

\rho^s_{ad} = \frac{x^s_{ad}}{\sum_{d' \in D^e_a} x^s_{ad'}},   a \in A, d \in D^e_a.

The static model is able to globally balance flows over the entire network. As such, it captures network-level patterns that may be missed by approximate models that step forward through time. The experimental challenge is to measure the size of this benefit.

5.3 The Dynamic Model

The objective function for the time-staged model is given by

\max_{x \in X} \sum_{t \in T} \sum_{a \in A} \sum_{d \in D_a} c_{tad} x_{tad}.

In our dynamic model, the cost vector has to consider the timing of activities; thus, a load that is moved late is assessed a service penalty. A problem we face in using flows from a static model to guide a dynamic model is that the static model does not provide any guidance as to how much flow should be held at a location (the hold option) in a given time period. Let \beta^n_a \in [0, 1] be an estimate of the fraction of hold flows for resources with attribute vector a at iteration n. We use the total number of empties from the static model to derive the scaling factor \beta^n_a as shown:

\beta^n_a = \max\left( 0, 1 - \frac{\sum_{d \in D^e_a} x^s_{ad}}{\sum_{d \in D^e_a} \sum_{t \in T} x^{n-1}_{tad}} \right),
where \sum_{d \in D^e_a} \sum_{t \in T} x^{n-1}_{tad} is the total number of resources with attribute vector a in the flows of empties and hold decisions from the previous iteration. Instead of using an iteration-independent \rho^s_{ad} to represent static flow patterns at every iteration, we use

\rho^{s,n}_{ad} = \beta^n_a if d is the hold decision, and \rho^{s,n}_{ad} = (1 - \beta^n_a) \rho^s_{ad} if d moves the resource to another location.

Thus, we are employing a user-defined parameter to specify the fraction of vehicles that are held at a location, and then factoring down the movements to other locations so that the pattern still sums to one. The new vector of probabilities \{\rho^{s,n}_{ad}\}_{d \in D^e_a} satisfies the following condition at every iteration n:

\sum_{d \in D^e_a} \rho^{s,n}_{ad} = 1,   a \in A.

The new pattern metric at the end of every iteration n is given by

H^n(\rho^n(x), \rho^{s,n}, R^n) = \sum_{a \in A} \left( \sum_{t=1}^{T} R^n_{ta} \right) \sum_{d \in D^e_a} (\rho^n_{ad} - \rho^{s,n}_{ad})^2   (36)

where we use the compact notation \rho^{s,n} = \{\rho^{s,n}_{ad}\}_{a \in A, d \in D^e_a}. A summary of the algorithm we use to incorporate the pattern logic in a dynamic model is given in figure 2.

6 Experimental Results

We have three questions we wish to answer experimentally: 1) How quickly does the algorithm converge? 2) How well does the algorithm match exogenous patterns for problems with reusable resources? 3) If the exogenous pattern is the optimal solution to a static problem, how much does this improve the solution when we are using an approximate algorithm (for problems with reusable resources)?
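The hold-scaling step and the pattern metric of equation (36) can be sketched for a single attribute a as follows; the clamping of the hold fraction to [0, 1] and all numerical values are illustrative assumptions:

```python
# Sketch of the adjusted pattern rho^{s,n} and the metric H^n of equation
# (36) for one attribute a: the hold decision receives probability beta,
# the static pattern over moves is scaled by (1 - beta), and squared
# deviations are weighted by the fleet size sum_t R_ta.

def adjusted_pattern(rho_s_moves, static_empties, dynamic_empties):
    # beta: estimated hold fraction, clamped so it stays a probability
    beta = min(1.0, max(0.0, 1.0 - static_empties / dynamic_empties))
    pattern = {d: (1.0 - beta) * p for d, p in rho_s_moves.items()}
    pattern['hold'] = beta
    return pattern

def pattern_metric(rho, rho_target, fleet_size):
    return fleet_size * sum((rho[d] - rho_target[d]) ** 2 for d in rho_target)

target = adjusted_pattern({'to_VA': 0.7, 'to_MS': 0.3},
                          static_empties=60.0, dynamic_empties=100.0)
assert abs(sum(target.values()) - 1.0) < 1e-12   # the pattern sums to one

model_rho = {'to_VA': 0.5, 'to_MS': 0.1, 'hold': 0.4}
H = pattern_metric(model_rho, target, fleet_size=20.0)
```

Scaling the move probabilities by (1 - beta) is what keeps the adjusted vector a valid probability distribution over the empty and hold decisions.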
Step 0 Initialization: Set iteration counter n = 1. Initialize the following for n = 1:
  \bar{h}^{0,t}_{ad} = 0, t \in T, a \in A, d \in D^e_a;
  R^0_a = \sum_{t \in T} R^0_{ta} = 0, a \in A;
  \rho^{s,0}_{ad} = \rho^s_{ad}, a \in A, d \in D^e_a.
Step 1 Set time t = 1:
  Step 1.0 If n > 1: Derive the network arc costs using the Gauss-Seidel gradient of the pattern metric as in equation (31) and apply smoothing to these costs.
  Step 1.1 Solve the time-staged model with the linear value function approximations indicated in (32), or the myopic policy indicated in (33), for stage t.
  Step 1.2 Increment t = t + 1: If t \le T, go to Step 1.0; else go to Step 2.
Step 2:
  Step 2.0 Calculate aggregate decision variables: x^n_{ad} = \sum_{t \in T} x^n_{tad}, a \in A, d \in D^e_a.
  Step 2.1 Derive: \rho^n_{ad} = x^n_{ad} / \sum_{d' \in D^e_a} x^n_{ad'}, a \in A, d \in D^e_a. If \sum_{d' \in D^e_a} x^n_{ad'} = 0, set \rho^n_{ad} = 1 for the hold decision (the decision d with \delta_a(a, d) = 1) and \rho^n_{ad} = 0 otherwise.
  Step 2.2 Scaling: Derive \rho^{s,n}_{ad}, a \in A, d \in D^e_a, to reflect hold decisions.
  Step 2.3 Derive the pattern metric H^n using (36).
  Step 2.4 Advance iteration counter if convergence is not satisfied: Set n = n + 1 and go to Step 1.

Figure 2: Piecewise-linear version of the algorithm for incorporating static flow patterns in a time-staged resource allocation model.
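Step 1.1 solves the piecewise-linear subproblem of (32)-(33), whose convex penalty decomposes into unit-capacity arcs with nondecreasing marginal costs. A minimal sketch of the per-unit cost computation, reading the cost term as c + theta*(h/R + 2(u - x)) with all names illustrative:

```python
# Sketch of the unit-arc decomposition used by the piecewise-linear
# formulation: the single arc for pattern (a, d) is replaced by R_ta arcs
# of capacity one, the u-th carrying the marginal cost of the u-th unit.

def unit_arc_costs(c_ad, theta, h_bar, R_a, x_prev, R_ta):
    """Marginal cost of the u-th unit of flow, u = 1, ..., R_ta, following
    the term c + theta * (h_bar / R_a + 2 * (u - x_prev))."""
    return [c_ad + theta * (h_bar / R_a + 2.0 * (u - x_prev))
            for u in range(1, R_ta + 1)]

costs = unit_arc_costs(c_ad=-5.0, theta=1.0, h_bar=0.4, R_a=10.0,
                       x_prev=2.0, R_ta=4)

# Marginal costs increase with u, so the penalty is convex; a min-cost
# network flow therefore fills the cheaper unit arcs first, and with
# integer data the solution remains integer.
assert all(costs[u] <= costs[u + 1] for u in range(len(costs) - 1))
```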
Table 2: Flow patterns from the static model: percent of flow moving empty from origin to destination by aircraft type, as produced by the static model. These are the patterns used to guide the dynamic model. The columns give the origin and aircraft type (a), the destination (d), the proportion \rho^s_{ad} and the total flow x^s_{ad}; the origin-destination pairs include FL-34,A to VA; FL-34,A to MS; SC-29,A to VA; CA-95,B to MO; CA-95,B to OR; IA-51,D to AK; UT-84,D to NM; and UT-84,D to MO.

We address these questions using the problem described in section 5. A sample of the patterns representing the flow of empties between locations, obtained from solving the static model for the military airlift problem, is shown in table 2. In our experiments we are able to solve the dynamic resource allocation model exactly to get the optimal solution. Based on experimentation, we found that a scaling factor \theta = 1000 is appropriate when incorporating patterns with a linear value function, while a separately tuned scaling factor is appropriate when incorporating patterns while using a myopic policy, which performs very poorly for this problem class. In our experiments we use \alpha_n = 2/(10+n) as the smoothing factor to update the linear value function approximations. The smoothing factor that we apply to the Gauss-Seidel gradient of the pattern metric is 20/(40+n). We initialize all the smoothed gradients and costs for n = 0 to 0.

6.1 Rate of convergence

We have proven that our algorithm monotonically reduces the pattern metric, even for the case of reusable resources, where we are unable to prove global convergence (since we are using an approximate algorithm to step through time). Unresolved, however, is the rate of convergence. In the introduction, we described a number of projects in which we are using this methodology. We have consistently found that the Gauss-Seidel strategy produces very fast convergence. Figure 3 shows how well we match a historical pattern (normalized to 100) after each
iteration of the algorithm. The model was judged to be acceptable (by a knowledgeable user) if the performance was within the bounds shown in the figure (approximately two percent above and below the target). We found that the Gauss-Seidel algorithm converged closely to this target within three to four iterations. We have used this algorithm in a number of projects, and this performance is typical.

Figure 3: Rate of convergence of the pattern metric (normalized metric versus iterations).

The fast performance is due to the ability of the algorithm to adjust, after each time step, whether it should do more or less of an activity in order to match a target statistic, based on how well we are tracking the goal over the last T time periods (which may include time periods from a previous iteration). If we are using an approximate dynamic programming algorithm, we have to simulate the problem iteratively, and the pattern logic adds only a nominal computational burden. If we were to use a simple myopic policy, which normally requires stepping through the data only once, this logic requires that we repeat the simulation three or four times.

6.2 Matching patterns and improving solution quality

We now report on experiments where we measure both how well the procedure matches exogenous patterns, and the degree to which patterns derived from solving a static model to optimality improve the quality of heuristics used to solve the dynamic problem.
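The smoothing recursions used in these experiments can be sketched as follows, reading the stepsizes reported in section 6 as alpha_n = 2/(10+n) for the value function approximations and 20/(40+n) for the Gauss-Seidel gradients; this is an illustrative sketch only:

```python
# Exponential smoothing with a declining stepsize; with a constant
# observation stream the estimate approaches that constant.

def smooth(prev, new, stepsize):
    return (1.0 - stepsize) * prev + stepsize * new

def run_smoothing(observations, stepsize_fn):
    est = 0.0                         # gradients and costs start at zero
    for n, obs in enumerate(observations, start=1):
        est = smooth(est, obs, stepsize_fn(n))
    return est

value_stepsize = lambda n: 2.0 / (10.0 + n)      # for the value functions
gradient_stepsize = lambda n: 20.0 / (40.0 + n)  # for the Gauss-Seidel gradients

est = run_smoothing([10.0] * 500, value_stepsize)
assert abs(est - 10.0) < 0.1
```

The declining stepsizes damp the random fluctuation of the duals and gradients across iterations while still allowing the estimates to track a stable signal.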
Table 3: Effect of patterns when using a myopic policy (value functions are zero) and a linear pattern metric. For each dataset the table reports the percent of optimality with \theta = 0, the percent of optimality with the tuned \theta, the percent improvement in the objective function and the percent improvement in the pattern metric; the datasets are (1)-2000(1), (1)-4000(1), (1)-6000(1), (2)-4000(2) and (3)-4000(3).

Table 4: Effect of patterns when using a myopic policy and a piecewise-linear pattern metric, reported for the same datasets and statistics as table 3.

Tables 3 and 4 summarize our experimental results when implementing our algorithm using a myopic policy. We see that there is a significant improvement in the percentage of optimality obtained by incorporating patterns using either the linear (equation (30)) or piecewise-linear (equation (29)) version of our algorithm. In most cases we are able to achieve around 70 percent of the optimal solution, an improvement of around 40 percent. While this is far below optimal, we point out that the myopic policy is especially poor: this policy has no incentive to move equipment empty to a different location to cover demands that might arise in the future, resulting in excess inventories of equipment at some locations that become unproductive. We also see that in the implementations of both the linear and piecewise-linear versions of our methodology there is a significant reduction in the pattern metric, showing that we are doing a much better job of matching the pattern. In tables 5 and 6 we report our results for incorporating patterns when we use linear value function approximations to convey information among subproblems. We see that even without incorporating patterns, with the use of linear value function approximations we are able to achieve more than 90 percent of the optimal solution. Despite this, both linear and
More informationWhat you should know about approximate dynamic programming
What you should know about approximate dynamic programming Warren B. Powell Department of Operations Research and Financial Engineering Princeton University, Princeton, NJ 08544 December 16, 2008 Abstract
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationAn Optimization-Based Heuristic for the Split Delivery Vehicle Routing Problem
An Optimization-Based Heuristic for the Split Delivery Vehicle Routing Problem Claudia Archetti (1) Martin W.P. Savelsbergh (2) M. Grazia Speranza (1) (1) University of Brescia, Department of Quantitative
More informationStochastic programs with binary distributions: Structural properties of scenario trees and algorithms
INSTITUTT FOR FORETAKSØKONOMI DEPARTMENT OF BUSINESS AND MANAGEMENT SCIENCE FOR 12 2017 ISSN: 1500-4066 October 2017 Discussion paper Stochastic programs with binary distributions: Structural properties
More informationRuntime Reduction Techniques for the Probabilistic Traveling Salesman Problem with Deadlines
Runtime Reduction Techniques for the Probabilistic Traveling Salesman Problem with Deadlines Ann Melissa Campbell, Barrett W. Thomas Department of Management Sciences, University of Iowa 108 John Pappajohn
More informationDynamic Programming Approximations for Stochastic, Time-Staged Integer Multicommodity Flow Problems
Dynamic Programming Approximations for Stochastic, Time-Staged Integer Multicommody Flow Problems Huseyin Topaloglu School of Operations Research and Industrial Engineering, Cornell Universy, Ithaca, NY
More informationCS 6901 (Applied Algorithms) Lecture 3
CS 6901 (Applied Algorithms) Lecture 3 Antonina Kolokolova September 16, 2014 1 Representative problems: brief overview In this lecture we will look at several problems which, although look somewhat similar
More informationReal-time Systems: Scheduling Periodic Tasks
Real-time Systems: Scheduling Periodic Tasks Advanced Operating Systems Lecture 15 This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of
More informationM08/5/MATSD/SP1/ENG/TZ1/XX/M+ MARKSCHEME. May 2008 MATHEMATICAL STUDIES. Standard Level. Paper pages
M08/5/MATSD/SP1/ENG/TZ1/XX/M+ MARKSCHEME May 008 MATHEMATICAL STUDIES Standard Level Paper 1 0 pages M08/5/MATSD/SP1/ENG/TZ1/XX/M+ This markscheme is confidential and for the exclusive use of examiners
More informationSupplementary Technical Details and Results
Supplementary Technical Details and Results April 6, 2016 1 Introduction This document provides additional details to augment the paper Efficient Calibration Techniques for Large-scale Traffic Simulators.
More informationOn service level measures in stochastic inventory control
On service level measures in stochastic inventory control Dr. Roberto Rossi The University of Edinburgh Business School, The University of Edinburgh, UK roberto.rossi@ed.ac.uk Friday, June the 21th, 2013
More informationOn the static assignment to parallel servers
On the static assignment to parallel servers Ger Koole Vrije Universiteit Faculty of Mathematics and Computer Science De Boelelaan 1081a, 1081 HV Amsterdam The Netherlands Email: koole@cs.vu.nl, Url: www.cs.vu.nl/
More informationRobust multi-sensor scheduling for multi-site surveillance
DOI 10.1007/s10878-009-9271-4 Robust multi-sensor scheduling for multi-site surveillance Nikita Boyko Timofey Turko Vladimir Boginski David E. Jeffcoat Stanislav Uryasev Grigoriy Zrazhevsky Panos M. Pardalos
More informationApproximate Dynamic Programming: Solving the curses of dimensionality
Approximate Dynamic Programming: Solving the curses of dimensionality Informs Computing Society Tutorial October, 2008 Warren Powell CASTLE Laboratory Princeton University http://www.castlelab.princeton.edu
More informationDRAFT Formulation and Analysis of Linear Programs
DRAFT Formulation and Analysis of Linear Programs Benjamin Van Roy and Kahn Mason c Benjamin Van Roy and Kahn Mason September 26, 2005 1 2 Contents 1 Introduction 7 1.1 Linear Algebra..........................
More information15-850: Advanced Algorithms CMU, Fall 2018 HW #4 (out October 17, 2018) Due: October 28, 2018
15-850: Advanced Algorithms CMU, Fall 2018 HW #4 (out October 17, 2018) Due: October 28, 2018 Usual rules. :) Exercises 1. Lots of Flows. Suppose you wanted to find an approximate solution to the following
More informationKnapsack and Scheduling Problems. The Greedy Method
The Greedy Method: Knapsack and Scheduling Problems The Greedy Method 1 Outline and Reading Task Scheduling Fractional Knapsack Problem The Greedy Method 2 Elements of Greedy Strategy An greedy algorithm
More informationLecture 1. Stochastic Optimization: Introduction. January 8, 2018
Lecture 1 Stochastic Optimization: Introduction January 8, 2018 Optimization Concerned with mininmization/maximization of mathematical functions Often subject to constraints Euler (1707-1783): Nothing
More informationProbabilistic Planning. George Konidaris
Probabilistic Planning George Konidaris gdk@cs.brown.edu Fall 2017 The Planning Problem Finding a sequence of actions to achieve some goal. Plans It s great when a plan just works but the world doesn t
More informationSurge Pricing and Labor Supply in the Ride- Sourcing Market
Surge Pricing and Labor Supply in the Ride- Sourcing Market Yafeng Yin Professor Department of Civil and Environmental Engineering University of Michigan, Ann Arbor *Joint work with Liteng Zha (@Amazon)
More informationChapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 4 Greedy Algorithms Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 4.1 Interval Scheduling Interval Scheduling Interval scheduling. Job j starts at s j and
More informationA Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty
2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-13, 2011, Shanghai, China A Decentralized Approach to Multi-agent Planning in the Presence of
More informationComplexity of Routing Problems with Release Dates and Deadlines
Complexity of Routing Problems with Release Dates and Deadlines Alan Erera, Damian Reyes, and Martin Savelsbergh H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology
More informationBirgit Rudloff Operations Research and Financial Engineering, Princeton University
TIME CONSISTENT RISK AVERSE DYNAMIC DECISION MODELS: AN ECONOMIC INTERPRETATION Birgit Rudloff Operations Research and Financial Engineering, Princeton University brudloff@princeton.edu Alexandre Street
More informationNumerical Methods. V. Leclère May 15, x R n
Numerical Methods V. Leclère May 15, 2018 1 Some optimization algorithms Consider the unconstrained optimization problem min f(x). (1) x R n A descent direction algorithm is an algorithm that construct
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationPlanning in Markov Decision Processes
Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov
More informationRegularized optimization techniques for multistage stochastic programming
Regularized optimization techniques for multistage stochastic programming Felipe Beltrán 1, Welington de Oliveira 2, Guilherme Fredo 1, Erlon Finardi 1 1 UFSC/LabPlan Universidade Federal de Santa Catarina
More informationMicroeconomic Algorithms for Flow Control in Virtual Circuit Networks (Subset in Infocom 1989)
Microeconomic Algorithms for Flow Control in Virtual Circuit Networks (Subset in Infocom 1989) September 13th, 1995 Donald Ferguson*,** Christos Nikolaou* Yechiam Yemini** *IBM T.J. Watson Research Center
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationPayments System Design Using Reinforcement Learning: A Progress Report
Payments System Design Using Reinforcement Learning: A Progress Report A. Desai 1 H. Du 1 R. Garratt 2 F. Rivadeneyra 1 1 Bank of Canada 2 University of California Santa Barbara 16th Payment and Settlement
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationAnticipatory Freight Selection in Intermodal Long-haul Round-trips
Anticipatory Freight Selection in Intermodal Long-haul Round-trips A.E. Pérez Rivera and M.R.K. Mes Department of Industrial Engineering and Business Information Systems, University of Twente, P.O. Box
More informationThe Optimizing-Simulator: An Illustration using the Military Airlift Problem
The Optimizing-Simulator: An Illustration using the Military Airlift Problem Tongqiang Tony Wu Warren B. Powell Princeton University and Alan Whisman Air Mobility Command There have been two primary modeling
More informationTraffic Modelling for Moving-Block Train Control System
Commun. Theor. Phys. (Beijing, China) 47 (2007) pp. 601 606 c International Academic Publishers Vol. 47, No. 4, April 15, 2007 Traffic Modelling for Moving-Block Train Control System TANG Tao and LI Ke-Ping
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationArtificial Intelligence
Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important
More informationM11/5/MATSD/SP2/ENG/TZ2/XX/M MARKSCHEME. May 2011 MATHEMATICAL STUDIES. Standard Level. Paper pages
M11/5/MATSD/SP/ENG/TZ/XX/M MARKSCHEME May 011 MATHEMATICAL STUDIES Standard Level Paper 9 pages M11/5/MATSD/SP/ENG/TZ/XX/M This markscheme is confidential and for the exclusive use of examiners in this
More information1 Introduction. 2 Successive Convexification Algorithm
1 Introduction There has been growing interest in cooperative group robotics [], with potential applications in construction and assembly. Most of this research focuses on grounded or mobile manipulator
More information6. DYNAMIC PROGRAMMING I
6. DYNAMIC PROGRAMMING I weighted interval scheduling segmented least squares knapsack problem RNA secondary structure Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley Copyright 2013
More informationMULTIPLE CHOICE QUESTIONS DECISION SCIENCE
MULTIPLE CHOICE QUESTIONS DECISION SCIENCE 1. Decision Science approach is a. Multi-disciplinary b. Scientific c. Intuitive 2. For analyzing a problem, decision-makers should study a. Its qualitative aspects
More informationRecent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables
Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop
More informationChapter 3: Discrete Optimization Integer Programming
Chapter 3: Discrete Optimization Integer Programming Edoardo Amaldi DEIB Politecnico di Milano edoardo.amaldi@polimi.it Sito web: http://home.deib.polimi.it/amaldi/ott-13-14.shtml A.A. 2013-14 Edoardo
More information1 Bewley Economies with Aggregate Uncertainty
1 Bewley Economies with Aggregate Uncertainty Sofarwehaveassumedawayaggregatefluctuations (i.e., business cycles) in our description of the incomplete-markets economies with uninsurable idiosyncratic risk
More information