Using Static Flow Patterns in Time-Staged Resource Allocation Problems
Arun Marar, Warren B. Powell, Hugo P. Simão
Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ
September 6, 2006
Abstract

We address the problem of combining a cost-based simulation model, which makes decisions over time by minimizing a cost model, and rule-based policies, where a knowledgeable user would like certain types of decisions to happen with a specified frequency when averaged over the entire simulation. These rules are designed to capture issues that are difficult to quantify as costs, but which produce more realistic behaviors in the judgment of a knowledgeable user. We consider patterns that are specified as averages over time, which have to be enforced in a model that makes decisions while stepping through time (for example, while optimizing the assignment of resources to tasks). We show how an existing simulation, as long as it uses a cost-based optimization model while stepping through time, can be modified to more closely match exogenously specified patterns.
Introduction

Frequently, we find that optimization models of complex operational problems produce results which run against the insights of knowledgeable experts. It is nice when these differences represent improvements that save money, but it is frequently the case that the differences simply reflect missing or incomplete information about the real problem. For example, a truckload carrier may need to assign longer loads to drivers who own their tractors (as opposed to drivers who use company-owned equipment) because these drivers need to make more money to cover the equipment costs. We may not be able to quantify the cost of assigning a driver to a shorter load, but we do know that we are happy if the average length of loads to which these drivers are assigned matches a corporate goal. Making optimization models match corporate goals (as opposed to simply minimizing costs) is very common in engineering practice, and it is usually achieved through the inclusion of soft bonuses and penalties to encourage the model to produce certain behaviors. Tuning these soft parameters is typically ad hoc and can be quite time consuming. A more formal strategy, introduced by Marar et al. (2006), is to add a penalty term to produce a modified objective function of the form

min_{x ∈ X} C(x) + θ‖x − x^p‖,

where x is the flow produced by the model, and x^p is a flow that we are trying to match using an exogenously specified pattern. The resulting problem is a nonlinear programming problem that can be solved using standard algorithms. We often encounter time-staged problems where the same challenge of meeting corporate operating statistics arises. The problems may be stochastic, or we may be using a temporal decomposition simply because the problems are too large. For example, we may be simulating the assignment of drivers to loads over a planning horizon.
We know the cost of assigning a driver to a load (we can minimize these costs at a point in time), but by the end of the simulation, we want the model to produce statistics that meet certain goals when averaged over the entire simulation.
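As a concrete illustration of the penalized objective min_{x ∈ X} C(x) + θ‖x − x^p‖, the following sketch treats the special case where the cost is linear and the feasible region reduces to simple bounds, so the penalized problem separates by component and has a closed-form solution. The function name and the data are invented for the example; the full model with flow-balance constraints requires a general nonlinear programming solver.

```python
def match_pattern_flow(c, x_pattern, theta, upper):
    # Minimize sum_i c[i]*x[i] + theta*(x[i] - x_pattern[i])**2
    # subject to 0 <= x[i] <= upper[i].  The objective is separable and
    # convex, so each component has the closed-form minimizer
    # clip(x_pattern[i] - c[i]/(2*theta), 0, upper[i]).
    return [min(max(xp - ci / (2.0 * theta), 0.0), ub)
            for ci, xp, ub in zip(c, x_pattern, upper)]

# A larger theta pulls the flow toward the exogenous pattern x_pattern.
weak = match_pattern_flow([4.0, 1.0], [10.0, 10.0], theta=0.1, upper=[20.0, 20.0])
strong = match_pattern_flow([4.0, 1.0], [10.0, 10.0], theta=100.0, upper=[20.0, 20.0])
```

With a small θ the costs dominate and the flow drifts away from the pattern; with a large θ the solution is pulled close to x^p.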
In our applications, these corporate goals are always expressed as static patterns. This means that while we are solving the problem using a method that steps through time, the decisions, when aggregated over the entire simulation, need to match specific targets. This challenge arises in virtually every project we encounter with the sponsors of CASTLE Laboratory at Princeton. Examples of specific projects (all of which have been solved using the techniques in this paper) include:

- Locomotive management at a railroad - One pattern was to assign a particular type of locomotive (e.g. by horsepower) to a particular train (e.g. intermodal trains) 80 percent of the time (intermodal trains need to move quickly to compete with trucks).
- Routing and scheduling for cryogenic gases - The pattern specified that drivers who just delivered gases at a particular customer would then move an average distance to the next customer (this helped provide more realistic clustering of customers).
- Managing drivers at a major less-than-truckload carrier - One pattern specified that drivers in Chicago, with a home domicile of Cleveland, might be assigned to a load going to Indianapolis 10 percent of the time (this tells the model that it is possible, but not common, to send drivers in this direction).
- Military airlift problems - The pattern might specify that C-5 aircraft should be assigned to move cargo into the Middle East 7 percent of the time (bases in the Middle East might not have good repair facilities for this type of aircraft).
- Truckload motor carriers - Team drivers (drivers moving in pairs) should be assigned to loads between 700 and 800 miles in length 20 percent of the time (this helped the model match average length of haul statistics).
- Managing boxcars for a railroad - Customers requesting boxcars would receive empties from a particular location 40 percent of the time (sometimes a customer had special needs that were met by cars from a specific location).
All of these problems were solved using methods that stepped through time using the techniques of approximate dynamic programming (see, for example, Topaloglu & Powell (2006), Powell & Van Roy (2004) or Powell et al. (2005)). In each case, our ability to gain user acceptance was significantly improved by our ability to match user-specified patterns to obtain more realistic behaviors from the model. This required the ability to solve a problem at a point in time, while matching statistics that were measured over the entire simulation. In this paper, we assume that we are solving a dynamic model one time period at a time, stepping forward through time (for example, using a myopic simulation, a rolling horizon procedure or a more advanced technique such as approximate dynamic programming). We assume that we are given static flow patterns that we wish to use to guide the behavior of a dynamic model. Thus, we might like to assign a particular type of driver to long loads 70 percent of the time, but in any one time period we may not be able to meet this target. It is not necessary (and often may not be possible) to match the pattern at any one point in time. The goal is to match it over time. Although our original motivation was to match exogenous patterns to improve user acceptance, there is another use of static patterns which we investigate in this paper. All of our problems are defined over a time horizon and are too hard to solve as a single optimization problem, either because the problem is stochastic or because the problem is simply too large. As a result, we are forced to use some sort of approximation. It is typically the case, however, that we can solve static versions of the same model using commercial solvers. We can view the optimal solution of the static model as an exogenous pattern, and test whether this improves the quality of the solution produced by our dynamic approximation. We propose an algorithm that modifies an existing (typically approximate) algorithm which steps through time, producing results that more closely match a static pattern. We establish the following properties for the algorithm.
1) For the case of continuous, nonreusable resources (resources are consumed in each time period), we introduce a modified model to be solved at each point in time that guarantees that the deviation from a static flow pattern is reduced after each time period.
2) For the case of reusable resources, we introduce an iterative algorithm which adapts to static patterns.
3) We show experimentally that using the optimal solution of a static problem (which is much smaller) as a pattern to guide an approximate solution of a dynamic problem improves overall solution quality.
The organization of the paper is as follows. In section 1 we present the dynamic resource allocation problem which is our motivating application. We present the dynamic resource allocation model in two settings: reusable resources, which arise in the context of fleet management, and nonreusable resources, which arise in the context of production planning. In section 2 we introduce our approach for incorporating static flow patterns in the optimization model. This approach combines a traditional cost function with a proximal term called the pattern metric, which measures the deviation between static flow patterns and the patterns generated from solving the time-staged approximation. The technique is then developed for two major problem classes. The first, presented in section 3, assumes that resources are nonreusable, which is to say that decisions made about resources in one time period do not affect the resources available in the next time period. This special case is easily solved to optimality in a time-staged manner (since each time period is independent), allowing us to focus on the challenge of making decisions over time that match a static pattern. We are able to prove specific convergence results for this problem class. Then, section 4 introduces the problem of reusable resources, where decisions made in one time period need to consider the downstream impact on future time periods. Section 5 describes a specific resource allocation problem as an instance of the more difficult case of reusable resources, for which we want to demonstrate that static flow patterns can improve the solution obtained by approximate policies that are applied over time in a simulation. Experimental results in section 6 show that we can improve the overall solution quality when we introduce static flow patterns. We present our conclusions in section 7.
1 The Dynamic Resource Allocation Problem

We begin by presenting a model of a resource allocation problem where the resources are reusable. Our work is motivated by problems in freight transportation which involve the management of vehicles (aircraft, tractors, trailers, box cars, containers) which have to be moved over space and time. After finishing a move, the vehicle becomes empty and available to be assigned to a new load of freight or to be repositioned (empty) at another location. To model this problem, we use the following notation. Our problem is modeled in discrete time over the set T = {1, ..., T}. Resources are modeled using:

a = vector of attributes of a resource.
A = attribute space of a.
R_ta = number of resources with attribute vector a at time t.
R_t = (R_ta)_{a ∈ A}, known as the resource state vector.

Decisions and costs are given by:

D_a = set of decisions that can be applied to resources with attribute vector a.
x_tad = number of resources with attribute vector a acted on by decision d ∈ D_a at time t.
x_t = (x_tad)_{a ∈ A, d ∈ D_a}.
c_tad = cost of making decision d ∈ D_a on resources with attribute vector a at time t.
c_t = (c_tad)_{a ∈ A, d ∈ D_a}.

The optimization problem over a finite horizon is written as

min Σ_{t ∈ T} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad    (1)

subject to, for all t ∈ T,

A_t x_t − R_t = 0,    (2)
B_t x_t − R_{t+1} = 0,    (3)
x_t ≥ 0.    (4)

The problem in (1) can be hard to solve because of complexities such as uncertainty, integrality constraints, time windows on tasks and a high level of detail in defining actual operations.
It is common to solve time-staged problems such as (1) using techniques that step through time. Let:

X^π_t(R_t) = vector of decisions returned by a policy π ∈ Π given the resource state R_t.

There are several classes of policies that illustrate this function. A myopic policy uses the rule

X^π_t(R_t) = argmin_{x_t ∈ X_t} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad,  t ∈ T,    (5)

where X_t is the feasible set defined by the constraints (2)-(4). A rolling horizon policy would plan events over a planning horizon T^ph < T in the future and is given by

X^π_t(R_t) = argmin_{x_t, ..., x_{t+T^ph}} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad + Σ_{t'=t+1}^{t+T^ph} Σ_{a ∈ A} Σ_{d ∈ D_a} c_t'ad x_t'ad,  t ∈ {1, ..., T − T^ph},

where we optimize over x_t, ..., x_{t+T^ph} but only implement x_t. Finally, we might use a dynamic programming policy

X^π_t(R_t) = argmin_{x_t ∈ X_t} ( Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad + V̄_{t+1}(R_{t+1}) ),    (6)

where V̄_{t+1} is an approximation of the value of being in resource state R_{t+1} = B_t x_t. For simplicity of notation, we have presented our model assuming single-period transformation times, that is, resources which are acted on in period t reappear in period t+1. An important special case arises when resources are not reusable, which we would represent using B_t = 0. Our goal is to obtain flows x_t at a point in time which, when averaged over time, closely match the static flow patterns. In the next section we introduce the basis of our methodology that allows us to make decisions that match the static flow patterns.
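The myopic policy (5) can be sketched in a few lines for the special case where the feasible set X_t reduces to the supply constraint (2), so the linear program decomposes by attribute. The dictionary-based encoding and the names below are illustrative only; with joint capacity constraints a real LP solver would be needed.

```python
def myopic_policy(R_t, costs):
    # One time period of the myopic policy (5).  With only the supply
    # constraint sum_d x[a][d] = R_t[a] active, the LP decomposes by
    # attribute a: put all resources of attribute a on the cheapest
    # decision available to it.
    x_t = {}
    for a, supply in R_t.items():
        d_best = min(costs[a], key=costs[a].get)
        x_t[a] = {d: (supply if d == d_best else 0.0) for d in costs[a]}
    return x_t

# Five drivers, two candidate loads; the cheaper assignment wins everything.
x = myopic_policy({"driver": 5.0}, {"driver": {"load1": 3.0, "load2": 1.5}})
```

This all-or-nothing behavior is exactly what makes a pure cost model drift away from user-specified frequencies, motivating the pattern metric of the next section.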
2 Representation of Static Flow Patterns

We first develop the notation to represent information pertaining to static flow patterns. We assume that exogenous patterns are specified in the form

ρ^s_ad = the fraction of time that resources with attribute a are acted on with decisions of type d.

Thus the vector ρ^s_a = (ρ^s_ad)_{d ∈ D_a} represents the probability mass function of the decisions d acting on the resource attribute vector a. In practice, it is typically the case that attributes (and decisions) are expressed at some level of aggregation, although we do not consider this possibility in this paper. To compare with static flow patterns, we normalize the decisions made by the model over the entire time horizon as shown below:

ρ_ad(x) = ( Σ_{t ∈ T} x_tad ) / ( Σ_{t ∈ T} Σ_{d' ∈ D_a} x_tad' ),  d ∈ D_a, a ∈ A.    (7)

We now present the optimization model in the following form:

argmin [ ( Σ_{t ∈ T} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad ) + θ H(ρ(x), ρ^s) ]    (8)

subject to

A_t x_t = R_t,  x_t ≥ 0,  t ∈ T,    (9)

where H is a penalty function known as the pattern metric that penalizes deviations of the vector ρ(x) = (ρ_ad(x))_{a ∈ A, d ∈ D_a} from the static flow patterns ρ^s = (ρ^s_ad)_{a ∈ A, d ∈ D_a}. This penalty is weighted by a positive scaling factor θ. The formulation in (8) holds true for both reusable and nonreusable resources if we note that in the case of nonreusable resources B_t = 0. In the next paragraphs we derive the functional form of the pattern metric from a goodness-of-fit metric used widely in statistics.

We adopt a quadratic form of the pattern metric in (8) motivated by the popular Pearson goodness-of-fit metric (Read & Cressie (1988), Pearson (1900)). The Pearson goodness-of-fit metric is a popular statistical test of whether a particular sample of data might have been drawn from a hypothesized probability distribution denoted by H_0. Consider observing a random variable which can take one of the possible outcomes in the set {d_i}_{i ∈ I}, where ρ_i is the probability of outcome d_i. The outcomes {d_i}_{i ∈ I} are mutually exclusive and Σ_{i ∈ I} ρ_i = 1. We assume ρ_i > 0 for all i ∈ I. Consider a scenario where we observe N realizations of this random variable. We can summarize our observations using the vector (ρ̂_i)_{i ∈ I} where ρ̂_i denotes the fraction of the sample that is observed with outcome d_i, i ∈ I. We hypothesize a probability vector for the null model using H_0: ρ = (ρ_i)_{i ∈ I} where ρ_i > 0 for all i ∈ I. If the observations are independent and identically distributed, the Pearson goodness-of-fit metric is a chi-squared statistic given by

χ² = Σ_{i ∈ I} (N / ρ_i) (ρ̂_i − ρ_i)².    (10)

The null hypothesis H_0 (that is, the observation of the random variable follows the distribution ρ) is rejected if the Pearson goodness-of-fit metric in (10) exceeds a certain threshold. The Pearson goodness-of-fit metric in its original form has a disadvantage because of the presence of the probabilities in the denominator of the function. This is particularly inconvenient because we do not require the time-staged model to prohibit decisions that do not occur in the static flow pattern. Thus we adopt a simple variant of the Pearson goodness-of-fit metric as our functional form of the pattern metric, giving the model

min_{x_t ∈ X_t, t ∈ T} Σ_{t=1}^T Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad + θ Σ_{a ∈ A} Σ_{d ∈ D_a} R_a ( ( Σ_{t=1}^T x_tad ) / R_a − ρ^s_ad )²,    (11)

where X_t is the feasible region defined by constraints (2)-(4) for time t and R_a = Σ_{t ∈ T} R_ta is the total number of resources with attribute a over the entire horizon.

We first develop our methodology for solving the model with a pattern metric in (11) in a setting with nonreusable resources. In this setting, each time period represents a separate optimization problem with no coupling across time periods. Thus, if we do not consider static flow patterns, we can obtain the overall optimal solution simply by optimizing each time period. Introducing static flow patterns requires that we make decisions which, over time, minimize deviations from the exogenous pattern.

3 Static Flow Patterns with Nonreusable Resources

In this section, we focus on the problem of nonreusable resources, by which we mean that resources in time period t are not carried forward to the next time period. If we did not face the challenge of matching a static flow pattern (which applies to activities over all time periods) we would be able to solve each time period independently. Such models tend to arise in strategic planning settings where the time periods are fairly large. In subsection 3.1 we present an algorithm for the case of continuous resources. Subsection 3.2 proves convergence of the algorithm. Finally, subsection 3.3 shows how to adapt the algorithm for the case of discrete resources.

3.1 The Continuous Case

A dynamic resource allocation model with nonreusable resources is solved as a sequence of models over the set T given by

x̄_t = argmin_{x_t ∈ X_t} Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad,  t ∈ T.    (12)

Our goal is to develop a methodology that solves the model in (11) in a time-staged manner compatible with the techniques introduced in section 1. We let the optimal solution of our objective function with the pattern metric be

x*_t(θ) = argmin_{x_t ∈ X_t} [ ( Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad ) + θ H_t(x_t) ],  t ∈ T,    (13)

where H_t is a function whose specific form we derive using the pattern metric H(ρ(x), ρ^s) later in this subsection. Thus, x*_t(θ) is our solution with the pattern metric while x̄_t = x*_t(0) is the solution obtained using only the cost function. With the application of the policy in (13) in the case of nonreusable resources we can show that

H(ρ(x*(θ)), ρ^s) ≤ H(ρ(x̄), ρ^s),  θ > 0,    (14)

where x*(θ) = (x*_t(θ))_{t ∈ T} and x̄ = (x̄_t)_{t ∈ T}. The rest of this subsection is devoted to deriving the functional form of H_t. The normalized decision variables over the entire time horizon are given by

ρ_ad(x) = ( Σ_{t=1}^T x_tad ) / R_a,  a ∈ A, d ∈ D_a.    (15)

We suppress the dependence of x on θ to simplify notation. The pattern metric proposed in (11) is given by

H(ρ(x), ρ^s) = Σ_{a ∈ A} Σ_{d ∈ D_a} R_a (ρ_ad(x) − ρ^s_ad)².    (16)

We can define the normalized decision variables specific to a stage t using

ρ_tad = x_tad / R_ta,  a ∈ A, d ∈ D_a, t ∈ {1, ..., T}.

Analogous to the decision variable x_tad we define

x̄_tad = number of resources with attribute vector a acted on by decision d ∈ D_a at time t in the optimal solution of (12).
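The normalization (15) and the pattern metric (16) can be computed directly from a set of model flows. The following sketch assumes the R_a-weighted quadratic form consistent with (11); the flow data and dictionary encoding are invented for the example.

```python
def pattern_metric(flows, rho_s):
    # Pattern metric (16): H = sum_a R_a * sum_d (rho_ad - rho_s[a][d])**2,
    # where rho_ad normalizes flows over the whole horizon as in (15) and
    # R_a = sum_t sum_d flows[t][a][d] is the total flow out of attribute a.
    H = 0.0
    for a in rho_s:
        R_a = sum(x_t[a][d] for x_t in flows for d in x_t[a])
        for d in rho_s[a]:
            rho_ad = sum(x_t[a][d] for x_t in flows) / R_a
            H += R_a * (rho_ad - rho_s[a][d]) ** 2
    return H

# Two time periods; over the horizon d1 and d2 each carry half the flow.
flows = [{"a1": {"d1": 3.0, "d2": 1.0}}, {"a1": {"d1": 1.0, "d2": 3.0}}]
H_match = pattern_metric(flows, {"a1": {"d1": 0.5, "d2": 0.5}})  # pattern met
H_off = pattern_metric(flows, {"a1": {"d1": 1.0, "d2": 0.0}})    # pattern violated
```

Note that the metric is zero even though neither single period has a 50/50 split; only the frequencies aggregated over the horizon matter, which is exactly the sense in which the patterns are static.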
Using the same notation for ρ̄ we let

ρ̄_tad = x̄_tad / R_ta,  a ∈ A, d ∈ D_a, t ∈ {1, ..., T}.    (17)

Similar to the expression in (15) we define the normalized solution to the problem in (12) using

ρ̄_ad = ( Σ_{t=1}^T x̄_tad ) / R_a,  a ∈ A, d ∈ D_a,

which we may rewrite as

ρ̄_ad = Σ_{t=1}^T (R_ta / R_a) ρ̄_tad,    (18)

where the last step uses the substitution in equation (17). We denote the gradient of H(ρ(x), ρ^s) with respect to the normalized decision variable ρ_ad at the value ρ̄_ad as h_ad, which is found using

h_ad = ∂H/∂ρ_ad |_{ρ_ad = ρ̄_ad} = 2 R_a (ρ̄_ad − ρ^s_ad),  a ∈ A, d ∈ D_a.    (19)

Using equation (18) and the relation Σ_{t=1}^T R_ta = R_a we can rewrite equation (19) as

h_ad = 2 R_a ( Σ_{t=1}^T (R_ta / R_a) ρ̄_tad − ρ^s_ad ),  a ∈ A, d ∈ D_a.    (20)

When we solve a subproblem at time t using equation (13) we have already obtained the solution vectors x_t' for all 1 ≤ t' < t. Our static flow pattern may be telling us to send 30 percent of a particular type of vehicle to a particular location, whereas if we look at the time periods before t, we may be doing this only 20 percent of the time. This information could be used as we progress through time to help us match the static flow pattern, but is ignored in the expression for the gradient of the pattern metric in (20). We incorporate information regarding prior decisions by adopting a Gauss-Seidel strategy (see Strang (1988), p. 381). We first define

ρ^GS_tad = Σ_{t'=1}^{t−1} (x_t'ad / R_a)  [term I]  +  Σ_{t'=t}^{T} (x̄_t'ad / R_a)  [term II],  a ∈ A, d ∈ D_a, t ∈ {1, ..., T},    (21)

where term I uses the decisions already made in periods before t and term II uses the reference solution from (12). The Gauss-Seidel gradient of the pattern metric is given by

h^GS_tad = 2 R_a (ρ^GS_tad − ρ^s_ad),  a ∈ A, d ∈ D_a, t ∈ {1, ..., T}.    (22)

The pattern metric itself can be calculated at the beginning of every subproblem using

H^GS_{t−1} = Σ_{a ∈ A} Σ_{d ∈ D_a} R_a (ρ^GS_tad − ρ^s_ad)²,  t ∈ {1, ..., T+1}.

Note that H^GS_0 is simply the pattern metric that evaluates the optimal solution x̄ of model (12) and H^GS_T is the pattern metric that evaluates the solution x* of model (13), which incorporates the static flow patterns.

3.2 Convergence Results

We establish two useful results. The first shows that the Gauss-Seidel version of the algorithm monotonically improves the pattern metric as we step forward in time during a single iteration. We then establish overall convergence of the algorithm. The following theorem establishes monotonic improvement of the pattern metric within an iteration:

Theorem 1 For all t, t ∈ {1, ..., T}, if we solve the following quadratic programming problem:

x*_t(θ) = argmin Σ_{a ∈ A} Σ_{d ∈ D_a} ( c_tad x_tad + (θ / R_a) ∫_0^{x_tad} [ h^GS_tad + 2(u − x̄_tad) ] du )    (23)

subject to

A_t x_t = R_t,  x_t ≥ 0,

then we obtain

H^GS_T ≤ H^GS_{T−1} ≤ ... ≤ H^GS_0.    (24)

Thus, the pattern metric evaluated after solving each subproblem in (23) forms a monotonically decreasing sequence in time t. Consequently the function H_t(x_t) that we adopt in the formulation given in (13) is given by:

H_t(x_t) = Σ_{a ∈ A} Σ_{d ∈ D_a} (1 / R_a) ∫_0^{x_tad} [ h^GS_tad + 2(u − x̄_tad) ] du.

Proof: See appendix.

Theorem 1 proves the expression in (14), thus validating our approach of solving the time-staged sequence of models stated in (13). We next show that the decisions produced by equation (23) produce the optimal solution to the objective function given in equation (11). The proof of convergence is obtained by showing that the model in (23) is identical to solving the model in (11) using an iterative method known as the block coordinate descent (BCD) method. The proof uses existing convergence results for this class of algorithms. The block coordinate descent method is a popular technique for minimizing a real-valued continuously differentiable function f of m real variables subject to upper bounding constraints. In this method the coordinates of f are partitioned into M blocks and at each iteration, f is minimized with respect to one of the coordinate blocks while the other coordinates are held fixed. This method is closely related to Gauss-Seidel methods for equation solving (Ortega & Rheinboldt (1970) and Warga (1963)). Convergence of the block coordinate descent method typically requires that f be strictly convex, differentiable and, taking into account the bounded constraints, has bounded level sets (Sargent & Sebastian (1973), Warga (1963)).
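The forward pass of Theorem 1 can be illustrated numerically. The sketch below is a constructed special case, not the paper's implementation: a single attribute with two decisions, so the supply constraint x0 + x1 = R[t] leaves one free variable and the quadratic subproblem (23) has a closed-form clipped solution obtained from its first-order condition. All data and names are invented for the example.

```python
def gauss_seidel_pass(c, R, rho_s, x_bar, theta):
    # One forward pass of (23): single attribute, two decisions.
    R_a = sum(R)
    T = len(R)
    x = [list(xt) for xt in x_bar]   # periods >= t still hold the reference flows
    H = []                           # pattern metric after each subproblem
    for t in range(T):
        # Gauss-Seidel frequencies (21): decisions already made for t' < t,
        # reference flows x_bar for t' >= t (x is updated in place).
        rho_gs = [sum(x[s][d] for s in range(T)) / R_a for d in (0, 1)]
        h = [2.0 * R_a * (rho_gs[d] - rho_s[d]) for d in (0, 1)]  # gradient (22)
        # Stationary point of c0*x0 + c1*(R[t]-x0) plus the integral penalty
        # terms of (23), clipped to the feasible interval [0, R[t]].
        x0 = (-(c[0] - c[1]) * R_a / theta - (h[0] - h[1])
              + 2.0 * x_bar[t][0] - 2.0 * x_bar[t][1] + 2.0 * R[t]) / 4.0
        x0 = min(max(x0, 0.0), R[t])
        x[t] = [x0, R[t] - x0]
        rho = [sum(x[s][d] for s in range(T)) / R_a for d in (0, 1)]
        H.append(R_a * sum((rho[d] - rho_s[d]) ** 2 for d in (0, 1)))
    return x, H

# Costs favor decision 0; the pattern asks for a 50/50 split over the horizon.
x_bar = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0]]   # pure cost-minimizing flows (12)
x, H = gauss_seidel_pass([1.0, 1.2], [4.0, 4.0, 4.0], [0.5, 0.5], x_bar, theta=5.0)
```

Starting from the all-on-decision-0 reference solution (whose pattern metric is 6.0 here), each subproblem shifts some flow to the more expensive decision and the recorded metric decreases monotonically in t, as (24) guarantees.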
We formally describe the BCD algorithm below using the notation developed in Tseng (2000):

Initialization. Choose any x^0 = (x^0_1, ..., x^0_M) ∈ X.

Iteration n, n ≥ 1. Given x^{n−1} = (x^{n−1}_1, ..., x^{n−1}_M) ∈ X, choose an index s ∈ {1, ..., M} and compute a new iterate x^n = (x^n_1, ..., x^n_M) ∈ X satisfying

x^n_s = argmin_{x_s} f(x^{n−1}_1, ..., x^{n−1}_{s−1}, x_s, x^{n−1}_{s+1}, ..., x^{n−1}_M),    (25)
x^n_j = x^{n−1}_j,  j ≠ s, j ∈ {1, ..., M}.

The minimization in (25) is attained if the set {x : f(x) ≤ f(x^0)} is bounded and f is lower semicontinuous on this compact set (Rockafellar (1972)). To ensure convergence of the algorithm it is further required that each coordinate block is chosen sufficiently often in the method. One of the most commonly used methods to achieve this is the cyclic rule. According to the cyclic rule there exists a constant M̄ ≥ M such that every index j ∈ {1, ..., M} is chosen at least once between the n-th iteration and the (n + M̄ − 1)-th iteration. A well-known case of this rule is when M̄ = M, according to which an index s is set to k ∈ {1, ..., M} at iterations k, k + M, k + 2M, ....

It is obvious why the BCD method is attractive for solving the model with a pattern metric given in (11) in the case where the resources are nonreusable. The number of blocks is equal to the number of time periods T. By fixing the values for T − 1 blocks at any iteration we only need to optimize over the decision variables representing one time period, say index t. In the case where the resources are nonreusable the advantage of the BCD method is realized because we can optimize over the feasible region X_t ignoring all other constraints. This is exactly what we exploited in developing our algorithm in (23). It should be noted that we do not require the initial solution x^0 to be the optimal solution of the optimization model solved without the pattern metric. We used this as our initial solution in theorem 1 only to validate our approach in capturing information in an optimization model. If we adopt the cyclic rule in the BCD methodology applied to our optimization model in (11), then at any iteration n ≥ 1 the time period (block) t that we minimize over is given by

t = n − ⌊(n−1)/T⌋ T,

where ⌊x⌋ denotes the greatest integer less than or equal to x. The key to understanding the connection between the BCD methodology and our problem is that a subproblem solved at time period t is an iteration of the BCD methodology. The Gauss-Seidel gradient of the pattern metric given in (22) after iteration n can be expressed by

h^{GS,n}_tad = 2 R_a ( ( Σ_{t'=1}^T x^{*,n}_t'ad ) / R_a − ρ^s_ad ),  n ≥ 1, a ∈ A, d ∈ D_a, t ∈ T.    (26)

We compute x^{*,n}_t as follows. If t = n − ⌊(n−1)/T⌋ T, then x^{*,n}_t = argmin_{x ∈ X_t} f^n(x) where

f^n(x) = Σ_{a ∈ A} Σ_{d ∈ D_a} ( c_tad x_tad + (θ / R_a) ∫_0^{x_tad} [ h^{GS,n−1}_tad + 2(u − x^{*,n−1}_tad) ] du ).    (27)

Otherwise, we simply set x^{*,n}_t = x^{*,n−1}_t. Any feasible solution in X can be used to initialize x^{*,0}, so we may use x^{*,0}_t = x̄_t.

A direct application of the BCD methodology suggests the following procedure. At iteration n ≥ 1, if t = n − ⌊(n−1)/T⌋ T, then x^{*,n}_t = argmin_{x ∈ X_t} f^{BCD,n}(x) where

f^{BCD,n}(x) = Σ_{t'=1, t'≠t}^T Σ_{a ∈ A} Σ_{d ∈ D_a} c_t'ad x^{*,n−1}_t'ad + Σ_{a ∈ A} Σ_{d ∈ D_a} c_tad x_tad + θ Σ_{a ∈ A} Σ_{d ∈ D_a} R_a ( ( Σ_{t'=1, t'≠t}^T x^{*,n−1}_t'ad + x_tad ) / R_a − ρ^s_ad )².    (28)

Otherwise, we simply set x^{*,n}_t = x^{*,n−1}_t.

We conclude this subsection by showing that our methodology in (23) is a provably convergent algorithm for solving the optimization model with a pattern metric. We first show that the application of the BCD method to the optimization model with a pattern metric given in (11) and our methodology in (23) are exactly the same, that is, we prove the following:

Theorem 2 The minimizers of f^{BCD,n} given in (28) and f^n given in (27) are identical, that is:

argmin_{x ∈ X} f^{BCD,n}(x) = argmin_{x ∈ X} f^n(x),  n ≥ 1, t = n − ⌊(n−1)/T⌋ T.

Proof: The proof is provided in the appendix.

The proof of convergence follows directly from the properties of the optimization model with a pattern metric. In Warga (1963) it is shown that the application of the BCD methodology to a convex function does converge to the optimal solution if the following statements are true:

- The optimization model with a pattern metric given by equation (11) is continuously differentiable in some neighborhood (relative to X = X_1 × ... × X_T) at every stationary point of this function.
- For every t, t = 1, ..., T, f^{BCD,n}(x_t) (or f^n(x_t)) is a strictly convex function of x_t for all iterations n ≥ 1.
- X is compact.

The first condition holds since the model with a pattern metric is differentiable everywhere. The second condition of strict convexity also holds because of the quadratic form of the pattern metric (see appendix). The feasible region X is compact in most real-world applications.

3.3 The Discrete Case

Many operational problems are characterized by integrality constraints on the decision variables, as is indicated by the wide application of integer resource allocation problems. Such applications arise in airline fleet assignment (Barnhart et al. (2000), Hane et al. (1995)), air traffic control (Bertsimas & Patterson (2000)), railcar management (Holmberg et al. (1998), Jordan & Turnquist (1983)), container distribution (Crainic et al. (1993)) and general fleet management (Powell & Carvalho (1997)). In this subsection we see how we can approximate the model in (23) to generate integer solutions. Moreover we see that we can solve the resulting problem as a network if the original structure of the problem (that is, the cost function without the pattern metric) is a network. There is a literature on solving quadratic cost functions and more general convex cost problems as network flow problems. Minoux (1984) developed a polynomial-time algorithm for obtaining a real-valued optimal solution of a quadratic form of the objective function similar to the model objective in (23). It is further shown in Minoux (1986) that this method can be used to obtain an integer optimal solution to the general convex flow problem. We use a method (see Ahuja et al. (1993)) that approximates a quadratic function using a piecewise linear model. We then show that this formulation can be solved as a network, and use the well-known fact that solving a network with integer data as a linear program yields integer solutions.

The objective function in (23) can be expressed as Σ_{a ∈ A} Σ_{d ∈ D_a} C_tad(x_tad) where

C_tad(x_tad) = c_tad x_tad + (θ / R_a) ∫_0^{x_tad} [ h^GS_tad + 2(u − x̄_tad) ] du.

Since x_tad cannot exceed the number of resources with attribute a at time t, denoted by R_ta, we can approximate C_tad(x_tad) by at most R̄_ta = ⌊R_ta⌋ + I_{{R_ta − ⌊R_ta⌋ > 0}} linear segments. The set {0, 1, ..., R̄_ta} denotes the breakpoints of the piecewise linear approximation. The linear cost coefficient on any interval [u − 1, u], u ∈ {1, ..., R̄_ta}, is obtained by taking the gradient of C_tad(x_tad) evaluated at x_tad = u, which is given by c_tad + (θ / R_a) [ h^GS_tad + 2(u − x̄_tad) ]. Let Σ_{u=1}^{R̄_ta} y^u_tad = x_tad where 0 ≤ y^u_tad ≤ 1, u ∈ {1, ..., R̄_ta}.
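The per-unit arc costs of the piecewise-linear model can be generated directly from the gradient formula above. The helper below is a small sketch with invented data; it assumes the coefficient form c_tad + (θ/R_a)[h^GS_tad + 2(u − x̄_tad)] described in the text.

```python
def unit_arc_costs(c_tad, h_gs, x_bar_tad, R_a, R_ta_bar, theta):
    # Cost of the u-th unit arc for pattern (a, d): the gradient of C_tad
    # evaluated at u, for u = 1, ..., R_ta_bar (the breakpoint count).
    return [c_tad + (theta / R_a) * (h_gs + 2.0 * (u - x_bar_tad))
            for u in range(1, R_ta_bar + 1)]

# Negative h_gs (the pattern wants more flow on this decision) discounts
# the first units; convexity makes the unit costs increase with u.
arcs = unit_arc_costs(c_tad=1.0, h_gs=-3.0, x_bar_tad=2.0,
                      R_a=10.0, R_ta_bar=4, theta=2.0)
```

Because C_tad is convex, the unit costs are increasing in u, so a min-cost flow naturally fills the arcs y^1, y^2, ... in order, which is exactly what the flow decomposition Σ_u y^u_tad = x_tad requires.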
Using the piecewise linear approximation of C (x ) we can represent the quadratic formulation in (23) using yt (θ) = arg min y t Y t R ta a A u=1 ( c + θ [ h + 2(u x ) ]) y u. (29) Y t is the feasible region obtained from transforming the feasible region X using the equations 17
20 Rta u=1 yu = x and the constraints 0 y u 1, u {1,..., R ta }. If the feasible region X t for all t T defines network flow constraints, we see that the formulation in (29) retains the network structure. In the presence of integer data this formulation yields integer solutions. The disadvantage of the network formulation in (29) is that we need to replace a single arc representing the pattern (a, d) with multiple arcs each of whose upper bound is one unit of flow and the cardinality of the number of arcs associated with a particular pattern (a, d) is given by R ta. A simpler version of our piecewise linear approximation is simply to use a linear approximation as shown below: x t (θ) = arg min x t X t a A (c + θra h ) x. (30) In the next section we extend the algorithm developed in this section to the reusable resource case. 4 Extension to Reusable Resources When time periods are relatively short, the decisions to act on resources in one time period impact the resources available in a later time period. In this case, the time periods are coupled. A natural algorithmic strategy is to use approximate dynamic programming methods. Decisions made in time period t can capture the impact on time period t + 1 by using an approximate value function V t+1 (R t+1 ) where R t+1 = B t x t, as presented in equation (6). When we allocate resources, our decisions (x ) a A,d D must be chosen subject to the resource constraint: R n ta = x n a A, t T. In the case of reusable resources, the resource vector R t = (R ta ) a A, for t 1, depends on decisions made in earlier time periods. We let V t (R t ) be the function that describes (at least 18
21 approximately) the optimal value of having R t resources at the beginning of time period t for the remainder of the horizon. An outline of the basic algorithm is given in figure 1. We use U V to denote an updating function that updates the value function approximations for the resource state Rt n = {Rta} n a A, t T at every iteration n 1. Examples of such approximations for resource allocation problems can be found in Powell et al. (2002), Godfrey & Powell (2001) and Godfrey & Powell (2002). A general treatment of approximate dynamic programming methods can be found in Bertsekas & Tsitsiklis (1996) and Si et al. (2004). In practice, these methods do not produce optimal solutions for most problems, and as a result we lose our ability to prove overall convergence of the algorithm. However, we can show that our pattern matching algorithm improves our ability to match an exogenously specified pattern. In addition, we can show experimentally that we can improve overall solution quality when the exogenous pattern is based on solving a static model to optimality. Step 0 Initialization: Set iteration counter n = 1. Choose an approximation V 0 t (.) for V t (.), t T. Step 1 Forward Pass: Step 1.0 Initialize forward pass: Initialize R 1 1. Set t = 1. Step 1.1 Solve subproblem: For time period t solve equation (6) to get solution vector x n t. Step 1.2 Apply system dynamics to update resource attributes after transformation. Step 1.3 Advance time t = t + 1: If t T go to Step 1.1. Step 2 Value function update: Set V t n (.) U V n 1 ( V t (.), Rt n ), t T. Step 3 Advance iteration counter: Stop if convergence is satisfied. If not set n = n + 1 and go to Step 1. Figure 1: Value iteration methodology for dynamic resource allocation problems with reusable resources. In this section we show how we can apply the optimization model introduced in subsection 3.3 to the iterative setting represented by figure 1. We define the normalized model flows as 19
shown below:

\rho^n_{ad}(x) = \frac{\sum_{t \in T} x^n_{tad}}{R^n_a},   a \in A, d \in D_a.

Because the model decision variables change with each iteration, we must define the pattern metric given by equation (16) at every iteration as a function of the normalized model flows obtained from the previous iteration. We denote the pattern metric at the beginning of iteration n by

H^{n-1} = H(\rho^{n-1}(x), \rho^s, R^{n-1}),

where H is given by the expression in equation (16). Note that we use the additional argument R^{n-1} in denoting the pattern metric H^{n-1} to account for the fact that, when we have reusable resources, the number of resources with attribute a varies across iterations. We let

R^n_a = \sum_{t \in T} R^n_{ta},   a \in A.

We assume we have the initialized values \rho^0 and R^0. We denote the gradient of H^{n-1} with respect to the normalized decision variable \rho_{ad}, evaluated at \rho^{n-1}_{ad}, by

h^n_{ad} = \left. \frac{\partial H^{n-1}}{\partial \rho_{ad}} \right|_{\rho_{ad} = \rho^{n-1}_{ad}},   a \in A, d \in D_a,

from which we obtain

h^n_{ad} = 2 R^{n-1}_a (\rho^{n-1}_{ad} - \rho^s_{ad}),   a \in A, d \in D_a.

The Gauss-Seidel variant of the gradient of the pattern metric, denoted by \bar{h}^{n,t}_{ad}, is given by

\bar{h}^{n,t}_{ad} = 2 R^{n-1}_a (\bar{\rho}^{n,t}_{ad} - \rho^s_{ad}),   a \in A, d \in D_a, t \in \{1, \ldots, T\},   (31)

where, as before, we define

\bar{\rho}^{n,t}_{ad} = \frac{\sum_{t'=1}^{t-1} x^n_{t'ad} + \sum_{t'=t}^{T} x^{n-1}_{t'ad}}{R^{n-1}_a},   a \in A, d \in D_a, t \in \{1, \ldots, T\}.
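The Gauss-Seidel update in equation (31) can be sketched as follows; this is a minimal illustration with hypothetical names, assuming a single pattern (a, d) held as lists of per-period flows:

```python
# Sketch of the Gauss-Seidel gradient of equation (31): flows for periods
# before t are taken from the current iteration n, flows for t onward from
# the previous iteration n-1. Names are illustrative, not the paper's code.

def gs_gradient(t, x_curr, x_prev, rho_s, R_prev):
    """x_curr[t'-1], x_prev[t'-1]: flow on one pattern (a, d) in period t'
    (lists of length T); rho_s: static pattern value; R_prev: R_a^{n-1}."""
    T = len(x_prev)
    mixed = sum(x_curr[tp - 1] for tp in range(1, t)) \
          + sum(x_prev[tp - 1] for tp in range(t, T + 1))
    rho_gs = mixed / R_prev            # the mixed normalized flow
    return 2.0 * R_prev * (rho_gs - rho_s)

# When the current and previous iterations coincide, the mixed estimate
# reduces to the plain normalized flow, so the gradient equals
# h = 2 R (rho - rho_s).
g = gs_gradient(t=2, x_curr=[1.0, 0.0, 1.0], x_prev=[1.0, 0.0, 1.0],
                rho_s=0.25, R_prev=8.0)
assert abs(g - 2.0 * 8.0 * (2.0 / 8.0 - 0.25)) < 1e-12
```

Because periods before t already reflect iteration n, the gradient can react within a single forward pass rather than waiting for the next iteration, which is the source of the fast convergence reported in section 6.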
Note that the Gauss-Seidel gradient of the pattern metric is indexed by n because it reflects the use of decision variables from prior time periods obtained at iteration n. Within the approximate dynamic programming technique proposed to solve this problem, we adopt a linear value function approximation

\bar{V}_t(R_t) = \sum_{a \in A} \bar{v}^n_{ta} R_{ta},

where \bar{v}^n_{ta} is an approximation of the marginal value of resources of type a at time t. Let the attribute transition function be defined using

a^M(a, d) = the attribute of a resource produced by acting on a resource with attribute a using decision d.

The slope \bar{v}^n_{t+1, a^M(a,d)} represents the future (marginal) value at time t+1 of a decision d acting at time t on a resource with attribute vector a. If we use a linear value function approximation, then our subproblem at time t becomes

\min_{y^n_t \in Y^n_t} \sum_{a \in A} \sum_{d \in D_a} \sum_{u=1}^{R^n_{ta}} \left( c_{tad} + \bar{v}^{n-1}_{t+1, a^M(a,d)} + \theta \left[ \frac{\bar{h}^{n,t}_{ad}}{R^{n-1}_a} + 2(u - x^{n-1}_{ad}) \right] \right) y^{u,n}_{tad}.   (32)

Y^n_t denotes the feasible region at iteration n and time t. We use the notation y^{u,n}_{tad} to indicate the iteration-specific dependence of the flow decomposition variables. We obtain a myopic policy by simply setting \bar{v}^{n-1}_{t+1,a} = 0, giving us

\min_{y^n_t \in Y^n_t} \sum_{a \in A} \sum_{d \in D_a} \sum_{u=1}^{R^n_{ta}} \left( c_{tad} + \theta \left[ \frac{\bar{h}^{n,t}_{ad}}{R^{n-1}_a} + 2(u - x^{n-1}_{ad}) \right] \right) y^{u,n}_{tad}.   (33)

We can obtain an estimate of v^n_{ta} by letting \hat{v}^n_{ta} be the dual variable of the supply constraint (equation (2)) for resource attribute a in the subproblem solved at time t in iteration n. Since these duals fluctuate randomly (even for deterministic problems), we update our estimates \bar{v}^n_{ta} using

\bar{v}^n_{ta} = (1 - \alpha_n) \bar{v}^{n-1}_{ta} + \alpha_n \hat{v}^n_{ta},   a \in A, t \in T,
where \alpha_n \in (0, 1) is a smoothing factor.

5 A Resource Allocation Problem

There are two applications of our pattern-matching logic that we would like to test. First, we wish to demonstrate the degree to which our algorithm can improve our ability to match exogenously specified patterns; this ability improves user acceptance of these complex models. Second, we wish to measure the value of using the optimal solution of a static model to guide the approximate solution of a dynamic model in the more difficult context of reusable resources. To demonstrate the usefulness of the approach, we use as our test setting a problem known as the military airlift problem, which requires managing different types of cargo aircraft over time to move a set of loads ("requirements") within a network of airbases. Cargo aircraft can be moved loaded or repositioned empty. The problem was chosen in part because, while it exhibits the difficult time-staged nature of all of our problems, it is still small enough that we can solve the dynamic version of the model using a commercial solver. This ability allows us to evaluate all of our solutions relative to the optimal solution. We first present the multicommodity flow problem in section 5.1. In section 5.2 we detail the static model that we solve to generate the static flow patterns. Section 5.3 then presents the dynamic model and shows how we formulate the decision to hold a resource until the next time period, a decision that is absent from the static model. The results from the actual experiments are reported in section 6.

5.1 The Multicommodity Flow Problem

Our experimental design is centered around a dynamic, multicommodity flow problem where resources are assigned to tasks that are moved from one location to another. On completion of these tasks, the resources may cover other tasks starting from that location or move empty to a different location to cover other tasks. Typically, tasks have a time window during which they are available for assignment. There is a reward for covering a
task based on the type of resource assigned to it. In addition, there is a cost of moving empty between two locations. The data for our experiment is motivated by the military airlift problem, where a fleet of cargo aircraft is used to move loads of freight over time. We consider five types of aircraft and five types of tasks. We conducted experiments with five sets of data. Each dataset is characterized by a label L-A(#)-T(#)-TP, where L denotes the number of locations, A the number of aircraft, T the number of tasks and TP the number of time periods. For the same number of aircraft we have different data sets characterizing the attributes of the aircraft, and this difference is indicated by the counter A(#) for aircraft (we use T(#) for tasks). For example, 20-200(1)-2000(1)-30 indicates an experiment that has 20 locations, 200 aircraft characterized by dataset 1, 2000 tasks characterized by dataset 1 and 30 time periods. Each task is characterized by an origin, a destination and a type. A negative cost (a reward) is generated for covering a task, and this reward is a function of the type of the task and the type of resource assigned to the task. Each task is associated with a value specified in dollars, and the reward for covering this task with a resource is based on a compatibility matrix of dimensionality 5 x 5 that indicates the fraction of the reward received when covering a particular task type with a specific resource type. The compatibility matrix for our experiment is shown in table 1.

Table 1: Compatibility matrix.

There is an empty cost in dollars per mile associated with moving empty from one location to another; the empty cost is the same for all resource types. The data set is generated so that the number of demands going out of a location is negatively correlated with the number of demands going into that location at a certain time period. This results in more empty repositioning moves and more temporal flow imbalances.
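The reward computation described above can be sketched as follows; the 2 x 2 matrix and the task and aircraft type names are invented for illustration (the experiment uses the 5 x 5 matrix of table 1):

```python
# Covering a task generates a negative cost (a reward): the task's dollar
# value scaled by the compatibility entry for its (task type, aircraft
# type) pair. These matrix entries are hypothetical placeholders.

COMPAT = {('bulk', 'type1'): 1.0, ('bulk', 'type2'): 0.8,
          ('oversize', 'type1'): 0.6, ('oversize', 'type2'): 1.0}

def cover_reward(task_value, task_type, aircraft_type):
    # Negative cost = reward in the min-cost formulation.
    return -task_value * COMPAT[(task_type, aircraft_type)]

assert cover_reward(1000.0, 'oversize', 'type2') == -1000.0
```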
The resource attribute vector a is given by

a = {location, aircraft-type}.   (34)

We denote the set of locations in the network by J, and let a_location be the location attribute of the resource attribute vector a. For any location j \in J we define L(j) as the set of tasks whose origin is j. A task expires from the system if it has not been assigned at the time it is available for assignment; that is, we do not assume time windows on tasks in the dynamic model. There is no reward generated for expired tasks. The decision set for a resource with attribute a at time t is given by

D_a = \{move assigned with task l \in L(a_location)\} \cup \{move empty to location j \in J\},   a \in A.

We let D^e_a be the set of decisions to move a vehicle empty.

5.2 The Static Model

We solve the static model characterizing the resource allocation problem presented in the subsection above as a flow-balancing network model. To denote the transformation of resources in the static model we define the following indicator variable:

\delta_{a'}(a, d) = 1 if decision d \in D_a transforms the resource with attribute vector a \in A to the state a' \in A, and 0 otherwise.

The static flow-balancing network model is given by

x^s = \arg\min_{x \in X} \sum_{a \in A} \sum_{d \in D_a} c_{ad} x_{ad}   (35)

where we let c_{ad} be the unit cost of transforming a resource with attribute vector a \in A
using a decision d \in D_a. The feasible region X is defined by the constraints

\sum_{d \in D_a} x_{ad} - \sum_{a' \in A} \sum_{d' \in D_{a'}} \delta_a(a', d') x_{a'd'} = 0,   a \in A,
x_{ad} \ge 0,   a \in A, d \in D_a.

The cost vector c consists of negative values (rewards) for covering a task and positive values (costs) for moving empty between locations. We represent the normalized optimal flows of empties from the static model as static flow patterns in the time-staged resource allocation model. The normalized static flow patterns \rho^s_{ad} are derived from the flow of empties using

\rho^s_{ad} = \frac{x^s_{ad}}{\sum_{d' \in D^e_a} x^s_{ad'}},   a \in A, d \in D^e_a.

The static model is able to globally balance flows over the entire network. As such, it captures network-level patterns that may be missed by approximate models that step forward through time. The experimental challenge is to measure the size of this benefit.

5.3 The Dynamic Model

The objective function for the time-staged model is given by

\max_{x \in X} \sum_{t \in T} \sum_{a \in A} \sum_{d \in D_a} c_{tad} x_{tad}.

In our dynamic model, the cost vector has to consider the timing of activities; thus, a load that is moved late is assessed a service penalty. A problem we face in using flows from a static model to guide a dynamic model is that the static model does not provide any guidance as to how much flow should be held at a location (the hold option) in a given time period. Let \beta^n_a \in [0, 1] be an estimate of the fraction of hold flows for resources with attribute vector a at iteration n. We use the total number of empties from the static model to derive the scaling factor \beta^n_a as shown:

\beta^n_a = \max\left( 0, 1 - \frac{\sum_{d \in D^e_a} x^s_{ad}}{\sum_{d \in D^e_a} \sum_{t \in T} x^{n-1}_{tad}} \right),
where \sum_{d \in D^e_a} \sum_{t \in T} x^{n-1}_{tad} is the total number of resources with attribute vector a in the flows of empties and hold decisions from the previous iteration. Instead of using an iteration-independent \rho^s_{ad} to represent static flow patterns at every iteration, we use

\rho^{s,n}_{ad} = \beta^n_a if d is the hold decision, and \rho^{s,n}_{ad} = (1 - \beta^n_a) \rho^s_{ad} if d moves the resource to another location.

Thus, we are employing a user-defined parameter to specify the fraction of vehicles that are held at a location, and then factoring down the movements to other locations so that the pattern still sums to one. The new vector of probabilities \{\rho^{s,n}_{ad}\}_{d \in D^e_a} satisfies the following condition at every iteration n:

\sum_{d \in D^e_a} \rho^{s,n}_{ad} = 1,   a \in A.

The new pattern metric at the end of every iteration n is given by

H^n(\rho^n(x), \rho^{s,n}, R^n) = \sum_{a \in A} \left( \sum_{t=1}^{T} R^n_{ta} \right) \sum_{d \in D^e_a} (\rho^n_{ad} - \rho^{s,n}_{ad})^2   (36)

where we use the compact notation \rho^{s,n} = \{\rho^{s,n}_{ad}\}_{a \in A, d \in D^e_a}. A summary of the algorithm we use to incorporate the pattern logic in a dynamic model is given in figure 2.

6 Experimental Results

We have three questions we wish to answer experimentally: 1) How quickly does the algorithm converge? 2) How well does the algorithm match exogenous patterns for problems with reusable resources? 3) If the exogenous pattern is the optimal solution to a static problem, how much does this improve the solution when we are using an approximate algorithm (for problems with reusable resources)?
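The hold-scaling step and the pattern metric of equation (36) can be sketched for a single attribute a as follows; the clamping of the hold fraction to [0, 1] and all numerical values are illustrative assumptions:

```python
# Sketch of the adjusted pattern rho^{s,n} and the metric H^n of equation
# (36) for one attribute a: the hold decision receives probability beta,
# the static pattern over moves is scaled by (1 - beta), and squared
# deviations are weighted by the fleet size sum_t R_ta.

def adjusted_pattern(rho_s_moves, static_empties, dynamic_empties):
    # beta: estimated hold fraction, clamped so it stays a probability
    beta = min(1.0, max(0.0, 1.0 - static_empties / dynamic_empties))
    pattern = {d: (1.0 - beta) * p for d, p in rho_s_moves.items()}
    pattern['hold'] = beta
    return pattern

def pattern_metric(rho, rho_target, fleet_size):
    return fleet_size * sum((rho[d] - rho_target[d]) ** 2 for d in rho_target)

target = adjusted_pattern({'to_VA': 0.7, 'to_MS': 0.3},
                          static_empties=60.0, dynamic_empties=100.0)
assert abs(sum(target.values()) - 1.0) < 1e-12   # the pattern sums to one

model_rho = {'to_VA': 0.5, 'to_MS': 0.1, 'hold': 0.4}
H = pattern_metric(model_rho, target, fleet_size=20.0)
```

Scaling the move probabilities by (1 - beta) is what keeps the adjusted vector a valid probability distribution over the empty and hold decisions.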
Step 0 Initialization: Set iteration counter n = 1. Initialize the following for n = 1:
  \bar{h}^{0,t}_{ad} = 0, t \in T, a \in A, d \in D^e_a;
  R^0_a = \sum_{t \in T} R^0_{ta} = 0, a \in A;
  \rho^{s,0}_{ad} = \rho^s_{ad}, a \in A, d \in D^e_a.
Step 1 Set time t = 1:
  Step 1.0 If n > 1: Derive the network arc costs using the Gauss-Seidel gradient of the pattern metric as in equation (31) and apply smoothing to these costs.
  Step 1.1 Solve the time-staged model with the linear value function approximations indicated in (32), or the myopic policy indicated in (33), for stage t.
  Step 1.2 Increment t = t + 1: If t \le T, go to Step 1.0; else go to Step 2.
Step 2:
  Step 2.0 Calculate aggregate decision variables: x^n_{ad} = \sum_{t \in T} x^n_{tad}, a \in A, d \in D^e_a.
  Step 2.1 Derive: \rho^n_{ad} = x^n_{ad} / \sum_{d' \in D^e_a} x^n_{ad'}, a \in A, d \in D^e_a. If \sum_{d' \in D^e_a} x^n_{ad'} = 0, set \rho^n_{ad} = 1 for the hold decision (the decision d with \delta_a(a, d) = 1) and \rho^n_{ad} = 0 otherwise.
  Step 2.2 Scaling: Derive \rho^{s,n}_{ad}, a \in A, d \in D^e_a, to reflect hold decisions.
  Step 2.3 Derive the pattern metric H^n using (36).
  Step 2.4 Advance iteration counter if convergence is not satisfied: Set n = n + 1 and go to Step 1.

Figure 2: Piecewise-linear version of the algorithm for incorporating static flow patterns in a time-staged resource allocation model.
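Step 1.1 solves the piecewise-linear subproblem of (32)-(33), whose convex penalty decomposes into unit-capacity arcs with nondecreasing marginal costs. A minimal sketch of the per-unit cost computation, reading the cost term as c + theta*(h/R + 2(u - x)) with all names illustrative:

```python
# Sketch of the unit-arc decomposition used by the piecewise-linear
# formulation: the single arc for pattern (a, d) is replaced by R_ta arcs
# of capacity one, the u-th carrying the marginal cost of the u-th unit.

def unit_arc_costs(c_ad, theta, h_bar, R_a, x_prev, R_ta):
    """Marginal cost of the u-th unit of flow, u = 1, ..., R_ta, following
    the term c + theta * (h_bar / R_a + 2 * (u - x_prev))."""
    return [c_ad + theta * (h_bar / R_a + 2.0 * (u - x_prev))
            for u in range(1, R_ta + 1)]

costs = unit_arc_costs(c_ad=-5.0, theta=1.0, h_bar=0.4, R_a=10.0,
                       x_prev=2.0, R_ta=4)

# Marginal costs increase with u, so the penalty is convex; a min-cost
# network flow therefore fills the cheaper unit arcs first, and with
# integer data the solution remains integer.
assert all(costs[u] <= costs[u + 1] for u in range(len(costs) - 1))
```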
Table 2: Flow patterns from the static model: percent of flow moving empty from origin to destination by aircraft type, as produced by the static model. These are the patterns used to guide the dynamic model. The columns give the origin and aircraft type (a), the destination (d), the proportion \rho^s_{ad} and the total flow x^s_{ad}; the origin-destination pairs include FL-34,A to VA; FL-34,A to MS; SC-29,A to VA; CA-95,B to MO; CA-95,B to OR; IA-51,D to AK; UT-84,D to NM; and UT-84,D to MO.

We address these questions using the problem described in section 5. A sample of the patterns representing the flow of empties between locations, obtained from solving the static model for the military airlift problem, is shown in table 2. In our experiments we are able to solve the dynamic resource allocation model exactly to get the optimal solution. Based on experimentation, we found that a scaling factor \theta = 1000 is appropriate when incorporating patterns with a linear value function, while a separately tuned scaling factor is appropriate when incorporating patterns while using a myopic policy, which performs very poorly for this problem class. In our experiments we use \alpha_n = 2/(10+n) as the smoothing factor to update the linear value function approximations. The smoothing factor that we apply to the Gauss-Seidel gradient of the pattern metric is 20/(40+n). We initialize all the smoothed gradients and costs for n = 0 to 0.

6.1 Rate of convergence

We have proven that our algorithm monotonically reduces the pattern metric, even for the case of reusable resources, where we are unable to prove global convergence (since we are using an approximate algorithm to step through time). Unresolved, however, is the rate of convergence. In the introduction, we described a number of projects in which we are using this methodology. We have consistently found that the Gauss-Seidel strategy produces very fast convergence. Figure 3 shows how well we match a historical pattern (normalized to 100) after each
iteration of the algorithm. The model was judged to be acceptable (by a knowledgeable user) if the performance was within the bounds shown in the figure (approximately two percent above and below the target). We found that the Gauss-Seidel algorithm converged closely to this target within three to four iterations. We have used this algorithm in a number of projects, and this performance is typical.

Figure 3: Rate of convergence of the pattern metric (normalized metric versus iterations).

The fast performance is due to the ability of the algorithm to adjust, after each time step, whether it should do more or less of an activity in order to match a target statistic, based on how well we are tracking the goal over the last T time periods (which may include time periods from a previous iteration). If we are using an approximate dynamic programming algorithm, we have to simulate the problem iteratively, and the pattern logic adds only a nominal computational burden. If we were to use a simple myopic policy, which normally requires stepping through the data only once, this logic requires that we repeat the simulation three or four times.

6.2 Matching patterns and improving solution quality

We now report on experiments where we measure both how well the procedure matches exogenous patterns, and the degree to which patterns derived from solving a static model to optimality improve the quality of heuristics used to solve the dynamic problem.
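The smoothing recursions used in these experiments can be sketched as follows, reading the stepsizes reported in section 6 as alpha_n = 2/(10+n) for the value function approximations and 20/(40+n) for the Gauss-Seidel gradients; this is an illustrative sketch only:

```python
# Exponential smoothing with a declining stepsize; with a constant
# observation stream the estimate approaches that constant.

def smooth(prev, new, stepsize):
    return (1.0 - stepsize) * prev + stepsize * new

def run_smoothing(observations, stepsize_fn):
    est = 0.0                         # gradients and costs start at zero
    for n, obs in enumerate(observations, start=1):
        est = smooth(est, obs, stepsize_fn(n))
    return est

value_stepsize = lambda n: 2.0 / (10.0 + n)      # for the value functions
gradient_stepsize = lambda n: 20.0 / (40.0 + n)  # for the Gauss-Seidel gradients

est = run_smoothing([10.0] * 500, value_stepsize)
assert abs(est - 10.0) < 0.1
```

The declining stepsizes damp the random fluctuation of the duals and gradients across iterations while still allowing the estimates to track a stable signal.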
Table 3: Effect of patterns when using a myopic policy (value functions are zero) and a linear pattern metric. For each dataset the table reports the percent of optimality with \theta = 0, the percent of optimality with the tuned \theta, the percent improvement in the objective function and the percent improvement in the pattern metric; the datasets are (1)-2000(1), (1)-4000(1), (1)-6000(1), (2)-4000(2) and (3)-4000(3).

Table 4: Effect of patterns when using a myopic policy and a piecewise-linear pattern metric, reported for the same datasets and statistics as table 3.

Tables 3 and 4 summarize our experimental results when implementing our algorithm using a myopic policy. We see that there is a significant improvement in the percentage of optimality obtained by incorporating patterns using either the linear (equation (30)) or piecewise-linear (equation (29)) version of our algorithm. In most cases we are able to achieve around 70 percent of the optimal solution, an improvement of around 40 percent. While this is far below optimal, we point out that the myopic policy is especially poor: this policy has no incentive to move equipment empty to a different location to cover demands that might arise in the future, resulting in excess inventories of equipment at some locations that become unproductive. We also see that in the implementations of both the linear and piecewise-linear versions of our methodology there is a significant reduction in the pattern metric, showing that we are doing a much better job of matching the pattern. In tables 5 and 6 we report our results for incorporating patterns when we use linear value function approximations to convey information among subproblems. We see that even without incorporating patterns, with the use of linear value function approximations we are able to achieve more than 90 percent of the optimal solution. Despite this, both linear and
More informationWhat you should know about approximate dynamic programming
What you should know about approximate dynamic programming Warren B. Powell Department of Operations Research and Financial Engineering Princeton University, Princeton, NJ 08544 December 16, 2008 Abstract
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationAn Optimization-Based Heuristic for the Split Delivery Vehicle Routing Problem
An Optimization-Based Heuristic for the Split Delivery Vehicle Routing Problem Claudia Archetti (1) Martin W.P. Savelsbergh (2) M. Grazia Speranza (1) (1) University of Brescia, Department of Quantitative
More informationStochastic programs with binary distributions: Structural properties of scenario trees and algorithms
INSTITUTT FOR FORETAKSØKONOMI DEPARTMENT OF BUSINESS AND MANAGEMENT SCIENCE FOR 12 2017 ISSN: 1500-4066 October 2017 Discussion paper Stochastic programs with binary distributions: Structural properties
More informationRuntime Reduction Techniques for the Probabilistic Traveling Salesman Problem with Deadlines
Runtime Reduction Techniques for the Probabilistic Traveling Salesman Problem with Deadlines Ann Melissa Campbell, Barrett W. Thomas Department of Management Sciences, University of Iowa 108 John Pappajohn
More informationDynamic Programming Approximations for Stochastic, Time-Staged Integer Multicommodity Flow Problems
Dynamic Programming Approximations for Stochastic, Time-Staged Integer Multicommody Flow Problems Huseyin Topaloglu School of Operations Research and Industrial Engineering, Cornell Universy, Ithaca, NY
More informationCS 6901 (Applied Algorithms) Lecture 3
CS 6901 (Applied Algorithms) Lecture 3 Antonina Kolokolova September 16, 2014 1 Representative problems: brief overview In this lecture we will look at several problems which, although look somewhat similar
More informationReal-time Systems: Scheduling Periodic Tasks
Real-time Systems: Scheduling Periodic Tasks Advanced Operating Systems Lecture 15 This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of
More informationM08/5/MATSD/SP1/ENG/TZ1/XX/M+ MARKSCHEME. May 2008 MATHEMATICAL STUDIES. Standard Level. Paper pages
M08/5/MATSD/SP1/ENG/TZ1/XX/M+ MARKSCHEME May 008 MATHEMATICAL STUDIES Standard Level Paper 1 0 pages M08/5/MATSD/SP1/ENG/TZ1/XX/M+ This markscheme is confidential and for the exclusive use of examiners
More informationSupplementary Technical Details and Results
Supplementary Technical Details and Results April 6, 2016 1 Introduction This document provides additional details to augment the paper Efficient Calibration Techniques for Large-scale Traffic Simulators.
More informationOn service level measures in stochastic inventory control
On service level measures in stochastic inventory control Dr. Roberto Rossi The University of Edinburgh Business School, The University of Edinburgh, UK roberto.rossi@ed.ac.uk Friday, June the 21th, 2013
More informationOn the static assignment to parallel servers
On the static assignment to parallel servers Ger Koole Vrije Universiteit Faculty of Mathematics and Computer Science De Boelelaan 1081a, 1081 HV Amsterdam The Netherlands Email: koole@cs.vu.nl, Url: www.cs.vu.nl/
More informationRobust multi-sensor scheduling for multi-site surveillance
DOI 10.1007/s10878-009-9271-4 Robust multi-sensor scheduling for multi-site surveillance Nikita Boyko Timofey Turko Vladimir Boginski David E. Jeffcoat Stanislav Uryasev Grigoriy Zrazhevsky Panos M. Pardalos
More informationApproximate Dynamic Programming: Solving the curses of dimensionality
Approximate Dynamic Programming: Solving the curses of dimensionality Informs Computing Society Tutorial October, 2008 Warren Powell CASTLE Laboratory Princeton University http://www.castlelab.princeton.edu
More informationDRAFT Formulation and Analysis of Linear Programs
DRAFT Formulation and Analysis of Linear Programs Benjamin Van Roy and Kahn Mason c Benjamin Van Roy and Kahn Mason September 26, 2005 1 2 Contents 1 Introduction 7 1.1 Linear Algebra..........................
More information15-850: Advanced Algorithms CMU, Fall 2018 HW #4 (out October 17, 2018) Due: October 28, 2018
15-850: Advanced Algorithms CMU, Fall 2018 HW #4 (out October 17, 2018) Due: October 28, 2018 Usual rules. :) Exercises 1. Lots of Flows. Suppose you wanted to find an approximate solution to the following
More informationKnapsack and Scheduling Problems. The Greedy Method
The Greedy Method: Knapsack and Scheduling Problems The Greedy Method 1 Outline and Reading Task Scheduling Fractional Knapsack Problem The Greedy Method 2 Elements of Greedy Strategy An greedy algorithm
More informationLecture 1. Stochastic Optimization: Introduction. January 8, 2018
Lecture 1 Stochastic Optimization: Introduction January 8, 2018 Optimization Concerned with mininmization/maximization of mathematical functions Often subject to constraints Euler (1707-1783): Nothing
More informationProbabilistic Planning. George Konidaris
Probabilistic Planning George Konidaris gdk@cs.brown.edu Fall 2017 The Planning Problem Finding a sequence of actions to achieve some goal. Plans It s great when a plan just works but the world doesn t
More informationSurge Pricing and Labor Supply in the Ride- Sourcing Market
Surge Pricing and Labor Supply in the Ride- Sourcing Market Yafeng Yin Professor Department of Civil and Environmental Engineering University of Michigan, Ann Arbor *Joint work with Liteng Zha (@Amazon)
More informationChapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 4 Greedy Algorithms Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 4.1 Interval Scheduling Interval Scheduling Interval scheduling. Job j starts at s j and
More informationA Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty
2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-13, 2011, Shanghai, China A Decentralized Approach to Multi-agent Planning in the Presence of
More informationComplexity of Routing Problems with Release Dates and Deadlines
Complexity of Routing Problems with Release Dates and Deadlines Alan Erera, Damian Reyes, and Martin Savelsbergh H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology
More informationBirgit Rudloff Operations Research and Financial Engineering, Princeton University
TIME CONSISTENT RISK AVERSE DYNAMIC DECISION MODELS: AN ECONOMIC INTERPRETATION Birgit Rudloff Operations Research and Financial Engineering, Princeton University brudloff@princeton.edu Alexandre Street
More informationNumerical Methods. V. Leclère May 15, x R n
Numerical Methods V. Leclère May 15, 2018 1 Some optimization algorithms Consider the unconstrained optimization problem min f(x). (1) x R n A descent direction algorithm is an algorithm that construct
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationPlanning in Markov Decision Processes
Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov
More informationRegularized optimization techniques for multistage stochastic programming
Regularized optimization techniques for multistage stochastic programming Felipe Beltrán 1, Welington de Oliveira 2, Guilherme Fredo 1, Erlon Finardi 1 1 UFSC/LabPlan Universidade Federal de Santa Catarina
More informationMicroeconomic Algorithms for Flow Control in Virtual Circuit Networks (Subset in Infocom 1989)
Microeconomic Algorithms for Flow Control in Virtual Circuit Networks (Subset in Infocom 1989) September 13th, 1995 Donald Ferguson*,** Christos Nikolaou* Yechiam Yemini** *IBM T.J. Watson Research Center
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationPayments System Design Using Reinforcement Learning: A Progress Report
Payments System Design Using Reinforcement Learning: A Progress Report A. Desai 1 H. Du 1 R. Garratt 2 F. Rivadeneyra 1 1 Bank of Canada 2 University of California Santa Barbara 16th Payment and Settlement
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationAnticipatory Freight Selection in Intermodal Long-haul Round-trips
Anticipatory Freight Selection in Intermodal Long-haul Round-trips A.E. Pérez Rivera and M.R.K. Mes Department of Industrial Engineering and Business Information Systems, University of Twente, P.O. Box
More informationThe Optimizing-Simulator: An Illustration using the Military Airlift Problem
The Optimizing-Simulator: An Illustration using the Military Airlift Problem Tongqiang Tony Wu Warren B. Powell Princeton University and Alan Whisman Air Mobility Command There have been two primary modeling
More informationTraffic Modelling for Moving-Block Train Control System
Commun. Theor. Phys. (Beijing, China) 47 (2007) pp. 601 606 c International Academic Publishers Vol. 47, No. 4, April 15, 2007 Traffic Modelling for Moving-Block Train Control System TANG Tao and LI Ke-Ping
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationArtificial Intelligence
Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important
More informationM11/5/MATSD/SP2/ENG/TZ2/XX/M MARKSCHEME. May 2011 MATHEMATICAL STUDIES. Standard Level. Paper pages
M11/5/MATSD/SP/ENG/TZ/XX/M MARKSCHEME May 011 MATHEMATICAL STUDIES Standard Level Paper 9 pages M11/5/MATSD/SP/ENG/TZ/XX/M This markscheme is confidential and for the exclusive use of examiners in this
More information1 Introduction. 2 Successive Convexification Algorithm
1 Introduction There has been growing interest in cooperative group robotics [], with potential applications in construction and assembly. Most of this research focuses on grounded or mobile manipulator
More information6. DYNAMIC PROGRAMMING I
6. DYNAMIC PROGRAMMING I weighted interval scheduling segmented least squares knapsack problem RNA secondary structure Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley Copyright 2013
More informationMULTIPLE CHOICE QUESTIONS DECISION SCIENCE
MULTIPLE CHOICE QUESTIONS DECISION SCIENCE 1. Decision Science approach is a. Multi-disciplinary b. Scientific c. Intuitive 2. For analyzing a problem, decision-makers should study a. Its qualitative aspects
More informationRecent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables
Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop
More informationChapter 3: Discrete Optimization Integer Programming
Chapter 3: Discrete Optimization Integer Programming Edoardo Amaldi DEIB Politecnico di Milano edoardo.amaldi@polimi.it Sito web: http://home.deib.polimi.it/amaldi/ott-13-14.shtml A.A. 2013-14 Edoardo
More information1 Bewley Economies with Aggregate Uncertainty
1 Bewley Economies with Aggregate Uncertainty Sofarwehaveassumedawayaggregatefluctuations (i.e., business cycles) in our description of the incomplete-markets economies with uninsurable idiosyncratic risk
More information