Introduction to Large-Scale Linear Programming and Applications

Stephen J. Stoyan, Maged M. Dessouky*, and Xiaoqing Wang

Daniel J. Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA, 90089-0193

*Corresponding author (e-mail: maged@usc.edu)

1 Introduction

Linear programming has been an effective tool for solving a number of applications in manufacturing, transportation, military, healthcare, and finance. To model practical systems accurately, however, many variables and constraints may be needed to capture all of the attributes of the problem, and in many applications the resulting formulation can be quite large. For example, Gondzio and Grothey (2006) solve problems with up to 10^9 variables, and Leachman (1993) formulates a linear programming model with over 10^5 constraints for a production planning problem for semiconductors at Harris Corporation. For such large problems, specialized procedures such as column generation and cutting-plane methods may need to be employed. In some cases, the problem has a special structure for which decomposition methods can also be used. The purpose of this chapter is to provide an overview of these specialized procedures, and we conclude with an example of a large-scale
formulation for solving a route balancing problem. For general discussions of the progress made with linear programming, problem sizes, and algorithmic developments, one may refer to [3]. In this chapter we describe some of the fundamental ideas used in large-scale linear programming with an emphasis on the simplex method; the reader may also refer to the literature on interior point methods [7, 11].

2 Linear Programming Problem

Consider the following standard Linear Programming (LP) problem:

    min  c^T x        (1)
    s.t. Ax = b       (2)
         x >= 0,      (3)

where x ∈ R^n, A is an m × n matrix, and c and b are vectors of length n and m, respectively. Now suppose the problem is so large, e.g. the A matrix cannot be stored in memory, that the LP cannot be solved directly. Using the simplex method, large-scale problems can be solved by adding a few extra steps to the algorithm. Depending on the structure of the A matrix, it may be possible to decompose the problem so that solving smaller subproblems provides sufficient optimality conditions. Recall that in the simplex method an LP is partitioned into Basic (B) and Nonbasic (N) elements. Such a partitioning of (1)-(3) gives:

    min  c_B^T x_B + c_N^T x_N        (4)
    s.t. B x_B + N x_N = b            (5)
         x_B, x_N >= 0,               (6)

where B is an m × m matrix composed of a set of linearly independent columns A_i, i ∈ B (A_i refers to column i of the A matrix), N is a matrix composed of the remaining set
of columns A_j, j ∈ N, c_B refers to the vector of objective coefficients of the elements in B, and c_N to those of the elements in N. Also, x_B and x_N refer to the basic and nonbasic decision variables, respectively. Finally, B^{-1} exists since the columns of B are linearly independent. Then, solving for x_B in equation (5) and substituting into (1) provides

    min  c_B^T B^{-1} b + (c_N^T - c_B^T B^{-1} N) x_N.        (7)

A basis matrix B is then said to be optimal if c_N^T - c_B^T B^{-1} N >= 0 and B^{-1} b >= 0. The first condition follows from equation (7): if the value in parentheses is nonnegative, then introducing a nonbasic element into B offers no further reduction of the current objective value, which proves the optimality of the solution. Although we will not go through all the details of the algorithm, the simplex method tries to find a set of basic variables that satisfies the optimality criteria. To do so, given an initial basis, reduced costs r_j are defined for all j ∈ N, where

    r_j = c_j - c_B^T B^{-1} A_j.        (8)

The primal simplex method then allows variables to enter and leave the basis while upholding B^{-1} b >= 0; thus, for a given iteration, if r_j >= 0 for all j ∈ N the current basis is optimal. This observation is a critical step in many large-scale applications. Given the current basis, one method is to search for a new variable x_j, j ∈ N, to enter the basis if its reduced cost is negative; otherwise, the current solution is optimal. In the next sections we investigate such applications.

3 Column Generation

Many practical problems involve large-scale models with many more variables than constraints; hence n >> m. As mentioned in the last section, these instances may
be quite difficult to solve. For such problems, column generation is an effective solution method. Column generation can also be used for problems where columns may be dynamically generated to form candidate solutions. The general idea behind the algorithm is to find the subset of m basic variables that needs to be considered in the solution of the problem, since nonbasic variables are equal to zero. To do so, the method forms a subset of the columns of (1)-(3) called the master problem, which contains the columns associated with a basic feasible solution of the original problem and possibly a few additional columns. The method then searches for variables that have the potential to reduce the current objective value, by means of a subproblem that identifies an entering variable. This amounts to finding a variable x_j, j ∈ N, with reduced cost r_j < 0. The column generation algorithm proceeds as follows: first a master problem is generated involving a subset of the original variables, which contains an initial basis. Then a subproblem is solved to find an entering variable, and the corresponding column is added to the master problem. The master problem is then re-solved, and this process continues until the subproblem cannot identify an entering variable (i.e. r_j >= 0 for all j ∈ N), in which case the current solution is optimal. The master problem is defined as:

    min  ∑_{ω ∈ Ψ} c_ω x_ω        (9)
    s.t. ∑_{ω ∈ Ψ} A_ω x_ω = b    (10)
         x >= 0,                  (11)

where, for example, Ψ = {i ∈ B, j}, which contains all of the current basic columns and a nonbasic column A_j. A subproblem is then introduced that is used to seek a new column with r_j < 0 to add to the master problem. Re-solving the master problem generates a new solution, and the process continues until r_j >= 0 for all variables in the subproblem. We provide an example of a possible subproblem below. One may also note that there are different techniques used to control the number of elements in set Ψ of the master problem.
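The entering-variable test at the heart of this process can be sketched with a few lines of linear algebra. The LP data below is a small illustrative example (not from any application in this chapter), and the choice of starting basis is assumed:

```python
import numpy as np

# Illustrative LP data: min c^T x s.t. Ax = b, x >= 0 (made-up numbers).
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 5.0])
c = np.array([-3.0, -2.0, 0.0, 0.0])

basis = [2, 3]                        # assumed current basic columns (the slacks)
nonbasic = [0, 1]

B = A[:, basis]                       # m x m basis matrix
y = np.linalg.solve(B.T, c[basis])    # simplex multipliers y^T = c_B^T B^{-1}

# Reduced costs r_j = c_j - c_B^T B^{-1} A_j for each nonbasic column
r = c[nonbasic] - A[:, nonbasic].T @ y
print(r)    # a negative entry identifies a candidate entering column
```

With the slack basis the multipliers are zero, so the reduced costs are simply the original cost coefficients; both are negative here, so either column may enter.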
In the example described above, the size of Ψ is kept to a minimum of m + 1 variables. Other methods never remove any variables that have entered the basis, and some allow Ψ to grow to a certain size before removing variables. With any of these choices for the size of Ψ, solving (9)-(11) by adding columns in the iterative manner described above is guaranteed to terminate in the absence of degeneracy, since it is simply a special implementation of the revised simplex method. If degeneracy is present in the problem, one can use a method that prevents cycling, such as the lexicographic rule. Column generation is also an effective tool for problems where, instead of defining a large set of decisions up front, it is more efficient to generate candidate solutions (columns) until a better solution cannot be found. Such problems exist in inventory, transportation, and scheduling. In Section 6 we provide a route balancing example where road crews are assigned to clean various areas of Los Angeles based on distance. As will be presented, the problem can be expressed as

    min  e^T x        (12)
    s.t. Ax = b       (13)
         x >= 0,      (14)

where e is a vector of all ones. As mentioned above, matrix A may not involve the set of all possible combinations needed to solve the initial problem, and candidate solutions will be generated to do so. In this case, generating candidate solutions is equivalent to adding columns A_j to the problem. In the route balancing example, finding a candidate solution (column A_j) simply requires finding an area to which a crew may be allocated. Thus, providing an initial basis for (12)-(14), and possibly a few additional columns, forms the master problem. Suppose that we have an initial basis; the subproblem will then generate an additional candidate solution j with a negative reduced cost, i.e.

    r_j = 1 - e^T B^{-1} A_j < 0.        (15)
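Because every objective coefficient in (12) is one, the pricing test in (15) reduces to computing e^T B^{-1} A_j for each candidate column. A minimal sketch on made-up data (two areas and three candidate crew-assignment columns, none taken from the actual study):

```python
import numpy as np

# Pricing step for min e^T x, Ax = b, x >= 0: r_j = 1 - e^T B^{-1} A_j.
B = np.eye(2)                          # assumed current basis: one column per area
e_B = np.ones(2)                       # objective coefficients of the basic columns
candidates = [np.array([1.0, 0.0]),
              np.array([0.0, 1.0]),
              np.array([1.0, 1.0])]    # hypothetical candidate columns A_j

y = np.linalg.solve(B.T, e_B)          # multipliers e_B^T B^{-1}
scores = [float(y @ A_j) for A_j in candidates]
best = int(np.argmax(scores))

# The master problem is optimal once no candidate scores above 1,
# i.e. no reduced cost 1 - score is negative.
if scores[best] > 1.0:
    print("enter column", best, "reduced cost", 1.0 - scores[best])
else:
    print("current master solution is optimal")
```

Here the third column scores 2, so its reduced cost is -1 and it would be added to the master problem.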
Now suppose that the subproblem generates the column associated with the most negative reduced cost. This is equivalent to taking the column associated with

    min_j {r_j} = min_j {1 - e^T B^{-1} A_j}.        (16)

If the reduced cost found in (16) is nonnegative, then the current solution is optimal. Notice that this condition on (16) is equivalent to the following inequality:

    max_j {e^T B^{-1} A_j} <= 1.        (17)

Thus, the master problem is optimal if the solution of the subproblem satisfies (17). In addition to equation (17), the subproblem also involves constraints from the original problem, ensuring that only feasible solutions are considered. In some cases, the subproblem may simply require solving a knapsack problem. For more information and examples of such issues, the reader may refer to [1, 2, 9].

4 Cutting-Plane Methods

In Section 3 we discussed large-scale problems where n >> m. We now examine the case in which the problem involves a large number of constraints. This is equivalent to investigating the dual of the model presented in Section 3. In this case, we define problems for which cutting-plane methods provide a valid solution approach. Consider the dual of (1)-(3):

    max  b^T y           (18)
    s.t. A^T y <= c,     (19)

where the resulting problem has many rows, or constraints. In order to solve the problem in (18)-(19), the cutting-plane method defines a less-constrained problem

    max  b^T y                         (20)
    s.t. A_ω^T y <= c_ω,  ω ∈ Ω,       (21)
where Ω is a subset of the constraints of the initial problem (i.e. A_ω corresponds to a row of (19)). The less-constrained problem (20)-(21) is referred to as the relaxed problem. The cutting-plane algorithm begins by solving the relaxed problem, then uses a subproblem to check the feasibility of the solution with respect to the initial problem (18)-(19). If a constraint of the initial problem is violated, then the violated constraint is added to subset Ω of the relaxed problem and it is re-solved. This continues until the optimal solution of the relaxed problem is feasible with respect to the initial problem, at which point we have found an optimal solution. Thus, the algorithm repeatedly solves the relaxed problem, with at least one additional constraint added to Ω after each iteration, until an optimal solution is found that does not violate the initial problem. Given that y_Ω is an optimal solution of (20)-(21), there are two cases to consider with respect to the optimality of (18)-(19), namely:

(i) If y_Ω is a feasible solution of (18)-(19), then y_D = y_Ω, where y_D refers to an optimal solution of (18)-(19);

(ii) If y_Ω is infeasible with respect to (18)-(19), then we find the violated constraint(s) and add them to the relaxed problem (20)-(21). Thus, if row A_j provides the violation, then set Ω is updated to include row j and the problem is re-solved.

Figure 1 is a graphical illustration of cases (i) and (ii) above. Given that the feasible region of the initial problem is A1, if constraint L4 is not included in the relaxed problem (20)-(21), then the feasible region expands to contain regions A1 ∪ A2. In this case, solving the relaxed problem still produces an optimal solution of (18)-(19), and thus y_Ω = y_D, as depicted in the figure. However, if constraint L2 is not included in the relaxed problem (20)-(21), then the feasible region contains A1 ∪ A3. Depending on the slope of the objective function and the rest of the region, the optimal solution may be at vertex y*_Ω, as depicted in the figure. In this case, y_Ω ≠ y_D, and violated inequality L2 would be generated as a cutting plane.
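The feasibility check that drives the algorithm — scan the constraints of (18)-(19) that were left out of the relaxation and pick a violated one — can be sketched as follows. The rows, right-hand sides, and current point are all illustrative:

```python
import numpy as np

# Separation step of the cutting-plane method: given the current relaxed
# optimum y_k, find a row of A^T y <= c that it violates (made-up data).
A_rows = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0],
                   [2.0, 1.0]])        # rows A_l of the full constraint set
c = np.array([4.0, 3.0, 5.0, 6.0])

omega = {0, 1}                         # rows already in the relaxed problem
y_k = np.array([4.0, 3.0])             # optimal solution of the current relaxation

outside = [l for l in range(len(c)) if l not in omega]
slack = np.array([c[l] - A_rows[l] @ y_k for l in outside])   # c_l - A_l^T y_k

worst = int(np.argmin(slack))
if slack[worst] < 0:                   # a violated row exists: add the deepest cut
    omega.add(outside[worst])          # ...and the relaxed problem is re-solved
print(sorted(omega))
```

Both omitted rows are violated here (slacks -2 and -5), so the most violated one, row 3, is added to Ω before re-solving.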
Figure 1: Illustrative example of issues related to cutting-plane methods.

The last issue to consider with respect to the cutting-plane algorithm is how to check that the solution is feasible with respect to the initial problem. Also, if the solution is not feasible, an efficient method to identify the violated constraint(s) is necessary. Given that y_k is the current solution of the relaxed problem, one method to identify a violated constraint involves solving

    min_{l ∉ Ω}  c_l - A_l^T y_k        (22)

over the constraints of the initial problem (18)-(19). This forms the subproblem of the cutting-plane method, and if (22) is nonnegative for all elements l, then we have a feasible, and therefore optimal, solution. Otherwise, we add a constraint satisfying the inequality A_l^T y_k > c_l to the relaxed problem (20)-(21). Depending on the problem, solving equation (22) may be challenging. There are techniques that may be employed for such instances; the reader may refer to [1, 2] for more information on this topic. One may also note that the cutting-plane method we describe in this section is different from the method used in integer programming, where constraints (cuts) are added to the problem to form the convex hull of
the integer feasible set; please refer to [10] for more on this topic.

5 Special Decompositions

In addition to the methods mentioned in Sections 3 and 4, many large-scale problems may have special structures that can be further exploited. For example, some problems may contain a sparse A matrix and/or have a unique block-matrix structure. In some cases it is possible to decompose the large-scale system into smaller, more tractable problems. As a simple example, consider the following problem with a block diagonal matrix structure:

    min  c_1^T x_1 + c_2^T x_2 + ... + c_n^T x_n        (23)
    s.t. D_1 x_1 = b_1                                  (24)
         D_2 x_2 = b_2                                  (25)
         ...
         D_n x_n = b_n                                  (26)
         x_1, x_2, ..., x_n >= 0,                       (27)

where D_1, ..., D_n represent block matrices, and b_1, ..., b_n, c_1, ..., c_n, and x_1, ..., x_n represent vectors of appropriate dimension. Problem (23)-(27) can easily be decomposed into n subproblems involving x_1, x_2, ..., x_n and solved independently. The sum of the subproblem solutions gives the optimal value of (23). Now consider problems that have the following
structure:

    min  c_1^T x_1 + c_2^T x_2 + ... + c_n^T x_n         (28)
    s.t. Â_1 x_1 + Â_2 x_2 + ... + Â_n x_n = b_0         (29)
         D_1 x_1 = b_1                                   (30)
         D_2 x_2 = b_2                                   (31)
         ...
         D_n x_n = b_n                                   (32)
         x_1, x_2, ..., x_n >= 0,                        (33)

where Â_1, ..., Â_n also represent block matrices. The constraint structure presented in (28)-(33) is quite common in practical problems. For example, problems in production, finance, and transportation often involve models where some constraints pertain only to particular elements/areas of the problem (e.g. individual time periods) and others span the whole problem domain (e.g. all time periods). For such instances, Dantzig-Wolfe decomposition is an efficient solution approach [1, 2, 4]. Similar to the methods described earlier, Dantzig-Wolfe decomposition forms a master problem from the system (28)-(33), which contains a basis. The method then uses a subproblem to generate admissible columns to add to the master problem, which is re-solved until no further improvement can be made with respect to the objective function. First, the master problem is formed by defining

    χ_i = {x_i : D_i x_i = b_i, x_i >= 0},  i = 1, ..., n,        (34)

and, assuming each χ_i is nonempty for i = 1, ..., n, (28)-(33) becomes:

    min  ∑_{i=1}^n c_i^T x_i             (35)
    s.t. ∑_{i=1}^n Â_i x_i = b_0         (36)
         x_i ∈ χ_i,  i = 1, ..., n.      (37)
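To make the decomposition concrete, the sketch below takes a single bounded block χ = {x >= 0 : x1 + x2 = 1}, whose extreme points are (1,0) and (0,1), and builds the data that a Dantzig-Wolfe master problem would see for it. The coupling block Â and costs c are made up, and a bounded block contributes no extreme rays:

```python
import numpy as np

# One bounded Dantzig-Wolfe block: chi = { x >= 0 : x1 + x2 = 1 }.
V = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # extreme points of chi
A_hat = np.array([[3.0, 1.0]])    # coupling-constraint block (illustrative)
c = np.array([2.0, 5.0])          # block objective coefficients (illustrative)

# Each extreme point v^q yields one master variable lambda_q with objective
# coefficient c^T v^q and coupling-constraint column A_hat v^q.
master_costs = [float(c @ v) for v in V]
master_cols = [A_hat @ v for v in V]

# Any point of chi is a convex combination of the extreme points:
lam = np.array([0.25, 0.75])      # lambda >= 0, summing to 1
x = sum(l * v for l, v in zip(lam, V))
print(master_costs, x)
```

The master problem then optimizes over the lambda weights (one per extreme point) rather than over x directly, which is exactly why its variable count can explode and why column generation is used to manage it.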
Then, using the Representation Theorem [1, 2, 4], any element x_i ∈ χ_i can be expressed as

    x_i = ∑_{q ∈ Q_i} λ_i^q v_i^q + ∑_{k ∈ K_i} μ_i^k d_i^k,        (38)

where the v_i^q, q ∈ Q_i, are the extreme points of χ_i and the d_i^k, k ∈ K_i, are the extreme rays of χ_i; also note that λ_i^q >= 0, μ_i^k >= 0, and ∑_{q ∈ Q_i} λ_i^q = 1 for i = 1, ..., n. Therefore, by the representation theorem, (35)-(37) is equivalent to

    min  ∑_{i=1}^n ( ∑_{q ∈ Q_i} c_i^T (λ_i^q v_i^q) + ∑_{k ∈ K_i} c_i^T (μ_i^k d_i^k) )        (39)
    s.t. ∑_{i=1}^n ( ∑_{q ∈ Q_i} Â_i (λ_i^q v_i^q) + ∑_{k ∈ K_i} Â_i (μ_i^k d_i^k) ) = b_0      (40)
         ∑_{q ∈ Q_i} λ_i^q = 1,  i = 1, ..., n,                                                 (41)
         λ_i^q >= 0,  i = 1, ..., n,  q ∈ Q_i,                                                  (42)
         μ_i^k >= 0,  i = 1, ..., n,  k ∈ K_i,                                                  (43)

which is the master problem of Dantzig-Wolfe decomposition. One of the key issues with the master problem is that (39)-(43) typically contains far more variables than the original problem (28)-(33), since a variable is created for every extreme point and extreme ray of each polyhedron χ_i, i = 1, ..., n. This issue can be resolved by using column generation methods (described in Section 3) to handle the increased number of variables. For further discussion of the Dantzig-Wolfe algorithm the reader may refer to [1, 2, 4, 9]. Taking the dual of (28)-(33) forms a problem that can be solved using Benders decomposition. Such problems have the structure of a two-stage stochastic linear program and can be found in many practical models where uncertainty is present. Benders decomposition is the Dantzig-Wolfe decomposition algorithm applied to the dual, and hence the master problem contains many constraints. The key idea in Benders decomposition is to solve a relaxed
master problem, similar to what was done in the cutting-plane methods described in Section 4. For more information on Benders decomposition one may refer to [1, 2, 4, 9].

6 Route Balancing Example

In this section we present an example of a linear programming formulation that balances the routes of the road crews that operate street sweepers. Several counties across California have begun to switch from diesel-powered street sweepers to Compressed Natural Gas (CNG) street sweepers in order to comply with Federal and State air quality regulations. However, for a number of reasons, including slow refueling times at CNG fueling stations and some design characteristics of the CNG sweepers, the productivity of the sweeping operations has decreased with the new sweepers. The goal of the study is to identify new operating policies that improve the productivity of the CNG street sweepers. A recent analysis of the assigned road miles made it evident that the workload varied significantly from crew to crew. Hence, a better balance of the miles assigned to crews could significantly improve their productivity. In this section, we present a mathematical model that optimizes the assignment of road miles to crews.

Input Parameters: Let

1, 2, ..., n: the road crews, n in total;
r_ij: the maximum road miles that crew i can shift to crew j, i = 1, 2, ..., n, j = 1, 2, ..., n, where r_ii = 0;
a_i: the originally assigned miles of crew i = 1, 2, ..., n;
u: the average miles of the region.
Decision Variables: Let

x_ij: the road miles that crew i shifts to crew j, i = 1, 2, ..., n, j = 1, 2, ..., n, where x_ii = 0;
z_i: the deviation between the workload of crew i = 1, 2, ..., n and u.

Using the variable and parameter definitions above, the road crew linear program is as follows:

    min  ∑_{i=1}^n z_i                                                            (44)
    s.t. ∑_{j=1}^n x_ji + a_i - ∑_{j=1}^n x_ij - u <= z_i,   i = 1, 2, ..., n,    (45)
         -∑_{j=1}^n x_ji - a_i + ∑_{j=1}^n x_ij + u <= z_i,  i = 1, 2, ..., n,    (46)
         x_ij <= r_ij,   i = 1, 2, ..., n,  j = 1, 2, ..., n,                     (47)
         x_ij >= 0,      i = 1, 2, ..., n,  j = 1, 2, ..., n.                     (48)

The objective in problem (44)-(48) is to minimize the total deviation between the assigned miles of each crew and the average miles of the region, which is equivalent to minimizing

    ∑_{i=1}^n | ∑_{j=1}^n x_ji + a_i - ∑_{j=1}^n x_ij - u |.        (49)

Here, variable z_i represents the deviation of crew i, linearized through constraints (45) and (46), and the objective function is simply the sum of the deviations of all crews. The smaller the objective value, the more balanced the routes are; an objective value of zero means that each route in the region has exactly the same number of miles. The third constraint (47) limits the miles that crew i can shift to another crew. The values r_ij are established based on the physical connection between two adjacent crews, since we only allow the reassignment of miles if the two crews are next to each other on the freeway.
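As a quick check of the formulation, the sketch below evaluates a candidate shift plan on a tiny hypothetical 3-crew instance (the numbers are invented, not the 37-crew data of Table 1): the load of crew i after shifting is a_i - ∑_j x_ij + ∑_j x_ji, and the objective charges |load_i - u| for each crew, which is what z_i captures through constraints (45)-(46).

```python
import numpy as np

# Hypothetical 3-crew instance of the route-balancing LP (44)-(48).
a = np.array([40.0, 25.0, 25.0])   # original assigned miles a_i
u = a.mean()                       # regional average; here u = 30
r = np.array([[0.0, 10.0, 10.0],   # r_ij: max miles crew i may shift to crew j
              [5.0,  0.0,  5.0],
              [5.0,  5.0,  0.0]])

# Candidate decision x_ij: crew 1 shifts 5 miles to each of crews 2 and 3.
x = np.array([[0.0, 5.0, 5.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
assert np.all(x <= r) and np.all(x >= 0)     # constraints (47)-(48)

load = a - x.sum(axis=1) + x.sum(axis=0)     # miles per crew after shifting
objective = np.abs(load - u).sum()           # eq. (49); z_i = |load_i - u|
print(load, objective)
```

This plan balances every crew at exactly 30 miles, so the objective is zero; in general the LP searches over all feasible x to make this sum as small as possible.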
The last constraint (48) is simply a nonnegativity restriction on variable x_ij; the nonnegativity of variable z_i is already ensured by the first and second constraints. Based on the LP in (44)-(48), we constructed an example that consists of 37 crews (n = 37) with a regional average of 31.25 miles (u = 31.25 miles). The resulting LP model contains 1406 variables and 2812 constraints, and its size can easily grow. Thus, (44)-(48) may be solved using cutting-plane methods, or by taking the dual and applying column generation. In addition, the example uses pre-assigned miles for the road crews; however, alternative road-mile combinations may be generated using column generation, as presented in Section 3. For the given example, Table 1 lists the assigned miles of each crew for the current and optimized solutions, which were obtained using CPLEX 9.0. The optimal objective value is 273.00 miles, a 29.39% improvement over the current practice of 386.64 miles.
Crew   Current Solution (miles)   Optimized Solution (miles)
 1     20.90                      31.25
 2     30.65                      29.01
 3     39.76                      31.25
 4     10.30                      10.13
 5     28.57                      31.25
 6     30.90                      47.05
 7     44.29                      77.95
 8     79.02                      31.25
 9     38.29                      31.25
10     20.84                      23.13
11     41.03                      34.03
12     12.64                      19.64
13      9.20                      15.11
14     22.96                      18.13
15     18.98                      31.25
16     28.15                      25.59
17     20.21                      23.64
18     29.45                      31.25
19     19.63                      12.82
20     19.36                      21.40
21     39.48                      33.05
22     30.16                      31.25
23     40.32                      31.25
24     21.77                      11.71
25     28.79                      31.25
26     20.49                      31.25
27     20.23                      28.13
28     25.84                      31.25
29     35.99                      31.25
30     47.22                      38.65
31     27.36                      33.55
32     31.67                      31.25
33     44.11                      38.34
34     54.20                      77.04
35     57.39                      31.25
36     38.04                      38.04
37     27.95                      31.25

Table 1: Crew road miles.
References

[1] Bazaraa, MS, Jarvis, JJ, and Sherali, HD. Linear Programming and Network Flows. 2005. John Wiley and Sons, Hoboken, NJ.

[2] Bertsimas, D and Tsitsiklis, JN. Introduction to Linear Optimization. 1997. Athena Scientific, Belmont, MA.

[3] Bixby, RE. Solving Real-World Linear Programs: A Decade and More of Progress. 2002. Operations Research, 50(1), 3-15.

[4] Dantzig, GB and Thapa, MN. Linear Programming 2: Theory and Extensions. 2003. Springer, New York, NY.

[5] Gondzio, J and Grothey, A. Direct Solution of Linear Systems of Size 10^9 Arising in Optimization with Interior Point Methods. 2006. Lecture Notes in Computer Science, 513-525.

[6] Leachman, RC. Modeling Techniques for Automated Production Planning in the Semiconductor Industry, in Optimization in Industry. TA Ciriani and RC Leachman, eds. 1993. John Wiley and Sons, New York, NY.

[7] Roos, C, Terlaky, T, and Vial, J-P. Interior Point Methods for Linear Optimization. 2006. Springer, New York, NY.

[8] Vanderbei, RJ. Linear Programming: Foundations and Extensions. 2001. Springer, New York, NY.

[9] Winston, WL. Operations Research: Applications and Algorithms. 2004. Thomson Brooks/Cole, Toronto, ON.

[10] Wolsey, LA. Integer Programming. 1998. John Wiley and Sons, Toronto, ON.
[11] Ye, Y. Interior Point Algorithms: Theory and Analysis. 1997. John Wiley and Sons, Toronto, ON.