Minimum Distance Violation Tree Problems


Hui Chen
Department of Management Sciences, University of Iowa
S221 John Pappajohn Business Building, Iowa City, Iowa
Ph.D. Dissertation Proposal
April 2, 2007

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER
1 Introduction
  Motivation
  Problem Variants
  Uniform Distance Restriction ($H_{ij} = H$)
  Node Distance Restriction ($H_{ij} = H_i$)
  Pairwise Distance Restriction ($H_{ij}$)
  Notation and Definition
2 Literature Review
  Optimal Communication Spanning Tree (OCST)
  Hop-Constrained Minimum Spanning Tree (HC-MST)
  k Source Minimum Max-Eccentricity Spanning Tree (MEST)
3 Minimax Distance Violation Tree Problems
  Minimax Distance Violation Tree with Uniform Distance Restriction (UMVT)
  Minimax Distance Violation Tree with Node Distance Restrictions (NMVT)
  Minimax Distance Violation Tree with Pairwise Distance Restriction (PMVT)
  Alternative NMVT Algorithm
  Preliminaries
  Algorithm for the NMVT problem
  Minimax Flow Distance Violation Tree Problems (F-MVT)
  Flow Variants
  Node F-MVT Problems (NF-MVT)
  Pairwise F-MVT Problems
  Conclusions

4 Minisum Distance Violation Tree Problems
  Problem Complexity
  IP Formulations
  Local Search Algorithms
  Edge k-switch Neighborhood
  Local Search Options
  Implementation and Results
  Structure-Based Neighborhoods
  Backbone
  Move-One-Hub (MOH) Neighborhood
  Add-One-Hub (AOH) Neighborhood
  Merge-Two-Hubs (MTH) Neighborhood
  Data Set and Test Design
  Test Results
  Conclusions
5 Future Work
  Motivation
  Problems
  Minimum Subgraph with Diameter Constraint Problem (MSD)
  Minimum Augmentation with Diameter Constraint Problem (MAD)

LIST OF TABLES

3.1 Complexity of Nine Min-Max Flow Distance Violation Tree Problem Variants
problem instances on U.S. data
Compare MST with BSPT as FI initial solutions
Compare BI with FI
Compare GRASP with FI
Compare GRASP with TS
problem instances on E.U. data
Pairwise Size scheme for U.S. cities
The 8 PSVT instances
Compare the number of MOH, AOH, and MTH in SSBLS
Compare the run time of GRASP and SSBLS
Compare the sum of violation for GRASP, MSBLS, and ES-SBLS
Compare the sum of violation for GRASP, MSBLS, and ES-SBLS on PSVT instances

LIST OF FIGURES

3.1 An example of the n-MEST problem transformed from the NMVT problem
An example of an original tree $T' \in \mathcal{T}_B$ and its corresponding tree $T''$ in $G''$
Two subtrees of Tr separated by its center point o
Two different initial solutions for US
Compare the best and worst trees obtained in 10 GRASP runs for US
Compare the best and worst trees obtained in 10 GRASP runs for EU
An example of MOH
An example of AOH
An example of MTH
Compare the sum of violation for GRASP and SSBLS

CHAPTER 1 INTRODUCTION

1.1 Motivation

Whether they are called expedited, express, or time-definite deliveries, there is no question that these services now dominate the delivery business. Best characterized by FedEx's and UPS's next-day and second-day delivery services, time-definite services have grown from just 4% of the parcel delivery market in 1977 to over 60% in 2002 (United Parcel Service, 2002). The market for all time-definite cargo was expected to grow by 7.6% in 2006 (Carey, 2006), and the growth is expected to continue. To be successful in the face of fierce competition and to meet the growing demand for time-definite services, delivery companies must design their delivery networks to meet their promised delivery times reliably and efficiently. In the near term, however, redesigning the network by opening new facilities and closing existing ones is infeasible. Rather, the network can be modified by changing how freight flows through it. In particular, companies can change which cities have direct connections to one another and which cities must route freight and packages through intermediate cities. Each hub-to-hub connection requires substantial investment, so delivery companies want to minimize the number of direct connections. Because of the costs involved, this study restricts the design of delivery networks to tree structures, as they connect all of the nodes in the network with the minimum number of connections. In addition to cost minimization, Powell and Koskosidis (1992) note that tree-structured networks are easier to manage because there is only one path between each origin and destination pair. Because it may not be possible to identify a tree-structured network that satisfies all of the delivery guarantees, we allow the restrictions to be violated but seek to minimize the violations.
We assume that transportation time depends only on the distance between two cities, so that a given transportation time restriction can be translated into a distance restriction. We then define the distance violation from city A to city B as the difference between the length of the path between them in the tree and the associated distance restriction from A to B. We consider two ways in which the violation can be measured. In the first, the violation is based strictly on the difference between the distance between two nodes in the tree and their associated distance restriction. In the second, the size of the violation is weighted by the flow between the nodes. Moreover, there are two different ways for companies to evaluate the network. Companies may measure the network by the maximum violation, which represents the worst case in the network. Alternatively, companies may prefer to measure the sum of the violations in their network. When the violation is a function only of distance, we call the problems of identifying the tree that minimizes the maximum and the sum of the distance violations over all pairs of cities the Minimax Distance Violation Tree Problem (MVT) and the Minisum Distance Violation Tree Problem (SVT), respectively. When the violation is a function of both flow and distance, we call the problems the Minimax Flow Distance Violation Tree Problem (F-MVT) and the Minisum Flow Distance Violation Tree Problem (F-SVT), respectively.

1.2 Problem Variants

Delivery companies design different services between cities by considering many factors, such as distance, competition, and demand. These different services lead to different assumptions about the distance restriction between cities, and we study three problem variants accordingly. Let $H_{ij}$ denote the distance restriction between node $i$ and node $j$. We consider three different assumptions on $H_{ij}$.

Uniform Distance Restriction ($H_{ij} = H$)

In this case, $H_{ij}$ equals a constant $H$ for every pair of nodes; that is, the distance restriction is the same for all pairs of nodes.
We refer to the first variants of MVT and SVT as the Minimax Distance Violation Tree Problem with Uniform Distance Restriction (UMVT) and the Minisum Distance Violation Tree Problem with Uniform Distance Restriction (USVT), respectively.

Node Distance Restriction ($H_{ij} = H_i$)

In this case, the distance restriction from node $i$ to all other nodes is the same and equal to $H_i$. However, $H_i$ may differ from node to node. This restriction is motivated by the fact that different levels of service may be offered in different cities due to the size of the markets and the level of competition. For example, customers in large cities, such as Boston or Chicago, may be offered 2-day service for domestic deliveries, regardless of destination, but customers in smaller cities, where there is less competition and demand, may be offered only 4-day service for their packages. We call the second variants of MVT and SVT the Minimax Distance Violation Tree Problem with Node Distance Restriction (NMVT) and the Minisum Distance Violation Tree Problem with Node Distance Restriction (NSVT), respectively.

Pairwise Distance Restriction ($H_{ij}$)

In this case, the distance restriction between node $i$ and node $j$ is specific to the pair of nodes $i, j$. This restriction is motivated by the fact that the delivery guarantee between cities is often based on the distance between the cities and/or the flow between them. For example, deliveries among cities in the Northeast can have short delivery times due to their proximity, but short delivery times would not be possible for cross-country deliveries. Moreover, as the transportation flows between large cities are bigger than those between small cities, shorter delivery times are more often promised between large cities than between small cities due to strong competition. Thus, the third variants of MVT and SVT are the Minimax Distance Violation Tree Problem with Pairwise Distance Restriction (PMVT) and the Minisum Distance Violation Tree Problem with Pairwise Distance Restriction (PSVT), respectively.

1.3 Notation and Definition

Let $G = (V, E)$ be an undirected graph, where $V$ is the set of nodes $v_i$ and $E$ is the set of edges $e_{ij}$. Without loss of generality, we assume $i < j$ for every edge $e_{ij}$. Also let $|V| = n$ and $|E| = m$. Suppose that each edge $e_{ij} \in E$ is associated with a positive length $l_{ij}$. A node that is incident to only one edge is a leaf node; otherwise it is an internal node. An edge incident to a leaf node is a leaf edge. A path from node $v_i$ to node $v_j$ is called a $v_i$-$v_j$ path.
For two nodes $u$ and $v$, the shortest path distance in a graph $G$ is denoted by $d_G(u, v)$. A spanning tree of $G$ is a connected subgraph $T = (V, E_T)$ without cycles. The distance between node $v_i$ and node $v_j$ in $T$, $d_T(v_i, v_j)$, is the sum of the lengths of the edges on the unique $v_i$-$v_j$ path. The diameter path of $T$, $D_T$, is defined as the longest path in $T$ among all paths connecting a pair of nodes in $V$. The diameter of $T$, $\delta_T$, is then the length of $D_T$. For $v \in V$ and $U \subseteq V$, define $D(v, U) = \max_{u \in U} \{ d_G(v, u) \}$, the maximum of the shortest path lengths from node $v$ to any node in the set $U$. For a spanning tree $T$, $D_T(v, U)$ is defined similarly, except that the distance is measured on the tree $T$. We define nodes $v_i \in V$ and $v_j \in V$ as adjacent in the tree $T$ if edge $e_{ij} \in E_T$.

The endpoints of a path are the two nodes where the path starts and ends. All the other nodes of a path are its internal nodes. Let $v_1 v_2 \ldots v_k$ be path A and $u_1 u_2 \ldots u_l$ be path B in the graph. We say path A and path B are disjoint if $v_i \neq u_j$ for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, l$. Path A and path B intersect if there exists a node $v_p$ on path A coinciding with a node $u_q$ on path B. In addition, we call $v_p$ an intersection node.
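The definitions of this chapter can be illustrated with a minimal sketch that evaluates a given spanning tree. It computes the tree distances $d_T(v_i, v_j)$ and then the maximum and the sum of the differences $d_T(v_i, v_j) - H_{ij}$ over ordered pairs. The function names, the edge-list format, and the `clip_at_zero` flag are illustrative assumptions; in particular, whether negative differences count toward the sum is not specified in the text above, so both conventions are offered.

```python
def tree_distances(n, tree_edges):
    """All-pairs path lengths d_T(i, j) in a tree on nodes 0..n-1.

    tree_edges: list of (i, j, length) triples forming a spanning tree.
    """
    adj = [[] for _ in range(n)]
    for i, j, length in tree_edges:
        adj[i].append((j, length))
        adj[j].append((i, length))
    dist = [[0.0] * n for _ in range(n)]
    for root in range(n):
        # Depth-first walk; in a tree each node is reached by exactly one path.
        stack = [(root, -1, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            dist[root][node] = d
            for nxt, length in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, d + length))
    return dist


def distance_violations(n, tree_edges, H, clip_at_zero=False):
    """Return (max, sum) of d_T(i, j) - H[i][j] over ordered pairs i != j.

    clip_at_zero=True counts only positive differences (an assumption,
    not a convention taken from the text).
    """
    dist = tree_distances(n, tree_edges)
    diffs = []
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = dist[i][j] - H[i][j]
                diffs.append(max(diff, 0.0) if clip_at_zero else diff)
    return max(diffs), sum(diffs)
```

With a uniform restriction, `H[i][j]` is the same constant for all pairs; with a node restriction, row `i` of `H` is filled with $H_i$; with a pairwise restriction, `H` is an arbitrary matrix.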

CHAPTER 2 LITERATURE REVIEW

To the best of our knowledge, the MVT, SVT, F-MVT, and F-SVT have not been studied in the literature. However, several related problems exist. We review this literature below.

2.1 Optimal Communication Spanning Tree (OCST)

The Optimal Communication Spanning Tree Problem (OCST) was first introduced by Hu (1974). To our knowledge, it is the only line of work that considers flow in conjunction with tree structures. Given a set of requirements $r_{ij}$ in a graph $G = (V, E)$, which may represent, for example, the number of telephone calls between $v_i$ and $v_j$, the cost of communication of a given spanning tree is defined as follows. For a pair of nodes $v_i$ and $v_j$, there is a unique path in the spanning tree between them. The cost of communication for the pair of nodes $v_i$ and $v_j$ is $r_{ij}$ multiplied by the length of that path, $d_T(i, j)$. Summing over all $\binom{n}{2}$ pairs of nodes, we obtain the cost of the spanning tree. The problem is to build a spanning tree connecting all nodes such that the total cost of communication is minimum among all spanning trees. Johnson et al. (1978) demonstrate that the OCST problem is NP-hard. Since Hu's introduction of the problem, few results for the general OCST problem have been attained. Ahuja and Murty (1987) provide an exact algorithm for small problems. Their heuristic algorithm has two phases, a tree-building phase and a tree-improvement phase, and each phase requires $O(n^3)$ computations. Peleg (1997) describes an $O(\ln^{5/2} n \ln D)$-approximation solution, where $D$ denotes the maximum distance between any two vertices in $G$. Peleg and Reshef (1998) show that the OCST can be transformed into the Minimum Average Stretch Spanning Tree problem (MAST). In the MAST, there is an undirected connected graph with distance weights $w_{i,j} > 0$ and multiplicities $m_{i,j} \geq 0$ for every edge $e_{i,j} \in E$. Let $M = \sum_{i,j} m_{i,j}$.
For a spanning tree $T$ of $G$, the stretch over the vertex pair $v_i, v_j$ is defined as $d_T(i, j)/w_{i,j}$, and the average stretch of $T$ is

$$S(T) = \frac{1}{M} \sum_{e_{i,j} \in E} m_{i,j} \, \frac{d_T(i, j)}{w_{i,j}}.$$

The objective of the MAST is to minimize $S(T)$. Based on an algorithm for the MAST, Peleg and Reshef (1998) establish a polynomial-time approximation algorithm for the OCST with approximation ratio $O(\log^2 n)$. Because of the complexity of the OCST, most communication spanning tree research focuses on simplified versions of the OCST. By assuming that the length of every arc is one, the general problem reduces to the Optimal Requirement Spanning Tree Problem (ORST). Hu (1974) establishes that the ORST can be solved in polynomial time when the input is a complete graph. The algorithm, known as the Cut Tree Algorithm, is based on solving $n - 1$ maximal flow problems, where $n$ is the number of nodes. However, this algorithm does not work on a general graph. Another way to simplify the OCST is to assume that the requirements, which are the required flows between pairs of nodes, are one for all pairs. This problem is known as the Optimal Distance Spanning Tree Problem (ODST). Wu (2002) proves that the k source ODST (k-ODST) is NP-hard for a metric graph even if $k = 2$. In the k-ODST, flow originates only at the $k$ source nodes, and all vertices are destinations. Hu (1974) establishes sufficient conditions for a star tree, which has only one internal node, to be the optimal solution for the ODST. These conditions are based on the relative size of the arc lengths. Much of the research on the ODST focuses on finding approximation algorithms. Wu et al. (2000a) show that the ODST k-star tree, which has at most $k$ internal nodes and can be found in polynomial time, is a $((k + 3)/(k + 1))$-approximation solution for the metric ODST problem. In particular, a special 2-star tree, which can be found in $O(n^3)$ time, is an approximation solution. Moreover, Wu et al. (2000c) present another algorithm based on finding general stars, which generalize the shortest-paths tree. It provides a 2-approximation solution in $O(n^2 + f(G))$ time, where $f(G)$ is the time complexity of computing all-pairs shortest paths of the input graph $G$ and $n$ is the number of vertices of $G$.
This algorithm can also achieve a $15/8$-approximation solution in $O(n^3)$ time, a $3/2$-approximation solution in $O(n^4)$ time, and a $(4/3 + \varepsilon)$-approximation solution in $O(n^{\delta})$ time, where $\varepsilon > 0$ and $\delta$ is $(33\varepsilon + 8)/(9\varepsilon)$. Another result established by Wu et al. (1999) is a $(1 + \varepsilon)$-approximation solution in $O(n^{2\lceil 2/\varepsilon \rceil - 2})$ time. Researchers have also tried to find other ways to simplify the OCST without losing the generality of the problem. One attempt is to restrict the number of source nodes, so that flow originates only at source nodes, as in the k-ODST. Provided there are $p$ sources in the graph, the OCST becomes the p-source OCST problem (p-OCST). Wu (2004a) proves that, for any fixed integer $p \geq 2$, the p-OCST is NP-hard in a metric graph. He offers a 2-approximation solution in $O(n^{p-1})$ time using a greedy algorithm. For the 2-OCST, he proposes an algorithm that yields a 3-approximation solution in $O(|E| + |V| \log |V|)$ time, and the time complexity can be reduced to $O(|E|)$ if the edge lengths are all integers.
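To make the OCST objective defined at the start of this section concrete, the following sketch evaluates the communication cost $\sum_{i<j} r_{ij} \, d_T(i, j)$ of a spanning tree and finds a minimum-cost tree by enumerating every set of $n - 1$ edges. This is not one of the cited algorithms; it is a brute-force illustration whose function names and data layout are assumptions, and its exponential running time limits it to tiny instances.

```python
from itertools import combinations


def tree_dist(n, edges):
    """All-pairs path lengths in the tree given by (i, j, length) edges."""
    adj = {v: [] for v in range(n)}
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [[None] * n for _ in range(n)]
    for root in range(n):
        dist[root][root] = 0.0
        stack = [root]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if dist[root][nxt] is None:
                    dist[root][nxt] = dist[root][node] + w
                    stack.append(nxt)
    return dist


def is_spanning_tree(n, edges):
    """n - 1 edges form a spanning tree iff none of them closes a cycle."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False
        parent[ri] = rj
    return True


def communication_cost(n, edges, r):
    """Sum of r[i][j] * d_T(i, j) over all unordered pairs i < j."""
    dist = tree_dist(n, edges)
    return sum(r[i][j] * dist[i][j] for i, j in combinations(range(n), 2))


def brute_force_ocst(n, graph_edges, r):
    """Return (cost, tree) of a cheapest spanning tree, by full enumeration."""
    best = None
    for cand in combinations(graph_edges, n - 1):
        if is_spanning_tree(n, cand):
            cost = communication_cost(n, cand, r)
            if best is None or cost < best[0]:
                best = (cost, list(cand))
    return best
```

Setting every `r[i][j]` to one turns the same evaluation into the ODST objective described above.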

Another way to simplify the OCST is to make certain assumptions on the communication requirement between nodes. Wu et al. (2000a) assume that the requirement is the sum of the weights of the two nodes and call this problem the Optimal Sum-Requirement Communication Spanning Tree Problem (SR-OCST). For the SR-OCST, they provide an $O(n^3)$-time 2-approximation solution. Wu et al. (2000a) also investigate the assumption that the requirement is the product of the weights of the two nodes. This problem is the Optimal Product-Requirement Communication Spanning Tree Problem (PR-OCST). They demonstrate that the k-star tree is a $((k + 3)/(k + 1))$-approximation solution for the metric PR-OCST problem. Specifically, the PR-OCST 2-star, which can be found in $O(n^5)$ time by solving a series of min-cut problems, is an approximation solution. Furthermore, Wu et al. (2000b) give a more general result: there exists a polynomial time approximation scheme for the PR-OCST with time complexity $O(k^{k-1} n^{2k} (q/\lambda)^k)$ and approximation ratio $((1 + q^{-1})^2 + \lambda((k + 3)^2/(k + 1)))((k + 3)/(k + 1))$ for any positive integers $q$ and $k$ and positive number $\lambda < 1$. In this algorithm, $q$ is the vertex weight scaling factor, $k$ is the number of internal nodes in the tree, and $\lambda$ is the weight threshold factor that divides the vertices into light vertices and heavy vertices.

2.2 Hop-Constrained Minimum Spanning Tree (HC-MST)

Another related problem is the Hop-Constrained Minimum Spanning Tree Problem (HC-MST). Referring to each arc as a hop, hop constraints limit the number of hops between nodes and can be viewed as a kind of distance constraint. In this literature, the objective of the HC-MST is to find the minimum spanning tree $T$ such that the number of hops (arcs) in the unique path from a single root node to any other node is not greater than a constant number $H$.
By reducing the problem to the Simple Uncapacitated Plant Location Problem, Dahl (1998) proves that the 2-hop constrained minimum spanning tree problem is NP-hard. Manyem and Stallmann (1996) show that the HC-MST is not in APX by reducing it to a Set Covering problem. That is, unless P = NP, it is not possible to find a polynomial time heuristic that guarantees a constant approximation bound. Many different integer programming (IP) formulations have been proposed for the problem. Gouveia (1995) provides several node-oriented formulations, in which the variables $X_{ij}$ represent whether arc $(i, j)$ is in the minimal spanning tree and the variables $U_i$ specify the position of node $i$ in the tree. He builds different formulations for the problem by adding or lifting specific valid constraints. Linear programming relaxation and Lagrangian relaxation combined with subgradient optimization are used to obtain lower bounds. Gouveia (1996) offers several multicommodity flow formulations for the problem. He strengthens the model by lifting the hop constraints, uses an arc elimination test to reduce the number of arc variables, obtains lower bounds by Lagrangian relaxation, and uses arc exchange heuristics to transform a spanning tree into a feasible hop-constrained spanning tree. Gouveia offers further formulations, including an extended hop-path formulation (Gouveia, 1998) and a hop-dependent multicommodity flow formulation (Gouveia, 1998). Gouveia and Requejo (2001) further improve the Lagrangian relaxation of the hop-dependent multicommodity flow formulation by dualizing two special sets of constraints. Other algorithms have also been applied to the problem. Voss (1999) uses tabu search to improve a feasible initial solution. Althaus et al. (2005) build an $O(\log n)$-approximation algorithm with running time $O(n^5 k)$ for the k-hop constrained minimum spanning tree problem.

2.3 k Source Minimum Max-Eccentricity Spanning Tree (MEST)

The k Source Minimum Max-Eccentricity Spanning Tree Problem (k-MEST) is closely related to the MVT and F-MVT. Given the source node set $S$ with $|S| = k$ and the sink node set $U$ in a graph $G$, the eccentricity of a source in a spanning tree $T$ is the longest distance from that source to any sink node. The k-MEST problem is to find a spanning tree that minimizes the maximal source eccentricity over all sources in $S$. Farley et al. (2000) explore the case of uniform edge lengths. First, they prove that for the k-MEST with uniform edge lengths, where $k$ is the number of source nodes, there exists either a vertex $x$ or an edge $(y, z)$ such that the shortest-path tree rooted at either $x$ or the midpoint of $(y, z)$ minimizes the maximal source eccentricity.
They then give an exact polynomial algorithm for the k-MEST with uniform edge lengths that solves many shortest-path problems, and they build an exact pseudo-polynomial algorithm for the general k-MEST by dividing each edge into edges of uniform length. Krumme and Fragopoulou (2001) prove the more general result that the minimum max-eccentricity spanning tree is a shortest-path tree rooted at either a vertex or a created vertex lying on an edge. By identifying the appropriate edge that can be cut to create a new vertex from which to construct the optimal shortest-path spanning tree, they offer a polynomial algorithm with running time $O(|V|^3 + |E||V|)$. Based on a similar idea, McMahan and Proskurowski (2004) establish another exact polynomial algorithm with running time $O(|V|^3 + |E||V| \log |V|)$. In addition, McMahan and Proskurowski (2004) show that all edges can be partitioned into intervals such that all points in the same interval share an identical shortest-path spanning tree, and that there are at most $|V| + 1$ such intervals on any given edge. Thus, they design another exact polynomial algorithm by finding all shortest paths for every different interval. Wu (2004b) describes a faster exact polynomial algorithm with $O(|V|^2 \log |V| + |V||E|)$ time.
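The k-MEST objective and the role of shortest-path trees can be sketched as follows. The code evaluates the maximal source eccentricity of a tree and then tries the shortest-path tree rooted at each vertex. This is only an illustration under simplifying assumptions: the results cited above also allow roots at interior points of edges, which this sketch omits, so in general it returns an upper bound rather than the optimum. All names are hypothetical, and edge lengths are assumed positive so that Dijkstra's algorithm applies.

```python
import heapq


def shortest_path_tree(n, adj, root):
    """Dijkstra from root; returns the tree as (parent, child, length) edges."""
    dist = [float("inf")] * n
    parent = [None] * n
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, length in adj[v]:
            if d + length < dist[w]:
                dist[w] = d + length
                parent[w] = (v, length)
                heapq.heappush(heap, (dist[w], w))
    return [(parent[v][0], v, parent[v][1])
            for v in range(n) if parent[v] is not None]


def max_source_eccentricity(n, tree_edges, sources, sinks):
    """c(T): max over sources s of the max over sinks u of d_T(s, u)."""
    tadj = {v: [] for v in range(n)}
    for i, j, length in tree_edges:
        tadj[i].append((j, length))
        tadj[j].append((i, length))
    worst = float("-inf")
    for s in sources:
        dist = {s: 0.0}
        stack = [s]
        while stack:
            v = stack.pop()
            for w, length in tadj[v]:
                if w not in dist:
                    dist[w] = dist[v] + length
                    stack.append(w)
        worst = max(worst, max(dist[u] for u in sinks))
    return worst


def best_vertex_rooted_spt(n, edges, sources, sinks):
    """Best (cost, root, tree) among shortest-path trees rooted at vertices."""
    adj = {v: [] for v in range(n)}
    for i, j, length in edges:
        adj[i].append((j, length))
        adj[j].append((i, length))
    best = None
    for root in range(n):
        tree = shortest_path_tree(n, adj, root)
        cost = max_source_eccentricity(n, tree, sources, sinks)
        if best is None or cost < best[0]:
            best = (cost, root, tree)
    return best
```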

CHAPTER 3 MINIMAX DISTANCE VIOLATION TREE PROBLEMS

In this chapter, we examine the complexity of the MVT and F-MVT problems and find some to be polynomially solvable and others to be NP-complete. Section 3.1 identifies the optimal solution for the UMVT problem. Section 3.2 shows that the NMVT problem can be transformed into a special k-MEST problem. In Section 3.3, the NP-completeness of the PMVT problem is established. In Section 3.4, we describe an alternative polynomial-time algorithm for the NMVT problem. In Section 3.5, we study the complexity of the variants of F-MVT and show that some variants can be solved by extending the algorithm developed in Section 3.4 for the NMVT. Finally, we conclude in Section 3.6.

3.1 Minimax Distance Violation Tree with Uniform Distance Restriction (UMVT)

In the UMVT, because the distance restriction is the same for every pair of nodes, the worst case in terms of violating the distance restriction is achieved by the longest path in the tree. Because the longest path in a tree plays an important role in finding the optimal solution to the UMVT problem, we first introduce the Minimum Diameter Spanning Tree Problem (MDST). The MDST is the problem of finding a spanning tree of a graph that minimizes the diameter of the spanning tree. Theorem 1 states the relationship between the MDST and the UMVT.

Theorem 1 Any MDST is an optimal solution to the UMVT.

Proof: Because the distance restriction for any pair of nodes is the same under the uniform distance restriction, the maximal distance violation of a tree is attained on the longest path of the tree; that is, it equals $\delta_T - H$. Hence, the smaller the diameter of a tree, the smaller its maximal distance violation. Thus, minimizing the maximal distance violation reduces to finding the minimum diameter spanning tree.

Given this result, we have a polynomial time algorithm for the UMVT problem.
Hassin and Tamir (1995) show that the optimal solution to the MDST can be found by an $O(mn + n^2 \log n)$ time algorithm, which can thus be used to solve the UMVT problem as well.
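The argument behind Theorem 1 can be checked on small trees with the following sketch. It computes the diameter of a tree with the standard two-sweep method (find the node farthest from an arbitrary start, then the node farthest from that one), which is valid for trees with nonnegative edge lengths, and reports the uniform maximal violation as the diameter minus $H$. It only evaluates given trees; it is not the Hassin and Tamir (1995) algorithm, and the function names are assumptions.

```python
def farthest(adj, start):
    """Return (node, distance) of the node farthest from start in a tree."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        v = stack.pop()
        for w, length in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + length
                stack.append(w)
    node = max(dist, key=dist.get)
    return node, dist[node]


def tree_diameter(tree_edges):
    """Length of the longest path in a tree given as (i, j, length) edges."""
    adj = {}
    for i, j, length in tree_edges:
        adj.setdefault(i, []).append((j, length))
        adj.setdefault(j, []).append((i, length))
    a, _ = farthest(adj, next(iter(adj)))  # first sweep: one end of a diameter path
    _, diameter = farthest(adj, a)         # second sweep: its length
    return diameter


def uniform_max_violation(tree_edges, H):
    """Maximal violation under a uniform restriction H: diameter minus H."""
    return tree_diameter(tree_edges) - H
```

For example, with $H = 5$, a star with edge lengths 2, 3, and 4 has diameter 7 and maximal violation 2, while the path with edge lengths 2, 3, and 1 on the same four nodes has diameter 6 and maximal violation 1, so the tree with the smaller diameter is preferred.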

3.2 Minimax Distance Violation Tree with Node Distance Restrictions (NMVT)

We show that the NMVT can be transformed into a special case of the k-MEST problem and then solved by an algorithm for the k-MEST problem. Let $G = (V, E)$ be a simple, connected, undirected graph with node set $V$ and edge set $E$. Let $l_{ij}$ denote the edge length, with $l_{ij} > 0$ for every edge $e_{ij} \in E$. For the k-MEST as defined before, if we denote by $c(T)$ the maximal source eccentricity in a tree $T$, then

$$c(T) = \max_{s \in S} \{ \max_{u \in U} \{ d_T(s, u) \} \},$$

where $S$, with $|S| = k$, and $U$ are the source node set and the sink node set in $G$, respectively. For the NMVT, if we let $v(T)$ denote the maximal distance violation of a tree $T$, then

$$v(T) = \max_{v_i \in V} \{ \max_{v_j \in V} \{ d_T(v_i, v_j) - H_i \} \},$$

where $H_i$ is the node distance restriction for node $v_i$. Given the node distance restriction, the maximal distance violation from a node in a tree is attained on the longest path from that node in the tree. As both the k-MEST problem and the NMVT problem are concerned with the longest distance from nodes in a tree, the two problems are closely related. In fact, an instance of the NMVT problem can be transformed into a special case of the k-MEST problem as follows. First, we build the input graph $G'$ for the k-MEST problem from $G$. For each node $v_i \in V$, we create a new node $u_i$ and connect $v_i$ and $u_i$ by a new edge $e_{v_i u_i}$ with a negative edge length $l_{v_i u_i} = -H_i < 0$. Let $G'$ be the graph with node set $V'$ and edge set $E'$, where $V' = V \cup \{u_1, u_2, \ldots, u_n\}$ and $E' = E \cup \{e_{v_1 u_1}, e_{v_2 u_2}, \ldots, e_{v_n u_n}\}$. Then, $|V'| = 2n$ and $|E'| = |E| + |V| = m + n$. Next, we set the source and sink node sets for the k-MEST problem to $S = V = \{v_1, v_2, \ldots, v_n\}$ and $U = \{u_1, u_2, \ldots, u_n\}$, respectively. Clearly, only the sink nodes are leaf nodes in $G'$.
In this paper, we refer to this special k-MEST problem as the n-MEST problem. Figure 3.1 shows an example of the n-MEST problem transformed from an instance of the NMVT problem, in which there are four nodes $v_1$, $v_2$, $v_3$, and $v_4$ in $G$ with 2, 4, 6, and 3 as the node distance restrictions $H_1$, $H_2$, $H_3$, and $H_4$, respectively. Let $T'$ be any solution tree for the n-MEST problem of $G'$. As all the sink nodes are leaf nodes in $G'$, all the sink nodes are still leaf nodes in $T'$ and all the leaf edges must be in $T'$. If we delete all the sink nodes and all the leaf edges from $T'$, we obtain a tree $T$ of $G$. Conversely, given any tree $T$ of $G$, we can construct a tree $T'$ of $G'$ by adding to $T$ all the sink nodes and their incident edges in $G'$. Observation 1 relates $c(T')$ and $v(T)$ for such a pair of corresponding trees.
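As a small numerical check of the construction above, the following sketch attaches a sink $u_i$ to each node $v_i$ by an edge of length $-H_i$ and compares the maximal violation $v(T)$ of a tree $T$ with the maximal source eccentricity $c(T')$ of the extended tree $T'$. The restrictions 2, 4, 6, and 3 are those of the example in the text; the tree shape and its edge lengths are hypothetical, since the figure's lengths are not given here, and all function names are assumptions.

```python
def adjacency(edges):
    """Adjacency lists for an undirected edge list of (i, j, length)."""
    adj = {}
    for i, j, length in edges:
        adj.setdefault(i, []).append((j, length))
        adj.setdefault(j, []).append((i, length))
    return adj


def tree_dist_from(adj, start):
    """Path lengths from start to every node of a tree (lengths may be negative)."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        v = stack.pop()
        for w, length in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + length
                stack.append(w)
    return dist


def add_sinks(n, edges, H):
    """Extend edges on nodes 0..n-1 with a sink n + i per node, length -H[i].

    Returns (extended edges, source nodes S, sink nodes U).
    """
    pendant = [(i, n + i, -H[i]) for i in range(n)]
    return list(edges) + pendant, list(range(n)), list(range(n, 2 * n))


def max_violation(n, tree_edges, H):
    """v(T) = max over i, j of d_T(v_i, v_j) - H[i]."""
    adj = adjacency(tree_edges)
    return max(max(tree_dist_from(adj, i).values()) - H[i] for i in range(n))


def max_source_eccentricity(tree_edges, sources, sinks):
    """c(T') = max over sources s and sinks u of d_T'(s, u)."""
    adj = adjacency(tree_edges)
    worst = float("-inf")
    for s in sources:
        dist = tree_dist_from(adj, s)
        worst = max(worst, max(dist[u] for u in sinks))
    return worst


# Hypothetical tree v1 - v2 - v3 - v4 with lengths 3, 2, 4 (nodes indexed from 0).
T = [(0, 1, 3.0), (1, 2, 2.0), (2, 3, 4.0)]
H = [2.0, 4.0, 6.0, 3.0]
T_prime, S, U = add_sinks(4, T, H)
```

On this instance both `max_violation(4, T, H)` and `max_source_eccentricity(T_prime, S, U)` evaluate to 7.0, attained by the pair $v_1$, $v_4$ at tree distance 9 with $H_1 = 2$.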

Figure 3.1: An example of the n-MEST problem transformed from the NMVT problem

Observation 1 $c(T') = v(T)$.

Proof: Clearly, $d_{T'}(v_i, v_j) = d_T(v_i, v_j)$ for any $v_i, v_j \in V$. Then, because the nodes $u_i \in U$ and $v_i \in V$ are in one-to-one correspondence and $S = V$,

$$\begin{aligned}
c(T') &= \max_{v_i \in S} \max_{u_j \in U} \{ d_{T'}(v_i, u_j) \} \\
&= \max_{v_i \in S} \max_{v_j \in V} \{ d_T(v_i, v_j) - H_j \} \\
&= \max_{v_i \in V} \max_{v_j \in V} \{ d_T(v_i, v_j) - H_j \} \\
&= v(T),
\end{aligned}$$

where the last equality uses the symmetry $d_T(v_i, v_j) = d_T(v_j, v_i)$ to exchange the roles of $i$ and $j$.

Then, we have the following corollary on the optimal solution of the NMVT problem.

Corollary 1 A tree $T$ is the optimal NMVT in $G$ if and only if the corresponding $T'$ is the optimal n-MEST in $G'$.

Next, we study how to solve the n-MEST problem. The fastest algorithm for the k-MEST problem on a graph with positive edge lengths is offered by Wu (2004b). The algorithm runs in $O(n^2 \log n + mn)$ time. However, it is not clear whether it can be applied to the n-MEST problem, since some edges of $G'$ have negative lengths. We shall establish that Wu's algorithm can solve the n-MEST problem in the same computation time as required for the k-MEST problem. For the n-MEST problem, on any tree $T'$ of $G'$, since all the sink nodes are leaf nodes and all the source nodes are internal nodes, any simple path between two source nodes contains only source nodes and hence consists only of edges in $E$. We denote by $\xi_{T'} = \max_{v_1, v_2 \in S} \{ d_{T'}(v_1, v_2) \}$ the maximum of the intra-source distances on $T'$. An edge $e_{m_1 m_2} \in E$ on $T'$ is a central edge if $\min\{ d_{T'}(v, m_1), d_{T'}(v, m_2) \} \leq \xi_{T'}/2$ for any $v \in S$ and $e_{m_1 m_2}$ is on the longest intra-source path. Then the next two lemmas in Wu (2004b) still hold for $T'$.

Lemma 1 (Wu, 2004b) Let $T'$ be a tree of $G'$ and $e_{m_1 m_2} \in E$. If we remove $e_{m_1 m_2}$ from $T'$, we obtain two trees $T'_1$ and $T'_2$ such that $m_1$ and $m_2$ are in $T'_1$ and $T'_2$, respectively. Let $S_1$ and $S_2$ be the sets of all the source nodes in $T'_1$ and $T'_2$, respectively. Let $U_1$ and $U_2$ be the sets of all the sink nodes in $T'_1$ and $T'_2$, respectively. Then, the edge $e_{m_1 m_2}$ is a central edge of $T'$ if and only if both $S_1$ and $S_2$ are not empty and $|D_{T'}(m_1, S_1) - D_{T'}(m_2, S_2)| \leq l_{m_1 m_2}$.

Since the intra-source paths studied in Lemma 1 contain only nodes in $V$ and edges in $E$, the proof of Lemma 1 follows as in Wu (2004b).

Lemma 2 (Wu, 2004b) If $e_{m_1 m_2}$ is a central edge of $T'$, then

$$c(T') = l_{m_1 m_2} + \max\{ D_{T'}(m_1, S_1) + D_{T'}(m_2, U_2), \; D_{T'}(m_2, S_2) + D_{T'}(m_1, U_1) \}.$$

Proof: By Lemma 1 and the construction of the tree $T'$, $D_{T'}(m_1, S_1) \leq D_{T'}(m_2, S_2) + l_{m_1 m_2} = D_{T'}(m_1, S_2)$, which implies that the furthest source node from $m_1$ is in $S_2$. Thus, by the triangle inequality, for any sink node $u_i \in U_1$,

$$\begin{aligned}
\max_{v \in S_1} \{ d_{T'}(u_i, v) \} &= l_{u_i v_i} + \max_{v \in S_1} \{ d_{T'}(v_i, v) \} \\
&\leq l_{u_i v_i} + \max_{v \in S_1} \{ d_{T'}(v_i, m_1) + d_{T'}(m_1, v) \} \\
&= l_{u_i v_i} + d_{T'}(v_i, m_1) + D_{T'}(m_1, S_1) \\
&\leq d_{T'}(u_i, m_1) + D_{T'}(m_2, S_2) + l_{m_1 m_2}
\end{aligned}$$

and

$$\max_{v \in S_2} \{ d_{T'}(u_i, v) \} = d_{T'}(u_i, m_1) + D_{T'}(m_2, S_2) + l_{m_1 m_2},$$

so we have

$$\begin{aligned}
\max_{v \in S} \{ d_{T'}(u_i, v) \} &= \max\{ \max_{v \in S_1} \{ d_{T'}(u_i, v) \}, \; \max_{v \in S_2} \{ d_{T'}(u_i, v) \} \} \\
&= \max_{v \in S_2} \{ d_{T'}(u_i, v) \} \\
&= d_{T'}(u_i, m_1) + D_{T'}(m_2, S_2) + l_{m_1 m_2}.
\end{aligned}$$

Taking the maximum over all nodes in $U_1$, we have

$$\max_{u \in U_1} \{ \max_{v \in S} \{ d_{T'}(u, v) \} \} = D_{T'}(m_1, U_1) + D_{T'}(m_2, S_2) + l_{m_1 m_2}.$$

Similarly,

$$\max_{u \in U_2} \{ \max_{v \in S} \{ d_{T'}(u, v) \} \} = D_{T'}(m_2, U_2) + D_{T'}(m_1, S_1) + l_{m_1 m_2}.$$

The result then follows from the definition of $c(T')$.
We define the central point as the midpoint of the longest intra-source path of $T'$. An important result on the shortest path tree rooted at the central point of the optimal tree for the k-MEST problem is proved in Farley et al. (2000) and presented in Krumme and Fragopoulou (2001), McMahan and Proskurowski (2004), and Wu (2004b). We shall extend this result to the n-MEST problem.

Lemma 3 If $T'$ is an optimal n-MEST and $o$ is the central point of $T'$, then a shortest path tree $T'_o$ rooted at $o$ in $G'$ is also optimal. (This result is similar to Theorem 1 in the first version of our paper.)

Proof: For any node $w \in S \cup U$, let $D_{T'}(w, S)$ and $D_{T'_o}(w, S)$ denote the maximal distance from $w$ to all source nodes on $T'$ and $T'_o$, respectively. For any source node $v_i \in S$, we first show that $D_{T'_o}(v_i, S) \leq D_{T'}(v_i, S)$. For any $v_j \in S$ such that $v_j \neq v_i$, because of the triangle inequality, the shortest paths from $o$ in $T'_o$, the definition of the longest intra-source distance, and Lemma 2 (the second lemma in the first version of our paper), we obtain

$$\begin{aligned}
d_{T'_o}(v_i, v_j) &\leq d_{T'_o}(v_i, o) + d_{T'_o}(o, v_j) \\
&\leq d_{T'}(v_i, o) + d_{T'}(o, v_j) \\
&\leq d_{T'}(v_i, o) + \xi_{T'}/2 \\
&= D_{T'}(v_i, S).
\end{aligned}$$

It follows directly that $D_{T'_o}(v_i, S) \leq D_{T'}(v_i, S)$. Therefore, for any sink node $u_i \in U$,

$$D_{T'_o}(u_i, S) = l_{u_i v_i} + D_{T'_o}(v_i, S) \leq l_{u_i v_i} + D_{T'}(v_i, S) = D_{T'}(u_i, S).$$

Hence,

$$c(T'_o) = \max_{u_i \in U} \{ D_{T'_o}(u_i, S) \} \leq \max_{u_i \in U} \{ D_{T'}(u_i, S) \} = c(T').$$

That is, $T'_o$ is also optimal.

Then, letting $\mathcal{T}_A$ be the set of all the shortest path trees in $G'$, the following observation is true.

Observation 2 For a tree $T' \in \mathcal{T}_A$, if $c(T')$ is the minimum among all the trees in $\mathcal{T}_A$, then $T'$ is an optimal n-MEST.

For any edge $e_{m_1 m_2} \in E$, there are certain trees in $G'$ such that $e_{m_1 m_2}$ is their central edge, all the source nodes connect with either $m_1$ or $m_2$ by a shortest path, and all the sink nodes connect with their unique adjacent source nodes. We define the set $\mathcal{T}_B$ as the set of all these trees, taken over all $e_{m_1 m_2} \in E$ as the central edge. For any tree $T' \in \mathcal{T}_A$, since its central edge is an edge in $E$, without loss of generality, let $e_{m_1 m_2} \in E$ be the central edge of $T'$. Then, $e_{m_1 m_2}$ divides $T'$ into two trees $T'_1$ and $T'_2$, $S$ into $S_1$ and $S_2$, and $U$ into $U_1$ and $U_2$ as in Lemma 1.
Clearly, all the nodes in $S_1$ ($S_2$) are connected with $m_1$ ($m_2$) by a shortest path and all the sink nodes are connected with their unique adjacent source nodes in $T$. Hence, $T \in \mathcal{T}_B$ if $T \in \mathcal{T}_A$, and then $\mathcal{T}_A \subseteq \mathcal{T}_B$. However, for a tree $T \in \mathcal{T}_B$, although all the nodes

are connected by shortest paths to either $m_1$ or $m_2$, $T$ may not be a shortest path tree on the whole. Then, $\mathcal{T}_A$ is a proper subset of $\mathcal{T}_B$, that is, $\mathcal{T}_A \subset \mathcal{T}_B$. Therefore, based on Observation 2, we have the following observation.

Observation 3 For a tree $T \in \mathcal{T}_B$, if $c(T)$ is the minimum among all the trees in $\mathcal{T}_B$, then $T$ is an optimal n-MEST.

In order to adapt Wu's algorithm, we now construct a new graph $G' = (V', E')$ from $G$, where $V' = V$ and $E'$ contains all the edges connecting any source node $v_i \in S$ with all other nodes in $V$, with the simple shortest path distances in $G$ as the edge lengths. Clearly, $E \subseteq E'$ since each edge in $E$ is also the simple shortest path between its two endpoints. We then define a 2-star tree in $G'$ as a tree with at most two internal nodes. In the case of two internal nodes, the edge between the two internal nodes is the unique central edge of the tree.

Then, for any tree $T \in \mathcal{T}_B$ in $G$ with an edge $e_{m_1 m_2} \in E$ as its central edge, we can define a corresponding tree $T'$ in $G'$ as follows. As in Lemma 1, $e_{m_1 m_2}$ divides $T$ into two trees $T_1$ and $T_2$, $U$ into $U_1$ and $U_2$, and $S$ into $S_1$ and $S_2$, where $m_1 \in S_1$ and $m_2 \in S_2$. Then, if we connect each node $w_1 \in S_1 \cup U_1$ to $m_1$ by $e_{w_1 m_1}$, each node $w_2 \in S_2 \cup U_2$ to $m_2$ by $e_{w_2 m_2}$, and add the edge $e_{m_1 m_2}$, we build a new tree $T'$ in $G'$. Clearly, there are only two internal nodes on $T'$. Figure 3.2 shows an example of a tree $T \in \mathcal{T}_B$ in $G$ and its corresponding tree $T'$ in $G'$. Lemma 4 states that the corresponding $T'$ is a 2-star in $G'$ where $e_{m_1 m_2}$ is still its central edge and $c(T') = c(T)$, given that $S$ and $U$ are still the source and sink node sets, respectively, in $G'$.

Figure 3.2: An example of an original tree $T \in \mathcal{T}_B$ in $G$ and its corresponding tree $T'$ in $G'$. (a) The original tree $T \in \mathcal{T}_B$ in $G$. (b) The corresponding tree $T'$ in $G'$.

Lemma 4 For any tree $T \in \mathcal{T}_B$ with some edge $e_{m_1 m_2} \in E$ as its central edge, there exists a corresponding 2-star tree $T'$ in $G'$ with $e_{m_1 m_2}$ still as its central edge and $c(T') = c(T)$.
Proof: By the definition of $\mathcal{T}_B$ and by the construction of $G'$, the distances from

every node to $m_1$ and $m_2$ are the same in $T$ and $T'$. Therefore, $e_{m_1 m_2}$ is also the central edge of $T'$ by Lemma 1, and then $c(T') = c(T)$ by Lemma 2.

We then define a tree set $\mathcal{T}_C$ for $G'$ such that $\mathcal{T}_C$ consists of all the 2-star trees in one-to-one correspondence with the trees in $\mathcal{T}_B$. The following observation is true.

Observation 4 For a tree $T \in \mathcal{T}_B$ and a tree $T' \in \mathcal{T}_C$, if $c(T)$ and $c(T')$ are the minima among all the trees in $\mathcal{T}_B$ and $\mathcal{T}_C$ respectively, then $c(T) = c(T')$.

Now, we define another 2-star tree set $\mathcal{T}_D$ for $G'$ in order to investigate $\mathcal{T}_C$. We first partition $S$ into two source sets $S_1$ and $S_2$ and $U$ into two sink sets $U_1$ and $U_2$. Then, for any edge $e_{m_1 m_2} \in E$, if we build one star tree rooted at $m_1$ spanning all nodes in $S_1$ and $U_1$, another star tree rooted at $m_2$ spanning all nodes in $S_2$ and $U_2$, and add $e_{m_1 m_2}$ to connect these two star trees, we create a tree $T'$ of $G'$ with only $m_1$ and $m_2$ as the internal nodes. By this construction, since there are at most $2^{2n}$ ways to partition $S$ and $U$, we could obtain at most $2^{2n}$ different trees for an edge $e_{m_1 m_2}$, where some of them are 2-star trees with $e_{m_1 m_2}$ as their central edge. We then define the 2-star set $\mathcal{T}_D$ as all the 2-star trees in $G'$ with an edge $e_{m_1 m_2} \in E$ as their central edge. Clearly, $\mathcal{T}_C \subseteq \mathcal{T}_D$, as all the trees in $\mathcal{T}_C$ are also 2-star trees with some edge in $E$ as the central edge. Then, we have the following observation.

Observation 5 For a tree $T' \in \mathcal{T}_D$, if $c(T')$ is the minimum among all the trees in $\mathcal{T}_D$, then $c(T')$ is a lower bound for the cost of all the trees in $\mathcal{T}_C$. That is, $c(T') \le c(T'_c)$ for any $T'_c \in \mathcal{T}_C$. In addition, $c(T')$ is also a lower bound for the cost of all the trees in $\mathcal{T}_B$ based on Observation 4. That is, $c(T') \le c(T_b)$ for any $T_b \in \mathcal{T}_B$.

Next, we shall focus on finding the tree minimizing $c(T')$ among all $T' \in \mathcal{T}_D$. The algorithm tries all edges in $E$. For each edge $e_{m_1 m_2}$, it finds the best 2-star with $e_{m_1 m_2}$ as the central edge.
In Wu (2004b), a method is provided to examine the 2-star trees in a complete graph. We shall follow this method to study $c(T')$ for $T' \in \mathcal{T}_D$. Notation similar to that of Wu (2004b) is introduced first. For a tree $T' \in \mathcal{T}_D$ with $e_{m_1 m_2}$ as the central edge, let $(S_1, S_2)$ of $S$ and $(U_1, U_2)$ of $U$ be its associated bipartitions. We define the xy-pair to be the ordered pair in which $x = D_{T'}(m_1, S_1)$ and $y = D_{T'}(m_2, S_2)$, and the pq-pair to be the ordered pair in which $p = D_{T'}(m_1, U_1)$ and $q = D_{T'}(m_2, U_2)$. By Lemma 2, $c(T') = l_{m_1 m_2} + \max\{x + q, y + p\}$ since $e_{m_1 m_2}$ is the central edge. However, for all $T' \in \mathcal{T}_D$ with $e_{m_1 m_2}$ as the central edge, there are at most $n^2$ different xy-pairs in the $2^n$ bipartitions of $S$ and at most $n^2$ different pq-pairs in the $2^n$ bipartitions of $U$. Among all these bipartitions for edge $e_{m_1 m_2}$, we want to find the best one to minimize $c(T')$, which is the same as minimizing $\max\{x + q, y + p\}$.
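As an illustration of the cost expression above, the following Python sketch evaluates the xy-pair, the pq-pair, and $l_{m_1 m_2} + \max\{x + q, y + p\}$ for one given bipartition. It is only a sketch under stated assumptions: the function name `two_star_cost`, the dictionary layout of the distances, and the example numbers are hypothetical and not taken from the text, and the check $|x - y| \le l_{m_1 m_2}$ is our reading of the central-edge condition.

```python
def two_star_cost(l_m1m2, d_m1, d_m2, S1, S2, U1, U2):
    """Evaluate the xy-pair, pq-pair and cost of one 2-star.

    d_m1[w] / d_m2[w]: distance from m1 / m2 to node w (assumed precomputed).
    (S1, S2) and (U1, U2): bipartitions of sources and sinks, all non-empty.
    """
    x = max(d_m1[v] for v in S1)   # x = D(m1, S1)
    y = max(d_m2[v] for v in S2)   # y = D(m2, S2)
    p = max(d_m1[u] for u in U1)   # p = D(m1, U1)
    q = max(d_m2[u] for u in U2)   # q = D(m2, U2)
    # Assumption: the formula is only used when e_{m1 m2} is the central edge.
    if abs(x - y) > l_m1m2:
        raise ValueError("e_{m1 m2} is not the central edge for this bipartition")
    return (x, y), (p, q), l_m1m2 + max(x + q, y + p)


# Hypothetical example data: two sources and one sink on each side.
d_m1 = {"m1": 0, "a": 2, "ua": 3}
d_m2 = {"m2": 0, "b": 1, "ub": 4}
xy, pq, cost = two_star_cost(2, d_m1, d_m2, {"m1", "a"}, {"m2", "b"}, {"ua"}, {"ub"})
print(xy, pq, cost)
```

Enumerating all bipartitions with such a function would take exponential time, which is why the text restricts attention to minimal pairs.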

Fact 1 (Wu, 2004b) If $(x, y)$ is an xy-pair, there is no source $v$ with $d_G(v, m_1) > x$ and $d_G(v, m_2) > y$ simultaneously. If $(p, q)$ is a pq-pair, there is no sink $u$ with $d_G(u, m_1) > p$ and $d_G(u, m_2) > q$ simultaneously.

Let $P = \{(d_G(v_1, m_1), d_G(v_2, m_2)) \mid v_1, v_2 \in S\}$. An xy-pair is minimal if (1) for any other $(x_1, y_1) \in P$, $x_1 > x$ or $y_1 > y$; and (2) there exists a source node $v$ with $d_G(v, m_1) = x$ and $d_G(v, m_2) > y$. Then, there are at most $(n - 1)$ minimal xy-pairs and $(n - 1)$ minimal pq-pairs. Let $L_s$ be the list of all minimal pairs $(x_i, y_i)$ satisfying $|x_i - y_i| \le l_{m_1 m_2}$, where all $x_i$ are in increasing order and all $y_i$ are in decreasing order. Let $L_u$ be the sorted list of all minimal pq-pairs $(p_i, q_i)$, where all $p_i$ are in decreasing order and all $q_i$ are in increasing order.

Lemma 5 (Wu, 2004b) Given the sorted distances $d_G(v, m_1)$ for all sources $v \in S$, the list $L_s$ can be constructed in $O(|S|) = O(n)$ time. Given the sorted distances $d_G(u, m_1)$ for all sinks $u \in U$, $L_u$ can be constructed in $O(|U|) = O(n)$ time.

Define $a(i, j) = x_i + q_j$, $b(i, j) = y_i + p_j$, and $f(i, j) = \max\{a(i, j), b(i, j)\}$. The goal is then to find the minimal $f(i, j)$. For a fixed $i$, define $a_i(j) = a(i, j)$, $b_i(j) = b(i, j)$, and $f_i(j) = f(i, j)$. To find the minimal $f(i, j)$, we need to find the minimal $f_i(j)$ for each $i$.

Fact 2 (Wu, 2004b) The function $a(i, j)$ is monotonically increasing in both $i$ and $j$, and $b(i, j)$ is monotonically decreasing in both $i$ and $j$. Then, $f_i(j)$ is bitonic: monotonically decreasing and then monotonically increasing.

Fact 3 (Wu, 2004b) If $f_i(j)$ achieves its minimum at $j$, then $b_i(j + 1) < a_i(j + 1)$ and $a_i(j - 1) < b_i(j - 1)$.

Lemma 6 (Wu, 2004b) If $f_i$ and $f_{i+1}$ achieve their minima at $j$ and $j'$ respectively, then $j' \le j$.

Lemma 7 (Wu, 2004b) Given $L_s$ and $L_u$, the minimal $f(i, j)$ can be computed in $O(|U|) = O(n)$ time.

The following algorithm finds the tree with the minimal $c(T')$ among all $T' \in \mathcal{T}_D$.
This algorithm is a modification of Algorithm 2 in Wu (2004b).

Algorithm for Finding Minimal $c(T')$ in $\mathcal{T}_D$
Input: Graph $G = (V, E, l_{ij})$ and the node distance restrictions $H_i$.
1. Compute the all-pair shortest path lengths for each node in graph $G$.
2. Construct the source-sink graph of the n-MEST problem from the input graph, set the source set $S = V$ and the sink set $U$, and compute the distance from each source node to all sink nodes.
3. Construct $G'$ from $G$.
4. For each source $v \in S$, sort the distances $d_G(v, w)$ from $v$ to all other nodes $w \in S \cup U$.
5. For each edge $e_{m_1 m_2} \in E$ do
   (a) Construct $L_s$ and $L_u$.
   (b) Find $i$ and $j$ minimizing $f(i, j)$.
   (c) Compute $l_{m_1 m_2} + f(i, j)$.
6. Let $e_{m_1 m_2}$ and $i$ and $j$ minimize the cost among all edges. Denote the minimal cost found in the last step as $c(m_1, m_2) = l_{m_1 m_2} + f(i, j)$.
7. Construct the output tree $T'$.
   (a) Partition $S$ into $S_1, S_2$ and $U$ into $U_1, U_2$ such that $d_G(w, m_1) - d_G(w, m_2) \le x_i - y_i$ for any node $w \in S_1 \cup U_1$.
   (b) On $G'$, construct a star tree $T'_1$ rooted at $m_1$ spanning $S_1 \cup U_1$.
   (c) On $G'$, construct a star tree $T'_2$ rooted at $m_2$ spanning $S_2 \cup U_2$.
   (d) Build $T'$ by adding $e_{m_1 m_2}$ to connect $T'_1$ and $T'_2$.

In this algorithm, step 1 takes $O(n^2 \log n + nm)$ time. Step 2 takes $O(n)$ time. Step 3 takes $O(n^2)$ time. Step 4 takes $O(n^2 \log n)$ time for sorting. Step 5 takes $O(n)$ time for an edge and $O(nm)$ time for all edges. Step 7 takes $O(n)$ time to construct the tree. Thus, the algorithm takes $O(n^2 \log n + nm)$ time.

The algorithm finds the minimal cost $c(m_1, m_2)$ for all the trees in $\mathcal{T}_D$ and outputs a tree $T'$ in $G'$. Now, we shall demonstrate that $T'$ is the optimal tree in $\mathcal{T}_D$ by showing that $T'$ is a tree in $\mathcal{T}_D$ and that its cost is equal to the minimal cost $c(m_1, m_2)$. In the proof, we follow the steps in the proof of Lemma 14 in Wu (2004b), which states a result similar to Lemma 8.

Lemma 8 $T' \in \mathcal{T}_D$ and $c(T') = c(m_1, m_2)$. Consequently, $T'$ is the optimal tree in $\mathcal{T}_D$.

Proof: First, we need to show $T' \in \mathcal{T}_D$. Since $m_1$ and $m_2$ are the only two internal nodes on $T'$ by construction, we only need to show that $e_{m_1 m_2}$ is the central edge. Since $(x_i, y_i)$ is a minimal pair in $L_s$, there exists a source $s$ with $d_G(s, m_1) = x_i$ and $d_G(s, m_2) > y_i$ by the definition of a minimal pair. So, we have $d_G(s, m_1) - d_G(s, m_2) \le x_i - y_i$. Then, $s \in S_1$ by the definition of $S_1$ in the algorithm. This implies $D_{T'}(m_1, S_1) \ge x_i$. Suppose that there is a source $s_1 \in S_1$ with $d_G(s_1, m_1) > x_i$.
Since $d_G(s_1, m_1) - d_G(s_1, m_2) \le x_i - y_i$, we have $d_G(s_1, m_2) \ge y_i + (d_G(s_1, m_1) - x_i) > y_i$, a contradiction to Fact 1. Therefore, $D_{T'}(m_1, S_1) = x_i$ and similarly $D_{T'}(m_2, S_2) = y_i$.

By the definition of $L_s$, $|x_i - y_i| \le l_{m_1 m_2}$. Then, we have $|D_{T'}(m_1, S_1) - D_{T'}(m_2, S_2)| \le l_{m_1 m_2}$, which indicates that $e_{m_1 m_2}$ is the central edge of $T'$ by Lemma 1.

Next, we establish $c(T') = c(m_1, m_2)$. Because $e_{m_1 m_2}$ is the central edge, $D_{T'}(m_1, S_1) = x_i$, and $D_{T'}(m_2, S_2) = y_i$, we have

\[ c(T') = l_{m_1 m_2} + \max\{D_{T'}(m_1, S_1) + D_{T'}(m_2, U_2),\; D_{T'}(m_2, S_2) + D_{T'}(m_1, U_1)\} = l_{m_1 m_2} + \max\{x_i + D_{T'}(m_2, U_2),\; y_i + D_{T'}(m_1, U_1)\}. \]

Recall $c(m_1, m_2) = l_{m_1 m_2} + f(i, j) = l_{m_1 m_2} + \max\{x_i + q_j, y_i + p_j\}$. As $(p_j, q_j)$ is a minimal pair in $L_u$, there is no sink node $u$ with $d_G(u, m_1) > p_j$ and $d_G(u, m_2) > q_j$ simultaneously by Fact 1. For any sink $u \in U_1$, if $d_G(u, m_1) \le p_j$, we have $d_G(u, m_1) + y_i \le p_j + y_i$. Otherwise, if $d_G(u, m_1) > p_j$, then $d_G(u, m_2) \le q_j$. Since $u \in U_1$, $d_G(u, m_1) - d_G(u, m_2) \le x_i - y_i$ by the definition of $U_1$ in the algorithm. Therefore, $d_G(u, m_1) + y_i \le x_i + d_G(u, m_2) \le x_i + q_j$. Consequently, $d_G(u, m_1) + y_i \le \max\{x_i + q_j, y_i + p_j\}$ for any $u \in U_1$, which implies $y_i + D_{T'}(m_1, U_1) \le \max\{x_i + q_j, y_i + p_j\}$. Similarly, we can show that $x_i + D_{T'}(m_2, U_2) \le \max\{x_i + q_j, y_i + p_j\}$ as well. Therefore, $c(T') \le c(m_1, m_2)$. Because $T' \in \mathcal{T}_D$ and $c(m_1, m_2)$ is the minimum for all the trees in $\mathcal{T}_D$, $c(T') = c(m_1, m_2)$ and then $T'$ is the optimal tree in $\mathcal{T}_D$.

Finally, given the optimal tree $T'$ in $\mathcal{T}_D$, we create a corresponding tree $T$ in $\mathcal{T}_B$ such that $c(T) = c(T')$. In $G$, we build a shortest path tree $T_1$ rooted at $m_1$ spanning $S_1 \cup U_1$, a shortest path tree $T_2$ rooted at $m_2$ spanning $S_2 \cup U_2$, and add $e_{m_1 m_2}$ to the two shortest path trees. This construction can be done in $O(n \log n + m)$ time. We shall first prove that this construction creates a tree $T$ in $G$ in Lemma 9 and then establish $T \in \mathcal{T}_B$ and $c(T) = c(T')$ in Lemma 10.

Lemma 9 (Wu, 2004b) Any shortest path $P$ from $m_1$ to a node $w_1 \in S_1 \cup U_1$ contains no node in $S_2 \cup U_2$. Similarly, any shortest path from $m_2$ to a node in $S_2 \cup U_2$ contains no node in $S_1 \cup U_1$.

Proof: Suppose that there exists a node $w_2$ on $P$ such that $w_2 \in S_2 \cup U_2$. By the principle of optimality, $d_G(w_1, m_1) = d_G(w_1, w_2) + d_G(w_2, m_1)$. By the definition of $S_2 \cup U_2$, $d_G(w_2, m_1) - d_G(w_2, m_2) > x_i - y_i$. We then have

\[ d_G(w_2, m_1) - d_G(w_2, m_2) = (d_G(w_2, m_1) + d_G(w_2, w_1)) - (d_G(w_2, m_2) + d_G(w_2, w_1)) = d_G(w_1, m_1) - (d_G(w_2, m_2) + d_G(w_2, w_1)) > x_i - y_i. \]

Then, by the triangle inequality, $d_G(w_1, m_1) - d_G(w_1, m_2) > x_i - y_i$. This implies that $w_1 \in S_2 \cup U_2$, which is a contradiction.

So, $T_1$ and $T_2$ are two disconnected trees in $G$, and $e_{m_1 m_2}$ connects $T_1$ and $T_2$. That is, $T$ is a tree of $G$. Finally, we shall establish that $T$ is the optimal n-MEST by showing that $T$

is the tree with the minimal cost in $\mathcal{T}_B$.

Lemma 10 $T \in \mathcal{T}_B$ and $c(T) = c(T')$. Consequently, $T$ is the optimal n-MEST.

Proof: Lemma 9 has shown that $T$ is a tree of $G$. Also, by construction, all source nodes are connected with either $m_1$ or $m_2$ on $T$. In order to show $T \in \mathcal{T}_B$, we first show that all the sink nodes connect with their adjacent source nodes in $T$. By the definition of $S_1 \cup U_1$, if a source node $v_t \in S_1$, then $d_G(v_t, m_1) - d_G(v_t, m_2) \le x_i - y_i$. Then, for its adjacent sink node $u_t$,

\[ d_G(u_t, m_1) - d_G(u_t, m_2) = (d_G(u_t, v_t) + d_G(v_t, m_1)) - (d_G(u_t, v_t) + d_G(v_t, m_2)) = d_G(v_t, m_1) - d_G(v_t, m_2) \le x_i - y_i. \]

Thus, $u_t \in U_1$ as well. Similarly, if $v_t \in S_2$, then $u_t \in U_2$. Hence, all the sink nodes must be connected with their adjacent source nodes when constructing the shortest path trees $T_1$ and $T_2$.

Next, we shall show that $e_{m_1 m_2}$ is the central edge of $T$. For any source $s \in S_1$, because $d_T(s, m_1) = d_G(s, m_1)$ on $T$ and $d_{T'}(s, m_1) = l_{s m_1} = d_G(s, m_1)$ on $T'$, we have $d_T(s, m_1) = d_{T'}(s, m_1)$. Thus, $D_T(m_1, S_1) = D_{T'}(m_1, S_1)$ and similarly $D_T(m_2, S_2) = D_{T'}(m_2, S_2)$. By Lemma 1, $|D_{T'}(m_1, S_1) - D_{T'}(m_2, S_2)| \le l_{m_1 m_2}$ as $e_{m_1 m_2}$ is the central edge of $T'$. Therefore, $e_{m_1 m_2}$ is also the central edge of $T$ by Lemma 1, which concludes $T \in \mathcal{T}_B$.

By the same logic, $D_T(m_1, U_1) = D_{T'}(m_1, U_1)$ and $D_T(m_2, U_2) = D_{T'}(m_2, U_2)$ as well. Then, by Lemma 2, $c(T) = c(T')$. Because $c(T')$ is the minimum for all trees in $\mathcal{T}_D$ and $c(T) = c(T')$, $T$ must be the optimal tree in $\mathcal{T}_B$ by Observation 5 and then the optimal n-MEST by Observation 3.

Therefore, we have the following theorem.

Theorem 2 The n-MEST and correspondingly the NMVT can be solved in $O(n^2 \log n + nm)$ time.

3.3 Minimax Distance Violation Tree with Pairwise Distance Restriction (PMVT)

This section evaluates the complexity of the PMVT problem and shows it to be NP-Complete. We first state the decision version of the PMVT problem.
Instance: Graph $G = (V, E)$, pairwise distance restriction $H_{ij}$ for any pair of nodes $v_i$ and $v_j$ in $V$, integer bound $K \in \mathbb{Z}^+$.
Question: Is there a spanning tree $T$ for $G$ such that the maximum distance violation in $T$, over all pairs of nodes $v_i, v_j \in V$, is no more than $K$?

We now demonstrate that the PMVT problem is NP-Complete. In order to

describe the result of the NP-Completeness of the PMVT problem, we first need to introduce the Tree t-spanner problem.

Instance: Graph $G = (V, E)$; let $d_G(v_i, v_j)$ be the length of the shortest path connecting $v_i$ and $v_j$ in $G$ and $d_T(v_i, v_j)$ be the length of the unique path connecting $v_i$ and $v_j$ in a spanning tree $T$; an integer $t \in \mathbb{Z}^+$.
Question: Is there a spanning tree $T$ for $G$ such that for each pair of nodes $v_i$ and $v_j$, $d_T(v_i, v_j) \le t\, d_G(v_i, v_j)$?

Chew (1986) and Peleg and Ullman (1987) introduce the notion of the Tree t-Spanner. For general nonnegative edge lengths, Cai and Corneil (1995) demonstrate that the Tree t-spanner problem is NP-Complete even for $t = 2$, and that it remains NP-Complete for $t \ge 4$ for unit edge lengths. The NP-Completeness of the PMVT problem is presented in the following theorem.

Theorem 3 The PMVT problem is NP-Complete.

Proof: The PMVT problem is clearly in NP because it takes polynomial time to check whether or not the distance violation of every pair of nodes in a given tree $T$ is no more than the integer bound $K$. Thus, we need only prove that the PMVT problem is NP-hard.

We prove that the PMVT problem is NP-hard by transforming the Tree t-spanner problem into a special case of the PMVT problem. Given an instance of the Tree t-spanner problem, we define a special case of the PMVT problem by setting the distance restriction between any two nodes $v_i$ and $v_j$ as $H_{ij} = t\, d_G(v_i, v_j)$ and the integer bound $K = 0$. Then, we claim that this special case of the PMVT problem has a solution if and only if the Tree t-spanner problem has a solution.

Suppose first that this special PMVT problem has a solution. Then, if we let $v_i$ and $v_j$ be any two nodes in the graph, there exists a tree $T$ such that the violation between nodes $v_i$ and $v_j$ is

\[ VIO_{ij} = \max\{0, d_T(v_i, v_j) - H_{ij}\} = \max\{0, d_T(v_i, v_j) - t\, d_G(v_i, v_j)\} \le K = 0. \]

Hence, $d_T(v_i, v_j) \le t\, d_G(v_i, v_j)$ for any two nodes $v_i$ and $v_j$ in $T$.
So, the tree $T$ is also a solution of the Tree t-spanner problem.

Next, suppose the Tree t-spanner problem has a solution. Thus, there exists a tree $T$ such that $d_T(v_i, v_j) \le t\, d_G(v_i, v_j)$ for any two nodes $v_i$ and $v_j$ in $T$. Then, the violation $VIO_{ij}$ between nodes $v_i$ and $v_j$ in the tree $T$ is

\[ VIO_{ij} = \max\{0, d_T(v_i, v_j) - H_{ij}\} = \max\{0, d_T(v_i, v_j) - t\, d_G(v_i, v_j)\} = 0. \]

Hence, the maximal distance violation over all pairs of nodes in $T$ is equal to 0 and thus no bigger than the integer bound $K = 0$. That is, the special PMVT problem also has a solution.

Therefore, because the Tree t-spanner problem is NP-hard, the PMVT problem is NP-hard as well. As we have shown that the PMVT problem is in NP, it is then NP-Complete.

Based on the above reduction, we present two additional observations which

further indicate that, given the pairwise distance restriction, both the problem of minimizing a monotone function of all pairs of violations in a tree and the problem of finding an approximation solution of the PMVT problem within any constant factor are NP-hard.

Corollary 2 For the pairwise distance restriction, if a problem $P$ is to find a tree minimizing a monotone function of all pairs of violations, $VIO_{ij} = \max\{0, d_T(v_i, v_j) - H_{ij}\}$, then the problem $P$ is NP-hard.

Proof: Given a graph $G = (V, E)$, the pairwise node distance restriction $H_{ij}$ between nodes $v_i$ and $v_j$, and an integer bound $K \in \mathbb{Z}^+$, the problem $P$ is defined as the problem of finding a tree such that the monotone function of the violations satisfies $\varphi = \varphi(VIO_{12}, VIO_{13}, \ldots, VIO_{ij}, \ldots, VIO_{n(n-1)}) \le K$. Then, given an instance of the Tree t-spanner problem, we define a special case of the problem $P$ by setting the distance restriction between any two nodes $v_i$ and $v_j$ as $H_{ij} = t\, d_G(v_i, v_j)$ and the integer bound $K = \varphi(0, 0, \ldots, 0, \ldots, 0)$. We claim that this special case of the problem $P$ has a solution if and only if the Tree t-spanner problem has a solution.

First, if the Tree t-spanner problem has a solution, then there exists a tree $T$ such that $d_T(v_i, v_j) \le t\, d_G(v_i, v_j)$ for any two nodes $v_i$ and $v_j$ in $T$. Thus, the violation between nodes $v_i$ and $v_j$ in the tree $T$ is $VIO_{ij} = \max\{0, d_T(v_i, v_j) - H_{ij}\} = \max\{0, d_T(v_i, v_j) - t\, d_G(v_i, v_j)\} = 0$. That is, the violation for any two nodes in the tree $T$ is 0. Then, for the tree $T$, $\varphi = \varphi(VIO_{12}, VIO_{13}, \ldots, VIO_{ij}, \ldots, VIO_{n(n-1)}) = \varphi(0, 0, \ldots, 0, \ldots, 0) \le K = \varphi(0, 0, \ldots, 0, \ldots, 0)$. Hence, the tree $T$ is also a solution for the special case of the problem $P$.

Next, suppose the special case of the problem $P$ has a solution tree $T$ satisfying $\varphi \le K = \varphi(0, 0, \ldots, 0, \ldots, 0)$. By definition, the violation $VIO_{ij} = \max\{0, d_T(v_i, v_j) - H_{ij}\} \ge 0$ for any two nodes $v_i$ and $v_j$.
Then, we shall prove by contradiction that the violation for any pair of nodes in the tree $T$ is 0. Assume the violation for two nodes $v_1$ and $v_2$ is a positive number. Since the violations for all other pairs of nodes are greater than or equal to 0 and the function $\varphi$ is monotone, we have $\varphi > \varphi(0, 0, \ldots, 0, \ldots, 0)$, which contradicts $\varphi \le K = \varphi(0, 0, \ldots, 0, \ldots, 0)$. Hence, the violation $VIO_{ij} = 0$ for any pair of nodes $v_i$ and $v_j$ in the tree $T$. Therefore, the tree $T$ also satisfies $d_T(v_i, v_j) \le t\, d_G(v_i, v_j)$ for the Tree t-spanner problem.

Thus, because the Tree t-spanner problem is NP-hard, the problem $P$ is NP-hard as well.

Corollary 3 For the pairwise distance restriction, the problem $P$ of obtaining an approximation solution of the PMVT problem within any constant factor is NP-hard.

Proof: Given a graph $G = (V, E)$, the pairwise distance restriction $H_{ij}$ between nodes $v_i$ and $v_j$, and an integer bound $K \in \mathbb{Z}^+$, we define the problem $P$ as the problem of finding an approximation solution tree of the PMVT problem whose

maximal distance violation is within a constant factor $c \in \mathbb{R}^+$ of the optimal solution tree. Similar to the proof of Theorem 3, if we let $H_{ij} = t\, d_G(v_i, v_j)$ and the integer bound $K = 0$, then for any constant factor $c \in \mathbb{R}^+$, the problem $P$ has a solution if and only if the Tree t-spanner problem has a solution.

Suppose problem $P$ has a solution. Then, if we let $v_i$ and $v_j$ be any two nodes, there exists a tree $T$ such that their violation $VIO_{ij} = \max\{0, d_T(v_i, v_j) - H_{ij}\} = \max\{0, d_T(v_i, v_j) - t\, d_G(v_i, v_j)\} \le cK = 0$ because $K = 0$. Hence, $d_T(v_i, v_j) \le t\, d_G(v_i, v_j)$ for any two nodes $v_i$ and $v_j$ in $T$. So, the tree $T$ is also a solution of the Tree t-spanner problem.

Next, suppose the Tree t-spanner problem has a solution. Thus, there exists a tree $T$ such that $d_T(v_i, v_j) \le t\, d_G(v_i, v_j)$ for any two nodes $v_i$ and $v_j$ in $T$. Then, the violation between nodes $v_i$ and $v_j$ in the tree $T$ is $VIO_{ij} = \max\{0, d_T(v_i, v_j) - H_{ij}\} = \max\{0, d_T(v_i, v_j) - t\, d_G(v_i, v_j)\} = 0$. Hence, the maximal distance violation over all pairs of nodes in $T$ is equal to 0 and thus no bigger than $cK = 0$. That is, this problem $P$ also has a solution.

Thus, because the Tree t-spanner problem is NP-hard, the problem $P$ is NP-hard as well.

3.4 Alternative NMVT Algorithm

In this section, we discuss an alternative algorithm to solve the NMVT, which is more easily extended to the variants of NMVT that consider flow between nodes. In Section 3.5, we will discuss this extension further.

Preliminaries

To begin, we restate the following lemma from Connamacher and Proskurowski (2003) characterizing the longest distance from each node in a tree.

Lemma 11 Given a tree $T$ and its diameter path $D_T$, let node $s$ and node $t$ be the two endpoints of $D_T$. Let $h_u$ represent the longest distance from node $u$ to other nodes in $T$. Then, $h_u$ is obtained either by the $u$-$s$ path or by the $u$-$t$ path.
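Lemma 11 suggests a simple way to obtain every $h_u$ with three tree traversals. The Python sketch below illustrates this under stated assumptions: the tree is stored as a nested dictionary of edge lengths, the names `tree_dist` and `eccentricities` are hypothetical, and the diameter endpoints are found with the standard two-pass farthest-node search, which is our choice and is not described in the text.

```python
def tree_dist(adj, src):
    """Distances from src in a weighted tree given as {u: {v: length}}."""
    dist = {src: 0.0}
    stack = [src]
    while stack:
        u = stack.pop()
        for v, w in adj[u].items():
            if v not in dist:          # each tree node is reached exactly once
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def eccentricities(adj):
    """Return ({u: h_u}, (s, t)) where s, t are endpoints of a diameter path."""
    start = next(iter(adj))
    d0 = tree_dist(adj, start)
    s = max(d0, key=d0.get)            # farthest node from an arbitrary start
    ds = tree_dist(adj, s)
    t = max(ds, key=ds.get)            # farthest node from s: other endpoint
    dt = tree_dist(adj, t)
    # Lemma 11: h_u is attained by the u-s path or by the u-t path.
    return {u: max(ds[u], dt[u]) for u in adj}, (s, t)


# Hypothetical example tree: diameter path c-b-d-e of length 8.
adj = {
    "a": {"b": 2},
    "b": {"a": 2, "c": 3, "d": 1},
    "c": {"b": 3},
    "d": {"b": 1, "e": 4},
    "e": {"d": 4},
}
h, (s, t) = eccentricities(adj)
print(h, s, t)
```

Comparing `h[u]` with a brute-force maximum over `tree_dist(adj, u)` on small trees is a quick way to check the lemma numerically.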
The next lemma (Kariv and Hakimi, 1979) clarifies which endpoint of the diameter path defines the longest distance to a node on the tree and the value of its longest distance. Certain definitions are necessary for the description. A point is a location in the graph which could be either a node or a location on an edge. The center point of a tree is the location on the diameter path such that the length of the path from each of the diameter path's two endpoints to the center point is the same. If the center point is not a vertex but a location on an edge, we call the edge

containing the center point the center edge of a tree.

Lemma 12 Given a tree $T$ and its diameter path $D_T$, let node $s$ and node $t$ be the two endpoints of $D_T$. Let $o$ be the center point of $T$. Let $h_u$ represent the longest distance from node $u$ to other nodes on $T$. When node $u$ is not on $D_T$, let node $v_u$ be the node on $D_T$ where the path connecting $u$ with $D_T$ intersects $D_T$. If $v_u$ lies on the $s$-$o$ path, then $h_u = d_T(u, t)$. If $v_u$ lies on the $t$-$o$ path, then $h_u = d_T(u, s)$. Similarly, when node $u$ is on $D_T$, if $u$ lies on the $s$-$o$ path, then $h_u = d_T(u, t)$. If $u$ lies on the $t$-$o$ path, then $h_u = d_T(u, s)$. Then, $h_u = d_T(u, o) + \tfrac{1}{2}\delta_T$.

An important theorem on the shortest path tree rooted at the center point of the optimal tree for the k-MEST problem is proved by Farley et al. (2000) and presented by Krumme and Fragopoulou (2001), McMahan and Proskurowski (2004), and Wu (2004b). We restate the theorem in a more general form and revisit its proof, as it is key in the development of our algorithm. For the proof, we need to extend the concept of a shortest path tree rooted at a node in a graph to a shortest path tree rooted at any point in a graph.

Given a point $\alpha$ on an edge $e_{pq}$ in graph $G$, by adding a new node $v_\alpha$ at $\alpha$ and replacing $e_{pq}$ with a path of two new edges $e_{p v_\alpha}$ and $e_{v_\alpha q}$, we modify $G$ to a new graph $G'$. In $G'$, we can find the shortest path tree $T'$ rooted at the new node $v_\alpha$. Because $T'$ must contain the two new edges $e_{p v_\alpha}$ and $e_{v_\alpha q}$, if we replace the path of $e_{p v_\alpha}$ and $e_{v_\alpha q}$ with $e_{pq}$, we then modify $T'$ to a new tree $T$, which is a spanning tree of $G$. Hence, by this means we can define the shortest path tree rooted at any point in $G$. Using this definition, the concept of a (single-source) shortest path spanning tree rooted at a node can be extended to a shortest path spanning tree rooted at any point in $G$.

Theorem 4 Given any tree $T$ in graph $G$ and its center point $o$, let tree $T_o$ be the shortest path tree rooted at $o$ in $G$.
Then, for any node, its longest distance in $T_o$ is no greater than that in $T$.

Proof: Let node $u$ be an arbitrary node. Let $h_u^{T}$ and $h_u^{T_o}$ denote the longest distance from $u$ on $T$ and $T_o$, respectively. We want to show $h_u^{T_o} \le h_u^{T}$. For any node $v$, because of the triangle inequality, the shortest paths from $o$ in $T_o$, the definition of diameter, and Lemma 12, we obtain

\[ d_{T_o}(u, v) \le d_{T_o}(u, o) + d_{T_o}(o, v) \le d_T(u, o) + d_T(o, v) \le d_T(u, o) + \tfrac{1}{2}\delta_T = h_u^{T}. \]

Therefore, $h_u^{T_o} \le h_u^{T}$.
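The subdivision construction used above to root a shortest path tree at a point on an edge can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the graph is an undirected nested dictionary of edge lengths, the function name `spt_rooted_at_point` and the temporary node label are hypothetical, and Dijkstra's algorithm is our choice of shortest path routine. The edge $e_{pq}$ is put back only when both $p$ and $q$ hang directly from the temporary node, a guard added here so that the result is always a spanning tree.

```python
import heapq
import itertools


def spt_rooted_at_point(adj, p, q, d_p):
    """Shortest path tree of G rooted at the point at distance d_p from p on edge (p, q).

    adj: undirected graph as {u: {v: length}}. Returns (dist, edges), where
    dist[v] is the distance from the point to v and edges is a set of frozensets.
    """
    l_pq = adj[p][q]
    g = {u: dict(nbrs) for u, nbrs in adj.items()}   # work on a copy of G
    del g[p][q], g[q][p]                             # replace e_pq by p-root-q
    root = ("v_alpha",)                              # hypothetical temporary node
    g[root] = {p: d_p, q: l_pq - d_p}
    g[p][root] = d_p
    g[q][root] = l_pq - d_p

    dist, parent, done = {root: 0.0}, {}, set()
    tie = itertools.count()                          # avoids comparing node labels
    heap = [(0.0, next(tie), root)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in g[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, next(tie), v))

    edges = {frozenset((parent[v], v)) for v in adj if parent[v] != root}
    if parent[p] == root and parent[q] == root:
        edges.add(frozenset((p, q)))                 # merge the two new edges back into e_pq
    del dist[root]
    return dist, edges


# Hypothetical example: root at the point one unit from "a" on edge (a, b).
graph = {
    "a": {"b": 4, "c": 3},
    "b": {"a": 4, "d": 2},
    "c": {"a": 3, "d": 6},
    "d": {"b": 2, "c": 6},
}
dist, edges = spt_rooted_at_point(graph, "a", "b", 1.0)
print(dist, edges)
```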

Based on Theorem 4, if $T$ is an optimal tree for the NMVT problem and $T_o$ is the shortest path spanning tree rooted at the center point $o$ of $T$, the longest distance for any node in $T_o$ is no greater than that in $T$. Therefore, given the node distance restriction, the maximal distance violation from any node in $T_o$ is no greater than that in $T$. Thus, $T_o$ is also an optimal tree.

Corollary 4 For the NMVT problem, given an optimal tree $T$, let $o$ be the center point of $T$. Then $T_o$, the shortest path spanning tree rooted at $o$, is also an optimal tree.

In light of Corollary 4, given an optimal tree $T$ and its center point $o$, we could build another optimal tree, namely the shortest path tree $T_o$ rooted at $o$. However, because the diameter path of $T_o$ may move, the center point of $T_o$ may no longer be located at $o$, and $o$ is not even guaranteed to be on the diameter path at all. Nevertheless, we can prove that there does exist an optimal shortest path tree whose root coincides with its center point.

Theorem 5 For the NMVT problem, there is an optimal shortest path spanning tree whose root is also its center point.

Proof: Based on Corollary 4, given an optimal tree for the NMVT problem, there exists an optimal shortest path spanning tree. Let $T_r$ be such an optimal shortest path tree rooted at $r$. Let the $v_1$-$v_2$ path be the diameter path with diameter $\delta_{T_r}$ and $o$ the center point of $T_r$. Then, $o$ divides $T_r$ into two subtrees, subtree 1 which contains $v_1$ and subtree 2 which contains $v_2$. By Lemma 12, for any node in subtree 1, its longest distance as well as its maximal distance violation is reached by the path to $v_2$. Then, let nodes $s_1$ and $s_2$ be the nodes whose maximal distance violations are the largest among all the nodes in subtree 1 and subtree 2, respectively. Since $r$ is either on subtree 1 or on subtree 2, without loss of generality, let $r$ be on subtree 1. Then, since $v_2$ is on subtree 2, $o$ is on the $r$-$v_2$ path.
This situation is shown in Figure 3.3. Thus, by Corollary 4, the shortest path tree $T_o$ rooted at $o$ is also an optimal tree. If $o$ is also the center point of $T_o$, then we are done. However, if $o$ is not the center point of $T_o$, we must construct another optimal shortest path tree whose root coincides with its center point in order to finish the proof.

Figure 3.3: Two subtrees of $T_r$ separated by its center point $o$

The key is to prove that the center point $o_1$ of $T_o$ must lie on the $o$-$v_2$ path. In order to prove this result, we let $\delta_{T_o}$ be the diameter of $T_o$ and need to prove $\delta_{T_o} \le \delta_{T_r}$. For two arbitrary nodes $u$ and $v$, because of the triangle inequality, the shortest paths from $o$ in $T_o$, and the definition of the diameter, we have

\[ d_{T_o}(u, v) \le d_{T_o}(u, o) + d_{T_o}(o, v) \le d_{T_r}(u, o) + d_{T_r}(o, v) \le \tfrac{1}{2}\delta_{T_r} + \tfrac{1}{2}\delta_{T_r} = \delta_{T_r}. \]

Thus, any path in $T_o$ is no longer than the diameter of $T_r$. This implies $\delta_{T_o} \le \delta_{T_r}$ as well.

We shall further prove that $v_2$ is one of the endpoints of the diameter path on $T_o$. Because $r$ is on subtree 1 in $T_r$ and all of the shortest paths from nodes in subtree 2 to $r$ pass through $o$ in $T_r$, by the shortest path tree suboptimality property, subtree 2 should be in $T_o$. In the proof of Theorem 4, we have shown that for any node, its longest distance in $T_o$ is no greater than that in $T_r$, such that its maximal distance violation is no greater than that in $T_r$. Thus, the maximal violation of $T_o$ is still obtained either by the path from $s_1$ or by the path from $s_2$.

Suppose it is obtained by the path from $s_1$. Since any path from $s_1$ to a node in subtree 2 is not the longest path from $s_2$ in $T_r$ and subtree 2 is kept in $T_o$, the longest path from $s_2$ in $T_o$ has to be obtained to a node belonging to subtree 1 in $T_r$. Let $w$ be any node belonging to subtree 1 in $T_r$. Because

\[ d_{T_o}(s_2, w) \le d_{T_o}(s_2, o) + d_{T_o}(o, w) = d_{T_r}(s_2, o) + d_{T_o}(o, w) \le d_{T_r}(s_2, o) + d_{T_r}(o, w) = d_{T_r}(s_2, w), \]

the distance from $s_2$ to $w$ in $T_o$ is no greater than that in $T_r$. Thus, the longest distance from $s_2$ in $T_o$ must still be obtained by the $s_2$-$v_1$ path in $T_o$, which has the same distance as the $s_2$-$v_1$ path in $T_r$. Therefore, since the $o$-$s_2$ path in subtree 2 is kept in $T_o$, the path from $o$ to $v_1$ in $T_r$ must be kept in $T_o$ as well. Since $\delta_{T_o} \le \delta_{T_r}$ and $d_{T_o}(v_1, v_2) = d_{T_r}(v_1, v_2) = \delta_{T_r}$, then the $v_1$-$v_2$ path is still the diameter path of

$T_o$ and $o$ is still the center point of $T_o$. This contradicts the assumption that $o$ is not the center point of $T_o$.

Hence, the maximal distance violation of $T_o$ is obtained by the path from $s_2$. Then, the longest path from $s_2$ in $T_o$ must be the same as that in $T_r$. Because the $o$-$v_2$ path in subtree 2 is kept in $T_o$, the path from $o$ to $s_1$ in $T_r$ must be kept in $T_o$ as well. That is, the $s_1$-$v_2$ path in $T_r$ is kept in $T_o$. By Lemma 11, the longest path from $s_1$ must pass the center point and end at one endpoint of the diameter path. Then, $v_2$ is one endpoint of the diameter path in $T_o$ and $o_1$ lies on the $s_1$-$v_2$ path. Hence, because $\delta_{T_o} \le \delta_{T_r}$, we have

\[ d_{T_o}(o_1, v_2) = \tfrac{1}{2}\delta_{T_o} \le \tfrac{1}{2}\delta_{T_r} = d_{T_r}(o, v_2). \]

Therefore, $o_1$ lies on the $o$-$v_2$ path in $T_o$.

Then, we can build a third optimal shortest path tree rooted at $o_1$, and so on. If the center point does not coincide with the root in the new optimal shortest path tree, then it is closer to $v_2$ on the $s_1$-$v_2$ path. Since every time the center point in the new tree moves a positive distance closer to $v_2$ and the distance from it to $v_2$ is bounded, the moving distance must converge to 0. In other words, the center point will eventually coincide with the root of the shortest path tree.

Theorem 5 characterizes a special property of the structure of an optimal tree for the NMVT problem. We now provide the notation for the special tree set containing all the trees with this property. Let V be the set of the points in $G$ and $\mathcal{T}^{SP}$ be the set of shortest path trees rooted at a point in V. Then, $\mathcal{T}_o^{SP}$ denotes the subset of $\mathcal{T}^{SP}$ containing all the trees whose roots coincide with their center points. Then, based on Theorem 5, we can identify the optimal tree that minimizes the maximal distance violation for the node distance restriction in Corollary 5.

Corollary 5 The tree with the minimax distance violation among all the trees in $\mathcal{T}_o^{SP}$ is the optimal tree for the NMVT problem.

Because $\mathcal{T}_o^{SP}$ is a subset of $\mathcal{T}^{SP}$, it follows directly from Corollary 5 that the tree having the minimax distance violation among all the trees in $\mathcal{T}^{SP}$ is optimal for the NMVT problem. Thus, one algorithm is to examine all the trees in $\mathcal{T}^{SP}$ to find the optimal tree for the NMVT problem. McMahan and Proskurowski (2004) demonstrate that $\mathcal{T}^{SP}$ can be constructed in polynomial time. Because $\mathcal{T}_o^{SP}$ is a smaller subset of $\mathcal{T}^{SP}$, however, it is more efficient to examine $\mathcal{T}_o^{SP}$ rather than $\mathcal{T}^{SP}$. Two theorems in McMahan and Proskurowski (2004) concerning the set $\mathcal{T}^{SP}$ provide an efficient way for us to further examine the set $\mathcal{T}_o^{SP}$.

The two theorems require the following definitions related to the idea of intervals. In graph $G$, the distance between $u$ and $v$, $d_G(u, v)$, is the length of the shortest $u$-$v$ path in $G$. Let $\alpha$ be a point on an edge $e_{pq}$. Then, $d_p(\alpha)$ is the distance along the edge from $p$ to $\alpha$, $d_q(\alpha)$ is the distance along the edge from $q$ to $\alpha$, and $d_p(\alpha) + d_q(\alpha) = l_{pq}$. If we add a new node $v_\alpha$ at the location $d_p(\alpha)$, the distance between the new node $v_\alpha$ on $e_{pq}$ and a vertex $v$ in $G$ is $\min\{d_G(p, v) + d_p(\alpha),\ d_G(q, v) + d_q(\alpha)\}$. We define sets of such points by analogy to intervals on the real line. Given points $\alpha_1$ and $\alpha_2$ on an edge $e_{pq}$ with $d_p(\alpha_1) < d_p(\alpha_2)$, a point $\beta$ on $e_{pq}$ is in the open interval $(\alpha_1, \alpha_2)$ if $d_p(\alpha_1) < d_p(\beta) < d_p(\alpha_2)$.

Certain points on an edge $e_{pq}$, which are called bottleneck points in location theory (Nickel and Puerto, 2005), play an important role in deciding the shortest path from any point on $e_{pq}$. For each vertex $v$ in $G$, we define the point $\gamma_v$ on $e_{pq}$ such that for any point $\alpha$ located on the interval $(p, \gamma_v)$ the shortest path from $\alpha$ to $v$ is through node $p$, and for any point $\alpha$ located on the interval $(\gamma_v, q)$ the shortest path from $\alpha$ to $v$ is through node $q$. Then the point $\gamma_v$ on $e_{pq}$ satisfies the relation that the distance of the shortest path from $\gamma_v$ to $v$ through $p$ is the same as the distance of the shortest path from $\gamma_v$ to $v$ through $q$. Thus, by the relation

$$d_p(\gamma_v) + d_G(p, v) = d_q(\gamma_v) + d_G(q, v) = (l_{pq} - d_p(\gamma_v)) + d_G(q, v),$$

the location of $\gamma_v$ on $e_{pq}$ for node $v$ is given by

$$d_p(\gamma_v) = \tfrac{1}{2}\,\bigl(d_G(q, v) - d_G(p, v) + l_{pq}\bigr).$$

Let $v_1, v_2, \ldots, v_n$ be the $n$ nodes in $G$. Without loss of generality, let $\gamma_{v_1}, \gamma_{v_2}, \ldots, \gamma_{v_n}$ be the corresponding points on edge $e_{pq}$ such that $d_p(\gamma_{v_1}), d_p(\gamma_{v_2}), \ldots, d_p(\gamma_{v_n})$ is a non-decreasing sequence of distances on the edge $e_{pq}$. For completeness, we now present, without proof, two theorems from McMahan and Proskurowski (2004).
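Before the theorems are stated, the bottleneck-point formula above can be illustrated with a short sketch. It assumes the shortest path distances from every node to $p$ and to $q$ are already available as dictionaries; the function names and the data layout are hypothetical and chosen only for illustration.

```python
def bottleneck_offset(d_p_v, d_q_v, l_pq):
    """Distance from p to the bottleneck point gamma_v along edge e_pq.

    d_p_v, d_q_v: shortest path distances d_G(p, v) and d_G(q, v).
    l_pq: length of edge e_pq.
    """
    return 0.5 * (d_q_v - d_p_v + l_pq)


def sorted_bottleneck_points(dist_p, dist_q, l_pq):
    """Return (node, offset) pairs in non-decreasing order of offset.

    dist_p, dist_q: dicts mapping each node v to d_G(p, v) and d_G(q, v)
    (an assumed input format). Consecutive offsets delimit the intervals
    on e_pq that the two theorems refer to.
    """
    offsets = {v: bottleneck_offset(dist_p[v], dist_q[v], l_pq) for v in dist_p}
    return sorted(offsets.items(), key=lambda item: item[1])
```

For example, with $d_G(p, v) = 3$, $d_G(q, v) = 5$, and $l_{pq} = 4$, the offset is 3: the route through $p$ has length $3 + 3 = 6$ and the route through $q$ has length $(4 - 3) + 5 = 6$, so the two routes tie exactly at $\gamma_v$.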
Theorem 6 For any two points $\alpha_1$ and $\alpha_2$ in an interval $(\gamma_{v_i}, \gamma_{v_{i+1}})$ for $1 \le i < n$, the set of shortest path spanning trees rooted at $\alpha_1$ is the same as the set of shortest path spanning trees rooted at $\alpha_2$.

Theorem 7 Any shortest path spanning tree rooted at a point $\alpha \in (\gamma_{v_{i-1}}, \gamma_{v_{i+1}})$, $1 < i < n$, is also a shortest path spanning tree rooted at the point $\gamma_{v_i}$.

By Theorem 6, all of the points in an interval $(\gamma_{v_i}, \gamma_{v_{i+1}})$ for $1 \le i < n$ share the same set of shortest path spanning trees. Thus, we can investigate every interval $(\gamma_{v_i}, \gamma_{v_{i+1}})$ on a given edge $e_{pq}$ to see whether a shortest path tree rooted at a point in the interval belongs to the set $\mathcal{T}_o^{SP}$. Theorem 8 provides the necessary and sufficient condition for determining whether or not a shortest path tree rooted at a point in the interval $(\gamma_{v_i}, \gamma_{v_{i+1}})$ is in the set $\mathcal{T}_o^{SP}$.

Theorem 8 Let $T$ be a shortest path tree rooted at any point in the interval $(\gamma_{v_i}, \gamma_{v_{i+1}})$ on the edge $e_{pq}$. Because of the definition of $\gamma_{v_j}$ and the non-decreasing assumption of the sequence $d_p(\gamma_{v_1}), d_p(\gamma_{v_2}), \ldots, d_p(\gamma_{v_n})$, $T$ goes through $p$ to $v_j$ for $i + 1 \le j \le n$ and through $q$ to $v_j$ for $1 \le j \le i$. Suppose node $v_s$ has the longest shortest path to $q$ among all the nodes $v_j$ for $1 \le j \le i$ in $T$, and node $v_t$ has the longest shortest path to $p$ among all the nodes $v_j$ for $i + 1 \le j \le n$ in $T$. Let $P$ be the path formed by connecting the path from $v_t$ to $p$, the edge $e_{pq}$, and the path from $q$ to $v_s$. Let $\beta$ be the midpoint of $P$, so that $d_T(v_s, q) + d_q(\beta) = d_T(v_t, p) + d_p(\beta)$. Then, $d_T(v_s, v_t) = [d_T(v_s, q) + d_q(\beta)] + [d_T(v_t, p) + d_p(\beta)]$, and $T \in \mathcal{T}_o^{SP}$ if and only if $\beta$ is in the interval $[\gamma_{v_i}, \gamma_{v_{i+1}}]$.

Proof: First, we prove that if $T \in \mathcal{T}_o^{SP}$ then $\beta$ is in the interval $[\gamma_{v_i}, \gamma_{v_{i+1}}]$. Since $T \in \mathcal{T}_o^{SP}$, the center point of $T$ is also the root of $T$ by the definition of $\mathcal{T}_o^{SP}$. Because the root of $T$ is in the interval $(\gamma_{v_i}, \gamma_{v_{i+1}})$, and hence also in $[\gamma_{v_i}, \gamma_{v_{i+1}}]$, the center point of $T$ is in $[\gamma_{v_i}, \gamma_{v_{i+1}}]$ as well. This implies that the diameter path contains the edge $e_{pq}$. Then, the two endpoints of the diameter path must approach $p$ and $q$ from different sides of $e_{pq}$, respectively. Therefore, since $P$ is the longest path in $T$ containing $e_{pq}$ by construction, $P$ must be the diameter path. The midpoint of $P$, $\beta$, is then the center point of $T$. Hence, $\beta$ is in the interval $[\gamma_{v_i}, \gamma_{v_{i+1}}]$.

Second, we show that if $\beta$ is in the interval $[\gamma_{v_i}, \gamma_{v_{i+1}}]$ then $T \in \mathcal{T}_o^{SP}$. By Theorem 6 and Theorem 7, for any point in the interval $[\gamma_{v_i}, \gamma_{v_{i+1}}]$, $T$ is a shortest path tree rooted at that point. So, in order to prove $T \in \mathcal{T}_o^{SP}$, we only need to show that the center point of $T$ is also in $[\gamma_{v_i}, \gamma_{v_{i+1}}]$. We claim that the path $P$ is the diameter path of $T$. The diameter path of $T$ either goes through edge $e_{pq}$ or it does not. If the diameter path goes through $e_{pq}$, then by the construction of $P$, which is the longest path containing $e_{pq}$ in $T$, $P$ is the diameter path, and we are done.
Suppose the diameter path does not go through $e_{pq}$. $T$ consists of three parts: the subtree containing all $v_j$ for $1 \le j \le i$ and $q$, the subtree containing all $v_j$ for $i + 1 \le j \le n$ and $p$, and the edge $e_{pq}$. Then, the diameter path must lie in one of the two subtrees of $T$. Without loss of generality, let the diameter path be a path in the subtree containing $q$. Suppose that the path from $v_a$ to $v_b$, for $1 \le a \le i$ and $1 \le b \le i$, is the diameter path. By the triangle inequality, we obtain the relation $d_T(v_a, v_b) \le d_T(v_a, q) + d_T(v_b, q)$. Because $v_s$ has the longest shortest path to $q$ in $T$ among all $v_j$ for $1 \le j \le i$, we have $d_T(v_a, q) \le d_T(v_s, q)$ and $d_T(v_b, q) \le d_T(v_s, q)$. Thus,

$$
\begin{aligned}
d_T(v_a, v_b) &\le d_T(v_a, q) + d_T(v_b, q) \\
&\le d_T(v_s, q) + d_T(v_s, q) \\
&< [d_T(v_s, q) + d_q(\beta)] + [d_T(v_s, q) + d_q(\beta)] \\
&= [d_T(v_s, q) + d_q(\beta)] + [d_T(v_t, p) + d_p(\beta)] \\
&= d_T(v_s, v_t).
\end{aligned}
$$

Hence the assumed diameter path is shorter than path $P$, which is a contradiction. Thus, the diameter path must contain $e_{pq}$. By construction, $P$ must then be the diameter path and its midpoint $\beta$ is the center point of $T$. Because $\beta$ is in the interval $[\gamma_{v_i}, \gamma_{v_{i+1}}]$, the center point of $T$ is in $[\gamma_{v_i}, \gamma_{v_{i+1}}]$. Hence, $T \in \mathcal{T}_o^{SP}$.

Theorem 9 demonstrates that when there are different shortest path trees rooted at the same point in the interval $(\gamma_{v_i}, \gamma_{v_{i+1}})$, if any one of them belongs to the set $\mathcal{T}_o^{SP}$, then all of them belong to $\mathcal{T}_o^{SP}$. All of them also share the same maximal distance violation.

Theorem 9 Let $\mathcal{T}$ be the set of shortest path trees rooted at any point in the interval $(\gamma_{v_i}, \gamma_{v_{i+1}})$ on the edge $e_{pq}$. Because of the definition of $\gamma_{v_j}$ and the non-decreasing assumption of the sequence $d_p(\gamma_{v_1}), d_p(\gamma_{v_2}), \ldots, d_p(\gamma_{v_n})$, any $T \in \mathcal{T}$ goes through $p$ to $v_j$ for $i + 1 \le j \le n$ and through $q$ to $v_j$ for $1 \le j \le i$. Let $T_\beta$ be a shortest path tree such that $T_\beta \in \mathcal{T}$, and let $\beta$ be the root of $T_\beta$. Then, by Theorem 6 and Theorem 7, $\beta$ could be any point in the interval $[\gamma_{v_i}, \gamma_{v_{i+1}}]$. If $T_\beta \in \mathcal{T}_o^{SP}$, then
1. $\beta$ is a unique point in $[\gamma_{v_i}, \gamma_{v_{i+1}}]$;
2. all the different shortest path trees rooted at $\beta$ in the set $\mathcal{T}$ belong to $\mathcal{T}_o^{SP}$; in addition, they have the same diameter;
3. all the different shortest path trees rooted at $\beta$ in the set $\mathcal{T}$ share the same maximal distance violation for the node distance restriction.

Proof: 1. Because $T_\beta \in \mathcal{T}_o^{SP}$, $\beta$ is both the root of $T_\beta$ and the center point of $T_\beta$. Suppose that node $v_s$ has the longest shortest path to $q$ among all the nodes $v_j$ for $1 \le j \le i$ in $T_\beta$, and node $v_t$ has the longest shortest path to $p$ among all the nodes $v_j$ for $i + 1 \le j \le n$ in $T_\beta$. Let $P$ be the path formed by connecting the path from $v_t$ to $p$, the edge $e_{pq}$, and the path from $q$ to $v_s$. In Theorem 8, it has been proved that path $P$ is the diameter path of $T_\beta$.
Thus, the location of $\beta$ is unique in the interval $[\gamma_{v_i}, \gamma_{v_{i+1}}]$, determined by the relation $d_T(v_t, p) + d_p(\beta) = d_T(v_s, q) + d_q(\beta) = d_T(v_s, q) + l_{pq} - d_p(\beta)$. That is, $d_p(\beta) = \tfrac{1}{2}[d_T(v_s, q) - d_T(v_t, p) + l_{pq}]$.

2. Let $T_\beta$ and $T'_\beta$ be two different shortest path trees rooted at $\beta$ in the set $\mathcal{T}$. $T_\beta$ and $T'_\beta$ may have different paths from $\beta$ with the same distance to some nodes $v_k$ for $1 \le k \le n$. Suppose $T_\beta \in \mathcal{T}_o^{SP}$; we need to show that $T'_\beta$ also belongs to $\mathcal{T}_o^{SP}$.

First, we show that the longest distance of the path from any $v_j$ for $1 \le j \le i$ to $q$ is the same in both $T_\beta$ and $T'_\beta$, and that the longest distance of the path from any $v_j$ for $i + 1 \le j \le n$ to $p$ is also the same in both $T_\beta$ and $T'_\beta$. Let nodes $v_s$ and $v_{s'}$, for some $1 \le s \le i$ and $1 \le s' \le i$, have the longest shortest path to $q$ in $T_\beta$ and $T'_\beta$ respectively, and let nodes $v_t$ and $v_{t'}$, for some $i + 1 \le t \le n$ and $i + 1 \le t' \le n$, have the longest shortest path to $p$ in $T_\beta$ and $T'_\beta$ respectively. Suppose $d_{T_\beta}(v_s, q) \ne d_{T'_\beta}(v_{s'}, q)$ and, without loss of generality, assume $d_{T_\beta}(v_s, q) > d_{T'_\beta}(v_{s'}, q)$. Obviously, $v_s \ne v_{s'}$; otherwise there would exist two shortest paths from the same node $v_s$ ($= v_{s'}$) to $q$ in $G$ with different lengths, which is a contradiction. If $v_s \ne v_{s'}$, then $d_{T_\beta}(v_s, q) = d_{T'_\beta}(v_s, q) \le d_{T'_\beta}(v_{s'}, q)$. This contradicts $d_{T_\beta}(v_s, q) > d_{T'_\beta}(v_{s'}, q)$. Therefore, $d_{T_\beta}(v_s, q) = d_{T'_\beta}(v_{s'}, q)$ and, similarly, $d_{T_\beta}(v_t, p) = d_{T'_\beta}(v_{t'}, p)$.

Next, let $P$ be the path in $T_\beta$ formed by connecting the path from $v_t$ to $p$, the edge $e_{pq}$, and the path from $q$ to $v_s$. Let $P'$ be the path in $T'_\beta$ formed by connecting the path from $v_{t'}$ to $p$, the edge $e_{pq}$, and the path from $q$ to $v_{s'}$. Let $\beta'$ be the midpoint of $P'$. We show that the location of $\beta'$ is the same as that of $\beta$. Since $d_{T_\beta}(v_s, q) = d_{T'_\beta}(v_{s'}, q)$ and $d_{T_\beta}(v_t, p) = d_{T'_\beta}(v_{t'}, p)$, the location of $\beta'$ is given by

$$d_p(\beta') = \tfrac{1}{2}\,[d_{T'_\beta}(v_{s'}, q) - d_{T'_\beta}(v_{t'}, p) + l_{pq}] = \tfrac{1}{2}\,[d_{T_\beta}(v_s, q) - d_{T_\beta}(v_t, p) + l_{pq}] = d_p(\beta).$$

Thus, $\beta'$ and $\beta$ are at the same location, and $\beta' \in [\gamma_{v_i}, \gamma_{v_{i+1}}]$ as well as $\beta \in [\gamma_{v_i}, \gamma_{v_{i+1}}]$. Hence, by Theorem 8, $T'_\beta \in \mathcal{T}_o^{SP}$. In addition, because both $\beta$ and $\beta'$, the center points of $T_\beta$ and $T'_\beta$ respectively, are on edge $e_{pq}$, the diameter paths of both $T_\beta$ and $T'_\beta$ contain edge $e_{pq}$. By construction, path $P$ and path $P'$ are the longest paths containing $e_{pq}$ in $T_\beta$ and $T'_\beta$ respectively. Thus, path $P$ and path $P'$ are the diameter paths of $T_\beta$ and $T'_\beta$ respectively.
Then,

$$\delta_{T_\beta} = d_{T_\beta}(v_s, q) + l_{pq} + d_{T_\beta}(v_t, p) = d_{T'_\beta}(v_{s'}, q) + l_{pq} + d_{T'_\beta}(v_{t'}, p) = \delta_{T'_\beta}.$$

3. Let $T_\beta$ and $T'_\beta$ be two different shortest path trees rooted at $\beta$. Suppose $T_\beta \in \mathcal{T}_o^{SP}$; we have just shown that $T'_\beta \in \mathcal{T}_o^{SP}$ as well. Finally, we prove that the maximal distance violation is the same for both $T_\beta$ and $T'_\beta$. Let $v$ be any node in $G$, and let $h_v^{T_\beta}$ and $h_v^{T'_\beta}$ denote the longest distance from $v$ in $T_\beta$ and $T'_\beta$ respectively. Let $\delta_{T_\beta}$ and $\delta_{T'_\beta}$ be the diameters of $T_\beta$ and $T'_\beta$ respectively. Because $T_\beta \in \mathcal{T}_o^{SP}$ and $T'_\beta \in \mathcal{T}_o^{SP}$, $\beta$ is the center point of both $T_\beta$ and $T'_\beta$. Thus, by Lemma 12, $h_v^{T_\beta} = d_{T_\beta}(v, \beta) + \tfrac{1}{2}\delta_{T_\beta}$ and $h_v^{T'_\beta} = d_{T'_\beta}(v, \beta) + \tfrac{1}{2}\delta_{T'_\beta}$. Since both $T_\beta$ and $T'_\beta$ are shortest path trees rooted at $\beta$, $d_{T_\beta}(v, \beta) = d_{T'_\beta}(v, \beta)$. In addition, we have just shown $\delta_{T_\beta} = \delta_{T'_\beta}$. Hence, $h_v^{T_\beta} = h_v^{T'_\beta}$. Therefore, given the node distance restriction, the maximal distance violation for a node in $T_\beta$ is the same as its maximal distance violation in $T'_\beta$. Thus, the maximal distance violation for the node restriction is the same for both $T_\beta$ and $T'_\beta$.

After we find a shortest path tree belonging to the set $\mathcal{T}_o^{SP}$ according to the criterion in Theorem 8, we need to calculate its maximal distance violation in order to find the optimal tree. Theorem 10 provides an efficient way to evaluate the maximal distance violation of a tree in the set $\mathcal{T}_o^{SP}$. To help explain the result, we introduce the following definition. If a node $v_i$ connects to edge $e_{pq}$ through node $p$, we define the quantity $d_T(v_i, p) - H_i$ as the p-violation of $v_i$ in $T$; the q-violation is defined analogously.

Theorem 10 Let $T$ be a shortest path tree rooted at a point on the edge $e_{pq}$ and belonging to the set $\mathcal{T}_o^{SP}$. Suppose nodes $v_s$ and $v_a$ have, respectively, the longest shortest path to $q$ and the largest q-violation among all the nodes connecting to edge $e_{pq}$ through node $q$ in $T$. Suppose nodes $v_t$ and $v_b$ have, respectively, the longest shortest path to $p$ and the largest p-violation among all the nodes connecting to edge $e_{pq}$ through node $p$ in $T$. Then, the maximal distance violation of $T$ is

$$\max\{d_T(v_a, q) + l_{pq} + d_T(p, v_t) - H_a,\ d_T(v_b, p) + l_{pq} + d_T(q, v_s) - H_b\}.$$

Proof: Let $P$ be the path formed by connecting the path from $v_t$ to $p$, the edge $e_{pq}$, and the path from $q$ to $v_s$.
Then, by Theorem 8, path $P$ is the diameter path of $T$ because $T \in \mathcal{T}_o^{SP}$. Let $v_i$ be any node connecting to edge $e_{pq}$ through node $q$ in $T$. By Lemma 12, the longest distance from $v_i$ in $T$ is $h_i = d_T(v_i, q) + l_{pq} + d_T(p, v_t)$, and then its maximal distance violation in $T$ is

$$d_T(v_i, q) + l_{pq} + d_T(p, v_t) - H_i = [d_T(v_i, q) - H_i] + l_{pq} + d_T(p, v_t).$$

Since $v_a$ has the largest q-violation among all the nodes connecting to edge $e_{pq}$ through node $q$ in $T$, it offers the largest maximal distance violation among all the nodes connecting to edge $e_{pq}$ through node $q$ in $T$. Similarly, $v_b$ offers the largest maximal distance violation among all the nodes connecting to edge $e_{pq}$ through node $p$ in $T$.

Therefore, the maximal distance violation of $T$ is $\max\{d_T(v_a, q) + l_{pq} + d_T(p, v_t) - H_a,\ d_T(v_b, p) + l_{pq} + d_T(q, v_s) - H_b\}$.

Algorithm for the NMVT problem

In this section, we use the preliminary results to build the algorithm to solve the NMVT problem. We first divide each edge in $G$ into intervals as described above. We then examine all the open intervals $(\gamma_{v_i}, \gamma_{v_{i+1}})$ for $1 \le i < n$ of each edge to determine whether there exists a shortest path tree rooted at a point in the interval that belongs to the set $\mathcal{T}_o^{SP}$. If a tree belongs to the set $\mathcal{T}_o^{SP}$, we calculate its maximal distance violation and then update the minimax distance violation. We describe the algorithm in detail below.

Algorithm: Edge Interval Examine

1. Initialize the minimax distance violation to $+\infty$.
2. Solve the all-pairs shortest path problem, obtaining the shortest path tree rooted at every node in $G$.
3. For each edge $e_{pq} \in E$, examine all the intervals on $e_{pq}$, from the interval closest to $p$ to the interval farthest from $p$:
   (a) Calculate $d_p(\gamma_{v_j})$ for $1 \le j \le n$ and then sort all $d_p(\gamma_{v_j})$ in non-decreasing order into the γ Array $A$.
   (b) Store all nodes in the Node γ Array $B$ according to the node order in the γ Array $A$. Then, when we examine the intervals one by one, the nodes are also moved one by one, in the order given by $B$, from connecting to edge $e_{pq}$ through $p$ to connecting to edge $e_{pq}$ through $q$.
   (c) Initialize before examining intervals:
      i. Initialize the p Distance Array $PD$ with the shortest path distances to $p$ for all the nodes, sorted in non-increasing order: $PD \leftarrow \{d_G(v_1, p), d_G(v_2, p), \ldots, d_G(v_n, p)\}$.
      ii. Let $\theta_p$ be the longest shortest path distance to $p$ among all the nodes connecting to edge $e_{pq}$ through $p$. Initialize $\theta_p \leftarrow$ the first element in $PD$.
      iii. Let $\theta_q$ be the longest shortest path distance to $q$ among all the nodes connecting to edge $e_{pq}$ through $q$. Initialize $\theta_q \leftarrow 0$.
      iv. Initialize the p Violation Array $PV$ by first calculating and then sorting the p-violations of all the nodes in non-increasing order.
      v. Let $\lambda_p$ be the largest p-violation. Initialize $\lambda_p \leftarrow$ the first element in $PV$.
      vi. Let $\lambda_q$ be the largest q-violation. Initialize $\lambda_q \leftarrow 0$.
   (d) For each interval represented by the $d_p(\gamma_{v_j})$ in $A$, let $v_i$ be the node in $B$ corresponding to the smallest $d_p(\gamma_{v_j})$ in $A$ which has not been visited. Then, when we examine the shortest path tree rooted at a point in the next interval, $v_i$ is moved from connecting to edge $e_{pq}$ through $p$ to connecting to edge $e_{pq}$ through $q$.
      i. Update the data:
         A. Update $PD$ by removing $d_G(v_i, p)$: $PD \leftarrow PD \setminus \{d_G(v_i, p)\}$.
         B. Update $\theta_p$: if $d_G(v_i, p) < \theta_p$, then $\theta_p$ is unchanged; otherwise, $\theta_p \leftarrow$ the first element in $PD$.
         C. Update $\theta_q$: $\theta_q \leftarrow \max\{\theta_q, d_G(v_i, q)\}$.
         D. Update $PV$ by removing the p-violation of $v_i$: $PV \leftarrow PV \setminus \{\text{p-violation of } v_i\}$.
         E. Update $\lambda_p$: if $d_G(v_i, p) - H_i < \lambda_p$, then $\lambda_p$ is unchanged; otherwise, $\lambda_p \leftarrow$ the first element in $PV$.
         F. Update $\lambda_q$: $\lambda_q \leftarrow \max\{\lambda_q, d_G(v_i, q) - H_i\}$.
      ii. Check whether the shortest path tree $T$ rooted at a point in this interval belongs to $\mathcal{T}_o^{SP}$ and update the minimax distance violation:
         A. Calculate the location of the midpoint $\beta$ of the longest path containing edge $e_{pq}$ by $d_p(\beta) = \tfrac{1}{2}[\theta_q - \theta_p + l_{pq}]$. If $d_p(\gamma_{v_i}) \le d_p(\beta) \le d_p(\gamma_{v_{i+1}})$, then $T \in \mathcal{T}_o^{SP}$; otherwise $T \notin \mathcal{T}_o^{SP}$.
         B. If $T \in \mathcal{T}_o^{SP}$, then update the minimax distance violation to the minimum of its current value and $\max\{\theta_p + l_{pq} + \lambda_q,\ \theta_q + l_{pq} + \lambda_p\}$.
4. Output the minimax distance violation.

By Theorem 7, the shortest path tree rooted at $\gamma_{v_i}$ is either the shortest path tree rooted at a point in the interval $(\gamma_{v_{i-1}}, \gamma_{v_i})$ or the shortest path tree rooted at a point in the interval $(\gamma_{v_i}, \gamma_{v_{i+1}})$. The algorithm therefore does not need to consider the shortest path trees rooted at the boundary point $\gamma_{v_i}$ separately: they have already been considered when we examine the intervals $(\gamma_{v_{i-1}}, \gamma_{v_i})$ and $(\gamma_{v_i}, \gamma_{v_{i+1}})$. The correctness of Edge Interval Examine follows from Theorem 5, Corollary 5, Theorem 6, Theorem 7, Theorem 8, Theorem 9, and Theorem 10.

Computing the all-pairs shortest path trees in step 2 takes $O(n^2 \log n + nm)$ time (Cormen et al., 1990). In step 3.a, the location of $\gamma_{v_j}$ for a node $v_j$ is given by $d_p(\gamma_{v_j}) = \tfrac{1}{2}[d_G(q, v_j) - d_G(p, v_j) + l_{pq}]$. Given the shortest path distances from node $v_j$ to $p$ and to $q$ obtained in step 2, it takes constant time to compute $d_p(\gamma_{v_j})$. Since steps 3.a, 3.c.i, and 3.c.iv each sort $n$ values, they each require $O(n \log n)$ time. Step 3.b takes $O(n)$ time to store $n$ nodes. Further, 3.c.ii, 3.c.iii, 3.c.v, and 3.c.vi take constant time. Moreover, 3.d examines at most $n - 1$ intervals on an edge and each step in 3.d takes constant time, so 3.d takes $O(n)$ time. Hence, the sorting dominates: it requires $O(n \log n)$ time to check one edge in step 3 and $O(mn \log n)$ time to check all the $m$ edges in $G$. Therefore, the Edge Interval Examine algorithm solves the NMVT problem in $O(n^2 \log n + mn \log n)$ time.

3.5 Minimax Flow Distance Violation Tree Problems (F-MVT)

The previous sections discussed three types of MVT problems. In these three problems, we focused on minimizing the distance violation of each pair of cities in a tree, which implicitly assumes that the flow between every pair of cities is 1. In reality, the transportation flow between cities can vary greatly, and delivery providers may want to weight the violations based on the amount of flow involved. It is thus important to explore how incorporating the flow in the objective can influence the solution. In this section, we measure the violation of the distance restriction with consideration of the flow between nodes.

Flow Variants

We consider two assumptions concerning the flows between nodes. Let $f_{ij}$ represent the flow between node $i$ and node $j$. The two flow assumptions on $f_{ij}$ are:
1. Node Flow, $f_{ij} = f_i$: In this case, the flows from node $i$ to all other nodes are the same and equal $f_i$. However, for different nodes $i$, $f_i$ may be different. For example, the flows from larger cities may be higher than those from smaller cities.
2. Pairwise Flow, $f_{ij} = f_{ij}$: In this case, the flow from node $i$ to node $j$ is specific to the pair of nodes $i$ and $j$. For example, the flow between two large cities, such as New York and Los Angeles, will likely be larger than the flow between a large city and a small city.

Node F-MVT Problems (NF-MVT)

We now discuss the complexity of the three F-MVT problems for which flow is node specific.

The NF-UMVT and NF-NMVT Problems

We show that both the NF-UMVT problem and the NF-NMVT problem are polynomially solvable. In addition, the algorithm for solving the NMVT problem can be extended to solve the NF-UMVT and NF-NMVT problems. We discussed in Section 3.4 that one of the contributions of our algorithm for the NMVT problem is that it is more easily extended to the case in which flow impacts violation. Because the k-MEST algorithms of both Krumme and Fragopoulou (2001) and Wu (2004b) focus on examining only the maximum eccentricity of the tree rather than the eccentricity of all source nodes in the tree, it is nontrivial to extend them to the flow NMVT problem, which requires the eccentricity of all nodes in a tree. We first show that the NF-NMVT problem has a result similar to that of Theorem 5 for the NMVT problem.

Theorem 11 For the NF-NMVT problem, there is an optimal shortest path spanning tree whose root is also its center point.

Proof: First, as with the NMVT, we show that the maximal flow distance violation for each node in tree $T$ is still attained by the longest path from this node to other nodes in $T$. Given the node distance restriction, the maximal distance violation for any node $v$ is still attained by its longest path in $T$. In addition, because the flows from $v$ to all other nodes are the same, the maximal flow distance violation for $v$ depends only on its maximal distance violation. Therefore, the maximal flow distance violation for $v$ is also attained by the longest path from $v$. Now let $T$ be an optimal tree for the NF-NMVT problem and $T_o$ be the shortest path spanning tree rooted at the center point $o$ of $T$. By Theorem 4, the longest distance for any node in $T_o$ is no greater than that in $T$.
Thus, because we have just shown that the maximal flow distance violation for any node depends only on its longest distance in the tree, each node in tree $T_o$ has no greater maximal flow distance violation than in tree $T$. That is, the maximal flow distance violation of $T_o$ is no greater than that of $T$. Thus, $T_o$ is also an optimal tree. Finally, following the same logic as in the proof of Theorem 5, we show that an optimal shortest path tree can be constructed such that its root is also its center point. Suppose an optimal shortest path tree $T_r$ rooted at $r$ has the same situation as in Figure 3.3. Here, nodes $s_1$ and $s_2$ instead refer to the nodes which have the largest maximal flow distance violation among all the nodes in subtree 1 and subtree 2 respectively. We have shown in the proof of Theorem 5 that the shortest path tree rooted at $o$, $T_o$, is also optimal and that $\delta_{T_o} \le \delta_{T_r}$. Similar to the proof of Theorem 5, if we suppose the maximal flow distance violation in $T_o$ is obtained by the longest path from $s_2$, then this path must still be the $s_2$-$v_1$ path and $o$ is then still the center point of $T_o$. Thus, the maximal flow distance violation must still be obtained by the longest path from $s_1$ in $T_o$. Then, as in the proof of Theorem 5, the center point $o_1$ of $T_o$ also lies on the $o$-$v_2$ path in $T_o$. Therefore, by repeatedly constructing a new optimal shortest path tree rooted at the center point of the last optimal spanning tree, we obtain an optimal shortest path tree whose root coincides with its center point.

Hence, after we replace the objective of the maximal distance violation with the maximal flow distance violation in evaluating the trees in the set $\mathcal{T}^{SP}$, we can modify the algorithm for the NMVT problem to solve the NF-NMVT problem as well. In addition, because the NF-UMVT problem is a special case of the NF-NMVT problem, the NF-UMVT problem can be solved by the same algorithm as the NF-NMVT problem.

Corollary 6 The NF-NMVT and NF-UMVT problems can be solved in polynomial time.

Now, we modify the Edge Interval Examine algorithm for the NMVT problem to obtain the algorithm Edge Interval Examine With Flow for solving the NF-UMVT problem and the NF-NMVT problem.
The key difference between the Edge Interval Examine algorithm and the Edge Interval Examine With Flow algorithm is that Edge Interval Examine With Flow needs to update the maximal flow violation for all the nodes after finding a tree $T \in \mathcal{T}_o^{SP}$, while Edge Interval Examine only maintains the single maximal distance violation of a tree $T \in \mathcal{T}_o^{SP}$. We detail the differences in the following.

For the data structure, Edge Interval Examine With Flow uses the p Flow Violation Array $PFV$ instead of the p Violation Array $PV$ in Edge Interval Examine; $PFV$ maintains the maximal flow violation for each node connecting to edge $e_{pq}$ through $p$. In addition, $\lambda_p$ and $\lambda_q$ represent the largest p Flow Violation and q Flow Violation in Edge Interval Examine With Flow, rather than the largest p-violation and q-violation in Edge Interval Examine. Finally, in order to calculate the flow violation for each node connecting to edge $e_{pq}$ through $q$, two more arrays are added in Edge Interval Examine With Flow. One is the q Distance Array $QD$ and the other is the q Flow Violation Array $QFV$.

For the algorithm, Edge Interval Examine With Flow has the same structure as Edge Interval Examine except for the following steps. First, process 3.c changes steps 3.c.iv, 3.c.v, and 3.c.vi and adds two steps, 3.c.vii and 3.c.viii:
3.c.iv: Initialize the p Flow Violation Array $PFV \leftarrow \{+\infty, +\infty, \ldots, +\infty\}$;
3.c.v: Initialize $\lambda_p \leftarrow +\infty$;
3.c.vi: Initialize $\lambda_q \leftarrow +\infty$;
3.c.vii: Initialize the q Distance Array $QD$ as the empty set: $QD \leftarrow \emptyset$;
3.c.viii: Initialize the q Flow Violation Array $QFV$ as the empty set: $QFV \leftarrow \emptyset$.

Second, process 3.d.i eliminates the last three steps 3.d.i.D, 3.d.i.E, and 3.d.i.F but adds one more step: update the q Distance Array, $QD \leftarrow QD \cup \{d_G(v_i, q)\}$.

Finally, in process 3.d.ii, Edge Interval Examine With Flow needs to recalculate the maximal flow distance violation for all of the nodes if it finds a tree belonging to $\mathcal{T}_o^{SP}$. We therefore replace 3.d.ii.B by the following steps. If $T \in \mathcal{T}_o^{SP}$, then:
Update the p Flow Violation Array $PFV$ for each node $v_j$ with a corresponding distance in $PD$ by recalculating its p Flow Violation as $f_j[d_G(v_j, p) + l_{pq} + \theta_q - H_j]$;
Update the q Flow Violation Array $QFV$ for each node $v_j$ with a corresponding distance in $QD$ by recalculating its q Flow Violation as $f_j[d_G(v_j, q) + l_{pq} + \theta_p - H_j]$;
Update the largest p Flow Violation: $\lambda_p \leftarrow \max_{PFV}\{\text{p Flow Violation}\}$;
Update the largest q Flow Violation: $\lambda_q \leftarrow \max_{QFV}\{\text{q Flow Violation}\}$;
Update the minimax flow distance violation to the minimum of its current value and $\max\{\lambda_p, \lambda_q\}$.

In Edge Interval Examine With Flow, step 2 is the same as in Edge Interval Examine and still takes $O(n^2 \log n + nm)$ computation time; 3.a and 3.b are also the same as in Edge Interval Examine and take $O(n \log n)$ and $O(n)$ time respectively. For step 3.c, the computation time is still $O(n \log n)$. Step 3.d requires $O(n)$ time to examine one interval if this interval offers a shortest path tree belonging to $\mathcal{T}_o^{SP}$.
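The $O(n)$ cost per qualifying interval comes from recomputing every node's flow violation. A minimal sketch of the replacement steps for 3.d.ii.B follows; it assumes each side of the edge is passed as a list of `(distance_to_endpoint, H_j, f_j)` triples, which is an illustrative layout rather than the arrays $PD$, $QD$, $PFV$, and $QFV$ themselves, and the function name is hypothetical.

```python
NEG_INF = float("-inf")


def max_flow_violation(p_side, q_side, l_pq, theta_p, theta_q):
    """Largest flow distance violation of a tree whose root lies on e_pq.

    p_side: (d_G(v_j, p), H_j, f_j) for nodes reaching e_pq through p.
    q_side: (d_G(v_j, q), H_j, f_j) for nodes reaching e_pq through q.
    theta_p, theta_q: longest distances to p and q on the respective sides.
    Assumed input format; one pass over all nodes, hence O(n) per interval.
    """
    # p-side nodes reach their farthest node across the edge, on the q side.
    lam_p = max((f * (d + l_pq + theta_q - h) for d, h, f in p_side), default=NEG_INF)
    # q-side nodes reach their farthest node across the edge, on the p side.
    lam_q = max((f * (d + l_pq + theta_p - h) for d, h, f in q_side), default=NEG_INF)
    return max(lam_p, lam_q)
```

The caller would keep the running minimum of this value over all intervals whose tree passes the membership test.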
Each of at most $n - 1$ intervals on an edge could offer a shortest path tree belonging to $\mathcal{T}_o^{SP}$, so step 3.d finishes in $O(n^2)$ time. Because there are $m$ edges in $G$, step 3 then takes $O(mn \log n + mn^2)$ time. Thus, we conclude that Edge Interval Examine With Flow runs in $O(n^2 \log n + mn \log n + mn^2)$ time.

NF-PMVT Problem

Because the PMVT problem is a special case of the NF-PMVT problem with all node flows equal to 1 and we have shown that the PMVT problem is NP-Complete, the NF-PMVT problem is NP-Complete as well.

Pairwise F-MVT Problems

Now let us investigate the three pairwise flow F-MVT problems. We demonstrate that the three problems are NP-Complete. We first give the decision version of the PF-UMVT problem and show that it is NP-Complete by a transformation from the Tree t-spanner problem.

Instance: Graph $G = (V, E)$, uniform distance restriction $H_{ij} = H$ and pairwise flow $f(i, j)$ for any pair of nodes $v_i$ and $v_j$ in $V$, integer bound $K \in Z^+$.
Question: Is there a spanning tree $T$ for $G$ such that the maximum, over all pairs of nodes $v_i, v_j \in V$, of the flow distance violation in $T$ is no more than $K$? That is, is there a spanning tree $T$ for $G$ such that $\max\{d_T(v_i, v_j) - H, 0\}\, f(v_i, v_j) \le K$ for all pairs of nodes $v_i, v_j$ in $V$?

Theorem 12 The PF-UMVT problem is NP-Complete.

Proof: We prove this result by transforming the Tree t-spanner problem to a special case of the PF-UMVT problem. Given an instance of the Tree t-spanner problem, we define a special case $P$ of the PF-UMVT problem as follows. In $P$, let the uniform pair distance restriction be $H = 0$; let the pairwise flow be $f(v_i, v_j) = \frac{K}{t\, d_G(v_i, v_j)}$; let $K$ be the integer bound. We claim that this special PF-UMVT problem $P$ has a solution if and only if the Tree t-spanner problem has a solution. For problem $P$, the flow distance violation for a pair of nodes $v_i$ and $v_j$ in tree $T$ is

$$\max\{d_T(v_i, v_j) - H, 0\}\, f(v_i, v_j) = d_T(v_i, v_j)\, f(v_i, v_j) = d_T(v_i, v_j)\, \frac{K}{t\, d_G(v_i, v_j)} = \frac{K\, d_T(v_i, v_j)}{t\, d_G(v_i, v_j)}.$$

Then, problem $P$ is to find a tree $T$ such that $\frac{K\, d_T(v_i, v_j)}{t\, d_G(v_i, v_j)} \le K$, that is, $\frac{d_T(v_i, v_j)}{d_G(v_i, v_j)} \le t$, for all pairs of nodes in $T$. Clearly, any solution tree for problem $P$ is also a solution tree for the Tree t-spanner problem, and vice versa. Because the Tree t-spanner problem is NP-Complete, the PF-UMVT problem is NP-Complete as well.

Clearly, the PF-UMVT is a special case of both the PF-NMVT and the PF-PMVT. Because we have just shown that the PF-UMVT is NP-Complete, the PF-NMVT and the PF-PMVT are both NP-Complete as well.

3.6 Conclusions

We study several variants of the minimax distance violation problems and the minimax flow distance violation problems. Table 3.1 summarizes the complexity results of the nine minimax flow distance violation problems.

                               | Unit Flow              | Node Flow                    | Pairwise Flow
Uniform Distance Restriction   | $O(n^2 \log n + mn)$   | $O(n^2 \log n + mn \log n)$  | NP-Complete
Node Distance Restriction      | $O(n^2 \log n + mn)$   | $O(n^2 \log n + mn \log n)$  | NP-Complete
Pairwise Distance Restriction  | NP-Complete            | NP-Complete                  | NP-Complete

Table 3.1: Complexity of Nine Min-Max Flow Distance Violation Tree Problem Variants
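The polynomial cases with unit flow rest on a constant-size test per edge interval. As a minimal sketch, and not the dissertation's implementation, the membership test of Theorem 8 and the evaluation of Theorem 10 could be combined as follows. The function name, the `(distance, H)` pair layout, and the convention of returning `None` for a rejected interval are assumptions made for illustration; the sketch also recomputes the four maxima from scratch rather than maintaining them incrementally as Edge Interval Examine does.

```python
NEG_INF = float("-inf")


def evaluate_interval(p_side, q_side, l_pq, lo, hi):
    """Test one interval [lo, hi] of edge e_pq and evaluate its tree.

    p_side: (d_G(v, p), H_v) for nodes reaching e_pq through p.
    q_side: (d_G(v, q), H_v) for nodes reaching e_pq through q.
    lo, hi: offsets from p of the two bottleneck points bounding the interval.
    Returns the maximal distance violation if the midpoint beta of the
    longest path through e_pq falls in [lo, hi], otherwise None.
    """
    theta_p = max((d for d, _ in p_side), default=0.0)    # longest distance to p
    theta_q = max((d for d, _ in q_side), default=0.0)    # longest distance to q
    lam_p = max((d - h for d, h in p_side), default=NEG_INF)  # largest p-violation
    lam_q = max((d - h for d, h in q_side), default=NEG_INF)  # largest q-violation
    beta = 0.5 * (theta_q - theta_p + l_pq)               # midpoint offset from p
    if not (lo <= beta <= hi):
        return None                                       # root cannot be the center
    return max(theta_p + l_pq + lam_q, theta_q + l_pq + lam_p)
```

For instance, with one node at distance 2 from $p$ (restriction 6), one node at distance 4 from $q$ (restriction 7), and $l_{pq} = 2$, the midpoint offset is 2 and the longest path has length 8, giving violations of 2 and 1 and hence a maximal violation of 2.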

CHAPTER 4
MINISUM DISTANCE VIOLATION TREE PROBLEMS

In this chapter, we study the USVT, NSVT, and PSVT, which are the three variants of the SVT based on different assumptions about the distance restriction between cities. We examine the complexity of these problems and apply several local search methods to solve them. Section 4.1 demonstrates that all three variants are NP-Complete. In Section 4.2, we provide an integer programming (IP) formulation for each of them. Section 4.3 discusses local search approaches, and Section 4.4 introduces three new structure-based neighborhoods for the SVT. Finally, we conclude the chapter.

4.1 Problem Complexity

This section demonstrates the complexity of the SVT. We begin by showing that the USVT is NP-Complete. The following is an instance representation of the USVT.

Instance: Graph $G = (V, E)$, uniform pair distance restriction $H_{ij} = H$ for any pair of nodes $v_i, v_j$ in $V$, integer bound $K \in Z^+$.
Question: Is there a spanning tree $T$ of $G$ such that the sum, over all pairs of nodes $v_i, v_j \in V$, of the distance violations in $T$ is no more than $K$?

To help prove the NP-Completeness of the USVT, we introduce the Shortest Total Path Length Spanning Tree problem (STPLST), which is an NP-Complete problem (Garey and Johnson, 1979). The following is an instance representation of the STPLST.

Instance: Graph $G = (V, E)$, integer bound $K \in Z^+$.
Question: Is there a spanning tree $T$ of $G$ such that the sum, over all pairs of vertices $u, v \in V$, of the length of the paths in $T$ from $u$ to $v$ is no more than $K$?

The following theorem states the NP-Completeness of the USVT. We prove it by transforming the STPLST to an instance of the USVT.

Theorem 13 The USVT problem is NP-Complete.

Proof: For the USVT, we want to minimize $\sum_{v_i \in V} \sum_{v_j \in V} \max\{d_T(v_i, v_j) - H, 0\}$. For the STPLST, the objective is to minimize $\sum_{v_i \in V} \sum_{v_j \in V} d_T(v_i, v_j)$. Because distances are nonnegative, $\max\{d_T(v_i, v_j) - 0, 0\} = d_T(v_i, v_j)$, so the STPLST is just a special case of the USVT with $H = 0$.
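The relation between the two objectives can be checked numerically. The sketch below is an illustration only: it assumes a tree given as `(u, v, length)` triples, sums over ordered pairs of distinct nodes to mirror the double sum above, and uses hypothetical function names.

```python
from collections import defaultdict


def tree_distances(edges):
    """All-pairs path lengths in a tree given as (u, v, length) triples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    all_dist = {}
    for src in list(adj):
        dist, stack = {src: 0.0}, [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        all_dist[src] = dist
    return all_dist


def sum_violation(edges, H):
    """USVT objective: sum of max(d_T(i, j) - H, 0) over ordered pairs i != j."""
    all_dist = tree_distances(edges)
    return sum(
        max(d - H, 0.0)
        for s, row in all_dist.items()
        for t, d in row.items()
        if t != s
    )
```

On the path A-B-C with lengths 2 and 4, `sum_violation(edges, 0.0)` equals the total path length over ordered pairs, $2 \cdot (2 + 4 + 6) = 24$, while a restriction of $H = 3$ leaves only the violations of the B-C and A-C pairs.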

Then, because the STPLST is NP-Complete, the USVT is NP-Complete as well.

Based on the NP-Completeness of the USVT, we state the NP-Completeness of the NSVT and PSVT in Theorem 14.

Theorem 14 The NSVT and PSVT are NP-Complete.

Proof: For the NSVT problem, if we set the distance restriction from each node to be the same, then the USVT becomes a special case of the NSVT. Thus, because the USVT is NP-Complete by Theorem 13, the NSVT is also NP-Complete. Similarly, for the PSVT, if we set the distance restriction for each pair of nodes to be the same, then the USVT becomes a special case of the PSVT problem. Hence, the PSVT is NP-Complete as well.

4.2 IP Formulations

In this section, we provide IP formulations for the three variants. As the PSVT is the general case of the USVT and NSVT, we first present its IP formulation. We formulate the PSVT as a multicommodity flow problem in an undirected graph. The flow between every distinct origin and destination node (OD) pair is modeled as a different commodity. Binary variables $x_{ij}$ are employed such that $x_{ij} = 1$ if edge $e_{ij}$ is in the SVT, and $x_{ij} = 0$ otherwise. Binary variables $y_{ij}^{st} = 1$ if the path used to send the specific commodity from node $s$ to node $t$ contains edge $e_{ij}$, and $y_{ij}^{st} = 0$ otherwise. The binary variables $z_i^{st} = 1$ if node $i$ is on the path from node $s$ to $t$, and $z_i^{st} = 0$ otherwise. Finally, the nonnegative variables $v_{st}$ represent the amount of the violation from node $s$ to node $t$. In terms of parameters, let $K$ represent the number of OD pairs, and let $b_i^{st}$ represent the maximum number of edges which could be on the path between any s-t OD pair and adjacent to node $i$. Thus, $b_i^{st} = 1$ if $i = s$ or $i = t$, and $b_i^{st} = 2$ otherwise.

Multicommodity Flow Formulation in an Undirected Graph (MCF-UD):

$$
\begin{aligned}
\text{Minimize} \quad & \sum_{s \in N} \sum_{t \in N,\, t \ne s} v_{st} && && (4.1)\\
\text{Subject to} \quad & \sum_{e_{ij} \in E} x_{ij} = n - 1 && && (4.2)\\
& \sum_{s \in N} \sum_{t \in N,\, t \ne s} y_{ij}^{st} \le K x_{ij} && \forall\, e_{ij} \in E && (4.3)\\
& \sum_{s \in N} \sum_{t \in N,\, t \ne s} y_{ij}^{st} \ge x_{ij} && \forall\, e_{ij} \in E && (4.4)\\
& \sum_{e_{ij} \in E} l_{ij}\, y_{ij}^{st} - H_{st} \le v_{st} && \forall\, s\text{-}t \text{ OD pair} && (4.5)\\
& \sum_{j \in N \text{ s.t. } e_{ij} \in E} y_{ij}^{st} + \sum_{j \in N \text{ s.t. } e_{ji} \in E} y_{ji}^{st} = b_i^{st} z_i^{st} && \forall\, i \in N,\ s\text{-}t \text{ OD pair} && (4.6)\\
& z_s^{st} = 1 && \forall\, s\text{-}t \text{ OD pair} && (4.7)\\
& z_t^{st} = 1 && \forall\, s\text{-}t \text{ OD pair} && (4.8)\\
& v_{st} \ge 0 && \forall\, s\text{-}t \text{ OD pair} && (4.9)\\
& x_{ij} \in \{0, 1\} && \forall\, e_{ij} \in E && (4.10)\\
& y_{ij}^{st} \in \{0, 1\} && \forall\, e_{ij} \in E,\ s\text{-}t \text{ OD pair} && (4.11)\\
& z_i^{st} \in \{0, 1\} && \forall\, i \in N,\ s\text{-}t \text{ OD pair} && (4.12)
\end{aligned}
$$

In the formulation, equation 4.1 is the objective function, which sums the violations. Constraint 4.2 enforces the limit of $n - 1$ edges in a tree structure. Constraints 4.3 allow the path connecting any s-t OD pair to contain edge $e_{ij}$ only if the edge $e_{ij}$ is in the solution. Conversely, constraints 4.4 imply that if none of the paths connecting the s-t OD pairs contains the edge $e_{ij}$, then the edge $e_{ij}$ should not exist in the solution. Constraints 4.5 follow from the definition of the violation. Constraints 4.6 limit the number of path edges adjacent to a node, for any s-t OD pair, to 1 for node $s$ and node $t$, 2 for intermediate nodes on the path, and 0 for nodes not on the path. Constraints 4.7 and 4.8 ensure that the nodes $s$ and $t$ are always on the path for the s-t OD pair. Because we are interested only in positive violations, constraints 4.9 restrict the violation to be nonnegative. Finally, constraints 4.10, 4.11, and 4.12 are the binary constraints. To formulate the USVT and NSVT, we replace $H_{st}$ in constraints 4.5 with $H$ and $H_s$, respectively.

4.3 Local Search Algorithms

Our choice of local search is based on the complexity of the problems and our computational experience with the IP formulation. To understand the limitations of an IP approach, we created four different data sets with 3, 6, 9, and 12 cities, respectively. We solve the IP on these four data sets using CPLEX 9.0 on a computer with an Intel Pentium D CPU at 3.20GHz, 1024 KB cache size, and 2 GB RAM. The problem with 3 cities is solved in seconds, and the problem with 6 cities is solved in seconds.
However, the instances with 9 cities and 12 cities do not converge after running for more than 5000 minutes and 6 days, respectively. These results suggest that an IP approach is limited to small problem sizes and is inadequate for realistic instances, which often have more than 100 cities. These results, combined with the fact that all three SVT variants are NP-Complete, lead us to use local search algorithms to solve the SVT.

Our initial experiments with local search use the Edge k-switch neighborhood, which has been successful in other tree problems. We define the Edge k-switch neighborhood in Section 4.3.1, discuss various local search approaches in Section 4.3.2, and present the experimental results in Section 4.3.3.

4.3.1 Edge k-switch Neighborhood

Given a tree, if we delete $k$ edges from the tree, we obtain $k + 1$ subtrees. Adding another $k$ edges back one by one to reconnect the $k + 1$ subtrees creates a new tree. The Edge k-switch neighborhood of a tree consists of all the trees created by first deleting $k$ edges from the tree and then adding another $k$ edges back to reconnect the subtrees. Examples of successful applications of Edge k-switch neighborhoods to spanning tree problems can be found in Savelsbergh and Volgenant (1985), Steiner and Radzik (2003), Ribeiro and Souza (2002), and Gruber et al. (2006).

The size of the Edge k-switch neighborhood grows with $k$. For example, the Edge 1-Switch neighborhood is a subset of the Edge 2-Switch neighborhood. Thus, local search in a larger Edge k-switch neighborhood may offer better solutions but also requires more computational time, so we need to make a tradeoff between the size of the neighborhood and the run time. Our local search scheme therefore focuses on the Edge 1-Switch neighborhood, which is the smallest Edge k-switch neighborhood.

It is important to note that we can efficiently compute the cost of neighboring trees in the Edge 1-Switch neighborhood.
Because deleting an edge from $T$ does not change the distance between two nodes in the same subtree, the algorithm only needs to recalculate the distance and update the violation between pairs of nodes in different subtrees. Suppose edge $e_{pq}$ is the deleted edge and edge $e_{uv}$ is the added edge, where node $u$ is in subtree 1 and node $v$ is in subtree 2. Let $w_1$ and $w_2$ be any two nodes in subtree 1 and subtree 2, respectively. Then the distance between them in the new tree is $d_T(w_1, u) + l_{uv} + d_T(v, w_2)$, where $d_T(w_1, u)$ and $d_T(v, w_2)$ have already been computed.

4.3.2 Local Search Options

Four widely used local search approaches are implemented with the Edge 1-Switch neighborhood.
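The distance update described for the Edge 1-Switch neighborhood can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the C++ implementation used in the experiments: nodes are numbered 0 to n-1, the tree is a dictionary mapping node pairs to lengths, `H` is a matrix of distance restrictions, and the function names are hypothetical. The violation is summed over ordered pairs, as in objective (4.1).

```python
from collections import deque


def tree_distances(n, edges):
    """All-pairs path lengths in a tree given as {(a, b): length}."""
    adj = {i: [] for i in range(n)}
    for (a, b), length in edges.items():
        adj[a].append((b, length))
        adj[b].append((a, length))
    dist = [[0.0] * n for _ in range(n)]
    for root in range(n):
        seen, queue = {root}, deque([root])
        while queue:
            a = queue.popleft()
            for b, length in adj[a]:
                if b not in seen:
                    seen.add(b)
                    dist[root][b] = dist[root][a] + length
                    queue.append(b)
    return dist


def total_violation(dist, H):
    """Sum of max(d_T(s, t) - H[s][t], 0) over ordered pairs s != t."""
    n = len(dist)
    return sum(max(dist[s][t] - H[s][t], 0.0)
               for s in range(n) for t in range(n) if s != t)


def switch_violation(n, edges, dist, H, removed, added, added_length):
    """Total violation after deleting `removed` and adding `added` = (u, v).

    Only pairs split by the deleted edge get a new distance; it is built
    from the old distances as d(w1, u) + l_uv + d(v, w2).
    """
    u, v = added
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        if {a, b} != set(removed):
            adj[a].append(b)
            adj[b].append(a)
    side_u, stack = {u}, [u]  # nodes in the subtree that contains u
    while stack:
        a = stack.pop()
        for b in adj[a]:
            if b not in side_u:
                side_u.add(b)
                stack.append(b)
    if v in side_u:
        raise ValueError("added edge must reconnect the two subtrees")
    total = 0.0
    for s in range(n):
        for t in range(n):
            if s == t:
                continue
            if (s in side_u) == (t in side_u):
                d = dist[s][t]  # same subtree: distance unchanged
            elif s in side_u:
                d = dist[s][u] + added_length + dist[v][t]
            else:
                d = dist[s][v] + added_length + dist[u][t]
            total += max(d - H[s][t], 0.0)
    return total
```

Each neighbor is evaluated in O(n^2) time from the stored distance matrix, without recomputing any shortest paths in the new tree.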

4.3.2.1 Best Improvement Search (BI)

In each iteration of the search, BI replaces the current tree with the best improving tree in the Edge 1-Switch neighborhood, that is, the neighbor that most reduces the sum of the violations. The search terminates when no improvement can be found. For more discussion, we refer the reader to Aarts and Lenstra (1997).

4.3.2.2 First Improvement Search (FI)

In each iteration of FI, the tree is updated with the first improving tree found in the Edge 1-Switch neighborhood. The search terminates when no improvement can be found. For more discussion, we refer the reader to Aarts and Lenstra (1997). To make sure our FI examines different parts of the neighborhood at each iteration, we use an edge list. Before the search begins, we put all edges of the tree into an edge list. Then, starting from the first edge on the list, we continue down the edge list until the first improving tree is found. The next time we look for an improving solution, instead of building another edge list for the updated tree and starting over, we resume from where we left off on the edge list. This search order avoids reexamining the same subtrees over and over.

4.3.2.3 Greedy Randomized Adaptive Search Procedure (GRASP)

GRASP is a search method that embeds randomness in the search process. For additional discussion and examples, we refer the reader to Hart and Shogan (1987), Bard and Feo (1989), Feo and Resende (1989), and Feo et al. (1991). The randomization helps prevent the search from always reaching the same local minimum. In our experiments, GRASP is applied to the Edge 1-Switch neighborhood. Instead of updating the tree with the first improving tree in the Edge 1-Switch neighborhood, our GRASP search randomly chooses one of the first three improving trees in this neighborhood.

4.3.2.4 Tabu Search (TS)

TS is another widely used local search method.
It provides a different strategy to prevent the search from becoming stuck in a local minimum by forcing the search to move to a new neighborhood. For general discussion, we refer the reader to Glover (1989) and Glover (1990).

We now outline our TS on the Edge 1-Switch neighborhood. At the beginning of the search, an edge list of the initial solution tree is built; it is updated during the search process. Whenever a tree update happens, the edge list is updated by replacing the deleted edge with the added edge. The search process then examines the Edge 1-Switch neighborhood for each edge, from the top of the edge list to the bottom. When a tree update occurs, the edge deleted from the tree is placed in the tabu list to prevent it from being added back to the tree. The tabu list has a fixed size. Sizes of 7, 15, 25, 50, 75, 100, 125, and 150 were evaluated in our implementation. The tabu list is updated in a first-in, first-out manner. For example, if the tabu list size is 7, then an edge added to the tabu list stays on the list for 7 tree updates. In addition, we use best improvement as the aspiration rule. That is, if adding an edge in the tabu list back to the tree would produce the best tree found so far in the whole search process, then the tabu list is ignored, and the edge is added to the tree. The search stops after $100n$ consecutive tree updates without an improving update, where $n$ is the number of nodes in the graph.

The following describes the three kinds of tree updates in our TS.

1. Non-tabu improving tree update: The search tries to locate the first tree in the Edge 1-Switch neighborhood that decreases the sum of the violations without using any edge in the tabu list, and then replaces the old tree with this tree.

2. Tabu best-improving tree update: If the search discovers a tree in the Edge 1-Switch neighborhood that has the lowest total violation among all trees observed so far but requires adding an edge from the tabu list, then the search performs the update anyway because of the aspiration rule.

3. Non-tabu non-improving tree update: If the search cannot find any improving tree in the entire Edge 1-Switch neighborhood without adding an edge in the tabu list, it updates the tree with the best non-tabu non-improving tree found in the Edge 1-Switch neighborhood.
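The tabu list bookkeeping described above, with first-in, first-out replacement and the best-improvement aspiration rule, might be sketched in Python as follows. This is an illustrative sketch only; the class and method names are hypothetical and are not taken from the implementation used in the experiments.

```python
from collections import deque


class TabuList:
    """Fixed-size, first-in, first-out list of edges barred from re-entering the tree."""

    def __init__(self, size):
        # A deque with maxlen drops its oldest entry automatically, so an
        # edge leaves the list after `size` further edges have been added.
        self._edges = deque(maxlen=size)

    def add(self, edge):
        """Record the edge just deleted from the tree (undirected)."""
        self._edges.append(frozenset(edge))

    def is_tabu(self, edge):
        return frozenset(edge) in self._edges

    def allows(self, edge, candidate_value, best_value):
        """Permit non-tabu edges, or tabu edges that satisfy the aspiration rule.

        The aspiration rule overrides the tabu status when the candidate tree
        would be strictly better than the best tree found so far.
        """
        return (not self.is_tabu(edge)) or candidate_value < best_value
```

With a list of size 7, for instance, an edge added after a tree update remains tabu until seven more deleted edges have been recorded.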

4.3.3 Implementation and Results

We implement BI, FI, GRASP, and TS and test them on 10 problem instances.

4.3.3.1 Data Set

We test the local search approaches on 10 instances derived from a data set of the 150 largest cities in the United States (Daskin, 1997). The distances between cities are the Euclidean distances computed from their (x, y) coordinates. We next describe how we create the 10 problem instances. There are two decisions in creating each instance: how the graph is connected and the choice of distance restrictions.

We first divide the 150 cities into three groups according to city size: the biggest 50 cities, the middle 50 cities, and the smallest 50 cities. Because we want to study the performance of the local search on graphs with different levels of sparseness, we set different connection rules for the instances. These connection rules dictate which cities have edges between them. For example, the connection rule "1000 miles, 1000 miles, 1000 miles" includes edges between each large, middle, and small city and all other cities within 1000 miles, respectively. The connection rule "500 miles, 500 miles, 500 miles" connects a city to all other cities within 500 miles and yields a sparser graph than the 1000 mile rule. Besides creating edges according to this kind of connection rule, in certain problem instances we also include the edge between any two large cities in the graph. The resulting instances reflect different choices in the number and type of links that delivery companies may consider in the design of their networks.

We next describe how we create the distance restrictions for a problem instance. We first set the service level for each city in terms of transportation time and then translate the service level into distance restrictions.
For example, if we set 2-day delivery service as the service level from a city to all other cities, then, assuming a truck can travel 700 miles per day in the United States, the distance restriction from this city to all other cities is 1400 miles. In the problems, we set different service levels according to the size of the cities. For example, the rule "2-day, 3-day, 4-day" provides 2-day delivery service from a large city, 3-day delivery service from a middle city, and 4-day delivery service from a small city. This rule translates to distance restrictions of 1400 miles, 2100 miles, and 2800 miles from a large city, a middle city, and a small city, respectively. By constructing our instances in this way, we can examine what happens when delivery companies offer different levels of service.
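The translation from service levels to node distance restrictions can be written out as a small Python sketch. The 700 miles per day figure comes from the text; the function names, group labels, and city names are hypothetical and serve only as an illustration.

```python
MILES_PER_DAY = 700  # truck travel per day assumed in the text


def distance_restriction(service_days, miles_per_day=MILES_PER_DAY):
    """Distance restriction in miles implied by a service level in days."""
    return service_days * miles_per_day


def node_restrictions(city_groups, service_rule):
    """Map each city to its restriction H_i from its size group.

    city_groups:  {city: "large" | "middle" | "small"}  (hypothetical labels)
    service_rule: {group: service level in days}, e.g. the rule 2-day, 3-day, 4-day
    """
    return {city: distance_restriction(service_rule[group])
            for city, group in city_groups.items()}


# Hypothetical cities, one from each size group.
groups = {"A": "large", "B": "middle", "C": "small"}
rule = {"large": 2, "middle": 3, "small": 4}
restrictions = node_restrictions(groups, rule)
# restrictions == {"A": 1400, "B": 2100, "C": 2800}
```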

Table 4.1 summarizes the 10 problem instances, where US01, US02, and US03 are USVT instances and the others are NSVT instances.

Table 4.1: 10 problem instances on U.S. data
Columns: Problem Number; Number of Edge; Graph Connection Rule (Large City, Middle City, Small City); Service Level (Large City, Middle City, Small City)
US miles 5000 miles 5000 miles 4 days 4 days 4 days
US miles 1000 miles 1000 miles 4 days 4 days 4 days
US miles 500 miles 500 miles 4 days 4 days 4 days
US miles 5000 miles 5000 miles 2 days 4 days 4 days
US miles 1000 miles 1000 miles 2 days 4 days 4 days
US miles 500 miles 500 miles 2 days 4 days 4 days
US All large cities 500 miles 500 miles 2 days 4 days 4 days
US miles 5000 miles 5000 miles 2 days 3 days 4 days
US miles 1000 miles 1000 miles 2 days 3 days 4 days
US miles 500 miles 500 miles 2 days 3 days 4 days

4.3.3.2 Test Design

We consider two methods for seeding the local searches: the Minimum Spanning Tree (MST) and the Best Shortest Path Spanning Tree (BSPT). For each problem, the BSPT is obtained by examining the shortest path spanning trees rooted at each of the $n$ nodes and choosing the one with the lowest total violation. We run BI, FI, GRASP, and TS on each of the 10 test problems using the MST and the BSPT as the initial solution, respectively. Because GRASP uses randomization, we run the search procedure 10 times on each problem instance. We report the best solution value of the 10 runs, but report the total computation

time for all 10 runs. All tests are coded in C++ and run on a computer with an Intel Pentium D 3.20 GHz CPU, 1024 KB cache, and 2 GB RAM.

4.3.3.3 Test Results

First, we compare the MST and the BSPT as initial solutions. The MST and BSPT are very different from each other in structure. For the same problem instance, the MST has a much higher total violation than the BSPT. For example, Figure 4.1 compares the MST and BSPT for the instance US01. The sum of the violations of the MST is days, which is much higher than that of the BSPT, 15.3 days. Table 4.2 compares the total violation of the solutions and the run time of the FI search starting from the MST and the BSPT on the 10 U.S. instances. MST FI stands for FI using the MST as the initial solution; BSPT FI stands for FI using the BSPT as the initial solution. The results indicate that FI starting from the BSPT converges more quickly, and to better solutions, than FI starting from the MST in all 10 U.S. instances. Thus, because starting the search from the BSPT has advantages over starting from the MST in terms of both solution quality and search time, we use the BSPT to seed the remaining searches in this section.

Next, we compare BI with FI. Table 4.3 compares the sum of the violations and the run time for FI and BI. All the tests use the BSPT as the initial solution. As shown in Table 4.3, FI offers the same solutions as BI in 5 instances, better solutions in 2 instances, and worse solutions in 3 instances. However, FI runs much faster than BI in 9 instances. Based on this observation, we concentrate on enhancing FI because it is more time efficient than BI while finding solutions of comparable quality. This is also why we next combine randomness with FI instead of with BI in the GRASP search.

Table 4.4 compares the total violation and the run time for GRASP and FI. Both search methods use the BSPT as the initial solution.
Table 4.4 indicates that FI is faster than GRASP. However, the best of 10 GRASP runs is always at least as good as the FI solution on all 10 test problems.

Finally, we compare GRASP with TS. Table 4.5 compares the total violation and the run time for TS and GRASP. All the tests use the BSPT as the initial solution. The total violation for TS is the best among the tests with tabu list sizes of 7, 15, 25, 50, 75, 100, 125, and 150. The total violation for GRASP is the best of 10 GRASP runs. The run time for TS is the run time of the test that offers the best solution among the 8 tests with different tabu list sizes. The run time for GRASP is the total run time of 10 GRASP runs. As

Figure 4.1: Two different initial solutions for US01. (a) Minimum Spanning Tree; (b) Best Shortest Path Spanning Tree.

Table 4.2: Compare MST with BSPT as FI initial solutions
Columns: Problem Number; Sum of Violation (days): MST FI, BSPT FI, (MST FI - BSPT FI)/MST FI; Run Time (seconds): MST FI, BSPT FI, (MST FI - BSPT FI)/MST FI
Rows: the 10 U.S. instances
Average: 33.9% (Sum of Violation), 51.4% (Run Time)

Table 4.3: Compare BI with FI
Columns: Problem Number; Sum of Violation (days): BI, FI, (BI - FI)/BI; Run Time (seconds): BI, FI, (BI - FI)/BI
Rows: the 10 U.S. instances
Average: 0.3% (Sum of Violation), 62.4% (Run Time)

Table 4.4: Compare GRASP with FI
Columns: Problem Number; Sum of Violation (days): FI, GRASP, (FI - GRASP)/FI; Run Time (seconds): FI, GRASP, (FI - GRASP)/FI
Rows: the 10 U.S. instances
Average: 1.5% (Sum of Violation)

shown in Table 4.5, GRASP offers better solutions than TS in all 10 instances, and the run time of GRASP is always shorter than that of TS. Thus, GRASP is more efficient than TS in finding a better solution.

Table 4.5: Compare GRASP with TS
Columns: Problem Number; Sum of Violation (days): TS, GRASP, (TS - GRASP)/TS; Run Time (seconds): TS, GRASP, (TS - GRASP)/TS
Rows: the 10 U.S. instances
Average: 1.3% (Sum of Violation), 28.3% (Run Time)

In conclusion, considering both solution quality and run time, we prefer starting the local search from the BSPT and using FI. Furthermore, since GRASP investigates a larger solution space, we also employ the GRASP search in our further efforts to find better solutions.

4.4 Structure-Based Neighborhoods

The second type of neighborhood we explore is based on observations of the structure of the solutions obtained in the tests described above.

4.4.1 Backbone

Across the solution trees obtained by different GRASP runs on the same problem, the sum of the violations varies greatly. This observation suggests that a problem instance has many local minima in the Edge 1-Switch neighborhood. Figure 4.2 compares the best tree and the worst tree obtained in 10 GRASP runs for the problem instance US08, which sets the connection rule as "5000 miles, 5000 miles, 5000 miles" and the service level as "2-day, 3-day, 4-day". As shown in Figure 4.2, although the sum of violations of the best tree is 8.86% less than that of the worst tree, the two trees share a similar structure. We call this shared structure the backbone. A tree with a backbone contains only a small number of high-degree hub nodes, and all of the other nodes are connected to one of these hub nodes.

Although the two solution trees both display a backbone, differences in the structure of the two backbones result in different sums of violations. For example, inside the two rectangles in Figure 4.2, although both trees have a hub node, the locations of the hub nodes are different. In addition, within the two circles in Figure 4.2, the best tree has two hub nodes while the worst tree has only one. Thus, the backbone structure implies that the location and number of the hub nodes play an important role in the total violation.

To confirm the existence of the backbone structure, we also run tests on 10 problem instances derived from the data set of 147 European cities (Daskin, 1997). Table 4.6 describes these 10 E.U. instances, which are constructed in the same way as the U.S. instances. After 10 GRASP runs on each of the 10 E.U. instances, we find that the solution trees for these instances exhibit a backbone structure as well. For example, Figure 4.3 shows that the best and worst trees for the instance EU10 share a similar backbone structure.
However, these two trees also differ in the number and location of the hub nodes. Inside the two circles in Figure 4.3, the best tree has one hub node while the worst tree has none; inside the rectangles, the best tree has four hub nodes while the worst tree has only one hub node in the same area.

Based on these observations, we explore the idea of a structure-based neighborhood, a new neighborhood associated with the hub nodes. If we define all nodes whose degree is more than one as hub nodes, the structure-based neighborhood of a tree consists of the trees created by modifying the set of hub nodes. Next, we introduce three different kinds of structure-based neighborhoods.
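Under the definition just given, in which every node of degree greater than one is a hub node, the hub set of a tree can be read directly off its edge list. The Python sketch below is a minimal illustration; the function name is hypothetical.

```python
from collections import Counter


def hub_nodes(tree_edges):
    """Return the nodes of a tree whose degree is greater than one."""
    degree = Counter()
    for a, b in tree_edges:
        degree[a] += 1
        degree[b] += 1
    return {node for node, d in degree.items() if d > 1}


# Node 0 is joined to 1, 2, and 3, and node 3 is also joined to 4,
# so nodes 0 and 3 are hubs and nodes 1, 2, and 4 are leaves.
example_tree = [(0, 1), (0, 2), (0, 3), (3, 4)]
hubs = hub_nodes(example_tree)
```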

Figure 4.2: Compare the best and worst trees obtained in 10 GRASP runs for US08. (a) The best tree in 10 GRASP runs with sum of violations = 3424 days; (b) The worst tree in 10 GRASP runs with sum of violations = 3757 days.

Table 4.6: 10 problem instances on E.U. data
Columns: Problem Number; Number of Edge; Graph Connection Rule (Large City, Middle City, Small City); Service Level (Large City, Middle City, Small City)
EU miles 5000 miles 5000 miles 3 days 3 days 3 days
EU miles 1000 miles 1000 miles 3 days 3 days 3 days
EU miles 500 miles 500 miles 3 days 3 days 3 days
EU miles 5000 miles 5000 miles 1 day 3 days 3 days
EU miles 1000 miles 1000 miles 1 day 3 days 3 days
EU miles 500 miles 500 miles 1 day 3 days 3 days
EU All large cities 500 miles 500 miles 1 day 3 days 3 days
EU miles 5000 miles 5000 miles 1 day 2 days 3 days
EU miles 1000 miles 1000 miles 1 day 2 days 3 days
EU miles 500 miles 500 miles 1 day 2 days 3 days

Figure 4.3: Compare the best and worst trees obtained in 10 GRASP runs for EU10. (a) The best tree in 10 GRASP runs with sum of violations = 2947 days; (b) The worst tree in 10 GRASP runs with sum of violations = 3098 days.

4.4.2 Move-One-Hub (MOH) Neighborhood

The MOH neighborhood is motivated by the observation that the location of a hub node can influence the sum of the violations. We first define MOH. Let $v$ be a hub node of degree $k$ in a tree $T$, and let $w_1, w_2, \ldots, w_k$ be the $k$ nodes adjacent to $v$ in $T$. MOH first deletes all of the edges $e_{v w_i}$ for $i = 1, 2, \ldots, k$ from the tree $T$ and then connects both $v$ and the nodes $w_i$ for $i = 1, 2, \ldots, k$ to another node $u$, so as to relocate the hub from $v$ to $u$ in the new tree. The MOH neighborhood of a tree includes all the trees created by MOH for every hub node.

To avoid creating a cycle when reconnecting the old hub node $v$ and the nodes $w_i$ to the new hub node $u$, we use the following procedure. After we delete all the edges $e_{v w_i}$ for $i = 1, 2, \ldots, k$, exactly one of the nodes $w_1, w_2, \ldots, w_k$ is in the same subtree as the new hub node $u$. Without loss of generality, let $w_1$ be this node. We then build the new tree as follows. First, since $w_1$ is in the same subtree as the new hub node $u$, we do not need to reconnect $w_1$ with $u$. Second, for each $w_i$ with $i = 2, \ldots, k$, if $e_{u w_i} \in E$, then add $e_{u w_i}$ to the new tree; if $e_{u w_i} \notin E$, then add the edge $e_{v w_i}$ back to the tree. Finally, for the old hub node $v$, if $e_{uv} \in E$, then add $e_{uv}$ to the new tree; if $e_{uv} \notin E$, then add the edge $e_{v w_1}$ back to the tree. The procedure results in a tree because each added edge connects two separate components. Figure 4.4 compares two trees where a hub is relocated from one node to another by MOH.

In designing the local search in the MOH neighborhood, since the location of the hub node strongly influences the sum of the violations, we move a hub node gradually, step by step, while keeping the backbone structure. We examine only a small candidate set of new hub nodes.
We choose the three nodes closest to the old hub node as the candidate set. Because there are $h \le n - 2$ hub nodes in a tree, the size of the MOH neighborhood is at most $3h \le 3n - 6$ for the entire tree, which is small. Thus, we apply the BI rule rather than the FI rule to the local search in this neighborhood. That is, we compare the three candidate new hub nodes and choose the one that offers the most improvement in the total violation as the new hub node.

4.4.3 Add-One-Hub (AOH) Neighborhood

When we compare the solution trees from different GRASP runs on the same problem, we find that some trees with more hub nodes on the backbone offer a lower total violation than those with fewer hub nodes on the backbone. This observation suggests another structure-based neighborhood, the AOH neighborhood.

Figure 4.4: An example of MOH. (a) The tree before MOH; (b) The tree after MOH.
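The MOH reconnection procedure of Section 4.4.2 can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in the experiments: edges are represented as `frozenset` pairs, `graph_edges` plays the role of $E$, and the function name and argument layout are assumptions made for the example.

```python
def move_one_hub(tree_edges, graph_edges, v, u):
    """Relocate hub v to node u, following the MOH reconnection steps.

    tree_edges and graph_edges are sets of frozenset({a, b}) pairs.
    """
    if u == v:
        raise ValueError("the new hub must differ from the old hub")
    neighbors = [next(iter(e - {v})) for e in tree_edges if v in e]
    remaining = {e for e in tree_edges if v not in e}  # all edges at v deleted

    # Find the subtree that contains u after the deletion.
    adj = {}
    for e in remaining:
        a, b = tuple(e)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    component, stack = {u}, [u]
    while stack:
        a = stack.pop()
        for b in adj.get(a, ()):
            if b not in component:
                component.add(b)
                stack.append(b)
    # Exactly one old neighbor of v shares u's subtree; call it w1.
    w1 = next(w for w in neighbors if w in component)

    new_tree = set(remaining)
    for w in neighbors:
        if w == w1:
            continue  # w1 is already connected to u
        link = frozenset((u, w))
        # Attach w to the new hub if the graph allows it, else restore (v, w).
        new_tree.add(link if link in graph_edges else frozenset((v, w)))
    hub_link = frozenset((u, v))
    # Attach the old hub to u if possible, else restore (v, w1).
    new_tree.add(hub_link if hub_link in graph_edges else frozenset((v, w1)))
    return new_tree
```

Every edge added joins two components that were separate, so the result has the same number of edges as the input tree and contains no cycle.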

Concisely, AOH builds a new tree with one more hub node than the old tree. Let $v_a$ and $v_b$ be two connected hub nodes in a tree $T$. Let $w^a_1, w^a_2, \ldots, w^a_k$ be the $k$ nodes other than $v_b$ adjacent to the hub node $v_a$, and let $w^b_1, w^b_2, \ldots, w^b_l$ be the $l$ nodes other than $v_a$ adjacent to the hub node $v_b$ in the tree $T$. AOH first creates, based on the locations of $v_a$ and $v_b$, a candidate node set from which we choose the new hub node $u$. If we let $o$ be the midpoint of edge $e_{v_a v_b}$, then the candidate hub nodes are a number of nodes other than $v_a$ and $v_b$ that are closest to $o$. AOH then deletes all the edges $e_{v_a w^a_i}$ for $i = 1, 2, \ldots, k$, all the edges $e_{v_b w^b_j}$ for $j = 1, 2, \ldots, l$, and the edge $e_{v_a v_b}$ from the tree $T$. Lastly, AOH connects some of the nodes $w^a_i$ back to $v_a$, some of the nodes $w^b_j$ back to $v_b$, and all other $w^a_i$ and $w^b_j$, together with $v_a$ and $v_b$, to the new hub node $u$. Therefore, AOH results in a tree with one more hub node, $u$, besides the two old hub nodes $v_a$ and $v_b$. The AOH neighborhood of a tree contains all the trees created by AOH for every pair of connected hub nodes.

Given a new hub node $u$, we now describe the algorithm to build the new tree. After we delete all the edges $e_{v_a w^a_i}$ for $i = 1, 2, \ldots, k$, all the edges $e_{v_b w^b_j}$ for $j = 1, 2, \ldots, l$, and the edge $e_{v_a v_b}$ from the tree $T$, exactly one of the $k + l$ nodes $w^a_1, \ldots, w^a_k$ and $w^b_1, \ldots, w^b_l$ is in the same subtree as the new hub node $u$. Without loss of generality, let $w^a_1$ be this node. First, since $w^a_1$ is in the same subtree as the new hub node $u$, we do not need to reconnect $w^a_1$ with $u$. Second, for each $w^a_i$ with $i = 2, \ldots, k$, if $e_{u w^a_i} \in E$ and $l_{u w^a_i} \le l_{v_a w^a_i}$, then add $e_{u w^a_i}$ to the new tree; otherwise add the edge $e_{v_a w^a_i}$ back to the tree. Third, similarly for each $w^b_j$ with $j = 1, 2, \ldots, l$, if $e_{u w^b_j} \in E$ and $l_{u w^b_j} \le l_{v_b w^b_j}$, then add $e_{u w^b_j}$ to the new tree; otherwise add the edge $e_{v_b w^b_j}$ back to the tree. Finally, for the old hub node $v_a$, if $e_{u v_a} \in E$, then add $e_{u v_a}$ to the new tree; otherwise add the edge $e_{v_a w^a_1}$ back to the tree. For the old hub node $v_b$, if $e_{u v_b} \in E$, then add $e_{u v_b}$ to the new tree; otherwise add the edge $e_{v_a v_b}$ back to the tree. Hence, the algorithm connects all the nodes adjacent to $v_a$ and $v_b$, except node $w^a_1$, either back to their old hub nodes or to the new hub node $u$ in the new tree, which has one more hub node than the old tree $T$. Because each added edge connects two separate components, the algorithm does not create any cycles. Figure 4.5 compares two trees where a hub node is added by AOH.

If we choose three candidates for each pair of hub nodes, then, as there are $h \le n - 3$ connected hub node pairs in a tree, the size of the AOH neighborhood is at most $3h \le 3n - 9$ for the entire tree, which is small as well. Hence, we again use the BI rule rather than the FI rule in the local search of the AOH neighborhood.
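The candidate set for the new hub node, described above as the nodes closest to the midpoint $o$ of edge $e_{v_a v_b}$, could be computed as in the Python sketch below. The sketch assumes that each node has planar (x, y) coordinates and that closeness is Euclidean; the function name, the coordinate dictionary, and the default of three candidates (the number mentioned in the neighborhood size argument) are choices made for illustration.

```python
import math


def hub_candidates(coords, v_a, v_b, count=3):
    """Return the `count` nodes nearest the midpoint of edge (v_a, v_b).

    coords maps each node to its (x, y) coordinates; v_a and v_b
    themselves are excluded from the candidate set.
    """
    (xa, ya), (xb, yb) = coords[v_a], coords[v_b]
    midpoint = ((xa + xb) / 2, (ya + yb) / 2)
    others = [node for node in coords if node not in (v_a, v_b)]
    return sorted(others, key=lambda node: math.dist(coords[node], midpoint))[:count]


# Hypothetical coordinates; the midpoint of edge (a, b) is (2, 0).
coords = {"a": (0, 0), "b": (4, 0), "c": (2, 1),
          "d": (2, 5), "e": (10, 10), "f": (3.5, 0)}
candidates = hub_candidates(coords, "a", "b")
```

Each candidate would then be evaluated by building the corresponding tree and keeping the best one, in line with the BI rule.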

Figure 4.5: An example of AOH. (a) The tree before AOH; (b) The tree after AOH.

4.4.4 Merge-Two-Hubs (MTH) Neighborhood

Among the solution trees from different GRASP runs on the same problem, we also find that some trees with fewer hub nodes on the backbone show a lower total violation than those with more hub nodes on the backbone. This phenomenon yields the third structure-based neighborhood, the MTH neighborhood, which reduces the number of hub nodes by one in the new tree.

Let $v_a$ and $v_b$ be two connected hub nodes in a tree $T$. Let $w^a_1, w^a_2, \ldots, w^a_k$ be the $k$ nodes other than $v_b$ adjacent to the hub node $v_a$, and let $w^b_1, w^b_2, \ldots, w^b_l$ be the $l$ nodes other than $v_a$ adjacent to the hub node $v_b$ in the tree $T$. As with AOH, MTH first creates a candidate new hub node set and then deletes all the edges $e_{v_a w^a_i}$ for $i = 1, 2, \ldots, k$, all the edges $e_{v_b w^b_j}$ for $j = 1, 2, \ldots, l$, and the edge $e_{v_a v_b}$ from the tree $T$. Unlike AOH, MTH next connects all the nodes $w^a_i$ for $i = 1, 2, \ldots, k$, $w^b_j$ for $j = 1, 2, \ldots, l$, $v_a$, and $v_b$ to the new hub node $u$. The MTH neighborhood of a tree contains all the trees created by MTH for each pair of connected hub nodes.

The algorithm to build the new tree for MTH is similar to that for AOH, and we use the same notation. The first step in MTH is the same as in AOH: no reconnection is needed for $w^a_1$. The second and third steps differ from those in AOH because they test only whether the edge exists and do not compare edge lengths. In the second step, for each $w^a_i$ with $i = 2, \ldots, k$, if $e_{u w^a_i} \in E$, then add edge $e_{u w^a_i}$ to the new tree; otherwise add $e_{v_a w^a_i}$ back to the tree. In the third step, for each $w^b_j$ with $j = 1, 2, \ldots, l$, if $e_{u w^b_j} \in E$, then add edge $e_{u w^b_j}$ to the new tree; otherwise add the edge $e_{v_b w^b_j}$ back to the tree. The final step is exactly the same as in AOH. Hence, the algorithm substitutes the new hub node $u$ for the two old hub nodes $v_a$ and $v_b$, so that the number of hub nodes is reduced by one in the new tree. Figure 4.6 compares two trees where two hub nodes are merged by MTH.
The method used in AOH to create the candidate new hub node set is also adopted in MTH. In addition, the BI rule is again used in the local search of the MTH neighborhood.

4.4.5 Data Set and Test Design

Since the structure-based neighborhoods are based on the solution trees obtained by the Edge 1-Switch neighborhood local searches, we implement a two-stage local search. It combines the local searches on these two different neighborhoods by feeding the solution obtained by the Edge 1-Switch local search into the structure-based neighborhood local search.

Figure 4.6: An example of MTH. (a) The tree before MTH; (b) The tree after MTH.

4.4.5.1 Data Set

As the 10 U.S. instances and 10 E.U. instances defined earlier are USVT and NSVT instances, we create 8 PSVT instances to include in our testing as well. The graphs of the 8 PSVT instances are developed using a method similar to that used for the USVT and NSVT instances. We use two schemes to dictate the service levels, which are then translated to pairwise distance restrictions. The two schemes are Pairwise Size and Pairwise Distance.

The Pairwise Size scheme decides the service level according to the size of the origin and destination cities. There is often more competition for business in larger cities, which may translate to better service levels. Table 4.7 shows the Pairwise Size scheme for U.S. cities. The Pairwise Size scheme for E.U. cities is similar, except that the service level is one day less than that of U.S. cities because the cities in Europe are geographically closer to each other.

The Pairwise Distance scheme decides the service level according to the distance between the origin and the destination cities. This scheme is based on the simple idea that it takes less time to travel shorter distances. For every U.S. city, we set 2-day service to its closest 50 cities, 3-day service to its next 50 cities, and 4-day service to its farthest 50 cities. For an E.U. city, the Pairwise Distance scheme is similar, except that the service level is one day less than for a U.S. city. We summarize the 8 PSVT instances in Table 4.8.

Table 4.7: Pairwise Size scheme for U.S. cities

                          To Largest 50 Cities   To Middle 50 Cities   To Smallest 50 Cities
From Largest 50 Cities    2 days                 2 days                3 days
From Middle 50 Cities     2 days                 2 days                3 days
From Smallest 50 Cities   3 days                 3 days                4 days

4.4.5.2 Test Design

To examine whether the structure-based neighborhoods improve the solution, our first experiments involve a Single Structure-Based Neighborhood Local Search (SSBLS).
In SSBLS, we use the solutions constructed by GRASP in the Edge 1-Switch neighborhood as the seed solutions for the MOH local search, the AOH local search, and the MTH local search, respectively. Second, we investigate the performance of the combination of the three structure-based neighborhoods in a Multiple Structure-Based Neighborhood Local Search (MSBLS). In MSBLS, we again use the solutions from the GRASP runs as the seed solutions, but now we perform MOH, AOH, and MTH local search repeatedly until no improvement can be found. Last, we study the combination of the Edge 1-Switch local search and the structure-based local search (ES-SBLS). In ES-SBLS, we seed the structure-based local search from the GRASP runs and then run MOH, AOH, and MTH until an improving solution is found. If any of the structure-based neighborhoods yields an improving solution, that solution is fed into a GRASP search again. If GRASP converges to a better solution, then this solution is passed back into the structure-based neighborhood local search. This process repeats until no improving solution can be found with any of these neighborhoods.

Problem   Edge     Graph   Connection Rule                          Service Level
Number    Number           Large City   Middle City   Small City    Scheme
                   US      miles        1000 miles    1000 miles    Pairwise Size
                   US      miles        500 miles     500 miles     Pairwise Size
                   US      miles        1000 miles    1000 miles    Pairwise Distance
                   US      miles        500 miles     500 miles     Pairwise Distance
                   EU      miles        1000 miles    1000 miles    Pairwise Size
                   EU      miles        500 miles     500 miles     Pairwise Size
                   EU      miles        1000 miles    1000 miles    Pairwise Distance
                   EU      miles        500 miles     500 miles     Pairwise Distance

Table 4.8: The 8 PSVT instances

Test Results

The experiments with structure-based neighborhoods are still in progress and should be completed shortly. However, we now report some of the test results.
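The alternation between the two stages in ES-SBLS, as described under Test Design, can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the function name es_sbls and its parameters are hypothetical, edge_search stands in for the GRASP search on the Edge 1-Switch neighborhood, and each entry of structure_moves stands in for one of the MOH, AOH, and MTH local searches, assumed here to return either a candidate solution or None.

```python
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")  # a solution, e.g. a spanning tree (representation is an assumption)


def es_sbls(
    seed: S,
    violation: Callable[[S], float],
    edge_search: Callable[[S], S],
    structure_moves: Sequence[Callable[[S], Optional[S]]],
) -> S:
    """Alternate structure-based moves and an edge-based search (hypothetical sketch).

    seed            -- solution produced by an earlier edge-based (GRASP) run
    violation       -- objective to minimize (total distance violation)
    edge_search     -- stand-in for the GRASP / Edge 1-Switch search
    structure_moves -- stand-ins for the MOH, AOH, and MTH local searches
    """
    best = seed
    while True:
        # Structure-based phase: try each neighborhood until one improves.
        improved = None
        for move in structure_moves:
            candidate = move(best)
            if candidate is not None and violation(candidate) < violation(best):
                improved = candidate
                break
        if improved is None:
            # No structure-based neighborhood improves the solution: stop.
            return best
        best = improved

        # Edge phase: feed the improved solution back into the edge search
        # and keep its result only if it is strictly better.
        candidate = edge_search(best)
        if violation(candidate) < violation(best):
            best = candidate


# Toy stand-ins so the sketch runs: a "solution" is an integer and its
# violation is the integer itself. None of this models real trees.
def toy_violation(x: int) -> float:
    return float(x)


def toy_edge_search(x: int) -> int:
    return x - (x % 4)  # descend to the nearest lower multiple of 4


def toy_moh(x: int) -> Optional[int]:
    return x - 3 if x >= 10 else None


def toy_aoh(x: int) -> Optional[int]:
    return x - 1 if x % 2 == 1 else None


def toy_mth(x: int) -> Optional[int]:
    return x - 2 if x > 0 and x % 8 == 0 else None


result = es_sbls(20, toy_violation, toy_edge_search, [toy_moh, toy_aoh, toy_mth])
```

Because a structure-based move is accepted only when it strictly lowers the violation, every round strictly improves the incumbent, so the loop terminates on any finite solution space.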

4.4.6.1 Compare GRASP with SSBLS

Figure 4.7 compares the total violation of GRASP and SSBLS for 20 problem instances. The total violation for GRASP is the best obtained in the previous 10 GRASP runs. For each SSBLS variant, we perform 10 runs, seeding each run with one of the 10 solutions obtained in the previous 10 GRASP runs. The total violation reported for each kind of SSBLS in Figure 4.7 is then the best of the 10 SSBLS runs. Figure 4.7 indicates that MOH, AOH, and MTH improve the solution obtained by GRASP in 5, 12, and 4 of the 20 instances, respectively. This confirms that the structure-based neighborhoods can improve the solution found in the Edge 1-Switch neighborhood.

Figure 4.7: Compare the sum of violation for GRASP and SSBLS

Table 4.9 compares the number of MOH, AOH, and MTH moves in the search process. The numbers in Table 4.9 are the average numbers of MOH, AOH, and MTH moves over 10 runs, respectively. As shown in Table 4.9, the average number of MOH, AOH, and MTH moves in each SSBLS is 1 or 0. This result implies that although SSBLS can
