Technische Universität Berlin
Fachbereich 3 Mathematik

Approximation Algorithms for Scheduling Series-Parallel Orders Subject to Unit Time Communication Delays

by Rolf H. Möhring and Markus W. Schäffter

No. 483/1995

Sekr. MA 6-1, Straße des 17. Juni 136, D-10623 Berlin, Germany

Approximation Algorithms for Scheduling Series-Parallel Orders Subject to Unit Time Communication Delays

Rolf H. Möhring    Markus W. Schäffter

December 1995

Abstract

We consider the problem P | prec, c_ij = 1 | C_max of scheduling jobs with arbitrary processing times on m parallel machines subject to precedence constraints and unit time communication delays. We show that, given an optimal schedule on an unlimited number of machines, i.e. for the problem P∞ | prec, c_ij = 1 | C_max, one can construct a job priority list such that applying priority list scheduling determines a schedule whose makespan is at most twice the minimum. When the precedence constraints are given by a series-parallel order, we derive a polynomial time algorithm to construct such an optimal schedule on a sufficiently large number of machines, thus obtaining a simple and fast approximation algorithm with a worst case performance ratio of at most two for the problem P | series-parallel, c_ij = 1 | C_max. The overall running time of the presented approach is O(n' + min{n log n, n·p_max}), where n' denotes the number of precedence relations, and n and p_max denote the number of jobs and the largest processing time, respectively.

1 Introduction

We consider the problem of scheduling jobs on m identical machines subject to precedence constraints and unit time communication delays. The history of this problem is as follows. In 1987, Rayward-Smith [16] proved that the problem of scheduling unit-time jobs with unit communication delays is NP-hard. The same paper shows that the relative worst case performance of every greedy schedule for this problem is at most 3 − 2/m. In 1992, Picouleau presented in his thesis [15] a polynomial time algorithm that solves the problem P∞ | prec, p_j, c_ij | C_max for the case of single-source/single-target series-parallel orders. His approach does, however, not carry over to the arbitrary series-parallel orders considered in this paper. In 1993, Lenstra, Veldhorst and Veltman [11] showed that the problem P | prec, c_ij = 1 | C_max becomes NP-hard already for in-trees, and thus in particular for the larger class of series-parallel orders. The given transformation can even be extended to binary in-trees [19]. For the two-machine case, the same authors constructed a linear time algorithm that solves the problem optimally. In the same year, Lawler [10] developed a linear time approximation algorithm for the problem Pm | tree, p_j = 1, c_ij = 1 | C_max, which yields a schedule missing the minimum makespan by at most m − 2 time units.

Technische Universität Berlin, Fachbereich Mathematik, Sekr. MA 6-1, Straße des 17. Juni 136, 10623 Berlin, Germany, e-mail: {moehring,schaeffter}@math.tu-berlin.de

In July 1995, Guinand, Rapine and Trystram [7] presented an approximation algorithm that improves the bound on the absolute error of [10] by a factor of 2, to (m − 1)/2. In 1994/95, Hanen and Munier [8] obtained an approximation algorithm for the problem P | prec, c_ij | C_max with a worst case ratio of (4 + 3ρ)/(2 + ρ) − (2 + 2ρ)/((2 + ρ)m), where ρ denotes the ratio between the maximum communication delay and the minimum processing time. For the case of unit processing times and unit time communication delays, this bound is 7/3 − 4/(3m) (see also [13]). This bound is obtained by solving an LP-relaxation and rounding the obtained fractional values. For the corresponding problem with an unlimited number of machines, the relative performance is 4/3.

In 1995, Verriet [18] dealt with the problem P2 | st-ser-par, p_j = 1, c_ij = 1 | L_max, scheduling unit time jobs subject to a single-source/single-target series-parallel order, unit communication delays, release times and deadlines, in order to minimize the maximum lateness. He introduced a sufficient condition on the instance, called the least urgent parent property, that allowed him to find an optimal schedule for problems fulfilling this condition. In the same year, Munier and Hanen [14] presented a list scheduling algorithm for the case that job duplication is allowed, whose worst case relative performance ratio is at most 2 − 1/m for the problem P | dup, prec, p_j = 1, c_ij = 1 | C_max.

In this paper, we consider the case that the precedence constraints are given by a series-parallel order (see Section 2.2). We present a simple approximation algorithm that produces a feasible schedule whose worst case performance ratio is at most two. The algorithm is based on priority list scheduling. The first part of the paper deals with determining the priorities. They correspond to a "generalized" height in the partial order that takes the communication delays into account, and are calculated along the series-parallel decomposition tree of the partial order. We then show that this height function yields an optimal schedule if sufficiently many machines are available, i.e. for the problem P∞ | prec, c_ij = 1 | C_max. Determining such a schedule is NP-hard for arbitrary orders [15], and polynomial for several special cases including trees and single-source/single-target series-parallel orders, but not arbitrary series-parallel orders, see [15, 4, 1, 2, 3]. It turns out that, for arbitrary orders, every optimal schedule for an unlimited number of machines already induces a height function with the same "essential" properties as those obtained for series-parallel orders. This is even true for communication delays with values 0 or 1.

The second part of the paper, beginning with Section 4, uses only these essential properties of the height function, i.e. the results of this part apply to arbitrary orders. Thus the necessary height function can be derived from an optimal schedule or, in the case of a series-parallel order, be constructed as described in Section 3. We first obtain a priority list from the given height function. We then present a generalization of the standard list scheduling approach to problems with communication delays and apply it to the obtained priority list and any number m of machines. Arguments about the idle times in the resulting schedule then yield that the makespan of this schedule is at most twice the minimum makespan. This is currently the best performance ratio for the problem P | series-parallel, c_ij = 1 | C_max.
The overall computation time of our approach is O(|E| + |V| log |V|), including both the calculation of the height function on the series-parallel order and the list scheduling algorithm. This bound can be improved to O(|E| + |V|) if all processing times are bounded by a fixed constant, e.g. in the case of unit processing times. Here V denotes the set of jobs, while E denotes the set of precedence constraints, E ⊆ V × V. A detailed example of the algorithms is given in the appendix.

2 Definitions and Notations

A comprehensive overview of the theory of scheduling can be found in [6]. We only give a brief overview of the definitions and notations necessary here. In order to be compatible with the commonly used notation, we give the definitions for arbitrary communication delays. Since we suppose that the precedence constraints are given by a series-parallel order, we also introduce the necessary definitions and notations for series-parallel orders here. A comprehensive overview can be found in [12] and [17].

2.1 Notations in Scheduling Theory

An instance I = (m, V, p, ≺, c) for the problem P | series-parallel, p_j, c_ij | C_max consists of the number m of machines, a set V of jobs, a series-parallel precedence order ≺ on the set V of jobs, a processing time p(v) ≥ 1 for every job v ∈ V, and an interprocessor communication delay c(v, w) ≥ 0 for every pair of jobs v, w such that v is a direct predecessor of w. In this paper, we focus on the case of unit time communication delays c(u, v) = 1. We denote the set of direct successors of a job v by Succ!(v) and the set of all successors (direct or indirect) of v, not including v, by Succ(v). The corresponding sets of predecessors are denoted by Pred!(v) and Pred(v), respectively.

A schedule S on a job set V is a function assigning a starting time S(v) to each job v ∈ V. The completion time of a job v is then given by C(v) = S(v) + p(v). The length (or makespan) of a schedule S can be computed as C_max(S) = max{C(v) | v ∈ V}. For a schedule S, we call the intervals s_t = [t, t + 1] for t = 0, ..., C_max(S) − 1 the time slots of S.

We call a schedule S weakly feasible if it respects the limited number of machines and the precedence constraints:

(I) No more than m jobs are scheduled simultaneously in any time slot s_t, i.e. #{v ∈ V | S(v) ≤ t < C(v)} ≤ m for all t = 0, ..., C_max(S) − 1.

(II) No job is started before all its predecessors have been finished, i.e. S(v) ≥ C(u) for all jobs v ∈ V and u ∈ Pred(v).

A weakly feasible schedule does not respect the communication delays. In the presence of communication delays, we have to modify the second condition, which leads to the following definition of feasibility. A schedule is called feasible if all jobs v ∈ V fulfill the following conditions:

(I) No more than m jobs are scheduled simultaneously in any time slot s_t, i.e. #{v ∈ V | S(v) ≤ t < C(v)} ≤ m for all t = 0, ..., C_max(S) − 1.

(IIa) S(v) ≥ C(u) for all u ∈ Pred!(v).

(IIb) S(v) < C(u) + c(u, v) for at most one job u ∈ Pred!(v).

For a given instance I = (m, V, p, ≺, c) for the problem P | series-parallel, p_j, c_ij | C_max, let C_opt(I) denote the minimum makespan among all feasible schedules. Similarly, let C^∞_opt(I) denote the minimum makespan among all schedules that only fulfill (IIa) and (IIb), i.e. that assume an unlimited number of machines. Clearly, C^∞_opt(I) ≤ C_opt(I).

For a feasible schedule S one can easily construct a feasible machine-assignment, i.e. an assignment of every job v to a machine M(v) ∈ {1, 2, ..., m} such that the following conditions are fulfilled:

(I') At any time t = 0, ..., C_max(S) − 1, at most one job is assigned to each machine M, i.e. #{v ∈ V | S(v) ≤ t < C(v) and M(v) = M} ≤ 1 for each M ∈ {1, 2, ..., m}.

(II'a) S(v) ≥ C(u) for all u ∈ Pred!(v) with M(u) = M(v).

(II'b) S(v) ≥ C(u) + c(u, v) for all u ∈ Pred!(v) with M(u) ≠ M(v).
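
For concreteness, the following Python sketch (not from the paper; the function name and the pred map of direct predecessors are our own, and integer start and processing times are assumed) checks Conditions (I), (IIa) and (IIb) for the unit-delay case c(u, v) = 1:

```python
# Sketch (not from the paper): feasibility check for a schedule S
# under unit communication delays, per Conditions (I), (IIa), (IIb).
# pred[v] is the set of direct predecessors Pred!(v).

def is_feasible(S, p, pred, m):
    C = {v: S[v] + p[v] for v in S}                  # completion times
    for t in range(max(C.values())):
        # (I): at most m jobs run in time slot s_t = [t, t+1]
        if sum(1 for v in S if S[v] <= t < C[v]) > m:
            return False
    for v in S:
        # (IIa): every direct predecessor has finished before v starts
        if any(S[v] < C[u] for u in pred[v]):
            return False
        # (IIb): S(v) < C(u) + c(u, v) holds for at most one direct
        # predecessor u; here c(u, v) = 1 on every arc
        if sum(1 for u in pred[v] if S[v] < C[u] + 1) > 1:
            return False
    return True
```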

A simple way to find a feasible machine-assignment for a feasible schedule S is as follows (a sketch is given after this paragraph). First assign all jobs with a starting time of zero, i.e. the jobs in time slot s_0, arbitrarily to the machines. Condition (I) guarantees that every job in time slot s_0 can be assigned to a machine. Then iterate over the time slots s_i, i = 1, 2, ..., C_max − 1. For each time slot s_i, consider first those jobs v that have at least one direct predecessor u such that S(v) < C(u) + c(u, v). Condition (IIb) states that there is at most one such direct predecessor u of each job v. Hence we can set M(v) = M(u) for these jobs v. The remaining jobs v' in time slot s_i then fulfill S(v') ≥ C(u) + c(u, v') for each direct predecessor u of v', so these jobs v' can be scheduled on any of the still empty machines; make an arbitrary such assignment. Again, Condition (I) guarantees that there are enough empty machines to assign every job v' in time slot s_i to one machine. Note that this approach takes only linear time for arbitrary partial orders.
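
A minimal sketch of this assignment procedure, under the same assumptions as above (unit delays, integer times, hypothetical names; it also assumes, as the text implicitly does, that at most one starting job inherits each machine):

```python
# Sketch (not from the paper): greedy machine assignment for a
# feasible schedule S, following the procedure described above.

def machine_assignment(S, p, pred, m):
    C = {v: S[v] + p[v] for v in S}          # completion times
    M = {}                                   # machine assignment M(v)
    for t in range(max(C.values())):
        starting = [v for v in S if S[v] == t]
        # machines occupied during slot s_t by jobs started earlier
        taken = {M[v] for v in M if S[v] < t and C[v] > t}
        rest = []
        for v in starting:
            # predecessors whose delay is not respected; by (IIb)
            # there is at most one, and v must run on its machine
            tied = [u for u in pred[v] if S[v] < C[u] + 1]
            if tied:
                M[v] = M[tied[0]]
                taken.add(M[v])
            else:
                rest.append(v)
        # remaining jobs go on free machines; (I) guarantees room
        free = (k for k in range(1, m + 1) if k not in taken)
        for v in rest:
            M[v] = next(free)
    return M
```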

2.2 Series-Parallel Orders

A partial order is a pair (V, ≺) consisting of a set V and a strict order relation ≺ on V, i.e. a transitive and asymmetric binary relation, denoted by u ≺ v. There exist several slightly different definitions of series-parallel orders, such as single-source/single-target series-parallel and edge series-parallel orders (see [17, 12]). These are special cases of the following order-theoretic definition.

A partial order ≺ is called series-parallel if it can be obtained recursively from singletons by two operations, the series composition and the parallel composition of two (series-parallel) sub-orders. The smallest series-parallel order consists of a single element, called a singleton. Suppose that ≺_1 and ≺_2 are two series-parallel orders on disjoint sets. Then the series and the parallel composition of ≺_1 and ≺_2 are again series-parallel orders.

The series composition ≺_S = (V, ≺) of two series-parallel orders ≺_1 = (V_1, ≺_1) and ≺_2 = (V_2, ≺_2) with V_1 ∩ V_2 = ∅ is defined by V = V_1 ∪ V_2 and v_1 ≺ v_2 if v_1 ≺_1 v_2 in ≺_1, or v_1 ≺_2 v_2 in ≺_2, or v_1 ∈ V_1 and v_2 ∈ V_2. Loosely speaking, ≺_S introduces the additional precedence constraints v_1 ≺ v_2 for all v_1 ∈ V_1 and all v_2 ∈ V_2. ≺_S is denoted by ≺_S = ≺_1 ∗ ≺_2. The sets V_1 and V_2 are called the series blocks of ≺_S.

The parallel composition ≺_P = (V, ≺) of two series-parallel orders ≺_1 = (V_1, ≺_1) and ≺_2 = (V_2, ≺_2) with V_1 ∩ V_2 = ∅ is defined by V = V_1 ∪ V_2 and v_1 ≺ v_2 if v_1 ≺_1 v_2 in ≺_1 or v_1 ≺_2 v_2 in ≺_2. Loosely speaking, ≺_P is the disjoint union of ≺_1 and ≺_2. ≺_P is denoted by ≺_P = ≺_1 ∪ ≺_2. The sets V_1 and V_2 are called the parallel blocks of ≺_P.

According to this definition, every series-parallel order can be obtained by a sequence of successive series and parallel compositions, starting with singletons. Conversely, every series-parallel order can be decomposed into series-parallel sub-orders. The structure of this decomposition is a rooted binary tree, called the decomposition tree of the series-parallel order. The nodes of the decomposition tree correspond to series-parallel sub-orders. Every inner node is a series or a parallel composition of its two children. The root of the decomposition tree corresponds to the partial order ≺ on the entire set V, while the leaves correspond to the singletons.

A decomposition tree of a given series-parallel order can be determined recursively as follows. Start with the entire order as the root of the tree.
According to the definition of series-parallel orders, every non-singleton sub-order ≺' is obtained by a series or parallel composition of two series-parallel sub-orders of ≺'. Let these sub-orders be the children of the node corresponding to ≺' and continue with the recursion. For a series composition, order the children from left to right according to the associated precedence relations. For a parallel composition, order the children arbitrarily. The recursion stops when only singletons are left. For a given series-parallel order ≺ = (V, ≺) with n = |V| elements and n' = |≺| ordered pairs, a decomposition tree can be computed in O(n + n') time [17].

The binary decomposition tree need not be unique when there are repeated series or parallel compositions. For instance, ≺_1 ∗ ≺_2 ∗ ≺_3 has (at least) the two series decompositions (≺_1 ∗ ≺_2) ∗ ≺_3 and ≺_1 ∗ (≺_2 ∗ ≺_3).

One can define a unique canonical decomposition tree by making repeated compositions of the same kind children of the same node. In this tree, series and parallel compositions alternate along every path. The canonical decomposition tree represents all possible binary decomposition trees; see the example in the appendix.

The single-source/single-target series-parallel orders introduced by Picouleau [15, §3.1.5] are a (very restricted) special case of arbitrary series-parallel orders. In our terminology, they correspond to series-parallel orders in which all series and parallel blocks are required to have a unique minimal element (called the source) and a unique maximal element (called the target). Thus these series-parallel orders are still very close to trees.

3 Defining the Priorities

We now determine for a given series-parallel partial order ≺ a priority function on the set of jobs. This function can be understood as a "generalized" height in ≺ that takes the communication delays into account. Roughly speaking, this height function assigns to a job v an upper bound h(v) on the amount of time that it takes to schedule v and all its successors, when enough machines are available.

Commonly in scheduling, such bounds are computed by traversing the partial order "from back to front", i.e. h(v) is calculated after h(w) has been calculated for all successors w of v. We will proceed differently and compute h(v) along the decomposition tree of the partial order, thus exploiting the fact that it is series-parallel. In doing so, we must consider how to obtain the h(v) values for a parallel or series composition from the already computed h(v) values of the two parallel or series blocks B_1 and B_2 in the composition. To this end, we maintain for every block a class of schedules that are represented by their interface type. This interface type models the different ways in which schedules of two blocks can be put together in a parallel or series composition. The crucial case arises when one has to compose schedules S_1, S_2 of two series blocks B_1, B_2, and there are two or more jobs in the last time slot of S_1 or in the first time slot of S_2. In this case, one must insert an extra time slot for the communication delays. The construction along the decomposition tree is such that this bad situation is avoided as much as possible.

To capture this formally, we introduce the following notation. For a given schedule S on a block B ⊆ V, define its interface type (x, y) as consisting of the numbers x and y of jobs scheduled in the first and the last time slot of S, respectively, reduced to a maximum of two. Hence x denotes the minimum of 2 and the number of jobs in the first time slot of S, while y denotes the analogue for the last time slot of S. Thus the tuple (x, y) can only take one of the values (1, 1), (1, 2), (2, 1) and (2, 2). In addition to these interface types, we will need one more for the bad situation mentioned above. This will be introduced below.

For each block B of the series-parallel composition, we will determine the length and the possible interface types of an optimal schedule for this block on sufficiently many machines. The length of such an optimal schedule is denoted by l(B), while tp(B) denotes the set of all componentwise minimal interface types over all such optimal schedules for B. Every job v of B obtains a local height value h_B(v). The values l(B), tp(B), and h_B(v) are determined bottom up along the decomposition tree of the series-parallel order.
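
A schedule's interface type can be read off directly from its first and last time slots (a small sketch, not from the paper; names hypothetical):

```python
# Sketch (not from the paper): the interface type (x, y) of a
# schedule S on a block B, capped at two in each component.

def interface_type(S, p, B):
    first = min(S[v] for v in B)                 # first time slot
    last = max(S[v] + p[v] for v in B) - 1       # last time slot
    x = min(2, sum(1 for v in B if S[v] <= first < S[v] + p[v]))
    y = min(2, sum(1 for v in B if S[v] <= last < S[v] + p[v]))
    return (x, y)
```
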
An important fact is that the different interface types in tp(B) represent possible interface types of optimal schedules for B. As an example, consider the parallel composition of two chains, where one chain consists of three jobs, the other of two jobs, and where all processing times are 1. If both chains start at the same time, the resulting schedule has interface type (2, 1). If the shorter chain is delayed by one time unit, the resulting schedule has interface type (1, 2). Case 2 of Figure 2 depicts both situations. For each interface type in tp(B), the corresponding optimal schedule having exactly that interface type can be derived directly from the height function by setting S(v) = h_max − h_B(v), where h_max denotes the maximum height value h_B(u) among all jobs u in B. Hence the exact height value depends on the chosen interface type.

If several interface types are possible, the height function will give the minimum possible height. This means that a modification of these height values may be necessary if a particular interface type is chosen. In order to keep track of the required changes, jobs for which a modification may become necessary are labeled with the interface type that determines the necessary modification. The modification rules will in fact guarantee that, for each choice of a possible interface type (x, y) ∈ tp(B), one obtains a height function that induces an optimal schedule with interface type (x, y) on B. Moreover, the modification is simple: it just consists in increasing the height of all jobs v with label (x, y) by one time unit. The overall height function h on V is then given by the local height function h_{B_r} in the root B_r of the decomposition tree.

The problem of finding an optimal schedule for a series-parallel order ≺ on B is equivalent to the problem of finding an optimal composition of feasible schedules for the blocks of ≺. To this end, not only optimal schedules have to be considered; it may be necessary to select sub-optimal schedules for some of the blocks. An example is given in Figure 1. This is the reason why we introduce the additional interface type (−,−). So altogether there are 5 interface types: (1, 1), (1, 2), (2, 1), (2, 2), and (−,−). Every block B will have one of the following sets tp(B) of interface types: {(1, 1)}, {(1, 2)}, {(1, 2), (2, 1)}, {(2, 1)}, {(2, 2)}, and {(−,−)}. If tp(B) = {(−,−)}, then every optimal schedule for B (on sufficiently many machines) has interface type (2, 2), but there is a sub-optimal feasible schedule with interface type (1, 1) that is only one time unit longer.

Figure 1 gives an example why this distinction is necessary. Consider the block B consisting of the jobs u_i and v_i for i = 1, 2, 3. This block results from a parallel composition of the two blocks B_u = {u_1, u_2, u_3} and B_v = {v_1, v_2, v_3}. Suppose that block B is then in a series composition with two blocks B_a and B_c. If B_a and B_c consist of more than one job each, say B_a = {a_1, a_2} and B_c = {c_1, c_2}, then an optimal schedule for B_a ∪ B ∪ B_c is obtained from an optimal schedule for B as shown in the upper right schedule of Figure 1. If, however, the blocks B_a and B_c each consist of one job only, say B_a = {a_1} and B_c = {c_1}, this is no longer true. Instead of using an optimal schedule for B of length 3 with interface type (2, 2), we have to choose a sub-optimal schedule of length 4 with the better interface type (1, 1). The decision which type of schedule will be used cannot be made locally when block B is considered, but only later, when a schedule for B is actually used. So we have to store both possibilities in order to choose the right one in the later computation. Thus a block B that has an optimal schedule of length l(B) with interface type (2, 2) and a sub-optimal schedule of length l(B) + 1 with interface type (1, 1) obtains the set tp(B) = {(−,−)} as its set of interface types.

[Figure 1: A partial order and four possible schedules.]

We will now specify how the values l(B), tp(B), h_B(v) and the job labels are determined along the decomposition tree of the series-parallel order.
This turns out to be more complicated for the series composition than for the parallel composition. The parallel composition corresponds more or less to a "maximum" operation, while finding an optimal composition of sub-schedules for a sequence of series compositions of several blocks is equivalent to a shortest path problem in a multi-layer graph.

These interpretations will be elaborated with the aid of the example given in the appendix. So suppose that a binary decomposition tree of ≺ is given.

Singleton blocks: Every single job v corresponds to a singleton block B_v with l(B_v) = h_{B_v}(v) = p(v) and tp(B_v) = {(1, 1)}. Jobs in singleton blocks are not labeled.

Parallel composition: Consider a block B resulting from a parallel composition B = B_1 ∪ B_2 of two blocks B_1 and B_2. Without loss of generality, assume l(B_1) ≥ l(B_2). The values of l(B) and tp(B), the local height values in B, and the job labels are defined by the following case analysis. Figure 2 illustrates these cases.

Case 1: l(B_1) ≥ l(B_2) + 2. Preserve all labels of jobs in B_1, remove all labels of jobs in B_2. Set l(B) = l(B_1) and tp(B) = tp(B_1). Let h_B(v) = h_{B_1}(v) for all v ∈ B_1 and h_B(v) = 1 + h_{B_2}(v) for all v ∈ B_2.

Case 2: l(B_1) = l(B_2) + 1. Remove all labels of jobs in B_1 and B_2. Set l(B) = l(B_1). Let h_B(v) = h_{B_i}(v) for all v ∈ V(B_i) (with i = 1, 2). If tp(B_1) = {(−,−)}, then set tp(B) = {(−,−)}. Otherwise consider the set {(a, 2), (2, b)} with a = min{a | ∃ b ∈ {1, 2} : (a, b) ∈ tp(B_1)} and b = min{b | ∃ a ∈ {1, 2} : (a, b) ∈ tp(B_1)}. Let tp(B) be the set of those pairs (a, b) that are componentwise minimal. If (2, 1) ∈ tp(B), label all jobs in B_2 with (2, 1).

Case 3: l(B_1) = l(B_2). Remove all labels of jobs in B_1 and B_2. Set l(B) = l(B_1) = l(B_2). Let h_B(v) = h_{B_i}(v) for all v ∈ V(B_i) (with i = 1, 2). If ((1, b) ∈ tp(B_1) and (a, 1) ∈ tp(B_2)) or ((1, b) ∈ tp(B_2) and (a, 1) ∈ tp(B_1)) for some a, b ∈ {1, 2}, set tp(B) = {(−,−)} and label all jobs in B_2 with (1, 1). Otherwise set tp(B) = {(2, 2)} and leave all jobs unlabeled.

Series composition: Consider a block B resulting from a series composition B = B_1 ∗ B_2 of two blocks B_1 and B_2. The values of l(B) and tp(B), the local height values in B, and the job labels are defined by the following case analysis.

Case 1: direct coupling. If the condition

    (x, 1) ∈ tp(B_1) and (1, y) ∈ tp(B_2) for some x, y ∈ {1, 2}    (1)

holds, then two feasible schedules, say S_1 and S_2, for the blocks B_1 and B_2 with interface types (x, 1) and (1, y) can be coupled directly so as to obtain an optimal schedule S for block B. This is achieved by the following settings: Set l(B) = l(B_1) + l(B_2). Increase the height values of all jobs in B_1 labeled with (x, 1) and of every job in B_2 labeled with (1, y) by one. Remove all labels of jobs in B_1 and B_2. Set h_B(v) = h_{B_2}(v) for all v ∈ B_2 and h_B(v) = l(B_2) + h_{B_1}(v) for all v ∈ B_1. The set tp(B) consists of the interface type (x, y) derived from the chosen types (x, 1) ∈ tp(B_1) and (1, y) ∈ tp(B_2) fulfilling Condition (1).

Case 2: buffered coupling. If Condition (1) cannot be fulfilled by tp(B_1) and tp(B_2), an extra time slot has to be inserted between any two optimal schedules for B_1 and B_2. Hence we set l(B) = l(B_1) + l(B_2) + 1. The set tp(B) is determined by the following case analysis.

Case 2a: tp(B_1) = {(−,−)} and tp(B_2) ≠ {(−,−)}. Set tp(B) = tp(B_2). If (1, b) ∈ tp(B_2) for some b ∈ {1, 2}, increase the height value of every job in B_1 labeled with (1, 1) and of every job in B_2 labeled with (1, b) by one. Remove all labels of jobs in B_1 and B_2.

Case 2b: tp(B_1) ≠ {(−,−)} and tp(B_2) = {(−,−)}. Set tp(B) = tp(B_1). If (a, 1) ∈ tp(B_1) for some a ∈ {1, 2}, then increase the height value of every job in B_2 labeled with (1, 1) and of every job in B_1 labeled with (a, 1) by one. Remove all labels of jobs in B_1 and B_2.

Case 2c: tp(B_1) = tp(B_2) = {(−,−)}. Set tp(B) = {(−,−)}. Preserve all labels of jobs in B_1 and B_2.

Case 2d: tp(B_1) ≠ {(−,−)} and tp(B_2) ≠ {(−,−)}. Set tp(B) = {(x, y)} with x = min{a | ∃ b ∈ {1, 2} : (a, b) ∈ tp(B_1)} and y = min{b | ∃ a ∈ {1, 2} : (a, b) ∈ tp(B_2)}. Increase the height value of every job in B_1 and B_2 that is labeled with (1, b) and (a, 1) for some a, b ∈ {1, 2}, respectively, by one. Preserve all labels of jobs in B_1 and B_2.

After tp(B) has been determined, the local height function of B is derived from the (possibly already modified) local height functions of B_1 and B_2 as follows: Set h_B(v) = h_{B_2}(v) for all v ∈ B_2 and h_B(v) = 1 + l(B_2) + h_{B_1}(v) for all v ∈ B_1.

[Figure 2: The three cases of the parallel composition. Case 1: l(B_1) ≥ l(B_2) + 2; Case 2: l(B_1) = l(B_2) + 1; Case 3: l(B_1) = l(B_2).]

This completes the definition of the height function. Let I = (m, V, p, ≺) be an instance of the problem P | series-parallel, c_ij = 1 | C_max. The above definition results in local values l(B), tp(B), h_B(·) and job labels for every block B in the decomposition tree. The local height function of a block B induces a feasible schedule for the jobs in B on an unlimited number of machines. If jobs are labeled, several other local height functions are possible. Each of them induces a feasible schedule that has an interface type contained in the set tp(B) of possible interface types of B. This is proved in Lemma 1. Moreover, this feasible schedule is optimal, which is shown in Lemma 3.
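
To summarize the case analysis, the following sketch (not from the paper; names hypothetical) propagates only the lengths l(B) and type sets tp(B) bottom up over a decomposition tree. The job-label and height-value bookkeeping described above is omitted, and where the definition leaves a choice of interface types, the sketch picks componentwise smallest ones:

```python
# Sketch (not from the paper): bottom-up computation of l(B) and tp(B)
# over a series-parallel decomposition tree. Nodes are ('leaf', v),
# ('par', left, right) or ('ser', left, right); p maps jobs to times.
BAD = ('-', '-')   # the extra interface type (-,-)

def minimal(pairs):
    """Keep only componentwise minimal interface types."""
    return {t for t in pairs
            if not any(s != t and s[0] <= t[0] and s[1] <= t[1]
                       for s in pairs)}

def block_values(node, p):
    if node[0] == 'leaf':                         # singleton block
        return p[node[1]], {(1, 1)}
    l1, tp1 = block_values(node[1], p)
    l2, tp2 = block_values(node[2], p)
    if node[0] == 'par':                          # parallel composition
        if l1 < l2:                               # w.l.o.g. l(B1) >= l(B2)
            (l1, tp1), (l2, tp2) = (l2, tp2), (l1, tp1)
        if l1 >= l2 + 2:                          # Case 1
            return l1, set(tp1)
        if l1 == l2 + 1:                          # Case 2
            if tp1 == {BAD}:
                return l1, {BAD}
            a = min(a for (a, b) in tp1)
            b = min(b for (a, b) in tp1)
            return l1, minimal({(a, 2), (2, b)})
        one_first = lambda tp: any(a == 1 for (a, b) in tp)   # Case 3
        one_last = lambda tp: any(b == 1 for (a, b) in tp)
        if (one_first(tp1) and one_last(tp2)) or \
           (one_first(tp2) and one_last(tp1)):
            return l1, {BAD}
        return l1, {(2, 2)}
    # series composition
    if any(b == 1 for (a, b) in tp1) and any(a == 1 for (a, b) in tp2):
        x = min(a for (a, b) in tp1 if b == 1)    # Case 1: direct
        y = min(b for (a, b) in tp2 if a == 1)
        return l1 + l2, {(x, y)}
    if tp1 == {BAD} and tp2 != {BAD}:             # Case 2a
        return l1 + l2 + 1, set(tp2)
    if tp1 != {BAD} and tp2 == {BAD}:             # Case 2b
        return l1 + l2 + 1, set(tp1)
    if tp1 == {BAD} and tp2 == {BAD}:             # Case 2c
        return l1 + l2 + 1, {BAD}
    x = min(a for (a, b) in tp1)                  # Case 2d
    y = min(b for (a, b) in tp2)
    return l1 + l2 + 1, {(x, y)}
```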

Lemma 1 If no job in a block B has a label, then tp(B) consists of a unique interface type (x, y) ≠ (−,−), and h_B is the only local height function of B. In all other cases, there are alternative height functions h'_B for every job label (x', y') occurring in B. These are obtained by increasing the values h_B(v) of all jobs v labeled with (x', y') by one time unit. Any height function h'_B (including h_B) induces a feasible schedule S^∞_{h'_B} for B on sufficiently many machines by setting S^∞_{h'_B}(v) = max{h'_B(u) | u ∈ B} − h'_B(v). Its length is C_max(S^∞_{h'_B}) = max{h'_B(v) | v ∈ B}. The interface type of the resulting schedule is (x, y) for h_B, and the interface type (x', y') inducing h'_B for the alternative height functions h'_B.

Proof The proof follows directly from the construction of the local block values. Note that in each step, the values of a block B result directly from a composition of two feasible schedules of the corresponding sub-blocks of the series-parallel decomposition. □

For zero-one communication delays, the series composition of two optimal schedules for two series blocks B_1, B_2 misses the minimum makespan of the series composition of B_1 and B_2 by at most one, as stated by the following lemma.

Lemma 2 Consider the problem P∞ | series-parallel, c_ij ∈ {0, 1} | C_max. Let I_1 and I_2 denote two instances of this problem, and let I denote the instance resulting from a series composition of the orders of I_1 and I_2. Then

    C^∞_opt(I_1) + C^∞_opt(I_2) ≤ C^∞_opt(I) ≤ C^∞_opt(I_1) + C^∞_opt(I_2) + 1.

Proof Consider an optimal schedule S for the instance I of the problem P∞ | series-parallel, c_ij ∈ {0, 1} | C_max. Since I is the series composition of I_1 and I_2, S decomposes into two feasible sub-schedules S_1 and S_2 for I_1 and I_2, respectively. Obviously,

    C^∞_opt(I) = C_max(S) ≥ C_max(S_1) + C_max(S_2) ≥ C^∞_opt(I_1) + C^∞_opt(I_2).

Since all communication delays are at most one, any two optimal schedules S'_1 and S'_2 for the instances I_1 and I_2 can be combined into a feasible schedule S' for the instance I by introducing an empty time slot between the last time slot of S'_1 and the first time slot of S'_2. Hence,

    C^∞_opt(I_1) + C^∞_opt(I_2) + 1 = C_max(S') ≥ C^∞_opt(I). □

We now use Lemma 1 and Lemma 2 to prove the following result.

Lemma 3 Consider an instance I of the problem P∞ | series-parallel, c_ij = 1 | C_max. Let B be a block of the series-parallel decomposition of the order of I, and consider the sub-instance I_B induced by B (i.e. with job set B and precedence constraints, processing times etc. induced by instance I). Then:

(i) l(B) = max{h_B(v) | v ∈ B}.

(ii) Setting S^∞_B(v) = max{h_B(u) | u ∈ B} − h_B(v) for all v ∈ B yields an optimal schedule S^∞_B for B that fulfills l(B) = C_max(S^∞_B) = C^∞_opt(I_B).

(iii) Let S be an optimal schedule for B with interface type (x, y). If tp(B) = {(−,−)}, then (x, y) = (2, 2). If tp(B) ≠ {(−,−)}, there exists an interface type (x', y') ∈ tp(B) such that x ≥ x' and y ≥ y'. If tp(B) = {(2, 2)}, then there does not exist a feasible schedule with length C^∞_opt(I) + 1 and interface type (1, 1).

Proof The proof is done by induction along the series-parallel decomposition tree of ≺. First, Statements (i)-(iii) are trivially true for singleton blocks. Let us assume that (i)-(iii) hold for two blocks B_1 and B_2. We must show that they also hold for the series and parallel composition of these two blocks.

Case 1: parallel composition. Let B be the parallel composition of B_1 and B_2. We distinguish the following cases according to the construction along the tree.

Case 1a: l(B_1) ≥ l(B_2) + 2. Since all settings for B are the same as for B_1, Statements (i)-(iii) hold for block B because they hold for block B_1 by the inductive hypothesis.

Case 1b: l(B_1) = l(B_2) + 1.

ad (i): By the definition in this case, l(B) = l(B_1) and max{h(v) | v ∈ B} = max{h(v) | v ∈ B_1}. This gives Statement (i) by the inductive hypothesis.

ad (ii): Since Statement (ii) holds for the blocks B_1 and B_2, the height functions of these blocks induce optimal schedules S^∞_1 and S^∞_2 for B_1 and B_2. Setting h(v) = h_1(v) for all v ∈ B_1 and h(v) = h_2(v), or even h(v) = h_2(v) + 1, for all v ∈ B_2 induces a feasible schedule S^∞_B for B (in the sense of Statement (ii)) that fulfills C_max(S^∞_B) = C_max(S^∞_1) = l(B_1) = l(B), which, by the inductive hypothesis, equals C^∞_opt(I_{B_1}) ≤ C^∞_opt(I_B). Hence S^∞_B is optimal for B.

ad (iii): Let S denote an arbitrary optimal schedule for B with interface type (x, y) ∈ tp(B). S decomposes into two parallel sub-schedules S_1 and S_2 for B_1 and B_2, respectively. By the inductive assumption, Statement (ii) is valid for B_1 and B_2. Let I_1 and I_2 denote the sub-instances of I induced by B_1 and B_2, respectively. Then

    C^∞_opt(I) = C_max(S) ≥ max_{i=1,2} C_max(S_i) ≥ max_{i=1,2} C^∞_opt(I_i), which by Statement (ii) equals max_{i=1,2} l(B_i) = l(B_1) = l(B_2) + 1 = l(B).

Thus the interface type (x, y) of S is obtained directly from the interface types (x_1, y_1) and (x_2, y_2) of the sub-schedules S_1 and S_2, respectively. We can assume without loss of generality that S_1 and S_2 are optimal for I_1 and I_2, respectively. Hence either (x, y) = (x_1, min{2, y_1 + y_2}) or (x, y) = (min{2, x_1 + x_2}, y_1). Since Statement (iii) is valid for S_1 and S_2, this proves that Statement (iii) holds for S.

Case 1c: l(B_1) = l(B_2). The proof for this case is analogous to that for Case 1b.

Case 2: series composition. Let B be the series composition of B_1 and B_2. Then every job of B_1 is a predecessor of every job in B_2. For Statements (ii) and (iii), consider the schedule S^∞_B. Due to the series composition, this schedule decomposes into two, not necessarily optimal, sub-schedules S^∞_1 and S^∞_2 for B_1 and B_2. Let I_1 and I_2 denote the corresponding sub-instances of I induced by B_1 and B_2, respectively.

ad (i): Consider the interface types of B_1 and B_2 and distinguish between the following two cases.

Direct coupling: (a, 1) ∈ tp(B_1) and (1, b) ∈ tp(B_2) for some a, b ∈ {1, 2}. This case was called direct coupling in the definition of the local block values. In this case, l(B) = l(B_1) + l(B_2) and h_max(B) = l(B_2) + h_max(B_1) by definition. By the inductive assumption, h_max(B) = l(B_2) + l(B_1) = l(B).

Buffered coupling: If the previous case cannot be achieved by any interface types of B_1 and B_2, the construction yields l(B) = 1 + l(B_1) + l(B_2). This case was called buffered coupling, and in this case h_max(B) = 1 + l(B_2) + h_max(B_1), which, by the inductive assumption, is equal to 1 + l(B_2) + l(B_1) = l(B).

ad (ii): Lemma 1 and Statement (i) show that S^∞_B is a feasible schedule of length C_max(S^∞_B) = l(B). As in the proof of Statement (i), we distinguish between the cases of direct and buffered coupling.

Direct coupling: (a, 1) ∈ tp(B_1) and (1, b) ∈ tp(B_2) for some a, b ∈ {1, 2}. Since Statement (iii) is valid for the blocks B_1 and B_2, there exist two optimal schedules for B_1 and B_2 with the interface types (a, 1) and (1, b). These schedules can be composed into a feasible schedule S without an additional time slot between them. Hence,

    C^∞_opt(I_B) ≤ C_max(S) = C^∞_opt(I_1) + C^∞_opt(I_2), which by Statement (ii) equals l(B_1) + l(B_2) = l(B).

With Lemma 2, C^∞_opt(I_B) = l(B); since S^∞_B has length l(B), it is optimal.

Buffered coupling: This case occurs only if

    min{b | ∃ a ∈ {1, 2} : (a, b) ∈ tp(B_1)} = 2 or min{a | ∃ b ∈ {1, 2} : (a, b) ∈ tp(B_2)} = 2.

Since Statement (iii) is valid for B_1 and B_2, every optimal schedule S_1 for B_1 contains at least two jobs in its last time slot, or the analogue is true for the first time slot of every optimal schedule S_2 for B_2. Hence the sum of the minimum makespans of the sub-instances I_1 and I_2 is strictly smaller than the minimum makespan of the instance I_B. With Lemma 2, we obtain in this case that

    C^∞_opt(I_B) = C^∞_opt(I_1) + C^∞_opt(I_2) + 1.

In the construction of the local height function, at most one of the sub-schedules S^∞_{B_1} and S^∞_{B_2} is not optimal, and then it is exactly one time unit longer than the minimum makespan. Hence we obtain that

    C_max(S^∞_B) = C^∞_opt(I_B) = C^∞_opt(I_1) + C^∞_opt(I_2) + 1,

which, together with the previous equation, proves the optimality of S^∞_B.

ad (iii): We prove Statement (iii) by contradiction. There are four similar cases to be considered:

- tp(B) = {(−,−)} and there exists an optimal schedule with an interface type different from (2, 2).
- tp(B) ≠ {(−,−)} and (1, 1), (1, 2) ∉ tp(B), while there exists an optimal schedule with a single job in the first time slot.
- tp(B) ≠ {(−,−)} and (1, 1), (2, 1) ∉ tp(B), while there exists an optimal schedule with a single job in the last time slot.
- tp(B) = {(2, 2)} and there exists a feasible schedule one time unit longer than the minimum makespan with interface type (1, 1).

Since the proofs for all these cases are similar, we consider the second case only. So suppose that tp(B) contains neither of the interface types (1, 1) and (1, 2), while there exists an optimal schedule S for B that contains only one job in its first time slot. Consider the sub-schedules S_1 and S_2 of S induced by the jobs in B_1 and B_2, respectively.

If S_1 is optimal, then the set tp(B_1) must contain at least one of the interface types (1, 1) and (1, 2), since Statement (iii) is supposed to hold for B_1 by the inductive assumption. Note that tp(B) is determined from tp(B_1) and tp(B_2). Since tp(B_1) contains an interface type (1, b) for some b ∈ {1, 2}, the definition of the local values for block B (in the case of a series composition) yields that tp(B) must contain one of the interface types (1, 1) and (1, 2). This contradicts the above assumption.

If S_1 is not optimal for B_1, then Lemma 2 states that C_max(S) = C_max(S_1) + C_max(S_2), since S is optimal for B. Hence the sub-schedules S_1 and S_2 are composed into S without an additional time slot between the last time slot of S_1 and the first time slot of S_2. Since S_1 is not optimal for B_1, the schedule S_2 must be optimal for B_2; otherwise, due to the unit-time communication delays, two optimal sub-schedules could be composed by introducing an empty time slot between them, yielding a feasible schedule for B that is shorter than S. Since S_1 and S_2 are composed without an extra time slot between them, the last time slot of S_1 as well as the first time slot of S_2 contain exactly one job each. Hence S_1 has interface type (1, 1) and S_2 has either (1, 1) or (1, 2). Since Statement (iii) holds for B_1 and B_2 by the inductive assumption, tp(B_1) ≠ {(2, 2)} and tp(B_2) must contain (1, 2) or (1, 1). Moreover, tp(B_1) cannot contain one of the interface types (2, 1) and (1, 1), since otherwise Statement (iii) guarantees the existence of an optimal schedule S'_1 for B_1 with a single job in the last time slot. This schedule S'_1 could be composed directly with S_2, yielding a feasible schedule for B that is shorter than S. Hence there are only two possible values for tp(B_1): either tp(B_1) = {(1, 2)} or tp(B_1) = {(−,−)}. Since (1, b) ∈ tp(B_2) for some b ∈ {1, 2}, tp(B) contains in both cases one of the interface types (1, 1) and (1, 2), which contradicts the above assumption. This completes the proof of Lemma 3. □

Applying Lemma 3 to the root block of the decomposition tree then yields:

Theorem 1 Let I = (V, p, ≺) be an instance of the problem P∞ | series-parallel, c_ij = 1 | C_max. Consider the height values h(v) = h_{B_r}(v) obtained for the root block B_r = V of the decomposition tree of ≺. Then S^∞, defined by S^∞(v) = max{h(u) | u ∈ V} − h(v), is an optimal schedule for I.

The construction in the proof of Lemma 4 shows that, in the case of a series composition, at most one successor of each job v is scheduled on the same machine as v. So we have:

Corollary 1 Consider the problem P∞ | prec, p_j, c_ij = 1 | C_max on an unlimited number of machines. Then there exists an optimal schedule such that every job v has at most one direct successor that is scheduled on the same machine as v.

This is a very useful property that does not hold for arbitrary communication delays. As an example, consider three jobs a, b and c with unit processing times, precedence constraints a ≺ b and a ≺ c, and communication delays c(a, b) = c(a, c) = 3. There is only one way to schedule these jobs optimally: schedule all of them on the same machine. Corollary 1 can be generalized to arbitrary partial orders and locally small communication delays, i.e. communication delays fulfilling max{c(u, v) | v ∈ Succ!(u)} ≤ min{p(v) | v ∈ Succ!(u)} for all jobs u ∈ V. This can be proved by contradiction and an exchange argument.
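
Theorem 1 turns height values into start times by a single transformation. The following small sketch (not from the paper) applies it to the three-job example used in the proof of Lemma 4 below:

```python
# Sketch (not from the paper): the optimal infinite-machine schedule
# of Theorem 1, S(v) = h_max - h(v), derived from height values h.

def schedule_from_heights(h):
    h_max = max(h.values())
    return {v: h_max - h[v] for v in h}

# Example from the proof of Lemma 4: p(a) = p(b) = 2, p(c) = 1,
# a < c, b < c, c(a,c) = c(b,c) = 1, heights h(a) = h(b) = 4, h(c) = 1.
print(schedule_from_heights({'a': 4, 'b': 4, 'c': 1}))
# -> {'a': 0, 'b': 0, 'c': 3}
```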

4 Priority List Scheduling

In the previous section, we defined a height function h for the jobs of a series-parallel order. We will now consider the values h(v) as priorities and apply priority list scheduling to the list of jobs ordered by non-increasing priority values, thus obtaining an m-machine schedule with a worst-case performance ratio of 2 for the makespan on any number m of machines.

The priority list scheduling algorithm schedules jobs at certain decision times. These decision times are the time t = 0, the completion times of jobs, and, in order to incorporate the unit-time communication delays, the completion times of jobs plus one. The decision times depend on the previously taken decisions and are processed in ascending order. At every decision time t, the algorithm repeatedly chooses among the still unscheduled jobs a job with highest priority (the first "available" job in the given list) that can be started at time t. This is repeated until no more jobs can be started at t. Then the next decision time is considered. Here, a job v is called available at time t if

    C(u) ≤ t for all predecessors u of v, and
    C(û) > t − c(û, v) for at most one predecessor û of v.

This algorithm does not determine a machine assignment, but the resulting schedule fulfills all conditions required for feasibility (cf. Conditions (I), (IIa), (IIb) in Section 2.1). Hence a machine assignment can be obtained according to the procedure described there. So, compared with list scheduling rules for problems without communication delays, we only need to incorporate Condition (IIb). This means that at most one direct successor of a job u can be scheduled directly after the completion of u. Figure 3 gives a detailed description of the thus adapted priority list scheduling algorithm.

Priority list scheduling algorithm

 1  Let L = (v_1, v_2, ..., v_n) be a list of all jobs in V, ordered by non-increasing priorities.
 2  t = 0
 3  while L contains unscheduled jobs do
 4      t_next = ∞
 5      for i = 1 to n do
 6          if v_i is unscheduled then
 7              if #{v ∈ V | S(v) ≤ t < C(v)} < m and v_i is available at time t then
 8                  set S(v_i) = t, and t_next = min{t_next, C(v_i)}
 9              else set t_next = t + 1
10      t = t_next

Figure 3: Priority list scheduling adapted to communication delays
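
A direct transcription of Figure 3 into Python might look as follows (a sketch, not the authors' code; it assumes integer processing times, unit delays on every arc, and a map pred of direct predecessors):

```python
# Sketch (not from the paper): priority list scheduling with unit
# communication delays, following Figure 3. The list L is ordered
# by non-increasing priority h(v).

INF = float('inf')

def list_schedule(L, p, pred, m):
    S, C = {}, {}
    t = 0
    while len(S) < len(L):
        t_next = INF
        for v in L:                       # scan in priority order
            if v in S:
                continue
            busy = sum(1 for u in S if S[u] <= t < C[u])
            # availability: all predecessors finished, and the delay
            # condition C(u) + 1 <= t may fail for at most one of them
            done = all(C[u] <= t for u in pred[v])
            late = sum(1 for u in pred[v] if C[u] + 1 > t)
            if busy < m and done and late <= 1:
                S[v] = t
                C[v] = t + p[v]
                t_next = min(t_next, C[v])
            else:
                t_next = t + 1
        t = t_next
    return S
```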

In the sequel, we no longer need the fact that the precedence constraints form a series-parallel order, but only that the priorities fulfill Condition (2) stated in Lemma 4 below. This is the case for the priorities h(v) derived in the previous section (see the proof of Theorem 2), but it may also hold for other priority functions on more general orders. Moreover, the communication delays may be 0 or 1.

Lemma 4 Let I = (m, V, p, ≺) denote an instance of the problem P | prec, c_ij ∈ {0, 1} | C_max. Let h be a function on the job set V such that every job v ∈ V fulfills the following Condition (2):

(2a) h(u) ≥ h(v) + p(u) for all direct successors v of u, while h(u) < h(v) + c(u, v) + p(u) for at most one direct successor v of u with c(u, v) > 0.

(2b) h(u) ≥ h(v) + p(u) for all direct predecessors u of v, while h(u) < h(v) + c(u, v) + p(u) for at most one direct predecessor u of v with c(u, v) > 0.    (2)

Then priority list scheduling, applied to the list L = (v_1, ..., v_n) of jobs ordered by non-increasing height values h(v_1) ≥ ... ≥ h(v_n), determines a feasible schedule S that fulfills

    C_max(S) ≤ C_opt(I) + max{h(v) | v ∈ V}.

Proof We first observe that the priority list scheduling algorithm always produces a feasible schedule. This is due to the fact that, in Line 7, a job is only scheduled if this violates neither the partial order, nor the communication delays, nor the restricted number of machines. In the sequel, let S denote the feasible schedule generated by the list scheduling algorithm.

The claimed inequality is shown by considering, in the schedule S and for every time t, the maximum remaining height ĥ(v) among all jobs v that are busy at t or start after t. For a job v that is scheduled after time t, its remaining height ĥ(v) equals its height h(v), while for a job w that is being processed during time slot s_{t−1}, the remaining height at time t is the difference between its height and the number of time units that w has already been processed, i.e. ĥ(w) = h(w) − (t − S(w)) = h(w) + S(w) − t. The same idea was also used by Jaffe [9] for scheduling problems without communication delays. This leads to the following definition of H(t) for each t = 0, 1, ..., C_max(S) − 1:

    H(t) = max{ĥ(v) | C(v) > t} = max({h(v) + S(v) − t | S(v) ≤ t < C(v)} ∪ {h(v) | S(v) > t}).

Clearly, the function H(t) is monotonically non-increasing in t. We will show that H(t) also decreases with idle times, i.e. after any k time slots with idle times, H(t) has decreased by at least k − 1. More precisely, H(0) − H(t) ≥ k − 1 when there are k time slots before t that contain idle times.

As an example, consider three jobs a, b, and c with p(a) = p(b) = 2, p(c) = 1, a ≺ c, and b ≺ c. Set c(a, c) = c(b, c) = 1. Then h(c) = 1 and h(a) = h(b) = 4 is a height function fulfilling Condition (2). The schedule S induced by h is S(a) = S(b) = 0, S(c) = 3. Then H(0) = 4, H(1) = 3, H(2) = H(3) = 1, H(4) = 0. Note that between times 0 and 1, H(t) decreases by one (since both a and b are being executed), while between 1 and 3 it decreases by two (due to the communication delays between a and b on the one hand and c on the other hand).

Let 0 ≤ t_1 < t_2 < ... < t_ℓ denote the indices of all non-filled time slots in S, i.e. time slots s_{t_i} = [t_i, t_i + 1] with m > #{v ∈ V | S(v) ≤ t_i < C(v)}. Then, as we will show below, the following property holds for each i ∈ {2, ..., ℓ}:

    H(t_i) ≤ H(t_{i−1}) − 1 or (t_{i−1} ≥ 1 and H(t_i) ≤ H(t_{i−1} − 1) − 2).    (3)

So if there exist full time slots between s_{t_{i−1}} and s_{t_i}, then H(t) decreases by one. If there is no full time slot between them, then t_{i−1} = t_i − 1 and H(t_{i−1}) = H(t_i) is possible; but then the second inequality yields H(t_{i−1} − 1) − H(t_i) ≥ 2. This shows that H(0) − H(t) ≥ k − 1 when there are k idle times before t. So the number ℓ of non-filled time slots is bounded from above by ℓ ≤ H(0) − H(t_ℓ) ≤ H(0). Since H(0) = max{ĥ(v) | C(v) > 0} = max{h(v) | v ∈ V}, the number of non-filled time slots in S can be at most max{h(v) | v ∈ V}. This and

    C_opt(I) ≥ (1/m) Σ_{v∈V} p(v) ≥ #{filled time slots in S}    (4)

yield the following inequality, which proves Lemma 4:

    C_max(S) = #{filled time slots} + #{non-filled time slots} ≤ C_opt(I) + max{h(v) | v ∈ V}.

To complete the proof, we show that Statement (3) holds for each i ∈ {2, ..., ℓ}. Choose i ∈ {2, ..., ℓ} arbitrarily. Consider a job v ∈ V that determines the maximum in the definition of H(t_i), i.e. a job v such that C(v) > t_i and ĥ(v) is maximum. There are two cases for v: S(v) ≤ t_i (Case A) or S(v) > t_i (Case B).

We consider Case A first. So assume that v is being processed in time slot s_{t_i}. Note that the time slot s_{t_{i−1}} also contains an idle time. Hence either v is being executed during that time slot (Case (a) below) or v is not available at time t_{i−1}. Since all communication delays are at most one, there are only two possible reasons why v is not available at time t_{i−1}. The first reason is that one of the predecessors of job v is being processed in time slot s_{t_{i−1}} (Case (b) below). The other possible reason is that all predecessors of job v have been finished before time t_{i−1}, but job v cannot be scheduled at time t_{i−1} because of a communication delay. So there is a direct predecessor u of v that is completed at time t_{i−1}. (Otherwise all predecessors of v are finished at time t_{i−1} − 1 or earlier, and, since communication delays are at most 1, job v can be scheduled at time t_{i−1} on every machine.) If u is the only direct predecessor that completes at time t_{i−1}, there must be another direct successor v' of u that is started at time t_{i−1} and c(u, v) = c(u, v') = 1. (If c(u, v) = 0, then v can be processed in the idle time slot. If c(u, v') = 0, then both v and v' would be available at time t_{i−1} and thus be started by the priority scheduling algorithm.) This is Case (c) below. In the remaining Case (d), there are two direct predecessors u_1, u_2 of v with c(u_1, v) = c(u_2, v) = 1 that both end at time t_{i−1} and thus prevent v (and any other common successor) from starting at time t_{i−1}. Figure 4 depicts the different situations occurring in the case analysis. Let us now consider these cases in detail.

Case (a): As v is being processed during time slot s_{t_{i−1}}, we obtain from the definition of H(t) that H(t_{i−1}) ≥ ĥ(v) = h(v) + S(v) − t_{i−1}. This inequality and t_i ≥ t_{i−1} + 1 yield

    H(t_i) = ĥ(v) = h(v) + S(v) − t_i ≤ h(v) + S(v) − t_{i−1} − 1 ≤ H(t_{i−1}) − 1.

Case (b): Since u is being processed in time slot s_{t_{i−1}}, S(u) ≤ t_{i−1} and t_{i−1} < C(u). We know by Assumption (2) that h(u) ≥ h(v) + p(u). (Due to transitivity, this is true for direct as well as for non-direct predecessors u of v.) Then t_{i−1} < C(u) and S(u) + p(u) = C(u) yield p(u) > t_{i−1} − S(u), which, together with S(u) ≤ t_{i−1}, gives

    H(t_i) = ĥ(v) ≤ h(v) ≤ h(u) − p(u) < h(u) + S(u) − t_{i−1} ≤ H(t_{i−1}).

Case (c): Note that at time t_{i−1}, job v has also been a candidate to be scheduled. Since priority list scheduling prefers jobs of highest priority, the priority of v cannot be greater than the priority of v'. (Otherwise the priority scheduling algorithm would have scheduled v instead of v'.) Hence h(v') ≥ h(v). This, together with S(v') = t_{i−1} and the definition of H(t), gives

    H(t_i) = ĥ(v) ≤ h(v) ≤ h(v') ≤ h(v') + S(v') − t_{i−1} ≤ H(t_{i−1}).

If h(v) < h(v'), it follows that H(t_i) < H(t_{i−1}) and hence the first inequality of (3) holds. So assume that h(v) = h(v'). Then u has two direct successors v, v' of the same height value and with communication delays c(u, v) = c(u, v') = 1. Then Assumption (2a) states that h(u) ≥ h(v) + p(u) + c(u, v) ≥ h(v) + p(u) + 1 or h(u) ≥ h(v') + p(u) + c(u, v') ≥ h(v') + p(u) + 1. Since h(v) = h(v'), we obtain in both cases that h(u) ≥ h(v) + p(u) + 1. This and C(u) = t_{i−1} yield, by the definition of H,

    H(t_{i−1} − 1) ≥ max{h(w) + S(w) − (t_{i−1} − 1) | S(w) ≤ t_{i−1} − 1 < C(w)}