
Preemptive Weighted Completion Time Scheduling of Parallel Jobs

Uwe Schwiegelshohn
Computer Engineering Institute, University Dortmund, Dortmund, Germany, uwe@carla.e-technik.uni-dortmund.de

Abstract. In this paper we present a new algorithm for the off-line scheduling of parallel and independent jobs on a parallel processor system. To this end we introduce a machine model which is based on existing multiprocessors and accounts for the penalty of preemption. After examining the relation between the makespan and the total weighted completion time costs for the scheduling of parallel jobs, it is shown that the new algorithm achieves a small approximation factor for both total weighted completion time and makespan scheduling. To fine-tune the algorithm, a fairly simple numerical optimization problem is derived. This way different preemption penalties can be considered when determining the approximation factor. Finally, we compare the costs of the generated preemptive schedules with those of non-preemptive schedules for the same problem.

Key words. Scheduling, approximation algorithms

AMS subject classifications. 68M20, 68Q25, 90B35

1 Introduction

Various scheduling problems have been addressed since the seventies and many of them have been shown to be NP-complete. Thus, many efforts to deal with these problems have been devoted to finding good approximation algorithms. The most commonly used scheduling objectives are the minimization of either the weighted completion time (response time, flow time) or the makespan. In the case of a multiprocessor, we can say informally that weighted completion time scheduling supports the viewpoint of a user who can influence the completion time of his job by assigning a higher weight to it and consequently accepting higher costs. On the other hand, makespan scheduling is closely related to the overall use of a multiprocessor. There, individual job weights are ignored. Thus, it generally reflects the goal of the multiprocessor's owner, see also [5].
In this paper we address the problem of off-line scheduling of parallel and independent jobs with invariant resource requirements. Makespan scheduling of parallel and independent jobs is similar to bin packing and has been addressed frequently in the past. For instance, Garey and Graham [6] have shown that a simple list schedule for a given job system achieves an approximation factor of 2. Weighted completion time scheduling of sequential jobs on a multiprocessor has also been the subject of various research efforts. For this problem Kawaguchi and Kyan [8] presented a list scheduling algorithm called LF with a tight approximation factor of (1 + √2)/2. However, the parallel problem has only recently been addressed when Schwiegelshohn et al.

(A preliminary version of this paper appeared in the Proceedings of the European Symposium on Algorithms ESA'96, Springer Lecture Notes in Computer Science 1136.)

showed that the SMAT algorithm generates shelf-based schedules with an approximation factor of 8.53 [10]. Turek et al. [14] proved that a generalization of Kawaguchi's and Kyan's LF method produces a tight approximation factor of 2 for parallel jobs with unique weights if the resource requirement of each job is at most 50% of the maximum number of processors. However, when allowing arbitrary jobs, this method may result in schedules which deviate significantly from the optimum. Other results include Chakrabarti et al. [1], who addressed for instance the scheduling of malleable jobs similar to [14] and proved an expected performance within 8.67 of optimal for a randomized on-line algorithm. Deng et al. [2] also discussed preemptive response time scheduling for malleable jobs with unique weights and variable resource requirements. But their model is related to a multithreaded environment and differs significantly from ours, as they allow unknown execution times while neither a preemption penalty nor gang scheduling is considered. Here, we present an off-line algorithm which generates a preemptive schedule with small approximation factors for makespan and total weighted completion time costs. This is achieved by combining two restrictive schedules into a general schedule with the help of preemption. A similar idea has been applied by Stein and Wein [13] to show the existence of schedules which approximate both the makespan and the total weighted completion time. In this paper we first introduce our machine model in Section 2. Then the scheduling problem is defined and some aspects of bicriteria scheduling of parallel and independent jobs are presented in Section 3. Some previous results with relevance to our algorithm are discussed in Section 4. Next, the new algorithm is introduced in Section 5 and then analyzed. We show that the total weighted completion time approximation factor can be fine-tuned by solving a numerical optimization problem.
This way a tight approximation factor of.37 can for instance be achieved if the preemption penalty is not taken into account. Further, some special cases are considered in Section 7. Finally, our results are compared to the worst case bounds for non-preemptive SMAT schedules. The Model A system of M independent parallel jobs must be executed on a multiprocessor. Each job has only a single phase of parallelism []. It is described by its invariant resource requirement r i f1; : : : ; g (also called xed allotment in [14]), its execution time h i 1, and a user set priority weight w i > 0. Note, that without restriction of generality the minimal job execution time has been normalized to 1. Our multiprocessor P(; p) consists of identical nodes which use an interconnection network for communication. Each node has its own processor, main memory, and a local hard disk for swapping as in the IBM SP. The multiprocessor allows free variable partitioning [4], that is the resource requirement r i of a job i can be satised by any resource set i f1; ; : : : g with j i j = r i. Also, the execution time of a job does not depend on the assigned partition. The multiprocessor further supports gang scheduling [4] by switching simultaneously the context of all processors belonging to the same partition. This context switch is executed by use of the local processor

memory and/or the local hard disk, while the interconnection network is not affected except for message draining and synchronization. The context switch also causes a preemption penalty, mainly due to processor synchronization, message draining, saving of job status, and page faults. In our model this penalty is assumed to be a constant amount of time p which is encountered at any context switch. Individual nodes are assigned to jobs in an exclusive fashion, i.e. at any time instant any node belongs to at most a single partition. This partition and the corresponding job are said to be active at this time instant. As gang scheduling is used, a job must be either active on all or on none of the nodes of its partition at the same time. All jobs which have already been started but not yet completed and whose partitions include node n are said to be resident at node n. The number of resident jobs at any node may be limited by the swapping space of the local hard disk. The multiprocessor also has an external parallel file system with a distributed interface which is assumed not to form a bottleneck. Therefore, the amount of time required to load a parallel job and save its results is independent of the assigned partition and of any concurrent loading or saving of other jobs. Consequently, it is included in h_i. On the other hand, the cost of any intermediate saving of the job status due to a context switch is contained in the preemption penalty p.

3 The Scheduling Problem

For a given job system τ and a multiprocessor P(m, p) we next introduce a valid preemptive schedule based on the model described in the previous section by defining timing and resource allocation separately.

Definition 1 (Timing). In a preemptive schedule S(τ, P) each job i is assigned
- a non-negative integer d_i,
- a (2d_i + 2)-tuple of time instants t_i(0), …, t_i(2d_i + 1) such that

    Σ_{ν=0}^{d_i} [t_i(2ν + 1) − t_i(2ν)] = h_i + p·d_i

and

    t_i(ν) − t_i(ν − 1) ≥  0    if d_i = 0 or (ν = 2κ for 0 < κ ≤ d_i),
                           p/2  if d_i > 0 and (ν = 1 or ν = 2d_i + 1),
                           p    if ν = 2κ + 1 for 0 < κ < d_i.

d_i denotes the number of preemptions of job i. More precisely, the execution of job i is started at time t_i(0), interrupted at times t_i(2ν − 1), resumed at times t_i(2ν) with 0 < ν ≤ d_i, and completed at its completion time t_i(2d_i + 1).

Definition 2 (Resource Allocation). A resource allocation is defined by assigning to each job i and each ν with 0 ≤ ν ≤ d_i a node set Π_{i,ν} ⊆ {1, 2, …, m} such that |Π_{i,ν}| = r_i. The resource allocation is valid if Π_{j,ν} ∩ Π_{k,μ} = ∅ for all different jobs j and k with t_j(2ν) ≤ t_k(2μ) < t_j(2ν + 1).
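The timing constraints of Definition 1 can be checked mechanically. The sketch below is a hypothetical helper (the function name and the list representation of the time instants are mine, not the paper's): the execution intervals [t(2ν), t(2ν+1)] of a job must sum to h_i + p·d_i, and each interval must leave room for its share of the context-switch penalty.

```python
# Hypothetical validity check for one job against Definition 1 (sketch).

def timing_valid(t, h, p, d, eps=1e-9):
    """t: the 2d+2 time instants t_i(0..2d+1); h: execution time;
    p: preemption penalty; d: number of preemptions."""
    if len(t) != 2 * d + 2:
        return False
    if any(t[k] > t[k + 1] for k in range(len(t) - 1)):
        return False
    # total length of the execution intervals equals h + p*d
    busy = sum(t[2 * k + 1] - t[2 * k] for k in range(d + 1))
    if abs(busy - (h + p * d)) > eps:
        return False
    # minimum interval lengths induced by the penalty p
    for nu in range(1, 2 * d + 2):
        gap = t[nu] - t[nu - 1]
        if d == 0 or nu % 2 == 0:
            lower = 0.0        # preempted periods may be arbitrarily short
        elif nu == 1 or nu == 2 * d + 1:
            lower = p / 2      # first/last interval: half a context switch
        else:
            lower = p          # inner intervals: a switch at both ends
        if gap < lower - eps:
            return False
    return True
```

For instance, a job with h = 2, p = 0.5, and one preemption may run in intervals [0, 1.25] and [3, 4.25], whose lengths sum to h + p = 2.5.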

In most parts of our paper we further assume that for all jobs i and for all 0 ≤ ν_1 < ν_2 ≤ d_i there is Π_{i,ν_1} = Π_{i,ν_2} = Π_i. Then we also say that the schedule does not use job migration. A schedule S(τ, P) has the total weighted completion time c_S and the makespan m_S with

    c_S = Σ_i w_i·t_i(2d_i + 1)    and    m_S = max_i {t_i(2d_i + 1)}.

For a given job system τ and a multiprocessor P we further denote the optimal costs

    c_opt = min over all S(τ, P) of c_S    and    m_opt = min over all S(τ, P) of m_S.

The problem of finding a schedule with c_S = c_opt is known to be NP-complete in the strong sense for arbitrary m even if each job requires only a single processor [7]. The same is true for finding a schedule with m_S = m_opt in the case of parallel jobs [3]. Both results hold for preemptive and non-preemptive schedules. Moreover, in general there is no schedule with c_S = c_opt and m_S = m_opt, as shown in Example 1.

Example 1. Assume the following system of m + 1 jobs:

    h_i  r_i  w_i  jobs
    1    1    1    1 ≤ i ≤ m
    [row garbled in this copy]  i = m + 1

For this job system the following relations hold for the various optimal schedules:

    schedule                                  m_{S_i}   c_{S_i}
    any optimal makespan schedule S_m         [values garbled in this copy]
    any optimal completion time schedule S_c  [values garbled in this copy]

Example 1 also demonstrates the existence of job systems such that any optimal makespan schedule S_m results in c_{S_m} = Θ(m)·c_opt. A similar result with respect to optimal completion time schedules is described in the following lemma.

Lemma 3. There are job systems such that any non-preemptive optimal total weighted completion time schedule S_c results in m_{S_c} = Θ(m)·m_opt.

Proof. We consider a job system of k jobs which complies with the following restrictions:

    h_i  r_i       w_i        jobs
    H    1         H(m + 1)   i = 1
    1    m         H + 1      i = 2
    1    variable  variable   2 < i ≤ k [remaining entries garbled in this copy]

In addition, we have Σ_{i=3}^{k} w_i ≤ (H + 2)/2. Note that delaying job 1 by only a single time unit is more expensive than starting an optimal schedule for all other jobs at time h_1 = H, as H(m + 1) > (2 + (H + 2)/2)·H. Therefore, in the optimal completion time schedule the start time t_1(0) of job 1 must be 0. Also, job 1 cannot be executed concurrently with job 2, as the second job requires all m nodes. Similarly, it is more expensive to delay job 2 by a single time unit than to start the optimal schedule for jobs 3 to k at time h_1 + h_2 = H + 1, as H + 1 > (H + 2)/2. Due to the execution times and node requirements of jobs 3 to k, none of them can be started before time H + 1 in the optimal completion time schedule. By recursively repeating the same construction for all remaining jobs, the single optimal completion time schedule S_c can be determined. With suitably chosen execution times for the remaining jobs [details garbled in this copy], we obtain m_{S_c} = Θ(m)·m_opt. However, if preemption is allowed, the relation between total weighted completion time scheduling and makespan scheduling is different.

Lemma 4. If preemption with job migration and without preemption penalty is allowed, then any optimal weighted completion time schedule S_c also guarantees m_{S_c} < 2·m_opt for any job system.

Proof. Assume job i with r_i ≤ m/2 and t_i(2d_i + 1) = max_{j: r_j ≤ m/2} {t_j(2d_j + 1)}. As schedule S_c has the minimal total weighted completion time, there are fewer than m/2 nodes idle during all time instances t ∈ [0, m_{S_c}[ \ ∪_{ν=0}^{d_i} [t_i(2ν), t_i(2ν + 1)[. Therefore, we have

    m·m_opt > (m/2)·(m_{S_c} − h_i) + h_i·r_i,

and hence m_{S_c} < 2·m_opt − h_i + (2·r_i·h_i)/m ≤ 2·m_opt.

4 Previous Results and Basic Observations

In their LF schedule, Kawaguchi and Kyan [8] first generate a priority list of jobs by arranging the jobs in non-increasing order of w_i/h_i. This ratio is also known as Smith's ratio [12]. Whenever a node becomes available, the next unscheduled job in this list is assigned to the free node until all jobs are completed. The running time of the algorithm is O(M log M) and it guarantees c_S/c_opt < (√2 + 1)/2. McNaughton [9] proved that for this problem the cost of the optimal schedule cannot be reduced by introducing preemption. However, this observation does not hold for parallel jobs, as can be seen by the following simple example [11].

Example 2. Assume a system of 3 jobs as described below:

    h_i  r_i    w_i  jobs
    2    m      1    i = 1
    1    m − 1  1    i = 2
    4    1      1    i = 3

In an optimal non-preemptive schedule S_n, job 2 is started concurrently with job 3. This block and job 1 are then scheduled in any order, resulting in c_{S_n} = 11. In the optimal preemptive schedule S_p, however, first job 2 and job 3 are started together. Then at time t = 1, job 3 is interrupted and job 1 is executed before the execution of job 3 is resumed at time t = 3. Assuming no preemption penalty (p = 0) we have c_{S_p} = 10.

For parallel jobs a modified Smith ratio s_i = w_i/(h_i·r_i) can be defined. Therefore, all parameters of job i can be determined from any three of the four values h_i, r_i, w_i, and s_i. Turek et al. [14] showed that Kawaguchi's and Kyan's LF schedule guarantees c_S/c_opt < 2 if r_i ≤ m/2 for each job i. While the proof in [14] only considers the unique weight case, a generalization to arbitrary weights is straightforward. Also, if all jobs of a job system require more than 50% of the nodes, then it is easy to obtain an optimal total weighted completion time schedule, as no two jobs can be executed concurrently. However, due to the different resource requirements of the various jobs, scheduling the jobs in Smith order will not necessarily be optimal.

Corollary 5. Assume a job system τ with r_i > m/2 for all i. Then a schedule S_c satisfies c_{S_c}/c_opt < 2 if all jobs are executed in Smith order.

Proof. We construct a new job system τ̂ such that for each job i ∈ τ there is a job î ∈ τ̂ with w_î = w_i, h_î = h_i, and r_î = m/2. Therefore, we have h_î·r_î < h_i·r_i. Using the squashed area bound A [15], where each job i is transformed into a job of execution time h_i·r_i/m and resource requirement m, we have c_{S_c} = 2·A(τ̂) < 2·A(τ) ≤ 2·c_opt.

But if both cases are combined, meaning that jobs may be arbitrary, the approximation factor for c_opt may be as bad as Θ(m) [14]. Further, many non-preemptive schedules with small approximation factors for c_opt cannot guarantee a constant approximation factor for m_opt [15]. Therefore, we next address the makespan approximation factors of some non-preemptive schedules with good total weighted completion time performance.
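As a concrete illustration of list scheduling in Smith order, the following minimal sketch handles sequential jobs only (r_i = 1) on m identical machines. The function name and job encoding are mine and this is not Kawaguchi and Kyan's implementation, merely the greedy rule they describe: sort by w/h, then always assign the next job to the earliest available machine.

```python
import heapq

def lf_schedule(jobs, m):
    """Greedy list scheduling of sequential jobs in Smith order.
    jobs: list of (h, w) pairs; m: number of identical machines.
    Returns the total weighted completion time (sketch, not the paper's code)."""
    # Smith order: non-increasing ratio w/h
    order = sorted(jobs, key=lambda job: job[1] / job[0], reverse=True)
    machines = [0.0] * m           # next free time of each machine
    heapq.heapify(machines)
    total = 0.0
    for h, w in order:
        start = heapq.heappop(machines)   # earliest available machine
        finish = start + h
        total += w * finish
        heapq.heappush(machines, finish)
    return total
```

On a single machine this reduces to Smith's rule; e.g. with jobs (h, w) = (2, 1) and (1, 1), the shorter job runs first and the total weighted completion time is 1 + 3 = 4.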
Corollary 6. For any job system τ a λ-SMAT NFIW schedule S_c satisfies m_{S_c}/m_opt < 1 + λ + λ²/(λ − 1).

Proof. First we restrict ourselves to a job system Γ_1 which contains all jobs scheduled on the first shelf of each height component. Assuming λ^k ≤ h_max = max_i h_i ≤ λ^{k+1}, we then obtain for the makespan m_{S_c}(Γ_1) ≤ h_max + λ^{k+1}·λ/(λ − 1), while m_opt(Γ_1) ≥ h_max. Therefore, we have

    m_{S_c}(Γ_1)/m_opt(Γ_1) < 1 + λ²/(λ − 1).

Next we consider the job system Γ_2 = τ \ Γ_1. Using Lemmas 3.5 and 3.6 in [10] we obtain

    m_{S_c}(Γ_2) < (λ/m)·Σ_{i∈τ} h_i·r_i ≤ λ·m_opt(τ).

The combination of both parts yields

    m_{S_c}(τ)/m_opt(τ) < 1 + λ + λ²/(λ − 1).

Corollary 7. Assume a job system τ with r_i ≤ m/2 for all i. Then an LF schedule S_c satisfies m_{S_c}/m_opt < 3. This bound is tight.

Proof. Assume job i with t_i(1) = m_{S_c}. As S_c is a non-preemptive LF schedule, there must be fewer than m/2 nodes idle during all time instances t ∈ [0, t_i(0)[. Therefore, we have h_i ≤ m_opt and m·m_opt > (m/2)·(m_{S_c} − h_i), which results in

    m_{S_c} = (m_{S_c} − h_i) + h_i < 2·m_opt + m_opt = 3·m_opt.

Further, consider the following system of 12k + 1 jobs with 3k + 1 = m/2. As all jobs have the same Smith ratio, we assume that the order is given by the job index.

    h_i         r_i       w_i      jobs
    y + 2j      3         h_i·r_i  i = 3j + 1 and 0 ≤ j < k
    y + 2j + 1  variable  h_i·r_i  i = 3j + 2 and 0 ≤ j < k
    1           variable  h_i·r_i  i = 3j and 0 < j ≤ k
    2ky         1         h_i·r_i  i = 12k + 1
    [several entries garbled in this copy]

For this job system we obtain expressions for m_{S_c} and m_opt [garbled in this copy] which result for y = k in

    lim_{k→∞} m_{S_c}/m_opt = 3.

It is easy to see that an LF schedule S_c guarantees m_{S_c}/m_opt < 2 if r_i = 1 for all jobs i.

5 The Algorithm

In this section we introduce the new preemptive algorithm PSRS, shown in Table 1. PSRS stands for Preemptive Smith Ratio Scheduling. Without restriction of generality, m is assumed to be even. The algorithm uses the following elements:

    Q            a priority queue of jobs based on the modified Smith ratio
    t_s          a variable denoting the earliest starting time of the next job
    T(r)         min{t | t ≥ t_s and m − r ≥ Σ_{j: t_j(2d_j + 1) > t} r_j}
    test         a boolean function for testing whether the current job will cause preemption
    start_value  a function to determine the non-negative start delay for the current job if it causes preemption

    Create a priority list Q for all jobs such that job i precedes job j if s_i > s_j;
    t_s = 0;
    while (Q ≠ ∅) {
        pick the next job i and delete it from Q;
        d_i = 0;
        if (r_i > m/2 and test) {
            t_i(0) = T(m/2) + p/2 + start_value;
            t_s = t_i(0) + h_i + p/2;
            t_i(1) = t_i(0) + h_i;
            for all jobs j with t_j(2d_j + 1) > t_i(0) − p/2 do {
                d_j = d_j + 1;
                t_j(2d_j + 1) = t_j(2d_j − 1) + h_i + p;
                t_j(2d_j − 1) = t_i(0);
                t_j(2d_j) = t_s − p/2;
            }
        }
        else { t_s = t_i(0) = T(r_i); t_i(1) = t_i(0) + h_i; }
    }

    Table 1. The preemptive algorithm PSRS

The following notation is used for variables which are changed during the iteration of algorithm PSRS in which job i is removed from Q: q(i) and q̂(i) denote the value of variable q at the beginning and at the end of this iteration, respectively. Note that only a job with r_i > m/2 may cause preemption. Further, these jobs cannot be preempted themselves. First, we discuss the validity of the schedule produced by PSRS.

Lemma 8. Consider a job system τ and a multiprocessor P. Independent of the implementation of test and start_value, PSRS always generates a valid schedule which does not require migration.

Proof. First note that at the beginning of any iteration t_i(2d_i) ≤ t_s for any job i ∉ Q. Initially, PSRS assigns to each job i an integer d_i = 0 and time instants t_i(0) and t_i(1) with t_i(1) − t_i(0) = h_i ≥ 1 and t_s(i) ≤ t_i(0) ≤ t̂_s(i). During a later iteration job j may cause preemption and d_i may become d̂_i(j) = d_i(j) + 1 while all t_i(ν) with ν < 2d̂_i(j) − 1 remain unchanged. The validity of the scheduling conditions for t_i(2d̂_i(j) − 1), t_i(2d̂_i(j)), and t_i(2d̂_i(j) + 1) can be easily verified. A job i can be assigned any node set Π_i ⊆ {1, …, m} \ (∪_j {Π_j | t_j(2d_j + 1) > T(r_i)}) with |Π_i| = r_i if i does not cause preemption of any other job in PSRS. Otherwise, job i is neither executed concurrently with any other job j ∉ Q nor with any future job j ∈ Q, as t̂_s(i) ≤ t_i(2d_i(i) + 1) holds. Therefore, job i can be assigned any node set Π_i ⊆ {1, …, m} with |Π_i| = r_i.
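The control flow of Table 1 can be sketched in executable form. The following is a simplified, hypothetical reading of the algorithm (names and data layout are mine; node sets are not assigned, only interval timing is tracked, and test/start_value use the h_i/v rule defined in the analysis section), not a faithful reimplementation of the paper's code:

```python
# Hypothetical sketch of PSRS (Preemptive Smith Ratio Scheduling).

def psrs(jobs, m, p=0.0, v=1.0):
    """jobs: list of dicts with keys h, r, w. Returns a dict mapping each
    job index to its time instants [t(0), ..., t(2d+1)]."""
    # Smith order: non-increasing modified Smith ratio s_i = w_i/(h_i*r_i)
    order = sorted(range(len(jobs)),
                   key=lambda i: jobs[i]["w"] / (jobs[i]["h"] * jobs[i]["r"]),
                   reverse=True)
    t = {}
    t_s = 0.0

    def T(r):
        # earliest time >= t_s at which at least r nodes are free
        cand = sorted({t_s} | {ts[-1] for ts in t.values() if ts[-1] > t_s})
        for time in cand:
            used = sum(jobs[j]["r"] for j, ts in t.items() if ts[-1] > time)
            if m - r >= used:
                return time
        return cand[-1]

    for i in order:
        h, r = jobs[i]["h"], jobs[i]["r"]
        if r > m / 2:
            t_half, natural = T(m // 2), T(r)
            if h / v < natural - t_half:          # test: preempting pays off
                start = t_half + p / 2 + h / v    # start_value = h_i / v
                t[i] = [start, start + h]
                t_s = start + h + p / 2
                for j, ts in t.items():           # preempt overlapping jobs
                    if j != i and ts[-1] > start - p / 2:
                        old = ts[-1]              # previous completion time
                        ts[-1] = start            # interrupted at t_i(0)
                        ts.append(t_s - p / 2)    # resumed
                        ts.append(old + h + p)    # delayed completion
                continue
        t_s = T(r)                                # no preemption caused
        t[i] = [t_s, t_s + h]
    return t
```

With v = 1 and p = 0, a wide job preempts running narrow jobs exactly when waiting for its natural start time T(r_i) would cost more than the detour h_i/v beyond T(m/2), which mirrors the trade-off the analysis below optimizes.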
Note that PSRS is based on a restrictive global gang scheduling model which requires a concurrent context switch of all nodes independent of the partition size. Moreover, no job execution can take place during a preemption penalty. The time complexity of PSRS is max(O(M²), M·(O(test) + O(start_value))), as in the worst case O(M) jobs may preempt O(M) jobs each. However, note that the LF schedule and Turek's generalization are

both included in PSRS. Therefore, if the job system observes the corresponding resource restrictions, PSRS has a time complexity of O(M log M) and guarantees c_S/c_opt < (√2 + 1)/2 and c_S/c_opt < 2, respectively.

6 Approximation Factors

The approximation factors depend on the definition of the functions test and start_value, as for test ≡ 0 there are job systems for which PSRS generates schedules with c_S = Θ(m)·c_opt and m_S = Θ(m)·m_opt [14]. It may seem appropriate to define test and start_value such that a local minimum for the completion time is achieved, that is

    t_i(0) = arg min_{t ≥ t_s} { Σ_{j ∉ Q ∨ j = i} w_j·t_j(2d̂_j(i) + 1) }.

However, as shown in Example 3, this approach cannot guarantee m_S = o(m)·m_opt.

Example 3. Assume the following system of jobs with ε → 0:

    h_i  r_i  w_i  jobs
    [entries garbled in this copy]

Algorithm PSRS with local optimality scheduling produces a schedule S where no two jobs are executed concurrently, thus resulting in m_S = Θ(m)·m_opt. Therefore, we use the following functions for test and start_value, which only depend on T(r_i), T(m/2), and the execution time h_i of the new job i:

    test = (h_i/v < T(r_i) − T(m/2))    and    start_value = h_i/v.

Here v ≤ 1 is a positive constant which may be chosen to minimize the approximation factors. As test and start_value both have a time complexity of O(1), the overall complexity of PSRS is O(M²). Next, the ratio c_S/c_opt is addressed.

Theorem 9. Assume schedule S is obtained by algorithm PSRS. Then the job system below with k, m → ∞, ε → 0, and appropriately selected non-negative parameters x < m/2 and y produces the maximum ratio f_c(p) = c_S/c_opt. There, it is assumed that x + 1 jobs of the first group are always followed by 1 job of the second group in Q. The jobs of the third group are scheduled after all jobs of the first two groups, while the jobs of the fourth group must be at the end of Q.

    h_i  r_i  w_i  number of jobs
    [the four groups' entries are garbled in this copy; they are parameterized by v, x, y, k, and ε]

Note that the modified Smith ratio is 1 for all jobs in the first three groups of the job system in Theorem 9. Therefore, the order among these jobs is arbitrary. The jobs in the last group only have the purpose to delay the completion of all jobs in Group 3. The optimal schedule is obtained by using Turek's generalization of the LF schedule, where the jobs of the various groups are arranged in the order Group 3, Group 2, Group 1, and Group 4. Then the high order terms for c_opt and c_S are given by expressions in x, y, v, k, and p [garbled in this copy]. Thus, Theorem 9 transforms the determination of the best approximation factor f_c(p) for algorithm PSRS into the solution of the following optimization problem:

    f_c(p) = min_{0 < v ≤ 1} max_{y ≥ 0, 0 ≤ x < m/2} (c_S/c_opt).

To prove Theorem 9 we construct a worst case example in several steps and derive a lower bound for c_opt which is tight for the job systems of Theorem 9. First, we restrict ourselves to all jobs i of a job system τ with s_i = max_j(s_j) and call the set of all those jobs τ̂. The order of those jobs in Q is arbitrary. Further, the modified Smith ratios are normalized such that s_i = 1 for all jobs i ∈ τ̂. Then, the ratio between the total weighted completion time ĉ_S for all jobs i ∈ τ̂ in the schedule S produced by PSRS and the cost ĉ_opt of the optimal schedule for τ̂ is determined. Note that in schedule S jobs with a lower s_j may influence the weighted completion time of jobs i ∈ τ̂. The following lemma establishes a relationship between ĉ_S/ĉ_opt and c_S/c_opt.

Lemma 10. If ĉ_S/ĉ_opt ≤ β for all job systems, then c_S/c_opt is also upper bounded by β.

Proof. As test and start_value do not depend on any weight, the proof of Theorem 2 of Kawaguchi's and Kyan's paper [8] can be directly applied to this lemma as well.

Next, we derive a lower bound for ĉ_opt. To this end we define t_b = max_{i ∉ τ̂} {T(m/2)(i)} and τ̂_t = {i ∈ τ̂ | t_i(2d_i + 1) > t_b}.
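The optimization problem for f_c(p) has a plain min-max structure: tune the algorithm parameter v against the worst instance parameters x and y. A grid-search sketch follows; since the exact ratio expressions of Theorem 9 are garbled in this copy, `toy_ratio` below is a hypothetical stand-in with a similar shape, not the paper's formula.

```python
def f_c(ratio, p, vs, xs, ys):
    """Grid approximation of min over v of max over (x, y) of ratio(v,x,y,p)."""
    best = None
    for v in vs:
        worst = max(ratio(v, x, y, p) for x in xs for y in ys)
        if best is None or worst < best:
            best = worst
    return best

# HYPOTHETICAL stand-in ratio, NOT the expression from Theorem 9.
def toy_ratio(v, x, y, p):
    return 1 + x * y / (1 + y) + v * p + (1 - v) * x

vs = [i / 10 for i in range(1, 11)]   # candidate v in (0, 1]
xs = [i / 10 for i in range(10)]      # candidate x in [0, 1)
ys = list(range(5))                   # candidate y in {0, ..., 4}
```

For the stand-in ratio with p = 0 the grid search settles on v = 1, illustrating how a larger preemption penalty shifts the optimal v and raises f_c(p).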
Note that only jobs in τ̂_t can be preempted by jobs j with s_j < 1. To generate a worst case, we therefore assume that all jobs i ∈ τ̂ are followed by a sufficient number of jobs j with h_j = 1, r_j = m, and s_j → 0. This results in t_i(2d_i + 1) − t_b = h̄_i·(1 + v + v·p) for all jobs i ∈ τ̂_t, where h̄_i is the remaining execution time of job i at time t_b. The job system τ̂ is now transformed into a new parameterized sequential job system τ̃(r) with 0 ≤ r < m/2. This system consists of three disjoint job systems:

- τ̃_1: r_i jobs j with r_j = 1, h_j = h̄_i, and s_j = 1 for each job i ∈ τ̂_t,
- τ̃_2: min(r, r_i) jobs j with r_j = 1, h_j = h_i, and s_j = 1 for each job i ∈ τ̂ \ τ̂_t,
- τ̃_3: max(0, r_i − r) jobs j with r_j = 1, h_j = h_i, and s_j = 1 for each job i ∈ τ̂ \ τ̂_t.

τ̃_1 is further partitioned into sets τ̃_11 and τ̃_12 such that |τ̃_11| = min(r, |τ̃_1|) and h_i ≥ h_j for all jobs i ∈ τ̃_11 and j ∈ τ̃_12. c̃_opt(r) is the cost of the optimal schedule for τ̃(r) under the restriction that all r jobs of τ̃_11 and all jobs of τ̃_3 are scheduled using only resources 1, …, r. The following corollary is a direct consequence of this definition.

Corollary 11. ĉ_opt ≥ max_{0 ≤ r < m/2} c̃_opt(r) holds for all job systems.

In further parts of this section, we will therefore always use the lower bound c̃_opt(r). Job system τ̃(r) consists only of sequential jobs with the same Smith ratio. The total completion time for each resource is (1/2)·((Σ_i h_i)² + Σ_i h_i²), where the sums are taken over all jobs i ∈ τ̃(r) scheduled on this resource [12]. Also, we get a lower bound for c̃_opt(r) if we assume that the makespan on all resources r + 1, …, m is the same and that this makespan is not larger than the makespan on any resource 1, …, r [8]. For the job systems of Theorem 9 this lower bound for c̃_opt(r) is tight and max_{0 ≤ r < m/2} c̃_opt(r) = ĉ_opt holds.

The rest of the proof of Theorem 9 is divided into three corollaries. While the proof is tedious, it is mostly based on a simple concept: job system τ is transformed into a job system τ′ such that max_r(c̃_opt(r))|_τ ≥ max_r(c̃_opt(r))|_τ′ and ĉ_S|_τ − ĉ_S|_τ′ ≤ (1 + 1/v + p)·(max_r(c̃_opt(r))|_τ − max_r(c̃_opt(r))|_τ′). First, we address all jobs i ∈ τ̂ with r_i > m/2.

Corollary 12. Assume a job system τ with ĉ_S/max_r(c̃_opt(r)) ≥ 1 + 1/v + p. Then there is a job system τ′ such that ĉ_S/max_r(c̃_opt(r))|_τ′ ≥ ĉ_S/max_r(c̃_opt(r))|_τ and all jobs i ∈ τ̂′ with r_i > m/2 have h_i = 1 and cause preemption.

Proof. 1. Assume a job i with h_i ≥ 2 causing preemption in S.
Now transform τ into τ' by replacing i in Q with two jobs i_1 and i_2 such that
- r_{i1} = r_{i2} = r_i,
- s_{i1} = s_{i2} = s_i,
- h_{i1} = 1,
- h_{i2} = h_i − 1.
Both jobs i_1 and i_2 will cause preemption in S', and t_{i2}(1)|_{S'} = t_i(1)|_S + p. Further, t_j(d_j + 1)|_S ≥ t_j(d_j + 1)|_{S'} holds for all jobs j ∈ τ ∩ τ'. This yields
  ĉ_S|_τ' ≥ ĉ_S|_τ − r_i(h_i − 1)(1/v + 1 − p),
  c̃*(r)|_τ' ≤ c̃*(r)|_τ − r_i(h_i − 1),
and therefore
  ĉ_S / max_r(c̃*(r)) |_τ' ≥ ĉ_S / max_r(c̃*(r)) |_τ ≥ 1 + 1/v + p.

2. As already mentioned, we always assume that there are enough jobs j with s_j → 0 such that t_i(d_i + 1) = t_b + h_i(1 + v + vp) for all jobs i ∈ τ̂_t or i ∈ τ̂'_t. Transform τ into τ' by replacing each job i ∈ τ̂ with a job i' ∈ τ̂' such that
- r_{i'} = r_i,
- h_{i'} = a·h_i,
- w_{i'} = a·w_i,
using a positive integer a. Then repeatedly split all jobs i' ∈ τ̂' as shown above if they cause preemption and have h_{i'} ≥ 2. If all h_{i'} are rational numbers, then a can be selected such that h_{i'} = 1 for all jobs i' ∈ τ̂' which cause preemption. This scaling procedure also results in
  ĉ_S / max_r(c̃*(r)) |_τ' ≥ ĉ_S / max_r(c̃*(r)) |_τ ≥ 1 + 1/v + p.

3. Assume a job i ∈ τ̂_t with r_i ≤ m/2 which does not cause preemption, that is, h_i/v > T(r_i) − T(m) > 0 in algorithm PSS. Transform τ into τ' by replacing i in Q with two jobs i_1 and i_2 such that
- r_{i1} = r_{i2} = r_i,
- s_{i1} = s_{i2} = s_i,
- h_{i1} = v(T(r_{i1}) − T(m)) − ε with ε → 0,
- h_{i2} = h_i − h_{i1}.
If h_{i1} < 1 or h_{i2} < 1, scale the system appropriately as described above. Now i_1 causes preemption while lim_{ε→0} T(r_{i2}) = T(m) holds for i_2 in algorithm PSS. As before we have
  ĉ_S|_τ' ≥ ĉ_S|_τ − r_i(h_{i1}h_{i2} − h_{i1}p − h_{i2}p),
  c̃*(r)|_τ' ≤ c̃*(r)|_τ − r_i h_{i1} h_{i2}.

4. Assume a job i ∈ τ̂_t with r_i > m/2 and T(r_i) = T(m) in algorithm PSS. Transform τ into τ' by replacing i in Q with j = r_i − m/2 + 1 jobs i_1, …, i_j such that
- h_{i1} = … = h_{ij} = h_i,
- s_{i1} = … = s_{ij} = s_i,
- r_{i1} = m/2,
- r_{i2} = … = r_{ij} = 1.
Then we have
  ĉ_S|_τ' = ĉ_S|_τ,
  c̃*(r)|_τ' ≤ c̃*(r)|_τ.

Note that Corollary 12 is valid independently of p. Next, we introduce a further transformation of the job system in order to increase the number of preemptions.

Corollary 13. Assume a job system τ with ĉ_S / max_r(c̃*(r)) |_τ ≥ 1 + 1/v + p. Then there is another job system τ' and a parameter ε > 0 with lim_{ε→0} max{ĉ_S / max_r(c̃*(r)) |_τ − ĉ_S / max_r(c̃*(r)) |_τ', 0} = 0 and the following properties of τ':
- r_i > m/2 or r_i = 1 for all jobs i ∈ τ̂'.

- t_i(0) ≤ t_b − ε and h_i = h_j for all jobs i, j ∈ τ̂'_t.
- h_i = 1 + 1/v + ε and d_i = 1 for all jobs i ∈ τ̂' \ τ̂'_t with r_i = 1.
- m + 1 − r_j jobs are preempted by each job j ∈ τ̂' with r_j > m/2.

Proof: Based on Corollary 12 we can assume that h_i = 1 for all jobs i ∈ τ̂ with r_i > m/2.

1. Let i ∈ τ̂ be a job with r_i ≤ m/2 and h_i ≥ 2. Then schedule S and job system τ are transformed into schedule S' and job system τ' by replacing i with two jobs i_1 and i_2 such that
- r_{i1} = r_{i2} = r_i,
- s_{i1} = s_{i2} = s_i,
- h_{i2} = h_i − h_{i1} ≤ h_i,
- d_{i1} + d_{i2} = d_i,
- t_{i1}(δ) = t_i(δ) for all 0 ≤ δ ≤ d_{i1},
- t_{i1}(d_{i1} + 1) = t_{i2}(0),
- t_{i2}(δ) = t_i(δ − d_{i1}) for all 0 < δ ≤ d_{i2} + 1.
Then t_{i2}(d_{i2} + 1) ≤ t_{i2}(0) + h_{i2}(1 + v + vp) holds. Therefore, we have
  ĉ_{S'}|_τ' ≥ ĉ_S|_τ − r_i h_{i1} h_{i2}(1 + 1/v + p),
  c̃*(r)|_τ' ≤ c̃*(r)|_τ − r_i h_{i1} h_{i2}.
However, the new schedule may not necessarily be a PSS schedule. Now, τ is transformed into τ' by repeatedly splitting jobs i ∈ τ̂ with d_i ≥ 1 at time t_j(1) + p + ε in schedule S if job j ∈ τ̂ causes preemption of job i. In Q, job i_1 then replaces job i, and job i_2 is introduced just after job j. In the resulting PSS schedule we have 0 ≤ t_{i1}(d_{i1} + 1) − t_{i2}(0) ≤ ε. Therefore, the conditions described above are valid for ε → 0. Further, the completion time of any job is reduced by at most t_b·v/(1 + v + vp), which leads to
  lim_{ε→0} max{ĉ_S / max_r(c̃*(r)) |_τ − ĉ_S / max_r(c̃*(r)) |_τ', 0} = 0.
If the job splitting would result in jobs with h_i < 1, the job system is scaled beforehand as described in the proof of Corollary 12.

2. Let j_1, j_2 ∈ τ̂ be two jobs with r_{j1}, r_{j2} > m/2 and t_{j2}(0) > t_{j1}(1) such that there is no other job i with r_i > m/2 and t_{j2}(0) > t_i(1) > t_{j1}(1). If t_{j2}(0) > t_{j1}(1) + 1/v + p, then all jobs i with t_i(0) ≤ t_{j2}(0) − 1/v − p < t_i(d_i + 1) are split at time t_{j2}(0) − 1/v − p. If necessary, the job system is scaled such that a = (t_{j2}(0) − 1/v − p − t_{j1}(1))/(1 + 1/v) becomes an integer and 1 ≤ h_i ≤ 1/v + ε for all jobs i with t_{j1}(1) + p ≤ t_i(0) < t_{j2}(0) − 1/v − p. Note that at any time instance between t_{j1}(1) + p and t_{j2}(0) − 1/v −
p, more than 50% of the nodes are used. Next, job system τ is transformed into τ' by replacing the set {i ∈ τ̂ | t_{j1}(1) + p ≤ t_i(0) < t_{j2}(0) − 1/v − p} with a jobs having r_i = m/2 + 1, h_i = 1, s_i = 1 and a·m/2 jobs having r_i = 1, h_i = 1/v + ε, s_i = 1. m/2 sequential jobs are then always followed by one job i with r_i = m/2 + 1 in Q. Because of Corollary 12 and the results above, this procedure guarantees
  ĉ_S / max_r(c̃*(r)) |_τ' ≥ ĉ_S / max_r(c̃*(r)) |_τ.

The same construction can be used for the time frames [0, min{t_i(0) | i ∈ τ̂ \ τ̂_t with r_i > m/2} − p[ and [max{t_i(1) | i ∈ τ̂ \ τ̂_t with r_i > m/2} + p, t_b[ if necessary.

3. Next, replace each job i ∈ τ̂ having r_i ≤ m/2 with r_i identical jobs j such that r_j = 1, h_j = h_i, and s_j = 1. This does not affect the cost of the PSS schedule if ε → 0. Also, transforming τ into τ' by removing a job i ∈ τ̂ \ τ̂_t with r_i < m/2 and d_i = 0 results in
  ĉ_S|_τ' > ĉ_S|_τ − h_i r_i t_b,
  c̃*(r)|_τ' ≤ c̃*(r)|_τ − h_i r_i t_b/(1 + 1/v + p).
Similarly, assume that job j ∈ τ̂ preempts b sequential jobs. Then τ can be transformed into τ' by removing b + r_j − m − 1 of these sequential jobs. This does not affect the completion time of any other job. Therefore we have
  ĉ_S / max_r(c̃*(r)) |_τ' ≥ ĉ_S / max_r(c̃*(r)) |_τ.

4. Finally, job system τ is transformed into τ' by replacing each job i ∈ τ̂_t with a job i' such that
- r_{i'} = 1,
- s_{i'} = 1,
- h_{i'} = (Σ_{i∈τ̂_t} h_i)/|τ̂_t|.
This results in
  ĉ_S|_τ' = ĉ_S|_τ − (Σ_{i∈τ̂_t} h_i² − (Σ_{i∈τ̂_t} h_i)²/|τ̂_t|)(1 + 1/v + p),
  c̃*(r)|_τ' ≤ c̃*(r)|_τ − Σ_{i∈τ̂_t} h_i² + (Σ_{i∈τ̂_t} h_i)²/|τ̂_t|.

Corollary 14. Assume a job system τ as described in Corollaries 12 and 13 with ĉ_S / max_r(c̃*(r)) |_τ > 1 + 1/v + p and a given parameter r with 0 ≤ r < m. Then there is another job system τ' with r_i = m − r for all preemption causing jobs i ∈ τ̂' and ĉ_S / max_r(c̃*(r)) |_τ ≤ ĉ_S / max_r(c̃*(r)) |_τ'.

Proof: 1. Assume a job i with r_i < m − r. Then transform τ into τ' by removing one job j which is preempted by i, that is, r_j = 1, h_j = 1/v, and s_j = 1. Further, i is replaced by i' such that
- r_{i'} = r_i + 1,
- s_{i'} = s_i = 1,
- h_{i'} = h_i.
As Σ_{i∈τ̃} h_i > (m − r)t_b/(1 + 1/v + p) and Σ_{i∈τ̃_2∪τ̃_3} h_i > t_b/(1 + 1/v + p) hold, this results in
  ĉ_S|_τ' ≥ ĉ_S|_τ − t_b(1/v − 1),
  c̃*(r)|_τ' ≤ c̃*(r)|_τ − t_b(1/v − 1)/(1 + 1/v + p).
Therefore, it is sufficient to assume that r_i ≥ m − r for all preemption causing jobs i ∈ τ̂.

2. Transform τ into τ' such that Σ_{i∈τ̂'\τ̂'_t} h_i r_i = Σ_{i∈τ̂\τ̂_t} h_i r_i and the first ⌊|τ̃_3|/r⌋ preemption causing jobs in the PSS schedule S|_τ' all have r_i = m, while at most one other job j ∈ τ̂' has r_j > m − r. Then, we have
  ĉ_S|_τ' ≥ ĉ_S|_τ,
  c̃*(r)|_τ' = c̃*(r)|_τ.
By the use of scaling we can further achieve that all jobs i ∈ τ̂ with r_i > 1 either have r_i = m − r or r_i = m.

3. Assume a job i ∈ τ̂ with r_i = m. Transform τ into τ' by removing those jobs j with t_j(0) ≤ 1/v + p. This results in
  ĉ_S|_τ' ≥ ĉ_S|_τ − (1 + 1/v + p)Σ_{i∈τ̂} w_i,
  c̃*(r)|_τ' ≤ c̃*(r)|_τ − Σ_{i∈τ̂} w_i.

This concludes the proof of Theorem 9. To obtain the minimal value f_c(0) = 2.366 we choose v = 0.836, x = 0.183, and y = 2. Note that f_c(0) > 1/v + 1 = 2.196.

In the next theorem we address the makespan costs of a PSS schedule:

Theorem 15. Algorithm PSS produces schedules with a ratio f_m(p) < 2 + 1/v + p. This bound is tight.

Proof: We have m·m* ≥ Σ_i r_i h_i and m* ≥ max_i h_i. Further, let j be the last job in Q. This results in
  f_m(p) = m_S/m* = (m_S − t_s(j))/m* + t_s(j)/m* ≤ max_i h_i/m* + (1 + 1/v + p)·Σ_i r_i h_i/(m·m*) < 1 + 1 + 1/v + p = 2 + 1/v + p.

The job system below with k, m → ∞ and ε → 0 produces the maximum ratio f_m(p) = 2 + 1/v + p. There it is assumed that jobs of the first group are always followed by one job of the second group in Q. The single job of the third group is the last job of Q.

h_i     | r_i   | w_i     | number of jobs
1/v + ε | 1     | 1/v + ε | k
1 − ε   | m − 1 | 1/k     | k(1 + v(m − 1))
1       | m     | 1       | 1

7 Extensions

It would also be interesting to know whether allowing job migration could significantly decrease the ratio f_c. For this purpose, we consider a variant of PSS with an additional optimization phase. This phase is executed for each job in the order given by the initial list Q. During optimization, a job i is rescheduled such that there is no time interval below the completion time of i in which at least r_i resources are available and i is not scheduled. To address this case, corollaries similar to those introduced in Section 6 can be used.
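The rescheduling rule of the optimization phase can be sketched as follows. This is a minimal illustration under simplifying assumptions (a discretized time axis, integral processing times, a sufficiently large horizon, nonpreemptive placement), not the PSS implementation itself; the function `reschedule` and its job encoding are hypothetical.

```python
def reschedule(jobs, m, horizon):
    """Greedy optimization pass (sketch): place each job, in list order,
    at the earliest time at which r_i resources are free for h_i steps.
    Afterwards there is no interval below a job's completion time with
    r_i resources available in which the job is not scheduled."""
    free = [m] * horizon          # free resources per unit time step
    start = {}                    # job id -> chosen start time
    for job_id, (r, h) in jobs:   # jobs: list of (id, (r_i, h_i))
        t = 0
        # slide the job forward until its resource demand fits;
        # the horizon is assumed large enough for every job to fit
        while any(free[u] < r for u in range(t, t + h)):
            t += 1
        for u in range(t, t + h):
            free[u] -= r
        start[job_id] = t
    return start

# usage: three rigid jobs on m = 4 resources
jobs = [("a", (3, 2)), ("b", (2, 1)), ("c", (1, 2))]
print(reschedule(jobs, m=4, horizon=10))  # job "c" fills the gap left by "a"
```

Note that job "c" starts at time 0 beside "a" even though "b" was placed first, which mirrors the migration-like gap filling that the optimization phase permits.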
However, here we just present a type of job system which cannot be improved by the optimization phase:

Example 4. Assume a job system as described below:

h_i            | r_i   | w_i              | number of jobs
k(1 + 1/v + p) | m/3   | 3k(1 + 1/v + p)  | 1
1/v + 2        | 1     | (1/v + 2)(m − 1) | k
1              | 2     | 1 + 1/k          | ky
1              | m − 1 | ky(m − 1)        | 1
1              | 1     | 1                | k

The first job in the table is also the first job in Q. Next, one job of the second group alternates with one job of the third group. Then follows the job of Group 4, while all jobs of Group 5 are positioned at the end of Q. For k, m → ∞ and ε → 0, the same numerical optimization problem as described in Section 6 is obtained with x = const. Then v = 0.816 and y = 2.449 yield the value 2.1. However, note that 2.1 < 1/v + 1 = 2.225. Nevertheless, the example demonstrates that the achievable gain for allowing migration is limited. Moreover, this gain may come with an increased preemption penalty.

Further, we may assume that the modified Smith ratio is 1 for all jobs of a job system. In other words, the cost of any node-second is constant. Note that jobs may still have different weights depending on their resource requirements. However, Group 4 in Theorem 9 cannot exist anymore. Nevertheless, the proof of Theorem 9 remains valid. There is only a new high order expression for c_S:
  c_S = k²(xy(y + 1/v + 1 + p) + ½(x/v + 2 − x)(1/v + 1 + p)).
In this case, Group 3 of Theorem 9 cannot increase f_c, and we obtain f_c(0) < 2 for v = 1.

8 Conclusion

First, we addressed bicriteria scheduling of parallel jobs in general and gave a few new results. Then, we presented an algorithm which generates preemptive off-line schedules for parallel and independent jobs with fixed resource requirements. This algorithm is obtained by combining two algorithms with good performance for restricted input sets. The schedule is based on a priority list and has small approximation factors for both total weighted completion time and makespan costs. The method belongs to the class of list scheduling algorithms. It is carefully analyzed and a tight worst case approximation factor is determined.
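Since the method belongs to the class of list scheduling algorithms, the basic framework can be illustrated with a small sketch. This is a generic nonpreemptive list scheduler for rigid jobs, not PSS itself: jobs are started strictly in list order (no backfilling), a job with resource requirement r_i waits until r_i machines are simultaneously idle, and both cost criteria are accumulated. All names and the job encoding are illustrative.

```python
import heapq

def list_schedule(jobs, m):
    """Nonpreemptive list scheduling of rigid jobs given as (r_i, h_i, w_i)
    in priority-list order on m machines; returns the makespan and the
    total weighted completion time of the resulting schedule."""
    free = m            # currently idle machines
    running = []        # min-heap of (finish_time, machines_held)
    t = 0.0             # current scheduling time
    makespan = 0.0
    twct = 0.0          # total weighted completion time
    for r, h, w in jobs:
        # advance time, reclaiming machines, until r machines are idle
        while free < r:
            t, r_done = heapq.heappop(running)
            free += r_done
        heapq.heappush(running, (t + h, r))
        free -= r
        makespan = max(makespan, t + h)
        twct += w * (t + h)
    return makespan, twct

# usage: four jobs (r_i, h_i, w_i) on m = 2 machines, in list order
jobs = [(2, 3.0, 1.0), (1, 2.0, 1.0), (1, 1.0, 1.0), (2, 1.0, 1.0)]
print(list_schedule(jobs, m=2))
```

The sketch makes the priority-list mechanism concrete: the order of `jobs` alone determines the schedule, and reordering the list trades the two cost criteria against each other.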
Moreover, the analysis provides information about the structure of 'bad cases'. Also, we derived a numerical optimization problem which can be used to fine tune the total weighted completion time approximation factor. The generated schedules are based upon our machine model, which is derived from existing parallel computers. To our knowledge, it is also the first time that a preemption penalty is considered in the analysis of such an algorithm. Compared with the non-preemptive SMART schedules, our approximation

factors are significantly better even if we assume that a context switch is as time consuming as the minimal completion time of a job, including loading the job and storing its results, that is, p = 1. As shown below, PSS and SMART schedules can be fine tuned to minimize either c_S or m_S.

Schedule       | c_S  | m_S
SMART          | 8.53 | 5.19
SMART          | 9    | 5
PSS with p = 0 | 2.37 | 3.20
PSS with p = 0 | 2.41 | 3
PSS with p = 1 | 3.41 | 4.31
PSS with p = 1 | 3.61 | 4

PSS schedules further have the advantage that they use preemption only for jobs which require at most 50% of the nodes. Even in this case, there are at most two jobs resident on any node at the same time. Moreover, PSS schedules only need global preemption, which may be easier to implement than other forms of gang scheduling with respect to running messages in the interconnection network. The big difference between the total weighted completion time approximation factors for preemptive and non-preemptive scheduling of parallel jobs leads to the question whether better non-preemptive methods are possible at all. This may be a subject of future research.

Acknowledgement. The author is grateful to Joel Wein for a helpful discussion on bicriteria scheduling.

References

1. S. Chakrabarti, C. Phillips, A.S. Schulz, D.B. Shmoys, C. Stein, and J. Wein. Improved approximation algorithms for minsum criteria. In Proceedings of the 1996 International Colloquium on Automata, Languages and Programming. Springer-Verlag, Lecture Notes in Computer Science, 1996.
2. X. Deng, N. Gu, T. Brecht, and K. Lu. Preemptive scheduling of parallel jobs on multiprocessors. In Proceedings of the 7th SIAM Symposium on Discrete Algorithms, January 1996.
3. J. Du and J. Leung. Complexity of scheduling parallel task systems. SIAM Journal on Discrete Mathematics, 2(4):473–487, November 1989.
4. D.G. Feitelson and L. Rudolph. Parallel job scheduling: Issues and approaches. In D.G. Feitelson and L. Rudolph, editors, IPPS'95 Workshop: Job Scheduling Strategies for Parallel Processing, pages 1–18.
Springer-Verlag, Lecture Notes in Computer Science 949, 1995.
5. D.G. Feitelson and L. Rudolph. Towards convergence in job schedulers for parallel supercomputers. In D.G. Feitelson and L. Rudolph, editors, IPPS'96 Workshop: Job Scheduling Strategies for Parallel Processing, pages 1–26. Springer-Verlag, Lecture Notes in Computer Science 1162, 1996.
6. M. Garey and R. Graham. Bounds for multiprocessor scheduling with resource constraints. SIAM Journal on Computing, 4(2):187–200, June 1975.
7. M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
8. T. Kawaguchi and S. Kyan. Worst case bound of an LRF schedule for the mean weighted flow-time problem. SIAM Journal on Computing, 15(4):1119–1129, November 1986.
9. R. McNaughton. Scheduling with deadlines and loss functions. Management Science, 6(1):1–12, October 1959.

10. U. Schwiegelshohn, W. Ludwig, J.L. Wolf, J.J. Turek, and P. Yu. Smart SMART bounds for weighted response time scheduling. SIAM Journal on Computing. Accepted for publication.
11. U. Schwiegelshohn, J.J. Turek, and J.L. Wolf. Preemptive scheduling of parallel tasks. Technical Report RC 20104 (8893), IBM Research Division, June 1995.
12. W. Smith. Various optimizers for single-stage production. Naval Research Logistics Quarterly, 3:59–66, 1956.
13. C. Stein and J. Wein. On the existence of schedules that are near-optimal for both makespan and total weighted completion time. Preprint, 1996.
14. J.J. Turek, W. Ludwig, J.L. Wolf, L. Fleischer, P. Tiwari, J. Glasgow, U. Schwiegelshohn, and P. Yu. Scheduling parallelizable tasks to minimize average response time. In Proceedings of the 6th Annual Symposium on Parallel Algorithms and Architectures, Cape May, NJ, pages 200–209, June 1994.
15. J.J. Turek, U. Schwiegelshohn, J.L. Wolf, and P. Yu. Scheduling parallel tasks to minimize average response time. In Proceedings of the 5th SIAM Symposium on Discrete Algorithms, pages 112–121, January 1994.