Shortest-Elapsed-Time-First on a Multiprocessor

Shortest-Elapse-Time-First on a Multiprocessor Neal Barcelo, Sungjin Im 2, Benjamin Moseley 2, an Kirk Pruhs Department of Computer Science, University of Pittsburgh ncb30,kirk@cs.pitt.eu 2 Computer Science Department, University of Illinois im3,bmosele2@illinois.eu I woul like to call it a corollary of Moore s Law that the number of cores will ouble every 8 months. Anant Agarwal, founer an chief technology officer of MIT startup Tilera Abstract. We show that SETF, the iealize version of the uniprocessor scheuling algorithm use by Unix, is scalable for the objective of fractional flow on a homogeneous multiprocessor. We also give a potential function analysis for the objective of weighte fractional flow on a uniprocessor. Introuction At the harware level, Moore s law continues unabate, with the number of transistors per chip oubling about every.5 to 2 years. However, we are in the mist of a revolutionary change in the effect of Moore s law on the software layers of the information technology stack. Instea of an exponential increase in processor spee over time, these layers are now expecte to see an exponential increase in the number of processors over time. MIT startup Tilera now prouces chips with up to 00 processors, an the expectation is that chips with 000 processors will be available within the ecae. The natural research question motivating our research is whether the stanar priority scheuling algorithms use for uniprocessors will be appropriate in the multiprocessor setting. In particular, we consier scheuling algorithm Shortest Elapse Time First SETF), as it is the iealize version of Unix s uniprocessor scheuling algorithm. Of course the implementation in Unix has many practical kluges/moifications, such as maintaining equivalence queues of jobs that have been processe about the same amount so as to logarithmically boun the number of preemptions per job). For a uniprocessor using SETF for scheuling, all jobs that have been processe the least share the processing equally; it is useful to think of SETF giving higher priority to jobs that have been processe less. The natural generalization of SETF to a homogeneous multiprocessor setting assigns jobs to processors in priority orer recall jobs that have been processe less have higher priority); the x jobs of the next priority are either assigne to x processors, or evenly share the remaining unassigne processors if there are less than x previously unassigne processors.

Two natural quality of service measures for iniviual jobs are integer flow an fractional flow. The integer flow of a job is the total time a job has to wait to be complete. The fractional flow of a job is the integral over times between when a job arrives an when it is complete of the fraction of the job that is uncomplete. Integer flos a more appropriate objective if no benefit is gaine from a job being partially complete, an fractional flos a more appropriate objective if some benefit is gaine from partially completing a job. The corresponing two natural scheuling objectives are the integer flow of the scheule, which is the sum of the integer flow of the jobs, an fractional flow of the scheule, which is the sum of the fractional flow of the jobs. On a uniprocessor, SETF is known to be scalable, +ɛ)-spee O)-competitive, for the stanar objective of integer flow [], an it is known that spee augmentation is require to achieve boune competitiveness in a general operating system setting requiring a nonclairvoyant scheuler, that is one that oes not know the size of the jobs [2]. To the best of our knowlege, there are no results in the literature explicitly analyzing the fractional flow for nonclairvoyant algorithms on a uniprocessor although it is possible that such results might be erivable from results on integer flow). The main result of this paper is to shon Section 2 that for a homogeneous multiprocessor, SETF is universally 3 scalable for the objective of fractional flow. The analysis in [] shows that SETF is locally competitive for integer flow on a uniprocessor, that is, at all points in time the increase for the quality of service objective for SETF is not too much greater than the increase for an arbitrary scheule. But it is straightforwar to see that no online scheuling algorithm can be locally competitive for either fractional or integer flow on a homogeneous multiprocessor. Thus the next logical approach is to try to use an amortize local competitiveness argument using the so calle stanar potential function metho for these sorts of scheuling problems for more backgroun, see [3]). However, this stanar approach is not immeiately applicable in this setting as this approach requires a reasonably simple algebraic expression for the online algorithm s future cost given no more job arrivals, an after some thought, one can see that a simple algebraic expression oes not exist for SETF s future costs on a multiprocessor. For fractional flow, we are able to surmount this ifficultly by using a potential function base on an algebraic expression for SETF s future costs on a uniprocessor. The primary ifferences between our potential function an the stanar potential function are that it takes the ifference of future costs between the work remaining in the optimal scheule an the online algorithm instea of the future cost of the ifference in remaining work, an aitionally our potential function iscounts the optimal s future costs. These moifications are necessary to get the running conition to hol; however, these moifications cause the potential function to jump when jobs arrive. Fortunately, we are able to complete the analysis by showing that the aggregate increase in these jumps 3 An algorithm is sai to be universally scalable if it is +ɛ)-spee Ofɛ)) competitive for any fixe constant ɛ > 0 an the algorithm is not parameterize by ɛ. Here f is a function of only ɛ.

can be boune by total processing times of all the jobs. Unfortunately we are unable to make this approach work for integer flow for SETF. Although one can resort to a technique to convert an algorithm that is fractionally scalable to an algorithm that is integrally scalable see [4] for etails). This technique combine with our analysis shows that a variation of SETF is scalable for integer flow. There are two closely relate results in the literature. It was known that if newly arriving jobs were ranomly assigne to a processor, an if each processor ran SETF, that the resulting algorithm is universally scalable in expectation for integer flow [5]. Roughly this analysis combines the fact that SETF is universally scalable on a uniprocessor, with the fact that ranomly assigning jobs roughly balances the processor loas although the fact that there will be some unevenness in the loas in part explains why the competitive ratio that is prove is quite large, something like O ɛ 7 )). It is also known that the algorithm Late Arrival Processor Sharing LAPS) is existentially 4 scalable for integer flow on a homogeneous multiprocessor [6]. There are often situations where one woul like the operating system to view some jobs as being more important than other jobs. One way to formalize this is to assume that jobs have weights specifying their importance, an then consier the objective of minimizing the weighte fractional or integral flow of the jobs. WSETF is a natural generalization of SETF, where jobs are prioritize by the ratio of their weight to the time that jobs have been processe. It was shown in [7] that WSETF is scalable for a uniprocessor using a local competitiveness argument. In Section 3, we show that WSETF is scalable using an amortize local competitiveness argument using a potential function. As in our analysis of SETF, the starting point for the esign of the potential function was an algebraic expression for the future cost of WSETF. However, we again ha to make moifications to the stanar potential function in orer for the running conition to hol. We believe that our analysis is at least moestly interesting for a couple reasons. When one is analyzing algorithms in non-work-conserving scheuling settings, there is usually no hope of using a local competitiveness argument. In this context, a scheuling environment is sai to be non-workconserving if at any given time two reasonable scheuling algorithms coul have complete a ifferent amount of total work thus far. The lack of a potential function analysis of SETF an WSETF meant that these algorithms coul not be use to esign algorithms in non-work-conserving scheuling settings. For example, the analysis of nonclairvoyant spee scaling algorithms for a spee scalable processor in [8] consiere the Late Arrival Processor Sharing Algorithm LAPS) instea of SETF because a potential function analysis was known for LAPS [9]. It is our hope that our potential functions for SETF an WSETF will be useful in other non-work-conserving scheuling settings. Although in fairness we nee to mention that we were unable to aapt our potential function analysis of WSETF for a uniprocessor to the multiprocessor setting because we o not 4 An algorithm is sai to be existentially scalable if it is + ɛ)-spee Ofɛ)) competitive for any fixe constant ɛ > 0 an the algorithm is parameterize by ɛ. Here f is a function of only ɛ.

know how to boun the aggregate increases in the potential function when jobs arrive. But it is our hope that that one further iea woul be enough to surmount this issue, an allow the application of this potential function or some variation thereof) to non-work-conserving scheuling settings. Note that an existentially scalable algorithm, Weighte Late Arrival Processor Sharing, is known for the objective of integer flow on a homogeneous multiprocessor [0]. There is currently a ebate within the architectural community as to whether a homogeneous multiprocessor or a heterogeneous multiprocessor is a better esign []. There are avantages to each option. [2] points out that some stanar priority scheuling algorithms, such as Highest Density First an WSETF, are not scalable for a heterogeneous multiprocessor, an that it is not clear whether other stanar priority algorithms, such as Shortest Remaining Processing Time, Shortest Job First, an SETF, are scalable. So while this paper certainly oes not settle the issue, taken together with [2], the results in this paper inicate that one avantage of homogeneous multiprocessors over heterogeneous multiprocessors is that they seem to be easier to scheule, an that in fact the stanar uniprocessor scheuling algorithms shoul perform similarly well on a homogeneous multiprocessor as on a uniprocessor.. Basic Definitions The input consists of n jobs. We let r i enote the release time of job i, enote the size of job i, an in some instances, enote the weight of job i. An online scheuler oes not learn about job i until time r i. At time r i, a nonclairvoyant scheuler learns the weight but not the size. For each time t, the online algorithm must choose some job i to run such that r i t. We assume that the processor has unit spee, so a job of size, takes units of time to complete. If C i is the completion time for job i, then C i t=r i t is the weighte integer flow for job i. The integer flow of a scheule is the sum over the jobs of the integer flow of each job. The weighte fractional flow of job i, is t=r i pit) t, where t) represents the remaining processing time of job i. The fractional flow of a scheule is the sum over the jobs of the fractional flow of each job. If the scheule is not obvious from context, we superscript a variable with the name of the scheule that is referre to. An algorithm A is s-spee c-competitive if max I A s I) OPT I) c, where A s I) enotes the cost of algorithm A on input I with a spee s processor, OPT I) enotes the cost of the optimal scheule with a spee processor, an the maximum is taken over all possible inputs. A class {A +ɛ) } of algorithms is existentially scalable if for all ɛ > 0, A +ɛ) is + ɛ)-spee Ofɛ))-competitive for some function f that only epens on ɛ. An algorithm A is universally scalable if for all ɛ > 0, A is + ɛ)-spee Ofɛ))-competitive for some function f.

To show that an algorithm A is c + )-competitive using a locally amortize competitiveness argument, one fins a potential function Φ such that the following conitions hol [3]: Bounary conition: Φ is initially 0 an finally non-negative. Completion conition: Φ oes not increase ue to completion of jobs by A or OPT. Arrival conition: Φ oes not increase by more than OPT ue to arrival of jobs. Running conition: At all times t when no job arrives or is complete, we have, t A + t Φt) c t OPT Here t A enotes the increase in the objective in A s scheule, while t OPT enotes the increase in the objective in OPT s scheule. c + )-competitiveness follows by integrating these conitions over time. 2 SETF on a Homogeneous Multiprocessor As our first result, we shon Theorem that SETF is universally scalable on a homogeneous multiprocessor for the objective of fractional flow using an amortize local competitiveness argument. Theorem. SETF is + ɛ)-spee + 5 ɛ )-competitive on a homogeneous multiprocessor for the objective of fractional flow. Proof. We use A to enote SETF. Let m enote the number of homogeneous mutliprocessors. We let qj A t) enote the amount of job j that has been processe up to time t. Note that qj At)+pA j t) =. Let, x) + return x when x is positive, an 0 otherwise. Then, efine p A i,j t) := min, ) qj At))+. This represents the amount of time job i must wait on job j assuming no more jobs arrive. Note that it is possible that i = j. Similarly for OPT, p O i,j t) := min, ) qj Ot))+. We let Q A t) an Q O t) enote the algorithm A s queue an OPT s queue, at time t respectively. Finally, let Zi At) := j Q A t) pa i,j t). Similarly, ZO i t) := j Q O t) po i,j t). We use an amortize local competitiveness argument. We efine the potential function Φt) as follows. Φt) = mɛ = mɛ p A i t) Z A i t) + mp A i t) Z O i t) ) p A i t) j Q A t) j Q O t) min, ) q A j t)) + + mp A i t) min, ) q O j t)) +)

Bounary conition: The bounary conition is trivially satisfie, as there are no jobs contributing to Φ at t = 0 or when all jobs have been finishe. Job completion: Fix some job i Q A t). Consier first when A completes job i. Note that at this time, p A i t) = 0 an therefore there is no change in Φ from removing this term from the sum. Next, consier when A completes some job j i. Since, qj At) =, p A i,j t) = 0, so there is no change in Φ from removing this term. Similarly, the completion of a job by OPT oes not change Φ. Job arrival: We first show the following lemma. Lemma. Consier any job i Q A t) an time t. Then it is the case that Z A i t) ZO i t) m. Proof. Fix time t. Let Jt) enote the set of all jobs in A s queue that have been processe less than job i s total processing time. More formally, we have Jt) = {j Q A t) q A j t) < }. If Jt) m, then there are at most m terms contributing to Z A i t) each of which have value at most an so the esire result hols. So suppose Jt) > m. Consier the earliest time t t such that at any time τ [t, t], Jτ) > m. By efinition of t, at time t δ, there are at most m jobs that have elapse processing times at most. Now consier all jobs, enote by S, which arrive uring [t, t]. Note that for any time τ [t, t], for any job j that is run, q A j τ) < since Jτ) > m. Therefore, Jt) Jt δ) S. Consier Jt) s contribution to Z A i t) ZO i t) at time t. Let t = t δ. min, ) qj A t)) + min, ) qj O t)) + j Jt) min, ) qj A t)) min, ) qj O t)) ) j Jt) = qj O t) qj A t)) j Jt) j Jt ) q O j t ) q A j t )) + j Jt ) q O j t) q O j t )) q A j t) q A j t )) + j Sq O j t) q A j t)) 2) m 3) Inequality ) hols as base on the efinition of Jt) the first term in the sum will always be positive. 2) hols by noting that Jt) = Jt ) S an rearranging terms while letting δ 0. Finally, 3) is true because the first sum is less than m as there are at most m terms of value. Further, j Jt ) qa j t) qa j t ))+ j S qa j t) represents the total work that SETF i uring this interval an j Jt ) qo j t) qo j t ))+ j S qo j t) cannot be more than the work that OPT i uring this interval, therefore their ifference is non-positive.

Given this lemma, note that when job i arrives, Φ increases by at most 2 ɛ an so summing over all arrivals, the increase is at most 4 ɛ OPT since /2 is a lower boun for job i s fractional flow time in any scheule. Running Conition: First note that t A = p A i t). Also, t OPT = p O i t) i Q O t). We now boun the change in Φ at some time t when no jobs arrive or complete. We have that, t Φt) = mɛ pa i t) t Z A i t) + mp A i t) Z O i t)) + pa i t) ZA i t) + mpa i t) ZO i t)) ) t First consier the change of pa i t). This occurs only when job i is being processe by SETF. Since SETF runs at spee + ɛ), pa i t) is ecreasing at a rate of + ɛ). To boun the overall rate of increase in Φ this can have, we ignore the positive terms Zi At) an mpa i t) an consier only ZO i t). Then, the rate of increase in Φ ue to change in pa i t) is boune by mɛ + ɛ) j Q O t) min, ) q O j t)) + To boun this sum, there are two cases to consier. First, consier all jobs j such that <. Then, we have that min, ) qj O t)) + = p O j t) po j t) For all jobs j such that, we have that min, ) q O j t)) + = So, in total we have that mɛ + ɛ) mɛ + ɛ) = + ɛ mɛ pi qj Ot) ) + = j Q O t) j Q O t) t OPT qo j t) min, ) q O j t)) + p O j t) ) + po j t)

Since there are at most m such jobs as i running, the total rate of increase in Φ ue to change in pa i t) is boune by + ɛ ) t OPT. We now turn our attention to the change of Zi At)+mpA i t) ZO i t)) for any job i Q A t). Note that p A i t) > 0, i.e. qa i t) <. If SETF is working on job i, then mp A i t) ecreases at a rate of m + ɛ). Otherwise, if SETF oes not work on i at time t, then there must exist m jobs j such that qj At) < that SETF is working on. In either case, Zi At) + mpa i t) ecreases at a rate of m + ɛ). On the other han, Zi O t) can increase at a rate of at most m. Therefore, the rate of change of Φ ue to change in Zi At) + mpa i t) ZO i t)) is boune by, mɛ So, in total, we have that t A + t Φt) t A + p A i t) m + ɛ) + m) = p A i t) = t A + ) ɛ t OPT t A = + ) ɛ t OPT We note that one can achieve an existentially scalable nonclairvoyant algorithm for integer flow by maintaining the invariant that each job is either one or has processe + ɛ) times as much as SETF woul have processe it on + ɛ) slower processors. 3 WSETF on a Uniprocessor We now show that WSETF is scalable on a single processor for the objective of weighte fractional flow. Recall the WSETF shares the processor equally among all jobs that have maximal ratio between weight an the amount that the job has been processe. Theorem 2. WSETF is +ɛ)-spee + 3 ɛ )-competitive on a uniprocessor for the objective of weighte fractional flow. Proof. We use A to enote the algorithm WSETF. Let qj A t) enote the amount of job j that has been processe up to time t. Let p A wj i,j t) := min, ) qj At))+ an p O wj i,j t) := min, ) qj Ot))+. We again use an amortize local competitiveness argument. Consier the following potential function Φt). Φt) = ɛ = ɛ p A i t) Zi A t) + p A i t) Zi O t)) p A i t) min w j, ) qj A t)) + + p A i t) j Q A t) min w j, ) qj O t)) +) j Q O t)

p A i t) It is worth noting that Zi At) + pa i t)) represents the approximate future cost of WSETF assuming no more jobs arrive. We now verify that all four conitions hol. Bounary conition: The bounary conition is trivially satisfie, as there are no jobs contributing to Φ at t = 0 or when all jobs have been finishe. Job completion: Consier first when A completes job i. Note that at this time, p A i t) = 0 an therefore there is no change in Φ from removing this term from the sum. Next, consier when A completes job j. Since, qj At) =, p A i,j t) = 0, so there is no change in Φ from removing this term. Similarly, the completion of job i or job j by OPT oes not change Φ. Job arrival: We first show the following lemma. Lemma 2. Consier any job i At) an time t. Then it is the case that Zi At) ZO i t) 0. Proof. Fix time t. Consier the earliest time t t such that at any time τ [t, t] = I, WSETF works only on jobs j such that q j τ) wj. By efinition of t, at time t ɛ, all unfinishe jobs have elapse processing times at least wj, which thus contribute zero to Zi A t), so we can ignore those jobs. Now consier all jobs, enote by S, which arrive uring [t, t]. Consier S s contribution to Zi At) ZO i t) at time t, min w j, ) qj A t)) + min w j, ) qj O t)) + j S min w j, ) qj A t)) min w j, ) qj O t)) 4) j S = j S q O j t) q A j t) 5) Note that 4) hols as base on the efinition of S an I, for any job j S, qj A wj t). Now consier the term in 5), j S qo j t) qa j t). First note that j S qa j t) captures the total work that WSETF i uring the interval, an further j S qo j t) cannot excee the amount of work that OPT i uring the same interval. Therefore, this term is non-positive. Given this lemma, note that when job i arrives, Φ increases by at most ɛ ). So, summing over all arrivals, Φ increases by at most ɛ i 2 ɛ OPT as esire. Running Conition: First note that t A = pa i t). Also, t OPT = i Q O t) po i t). We now boun the change in Φ at some time t when no job

arrives or is complete. We have that, t Φt) = ɛ wipa i t) t + p A i t) Z A i t) + p A i t) Z O i t)) ZA i t) + pa i t) ZO i t)) ) t 6) First consier the change of wipa i t). This occurs only when job i is being processe by A. We assume without loss of generality that A works on a single job at each time t. Then, since WSETF runs at spee +ɛ), wipa i t) is ecreasing at a rate of + ɛ) wi. To boun the overall rate of increase in Φ this can have, we ignore the positive terms Zi At) an pa i t) an consier only ZO i t). Then, the rate of increase in Φ ue to change in wipa i t) is boune above by ɛ + ɛ) Zi O t) = + ) wi ɛ j Q O t) min w ) + j, ) qj O t) To boun this sum, we again consier two cases. First, consier all jobs j such that wj > wi. Then, min w j, ) q O j t)) + = q O j t)) w j po j t) Now, for all jobs j such that wj wi, we have that min w j, ) q O j t)) + = w j q O j t)) + = w j q O j t)) + w j po j t) Combining these, we have that + ) wi ɛ j Q O t) min w ) + j, ) qj O t) + ) ɛ t OPT We now turn our attention to the change of Zi At) + pa i t) ZO i t)) for any job i Q A t). Note that p A i t) > 0, i.e. qa i t) <. Thus if WSETF oes not work on i at time t, then there must exist a job j such that qa j t) w j < pi that WSETF is working on. In either case, Zi At) + pa i t) ecreases at a rate of + ɛ. On the other han, Zi O t) can increase at a rate of at most. Therefore, the rate of change in Φ ue to change in Zi At) + pa i t) ZO i t)) is boune above by ɛ p A i t) + ɛ) + ) = p A i t) = t A

So, in total, we have that t A + t Φt) t A + + ) ɛ t OPT t A = + ) ɛ t OPT Our analysis of WSETF oes not exten to a homogeneous multiprocessor because we o not know how to boun the jumps in the potential function when jobs arrive, in part because when a job i arrives the increase in the potential involves terms of the form w j. We are able to surmount this ifficulty in our analysis of SETF because all jobs have equal weight, an the sum of the processing times is a lower boun to optimal. Acknowlegment Supporte in part by an IBM Faculty Awar, an NSF grants CCF-06684, CCF-0830558, CCF-5575, CNS-02070, CNS-5575, an CNS-25328 References. Kalyanasunaram, B., Pruhs, K.: Spee is as powerful as clairvoyance. J. ACM 474) 2000) 67 643 2. Motwani, R., Phillips, S., Torng, E.: Non-clairvoyant scheuling. Theor. Comput. Sci. 30) 994) 7 47 3. Im, S., Moseley, B., Pruhs, K.: A tutorial on amortize local competitiveness in online scheuling. SIGACT News 422) June 20) 83 97 4. Chaha, J.S., Garg, N., Kumar, A., Muralihara, V.N.: A competitive algorithm for minimizing weighte flow time on unrelate machines with spee augmentation. In: Proceeings of the 4st annual ACM symposium on Theory of computing. STOC 09 2009) 679 684 5. Chekuri, C., Khanna, S., et a l., Goel, A.: Multi-processor scheuling to minimize flow time with resource augmentation. In: In Proc. 36th Symp. Theory of Computing STOC), ACM 2004) 363 372 6. Emons, J., Pruhs, K.: Scalably scheuling processes with arbitrary speeup curves. In: SODA. 2009) 685 692 7. Bansal, N., Dhamhere, K.: Minimizing weighte flow time. ACM Trans. Algorithms 34) November 2007) 8. Chan, H.L., Emons, J., Lam, T.W., Lee, L.K., Marchetti-Spaccamela, A., Pruhs, K.: Nonclairvoyant spee scaling for flow an energy. Algorithmica 63) 20) 507 57 9. Chan, H.L., Emons, J., Pruhs, K.: Spee scaling of processes with arbitrary speeup curves on a multiprocessor. Theory Comput. Syst. 494) 20) 87 833 0. Bansal, N., Krishnaswamy, R., Nagarajan, V.: Better scalable algorithms for broacast scheuling. In: ICALP ). 200) 324 335. Merrit, R.: Cpu esigners ebate multi-core future. EE Times February 200) 2. Gupta, A., Im, S., Krishnaswamy, R., Moseley, B., Pruhs, K.: Scheuling heterogeneous processors isn t as easy as you think. In: Proceeings of the Twenty-Thir Annual ACM-SIAM Symposium on Discrete Algorithms. SODA 2, SIAM 202) 242 253