Analysis of Global EDF for Parallel Tasks


Analysis of Global EDF for Parallel Tasks

Jing Li, Kunal Agrawal, Chenyang Lu, Christopher Gill
Department of Computer Science and Engineering
Washington University in St. Louis, St. Louis, MO, USA
{li.jing, kunal, lu, and cdgill}@wustl.edu

Abstract: As multicore processors become ever more prevalent, it is important for real-time programs to take advantage of intra-task parallelism in order to support computation-intensive applications with tight deadlines. We prove that a Global Earliest Deadline First (GEDF) scheduling policy provides a capacity augmentation bound of 4 - 2/m and a resource augmentation bound of 2 - 1/m for parallel tasks in the general directed acyclic graph model. For the proposed capacity augmentation bound of 4 - 2/m for implicit deadline tasks under GEDF, we prove that if a task set has a total utilization of at most m/(4 - 2/m) and each task's critical path length is no more than 1/(4 - 2/m) of its deadline, it can be scheduled on a machine with m processors under GEDF. Our capacity augmentation bound therefore can be used as a straightforward schedulability test. For the standard resource augmentation bound of 2 - 1/m for arbitrary deadline tasks under GEDF, we prove that if an ideal optimal scheduler can schedule a task set on m unit-speed processors, then GEDF can schedule the same task set on m processors of speed 2 - 1/m. However, this bound does not lead to a schedulability test, since the ideal optimal scheduler is only hypothetical and is not known. Simulations confirm that GEDF is not only safe under the capacity augmentation bound for various randomly generated task sets, but also performs surprisingly well and usually outperforms an existing scheduling technique that involves task decomposition.

Index Terms: real-time scheduling, parallel scheduling, global EDF, resource augmentation bound, capacity augmentation bound

I. INTRODUCTION

During the last decade, the performance increase of processor chips has come primarily from increasing numbers of cores. This has led to extensive work on real-time scheduling techniques that can exploit multicore and multiprocessor systems. Most prior work has concentrated on inter-task parallelism, where each task runs sequentially (and therefore can only run on a single core) and multiple cores are exploited by increasing the number of tasks. This type of scheduling is called multiprocessor scheduling. When a model is limited to inter-task parallelism, each individual task's total execution requirement must be smaller than its deadline, since an individual task cannot run any faster than it would on a single-core machine. In order to enable tasks with higher execution demands and tighter deadlines, such as those used in autonomous vehicles, video surveillance, computer vision, radar tracking and real-time hybrid testing [1], we must enable parallelism within tasks. In this paper, we are interested in parallel scheduling, where, in addition to inter-task parallelism, task sets contain intra-task parallelism, which allows threads from one task to run in parallel on more than a single core.

There has been some recent work in this area. Many of these approaches are based on task decomposition [2]-[4], which first decomposes each parallel task into a set of sequential subtasks with assigned intermediate release times and deadlines, and then schedules these sequential subtasks using a known multiprocessor scheduling algorithm. Decomposition techniques require, in addition to the task-level worst-case execution time and critical path length, a thorough knowledge of the structure of tasks as well as the individual worst-case execution time of each subtask prior to execution. Such knowledge is expensive to acquire and may be inaccurate, pessimistic or even unavailable when tasks from different vendors are integrated on a common computing platform.

Moreover, decomposition introduces implementation complexities in real systems [5]. Therefore, we are interested in analyzing the performance of a global EDF (GEDF) scheduler without any decomposition. We consider a general task model, where each task is represented as a directed acyclic graph (DAG) in which each node represents a sequence of instructions (a thread) and each edge represents a dependency between nodes. A node is ready to be executed when all its predecessors have been executed. GEDF works as follows: at each time step, the scheduler first tries to schedule as many ready nodes from all jobs with the earliest deadline as it can; then it schedules ready nodes from the jobs with the next earliest deadline, and so on, until either all processors are busy or no more nodes are ready (see the sketch below). Compared with other schedulers, GEDF has benefits, such as automatic load balancing. Efficient and scalable implementations of GEDF for sequential tasks are available for Linux [6] and LITMUS [7], suggesting the potential for an easy implementation for parallel tasks if decomposition is not required. Prior theoretical work analyzing GEDF for parallel tasks is either restricted to a single recurring task [8] or considers response time analysis for soft real-time tasks [9].
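For concreteness, the following is a minimal Python sketch of the GEDF dispatch rule described above; the job representation and names are illustrative, not taken from any implementation discussed in this paper.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    deadline: int                               # absolute deadline, shared by all nodes of the job
    ready: list = field(default_factory=list)   # nodes whose predecessors have all finished

def gedf_dispatch(jobs, m):
    """One GEDF time step: assign up to m processors to ready nodes,
    visiting jobs in earliest-absolute-deadline order."""
    running = []
    for job in sorted(jobs, key=lambda j: j.deadline):
        while job.ready and len(running) < m:
            running.append(job.ready.pop())     # this node executes during this step
        if len(running) == m:                   # all processors busy
            break
    return running

# Two jobs, two processors: the earlier-deadline job is served first.
print(gedf_dispatch([Job(10, ["a1", "a2"]), Job(5, ["b1"])], m=2))
```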

In this work, we consider task sets with n tasks and analyze their schedulability under the GEDF scheduler in terms of augmentation bounds. We distinguish between two types of augmentation bounds, both of which have been called resource augmentation in previous literature. By the standard definition, a scheduler S provides a resource augmentation bound of b if the following condition holds: if an ideal scheduler can schedule a task set on m unit-speed processors, then S can schedule that task set on m processors of speed b. Note that the ideal scheduler (optimal schedule) is only a hypothetical scheduler, meaning that if a feasible schedule ever exists for a task set then this ideal scheduler is guaranteed to schedule it. Unfortunately, Fisher et al. [10] proved that optimal online multiprocessor scheduling of sporadic task systems is impossible. Since there may be no way to tell whether the ideal scheduler can schedule a given task set on unit-speed processors, a resource augmentation bound may not provide a schedulability test. Therefore, we distinguish resource augmentation from a capacity augmentation bound, which can serve as an easy schedulability test. If, on unit-speed processors, a task set has total utilization of at most m and the critical path length of each task is smaller than its deadline, then a scheduler S with capacity augmentation bound b can schedule this task set on m processors of speed b. Capacity augmentation bounds have the advantage that they directly lead to schedulability tests, since one can easily check the bounds on utilization and critical path length for any task set.

The contributions presented in this paper are as follows:

- For a system with m identical processors, we prove a capacity augmentation bound of 4 - 2/m (which approaches 4 as m approaches infinity) for sporadic task sets with implicit deadlines (the relative deadline of each task is equal to its period). Another way to understand this bound is: if a task set has total utilization at most m/(4 - 2/m) and the critical path length of each task is at most 1/(4 - 2/m) of its deadline, then it can be scheduled using GEDF on m unit-speed processors.
- For a system with m identical processors, we prove a resource augmentation bound of 2 - 1/m (which approaches 2 as m approaches infinity) for sporadic task sets with arbitrary deadlines.
- We also show that GEDF's capacity bound for parallel task sets (even with implicit deadlines) is greater than 2 - 1/m. In particular, we show example task sets with utilization m where the critical path length of each task is no more than its deadline, while GEDF misses a deadline on m processors with speed less than (3 + sqrt(5))/2, which is approximately 2.618.
- We conduct simulation experiments to show that the capacity augmentation bound is safe for task sets with different DAG structures (as mentioned above, checking the resource augmentation bound is difficult since we cannot compute the optimal schedule). Results show that GEDF performs surprisingly well. All simulated random task sets meet their deadlines with processor speed 2. We also compare GEDF with a scheduling technique that decomposes parallel tasks and then schedules the decomposed subtasks using GEDF [4]. We find that for most task sets, GEDF is better without decomposition. Only among task sets which are intentionally designed to be harder for GEDF to schedule does the decomposition technique occasionally perform better than GEDF without decomposition.

Section II reviews related work. Section III describes the DAG task model with intra-task parallelism. Proofs for the capacity augmentation bound of 4 - 2/m and the resource augmentation bound of 2 - 1/m are presented in Sections IV and V respectively. Section VI shows experimental results and Section VII gives concluding remarks.

II. RELATED WORK

Scheduling parallel tasks without deadlines has been addressed by parallel-computing researchers [11]-[14]. Soft real-time scheduling has been studied for various optimization criteria, such as cache misses [15], [16], makespan [17] and total work done by tasks that meet deadlines [18]. Most prior work on hard real-time scheduling atop multiprocessors has concentrated on sequential tasks [19]. In this context, many sufficient schedulability tests for GEDF and other global fixed-priority scheduling algorithms have been proposed [20]-[28]. Earlier work considering intra-task parallelism makes strong assumptions on task models [29]-[31].

For more realistic parallel tasks, synchronous tasks, Kato et al. [32] proposed a gang scheduling approach. The synchronous model, a special case of the DAG model, represents tasks as a sequence of multithreaded segments with synchronization points between them (such as those generated by parallel-for loops). Most other approaches for scheduling synchronous tasks involve decomposing parallel tasks into independent sequential subtasks, which are then scheduled using known multiprocessor scheduling techniques, such as [4], [33]. For a restricted set of synchronous tasks, Lakshmanan et al. [2] prove a capacity augmentation bound of 3.42 using deadline monotonic scheduling for decomposed tasks. For more general synchronous tasks, Saifullah et al. [3] proved a capacity augmentation bound of 4 for GEDF and 5 for deadline monotonic scheduling. This decomposition approach was recently extended to general DAGs [4] to achieve a capacity augmentation bound of 4 under GEDF on decomposed tasks (note that in that work GEDF is used to schedule sequential decomposed tasks, not parallel tasks directly). This is the best augmentation bound known for task sets with multiple DAGs.

More recently, there has been some work on scheduling general DAGs without decomposition. Nogueira et al. [34] explored the use of work-stealing for real-time scheduling. The paper is mostly experimental and focused on soft real-time performance. The bounds for hard real-time scheduling only guarantee that tasks meet deadlines if their utilization is smaller than 1. This restriction limits its application to many computation-intensive real-time applications whose utilization exceeds the capacity of a single core. For example, even for large m, like 101, a task with utilization m/100 is not guaranteed to be schedulable using such schedulers. Anderson et al. [9] analyzed the response time of GEDF without decomposition for soft real-time tasks. Baruah et al. [8] proved that when the task set is a single DAG task with arbitrary deadlines, GEDF provides a resource augmentation bound of 2. In this paper, instead of just a single task, we consider multiple DAG tasks with different execution times, release times and deadlines. For the resource augmentation bound of 2 - 1/m for multiple DAGs under GEDF proved in our paper, Bonifaci et al. [35] also show the same resource augmentation bound of 2 - 1/m. Our work was done independently of that work, and they also do not consider capacity augmentation.

III. TASK MODEL

This section presents the model for DAG tasks. We consider a system with m identical unit-speed processors. The task set τ consists of n tasks τ = {τ_1, τ_2, ..., τ_n}. Each task τ_i is represented by a directed acyclic graph (DAG), and has a period P_i and deadline D_i. We represent the j-th subtask of the i-th task as node W_i^j. A directed edge from node W_i^j to W_i^k means that W_i^k can only be executed after W_i^j has finished executing. A node is ready to be executed as soon as all of its predecessors have been executed. Each node has its own worst-case execution time C_i^j. Multiple source nodes and sink nodes are allowed in the DAG, and the DAG is not required to be fully connected. Figure 1 shows an example of a task consisting of 5 subtasks in the DAG structure.

For each task τ_i in task set τ, let C_i = Σ_j C_i^j be the total worst-case execution time on a single processor, also called the work of the task. Let L_i be the critical path length (i.e., the worst-case execution time of the task on an infinite number of processors). In Figure 1, the critical path (i.e., the longest path) starts from node W_1^1, goes through W_1^3 and ends at node W_1^4, so the critical path length of DAG W_1 is 1 + 3 + 2 = 6. The work and the critical path length of any job generated by task τ_i are the same as those of task τ_i.

Fig. 1: Example task with work C = 8 and critical path length L = 6.

We also define the notions of remaining work and remaining critical path length of a partially executed job. The remaining work is the total work minus the work that has already been done. The remaining critical path length is the length of the longest path in the unexecuted portion of the DAG (including partially executed nodes). For example, in Figure 1, if W_1^1 and W_1^2 are completely executed, and W_1^3 is partially executed such that 1 (out of 3) unit of work has been done for it, then the remaining critical path length is 2 + 2 = 4.

Nodes do not have individual release offsets and deadlines when scheduled by the GEDF scheduler; they share the absolute deadline of their job. Therefore, to analyze the GEDF scheduler, we do not require any knowledge of the DAG structure beyond the total worst-case execution time C_i, deadline D_i, period P_i and critical path length L_i. We also define the utilization of a task τ_i as u_i = C_i / P_i.

On m unit-speed processors, a task set is not schedulable (by any scheduler) unless the following conditions hold. The critical path length of each task is less than its deadline:

L_i ≤ D_i  (1)

The total utilization is smaller than the number of cores:

Σ_i u_i ≤ m  (2)

In addition, we denote J_{k,a} as the a-th job instance of task k in the system execution. For example, the i-th node of J_{k,a} is represented as W_{k,a}^i. We denote r_{k,a} and d_{k,a} as the absolute release time and absolute deadline of job J_{k,a} respectively. The relative deadline D_k is equal to d_{k,a} - r_{k,a}. Since in this paper we address sporadic tasks, the absolute release times have the following property:

r_{k,a+1} ≥ d_{k,a}, i.e., r_{k,a+1} - r_{k,a} ≥ d_{k,a} - r_{k,a} = D_k

IV. CAPACITY AUGMENTATION BOUND OF 4 - 2/m

In this section, we prove a capacity augmentation bound of 4 - 2/m for implicit deadline tasks, which yields an easy schedulability test. In particular, we show that GEDF can successfully schedule a task set if the task set satisfies the following conditions: (1) its total utilization is at most m/(4 - 2/m), and (2) the critical path length of each task is at most 1/(4 - 2/m) of its period (and deadline).
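Since this test depends only on the aggregate quantities m, C_i, L_i and D_i defined above, it amounts to a few lines of arithmetic. The following is a minimal sketch, assuming each task is given as a (C, L, D) tuple; the function name and representation are ours, not the paper's.

```python
def gedf_capacity_test(tasks, m):
    """Capacity-augmentation schedulability test for GEDF:
    tasks is a list of (work C, critical path length L, implicit deadline D)
    tuples, measured on m unit-speed processors."""
    b = 4.0 - 2.0 / m                                          # capacity augmentation bound
    total_util = sum(C / D for (C, L, D) in tasks)
    utilization_ok = total_util <= m / b                       # condition (1)
    critical_path_ok = all(L <= D / b for (_, L, D) in tasks)  # condition (2)
    return utilization_ok and critical_path_ok

# The two-task counter-example discussed in Section V (m = 6) fails both
# conditions, so the test correctly refuses to certify it:
print(gedf_capacity_test([(440, 88, 88), (60, 60, 60)], m=6))  # False
```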
Note that this is equivalent to saying that if a task set meets the conditions from Inequalities 1 and 2 on m processors of unit speed, then it can be scheduled on m processors of speed 4 - 2/m (which approaches 4 as m approaches infinity).

The gist of the proof is the following: at a job's release time, we can bound the remaining work from other tasks under GEDF with speedup 4 - 2/m. Bounded remaining work leads to bounded interference from other tasks, and hence GEDF can successfully schedule all of them.

Notation: We first define a notion of interference. Consider a job J_{k,a}, which is the a-th instance of task τ_k. Under GEDF scheduling, only jobs that have absolute deadlines earlier than the absolute deadline of J_{k,a} can interfere with J_{k,a}. We say that a job is unfinished if the job has been released but has not completed yet. Due to implicit deadlines (D_i = P_i), at most one job of each task can be unfinished at any time.

There are two sources of interference for job J_{k,a}. (1) Carry-in work is the work from jobs that were released before J_{k,a}, did not finish before J_{k,a} was released, and have deadlines before J_{k,a}'s deadline. Let R_i^{k,a} be the carry-in work due to task τ_i, and let R^{k,a} = Σ_i R_i^{k,a} be the total carry-in from the entire task set onto the job J_{k,a}. (2) Other than carry-in work, the jobs that were released after (or at the same time as) J_{k,a} can also interfere with it if their deadlines are either before or at the same time as J_{k,a}'s. Let n_i^{k,a} be the number of task τ_i's jobs which are released after the release time of J_{k,a} but have deadlines no later than the deadline of J_{k,a} (that is, the number of jobs from task τ_i that fall entirely within the time interval [r_{k,a}, d_{k,a}] between the release time and deadline of J_{k,a}).

For example, in the right part of Figure 2, one entire job J_{1,3} falls within the time interval [r_{3,1}, d_{3,1}] of job J_{3,1}; so n_1^{3,1} = 1. By definition, and since D_i = P_i, every task τ_i has the property

n_i^{k,a} D_i ≤ D_k  (3)

Therefore, the total amount of work A_{k,a} that can interfere with J_{k,a} (including J_{k,a}'s own work) and must be finished before the deadline of J_{k,a} (to prevent any deadline misses) is the sum of the carry-in work and the work that was released at or after J_{k,a}'s release:

A_{k,a} = R^{k,a} + Σ_i u_i n_i^{k,a} D_i  (4)

Note that the work of the job J_{k,a} itself is also included in this formula. That is, in this formulation, each job interferes with itself.

Proof of the Theorem: Consider a GEDF schedule with m processors, each of speed b. Each time step can be divided into b sub-steps such that each processor can do one unit of work in each sub-step. We say a sub-step is complete if all m processors are working during that sub-step; otherwise, we say it is incomplete. First, a couple of straightforward lemmas.

Lemma 1. On every incomplete sub-step, the remaining critical path length of each unfinished job reduces by 1.

Lemma 2. In any t contiguous time steps (bt sub-steps) with unfinished jobs, if there are t* incomplete sub-steps, then the total work F_t done during this time is at least F_t ≥ mbt - (m - 1)t*.

Proof: The total number of complete sub-steps during the t steps is bt - t*, and the total work done during these complete sub-steps is m(bt - t*). On an incomplete sub-step, at least one unit of work is done. Therefore, the total work done in incomplete sub-steps is at least t*. Adding the two gives us the bound.

We now prove a sufficient condition for the schedulability of a particular job.

Lemma 3. If the interference A_{k,a} on a job J_{k,a} is bounded by A_{k,a} ≤ mbD_k - (m - 1)D_k, then job J_{k,a} can meet its deadline on m identical processors of speed b.

Proof: Note that there are D_k time steps (therefore bD_k sub-steps) between the release time and deadline of this job. There are two cases:

Case 1: The total number of incomplete sub-steps between the release time and deadline of J_{k,a} is more than D_k, and therefore also more than L_k. In this case, J_{k,a}'s critical path length reduces on all of these sub-steps. After at most L_k incomplete sub-steps, the critical path is 0 and the job has finished executing. Therefore, it cannot miss the deadline.

Case 2: The total number of incomplete sub-steps between the release and deadline of J_{k,a} is smaller than D_k. Therefore, by Lemma 2, the total amount of work done during this time is more than mbD_k - (m - 1)D_k. Since the total interference (including J_{k,a}'s work) is at most this quantity, the job cannot miss its deadline.

Fig. 2: Example of the tasks' execution trace.

We now define additional notation in order to prove that if the carry-in work for a job is bounded, then GEDF guarantees a capacity augmentation bound of b. Let α_i^{k,a} be the number of time steps between the absolute release time of J_{k,a} and the absolute deadline of the carry-in job of task τ_i. Hence, for J_{k,a} and its carry-in job J_{j,b} of task j,

α_j^{k,a} = d_{j,b} - r_{k,a}  (5)

To make the notation clearer, we give an example in Figure 2. There are 3 sporadic tasks with implicit deadlines: the (execution time, deadline, period) for tasks 1, 2 and 3 are (2, 3, 3), (7, 7, 7) and (6, 6, 6) respectively. For simplicity, assume they are sequential tasks. Since tasks are sporadic, r_{1,2} > d_{1,1}. α_1^{3,1} is the number of time steps between the release time of job J_{3,1} and the deadline of the carry-in job J_{1,2} from task 1. In this example, α_1^{3,1} = 2. Similarly, α_2^{3,1} = 3. Also, n_1^{3,1} = 1.

For either periodic or sporadic tasks, task τ_i has the property

α_i^{k,a} + n_i^{k,a} D_i ≤ D_k  (6)

since α_i^{k,a} is the remaining length of the carry-in job and n_i^{k,a} is the number of jobs of task τ_i entirely falling within the period (relative deadline) of job J_{k,a}. As in Figure 2, α_1^{3,1} + n_1^{3,1} D_1 = 2 + 1·3 = 5 < 6 = D_3.

Lemma 4. If the processors' speed is b ≥ 4 - 2/m and the total carry-in work satisfies the condition R^{k,a} ≤ Σ_i u_i α_i^{k,a} + m·max_i(α_i^{k,a}), then job J_{k,a} always meets its deadline under global EDF.

Proof: The total amount of interfering work (including J_{k,a}'s work) is A_{k,a} = R^{k,a} + Σ_i u_i n_i^{k,a} D_i. Hence, according to the condition in Lemma 4, the total amount of work

A_{k,a} = R^{k,a} + Σ_i u_i n_i^{k,a} D_i
 ≤ Σ_i u_i α_i^{k,a} + m·max_i(α_i^{k,a}) + Σ_i u_i n_i^{k,a} D_i
 = Σ_i u_i (α_i^{k,a} + n_i^{k,a} D_i) + m·max_i(α_i^{k,a})

Using eq. (6) to substitute D_k into the formula, and noting that α_i^{k,a} ≤ D_k, we get A_{k,a} ≤ Σ_i u_i D_k + mD_k.

Since the total task set utilization does not exceed the number of processors, by eq. (2) we can replace Σ_i u_i with m. And since b ≥ 4 - 2/m and m ≥ 1, we get

A_{k,a} ≤ mD_k + mD_k = 2mD_k ≤ (3m - 1)D_k = (4m - 2)D_k - (m - 1)D_k ≤ mbD_k - (m - 1)D_k

Finally, according to Lemma 3, since the interference satisfies the bound, job J_{k,a} can meet its deadline.

We now complete the proof by showing that the carry-in work is bounded as required by Lemma 4 for every job.

Lemma 5. If the processors' speed is b ≥ 4 - 2/m, then, for either periodic or sporadic task sets with implicit deadlines, the total carry-in work for every job J_{k,a} in the task set is bounded by R^{k,a} ≤ Σ_i u_i α_i^{k,a} + m·max_i(α_i^{k,a}).

Proof: We prove this by induction from absolute time 0 to the release time of job J_{k,a}.

Base Case: For the very first job of all the tasks released in the system (denoted J_{l,1}), no carry-in jobs are released before this job. Therefore the condition trivially holds and the job can meet its deadline by Lemma 4:

R^{l,1} = 0 ≤ Σ_i u_i α_i^{l,1} + m·max_i(α_i^{l,1})

Inductive Step: Assume that for every job with an earlier release time than J_{k,a} the condition holds. Therefore, according to Lemma 4, every earlier-released job meets its deadline. Now we prove that the condition also holds for job J_{k,a}.

For job J_{k,a}, if there is no carry-in work from jobs released earlier than J_{k,a}, so that R^{k,a} = 0, the property trivially holds. Otherwise, there is at least one unfinished job (a job with carry-in work) at the release time of J_{k,a}. We now define J_{j,b} as the job with the earliest release time among all the unfinished jobs at the time that J_{k,a} was released. For example, at release time r_{3,1} of J_{3,1} in Figure 2, both J_{1,2} and J_{2,1} are unfinished, but J_{2,1} has the earliest release time. By the inductive assumption, the carry-in work R^{j,b} at the release time of job J_{j,b} is bounded by

R^{j,b} ≤ Σ_i u_i α_i^{j,b} + m·max_i(α_i^{j,b})  (7)

Let t be the number of time steps between the release time r_{j,b} of J_{j,b} and the release time r_{k,a} of J_{k,a}: t = r_{k,a} - r_{j,b}. Note that J_{j,b} has not finished at time r_{k,a}, but by assumption it can meet its deadline. Therefore its absolute deadline d_{j,b} is later than the release time r_{k,a}. So, by eq. (5),

t + α_j^{k,a} = r_{k,a} - r_{j,b} + α_j^{k,a} = d_{j,b} - r_{j,b} = D_j  (8)

In Figure 2, t + α_2^{3,1} = r_{3,1} - r_{2,1} + α_2^{3,1} = d_{2,1} - r_{2,1} = D_2.

For each task τ_i, let n_i^t be the number of jobs that are released after the release time r_{j,b} of J_{j,b} and before the release time r_{k,a} of J_{k,a}. The last such job may have a deadline after the release time r_{k,a}, but its release time is before r_{k,a}. In other words, n_i^t is the number of jobs of task τ_i which fall entirely into the time interval [r_{j,b}, r_{k,a} + D_i]. By the definition of α_i^{k,a}, the deadline of the unfinished job of task τ_i is r_{k,a} + α_i^{k,a}. Therefore, for every task τ_i,

α_i^{j,b} + n_i^t D_i ≤ r_{k,a} + α_i^{k,a} - r_{j,b} = t + α_i^{k,a}  (9)

As in the example in Figure 2, one entire job of task 1 falls within [r_{2,1}, r_{3,1} + D_1], making n_1^t = 1 and d_{1,2} = r_{3,1} + α_1^{3,1}. Also, since d_{1,1} ≤ r_{1,2}, we have α_1^{2,1} + n_1^t D_1 = α_1^{2,1} + D_1 ≤ d_{1,2} - r_{2,1} = r_{3,1} + α_1^{3,1} - r_{2,1} = t + α_1^{3,1} ≤ t + D_1.

We now compare t with α_j^{k,a}. When t ≤ D_j/2, then by eq. (8), α_j^{k,a} = D_j - t ≥ D_j/2 ≥ t. There are two cases:

Case 1 (t ≤ D_j/2, and hence α_j^{k,a} ≥ t): Since by definition J_{j,b} is the earliest carry-in job, the other carry-in jobs to J_{k,a} are released after the release time of J_{j,b} and therefore are not carry-in jobs to J_{j,b}. In other words, the carry-in jobs to J_{j,b} must have finished before the release time of J_{k,a}, which means that the carry-in work R^{j,b} is not part of the carry-in work R^{k,a}. So the carry-in work R^{k,a} is at most the total work released later than J_{j,b}:

R^{k,a} ≤ Σ_i u_i n_i^t D_i ≤ Σ_i u_i (t + α_i^{k,a})  (from eq. (9))

By the assumption of Case 1, t ≤ α_j^{k,a} ≤ max_i(α_i^{k,a}). Hence, using eq. (2) to bound Σ_i u_i t ≤ mt ≤ m·max_i(α_i^{k,a}), we can prove that

R^{k,a} ≤ Σ_i u_i α_i^{k,a} + m·max_i(α_i^{k,a})

Case 2 (t > D_j/2): Since J_{j,b} has not finished executing at the release time of J_{k,a}, the total number of incomplete sub-steps during the t time steps (r_{j,b}, r_{k,a}] is less than L_j. Therefore, the total work done during this time is at least F_t, where

F_t ≥ mbt - (m - 1)L_j  (from Lemma 2)
    ≥ mbt - (m - 1)D_j  (from eq. (1))

The total amount of work from jobs that are released in the time interval (r_{j,b}, r_{k,a}] (i.e., entire jobs that fall between the release time of job J_{j,b} and the release time of job J_{k,a} plus their deadlines) is Σ_i u_i n_i^t D_i, by the definition of n_i^t. The carry-in work R^{k,a} at the release time of job J_{k,a} is the sum of the carry-in work R^{j,b} and the newly released work Σ_i u_i n_i^t D_i, minus the work finished during the time interval t:

R^{k,a} = R^{j,b} + Σ_i u_i n_i^t D_i - F_t ≤ R^{j,b} + Σ_i u_i n_i^t D_i - (mbt - (m - 1)D_j)

By our assumption in eq. (7), we can replace R^{j,b}, reorganize the formula, and get

R^{k,a} ≤ Σ_i u_i α_i^{j,b} + m·max_i(α_i^{j,b}) + Σ_i u_i n_i^t D_i - mbt + (m - 1)D_j
       = Σ_i u_i (α_i^{j,b} + n_i^t D_i) + m·max_i(α_i^{j,b}) - mbt + (m - 1)D_j

According to eq. (9), we can replace α_i^{j,b} + n_i^t D_i with t + α_i^{k,a}, reorganize the formula, and get

R^{k,a} ≤ Σ_i u_i (t + α_i^{k,a}) + m·max_i(α_i^{j,b}) - mbt + (m - 1)D_j
       = (Σ_i u_i (t + α_i^{k,a}) - mt) + m·max_i(α_i^{j,b}) + (m - 1)D_j - m(b - 1)t

Using eq. (2) to replace m with Σ_i u_i in the first term, using eq. (6) to get max_i(α_i^{j,b}) ≤ D_j and so replace m·max_i(α_i^{j,b}) with mD_j in the second term, and since t > D_j/2 (so D_j < 2t),

R^{k,a} ≤ Σ_i u_i α_i^{k,a} + mD_j + (m - 1)D_j - mt - m(b - 2)t
       ≤ Σ_i u_i α_i^{k,a} + mD_j - mt + (2m - 2)t - m(b - 2)t
       ≤ Σ_i u_i α_i^{k,a} + m(D_j - t) + 0  (since b ≥ 4 - 2/m)
       = Σ_i u_i α_i^{k,a} + m·α_j^{k,a}  (from eq. (8))

Finally, since α_j^{k,a} ≤ max_i(α_i^{k,a}), we can prove that

R^{k,a} ≤ Σ_i u_i α_i^{k,a} + m·max_i(α_i^{k,a})

Therefore, by induction, if the processor speed b ≥ 4 - 2/m, then for every J_{k,a} in the task set, R^{k,a} ≤ Σ_i u_i α_i^{k,a} + m·max_i(α_i^{k,a}).

From Lemmas 4 and 5, we can easily derive the following capacity augmentation bound theorem.

Theorem 1. If, on m unit-speed processors, the utilization of a sporadic task set is at most m, and the critical path length of each job is at most its deadline, then the task set can meet all of its implicit deadlines on m processors of speed 4 - 2/m.

Theorem 1 gives the speed-up factor of GEDF, and it can also be restated as follows:

Corollary 6. If a sporadic task set τ with implicit deadlines satisfies the conditions that its total utilization is no more than m/(4 - 2/m) and that, for every task τ_i in τ, the critical path length L_i is smaller than 1/(4 - 2/m) of its period D_i, then GEDF can successfully schedule task set τ.

V. RESOURCE AUGMENTATION BOUND OF 2 - 1/m

In this section, we prove the resource augmentation bound of 2 - 1/m for GEDF scheduling of arbitrary deadline tasks. For the sake of discussion, we convert the DAG representing each task into an equivalent DAG in which each sub-node does 1/m unit of work: a node with work w is split into a chain of mw sub-nodes, each with work 1/m. An example of this transformation of task 1 from Figure 1 is shown in job W_1 in Figure 3 (see the upper job); since in Figure 3 m = 2, the node W_1^1 with worst-case execution time 1 is split into sub-nodes W_1^{1,1} and W_1^{1,2} of length 1/2. The original incoming edges come into the first node of the chain, while the outgoing edges leave the last node of the chain. This transformation does not change any other characteristic of the DAG, and the scheduling does not depend on this step; the transformation is done only for clarity of the proof.
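This node-splitting step is mechanical. The sketch below shows one way to perform it, assuming a DAG given as a dict from node id to (work, successor list); the representation and names are ours, and node works are assumed to be integers as in Figure 1.

```python
def split_into_subnodes(dag, m):
    """Split each node of integer work w into a chain of m*w sub-nodes
    (each of work 1/m), preserving the original dependencies."""
    out = {}                                    # sub-node id -> successor sub-node ids
    for v, (w, succs) in dag.items():
        chain = [(v, i) for i in range(m * w)]
        for a, b in zip(chain, chain[1:]):      # serialize the chain
            out[a] = [b]
        # the last sub-node keeps the node's outgoing edges, which enter
        # each successor's first sub-node
        out[chain[-1]] = [(u, 0) for u in succs]
    return out

# A two-node toy DAG (A with work 1 feeding B with work 2), split for m = 2:
print(split_into_subnodes({"A": (1, ["B"]), "B": (2, [])}, m=2))
```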
First, some definitions. Since the GEDF scheduler runs on processors of speed 2 - 1/m, each time step under GEDF can be divided into 2m - 1 sub-steps, while each time step of the unit-speed ideal scheduler consists of m sub-steps. In each sub-step, each processor can do 1/m unit of work (i.e., execute one sub-node). As in Section IV, we say that a sub-step is complete if all processors are busy, and incomplete otherwise. For each sub-step t, we define F_I(t) as the set of sub-nodes that have finished executing under an ideal scheduler after sub-step t, R_I(t) as the set of sub-nodes that are ready (all their predecessors have been executed) to be executed by the ideal scheduler before sub-step t, and D_I(t) as the set of sub-nodes completed by the ideal scheduler in sub-step t. Note that D_I(t) = R_I(t) ∩ F_I(t). We similarly define F_G(t), R_G(t), and D_G(t) for the GEDF scheduler.

Observation 7. The GEDF scheduler completes all the ready sub-nodes in an incomplete sub-step. That is,

D_G(t) = R_G(t), if t is an incomplete sub-step  (10)

For example, in Figure 3, for step t_1 there are two sub-steps t_1(1) and t_1(2) under the ideal scheduler, while under GEDF there is an additional sub-step t_1(3) (since 2m - 1 = 3).

Theorem 2. If an ideal scheduler can schedule a task set τ (periodic or sporadic tasks with arbitrary deadlines) on a unit-speed system with m identical processors, then global EDF can schedule τ on m processors of speed 2 - 1/m.

Proof: In a GEDF scheduler, on an incomplete sub-step, all ready sub-nodes are executed (Observation 7). Therefore, after an incomplete sub-step, GEDF must have finished all the released sub-nodes and hence must have done at least as much work as the ideal scheduler. Thus, for brevity, we leave out of the proof any time interval in which all processors under GEDF are idling, since at such times GEDF has finished all available work and the theorem is obviously true. We define time 0 as the first instant when not all processors are idling under GEDF, and time t as any time such that during every sub-step in the interval [0, t] at least one processor under GEDF is working. Therefore, in every incomplete sub-step GEDF finishes at least 1 sub-node. We also define sub-step 0 as the last sub-step before time 0, and hence, by definition,

F_G(0) ⊇ F_I(0) and |F_G(0)| ≥ |F_I(0)|  (11)

For each time t ≥ 0, we now prove the following: if the ideal unit-speed system can successfully schedule all tasks with deadlines in the time interval [0, t], then on speed-(2 - 1/m) processors, so can GEDF. Note again that during the interval [0, t] the ideal scheduler and GEDF have mt and (2m - 1)t sub-steps respectively.

Case 1: In [0, t], GEDF has at most mt incomplete sub-steps.

Fig. 3: Example of the tasks' execution: (left) under the unit-speed ideal scheduler; (right) under GEDF at speed 1.5.

Since there are at least (2m - 1)t - mt = (m - 1)t complete sub-steps, the system completes |F_G(t)| - |F_G(0)| ≥ m(m - 1)t + mt = m²t work, since each complete sub-step finishes executing m sub-nodes and each incomplete sub-step finishes executing at least 1 sub-node. We define I(t) as the set of all sub-nodes from jobs with absolute deadlines no later than t. Since the ideal scheduler can schedule this task set, we know that |I(t)| - |F_I(0)| ≤ m·mt = m²t, since the ideal scheduler can only finish at most m sub-nodes in each sub-step and during [0, t] there are mt sub-steps for the ideal scheduler. Hence, we have |F_G(t)| - |F_G(0)| ≥ |I(t)| - |F_I(0)|. By eq. (11), we get |F_G(t)| ≥ |I(t)|. Note that jobs in I(t) have earlier deadlines than the other jobs, so under GEDF no other jobs can interfere with them: the GEDF scheduler never executes other sub-nodes unless there are no ready sub-nodes from I(t). Since |F_G(t)| ≥ |I(t)|, i.e., the number of sub-nodes finished by GEDF is at least the number in I(t), GEDF must have finished all sub-nodes in I(t). Therefore, GEDF can meet all deadlines, since it has finished all work that needed to be done by time t.

Case 2: In [0, t], the GEDF system has more than mt incomplete sub-steps.

For each integer s we define f(s) as the first time instant such that the number of incomplete sub-steps in the interval [0, f(s)] is exactly s. Note that the sub-step ending at f(s) is always incomplete, since otherwise f(s) would not be the first such instant. We show, via induction, that F_I(s) ⊆ F_G(f(s)). In other words, by sub-step f(s), GEDF has completed all the sub-nodes that the ideal scheduler has completed after s sub-steps.

Base Case: For s = 0, f(s) = 0. By eq. (11), the claim is vacuously true.

Inductive Step: Suppose that for s - 1 the claim F_I(s - 1) ⊆ F_G(f(s - 1)) is true. Now we prove that F_I(s) ⊆ F_G(f(s)). In (s - 1, s], the ideal system has exactly 1 sub-step. So,

F_I(s) = F_I(s - 1) ∪ D_I(s) ⊆ F_I(s - 1) ∪ R_I(s)  (12)

Since F_I(s - 1) ⊆ F_G(f(s - 1)), all the sub-nodes that are ready before sub-step s for the ideal scheduler will either have already been executed by GEDF, or also be ready for the GEDF scheduler one sub-step after sub-step f(s - 1); that is,

F_I(s - 1) ∪ R_I(s) ⊆ F_G(f(s - 1)) ∪ R_G(f(s - 1) + 1)  (13)

For GEDF, from sub-step f(s - 1) + 1 to f(s), all the ready sub-nodes with earliest deadlines are executed and then new sub-nodes are released into the ready set. Hence,

F_G(f(s - 1)) ∪ R_G(f(s - 1) + 1) ⊆ F_G(f(s - 1) + 1) ∪ R_G(f(s - 1) + 2) ⊆ ... ⊆ F_G(f(s) - 1) ∪ R_G(f(s))  (14)

Since sub-step f(s) for GEDF is always incomplete,

F_G(f(s)) = F_G(f(s) - 1) ∪ D_G(f(s))
          = F_G(f(s) - 1) ∪ R_G(f(s))  (from eq. (10))
          ⊇ F_G(f(s - 1)) ∪ R_G(f(s - 1) + 1)  (from eq. (14))
          ⊇ F_I(s - 1) ∪ R_I(s)  (from eq. (13))
          ⊇ F_I(s)  (from eq. (12))

By time t, there are mt sub-steps for the ideal scheduler, so GEDF must have finished all the sub-nodes executed by the ideal scheduler by sub-step f(mt). Since there are exactly mt incomplete sub-steps in [0, f(mt)], and since the number of incomplete sub-steps by time t is more than mt, the time f(mt) is no later than time t. Since the ideal system does not miss any deadline by time t, GEDF also meets all deadlines.

An Example Providing Intuition for the Proof: We provide an example in Figure 3 to illustrate the proof of Case 2 and to compare the execution traces of an ideal scheduler (this scheduler is "ideal" only in the sense that it makes all the deadlines) and GEDF. In addition to task 1 from Figure 1, task 2 consists of two nodes connected to another node, all with execution time of 1 (each split into 2 sub-nodes in the figure). All tasks are released by time t_0. The system has 2 cores, so GEDF has a resource augmentation bound of 1.5.

On the left side is the execution trace for the ideal scheduler on unit-speed cores, while the right side shows the execution trace under GEDF on speed-1.5 cores. One step is divided into 2 and 3 sub-steps respectively, representing the speeds of 1 and 1.5 for the ideal scheduler and GEDF. Since the critical path length of task 1 is equal to its deadline, intuitively it should be executed immediately even though it has the latest deadline. That is exactly what the ideal scheduler does. However, GEDF (which does not take critical path length into consideration) will prioritize task 2 first. If GEDF runs on a unit-speed system, task 1 misses its deadline. However, when GEDF gets speed-1.5 processors, all jobs finish in time. To illustrate Case 2 of the above theorem, consider s = 2. Since t_2(3) is the second incomplete sub-step under GEDF, f(s) = t_2(3). All the sub-nodes finished by the ideal scheduler after the second sub-step (shown at left in dark grey) have also been finished under GEDF by sub-step t_2(3) (shown at right in dark grey).

Capacity Augmentation Bound Is Greater than (3 + sqrt(5))/2

While the above proof guarantees a bound, since the ideal scheduler is not known, given a task set we cannot tell whether it is feasible on m speed-1 processors. Therefore, we cannot tell whether it is schedulable by GEDF on m processors of speed 2 - 1/m. One standard way to prove resource augmentation bounds is to use lower bounds on the ideal scheduler, such as Inequalities 1 and 2. As previously stated, we call a resource augmentation bound proven using these lower bounds a capacity augmentation bound, in order to distinguish it from the augmentation bound described above. To prove a capacity augmentation bound of b under GEDF, one must prove that if Inequalities 1 and 2 hold for a task set on m unit-speed processors, then GEDF can schedule that task set on m processors of speed b. Hence, a capacity augmentation bound is also an easy schedulability test.

First, we demonstrate a counter-example showing that proving a capacity augmentation bound of 2 for GEDF is impossible.

Fig. 4: Structure of the task set demonstrating that GEDF does not provide a capacity augmentation bound less than (3 + sqrt(5))/2.

In particular, in Figure 4 we provide a task set that satisfies Inequalities 1 and 2 but cannot be scheduled by GEDF on m processors of speed 2. In this example, as shown in Figure 5, m = 6. The task set has two tasks. All values are measured on a unit-speed system, shown in Figure 4. Task 1 has 13 nodes with total execution time of 440 and period of 88, so its utilization is 5. Task 2 is a single node with execution time and implicit deadline both 60, and hence utilization of 1. Note that the total utilization (6) is exactly equal to m, satisfying Inequality 2. The critical path length of each task is equal to its deadline, satisfying Inequality 1.

The execution trace of the task set on a speed-2, 6-core system under GEDF is shown in Figure 5. The first task is released at time 0 and is immediately executed by GEDF. Since the system under GEDF is at speed 2, the first node W_{1,1}^1 finishes executing at time 28.

frst. If GEDF s only on a unt-speed syste, task 1 wll ss deadlne. However, when GEDF gets speed-1.5 processors, all jobs are fnshed n te. To llustrate Case of the above theore, consder s =. Snce t (3) s the second ncoplete sub-step by GEDF, f(s) = (3). All the nodes fnshed by the deal scheduler after second sub-step (shown at left n dark grey) have also been fnshed under GEDF by stept (3) (shown at rght n dark grey). Capacty Augentaton Bound s greater than (3+ 5)/ Whle the above proof guarantees a bound, snce the deal scheduler s not known, gven a task set, we cannot tell f t s feasble on speed-1 processors. Therefore, we cannot tell f t s schedulable by GEDF on processors wth speed 1. One standard way to prove resource augentaton bounds s to use lower bounds on the deal scheduler, such as Inequaltes 1 and. As prevously stated, we call the resource augentaton bound proven usng these lower bounds a capacty augentaton bound n order to dstngush t fro the augentaton bound descrbed above. To prove a capacty augentaton bound of b under GEDF, one ust prove that f Inequaltes 1 and hold for a task set on unt-speed processors, then GEDF can schedule that task set on processors of speed b. Hence, the capacty augentaton bound s also an easy schedulablty test. Frst, we deonstrate a counter-exaple to show provng a capacty augentaton bound of for GEDF s possble. Fg. 4: Structure of the task set that deonstrates GEDF does not provde a capacty augentaton bound less than (3+ 5)/ In partcular, we provde a task set that satsfes nequaltes 1 and, but cannot be scheduled on processors of speed by GEDF n Fgure 4. In ths exaple, as shown n Fgure 5, = 6. The task set has two tasks. All values are easured on a unt-speed syste, shown n Fgure 4. Task 1 has 13 nodes wth total executon te of 440 and perod of 88, so ts utlzaton s 5. Task s a sngle node, wth executon te and plct deadlne both 60 and hence utlzaton of 1. Note the total utlzaton (6) s exactly equal to, satsfyng nequalty. The crtcal path length of each task s equal to ts deadlne, satsfyng nequalty 1. The executon trace of the task set on a -speed 6-core processor under GEDF s shown n Fgure 5. The frst task s released at te 0 and s edately executed by GEDF. Snce the syste under GEDF s at speed, W 1,1 1 fnshes Fg. 5: Executon of the task set under GEDF at speed executng at te 8. GEDF then executes 6 out of the 1 parallel nodes fro task 1. At te 9, task s released. However, ts deadlne s r +D = 9+60 = 89, later than deadlne 88 of task 1. Nodes fro task 1 are not preepted by task and contnue to execute untl all of the fnsh ther work at te 60. Task 1 successfully eets ts deadlne. The GEDF scheduler fnally gets to execute task and fnshes t at te 90, so task just fals to eet ts deadlne of 89. Note that ths s not a counter-exaple for the resource augentaton bound shown n Theore, snce no scheduler can schedule ths task set on unt-speed syste ether. Second, we deonstrate that one can construct task sets that requre capacty augentaton of at least 3+ 5 to be schedulable by GEDF. We generate task sets wth two tasks whose structure depends on, speedup factor b and a parallels factor n, and show that for large enough and n, the capacty augentaton requred s at least b 3+ 5. As n the lower part of Fgure 4, task 1 s structured as a sngle node wth work x followed by n nodes wth work y. Its crtcal path length s x+y and so s ts deadlne. 
The utlzaton of task 1 s set to be 1, hence 1 = x+ny (15) x+y Task s structured as a sngle node wth work and deadlne equal to x+y x b (hence utlzaton 1). Therefore, the total task utlzaton s and Inequaltes 1 and are et. As the lower part of Fgure 5 shows, Task s released at te x b +1. We want to generate a counter exaple, so want task to barely ss the deadlne by 1 sub-step. In order for ths to occur, we ust have b + 1 b (x+y x ). (16) b Reorganzng and cobnng eq.(15) and eq.(16), we get ( )b = ((3bn b n b n+1)+(b bn 1))y (17) In the above equaton, for large enough and n, we have (x+y x ny )+ = b (3bn b n b n+1) > 0, or 1 < b < 3 1 n + 1 5 n + 1 n (18) So, there exsts a counter-exaple for any speedup b whch satsfes the above condtons. Therefore, the capacty augentaton requred by GEDF s at least 3+ 5. The exaple above
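The concrete m = 6 instance above matches this construction with n = 12: together with x + y = 88 and x + 12y = 440, this forces x = 56 and y = 32. The arithmetic behind Figure 5 can then be replayed directly; the following small sketch does so (it is valid only for speeds where task 1 keeps the earlier deadline, roughly b ≤ 2.07, and the names are ours).

```python
def task2_finish_vs_deadline(b, m=6, x=56, y=32, n=12, c2=60):
    """Replay the Figure 5 trace at speed b on m cores.
    Task 1: a node of work x, then n parallel nodes of work y (D = x + y).
    Task 2: one node of work c2 = x + y - x/2, released at x/b + 1."""
    t = x / b                       # task 1's serial node completes
    release2 = t + 1                # task 2 arrives just afterwards
    deadline2 = release2 + c2
    t += (n // m) * y / b           # task 1's parallel nodes occupy all m cores
    finish2 = t + c2 / b            # task 2 finally runs alone
    return finish2, deadline2

print(task2_finish_vs_deadline(2.0))   # (90.0, 89.0): a miss at speed 2
print(task2_finish_vs_deadline(2.05))  # meets its deadline at speed 2.05
```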

The example above with speedup 2 comes from such a construction. Another example, with speedup 2.5, can be obtained when x = 36050, y = 5900, m = 10 and n = 7.

VI. EVALUATION

In this section, we describe our experimental evaluation of the performance of GEDF and the robustness of our capacity augmentation bound.¹ We randomly generate task sets that fully load machines, and then simulate their execution on machines of increasing speed. The capacity augmentation bound for GEDF predicts that all task sets should be schedulable by the time the processor speed is increased to 4 - 2/m. In our simulation experiments, all task sets became schedulable before the speed reached 2. We also compared GEDF with the only other known method that provides capacity bounds for scheduling multiple DAGs (with a DAG's utilization potentially more than 1) on multicores [4]. In this method, which we call DECOMP, tasks are decomposed into sequential subtasks and then scheduled using GEDF. We find that GEDF without decomposition performs better than DECOMP for most task sets.

¹ Note that, due to the lack of a schedulability test, it is difficult to test the resource augmentation bound of 2 - 1/m experimentally.

A. Task Sets and Experimental Setup

We generate two types of DAG tasks for evaluation. For each method, we first fix the number of nodes n in the DAG and then add edges.

(1) Erdos-Renyi method G(n, p) [36]: Each valid edge is added with probability p, where p is a parameter. Note that this method does not necessarily generate a connected DAG. Although our bound does not require the DAG of a task to be fully connected, connecting it can make a task harder to schedule. Hence, we modified the method slightly: in a last step, we add the fewest edges needed to make the DAG connected (see the sketch below).

(2) Special synchronous tasks L(n, m): As illustrated by task 1 in Figure 4, synchronous tasks in which highly parallel segments follow sequential segments make scheduling difficult for GEDF, since they can cause deadline misses for other tasks. Therefore, we generate task sets with alternating sequential and highly parallel segments. Tasks in L(n, m) (m is the number of processors) are generated in the following way: while the total number of nodes in the DAG is smaller than n, we add another sequential segment by adding a node, and then generate the next parallel layer randomly; for each parallel layer, we uniformly generate a number t between 1 and n, and set the number of nodes in the segment to be t.

Given a task structure generated by either of the above methods, worst-case execution times for individual nodes in the DAG are picked randomly from [50, 500]. The critical path length L_i of each task is then calculated. We then assign a period (equal to its deadline) to each task. Note that a valid deadline is at least the critical path length. Two types of periods were assigned to tasks:

(1) Harmonic Period: In this method, all tasks have periods that are integral powers of 2. We first compute the smallest value a such that 2^a is larger than the task's critical path length L_i. We then randomly assign the period 2^a, 2^(a+1) or 2^(a+2) to generate tasks with varying utilization. All tasks are released at the same time and simulated for the hyper-period of the task set.

(2) Arbitrary Period: An arbitrary period is assigned in the form (L_i + C_i/(0.5m)) · (1 + 0.5·gamma(2, 1)), where gamma(2, 1) is the Gamma distribution [37] with k = 2 and θ = 1. The formula is designed such that, for small m, tasks tend to have smaller utilization. This allows us to have a reasonable number of tasks in a task set for any value of m.

Task sets are created by adding tasks until the total utilization reaches 99% of m.
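As a concrete illustration, the following is a simplified sketch of the G(n, p) task generator and of the longest-path computation used to derive L_i. The connectivity patch-up here is cruder than the "fewest edges" step described above, and all names are ours.

```python
import random

def erdos_renyi_dag(n, p, wcet=(50, 500)):
    """One DAG task with n nodes (n >= 2): each forward edge (i, j), i < j,
    is kept with probability p; isolated nodes are then chained in."""
    work = [random.randint(*wcet) for _ in range(n)]
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if random.random() < p}
    touched = {v for e in edges for v in e}
    for v in range(n):
        if v not in touched:                   # crude connectivity fix
            edges.add((v, v + 1) if v < n - 1 else (v - 1, v))
    return work, edges

def critical_path(work, edges):
    """Length of the longest path (total node work) through the DAG."""
    dist = list(work)                          # dist[v]: longest path ending at v
    for i, j in sorted(edges):                 # edges point low -> high: topological
        dist[j] = max(dist[j], dist[i] + work[j])
    return max(dist)

work, edges = erdos_renyi_dag(20, 0.1)
print(sum(work), critical_path(work, edges))   # task work C and path length L
```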
Each task set was simulated for 20 times the longest period in the task set. Several parameters were varied to test the system: G(n, p) versus L(n, m) DAGs, different values of p for G(n, p), harmonic versus arbitrary periods, and the number of cores (4, 8, 16, 32, 64). For each setting, we generated 1000 task sets. We first simulated the task sets for each setting on processors of speed 1, and then increased the speed in steps of 0.2. For each setting, we measured the failure rate: the number of task sets in which any task missed its deadline.²

² For DECOMP, end-to-end deadline (instead of decomposed subtask deadline) miss ratios are reported.

B. Experiment Results

1) Erdos-Renyi Method: For this method, we generated two types of task sets. (1) Fixed-p task sets: in this setting, all task sets have the same p; we varied the value of p from 0.01 to 0.9. (2) Random-p task sets: we also generated task sets where each task has a different, randomly picked, value of p.

Fig. 6: Comparison as p changes (m = 64, harmonic period). [Plot: failure ratio (log scale) versus speedup factor, for GEDF and DECOMP at p = 0.02, 0.1, 0.5 and 0.9.]

Figure 6 shows the failure rate for fixed-p task sets as we varied p and kept m constant at 64. GEDF without decomposition outperforms DECOMP for all settings of p. GEDF appears to have the hardest time when p ≤ 0.1, where tasks are more sequential; but even then, all task sets are schedulable at speed 1.8. At p > 0.1, GEDF never requires speed more than 1.4, while DECOMP often requires a speed of 2 to schedule all task sets. Trends are similar for other values of m.

In addition to varying p, we also varied the number of cores. In Figure 7, among all the different combinations, we show the minimum schedulable speedup as m increases, for p = 0.02 and p = 0.5. Results for other p and m are similar. The figure shows that GEDF without decomposition generally needs a smaller speedup to schedule the same task sets. Increasing m makes task sets harder to schedule in many cases, but a larger m is not always worse.

Fig. 7: Minimum schedulable speedup as m changes (harmonic period). [Plot: minimum schedulable speedup factor versus number of cores, for GEDF and DECOMP at p = 0.02 and 0.5.]

For the remaining figures, we only show results with m = 4, 16 and 64, since the trends for m = 8 and 32 are similar. Figure 8 shows the failure ratio of the fixed-p task sets as we kept p constant at 0.02 and varied m. Again, GEDF outperforms DECOMP in all settings. When m = 4, GEDF can schedule all task sets at speed 1.4. Increasing m does not influence DECOMP much, while it becomes slightly harder for GEDF to schedule a few (out of 1000) task sets.

Fig. 8: Comparison as m changes (p = 0.02, harmonic period). [Plot: failure ratio (log scale) versus speedup factor for m = 4, 16, 64.]

In addition to harmonic periods, we also ran experiments with arbitrary periods for all the different types of task sets. As the comparison between Figures 9 and 8 suggests, the trends for arbitrary and harmonic periods are similar, although tasks with arbitrary periods appear to be easier to schedule, especially for GEDF. This is at least partially explained by the observation that, with harmonic periods, many tasks have the same deadline, making it difficult for GEDF to distinguish between them. This trend holds for the other types of task sets as well, so for brevity we only show results for harmonic periods in the remaining experiments.

Fig. 9: Comparison as m changes (p = 0.02, arbitrary period). [Plot: failure ratio (log scale) versus speedup factor for m = 4, 16, 64.]

Figure 10 shows the failure ratio for GEDF and DECOMP for task sets where p is not fixed but is randomly generated for each task, as we vary m. Again, GEDF outperforms DECOMP. Note, however, that GEDF appears to have a harder time here than in the fixed-p experiment.

Fig. 10: Comparison as m changes (random p, harmonic period). [Plot: failure ratio (log scale) versus speedup factor for m = 4, 16, 64.]

2) Synchronous Method: Figure 11 shows the comparison between GEDF and DECOMP with varying m for synchronous task sets. In this case, the failure ratio for GEDF is higher than for task sets generated with the Erdos-Renyi method. We can also see that sometimes DECOMP outperforms GEDF in terms of failure ratio and required speedup. This indicates that synchronous tasks with highly parallel segments are indeed more difficult for GEDF to schedule. However, even in this case, we never required a speedup of more than 2. Even though Figure 4 demonstrates that there exist task sets which require a speedup of more than 2, such pathological task sets never appeared in our randomly generated sample.

Fig. 11: Comparison as m changes (L(n, m) tasks, harmonic period). [Plot: failure ratio (log scale) versus speedup factor for m = 4, 16, 64.]

The results indicate that GEDF performs better than predicted by the capacity augmentation bound. For most task sets, GEDF is better than DECOMP; only among task sets intentionally designed to be harder for GEDF to schedule does DECOMP perform slightly better.

VII. CONCLUSIONS

In this paper, we have presented the best bounds known for parallel tasks represented as DAGs under GEDF. In particular, we proved that GEDF provides a resource augmentation bound of 2 - 1/m for sporadic task sets with arbitrary deadlines and a capacity augmentation bound of 4 - 2/m for task sets with implicit deadlines. The capacity augmentation bound also serves as a simple schedulability test: a task set is schedulable on m processors if (1) m is at least 4 - 2/m times its total utilization, and (2) the implicit deadline of each task is at least 4 - 2/m times its critical path length. We also presented simulation results indicating that this bound is safe; in fact, we never saw a required capacity augmentation of more than 2 on randomly generated task sets in our simulations.

There are several possible directions for future work. First, we would like to extend the capacity augmentation bounds to constrained and arbitrary deadlines. In addition, while we prove that a capacity augmentation of more than (3 + sqrt(5))/2 is needed, there is still a gap between this lower bound and the upper bound of 4 - 2/m for capacity augmentation, which we would like to close.

ACKNOWLEDGMENT

This research was supported in part by NSF grant CCF-1136073 (CPS).

REFERENCES

[1] A. Maghareh, S. J. Dyke, A. Prakash, G. Bunting, and P. Lindsay, "Evaluating modeling choices in the implementation of real-time hybrid simulation," in EMI/PMC 2012.
[2] K. Lakshmanan, S. Kato, and R. R. Rajkumar, "Scheduling parallel real-time tasks on multi-core processors," in RTSS '10.
[3] A. Saifullah, K. Agrawal, C. Lu, and C. Gill, "Multi-core real-time scheduling for generalized parallel task models," in RTSS '11.
[4] A. Saifullah, D. Ferry, K. Agrawal, C. Lu, and C. Gill, "Real-time scheduling of parallel tasks under a general DAG model," Washington University in St Louis, USA, Tech. Rep. WUCSE-2012-14, 2012.
[5] D. Ferry, J. Li, M. Mahadevan, K. Agrawal, C. Gill, and C. Lu, "A real-time scheduling service for parallel tasks," in RTAS '13.
[6] J. Lelli, D. Faggioli, T. Cucinotta, and G. Lipari, "An efficient and scalable implementation of global EDF in Linux," in OSPERT '11.
[7] B. B. Brandenburg and J. H. Anderson, "On the implementation of global real-time schedulers," in RTSS '09.
[8] S. Baruah, V. Bonifaci, A. Marchetti-Spaccamela, L. Stougie, and A. Wiese, "A generalized parallel task model for recurrent real-time processes," in RTSS '12.
[9] C. Liu and J. Anderson, "Supporting soft real-time parallel applications on multicore processors," in RTCSA '12.
[10] N. Fisher, J. Goossens, and S. Baruah, "Optimal online multiprocessor scheduling of sporadic real-time tasks is impossible," Real-Time Systems, vol. 45, no. 1-2, pp. 26-71, 2010.
[11] C. D. Polychronopoulos and D. J. Kuck, "Guided self-scheduling: A practical scheduling scheme for parallel supercomputers," IEEE Transactions on Computers, vol. C-36, no. 12, pp. 1425-1439, 1987.
[12] M. Drozdowski, "Real-time scheduling of linear speedup parallel tasks," Inf. Process. Lett., vol. 57, no. 1, pp. 35-40, 1996.
[13] X. Deng, N. Gu, T. Brecht, and K. Lu, "Preemptive scheduling of parallel jobs on multiprocessors," in SODA '96.
[14] K. Agrawal, C. E. Leiserson, Y. He, and W. J. Hsu, "Adaptive work-stealing with parallelism feedback," ACM Trans. Comput. Syst., vol. 26, September 2008.
[15] J. M. Calandrino and J. H. Anderson, "On the design and implementation of a cache-aware multicore real-time scheduler," in ECRTS '09.
[16] J. H. Anderson and J. M. Calandrino, "Parallel real-time task scheduling on multicore platforms," in RTSS '06.
[17] Q. Wang and K. H. Cheng, "A heuristic of scheduling parallel tasks and its analysis," SIAM J. Comput., vol. 21, no. 2, 1992.
[18] O.-H. Kwon and K.-Y. Chwa, "Scheduling parallel tasks with individual deadlines," Theor. Comput. Sci., vol. 215, no. 1-2, pp. 209-223, 1999.
[19] R. I. Davis and A. Burns, "A survey of hard real-time scheduling for multiprocessor systems," ACM Comput. Surv., vol. 43, pp. 35:1-35:44, 2011.
[20] B. Andersson, S. Baruah, and J. Jonsson, "Static-priority scheduling on multiprocessors," in RTSS '01, Dec. 2001, pp. 193-202.
[21] A. Srinivasan and S. Baruah, "Deadline-based scheduling of periodic task systems on multiprocessors," Information Processing Letters, vol. 84, no. 2, pp. 93-98, 2002.
[22] J. Goossens, S. Funk, and S. Baruah, "Priority-driven scheduling of periodic task systems on multiprocessors," Real-Time Systems, vol. 25, no. 2-3, pp. 187-205, 2003.
[23] M. Bertogna, M. Cirinei, and G. Lipari, "Schedulability analysis of global scheduling algorithms on multiprocessor platforms," IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 4, pp. 553-566, April 2009.
[24] S. Baruah and T. Baker, "Schedulability analysis of global EDF," Real-Time Systems, vol. 38, no. 3, pp. 223-235, 2008.
[25] T. Baker and S. Baruah, "Sustainable multiprocessor scheduling of sporadic task systems," in ECRTS '09, July 2009, pp. 141-150.
[26] J. Lee and K. G. Shin, "Controlling preemption for better schedulability in multi-core systems," in RTSS '12, Dec. 2012.
[27] S. Baruah, "Optimal utilization bounds for the fixed-priority scheduling of periodic task systems on identical multiprocessors," IEEE Transactions on Computers, vol. 53, no. 6, pp. 781-784, June 2004.
[28] M. Bertogna and S. Baruah, "Tests for global EDF schedulability analysis," J. Syst. Archit., vol. 57, no. 5, pp. 487-497, 2011.
[29] W. Y. Lee and H. Lee, "Optimal scheduling for real-time parallel tasks," IEICE Trans. Inf. Syst., vol. E89-D, no. 6, pp. 1962-1966, 2006.
[30] S. Collette, L. Cucu, and J. Goossens, "Integrating job parallelism in real-time scheduling theory," Inf. Process. Lett., vol. 106, no. 5, pp. 180-187, 2008.
[31] G. Manimaran, C. S. R. Murthy, and K. Ramamritham, "A new approach for scheduling of parallelizable tasks in real-time multiprocessor systems," Real-Time Systems, vol. 15, no. 1, pp. 39-60, 1998.
[32] S. Kato and Y. Ishikawa, "Gang EDF scheduling of parallel task systems," in RTSS '09.
[33] N. Fisher, S. Baruah, and T. P. Baker, "The partitioned scheduling of sporadic tasks according to static-priorities," in ECRTS '06.
[34] L. Nogueira and L. M. Pinho, "Server-based scheduling of parallel real-time tasks," in International Conference on Embedded Software, 2012.
[35] V. Bonifaci, A. Marchetti-Spaccamela, S. Stiller, and A. Wiese, "Feasibility analysis in the sporadic DAG task model," in ECRTS '13.
[36] D. Cordeiro, G. Mounie, S. Perarnau, D. Trystram, J.-M. Vincent, and F. Wagner, "Random graph generation for scheduling simulations," in SIMUTools '10.
[37] Gamma distribution, http://en.wikipedia.org/wiki/Gamma_distribution.