Capacity Augmentation Bounds for Parallel DAG Tasks under G-EDF and G-RM

T E C H N I C A L R E P O RT S I N C O M P U T E R S C I E N C E Technische Universität Dortund Capacity Augentation Bounds for Parallel DAG Tasks under G-EDF and G-RM Jian-Jia Chen and Kunal Agrawal Coputer Science 1 at TU Dortund Washington University in St. Louis, U.S.A. Nuber: 845 July 014 Technische Universität Dortund Fakultät für Inforatik Otto-Hahn-Str. 14, 447 Dortund

http://ls1-www.cs.tu-dortund.de Jian-Jia Chen and Kunal Agrawal : Capacity Augentation Bounds for Parallel DAG Tasks under G-EDF and G-RM, Technical Report, Departent of Coputer Science, Dortund University of Technology. July 014

A B S T R A C T This paper considers global earliest-deadline-first (EDF) and global rate-onotonic scheduling for a general task odel for parallel sporadic real-tie tasks. In particular, each sporadic real-tie task is characterized by the general directed acyclic graph (DAG). This paper provides the utilization-based analysis to test the schedulability of global EDF and global rate-onotonic scheduling. We show that if on unit-speed processors, a task set has total utilization of at ost and the critical path length of each task is saller than its deadline, then global EDF can schedule that task set on processors of speed 3+ 5.6181, defined as the capacity augentation bound. Together with the lower bound on the speeding up, we close the gap for global EDF when is sufficiently large. This is the best known capacity augentation bound for parallel DAG tasks under any scheduling strategy. In addition, we also show that global rate onotonic scheduling has a capacity augentation bound of + 3 3.731 with a siilar analysis procedure, the best known capacity augentation bound for fixed priority scheduling of the general DAG tasks. For global EDF and global RM, we also present utilization-based schedulability analysis tests based on the utilization and the axiu critical path utilization. A C K N O W L E D G M E N T S Part of the results of this paper will appear in Euroicro Conference on Real- Tie Systes (ECRTS) 014 in the paper ANALYSIS OF FEDERATED AND GLOBAL SCHEDULING FOR PARALLEL REAL-TIME TASKS. This research was supported in part by the priority progra Dependable Ebedded Systes (SPP 1500 - spp1500.itec.kit.edu), by DFG, as part of the Collaborative Research Center SFB876 (http://sfb876.tu-dortund.de/) and NSF grant CCF-133718. iii

C O N T E N T S 1 introduction 1 syste odel 4 3 canonical for of a dag task 6 4 global edf 11 4.1 Upper Bound on Capacity Augentation of G-EDF 11 4. Lower Bound on Capacity Augentation of G-EDF 13 4.3 Utilization-Based Schedulability Test 14 5 g-r scheduling 16 6 related work 19 7 conclusions 1 v

vi contents

I N T R O D U C T I O N 1 In the last decade, ulticore processors have becoe ubiquitous and there has been extensive work on how to exploit these parallel achines for real-tie tasks. In the real-tie systes counity, there has been extensive research on scheduling task sets with inter-task parallelis where each task in the task set is a sequential progra. In this case, increasing the nuber of cores allows us to increase the nuber of tasks in the task set. However, since each task can only use one core at a tie, the coputational requireent of a single task is still liited by the capacity of a single core. Recently, there has been soe interest in design and analysis of scheduling strategies for task sets with intra-task parallelis (in addition to inter-task parallelis), where individual tasks are parallel progras and can potentially utilize ore than one core in parallel. These odels enable tasks with higher execution deands and tighter deadlines, such as those used in autonoous vehicles [30], video surveillance, coputer vision, radar tracking and real-tie hybrid testing [8] In this paper, we consider the general directed acyclic graph (DAG) odel. We prove that both global EDF and global rate-onotonic schedulers provide strong perforance guarantees, in the for of capacity augentation bounds, for scheduling these parallel DAG tasks. One can generally derive two kinds of perforance bounds for real tie schedulers. The traditional bound is called resource augentation bound (also called processor speed-up factor). A scheduler A provides a resource augentation bound of b 1 if it can successfully schedule any task set T on processors of speed b as long as the ideal scheduler can schedule T on processors of speed 1. A resource augentation bound provides a good notion of how close a scheduler is to the optial schedule, but has a drawback. Note that the ideal scheduler is only a hypothetical scheduler, eaning that it always finds a feasible schedule if one exists. Unfortunately, Fisher et al. [6] proved that optial online ultiprocessor scheduling of sporadic task systes is ipossible. Since, often, we can not tell whether the ideal scheduler can schedule a given task set on unit-speed processors, a resource augentation bound ay not provide a schedulability test. The other kind of bound that is coonly used is a utilization bound. A scheduler A provides a utilization bound of b if it can successfully schedule any task set which has total utilization at ost /b on processors. 1 A utilization bound provides ore inforation than a resource augentation bound does; any scheduler that guarantees a utilization bound of b autoatically guarantees a resource augentation bound of b as well. In addition, it acts as a very siple schedulability test in itself, since the total utilization of the task set can be calculated in linear tie and copared to /b. Finally, a utilization bound gives an indication of how uch load a syste can handle; allowing us to estiate how uch over-provisioning ay be necessary when designing a platfor. Unfortunately, it is often ipossible to prove a utilization bound for 1 A utilization bound is often stated in ters of 1/b; we adopt this notation in order to be consistent. 1

introduction parallel systes; often, we can construct pathological task sets with utilization arbitrarily close to 1, but which can not scheduled on processors. Li et al. [33] defined a concept of capacity augentation bound which is siilar to the utilization bound, but adds a new condition. A scheduler A provides a capacity augentation bound of b if it can schedule any task set T which satisfies the following two conditions: (1) the total utilization of T is at ost /b, and () the worst-case critical-path length of each task Φ i (execution tie of the task on an infinite nuber of processors) is at ost 1/bth fraction of its deadline. A capacity augentation bound is quite siilar to a utilization bound: It also provides ore inforation than a resource augentation bound does; any scheduler that guarantees a capacity augentation bound of b autoatically guarantees a resource augentation bound of b as well. It also acts as a very siple schedulability test. Finally, it can also provide help while designing a platfor by allowing one to estiate the load it is expected to handle. There has been soe recent research on proving both resource augentation bounds and capacity augentation bounds for various scheduling strategies for parallel tasks. This work falls in two categories. In decoposition-based strategies, the parallel task is decoposed into a set of sequential tasks and these sequential tasks are scheduled using existing strategies for scheduling sequential tasks on ultiprocessors. In general, decoposition based strategies require explicit knowledge of the structure of the DAG off-line in order to apply the decoposition. In non-decoposition based strategies, the progra can unfold dynaically since no offline knowledge is required. For decoposed strategy, ost prior work considers synchronous tasks (subcategory of general DAGs) with iplicit deadlines. Lakshanan et al. [31] proved a capacity augentation bound of 3.4 for partitioned fixed-priority scheduling for a restricted category of synchronous tasks 3 by decoposing tasks and scheduling the decoposed tasks using a deadline onotonic scheduling strategy. Saifullah et al. [4] provide a different decoposition strategy for general parallel synchronous tasks and prove a capacity augentation bound of 4 when the decoposed tasks are scheduled using global EDF and 5 when they are scheduled using partitioned DM. Ki et al. [30] provide a different decoposition strategy for these synchronous tasks and prove a capacity augentation bound of 3.73 using global deadline onotonic strategy. In the respective papers, these results are stated as resource augentation bounds, but they are in fact the stronger capacity augentation bounds. Nelisson et al. [39] proved a resource augentation bound of for general synchronous tasks. For non-decoposition strategies, researchers have studied priarily global earliest deadline first (G-EDF) and global rate-onotonic (G-RM). Andersson and Niz [5] show that global EDF provides resource augentation bound of for synchronous tasks with constrained deadlines. Both Li et. al [33] and Bonifaci et. al [16] concurrently showed that global EDF provides a resource augentation bound of for general DAG tasks with arbitrary deadlines. In their paper, Bonifaci et al. also proved that G-RM provides a resource augentation bound of 3 for parallel DAG tasks with arbitrary deadlines. In addition, critical-path length of a sequential task is equal to its execution tie 3 Fork-join task odel in their terinology

introduction 3 Agrawal et. al also provide a capacity augentation bound of 4 for global EDF for task sets with iplicit deadlines. In suary, the best known capacity augentation bound for iplicit deadlines tasks are 4 for DAG tasks using global EDF, and 3.73 for parallel synchronous tasks using decoposition cobined with global DM. The contributions of this paper are as follows: 1. We iprove the capacity augentation bound of global EDF to 3+ 5.6181 for DAGs. When the nuber of processors,, is large, there is a atching lower bound for global EDF due to [33]; therefore, this result closes the gap for large. In addition, this is the best known capacity augentation bound for any scheduler for parallel DAG tasks.. We show that global RM has a capacity augentation bound of + 3 3.731. This is the best known capacity augentation bound for any fixed-priority scheduler for DAG tasks. Even if we restrict ourselves to synchronous tasks, this is the best bound for global fixed priority scheduling without decoposition. 3. For global EDF and global RM, we also present their resource augentation factor as a function of the utilization and the axiu critical path utilization. Moreover, we also present utilization-based schedulability analysis tests based on the utilization and the axiu critical path utilization. The paper is organized as follows. Section defines the DAG odel for parallel tasks and provides soe definitions. Section 3 presents a canonical for to give an upper bound of the work of a DAG that should be done in a specified interval length. Section 4 proves that global EDF provides the capacity augentation bound of.6181. Section 5 shows that global RM provides the capacity augentation bound of 3.731. Section 7 concludes this paper.

S Y S T E M M O D E L We now present the details of the DAG task odel for parallel tasks and soe additional definitions. task odel This paper considers a given set T of independent sporadic real-tie tasks {τ 1, τ,..., τ n }. A task τ i represents an infinite sequence of arrivals and executions of task instances (or also called jobs). We consider the sporadic task odel [38, 11] where, for a task τ i, the iniu inter-arrival tie or period T i represents the tie between consecutive arrivals of task instances, and the relative deadline D i represents the teporal constraint for executing the job. If a task instance of τ i arrives at tie t, the execution of this instance ust be finished no later than the absolute deadline t + D i and the release of the next instance of task τ i ust be no earlier than t plus the iniu inter-arrival tie, i..e, t + T i. In this paper, we consider iplicit deadline tasks where each task τ i s relative deadline D i is equal to its iniu inter-arrival tie T i ; that is, T i = D i. Each task τ i T is a parallel task; we consider a general odel for deterinistic parallel tasks, naely the DAG odel. Each task is characterized by its execution pattern, defined by a directed acyclic graph (DAG). Each node (subtask) in the DAG represents a sequence of instructions (a thread) and each edge represents dependency between nodes. A node (subtask) is ready to be executed when all its predecessors have been executed. Throughout this paper, as it is not necessary to build the analysis based on specific structures of the execution pattern, only two paraeters related to the execution pattern of task τ i are defined: total execution tie (or work) C i of task τ i : This is the suation of the worst-case execution ties of all the subtasks of task τ i. critical-path length Φ i of task τ i : This is the length of the critical path in the given DAG, in which each node is characterized by the worstcase execution tie of the corresponding subtask of task τ i ; critical path length is the worst case execution tie of the task on an infinite nuber of processors. Given a DAG, obtaining work C i and the critical-path length Φ i [43, pages 661-666] can both be done in linear tie. For notational brevity, the utilization C i T i of task τ i is denoted by u i. The total utilization of the task set is U = τ i T u i. Moreover, let the critical path utilization of task τ i, denoted as i, be Φ i T i. Also, let ax is the axiu critical path utilization of task set T, i.e., ax = ax τi T i. Finally, we also define V i as ax T i. processor odel and global scheduling This paper considers scheduling a task set on a unifor ultiprocessor or ulticore consisting of 4

syste odel 5 identical processors or cores. Specifically, we only consider global policies in which an instance of a subtask/task can be igrated aong the processors. In global scheduling, there is a global queue for the subjobs (subtask instances) that are ready to be executed. This paper will explore two global scheduling policies, global earliest-deadlinefirst (global EDF, or G-EDF) and global rate onotonic (global RM, or G-RM), to decide the priority orders of the subjobs in the global queue. In G-EDF, a subjob has higher priority than another if its absolute deadline of the job that contains it is earlier. In G-RM, a subjob of a task τ i has higher priority than another subjob of a task τ j if T i T j. For the siplicity of presentation, it is assued that the tasks are sorted by increasing deadline so that T i T j if i j. utilization-based schedulability test As entioned in Section 1 we analyze algoriths in ters of their capacity augentation bound. The foral definition is presented here: Definition.0.1 Given a task set T with total utilization of U, a scheduling algorith A with capacity augentation bound b can always schedule this task set on processors of speed b as long as T satisfies the following conditions on speed 1 processors. Utilization does not exceed total cores, u i (1) τ i T For each task τ i T, the critical path Φ i D i () Since no scheduler can schedule a task set T on unit speed processors unless Conditions (1) and () are et, capacity augentation bound autoatically leads to a resource augentation bound. This definition can be equivalently stated (without reference to the speedup factor) as follows: Condition (1) says that the total utilization U is at ost /b and Condition () says that the critical-path length of each task is at ost 1/b ties its relative deadline, that is, ax b. Therefore, in order to check if a task set is schedulable we only need to know the su of the task utilizations, and the axiu critical path utilization. Note that a scheduler with a saller b is better than another with a larger b, since b = 1 eans that A is an optial scheduler.

3 C A N O N I C A L F O R M O F A D A G TA S K In this section, we will represent each task τ i using its canonical for DAG. Note each task can have an arbitrarily coplex DAG structure which is difficult to analyze and ay not even be known to the scheduler before runtie. However, given the known task set paraeters (work, critical path length, utilization, critical-path utilization, etc.) we represent each task using a canonical DAG which allows us to upper bound the deand of the task in any given interval length t. These results will play iportant building blocks when we analyze the capacity augentation bounds for global EDF in Section 4 and global RM in Section 5. We first classify each task τ i as a light or heavy task. A task is a light task if its utilization u i = C i /T i ax. Otherwise, if τ i s utilization u i > ax, then we say that τ i is heavy. For analytical purposes, instead of considering the coplex DAG structure of individual tasks τ i, we consider a canonical for τ i of task τ i. The canonical for of a task is also represented by a DAG, but it is a uch sipler DAG. In particular, each (node) subtask of task τ i has execution tie ɛ, which is positive and arbitrarily sall. (For presentation clarity, we assue that V i ɛ and C i ɛ are both integers.) Light and heavy tasks have different canonical fors described below. The canonical for τ i of a light task τ i is siply a chain of C i /ɛ nodes, each with execution tie ɛ. Note that task τ i is a sequential task. The canonical for τ i of a heavy task τ i is a little ore coplex. It starts with a chain of V i /ɛ 1 nodes each with execution tie ɛ. Therefore, the total work of this chain is V i ɛ. The last node of the chain forks all the reaining nodes. That is, all the reaining (C i V i + ɛ)/ɛ nodes have an edge fro the last node of this chain, but no other edges. Therefore, all these forked subtasks can execute entirely in parallel. Due to the assuption that V i ɛ and C i ɛ are both integers for each task τ i, we know that the above construction of the canonical for τ i results in a feasible DAG structure. Figure 1 provides an exaple for such a transforation for a heavy task. It is iportant to note that the canonical for τ i does not depend on the internal DAG structure of τ i at all. On the other hand, it depends only on the task paraeters of task τ i and the axiu critical path utilization ax of the task set, since V i = ax T i. As an additional analysis tool, we define a hypothetical scheduling A which ust schedule a task set T on an infinite nuber of processors, that is, =. Since the syste has an infinite nuber of processors, the prioritization of the subjobs becoes unnecessary and A can obtain an optial schedule by siply assigning a subjob to a processor as soon as that subjob becoes ready for execution. Using this schedule, all the tasks respond in their critical-path length, that is, a job of task τ i finishes exactly Φ i tie units after it is released. 6

canonical for of a dag task 7 ɛ 4 3 5 1 3 ɛ ɛ ɛ ɛ 1 ɛ 1 nodes.. ɛ 8+ɛ ɛ nodes (a) original DAG (b) canonical for: heavy task Figure 1: A heavy DAG task τ i with Φ i = 1, V i = 1, C i = 0, and T i = D i = 0 and its canonical for, where the nuber in each node is its execution tie. Therefore, if Φ i T i for all tasks τ i in T, the above schedule always eets the deadlines of the tasks. We denote this schedule as S. Siilarly, S, is the resulting schedule when A schedules tasks on processors of speed 1. Note that S, finishes a job of task τ i exactly Φ i / tie units after it is released. We now define soe notations based on S,. We say that axiu load work i (t, ) for task i as the axiu aount of work that S, ay do on the subjobs of τ i in any interval of length t. Let q i (t, ) be the total work finished by S, between the arrival tie r i of task instance τ i and tie r i + t. That is, fro r i + t to r i + T i, i.e., interval length T i t, the reaining C i q i (t, ) workload (execution tie) has to be finished. Clearly, both q i (t, ) and work i (t, ) depend on the structure of the DAG. We can derive work i (t, ) as follows: work i (t, ) = C i q i (T i t, ) t T i tti C i + work i (t tti T i ) t > T i. (3) For the canonical for τ i, let q i (t, ) be defined with the sae definition of q i (t, ) for task τ i. As the canonical for in task τ i is well-defined, we can derive q i (t, ) directly. Note that ɛ can be arbitrarily sall, and, hence, its ipact is ignored when calculating q i (t, ). Suppose that V i is in{c i, V i }. For both heavy and light tasks, the first V i unit of work is sequential. Therefore, it is clear that q i (t, ) is t when t < V i. Moreover, if task τ i is a heavy task, we also have q i ( V i, ) is C i. We can now define the canonical workload work i (t, ) as the axiu workload of the canonical task τ i in any tie interval length in schedule S,. For a light task τ i, where C i /T i i, and τ i is a chain, it is easy to see that the canonical workload is work i (t, ) = 0 t < T i C i (t (T i C i )) T i C i t T i tti C i + work i (t tti T i ) t > T i. (4)

8 canonical for of a dag task 40 38 36 34 3 30 8 6 4 0 18 16 14 1 10 8 6 4 0 q i (t, ) q i (t, ) q i (t, 1) q i (t, 1) 0 4 6 8 10 1 14 16 18 0 4 6 8 30 3 34 36 38 40 t (a) q i (t, ) and q i(t, ) 40 38 36 34 3 30 8 6 4 0 18 16 14 1 10 8 6 4 0 work i (t, 1) work i (t, 1) work i (t, ) work i (t, ) 0 4 6 8 10 1 14 16 18 0 4 6 8 30 3 34 36 38 40 t (b) work i (t, ) and work i(t, ) Figure : q i (t, ), q i(t, ), work i (t, ) and work i(t, ) for the heavy task τ i with T i = 0 in Figure 1. Siilarly, for heavy tasks, where C i /T i i, when ɛ is arbitrarily sall, we have work i (t, ) = 0, t < T i V i C i V i + (t (T i V i )), T i V i < t T i tti C i + work i (t tti T i ) t > T i. (5) Figure provides an illustrative exaple for deonstrating q i (t, ), q i(t, ), work i (t, ), and work i(t, ) of the heavy task τ i defined in Figure 1 when T i = 0, = 1, and =. It can be observed that work i (t, ) work i(t, ) in this exaple. In fact, the following lea proves that work i (t, ) work i(t, ) for any t > 0 and 1.

canonical for of a dag task 9 Lea 1 For any t > 0 and 1, work i (t, ) work i(t, ). Proof. Suppose that V i is in{c i, V i }. For both heavy and light tasks, the first V i unit of work is sequential. Therefore, it is clear that q i (t, ) is t when t < V i. According to the definition, we have q i(t, ) t q i (t, ) when t < Φ i. In addition, q i ( V i, ) = C i, since at t = V i /, S, finishes the job since Φ i = V i for all τ i. Moreover, since the critical path length Φ i V i, for all i, we have q i ( Φ i, ) = C i, since S, finishes a job of task τ i exactly Φ i / tie units after it is released. Since Φ i V i, we can conclude that q i (t) q i(t) for any 0 t < T i. Cobining this fact with the definition of work(t, ) (Equation (3)), we coplete the proof. Moreover, the following lea provides an upper bound of the density (workload that has to be finished divided by the interval length) for both heavy and light tasks Lea For any task τ i (heavy or light), we have work i (t, ) t for any t > 0 and > 1. work i (t, ) t u i 1 ax (6) Proof. The first inequality in Inequality (6) coes fro Lea 1. We prove this lea by showing that the second inequality in Inequality (6) holds for light tasks and heavy tasks. We first note that the right hand side is non-negative; that is, 1 ax > 0, since > 1, and 0 < ax 1. There are two cases: Case 1: 0 < t T i We further consider light and heavy tasks separately as follows: If τ i is a light task, where the function work i (t, ) is defined in Equation (4): Therefore, for any 0 < t T i, we have work i (t, ) C i T i t (t T i + C i ) C i T i t = (t T i )( C i T i ) 1 (t T i )( 1) 0, where we get the step 1 by relying on three assuptions: (a) t T i ; (b) since τ i is a light task, we have C i T i ax 1; and (c) > 1. As we observed above, the RHS of Inequality (6) is non-negative. Therefore, the said inequality holds for any light task τ i for any 0 < t T i. If τ i is a heavy task, where the function work i (t, ) is defined in Equation (5). Inequality (6) holds naturally when 0 < t < T i V i

10 canonical for of a dag task since the left hand side is 0 and right hand side is non-negative. For any t such that T i V i t T i, we have work i (t, ) = C i + t T i t t = + T i ( u i ), t Therefore, work i (t,) t is axiized either (a) when t = T i V i if u i 0 or (b) t = T i if u i < 0. If (b) is true, Inequality (6) is obvious since we know that work i (T i,) T i = u i u i. If (a) is 1 ax true, then we have work i (T i V i, ) T i V i = C i V i T i V i = u i ax 1 ax = C i T i ax T i T i ax u i 1 ax Therefore, Inequality (6) holds for any heavy task τ i for any 0 < t T i. Case : t > T i Suppose that t is kt i + t, where k is tti and 0 < t T i. By Equation (4) and Equation (5) and the results known for 0 < t T i, we have work i (t, ) = kc i + work i (t, ) t kt i + t u i t 1 ax ku i T i + kt i + t (kt i + t ) kt i + t u i 1 ax. u i 1 ax. We will use the result of this lea in Sections 4 and 5 to derive bounds on global EDF and global RM scheduling.

G L O B A L E D F 4 In this section, we will use the results fro Section 3 to prove the capacity augentation bound of 3 1/+ 5 /+1/ (3 + 5)/ for global EDF scheduling of parallel DAG tasks. For large, this is tight, since Li et al. [33] showed that global EDF can not provide a capacity augentation bound of less than (3 + 5)/ when is large. In addition, we also show a lower bound of 3 /+ 5 1/+4/ when 3. 4.1 upper bound on capacity augentation of g-edf Our analysis builds on the analysis used to prove the resource augentation bounds by Bonifaci et al. [16]. We first review the particular lea fro the paper that we will use to achieve our bound. Lea 3 If t > 0, n ( + 1) t work i (t, ), i=1 the task set is schedulable by G-EDF on a platfor with speed. Proof. This is based on a reforulation of Lea 3 and Definition 10 in [16]. Theore 1 The capacity augentation bound for G-EDF is 3 1 + 5 + 1. Proof. According to Lea, for any > 1, it is clear that τ sup i work i (t, ) work i (t, ) sup t>0 t τ i t>0 t τ i u i 1 ax where sup is the supreu of a set of nubers. Therefore, if 1 1 ( + 1), 1 1, (7) we can also conclude that the schedulability test for G-EDF in Lea 3 holds. Therefore, if ( + 1) (1 1 ), then the task set is schedulable. In order to calculate, we can solve the equivalent quadratic equation (3 1) + ( 1) = 0. 11

1 global edf resource augentation bound.6.4. 1.8 1.6 1.4 1. 1 0.1 0. 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 U Σ / ax =0.1 ax =0.3 ax =0.6 ax =1.0 Figure 3: The resource augentation bound of G-EDF when is sufficiently large. which solves to = 3 1 + 5 + 1. We now state a slightly ore general corollary relating the resource augentation bound to the total utilization. Corollary 1 The resource augentation bound for G-EDF is ax + U + 1 1 + ( ax + U + 1 1 ) 4(1 1 ) ax. Proof. The proof is the sae as in the proof of Theore 1 without taking the inequality in (7). Therefore, if U 1 ax ( + 1), ( ax + U + 1) + ( 1) ax 0, (8) we can also conclude that the schedulability test for G-EDF in Lea 3 holds. By solving the above inequality, it can be proved that the inequality holds when ax+ U +1 1 + ( ax + U +1 1 ) 4(1 1 ) ax. Figure 3 illustrates the resource augentation bound of G-EDF provided in Corollary 1 when is sufficiently large, i.e., =, by varying U and ax. The previously known resource augentation bound for EDF in such a case is [16]. As can be seen fro the figure, this corollary provides a tighter resource augentation bound for sall values of U and ax, but looser bound for larger values.

4. lower bound on capacity augentation of g-edf 13 4. lower bound on capacity augentation of g-edf As entioned above, Li et al. s lower bound [33] deonstrates the tightness of the above bound for large. We now provide the lower bound of the capacity augentation bound for sall. Consider a task set T with two tasks, τ 1 and τ. Task τ 1 starts with sequential execution for 1 ɛ aount of tie and then forks ɛ + 1 subtasks with execution tie ɛ. Here, ɛ is assued to be a positive nuber that is very sall and ɛ 1 is assued to be a positive integer. Therefore, the total work of task τ 1 is C 1 = 1 and its critical-path length Φ i = 1. The iniu inter-arrival tie of task τ 1 is 1. Task τ is siply a sequential task with work/execution tie of 1 1 and iniu inter-arrival tie also 1 1, where > 1 will be defined later. Clearly, the total utilization is and the critical-path length of each task is at ost the relative deadline (iniu inter-arrival tie). Lea 4 When < 3 δ+ 5 1 + 4 and δ = ɛ g(ɛ) and 3, 1 ɛ + Proof. By solving 1 ɛ + > 1 1 1. (9) = 1 1 1, we know that the equality holds when is equal to 3 ɛ+ 5 1 + 4 g(ɛ), in which g(ɛ) is a function of ɛ 1 and approaches to 0 when ɛ approaches 0. Therefore, when < 3 ɛ+ 5 1 + 4 g(ɛ), it is clear that 1 ɛ + > 1 1 1. Now, by setting δ to ɛ, we reach the conclusion. Theore The capacity augentation bound for G-EDF is at least 3 + 5 1 + 4, when ɛ + 0. Proof. Consider the syste with two tasks τ 1 and τ defined at the beginning of Section 4.. Suppose that the arrival tie of task τ 1 is at tie 0, and the arrival tie of task τ is at tie 1 1 + ɛ. By definition, the first jobs of τ 1 and τ have absolute deadlines at 1 and 1 + ɛ. Therefore, G-EDF scheduling will execute the sequential execution of task τ 1, and then execute the sub-jobs of task τ 1, and then execute τ. The finishing tie of task τ 1 by running at speed is not earlier than 1 ɛ + ɛ ɛ = 1 ɛ +. Therefore, the finishing tie of task τ is not earlier than 1 ɛ + + 1 1.

14 global edf capacity augentation bound.7.65.6.55.5.45.4 upper bound lower bound.35 10 0 30 40 50 60 70 80 90 100 Figure 4: The upper bound of G-EDF provided in Theore 1 and the lower bound in Theore with respect to the capacity augentation bound. > 1 + ɛ, then we know that task τ isses its deadline. By Lea 4, we reach the conclusion. Figure 4 illustrates the upper bound of G-EDF provided in Theore 1 and the lower bound in Theore with respect to the capacity augentation bound. It can be easily seen that the upper and lower bounds are getting closer when is larger. When is 100, the gap between the upper and the lower bounds is roughly about 0.016. If 1 ɛ + + 1 1 4.3 utilization-based schedulability test We conclude this section by extending the above analysis to provide a utilizationbased schedulability test based on ax and U in the following corollary. Corollary If U 1 1 ax + 1 1, then the task set can be feasibly scheduled by using G-EDF. Proof. This is proved by perforing the schedulability test at the platfor with speed. In that platfor with speed, the critical path utilization aong the tasks is at ost U ax and the total utilization is. The following proof analyzes the speed-up factor that can guarantee the schedulability at a platfor with speed by assuing that ax is given as an input Y.

4.3 utilization-based schedulability test 15 0.6 0.5 =4 =16 = 0.4 U Σ / 0.3 0. 0.1 0 0 0. 0.4 0.6 ax 0.8 1 Figure 5: The utilization bound schedulability test for G-EDF with respect to given ax. Siilar to the proof of Theore 1 without taking the condition ax 1 in the inequality in (7), we know that if 1 ax ( + 1), 1 1 Y + 1 1, (10) we can also conclude that the schedulability test for G-EDF in Lea 3 holds at the platfor with speed. This iplies that when the total utilization at speed is no ore than, we can guarantee the schedulability of this 1 1 Y +1 1 task set at a platfor with speed. By the above arguent, if the platfor with speed is the platfor that we would like to test the schedulability of G-EDF, we already reach the conclusion.

5 G - R M S C H E D U L I N G This section presents the proof that G-RM provides a capacity augentation bound of 1 + 3 1 + 1 + 3 for large. The structure of the 4 proof is very siilar to the analysis in Section 4. Again, we use a lea fro [16], restated below. Lea 5 If t > 0, n 0.5 ( + 1) t work i (t, ), i=1 the task set is schedulable by G-RM on a platfor with speed. Proof. This is based on a reforulation of Lea 6 and Definition 10 in [16]. Note that that analysis in [16] is for deadline-onotonic scheduling, by giving a subjob of a task higher priority if its relative deadline is shorter. As we consider task sets with iplicit deadlines, deadline-onotonic scheduling is the sae as rate-onotonic scheduling. By using Lea 5, with siilar proofs in Theore 1 and Corollary 1 we have the following results. Theore 3 The capacity augentation bound for G-RM is 4 1 + 1 4 + 1. Proof. In the siilar anner as the proof of Theore 1, for any > 1, we know that τ sup i work i (t, ) t>0 t 1 1. (11) Therefore, if 1 1 0.5( + 1), we can also conclude that the schedulability test for G-RM in Lea 5 holds. By solving the inequality above, we know that 0.5( + 1) holds 1 4 + 1. 1 1 when 4 1 + The result in Theore 3 is the best known result for the capacity augentation bound for global fixed-priority scheduling for general DAG tasks with arbitrary structures. Interestingly, Ki et al. [30] get the sae bound of + 3 for global fixed-priority scheduling of parallel synchronous tasks (a subset of DAG tasks). The strategy used in [30] is quite different. In particular, in their algorith, the tasks undergo a stretch transforation which generates a set of sequential subtask (each with its release tie and deadline) for each parallel task τ i in the 16

g-r scheduling 17 original task set. These subtask are then scheduled using a global DM scheduling algorith [13]. Note that even though the parallel tasks in the original task set are iplicit deadline tasks, the transfored sequential tasks are only constrained deadline tasks hence the need for deadline onotonic scheduling instead of rate onotonic scheduling. The following, slightly ore general, corollary relates the resource augentation bound to the utilization. Corollary 3 The resource augentation bound for G-RM is ax + U + 1 1 + ( ax + U + 1 1 ) 4(1 1 ) ax. Proof. The proof is the sae as in the proof of Corollary 1. Therefore, if U 1 ax 0.5( + 1), ( ax + U + 1) + ( 1) ax 0, (1) we can also conclude that the schedulability test for G-EDF in Lea 3 holds. By solving the above inequality, it can be proved that the inequality holds when ax+ U +1 1 + ( ax + U +1 1 ) 4(1 1 ) ax. Figure 6 illustrates the resource augentation bound of G-RM provided in Corollary 3 when is sufficiently large, i.e., =, by varying U and ax. The previously known resource augentation bound for global RM is 3 [16]. As can be seen fro the figure, this corollary provides a tighter resource augentation bound for sall values of U and ax, but looser bound for larger values. Corollary 4 If U 1 ax + 1 1, then the task set can be feasibly scheduled by using G-RM. Proof. The proof is identical to the proof of Corollary by following Theore 3 instead of Theore 1.

18 g-r scheduling resource augentation bound 3.5 3.5 1.5 1 0.1 0. 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 U Σ / ax =0.1 ax =0.3 ax =0.6 ax =1.0 Figure 6: The resource augentation bound of G-RM when is sufficiently large. 0.4 0.35 0.3 =4 =16 = U Σ / 0.5 0. 0.15 0.1 0.05 0 0 0. 0.4 0.6 ax 0.8 1 Figure 7: The utilization bound schedulability test for G-RM with respect to given ax.

R E L AT E D W O R K 6 In this section, we review closely related work on real-tie scheduling, concentrating priarily on scheduling task sets with parallel tasks. Real-tie ultiprocessor scheduling considers scheduling sequential tasks on coputers with ultiple processors or cores and has been studied extensively (see [, 1] for a survey). In addition, platfors such as Litus RT [19, 17] have been designed to support these task sets. Here, we review a few relevant theoretical results. Researchers have proven both resource augentation bounds, utilization bounds and capacity augentation bounds. The best known utilization bound for global EDF for sequential tasks on a ultiprocessor is (traditionally stated as 1/b = 50%)[9]; therefore, global EDF trivially provides a resource and capacity augentation bound of as well. Partitioned EDF and versions partitioned static priority schedulers also provide a utilization bound of [35, 6]. Global RM provides a capacity augentation bound of 3 [4] to iplicit deadline tasks. For parallel real-tie tasks, ost early work considered intra-task parallelis of liited task odels such as alleable tasks [3, 1, 9] and oldable tasks [37]. Kato et al. [9] studied the Gang EDF scheduling of oldable parallel task systes. Researchers have since considered ore realistic task odels that represent progras that are typically generated by coonly used general purpose parallel prograing languages such as Cilk faily [1, 14], OpenMP [], and Intel s Thread Building Blocks [41]. These languages and libraries generally support priitives such as parallel-for loops and fork/join or spawn/sync in order to expose parallelis within the progras. Using these constructs in various cobinations generates tasks whose structure can be represented with different types of DAGs. Tasks with one particular structure, naely parallel synchronous tasks, have been studied ore than others in the real-tie counity. These tasks are generated if only we use only parallel-for loops to generate parallelis. Lakshanan et al. [31] proved a (capacity) augentation bound of 3.4 for a restricted synchronous task odel which is generated when we restrict each parallel-for loop in a task to have the sae nuber of iterations. General synchronous tasks (with no restriction on the nuber of iterations in the parallelfor loops), have also been studied [4, 30, 39, 5]. (More details on these results were presented in Section 1) Chwa et al. [0] provide a response tie analysis. If we do not restrict the priitives used to parallel-for loops, we get a ore general task odel ost easily represented by a general directed acyclic graph. A resource augentation bound of 1 for G-EDF was proved for a single DAG with arbitrary deadlines [10] and for ultiple DAGs [16, 33]. A capacity augentation bound of 4 was proved in [33] for tasks with for iplicit deadlines. Liu and Anderson [34] provide a response tie analysis for G-EDF. 19

0 related work There has been significant work on scheduling parallel systes in the nonreal tie context [40, 4, 3, 7, 8, 5]. In this context, the goal is generally to axiize throughput; tasks have no deadlines or periods. Various provably good scheduling strategies, such as list scheduling [7, 18] and work-stealing [15] have been designed. In addition, any platfors have been built based on these results: exaples include parallel languages and runtie systes, such as the Cilk faily [1, 14], OpenMP [], and Intel s Thread Building Blocks [41]. While ultiple tasks on a single platfor have been considered in the context of fairness in resource allocation [3], none of this work considers real-tie constraints.

C O N C L U S I O N S 7 In this paper, we consider parallel tasks in the DAG odel and prove that global EDF provides a capacity augentation bound of 3 1/+ 5 /+1/, which is no ore than (3 + 5)/.618, to parallel tasks with iplicit deadlines and global RM provides the capacity augentation bound of 1 + 3 1 + 1 + 3 3.73 to these tasks. These are the best known 4 bounds for these schedulers for DAG tasks. In addition, for a large enough nuber of processors, the global EDF bound of (3 + 5)/ is tight, since there exists a atching lower bound. Moreover, the global EDF bound of (3 + 5)/ is the best known capacity augentation bound for any scheduler for parallel tasks. For global EDF and global RM, we also present utilization-based schedulability analysis tests based on the utilization and the axiu critical path utilization. There are several directions of future work. Global RM capacity augentation bound is not known to be tight. The current lower bound of the capacity augentation bound of G-RM is 1/0.3748.668, inherited fro the sequential sporadic real-tie tasks without DAG structures [36]. Therefore, it is worth investigating a atching lower bound or lowering the upper bound. In addition, it would be interesting to investigate if the lower bound of.618 for global EDF is a general lower bound for any parallel scheduler or if it is possible to design schedulers that beat this bound. Finally, all the known capacity augentation bound results are restricted to iplicit deadline tasks; it would be interesting to generalize capacity augentation bounds for constrained and arbitrary deadline tasks. 1

B I B L I O G R A P H Y [1] Intel CilkPlus. http://software.intel.co/en-us/articles/ intel-cilk-plus. [] OpenMP Application Progra Interface v3.1, July 011. http://www. openp.org/p-docuents/openmp3.1.pdf. [3] K. Agrawal, C. E. Leiserson, Y. He, and W. J. Hsu. Adaptive work-stealing with parallelis feedback. ACM Trans. Coput. Syst., 6:11 10, Septeber 008. [4] B. Andersson, S. Baruah, and J. Jonsson. Static-priority scheduling on ultiprocessors. In Real Tie Systes Syposiu, pages 193 0, dec. 001. [5] B. Andersson and D. de Niz. Analyzing global-edf for ultiprocessor scheduling of parallel tasks. In Principles of Distributed Systes, pages 16 30. 01. [6] B. Andersson and J. Jonsson. The utilization bounds of partitioned and pfair static-priority scheduling on ultiprocessors are 50%. In Euroicro Conference on Real Tie Systes, pages 33 40, 003. [7] N. S. Arora, R. D. Bluofe, and C. G. Plaxton. Thread scheduling for ultiprograed ultiprocessors. In SPAA 98. [8] N. Bansal, K. Dhadhere, J. Koneann, and A. Sinha. Non-clairvoyant scheduling for iniizing ean slowdown. Algorithica, 40(4):305 318, 004. [9] S. Baruah, V. Bonifaci, A. Marchetti-Spaccaela, and S. Stiller. Iproved ultiprocessor global schedulability analysis. Real-Tie Syst., 46(1):3 4, Sept. 010. [10] S. Baruah, V. Bonifaciy, A. Marchetti-Spaccaelaz, L. Stougiex, and A. Wiese. A generalized parallel task odel for recurrent real-tie processes. In Real Tie Systes Syposiu, 01. [11] S. K. Baruah, A. K. Mok, and L. E. Rosier. Preeptively scheduling hardreal-tie sporadic tasks on one processor. In IEEE Real-Tie Systes Syposiu, pages 18 190, 1990. [1] M. Bertogna and S. Baruah. Tests for global edf schedulability analysis. Journal of Syste Architecture, 57(5):487 497, 011. [13] M. Bertogna, M. Cirinei, and G. Lipari. New schedulability tests for realtie task sets scheduled by deadline onotonic on ultiprocessors. In Proceedings of the 9th International Conference on Principles of Distributed Systes, OPODIS 05, pages 306 31, 006.

Bibliography 3 [14] R. D. Bluofe, C. F. Joerg, B. C. Kuszaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An efficient ultithreaded runtie syste. pages 07 16, Santa Barbara, California, July 1995. [15] R. D. Bluofe and C. E. Leiserson. Scheduling ultithreaded coputations by work stealing. Journal of the ACM, 46(5):70 748, 1999. [16] V. Bonifaci, A. Marchetti-Spaccaela, S. Stiller, and A. Wiese. Feasibility analysis in the sporadic dag task odel. In ECRTS, pages 5 33, 013. [17] B. B. Brandenburg, A. D. Block, J. M. Calandrino, U. Devi, H. Leontyev, and J. H. Anderson. Litus rt: A status report, 007. [18] R. P. Brent. The parallel evaluation of general arithetic expressions. Journal of the ACM, pages 01 06, 1974. [19] J. M. Calandrino, H. Leontyev, A. Block, U. C. Devi, and J. H. Anderson. Litus RT : A testbed for epirically coparing real-tie ultiprocessor schedulers. In Proceedings of the 7th IEEE International Real-Tie Systes Syposiu, RTSS 06, pages 111 16, 006. [0] H. S. Chwa, J. Lee, K.-M. Phan, A. Easwaran, and I. Shin. Global edf schedulability analysis for synchronous parallel tasks on ulticore platfors. In ECRTS 13. [1] S. Collette, L. Cucu, and J. Goossens. Integrating job parallelis in realtie scheduling theory. Inforation Processing Letters, 106(5):180 187, 008. [] R. I. Davis and A. Burns. A survey of hard real-tie scheduling for ultiprocessor systes. ACM Coputing Surveys, 43:35:1 44, 011. [3] X. Deng, N. Gu, T. Brecht, and K. Lu. Preeptive scheduling of parallel jobs on ultiprocessors. In SODA 96. [4] M. Drozdowski. Real-tie scheduling of linear speedup parallel tasks. Inf. Process. Lett., 57:35 40, January 1996. [5] J. Edonds, D. D. Chinn, T. Brecht, and X. Deng. Non-clairvoyant ultiprocessor scheduling of jobs with changing execution characteristics. Journal of Scheduling, 6(3):31 50, 003. [6] N. Fisher, J. Goossens, and S. Baruah. Optial online ultiprocessor scheduling of sporadic real-tie tasks is ipossible. Real-Tie Systes Journal, 45(1-):6 71, 010. [7] R. L. Graha. Bounds on ultiprocessing anoalies. SIAM Journal on Applied Matheatics, pages 17():416 49, 1969. [8] H.-M. Huang, T. Tidwell, C. Gill, C. Lu, X. Gao, and S. Dyke. Cyberphysical systes for real-tie hybrid structural testing: a case study. In International Conference on Cyber Physical Systes, 010. [9] S. Kato and Y. Ishikawa. Gang EDF scheduling of parallel task systes. In Proceedings of the Real Tie Systes Syposiu, 009.

4 Bibliography [30] J. Ki, H. Ki, K. Lakshanan, and R. Rajkuar. Parallel scheduling for cyber-physical systes: analysis and case study on a self-driving car. In ICCPS, pages 31 40, 013. [31] K. Lakshanan, S. Kato, and R. R. Rajkuar. Scheduling parallel realtie tasks on ulti-core processors. In Proceedings of the 010 31st IEEE Real-Tie Systes Syposiu, RTSS 10, pages 59 68, 010. [3] W. Y. Lee and H. Lee. Optial scheduling for real-tie parallel tasks. IEICE Transactions on Inforation Systes, E89-D(6):196 1966, 006. [33] J. Li, K. Agrawal, C.Lu, and C. Gill. Analysis of global edf for parallel tasks. In Euroicro Conference on Real Tie Systes, 013. [34] C. Liu and J. Anderson. Supporting soft real-tie parallel applications on ulticore processors. In RTCSA, 01. [35] J. M. López, J. L. Díaz, and D. F. García. Utilization bounds for edf scheduling on real-tie ultiprocessor systes. Real-Tie Systes Journal, 8(1):39 68, Oct. 004. [36] L. Lundberg. Analyzing fixed-priority global ultiprocessor scheduling. In IEEE Real Tie Technology and Applications Syposiu, pages 145 153, 00. [37] G. Maniaran, C. S. R. Murthy, and K. Raaritha. A new approach for scheduling of parallelizable tasks inreal-tie ultiprocessor systes. Real-Tie Syst., 15:39 60, July 1998. [38] A. K. Mok. Fundaental design probles of distributed systes for the hard-real-tie environent. Technical report, Cabridge, MA, USA, 1983. [39] G. Nelissen, V. Berten, J. Goossens, and D. Milojevic. Techniques optiizing the nuber of processors to schedule ulti-threaded tasks. In Euroicro Conference Real Tie Systes, 01. [40] C. D. Polychronopoulos and D. J. Kuck. Guided self-scheduling: A practical scheduling schee for parallel supercoputers. Coputers, IEEE Transactions on, C-36(1):145 1439, dec. 1987. [41] J. Reinders. Intel threading building blocks: outfitting C++ for ulti-core processor parallelis. O Reilly Media, 010. [4] A. Saifullah, K. Agrawal, C. Lu, and C. Gill. Multi-core real-tie scheduling for generalized parallel task odels. In Real Tie Systes Syposiu, 011. [43] R. Sedgewick and K. D. Wayne. Algoriths. Addison-Wesley Professional, 4th edition, 011.