
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 37, NO. 1, JANUARY 2018

Multicore Mixed-Criticality Systems: Partitioned Scheduling and Utilization Bound

Jian-Jun Han, Member, IEEE, Xin Tao, Dakai Zhu, Member, IEEE, Hakan Aydin, Member, IEEE, Zili Shao, Member, IEEE, and Laurence T. Yang, Senior Member, IEEE

Abstract: In mixed-criticality (MC) systems, multiple activities with various certification requirements (thus with different criticality levels) can co-exist on shared hardware platforms, where multicore processors have emerged as the de facto computing engines. In this paper, by using the partitioned earliest-deadline-first with virtual deadlines (EDF-VD) scheduler for a set of periodic MC tasks running on multicore systems, we derive a criticality-aware utilization bound for efficient feasibility tests and then identify its characteristics. Our analysis shows that the bound increases with increasing number of cores and decreasing system criticality level. We show that, since the utilizations of MC tasks at different criticality levels can vary considerably, the utilization contribution of a task on different cores may have large variations and thus can significantly affect the system schedulability under the EDF-VD scheduler. Based on these observations, we propose a novel and efficient criticality-aware task partitioning algorithm (CA-TPA) to compensate for the inherent pessimism of the utilization bound. In order to improve the system schedulability, the task priorities are determined according to their utilization contributions to the system in CA-TPA. Moreover, by analyzing the utilization variations of tasks at different levels, we develop several heuristics to minimize the utilization increment and balance the workload on cores. The simulation results show that the CA-TPA scheme is very effective in achieving higher schedulability ratio and yielding balanced workloads.
The actual implementation in the Linux operating system further demonstrates the applicability of CA-TPA with lower run-time overhead, compared to the existing partitioning schemes.

Index Terms: Embedded systems, mixed-criticality (MC), multicore systems, partitioned scheduling, utilization bound.

Manuscript received September 25, 2016; revised January 25, 2017; accepted April 9, 2017. Date of publication April 25, 2017; date of current version December 20, 2017. This work was supported in part by the National Natural Science Foundation of China under Award 647250, in part by the Fundamental Research Funds for the Central Universities (China HUST) under Grant 206YXMS08 and Grant 205TS072, and in part by the U.S. National Science Foundation under Award CNS-422709 and Award CNS-42855. This paper was recommended by Associate Editor J. Xue. (Corresponding author: Xin Tao.)

J.-J. Han, X. Tao, and L. T. Yang are with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: jasonhan@hust.edu.cn; m20472823@hust.edu.cn; ltyang@stfx.ca).

D. Zhu is with the Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249 USA (e-mail: dzhu@cs.utsa.edu).

H. Aydin is with the Department of Computer Science, George Mason University, Fairfax, VA 22030 USA (e-mail: aydin@cs.gmu.edu).

Z. Shao is with the Department of Computing, Hong Kong Polytechnic University, Hong Kong (e-mail: cszlshao@comp.polyu.edu.hk).

This paper has supplementary downloadable multimedia material available at http://ieeexplore.ieee.org provided by the authors.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCAD.2017.2697955

I. INTRODUCTION

IN MODERN embedded systems, the ever-increasing complexity demands the integration of multiple functionalities on a common computing platform due to space, power, and cost constraints.
For instance, the integrated modular avionics initiative for aerospace provides guidelines for hosting multiple avionics components on shared systems to address the increased complexity and cost [27]. In such integrated systems, diverse application activities with various certification requirements and different levels of importance (criticality) may co-exist. For example, the avionics certification standard DO-178B/C defines five design assurance levels, A to E, which are distinguished according to the extent of damage that results from activity failures [24]. To incorporate various certification requirements and enable the efficient management of application activities, the concept of mixed-criticality (MC) systems has been proposed in Vestal's seminal work [29]. Over the last decade, numerous MC scheduling studies have been reported for a variety of system and task models [8], [], [2], [7], [22], [26], [30].

Unlike the traditional sporadic real-time task systems, where the worst-case resource requirements of all tasks must be satisfied, the successful execution of an MC task is defined by its own criticality level and the system's running mode. The basic principle of the MC model is to have more than one criticality level, where tasks at the $k$th ($> 1$) criticality level have $k$ different worst case execution requirements [29]. Moreover, the execution requirement of a task at the $(k-1)$th level is no higher than that at the $k$th level.

Most of the existing studies considered MC tasks running on single processor systems, with a focus on fixed-priority-based scheduling (FPS) [4], [9], [2], [25], [30] and earliest-deadline-first (EDF)-based scheduling techniques [7], [8], [0], [9], [28]. The most notable EDF-based scheduling algorithm for MC tasks is the recently proposed EDF with virtual deadlines (EDF-VD) algorithm [2], [3].
The basic idea of the EDF-VD scheduler is to feasibly assign virtual (and smaller) deadlines (and thus higher priorities) to high-criticality tasks when the system operates in low-criticality mode, in order to improve schedulability.

As multicore processors have become powerful computing engines for modern systems, there is a renewed interest in exploring scheduling algorithms for multicore/multiprocessor real-time systems. There are two classical approaches to the multiprocessor scheduling problem: 1) partitioned scheduling and 2) global scheduling. A recent empirical study on multicore scheduling shows that, when compared to global scheduling, partitioned scheduling generally has better system schedulability, where the individual job queues on processor cores and migration-less activities at run-time typically result in much lower run-time overhead [3].

Typically, the existing partitioned MC scheduling studies focus on dual-criticality systems (with only two criticality levels) and adopt the traditional heuristics that usually rely on either task/system utilizations [4], [8] [e.g., first-fit decreasing (FFD), best-fit decreasing (BFD), and worst-fit decreasing (WFD)] or the criticality levels [6]. It has been shown that a hybrid partitioned scheme [23], which allocates high-criticality tasks using WFD and then low-criticality tasks using FFD, can effectively improve task schedulability compared to the schemes that consider only either utilization or criticality. Moreover, based on the EDF-VD algorithm and speedup factor analysis, several mapping schemes for dual-criticality systems are reported in [6]. To achieve better task schedulability (albeit at the cost of much higher time complexity), several mapping algorithms for dual-criticality systems were recently reported, such as the demand bound function (DBF)-based feasibility test [4] and the mixed integer nonlinear programming solutions that use task clustering [22].

The existing partitioned scheduling algorithms for dual-criticality systems normally consider only a task's utilization at its highest criticality level (i.e., its maximum utilization) [6], [8], [22], [23], and this usually leads to pessimistic estimates of system utilization and thus degraded system schedulability. On the other hand, a look at the existing schedulability conditions for EDF-VD [2], [3] reveals that, in addition to its maximum utilization, an MC task's utilizations at other valid (lower) levels also play an important role.
Motivated by this observation, in this paper, we derive a utilization bound for MC tasks with multiple criticality levels running on multicore platforms under partitioned EDF-VD, and discuss its properties. By employing the latest variant of the EDF-VD algorithm [3], we extend our previous work [5] and propose a criticality-aware task partitioning algorithm (CA-TPA), where the utilization variations of tasks at different levels are taken into account for better schedulability and balanced workload distribution. Specifically, the contributions of this paper can be summarized as follows.

1) We develop a criticality-aware utilization bound for MC tasks scheduled by the partitioned EDF-VD algorithm with the WFD mapping heuristic, which forms the basis of an efficient feasibility test.
2) We identify the monotonicity of the utilization bound, showing that the bound increases with the number of deployed processor cores and decreases with the system criticality level.
3) By exploiting the variations of tasks' utilizations at different criticality levels, we present an efficient task partitioning algorithm with the EDF-VD scheduler. Several criticality-aware heuristics are proposed to improve tasks' schedulability and balance the workload across multicores.
4) The empirical results from an actual implementation in the Linux operating system show that CA-TPA with EDF-VD is quite practical thanks to the relatively low run-time overhead and its criticality-aware workload balancing policy.

The remainder of this paper is organized as follows. Section II presents the task and system models and discusses the feasibility conditions of the EDF-VD scheduler. The criticality-aware utilization bound under partitioned EDF-VD with WFD is developed, and its properties are discussed, in Section III. Our CA-TPA scheme is presented in Section IV. The evaluation results and implementation are discussed in Section V, and Section VI concludes this paper. A review of the MC scheduling research can be found in the supplementary material.

II. SYSTEM MODELS AND PRELIMINARIES

In this section, we first present the system and task models. The schedulability conditions for periodic MC tasks scheduled by EDF-VD on a single processor are briefly reviewed, followed by the description of the problem addressed in this paper.

A. System and Task Models

We consider a multicore system that consists of $M \ge 2$ homogeneous processing cores, which are denoted as $\{P_1, \ldots, P_M\}$. A set of $N$ periodic MC tasks $\Psi = \{\tau_1, \ldots, \tau_N\}$ is scheduled in the system. The MC tasks have $K$ ($> 1$) criticality levels. $K$ is called the system criticality level, and the system starts its operation at level-1 criticality.

An MC task $\tau_i$ is characterized by a tuple $\{C_i, p_i, \ell_i\}$. $\ell_i$ ($1 \le \ell_i \le K$) indicates $\tau_i$'s criticality level (i.e., its own criticality). The system criticality level $K$ corresponds to the maximum criticality level among all the tasks. $p_i$ denotes the task $\tau_i$'s period as well as its relative deadline (thus, we consider implicit-deadline task systems). The vector $C_i = \langle c_i(1), \ldots, c_i(\ell_i) \rangle$ represents the worst-case execution times (WCETs) of task $\tau_i$ at each criticality level, where the WCET of a task at a higher level is generally larger than that at a lower level, that is, $c_i(1) < c_i(2) < \cdots < c_i(\ell_i)$. We assume that the $j$th instance (job) of task $\tau_i$ arrives at time $r_{i,j} = (j-1) \cdot p_i$ and must complete its execution by its absolute deadline $d_{i,j} = j \cdot p_i$.

We assume partitioned scheduling: the subset of tasks allocated to core $P_m$ is denoted as $\Psi_m$ ($m = 1, \ldots, M$), and a partition of tasks to cores is represented as $\Gamma = \{\Psi_1, \ldots, \Psi_M\}$, where $\Psi = \bigcup_{m=1}^{M} \Psi_m$.

We assume that the adaptive mixed criticality (AMC) scheme [3], [5] (which is applicable for both FPS and EDF-based scheduling) is adopted to manage the individual executions of jobs at run-time on the multicore system. When the current system operates at level-$k$ ($k < K$) running mode and a task $\tau_i$ executes for more than its level-$k$ WCET $c_i(k)$ ($k < \ell_i$) without indicating its completion, the system performs a global mode transition and the running mode switches to level-$(k+1)$.
At that moment, all tasks in the system with own criticality level $k$ are discarded and no future level-$k$ tasks are released, until the system becomes idle and gets back to level-1 running mode [6]. Once tasks are mapped to cores, we assume that the EDF-VD scheduler is deployed on each core $P_m$ to schedule its subset of MC tasks $\Psi_m$.

B. Schedulability Conditions of EDF-VD Scheduler

The EDF-VD scheduler was first studied in the context of a single processor [2], [3] with the idea of assigning virtual (and smaller) deadlines (thus higher priorities) to high-criticality tasks in order to improve schedulability. When the system switches to high-criticality mode, the timing requirements of high-criticality tasks can be guaranteed by restoring their original deadlines and dropping low-criticality tasks.

We first review the schedulability conditions for MC tasks running on a single processor scheduled under EDF-VD, starting with some key notations.

1) $u_i(k) = c_i(k)/p_i$: The utilization of task $\tau_i$ at level $k$ ($k \le \ell_i$).
2) $U_j(k)$: The level-$k$ utilization of tasks at own criticality level $j$ ($j \ge k$), which is defined as
$$U_j(k) = \sum_{\tau_i :\, \ell_i = j \ge k} u_i(k). \tag{1}$$
3) $U_j^m(k)$: The level-$k$ utilization of tasks on core $P_m$ at own criticality level $j$ ($j \ge k$), that is
$$U_j^m(k) = \sum_{\tau_i :\, \tau_i \in \Psi_m,\ \ell_i = j \ge k} u_i(k). \tag{2}$$
4) $U(k)$: The total (aggregate) level-$k$ utilization of tasks at own criticality level $k$ or higher. Using (1), we have
$$U(k) = \sum_{j=k}^{K} U_j(k). \tag{3}$$

By incorporating the utilizations of tasks at different criticality levels, a sufficient schedulability condition for implicit-deadline MC tasks scheduled under the AMC-based EDF-VD algorithm was reported in [3], which is summarized in the theorem that follows.

Theorem 1 [3, Th. 3.4]: For an implicit-deadline MC task set $\Psi_m$ allocated to processor core $P_m$, the tasks are feasible under the EDF-VD scheduler on core $P_m$ if either
$$\sum_{j=1}^{K} U_j^m(j) \le 1 \tag{4}$$
or, for some $k = 1, \ldots, K-1$, the condition below holds
$$\sum_{j=1}^{k} U_j^m(j) < 1 \quad \text{and} \quad \underbrace{\frac{\sum_{j=k+1}^{K} U_j^m(k)}{1 - \sum_{j=1}^{k} U_j^m(j)}}_{a(k)} \le \underbrace{\frac{1 - \sum_{j=k+1}^{K} U_j^m(j)}{\sum_{j=1}^{k} U_j^m(j)}}_{b(k)}. \tag{5}$$

Basically, (4) states that if core $P_m$ can accommodate the maximum utilization demands of all its tasks at their own criticality, the tasks are schedulable under EDF-VD (which actually reduces to EDF as there is no virtual deadline for any task [3]).
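The two conditions of Theorem 1 can be checked mechanically for the tasks of one core. The following is a minimal sketch, not the authors' implementation: the `Task` tuple layout, the helper names `level_util` and `edf_vd_condition`, and the sample task sets are all hypothetical choices made for illustration. The comparison $a(k) \le b(k)$ is cross-multiplied to avoid dividing by zero, which is valid because both denominators are positive whenever the first part of (5) holds and the low-level sum is nonzero.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical task encoding: (period p_i, [c_i(1), ..., c_i(l_i)]).
# The task's own criticality level l_i is the length of the WCET list.
Task = Tuple[float, Sequence[float]]


def level_util(tasks: List[Task], j: int, k: int) -> float:
    """U_j^m(k): level-k utilization of the tasks whose own level is j (j >= k)."""
    return sum(w[k - 1] / p for p, w in tasks if len(w) == j)


def edf_vd_condition(tasks: List[Task], K: int) -> Optional[int]:
    """Return 0 if (4) holds, else the smallest k satisfying (5), else None."""
    own = [level_util(tasks, j, j) for j in range(1, K + 1)]  # U_j^m(j)
    if sum(own) <= 1.0:
        return 0
    for k in range(1, K):
        low = sum(own[:k])    # sum over j <= k of U_j^m(j)
        high = sum(own[k:])   # sum over j > k of U_j^m(j)
        if low >= 1.0:
            continue          # first part of (5) fails
        a_num = sum(level_util(tasks, j, k) for j in range(k + 1, K + 1))
        # a(k) <= b(k), cross-multiplied: a_num * low <= (1 - high) * (1 - low)
        if a_num * low <= (1.0 - high) * (1.0 - low):
            return k
    return None


# Dual-criticality examples (K = 2), all periods equal to 10.
fits_plain_edf = [(10, [3]), (10, [2, 6])]   # 0.3 + 0.6 <= 1, so (4) holds
needs_virtual = [(10, [5]), (10, [2, 6])]    # (4) fails, condition-1 of (5) holds
infeasible = [(10, [6]), (10, [4, 8])]       # neither (4) nor (5) holds
print(edf_vd_condition(fits_plain_edf, 2),
      edf_vd_condition(needs_virtual, 2),
      edf_vd_condition(infeasible, 2))
```

A return value of `None` only means that this sufficient test is inconclusive for the core; it does not prove that the task subset is unschedulable.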
We note that the condition is rather pessimistic since only the maximum utilization demands of tasks are considered. Note that if (4) fails, $b(k) < 1$ ($k = 1, \ldots, K-1$) in (5). When (4) fails but (5) holds for some $k$ (condition-$k$), the virtual (relative) deadline of any task $\tau_i$ on core $P_m$ with own criticality level higher than $k$ (i.e., $\ell_i > k$) can be set as $\hat{p}_i = x \cdot p_i$, where $x \in [a(k), b(k)]$ ($x < 1$) is defined as a reduction factor for the virtual deadlines of high-criticality tasks to allow them to complete their low-criticality workloads earlier. When the system mode shifts to level-$(k+1)$ (i.e., a task exceeds its level-$k$ WCET), all level-$k$ tasks on core $P_m$ are discarded, and the relative deadlines of tasks on core $P_m$ with their own criticality levels higher than $k$ will be restored to their original ones. (For the detailed mechanism of virtual deadline adjustment and the discussions of EDF-VD for arbitrary-deadline dual-criticality systems, please see [3].)

Based on Theorem 1, we can obtain the following proposition related to the feasibility of MC tasks scheduled under the partitioned EDF-VD algorithm on multicore systems.

Proposition 1: For a set of MC tasks with $K$ criticality levels running on a multicore system with $M$ homogeneous cores, a given partition $\Gamma = \{\Psi_1, \ldots, \Psi_M\}$ is feasible if Theorem 1 holds for every core $P_m$ ($m = 1, \ldots, M$).

As a special case, for a dual-criticality system (i.e., $K = 2$), the task set is feasible under the partitioned EDF-VD scheduler if, for each core $P_m$ ($m = 1, \ldots, M$), we have either
$$U_1^m(1) + U_2^m(2) \le 1 \tag{6}$$
or
$$U_1^m(1) < 1 \quad \text{and} \quad \frac{U_2^m(1)}{1 - U_1^m(1)} \le \frac{1 - U_2^m(2)}{U_1^m(1)}. \tag{7}$$

The $\mathrm{MC}_K(N, M)$ Partition Problem: For a set of $N$ MC implicit-deadline tasks with $K$ criticality levels running on a system with $M$ homogeneous cores, find a feasible task-to-core mapping $\Gamma$, where tasks on each core are schedulable under EDF-VD.

Clearly, when $K = 1$, the $\mathrm{MC}_K(N, M)$ partition problem reduces to the classical partitioned real-time scheduling problem, which is a well-known NP-hard problem. Hence, the $\mathrm{MC}_K(N, M)$ partition problem is NP-hard as well.

III. CRITICALITY-AWARE UTILIZATION BOUND AND ITS CHARACTERISTICS UNDER PARTITIONED EDF-VD SCHEDULER

With the focus on EDF-VD for uniprocessor systems, Baruah et al. [3] identified the speedup factor to evaluate its optimality (i.e., how close the performance of the EDF-VD algorithm is to that of a clairvoyant algorithm): if an MC task set is schedulable by a clairvoyant algorithm, the tasks are also feasible under EDF-VD when they are executed with a speedup factor, which is obtained by a global nonlinear continuous optimization solver (e.g., 4/3 for a dual-criticality system). To the best of our knowledge, a result on the speedup bounds for the partitioned EDF-VD scheduler for tasks with multiple criticality levels has not been reported yet.

In contrast to a speedup factor-based approach, our approach is based on deriving a utilization bound for an efficient feasibility test of MC tasks with multiple levels, scheduled by partitioned EDF-VD. One prominent advantage of the utilization bound is that, for a given mapping scheme, as long as the total utilization of a task set does not exceed the given bound, the partitioning generated by the scheme is guaranteed to be schedulable [20]. Note that the schedulability conditions given in inequalities (4) and (5) are quite involved and most of the existing schemes are rather complicated (e.g., the hybrid scheme [23], CA-TPA proposed in this paper, and the DBF-based heuristic [4]). Among these mapping algorithms, the simplest hybrid scheme allocates tasks to cores using different heuristics (e.g., WF and FF) according to their own criticality levels. In contrast, we adopt the simple WFD as the representative mapping heuristic to derive a utilization bound and then discuss its properties related to other task/system parameters. Despite its simplicity, some insightful information can be obtained to guide our partitioned scheme to improve tasks' schedulability and balance the workload distribution. The detailed discussions can be found at the beginning of Section IV.

A. Level-1 Core Utilization Limit

Since task utilizations at all valid levels are considered, the schedulability conditions for EDF-VD expressed in inequalities (4) and (5) are quite involved. As each task has a level-1 WCET, we first transform these conditions into a simplified schedulability condition on the total level-1 utilization of tasks on any core (i.e., the level-1 core utilization limit). Then, using that limit, we derive a level-1 utilization bound for partitioned EDF-VD in the next section.

Define $\omega$ as the maximum ratio of WCETs between two consecutive criticality levels for any task, i.e., $\omega = \max_{\tau_i} \{ c_i(k+1)/c_i(k) \mid k = 1, \ldots, \ell_i - 1 \}$, where $\omega > 1$.
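We can obtain the following theorem with respect to the level-1 core utilization limit for any core.

Theorem 2: For a set of periodic MC tasks $\Psi_m$ allocated to core $P_m$, the tasks are schedulable under EDF-VD if the total level-1 utilization of the tasks, $\sum_{j=1}^{K} U_j^m(1)$, satisfies
$$\sum_{j=1}^{K} U_j^m(1) < \lambda = \frac{(K-1)\,(\omega-1)^2}{2\,(\omega^K - 1) - 2K\,(\omega - 1)}. \tag{8}$$

Proof: Consider the contrapositive. Suppose that neither inequality (4) nor inequality (5) holds for core $P_m$. We first consider the second case, where inequality (5) fails. For core $P_m$, since inequality (4) does not hold either [i.e., $\sum_{j=1}^{K} U_j^m(j) > 1$], we have $\bigl(1 - \sum_{j=k+1}^{K} U_j^m(j)\bigr) / \sum_{j=1}^{k} U_j^m(j) < 1$ ($k = 1, \ldots, K-1$), and thus the second item of inequality (5) can be transformed as $\sum_{j=k+1}^{K} U_j^m(k) / \bigl(1 - \sum_{j=1}^{k} U_j^m(j)\bigr) < 1$ ($k = 1, \ldots, K-1$). Then, when inequality (5) fails for core $P_m$, for each condition-$k$ ($k = 1, \ldots, K-1$), there is $\sum_{j=1}^{k} U_j^m(j) \ge 1$ or $\sum_{j=1}^{k} U_j^m(j) + \sum_{j=k+1}^{K} U_j^m(k) \ge 1$. Since the first inequality implies the second, we only need to consider the second case, that is
$$\sum_{j=1}^{k} U_j^m(j) + \sum_{j=k+1}^{K} U_j^m(k) \ge 1.$$

By the definition of $\omega$, there is $U_j^m(x) \le U_j^m(1) \cdot \omega^{x-1}$ ($1 \le x \le j \le K$). Hence, the above inequality is rewritten as
$$\sum_{j=1}^{k} U_j^m(1) \cdot \omega^{j-1} + \sum_{j=k+1}^{K} U_j^m(1) \cdot \omega^{k-1} \;\ge\; \sum_{j=1}^{k} U_j^m(j) + \sum_{j=k+1}^{K} U_j^m(k) \;\ge\; 1.$$

Note that $U_x^m(1) \le \sum_{j=1}^{K} U_j^m(1)$ ($x = 1, \ldots, K$). Then, the above inequality for condition-$k$ is further rewritten as
$$\Bigl(\sum_{j=1}^{K} U_j^m(1)\Bigr) \cdot \bigl(1 + \omega + \cdots + \omega^{k-1} + \omega^{k-1} (K-k)\bigr) \ge 1 \tag{9}$$
$$\Rightarrow \Bigl(\frac{\omega^{k} - 1}{\omega - 1} + \omega^{k-1} (K-k)\Bigr) \cdot \sum_{j=1}^{K} U_j^m(1) \ge 1. \tag{10}$$

To compute $S = \sum_{k=1}^{K-1} \omega^{k-1} \cdot k$, we have
$$\omega S = \omega + 2\omega^2 + \cdots + (K-1)\,\omega^{K-1}$$
$$\omega S - S = (K-1)\,\omega^{K-1} - \bigl(\omega^{K-2} + \omega^{K-3} + \cdots + \omega^{0}\bigr)$$
$$S = \frac{(K-1)\,\omega^{K-1}}{\omega - 1} - \frac{\omega^{K-1} - 1}{(\omega - 1)^2}.$$

Adding up the above $(K-1)$ inequalities for condition-$k$ as given in (10), and using $\sum_{k=1}^{K-1} \omega^{k-1}(K-k) = K(\omega^{K-1} - 1)/(\omega - 1) - S$, we can further obtain
$$\Bigl(\frac{\omega^{K} - \omega}{(\omega-1)^2} - \frac{K-1}{\omega-1} + \frac{\omega^{K-1} - K}{\omega-1} + \frac{\omega^{K-1} - 1}{(\omega-1)^2}\Bigr) \cdot \sum_{j=1}^{K} U_j^m(1) \ge K-1$$
$$\Rightarrow \sum_{j=1}^{K} U_j^m(1) \ge \frac{(K-1)\,(\omega-1)^2}{2\,(\omega^K - 1) - 2K\,(\omega - 1)} = U_{b1} = \lambda.$$

We next consider the other case, where inequality (4) fails [i.e., $\sum_{j=1}^{K} U_j^m(j) > 1$]. Following similar steps, we get:
$$U_1^m(1) + U_2^m(2) + \cdots + U_K^m(K) > 1 \;\Rightarrow\; \bigl[\,U_1^m(1) + \omega \cdot U_2^m(1) + \cdots + \omega^{K-1} \cdot U_K^m(1)\,\bigr] > 1.$$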

As $U_x^m(1) \le \sum_{j=1}^{K} U_j^m(1)$ ($x = 1, \ldots, K$), we have
$$\Bigl(\sum_{j=1}^{K} U_j^m(1)\Bigr) \cdot \bigl(1 + \omega + \cdots + \omega^{K-1}\bigr) > 1 \;\Rightarrow\; \sum_{j=1}^{K} U_j^m(1) > \frac{1}{\sum_{j=1}^{K} \omega^{j-1}} = U_{b2}. \tag{11}$$

Define $f(k)$ ($k = 1, \ldots, K$) in (9) as
$$f(k) = \sum_{j=1}^{k} \omega^{j-1} + \omega^{k-1} \cdot (K-k). \tag{12}$$

Recall that $\omega > 1$. For each $k$ ($k = 1, \ldots, K-1$), we can get
$$f(k+1) - f(k) = (K-k) \cdot (\omega^{k} - \omega^{k-1}) > 0 \;\Rightarrow\; f(k+1) > f(k).$$

Note that, by the definition of $f(k)$, there are $U_{b2} \cdot f(K) = 1$ based on (11) and $U_{b1} \cdot (f(1) + \cdots + f(K-1)) = K-1$ based on (9). Thus, we can obtain
$$U_{b1} \cdot f(K) \cdot (K-1) > K-1 \;\Rightarrow\; U_{b1} \cdot f(K) > 1 \;\Rightarrow\; U_{b1} > U_{b2}.$$

Therefore, if neither inequality (4) nor inequality (5) holds, we can get $\sum_{j=1}^{K} U_j^m(1) \ge U_{b1} = \lambda$. Taking its contrapositive, when $\sum_{j=1}^{K} U_j^m(1) < \lambda$, either (4) or (5) holds for core $P_m$. Hence, the task set $\Psi_m$ is schedulable under EDF-VD by Theorem 1.

Based on the level-1 core utilization limit $\lambda$ under EDF-VD in (8), we can obtain the monotonic relationship between $\lambda$ and other task parameters (i.e., $\omega$ and $K$) as follows.

Property 1: When $K$ is fixed, the level-1 core utilization limit $\lambda$ for each core scheduled by EDF-VD decreases when $\omega$ increases.

Proof: We can see that every $f(k)$ ($k = 1, \ldots, K-1$) as defined in (12) increases when $\omega$ increases and $K$ is fixed. As $\lambda = (K-1) / \sum_{k=1}^{K-1} f(k)$, $\lambda$ decreases when $\omega$ increases.

Property 2: The level-1 core utilization limit $\lambda$ under the EDF-VD scheduler for each core decreases when the system criticality level $K$ increases and $\omega$ is fixed.

For the proof of Property 2, please see the supplementary material.

B. Criticality-Aware Utilization Bound

Based on the level-1 core utilization limit for any core as given in (8), the minimum feasible number of tasks on any core can be found as $\beta = \lfloor (\lambda - \epsilon)/\rho \rfloor$, where $\rho = \max\{u_i(1) \mid i = 1, \ldots, N\}$ and $\epsilon$ is an arbitrarily small positive number. We can directly obtain the schedulability condition related to the number of MC tasks ($N$) running on the target system as follows.

Property 3: For a set of $N$ periodic MC tasks with $K$ criticality levels running on a multicore system with $M$ cores under the partitioned EDF-VD scheduler, the task set is guaranteed to be feasible as long as $N$ does not exceed $\beta \cdot M$.
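As a numeric illustration of the level-1 core utilization limit, the sketch below evaluates $\lambda$ in two ways: directly as $(K-1)/\sum_{k=1}^{K-1} f(k)$ with $f(k)$ from (12), and through the closed form obtained by summing the geometric series (valid for $K \ge 2$ and $\omega > 1$). It also computes $\beta$. The function names and the concrete value of `eps` are assumptions made for illustration only.

```python
import math


def f(k: int, K: int, omega: float) -> float:
    """f(k) from (12): sum_{j=1..k} omega^(j-1) + omega^(k-1) * (K - k)."""
    return sum(omega ** (j - 1) for j in range(1, k + 1)) + omega ** (k - 1) * (K - k)


def core_limit(K: int, omega: float) -> float:
    """lambda = (K - 1) / sum_{k=1..K-1} f(k); assumes K >= 2."""
    return (K - 1) / sum(f(k, K, omega) for k in range(1, K))


def core_limit_closed(K: int, omega: float) -> float:
    """Closed form of lambda; assumes K >= 2 and omega > 1."""
    return (K - 1) * (omega - 1) ** 2 / (2 * (omega ** K - 1) - 2 * K * (omega - 1))


def min_tasks_per_core(lam: float, rho: float, eps: float = 1e-9) -> int:
    """beta = floor((lambda - eps) / rho), with eps a small positive number."""
    return math.floor((lam - eps) / rho)


lam = core_limit(3, 1.5)            # f(1) = 3, f(2) = 4, so lambda = 2/7
beta = min_tasks_per_core(lam, 0.1)
print(lam, core_limit_closed(3, 1.5), beta)
```

With $K = 3$, $\omega = 1.5$, and $\rho = 0.1$, the two evaluations agree at $\lambda = 2/7$ and $\beta = 2$. Raising $\omega$ or $K$ lowers `core_limit`, in line with Properties 1 and 2.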
When the number of tasks $N$ exceeds $\beta \cdot M$, based on the level-1 core utilization limit for any core presented in (8), the following theorem on the level-1 utilization bound can be used as an efficient feasibility test for a set of MC tasks that execute on a multicore system scheduled under partitioned EDF-VD with the WFD mapping.

Theorem 3: For $N$ periodic MC tasks with $K$ criticality levels running on a multicore system with $M$ cores, the level-1 utilization bound $U_{ca,bound}$ under partitioned EDF-VD scheduling with the WFD heuristic is
$$U_{ca,bound} = \frac{(\beta \cdot M + 1) \cdot \lambda}{\beta + 1}. \tag{13}$$

Proof: We assume that we use WFD and hence, the tasks have been sorted by their nonascending level-1 utilizations: for any two tasks $\tau_i$ and $\tau_j$ ($i < j \le N$), $u_i(1) \ge u_j(1)$. Suppose that the task $\tau_n$ is the first task for which the sufficient feasibility condition given in (8) fails when it is mapped to any core. Then, for each core $P_m$ ($m = 1, \ldots, M$), we get
$$\frac{c_n(1)}{p_n} + \sum_{\tau_j \in \Psi_m} \frac{c_j(1)}{p_j} \ge \lambda \;\Rightarrow\; u_n(1) + \sum_{\tau_j \in \Psi_m} u_j(1) \ge \lambda$$
where $\Psi_m$ contains the subset of tasks on core $P_m$ after allocating the first $(n-1)$ tasks and $|\Psi_m|$ corresponds to the number of tasks in the subset $\Psi_m$. Adding these $M$ inequalities together, we have
$$(M-1) \cdot u_n(1) + \sum_{j=1}^{n} u_j(1) \ge M \cdot \lambda.$$

By the assumption that tasks have been ordered by their nonincreasing level-1 utilizations, $u_n(1) \le \sum_{j=1}^{n} u_j(1)/n$. Thus, the above inequality can be further rewritten as
$$\Bigl(\frac{M-1}{n} + 1\Bigr) \cdot \sum_{j=1}^{n} u_j(1) \ge M \cdot \lambda.$$

Hence, we can get $\sum_{j=1}^{n} u_j(1) \ge M \cdot n \cdot \lambda/(M + n - 1)$. Based on (3), considering that the total level-1 utilization of tasks $U(1) = \sum_{j=1}^{K} U_j(1) \ge \sum_{j=1}^{n} u_j(1)$, we can further have
$$U(1) \ge \frac{M \cdot n \cdot \lambda}{M + n - 1} = g(n) \tag{14}$$
where $g(n)$ is a function of $n$. Since neither $\lambda$ nor $M$ is related to $n$, the first derivative of $g(n)$ with respect to $n$ is
$$g'(n) = \frac{M \cdot (M-1) \cdot \lambda}{(M + n - 1)^2} > 0.$$

Therefore, the minimum value of $g(n)$ is attained when $n = \beta \cdot M + 1$, i.e., $U(1) \ge g(\beta \cdot M + 1) = (\beta \cdot M + 1) \cdot \lambda/(\beta + 1) = U_{ca,bound}$. Hence, taking its contrapositive, when the total level-1 utilization of tasks $U(1)$ is less than $U_{ca,bound}$, WFD is guaranteed to generate a partition satisfying (8) for every core and thus Proposition 1 holds, which concludes the proof.
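The feasibility test of Theorem 3 can be illustrated with a short sketch that evaluates the bound (13) and runs a WFD mapping on level-1 utilizations against the per-core limit (8). This is an illustrative assumption-laden sketch rather than the authors' code: the function names, the `eps` default, and the sample workloads are hypothetical, and $\lambda$ is passed in as a precomputed number.

```python
import math
from typing import List, Optional


def ca_bound(M: int, lam: float, rho: float, eps: float = 1e-9) -> float:
    """U_ca,bound = (beta * M + 1) * lambda / (beta + 1), beta = floor((lambda - eps) / rho)."""
    beta = math.floor((lam - eps) / rho)
    return (beta * M + 1) * lam / (beta + 1)


def wfd_level1(utils: List[float], M: int, lam: float) -> Optional[List[float]]:
    """Worst-fit decreasing on level-1 utilizations.

    Returns the per-core level-1 loads, or None as soon as a task cannot be
    placed on any core while keeping the load strictly below lambda, as in (8).
    """
    loads = [0.0] * M
    for u in sorted(utils, reverse=True):
        m = min(range(M), key=loads.__getitem__)  # least-loaded core
        if loads[m] + u >= lam:
            return None  # the least-loaded core fails, so every core fails
        loads[m] += u
    return loads


lam = 2.0 / 7.0                      # lambda for K = 3 and omega = 1.5
bound = ca_bound(8, lam, 0.1)        # beta = 2, bound = 17 * lambda / 3
light = [0.1] * 16                   # total 1.6, below the bound
heavy = [0.1] * 24                   # total 2.4, above the bound
print(bound, wfd_level1(light, 8, lam), wfd_level1(heavy, 8, lam))
```

In this example the light workload is mapped with two tasks per core, while the heavy one fails at the 17th task, which matches the role of $n = \beta \cdot M + 1$ in the proof.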
When the task system has no criticality certification requirement, which is denoted as a non-MC task system (i.e., a traditional periodic real-time task system), we have $K = 1$, $\omega = 1$, and thus $\lambda = 1$ based on (9) and the proof of Theorem 2. For the function $g(n)$ defined in (14) with $\lambda = 1$, its first derivative is $g'(n) = M \cdot (M-1)/(M + n - 1)^2 > 0$. Therefore, the minimum value of $g(n)$ is $(\beta \cdot M + 1)/(\beta + 1)$ when $n = \beta \cdot M + 1$, which reduces to the utilization bound for non-MC systems under partitioned EDF with WFD [20].

Fig. 1. Criticality-aware utilization bound under partitioned EDF-VD scheduling with the WFD heuristic. (a) $K = 3$ and $\rho = 0.1$. (b) $\omega = 1.5$ and $\rho = 0.1$. (c) $M = 8$ and $\rho = 0.1$. (d) $\omega = 1.5$ and $K = 3$.

Properties of the Bound: In what follows, we discuss the characteristics of the bound $U_{ca,bound}$ associated with other system/task parameters (i.e., $M$, $\omega$, $K$, and $\rho$), which reveal the monotonicity of $U_{ca,bound}$ with respect to $M$, $\omega$, and $K$.

First, when the other parameters are fixed, the monotonic relationship between $U_{ca,bound}$ and the number of cores $M$ can be identified as follows.

Property 4: For $N$ periodic MC tasks with $K$ criticality levels running on a multicore system, the utilization bound $U_{ca,bound}$ increases when the number of cores increases.

Proof: Let $h(M) = U_{ca,bound} = (\beta \cdot M + 1) \cdot \lambda/(\beta + 1)$. As neither $\beta$ nor $\lambda$ is related to $M$, the first derivative of $h(M)$ is $h'(M) = \beta \cdot \lambda/(\beta + 1) > 0$, which concludes the proof.

Next, for the parameters $\omega$ and $K$ in $\lambda$, when other parameters are fixed, the monotonicity of the bound $U_{ca,bound}$ with respect to $\omega$ and $K$ can be obtained as follows.

Property 5: For a set of MC tasks with $K$ criticality levels running on a multicore system with $M$ cores, the utilization bound $U_{ca,bound}$ decreases when $\omega$ and/or $K$ increases.

Proof: Define $g(\omega, K) = \lambda$. Recall that $\beta = \lfloor (\lambda - \epsilon)/\rho \rfloor$ and let $h(\omega, K) = \beta = \lfloor (g(\omega, K) - \epsilon)/\rho \rfloor$. Then, we have $U_{ca,bound} = (\beta \cdot M + 1) \cdot \lambda/(\beta + 1) = (h(\omega, K) \cdot M + 1) \cdot g(\omega, K)/(h(\omega, K) + 1)$.

We prove the property by contradiction. First, we assume that the claim is false. That is, there must exist two $\omega_1$ and $\omega_2$ ($\omega_1 > \omega_2$) such that $(h(\omega_1, K) \cdot M + 1) \cdot g(\omega_1, K)/(h(\omega_1, K) + 1) \ge (h(\omega_2, K) \cdot M + 1) \cdot g(\omega_2, K)/(h(\omega_2, K) + 1)$. Based on Property 1, we have $g(\omega_1, K) < g(\omega_2, K)$ as $\omega_1 > \omega_2$. Note that $\rho$ is not related to $M$, $\omega$, $K$, and thus $\lambda$.
Then, since the floor function of $h(\omega, K)$ (i.e., $\beta$) is considered, we have $h(\omega_1, K) \le h(\omega_2, K)$ and
$$\frac{h(\omega_1, K) \cdot M + 1}{h(\omega_1, K) + 1} \cdot g(\omega_1, K) \ge \frac{h(\omega_2, K) \cdot M + 1}{h(\omega_2, K) + 1} \cdot g(\omega_2, K)$$
$$\Rightarrow \frac{h(\omega_1, K) \cdot M + 1}{h(\omega_1, K) + 1} > \frac{h(\omega_2, K) \cdot M + 1}{h(\omega_2, K) + 1}$$
$$\Rightarrow h(\omega_2, K) - h(\omega_1, K) > \bigl(h(\omega_2, K) - h(\omega_1, K)\bigr) \cdot M$$
$$\Rightarrow 0 > 0 \ \text{or} \ M < 1$$
which leads to a contradiction. Following similar steps, based on Property 2, we can also get contradictory results by assuming that $U_{ca,bound}$ does not decrease as $K$ increases.

Finally, when other task/system parameters are fixed, the relationship between $U_{ca,bound}$ and $\rho$ is given as follows.

Property 6: For a set of MC tasks with $K$ criticality levels scheduled on a multicore system with $M$ cores, the utilization bound $U_{ca,bound}$ cannot increase as $\rho$ increases.

Proof: The proof is obtained by contradiction. Again, $\rho$ is not related to $M$ and $\lambda$. Define $h(\rho) = \beta = \lfloor (\lambda - \epsilon)/\rho \rfloor$, so that $U_{ca,bound} = (h(\rho) \cdot M + 1) \cdot \lambda/(h(\rho) + 1)$. Suppose that the claim is false. Then, there must exist two $\rho_1$ and $\rho_2$ ($\rho_1 > \rho_2$) such that $(h(\rho_1) \cdot M + 1) \cdot \lambda/(h(\rho_1) + 1) > (h(\rho_2) \cdot M + 1) \cdot \lambda/(h(\rho_2) + 1)$. As $h(\rho)$ (i.e., $\beta$) is a floor function, $h(\rho_1) \le h(\rho_2)$. Therefore, we have
$$\frac{(h(\rho_1) \cdot M + 1) \cdot \lambda}{h(\rho_1) + 1} > \frac{(h(\rho_2) \cdot M + 1) \cdot \lambda}{h(\rho_2) + 1}$$
$$\Rightarrow h(\rho_2) - h(\rho_1) > \bigl(h(\rho_2) - h(\rho_1)\bigr) \cdot M$$
$$\Rightarrow 0 > 0 \ \text{or} \ M < 1$$
which results in a contradiction and concludes the proof.

The relationship between the criticality-aware system utilization bound $U_{ca,bound}$ and other task/system parameters (i.e., $M$, $\omega$, $K$, and $\rho$) is illustrated more explicitly in Fig. 1, where the default parameter values are: $M = 8$, $\omega = 1.5$, $K = 3$, and $\rho = 0.1$. From the figures, we can see that when there are more available cores $M$ and smaller $\omega$, based on Properties 4 and 5, the bound $U_{ca,bound}$ gradually increases when other parameters (i.e., $K$ and $\rho$) are fixed [Fig. 1(a)]. For given $M$, $\omega$, and $\rho$, based on Property 5, $U_{ca,bound}$ can drop dramatically when $K$ increases [Fig. 1(b)], because higher task utilizations at high levels can lead to much smaller $\lambda$ and $U_{ca,bound}$. For task systems with a high criticality level (e.g., $K \ge 4$), the bound can be extremely low, which rather limits its applicability.
Similarly, based on Property 5, for given $M$ and $\rho$, $U_{ca,bound}$ also decreases as $\omega$ and/or $K$ increase, as shown in Fig. 1(c). Moreover, when other parameters are fixed, $U_{ca,bound}$ exhibits a stepwise decrease as $\rho$ increases [Fig. 1(d)], which is consistent with Property 6.

IV. CRITICALITY-AWARE TASK PARTITIONING

While $U_{ca,bound}$ provides an efficient feasibility test, it only considers the worst-case scenarios: for instance, a pessimistic $\omega$ value is used when transforming the utilizations of tasks at high criticality levels to level-1 utilizations, and the WFD heuristic only takes tasks' level-1 utilizations into account.

Therefore, due to the characteristics of $U_{ca,bound}$ with respect to other task/system parameters as given in Properties 4-6, the applicability of the bound is rather limited, especially when $\omega$ and/or $K$ become large. In addition, even if the total level-1 utilization of tasks does not exceed the bound, the workload of partitions generated by WFD may be imbalanced, giving high run-time overhead as illustrated in Section V-C. More importantly, although the utilization bound (e.g., 3/4 for cumulative high-criticality utilization and low-criticality utilization on a processor [6]) can be directly employed for task mapping, such a bound-based mapping algorithm can usually result in rather degraded schedulability even for dual-criticality systems, as validated in [6].

Nonetheless, the derived bound provides important insights into the partitioned scheme design for multicore MC systems with multiple criticality levels. The utilizations of tasks at different levels, instead of only those at a certain level [such as the level-1 core utilization limit $\lambda$ in the bound $U_{ca,bound}$ and the maximum utilization demands of tasks in (4)], can also affect the feasibility of tasks based on the schedulability conditions given in (4) and (5). Hence, the tasks' utilizations at different levels should be incorporated into the mapping heuristics for better schedulability. More importantly, as the various combinations of $\omega$ and $K$ can generate divergent utilizations of tasks at different levels, the discrepancy among these utilizations can cause rather imbalanced workload among cores. Thus, more effective policies are needed for criticality-aware workload balancing, which effectively improves its applicability as validated in Section V-C.

Hence, with the objective of reducing the pessimism in the bound, the tasks' utilization variations at different levels need to be considered to enhance schedulability performance and balance system workload. Recall that the original $\mathrm{MC}_K(N, M)$ partition problem is NP-hard.
In what follows, for tasks with multiple levels running on multicores, we focus on an efficient partitioned scheme with better practical viability (i.e., high schedulability ratio and low run-time overhead).

In general, there are two fundamental phases when mapping tasks to cores: 1) determine the order (i.e., priority) of tasks to be allocated and 2) find an appropriate core for each task. Instead of only using the maximum utilizations of tasks and the simple (but pessimistic) schedulability condition (4), we focus on the improved schedulability condition (5). Because of the large variations of tasks' utilizations at different levels, it is crucial to take such variations into account when designing the two essential steps of the partitioned scheme.

Based on the above discussions, we propose CA-TPA. We define the utilization contribution of a task at a given level, which is then used to guide the allocation of tasks to cores. CA-TPA adopts a probe-based approach to ensure that, when allocating a task to cores, the overall system utilization has the smallest increment. Moreover, a workload imbalance factor is introduced to balance workload. Therefore, this paper differs from the existing studies that usually rely on only the maximum utilizations of tasks at their own criticality levels.

A. Task Ordering and Utilization Contribution

To incorporate the utilizations of tasks at different criticality levels in the first step, we present the concept of utilization contribution of tasks. Specifically, a task $\tau_i$'s utilization contribution at level $k$ ($k \le \ell_i$) is defined as
$$C_i(k) = \frac{u_i(k)}{U(k)}, \quad k = 1, \ldots, \ell_i \tag{15}$$
where $U(k)$ defined in (3) is the total level-$k$ utilization of tasks at own criticality level $k$ or higher. The utilization contribution of task $\tau_i$ to the system (by considering all its valid levels) can be further defined as
$$C_i = \max\{C_i(k) \mid k = 1, \ldots, \ell_i\}. \tag{16}$$

From the above definitions, we can see that the utilization contribution of a task essentially represents its largest weight in system utilizations over all its valid levels.
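The definitions in (15) and (16) can be made concrete with a small sketch. The `(period, WCET list)` task encoding, the function names, and the three-task example are hypothetical and only serve to illustrate the arithmetic.

```python
from typing import List, Sequence, Tuple

# Hypothetical task encoding: (period p_i, [c_i(1), ..., c_i(l_i)]).
Task = Tuple[float, Sequence[float]]


def total_level_util(tasks: List[Task], k: int) -> float:
    """U(k) from (3): level-k utilization summed over tasks with own level >= k."""
    return sum(w[k - 1] / p for p, w in tasks if len(w) >= k)


def contribution(task: Task, tasks: List[Task]) -> float:
    """C_i from (16): the maximum over valid levels k of u_i(k) / U(k)."""
    p, w = task
    return max((w[k - 1] / p) / total_level_util(tasks, k)
               for k in range(1, len(w) + 1))


# Level-1 utilizations are 0.2, 0.1, 0.1 (U(1) = 0.4);
# level-2 utilizations are 0.4 and 0.3 (U(2) = 0.7).
tasks = [(10, [2]), (10, [1, 4]), (20, [2, 6])]
for t in tasks:
    print(t, round(contribution(t, tasks), 4))
```

Here the first task contributes $0.2/0.4 = 0.5$, the second $\max(0.25,\ 0.4/0.7) \approx 0.571$, and the third $\max(0.25,\ 0.3/0.7) \approx 0.429$, so a task with a small level-1 utilization can still carry the largest contribution because of its weight at a higher level.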
Therefore, as opposed to conventional partitioning heuristics based solely on the utilizations of tasks (such as FFD and WFD), we determine the ordering priorities of MC tasks using their utilization contributions in the first step before allocating them to cores. For this purpose, we define two relational operators $\succ$ and $\prec$ for task prioritization with the following rules.

1) If task $\tau_i$ has larger utilization contribution than task $\tau_j$, we say that $\tau_i$ has higher ordering priority than $\tau_j$ (which is denoted as $\tau_i \succ \tau_j$). Otherwise, if task $\tau_i$'s utilization contribution is smaller than that of $\tau_j$, $\tau_i \prec \tau_j$.
2) When the two tasks have the same utilization contribution, the tie is broken in favor of the task with higher criticality level. That is, if $C_i = C_j$ and $l_i > l_j$, we have $\tau_i \succ \tau_j$.
3) If there is still a tie, the task with smaller index is assigned higher ordering priority: $\tau_i \succ \tau_j$ if $i < j$, $C_i = C_j$, and $l_i = l_j$.

B. Utilization Increment

To improve the schedulability of MC tasks while balancing workload among cores, the key point in our core selection heuristic is to take the utilization variations of tasks at different levels into consideration. Specifically, from (5), we can see that the utilizations of tasks at all valid levels can affect their schedulability on a certain core. Moreover, as each core has a distinct subset of MC tasks, the utilizations of cores at each level can vary significantly [see (2)]. Therefore, the allocation of a task to different cores can lead to very divergent feasibility results and large variations in resulting core utilizations, which is quite different from the conventional partitioned scheduling of non-MC tasks.

Based on Theorem 1, once tasks are allocated to cores, we can check (4) and the (K − 1) conditions given in (5). Note that only one condition is required to hold on each core to satisfy the feasibility condition for the partitioned EDF-VD scheduler. In general, a partitioning scheme requires that the core utilization does not exceed 1. However, we can see that if (4) holds, b(k) for any condition-k in (5).
To incorporate and exploit the core utilization to guide task mapping under partitioned EDF-VD, condition-k in (5) can be rewritten as

$$\sum_{j=1}^{k} U_j^m(j) < 1 \quad \text{and} \quad \mu_m(k) \le \theta_m(k)$$

$$\mu_m(k) = \left(\sum_{j=1}^{k} U_j^m(j)\right) \cdot \left(\sum_{j=k+1}^{K} U_j^m(k)\right), \qquad \theta_m(k) = \left(1 - \sum_{j=1}^{k} U_j^m(j)\right) \cdot \left(1 - \sum_{j=k+1}^{K} U_j^m(j)\right) \tag{17}$$

where $\theta_m(k) \le 1$ [$\theta_m(k) = 1$ only if $P_m$ is empty]. Intuitively, (5) [i.e., (17)] represents an improved schedulability condition compared to (4). Next, we identify the relationship between (4) and (17) regarding the schedulability. For ease of discussion, let $x(k) = \sum_{j=1}^{k} U_j^m(j)$, $y(k) = \sum_{j=k+1}^{K} U_j^m(j)$, and $z(k) = \sum_{j=k+1}^{K} U_j^m(k)$ for condition-k on core $P_m$. As $y(k)$ and $z(k)$ may be 0, we have $y(k) \ge z(k)$ ($k = 1, \ldots, K-1$).

Property 7: For a set of MC tasks running on a multicore system scheduled under partitioned EDF-VD, when (4) holds for core $P_m$, then $\mu_m(k) \le \theta_m(k)$ ($k = 1, \ldots, K-1$), that is, the second condition in (17) holds for every condition-k.

Proof: As (4) holds for core $P_m$, we have $x(k) + y(k) \le 1$ ($k = 1, \ldots, K-1$). As $y(k) \ge z(k)$, for each k

$$\theta_m(k) - \mu_m(k) = [1 - x(k)] \cdot [1 - y(k)] - x(k) \cdot z(k) = 1 - [x(k) + y(k)] + x(k) \cdot [y(k) - z(k)] \ge 0$$

which concludes the proof.

Property 8: For MC tasks running on multicore systems under partitioned EDF-VD, if (17) fails for any condition-k on core $P_m$, (4) holds only when each task on $P_m$ has its own criticality level 1 and $U_1^m(1) = 1$.

Proof: As (17) fails for any condition-k ($k = 1, \ldots, K-1$) on core $P_m$, for each k, we have $x(k) \ge 1$ or $\theta_m(k) - \mu_m(k) < 0$. We first consider the second case

$$\theta_m(k) - \mu_m(k) < 0 \;\Rightarrow\; [1 - x(k)] \cdot [1 - y(k)] - x(k) \cdot z(k) < 0 \;\Rightarrow\; \sum_{j=1}^{K} U_j^m(j) = x(k) + y(k) > 1 + x(k) \cdot [y(k) - z(k)].$$

Since $y(k) \ge z(k)$, this means that (4) also fails. Then, (17) fails but (4) holds for core $P_m$ only if $x(k) \ge 1$ ($k = 1, \ldots, K-1$). For each k, as (4) holds, we have $\sum_{j=1}^{K} U_j^m(j) = x(k) + y(k) \le 1$. Then, we can get $x(k) = 1$ and $y(k) = 0$ for each k. Moreover, when $x(k) = 1$ and $y(k) = 0$ hold, $x(k+1) = 1$ and $y(k+1) = 0$ ($k = 1, \ldots, K-2$). Hence, (17) fails but (4) holds only if $x(1) = 1$ and $y(1) = 0$, i.e., $U_1^m(1) = 1$ and $U_j^m(j) = 0$ ($j = 2, \ldots, K$), which concludes the proof.

Based on Properties 7 and 8, we can see that except in very rare cases, (17) represents a less pessimistic sufficient schedulability condition when compared to (4).
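To make the relationship between (4) and the rewritten condition-k concrete, the following Python sketch evaluates both tests on one core. It assumes the per-core utilizations are stored in a table `U[j][k]` (the total level-k utilization of the core's tasks whose own level is j, 1-based with index 0 unused), and it reads the rewritten condition as x(k) < 1 together with μ = x(k)·z(k) ≤ θ = [1 − x(k)]·[1 − y(k)], which is the form used in the proof of Property 7. The function names and the numeric values are hypothetical.

```python
def xyz(U, k, K):
    """Return x(k), y(k), z(k) for condition-k (1 <= k <= K-1) on one core."""
    x = sum(U[j][j] for j in range(1, k + 1))        # own-level utilization, levels 1..k
    y = sum(U[j][j] for j in range(k + 1, K + 1))    # own-level utilization, levels k+1..K
    z = sum(U[j][k] for j in range(k + 1, K + 1))    # level-k utilization of levels k+1..K
    return x, y, z


def condition_4(U, K):
    """Simple test: the sum of own-level utilizations does not exceed 1."""
    return sum(U[j][j] for j in range(1, K + 1)) <= 1


def condition_k(U, k, K):
    """Rewritten condition-k: x(k) < 1 and mu_m(k) <= theta_m(k)."""
    x, y, z = xyz(U, k, K)
    theta = (1 - x) * (1 - y)
    mu = x * z
    return x < 1 and mu <= theta


# Dual-criticality core (K = 2): U_1(1) = 0.4, U_2(1) = 0.2, U_2(2) = 0.7.
K = 2
U = [[0.0, 0.0, 0.0],
     [0.0, 0.4, 0.0],
     [0.0, 0.2, 0.7]]

fails_simple_test = not condition_4(U, K)   # 0.4 + 0.7 > 1
passes_condition_1 = condition_k(U, 1, K)   # 0.4 * 0.2 <= 0.6 * 0.3
```

Here the own-level utilizations sum to 1.1, so the simple test rejects the core, while condition-1 still accepts it because μ = 0.08 does not exceed θ = 0.18. This is one instance of the less pessimistic behavior stated after Properties 7 and 8.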
Therefore, with the objective of incorporating task utilizations at different levels to effectively guide the mapping of tasks to cores, (4) can be ignored and we only need to focus on the sufficient condition (17) when evaluating a core's available utilization. Then, the available utilization for condition-k on core $P_m$ is defined as

$$A_m(k) = \min\left\{\theta_m(k) - \mu_m(k),\; 1 - \sum_{j=1}^{k} U_j^m(j)\right\}. \tag{18}$$

The following properties are further provided to obtain an effective available utilization for any condition-k on core $P_m$.

Property 9: For a set of MC tasks with K criticality levels running on a multicore system under the partitioned EDF-VD scheduler, when (4) fails but (17) holds for some k (< K) on core $P_m$, we have $\theta_m(k) - \mu_m(k) < 1 - \sum_{j=1}^{k} U_j^m(j)$.

Proof: Since (4) fails for core $P_m$, we have $x(k) + y(k) > 1$ ($k = 1, \ldots, K-1$). As (17) holds for some k on core $P_m$, we get $x(k) < 1$ and

$$\theta_m(k) - \mu_m(k) \ge 0 \;\Rightarrow\; [1 - x(k)] \cdot [1 - y(k)] - x(k) \cdot z(k) \ge 0 \;\Rightarrow\; 1 - [x(k) + y(k)] + x(k) \cdot [y(k) - z(k)] \ge 0$$
$$\Rightarrow\; x(k) \cdot [y(k) - z(k)] > 0 \;\Rightarrow\; 0 < x(k) < 1 \ \text{and} \ y(k) > z(k) > 0.$$

Therefore, we have

$$\theta_m(k) - \mu_m(k) - \left(1 - \sum_{j=1}^{k} U_j^m(j)\right) = [1 - x(k)] \cdot [1 - y(k)] - x(k) \cdot z(k) - [1 - x(k)] = [x(k) - 1] \cdot y(k) - x(k) \cdot z(k) < 0$$

which concludes the proof.

Again, $y(k)$ and $z(k)$ may be 0 ($k = 1, \ldots, K-1$). Following steps similar to those in the proof of Property 9, we have the following property.

Property 10: For a set of MC tasks with K criticality levels running on a multicore system scheduled under partitioned EDF-VD, if (4) holds and (17) holds for some k (< K) on core $P_m$, we have $\theta_m(k) - \mu_m(k) \le 1 - \sum_{j=1}^{k} U_j^m(j)$.

Basically, Properties 9 and 10 indicate that no matter whether (4) holds or not, we have $\theta_m(k) - \mu_m(k) \le 1 - \sum_{j=1}^{k} U_j^m(j)$ if condition-k holds. Hence, the available utilization for condition-k as defined in (18) can be safely determined as

$$A_m(k) = \theta_m(k) - \mu_m(k). \tag{19}$$

Following the schedulability conditions presented in (4) and (17), the core utilization $U^m$ on core $P_m$ is defined as

$$U^m = \begin{cases} \infty, & \sum_{j=1}^{K} U_j^m(j) > 1 \ \text{and} \ \forall k: A_m(k) < 0 \ \text{or} \ \sum_{j=1}^{k} U_j^m(j) \ge 1, \quad k = 1, \ldots, K-1 & \text{(20a)} \\ \max\{1 - A_m(k) \mid \forall k: A_m(k) \ge 0\}, & \text{else.} & \text{(20b)} \end{cases}$$

Essentially, (20) implies that, if the schedulability condition given in Theorem 1 fails for core $P_m$, then $U^m = \infty$; otherwise, $U^m$ indicates the maximum exploited utilization among all valid condition-k [i.e., $A_m(k) \ge 0$] on core $P_m$. Therefore, based on Property 7 and the definition of $A_m(k)$, when (4) or (17) is satisfied for core $P_m$, there must exist some k such that $A_m(k) \ge 0$ holds. Then, the system utilization $U_{sys}$ and average core utilization $U_{avg}$ are defined, respectively, as follows:

$$U_{sys} = \max\{U^m \mid m = 1, \ldots, M\} \tag{21}$$

$$U_{avg} = \frac{\sum_{m=1}^{M} U^m}{M}. \tag{22}$$

To further quantify the impact of partitioning a task $\tau_i$ to core $P_m$, the increment of core utilization on $P_m$ is defined as

$$\Delta_m\{\tau_i\} = U^{m \cup \{\tau_i\}} - U^m. \tag{23}$$

The new core utilization $U^{m \cup \{\tau_i\}}$ of core $P_m$, by assuming that task $\tau_i$ is assigned to the core, can be computed based on (20). Note that, in case $U^{m \cup \{\tau_i\}} = \infty$ [see (20a)], $\tau_i$ cannot be feasibly allocated to core $P_m$ subject to the schedulability conditions (4) and (17).

Properties in Dual-Criticality Systems: Next, focusing on dual-criticality systems (i.e., K = 2), we derive some properties related to the core utilization variations when allocating a task to cores. Despite the simplified assumptions, the results of these properties can shed light on the design of mapping heuristics and the evaluation of their effectiveness, as discussed below.

Below, task $\tau_i$ is assumed to be allocated to core $P_m$. For ease of presentation, let $x = U_1^m(1)$, $y = U_2^m(2)$, and for any task $\tau_j$ ($l_j = 2$), we have $c_j(1) = \iota \cdot c_j(2)$ where $\iota$ (< 1) is a constant. Thus, $U_2^m(1) = \iota \cdot U_2^m(2) = \iota \cdot y$. Based on these assumptions, we can obtain the following properties related to the utilization increment after assigning $\tau_i$ to $P_m$.

Property 11: Suppose that (17) holds after $\tau_i$ is mapped to $P_m$. If $\tau_i$'s own criticality level equals 1 and we denote by z the quantity $u_i(1)$, then the core $P_m$'s utilization increment is no greater than z.

Proof: For dual-criticality systems, there is only condition-k (k = 1) on each core. From (17) and (20), we have

$$U^m = 1 - [(1 - x) \cdot (1 - y) - \iota \cdot x \cdot y]$$
$$U^{m \cup \{\tau_i\}} = 1 - [(1 - x - z) \cdot (1 - y) - \iota \cdot (x + z) \cdot y].$$

Note that y may be 0. Then, based on (23), we have

$$\Delta_m\{\tau_i\} = (1 - y) \cdot z + \iota \cdot y \cdot z = [1 - (1 - \iota) \cdot y] \cdot z \le z$$

which concludes the proof.
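The bound in Property 11 can be checked numerically with a short Python sketch. It uses the dual-criticality expression from the proof, in which the core utilization equals 1 − [(1 − x)(1 − y) − ι·x·y], and compares the measured increment with the closed form [1 − (1 − ι)·y]·z. The function names and the sample values of x, y, ι, and z are assumptions chosen for illustration.

```python
def core_util_dual(x, y, iota):
    """Core utilization for K = 2, as used in the proof of Property 11.

    x: level-1 utilization of the core's level-1 tasks
    y: level-2 utilization of the core's level-2 tasks
    iota: assumed constant ratio c_j(1) / c_j(2) of the level-2 tasks
    """
    return 1.0 - ((1.0 - x) * (1.0 - y) - iota * x * y)


def increment_low_task(x, y, iota, z):
    """Utilization increment when a level-1 task with u_i(1) = z joins the core."""
    return core_util_dual(x + z, y, iota) - core_util_dual(x, y, iota)


# Hypothetical core state and incoming level-1 task.
x, y, iota, z = 0.3, 0.4, 0.5, 0.2
delta = increment_low_task(x, y, iota, z)
closed_form = (1.0 - (1.0 - iota) * y) * z   # [1 - (1 - iota) * y] * z
```

With these values the core utilization grows from 0.64 to 0.80, so the increment is 0.16, which matches the closed form and stays below the task's own utilization z = 0.2.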
Property 12: Assume that (17) holds after task $\tau_i$ is assigned to core $P_m$. If task $\tau_i$'s own criticality level equals 2 and if we denote by z the quantity $u_i(2)$, the utilization increment $\Delta_m\{\tau_i\}$ of core $P_m$ is no greater than z but greater than $u_i(1)$ (i.e., $\iota \cdot z$).

Proof: Following steps similar to those in Property 11, as (17) holds for core $P_m$, we have $0 \le x < 1$ and

$$U^{m \cup \{\tau_i\}} = 1 - [(1 - x) \cdot (1 - y - z) - \iota \cdot x \cdot (y + z)]$$
$$\Delta_m\{\tau_i\} = (1 - x) \cdot z + \iota \cdot x \cdot z = [1 - (1 - \iota) \cdot x] \cdot z \le z.$$

Since $0 \le x < 1$ and $0 < \iota < 1$, we can further have

$$1 - \iota > (1 - \iota) \cdot x \;\Rightarrow\; 1 - (1 - \iota) \cdot x > \iota \;\Rightarrow\; \Delta_m\{\tau_i\} > \iota \cdot z$$

which concludes the proof.

Based on the analysis from Properties 11 and 12, we can see that for dual-criticality systems, a lower ratio of $U_2(1)$ to $U_2(2)$ (i.e., $\iota$) can lead to a smaller increment for core utilization, which is consistent with the intuition that more tasks with high criticality levels can reduce their deadlines to complete their relatively light low-criticality workloads earlier. More importantly, we can see that when tasks of different own criticalities are mapped onto the same core, the core utilization increment may be lower than the maximum utilization of the task to be allocated to the core. As our partitioned scheme aims at minimizing the utilization increments of cores during task mapping, the tasks with different own criticality levels are more likely to be assigned to the same core.

Algorithm 1: Outline of CA-TPA
Input: Ψ (the task set); M (the number of cores);
Output: A feasible partition Γ or FAIL;
 1  Initiate Γ = {Ψ_m}, where Ψ_m = ∅ (m = 1, ..., M);
 2  Sort tasks in Ψ based on their utilization contributions;
 3  for (each τ_i ∈ Ψ in the above order) do
 4      Δ_min = ∞;
 5      for (each P_m) do
 6          Calculate U^{m ∪ {τ_i}} based on Equation (20);
 7          Calculate Δ_m{τ_i} based on Equation (23);
 8          if ({Ψ_m ∪ {τ_i}} is feasible and Δ_m{τ_i} < Δ_min) then
 9              Δ_min = Δ_m{τ_i}; x = m;
10          end
11      end
12      if (Δ_min == ∞) then
13          Γ = ∅; break; //not feasible on any core;
14      end
15      Ψ_x = Ψ_x ∪ {τ_i}; //allocate τ_i to P_x;
16      Update U_j^x(k) (k = 1, ..., K) and U^x;
17  end
18  Return (Γ ≠ ∅ ? Γ : FAIL);
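The outline of CA-TPA in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tasks are encoded as lists of per-level utilizations, the core utilization follows one reading of (19) and (20) in which a condition-k is treated as valid when x(k) ≤ 1 and A(k) ≥ 0, the tie-breaking rules of Section IV-A are folded into the sort key, and the workload imbalance factor is left out. All function names, the tolerance `EPS`, and the example task set are hypothetical.

```python
import math

EPS = 1e-12  # numerical tolerance (an assumption of this sketch)


def contribution(u, total):
    """C_i = max_k u_i(k) / U(k); u[k] is the task's level-(k+1) utilization."""
    return max(u[k] / total[k] for k in range(len(u)))


def core_utilization(core_tasks, K):
    """Core utilization under the reading of (19)-(20) assumed here.

    Returns math.inf when no condition-k is valid, i.e., the core is infeasible.
    """
    # U[j][k]: level-(k+1) utilization of tasks whose own level is j+1.
    U = [[0.0] * K for _ in range(K)]
    for u in core_tasks:
        for k, value in enumerate(u):
            U[len(u) - 1][k] += value
    exploited = []
    for k in range(K - 1):                         # condition-k, k = 1..K-1 (0-based)
        x = sum(U[j][j] for j in range(k + 1))
        y = sum(U[j][j] for j in range(k + 1, K))
        z = sum(U[j][k] for j in range(k + 1, K))
        a = (1 - x) * (1 - y) - x * z              # A(k) = theta - mu
        if x <= 1 + EPS and a >= -EPS:
            exploited.append(1 - a)
    return max(exploited) if exploited else math.inf


def ca_tpa(tasks, M, K):
    """Return a list mapping task index -> core index, or None on failure."""
    total = [sum(u[k] for u in tasks if len(u) > k) for k in range(K)]
    # Larger contribution first, then higher criticality, then smaller index.
    order = sorted(range(len(tasks)),
                   key=lambda i: (-contribution(tasks[i], total), -len(tasks[i]), i))
    cores = [[] for _ in range(M)]
    util = [0.0] * M
    assignment = [None] * len(tasks)
    for i in order:
        best, best_inc, best_util = None, math.inf, None
        for m in range(M):                         # probe every core
            new_util = core_utilization(cores[m] + [tasks[i]], K)
            inc = new_util - util[m]
            # Strict '<' keeps the smaller core index when increments tie.
            if new_util != math.inf and inc < best_inc:
                best, best_inc, best_util = m, inc, new_util
        if best is None:
            return None                            # not feasible on any core
        cores[best].append(tasks[i])
        util[best] = best_util
        assignment[i] = best
    return assignment


# Hypothetical dual-criticality task set on two cores.
example = [[0.5], [0.3, 0.6], [0.4], [0.1, 0.5]]
result = ca_tpa(example, M=2, K=2)
```

In this example the two level-2 tasks cannot share a core, so they are placed first on separate cores, and each level-1 task then joins the core where the resulting utilization increment is smallest or where it is still feasible. The final partition therefore mixes tasks of different own criticality levels on each core, in line with the discussion following Property 12.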
Therefore, when the system mode changes, the remaining high-criticality tasks can be distributed uniformly among cores, which typically results in criticality-cognizant workload balance and thus lower run-time overhead (due to potentially fewer job preemptions on each core), as validated in Section V-C.

C. Criticality-Aware Task Partitioning Algorithm

Based on the above analysis, we adopt a probe-based approach to incorporate the contributions of tasks' utilizations at multiple levels on different cores when allocating a task to cores. Specifically, by checking all cores in the system, a task $\tau_i$ will be mapped to the core $P_x$ that has the minimum increment for its core utilization, should $\tau_i$ be allocated to $P_x$. That is, $\Delta_x\{\tau_i\} = \min\{\Delta_m\{\tau_i\} \mid m = 1, \ldots, M\}$. If more than one core has the same minimum core utilization increment, the tie is broken by mapping the task to the core with smaller index.

The outline of our CA-TPA is summarized in Algorithm 1. First, the task-to-core partition Γ and the subset of tasks for each core are initialized (line 1). Then, all tasks are sorted in descending order of their utilization contributions (line 2). For each task, CA-TPA probes all cores by calculating its new core utilization [based on (20)] and utilization increment [based on (23)], by assuming that task $\tau_i$ is allocated to it (lines 5–11). For all cores that can feasibly accommodate task $\tau_i$ under the EDF-VD scheduler based on the schedulability conditions given in (4) and (17), the core $P_x$ with the smallest utilization increment is chosen (line 9). If $\tau_i$ cannot be feasibly allocated to any core, CA-TPA fails to obtain a feasible task-to-core partition and quits (lines 12 and 13). Otherwise, $\tau_i$ is allocated to the target core $P_x$ by updating its subset of tasks $\Psi_x$ and related parameters (lines 15 and 16). Once all tasks have been successfully assigned to the cores, the feasible partition Γ is obtained and returned (line 18).

1) Time Complexity of CA-TPA: Recall that there are M cores and N tasks in the system. As there are normally only a few criticality levels (i.e., usually no larger than 6) in most applications, K can be assumed to be a constant in the complexity analysis. $U_j(k)$ can be computed from (1) in O(N) time. The computation of $U(k)$ from (3) can also be done in O(N) time. Therefore, sorting the tasks in decreasing order of their utilization contributions can be performed in O(N log N) time. Next, from Algorithm 1, we can see that determining a target core that can feasibly accommodate a task can be done in O(M + N) time, by computing utilization increments of all cores for all tasks that have been allocated. Hence, the overall time complexity of CA-TPA is O((M + N) · N).

2) Workload Imbalance Factor: When tasks are allocated to cores under CA-TPA with partitioned EDF-VD, it is possible to obtain a mapping with imbalanced workloads among cores, where a few cores are overloaded while the remaining cores have enough free capacity.
To prevent our partitioning algorithm from allocating most tasks to a few cores, we introduce a workload imbalance factor, which is defined as

$$\text{workload imbalance factor} = \frac{U_{sys} - \min\{U^m \mid m = 1, \ldots, M\}}{U_{sys}} \tag{24}$$

where $U_{sys}$ is defined in (21). In essence, this factor is exploited to control the variations of core utilizations during the task-to-core mapping. In addition, there is a threshold ρ for the workload imbalance factor (denoted by α in the experiments of Section V), which is set prior to the task assignments. When the factor increases and approaches the threshold ρ, instead of selecting a target core according to CA-TPA, the new task can be allocated directly to the valid core with the minimum core utilization (i.e., $\min\{U^m \mid m = 1, \ldots, M\}$), subject to the feasibility conditions (4) and (17). A concrete example can be found in the supplementary material.

V. EVALUATIONS AND DISCUSSION

To evaluate the performance of the proposed CA-TPA scheme with EDF-VD experimentally, we developed a simulator and implemented the EDF-VD scheduler in the Linux kernel. For comparison, in addition to the CA-TPA scheme, we also implemented the well-known partitioning heuristics WFD, FFD, and BFD, as well as the hybrid scheme proposed in [23] that can be adopted for EDF-VD and systems with multiple criticality levels. To test the feasibility of a core with a new task, these schemes first use the sufficient schedulability condition (4). In case the outcome is negative, they check the second and improved condition (5). As the optimal solution-based scheme reported in [22] only focuses on dual-criticality systems and is usually applicable only to small problems, it is omitted here. Moreover, we also evaluate the performance of two partitioned schemes, MC-P-UT-INC [6] and MPVD-HA-BF [14], which are applicable to dual-criticality systems. In what follows, we first give the parameter settings for the experiments in Section V-A, and then in Sections V-B and V-C, we present and discuss the simulation results and empirical results for the tested schemes, respectively.

A. Parameter Settings

We compare these schemes based on the following performance metrics.

1) Schedulability ratio, which is defined as the ratio of the number of task sets that satisfy the schedulability condition to the total number of tested task sets.
2) Average core utilization ($U_{avg}$) defined in (22), which assesses the workload balance of partitions generated by the schemes and can typically affect the run-time overheads of the schemes.
3) Run-time overhead, which measures the applicability of all mapping schemes.

In Table I, we provide the parameter ranges of the system considered in the experiments, including the number of cores (M), the system criticality level (K), the normalized system utilization (NSU, defined as the ratio of the aggregate level-1 utilization of all tasks to the number of cores), and a threshold for workload imbalance (α). It also shows the parameters for MC tasks: the number of tasks (N), task periods (P), and the increment factor (IFC, defined as the increasing ratio of WCETs between two consecutive levels for any task).

TABLE I. System and task parameters for the experiments.

In the experiments, the synthetic task sets are generated from the above parameters as follows. First, the system criticality level K is selected uniformly in the range [2, 5]. For given values of M, N, and NSU, the base task utilization at level-1 is set as $u_{base}(1) = NSU \cdot (M/N)$ based on the definition of NSU. Then, for each task $\tau_i$, its period $p_i$ is randomly chosen in one of the three period ranges given in Table I. Next, the value of $c_i(1)$ is obtained uniformly in the range $[0.2 \cdot p_i \cdot u_{base}(1),\; 1.8 \cdot p_i \cdot u_{base}(1)]$. The task $\tau_i$'s criticality level $l_i$ is selected uniformly within [1, K]. Finally, similar to the generation of the value of $c_i(1)$, the values of $c_i(k)$ ($k = 2, \ldots, l_i$) can be accordingly generated using $c_i(1)$ and the value of IFC.

Fig. 2. Simulation performance of the schemes with varying NSU for dual-criticality systems. (a) Schedulability ratio. (b) Average core utilization.

Fig. 3. Performance of the schemes with varying NSU. (a) Schedulability ratio. (b) Average core utilization.

B. Performance of the Partitioning Schemes

Unless otherwise noted, the default parameter values in the simulations are: M = 8, N = 80, K = 4, NSU = 0.6, α = 0.2, and IFC = 0.4. In the reported results, each data point corresponds to the average result of 50 000 task sets.

1) Results for Dual-Criticality Systems: We first conduct the performance comparison between these mapping schemes for dual-criticality systems (i.e., K = 2). Due to space limits, we only evaluate the impact of the NSU on the performance of the tested schemes, and the results are shown in Fig. 2 (where IFC = 1). When other parameters are fixed, a larger NSU generally means higher workload and lower acceptance ratio for the schemes. Not surprisingly, as shown in Fig. 2(a), WFD usually yields the lowest acceptance ratio, and CA-TPA has the best schedulability performance among the schemes with polynomial time complexity due to its effort to minimize the core utilization increment during task-to-core mapping. MPVD-HA-BF has the best acceptance ratio but with much higher pseudo-polynomial time complexity, which arises from the use of DBF. MC-P-UT-INC considers the low-criticality workloads for high-criticality tasks and can have schedulability comparable to that of CA-TPA. However, MC-P-UT-INC only focuses on dual-criticality systems and has quite high time complexity, since it iterates the values from 0.5 to 1 (in increments of 0.01 here) for the bound of the cumulative high-criticality utilization on each core to find a feasible partition.

Fig. 2(b) further shows the performance of workload balance generated by these schemes. This metric is obtained by considering only the task sets that are schedulable by all schemes. WFD usually generates partitions with the best workload balance among the schemes. In addition to minimizing the utilization increase of cores, CA-TPA employs a threshold for workload imbalance to avoid severely imbalanced workloads. Thus, CA-TPA can have average core utilization comparable to WFD and generate partitions with more balanced workload than other schemes.

2) Results for Systems With Multiple Criticality Levels: In what follows, we evaluate the impacts of different parameters on the performance of these partitioned schemes (except MC-P-UT-INC and MPVD-HA-BF, which are applicable only to dual-criticality systems) for tasks with multiple criticality levels.

Fig. 4. Performance of the schemes with varying IFC. (a) Schedulability ratio. (b) Average core utilization.

a) Impact of the normalized system utilization: Fig. 3 shows the impacts of the NSU on the performance of the partitioned schemes. As shown in Fig. 3(a), compared to WFD, FFD, BFD, and hybrid mapping schemes, CA-TPA can obtain a much better schedulability ratio (up to 35% more) as explained above, especially when the system becomes overloaded (e.g., NSU > 0.63). Similar to the trends in Fig. 2(b), Fig. 3(b) shows that CA-TPA can generate partitions with better workload balance than FFD, BFD, and hybrid, and can have lower average core utilization compared to WFD when the system is underloaded (e.g., NSU < 0.57).

b) Impact of the increment factor: Next, we evaluate the schemes with varying IFCs, and the results are shown in Fig. 4. Usually, a larger IFC causes higher system workload and lower acceptance ratio, which follows from the definition of IFC and the schedulability conditions given in Theorem 1. The results follow trends similar to those for varying NSU: our CA-TPA-based scheme performs best in terms of schedulability ratio and generates more balanced workload than FFD, BFD, and hybrid heuristics.
More specifically, as CA-TPA tries to bridge the gap between the total task utilizations at different criticality levels on every core, it can typically obtain average core utilization comparable to WFD as shown in Fig. 4(b).

c) Impact of the threshold for workload imbalance α: Fig. 5 illustrates the performance comparison among all mapping schemes with different thresholds for workload imbalance (α). As α is used only by CA-TPA to tune workload imbalance during task partitioning, the performance of other schemes remains constant when α varies as shown in Fig. 5(a) and (b).

Fig. 5. Performance of the schemes with varying α. (a) Schedulability ratio. (b) Average core utilization.

Fig. 6. Performance of the schemes with varying M. (a) Schedulability ratio. (b) Average core utilization.

Fig. 7. Performance of the schemes with varying K. (a) Schedulability ratio. (b) Average core utilization.

A larger α usually implies larger tolerance of workload imbalance for CA-TPA. Consequently, when α increases, the CA-TPA scheme attempts to allocate tasks to the core with the minimum utilization increment without much consideration of the workload balance (i.e., in a manner similar to FFD) and thus can effectively improve schedulability as shown in Fig. 5(a). However, this behavior can result in more imbalanced workload among cores (i.e., a larger workload imbalance factor), but our CA-TPA scheme still manages to generate more balanced partitions compared to FFD, BFD, and hybrid schemes as shown in Fig. 5(b).

d) Impact of the number of cores M: We further evaluate the performance of all schemes with varying number of cores (M), and the results are shown in Fig. 6. In general, more cores can provide more capacity and flexibility for tasks. Thus, when M increases, all mapping schemes can obtain better schedulability, and CA-TPA still achieves the best acceptance ratio among all schemes [see Fig. 6(a)]. Due to the workload imbalance tuning during task assignments, CA-TPA can generate partitions with better workload balance compared to BFD, FFD, and hybrid [see Fig. 6(b)].

e) Impact of the number of criticality levels K: Finally, the performance comparison for all schemes with different system criticality levels (K) is shown in Fig. 7. Recall that the NSU represents the system's utilization at level-1. When other parameters are fixed, a larger K implies more execution times for tasks with the highest level running at level-K. Therefore, the acceptance ratios of all schemes decrease drastically when K increases as shown in Fig. 7(a), but CA-TPA still obtains the best schedulability performance among all schemes as explained earlier. Similarly, CA-TPA generates partitions with more balanced workload compared to BFD, FFD, and hybrid heuristics, and yields average core utilization comparable to WFD [see Fig. 7(b)].

C. Measurement Performance in Linux Kernel

To assess the usability of the mapping schemes, we implemented them in the Linux kernel. We followed the design patterns of LITMUS^RT [1] and exploited the available Linux infrastructure to implement the partitioned EDF-VD scheduler: a new and highest-priority scheduling class is added to the traditional Linux scheduler, and the partitioned EDF-VD scheduler always executes the highest priority jobs before the regular Linux jobs. Moreover, to maintain the highest-priority scheduling class for partitioned EDF-VD, an additional idle job on each core occupies the CPU resource once the system is idle (i.e., the job queue on every processing core is empty). The partitioned EDF-VD scheduler changes the Linux scheduler to invoke the initialization functions, scheduling, and tick handlers at run-time.

Similar to the paradigms of LITMUS^RT, we provided a user space library to create MC tasks by means of multithreading. The tasks are initially created as non-real-time tasks, where each task executes the same function code by updating a local variable in a while-loop. A system call is utilized to pass the timing parameters of tasks from user space to kernel space, and then per-task data structures are constructed in the kernel. In addition, a warm-up tick is added to ensure that the scheduling and data structures are all ready before tasks start their executions. At each hardware timer interval interrupt, every core triggers the tick handler and individually performs scheduling decisions for its jobs, including the arrival, preemption, and completion events.
Specifically, we established two additional global synchronization mechanisms for the partitioned EDF-VD scheduler: one is used to synchronize the tick counters of all cores; based on the first mechanism and the AMC scheme [5], [6], the other is used to address the issues when the system mode switches to a higher level (which is similar to barrier synchronization in [26]), such as discarding all low-criticality jobs, restoring the relative deadlines of all high-criticality jobs (if applicable), and initializing the system running mode when it is idle.

We implemented all schemes in Linux kernel 2.6.38.8, running on a PC with a 32 nm AMD FX-8320 processor (8 cores, 3.5 GHz clock speed, 8 MB L2, and L3 cache) and 8 GB RAM. Here, the performance metric is the total run-time overhead on cores, where the main sources are context switching, preemption delay, operations for job queues (i.e., job arrival and job finish), and synchronization mechanisms for MC systems.

Fig. 8. Empirical performance of the task partitioning algorithms with respect to run-time overhead. (a) Performance with varying NSU. (b) Performance with varying M. (c) Performance with varying IFC. (d) Performance with varying K. (e) Performance with varying MCP. (f) Performance with varying α.

1) Parameter Settings: The period range for tasks is [50 ms, 500 ms], and other parameters are generated using the same methodology adopted in the simulations. The evaluated task sets are schedulable by all mapping schemes based on the feasibility conditions in (4) and (5), and each task set executes for 10 s under each scheme. The additional measuring parameter for the mapping schemes is the mode-change probability (MCP), which ranges from 0.01 to 0.1 and accounts for the execution variations of MC tasks at run-time. We first calculate the total number of jobs executed in 10 s, which is then multiplied by MCP to obtain the number of jobs that can result in a mode transition. After randomly selecting such jobs (that have their own levels higher than 1), the actual execution times of these jobs are uniformly determined between their minimum WCETs and their maximum WCETs. Unless otherwise specified, the default parameter values in the experiments are: M = 4, N = 25, K = 4, NSU = 0.45, α = 0.2, IFC = 0.4, and MCP = 0.08. For the results reported below, each data point represents the average result of 1000 task sets.

2) Empirical Results: The overhead measurement results (in s) for the schemes are shown in Fig. 8. The overhead of CA-TPA usually corresponds to about 3%–5% of the total execution time on cores (e.g., 40 s by default), which is a little lower than that (i.e., 5%) for partitioned EDF-VD on the Core i5 platform [26].
Specifically, CA-TPA has measured overhead comparable to WFD and outperforms other schemes. The details of the analysis can be found in the supplementary material.

VI. CONCLUSION

For periodic MC tasks running on multicores under the EDF-VD algorithm, we investigated a criticality-cognizant utilization bound for partitioned EDF-VD in conjunction with the WFD heuristic, and then discussed its characteristics. We observed that, as opposed to exclusively relying on tasks' maximum utilizations, the feasibility conditions for EDF-VD also depend on tasks' utilizations at other valid levels. By exploiting the contributions of tasks' utilizations at various levels on different cores, we developed a CA-TPA, and proposed several heuristics to implement task prioritization, minimize the utilization increments on cores, and balance system workload. The experimental results show that, compared to the existing mapping schemes, the proposed CA-TPA scheme with partitioned EDF-VD can achieve better schedulability performance with acceptable time complexity, offer more balanced partitions, and experience lower run-time overhead.

REFERENCES

[1] B. B. Brandenburg and J. H. Anderson, "A comparison of the M-PCP, D-PCP, and FMLP on LITMUS^RT," in Proc. Principles Distrib. Syst., Luxor, Egypt, 2008, pp. 105–124.
[2] S. Baruah et al., "The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task systems," in Proc. 24th Euromicro Conf. Real Time Syst., Pisa, Italy, 2012, pp. 145–154.
[3] S. Baruah et al., "Preemptive uniprocessor scheduling of mixed-criticality sporadic task systems," J. ACM, vol. 62, no. 2, p. 14, 2015.
[4] S. K. Baruah and A. Burns, "Fixed-priority scheduling of dual-criticality systems," in Proc. 21st Int. Conf. Real Time Netw. Syst., Sophia Antipolis, France, 2013, pp. 173–181.
[5] S. K. Baruah, A. Burns, and R. I. Davis, "Response-time analysis for mixed criticality systems," in Proc. 32nd IEEE Real Time Syst. Symp., Vienna, Austria, 2011, pp. 34–43.
[6] S. Baruah, B. Chattopadhyay, H. Li, and I. Shin, "Mixed-criticality scheduling on multiprocessors," Real Time Syst., vol. 50, no. 1, pp. 142–177, 2014.

Jian-Jun Han (M'07) received the Ph.D. degree in computer science and engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2005. He is currently an Associate Professor with the School of Computer Science and Technology, HUST. He was with the University of California at Irvine, Irvine, CA, USA, as a Visiting Scholar from 2008 to 2009, and with the Seoul National University, Seoul, South Korea, from 2009 to 2010. His current research interests include real-time systems, parallel processing, and green computing.

Xin Tao is currently pursuing the master's degree with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China. His current research interests include real-time scheduling algorithms, embedded systems, and operating systems.

Dakai Zhu (M'04) received the Ph.D. degree in computer science from the University of Pittsburgh, Pittsburgh, PA, USA, in 2004. He is currently an Associate Professor with the Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA. His current research interests include real-time systems, power-aware computing, and fault-tolerant systems. Dr. Zhu was a recipient of the U.S. National Science Foundation Faculty Early Career Development Award in 2010.

Hakan Aydin (M'02) received the Ph.D. degree in computer science from the University of Pittsburgh, Pittsburgh, PA, USA, in 2001.
He is currently an Associate Professor with the Department of Computer Science, George Mason University, Fairfax, VA, USA. His current research interests include real-time systems, low-power computing, fault tolerance, and real-time operating systems. Dr. Aydin served as the Technical Program Committee Chair of IEEE RTAS in 2011.

Zili Shao (M'06) received the M.S. and Ph.D. degrees from the Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA, in 2003 and 2005, respectively. He has been an Associate Professor with the Department of Computing, Hong Kong Polytechnic University, Hong Kong, since 2010. His current research interests include embedded systems, real-time systems, compiler optimization, and hardware/software co-design.

Laurence T. Yang (SM'04) received the Ph.D. degree in computer science from the University of Victoria, Victoria, BC, Canada. He is currently a Professor with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China, and the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada. His current research interests include parallel and distributed computing, and embedded and ubiquitous/pervasive computing.