Two-Phase Low-Energy N-Modular Redundancy for Hard Real-Time Multi-Core Systems


Mohammad Salehi, Alireza Ejlali, and Bashir M. Al-Hashimi, Fellow, IEEE

Abstract: This paper proposes an N-modular redundancy (NMR) technique with low energy overhead for hard real-time multi-core systems. NMR is well suited to multi-core platforms as they provide multiple processing units and low-overhead communication for voting. However, it can impose considerable energy overhead, and hence its energy overhead must be controlled, which is the primary consideration of this paper. For this purpose the system operation is divided into two phases: the indispensable phase and the on-demand phase. In the indispensable phase only half-plus-one copies of each task are executed. When no fault occurs during this phase, the results must be identical and hence the remaining copies are not required. Otherwise, the remaining copies must be executed in the on-demand phase to perform a complete majority voting. In this paper, for such a two-phase NMR, an energy-management technique is developed where two new concepts are considered: i) block-partitioned scheduling, which enables parallel task execution during the on-demand phase, thereby leaving more slack for energy saving, and ii) pseudo-dynamic slack, which results when a task has no faulty execution during the indispensable phase and hence the time reserved for its copies in the on-demand phase is reclaimed for energy saving. The energy-management technique has an offline part that manages static and pseudo-dynamic slacks at design time and an online part that mainly manages dynamic slacks at run-time. Experimental results show that the proposed NMR technique provides up to 29% energy saving and is 6 orders of magnitude more reliable than a recent previous work.

Index Terms: Energy minimization, multi-core systems, real-time and embedded systems, reliability, scheduling.

1 INTRODUCTION

Multi-core platforms have emerged as popular and powerful computing engines for many recent embedded systems [1], [2], [3], [4], [5].
Mohammad Salehi and Alireza Ejlali are with the Department of Computer Engineering, Sharif University of Technology, Tehran 14588, Iran (e-mail: mohammad_salehi@ce.sharif.edu, ejlali@sharif.edu). Bashir M. Al-Hashimi is with the School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, U.K. (e-mail: bmah@ecs.soton.ac.uk).

While such architectures have been employed for embedded applications that require high-performance computing, we believe they also offer considerable new opportunities for designing embedded systems where hard real-time operation, high reliability in the presence of transient faults, and low energy consumption are required [6], [7], [8]. In this paper, we address the use of multi-core platforms to achieve high reliability with low energy overhead for hard real-time embedded systems. To achieve reliability against transient faults, we consider N-modular redundancy (NMR) [9], [10], where multiple processing units execute identical copies of each task and their results are voted on to produce a single output. NMR is well suited to multi-core platforms as they satisfy NMR requirements such as multiple processing units and low-overhead communication for voting [3]. An NMR system can mask faults while less than half of its units are faulty. Fault-tolerant real-time systems that have been considered in previous works require fault-detection mechanisms (e.g., [5], [6], [7], [8], [11]), and these works have assumed (usually implicitly) that they have perfect detection mechanisms (i.e., they can detect all faulty task executions). However, common fault-detection mechanisms are far less effective than what is required for highly reliable systems, whereas NMR does not require any specific fault-detection mechanism and uses result comparison (majority voting) for fault detection and masking [9], [10]. Since it is very unlikely that all modules in NMR become faulty at the same time and produce the same erroneous results, comparing the results can provide almost perfect fault detection and masking [9], [10].
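As a minimal sketch of the result comparison described above, the following Python function votes over the results of N task copies and reports a value only when a strict majority of the copies agree. The function name `majority_vote` and the representation of task results as plain hashable Python values are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter

def majority_vote(results):
    """Return the value produced by a strict majority of copies, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # A fault is masked only while fewer than half of the copies are faulty.
    return value if count > len(results) // 2 else None

# TMR (N = 3): one faulty copy is outvoted by the two agreeing copies.
print(majority_vote([42, 42, 7]))
# Three different results: no majority exists, so the fault cannot be masked.
print(majority_vote([1, 2, 3]))
```

Returning `None` when no majority exists keeps the "detected but not masked" case distinct from a successfully voted result.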
Also, result comparison can be combined with hash-based detection mechanisms, e.g., Fingerprinting [31], to achieve very high detection coverage, about [31]. Therefore, in our experiments in Section 5 we will assume detection coverage for our system. Like all other fault-tolerance and fault-masking techniques, NMR can impose considerable energy overhead [9], [10], which is an important concern in embedded systems where energy consumption is prominent. To reduce the energy overhead, we propose an energy-management technique that bears the major contributions of the work and is specifically developed for NMR when used for hard real-time multi-core systems (Sections 3 and 4). The main contributions of this work are:
i) Considering the dominance of fault-free execution over faulty executions [6], [8], [38], a two-phase NMR is proposed that achieves minimized energy consumption in the absence of faults while guaranteeing reliability and deadline requirements.
ii) A specific type of slack time, called pseudo-dynamic slack, is considered in this work. As explained in Section 4, this type of slack time is different from conventional slack times, i.e., static and dynamic slack [6], [8], [20].

iii) An energy-management technique is proposed that exploits the pseudo-dynamic and static slacks through offline optimization (Section 4.2). This is different from previous works, which have not proposed a mechanism to manage the pseudo-dynamic slack. Also, an online energy-management technique is proposed to exploit dynamic slacks at run-time (Section 4.3).
iv) A specific scheduling technique is developed, called block-partitioned scheduling (Section 3), that provides the ability of in-advance parallel task execution (Section 3) to exploit pseudo-dynamic slacks more effectively.

The remainder of this paper is organized as follows. In Section 2 we review the related work. The proposed technique is presented in Section 3. Section 4 describes the energy-management method which is used for the proposed technique. The experimental results are presented in Section 5. Finally, Section 6 concludes the paper.

2 RELATED WORK

Some research works, e.g., [6], [7], [11], have addressed both fault tolerance and low energy consumption in fault-tolerant real-time systems with two processors. These works have not considered multiple faults per task execution, and they also assume perfect fault-detection mechanisms. [13] has proposed voltage-scaling techniques to reduce the energy consumption of triple-modular redundancy (TMR). However, this work has only considered single-task applications. Many previous works in the context of multi-processor systems either propose energy-management techniques without considering reliability (e.g., [14], [15], [32], [33]) or consider reliability without considering energy consumption (e.g., [4], [30], [34]). [14] has considered variation in execution times to propose a scheduling algorithm based on dynamic voltage scaling (DVS) [12] for multi-processor systems. [15] has studied the energy efficiency of multi-core platforms that use multiple voltage islands. [32] has proposed a technique to minimize chip-level peak power consumption in multi-core systems running sporadic real-time tasks.
[33] has proposed an adaptive task partitioning for multi-core systems running independent periodic real-time tasks. [4] has evaluated scheduling heuristics for tasks with different criticality. [30] has proposed a mapping optimization technique for mixed-critical multi-core systems with different reliability requirements. [34] has proposed software transformations to increase reliability through reducing instruction vulnerabilities and the executions of critical instructions. Recently, research works have also focused on both energy and reliability considerations in multi-core systems. Some works, e.g., [35], [36], [37], have proposed multi-core architectures that exploit redundancy at different levels of abstraction to target low energy consumption and reliability. [35] has proposed an adaptive multi-core architecture that selectively adjusts pipeline-level redundancy to satisfy a reliability target with low energy consumption. [36] has proposed a customizable chip-level redundancy technique for multi-core systems that utilizes power-efficient hardware fault-detection mechanisms along with forward recovery to reduce overheads in case of fault-free executions. [37] has considered the effects of DVS on the soft error rate and proposed a flexible dual modular redundancy (DMR) mechanism that selectively enables per-core DMR to increase reliability. However, these works require hardware modification or redesign, and hence cannot be used by current commercial off-the-shelf processors, while our proposed technique is general and can be exploited by any multi-core processor that supports DVS. Some works, e.g., [5], [16], [17], [18], [38], have proposed energy-management techniques for task-level redundancy in multi-core systems. [5] and [16] have considered only one faulty execution for each task to preserve the original system reliability, while for many applications (e.g., applications that are used in harsh environments) a high level of reliability cannot be achieved without tolerating multiple faulty tasks [9], [10], [17], [18]. Some works have considered different application models, e.g.
periodic independent real-time tasks in [17] and [38] and parallel independent applications in [18]. However, these works cannot be applied to tasks with precedence constraints (e.g., task graphs [5], [6], [7]), while we consider hard real-time applications with task precedence constraints and propose a scheduling and energy-management technique for these applications.

3 PROPOSED TWO-PHASE NMR TECHNIQUE

In this paper, we consider frame-based applications [5], [6], [7] with hard timing requirements and task precedence constraints, where n dependent tasks {T1, T2, ..., Tn} are executed within each execution frame and must be completed as a whole before the end of the frame (specified by a deadline D). We also consider that the task precedence constraints (dependences between the tasks) are depicted as a directed acyclic graph (DAG) [5], [6], [7]. For example, Fig. 1a shows an application task graph with six tasks, where the number placed above each task is its worst-case execution time at the maximum supply voltage V_max and the maximum operational frequency f_max (denoted by W_i for each task T_i). For this type of application we propose a two-phase NMR technique with low energy consumption running on multi-core platforms. To do this, a new scheduling technique is proposed and a new type of slack time (which is specific to the proposed two-phase NMR) is exploited to manage energy consumption. In this section we describe the two-phase operation of the system and the proposed scheduling technique, and in the next section we explain the energy-management technique. The two operation phases of the proposed NMR are:

1. Indispensable phase: At first the system operates in its indispensable phase, where it executes a multi-core schedule containing ⌊N/2⌋+1 (half-plus-one) copies of each task. For each task, the results of the task copies are compared. If no fault occurs, the task results must be identical, and in this case the common result is used as the result of the system. However, when the results

are not identical (when some faults have occurred during the indispensable phase), the system temporarily switches to the on-demand phase, where it executes the remaining copies of the task to perform a complete majority voting.

2. On-demand phase: In this phase, the system executes a part of a multi-core schedule that contains the remaining copies of the task which had faulty executions in the indispensable phase. As ⌊N/2⌋+1 copies of the task have already been executed in the indispensable phase, in the on-demand phase we execute the remaining copies of the same task to obtain N results for performing a complete majority voting to mask the faults.

Fig. 1. Synthesizing a TMR system (i.e., NMR with N=3) on a quad-core platform. a) An example task graph, b) Creating a schedule with two copies of each task for the indispensable phase, ListSchedulingLTF(G, 2), c) Creating a schedule with one copy of each task for the on-demand phase, ListSchedulingLTF(G, 1), and d) A block-partitioned version of the on-demand phase schedule.

Therefore, each of the two operation phases of the proposed NMR technique requires its own schedule, so that we need to synthesize two schedules from the same application task graph. These two schedules are: i) a multi-core schedule containing ⌊N/2⌋+1 copies of each task for the indispensable phase, ii) a multi-core schedule containing the remaining N-(⌊N/2⌋+1) copies of each task for the on-demand phase. It is known that finding the optimal multi-core schedule to maximize parallelism (i.e., minimizing the schedule time length) is an NP-hard problem [5].
Indeed, multi-core schedules are typically obtained by the list scheduling algorithm [19], a simple heuristic that also provides parallelism. Similarly, in this paper we use list scheduling to synthesize the multi-core schedules of the indispensable and on-demand phases. Also, in the list scheduling, whenever several tasks can be scheduled (these are the tasks all of whose predecessors are scheduled), we use the longest task first (LTF) policy to determine the execution order. We will discuss in Section 4 why the LTF policy is effective for our proposed technique. For example, considering a TMR system (i.e., NMR with N=3), Fig. 1 shows the step-by-step generation of the two multi-core schedules for a given task graph (Fig. 1a) using list scheduling with the LTF policy. Fig. 1b shows the schedule with two copies of each task for the indispensable phase and Fig. 1c shows the schedule with one copy of each task for the on-demand phase.

For the schedule which is used in the on-demand phase, we require that each task can overlap (in time) with at most one other task in each of the other cores. For example, the multi-core schedule of Fig. 1c (step 6) does not satisfy this condition, as in this schedule T2 overlaps with both T3 and T5 on Core2 and also overlaps with both T4 and T6 on Core3. Indeed, we need the schedule of Fig. 1c (step 6) to be transformed to a schedule like the one in Fig. 1d, which satisfies the condition as each task overlaps with at most one other task in each of the other cores. We require this condition to be satisfied because it lets us partition the multi-core schedule into time blocks, so that in each block only one single task or multiple parallel tasks exist. For example, in Fig. 1d the block B1 only consists of the task T1, the block B2 consists of the parallel tasks T2, T3 and T4, and the block B3 consists of the parallel tasks T5 and T6. In this paper, we call such schedules block-partitioned (BP) schedules.
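The overlap condition that defines a BP schedule can be checked mechanically. The sketch below is only an illustration: the function names, the dictionary-based schedule representation, and the start and finish times are all assumptions, not part of the paper. A schedule maps each core to a list of (task, start, finish) tuples, and two tasks are taken to overlap when their time intervals intersect in more than a single point.

```python
def overlaps(a, b):
    """True if two (task, start, finish) entries overlap in time."""
    return a[1] < b[2] and b[1] < a[2]

def is_block_partitioned(schedule):
    """Check that each task overlaps at most one task on every other core.

    `schedule` maps a core name to a list of (task, start, finish) tuples.
    """
    for core, tasks in schedule.items():
        for task in tasks:
            for other_core, other_tasks in schedule.items():
                if other_core == core:
                    continue
                if sum(overlaps(task, o) for o in other_tasks) > 1:
                    return False
    return True

# Hypothetical timings: TA overlaps two tasks (TB and TC) on the other core.
not_bp = {"core1": [("TA", 0, 30)],
          "core2": [("TB", 0, 20), ("TC", 20, 40)]}
# Variant in which TC starts only once TA has finished.
bp = {"core1": [("TA", 0, 30)],
      "core2": [("TB", 0, 20), ("TC", 30, 50)]}
print(is_block_partitioned(not_bp), is_block_partitioned(bp))
```

The check is quadratic in the number of scheduled tasks, which should be acceptable for a design-time analysis of small task graphs.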
As we will show later in this section, whenever a fault occurs during the indispensable phase, we switch to the on-demand phase to execute exactly one block of the BP schedule and then we switch back to the indispensable phase to continue executing the schedule of the indispensable phase. As a multi-core schedule which has been synthesized using the list scheduling technique with the LTF policy (e.g., the schedule of Fig. 1c) may not be BP, we use a simple technique to convert ordinary schedules to BP schedules. Suppose that in a multi-core schedule a task T_A overlaps with two other tasks T_B and T_C scheduled on another core (Fig. 2a). Assuming that the task T_C comes after the task T_B, we simply shift the task T_C (and all its successor tasks) to the right until there is no overlap between T_A and T_C. As can be seen from Fig. 2a, the amount of this shift (denoted by σ in the figure) is simply the difference between the finish time of T_A and the start time of T_C. We start from the beginning of a multi-core schedule, move to the right, and apply this technique until we obtain a BP multi-core schedule. As an example, when we apply this technique to the schedule of Fig. 1c (step 6), we obtain the BP schedule of Fig. 2b (step 3). One point that should be noted here is that block partitioning may increase the execution time of an application and hence it may cause the application to be unschedulable. Therefore, we use the proposed energy-management technique (Section 4) when the application total execution time is less than its deadline. This implies that the energy-management technique might not be used for some applications that have

tight deadlines. Similar schedulability conditions are used by other techniques, e.g., [16], [17] and [38], to define infeasible solutions.

Fig. 2. Block partitioning scheme. a) A technique to convert ordinary schedules to block-partitioned (BP) schedules and b) Block-partitioning a schedule that is not BP, BlockPartitioning(S).

In the following, we describe how the proposed two-phase NMR technique works by means of the example of Fig. 1, where we have a TMR system running on a quad-core platform. When no fault has occurred the system executes the schedule of the indispensable phase (Fig. 1b (step 6)), where two copies of each task T_i are executed and their results are compared. If the results are identical, the common result is used as the result of the system. Whenever the results of a task T_i are not identical (which indicates that some faults have occurred during the indispensable phase), we switch to the on-demand phase to execute the block of the BP schedule of the on-demand phase (Fig. 1d) that includes the same task T_i. After executing the third copy of T_i in the on-demand phase, a majority voting is done over the three results to mask the faults. Then, we switch back to the indispensable phase to continue executing its schedule from the point at which it was interrupted. Fig. 3 shows how the proposed technique operates when some faults occur while executing the application of Fig. 1. Note that, in this paper, whenever we say a fault occurs or a task becomes faulty, we mean that the task gives an incorrect result due to some errors (e.g., one or more transient faults). Assuming that the task T2 becomes faulty, when comparing the results of T2, they do not match, and hence the system temporarily switches to the on-demand phase. The result mismatch may happen due to a fault during the task execution or even due to a fault that corrupts the result comparison between the two phases.
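The per-task behaviour just described for a TMR system can be sketched as follows. This is an illustrative model only: `run_task_tmr`, `execute_copy`, and the `scripted` helper are hypothetical names, a task copy is modelled as a callable returning its result, and scheduling, blocks, and timing are deliberately left out.

```python
def run_task_tmr(execute_copy):
    """Run one task under two-phase TMR (N = 3).

    `execute_copy` is a hypothetical callable that runs one copy of the task
    and returns its result. Returns (result, used_on_demand_phase).
    """
    # Indispensable phase: execute two copies and compare their results.
    first, second = execute_copy(), execute_copy()
    if first == second:
        return first, False       # results match, the third copy is not needed
    # On-demand phase: execute the third copy, then vote over three results.
    third = execute_copy()
    results = [first, second, third]
    for candidate in results:
        if results.count(candidate) >= 2:
            return candidate, True
    return None, True             # more than one faulty copy: not masked

def scripted(results):
    """Build a fake task copy that returns the given results one by one."""
    it = iter(results)
    return lambda: next(it)

print(run_task_tmr(scripted([10, 10])))       # fault-free execution
print(run_task_tmr(scripted([10, 99, 10])))   # one faulty copy, masked by voting
```

In the fault-free case only two copies run, which is where the energy saving relative to always executing all three copies comes from.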
In the on-demand phase, as T2 belongs to the block B2 of the BP schedule of Fig. 1d, the block B2 is executed (the highlighted tasks T2 and T4 in Fig. 3), and then a majority voting is done over the results of the three copies of T2 to mask the fault (Fig. 3a). Here, T3 is not executed in the block B2 during the on-demand phase as it has already finished successfully before detecting the fault in T2 and hence it is no longer required. The important point to be noted here is that when we execute B2 during the on-demand phase we not only execute T2 (whose result is required for majority voting as its execution in the indispensable phase has been faulty), but we also execute T4 in parallel with T2 and save its result in memory, so that it can be used later for possible majority voting. After executing the block B2 the system switches back to the indispensable phase and continues executing the schedule from the point at which it was interrupted.

Fig. 3. Operation of the proposed technique when faults occur during the execution of the application of Fig. 1.

After switching back to the indispensable phase, two possible execution scenarios can be considered regarding the task T4:
i) If a fault occurs during the execution of T4 in the indispensable phase (Fig. 3b), when the result comparison indicates fault occurrence, the system does not need to switch to the on-demand phase as the results of three copies of T4 are already available to be voted on (the results of two copies of T4 are obtained in the indispensable phase and the result of another copy of T4 already exists in the internal memory as it was executed in advance in the previous on-demand phase).
ii) If no fault occurs during the execution of T4 in the indispensable phase (Fig.
3a), the result of the copy of T4 that was executed in advance in the previous on-demand phase is no longer required and can be dropped from the memory.

One question that may arise here is what happens if the in-advance execution of T4 becomes faulty. (Such a fault may occur during the in-advance execution of T4 in the on-demand phase, while saving the results of the in-advance execution of T4 between the two phases, or even afterwards in its stored results.) In this case, when the system executes T4 in the indispensable phase, if no fault occurs (Fig. 3c), we will not use the results of the in-advance execution of T4, and hence no problem occurs. However, if the execution of T4 in the indispensable phase also becomes faulty (Fig. 3d), the system cannot mask this second fault as the stored values of the in-advance execution are also faulty. Indeed, a TMR system can mask only one faulty execution for each task (generally speaking, an NMR system can mask up to ⌊(N-1)/2⌋ faulty executions for each task) [9], [10].

Inputs:  G: application task graph
         N: parameter N of NMR, e.g., 3 for TMR
Outputs: S_IND: schedule for the indispensable phase
         S_BP: BP schedule for the on-demand phase
1: S_IND = ListSchedulingLTF(G, ⌊N/2⌋+1);      // Fig. 4b
2: S_TMP = ListSchedulingLTF(G, N-(⌊N/2⌋+1));  // Fig. 4b
3: S_BP = BlockPartitioning(S_TMP);             // Fig. 4c
(a)

function ListSchedulingLTF(G, q)
// G: input task graph, q: number of copies of each task in the
// schedule, S: the output schedule
1: S = Null; // Initialize S with an empty schedule
2: while not all tasks in G are scheduled do
3:    T = the largest unscheduled task in G whose predecessors have all been scheduled;
4:    Add q parallel copies of T to S;
5: endwhile;
6: return S;
(b)

function BlockPartitioning(S)
// S: the input multi-core schedule
1: for each task T_A from the beginning of S do
2:    if T_A overlaps with more than one task, T_B and T_C (where T_C comes after T_B on the same core) then
3:       σ = (finish time of T_A) - (start time of T_C);
4:       shift T_C and all its successors in S to the right by σ;
5:    endif;
6: endfor;
7: for each block B in S do
8:    shift all tasks in B to the right and place them at the end of B;
9: endfor;
10: return S;
(c)

Fig. 4. The proposed scheduling technique.

The in-advance executions of tasks (e.g., T4 in the block B2 in Fig. 3) in the on-demand phase are useful because:
i) Because of the use of parallel execution in the on-demand phase, in-advance executions do not impose any time overhead. For example, it can be seen in Fig. 3 that when the system has to execute T2 in the on-demand phase, the in-advance execution of T4 is performed in parallel with it. Also, because of the use of LTF scheduling, tasks that come later in the schedule (e.g., T4) can never be longer than the tasks that come earlier (e.g., T2), which means that the in-advance execution of T4 cannot lengthen the execution of the block B2 in Fig. 3.
Indeed, if we did not use in-advance executions, we would not have any parallel execution during the on-demand phase, which implies that the use of in-advance executions helps us reserve relatively less slack time for the on-demand phase, resulting in more slack being available for energy management.
ii) Although in-advance executions of tasks in the on-demand phase may turn out to be useless when no fault occurs later during the execution of the task in the indispensable phase, they have a negligible impact on the average energy consumption. This is because an in-advance parallel execution is performed only when a fault occurs in the indispensable phase (for example, in Fig. 3 the in-advance execution of T4 has been performed because a fault has occurred in T2 during the indispensable phase). Note that while from a reliability point of view the consideration of faults is a must, from the average energy consumption point of view we do not need to consider the cases where the system tolerates a fault [6], [13]. As an example, consider T2 and T4 in Fig. 3. Suppose that the probability that a task execution becomes faulty is 10^-4 and the energy consumptions of T2 and T4 are 10 mJ and 5 mJ, respectively. When no fault occurs, the system only executes the two copies of T2 in the indispensable phase and consumes 2×10 = 20 mJ. If a fault occurs during the execution of T2 in the indispensable phase, the system will execute T2 and T4 in the on-demand phase and hence consumes (10+5) = 15 mJ more energy, i.e., (20+15) mJ in total. Since this case is weighted by a fault probability on the order of 10^-4, the average energy consumption for the execution of T2 and T4 is very close to the energy consumption when no faults occur (20 mJ). This is also consistent with our experimental observations showing that the average energy consumption differs by less than 0.01% from the fault-free energy consumption. This example shows that the energy overhead of the in-advance executions is negligible from the viewpoint of average energy consumption. Fig.
4 shows the pseudo-code of the proposed scheduling method used in our technique, which receives an application task graph (G) and makes the schedules for the indispensable and on-demand phases (i.e., S_IND and S_BP respectively). The pseudo-code of Fig. 4a is the main body of the scheduling technique, which calls the functions presented in Figs. 4b and 4c. The function of Fig. 4b (ListSchedulingLTF(G, q)) implements the list scheduling algorithm with the LTF policy to make a schedule S containing q copies of each task from a task graph G. In this function, line 1 initializes the schedule. In line 2, we begin a while loop to apply the scheduling to all tasks. Line 3 implements LTF list scheduling, as it selects the largest unscheduled task T whose predecessors have all been scheduled. In line 4, q parallel copies of T are scheduled. Finally, line 6 returns the schedule S. As can be seen from Fig. 4a, this function is required for both the indispensable and on-demand phases. For the indispensable phase we need a schedule containing ⌊N/2⌋+1 copies of each task, and for the on-demand phase we need a schedule containing the remaining N-(⌊N/2⌋+1) copies of each task. We make these two schedules in lines 1 and 2 of Fig. 4a. In line 3 of Fig. 4a we use the function of Fig. 4c (BlockPartitioning(S)) to convert the schedule S_TMP (the temporary schedule obtained from line 2 of Fig. 4a) to the BP schedule S_BP. The function of Fig. 4c receives a multi-core schedule S and starts from the beginning of the schedule (line 1). In line 2 we check whether each task, say T_A, overlaps with more than one task on another core in the schedule S, say T_B and T_C (where T_C comes after T_B on the same core). If so, through lines 3 and 4 we shift the task T_C (and all its successor tasks in S) to the right until there is no overlap between T_A and T_C. In

line 4, when we shift T_C to the right, we need to shift all the tasks that come after T_C on the same core and the tasks that depend on T_C (successors of T_C in the task graph) but are scheduled on the other cores. After removing possible overlaps in the schedule (i.e., partitioning the schedule into blocks), through lines 7 to 9 we shift all tasks in each block to the right to place them at the end of the block. We will discuss in Section 4 why this is effective for our proposed technique. Finally, in line 10 the schedule S (i.e., a BP schedule) is returned.

It is noteworthy that although the proposed NMR technique needs at least ⌊N/2⌋+1 cores for parallel execution of each task in the indispensable and on-demand phases, if fewer than ⌊N/2⌋+1 cores are available, the proposed technique can still be used (with a slight change) but with less parallelism. Indeed, the technique can even be used on a single core where, for each task, the system first executes ⌊N/2⌋+1 copies of the task one after another (in series) in its indispensable phase and then compares their results. If some faults occur during the indispensable phase, the system executes the remaining copies of the task (again in series) in the on-demand phase and finally all the results are voted on to mask the faults. It should be noted that this reduced parallelism takes more time and hence may not be suitable for real-time systems with tight deadlines. When more cores are available, more parallelism can be achieved, which results in a shorter schedule length and thus higher schedulability [19]. This can also release some static slack time that can be used for energy management.

4 ENERGY MANAGEMENT

For the proposed NMR technique we have implemented a specific energy-management technique which comprises offline (Section 4.2) and online (Section 4.3) stages and exploits different types of slack time to reduce the system energy consumption through DVS [12]. Let W_IND and W_BP be the worst-case times it takes to execute the schedules S_IND and S_BP in the indispensable and on-demand phases respectively.
We need not only to reserve the time W_IND for the indispensable phase but also to reserve the time W_BP for the on-demand phase. Hence, the proposed technique is feasible when W_IND + W_BP ≤ D (D is the application deadline), and the static slack SS which is left over from the application and can be used for energy management is:

SS = D - W_IND - W_BP    (1)

where W_IND + W_BP is the application total execution time. As the amount of static slack is known at design time, offline techniques (e.g., the even slack distribution technique in [20]) can be used at design time to distribute this slack among the tasks. However, in the proposed technique, there are also two other types of slack time that are created at run-time, and hence, unlike the static slack, cannot be allocated at design time and have to be allocated at run-time. These two types of slack are:

Dynamic slack: This slack results at run-time when a task consumes less than its worst-case execution time due to early completion [6], [8], [11]. It should be noted that the actual execution time of a task is not known at design time, and hence the dynamic slack time which is obtained from the task is also not known at design time.

Pseudo-dynamic slack: Although we always reserve enough time to execute the BP schedule completely, we do not usually need to execute the tasks of this schedule at run-time. This is because when the ⌊N/2⌋+1 copies of a task finish successfully during the indispensable phase, this task no longer requires the additional copies in the on-demand phase. Therefore, these task copies can be dropped from the BP schedule, thereby releasing some slack. We have called this slack pseudo-dynamic slack because, just like dynamic slack, it is created at run-time, but unlike dynamic slack, its amount can be calculated offline at design time. When a task T_i executes successfully in the indispensable phase and we drop its copies from the schedule of the on-demand phase, the pseudo-dynamic slack time δ_i is released, which can be exploited by DVS to reduce the energy consumption of the subsequent tasks in the indispensable phase.
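The feasibility condition and equation (1) above can be written as a short check. This is a sketch under stated assumptions: the function name `static_slack` and the numeric values are hypothetical, and all quantities are taken to be in the same arbitrary time unit.

```python
def static_slack(deadline, w_ind, w_bp):
    """Return SS = D - W_IND - W_BP, or None when W_IND + W_BP > D."""
    total = w_ind + w_bp
    if total > deadline:
        return None  # the two reserved schedules do not fit within the frame
    return deadline - total

# Hypothetical values in arbitrary time units.
print(static_slack(deadline=300, w_ind=150, w_bp=100))
print(static_slack(deadline=200, w_ind=150, w_bp=100))
```

Returning `None` for an infeasible configuration keeps it distinct from a feasible one whose static slack happens to be zero.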
As the schedule of the on-demand phase is available at design time, the amount of this reclaimed slack can be calculated offline at design time. To do this, at design time, we consider dropping the tasks from the schedule of the on-demand phase one after another in the order in which they appear in the schedule of the indispensable phase; the time which is released due to dropping a task T_i is the pseudo-dynamic slack δ_i. Fig. 5 shows in more detail how we calculate the pseudo-dynamic slack δ_i which is released after dropping T_i from the schedule of the on-demand phase. To calculate the pseudo-dynamic slack δ_i the following three cases can be considered:
1. Case I (Fig. 5a): If there is no task except T_i in the block, when T_i is dropped from the schedule the released slack δ_i will be W_i + c_i, where W_i is the worst-case execution time of T_i and c_i is the maximum time which is required for comparing the results (majority voting) or saving results.
2. Case II (Fig. 5b): If T_i is the largest task in the block (i.e., W_i ≥ max{W_j} over all the remaining tasks T_j in the block), after dropping T_i from the schedule the value of the pseudo-dynamic slack δ_i is W_i - max{W_j}.
3. Case III (Fig. 5c): If there exists at least one task T_j in the block larger than T_i, after removing T_i from the schedule no pseudo-dynamic slack will be released.

Fig. 5. Pseudo-dynamic slack (δ_i) calculation.

Considering the three cases in Fig. 5, δ_i is calculated as:

$$\delta_i = \begin{cases} W_i + c_i & \text{when only } T_i \text{ exists in the block (Case I)} \\ W_i - \max\{W_j\} & \text{when } W_i \ge \max\{W_j\} \text{ (Case II)} \\ 0 & \text{when } W_i < W_j \text{ for at least one task } T_j \text{ (Case III)} \end{cases} \tag{2}$$

In the following we illustrate how pseudo-dynamic slack is calculated by means of an example. Fig. 5d shows the pseudo-dynamic slack $\delta_i$ which will be released after dropping each task $T_i$ from the BP schedule. The worst-case execution times of the example tasks are shown in Fig. 1a. The tasks are dropped from Fig. 5d in the order in which they are scheduled. In this example, without loss of generality, we assume that comparing results (majority voting) and saving results consume 5 time units for all the tasks (i.e., $c_i = 5$ for all the tasks). It can be seen from the schedule of Fig. 5d that, as the task $T_1$ is a single task in the block $B_1$ (Case I), if we drop $T_1$ from the schedule, the released slack will be $\delta_1 = W_1 + c_1 = 25$. After dropping $T_1$, if we drop $T_2$ from the schedule, as $T_2$ is the largest task in the block $B_2$ (Case II), the released slack will be $\delta_2 = W_2 - W_3 = 20$. After dropping $T_2$, as the task $T_3$ is the largest task in $B_2$, the released slack will be $\delta_3 = W_3 - W_4 = 10$ (Case II). After dropping $T_3$, the task $T_4$ will be a single task in the block $B_2$, and if we drop $T_4$ from the schedule the released slack will be $\delta_4 = W_4 + c_4 = 35$ (Case I). Similarly, we obtain $\delta_5 = 20$ and $\delta_6 = 25$.

Although the amount of the pseudo-dynamic slacks can be calculated at design time, it should be noted that this slack is not available (and hence cannot be allocated) from the beginning of the application execution; it is created at run-time when a task finishes successfully in the indispensable phase. This is why we call it pseudo-dynamic slack. It is noteworthy that the proposed scheduling technique (Section 3) helps to distribute pseudo-dynamic slacks evenly among the tasks. It is known that even slack distribution results in more energy saving as compared to uneven slack distribution [6], [20].
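The three cases of (2) can be checked against the worked example with a short script. This is only a minimal sketch: the function name `pseudo_dynamic_slacks`, the `(name, W)` tuple representation of a block, and the use of one common comparison time `c` for all tasks are assumptions made for illustration. The worst-case execution times used below are the ones implied by the example ($W_1 = 20$, $W_2 = 60$, $W_3 = 40$, $W_4 = 30$ with $c_i = 5$).

```python
def pseudo_dynamic_slacks(block, c):
    """Return {task: delta} for one block of the on-demand (BP) schedule.

    block: list of (name, W) pairs, listed in the order in which the tasks
           are dropped (their order in the indispensable phase).
    c:     result comparison/saving time, assumed equal for all tasks here.
    """
    remaining = dict(block)          # tasks still present in the block
    slacks = {}
    for name, W in block:
        others = [w for n, w in remaining.items() if n != name]
        if not others:               # Case I: the task is alone in the block
            slacks[name] = W + c
        elif W >= max(others):       # Case II: the task is the largest one
            slacks[name] = W - max(others)
        else:                        # Case III: a larger task remains
            slacks[name] = 0
        del remaining[name]          # the task is dropped from the block
    return slacks


# Blocks B1 and B2 of the worked example (c_i = 5 for all tasks).
b1 = pseudo_dynamic_slacks([("T1", 20)], c=5)
b2 = pseudo_dynamic_slacks([("T2", 60), ("T3", 40), ("T4", 30)], c=5)
print(b1, b2)
```

Because each block is processed independently, the slack values for a whole BP schedule can be obtained by calling the function once per block.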
Indeed, pseudo-dynamic slack is prone to be distributed unevenly among the tasks. This is because a pseudo-dynamic slack which is obtained from a task cannot be exploited by the same task or by its previous tasks; it can only be exploited by its subsequent tasks. Therefore, those tasks that appear later in the schedule have more chance to gain larger pseudo-dynamic slacks as compared to the tasks that come earlier. This implies that when pseudo-dynamic slack becomes available sooner rather than later, it can be distributed more evenly. To achieve this, we use two policies in our proposed scheduling technique (Section 3): i) we move tasks in each block of the BP multi-core schedule to the end of the block, thereby enabling the slacks to appear sooner in the block (see Fig. 5b), and ii) we use the LTF policy.

To give an insight into how the LTF policy works, consider the following example. Suppose that three tasks $T_1$, $T_2$ and $T_3$ with worst-case execution times $W_1 = 6$, $W_2 = 3$ and $W_3 = 2$ appear in the LTF order in the indispensable phase. Assuming that these tasks are in one block of the on-demand phase, using (2), the pseudo-dynamic slacks obtained from these tasks will be $\delta_1 = W_1 - W_2 = 3$, $\delta_2 = W_2 - W_3 = 1$ and $\delta_3 = W_3 = 2$ (in this example we assume that $c_i = 0$). However, if the tasks appear in the order $T_3$, $T_1$ and $T_2$, which is not LTF, the pseudo-dynamic slacks will be $\delta_3 = 0$, $\delta_1 = 3$ and $\delta_2 = 3$. Therefore, in the LTF order pseudo-dynamic slacks are available sooner and hence can be distributed among the tasks more evenly.

As we explained earlier, dynamic slack may result at run-time due to early completion of tasks [6], [8], [11]. However, as the actual execution time of a task is not known at design time, the amount of dynamic slack is also not known at design time. Hence, we provide an online energy-management technique to exploit dynamic slacks at run-time (Section 4.3).
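The effect of the task order on when slack becomes available can be reproduced numerically. The following is a small sketch under the same assumptions as the example above (one block, $c_i = 0$); the helper name `released_slacks` is hypothetical, and the running total is used here only as one convenient way to show how early the slack appears.

```python
from itertools import accumulate


def released_slacks(wcets, c=0):
    """Slack released by each task of one block, in the given drop order.

    Combines Cases II and III of (2) as max(W - max(remaining), 0) and
    applies Case I (W + c) to the last task left in the block.
    """
    remaining = list(wcets)
    out = []
    for w in wcets:
        remaining.remove(w)                 # the task leaves the block
        if not remaining:
            out.append(w + c)               # Case I
        else:
            out.append(max(w - max(remaining), 0))   # Case II / Case III
    return out


wcets = [2, 6, 3]                            # W3, W1, W2 in the non-LTF order
ltf_order = sorted(wcets, reverse=True)      # largest task first: [6, 3, 2]

ltf = released_slacks(ltf_order)
non_ltf = released_slacks(wcets)

# Running totals: slack already released after each task finishes.
print(ltf, list(accumulate(ltf)))
print(non_ltf, list(accumulate(non_ltf)))
```

Both orders release the same total slack, but in the LTF order part of it is already available after the first task, whereas in the non-LTF order the first task releases nothing.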
With respect to pseudo-dynamic slacks, since, unlike dynamic slack, the amount of pseudo-dynamic slack is known at design time, we have developed a specific offline technique to manage pseudo-dynamic slacks (Section 3.2).

4.1 Energy and Reliability Models

The power consumption of each task $T_i$ mainly comprises dynamic power $P_{dyn}(T_i)$ and static power $P_{stat}(T_i)$. The dynamic power is determined by [6]:

$$P_{dyn}(T_i) = C_{eff} V_i^2 f_i \tag{3}$$

where $C_{eff}$ is the effective switched capacitance, and $V_i$ and $f_i$ are, respectively, the supply voltage and the operational frequency during the execution of $T_i$ [6], [7]. The static power is mainly comprised of sub-threshold leakage power and can be written as:

$$P_{stat}(T_i) = I_{sub} V_i = I_0 e^{-V_{th}/(\eta V_T)} V_i \tag{4}$$

where $V_i$ is the supply voltage, $I_{sub}$ is the sub-threshold leakage current, $I_0$ depends on technology parameters and device geometries, $\eta$ is a technology parameter, $V_{th}$ is the transistor threshold voltage, and $V_T$ is the thermal voltage [6]. When DVS is used, each task $T_i$ is executed at a voltage $V_i$, which may be less than $V_{max}$ (the maximum possible supply voltage). For each task $T_i$, we define the normalized supply voltage $\rho_i$ as follows:

$$\rho_i = \frac{V_i}{V_{max}} \tag{5}$$

When a task $T_i$ is executed at the scaled voltage $V_i = \rho_i V_{max}$, considering an almost linear relationship between voltage and frequency [6], [7], we have $f_i = \rho_i f_{max}$, where $f_i$ is the operational frequency corresponding to $V_i$ and $f_{max}$ is the maximum possible operational frequency (corresponding to $V_{max}$). Therefore, when DVS is used, the actual execution time of the task is prolonged from $t_i$ to $t_i/\rho_i$, and by substituting $V_i = \rho_i V_{max}$ and $f_i = \rho_i f_{max}$ in (3) and (4), the total energy which is consumed to execute the task $T_i$ is given by [6]:

$$E(T_i) = \left(I_{sub}\,\rho_i V_{max} + C_{eff}\,\rho_i^2 V_{max}^2\,\rho_i f_{max}\right)\frac{t_i}{\rho_i} = \left(P_S + \rho_i^2 P_D\right) t_i \tag{6}$$

where $P_S = I_{sub} V_{max}$ and $P_D = C_{eff} V_{max}^2 f_{max}$ are, respectively, the static and dynamic powers when the system operates at the maximum voltage and frequency. Without considering the energy consumption of the on-demand phase (which commonly has a very low probability of being performed as faults rarely occur [6], [8]), we focus on the indispensable phase and aim at minimizing the fault-free energy consumption (like the works [5], [6], [8]). Using the energy model of (6), which gives the energy consumption of a single task, the energy which is consumed to execute a task $T_i$ in the indispensable phase (i.e., executing $\lceil N/2 \rceil$ copies of the task and comparing the results) can be written as:

$$E_{NMR}(T_i) = \lceil N/2 \rceil \left(P_S + \rho_i^2 P_D\right)(t_i + c_i) \tag{7}$$

where $c_i$ is the result comparison time. Based on (7), the energy consumption of the fault-free execution of an application with $n$ tasks using the proposed NMR technique can be calculated as:

$$E_{app} = \sum_{i=1}^{n} E_{NMR}(T_i) = \sum_{i=1}^{n} \lceil N/2 \rceil \left(P_S + \rho_i^2 P_D\right)(t_i + c_i) \tag{8}$$

As explained in Section 3, the fault-free energy consumption is very close to the average energy consumption. Therefore, we use (8) in our offline energy management at design time (Section 4.2) to minimize the fault-free energy consumption. Also, in our experiments in Section 5 we report the fault-free energy consumption.

Transient faults are usually assumed to follow a Poisson distribution with an average rate $\lambda$ [5], [6]. Considering the effects of DVS on transient fault rates, the fault rate at the scaled supply voltage $V_i = \rho_i V_{max}$ ($\rho_{min} \le \rho_i \le \rho_{max} = 1$) is modeled as [5], [6]:

$$\lambda(\rho_i) = \lambda_0 \, 10^{\frac{d(1-\rho_i)}{1-\rho_{min}}} \tag{9}$$

where $\lambda_0 = \lambda(\rho_{max} = 1)$ is the fault rate at the maximum voltage $V_{max}$, $\rho_{min}$ is the ratio of the minimum supply voltage $V_{min}$ to $V_{max}$, and the exponent value $d$ is a technology-dependent constant [5], [6]. Considering (9) (i.e., the effect of voltage scaling on the transient fault rate), the probability of a task $T_i$ being executed correctly is written as [5], [6]:

$$R_i(\rho_i) = e^{-\lambda(\rho_i)\,\frac{t_i}{\rho_i}} \tag{10}$$

where $\lambda(\rho_i)$ is given by (9) and $t_i/\rho_i$ is the execution time of $T_i$ when executed at $V_i = \rho_i V_{max}$.
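The energy and fault-rate models above translate directly into a few helper functions. The sketch below is illustrative only: the function names are hypothetical, times are assumed to be in seconds so that they match a fault rate in faults/s, and the value $\rho_{min} = 0.5$ in the usage lines is a placeholder rather than a parameter taken from the experiments.

```python
import math


def fault_rate(rho, lam0, d, rho_min):
    """Eq. (9): transient fault rate at normalized supply voltage rho."""
    return lam0 * 10 ** (d * (1 - rho) / (1 - rho_min))


def task_reliability(t, rho, lam0, d, rho_min):
    """Eq. (10): probability that one copy with execution time t (at the
    maximum voltage) runs correctly when stretched to t / rho."""
    return math.exp(-fault_rate(rho, lam0, d, rho_min) * t / rho)


def nmr_task_energy(N, t, c, rho, P_S, P_D):
    """Eq. (7): fault-free energy of one task in the indispensable phase,
    i.e. ceil(N/2) copies, each consuming (P_S + rho^2 * P_D) * (t + c)."""
    return math.ceil(N / 2) * (P_S + rho ** 2 * P_D) * (t + c)


def app_energy(N, tasks, rhos, P_S, P_D):
    """Eq. (8): sum of Eq. (7) over all tasks; tasks is a list of (t, c)."""
    return sum(nmr_task_energy(N, t, c, rho, P_S, P_D)
               for (t, c), rho in zip(tasks, rhos))


# Placeholder parameters, chosen only to exercise the functions.
lam_max_v = fault_rate(1.0, lam0=1e-6, d=3, rho_min=0.5)   # rate at V_max
lam_min_v = fault_rate(0.5, lam0=1e-6, d=3, rho_min=0.5)   # rate at V_min
energy = app_energy(3, [(10, 5), (10, 5)], [1.0, 0.5], P_S=1.0, P_D=1.0)
```

With $d = 3$ the rate grows by three orders of magnitude between the maximum and minimum voltage, which is why scaling the voltage down trades energy for reliability.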
Conversely, the probability of failure of the task $T_i$ (i.e., the unreliability of $T_i$) is denoted by [5], [6]:

$$F_i(\rho_i) = 1 - R_i(\rho_i) = 1 - e^{-\lambda(\rho_i)\,\frac{t_i}{\rho_i}} \tag{11}$$

To calculate the reliability of the proposed two-phase NMR technique we consider two cases: i) the fault-free execution, where all $\lceil N/2 \rceil$ copies of each task are executed successfully in the indispensable phase, and ii) the case where some tasks in the indispensable phase become faulty and we perform the on-demand phase. In NMR, the correct execution of at least $\lceil N/2 \rceil$ copies of each task is required for the system to be functional. In the proposed NMR technique, all the $\lceil N/2 \rceil$ correct executions may be performed in the indispensable phase (when no fault occurs), or some of them are performed in the on-demand phase (when a fault occurs). Therefore, the reliability of the proposed system can be calculated by considering the two cases. The first case gives the reliability of $T_i$ in the fault-free state, and the second case gives the reliability when some faults occur during the execution of $T_i$.

When no fault occurs, $\lceil N/2 \rceil$ copies of each task $T_i$ are executed in the indispensable phase and the on-demand phase is not required. When we use DVS in the indispensable phase, each task $T_i$ is executed at the scaled supply voltage $\rho_i V_{max}$. Therefore, using (10), the reliability of a task $T_i$ in the fault-free case can be calculated as:

$$R_1(T_i) = \left(R_i(\rho_i)\right)^{\lceil N/2 \rceil} \tag{12}$$

where $R_i(\rho_i)$ is given by (10). To calculate the reliability for the case that $k$ ($1 \le k \le \lfloor N/2 \rfloor$) copies of each task become faulty (in NMR up to $\lfloor N/2 \rfloor$ faulty executions can be masked [9], [10]), we consider all the cases that $j$ ($1 \le j \le k$) copies from the $\lceil N/2 \rceil$ copies of $T_i$ in the indispensable phase and $k - j$ copies from the $\lfloor N/2 \rfloor$ copies of $T_i$ in the on-demand phase become faulty. In these cases, the other $\lceil N/2 \rceil - j$ copies in the indispensable phase and $\lfloor N/2 \rfloor - (k - j)$ copies in the on-demand phase are executed correctly.
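The case enumeration described above can be sketched as a double loop over $k$ and $j$. This is a minimal illustration, not the authors' implementation: it assumes $\lceil N/2 \rceil$ copies in the indispensable phase and $\lfloor N/2 \rfloor$ copies in the on-demand phase, takes the per-copy reliabilities of the two phases as plain numbers (`R_ind` at the scaled voltage, `R_od` at the maximum voltage), and uses the hypothetical function name `two_phase_reliability`.

```python
import math
from math import comb


def two_phase_reliability(N, R_ind, R_od):
    """Probability that a task is executed correctly by two-phase NMR.

    R_ind: reliability of one copy in the indispensable phase, Eq. (10)
           evaluated at the scaled voltage.
    R_od:  reliability of one copy in the on-demand phase (maximum voltage).
    """
    n_ind = math.ceil(N / 2)         # copies run in the indispensable phase
    n_od = N // 2                    # copies reserved for the on-demand phase
    F_ind, F_od = 1 - R_ind, 1 - R_od

    # Fault-free case, Eq. (12): every indispensable copy succeeds.
    r_fault_free = R_ind ** n_ind

    # Faulty case: k faulty copies in total, j of them in the indispensable
    # phase (j >= 1, otherwise the on-demand phase would not be run).
    r_faulty = 0.0
    for k in range(1, n_od + 1):
        for j in range(1, k + 1):
            ind = comb(n_ind, j) * F_ind ** j * R_ind ** (n_ind - j)
            od = (comb(n_od, k - j) * F_od ** (k - j)
                  * R_od ** (n_od - (k - j)))
            r_faulty += ind * od
    return r_fault_free + r_faulty


# TMR (N = 3): two copies first, one extra copy only if they disagree.
r_tmr = two_phase_reliability(3, R_ind=0.9, R_od=0.99)
```

For $N = 3$ the result reduces to $R_{ind}^2 + 2 F_{ind} R_{ind} R_{od}$, which is an easy hand check of the loop bounds.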
Therefore, the probability of the correct execution of a task $T_i$ when up to $\lfloor N/2 \rfloor$ executions of $T_i$ become faulty can be calculated using (10) and (11) as:

$$R_2(T_i) = \sum_{k=1}^{\lfloor N/2 \rfloor} \sum_{j=1}^{k} \underbrace{\binom{\lceil N/2 \rceil}{j} F_i(\rho_i)^{j}\, R_i(\rho_i)^{\lceil N/2 \rceil - j}}_{\text{indispensable phase}} \; \underbrace{\binom{\lfloor N/2 \rfloor}{k-j} F_i(\rho_{max})^{k-j}\, R_i(\rho_{max})^{\lfloor N/2 \rfloor - (k-j)}}_{\text{on-demand phase}} \tag{13}$$

where $\rho_i$ determines the scaled voltage which is employed in the indispensable phase, and $\rho_{max} = 1$ is employed in the on-demand phase (as in the on-demand phase no DVS is used and tasks are executed at the maximum supply voltage $V_{max}$). Considering both the fault-free and faulty conditions, the reliability of a task $T_i$ in the presence of up to $\lfloor N/2 \rfloor$ faults, when executed by the proposed NMR technique, can be written as:

$$R(T_i) = R_1(T_i) + R_2(T_i) \tag{14}$$

The reliability of an application execution relies on the correct execution of all its tasks. Therefore, using (14), the reliability of an application with $n$ tasks running under the proposed NMR technique can be calculated as:

$$R_{app} = \prod_{i=1}^{n} R(T_i) \tag{15}$$

4.2 Offline Energy Management

As explained in the previous sections, in the proposed NMR technique, when no fault occurs, we do not execute the on-demand phase (which includes half of the copies of each task, i.e., $\lfloor N/2 \rfloor$), which results in considerable energy saving as compared with conventional NMR. In this section we discuss how the proposed NMR technique exploits static and pseudo-dynamic slack times to achieve even further energy reduction. For this purpose, we develop a specific technique to allocate static and pseudo-dynamic slack times to tasks offline at design time. When we allocate static and pseudo-dynamic slack times, we assume that no dynamic slack exists, as the availability and the amount of dynamic slack times are not known at design time. Indeed, at first we minimize the expected energy consumption of the system by the offline allocation of static and pseudo-dynamic slacks, assuming that no dynamic slack exists. However, at run-time we also exploit dynamic slacks through our online energy management for further energy saving (Section 4.3).

To develop the offline slack allocation, we formulate the problem as an optimization problem whose time constraints are expressed as inequalities. For the first task $T_1$, as the task is executed at the scaled supply voltage $\rho_1 V_{max}$, its worst-case execution time increases from $W_1 + c_1$ to $(W_1 + c_1)/\rho_1$. Considering that the only slack time which is available to $T_1$ is the static slack time ($SS$ given by (1)) and no pseudo-dynamic slack is available to it (as pseudo-dynamic slack is obtained only from the previous tasks and $T_1$ has no previous task), $T_1$ cannot exploit more than the static slack $SS$. So we have:

$$\frac{W_1 + c_1}{\rho_1} - (W_1 + c_1) \le SS \tag{16}$$

It should be noted that although the whole static slack $SS$ is available to the first task $T_1$, this does not necessarily mean that it exploits all of its available slack time. Indeed, each task can exploit only a part of its available slack and set aside the remainder for the subsequent tasks. During the indispensable phase, when a task $T_i$ finishes successfully, the pseudo-dynamic slack $\delta_i$, which is obtained by dropping the task $T_i$ from the on-demand phase, is available to its subsequent tasks in the indispensable phase. Consequently, the task $T_2$ can exploit both the part of the static slack $SS$ left over by $T_1$ and the pseudo-dynamic slack $\delta_1$ which is obtained by dropping $T_1$ from the on-demand phase.
Hence, for the task $T_2$, we have:

$$\frac{W_2 + c_2}{\rho_2} - (W_2 + c_2) \le \underbrace{\delta_1}_{\text{obtained from } T_1} + \underbrace{SS - \left(\frac{W_1 + c_1}{\rho_1} - (W_1 + c_1)\right)}_{\text{static slack left over from } T_1} \tag{17}$$

Similarly, for each task $T_i$ ($1 \le i \le n$) we have:

$$\frac{W_i + c_i}{\rho_i} - (W_i + c_i) \le \underbrace{\sum_{T_j \in \Phi_i} \delta_j}_{\text{obtained from the previous tasks}} + \underbrace{SS - \sum_{T_j \in \Phi_i} \left(\frac{W_j + c_j}{\rho_j} - (W_j + c_j)\right)}_{\text{static slack left over from the previous tasks}} \tag{18}$$

where $\Phi_i$ is the set of all tasks that have been executed before starting the task $T_i$. The optimization problem of the offline part of energy management can be written as:

$$\begin{aligned} \text{minimize:} \quad & E_{app} = \sum_{i=1}^{n} E_{NMR}(T_i) \\ \text{subject to:} \quad & c1: \; \frac{W_i + c_i}{\rho_i} - (W_i + c_i) \le \sum_{T_j \in \Phi_i} \delta_j + SS - \sum_{T_j \in \Phi_i} \left(\frac{W_j + c_j}{\rho_j} - (W_j + c_j)\right) \quad \text{for all } T_i \; (1 \le i \le n) \\ & c2: \; R_{app} \ge R_{demand} \end{aligned} \tag{19}$$

where $E_{app}$ is the energy consumption of an application executed using the proposed NMR technique (given by (8)), the constraint c1 (Inequality (18)) captures the time constraints, i.e., how much slack is available to each task (including pseudo-dynamic and static slack), and the constraint c2 guarantees that the system reliability does not fall below a required level $R_{demand}$. The parameters, namely the worst-case execution times of the tasks ($W_i$), the result comparison times ($c_i$), the static slack time ($SS$) and the pseudo-dynamic slacks ($\delta_i$), are all known at design time. This implies that this optimization problem can be solved offline at design time to determine the $\rho_i$ values which minimize the system energy consumption.

It should be noted, however, that we cannot assign the obtained $\rho_i$ values to the tasks at design time. Rather, we store the $\rho_i$ values, and during the indispensable phase we assign the supply voltage $\rho_i V_{max}$ to the task $T_i$ whenever all its previous tasks finish successfully. In other words, the $\rho_i$ values that we calculate using the proposed offline technique are only valid for the fault-free execution. If some faults occur during the indispensable phase, the $\rho_i$ values will no longer be valid. This is because when a fault occurs in a task $T_i$ during the indispensable phase, the system cannot drop it from the schedule of the on-demand phase, which means that the pseudo-dynamic slack $\delta_i$ will no longer be available.
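To make the time constraint c1 concrete, the sketch below checks Inequality (18) for a candidate voltage assignment and then searches a small set of discrete voltage levels exhaustively. Several simplifications are assumptions of this sketch and not part of the formulation above: tasks are listed in their execution order (so $\Phi_i$ is simply the tasks before $T_i$), worst-case times stand in for $t_i$ because no dynamic slack is assumed offline, the reliability constraint c2 is omitted, and exhaustive search is used only because the example is tiny. All function names are hypothetical.

```python
from itertools import product


def slack_needed(W, c, rho):
    """Left-hand side of (18): extra worst-case time when running at rho."""
    return (W + c) / rho - (W + c)


def satisfies_c1(rhos, W, c, delta, SS, eps=1e-9):
    """Check Inequality (18) for every task, in execution order."""
    available = SS        # static slack plus pseudo-dynamic slack so far
    consumed = 0.0        # slack already used by the previous tasks
    for i, rho in enumerate(rhos):
        need = slack_needed(W[i], c[i], rho)
        if need > available - consumed + eps:
            return False
        consumed += need
        available += delta[i]     # delta_i only helps the tasks after T_i
    return True


def worst_case_energy(rhos, W, c, P_S, P_D, copies):
    """Eq. (8) with t_i replaced by W_i (no dynamic slack assumed)."""
    return sum(copies * (P_S + r ** 2 * P_D) * (W[i] + c[i])
               for i, r in enumerate(rhos))


def brute_force_offline(W, c, delta, SS, levels, P_S, P_D, copies):
    """Return the feasible assignment with the lowest worst-case energy."""
    best, best_energy = None, float("inf")
    for rhos in product(levels, repeat=len(W)):
        if not satisfies_c1(rhos, W, c, delta, SS):
            continue
        energy = worst_case_energy(rhos, W, c, P_S, P_D, copies)
        if energy < best_energy:
            best, best_energy = rhos, energy
    return best, best_energy


# Two tasks, static slack 10; T1 releases 10 units of pseudo-dynamic slack.
with_pds = brute_force_offline([10, 10], [0, 0], [10, 0], 10,
                               (0.5, 1.0), P_S=1.0, P_D=1.0, copies=2)
without_pds = brute_force_offline([10, 10], [0, 0], [0, 0], 10,
                                  (0.5, 1.0), P_S=1.0, P_D=1.0, copies=2)
```

In this toy instance the pseudo-dynamic slack released by the first task is what allows the second task to be slowed down as well; without it, only one of the two tasks can use the static slack. The search cost grows exponentially with the number of tasks, so a realistic design-time flow would need a proper solver.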
One possible solution to this problem is to calculate the $\rho_i$ values offline for all possible fault scenarios and, at run-time, to select the proper set of $\rho_i$ values based on how faults occur. However, we do not use this method, as the fault-free state is the most probable state and hence is the most prominent state from the viewpoint of average energy consumption [6], [8]. Therefore, in the proposed technique we use the $\rho_i$ values that are calculated for the fault-free case. If a fault occurs at run-time, we temporarily stop using the $\rho_i$ values that are calculated offline (as they are no longer valid), and from then on we only use the proposed online management technique (Section 4.3) to allocate pseudo-dynamic slacks. From the beginning of the next frame we again use the $\rho_i$ values that are calculated offline.

4.3 Online Energy Management

Let $x_i$ be the slack (including the pseudo-dynamic and static slacks) that is allocated to a task $T_i$ at design time using the offline part of our energy management (Section 4.2). When DVS is used, the worst-case execution time of the task increases from $W_i + c_i$ to $(W_i + c_i)/\rho_i$. On the other hand, as we exploit the slack $x_i$ by DVS, we can also say that the worst-case execution time of the task increases from $W_i + c_i$ to $W_i + c_i + x_i$. This implies that we have:

$$x_i = \frac{W_i + c_i}{\rho_i} - (W_i + c_i) \tag{20}$$

Indeed, after calculating the $\rho_i$ values by solving the offline optimization problem at design time, we obtain the slack $x_i$ (including pseudo-dynamic and static slacks) that we allocate to a task $T_i$ using (20). At run-time, for each task $T_i$, the total slack time $SL_i$ which is available to the task can be written as:

$$SL_i = x_i + DS_j \tag{21}$$

where $x_i$ is the slack time which has been calculated offline in Section 4.2 (including both pseudo-dynamic and static slacks), and $DS_j$ is the dynamic slack which has been left over by the previous task (the task $T_j$) in the indispensable phase due to early completion at run-time. Since $SL_i$ is the whole slack time which is available to the task $T_i$, the scaled supply voltage $\rho_i V_{max}$ which is assigned to the task must not prolong its worst-case execution time beyond the time $W_i + c_i + SL_i$, i.e., we require:

$$\frac{W_i + c_i}{\rho_i} \le W_i + c_i + SL_i \tag{22}$$

Clearly, the proposed online energy management must take into account the time constraint given by (22). Another important constraint that must be taken into account is for guaranteeing reliability. Let $\rho_i^{rel}$ be the minimum value of $\rho_i$ that does not cause the system reliability to fall below the required level. Clearly we require:

$$\rho_i \ge \rho_i^{rel} \tag{23}$$

In the proposed online energy manager, as DVS-enabled processors usually have discrete voltage/frequency levels (Section 5), we always select the smallest value of $\rho_i$ among the set of possible $\rho_i$ values that satisfies both Inequalities (22) and (23). In order to be able to check Inequalities (22) and (23) at run-time, we need to have the $SL_i$ and $\rho_i^{rel}$ values at run-time. To calculate the slack time $SL_i$ (given by (21)) at run-time, note that the $x_i$ values have been calculated offline and stored to be used at run-time. Also, the dynamic slack time $DS_i$ which is obtained from the task $T_i$ can be easily calculated at run-time as follows. When DVS is used for the first task ($T_1$), the actual execution time of the task is $(t_1 + c_1)/\rho_1$. Since all the slack time which is available to $T_1$ is $x_1$, the maximum time which is available for executing $T_1$ is $W_1 + c_1 + x_1$; therefore the dynamic slack which is obtained from $T_1$ is:

$$DS_1 = (W_1 + c_1 + x_1) - \frac{t_1 + c_1}{\rho_1} \tag{24}$$

For the remaining tasks ($T_i$, $2 \le i \le n$), the maximum available time is $W_i + c_i + x_i + DS_j$ (where $DS_j$ is the dynamic slack which has been left over by the task $T_j$, which is the task that is finished just before starting the task $T_i$). Therefore, we can write:

$$DS_i = (W_i + c_i + x_i + DS_j) - \frac{t_i + c_i}{\rho_i} \tag{25}$$

At the end of each task $T_i$, we can use (25) (except for the first task, for which we use (24)) to calculate $DS_i$ at run-time.
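The run-time steps described above (form the available slack, pick the lowest admissible voltage level, then measure the dynamic slack left for the next task) can be sketched as follows. This is an illustrative sketch only: the function names, the parameter name `rho_rel` for the reliability-imposed lower bound of (23), and the four voltage levels are assumptions, and the fallback to the highest level is a design choice of this sketch.

```python
def select_rho(W, c, SL, rho_rel, levels, eps=1e-9):
    """Smallest discrete rho satisfying both (22) and (23).

    W, c:     worst-case execution time and result comparison time
    SL:       total slack available to the task, Eq. (21)
    rho_rel:  reliability-imposed lower bound on rho, Eq. (23)
    levels:   normalized voltage levels offered by the processor
    """
    for rho in sorted(levels):
        meets_deadline = (W + c) / rho <= W + c + SL + eps     # (22)
        if rho >= rho_rel and meets_deadline:                  # (23)
            return rho
    return max(levels)   # fall back to the highest available level


def dynamic_slack(W, c, x, DS_prev, t, rho):
    """Eq. (25); with DS_prev = 0 it reduces to Eq. (24) for the first task.

    t is the actual execution time at the maximum voltage, so the measured
    run time at the scaled voltage is (t + c) / rho.
    """
    return (W + c + x + DS_prev) - (t + c) / rho


levels = [0.25, 0.5, 0.75, 1.0]          # hypothetical voltage levels

# First task: W = 10, c = 0, offline slack x = 10, no earlier dynamic slack.
rho1 = select_rho(10, 0, SL=10 + 0, rho_rel=0.0, levels=levels)
ds1 = dynamic_slack(10, 0, x=10, DS_prev=0, t=6, rho=rho1)

# Second task: no offline slack, but it inherits the dynamic slack of task 1.
rho2 = select_rho(10, 0, SL=0 + ds1, rho_rel=0.0, levels=levels)
```

Each decision needs only a few comparisons and one subtraction per task, so the bookkeeping fits in a scheduler hook that runs when a task finishes.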
It can be seen from (25) that to calculate the dynamic slack $DS_i$, we need to know $W_i + c_i + x_i$, $DS_j$, and $(t_i + c_i)/\rho_i$. The parameter $W_i + c_i + x_i$ is known at design time, and hence it can be calculated offline and stored to be used at run-time. $DS_j$ is the dynamic slack obtained from the task $T_j$ (which is the task that is finished just before starting the task $T_i$), and is already calculated at the end of $T_j$. $(t_i + c_i)/\rho_i$ is the actual execution time of the task $T_i$ (including the result comparison time), and when the task finishes, its execution time can be easily calculated using the internal system clock (as this execution time is the difference between the start time and finish time of the task). In short, at the end of each task $T_i$, the dynamic slack time $DS_i$ can be calculated at run-time with very low overhead, as its online calculation only requires a few subtraction and addition operations.

The minimum possible value of $\rho_i$ that does not cause the system reliability to fall below the required level (i.e., the lower bound used in (23)) can also be calculated offline at design time. To do this we can solve the optimization problem of (19), but without considering the constraint c1. This is because the constraint c1 is used to consider time constraints, whereas to calculate these lower bounds we want to know which values of $\rho_i$ can guarantee the required level of reliability regardless of time constraints.

5 EVALUATION AND DISCUSSIONS

Experiments in this paper were conducted based on the power model of the Intel PXA270 processor [21]. This processor can operate at different voltage levels in the range of V, and the corresponding frequencies vary from 13MHz to 624MHz. The energy consumption for active cores is calculated by (8), where $P_S$ and $P_D$ (which are respectively the static and dynamic power consumption of the system when operating at the maximum voltage and frequency) are 925mW and 260mW respectively [21]. Also, the Intel PXA270 processor has a low-power sleep mode with mW of idle power consumption.
We considered that when a core is disabled or is temporarily unused, it enters the sleep mode and only consumes the idle power. We modified the tool MEET [22] to profile execution time and energy consumption while using DVS based on the power model of the Intel PXA270. Like the works [5], [6], [13], [16], we performed system-level reliability simulation where the reliability was calculated by (15) and expressed in terms of the application probability of failure $PoF_{app}$ (i.e., $PoF_{app} = 1 - R_{app}$). The fault rate was modelled using (9) under the parameters $\lambda_0 = 10^{-6}$ faults/s and $d = 3$ [5], [6]. Therefore, the fault rate varies between $10^{-6}$ faults/s and $10^{-3}$ faults/s, corresponding to the maximum and minimum voltage levels.

Previous research works on reliable real-time systems that do not use NMR rely on fault-detection mechanisms [5], [6], [7], [8], [11]. However, they have usually overlooked the overhead and fault coverage of detection mechanisms. Indeed, they usually do not consider any specific detection mechanism and simply assume that a detection mechanism with perfect fault coverage is part of the tasks (e.g., [5], [6], [7]). However, to provide fair comparisons, we need to include a real fault-detection mechanism in any implementation of previous works which is used in our comparisons. To do this, we considered that the previous works use fault-detection mechanisms included in their tasks (i.e., software fault-detection mechanisms). We conducted a set of experiments to investigate the energy and execution time overheads of the software fault-detection mechanisms that can be used for previous works. To consider the effect of fault-detection mechanisms on energy and reliability, we used two types of software fault-detection mechanisms in the implementations of previous works that were used in our comparisons:

1. Heavy fault-detection mechanisms (called HFD): with high fault-detection overheads but relatively high fault coverage. For this case we assumed that the system uses multiple fault-detection mechanisms based on code and data redundancy, arithmetic code, consistency check, and control flow checking [9], [10], [23], [24], [25], [26] to achieve high fault coverage for different fault types.
2. Light fault-detection mechanisms (called LFD): with relatively low fault-detection overheads and also low fault coverage. For this case we assumed that the system uses fewer mechanisms to reduce the fault-detection overhead at the cost of decreased detection coverage [26].

TABLE 1. Time (T) and energy (E) overheads of heavy fault-detection (HFD) and light fault-detection (LFD). Columns: Benchmark; No Fault-Detection T(ms), E(mJ); HFD T(ms), E(mJ); LFD T(ms), E(mJ); Overhead (%) of HFD (T, E) and of LFD (T, E). Benchmarks: QuickSort, BitCounts, BasicMath, SusanSmooth, SusanCorners, SusanEdges.

Table 1 shows the time and energy overheads that the software fault-detection mechanisms impose (assuming that we use the supply voltage 1.55V). To measure the overheads, the applications were selected from the MiBench [27] benchmarks. It should be noted that while both the time and energy overheads of software fault-detection mechanisms are lower than the overhead of modular redundancy with result comparison (majority voting), the fault coverage of software mechanisms is not sufficiently high, unlike majority voting, which provides high fault masking [9], [10], [23], [24], [26]. Furthermore, these software fault-detection mechanisms are application-specific, so that each task requires its specific detection mechanism [9], [10], [25], [26], while result comparison and majority voting are general and can be used for any type of task without requiring any hardware modification or redesign [9], [10], [25]. To evaluate the effectiveness of the proposed NMR technique (which we call LE-NMR), we compared LE-NMR with a recent work (proposed in [5]).
To provide a fair comparison, for the implementations of both LE-NMR and the system of [5], we assumed that both use the same level of task replication, i.e., when we consider an NMR with N copies for each task, we also considered that the system of [5] has N-1 backups for each task (i.e., again N copies for each task) to achieve fault tolerance. In addition, the system of [5] requires a fault-detection mechanism to determine whether a backup task must be executed or not. Like most of the previous works, [5] has not addressed any fault-detection mechanism, but we considered that the tasks that are scheduled in the system of [5] use task-specific software mechanisms for fault detection. To do this, we considered implementations of [5] where the tasks included heavy fault-detection mechanisms (called [5]-HFD) and light fault-detection mechanisms (called [5]-LFD). We also considered in our experiments an implementation of conventional NMR, called CNMR, where we do not use the two phases (indispensable and on-demand). In conventional NMR, all N copies of each task are executed in parallel (assuming that enough cores are available) and only the static slack time is used for energy reduction.

It should be noted that there are various techniques to achieve low-energy fault tolerance in real-time systems (e.g., [6], [7], [8], [11], [13], [16], [17], [18]) and it is beyond the scope of this paper to compare the proposed technique with all of them. The main reason for choosing the technique of [5] for the comparison is that it is a recent work with similar conditions to the proposed technique, e.g., hard real-time constraints, the use of DVS, and the frame-based application model with task precedence constraints (a set of dependent tasks with a global deadline) running on multi-core platforms.
Also, it is noteworthy that for many of the previous works it is not meaningful to compare them with the proposed technique because they differ considerably from ours in application model (e.g., a chain of dependent tasks in [6], a single-task frame in [13], periodic tasks in [11], [17], and independent tasks in [18]).

To compare LE-NMR with [5] and conventional NMR, we used both synthetic and practical application task graphs. To do this, we used the task graph generator TGFF [28] and the Standard Task Graph set (STG) [29]. The STG benchmark suite contains both synthetic task graphs and practical real-time application task graphs including robot control, SPEC fpppp and a sparse matrix solver. We also conducted experiments on two other real-world applications: an MPEG4 decoder and an MJPEG encoder (their task graphs can be found in [15]). Fig. 6 and Table 2 show, respectively, the energy consumption and probability of failure for [5]-HFD, [5]-LFD, CTMR, and the proposed LE-TMR when running the practical applications. The following three observations can be made from Fig. 6 and Table 2:

1. LE-TMR not only provides more energy saving (28% on average and up to 33%) as compared to [5]-HFD, but also has a lower probability of failure, i.e., LE-TMR is more reliable.
2. Although LE-TMR provides a relatively smaller energy saving (12% on average) as compared to [5]-LFD, LE-TMR has a far lower probability of failure (it provides much higher reliability).
3. LE-TMR provides more energy saving (34% on average) as compared to CTMR, while providing almost the same level of reliability (Table 2).

Fig. 6. Energy consumption of LE-TMR, [5]-HFD, [5]-LFD, and CTMR when running the practical applications.

TABLE 2. Probability of failure (PoF) for LE-TMR, [5]-HFD, [5]-LFD, and CTMR when running the practical applications. Columns: Application; [5]-HFD; [5]-LFD; CTMR; LE-TMR. Applications: Robot, Sparse, Fpppp, MPEG, MJPEG.

Another set of experiments was conducted in order to analyze how the parallelism degree of task graphs affects the effectiveness of our technique. To do this, synthetic task graphs were generated. It is known that for task graphs with the same number of tasks, the height of the task graph can be used to take the parallelism degree into account [39]. Based on this, three classes of task graphs with different parallelism degrees were considered in the experiment. Let $n$ be the number of nodes (tasks) in a task graph and $h$ be the task graph height. Clearly $h$ can vary between 1 and $n$; therefore the three classes of considered task graphs are: i) task graphs with $1 \le h \le n/3$ (called task graphs with high parallelism degree), ii) task graphs with $n/3 < h \le 2n/3$ (called task graphs with medium parallelism degree), and iii) task graphs with $2n/3 < h \le n$ (called task graphs with low parallelism degree).

The tasks of the synthetic task graphs were randomly selected from the MiBench benchmarks, and the time and energy overheads of the detection mechanisms for these tasks were taken from Table 1. The worst-case and actual execution times ($W_i$ and $t_i$) of the tasks were generated randomly [4], [5], [6]. The worst-case execution times were uniformly distributed between 10ms and 100ms. However, as the actual execution times of each task may have different probability distributions, like the works [4], [5], [6], in our experiments we considered the uniform, normal, or exponential distributions for the actual execution time $t_i$, and each task $T_i$ was executed only for the duration of $t_i$. In the experiment, it was assumed that task graphs with 20, 50, 100, 200 and 500 tasks with different parallelism degrees were executed on multi-core systems with 2, 4, 8, 16 and 32 cores.
Each case (e.g., a task graph with 50 tasks on an 8-core system) was simulated 1500 times with different parameters (i.e., worst-case and actual execution times of the tasks and application deadline), and the average results are reported in Figs. 7 and 8. These figures show the energy consumption and probability of failure (PoF) for LE-TMR, [5]-HFD, [5]-LFD, and CTMR. The following observations can be made from Figs. 7 and 8:

1. It can be seen from Fig. 7 that, for all four systems, as the parallelism degree of task graphs increases, the energy consumption decreases. However, the energy consumption of LE-TMR is always less than that of the other three systems.
2. While the energy consumption of all four systems decreases with the increase in the task graph parallelism degree, LE-TMR benefits from a larger energy reduction than the others. For example, assuming we have 16 cores, as the task graph parallelism degree increases from low (Fig. 7a) to high (Fig. 7c), the energy consumption of LE-TMR reduces from 1698mJ to 1231mJ (28% reduction), while the energy consumption of CTMR reduces from 2132mJ to 1944mJ (9% reduction).
3. As Fig. 8 shows, LE-TMR has a far lower probability of failure than the implementations of [5], even compared to the implementation of [5] that uses heavy fault-detection mechanisms ([5]-HFD). This is because of the superiority of majority voting (NMR) in covering faults as compared to fault-detection mechanisms [9], [10], [24], [25], [26].
4. While LE-TMR provides almost the same reliability as CTMR (Fig. 8), LE-TMR consumes much less energy than CTMR (Fig. 7), mainly because of the more sophisticated energy-management technique that LE-TMR uses.

We also compared LE-NMR with N=5 and N=7 (i.e., LE-5MR and LE-7MR respectively) with [5] and the conventional NMR.
The experiments demonstrate that LE-NMR completely outperforms [5] from both the energy-consumption and reliability viewpoints. LE-5MR and LE-7MR provide on average, respectively, 19% (up to 22%), and 17% (up to 21%), and 31% (up to 36%) energy saving as compared to the corresponding implementations of [5] and the conventional NMR. An interesting observation from the experiments is that none of the implementations of [5] can achieve high reliability (the implementations of [5] cannot achieve a probability of failure less than $10^{-3}$), while LE-NMR satisfies the required reliability level of safety-critical applications, which may require the probability of failure to be less than $10^{-9}$ [6], [9], [10]. This is because the implementations of [5] use software fault-detection mechanisms whose fault coverage is not sufficiently high [9], [10], [25], [26], unlike LE-NMR, which uses majority voting and thereby provides high fault masking [9], [10], [24], [25], [26].

Fig. 7. Energy consumption (mJ) of LE-TMR, [5]-HFD, [5]-LFD, and CTMR versus the number of cores when running the synthetic applications: (a) low parallelism degree, (b) medium parallelism degree, (c) high parallelism degree.

Fig. 8. Probability of failure (PoF) in log scale for LE-TMR, [5]-HFD, [5]-LFD, and CTMR versus the number of cores when running the synthetic applications: (a) low parallelism degree, (b) medium parallelism degree, (c) high parallelism degree.

6 CONCLUSION

In this paper, we described how multi-core platforms can be exploited to achieve high reliability with low energy overhead for hard real-time systems. To do this, we proposed a low-energy NMR technique (which we call LE-NMR). To achieve energy saving in LE-NMR we exploit two main strategies. First, we adopt a two-phase NMR technique, where usually (when no fault occurs) only one phase is executed, resulting in a considerable energy saving compared with conventional NMR systems. Second, to achieve further energy saving, we use DVS. In developing the proposed LE-NMR technique, we have considered two new concepts: i) block-partitioned scheduling and ii) pseudo-dynamic slack management. To exploit the available slacks in the system by DVS, we have developed an energy-management technique with offline and online parts. The offline part derives and solves an optimization problem at design time to exploit the slacks that are known at design time (i.e., static and pseudo-dynamic slacks), and the online part is used to assign dynamic slacks to the tasks at run-time.
The experimental results show that LE-NMR provides up to 34% energy saving and is 6 orders of magnitude more reliable as compared to an implementation of a recent previous work.

ACKNOWLEDGMENT

Mohammad Salehi and Alireza Ejlali acknowledge the Research Vice-Presidency of Sharif University of Technology for funding this work under grant no. G
Bashir M. Al-Hashimi acknowledges the EPSRC (UK) for funding this work in part under grant PRiME EP/K034448/1.
Experimental data used in this paper can be found at DOI: /SOTON/ (

REFERENCES

[1] J. Henkel, V. Narayanan, S. Parameswaran, and J. Teich, "Run-Time Adaption for Highly-Complex Multi-Core Systems," Proc. Ninth IEEE/ACM/IFIP Int'l Conf. Hardware/Software Codesign and System Synthesis (CODES+ISSS'13), pp. 1-8, Sept Oct , doi: /CODES-ISSS
[2] W.Y. Lee, "Energy-Efficient Scheduling of Periodic Real-Time Tasks on Lightly Loaded Multicore Processors," IEEE Trans. Parall. Distr. Syst., vol. 23, no. 3, pp , March 2012, doi: /TPDS
[3] A. Munir, S. Ranka, and A. Gordon-Ross, "High-Performance Energy-Efficient Multicore Embedded Computing," IEEE Trans. Parall. Distr. Syst., vol. 23, no. 4, pp , April 2012, doi: /TPDS
[4] H. Su, D. Zhu, and D. Mosse, "Scheduling Algorithms for Elastic Mixed-Criticality Tasks in Multicore Systems," Proc. IEEE 19th Int'l Conf. Embed. Real-Time Computing Syst. and Applications (RTCSA'13), pp , Aug. 2013, doi: /RTCSA
[5] Y. Guo, D. Zhu, and H. Aydin, "Reliability-Aware Power Management for Parallel Real-Time Applications with Precedence Constraints," Proc. Int'l Green Computing Conf. and Workshops (IGCC), pp. 1-8, July 2011, doi: /IGCC
[6] A. Ejlali, B.M. Al-Hashimi, and P. Eles, "Low-Energy Standby-Sparing for Hard Real-Time Systems," IEEE Trans. Comput.-Aid. Des. Integr. Circuits Syst., vol. 31, no. 3, pp , March 2012, doi: /TCAD
[7] M.K. Tavana, M. Salehi, and A. Ejlali, "Feedback-Based Energy Management in a Standby-Sparing Scheme for Hard Real-Time Systems," Proc. IEEE 32nd Real-Time Systems Symposium (RTSS'11), pp , Nov Dec. 2011, doi: /RTSS
[8] R. Melhem, D. Mosse, and E. Elnozahy, "The interplay of power management and fault recovery in real-time systems," IEEE Trans. Comput., vol. 53, no. 2, pp , Feb 2004, doi: /TC
[9] D.K. Pradhan, Fault-tolerant Computer System Design. Prentice-Hall, Inc., Upper Saddle River, NJ,
[10] I. Koren and C.M. Krishna, Fault-Tolerant Systems. Morgan Kaufmann, Elsevier, San Francisco, CA,
[11] M.A. Haque, H. Aydin, and D. Zhu, "Energy-Aware Standby-Sparing Technique for Periodic Real-Time Applications," Proc. IEEE 29th Int'l Conf. Comput. Design (ICCD'11), pp , Oct. 2011, doi: /ICCD
[12] T.D. Burd, T.A. Pering, A.J. Stratakos, and R.W. Brodersen, "A dynamic voltage scaled microprocessor system," IEEE J. Solid-State Circuits (JSSC), vol. 35, no. 11, pp , Nov. 2000, doi: /
[13] D. Zhu, R. Melhem, D. Mosse, and E. Elnozahy, "Analysis of an Energy Efficient Optimistic TMR Scheme," Proc. Tenth Int'l Conf. Parall. and Distr. Syst. (ICPADS'04), pp , July 2004, doi: /ICPADS
[14] J. Cong and K. Gururaj, "Energy Efficient Multiprocessor Task Scheduling under Input-dependent Variation," Proc. Design, Automation and Test in Europe Conf. and Exhibition (DATE'09), pp , April 2009, doi: /DATE
[15] X. Qi and D. Zhu, "Energy efficient block-partitioned multicore processors for parallel applications," J. Comput. Science Tech., vol. 26, no. 3, pp , May 2011, doi: /s
[16] X. Qi, D. Zhu, and H. Aydin, "Global scheduling based reliability-aware power management for multiprocessor real-time systems," J. Real-Time Syst., vol. 47, no. 2, pp , March 2011, doi: /s x.
[17] M.A. Haque, H. Aydin, and D. Zhu, "Energy-Aware Task Replication to Manage Reliability for Periodic Real-Time Applications on Multicore Platform," Int'l Green Computing Conf. (IGCC'13), pp. 1-11, June 2013, doi: /IGCC

[18] D. Zhu, R. Melhem, and D. Mosse, "Energy Efficient Redundant Configurations for Real-Time Parallel Reliable Servers," J. Real-Time Syst., vol. 41, no. 3, pp. 195-221, April 2009, doi: /s
[19] E.G. Coffman and R.L. Graham, "Optimal Scheduling for Two-Processor Systems," Acta Informatica, vol. 1, no. 3, pp , 1972, doi: /BF
[20] M.T. Schmitz, B.M. Al-Hashimi, and P. Eles, System-Level Design Techniques for Energy-Efficient Embedded Systems. Norwell, MA: Kluwer, 2004.
[21] Intel Corp., Intel PXA270 Processor, Available: http://www.intel.com.
[22] M. Bazzaz, M. Salehi, and A. Ejlali, "An Accurate Instruction-Level Energy Estimation Model and Tool for Embedded Systems," IEEE Trans. Instrum. Meas., vol. 62, no. 7, pp , July 2013, doi: 10.1109/TIM.2013.2248288.
[23] N. Oh, P.P. Shirvani, and E.J. McCluskey, "Control-Flow Checking by Software Signatures," IEEE Trans. Rel., vol. 51, no. 1, pp. 111-122, Mar 2002, doi: 10.1109/24.994926.
[24] J. Aidemark, J. Vinter, P. Folkesson, and J. Karlsson, "Experimental Evaluation of Time-Redundant Execution for a Brake-by-wire Application," Proc. Int'l Conf. Dependable Syst. and Networks (DSN'02), pp , 2002, doi: /DSN
[25] K.S. Yim, V. Sidea, Z. Kalbarczyk, D. Chen, and R.K.A. Iyer, "A Fault-Tolerant Programmable Voter for Software-Based N-Modular Redundancy," Proc. IEEE Aerospace Conf., pp. 1-20, March 2012, doi: /AERO
[26] S. Feng, S. Gupta, A. Ansari, and S. Mahlke, "Shoestring: Probabilistic Soft Error Reliability on the Cheap," Proc. 15th Architectural Support for Programming Languages and Operating Syst. (ASPLOS'10), pp , 2010, doi: /
[27] M.R. Guthaus, J.S. Ringenberg, and D. Ernst, "MiBench: A free, commercially representative embedded benchmark suite," Proc. IEEE Int'l Workshop on Workload Characterization (WWC-4), pp. 3-14, Dec. 2001, doi: /WWC
[28] D. Rhodes and R. Dick, "TGFF: Task Graphs for Free," Proc. 6th Int'l Workshop on Hardware/Software Codesign (CODES/CASHE '98), pp , Mar 1998, doi: /HSC
[29] T. Tobita and H. Kasahara, "A standard task graph set for fair evaluation of multiprocessor scheduling algorithms," J. Scheduling, vol. 5, no. 5, pp , Sep. 2002, doi: /jos.116.
[30] S.-H. Kang, H. Yang, K. Sungchan, I. Bacivarov, S. Ha, and L. Thiele, "Reliability-aware mapping optimization of multi-core systems with mixed-criticality," Proc. Design, Automation and Test in Europe Conf. and Exhibition (DATE'14), pp. 1-4, March 2014, doi: /DATE
[31] J.C. Smolens, B.T. Gold, J. Kim, B. Falsafi, J.C. Hoe, and A.G. Nowatzyk, "Fingerprinting: Bounding Soft-Error-Detection Latency and Bandwidth," IEEE Micro, vol. 24, no. 6, pp , Nov./Dec. 2004, doi: /mm
[32] J. Lee, B. Yun, and K.G. Shin, "Reducing Peak Power Consumption in Multi-Core Systems without Violating Real-Time Constraints," IEEE Trans. Parall. Distr. Syst., vol. 25, no. 4, pp , April 2014, doi: /TPDS
[33] S. Saha, J.S. Deogun, and Y. Lu, "Adaptive energy-efficient task partitioning for heterogeneous multi-core multiprocessor real-time systems," Int'l Conf. High Performance Computing and Simulation (HPCS), pp , July 2012, doi: /HPCSim
[34] S. Rehman, F. Kriebel, M. Shafique, and J. Henkel, "Reliability-Driven Software Transformations for Unreliable Hardware," IEEE Trans. Comput.-Aid. Des. Integr. Circuits Syst., vol. 33, no. 11, pp , Nov. 2014, doi: /TCAD
[35] T. Miller, N. Surapaneni, and R. Teodorescu, "Flexible Error Protection for Energy Efficient Reliable Architectures," 22nd Int'l Symp. Comput. Arch. and High Performance Comput. (SBAC-PAD), pp. 1-8, Oct. 2010, doi: /SBAC-PAD
[36] R. Jeyapaul, F. Hong, A. Rhisheekesan, A. Shrivastava, and K. Lee, "UnSync-CMP: Multicore CMP Architecture for Energy-Efficient Soft-Error Reliability," IEEE Trans. Parall. Distr. Syst., vol. 25, no. 1, pp , Jan. 2014, doi: /TPDS
[37] R. Vadlamani, J. Zhao, W. Burleson, and R. Tessier, "Multicore soft error rate stabilization using adaptive dual modular redundancy," Proc. Design, Automation and Test in Europe Conf. and Exhibition (DATE'10), pp , March 2010, doi: /DATE
[38] T. Wei, P. Mishra, K. Wu, and H. Liang, "Fixed-Priority Allocation and Scheduling for Energy-Efficient Fault Tolerance in Hard Real-Time Multiprocessor Systems," IEEE Trans. Parall. Distr. Syst., vol. 19, no. 11, pp , Nov. 2008, doi: /TPDS
[39] H. Topcuoglu, S. Hariri, and M.-Y. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Trans. Parall. Distr. Syst., vol. 13, no. 3, pp , Mar 2002, doi: /

Mohammad Salehi received the M.S. degree in computer engineering from Sharif University of Technology, Tehran, Iran, in 2010, where he is currently working toward the Ph.D. degree in computer engineering. From 2014 to 2015, he was a visiting researcher in the Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), Germany. His research interests include low-power design of embedded systems, multi-/many-core systems with a focus on dependability/reliability, low power, and the tradeoff between fault tolerance and energy efficiency in real-time systems.

Alireza Ejlali is an Associate Professor of Computer Engineering at Sharif University of Technology, Tehran, Iran. He received a Ph.D. degree in computer engineering from Sharif University of Technology. From 2005 to 2006, he was a visiting researcher in the Electronic Systems Design Group, University of Southampton, UK. In 2006 he joined Sharif University of Technology as a faculty member in the department of computer engineering, and from 2011 to 2015 he was the director of the Computer Architecture Group in this department. His research interests include low power design, real-time embedded systems, and fault-tolerant embedded systems.

Bashir M. Al-Hashimi (M'99-SM'01-F'09) is a Professor of computer engineering, Dean of Faculty of Sciences and Engineering, and the Director of the Pervasive Systems Center, University of Southampton, U.K. He is ARM Professor of computer engineering and the Co-Director of the ARM-ECS Research Center. His research interests include methods, algorithms, and design automation tools for low-power design and test of embedded systems.
