PreDVS: Preemptive Dynamic Voltage Scaling for Real-time Systems using Approximation Scheme

Size: px

Start display at page:

Download "PreDVS: Preemptive Dynamic Voltage Scaling for Real-time Systems using Approximation Scheme"

Candace Jennings
6 years ago
Views:

1 PreDVS: Preemptve Dynamc Voltage Scalng for Real-tme Systems usng Approxmaton Scheme Wexun Wang and Prabhat Mshra Department of Computer and Informaton Scence and Engneerng Unversty of Florda, Ganesvlle, Florda, USA ABSTRACT System optmzaton technques based on dynamc voltage scalng (DVS) are wdely used wth the am of reducng processor energy consumpton. Inter-task DVS assgns the same voltage level to all the nstances of each task. Its ntra-task counterpart explots more energy savngs by assgnng multple voltage levels wthn each task. In ths paper, we propose a voltage scalng technque, named PreDVS, whch assgns voltage levels based on the task set s preemptve schedulng for hard real-tme systems. Our approach s based on an approxmaton scheme hence can guarantee to generate solutons wthn a specfed qualty bound (e.g., wthn % of the optmal) and s dfferent from any exstng nter- or ntratask DVS technques. PreDVS explots statc tme slack at a fner granularty and acheves more energy savng than nter-task scalng wthout ntroducng any extra voltage swtchng overhead. Moreover, t can be effcently employed together wth exstng ntra-task scalng technques. Expermental results demonstrate that PreDVS can sgnfcantly reduce energy consumpton and outperform the optmal nter-task voltage scalng technques by up to 24%. Categores and Subject Descrptors C.3 [SPECIAL-PURPOSE AND APPLICATION-BASED SYS- TEMS]: Real-tme systems and embedded systems General Terms Algorthm, Desgn Keywords Real-tme systems, energy-aware schedulng, dynamc voltage scalng, approxmaton algorthm. INTRODUCTION Energy conservaton has been the man concern n embedded system optmzaton for a long tme snce these systems are generally lmted by battery lfetme. Real-tme embedded systems requre unque desgn consderatons snce task executon must meet ther deadlnes n order to ensure correct system behavor. A task set s sad to be schedulable f all the tasks can fnsh executon wthn ther tme constrants. Perodc task set s of major nterest snce t s most popular n typcal real-tme systems. A task n such systems arrves n regular ntervals and normally has a deadlne equal to ts perod. Earlest Deadlne Frst (EDF) [] and Rate Ths work was partally supported by NSF grant CCF Permsson to make dgtal or hard copes of all or part of ths work for personal or classroom use s granted wthout fee provded that copes are not made or dstrbuted for proft or commercal advantage and that copes bear ths notce and the full ctaton on the frst page. To copy otherwse, to republsh, to post on servers or to redstrbute to lsts, requres pror specfc permsson and/or a fee. DAC 200, June 3-8, 200, Anahem, Calforna, USA. Copyrght 200 ACM $0.00. Vdd τ τ 2 Orgnal Schedule Vdd Tme Inter-task DVS Tme Tme Vdd PreDVS (Our approach) Vdd Intra-task DVS Tme Fgure : Inter-task DVS, PreDVS and Intra-task DVS. Monotonc (RM) [2] are two most commonly used schedulng algorthms n real-tme systems. EDF employs a dynamc prorty scheme and has been proved to be optmal wth an utlzaton bound of [2]. Dynamc voltage scalng (DVS) [3] s wdely acknowledged as one of the most effectve processor energy savng technque. The reason behnd ts capablty to save energy s that lnear reducton n the supply voltage leads to approxmately lnear slow down of performance whle the power can be decreased quadratcally. Dynamc cache reconfguraton technques are recently proposed for real-tme embedded systems to reduce cache herarchy s energy consumpton [4] [5] [6]. In ths paper, we propose a novel voltage scalng technque whch generates voltage assgnment based on the preemptve schedule of the target task set. Unlke heurstcs whch may generate nferor results n certan cases, we develop a fully polynomal approxmaton scheme whch can guarantee to gve solutons wthn specfed qualty bounds. Our approach, named PreDVS, dffers from nter-task and ntra-task DVS as llustrated n Fgure. Specfcally, PreDVS dffers from exstng nter-task scalng technques [7] [8] [9] [0] n that we adjust processor voltage level multple tmes throughout each task s executon to potentally acheve more energy savngs. For example, n Fgure, f the deadlne of task τ s 7, nter-task DVS cannot further lower any task s voltage level otherwse deadlne wll be mssed snce t requres reducton of voltage for all task nstances. However, PreDVS s able to further reduce the energy consumpton by lowerng the voltage level for the frst segment of τ. PreDVS dffers from exstng ntra-task DVS technques [] [2] [3] n the followng ways. Exstng ntra-task scalng methods assume that statc slack allocaton has already been done. They only consder one task nstance (local optmzaton) and focus on explotng dynamc tme slacks generated at runtme due to early fnshed task executon. They requre excessve analyss, runtme trackng and modfcaton of the task source code, whch s not always feasble n realty. Furthermore, they normally result n large number of addtonal voltage swtchng ponts and most of them assume contnuous voltage levels. Our approach ams at statc slack explotaton and s carred out durng desgn tme. In fact, our technque s complementary to exstng ntra-task DVS technques. Any ntra-task scalng can be appled after PreDVS to further reduce energy consumpton at runtme. Our technque consders the problem globally and fnd a voltage assgnment for all tasks so that the total energy consumpton can be mnmzed whle no task deadlne s volated. Snce our approach focuses on statc slack explotaton, we assume every task takes ts worst-case executon tme to complete. We focus on hard real-tme systems wth preemptve perodc task sets n ths work. The system can be executed on any processor wth dscrete voltage/frequency levels. The rest of the paper s organzed as follows. Frst, we for-

2 mulate the problem n Secton 2. Next, we descrbe our problem transformaton scheme and an pseudo-polynomal algorthm whch can gve optmal solutons n Secton 3. We then propose our fully polynomal-tme approxmaton scheme. Secton 4 presents our expermental results. Fnally, Secton 5 concludes the paper. 2. PROBLEM FORMULATION In ths secton, we formulate our problem, prove ts NP-hardness and then dscuss ts unque dffcultes. We are gven: A set of m ndependent perodc tasks T{τ,τ 2,...,τ m }. Each task τ T has known perod p, deadlne d and worstcase executon cycles (WCEC) c. A voltage scalable processor whch supports l dfferent voltage levels V{v,v 2,...,v l }. Task τ T has energy consumpton e k and executon tme t k at processor voltage v k V n the worst-case. Note that we use WCEC c here to reflect the worst-case workload of each task snce t s ndependent of processor frequency level. e k and t k can be computed based on the underlyng processor energy model. We assume that each task s released at the begnnng of every perod and the relatve deadlne s equal to the perod. We prove that our voltage scalng problem s NP-hard by consderng a smplfed verson of our problem nter-task scalng n whch each task τ T s unquely assgned a fxed voltage level throughout all ts jobs. The system schedulablty can be guaranteed by restrctng the total utlzaton rate of task set under the scheduler s bound U. Note that t s suffcent to consder the task schedulng over ts hyper-perod P (equal to the least common multple of all tasks perods) snce perodc task set has repettve executon pattern durng every P. To be more specfc, the smplfed problem can be stated as: subject to, m l P mn(e = x k = k= p ek ) () m l x k tk l U ; = k= p x k = (2) k= In Equaton (), x k s a 0/ varable whch denotes whether task τ s assgned wth voltage level v k. Equaton (2) shows the suffcent condton of schedulablty whch must be satsfed and the assumpton that only one voltage level s assgned to each task. Accordng to [0], the nter-task scalng problem as formulated above s NP-hard. Now let s swtch back to our orgnal problem. By assgnng multple voltage levels at dfferent places throughout each task s executon, more energy savngs can be acheved snce we have more flexblty durng decson makng. Clearly, t s not feasble to consder all possble postons. Task preempton, whch creates multple segments of a sngle job, provdes natural opportuntes to assgn dfferent voltage levels to each task. In ths paper, we examne the EDF schedule when the system s executed wthout DVS and change the processor frequency whenever a job starts executon or resumes after beng preempted. Snce nter-task scalng technques also have to perform voltage swtchng n all these occasons, our strategy does not ntroduce any addtonal runtme overhead. It s mportant snce voltage scalng overhead can have negatve mpact on overall energy consumpton and performance [] [3]. Note that PreDVS leads to dstnct energy consumpton and executon tme for each task s dfferent jobs. As the smplfed verson has been shown to be NP-hard, the orgnal problem s NP-hard as well. Our approach s also applcable wth RM schedulng but s not dscussed n ths paper due to page lmtaton. 3. APPROXIMATION SCHEME Snce PreDVS problem has shown to be NP-hard, the best opton s to devse an effcent method that can lead to approxmateoptmal solutons. As descrbed n Secton 2, the orgnal problem essentally adds another dmenson (voltage/frequency selecton for a task s each segment) to the smplfed verson. Ths fact prevents us from solvng the problem drectly by adaptng approxmaton algorthms for MCKP or nter-task DVS [0]. In ths paper, we make two prmary contrbutons. Frst, we develop a problem transformaton scheme that can elmnate ths complexty. Next, we propose an fully polynomal-tme approxmaton algorthm whch can effcently solve the problem. 3. Problem Transformaton Ths secton descrbes four mportant steps of our problem transformaton scheme. Step : As n tradtonal systems wthout a voltage scalable processor, all the tasks are executed at a fxed frequency. We use the case n whch the processor s runnng solely at the hghest voltage as the baselne and further assume that the gven task set s schedulable n ths case otherwse applyng DVS s not meanngful. We smulate and generate an EDF schedule of the task set on the target system. Each task s set to take ts WCEC c to fnsh. Durng smulaton, we let the scheduler generate the dstnct block lst and dstnct block set lst of each task, whch are defned as: DEFINITION A dstnct block s an executon segment of a task wth a dstnct par of start and end pont. DEFINITION A dstnct block set s a set of dstnct blocks whch compose a whole job of a task. Every dstnct block set has a dfferent set of dstnct block(s) from another. Let b j and s j denote the j th dstnct block and dstnct block set of task τ, respectvely. Fgure 2 llustrates these two terms. In ths example, we consder three tasks: τ {3,3,} 2, τ 2 {5,5,2} and τ 3 {2,2,4}. For task τ, t has only one dstnct block b that s of ts entrety, whch forms ts only dstnct block set s = {b }. Task τ 2, however, has three dstnct blocks: b 2 appears from tme to 3, whch s of ts entrety; b 2 2 appears from tme 5 to 6, whch s ts frst half; b 3 2 appears from tme 7 to 8, whch s ts second half. Therefore, τ 2 has two dstnct block sets: s 2 = {b 2 } and s2 2 = {b2 2, b 3 2 }. Task τ 3, durng ts frst perod, experences two preemptons whch result n three dstnct blocks: b 3 whch s ts frst quarter, b2 3 whch s ts second quarter and b 3 3 whch s the rest half. These three dstnct blocks compose a dstnct block set s 3 = {b 3, b2 3, b3 3 }. In practce, one needs to consder the whole hyper-perod P to collect all the dstnct blocks (and sets) for each task. We denote s j as the number of blocks n s j and δ as the number of dstnct block sets for task τ. τ 3 τ 2 b 3 b 3 2 b 3 3 b 2 b 2 2 b 2 3 b b b b τ Tme Fgure 2: Dstnct block and dstnct block set. Step 2: We calculate the energy consumpton and executon tme for each dstnct block under all voltage levels n V. Let e j,k and t j,k denote these two values of task τ s j th dstnct block b j under voltage v k. Note that energy and tme overhead for voltage transton are ncorporated n them, respectvely. In other words, we generate a profle table for every b j j,k whch stores e and t j,k for all l voltage levels n V. 2 The three elements n the tuple here denote perod, deadlne and worst-case executon tme, respectvely.

3 Step 3: For each dstnct block set s j, dfferent voltage assgnment for every dstnct block n t wll effect the total energy consumpton as well as executon tme for the entre set, whch essentally forms a whole job. In order to take all possble scenaros nto account, we calculate the total energy consumpton and executon tme of all voltage level combnatons, each of whch comprses of one voltage level chosen for each dstnct block n s j j,h. Let E and T j,h stand for the total energy consumpton and executon tme of s j j,h usng voltage level combnaton h. Each par of E and T j,h are stored n the profle table for s j. Furthermore, non-benefcal voltage combnatons, whose energy consumpton and executon tme are domnated by another combnaton n the same set, are elmnated. We use a dynamc programmng based algorthm to generate the profle table for each dstnct block set. The detals are omtted due to lmted space. Obvously, the number of Pareto-optmal combnatons n s j j j s profle table s π l s. Step 4: So far we have obtaned complete proflng nformaton for all the jobs of each task. In order to guarantee the schedulablty, we decde a threshold executon tme t th for each task to be used n schedulablty test ( m t th = p ) that wll act as the upper bound on each job s executon tme. Now the orgnal problem has become how to select one voltage combnaton for each dstnct block set of a task so that the total energy consumpton E = m = δ j= λ j j,h E (where h s the selected profle table entry s ndex for s j and λ j s the number of tmes s j occurs n the hyper-perod P) s mnmzed whle j T j,h t th s satsfed. In ths step, we gve every voltage combnaton a chance to use ts executon tme actng as the threshold for the correspondng task. Once a voltage combnaton of one dstnct block set s pcked as the threshold for task τ, all the other dstnct block set s decsons can be made n a greedy manner, that s, the one wth lowest energy consumpton whle executon tme s less than or equal to the chosen threshold s selected. Note that f, for some other dstnct block sets, no voltage combnaton can make ts executon tme under the threshold, the chosen threshold s nfeasble and thus dscarded. After all the decsons are made, we calculate the total energy consumpton of that entre task and then put them along wth the threshold executon tme nto the aggregated profle table, whch s used as nput to our approxmaton algorthm. Algorthm llustrates ths process. Note that h denote the ndex of the dstnct block set profle table entry whch s currently chosen to act as the threshold. Algorthm Aggregated profle table generaton for τ. Sort each s j j,h s profle table n the ascendng order of E for j = to δ do for h = to π j do sfeasble = true h j = h {h j j denotes the selected ndex of s.} for k = to δ ; k j do for u = to π j do f T j,u T j,h then h u = u ; break end f f h u s not updated then sfeasble = false ; break end f f sfeasble s true then e = δ j= λ j E j,h j {Total energy consumpton of τ } Add (e, T j,h ) nto task τ s aggregated profle table end f Fgure 3: Aggregated profle table generaton for each task. Fgure 3 shows an llustratve example of aggregated profle table generaton. Suppose task τ 3 n Fgure 2 has three dstnct block sets n P and for each of them we have generated profle table n Step 3 as shown on the top-part of Fgure 3. For smplcty, we assume that each of them only occurs once n P. The frst entry n τ 3 s aggregated profle table s calculated as follows. We choose executon tme (26) of the frst entry n s 3 s profle table as the threshold. For s 2 3, the frst entry cannot be selected snce ts executon tme s hgher than the threshold. Hence the second entry s chosen. For the same reason, the second entry s selected for s 3 3. The rest of the table can be generated smlarly. After applyng Algorthm, non-benefcal entres n the aggregated profle table are fltered out. Note that Algorthm assgns the same voltage combnaton to every occurrence of each dstnct block set. Theorem 3. shows that t s reasonable and safe to do so n fndng optmal assgnments. THEOREM 3.. An optmal soluton of our problem must assgn the same voltage combnaton for each occurrence of a dstnct block set. PROOF. Suppose n the optmal assgnment, dstnct block set s j s assgned two dfferent voltage combnatons vc and vc 2. Assume that the threshold executon tme chosen s t th. Thus the executon tme of both vc and vc 2 are less than t th. Snce t s always safe to use voltage combnatons wth executon tme under threshold, one can get a better soluton by replacng the hgher energy consumng one wth the lower one, whch contradcts the fact that the soluton s optmal. Complexty Analyss: We now analyze the complexty of our problem transformaton scheme. Step performs schedulng of the task set. Step 2 requres m = δ j= s j calculatons. Step 3 has a tme complexty of O( m = δ j= l L b ), where L b s the upper bound of the length of lst L b n our dynamc programmng algorthm. Step 4 needs a runnng tme of O( m = δ j= δ (π j )2 ). Each step takes only polynomal tme. It s mportant to note that the problem transformaton ntroduces only desgn-tme computaton overhead. Although the statc overhead depends on the nature of the nput task set, our experments show that t normally takes only n the order of mnutes for common task sets. 3.2 Approxmaton Algorthm The program transformaton descrbed n the last secton results n an aggregated profle table for each task. Each entry of the aggregated profle table (say j th entry of task τ ) represents one possble voltage assgnment (of all dstnct block sets) for task τ and keeps the correspondng total energy consumpton as well as executon tme, denoted by e j and t j j, respectvely. We dvde each t

4 by perod p to represent the utlzaton rate (t j /p ) of the task. Furthermore, let ρ denote the number of entres n task τ s aggregated profle table. We now convert our problem from a mnmzaton verson to a maxmzaton one. Let e max = max{e, e2,..., eρ }. For each e j j, we calculate energy savng e = e max e j. Now the objectve becomes to maxmze total energy savng E = m = er whle satsfy the schedulablty condton T = m = tr U by choosng one and only one entry from the aggregated profle table for each task (here r s the ndex of the chosen entry) Dynamc Programmng Dynamc programmng gves the optmal soluton to our problem. Let e max defned as max{e, e2,..., eρ }. Clearly, E [,me max ]. Let S E denote a soluton n whch we make decsons for the frst tasks and the total energy savng s equal to E whle the utlzaton rate T s mnmzed. A two-dmensonal array s created where each element T[][E] stores the utlzaton rate of S E. Therefore, the recursve relaton for dynamc programmng s: T [][E] = mn j [,ρ ](T [ ][E e j ] +t j ) (3) Usng ths recurson, we fll up T[][E] for all E [,me max ]. Fnally, the optmal energy savng E s found by: E = {max E T [m][e] U} (4) Dynamc programmng acheves the optmal energy savng by teratng over all the tasks ( to m), all possble total energy savng value (from to me max ) and all entres n each task s aggregated profle table (from to ρ ). Ths algorthm flls the array n order so that when calculatng the th row (T[][]), all the prevous ( - ) rows are all flled. Hence, the tme complexty s O(m 2 max{ρ } e max ), whch s pseudo-polynomal snce the last term s unbounded Approxmaton Algorthm The approxmaton algorthm proposed n ths secton s based on dynamc programmng. It reduces the tme complexty by scalng down every e j value by a constant K such that e max /K can be bounded by m (as well as the approxmaton rato ε) whch reduces the complexty to polynomal. By dong so, we actually decrease the sze of the desgn space. Our goal s to guarantee that the energy savng acheved by our approxmaton algorthm s no less than ( ε)e. In order to obtan the constant K, we need to get the lower and upper bound on E. Ths s done by employng a LP-relaxaton verson of our problem by removng the ntegral constrant (choose only one entry out of each task s aggregated profle table), that s, fractona entres are allowed to be chosen. Algorthm 2 shows the polynomal-tme greedy algorthm whch can gve the optmal soluton to the LP-relaxaton problem. Note that ẽ j s the ncremental energy savng a measure of how much more energy savng can be ganed f the j th entry s chosen nstead of the ( j ) th entry from task τ s table. Here, p j represents the ncremental energy savng effcency (e j /t j ). Ũ keeps the resdual utlzaton rate. The algorthm termnates when Ũ s exhausted. The LP-relaxaton optmal choce for task τ, found by the algorthm, s r where x r =. The splt task, τ s, has two fractonal entres beng pcked: r s and r ( p r s s < p r s ). We have the followng lemma: LEMMA 3... If the optmal soluton S LP to the LP-relaxaton verson of our problem has no splt task, t s already the optmal soluton to our orgnal problem. Otherwse, S LP has at most one splt task τ s n whch the two chosen fractonal entres must be adjacent n ts aggregated profle table. PROOF. Snce ths scenaro can be mapped to MCKP, we can reuse the proof shown n [4]. Algorthm 2 Greedy algorthm for LP-relaxaton problem. for = to m do Sort τ s aggregated profle table n ascendng order of t j. p = e /t for j = 2 to ρ do ẽ j = e j e j ; t j = t j j t ; p j = ẽ j / t j Ũ = U m = t ; Ẽ = m = e Sort all the entres from each task s aggregated profle table together n descendng order of p j, assocatng wth the orgnal ndces and j. for each entry (,j) n the sorted order of p do f Ũ t j < 0 then s = ; t = j {Indces of the entry to be splt.} break {Utlzaton rate has exceeded above the bound.} end f Ẽ = Ẽ + p j ; Ũ = Ũ t j x j j = ; x x t s = Ũ/ t s; t x t s return Ẽ = 0 {entry (,j) s chosen nstead of (,j-)} = x t s; Ẽ = Ẽ + p t sx t s Let E 0 be the maxmum of the followng three values: ) total energy savng when the splt task τ s s dscarded; 2) energy savng generated by the frst fractonal entry; 3) energy savng generated by the second fractonal entry. That s, E 0 = max{ m e r, e r s s x r s s, e r s x r s } (5) =; s Note that accordng to Lemma 3.., the last two terms belong to the same task. Now we can gve the upper and lower bound of E, as shown n the followng lemma: LEMMA E 0 dctates the lower and upper bound of the optmal energy savng as: E 0 E 3E 0. PROOF. If E 0 = m =; s er, we can safely obtan hgher overall energy savng by addng e r s s. If E 0 equals to ether of the other two terms, more energy savng can be acheved by addng m =; s emn where e mn = mn{e,e2,...,eρ }. Hence, we have E E 0. Clearly, the soluton to the LP-relaxaton verson must not be worse than the one for the orgnal problem. In other words, Ẽ E. Snce Ẽ = m =; s er + e r s s x r s s + e r s x r s 3E 0, we have E 3E 0. Now we decde the scalng down factor K usng the bounds descrbed above. We prove that approxmaton rato and polynomal tme complexty can be guaranteed f we assgn K = εe 0 m, as shown n the followng lemmas: LEMMA The K-scaled dynamc programmng algorthm generates a ( ε) approxmaton voltage assgnment. PROOF. Let the scaled energy savng value e j = e j /K, we have Ke j e j < K(e j j + ), hence e Ke j < K. Therefore, by accumulatng all m tasks, we have: m e h m K e h < Km (6) = = where h s the ndex of the selected aggregated profle table entry for task τ. Note that the left term of Equaton (6) s the approxmaton scalng error. Snce K = εe 0 m, we have Km = εe 0. Snce E E 0, accordng to Lemma 3..2, we have Km εe. Therefore, the approxmaton error m = eh K m = e h < Km εe. Hence the approxmaton rato ε holds.

5 LEMMA The tme complexty of the K-scaled dynamc programmng algorthm s O( m2 max{ρ } ε ). PROOF. Gven the upper bound of E (E 3E 0 ), the dynamc programmng method can be mproved to search n the range of [,3E 0 ], resultng n a tme complexty of O(m max{ρ } E 0 ). For the scaled verson, the complexty s reduced to O(m max{ρ } E 0 K ). Gven K = εe 0 m, we have E 0 K = m ε, thus the complexty becomes O( m2 max{ρ } ε ), whch s ndependent of any pseudo-polynomal energy values. THEOREM 3.2. The proposed algorthm s a fully polynomal tme ( - ε) approxmaton scheme for the maxmzaton verson of our problem. PROOF. Drectly follows from Lemma 3..3 and Now let s evaluate the qualty of the soluton generated by the converted problem wth respect to our orgnal problem. Let E denote the optmal result (the mnmum energy consumpton) and α denote the approxmaton rato for the orgnal problem. Gven an approxmaton rato ε for the maxmzaton verson, α can be quantfed as: Hence, ( + α)e m = e max ( ε)e (7) = α = m = emax E + εe E E (8) Snce for a specfc soluton, accordng to our converson strategy, we have: E = m = emax E. Therefore, α = E E ε (9) Equaton (9) llustrates that α s related to ε by the factor of E /E, whch s the rato of the total energy savng to total energy consumpton over all tasks. In the worst case, when the overall utlzaton s low enough so that entres wth the lowest energy consumpton are selected for each task, ths rato reaches maxmum. Let v max and v mn denote the maxmum and mnmum voltage avalable, respectvely. We have, α m = (emax e mn ) m ε v max 2 v 2 mn ε (0) = emn v 2 mn Let γ denote ths maxmum rato (thus α γ ε). In practce, gven a voltage scalable processor, we frst calculate ts γ value. If γ, t means that by solvng the converted maxmzaton problem usng approxmaton rato ε, we can get a soluton wth an equal or better qualty bound ( ε) to the orgnal problem. Otherwse, f needed, we can set ε = α/γ so that the specfed approxmaton rato (α) to the orgnal problem can be acheved frmly. As a result, the tme complexty of our approxmaton scheme wth respect to the orgnal problem s O( m2 max{ρ } γ α ). As shown n Secton 4., for common voltage scalable processors, γ s usually very small and n some cases (e.g. StrongARM [5]) s less than. Now we have obtaned an approxmated optmal soluton based on the orgnal EDF schedule whch s generated wthout voltage scalng. As descrbed n Secton 3., we have ensured that the utlzaton bound of EDF s observed, the modfed task set s guaranteed to be schedulable. Note that runnng the task set wth the new voltage assgnment could potentally result n a slghtly dfferent schedule snce we are essentally changng the executon tme of each block. Hence the soluton we gve s essentally wth respect to the orgnal schedule. Certanly, more teratons can be carred out based on the new schedule untl t becomes steady. Based on our observatons, such costly teratons contrbute very lttle n overall energy savngs, and therefore not benefcal. 4. EXPERIMENTS 4. Expermental Setup To demonstrate the effectveness of our approach, we consder two DVS processors: StrongARM [5] and XScale [6]. The former one supports four voltage - frequency levels (.5V - 206MHz,.4V - 92Mhz,.2V - 62MHz and.v - 33MHz) wth γ = 0.86 and the latter one supports fve levels (2.05V - 000MHz,.65V - 800MHz,.3V - 600HMz, 0.99V - 400MHz and 0.7V - 200MHz) wth γ = We compare our results wth two scenaros: when no DVS s used and when optmal nter-task scalng s employed [0]. In the former scenaro, every task s runnng under the hghest voltage level. Whle n the latter scenaro, a dynamc programmng based algorthm s used to obtan the optmal soluton as dscussed n Secton Approxmaton rato α of 0.0, 0.05, 0.0, 0.5 and 0.20 are consdered 3. We mplemented the EDF schedulng smulator along wth all the algorthms n C Results usng Real Benchmarks We frst construct four task sets each of whch conssts of real benchmark applcatons selected from typcal embedded system benchmark sutes MedaBench [7] EEMBC [8] and MBench [9] as shown n Table. Task Set conssts of tasks from MedaBench, Set 2 from EEMBC, Set 3 from MBench, and Set 4 s a mxture from all three sutes. We set each task s utlzaton rate (under the hghest voltage level) randomly n the nterval of [ 0.5 m,.5 m ]. The accumulated overall utlzaton rate s controlled to be wthn [0.7,0.9] for StrongARM and [0.5,0.7] for XScale. Table : Task sets consstng of real benchmarks. Task sets Set Set 2 Set 3 Set 4 Tasks cjpeg, djpeg, epc, mpeg2, pegwt, toast, untoast, rawcaudo A2TIME0, AIFFTR0, AIFIRF0, BaseFP0, BITMNP0, IDCTRN0, RSPEED0, TBLOOK0 qsort, susan, djkstra, patrca, rjndael, adpcm, CRC32, FFT, strngsearch cjpeg, epc, pegwt, A2TIME0, RSPEED0, qsort, susan, djkstra Fgure 4 shows the results n both scenaros. On StrongARM processor, our approxmaton scheme saves up to 34% energy compared to no-dvs and outperforms the optmal nter-task scalng by up to 7% even when the approxmaton rato s set to 0.2 (α = 0.2). On XScale processor, due to larger span between avalable voltage levels and lower overall utlzaton rate, up to 67% energy savng s acheved over no-dvs scenaro and on average 9% extra savng compared to optmal nter-task voltage scalng. 4.3 Results usng Synthetc Tasks We also evaluated our approach by randomly generated task sets wth 5 to 0 tasks per set wth dfferent overall utlzaton rates. We defne the effectve bound of a DVS processor as the followng: any task set wth an overall utlzaton rate equal to or lower than the effectve bound can acheve the optmal voltage assgnment by trvally choosng the lowest voltage for all the tasks. Clearly, the effectve bound s the rato of the lowest frequency to the hghest one. In other words, the effectve bound s 0.64 for StrongARM and 0.2 for XScale. Hence, n the former case, we vary the overall utlzaton rate of each task set from 0.65 to 0.95 at one step of 0.05 whle from 0.3 to 0.9 at one step of 0.. Gven each overall utlzaton rate, we randomly generate task perods n the nterval of [00,30000]. Smlarly as n Secton 4.2, each task s utlzaton rate s evenly dstrbuted between [ 0.5 U m,.5 U m ]. Fgure 5 shows the results whch are the average of 0 randomly generated task sets for each overall utlzaton rate on both DVS processors. Clearly, n all cases, our approach acheves closely approxmated overall energy consumpton wth respect to the optmal 3 Approxmaton rato ε for the maxmzaton verson of our problem s calculated as descrbed n Secton 3.2.2

6 Energy Consumpton Energy Consumpton Set Set 2 Set 3 Set 4 No DVS Inter-task Optmal PreDVS Optmal PreDVS α = 0.0 PreDVS α = 0.05 PreDVS α = 0.0 PreDVS α = 0.5 PreDVS α = (a) StrongARM processor Fgure 4: Results for real benchmark task sets. Set Set 2 Set 3 Set 4 No DVS Inter-task Optmal PreDVS Optmal PreDVS α = 0.0 PreDVS α = 0.05 PreDVS α = 0.0 PreDVS α = 0.5 PreDVS α = 0.20 (b) XScale processor (a) StrongARM processor Fgure 5: Results for synthetc task sets. (b) XScale processor soluton and outperforms nter-task optmal scalng consstently up to 24%. The runnng tme of our algorthm s comparable wth nter-task technque n the optmal soluton case and much shorter n approxmated soluton cases. For example, nter-task DVS [0] and PreDVS optmal algorthms take and mllseconds, respectvely, for one of the synthetc task sets, whle PreDVS approxmaton algorthm only requres hundreds of mllseconds. 5. CONCLUSION In ths paper, we presented a dynamc voltage scalng technque for preemptve real-tme systems usng approxmaton scheme whch acheves sgnfcant energy savngs by assgnng dfferent voltage levels to each task. We proved that the problem s NP-hard and presented an approxmaton scheme by developng a novel transformaton mechansm and a fully polynomal tme approxmaton algorthm. Our approach does not ntroduce any addtonal voltage swtchng overhead compared to nter-task scalng technques. Moreover, our approach explots statc tme slack only and thus can be employed together wth any exstng ntra-task scalng technques. The approxmate solutons gven by our approach outperforms optmal nter-task scalng technques by up to 24%. Our expermental results demonstrated that our approach can generate solutons very close to the optmal. 6. REFERENCES [] G. Buttazzo, Hard Real-Tme Computng Systems. Kluwer, 995. [2] J. Lu, Real-Tme Systems. Prentce Hall, [3] I. Hong et al., Power optmzaton of varable-voltage core-based systems, IEEE TCAD, vol. 8, pp , 999. [4] W. Wang et al., SACR: Schedulng-aware cache reconfguraton for real-tme embedded systems, VLSI Desgn, [5] W. Wang et al., Dynamc reconfguraton of two-level caches n soft real-tme embedded systems, ISVLSI, [6] W. Wang et al., Leakage-aware energy mnmzaton usng dynamc voltage scalng and cache reconfguraton n real-tme systems, VLSI Desgn, 200. [7] R. Jejurkar et al., Energy-aware task schedulng wth task synchronzaton for embedded real-tme systems, IEEE TCAD, vol. 25, pp , [8] G. Quan et al., Energy effcent dvs schedule for fxed-prorty real-tme systems, ACM TODAES, vol. 6, pp. 30, [9] X. Zhong et al., System-wde energy mnmzaton for real-tme tasks: Lower bound and approxmaton, ICCAD, [0] S. Zhang et al., Approxmaton algorthms for power mnmzaton of earlest deadlne frst and rate monotonc schedules, ISLPED, [] J. Seo et al., Profle-based optmal ntra-task voltage schedulng for hard real-tme applcatons, DAC, [2] D. Shn et al., Optmzng ntratask voltage schedulng usng profle and data-flow nformaton, IEEE TCAD, vol. 26, pp , [3] S. Oh et al., Task parttonng algorthm for ntra-task dynamc voltage scalng, ISCAS, [4] H. Kellerer et al., Knapsack Problems. Sprnger-Verlag, [5] Marvell, StrongARM 00 processor, [6] Marvell, XScale mcroarchtecture, [7] C. Lee et al., Medabench: A tool for evaluatng and syntheszng multmeda and communcatons systems, Mcro, 997. [8] EEMBC, EEMBC, The Embedded Mcroprocessor Benchmark Consortum, [9] M. Guthaus et al., Mbench: A free, commercally representatve embedded benchmark sute, WWC, 200.

Real-Time Systems. Multiprocessor scheduling. Multiprocessor scheduling. Multiprocessor scheduling

Real-Time Systems. Multiprocessor scheduling. Multiprocessor scheduling. Multiprocessor scheduling Real-Tme Systems Multprocessor schedulng Specfcaton Implementaton Verfcaton Multprocessor schedulng -- -- Global schedulng How are tasks assgned to processors? Statc assgnment The processor(s) used for