Energy and Feasibility Optimal Global Scheduling Framework on big.LITTLE Platforms

Energy and Feasibility Optimal Global Scheduling Framework on big.LITTLE Platforms

Hoon Sung Chwa, Jaebaek Seo, Hyuck Yoo, Jinkyu Lee, Insik Shin
Department of Computer Science, KAIST, Republic of Korea
Department of Computer Science and Engineering, Sungkyunkwan University, Republic of Korea

Abstract. Motivated by ARM's big.LITTLE, the cutting-edge heterogeneous multi-core architecture that supports migration between cores with different performance and energy efficiency, this paper targets global heterogeneous multi-core scheduling and achieves optimality in terms of both energy consumption and feasibility. To this end, we address the problem of determining not only the static system configurations, such as on-and-off status and voltage/frequency, but also the time-varying schedule of each task. First, we abstract each task's schedule as a rate of the task's workload on each cluster (big and LITTLE) and formulate an optimization problem that achieves both energy and feasibility optimality. We then develop a time-efficient task workload allocation algorithm that assigns the workload rate of each task and the system configurations. To generate task schedules from the algorithm's output, we establish feasibility-optimal scheduling rules for a two-type heterogeneous multi-core platform, which generalize the existing rules for a homogeneous one. Our simulation results demonstrate that our approach yields up to 37% less energy consumption, without compromising feasibility, than the best possible partitioned scheduling approaches obtained by solving an ILP optimization formulation.

I. INTRODUCTION

Advances in chip architectures towards multi-cores have provided two general options for real-time scheduling: global scheduling, which allows a task to migrate from one core to another, and partitioned scheduling, which disallows migration. To exploit their respective advantages, such as the full utilization of the former and the small overhead of the latter, real-time multi-core scheduling has been widely studied, mostly for homogeneous cores [1]. When it comes to heterogeneous multi-core scheduling, very few studies have focused on global scheduling, mainly due to insufficient architectural support for migration in commercial heterogeneous multi-core chips.

Recently, ARM launched a two-type heterogeneous multi-core chip, called big.LITTLE [2], which has been deployed in state-of-the-art smartphones, e.g., the Samsung Galaxy S4 and Note 3. The big.LITTLE architecture consists of two types of cores: high-performance big cores and power-efficient LITTLE cores. One of the most distinguishing features of the big.LITTLE architecture is its practical support for migration; cores of the two different types (big and LITTLE) not only deploy the same instruction-set architecture, but also share a specially designed interconnection bus for data transfer between the clusters. By coupling big and LITTLE cores, the big.LITTLE architecture is capable of global scheduling to achieve high performance with maximum energy efficiency. In real-time systems research, however, almost all energy-aware scheduling approaches for heterogeneous multi-core platforms have focused on partitioned scheduling rather than global scheduling [3], [4], [5], [6]. Motivated by this cutting-edge heterogeneous multi-core architecture, we focus on global scheduling and demonstrate how good global scheduling is for a big.LITTLE platform compared to existing partitioned scheduling approaches, from both the core utilization and energy consumption points of view.
To this end, we would like to achieve the following goals: (a) feasibility optimality: our solution can schedule all jobs in a task set without any job deadline miss, if there exists such a feasible solution, and (b) energy optimality: no feasible solution can result in less energy consumption than ours. We then need to determine the following cluster configurations and job schedules to achieve the goals: (i) the on-and-off status of each cluster, (ii) the static voltage/frequency level of each cluster, and (iii) the schedule of each job (i.e., when and where each job executes). Since it is too complicated to determine (i)-(iii) at once, especially (iii) over the entire time interval of interest, we abstract each task's schedule as a rate of the task's workload on each cluster. We then divide the problem into two: (1) determining the cluster-level static configurations and the workload rate of each task on each cluster, and (2) developing a global scheduling algorithm that generates task schedules from the input assigned in (1). For (1), we reduce the search space by deriving necessary feasibility conditions, and then develop a task workload allocation algorithm that achieves both energy and feasibility optimality. For (2), we establish feasibility-optimal scheduling rules for a two-type heterogeneous multi-core platform, which generalize the existing rules for a homogeneous one (called DP-Fair [7]).

To evaluate our approach, we perform simulations with real system parameters from a big.LITTLE core. Our simulation results demonstrate that the proposed energy/feasibility-optimal global scheduling framework yields up to 37% less energy consumption, without compromising feasibility, than the best possible partitioned scheduling configurations obtained by solving an ILP optimization formulation.

In summary, this paper offers the following contributions.
C1. Addressing a need for global heterogeneous scheduling, motivated by the cutting-edge architecture called big.LITTLE;
C2. Development of the first energy-aware global scheduling framework for a big.LITTLE platform, consisting of C3 and C4;
C3. Development of a low time-complexity optimal task

workload allocation algorithm for big.LITTLE based on necessary feasibility conditions we derive;
C4. Establishment of feasibility-optimal scheduling rules for a two-type heterogeneous multi-core platform, which generalize the DP-Fair rules for a homogeneous one; and
C5. Demonstration of the effectiveness of our solution via simulation.

The rest of the paper is structured as follows. Section II presents our system model and its validation, followed by the problem statement in Section III. Section IV formulates an optimization problem that determines the system configurations and the ratio of workload of each task on the big and LITTLE clusters, and develops a time-efficient energy/feasibility-optimal task workload allocation algorithm. Section V provides feasibility-optimal scheduling rules for implicit-deadline periodic tasks running on two-type heterogeneous multi-core platforms. Section VI evaluates our energy/feasibility-optimal global scheduling framework. Section VII discusses related work, and finally Section VIII concludes this paper.

II. SYSTEM MODEL AND VALIDATION

In this section, we present our system model, including the big.LITTLE platform, task model, and power model. We then validate the system model via experiments on a real big.LITTLE processor.

A. System model

big.LITTLE platforms. The big.LITTLE architecture is a computing platform consisting of two heterogeneous clusters: one with high-performance big cores and the other with power-efficient LITTLE cores. Due to the nature of the big.LITTLE architecture, a big core exhibits high energy consumption with high performance, while a LITTLE core behaves in the opposite way. Both clusters share not only the same instruction-set architecture (ISA), but also a specially designed interconnection bus for data transfer between the clusters [2]. Therefore, it is practical for a task in one cluster to migrate to the other cluster in the middle of execution, which cannot be realized in most existing heterogeneous multi-core architectures.

The big.LITTLE architecture provides dynamic voltage and frequency scaling (DVFS) per cluster. The big (likewise, LITTLE) cluster provides nine (likewise, five) discrete frequency/voltage levels, as shown in Table I [8]. We note that the big.LITTLE architecture supports only cluster-level DVFS, meaning that we can apply different voltage/frequency settings to the big and LITTLE clusters, but cores in the same cluster operate at the same frequency/voltage. Let f_B (V_B) and f_L (V_L) denote the frequency (voltage) of a core in the big and LITTLE cluster, respectively. Among the frequency options in Table I, let f_B,max and f_L,max denote the maximum frequency of a core in the big and LITTLE cluster, respectively. We denote the number of cores in the big and LITTLE cluster by m_B and m_L, respectively.

Task model. We consider an implicit-deadline periodic task model, in which a task τ_i in a task set τ is characterized by (T_i, C_i^B, C_i^L): the period or relative deadline T_i, and the worst-case execution time (WCET) at the maximum frequency on a big and a LITTLE core (denoted by C_i^B and C_i^L, respectively). In general, the WCET is inversely proportional to frequency; therefore, if a big core operates at a given frequency f_B, the WCET of τ_i is calculated as C_i^B * f_B,max / f_B. Likewise, the WCET of τ_i on a LITTLE core at a given frequency f_L is C_i^L * f_L,max / f_L, to be validated in Section II-B. Each task τ_i generates a potentially infinite sequence of jobs every T_i time units, and each job released by τ_i has to complete its execution within T_i time units from its release. We assume that jobs are independent, i.e., they do not share any resources except cores and do not have any data dependencies. A single job cannot be executed on more than one core (regardless of core type) in parallel. As supported by the big.LITTLE architecture, a job can migrate from a big core to a LITTLE one (or from a LITTLE core to a big one); in this case, the amount of execution of τ_i performed on a LITTLE core corresponds to that on a big core multiplied by C_i^L / C_i^B. For example, suppose that C_i^B = 4 and C_i^L = 8, both clusters operate at the maximum frequency, and a job of τ_i executes for one time unit on a big core and then migrates to a LITTLE core. In this case, one time unit of execution on a big core corresponds to 1 * C_i^L / C_i^B = 2 time units of execution on a LITTLE core. Therefore, after migration, the job has 8 - 2 = 6 time units of execution left on a LITTLE core.
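The execution-time accounting across a migration can be made concrete with a small sketch (not taken from the paper; names are illustrative), assuming both clusters run at their maximum frequencies; it reproduces the worked example above.

# Minimal sketch: converting executed time between clusters when a job migrates,
# assuming both clusters operate at their maximum frequencies.
def remaining_on_little(c_big, c_little, executed_on_big):
    """Remaining execution time on a LITTLE core for a job needing c_big time
    units on a big core (c_little on a LITTLE core), after it has already run
    executed_on_big time units on a big core."""
    # One big-core time unit corresponds to c_little / c_big LITTLE-core time units.
    return c_little - executed_on_big * (c_little / c_big)

# Example from the text: C_i^B = 4, C_i^L = 8, one time unit executed on a big core.
print(remaining_on_little(4, 8, 1))  # 8 - 1 * (8/4) = 6.0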
Power model. In a big.LITTLE architecture, a core has three power states: off, idle, and active. In the off state, a core is turned off and cannot execute any job until it is activated (note that activation takes time and power). Once a core is activated (i.e., turned on), it is in either the idle or the active state. A core is said to be in the active state if it has a currently-executing job, and in the idle state otherwise. The power consumption of a core (denoted by P_core) is then expressed as

P_core = P_static + P_dynamic, (1)

where P_static is the power for the core to stay ready to execute (in either the active or the idle state), and P_dynamic is the additional power to execute a job. In other words, a core in the off, idle, and active state exhibits (a) P_static = P_dynamic = 0, (b) P_static > 0 and P_dynamic = 0, and (c) P_static, P_dynamic > 0, respectively. The two terms are modeled as follows [9]:

P_static = C_static * V^ρ  and  P_dynamic = C_dynamic * f * V^2,

where C_static, ρ, and C_dynamic are constants depending on the core type. In addition to the power consumption of each core, a cluster of a big.LITTLE processor consumes power to support the cores in the cluster, called the uncore power consumption (denoted by P_uncore) [9]. That is, if at least one core in a cluster is in either the idle or the active state, P_uncore is positive; otherwise, P_uncore is zero. In order to obtain the values of the hardware-dependent parameters (i.e., C_static, ρ, and C_dynamic), we measured the power consumption on a big.LITTLE development board for the available operating frequency/voltage combinations, as presented in Section II-B.
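As a rough illustration of the power model in (1), the sketch below computes the power of a single core in each state; the numeric parameters in the example call are placeholders for illustration only, not the measured values reported in Table II.

# Minimal sketch of the per-core power model (1); parameter values are hypothetical.
def core_power(state, V, f, C_static, rho, C_dynamic):
    """Power of one core in state 'off', 'idle', or 'active'."""
    if state == "off":
        return 0.0                              # P_static = P_dynamic = 0
    p_static = C_static * V ** rho              # power to stay ready to execute
    if state == "idle":
        return p_static                         # P_dynamic = 0
    return p_static + C_dynamic * f * V ** 2    # active: static plus dynamic power

# Hypothetical parameters, for illustration only.
print(core_power("active", V=1.0, f=1.2e9, C_static=0.1, rho=2.0, C_dynamic=1e-10))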

[Table I: discrete frequency/voltage levels of a big and a LITTLE core (v_B, v_L in V; f_B, f_L in MHz); numeric levels omitted in this transcription.]
[Table II: power model parameters C_static, ρ, and C_dynamic for a big (Cortex-A15) and a LITTLE (Cortex-A7) core; values omitted in this transcription.]
[Fig. 1: Model validation. (a) Task execution model validation for 5 benchmarks; (b) power model validation.]

B. Model validation

Experiment setup. We use the ODROID-XU+E board [10], comprising four Cortex-A15 ("big") cores along with four Cortex-A7 ("LITTLE") cores. The ODROID-XU+E board is equipped with sensors to measure the power consumption of the big and LITTLE clusters individually. In our experiments, we utilize only one big and one LITTLE core. The remaining cores in each cluster are logically turned off using system calls, such that no jobs are scheduled on them. We set the voltage and frequency of each cluster using the Linux userspace governor. We use five types of benchmarks: CPU intensive, cache intensive, memory intensive, I/O intensive with the buffer cache, and I/O intensive without the buffer cache. In the CPU intensive benchmark, a process runs a busy loop with no memory accesses. In the cache intensive benchmark, a process strides through a memory region performing read-modify-write cycles on successive cache lines; the size of the region is twice the L1 cache size. A process in the memory intensive benchmark is the same as in the cache intensive benchmark, except that the working set size is increased to twice the L2 cache size of the big cluster. In the I/O intensive benchmarks, a process writes an image to a file with/without a buffer cache.

Task execution model validation. In order to validate our task execution model, we run the above five types of benchmarks 25 times at each frequency/voltage level and measure the average CPU time. Figure 1(a) shows that each benchmark exhibits different performance between the big and LITTLE clusters, but the CPU time is in inverse ratio to the frequency level within the same cluster. Thereby, the formula C_i^B * f_B,max / f_B discussed in our task model is valid for calculating the WCET of τ_i on a big core at a given frequency f_B. Likewise, C_i^L * f_L,max / f_L is valid on a LITTLE core.

Power model validation. To validate that our power model adequately represents real hardware behavior, we measure the real-time power consumption and obtain the model parameters. We run the CPU intensive benchmark at each frequency level for 5 minutes and read the power sensor data. The measurement results are shown in Figure 1(b). We choose a linear regression method to obtain the parameters, and Table II shows the estimated parameters.
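The paper only states that a linear regression is used; one plausible way to carry it out, sketched below under the assumption that idle-power samples at several voltages and active-power samples at several (f, V) pairs are available, is to fit P_static in log space and then fit C_dynamic by least squares on the residual. The numbers are placeholders.

# One possible parameter-fitting sketch (an assumption, not the paper's exact procedure).
import numpy as np

# Hypothetical measurements: (voltage, idle power) and (frequency, voltage, active power).
idle = [(0.9, 0.10), (1.0, 0.14), (1.1, 0.19), (1.2, 0.26)]
active = [(1.0e9, 0.9, 0.45), (1.4e9, 1.0, 0.75), (1.8e9, 1.1, 1.15)]

# P_static = C_static * V**rho  =>  log P = log C_static + rho * log V (linear in logs).
V = np.array([v for v, _ in idle]); P = np.array([p for _, p in idle])
A = np.vstack([np.ones_like(V), np.log(V)]).T
(logC, rho), *_ = np.linalg.lstsq(A, np.log(P), rcond=None)
C_static = np.exp(logC)

# P_active = P_static + C_dynamic * f * V**2  =>  fit C_dynamic on the residual.
f = np.array([x[0] for x in active]); Va = np.array([x[1] for x in active])
Pa = np.array([x[2] for x in active])
X = f * Va ** 2
C_dynamic = float((X @ (Pa - C_static * Va ** rho)) / (X @ X))

print(C_static, rho, C_dynamic)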
III. PROBLEM STATEMENT

As mentioned in the introduction, this paper considers global preemptive scheduling on a big.LITTLE platform and aims at achieving the following goals:
G1. Feasibility-optimality: our solution schedules all jobs in a task set without any job deadline miss, as long as there exists such a feasible solution; and
G2. Energy-optimality: no feasible solution can yield less energy consumption than our solution.

To achieve the goals, we determine the following cluster configurations and job schedules:
D1. The on-and-off status of each cluster;
D2. The static voltage/frequency level of each cluster; and
D3. The schedule of each job, i.e., the time intervals in which and the core on which each job executes.

For D1, we have three options: (a) both the big and LITTLE clusters are turned on; (b) only the big cluster is turned on; and (c) only the LITTLE cluster is turned on. Note that we do not consider a core-level on-and-off policy, because it has been demonstrated that P_static is negligible compared to P_uncore [9], implying that a core-level on-and-off policy would hardly reduce energy consumption. Thereby, we assume that all cores in a cluster are activated when the cluster is turned on. Let δ_B and δ_L denote the on-and-off status of the big and LITTLE clusters, i.e., 1 if on and 0 otherwise.

When it comes to D2, we have cluster-level discrete choices as shown in Table I, since the big.LITTLE architecture does not support core-level voltage/frequency regulation. Here, we consider static (rather than dynamic) voltage/frequency scaling, in which the operating voltage/frequency does not change over time. In the previous literature [11], [12], the energy-optimal frequency is a constant when each job exhibits its worst-case workload behavior under a convex power consumption function. Therefore, static scaling is not only simpler, but

also minimizes the worst-case energy consumption (when each job exhibits its worst-case execution time). Such worst-case energy behavior is important for the mobile, battery-powered devices in which big.LITTLE processors are deployed.

For D3, it is too complex to determine all the job schedules over the entire time interval of interest, because we must decide not only when, but also where, to execute a job, which yields different execution speeds and energy consumption. Therefore, we abstract the job schedule as the ratio of workload of its invoking task. This is because the ratio of workload of a task not only indicates the amount of execution of the task on each cluster, but also determines the duration for which a core is in the active state due to the task's execution, which can be translated into energy consumption. Let x_i^B and x_i^L denote the fractions of workload for which a task τ_i executes on the big and LITTLE clusters, respectively, where x_i^B + x_i^L = 1. If x_i^B = 1 (x_i^L = 1), τ_i is executed only on the big (LITTLE) cluster. If 0 < x_i^B, x_i^L < 1 holds, τ_i is fractionally executed on both the big and LITTLE clusters.

Using the ratio of workload of each task on each cluster, we divide the problem into the following two steps. At step 1, we determine how much of each task's workload will be executed on the big and LITTLE clusters (i.e., x_i^B and x_i^L for every τ_i in τ). At step 2, we develop a global scheduling algorithm that generates job schedules for the x_i^B and x_i^L of every τ_i in τ determined by step 1. We now summarize our approach to determining D1, D2 and D3 so as to satisfy G1 and G2. Given a feasible task set τ of periodic real-time tasks and a big.LITTLE platform comprising m_B big cores and m_L LITTLE ones:

Step 1. Determine the on-and-off status (δ_B and δ_L) and the static frequency (f_B and f_L) of the big and LITTLE clusters, and the ratio of workload of each task on the big and LITTLE clusters (x_i^B and x_i^L for every τ_i in τ), such that they yield energy-optimality without compromising feasibility; and
Step 2. Develop a global scheduling algorithm that generates job schedules from the values assigned by step 1 such that all jobs meet their deadlines.

To summarize, step 1 performs energy/feasibility-optimal task workload allocation, and step 2 develops a feasibility-optimal global scheduling algorithm; they are presented in Sections IV and V, respectively. We then evaluate our solution in Section VI.

IV. ENERGY/FEASIBILITY-OPTIMAL TASK WORKLOAD ALLOCATION

In this section, we present our approach to determining the on-and-off status and static frequency of both clusters and the ratio of workload of each task on both clusters (i.e., δ_B, δ_L, f_B, f_L, {x_i^B}, and {x_i^L}), as explained in step 1 of Section III. To this end, we first formulate an optimization problem that achieves both energy and feasibility optimality. Then, we derive necessary feasibility conditions for the solution, and based on these conditions, we present a time-efficient task workload allocation algorithm.

A. Problem formulation

Based on our power model, we can calculate the energy consumption of each cluster in an interval. Since job schedules repeat every hyperperiod of a task set (denoted and calculated by H = LCM({T_i})), we calculate the energy consumption in an interval of length H. While P_static and P_uncore depend only on the on-and-off status of the big and LITTLE clusters (i.e., δ_B and δ_L), P_dynamic additionally depends on the execution time of jobs in each cluster. To this end, we calculate the amount of execution of a job of τ_i in each cluster.
Since the execution time of a job of τ_i is C_i^B * f_B,max / f_B if it is fully executed on a big core, x_i^B * C_i^B * f_B,max / f_B is the amount of actual execution of a job of τ_i on a big core. Therefore, we calculate the utilization of τ_i's execution on the big and LITTLE clusters (denoted by u_i^B and u_i^L) as follows:

u_i^B = x_i^B * (C_i^B / T_i) * (f_B,max / f_B),   u_i^L = x_i^L * (C_i^L / T_i) * (f_L,max / f_L). (2)

Note that u_i^B with x_i^B = 1 is said to be the maximum utilization of τ_i's execution on the big cluster, denoted by u_i^B,max; likewise, u_i^L,max denotes u_i^L with x_i^L = 1. Then, in an interval of length H, the cumulative execution time of jobs of τ_i on the big (LITTLE) cluster is H * u_i^B (H * u_i^L). The cumulative energy consumption of the big and LITTLE clusters in an interval of length H (denoted by E_B and E_L) is then calculated as follows:

E_B = δ_B * H * (P_uncore^B + m_B * P_static^B + P_dynamic^B * Σ_{τ_i in τ} u_i^B),
E_L = δ_L * H * (P_uncore^L + m_L * P_static^L + P_dynamic^L * Σ_{τ_i in τ} u_i^L). (3)

Note that if the big (LITTLE) cluster is turned off, i.e., δ_B = 0 (δ_L = 0), the energy consumption of the big (LITTLE) cluster E_B (E_L) is zero. Also, for simplicity of presentation, we refer only to frequency instead of voltage/frequency; as shown in Table I, once the frequency is determined, the corresponding voltage is given. We now formally present the optimization problem (denoted by Energy/Feasibility-OPT) of determining the on-and-off status and static frequency of both clusters and the ratio of workload of each task on both clusters (i.e., δ_B, δ_L, f_B, f_L, {x_i^B}, and {x_i^L}), as follows.

Minimize E_B(δ_B, f_B, {x_i^B}) + E_L(δ_L, f_L, {x_i^L}),
Subject to
C1: for all τ_i in τ, x_i^B + x_i^L = 1,
C2: for all τ_i in τ, u_i^B + u_i^L <= 1,
C3: Σ_{τ_i in τ} u_i^B <= m_B,
C4: Σ_{τ_i in τ} u_i^L <= m_L,
C5: for all τ_i in τ, 0 <= x_i^B, x_i^L <= 1. (4)
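For illustration only, the sketch below evaluates the objective and the constraints of Energy/Feasibility-OPT for one candidate configuration (the constraints are discussed in detail below). It is not the solution algorithm, and all names are illustrative.

# Hedged sketch: evaluate Eqs. (2)-(4) for a given candidate configuration.
# tasks: list of (T, C_B, C_L); x_B[i]: task i's workload ratio on the big cluster.
def evaluate(tasks, x_B, delta_B, delta_L, f_B, f_L, f_Bmax, f_Lmax,
             m_B, m_L, P_unc_B, P_unc_L, P_st_B, P_st_L, P_dy_B, P_dy_L, H):
    u_B, u_L = [], []
    for (T, C_B, C_L), xb in zip(tasks, x_B):
        xl = 1.0 - xb                                   # C1: x_B + x_L = 1
        u_B.append(xb * (C_B / T) * (f_Bmax / f_B))     # Eq. (2)
        u_L.append(xl * (C_L / T) * (f_Lmax / f_L))
    feasible = all(0.0 <= xb <= 1.0 for xb in x_B)                  # C5
    feasible &= all(ub + ul <= 1.0 for ub, ul in zip(u_B, u_L))     # C2
    feasible &= sum(u_B) <= (m_B if delta_B else 0)                 # C3
    feasible &= sum(u_L) <= (m_L if delta_L else 0)                 # C4
    E_B = delta_B * H * (P_unc_B + m_B * P_st_B + P_dy_B * sum(u_B))  # Eq. (3)
    E_L = delta_L * H * (P_unc_L + m_L * P_st_L + P_dy_L * sum(u_L))
    return E_B + E_L, feasible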

In (4), the objective function achieves energy-optimality, while constraints C1-C5 yield feasibility-optimality. Constraint C1 specifies that every task must receive its full amount of execution. Constraint C2 asserts that each task cannot be executed on both clusters at the same time. Constraints C3 and C4 assert that the total workload allocated to each cluster must be less than or equal to the capacity of that cluster. These constraints correspond to the feasibility conditions presented in [13]. Note that if the big (LITTLE) cluster is turned off, we set m_B (m_L) to zero to reflect the non-availability of the cluster.

When we solve the optimization problem, we do not assume any particular relation between the big and LITTLE clusters in terms of energy consumption and execution speed. In the real world, a big core consumes more power and needs less execution time for a job than a LITTLE core. However, we do not rely on such a relation when solving the optimization problem; instead, we seek a general solution.

B. Necessary feasibility conditions

There are many ways to solve our optimization problem, but we would like to solve it in a time-efficient manner. Since there are only a few cluster configurations of on-and-off status and static frequency, we compute solutions of {x_i^B, x_i^L} for all possible cluster configurations (i.e., δ_B, δ_L, f_B, and f_L) and choose the best among them. To solve the optimization problem for a given cluster configuration, this subsection investigates necessary conditions on each task's ratio of workload on the big and LITTLE clusters (i.e., {x_i^B, x_i^L}), which will be the basis for the solution algorithm presented in Section IV-C.

By the parallelism restriction constraint C2, if there exists a task τ_i such that u_i^B,max > 1 and u_i^L,max > 1 hold, the task cannot satisfy the constraint, which leads to infeasibility. However, if u_i^L,max > 1 and u_i^B,max <= 1 hold, we may find a feasible solution by moving some workload of τ_i from the LITTLE to the big cluster. In this case, there must exist a minimum ratio of τ_i's workload on the big cluster so as to satisfy constraint C2. Conversely, if u_i^L,max <= 1 and u_i^B,max > 1 hold, there must exist a minimum ratio of τ_i's workload on the LITTLE cluster, by the same reasoning. If u_i^L,max <= 1 and u_i^B,max <= 1 hold, a task τ_i always satisfies constraint C2. Recall that the maximum utilization of τ_i's execution on the big and LITTLE clusters is u_i^B,max = (C_i^B / T_i) * (f_B,max / f_B) and u_i^L,max = (C_i^L / T_i) * (f_L,max / f_L), respectively. The following lemma calculates such minimum ratios.

Lemma 1: The minimum workload ratio of each task on the big and LITTLE clusters (denoted by lo_i^B and lo_i^L) is calculated by

lo_i^B = (u_i^L,max - 1) / (u_i^L,max - u_i^B,max)  if u_i^L,max > 1,  and lo_i^B = 0 otherwise. (5)
lo_i^L = (u_i^B,max - 1) / (u_i^B,max - u_i^L,max)  if u_i^B,max > 1,  and lo_i^L = 0 otherwise. (6)

Proof: The minimum ratio of each task's workload on the big cluster (lo_i^B) is induced by constraint C2. In constraint C2, if we substitute x_i^L with 1 - x_i^B based on constraint C1, we can calculate lo_i^B. The same holds for lo_i^L. Details are given in Appendix A.
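A minimal sketch of Lemma 1 follows, assuming the reconstruction of (5) and (6) above; it returns the minimum ratios a single task must keep on each cluster so that C2 can still hold.

# Hedged sketch of Lemma 1 (per task), assuming the formulas (5)-(6) above.
def min_ratios(u_B_max, u_L_max):
    """Return (lo_B, lo_L) for a task with maximum utilizations u_B_max and u_L_max,
    or None if C2 (u_B + u_L <= 1) can never be satisfied for this task."""
    if u_B_max > 1.0 and u_L_max > 1.0:
        return None                                                      # infeasible task
    lo_B = (u_L_max - 1.0) / (u_L_max - u_B_max) if u_L_max > 1.0 else 0.0   # Eq. (5)
    lo_L = (u_B_max - 1.0) / (u_B_max - u_L_max) if u_B_max > 1.0 else 0.0   # Eq. (6)
    return lo_B, lo_L

# Example: u_B_max = 0.6, u_L_max = 1.2 => lo_B = 1/3, which makes C2 tight
# (u_B = 0.2 and u_L = (2/3) * 1.2 = 0.8 sum to exactly 1).
print(min_ratios(0.6, 1.2))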
Once lo_i^B and lo_i^L are calculated, the rest of each task's workload ratio (i.e., 1 - lo_i^B - lo_i^L, by constraint C1) can be allocated to either cluster as long as the cluster capacity constraints are met. We let y_i^B and y_i^L denote the workload ratios excluding lo_i^B and lo_i^L (i.e., x_i^B = y_i^B + lo_i^B and x_i^L = y_i^L + lo_i^L), respectively. Then, the constraints C1-C5 for guaranteeing feasibility can be reduced to

C1: for all τ_i in τ, y_i^B + y_i^L = 1 - lo_i^L - lo_i^B,
C3: Σ_{τ_i in τ} y_i^B * u_i^B,max <= m_B - Σ_{τ_i in τ} lo_i^B * u_i^B,max,
C4: Σ_{τ_i in τ} y_i^L * u_i^L,max <= m_L - Σ_{τ_i in τ} lo_i^L * u_i^L,max,
C5: for all τ_i in τ, 0 <= y_i^B, y_i^L <= 1 - lo_i^L - lo_i^B.

Note that constraint C2 is removed. This is because constraint C2 is never violated if x_i^B and x_i^L are assigned at least lo_i^B and lo_i^L, respectively, by Lemma 1.

C. Solution to the optimization problem

Since we have reduced the problem by allocating the lo_i^B and lo_i^L portions to the big and LITTLE clusters, respectively, the remaining step is to determine {y_i^B} and {y_i^L} such that the total energy consumption is minimized while satisfying C1-C5. Each task has different energy efficiency on the two clusters. A task τ_i consumes energy at the rate of u_i^B,max * P_dynamic^B if fully allocated to the big cluster, and u_i^L,max * P_dynamic^L if fully allocated to the LITTLE cluster. We define ef_i as τ_i's energy efficiency ratio of the big cluster to the LITTLE cluster, expressed as

ef_i = (u_i^B,max * P_dynamic^B) / (u_i^L,max * P_dynamic^L). (7)

If ef_i > 1, executing τ_i on the LITTLE cluster is more energy-efficient than on the big cluster; on the contrary, if ef_i < 1, the converse holds. Thereby, if there were no capacity limit on each cluster, allocating all of the remaining workload of each τ_i to its energy-efficient cluster would consume the least energy. However, each cluster has a capacity limit, as shown in constraints C3 and C4, so it might be impossible to allocate all τ_i with ef_i > 1 to the LITTLE cluster (or all τ_i with ef_i < 1 to the big cluster). Consequently, we need to rearrange the task workload allocation in order to satisfy the cluster capacity limits.

We design an optimal task workload allocation algorithm based on this understanding of per-task energy efficiency on each cluster (see Algorithm 1). The task workload allocation works in two stages: 1) allocating workload in a way that consumes the minimum energy, assuming infinite capacity on both clusters, and 2) rearranging the workload to satisfy the feasibility conditions, especially the cluster capacity constraints. Before stage 1), we calculate the minimum workload

ratios (lo_i^B, lo_i^L) that must be allocated to the big and LITTLE clusters and allocate them on each cluster.

Algorithm 1 Optimal-Task-Workload-Allocation
1: τ^L <- {τ_i | ef_i >= 1}
2: τ^B <- {τ_i | ef_i < 1}
3: Allocate {lo_i^B}, {lo_i^L} according to Lemma 1
4: Allocate y_i^L <- 1 - lo_i^B - lo_i^L, y_i^B <- 0 for all tasks in τ^L
5: Allocate y_i^L <- 0, y_i^B <- 1 - lo_i^B - lo_i^L for all tasks in τ^B
6: if both C3 and C4 are satisfied then
7:    return {x_i^B | x_i^B = y_i^B + lo_i^B}, {x_i^L | x_i^L = y_i^L + lo_i^L}
8: else if both C3 and C4 are not satisfied then
9:    return not feasible
10: else if only C3 is satisfied then
11:    repeat
12:        find τ_k with the ef_k closest to 1 in τ^L
13:        if C4 remains violated even after moving all of y_k^L to the big cluster then
14:            y_k^L <- 0, y_k^B <- 1 - lo_k^B - lo_k^L
15:            τ^L <- τ^L \ {τ_k}
16:        else
17:            decrease y_k^L (and increase y_k^B accordingly) by just enough to make C4 tight
18:        end if
19:        if C3 is violated then
20:            return not feasible
21:        end if
22:    until C4 is satisfied
23: else if only C4 is satisfied then
24:    Perform the process corresponding to lines 11-22 with the roles of the two clusters exchanged
25: end if
26: return {x_i^B | x_i^B = y_i^B + lo_i^B}, {x_i^L | x_i^L = y_i^L + lo_i^L}

In stage 1), Algorithm 1 partitions the task set into two groups according to energy efficiency on each cluster. Let τ^B and τ^L denote the collections of tasks that are more energy-efficient when executing on the big and LITTLE clusters, respectively (lines 1-2). Then, we allocate the rest of the workload (beyond lo_i^B, lo_i^L) of all tasks in τ^B to the big cluster and the rest of the workload of all tasks in τ^L to the LITTLE cluster (lines 4-5). In stage 2), we check whether the allocation done by stage 1) satisfies the cluster capacity constraints C3 and C4. There are four cases: i) if both C3 and C4 are satisfied, the allocation done by stage 1) is an energy-optimal solution that also satisfies all feasibility conditions (lines 6-7); ii) if neither C3 nor C4 is satisfied, there is no feasible workload allocation, meaning that the task set is not feasible (lines 8-9); iii) if only C3 is satisfied, some workload allocated to the LITTLE cluster must be moved to the big cluster until C4 is satisfied (lines 10-22); and iv) if only C4 is satisfied, some workload allocated to the big cluster must be moved to the LITTLE cluster until C3 is satisfied (lines 23-25). In the process of rearranging workload for cases iii) and iv), if there is no available capacity on the target cluster to accommodate the remaining workload, there is no feasible workload allocation (lines 19-21).
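The Python sketch below mirrors the allocation flow just described for the case where only C4 is violated after stage 1 (the symmetric case is omitted). It is an illustration of Algorithm 1 under an assumed task representation, not a verbatim transcription.

# Hedged sketch of the allocation flow of Algorithm 1; the task dictionaries
# (u_B_max, u_L_max, ef, lo_B, lo_L) are an assumed representation.
def allocate(tasks, m_B, m_L):
    """Return per-task (x_B, x_L) or None if no feasible allocation exists."""
    # Stage 1: put the flexible ratio y of each task on its energy-efficient cluster.
    for t in tasks:
        rest = 1.0 - t["lo_B"] - t["lo_L"]
        t["y_B"], t["y_L"] = (0.0, rest) if t["ef"] >= 1.0 else (rest, 0.0)

    def load_B():   # utilization currently placed on the big cluster
        return sum((t["y_B"] + t["lo_B"]) * t["u_B_max"] for t in tasks)
    def load_L():   # utilization currently placed on the LITTLE cluster
        return sum((t["y_L"] + t["lo_L"]) * t["u_L_max"] for t in tasks)

    ok_B, ok_L = load_B() <= m_B, load_L() <= m_L
    if ok_B and ok_L:
        pass                         # stage-1 allocation is already feasible (case i)
    elif not ok_B and not ok_L:
        return None                  # both clusters over capacity (case ii)
    elif ok_B:                       # only C4 violated (case iii): move LITTLE -> big
        little = sorted([t for t in tasks if t["ef"] >= 1.0], key=lambda t: t["ef"])
        for t in little:             # ef closest to 1 first (all ef >= 1 in this group)
            excess = load_L() - m_L
            if excess <= 0:
                break
            move = min(t["y_L"], excess / t["u_L_max"])   # ratio moved off LITTLE
            t["y_L"] -= move
            t["y_B"] += move
            if load_B() > m_B:
                return None          # big cluster out of capacity: infeasible
        if load_L() > m_L:
            return None
    else:                            # only C3 violated (case iv): symmetric, omitted
        raise NotImplementedError("swap the roles of the clusters")
    return [(t["y_B"] + t["lo_B"], t["y_L"] + t["lo_L"]) for t in tasks]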
The key issue in rearranging workload is choosing the tasks whose workload will be re-allocated. When some workload is moved from its energy-efficient cluster to the other one, the energy consumption is bound to increase. We need to move the workload in a way that minimizes the increase in energy consumption arising from workload migration. In addition, it should be done in the way most beneficial to feasibility. In the following theorem, we show that choosing tasks in the order of ef_i closest to 1 not only minimizes the increase in energy consumption, but is also beneficial for feasibility. We now prove that our task workload allocation algorithm achieves both feasibility-optimality and energy-optimality.

Theorem 1: The task workload allocation presented in Algorithm 1 achieves both feasibility-optimality and energy-optimality.

Proof: We only consider the case in which both clusters are turned on (i.e., δ_B = 1 and δ_L = 1), since in the other cases, where one of the clusters is turned off, there is only one possible task allocation, in which all tasks are allocated to the cluster that is turned on. Algorithm 1 works in two stages: 1) allocating the workload of all tasks in τ^L to the LITTLE cluster and the workload of all tasks in τ^B to the big cluster (lines 1-5), and 2) rearranging the workload to satisfy the feasibility conditions, especially the cluster capacity constraints (lines 6-25). After stage 1), there are four cases: i) both C3 and C4 are satisfied; ii) neither C3 nor C4 is satisfied; iii) only C3 is satisfied; iv) only C4 is satisfied. We first show energy-optimality for each of cases i)-iv), and then prove feasibility-optimality. Recall that τ^L = {τ_i | ef_i >= 1} and τ^B = {τ_i | ef_i < 1}. For any variable X, we denote by ΔX the amount of variation of X throughout the remainder of this proof.

We now prove energy-optimality. We denote by ΔU^L = Σ Δu_i^L the amount of workload moved from the LITTLE cluster to the big cluster, where Δu_i^L is the amount of workload of τ_i in ΔU^L. If we move ΔU^L to the big cluster, the change in workload on the big cluster (denoted by ΔU^B) is calculated as

ΔU^B = Σ Δu_i^B = Σ Δy_i^B * u_i^B,max = Σ (-Δy_i^L) * u_i^B,max = -Σ Δu_i^L * (u_i^B,max / u_i^L,max). (8)

Then, the variation of the total energy consumption on the big and LITTLE clusters (denoted by ΔE) is

ΔE = ΔE_L + ΔE_B
   = H * P_dynamic^L * Σ Δu_i^L + H * P_dynamic^B * Σ Δu_i^B
   = H * P_dynamic^L * Σ Δu_i^L - H * P_dynamic^B * Σ Δu_i^L * (u_i^B,max / u_i^L,max)
   = H * Σ Δu_i^L * (P_dynamic^L - P_dynamic^B * u_i^B,max / u_i^L,max)
   = H * Σ P_dynamic^L * Δu_i^L * (1 - ef_i). (9)

Note that all variables in (9) are constant except Δu_i^L. In case i), if τ_i in τ^L moves from the LITTLE cluster to the big cluster, Δu_i^L < 0 and 1 - ef_i <= 0, thus ΔE >= 0. If τ_i in τ^B moves from the big cluster to the LITTLE cluster, ΔE > 0 by the same reasoning. This means that moving any

task cannot decrease the total energy consumption in this case. Therefore, stage 1) of Algorithm 1 is energy-optimal in case i). For case ii), we prove later that τ is not feasible. In case iii), we should move a specific amount of workload from the LITTLE cluster to the big cluster so that C4 is no longer violated. When τ_i in τ^L moves from the LITTLE cluster to the big cluster, the energy consumption is bound to increase, since Δu_i^L < 0 and ef_i >= 1. When we move some amount of workload off the LITTLE cluster, choosing tasks in the order of ef_i closest to 1 (i.e., with |1 - ef_i| minimal) minimizes ΔE in (9). Therefore, moving tasks in τ^L to the big cluster in the order of ef_i closest to 1 is energy-optimal for case iii). In case iv), moving tasks in τ^B to the LITTLE cluster in the order of ef_i closest to 1 is energy-optimal by the same reasoning as in case iii).

We now prove feasibility-optimality. We consider the following task allocation process P:
P1. allocate the workload of all tasks to the LITTLE cluster;
P2. move tasks in the order of the smallest ef_i from the LITTLE cluster to the big cluster until Σ_{τ_i in τ} u_i^L = W^L.

We prove that (a) process P minimizes Σ_{τ_i in τ} u_i^B when the value of Σ_{τ_i in τ} u_i^L is fixed at W^L, and (b) stage 2) of Algorithm 1 satisfies feasibility-optimality in each of cases i)-iv).

Proof of (a): during process P2, when we move tasks from the LITTLE cluster to the big cluster (i.e., ΔU^L < 0), according to Eq. (8),

ΔU^B = -Σ Δu_i^L * (u_i^B,max / u_i^L,max) = -Σ (P_dynamic^L / P_dynamic^B) * Δu_i^L * ef_i. (10)

For the same amount of ΔU^L < 0, in order to minimize ΔU^B, we should move tasks in the order of the smallest ef_i from the LITTLE cluster to the big cluster. Therefore, (a) is true.

Proof of (b): In case i), all feasibility conditions are already satisfied. In case ii), by (a), moving tasks in the order of the smallest ef_i from the LITTLE cluster to the big cluster minimizes ΔU^B, but ΔU^B > 0 (Δu_i^L < 0 implies Δu_i^B > 0). Therefore, there is no way to decrease U^L and U^B at the same time (i.e., τ is not feasible in case ii)). In case iii), the ef_i closest to 1 in the LITTLE cluster corresponds to the smallest ef_i in the cluster, since ef_i >= 1 for every τ_i in τ^L. By (a), moving tasks in τ^L to the big cluster in the order of ef_i closest to 1 minimizes ΔU^B. Therefore, if C3 is violated in process P2 with W^L = m_L, τ is not feasible. In case iv), Algorithm 1 satisfies feasibility-optimality by the same reasoning as in case iii). Therefore, Algorithm 1 achieves both feasibility- and energy-optimality.

Complexity. We denote by n the number of tasks in a task set. For a given combination of frequency settings, Algorithm 1 requires O(n log n) time to sort the task set. Since there are only a few combinations of cluster configurations (3 for {δ_B, δ_L}, 9 for f_B and 5 for f_L), it takes only O(A * n log n) time to solve the optimization problem of Section IV-A, where A is a small constant. We note that, according to our optimization formulation presented in Section IV-A, the problem for a given frequency combination can also be solved by a Linear Programming (LP) solver. LP solvers run in polynomial time, but the polynomial is generally of a higher degree.

V. FEASIBILITY-OPTIMAL GLOBAL SCHEDULING

While EDF has been proved feasibility-optimal on a uniprocessor platform, it has been challenging to develop feasibility-optimal scheduling algorithms on homogeneous multi-core platforms. Starting from PFair [14], several feasibility-optimal scheduling algorithms have been developed, but they are not as intuitive as EDF.
Recently, a study [7] focused on deriving general rules that enable a scheduling algorithm to be feasibility-optimal; this has had a significant impact on developing feasibility-optimal scheduling algorithms, in that the only thing one needs to ensure is that the general rules are satisfied. Now, the big.LITTLE multi-core processor entails the need for heterogeneous global scheduling. However, there are few studies of such scheduling; the only known study [13] introduced a feasible schedule in the process of deriving an exact feasibility condition, but it is complicated and less intuitive. To this end, we propose optimal scheduling rules for implicit-deadline periodic tasks running on two-type heterogeneous multi-core platforms. Levin et al. [7] developed DP-Fair, which provides the scheduling rules for the case of a homogeneous multi-core platform. We generalize DP-Fair to a heterogeneous multi-core platform.

DP-Fair aims at scheduling tasks by following the proportionate fairness requirement, under which each task is executed proportionally to its utilization. DP-Fair shows that imposing the fairness requirement only at job deadlines suffices to reach optimality. It partitions time into slices based on the deadlines of all jobs invoked by a task set (referred to as deadline partitioning). To ensure the fairness requirement at every deadline, each job is assigned an execution requirement proportional to its utilization within each time slice. We note that if every job could be executed continuously at a rate equal to its utilization (referred to as a fluid scheduling model), the fairness requirement would be easily satisfied for all jobs. However, it is impossible to implement such a fluid schedule on practical platforms, since one core cannot execute more than one task simultaneously. Thereby, DP-Fair suggests scheduling rules that practical schedulers can follow to guarantee optimality.

We now explain how to generalize DP-Fair to a heterogeneous multi-core platform and present the scheduling rules that optimal schedulers must obey. After deadline partitioning, the k-th time slice (denoted by σ_k) is [t_{k-1}, t_k) of length l_k = t_k - t_{k-1}. Within the time slice σ_k, each task τ_i is assigned execution requirements u_i^B * l_k and u_i^L * l_k on the big and LITTLE clusters, respectively. As scheduling decisions are made over time, the remaining execution of task τ_i at time t in σ_k on the big and LITTLE clusters is denoted by R_i^B(t) and R_i^L(t), respectively. At each time t, a task is said to be a migrating task when it has remaining execution on both the big and LITTLE clusters (i.e., R_i^B(t) > 0 and R_i^L(t) > 0), and a task is said to be a partitioned task when its remaining execution

is solely on either the big or the LITTLE cluster (i.e., R_i^L(t) = 0 or R_i^B(t) = 0). A migrating task at time t can become a partitioned one whenever it no longer has remaining execution on either the big or the LITTLE cluster after t.

The major challenge in generalizing DP-Fair to a heterogeneous multi-core platform is scheduling the migrating tasks. If there are only partitioned tasks, we can consider each cluster as an independent homogeneous platform and apply DP-Fair in the usual way. However, if there are migrating tasks, two new issues arise: (a) we need to ensure that each migrating task executes its workload on at most one cluster at each time instant while still finishing all of its execution requirements on both the big and LITTLE clusters by the end of each time slice, and (b) we need to determine which cluster executes how many migrating tasks in a way that both clusters successfully process all of the allocated workload in every time slice.

To address issue (a), we define the task-level local laxity of τ_i at time t (denoted by L_i(t)) as the difference between the remaining time in the time slice σ_k and the sum of τ_i's remaining execution on the two clusters, expressed as

L_i(t) = (t_k - t) - (R_i^B(t) + R_i^L(t)). (11)

At the beginning of a time slice σ_k, R_i^B(t_{k-1}) is u_i^B * l_k and R_i^L(t_{k-1}) is u_i^L * l_k. Once a job of a task has zero task-level local laxity, it must be executed continuously on either the big or the LITTLE cluster until the end of the time slice; otherwise, the job will miss its deadline.

To address issue (b), we define the cluster-level local laxity at time t, denoted by L^B(t) (L^L(t)), as the difference between the total available capacity of the big (LITTLE) cluster from t to the end of the time slice and the total remaining workload on the big (LITTLE) cluster, expressed as

L^B(t) = m_B * (t_k - t) - Σ_{τ_i in τ} R_i^B(t), (12)
L^L(t) = m_L * (t_k - t) - Σ_{τ_i in τ} R_i^L(t). (13)

If the cluster-level local laxity of the big cluster at t (i.e., L^B(t)) reaches zero, all the big cores must execute jobs until the end of the time slice; otherwise, at least one job will miss its deadline due to insufficient supply. The same holds for the cluster-level local laxity of the LITTLE cluster. With the notions of task-level and cluster-level local laxity, we present our DP-Fair-Hetero scheduling rules.

Definition 1: (DP-Fair-Hetero scheduling for time slices) A scheduling algorithm belongs to DP-Fair-Hetero if it schedules jobs within a time slice σ_k according to the following rules:
R1: Always allocate m_B jobs to the big cores at time t if the big cluster-level laxity is zero (i.e., L^B(t) = 0), and allocate m_L jobs to the LITTLE cores at time t if the LITTLE cluster-level laxity is zero (i.e., L^L(t) = 0);
R2: Always run all jobs with zero task-level local laxity (i.e., L_i(t) = 0);
R2-1: Assign partitioned tasks with zero laxity to cores before assigning migrating tasks with zero laxity to the remaining cores;
R3: Never run a job with no workload remaining on either cluster in the slice.

We now prove that any DP-Fair-Hetero scheduler is feasibility-optimal on two-type heterogeneous multi-core platforms.

Theorem 2: If a periodic implicit-deadline task set τ is feasible, any DP-Fair-Hetero scheduling algorithm schedules the task set without any deadline miss.

Proof: The basic idea of the proof is to show that if a job of a task τ_i in a task set τ misses its deadline when scheduled by DP-Fair-Hetero, then the total workload allocated to the big or LITTLE cluster is larger than the capacity of that cluster, which violates at least one of constraints C3 and C4 and thus contradicts the assumption that the task set is feasible. Details are given in Appendix B.
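To make the rules concrete, the sketch below computes the task-level and cluster-level local laxities of (11)-(13) within one time slice and identifies the obligations imposed by rules R1 and R2; it is a hedged illustration with assumed names, not the full DP-Fair-Hetero scheduler.

# Hedged sketch: laxity bookkeeping within one time slice [t_{k-1}, t_k).
# R_B[i], R_L[i] are task i's remaining executions on the big/LITTLE cluster at time t.
def laxities(t, t_k, R_B, R_L, m_B, m_L):
    task_laxity = [(t_k - t) - (rb + rl) for rb, rl in zip(R_B, R_L)]   # Eq. (11)
    cluster_B = m_B * (t_k - t) - sum(R_B)                              # Eq. (12)
    cluster_L = m_L * (t_k - t) - sum(R_L)                              # Eq. (13)
    return task_laxity, cluster_B, cluster_L

def must_run(t, t_k, R_B, R_L, m_B, m_L, eps=1e-9):
    """Indices of tasks that rule R2 forces to run now (zero task-level laxity),
    plus flags telling whether R1 forces every big/LITTLE core to be busy."""
    task_laxity, cluster_B, cluster_L = laxities(t, t_k, R_B, R_L, m_B, m_L)
    zero_laxity = [i for i, l in enumerate(task_laxity)
                   if l <= eps and (R_B[i] > eps or R_L[i] > eps)]
    return zero_laxity, cluster_B <= eps, cluster_L <= eps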
VI. EVALUATION

This section presents simulation results to evaluate our energy/feasibility-optimal global scheduling framework.

Simulation environment. We have three input parameters: (a) the number of cores in each of the big and LITTLE clusters (m_B, m_L), (b) the discrete frequency/voltage levels of each big and LITTLE core, and (c) the individual task parameters (T_i, C_i^B, C_i^L). We set the platform-specific parameters (a) and (b) to the same values as employed in the ODROID-XU+E board [8], i.e., m_B = 4, m_L = 4, and the frequency/voltage levels shown in Table I. Our power model with the parameters of Table II is used to estimate the total energy consumption. Task sets are generated based on a technique proposed earlier [15]. For each task τ_i, T_i is uniformly chosen in [100, 1000], and C_i^L and C_i^B are chosen based on bimodal parameters in [100, T_i] and [100, C_i^L], respectively. For each bimodal parameter from 0.1 to 0.9 with a step of 0.1, we generate 1,500 task sets as follows. Initially, we generate a set of m_B + 1 tasks, and create a new set by adding a new task to the old set until the task set passes the feasibility constraints under the setting of both clusters turned on at the maximum frequencies.

We compare our global scheduling framework (annotated as Our-Global) with partitioned scheduling approaches (annotated as ILP-Partitioned). A partitioned approach statically assigns each task to a core, and task migration is not allowed. In ILP-Partitioned, the task-to-core assignment is formulated by replacing the cluster-level task workload variables (x_i^B and x_i^L) in our optimization formulation (Energy/Feasibility-OPT) with core-level zero-or-one variables that indicate the assignment of a task to a core. It is then solved by Integer Linear Programming (ILP).

Simulation results. To demonstrate how good global scheduling is for a big.LITTLE platform compared to partitioned scheduling in terms of both feasibility and energy consumption, we compare the number of task sets deemed feasible by each scheduling approach and the total energy consumption to schedule a feasible task set. Figure 2 plots the number of task sets feasible under Our-Global and ILP-Partitioned for different maximum utilizations on the big cluster (denoted by U^B,max = Σ_{τ_i in τ} u_i^B,max). Basically, Our-Global is a generalization of ILP-Partitioned, so Our-Global dominates ILP-Partitioned. Our-Global guarantees feasible solutions for all generated task sets, while ILP-Partitioned finds 10% fewer feasible task sets than Our-Global.

[Fig. 2: The number of feasible task sets under ILP-Partitioned and Our-Global.]
[Fig. 3: Energy consumption ratios of ILP-Partitioned to Our-Global.]

Figure 3 plots the energy consumption ratios of ILP-Partitioned relative to Our-Global for different maximum utilizations on the big cluster. We note that we only show results for task sets feasible under both approaches. We separate the generated task sets into two subsets: the task sets whose workload is allocated only to the LITTLE cluster by Our-Global, and the others (denoted by LITTLE-only and big-LITTLE, respectively). The LITTLE-only task sets are distributed over the utilization range from 0.5 to 3.5, and Our-Global makes significant differences in energy consumption compared to ILP-Partitioned for the LITTLE-only task sets: ILP-Partitioned consumes up to 60% more energy than Our-Global. This is because ILP-Partitioned decides to turn on both the big and LITTLE clusters, while Our-Global uses only the LITTLE cluster in the task workload allocation. For example, we observe a case in which Our-Global uses only the LITTLE cluster at 500 MHz, but ILP-Partitioned uses both the big and LITTLE clusters at 800 MHz and 600 MHz, respectively, for a task set with U^B,max = 2.65, so ILP-Partitioned consumes 87% more energy than Our-Global. Our-Global has 25% more such LITTLE-cluster-only task sets than ILP-Partitioned. In general, a LITTLE core consumes less power than a big core. Thereby, if the LITTLE cluster can accommodate all the workload (i.e., the big cluster is turned off), much energy can be saved compared to leaving both clusters turned on.

The big-LITTLE task sets are spread widely across all the utilization distributions, and ILP-Partitioned consumes 11% more energy than Our-Global on average. Both Our-Global and ILP-Partitioned turn on both the big and LITTLE clusters, but with different frequency settings; Our-Global can accommodate the task workload at lower frequency levels. For example, there exists a case in which Our-Global sets the frequency configuration of the big and LITTLE clusters to 1100 MHz and 600 MHz, but ILP-Partitioned sets the configuration to 1500 MHz and 600 MHz, for a task set with U^B,max = 6.05, so ILP-Partitioned consumes 35% more energy than Our-Global. The gap between Our-Global and ILP-Partitioned grows smoothly as the utilization increases, and starts to decrease from a utilization of 5.5. As the utilization increases, there is less room for saving energy, because both Our-Global and ILP-Partitioned reach the maximum frequency of each cluster. With the benefit of global scheduling, Our-Global can distribute all workload in the most energy-efficient way, since it not only has no restriction on task migration, in contrast to ILP-Partitioned, but also exploits this property effectively.

VII. RELATED WORK

Energy-aware real-time scheduling. In past decades, energy-aware real-time scheduling has been widely explored for both uniprocessor and multi-core platforms [16], [17]. For periodic tasks on uniprocessor platforms, Aydin et al. [11] showed that an energy-optimal schedule executes all tasks at a constant frequency to fully utilize the processor, under the assumption that each task presents its worst-case workload behavior with a convex power consumption function. Studies on homogeneous multi-core platforms can be classified into partitioned and global scheduling of periodic tasks. Aydin and Yang [12] addressed the problem of partitioning periodic tasks by considering both feasibility and energy consumption.
They showed that the problem is NP-hard and developed several heuristic algorithms by exploiting well-known bin-packing algorithms. They experimentally showed that the Worst-Fit Decreasing (WFD) algorithm always achieves the best energy conservation when the task utilization ordering is known a priori. A comprehensive survey of energy-efficient partitioned scheduling under diverse task and power models with practical considerations is provided in [16], [17]. In contrast to partitioned scheduling, there are some energy-efficient global scheduling algorithms with feasibility guarantees [18], [19], [20], [21]. Funaoka et al. [18] proposed real-time static voltage and frequency scaling (RT-SVFS) techniques based on an optimal real-time scheduling algorithm for homogeneous multi-core platforms. The techniques are regarded as a static voltage/frequency scaling approach, because once the initial voltage and frequency are set, they do not change at run-time. The techniques have been proven optimal when the voltage and frequency can be controlled both uniformly and independently among processors. Based on RT-SVFS, real-time dynamic voltage and frequency scaling (RT-DVFS) was presented in order to accommodate dynamic environments [19]. While a lot of research has been done for homogeneous multi-core platforms, comparatively little has been done for heterogeneous multi-core platforms. Moreover, only the partitioning approach is considered in the literature [3], [4],

[5], [6]. Yu and Prasanna [3] addressed the problem of assigning periodic tasks on heterogeneous platforms with the frequency level set per task. They formulated the problem as an Integer Linear Program (ILP) to minimize the energy consumption and provided a linear-relaxation heuristic algorithm. The other related works [4], [5], [6] study energy-efficient task partitioning on platforms with a static frequency for each processor. Chen and Thiele [4] provided a fully polynomial-time approximation scheme based on dynamic programming for the case of two-type heterogeneous processors. This work was later extended in [5] to n-type heterogeneous processors. Chen et al. [6] formulated the energy-efficient task partitioning problem as an ILP and provided polynomial-time algorithms by applying existing bin-packing algorithms after relaxing some assumptions. Most of the related work for heterogeneous multi-core platforms considers the task partitioning approach, in which task migration is not allowed and which is proved to be NP-hard in the strong sense; these works therefore focused on developing efficient heuristic algorithms with approximation bounds. This paper focuses on energy/feasibility-optimal global scheduling on two-type heterogeneous platforms. We develop an optimal task workload allocation algorithm from both the feasibility and energy consumption points of view and establish optimal scheduling rules for two-type heterogeneous multi-core platforms.

Real-time scheduling on heterogeneous multi-core platforms. From the feasibility point of view, the scheduling problem on heterogeneous multiprocessors has been studied in the past [22], [23], [24], [13]. Baruah [22] considered the task partitioning problem of determining whether the tasks can be partitioned among processors in such a manner that all timing constraints are met. Andersson et al. [23] proposed a heuristic task assignment algorithm for two-type platforms. Raravi et al. [24] considered an intra-migrative scheduling problem, which statically assigns each task to a core type and allows task migration among cores of the same type, and proposed a linearithmic task assignment algorithm. For global scheduling, Baruah [13] provided an exact feasibility analysis.

VIII. CONCLUSION

This paper is motivated by an attempt to see how good global scheduling, beyond partitioned scheduling, can be for big.LITTLE platforms (one of the cutting-edge heterogeneous multi-core architectures), from the perspective of both core utilization and energy saving. To this end, we develop an energy/feasibility-optimal global scheduling framework that determines the big.LITTLE platform configurations and global job schedules so that the energy consumption is minimized without compromising feasibility. Moreover, we suggest DP-Fair-Hetero as optimal scheduling rules for implicit-deadline periodic tasks running on two-type heterogeneous multi-core platforms. This work will be a basis for designing efficient global schedulers for heterogeneous multi-core platforms. One of the major concerns in global scheduling is migration overhead. Hence, directions for future work include incorporating migration overhead into our framework and developing an efficient global scheduling algorithm targeting general heterogeneous multi-core platforms.

REFERENCES

[1] R. I. Davis and A. Burns, "A survey of hard real-time scheduling for multiprocessor systems," ACM Computing Surveys, vol. 43.
[2] ARM, "big.LITTLE technology: The future of mobile." [Online]. Available: big-little-technology-the-futue-of-moble.pdf
[3] Y. Yu and V. K. Prasanna, "Resource allocation for independent real-time tasks in heterogeneous systems for energy minimization," Journal of Information Science and Engineering, vol. 19(3).
[4] J.-J. Chen and L. Thiele, "Energy-efficient task partition for periodic real-time tasks on platforms with dual processing elements," in Proceedings of the 14th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2008).
[5] C.-Y. Yang, J.-J. Chen, T.-W. Kuo, and L. Thiele, "An approximation scheme for energy-efficient scheduling of real-time tasks in heterogeneous multiprocessor systems," in Proceedings of the Conference and Exhibition on Design, Automation, and Test in Europe (DATE 2009).
[6] J.-J. Chen, A. Schranzhofer, and L. Thiele, "Energy minimization for periodic real-time tasks on heterogeneous processing units," in Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2009).
[7] G. Levin, S. Funk, C. Sadowski, I. Pye, and S. Brandt, "DP-Fair: A simple model for understanding optimal multiprocessor scheduling," in ECRTS.
[8] Hardkernel Co., Ltd., "ODROID-XU+E specification." [Online].
[9] A. Carroll and G. Heiser, "Unifying DVFS and offlining in mobile multicores," in RTAS.
[10] Hardkernel Co., Ltd., "ODROID-XU+E." [Online].
[11] H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez, "Dynamic and aggressive scheduling techniques for power-aware real-time systems," in Proceedings of the IEEE Real-Time Systems Symposium.
[12] H. Aydin and Q. Yang, "Energy-aware partitioning for multiprocessor real-time systems," in Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2003).
[13] S. Baruah, "Feasibility analysis of preemptive real-time systems upon heterogeneous multiprocessor platforms," in RTSS.
[14] S. Baruah, N. K. Cohen, C. G. Plaxton, and D. A. Varvel, "Proportionate progress: a notion of fairness in resource allocation," Algorithmica, vol. 15, no. 6.
[15] T. Baker, "An analysis of EDF schedulability on a multiprocessor," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8.
[16] J.-J. Chen and C.-F. Kuo, "Energy-efficient scheduling for real-time systems on dynamic voltage scaling (DVS) platforms," in RTCSA.
[17] D. Li and J. Wu, Energy-aware Scheduling on Multiprocessor Platforms. SpringerBriefs in Computer Science.
[18] K. Funaoka, S. Kato, and N. Yamasaki, "Work-conserving optimal real-time scheduling on multiprocessors," in ECRTS, 2008.
[19] K. Funaoka, A. Takeda, S. Kato, and N. Yamasaki, "Dynamic voltage and frequency scaling for optimal real-time scheduling on multiprocessors," in Proceedings of the International Symposium on Industrial Embedded Systems (SIES 2008).
[20] D.-S. Zhang, F.-Y. Chen, H.-H. Li, S.-Y. Jin, and D.-K. Guo, "An energy-efficient scheduling algorithm for sporadic real-time tasks in multiprocessor systems," in Proceedings of the IEEE International Conference on High Performance Computing and Communications (HPCC 2011).
[21] G. A. Moreno and D. de Niz, "An optimal real-time voltage and frequency scaling for uniform multiprocessors," in RTCSA.
[22] S. Baruah, "Task partitioning upon heterogeneous multiprocessor platforms," in RTAS.
[23] B. Andersson, G. Raravi, and K. Bletsas, "Assigning real-time tasks on heterogeneous multiprocessors with two unrelated types of processors," in RTSS.
[24] G. Raravi, B. Andersson, K. Bletsas, and V. Nelis, "Task assignment algorithms for two-type heterogeneous multiprocessors," in ECRTS, 2012.


More information

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming EEL 6266 Power System Operaton and Control Chapter 3 Economc Dspatch Usng Dynamc Programmng Pecewse Lnear Cost Functons Common practce many utltes prefer to represent ther generator cost functons as sngle-

More information

Partitioned Mixed-Criticality Scheduling on Multiprocessor Platforms

Partitioned Mixed-Criticality Scheduling on Multiprocessor Platforms Parttoned Mxed-Crtcalty Schedulng on Multprocessor Platforms Chuanca Gu 1, Nan Guan 1,2, Qngxu Deng 1 and Wang Y 1,2 1 Northeastern Unversty, Chna 2 Uppsala Unversty, Sweden Abstract Schedulng mxed-crtcalty

More information

Two-Phase Low-Energy N-Modular Redundancy for Hard Real-Time Multi-Core Systems

Two-Phase Low-Energy N-Modular Redundancy for Hard Real-Time Multi-Core Systems 1 Two-Phase Low-Energy N-Modular Redundancy for Hard Real-Tme Mult-Core Systems Mohammad Saleh, Alreza Ejlal, and Bashr M. Al-Hashm, Fellow, IEEE Abstract Ths paper proposes an N-modular redundancy (NMR)

More information

Fixed-Priority Multiprocessor Scheduling with Liu & Layland s Utilization Bound

Fixed-Priority Multiprocessor Scheduling with Liu & Layland s Utilization Bound Fxed-Prorty Multprocessor Schedulng wth Lu & Layland s Utlzaton Bound Nan Guan, Martn Stgge, Wang Y and Ge Yu Department of Informaton Technology, Uppsala Unversty, Sweden Department of Computer Scence

More information

Single-Facility Scheduling over Long Time Horizons by Logic-based Benders Decomposition

Single-Facility Scheduling over Long Time Horizons by Logic-based Benders Decomposition Sngle-Faclty Schedulng over Long Tme Horzons by Logc-based Benders Decomposton Elvn Coban and J. N. Hooker Tepper School of Busness, Carnege Mellon Unversty ecoban@andrew.cmu.edu, john@hooker.tepper.cmu.edu

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

A Simple Inventory System

A Simple Inventory System A Smple Inventory System Lawrence M. Leems and Stephen K. Park, Dscrete-Event Smulaton: A Frst Course, Prentce Hall, 2006 Hu Chen Computer Scence Vrgna State Unversty Petersburg, Vrgna February 8, 2017

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Annexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances

Annexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances ec Annexes Ths Annex frst llustrates a cycle-based move n the dynamc-block generaton tabu search. It then dsplays the characterstcs of the nstance sets, followed by detaled results of the parametercalbraton

More information

Common loop optimizations. Example to improve locality. Why Dependence Analysis. Data Dependence in Loops. Goal is to find best schedule:

Common loop optimizations. Example to improve locality. Why Dependence Analysis. Data Dependence in Loops. Goal is to find best schedule: 15-745 Lecture 6 Data Dependence n Loops Copyrght Seth Goldsten, 2008 Based on sldes from Allen&Kennedy Lecture 6 15-745 2005-8 1 Common loop optmzatons Hostng of loop-nvarant computatons pre-compute before

More information

Improved Worst-Case Response-Time Calculations by Upper-Bound Conditions

Improved Worst-Case Response-Time Calculations by Upper-Bound Conditions Improved Worst-Case Response-Tme Calculatons by Upper-Bound Condtons Vctor Pollex, Steffen Kollmann, Karsten Albers and Frank Slomka Ulm Unversty Insttute of Embedded Systems/Real-Tme Systems {frstname.lastname}@un-ulm.de

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

FUZZY GOAL PROGRAMMING VS ORDINARY FUZZY PROGRAMMING APPROACH FOR MULTI OBJECTIVE PROGRAMMING PROBLEM

FUZZY GOAL PROGRAMMING VS ORDINARY FUZZY PROGRAMMING APPROACH FOR MULTI OBJECTIVE PROGRAMMING PROBLEM Internatonal Conference on Ceramcs, Bkaner, Inda Internatonal Journal of Modern Physcs: Conference Seres Vol. 22 (2013) 757 761 World Scentfc Publshng Company DOI: 10.1142/S2010194513010982 FUZZY GOAL

More information

Overhead-Aware Compositional Analysis of Real-Time Systems

Overhead-Aware Compositional Analysis of Real-Time Systems Overhead-Aware ompostonal Analyss of Real-Tme Systems Lnh T.X. Phan, Meng Xu, Jaewoo Lee, nsup Lee, Oleg Sokolsky PRESE enter Department of omputer and nformaton Scence Unversty of Pennsylvana ompostonal

More information

Minimizing Energy Consumption of MPI Programs in Realistic Environment

Minimizing Energy Consumption of MPI Programs in Realistic Environment Mnmzng Energy Consumpton of MPI Programs n Realstc Envronment Amna Guermouche, Ncolas Trquenaux, Benoît Pradelle and Wllam Jalby Unversté de Versalles Sant-Quentn-en-Yvelnes arxv:1502.06733v2 [cs.dc] 25

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

An Interactive Optimisation Tool for Allocation Problems

An Interactive Optimisation Tool for Allocation Problems An Interactve Optmsaton ool for Allocaton Problems Fredr Bonäs, Joam Westerlund and apo Westerlund Process Desgn Laboratory, Faculty of echnology, Åbo Aadem Unversty, uru 20500, Fnland hs paper presents

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Calculation of time complexity (3%)

Calculation of time complexity (3%) Problem 1. (30%) Calculaton of tme complexty (3%) Gven n ctes, usng exhaust search to see every result takes O(n!). Calculaton of tme needed to solve the problem (2%) 40 ctes:40! dfferent tours 40 add

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Parallel Real-Time Scheduling of DAGs

Parallel Real-Time Scheduling of DAGs Washngton Unversty n St. Lous Washngton Unversty Open Scholarshp All Computer Scence and Engneerng Research Computer Scence and Engneerng Report Number: WUCSE-013-5 013 Parallel Real-Tme Schedulng of DAGs

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

Energy-Efficient Scheduling Fixed-Priority tasks with Preemption Thresholds on Variable Voltage Processors

Energy-Efficient Scheduling Fixed-Priority tasks with Preemption Thresholds on Variable Voltage Processors Energy-Effcent Schedulng Fxed-Prorty tasks wth Preempton Thresholds on Varable Voltage Processors XaoChuan He, Yan Ja Insttute of Network Technology and Informaton Securty School of Computer Scence Natonal

More information

Lecture 4: November 17, Part 1 Single Buffer Management

Lecture 4: November 17, Part 1 Single Buffer Management Lecturer: Ad Rosén Algorthms for the anagement of Networs Fall 2003-2004 Lecture 4: November 7, 2003 Scrbe: Guy Grebla Part Sngle Buffer anagement In the prevous lecture we taled about the Combned Input

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

ILP models for the allocation of recurrent workloads upon heterogeneous multiprocessors

ILP models for the allocation of recurrent workloads upon heterogeneous multiprocessors Journal of Schedulng DOI 10.1007/s10951-018-0593-x ILP models for the allocaton of recurrent workloads upon heterogeneous multprocessors Sanjoy K. Baruah Vncenzo Bonfac Renato Brun Alberto Marchett-Spaccamela

More information

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016 CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

Procrastination Scheduling for Fixed-Priority Tasks with Preemption Thresholds

Procrastination Scheduling for Fixed-Priority Tasks with Preemption Thresholds Procrastnaton Schedulng for Fxed-Prorty Tasks wth Preempton Thresholds XaoChuan He, Yan Ja Insttute of Network Technology and Informaton Securty School of Computer Scence Natonal Unversty of Defense Technology

More information

Fixed-Priority Multiprocessor Scheduling with Liu & Layland s Utilization Bound

Fixed-Priority Multiprocessor Scheduling with Liu & Layland s Utilization Bound Fxed-Prorty Multprocessor Schedulng wth Lu & Layland s Utlzaton Bound Nan Guan, Martn Stgge, Wang Y and Ge Yu Department of Informaton Technology, Uppsala Unversty, Sweden Department of Computer Scence

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth

More information

Minimisation of the Average Response Time in a Cluster of Servers

Minimisation of the Average Response Time in a Cluster of Servers Mnmsaton of the Average Response Tme n a Cluster of Servers Valery Naumov Abstract: In ths paper, we consder task assgnment problem n a cluster of servers. We show that optmal statc task assgnment s tantamount

More information

Lecture 14: Bandits with Budget Constraints

Lecture 14: Bandits with Budget Constraints IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Last Time. Priority-based scheduling. Schedulable utilization Rate monotonic rule: Keep utilization below 69% Static priorities Dynamic priorities

Last Time. Priority-based scheduling. Schedulable utilization Rate monotonic rule: Keep utilization below 69% Static priorities Dynamic priorities Last Tme Prorty-based schedulng Statc prortes Dynamc prortes Schedulable utlzaton Rate monotonc rule: Keep utlzaton below 69% Today Response tme analyss Blockng terms Prorty nverson And solutons Release

More information

O-line Temporary Tasks Assignment. Abstract. In this paper we consider the temporary tasks assignment

O-line Temporary Tasks Assignment. Abstract. In this paper we consider the temporary tasks assignment O-lne Temporary Tasks Assgnment Yoss Azar and Oded Regev Dept. of Computer Scence, Tel-Avv Unversty, Tel-Avv, 69978, Israel. azar@math.tau.ac.l??? Dept. of Computer Scence, Tel-Avv Unversty, Tel-Avv, 69978,

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Parametric Utilization Bounds for Fixed-Priority Multiprocessor Scheduling

Parametric Utilization Bounds for Fixed-Priority Multiprocessor Scheduling 2012 IEEE 26th Internatonal Parallel and Dstrbuted Processng Symposum Parametrc Utlzaton Bounds for Fxed-Prorty Multprocessor Schedulng Nan Guan 1,2, Martn Stgge 1, Wang Y 1,2 and Ge Yu 2 1 Uppsala Unversty,

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

Global EDF Scheduling for Parallel Real-Time Tasks

Global EDF Scheduling for Parallel Real-Time Tasks Washngton Unversty n St. Lous Washngton Unversty Open Scholarshp Engneerng and Appled Scence Theses & Dssertatons Engneerng and Appled Scence Sprng 5-15-2014 Global EDF Schedulng for Parallel Real-Tme

More information

Two Methods to Release a New Real-time Task

Two Methods to Release a New Real-time Task Two Methods to Release a New Real-tme Task Abstract Guangmng Qan 1, Xanghua Chen 2 College of Mathematcs and Computer Scence Hunan Normal Unversty Changsha, 410081, Chna qqyy@hunnu.edu.cn Gang Yao 3 Sebel

More information

An Integrated OR/CP Method for Planning and Scheduling

An Integrated OR/CP Method for Planning and Scheduling An Integrated OR/CP Method for Plannng and Schedulng John Hooer Carnege Mellon Unversty IT Unversty of Copenhagen June 2005 The Problem Allocate tass to facltes. Schedule tass assgned to each faclty. Subect

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

CHAPTER III Neural Networks as Associative Memory

CHAPTER III Neural Networks as Associative Memory CHAPTER III Neural Networs as Assocatve Memory Introducton One of the prmary functons of the bran s assocatve memory. We assocate the faces wth names, letters wth sounds, or we can recognze the people

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys

More information

The Study of Teaching-learning-based Optimization Algorithm

The Study of Teaching-learning-based Optimization Algorithm Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute

More information

Energy-Aware Standby-Sparing on Heterogeneous Multicore Systems

Energy-Aware Standby-Sparing on Heterogeneous Multicore Systems Energy-Aware Standby-Sparng on Heterogeneous Multcore Systems ABSTRACT Abhshek Roy, Hakan Aydn epartment of Computer Scence George Mason Unversty Farfax, Vrgna 220 aroy6@gmu.edu, aydn@cs.gmu.edu Standby-sparng

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law: CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and

More information

Speeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem

Speeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem H.K. Pathak et. al. / (IJCSE) Internatonal Journal on Computer Scence and Engneerng Speedng up Computaton of Scalar Multplcaton n Ellptc Curve Cryptosystem H. K. Pathak Manju Sangh S.o.S n Computer scence

More information

HMMT February 2016 February 20, 2016

HMMT February 2016 February 20, 2016 HMMT February 016 February 0, 016 Combnatorcs 1. For postve ntegers n, let S n be the set of ntegers x such that n dstnct lnes, no three concurrent, can dvde a plane nto x regons (for example, S = {3,

More information

Chapter Newton s Method

Chapter Newton s Method Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve

More information

Partitioned EDF Scheduling in Multicore systems with Quality of Service constraints

Partitioned EDF Scheduling in Multicore systems with Quality of Service constraints Parttoned EDF Schedulng n Multcore systems wth Qualty of Servce constrants Nadne Abdallah, Audrey Queudet, Marylne Chetto, Rafc Hage Chehade To cte ths verson: Nadne Abdallah, Audrey Queudet, Marylne Chetto,

More information

EDF Scheduling for Identical Multiprocessor Systems

EDF Scheduling for Identical Multiprocessor Systems EDF Schedulng for dentcal Multprocessor Systems Maro Bertogna Unversty of Modena, taly As Moore s law goes on Number of transstor/chp doubles every 18 to 24 mm heatng becomes a problem Power densty (W/cm

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Electrical double layer: revisit based on boundary conditions

Electrical double layer: revisit based on boundary conditions Electrcal double layer: revst based on boundary condtons Jong U. Km Department of Electrcal and Computer Engneerng, Texas A&M Unversty College Staton, TX 77843-318, USA Abstract The electrcal double layer

More information

BALANCING OF U-SHAPED ASSEMBLY LINE

BALANCING OF U-SHAPED ASSEMBLY LINE BALANCING OF U-SHAPED ASSEMBLY LINE Nuchsara Krengkorakot, Naln Panthong and Rapeepan Ptakaso Industral Engneerng Department, Faculty of Engneerng, Ubon Rajathanee Unversty, Thaland Emal: ennuchkr@ubu.ac.th

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Polynomial Regression Models

Polynomial Regression Models LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance

More information

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng

More information