Offline Equivalence: A Non-Preemptive Scheduling Technique for Resource-Constrained Embedded Real-Time Systems

Mitra Nasri, Björn B. Brandenburg
Max Planck Institute for Software Systems (MPI-SWS)

Abstract

We consider the problem of scheduling a set of non-preemptive periodic tasks in an embedded system with a limited amount of memory. On the one hand, due to the memory limitations, a table-based scheduling approach might not be applicable, and on the other hand, the existing online non-preemptive scheduling algorithms are either not efficient in terms of the schedulability ratio, or suffer from considerable runtime overhead. To arrive at a compromise, this paper proposes an online policy that is equivalent to a given offline table to combine some of the advantages of both online and offline scheduling: we first consider a low-overhead online scheduling algorithm as a baseline, and then identify any irregular situations where a given offline table differs from the schedule generated by the online algorithm. We store any such irregularities in tables for use by the online scheduling algorithm, which then can recreate the table at runtime. To generate suitable tables, we provide an offline scheduling algorithm for non-preemptive tasks, and a table-transformation algorithm to reduce the number of irregularities that must be stored. In an evaluation using an Arduino board and synthetic task sets, we have observed the technique to result in a substantial reduction of scheduling overhead compared to CW-EDF, the online scheduler that achieves the highest schedulability ratio, while having to store on average only a few dozen to a few hundred bytes of the static schedule.

I. INTRODUCTION

Embedded systems subject to severe cost, power, or energy constraints usually have only very limited processing capacity and small memories. For instance, the Atmel UC3A0512 microcontroller, which is used in mission-critical space applications [1], has 64 KB of internal SRAM, 512 KB of internal flash memory, and is clocked at 12 MHz.
Similarly, an Arduino Uno uses an ATmega328P microcontroller with a clock speed of 16 MHz, 2 KB of SRAM, and 32 KB of flash memory. With such limited resources, these systems typically do not use a multitasking operating system. Thus, non-preemptive execution is the natural way (if not the only way) of executing real-time tasks.

A traditional way to realize non-preemptive scheduling in real-time systems is to use static timetables, as used for example in the classic time-triggered paradigm [2]. With respect to runtime overheads, table-driven scheduling is attractive since the scheduler performs only an O(1) table lookup. However, in periodic task sets, the number of jobs in a hyperperiod (and hence the size of the timetable) can be exponential in the number of tasks, which can translate into prohibitive memory overheads. For example, Anssi et al. [3] describe a powertrain ECU that consists of six periodic tasks with periods {1, 5, 10, 10, 40, 100}. Due to the relation of their periods and specified release offsets (see [3] for details), a timetable for this ECU would have to store information for more than 500 jobs. A table with 500 entries of 32 bits each requires about 2 KB of memory, which would completely fill an Arduino Uno's RAM or take up a substantial part of its flash memory (see Sec. IV for an explanation of typical table entry sizes).

As another example, consider an automotive benchmark provided by Bosch [4], which reports tasks to have periods in the set {1, 2, 5, 10, 20, 50, 100, 200, 1000}. Even if only one runnable (function) of each period appears in an ECU, the hyperperiod will contain 1,886 jobs (and up to 1,000 idle times, which must also be encoded in the table). Further, the presence of a single inconvenient period that is not harmonic, such as a functionality invoked at roughly 30 frames per second, can lead to rapid table growth: adding a 33 ms period to the Bosch benchmark lets the table grow to 63,238 jobs in one hyperperiod. Consequently, table-based solutions can be difficult to adopt in resource-constrained embedded systems.
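The job counts quoted above follow directly from the periods. As a minimal sketch (the function names are illustrative and not taken from the paper), assuming one task per period and integer periods in milliseconds, the number of jobs in one hyperperiod H is the sum of H / T_i over all tasks:

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all periods (all in the same time unit)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def jobs_per_hyperperiod(periods):
    """Number of jobs released in one hyperperiod, assuming one task per period."""
    h = hyperperiod(periods)
    return sum(h // t for t in periods)


# Period set reported for the Bosch benchmark in the text (milliseconds).
BOSCH_PERIODS_MS = [1, 2, 5, 10, 20, 50, 100, 200, 1000]

for periods in (BOSCH_PERIODS_MS, BOSCH_PERIODS_MS + [33]):
    n = jobs_per_hyperperiod(periods)
    # Assumption: 4 bytes per table entry, idle-time entries not counted.
    print(hyperperiod(periods), n, n * 4)
```

Under these assumptions the nine Bosch periods give 1,886 jobs per hyperperiod, and adding the 33 ms period gives 63,238, which matches the figures in the text. At four bytes per entry, and before counting idle-time entries, that is roughly 7.5 KB and 250 KB, respectively.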
And even if a large table could be accommodated in principle by purchasing a microcontroller with a sufficient amount of flash memory, in deeply embedded, mass-produced systems where memories are sized to fit a given application, saving even only a few kilobytes of RAM or flash memory can translate into significant cost savings at scale (i.e., if sold thousands or even millions of times).

Several works have tried to reduce the size of a timetable by means of modifying the original periods. Ripoll et al. [5] proposed a period assignment method to reduce the length of the hyperperiod, and Nasri et al. [6, 7] presented two methods for generating harmonic periods. Although these methods reduce the number of jobs in the timetable, they are not applicable to systems that must run with a predefined set of periods, e.g., if a third-party component must be integrated, or if the period is chosen based on the sampling period of a hardware device. In some cases, changing periods may require redesigning the application (e.g., a different control approach may become necessary), or the period may be dictated by physical phenomena that cannot be precisely observed with another sampling period.

An alternative solution is to use online non-preemptive scheduling algorithms. Most of the well-known work-conserving job scheduling policies such as earliest-deadline first (EDF) and rate monotonic (RM) can be used in this case. It is also possible to use recently developed non-work-conserving scheduling solutions such as Precautious-RM [8] and critical time window EDF (CW-EDF) [9]. However, there are two main disadvantages to using online solutions in a resource-constrained embedded system: first, the overhead of the scheduling algorithm is a bottleneck due to the system's limited processing power, and second, the existing online algorithms, which are non-optimal, may not be able to schedule feasible task sets that can be scheduled with a table-based approach.
Moreover, the existing schedulability tests for these online algorithms are either pessimistic if they are applied to periodic tasks [9, 10], or restricted to a special type of periods, e.g., harmonic periods [9, 10, 11].

In this paper, we propose an approach that combines the advantages of both online and offline scheduling in order to provide a solution for resource-constrained embedded systems. Our idea is to identify those particular entries in the offline table that are violating the policy of a baseline scheduling algorithm. The baseline algorithm can be, for example, non-preemptive RM (NP-RM), which has a relatively low runtime overhead.

We identify two irregular types of table entries that make the schedule stored in the table different from an NP-RM schedule. The first type of irregular entries are priority inversions, i.e., when (according to the table) a low-priority task is dispatched while a higher-priority task has a pending job. The second type of irregular entries are those that violate the work-conserving nature of NP-RM, i.e., when no task is scheduled in the table while there are pending jobs. The idea is to store only the irregular cases, and to modify the online algorithm to enact these stored deviations from the baseline policy at runtime, thereby recreating the given offline schedule without having to store it in its entirety.

Contributions. Since the success of our solution depends on being able to generate an offline table for the given set of non-preemptive periodic tasks, first we propose an efficient method to construct such an offline table (Sec. III). Most of the existing solutions are based on a branch-and-bound approach and iterate over many potential orderings of the jobs [12], and thus do not scale well if the number of jobs becomes large, which is often the case for periodic task sets. However, we have observed that during this search process, certain sets of jobs can only have a few potential feasible schedules, and hence they only allow a limited set of possibilities for other jobs to be added among or around them. Exploiting this property, instead of keeping track of individual jobs, we keep track of sequences of jobs that are associated with (or chained to) an interval of time (we call it a chained window), which allows us to better cope with the size of the problem, and to more quickly find a feasible schedule.

The next contribution of the paper is to show how to identify the irregular cases that make an NP-RM schedule different from the offline table (Sec. IV). In particular, we provide a table manipulation heuristic to reduce the number of priority inversions with respect to the baseline policy.
Finally, we prototyped our solution on an Arduino Mega 2560 board to compare the runtime overhead and space requirement of different scheduling strategies. In Sec. V, we report on a comparison of various ways to generate scheduling tables and show that, after applying our table manipulation method, the best method requires on average only a few dozen to a few hundred bytes of irregularities to be stored, whereas traditional table-driven approaches use several kilobytes for the same workloads.

II. SYSTEM MODEL AND NOTATIONS

We consider a uniprocessor system with a set of independent, non-preemptive periodic tasks. The task set has $n$ tasks denoted by $\tau = \{\tau_1, \tau_2, \ldots, \tau_n\}$. Each task $\tau_i$ is identified by $\tau_i = (C_i, T_i, D_i)$, where $C_i$ is the worst-case execution time (WCET), $T_i$ is the period, and $D_i \leq T_i$ is the deadline. Since the task set is non-preemptive, we assume that sound WCET estimation methods can be used, and hence tasks will not overrun their WCET at runtime. The system utilization is denoted by $U = \sum_{i=1}^{n} u_i$, where $u_i = C_i / T_i$ is the utilization of task $\tau_i$. The hyperperiod $H$ of this task set is the least common multiple (LCM) of the periods. We assume that the tasks are indexed according to their period so that $T_1 \leq T_2 \leq \ldots \leq T_n$. Each instance of a task is called a job.

We assume that the tasks do not have release jitter. This is a reasonable assumption for periodic tasks that are not triggered by interrupts and that run without an OS (e.g., Arduino is typically one such system). Furthermore, we assume that the tasks are released synchronously.

Fig. 1. The only two differences between this schedule and NP-RM are an idle interval [9, 10) and the execution of a job of $\tau_2$ before $\tau_1$ at time 30.

We define a job sequence as an enumerated collection of jobs (possibly of different tasks) that must be executed in an order given by their index. Such a sequence is denoted by $\mathcal{J} = \langle J^1, J^2, \ldots, J^m \rangle$ ($m \geq 0$), where $J^i$ denotes the $i$-th job in the sequence. The release time, WCET, and absolute deadline of that job are denoted by $r^i$, $c^i$, and $d^i$, respectively.
Here we use superscripts to distinguish parameters of a job from those of a task. Later in Sec. IV, we consider job priority, denoted by $p^i$ (numerically lower values imply higher priorities). Next we define a non-preemptive schedule for a job sequence.

Definition 1. Consider a set of jobs $\mathcal{J} = \{J^1, J^2, \ldots, J^m\}$ and a function $S : \mathcal{J} \to \mathbb{R}$ that maps any job $J^i \in \mathcal{J}$ to a time instant. The function $S$ is a valid non-preemptive schedule iff

\[ \forall i, j,\ i \neq j;\quad S(J^i) + c^i \leq S(J^j) \ \lor\ S(J^j) + c^j \leq S(J^i), \text{ and} \tag{1} \]
\[ \forall i;\quad r^i \leq S(J^i) \ \land\ S(J^i) + c^i \leq d^i. \tag{2} \]

Further, a job sequence schedule (JSS) is a valid non-preemptive schedule for a job sequence that respects the given order.

Definition 2. For a job sequence $\mathcal{J} = \langle J^1, \ldots, J^m \rangle$, a schedule $S$ is a valid JSS iff $S$ is a valid non-preemptive schedule and

\[ \forall i,\ 1 \leq i < m;\quad S(J^i) + c^i \leq S(J^{i+1}). \tag{3} \]

III. TIMETABLE GENERATION

In this section, we introduce our table generation algorithm for non-preemptive periodic tasks. We start with the basic idea and then present formal definitions and introduce the operations that will be used to generate the timetable.

A. Motivations and Basic Idea

Most of the existing optimal solutions for non-preemptive scheduling are based on the branch-and-bound strategy [12]; they explore all possible combinations of job orderings to find the one that minimizes a goal function, such as maximum tardiness. If a solution with zero tardiness is found, the set of jobs is schedulable. Even though in a periodic task set the number of branches will be constrained (e.g., because no two jobs of a task share the same time window), the traditional solutions still fail to scale with increases in the number of tasks or in the ratio between period values (some results are presented in Sec. V). For example, for two tasks with a 1 ms and a 100 ms period, a branch-and-bound heuristic must explore 99 possible job orders.

A basic observation is that some subsets of jobs can only have one or a few possible valid schedules. For example, in Fig. 1, from time 10 to time 40, there are only two possible ways of scheduling jobs of tasks $\tau_1$ to $\tau_3$ in this interval.
Thus, another job with a larger period can only appear before or after this set of jobs. As a result, instead of keeping track of individual jobs, we keep track of a sequence of jobs that is chained to a window of time (i.e., an interval). We call such a sequence of jobs a chained window (CW), which forms the building block of our solution.

Fig. 2. A chained window $W = ([10, 30], \langle J^1, J^2, J^3, J^4 \rangle, \delta = 4)$. (a) Jobs scheduled as soon as possible. (b) Jobs scheduled as late as possible.

Another observation is that an ordered subset of jobs has only a limited amount of slack that can be arbitrarily placed before, between, or after the jobs. Any attempt to add another job with a WCET exceeding the slack will result in a deadline miss for one of the jobs. We use this observation to keep track of the available slack in chained windows. Fig. 2 shows a chained window (the notation will be fully explained in Sec. III-B). In Fig. 2-(a), 4 units of slack are scheduled after $J^4$, whereas in Fig. 2-(b), they are scheduled before $J^1$.

To reduce the number of chained windows, we merge two neighboring chained windows (with intersecting time intervals) whenever the slack between them allows for it. When merged, a longer chained window results that includes an ordered set of jobs from both chained windows (see Fig. 4-(b) in Sec. III-B for an example). We hence can iteratively fill in chained windows such that they progressively cover larger intervals, accumulate more jobs, and have less slack. This reduces the number of possibilities that exist when a new job is added to a job sequence.

Next we formally introduce chained windows and show how to build the whole schedule for a set of tasks. It is worth noting that our primary goal is to obtain a fast and scalable heuristic for scheduling non-preemptive jobs. Consequently our method is not optimal, i.e., it may not find a schedule for a feasible set of tasks, since it does not explore all possible job orderings.

B. Chained Windows

We start by introducing the notion of a chained window, which is a tuple that represents a job sequence, a window of time, and a slack value. A key property of a chained window is that any JSS for its job sequence that starts and ends in its associated window of time will be valid.
This property allows us to freely move around slack in a JSS without affecting its validity.

Definition 3. Consider a tuple $w = ([s, e], \mathcal{J}, \delta)$, where $[s, e]$ is an interval of time, $\mathcal{J}$ is a sequence of $m$ jobs, and $\delta$ is the slack. Let $C = \sum_{i=1}^{m} c^i$. The tuple $w$ is a chained window iff

\[ e - s - \delta = C \tag{4} \]

and any JSS $S$ that satisfies

\[ s \leq S(J^1) \ \land\ S(J^m) + c^m \leq e \tag{5} \]

is a valid JSS (recall Definition 2).

Although by definition each CW has at least one schedule that guarantees the timing constraints of its jobs, if we consider multiple overlapping chained windows, then there might not be any feasible solution for all jobs. Fig. 3-(a) shows a case where no valid schedule exists for two chained windows $w_1$ and $w_2$, while Fig. 3-(b) shows a case where such a schedule exists.

Fig. 3. (a) A case with no valid schedule. (b) A case with one valid schedule.

If we allow chained windows to completely cover another, we again face the original scheduling problem since we then need to find a feasible ordering among the jobs of different chained windows. Hence, we choose the borders of new chained windows such that chained windows are not fully contained in another. To this end, we maintain all chained windows in a sequence ordered by start times. Let $W = \langle w_1, w_2, \ldots, w_l \rangle$ be a sequence of $l$ chained windows and $w_i = ([s_i, e_i], \langle J_i^1, J_i^2, \ldots, J_i^{m_i} \rangle, \delta_i)$ the $i$-th chained window of $W$, and let $m_i$ denote the number of jobs in $w_i$. We ensure that any such sequence of chained windows $W$ always satisfies

\[ \forall i,\ 1 \leq i < l;\quad s_i \leq s_{i+1} \ \land\ e_i \leq e_{i+1} \text{ and} \tag{6} \]
\[ [s_i, e_i] \not\subset [s_{i+1}, e_{i+1}] \ \land\ [s_i, e_i] \not\supset [s_{i+1}, e_{i+1}]. \tag{7} \]

Next we state a simple sufficient schedulability condition for $W$.

Theorem 1. If there exists a function $S : \bigcup_{i=1}^{l} \mathcal{J}_i \to \mathbb{R}$ such that $S(J_1^1) = s_1$ and $\forall i, j$, $1 \leq i < l$, $1 \leq j < m_i$;

\[ S(J_i^{j+1}) = S(J_i^j) + c_i^j, \tag{8} \]
\[ S(J_{i+1}^1) = \max\{s_{i+1},\ S(J_i^{m_i}) + c_i^{m_i}\}, \text{ and} \tag{9} \]
\[ r_i^j \leq S(J_i^j) \ \land\ S(J_i^j) + c_i^j \leq d_i^j, \tag{10} \]

then $S$ is a valid JSS for the jobs contained in $W$ and each such job meets its deadline if scheduled according to $S$.

Proof: The schedule that is specified by (8) to (10) is a special case of (3), where each job is scheduled right after the previous job of its own chained window.
Moreover, due to (9), the jobs of neighboring chained windows are executing sequentially, and hence their allocations do not overlap. The function $S$ is hence a valid JSS. That each job completes by its deadline is implied by (10) and the fact that $S$ is a valid JSS.

In the remainder of this section, we discuss how to iteratively construct $W$ so that such a valid JSS $S$ exists.

C. Operations on Chained Windows

In this subsection, we (i) discuss how to compute the start time of the first job in a chained window, (ii) define an update operation to prune the boundaries of a chained window and to remove subintervals of the time windows that can never be used in any valid JSS, (iii) define a merge operation that allows us to merge two chained windows without affecting their original valid JSSs, (iv) define an add operation for adding a new job to a set of chained windows while keeping all jobs schedulable, and (v) show how to find a safe set of slack intervals that can be used to build a chained window for a new job. Finally, we will put everything together in Sec. III-D and show how to use these operations to find a schedule for a set of tasks.

Fig. 4. Examples of the (a) update and (b) merge operations.

Start times. According to (9), the first job of a chained window $w_i$ cannot be started until all jobs of the previous chained windows have been scheduled, and hence some parts of the interval $[s_i, e_i]$ cannot be allocated by any valid JSS of $\mathcal{J}_i$. These inaccessible intervals are identified by the earliest possible finish time of jobs in $w_1$ to $w_{i-1}$, denoted by $t^f_{i-1}$, and the latest possible start time of jobs of $w_i$ to $w_l$, denoted by $t^s_i$. These values can be obtained with the following recursive equations:

\[ t^f_i = \max\{t^f_{i-1},\ s_i\} + C_i, \text{ and} \tag{11} \]
\[ t^s_i = \min\{t^s_{i+1},\ e_i\} - C_i, \tag{12} \]

where $t^f_1 = s_1 + C_1$ and $t^s_l = e_l - C_l$.

Update operation. The purpose of the update operation is to prune all chained windows after a job has been added to one of the chained windows in $W$, and to ensure their consistency. It works as follows: (i) for each chained window $w_i$ in $W$, set the new starting point $s'_i$ to

\[ s'_i = \begin{cases} \max\{s_i,\ t^f_{i-1}\} & 1 < i \leq l \\ s_1 & i = 1 \end{cases}; \tag{13} \]

(ii) set the new end point $e'_i$ to

\[ e'_i = \begin{cases} \min\{e_i,\ t^s_{i+1}\} & 1 \leq i < l \\ e_l & i = l \end{cases}; \text{ and} \tag{14} \]

(iii) set the new slack value $\delta'_i$ to

\[ \delta'_i = \delta_i - \max\{0,\ t^f_{i-1} - s_i\} - \max\{0,\ e_i - t^s_{i+1}\}, \tag{15} \]

where a term that refers to a non-existent neighbor ($t^f_0$ or $t^s_{l+1}$) is taken to be zero. The slack thus shrinks by exactly the amount pruned from either border, so that (4) continues to hold for the updated window.

Fig. 4-(a) shows an example of an update operation. In this example, the start time of $w_2$ is updated because the earliest finish time of $w_1$ was already greater than $s_2$. However, since the latest start time of $w_2$ is 25, there is no need to update $e_1$.

Merge operation. Two chained windows $w_1$ and $w_2$ can merge and create a new chained window $w$, where $s = s_1$, $e = e_2$, $\mathcal{J} = \mathcal{J}_1 + \mathcal{J}_2$, and $\delta = e - s - (C_1 + C_2)$, provided that

\[ \delta \leq \delta_1 \leq e_1 - s_2. \tag{16} \]

By definition, $\delta = e_2 - s_1 - (C_1 + C_2) = e_2 - s_1 - (e_1 - s_1 - \delta_1) - (e_2 - s_2 - \delta_2)$, which can be simplified to $\delta_1 + \delta_2 - (e_1 - s_2)$. From (16), we have $\delta \leq \delta_1$, thus $\delta_1 + \delta_2 - (e_1 - s_2) \leq \delta_1$ and hence $\delta_2 \leq e_1 - s_2$, or equivalently, $s_2 + \delta_2 \leq e_1$. In $w$, the earliest start time of any job of $\mathcal{J}_2$ is $e_1 - \delta_1 \geq s_2$ (due to (16)). Similarly, the latest finish time of any job of $\mathcal{J}_1$ is $s_2 + \delta_2 \leq e_1$. Hence any JSS $S$ that is constructed from $\mathcal{J}$ is also a valid JSS for both $\mathcal{J}_1$ and $\mathcal{J}_2$. Thus $w$ is a chained window since any possible JSS is valid. Fig. 4-(b) illustrates the merge operation. Note that merging only works if conditions (6) and (7) hold.

Add operation. A job $J^i : (r^i, c^i, d^i)$ can be added to a sequence of chained windows $W$ after a chained window $w_k$ if

\[ t^s_{k+1} - t^f_k \geq c^i. \tag{17} \]

The add procedure works as follows: (i) create a new chained window $w$ with $s = \max\{r^i, t^f_k\}$, $e = \min\{t^s_{k+1}, d^i\}$, $\mathcal{J} = \langle J^i \rangle$, and $\delta = e - s - c^i$, (ii) insert $w$ after $w_k$ and before $w_{k+1}$ in $W$, (iii) perform the update operation on $W$, and (iv) perform the merge operation on all chained windows for which (16) is satisfied, in order from the first to the last. Due to the way we construct the new chained window $w$, it already satisfies (6) and (7) as well as the conditions in Definition 3.

Finding slack intervals. To add a job $J^i$ to the schedule, we need to find a suitable gap (or slack interval) between already accepted jobs that can fit $J^i$. A slack interval $\alpha$ is considered safe if it satisfies three conditions: first, the job must fit (i.e., $|\alpha| \geq c^i$); second, $\alpha$ must lie within the feasible window of $J^i$ defined by its release time and deadline; and third, if $\alpha$ is used to accommodate job $J^i$, then this does not cause a deadline miss for any of the jobs of the existing chained windows. The latter condition can be ensured by constructing $\alpha$ based on the earliest finish time and latest start times of two consecutive chained windows $w_k$ and $w_{k+1}$. Fig. 5-(d) shows an example. If a safe slack interval $\alpha$ is found, then $J^i$ can be successfully scheduled in it.

A set of suitable candidate intervals that can be used to add a job $J^i$ to a given set of chained windows $W$ is obtained as follows: (i) if $|W| = 0$, then just add an artificial chained window $w_0 = ([-\infty, +\infty], \emptyset, \delta = \infty)$; otherwise add two artificial chained windows $w_0 = ([-\infty, s_1], \emptyset, \delta = \infty)$ and $w_{l+1} = ([e_l, +\infty], \emptyset, \delta = \infty)$ to $W$. (ii) For each $w_k$, obtain the primary slack interval $\alpha_k = (w_k, [\max\{r^i, t^f_k\}, \min\{d^i, t^s_{k+1}\}])$. Note that $\alpha_k$ is a two-tuple. (iii) If the length of the interval in $\alpha_k$ is larger than or equal to $c^i$, then add $\alpha_k$ to the list of safe slacks. (iv) Sort the list according to a slack selection policy such as first-fit, best-fit, worst-fit, etc.
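The recursions (11) and (12), the update operation, and the merge condition (16) can be sketched compactly. The following is a minimal illustration rather than the authors' implementation: the class `CW` and all function names are hypothetical, the slack is derived from (4) instead of being stored and adjusted as in (15), and the three windows mirror the first step of the example task set with $\tau_1 = (3, 10, 10)$ and $\tau_2 = (6, 12, 12)$, restricted to two jobs of $\tau_1$ for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CW:
    """Hypothetical chained-window record: interval [s, e], ordered jobs, total WCET C."""
    s: float
    e: float
    jobs: List[str]
    C: float

    @property
    def slack(self):
        # delta = e - s - C, i.e., Eq. (4) solved for the slack
        return self.e - self.s - self.C


def earliest_finish(W):
    """t^f_i for every window, following the recursion in Eq. (11)."""
    tf, prev = [], float("-inf")
    for w in W:
        prev = max(prev, w.s) + w.C
        tf.append(prev)
    return tf


def latest_start(W):
    """t^s_i for every window, following the recursion in Eq. (12)."""
    ts, nxt = [0.0] * len(W), float("inf")
    for i in range(len(W) - 1, -1, -1):
        nxt = min(nxt, W[i].e) - W[i].C
        ts[i] = nxt
    return ts


def update(W):
    """Prune window borders per Eqs. (13) and (14); slack then follows from Eq. (4)."""
    tf, ts = earliest_finish(W), latest_start(W)
    pruned = []
    for i, w in enumerate(W):
        s = w.s if i == 0 else max(w.s, tf[i - 1])
        e = w.e if i == len(W) - 1 else min(w.e, ts[i + 1])
        pruned.append(CW(s, e, list(w.jobs), w.C))
    return pruned


def can_merge(w1, w2):
    """Merge condition of Eq. (16): delta <= delta_1 <= e_1 - s_2."""
    delta = w2.e - w1.s - (w1.C + w2.C)
    return delta <= w1.slack <= w1.e - w2.s


def merge(w1, w2):
    """Merged window: s = s_1, e = e_2, jobs concatenated in order."""
    return CW(w1.s, w2.e, w1.jobs + w2.jobs, w1.C + w2.C)


# Two jobs of tau_1 plus a new window [3, 12] for the first job of tau_2.
W = update([CW(0, 10, ["J1,1"], 3), CW(3, 12, ["J2,1"], 6), CW(10, 20, ["J1,2"], 3)])
if can_merge(W[0], W[1]):
    W = [merge(W[0], W[1])] + W[2:]
print([(w.s, w.e, w.jobs, w.slack) for w in W])
```

In this sketch the update shrinks the first window to [0, 6] with slack 3, after which condition (16) holds with equality and the merge yields ([0, 12], ⟨J1,1, J2,1⟩, δ = 3). The merged window cannot absorb the following window of $\tau_1$, since that would increase the slack beyond $\delta_1$.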
For example, the worst-fit (respectively, first-fit) heuristic sorts the list of safe slacks by decreasing interval length (respectively, by increasing start time).

Next, we use the just-defined operations to build a valid JSS for all jobs in the hyperperiod of a given set of periodic tasks.

D. Constructing the Table

The first step is to sort all given jobs (e.g., all jobs in a hyperperiod) according to a priority policy such as NP-RM or NP-EDF. After sorting the jobs, we try to create a chained window for each job while preserving the schedulability of the previously accepted jobs. This process is shown in Algorithm 1. This algorithm must be called with a parameter $\mathcal{J}$ that includes all $N$ input jobs and $W = \{([-\infty, +\infty], \emptyset, \delta = \infty)\}$, which is an artificial chained window with no job.

Algorithm 1: CWinC: Chained Window Construction
Input: $\mathcal{J}$: unscheduled jobs, $W$: existing chained windows, $N$: total number of jobs.
1  if $\mathcal{J}$ is empty then
2      return $W$;
3  end
4  $J^i$ ← the first job of $\mathcal{J}$;
5  Find and store the set of safe slack intervals for $J^i$ in $A$;
6  for each $\alpha_j \in A$ do
7      Create a new chained window $w$ for $J^i$ using $\alpha_j$;
8      Add $w$ after the chained window referenced by $\alpha_j$ in $W$ and create $W'$;
9      Apply the update and merge operations on $W'$;
10     $W''$ ← CWinC($\mathcal{J} \setminus \{J^i\}$, $W'$);
11     if $W''$ contains all $N$ jobs then
12         return $W''$;
13     end
14 end
15 return No solution is found;

Consider the task set in Fig. 1. Assume we use NP-RM to sort the jobs. Thus, $W$ corresponds to Fig. 5-(a) after adding all jobs of $\tau_1$. Then for the first job of $\tau_2$, two slack intervals are available as shown in Fig. 5-(b); the first one is $\alpha_0 = (w_0, [0, 7])$ and the second one is $\alpha_1 = (w_1, [3, 12])$. If we sort these intervals according to the worst-fit heuristic, then the first job of $\tau_2$ will be inserted between $w_1$ and $w_2$, which results in a chained window $w = ([3, 12], \langle J_{2,1} \rangle, \delta = 3)$. For the sake of simplicity, we denote the $x$-th job of $\tau_y$ by $J_{y,x}$. After applying the update and merge operations, the new chained window is merged with $w_1$, which results in $w_1 = ([0, 12], \langle J_{1,1}, J_{2,1} \rangle, \delta = 3)$. Fig. 5-(c) shows the resulting $W$ after adding all jobs of $\tau_2$. Fig. 5-(d) shows the slack intervals that exist for scheduling the job of $\tau_3$. Fig. 5-(e) shows the final set of chained windows.

Fig. 5. Steps of Algorithm 1 to schedule the task set in Fig. 1 with $\tau_1 : (3, 10, 10)$, $\tau_2 : (6, 12, 12)$, and $\tau_3 : (8, 60, 60)$.

It is possible to modify Algorithm 1 to avoid searching for all possible $\alpha_j \in A$ by just considering the first $\alpha_j$ in $A$. Thus, the for-loop in Line 6 of Algorithm 1 is replaced with $\alpha_j \leftarrow A[0]$, which is the first element in $A$. This changes the complexity of Algorithm 1 to $O(N X)$, where $N$ is the number of jobs and $X$ is the maximum number of chained windows. In the worst case, the number of chained windows can reach $N$ if no two chained windows merge with each other. However, since in most cases chained windows gradually merge, our algorithm is efficient in practice (as will be shown in Sec. V).

IV. ONLINE EQUIVALENCE OF AN OFFLINE TABLE

We now turn our attention to the problem of enacting a given non-preemptive schedule or scheduling policy at runtime on a resource-constrained platform. We first review pure table-driven scheduling and pure priority-driven scheduling as starting points, and then introduce our hybrid offline-equivalence approach.

A. The Baselines: Table-Driven and Fixed-Priority Scheduling

Once an offline table has been generated, it can be used to schedule the system simply by looking up and dispatching the job that is to be scheduled whenever the scheduler is invoked. The main design choice is how to encode the table. For example, one approach is to store only the task identifier (TID) and the absolute start time of each job in the table, e.g., $(\tau_2, 1000)$, which means that a job of $\tau_2$ is scheduled to start at time 1000 (relative to the beginning of a hyperperiod). The advantage of this approach is that idle times are encoded implicitly. However, since the time values are absolute (within the hyperperiod), many bits may be required to store the start time, depending on the system's resolution of time and the maximum hyperperiod length (which is unbounded in general and can be large in practice). Assuming microsecond resolution, we estimate that five bytes per entry would be required.

An alternative is to store relative time values that indicate when the next scheduling event occurs, relative to the preceding event. That is, we store the TID and the duration that the task may use the processor, e.g., $(\tau_2, 50)$, which means that $\tau_2$ is allowed to be scheduled for its WCET, which is 50 microseconds. The advantage is that fewer bits per entry suffice since in practice the maximum WCET is of much smaller magnitude than any hyperperiod. A downside is that idle intervals must be explicitly encoded (e.g., by inserting a record with an invalid TID). In the worst case, there is an idle time between any two jobs, but typically fewer idle-time entries are needed since simultaneously released jobs are usually scheduled back-to-back.

In our prototype, we adopted the latter, relative-time encoding scheme; our implementation is sketched in Algorithm 2. In each iteration of the system's main loop, the scheduler looks up the next table entry (Line 3 of Algorithm 2).
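To make the difference between the two encodings concrete, the following sketch converts an absolute-time table into relative-time records with explicit idle entries. It is only an illustration under stated assumptions: the function name `to_relative`, the marker `IDLE`, and the small example table are hypothetical, and each job is assumed to occupy the processor for exactly its WCET.

```python
IDLE = -1  # hypothetical "invalid TID" used to mark an idle-time record


def to_relative(abs_table, wcet, hyperperiod):
    """Convert (tid, absolute start) entries into (tid, duration) records.

    abs_table: list of (tid, start) sorted by start time within one hyperperiod.
    wcet: mapping from tid to WCET, in the same time unit as the start times.
    """
    records, t = [], 0
    for tid, start in abs_table:
        if start > t:
            # Gap before this job: must be stored as an explicit idle record.
            records.append((IDLE, start - t))
        records.append((tid, wcet[tid]))
        t = start + wcet[tid]
    if t < hyperperiod:
        # Trailing idle time until the table wraps around.
        records.append((IDLE, hyperperiod - t))
    return records


# Illustrative table: task 1 at time 0, task 2 at time 3, task 1 again at time 10.
example = to_relative([(1, 0), (2, 3), (1, 10)], {1: 3, 2: 6}, 20)
print(example)
```

Here three absolute entries become five relative records because the gaps [9, 10) and [13, 20) need their own entries, which illustrates the trade-off described above: smaller records, but potentially more of them.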
In Line 5, $t_{next}$, the next time at which the scheduler must be activated, is determined. If the current entry is not an idle-time entry, the indicated task is run to completion (Lines 6-8). If the task finishes earlier than $t_{next}$, or if an idle-time entry is encountered, the algorithm waits in a spin loop (Lines 9-11) until time $t_{next}$ is reached. The modulo operations in Lines 4 and 5 guarantee that the table wraps around at the end of each hyperperiod (i.e., if we reach the end of the table, we start again from the first entry). Since the scheduler is activated once per table entry, the total cost of scheduling is $O(1)$ per job. Note that now() is assumed to be a system function that returns the current value of the system clock (e.g., the number of microseconds since the system booted).

Algorithm 2: Scheduling with an offline table
Input: Timetable $O = \langle (\tau_i, \Delta_i) \rangle$ with $\tau_i \in \tau \cup \{\tau_{idle}\}$
 1  $k \leftarrow 0$; $t_{next} \leftarrow 0$;
 2  while true do
 3      $(\tau_i, \Delta) \leftarrow$ read the $k$-th item of $O$;
 4      $k \leftarrow (k + 1) \bmod |O|$;
 5      $t_{next} \leftarrow (t_{next} + \Delta) \bmod$ hyperperiod;
 6      if $\tau_i \neq \tau_{idle}$ then
 7          Run task $\tau_i$ to completion;
 8      end
 9      while (now() mod hyperperiod) $< t_{next}$ do
10          nothing();  // spin loop
11      end
12  end

In practice, Algorithm 2 can be implemented with very low overhead (Sec. V-A). The main drawback, however, is that the table size is exactly proportional to the number of jobs, which can be prohibitively large even for a small number of tasks, as already discussed in Sec. I. In our implementation, we require 5 bits per TID and use 27 bits to store the relative time in microsecond granularity, for a total of 4 bytes per record (see footnote 1). As even simple ECUs with only a handful of tasks with harmonic periods [3] can easily accumulate $N = 500$ entries, table sizes in the range of (at least) a few kilobytes are not uncommon.

An alternative solution is to use an online scheduling algorithm with low runtime overhead such as fixed-priority scheduling. A straightforward implementation suitable for microcontrollers is sketched in Algorithm 3. An array, denoted by $A = \{a_1^{next}, a_2^{next}, \ldots, a_n^{next}\}$, is used to store the next arrival time of each task. When the scheduler is activated, it scans the array in order of decreasing priority until it finds a task $\tau_i$ that has a next-arrival time in the past, i.e., $t_{now} \geq a_i^{next}$. After dispatching the highest-priority ready task $\tau_i$, the scheduler updates $a_i^{next}$ and then proceeds to rescan the array of arrival times, starting again with the highest-priority task $\tau_1$.
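The linear scan just described can be tried out on a simulated clock. The following Python sketch is an illustration only, not the authors' implementation: the names `Task` and `simulate_np_fp` are hypothetical, an integer clock stands in for now(), every job is assumed to run for exactly its WCET, deadlines are assumed to equal periods, and idle polling is replaced by a jump to the earliest next arrival. It is applied to the three-task example listed in the caption of Fig. 5.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wcet: int    # C_i
    period: int  # T_i (assumption: implicit deadline, D_i = T_i)


def simulate_np_fp(tasks, horizon):
    """Non-preemptive fixed-priority scan; tasks are given in decreasing priority order."""
    next_arrival = [0] * len(tasks)  # the array A of next arrival times
    now = 0
    trace = []  # (start, finish, task name, absolute deadline)
    while now < horizon:
        for i, task in enumerate(tasks):
            if now >= next_arrival[i]:
                release = next_arrival[i]
                finish = now + task.wcet  # run to completion
                trace.append((now, finish, task.name, release + task.period))
                next_arrival[i] += task.period
                now = finish
                break  # rescan from the highest-priority task
        else:
            # Nothing is ready: jump to the earliest arrival instead of polling.
            now = min(next_arrival)
    return trace


tasks = [Task("tau1", 3, 10), Task("tau2", 6, 12), Task("tau3", 8, 60)]
trace = simulate_np_fp(tasks, 60)
misses = [entry for entry in trace if entry[1] > entry[3]]
print(misses)
```

Under these assumptions the scan starts the low-priority task at time 9, and the second job of the middle task then finishes at 29, after its deadline at 24.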
The algorithm uses a linear traversal, rather than a bitmap-guided lookup or a more advanced data structure, since microcontrollers often do not have an instruction to quickly determine the highest set bit in a word (e.g., this is the case with the AVR processor family used in Arduino boards), and since dynamic data structures such as min-heaps or red-black trees typically do not improve runtimes for a small number of tasks.

With an online policy, there is no need to store the offline table in memory. Furthermore, for small $n$, Algorithm 3 is quite fast in practice (Sec. V-A). However, we now have a schedulability problem: the schedulability ratio of simple, work-conserving policies such as NP-RM or NP-EDF is very low if tasks have relatively large execution times [9]. Conversely, employing the non-work-conserving policy CW-EDF [9], which offers much better schedulability in theory, results in unacceptably high runtime overheads in practice (Sec. V-A).

Algorithm 3: Non-preemptive fixed-priority scheduler
Input: Task periods $T_1, \ldots, T_n$
 1  $A = \{a_j^{next}\}_{j=1}^{n} \leftarrow \{0, \ldots, 0\}$;
 2  while true do
 3      $t_{now} \leftarrow$ now();  // read system clock
 4      for $i := 1$ to $n$ do
 5          if $t_{now} \geq a_i^{next}$ then
 6              Run task $\tau_i$ to completion;
 7              $a_i^{next} \leftarrow a_i^{next} + T_i$;
 8              break;
 9          end
10      end
11  end

Footnote 1: If job WCETs can be guaranteed to never exceed (roughly) 524 ms, then the record size can be further reduced to 3 bytes by using only 19 bits to store the relative time.

B. Offline Equivalence: Idea and Challenges

In order to combine some of the benefits of both online and offline approaches, we propose a new type of solution based on the idea of storing only the crucial information that makes the offline table different from a given baseline online policy such as NP-RM. This information can then be used at runtime to produce an online schedule that exactly follows the offline table.

For example, suppose that we seek to re-create the schedule shown in Fig. 1 using NP-RM as the baseline policy, i.e., tasks with shorter periods have higher priority. The schedule in Fig. 1 differs in two important ways from an NP-RM schedule.
First, in the interval [9, 10), no task is scheduled although there is pending work, which is a non-work-conserving idle time. A work-conserving algorithm such as NP-RM would schedule $\tau_3$ at time 9, which, however, would result in a deadline miss for the next job of $\tau_2$. Second, at time 30, $\tau_2$ is scheduled instead of the (according to the NP-RM policy) higher-priority task $\tau_1$, which is a priority inversion. Alternatively, the fourth job of $\tau_1$ can also be seen as having an irregular start time, i.e., it can be seen as being released at time 36 rather than at its regular periodic release time at time 30. Either way, if the two jobs are swapped, then $\tau_2$ misses its deadline at time 36. Both non-work-conserving idle times and priority inversions (or irregular start times) are thus essential and must be faithfully reproduced at runtime.

To prevent divergence of the online schedule from a given table, we must address two main challenges. First, we need to identify the crucial difference information that must be stored, and make efficient use of this information at runtime. Second, we must deal with early job completions, i.e., the fact that jobs in practice rarely (if ever) execute for their full WCET due to input variations and the pessimism inherent in WCET estimates. If jobs under-run their WCET, the scheduler can be invoked too early, before the release of a high-priority job that, according to the WCET-based table, should have been scheduled next. As a result, the scheduler could instead pick an already pending lower-priority job, which can result in deadline misses (i.e., under non-preemptive scheduling, early completions can induce scheduling anomalies). It is worth noting that since jobs are not released by external events, there is no release jitter (e.g., due to interrupt handling, since interrupts are not delivered and processed instantaneously).

C. Building an Equivalent Online Schedule

Based on the sketched ideas, we present the offline equivalence (OE) scheduler, which is given in its entirety in Algorithms 4 and 5. In the following, we introduce the OE algorithm step by step and discuss how it addresses the outlined challenges.

We use Algorithm 3 with rate-monotonic (or deadline-monotonic) priorities as the starting point since it incurs comparatively low runtime overhead. The first step then is to avoid divergence due to WCET under-runs. Since the goal is to recreate a fixed timetable, there is little benefit in starting jobs early, and the OE scheduler simply ensures that all jobs consume their full WCET by using a spin loop (as in Lines 9-11 of Algorithm 2) to fill any surplus time (Lines 33-35 of Algorithm 4).

The next step is to identify all instances where the given reference table either deviates from the online priority order or where it is not work-conserving. We call these two types of irregularities priority-inversion irregularity (PII) and idle-time irregularity (ITI), respectively.

First, consider ITIs, which are the simpler case. An ITI occurs between two consecutive jobs $J_i$ and $J_{i+1}$ if there is an idle instant between the two jobs although there is a pending job $J_j$:

$$S(J_i) + c_i < S(J_{i+1}) \;\wedge\; \exists J_j : r_j < S(J_{i+1}) \leq S(J_j). \qquad (18)$$

If (18) holds, there is a gap in the schedule between jobs $J_i$ and $J_{i+1}$, but a job $J_j$ is already pending and still incomplete strictly before $J_{i+1}$ is scheduled at time $S(J_{i+1})$, which indicates a non-work-conserving idle time. For each such ITI, we add an entry consisting of the absolute start time of the forced idle interval (4 bytes) and its length (2 bytes) to the idle-time table (IT-table). The IT-table is sorted by increasing start times. At runtime, the OE scheduler maintains an index into the IT-table. Before every scheduling decision, it consults the current ITI record to determine whether a forced idle time must be inserted (Lines 11-14 of Algorithm 4).

Next, consider PIIs.
Since the baseline policy NP-RM is non-preemptive (and since each job is padded to consume its full WCET), if a high-priority job is released while a lower-priority job is running, there is no PII because even NP-RM will not schedule the high-priority job until the lower-priority job finishes its execution. Thus, a PII exists only if, for two jobs $J_i$ and $J_j$ with respective priorities $p_i$ and $p_j$,

$$S(J_i) < S(J_j) \;\wedge\; p_i > p_j \;\wedge\; r_j \leq S(J_i). \qquad (19)$$

(Recall that a numerically smaller value implies higher priority.) We create a separate priority-inversion table (PI-table) for each task, in which we note any of the respective task's jobs that have an elevated priority and possibly a modified start time. For each PII of a task, we add an entry that consists of the job sequence number (2 bytes) and the arrival delay (4 bytes; see footnote 2). Each PI-table is sorted by increasing job sequence numbers.

The PI-tables are used at runtime as follows. For each PI-table, the scheduler maintains a current index. Further, in addition to the next-arrival times array $A$, the scheduler also maintains an array of next-inter-arrival times $B$. In the regular case, the entries in $B$ just correspond to each task's period. However, when the current job has an irregular start time, the reduced inter-arrival time of the next job is recorded in $B$.

Footnote 2: Depending on the task set, the delay field can be reduced to 3 or 2 bytes.
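Conditions (18) and (19) can be checked mechanically once a table is available. The sketch below is illustrative only: `Job`, `build`, `find_itis`, and `find_piis` are hypothetical names, the task index doubles as the priority, and the table itself is hand-built to be consistent with the textual description of Fig. 1 (idle in [9, 10), priority inversion at time 30); it is an assumption, not data read off the figure. For each ITI the sketch records the whole gap between the two consecutive jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    task: int     # task index; a numerically smaller value means higher priority
    seq: int      # job sequence number within the hyperperiod
    release: int  # r
    wcet: int     # c
    start: int    # S(J) in the offline table


def build(task, wcet, period, starts):
    """Create the periodic jobs of one task from its table start times."""
    return [Job(task, k, k * period, wcet, s) for k, s in enumerate(starts)]


def find_itis(table):
    """Return (start, length) of each gap that satisfies condition (18)."""
    jobs = sorted(table, key=lambda j: j.start)
    itis = []
    for cur, nxt in zip(jobs, jobs[1:]):
        gap_start = cur.start + cur.wcet
        pending = any(j.release < nxt.start <= j.start for j in jobs)
        if gap_start < nxt.start and pending:
            itis.append((gap_start, nxt.start - gap_start))
    return itis


def find_piis(table):
    """Return (J_i, J_j) pairs that satisfy condition (19)."""
    return [
        (ji, jj)
        for ji in table
        for jj in table
        if ji.start < jj.start and ji.task > jj.task and jj.release <= ji.start
    ]


# Hypothetical table for tau1: (3, 10, 10), tau2: (6, 12, 12), tau3: (8, 60, 60).
table = (
    build(1, 3, 10, [0, 10, 27, 36, 45, 54])
    + build(2, 6, 12, [3, 13, 30, 39, 48])
    + build(3, 8, 60, [19])
)
itis = find_itis(table)
piis = find_piis(table)
print(itis, [(a.task, a.start, b.task, b.start) for a, b in piis])
```

For this assumed table the sketch finds one ITI (start 9, length 1) and one PII (the job of task 2 started at 30 ahead of the job of task 1 released at 30), i.e., two 6-byte entries in total.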
Algorithm 4: OE scheduling algorithm
Input: Tasks $\tau_1, \ldots, \tau_n$, IT-table, and PI-tables
 1  $A = \{a_j^{next}\}_{j=1}^{n} \leftarrow \{0, \ldots, 0\}$;  // arrival times
 2  $B = \{b_j^{next}\}_{j=1}^{n} \leftarrow \{T_1, \ldots, T_n\}$;  // inter-arrival times
 3  $X \leftarrow \emptyset$;
 4  while true do
 5      $t_{now} \leftarrow$ now();  // read system clock
 6      if $t_{now} >$ hyperperiod then
 7          $\forall i$: $a_i^{next} \leftarrow a_i^{next} -$ hyperperiod;  // wrap time
 8          $t_{now} \leftarrow t_{now} -$ hyperperiod;  // wrap time
 9          Reset IT- and PI-table indices and job numbers;
10      end
11      if processor must idle at $t_{now}$ according to IT-table then
12          Get duration $\Delta$ from table and advance index;
13          $t_{next} \leftarrow t_{now} + \Delta$;
14      end
15      else
16          foreach $\tau_i \in X$ do
17              if $t_{now} \geq a_i^{next}$ then
18                  $t_{next} \leftarrow t_{now} + C_i$;
19                  Call Algorithm 5 for $\tau_i$;
20                  Run task $\tau_i$ to completion;
21                  goto Line 33;
22              end
23          end
24          for $i := 1$ to $n$ do
25              if $t_{now} \geq a_i^{next}$ then
26                  $t_{next} \leftarrow t_{now} + C_i$;
27                  Call Algorithm 5 for $\tau_i$;
28                  Run task $\tau_i$ to completion;
29                  break;
30              end
31          end
32      end
33      while now() $< t_{next}$ do
34          nothing();  // spin loop
35      end
36  end

Finally, since the release of jobs mentioned in the PI-tables needs to overrule the regular priority order, the OE scheduler maintains a separate list $X$ that contains tasks whose next jobs have an irregular start time and priority. When preparing the release of the next job of a task, the OE scheduler consults the task's PI-table by looking up the entry pointed to by the current index. If the next job is indicated to be irregular, the task is added to $X$ and the job's release is delayed. To ensure that later jobs are released periodically, the inter-arrival time $b_i^{next}$ is adjusted accordingly (Lines 3-8 of Algorithm 5).

When the OE scheduler is invoked and no idle time is indicated by the IT-table, it first traverses $X$ to see if any of the upcoming irregular jobs must now be released and scheduled (Lines 16-23 of Algorithm 4). It is worth noting that, at any time, there will be only one irregular task in the system, among those stored in $X$, that is eligible to be scheduled.
Finally, if there is no ready irregular job, the scheduler finds the highest-priority task with a pending regular job by traversing the normal priority array $A$ (Lines 24-31 of Algorithm 4).
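To see how the IT-table, the PI-tables, and the list $X$ interact, the following Python sketch replays the steps described above on a simulated clock for one hyperperiod. It is a simplified illustration under stated assumptions, not the authors' implementation: `simulate_oe` and its parameters are hypothetical names, every job is padded to its WCET by simply advancing the clock, polling is replaced by a jump to the next event, a task is removed from $X$ once its irregular job has been dispatched (the text does not spell this out), and the tables used in the example are hand-built for the task set $\tau_1$: (3, 10, 10), $\tau_2$: (6, 12, 12), $\tau_3$: (8, 60, 60).

```python
def simulate_oe(wcet, period, it_table, pi_tables, hyperperiod):
    """Replay the OE scheduling steps for one hyperperiod.

    wcet, period: per-task lists in decreasing priority order.
    it_table: sorted list of (start, length) forced idle intervals.
    pi_tables: per-task dict {job sequence number: release delay}.
    """
    n = len(wcet)
    arrival = [0] * n             # array A
    inter_arrival = list(period)  # array B
    job_no = [0] * n              # sequence number of each task's next job
    irregular = []                # list X
    it_index = 0
    now = 0
    trace = []                    # (start time, task index)

    def release_next(i):
        """Job-release step: prepare the next job of task i."""
        arrival[i] += inter_arrival[i]
        inter_arrival[i] = period[i]
        job_no[i] += 1
        delay = pi_tables[i].get(job_no[i])
        if delay is not None:     # the next job is irregular
            irregular.append(i)
            arrival[i] += delay
            inter_arrival[i] -= delay

    while now < hyperperiod:
        if it_index < len(it_table) and it_table[it_index][0] <= now:
            now += it_table[it_index][1]  # forced idle time
            it_index += 1
            continue
        # X is scanned before the regular priority order.
        ready = [i for i in irregular if now >= arrival[i]]
        ready += [i for i in range(n) if now >= arrival[i]]
        if ready:
            i = ready[0]
            if i in irregular:
                irregular.remove(i)       # assumption, see lead-in
            trace.append((now, i))
            release_next(i)
            now += wcet[i]                # padded to the full WCET
        else:
            events = list(arrival)
            if it_index < len(it_table):
                events.append(it_table[it_index][0])
            now = min(events)             # jump instead of polling
    return trace


# Hypothetical OE tables: one forced idle at time 9, and the fourth job
# (sequence number 3) of the highest-priority task delayed by 6 time units.
oe_trace = simulate_oe([3, 6, 8], [10, 12, 60], [(9, 1)], [{3: 6}, {}, {}], 60)
plain_trace = simulate_oe([3, 6, 8], [10, 12, 60], [], [{}, {}, {}], 60)
print(oe_trace)
```

With empty tables the sketch degenerates to the plain fixed-priority scan, which starts the lowest-priority task at time 9; with the two table entries it instead idles during [9, 10) and starts the delayed job at time 36.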

Algorithm 5: OE job-release algorithm
Input: $\tau_i$, $a_i^{next}$, $b_i^{next}$, $X$
 1  $a_i^{next} \leftarrow a_i^{next} + b_i^{next}$;
 2  $b_i^{next} \leftarrow T_i$;
 3  if the next job of $\tau_i$ is irregular then
 4      $\Delta \leftarrow$ release delay of the next job of $\tau_i$ from PI-table;
 5      Add $\tau_i$ to $X$;
 6      $a_i^{next} \leftarrow a_i^{next} + \Delta$;
 7      $b_i^{next} \leftarrow b_i^{next} - \Delta$;
 8  end

Finally, since the idle-time offsets and job sequence numbers stored in the IT- and PI-tables are relative to the start of a hyperperiod, whenever a hyperperiod boundary is passed, the scheduler resets the pointers to the beginning of the tables. This process happens in Lines 6-10 of Algorithm 4.

Algorithm 4 has $O(n)$ computational complexity because of the linear searches in Lines 16 and 24, which are bounded by the number of tasks. Since resource-constrained systems typically do not have a large number of tasks (i.e., rarely more than a dozen or so), Algorithm 4 is reasonably fast in practice (Sec. V-A).

D. A Priority-Inversion Reduction Pass

We next discuss how to modify an offline table such that the number of PII cases is reduced while guaranteeing that the schedulability of all jobs is preserved in the resulting table. The main idea is to swap any two jobs that form a PII.

Lemma 1. If two arbitrary jobs $J_i$ and $J_j$ with respective priorities $p_i > p_j$ are scheduled at times $S(J_i) < S(J_j)$, then swapping these two entries in the table will not introduce a deadline miss (for any job) provided that (i) the jobs $J_{i+1}, \ldots, J_N$ remain schedulable after the swap, and (ii)

$$r_j \leq S(J_i) \quad \text{and} \quad S(J_j) + c_i \leq d_i. \qquad (20)$$

Proof: Trivially, all jobs $J_{i+1}, \ldots, J_N$ remain schedulable as (i) stipulates this as a necessary precondition. We show that the $i$-th entry, which is now taken by $J_j$, is also schedulable. The new starting time for $J_j$ will be $S'(J_j) = S(J_i)$, which according to (20) is not smaller than the release time of $J_j$. Since, according to the assumptions, $S(J_i) < S(J_j)$, it follows that in a valid table, $d_j$ must not be smaller than $S(J_j) + c_j$, which is larger than or equal to $S(J_i) + c_j$. Thus, $J_j$ remains schedulable.
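As a small worked instance of condition (20), the check below evaluates part (ii) of Lemma 1 for one candidate swap. It is only a sketch: the function name `condition_20_holds` is hypothetical, and part (i) of the lemma (schedulability of the remaining jobs) is deliberately not checked. The first call uses the pair discussed in Sec. IV-B, where the job of $\tau_2$ starts at time 30 with $c_i = 6$ and $d_i = 36$, and the job of $\tau_1$ is released at time 30 and starts at time 36; the second call uses made-up numbers.

```python
def condition_20_holds(start_i, wcet_i, deadline_i, start_j, release_j):
    """Part (ii) of Lemma 1: r_j <= S(J_i) and S(J_j) + c_i <= d_i.

    J_i is the earlier, lower-priority job and J_j the later, higher-priority
    job. Part (i) of the lemma is NOT checked here.
    """
    return release_j <= start_i and start_j + wcet_i <= deadline_i


# Pair from Sec. IV-B: 36 + 6 = 42 exceeds the deadline 36, so no swap.
fig1_pair = condition_20_holds(start_i=30, wcet_i=6, deadline_i=36,
                               start_j=36, release_j=30)

# Made-up pair with ample slack: 2 + 2 = 4 is within the deadline 20.
slack_pair = condition_20_holds(start_i=0, wcet_i=2, deadline_i=20,
                                start_j=2, release_j=0)
print(fig1_pair, slack_pair)
```

The first result agrees with the earlier observation that swapping these two jobs would make $\tau_2$ miss its deadline at time 36, so this PII has to be kept in the PI-table.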
Based on this observation, it is possible to apply a sorting-like algorithm similar to bubble sort or insertion sort to gradually bubble up jobs with higher priority that are located later than lower-priority ones in the schedule. Although this is only a best-effort solution (i.e., the resulting table is not guaranteed to have a minimal number of PIIs), we have found it to be an effective first step. Note that this table preprocessing pass requires further adjustments if job precedence or other types of constraints are considered during the construction of the original table, as we assume that any two jobs can be freely reordered.

V. EXPERIMENTAL RESULTS

We conducted experiments to answer the following key questions: (i) What is the overhead of our approach? (ii) How effective is our table-generation approach at scheduling non-preemptive tasks? And (iii) how much memory is needed?

A. Runtime Experiments on an Arduino Mega 2560 Platform

We implemented our solution on an Arduino Mega 2560 board, which has an ATMega2560 RISC microcontroller with a clock speed of 16 MHz, no cache memory, 8 KB SRAM, and 256 KB flash memory. All reported overheads were measured with Arduino's built-in micros() clock, which has an accuracy of eight microseconds according to the documentation.

We implemented five scheduling algorithms: online CW-EDF (due to its good performance in theory), NP-EDF (as a well-known baseline), NP-RM (as a representative fixed-priority scheduler based on Algorithm 3), our proposed OE technique (Algorithm 4), and table-driven scheduling (TD) according to Algorithm 2.

We chose CW-EDF as a baseline because it is empirically one of the best (in terms of schedulability ratio) online scheduling algorithms for non-preemptive periodic task sets [9]. Briefly, CW-EDF improves schedulability by considering the impact on future, not-yet-released jobs when making scheduling decisions. Specifically, when CW-EDF finds a job $J_i$ to have the earliest deadline, it considers the next job of each non-pending task in the system to make sure that executing $J_i$ will not cause a deadline miss for any of these future jobs.
If scheduling $J_i$ could result in a deadline miss, then CW-EDF instead schedules an idle interval until the next job release. In the worst case, CW-EDF considers at most $n - 1$ future jobs per scheduling decision in this manner.

The tables for the TD and OE policies were either stored on an attached SD card and then loaded into RAM during initialization, or stored and accessed directly in flash memory by compiling them into the system image. While RAM is much more constrained and costly, reading table entries from RAM takes 2 cycles (per byte), while flash memory accesses take 3 or more cycles (per byte).

We evaluated the effect of the number of tasks on scheduling overhead. Task sets were generated randomly as follows. For a given number of tasks $n \in \{3, 6, 9, 12\}$, we selected periods from a log-uniform distribution across the range [1, 1000] (in milliseconds) as suggested by Emberson et al. [13]. We then selected $u_1$ uniformly at random from [0.01, 0.99]. Based on this value, we obtained $C_1 = u_1 T_1$. For the other tasks, we selected the execution time uniformly at random from $[0.001, 2(T_1 - C_1)]$, because $C_i \leq 2(T_1 - C_1)$ is a necessary schedulability condition for non-preemptive task sets [14].

As discussed shortly in Sec. V-B, we evaluated several ways of generating scheduling tables. To obtain ITI and PII entries for use in the overhead experiments, we always chose the table with the minimum number of irregularities. We discarded any task sets for which none of the considered table-generation methods could find a schedule, or for which the number of jobs in the table exceeded 1,000 (due to RAM size limitations).

For each value of $n$, we generated 1,000 task sets. Each task set was executed for 30 seconds under each tested scheduler (and in the case of the OE and TD schedulers, once each using RAM- and flash-based tables), which translates to over 33 hours of runtime per scheduler, and more than 233 hours of total runtime.

Fig. 6 reports the minimum, maximum, and average observed scheduling overhead under the different scheduling methods and for different task-set sizes.
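The task-set generation procedure described above can be sketched in a few lines of Python. This is an illustration under assumptions that the text does not settle: `generate_task_set` is a hypothetical name, $\tau_1$ is taken to be the task with the shortest period, periods are left as unrounded floating-point values, the upper bound on the other execution times is read as $2(T_1 - C_1)$, and no filtering of the generated sets (e.g., by schedulability or table size) is performed.

```python
import math
import random


def generate_task_set(n, rng):
    """Return n (wcet, period) pairs in milliseconds, sorted by period."""
    # Log-uniform periods in [1, 1000] ms: uniform in the log domain.
    periods = sorted(
        math.exp(rng.uniform(math.log(1.0), math.log(1000.0))) for _ in range(n)
    )
    t1 = periods[0]                # assumption: tau_1 has the shortest period
    u1 = rng.uniform(0.01, 0.99)   # utilization of tau_1
    c1 = u1 * t1
    upper = 2.0 * (t1 - c1)        # bound from the necessary condition
    tasks = [(c1, t1)]
    for period in periods[1:]:
        tasks.append((rng.uniform(0.001, upper), period))
    return tasks


rng = random.Random(0)             # fixed seed only to make the sketch repeatable
task_set = generate_task_set(6, rng)
for wcet, period in task_set:
    print(f"C = {wcet:.3f} ms, T = {period:.3f} ms")
```

Because $T_1 \geq 1$ and $u_1 \leq 0.99$, the upper bound $2(T_1 - C_1)$ is at least 0.02 in this sketch, so the sampling interval for the other execution times is never empty.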
For $n = 3$, the differences in overhead are relatively minor. As $n$ increases, however, it becomes clear that CW-EDF exhibits much higher overheads than the other schedulers. As explained earlier, at each activation, CW-EDF performs a forward scan that inevitably results in high overhead. As expected, TD exhibits the lowest overhead, which is also more or less independent of $n$ due to its $O(1)$ complexity. (The minor measured differences are smaller in magnitude than the resolution of the available clock device.) Considering the available clock resolution (8 µs), we could not observe significant differences between placing the tables in (the larger) flash memory or in the (much scarcer) RAM.

Fig. 6. Scheduling overhead (microseconds).

The low overhead exhibited by NP-RM (the policy is almost as efficient as the TD scheduler) validates our choice to use it as the baseline policy for the OE approach. For context, in terms of maximum and average overhead, the OE scheduler is about twice as costly as either the TD or NP-RM schedulers, and actually significantly more efficient than the standard NP-EDF scheduler.

Overall, our experiments confirm the desired compromise between fast but large offline tables and the slow but memory-friendly CW-EDF policy: overheads are indeed much lower than under CW-EDF, while only a small fraction of the table must be stored, as we show next.

B. Table-Generation Experiments

In the following, we report on a comparison of different table-generation approaches.
We considered NP-RM, NP-EDF, and CW-EDF [9], which are actually online policies that we simulated until the end of the hyperperiod; naïve backtracking (BB-Naïve), which iterates over all possible job orderings [12]; and Moore's branch-and-bound with pruning (BB-Moore) [15]. The BB-Moore algorithm tries to find a schedulable ordering of a given set of jobs. It is based on a branch-and-bound strategy; however, before branching, it calculates the maximum tardiness of the remaining unscheduled jobs according to the preemptive EDF algorithm. If the tardiness is larger than the tardiness of one of the previously seen branches, the branch is pruned, i.e., it will not be explored further.

We further considered several instantiations of our proposed solution introduced in Sec. III: CWin-RM WF, CWin-RM WF+BK, CWin-EDF FF, and CWin-EDF FF+BK, where CWin denotes the chained window technique, WF and FF are the worst- and first-fit strategies for slack selection, and BK means that the backtracking option is enabled as specified in Algorithm 1. When backtracking is disabled, instead of iterating over all possible slack intervals $\alpha_j \in A$ (Line 6 of Algorithm 1), we just select the first safe slack interval.

Fig. 7. Schedulability ratio. NP-EDF and NP-RM are overlapped. BB-Moore is overlapped with CW-EDF for large utilization values.

Fig. 8. The ratio of undecided task sets.

It is worth noting that combining RM with WF produced the best results among the possible slack-selection strategies for RM, while the first-fit strategy was the best for EDF. To avoid clutter, we do not report on any other slack-selection strategies such as random-fit, next-fit, best-fit, etc. that were dominated by the displayed strategies.

The experiments were conducted on an Intel Xeon E7-8857 v2 machine clocked at 3 GHz, with 16 cores and 1.2 TB RAM.
In the first experiment, we measured the effect of task set utilization. Random task sets were generated as explained in Sec. V-A. For a given target utilization $U$, we discarded any task sets with a utilization outside $U \pm 0.01$. In this experiment, we considered six tasks because systems with limited resources usually do not have a large number of tasks. To ensure overall progress, we set a time budget of one minute for the table-generation algorithms, i.e., if an algorithm could not find a schedule within one minute, we report the task set as undecided. We also explored other time limits, as discussed later (Fig. 12).

Fig. 7 shows the schedulability ratio, i.e., the fraction of task sets for which a schedule could be found, relative to the total number of generated task sets. For context, the figure also shows a curve for a necessary schedulability condition [9], which represents an upper bound on the achievable schedulability ratio. Fig. 8 shows the undecided ratio, i.e., the fraction of task sets that could not be scheduled within the given time budget.

The configuration CWin-EDF FF+BK is able to schedule more task sets than the other algorithms. It also has the smallest fraction of undecided task sets among the branch-and-bound and backtracking algorithms within the one-minute time limit. In Fig. 7, the gap between CWin-EDF FF and CWin-EDF FF+BK shows the importance of slack interval selection for the EDF-based chained-window scheduler. We see that the gap is smaller if the jobs are ordered initially by RM.

The poor performance of NP-EDF and NP-RM is due to the fact that they do not insert non-work-conserving idle times to avoid causing large blocking times for future, soon-to-arrive jobs with tight deadlines. Another observation is that BB-Naïve, which is based on an exhaustive search over all possible job orderings, could not find schedules for many of the feasible task sets within the one-minute time budget. Moreover, although CW-EDF is a non-optimal heuristic, for high utilizations, it performs as well as the branch-and-bound BB-Moore algorithm.

Fig. 9. Average time required by the different methods if they successfully find a schedule. Note that the vertical axis has logarithmic scale.

Fig. 10. Average memory consumption by the different methods if they successfully find a schedule. Note that the vertical axis has logarithmic scale.

In Figs. 9 and 10, we show the time and memory consumption of the table-generation methods. As can be seen, the chained window technique is able to find schedules much faster than the branch-and-bound algorithms. During the construction of the schedule, it consumes more memory than BB-Moore because it needs to store all chained windows (i.e., the list $W$), but it still consumes far less memory than BB-Naïve.
In our experiment, the size of the generated offline tables was about 2 KB on average for all of the considered table-generation methods (recall that we use 4 bytes per record). Since the table sizes were almost the same for all table-generation methods, we omit a detailed discussion and instead focus on the sizes of the derived IT- and PI-tables (which require 6 bytes per record).

Fig. 11 shows the average total size of the PI- and IT-tables (called OE-tables here) after applying the PII reduction pass. At lower utilizations, the PII algorithm finds more opportunities to swap jobs. CW-EDF generates schedules that are very close to an NP-RM schedule, and hence is able to provide very small OE-tables. In contrast, the tables that are produced by BB-Naïve and BB-Moore do not result in efficient OE-tables because they inherently have a large number of priority inversions that cannot be corrected by our best-effort PII reduction pass.

Fig. 11. Average total size of all OE offline tables for different table-generation methods after applying the table manipulation algorithm in Sec. IV-D.

It is noteworthy that, in our experiment, NP-EDF and NP-RM generated almost identical schedules, which implies that the OE-tables resulting from NP-EDF have a very small size (almost 0 on average). Obviously, if a task set is already schedulable by NP-RM, there is no need to use the OE technique in the first place. However, as shown in Fig. 7, more than 50% of feasible task sets are not schedulable by a non-preemptive fixed-priority scheduler. The proposed OE approach allows us to fill this gap.

In a second experiment, we measured the schedulability ratio of different table-generation algorithms as a function of the available time budget. Task sets were generated as explained in Sec. V-A with $n = 10$ tasks and a total utilization of $U = 0.9$. Fig. 12 shows the results of the experiment for 1, 10, 100, and 1,000 seconds (per task set). For each setup, we repeated the experiment 1,000 times. In most cases, the exhaustive search BB-Naïve is unable to find a schedule. As can be clearly seen, the chained-window technique has a high schedulability ratio even with a rather limited time budget such as one second.

Fig. 12. Schedulability ratio of the table generation algorithms as a function of the available time budget. Note that the horizontal axis has logarithmic scale.

VI. RELATED WORK

A cyclic executive [16, 17] is one of the traditional execution models for real-time systems in which the application is divided