Predictable Execution Model: Concept and Implementation


Rodolfo Pellizzoni, Emiliano Betti, Stanley Bak, Gang Yao, John Criswell and Marco Caccamo
University of Illinois at Urbana-Champaign, IL, USA; Scuola Superiore Sant'Anna, Italy

Abstract

Building safety-critical real-time systems out of inexpensive, non-real-time, COTS components is challenging. Although COTS components generally offer high performance, they can occasionally incur significant timing delays. To prevent this, we propose controlling the operating point of each COTS shared resource (like the cache, memory, and interconnection buses) to maintain it below its saturation limit. This is necessary because the low-level arbiters of these shared resources are not typically designed to provide real-time guarantees. In this work, we introduce a novel system execution model, the PRedictable Execution Model (PREM), which, in contrast to the standard COTS execution model, coschedules at a high level all active COTS components in the system, such as CPU cores and I/O peripherals. In order to permit predictable, system-wide execution, we argue that real-time embedded applications need to be compiled according to a new set of rules dictated by PREM. To experimentally validate our theory, we developed a COTS-based PREM testbed and modified the LLVM Compiler Infrastructure to produce PREM-compatible executables.

1. Introduction

Real-time embedded systems are increasingly being built using commercial-off-the-shelf (COTS) components such as mass-produced CPUs, peripherals and buses. The overall performance of mass-produced components is often significantly higher than that of custom-made systems. For example, a PCI Express bus [14] can transfer data three orders of magnitude faster than the real-time SAFEbus [8]. However, the main drawback of using COTS components within a real-time system is the presence of unpredictable timing anomalies, since the individual components are typically designed paying little or no attention to worst-case timing behavior. Additionally, modern COTS-based embedded systems include multiple active components (such as CPU cores and I/O peripherals) that can independently initiate access to shared resources, which, in the worst case, causes contention leading to timing degradation. Computing precise bounds on timing delays due to contention is difficult. Even though some existing approaches can produce safe upper bounds, they need to be very pessimistic due to the unpredictable behavior of arbiters of physically shared COTS resources (like caches, memories, and buses). As a motivating example, we have previously shown that the computation time of a task can increase linearly with the number of cache misses suffered due to contention for access to main memory [16]. In a system with three active components, a task's worst-case computation time can nearly triple.

To exploit the high average performance of COTS components without experiencing the long delays occasionally suffered by real-time tasks, we need to control the operating point of each COTS shared resource and maintain it below saturation limits. This is necessary because the low-level arbiters of the shared resources are not typically designed to provide real-time guarantees. This work aims at showing that this is indeed possible by carefully rethinking the execution model of real-time tasks and by enforcing a high-level coscheduling mechanism among all active COTS components in the system. Briefly, the key idea is to coschedule active components so that contention for accessing COTS shared resources is implicitly resolved by the high-level coscheduler without relying on low-level, non-real-time arbiters.
Several challenges had to be overcome to realize the PRedictable Execution Model (PREM):

- Task execution times suffer high variance due to internal CPU architecture features (caches, pipelines, etc.) and unknown cache miss patterns. This source of temporal unpredictability forces the designer to make very pessimistic assumptions when performing schedulability analysis. To address this problem, PREM uses a novel program execution model with three main features: (1) jobs are divided into a sequence of non-preemptive scheduling intervals; (2) some of these scheduling intervals (named predictable intervals) are executed predictably and without cache misses by

prefetching all required data at the beginning of the interval itself; (3) the execution time of predictable intervals is kept constant by monitoring CPU time counters at run-time.

- I/O peripherals with DMA master capabilities contend for physically shared resources, including memory and buses, in an unpredictable manner. To address this problem, we expand upon our previous work [1] and introduce hardware to put the COTS I/O subsystem under the discipline of real-time scheduling.

- Low-level COTS arbiters are usually designed to achieve fairness instead of real-time performance. To address this problem, we enforce a coscheduling mechanism that serializes arbitration requests of active components (CPU cores and I/O peripherals). During the execution of a task's predictable interval, a scheduled peripheral can access the bus and memory without experiencing delays due to cache misses caused by the task's execution.

Our PRedictable Execution Model (PREM) can be used with a high-level programming language like C by setting some programming guidelines and by using a modified compiler to generate predictable executables. The programmer provides some information, like the beginning and end of each predictable execution interval, and the compiler generates programs which perform cache prefetching and enforce a constant execution time in each predictable interval. In light of the above discussion, we argue that real-time embedded applications should be compiled according to a new set of rules dictated by PREM. At the price of minor additional work by the programmer, the generated executable becomes far more predictable than state-of-the-art compiled code, and when run with the rest of the PREM system, shows significantly reduced worst-case execution time.

The rest of the paper is organized as follows. Section 2 discusses related work. In Section 3 we describe our main contribution: a co-scheduling mechanism that schedules I/O interrupt handlers, task memory accesses and I/O peripheral data transfers in such a way that access to shared COTS resources is serialized, achieving zero or negligible contention during memory accesses. Then, in Sections 4 and 5 we discuss the challenges in terms of hardware architecture and code organization that must be met to predictably compile real-time tasks. Section 6 presents our schedulability analysis. Finally, in Section 7 we detail our prototype testbed, including our compiler implementation based on the LLVM Compiler Infrastructure [9], and provide an experimental evaluation. We conclude with future work in Section 8.

2. Related Work

Several solutions have been proposed in prior real-time research to address different sources of unpredictability in COTS components, including real-time handling of peripheral drivers, real-time compilation, and analysis of contention for memory and buses. For peripheral drivers, Facchinetti et al. [4] proposed using a non-preemptive interrupt server to better support the reuse of legacy drivers. Additionally, analysis can be done to model the worst-case temporal interference caused by device drivers [10]. For real-time compilation, a tight coupling between compiler and worst-case execution time (WCET) analyzer can optimize a program's WCET [5]. Alternatively, a compiler-based approach can provide predictable paging [17]. For analysis of contention for memory and buses, existing techniques can analyze the maximum delay caused by contention for a shared memory or bus under various access models [15, 20]. All these works attempt to analyze or control a single resource, and obtain safe bounds that are often highly pessimistic. Instead, PREM is based on a global coschedule of all relevant system resources.
Instead of using COTS components, other researchers have discussed new architectural solutions that can greatly increase system predictability by removing significant sources of interference. Instead of a standard cache-based architecture, a real-time scratchpad architecture can be used to provide predictable access time to main memory [22]. The Precision Timed (PRET) machine [3] promises to simultaneously deliver high computational performance together with cycle-accurate estimation of program execution time. While our PREM execution model borrows some ideas from these works, it exhibits one key difference: our model can be applied to existing COTS-based systems without requiring significant architectural redesign. This approach allows PREM to leverage the economy of scale of COTS systems, and supports the progressive migration of legacy systems.

3. System Model

We consider a typical COTS-based real-time embedded system comprising a CPU, main memory and multiple DMA peripherals. While in this paper we restrict our discussion to single-core systems with no hardware multithreading, we believe that our predictable execution model is also applicable to multicore systems. We will present a predictable execution model for multicore systems as part of our planned future work. The CPU can implement one or more cache levels. We focus on the last cache level, which typically employs a write-back policy. Whenever a task

suffers a cache miss in the last level, the cache controller must access main memory to fetch the newly referenced cache line and possibly write back a replaced cache line. Peripherals are connected to the system through a COTS interconnect such as PCI or PCIe [14]. DMA peripherals can autonomously initiate data transfers on the interconnect. We assume that all data transfers target main memory, that is, data is always transferred between the peripheral's internal buffers and main memory. Therefore, we can treat main memory as a single resource shared by all peripherals and by the cache controller.

The CPU executes a set of N real-time periodic tasks Γ = {τ_1, ..., τ_N}. Each task can use one or more peripherals to transfer input or output data to or from main memory. We model all peripheral activities as a set of M periodic I/O flows Γ^{I/O} = {τ^{I/O}_1, ..., τ^{I/O}_M} with assigned timing reservations, and we want to schedule them in such a way that only one flow is transferred at a time. Unfortunately, COTS peripherals do not typically conform to the described model. As an example, consider a task receiving input data from a Network Interface Card (NIC). Delays in the network could easily cause a burst of packets to arrive at the NIC. Since a high-performance COTS NIC is designed to autonomously transfer incoming packets to main memory as soon as possible, the NIC could potentially require memory access for significantly longer than its expected periodic reservation. In [1], we first introduced a solution to this problem consisting of a real-time I/O management scheme. A diagram of our proposed architecture is depicted in Figure 1.

Figure 1: Real-Time I/O Management System.

A minimally intrusive hardware device, called real-time bridge, is interposed between each peripheral and the rest of the system. The real-time bridge buffers all incoming traffic from the peripheral and delivers it predictably to main memory according to a global I/O schedule. Outgoing traffic is also retrieved from main memory in a predictable fashion. To maximize responsiveness and avoid CPU overhead, the I/O schedule is computed by a separate peripheral scheduler, a hardware device based on the previously-developed reservation controller [1], which controls all real-time bridges. For simplicity, in the rest of our discussion we assume that each peripheral services a single task. However, this assumption can be lifted by supporting peripheral virtualization in real-time bridges. More details about our hardware/software implementation are provided in Section 7.

Notice that our previously-developed I/O management system [1] does not solve the problem of memory interference between peripherals and CPU tasks. When a typical real-time task is executed on a COTS CPU, cache misses are unpredictable, making it difficult to avoid low-level contention for access to main memory. To overcome this issue, we propose a set of compiler and OS techniques that enable us to predictably schedule all cache misses during a given portion of a task's execution. The code for each task τ_i is divided into a set of N_i scheduling intervals {s_{i,1}, ..., s_{i,N_i}}, which are executed sequentially at runtime.
The timing requirements of τ_i can be expressed by a tuple {{e_{i,1}, ..., e_{i,N_i}}, p_i, D_i}, where p_i, D_i are the period and relative deadline of the task, with D_i ≤ p_i, and e_{i,j} is the maximum execution time of s_{i,j}, assuming that the interval runs in isolation with no memory interference. A job can only be preempted by a higher priority job at the end of a scheduling interval. This ensures that the cache content cannot be altered by the preempting job during the execution of an interval.

We classify the scheduling intervals into compatible intervals and predictable intervals. Compatible intervals are compiled and executed without any special provisions (they are backwards compatible). Cache misses can happen at any time during these intervals. The task code is allowed to perform OS system calls, but blocking calls must have bounded blocking time. Furthermore, the task can be preempted by interrupt handlers of associated peripherals. We assume that the maximum execution time e_{i,j} of a compatible interval can be computed based on traditional static analysis techniques. However, to reduce the pessimism in the analysis, we prohibit peripheral traffic from being transmitted during a compatible interval. Ideally, there should be a small number of compatible intervals, which are kept as short as possible.

Predictable intervals are specially compiled to execute according to the PREM model shown in Figure 2, and exhibit three main properties.

Figure 2: Predictable interval with constant execution time.

First, each predictable interval is divided into two different phases. During the initial memory phase, the CPU accesses main memory to perform a set of

cache line fetches and replacements. At the end of the memory phase, all cache lines required during the predictable interval are available in last level cache. Second, the second phase is known as the execution phase. During this phase, the task performs useful computation without suffering any last level cache misses. Predictable intervals do not contain any system calls and cannot be preempted by interrupt handlers. Hence, the CPU does not perform any external main memory access during the execution phase. Due to this property, peripheral traffic can be scheduled during the execution phase of a predictable interval without causing any contention for access to main memory. Third, at run-time, we force the execution time of a predictable interval to be always equal to e_{i,j}. Let e^mem_{i,j} be the maximum time required to complete the memory phase and e^exec_{i,j} the maximum time required to complete the execution phase. Then offline we set e_{i,j} = e^mem_{i,j} + e^exec_{i,j}, and at run-time, even if the memory phase lasts for less than e^mem_{i,j} time units, the overall interval still completes in exactly e_{i,j}. This property greatly increases task predictability without affecting CPU worst-case guarantees. In particular, as we show in Section 6, it ensures that hard real-time guarantees can be extended to I/O flows.

Figure 3: Example System-Level Predictable Schedule.

Figure 3 shows a concrete example of a system-level predictable schedule for a task set comprising two tasks τ_1, τ_2 together with two I/O flows τ^{I/O}_1, τ^{I/O}_2 which service τ_1 and τ_2 respectively. Both tasks and I/O flows are scheduled according to fixed priority, with τ_1 having higher priority than τ_2 and τ^{I/O}_1 higher priority than τ^{I/O}_2. We set D_i = p_i and assign to each I/O flow the same period and deadline as its serviced task and a transmission time equal to 4 time units. As shown in Figure 3 for task τ_1, this means that the input data for a given job is transmitted in the period before the job is executed, and the output data is transmitted in the period after. Task τ_1 has a single predictable interval of length e_{1,2} = 4 while τ_2 has two predictable intervals of lengths e_{2,2} = 4 and e_{2,3} = 3. The first and last interval of both τ_1 and τ_2 are special compatible intervals. These intervals are needed to execute the associated peripheral driver (including interrupt handlers) and set up the reception and transmission buffers in main memory (i.e. read and write system calls). More details are provided in Section 7. I/O flows can be scheduled both during execution phases and while the CPU is idle. As we will show in Section 6, the described scheme can be modeled as a hierarchical scheduling system [21], where the CPU schedule of predictable intervals supplies available transmission time to I/O flows. Therefore, existing tests can be reused to check the schedulability of I/O flows. However, due to the characteristics of predictable intervals, a more complex analysis is required to derive the supply function.

4. Architectural Constraints and Solutions

Predictable intervals are executed in a radically different way compared to the speculative execution model that COTS components are typically designed to support. In this section, we detail the challenges and solutions to implement the PREM execution model on top of a COTS architecture.

4.1. Caching and Prefetch

Our general strategy to implement the memory phase consists of two steps: (1) we determine the complete set of memory regions that are accessed during the interval. Each region is a continuous area in virtual memory. In general, its start address can only be determined at run-time, but its size is known at compile time. (2) During the memory phase, we prefetch all cache lines that contain instructions and data for the required regions; most COTS instruction sets include a prefetch instruction that can be used to load specific cache lines in last level cache. Step (1) will be detailed in Section 5. Step (2) can be successful only if there is no cache self-eviction, that is, prefetching a cache line never evicts another line that has already been accessed (prefetched) during

the same memory phase. In the remainder of this section, we describe self-eviction prevention.

Figure 4: Cache organization with one memory region.

Most COTS CPUs implement the last-level cache as an N-way set associative cache. Let B be the total size of the cache and L be the size of each cache line in bytes. Then the byte size of each of the N cache ways is W = B/N. An associative set is the set of all cache lines, one for each way, which have the same index in cache; there are W/L associative sets. Last level cache is typically physically tagged and physically indexed, meaning that cache lines are accessed based on physical memory addresses only. We also assume that last level cache is not exclusive, that is, when a cache line is copied to a higher cache level it is not removed from the last level. Figure 4 shows an example where L = 4, W = 16 (parameters are chosen to simplify the discussion and are not representative of typical systems).

The main idea behind our conflict analysis is as follows: we compute the maximum number of entries in each associative set that are required to hold the cache lines prefetched for all memory regions in a scheduling interval. Based on the cache replacement policy, we then derive a safe lower bound on the number of entries that can be prefetched in an associative set without causing any self-eviction. Consider a memory region of size A. The region will occupy at most K = ⌈(A − 1)/L⌉ + 1 cache lines. As shown in Figure 4 for a region with A = 15, K = 5: the worst case is produced when the region uses a single byte in its first cache line. Assume now that virtual memory addresses coincide with physical addresses. Then, since the region is contiguous and there are W/L cache lines in each way, the maximum number of entries used in any associative set by the region is ⌈K/(W/L)⌉. For example, the region in Figure 4 requires two entries in the set with index 0. We then derive the maximum number of entries for the entire interval by summing the entries required for each memory region.

Unfortunately, this is not generally true if the system employs paged virtual memory. If the page size P is smaller than the size W of each way, the index of each cache line inside the cache way is different for virtual and physical addresses. In the example of Figure 4 with P = 8, the number of entries for the memory region is increased from 2 to 3. We consider two solutions: 1) if the system supports it, we can select a page size that is a multiple of W just for our specific process. This solution, which we employed in our implementation, solves the problem because the index in cache for virtual and physical addresses is the same no matter the page allocation. 2) We use a modified page allocation algorithm in the OS. Cache-aware allocation algorithms have been proposed in the literature, for example in [11] for cache partitioning. Note that a suitable allocation algorithm could decrease the required number of associative entries by controlling the allocation in physical memory of multiple regions. We plan to pursue this solution in our future work.
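As a concrete illustration of the footprint computation above, the following small C sketch (purely illustrative helper code, not part of the PREM toolchain; all names are hypothetical) evaluates the worst-case number of cache lines K occupied by one contiguous region of size A, and the resulting number of entries it can require in a single associative set, under the assumption discussed above that virtual and physical cache indices coincide.

    #include <stdio.h>

    /* Worst-case cache footprint of one contiguous memory region.
     * A: region size in bytes, L: cache line size, W: bytes per way. */
    static unsigned lines_worst_case(unsigned A, unsigned L)
    {
        /* K = ceil((A - 1) / L) + 1: worst case occurs when the region
         * uses a single byte of its first cache line. */
        return (A - 1 + L - 1) / L + 1;
    }

    static unsigned entries_per_set(unsigned A, unsigned L, unsigned W)
    {
        unsigned K = lines_worst_case(A, L);
        unsigned sets = W / L;            /* associative sets per way */
        return (K + sets - 1) / sets;     /* ceil(K / (W/L)) */
    }

    int main(void)
    {
        /* The toy parameters of Figure 4: L = 4, W = 16, A = 15. */
        printf("K = %u, entries in one set = %u\n",
               lines_worst_case(15, 4), entries_per_set(15, 4, 16));
        return 0;   /* prints: K = 5, entries in one set = 2 */
    }

Summing the per-region results over all regions accessed by the interval gives, for each associative set, the total number of entries; the maximum over all sets is the quantity Q used in the replacement-policy analysis below.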
We now consider the cache replacement policy. We consider four different replacement policies: random, FIFO, LRU and pseudo-LRU, which cover most implemented cache architectures. Let Q be the maximum number of entries in any associative set required by the predictable interval. Furthermore, let Q̄ be the number of such entries relative to cache lines that are accessed multiple times during the memory phase; in our implementation, this only includes the cache lines that contain the small amount of instructions of the memory phase itself, so Q̄ = 1. A detailed analysis of the four replacement policies, based on the work in [7, 19], is provided in Appendix A; in particular, based on the replacement policy and possibly Q̄, we show how to compute a lower bound on the number of entries that can be safely prefetched, such that no self-eviction can happen if Q is less than or equal to the derived bound. It is important to notice that the bound for some policies depends on the state of the cache before the beginning of the predictable interval. Therefore, we consider two different cache invalidation models. In the full-invalidation model, none of the cache lines used by a predictable interval are already available in cache at the beginning of the memory phase. Furthermore, the replacement policy is not affected by the presence of invalidated cache lines. As an example, this model can be enforced by fully invalidating the cache either at the end or at the beginning of each predictable interval, which in most architectures has the side effect of resetting all replacement queues/trees. While this model is more predictable, several COTS CPUs do not support it, including our testbed described in Section 7. In the partial-invalidation model, selected lines in last level cache can be invalidated, for example to handle data coherency between the CPU and peripherals in compatible intervals. Furthermore, the replacement policy always selects invalidated cache lines before valid lines, independently of the state of the queue/tree. Results in Appendix A are summarized by the following theorem.

Theorem 1. If cache lines are prefetched in a consistent or-

der, then at the end of the memory phase all Q cache lines in an associative set will be available in cache, requiring at most Q fetches to be loaded, if Q is at most equal to:
- 1 for random;
- N for FIFO and LRU;
- log₂ N + 1 for pseudo-LRU with partial-invalidation semantics;
- N + 2^Q̄ − Q̄ if Q̄ < log₂ N, and log₂ N + 1 otherwise, for pseudo-LRU with full-invalidation semantics.

4.2. Computing Phase Length

To provide predictable timing guarantees, the maximum time required for the memory phase, e^mem_{i,j}, and for the execution phase, e^exec_{i,j}, must be computed. We assume that an upper bound to e^exec_{i,j} can be derived based on static analysis of the execution phase. Note that our model only ensures that all required instructions and data are available in last level cache; cache misses can still occur in higher cache levels. Analyses that derive patterns of cache misses for split level 1 data and instruction caches have been proposed in [12, 18]. These analyses can be safely applied because, according to our model, level 1 misses during the execution phase will not require the CPU to perform any external access.

e^mem_{i,j} depends on the time required to prefetch all accessed memory regions. Note that since the task can be preempted at the boundary of scheduling intervals, the cache state is unknown at the beginning of the memory phase. Hence, each prefetch could require both a cache line fetch and a replacement in the last cache level. An analysis to compute upper bounds for read/write operations using a COTS DRAM memory controller is detailed in [13]. Note that since the number and relative addresses of prefetched cache lines are known, the analysis could potentially exploit features such as burst read and parallel bank access that are common in modern memory controllers. Finally, for systems implementing paged virtual memory, we employ the following three assumptions: 1) the CPU supports hardware Translation Lookaside Buffer (TLB) management; 2) all pages used by predictable intervals are locked in main memory; 3) the TLB is large enough to contain all page entries for a predictable interval without suffering any conflict. Under such assumptions, each page used in a predictable interval can cause at most one TLB miss during the memory phase, which requires a number of fetches in main memory equal to at most the number of levels of the page table.

4.3. Interval Length Enforcement

As described in Section 3, each predictable interval is required to execute for exactly e_{i,j} time units. If e_{i,j} is set to be at least e^mem_{i,j} + e^exec_{i,j}, the computation in the execution phase is guaranteed to terminate at or before the end of the interval. The interval can then enter an active wait until e_{i,j} time units have elapsed since the beginning of its memory phase. In our implementation, elapsed time is computed using a CPU performance counter that is directly accessible by the task; therefore, neither OS nor memory interaction is required to enforce the interval length.
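The active wait just described can be illustrated with the following minimal sketch, which busy-waits on the x86 time-stamp counter until a constant number of cycles has elapsed since the start of the memory phase. This is only a sketch under our stated assumptions: the helper names are hypothetical, the interval length would be precomputed offline and converted to cycles, and an actual implementation would use whichever CPU performance counter the platform exposes directly to the task.

    #include <stdint.h>

    /* Read the x86 time-stamp counter (one possible cycle counter). */
    static inline uint64_t read_cycles(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Spin until length_cycles cycles have elapsed since `start`, so that
     * the enclosing predictable interval always has the same length. */
    static void enforce_interval_length(uint64_t start, uint64_t length_cycles)
    {
        while (read_cycles() - start < length_cycles)
            ;   /* active wait: no OS or memory interaction required */
    }

A predictable interval records start = read_cycles() at the beginning of its memory phase and calls this helper just before returning.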
4.4. Scheduling Synchronization

In our model, peripherals are only allowed to transmit during a predictable interval's execution phase or while the CPU is idle. To compute the peripheral schedule, the peripheral scheduler must thus know the status of the CPU schedule. Synchronization can be achieved by connecting the peripheral scheduler to a peripheral interconnection as shown in Figure 1. Scheduling messages can then be sent by either a task or the OS to the peripheral scheduler. In particular, at the end of each memory phase the task sends to the peripheral scheduler the remaining amount of time until the end of the current predictable interval. Note that propagating a message through the interconnection takes non-zero time. Since this time is typically negligible compared to the length of a scheduling interval (in our implementation we measured an upper bound on the message propagation time of 1 us, while scheduling intervals are expected to be far longer), we will ignore it in the rest of our discussion. However, the schedulability analysis of Section 6 could be easily modified to take such overhead into account. Finally, to avoid executing interrupt handlers during predictable intervals, a peripheral should only raise interrupts to the CPU during compatible intervals of its serviced task. As we describe in Section 7, in our I/O management scheme peripherals raise interrupts through their assigned real-time bridge. Since the peripheral scheduler communicates with each real-time bridge, it is used to block interrupt propagation outside the desired compatible intervals. Note that blocking real-time bridge interrupts to the CPU will not cause any loss of input data, because the real-time bridge is capable of independently acknowledging the peripheral and storing all incoming data in the bridge's local buffer.

5. Programming Model

Our system supports COTS applications written in standard high-level languages such as C. Unmodified code can be executed within one or more compatible intervals.

To create predictable intervals, programmers add source code annotations as C preprocessor macros. The PREM real-time compiler creates code for predictable intervals so that it does not incur cache misses during the execution phase and the interval itself has a constant execution time. Due to current limitations of our static code analysis, we assume that the programmer manually partitions the task into intervals. In particular, compatible intervals can be handled in the conventional way, while each predictable interval is encapsulated into a single function. All code within the function, and any functions transitively called by this function, is executed as a single predictable interval. To correctly prefetch all code and data used by a predictable interval, we impose several constraints upon its code:

1. Only scalar and array-based memory accesses should occur within a predictable interval; there should be no use of pointer-based data structures.
2. The code can use data structures, in particular arrays, that are not local to functions in the predictable intervals, e.g. they are allocated either in global memory or in the heap (note that data structures in the heap must have been previously allocated during a compatible interval). However, the programmer should specify the first and last address that is accessed within the predictable interval for each non-local data structure. In general, it is difficult for the compiler to determine the first and last access to an array within a piece of code. The compiler needs this information to be able to load the relevant portion of each array into the last-level cache.
3. The functions within a predictable interval should not be called recursively. As described below, the compiler will inline callees into callers to make all the code within an interval contiguous in virtual memory. Furthermore, no system calls should be made by code within a predictable interval.
4. Code within a predictable interval may only have direct calls. This alleviates the need for pointer analysis to determine the targets of indirect function calls; such analysis is usually imprecise and would bloat the code within the interval.
5. No stack allocations should occur within loops. Since all variables must be loaded into the cache at function entry, it must be possible for the compiler to safely hoist the allocations to the beginning of the function which initiates the predictable interval.

While these constraints may seem restrictive, some of these features, e.g. indirect function calls, are rarely used in real-time C code, and the other constraints are met by many types of functions. We believe that the benefit of faster, more predictable behavior for program hot-spots outweighs the restrictions imposed by our programming model. Furthermore, existing code that is too complex to be compiled into predictable intervals can still be executed inside compatible intervals. Therefore, our model permits a smooth transition for legacy systems. Notice that the compiler can be used to verify that all of the aforementioned restrictions are met. Simple static analysis can determine whether there is any irregular data structure usage, indirect function calls, or system calls.
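To make the constraints concrete, the fragment below shows the general shape of a function encapsulating one predictable interval. The PREFETCH_DATA and STARTEXECUTION annotations anticipate the prototype macros described in Section 7; the array, constants and function names are purely illustrative.

    #define N_SAMPLES      1024
    #define FILTER_WCET_US 400        /* hypothetical execution-phase budget */

    /* Placeholder definitions so this sketch is self-contained; in the
     * prototype these macros come from the PREM support header and perform
     * the actual prefetching and the write to the yield register. */
    #ifndef PREFETCH_DATA
    #define PREFETCH_DATA(addr, size)  ((void)(addr), (void)(size))
    #define STARTEXECUTION(wcet)       ((void)(wcet))
    #endif

    static double samples[N_SAMPLES];  /* non-local (global) data structure */

    void filter_interval(double gain)  /* one predictable interval */
    {
        /* Memory phase: the programmer declares the first-to-last address
         * range of every non-local structure; code and stack prefetching
         * is inserted automatically by the compiler. */
        PREFETCH_DATA(&samples[0], sizeof(samples));
        STARTEXECUTION(FILTER_WCET_US);   /* start of the execution phase */

        /* Execution phase: scalar/array accesses only, no system calls,
         * no indirect or recursive calls, no stack allocation in loops. */
        for (int i = 0; i < N_SAMPLES; i++)
            samples[i] *= gain;

        /* The compiler-inserted epilogue at each return enforces the
         * interval's constant length e_{i,j}. */
    }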
During compilation, the compiler employs several transforms to ensure that code marked as being within a predictable interval does not cause a cache miss. First, the compiler inlines all functions called (either directly or transitively) during the interval into the top-level function defining the interval. This ensures that all program analysis is intra-procedural and that all the code for the interval is contiguous within virtual memory. Second, the compiler transforms the program so that all cache misses occur during the memory phase, which is located at the beginning of the predictable scheduling interval. To be specific, it inserts code after the function prologue to prefetch the code and data needed to execute the interval. Based on the described constraints, this includes three types of contiguous memory regions: (1) the code for the function; (2) the actual parameters passed to the function and the stack frame (which contains local variables and register spill slots); and (3) the data structures marked by the programmer as being accessed by the interval. Third, the compiler inserts code to send scheduling messages to the peripheral scheduler, as will be described in Section 7. Fourth, the compiler emits code at the end of the predictable interval to enforce its constant length. In particular, the compiler identifies all return instructions within the function and adds the required code before them. Finally, based on the information on prefetched memory regions, we assume that an external tool, such as the static timing analyzer used to compute the maximum phase length in Section 4.2, can check the absence of cache self-evictions according to the analysis provided in Section 4.1.

6. Schedulability Analysis

PREM allows us to enforce strict timing guarantees for both CPU tasks and their associated I/O flows. By setting timing parameters as shown in Figure 3, the task schedule becomes independent of I/O scheduling. Therefore, task schedulability can be checked using available schedulability tests. As an example, assume that tasks are scheduled according to fixed priority scheduling as in Figure 3. For a task τ_i, let e_i = Σ_{j=1}^{N_i} e_{i,j} be the sum of the execution times of its scheduling intervals, or equivalently, the execution time of the whole task. Furthermore, let hp_i ⊆ Γ be the set of tasks with higher priority than τ_i, and lp_i the set of tasks with lower

priority. Since scheduling intervals are executed non-preemptively, τ_i can suffer a blocking time due to lower priority tasks of at most B_i = max_{τ_l ∈ lp_i} max_{j=1...N_l} e_{l,j}. The worst-case response time of τ_i can then be found [2] as the fixed point r_i of the iteration:

r_i^{k+1} = e_i + B_i + Σ_{l ∈ hp_i} ⌈r_i^k / p_l⌉ · e_l,   (1)

starting from r_i^0 = e_i + B_i. Task set Γ is schedulable if ∀τ_i : r_i ≤ D_i.
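The recurrence of Equation (1) is easy to evaluate numerically. The sketch below is illustrative only; the arrays and indexing convention are hypothetical, with tasks sorted by decreasing priority.

    #include <math.h>

    /* e[i]: execution time, p[i]: period, D[i]: relative deadline,
     * B[i]: blocking from lower-priority intervals (all in one time unit).
     * Tasks are indexed by decreasing priority, index 0 = highest. */
    double response_time(int i, const double e[], const double p[],
                         const double D[], const double B[])
    {
        double r = e[i] + B[i];                  /* r_i^0 */
        for (;;) {
            double next = e[i] + B[i];
            for (int l = 0; l < i; l++)          /* interference from hp_i */
                next += ceil(r / p[l]) * e[l];
            if (next == r || next > D[i])        /* fixed point, or deadline miss */
                return next;
            r = next;
        }
    }

The task set is then schedulable if response_time(i, ...) ≤ D_i for every task.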
We now turn our attention to peripheral scheduling. Assume that each I/O flow τ^{I/O}_i is characterized by a maximum transmission time e^{I/O}_i (with no interference in either main memory or the interconnect), period p^{I/O}_i and relative deadline D^{I/O}_i, where D^{I/O}_i ≤ p^{I/O}_i. The schedulability analysis for I/O flows is more complex because the scheduling of data transfers depends on the task schedule. To solve this issue, we extend the hierarchical scheduling framework proposed by Shin and Lee in [21]. In this framework, tasks/flows in a child scheduling model execute using a timing resource provided by a parent scheduling model. Schedulability for the child model can be tested based on the supply bound function sbf(t), which represents the minimum resource supply provided to the child model in any interval of time t. In our model, the I/O flow schedule is the child model and sbf(t) represents the minimum amount of time in any interval of time t during which the execution phase of a predictable interval is scheduled or the CPU is idle. Define the service time bound function tbf(t) as the pseudo-inverse of sbf(t), that is, tbf(t) = min{x | sbf(x) ≥ t}. Then, if I/O flows are scheduled according to fixed priority, in [21] it is shown that the response time r^{I/O}_i of flow τ^{I/O}_i can be computed according to the iteration:

r_i^{I/O,k+1} = tbf( e^{I/O}_i + Σ_{l ∈ hp^{I/O}_i} ⌈r_i^{I/O,k} / p^{I/O}_l⌉ · e^{I/O}_l ),   (2)

where hp^{I/O}_i has the same meaning as hp_i.

In the remainder of this section, we detail how to compute sbf(t). For the sake of simplicity, let us initially assume that tasks are strictly periodic and that the initial activation time of each task is known. Furthermore, notice that using the solution described in Section 4, we could enforce interval lengths not just for predictable intervals, but also for all compatible intervals. Finally, let h be the hyperperiod of task set Γ, defined as the least common multiple of all task periods. Under these assumptions, it is easy to see that if Γ is feasible, the CPU schedule can be computed offline and repeats itself identically with period h after an initial interval of 2h time units (h time units if all tasks are activated simultaneously). Therefore, a tight sbf(t) can be computed as the minimum amount of supply (time during which the CPU is idle or in the execution phase of a predictable interval) during any interval of time t in the periodic task schedule, starting from the initial two hyperperiods. More formally, let {t_1, ..., t_K} be the set of start times for all scheduling intervals activated in the first two hyperperiods; the following theorem shows how to compute sbf(t).

Theorem 2. Let sf(t', t'') be the amount of supply provided in the periodic task schedule during the interval [t', t'']. Then:

sbf(t) = min_{k=1...K} sf(t_k, t_k + t).   (3)

Proof. Let t', t'' be two consecutive interval start times in the periodic schedule. We show that for all t̄ with t' ≤ t̄ ≤ t'': min( sf(t', t' + t), sf(t'', t'' + t) ) ≤ sf(t̄, t̄ + t). This implies that to minimize sbf(t), it suffices to check the start times of all scheduling intervals. Assume first that t̄ falls inside a compatible interval or a memory phase. Then the schedule provides no supply in [t', t̄]. Therefore: sf(t̄, t̄ + t) = sf(t', t̄ + t) ≥ sf(t', t' + t). Now assume that t̄ falls in an execution phase or in an idle interval before the start time t'' of the next scheduling interval. Then sf(t̄, t'') = t'' − t̄. Therefore: sf(t̄, t̄ + t) = (t'' − t̄) + sf(t'', t'' + t − (t'' − t̄)) ≥ sf(t'', t'' + t). To conclude the proof, it suffices to note that since the schedule is periodic, if t̄ ≥ 2h, then sf(t̄, t̄ + t) = sf(t_k, t_k + t) where t_k = (t̄ mod h) + h.

Unfortunately, the proposed sbf(t) derivation can only be applied to strictly periodic tasks. If Γ includes any sporadic task τ_i with minimum interarrival time p_i, the schedule cannot be computed offline. Therefore, we now propose an alternative analysis that is independent of the CPU scheduling algorithm and computes a lower bound sbf^L(t) to sbf(t). Note that since sbf(t) is the minimum supply in any interval t, using Equation 2 with sbf^L(t) instead of sbf(t) will still result in a sufficient schedulability test. Let sbf̄(t) be the maximum amount of time in which the CPU is executing either a compatible interval or the memory phase of a predictable interval in any time window of length t. Then by definition, sbf̄(t) = t − sbf(t). Our analysis derives an upper bound sbf̄^U(t) to sbf̄(t). Therefore, sbf^L(t) = t − sbf̄^U(t) is a valid lower bound for sbf(t).

For a given interval size t, we compute sbf̄^U(t) using the following key idea. For each task τ_i, we determine the minimum and maximum amount of time E^min_i(t), E^max_i(t) that the task can execute in a time window of length t while meeting its deadline constraint. Let t_i, with E^min_i(t) ≤ t_i ≤ E^max_i(t), be the amount of time that τ_i actually executes in the time window. Since the CPU is a single shared resource, it must hold that Σ_{i=1}^N t_i ≤ t independently of the task scheduling algorithm. Finally, for each task τ_i, we define the memory bound function mbf_i(t_i) to be the maximum amount of time

that the task can spend in compatible intervals and memory phases, assuming that it executes for t_i time units in the time window. We can then obtain sbf̄^U(t) as the maximum over all feasible {t_1, ..., t_N} tuples of Σ_{i=1}^N mbf_i(t_i). In other words, we can compute sbf̄^U(t) by solving the following optimization problem:

sbf̄^U(t) = max Σ_{i=1}^N mbf_i(t_i),   (4)
subject to Σ_{i=1}^N t_i ≤ t,   (5)
and ∀i, 1 ≤ i ≤ N: E^min_i(t) ≤ t_i ≤ E^max_i(t).   (6)

E^max_i(t) and E^min_i(t) can be computed according to the scenarios shown in Figures 5 and 6, where the notation (x)⁺ is used for max(x, 0).

Figure 5: Derivation of E^max_i(t), periodic or sporadic task.

Figure 6: Derivation of E^min_i(t), periodic task.

Theorem 3. For a periodic or sporadic task:

E^max_i(t) = min(t, e_i) + ⌊(t − e_i − (p_i − D_i)) / p_i⌋⁺ · e_i + (min((t − e_i − (p_i − D_i)) mod p_i, e_i))⁺.   (7)

For a sporadic task, E^min_i(t) = 0, while for a periodic task:

E^min_i(t) = ⌊(t − (p_i + D_i − 2e_i)) / p_i⌋⁺ · e_i + (min((t − (p_i + D_i − 2e_i)) mod p_i, e_i))⁺.   (8)

Proof. As shown in Figure 5, the relative distance between successive executions of a task τ_i is minimized when the task finishes at its deadline in the first period within the time window of length t, and as soon as possible (e.g., e_i time units after its activation) in each successive period. Equation 7 directly follows from this activation pattern, assuming that the time window coincides with the start time of the first job of τ_i. In particular: the term min(t, e_i) represents the execution time of the first job, which is activated at the beginning of the time window; the term ⌊(t − e_i − (p_i − D_i)) / p_i⌋⁺ · e_i represents the execution time of jobs in periods that are fully contained within the time window (note that the first such period starts e_i + (p_i − D_i) time units after the beginning of the time window); finally, the term (min((t − e_i − (p_i − D_i)) mod p_i, e_i))⁺ represents the execution time of the last job in the time window. Similarly, as shown in Figure 6, the relative distance between successive executions of a periodic task τ_i is maximized when the task finishes as soon as possible in its first period and as late as possible in all successive periods. Equation 8 follows assuming that the time window coincides with the finishing time of the first job of τ_i, noticing that periodic task activations start (p_i − e_i) + (D_i − e_i) = p_i + D_i − 2e_i time units after the beginning of the time window. Finally, E^min_i(t) is zero for a sporadic task since by definition the task has no maximum interarrival time.

Lemma 4. The optimization problem of Equations 4-6 admits a solution if task set Γ is feasible.

Proof. The optimization problem admits a solution if the set of t_i values that satisfy Equations 5 and 6 is not empty. Note that by definition E^max_i(t) ≥ E^min_i(t), therefore there is at least one admissible solution iff Σ_{i=1}^N E^min_i(t) ≤ t. Define E^min_{U,i}(t) = (t − (D_i − e_i))⁺ · e_i / p_i. Then it is easy to see that ∀t, E^min_i(t) ≤ E^min_{U,i}(t): in particular, both functions are equal to e_i for t = p_i + D_i − e_i and increase by e_i every p_i afterwards. Now note that E^min_{U,i}(t) ≤ t · e_i / p_i, therefore it also holds: Σ_{i=1}^N E^min_i(t) ≤ t · Σ_{i=1}^N e_i / p_i. The proof follows by noticing that since Γ is feasible, it must hold Σ_{i=1}^N e_i / p_i ≤ 1.
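For reference, the bound functions of Theorem 3 can be evaluated directly. The sketch below is an illustrative C rendering of Equations (7) and (8), with the (x)⁺ operator handled by an explicit guard; it is not part of the analysis tool.

    #include <math.h>

    /* E^max_i(t) of Equation (7); e, p, D stand for e_i, p_i, D_i. */
    double E_max(double t, double e, double p, double D)
    {
        double x = t - e - (p - D);
        if (x <= 0.0)
            return fmin(t, e);            /* only the first job fits */
        return fmin(t, e)                 /* first job */
             + floor(x / p) * e           /* fully contained periods */
             + fmin(fmod(x, p), e);       /* last, partial job */
    }

    /* E^min_i(t) of Equation (8); for a sporadic task E^min_i(t) = 0. */
    double E_min(double t, double e, double p, double D)
    {
        double x = t - (p + D - 2.0 * e);
        return (x <= 0.0) ? 0.0
             : floor(x / p) * e + fmin(fmod(x, p), e);
    }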

mbf_i(t_i) can be computed using a strategy similar to the one in Theorem 2. Since t_i represents the amount of time that τ_i executes, we consider a schedule in which τ_i is executed continuously, being activated every e_i time units, as shown in Figure 7. The start time of the first scheduling interval in the first job of τ_i is t_1 = 0, the start time of the second scheduling interval is t_2 = e_{i,1}, and so on and so forth until the first interval of the second job, which has start time equal to e_i. Then mbf_i(t_i) can be computed as the maximum amount of time that the task spends in a compatible interval or memory phase in any time window of length t_i.

Figure 7: Derivation of mbf_i(t_i) and mbf^U_i(t_i).

Theorem 5. Let mf_i(t', t'') be the amount of time that task τ_i spends in a compatible interval or memory phase in the interval [t', t''], in the schedule in which τ_i is executed continuously. Then:

mbf_i(t_i) = max_{k=1...N_i} mf_i(t_k, t_k + t_i).   (9)

Proof. Note that the schedule in which τ_i executes continuously is periodic with period e_i. The same strategy as in Theorem 2 can be used to show that if t', t'' are consecutive interval start times, then for all t̄ with t' ≤ t̄ ≤ t'': max( mf_i(t', t' + t_i), mf_i(t'', t'' + t_i) ) ≥ mf_i(t̄, t̄ + t_i).

As an example, in Figure 7 mbf_i(t_i) is maximized in the time window that starts at t_4 = 10. Since mbf_i(t_i) is a nonlinear function, computing sbf̄^U(t) according to Equations 4-6 requires solving a nonlinear optimization problem. To simplify the problem, we consider a linear upper bound approximation mbf^U_i(t_i) = α_i t_i + δ_i, as shown in Figure 7, where

α_i = ( Σ_{s_{i,j} compatible} e_{i,j} + Σ_{s_{i,j} predictable} e^mem_{i,j} ) / e_i,   (10)

and δ_i is the minimum value such that ∀t_i, mbf^U_i(t_i) ≥ mbf_i(t_i). Using mbf^U_i(t_i), Equations 4-6 can be efficiently solved in O(N) time. Furthermore, we can show that sbf̄^U(t) can be computed for all relevant values of t by solving the optimization problem of Equations 4-6 in a finite number of points, and using linear interpolation to derive the remaining values. Due to the complexity of the resulting algorithm, the full derivation is provided in Appendix B.

7. Evaluation

In order to verify the validity and practicality of PREM, we implemented the key components of the system. In this section, we describe our evaluation, first introducing the new hardware components, followed by the corresponding software driver and OS calibration effort. We then discuss our compiler implementation and analyze its effectiveness on a DES benchmark. Finally, using synthetic tasks we measure the effectiveness of the PREM system as a function of cache stall time, and show traces of PREM when running on COTS hardware.

7.1. PREM Hardware Components

When introducing PREM in Section 3, two additional hardware components were added to the COTS system (recall Figure 1). Here, we briefly review our previous work, which describes these hardware components in more detail [1], and describe the additions to these components which we made to provide the mechanism for PREM execution. First we describe the real-time bridge component, then we describe the peripheral scheduler component.

Our real-time bridge prototype is implemented on an ML507 FPGA Evaluation Platform, which contains a COTS TEMAC Ethernet hardware block (the I/O peripheral under control), and a PCIe edge connector (the interface to the main system). Interposed between these hardware components, our real-time bridge uses a System-on-Chip design running Linux. The real-time bridge is also wired to the peripheral scheduler, which allows direct communication between these components.
Three wires are used to transmit information between each real-time bridge and the peripheral scheduler: data ready, data block, and interrupt block. The data ready wire is an output signal sent to the peripheral scheduler which is asserted whenever the COTS peripheral has buffered data in the real-time bridge. The peripheral scheduler sends two signals, data block and interrupt block, to the real-time bridge. The data block signal is asserted to block the real-time bridge from transferring data in DMA mode over the PCIe bus from its local buffer to main memory or vice versa. The interrupt block signal is a new signal in the real-time bridge implementation, added to support PREM, and instructs the real-time bridge not to raise further interrupts. In previous work [1], we provide a more detailed description and buffer analysis of

the real-time bridge design, and demonstrate the component's effectiveness and efficiency.

The peripheral scheduler is a unique component in the system, connected directly to each real-time bridge. Unlike our previous work, where the peripherals were scheduled asynchronously with the CPU [1], PREM requires the CPU and peripherals to coordinate their access to main memory. The peripheral scheduler provides a hardware implementation of the child scheduling model used to schedule peripherals, with each peripheral given a sporadic server to schedule its traffic. Meanwhile, the OS real-time scheduler runs the parent scheduling model for CPU tasks to complete the hierarchical PREM scheduling model described in Section 6. The peripheral scheduler is connected to the PCIe bus, and exposes a set of registers accessible from the main CPU. In the configuration register, constant parameters such as the maximum cache write-back time are stored. Writing a value to the yield register indicates that the CPU will not access main memory for the given amount of time, and I/O peripherals should be allowed to read from and write to RAM. The value written to the yield register contains a 14-bit unsigned integer indicating the number of microseconds to permit peripheral traffic with main memory, as described in Section 4. The CPU can also use the yield register to allow interrupts to be raised by peripherals. Another 14-bit unsigned integer indicates the number of microseconds to allow peripheral interrupts, and a 3-bit interrupt mask selects which peripherals should be allowed to raise interrupts. In this way, different CPU tasks can predictably service interrupts from different I/O peripherals.

7.2. Software Evaluation

The software effort required to implement PREM execution involves two aspects: (1) creating the drivers which control the custom PREM hardware discussed in the previous section, and (2) calibrating the OS to eliminate undesired execution interference. We now discuss these, in order.

The two custom hardware components, the real-time bridge and the peripheral scheduler, each require a software driver to be controlled from the main CPU. Additionally, each peripheral requires a driver running on the real-time bridge's CPU to control the COTS peripheral. The driver for the peripheral scheduler is straightforward, mapping the bus addresses corresponding to the exposed registers to user space, where a PREM-compiled process can access them. The driver for each real-time bridge is more difficult, since each unique COTS peripheral requires a unique driver. However, since in our implementation both the main CPU and the real-time bridge's CPU are running Linux, we can reuse existing, thoroughly tested Linux drivers to drastically reduce the driver creation effort [1]. The presence of a real-time bridge is not apparent in user space, and software programs using the COTS peripherals require no modification.

For our experiments, we use an Intel Q6700 CPU with a 975X system controller; we set the CPU frequency to 1 GHz, obtaining a measured memory bandwidth of 1.8 Gbytes/s, to configure the system in line with typical values for embedded systems. We also disable the speculative CPU hardware prefetcher, since it negatively impacts the predictability of any real-time task. The Q6700 has four CPU cores and each pair of cores shares a common level 2 (last level) cache. Each cache is 16-way set associative with a total size of B = 4 Mbytes and a line size of L = 64 bytes. Since we use a PC platform running a COTS Linux operating system, there are many potential sources of timing noise, such as interrupts, kernel threads, and other processes, which must be removed for our measurements to be meaningful.
For this reason, in order to emulate as closely as possible a typical uniprocessor embedded real-time platform, we divided the 4 cores into two partitions. The system partition, running on the first pair of cores, receives all interrupts for non-critical devices (e.g. the keyboard) and runs all the system activities and non-real-time processes (e.g. the shell we use to run the experiments). The real-time partition runs on the second pair of cores. One core in the real-time partition runs our real-time tasks together with the drivers for the real-time bridges and the peripheral scheduler; the other core is not used. Note that the cores of the system partition can still produce a small amount of unscheduled bus and main memory accesses, or raise rare inter-processor interrupts (IPIs) that cannot be easily prevented. However, in our experiments we found these sources of noise to be negligible. Finally, to solve the paging issue detailed in Section 4, we used a large 4 MB page size just for the real-time tasks, using the HugeTLB feature of the Linux kernel for large page support.

7.3. Compiler Evaluation

We built the PREM real-time compiler prototype using the LLVM Compiler Infrastructure [9], targeting the compilation of C code. LLVM was extended by writing self-contained analysis and transformation passes, which were then loaded into the compiler. In the current PREM real-time compiler prototype, we rely on the programmer to partition the task into predictable and compatible intervals. The partitioning is done by putting each predictable interval into its own function. The beginning and end of the scheduling interval correspond to the entry and exit of the function, and the start of the execution phase (the end of the memory access phase) is manually marked by the programmer. We assume that non-local data accessed during a predictable interval exists in continuous memory regions, which can be prefetched by a set of

PREFETCH_DATA(start_address, size) macros that must be placed by the programmer during the memory phase. The implementation of this macro does the actual prefetching of the data into the level 2 cache by prefetching every cache line in the given range with the i386 prefetcht2 instruction. After the memory phase, the programmer adds a STARTEXECUTION(wcet) macro to indicate the beginning of the execution phase. This macro measures the amount of time remaining in the predictable interval using the CPU performance counter, and writes the time remaining to the yield register in the peripheral scheduler.

All remaining operations needed to transform the interval are performed by a new LLVM function pass. The pass iterates over all the functions in a compilation unit. When a function representing a predictable interval is found, the pass performs code transformation. First, our transform inlines called functions using preexisting LLVM inlining functions. This ensures that there are only a single stack frame and a single segment of code that need to be prefetched into the cache. Second, our transform inserts code to read the CPU performance counter at the beginning of the interval and save the current time. Third, it inserts code to prefetch the stack frame and function arguments. Bringing the stack frame into the cache is done by inserting instructions into the program to fetch the stack pointer and frame pointer. Code is then inserted to prefetch the memory between the stack pointer and slightly beyond the frame pointer (to include function arguments) using the prefetcht2 instruction. Fourth, the transform prefetches the code of the function. This is done by transforming the program so that the function is placed within a unique ELF section. We then use a linker script to define variables pointing to the beginning and end of this unique ELF section. The compiler then adds code that prefetches the memory inside the ELF section. Finally, the pass identifies all return instructions inside the predictable interval function and adds a special function epilog before them. The epilog performs interval length enforcement by looping until the performance counter reaches the worst-case cycle count based on the time value saved at the beginning of the interval. It may also enable peripheral interrupts by writing the worst-case interrupt processing time to the peripheral scheduler's yield register.
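The prefetch operation itself amounts to a loop over cache lines. The sketch below is one way it could look (an illustrative stand-in, not the exact prototype code): it assumes the 64-byte line size of our testbed and relies on the GCC/Clang __builtin_prefetch intrinsic, which on x86 is typically lowered to one of the prefetcht0/t1/t2 instructions depending on the locality hint.

    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE 64   /* line size L of the testbed's last-level cache */

    /* Prefetch every cache line overlapping [start, start + size).
     * Illustrative stand-in for the PREFETCH_DATA macro of the prototype. */
    static inline void prefetch_data(const void *start, size_t size)
    {
        const char *p = (const char *)((uintptr_t)start
                                       & ~(uintptr_t)(CACHE_LINE - 1));
        const char *end = (const char *)start + size;

        for (; p < end; p += CACHE_LINE)
            __builtin_prefetch(p, 0, 1);   /* read access, low temporal locality */
    }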
To verify the correctness of the PREM real-time compiler prototype and to test its applicability, we used LLVM to compile a DES cypher benchmark. The DES benchmark was selected because it represents a typical real-time data flow application. The benchmark comprises one scheduling interval which encrypts a variable amount of data. We compiled it as both a predictable and a compatible interval (i.e. with and without prefetching), and measured the number of cache misses with a performance counter. Adapting the interval required no modification to any cypher functions and a total of 11 PREFETCH_DATA macros. Results are shown in Table 1 in terms of the number of cache misses suffered in the execution phase of the predictable interval (after prefetching), and in the entire compatible interval. Data size is in bytes.

Table 1: DES benchmark (cache misses of the compatible interval vs. the execution phase of the predictable interval, for data sizes of 4K, 8K, 32K, 128K, 512K and 1M bytes).

The compatible interval suffers an excessive number of cache misses, which increases roughly proportionally with the amount of processed data. Conversely, the execution phase of the predictable interval has almost zero cache misses, only suffering a small increase when large amounts of data are being processed. The reason the number of cache misses is not zero is that the Q6700 CPU core used in our experiments uses a random cache replacement policy, meaning that with more than one contiguous memory region the probability of self-eviction is non-zero. In all the following experiments, we observed that the number of self-evictions is typically so small that it can be considered negligible.

7.4. WCET Experiments with Synthetic Tasks

In this section, we evaluate the effects of PREM on the execution time of a task. To quickly explore different execution parameters, we developed two synthetic applications. In our linear access application, each scheduling interval operates on a 256-kilobyte global data structure. Data is accessed sequentially, and we vary the amount of computation performed between memory references. The random access application is similar, except that references inside the data structure are nonsequential. For each application, we measured the execution time after compiling the program in two ways: into predictable intervals which prefetch the accessed memory, and into standard, compatible intervals. For each type of compilation, we ran the experiment in two ways, with and without I/O traffic transmitted by an 8-lane PCIe peripheral with a measured throughput of 1.2 Gbytes/s. In the case of compatible intervals, we transmitted traffic during the entire interval to mirror the worst case according to the traditional execution model. Figures 8 and 9 show the observed worst-case execution time for any scheduling interval as a function of the cache stall time of the application, averaged over 10 runs. The cache stall time represents the percentage of time required to fetch cache lines out of an entire compatible interval, assuming a fixed (best-case) fetch time based on the maximum measured main-memory throughput. Only a single line is shown for predictable intervals because experiments confirmed that injecting traffic during the execution phase

does not increase execution time.

Figure 8: random access (worst-case execution time in ms vs. cache stall time %).

Figure 9: linear access (worst-case execution time in ms vs. cache stall time %).

In all cases, the computation time decreases with an increase in stall time. This is because stall time is controlled by varying the amount of computation between memory references. Furthermore, execution times should not be compared between the two figures because the two applications execute different code. In the random access case, predictable intervals outperform compatible intervals (without peripheral traffic) by up to 28%, depending on the cache stall time. We believe this effect is primarily due to the behavior of DRAM main memory. Specifically, accesses to adjacent addresses can be served more quickly in burst mode than accesses to random addresses. Thus, we can decrease the execution time by loading all the accessed memory into cache, in order, at the beginning of each predictable interval. Furthermore, note that transmitting peripheral traffic during a compatible interval can increase execution time by more than 60% in the worst case. In Figure 9, predictable intervals perform worse than compatible intervals (without peripheral traffic). We believe this is mainly due to out-of-order execution in the Q6700 core. In compatible intervals, while the core performs a cache fetch, instructions in the pipeline that do not depend on the fetched data can continue to execute. When performing linear accesses, fetches require less time and this effect is magnified. Furthermore, the gain in execution time for the case with peripheral traffic is decreased: this occurs because bursting data on the memory bus reduces the amount of blocking time suffered by a task due to peripheral interference (this effect has been previously analyzed in detail [15]). In practice, we expect the effect of PREM on an application's execution time to lie between the two figures, based on the specific memory access pattern.

7.5. System-Wide Coscheduling Traces

We now present execution traces of the implemented system which demonstrate the advantage of the PREM coscheduling approach. The traces are obtained by using the peripheral scheduler as a logic analyzer for the various signals which are being sent to or from the real-time bridges: data block, data ready, and interrupt block. Additionally, the peripheral scheduler has a trace register which allows timestamped trace information to be recorded with a one-microsecond resolution when instructed by the main CPU, such as at the start and end of an execution interval. An execution trace is shown for a task running the traditional COTS execution model in Figure 10, and the same task running within the PREM model is shown in Figure 11 (the compatible intervals at the end of T1 and at the start of T2 were measured as 0.107 ms and 0.009 ms, respectively, and have been exaggerated in the figures to be visible).

In the first trace (Figure 10), although the execution is divided into unpreemptable intervals, there are no memory phase prefetches or constant execution time guarantees. When the scheduling intervals of task T1 finish executing, an I/O peripheral begins to access main memory, which may happen if T1 had written output data into RAM. Task T2 then executes, suffering cache misses that compete for main memory bandwidth with the I/O peripheral. Due to the cold cache and peripheral interference, the execution time of T2 grows from 0.5 ms (the execution time with a warm cache and no peripheral I/O) to 2.9 ms as shown in the figure, an increase of about 600%. In the second trace (Figure 11), the system executes according to the PREM execution model, where peripherals only access the bus when permitted by the peripheral scheduler. The predictable interval is divided into a memory phase and an execution phase.
Instead of competing for main memory access, task T2 gets contentionless access to main memory during the memory phase. After all to-be-accessed data is loaded into the cache, the execution phase begins, which incurs no cache misses, and the peripheral is then allowed to access the data in main memory. The constant execution time of the predictable interval in the PREM execution model is 1.6 ms, which is significantly lower than the worst case observed for the unscheduled trace (and is about the same as the execution time of the scheduling interval of T2 in the unscheduled trace with a cold cache and no peripheral traffic). The compatible intervals at the end of T1 and at the start of T2 were measured as 0.107 ms and 0.009 ms, respectively, and have been exaggerated in the figures to be visible.

Figure 10: An unscheduled bus-access trace (without PREM); scheduling intervals of T1, T2 and T1 with I/O memory access, T2's interval measuring 2.9 ms.
Figure 11: A scheduled trace using PREM; compatible intervals, a predictable memory phase and a predictable execution phase totaling 1.6 ms, with the I/O memory access scheduled around them.

8. Conclusions

We have discussed the concept and implementation of a novel task execution model, the PRedictable Execution Model (PREM). Our evaluation shows that by enforcing a high-level co-schedule among CPU tasks and peripherals, PREM can greatly reduce or outright eliminate low-level contention for shared resource access. We plan to further develop our solution in two main directions. First, we will study extensions to our compiler infrastructure to lift some of the more restrictive code assumptions and to compile and test a larger set of benchmarks. Second, it is important to recall that contention for shared resources becomes more severe as the number of active components increases. In particular, worst-case execution time can greatly degrade in multicore systems [16]. Since PREM can make the system contentionless, we predict that the benefits of our approach will become even more significant when applied to multiple processor systems.

References

[1] S. Bak, E. Betti, R. Pellizzoni, M. Caccamo, and L. Sha. Real-time control of I/O COTS peripherals for embedded systems. In Proc. of the 30th IEEE Real-Time Systems Symposium, Washington, DC, Dec.
[2] G. Buttazzo. Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications. Kluwer Academic Publishers, Boston.
[3] S. A. Edwards and E. A. Lee. The case for the precision timed (PRET) machine. In DAC '07: Proc. of the 44th Annual Design Automation Conference.
[4] T. Facchinetti, G. Buttazzo, M. Marinoni, and G. Guidi. Non-preemptive interrupt scheduling for safe reuse of legacy drivers in real-time systems. In ECRTS '05: Proc. of the 17th Euromicro Conf. on Real-Time Systems.
[5] H. Falk, P. Lokuciejewski, and H. Theiling. Design of a WCET-aware C compiler. In ESTIMedia '06: Proc. of the 2006 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia.
[6] C. Ferdinand, F. Martin, and R. Wilhelm. Applying compiler techniques to cache behavior prediction. In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems.
[7] D. Grund and J. Reineke. Precise and efficient FIFO-replacement analysis based on static phase detection. In Proceedings of the 22nd Euromicro Conference on Real-Time Systems (ECRTS), Brussels, Belgium, July.
[8] K. Hoyme and K. Driscoll. SAFEbus(tm). IEEE Aerospace and Electronic Systems Magazine, pages 34-39, Mar.
[9] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis and transformation. In Proc. of the International Symposium on Code Generation and Optimization, San Jose, CA, USA, Mar.
[10] M. Lewandowski, M. Stanovich, T. Baker, K. Gopalan, and A. Wang. Modeling device driver effects in real-time schedulability: Study of a network driver. In Proc. of the 13th IEEE Real-Time Application Symposium, Apr.
[11] J. Liedtke, H. Hartig, and M. Hohmuth. OS-controlled cache predictability for real-time systems. In Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS).
[12] F. Mueller. Timing analysis for instruction caches. Real-Time Systems Journal, 18(2/3), May.
[13] M. Paolieri, E. Quinones, F. J. Cazorla, and M. Valero. An analyzable memory controller for hard real-time CMPs. IEEE Embedded Systems Letters, 1(4), Dec.
[14] PCI SIG. Conventional PCI 3.0, PCI-X 2.0 and PCI-E 2.0 Specifications.
[15] R. Pellizzoni and M. Caccamo. Impact of peripheral-processor interference on WCET analysis of real-time embedded systems. IEEE Trans. on Computers, 59(3), Mar. 2010.
[16] R. Pellizzoni, A. Schranzhofer, J.-J. Chen, M. Caccamo, and L. Thiele. Worst case delay analysis for memory interference in multicore systems. In Proceedings of Design, Automation and Test in Europe (DATE), Dresden, Germany, Mar.
[17] I. Puaut and D. Hardy. Predictable paging in real-time systems: A compiler approach. In ECRTS '07: Proc. of the 19th Euromicro Conf. on Real-Time Systems.
[18] H. Ramaprasad and F. Mueller. Bounding preemption delay within data cache reference patterns for real-time tasks. In Proc. of the IEEE Real-Time Embedded Technology and Applications Symposium, Apr.
[19] J. Reineke, D. Grund, C. Berg, and R. Wilhelm. Timing predictability of cache replacement policies. Real-Time Systems, 37(2), 2007.
[20] S. Schliecker, M. Negrean, G. Nicolescu, P. Paulin, and R. Ernst. Reliable performance analysis of a multicore multithreaded system-on-chip. In CODES/ISSS.
[21] I. Shin and I. Lee. Periodic resource model for compositional real-time guarantees. In Proceedings of the 23rd IEEE Real-Time Systems Symposium, Cancun, Mexico, Dec.
[22] J. Whitham and N. Audsley. Implementing time-predictable load and store operations. In Proc. of the Intl. Conf. on Embedded Software (EMSOFT), Grenoble, France, Oct.

A. Analysis of Cache Replacement Policy

We first provide a brief overview of each analyzed policy. Under the random policy, whenever a new cache line is loaded into an associative set, the line to be evicted is chosen at random among all N lines in the set. While this policy is implemented in some modern COTS CPUs with very large cache associativity, such as the Intel Core architecture, it is clearly ill suited to real-time systems. In the FIFO policy, a FIFO queue is associated with each associative set, as shown in Figure 12(a). Each newly fetched cache line is inserted at the head of the queue (position q_1), while the cache line which was previously at the back of the queue (position q_N) is evicted. In the figure, l_1, ..., l_4 represent cache lines used in the predictable interval, while dashes represent other cache lines available in cache but unrelated to the predictable interval. Finally, grayed boxes represent invalidated cache lines.

Figure 12: FIFO policy, example replacement queue (positions q_1, ..., q_8; the mem. phase, fetch and inval. labels show the queue after the memory phase and after lines are fetched or invalidated outside the predictable interval).

The Least Recently Used (LRU) policy also uses a replacement queue, but a cache line is moved to the head of the queue whenever it is accessed. Due to the complexity of implementing LRU, an approximated version of the algorithm known as pseudo-LRU can be employed when N is a power of 2. A binary replacement tree with N-1 internal nodes and N leaves q_1, ..., q_N is constructed as shown in Figure 13 for N = 4. Each internal node encodes a right or left direction using a single bit of data, while each leaf encodes a fetched cache line. Whenever a replacement decision must be taken, the encoded directions are followed starting from the root to the leaf representing the cache line to be replaced. Furthermore, whenever a cache line is accessed, all directions on the path from the root to the leaf of the accessed cache line are set to the opposite of the followed path. Figure 13 shows an example where four cache lines are accessed in the order l_1, l_2, l_3, l_2, l_4, causing a self-eviction; a minimal sketch of this tree mechanism is given below.
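To make the tree-based mechanism concrete, the following is a minimal sketch of pseudo-LRU victim selection and update for a single set with N = 4. The data layout, the bit encoding (0 = left, 1 = right), and the function names are our own assumptions, not code from the paper or from any specific CPU.

```c
#include <stdio.h>

#define N 4                         /* associativity (power of 2)        */
#define NODES (N - 1)               /* internal nodes of the binary tree */

/* One associative set: N ways plus N-1 tree bits.
 * tree[i] == 0 means the victim lies in the left subtree of node i,
 * tree[i] == 1 means it lies in the right subtree.                      */
struct plru_set {
    int way[N];                     /* cached line id, or -1 if the way holds no relevant line */
    int tree[NODES];
};

/* Follow the encoded directions from the root to the way currently
 * designated for replacement.                                           */
static int plru_victim(const struct plru_set *s)
{
    int node = 0;
    while (node < NODES)
        node = 2 * node + 1 + s->tree[node];    /* 0 -> left child, 1 -> right child */
    return node - NODES;                        /* leaf index -> way index           */
}

/* After an access, flip every node on the root-to-leaf path so that it
 * points away from the just-used way.                                   */
static void plru_touch(struct plru_set *s, int way)
{
    int node = way + NODES;
    while (node > 0) {
        int parent = (node - 1) / 2;
        s->tree[parent] = (node == 2 * parent + 1);  /* came from the left -> point right */
        node = parent;
    }
}

/* Access line 'id': hit if present, otherwise evict the designated victim. */
static void plru_access(struct plru_set *s, int id)
{
    for (int w = 0; w < N; w++)
        if (s->way[w] == id) { plru_touch(s, w); return; }
    int v = plru_victim(s);
    printf("fetch l%d, evict way %d (old id %d)\n", id, v, s->way[v]);
    s->way[v] = id;
    plru_touch(s, v);
}

int main(void)
{
    struct plru_set s = { .way = { -1, -1, -1, -1 }, .tree = { 0, 0, 0 } };
    const int seq[] = { 1, 2, 3, 2, 4 };        /* access order from the Figure 13 example */
    for (unsigned i = 0; i < sizeof seq / sizeof seq[0]; i++)
        plru_access(&s, seq[i]);
    return 0;
}
```

Starting from an all-zero tree and ways holding no relevant line, feeding this sketch the access order l_1, l_2, l_3, l_2, l_4 ends with l_4 evicting l_1, reproducing the kind of self-eviction discussed for Figure 13.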
Without loss of generality, in all proofs in this section we focus on the analysis of a single associative set with Q cache lines accessed during the predictable interval. Furthermore, let l_1, ..., l_Q be the set of Q cache lines in the order in which they are first accessed during the memory phase (note that this order can differ from the order in which the lines are prefetched, due to lines that are accessed multiple times). The results for LRU and random are simple and well known (see [6], for example); we detail the bound derivation in the following theorem to provide a better understanding of the replacement policies.

Theorem 6. A memory phase will not suffer any cache self-eviction in an associative set requiring Q entries if Q is at most equal to 1 for the random replacement policy, and at most equal to N for the LRU replacement policy.

Proof. Clearly no self-eviction is possible if Q = 1, since after the unique cache line is fetched no other cache line can be accessed in the associative set. Now consider the LRU policy. When a cache line l_i is first accessed, it is fetched (if it is not already in cache) and moved to the beginning of the replacement queue. Since no cache line outside of l_1, ..., l_Q is accessed in the associative set, when l_i is first accessed, cache lines l_1, ..., l_{i-1} must be at the head of the queue in some order. But Q ≤ N implies i-1 ≤ N-1, hence the algorithm always picks an unrelated line at the back of the queue to be replaced.
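As a small illustration of the LRU argument (our own check, not part of the paper's evaluation), the sketch below simulates a single LRU replacement queue that initially holds only unrelated lines and performs the first access to each of Q ≤ N relevant lines in order; the assertion confirms that the evicted line is always an unrelated one. Re-accesses of already-fetched relevant lines would only move them toward the head and cannot change this outcome.

```c
#include <assert.h>
#include <stdio.h>

#define N 8     /* associativity */
#define Q 8     /* relevant lines accessed in the predictable interval (Q <= N) */

int main(void)
{
    /* queue[0] is the head (most recently used), queue[N-1] the back (LRU).
     * Negative ids stand for unrelated lines already present in the set.   */
    int queue[N];
    for (int i = 0; i < N; i++)
        queue[i] = -(i + 1);

    for (int id = 1; id <= Q; id++) {        /* first access to l_1, ..., l_Q   */
        int evicted = queue[N - 1];          /* LRU evicts the line at the back */
        assert(evicted < 0);                 /* never one of l_1, ..., l_Q      */
        for (int i = N - 1; i > 0; i--)      /* shift down, insert at the head  */
            queue[i] = queue[i - 1];
        queue[0] = id;
    }
    printf("no self-eviction with Q = %d, N = %d\n", Q, N);
    return 0;
}
```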

Figure 13: Pseudo-LRU policy, full-invalidation; replacement tree and positions q_1, ..., q_4 as the cache lines are accessed in the order l_1, l_2, l_3, l_2, l_4.
Figure 14: Pseudo-LRU policy, partial-invalidation; positions q_1, ..., q_4 as l_1, l_2, l_3 are accessed and l_4 is then fetched.

Note that for random the bound of 1 is tight, since in the worst case fetching a second cache line in the associative set can cause the first cache line to be evicted. Furthermore, note that if the cache is not invalidated between successive activations of the same predictable interval, then some of the l_1, ..., l_Q cache lines can still be available in cache. Assume that at the beginning of the predictable interval l_2 is available but not l_1. Then even with the LRU policy, prefetching l_1 can potentially cause l_2 to be evicted. According to our definition, this is not a self-eviction: l_2 is evicted before it is accessed during the memory phase, and is then reloaded when it is first accessed, thus guaranteeing that all required lines are available in cache at the end of the memory phase. In fact, Theorem 6 proves that the bounds for random and LRU are independent of the state of the cache before the beginning of the predictable interval.

Unfortunately, the same is not true of FIFO and pseudo-LRU. Consider the FIFO policy. If a cache line l_i is already available in cache, then when l_i is accessed during the memory phase no modification is performed on the replacement queue. In some situations this can cause l_i to be evicted by another line fetched after l_i in the memory phase, causing a self-eviction. Due to this added complexity, when analyzing FIFO and pseudo-LRU we need to consider the invalidation model. A detailed analysis of FIFO replacement is provided in [7]; here we summarize the results and intuitions relevant to our system.

Theorem 7 (directly follows from Lemma 1 in [7]). Assume FIFO replacement with full-invalidation semantics. Then no self-eviction is possible for an associative set with Q ≤ N.

The analysis for partial-invalidation FIFO is more complex. Figure 12 (analogous to Figure 2 in [7]) depicts a clarifying example, where the mem. phase, fetch and inval. labels show the state of the replacement queue after the memory phase and after some cache lines have been either fetched or invalidated outside the predictable interval, respectively. Note that after Step (e), lines l_1 and l_2 remain in cache but not l_3 and l_4. Hence, when l_3 and l_4 are fetched during the next memory phase in Step (f), they evict l_1 and l_2. The solution is to prefetch l_1, ..., l_Q multiple times in the memory phase, each time in the same order.

Theorem 8 (directly follows from Theorem 3 in [7]). Assume FIFO replacement with partial-invalidation semantics. If the Q cache lines in an associative set, with Q ≤ N, are prefetched at least Q times in the same access order l_1, ..., l_Q, then all Q cache lines are available in cache at the end of the memory phase. Furthermore, no more than Q fetches are required.

Note that while executing the prefetching code Q times might seem onerous, in practice the majority of the time in the memory phase is spent fetching and writing back cache lines, and Theorem 8 ensures that no more than Q fetches and write backs are required for each associative set; a sketch of this repeated, same-order prefetch is given below.
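Concretely, a PREM memory phase targeting a FIFO cache with partial invalidation could apply Theorem 8 by repeating its prefetch pass over the same addresses in the same order. The sketch below is our own illustration of that repeated traversal; the function name and calling convention are assumptions and do not reflect the code actually emitted by the modified LLVM compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* line_addr holds one address per cache line to be prefetched, listed in a
 * fixed order; for a FIFO cache with partial invalidation, Theorem 8 suggests
 * repeating the pass as many times as the largest number of listed lines
 * mapping to a single associative set (at most the associativity N).        */
static void memory_phase_prefetch(const uintptr_t *line_addr, size_t nlines,
                                  unsigned passes)
{
    for (unsigned pass = 0; pass < passes; pass++)
        for (size_t i = 0; i < nlines; i++)
            (void)*(volatile const char *)line_addr[i];  /* touch one byte of the line */
}
```

Despite the repetition, Theorem 8 guarantees that no more than Q fetches and write backs occur per associative set, since lines already resident after an earlier pass simply hit on later passes.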
While LRU and FIFO allow prefetching up to N cache lines with N fetches, the same is not true of pseudo-LRU, even in the full-invalidation model. Consider again Figure 13, where no relevant lines are available in cache at the beginning of the memory phase, but line l_2 is accessed multiple times. Then it is easy to see that no more than 3 cache lines can be prefetched without causing a self-eviction. The key intuition is that up to Step (d), and with respect to l_1, l_2, l_3, the replacement tree encodes the LRU order exactly. However, after l_2 is accessed again at Step (e), the LRU order is lost, and the next replacement causes l_1 to be evicted instead of the remaining unrelated cache line. Furthermore, note that the strategy employed in Theorem 8 does not work in this case, because l_1 has already been fetched during the memory interval before being evicted. A similar situation can happen even if no line is accessed multiple times in the partial-invalidation model, as shown in Figure 14. Assume that, due to partial replacements and cache accesses, at the beginning of the memory phase the cache state is as shown in Figure 14(a), with l_1, l_2 and l_3 available in cache but not l_4. Then after the first three cache lines are prefetched in order, l_1 will be replaced when l_4 is next prefetched. A lower bound that is independent of Q and of the invalidation model was first proven in [19].
