Minimizing Energy Consumption of MPI Programs in Realistic Environment


Amina Guermouche, Nicolas Triquenaux, Benoît Pradelle and William Jalby
Université de Versailles Saint-Quentin-en-Yvelines
arXiv: v2 [cs.DC] 25 Feb 2015

Abstract

Dynamic voltage and frequency scaling proves to be an efficient way of reducing the energy consumption of servers. Energy savings are typically achieved by setting a well-chosen frequency during some program phases. However, determining suitable program phases and their associated optimal frequencies is a complex problem. Moreover, hardware is constrained by non-negligible frequency transition latencies. Thus, various heuristics were proposed to determine and apply frequencies, but evaluating their efficiency remains an issue. In this paper, we translate the energy minimization problem into a mixed integer program that specifically models realistic hardware limitations. The problem solution then estimates the minimal energy consumption and the associated frequency schedule. The paper provides two different formulations and a discussion on the feasibility of each of them on realistic applications.

1 Introduction

For a very long time, computing performance was the only metric considered when launching a program. Scientists and users only cared about the time it took for a program to finish. Though this is still often true, the priority of many hardware architects and system administrators has shifted to caring more and more about energy consumption. Solutions reducing the energy envelope have been put forth. Among the different existing techniques, Dynamic Voltage and Frequency Scaling (DVFS) proved to be an efficient way to reduce processor energy consumption. The processor frequency is adapted according to its workload: when the frequency is lowered without increasing the execution time, the power consumption and the energy are reduced.
With parallel applications in general, and more precisely with MPI applications, reducing the frequency on one processor may have a dramatic impact on the execution time of the application: reducing the processor frequency may delay the sending of a message, and maybe its reception. This may lead to cascading delays that increase the execution time. To save energy with respect to the application deadline, two main solutions exist: online tools and offline scheduling. The former try to provide the frequency schedule during the execution whereas the latter provide it after an offline study. They both require the application task graph (either through a previous execution or by focusing on iterative applications). Many online tools [?,?] identify the critical path (the longest path through the graph) and focus on processors that do not execute its tasks. Typically, when waiting for a message, the processor frequency is set to the minimal frequency until the message arrives [?]. Although online tools allow some energy savings, these savings are suboptimal because of a lack of application knowledge. On the other hand, offline scheduling algorithms [?,?] provide the best execution frequency of each task. However, none of the existing algorithms consider the characteristics of most current multi-core architectures: (i) cores within the same processor share the same frequency [?] and (ii) switching frequency requires some time [?].

This paper presents two models based on linear programming which find the execution frequencies of each task while taking into account the multi-core architecture constraints and characteristics previously described (section 3). Moreover, we allow the execution time to be increased if this leads to more energy savings. The user provides the maximum performance degradation that she can tolerate. The presented models provide the optimal frequency schedule which minimizes the energy consumption. However, when considering large applications and large machines, no current solver can provide a result, even parallel ones. The reason behind this issue is discussed in section 3.

2 Context and execution model

We consider MPI applications running on a multi-node platform. The targeted architectures have the following characteristics: (i) the latency of frequency switching is not negligible and (ii) cores within the same processor share the same frequency. A process, running on every core, executes a set of tasks. A task, denoted $T_i$, is defined as the computations between two communications. The application execution is represented as a task graph where tasks are vertices and edges are messages between the tasks. Figure 1 is an example of a task graph running on two processes. One process executes tasks $T_1$ and $T_2$ while the other one executes tasks $T_3$ and $T_4$.

[Figure 1: Task graph]

Before going into more details on the execution model, let us provide an example of the problem we want to solve. Consider the example provided in Figure 2. The application is executed on 3 cores, 2 in the same processor and one in another processor. Tasks $T_1$, $T_2$, $T_3$ and $T_4$ are executed on processor 0 while tasks $T_5$ and $T_6$ are executed on processor 1. In order to minimize the energy consumption through DVFS, we make the same assumption as [?]: tasks may have several phases and each phase can be executed at a specific frequency. Typically, in Figure 2, task $T_1$ is divided into 3 phases. The first one is executed at frequency $f_1$, the second one at frequency $f_2$ and the last one at frequency $f_3$. As stressed before, setting a frequency takes some time. In other words, when a frequency is requested, it is not set immediately. Thus, in Figure 2, when frequency $f_2$ is requested, it is set some time after.
One needs to be careful with such situations since a frequency may be set after the task which requested it is over. Moreover, cores within the same processor run at the same frequency. Hence, in Figure 2, when $f_1$ is first set on processor 0, all the tasks being executed at this time ($T_1$ and $T_3$) are executed at frequency $f_1$. $T_5$ is not affected since it is on another processor. To provide the best frequency to execute each task portion, we need to consider all parallel tasks which are executed at the same time on the processor.

[Figure 2: Frequency switch latency (see footnote 1)]

Footnote 1: Note that only the latency of the first request is represented.

Our model requires the task graph to be provided (through profiling or a complete execution of the application). Thus, we consider deterministic applications: for the same parameters and the same input data, the same task graph is generated. In order to guarantee that edges are the same over all possible executions, one has to make sure that the communications between the processes are the same. Non-deterministic communications in MPI are either receptions from an unknown source (by using MPI_ANY_SOURCE in the reception call), or non-deterministic completion events (MPI_Waitany for instance). Any application with such events is considered non-deterministic, and thus falls out of the scope of the proposed solution.

[Figure 3: Slack time]

Tasks within a core are totally ordered. If a task $T_i$ ends with a send event, then the following task $T_j$ starts exactly at the end of $T_i$. In Figure 1, task $T_2$ starts exactly after $T_1$ ends. On the other hand, when a task is created by a message reception ($T_4$ in Figure 1), it cannot start before all the tasks it depends on finish ($T_1$ and $T_3$) and it has to wait for the message to be received. If the message arrives after the end of the task which is supposed to receive it, the time between the end of the task and the reception is known as slack time. In Figure 3, task $T_1$ sends a message to $T_3$ but $T_3$ ends before receiving the message, creating the slack represented by dotted lines.

A task energy consumption $E_i$ is defined as the product of its execution time $exec_i$ and its power consumption $P_i$. Since the application is composed of several tasks, its energy consumption can be expressed as the sum of the energy consumption of all the tasks. Thus, the goal translates into providing the set of frequencies at which to execute each task. Hence, one can calculate the application energy consumption as:

$$E = \sum_i E_i = \sum_i (exec_i \times P_i) \qquad (1)$$

Minimizing the energy consumption of the application is equivalent to minimizing $E$ in equation (1). For each task $T_i$, both $exec_i$ and $P_i$ depend on the frequency of the different phases of the task.
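As a minimal sketch of equation (1), the snippet below sums the per-task energies. The task names, execution times and power values are made-up illustrative numbers, not measurements from the paper, and `application_energy` is a hypothetical helper name.

```python
# Hypothetical per-task data: (execution time in seconds, average power in watts).
# These numbers are illustrative only; they do not come from the paper.
tasks = {
    "T1": (2.0, 50.0),
    "T2": (1.5, 40.0),
    "T3": (3.0, 45.0),
}


def application_energy(tasks):
    """Equation (1): E = sum over tasks of exec_i * P_i (in joules)."""
    return sum(exec_time * power for exec_time, power in tasks.values())


energy = application_energy(tasks)
print(energy)
```

In the linear programs discussed next, the execution time and power of each task are not fixed inputs as they are here: they depend on the frequencies chosen by the solver.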
In addition, tasks are not independent: when executed in parallel on the same processor, tasks share the same frequency. Moreover, the overall execution time of the application depends on all the $exec_i$ and on the slack time. To minimize the energy consumption while still controlling the overall execution time, we express the problem as a linear program.

3 Building the linear program

The following paragraphs describe how the energy minimization problem translates into a linear program. We first describe the precedence constraints between the tasks, then we describe two formulations which consider the architecture constraints. Finally, we discuss the feasibility of the described solutions.

3.1 Precedence constraints

Let $T_i$ be a task defined by its start time $bt_i$ and its end time $et_i$. The beginning of tasks is bounded by the precedence relation between them. As already stressed, a task cannot start before its direct predecessors complete their execution. As explained in section 2, if $T_i$ sends a message, its child task $T_j$ starts exactly when $T_i$ ends, since the end of the communication means the beginning of the next task. This translates to:

$$bt_j = et_i$$

Table 1: Task variables
  $bt_i$       Beginning of a task $T_i$
  $et_i$       End of a task $T_i$
  $bts_i$      Beginning of a slack task $Ts_i$
  $ets_i$      End of a slack task $Ts_i$
  $exec_i^f$   The execution time of a task $T_i$ if executed completely at frequency $f$
  $tt_i^f$     The time during which the task $T_i$ is executed at frequency $f$
  $\delta_i^f$ The fraction of time a task $T_i$ spends at frequency $f$
  $M_i^j$      Message transmission time from task $T_j$ to task $T_i$

On the other hand, when $T_i$ ends with a message reception from $T_k$, one has to make sure that its successor task $T_j$ starts after both tasks end. Moreover, as pointed out in section 2, when a task receives a message, some slack may be introduced before the reception. Slack is handled the same way tasks are: it has a start and an end time and it can be executed at different frequencies depending on the tasks on the other cores. In Figure 3, the slack after $T_3$ may be executed at different frequencies depending on whether it is executed in parallel with $T_1$ or $T_2$. To ease the presentation, we assume that each task $T_i$ receiving a message (from a task $T_k$) is followed by a slack task, denoted $Ts_i$. The beginning of $Ts_i$, denoted $bts_i$, is exactly equal to the end of $T_i$:

$$bts_i = et_i \qquad (2)$$

whereas its end time, denoted $ets_i$, is at least equal to the arrival time of the message from $T_k$. Let $M_i^k$ denote the transmission time from $T_k$ to $T_i$. Thus:

$$ets_i \geq et_k + M_i^k \qquad (3)$$

Note that a task may receive messages from different processes (after a collective communication for example) and equation (3) has to be valid for all of them. Finally, since $T_j$, the successor task of $T_i$, has to start after $T_i$ and $T_k$ finish, one just needs to make sure that:

$$bt_j = ets_i$$

In order to compute the end time of a task $T_i$ ($et_i$), one has to evaluate the execution time of $T_i$. As explained above, a task may be executed at different frequencies. Let $exec_i^f$ be the execution time of $T_i$ if executed completely at frequency $f$. Every frequency can be used to run a fraction $\delta_i^f$ of the total execution of the task. Let $tt_i^f$ be the time $T_i$ spends at frequency $f$. It can be expressed as: $tt_i^f = \delta_i^f \times exec_i^f$.
Thus, the end time of a task is:

$$et_i = bt_i + \sum_f tt_i^f$$

Note that one has to make sure that a task is completely executed:

$$\sum_f \delta_i^f = 1 \qquad (4)$$

Finally, since the power consumption depends on the frequency, let $P_i^f$ be the power consumption of the task $T_i$ when executed at frequency $f$. Using this formulation, the objective function of the linear program becomes:

$$\min\Big(\sum_{T_i} \Big(\sum_f (tt_i^f \times P_i^f)\Big)\Big) \qquad (5)$$
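The relations of this section can be checked numerically for a single task once the fractions are known. The sketch below is only an evaluation of the formulas for fixed inputs, not the linear program itself; the frequencies, times and powers are invented for illustration, and the helper names (`task_times`, `end_time`, `task_energy`, `earliest_slack_end`) are hypothetical. `earliest_slack_end` assumes the slack ends as early as constraints (2) and (3) allow.

```python
def task_times(exec_at, delta):
    """tt_i^f = delta_i^f * exec_i^f for every frequency f, after checking (4)."""
    if abs(sum(delta.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1 (equation (4))")
    return {f: delta[f] * exec_at[f] for f in delta}


def end_time(bt, tt):
    """et_i = bt_i + sum over f of tt_i^f."""
    return bt + sum(tt.values())


def task_energy(tt, power):
    """Contribution of one task to objective (5): sum over f of tt_i^f * P_i^f."""
    return sum(tt[f] * power[f] for f in tt)


def earliest_slack_end(et_receiver, et_sender, transmission):
    """Smallest ets_i allowed by (2) and (3): the slack starts at et_i and
    cannot end before the message from T_k has arrived."""
    return max(et_receiver, et_sender + transmission)


# Illustrative values (assumptions): frequencies in GHz, times in s, power in W.
exec_at = {1.2: 10.0, 2.4: 6.0}   # exec_i^f
delta = {1.2: 0.25, 2.4: 0.75}    # delta_i^f
power = {1.2: 20.0, 2.4: 40.0}    # P_i^f

tt = task_times(exec_at, delta)
et = end_time(0.0, tt)
print(tt, et, task_energy(tt, power), earliest_slack_end(et, 8.0, 0.5))
```

In the actual formulation, the fractions are decision variables chosen by the solver rather than inputs.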

One can just use $tt_i^f$ in the objective function as it is expressed in equation (5), and the solver would provide the values of $tt_i^f$ of all tasks at all frequencies. This solution was presented in [?]. The provided solution can be used on architectures different from the ones we target in this work. As a matter of fact, nothing constrains parallel tasks on one processor to run at the same frequency, and the frequency switching threshold is not considered either. Moreover, no constraint on the execution time is expressed. The following paragraphs first describe how the performance is handled, then they introduce additional constraints that handle the architecture constraints and the execution time.

3.2 Execution time constraints

The performance of an application is a major concern, whether the energy consumption is considered or not. In this paragraph we provide constraints which consider the execution time of the application. In MPI, all programs end with MPI_Finalize, which is similar to a global barrier. Let $lastTask_i$ be the last task on core $i$ (the MPI_Finalize task). Since the application ends with a global communication, every task $lastTask_i$ is followed by a slack task $lastSlackTask_i$. The difference between the global communication slack and the other slack tasks lies in the end time: the end time of all slack tasks of a global communication is the same (all processes leave the barrier at the same time). Thus, for every couple of cores $(i, j)$:

$$elastSlackTask_i = elastSlackTask_j \qquad (6)$$

Let $totalTime$ be the application execution time. It is equal to the end time of the last slack task:

$$totalTime = elastSlackTask_i \qquad (7)$$

However, in some cases, increasing the execution time of an application could benefit energy consumption. In order to allow this performance loss to a specified extent, the user limits the degradation to a factor $x$ of the maximal performance. Let $execTime$ be the execution time when all tasks run at the maximal frequency, and $x$ the maximum performance loss percentage allowed by the user.
The following constraint allows performance loss with respect to $x$:

$$totalTime \leq execTime + \frac{execTime \times x}{100}$$

The next sections describe two different formulations. In the first formulation, the solver is provided with all possible task configurations and chooses the one minimizing energy consumption. In the second formulation, the solver provides the exact time of every frequency switch on each processor.

3.3 Architecture constraints: the workload approach

In order to provide the optimal frequency schedule, the linear program is provided with all possible task configurations, i.e., all possible sets of parallel tasks, known as workloads. Then the solver provides the execution frequency of each workload.

Shared frequency constraint

We need to express that tasks executed at the same time on the same processor run at the same frequency. Hence, we first need to identify tasks executed in parallel on the same processor. Depending on the frequency being used, the set of parallel tasks may change. Figure 4 is an example of two different executions running at the maximal and minimal frequency. Only processes that belong to the same processor are represented. In Figure 4a, when the processor runs at $f_{max}$, the set of couples of tasks which are parallel is $\{(T_1,T_3), (T_1,Ts_3), (Ts_1,Ts_3), (T_2,T_4)\}$ (represented by red dotted lines). When the frequency is set to $f_{min}$ (Figure 4b), the slack after $T_3$ is completely covered and the set of parallel tasks becomes $\{(T_1,T_3), (Ts_1,T_3), (T_2,T_4)\}$.
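To illustrate how the set of parallel tasks depends on the frequency, the sketch below lists the couples of tasks whose execution intervals overlap on two cores of one processor. The start and end times are invented so that the result reproduces the two sets quoted for Figure 4; they are not taken from the paper, and `parallel_pairs` is a hypothetical helper.

```python
from itertools import product


def parallel_pairs(core_a, core_b):
    """Return the couples (task on core a, task on core b) whose intervals
    overlap for a non-zero amount of time.

    Each core is a dict: task name -> (begin, end)."""
    pairs = set()
    for (name_a, (ba, ea)), (name_b, (bb, eb)) in product(
        core_a.items(), core_b.items()
    ):
        if max(ba, bb) < min(ea, eb):
            pairs.add((name_a, name_b))
    return pairs


# Assumed timings at f_max: T3 finishes early, leaving the slack Ts3.
fmax_core0 = {"T1": (0, 4), "Ts1": (4, 5), "T2": (5, 8)}
fmax_core1 = {"T3": (0, 2), "Ts3": (2, 5), "T4": (5, 8)}

# Assumed timings at f_min: T3 is longer and its slack disappears.
fmin_core0 = {"T1": (0, 6), "Ts1": (6, 8), "T2": (8, 12)}
fmin_core1 = {"T3": (0, 8), "T4": (8, 12)}

print(sorted(parallel_pairs(fmax_core0, fmax_core1)))
print(sorted(parallel_pairs(fmin_core0, fmin_core1)))
```

With these timings the couple (T1, Ts3) only exists at the higher frequency and (Ts1, T3) only at the lower one, which is why the workload approach has to enumerate all potentially parallel sets.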

Table 2: Workload formulation variables
  $bw_i$               Beginning of a workload $W_i$
  $ew_i$               End of a workload $W_i$
  $tw_i^f$             The time a workload $W_i$ is executed at frequency $f$
  $dw_i$               The duration of a workload
  $\overline{tw}_i^f$  A binary variable used to say if a workload is executed at a frequency $f$ or not

In order to provide all possible configurations, we define the processor workloads. A workload, denoted $W_i$, is a tuple of potentially parallel tasks. In Figure 4, $W_1 = (T_1,T_3)$, $W_2 = (Ts_1,T_3)$ and $W_3 = (T_1,Ts_3)$ represent a subset of the possible workloads. Note that there are no two workloads with the same set of tasks. In other words, once a task in a workload is over, a new workload begins. On the other hand, a task can belong to several workloads (like $T_1$ in Figure 4a).

[Figure 4: Workloads. (a) $f_{max}$ (b) $f_{min}$]

Recall that our goal is to calculate the time a task should spend at each frequency ($tt_i^f$) in order to minimize the energy consumption of the application according to the objective function (5). Since tasks may be executed at several frequencies, so may a workload. In Figure 5, the workload $W_1 = (T_1,T_3)$ is executed at frequency $f_1$ then at frequency $f_2$. Thus, since $T_1$ belongs to both $W_1 = (T_1,T_3)$ and $W_2 = (T_1,Ts_3)$, the execution time of $T_1$ at frequency $f_1$ ($tt_1^{f_1}$) can be calculated by using the time $W_1$ and $W_2$ spend at frequency $f_1$. In other words, the execution time of a task can be calculated according to the execution time of the workloads it belongs to. Let $tw_i^f$ be the time the workload $W_i$ spends at frequency $f$. Thus:

$$tt_i^f = \sum_{W_j,\; T_i \in W_j} tw_j^f \qquad (8)$$

[Figure 5: Workloads and tasks execution]

Using the execution time of a workload at a specific frequency ($tw_i^f$), one can calculate the duration of a workload, $dw_i$, as:

$$dw_i = \sum_f tw_i^f$$

Handling frequency switch delay

Recall that one of the problems when considering DVFS is the time required to actually set a new frequency. Thus, before setting a frequency, one has to make sure that the duration of the workload is long enough to tolerate the frequency change. In other words, if the frequency $f$ is set in a workload $W_i$, $tw_i^f$ must be larger than a user-defined threshold, denoted $Th$:

$$\forall W_i, \forall f : \quad tw_i^f \geq Th \times \overline{tw}_i^f \qquad (9)$$

$\overline{tw}_i^f$ is a binary variable used to guarantee that definition (9) remains true when $tw_i^f = 0$:

$$\overline{tw}_i^f = \begin{cases} 0 & tw_i^f = 0 \\ 1 & \text{otherwise} \end{cases} \qquad (10)$$

The expression of definition (10) as a mixed binary programming formulation is given in the appendix.

Valid workload filtering

The linear program is provided with all possible workloads, then it provides the different $tw_j^f$ for each workload. However, not all workloads can be present in one execution. In Figure 4, $W_1 = (T_1,Ts_3)$ and $W_2 = (Ts_1,T_3)$ are both possible workloads, but they cannot be in the same execution: if $W_1$ is being executed, it means that $T_3$ is over (since $Ts_3$ comes after $T_3$), thus $W_2$ cannot appear later since $Ts_1$ and $T_3$ are never parallel. Thus, in order to prevent $W_1$ and $W_2$ from both existing in one execution, we just need to check whether the tasks of the workload can be parallel or not. Two tasks are not parallel if one ends before the beginning of the second. Since we consider workloads, we focus only on the beginning and end time of the workload itself. Let $bw_j$ and $ew_j$ be the start time and the end time of the workload $W_j = (T_1, ..., T_i, ..., T_n)$. They are such that, for every task $T_i$ of $W_j$:

$$bw_j \geq bt_i \qquad (11)$$
$$ew_j \leq et_i \qquad (12)$$

Note that although the beginning and the end of the workload are not exactly defined, this definition makes sure that the beginning or the end of a task starts a new workload. Moreover, the complete execution of a task is guaranteed thanks to equations (4) and (8). Figure 6 is an example of a workload that cannot exist. Let us assume the execution represented in Figure 6, and let us focus on the workload $W_1 = (T_1,Ts_3)$.
Let us also assume that, with other frequencies, a possible workload is $W_2 = (T_3,Ts_1)$. As explained above, $W_1$ and $W_2$ cannot both exist in the same execution because of precedence constraints. It is obvious from the example that $T_3$ and $Ts_1$ are not parallel; let us see how this translates to workloads. Since $W_2$ has to start after both $T_3$ and $Ts_1$ begin, it starts after $Ts_1$ begins (since $bts_1 \geq bt_3$ in Figure 6). In the same way, it ends before $et_3$. But since $et_3 \leq bts_1$ (as shown in Figure 6), the duration of $W_2$ would be negative, which is not possible. Thus, we identify workloads which cannot be in the execution as workloads which end before they begin. The duration of a workload is such that:

$$dw_i = \begin{cases} 0 & ew_i < bw_i \\ ew_i - bw_i & \text{otherwise} \end{cases} \qquad (13)$$

In the appendix (section 6), we prove that if two workloads cannot be in the same execution (because of the precedence constraints), then the duration of at least one of them is 0 (paragraph 6.4.2).
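A small numerical sketch of definition (13) follows. It assumes the workload window is taken as tight as constraints (11) and (12) allow (latest task start, earliest task end); the solver is free to pick other values, so this is an illustration rather than the formulation itself. The task timings are invented, and `workload_window` and `workload_duration` are hypothetical helper names.

```python
def workload_window(tasks):
    """tasks: list of (bt, et) for the tasks of one workload.

    Returns the tightest (bw, ew) allowed by (11) and (12):
    bw >= every bt, ew <= every et."""
    bw = max(bt for bt, _ in tasks)
    ew = min(et for _, et in tasks)
    return bw, ew


def workload_duration(tasks):
    """Definition (13): 0 if the workload would end before it begins."""
    bw, ew = workload_window(tasks)
    return 0.0 if ew < bw else float(ew - bw)


# Assumed timings in the spirit of Figure 6 (not taken from the paper).
T1, Ts1 = (0, 4), (4, 5)
T3, Ts3 = (0, 2), (2, 5)

print(workload_duration([T1, Ts3]))  # W1 = (T1, Ts3) exists in this execution
print(workload_duration([T3, Ts1]))  # W2 = (T3, Ts1) cannot exist here
```

Here $W_2$ would have to start at 4 and end at 2, so its duration is clamped to zero, which matches the filtering argument of the text.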

[Figure 6: Negative workload duration for impossible workloads. Annotations: $ew_2 \leq ets_1$ and $ew_2 \leq et_3$, thus the workload must end at $et_3$ at the latest; $bw_2 \geq bts_1$ and $bw_2 \geq bt_3$, thus the workload must start at $bts_1$ at the earliest.]

Discussion

The appendix (section 6) provides a detailed formulation of the energy minimization problem using workloads. The formulation shows the use of two binary variables: one to express the threshold constraint and one to calculate the duration of the workload. With these two variables, the formulation is not linear anymore, which requires more time to solve (especially when the number of workloads is large). Moreover, we tried providing all possible workloads of one of the NAS parallel benchmarks, class C on 16 processes (IS.C.16), on a machine equipped with 16 GB of memory. The application task graph is composed of 630 tasks. The generated data (i.e. the set of workloads) could not fit in the memory of the machine. Thus, even with no binary variables, providing all possible workloads is not possible when considering real applications. In the following section, we provide another formulation which requires only the task graph.

3.4 Architecture constraints: the frequency switch approach

As explained earlier, our goal is to minimize the energy consumption of a parallel application using DVFS. In order to do so, we express the problem as a linear program. We consider that the program is represented as a task graph and each task can have several phases. The difficulty of the formulation is to provide, for each task, the frequency of each of its phases ($tt_i^f$), since one has to make sure that parallel tasks run at the same frequency. In this section, we provide another formulation which considers the time at which a new frequency is set on the whole processor, instead of considering tasks independently and then forcing parallel tasks to run at the same frequency.

Frequency switch overhead

Let $c_{jp}^f$ be the time the frequency $f$ is set on the processor $p$, $j$ being the sequence number of the frequency switch.
Figure 7 represents the execution of four tasks on two cores of the same processor $p$. In the example, we assume that there are only 3 possible frequencies. The different $c_{jp}^f$ are numbered such that the minimum frequency $f_1$ corresponds to the switching times $c_{1p}^{f_1}, c_{4p}^{f_1}, ...$, the frequency $f_2$ corresponds to the frequency changes $c_{2p}^{f_2}, c_{5p}^{f_2}, ...$ and so on. A frequency $f_1$ is applied during a time which can be calculated as $c_{\{j+1\}p}^{f_2} - c_{jp}^{f_1}$. This can be translated to:

$$c_{\{j+1\}p}^{f_2} - c_{jp}^{f_1} \geq 0$$

Table 3: Frequency switch formulation variables
  $c_{jp}^f$  Time of the $j$-th frequency switch on processor $p$. The frequency $f$ is the one set
  $d_{ij}^f$  The amount of time a frequency $f$ is set for the task $T_i$ for the frequency switch $j$
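As a sketch of how the two variables of Table 3 relate, the time a task spends at the frequency set by switch $j$ can be seen as the overlap between the task interval and the interval during which that switch is in effect. The code below evaluates this overlap for fixed switch times; the switch times, the task interval and the cyclic numbering over three frequencies are assumptions made for illustration, and the helper names are hypothetical.

```python
def time_at_switch(bt, et, c_j, c_next):
    """Time a task running on [bt, et] spends at the frequency set by a switch
    that is in effect on [c_j, c_next]. Zero when the task ends before the
    switch or begins after the next one."""
    if et <= c_j or bt >= c_next:
        return 0.0
    return float(min(et, c_next) - max(bt, c_j))


def time_per_frequency(bt, et, switches, n_freq):
    """Accumulate the task time per frequency, assuming switch number j
    (0-based) sets frequency j % n_freq, as in the cyclic numbering."""
    tt = [0.0] * n_freq
    for j in range(len(switches) - 1):
        tt[j % n_freq] += time_at_switch(bt, et, switches[j], switches[j + 1])
    return tt


def respects_threshold(switches, th):
    """Every frequency that is actually set (non-zero duration) must last
    at least th."""
    durations = [b - a for a, b in zip(switches, switches[1:])]
    return all(d == 0 or d >= th for d in durations)


# Assumed switch times on one processor: f1 at 0, f2 at 3, f3 at 6 (skipped,
# zero duration), f1 again at 6, f2 again at 9.
switches = [0.0, 3.0, 6.0, 6.0, 9.0]
tt_T1 = time_per_frequency(0.0, 8.0, switches, n_freq=3)
print(tt_T1, respects_threshold(switches, th=1.0))
```

With these numbers the task spends 3 + 2 time units at $f_1$ over two separate switches, and the times over all frequencies add up to its duration.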

[Figure 7: Frequency switches example]

Note that some frequencies may not be set if the duration is zero. In Figure 7, frequency $f_3$ is not set since $c_{31}^{f_3} = c_{41}^{f_1}$.

Handling frequency switch delay

As explained earlier, changing frequency takes some time. Thus, for a change to be applied, its duration has to be longer than the user-defined threshold $Th$. Let $\zeta_{jp}^f$ be a binary variable, such that:

$$\zeta_{jp}^f = \begin{cases} 0 & c_{\{j+1\}p}^f - c_{jp}^f = 0 \\ 1 & \text{otherwise} \end{cases} \qquad (14)$$

The threshold condition can be expressed as:

$$c_{\{j+1\}p}^f - c_{jp}^f \geq Th \times \zeta_{jp}^f$$

We detail how equation (14) is translated into mixed binary programming constraints in the appendix.

Shared frequency constraints

Once the threshold condition is satisfied, one can calculate the time a task spends at each frequency, i.e. $tt_i^f$, according to $c_{jp}^f$. In Figure 7, initially, tasks $T_1$ and $T_3$ run in parallel at frequency $f_1$. The time $T_3$ spends at frequency $f_1$ is $c_{21}^{f_2} - c_{11}^{f_1}$ whereas $T_1$ is executed twice at $f_1$. It spends $(c_{21}^{f_2} - c_{11}^{f_1}) + (et_1 - c_{41}^{f_1})$ at frequency $f_1$. Let $d_{ij}^f$ be the time the task $T_i$ spends at frequency $f$ after the frequency switch $j$. Back to Figure 7, $d_{11}^{f_1} = c_{21}^{f_2} - c_{11}^{f_1}$ and $d_{14}^{f_1} = et_1 - c_{41}^{f_1}$, so $tt_1^{f_1}$ becomes $tt_1^{f_1} = d_{11}^{f_1} + d_{14}^{f_1}$. The above translates to:

$$tt_i^f = \sum_j d_{ij}^f$$

Note that a task is not impacted by a frequency change if it ends before the change or begins after the next change. In other words, $d_{ij}^{f_1} = 0$ if $et_i \leq c_{jp}^{f_1}$ or $bt_i \geq c_{\{j+1\}p}^{f_2}$. Otherwise, $d_{ij}^{f_1}$ can be calculated as $\min(et_i, c_{\{j+1\}p}^{f_2}) - \max(bt_i, c_{jp}^{f_1})$. In general:

$$d_{ij}^f = \begin{cases} 0 & et_i \leq c_{jp}^f \text{ or } bt_i \geq c_{\{j+1\}p}^f \\ \min(et_i, c_{\{j+1\}p}^f) - \max(bt_i, c_{jp}^f) & \text{otherwise} \end{cases} \qquad (15)$$

3.5 Discussion

The appendix (section 6) provides the complete formulation of the problem using the frequency switch time variables. In addition to the binary variable used to satisfy the frequency switch overhead, for each task and for each frequency switch, five additional binary variables are used. Thus, for $n$ tasks and $m$ frequency switches considered, $5 \times n \times m$ binary variables are required. Mixed integer programming is NP-hard [?]; thus, with such a number of binary variables, no solution can be provided. When comparing the workload approach and the frequency switch approach, one can notice that the former needs fewer binary variables and should be able to provide results. However, because all possible workloads have to be provided to the solver, it is just as difficult to use because of the memory required. Thus, if a very large memory is available, then the workload solution is the one to be used. And if new, faster binary resolution techniques are provided, then the frequency switch solution should be used.

Several heuristics can be considered in order to reduce the time to solve the problem. First, one can consider iterative applications, solve the problem for only one iteration and then apply the solution to the remaining ones. However, this solution strongly depends on the number of tasks per iteration. We tried this solution on some kernels (NAS Parallel Benchmarks [?]) and the solver could not provide any result after several hours. The most promising heuristic is to consider the tasks at the processor level instead of the core level. Thus, the only architecture constraint which needs to be considered is the frequency switch overhead. This study is part of our current work and will be discussed in further studies.

4 Related Work

DVFS scheduling has been widely used to improve processor energy consumption during application execution. We focus on studies assuming a set of dependent tasks represented as a directed acyclic graph (DAG). A lot of studies tackle the task mapping problem while minimizing energy consumption, either with respect to task deadlines [?] or by trying to minimize the deadline as well [?]. When considering an already mapped task graph, studies provide the execution speed of each task depending on the frequency model: continuous [?] or discrete [?]. Some studies also provide a set of frequencies to execute a task [?] (executing a task at multiple frequencies is known as VDD-Hopping).
In [?], the authors present a complexity study of the energy minimization problem depending on the frequency model (continuous frequencies, discrete frequencies with and without VDD-Hopping). Finally, studies like [?] and [?] consider frequency transition overhead. Although these studies should provide an optimal frequency schedule, they do not consider the constraints of most current architectures, and more specifically the frequency shared among all cores of the same processor.

Many linear programming formulations have been proposed in the past to minimize application energy consumption. When considering a single processor, [?] provides an integer linear programming formulation with negligible frequency switching overhead. The same problem, but considering frequency transition overhead, was addressed in [?]. The authors also provide a linear-time heuristic algorithm which provides a near-optimal solution. The work presented in [?] is the closest to the work presented in this paper. In [?], the authors present a linear programming formulation of the energy minimization problem where tasks can be executed at several frequencies. Both slack energy and processor energy consumption are considered in the minimization and a loose deadline is considered. In a similar way, [?] provides a scheduling algorithm and an integer linear programming formulation of the energy minimization problem on heterogeneous systems with a fixed deadline. The formulation is very close to the one described in [?], but the authors also consider communication energy consumption. However, they do not consider slack time and its power consumption when solving the problem. In [?] the authors use an integer linear programming formulation of the problem where only tasks with slack time are slowed down, whereas other tasks are run at the maximal frequency. The program is used to compute the best execution frequency of a task. Although previous studies provide different solutions and formulations for DVFS scheduling, few of them consider current architecture constraints.
While some previous studies consider frequency transition overhead [?,?], none of them consider the fact that cores within the same processor run at the same frequency. This paper describes a mixed linear programming formulation that guarantees that parallel tasks on the same processor run at the same frequency. Moreover, it shows that it is possible to relax the deadline if it leads to energy savings.

5 Conclusion

The goal of this paper was to provide a study on how the energy minimization problem of a parallel execution of an MPI-like program can be addressed and formulated when considering most current architecture constraints. In order to do so, we used a linear programming formulation. Two different formulations were described. Their goal is to minimize the energy consumption with respect to a user-defined deadline by providing the optimal frequency schedule. Both solutions use a number of binary variables which is proportional to the number of tasks. Used as they are, these formulations should provide an optimal solution but are costly in terms of memory and resolution time, despite the use of fast parallel solvers like Gurobi [?]. We are currently working on introducing heuristics to relax the architecture constraints by building tasks at the processor level instead of the core level. Using such heuristics seems to drastically reduce the time needed to solve the problem.

6 Appendix

This appendix summarizes the set of constraints of both formulations described in paragraphs 3.3 and 3.4. We start by describing how each non-linear constraint which appears in sections 3.3 and 3.4 is expressed. For a more complete description and explanation, the reader can refer to [?].

6.1 Expressing non linear constraints

Section 3 presents different non-continuous variables (definitions (10), (13), (14) and (15)). In this section, we briefly explain how this kind of expression translates to inequalities using binary variables.

1. If-then statement with 0-1 variables: Expressing conditions like

$$\overline{x} = \begin{cases} 0 & x = 0 \\ 1 & \text{otherwise} \end{cases}$$

(for instance, definition (10)) requires the use of a large constant $M$ such that:

$$x \leq M \times \overline{x} \qquad (16)$$
$$x \geq \overline{x} \times \epsilon \qquad (17)$$

Thus, when $x = 0$, (17) forces $\overline{x}$ to be equal to 0, and when $x \neq 0$, (16) is used to set the value of $\overline{x}$ to 1. Note that equation (9), which guarantees that $tw_i^f \geq Th \times \overline{tw}_i^f$, makes (17) useless (since $Th > \epsilon$). Thus, (17) is never used in the set of constraints.

2. If-then statement with real variables: Expressing formulas like

$$z = \begin{cases} 0 & y < x \\ y - x & \text{otherwise} \end{cases}$$

(definition (13) for instance) is similar to the previous formulation in the sense that it requires the use of a big constant $M$. A binary variable $bin$ is used such that when $y - x \leq 0$, $bin = 0$:

$$y - x \leq M \times bin \qquad (18)$$
$$x - y \leq M \times (1 - bin) \qquad (19)$$

Thus, when $y \leq x$, (18) is always valid regardless of the value of $bin$. Hence, (19) forces $bin$ to be equal to 0. Similarly, when $y \geq x$, equation (18) forces $bin$ to 1.

Once $bin$ is defined, $z$ can be expressed as:

$$y - x \leq z \leq M \times bin \qquad (20)$$
$$y - x + z \leq 2 \times (y - x) + M \times (1 - bin) \qquad (21)$$

Thus, when $y \leq x$, $bin = 0$ (from (19)), (20) forces $z$ to be 0 (since all variables are positive) and (21) is always valid. Similarly, when $y \geq x$, $bin = 1$ (from (18)) and (20) and (21) become:

$$y - x \leq z \leq M$$
$$z \leq y - x$$

Thus $y - x \leq z \leq y - x$, which makes $z = y - x$.

3. Maximums: Maximums can be expressed by reformulating the definition as:

$$z = \max(x, y) = x + \begin{cases} 0 & x \geq y \\ y - x & \text{otherwise} \end{cases}$$

Let $w$ be such that:

$$w = \begin{cases} 0 & x \geq y \\ y - x & \text{otherwise} \end{cases}$$

We can express $w$ by using (20) and (21).

4. Minimums: Expressing minimums is based on the same idea as expressing maximums:

$$z = \min(x, y) = x - \begin{cases} 0 & x \leq y \\ x - y & \text{otherwise} \end{cases}$$

We do not detail how minimums are expressed, since it is done the same way as maximums.

5. Expressing several conditions: In definitions like (15), several conditions can force the value of a variable: $w = 0$ if $x \geq y$ or $z \geq u$, and $w$ takes another value otherwise. Translating such definitions into inequalities requires the use of one binary variable for each condition and one binary variable to express the "or". Let $bin_1$ and $bin_2$ be such that:

$$bin_1 = \begin{cases} 1 & z - u \geq 0 \\ 0 & \text{otherwise} \end{cases} \qquad bin_2 = \begin{cases} 1 & x - y \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

These two definitions can be expressed using (16) and (17). Finally, $bin_3$ is a binary variable which is equal to 1 if $bin_1$ or $bin_2$ is equal to 1 and 0 otherwise:

$$bin_3 = \begin{cases} 1 & bin_1 + bin_2 \geq 1 \\ 0 & \text{otherwise} \end{cases} \qquad (22)$$

Since $bin_1$, $bin_2$ and $bin_3$ are binary variables, (22) can be easily expressed as:

$$bin_1 \leq bin_3 \qquad (23)$$
$$bin_2 \leq bin_3 \qquad (24)$$
$$bin_3 \leq bin_1 + bin_2 \qquad (25)$$

Thus, when $bin_1$ and $bin_2$ are 0, (25) forces $bin_3$ to be 0, whereas when $bin_1$ or $bin_2$ is equal to 1, (23) and (24) force $bin_3$ to be equal to 1.
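The big-M constructions of this section can be sanity-checked by brute force on small integer values. The sketch below enumerates the binary variable and a grid of candidate values, keeps those satisfying constraints (18) to (21), and does the same for the "or" constraints (23) to (25). The constant `M = 100`, the integer grid and the function names are assumptions chosen for the check; this is a verification aid, not part of the paper's formulation.

```python
def feasible_z(x, y, M=100, grid=range(0, 21)):
    """Values of z >= 0 (on an integer grid) for which some binary 'b'
    satisfies constraints (18) to (21)."""
    result = set()
    for b in (0, 1):
        # (18): y - x <= M*b      (19): x - y <= M*(1 - b)
        if not (y - x <= M * b and x - y <= M * (1 - b)):
            continue
        for z in grid:
            c20 = y - x <= z <= M * b
            c21 = (y - x) + z <= 2 * (y - x) + M * (1 - b)
            if c20 and c21:
                result.add(z)
    return result


def feasible_or(b1, b2):
    """Values of bin3 allowed by constraints (23), (24) and (25)."""
    return {b3 for b3 in (0, 1) if b1 <= b3 and b2 <= b3 and b3 <= b1 + b2}


# The only feasible z should be max(0, y - x), i.e. definition (13).
for x in range(0, 11):
    for y in range(0, 11):
        assert feasible_z(x, y) == {max(0, y - x)}

# The only feasible bin3 should be the logical "or" of bin1 and bin2.
for b1 in (0, 1):
    for b2 in (0, 1):
        assert feasible_or(b1, b2) == {int(b1 or b2)}

print(feasible_z(3, 7), feasible_z(7, 3), feasible_or(0, 1))
```

The check relies on `M` being larger than any difference that can occur, which is the usual requirement for a big-M constant.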

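6.2 Objective function

Minimizing the energy consumption of a program described as a set of tasks is the objective function of the linear programming formulations described above. For a task $T_i$ with a power consumption $P_i^f$ at a frequency $f$ and executed at frequency $f$ during $tt_i^f$, the objective, which minimizes the energy consumption of the whole program over its whole execution time, is:
\[ \min\Big( \sum_{T_i} \Big( \sum_f \big( tt_i^f \cdot P_i^f \big) \Big) \Big) \]

6.3 Task constraints

Let $T_i$, $T_{i+1}$, $T_{i+2}$, $T_j$ be four tasks such that $T_i$, $T_{i+1}$, $T_{i+2}$ are consecutive and on the same processor. $T_i$ ends with a message sending, creating $T_{i+1}$, which ends with a reception from $T_j$, which generates $T_{i+2}$, as shown in Figure 8.

[Figure 8: Task configuration (tasks $T_i$, $T_j$, $T_{i+1}$, $Ts_{i+1}$ and $T_{i+2}$)]

\[ et_i = bt_i + \sum_f tt_i^f \]
\[ \sum_f \delta_i^f = 1 \]
\[ tt_i^f = \delta_i^f \cdot exect_i^f \]
\[ bt_{i+1} = et_i \]
\[ bts_{i+1} = et_{i+1} \]
\[ ets_{i+1} \ge et_j + M_{i+1,j} \]
\[ ets_{i+1} \ge bts_{i+1} \]
\[ bt_{i+2} = ets_{i+1} \]

6.4 Workload approach

Additional variables

$\gamma_i$ : A binary variable used to say if a workload duration is 0 or not
$M$ : A large constant

For every task $T_j$ in workload $W_i$:
\[ bw_i \ge bt_j \]
\[ ew_i \le et_j \]
\[ tt_j^f = \sum_{W_i \mid T_j \in W_i} tw_i^f \]
\[ dw_i = \sum_f tw_i^f \]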
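To make the objective of Section 6.2 and the task constraints of Section 6.3 concrete, the following Python sketch evaluates the energy and the end dates of two consecutive tasks for a fixed choice of the shares $\delta_i^f$. It does not solve the optimization problem; it only evaluates one candidate schedule. All numbers (frequencies, execution times, power values) and the names `task_times`, `evaluate` and `TASKS` are hypothetical.

```python
def task_times(exec_time, delta):
    """Time spent at each frequency: tt[f] = delta[f] * exec_time[f]."""
    assert abs(sum(delta.values()) - 1.0) < 1e-9, "the shares must sum to 1"
    return {f: delta[f] * exec_time[f] for f in exec_time}


def evaluate(tasks, start=0.0):
    """Return (energy, end dates) for consecutive tasks on the same core."""
    energy, bt, end_dates = 0.0, start, []
    for task in tasks:
        tt = task_times(task["exec_time"], task["delta"])
        energy += sum(tt[f] * task["power"][f] for f in tt)  # sum_f tt * P
        et = bt + sum(tt.values())                           # et = bt + sum_f tt
        end_dates.append(et)
        bt = et                                              # next task starts here
    return energy, end_dates


# Hypothetical data: two tasks, two frequencies (in GHz), times in seconds,
# power in watts.
TASKS = [
    {"exec_time": {1.2: 4.0, 2.4: 2.0},
     "power": {1.2: 10.0, 2.4: 25.0},
     "delta": {1.2: 0.5, 2.4: 0.5}},
    {"exec_time": {1.2: 6.0, 2.4: 3.0},
     "power": {1.2: 10.0, 2.4: 25.0},
     "delta": {1.2: 0.0, 2.4: 1.0}},
]

energy, end_dates = evaluate(TASKS)
print(energy, end_dates)
```

A solver would search over the shares (and the resulting dates) instead of fixing them as done here; the sketch only shows how one candidate is scored.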

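Using (16), (17) and (9), we express definition (10) as follows, where $\overline{tw}_i^f$ is the binary variable associated with $tw_i^f$:
\[ tw_i^f \ge \overline{tw}_i^f \cdot Th \]
\[ tw_i^f \le M \cdot \overline{tw}_i^f \]
Using (18), (19), (20) and (21) with $\gamma_i$ as the binary variable, we express definition (13) as:
\[ ew_i - bw_i \le M \cdot \gamma_i \]
\[ bw_i - ew_i \le M \cdot (1 - \gamma_i), \quad \gamma_i \in \{0, 1\} \]
\[ ew_i - bw_i \le dw_i \le M \cdot \gamma_i \]
\[ ew_i - bw_i + dw_i \le 2 \cdot (ew_i - bw_i) + M \cdot (1 - \gamma_i) \]

Proof of workload duration

We want to prove that if two workloads $W$ and $W'$ are possible but violate the precedence constraint between the tasks, then the duration of at least one of them is zero. We provide the proof for workloads with a cardinality equal to 2 since the proof remains the same for larger workloads. Let $W = (T_i, T_j)$ and $W' = (T_{i'}, T_{j'})$ such that $T_i$ precedes $T_{i'}$ and $T_{j'}$ precedes $T_j$. We want to prove that $dw = 0$ or $dw' = 0$.

Lemma 1. Let $W = (T_i, T_j)$ and $W' = (T_{i'}, T_{j'})$. If $bt_{i'} \ge et_i$ and $bt_j \ge et_{j'}$, then $dw = 0$ or $dw' = 0$.

Proof. Let us prove the lemma by contradiction. Let us assume that $dw \neq 0$ and $dw' \neq 0$. From definition (13):
\[ dw \neq 0 \Rightarrow ew > bw \]
\[ dw' \neq 0 \Rightarrow ew' > bw' \]
From constraints (11) and (12):
\[ bw \ge bt_i, \quad bw \ge bt_j, \quad ew \le et_i, \quad ew \le et_j \tag{26} \]
and
\[ bw' \ge bt_{i'}, \quad bw' \ge bt_{j'}, \quad ew' \le et_{i'}, \quad ew' \le et_{j'} \]
But $bt_{i'} \ge et_i$ and $bt_j \ge et_{j'}$, thus:
\[ bw \ge bt_j \ge et_{j'} \ge ew' \tag{27} \]
\[ bw' \ge bt_{i'} \ge et_i \ge ew \tag{28} \]
If we consider (27), (28) and (26), together with the assumption $ew > bw$:
\[ bw' \ge bt_{i'} \ge et_i \ge ew > bw \ge ew' \]
Thus $bw' > ew'$, which by definition (13) implies that $dw' = 0$, which leads to a contradiction.

6.5 Frequency switch approach

Note that we do not detail how the threshold condition is handled since it is done the same way as for the workloads.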

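6.5.1 Additional variables

$\zeta^f_{jp}$ : A binary variable used to say if a workload is executed at a frequency $f$ or not
$y^f_{ij}$ : The maximum between $bt_i$ and $c^f_{jp}$
$w^f_{ij}$ : A variable used to express $y^f_{ij}$. It is equal to 0 if $bt_i$ is the maximum, and $c^f_{jp} - bt_i$ otherwise
$\alpha^f_{ij}$ : A binary variable used to verify whether $bt_i \le c^f_{jp}$
$z^f_{ij}$ : The minimum between $et_i$ and $c^f_{\{j+1\}p}$
$g^f_{ij}$ : A variable used to express $z^f_{ij}$. It is equal to 0 if $et_i$ is the minimum, and $et_i - c^f_{\{j+1\}p}$ otherwise
$\beta^f_{ij}$ : A binary variable used to verify whether $et_i \ge c^f_{\{j+1\}p}$
$\psi^f_{ij}$ : A binary variable used to check if $bt_i - c^f_{\{j+1\}p} \ge 0$
$\phi^f_{ij}$ : A binary variable used to check if $et_i - c^f_{jp} \le 0$
$\rho^f_{ij}$ : A binary variable used to check if $\psi^f_{ij}$ or $\phi^f_{ij}$ is true
$M$ : A large constant

Constraints

\[ c^f_{\{j+1\}p} \ge c^f_{jp} \]
\[ c^f_{\{j+1\}p} - c^f_{jp} \ge Th \cdot \zeta^f_{jp} \]
\[ c^f_{\{j+1\}p} - c^f_{jp} \le M \cdot \zeta^f_{jp} \]
\[ tt_i^f = \sum_j d^f_{ij} \]

Expressing definition (15) as inequalities requires the use of (20) and (21) for the maximum and the minimum such that:
\[ y^f_{ij} = \max(bt_i, c^f_{jp}) = bt_i + w^f_{ij} \quad \text{such that} \quad w^f_{ij} = \begin{cases} 0 & \text{if } bt_i \text{ is the maximum} \\ c^f_{jp} - bt_i & \text{otherwise} \end{cases} \]
\[ z^f_{ij} = \min(et_i, c^f_{\{j+1\}p}) = et_i - g^f_{ij} \quad \text{such that} \quad g^f_{ij} = \begin{cases} 0 & \text{if } et_i \text{ is the minimum} \\ et_i - c^f_{\{j+1\}p} & \text{otherwise} \end{cases} \]
Let $\alpha^f_{ij}$ be the binary variable used for the maximum and $\beta^f_{ij}$ the one used for the minimum. By replacing the corresponding variables in (18) to (21), we obtain the following inequalities for the maximum:
\[ c^f_{jp} - bt_i \le M \cdot \alpha^f_{ij} \]
\[ bt_i - c^f_{jp} \le M \cdot (1 - \alpha^f_{ij}), \quad \alpha^f_{ij} \in \{0, 1\} \]
\[ c^f_{jp} - bt_i \le w^f_{ij} \le M \cdot \alpha^f_{ij} \]
\[ c^f_{jp} - bt_i + w^f_{ij} \le 2 \cdot (c^f_{jp} - bt_i) + M \cdot (1 - \alpha^f_{ij}) \]
and the following for the minimum:
\[ et_i - c^f_{\{j+1\}p} \le M \cdot \beta^f_{ij} \]
\[ c^f_{\{j+1\}p} - et_i \le M \cdot (1 - \beta^f_{ij}), \quad \beta^f_{ij} \in \{0, 1\} \]
\[ et_i - c^f_{\{j+1\}p} \le g^f_{ij} \le M \cdot \beta^f_{ij} \]
\[ et_i - c^f_{\{j+1\}p} + g^f_{ij} \le 2 \cdot (et_i - c^f_{\{j+1\}p}) + M \cdot (1 - \beta^f_{ij}) \]
Finally, using (23), (24) and (25) with the binary variables $\psi^f_{ij}$, $\phi^f_{ij}$ and $\rho^f_{ij}$ as $bin_1$, $bin_2$ and $bin_3$ respectively, and using (20) and (21), $d^f_{ij}$ can be expressed as: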

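\[ \phi^f_{ij} \le \rho^f_{ij} \]
\[ \psi^f_{ij} \le \rho^f_{ij} \]
\[ \rho^f_{ij} \le \phi^f_{ij} + \psi^f_{ij} \]
\[ z^f_{ij} - y^f_{ij} \le d^f_{ij} \le M \cdot (1 - \rho^f_{ij}) \]
\[ z^f_{ij} - y^f_{ij} + d^f_{ij} \le 2 \cdot (z^f_{ij} - y^f_{ij}) + M \cdot \rho^f_{ij} \]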
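The role of $\psi$, $\phi$ and $\rho$ in these last inequalities can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions: the frequency index $f$ is ignored, the switch dates and the task dates are hypothetical, and the names `interval_share`, `satisfies_constraints` and `BIG_M` are not taken from the paper. For each interval delimited by two consecutive switch dates, the sketch computes the share $d$ of the task that falls in the interval and checks that it satisfies the two big-M inequalities linking $d$, $z - y$ and $\rho$.

```python
BIG_M = 1_000.0  # assumed to exceed every time value used below


def interval_share(bt, et, c_lo, c_hi):
    """Return (d, y, z, rho) for a task [bt, et] and an interval [c_lo, c_hi]."""
    psi = 1 if bt - c_hi >= 0 else 0   # task begins after the interval ends
    phi = 1 if et - c_lo <= 0 else 0   # task ends before the interval begins
    rho = 1 if psi + phi >= 1 else 0   # "or" of the two conditions, as in (22)
    y = max(bt, c_lo)
    z = min(et, c_hi)
    d = 0.0 if rho else z - y          # no overlap gives a zero share
    return d, y, z, rho


def satisfies_constraints(d, y, z, rho, big_m=BIG_M):
    """Check the two big-M inequalities that link d to z - y and rho."""
    return (z - y <= d <= big_m * (1 - rho)
            and z - y + d <= 2 * (z - y) + big_m * rho)


switch_dates = [0.0, 4.0, 7.0, 12.0]  # hypothetical switch dates on one processor
bt, et = 2.0, 9.0                     # hypothetical task begin and end dates
shares = []
for c_lo, c_hi in zip(switch_dates, switch_dates[1:]):
    d, y, z, rho = interval_share(bt, et, c_lo, c_hi)
    assert satisfies_constraints(d, y, z, rho)
    shares.append(d)

# The shares add up to the task duration when the intervals cover the task.
print(shares, sum(shares) == et - bt)
```

In the actual formulation these quantities are decision variables fixed by the solver; the sketch only confirms that the value given by the case split is feasible for the inequalities.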


More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

The Study of Teaching-learning-based Optimization Algorithm

The Study of Teaching-learning-based Optimization Algorithm Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute

More information

Complete subgraphs in multipartite graphs

Complete subgraphs in multipartite graphs Complete subgraphs n multpartte graphs FLORIAN PFENDER Unverstät Rostock, Insttut für Mathematk D-18057 Rostock, Germany Floran.Pfender@un-rostock.de Abstract Turán s Theorem states that every graph G

More information

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 31 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 6. Rdge regresson The OLSE s the best lnear unbased

More information

TOPICS MULTIPLIERLESS FILTER DESIGN ELEMENTARY SCHOOL ALGORITHM MULTIPLICATION

TOPICS MULTIPLIERLESS FILTER DESIGN ELEMENTARY SCHOOL ALGORITHM MULTIPLICATION 1 2 MULTIPLIERLESS FILTER DESIGN Realzaton of flters wthout full-fledged multplers Some sldes based on support materal by W. Wolf for hs book Modern VLSI Desgn, 3 rd edton. Partly based on followng papers:

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper Games of Threats Elon Kohlberg Abraham Neyman Workng Paper 18-023 Games of Threats Elon Kohlberg Harvard Busness School Abraham Neyman The Hebrew Unversty of Jerusalem Workng Paper 18-023 Copyrght 2017

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

= z 20 z n. (k 20) + 4 z k = 4

= z 20 z n. (k 20) + 4 z k = 4 Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5

More information

The L(2, 1)-Labeling on -Product of Graphs

The L(2, 1)-Labeling on -Product of Graphs Annals of Pure and Appled Mathematcs Vol 0, No, 05, 9-39 ISSN: 79-087X (P, 79-0888(onlne Publshed on 7 Aprl 05 wwwresearchmathscorg Annals of The L(, -Labelng on -Product of Graphs P Pradhan and Kamesh

More information