Resilient and energy-aware algorithms
|
|
- Joel Chandler
- 5 years ago
- Views:
Transcription
1 Resilient and energy-aware algorithms Anne Benoit ENS Lyon CR /2017 CR02 Resilient and energy-aware algorithms 1/ 65
2 Motivation Need of resilient algorithms (see classes 3-7) Need of energy-aware algorithms (see classes 8-9)... And need to combine both! DVFS has an impact on resilience, so both problematics are linked... Also, does energy have an impact on the optimal checkpointing period? CR02 Resilient and energy-aware algorithms 2/ 65
3 Motivation Need of resilient algorithms (see classes 3-7) Need of energy-aware algorithms (see classes 8-9)... And need to combine both! DVFS has an impact on resilience, so both problematics are linked... Also, does energy have an impact on the optimal checkpointing period? CR02 Resilient and energy-aware algorithms 2/ 65
4 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 3/ 65
5 Energy: a crucial issue Data centers ( Cloud Begins with Coal, M. Mills) TWh in 2013 consumption of Turkey (242), Spain (267), or Italy (309) 530Mt of CO 2 (carbontrust) > Canada crucial for both environmental and economical reasons Coordinated periodic checkpointing: what is the optimal checkpointing period if you optimize for Energy consumption? Is there a tradeoff between optimizing for Energy and optimizing for Time? Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 4/ 65
6 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 5/ 65
7 Power model P Static : base power (platform switched on) Trend: goes down (w.r.t. other powers) P Cal : overhead due to CPU (computations) P I/O : overhead due to file I/O (checkpoint or recovery) P Down : overhead when one machine is down (rebooting) Meneses, Sarood and Kalé: Base power L = P Static E. Meneses, O. Sarood, and L.V. Kalé, Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems, in Proceedings of the 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2012), New York, USA, October Maximum power H = P Static + P Cal P I/O = 0 (and P Down = 0) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 6/ 65
8 Coordinated checkpointing Periodic checkpointing policy of period T Independent and identically distributed failures Applies to a single processor with MTBF µ = µ ind Applies to a platform with p processors with MTBF µ = µ ind p tightly-coupled application progress all processors available Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 7/ 65
9 Cost of checkpointing Time spent working Time spent checkpointing Time spent working with slowdown Time Computing the first chunk Checkpointing the first chunk Processing the first chunk General model: while a checkpoint is taken, computations are slowed-down: during a checkpoint of duration C, the same amount of computation is done as during a time ωc without checkpointing (0 ω 1). Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 8/ 65
10 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 9/ 65
11 Expected execution time T base execution time without any overhead T final = T ff + T fails total execution time Time for fault-free execution Time lost due to failures T T ff = T base T (1 ω)c T fails = T final (D + R + Re-Exec) µ Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 10/ 65
12 AlgoT: Strategy with T opt Time Computation of the optimal checkpointing period in the non-blocking case: See Course 4, Section 4: Assessing protocols at scale (with α instead of ω) T opt Time = 2(1 ω)c(µ (D + R + ωc)) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 11/ 65
13 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 12/ 65
14 Consumed energy E final = T Cal P Cal + T I/O P I/O + T Down P Down + T final P Static ( = T base + T final (ωc + T 2 C 2 + ωc 2 )) P Cal µ 2T 2T ( Tfinal + (R + C 2 ) ) T base + C P µ 2T T (1 ω)c I/O + T final µ DP Down + T final P Static T final T Cal + T I/O + T Down, unless ω = 0 CPU and I/O activities are overlapped (and both consumed) when checkpointing Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 13/ 65
15 AlgoE: Strategy with T opt Energy P Cal = αp Static, P I/O = βp Static, P Down = γp Static (T a) 2( b 2µ T ) 2 E P Static T base final = ab+t 2 ( ) 2µ (αωc +βr +γd +µ)+ µ αt 2 + α(1 ω)c2 + βc2 2T 2T + (T a)(b 2µ T ) ( α+ α(1 ω)c2 βc 2 ) ( ) βc b 2µ T 2µ T 2 ( ) = T 3( ) 1 4µ 4µ 1 +T 2 αωc+βr+γd 2µ 2µ βc 2µ 4µ ( ) +T 2µ ab 2µ ab + βcb (α(1 ω) β)c2 2 µ 4µ 2 βcb 2 ( ) ab(αωc+βr+γd+µ) b µ 2µ a 4µ 2 (α(1 ω) β)c 2 ( ) + T 1 (α(1 ω) β) 2µ C (α(1 ω) β) C ) 2µ 2µ 2 + 2µ b + a βc 4µ 2 + 2µ 1 ( ) +T = T 2 ( αωc+βr+γd 2µ 2 + b+ a (βc a)b 2 (α(1 ω) β)c2 µ 4µ 2 ab(αωc+βr+γd+µ) βcb 2 ( µ ) + b 2µ + a 4µ 2 (α(1 ω) β)c 2. Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 14/ 65
16 AlgoE: Strategy with T opt Energy P Cal = αp Static, P I/O = βp Static, P Down = γp Static (T a) 2( b 2µ T ) 2 E P Static T base final = ab+t 2 ( ) 2µ (αωc +βr +γd +µ)+ µ αt 2 + α(1 ω)c2 + βc2 2T 2T + (T a)(b 2µ T ) ( α+ α(1 ω)c2 βc 2 ) ( ) βc b 2µ T 2µ T 2 ( ) = T 3( ) 1 4µ 4µ 1 +T 2 αωc+βr+γd 2µ 2µ βc 2µ 4µ ( ) +T 2µ ab 2µ ab + βcb (α(1 ω) β)c2 2 µ 4µ 2 βcb 2 ( ) ab(αωc+βr+γd+µ) b µ 2µ a 4µ 2 (α(1 ω) β)c 2 ( ) + T 1 (α(1 ω) β) 2µ C (α(1 ω) β) C ) 2µ 2µ 2 + 2µ b + a βc 4µ 2 + 2µ 1 ( ) +T = T 2 ( αωc+βr+γd 2µ 2 + b+ a (βc a)b 2 (α(1 ω) β)c2 µ 4µ 2 ab(αωc+βr+γd+µ) βcb 2 ( µ ) + b 2µ + T a opt 4µ 2 (α(1 ω) β)c 2. We let Maple compute Energy Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 14/ 65
17 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 15/ 65
18 Parameters: power ρ = P Static + P I/O P Static + P Cal = 1 + β 1 + α 20 Mega-watts for Exascale platform with 10 6 nodes Nominal power = 20 milli-watts per node 1/2 1/4 of that power in static consumption I/O an order of magnitude more than computing (J. Shalf, S. Dosanjh, and J. Morrison, Exascale computing technology challenges, in the 9th Int. Conf. High Performance Computing for Computational Science, 2011) Scenario 1: P Static = 10, P Cal = 10, P I/O = 100 ρ = 5.5 Scenario 2: P Static = 5, P Cal = 10, P I/O = 100 ρ = 7 Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 16/ 65
19 Parameters: resilience MTBF N = 45, 208 processors: one fault per day Individual (processor) MTBF µ ind 125 years. Total number of processors N: from N = 219, 150 to N = 2, 191, 500 µ = 300 min down to µ = 30 min C = R = 10 min, D = 1 min, and ω = 1/2. Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 17/ 65
20 Impact of ratio ρ T final (T energy )/ E final (T time )/ T final (T energy )/ E final (T time )/ T final (T time ) E final (T energy ) T final (T time ) E final (T energy ) ρ (µ=300) (µ=120) (µ=30) ρ ρ (µ=300) How much slower, (µ=120) if we optimize for energy instead of optimizing (µ=30) for time Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 18/ 65
21 T final (T energy )/ Impact of1.12 ratio ρ E final (T time )/ T final (T time ) E final (T energy ) (µ=300) (µ=120) (µ=30) ρ How much more energy consumption, if we optimize for time instead of optimizing for energy Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 19/ 65
22 AlgoT over AlgoE 1.25 How much slower, if we optimize for energy instead of optimizing for time How much more energy consumption, if we optimize for time instead of optimizing for energy ρ ρ µ µ Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 20/ 65
23 Scalability (ρ = 5.5) T final (T energy )/ E final (T time )/ T final (T time ) E final (T energy ) Number of nodes (ρ=5.5) (ρ=5.5) Number of nodes µ = 120 min for 10 6 nodes, C = R = 1 min, D = 0.1 min, ω = 1/2 Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 21/ 65
24 Scalability (ρ = 7) T final (T energy )/ E final (T time )/ T final (T time ) E final (T energy ) Number of nodes (ρ=7) (ρ=7) Number of nodes µ = 120 min for 10 6 nodes, C = R = 1 min, D = 0.1 min, ω = 1/2 Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 22/ 65
25 Conclusion Coordinated checkpointing, non-blocking Different optimal periods for time and energy Save more than 20% of energy with 10% increase in time Expect more gains for large-scale platforms Variety of resilience and power consumption parameters Quite flexible analytical model Easy to instantiate for other scenarios CR02 Resilient and energy-aware algorithms 23/ 65
26 Conclusion Coordinated checkpointing, non-blocking Different optimal periods for time and energy Save more than 20% of energy with 10% increase in time Expect more gains for large-scale platforms Variety of resilience and power consumption parameters Quite flexible analytical model Easy to instantiate for other scenarios CR02 Resilient and energy-aware algorithms 23/ 65
27 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 24/ 65
28 Framework Execution of a divisible task (W operations) Failures may occur Transient failures Resilience through checkpointing Objective: minimize expected energy consumption E(E), given a deadline bound D Probabilistic nature of failure hits: expectation of energy consumption is natural (average cost over many executions) Deadline bound: two relevant scenarios (soft or hard deadline) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 25/ 65
29 Soft vs hard deadline Soft deadline: met in expectation, i.e., E(T ) D (average response time) Hard deadline: met in the worst case, i.e., T wc D VS Hard (worst-case) Soft (expected) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 26/ 65
30 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 27/ 65
31 Execution time, one single chunk One single chunk of size W Checkpoint overhead: execution time T C Instantaneous failure rate: λ First execution at speed s: T exec = W s + T C Failure probability: P fail = λt exec = λ( W s + T C ) In case of failure: re-execute at speed σ: T reexec = W σ And we assume success after re-execution + T C E(T ) = T exec + P fail T reexec = ( W s + T C ) + λ( W s + T C )( W σ + T C ) T wc = T exec + T reexec = ( W s + T C ) + ( W σ + T C ) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 28/ 65
32 Energy consumption, one single chunk One single chunk of size W Checkpoint overhead: energy consumption E C First execution at speed s: W s s 3 + E C = Ws 2 + E C Re-execution at speed σ: W σ 2 + E C, with probability P fail (Pfail = λt exec = λ( W s + T C ) ) E(E) = (Ws 2 + E C ) + λ ( W s + T C ) ( W σ 2 + E C ) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 29/ 65
33 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 30/ 65
34 Multiple chunks Execution times: sum of execution times for each chunk (worst-case or expected) Expected energy consumption: sum of expected energy for each chunk Coherent failure model: consider two chunks W 1 + W 2 = W Probability of failure for first chunk: Pfail 1 = λ( W 1 s + T C ) For second chunk: Pfail 2 = λ( W 2 s + T C ) With a single chunk of size W : P fail = λ( W s + T C ), differs from Pfail 1 + P2 fail only because of extra checkpoint Trade-off: many small chunks (more T C to pay, but small re-execution cost) vs few larger chunks (fewer T C, but increased re-execution cost) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 31/ 65
35 Optimization problem Decisions that should be taken before execution: Chunks: how many (n)? which sizes (W i for chunk i)? Speeds of each chunk: first run (s i )? re-execution (σ i )? Input: W, T C (checkpointing time), E C (energy spent for checkpointing), λ (instantaneous failure rate), D (deadline) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 32/ 65
36 Optimization problem Decisions that should be taken before execution: Chunks: how many (n)? which sizes (W i for chunk i)? Speeds of each chunk: first run (s i )? re-execution (σ i )? Input: W, T C (checkpointing time), E C (energy spent for checkpointing), λ (instantaneous failure rate), D (deadline) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 32/ 65
37 Optimization problem Decisions that should be taken before execution: Chunks: how many (n)? which sizes (W i for chunk i)? Speeds of each chunk: first run (s i )? re-execution (σ i )? Input: W, T C (checkpointing time), E C (energy spent for checkpointing), λ (instantaneous failure rate), D (deadline) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 32/ 65
38 Models Chunks Single chunk of size W Speed per chunk VS VS Multiple chunks (n and W i s) Single speed (s) Multiple speeds (s and σ) Deadline bound VS Hard (T wc D) Soft (E(T ) D) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 33/ 65
39 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 34/ 65
40 Single chunk and single speed Consider first that s = σ (single speed): need to find optimal speed E(E) is a function of s: E(E)(s) = (Ws 2 + E C )(1 + λ( W s + T C )) Lemma: this function is convex and has a unique minimum s (function of λ, W, E c, T c ) s = λw 6(1+λT C ) ( (3 3 where a = λe C ( 2(1+λTC ) λw 27a 2 4a 27a+2) 1/3 2 1/3 (3 3 ) 2 E(T ) and T wc : decreasing functions of s ) 2 27a 1/3 1, 2 4a 27a+2) 1/3 Minimum speed s exp and s wc required to match deadline D (function of D, W, T c, and λ for s exp ) Optimal speed: maximum between s and s exp or s wc Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 35/ 65
41 Single chunk and multiple speeds Consider now that s σ (multiple speeds): two unknowns E(E) is a function of s and σ: E(E)(s, σ) = (Ws 2 + E C ) + λ( W s + T C )(W σ 2 + E C ) Lemma: energy minimized when deadline tight (both for wc and exp) σ expressed as a function of s: σ exp = λw D, (1+λT W C ) σwc = W (D 2T C )s W s +T s C Minimization of single-variable function, can be solved numerically (no expression of optimal s) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 36/ 65
42 General problem with multiple chunks Divisible task of size W Split into n chunks of size W i : n i=1 W i = W Chunk i is executed once at speed s i, and re-executed (if necessary) at speed σ i Unknowns: n, W i, s i, σ i n ( E(E) = Wi si 2 ) n + E C + λ i=1 i=1 ( Wi s i + T C ) (Wi σ 2 i + E C ) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 37/ 65
43 Multiple chunks and single speed With a single speed, σ i = s i for each chunk Theorem: in optimal solution, n equal-sized chunks (W i = W n ), executed at same speed s i = s Proof by contradiction: consider two chunks W 1 and W 2 executed at speed s 1 and s 2, with either s 1 s 2, or s 1 = s 2 and W 1 W 2 Strictly better solution with two chunks of size w = (W 1 + W 2 )/2 and same speed s Only two unknowns, s and n Minimum speed with n chunks: s exp (n) = W 1 + 2λT C + 4 λd + 1 n 2(D nt C (1 + λt C )) Minimization of double-variable function, can be solved numerically both for expected and hard deadline Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 38/ 65
44 Multiple chunks and multiple speeds Need to find n, W i, s i, σ i With expected deadline: All re-execution speeds are equal (σ i = σ) and tight deadline All chunks have same size and are executed at same speed With hard deadline: If s i = s and σ i = σ, then all W i s are equal Conjecture: equal-sized chunks, same first-execution / re-execution speeds σ as a function of s, bound on s given n Minimization of double-variable function, can be solved numerically Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 39/ 65
45 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 40/ 65
46 Simulation settings Large set of simulations: illustrate differences between models Maple software to solve problems We plot relative energy consumption as a function of λ The lower the better Given a deadline constraint (hard or expected), normalize with the result of single-chunk single-speed Impact of the constraint: normalize expected deadline with hard deadline Parameters varying within large ranges Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 41/ 65
47 Comparison with single-chunk single-speed Results identical for any value of W /D E SCMSed MCSSed MCMSed Model (/SCSS) SCMShd MCSShd MCMShd 1e 06 1e 03 1e+00 lambda For expected deadline, with small λ (< 10 2 ), using multiple chunks or multiple speeds do not improve energy ratio: re-execution term negligible; increasing λ: improvement with multiple chunks For hard deadline, better to run at high speed during second execution: use multiple speeds; use multiple chunks if frequent failures Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 42/ 65
48 Expected vs hard deadline constraint E Model SCSS MCSS SCMS MCMS 1e 06 1e 03 1e+00 lambda Important differences for single speed models, confirming previous conclusions: with hard deadline, use multiple speeds Multiple speeds: no difference for small λ: re-execution at maximum speed has little impact on expected energy consumption; increasing λ: more impact of re-execution, and expected deadline may use slower re-execution speed, hence reducing energy consumption Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 43/ 65
49 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy Complexity results Heuristics Simulation results 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 44/ 65
50 Framework DAG: G = (V, E) n = V tasks T i of weight w i p identical processors fully connected DVFS, Continuous model: interval of available continuous speeds [s min, s max ] One speed per task Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 45/ 65
51 Makespan Execution time of T i at speed s i : d i = w i s i If T i is executed twice on the same processor at speeds s i and s i : d i = w i s i + w i s i Constraint on makespan: end of execution before deadline D (hard deadline constraint) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 46/ 65
52 Reliability Transient failure: local, no impact on the rest of the system Transient failure rate: Poisson distribution of parameter: λ(s) = λ 0 e d smax s smax s min. Reliability R i of task T i as a function of speed s i : R i (s i ) = e λ(s i )Exe(w i,s i ) = (1st order) 1 λ 0 e ds i w i s i Threshold reliability (and hence speed s rel ) R i (s rel 1) R i (s) s min s rel s max Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 47/ 65 s
53 Re-execution: a task is re-executed on the same processor, just after its first execution With two executions, reliability R i of task T i is: R i = 1 (1 R i (s i ))(1 R i (s i )) Constraint on reliability: Reliability: R i R i (s rel ), and at most one re-execution Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 48/ 65
54 Energy Energy to execute task T i once at speed s i : E i (s i ) = w i s 2 i Dynamic part of classical energy models With re-executions, it is natural to take the worst-case scenario: ( Energy : E i = w i s 2 i + s i 2 ) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 49/ 65
55 Energy and reliability: set of possible speeds Energy w i s 2 i + w i s 2 i = 2E i (s i ) w i s 2 i = E i (s i ) E i (s rel ) s rel 2 s rel Speed Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 50/ 65
56 Tri-Crit-Cont Given G = (V, E) Find A schedule of the tasks A set of tasks I = {i T i is executed twice} Execution speed s i for each task T i Re-execution speed s i for each task in I such that i I w i (s 2 i + s 2 i ) + i / I w i s 2 i is minimized, while meeting reliability and deadline constraints Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 51/ 65
57 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy Complexity results Heuristics Simulation results 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 52/ 65
58 Complexity results One speed per task Re-execution at same speed as first execution, i.e., s i = s i Tri-Crit-Cont is NP-hard even for a linear chain, but not known to be in NP (because of continuous model) Polynomial-time solution for a fork Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 53/ 65
59 Complexity results with Vdd-Hopping Each task is computed using at most two different speeds Tri-Crit-Vdd is NP-complete even for a linear chain Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 54/ 65
60 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy Complexity results Heuristics Simulation results 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 55/ 65
61 Energy-reducing heuristics Two steps: Mapping (NP-hard) List scheduling Speed scaling + re-execution (NP-hard) Energy reducing The list scheduling heuristic maps tasks onto processors at speed s max, and we keep this mapping in step two Step two = slack reclamation: use of deceleration and re-execution Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 56/ 65
62 Deceleration and re-execution Deceleration: select a set of tasks that we execute at speed max max(s rel, s i=1..n t i max D ): slowest possible speed meeting both reliability and deadline constraints Re-execution: greedily select tasks for re-execution Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 57/ 65
63 Super-weight (SW) of a task SW: sum of the weights of the tasks (including T i ) whose execution interval is included into T i s execution interval SW of task slowed down = estimation of the total amount of work that can be slowed down together with that task p 4 T i p 3 p 2 p 1 s e time Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 58/ 65
64 Selected heuristics A.SUS-Crit: efficient on DAGs with low degree of parallelism max Set the speed of every task to max(s rel, s i=1..n t i max D ) Sort the tasks of every critical path according to their SW and try to re-execute them Sort all the tasks according to their weight and try to re-execute them B.SUS-Crit-Slow: good for highly parallel tasks: re-execute, then decelerate Sort the tasks of every critical path according to their SW and try to re-execute them. If not possible, then try to slow them down Sort all tasks according to their weight and try to re-execute them. If not possible, then try to slow them down Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 59/ 65
65 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy Complexity results Heuristics Simulation results 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 60/ 65
66 Results We compare the impact of: the number of processors p the ratio D of the deadline over the minimum deadline D min (given by the list-scheduling heuristic at speed s max ) on the output of each heuristic Results normalized by heuristic running each task at speed s max ; the lower the better Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 61/ 65
67 Results 1 1 A.SUS-Crit A.SUS-Crit 0.9 B.SUS-Crit-Slow 0.9 B.SUS-Crit-Slow Eg / Eg_fmax Eg / Eg_fmax Number of processors Number of processors With increasing p, D = 1.2 (left), D = 2.4 (right) A better when number of processors is small B better when number of processors is large Superiority of B for tight deadlines: decelerates critical tasks that cannot be re-executed Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 61/ 65
68 Summary Tri-criteria energy/makespan/reliability optimization problem Various theoretical results Two-step approach for polynomial-time heuristics: List-scheduling heuristic Energy-reducing heuristics Two complementary energy-reducing heuristics for Tri-Crit-Cont CR02 Resilient and energy-aware algorithms 62/ 65
69 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 63/ 65
70 Conclusion Resilience and energy consumption are two of the main challenges for Exascale platforms Revisiting checkpointing techniques for reliability while minimizing energy consumption Tri-criteria heuristics aiming at minimizing the energy consumption, with re-execution to deal with reliability... Still a lot of challenging algorithmic problems on these hot topics Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 64/ 65
71 Conclusion Resilience and energy consumption are two of the main challenges for Exascale platforms Revisiting checkpointing techniques for reliability while minimizing energy consumption Tri-criteria heuristics aiming at minimizing the energy consumption, with re-execution to deal with reliability... Still a lot of challenging algorithmic problems on these hot topics Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 64/ 65
72 Bibliography Optimal checkpointing period: time vs energy (Aupy, Benoit, Hérault, Robert, Dongarra, 2013) Energy-aware checkpointing of divisible tasks with soft or hard deadlines (Aupy, Benoit, Melhem, Renaud-Goud, Robert, 2013) Energy-aware scheduling under reliability and makespan constraints (Aupy, Benoit, Robert, 2012) CR02 Resilient and energy-aware algorithms 65/ 65
Energy-aware checkpointing of divisible tasks with soft or hard deadlines
Energy-aware checkpointing of divisible tasks with soft or hard deadlines Guillaume Aupy 1, Anne Benoit 1,2, Rami Melhem 3, Paul Renaud-Goud 1 and Yves Robert 1,2,4 1. Ecole Normale Supérieure de Lyon,
More informationEnergy-efficient scheduling
Energy-efficient scheduling Guillaume Aupy 1, Anne Benoit 1,2, Paul Renaud-Goud 1 and Yves Robert 1,2,3 1. Ecole Normale Supérieure de Lyon, France 2. Institut Universitaire de France 3. University of
More informationEnergy-aware checkpointing of divisible tasks with soft or hard deadlines
Energy-aware checkpointing of divisible tasks with soft or hard deadlines Guillaume Aupy, Anne Benoit, Rami Melhem, Paul Renaud-Goud, Yves Robert To cite this version: Guillaume Aupy, Anne Benoit, Rami
More informationOptimal checkpointing periods with fail-stop and silent errors
Optimal checkpointing periods with fail-stop and silent errors Anne Benoit ENS Lyon Anne.Benoit@ens-lyon.fr http://graal.ens-lyon.fr/~abenoit 3rd JLESC Summer School June 30, 2016 Anne.Benoit@ens-lyon.fr
More informationReclaiming the energy of a schedule: models and algorithms
Author manuscript, published in "Concurrency and Computation: Practice and Experience 25 (2013) 1505-1523" DOI : 10.1002/cpe.2889 CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.:
More informationOn scheduling the checkpoints of exascale applications
On scheduling the checkpoints of exascale applications Marin Bougeret, Henri Casanova, Mikaël Rabie, Yves Robert, and Frédéric Vivien INRIA, École normale supérieure de Lyon, France Univ. of Hawai i at
More informationCheckpoint Scheduling
Checkpoint Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute of
More informationEmbedded Systems Design: Optimization Challenges. Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden
of /4 4 Embedded Systems Design: Optimization Challenges Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden Outline! Embedded systems " Example area: automotive electronics " Embedded systems
More informationTradeoff between Reliability and Power Management
Tradeoff between Reliability and Power Management 9/1/2005 FORGE Lee, Kyoungwoo Contents 1. Overview of relationship between reliability and power management 2. Dakai Zhu, Rami Melhem and Daniel Moss e,
More informationEnergy-aware scheduling under reliability and makespan constraints
Energy-aare scheduling under reliability and makespan constraints Guillaume Aupy, Anne Benoit and Yves Robert LIP, Ecole Normale Supérieure de Lyon Email: {Guillaume.Aupy Anne.Benoit Yves.Robert}@ens-lyon.fr
More informationEnergy-Efficient Real-Time Task Scheduling in Multiprocessor DVS Systems
Energy-Efficient Real-Time Task Scheduling in Multiprocessor DVS Systems Jian-Jia Chen *, Chuan Yue Yang, Tei-Wei Kuo, and Chi-Sheng Shih Embedded Systems and Wireless Networking Lab. Department of Computer
More informationCIS 4930/6930: Principles of Cyber-Physical Systems
CIS 4930/6930: Principles of Cyber-Physical Systems Chapter 11 Scheduling Hao Zheng Department of Computer Science and Engineering University of South Florida H. Zheng (CSE USF) CIS 4930/6930: Principles
More informationEnergy-efficient Mapping of Big Data Workflows under Deadline Constraints
Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Presenter: Tong Shu Authors: Tong Shu and Prof. Chase Q. Wu Big Data Center Department of Computer Science New Jersey Institute
More informationFailure Tolerance of Multicore Real-Time Systems scheduled by a Pfair Algorithm
Failure Tolerance of Multicore Real-Time Systems scheduled by a Pfair Algorithm Yves MOUAFO Supervisors A. CHOQUET-GENIET, G. LARGETEAU-SKAPIN OUTLINES 2 1. Context and Problematic 2. State of the art
More informationHow to deal with uncertainties and dynamicity?
How to deal with uncertainties and dynamicity? http://graal.ens-lyon.fr/ lmarchal/scheduling/ 19 novembre 2012 1/ 37 Outline 1 Sensitivity and Robustness 2 Analyzing the sensitivity : the case of Backfilling
More informationApproximation algorithms for energy, reliability and makespan optimization problems
Approximation algorithms for energy, reliability and makespan optimization problems Guillaume Aupy, Anne Benoit REEARCH REPORT N 8107 July 014 Project-Team ROMA IN 049-6399 IRN INRIA/RR--8107--FR+ENG Approximation
More informationFault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing
20 Years of Innovative Computing Knoxville, Tennessee March 26 th, 200 Fault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing Zizhong (Jeffrey) Chen zchen@mines.edu Colorado School
More informationAnalysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing
Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing Prasanna Balaprakash, Leonardo A. Bautista Gomez, Slim Bouguerra, Stefan M. Wild, Franck Cappello, and Paul D. Hovland
More informationEfficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems
Efficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems Yifeng Guo, Dakai Zhu The University of Texas at San Antonio Hakan Aydin George Mason University Outline Background and Motivation
More informationScheduling algorithms for heterogeneous and failure-prone platforms
Scheduling algorithms for heterogeneous and failure-prone platforms Yves Robert Ecole Normale Supérieure de Lyon & Institut Universitaire de France http://graal.ens-lyon.fr/~yrobert Joint work with Olivier
More informationSTATISTICAL PERFORMANCE
STATISTICAL PERFORMANCE PROVISIONING AND ENERGY EFFICIENCY IN DISTRIBUTED COMPUTING SYSTEMS Nikzad Babaii Rizvandi 1 Supervisors: Prof.Albert Y.Zomaya Prof. Aruna Seneviratne OUTLINE Introduction Background
More informationFault-Tolerant Techniques for HPC
Fault-Tolerant Techniques for HPC Yves Robert Laboratoire LIP, ENS Lyon Institut Universitaire de France University Tennessee Knoxville Yves.Robert@inria.fr http://graal.ens-lyon.fr/~yrobert/htdc-flaine.pdf
More informationMICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE
MICROPROCESSOR www.mpronline.com REPORT THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE ENERGY COROLLARIES TO AMDAHL S LAW Analyzing the Interactions Between Parallel Execution and Energy Consumption By
More informationEfficient checkpoint/verification patterns for silent error detection
Efficient checkpoint/verification patterns for silent error detection Anne Benoit 1, Saurabh K. Raina 2 and Yves Robert 1,3 1. LIP, École Normale Supérieure de Lyon, CNRS & INRIA, France 2. Jaypee Institute
More informationReliability-Aware Power Management for Real-Time Embedded Systems. 3 5 Real-Time Tasks... Dynamic Voltage and Frequency Scaling...
Chapter 1 Reliability-Aware Power Management for Real-Time Embedded Systems Dakai Zhu The University of Texas at San Antonio Hakan Aydin George Mason University 1.1 1.2 Introduction... Background and System
More informationSegment-Fixed Priority Scheduling for Self-Suspending Real-Time Tasks
Segment-Fixed Priority Scheduling for Self-Suspending Real-Time Tasks Junsung Kim, Björn Andersson, Dionisio de Niz, and Raj Rajkumar Carnegie Mellon University 2/31 Motion Planning on Self-driving Parallel
More informationEfficient checkpoint/verification patterns
Efficient checkpoint/verification patterns Anne Benoit a, Saurabh K. Raina b, Yves Robert a,c a Laboratoire LIP, École Normale Supérieure de Lyon, France b Department of CSE/IT, Jaypee Institute of Information
More informationCheckpointing strategies for parallel jobs
Checkpointing strategies for parallel jobs Marin Bougeret ENS Lyon, France Marin.Bougeret@ens-lyon.fr Yves Robert ENS Lyon, France Yves.Robert@ens-lyon.fr Henri Casanova Univ. of Hawai i at Mānoa, Honolulu,
More informationProcess Scheduling for RTS. RTS Scheduling Approach. Cyclic Executive Approach
Process Scheduling for RTS Dr. Hugh Melvin, Dept. of IT, NUI,G RTS Scheduling Approach RTS typically control multiple parameters concurrently Eg. Flight Control System Speed, altitude, inclination etc..
More informationEDF Feasibility and Hardware Accelerators
EDF Feasibility and Hardware Accelerators Andrew Morton University of Waterloo, Waterloo, Canada, arrmorton@uwaterloo.ca Wayne M. Loucks University of Waterloo, Waterloo, Canada, wmloucks@pads.uwaterloo.ca
More informationScheduling divisible loads with return messages on heterogeneous master-worker platforms
Scheduling divisible loads with return messages on heterogeneous master-worker platforms Olivier Beaumont, Loris Marchal, Yves Robert Laboratoire de l Informatique du Parallélisme École Normale Supérieure
More informationOn the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms
On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms Anne Benoit, Fanny Dufossé and Yves Robert LIP, École Normale Supérieure de Lyon, France {Anne.Benoit Fanny.Dufosse
More informationSpatio-Temporal Thermal-Aware Scheduling for Homogeneous High-Performance Computing Datacenters
Spatio-Temporal Thermal-Aware Scheduling for Homogeneous High-Performance Computing Datacenters Hongyang Sun a,, Patricia Stolf b, Jean-Marc Pierson b a Ecole Normale Superieure de Lyon & INRIA, France
More informationEnergy-aware scheduling for GreenIT in large-scale distributed systems
Energy-aware scheduling for GreenIT in large-scale distributed systems 1 PASCAL BOUVRY UNIVERSITY OF LUXEMBOURG GreenIT CORE/FNR project Context and Motivation Outline 2 Problem Description Proposed Solution
More informationA Dynamic Real-time Scheduling Algorithm for Reduced Energy Consumption
A Dynamic Real-time Scheduling Algorithm for Reduced Energy Consumption Rohini Krishnapura, Steve Goddard, Ala Qadi Computer Science & Engineering University of Nebraska Lincoln Lincoln, NE 68588-0115
More informationAdaptive and Power-Aware Resilience for Extreme-scale Computing
Adaptive and Power-Aware Resilience for Extreme-scale Computing Xiaolong Cui, Taieb Znati, Rami Melhem Computer Science Department University of Pittsburgh Pittsburgh, USA Email: {mclarencui, znati, melhem}@cs.pitt.edu
More informationSaving Energy in Sparse and Dense Linear Algebra Computations
Saving Energy in Sparse and Dense Linear Algebra Computations P. Alonso, M. F. Dolz, F. Igual, R. Mayo, E. S. Quintana-Ortí, V. Roca Univ. Politécnica Univ. Jaume I The Univ. of Texas de Valencia, Spain
More informationA Flexible Checkpoint/Restart Model in Distributed Systems
A Flexible Checkpoint/Restart Model in Distributed Systems Mohamed-Slim Bouguerra 1,2, Thierry Gautier 2, Denis Trystram 1, and Jean-Marc Vincent 1 1 Grenoble University, ZIRST 51, avenue Jean Kuntzmann
More informationScheduling of Frame-based Embedded Systems with Rechargeable Batteries
Scheduling of Frame-based Embedded Systems with Rechargeable Batteries André Allavena Computer Science Department Cornell University Ithaca, NY 14853 andre@cs.cornell.edu Daniel Mossé Department of Computer
More informationScheduling Lecture 1: Scheduling on One Machine
Scheduling Lecture 1: Scheduling on One Machine Loris Marchal 1 Generalities 1.1 Definition of scheduling allocation of limited resources to activities over time activities: tasks in computer environment,
More informationHoming and Synchronizing Sequences
Homing and Synchronizing Sequences Sven Sandberg Information Technology Department Uppsala University Sweden 1 Outline 1. Motivations 2. Definitions and Examples 3. Algorithms (a) Current State Uncertainty
More informationOptimizing Energy Consumption under Flow and Stretch Constraints
Optimizing Energy Consumption under Flow and Stretch Constraints Zhi Zhang, Fei Li Department of Computer Science George Mason University {zzhang8, lifei}@cs.gmu.edu November 17, 2011 Contents 1 Motivation
More informationBi-objective approximation scheme for makespan and reliability optimization on uniform parallel machines
Bi-objective approximation scheme for makespan and reliability optimization on uniform parallel machines Emmanuel Jeannot 1, Erik Saule 2, and Denis Trystram 2 1 INRIA-Lorraine : emmanuel.jeannot@loria.fr
More informationEmbedded Systems Development
Embedded Systems Development Lecture 3 Real-Time Scheduling Dr. Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com Model-based Software Development Generator Lustre programs Esterel programs
More informationLecture 13. Real-Time Scheduling. Daniel Kästner AbsInt GmbH 2013
Lecture 3 Real-Time Scheduling Daniel Kästner AbsInt GmbH 203 Model-based Software Development 2 SCADE Suite Application Model in SCADE (data flow + SSM) System Model (tasks, interrupts, buses, ) SymTA/S
More informationReliability-aware energy optimization for throughput-constrained applications on MPSoC
Reliability-aware energy optimization for throughput-constrained applications on MPSoC Changjiang Gou, Anne Benoit, Mingsong Chen, Loris Marchal, Tongquan Wei To cite this version: Changjiang Gou, Anne
More informationEmbedded Systems 15. REVIEW: Aperiodic scheduling. C i J i 0 a i s i f i d i
Embedded Systems 15-1 - REVIEW: Aperiodic scheduling C i J i 0 a i s i f i d i Given: A set of non-periodic tasks {J 1,, J n } with arrival times a i, deadlines d i, computation times C i precedence constraints
More informationScheduling Lecture 1: Scheduling on One Machine
Scheduling Lecture 1: Scheduling on One Machine Loris Marchal October 16, 2012 1 Generalities 1.1 Definition of scheduling allocation of limited resources to activities over time activities: tasks in computer
More informationScheduling divisible loads with return messages on heterogeneous master-worker platforms
Scheduling divisible loads with return messages on heterogeneous master-worker platforms Olivier Beaumont 1, Loris Marchal 2, and Yves Robert 2 1 LaBRI, UMR CNRS 5800, Bordeaux, France Olivier.Beaumont@labri.fr
More informationTOWARD THE PLACEMENT OF POWER MANAGEMENT POINTS IN REAL-TIME APPLICATIONS
Chapter 1 TOWARD THE PLACEMENT OF POWER MANAGEMENT POINTS IN REAL-TIME APPLICATIONS Nevine AbouGhazaleh, Daniel Mossé, Bruce Childers, Rami Melhem Department of Computer Science University of Pittsburgh
More informationEnergy-Constrained Scheduling for Weakly-Hard Real-Time Systems
Energy-Constrained Scheduling for Weakly-Hard Real-Time Systems Tarek A. AlEnawy and Hakan Aydin Computer Science Department George Mason University Fairfax, VA 23 {thassan1,aydin}@cs.gmu.edu Abstract
More informationOptimal Checkpoint Placement on Real-Time Tasks with Harmonic Periods
Kwak SW, Yang JM. Optimal checkpoint placement on real-time tasks with harmonic periods. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(1): 105 112 Jan. 2012. DOI 10.1007/s11390-012-1209-0 Optimal Checkpoint
More informationReal-Time Scheduling and Resource Management
ARTIST2 Summer School 2008 in Europe Autrans (near Grenoble), France September 8-12, 2008 Real-Time Scheduling and Resource Management Lecturer: Giorgio Buttazzo Full Professor Scuola Superiore Sant Anna
More informationCPU SCHEDULING RONG ZHENG
CPU SCHEDULING RONG ZHENG OVERVIEW Why scheduling? Non-preemptive vs Preemptive policies FCFS, SJF, Round robin, multilevel queues with feedback, guaranteed scheduling 2 SHORT-TERM, MID-TERM, LONG- TERM
More informationCoping with silent and fail-stop errors at scale by combining replication and checkpointing
Coping with silent and fail-stop errors at scale by combining replication and checkpointing Anne Benoit a, Aurélien Cavelan b, Franck Cappello c, Padma Raghavan d, Yves Robert a,e, Hongyang Sun d, a Ecole
More informationCS 6901 (Applied Algorithms) Lecture 3
CS 6901 (Applied Algorithms) Lecture 3 Antonina Kolokolova September 16, 2014 1 Representative problems: brief overview In this lecture we will look at several problems which, although look somewhat similar
More informationReliable Computing I
Instructor: Mehdi Tahoori Reliable Computing I Lecture 5: Reliability Evaluation INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE NANO COMPUTING (CDNC) National Research Center of the Helmholtz
More informationMultiprocessor Scheduling I: Partitioned Scheduling. LS 12, TU Dortmund
Multiprocessor Scheduling I: Partitioned Scheduling Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 22/23, June, 2015 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 47 Outline Introduction to Multiprocessor
More informationLecture 6. Real-Time Systems. Dynamic Priority Scheduling
Real-Time Systems Lecture 6 Dynamic Priority Scheduling Online scheduling with dynamic priorities: Earliest Deadline First scheduling CPU utilization bound Optimality and comparison with RM: Schedulability
More informationNon-Work-Conserving Non-Preemptive Scheduling: Motivations, Challenges, and Potential Solutions
Non-Work-Conserving Non-Preemptive Scheduling: Motivations, Challenges, and Potential Solutions Mitra Nasri Chair of Real-time Systems, Technische Universität Kaiserslautern, Germany nasri@eit.uni-kl.de
More informationShadow Computing: An Energy-Aware Fault Tolerant Computing Model
Shadow Comuting: An Energy-Aware Fault Tolerant Comuting Model Bryan Mills, Taieb Znati, Rami Melhem Deartment of Comuter Science University of Pittsburgh (bmills, znati, melhem)@cs.itt.edu Index Terms
More informationOnline Work Maximization under a Peak Temperature Constraint
Online Work Maximization under a Peak Temperature Constraint Thidapat Chantem Department of CSE University of Notre Dame Notre Dame, IN 46556 tchantem@nd.edu X. Sharon Hu Department of CSE University of
More informationMapping to Parallel Hardware
- Mapping to Parallel Hardware Bruno Bodin University of Edinburgh, United Kingdom 1 / 27 Context - Hardware Samsung Exynos ARM big-little (Heterogenenous), 4 ARM A15, 4 ARM A7, 2-16 cores GPU, 5 Watt.
More informationDivisible Load Scheduling
Divisible Load Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute
More informationEmbedded Systems 14. Overview of embedded systems design
Embedded Systems 14-1 - Overview of embedded systems design - 2-1 Point of departure: Scheduling general IT systems In general IT systems, not much is known about the computational processes a priori The
More informationPolynomial Time Algorithms for Minimum Energy Scheduling
Polynomial Time Algorithms for Minimum Energy Scheduling Philippe Baptiste 1, Marek Chrobak 2, and Christoph Dürr 1 1 CNRS, LIX UMR 7161, Ecole Polytechnique 91128 Palaiseau, France. Supported by CNRS/NSF
More informationAgreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur
Agreement Protocols CS60002: Distributed Systems Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Classification of Faults Based on components that failed Program
More informationReal-time Scheduling of Periodic Tasks (1) Advanced Operating Systems Lecture 2
Real-time Scheduling of Periodic Tasks (1) Advanced Operating Systems Lecture 2 Lecture Outline Scheduling periodic tasks The rate monotonic algorithm Definition Non-optimality Time-demand analysis...!2
More informationLecture 2: Metrics to Evaluate Systems
Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video
More informationFine Grain Quality Management
Fine Grain Quality Management Jacques Combaz Jean-Claude Fernandez Mohamad Jaber Joseph Sifakis Loïc Strus Verimag Lab. Université Joseph Fourier Grenoble, France DCS seminar, 10 June 2008, Col de Porte
More informationFault Tolerant Computing CS 530 Software Reliability Growth. Yashwant K. Malaiya Colorado State University
Fault Tolerant Computing CS 530 Software Reliability Growth Yashwant K. Malaiya Colorado State University 1 Software Reliability Growth: Outline Testing approaches Operational Profile Software Reliability
More informationCycleTandem: Energy-Saving Scheduling for Real-Time Systems with Hardware Accelerators
CycleTandem: Energy-Saving Scheduling for Real-Time Systems with Hardware Accelerators Sandeep D souza and Ragunathan (Raj) Rajkumar Carnegie Mellon University High (Energy) Cost of Accelerators Modern-day
More informationMaximizing Rewards for Real-Time Applications with Energy Constraints
Maximizing Rewards for Real-Time Applications with Energy Constraints COSMIN RUSU, RAMI MELHEM, and DANIEL MOSSÉ University of Pittsburgh New technologies have brought about a proliferation of embedded
More informationEE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining
Slide 1 EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 2 Topics Clocking Clock Parameters Latch Types Requirements for reliable clocking Pipelining Optimal pipelining
More informationSPEED SCALING FOR ENERGY AWARE PROCESSOR SCHEDULING: ALGORITHMS AND ANALYSIS
SPEED SCALING FOR ENERGY AWARE PROCESSOR SCHEDULING: ALGORITHMS AND ANALYSIS by Daniel Cole B.S. in Computer Science and Engineering, The Ohio State University, 2003 Submitted to the Graduate Faculty of
More informationA New Task Model and Utilization Bound for Uniform Multiprocessors
A New Task Model and Utilization Bound for Uniform Multiprocessors Shelby Funk Department of Computer Science, The University of Georgia Email: shelby@cs.uga.edu Abstract This paper introduces a new model
More informationScheduling periodic Tasks on Multiple Periodic Resources
Scheduling periodic Tasks on Multiple Periodic Resources Xiayu Hua, Zheng Li, Hao Wu, Shangping Ren* Department of Computer Science Illinois Institute of Technology Chicago, IL 60616, USA {xhua, zli80,
More informationPeriodic I/O Scheduling for Supercomputers
Periodic I/O Scheduling for Supercomputers Guillaume Aupy 1, Ana Gainaru 2, Valentin Le Fèvre 3 1 Inria & U. of Bordeaux 2 Vanderbilt University 3 ENS Lyon & Inria PMBS Workshop, November 217 Slides available
More informationENERGY-EFFICIENT REAL-TIME SCHEDULING ALGORITHM FOR FAULT-TOLERANT AUTONOMOUS SYSTEMS
DOI 10.12694/scpe.v19i4.1438 Scalable Computing: Practice and Experience ISSN 1895-1767 Volume 19, Number 4, pp. 387 400. http://www.scpe.org c 2018 SCPE ENERGY-EFFICIENT REAL-TIME SCHEDULING ALGORITHM
More informationUNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Fault Tolerant Computing ECE 655
UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE 655 Part 1 Introduction C. M. Krishna Fall 2006 ECE655/Krishna Part.1.1 Prerequisites Basic courses in
More informationOn the Optimal Recovery Threshold of Coded Matrix Multiplication
1 On the Optimal Recovery Threshold of Coded Matrix Multiplication Sanghamitra Dutta, Mohammad Fahim, Farzin Haddadpour, Haewon Jeong, Viveck Cadambe, Pulkit Grover arxiv:1801.10292v2 [cs.it] 16 May 2018
More informationAndrew Morton University of Waterloo Canada
EDF Feasibility and Hardware Accelerators Andrew Morton University of Waterloo Canada Outline 1) Introduction and motivation 2) Review of EDF and feasibility analysis 3) Hardware accelerators and scheduling
More informationp-yds Algorithm: An Optimal Extension of YDS Algorithm to Minimize Expected Energy For Real-Time Jobs
p-yds Algorithm: An Optimal Extension of YDS Algorithm to Minimize Expected Energy For Real-Time Jobs TIK Report No. 353 Pratyush Kumar and Lothar Thiele Computer Engineering and Networks Laboratory, ETH
More informationThe Concurrent Consideration of Uncertainty in WCETs and Processor Speeds in Mixed Criticality Systems
The Concurrent Consideration of Uncertainty in WCETs and Processor Speeds in Mixed Criticality Systems Zhishan Guo and Sanjoy Baruah Department of Computer Science University of North Carolina at Chapel
More informationState-dependent and Energy-aware Control of Server Farm
State-dependent and Energy-aware Control of Server Farm Esa Hyytiä, Rhonda Righter and Samuli Aalto Aalto University, Finland UC Berkeley, USA First European Conference on Queueing Theory ECQT 2014 First
More informationENERGY EFFICIENT TASK SCHEDULING OF SEND- RECEIVE TASK GRAPHS ON DISTRIBUTED MULTI- CORE PROCESSORS WITH SOFTWARE CONTROLLED DYNAMIC VOLTAGE SCALING
ENERGY EFFICIENT TASK SCHEDULING OF SEND- RECEIVE TASK GRAPHS ON DISTRIBUTED MULTI- CORE PROCESSORS WITH SOFTWARE CONTROLLED DYNAMIC VOLTAGE SCALING Abhishek Mishra and Anil Kumar Tripathi Department of
More informationContinuous heat flow analysis. Time-variant heat sources. Embedded Systems Laboratory (ESL) Institute of EE, Faculty of Engineering
Thermal Modeling, Analysis and Management of 2D Multi-Processor System-on-Chip Prof David Atienza Alonso Embedded Systems Laboratory (ESL) Institute of EE, Falty of Engineering Outline MPSoC thermal modeling
More informationTechnology Mapping for Reliability Enhancement in Logic Synthesis
Technology Mapping for Reliability Enhancement in Logic Synthesis Zhaojun Wo and Israel Koren Department of Electrical and Computer Engineering University of Massachusetts,Amherst,MA 01003 E-mail: {zwo,koren}@ecs.umass.edu
More informationAn Efficient Energy-Optimal Device-Scheduling Algorithm for Hard Real-Time Systems
An Efficient Energy-Optimal Device-Scheduling Algorithm for Hard Real-Time Systems S. Chakravarthula 2 S.S. Iyengar* Microsoft, srinivac@microsoft.com Louisiana State University, iyengar@bit.csc.lsu.edu
More informationDesign and Analysis of Time-Critical Systems Response-time Analysis with a Focus on Shared Resources
Design and Analysis of Time-Critical Systems Response-time Analysis with a Focus on Shared Resources Jan Reineke @ saarland university ACACES Summer School 2017 Fiuggi, Italy computer science Fixed-Priority
More informationCheckpointing strategies with prediction windows
hal-7899, version - 5 Feb 23 Checkpointing strategies with prediction windows Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni RESEARCH REPORT N 8239 February 23 Project-Team ROMA SSN 249-6399
More informationChapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.
Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved. 1 Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should
More informationThe Weighted Byzantine Agreement Problem
The Weighted Byzantine Agreement Problem Vijay K. Garg and John Bridgman Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712-1084, USA garg@ece.utexas.edu,
More informationTitle: Maximizing Reliability of Energy Constrained Parallel Applications on Heterogeneous Distributed Systems
Title: Maximizing Reliability of Energy Constrained Parallel Applications on Heterogeneous Distributed Systems Author: Xiongren Xiao Guoqi Xie Cheng Xu Chunnian Fan Renfa Li Keqin Li PII: S1877-7503(17)30493-3
More informationCheckpointing strategies with prediction windows
Author manuscript, published in "PRDC - The 9th EEE Pacific Rim nternational Symposium on Dependable Computing - 23 23" Checkpointing strategies with prediction windows Regular paper Guillaume Aupy,3,
More informationSlack Reclamation for Real-Time Task Scheduling over Dynamic Voltage Scaling Multiprocessors
Slack Reclamation for Real-Time Task Scheduling over Dynamic Voltage Scaling Multiprocessors Jian-Jia Chen, Chuan-Yue Yang, and Tei-Wei Kuo Department of Computer Science and Information Engineering Graduate
More informationReclaiming the energy of a schedule: models and algorithms
Reclaiming the energy of a schedule: models and algorithms Guillaume Aupy, Anne Benoit, Fanny Dufossé, Yves Robert To cite this version: Guillaume Aupy, Anne Benoit, Fanny Dufossé, Yves Robert. Reclaiming
More informationBounding the End-to-End Response Times of Tasks in a Distributed. Real-Time System Using the Direct Synchronization Protocol.
Bounding the End-to-End Response imes of asks in a Distributed Real-ime System Using the Direct Synchronization Protocol Jun Sun Jane Liu Abstract In a distributed real-time system, a task may consist
More informationNetwork Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas
Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Slides are partially based on the joint work of Christos Litsas, Aris Pagourtzis,
More informationShared Recovery for Energy Efficiency and Reliability Enhancements in Real-Time Applications with Precedence Constraints
Shared Recovery for Energy Efficiency and Reliability Enhancements in Real-Time Applications with Precedence Constraints BAOXIAN ZHAO and HAKAN AYDIN, George Mason University DAKAI ZHU, University of Texas
More information