Resilient and energy-aware algorithms

Size: px
Start display at page:

Download "Resilient and energy-aware algorithms"

Transcription

1 Resilient and energy-aware algorithms Anne Benoit ENS Lyon CR /2017 CR02 Resilient and energy-aware algorithms 1/ 65

2 Motivation Need of resilient algorithms (see classes 3-7) Need of energy-aware algorithms (see classes 8-9)... And need to combine both! DVFS has an impact on resilience, so both problematics are linked... Also, does energy have an impact on the optimal checkpointing period? CR02 Resilient and energy-aware algorithms 2/ 65

3 Motivation Need of resilient algorithms (see classes 3-7) Need of energy-aware algorithms (see classes 8-9)... And need to combine both! DVFS has an impact on resilience, so both problematics are linked... Also, does energy have an impact on the optimal checkpointing period? CR02 Resilient and energy-aware algorithms 2/ 65

4 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 3/ 65

5 Energy: a crucial issue Data centers ( Cloud Begins with Coal, M. Mills) TWh in 2013 consumption of Turkey (242), Spain (267), or Italy (309) 530Mt of CO 2 (carbontrust) > Canada crucial for both environmental and economical reasons Coordinated periodic checkpointing: what is the optimal checkpointing period if you optimize for Energy consumption? Is there a tradeoff between optimizing for Energy and optimizing for Time? Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 4/ 65

6 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 5/ 65

7 Power model P Static : base power (platform switched on) Trend: goes down (w.r.t. other powers) P Cal : overhead due to CPU (computations) P I/O : overhead due to file I/O (checkpoint or recovery) P Down : overhead when one machine is down (rebooting) Meneses, Sarood and Kalé: Base power L = P Static E. Meneses, O. Sarood, and L.V. Kalé, Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems, in Proceedings of the 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2012), New York, USA, October Maximum power H = P Static + P Cal P I/O = 0 (and P Down = 0) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 6/ 65

8 Coordinated checkpointing Periodic checkpointing policy of period T Independent and identically distributed failures Applies to a single processor with MTBF µ = µ ind Applies to a platform with p processors with MTBF µ = µ ind p tightly-coupled application progress all processors available Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 7/ 65

9 Cost of checkpointing Time spent working Time spent checkpointing Time spent working with slowdown Time Computing the first chunk Checkpointing the first chunk Processing the first chunk General model: while a checkpoint is taken, computations are slowed-down: during a checkpoint of duration C, the same amount of computation is done as during a time ωc without checkpointing (0 ω 1). Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 8/ 65

10 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 9/ 65

11 Expected execution time T base execution time without any overhead T final = T ff + T fails total execution time Time for fault-free execution Time lost due to failures T T ff = T base T (1 ω)c T fails = T final (D + R + Re-Exec) µ Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 10/ 65

12 AlgoT: Strategy with T opt Time Computation of the optimal checkpointing period in the non-blocking case: See Course 4, Section 4: Assessing protocols at scale (with α instead of ω) T opt Time = 2(1 ω)c(µ (D + R + ωc)) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 11/ 65

13 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 12/ 65

14 Consumed energy E final = T Cal P Cal + T I/O P I/O + T Down P Down + T final P Static ( = T base + T final (ωc + T 2 C 2 + ωc 2 )) P Cal µ 2T 2T ( Tfinal + (R + C 2 ) ) T base + C P µ 2T T (1 ω)c I/O + T final µ DP Down + T final P Static T final T Cal + T I/O + T Down, unless ω = 0 CPU and I/O activities are overlapped (and both consumed) when checkpointing Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 13/ 65

15 AlgoE: Strategy with T opt Energy P Cal = αp Static, P I/O = βp Static, P Down = γp Static (T a) 2( b 2µ T ) 2 E P Static T base final = ab+t 2 ( ) 2µ (αωc +βr +γd +µ)+ µ αt 2 + α(1 ω)c2 + βc2 2T 2T + (T a)(b 2µ T ) ( α+ α(1 ω)c2 βc 2 ) ( ) βc b 2µ T 2µ T 2 ( ) = T 3( ) 1 4µ 4µ 1 +T 2 αωc+βr+γd 2µ 2µ βc 2µ 4µ ( ) +T 2µ ab 2µ ab + βcb (α(1 ω) β)c2 2 µ 4µ 2 βcb 2 ( ) ab(αωc+βr+γd+µ) b µ 2µ a 4µ 2 (α(1 ω) β)c 2 ( ) + T 1 (α(1 ω) β) 2µ C (α(1 ω) β) C ) 2µ 2µ 2 + 2µ b + a βc 4µ 2 + 2µ 1 ( ) +T = T 2 ( αωc+βr+γd 2µ 2 + b+ a (βc a)b 2 (α(1 ω) β)c2 µ 4µ 2 ab(αωc+βr+γd+µ) βcb 2 ( µ ) + b 2µ + a 4µ 2 (α(1 ω) β)c 2. Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 14/ 65

16 AlgoE: Strategy with T opt Energy P Cal = αp Static, P I/O = βp Static, P Down = γp Static (T a) 2( b 2µ T ) 2 E P Static T base final = ab+t 2 ( ) 2µ (αωc +βr +γd +µ)+ µ αt 2 + α(1 ω)c2 + βc2 2T 2T + (T a)(b 2µ T ) ( α+ α(1 ω)c2 βc 2 ) ( ) βc b 2µ T 2µ T 2 ( ) = T 3( ) 1 4µ 4µ 1 +T 2 αωc+βr+γd 2µ 2µ βc 2µ 4µ ( ) +T 2µ ab 2µ ab + βcb (α(1 ω) β)c2 2 µ 4µ 2 βcb 2 ( ) ab(αωc+βr+γd+µ) b µ 2µ a 4µ 2 (α(1 ω) β)c 2 ( ) + T 1 (α(1 ω) β) 2µ C (α(1 ω) β) C ) 2µ 2µ 2 + 2µ b + a βc 4µ 2 + 2µ 1 ( ) +T = T 2 ( αωc+βr+γd 2µ 2 + b+ a (βc a)b 2 (α(1 ω) β)c2 µ 4µ 2 ab(αωc+βr+γd+µ) βcb 2 ( µ ) + b 2µ + T a opt 4µ 2 (α(1 ω) β)c 2. We let Maple compute Energy Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 14/ 65

17 Outline 1 Optimal checkpointing period: time vs energy Framework Optimal period for execution time Optimal period for energy Experiments 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 15/ 65

18 Parameters: power ρ = P Static + P I/O P Static + P Cal = 1 + β 1 + α 20 Mega-watts for Exascale platform with 10 6 nodes Nominal power = 20 milli-watts per node 1/2 1/4 of that power in static consumption I/O an order of magnitude more than computing (J. Shalf, S. Dosanjh, and J. Morrison, Exascale computing technology challenges, in the 9th Int. Conf. High Performance Computing for Computational Science, 2011) Scenario 1: P Static = 10, P Cal = 10, P I/O = 100 ρ = 5.5 Scenario 2: P Static = 5, P Cal = 10, P I/O = 100 ρ = 7 Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 16/ 65

19 Parameters: resilience MTBF N = 45, 208 processors: one fault per day Individual (processor) MTBF µ ind 125 years. Total number of processors N: from N = 219, 150 to N = 2, 191, 500 µ = 300 min down to µ = 30 min C = R = 10 min, D = 1 min, and ω = 1/2. Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 17/ 65

20 Impact of ratio ρ T final (T energy )/ E final (T time )/ T final (T energy )/ E final (T time )/ T final (T time ) E final (T energy ) T final (T time ) E final (T energy ) ρ (µ=300) (µ=120) (µ=30) ρ ρ (µ=300) How much slower, (µ=120) if we optimize for energy instead of optimizing (µ=30) for time Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 18/ 65

21 T final (T energy )/ Impact of1.12 ratio ρ E final (T time )/ T final (T time ) E final (T energy ) (µ=300) (µ=120) (µ=30) ρ How much more energy consumption, if we optimize for time instead of optimizing for energy Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 19/ 65

22 AlgoT over AlgoE 1.25 How much slower, if we optimize for energy instead of optimizing for time How much more energy consumption, if we optimize for time instead of optimizing for energy ρ ρ µ µ Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 20/ 65

23 Scalability (ρ = 5.5) T final (T energy )/ E final (T time )/ T final (T time ) E final (T energy ) Number of nodes (ρ=5.5) (ρ=5.5) Number of nodes µ = 120 min for 10 6 nodes, C = R = 1 min, D = 0.1 min, ω = 1/2 Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 21/ 65

24 Scalability (ρ = 7) T final (T energy )/ E final (T time )/ T final (T time ) E final (T energy ) Number of nodes (ρ=7) (ρ=7) Number of nodes µ = 120 min for 10 6 nodes, C = R = 1 min, D = 0.1 min, ω = 1/2 Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 22/ 65

25 Conclusion Coordinated checkpointing, non-blocking Different optimal periods for time and energy Save more than 20% of energy with 10% increase in time Expect more gains for large-scale platforms Variety of resilience and power consumption parameters Quite flexible analytical model Easy to instantiate for other scenarios CR02 Resilient and energy-aware algorithms 23/ 65

26 Conclusion Coordinated checkpointing, non-blocking Different optimal periods for time and energy Save more than 20% of energy with 10% increase in time Expect more gains for large-scale platforms Variety of resilience and power consumption parameters Quite flexible analytical model Easy to instantiate for other scenarios CR02 Resilient and energy-aware algorithms 23/ 65

27 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 24/ 65

28 Framework Execution of a divisible task (W operations) Failures may occur Transient failures Resilience through checkpointing Objective: minimize expected energy consumption E(E), given a deadline bound D Probabilistic nature of failure hits: expectation of energy consumption is natural (average cost over many executions) Deadline bound: two relevant scenarios (soft or hard deadline) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 25/ 65

29 Soft vs hard deadline Soft deadline: met in expectation, i.e., E(T ) D (average response time) Hard deadline: met in the worst case, i.e., T wc D VS Hard (worst-case) Soft (expected) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 26/ 65

30 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 27/ 65

31 Execution time, one single chunk One single chunk of size W Checkpoint overhead: execution time T C Instantaneous failure rate: λ First execution at speed s: T exec = W s + T C Failure probability: P fail = λt exec = λ( W s + T C ) In case of failure: re-execute at speed σ: T reexec = W σ And we assume success after re-execution + T C E(T ) = T exec + P fail T reexec = ( W s + T C ) + λ( W s + T C )( W σ + T C ) T wc = T exec + T reexec = ( W s + T C ) + ( W σ + T C ) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 28/ 65

32 Energy consumption, one single chunk One single chunk of size W Checkpoint overhead: energy consumption E C First execution at speed s: W s s 3 + E C = Ws 2 + E C Re-execution at speed σ: W σ 2 + E C, with probability P fail (Pfail = λt exec = λ( W s + T C ) ) E(E) = (Ws 2 + E C ) + λ ( W s + T C ) ( W σ 2 + E C ) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 29/ 65

33 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 30/ 65

34 Multiple chunks Execution times: sum of execution times for each chunk (worst-case or expected) Expected energy consumption: sum of expected energy for each chunk Coherent failure model: consider two chunks W 1 + W 2 = W Probability of failure for first chunk: Pfail 1 = λ( W 1 s + T C ) For second chunk: Pfail 2 = λ( W 2 s + T C ) With a single chunk of size W : P fail = λ( W s + T C ), differs from Pfail 1 + P2 fail only because of extra checkpoint Trade-off: many small chunks (more T C to pay, but small re-execution cost) vs few larger chunks (fewer T C, but increased re-execution cost) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 31/ 65

35 Optimization problem Decisions that should be taken before execution: Chunks: how many (n)? which sizes (W i for chunk i)? Speeds of each chunk: first run (s i )? re-execution (σ i )? Input: W, T C (checkpointing time), E C (energy spent for checkpointing), λ (instantaneous failure rate), D (deadline) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 32/ 65

36 Optimization problem Decisions that should be taken before execution: Chunks: how many (n)? which sizes (W i for chunk i)? Speeds of each chunk: first run (s i )? re-execution (σ i )? Input: W, T C (checkpointing time), E C (energy spent for checkpointing), λ (instantaneous failure rate), D (deadline) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 32/ 65

37 Optimization problem Decisions that should be taken before execution: Chunks: how many (n)? which sizes (W i for chunk i)? Speeds of each chunk: first run (s i )? re-execution (σ i )? Input: W, T C (checkpointing time), E C (energy spent for checkpointing), λ (instantaneous failure rate), D (deadline) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 32/ 65

38 Models Chunks Single chunk of size W Speed per chunk VS VS Multiple chunks (n and W i s) Single speed (s) Multiple speeds (s and σ) Deadline bound VS Hard (T wc D) Soft (E(T ) D) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 33/ 65

39 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 34/ 65

40 Single chunk and single speed Consider first that s = σ (single speed): need to find optimal speed E(E) is a function of s: E(E)(s) = (Ws 2 + E C )(1 + λ( W s + T C )) Lemma: this function is convex and has a unique minimum s (function of λ, W, E c, T c ) s = λw 6(1+λT C ) ( (3 3 where a = λe C ( 2(1+λTC ) λw 27a 2 4a 27a+2) 1/3 2 1/3 (3 3 ) 2 E(T ) and T wc : decreasing functions of s ) 2 27a 1/3 1, 2 4a 27a+2) 1/3 Minimum speed s exp and s wc required to match deadline D (function of D, W, T c, and λ for s exp ) Optimal speed: maximum between s and s exp or s wc Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 35/ 65

41 Single chunk and multiple speeds Consider now that s σ (multiple speeds): two unknowns E(E) is a function of s and σ: E(E)(s, σ) = (Ws 2 + E C ) + λ( W s + T C )(W σ 2 + E C ) Lemma: energy minimized when deadline tight (both for wc and exp) σ expressed as a function of s: σ exp = λw D, (1+λT W C ) σwc = W (D 2T C )s W s +T s C Minimization of single-variable function, can be solved numerically (no expression of optimal s) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 36/ 65

42 General problem with multiple chunks Divisible task of size W Split into n chunks of size W i : n i=1 W i = W Chunk i is executed once at speed s i, and re-executed (if necessary) at speed σ i Unknowns: n, W i, s i, σ i n ( E(E) = Wi si 2 ) n + E C + λ i=1 i=1 ( Wi s i + T C ) (Wi σ 2 i + E C ) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 37/ 65

43 Multiple chunks and single speed With a single speed, σ i = s i for each chunk Theorem: in optimal solution, n equal-sized chunks (W i = W n ), executed at same speed s i = s Proof by contradiction: consider two chunks W 1 and W 2 executed at speed s 1 and s 2, with either s 1 s 2, or s 1 = s 2 and W 1 W 2 Strictly better solution with two chunks of size w = (W 1 + W 2 )/2 and same speed s Only two unknowns, s and n Minimum speed with n chunks: s exp (n) = W 1 + 2λT C + 4 λd + 1 n 2(D nt C (1 + λt C )) Minimization of double-variable function, can be solved numerically both for expected and hard deadline Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 38/ 65

44 Multiple chunks and multiple speeds Need to find n, W i, s i, σ i With expected deadline: All re-execution speeds are equal (σ i = σ) and tight deadline All chunks have same size and are executed at same speed With hard deadline: If s i = s and σ i = σ, then all W i s are equal Conjecture: equal-sized chunks, same first-execution / re-execution speeds σ as a function of s, bound on s given n Minimization of double-variable function, can be solved numerically Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 39/ 65

45 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption Model for one single chunk Model for multiple chunks and optimization problem Solving the problems Simulations 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 40/ 65

46 Simulation settings Large set of simulations: illustrate differences between models Maple software to solve problems We plot relative energy consumption as a function of λ The lower the better Given a deadline constraint (hard or expected), normalize with the result of single-chunk single-speed Impact of the constraint: normalize expected deadline with hard deadline Parameters varying within large ranges Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 41/ 65

47 Comparison with single-chunk single-speed Results identical for any value of W /D E SCMSed MCSSed MCMSed Model (/SCSS) SCMShd MCSShd MCMShd 1e 06 1e 03 1e+00 lambda For expected deadline, with small λ (< 10 2 ), using multiple chunks or multiple speeds do not improve energy ratio: re-execution term negligible; increasing λ: improvement with multiple chunks For hard deadline, better to run at high speed during second execution: use multiple speeds; use multiple chunks if frequent failures Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 42/ 65

48 Expected vs hard deadline constraint E Model SCSS MCSS SCMS MCMS 1e 06 1e 03 1e+00 lambda Important differences for single speed models, confirming previous conclusions: with hard deadline, use multiple speeds Multiple speeds: no difference for small λ: re-execution at maximum speed has little impact on expected energy consumption; increasing λ: more impact of re-execution, and expected deadline may use slower re-execution speed, hence reducing energy consumption Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 43/ 65

49 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy Complexity results Heuristics Simulation results 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 44/ 65

50 Framework DAG: G = (V, E) n = V tasks T i of weight w i p identical processors fully connected DVFS, Continuous model: interval of available continuous speeds [s min, s max ] One speed per task Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 45/ 65

51 Makespan Execution time of T i at speed s i : d i = w i s i If T i is executed twice on the same processor at speeds s i and s i : d i = w i s i + w i s i Constraint on makespan: end of execution before deadline D (hard deadline constraint) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 46/ 65

52 Reliability Transient failure: local, no impact on the rest of the system Transient failure rate: Poisson distribution of parameter: λ(s) = λ 0 e d smax s smax s min. Reliability R i of task T i as a function of speed s i : R i (s i ) = e λ(s i )Exe(w i,s i ) = (1st order) 1 λ 0 e ds i w i s i Threshold reliability (and hence speed s rel ) R i (s rel 1) R i (s) s min s rel s max Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 47/ 65 s

53 Re-execution: a task is re-executed on the same processor, just after its first execution With two executions, reliability R i of task T i is: R i = 1 (1 R i (s i ))(1 R i (s i )) Constraint on reliability: Reliability: R i R i (s rel ), and at most one re-execution Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 48/ 65

54 Energy Energy to execute task T i once at speed s i : E i (s i ) = w i s 2 i Dynamic part of classical energy models With re-executions, it is natural to take the worst-case scenario: ( Energy : E i = w i s 2 i + s i 2 ) Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 49/ 65

55 Energy and reliability: set of possible speeds Energy w i s 2 i + w i s 2 i = 2E i (s i ) w i s 2 i = E i (s i ) E i (s rel ) s rel 2 s rel Speed Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 50/ 65

56 Tri-Crit-Cont Given G = (V, E) Find A schedule of the tasks A set of tasks I = {i T i is executed twice} Execution speed s i for each task T i Re-execution speed s i for each task in I such that i I w i (s 2 i + s 2 i ) + i / I w i s 2 i is minimized, while meeting reliability and deadline constraints Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 51/ 65

57 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy Complexity results Heuristics Simulation results 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 52/ 65

58 Complexity results One speed per task Re-execution at same speed as first execution, i.e., s i = s i Tri-Crit-Cont is NP-hard even for a linear chain, but not known to be in NP (because of continuous model) Polynomial-time solution for a fork Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 53/ 65

59 Complexity results with Vdd-Hopping Each task is computed using at most two different speeds Tri-Crit-Vdd is NP-complete even for a linear chain Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 54/ 65

60 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy Complexity results Heuristics Simulation results 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 55/ 65

61 Energy-reducing heuristics Two steps: Mapping (NP-hard) List scheduling Speed scaling + re-execution (NP-hard) Energy reducing The list scheduling heuristic maps tasks onto processors at speed s max, and we keep this mapping in step two Step two = slack reclamation: use of deceleration and re-execution Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 56/ 65

62 Deceleration and re-execution Deceleration: select a set of tasks that we execute at speed max max(s rel, s i=1..n t i max D ): slowest possible speed meeting both reliability and deadline constraints Re-execution: greedily select tasks for re-execution Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 57/ 65

63 Super-weight (SW) of a task SW: sum of the weights of the tasks (including T i ) whose execution interval is included into T i s execution interval SW of task slowed down = estimation of the total amount of work that can be slowed down together with that task p 4 T i p 3 p 2 p 1 s e time Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 58/ 65

64 Selected heuristics A.SUS-Crit: efficient on DAGs with low degree of parallelism max Set the speed of every task to max(s rel, s i=1..n t i max D ) Sort the tasks of every critical path according to their SW and try to re-execute them Sort all the tasks according to their weight and try to re-execute them B.SUS-Crit-Slow: good for highly parallel tasks: re-execute, then decelerate Sort the tasks of every critical path according to their SW and try to re-execute them. If not possible, then try to slow them down Sort all tasks according to their weight and try to re-execute them. If not possible, then try to slow them down Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 59/ 65

65 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy Complexity results Heuristics Simulation results 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 60/ 65

66 Results We compare the impact of: the number of processors p the ratio D of the deadline over the minimum deadline D min (given by the list-scheduling heuristic at speed s max ) on the output of each heuristic Results normalized by heuristic running each task at speed s max ; the lower the better Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 61/ 65

67 Results 1 1 A.SUS-Crit A.SUS-Crit 0.9 B.SUS-Crit-Slow 0.9 B.SUS-Crit-Slow Eg / Eg_fmax Eg / Eg_fmax Number of processors Number of processors With increasing p, D = 1.2 (left), D = 2.4 (right) A better when number of processors is small B better when number of processors is large Superiority of B for tight deadlines: decelerates critical tasks that cannot be re-executed Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 61/ 65

68 Summary Tri-criteria energy/makespan/reliability optimization problem Various theoretical results Two-step approach for polynomial-time heuristics: List-scheduling heuristic Energy-reducing heuristics Two complementary energy-reducing heuristics for Tri-Crit-Cont CR02 Resilient and energy-aware algorithms 62/ 65

69 Outline 1 Optimal checkpointing period: time vs energy 2 Checkpointing and energy consumption 3 Tri-criteria problem: execution time, reliability, energy 4 Conclusion Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 63/ 65

70 Conclusion Resilience and energy consumption are two of the main challenges for Exascale platforms Revisiting checkpointing techniques for reliability while minimizing energy consumption Tri-criteria heuristics aiming at minimizing the energy consumption, with re-execution to deal with reliability... Still a lot of challenging algorithmic problems on these hot topics Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 64/ 65

71 Conclusion Resilience and energy consumption are two of the main challenges for Exascale platforms Revisiting checkpointing techniques for reliability while minimizing energy consumption Tri-criteria heuristics aiming at minimizing the energy consumption, with re-execution to deal with reliability... Still a lot of challenging algorithmic problems on these hot topics Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 64/ 65

72 Bibliography Optimal checkpointing period: time vs energy (Aupy, Benoit, Hérault, Robert, Dongarra, 2013) Energy-aware checkpointing of divisible tasks with soft or hard deadlines (Aupy, Benoit, Melhem, Renaud-Goud, Robert, 2013) Energy-aware scheduling under reliability and makespan constraints (Aupy, Benoit, Robert, 2012) CR02 Resilient and energy-aware algorithms 65/ 65

Energy-aware checkpointing of divisible tasks with soft or hard deadlines

Energy-aware checkpointing of divisible tasks with soft or hard deadlines Energy-aware checkpointing of divisible tasks with soft or hard deadlines Guillaume Aupy 1, Anne Benoit 1,2, Rami Melhem 3, Paul Renaud-Goud 1 and Yves Robert 1,2,4 1. Ecole Normale Supérieure de Lyon,

More information

Energy-efficient scheduling

Energy-efficient scheduling Energy-efficient scheduling Guillaume Aupy 1, Anne Benoit 1,2, Paul Renaud-Goud 1 and Yves Robert 1,2,3 1. Ecole Normale Supérieure de Lyon, France 2. Institut Universitaire de France 3. University of

More information

Energy-aware checkpointing of divisible tasks with soft or hard deadlines

Energy-aware checkpointing of divisible tasks with soft or hard deadlines Energy-aware checkpointing of divisible tasks with soft or hard deadlines Guillaume Aupy, Anne Benoit, Rami Melhem, Paul Renaud-Goud, Yves Robert To cite this version: Guillaume Aupy, Anne Benoit, Rami

More information

Optimal checkpointing periods with fail-stop and silent errors

Optimal checkpointing periods with fail-stop and silent errors Optimal checkpointing periods with fail-stop and silent errors Anne Benoit ENS Lyon Anne.Benoit@ens-lyon.fr http://graal.ens-lyon.fr/~abenoit 3rd JLESC Summer School June 30, 2016 Anne.Benoit@ens-lyon.fr

More information

Reclaiming the energy of a schedule: models and algorithms

Reclaiming the energy of a schedule: models and algorithms Author manuscript, published in "Concurrency and Computation: Practice and Experience 25 (2013) 1505-1523" DOI : 10.1002/cpe.2889 CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.:

More information

On scheduling the checkpoints of exascale applications

On scheduling the checkpoints of exascale applications On scheduling the checkpoints of exascale applications Marin Bougeret, Henri Casanova, Mikaël Rabie, Yves Robert, and Frédéric Vivien INRIA, École normale supérieure de Lyon, France Univ. of Hawai i at

More information

Checkpoint Scheduling

Checkpoint Scheduling Checkpoint Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute of

More information

Embedded Systems Design: Optimization Challenges. Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden

Embedded Systems Design: Optimization Challenges. Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden of /4 4 Embedded Systems Design: Optimization Challenges Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden Outline! Embedded systems " Example area: automotive electronics " Embedded systems

More information

Tradeoff between Reliability and Power Management

Tradeoff between Reliability and Power Management Tradeoff between Reliability and Power Management 9/1/2005 FORGE Lee, Kyoungwoo Contents 1. Overview of relationship between reliability and power management 2. Dakai Zhu, Rami Melhem and Daniel Moss e,

More information

Energy-aware scheduling under reliability and makespan constraints

Energy-aware scheduling under reliability and makespan constraints Energy-aare scheduling under reliability and makespan constraints Guillaume Aupy, Anne Benoit and Yves Robert LIP, Ecole Normale Supérieure de Lyon Email: {Guillaume.Aupy Anne.Benoit Yves.Robert}@ens-lyon.fr

More information

Energy-Efficient Real-Time Task Scheduling in Multiprocessor DVS Systems

Energy-Efficient Real-Time Task Scheduling in Multiprocessor DVS Systems Energy-Efficient Real-Time Task Scheduling in Multiprocessor DVS Systems Jian-Jia Chen *, Chuan Yue Yang, Tei-Wei Kuo, and Chi-Sheng Shih Embedded Systems and Wireless Networking Lab. Department of Computer

More information

CIS 4930/6930: Principles of Cyber-Physical Systems

CIS 4930/6930: Principles of Cyber-Physical Systems CIS 4930/6930: Principles of Cyber-Physical Systems Chapter 11 Scheduling Hao Zheng Department of Computer Science and Engineering University of South Florida H. Zheng (CSE USF) CIS 4930/6930: Principles

More information

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Presenter: Tong Shu Authors: Tong Shu and Prof. Chase Q. Wu Big Data Center Department of Computer Science New Jersey Institute

More information

Failure Tolerance of Multicore Real-Time Systems scheduled by a Pfair Algorithm

Failure Tolerance of Multicore Real-Time Systems scheduled by a Pfair Algorithm Failure Tolerance of Multicore Real-Time Systems scheduled by a Pfair Algorithm Yves MOUAFO Supervisors A. CHOQUET-GENIET, G. LARGETEAU-SKAPIN OUTLINES 2 1. Context and Problematic 2. State of the art

More information

How to deal with uncertainties and dynamicity?

How to deal with uncertainties and dynamicity? How to deal with uncertainties and dynamicity? http://graal.ens-lyon.fr/ lmarchal/scheduling/ 19 novembre 2012 1/ 37 Outline 1 Sensitivity and Robustness 2 Analyzing the sensitivity : the case of Backfilling

More information

Approximation algorithms for energy, reliability and makespan optimization problems

Approximation algorithms for energy, reliability and makespan optimization problems Approximation algorithms for energy, reliability and makespan optimization problems Guillaume Aupy, Anne Benoit REEARCH REPORT N 8107 July 014 Project-Team ROMA IN 049-6399 IRN INRIA/RR--8107--FR+ENG Approximation

More information

Fault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing

Fault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing 20 Years of Innovative Computing Knoxville, Tennessee March 26 th, 200 Fault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing Zizhong (Jeffrey) Chen zchen@mines.edu Colorado School

More information

Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing

Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing Prasanna Balaprakash, Leonardo A. Bautista Gomez, Slim Bouguerra, Stefan M. Wild, Franck Cappello, and Paul D. Hovland

More information

Efficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems

Efficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems Efficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems Yifeng Guo, Dakai Zhu The University of Texas at San Antonio Hakan Aydin George Mason University Outline Background and Motivation

More information

Scheduling algorithms for heterogeneous and failure-prone platforms

Scheduling algorithms for heterogeneous and failure-prone platforms Scheduling algorithms for heterogeneous and failure-prone platforms Yves Robert Ecole Normale Supérieure de Lyon & Institut Universitaire de France http://graal.ens-lyon.fr/~yrobert Joint work with Olivier

More information

STATISTICAL PERFORMANCE

STATISTICAL PERFORMANCE STATISTICAL PERFORMANCE PROVISIONING AND ENERGY EFFICIENCY IN DISTRIBUTED COMPUTING SYSTEMS Nikzad Babaii Rizvandi 1 Supervisors: Prof.Albert Y.Zomaya Prof. Aruna Seneviratne OUTLINE Introduction Background

More information

Fault-Tolerant Techniques for HPC

Fault-Tolerant Techniques for HPC Fault-Tolerant Techniques for HPC Yves Robert Laboratoire LIP, ENS Lyon Institut Universitaire de France University Tennessee Knoxville Yves.Robert@inria.fr http://graal.ens-lyon.fr/~yrobert/htdc-flaine.pdf

More information

MICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE

MICROPROCESSOR REPORT.   THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE MICROPROCESSOR www.mpronline.com REPORT THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE ENERGY COROLLARIES TO AMDAHL S LAW Analyzing the Interactions Between Parallel Execution and Energy Consumption By

More information

Efficient checkpoint/verification patterns for silent error detection

Efficient checkpoint/verification patterns for silent error detection Efficient checkpoint/verification patterns for silent error detection Anne Benoit 1, Saurabh K. Raina 2 and Yves Robert 1,3 1. LIP, École Normale Supérieure de Lyon, CNRS & INRIA, France 2. Jaypee Institute

More information

Reliability-Aware Power Management for Real-Time Embedded Systems. 3 5 Real-Time Tasks... Dynamic Voltage and Frequency Scaling...

Reliability-Aware Power Management for Real-Time Embedded Systems. 3 5 Real-Time Tasks... Dynamic Voltage and Frequency Scaling... Chapter 1 Reliability-Aware Power Management for Real-Time Embedded Systems Dakai Zhu The University of Texas at San Antonio Hakan Aydin George Mason University 1.1 1.2 Introduction... Background and System

More information

Segment-Fixed Priority Scheduling for Self-Suspending Real-Time Tasks

Segment-Fixed Priority Scheduling for Self-Suspending Real-Time Tasks Segment-Fixed Priority Scheduling for Self-Suspending Real-Time Tasks Junsung Kim, Björn Andersson, Dionisio de Niz, and Raj Rajkumar Carnegie Mellon University 2/31 Motion Planning on Self-driving Parallel

More information

Efficient checkpoint/verification patterns

Efficient checkpoint/verification patterns Efficient checkpoint/verification patterns Anne Benoit a, Saurabh K. Raina b, Yves Robert a,c a Laboratoire LIP, École Normale Supérieure de Lyon, France b Department of CSE/IT, Jaypee Institute of Information

More information

Checkpointing strategies for parallel jobs

Checkpointing strategies for parallel jobs Checkpointing strategies for parallel jobs Marin Bougeret ENS Lyon, France Marin.Bougeret@ens-lyon.fr Yves Robert ENS Lyon, France Yves.Robert@ens-lyon.fr Henri Casanova Univ. of Hawai i at Mānoa, Honolulu,

More information

Process Scheduling for RTS. RTS Scheduling Approach. Cyclic Executive Approach

Process Scheduling for RTS. RTS Scheduling Approach. Cyclic Executive Approach Process Scheduling for RTS Dr. Hugh Melvin, Dept. of IT, NUI,G RTS Scheduling Approach RTS typically control multiple parameters concurrently Eg. Flight Control System Speed, altitude, inclination etc..

More information

EDF Feasibility and Hardware Accelerators

EDF Feasibility and Hardware Accelerators EDF Feasibility and Hardware Accelerators Andrew Morton University of Waterloo, Waterloo, Canada, arrmorton@uwaterloo.ca Wayne M. Loucks University of Waterloo, Waterloo, Canada, wmloucks@pads.uwaterloo.ca

More information

Scheduling divisible loads with return messages on heterogeneous master-worker platforms

Scheduling divisible loads with return messages on heterogeneous master-worker platforms Scheduling divisible loads with return messages on heterogeneous master-worker platforms Olivier Beaumont, Loris Marchal, Yves Robert Laboratoire de l Informatique du Parallélisme École Normale Supérieure

More information

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms Anne Benoit, Fanny Dufossé and Yves Robert LIP, École Normale Supérieure de Lyon, France {Anne.Benoit Fanny.Dufosse

More information

Spatio-Temporal Thermal-Aware Scheduling for Homogeneous High-Performance Computing Datacenters

Spatio-Temporal Thermal-Aware Scheduling for Homogeneous High-Performance Computing Datacenters Spatio-Temporal Thermal-Aware Scheduling for Homogeneous High-Performance Computing Datacenters Hongyang Sun a,, Patricia Stolf b, Jean-Marc Pierson b a Ecole Normale Superieure de Lyon & INRIA, France

More information

Energy-aware scheduling for GreenIT in large-scale distributed systems

Energy-aware scheduling for GreenIT in large-scale distributed systems Energy-aware scheduling for GreenIT in large-scale distributed systems 1 PASCAL BOUVRY UNIVERSITY OF LUXEMBOURG GreenIT CORE/FNR project Context and Motivation Outline 2 Problem Description Proposed Solution

More information

A Dynamic Real-time Scheduling Algorithm for Reduced Energy Consumption

A Dynamic Real-time Scheduling Algorithm for Reduced Energy Consumption A Dynamic Real-time Scheduling Algorithm for Reduced Energy Consumption Rohini Krishnapura, Steve Goddard, Ala Qadi Computer Science & Engineering University of Nebraska Lincoln Lincoln, NE 68588-0115

More information

Adaptive and Power-Aware Resilience for Extreme-scale Computing

Adaptive and Power-Aware Resilience for Extreme-scale Computing Adaptive and Power-Aware Resilience for Extreme-scale Computing Xiaolong Cui, Taieb Znati, Rami Melhem Computer Science Department University of Pittsburgh Pittsburgh, USA Email: {mclarencui, znati, melhem}@cs.pitt.edu

More information

Saving Energy in Sparse and Dense Linear Algebra Computations

Saving Energy in Sparse and Dense Linear Algebra Computations Saving Energy in Sparse and Dense Linear Algebra Computations P. Alonso, M. F. Dolz, F. Igual, R. Mayo, E. S. Quintana-Ortí, V. Roca Univ. Politécnica Univ. Jaume I The Univ. of Texas de Valencia, Spain

More information

A Flexible Checkpoint/Restart Model in Distributed Systems

A Flexible Checkpoint/Restart Model in Distributed Systems A Flexible Checkpoint/Restart Model in Distributed Systems Mohamed-Slim Bouguerra 1,2, Thierry Gautier 2, Denis Trystram 1, and Jean-Marc Vincent 1 1 Grenoble University, ZIRST 51, avenue Jean Kuntzmann

More information

Scheduling of Frame-based Embedded Systems with Rechargeable Batteries

Scheduling of Frame-based Embedded Systems with Rechargeable Batteries Scheduling of Frame-based Embedded Systems with Rechargeable Batteries André Allavena Computer Science Department Cornell University Ithaca, NY 14853 andre@cs.cornell.edu Daniel Mossé Department of Computer

More information

Scheduling Lecture 1: Scheduling on One Machine

Scheduling Lecture 1: Scheduling on One Machine Scheduling Lecture 1: Scheduling on One Machine Loris Marchal 1 Generalities 1.1 Definition of scheduling allocation of limited resources to activities over time activities: tasks in computer environment,

More information

Homing and Synchronizing Sequences

Homing and Synchronizing Sequences Homing and Synchronizing Sequences Sven Sandberg Information Technology Department Uppsala University Sweden 1 Outline 1. Motivations 2. Definitions and Examples 3. Algorithms (a) Current State Uncertainty

More information

Optimizing Energy Consumption under Flow and Stretch Constraints

Optimizing Energy Consumption under Flow and Stretch Constraints Optimizing Energy Consumption under Flow and Stretch Constraints Zhi Zhang, Fei Li Department of Computer Science George Mason University {zzhang8, lifei}@cs.gmu.edu November 17, 2011 Contents 1 Motivation

More information

Bi-objective approximation scheme for makespan and reliability optimization on uniform parallel machines

Bi-objective approximation scheme for makespan and reliability optimization on uniform parallel machines Bi-objective approximation scheme for makespan and reliability optimization on uniform parallel machines Emmanuel Jeannot 1, Erik Saule 2, and Denis Trystram 2 1 INRIA-Lorraine : emmanuel.jeannot@loria.fr

More information

Embedded Systems Development

Embedded Systems Development Embedded Systems Development Lecture 3 Real-Time Scheduling Dr. Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com Model-based Software Development Generator Lustre programs Esterel programs

More information

Lecture 13. Real-Time Scheduling. Daniel Kästner AbsInt GmbH 2013

Lecture 13. Real-Time Scheduling. Daniel Kästner AbsInt GmbH 2013 Lecture 3 Real-Time Scheduling Daniel Kästner AbsInt GmbH 203 Model-based Software Development 2 SCADE Suite Application Model in SCADE (data flow + SSM) System Model (tasks, interrupts, buses, ) SymTA/S

More information

Reliability-aware energy optimization for throughput-constrained applications on MPSoC

Reliability-aware energy optimization for throughput-constrained applications on MPSoC Reliability-aware energy optimization for throughput-constrained applications on MPSoC Changjiang Gou, Anne Benoit, Mingsong Chen, Loris Marchal, Tongquan Wei To cite this version: Changjiang Gou, Anne

More information

Embedded Systems 15. REVIEW: Aperiodic scheduling. C i J i 0 a i s i f i d i

Embedded Systems 15. REVIEW: Aperiodic scheduling. C i J i 0 a i s i f i d i Embedded Systems 15-1 - REVIEW: Aperiodic scheduling C i J i 0 a i s i f i d i Given: A set of non-periodic tasks {J 1,, J n } with arrival times a i, deadlines d i, computation times C i precedence constraints

More information

Scheduling Lecture 1: Scheduling on One Machine

Scheduling Lecture 1: Scheduling on One Machine Scheduling Lecture 1: Scheduling on One Machine Loris Marchal October 16, 2012 1 Generalities 1.1 Definition of scheduling allocation of limited resources to activities over time activities: tasks in computer

More information

Scheduling divisible loads with return messages on heterogeneous master-worker platforms

Scheduling divisible loads with return messages on heterogeneous master-worker platforms Scheduling divisible loads with return messages on heterogeneous master-worker platforms Olivier Beaumont 1, Loris Marchal 2, and Yves Robert 2 1 LaBRI, UMR CNRS 5800, Bordeaux, France Olivier.Beaumont@labri.fr

More information

TOWARD THE PLACEMENT OF POWER MANAGEMENT POINTS IN REAL-TIME APPLICATIONS

TOWARD THE PLACEMENT OF POWER MANAGEMENT POINTS IN REAL-TIME APPLICATIONS Chapter 1 TOWARD THE PLACEMENT OF POWER MANAGEMENT POINTS IN REAL-TIME APPLICATIONS Nevine AbouGhazaleh, Daniel Mossé, Bruce Childers, Rami Melhem Department of Computer Science University of Pittsburgh

More information

Energy-Constrained Scheduling for Weakly-Hard Real-Time Systems

Energy-Constrained Scheduling for Weakly-Hard Real-Time Systems Energy-Constrained Scheduling for Weakly-Hard Real-Time Systems Tarek A. AlEnawy and Hakan Aydin Computer Science Department George Mason University Fairfax, VA 23 {thassan1,aydin}@cs.gmu.edu Abstract

More information

Optimal Checkpoint Placement on Real-Time Tasks with Harmonic Periods

Optimal Checkpoint Placement on Real-Time Tasks with Harmonic Periods Kwak SW, Yang JM. Optimal checkpoint placement on real-time tasks with harmonic periods. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(1): 105 112 Jan. 2012. DOI 10.1007/s11390-012-1209-0 Optimal Checkpoint

More information

Real-Time Scheduling and Resource Management

Real-Time Scheduling and Resource Management ARTIST2 Summer School 2008 in Europe Autrans (near Grenoble), France September 8-12, 2008 Real-Time Scheduling and Resource Management Lecturer: Giorgio Buttazzo Full Professor Scuola Superiore Sant Anna

More information

CPU SCHEDULING RONG ZHENG

CPU SCHEDULING RONG ZHENG CPU SCHEDULING RONG ZHENG OVERVIEW Why scheduling? Non-preemptive vs Preemptive policies FCFS, SJF, Round robin, multilevel queues with feedback, guaranteed scheduling 2 SHORT-TERM, MID-TERM, LONG- TERM

More information

Coping with silent and fail-stop errors at scale by combining replication and checkpointing

Coping with silent and fail-stop errors at scale by combining replication and checkpointing Coping with silent and fail-stop errors at scale by combining replication and checkpointing Anne Benoit a, Aurélien Cavelan b, Franck Cappello c, Padma Raghavan d, Yves Robert a,e, Hongyang Sun d, a Ecole

More information

CS 6901 (Applied Algorithms) Lecture 3

CS 6901 (Applied Algorithms) Lecture 3 CS 6901 (Applied Algorithms) Lecture 3 Antonina Kolokolova September 16, 2014 1 Representative problems: brief overview In this lecture we will look at several problems which, although look somewhat similar

More information

Reliable Computing I

Reliable Computing I Instructor: Mehdi Tahoori Reliable Computing I Lecture 5: Reliability Evaluation INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE NANO COMPUTING (CDNC) National Research Center of the Helmholtz

More information

Multiprocessor Scheduling I: Partitioned Scheduling. LS 12, TU Dortmund

Multiprocessor Scheduling I: Partitioned Scheduling. LS 12, TU Dortmund Multiprocessor Scheduling I: Partitioned Scheduling Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 22/23, June, 2015 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 47 Outline Introduction to Multiprocessor

More information

Lecture 6. Real-Time Systems. Dynamic Priority Scheduling

Lecture 6. Real-Time Systems. Dynamic Priority Scheduling Real-Time Systems Lecture 6 Dynamic Priority Scheduling Online scheduling with dynamic priorities: Earliest Deadline First scheduling CPU utilization bound Optimality and comparison with RM: Schedulability

More information

Non-Work-Conserving Non-Preemptive Scheduling: Motivations, Challenges, and Potential Solutions

Non-Work-Conserving Non-Preemptive Scheduling: Motivations, Challenges, and Potential Solutions Non-Work-Conserving Non-Preemptive Scheduling: Motivations, Challenges, and Potential Solutions Mitra Nasri Chair of Real-time Systems, Technische Universität Kaiserslautern, Germany nasri@eit.uni-kl.de

More information

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model Shadow Comuting: An Energy-Aware Fault Tolerant Comuting Model Bryan Mills, Taieb Znati, Rami Melhem Deartment of Comuter Science University of Pittsburgh (bmills, znati, melhem)@cs.itt.edu Index Terms

More information

Online Work Maximization under a Peak Temperature Constraint

Online Work Maximization under a Peak Temperature Constraint Online Work Maximization under a Peak Temperature Constraint Thidapat Chantem Department of CSE University of Notre Dame Notre Dame, IN 46556 tchantem@nd.edu X. Sharon Hu Department of CSE University of

More information

Mapping to Parallel Hardware

Mapping to Parallel Hardware - Mapping to Parallel Hardware Bruno Bodin University of Edinburgh, United Kingdom 1 / 27 Context - Hardware Samsung Exynos ARM big-little (Heterogenenous), 4 ARM A15, 4 ARM A7, 2-16 cores GPU, 5 Watt.

More information

Divisible Load Scheduling

Divisible Load Scheduling Divisible Load Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute

More information

Embedded Systems 14. Overview of embedded systems design

Embedded Systems 14. Overview of embedded systems design Embedded Systems 14-1 - Overview of embedded systems design - 2-1 Point of departure: Scheduling general IT systems In general IT systems, not much is known about the computational processes a priori The

More information

Polynomial Time Algorithms for Minimum Energy Scheduling

Polynomial Time Algorithms for Minimum Energy Scheduling Polynomial Time Algorithms for Minimum Energy Scheduling Philippe Baptiste 1, Marek Chrobak 2, and Christoph Dürr 1 1 CNRS, LIX UMR 7161, Ecole Polytechnique 91128 Palaiseau, France. Supported by CNRS/NSF

More information

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Agreement Protocols CS60002: Distributed Systems Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Classification of Faults Based on components that failed Program

More information

Real-time Scheduling of Periodic Tasks (1) Advanced Operating Systems Lecture 2

Real-time Scheduling of Periodic Tasks (1) Advanced Operating Systems Lecture 2 Real-time Scheduling of Periodic Tasks (1) Advanced Operating Systems Lecture 2 Lecture Outline Scheduling periodic tasks The rate monotonic algorithm Definition Non-optimality Time-demand analysis...!2

More information

Lecture 2: Metrics to Evaluate Systems

Lecture 2: Metrics to Evaluate Systems Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video

More information

Fine Grain Quality Management

Fine Grain Quality Management Fine Grain Quality Management Jacques Combaz Jean-Claude Fernandez Mohamad Jaber Joseph Sifakis Loïc Strus Verimag Lab. Université Joseph Fourier Grenoble, France DCS seminar, 10 June 2008, Col de Porte

More information

Fault Tolerant Computing CS 530 Software Reliability Growth. Yashwant K. Malaiya Colorado State University

Fault Tolerant Computing CS 530 Software Reliability Growth. Yashwant K. Malaiya Colorado State University Fault Tolerant Computing CS 530 Software Reliability Growth Yashwant K. Malaiya Colorado State University 1 Software Reliability Growth: Outline Testing approaches Operational Profile Software Reliability

More information

CycleTandem: Energy-Saving Scheduling for Real-Time Systems with Hardware Accelerators

CycleTandem: Energy-Saving Scheduling for Real-Time Systems with Hardware Accelerators CycleTandem: Energy-Saving Scheduling for Real-Time Systems with Hardware Accelerators Sandeep D souza and Ragunathan (Raj) Rajkumar Carnegie Mellon University High (Energy) Cost of Accelerators Modern-day

More information

Maximizing Rewards for Real-Time Applications with Energy Constraints

Maximizing Rewards for Real-Time Applications with Energy Constraints Maximizing Rewards for Real-Time Applications with Energy Constraints COSMIN RUSU, RAMI MELHEM, and DANIEL MOSSÉ University of Pittsburgh New technologies have brought about a proliferation of embedded

More information

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 1 EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 2 Topics Clocking Clock Parameters Latch Types Requirements for reliable clocking Pipelining Optimal pipelining

More information

SPEED SCALING FOR ENERGY AWARE PROCESSOR SCHEDULING: ALGORITHMS AND ANALYSIS

SPEED SCALING FOR ENERGY AWARE PROCESSOR SCHEDULING: ALGORITHMS AND ANALYSIS SPEED SCALING FOR ENERGY AWARE PROCESSOR SCHEDULING: ALGORITHMS AND ANALYSIS by Daniel Cole B.S. in Computer Science and Engineering, The Ohio State University, 2003 Submitted to the Graduate Faculty of

More information

A New Task Model and Utilization Bound for Uniform Multiprocessors

A New Task Model and Utilization Bound for Uniform Multiprocessors A New Task Model and Utilization Bound for Uniform Multiprocessors Shelby Funk Department of Computer Science, The University of Georgia Email: shelby@cs.uga.edu Abstract This paper introduces a new model

More information

Scheduling periodic Tasks on Multiple Periodic Resources

Scheduling periodic Tasks on Multiple Periodic Resources Scheduling periodic Tasks on Multiple Periodic Resources Xiayu Hua, Zheng Li, Hao Wu, Shangping Ren* Department of Computer Science Illinois Institute of Technology Chicago, IL 60616, USA {xhua, zli80,

More information

Periodic I/O Scheduling for Supercomputers

Periodic I/O Scheduling for Supercomputers Periodic I/O Scheduling for Supercomputers Guillaume Aupy 1, Ana Gainaru 2, Valentin Le Fèvre 3 1 Inria & U. of Bordeaux 2 Vanderbilt University 3 ENS Lyon & Inria PMBS Workshop, November 217 Slides available

More information

ENERGY-EFFICIENT REAL-TIME SCHEDULING ALGORITHM FOR FAULT-TOLERANT AUTONOMOUS SYSTEMS

ENERGY-EFFICIENT REAL-TIME SCHEDULING ALGORITHM FOR FAULT-TOLERANT AUTONOMOUS SYSTEMS DOI 10.12694/scpe.v19i4.1438 Scalable Computing: Practice and Experience ISSN 1895-1767 Volume 19, Number 4, pp. 387 400. http://www.scpe.org c 2018 SCPE ENERGY-EFFICIENT REAL-TIME SCHEDULING ALGORITHM

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Fault Tolerant Computing ECE 655

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Fault Tolerant Computing ECE 655 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE 655 Part 1 Introduction C. M. Krishna Fall 2006 ECE655/Krishna Part.1.1 Prerequisites Basic courses in

More information

On the Optimal Recovery Threshold of Coded Matrix Multiplication

On the Optimal Recovery Threshold of Coded Matrix Multiplication 1 On the Optimal Recovery Threshold of Coded Matrix Multiplication Sanghamitra Dutta, Mohammad Fahim, Farzin Haddadpour, Haewon Jeong, Viveck Cadambe, Pulkit Grover arxiv:1801.10292v2 [cs.it] 16 May 2018

More information

Andrew Morton University of Waterloo Canada

Andrew Morton University of Waterloo Canada EDF Feasibility and Hardware Accelerators Andrew Morton University of Waterloo Canada Outline 1) Introduction and motivation 2) Review of EDF and feasibility analysis 3) Hardware accelerators and scheduling

More information

p-yds Algorithm: An Optimal Extension of YDS Algorithm to Minimize Expected Energy For Real-Time Jobs

p-yds Algorithm: An Optimal Extension of YDS Algorithm to Minimize Expected Energy For Real-Time Jobs p-yds Algorithm: An Optimal Extension of YDS Algorithm to Minimize Expected Energy For Real-Time Jobs TIK Report No. 353 Pratyush Kumar and Lothar Thiele Computer Engineering and Networks Laboratory, ETH

More information

The Concurrent Consideration of Uncertainty in WCETs and Processor Speeds in Mixed Criticality Systems

The Concurrent Consideration of Uncertainty in WCETs and Processor Speeds in Mixed Criticality Systems The Concurrent Consideration of Uncertainty in WCETs and Processor Speeds in Mixed Criticality Systems Zhishan Guo and Sanjoy Baruah Department of Computer Science University of North Carolina at Chapel

More information

State-dependent and Energy-aware Control of Server Farm

State-dependent and Energy-aware Control of Server Farm State-dependent and Energy-aware Control of Server Farm Esa Hyytiä, Rhonda Righter and Samuli Aalto Aalto University, Finland UC Berkeley, USA First European Conference on Queueing Theory ECQT 2014 First

More information

ENERGY EFFICIENT TASK SCHEDULING OF SEND- RECEIVE TASK GRAPHS ON DISTRIBUTED MULTI- CORE PROCESSORS WITH SOFTWARE CONTROLLED DYNAMIC VOLTAGE SCALING

ENERGY EFFICIENT TASK SCHEDULING OF SEND- RECEIVE TASK GRAPHS ON DISTRIBUTED MULTI- CORE PROCESSORS WITH SOFTWARE CONTROLLED DYNAMIC VOLTAGE SCALING ENERGY EFFICIENT TASK SCHEDULING OF SEND- RECEIVE TASK GRAPHS ON DISTRIBUTED MULTI- CORE PROCESSORS WITH SOFTWARE CONTROLLED DYNAMIC VOLTAGE SCALING Abhishek Mishra and Anil Kumar Tripathi Department of

More information

Continuous heat flow analysis. Time-variant heat sources. Embedded Systems Laboratory (ESL) Institute of EE, Faculty of Engineering

Continuous heat flow analysis. Time-variant heat sources. Embedded Systems Laboratory (ESL) Institute of EE, Faculty of Engineering Thermal Modeling, Analysis and Management of 2D Multi-Processor System-on-Chip Prof David Atienza Alonso Embedded Systems Laboratory (ESL) Institute of EE, Falty of Engineering Outline MPSoC thermal modeling

More information

Technology Mapping for Reliability Enhancement in Logic Synthesis

Technology Mapping for Reliability Enhancement in Logic Synthesis Technology Mapping for Reliability Enhancement in Logic Synthesis Zhaojun Wo and Israel Koren Department of Electrical and Computer Engineering University of Massachusetts,Amherst,MA 01003 E-mail: {zwo,koren}@ecs.umass.edu

More information

An Efficient Energy-Optimal Device-Scheduling Algorithm for Hard Real-Time Systems

An Efficient Energy-Optimal Device-Scheduling Algorithm for Hard Real-Time Systems An Efficient Energy-Optimal Device-Scheduling Algorithm for Hard Real-Time Systems S. Chakravarthula 2 S.S. Iyengar* Microsoft, srinivac@microsoft.com Louisiana State University, iyengar@bit.csc.lsu.edu

More information

Design and Analysis of Time-Critical Systems Response-time Analysis with a Focus on Shared Resources

Design and Analysis of Time-Critical Systems Response-time Analysis with a Focus on Shared Resources Design and Analysis of Time-Critical Systems Response-time Analysis with a Focus on Shared Resources Jan Reineke @ saarland university ACACES Summer School 2017 Fiuggi, Italy computer science Fixed-Priority

More information

Checkpointing strategies with prediction windows

Checkpointing strategies with prediction windows hal-7899, version - 5 Feb 23 Checkpointing strategies with prediction windows Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni RESEARCH REPORT N 8239 February 23 Project-Team ROMA SSN 249-6399

More information

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved. Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved. 1 Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should

More information

The Weighted Byzantine Agreement Problem

The Weighted Byzantine Agreement Problem The Weighted Byzantine Agreement Problem Vijay K. Garg and John Bridgman Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712-1084, USA garg@ece.utexas.edu,

More information

Title: Maximizing Reliability of Energy Constrained Parallel Applications on Heterogeneous Distributed Systems

Title: Maximizing Reliability of Energy Constrained Parallel Applications on Heterogeneous Distributed Systems Title: Maximizing Reliability of Energy Constrained Parallel Applications on Heterogeneous Distributed Systems Author: Xiongren Xiao Guoqi Xie Cheng Xu Chunnian Fan Renfa Li Keqin Li PII: S1877-7503(17)30493-3

More information

Checkpointing strategies with prediction windows

Checkpointing strategies with prediction windows Author manuscript, published in "PRDC - The 9th EEE Pacific Rim nternational Symposium on Dependable Computing - 23 23" Checkpointing strategies with prediction windows Regular paper Guillaume Aupy,3,

More information

Slack Reclamation for Real-Time Task Scheduling over Dynamic Voltage Scaling Multiprocessors

Slack Reclamation for Real-Time Task Scheduling over Dynamic Voltage Scaling Multiprocessors Slack Reclamation for Real-Time Task Scheduling over Dynamic Voltage Scaling Multiprocessors Jian-Jia Chen, Chuan-Yue Yang, and Tei-Wei Kuo Department of Computer Science and Information Engineering Graduate

More information

Reclaiming the energy of a schedule: models and algorithms

Reclaiming the energy of a schedule: models and algorithms Reclaiming the energy of a schedule: models and algorithms Guillaume Aupy, Anne Benoit, Fanny Dufossé, Yves Robert To cite this version: Guillaume Aupy, Anne Benoit, Fanny Dufossé, Yves Robert. Reclaiming

More information

Bounding the End-to-End Response Times of Tasks in a Distributed. Real-Time System Using the Direct Synchronization Protocol.

Bounding the End-to-End Response Times of Tasks in a Distributed. Real-Time System Using the Direct Synchronization Protocol. Bounding the End-to-End Response imes of asks in a Distributed Real-ime System Using the Direct Synchronization Protocol Jun Sun Jane Liu Abstract In a distributed real-time system, a task may consist

More information

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Slides are partially based on the joint work of Christos Litsas, Aris Pagourtzis,

More information

Shared Recovery for Energy Efficiency and Reliability Enhancements in Real-Time Applications with Precedence Constraints

Shared Recovery for Energy Efficiency and Reliability Enhancements in Real-Time Applications with Precedence Constraints Shared Recovery for Energy Efficiency and Reliability Enhancements in Real-Time Applications with Precedence Constraints BAOXIAN ZHAO and HAKAN AYDIN, George Mason University DAKAI ZHU, University of Texas

More information