Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing
|
|
- Irma Reynolds
- 5 years ago
- Views:
Transcription
1 Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing Prasanna Balaprakash, Leonardo A. Bautista Gomez, Slim Bouguerra, Stefan M. Wild, Franck Cappello, and Paul D. Hovland ANL PMBS SC 14 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 1
2 Context and motivations Context: The Need For Speed Figure : From leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 2
3 Context and motivations Motivation: Failures Sequoia MTBF 1 day. Blue Waters 2 nodes failure per day. Titan MTBF < 1 day. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 3
4 Context and motivations Motivation: Failures Sequoia MTBF 1 day. Blue Waters 2 nodes failure per day. Titan MTBF < 1 day. 20 % of the computation is wasted in recovery and re-execution (Implies energy waste leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 3
5 Context and motivations Motivation: Failures Sequoia MTBF 1 day. Blue Waters 2 nodes failure per day. Titan MTBF < 1 day. 20 % of the computation is wasted in recovery and re-execution (Implies energy waste Exascale: The number of components for both memory and processors will increase by a factor of 100. Shrinking the circuit sizes and running at lower voltages, increases the SDC probability. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 3
6 Context and motivations Motivation: Failures Sequoia MTBF 1 day. Blue Waters 2 nodes failure per day. Titan MTBF < 1 day. 20 % of the computation is wasted in recovery and re-execution (Implies energy waste Exascale: The number of components for both memory and processors will increase by a factor of 100. Shrinking the circuit sizes and running at lower voltages, increases the SDC probability. In exascale failures will occur at higher frequency, optimistic MTBF is couple of hours. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 3
7 Context and motivations Motivation: Energy The power draw of the interconnect on Blue Gene/Q appears to be independent of load. CPU varies only by some 20% Power draw under different loads is DRAM change by a factor of 2 or more. Exascale: Data movement and IO will consume more than 70% of the total system power (most of the 20 MW will go just to power the 10 PB of total system memory. Flops/Watt VS Communication/Watts Avoid checkpointing and data movement do more re-computations. VS Avoid re-computations via checkpointing more often. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 4
8 Context and motivations Related Work ECOTFIT, Diouri et al Blocking checkpointing Message logging Conclusion no big tradeoff observed. Meneses et al Parallel recovery vs global recovery Used RALP API (No communication or IO are covered Parallel is better since it reduces the overall time Aupy et al Blocking vs no-blocking single level. No experiment. (ANL Tradeoffs between Energy and Run Time PMBS SC 14 5
9 Context and motivations Related Work ECOTFIT, Diouri et al Blocking checkpointing Message logging Conclusion no big tradeoff observed. Meneses et al Parallel recovery vs global recovery Used RALP API (No communication or IO are covered Parallel is better since it reduces the overall time Aupy et al Blocking vs no-blocking single level. No experiment. The missing episode What about multilevel checkpointing? (ANL Tradeoffs between Energy and Run Time PMBS SC 14 5
10 Problem formulation and notations 1 Context and motivations 2 Problem formulation and notations Multilevel checkpointing Energy model Multiobjective optimization 3 Simulation and experimentations Experimentations Tradeoff analysis 4 Conclusion and future work leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 6
11 Problem formulation and notations Multilevel checkpointing Multilevel Checkpointing Multiple levels of storage (DRAM, NVM, PFS. Coupled with data replication and erasure codes. Low levels offer high performance and partial reliability. High levels offer high reliability but impose large overhead. Different ckpt. levels have different frequencies. After a failure the application restart from the lowest available level. If unable to recover, try next level of checkpoint (Further in the past. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 7
12 Problem formulation and notations Multilevel checkpointing Wasted time model L levels of checkpoint (4 with FTI Checkpoint strategy: τ i, for i = 1 L Checkpoint cost: c i for level i r i time for a restart from level i d i downtime after a failure affecting level i. µ i rate of failures affecting level i. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 8
13 Problem formulation and notations Energy model Wasted energy model Pi c Pi r power for a level i checkpoint Watts. power for a restart from level i Watts. P a power for a failure-free computation without checkpointing Watts. µ i rate for failure affecting level i. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 9
14 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint time W ch = L i=1 c i τ i i 1 + µ i τ i j=1 c j 2τ j leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 10
15 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint time W ch = L i=1 c i τ i i 1 + µ i τ i j=1 c j 2τ j Rework time W rew = L i=1 µ i τ i 2 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 10
16 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint time W ch = L i=1 c i τ i i 1 + µ i τ i j=1 c j 2τ j Rework time W rew = L i=1 µ i τ i 2 Downtime and restart time W down = L µ i (r i + d i i=1 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 10
17 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint wasted energy E ch = L i=1 P c i c i i 1 Pj c + µ i τ c j i τ i 2τ j j=1 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 11
18 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint wasted energy E ch = L i=1 P c i c i i 1 Pj c + µ i τ c j i τ i 2τ j j=1 Rework wasted energy E rew = L i=1 P a µ iτ i 2 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 11
19 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint wasted energy E ch = L i=1 P c i c i i 1 Pj c + µ i τ c j i τ i 2τ j j=1 Rework wasted energy E rew = L i=1 P a µ iτ i 2 Downtime and restart wasted energy E down = L Pi r µ i (r i + d i i=1 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 11
20 Problem formulation and notations Multiobjective optimization Total wasted time W = L i=1 ( ( c i + µ iτ i 1 + τ i 2 i 1 j=1 c j 2τ j + µ i (r i + d i (1 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 12
21 Problem formulation and notations Multiobjective optimization Total wasted time W = L i=1 ( ( c i + µ iτ i 1 + τ i 2 i 1 j=1 c j 2τ j + µ i (r i + d i (1 Total wasted energy E = ( L P c i c i i=1 τ i ( + µ i τ P a i 2 + i 1 Pj cc j j=1 2τ j + L i=1 Pr i µ i(r i + d i, (2 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 12
22 Problem formulation and notations Multiobjective optimization First derivatives E τ i W τ i = µ i 2 = µ i 2 ( ( i j=1 i 1 P a + j=1 c j τ j P c j c j τ j c i τi 2 ( Pc i c i τ 2 i 1 + ( L j=i µ j τ j 2 L j=i+1 µ j τ j 2 (3 (4 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 13
23 Problem formulation and notations Multiobjective optimization First derivatives E τ i W τ i = µ i 2 = µ i 2 ( ( i j=1 i 1 P a + j=1 c j τ j P c j c j τ j c i τi 2 ( Pc i c i τ 2 i 1 + ( L j=i µ j τ j 2 L j=i+1 µ j τ j 2 (3 (4 Solutions τ W i = τ E i = c i(2 + L j=i+1 µ jτ W µ i (1 + i 1 c j j=1 τ W j j ρ ic i (2 + L j=i+1 µ jτ E µ i (1 + i 1 j=1 ρ j c j τj E j ρ i = P c i /Pa leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 13
24 Problem formulation and notations Multiobjective optimization Solutions ρ i = P c i /Pa τ W i = τ E i = c i(2 + L j=i+1 µ jτ W µ i (1 + i 1 c j j=1 τ W j j ρ ic i (2 + L j=i+1 µ jτ E µ i (1 + i 1 j=1 ρ j c j τj E j Solutions For one single level we have : τ W = 2c/µ and τ E = τ W P c /P a Whenever P c P a, we have that τ W τ E, and hence the two objectives are conflicting. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 14
25 Problem formulation and notations Multiobjective optimization Pareto front Definition τ i is said to be Pareto-optimal if it is not dominated by any other τ j. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 15
26 Problem formulation and notations Multiobjective optimization Pareto front Definition τ i is said to be Pareto-optimal if it is not dominated by any other τ j. Convex combination If the Pareto front is convex, any point on the front can be obtained by minimizing a linear combination of the objectives. Theorem f λ (τ = λw(τ + (1 λe(τ, for λ [0, 1]. The Hessian 2 ττ W(τ is diagonally dominant, and thus W is a convex function of τ over the domain (same for E(τ leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 15
27 Problem formulation and notations Multiobjective optimization Pareto front Convex combination If the Pareto front is convex, any point on the front can be obtained by minimizing a linear combination of the objectives. Theorem f λ (τ = λw(τ + (1 λe(τ, for λ [0, 1]. The Hessian 2 ττ W(τ is diagonally dominant, and thus W is a convex function of τ over the domain (same for E(τ τ i (λ = ( c i (λ + (1 λpi c 2 + L µ j τj j=i+1 (, (5 µ i λ + (1 λp a + i 1 (λ + (1 λpj c c j j=1 τ j leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 15
28 Problem formulation and notations Multiobjective optimization Pareto front Theorem The Hessian 2 ττ W(τ is diagonally dominant, and thus W is a convex function of τ over the domain (same for E(τ τ i (λ = ( c i (λ + (1 λpi c 2 + L µ j τj j=i+1 (, (5 µ i λ + (1 λp a + i 1 (λ + (1 λpj c c j j=1 τ j Case one level (L = 1 τ (λ = τ W λ + (1 λp c λ + (1 λp a. (6 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 15
29 Problem formulation and notations Multiobjective optimization Pareto Front (ANL Tradeoffs between Energy and Run Time PMBS SC 14 16
30 Simulation and experimentations 1 Context and motivations 2 Problem formulation and notations Multilevel checkpointing Energy model Multiobjective optimization 3 Simulation and experimentations Experimentations Tradeoff analysis 4 Conclusion and future work leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 17
31 Simulation and experimentations Experimentations Platform Mira 10 PF IBM Blue Gene/Q (BG/Q 49,152 nodes organized in 48 racks 16 cores of 1.6 GHz PowerPC A2 and 16 GB of DDR3 memory. 5-D torus network. Vesta (developmental platform for Mira Same as Mira s but with 2,048 nodes leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 18
32 Simulation and experimentations Experimentations Platform Mira 10 PF IBM Blue Gene/Q (BG/Q 49,152 nodes organized in 48 racks 16 cores of 1.6 GHz PowerPC A2 and 16 GB of DDR3 memory. 5-D torus network. Vesta (developmental platform for Mira Same as Mira s but with 2,048 nodes MonEQ for power measurement (resolution of 560 ms Chip core DRAM Network Collect power data only at the node card level (every 32 nodes The library can not measure the I/O power consumption!! leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 18
33 Simulation and experimentations Experimentations Applications LAMMPS Production-level molecular dynamics application. Lennard-Jones simulation of 1.3 billion atoms. Checkpoint size per node 200 MB ( 100GB application s footprint Checkpoints intervals 4, 8, 16 and 32 minutes. CORAL Qbox: first-principles molecular dynamics AMG: is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. LULESH: performs hydrodynamics stencil calculations minife: is a finite-element code. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 19
34 Simulation and experimentations Experimentations LAMMPS: synchronous Figure : Synchronous multilevel checkpointing 1.3 billion atoms Lennard-Jones simulation. 512 nodes running 64 MPI ranks per node (32,678 proc.. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 20
35 Simulation and experimentations Experimentations LAMMPS: asynchronous Figure : Asynchronous multilevel checkpointing 1.3 billion atoms Lennard-Jones simulation. 512 nodes running 64 MPI ranks per node (32,678 proc.. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 21
36 Simulation and experimentations Experimentations CORAL benchmark (a LULESH (b MiniFE (c AMG (d Qbox 32 nodes. Each(ANL application Tradeoffs is run between with Energy a and configuration Run Time PMBS of 16workshop ranks SC 14 22
37 Simulation and experimentations Tradeoff analysis Pareto fronts: Level 4 power consumption (e Level 4 power consumption P c 4 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 23
38 Simulation and experimentations Tradeoff analysis Pareto fronts :Computation power (f Computation power P a leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 24
39 Simulation and experimentations Tradeoff analysis Pareto fronts : Power ratio Pc P a (g Power ratio Pc P a leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 25
40 Conclusion and future work 1 Context and motivations 2 Problem formulation and notations Multilevel checkpointing Energy model Multiobjective optimization 3 Simulation and experimentations Experimentations Tradeoff analysis 4 Conclusion and future work leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 26
41 Conclusion and future work Summary Analytical models of performance and energy for multilevel checkpoint schemes. The pareto-front is obtained using convex combination. Power measurement experiments with production-level scientific applications running on over 32,000 MPI processes: The relative energy overhead of using FTI is minor and thus the tradeoffs are relatively small. Richer tradeoff exist when the power consumption of checkpointing is greater than that of the computation. (ANL Tradeoffs between Energy and Run Time PMBS SC 14 27
42 Conclusion and future work What is Next? Analyzing power profile of different fault tolerance protocols such as full/partial replication and message logging. The viability of replication with respect to the power cap of future exascale platforms (ANL Tradeoffs between Energy and Run Time PMBS SC 14 28
43 Conclusion and future work Questions? Thank You!! (ANL Tradeoffs between Energy and Run Time PMBS SC 14 29
Tradeoff between Reliability and Power Management
Tradeoff between Reliability and Power Management 9/1/2005 FORGE Lee, Kyoungwoo Contents 1. Overview of relationship between reliability and power management 2. Dakai Zhu, Rami Melhem and Daniel Moss e,
More informationResilient and energy-aware algorithms
Resilient and energy-aware algorithms Anne Benoit ENS Lyon Anne.Benoit@ens-lyon.fr http://graal.ens-lyon.fr/~abenoit CR02-2016/2017 Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 1/
More informationEmbedded Systems Design: Optimization Challenges. Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden
of /4 4 Embedded Systems Design: Optimization Challenges Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden Outline! Embedded systems " Example area: automotive electronics " Embedded systems
More informationA Tale of Two Erasure Codes in HDFS
A Tale of Two Erasure Codes in HDFS Dynamo Mingyuan Xia *, Mohit Saxena +, Mario Blaum +, and David A. Pease + * McGill University, + IBM Research Almaden FAST 15 何军权 2015-04-30 1 Outline Introduction
More informationOptimal checkpointing periods with fail-stop and silent errors
Optimal checkpointing periods with fail-stop and silent errors Anne Benoit ENS Lyon Anne.Benoit@ens-lyon.fr http://graal.ens-lyon.fr/~abenoit 3rd JLESC Summer School June 30, 2016 Anne.Benoit@ens-lyon.fr
More informationCheckpoint Scheduling
Checkpoint Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute of
More informationCoping with silent and fail-stop errors at scale by combining replication and checkpointing
Coping with silent and fail-stop errors at scale by combining replication and checkpointing Anne Benoit a, Aurélien Cavelan b, Franck Cappello c, Padma Raghavan d, Yves Robert a,e, Hongyang Sun d, a Ecole
More informationFault-Tolerant Techniques for HPC
Fault-Tolerant Techniques for HPC Yves Robert Laboratoire LIP, ENS Lyon Institut Universitaire de France University Tennessee Knoxville Yves.Robert@inria.fr http://graal.ens-lyon.fr/~yrobert/htdc-flaine.pdf
More informationEnergy-efficient scheduling
Energy-efficient scheduling Guillaume Aupy 1, Anne Benoit 1,2, Paul Renaud-Goud 1 and Yves Robert 1,2,3 1. Ecole Normale Supérieure de Lyon, France 2. Institut Universitaire de France 3. University of
More informationPeriodic I/O Scheduling for Supercomputers
Periodic I/O Scheduling for Supercomputers Guillaume Aupy 1, Ana Gainaru 2, Valentin Le Fèvre 3 1 Inria & U. of Bordeaux 2 Vanderbilt University 3 ENS Lyon & Inria PMBS Workshop, November 217 Slides available
More informationCactus Tools for Petascale Computing
Cactus Tools for Petascale Computing Erik Schnetter Reno, November 2007 Gamma Ray Bursts ~10 7 km He Protoneutron Star Accretion Collapse to a Black Hole Jet Formation and Sustainment Fe-group nuclei Si
More informationPerformance Analysis of Lattice QCD Application with APGAS Programming Model
Performance Analysis of Lattice QCD Application with APGAS Programming Model Koichi Shirahata 1, Jun Doi 2, Mikio Takeuchi 2 1: Tokyo Institute of Technology 2: IBM Research - Tokyo Programming Models
More informationAdaptive and Power-Aware Resilience for Extreme-scale Computing
Adaptive and Power-Aware Resilience for Extreme-scale Computing Xiaolong Cui, Taieb Znati, Rami Melhem Computer Science Department University of Pittsburgh Pittsburgh, USA Email: {mclarencui, znati, melhem}@cs.pitt.edu
More informationRollback-Recovery. Uncoordinated Checkpointing. p!! Easy to understand No synchronization overhead. Flexible. To recover from a crash:
Rollback-Recovery Uncoordinated Checkpointing Easy to understand No synchronization overhead p!! Flexible can choose when to checkpoint To recover from a crash: go back to last checkpoint restart How (not)to
More informationSome thoughts about energy efficient application execution on NEC LX Series compute clusters
Some thoughts about energy efficient application execution on NEC LX Series compute clusters G. Wellein, G. Hager, J. Treibig, M. Wittmann Erlangen Regional Computing Center & Department of Computer Science
More informationFault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing
20 Years of Innovative Computing Knoxville, Tennessee March 26 th, 200 Fault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing Zizhong (Jeffrey) Chen zchen@mines.edu Colorado School
More informationEfficient checkpoint/verification patterns
Efficient checkpoint/verification patterns Anne Benoit a, Saurabh K. Raina b, Yves Robert a,c a Laboratoire LIP, École Normale Supérieure de Lyon, France b Department of CSE/IT, Jaypee Institute of Information
More informationPERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah
PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.
More informationEE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption
EE115C Winter 2017 Digital Electronic Circuits Lecture 6: Power Consumption Four Key Design Metrics for Digital ICs Cost of ICs Reliability Speed Power EE115C Winter 2017 2 Power and Energy Challenges
More informationParallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2
1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013
More informationCRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY
CRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY MEMSYS-2017 SWAMIT TANNU DOUG CARMEAN MOINUDDIN QURESHI Why Quantum Computers? 2 Molecule and Material Simulations
More informationCMP 338: Third Class
CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does
More informationHow to deal with uncertainties and dynamicity?
How to deal with uncertainties and dynamicity? http://graal.ens-lyon.fr/ lmarchal/scheduling/ 19 novembre 2012 1/ 37 Outline 1 Sensitivity and Robustness 2 Analyzing the sensitivity : the case of Backfilling
More informationOn scheduling the checkpoints of exascale applications
On scheduling the checkpoints of exascale applications Marin Bougeret, Henri Casanova, Mikaël Rabie, Yves Robert, and Frédéric Vivien INRIA, École normale supérieure de Lyon, France Univ. of Hawai i at
More informationDistributed by: www.jameco.com 1-800-831-4242 The content and copyrights of the attached material are the property of its owner. September 2001 S7C256 5V/3.3V 32K X 8 CMOS SRM (Common I/O) Features S7C256
More informationC 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This
Recap: Finger Table Finding a using fingers Distributed Systems onsensus Steve Ko omputer Sciences and Engineering University at Buffalo N102 86 + 2 4 N86 20 + 2 6 N20 2 Let s onsider This
More informationNCEP Applications -- HPC Performance and Strategies. Mark Iredell software team lead USDOC/NOAA/NWS/NCEP/EMC
NCEP Applications -- HPC Performance and Strategies Mark Iredell software team lead USDOC/NOAA/NWS/NCEP/EMC Motivation and Outline Challenges in porting NCEP applications to WCOSS and future operational
More informationLeveraging Task-Parallelism in Energy-Efficient ILU Preconditioners
Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners José I. Aliaga Leveraging task-parallelism in energy-efficient ILU preconditioners Universidad Jaime I (Castellón, Spain) José I. Aliaga
More informationThe Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems
The Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems Wu Feng and Balaji Subramaniam Metrics for Energy Efficiency Energy- Delay Product (EDP) Used primarily in circuit design
More informationLecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM
Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM Mark McDermott Electrical and Computer Engineering The University of Texas at Austin 9/27/18 VLSI-1 Class Notes Why Clocking?
More informationWeather Research and Forecasting (WRF) Performance Benchmark and Profiling. July 2012
Weather Research and Forecasting (WRF) Performance Benchmark and Profiling July 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,
More informationJ.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009
Parallel Preconditioning of Linear Systems based on ILUPACK for Multithreaded Architectures J.I. Aliaga M. Bollhöfer 2 A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ.
More informationCRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?
CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?
More informationEnergy-aware checkpointing of divisible tasks with soft or hard deadlines
Energy-aware checkpointing of divisible tasks with soft or hard deadlines Guillaume Aupy 1, Anne Benoit 1,2, Rami Melhem 3, Paul Renaud-Goud 1 and Yves Robert 1,2,4 1. Ecole Normale Supérieure de Lyon,
More informationAdjoint-Based Uncertainty Quantification and Sensitivity Analysis for Reactor Depletion Calculations
Adjoint-Based Uncertainty Quantification and Sensitivity Analysis for Reactor Depletion Calculations Hayes F. Stripling 1 Marvin L. Adams 1 Mihai Anitescu 2 1 Department of Nuclear Engineering, Texas A&M
More informationLecture 5 Fault Modeling
Lecture 5 Fault Modeling Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults Single stuck-at faults Fault equivalence Fault dominance and checkpoint theorem Classes
More informationEDF Feasibility and Hardware Accelerators
EDF Feasibility and Hardware Accelerators Andrew Morton University of Waterloo, Waterloo, Canada, arrmorton@uwaterloo.ca Wayne M. Loucks University of Waterloo, Waterloo, Canada, wmloucks@pads.uwaterloo.ca
More information12 - The Tie Set Method
12 - The Tie Set Method Definitions: A tie set V is a set of components whose success results in system success, i.e. the presence of all components in any tie set connects the input to the output in the
More informationAsynchronous Parareal in time discretization for partial differential equations
Asynchronous Parareal in time discretization for partial differential equations Frédéric Magoulès, Guillaume Gbikpi-Benissan April 7, 2016 CentraleSupélec IRT SystemX Outline of the presentation 01 Introduction
More informationA Piggybacking Design Framework for Read- and- Download- efficient Distributed Storage Codes. K. V. Rashmi, Nihar B. Shah, Kannan Ramchandran
A Piggybacking Design Framework for Read- and- Download- efficient Distributed Storage Codes K V Rashmi, Nihar B Shah, Kannan Ramchandran Outline IntroducGon & MoGvaGon Measurements from Facebook s Warehouse
More informationBehavioral Simulations in MapReduce
Behavioral Simulations in MapReduce Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker White Cornell University 1 What are Behavioral Simulations?
More informationTowards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters
Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM
More informationScalable Tools for Debugging Non-Deterministic MPI Applications
Scalable Tools for Debugging Non-Deterministic MPI Applications ReMPI: MPI Record-and-Replay tool Scalable Tools Workshop August 2nd, 2016 Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Mar>n
More informationab initio Electronic Structure Calculations
ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab
More informationCartesius Opening. Jean-Marc DENIS. June, 14th, International Business Director Extreme Computing Business Unit
Cartesius Opening June, 14th, 2013 Jean-Marc DENIS International Business Director Extreme Computing Business Unit 1 Cartesius (Renatus, 1596 1650) (*) René Descartes (French: [ʁəne dekaʁt]; Latinized:
More informationAn Integrative Model for Parallelism
An Integrative Model for Parallelism Victor Eijkhout ICERM workshop 2012/01/09 Introduction Formal part Examples Extension to other memory models Conclusion tw-12-exascale 2012/01/09 2 Introduction tw-12-exascale
More informationParallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors
Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1 1 Deparment of Computer
More informationSP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay
SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain
More informationCheckpointing strategies for parallel jobs
Checkpointing strategies for parallel jobs Marin Bougeret ENS Lyon, France Marin.Bougeret@ens-lyon.fr Yves Robert ENS Lyon, France Yves.Robert@ens-lyon.fr Henri Casanova Univ. of Hawai i at Mānoa, Honolulu,
More informationWelcome to MCS 572. content and organization expectations of the course. definition and classification
Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson
More informationA Flexible Checkpoint/Restart Model in Distributed Systems
A Flexible Checkpoint/Restart Model in Distributed Systems Mohamed-Slim Bouguerra 1,2, Thierry Gautier 2, Denis Trystram 1, and Jean-Marc Vincent 1 1 Grenoble University, ZIRST 51, avenue Jean Kuntzmann
More informationA Computation- and Communication-Optimal Parallel Direct 3-body Algorithm
A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm Penporn Koanantakool and Katherine Yelick {penpornk, yelick}@cs.berkeley.edu Computer Science Division, University of California,
More information2.5D algorithms for distributed-memory computing
ntroduction for distributed-memory computing C Berkeley July, 2012 1/ 62 ntroduction Outline ntroduction Strong scaling 2.5D factorization 2/ 62 ntroduction Strong scaling Solving science problems faster
More informationA Hybrid Method for the Wave Equation. beilina
A Hybrid Method for the Wave Equation http://www.math.unibas.ch/ beilina 1 The mathematical model The model problem is the wave equation 2 u t 2 = (a 2 u) + f, x Ω R 3, t > 0, (1) u(x, 0) = 0, x Ω, (2)
More informationScalable numerical algorithms for electronic structure calculations
Scalable numerical algorithms for electronic structure calculations Edgar Solomonik C Berkeley July, 2012 Edgar Solomonik Cyclops Tensor Framework 1/ 73 Outline Introduction Motivation: Coupled Cluster
More informationMassively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling
2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)
More informationEfficient checkpoint/verification patterns for silent error detection
Efficient checkpoint/verification patterns for silent error detection Anne Benoit 1, Saurabh K. Raina 2 and Yves Robert 1,3 1. LIP, École Normale Supérieure de Lyon, CNRS & INRIA, France 2. Jaypee Institute
More informationCoding for loss tolerant systems
Coding for loss tolerant systems Workshop APRETAF, 22 janvier 2009 Mathieu Cunche, Vincent Roca INRIA, équipe Planète INRIA Rhône-Alpes Mathieu Cunche, Vincent Roca The erasure channel Erasure codes Reed-Solomon
More informationEfficient algorithms for symmetric tensor contractions
Efficient algorithms for symmetric tensor contractions Edgar Solomonik 1 Department of EECS, UC Berkeley Oct 22, 2013 1 / 42 Edgar Solomonik Symmetric tensor contractions 1/ 42 Motivation The goal is to
More informationEarly-stage Power Grid Analysis for Uncertain Working Modes
Early-stage Power Grid Analysis for Uncertain Working Modes Haifeng Qian Department of ECE University of Minnesota Minneapolis, MN 55414 qianhf@ece.umn.edu Sani R. Nassif IBM Austin Research Labs 11400
More informationLecture 2: Metrics to Evaluate Systems
Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video
More informationMPI at MPI. Jens Saak. Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory
MAX PLANCK INSTITUTE November 5, 2010 MPI at MPI Jens Saak Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory FOR DYNAMICS OF COMPLEX TECHNICAL
More informationThermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model
Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model Yang Shang 1, Chun Zhang 1, Hao Yu 1, Chuan Seng Tan 1, Xin Zhao 2, Sung Kyu Lim 2 1 School of Electrical
More informationEE241 - Spring 2006 Advanced Digital Integrated Circuits
EE241 - Spring 2006 Advanced Digital Integrated Circuits Lecture 20: Asynchronous & Synchronization Self-timed and Asynchronous Design Functions of clock in synchronous design 1) Acts as completion signal
More informationLecture 19. Architectural Directions
Lecture 19 Architectural Directions Today s lecture Advanced Architectures NUMA Blue Gene 2010 Scott B. Baden / CSE 160 / Winter 2010 2 Final examination Announcements Thursday, March 17, in this room:
More informationLightweight Superscalar Task Execution in Distributed Memory
Lightweight Superscalar Task Execution in Distributed Memory Asim YarKhan 1 and Jack Dongarra 1,2,3 1 Innovative Computing Lab, University of Tennessee, Knoxville, TN 2 Oak Ridge National Lab, Oak Ridge,
More informationComputer Architecture
Lecture 2: Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture CPU Evolution What is? 2 Outline Measurements and metrics : Performance, Cost, Dependability, Power Guidelines
More informationAnalog Computation in Flash Memory for Datacenter-scale AI Inference in a Small Chip
1 Analog Computation in Flash Memory for Datacenter-scale AI Inference in a Small Chip Dave Fick, CTO/Founder Mike Henry, CEO/Founder About Mythic 2 Focused on high-performance Edge AI Full stack co-design:
More informationDirect Self-Consistent Field Computations on GPU Clusters
Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd
More informationParallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco
Parallel programming using MPI Analysis and optimization Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Outline l Parallel programming: Basic definitions l Choosing right algorithms: Optimal serial and
More informationA Fast and Scalable Low Dimensional Solver for Charged Particle Dynamics in Large Particle Accelerators
A Fast and Scalable Low Dimensional Solver for Charged Particle Dynamics in Large Particle Accelerators Yves Ineichen 1,2,3, Andreas Adelmann 2, Costas Bekas 3, Alessandro Curioni 3, Peter Arbenz 1 1 ETH
More informationAccelerating linear algebra computations with hybrid GPU-multicore systems.
Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)
More informationChe-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University
Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.
More informationCyclops Tensor Framework
Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r
More informationLattice Boltzmann simulations on heterogeneous CPU-GPU clusters
Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts
More informationAgreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur
Agreement Protocols CS60002: Distributed Systems Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Classification of Faults Based on components that failed Program
More informationScalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults
Scalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults K.Morris, F.Rizzi, K.Sargsyan, K.Dahlgren, P.Mycek, C.Safta, O.LeMaitre, O.Knio, B.Debusschere Sandia National
More informationFrom here to there. peta exa-scale transient simulation. Lawrence Mitchell 1, 27th March
From here to there challenges for peta exa-scale transient simulation Lawrence Mitchell 1, 27th March 2017 1 Departments of Computing and Mathematics, Imperial College London lawrence.mitchell@imperial.ac.uk
More informationRound-off error propagation and non-determinism in parallel applications
Round-off error propagation and non-determinism in parallel applications Vincent Baudoui (Argonne/Total SA) vincent.baudoui@gmail.com Franck Cappello (Argonne/INRIA/UIUC-NCSA) Georges Oppenheim (Paris-Sud
More informationRevisiting Memory Errors in Large-Scale Production Data Centers
Revisiting Memory Errors in Large-Scale Production Da Centers Analysis and Modeling of New Trends from the Field Justin Meza Qiang Wu Sanjeev Kumar Onur Mutlu Overview Study of DRAM reliability: on modern
More informationDRAM Reliability: Parity, ECC, Chipkill, Scrubbing. Alpha Particle or Cosmic Ray. electron-hole pairs. silicon. DRAM Memory System: Lecture 13
slide 1 DRAM Reliability: Parity, ECC, Chipkill, Scrubbing Alpha Particle or Cosmic Ray electron-hole pairs silicon Alpha Particles: Radioactive impurity in package material slide 2 - Soft errors were
More information2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51
2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each
More informationERLANGEN REGIONAL COMPUTING CENTER
ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,
More information2M x 32 Bit 5V FPM SIMM. Fast Page Mode (FPM) DRAM SIMM S51T04JD Pin 2Mx32 FPM SIMM Unbuffered, 1k Refresh, 5V. General Description.
Fast Page Mode (FPM) DRAM SIMM 322006-S51T04JD Pin 2Mx32 Unbuffered, 1k Refresh, 5V General Description The module is a 2Mx32 bit, 4 chip, 5V, 72 Pin SIMM module consisting of (4) 1Mx16 (SOJ) DRAM. The
More informationA Data Communication Reliability and Trustability Study for Cluster Computing
A Data Communication Reliability and Trustability Study for Cluster Computing Speaker: Eduardo Colmenares Midwestern State University Wichita Falls, TX HPC Introduction Relevant to a variety of sciences,
More informationDeadlock. CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC]
Deadlock CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC] 1 Outline Resources Deadlock Deadlock Prevention Deadlock Avoidance Deadlock Detection Deadlock Recovery 2 Review: Synchronization
More informationHigh Performance Computing
Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2 nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),
More informationThe Performance Evolution of the Parallel Ocean Program on the Cray X1
The Performance Evolution of the Parallel Ocean Program on the Cray X1 Patrick H. Worley Oak Ridge National Laboratory John Levesque Cray Inc. 46th Cray User Group Conference May 18, 2003 Knoxville Marriott
More informationEfficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems
Efficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems Yifeng Guo, Dakai Zhu The University of Texas at San Antonio Hakan Aydin George Mason University Outline Background and Motivation
More informationCPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server: Which is More Effective and When
1 CPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server: Which is More Effective and When Inkwon Hwang, Student Member and Massoud Pedram, Fellow, IEEE Abstract
More informationSoftware optimization for petaflops/s scale Quantum Monte Carlo simulations
Software optimization for petaflops/s scale Quantum Monte Carlo simulations A. Scemama 1, M. Caffarel 1, E. Oseret 2, W. Jalby 2 1 Laboratoire de Chimie et Physique Quantiques / IRSAMC, Toulouse, France
More informationSaving Energy in Sparse and Dense Linear Algebra Computations
Saving Energy in Sparse and Dense Linear Algebra Computations P. Alonso, M. F. Dolz, F. Igual, R. Mayo, E. S. Quintana-Ortí, V. Roca Univ. Politécnica Univ. Jaume I The Univ. of Texas de Valencia, Spain
More informationApril 2004 AS7C3256A
pril 2004 S7C3256 3.3V 32K X 8 CMOS SRM (Common I/O) Features Pin compatible with S7C3256 Industrial and commercial temperature options Organization: 32,768 words 8 bits High speed - 10/12/15/20 ns address
More informationSymmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano
Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Introduction Introduction We wanted to parallelize a serial algorithm for the pivoted Cholesky factorization
More informationHeterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry
Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry and Eugene DePrince Argonne National Laboratory (LCF and CNM) (Eugene moved to Georgia Tech last week)
More informationELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers
ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional
More informationChapter 7. Sequential Circuits Registers, Counters, RAM
Chapter 7. Sequential Circuits Registers, Counters, RAM Register - a group of binary storage elements suitable for holding binary info A group of FFs constitutes a register Commonly used as temporary storage
More informationParallelization of the Dirac operator. Pushan Majumdar. Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata
Parallelization of the Dirac operator Pushan Majumdar Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata Outline Introduction Algorithms Parallelization Comparison of performances Conclusions
More informationMassively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling
2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)
More informationSTATISTICAL PERFORMANCE
STATISTICAL PERFORMANCE PROVISIONING AND ENERGY EFFICIENCY IN DISTRIBUTED COMPUTING SYSTEMS Nikzad Babaii Rizvandi 1 Supervisors: Prof.Albert Y.Zomaya Prof. Aruna Seneviratne OUTLINE Introduction Background
More information