Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing

Size: px
Start display at page:

Download "Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing"

Transcription

1 Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing Prasanna Balaprakash, Leonardo A. Bautista Gomez, Slim Bouguerra, Stefan M. Wild, Franck Cappello, and Paul D. Hovland ANL PMBS SC 14 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 1

2 Context and motivations Context: The Need For Speed Figure : From leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 2

3 Context and motivations Motivation: Failures Sequoia MTBF 1 day. Blue Waters 2 nodes failure per day. Titan MTBF < 1 day. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 3

4 Context and motivations Motivation: Failures Sequoia MTBF 1 day. Blue Waters 2 nodes failure per day. Titan MTBF < 1 day. 20 % of the computation is wasted in recovery and re-execution (Implies energy waste leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 3

5 Context and motivations Motivation: Failures Sequoia MTBF 1 day. Blue Waters 2 nodes failure per day. Titan MTBF < 1 day. 20 % of the computation is wasted in recovery and re-execution (Implies energy waste Exascale: The number of components for both memory and processors will increase by a factor of 100. Shrinking the circuit sizes and running at lower voltages, increases the SDC probability. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 3

6 Context and motivations Motivation: Failures Sequoia MTBF 1 day. Blue Waters 2 nodes failure per day. Titan MTBF < 1 day. 20 % of the computation is wasted in recovery and re-execution (Implies energy waste Exascale: The number of components for both memory and processors will increase by a factor of 100. Shrinking the circuit sizes and running at lower voltages, increases the SDC probability. In exascale failures will occur at higher frequency, optimistic MTBF is couple of hours. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 3

7 Context and motivations Motivation: Energy The power draw of the interconnect on Blue Gene/Q appears to be independent of load. CPU varies only by some 20% Power draw under different loads is DRAM change by a factor of 2 or more. Exascale: Data movement and IO will consume more than 70% of the total system power (most of the 20 MW will go just to power the 10 PB of total system memory. Flops/Watt VS Communication/Watts Avoid checkpointing and data movement do more re-computations. VS Avoid re-computations via checkpointing more often. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 4

8 Context and motivations Related Work ECOTFIT, Diouri et al Blocking checkpointing Message logging Conclusion no big tradeoff observed. Meneses et al Parallel recovery vs global recovery Used RALP API (No communication or IO are covered Parallel is better since it reduces the overall time Aupy et al Blocking vs no-blocking single level. No experiment. (ANL Tradeoffs between Energy and Run Time PMBS SC 14 5

9 Context and motivations Related Work ECOTFIT, Diouri et al Blocking checkpointing Message logging Conclusion no big tradeoff observed. Meneses et al Parallel recovery vs global recovery Used RALP API (No communication or IO are covered Parallel is better since it reduces the overall time Aupy et al Blocking vs no-blocking single level. No experiment. The missing episode What about multilevel checkpointing? (ANL Tradeoffs between Energy and Run Time PMBS SC 14 5

10 Problem formulation and notations 1 Context and motivations 2 Problem formulation and notations Multilevel checkpointing Energy model Multiobjective optimization 3 Simulation and experimentations Experimentations Tradeoff analysis 4 Conclusion and future work leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 6

11 Problem formulation and notations Multilevel checkpointing Multilevel Checkpointing Multiple levels of storage (DRAM, NVM, PFS. Coupled with data replication and erasure codes. Low levels offer high performance and partial reliability. High levels offer high reliability but impose large overhead. Different ckpt. levels have different frequencies. After a failure the application restart from the lowest available level. If unable to recover, try next level of checkpoint (Further in the past. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 7

12 Problem formulation and notations Multilevel checkpointing Wasted time model L levels of checkpoint (4 with FTI Checkpoint strategy: τ i, for i = 1 L Checkpoint cost: c i for level i r i time for a restart from level i d i downtime after a failure affecting level i. µ i rate of failures affecting level i. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 8

13 Problem formulation and notations Energy model Wasted energy model Pi c Pi r power for a level i checkpoint Watts. power for a restart from level i Watts. P a power for a failure-free computation without checkpointing Watts. µ i rate for failure affecting level i. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 9

14 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint time W ch = L i=1 c i τ i i 1 + µ i τ i j=1 c j 2τ j leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 10

15 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint time W ch = L i=1 c i τ i i 1 + µ i τ i j=1 c j 2τ j Rework time W rew = L i=1 µ i τ i 2 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 10

16 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint time W ch = L i=1 c i τ i i 1 + µ i τ i j=1 c j 2τ j Rework time W rew = L i=1 µ i τ i 2 Downtime and restart time W down = L µ i (r i + d i i=1 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 10

17 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint wasted energy E ch = L i=1 P c i c i i 1 Pj c + µ i τ c j i τ i 2τ j j=1 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 11

18 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint wasted energy E ch = L i=1 P c i c i i 1 Pj c + µ i τ c j i τ i 2τ j j=1 Rework wasted energy E rew = L i=1 P a µ iτ i 2 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 11

19 Problem formulation and notations Multiobjective optimization Problem solving Checkpoint wasted energy E ch = L i=1 P c i c i i 1 Pj c + µ i τ c j i τ i 2τ j j=1 Rework wasted energy E rew = L i=1 P a µ iτ i 2 Downtime and restart wasted energy E down = L Pi r µ i (r i + d i i=1 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 11

20 Problem formulation and notations Multiobjective optimization Total wasted time W = L i=1 ( ( c i + µ iτ i 1 + τ i 2 i 1 j=1 c j 2τ j + µ i (r i + d i (1 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 12

21 Problem formulation and notations Multiobjective optimization Total wasted time W = L i=1 ( ( c i + µ iτ i 1 + τ i 2 i 1 j=1 c j 2τ j + µ i (r i + d i (1 Total wasted energy E = ( L P c i c i i=1 τ i ( + µ i τ P a i 2 + i 1 Pj cc j j=1 2τ j + L i=1 Pr i µ i(r i + d i, (2 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 12

22 Problem formulation and notations Multiobjective optimization First derivatives E τ i W τ i = µ i 2 = µ i 2 ( ( i j=1 i 1 P a + j=1 c j τ j P c j c j τ j c i τi 2 ( Pc i c i τ 2 i 1 + ( L j=i µ j τ j 2 L j=i+1 µ j τ j 2 (3 (4 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 13

23 Problem formulation and notations Multiobjective optimization First derivatives E τ i W τ i = µ i 2 = µ i 2 ( ( i j=1 i 1 P a + j=1 c j τ j P c j c j τ j c i τi 2 ( Pc i c i τ 2 i 1 + ( L j=i µ j τ j 2 L j=i+1 µ j τ j 2 (3 (4 Solutions τ W i = τ E i = c i(2 + L j=i+1 µ jτ W µ i (1 + i 1 c j j=1 τ W j j ρ ic i (2 + L j=i+1 µ jτ E µ i (1 + i 1 j=1 ρ j c j τj E j ρ i = P c i /Pa leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 13

24 Problem formulation and notations Multiobjective optimization Solutions ρ i = P c i /Pa τ W i = τ E i = c i(2 + L j=i+1 µ jτ W µ i (1 + i 1 c j j=1 τ W j j ρ ic i (2 + L j=i+1 µ jτ E µ i (1 + i 1 j=1 ρ j c j τj E j Solutions For one single level we have : τ W = 2c/µ and τ E = τ W P c /P a Whenever P c P a, we have that τ W τ E, and hence the two objectives are conflicting. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 14

25 Problem formulation and notations Multiobjective optimization Pareto front Definition τ i is said to be Pareto-optimal if it is not dominated by any other τ j. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 15

26 Problem formulation and notations Multiobjective optimization Pareto front Definition τ i is said to be Pareto-optimal if it is not dominated by any other τ j. Convex combination If the Pareto front is convex, any point on the front can be obtained by minimizing a linear combination of the objectives. Theorem f λ (τ = λw(τ + (1 λe(τ, for λ [0, 1]. The Hessian 2 ττ W(τ is diagonally dominant, and thus W is a convex function of τ over the domain (same for E(τ leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 15

27 Problem formulation and notations Multiobjective optimization Pareto front Convex combination If the Pareto front is convex, any point on the front can be obtained by minimizing a linear combination of the objectives. Theorem f λ (τ = λw(τ + (1 λe(τ, for λ [0, 1]. The Hessian 2 ττ W(τ is diagonally dominant, and thus W is a convex function of τ over the domain (same for E(τ τ i (λ = ( c i (λ + (1 λpi c 2 + L µ j τj j=i+1 (, (5 µ i λ + (1 λp a + i 1 (λ + (1 λpj c c j j=1 τ j leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 15

28 Problem formulation and notations Multiobjective optimization Pareto front Theorem The Hessian 2 ττ W(τ is diagonally dominant, and thus W is a convex function of τ over the domain (same for E(τ τ i (λ = ( c i (λ + (1 λpi c 2 + L µ j τj j=i+1 (, (5 µ i λ + (1 λp a + i 1 (λ + (1 λpj c c j j=1 τ j Case one level (L = 1 τ (λ = τ W λ + (1 λp c λ + (1 λp a. (6 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 15

29 Problem formulation and notations Multiobjective optimization Pareto Front (ANL Tradeoffs between Energy and Run Time PMBS SC 14 16

30 Simulation and experimentations 1 Context and motivations 2 Problem formulation and notations Multilevel checkpointing Energy model Multiobjective optimization 3 Simulation and experimentations Experimentations Tradeoff analysis 4 Conclusion and future work leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 17

31 Simulation and experimentations Experimentations Platform Mira 10 PF IBM Blue Gene/Q (BG/Q 49,152 nodes organized in 48 racks 16 cores of 1.6 GHz PowerPC A2 and 16 GB of DDR3 memory. 5-D torus network. Vesta (developmental platform for Mira Same as Mira s but with 2,048 nodes leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 18

32 Simulation and experimentations Experimentations Platform Mira 10 PF IBM Blue Gene/Q (BG/Q 49,152 nodes organized in 48 racks 16 cores of 1.6 GHz PowerPC A2 and 16 GB of DDR3 memory. 5-D torus network. Vesta (developmental platform for Mira Same as Mira s but with 2,048 nodes MonEQ for power measurement (resolution of 560 ms Chip core DRAM Network Collect power data only at the node card level (every 32 nodes The library can not measure the I/O power consumption!! leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 18

33 Simulation and experimentations Experimentations Applications LAMMPS Production-level molecular dynamics application. Lennard-Jones simulation of 1.3 billion atoms. Checkpoint size per node 200 MB ( 100GB application s footprint Checkpoints intervals 4, 8, 16 and 32 minutes. CORAL Qbox: first-principles molecular dynamics AMG: is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. LULESH: performs hydrodynamics stencil calculations minife: is a finite-element code. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 19

34 Simulation and experimentations Experimentations LAMMPS: synchronous Figure : Synchronous multilevel checkpointing 1.3 billion atoms Lennard-Jones simulation. 512 nodes running 64 MPI ranks per node (32,678 proc.. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 20

35 Simulation and experimentations Experimentations LAMMPS: asynchronous Figure : Asynchronous multilevel checkpointing 1.3 billion atoms Lennard-Jones simulation. 512 nodes running 64 MPI ranks per node (32,678 proc.. leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 21

36 Simulation and experimentations Experimentations CORAL benchmark (a LULESH (b MiniFE (c AMG (d Qbox 32 nodes. Each(ANL application Tradeoffs is run between with Energy a and configuration Run Time PMBS of 16workshop ranks SC 14 22

37 Simulation and experimentations Tradeoff analysis Pareto fronts: Level 4 power consumption (e Level 4 power consumption P c 4 leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 23

38 Simulation and experimentations Tradeoff analysis Pareto fronts :Computation power (f Computation power P a leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 24

39 Simulation and experimentations Tradeoff analysis Pareto fronts : Power ratio Pc P a (g Power ratio Pc P a leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 25

40 Conclusion and future work 1 Context and motivations 2 Problem formulation and notations Multilevel checkpointing Energy model Multiobjective optimization 3 Simulation and experimentations Experimentations Tradeoff analysis 4 Conclusion and future work leobago@anl.gov (ANL Tradeoffs between Energy and Run Time PMBS SC 14 26

41 Conclusion and future work Summary Analytical models of performance and energy for multilevel checkpoint schemes. The pareto-front is obtained using convex combination. Power measurement experiments with production-level scientific applications running on over 32,000 MPI processes: The relative energy overhead of using FTI is minor and thus the tradeoffs are relatively small. Richer tradeoff exist when the power consumption of checkpointing is greater than that of the computation. (ANL Tradeoffs between Energy and Run Time PMBS SC 14 27

42 Conclusion and future work What is Next? Analyzing power profile of different fault tolerance protocols such as full/partial replication and message logging. The viability of replication with respect to the power cap of future exascale platforms (ANL Tradeoffs between Energy and Run Time PMBS SC 14 28

43 Conclusion and future work Questions? Thank You!! (ANL Tradeoffs between Energy and Run Time PMBS SC 14 29

Tradeoff between Reliability and Power Management

Tradeoff between Reliability and Power Management Tradeoff between Reliability and Power Management 9/1/2005 FORGE Lee, Kyoungwoo Contents 1. Overview of relationship between reliability and power management 2. Dakai Zhu, Rami Melhem and Daniel Moss e,

More information

Resilient and energy-aware algorithms

Resilient and energy-aware algorithms Resilient and energy-aware algorithms Anne Benoit ENS Lyon Anne.Benoit@ens-lyon.fr http://graal.ens-lyon.fr/~abenoit CR02-2016/2017 Anne.Benoit@ens-lyon.fr CR02 Resilient and energy-aware algorithms 1/

More information

Embedded Systems Design: Optimization Challenges. Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden

Embedded Systems Design: Optimization Challenges. Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden of /4 4 Embedded Systems Design: Optimization Challenges Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden Outline! Embedded systems " Example area: automotive electronics " Embedded systems

More information

A Tale of Two Erasure Codes in HDFS

A Tale of Two Erasure Codes in HDFS A Tale of Two Erasure Codes in HDFS Dynamo Mingyuan Xia *, Mohit Saxena +, Mario Blaum +, and David A. Pease + * McGill University, + IBM Research Almaden FAST 15 何军权 2015-04-30 1 Outline Introduction

More information

Optimal checkpointing periods with fail-stop and silent errors

Optimal checkpointing periods with fail-stop and silent errors Optimal checkpointing periods with fail-stop and silent errors Anne Benoit ENS Lyon Anne.Benoit@ens-lyon.fr http://graal.ens-lyon.fr/~abenoit 3rd JLESC Summer School June 30, 2016 Anne.Benoit@ens-lyon.fr

More information

Checkpoint Scheduling

Checkpoint Scheduling Checkpoint Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute of

More information

Coping with silent and fail-stop errors at scale by combining replication and checkpointing

Coping with silent and fail-stop errors at scale by combining replication and checkpointing Coping with silent and fail-stop errors at scale by combining replication and checkpointing Anne Benoit a, Aurélien Cavelan b, Franck Cappello c, Padma Raghavan d, Yves Robert a,e, Hongyang Sun d, a Ecole

More information

Fault-Tolerant Techniques for HPC

Fault-Tolerant Techniques for HPC Fault-Tolerant Techniques for HPC Yves Robert Laboratoire LIP, ENS Lyon Institut Universitaire de France University Tennessee Knoxville Yves.Robert@inria.fr http://graal.ens-lyon.fr/~yrobert/htdc-flaine.pdf

More information

Energy-efficient scheduling

Energy-efficient scheduling Energy-efficient scheduling Guillaume Aupy 1, Anne Benoit 1,2, Paul Renaud-Goud 1 and Yves Robert 1,2,3 1. Ecole Normale Supérieure de Lyon, France 2. Institut Universitaire de France 3. University of

More information

Periodic I/O Scheduling for Supercomputers

Periodic I/O Scheduling for Supercomputers Periodic I/O Scheduling for Supercomputers Guillaume Aupy 1, Ana Gainaru 2, Valentin Le Fèvre 3 1 Inria & U. of Bordeaux 2 Vanderbilt University 3 ENS Lyon & Inria PMBS Workshop, November 217 Slides available

More information

Cactus Tools for Petascale Computing

Cactus Tools for Petascale Computing Cactus Tools for Petascale Computing Erik Schnetter Reno, November 2007 Gamma Ray Bursts ~10 7 km He Protoneutron Star Accretion Collapse to a Black Hole Jet Formation and Sustainment Fe-group nuclei Si

More information

Performance Analysis of Lattice QCD Application with APGAS Programming Model

Performance Analysis of Lattice QCD Application with APGAS Programming Model Performance Analysis of Lattice QCD Application with APGAS Programming Model Koichi Shirahata 1, Jun Doi 2, Mikio Takeuchi 2 1: Tokyo Institute of Technology 2: IBM Research - Tokyo Programming Models

More information

Adaptive and Power-Aware Resilience for Extreme-scale Computing

Adaptive and Power-Aware Resilience for Extreme-scale Computing Adaptive and Power-Aware Resilience for Extreme-scale Computing Xiaolong Cui, Taieb Znati, Rami Melhem Computer Science Department University of Pittsburgh Pittsburgh, USA Email: {mclarencui, znati, melhem}@cs.pitt.edu

More information

Rollback-Recovery. Uncoordinated Checkpointing. p!! Easy to understand No synchronization overhead. Flexible. To recover from a crash:

Rollback-Recovery. Uncoordinated Checkpointing. p!! Easy to understand No synchronization overhead. Flexible. To recover from a crash: Rollback-Recovery Uncoordinated Checkpointing Easy to understand No synchronization overhead p!! Flexible can choose when to checkpoint To recover from a crash: go back to last checkpoint restart How (not)to

More information

Some thoughts about energy efficient application execution on NEC LX Series compute clusters

Some thoughts about energy efficient application execution on NEC LX Series compute clusters Some thoughts about energy efficient application execution on NEC LX Series compute clusters G. Wellein, G. Hager, J. Treibig, M. Wittmann Erlangen Regional Computing Center & Department of Computer Science

More information

Fault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing

Fault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing 20 Years of Innovative Computing Knoxville, Tennessee March 26 th, 200 Fault Tolerate Linear Algebra: Survive Fail-Stop Failures without Checkpointing Zizhong (Jeffrey) Chen zchen@mines.edu Colorado School

More information

Efficient checkpoint/verification patterns

Efficient checkpoint/verification patterns Efficient checkpoint/verification patterns Anne Benoit a, Saurabh K. Raina b, Yves Robert a,c a Laboratoire LIP, École Normale Supérieure de Lyon, France b Department of CSE/IT, Jaypee Institute of Information

More information

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.

More information

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption EE115C Winter 2017 Digital Electronic Circuits Lecture 6: Power Consumption Four Key Design Metrics for Digital ICs Cost of ICs Reliability Speed Power EE115C Winter 2017 2 Power and Energy Challenges

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

CRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY

CRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY CRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY MEMSYS-2017 SWAMIT TANNU DOUG CARMEAN MOINUDDIN QURESHI Why Quantum Computers? 2 Molecule and Material Simulations

More information

CMP 338: Third Class

CMP 338: Third Class CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does

More information

How to deal with uncertainties and dynamicity?

How to deal with uncertainties and dynamicity? How to deal with uncertainties and dynamicity? http://graal.ens-lyon.fr/ lmarchal/scheduling/ 19 novembre 2012 1/ 37 Outline 1 Sensitivity and Robustness 2 Analyzing the sensitivity : the case of Backfilling

More information

On scheduling the checkpoints of exascale applications

On scheduling the checkpoints of exascale applications On scheduling the checkpoints of exascale applications Marin Bougeret, Henri Casanova, Mikaël Rabie, Yves Robert, and Frédéric Vivien INRIA, École normale supérieure de Lyon, France Univ. of Hawai i at

More information

Distributed by: www.jameco.com 1-800-831-4242 The content and copyrights of the attached material are the property of its owner. September 2001 S7C256 5V/3.3V 32K X 8 CMOS SRM (Common I/O) Features S7C256

More information

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Consensus. One Reason: Impossibility of Consensus. Let s Consider This Recap: Finger Table Finding a using fingers Distributed Systems onsensus Steve Ko omputer Sciences and Engineering University at Buffalo N102 86 + 2 4 N86 20 + 2 6 N20 2 Let s onsider This

More information

NCEP Applications -- HPC Performance and Strategies. Mark Iredell software team lead USDOC/NOAA/NWS/NCEP/EMC

NCEP Applications -- HPC Performance and Strategies. Mark Iredell software team lead USDOC/NOAA/NWS/NCEP/EMC NCEP Applications -- HPC Performance and Strategies Mark Iredell software team lead USDOC/NOAA/NWS/NCEP/EMC Motivation and Outline Challenges in porting NCEP applications to WCOSS and future operational

More information

Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners

Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners José I. Aliaga Leveraging task-parallelism in energy-efficient ILU preconditioners Universidad Jaime I (Castellón, Spain) José I. Aliaga

More information

The Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems

The Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems The Green Index (TGI): A Metric for Evalua:ng Energy Efficiency in HPC Systems Wu Feng and Balaji Subramaniam Metrics for Energy Efficiency Energy- Delay Product (EDP) Used primarily in circuit design

More information

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM Mark McDermott Electrical and Computer Engineering The University of Texas at Austin 9/27/18 VLSI-1 Class Notes Why Clocking?

More information

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. July 2012

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. July 2012 Weather Research and Forecasting (WRF) Performance Benchmark and Profiling July 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,

More information

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009 Parallel Preconditioning of Linear Systems based on ILUPACK for Multithreaded Architectures J.I. Aliaga M. Bollhöfer 2 A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ.

More information

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel? CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?

More information

Energy-aware checkpointing of divisible tasks with soft or hard deadlines

Energy-aware checkpointing of divisible tasks with soft or hard deadlines Energy-aware checkpointing of divisible tasks with soft or hard deadlines Guillaume Aupy 1, Anne Benoit 1,2, Rami Melhem 3, Paul Renaud-Goud 1 and Yves Robert 1,2,4 1. Ecole Normale Supérieure de Lyon,

More information

Adjoint-Based Uncertainty Quantification and Sensitivity Analysis for Reactor Depletion Calculations

Adjoint-Based Uncertainty Quantification and Sensitivity Analysis for Reactor Depletion Calculations Adjoint-Based Uncertainty Quantification and Sensitivity Analysis for Reactor Depletion Calculations Hayes F. Stripling 1 Marvin L. Adams 1 Mihai Anitescu 2 1 Department of Nuclear Engineering, Texas A&M

More information

Lecture 5 Fault Modeling

Lecture 5 Fault Modeling Lecture 5 Fault Modeling Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults Single stuck-at faults Fault equivalence Fault dominance and checkpoint theorem Classes

More information

EDF Feasibility and Hardware Accelerators

EDF Feasibility and Hardware Accelerators EDF Feasibility and Hardware Accelerators Andrew Morton University of Waterloo, Waterloo, Canada, arrmorton@uwaterloo.ca Wayne M. Loucks University of Waterloo, Waterloo, Canada, wmloucks@pads.uwaterloo.ca

More information

12 - The Tie Set Method

12 - The Tie Set Method 12 - The Tie Set Method Definitions: A tie set V is a set of components whose success results in system success, i.e. the presence of all components in any tie set connects the input to the output in the

More information

Asynchronous Parareal in time discretization for partial differential equations

Asynchronous Parareal in time discretization for partial differential equations Asynchronous Parareal in time discretization for partial differential equations Frédéric Magoulès, Guillaume Gbikpi-Benissan April 7, 2016 CentraleSupélec IRT SystemX Outline of the presentation 01 Introduction

More information

A Piggybacking Design Framework for Read- and- Download- efficient Distributed Storage Codes. K. V. Rashmi, Nihar B. Shah, Kannan Ramchandran

A Piggybacking Design Framework for Read- and- Download- efficient Distributed Storage Codes. K. V. Rashmi, Nihar B. Shah, Kannan Ramchandran A Piggybacking Design Framework for Read- and- Download- efficient Distributed Storage Codes K V Rashmi, Nihar B Shah, Kannan Ramchandran Outline IntroducGon & MoGvaGon Measurements from Facebook s Warehouse

More information

Behavioral Simulations in MapReduce

Behavioral Simulations in MapReduce Behavioral Simulations in MapReduce Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker White Cornell University 1 What are Behavioral Simulations?

More information

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM

More information

Scalable Tools for Debugging Non-Deterministic MPI Applications

Scalable Tools for Debugging Non-Deterministic MPI Applications Scalable Tools for Debugging Non-Deterministic MPI Applications ReMPI: MPI Record-and-Replay tool Scalable Tools Workshop August 2nd, 2016 Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Mar>n

More information

ab initio Electronic Structure Calculations

ab initio Electronic Structure Calculations ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab

More information

Cartesius Opening. Jean-Marc DENIS. June, 14th, International Business Director Extreme Computing Business Unit

Cartesius Opening. Jean-Marc DENIS. June, 14th, International Business Director Extreme Computing Business Unit Cartesius Opening June, 14th, 2013 Jean-Marc DENIS International Business Director Extreme Computing Business Unit 1 Cartesius (Renatus, 1596 1650) (*) René Descartes (French: [ʁəne dekaʁt]; Latinized:

More information

An Integrative Model for Parallelism

An Integrative Model for Parallelism An Integrative Model for Parallelism Victor Eijkhout ICERM workshop 2012/01/09 Introduction Formal part Examples Extension to other memory models Conclusion tw-12-exascale 2012/01/09 2 Introduction tw-12-exascale

More information

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1 1 Deparment of Computer

More information

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain

More information

Checkpointing strategies for parallel jobs

Checkpointing strategies for parallel jobs Checkpointing strategies for parallel jobs Marin Bougeret ENS Lyon, France Marin.Bougeret@ens-lyon.fr Yves Robert ENS Lyon, France Yves.Robert@ens-lyon.fr Henri Casanova Univ. of Hawai i at Mānoa, Honolulu,

More information

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Welcome to MCS 572. content and organization expectations of the course. definition and classification Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson

More information

A Flexible Checkpoint/Restart Model in Distributed Systems

A Flexible Checkpoint/Restart Model in Distributed Systems A Flexible Checkpoint/Restart Model in Distributed Systems Mohamed-Slim Bouguerra 1,2, Thierry Gautier 2, Denis Trystram 1, and Jean-Marc Vincent 1 1 Grenoble University, ZIRST 51, avenue Jean Kuntzmann

More information

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm Penporn Koanantakool and Katherine Yelick {penpornk, yelick}@cs.berkeley.edu Computer Science Division, University of California,

More information

2.5D algorithms for distributed-memory computing

2.5D algorithms for distributed-memory computing ntroduction for distributed-memory computing C Berkeley July, 2012 1/ 62 ntroduction Outline ntroduction Strong scaling 2.5D factorization 2/ 62 ntroduction Strong scaling Solving science problems faster

More information

A Hybrid Method for the Wave Equation. beilina

A Hybrid Method for the Wave Equation.   beilina A Hybrid Method for the Wave Equation http://www.math.unibas.ch/ beilina 1 The mathematical model The model problem is the wave equation 2 u t 2 = (a 2 u) + f, x Ω R 3, t > 0, (1) u(x, 0) = 0, x Ω, (2)

More information

Scalable numerical algorithms for electronic structure calculations

Scalable numerical algorithms for electronic structure calculations Scalable numerical algorithms for electronic structure calculations Edgar Solomonik C Berkeley July, 2012 Edgar Solomonik Cyclops Tensor Framework 1/ 73 Outline Introduction Motivation: Coupled Cluster

More information

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling 2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)

More information

Efficient checkpoint/verification patterns for silent error detection

Efficient checkpoint/verification patterns for silent error detection Efficient checkpoint/verification patterns for silent error detection Anne Benoit 1, Saurabh K. Raina 2 and Yves Robert 1,3 1. LIP, École Normale Supérieure de Lyon, CNRS & INRIA, France 2. Jaypee Institute

More information

Coding for loss tolerant systems

Coding for loss tolerant systems Coding for loss tolerant systems Workshop APRETAF, 22 janvier 2009 Mathieu Cunche, Vincent Roca INRIA, équipe Planète INRIA Rhône-Alpes Mathieu Cunche, Vincent Roca The erasure channel Erasure codes Reed-Solomon

More information

Efficient algorithms for symmetric tensor contractions

Efficient algorithms for symmetric tensor contractions Efficient algorithms for symmetric tensor contractions Edgar Solomonik 1 Department of EECS, UC Berkeley Oct 22, 2013 1 / 42 Edgar Solomonik Symmetric tensor contractions 1/ 42 Motivation The goal is to

More information

Early-stage Power Grid Analysis for Uncertain Working Modes

Early-stage Power Grid Analysis for Uncertain Working Modes Early-stage Power Grid Analysis for Uncertain Working Modes Haifeng Qian Department of ECE University of Minnesota Minneapolis, MN 55414 qianhf@ece.umn.edu Sani R. Nassif IBM Austin Research Labs 11400

More information

Lecture 2: Metrics to Evaluate Systems

Lecture 2: Metrics to Evaluate Systems Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video

More information

MPI at MPI. Jens Saak. Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory

MPI at MPI. Jens Saak. Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory MAX PLANCK INSTITUTE November 5, 2010 MPI at MPI Jens Saak Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory FOR DYNAMICS OF COMPLEX TECHNICAL

More information

Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model

Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model Yang Shang 1, Chun Zhang 1, Hao Yu 1, Chuan Seng Tan 1, Xin Zhao 2, Sung Kyu Lim 2 1 School of Electrical

More information

EE241 - Spring 2006 Advanced Digital Integrated Circuits

EE241 - Spring 2006 Advanced Digital Integrated Circuits EE241 - Spring 2006 Advanced Digital Integrated Circuits Lecture 20: Asynchronous & Synchronization Self-timed and Asynchronous Design Functions of clock in synchronous design 1) Acts as completion signal

More information

Lecture 19. Architectural Directions

Lecture 19. Architectural Directions Lecture 19 Architectural Directions Today s lecture Advanced Architectures NUMA Blue Gene 2010 Scott B. Baden / CSE 160 / Winter 2010 2 Final examination Announcements Thursday, March 17, in this room:

More information

Lightweight Superscalar Task Execution in Distributed Memory

Lightweight Superscalar Task Execution in Distributed Memory Lightweight Superscalar Task Execution in Distributed Memory Asim YarKhan 1 and Jack Dongarra 1,2,3 1 Innovative Computing Lab, University of Tennessee, Knoxville, TN 2 Oak Ridge National Lab, Oak Ridge,

More information

Computer Architecture

Computer Architecture Lecture 2: Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture CPU Evolution What is? 2 Outline Measurements and metrics : Performance, Cost, Dependability, Power Guidelines

More information

Analog Computation in Flash Memory for Datacenter-scale AI Inference in a Small Chip

Analog Computation in Flash Memory for Datacenter-scale AI Inference in a Small Chip 1 Analog Computation in Flash Memory for Datacenter-scale AI Inference in a Small Chip Dave Fick, CTO/Founder Mike Henry, CEO/Founder About Mythic 2 Focused on high-performance Edge AI Full stack co-design:

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Parallel programming using MPI Analysis and optimization Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Outline l Parallel programming: Basic definitions l Choosing right algorithms: Optimal serial and

More information

A Fast and Scalable Low Dimensional Solver for Charged Particle Dynamics in Large Particle Accelerators

A Fast and Scalable Low Dimensional Solver for Charged Particle Dynamics in Large Particle Accelerators A Fast and Scalable Low Dimensional Solver for Charged Particle Dynamics in Large Particle Accelerators Yves Ineichen 1,2,3, Andreas Adelmann 2, Costas Bekas 3, Alessandro Curioni 3, Peter Arbenz 1 1 ETH

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.

More information

Cyclops Tensor Framework

Cyclops Tensor Framework Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r

More information

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts

More information

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur

Agreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Agreement Protocols CS60002: Distributed Systems Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Classification of Faults Based on components that failed Program

More information

Scalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults

Scalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults Scalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults K.Morris, F.Rizzi, K.Sargsyan, K.Dahlgren, P.Mycek, C.Safta, O.LeMaitre, O.Knio, B.Debusschere Sandia National

More information

From here to there. peta exa-scale transient simulation. Lawrence Mitchell 1, 27th March

From here to there. peta exa-scale transient simulation. Lawrence Mitchell 1, 27th March From here to there challenges for peta exa-scale transient simulation Lawrence Mitchell 1, 27th March 2017 1 Departments of Computing and Mathematics, Imperial College London lawrence.mitchell@imperial.ac.uk

More information

Round-off error propagation and non-determinism in parallel applications

Round-off error propagation and non-determinism in parallel applications Round-off error propagation and non-determinism in parallel applications Vincent Baudoui (Argonne/Total SA) vincent.baudoui@gmail.com Franck Cappello (Argonne/INRIA/UIUC-NCSA) Georges Oppenheim (Paris-Sud

More information

Revisiting Memory Errors in Large-Scale Production Data Centers

Revisiting Memory Errors in Large-Scale Production Data Centers Revisiting Memory Errors in Large-Scale Production Da Centers Analysis and Modeling of New Trends from the Field Justin Meza Qiang Wu Sanjeev Kumar Onur Mutlu Overview Study of DRAM reliability: on modern

More information

DRAM Reliability: Parity, ECC, Chipkill, Scrubbing. Alpha Particle or Cosmic Ray. electron-hole pairs. silicon. DRAM Memory System: Lecture 13

DRAM Reliability: Parity, ECC, Chipkill, Scrubbing. Alpha Particle or Cosmic Ray. electron-hole pairs. silicon. DRAM Memory System: Lecture 13 slide 1 DRAM Reliability: Parity, ECC, Chipkill, Scrubbing Alpha Particle or Cosmic Ray electron-hole pairs silicon Alpha Particles: Radioactive impurity in package material slide 2 - Soft errors were

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

ERLANGEN REGIONAL COMPUTING CENTER

ERLANGEN REGIONAL COMPUTING CENTER ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,

More information

2M x 32 Bit 5V FPM SIMM. Fast Page Mode (FPM) DRAM SIMM S51T04JD Pin 2Mx32 FPM SIMM Unbuffered, 1k Refresh, 5V. General Description.

2M x 32 Bit 5V FPM SIMM. Fast Page Mode (FPM) DRAM SIMM S51T04JD Pin 2Mx32 FPM SIMM Unbuffered, 1k Refresh, 5V. General Description. Fast Page Mode (FPM) DRAM SIMM 322006-S51T04JD Pin 2Mx32 Unbuffered, 1k Refresh, 5V General Description The module is a 2Mx32 bit, 4 chip, 5V, 72 Pin SIMM module consisting of (4) 1Mx16 (SOJ) DRAM. The

More information

A Data Communication Reliability and Trustability Study for Cluster Computing

A Data Communication Reliability and Trustability Study for Cluster Computing A Data Communication Reliability and Trustability Study for Cluster Computing Speaker: Eduardo Colmenares Midwestern State University Wichita Falls, TX HPC Introduction Relevant to a variety of sciences,

More information

Deadlock. CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC]

Deadlock. CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC] Deadlock CSE 2431: Introduction to Operating Systems Reading: Chap. 7, [OSC] 1 Outline Resources Deadlock Deadlock Prevention Deadlock Avoidance Deadlock Detection Deadlock Recovery 2 Review: Synchronization

More information

High Performance Computing

High Performance Computing Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2 nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),

More information

The Performance Evolution of the Parallel Ocean Program on the Cray X1

The Performance Evolution of the Parallel Ocean Program on the Cray X1 The Performance Evolution of the Parallel Ocean Program on the Cray X1 Patrick H. Worley Oak Ridge National Laboratory John Levesque Cray Inc. 46th Cray User Group Conference May 18, 2003 Knoxville Marriott

More information

Efficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems

Efficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems Efficient Power Management Schemes for Dual-Processor Fault-Tolerant Systems Yifeng Guo, Dakai Zhu The University of Texas at San Antonio Hakan Aydin George Mason University Outline Background and Motivation

More information

CPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server: Which is More Effective and When

CPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server: Which is More Effective and When 1 CPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server: Which is More Effective and When Inkwon Hwang, Student Member and Massoud Pedram, Fellow, IEEE Abstract

More information

Software optimization for petaflops/s scale Quantum Monte Carlo simulations

Software optimization for petaflops/s scale Quantum Monte Carlo simulations Software optimization for petaflops/s scale Quantum Monte Carlo simulations A. Scemama 1, M. Caffarel 1, E. Oseret 2, W. Jalby 2 1 Laboratoire de Chimie et Physique Quantiques / IRSAMC, Toulouse, France

More information

Saving Energy in Sparse and Dense Linear Algebra Computations

Saving Energy in Sparse and Dense Linear Algebra Computations Saving Energy in Sparse and Dense Linear Algebra Computations P. Alonso, M. F. Dolz, F. Igual, R. Mayo, E. S. Quintana-Ortí, V. Roca Univ. Politécnica Univ. Jaume I The Univ. of Texas de Valencia, Spain

More information

April 2004 AS7C3256A

April 2004 AS7C3256A pril 2004 S7C3256 3.3V 32K X 8 CMOS SRM (Common I/O) Features Pin compatible with S7C3256 Industrial and commercial temperature options Organization: 32,768 words 8 bits High speed - 10/12/15/20 ns address

More information

Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano

Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Introduction Introduction We wanted to parallelize a serial algorithm for the pivoted Cholesky factorization

More information

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry and Eugene DePrince Argonne National Laboratory (LCF and CNM) (Eugene moved to Georgia Tech last week)

More information

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional

More information

Chapter 7. Sequential Circuits Registers, Counters, RAM

Chapter 7. Sequential Circuits Registers, Counters, RAM Chapter 7. Sequential Circuits Registers, Counters, RAM Register - a group of binary storage elements suitable for holding binary info A group of FFs constitutes a register Commonly used as temporary storage

More information

Parallelization of the Dirac operator. Pushan Majumdar. Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata

Parallelization of the Dirac operator. Pushan Majumdar. Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata Parallelization of the Dirac operator Pushan Majumdar Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata Outline Introduction Algorithms Parallelization Comparison of performances Conclusions

More information

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling 2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)

More information

STATISTICAL PERFORMANCE

STATISTICAL PERFORMANCE STATISTICAL PERFORMANCE PROVISIONING AND ENERGY EFFICIENCY IN DISTRIBUTED COMPUTING SYSTEMS Nikzad Babaii Rizvandi 1 Supervisors: Prof.Albert Y.Zomaya Prof. Aruna Seneviratne OUTLINE Introduction Background

More information