Some thoughts about energy efficient application execution on NEC LX Series compute clusters
G. Wellein, G. Hager, J. Treibig, M. Wittmann
Erlangen Regional Computing Center & Department of Computer Science, Friedrich-Alexander-University Erlangen-Nuremberg, Germany
2 Erlangen Regional Computing Center (RRZE)
RRZE: regional HPC service provider and HPC research center
[Figure: map of German HPC centers: FZ Jülich (JuQueen, 5 PF/s), HLRS Stuttgart (Hermit, 1 PF), LRZ München (SuperMUC, 3 PF), with Hannover, Berlin and Erlangen marked]
3 Erlangen Regional Computing Center
A broad range of users: Biology, Chemistry, CFD, Material Science, Physics, Medicine, Economics, …
A broad range of clusters:
- LINUX (NEC): 560 nodes (234 TF/s), installed 2013
- LINUX (NEC): 500 nodes (64 TF/s), installed 2010
- LINUX (others): 300 nodes
- WINDOWS (other): 16 nodes (2009)
A new LINUX cluster is installed every 3 years; the decision is based on benchmarks from users.
Production nodes: CPU only (benchmark commitments for applications on GPGPU / Phi cards)
Budget: ~2.5 to 3 million USD
4 NEC cluster "Emmy", dedicated to Emmy Noether
#210 in the TOP500 as of November: … TF/s LINPACK (CPU only)
LINPACK efficiency: 97.1 % of … TF/s peak (based on 2.2 GHz)
Emmy cluster: 234 TF/s peak
- 560 compute nodes, each with 2x Intel Xeon E5-2660v2 (10-core Ivy Bridge, 2.2 GHz) and 64 GB DDR3 RAM
- 6 GPGPU nodes: 2x NVIDIA K20c
- 6 Phi nodes: 2x Intel Xeon Phi
- 4 mixed nodes: 1x K20c + 1x Phi
- QDR InfiniBand, no local disks
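As a sanity check on the CPU figures above, the theoretical double-precision peak of the 560 CPU nodes can be recomputed from the node specification. This is a sketch under one assumption that is not on the slide: each Ivy Bridge core retires one 4-wide AVX add and one 4-wide AVX multiply per cycle, i.e. 8 double-precision flops per cycle.

```python
# Theoretical double-precision peak of the CPU part of the Emmy cluster.
# Assumption (not on the slide): an Ivy Bridge core executes one 4-wide AVX
# add and one 4-wide AVX multiply per cycle, i.e. 8 DP flops/cycle/core.

def peak_tflops(nodes, sockets_per_node, cores_per_socket, clock_ghz, flops_per_cycle):
    """Peak in TF/s = total cores * clock (GHz) * flops/cycle, converted from GF/s."""
    total_cores = nodes * sockets_per_node * cores_per_socket
    return total_cores * clock_ghz * flops_per_cycle / 1000.0

cpu_peak = peak_tflops(nodes=560, sockets_per_node=2, cores_per_socket=10,
                       clock_ghz=2.2, flops_per_cycle=8)
print(f"CPU peak: {cpu_peak:.1f} TF/s")  # prints "CPU peak: 197.1 TF/s"
```

This CPU-only figure is below the 234 TF/s quoted for the whole cluster; the difference is presumably contributed by the GPGPU, Phi and mixed nodes (our reading, not stated on the slide).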
5 HPC research objectives
- Performance engineering for multi-/manycore architectures
- Efficient programming on hybrid parallel systems
- Fault tolerance
- Multicore tooling
- Application: sparse matrix schemes and lattice Boltzmann methods
Related contributions at SC13:
- SC13 Tutorial: "The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems". Presenters: G. Wellein, G. Hager, J. Treibig
- SC13 Poster: "Pattern-Driven Node-Level Performance Engineering". Authors: J. Treibig, G. Hager, G. Wellein. See you there at 5:15-7:00 today!
- SC13 Tutorial: "Hybrid MPI and OpenMP Parallel Programming". Presenters: G. Jost, R. Rabenseifner, G. Hager
- SC13 Doctoral Showcase: "A Unified Sparse Matrix Format for Heterogeneous Systems". Presenter: M. Kreutzer. Don't miss it Thursday afternoon!
6 Energy efficient application execution
Best energy efficiency? There are so many parameters to consider: clock speed? Code variants? SMT? Cores per chip?
7 What kind of application do you run?
Consider the scalability within a single multicore processor chip:
- LINPACK type: limiting factor is core execution
- STREAM type: limiting factor is saturation (memory bandwidth)
[Figure: performance vs. number of cores for both types when the clock speed is changed; labels "1.5x" and "0.6x"]
8 Simple model for energy to solution: clock speeds and core counts (1)
Performance when running t cores at clock speed f:
$$P(f,t) = \min\left(\frac{f}{f_0}\,P_0\,t,\; P_{\max}\right)$$
$f_0$: baseline clock speed; $P_0$: baseline single-core performance; $P_{\max}$: maximum (chip) performance.
Power consumption when running t cores at clock speed f:
$$W(f,t) = W_0 + \left(W_1 f + W_2 f^2\right) t$$
$W_0$: baseline power (memory, I/O, network, …); $W_0$, $W_1$, $W_2$ are determined by benchmarks.
$W_2$ = 1 W/GHz². For Intel SNB: $W_0$ = 32 W for the chip alone; $W_0$ = 73 W per socket for the whole system.
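The two model equations translate directly into code. A minimal sketch: $W_0$ and $W_2$ are the SNB values from the slide, while $W_1 = 0$, $f_0 = 2$ GHz, $P_0 = 1$ (normalized) and the STREAM-type saturation level $P_{\max} = 4$ are hypothetical placeholders chosen only for illustration.

```python
import math

def performance(f, t, f0, p0, pmax):
    """P(f, t) = min(t * P0 * f / f0, Pmax): linear scaling across cores until a
    shared bottleneck saturates (pmax = math.inf models a LINPACK-type code)."""
    return min(t * p0 * f / f0, pmax)

def power(f, t, w0, w1, w2):
    """W(f, t) = W0 + (W1 * f + W2 * f**2) * t, with f in GHz and W in watts."""
    return w0 + (w1 * f + w2 * f ** 2) * t

# W0 and W2 from the slide (Intel SNB, whole system per socket);
# W1, F0, P0 and the STREAM-type saturation level 4.0 are hypothetical.
W0, W1, W2 = 73.0, 0.0, 1.0
F0, P0 = 2.0, 1.0

for t in range(1, 9):
    linpack = performance(2.0, t, F0, P0, math.inf)
    stream = performance(2.0, t, F0, P0, 4.0)
    print(f"t={t}: LINPACK-type {linpack:.1f}, STREAM-type {stream:.1f}, "
          f"power {power(2.0, t, W0, W1, W2):.0f} W")
```

Beyond t = 4 the STREAM-type performance stays flat while the power keeps rising, so cores added past the saturation point only cost energy.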
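9 Simple model for energy to solution: clock speeds and core counts (2)
Energy to solution when running t cores at clock speed f:
$$E(f,t) = \frac{W(f,t)}{P(f,t)} = \frac{W_0 + \left(W_1 f + W_2 f^2\right) t}{\min\left(\frac{f}{f_0}\,P_0\,t,\; P_{\max}\right)}$$
Code optimization increases $P_0$ and/or $P_{\max}$ and proportionally reduces E.
- LINPACK-type applications: use all cores at the clock speed $f_{\mathrm{opt}} = \sqrt{W_0/(t\,W_2)}$. Below saturation, $E = \frac{f_0}{P_0}\left(\frac{W_0}{f\,t} + W_1 + W_2 f\right)$, and setting $\partial E/\partial f = 0$ yields this optimum.
- STREAM-type applications: minimum energy at the saturation point.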
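10 Energy to solution
Model parameters: $W_0$ = 73 W, $W_2$ = 1 W/GHz²
[Figure: energy to solution vs. number of cores; LINPACK type (baseline and optimized code, f = 2 GHz and f = 3 GHz) and STREAM type]
- LINPACK type: use all cores and a high clock speed!
- STREAM type: run all cores at the lowest clock speed that still saturates performance.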
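The LINPACK-type optimum and the STREAM-type saturation rule can be checked numerically. This sketch assumes an 8-core chip (t = 8, our assumption for SNB) together with the slide's $W_0$ = 73 W and $W_2$ = 1 W/GHz²; $W_1$, $f_0$, $P_0$ and $P_{\max}$ are hypothetical placeholders. The closed form gives $f_{\mathrm{opt}} = \sqrt{73/8} \approx 3.0$ GHz, consistent with the 3 GHz curve, and a brute-force scan over the core count finds the STREAM-type minimum exactly at the saturation point.

```python
import math

def energy(f, t, f0, p0, pmax, w0, w1, w2):
    """E(f, t) = W(f, t) / P(f, t): energy per unit of work."""
    perf = min(t * p0 * f / f0, pmax)
    pw = w0 + (w1 * f + w2 * f ** 2) * t
    return pw / perf

def f_opt(w0, w2, t):
    """Closed-form optimum below saturation: f_opt = sqrt(W0 / (t * W2))."""
    return math.sqrt(w0 / (t * w2))

W0, W1, W2 = 73.0, 0.0, 1.0   # W0, W2 from the slide; W1 hypothetical
F0, P0 = 2.0, 1.0             # hypothetical baseline clock (GHz) and performance
T = 8                         # assumed number of cores per chip

# LINPACK-type: no saturation (pmax = inf); scan 1.00 .. 4.00 GHz in 0.01 steps.
freqs = [1.0 + 0.01 * i for i in range(301)]
f_scan = min(freqs, key=lambda f: energy(f, T, F0, P0, math.inf, W0, W1, W2))
print(f"closed form: {f_opt(W0, W2, T):.2f} GHz, scan: {f_scan:.2f} GHz")

# STREAM-type: hypothetical saturation at pmax = 4; scan the core count at f = F0.
t_best = min(range(1, T + 1), key=lambda t: energy(F0, t, F0, P0, 4.0, W0, W1, W2))
print(f"STREAM-type minimum energy at t = {t_best} cores")
```

Because $W_1$ only adds a constant to E below saturation, $f_{\mathrm{opt}}$ does not depend on it; a larger baseline power $W_0$ pushes the optimum toward higher clock speeds.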
11 Energy to solution: a different way of presentation
[Figure: energy vs. performance, with isolines of constant energy-delay product (E·t)]
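The energy-delay product weights energy by time to solution. For a fixed amount of work the time scales as 1/P, so EDP ∝ W/P². A minimal sketch with the same hypothetical placeholders as in the model above ($W_1 = 0$, $f_0$ = 2 GHz, $P_0 = 1$, 8 cores, no saturation): below saturation the EDP keeps falling as the clock speed rises, so an EDP-based choice favors even higher clock speeds than the pure energy optimum.

```python
import math

def perf(f, t, f0, p0, pmax):
    """P(f, t) = min(t * P0 * f / f0, Pmax)."""
    return min(t * p0 * f / f0, pmax)

def power(f, t, w0, w1, w2):
    """W(f, t) = W0 + (W1 * f + W2 * f**2) * t."""
    return w0 + (w1 * f + w2 * f ** 2) * t

def energy_and_edp(f, t, f0, p0, pmax, w0, w1, w2):
    """Return (E, EDP) per unit of work: E = W/P, delay = 1/P, so EDP = W/P**2."""
    p = perf(f, t, f0, p0, pmax)
    e = power(f, t, w0, w1, w2) / p
    return e, e / p

# W0, W2 from the slide; W1, F0, P0 and T are hypothetical.
W0, W1, W2, F0, P0, T = 73.0, 0.0, 1.0, 2.0, 1.0, 8
freqs = [1.0 + 0.1 * i for i in range(31)]  # 1.0 .. 4.0 GHz
best_e = min(freqs, key=lambda f: energy_and_edp(f, T, F0, P0, math.inf, W0, W1, W2)[0])
best_edp = min(freqs, key=lambda f: energy_and_edp(f, T, F0, P0, math.inf, W0, W1, W2)[1])
print(f"min energy at {best_e:.1f} GHz, min EDP at {best_edp:.1f} GHz")
```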
12 A real-world example: lattice Boltzmann CFD solver
- STREAM-type code
- Different levels of optimization ($P_0$): scalar, SSE and AVX code
- Not included in the model: bandwidth degradation at lower clock speeds (from 2.7 GHz down to 1.2 GHz)
13 A real-world example: lattice Boltzmann CFD solver
Realistic model for LBM performance
[Figure: model vs. measurement]
Optimal point of operation: 1.2 GHz with AVX code at the saturation point (7 cores).
14 A real-world example: lattice Boltzmann CFD solver
Be aware! Lowering the clock speed may lower the MPI bandwidth between nodes!
[Figure: IMB Sendrecv between two nodes (FDR InfiniBand)]
Using all cores, the network bandwidth may drop by 40%!
15 Lessons to learn
- Code optimization is a must!
- LINPACK-type codes: run as fast as possible.
- STREAM-type codes: run at the saturation point, using the lowest clock speed that still saturates.
- Check for degradation of main memory bandwidth and interconnect bandwidth.
Things to consider at the system administration level:
- Allow users to specify clock speeds (a simple modification in the NEC prolog).
- Install the LIKWID toolkit, which allows users to measure power and energy consumption (likwid-powermeter) and works well with the NEC software stack.
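The slide recommends likwid-powermeter, which reads the processor's RAPL energy counters. As a swapped-in alternative that needs no extra tooling, the sketch below reads the same kind of counter through the Linux powercap sysfs interface. It assumes the package-0 domain is exposed as /sys/class/powercap/intel-rapl:0 (present on many, but not all, Intel systems; reading energy_uj may require root on recent kernels) and corrects for a single counter wraparound.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL domain; not available on every system.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")

def read_uj(path):
    """Read an integer microjoule value from a sysfs file."""
    return int(Path(path).read_text().strip())

def energy_delta_uj(before, after, max_range_uj):
    """Counter difference in microjoules, allowing for one wraparound at max_range_uj."""
    if after >= before:
        return after - before
    return after + max_range_uj - before

def measure(work, domain=RAPL_DOMAIN):
    """Run work() and return (energy in J, average power in W) for this domain."""
    max_range = read_uj(domain / "max_energy_range_uj")
    e0 = read_uj(domain / "energy_uj")
    t0 = time.perf_counter()
    work()
    elapsed = max(time.perf_counter() - t0, 1e-9)  # guard against a zero interval
    e1 = read_uj(domain / "energy_uj")
    joules = energy_delta_uj(e0, e1, max_range) / 1e6
    return joules, joules / elapsed
```

For example, `joules, watts = measure(lambda: sum(range(10**7)))` measures one code region. The package counter covers everything running on that socket but not memory, I/O or network, i.e. only part of the baseline power $W_0$ in the model.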
16 LIKWID toolbox: small, flexible and easy-to-use tools
likwid-topology, likwid-pin, likwid-bench, likwid-perfctr, likwid-powermeter, likwid-mpirun
References
- An analysis of energy-optimized lattice-Boltzmann CFD simulations from the chip to the highly parallel level. Submitted; preprint on arXiv.
- Exploring performance and power properties of modern multicore chips via simple machine models. Accepted for publication in CCPE.
Thank you!
17 Question: Name two hardware properties that may depend on the clock speed (besides the clock speed itself and peak performance).