Recent successes in high-end modelling for materials design in Europe. Thomas C. Schulthess


1 Recent successes in high-end modelling for materials design in Europe Thomas C. Schulthess 1

2 September 15, 2015: "Today's Outlook: GPU-accelerated Weather Forecasting" (John Russell)
- 2010: start investing in software
- 2012/13: co-design for Piz Daint
- 2014: COSMO in production on GPUs ("Piz Daint")
- co-design of Piz Kesch (specialised for MeteoSwiss)
- Oct. 2015: Piz Kesch in production
- Apr. 2016: new model operational

3 ASCR Computing Upgrades At a Glance

Current systems: Edison (NERSC), Titan (OLCF), Mira (ALCF). Planned upgrades: Cori (NERSC, 2016), Summit (OLCF), Theta (ALCF, 2016), Aurora (ALCF).
- System peak (PF) and peak power (MW): values missing in the source text
- Total system memory: Edison 357 TB; Titan 710 TB; Mira 768 TB; Cori ~1 PB DDR4 + High Bandwidth Memory (HBM) + 1.5 PB persistent memory; Summit >1.74 PB DDR4 + HBM + 2.8 PB persistent memory; Theta >480 TB DDR4 + HBM; Aurora >7 PB high-bandwidth on-package memory, local memory and persistent memory
- Node performance (TF): Cori >3; Summit >40; Theta >3; Aurora >17 times Mira
- Node processors: Edison Intel Ivy Bridge; Titan AMD Opteron + Nvidia Kepler; Mira 64-bit PowerPC A2; Cori Intel Knights Landing many-core CPUs + Intel Haswell CPUs in data partition; Summit multiple IBM Power9 CPUs & multiple Nvidia Volta GPUs; Theta Intel Knights Landing Xeon Phi many-core CPUs; Aurora Knights Hill Xeon Phi many-core CPUs
- System size (nodes): Edison 5,600; Titan 18,688; Mira 49,152; Cori 9,300 + 1,900 in data partition; Summit ~3,500; Theta >2,500; Aurora >50,000
- System interconnect: Edison Aries; Titan Gemini; Mira 5D Torus; Cori Aries; Summit dual-rail EDR-IB; Theta Aries; Aurora 2nd-generation Intel Omni-Path Architecture
- File system: Edison 7.6 PB, 168 GB/s, Lustre; Titan 32 PB, 1 TB/s, Lustre; Mira 26 PB, 300 GB/s, GPFS; Cori 28 PB, 744 GB/s, Lustre; Summit 120 PB, 1 TB/s, GPFS; Theta 10 PB, 210 GB/s, Lustre (initial); Aurora 150 PB, 1 TB/s, Lustre

4 Architectural tracks toward 2017 and beyond (diagram): GPU-accelerated hybrid (Summit), Xeon Phi accelerated (Aurora), multi-core (post-K, Tokyo-1, Tokyo-2; DARPA HPCS). Both architectures have heterogeneous memory!

5 Architectural diversity is here to stay, because it is a consequence of the end of CMOS scaling (Moore's Law). What are the implications? Complexity in software is one, but we don't understand all implications. The physics of the computer matters more than ever.

6 Three European Centers of Excellence in Materials Science have recently been funded
- NoMaD: Novel Materials Discovery (PIs: Claudia Draxl and Matthias Scheffler) — a Materials Encyclopedia and big-data analytics tools for materials science and engineering
- MaX: Materials design at the eXascale (PI: Elisa Molinari) — applications and tools for electronic structure simulations on future exascale architectures
- E-CAM: a CoE based at CECAM (PI: Dominic Tildesley) — molecular simulation tools with emphasis on education

7 A 12-year NCCR project funded by the Swiss National Science Foundation: 33 investigators from 11 Swiss institutions (universities, national labs, industry) and various disciplines (physics, chemistry, materials science, computational science, computer science, engineering). First phase funded at CHF 34.4M (18M SNSF, 6.6M EPFL, 9.8M others). EPFL (Marzari, Pasquarello, Roethlisberger, Koch, Andreoni, Corminboeuf, Yazyev, Ceriotti), ETHZ (Spaldin, Troyer, VandeVondele), Basel (Goedecker, von Lilienfeld), Fribourg (Werner), Geneva (Georges), Svizzera Italiana (Parrinello), Zurich (Hutter), IBM (Curioni), CSCS (Schulthess), EMPA (Gröning, Passerone), PSI (Kenzelmann, Nolting)

8 Serendipitous discovery & Edisonian development

Most new materials are discovered serendipitously (particularly true for complex materials), or through very laborious searches:
- Edison tested 3,000 materials for his filament and settled on burned sewing thread
- Haber-Bosch ammonia synthesis began with osmium as a catalyst; Mittasch (BASF) tested ~22,000 materials to find the iron-based catalyst still in use today
- Nørskov showed in 2009 that CoMo is a more efficient & inexpensive catalyst

[Figure: turnover frequency TOF (s⁻¹) vs. relative nitrogen binding energy (J mol⁻¹) for Fe, CoMo, Ru, Os, Co, Mo, Ni — Nicola Marzari]

9 Systematic searches with high-throughput & capability runs

There are ~150,000 known inorganic materials with published structures. Very basic properties computed with DFT-based quantum simulations take ~10 minutes on a powerful workstation (e.g. hybrid CPU-GPU). Piz Daint, with 5272 hybrid CPU-GPU nodes, could scan ~5000 structures / 10 minutes. But we want to study more complex, harder-to-compute properties. How complex?
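The throughput claim above is simple arithmetic; a quick sketch, assuming (as the slide does) ~10 minutes per structure on one hybrid node and 5272 such nodes:

```python
# Back-of-the-envelope throughput for high-throughput DFT screening.
# Assumptions from the slide: ~10 minutes per structure on one hybrid
# CPU-GPU node, 5272 such nodes on Piz Daint, ~150,000 known structures.
nodes = 5272
minutes_per_structure = 10
known_structures = 150_000

structures_per_window = nodes                  # one per node per 10-minute window
windows = -(-known_structures // nodes)        # ceiling division
total_hours = windows * minutes_per_structure / 60

print(f"{structures_per_window} structures / 10 minutes")
print(f"full scan of basic properties in about {total_hours:.0f} hours")
```

At this rate a basic-property scan of every known structure fits in a few hours, which is why the slide pivots to harder-to-compute properties.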

10 Approaching the problem from the other end

Start with the most reliable (and expensive) approach to electronic structure, the Linearised Augmented Plane Wave method (LAPW), and the largest problem that is reasonable* for materials searches, ~1000 atoms in a unit cell: the "1000-atom problem"**. Then bet on future improvements in extreme-scale computing: novel architectures and exa-scale computing.

(*) Using W. Kohn's arguments on the nearsightedness of electronic matter
(**) Proposed by Claudia Draxl at a PRACE project meeting in spring

11 Solving the Kohn-Sham equations is the bottleneck in most DFT-based materials science codes

Kohn-Sham equation: $\left(-\frac{\hbar^2}{2m}\nabla^2 + v_s[n](\vec r)\right)\psi_i(\vec r) = \varepsilon_i\,\psi_i(\vec r)$

Ansatz (the basis is not orthogonal): $\psi_i(\vec r) = \sum_\mu c_{i\mu}\,\phi_\mu(\vec r)$

Hermitian matrices: $H_{\mu\nu} = \int \phi_\mu^*(\vec r)\left(-\frac{\hbar^2}{2m}\nabla^2 + v_s[n](\vec r)\right)\phi_\nu(\vec r)\,d\vec r$ and $S_{\mu\nu} = \int \phi_\mu^*(\vec r)\,\phi_\nu(\vec r)\,d\vec r$

Solve the generalised eigenvalue problem $(H - \varepsilon_i S)\,c_i = 0$, where we are usually interested in about 10-50% of the spectrum. We need the eigenvectors as well, to compute the density: $n(\vec r) = \sum_{i=1}^N \psi_i^*(\vec r)\,\psi_i(\vec r)$
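At the dense-matrix level this structure can be sketched with SciPy's generalised Hermitian eigensolver; the matrices below are synthetic stand-ins for H and S (not a real LAPW basis), and `scipy.linalg.eigh` plays the role of the LAPACK drivers discussed later:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 120

# Synthetic Hermitian "Hamiltonian" H and positive-definite "overlap" S,
# standing in for the LAPW matrices (toy sizes, not a real basis).
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (M + M.conj().T) / 2
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = B @ B.conj().T + n * np.eye(n)      # shift keeps S well conditioned

# Generalised problem H c = eps S c; only the lowest ~10-50% of the
# spectrum is needed for the density, so restrict the index range.
k = n // 5
eps, C = eigh(H, S, subset_by_index=[0, k - 1])

# Eigenpairs satisfy H C = S C diag(eps), and C is S-orthonormal.
residual = np.abs(H @ C - S @ C * eps).max()
ortho = np.abs(C.conj().T @ S @ C - np.eye(k)).max()
print(residual < 1e-8, ortho < 1e-8)
```

Restricting `subset_by_index` mirrors the point that only a fraction of the spectrum, eigenvectors included, is actually needed.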

12 Generalised eigenvalue problem in the LAPW basis

$H_{GG'}\,C_i = \varepsilon_i\,O_{GG'}\,C_i$

Overlap: $O_{GG'} = \langle\varphi_G|\varphi_{G'}\rangle$
Hamiltonian: $H_{GG'} = \langle\varphi_G|\hat H|\varphi_{G'}\rangle$
LAPW basis: $\varphi_G(\vec r) = \sum_L \sum_{\nu=1}^{O_\ell^\alpha} A^{\alpha}_{L\nu}(G)\,u^{\alpha}_{\ell\nu}(r)\,Y_L(\hat r)$ for $r \in \mathrm{MT}_\alpha$ (muffin-tin spheres), and $\varphi_G(\vec r) = \frac{1}{\sqrt\Omega}\,e^{i(G+k)\vec r}$ for $r \in I$ (interstitial region)

13 Generalised eigenvalue problem in the LAPW (cont.)

$H_{GG'}\,C_i = \varepsilon_i\,O_{GG'}\,C_i$

Overlap: $O_{GG'} = \langle\varphi_G|\varphi_{G'}\rangle = \sum_{\alpha L} A^{\alpha*}_{L}(G)\,A^{\alpha}_{L}(G') + \tilde\Theta(G-G')$
Hamiltonian: $H_{GG'} = \langle\varphi_G|\hat H|\varphi_{G'}\rangle = \sum_{\alpha L} A^{\alpha*}_{L}(G)\,B^{\alpha}_{L}(G') + \tfrac12(G+k)\cdot(G'+k)\,\tilde\Theta(G-G') + \tilde V_s(G-G')$
LAPW basis: $\varphi_G(\vec r) = \sum_L \sum_{\nu=1}^{O_\ell^\alpha} A^{\alpha}_{L\nu}(G)\,u^{\alpha}_{\ell\nu}(r)\,Y_L(\hat r)$ for $r \in \mathrm{MT}_\alpha$, and $\varphi_G(\vec r) = \frac{1}{\sqrt\Omega}\,e^{i(G+k)\vec r}$ for $r \in I$

14 Generalised eigenvalue problem in the LAPW (cont.)

$H_{GG'}\,C_i = \varepsilon_i\,O_{GG'}\,C_i$ — solved with LAPACK / ScaLAPACK (overlap and Hamiltonian matrix elements as on the previous slide), with

$B^{\alpha}_{L}(G) = \sum_{L_3 L_2} A^{\alpha}_{L_2}(G)\,h^{\alpha}_{\ell L_3 \ell_2}\,\langle Y_L|R_{L_3}|Y_{L_2}\rangle + \tfrac12 \sum_{L_2} A^{\alpha}_{L_2}(G)\,u_{\ell}(R_\alpha)\,u'_{\ell_2}(R_\alpha)\,R_\alpha^2$

Buried in thousands of lines of FORTRAN code.

15 Generalised eigenvalue problem in the LAPW (cont.)

$H_{GG'}\,C_i = \varepsilon_i\,O_{GG'}\,C_i$ — $O(N^3)$ complexity

Overlap: $O_{GG'} = \sum_{\alpha L} A^{\alpha*}_{L}(G)\,A^{\alpha}_{L}(G') + \tilde\Theta(G-G')$
Hamiltonian: $H_{GG'} = \sum_{\alpha L} A^{\alpha*}_{L}(G)\,B^{\alpha}_{L}(G') + \tfrac12(G+k)\cdot(G'+k)\,\tilde\Theta(G-G') + \tilde V_s(G-G')$, with $B^{\alpha}_{L}(G)$ as on the previous slide.

16 Generalised eigenvalue problem in the LAPW (cont.)

$\sum_{G'} H_{GG'}\,C^i_{G'} = \varepsilon_i \sum_{G'} O_{GG'}\,C^i_{G'}$, with overlap and Hamiltonian as on the previous slides.

Initial data is distributed in a block-cyclic fashion: each MPI rank gets a panel of tiles ([0,0], [0,1]; [1,0], [1,1] on a 2x2 process grid).

[Fig. 1 (color online): Panel and slice storage of the data. For parallel linear algebra operations the array has to be distributed in a block-cyclic fashion over a 2D grid of MPI ranks. In order to perform a local operation on a whole vector, the slices of vectors are gathered from panels, or created locally, on the corresponding row ranks of the MPI grid. To perform a distributed operation with PBLAS …]

Thus the LAPW basis functions are given by $\varphi_G(\vec r) = \sum_L \sum_{\nu=1}^{O_\ell^\alpha} A^{\alpha}_{L\nu}(G)\,u^{\alpha}_{\ell\nu}(r)\,Y_L(\hat r)$ for $r \in \mathrm{MT}_\alpha$ and $\frac{1}{\sqrt\Omega}\,e^{i(G+k)\vec r}$ for $r \in I$, where $L \equiv \{\ell, m\}$ denotes the angular momentum and azimuthal quantum numbers and $\sum_L \equiv \sum_{\ell=0}^{\ell_{\max}}\sum_{m=-\ell}^{\ell}$. The matching coefficients $A^{\alpha}_{L\nu}(G)$ are chosen to ensure continuity of the basis functions (and if possible of their derivatives).

International Workshop on CO-DESIGN, Wuxi, Monday, November 9, 2015

17 Generalised eigenvalue problem in the LAPW (cont.)

$\sum_{G'} H_{GG'}\,C^i_{G'} = \varepsilon_i \sum_{G'} O_{GG'}\,C^i_{G'}$, with overlap and Hamiltonian as on the previous slides.

Panel-to-slice redistribution (cf. Fig. 1): the initial data is distributed in a block-cyclic fashion, each MPI rank holding a panel of tiles. The MPI ranks of each column swap blocks of panels (MPI communication), and the slices of whole vectors are then gathered on each MPI rank.

18 Solving the generalised eigenvalue problem $A x = \lambda B x$

Standard 1-stage solver:
- xpotrf: $B = L L^H$
- xhegst: $A' = L^{-1} A L^{-H}$ (standard problem $A' y = \lambda y$)
- xhetrd: $T = Q^H A' Q$ — the most time-consuming step, dominated by level-2 BLAS (memory bound)
- xstexx: $T y' = \lambda y'$
- xunmtr: $y = Q y'$
- xtrsm: $x = L^{-H} y$
(xheevx is the driver that combines xhetrd, xstexx and xunmtr to solve $A' y = \lambda y$.)
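The same pipeline can be mimicked step by step with SciPy: `cholesky` for xpotrf, triangular solves for xhegst and xtrsm, and one `eigh` call bundling the tridiagonalisation, tridiagonal solve and back-transformation. A sketch, not the LAPACK/ScaLAPACK implementation itself:

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

rng = np.random.default_rng(1)
n = 100
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2                     # Hermitian A
Bm = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = Bm @ Bm.conj().T + n * np.eye(n)         # Hermitian positive-definite B

# xpotrf: B = L L^H
L = cholesky(B, lower=True)

# xhegst: A' = L^{-1} A L^{-H} turns A x = lam B x into A' y = lam y
Ap = solve_triangular(L, solve_triangular(L, A, lower=True).conj().T,
                      lower=True)

# xhetrd + xstexx + xunmtr in one call: solve the standard problem
lam, Y = eigh(Ap)

# xtrsm: back-transform x = L^{-H} y to the generalised eigenvectors
X = solve_triangular(L.conj().T, Y, lower=False)

residual = np.abs(A @ X - B @ X * lam).max()
print(residual < 1e-8)
```

The two triangular solves in the xhegst step exploit that $(L^{-1}A)^H = A\,L^{-H}$ for Hermitian $A$, so no explicit inverse of $L$ is ever formed.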

19 Solving the generalised eigenvalue problem (cont.): two-stage solver

$A x = \lambda B x$
- xpotrf: $B = L L^H$
- xhegst: $A' = L^{-1} A L^{-H}$ ($A' y = \lambda y$)
- reduction to banded: $A'' = Q_1^H A' Q_1$ — the most time-consuming step, but dominated by BLAS-3
- tri-diagonalise: $T = Q_2^H A'' Q_2$
- solve: $T y' = \lambda y'$
- back-transform: $y'' = Q_2 y'$, then $y = Q_1 y''$ — needs two eigenvector transformations (but easy to parallelise)
- xtrsm: $x = L^{-H} y$
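The two back-transformations are just two unitary multiplications. A toy illustration of that structure: a random unitary $Q_1$ stands in for the reduction-to-banded stage, and `scipy.linalg.hessenberg` (which yields a tridiagonal form for Hermitian input) stands in for the band-to-tridiagonal stage. This shows only the transformation chain, not the blocked algorithms of ELPA/MAGMA:

```python
import numpy as np
from scipy.linalg import hessenberg, eigh, qr

rng = np.random.default_rng(2)
n = 80
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Ap = (M + M.conj().T) / 2              # standard problem A' y = lam y

# Stage-1 stand-in: a unitary Q1 (from a QR factorisation) plays the
# role of the reduction to banded form, A'' = Q1^H A' Q1.
Q1, _ = qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
App = Q1.conj().T @ Ap @ Q1

# Stage 2: for Hermitian input the Hessenberg form is tridiagonal,
# App = Q2 T Q2^H, i.e. T = Q2^H A'' Q2.
T, Q2 = hessenberg(App, calc_q=True)

# Solve the tridiagonal problem, then apply both back-transformations:
# y'' = Q2 y', y = Q1 y''.
lam, Yp = eigh(T)
Y = Q1 @ (Q2 @ Yp)

residual = np.abs(Ap @ Y - Y * lam).max()
print(residual < 1e-8)
```

Both back-transformations are dense matrix-matrix products, which is why they parallelise well despite being an extra step relative to the one-stage solver.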

20 Implementations of two-stage eigensolvers for our problem (i.e. with back-transformation of eigenvectors)

For multi-core systems: the ELPA library
- T. Auckenthaler et al., Parallel Comput., vol. 37, no. 12 (2011)
- A. Marek et al., Psi-K Research Highlight, vol. 2014, no. 1, Jan. 2014
- Remark: the implementation relies on intrinsics

For hybrid CPU-GPU systems: integrated into the MAGMA library
- A. Haidar et al., Lecture Notes in Comp. Sci. 7905 (2013)
- A. Haidar et al., Int. J. of High Perf. Comp. App. (2013)
- R. Solcà et al., Proceedings of SC '15, New York, ACM (2015)
- Remark: relies on a PBLAS that is aware of heterogeneous memory

21 Accelerated hybrid systems: heterogeneous memory

A 4-node blade of a GPU-accelerated Cray XC30: the nodes share a network fabric, and each node combines DDR memory, serving a few low-latency threads, with high-bandwidth memory, serving many throughput threads.

22 PBLAS for accelerated hybrid (memory) systems

Many ScaLAPACK routines rely on distributed PBLAS routines, operating on a block-cyclic decomposition of a matrix ([0,0], [0,1]; [1,0], [1,1]):
- matrix distributed over DDR memory: execute PBLAS on low-latency threads (CPUs)
- matrix distributed over high-BW memory: execute PBLAS on throughput threads (GPUs), and move data directly between the high-BW memories of different nodes

On a distributed-memory accelerated hybrid system we therefore need two types of PBLAS routines, depending on which memory subsystem the matrix is located in.
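The block-cyclic decomposition used by ScaLAPACK/PBLAS has a simple closed-form index mapping. A sketch with 0-based indices (the function names here are made up for illustration; `nb` is the block size and `P` the number of process rows or columns in a dimension):

```python
def owner_and_local(g: int, nb: int, P: int) -> tuple[int, int]:
    """Map a global index g to (owning process, local index)."""
    block = g // nb
    proc = block % P              # blocks are dealt out cyclically
    local_block = block // P      # how many of this process's blocks precede g
    return proc, local_block * nb + g % nb

def owner_2d(i, j, nb, Pr, Pc):
    """2D distribution of matrix entry (i, j) on a Pr x Pc process grid."""
    (pr, li), (pc, lj) = owner_and_local(i, nb, Pr), owner_and_local(j, nb, Pc)
    return (pr, pc), (li, lj)

# First 8 rows with block size nb=2 on Pr=2 process rows: 0,0,1,1,0,0,1,1
print([owner_and_local(g, 2, 2)[0] for g in range(8)])
print(owner_2d(5, 3, 2, 2, 2))
```

This is exactly the mapping a panel of tiles encodes: contiguous blocks of `nb` rows/columns, dealt out round-robin over the process grid in each dimension.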

23 1000-atom test problem

~115,000 basis functions (matrix size). Running on a Cray XC30:
- CPU runs on Xeon E5 (Sandy Bridge)
- hybrid: same CPU + Nvidia K20 GPU
Use a comparable number of sockets.

Li-intercalated CoO2: 432 formula units of CoO2 + 205 Li atoms = 1501 atoms in total.

24 Results for the full runs (one SCF iteration)

Columns: MPI grid; MPI ranks / socket; OpenMP threads / rank; active sockets; setup of O, H (sec.); solve (sec.); rest (sec.); total (sec.); energy (Wh).
Configurations compared: 28x28 (2R:4T) ScaLAPACK; 28x28 (2R:4T) ELPA2; 20x20 (1R:8T) ELPA2; 14x14 (1R:8T) hybrid; 20x20 (1R:8T) hybrid.

25 Resources used for the 1000-atom design problem
- Time: ~15 minutes / iteration, i.e. ~3 hours for ~10 iterations
- Footprint: ~400 hybrid nodes on a Cray XC30 (Sandy Bridge + K20)
- Scan ~13 materials in 3 hours, or 5,000 in ~16 days (and performance will improve by the end of the decade)

26 A note on MPI+OpenMP

Hybrid: (n/10,240)^2 two-socket nodes; CPU-only: 2 (n/10,240)^2 sockets. On CPU nodes, MPI-only runs perform best! OpenMP is nevertheless necessary to reduce the memory footprint.

27 SIRIUS: a (prototype) domain-specific library

Low-level LAPW (and PW) library that supports multiple codes: ~30k lines of C++ code (incl. documentation) with F90 bindings. Anton Kozhevnikov, with Claudia Draxl, Andris Gulans, and Georg Huhs.

Codes built on top: Exciting, Elk, Quantum ESPRESSO, and others, all calling into the SIRIUS library. Distributed hybrid-memory model; MPI + OpenMP and others.
- Density class: distributed charge density and magnetisation generation
- Potential class: distributed XC potential and magnetic field generation, distributed Poisson solver
- Band class: second-variational and full diagonalisation of the Hamiltonian, with support for GPUs and distributed eigenvalue solvers
- Force class: atomic forces, with support for a distributed Hamiltonian matrix

Underlying libraries: GNU Scientific Library, FFTW3, HDF5, ELPA, MAGMA, Spglib, LAPACK and BLAS, ScaLAPACK and PBLAS, libc.

28 References and Collaborators

Peter Messmer and his team at the NVIDIA co-design lab at ETH Zurich; teams at CSCS.

- A. Haidar, R. Solcà, M. Gates, S. Tomov, T. C. Schulthess, J. Dongarra, "Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations," Supercomputing, Springer, Berlin/Heidelberg (2013)
- A. Haidar, S. Tomov, J. Dongarra, R. Solcà, T. C. Schulthess, "A novel hybrid CPU-GPU generalised eigensolver for electronic structure calculations based on fine-grained memory aware tasks," International Journal of High Performance Computing Applications, August 2013
- R. Solcà, A. Kozhevnikov, A. Haidar, S. Tomov, J. Dongarra, T. C. Schulthess, "Efficient implementation of quantum materials simulations on distributed CPU-GPU systems," to be published in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15), New York, NY, USA (2015), ACM


Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess

Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess Piz Daint & Piz Kesch : from general purpose supercomputing to an appliance for weather forecasting Thomas C. Schulthess 1 Cray XC30 with 5272 hybrid, GPU accelerated compute nodes Piz Daint Compute node:

More information

A hybrid Hermitian general eigenvalue solver

A hybrid Hermitian general eigenvalue solver Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe A hybrid Hermitian general eigenvalue solver Raffaele Solcà *, Thomas C. Schulthess Institute fortheoretical Physics ETHZ,

More information

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional

More information

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal

More information

From Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system. Oliver Fuhrer and Thomas C. Schulthess

From Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system. Oliver Fuhrer and Thomas C. Schulthess From Piz Daint to Piz Kesch : the making of a GPU-based weather forecasting system Oliver Fuhrer and Thomas C. Schulthess 1 Piz Daint Cray XC30 with 5272 hybrid, GPU accelerated compute nodes Compute node:

More information

Reflecting on the Goal and Baseline of Exascale Computing

Reflecting on the Goal and Baseline of Exascale Computing Reflecting on the Goal and Baseline of Exascale Computing Thomas C. Schulthess!1 Tracking supercomputer performance over time? Linpack benchmark solves: Ax = b!2 Tracking supercomputer performance over

More information

Parallel Eigensolver Performance on High Performance Computers

Parallel Eigensolver Performance on High Performance Computers Parallel Eigensolver Performance on High Performance Computers Andrew Sunderland Advanced Research Computing Group STFC Daresbury Laboratory CUG 2008 Helsinki 1 Summary (Briefly) Introduce parallel diagonalization

More information

ab initio Electronic Structure Calculations

ab initio Electronic Structure Calculations ab initio Electronic Structure Calculations New scalability frontiers using the BG/L Supercomputer C. Bekas, A. Curioni and W. Andreoni IBM, Zurich Research Laboratory Rueschlikon 8803, Switzerland ab

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

INITIAL INTEGRATION AND EVALUATION

INITIAL INTEGRATION AND EVALUATION INITIAL INTEGRATION AND EVALUATION OF SLATE PARALLEL BLAS IN LATTE Marc Cawkwell, Danny Perez, Arthur Voter Asim YarKhan, Gerald Ragghianti, Jack Dongarra, Introduction The aim of the joint milestone STMS10-52

More information

Parallel Eigensolver Performance on the HPCx System

Parallel Eigensolver Performance on the HPCx System Parallel Eigensolver Performance on the HPCx System Andrew Sunderland, Elena Breitmoser Terascaling Applications Group CCLRC Daresbury Laboratory EPCC, University of Edinburgh Outline 1. Brief Introduction

More information

MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors

MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors J. Dongarra, M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov University of Tennessee, Knoxville 05 / 03 / 2013 MAGMA:

More information

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners: The Problem The Bethe-Salpeter

More information

Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience

Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience Comparing the Efficiency of Iterative Eigenvalue Solvers: the Quantum ESPRESSO experience Stefano de Gironcoli Scuola Internazionale Superiore di Studi Avanzati Trieste-Italy 0 Diagonalization of the Kohn-Sham

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

materials modelling and discovery: the high-performance compucng way

materials modelling and discovery: the high-performance compucng way materials modelling and discovery: the high-performance compucng way Stefano Baroni Scuola Internazionale Superiore di Studi AvanzaC & IsCtuto Officina dei Materiali del CNR, Trieste QUANTUM ESPRESSO FoundaCon,

More information

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Yuta Hirokawa Graduate School of Systems and Information Engineering, University of Tsukuba hirokawa@hpcs.cs.tsukuba.ac.jp

More information

Supercomputers: instruments for science or dinosaurs that haven t gone extinct yet? Thomas C. Schulthess

Supercomputers: instruments for science or dinosaurs that haven t gone extinct yet? Thomas C. Schulthess Supercomputers: instruments for science or dinosaurs that haven t gone extinct yet? Thomas C. Schulthess 1 Do you really mean dinosaurs? We must be in the wrong movie 2 Not much has changed since the late

More information

CP2K. New Frontiers. ab initio Molecular Dynamics

CP2K. New Frontiers. ab initio Molecular Dynamics CP2K New Frontiers in ab initio Molecular Dynamics Jürg Hutter, Joost VandeVondele, Valery Weber Physical-Chemistry Institute, University of Zurich Ab Initio Molecular Dynamics Molecular Dynamics Sampling

More information

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm Penporn Koanantakool and Katherine Yelick {penpornk, yelick}@cs.berkeley.edu Computer Science Division, University of California,

More information

Parallel Sparse Tensor Decompositions using HiCOO Format

Parallel Sparse Tensor Decompositions using HiCOO Format Figure sources: A brief survey of tensors by Berton Earnshaw and NVIDIA Tensor Cores Parallel Sparse Tensor Decompositions using HiCOO Format Jiajia Li, Jee Choi, Richard Vuduc May 8, 8 @ SIAM ALA 8 Outline

More information

ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS

ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS FROM RESEARCH TO INDUSTRY 32 ème forum ORAP 10 octobre 2013 Maison de la Simulation, Saclay, France ELECTRONIC STRUCTURE CALCULATIONS FOR THE SOLID STATE PHYSICS APPLICATION ON HPC, BLOCKING POINTS, Marc

More information

MARCH 24-27, 2014 SAN JOSE, CA

MARCH 24-27, 2014 SAN JOSE, CA MARCH 24-27, 2014 SAN JOSE, CA Sparse HPC on modern architectures Important scientific applications rely on sparse linear algebra HPCG a new benchmark proposal to complement Top500 (HPL) To solve A x =

More information

Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano

Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Introduction Introduction We wanted to parallelize a serial algorithm for the pivoted Cholesky factorization

More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

Quantum ESPRESSO Performance Benchmark and Profiling. February 2017

Quantum ESPRESSO Performance Benchmark and Profiling. February 2017 Quantum ESPRESSO Performance Benchmark and Profiling February 2017 2 Note The following research was performed under the HPC Advisory Council activities Compute resource - HPC Advisory Council Cluster

More information

Performance of the fusion code GYRO on three four generations of Crays. Mark Fahey University of Tennessee, Knoxville

Performance of the fusion code GYRO on three four generations of Crays. Mark Fahey University of Tennessee, Knoxville Performance of the fusion code GYRO on three four generations of Crays Mark Fahey mfahey@utk.edu University of Tennessee, Knoxville Contents Introduction GYRO Overview Benchmark Problem Test Platforms

More information

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017 HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher

More information

Large Scale Electronic Structure Calculations

Large Scale Electronic Structure Calculations Large Scale Electronic Structure Calculations Jürg Hutter University of Zurich 8. September, 2008 / Speedup08 CP2K Program System GNU General Public License Community Developers Platform on "Berlios" (cp2k.berlios.de)

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors

Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr) Principal Researcher / Korea Institute of Science and Technology

More information

The ELPA Library Scalable Parallel Eigenvalue Solutions for Electronic Structure Theory and Computational Science

The ELPA Library Scalable Parallel Eigenvalue Solutions for Electronic Structure Theory and Computational Science TOPICAL REVIEW The ELPA Library Scalable Parallel Eigenvalue Solutions for Electronic Structure Theory and Computational Science Andreas Marek 1, Volker Blum 2,3, Rainer Johanni 1,2 ( ), Ville Havu 4,

More information

Performance optimization of WEST and Qbox on Intel Knights Landing

Performance optimization of WEST and Qbox on Intel Knights Landing Performance optimization of WEST and Qbox on Intel Knights Landing Huihuo Zheng 1, Christopher Knight 1, Giulia Galli 1,2, Marco Govoni 1,2, and Francois Gygi 3 1 Argonne National Laboratory 2 University

More information

VASP: running on HPC resources. University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria

VASP: running on HPC resources. University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria VASP: running on HPC resources University of Vienna, Faculty of Physics and Center for Computational Materials Science, Vienna, Austria The Many-Body Schrödinger equation 0 @ 1 2 X i i + X i Ĥ (r 1,...,r

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,

More information

Domain specific libraries. Material science codes on innovative HPC architectures Anton Kozhevnikov, CSCS December 5, 2016

Domain specific libraries. Material science codes on innovative HPC architectures Anton Kozhevnikov, CSCS December 5, 2016 Domain specific libraries Material science codes on innovative HPC architectures Anton Kozhevnikov, CSCS December 5, 2016 Part 1: Introduction Kohn-Shame equations 1 2 Eigen-value problem + v eff (r) j(r)

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Parallel Eigensolver Performance on High Performance Computers 1

Parallel Eigensolver Performance on High Performance Computers 1 Parallel Eigensolver Performance on High Performance Computers 1 Andrew Sunderland STFC Daresbury Laboratory, Warrington, UK Abstract Eigenvalue and eigenvector computations arise in a wide range of scientific

More information

Dynamic Scheduling within MAGMA
Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Mathieu Faverge, Julien Langou, Hatem Ltaief, Samuel Thibault and Stanimire Tomov. April 5, 2012. Innovative Computing

All-electron density functional theory on Intel MIC: Elk
W. Scott Thornton, R.J. Harrison. Abstract: We present the results of the porting of the full potential linear augmented plane-wave solver, Elk [1],

Performance Analysis of Lattice QCD Application with APGAS Programming Model
Koichi Shirahata (Tokyo Institute of Technology), Jun Doi, Mikio Takeuchi (IBM Research - Tokyo). Programming Models

Verbundprojekt ELPA-AEO (joint project "ELPA-AEO: eigenvalue solvers for petaflop applications; algorithmic extensions and optimizations")
http://elpa-aeo.mpcdf.mpg.de. BMBF project 01IH15001, Feb 2016 - Jan 2019. 7. HPC-Statustagung,

Exascale computing: endgame or new beginning for climate modelling. Thomas C. Schulthess
17th Workshop on HPC in Meteorology @ ECMWF, Reading, Wednesday, October 26, 2016. Operational system

CP2K: Past, Present, Future. Jürg Hutter, Department of Chemistry, University of Zurich
Outline: Past (history of CP2K, development of features); Present (Quickstep DFT code, post-HF methods (RPA, MP2), libraries

First, a look at using OpenACC on WRF subroutine advance_w dynamics routine
Second, an estimate of WRF multi-node performance on Cray XK6 with GPU accelerators. Based on performance of WRF kernels, what

The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science

Claude Tadonki. MINES ParisTech, PSL Research University, Centre de Recherche Informatique
claude.tadonki@mines-paristech.fr. Monthly CRI Seminar, MINES ParisTech - CRI, June 06, 2016, Fontainebleau (France)

GPU Computing and Alternative Architecture. Julian Merten
Future Directions of Cosmological Simulations, Edinburgh. Institut für Theoretische Astrophysik, Zentrum für Astronomie, Universität Heidelberg

Parallelization of the Molecular Orbital Program MOS-F
Akira Asato, Satoshi Onodera, Yoshie Inada, Elena Akhmatskaya, Ross Nobes, Azuma Matsuura, Atsuya Takahashi. November 2003, Fujitsu Laboratories of

Petascale Quantum Simulations of Nano Systems and Biomolecules
Emil Briggs, North Carolina State University. 1. Outline of real-space Multigrid (RMG) 2. Scalability and hybrid/threaded models 3. GPU acceleration

Exascale challenges for Numerical Weather Prediction: the ESCAPE project
Olivier Marsden. This project has received funding from the European Union's Horizon 2020 research and innovation programme under

Parallelization of Molecular Dynamics (with focus on Gromacs). SeSE 2014
Outline: a few words on MD applications and the GROMACS package; the main work in an MD simulation; parallelization; stream computing

Massively parallel electronic structure calculations with Python software. Jussi Enkovaara, Software Engineering, CSC, the Finnish IT center for science
GPAW: software package for electronic structure calculations

Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing
Prasanna Balaprakash, Leonardo A. Bautista Gomez, Slim Bouguerra, Stefan M. Wild, Franck Cappello, and Paul D. Hovland

Static-scheduling and hybrid-programming in SuperLU_DIST on multicore cluster systems
Ichitaro Yamazaki (University of Tennessee, Knoxville), Xiaoye Sherry Li (Lawrence Berkeley National Laboratory). MS49: Sparse
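As a single-node reference point for what SuperLU computes, SciPy wraps the sequential SuperLU library as `scipy.sparse.linalg.splu`; SuperLU_DIST, the subject of the talk, is its distributed-memory variant. A sketch with a 1-D Poisson matrix of my own choosing:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# 1-D Poisson (tridiagonal) matrix in CSC format, the layout SuperLU expects.
n = 100
dense = (np.diag(np.full(n, 2.0))
         + np.diag(np.full(n - 1, -1.0), 1)
         + np.diag(np.full(n - 1, -1.0), -1))
A = csc_matrix(dense)

lu = splu(A)              # sparse LU with column permutation for sparsity
x = lu.solve(np.ones(n))  # triangular solves reuse the factorization
assert np.allclose(A @ x, np.ones(n))
```

Factorizing once and reusing `lu.solve` for many right-hand sides is the usual pattern; the distributed version adds the static scheduling and hybrid MPI/OpenMP issues the talk discusses.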

Introduction to Benchmark Test for Multi-scale Computational Materials Software
Shun Xu*, Jian Zhang, Zhong Jin (xushun@sccas.cn). Computer Network Information Center, Chinese Academy of Sciences (IPCC member)

Large Scale Parallelism. Carlo Cavazzoni, HPC department, CINECA
Parallel architectures: two basic architectural schemes, distributed memory and shared memory; now most computers have a mixed architecture + accelerators

Linear algebra tasks in Materials Science: optimization and portability (Mitglied der Helmholtz-Gemeinschaft / Member of the Helmholtz Association)
Edoardo Di Napoli. ADAC Workshop, July 17-19, 2017. Outline: Jülich Supercomputing Center, Chebyshev

A knowledge-based approach to high-performance computing in ab initio simulations.
Edoardo Di Napoli. AICES Advisory Board Meeting, July 14th, 2014. Academic background

Improving the performance of applied science numerical simulations: an application to Density Functional Theory
Edoardo Di Napoli. Jülich Supercomputing Center - Institute for Advanced Simulation, Forschungszentrum

On the Paths to Exascale: Crossing the Chasm. Presented by Mike Rezny, Monash University, Australia
michael.rezny@monash.edu. Crossing the Chasm meeting, Reading, 24th October 2016, Version 0.1. In collaboration

HPC Infrastructure and GPU Computing Activities in KISTI
Hongsuk Yi (hsyi@kisti.re.kr). International Advanced Research Workshop on High Performance Computing, Grids and Clouds 2010, June 21-25, 2010, Cetraro, Italy

MATERIALS ARE KEY TO SOCIETAL WELL-BEING
Human ages are named after materials: stone, bronze, iron, nuclear, silicon. We need novel materials for: energy harvesting,

Scalable and Power-Efficient Data Mining Kernels
Alok Choudhary, John G. Searle Professor, Dept. of Electrical Engineering and Computer Science, and Professor, Kellogg School of Management; Director of the

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA (lchien@nvidia.com)
Outline: symmetric eigenvalue solver, experiment, applications, conclusions. Symmetric eigenvalue solver: the standard form is
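The snippet breaks off at the standard form, A x = &lambda; x with A symmetric. A compact, unoptimized CPU sketch of the Jacobi rotation idea that such a GPU solver parallelizes; the sweep order and convergence test are my own simplifications, not the talk's implementation:

```python
import numpy as np

def jacobi_eigenvalues(A, tol=1e-10, max_sweeps=50):
    """Cyclic Jacobi: zero each off-diagonal entry in turn with a 2x2
    rotation; the off-diagonal norm shrinks until A is numerically diagonal."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for _ in range(max_sweeps):
        if np.sqrt(np.sum(np.tril(A, -1) ** 2)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < tol:
                    continue
                # Rotation angle that annihilates A[p, q]:
                # tan(2*theta) = 2*A[p,q] / (A[q,q] - A[p,p]).
                theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J   # similarity transform preserves eigenvalues
    return np.sort(np.diag(A))

S = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.allclose(jacobi_eigenvalues(S), np.linalg.eigvalsh(S))
```

Jacobi is attractive on GPUs because disjoint (p, q) pairs can be rotated concurrently, unlike the sequential bulge-chasing in tridiagonalization-based solvers.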

Efficient implementation of the overlap operator on multi-GPUs
Andrei Alexandru, Mike Lujan, Craig Pelissier, Ben Gamari, Frank Lee. SAAHPC 2011, University of Tennessee. Outline: motivation, overlap operator

Scaling the Software and Advancing the Science of Global Modeling and Assimilation Systems at NASA. Bill Putman
Global Modeling and Assimilation Office. Max Suarez, Lawrence Takacs, Atanas Trayanov and Hamid

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling
Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr). 2019 Intel eXtreme Performance Users Group (IXPUG) meeting

MAGMA: Matrix Algebra on GPU and Multicore Architectures. Mark Gates, February 2012
Hardware trends: scale number of cores instead of clock speed; the hardware issue became a software issue; multicore, hybrid

Some thoughts about energy efficient application execution on NEC LX Series compute clusters
G. Wellein, G. Hager, J. Treibig, M. Wittmann. Erlangen Regional Computing Center & Department of Computer Science

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?
Roberto Orlando, Dipartimento di Chimica, Università di Torino, Via Pietro Giuria 5, 10125 Torino (Italy). roberto.orlando@unito.it

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
Emmanuel Agullo (INRIA / LaBRI), Camille Coti (Iowa State University), Jack Dongarra (University of Tennessee), Thomas Hérault
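The communication-avoiding idea behind tall-and-skinny QR (TSQR) fits in a few lines: factor row blocks independently (one block per node in the grid setting), then QR the stacked small R factors. This serial NumPy sketch only mimics that data flow; the block count is arbitrary and Q is left implicit:

```python
import numpy as np

def tsqr(A, nblocks=4):
    """Two-stage TSQR sketch for a tall-and-skinny A: independent block
    QRs, then one reduction QR of the stacked R factors. Returns R only."""
    blocks = np.array_split(A, nblocks, axis=0)
    Rs = [np.linalg.qr(b, mode='r') for b in blocks]   # local factorizations
    return np.linalg.qr(np.vstack(Rs), mode='r')       # reduction step

A = np.random.default_rng(1).standard_normal((400, 8))
R = tsqr(A)
# R agrees with the direct QR factor up to row signs, so R^T R = A^T A.
assert np.allclose(R.T @ R, A.T @ A)
```

In a real grid deployment the reduction step is a tree over nodes, which replaces the many small messages of Householder QR with one small R factor per node.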

Advancing Weather Prediction at NOAA. 18 November 2015, Tom Henderson, NOAA / ESRL / GSD
The U.S. needs better global numerical weather prediction. Hurricane Sandy, October 28, 2012: a European forecast that

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters
H. Köstler. 2nd International Symposium Computer Simulations on GPU, Freudenstadt, 29.05.2013. Contents: motivation, waLBerla, software concepts

Extreme scale simulations of high-temperature superconductivity. Thomas C. Schulthess
Superconductivity: a state of matter with zero electrical resistivity. Heike Kamerlingh Onnes (1853-1926). Discovery

Matrix Eigensystem Tutorial For Parallel Computation
High Performance Computing Center (HPC), http://www.hpc.unm.edu, 5/21/2003. Topic outline: main purpose of this tutorial; the assumptions made

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem
Mark Gates, Azzam Haidar, and Jack Dongarra. University of Tennessee, Knoxville, TN, USA; Oak Ridge National

Before starting: a few words
Now, in many places on planet Earth, materials data are produced! New and novel materials. New materials from materials data?

MagmaDNN: High-Performance Data Analytics for Manycore GPUs and CPUs
Lucien Ng (The Chinese University of Hong Kong), Kwai Wong (The Joint Institute for Computational Sciences (JICS), UTK and ORNL), Azzam Haidar,

PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers
James Kestyn, Vasileios Kalantzis, Eric Polizzi, Yousef Saad. Electrical and Computer Engineering Department,

Acceleration of WRF on the GPU
Daniel Abdi, Sam Elliott, Iman Gohari, Don Berchoff, Gene Pache, John Manobianco. TempoQuest, 1434 Spruce Street, Boulder, CO 80302. TempoQuest.com

Welcome to MCS 572
1. About the course: content and organization; expectations of the course. 2. Supercomputing: definition and classification. 3. Measuring performance: speedup and efficiency; Amdahl's Law; Gustafson
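The two speedup laws named in the outline are one-liners; a small sketch (the 5% serial fraction and 1024 processors are an arbitrary illustration):

```python
def amdahl_speedup(serial_fraction, p):
    """Fixed-size speedup on p processors (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def gustafson_speedup(serial_fraction, p):
    """Scaled speedup when the parallel work grows with p (Gustafson's law)."""
    return serial_fraction + p * (1.0 - serial_fraction)

# 5% serial work caps fixed-size speedup near 20 regardless of core count,
# while the scaled-problem view keeps growing with p.
print(amdahl_speedup(0.05, 1024))     # ~19.6
print(gustafson_speedup(0.05, 1024))  # ~972.9
```

Efficiency is then speedup divided by p, which is why the same 5% serial fraction looks catastrophic under Amdahl's fixed-size assumption and benign under Gustafson's scaled one.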

Dynamical Variation of Eigenvalue Problems in Density-Matrix Renormalization-Group Code
Susumu Yamada, Toshiyuki Imamura, Masahiko Machida. PP12, Feb. 15, 2012. Center for Computational Science and e-systems, Japan Atomic Energy Agency; The University

FEAST eigenvalue algorithm and solver: review and perspectives
Eric Polizzi, Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, USA. Sparse Days, CERFACS, June 25, 2012

A CUDA Solver for Helmholtz Equation
Mingming Ren, Xiaoguang Liu, Gang Wang. Journal of Computational Information Systems 11:24 (2015) 7805-7812. Available at http://www.jofcis.com. College

ESLW_Drivers, 10-21 July 2017
Volker Blum (ELSI), Viktor Yu (ELSI), William Huhn (ELSI), David Lopez (Siesta), Yann Pouillon (Abinit), Micael Oliveira (Octopus & Abinit), Fabiano Corsetti (Siesta & Onetep), Paolo

Some notes on efficient computing and setting up high performance computing environments
Andrew O. Finley, Department of Forestry, Michigan State University, Lansing, Michigan. April 17, 2017. Efficient

Part 4: Multicore and Manycore Technology: Chances and Challenges. Vincent Heuveline
Numerical Simulation of Tropical Cyclones: goal-oriented adaptivity for tropical cyclones

MPI at MPI. Jens Saak, Max Planck Institute for Dynamics of Complex Technical Systems, Computational Methods in Systems and Control Theory
November 5, 2010

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems
Edoardo Di Napoli. Birkbeck University, London, June 29th, 2012. Motivation and Goals

ACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS
Alan Gray, Developer Technology Engineer, NVIDIA. ECMWF 18th Workshop on high performance computing in meteorology, 28th September 2018. ESCAPE. NVIDIA's

Scalable Systems for Computational Biology
Ch. Pospiech, John von Neumann Institute for Computing. Published in From Computational Biophysics to Systems Biology (CBSB08), Proceedings of the NIC Workshop

Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd)
Gary Howell, HPC/OIT, NC State University (gary howell@ncsu.edu). Acknowledgements: James Demmel, Gene Golub, Franc

Porting a sphere optimization program from LAPACK to ScaLAPACK
Mathematical Sciences Institute, Australian National University. For presentation at Computational Techniques and Applications Conference

Performance and Application of Observation Sensitivity to Global Forecasts on the KMA Cray XE6
Sangwon Joo, Yoonjae Kim, Hyuncheol Shin, Eunhee Lee, Eunjung Kim (Korea Meteorological Administration), Tae-Hun

A Parallel Bisection and Inverse Iteration Solver for a Subset of Eigenpairs of Symmetric Band Matrices
Hiroyuki Ishigami, Hidehiko Hasegawa, Kinji Kimura, and Yoshimasa Nakamura. Abstract: The tridiagonalization
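For a serial reference of the same computation, SciPy's LAPACK-backed `eig_banded` can return an index-selected subset of eigenpairs of a symmetric band matrix, which LAPACK obtains with bisection plus inverse iteration. A sketch with a tridiagonal 1-D Laplacian of my own choosing:

```python
import numpy as np
from scipy.linalg import eig_banded

# Symmetric tridiagonal 1-D Laplacian in upper banded storage:
# row 0 holds the superdiagonal (shifted right), row 1 the main diagonal.
n = 50
a_band = np.zeros((2, n))
a_band[0, 1:] = -1.0
a_band[1, :] = 2.0

# Request only eigenpairs 0..2 by index: bisection locates the eigenvalues,
# inverse iteration recovers the corresponding eigenvectors.
w, v = eig_banded(a_band, select='i', select_range=(0, 2))

# The 1-D Laplacian spectrum is known in closed form, so we can check it.
exact = 2.0 - 2.0 * np.cos(np.pi * np.arange(1, 4) / (n + 1))
assert np.allclose(w, exact)
```

Computing only a subset this way is much cheaper than a full diagonalization when the band is narrow and few eigenpairs are needed, which is exactly the regime the paper parallelizes.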