A Survey of HPC Systems and Applications in Europe


1 A Survey of HPC Systems and Applications in Europe Dr Mark Bull, Dr Jon Hill and Dr Alan Simpson, EPCC, University of Edinburgh

2 Overview Background; survey design; survey results (HPC systems, HPC applications); selecting a benchmark suite

3 Background to the survey The PRACE project is working towards the installation of Petaflop/s-scale systems in Europe. There is a requirement for a set of benchmark applications to assess the performance of systems before and during the procurement process. These benchmark applications should be representative of HPC usage by PRACE partners. To understand current application usage, we conducted a survey of PRACE partners' current HPC systems.

4 We took the opportunity to gather other interesting data as well. We also devised a method for selecting (and weighting) a set of applications which can be considered representative of current usage: we wanted to do this in a quantifiable way and to avoid political considerations, but it was not entirely successful!

5 Survey Design We asked the PRACE centres to complete: a systems survey for their largest system, and for any other system over 10 Tflop/s Linpack; and an application survey for all applications which consumed more than 5% of the utilised cycles on each system. We collected data for 24 systems and 69 applications. The survey was conducted in April 2008; the data relate to 2007/8.

6 Systems surveyed (System | Centre | Manufacturer | Model | Architecture):
Jugene | FZJ | IBM | Blue Gene/P | MPP
MareNostrum | BSC | IBM | JS21 cluster | TNC
HLRB II | BADW-LRZ | SGI | Altix 4700 | FNC
HECToR | EPSRC | Cray | XT4 | MPP
Neolith | SNIC | HP | Cluster 3000 DL | TNC
Platine | GENCI | Bull | (model lost in transcription) | TNC
Hexagon | SIGMA | Cray | XT4 | MPP
Galera | PSNC | Supermicro | X7DBT-INF | TNC
Jubl | FZJ | IBM | Blue Gene/L | MPP
BCX | CINECA | IBM | BladeCenter Cluster LS21 | TNC
Stallo | SIGMA | HP | BL460c | TNC
Palu | ETHZ | Cray | XT3 | MPP
HPCx | EPSRC | IBM | p575 cluster | FNC
Huygens | NCF | IBM | p575 cluster | FNC
Legion | EPSRC | IBM | Blue Gene/P | MPP
hww SX-8 | USTUTT-HLRS | NEC | SX8 | VEC
Louhi | CSC | Cray | XT4 | MPP
murska.csc.fi | CSC | HP | CP4000 BL ProLiant SuperCluster | TNC
Jump | FZJ | IBM | p690 cluster | FNC
ZAHIR | GENCI | IBM | p690/p690+/p655 cluster | FNC
HERA | GENCI | IBM | p690/p575 cluster | FNC
XC5 | CINECA | HP | HS21 cluster | TNC
Milipeia | UC-LCA | SUN | x4100 cluster | TNC
TNC | PSNC | IBM, Sun | e325/v40z/x4600 cluster | TNC
(Architecture key: MPP = massively parallel processor, TNC = thin-node cluster, FNC = fat-node cluster, VEC = vector. The slide also listed Rpeak, Rmax and core count per system, with totals; those values were lost in transcription.)

7 Compute power by architecture type Fat-node Cluster 4%, Vector 11%, MPP 50%, Thin-node Cluster 35%

8 LEFs The measure of computational power and of consumed cycles we use is the Linpack Equivalent Flop (LEF). A system which has a Linpack Rmax of 50 Tflop/s is said to have a power of 50T LEFs. An application which uses 10% of the time on that system is said to consume 5T LEFs.
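To make the bookkeeping concrete, here is a minimal sketch in Python; the helper functions are illustrative assumptions, not part of the survey:

```python
# Minimal sketch of LEF accounting; function names are hypothetical.

def system_lefs(linpack_rmax_tflops: float) -> float:
    """A system's power in LEFs is simply its Linpack Rmax."""
    return linpack_rmax_tflops

def application_lefs(linpack_rmax_tflops: float, fraction_of_cycles: float) -> float:
    """An application consuming a fraction of a system's utilised cycles
    consumes that fraction of the system's LEFs."""
    return linpack_rmax_tflops * fraction_of_cycles

# The example from the slide: 10% of a 50 Tflop/s (Linpack) system is 5T LEFs.
assert abs(application_lefs(50.0, 0.10) - 5.0) < 1e-12
```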

9 Distribution of LEFs by job size [pie chart: share of LEFs by job-size band, from jobs of fewer than 32 cores up to the largest jobs; percentage labels lost in transcription]

10 Mean job size as % of machine [bar chart per system: Jubl, Jugene, XC5, Legion, Jump, Palu, hww SX-8, HERA, HPCx, HECToR, Louhi, Galera, Neolith, Stallo, BCX, HLRB II, murska.csc.fi, Platine, Milipeia, halo, Huygens, ZAHIR, MareNostrum; values lost in transcription]

11 Job size distribution by system [stacked bar chart, 0-100%: for each system, the percentage of LEFs by job-size band from <32 cores upward; systems: Jugene, Jubl, Louhi, XC5, Legion, HECToR, Palu, MareNostrum, Stallo, Platine, Neolith, HPCx, HLRB II, BCX, Huygens, ZAHIR, murska.csc.fi, hww SX-8, Jump, HERA, Galera, Milipeia, TNC; values lost in transcription]

12 Distribution of LEFs by scientific area Particle Physics 23.5%, Computational Chemistry 22.1%, Condensed Matter Physics 14.2%, CFD 8.6%, Earth & Climate 7.8%, Astronomy & Cosmology 5.8%, Life Sciences 5.3%, Computational Engineering 3.7%, Plasma Physics 3.3%, Other 5.8%

13 No. of users and Rmax per user [chart per system: number of users and Rmax per user for Milipeia, Legion, XC5, Louhi, Galera, Palu, Jump, HPCx, hww SX-8, ZAHIR, HLRB II, Jubl, HERA, murska.csc.fi, Huygens, Neolith, Jugene, HECToR, BCX, TNC; values lost in transcription]

14 Parallelisation techniques Of the 69 applications, all but two use MPI for parallelisation; the exceptions are Gaussian (OpenMP) and BLAST (sequential). Of the 67 MPI applications, six also have standalone OpenMP versions and three have standalone SHMEM versions. 13 applications have hybrid implementations: 10 MPI+OpenMP, 2 MPI+SHMEM, 1 MPI+POSIX threads. Only one application was reported as using MPI-2 single-sided communication.

15 Languages [table: number of applications using each of Fortran90, C90, Fortran77, C++, C99, Python, Perl and Mathematica; counts lost in transcription]. A number of applications mix Fortran with C/C++.

16 Distribution of LEFs by dwarves Structured grids 19.0%, Dense linear algebra 14.4%, Sparse linear algebra 3.4%, Particle methods 7.2%, Unstructured grids 2.4%, Map reduce methods 45.1%, Spectral methods 8.4%

17 Distribution of LEFs by dwarf and area [table cross-tabulating the dwarves (dense linear algebra, spectral methods, structured grids, sparse linear algebra, particle methods, unstructured grids, map reduce methods) against the scientific areas (astronomy and cosmology, computational chemistry, computational engineering, computational fluid dynamics, condensed matter physics, earth and climate science, life science, particle physics, plasma physics, other); cell values lost in transcription]

18 Choosing a benchmark suite We want to choose a set of applications to form a benchmark suite to be used in the procurement process for Petaflop/s systems. Suggested process: find the set of applications that best fits the area/dwarf table, in the sense that it minimises the norm ||Uw - v||, where v is a vector containing the linearised table entries, U is a matrix describing the area/dwarf combinations satisfied by the applications, and w is a vector of weights.
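A minimal sketch of this fit in Python, assuming SciPy is available; the matrix entries are made-up illustrations, and the non-negativity constraint on the weights is an assumption we add (negative application weights would be meaningless), not something stated on the slide:

```python
import numpy as np
from scipy.optimize import nnls  # non-negative least squares

# v: the area/dwarf table flattened into a vector (illustrative values);
# each entry is the fraction of LEFs in one (area, dwarf) cell.
v = np.array([0.25, 0.10, 0.40, 0.25])

# U: one column per candidate application; U[i, j] > 0 means application j
# exercises (area, dwarf) combination i (again, illustrative values).
U = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
])

# Weights w >= 0 minimising ||U w - v||; the residual measures how well
# this candidate suite represents the surveyed usage.
w, residual = nnls(U, v)
print("weights:", w, "residual:", residual)
```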

19 In principle, one could search all possible lists of applications up to a certain length and find the list with the smallest residual; in practice, we do a manual search, since we want to include other criteria, such as usage of applications, geographical spread, etc. This gives a quantitative measure of how well a benchmark suite represents current usage. It also gives a weighting for the applications, which could be used to weight benchmark results.
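Under the same assumptions as above, the in-principle exhaustive search could look like the following sketch: brute force over all application subsets up to a given size, reusing the non-negative least-squares fit (the function best_suite is hypothetical):

```python
from itertools import combinations

import numpy as np
from scipy.optimize import nnls

def best_suite(U, v, max_apps):
    """Return (residual, application indices, weights) for the subset of at
    most max_apps columns of U whose weighted combination best matches v."""
    best = (np.inf, (), None)
    for size in range(1, max_apps + 1):
        for subset in combinations(range(U.shape[1]), size):
            w, residual = nnls(U[:, list(subset)], v)
            if residual < best[0]:
                best = (residual, subset, w)
    return best
```

The manual search used in practice trades this optimality for criteria that do not fit into ||Uw - v||, such as application usage and geographical spread.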

20 Problems with this approach Classification of codes into dwarves (and, to some extent, areas) is somewhat arbitrary; some applications use more than one dwarf, in which case we split the LEFs equally between the dwarves. There is a bias towards recently acquired systems (high LEFs), and recently acquired systems may have atypical usage by early users. The method also reflects past, rather than future, usage.

21 Current status We used the above process as a starting point, then swapped some applications to meet some of the concerns: 12 core applications, plus 8 additional applications. Core apps: NAMD, CPMD, VASP, QCD, GADGET, Code_Saturne, TORB, NEMO, ECHAM5, CP2K, GROMACS, N3D. Additional apps: AVBP, HELIUM, TRIPOLI_4, GPAW, ALYA, SIESTA, BSIT, PEPC.

22 We have undertaken work to port these applications to the PRACE prototype systems, and optimise them for sequential performance and scalability. We are currently collecting benchmark data from the prototype systems which have been installed so far. Based on this data, we are reviewing the list of applications to ensure that the final benchmark suite contains scalable codes and avoids licensing problems.

23 Acknowledgements The authors would like to acknowledge all those who contributed by filling in survey forms and taking part in subsequent discussions. A full report is available from:

24

25 Availability and utilisation [bar chart: availability % and utilisation % per system: Neolith, Palu, Jubl, Legion, MareNostrum, Jugene, hww SX-8, BCX, halo, Jump, Platine, Huygens, HERA, HPCx, Milipeia, HLRB II, ZAHIR, Stallo, XC5, HECToR, murska.csc.fi, Galera; values lost in transcription]

26 Top 30 applications by usage [table: LEFs used (Gflop/s) and number of systems per application; values lost in transcription]. Applications, in order of usage: overlap and wilson fermions, vasp, lqcd (twisted mass), lqcd (two flavor), namd, dalton, cpmd, gadget, dynamical fermions, spintronics, materials with strong correlations, dl_poly, casino, quantum-espresso, cactus, trio_u, smmp, tfs/piano, gromacs, pepc, tripoli4, chroma, wien2k, bam, trace, bqcd, cp2k, helium, magnum, pkdgrav-gasoline.
