The Memory Intensive System


DiRAC@Durham: The Memory Intensive System
The DiRAC-2.5x Memory Intensive system at Durham, in partnership with Dell.
Dr Lydia Heck, Technical Director ICC HPC and DiRAC Technical Manager

DiRAC
Who we are: a U.K. national HPC facility distributed over four academic institutions (http://www.dirac.ac.uk), managed by scientists for their science areas.
- DiRAC services support a significant portion of STFC's science programme, providing simulation and data-modelling resources for the UK frontier-science theory community in particle physics, astroparticle physics, astrophysics, cosmology, solar system & planetary science (see the DiRAC web pages).
- A partner in the National e-Infrastructure under the umbrella of IRIS.
- We have a full management structure:
  - Management team: director, technical director, innovation director, project scientist, technical manager.
  - Programme Management Board (PMB): management team + scientists.
  - Oversight Committee (OSC): independent, external.
- Fully refereed resource allocation through the DiRAC Resource Allocation Committee (RAC): managed by STFC, chaired by scientists from both astrophysics and particle physics.

DiRAC - History

2009 DiRAC-1: 13 installations (£13M)
- Part of DiRAC-1: COSMA4 in Durham, installed in 2010 with 2,976 Intel Westmere cores at 2.67 GHz and 14.8 TB of RAM.

2012 DiRAC-2 (£15M): 4 sites and 5 installations
- Funded by the U.K. government Department for Business, Innovation & Skills (BIS); OPEX funded by the U.K. Science and Technology Facilities Council (STFC).
- The procurements (OJEU) were completed in less than 100 days.
- The DiRAC-2 installations had a total peak performance of 2 PFlops. From north to south:
  - Edinburgh: BlueGene (UKQCD) - QCD simulations; some astrophysical calculations were ported (IBM). 20th in the Top500, June 2012.
  - Durham (ICC): COSMA - Intel Sandy Bridge cluster - cosmological simulations; solar system and planetary modelling; gravitational waves; beam-optimisation simulations (IBM). 134th in the Top500, June 2012.
  - Cambridge: Cosmos - shared-memory SGI system - cosmological data modelling (SGI).
  - Cambridge: Darwin - Intel Sandy Bridge cluster - QCD, cosmological and astrophysical simulations; solar and planetary physics (Dell). 93rd in the Top500, June 2012.
  - Leicester: Complexity - Intel Sandy Bridge cluster - astrophysical simulations; solar and planetary physics; cosmological and astrophysical simulations; star formation (HP).

DiRAC - History

2016 DiRAC-2.5
- Edinburgh: transfer of the BlueGene/Q system from the Hartree Centre at Daresbury, used as spare parts (£30k).
- Durham (ICC): COSMA6 - repurposing of the Blue Wonder cluster (114th in the Top500, June 2012), a gift from the Hartree Centre, rebuilt in partnership with OCF and DDN and a lot of willing hands from the ICC, adding 8,000 cores to the existing system for a total of 14,720 cores (£400k).
- Cambridge: new installation (Dell): 13% of CSD3 (768 nodes; 24,576 cores; Intel Omni-Path), CSD3-GPU (90 nodes; 360 GPUs; Mellanox EDR) and CSD3-KNL (342 nodes; 342 KNL; Intel Omni-Path).
- Leicester: Complexity - repurposing a local HPC cluster to add 3,000 Intel Sandy Bridge cores, for a total of 7,800.

DiRAC-2.5x - Autumn 2017
Total spend: £9M across 3 competitive tenders.
- Edinburgh: replace the ailing BlueGene/Q - Extreme Scaling.
- Durham: add a system with 100 TB of RAM and a fast checkpointing I/O system - Memory Intensive.
- Leicester: replace the DiRAC-2.5 add-on with modern, competitive cores and add substantial shared-memory systems - Data Intensive @ Leicester.
- Investigations into cloud computing: how could the public cloud benefit DiRAC, and how could DiRAC offer cloud services?
- Investigations into optimal ways of transferring data.

DiRAC @ Edinburgh - Extreme Scaling (ES)
Delivered and installed by HPE (680 TFlops). The Extreme Scaling service is hosted by the University of Edinburgh. DiRAC Extreme Scaling (also known as Tesseract) is available to industry, commerce and academic researchers.
- Intel Xeon Skylake 4116 processors: 844 nodes, 12 cores per socket, two sockets per node, AVX-512 FMA, 2.1 GHz base, 3.0 GHz turbo, 96 GB RAM per node.
- Hypercube Intel Omni-Path interconnect.
- 2.4 PB of Lustre DDN storage.
This system is configured for well-vectorised codes with good to excellent strong scaling, and has high-performance I/O and interconnect.
(Image: Edinburgh Castle. Credit: Historic Environment Scotland, the public body for Scotland's historic environment.)
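As a rough cross-check of the quoted 680 TFlops, here is a back-of-the-envelope peak estimate (my sketch, not from the slide; the assumption that each Xeon 4116 core provides a single AVX-512 FMA unit, i.e. 16 double-precision flops per cycle, is mine):

```python
# Back-of-the-envelope peak-FLOP estimate for Tesseract (assumptions noted inline).
nodes = 844
sockets_per_node = 2
cores_per_socket = 12
clock_hz = 2.1e9        # nominal base clock of the Xeon 4116
flops_per_cycle = 16    # assumed: one AVX-512 FMA unit per core = 8 DP lanes x 2 flops

cores = nodes * sockets_per_node * cores_per_socket            # 20,256 cores
peak_tflops = cores * clock_hz * flops_per_cycle / 1e12
print(f"{cores} cores -> ~{peak_tflops:.0f} TFlop/s peak")      # ~681, in line with the quoted 680 TFlops
```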

DiRAC @ Durham - Memory Intensive (MI)
- COSMA5 (2012; IBM/Lenovo/DDN): 6,400 Intel Sandy Bridge cores at 2.6 GHz; 51 TB of RAM; Mellanox FDR10 interconnect in 2:1 blocking; 2.5 PB of GPFS data storage.
- COSMA6 (2016; IBM/Lenovo/DDN): 8,192 Intel Sandy Bridge cores at 2.6 GHz; 65 TB of RAM; Mellanox FDR10 interconnect in 2:1 blocking; 2.5 PB of Lustre data storage.
- COSMA7 (2018; Dell): 4,116 Intel Skylake 5120 cores; Mellanox EDR interconnect in a 2:1 blocking configuration with islands of 24 nodes; a total of 110 TB of RAM; a fast checkpointing I/O system (343 TB) with a peak performance of 185 GB/s for both write and read; 1.8 PB of data storage. The system was delivered by Dell.
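A quick derivation from the figures above (my arithmetic, not on the slide) shows why COSMA7 earns the "Memory Intensive" label:

```python
# RAM per core across the COSMA generations, derived from the quoted figures.
systems = {
    "COSMA5": (6400, 51e12),      # cores, bytes of RAM
    "COSMA6": (8192, 65e12),
    "COSMA7": (4116, 110e12),
}
for name, (cores, ram_bytes) in systems.items():
    print(f"{name}: {ram_bytes / cores / 1e9:.1f} GB of RAM per core")
# COSMA5: ~8.0 GB/core, COSMA6: ~7.9 GB/core, COSMA7: ~26.7 GB/core
```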

DiRAC @ Cambridge - Data Intensive (DiC)
DiRAC has a 13% share of the CSD3 petascale HPC platform (Peta4 & Wilkes2), hosted at Cambridge University (Dell).

Peta4 provides 1.5 PFlops of compute capability:
- 342-node C6320p Intel KNL cluster (Intel Xeon Phi 7210 @ 1.30 GHz) with 96 GB of RAM per node.
- 768 Skylake nodes, each with 2 x Intel Xeon Skylake 6142 processors (2.6 GHz, 16-core; 32 cores per node):
  - 384 nodes with 192 GB of memory
  - 384 nodes with 384 GB of memory
- The HPC interconnect is Intel Omni-Path in 2:1 blocking.
- The storage consists of 750 TB of disk offering a Lustre parallel filesystem, and 750 GB of tape.

Wilkes2 provides 1.19 PFlops of compute capability:
- A 360-GPU NVIDIA cluster: four NVIDIA Tesla P100 GPUs in each of 90 Dell EMC server nodes, each node with 96 GB of memory, connected by Mellanox EDR InfiniBand.

DiRAC @ Leicester - Data Intensive
Delivered and installed by HPE, 2018.

Data Intensive 2.5x (DiL)
The DI system has two login nodes, a Mellanox EDR interconnect in a 2:1 blocking setup and 3 PB of Lustre storage.
- Main cluster: 136 dual-socket nodes with Intel Xeon Skylake 6140 processors (two AVX-512 FMA units, 2.3 GHz), 36 cores and 192 GB of RAM per node; 4,896 cores in total.
- Large memory:
  - 1 x 6 TB server with 144 cores (Xeon Gold 6154 @ 3.0 GHz base)
  - 3 x 1.5 TB servers with 36 cores (Xeon Gold 6140 @ 2.3 GHz base)
The DI system at Leicester is designed to offer fast, responsive I/O.

Data Intensive 2 (formerly "Complexity")
- 272 Intel Xeon Sandy Bridge nodes with 128 GB of RAM per node; 4,352 cores (95 TFlop/s); connected via a non-blocking Mellanox FDR interconnect. This cluster features an innovative switching architecture designed, built and delivered by Leicester University and Hewlett Packard.

The total storage available to both systems is in excess of 1 PB.

DiRAC @ Durham MI - COSMA7 (Dell, 2018)
- 4,116 Intel Skylake 5120 cores; Mellanox EDR interconnect in a 2:1 blocking configuration with islands of 24 nodes.
- 2 x 1.5 TB login nodes; 1 x 3 TB four-socket node for Durham Astrophysics.
- A total of 110 TB of RAM.
- A fast checkpointing I/O system (343 TB) with a peak performance of 185 GB/s for both write and read.
- 1.8 PB of data storage.

DiRAC @ Durham MI
- The system was delivered by Dell in March 2018 and installed by Alces.
- The DiRAC service started on 1 May 2018.
Industrial engagement:
- Aligns closely with the Industrial Strategy of the Department for Business, Energy and Industrial Strategy (BEIS).
- Funding leads to industrial engagement; this results in innovation, benefiting both academia and the wider industry.

How is DiRAC@Durham MI different?
A fast checkpointing I/O system (343 TB) with a peak performance of 185 GB/s for both write and read:
- 15 Lustre Object Storage Servers on Dell 640 nodes, each with:
  - 2 x Intel Skylake 5120 processors
  - 192 GB of RAM
  - 8 x 3.2 TB NVMe SFF drives
  - 1 Mellanox EDR card
- A user-code benchmark achieved 180 GB/s for both write and read - this is almost wire speed!
- This is currently the fastest filesystem in production in Europe.
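A quick sanity check on the wire-speed claim (my arithmetic; the assumption of one 100 Gbit/s EDR link per object storage server follows from the node description above):

```python
# Aggregate link speed of the checkpointing filesystem versus the measured rate.
edr_gbit_per_s = 100                 # one Mellanox EDR card per OSS
oss_count = 15

wire_speed_gb_per_s = oss_count * edr_gbit_per_s / 8    # ~187.5 GB/s aggregate
measured_gb_per_s = 180                                   # user-code benchmark quoted above

print(f"aggregate link speed: {wire_speed_gb_per_s:.1f} GB/s")
print(f"measured: {measured_gb_per_s} GB/s "
      f"({measured_gb_per_s / wire_speed_gb_per_s:.0%} of wire speed)")   # ~96%
```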

How is DiRAC@Durham MI different?
[Chart: "Power usage for snapshots with different performance solutions" - kilowatt-hours over 5 years (y-axis, 0 to 3,000,000) against snapshot period (24-hour, 12-hour, 6-hour, 4-hour, 2-hour, 1-hour), with one series each for 30 GB/s, 120 GB/s and 140 GB/s checkpointing bandwidth.]

How is DiRAC@Durham MI different?

Snapshot period:  24-hour     12-hour     6-hour      4-hour      2-hour       1-hour        hours/snap
30 GB/s           7,087,597   14,175,194  28,350,388  42,525,582  85,051,164   170,102,329   0.95
120 GB/s          1,771,899   3,543,799   7,087,597   10,631,396  21,262,791   42,525,582    0.24
140 GB/s          1,518,771   3,037,542   6,075,083   9,112,625   18,225,250   36,450,499    0.20

Total number of cores: 4,096; years of running: 5.
Total number of available CPU hours per year: 36,056,160.
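The table is reproduced almost exactly by the following sketch (my reconstruction; the assumption that the figures are core-hours spent writing a ~102 TB snapshot on 4,096 cores over 5 years is mine, inferred from the "hours/snap" column and the core count quoted above):

```python
# A plausible reconstruction of the snapshot-cost table above.
# Assumption (mine): figures are core-hours spent writing snapshots over 5 years
# on 4,096 cores, for a snapshot of ~102,400 GB (roughly the system's RAM).
snapshot_gb = 102_400
cores = 4_096
years = 5

for bandwidth_gb_s in (30, 120, 140):
    hours_per_snap = snapshot_gb / bandwidth_gb_s / 3600
    row = [round(years * 365 * 24 / period_h * hours_per_snap * cores)
           for period_h in (24, 12, 6, 4, 2, 1)]
    print(f"{bandwidth_gb_s:>3} GB/s: {row}  ({hours_per_snap:.2f} h/snap)")
# Reproduces e.g. 7,087,597 core-hours for 30 GB/s with a 24-hour snapshot
# period; compare with the 36,056,160 core-hours available per year.
```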

DiRAC @ Durham MI - industrial engagement
Dell partners:
- Alces: integrator, and an offer of a CDT placement.
- Intel: collaboration on proofs of concept; extension of the IPCC at the Durham ICC.
- Mellanox: student placement to optimise SWIFT for the Mellanox interconnect; optimised Open MPI for the Mellanox infrastructure; membership of the Centre of Excellence.
- Nvidia: a 2-day Nvidia hackathon in September 2018.
More involvement as the partnerships develop.

Science on DiRAC@Durham MI
The system has been designed to allow effective large-scale cosmological structure calculations.
- EAGLE prefers fewer cores and more RAM per node. SWIFT should not really mind, but the detailed simulations require a lot of RAM.
- In both cases there are long run times: the Universe is close to 14 billion years old!
Science aim: a run about 30 times bigger than the EAGLE run.
The EAGLE run parameters:
- 1500^3 = 3.375 x 10^9 dark matter particles
- 1500^3 = 3.375 x 10^9 baryonic (visible) matter particles
- Volume of the EAGLE run: a cube 100 Megaparsec on a side (1 parsec ≈ 3.26 light years)
- 10,000 particles per Milky Way-sized galaxy
- 20 TB of RAM
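For a sense of the memory footprint, the per-particle cost can be read off the figures above (my arithmetic; the slide only quotes the totals):

```python
# Per-particle memory cost of the EAGLE run, from the figures quoted above.
n_side = 1500
dark_matter = n_side ** 3          # 3.375e9 particles
baryons = n_side ** 3              # 3.375e9 particles
total_particles = dark_matter + baryons

ram_bytes = 20e12                  # 20 TB of RAM for the run
print(f"{total_particles:.3e} particles in total")
print(f"~{ram_bytes / total_particles / 1e3:.1f} kB of RAM per particle")
# ~6.75e9 particles at roughly 3 kB each: memory capacity, rather than raw
# compute, sets the size of run that fits on the machine.
```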

Science on DiRAC@Durham MI - visible matter
Astrophysics: galaxy formation, supernovae, star formation.
- 10^6 solar masses per mass particle and 10,000 mass particles per galaxy gives an effective resolution of 1,000 parsec (of the order of 3,000 light years).
- Supernova events release a lot of energy. With previous methods this energy was distributed to heat every particle. In EAGLE, the energy heats only a tiny fraction of the particles (of the order of a hundred neighbours) to 10^7 K, so the cooling rate stays low.
- To do this better, the resolution has to improve by at least a factor of 10, to a resolution of 100 light years; going down to 1 light year would be even better.
- The stars in the Milky Way have a total mass of about 5 x 10^10 solar masses. At a resolution of 10^6 solar masses per mass particle, all of the stars are modelled by only about 50,000 mass particles. (There are 250 +/- 150 x 10^9 stars in the galaxy.)
- Gas in present galaxy simulations is modelled with very limited features. The real galaxy has a complex interstellar medium with dense cold gas clouds, warmer diffuse hot gas and ionised gas. In finer-grained calculations these could be modelled much more realistically.
- The first galaxies were of the order of 100 light years across; at the current resolution they cannot be modelled. Our Milky Way is about 100,000 light years across.
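The 50,000-particle figure follows directly from the quoted masses (a trivial check, included for completeness):

```python
# Star-particle count for a Milky Way analogue, from the figures quoted above.
stellar_mass_msun = 5e10       # total stellar mass of the Milky Way, in solar masses
particle_mass_msun = 1e6       # mass resolution: solar masses per particle

n_star_particles = stellar_mass_msun / particle_mass_msun
print(f"~{n_star_particles:,.0f} star particles")    # ~50,000
# Each particle stands in for millions of real stars, compared with the
# estimated (250 +/- 150) x 10^9 stars in the galaxy.
```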

Science on DiRAC@Durham MI - dark matter
- The volume of the visible universe is of the order of 10^31 cubic light years; its linear size is of the order of 3 x 10^10 light years.
- A run about 30 times bigger than the EAGLE run to investigate and verify Einstein's laws of gravity: scale out in volume while keeping the same resolution.
- The evolution of the structure of the universe can be calculated without a significant influence from baryonic physics (less than 1%). This is only possible on larger scales, where the gravity of dark matter dominates the baryonic physics.
- The EAGLE run models a volume of about 10^25 cubic light years, which is about 1/1,000,000 of the real universe.
- The largest objects have not yet been found/modelled.
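An order-of-magnitude check on the 1/1,000,000 figure (my arithmetic, using the slide's own numbers):

```python
# Order-of-magnitude check on the volume figures quoted above.
mpc_in_ly = 3.26e6                           # light years per megaparsec
eagle_volume = (100 * mpc_in_ly) ** 3        # (100 Mpc)^3 ~ 3.5e25 cubic light years
universe_linear_size = 3e10                  # light years, as quoted on the slide
universe_volume = universe_linear_size ** 3  # ~2.7e31 cubic light years (cubic-box estimate)

print(f"EAGLE volume:    {eagle_volume:.1e} cubic light years")
print(f"universe volume: {universe_volume:.1e} cubic light years")
print(f"volume ratio:    ~1/{universe_volume / eagle_volume:.0e}")
# ~3.5e25 vs ~2.7e31 cubic light years: the simulated volume is of order one
# millionth of the visible universe, consistent with the slide.
```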
