Red Sky. Pushing Toward Petascale with Commodity Systems. Matthew Bohnsack. Sandia National Laboratories Albuquerque, New Mexico USA


Red Sky: Pushing Toward Petascale with Commodity Systems

Matthew Bohnsack
Sandia National Laboratories, Albuquerque, New Mexico, USA
mpbohns@sandia.gov
Tuesday, March 9, 2010

Matthew Bohnsack (Sandia Nat'l Labs), Red Sky, Tuesday March 9, 2010, slide 1 / 35

Outline
1. Introduction
2. People
3. Hardware
4. Software
5. Performance
6. Questions?

HPC at Sandia

Capability computing:
- Designed for scaling single large runs
- Usually proprietary, for maximum performance
- Red Storm is Sandia's current capability machine

Capacity computing:
- Computing for the masses: 100s of jobs and 100s of users
- Extreme reliability required; flexibility for a changing workload
- Thunderbird will be decommissioned this quarter
- Red Sky is our future capacity computing platform
- Red Mesa: machine for the National Renewable Energy Lab

Strategic Goals

- Meet a critical and growing need
  - Thunderbird being decommissioned
  - Capacity systems oversubscribed by 4×
- Set a new standard for value
- Create strategic partnerships
  - Engage a tier-1 vendor (Sun/Oracle)
  - Leverage the supply chain (Intel)
  - Diversify to the energy sector (NREL)
- Sustain leadership
  - Demonstrate feasibility of a petascale midrange system
  - Democratize benefits of the Red architecture

Main Themes

Cheaper:
- 5× the capacity of Thunderbird at 2/3 the cost
- Substantially cheaper per FLOP than recent capacity platforms

Leaner:
- Lower operational costs
- Three security environments via a modular fabric
- Expandable, upgradable, extensible: designed for a 6-year life cycle

Greener:
- 15% less power; 1/6 the power per FLOP
- 40% less water: 5M gallons saved annually
- Nearly 10× better cooling efficiency
- 4× denser footprint

Major Innovations

Bridging from capacity to capability:
- Many Red Storm characteristics at commodity price
- 2-3× faster than Red Storm in the midrange
- 1/3 the operational costs

Top ten Red Sky innovations:
- Petascale midrange system
- Intel Nehalem processor
- QDR InfiniBand 3D mesh/torus
- 12× optical cabling
- Optical Red/Black switching
- Refrigerant cooling / glacier doors
- Power distribution
- Routing and interconnect resiliency
- Minimal Ethernet & boot over IB

Floorplan: 68 Blade Racks + 20 Storage Racks

Capacity Computing at Sandia

2. People

People

Integrating an innovative 500+ TFLOP/s system is not easy! It requires smart, hard-working people.

3. Hardware

Hardware Overview

- 505 TFLOP/s peak
- 5,386 nodes (2,693 Sun X6275 blades)
- 2.93 GHz quad-core Nehalem X5570 processors (43,088 total cores)
- 12 GiB DDR3 RAM per node (1.5 GiB per core; 64 TiB total RAM)
- 3D-torus QDR InfiniBand, via Mellanox ConnectX on the motherboard and InfiniScale IV in the QNEM
- 1,440 12× IB cables = 9.1 miles (220 miles of optical strands)
- 2,304 1 TB Seagate disks in 96 J4400 JBOD enclosures: 2 PB (raw) for /scratch filesystems
- R134a-based cooling doors
- 1.7 MW power
- 1,848 square feet of space in 6 rows: 68 Sun C48 cabinets, up to 96 nodes (768 cores) per rack
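The headline numbers above are internally consistent. A quick back-of-the-envelope check, assuming the standard Nehalem-generation figure of 4 double-precision FLOPs per cycle per core (an assumption, not stated on the slide):

```python
# Sanity-check Red Sky's headline hardware numbers from the slide above.
# Assumes 4 double-precision FLOPs/cycle/core, the usual peak figure
# for Nehalem-generation Xeons with SSE.

nodes = 5386
cores_per_node = 8            # dual-socket, quad-core X5570
clock_ghz = 2.93
flops_per_cycle = 4           # assumed Nehalem DP peak per core

total_cores = nodes * cores_per_node
peak_tflops = total_cores * clock_ghz * flops_per_cycle / 1000.0
total_ram_tib = nodes * 12 / 1024.0   # 12 GiB per node; slide rounds to 64 TiB

print(f"{total_cores} cores, {peak_tflops:.0f} TFLOP/s peak, {total_ram_tib:.1f} TiB RAM")
```

The 505 TFLOP/s and 43,088-core figures fall straight out of the node count and clock rate.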

Power

Specs:
- High-density APC modular PDU: 288 kW in 1/2 rack; one half-rack PDU serves six Sun 6048 racks
- Can be safely serviced without forced shutdowns
- 400 A, 240Y/415 V three-phase input feed
- 24 3-phase, 16 A, 240 V power whips

Savings:
- Three-to-one reduction in cables
- Delivers far more power per square foot
- Copper: smaller wire size for 415 V
- Load: better power-supply efficiency
- Less cooling required

Cooling

Sun's Glacier Door:
- First rack-mounted, refrigerant-based, passive cooling system on the market

Liebert's XDP (first deployment):
- Pumping unit isolates the chilled-water system from the refrigerant circuit
- Operates above the dew point: no compressor, and power goes to cooling rather than dehumidification
- 0.13 kW per kW of cooling
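The 0.13 kW-per-kW figure implies a coefficient of performance near 8 for the cooling loop. Framing it as a COP (heat removed divided by work input) is my reading, not the slide's:

```python
# Cooling efficiency implied by the slide's figure of 0.13 kW of
# cooling-system power per kW of heat removed.
cooling_kw_per_kw = 0.13
cop = 1.0 / cooling_kw_per_kw   # heat removed / work input
print(f"Effective COP of about {cop:.1f}")
```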

Rack: Sun Blade 6048 Chassis

- 4 shelves in a rack
- 12 blade slots per shelf (2 nodes per blade slot with the X6275)
- 1 Chassis Management Module (CMM) per shelf
- 1 QNEM in each shelf

Blade: Sun X6275 (Vayu)

- 2 nodes per blade; each node is a dual-socket Nehalem-EP
- 2.93 GHz quad-core; 93.8 GFLOP/s peak per node
- 3-channel integrated memory controller, 1333 MHz DDR3: 12 GiB per node, 63.9 GB/s peak
- Integrated Ethernet: shared 10/100 management network; 1 Gbit/s Ethernet via the NEM (OOB management only)
- Integrated QDR InfiniBand host adapter (Mellanox ConnectX), 40 Gbit/s to the NEM module
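The per-node peak and memory-bandwidth figures on this slide can be reproduced from the component specs. A minimal sketch, again assuming 4 FLOPs/cycle/core for Nehalem and 8 bytes per transfer per DDR3 channel (both assumptions on my part):

```python
# Reproduce the X6275 per-node figures quoted above.
sockets = 2
cores_per_socket = 4
clock_ghz = 2.93
flops_per_cycle = 4           # assumed Nehalem DP peak per core
node_peak_gflops = sockets * cores_per_socket * clock_ghz * flops_per_cycle
# 2 * 4 * 2.93 * 4 = 93.76, the slide's "93.8 GFLOP/s peak"

channels_per_socket = 3
transfer_rate_gt = 1.333      # DDR3-1333: 1.333 GT/s per channel
bytes_per_transfer = 8
node_bw_gbs = sockets * channels_per_socket * transfer_rate_gt * bytes_per_transfer
# 2 * 3 * 1.333 * 8 = 63.98, close to the slide's "63.9 GB/s peak"

print(f"{node_peak_gflops:.2f} GFLOP/s and {node_bw_gbs:.2f} GB/s per node")
```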

QNEM: 3D-Torus Building Block

- QDR Network Express Module (QNEM): four in each blade rack (one per shelf)
- Two torus vertices per shelf, with intra-shelf Z connectivity on the PCB
- The QNEM switches are interconnected directly with each other; no core switches are used

Example: 3D Torus Built From 36 Racks

Figure sequence (3-torus example with 288 36-port switch-chip nodes):
- Z links, no wrap-around
- X links, no wrap-around
- Y links, no wrap-around
- Z wrap-around links
- X wrap-around links
- Y wrap-around links
- Host bristles
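The wrap-around links in the figures above are what turn a 3D mesh into a torus: every switch has exactly six neighbors, with coordinates wrapping modulo the size of each dimension. A minimal sketch of that neighbor rule (the dimension sizes are illustrative, not Red Sky's actual torus geometry):

```python
# Neighbors of a switch in a 3D torus: one hop +/- along each of X, Y, Z,
# with coordinates wrapping modulo the dimension size (the "wrap-around
# links" in the figures above). Dimension sizes here are illustrative.

def torus_neighbors(x, y, z, dims):
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]

# A corner switch in a 6x6x8 torus still has six neighbors,
# thanks to the wrap-around links:
print(torus_neighbors(0, 0, 0, (6, 6, 8)))
# -> [(1, 0, 0), (5, 0, 0), (0, 1, 0), (0, 5, 0), (0, 0, 1), (0, 0, 7)]
```

Without the wrap-around links (a plain mesh), corner and edge switches would have fewer neighbors, and worst-case hop counts would roughly double.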

4. Software

Software Overview

- CentOS 5.3
- OFED 1.4.2
- SNL-modified OpenSM with a custom routing engine (torus-2qos)
- Diskless boot over IB using a custom isolinux bootstrap or gPXE
- oneSIS for shared-image, diskless/stateless boot
- git for image management and revision control
- SNL-developed system management toolset
- SNL-developed RAS system
- Linux software RAID
- Lustre 1.8.x with patchless clients
- SLURM + Moab workload manager
- Intel compiler suite
- OpenMPI 1.4.1+

Service Nodes and Who Boots Whom

Integration Challenges

- Naming and attributes
- Red/Black switching, swings, and expansion
- No client Ethernet
- 3D torus on InfiniBand: no good, resilient routing algorithm existed for the torus
- Some difficulty with 12× fiber IB cables
- Software RAID for Lustre back-end storage
- Boot over InfiniBand
- New cooling system and its impact on operational activities

5. Performance

Linpack

Official Top500 November 2009 #10 result: 423.9 TFLOP/s on 5,202 nodes, 86.9% efficiency

T/V          N        NB   P    Q     Time       Gflops
WR03C2L4     2479989  128  102  102   23988.13   4.239e+05
||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N) = 0.0006766 ... PASSED

Unofficial #9 result: 433.5 TFLOP/s on 5,305 nodes, 87.2% efficiency

T/V          N        NB   P    Q     Time       Gflops
WR03C2L4     2504421  128  103  103   24158.53   4.335e+05
||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N) = 0.0005830 ... PASSED
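The quoted efficiencies are simply measured Rmax over theoretical Rpeak for the node counts used in each run. A quick check, assuming the 93.76 GFLOP/s per-node peak derived in the hardware section:

```python
# Linpack efficiency = Rmax / Rpeak, using a per-node peak of
# 93.76 GFLOP/s (2 sockets x 4 cores x 2.93 GHz x 4 FLOPs/cycle).
node_peak_gflops = 2 * 4 * 2.93 * 4   # 93.76

results = {}
for label, nodes, rmax_tflops in [("official #10", 5202, 423.9),
                                  ("unofficial #9", 5305, 433.5)]:
    rpeak_tflops = nodes * node_peak_gflops / 1000.0
    eff = rmax_tflops / rpeak_tflops
    results[label] = round(eff * 100, 1)
    print(f"{label}: {rmax_tflops} / {rpeak_tflops:.1f} TFLOP/s = {eff:.1%}")
```

Both computed figures land on the slide's 86.9% and 87.2%.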

CTH

HPCCG

Sierra/Presto

Questions?