High-performance computing and the Square Kilometre Array (SKA) Chris Broekema (ASTRON) Compute platform lead SKA Science Data Processor
Radio Astronomy and Computer Science
Both are very young sciences (1950s)
Both matured thanks to technology developed in WW2:
Computer Science: cryptography
Radio Astronomy: radar
Today radio astronomy is a data-intensive science at the forefront of what Jim Gray termed the fourth paradigm. The Square Kilometre Array (2020) is the poster child of this.
Aperture synthesis
History of radio astronomy
First observations in 1932 by Karl Jansky
First purpose-built telescope in 1937 by Grote Reber
21 cm emission line of neutral hydrogen predicted in 1944 by van de Hulst
Detected in 1951 by Ewen and Purcell (MIT); published after confirmation by Muller and Oort
Opening of the Dwingeloo radio telescope in 1956
Doppler effect (redshift) of fast-moving objects reveals the structure of the local galaxy (1950s)
History of radio astronomy
Doppler shift: velocity translated into frequency
Andromeda Nebula
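The velocity-to-frequency translation above can be sketched in a few lines. This is a minimal, non-relativistic Doppler calculation around the 21 cm hydrogen line; the observed frequency used in the example is an illustrative value, not a measurement from the talk.

```python
# Sketch: recover a radial velocity from a Doppler-shifted 21 cm line.
# Non-relativistic approximation; f_obs below is an illustrative value.

C = 299_792.458          # speed of light, km/s
F_REST = 1420.405751     # rest frequency of the neutral-hydrogen line, MHz

def radial_velocity(f_obs_mhz: float) -> float:
    """v = c * (f_rest - f_obs) / f_rest.
    Positive result: receding source (redshift)."""
    return C * (F_REST - f_obs_mhz) / F_REST

# Gas receding at roughly 100 km/s shifts the line down by about 0.48 MHz:
print(round(radial_velocity(1419.93), 1))
```

Mapping observed frequency to line-of-sight velocity in exactly this way is what let 1950s surveys trace the structure of the local galaxy.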
Observations are limited by:
Sensitivity: collecting area
Resolution: diameter relative to λ (R ≈ λ/D)
System noise: low-noise design, correlation
Interference: filters (spatial, spectral, etc.)
Foreground noise: calibration techniques
Processing / compute / data transport capacity
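The resolution limit R ≈ λ/D can be made concrete with a quick calculation. The 25 m dish diameter is an illustrative choice (similar in scale to the Dwingeloo telescope); the point is that single-dish resolution at 21 cm is coarse, which is what motivates aperture synthesis.

```python
# Sketch: diffraction-limited resolution of a single dish, R ~ lambda / D.
# The 25 m diameter is illustrative, not a quoted SKA figure.
import math

def resolution_deg(wavelength_m: float, diameter_m: float) -> float:
    """Angular resolution lambda/D, converted from radians to degrees."""
    return math.degrees(wavelength_m / diameter_m)

# The 21 cm hydrogen line through a 25 m dish gives roughly half a degree:
print(round(resolution_deg(0.21, 25.0), 2))
```

Aperture synthesis sidesteps this limit by correlating widely separated antennas, so the effective D becomes the longest baseline rather than a single dish diameter.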
Aperture synthesis
Square Kilometre Array 197 dishes
Square Kilometre Array 131,072 antennas
                               SKA1_low                     SKA1_mid
Location                       Australia                    South Africa
Number of receivers            131,072 (512 st x 256 el)    197 (133 + 64 MeerKAT)
Receiver diameter              35 m (station)               15 m (13.5 m MeerKAT)
Maximum baseline               80 km                        150 km
Frequency channels             65,536                       65,536
SDP input bandwidth            4.7 Tbps                     5.2 Tbps
Required compute capacity (*)  5.7 PFlops                   24 PFlops
(*) Work in progress
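One way to read the table is to divide the SDP input bandwidth over the frequency channels, which hints at why frequency is such a natural axis for data parallelism. A small sketch using the table's numbers:

```python
# Sketch: per-channel share of the SDP input bandwidth, from the table above.

def per_channel_mbps(total_tbps: float, n_channels: int) -> float:
    """Tbit/s spread over n_channels, expressed in Mbit/s per channel."""
    return total_tbps * 1e6 / n_channels

print(round(per_channel_mbps(4.7, 65_536), 1))  # SKA1_low
print(round(per_channel_mbps(5.2, 65_536), 1))  # SKA1_mid
```

Each channel's share is a modest, independent stream of tens of Mbit/s, which is exactly the kind of unit that can be handed to a single processing element.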
SKA: A Leading Big Data Challenge for 2020
Antennas to DSP (over 10s to 1000s of km): 2020: 20,000 PBytes/day; 2028: 200,000 PBytes/day
DSP to HPC (over 10s to 1000s of km): 2020: 100 PBytes/day; 2028: 10,000 PBytes/day
HPC processing: 2020: 300 PFlops; 2028: 30 EFlops
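Daily volumes like these are easier to compare with network hardware once converted to sustained line rates. A minimal conversion sketch:

```python
# Sketch: convert a daily data volume (PBytes/day) into a sustained rate (Tbit/s).

SECONDS_PER_DAY = 86_400

def pbytes_per_day_to_tbps(pbytes: float) -> float:
    """PBytes/day -> sustained Tbit/s (bytes * 8 bits, spread over a day)."""
    return pbytes * 1e15 * 8 / SECONDS_PER_DAY / 1e12

# The 2020 antennas-to-DSP figure of 20,000 PBytes/day as a line rate:
print(round(pbytes_per_day_to_tbps(20_000), 1))
```

That works out to well over a petabit per second sustained, which is why the antenna-to-DSP transport is counted among the system's leading challenges.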
SKA High level system overview
SKA SDP preliminary roll-out schedule
Architectural Principles
Main principles:
Ensure scalability, affordability and maintainability
Support current state-of-the-art algorithms
Exploit data parallelism, not just in frequency but also in other dimensions
We have only two fundamental/bulk data structures: raster grids and key-value stream records [e.g. (u,v,w) -> visibility]
Emphasis is on the framework to manage the throughput
Hardware platform will be replaced on a short duty cycle, c.f. any HPC facility
Algorithms and workflows will evolve as we learn about the telescopes
Approach: co-design of software and physical layer architectures
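The two bulk data structures named above can be sketched in miniature: a stream record keyed by (u, v, w) carrying a complex visibility, and a dense raster grid that records are accumulated onto. The field names and the nearest-cell gridding are illustrative simplifications, not the SDP's actual data model.

```python
# Sketch of the two bulk data structures: a (u,v,w) -> visibility stream
# record and a raster grid. Names and gridding scheme are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Visibility:
    """One stream record: baseline coordinates (u, v, w) plus a complex value."""
    u: float
    v: float
    w: float
    value: complex

# A raster grid: a dense 2-D array addressed by pixel index.
N = 4
grid = [[0j for _ in range(N)] for _ in range(N)]

def grid_visibility(grid, vis: Visibility, cell: float = 1.0) -> None:
    """Gridding in miniature: accumulate a record onto its nearest grid cell."""
    iu = int(vis.u / cell) % len(grid)
    iv = int(vis.v / cell) % len(grid)
    grid[iu][iv] += vis.value

grid_visibility(grid, Visibility(1.2, 2.7, 0.1, 1 + 1j))
```

Because records are independent until they are accumulated, the stream can be partitioned along frequency or other dimensions, which is the data parallelism the principles above call for.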
SDP context diagram with external interfaces
Signal and Data Transport connects the Science Data Processor to the Telescope Manager and the Central Signal Processor
Infrastructure: power, cooling, floor space, etc.
World outside the SKA observatory
Overview of Science Data Processor
Physical Layer
Top level SDP network
The Compute Island concept
A Compute Island is a self-contained set of processing resources, capable of end-to-end processing of a chunk of UVW-space, including:
Ingest
Buffer
Calibration & imaging
Archive
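A simple way to picture the island concept is to split the frequency channels into contiguous chunks, one per island, so each island can run the whole pipeline on its own slice. The island count below is purely illustrative.

```python
# Sketch: assign contiguous blocks of frequency channels to Compute Islands,
# so each island processes its chunk end-to-end. Island count is illustrative.

def channel_ranges(n_channels: int, n_islands: int):
    """Split [0, n_channels) into n_islands near-equal contiguous ranges."""
    base, extra = divmod(n_channels, n_islands)
    start = 0
    for i in range(n_islands):
        size = base + (1 if i < extra else 0)
        yield (start, start + size)
        start += size

ranges = list(channel_ranges(65_536, 6))   # 6 islands, just for the example
print(ranges[0], ranges[-1])
```

Because each range is independent, scaling the SDP becomes a matter of adding islands and re-dividing the channel space, rather than redesigning the pipeline.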
SDP scaling using Compute Islands
Compute Node Candidate Architecture
Heavily inspired by the COBALT node
Used for preliminary costing & scaling
Many alternatives possible: ARM, OpenPOWER + CAPI, FPGA, DSP, custom hardware, combinations of the above, etc.
Required SDP capacity
Required SDP computational capacity depends on:
1. Input bandwidth
2. Computational intensity of the data reduction
3. Computational efficiency as a percentage of R_peak
Figures: Top 500 computational efficiency; Top 500 performance development; projected SDP efficiency
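The three factors combine multiplicatively: sustained load is input bandwidth times computational intensity, and the required peak machine is that load divided by the achieved efficiency. The numbers in this sketch are illustrative placeholders, not SDP estimates.

```python
# Sketch: required peak capacity from the three factors listed above.
# All numeric inputs below are illustrative placeholders, not SDP figures.

def required_rpeak_pflops(input_tbytes_s: float,
                          flops_per_byte: float,
                          efficiency: float) -> float:
    """Sustained load = bandwidth * intensity; R_peak = sustained / efficiency."""
    sustained_pflops = input_tbytes_s * 1e12 * flops_per_byte / 1e15
    return sustained_pflops / efficiency

# e.g. 0.6 TByte/s in, 1000 flop/byte, 10% of R_peak achieved:
print(round(required_rpeak_pflops(0.6, 1000.0, 0.10), 1))
```

The division by efficiency is why the projected SDP efficiency matters so much: halving the achieved fraction of R_peak doubles the machine that must be procured.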
Required SDP capacity