
Accelerated Astrophysics: Using NVIDIA GPUs to Simulate and Understand the Universe. Prof. Brant Robertson, Department of Astronomy and Astrophysics, University of California, Santa Cruz. brant@ucsc.edu

UC Santa Cruz: a world-leading center for astrophysics. Home to one of the largest computational astrophysics groups in the world. Home to the University of California Observatories. A worldwide top-5 graduate program for astronomy and astrophysics according to US News and World Report. Many PhD students in our program are interested in professional data science. http://www.astro.ucsc.edu https://www.usnews.com/education/best-global-universities/space-science

GPUs as a scientific tool. [figure: the same grid code running on a CPU vs. on a GPU]

A (brief) intro to finite volume methods. Each simulation cell carries a conserved quantity; its value at time n+1 is obtained from its value at time n using the fluxes of conserved quantities across each cell face:

$$u^{n+1}_{i,j,k} = u^{n}_{i,j,k} - \frac{\Delta t}{\Delta x}\left(F^{n+1/2}_{i+1/2,j,k} - F^{n+1/2}_{i-1/2,j,k}\right) - \frac{\Delta t}{\Delta y}\left(G^{n+1/2}_{i,j+1/2,k} - G^{n+1/2}_{i,j-1/2,k}\right) - \frac{\Delta t}{\Delta z}\left(H^{n+1/2}_{i,j,k+1/2} - H^{n+1/2}_{i,j,k-1/2}\right)$$

Conserved variable update in standard C:

    for (int i = 0; i < nx; i++) {
      density[i]    += dt/dx * (F.d[i-1]  - F.d[i]);
      momentum_x[i] += dt/dx * (F.mx[i-1] - F.mx[i]);
      momentum_y[i] += dt/dx * (F.my[i-1] - F.my[i]);
      momentum_z[i] += dt/dx * (F.mz[i-1] - F.mz[i]);
      Energy[i]     += dt/dx * (F.E[i-1]  - F.E[i]);
    }

A simple loop (the flux arrays are assumed to include a ghost cell so the i-1 access is valid at i = 0), with clear potential for loop parallelization and vectorization.
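As a minimal sketch of the parallelization the slide alludes to, here is one way the same loop could be split across CPU threads with OpenMP; the Flux struct, function name, and pragma are illustrative assumptions, not Cholla's actual code:

    /* Hypothetical OpenMP version of the slide's update loop. Each
     * iteration touches only cell i, so iterations are independent and
     * can be distributed across threads. Assumes the flux arrays carry
     * a ghost cell so F.d[i-1] is valid at i = 0. */
    #include <omp.h>

    typedef double Real;

    typedef struct {
      Real *d, *mx, *my, *mz, *E;  /* one flux array per conserved field */
    } Flux;

    void update_conserved(Real *density, Real *momentum_x, Real *momentum_y,
                          Real *momentum_z, Real *Energy,
                          Flux F, int nx, Real dt, Real dx)
    {
      #pragma omp parallel for
      for (int i = 0; i < nx; i++) {
        density[i]    += dt/dx * (F.d[i-1]  - F.d[i]);
        momentum_x[i] += dt/dx * (F.mx[i-1] - F.mx[i]);
        momentum_y[i] += dt/dx * (F.my[i-1] - F.my[i]);
        momentum_z[i] += dt/dx * (F.mz[i-1] - F.mz[i]);
        Energy[i]     += dt/dx * (F.E[i-1]  - F.E[i]);
      }
    }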

Conserved variable update using CUDA:

    // copy the conserved variable array onto the GPU
    cudaMemcpy(dev_conserved, host_conserved, 5*n_cells*sizeof(Real), cudaMemcpyHostToDevice);

    // call cuda kernel
    Update_Conserved_Variables<<<dimGrid,dimBlock>>>(dev_conserved, F_x, nx, dx, dt);

    // copy the conserved variable array back to the CPU
    cudaMemcpy(host_conserved, dev_conserved, 5*n_cells*sizeof(Real), cudaMemcpyDeviceToHost);

Memory transfer, CUDA kernel, memory transfer.
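The slide leaves dimGrid and dimBlock undefined; a plausible sketch of the launch configuration for this 1D update follows. The block size of 256, the wrapper function, and the error check are assumptions for illustration, not taken from the slides:

    #include <stdio.h>
    #include <cuda_runtime.h>

    typedef double Real;
    __global__ void Update_Conserved_Variables(Real *dev_conserved, Real *dev_f,
                                               int nx, Real dx, Real dt);

    void launch_update(Real *dev_conserved, Real *F_x, int nx, Real dx, Real dt)
    {
      int threads_per_block = 256;   // a typical choice; an assumption here
      dim3 dimBlock(threads_per_block, 1, 1);
      // round up so every cell gets a thread; the kernel's if (id < nx)
      // guard makes the excess threads in the last block do nothing
      dim3 dimGrid((nx + threads_per_block - 1) / threads_per_block, 1, 1);

      Update_Conserved_Variables<<<dimGrid, dimBlock>>>(dev_conserved, F_x, nx, dx, dt);

      // catch configuration/launch errors before the next memory transfer
      cudaError_t err = cudaGetLastError();
      if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }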

Conserved variable update CUDA kernel:

    __global__ void Update_Conserved_Variables(Real *dev_conserved, Real *dev_f,
                                               int nx, Real dx, Real dt)
    {
      // get a global thread ID
      int id = threadIdx.x + blockIdx.x * blockDim.x;

      // update the conserved variable array
      if (id < nx) {
        dev_conserved[       id] += dt/dx * (dev_f[       id-1] - dev_f[       id]);
        dev_conserved[  nx + id] += dt/dx * (dev_f[  nx + id-1] - dev_f[  nx + id]);
        dev_conserved[2*nx + id] += dt/dx * (dev_f[2*nx + id-1] - dev_f[2*nx + id]);
        dev_conserved[3*nx + id] += dt/dx * (dev_f[3*nx + id-1] - dev_f[3*nx + id]);
        dev_conserved[4*nx + id] += dt/dx * (dev_f[4*nx + id-1] - dev_f[4*nx + id]);
      }
    }

One CUDA thread maps to one simulation cell; the field-major layout gives memory coalescing for transfer efficiency.
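The kernel's field*nx + id indexing is what makes the accesses coalesce: consecutive threads in a warp read consecutive addresses within each field. A hedged sketch contrasting that layout with an interleaved one (the helper names are mine, not Cholla's):

    typedef double Real;

    // Structure-of-arrays, as in the kernel above: field f of cell id lives
    // at f*nx + id, so threads id and id+1 in a warp read adjacent words and
    // the hardware coalesces them into a few memory transactions.
    __device__ Real load_soa(const Real *dev_conserved, int f, int nx, int id)
    {
      return dev_conserved[f * nx + id];
    }

    // Array-of-structures alternative: a cell's 5 fields are contiguous, so
    // consecutive threads' reads are strided by 5 words; a warp's accesses
    // scatter across many transactions and waste bandwidth.
    __device__ Real load_aos(const Real *dev_conserved, int f, int id)
    {
      return dev_conserved[id * 5 + f];
    }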

Cholla: Computational Hydrodynamics On paraLLel Architectures. Cholla is also the name of a group of cactus species that grow in the Sonoran Desert of southern Arizona. A GPU-native, massively parallel, grid-based hydrodynamics code written by Evan Schneider for her PhD thesis. Incorporates state-of-the-art hydrodynamics algorithms (unsplit integrators, 3rd-order spatial reconstruction, precise Riemann solvers, a dual energy formulation, etc.). Includes GPU-accelerated radiative cooling and photoionization. github.com/cholla-hydro/cholla Schneider & Robertson (2015)

Cholla leverages the world's most powerful supercomputers. Titan: Oak Ridge Leadership Computing Facility.

Cholla achieves excellent scaling to >16,000 NVIDIA GPUs. [figure: strong scaling test, 512³ cells; weak scaling test, ~322³ cells / GPU] Strong scaling: same total problem size, work divided amongst more processors. Weak scaling: total problem size increases, work assigned to each processor stays the same. Tests performed on ORNL Titan (AST 109, 115, 125). Schneider & Robertson (2015, 2017)
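In the standard notation (a textbook definition, not taken from the slides), with T(N) the wall-clock time on N GPUs, the two efficiencies are

$$E_{\mathrm{strong}}(N) = \frac{T(1)}{N\,T(N)}, \qquad E_{\mathrm{weak}}(N) = \frac{T(1)}{T(N)},$$

and ideal scaling corresponds to E = 1 in both cases.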

2D implosion test with Cholla on NVIDIA GPUs. Example test calculation: implosion (1024²), with P = 1, ρ = 1 outside the discontinuity and P = 0.14, ρ = 0.1 inside; 55,804,166,144 cell updates, symmetric about y = x to roundoff error.
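For context, a minimal sketch of initial conditions for this kind of test, in the spirit of the standard Liska & Wendroff implosion setup; the [0, 0.3]² domain, the diagonal threshold 0.15, and the grid mapping are assumptions, while the pressure and density values follow those quoted on the slide:

    /* Hypothetical implosion-test initial conditions on an N x N grid. */
    typedef double Real;

    void set_implosion_ICs(Real *rho, Real *P, int N)
    {
      Real dx = 0.3 / N;
      for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++) {
          Real x = (i + 0.5) * dx;   /* cell-center coordinates */
          Real y = (j + 0.5) * dx;
          if (x + y < 0.15) {        /* low-pressure corner region */
            rho[j*N + i] = 0.1;
            P[j*N + i]   = 0.14;
          } else {                   /* ambient region */
            rho[j*N + i] = 1.0;
            P[j*N + i]   = 1.0;
          }
        }
      }
    }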

Application: modeling galactic outflows Image credit: hubblesite.org

Cholla can simulate the structure of galactic winds. Important questions: How do mass and momentum become entrained in galactic winds? How does the detailed structure of galactic winds arise? [figure: schematic of a shock front moving at v_shock overtaking a cloud] Cholla + NVIDIA GPUs form a unique tool for simulating astrophysical fluids.

Cholla can simulate the structure of galactic winds. Schneider, E. & Robertson, B. 2017, ApJ, 834, 144. 1.25e9 cells, 512 NVIDIA K20X GPUs on ORNL Titan.

Leveraging the NVIDIA DGX-1 for astrophysical research. NVIDIA DGX-1: 2x 20-core Intel E5-2698 v4 CPUs, 8x NVIDIA P100 GPUs, 768 GB/s bandwidth, 4x Mellanox EDR InfiniBand NICs. Unlike risk-averse, mission-critical astronomical software, pipeline and high-level analysis software can leverage new and emerging technologies. Utilize investments in software from Silicon Valley, data science, and other industries. UCSC astrophysicists use the NVIDIA DGX-1 for astrophysical simulation and astronomical data analysis.

Accelerated simulations of disk galaxies. The UCSC Astrophysics DGX-1 system is our development platform for constructing complex initial conditions. The DGX-1 system is powerful enough to perform high-quality Cholla simulations of disk galaxies. [figure: disk galaxy simulation, 256³ cells, single P100, 2 hrs]

Cholla + Titan global simulations of galactic outflows. Cholla simulations of M82 initial conditions, with embedded star clusters. [figure: simulation domain of 2048 × 2048 × 4096 cells (~33,000 ly × ~33,000 ly × ~66,000 ly); Hα image of M82 from the WIYN (Wisconsin Indiana Yale NOAO) telescope (Gallagher & Westmoquette), Annu. Rev. Astron. Astrophys. 2005]

Cholla + ORNL Titan global simulations of galactic outflows. Test calculation on Titan: 1024³ cells, the largest hydro simulation of a single galaxy ever performed. 512 K20X GPUs, 6 hours, ~90K core hours out of a ~47M core hour allocation (AST-125). [figure: density and temperature slices in the x-y and x-z planes]

Using NVIDIA GPUs for astronomical data analysis. [image: the Hubble Ultra Deep Field]

Human galaxy classification. Expert classifications of Hubble images from the CANDELS survey. Kartaltepe et al., ApJS, 221, 11 (2015)

Human galaxy classification does not scale. New observatories will image >10 billion galaxies.

Morpheus: a UCSC deep learning model for astronomical galaxy classification, by Ryan Hausen, developed on the NVIDIA DGX-1. [figure: network architecture: multiband imaging input → convolution layers → a series of residual blocks (each adds its input to its output, keeping the same dimensions) → fully connected layers → a per-class classification PDF] Hausen & Robertson (in preparation)
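A residual block of the kind sketched in the diagram computes the standard ResNet mapping (the generic form, not anything Morpheus-specific):

$$y = x + \mathcal{F}(x),$$

where $\mathcal{F}$ is the block's stack of convolution layers. Because $\mathcal{F}(x)$ keeps the input's dimensions, the addition is well defined, and the identity path lets gradients propagate directly through a long series of blocks.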

[figure: preliminary Morpheus results; Hausen & Robertson]

Summary. The Cholla hydrodynamical simulation code, written by Evan Schneider for her PhD thesis supervised by Brant Robertson, uses NVIDIA GPUs to model astrophysical fluid dynamics. UCSC Astrophysics is using the ORNL Titan supercomputer and the DGX-1 system, each powered by NVIDIA GPUs, for astrophysical simulation and astronomical data analysis. The Morpheus deep learning framework for astrophysics is under development by Ryan Hausen at UCSC for automated galaxy classification and other astrophysical machine learning applications.