MONTE CARLO NEUTRON TRANSPORT SIMULATING NUCLEAR REACTIONS ONE NEUTRON AT A TIME Tony Scudiero NVIDIA

Similar documents
Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

USE OF LATTICE CODE DRAGON IN REACTOR CALUCLATIONS

Temperature treatment capabilites in Serpent 2(.1.24)

Upcoming features in Serpent photon transport mode

On-the-fly Doppler Broadening in Serpent

Serpent Monte Carlo Neutron Transport Code

On the use of SERPENT code for few-group XS generation for Sodium Fast Reactors

Neutron Interactions with Matter

Using the Application Builder for Neutron Transport in Discrete Ordinates

New methods implemented in TRIPOLI-4. New methods implemented in TRIPOLI-4. J. Eduard Hoogenboom Delft University of Technology

PLATFORM DEPENDENT EFFICIENCY OF A MONTE CARLO CODE FOR TISSUE NEUTRON DOSE ASSESSMENT

MCNP6: Fission Multiplicity Model Usage in Criticality Calculations

2. The Steady State and the Diffusion Equation

2017 Water Reactor Fuel Performance Meeting September 10 (Sun) ~ 14 (Thu), 2017 Ramada Plaza Jeju Jeju Island, Korea

Full-core Reactor Analysis using Monte Carlo: Challenges and Progress

arxiv: v1 [hep-lat] 7 Oct 2010

sri 2D Implicit Charge- and Energy- Conserving Particle-in-cell Application Using CUDA Christopher Leibs Karthik Murthy

On-The-Fly Neutron Doppler Broadening for MCNP"

Neutronic analysis of SFR lattices: Serpent vs. HELIOS-2

3. State each of the four types of inelastic collisions, giving an example of each (zaa type example is acceptable)

TMS On-the-fly Temperature Treatment in Serpent

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications

Delayed neutrons in nuclear fission chain reaction

MOx Benchmark Calculations by Deterministic and Monte Carlo Codes

Figure 1. Layout of fuel assemblies, taken from BEAVRS full core benchmark, to demonstrate the modeling capabilities of OpenMC.

Development of Multigroup Cross Section Generation Code MC 2-3 for Fast Reactor Analysis

PHYS 5020 Computation and Image Processing

ORNL Nuclear Data Evaluation Accomplishments for FY 2013

Assessment of the MCNP-ACAB code system for burnup credit analyses

Optimization studies of photo-neutron production in high-z metallic targets using high energy electron beam for ADS and transmutation

Preventing xenon oscillations in Monte Carlo burnup calculations by forcing equilibrium

YALINA-Booster Conversion Project

Reactors and Fuels. Allen G. Croff Oak Ridge National Laboratory (ret.) NNSA/DOE Nevada Support Facility 232 Energy Way Las Vegas, NV

DETERMINATION OF THE EQUILIBRIUM COMPOSITION OF CORES WITH CONTINUOUS FUEL FEED AND REMOVAL USING MOCUP

Nuclear Cross-Section Measurements at the Manuel Lujan Jr. Neutron Scattering Center

BURNUP CALCULATION CAPABILITY IN THE PSG2 / SERPENT MONTE CARLO REACTOR PHYSICS CODE

COMPARATIVE ANALYSIS OF WWER-440 REACTOR CORE WITH PARCS/HELIOS AND PARCS/SERPENT CODES

Reactor Physics: Basic Definitions and Perspectives. Table of Contents

Nuclear data sensitivity and uncertainty assessment of sodium voiding reactivity coefficients of an ASTRID-like Sodium Fast Reactor

Preliminary Uncertainty Analysis at ANL

Neutron and Gamma Ray Imaging for Nuclear Materials Identification

(1) SCK CEN, Boeretang 200, B-2400 Mol, Belgium (2) Belgonucléaire, Av. Arianelaan 4, B-1200 Brussels, Belgium

Vladimir Sobes 2, Luiz Leal 3, Andrej Trkov 4 and Matt Falk 5

CALIBRATION OF SCINTILLATION DETECTORS USING A DT GENERATOR Jarrod D. Edwards, Sara A. Pozzi, and John T. Mihalczo

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Control of the fission chain reaction

22.54 Neutron Interactions and Applications (Spring 2004) Chapter 1 (2/3/04) Overview -- Interactions, Distributions, Cross Sections, Applications

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

11. Point Detector Tally

MC simulation of a PGNAA system for on-line cement analysis

Practical Combustion Kinetics with CUDA

Dose Rates Modeling of Pressurized Water Reactor Primary Loop Components with SCALE6.0

Brief Introduction to: Transport Theory/Monte Carlo Techniques/MCNP

2. The neutron may just bounce off (elastic scattering), and this can happen at all neutron energies.

Monte Carlo neutron transport and thermal-hydraulic simulations using Serpent 2/SUBCHANFLOW

Research Article Calculations for a BWR Lattice with Adjacent Gadolinium Pins Using the Monte Carlo Cell Code Serpent v.1.1.7

Reconstruction of Neutron Cross-sections and Sampling

Use of Monte Carlo and Deterministic Codes for Calculation of Plutonium Radial Distribution in a Fuel Cell

Uncertainty quantification using SCALE 6.2 package and GPT techniques implemented in Serpent 2

Validation of the Monte Carlo Model Developed to Estimate the Neutron Activation of Stainless Steel in a Nuclear Reactor

A Hybrid Stochastic Deterministic Approach for Full Core Neutronics Seyed Rida Housseiny Milany, Guy Marleau

Lecture 20 Reactor Theory-V

PhD Qualifying Exam Nuclear Engineering Program. Part 1 Core Courses

Chapter Four (Interaction of Radiation with Matter)

PROGRESS ON RMC --- A MONTE CARLO NEUTRON TRANSPORT CODE FOR REACTOR ANALYSIS

Neutron interactions and dosimetry. Eirik Malinen Einar Waldeland

Lectures on Applied Reactor Technology and Nuclear Power Safety. Lecture No 1. Title: Neutron Life Cycle

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA

Comparison of the Monte Carlo Adjoint-Weighted and Differential Operator Perturbation Methods

Real-time signal detection for pulsars and radio transients using GPUs

New Capabilities for the Chebyshev Rational Approximation method (CRAM)

Xenon Effects. B. Rouben McMaster University Course EP 4D03/6D03 Nuclear Reactor Analysis (Reactor Physics) 2015 Sept.-Dec.

CASMO-5/5M Code and Library Status. J. Rhodes, K. Smith, D. Lee, Z. Xu, & N. Gheorghiu Arizona 2008

High-Performance Computing, Planet Formation & Searching for Extrasolar Planets

MONTE CARLO SIMULATION OF VHTR PARTICLE FUEL WITH CHORD LENGTH SAMPLING

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Validation of the MCNP computational model for neutron flux distribution with the neutron activation analysis measurement

Randomized Selection on the GPU. Laura Monroe, Joanne Wendelberger, Sarah Michalak Los Alamos National Laboratory

Continuous Energy Neutron Transport

Cross-Sections for Neutron Reactions

1 v. L18.pdf Spring 2010, P627, YK February 22, 2012

Consistent Code-to-Code Comparison of Pin-cell Depletion Benchmark Suite

HIGH PERFORMANCE CTC TRAINING FOR END-TO-END SPEECH RECOGNITION ON GPU

Needs for Nuclear Reactions on Actinides

Michael Dunn Nuclear Data Group Leader Nuclear Science & Technology Division Medical Physics Working Group Meeting October 26, 2005

In the Memory of John Rowlands

Cold Critical Pre-Experiment Simulations of KRUSTy

Thermal Radiation Studies for an Electron-Positron Annihilation Propulsion System

Adjoint-Based Uncertainty Quantification and Sensitivity Analysis for Reactor Depletion Calculations

WM2016 Conference, March 6 10, 2016, Phoenix, Arizona, USA. Recovering New Type of Sources by Off-Site Source Recovery Project for WIPP Disposal-16604

Study on Nuclear Transmutation of Nuclear Waste by 14 MeV Neutrons )

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic

Overview of SCALE 6.2

Modeling of Spent Fuel Pools with Multi stage, Response function Transport (MRT) Methodologies

Transmutation of Minor Actinides in a Spherical

MCNP CALCULATION OF NEUTRON SHIELDING FOR RBMK-1500 SPENT NUCLEAR FUEL CONTAINERS SAFETY ASSESMENT

Sodium void coefficient map by Serpent

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

Transcription:

MONTE CARLO NEUTRON TRANSPORT SIMULATING NUCLEAR REACTIONS ONE NEUTRON AT A TIME Tony Scudiero NVIDIA

TAKEAWAYS Why Monte Carlo methods are fundamentally different than deterministic methods Inherent Parallelism of Monte Carlo Methods is great on GPUs Parallelizing complex monte carlo simulations with highly involved samples on GPUs Approach Monte Carlo Divergence from a Memory Bandwidth Perspective

NEUTRONICS Study of the interaction of neutrons with specific materials and geometries. Nuclear Reactor Design Medical imaging Radiation Treatment Design Radiation Shielding Design

COMMON APPLICATION: K EFFECTIVE K effective - Effective Neutron multiplication factor Average number of neutrons resulting from fission which will go on to cause another fission. K eff =1 K eff > 1

NEUTRON TRANSPORT EQUATION Change in Angular Flux Loss to streaming Loss to Collision Gain From Fission Precursors Gain from Scattering Gain from Source http://en.wikipedia.org/wiki/neutron_transport

NEUTRONICS CODES Determinisitc NEWT (ORNL) Denovo (ORNL) PARTISN (LANL) DRAGON Monte Carlo MCNP (Los Alamos) Serpent (VTT Finland) KENO (ORNL) OpenMC (MIT)(ANL)

MONTE CARLO METHODS If at first you succeed, try again. And again. And again. And again. And again. And again. And Again. And Again. Useful for approximating solutions to problems with unknown or extraordinarily difficult direct solutions Computational Finance Computational Biology (protein folding, ion transport) Graphics (Ray Tracing, hair/cloth simulation) Virtual Lab Dangerous, Expensive, Impossible https://xkcd.com/242/

SIMPLE MONTE CARLO EXAMPLE Calculate Pi: 2 4 Generate N uniformly distributed random (x,y) points x,y 1 3 ops, 1 compare per sample No other data K40: 7.9E+10 Samples per second 3.1E+11 Samples per second Quad K40 http://upload.wikimedia.org/wikipedi a/commons/8/84/pi_30k.gif

SHIELDING EXAMPLE Neutron Detector 2 2 Neutron Source Neutron Detector Neutron Shield

SHIELDING EXAMPLE Not to scale Tally Scatter Tally Scatter Absorption Neutron Shield Material

CROSS SECTIONS Resonance frequency and shape varies by nuclide Nuclides have varying degrees of high-frequency resonances in cross section data. Aptly named: 1 barn = 1E-28 square meters. 1E-4 square yoctometers From: http://wwwndc.jaea.go.jp/j33fig/j33fig10.html

KORD SMITH CHALLENGE Delivered at Mathematics & Computation 2003 Monte Carlo simulation of power for fuel burnup calculations 40-60 million tallies with standard deviation of <1% Predicts 2030 a single processor can do this in 1 hr. Mathematics & Computation 2007: Bill Martin changes update to 2019

HOOGENBOOM AND MARTIN BENCHMARK Very specific: Geometry, Materials, concentrations, designs. Implement in any MC Neutronics app.(mcnp, OpenMC) Figures from Hoogenboom, J.E., Martin, W.R., Petrovic, B., 2011. The Monte Carlo performance benchmark test - aims, specifications, and first results. In International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering. Rio de Janerio, Brazil.

OPENMC Recently Developed MC Neutronics code from MIT Active Research at Center for Exascale Simulation of Advanced Reactors (CESAR) ANL ~30k lines of F90 Can compute the H-M benchmark In use as a neutronics virtual laboratory Primarily computational experiments thus far

MAJOR ALGORITHM SUBSECTIONS Geometry Identify the material in the neutron s location Constructive Solid Geometry Physics Probabilistic Collisions Collision Modeling: Inelastic Scattering Elastic Scattering Fission Delayed Neutron Prompt Neutron Cross Section Calculation

MONTE CARLO NEUTRONICS ON GPUS: DIVERGENCE While (neutron.alive) Sample Next Event Outer Loop Divergence Collison Scatter Inelastic Elastic Fission Inner Loop Divergence Absorption Delayed Prompt

OPENMC H-M SMALL ANALYSIS Percentage of walltime spent in specific functions.

XSBENCH DOE Benchmark for Material Cross Section calculation Derived from OpenMC ~1k lines of C99 Developed by John Tramm at ANL Used in NVIDIA s research on MC for future architectures

MATHEMATICAL FORMULATION Σ,, - Concentration of nuclide i in material -Number of different nuclides in material, - Cross section of Nuclide I at energy E Approximated by linear interpolation of proximal samples. Σ - Material cross section of type t. Total, Fission, Elastic, Inelastic, Capture E - random energy mat - random material

COMPUTATIONAL ANALYSIS Σ,, 5 different t s. M ranges from 4 to 321 Using linear interpolation for, involves loading 2 values, 3 flops 10 loads and 15 flops per t Additionally: 1 load for, 2 loads for E, 3 flops for interp. coefficient 104 bytes, 23 flops for each i =1:M

DIVERGENCE WITHIN XSBENCH Threads drop out as nuclide (M) loop completes. HM-Small warp lanes 350 300 Nuclides Per Material Time 0 1 2 3 30 31 NUMBER OF NUCLIDES 250 200 150 100 50 0 1 2 3 4 5 6 7 8 9 10 11 MATERIAL INDEX Small Large

XSBENCH PERFORMANCE COMPARISON HM Small HM Large LOOKUPS PER SECOND 40,000,000 30,000,000 20,000,000 10,000,000 0 LOOKUPS PER SECOND 12,000,000 10,000,000 8,000,000 6,000,000 4,000,000 2,000,000 0 Processor Lookups / Sec Speedup vs IVB10 SB6(HT) 4,452,404 0.62397 SB8 4,684,446 0.656488 SB8x2 6,253,792 0.87642 IVB 10 7,135,611 1 IVB 10 x2 11,939,573 1.673238 K40 Original 17,941,073 2.514301 K40 Tuned 39,761,679 5.572288 Processor Lookups / Sec Speedup vs IVB10 SB6(HT) 1,321,943 0.619345 SB8 1,453,827 0.681134 SB8x2 1,646,447 0.771379 IVB 10 2,134,420 1 IVB 10 x2 3,214,495 1.506027 K40 Original 5,059,439 2.370405 K40 Tuned 11,955,176 5.601136

GPU OPTIMIZATION TECHNIQUES USED Load outside Inner Loop Use large datatypes (double2) Use LDG intrinsic L1 Caching helps out AOS loads Lower load granularity Unroll outer loop Mitigate outer loop divergence Maximize loads in flight Pseudosort Warp Specialization for fuel ~60% of performance difference ~30% of performance difference <10% of performance difference

LOAD OUTSIDE INNER LOOP Energy Value Nuclide i ND for(int i=0; i<numnuclides[material]; i++) { double r[5]; // results f = (E - ND[0])/ (ND[6] - ND[0]); r[0] = ND[1] + f*(nd[7] - ND[1]); r[1] = ND[2] + f*(nd[8] - ND[2]); r[2] = ND[3] + f*(nd[9] - ND[3]); r[3] = ND[4] + f*(nd[10] - ND[4]); r[4] = ND[5] + f*(nd[11] - ND[5]); for(int i=0; i<numnuclides[material]; i++) { double r[5]; // results double2 d[6]; // register storage for(int t=0; t<6; t++) { d[t] = N0[t]; } f = (E - d[0].x)/ (d[3].x - d[0].x); r[0] = d[0].y + f*(d[3].y - d[0].y); r[1] = d[1].x + f*(d[4].y - d[1].y); r[2] = d[1].y + f*(d[4].y - d[1].y); r[3] = d[2].x + f*(d[5].y - d[2].y); r[4] = d[2].y + f*(d[5].y - d[2].y);

USE LDG INTRINSIC Energy Value Nuclide i 32 Bytes 128 Bytes for(int i=0; i<numnuclides[material]; i++) { double r[5]; // results double2 d[6]; // register storage for(int t=0; t<6; t++) { d[t] = ldg(&n0[t]); } 32 Byte Load Granularity Temporal Locality in Texture Cache SM_35+ (GK110 and later) f = (E - d[0].x)/ (d[3].x - d[0].x); r[0] = d[0].y + f*(d[3].y - d[0].y); r[1] = d[1].x + f*(d[4].y - d[1].y); r[2] = d[1].y + f*(d[4].y - d[1].y); r[3] = d[2].x + f*(d[5].y - d[2].y); r[4] = d[2].y + f*(d[5].y - d[2].y);

UNROLL OUTER LOOP Energy Value Nuclide i Nuclide i+1 ND0 for(int i=0; i<numnuclides[material]; i+=2) { double r[5]; // results double2 d[12]; // register storage for(int t=0; t<6; t++) { d[t] = ldg(&nd0[t]); d[t+6] ldg(&nd1[t]); } f0 = (E - d[0].x)/ (d[3].x - d[0].x);... // same compute as before f1 = (E - d[6].x)/ (d[9].x - d[6].x);... // Same compute with indexes+=6; ND1 Increases loads in flight before stall Mitigates divergence effect on BW Minute effect decreasing loop overhead Extra conditional code (not shown) Not just a #pragma unroll

PSEUDOSORT warp 0 warp 0 Time 0 1 2 3 30 31 0 1 2 3 30 31

FUEL-SORT SPECIALIZATION O(tb) scan moves all fuel lookups to first threads Expect < 32 fuel lookups => all fuel in warp0 Warp Specialization approach to divergence mitigation In-core(SM) sort, tb<=256 => low sorting cost Warp0 runs significantly longer than other warps Other warps do not clear SM, but are not scheduled ~10% improvement in memory bandwidth utilization ~2-5% performance improvement cost of scan reduces overall gains

XSBENCH PERFORMANCE COMPARISON HM Small HM Large LOOKUPS PER SECOND 40,000,000 30,000,000 20,000,000 10,000,000 0 LOOKUPS PER SECOND 12,000,000 10,000,000 8,000,000 6,000,000 4,000,000 2,000,000 0 Processor Lookups / Sec Speedup vs IVB10 SB6(HT) 4,452,404 0.62397 SB8 4,684,446 0.656488 SB8x2 6,253,792 0.87642 IVB 10 7,135,611 1 IVB 10 x2 11,939,573 1.673238 K40 Original 17,941,073 2.514301 K40 Tuned 39,761,679 5.572288 Processor Lookups / Sec Speedup vs IVB10 SB6(HT) 1,321,943 0.619345 SB8 1,453,827 0.681134 SB8x2 1,646,447 0.771379 IVB 10 2,134,420 1 IVB 10 x2 3,214,495 1.506027 K40 Original 5,059,439 2.370405 K40 Tuned 11,955,176 5.601136

FROM XSBENCH BACK TO OPENMC OpenMC on I7-3930k CPU (SB6 + HT) Small: 14.9k neutrons/second (OMP_NUM_THREADS=12) Large: 4.1k neutrons/second (OMP_NUM_THREADS=12) Git Cloned on Feb 21 st, 2014 K40 (full Boost) Projection Small: [ 39.4k - 133k ] neutrons / second Large: [ 9.0k - 37.07] neutrons / second Lower Bound: No acceleration of other code areas Upper Bound: Equal acceleration of other code areas Grain of Salt: Parallelizing the rest of the code generates additional memory traffic.

A LIVING CODE CUDA Conversion started in late 2012 OpenMC data structures have changed since original snapshot OpenMC CPU performance has changed as well. OpenMC Capabilities have expanding

WHAT S NEXT Integrate into New Versions of OpenMC New version more efficient for XS calculations Explore new Cross Section storage and calculations Explore other MC Neutronics Explore other MC transport (Photonics)

SUMMARY GPUs are very good at Monte Carlo Processes Embarrassingly Parallel on the surface Complex samples get... complicated. Monte Carlo Neutronics Require Out of Core operation, but do well on GPUs Complex simulation geometry is GPU s bread and butter OpenMC: Still in progress. Initial results are promising XSBench: Really really good on GPUs.

MORE MONTE CARLO ON GPU S4554 - GPU Acceleration of a Variational Monte Carlo Method - Wednesday 16:30 212A S4156 - Mining Hidden Coherence for Parallel Path Tracing on GPUs - Thursday 15:00 LL21C S4154 - Pricing American Options with Least Square Monte Carlo simulations on GPUs - Thursday 15:00 210C S4259 - Optimization of a CUDA-based Monte Carlo Code for Radiation Therapy - Wednesday 9:00 212A S4784 - Monte-Carlo Simulation of American Options with GPUs - Wednesday 10:30 210C S4261 - Massively-Parallel Stochastic Control and Automation: A New Paradigm in Robotics Wednesday 15:30

TAKEAWAYS Why Monte Carlo methods are fundamentally different than deterministic methods Inherent Parallelism of Monte Carlo Methods is great on GPUs Parallelizing complex monte carlo simulations with highly involved samples on GPUs Approach Monte Carlo Divergence from a Memory Bandwidth Perspective