Using GPUs to Accelerate Online Event Reconstruction at the Large Hadron Collider
Dr. Andrea Bocci, Applied Physicist
On behalf of the CMS Collaboration
Discover CERN
Inside the Large Hadron Collider at CERN we recreate the conditions found in the very early Universe, less than a second after the Big Bang, to create particles that are not present in ordinary matter. Hydrogen nuclei (protons) are accelerated up to 0.99999999 times the speed of light, to an energy almost 7000 times their rest energy, and then made to collide, converting all that energy into new particles that did not exist before the collisions, like the Higgs boson.
10/10/2018 E8382 - Using GPUs to Accelerate Online Event Reconstruction at the Large Hadron Collider - Andrea Bocci 2
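The two numbers above are linked by special relativity: a proton whose energy is γ times its rest energy moves at β = v/c = sqrt(1 - 1/γ²). A minimal sketch, taking γ ≈ 7000 from the slide; the function names are ours, for illustration only:

```cpp
#include <cassert>
#include <cmath>

// 1 - beta for a particle with Lorentz factor gamma = E / (m c^2).
// Uses 1 - beta = 1 / (gamma^2 (1 + beta)) to avoid cancellation when beta is close to 1.
double oneMinusBeta(double gamma) {
  assert(gamma >= 1.0);
  const double beta = std::sqrt(1.0 - 1.0 / (gamma * gamma));
  return 1.0 / (gamma * gamma * (1.0 + beta));
}

// Speed as a fraction of the speed of light.
double speedOverC(double gamma) { return 1.0 - oneMinusBeta(gamma); }
```

For γ = 7000, oneMinusBeta returns about 1.02 × 10⁻⁸, i.e. β ≈ 0.99999999, as quoted.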
The CMS detector
Muon detectors
4 Tesla superconducting magnet
Hadronic calorimeter: brass and scintillators
Inner tracker: silicon pixels and silicon strips
Electromagnetic calorimeter: lead tungstate crystals
Full readout: 1 MB/event, with ADC conversion and zero suppression
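Zero suppression is what keeps the full readout near 1 MB/event: in a given event most channels carry no signal, so only channels whose digitised (ADC) value exceeds a noise threshold are kept, together with their index. A minimal sketch; the threshold value, the 16-bit sample type and the Hit layout are illustrative assumptions, not CMS parameters:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One surviving channel: its position in the readout and its ADC value.
struct Hit {
  std::uint32_t channel;
  std::uint16_t adc;
};

// Keeps only the channels whose ADC value is strictly above the threshold.
std::vector<Hit> zeroSuppress(const std::vector<std::uint16_t>& adcSamples,
                              std::uint16_t threshold) {
  std::vector<Hit> hits;
  for (std::uint32_t i = 0; i < adcSamples.size(); ++i) {
    if (adcSamples[i] > threshold) {
      hits.push_back({i, adcSamples[i]});
    }
  }
  return hits;
}
```

Storing the channel index next to each value only pays off when occupancy is low, which is the typical situation for a single event.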
Data Acquisition constraints
The LHC produces over 3 billion proton collisions per second in ATLAS and CMS: over 1.5 billion in each experiment, grouped in 30 million events per second on average, with over 50 collisions per event!
Recording every event would require reading and processing 30 TB/s of data, well beyond the capacity of the detector electronics, of the online and offline event processing, and of the long-term storage.
Instead, the experiments rely on a two-level trigger system to collect only the interesting events (e.g. Standard Model probes, Higgs boson candidates, possible signatures of new physics):
the Level 1 Trigger (L1T), implemented with custom electronics, reduces the data rate from 30 MHz to 100 kHz;
the High Level Trigger (HLT), implemented on standard servers, further reduces it to an average of 1 kHz.
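These figures are consistent with each other. A short sketch of the arithmetic, using only numbers from this slide and the 1 MB/event full readout quoted for the CMS detector (the constant names are ours):

```cpp
#include <cassert>

// Figures quoted in the talk.
constexpr double kEventRate = 30e6;   // events per second in each experiment (average)
constexpr double kPileup    = 50;     // proton collisions per event (average)
constexpr double kEventSize = 1e6;    // bytes per event, full readout
constexpr double kL1Rate    = 100e3;  // events per second accepted by the Level 1 Trigger
constexpr double kHltRate   = 1e3;    // events per second accepted by the HLT

constexpr double collisionsPerSecond() { return kEventRate * kPileup; }    // 1.5e9
constexpr double rawBandwidth()        { return kEventRate * kEventSize; } // 3e13 B/s = 30 TB/s
constexpr double l1Bandwidth()         { return kL1Rate * kEventSize; }    // 1e11 B/s = 100 GB/s
constexpr double l1Rejection()         { return kEventRate / kL1Rate; }    // 300
constexpr double hltRejection()        { return kL1Rate / kHltRate; }      // 100
```

The 100 GB/s after the Level 1 Trigger matches the rate of raw data fragments flowing into the event builder in the DAQ diagram.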
The Level 1 Trigger
Fast readout of the detector, with a coarse granularity: electromagnetic calorimeter, hadronic calorimeter, muon detectors
Implementation: hardware (ASICs and FPGAs), synchronous operation at the 40 MHz LHC clock
Constraints from the detectors:
pipeline: ~4 μs to take a decision
readout: 100 kHz maximum output rate
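Because the Level 1 Trigger runs synchronously with the 40 MHz clock, the ~4 μs latency translates into a fixed pipeline depth: the data of 40 MHz × 4 μs = 160 bunch crossings must be buffered while the decision is taken. A minimal software sketch of such a fixed-latency pipeline as a ring buffer; the LatencyPipeline class is hypothetical, and the real front-ends implement this in ASICs and FPGAs:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::size_t kClockMHz  = 40;
constexpr std::size_t kLatencyNs = 4000;                            // ~4 us decision time
constexpr std::size_t kDepth     = kClockMHz * kLatencyNs / 1000;   // 160 bunch crossings

// Fixed-latency pipeline: every clock tick pushes one sample in and, once the
// pipeline is full, returns the sample that entered N ticks earlier. By then the
// trigger decision for that crossing is known: it is either read out or dropped.
template <typename T, std::size_t N>
class LatencyPipeline {
public:
  std::optional<T> tick(const T& sample) {
    std::optional<T> out;
    if (filled_ == N) {
      out = buffer_[head_];  // oldest sample leaves the pipeline
    } else {
      ++filled_;
    }
    buffer_[head_] = sample;  // newest sample takes its slot
    head_ = (head_ + 1) % N;
    return out;
  }

private:
  std::array<T, N> buffer_{};
  std::size_t head_ = 0;    // next slot to write, which is also the oldest slot
  std::size_t filled_ = 0;
};
```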
Data Acquisition & HLT
Diagram: L1 Trigger (100 kHz) → raw data fragments (100 GB/s) → event builder → on-demand reconstruction & event selection → storage manager → transfer system → Tier-0 (5 GB/s)
Resources shown: 20 TB RAM, > 30 000 CPU cores
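The event builder assembles, for each event accepted by the Level 1 Trigger, the raw data fragments coming from the many readout units into a single event. A minimal sketch of that bookkeeping, assuming for illustration that each fragment carries an event number and a source identifier, and that an event is complete once every source has contributed; the Fragment and EventBuilder types are hypothetical, and the real event builder is a distributed, network-based system:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

struct Fragment {
  std::uint64_t event;                // event number assigned at L1 accept
  std::uint32_t source;               // readout unit that produced the fragment
  std::vector<std::uint8_t> payload;  // raw data
};

class EventBuilder {
public:
  explicit EventBuilder(std::size_t nSources) : nSources_(nSources) {}

  // Adds a fragment; returns the full event (fragments ordered by source)
  // as soon as the last missing fragment arrives.
  std::optional<std::vector<Fragment>> add(Fragment f) {
    const std::uint64_t id = f.event;
    auto& parts = pending_[id];
    parts[f.source] = std::move(f);
    if (parts.size() < nSources_) return std::nullopt;
    std::vector<Fragment> event;
    event.reserve(nSources_);
    for (auto& entry : parts) event.push_back(std::move(entry.second));
    pending_.erase(id);
    return event;
  }

  std::size_t incompleteEvents() const { return pending_.size(); }

private:
  std::size_t nSources_;
  // event number -> (source id -> fragment)
  std::map<std::uint64_t, std::map<std::uint32_t, Fragment>> pending_;
};
```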
The High Level Trigger
Detector readout with full granularity, at 100 kHz
Constraints: 300 ms average time to take a decision; 1 kHz average output rate (rejection 100:1)
Software event reconstruction and selection: runs on commercial servers, quasi-real-time, self-monitoring
CMSSW: a modular C++ reconstruction software, with over 4000 modules written by hundreds of physicists and configured via a dedicated Python library; each module runs as one or more TBB tasks, and multi-threading is exploited to run multiple modules and reconstruct multiple events concurrently
https://github.com/cms-sw/cmssw/
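Two points on this slide can be made concrete. First, by Little's law, 100 kHz of input and 300 ms per event imply about 100 000 × 0.3 = 30 000 events in flight at any time, roughly one per core of the > 30 000-core farm. Second, events are processed concurrently. The sketch below imitates that with std::async from the C++ standard library instead of TBB; the "modules" are hypothetical placeholder functions, and concurrency between modules within one event is not shown:

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Little's law: average number of events being processed at the same time.
constexpr double eventsInFlight(double rateHz, double latencySeconds) {
  return rateHz * latencySeconds;
}

// Hypothetical stand-in for a reconstruction module: reads raw data, produces one number.
using Module = std::function<int(const std::vector<int>&)>;

// Runs every module on one event, in order, and returns their outputs.
std::vector<int> processEvent(const std::vector<int>& raw, const std::vector<Module>& modules) {
  std::vector<int> products;
  for (const auto& m : modules) products.push_back(m(raw));
  return products;
}

// Launches one asynchronous task per event, so events are reconstructed concurrently.
std::vector<std::vector<int>> processEvents(const std::vector<std::vector<int>>& events,
                                            const std::vector<Module>& modules) {
  std::vector<std::future<std::vector<int>>> tasks;
  for (const auto& raw : events)
    tasks.push_back(std::async(std::launch::async, processEvent, std::cref(raw), std::cref(modules)));
  std::vector<std::vector<int>> results;
  for (auto& t : tasks) results.push_back(t.get());  // wait for each event in turn
  return results;
}
```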
Event reconstruction
Online reconstruction
Share of the online reconstruction, by step:
Full tracking and vertexing 30%
HCAL local reco 16%
Application logic and I/O 15%
Pixel local reco and tracking 9%
ECAL local reco 8%
Particle Flow and Taus reco 8%
Muons reco 7%
E/Gamma reco 4%
Jets/MET reco 3%
Local reconstruction
First step of the reconstruction
Over 120 million channels: individual pixels in an image, multiple samples over time
Extract individual features from each detector: 2D clusters and coordinate transformations, multiple peak finding, building inner track stubs
Can be processed in parallel
Relevant shares of the online reconstruction: HCAL local reco 16%, Pixel local reco and tracking 9%, ECAL local reco 8%
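Pixel clustering is a typical local-reconstruction step: neighbouring fired pixels are grouped into a 2D cluster whose charge-weighted centre gives the hit position. A minimal sketch using a breadth-first flood fill with 8-connectivity on a small dense grid; this illustrates the idea only and is not the CMS clustering algorithm, which works on sparse lists of fired pixels and, on the GPU, in parallel:

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

struct Cluster {
  double x = 0, y = 0;  // charge-weighted centre (column, row) in pixel units
  int charge = 0;       // total ADC charge
  int size = 0;         // number of pixels
};

// Groups adjacent non-zero pixels (including diagonal neighbours) into clusters.
std::vector<Cluster> findClusters(const std::vector<std::vector<int>>& adc) {
  const int rows = static_cast<int>(adc.size());
  const int cols = rows ? static_cast<int>(adc[0].size()) : 0;
  std::vector<std::vector<bool>> seen(rows, std::vector<bool>(cols, false));
  std::vector<Cluster> clusters;
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      if (adc[r][c] <= 0 || seen[r][c]) continue;  // empty pixel or already clustered
      Cluster cl;
      double sx = 0, sy = 0;
      std::queue<std::pair<int, int>> todo;
      todo.push({r, c});
      seen[r][c] = true;
      while (!todo.empty()) {
        auto [pr, pc] = todo.front();
        todo.pop();
        const int q = adc[pr][pc];
        cl.charge += q;
        ++cl.size;
        sx += q * pc;
        sy += q * pr;
        for (int dr = -1; dr <= 1; ++dr)
          for (int dc = -1; dc <= 1; ++dc) {
            const int nr = pr + dr, nc = pc + dc;
            if (nr >= 0 && nr < rows && nc >= 0 && nc < cols && !seen[nr][nc] && adc[nr][nc] > 0) {
              seen[nr][nc] = true;
              todo.push({nr, nc});
            }
          }
      }
      cl.x = sx / cl.charge;
      cl.y = sy / cl.charge;
      clusters.push_back(cl);
    }
  }
  return clusters;
}
```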
Track reconstruction
Build tracks with the full silicon tracker information
Iterative process, dominated by combinatorial complexity: start from easy candidates, and clean the data after each step
Parallel workflow: thousands of tracks
Very non-homogeneous propagation across layers: the strong magnetic field bends the tracks
Similar to ray-tracing in a non-linear space
Relevant shares of the online reconstruction: Full tracking and vertexing 30%, Pixel local reco and tracking 9%
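The bending is what makes the propagation non-linear. In a uniform field B along the beam axis, a unit-charge particle with transverse momentum pT follows a circle of radius R[m] = pT[GeV/c] / (0.2998 · B[T]) in the transverse plane; starting from the beam line, it crosses a layer of radius r at an azimuth shifted by asin(r / 2R), and never reaches it if r > 2R. A minimal sketch of these two textbook formulas, using the 4 T field quoted for the magnet purely as an example; it ignores energy loss, multiple scattering and field non-uniformities, which a real propagator must handle:

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Radius of curvature in metres of a unit-charge track with transverse momentum
// ptGeV (GeV/c) in a uniform axial field of bTesla: R = pT / (0.2998 B).
double curvatureRadius(double ptGeV, double bTesla) {
  return ptGeV / (0.299792458 * bTesla);
}

// Azimuth (radians) at which a track produced on the beam line with initial
// direction phi0 crosses a cylindrical layer of radius rMetres.
// charge is +1 or -1; with B along +z, positive tracks bend clockwise.
// Returns nothing if the track curls up before reaching the layer (r > 2R).
std::optional<double> phiAtLayer(double phi0, int charge, double ptGeV,
                                 double bTesla, double rMetres) {
  const double R = curvatureRadius(ptGeV, bTesla);
  if (rMetres > 2.0 * R) return std::nullopt;
  return phi0 - charge * std::asin(rMetres / (2.0 * R));
}
```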
Physics objects
Reconstruct physics objects: electrons and photons, muons, tau leptons, jets and missing transverse energy, b-jets, invariant masses, jet substructure, ...
Extract high-level features: combine the information from the previous steps; hundreds of different algorithms, dominated by each individual algorithm's complexity
Hard to parallelise
Relevant shares of the online reconstruction: Particle Flow and Taus reco 8%, Muons reco 7%, E/Gamma reco 4%, Jets/MET reco 3%
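As an example of combining objects from previous steps, the invariant mass of a system of reconstructed particles (e.g. a photon pair in a Higgs boson search) follows from their four-momenta: m² = (ΣE)² - |Σp|², in natural units. A minimal sketch; the FourMomentum type and invariantMass function are ours:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Four-momentum in natural units (GeV, c = 1).
struct FourMomentum {
  double e, px, py, pz;
};

// Invariant mass of the sum of the given four-momenta.
double invariantMass(const std::vector<FourMomentum>& particles) {
  double e = 0, px = 0, py = 0, pz = 0;
  for (const auto& p : particles) {
    e += p.e;
    px += p.px;
    py += p.py;
    pz += p.pz;
  }
  const double m2 = e * e - (px * px + py * py + pz * pz);
  return m2 > 0 ? std::sqrt(m2) : 0.0;  // rounding can make m2 slightly negative for massless systems
}
```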
Application logic and I/O
Application logic: schedule multiple events concurrently; run different algorithms on demand
Input and output: read input in the binary format defined by the electronics; write multiple output files in a (compressed) object format
Data Quality Monitoring: monitor the physics and computing performance of the reconstruction
Share of the online reconstruction: Application logic and I/O 15%
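Running algorithms on demand means that a product is computed only if some consumer asks for it, and at most once per event. A minimal sketch of that idea with lazily evaluated, memoised producers; the OnDemandEvent class and the product names are hypothetical, and this is not the CMSSW scheduler:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

class OnDemandEvent {
public:
  using Producer = std::function<double(OnDemandEvent&)>;

  // Registers how to compute a product; nothing runs yet.
  void registerProducer(const std::string& name, Producer p) { producers_[name] = std::move(p); }

  // Returns the product, running its producer (and, recursively, the producers
  // it depends on) only the first time it is requested.
  double get(const std::string& name) {
    if (auto it = cache_.find(name); it != cache_.end()) return it->second;
    auto p = producers_.find(name);
    if (p == producers_.end()) throw std::out_of_range("no producer for " + name);
    ++runs_;
    const double value = p->second(*this);
    cache_[name] = value;
    return value;
  }

  int producersRun() const { return runs_; }

private:
  std::map<std::string, Producer> producers_;
  std::map<std::string, double> cache_;
  int runs_ = 0;
};
```

A producer that nobody requests, directly or through a dependency, never runs, which saves time in a trigger that rejects most events early.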
High Luminosity LHC
Starting in 2026, CMS will face:
an increase in luminosity by a factor 2.5 (4), with a corresponding increase in pileup and event complexity
a higher readout rate, by a factor 5 (7.5)
the reconstruction of data from new, more complex detectors
Even assuming a linear scaling, CMS will need 12x (30x) more processing power!
(Values in parentheses refer to the ultimate HL-LHC scenario.)
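Under the linear-scaling assumption, the required processing power P grows as the product of the increase in event complexity (taken to follow the luminosity L, through pileup) and the increase in readout rate R; the symbols are ours:

```latex
\frac{P_{\mathrm{HL}}}{P_{\mathrm{now}}}
  \approx \frac{\mathcal{L}_{\mathrm{HL}}}{\mathcal{L}_{\mathrm{now}}}
  \times \frac{R_{\mathrm{HL}}}{R_{\mathrm{now}}}
  = 2.5 \times 5 = 12.5 \approx 12,
  \qquad 4 \times 7.5 = 30
```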
Towards a Heterogeneous High Level Trigger
A traditional approach
Diagram components: event builder, on-demand reconstruction & event selection, storage manager
Local accelerators
Diagram components: event builder, on-demand, accelerated reconstruction & event selection, storage manager
Remote accelerators
Diagram components: on-demand reconstruction & event selection, storage manager, accelerated reconstruction
On-demand remote accelerators
Diagram components: event builder, storage manager, accelerated reconstruction
A case study
Exploit parallelism?
Chart: the same breakdown of the online reconstruction shown above (Full tracking and vertexing 30%, HCAL local reco 16%, Application logic and I/O 15%, Pixel local reco and tracking 9%, ECAL local reco 8%, Particle Flow and Taus reco 8%, Muons reco 7%, E/Gamma reco 4%, Jets/MET reco 3%)
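The breakdown also bounds what accelerating a single step can bring to the whole HLT. By Amdahl's law, if a fraction f of the processing time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). A minimal sketch; treating the percentages as shares of processing time, and applying the formula to the 9% of pixel local reconstruction and tracking, is our illustration, not a result from the talk:

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction f of the work runs s times faster.
double amdahlSpeedup(double f, double s) {
  assert(f >= 0.0 && f <= 1.0 && s > 0.0);
  return 1.0 / ((1.0 - f) + f / s);
}
```

With f = 0.09, even an infinitely fast pixel step gives at most 1 / 0.91 ≈ 1.1 for the whole HLT, which is why porting a larger part of the reconstruction matters.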
How fast can we go?
Chart: throughput of the pixel local reconstruction and tracking, in events/s (axis 0 to 500 ev/s), on 1 and 2 Xeon Gold 6140 CPUs
Can we go faster?
CMSSW framework and GPUs
Leverage the inherent parallelism of the data:
copy the pixel raw data to a GPU;
CMSSW modules queue asynchronous memory operations and CUDA kernels on the GPU;
all GPU operations run asynchronously from the CPU and, when complete, notify the framework via a callback, so the host CPU can process other data in parallel;
keep all intermediate steps on the GPU, using dedicated data structures that facilitate parallelism;
transfer the results back to the host;
convert them to the format expected by the consumers.
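The pattern above (queue work on the device, let the CPU continue, get notified on completion) can be illustrated without a GPU. With the CUDA runtime the queue would be a cudaStream_t, the steps cudaMemcpyAsync calls and kernel launches, and the notification a host callback (cudaStreamAddCallback, or cudaLaunchHostFunc in newer CUDA releases). The sketch below is a standard-C++ stand-in under those assumptions: a hypothetical FakeStream runs queued tasks in order on a worker thread, and a std::promise plays the role of the callback that notifies the framework:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a CUDA stream: tasks run in FIFO order on a
// worker thread, asynchronously with respect to the caller.
class FakeStream {
public:
  FakeStream() : worker_([this] { run(); }) {}
  ~FakeStream() {
    { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
    cv_.notify_one();
    worker_.join();  // the worker drains the queue before returning
  }
  void enqueue(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(m_); tasks_.push(std::move(task)); }
    cv_.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop requested and nothing left to do
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stop_ = false;
  std::thread worker_;  // declared last: starts after the members it uses
};

// Copies the "raw data" to the "device", runs a "kernel", copies the result
// back and signals completion, while the host keeps working.
std::vector<int> offload(const std::vector<int>& raw, long& hostWork) {
  std::vector<int> device, result;
  std::promise<void> done;
  std::future<void> ready = done.get_future();
  FakeStream stream;  // declared last, so it is joined before the data above is destroyed
  stream.enqueue([&] { device = raw; });                  // like a host-to-device copy
  stream.enqueue([&] { for (int& x : device) x *= 2; });  // like a kernel
  stream.enqueue([&] { result = device; });               // like a device-to-host copy
  stream.enqueue([&] { done.set_value(); });              // like the completion callback
  for (long i = 0; i < 1000; ++i) hostWork += i;          // the CPU is free meanwhile
  ready.wait();
  return result;
}
```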
Can we go faster?
Chart: throughput of the accelerated pixel local reconstruction and tracking, in events/s (axis 0 to 2500 ev/s), on 1 and 2 Xeon Gold 6140 CPUs and 1 and 2 Tesla P100 GPUs
Can we go even faster?
Can we go even faster?
Chart: throughput of the accelerated pixel local reconstruction and tracking, in events/s (axis 0 to 5000 ev/s), on 1 and 2 Xeon Gold 6140 CPUs, 1 and 2 Tesla P100 GPUs, and 1 to 4 Tesla V100 GPUs
What else can we improve?
Outstanding performance!
Chart (CMS Simulation, Preliminary): throughput in events/s (axis 0 to 5000 ev/s) on 1 and 2 Xeon Gold 6140 CPUs, 1 and 2 Tesla P100 GPUs, and 1 to 4 Tesla V100 GPUs
10x faster! 2x better resolution!
Conclusions
Conclusions
The computing requirements for the High Luminosity LHC cannot be met with a traditional approach.
CMS has started an R&D effort to port the reconstruction to CUDA.
The first case study, the pixel local reconstruction and tracking, shows outstanding performance:
10x faster reconstruction, thanks to the computing power of the Tesla V100;
2x better resolution, thanks to more advanced algorithms that the GPUs make affordable;
and the CUDA code is still being optimised!
The experience of the next years will allow us to design a fully heterogeneous online farm for the HL-LHC!
Acknowledgments
Thanks to the Flatiron Institute and CERN OpenLab for providing the hardware resources used in the benchmarks.
Thanks to CERN IdeaSquare for hosting the Patatrack hackathons.
Thank you for your attention Please give your feedback on this presentation on the GTC mobile app
Dr. Andrea Bocci <andrea.bocci@cern.ch>, Applied Physicist
the CMS Collaboration - http://cms.cern
CERN - https://home.cern