GAN Applications in High Energy Particle Physics

Benjamin Nachman, Lawrence Berkeley National Laboratory, with collaborators Michela Paganini and Luke de Oliveira
arXiv: 1705.02355, 1701.05927
Outline: DNNs with HEP images; LAGAN; CaloGAN

Simulation at the LHC
[Figure: schematic of a simulated LHC event, inspired by the Sherpa 1.1 paper - can you spot the differences?]
Spanning 10^-20 m up to 1 m can take O(min/event).

Simulation at the LHC
This is only possible because of factorization (Markov property): given the physics at one energy (~1/length) scale, the physics at the next one is independent of what came before.

Part I: Hard-scatter
We begin with a model and ME generators.
$\mathcal{L} = -\frac{1}{4} F_{\mu\nu} F^{\mu\nu} + i \bar{\psi} \slashed{D} \psi + \psi_i y_{ij} \psi_j \phi + \text{h.c.} + |D_\mu \phi|^2 - V(\phi) + \,???$
[Figure: MadGraph5_aMC@NLO welcome banner.]
Standard is automated NLO or LO + matched. For many cases, this is slow but not limiting (yet).
See this paper for adapting a ME to HPC. See this paper for ME integration with GNNs.

Part II: Fragmentation
Fragmentation uses MCMC; standard is leading-log. Not a limiting factor in terms of computing time.

Part III: Material Interactions
State-of-the-art for material interactions is Geant4. It includes electromagnetic and hadronic physics, with a variety of physics lists for increasing/decreasing accuracy (at the cost of time). This accounts for an O(1) fraction of all HEP computing resources!

Part IV: Digitization
It is important to mention that after Geant4, each experiment has custom code for digitization; this can also be slow, but is usually faster than G4 and reconstruction.
[Figure: digitization schematic: deposited charge and energy loss dE/dx (MIP, 2 MIPs), preamplifier output vs. time, analog threshold, 40 MHz clock, time over threshold (ToT).]

Part IV: Digitization
N.B. calorimeter energy deposits factorize (the sum of the deposits is the deposit of the sum), but digitization (w/ noise) does not!
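A minimal sketch of why the deposits factorize while digitization does not. The `digitize` function and its threshold are hypothetical: a simple threshold stands in for the nonlinear readout, and noise is left out so the example stays deterministic. This is a toy, not any experiment's digitization code.

```python
import numpy as np

def digitize(energy, threshold=0.5):
    """Toy digitization (assumed): keep a cell's energy only above threshold."""
    return np.where(energy > threshold, energy, 0.0)

# Two toy sets of energy deposits in the same three calorimeter cells.
deposit_a = np.array([0.3, 0.0, 1.2])
deposit_b = np.array([0.3, 0.4, 0.1])

# Deposits factorize: the deposit of the sum is the sum of the deposits.
total_deposit = deposit_a + deposit_b

# Digitization does not: thresholding each part separately drops energy
# that would have passed the threshold once the parts are combined.
sum_of_digitized = digitize(deposit_a) + digitize(deposit_b)
digitized_sum = digitize(total_deposit)

print(sum_of_digitized)
print(digitized_sum)
```

Any nonlinear step (thresholds, saturation, noise added before a threshold) breaks the equality in the same way.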

Goal: replace (or augment) simulation steps with a faster, powerful generator based on state-of-the-art machine learning techniques.
This work: attack the most important part: calorimeter simulation.

Why should you care?
N.B. ALL jet substructure analyses in ATLAS are forced to use full simulation, as the current fast simulation is not good enough.
[Figure: ATLAS Preliminary, Standard Model Production Cross Section Measurements (status: July 2017): σ [pb] by process, data vs. theory. Annotation: ~3 billion events at the HL-LHC.]

Why should you care?
If we don't do something, the HL-LHC won't be possible. If we do something now, we can save O($ million/year).

Jets
First step: instead of studying the detailed structure of calorimeter showers, we consider jet images.

The Jet Image
Jet image: a two-dimensional, fixed representation of the radiation pattern inside a jet. Not unlike a 2D calorimeter! ...and nothing like a natural image!
[Figure: a proton-proton collision (beams into/out of the page, not to scale) and the jet image of a boosted W → qq jet. Axes: [Translated] Pseudorapidity (η) vs. [Translated] Azimuthal Angle (φ); color scale: pixel pT [GeV].]
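As a sketch of what a "fixed representation" means in practice, the snippet below bins jet constituents into a pT-weighted (η, φ) grid. The function name, the 25 x 25 grid, and the ±1.25 window are illustrative assumptions, and the coordinates are taken to be already translated so the jet axis sits at the origin.

```python
import numpy as np

def jet_image(eta, phi, pt, n_pixels=25, half_width=1.25):
    """Pixelate constituents into a 2D image; each pixel holds the summed pT.

    eta, phi: constituent coordinates relative to the jet axis (assumed).
    n_pixels, half_width: hypothetical grid choices for illustration.
    """
    edges = np.linspace(-half_width, half_width, n_pixels + 1)
    image, _, _ = np.histogram2d(eta, phi, bins=[edges, edges], weights=pt)
    return image

# Three toy constituents (pT in GeV).
eta = np.array([0.0, 0.3, -0.4])
phi = np.array([0.0, -0.2, 0.5])
pt = np.array([120.0, 80.0, 50.0])

image = jet_image(eta, phi, pt)
print(image.shape, image.sum())
```

Because the grid is fixed, every jet maps to an array of the same shape, which is what lets image-based networks be applied directly.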

Why images?
Can directly visualize physics (will mention other fixed representations later); there is information encoded in the physical distance between pixels; and we can benefit from the extensive image processing literature.
[Figure: jet images for pp → bb re-showered with Pythia 8, singlet qq vs. octet qq. Axes: [Translated] Pseudorapidity (η) vs. [Translated] Azimuthal Angle (φ); color scale: normalized pixel energy.]

Why jets?
A jet is defined by a clustering algorithm (= unsupervised learning), BUT these clusters also have physical meaning, e.g. they can be calculated in perturbation theory. See also Bayesian model-based clustering.
Jets are a great testing ground to bridge state-of-the-art ML techniques with physically meaningful/interpretable algorithms.
Some recent attempts to even cluster jets using modern ML tools: L. Mackey, BPN, A. Schwartzman, C. Stansbury, 1509.02216; Louppe et al., 1702.00748.
[Figure: model-based clustering example with cluster centers μ.]

Pre-processing and Special Relativity
Pre-processing is an important aspect of image recognition. However, some steps can damage the physics information content of a jet image. I won't discuss this in detail here, but I bring it up so you are aware of it!
[Figure: jet images at successive pre-processing stages. Panels: Pythia 8, W' → WZ and Pythia 8, QCD dijets, √s = 13 TeV, 65 < mass/GeV < 95. Axes: [Translated] Pseudorapidity (η) vs. [Translated] Azimuthal Angle (φ); color scale: pixel pT [GeV].]
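The slide does not say which pre-processing steps are the problematic ones. As one hedged illustration, the sketch below assumes an L2 pixel normalization (the function `l2_normalize` is hypothetical) and shows that it erases the overall energy scale: two toy jets that differ only in total pT become identical images.

```python
import numpy as np

def l2_normalize(image):
    """Assumed pre-processing step: scale the image to unit L2 norm."""
    return image / np.sqrt(np.sum(image ** 2))

# Two toy 2x2 jet images (pixel pT in GeV) with the same pattern
# but a different overall scale.
soft_jet = np.array([[10.0, 5.0], [0.0, 5.0]])
hard_jet = 3.0 * soft_jet

# Before normalization the total pT distinguishes the two jets...
print(soft_jet.sum(), hard_jet.sum())

# ...after normalization that information is gone.
print(np.allclose(l2_normalize(soft_jet), l2_normalize(hard_jet)))
```

Any observable that depends on the absolute pixel scale is affected in the same way by such a step.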

Modern Deep NNs for Classification
[Figure: convolutional network applied to a W' → WZ event: convolution → max-pool → convolution → max-pool (repeat) → flatten. Convolutions produce convolved feature layers, followed by max-pooling.]
L. de Oliveira, M. Kagan, L. Mackey, BPN, A. Schwartzman, 1511.05190
Subsequent developments: P. Baldi et al., 1603.09349 (W-tagging); J. Barnard et al., 1609.00607 (W-tagging); P. Komiske et al., 1612.01551 (q/g-tagging); G. Kasieczka et al., 1701.08784 (top-tagging)

Modern Deep NNs for Classification
Universally, DNNs out-perform physically motivated features; by how much, and what they are learning, is active R&D!
[Figure: 1/(Background Efficiency) vs. Signal Efficiency, Pythia 8, √s = 13 TeV, 65 < mass/GeV < 95. Curves: mass+τ21, mass+ΔR, τ21+ΔR, MaxOut, Convnet, Convnet-norm, Random.]
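For readers unfamiliar with the axes of such a performance curve, here is a minimal sketch of how signal efficiency and background rejection, 1/(background efficiency), can be computed from classifier scores. The function name, the toy scores, and the threshold are all made up for illustration.

```python
import numpy as np

def roc_points(signal_scores, background_scores, thresholds):
    """Return (signal efficiency, 1 / background efficiency) per threshold."""
    sig_eff = np.array([(signal_scores > t).mean() for t in thresholds])
    bkg_eff = np.array([(background_scores > t).mean() for t in thresholds])
    # A threshold that rejects all background gives an infinite rejection.
    with np.errstate(divide="ignore"):
        rejection = 1.0 / bkg_eff
    return sig_eff, rejection

# Toy classifier outputs: higher means more signal-like.
signal_scores = np.array([0.9, 0.8, 0.7, 0.4])
background_scores = np.array([0.6, 0.3, 0.2, 0.1])

sig_eff, rejection = roc_points(signal_scores, background_scores, [0.5])
print(sig_eff, rejection)
```

A better tagger gives a larger rejection at the same signal efficiency, so its curve lies above the others.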

Modern Deep NNs for Regression
P. Komiske, E. Metodiev, BPN, M. Schwartz, 1707.xxxx
Every pp collision comes with O(10-100) other collisions we don't care about (pileup). Pileup is a strange sort of noise because we can measure ~2/3 of it (charged pileup).
See BOOST17 talk.

Modern Deep NNs for Regression
P. Komiske, E. Metodiev, BPN, M. Schwartz, 1707.xxxx
Natural application of convolutional filters!
[Figure: images along the beam used as inputs to the NN: leading vertex charged, pileup charged, total neutral; after the convolutional filters, the network output is leading vertex neutral.]
See BOOST17 talk.

Modern Deep NNs for Regression
P. Komiske, E. Metodiev, BPN, M. Schwartz, 1707.xxxx
Pileup Mitigation with Machine Learning. See BOOST17 talk.

And now: Modern Deep NNs for Generation
Generative Adversarial Networks (GAN): a two-network game where one network (the generator, G) maps noise to images and the other (the discriminator, D) classifies images as fake or real. When D is maximally confused, G will be a good generator.
[Diagram: noise → G → D → {real, fake}; Pythia (physics-based simulator) → D.]
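The two-network game can be written compactly. The block below is the standard minimax objective from the original GAN formulation, offered as an assumption about what the talk has in mind rather than the exact loss used in this work; the symbols $p_{\text{Pythia}}$ (distribution of simulated images) and $p_z$ (noise prior) are labels introduced here for illustration.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{Pythia}}}\left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_z}\left[ \log\left( 1 - D(G(z)) \right) \right]
```

For a fixed G with induced image distribution $p_G$, the optimal discriminator is $D^*(x) = p_{\text{Pythia}}(x) / \left( p_{\text{Pythia}}(x) + p_G(x) \right)$, so D is maximally confused ($D^* = 1/2$ everywhere) exactly when $p_G$ matches $p_{\text{Pythia}}$.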

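ACGAN
Add an auxiliary classification task: in addition to {real, fake}, D predicts the class {W, QCD}. W = signal, QCD = background.
[Diagram: GAN → ACGAN. noise and class (W or QCD) → G → D; Pythia W and QCD images → D; D outputs {real, fake} and {W, QCD}.]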
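One common way to formalize the auxiliary task is the ACGAN objective of Odena et al., sketched below as an assumption; the talk does not spell out its loss. Here $S$ is the source (real or fake), $C \in \{W, \text{QCD}\}$ is the class, and $X_{\text{real}}$, $X_{\text{fake}}$ denote Pythia and generated images.

```latex
L_S = \mathbb{E}\left[ \log P(S = \text{real} \mid X_{\text{real}}) \right]
    + \mathbb{E}\left[ \log P(S = \text{fake} \mid X_{\text{fake}}) \right]

L_C = \mathbb{E}\left[ \log P(C = c \mid X_{\text{real}}) \right]
    + \mathbb{E}\left[ \log P(C = c \mid X_{\text{fake}}) \right]
```

In that formulation D is trained to maximize $L_S + L_C$ and G to maximize $L_C - L_S$, so the generator is rewarded for producing images that are both realistic and recognizably of the requested class.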

Locally Connected Layers
Due to the structure of the problem, we do not have translation invariance. Classification studies found that fully connected networks outperformed CNNs. However, convolutional-like architectures are still useful, e.g. to reduce the number of parameters.

Locally Connected Layers
Locally connected layers use filters on small patches (a CNN is then a special case with weight sharing).
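A minimal NumPy sketch of this relationship, assuming stride 1 and no padding; the function name and array shapes are illustrative and not taken from the LAGAN code. Each output position gets its own filter, and tying all filters to the same weights recovers an ordinary convolution (implemented, as in most deep learning libraries, as a cross-correlation).

```python
import numpy as np

def locally_connected_2d(image, filters):
    """Apply one k x k filter per output position.

    filters has shape (out_h, out_w, k, k); stride 1, no padding (assumed).
    """
    out_h, out_w, k, _ = filters.shape
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + k, j:j + k]
            out[i, j] = np.sum(patch * filters[i, j])
    return out

rng = np.random.default_rng(0)
image = rng.random((5, 5))
k = 3
out_size = image.shape[0] - k + 1

# Locally connected: independent weights at every output position.
local_filters = rng.random((out_size, out_size, k, k))
local_out = locally_connected_2d(image, local_filters)

# Weight sharing: the same filter everywhere, i.e. a convolutional layer.
shared = rng.random((k, k))
shared_filters = np.broadcast_to(shared, (out_size, out_size, k, k))
conv_out = locally_connected_2d(image, shared_filters)

print(local_filters.size, shared.size)  # free parameters in each case
```

The locally connected layer has out_size² times more free parameters than the convolution, which is the cost of dropping the translation-invariance assumption.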

Locally Aware GAN (LAGAN)
Unlike "natural" images, we have physically meaningful 1D manifolds (here, jet mass).
[Figure: W - QCD difference images, GAN vs. Pythia.]

+ More Layers for Generation
What about multiple layers with non-uniform granularity and a causal relationship? Not jet images per se, but the technology is more general than jets!
[Figure: layered detector geometry with axes η, φ, z.]

Calorimeter Simulation
We take as our model a 3-layer LAr calorimeter, inspired by the ATLAS barrel EM calorimeter.
A single event may have O(10^3) particles showering in the calorimeter: too cumbersome to do all at once (for now). We exploit factorization of energy depositions.
[Figure: Geant4, Pb absorber, LAr gap, electron shower. Axes: depth from calorimeter center [mm] vs. η direction [mm]. Top panel: local energy deposit [MeV]; bottom panel: cell energy [MeV].]

Generator Network for CaloGAN
One jet image per calo layer. One network per particle type; the input is the particle energy. Use layer i as input to layer i+1. ReLU to encourage sparsity.

Discriminator Network for CaloGAN
[Architecture diagram; annotation: help avoid mode collapse.]

Average Images
[Figure: average shower images, Geant4 vs. CaloGAN.]

Overtraining
A key challenge in training GANs is the diversity of generated images. This does not seem to be a problem for CaloGAN.
[Figure annotations: not memorizing; no mode collapse.]

Energy per layer
Pions deposit much less energy in the first layers and leave the calorimeter with significant energy.
N.B. can always add these (and others) explicitly to the training.

Depth of the shower
Depth-weighted total energy l_d
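A short sketch of how the per-layer energies and a depth-weighted total energy might be computed from a three-layer shower. The definition used here, l_d as the sum over layers of the layer index times the layer energy, is an assumption about what the slide plots, and the layer shapes and energies are toy values.

```python
import numpy as np

def layer_energies(layers):
    """Total energy in each calorimeter layer (layers ordered by depth)."""
    return np.array([layer.sum() for layer in layers])

def depth_weighted_total_energy(layers):
    """Assumed definition: l_d = sum_i i * E_i, with i the layer index."""
    energies = layer_energies(layers)
    return np.sum(np.arange(len(layers)) * energies)

# Toy shower: three layers with different (hypothetical) granularities.
layers = [
    np.ones((2, 4)),        # layer 0: E = 8
    2.0 * np.ones((3, 3)),  # layer 1: E = 18
    0.5 * np.ones((3, 2)),  # layer 2: E = 3
]

energies = layer_energies(layers)
l_d = depth_weighted_total_energy(layers)
print(energies, l_d)
```

Dividing l_d by the total energy gives an energy-weighted mean layer index, a simple measure of how deep the shower develops.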

Lateral spread
The much larger variation in the pion showers is a challenge for the network.
These moments and others are useful for classification; we have also tested this as a metric (NN on 3D images).
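One way to quantify lateral spread is the energy-weighted standard deviation of the cell positions within a layer. The sketch below assumes rows of the layer image index η and that the cell centers are known; the function name and the toy numbers are illustrative only.

```python
import numpy as np

def lateral_width(layer, eta_centers):
    """Energy-weighted standard deviation of eta within one layer.

    layer: 2D array of cell energies, rows indexed by eta (assumed).
    eta_centers: eta coordinate of each row.
    """
    energy_per_eta = layer.sum(axis=1)  # integrate over phi
    total = energy_per_eta.sum()
    mean = np.sum(energy_per_eta * eta_centers) / total
    mean_sq = np.sum(energy_per_eta * eta_centers ** 2) / total
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(max(mean_sq - mean ** 2, 0.0))

# Toy layer: three eta rows, two phi columns.
layer = np.array([[0.5, 0.5],
                  [1.0, 1.0],
                  [0.5, 0.5]])
eta_centers = np.array([-1.0, 0.0, 1.0])

width = lateral_width(layer, eta_centers)
print(width)
```

A shower concentrated in a single row would give a width of zero; broader showers, such as those from pions, give larger values.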

Shower Energy
Beyond our training sample!

Timing
Generation Method   Hardware   Batch Size   milliseconds/shower
GEANT4              CPU        N/A          1772
CALOGAN             CPU        1            13.1
                               10           5.11
                               128          2.19
                               1024         2.03
                    GPU        1            14.5
                               4            3.68
                               128          0.021
                               512          0.014
                               1024         0.012

Where to next
Add angle in addition to energy; hadronic calorimeter.
Non-uniform geometry as a function of η.
Integration within experiments (ATLAS and possibly others?) and collaboration with other efforts (e.g. GeantV).

Conclusions
(Jet) image-based NN classification, regression, and generation are powerful tools for fully exploiting the physics program at the LHC.
The key to robustness is to study what is being learned; this may even help us to learn something new!

Code and documentation
All of our training samples are public, as is our generation, training, and plotting code: https://github.com/hep-lbdl
You can find more documentation about the LAGAN and CaloGAN on the arXiv: 1705.02355, 1701.05927

Workshop Advertisement
link

Backup

Learning about Learning
Jet images afford a lot of natural visualization; as a community, we have also developed many techniques.
Below, we have the learned convolutional filters (left) and the difference between the average signal and background image after applying the learned convolutional filters (right). This novel difference-visualization technique helps understand what the network learns.
[Figure panels: CNN filters; raw image (differences), signal pT - background pT [GeV]; correlation image; most activating images; joint distributions; re-weight; redact; net + high level. Axes: [Translated] Pseudorapidity (η) vs. [Translated] Azimuthal Angle (φ), and 1/(Background Efficiency) vs. Signal Efficiency. Pythia 8, √s = 13 TeV.]
More detail in my DS@HEP15 talk.

Locally Aware GAN (LAGAN)