Particle Identification at LHCb Miriam Lucio, on behalf of the LHCb Collaboration IML Workshop April 10, 2018 M. Lucio Particle Identification at LHCb April 10, 2018 1
Outline 1 Introduction 2 Neutral ID tools 3 Charged ID tools 4 Resampling and transformation of simulated PID variables 5 Conclusions and Outlook M. Lucio Particle Identification at LHCb April 10, 2018 2
LHCb [Int J Mod Phys A30 (2015) 1530022] Single-arm forward spectrometer, 2 < η < 5 General purpose experiment, initially designed to study of particles containing b or c quarks Detected particles: e, µ, π, K, p, γ, ghosts (tracks that do not correspond to any real particle) M. Lucio Particle Identification at LHCb April 10, 2018 3
PID at LHCb Problem: identify particle type associated with a track/energy deposited in the subdetectors Charged: π, e, µ, K, p Neutral: π 0, γ, n Better PID performance better bkg rejection more precise results PID also used for trigger (in particular for upgrade): less background less resources (less bandwidth) High-level info from subdetectors + track quality info multi-class classification in machine learning [Int J Mod Phys A30 (2015) 1530022] M. Lucio Particle Identification at LHCb April 10, 2018 4
Outline 1 Introduction 2 Neutral ID tools 3 Charged ID tools 4 Resampling and transformation of simulated PID variables 5 Conclusions and Outlook M. Lucio Particle Identification at LHCb April 10, 2018 5
Introduction Radiative decays: interesting area of study at LHCb (e.g. photon polarisation measurement [PRL 112, 161801 (2014)]) Problem π 0 copiously produced at LHCb, inmediate decay to γγ high momentum π 0 merge of ECAL clusters huge background for radiative decays Need for a powerful tool to discriminate signal (γ) from background (π 0 γγ) M. Lucio Particle Identification at LHCb April 10, 2018 6
ECAL Signatures Different signatures (MC events): π 0 γγ γ ECAL clusters (3x3 cells) Coarse granularity separation is not straightforward M. Lucio Particle Identification at LHCb April 10, 2018 7
Baseline approach [LHCb-PUB-2015-016] Neural Network with 2 hidden layers (TMVA MLP) Train with B 0 K 0 γ as signal and B 0 π 0 X as background 14 ECAL and Pre-Shower cluster parameters (grouped under shape and symmetry) 4 variables that account for the size & tails, semiaxes and orientation of the ellipse in the ECAL 2 variables related to the energy of the most (seed) and the second most energetic cells of the cluster 4 variables for multiplicities of hits in the PS cells matrix in front of the seed of the electromagnetic cluster 4 shape and asymmetry variables in the 3x3 PS cells LHCb Simulation (inner region of ECAL) LHCb Data M. Lucio Particle Identification at LHCb April 10, 2018 8
New approach 2x25 input features: responses in 5x5 cells cluster from ECAL and pre-shower detectors Train: B Kπγ and B Kππ 0 MC samples (kinematically similar) Test: B Kπγ and B Kππ 0, B J/ψK π 0 MC samples Potentially misleading π 0 candidates with 2 outgoing γ sharing the same cluster are studied 8 TeV MC data, 120k photons, 220k π 0 Neural Networks and BDT considered M. Lucio Particle Identification at LHCb April 10, 2018 9
New approach Neural Network 1 or 2 hidden layers Width: 100, 250, 500, 800 units for ECAL + 10, 50 Preshower Optimizer: Adamax, Adagrad, SGD ROC AUC = 0.89 BDT XGBoost, CatBoost, LightGBM a ROC AUC = 0.95 a https://github.com/yandexdataschool/modelgym/ Big improvement, specially for moderate efficiencies M. Lucio Particle Identification at LHCb April 10, 2018 10
Comparison (all ECAL regions) Aggressive bacgkround suppression Good prospects for π 0 suppression & photon suppression for π 0 recovery M. Lucio Particle Identification at LHCb April 10, 2018 11
Outline 1 Introduction 2 Neutral ID tools 3 Charged ID tools 4 Resampling and transformation of simulated PID variables 5 Conclusions and Outlook M. Lucio Particle Identification at LHCb April 10, 2018 12
Introduction Goal: improve PID variables for charged particles better background rejection Particles: electron, muon, pion, kaon, proton, ghosts Only tracks using information from the full detector considered Baseline solution (ProbNN) Standard MVA used for PID LHCb Artificial neural networks with 6 binary classification models (One-versus-rest approach: separate one type from the others) 1 hidden layer, TMVA MLP [arxiv:0703039] Activation function: tanh, sigmoid Training method: Back-Propagation (BP), BFGS Algorithm (BFGS), or Genetic Algorithm (GA - slower and worse) Estimator: MSE (Mean Square Estimator) for Gaussian Likelihood or CE(Cross-Entropy) for Bernoulli Likelihood Trained on 2015 MC M. Lucio Particle Identification at LHCb April 10, 2018 13
Non-flat efficiency approach Gradient boosting: XGBoost [arxiv:1603.02754] Decision train (DT) [arogozhnikov.github.io/2015/05/22/decisiontrain-classifier.html] CatBoost [arxiv:1706.09516] Artificial neural networks (NN) One hidden layer Deep neural networks Linear combinations of features from subdetectors CatBoost and DNN give best results M. Lucio Particle Identification at LHCb April 10, 2018 14
Flat efficiency approach PID performace depends on particle kinematics (p,p T,η) and N tracks Flat PID efficiencies: Good discrimination for different analyses Unbiased background discrimination Reduced systematic uncertainties Introduce flatness term in loss function: L = L AdaLoss + αl Flat Flat4d: L Flat4d = L Flat P + L Flat PT + L Flat ntracks + L Flat η Pions Kaons Flat4d, ProbNN Better PID efficiency flatness in p,p T,η,N tracks than baseline M. Lucio Particle Identification at LHCb April 10, 2018 15
Outline 1 Introduction 2 Neutral ID tools 3 Charged ID tools 4 Resampling and transformation of simulated PID variables 5 Conclusions and Outlook M. Lucio Particle Identification at LHCb April 10, 2018 16
Introduction PID information used in both trigger selection and offline data analysis Obtain efficiency & systematic effects for the PID requirements applied PIDCalib package [CERN-LHCb-PUB-2016-021] NEW: Data-driven technique Efficiency obtained using per-event weights from simulated calibration sample PID variables cannot be used to train multivariate classifiers MCResampling: PID response replaced by the one generated from calibration PDFs Problematic for systematics computation Ignores correlation Resampling of PID variables: PIDGen Transformation of PID variables: PIDCorr M. Lucio Particle Identification at LHCb April 10, 2018 17
T T PIDGen & PIDCorr Input Variables for PDF Entries 1000 500 PID variable, log p T, η, log N tracks Transformed to remove narrow peaks 3 10 LHCb preliminary Entries 3 10 LHCb 1000 preliminary 500 PDF computation Four-dimensional kernel density estimation Meerkat library [2015 JINST 10 P02011] Entries 0 0 0.2 0.4 0.6 0.8 1 ProbNNK K 3 10 1000 500 0 LHCb preliminary 2 3 4 5 η Entries 0 200 100 6 7 8 9 log(p / MeV) T 6 10 LHCb 300 preliminary 0 3 4 5 6 log(n tracks ) / MeV) log(p binned data 3 10 9 700 600 8 500 400 7 300 200 6 LHCb preliminary 100 0 0 0.2 0.4 0.6 0.8 1 ProbNNK K Entries / MeV) log(p estimation 8 9 7 6 8 5 4 7 3 2 6 LHCb preliminary 1 0 0 0.2 0.4 0.6 0.8 1 ProbNNK K Density, a.u. sweighted D ± D 0 π ± calibration sample M. Lucio Particle Identification at LHCb April 10, 2018 18
PIDGen validation For a given set of (log p T, η, log N tracks ), generate PID variable that looks like data using the known 4D distribution of the calibration sample in the PID variable, log p T, η and log N tracks Clean, high-statistics data sample: Λ 0 b Λ+ c π, Λ + c pk π + Events / ( 0.02 ) 2000 1800 LHCb preliminary 1600 1400 1200 1000 800 600 400 200 0 0 0.2 0.4 0.6 0.8 1 ProbNNK K from Λc Events / ( 0.02 ) 3000 LHCb preliminary 2500 2000 1500 1000 500 0 0 0.2 0.4 0.6 0.8 1 ProbNNpi K from Λ c sweighted data, uncorrected simulation, PIDGen-corrected Good agreement between corrected MC and data M. Lucio Particle Identification at LHCb April 10, 2018 19
PIDCorr validation corrected ProbNNK K from Λc Using the obtained 4D PDF for data and MC, construct a function that transforms simulated PID response such that it matches data Preserves correlations between different PID responses for the same track 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 LHCb MC preliminary 0 0.2 0.4 0.6 0.8 1 ProbNNK K from Λc simulated Events / ( 0.02 ) Resampling 1600 LHCb preliminary 1400 1200 1000 800 600 400 200 0 0 0.2 0.4 0.6 0.8 1 ProbNNK (1-ProbNNpi) K from Λ c Events / ( 0.02 ) Transformation 1600 LHCb preliminary 1400 1200 1000 800 600 400 200 0 0 0.2 0.4 0.6 0.8 1 ProbNNK (1-ProbNNpi) K from Λ c sweighted data, uncorrected simulation, PIDGen-corrected (center), PIDCorr-corrected (right) Combinations of PID variables: Resampling procedure fails (correlations are ignored) Transformation of variables: better agreement M. Lucio Particle Identification at LHCb April 10, 2018 20
Outline 1 Introduction 2 Neutral ID tools 3 Charged ID tools 4 Resampling and transformation of simulated PID variables 5 Conclusions and Outlook M. Lucio Particle Identification at LHCb April 10, 2018 21
Conclusions & Outlook Big improvement in π 0 - γ separation Implemented PID transformation tools inside PIDCalib that preserves correlations betteer agreement with data Baseline ProbNN extended with deep neural networks and gradient boosting PID algorithms with better PID efficiency flatness studied Stay tuned! M. Lucio Particle Identification at LHCb April 10, 2018 22
Thanks for your attention! M. Lucio Particle Identification at LHCb April 10, 2018 23