Flavour tagging developments for the LHCb experiment
Antonio Falabella, Università di Ferrara
Dottorato di Ricerca - Ciclo 6
December 16, 2013
Overview
1 The LHCb experiment
2 Flavour Tagging
3 Tuning on data
4 Development of SS proton algorithm
5 Conclusion
The LHCb detector

LHCb measurements
- Improve the determination of the CKM matrix parameters in b-meson decays.
- Search for New Physics in rare decays of B and D mesons.

Detector requirements
- Very efficient trigger system (two levels: L0 (hardware), HLT (software))
- Good mass resolution and particle ID (RICHes, calorimeters and muon system)
- Excellent vertexing and tracking (VELO and tracking system)

[Figure: LHCb detector layout - VELO, RICH1, magnet, tracker, RICH2, calorimeters, muon stations]
Flavour Tagging

Why Flavour Tagging
CP violation studies usually involve time-dependent rate asymmetries:

    A(t) = [N(B̄ → f)(t) − N(B → f)(t)] / [N(B̄ → f)(t) + N(B → f)(t)]

The determination of this observable relies on the knowledge of the B production flavour.

How to perform Flavour Tagging
p-p collisions produce b b̄ pairs via the strong interaction. The signal-b flavour can be inferred from the decay of the accompanying b hadron (Opposite Side tagging), or by exploiting the fragmentation process of the b that forms the signal B meson (Same Side tagging).
Flavour Tagging algorithms
- OS: muon, electron, kaon and inclusive vertex
- SS: pion, kaon, proton
Flavour Tagging - Some definitions
The flavour is determined by the charge of the particle used to tag.
Flavour tagging performance is quantified by:

    ε_eff = ε_tag (1 − 2ω)²,   ω = W / (R + W),   ε_tag = (R + W) / (R + W + U)

The error on the asymmetry A can be estimated from the measured asymmetry A_m:

    A_m(t) ≈ (1 − 2ω) e^(−Δm_q² σ_t² / 2) A(t),   σ_A ∝ 1 / [(1 − 2ω) √(ε_tag N)]

which shows that to minimize the statistical error, ε_eff needs to be maximized.
Measure and optimize the performance (ε_eff) of the tagging algorithms using flavour-specific control channels: B⁺ → J/ψK⁺, B⁰ → D*⁻µ⁺νµ, B⁰ → J/ψK*⁰, B⁰ → D⁻π⁺
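The definitions above can be expressed in a few lines. A minimal illustration in Python, assuming simple event counts R (right tags), W (wrong tags) and U (untagged); this is not LHCb analysis code, and the function name is illustrative:

```python
def tagging_performance(R, W, U):
    """Tagging efficiency, mistag fraction and effective tagging power
    from counts of right (R), wrong (W) and untagged (U) events."""
    eps_tag = (R + W) / (R + W + U)           # fraction of tagged events
    omega = W / (R + W)                        # mistag fraction
    eps_eff = eps_tag * (1 - 2 * omega) ** 2   # effective tagging power
    return eps_tag, omega, eps_eff

# example: 300 right tags, 200 wrong tags, 500 untagged
eps_tag, omega, eps_eff = tagging_performance(300, 200, 500)
```

Note how a mistag close to 0.5 drives ε_eff to zero even at full tagging efficiency: a tag that is wrong half the time carries no information.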
Calculation and Calibration of the predicted mistag

Per-event mistag estimation (η)
Using a per-event mistag (η) has been shown to improve the tagging power (+/3%). For each tagger it is estimated using multivariate methods (neural networks, BDTs, ...) that use kinematic and geometric information on the taggers as inputs and are trained to identify the correct charge correlation. When more than one tagging algorithm gives a decision, they are combined into a final decision according to the individual decisions and probabilities.

η calibration
To provide a correct estimation of the mistag, η must be calibrated. The calibration assumes a linear dependence:

    ω = p0 + p1 (η − ⟨η⟩)
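The linear calibration can be sketched as a least-squares fit. A minimal illustration assuming numpy and a flavour-specific control sample where each tag is known to be right or wrong (the function name and inputs are hypothetical; the real analysis fits the oscillation instead):

```python
import numpy as np

def calibrate_mistag(eta, tagged_wrong):
    """Fit the linear calibration omega(eta) = p0 + p1*(eta - <eta>).
    eta: per-event predicted mistag; tagged_wrong: 1 if the tag was
    wrong, 0 if right (or per-bin wrong-tag fractions)."""
    eta = np.asarray(eta, dtype=float)
    y = np.asarray(tagged_wrong, dtype=float)
    eta_mean = eta.mean()
    # least-squares fit of y on (eta - <eta>): columns are [1, eta - <eta>]
    A = np.vstack([np.ones_like(eta), eta - eta_mean]).T
    (p0, p1), *_ = np.linalg.lstsq(A, y, rcond=None)
    return p0, p1, eta_mean
```

A perfectly calibrated tagger gives p0 ≈ ⟨η⟩ and p1 ≈ 1, which is the check performed in the validation slides below.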
Tuning on data
Contributions to tagging from my 2012 studies: new NNet structure, trained on 2012 data
- training with 2010 and 2011 data was not possible due to the lack of statistics
- avoids complications due to different variable distributions between MC and data

Performance improvements

year        | statistics | ε_tag D² [%] | tagging  | physics results
2010        | 35 pb⁻¹    | 1.97 ± .18   | OS       | Δm_s, φ_s
2010        | 35 pb⁻¹    | .38 ± .18    | OS + SSπ | sin(2β)
2011        | .37 fb⁻¹   | .7 ± .11     | OS       |
2011        | 1. fb⁻¹    | .35 ± .6     | OS       | Δm_d, sin(2β), B_s → D_s K, B → hh
2011 + 2012 | 3. fb⁻¹    | .8 ± .8      |          |
2011 + 2012 | 3. fb⁻¹    | .47 ± .4     | SSp      |
Development of SS proton algorithm
The idea behind Same Side algorithms is to use the charge correlation between a B and a close-by track to infer the B flavour at production. Two possible sources of charge correlation:
- origin in the decay of a higher-mass resonance
- correlation in the b hadronization process (associated production, AP)

B_u and B_d cases
For a B⁺ the companion π, K or p track has the same charge correlation, while for a B⁰ the companion π and p have opposite charge correlation. Studying a neutral B channel, SSπ and SSp tagging algorithms can then be developed.
The main focus of my studies is on the SS proton.
Implementation details - B⁰ → D⁻(→ Kππ)π⁺

Tuning on data
I used 2012 data (2 fb⁻¹). The data sample corresponds to B⁰ → D⁻(→ Kππ)π⁺.
- Train a Boosted Decision Tree (BDT, see backup) to select right/wrong charge close-by tracks, the right/wrong sign being defined by the B-proton charge correlation
- The BDT is trained on a set of input variables, using per-event sWeights (see backup) to take the background contamination into account

[Mass fit plot: N_sig = 357917 ± 637, N_bkg = 711 ± 13465; signal, background and total components with pull distribution; m in MeV/c²]

# of entries = 4818
# of B_d signal candidates = 357917
S/B = 5.11
Preselection Cuts and Multiplicity
Cannot use all B candidates for training: for mixed events the correlation is opposite.
- Cut on decay time: below the cut the fraction of non-oscillated events is 0.93
- Track-related cuts: PIDp > 5 (selecting protons), Ghost Prob < 0.5, IPPU > 9
- B-plus-track system related cuts: dQ < 1500 MeV, where dQ = M(B+track) − m_B − m_track; Δφ < 1.0, Δη < 1.0

CUTS          | m     | m, t, PIDp | m, t, PIDp, dQ, Δφ, Δη, Ghost Prob, IPPU
Signal events | 35781 | 74         | 11175
Multiplicity  | 5.    | 5.1        | 1.6
ε             | .84   | .53        | .41

Preselection cuts reduce the number of tagging tracks and the multiplicity per candidate.
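The dQ variable defined above can be computed directly from four-vectors. A minimal sketch; the (E, px, py, pz) convention in MeV and the function name are assumptions for illustration:

```python
import math

def delta_q(pB, ptrack, mB, mtrack):
    """dQ = M(B+track) - m(B) - m(track): the energy released if the
    track was produced together with the B in the fragmentation chain.
    pB, ptrack: four-vectors (E, px, py, pz) in MeV."""
    E = pB[0] + ptrack[0]
    px = pB[1] + ptrack[1]
    py = pB[2] + ptrack[2]
    pz = pB[3] + ptrack[3]
    # invariant mass of the B + track combination
    m_combined = math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))
    return m_combined - mB - mtrack
```

Tracks from the same fragmentation chain tend to have small dQ, which is why a dQ cut and the dQ input variable carry tagging discrimination.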
Input Variables
The best set of input variables to train the BDT that I found:
- For the B-plus-track system: dQ, pT(B + companion), Δφ, Δη
- For the companion track: PIDp, p, pT, IPχ², Ghost Prob, IPPU
- For the B: pT
- For the event: N tracks in PV

[TMVA input-variable distributions for right and wrong charge correlated tracks: dQ, log(B+track pT), log(track pT), log(B pT), log(track p), log(Δφ)]

Small differences between right and wrong charge correlated tracks.
Input Variables (cont'd)

[TMVA input-variable distributions for right and wrong charge correlated tracks: log(Δη), log(track PIDp), log(N tracks), log(track IPχ²)]

Small differences between right and wrong charge correlated tracks.
BDT Training and Variable Ranking
- Use all tracks
- Use the event number to split the sample (EVEN = training, ODD = test)
- AdaBoost for training

[TMVA overtraining check: BDT response for right/wrong tracks, test vs training samples]

Rank | Variable           | Importance
1    | PIDp               | 1.76e-1
2    | log(p comp)        | 1.67e-1
3    | log(pT comp)       | 1.7e-1
4    | dQ                 | 1.177e-1
5    | log(Δφ)            | 1.64e-1
6    | log(Δη)            | 7.947e-2
7    | log(pT B)          | 7.55e-2
8    | log(N tracks in PV)| 7.1e-2
9    | log(pT B+comp)     | 6.78e-2
10   | log(IPχ² comp)     | 6.318e-2
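The training strategy can be illustrated with a toy AdaBoost of decision stumps. This is a self-contained stand-in for the TMVA BDT actually used; the data, thresholds and function names are all illustrative:

```python
import numpy as np

def train_adaboost_stumps(X, y, n_rounds=50):
    """Toy AdaBoost with depth-1 trees (stumps). X: (n_events, n_features);
    y in {-1, +1} labels right/wrong charge correlation."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # per-event boosting weights
    ensemble = []
    for _ in range(n_rounds):
        best = None
        # exhaustive search over features, a few thresholds and both signs
        for j in range(d):
            for thr in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)   # AdaBoost tree weight
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)           # re-weight misclassified events
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def bdt_response(ensemble, X):
    """Weighted vote of the stumps, normalised to [-1, 1]."""
    total = sum(a for a, *_ in ensemble)
    out = np.zeros(len(X))
    for alpha, j, thr, sign in ensemble:
        out += alpha * sign * np.where(X[:, j] > thr, 1, -1)
    return out / total
```

With events ordered by event number, the EVEN/ODD split used on the slide amounts to training on `X[::2]` and testing on `X[1::2]`; comparable response distributions on the two halves indicate no overtraining.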
Performances
- Performances computed in both samples to test overtraining
- In the plots: ε_tag (1 − 2ω)² as a function of the BDT cut, for the training (EVEN) and testing (ODD) samples
- ω = W / (R + W) for the decay-time cut above (slightly overestimated due to B⁰ mixing)
- In case of multiple candidates, choose the one with the largest BDT response

[Plot: average ε_tag D² vs BDT cut, EVEN and ODD samples]

Best average tagging power for BDT > 0.5 (EVEN): ε_tag D² = 0.3%
Best average tagging power for BDT > 0.5 (ODD): ε_tag D² = 0.13%
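The scan of average tagging power versus BDT cut can be sketched as follows. A minimal illustration (hypothetical function name); here untagged events are not passed in, so ε_tag is computed relative to the tagged sample:

```python
import numpy as np

def tagging_power_scan(bdt, tag_right, cuts):
    """eps_tag * (1 - 2*omega)^2 as a function of the BDT cut.
    bdt: BDT response per tagged candidate; tag_right: True if the tag
    matched the true flavour (known from the control channel)."""
    bdt = np.asarray(bdt)
    tag_right = np.asarray(tag_right, dtype=bool)
    n_total = len(bdt)
    out = []
    for c in cuts:
        sel = bdt > c
        n_sel = int(sel.sum())
        if n_sel == 0:
            out.append(0.0)
            continue
        omega = np.count_nonzero(~tag_right[sel]) / n_sel
        eps_tag = n_sel / n_total
        out.append(eps_tag * (1 - 2 * omega) ** 2)
    return np.array(out)
```

Tightening the cut lowers the mistag but also the efficiency, so the tagging power typically peaks at an intermediate cut value, as in the plot above.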
Fit to oscillation
Unbiased determination of ω from the fit to the B⁰ oscillation (no time cut). In case of multiple proton candidates, choose the one with the highest BDT response. I define a binning to have the same statistics in each bin:

BDT bin     | [−1., 0.] | [0., .15] | [.15, .3] | [.3, .4] | [.4, .55] | [.55, .7] | [.7, .8] | [.8, 1.]
ω[%] (ODD)  | 49.5 ± .6 | 49.4 ± .6 | 47. ± .6  | 45.1 ± .6| 43.4 ± .7 | 41.4 ± 1. | 37. ± 1.7| 25.3 ± 1.8
ω[%] (EVEN) | 50.9 ± .5 | 49.6 ± .6 | 46. ± .6  | 46.7 ± .8| 4.8 ± .7  | 38. ± 1.  | 33. ± 1.6| 26.5 ± 1.9

[Plots: mixing asymmetry vs decay time in each of the eight BDT bins]

- Oscillation clearly visible; the amplitude increases as a function of the BDT output
- The mistag determination from each bin is compatible between the two subsamples
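The equal-statistics binning can be obtained from quantiles of the BDT response distribution. A minimal sketch assuming numpy; the [−1, 1] output range is taken from the BDT response above:

```python
import numpy as np

def equal_stats_bins(bdt_response, n_bins=8):
    """Bin edges such that each BDT-response bin holds (approximately)
    the same number of candidates: quantiles of the response."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)
    edges = np.quantile(np.asarray(bdt_response), qs)
    # extend the outermost edges to the full BDT output range
    edges[0], edges[-1] = -1.0, 1.0
    return edges
```

Equal-population bins keep the statistical error on ω roughly uniform across bins, which stabilises the calibration fit that follows.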
Calibration of the BDT output
Find a per-event ω estimation as a function of the BDT output:
- Plot ω vs BDT for each BDT bin (ODD sample) and fit a polynomial (left plot)
- η = pol(BDT) should then already be calibrated (middle plot)

[Plots: ω vs BDT with polynomial fit; ω vs η with linear fit; η distribution (mean .453)]

            | p1        | p0        | ⟨η⟩  | ε[%]      | ε_eff[%]
Odd sample  | 1.15 ± .85| .454 ± .3 | .453 | 31.3 ± .1 | .471 ± .45
Even sample | 1.36 ± .85| .449 ± .3 | .453 | 31. ± .1  | .64 ± .51
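Once a calibrated per-event mistag is available, the effective tagging power is a sum of per-event dilutions, ε_eff = Σ_i (1 − 2ω_i)² / N_total. A minimal sketch (hypothetical function name):

```python
import numpy as np

def per_event_tagging_power(omega, n_untagged):
    """Effective tagging power from calibrated per-event mistags:
    eps_eff = sum_i (1 - 2*omega_i)^2 / N_total, where the sum runs
    over tagged events and N_total includes the untagged ones."""
    omega = np.asarray(omega, dtype=float)
    n_total = len(omega) + n_untagged
    return float(np.sum((1.0 - 2.0 * omega) ** 2) / n_total)
```

This is why the per-event mistag improves on the average one: well-tagged events (small ω_i) contribute with their full dilution instead of being diluted by the sample average.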
Validation on B⁰ → Dπ 2011 unbiased sample
- Use the B⁰ → Dπ 2011 sample (1 fb⁻¹) for validation
- Similar data analysis (no BDT training, only performance evaluation and calibration cross-check)
- S/B similar to the 2012 data

[Mass fit plot: N_sig = 15317 ± 46, N_bkg = 334 ± 471; signal, background and total components with pull distribution; m in MeV/c²]

# of entries = 18351
# of B_d signal candidates = 15317
S/B = 5.1
Validation on Dπ 2011
Using the BDT trained on 2012 data, and the same polynomial function:

[Plots: ω vs η with linear fit; η distribution (mean .458)]

p1        | p0       | ⟨η⟩ | ε[%]     | ε_eff[%]
1.5 ± .87 | .46 ± .3 | .45 | 33. ± .1 | .418 ± .46

- Calibration: p1 OK, p0 − ⟨η⟩ compatible with 0 within 3σ
- Performances: compatible with the 2012 unbiased sample
Validation on B⁰ → J/ψK*⁰
Use J/ψK* for validation:
- 2012 data sample (2 fb⁻¹) (left plot)
- 2011 data sample (1 fb⁻¹) (right plot)

[Mass fit plots, m in MeV/c². 2012: N_sig = 511 ± 18333, N_bkg = 75 ± 1664; 2011: N_sig = 11814 ± 5917, N_bkg = 78488 ± 1743; signal, background and total components with pull distributions]

2012: # of entries = 477351, # of B_d signal candidates = 511, S/B = 1.1
2011: # of entries = 19659, # of B_d signal candidates = 11814, S/B = 1.5
Validation on J/ψK* 2012 - Fit to oscillation
Oscillation clearly visible as a function of the BDT output.

[Plots: mixing asymmetry vs decay time in each of the eight BDT bins]
Validation on J/ψK*
Using the BDT trained on 2012 data, and the same polynomial function:

[Plots: ω vs η calibration for the 2012, 2011 and 2011+2012 samples]

          | p1        | p0        | ⟨η⟩  | ε[%]     | ε_eff[%]
2012      | .94 ± .93 | .463 ± .3 | .458 | 5.7 ± .1 | .3 ± .3
2011      | .945 ± .18| .468 ± .4 | .457 | 6. ± .1  | .5 ± .4
2011+2012 | .934 ± .8 | .463 ± .3 | .458 | 6. ± .1  | .4 ± .

- Calibration: p1 OK, slight offset for p0; p0 − ⟨η⟩ compatible with 0 within 3σ
- Performances: smaller ε_eff due to the different B pT spectrum
- Reweighting: ε_eff[%] = .35 ± .

[Plot: B pT spectra for B⁰ → Dπ and B⁰ → J/ψK*]
Conclusion
Development of a new SS tagging algorithm using protons, based on a BDT:
- Used the B⁰ → D⁻(→ Kππ)π⁺ 2012 data sample for training
- For the unbiased B⁰ → Dπ sample: ε_eff[%] = .471 ± .45
- For the B⁰ → J/ψK* sample (2011+2012): ε_eff[%] = .4 ± .
- The calibration proved to be portable across different data samples and a different channel
My 3rd year of Ph.D.
I spent the 3rd year of my Ph.D. on two main activities:
- Development and optimization of a new same side proton FT algorithm for the LHCb experiment;
- Work with the offline computing group of the LHCb experiment, and in particular with the LHCbDirac developers group.

I also:
- Reported regularly at the Flavour Tagging working group meetings on the results and progress of my studies;
- Participated with a poster contribution in the Beauty 2013 international conference in Bologna;
- Participated in IFAE in Cagliari with a poster contribution on LHCb Flavour Tagging;
- Participated in the IUSS Niccolò Cabeo school on Beyond Standard Model physics in Ferrara.
Thank you!
BACKUP
sPlot technique¹
A statistical tool to unfold data distributions. Events are characterized by a set of variables:
- Discriminating variables: the distributions of all the sources of events are known
- Control variables: the distributions of some sources of events are unknown
The sPlot technique reconstructs the distribution of the control variables for each source of events; it is used to unfold signal and background events in a data sample. For example, take F_s(y) and F_b(y), the distributions of the discriminating variable y for signal and background, and N_s and N_b, the numbers of events of the two sources.

¹ arxiv.org/abs/physics/483
sPlot technique - cont'd
Define the sWeight as:

    W_s(y) = [V_ss F_s(y) + V_sb F_b(y)] / [N_s F_s(y) + N_b F_b(y)]

where the covariance matrix V is obtained from

    (V⁻¹)_ij = Σ_{e=1..N} F_i(y_e) F_j(y_e) / [N_s F_s(y_e) + N_b F_b(y_e)]²

Weighting the control-variable distribution x_e by W_s(y_e) gives the true distribution of the signal component of x.
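The sWeight formula above can be implemented directly. A minimal numpy sketch assuming per-event pdf values and fitted yields as inputs (the function name is illustrative):

```python
import numpy as np

def sweights(Fs, Fb, Ns, Nb):
    """Signal sWeights from the sPlot formula.
    Fs, Fb: arrays of signal/background pdf values F(y_e) evaluated at
    each event's discriminating variable; Ns, Nb: fitted yields."""
    Fs = np.asarray(Fs, dtype=float)
    Fb = np.asarray(Fb, dtype=float)
    denom = Ns * Fs + Nb * Fb
    # inverse covariance matrix: (V^-1)_ij = sum_e F_i F_j / denom^2
    Vinv = np.array([
        [np.sum(Fs * Fs / denom ** 2), np.sum(Fs * Fb / denom ** 2)],
        [np.sum(Fs * Fb / denom ** 2), np.sum(Fb * Fb / denom ** 2)],
    ])
    V = np.linalg.inv(Vinv)
    # per-event signal sWeight
    return (V[0, 0] * Fs + V[0, 1] * Fb) / denom
```

In the fully separated limit (events where only one pdf is non-zero) the weights reduce to 1 for signal and 0 for background, and their sum reproduces the signal yield.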
Boosted Decision Trees
- Decision trees are binary tree classifiers: an event is classified by repeated yes/no decisions, each on one single variable
- The phase space is thus split into many regions that can be classified as signal or background
- Boosting builds several trees (a forest); the final decision is the weighted average of the individual tree decisions
- Boosting improves stability with respect to fluctuations of the training sample