Statistics for Resonance Search

Georgios Choudalakis, University of Chicago
ATLAS exotic diphoton resonances meeting, Nov. 5, 0

Introduction

I was asked to explain how we treated the dijet resonance search statistically. The BumpHunter is already being used in boosted-object resonance searches (Elin, Muge et al.) and in dilepton searches (Dan Hayden). The same tool could be used in the diphoton resonance search, for consistency.

Background by fitting

We fit a smooth function through the whole spectrum to obtain the background. We justify this by showing that it can fit SM QCD. Fitting the data is not a good argument. If we couldn't fit SM QCD, we wouldn't use a fit; we would use SM QCD as the background.

Two statistical questions

1. Is there a discrepancy between data and background?
2. What is the probability density function (p.d.f.) of the number of RS G → γγ signal events?

The Kolmogorov-Smirnov test, the χ² test, etc.

KS test statistic: $D = \max |\mathrm{CDF}_1 - \mathrm{CDF}_2|$

K&S computed the p.d.f. of $D$ when the two distributions are consistent (i.e. in the null hypothesis). (a) Knowing the p.d.f. of $D$, one can reject the null hypothesis if $D_{\mathrm{obs}}$ is uncomfortably big:

$P(D > D_{\mathrm{obs}}) = 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 C(N_1,N_2) D_{\mathrm{obs}}^2}$

(a) This is directly analogous to Pearson's theorem, where he computed the p.d.f. of χ² in the null hypothesis and showed it to follow a χ²-distribution.
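A minimal sketch of the KS statistic and its asymptotic null-hypothesis p-value, for two unbinned samples. The function names are illustrative, and C(N1, N2) is assumed here to be the usual effective sample size N1 N2 / (N1 + N2), which the slide does not spell out. The series is an asymptotic result, so it is only approximate for small samples.

```python
import math

def ks_statistic(x, y):
    """D = max |CDF_1 - CDF_2|, evaluated at every pooled sample point."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        v = min(xs[i], ys[j])
        # Advance both empirical CDFs past the value v (handles ties).
        while i < n1 and xs[i] <= v:
            i += 1
        while j < n2 and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d

def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic P(D > d) in the null hypothesis (alternating series)."""
    c = n1 * n2 / (n1 + n2)  # assumed form of C(N1, N2)
    lam = c * d * d
    if lam < 1e-3:
        return 1.0  # the series converges slowly here; the p-value is ~1
    s = sum((-1) ** (i - 1) * math.exp(-2.0 * i * i * lam)
            for i in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))
```

Usage: `ks_pvalue(ks_statistic(a, b), len(a), len(b))` for two lists of masses `a` and `b`.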

Genius vs Computer : 0-

We can define any test statistic we want, and find its p.d.f. in the null hypothesis experimentally, with pseudo-experiments. It turns out there are better tests than KS and χ².

In the recent dijet resonance search, we used 6 tests (to be as model-independent as possible, and just because it didn't hurt): KS, χ², -ln L, Jeffreys divergence, TailHunter, BumpHunter.
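Finding the null p.d.f. "experimentally" can be sketched as follows: draw Poisson pseudo-data from the background, compute the statistic for each pseudo-experiment, and count how often it is at least as large as the observed value. This is an illustration under assumptions, not the package's code: the helper names are hypothetical, χ² is used only as a stand-in statistic, and the simple Poisson sampler is adequate only for modest bin contents.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method; fine for means well below ~700."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def chi2(data, bkg):
    """Pearson's chi-square, used here as an example test statistic."""
    return sum((d - b) ** 2 / b for d, b in zip(data, bkg))

def null_pvalue(statistic, observed, bkg, n_pseudo=2000, seed=1):
    """Fraction of background-only pseudo-experiments with t >= t_obs."""
    rng = random.Random(seed)
    t_obs = statistic(observed, bkg)
    n_ge = 0
    for _ in range(n_pseudo):
        pseudo = [poisson(rng, b) for b in bkg]
        if statistic(pseudo, bkg) >= t_obs:
            n_ge += 1
    return n_ge / n_pseudo
```

Any function of `(data, bkg)` can be passed as `statistic`, which is the point of the slide: the null distribution does not need to be known analytically.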

The code

You are welcome to use the code! Proper documentation is being written; in the meantime you can cite: Phys.Rev.D79:0,009 & Phys.Rev.D78:000,008

Download the package (with example) from SVN: https://svnweb.cern.ch/trac/atlasgrp/browser/institutes/uchicago/dijetmassandchi/trunk/macros/package

I'll be happy to provide technical support.

The BumpHunter

This version operates on binned histograms. [Generalization to unbinned data is easy.]

1. Decide the width, in number of bins, of the central window ($W_C$).
2. Width of each sideband = $\max(1, W_C/2)$.
3. Decide the position of the central window. Start from a position that gives enough room for the left sideband.
4. Count data ($D_C$) and background ($B_C$) in the central window.
5. Count data ($D_L$, $D_R$) and background ($B_L$, $B_R$) in the left and right sidebands.
6. Compute $P_L$, $P_C$, $P_R$, where $P_X = \sum_{d=D_X}^{\infty} \frac{B_X^d}{d!} e^{-B_X}$ if $D_X \ge B_X$, or $P_X = \sum_{d=0}^{D_X} \frac{B_X^d}{d!} e^{-B_X}$ if $D_X < B_X$. [Big excesses or big deficits are both treated as discrepancies.]
7. If $D_C < B_C$, or $P_L < P_C$, or $P_R < P_C$, then $P_C \to 1$. [Look for an excess, surrounded by agreeing sidebands. This of course we can tweak to our liking.]
8. Translate the central window and its sidebands by 1 bin to the right, and find $P_L$, $P_C$, $P_R$ again.
9. Start a scan with bigger $W_C$, until $W_C$ is 1/2 of the whole spectrum. [This too can be tweaked, if you are certain that the new physics will be narrow or wide or have a specific mass-width relationship.]
10. Return $-\ln P_C^{\min}$ from all positions and all widths tried.
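The scan above can be sketched in a few lines. This is a hedged illustration, not the package's implementation: function names are hypothetical, the sideband width is taken as max(1, W_C // 2) with integer division, the scan is assumed to stop at half the number of bins, and the returned window is given as a half-open bin range.

```python
import math

def poisson_pval(d, b):
    """Step 6: Poisson tail probability on the side where d lies."""
    if b <= 0.0:
        return 1.0 if d == 0 else 0.0
    # Start from the Poisson term at d and sum away from the mean,
    # so the terms decrease monotonically and the loop terminates.
    term = math.exp(-b + d * math.log(b) - math.lgamma(d + 1))
    total = 0.0
    k = d
    if d >= b:  # excess: P(X >= d)
        while term > 1e-17 * total:
            total += term
            k += 1
            term *= b / k
    else:       # deficit: P(X <= d)
        while k >= 0 and term > 1e-17 * total:
            total += term
            term *= k / b
            k -= 1
    return min(total, 1.0)

def bump_hunter(data, bkg):
    """Return (-ln of the smallest P_C, (first_bin, last_bin + 1))."""
    n = len(data)
    best_p, best_window = 1.0, None
    for wc in range(1, n // 2 + 1):          # steps 1 and 9
        ws = max(1, wc // 2)                 # step 2 (assumed integer division)
        for lo in range(ws, n - wc - ws + 1):  # steps 3 and 8
            hi = lo + wc
            d_c, b_c = sum(data[lo:hi]), sum(bkg[lo:hi])            # step 4
            d_l, b_l = sum(data[lo - ws:lo]), sum(bkg[lo - ws:lo])  # step 5
            d_r, b_r = sum(data[hi:hi + ws]), sum(bkg[hi:hi + ws])
            p_c = poisson_pval(d_c, b_c)
            p_l = poisson_pval(d_l, b_l)
            p_r = poisson_pval(d_r, b_r)
            if d_c < b_c or p_l < p_c or p_r < p_c:  # step 7
                p_c = 1.0
            if p_c < best_p:
                best_p, best_window = p_c, (lo, hi)
    return -math.log(best_p), best_window    # step 10
```

The returned statistic is not itself a p-value; its null distribution still has to be obtained from pseudo-experiments, because many windows were tried.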

The BumpHunter: toy example, injecting signal Gaus(00 ± 50 GeV)

[Figure sequence, six slides. Left panel: events vs. reconstructed m_jj [GeV]. Right panel: interval range vs. interval number.]

[Figures, two slides, same toy example. Left panel: events vs. reconstructed m_jj [GeV]. Right panel: Poisson p-value of each interval vs. dijet mass [GeV]. Labels: observed statistic.9668 (first slide), observed statistic.563 (second slide).]

The BumpHunter: toy example, injecting signal Gaus(00 ± 50 GeV)

[Figure. Left: distribution of the BumpHunter statistic in pseudo-experiments, for signal = 0 and for signal = 50 events. Right: BumpHunter statistic, with σ band, vs. number of signal events.]

So, if I observe a BumpHunter statistic =, I know the p-value in the Null Hypothesis; it's about 00. If 50 signal events are expected, it will be likely to observe such a statistic, so it will be likely to reject the Null Hypothesis at the corresponding confidence level.

Later we'll talk about sensitivity:
What's the chance of making a 1.6σ discovery (about 5% false-positive probability) if s signal events are expected?
How much does s need to be, to have probability β of declaring a discovery with false-positive probability α?
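The two sensitivity questions can be answered with pseudo-experiments alone. The sketch below is an assumption-laden illustration (hypothetical helper names, -ln L of the background-only hypothesis as a stand-in statistic): it finds the threshold that only a fraction α of background-only pseudo-experiments exceed, then reports the fraction β of signal-plus-background pseudo-experiments above that threshold.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method; fine for means well below ~700."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def neg_log_likelihood(data, bkg):
    """-ln of the Poisson likelihood of the data given background only."""
    return -sum(d * math.log(b) - b - math.lgamma(d + 1)
                for d, b in zip(data, bkg))

def power(statistic, bkg, sig, alpha=0.05, n_pseudo=2000, seed=1):
    """Probability of a discovery with false-positive probability alpha."""
    rng = random.Random(seed)
    null = sorted(statistic([poisson(rng, b) for b in bkg], bkg)
                  for _ in range(n_pseudo))
    # Threshold exceeded by at most a fraction alpha of null pseudo-experiments.
    threshold = null[int(math.ceil((1.0 - alpha) * n_pseudo)) - 1]
    hits = 0
    for _ in range(n_pseudo):
        pseudo = [poisson(rng, b + s) for b, s in zip(bkg, sig)]
        if statistic(pseudo, bkg) > threshold:
            hits += 1
    return hits / n_pseudo
```

Scanning `sig` over increasing total yields and plotting `power` against the yield gives a power curve; the yield at which the curve crosses β answers the second question.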

Summary of tests

BumpHunter

[Figure: distribution of the BumpHunter statistic in pseudo-experiments; BumpHunter statistic, with σ band, vs. number of signal events.]

Other statistical tests

TailHunter: like the BumpHunter, but instead of windows it looks for tails. No sideband criteria.

[Figure: distribution of the TailHunter statistic in pseudo-experiments; TailHunter statistic, with σ band, vs. number of signal events.]

Other statistical tests

$\log L = \log \prod_i \mathrm{Poisson}(d_i; b_i)$

[Figure: distribution of -ln(L) in pseudo-experiments; -ln(L), with σ band, vs. number of signal events.]

Other statistical tests

Pearson's famous χ².

[Figure: distribution of ln(χ²) in pseudo-experiments; ln(χ²), with σ band, vs. number of signal events.]

Other statistical tests

Jeffreys divergence: a symmetric version of the Kullback-Leibler divergence. It expresses how surprised, on average, I'll be if I expected one distribution and observed the other. See the literature on information theory.

[Figure: distribution of the Jeffreys divergence in pseudo-experiments; Jeffreys divergence, with σ band, vs. number of signal events.]
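For binned data, the likelihood, χ² and Jeffreys statistics might be written as below. This is a sketch under assumptions rather than the package's definitions: the names are illustrative, both histograms are normalized to unit area before the Jeffreys divergence is computed, and bins that are empty in either histogram are skipped to avoid a divergent logarithm.

```python
import math

def neg_log_likelihood(data, bkg):
    """-ln prod_i Poisson(d_i; b_i)."""
    return -sum(d * math.log(b) - b - math.lgamma(d + 1)
                for d, b in zip(data, bkg))

def pearson_chi2(data, bkg):
    """Sum over bins of (d_i - b_i)^2 / b_i."""
    return sum((d - b) ** 2 / b for d, b in zip(data, bkg))

def jeffreys_divergence(data, bkg):
    """Symmetrized Kullback-Leibler divergence: sum (p - q) ln(p / q)."""
    n_d, n_b = sum(data), sum(bkg)
    total = 0.0
    for d, b in zip(data, bkg):
        if d > 0 and b > 0:  # assumption: skip empty bins
            p, q = d / n_d, b / n_b
            total += (p - q) * math.log(p / q)
    return total
```

Each function takes the same `(data, bkg)` pair of per-bin lists, so any of them can be used as the test statistic in a pseudo-experiment loop.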

Other statistical tests

The famous KS test.

[Figure: distribution of the Kolmogorov-Smirnov statistic in pseudo-experiments; Kolmogorov-Smirnov statistic, with σ band, vs. number of signal events.]

Sensitivity

[Figures, four slides: power curves of different tests, for signal = Gaus(00,50), Gaus(00,400), Gaus(000,50) and Gaus(000,400). Vertical axis: Prob(p-val < 1.6σ effect). Horizontal axis: number of signal events. Curves: -ln(L(s=0)/L(s)); BumpHunter; BumpHunter, no sidebands; -ln(L(s=0)); χ²; TailHunter; KS; Jeffreys.]

Bayesian limit-setting

$L(s) \equiv L(\mathrm{data} \mid s) = \prod_{i \in \mathrm{bins}} \mathrm{Poisson}(d_i; b_i + s_i)$

$P(s \mid \mathrm{data}) = L(s)\,\mathrm{Prior}(s) / \mathrm{Norm}$
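A minimal numerical sketch of the posterior and a Bayesian upper limit, assuming a flat prior on s between 0 and a cutoff `s_max`, and a fixed signal shape so that s_i = s × shape_i. All names, the cutoff and the grid size are illustrative assumptions, not part of the original analysis.

```python
import math

def log_likelihood(s, data, bkg, shape):
    """ln prod_i Poisson(d_i; b_i + s * shape_i)."""
    return sum(d * math.log(b + s * f) - (b + s * f) - math.lgamma(d + 1)
               for d, b, f in zip(data, bkg, shape))

def posterior_on_grid(data, bkg, shape, s_max, n_steps=1000):
    """P(s | data) on a regular grid in s, for a flat prior on [0, s_max]."""
    ds = s_max / n_steps
    grid = [i * ds for i in range(n_steps + 1)]
    logl = [log_likelihood(s, data, bkg, shape) for s in grid]
    ref = max(logl)  # subtract the maximum to avoid underflow
    weights = [math.exp(l - ref) for l in logl]
    norm = sum(weights)
    return grid, [w / norm for w in weights]

def upper_limit(grid, post, cl=0.95):
    """Smallest grid value of s whose cumulative posterior reaches cl."""
    cum = 0.0
    for s, p in zip(grid, post):
        cum += p
        if cum >= cl:
            return s
    return grid[-1]
```

For a single bin with zero observed events the posterior is proportional to exp(-s), so the 95% limit should come out close to -ln(0.05) ≈ 3.0, which is a convenient sanity check.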

Convolution of systematics

$L(s; \lambda_{\mathrm{JES}}, \lambda_{\mathrm{lumi}}, \lambda_{\mathrm{fit\ uncertainty}}) = \prod_{i \in \mathrm{bins}} \mathrm{Poisson}(d_i; b_i(\lambda_{\mathrm{fit}}) + s_i(\lambda_{\mathrm{JES}}, \lambda_{\mathrm{lumi}}))$

3-dimensional Gaussian (Normal) p.d.f. in the 3-dimensional space of λs: $\pi(\vec{\lambda})$.

Integration is not made by throwing pseudo-experiments. Instead, we use a grid from -3 to +3 in each dimension.

$L(s) = \int d\vec{\lambda}\, L(s; \vec{\lambda})\, \pi(\vec{\lambda})$
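The grid integration can be illustrated as below. This is a sketch under stated assumptions, not the analysis code: the λs are taken to be uncorrelated unit Gaussians truncated at ±3 and renormalized on the grid, the number of grid points is arbitrary, and `example_expected` is a hypothetical two-parameter model (a 10% luminosity effect on the signal and a 5% fit effect on the background) standing in for the real b_i(λ) and s_i(λ).

```python
import itertools
import math

def log_poisson(data, expected):
    """ln prod_i Poisson(d_i; expected_i)."""
    return sum(d * math.log(m) - m - math.lgamma(d + 1)
               for d, m in zip(data, expected))

def marginal_likelihood(s, data, expected, n_dim, n_points=13):
    """L(s) = integral of L(s; lam) pi(lam) on a grid from -3 to +3."""
    step = 6.0 / (n_points - 1)
    axis = [-3.0 + i * step for i in range(n_points)]
    weights = [math.exp(-0.5 * x * x) for x in axis]
    norm = sum(weights)
    weights = [w / norm for w in weights]  # truncated unit Gaussian
    total = 0.0
    for idx in itertools.product(range(n_points), repeat=n_dim):
        lam = [axis[i] for i in idx]
        w = math.prod(weights[i] for i in idx)
        total += w * math.exp(log_poisson(data, expected(s, lam)))
    return total

def example_expected(s, lam, bkg=(50.0, 40.0, 30.0), shape=(0.2, 0.6, 0.2)):
    """Hypothetical model: lam = (lam_lumi, lam_fit), made-up uncertainties."""
    lam_lumi, lam_fit = lam
    sig_scale = max(0.0, 1.0 + 0.10 * lam_lumi)
    bkg_scale = max(1e-9, 1.0 + 0.05 * lam_fit)
    return [b * bkg_scale + s * f * sig_scale for b, f in zip(bkg, shape)]
```

The cost grows as `n_points ** n_dim`, which stays manageable for three nuisance parameters, and unlike pseudo-experiment sampling the result has no statistical noise.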

Frequentist coverage of Bayesian limit

[Figure, ATLAS Preliminary: coverage of 95% Bayesian limit vs. number of signal events.]

The coverage probability, or the fraction of times that the confidence interval defined by the Bayesian limit contains the true number of signal events, as a function of signal yield for a hypothetical q* mass of 900 GeV. In this study, the coverage probabilities were found to lie in the vicinity of 95%, indicating compatibility between Bayesian and frequentist approaches.

Systematics convolution

[Figure, ATLAS Preliminary, √s = 7 TeV data (∫L dt = 96 nb^-1): P(signal | data) for flat prior vs. number of signal events. Posterior p.d.f. for q*(400) with: no systematic uncertainty; ∫L dt uncertainty; JES and ∫L dt uncertainty; JES and ∫L dt and fit uncertainty.]

Conclusion

The BumpHunter package offers a variety of model-independent tests. (The BumpHunter and TailHunter are just two of them.)

The sensitivity of each method depends on the kind of signal, of course.

Convolution of systematics was done more accurately and carefully, using the grid convolution (integration) method.

You can use the software already. Documentation is being written.