Statistical Tools in Collider Experiments. Multivariate analysis in high energy physics

Similar documents
Statistics for the LHC Lecture 2: Discovery

Statistical Methods for Particle Physics Lecture 3: Systematics, nuisance parameters

Statistics Challenges in High Energy Physics Search Experiments

Recent developments in statistical methods for particle physics

Use of the likelihood principle in physics. Statistics II

Hypothesis testing (cont d)

Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits

Some Statistical Tools for Particle Physics

Likelihood Statistics

RooStatsCms: a tool for analyses modelling, combination and statistical studies

Search for Higgs Bosons at LEP. Haijun Yang University of Michigan, Ann Arbor

Statistical Tools in Collider Experiments. Multivariate analysis in high energy physics

Discovery significance with statistical uncertainty in the background estimate

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery

Introductory Statistics Course Part II

Statistics Forum News

Final status of L3 Standard Model Higgs searches and latest results from LEP wide combinations. Andre Holzner ETHZ / L3

Introduction to the Terascale: DESY 2015

Statistical Methods for Particle Physics Lecture 3: systematic uncertainties / further topics

Primer on statistics:

Asymptotic formulae for likelihood-based tests of new physics

Search for Higgs in H WW lνlν

Combined Higgs Results

Discovery Potential for the Standard Model Higgs at ATLAS

Search for SM Higgs Boson at CMS

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Wald s theorem and the Asimov data set

ETH Zurich HS Mauro Donegà: Higgs physics meeting name date 1

The Higgs boson discovery. Kern-und Teilchenphysik II Prof. Nicola Serra Dr. Annapaola de Cosa Dr. Marcin Chrzaszcz

Statistical Tools in Collider Experiments. Multivariate analysis in high energy physics

D0 Higgs Results and Tevatron Higgs Combination

First two sided limit on BR(B s μ + μ - ) Matthew Herndon, University of Wisconsin Madison SUSY M. Herndon, SUSY

Search for the Higgs Boson In HWW and HZZ final states with CMS detector. Dmytro Kovalskyi (UCSB)

Modern Methods of Data Analysis - WS 07/08

The search for standard model Higgs boson in ZH l + l b b channel at DØ

Thesis. Wei Tang. 1 Abstract 3. 3 Experiment Background The Large Hadron Collider The ATLAS Detector... 4

CMS Internal Note. The content of this note is intended for CMS internal use and distribution only

Beyond the Standard Model Higgs Boson Searches at DØ

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009

RooFit/RooStats Analysis Report

PoS(ICHEP2012)238. Search for B 0 s µ + µ and other exclusive B decays with the ATLAS detector. Paolo Iengo

MODIFIED FREQUENTIST ANALYSIS OF SEARCH RESULTS (THE CL s METHOD)

Search and Discovery Statistics in HEP

Search for the Standard Model Higgs boson decaying into a WW pair

PoS(ICHEP 2010)544. Higgs searches at the Tevatron. Ben Kilminster Fermilab bjk AT fnal.gov

How to find a Higgs boson. Jonathan Hays QMUL 12 th October 2012

Statistical Methods for Discovery and Limits in HEP Experiments Day 3: Exclusion Limits

Statistical Methods for LHC Physics

Status and Prospects of Higgs CP Properties with CMS and ATLAS

FYST17 Lecture 8 Statistics and hypothesis testing. Thanks to T. Petersen, S. Maschiocci, G. Cowan, L. Lyons

arxiv: v3 [physics.data-an] 24 Jun 2013

Statistical Data Analysis Stat 3: p-values, parameter estimation

Confidence Limits and Intervals 3: Various other topics. Roger Barlow SLUO Lectures on Statistics August 2006

Statistical issues for Higgs Physics. Eilam Gross, Weizmann Institute of Science

Higgs Boson at the CMS experiment

The Tevatron Higgs Search

Contact the LHC

Search for Quark Substructure in 7 TeV pp Collisions with the ATLAS Detector

Statistical Methods for Particle Physics

Two Early Exotic searches with dijet events at ATLAS

Statistical Methods in Particle Physics

Higgs and Z τ + τ in CMS

Some Topics in Statistical Data Analysis

Single Top At Hadron Colliders

First two sided limit on BR(B s µ + µ - ) Matthew Herndon, University of Wisconsin Madison SUSY M. Herndon, SUSY

Combined Higgs Searches at DZero

Discovery potential of the SM Higgs with ATLAS

Search for high mass diphoton resonances at CMS

Searches for rare radiative Higgs boson decays to light mesons with ATLAS

Single top quark production at CDF

PoS(ACAT2010)057. The RooStats Project. L. Moneta CERN, Geneva, Switzerland K. Belasco. K. S. Cranmer. S.

Why I never believed the Tevatron Higgs sensitivity claims for Run 2ab Michael Dittmar

Statistical Methods for Particle Physics Tutorial on multivariate methods

Top results from CMS

Advanced Statistics Course Part I

Search for B (s,d) μμ in LHCb

Practical Statistics for Particle Physics

Albert De Roeck CERN, Geneva, Switzerland Antwerp University Belgium UC-Davis University USA IPPP, Durham UK. 14 September 2012

Statistical Methods in Particle Physics Day 4: Discovery and limits

Measurements of the Higgs Boson at the LHC and Tevatron

Extra dimensions and black holes at the LHC

arxiv: v1 [hep-ex] 8 Nov 2010

Text. Decays of heavy flavour hadrons measured by CMS and ATLAS. M.Smizanska, Lancaster University

Advanced statistical methods for data analysis Lecture 1

tth searches at ATLAS and CMS Thomas CALVET for the ATLAS and CMS collaborations Stony Brook University Apr 11 th, 2018

Observation of a New Particle with a Mass of 125 GeV

Statistical Data Analysis Stat 5: More on nuisance parameters, Bayesian methods

Hà γγ in the VBF production mode and trigger photon studies using the ATLAS detector at the LHC

ATLAS: LHC 2016: 240 GeV Higgs Mass State at 3.6 sigma

Two Higgs Doublets Model

Error analysis for efficiency

Evidence for Single Top Quark Production. Reinhard Schwienhorst

The Tevatron s Search for High Mass Higgs Bosons

Second Workshop, Third Summary

arxiv: v1 [hep-ex] 21 Aug 2011

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Searches for BSM Physics in Rare B-Decays in ATLAS

arxiv: v1 [hep-ex] 9 Jul 2013

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

Di-photon at 750 GeV! (A first read)

Transcription:

Statistical Tools in Collider Experiments Multivariate analysis in high energy physics Lecture 5 Pauli Lectures - 10/02/2012 Nicolas Chanon - ETH Zürich 1

Outline 1.Introduction 2.Multivariate methods 3.Optimization of MVA methods 4.Application of MVA methods in HEP 5.Understanding Tevatron and LHC results 2

Lecture 5. Understanding Tevatron and LHC results 3

Introduction / outline - Much more material in Tom Junk lectures on this topics! - This lecture will try to to focus on how analysis sensitivity estimate can be related to multivariate techniques - We will review the CLs method - How the CLs method is used to produce the exclusion limit plot - How are treated the systematic uncertainties - p-values, best fit, look-elsewhere effect 4

Tevatron Higgs combined result 95% CL Limit/SM 10 1 SM=1 Tevatron Run II Preliminary, L! 8.2 fb -1 Expected Observed ±1" Expected ±2" Expected Tevatron Exclusion March 7, 2011 130 140 150 160 170 180 190 200 m H (GeV/c 2 ) 5

LHC Higgs combined result 6

Sensitivity estimate as MVA - Multi-channel combination : uses log-likelihood ratio as test statistic - This can be seen as a giant multivariate analysis - Combines event counting experiment, shape analysis in single variables (on e.g. invariant mass, MVA output) 7

Multi-variable likelihood (II) Hypothesis testing Atypicalstatisticaltestisthelikelihood ratio between S + B hypothesis and B hypothesis with respect to data (here in the case of a counting experiment): Multi-variable likelihood (II) H γγ MVA analysis - Analysis sensitivity estimate is related to hypothesis testing - What is the likelihood of Q the = Multi-variable L(N data, N S + N B ) data to be consistent ; L(n, with x) background = likelihood (II) e x L(N data, N B ) n! xn only or signal +background hypothesis? Atypicalstatisticaltestisthelikelihood ratio between S + B hypot Construct likelihood ratio for Atypicalstatisticaltestisthelikelihood a binned histogram, withnbins ratio bins between (assuming S + Bthere hypothesis a hypothesis with respect to data (here in the case of a counting expe - Let nous correlation assume binned betweendistribution bins) : case for simplicity - Everything starts with the Poisson probability for observing Ndata event when Nb or Ns+Nb events are expected Q = L(N Q = data, L(N data, S N S + N B ) B ) Nbins Y Nbins ; L(n, x) = e x ; x) = e x Q L(N L(N data, B ) n! xn data, B ) n! xn binned = Q i = Y L(N datai, N Si + N Bi ) - Statistical test : the likelihood ratio i of the i two hypothesis L(N datai, N Bi :) Construct likelihood Construct Q = L(N ratio data, likelihood in Nthe S + multi-variable N B ratio ) for acase, binned with ; L(n, x) = e x histogram, Nchannels variables withnbins bins ( (provided thatno thecorrelation variables L(N are data between, uncorrelated) N B ) bins) :: n! xn Nbins Y Nbins Y L(N datai, N Si + N Bi ) - For an histograms made of N bins : hypothesis with respect to data (here in the case of a counting experiment) Construct likelihood ratio for a binned histogram, withnbins bins (assum no correlation between bins) : Q binned = i i Nbins Nchannels Nbins YY Y Nbins j Q ij - Multi-channel (or multi-variable) case : Nchannels Y Nchannels Nbins Y Yj The log-likelihood Construct ratio : likelihood ratio inq multichannels the multi-variable = Q binnedj case, = with QNchannel ij (provided that the variables are uncorrelated) j : j i j - The log-likelihood ratio : Nbins j Q multichannels = i Q i = ln(q Nchannels ij ) i j i Y Qbinnedj = L(N datai, N Bi ) Nchannels Y L(N datai, N Si + N Bi ) Q multichannels Construct = likelihood QQ binned binnedj ratio = = in theq multi-variable i = case, withnchannels varia (provided that the variables are uncorrelated) : L(N datai, N Bi ) j Nchannels X ln(q The multichannels log-likelihood )= ratio : j X j i j ln(q )= Nchannels X Nchannels Y Nbins Yj8 Nbins Xj Qij ln(q )

Confidence levels - The test-statistic -2.lnQ converges to a Chi2 law with large statistics - We would like to quantify the agreement between data and signal plus background hypothesis or background-only hypothesis - For this, one has to generate the expected Probability distributions of the teststatistic in the two hypothesis (i.e. when Ndata=Ns+Nb and Ndata=Nb) CL s+b = P s+b (lnq lnq obs )= Z lnqobs dp s+b dlnq dlnq CL b = P b (lnq lnq obs )= Z lnqobs dp b dlnq dlnq - This need to generate a number of toy-experiments (usually > 1000) - this step is avoided in the Bayesian framework - The agreement between data and S+B hypothesis is given by CLs+b, and with the B hypothesis by CLb : - CLb : probability to get a result less compatible with the B only hypothesis than the observed one - CLs+b : probability to get a result which is less compatible with a signal when the signal hypothesis is true 9

CLs method CLs method is used since LEP - CLs = CLs+b/CLb is not a probability - So-called ʻmodified frequentistʼ method - CLs+b has problems when Nobs is far below the expected Nb - More conservative than CLs+b Probability density 0.12 0.1 0.08 0.06 0.04 (a) LEP Observed m H = 115 GeV/c 2 Expected for background Expected for signal plus background CL s 1 10-1 10-2 10-3 10-4 Observed Expected for background LEP 0.02 10-5 114.4 115.3 0-15 -10-5 0 5 10 15-2 ln(q) 10-6 100 102 104 106 108 110 112 114 116 118 120 m H (GeV/c 2 ) 10

Exclusion : observed/expected Exclusion at 95% confidence level : CLs<0.05 - Meaning that the probability to observe more events than seen in the data with the signal+background hypothesis (normalized to the probability in the background hypothesis only) is less than 5% Observed limit at 95% CL - Once the probability distributions of lnq in the two hypothesis has been computed, one can integrate over them until lnqobs observed in data. This gives CLs and there is exclusion if CLs<0.05 Expected limit - Replace lnqobs with lnqb? (ie test statistic in the hypothesis of background only in data) - This can be done but is called the Asimov dataset. The probability to get in data the exact B distribution is very low (because of statistical fluctuations) - Again, one has to generate toy-experiments, according to the B only hypothesis 11

Signal strength modifier Signal strength modifier : - Let us test not only the Signal hypothesis s+b, but also μ.s+b where μ is the signal strength modifier Exclusion limit at 95% CL - CLs is computed for each signal strength (with some step). When CLs<0.05 is reached, there is exclusion at 95% CL 12

Exclusion : median, sigma bands Median expected limit at 95% CL - One run pseudo-experiments according to B only hypothesis in the pseudo-data (e.g. 1000) - For each pseudo-experiment, let us compute CLs+b and CLb => CLs for all signal strength - For each pseudo-experiment, scan over the signal strength to find μ which has CLs<0.05 - This gives μ95cl for each pseudo-experiment - Plot the distribution of μ95cl number of toys 120 100 80 60 40 - Median : expect limit at 95% CL (50% quantile) - 1-σ up and down bands : 21% and 79% quantiles - 2-σ up and down bands : 2.5% and 97.5% quantiles 20 0 0 20 40 60 80 100 95CL / SM 13

Statistical uncertainties Statistical uncertainties are taken into account : - In the definition of the likelihood (likelihood of b only to fluctuate to produce observed data) with the Poisson law : sensitive to statistical fluctuations - When generating the toys to produce the lnq distributions - When generating the toys to produce the r95cl distributions - Additionally, one can constrain the likelihood with nuisance parameters to take into account the systematic uncertainties - Often, systematic uncertainties are measured from control samples in data and are therefore reduced with more luminosity : behavior of a statistical uncertainty 14

Nuisance parameters - Nuisance parameters is a bayesian way of taking into account the systematic uncertainties in the likelihood - The bayesian priors for the uncertainties are just multiplied to the likelihood Example. - N events expected. - Common choice : gaussian uncertainty. 1-σ width : the value of the uncertainty. - Instead of generating N events for one toy, generate ε.n events where ε is a random number following a gaussian pdf centered in 1 and with width being the value of the uncertainty. - This example, the efficiency has a gaussian pdf. Pdfs for uncertainties : gaussian, log-normal, gamma Lnb (,, sb, ) Poissn ( s bgb ) ( b, ) meas meas b Probability density, dp/d# 5 4 3 2 1 Log Normal pdf $ =1.10 $=1.33 =1.20 $=1.50 $ 0 0 1 2 3 #=! "! 15

LEP/Tevatron/LHC Test statistic - Concept of profiling : first fit of the data to measure the nuisance parameters - Profiling is a way of measuring the nuisance parameters from data at each toy => systematic become statistic uncertainty Test statistic Profiled? Test statistic sampling LEP q µ = 2 ln L(data µ, θ) L(data 0, θ) no Bayesian-frequentist hybrid Tevatron q µ = 2 ln L(data µ,ˆθ µ ) L(data 0,ˆθ 0 ) yes Bayesian-frequentist hybrid LHC q µ = 2 ln L(data µ,ˆθ µ ) L(data ˆµ,ˆθ) yes (0 ˆµ µ) frequentist 16

Asymptotic limit Asymptotic limit for large statistics : arxiv:1007.1727 - Approximated formulae for high statistics - Based on LHC-type test-statistic - Formula based on the Asimov dataset : assuming no statistical fluctuation (S+B and B models that are given as input to the limit extraction procedure) - Very fast (one mass point, one strength : ~1min against several hours) Features : - Asymptotic limit gets the median expected right - Usually, sigma bands are too narrow with respect to the full CLs method 0 q 10 2 10 s = 20 1 s = 10 1 10 2 10 3 10 q 0,Asimov median[q 1] 0 1 10 s = 5 s = 2 s = 1 2 10 b 17

Exclusion limits Results are usually presented in two ways : - Upper limit on the cross-section (times branching ratio) : no theory uncertainty - Upper limit on the cross-section divided by the SM cross-section - Observed limit usually fluctuates around the expected limit 18

Testing different mass points : MVA? - Exclusion limits (and p-values, see later) are computed for each mass point to be tested - If with a 0.5 GeV step, one donʼt generate each signal sample for this mass (CPU demanding) - Rather interpolate the shapes of the discriminating variables between each generated mass point (see T. Junk lectures) Classifiers used for sensitivity - Have to be re-trained for each mass point - If training on very close mass points, statistical fluctuation for results might happens - On the other hand, interpolation is problematic - Usually not too close mass points are tested (ex: HWW) 19

p-value p-value = 1-CLb - Probability that the background model fluctuates to produce the fluctuation seen in data - P-value is related to significance : 1-CLb = erf(z/sqrt(2)) - Local p-value Z=5 (p-value<2.8 10-7 ) does not mean yet discovery... - Local p-value does not take into account the fact that similar searches are performed in near-by mass points (=> correction for LEE, global p-value) 20

Best fit 21

Channel compatibility 22

Look elsewhere effect ] do not apply. That is one cannot construct a unique test statistic enall possible signals and having asymptotic χ 2 -behaviour. Hence, specialised required for quantifying the compatibility of a given observation with the - When estimating p-value at one mass point, one should ideally take into account the p-value of the [13] do not apply. That is one cannot construct a unique test statistic enng all possible signals and having asymptotic χ 2 -behaviour. Hence, specialised other mass points tried - Re-introduce the mass dependence s(m) for the only are required 50 signal hypothesis. for quantifying the compatibility of a given observation with the d-only hypothesis. model 40 lobal - What test we statistic are looking to be associated at is a p-value with theover search the inmass some broad mass range ritten points as as follows: : global p-value 30 q 0 (ˆm q 0 H (ˆm ) = max H ) = q 0 max (m H ). 20 (13) m H al test statistic to be associated with the search in some broad mass range symptotic regime and for very small p-values, a procedure exists and is well in reference [14] that is largely based on Davies result [15]. 0Following these - Bounds on the p-value can be provided [arxiv: s, the p-value of the global test statistic can be written as follows: 5 1005.1891] b = P (q 0 (ˆm H ) >u) N u + 1 u is the average number of up-crossings of the likelihood ratio Where <Nu> is the average number of upcrossings 2 P scan χ 2 q 0 (m H )at 1 The definition of up-crossings is illustrated in Fig. 4. The ratio of global and lues at the is often level referred u of the to as test-statistic the trial factor. verage number of up-crossings at two levels u and u 0 are related via the following Events / unit mass m H q 0 (m H ). (13) ptotic regime and for very small p-values, a procedure exists and is well reference [14] that is largely based on Davies result [15]. Following these he p-value of the global test statistic can be written as follows: p global b = P (q 0 (ˆm H ) >u) N u + 1 2 P χ 2 1 p global q(m) 10 0 20 40 60 80 100 120 (u) (14) 0 0 20 40 60 80 100 120 m (u) (14) is the average number of up-crossings of the likelihood ratio scan q 0 (m H )at he definition of up-crossings is illustrated in Fig. 4. The ratio of global and s is often referredn to u as = the N uo trial e (u u factor. o)/2, (15) 23 We consider in addition number of degrees of freedom

Look elsewhere effect and MVA - Procedure involves running toys with small steps in mass: compute the mean number of expected crossing at a low level of the test-statistic (e.g. local Z=1) - Derive the global significance at e.g. local Z=2.6 - Here again, classifiers are trained for each generated mass point - MVA are not suited for evaluating sensitivity in a fine grain steps, approximations have to be made 24

Multi-channel/variable likelihood Different flavour of analysis sensitivity estimate per channel - Counting experiment - Categories - Shape Different observables are used to estimate the sensitivity accross channels (MVA output, invariant mass of different final states...) One can also imagine channels using several observables to estimate - Example of ATLAS Hgg PTDR (mass, pt, angular distribution) 25

RooStat RooStat : framework giving tools to compute the analysis sensitivity https://twiki.cern.ch/twiki/bin/view/roostats/webhome - Based on ROOT and RooFit - Many methods available : full CLs, asymptotic, bayesian framework - Allow to combine different categories Last Exercise (to go further...) - Try CLs with one category, once the selection on the MVA output has been applied - Compute observed limit and expected limit with Asimov dataset 26

Thank you! 27