Statistics for the LHC Lecture 2: Discovery

Similar documents
Recent developments in statistical methods for particle physics

Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits

Statistical Methods for Particle Physics Lecture 3: Systematics, nuisance parameters

Some Statistical Tools for Particle Physics

Statistics for the LHC Lecture 1: Introduction

Some Topics in Statistical Data Analysis

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009

Discovery significance with statistical uncertainty in the background estimate

Statistical Methods for Particle Physics Lecture 3: systematic uncertainties / further topics

Introductory Statistics Course Part II

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery

Discovery Potential for the Standard Model Higgs at ATLAS

Statistical Methods in Particle Physics Day 4: Discovery and limits

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Statistical Methods for Discovery and Limits in HEP Experiments Day 3: Exclusion Limits

Asymptotic formulae for likelihood-based tests of new physics

Statistical Methods in Particle Physics Lecture 1: Bayesian methods

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Methods for Particle Physics (I)

Statistical Methods for Particle Physics

Statistical Methods for Particle Physics Lecture 2: statistical tests, multivariate methods

arxiv: v3 [physics.data-an] 24 Jun 2013

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests

Statistical Tools in Collider Experiments. Multivariate analysis in high energy physics

Hypothesis testing (cont d)

Primer on statistics:

Multivariate statistical methods and data mining in particle physics

Use of the likelihood principle in physics. Statistics II

Introduction to Statistical Methods for High Energy Physics

Statistical Methods for Particle Physics Tutorial on multivariate methods

Introduction to Likelihoods

Lectures on Statistical Data Analysis

Discovery potential of the SM Higgs with ATLAS

arxiv: v1 [hep-ex] 9 Jul 2013

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

FYST17 Lecture 8 Statistics and hypothesis testing. Thanks to T. Petersen, S. Maschiocci, G. Cowan, L. Lyons

Statistical Methods for Particle Physics Lecture 2: multivariate methods

Confidence Limits and Intervals 3: Various other topics. Roger Barlow SLUO Lectures on Statistics August 2006

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Search for tt(h bb) using the ATLAS detector at 8 TeV

Advanced statistical methods for data analysis Lecture 1

Overview of Statistical Methods in Particle Physics

Statistical Methods for LHC Physics

Statistical Methods in Particle Physics

Statistics Forum News

Statistics Challenges in High Energy Physics Search Experiments

Wald s theorem and the Asimov data set

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Topics in Statistical Data Analysis for HEP Lecture 1: Bayesian Methods CERN-JINR European School of High Energy Physics Bautzen, June 2009

The Higgs boson discovery. Kern-und Teilchenphysik II Prof. Nicola Serra Dr. Annapaola de Cosa Dr. Marcin Chrzaszcz

Statistical Methods for Particle Physics Project on H ττ and multivariate methods

Statistical Models with Uncertain Error Parameters (G. Cowan, arxiv: )

Summary of Columbia REU Research. ATLAS: Summer 2014

Albert De Roeck CERN, Geneva, Switzerland Antwerp University Belgium UC-Davis University USA IPPP, Durham UK. 14 September 2012

Minimum Power for PCL

ETH Zurich HS Mauro Donegà: Higgs physics meeting name date 1

Z boson studies at the ATLAS experiment at CERN. Giacomo Artoni Ph.D Thesis Project June 6, 2011

Combination of Higgs boson searches at ATLAS

ATLAS Discovery Potential of the Standard Model Higgs Boson

Combined Higgs Results

Title Text. ATLAS Higgs Boson Discovery Potential

Search for Higgs in H WW lνlν

Search for Higgs Bosons at LEP. Haijun Yang University of Michigan, Ann Arbor

PoS(EPS-HEP2011)250. Search for Higgs to WW (lνlν, lνqq) with the ATLAS Detector. Jonas Strandberg

ATLAS-CONF October 15, 2010

RooFit/RooStats Analysis Report

CMS Internal Note. The content of this note is intended for CMS internal use and distribution only

Status and Prospects of Higgs CP Properties with CMS and ATLAS

arxiv: v1 [hep-ex] 21 Aug 2011

Advanced Statistics Course Part I

Two Early Exotic searches with dijet events at ATLAS

P Values and Nuisance Parameters

Statistical Data Analysis Stat 5: More on nuisance parameters, Bayesian methods

Physics at Hadron Colliders

YETI IPPP Durham

Journeys of an Accidental Statistician

Discovery and Significance. M. Witherell 5/10/12

Is there evidence for a peak in this data?

MODIFIED FREQUENTIST ANALYSIS OF SEARCH RESULTS (THE CL s METHOD)

Detection of Z Gauge Bosons in the Di-muon Decay Mode in CMS

Search for SM Higgs Boson at CMS

Statistical Tools in Collider Experiments. Multivariate analysis in high energy physics

PoS(CKM2016)117. Recent inclusive tt cross section measurements. Aruna Kumar Nayak

Searching for the Higgs at the Tevatron

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

tth searches at ATLAS and CMS Thomas CALVET for the ATLAS and CMS collaborations Stony Brook University Apr 11 th, 2018

Final status of L3 Standard Model Higgs searches and latest results from LEP wide combinations. Andre Holzner ETHZ / L3

Statistical issues for Higgs Physics. Eilam Gross, Weizmann Institute of Science

Statistical Data Analysis 2017/18

Higgs couplings and mass measurements with ATLAS. Krisztian Peters CERN On behalf of the ATLAS Collaboration

D0 Higgs Results and Tevatron Higgs Combination

Overview of the Higgs boson property studies at the LHC

Study of Diboson Physics with the ATLAS Detector at LHC

RWTH Aachen Graduiertenkolleg

E. Santovetti lesson 4 Maximum likelihood Interval estimation

The search for standard model Higgs boson in ZH l + l b b channel at DØ

VBF SM Higgs boson searches with ATLAS

Search for a heavy gauge boson W e

Search and Discovery Statistics in HEP

Transcription:

Statistics for the LHC Lecture 2: Discovery Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 1

Outline Lecture 1: Introduction and basic formalism Probability, statistical tests, parameter estimation. Lecture 2: Discovery Quantifying discovery significance and sensitivity Systematic uncertainties (nuisance parameters) Lecture 3: Exclusion limits Frequentist and Bayesian intervals/limits Lecture 4: Further topics More on Bayesian methods / model selection G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 2

Unfinished from Monday A few more remarks about p-values Brief overview of parameter estimation Main programme for today Statistics for discovery, i.e., rejecting the hypothesis that what we see is due solely to known (background) processes. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 3

A simple example For each event we measure two variables, x = (x 1, x 2 ). Suppose that for background events (hypothesis H 0 ), and for a certain signal model (hypothesis H 1 ) they follow where x 1, x 2 0 and C is a normalization constant. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 4

Likelihood ratio as test statistic In a real-world problem we usually wouldn t have the pdfs f(x H 0 ) and f(x H 1 ), so we wouldn t be able to evaluate the likelihood ratio for a given observed x, hence the need for multivariate methods to approximate this with some other function. But in this example we can find contours of constant likelihood ratio such as: G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 5

Event selection using the LR Using Monte Carlo, we can find the distribution of the likelihood ratio or equivalently of signal (H 1 ) background (H 0 ) From the Neyman-Pearson lemma we know that by cutting on this variable we would select a signal sample with the highest signal efficiency (test power) for a given background efficiency. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 6

Search for the signal process But what if the signal process is not known to exist and we want to search for it. The relevant hypotheses are therefore H 0 : all events are of the background type H 1 : the events are a mixture of signal and background Rejecting H 0 with Z > 5 constitutes discovering new physics. Suppose that for a given integrated luminosity, the expected number of signal events is s, and for background b. The observed number of events n will follow a Poisson distribution: G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 7

Likelihoods for full experiment We observe n events, and thus measure n instances of x = (x 1, x 2 ). The likelihood function for the entire experiment assuming the background-only hypothesis (H 0 ) is and for the signal plus background hypothesis (H 1 ) it is where p s and p b are the (prior) probabilities for an event to be signal or background, respectively. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 8

Likelihood ratio for full experiment We can define a test statistic Q monotonic in the likelihood ratio as To compute p-values for the b and s+b hypotheses given an observed value of Q we need the distributions f(q b) and f(q s+b). Note that the term s in front is a constant and can be dropped. The rest is a sum of contributions for each event, and each term in the sum has the same distribution. Can exploit this to relate distribution of Q to that of single event terms using (Fast) Fourier Transforms (Hu and Nielsen, physics/9906010). G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 9

Distribution of Q Take e.g. b = 100, s = 20. Suppose in real experiment Q is observed here. f (Q s+b) f (Q b) p-value of b only p-value of s+b G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 10

Systematic uncertainties Up to now we assumed all parameters were known exactly. In practice they have some (systematic) uncertainty. Suppose e.g. uncertainty in expected number of background events b is characterized by a (Bayesian) pdf p(b). Maybe take a Gaussian, i.e., where b 0 is the nominal (measured) value and s b is the estimated uncertainty. In fact for many systematics a Gaussian pdf is hard to defend more on this later. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 11

Distribution of Q with systematics To get the desired p-values we need the pdf f (Q), but this depends on b, which we don t know exactly. But we can obtain the Bayesian model average: With Monte Carlo, sample b from p(b), then use this to generate Q from f (Q b), i.e., a new value of b is used to generate the data for every simulation of the experiment. This broadens the distributions of Q and thus increases the p-value (decreases significance Z) for a given Q obs. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 12

Distribution of Q with systematics (2) For s = 20, b 0 = 100, s b = 10 this gives f (Q s+b) f (Q b) p-value of b only p-value of s+b G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 13

Using the likelihood ratio L(s)/L(s) ˆ Instead of the likelihood ratio L s+b /L b, suppose we use as a test statistic Intuitively this is a good measure of the level of agreement between the data and the hypothesized value of s. low l: poor agreement high l : good agreement 0 l 1 maximizes L(s) G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 14

L(s)/L(s) ˆ for counting experiment Consider an experiment where we only count n events with n ~ Poisson(s + b). Then. To establish discovery of signal we test the hypothesis s = 0 using whereas previously we had used which is monotonic in n and thus equivalent to using n as the test statistic. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 15

L(s)/L(s) ˆ for counting experiment (2) But if we only consider the possibility of signal being present when n > b, then in this range l(0) is also monotonic in n, so both likelihood ratios lead to the same test. b G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 16

L(s)/L(s) ˆ for general experiment If we do not simply count events but also measure for each some set of numbers, then the two likelihood ratios do not necessarily give equivalent tests, but in practice will be very close. l(s) has the important advantage that for a sufficiently large event sample, its distribution approaches a well defined form (Wilks Theorem). In practice the approach to the asymptotic form is rapid and one obtains a good approximation even for relatively small data samples (but need to check with MC). This remains true even when we have adjustable nuisance parameters in the problem, i.e., parameters that are needed for a correct description of the data but are otherwise not of interest (key to dealing with systematic uncertainties). G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 17

Prototype search analysis Expected Performance of the ATLAS Experiment: Detector, Trigger and Physics, arxiv:0901.0512, CERN-OPEN-2008-20. Search for signal in a region of phase space; result is histogram of some variable x giving numbers: Assume the n i are Poisson distributed with expectation values where strength parameter signal background G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 18

Prototype analysis (II) Often also have a subsidiary measurement that constrains some of the background and/or shape parameters: Assume the m i are Poisson distributed with expectation values Likelihood function is nuisance parameters (q s, q b,b tot ) G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 19

The profile likelihood ratio Base significance test on the profile likelihood ratio: maximizes L for specified m maximize L The likelihood ratio of point hypotheses gives optimum test (Neyman-Pearson lemma). The profile LR hould be near-optimal in present analysis with variable m and nuisance parameters q. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 20

Test statistic for discovery Try to reject background-only (m = 0) hypothesis using i.e. only regard upward fluctuation of data as evidence against the background-only hypothesis. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 21

p-value for discovery Large q 0 means increasing incompatibility between the data and hypothesis, therefore p-value for an observed q 0,obs is will get formula for this later From p-value get equivalent significance, G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 22

Expected (or median) significance / sensitivity When planning the experiment, we want to quantify how sensitive we are to a potential discovery, e.g., by given median significance assuming some nonzero strength parameter m. So for p-value, need f(q 0 0), for sensitivity, will need f(q 0 m ), G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 23

Wald approximation for profile likelihood ratio To find p-values, we need: For median significance under alternative, need: Use approximation due to Wald (1943) sample size G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 24

Noncentral chi-square for -2lnl(m) If we can neglect the O(1/ N) term, -2lnl(m) follows a noncentral chi-square distribution for one degree of freedom with noncentrality parameter As a special case, if m = m then L = 0 and -2lnl(m) follows a chi-square distribution for one degree of freedom (Wilks). G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 25

Distribution of q 0 Assuming the Wald approximation, we can write down the full distribution of q 0 as The special case m = 0 is a half chi-square distribution: G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 26

Cumulative distribution of q 0, significance From the pdf, the cumulative distribution of q 0 is found to be The special case m = 0 is The p-value of the m = 0 hypothesis is Therefore the discovery significance Z is simply G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 27

The Asimov data set To estimate median value of -2lnl(m), consider special data set where all statistical fluctuations suppressed and n i, m i are replaced by their expectation values (the Asimov data set): Asimov value of -2lnl(m) gives noncentrality param. L, or equivalently, s G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 28

Relation between test statistics and G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 29

Higgs search with profile likelihood Combination of Higgs boson search channels (ATLAS) Expected Performance of the ATLAS Experiment: Detector, Trigger and Physics, arxiv:0901.0512, CERN-OPEN-2008-20. Standard Model Higgs channels considered (more to be used later): H gg H WW ( * ) enmn H ZZ ( * ) 4l (l = e, m) H t + t - ll, lh Used profile likelihood method for systematic uncertainties: background rates, signal & background shapes. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 30

An example: ATLAS Higgs search (ATLAS Collab., CERN-OPEN-2008-020) G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 31

Cumulative distributions of q 0 To validate to 5s level, need distribution out to q 0 = 25, i.e., around 10 8 simulated experiments. Will do this if we really see something like a discovery. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 32

Combination of channels For a set of independent decay channels, full likelihood function is product of the individual ones: For combination need to form the full function and maximize to find estimators of m, q. ongoing ATLAS/CMS effort with RooStats framework Trick for median significance: estimator for m is equal to the Asimov value m for all channels separately, so for combination, where G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 33

Combined median significance ATLAS arxiv:0901.0512 N.B. illustrates statistical method, but study did not include all usable Higgs channels. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 page 34

Discovery significance for n ~ Poisson(s + b) Consider again the case where we observe n events, model as following Poisson distribution with mean s + b (assume b is known). 1) For an observed n, what is the significance Z 0 with which we would reject the s = 0 hypothesis? 2) What is the expected (or more precisely, median ) Z 0 if the true value of the signal rate is s? G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 35

Gaussian approximation for Poisson significance For large s + b, n x ~ Gaussian(m,s), m = s + b, s = (s + b). For observed value x obs, p-value of s = 0 is Prob(x > x obs s = 0),: Significance for rejecting s = 0 is therefore Expected (median) significance assuming signal rate s is G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 36

Better approximation for Poisson significance Likelihood function for parameter s is or equivalently the log-likelihood is Find the maximum by setting gives the estimator for s: G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 37

Approximate Poisson significance (continued) The likelihood ratio statistic for testing s = 0 is For sufficiently large s + b, (use Wilks theorem), To find median[z 0 s+b], let n s + b, This reduces to s/ b for s << b. G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 38

Wrapping up lecture 2 To establish discovery, we need (at least) to reject the background-only hypothesis (usually with Z > 5). For two simple hypotheses, best test of H 0 is from likelihood ratio L(H 1 )/L(H 0 ). Alternative likelihood ratio l(m) = L(m) / L(m) ˆ will lead to similar test; can use Wilks theorem to obtain sampling distribution. Methods for incorporating systematics somewhat different for the two approaches (in practice lead to similar results). Next: exclusion limits G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 39

Extra slides G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 40

Example: ALEPH Higgs search p-value (1 CL b ) of background only hypothesis versus tested Higgs mass measured by ALEPH Experiment Possible signal? Phys.Lett.B565:61-75,2003. hep-ex/0306033 G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 41

Example: LEP Higgs search Not seen by the other LEP experiments. Combined analysis gives p-value of background-only hypothesis of 0.09 for m H = 115 GeV. Phys.Lett.B565:61-75,2003. hep-ex/0306033 G. Cowan CERN Academic Training 2010 / Statistics for the LHC / Lecture 2 42