Hypothesis testing (cont'd)
Ulrich Heintz, Brown University
PHYS 1560 Lecture 11, 4/12/2016
Hypothesis testing
Is our hypothesis about the fundamental physics correct? We will not be able to give a yes/no answer to this question, only a degree of confidence.
General approach: use parameter estimation techniques and determine a p-value to quantify the degree of confidence.
A simple counting experiment
Consider an experiment that counts events of a certain type in search of a new process. The expected background count from known processes is b = 5.2. The only model parameter is the number of counts from the new process, s. Thus the expected count is λ = b + s.
The probability to observe n counts is
p(n|s) = (b + s)^n e^{-(b+s)} / n!
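The Poisson probability above is easy to check numerically; a minimal sketch, using b = 5.2 from the slides (the observed count of 8 anticipates the later example):

```python
import math

def poisson_prob(n, lam):
    """Probability to observe n counts when the expected count is lam."""
    return lam**n * math.exp(-lam) / math.factorial(n)

b, s = 5.2, 0.0                 # background expectation from the slides; no signal
print(poisson_prob(8, b + s))   # probability to observe exactly 8 counts
```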
Results
Assume an observed count n. The p-value ξ is the probability to observe at least n counts under the hypothesis s = 0:
ξ(n) = Σ_{n'=n}^∞ p(n'|s=0)

n     ξ
6     0.419
8     0.155
10    0.040
12    0.007
14    1.0×10^-3
21    1.5×10^-7
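The tail sum defining ξ(n) can be computed by subtracting the cumulative probability below n from 1; this sketch reproduces the table's values for b = 5.2:

```python
import math

def poisson_prob(n, lam):
    return lam**n * math.exp(-lam) / math.factorial(n)

def p_value(n_obs, b):
    """P(n >= n_obs | s = 0): one minus the Poisson CDF below n_obs."""
    return 1.0 - sum(poisson_prob(n, b) for n in range(n_obs))

for n_obs in (6, 8, 10, 12):
    print(n_obs, p_value(n_obs, 5.2))
```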
Hypothesis testing
Framework for decision making between two hypotheses:
Null hypothesis H0: here s = 0
Alternative hypothesis H1: here s > 0
Reject H0 if the p-value for the observed data under H0 is below some predefined threshold α.
Possible errors:
Error of the first kind (type I): reject H0 if it is true ("false discovery"), probability = α
Error of the second kind (type II): accept H0 if it is false ("missed discovery"), probability = β
The probability 1 − β to correctly reject H0 if H1 is true is called the power of the test. For a given α, select the test with the largest power.
Z-value
Often p-values are very small. Define the z-value ("number of sigmas") by
(1/√(2π)) ∫_z^∞ e^{-x²/2} dx = ξ

z     ξ
-1    0.84
0     0.50
1     0.16
2     0.023
3     1.3×10^-3
4     3.2×10^-5
5     2.9×10^-7

For large counts use the Gaussian approximation z ≈ s/√b.
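The one-sided Gaussian tail integral can be written in terms of the complementary error function, ξ = ½ erfc(z/√2), and a simple bisection inverts it to go from a p-value back to a z-value:

```python
import math

def p_from_z(z):
    """One-sided Gaussian tail probability for a given z-value (number of sigmas)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def z_from_p(xi, lo=-10.0, hi=10.0):
    """Invert p_from_z by bisection (p_from_z is strictly decreasing in z)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if p_from_z(mid) > xi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for z in (1, 2, 3, 5):
    print(z, p_from_z(z))
```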
Distribution testing
Generally it is more powerful to test a distribution of counts of many different event types.
Likelihood function for counts n_1, n_2, ...:
L(μ) = Π_i p(n_i | b_i + μ s_i)
where μ is the signal strength parameter.
H0: background only (μ = 0)
H1: background + signal (μ > 0)
Test statistic
How do we define the p-value for a distribution test? Choose a function, called the test statistic, which characterizes how signal-like the data are. This could be a sum of squares (χ²) or a likelihood L. Choose the function which maximizes the power of the test.
This is only well defined if H1 is a simple hypothesis (one without free parameters). Then the most powerful test statistic is (Neyman-Pearson lemma)
t = ln [ L(H1) / L(H0) ]
A generalization for a composite H1 is the profile likelihood ratio
t = ln [ max_{μ>0} L(μ) / L(0) ]
Large values of t favor H1, small values favor H0. For an observed value t_obs, the p-value then is
ξ = p(t > t_obs | H0)
Example
Perform two maximum likelihood fits: the background + signal fit gives μ̂ = 0.72, the background-only fit has μ = 0. The observed test statistic is t_obs = 4.92.
What is the parent distribution of t? Generate many sample distributions with the background histogram as parent distribution and compute t for each of them.
Results
What fraction of sample distributions yields a value of t > t_obs?
From 10^5 pseudo-experiments: ξ = 8.6×10^-4, z = 3.13.
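The pseudo-experiment procedure can be sketched as follows: draw each bin from a Poisson with the background expectation, recompute t, and take the p-value as the fraction of toys with t ≥ t_obs. The bin contents below are hypothetical, and Knuth's algorithm provides the Poisson sampling:

```python
import math, random

rng = random.Random(12345)

def rand_poisson(lam):
    """Knuth's method; adequate for the small expected counts used here."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def log_pois(n, lam):
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def log_L(mu, n, b, s):
    return sum(log_pois(ni, bi + mu * si) for ni, bi, si in zip(n, b, s))

def t_stat(n, b, s):
    mu_grid = [0.04 * k for k in range(101)]          # mu in [0, 4]
    return max(log_L(mu, n, b, s) for mu in mu_grid) - log_L(0.0, n, b, s)

b = [5.0, 6.0, 5.0, 4.0]    # hypothetical background expectations per bin
s = [0.5, 2.0, 4.0, 1.0]    # hypothetical signal template per bin
n_obs = [4, 9, 12, 7]       # hypothetical observed counts
t_obs = t_stat(n_obs, b, s)

# background-only toys: each bin drawn from Poisson(b_i)
toys = [t_stat([rand_poisson(bi) for bi in b], b, s) for _ in range(1000)]
p_value = sum(t >= t_obs for t in toys) / len(toys)
print(t_obs, p_value)
```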
Systematic uncertainties
How can we account for systematic uncertainties in the hypothesis test? Say in our counting experiment we measure the background from a control experiment to be b = 5.2 ± 2.6.
Possible approach: include a probability distribution for b in the likelihood and average over all values of b.
Such parameters are called nuisance parameters because we are not fundamentally interested in their values (as opposed to the signal strength parameter, which we want to measure).
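One way to implement this averaging is to weight the background-only p-value by a Gaussian distribution for b, truncated at zero since a count expectation cannot be negative. The numbers b = 5.2 ± 2.6 and n = 8 are from the slides; the grid range and step are arbitrary choices:

```python
import math

def pois_tail(n_obs, lam):
    """P(n >= n_obs) for a Poisson with mean lam."""
    return 1.0 - sum(lam**k * math.exp(-lam) / math.factorial(k)
                     for k in range(n_obs))

def averaged_p_value(n_obs, b, db, steps=400):
    """Average the p-value over a Gaussian prior for b, truncated at b' > 0."""
    num = den = 0.0
    for i in range(steps):
        bp = 0.01 + i * (b + 5 * db) / steps          # grid over plausible b'
        w = math.exp(-0.5 * ((bp - b) / db) ** 2)     # Gaussian weight
        num += w * pois_tail(n_obs, bp)
        den += w
    return num / den

print(averaged_p_value(8, 5.2, 2.6))   # larger than the 0.155 without systematics
```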
Result including systematic uncertainty
In general, adding systematic uncertainties broadens the distribution of the test statistic, increases the p-value, and reduces the z-value.
Gaussian approximation for large counts:
z ≈ s / √(b + (δb)²)
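A sketch of the Gaussian approximation with a background systematic; here s is taken as the observed excess n − b = 8 − 5.2 = 2.8 from the earlier counting example, with δb = 2.6:

```python
import math

def z_approx(s, b, db=0.0):
    """Large-count Gaussian approximation: z ≈ s / sqrt(b + db**2)."""
    return s / math.sqrt(b + db**2)

print(z_approx(2.8, 5.2))         # statistical uncertainty only
print(z_approx(2.8, 5.2, 2.6))    # including the background systematic
```

As expected, the systematic uncertainty reduces the significance.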
Look-elsewhere effect
We reject H0 if ξ < α, so the probability to reject H0 incorrectly is α. If we repeat the procedure for the same H0 but many different alternative hypotheses H1 (e.g., test for peaks at different places), the probability that some test rejects H0 becomes larger than α. For n independent tests the probability is ≈ nα (if α ≪ 1).
For example: assume we want to carry out a test with 3σ significance, α = 0.0013. The probability to reject H0 in one test is 0.13%. If 10 independent channels are tested, the probability to reject H0 in any one of them is 1.3%.
Correct the local p-value to a global p-value by multiplying by the trial factor n.
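For n independent tests the exact global probability is 1 − (1 − ξ_local)^n, which reduces to the trial-factor product n·ξ_local for small p-values; a one-line sketch:

```python
def global_p_value(p_local, n_trials):
    """Exact for independent tests; ≈ n * p_local when p_local is small."""
    return 1.0 - (1.0 - p_local) ** n_trials

print(global_p_value(0.0013, 10))   # the 10-channel example from the slides
```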
Result with look-elsewhere effect
If we look only at M = 500 we had ξ = 8.6×10^-4 or z = 3.1. If we look at a wider range of M, the probability to observe such a deviation increases.
Draw random samples and compute the minimum p-value over all values of M. For M = 300, 500, 700 we get ξ_min = 2.2×10^-3.
Return to our simple counting experiment
Count events of a certain type in search of a new process. The expected background count from known processes is b = 5.2. The only model parameter is the number of counts from the new process, s. Thus the expected count is λ = b + s.
Suppose we count n = 8 events. What statement can we make about s? Can we exclude large values of s such as s = 500?
Neyman construction
Upper limit construction by hypothesis test inversion:
1. For a given s = s0, carry out a hypothesis test with the null hypothesis s = s0 and the alternative hypothesis s < s0 with type I error α (e.g., α = 0.05).
2. Repeat step 1 for different values of s0.
3. The confidence interval for s comprises exactly those values s0 for which the hypothesis test could not reject the null hypothesis s = s0.
For this formulation of the hypothesis test we get an upper limit with confidence level 1 − α (here: 95%). This is known as the Neyman construction.
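For the counting experiment the inversion reduces to a one-dimensional scan: increase s0 until the observed count becomes incompatible (at level α) with λ = b + s0. A sketch, assuming a simple fixed-step scan:

```python
import math

def pois_cdf(n, lam):
    """P(n' <= n) for a Poisson with mean lam."""
    return sum(lam**k * math.exp(-lam) / math.factorial(k) for k in range(n + 1))

def upper_limit(n_obs, b, alpha=0.05, step=0.01):
    """Smallest s0 rejected by the test 'reject if P(n <= n_obs | b+s0) < alpha';
    the confidence interval is s < s_up."""
    s = 0.0
    while pois_cdf(n_obs, b + s) >= alpha:
        s += step
    return s

print(upper_limit(8, 5.2))   # ≈ 9.2, matching the slides' example
```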
Neyman construction
Example: counting experiment with b = 5.2. As a function of s, determine the largest n0 for which p(n_obs < n0 | s) ≤ α.
Neyman construction
Example: for n_obs = 8, the 95% C.L. upper limit for s is 9.2.
Empty intervals
In the Neyman construction one can obtain empty intervals; e.g., for n_obs = 1, one would state s < 0 at 95% C.L. For a method with correct coverage and true μ = 0, this happens in 5% of the cases, when n_obs happens to be small.
Empty intervals
Empty (or very small) intervals are unsatisfactory: we know we are in the 5% type I error case, and we would cite a very strong limit although there is no experimental sensitivity to such small values.
To avoid citing such intervals, one can modify the frequentist construction: modified frequentist intervals, also known as the CLs method.
The CLs method
Small or empty intervals happen when the data are incompatible with the background-only model (e.g., very few events even for background only).
The CLs method
Consider the test statistic distribution for the background-only model μ = 0. The idea: increase the limit if the data are incompatible with the background-only hypothesis μ = 0, i.e., increase the interval in case of small values of p_b.
Definition of CLs
The CLs value is a modified p-value which is large for small p_b:
CLs = p_{s+b} / p_b
In the limit construction, use CLs in place of p_{s+b}; the limit is the μ for which CLs = α.
CLs limits are always more conservative than Neyman limits because CLs ≥ p_{s+b} by construction. The CLs method prevents citing limits where there is no experimental sensitivity.
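For the counting experiment both tail probabilities are Poisson cumulative probabilities, so the CLs limit is again a one-dimensional scan. This sketch (assuming the same fixed-step scan as before) also shows that n_obs = 1, which gives an empty Neyman interval, still yields a finite CLs limit:

```python
import math

def pois_cdf(n, lam):
    return sum(lam**k * math.exp(-lam) / math.factorial(k) for k in range(n + 1))

def cls_limit(n_obs, b, alpha=0.05, step=0.01):
    """Upper limit from CLs = p_{s+b} / p_b, with p_{s+b} = P(n <= n_obs | b+s)
    and p_b = P(n <= n_obs | b); CLs >= p_{s+b}, so the limit is looser."""
    p_b = pois_cdf(n_obs, b)
    s = 0.0
    while pois_cdf(n_obs, b + s) / p_b >= alpha:
        s += step
    return s

# n_obs = 1: P(n <= 1 | 5.2) = 0.034 < 0.05, so the Neyman interval is empty,
# but the CLs limit stays positive:
print(cls_limit(1, 5.2))
print(cls_limit(8, 5.2))   # slightly above the Neyman limit of 9.2
```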