Wald's theorem and the Asimov data set

Wald's theorem and the Asimov data set. Eilam Gross & Ofer Vitells, ATLAS statistics forum, Dec. 2009

Outline. We have previously guessed that the median significance of many toy MC experiments can be obtained simply by using the Asimov data set, i.e. the one data set in which all observed quantities are set equal to their expected values. This guess was supported by MC evidence but not proven. Here we show that, owing to a theorem by Wald [1943], the relation can indeed be proven mathematically, in the same limiting approximation as that of Wilks' theorem (the χ² approximation).

Introduction. The well-known theorem by Wilks [1938] states that the profile likelihood ratio statistic −2 log λ is asymptotically distributed as χ² when the null hypothesis is true. Wald's theorem [1943] generalizes this result to the non-null hypothesis: in that case the asymptotic distribution of −2 log λ is a non-central χ². The approximation is essentially

−2 log λ(θ) ≈ (θ̂ − θ)ᵀ V⁻¹ (θ̂ − θ)

where θ̂ are the MLEs, which are normally distributed with covariance matrix V, whose inverse is the Fisher information, (V⁻¹)ᵢⱼ = −E[∂² log L / ∂θᵢ ∂θⱼ]. The non-centrality parameter is

Λ = (θ − θ₀)ᵀ V⁻¹ (θ − θ₀)

where θ₀ is the true parameter (Kendall & Stuart, 6th edition, §22.7, p. 246).
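
For a single parameter of interest the non-central χ² in Wald's theorem has one degree of freedom, and its CDF can then be written with the standard normal CDF alone, since q = (z + √Λ)² with z ~ N(0,1). A minimal stdlib-only sketch (the function names are mine, not from the talk):

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def ncx2_cdf(x, lam):
    """CDF of the non-central chi-square with 1 degree of freedom.

    For one parameter of interest, q = (z + sqrt(lam))**2 with z ~ N(0,1),
    so P(q <= x) = Phi(sqrt(x) - sqrt(lam)) - Phi(-sqrt(x) - sqrt(lam)).
    """
    if x <= 0.0:
        return 0.0
    return phi(sqrt(x) - sqrt(lam)) - phi(-sqrt(x) - sqrt(lam))
```

With lam = 0 this reduces to the central χ² of Wilks' theorem.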

Introduction. For the Asimov data set the MLEs assume their true values, θ̂_asimov = θ₀, so in this limit we have −2 log λ_asimov = Λ, i.e. the Asimov data set produces the non-centrality parameter of the distribution of −2 log λ under the non-null hypothesis.

[Plot: −2 log λ from toy MC experiments compared with a non-central χ² with Λ = −2 log λ_asimov. Example: counting experiment with a sideband measurement, s = 50, b = 100, τ = 0.5.]
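
For the counting experiment with a sideband (n ~ Poisson(µs + b) in the signal region, m ~ Poisson(τb) in the sideband), −2 log λ(0) has a closed form, and the Asimov data set is simply n = s + b, m = τb. A stdlib-only sketch under those assumptions (helper names are mine):

```python
from math import log, sqrt

def q0_on_off(n, m, tau):
    """-2 log lambda(0) for the on/off problem, assuming muhat >= 0.

    Under mu = 0 the conditional MLE of b is bhat = (n + m) / (1 + tau);
    the unconditional fit reproduces n and m exactly, and the constant
    Poisson terms cancel in the ratio.
    """
    bhat = (n + m) / (1.0 + tau)
    q0 = n * log(n / bhat) if n > 0 else 0.0
    if m > 0:
        q0 += m * log(m / (tau * bhat))
    return 2.0 * q0

# Asimov data for the example values quoted above: s = 50, b = 100, tau = 0.5
s, b, tau = 50.0, 100.0, 0.5
q0_asimov = q0_on_off(s + b, tau * b, tau)
z_asimov = sqrt(q0_asimov)  # about 2.56 for these inputs
```

Setting s = 0 gives Asimov data that fits the null exactly, so the statistic vanishes.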

Relation to median significance. Since the non-null distribution is known, the median significance can be related to the Asimov value of −2 log λ:

Z_median = √(−2 log λ_median), with −2 log λ_median = F⁻¹(½; −2 log λ_asimov)

where F(x; Λ) is the cumulative distribution function of a non-central χ² with non-centrality parameter Λ. The median converges to the Asimov value for Λ ≫ 1. We have not yet specified the null hypothesis, i.e. this relation holds for both discovery and exclusion.
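
The quantile F⁻¹(½; Λ) can be obtained by bisection on the df = 1 non-central χ² CDF; a stdlib-only sketch (function names are mine):

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def ncx2_cdf(x, lam):
    """CDF of the non-central chi-square with 1 dof, via q = (z + sqrt(lam))**2."""
    if x <= 0.0:
        return 0.0
    return phi(sqrt(x) - sqrt(lam)) - phi(-sqrt(x) - sqrt(lam))

def median_significance(q_asimov):
    """Z_median = sqrt(F^{-1}(1/2; q_asimov)), found by bisection on the CDF."""
    lo, hi = 0.0, q_asimov + 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(mid, q_asimov) < 0.5:
            lo = mid
        else:
            hi = mid
    return sqrt(0.5 * (lo + hi))
```

For Λ = 25 (Asimov Z = 5) the median is already indistinguishable from the Asimov value, while for Λ = 1 it sits visibly above, illustrating the Λ ≫ 1 convergence.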

Physical constraints. Usually we would not want to allow the signal strength µ to be negative, since this has no physical meaning. For a single parameter of interest (µ) we have

−2 log λ(µ) ≈ (µ̂ − µ)²/σ_µ², with −2 log λ(0)_asimov = Λ = 1/σ_µ²

µ̂ is normally distributed with variance σ_µ², and its mean is µ = 1 (under the non-null hypothesis, for discovery). If we require µ̂ > 0, then

P(µ̂²/σ_µ² > 1/σ_µ²) = P(µ̂² > 1) = P(µ̂ > 1) = ½

(the second equality uses µ̂ > 0), so the median is given exactly by the Asimov value: Z_median = √(−2 log λ_asimov).

[Plot: toy MC with different values of s, comparing median and Asimov significances.]
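
In the Gaussian limit above, µ̂ ~ N(1, σ_µ) under the non-null hypothesis and −2 log λ(0) = (µ̂/σ_µ)², so the toy-MC median equals the Asimov value 1/σ_µ² because the median of µ̂ is exactly 1. A toy MC sketch of that statement (the value of σ_µ is an arbitrary choice of mine):

```python
import random
from math import sqrt

random.seed(42)

sigma_mu = 0.25        # assumed width of muhat, so Z_asimov = 1 / sigma_mu = 4
n_toys = 200001

q0_values = []
for _ in range(n_toys):
    muhat = random.gauss(1.0, sigma_mu)   # non-null (mu = 1) ensemble
    muhat = max(muhat, 0.0)               # physical constraint muhat > 0
    q0_values.append((muhat / sigma_mu) ** 2)

q0_values.sort()
z_median = sqrt(q0_values[n_toys // 2])   # median significance of the toys
z_asimov = 1.0 / sigma_mu
```

Toys with µ̂ < 0 are pushed to q₀ = 0, well below the median, which is why the constraint leaves the median untouched.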

Exclusion. For exclusion we test µ = 1, and the mean of µ̂ is 0 under the non-null hypothesis. Similarly to the previous case, we need to require µ̂ < 1 in order for the equality between the median and the Asimov value to hold:

P(1 − µ̂ > 1) = P((1 − µ̂)² > 1) = P((1 − µ̂)²/σ_µ² > 1/σ_µ²) = ½ if µ̂ < 1

so Z_median = √(−2 log λ_asimov) if µ̂ < 1. This also makes sense physically: we would not want to reject the null hypothesis (s+b) if the signal rate is higher than expected. Notice that in this case negative values of µ̂ always result in −2 log λ(1) being larger than the median, since (1 − µ̂)² > 1 if µ̂ < 0. Therefore the median does not change whether we require µ̂ to be positive or not.
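
The same check for exclusion: under the non-null (background-only) hypothesis µ̂ ~ N(0, σ_µ), and −2 log λ(1) = ((1 − µ̂)/σ_µ)² for µ̂ < 1. A toy sketch (σ_µ again an arbitrary choice of mine) comparing the median with and without the additional µ̂ > 0 requirement:

```python
import random
from math import sqrt

random.seed(7)

sigma_mu = 0.4         # assumed width, so the Asimov exclusion Z = 1 / sigma_mu = 2.5
n_toys = 200001

def q1(muhat):
    """-2 log lambda(1) in the Gaussian limit, testing mu = 1."""
    if muhat >= 1.0:
        return 0.0     # require muhat < 1: no exclusion on upward fluctuations
    return ((1.0 - muhat) / sigma_mu) ** 2

toys_free = sorted(q1(random.gauss(0.0, sigma_mu)) for _ in range(n_toys))
toys_clipped = sorted(q1(max(random.gauss(0.0, sigma_mu), 0.0)) for _ in range(n_toys))

z_free = sqrt(toys_free[n_toys // 2])
z_clipped = sqrt(toys_clipped[n_toys // 2])
z_asimov = 1.0 / sigma_mu
```

Both ensembles give the same median, since the toys with µ̂ < 0 lie above the median either way.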

Exclusion. [Plot: toy MC with different values of s. When µ̂ < 1, the Asimov and median values agree, regardless of the additional requirement µ̂ > 0.]

Significance error bands. Since −2 log λ(µ) ≈ (µ̂ − µ)²/σ_µ², the significance (e.g. √(−2 log λ(0)) = µ̂/σ_µ) is normally distributed with variance 1, i.e. the error bands of the significance are always just ±1. For example, if the expected median significance is 4.8, then the ±1σ band (68% CL interval) is 4.8 ± 1.

[Plot: unit-variance Gaussians centered on the median significance.]
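
A quick numerical illustration of the ±1 band, using the 4.8 example above (variable names are mine): sample the unit-variance Gaussian significance and read off the 16% and 84% quantiles:

```python
import random

random.seed(3)

z_median_expected = 4.8
sigma_mu = 1.0 / z_median_expected   # width such that the median significance is 4.8
n_toys = 100001

# significance sqrt(-2 log lambda(0)) = muhat / sigma_mu in the Gaussian limit
z_values = sorted(random.gauss(1.0, sigma_mu) / sigma_mu for _ in range(n_toys))

z_med = z_values[n_toys // 2]
z_lo = z_values[int(0.15866 * n_toys)]   # -1 sigma quantile
z_hi = z_values[int(0.84134 * n_toys)]   # +1 sigma quantile
```

The band half-width (z_hi − z_lo)/2 comes out at 1 regardless of the chosen σ_µ.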

Combination example. We extend the example to a combination of 5 channels, each one a counting experiment with a sideband measurement:

L(µ) = ∏ᵢ Poiss(nᵢ; µsᵢ + bᵢ) Poiss(mᵢ; τᵢbᵢ)

In this example: sᵢ = {0, 10, 5, 40, 5}, bᵢ = {10, 30, 10, 60, 40}, τᵢ = {0.5, 1, 2, 0.7, 1}.

[Plot: the significance compared with a unit-variance Gaussian, and −2 log λ compared with a non-central χ² with Λ = −2 log λ_asimov; median = 3.65, Asimov value 3.6, with ±1 bands.]
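
A sketch of how the combined Asimov value could be computed for discovery (testing µ = 0): the Asimov data nᵢ = sᵢ + bᵢ, mᵢ = τᵢbᵢ are fit exactly by µ̂ = 1, b̂ᵢ = bᵢ, so the per-channel −2 log λ(0) terms simply add. Stdlib-only, with demo inputs that are placeholders of mine rather than the slide's values:

```python
from math import log, sqrt

def q0_channel(s, b, tau):
    """Per-channel -2 log lambda(0) evaluated on Asimov data n = s + b, m = tau*b."""
    n, m = s + b, tau * b
    bhat = (n + m) / (1.0 + tau)   # conditional MLE of b under mu = 0
    q0 = n * log(n / bhat)
    if m > 0:
        q0 += m * log(m / (tau * bhat))
    return 2.0 * q0

def z_asimov_combined(channels):
    """channels: iterable of (s, b, tau) tuples; returns the combined Asimov Z."""
    return sqrt(sum(q0_channel(s, b, tau) for s, b, tau in channels))

# Placeholder channels for illustration (not the slide's numbers)
demo = [(20.0, 10.0, 0.5), (10.0, 30.0, 1.0), (5.0, 40.0, 1.0)]
z_demo = z_asimov_combined(demo)
```

A single channel reproduces the earlier sideband example, and adding channels can only increase the Asimov significance.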

Conclusions. Wald's theorem puts the Asimov guess on solid ground: we have shown that the Asimov data set can be used to obtain the median significance for both discovery and exclusion. Under the physical constraints, µ̂ > 0 for discovery and µ̂ < 1 for exclusion, the Asimov value is equal to the median significance, as was previously conjectured. Wald's theorem gives the asymptotic distribution of the profile likelihood ratio under the non-null (alternative) hypothesis, from which one can obtain both the median significance and the associated error bands, without generating a single toy MC experiment.