Hypothesis testing: power, test statistic

Hypothesis testing: power, test statistic. The more sensitive the test, the better it discriminates between the null and the alternative hypothesis; quantitatively, the goal is maximal power. To achieve this, especially in many dimensions, the observables are often replaced by a one-dimensional function of the observables, called the test statistic. [Figure: distributions p(t(d) | H0) and p(t(d) | H1) of the test statistic under the two hypotheses.]
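As a rough illustration (not from the lecture), here is a minimal sketch of such a mapping, assuming, purely hypothetically, that the observables under H0 and H1 are modelled as multivariate Gaussians and that the log-likelihood ratio is used as the one-dimensional test statistic t(d):

import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical densities of the 3-dimensional observables under H0 and H1.
p0 = multivariate_normal(mean=[0.0, 0.0, 0.0], cov=np.eye(3))
p1 = multivariate_normal(mean=[1.0, 0.5, 0.0], cov=np.eye(3))

def test_statistic(d):
    # One-dimensional log-likelihood ratio t(d) of the multi-dimensional data d.
    return p1.logpdf(d) - p0.logpdf(d)

rng = np.random.default_rng(1)
t_h0 = test_statistic(p0.rvs(size=10_000, random_state=rng))  # samples of p(t | H0)
t_h1 = test_statistic(p1.rvs(size=10_000, random_state=rng))  # samples of p(t | H1)

t_cut = np.quantile(t_h0, 0.95)      # cut with 5% significance level under H0
power = np.mean(t_h1 > t_cut)        # fraction of H1 pseudo-data passing the cut
print(f"cut at t = {t_cut:.2f}, power = {power:.2f}")

Cutting on t at a quantile of its H0 distribution fixes the significance level; the fraction of H1 pseudo-data passing the cut estimates the power.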

From CMS (arXiv:1202.1488).

Testing the hypothesis: no Higgs. Summary: Multivariate Analysis. MVA is required for hypothesis testing ("classification") and parameter estimation ("regression") in high-dimensional parameter spaces with complicated likelihood functions (curse of dimensionality). For hypothesis testing, an ansatz is made for the test statistic t, whose parameters are optimized using different criteria. This optimisation can be difficult and is most often done by training on data ("machine learning").

Summary: Multivariate analysis. The examples we discussed were all classifiers. Fisher discriminant: a linear combination of the observables, optimised for separation in units of the inter-sample variance (a small numerical sketch follows below). PCA: diagonalize the covariance matrix to identify the most important, uncorrelated variables. Artificial neural networks: a non-linear test statistic, optimised e.g. in the same way as the Fisher discriminant.

Goodness of fit. Test of a null hypothesis with a test statistic t, but in this case the alternative hypothesis is the set of all possible alternative hypotheses. The statement being aimed at is: if H0 were true and the experiment were repeated many times, one would obtain data as likely (or less likely) than the observed data with probability p. p is then called the p-value. A small p-value is taken as evidence against the null hypothesis (a bad fit).
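The numerical sketch of the Fisher discriminant referred to above, with two purely hypothetical Gaussian training samples; the discriminant direction is w proportional to S_W^-1 (mu1 - mu0), where S_W is the within-class covariance:

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical training samples for the two classes (e.g. background / signal).
x0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=5000)
x1 = rng.multivariate_normal([1.0, 0.8], [[1.0, 0.3], [0.3, 1.0]], size=5000)

mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
# Within-class covariance; the Fisher direction maximizes the separation of the
# projected means in units of the projected (within-class) variance.
S_w = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)
w = np.linalg.solve(S_w, mu1 - mu0)

t0, t1 = x0 @ w, x1 @ w          # one-dimensional test statistic for each class
separation = (t1.mean() - t0.mean()) / np.sqrt(0.5 * (t0.var() + t1.var()))
print(f"separation of the projected samples: {separation:.2f}")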

Example: Poisson counting rate (a short numerical sketch is given after this paragraph). Distribution-free tests. In that example we knew the distribution of the test statistic, but this is not generally true; in many cases it can be quite complicated to calculate. One therefore considers distribution-free tests, i.e. tests whose distribution is known independently of the null hypothesis. In that case it is sufficient to calculate the distribution of the test statistic once and then look up the value for your particular problem. The most commonly applicable null distribution for such tests is the χ2 distribution (used for mapping t to the p-value).
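A sketch of the Poisson counting-rate example, with assumed numbers (b and n_obs are purely illustrative); the p-value is taken as the one-sided tail probability of observing at least n_obs counts under H0:

from scipy.stats import poisson

b = 3.2        # expected counts under H0 (assumed number for illustration)
n_obs = 8      # observed counts (assumed)

# p-value: probability under H0 of data as likely or less likely than observed,
# here the one-sided tail P(n >= n_obs | b).
p_value = poisson.sf(n_obs - 1, b)
print(f"p-value = {p_value:.4f}")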

Pearson's chi-square test. In general, you will want to measure the distance between data and hypothesis (in data space). A very reasonable way to do this is to use the quantity

T = (Y - f)^T V^-1 (Y - f),

where Y denotes the data vector, f the expected value under H0 and V the covariance matrix. This is called Pearson's chi-square, since for k data points it behaves like χ2(k) if Y is Gaussian.

Chi-square test for histograms. In this case you use the asymptotic normality of the multinomial PDF to find the distribution of T = (n - Np)^T V^-1 (n - Np), where N is the total number of events in the histogram, V the covariance matrix and n the vector of bin contents. The most usual case looks a little simpler:

T = Σ_i (n_i - N p_i)^2 / (N p_i),

and this statistic behaves like a χ2(k-1). This requires Gaussianity for the N p_i, with the empirical requirement on the number of expected events of N p_i > 5 per bin.
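A sketch of the histogram form of Pearson's chi-square, with hypothetical bin contents n and H0 bin probabilities p (both assumed for illustration):

import numpy as np
from scipy.stats import chi2

n = np.array([12, 25, 31, 22, 10])          # observed bin contents (assumed)
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])     # bin probabilities under H0 (assumed)
N = n.sum()

T = np.sum((n - N * p) ** 2 / (N * p))      # Pearson's chi-square
k = len(n)
p_value = chi2.sf(T, df=k - 1)              # k-1 degrees of freedom, no fitted parameters
print(f"T = {T:.2f}, p-value = {p_value:.3f}")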

Chi-square test with estimation of parameters. If you use the data to estimate the parameters of the parent distribution, the Pearson test statistic T no longer behaves like χ2(k-1). In this case the distribution lies between χ2(k-1) and χ2(k-r-1), where r is the number of parameters estimated from the data. Usually χ2(k-r-1) holds (e.g. for maximum likelihood).

Neyman's chi-square. Instead of the expected number of events, the denominator contains the observed number of events, T = Σ_i (n_i - N p_i)^2 / n_i. Easy to compute, and asymptotically equivalent to Pearson's chi-square.
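A sketch of the degrees-of-freedom bookkeeping when one shape parameter is estimated from the data (the bin contents and the Poisson-shaped H0 are purely illustrative; here the parameter is fitted by minimising T itself, i.e. a minimum chi-square fit), together with Neyman's variant:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2, poisson

n = np.array([29, 20, 16, 9, 6, 4])          # observed bin contents (assumed)
values = np.arange(6)                         # the 6 bins cover the values 0..5
N = n.sum()

def pearson_T(mu):
    # Pearson's T for a hypothetical Poisson-shaped H0 with mean mu,
    # normalised over the k bins actually used.
    p = poisson.pmf(values, mu)
    p = p / p.sum()
    return np.sum((n - N * p) ** 2 / (N * p))

# Estimate the one shape parameter from the data (r = 1).
fit = minimize_scalar(pearson_T, bounds=(0.1, 10.0), method="bounded")
T, k, r = fit.fun, len(n), 1
print(f"T = {T:.2f}, p-value = {chi2.sf(T, df=k - r - 1):.3f}")   # chi2(k-r-1)

# Neyman's chi-square: observed counts in the denominator (asymptotically equivalent).
p_hat = poisson.pmf(values, fit.x); p_hat /= p_hat.sum()
T_neyman = np.sum((n - N * p_hat) ** 2 / n)
print(f"Neyman's T = {T_neyman:.2f}")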

Choosing the optimal bin size. Choosing the bin size is a frequently encountered problem in physics. Suppose your experiment measures the energy of a particle with very good precision (so choosing very wide bins in the energy observable would throw away information), but the number of particles you expect is small, so if you choose too many bins you can no longer use the normal approximation. Rule of thumb: for a given hypothesis H0 and a given number of bins k, choose the bin edges such that each bin has equal probability.

Some more details on choosing the optimal bin size. You can use the data to estimate the parameters of the hypothesis for which you then design the binning. Start with a large number of bins and group adjacent bins together until the asymptotic distribution can be used. Choose the bin size a little smaller than the experimental resolution. Do not optimise by picking the binning that gives the lowest T: the result would not be distributed as a χ2.
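A sketch of the equal-probability rule of thumb, assuming for illustration that H0 is a standard Gaussian; the bin edges are placed at the quantiles of the H0 distribution, so every bin has expectation N/k:

import numpy as np
from scipy.stats import norm

k = 10                                               # chosen number of bins
# Interior edges at the 1/k, 2/k, ... quantiles of the H0 distribution,
# so each of the k bins carries probability 1/k under H0.
interior_edges = norm.ppf(np.linspace(0.0, 1.0, k + 1))[1:-1]

x = norm.rvs(size=200, random_state=3)               # pseudo-data
n = np.bincount(np.digitize(x, interior_edges), minlength=k)
print(n, "expected per bin:", len(x) / k)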

Likelihood chi-square. Instead of assuming Gaussianity, you can use the actual distribution of the number of events in a bin. This is known: Poisson, if the total number of events is variable; multinomial, if the total number is fixed. In this case you can use the binned likelihood as a test statistic.

Binned likelihood (cont'd). Define the likelihood for a perfect fit (n_i = µ_i). Then the likelihood ratio becomes, for Poisson-distributed bins,

-2 ln λ = 2 Σ_i [ µ_i - n_i + n_i ln(n_i / µ_i) ],

and we set the last term to 0 if n_i = 0.
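A sketch of this binned likelihood-ratio statistic for Poisson bins, with assumed bin contents and expectations:

import numpy as np
from scipy.stats import chi2

n  = np.array([5, 9, 14, 8, 3, 0])                    # observed bin contents (assumed)
mu = np.array([4.2, 10.1, 13.0, 7.5, 3.8, 0.9])       # expectations under H0 (assumed)

# -2 ln(lambda) for Poisson bins; the n*ln(n/mu) term is set to 0 for empty bins.
log_term = np.zeros_like(mu)
mask = n > 0
log_term[mask] = n[mask] * np.log(n[mask] / mu[mask])
t = 2.0 * np.sum(mu - n + log_term)

k = len(n)                                            # number of bins
print(f"t = {t:.2f}, p-value = {chi2.sf(t, df=k - 1):.3f}")  # dof as stated below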

Binned likelihood (cont'd). This test statistic asymptotically obeys a χ2(r-1) distribution, with r the number of bins. My recommendation is to use it for both parameter fitting and GOF testing. The unbinned likelihood (and the likelihood function itself) is usually not a good GOF statistic.

Binned and unbinned data. Binning data always leads to a loss of information, so in general tests on unbinned data should be superior. The most commonly used distribution-free tests for unbinned data are based on the order statistics. Given N independent data points x_1, ..., x_N of the random variable X, consider the ordered sample x_(1) ≤ x_(2) ≤ ... ≤ x_(N). This is called the order statistics, with the associated empirical distribution function

F_N(x) = (number of sample points x_i ≤ x) / N.
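A minimal sketch of the empirical distribution function built from the order statistics:

import numpy as np

def edf(sample):
    # Return the empirical distribution function F_N built from the order statistics.
    x_sorted = np.sort(sample)            # the order statistics x_(1) <= ... <= x_(N)
    N = len(x_sorted)
    def F_N(x):
        # Fraction of sample points at or below x.
        return np.searchsorted(x_sorted, x, side="right") / N
    return F_N

rng = np.random.default_rng(7)
F_N = edf(rng.normal(size=100))
print(F_N(0.0), F_N([-1.0, 0.0, 1.0]))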

Example. The difference between the EDF and the expected distribution function (or between two EDFs), measured with different norms for different tests, is used as the test statistic.

Kolmogorov-Smirnov test. The test statistic is the maximum deviation of the EDF from F(x), the expected distribution function under H0,

D_N = max_x | F_N(x) - F(x) |.

For this test statistic a null distribution can be found.
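A sketch of the Kolmogorov-Smirnov test on unbinned pseudo-data, using scipy's implementation (which maps D to a p-value via the Kolmogorov null distribution); note that H0 must be fully specified, with no parameters fitted from the same data:

import numpy as np
from scipy.stats import kstest, norm, expon

rng = np.random.default_rng(11)
x = expon.rvs(scale=1.0, size=300, random_state=rng)   # unbinned pseudo-data (assumed)

# D = max |F_N(x) - F(x)| against a fully specified H0 (here a unit exponential).
D, p_value = kstest(x, expon(scale=1.0).cdf)
print(f"D = {D:.3f}, p-value = {p_value:.3f}")

# The same data tested against a (wrong) standard normal H0 gives a tiny p-value.
print(kstest(x, norm.cdf))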

Kolmogorov test (cont'd). Exercise: show that the corresponding statistic behaves asymptotically as χ2(2).

Kolmogorov test, WARNING! The Kolmogorov test is NOT good for binned data (the option unfortunately exists in some popular analysis tools, e.g. ROOT).

Summary: goodness of fit. Test of a null hypothesis with a test statistic t, where the alternative hypothesis is the set of all possible alternative hypotheses. If H0 were true and the experiment were repeated many times, one would obtain data as likely (or less likely) than the observed data with probability p (the p-value). We discussed mainly distribution-free tests.

Summary: g.o.f. tests. Chi-square tests for binned data, with and without fitting (Pearson's chi-square, Neyman's chi-square, likelihood chi-square), and the choice of bin size. In general, if possible, it is better to use unbinned data. For unbinned data, g.o.f. tests are usually based on the order statistics. The most common such test is the Kolmogorov-Smirnov test, where the test statistic is the maximal distance between the empirical and hypothesized (cumulative) distribution functions; different norms (distance-squared, etc.) give different tests.