Hypothesis testing: power, test statistic

Giles Miller
6 years ago

1 Hypothesis testing: power, test statistic

The more sensitive the test, the better it can discriminate between the null and the alternative hypothesis; quantitatively, for a given size α we want maximal power,

$1 - \beta = P(\mathbf{d} \in w \mid H_1)$.

In order to achieve this goal, especially in many dimensions, the observables d are often replaced by a one-dimensional function of the observables, called the test statistic t(d); the critical region is then defined in terms of t, e.g. $P(t(\mathbf{d}) \in w_t \mid H_0) = \alpha$.
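To make size and power concrete, here is a minimal sketch assuming a hypothetical one-dimensional test statistic t that is Gaussian with unit width, centred at 0 under H_0 and at a shift `mu1` under H_1, with critical region w_t = {t > t_cut}. The distributions and the helper names `gauss_sf` and `size_and_power` are illustrative assumptions, not part of the lecture.

```python
import math


def gauss_sf(x, mean=0.0, sigma=1.0):
    """Survival function P(T > x) for T ~ N(mean, sigma^2)."""
    return 0.5 * math.erfc((x - mean) / (sigma * math.sqrt(2.0)))


def size_and_power(t_cut, mu1):
    """Size alpha and power 1 - beta of the test 'reject H0 if t > t_cut'.

    Assumes (hypothetically) t ~ N(0, 1) under H0 and t ~ N(mu1, 1) under H1.
    """
    alpha = gauss_sf(t_cut, mean=0.0)  # P(t in w_t | H0)
    power = gauss_sf(t_cut, mean=mu1)  # P(t in w_t | H1) = 1 - beta
    return alpha, power


if __name__ == "__main__":
    # A cut at t > 1.645 gives alpha of about 0.05 for a standard normal t.
    for mu1 in (1.0, 2.0, 3.0):
        alpha, power = size_and_power(1.645, mu1)
        print(f"mu1={mu1}: alpha={alpha:.3f}, power={power:.3f}")
```

At fixed α the power grows as the two hypotheses move apart, which is exactly what a more sensitive test statistic buys you.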

2 From CMS (arxiv: )

3 Testing the hypothesis: no Higgs

Summary: Multivariate analysis. MVA is required for hypothesis testing (classification) and parameter estimation (regression) in high-dimensional parameter spaces with complicated likelihood functions (the curse of dimensionality). For hypothesis testing, an ansatz is made for the test statistic t, whose parameters are optimized using different criteria. This optimization can be difficult and is most often done using training data (machine learning).

Jan Conrad, FK8006, Multivariate methods and learning machines

4 Summary: Multivariate analysis

The examples we discussed were all classification problems.
Fisher discriminant: a linear combination of the variables, optimized for the separation of the class means in units of the within-sample variance.
PCA: diagonalize the covariance matrix to identify the most important, uncorrelated variables.
Artificial neural networks: a non-linear test statistic, optimized, e.g., in the same way as the Fisher discriminant.

Goodness of fit

A goodness-of-fit test is a test of a null hypothesis with a test statistic t, but in this case the alternative hypothesis is the set of all possible alternative hypotheses. Thus the statement being aimed at is: if H_0 were true and the experiment were repeated many times, one would obtain data as likely as (or less likely than) the observed data with probability p. p is then called the p-value. A small p-value is taken as evidence against the null hypothesis (a bad fit).
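The Fisher discriminant can be sketched in a few lines for two input variables. This is a minimal pure-Python sketch using the standard Fisher weights w ∝ (V_sig + V_bkg)^-1 (mean_sig - mean_bkg); the helper names and the Gaussian toy samples are hypothetical, chosen only for illustration.

```python
import random


def mean2(points):
    """Component-wise mean of a list of 2D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def cov2(points):
    """2x2 sample covariance matrix (divisor n - 1) of a list of 2D points."""
    m = mean2(points)
    n = len(points)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = (p[0] - m[0], p[1] - m[1])
        for i in range(2):
            for j in range(2):
                c[i][j] += d[i] * d[j] / (n - 1)
    return c


def fisher_weights(sig, bkg):
    """Fisher coefficients w = (V_sig + V_bkg)^-1 (mean_sig - mean_bkg)."""
    ms, mb = mean2(sig), mean2(bkg)
    vs, vb = cov2(sig), cov2(bkg)
    # Within-sample scatter matrix and its explicit 2x2 inverse.
    scatter = [[vs[i][j] + vb[i][j] for j in range(2)] for i in range(2)]
    det = scatter[0][0] * scatter[1][1] - scatter[0][1] * scatter[1][0]
    inv = [[scatter[1][1] / det, -scatter[0][1] / det],
           [-scatter[1][0] / det, scatter[0][0] / det]]
    dm = (ms[0] - mb[0], ms[1] - mb[1])
    return [inv[0][0] * dm[0] + inv[0][1] * dm[1],
            inv[1][0] * dm[0] + inv[1][1] * dm[1]]


def fisher_t(weights, x):
    """Linear test statistic t(x) = w . x."""
    return weights[0] * x[0] + weights[1] * x[1]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy samples: signal shifted relative to background.
    sig = [(rng.gauss(1.0, 1.0), rng.gauss(0.5, 1.0)) for _ in range(1000)]
    bkg = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(1000)]
    w = fisher_weights(sig, bkg)
    print("weights:", w)
    print("mean t (sig):", sum(fisher_t(w, p) for p in sig) / len(sig))
    print("mean t (bkg):", sum(fisher_t(w, p) for p in bkg) / len(bkg))
```

Because the scatter matrix is positive definite, the mean of t is always larger for the signal sample than for the background sample; a cut on t then defines the critical region.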

5 Example: Poisson counting rate

Distribution-free tests

In this case we knew the distribution of the test statistic, but that is not true in general; for many cases it can be quite complicated to calculate. One therefore considers distribution-free tests, i.e. tests whose test statistic has a distribution that is known independently of the particular null hypothesis. In that case it is sufficient to calculate the distribution of the test statistic once and then look up the value for your particular problem. The most commonly used null distribution for such tests is the χ² distribution (used for mapping t to the p-value).
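For the Poisson counting example, the distribution of the test statistic (the count itself) is known exactly, so the p-value can be computed directly. A minimal sketch, assuming a hypothetical background expectation `b` under H_0 and a one-sided test for an excess; the function name is illustrative.

```python
import math


def poisson_p_value(n_obs, b):
    """p = P(N >= n_obs | H0), with N ~ Poisson(b) under H0 (one-sided excess)."""
    if n_obs <= 0:
        return 1.0
    term = math.exp(-b)  # P(N = 0)
    cdf = term
    for k in range(1, n_obs):
        term *= b / k  # P(N = k) obtained from P(N = k - 1)
        cdf += term
    # 1 - cdf loses relative precision for very small p-values.
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Hypothetical numbers: 2.0 expected background events, 5 observed.
    print(poisson_p_value(5, 2.0))
```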

6 Pearson's chi-square test

In general, you will want to measure the distance between data and hypothesis (in data space). A very reasonable way to do this is to use the quantity

$T = (\mathbf{Y} - \mathbf{f})^T V^{-1} (\mathbf{Y} - \mathbf{f})$,

where Y denotes the data points, f their expected values under H_0 and V is the covariance matrix. This is called Pearson's chi-square, since for k data points it is distributed as χ²(k) if Y is Gaussian.

Chi-square test for histograms

In this case you use the asymptotic normality of the multinomial PDF to find the distribution of

$T = (\mathbf{n} - N\mathbf{p})^T V^{-1} (\mathbf{n} - N\mathbf{p})$,

where N is the total number of events in the histogram, n the vector of bin contents, p the vector of bin probabilities under H_0 and V the covariance matrix of n (in practice one bin is dropped, since the multinomial covariance matrix is singular). The most usual case looks a little simpler:

$T = \sum_{i=1}^{k} \frac{(n_i - N p_i)^2}{N p_i}$,

and this statistic is distributed as χ²(k-1). This requires the bin contents to be approximately Gaussian, with the empirical requirement of at least about five expected events per bin (Np_i > 5).
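The simple histogram form of Pearson's chi-square and its χ²(k-1) p-value can be sketched without external libraries. This is a minimal sketch under stated assumptions: the χ² survival function is computed from the series for the regularized lower incomplete gamma function (adequate for moderate arguments, not tuned for extreme tails), and the helper names and example histogram are hypothetical.

```python
import math


def chi2_sf(x, k):
    """P(chi2(k) > x), via the series for the regularized lower incomplete gamma.

    Intended for moderate x; very large x can overflow the running sum.
    """
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pearson_chi2(counts, probs):
    """T = sum_i (n_i - N p_i)^2 / (N p_i) for a histogram with fixed total N."""
    n_tot = sum(counts)
    return sum((n - n_tot * p) ** 2 / (n_tot * p) for n, p in zip(counts, probs))


if __name__ == "__main__":
    # Hypothetical histogram with 4 bins, H0: all bins equally likely.
    counts = [18, 25, 30, 27]
    probs = [0.25] * 4
    t = pearson_chi2(counts, probs)
    print("T =", t, "p =", chi2_sf(t, len(counts) - 1))  # k - 1 degrees of freedom
```

Here every bin expects 25 events, so the Np_i > 5 requirement is comfortably met.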

7 Chi-square test with estimation of parameters

If you use the data to estimate the parameters of the parent distribution, the Pearson test statistic T no longer behaves like χ²(k-1). In this case its distribution lies between χ²(k-1) and χ²(k-r-1), where r is the number of parameters estimated from the data. Usually χ²(k-r-1) holds (e.g. for maximum likelihood estimates).

Neyman's chi-square

Instead of the expected number of events in the denominator, you use the observed number of events:

$T_N = \sum_{i=1}^{k} \frac{(n_i - N p_i)^2}{n_i}$.

This is easy to compute and asymptotically equivalent to Pearson's chi-square.
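The difference between Pearson's and Neyman's chi-square is only the denominator, which a short sketch makes explicit. This assumes hypothetical helper names and an illustrative histogram; `ndof` simply encodes the usual k - r - 1 rule from the text.

```python
def pearson_chi2(counts, probs):
    """Pearson: squared deviations divided by the expected counts N p_i."""
    n_tot = sum(counts)
    return sum((n - n_tot * p) ** 2 / (n_tot * p) for n, p in zip(counts, probs))


def neyman_chi2(counts, probs):
    """Neyman: squared deviations divided by the observed counts n_i."""
    if any(n == 0 for n in counts):
        raise ValueError("Neyman's chi-square is undefined for empty bins")
    n_tot = sum(counts)
    return sum((n - n_tot * p) ** 2 / n for n, p in zip(counts, probs))


def ndof(k, r):
    """Usual degrees of freedom k - r - 1 when r parameters are fitted to k bins."""
    return k - r - 1


if __name__ == "__main__":
    # Hypothetical well-populated histogram close to H0 (equal bin probabilities).
    counts = [98, 105, 97, 100]
    probs = [0.25] * 4
    print("Pearson:", pearson_chi2(counts, probs))
    print("Neyman: ", neyman_chi2(counts, probs))
    print("ndof:   ", ndof(len(counts), 0))
```

With many events per bin and data close to H_0 the two statistics nearly agree; Neyman's form breaks down for sparsely populated bins, and in particular cannot be used with empty bins.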

8 Choosing the optimal bin size

Choosing the bin size is a very common problem in physics. Suppose your experiment measures the energy of a particle with very good precision, so choosing very wide bins in the energy observable would throw away information; on the other hand, you expect only a small number of particles, so if you choose too many bins you can no longer use the normal approximation. Rule of thumb: for a given hypothesis H_0 and a given number of bins k, choose the bins such that each has equal probability.

Some more details on choosing the optimal bin size:
You can use the data to estimate parameters of the hypothesis, for which you then design the binning.
Start with a large number of bins and then group adjacent bins together until the asymptotic distribution can be used.
Choose the bin size a little smaller than the experimental resolution.
Don't choose the binning that gives the lowest T: the resulting T would not be distributed as χ².
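The equal-probability rule of thumb amounts to placing bin edges at the quantiles of the hypothesized distribution. A minimal sketch, assuming a hypothetical exponential H_0 with known rate and choosing the number of bins so that about five events are expected per bin; the helper names are illustrative.

```python
import math


def max_bins(n_events, min_expected=5):
    """Largest k such that each equal-probability bin expects >= min_expected events."""
    return max(1, n_events // min_expected)


def equal_probability_edges(rate, k):
    """Edges giving each of k bins probability 1/k under an exponential H0.

    Edges are the quantiles F^-1(j/k) = -ln(1 - j/k) / rate of
    F(x) = 1 - exp(-rate * x); the last edge is +infinity.
    """
    edges = [-math.log(1.0 - j / k) / rate for j in range(k)]
    edges.append(math.inf)
    return edges


def exp_cdf(x, rate):
    """Exponential CDF, with F(+inf) = 1."""
    return 1.0 if math.isinf(x) else 1.0 - math.exp(-rate * x)


if __name__ == "__main__":
    # Hypothetical: 40 events expected under H0, exponential with rate 2.0.
    k = max_bins(40)
    print(k, equal_probability_edges(2.0, k))
```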

9 Likelihood chi-square

Instead of assuming Gaussianity, you can use the actual distribution of the number of events in a bin. This is known: Poisson if the total number of events is variable, multinomial if the total number is fixed. In this case you can use the binned likelihood as a test statistic.

Binned likelihood (cont'd)

Define the likelihood for a perfect fit (n_i = µ_i). Then, for Poisson bins, the likelihood ratio λ = L(n; µ)/L(n; n) gives

$-2\ln\lambda = 2\sum_{i} \left[ \mu_i - n_i + n_i \ln\frac{n_i}{\mu_i} \right]$,

and we set the last term to 0 if n_i = 0.
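The Poisson likelihood chi-square is straightforward to evaluate, including the convention for empty bins. A minimal sketch with a hypothetical helper name; `mu` are the expected bin contents under H_0.

```python
import math


def likelihood_chi2(counts, mu):
    """-2 ln(lambda) = 2 * sum_i [mu_i - n_i + n_i ln(n_i / mu_i)] for Poisson bins.

    The last term is set to 0 when n_i = 0.
    """
    total = 0.0
    for n, m in zip(counts, mu):
        total += m - n
        if n > 0:
            total += n * math.log(n / m)
    return 2.0 * total


if __name__ == "__main__":
    # Hypothetical histogram: one empty bin is handled without special effort.
    print(likelihood_chi2([0, 3, 8, 4], [1.2, 3.5, 6.0, 4.3]))
```

Unlike Neyman's chi-square, this statistic stays well defined for empty or sparsely populated bins, and for well-populated bins near H_0 it is close to Pearson's value.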

10 Binned likelihood (cont'd)

The test statistic is asymptotically distributed as χ²(r-1), where r is the number of bins. My recommendation is to use it for both parameter fitting and goodness-of-fit testing. The unbinned likelihood (i.e. the value of the likelihood function itself) is usually not a good goodness-of-fit statistic.

Binned and unbinned data

Binning data always leads to a loss of information, so in general tests on unbinned data should be superior. The most commonly used (distribution-free) tests for unbinned data are based on the order statistics. Given N independent data points x_1, ..., x_N of the random variable X, consider the ordered sample x_(1) ≤ x_(2) ≤ ... ≤ x_(N). This is called the order statistics, and it defines the empirical distribution function (EDF):

$S_N(x) = \begin{cases} 0, & x < x_{(1)} \\ i/N, & x_{(i)} \le x < x_{(i+1)} \\ 1, & x \ge x_{(N)} \end{cases}$
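The EDF is just the fraction of ordered sample points at or below x, which a sorted list and a binary search give directly. A minimal sketch; the helper name `edf` is hypothetical.

```python
import bisect


def edf(sample):
    """Return the empirical distribution function S_N(x) = #{x_i <= x} / N."""
    xs = sorted(sample)  # the order statistics x_(1) <= ... <= x_(N)
    n = len(xs)

    def s_n(x):
        # bisect_right counts how many ordered points are <= x.
        return bisect.bisect_right(xs, x) / n

    return s_n


if __name__ == "__main__":
    s = edf([0.7, 0.1, 0.4])
    print([s(x) for x in (0.0, 0.1, 0.5, 1.0)])
```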

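11 Example

The difference between two EDFs, measured with different norms (for different tests), is then used as a test statistic.

Kolmogorov-Smirnov test

The test statistic is the maximum deviation of the EDF S_N(x) from F(x), the expected distribution function under H_0:

$D_N = \max_x |S_N(x) - F(x)|$.

For this test statistic a null distribution can be found; asymptotically,

$\lim_{N\to\infty} P(\sqrt{N} D_N > z) = 2\sum_{r=1}^{\infty} (-1)^{r-1} e^{-2r^2z^2}$.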
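The Kolmogorov-Smirnov statistic and its asymptotic p-value can be sketched as follows, assuming a continuous hypothesized CDF F and using the asymptotic series above (no small-N corrections). The helper names and the uniform toy sample are hypothetical.

```python
import math
import random


def ks_statistic(sample, cdf):
    """D_N = max_x |S_N(x) - F(x)| for a continuous hypothesized CDF F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # S_N jumps from (i-1)/N to i/N at x_(i); check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def kolmogorov_sf(z, terms=100):
    """Asymptotic P(sqrt(N) D_N > z) = 2 sum_{r>=1} (-1)^(r-1) exp(-2 r^2 z^2)."""
    if z < 0.2:
        return 1.0  # the Kolmogorov CDF is below 1e-10 here
    s = sum((-1) ** (r - 1) * math.exp(-2.0 * r * r * z * z)
            for r in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


if __name__ == "__main__":
    # Hypothetical toy: 200 uniform points tested against F(x) = x on [0, 1].
    rng = random.Random(7)
    data = [rng.random() for _ in range(200)]
    d = ks_statistic(data, lambda x: x)
    print("D_N =", d, "p =", kolmogorov_sf(math.sqrt(len(data)) * d))
```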

12 Kolmogorov test (cont'd)

Exercise: show that … behaves as χ²(2).

Kolmogorov test: WARNING! The Kolmogorov test is NOT good for binned data (the option unfortunately exists in some popular analysis tools, e.g. ROOT).

13 Summary: goodness of fit

A goodness-of-fit test is a test of a null hypothesis with a test statistic t, but in this case the alternative hypothesis is the set of all possible alternative hypotheses. If H_0 were true and the experiment were repeated many times, one would obtain data as likely as (or less likely than) the observed data with probability p (the p-value). We discussed mainly distribution-free tests.

Summary: g.o.f. tests
Chi-square tests for binned data, with and without fitting (Pearson's chi-square, Neyman's chi-square, likelihood chi-square), and the choice of bin size.
In general, if possible, it is better to use unbinned data.
For unbinned data, g.o.f. tests are usually based on order statistics. The most common test using order statistics is the Kolmogorov-Smirnov test, where the test statistic is the maximal distance between the (cumulative) distribution functions; different norms (distance squared, etc.) correspond to different tests.

Recall the Basics of Hypothesis Testing

The level of significance α (the size of the test) is defined as the probability of X falling in the critical region w (rejecting H_0) when H_0 is true: $P(X \in w \mid H_0) = \alpha$.

[Table: test outcomes when H_0 is true vs. when H_1 is true]