# Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Size: px
Start display at page:

Download "Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1"

## Transcription

1 Lecture 2 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics, multivariate methods, goodness-of-fit tests 3 Parameter estimation (90 min.) general concepts, maximum likelihood, variance of estimators, least squares 4 Interval estimation (60 min.) setting limits 5 Further topics (60 min.) systematic errors, MCMC G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

2 The data stream Experiment records a mixture of events of different types, each with different numbers of particles, kinematic properties,... We are usually interested in events of a single type, in a search to see if they exist at all and/or to identify them for further study. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 2

3 Statistical tests (in a particle physics context) Suppose the result of a measurement for an individual event is a collection of numbers x 1 = number of muons, x 2 = mean p t of jets, x 3 = missing energy,... follows some n-dimensional joint pdf, which depends on the type of event produced, i.e., was it For each reaction we consider we will have a hypothesis for the pdf of, e.g., etc. Often call H 0 the signal hypothesis (the event type we want); H 1, H 2,... are background hypotheses. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 3

4 Selecting events Suppose we have a data sample with two kinds of events, corresponding to hypotheses H 0 and H 1 and we want to select those of type H 0. Each event is a point in space. What decision boundary should we use to accept/reject events as belonging to event type H 0? Perhaps select events with cuts : H 1 H 0 accept G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 4

5 Other ways to select events Or maybe use some other sort of decision boundary: linear or nonlinear H 1 H 1 H 0 accept H 0 accept How can we do this in an optimal way? G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 5

6 Test statistics Construct a test statistic of lower dimension (e.g. scalar) Goal is to compactify data without losing ability to discriminate between hypotheses. We can work out the pdfs Decision boundary is now a single cut on t. This effectively divides the sample space into two regions where we either: accept H 0 (acceptance region) or reject it (critical region). G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 6

7 Significance level and power of a test Probability to reject H 0 if it is true (error of the 1st kind): (significance level) Probability to accept H 0 if H 1 is true (error of the 2nd kind): ( power) G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 7

8 Efficiency of event selection Signal efficiency, i.e., probability to accept event which is signal, Background efficiency, i.e., probability to accept background event, Expected number of signal events: s = s s L Expected number of background events: b = b b L s, b = signal, background cross sections; L = integrated luminosity G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 8

9 Purity of event selection Suppose only one background type b; overall fractions of signal and background events are s and b (prior probabilities). Suppose we select events with t < t cut. What is the purity of our selected sample? Here purity means the probability to be signal given that the event was accepted. Using Bayes theorem we find: So the purity depends on the prior probabilities as well as on the signal and background efficiencies. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 9

10 Constructing a test statistic How can we select events in an optimal way? Neyman-Pearson lemma (proof in Brandt Ch. 8) states: To get the lowest b for a given s (highest power for a given significance level), choose acceptance region such that where c is a constant which determines s. Equivalently, optimal scalar test statistic is N.B. any monotonic function of this is just as good. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 10

11 Purity vs. efficiency optimal trade-off Consider selecting n events: expected numbers s from signal, b from background; n ~ Poisson (s + b) Suppose b is known and goal is to estimate s with minimum relative statistical error. Take as estimator: Variance of Poisson variable equals its mean, therefore So we should maximize equivalent to maximizing product of signal efficiency purity. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 11

12 Why Neyman-Pearson doesn t always help The problem is that we usually don t have explicit formulae for the pdfs Instead we may have Monte Carlo models for signal and background processes, so we can produce simulated data, and enter each event into an n-dimensional histogram. Use e.g. M bins for each of the n dimensions, total of M n cells. But n is potentially large, prohibitively large number of cells to populate with Monte Carlo data. Compromise: make Ansatz for form of test statistic with fewer parameters; determine them (e.g. using MC) to give best discrimination between signal and background. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 12

13 Linear test statistic Ansatz: Choose the parameters a 1,..., a n so that the pdfs have maximum separation. We want: large distance between mean values, small widths g (t) s s b b t Fisher: maximize G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 13

14 Determining coefficients for maximum separation We have where In terms of mean and variance of this becomes G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 14

15 Determining the coefficients (2) The numerator of J(a) is between classes and the denominator is within classes maximize G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 15

16 Fisher discriminant Setting gives Fisher s linear discriminant function: H 1 Corresponds to a linear decision boundary. H 0 accept G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 16

17 Fisher discriminant for Gaussian data Suppose is multivariate Gaussian with mean values and covariance matrices V 0 = V 1 = V for both. For this case we can show that the Fisher discriminant is equivalent to using the likelihood-ratio, and thus gives the maximum purity for a given efficiency. For non-gaussian data this no longer holds, but linear discriminant function may be simplest practical solution. Can try to transform data so as to better approximate Gaussian before constructing Fisher discrimimant. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 17

18 Nonlinear test statistics The optimal decision boundary may not be a hyperplane, nonlinear test statistic Multivariate statistical methods are a Big Industry: H 1 Neural Networks, Support Vector Machines, Kernel density methods,... H 0 accept Particle Physics can benefit from progress in Machine Learning. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 18

19 Introduction to neural networks Used in neurobiology, pattern recognition, financial forecasting,... Here, neural nets are just a type of test statistic. Suppose we take t(x) to have the form logistic sigmoid This is called the single-layer perceptron. s( ) is monotonic equivalent to linear t(x) G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 19

20 The activation function The activation function function s(t) is often taken to be a logistic sigmoid: G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 20

21 The multi-layer perceptron Generalize from one layer to the multilayer perceptron: The values of the nodes in the intermediate (hidden) layer are and the network output is given by weights (connection strengths) G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 21

22 Neural network discussion Easy to generalize to arbitrary number of layers. Feed-forward net: values of a node depend only on earlier layers, usually only on previous layer ( network architecture ). More nodes neural net gets closer to optimal t(x), but more parameters need to be determined. Parameters usually determined by minimizing an error function, where t (0), t (1) are target values, e.g., 0 and 1 for logistic sigmoid. Expectation values replaced by averages of training data (e.g. MC). In general training can be difficult; standard software available. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 22

23 Neural network example from LEP II Signal: e e W W (often 4 well separated hadron jets) Background: e e qqgg (4 less well separated hadron jets) input variables based on jet structure, event shape,... none by itself gives much separation. Neural network output does better... (Garrido, Juste and Martinez, ALEPH ) G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 23

24 Neural network discussion (2) Why not use all of the available input variables? Fewer inputs fewer parameters to be adjusted, parameters better determined for finite training data. Some inputs may be highly correlated drop all but one. Some inputs may contain little or no discriminating power between the hypotheses drop them. NN exploits higher moments (nonlinear features) of joint pdf f(x H), but these may not be well modeled in training data. Better to have simper t(x) where you can understand what it s doing. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 24

25 Neural network discussion (3) Recall that the purpose of the statistical test is usually to select objects for further study; e.g. select WW events, then measure their properties (e.g. particle multiplicity). Need to avoid input variables that are correlated with the properties of the selected objects that you want to study. (Not always easy; correlations may be poorly known.) Some NN references: L. Lönnblad et al., Comp. Phys. Comm., 70 (1992) 167; C. Peterson et al., Comp. Phys. Comm., 81 (1994) 185; C.M. Bishop, Neural Networks for Pattern Recognition, OUP (1995); John Hertz et al., Introduction to the Theory of Neural Computation, Addison-Wesley, New York (1991). G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 25

26 Testing goodness-of-fit Suppose hypothesis H predicts pdf observations for a set of We observe a single point in this space: What can we say about the validity of H in light of the data? Decide what part of the data space represents less compatibility with H than does the point (Not unique!) less compatible with H more compatible with H G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 26

27 p-values Express goodness-of-fit by giving the p-value for H: p = probability, under assumption of H, to observe data with equal or lesser compatibility with H relative to the data we got. This is not the probability that H is true! In frequentist statistics we don t talk about P(H) (unless H represents a repeatable observation). In Bayesian statistics we do; use Bayes theorem to obtain where (H) is the prior probability for H. For now stick with the frequentist approach; result is p-value, regrettably easy to misinterpret as P(H). G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 27

28 p-value example: testing whether a coin is fair Probability to observe n heads in N coin tosses is binomial: Hypothesis H: the coin is fair (p = 0.5). Suppose we toss the coin N = 20 times and get n = 17 heads. Region of data space with equal or lesser compatibility with H relative to n = 17 is: n = 17, 18, 19, 20, 0, 1, 2, 3. Adding up the probabilities for these values gives: i.e. p = is the probability of obtaining such a bizarre result (or more so) by chance, under the assumption of H. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 28

29 The significance of an observed signal Suppose we observe n events; these can consist of: n b events from known processes (background) n s events from a new process (signal) If n s, n b are Poisson r.v.s with means s, b, then n = n s + n b is also Poisson, mean = s + b: Suppose b = 0.5, and we observe n obs = 5. Should we claim evidence for a new discovery? Give p-value for hypothesis s = 0: G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 29

30 The significance of a peak Suppose we measure a value x for each event and find: Each bin (observed) is a Poisson r.v., means are given by dashed lines. In the two bins with the peak, 11 entries found with b = 3.2. The p-value for the s = 0 hypothesis is: G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 30

31 The significance of a peak (2) But... did we know where to look for the peak? give P(n 11) in any 2 adjacent bins Is the observed width consistent with the expected x resolution? take x window several times the expected resolution How many bins distributions have we looked at? look at a thousand of them, you ll find a 10-3 effect Did we adjust the cuts to enhance the peak? freeze cuts, repeat analysis with new data How about the bins to the sides of the peak... (too low!) Should we publish???? G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 31

32 Making a discovery Often compute p-value of the background only hypothesis H 0 using test variable related to a characteristic of the signal. p-value = Probability to see data as incompatible with H 0, or more so, relative to the data observed. Requires definition of incompatible with H 0 HEP folklore: claim discovery if p-value equivalent to a 5 fluctuation of Gaussian variable (one-sided) Actual p-value at which discovery becomes believable will depend on signal in question (subjective) Why not do Bayesian analysis? Usually don t know how to assign meaningful prior probabilities G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 32

33 Pearson s 2 statistic Test statistic for comparing observed data (n i independent) to predicted mean values For n i ~ Poisson( i ) we have V[n i ] = i, so this becomes (Pearson s 2 statistic) 2 = sum of squares of the deviations of the ith measurement from the ith prediction, using i as the yardstick for the comparison. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 33

34 Pearson s 2 test If n i are Gaussian with mean i and std. dev. i, i.e., n i ~ N( i, i2 ), then Pearson s 2 will follow the 2 pdf (here for 2 = z): If the n i are Poisson with i >> 1 (in practice OK for i > 5) then the Poisson dist. becomes Gaussian and therefore Pearson s 2 statistic here as well follows the 2 pdf. The 2 value obtained from the data then gives the p-value: G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 34

35 The 2 per degree of freedom Recall that for the chi-square pdf for N degrees of freedom, This makes sense: if the hypothesized i are right, the rms deviation of n i from i is i, so each term in the sum contributes ~ 1. One often sees 2 /N reported as a measure of goodness-of-fit. But... better to give 2 and N separately. Consider, e.g., i.e. for N large, even a 2 per dof only a bit greater than one can imply a small p-value, i.e., poor goodness-of-fit. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 35

36 Pearson s 2 with multinomial data If is fixed, then we might model n i ~ binomial with p i = n i / n tot. I.e. ~ multinomial. In this case we can take Pearson s 2 statistic to be If all p i n tot >> 1 then this will follow the chi-square pdf for N 1 degrees of freedom. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 36

37 Example of a 2 test This gives for N = 20 dof. Now need to find p-value, but... many bins have few (or no) entries, so here we do not expect 2 to follow the chi-square pdf. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 37

38 Using MC to find distribution of 2 statistic The Pearson 2 statistic still reflects the level of agreement between data and prediction, i.e., it is still a valid test statistic. To find its sampling distribution, simulate the data with a Monte Carlo program: Here data sample simulated 10 6 times. The fraction of times we find 2 > 29.8 gives the p-value: p = 0.11 If we had used the chi-square pdf we would find p = G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 38

39 Wrapping up lecture 2 Main ideas of statistical tests and related issues for HEP: Discriminate between event types (hypotheses), determine selection efficiency, sample purity, etc. Some methods for constructing a test statistic Linear: Fisher discriminant Nonlinear: Neural networks Goodness-of-fit tests p-value (not same as P(H 0 )!), 2 = (data prediction) 2 / variance. Often 2 ~ chi-square pdf use to get p-value. Next we turn to: parameter estimation G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 39

### Statistical Methods for Particle Physics Lecture 2: statistical tests, multivariate methods

Statistical Methods for Particle Physics Lecture 2: statistical tests, multivariate methods www.pp.rhul.ac.uk/~cowan/stat_aachen.html Graduierten-Kolleg RWTH Aachen 10-14 February 2014 Glen Cowan Physics

### Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

### Statistics for the LHC Lecture 1: Introduction

Statistics for the LHC Lecture 1: Introduction Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University

### Multivariate statistical methods and data mining in particle physics

Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general

### Statistical Methods for Particle Physics (I)

Statistical Methods for Particle Physics (I) https://agenda.infn.it/conferencedisplay.py?confid=14407 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

### Advanced statistical methods for data analysis Lecture 1

Advanced statistical methods for data analysis Lecture 1 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline

### Statistical Methods for Particle Physics Lecture 2: multivariate methods

Statistical Methods for Particle Physics Lecture 2: multivariate methods http://indico.ihep.ac.cn/event/4902/ istep 2015 Shandong University, Jinan August 11-19, 2015 Glen Cowan ( 谷林 科恩 ) Physics Department

### Discovery and Significance. M. Witherell 5/10/12

Discovery and Significance M. Witherell 5/10/12 Discovering particles Much of what we know about physics at the most fundamental scale comes from discovering particles. We discovered these particles by

### Statistics for the LHC Lecture 2: Discovery

Statistics for the LHC Lecture 2: Discovery Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University of

### Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

### Some Statistical Tools for Particle Physics

Some Statistical Tools for Particle Physics Particle Physics Colloquium MPI für Physik u. Astrophysik Munich, 10 May, 2016 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk

### Statistical Data Analysis Stat 2: Monte Carlo Method, Statistical Tests

Statistical Data Analysis Stat 2: Monte Carlo Method, Statistical Tests London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

### Statistical Methods in Particle Physics Day 4: Discovery and limits

Statistical Methods in Particle Physics Day 4: Discovery and limits 清华大学高能物理研究中心 2010 年 4 月 12 16 日 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

### Hypothesis testing:power, test statistic CMS:

Hypothesis testing:power, test statistic The more sensitive the test, the better it can discriminate between the null and the alternative hypothesis, quantitatively, maximal power In order to achieve this

### Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits

Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits www.pp.rhul.ac.uk/~cowan/stat_aachen.html Graduierten-Kolleg RWTH Aachen 10-14 February 2014 Glen Cowan Physics Department

### Introduction to Statistical Methods for High Energy Physics

Introduction to Statistical Methods for High Energy Physics 2011 CERN Summer Student Lectures Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

### Statistical Methods in Particle Physics Lecture 2: Limits and Discovery

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery SUSSP65 St Andrews 16 29 August 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

### YETI IPPP Durham

YETI 07 @ IPPP Durham Young Experimentalists and Theorists Institute Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan Course web page: www.pp.rhul.ac.uk/~cowan/stat_yeti.html

### Applied Statistics. Multivariate Analysis - part II. Troels C. Petersen (NBI) Statistics is merely a quantization of common sense 1

Applied Statistics Multivariate Analysis - part II Troels C. Petersen (NBI) Statistics is merely a quantization of common sense 1 Fisher Discriminant You want to separate two types/classes (A and B) of

RWTH Aachen Graduiertenkolleg 9-13 February, 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan Course web page: www.pp.rhul.ac.uk/~cowan/stat_aachen.html

### Advanced statistical methods for data analysis Lecture 2

Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline

### Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests http://benasque.org/2018tae/cgi-bin/talks/allprint.pl TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics

### Primer on statistics:

Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

### Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Statistics for Particle Physics Kyle Cranmer New York University 1 Hypothesis Testing 55 Hypothesis testing One of the most common uses of statistics in particle physics is Hypothesis Testing! assume one

### Statistical Methods for Particle Physics Lecture 3: Systematics, nuisance parameters

Statistical Methods for Particle Physics Lecture 3: Systematics, nuisance parameters http://benasque.org/2018tae/cgi-bin/talks/allprint.pl TAE 2018 Centro de ciencias Pedro Pascual Benasque, Spain 3-15

### Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

### Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009

Systematic uncertainties in statistical data analysis for particle physics DESY Seminar Hamburg, 31 March, 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

### Lectures on Statistical Data Analysis

Lectures on Statistical Data Analysis London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk

### Statistical Methods in Particle Physics Lecture 1: Bayesian methods

Statistical Methods in Particle Physics Lecture 1: Bayesian methods SUSSP65 St Andrews 16 29 August 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

### Hypothesis testing (cont d)

Hypothesis testing (cont d) Ulrich Heintz Brown University 4/12/2016 Ulrich Heintz - PHYS 1560 Lecture 11 1 Hypothesis testing Is our hypothesis about the fundamental physics correct? We will not be able

### Use of the likelihood principle in physics. Statistics II

Use of the likelihood principle in physics Statistics II 1 2 3 + Bayesians vs Frequentists 4 Why ML does work? hypothesis observation 5 6 7 8 9 10 11 ) 12 13 14 15 16 Fit of Histograms corresponds This

### Hypotheses Testing. Chapter Hypotheses and tests statistics

Chapter 8 Hypotheses Testing In the previous chapters we used experimental data to estimate parameters. Here we will use data to test hypotheses. A typical example is to test whether the data are compatible

### Some Topics in Statistical Data Analysis

Some Topics in Statistical Data Analysis Invisibles School IPPP Durham July 15, 2013 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan G. Cowan

### Statistical Methods for Particle Physics Lecture 3: systematic uncertainties / further topics

Statistical Methods for Particle Physics Lecture 3: systematic uncertainties / further topics istep 2014 IHEP, Beijing August 20-29, 2014 Glen Cowan ( Physics Department Royal Holloway, University of London

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

### The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know

The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is

### The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA

The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.

### Statistical Methods for Particle Physics Tutorial on multivariate methods

Statistical Methods for Particle Physics Tutorial on multivariate methods http://indico.ihep.ac.cn/event/4902/ istep 2015 Shandong University, Jinan August 11-19, 2015 Glen Cowan ( 谷林 科恩 ) Physics Department

### Fundamental Probability and Statistics

Fundamental Probability and Statistics "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are

### Machine Learning (CS 567) Lecture 5

Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

### Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

### Machine Learning Lecture 5

Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

### Ways to make neural networks generalize better

Ways to make neural networks generalize better Seminar in Deep Learning University of Tartu 04 / 10 / 2014 Pihel Saatmann Topics Overview of ways to improve generalization Limiting the size of the weights

### Physics 403. Segev BenZvi. Classical Hypothesis Testing: The Likelihood Ratio Test. Department of Physics and Astronomy University of Rochester

Physics 403 Classical Hypothesis Testing: The Likelihood Ratio Test Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Bayesian Hypothesis Testing Posterior Odds

### Statistical Data Analysis 2017/18

Statistical Data Analysis 2017/18 London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk

### Bayesian Models in Machine Learning

Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of

### Statistics. Lent Term 2015 Prof. Mark Thomson. 2: The Gaussian Limit

Statistics Lent Term 2015 Prof. Mark Thomson Lecture 2 : The Gaussian Limit Prof. M.A. Thomson Lent Term 2015 29 Lecture Lecture Lecture Lecture 1: Back to basics Introduction, Probability distribution

### Statistical Methods for Particle Physics

Statistical Methods for Particle Physics Invisibles School 8-13 July 2014 Château de Button Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

### Statistical Methods in Particle Physics

Statistical Methods in Particle Physics 8. Multivariate Analysis Prof. Dr. Klaus Reygers (lectures) Dr. Sebastian Neubert (tutorials) Heidelberg University WS 2017/18 Multi-Variate Classification Consider

### PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability

### The Bayes classifier

The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal

### Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

### FYST17 Lecture 8 Statistics and hypothesis testing. Thanks to T. Petersen, S. Maschiocci, G. Cowan, L. Lyons

FYST17 Lecture 8 Statistics and hypothesis testing Thanks to T. Petersen, S. Maschiocci, G. Cowan, L. Lyons 1 Plan for today: Introduction to concepts The Gaussian distribution Likelihood functions Hypothesis

### Topics in Statistical Data Analysis for HEP Lecture 1: Bayesian Methods CERN-JINR European School of High Energy Physics Bautzen, June 2009

Topics in Statistical Data Analysis for HEP Lecture 1: Bayesian Methods CERN-JINR European School of High Energy Physics Bautzen, 14 27 June 2009 Glen Cowan Physics Department Royal Holloway, University

### ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

### Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

### Physics 403. Segev BenZvi. Credible Intervals, Confidence Intervals, and Limits. Department of Physics and Astronomy University of Rochester

Physics 403 Credible Intervals, Confidence Intervals, and Limits Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Summarizing Parameters with a Range Bayesian

### Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis

Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.

### Introductory Statistics Course Part II

Introductory Statistics Course Part II https://indico.cern.ch/event/735431/ PHYSTAT ν CERN 22-25 January 2019 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

### Pattern Recognition and Machine Learning

Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

### Statistical Methods for LHC Physics

RHUL Physics www.pp.rhul.ac.uk/~cowan Particle Physics Seminar University of Glasgow 26 February, 2009 1 Outline Quick overview of physics at the Large Hadron Collider (LHC) New multivariate methods for

### Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

### Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Lecture 3 October 29, 2012 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline Reminder: Probability density function Cumulative

### Statistical Methods for Discovery and Limits in HEP Experiments Day 3: Exclusion Limits

Statistical Methods for Discovery and Limits in HEP Experiments Day 3: Exclusion Limits www.pp.rhul.ac.uk/~cowan/stat_freiburg.html Vorlesungen des GK Physik an Hadron-Beschleunigern, Freiburg, 27-29 June,

### INTRODUCTION TO PATTERN RECOGNITION

INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take

### Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

### Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

### Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian

### Linear Models for Classification

Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

### Multi-layer Neural Networks

Multi-layer Neural Networks Steve Renals Informatics 2B Learning and Data Lecture 13 8 March 2011 Informatics 2B: Learning and Data Lecture 13 Multi-layer Neural Networks 1 Overview Multi-layer neural

### E. Santovetti lesson 4 Maximum likelihood Interval estimation

E. Santovetti lesson 4 Maximum likelihood Interval estimation 1 Extended Maximum Likelihood Sometimes the number of total events measurements of the experiment n is not fixed, but, for example, is a Poisson

### STA 414/2104: Machine Learning

STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

### 9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

### Artificial Neural Networks

Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks

### Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

### Multivariate Data Analysis and Machine Learning in High Energy Physics (III)

Multivariate Data Analysis and Machine Learning in High Energy Physics (III) Helge Voss (MPI K, Heidelberg) Graduierten-Kolleg, Freiburg, 11.5-15.5, 2009 Outline Summary of last lecture 1-dimensional Likelihood:

### Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

### A Brief Review of Probability, Bayesian Statistics, and Information Theory

A Brief Review of Probability, Bayesian Statistics, and Information Theory Brendan Frey Electrical and Computer Engineering University of Toronto frey@psi.toronto.edu http://www.psi.toronto.edu A system

### STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

### Statistical techniques for data analysis in Cosmology

Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction

### ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing Robert Vanderbei Fall 2014 Slides last edited on November 24, 2014 http://www.princeton.edu/ rvdb Coin Tossing Example Consider two coins.

### Ling 289 Contingency Table Statistics

Ling 289 Contingency Table Statistics Roger Levy and Christopher Manning This is a summary of the material that we ve covered on contingency tables. Contingency tables: introduction Odds ratios Counting,

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

### Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

### Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture

### Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS

### Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

### Discovery Potential for the Standard Model Higgs at ATLAS

IL NUOVO CIMENTO Vol.?, N.?? Discovery Potential for the Standard Model Higgs at Glen Cowan (on behalf of the Collaboration) Physics Department, Royal Holloway, University of London, Egham, Surrey TW EX,

### Why Try Bayesian Methods? (Lecture 5)

Why Try Bayesian Methods? (Lecture 5) Tom Loredo Dept. of Astronomy, Cornell University http://www.astro.cornell.edu/staff/loredo/bayes/ p.1/28 Today s Lecture Problems you avoid Ambiguity in what is random

### Lecture 3: Pattern Classification

EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures

### Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Lecture 11 January 7, 2013 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline How to communicate the statistical uncertainty

### Negative binomial distribution and multiplicities in p p( p) collisions

Negative binomial distribution and multiplicities in p p( p) collisions Institute of Theoretical Physics University of Wroc law Zakopane June 12, 2011 Summary s are performed for the hypothesis that charged-particle

### 7 Gaussian Discriminant Analysis (including QDA and LDA)

36 Jonathan Richard Shewchuk 7 Gaussian Discriminant Analysis (including QDA and LDA) GAUSSIAN DISCRIMINANT ANALYSIS Fundamental assumption: each class comes from normal distribution (Gaussian). X N(µ,

### LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,

### LECTURE NOTES FYS 4550/FYS EXPERIMENTAL HIGH ENERGY PHYSICS AUTUMN 2013 PART I A. STRANDLIE GJØVIK UNIVERSITY COLLEGE AND UNIVERSITY OF OSLO

LECTURE NOTES FYS 4550/FYS9550 - EXPERIMENTAL HIGH ENERGY PHYSICS AUTUMN 2013 PART I PROBABILITY AND STATISTICS A. STRANDLIE GJØVIK UNIVERSITY COLLEGE AND UNIVERSITY OF OSLO Before embarking on the concept

### Inf2b Learning and Data

Inf2b Learning and Data Lecture 13: Review (Credit: Hiroshi Shimodaira Iain Murray and Steve Renals) Centre for Speech Technology Research (CSTR) School of Informatics University of Edinburgh http://www.inf.ed.ac.uk/teaching/courses/inf2b/