Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 1 Statistische Methoden der Datenanalyse Kapitel 1: Fundamentale Konzepte Professor Markus Schumacher Freiburg / Sommersemester 2009
Data analysis in particle physics Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 2 Observe events of a certain type Measure characteristics of each event (particle momenta, number of muons, energy of jets,...) Theories (e.g. SM) predict distributions of these properties up to free parameters, e.g., a, G F, M Z, a s, m H,... Some tasks of data analysis: Estimate (measure) the parameters; Quantify the uncertainty of the parameter estimates; Test the extent to which the predictions of a theory are in agreement with the data.
Dealing with uncertainty In particle physics there are various elements of uncertainty: theory is not deterministic quantum mechanics random measurement errors present even without quantum effects things we could know in principle but don t e.g. from limitations of cost, time,... We can quantify the uncertainty using PROBABILITY Lecture 1 page 3 Prof. M. Markus Schumacher Schumacher Stat Meth. der Prediscussion Datenanalyse for Lecture Kapi,1: on Fundamentale Statistical Data Konzepten Analysis Uni. Uni. Freiburg Siegen / SoSe09 / SS08 3
A definition of probability Consider a set S with subsets A, B,... Kolmogorov axioms (1933) From these axioms we can derive further properties, e.g. Lecture 1 page 4 Prof. M. Markus Schumacher Schumacher Stat Meth. der Prediscussion Datenanalyse for Lecture Kapi,1: on Fundamentale Statistical Data Konzepten Analysis Uni. Uni. Freiburg Siegen / SoSe09 / SS08 4
Conditional probability, independence Also define conditional probability of A given B (with P(B) 0): E.g. rolling dice: Subsets A, B independent if: If A, B independent, N.B. do not confuse with disjoint subsets, i.e., Lecture 1 page 5 Prof. M. Markus Schumacher Schumacher Stat Meth. der Prediscussion Datenanalyse for Lecture Kapi,1: on Fundamentale Statistical Data Konzepten Analysis Uni. Uni. Freiburg Siegen / SoSe09 / SS08 5
Interpretation of probability I. Relative frequency A, B,... are outcomes of a repeatable experiment cf. quantum mechanics, particle scattering, radioactive decay... II. Subjective probability A, B,... are hypotheses (statements that are true or false) Both interpretations consistent with Kolmogorov axioms. In particle physics frequency interpretation often most useful, but subjective probability can provide more natural treatment of non-repeatable phenomena: systematic uncertainties, probability that Higgs boson exists,... Lecture 1 page 6 Prof. M. Markus Schumacher Schumacher Stat Meth. der Prediscussion Datenanalyse for Lecture Kapi,1: on Fundamentale Statistical Data Konzepten Analysis Uni. Uni. Freiburg Siegen / SoSe09 / SS08 6
Bayes theorem From the definition of conditional probability we have, and but, so Bayes theorem First published (posthumously) by the Reverend Thomas Bayes (1702 1761) An essay towards solving a problem in the doctrine of chances, Philos. Trans. R. Soc. 53 (1763) 370; reprinted in Biometrika, 45 (1958) 293. Lecture 1 page 7 Prof. M. Markus Schumacher Schumacher Stat Meth. der Prediscussion Datenanalyse for Lecture Kapi,1: on Fundamentale Statistical Data Konzepten Analysis Uni. Uni. Freiburg Siegen / SoSe09 / SS08 7
The law of total probability B Consider a subset B of the sample space S, S divided into disjoint subsets A i such that [ i A i = S, A i B A i law of total probability Bayes theorem becomes Lecture 1 page 8 Prof. M. Markus Schumacher Schumacher Stat Meth. der Prediscussion Datenanalyse for Lecture Kapi,1: on Fundamentale Statistical Data Konzepten Analysis Uni. Uni. Freiburg Siegen / SoSe09 / SS08 8
An example using Bayes theorem Suppose the probability (for anyone) to have AIDS is: prior probabilities, i.e., before any test carried out Consider an AIDS test: result is + or - probabilities to (in)correctly identify an infected person probabilities to (in)correctly identify an uninfected person Suppose your result is +. How worried should you be? Lecture 1 page 9 Prof. M. Markus Schumacher Schumacher Stat Meth. der Prediscussion Datenanalyse for Lecture Kapi,1: on Fundamentale Statistical Data Konzepten Analysis Uni. Uni. Freiburg Siegen / SoSe09 / SS08 9
Bayes theorem example (cont.) The probability to have AIDS given a + result is posterior probability i.e. you re probably OK! Your viewpoint: my degree of belief that I have AIDS is 3.2% Your doctor s viewpoint: 3.2% of people like this will have AIDS Lecture 1 page 10 Prof. M. Markus Schumacher Schumacher Stat Meth. der Prediscussion Datenanalyse for Lecture Kapi,1: on Fundamentale Statistical Data Konzepten Analysis Uni. Uni. Freiburg Siegen / SoSe09 / SS08 10
Frequentist Statistics general philosophy Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 11 In frequentist statistics, probabilities are associated only with the data, i.e., outcomes of repeatable observations (shorthand: ). Probability = limiting frequency Probabilities such as P (Higgs boson exists), P (0.117 < a s < 0.121), etc. are either 0 or 1, but we don t know which. The tools of frequentist statistics tell us what to expect, under the assumption of certain probabilities, about hypothetical repeated observations. The preferred theories (models, hypotheses,...) are those for which our observations would be considered usual.
Bayesian Statistics general philosophy Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Lecture Uni. 1 Freiburg page / SoSe09 12 12 In Bayesian statistics, use subjective probability for hypotheses: probability of the data assuming hypothesis H (the likelihood) prior probability, i.e., before seeing the data posterior probability, i.e., after seeing the data normalization involves sum over all possible hypotheses Bayes theorem has an if-then character: If your prior probabilities were p (H), then it says how these probabilities should change in the light of the data. No general prescription for priors (subjective!)
Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Lecture Uni. 1 Freiburg page / SoSe09 13 13 Random variables and probability density functions A random variable is a numerical characteristic assigned to an element of the sample space; can be discrete or continuous. Suppose outcome of experiment is continuous value x f(x) = probability density function (pdf) x must be somewhere Or for discrete outcome x i with e.g. i = 1, 2,... we have probability mass function x must take on one of its possible values
Cumulative distribution function Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Lecture Uni. 1 Freiburg page / SoSe09 14 14 Probability to have outcome less than or equal to x is cumulative distribution function Alternatively define pdf with
Mean, median and mode Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 15 e.g.: Maxwells s velocity distribution
Obtaining PDF from Histograms Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 16 pdf = histogram with infinite data sample, zero bin width, normalized to unit area
Multivariate distributions Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 17 Outcome of experiment characterized by several values, e.g. an n-component vector, (x 1,... x n ) joint pdf Normalization:
Marginal pdf Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 18 Sometimes we want only pdf of some (or one) of the components: i marginal pdf x 1, x 2 independent if
Marginal pdf (2) Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Lecture Uni. 1 Freiburg page / SoSe09 19 19 Marginal pdf ~ projection of joint pdf onto individual axes.
Conditional pdf Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 20 Sometimes we want to consider some components of joint pdf as constant. Recall conditional probability: conditional pdfs: Bayes theorem becomes: Recall A, B independent if x, y independent if
Conditional pdfs (2) Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 21 E.g. joint pdf f(x,y) used to find conditional pdfs h(y x 1 ), h(y x 2 ): Basically treat some of the r.v.s as constant, then divide the joint pdf by the marginal pdf of those variables being held constant so that what is left has correct normalization, e.g.,
WDF für zwei Variablen mit Abhängigkeit Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 22
2.4 Functions of a random variable Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 23 A function of a random variable is itself a random variable. Suppose x follows a pdf f(x), consider a function a(x). What is the pdf g(a)? ds = region of x space for which a is in [a, a+da]. For one-variable case with unique inverse this is simply
Mapping the x and a spaces Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 24 probability in a-space g(a)da equals one in x-space f(x)ds figure from Lutz Feld
Mapping the x and a spaces Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 25
Mapping the x and a spaces: two branches Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 26
Functions without unique inverse Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 27 If inverse of a(x) not unique, include all dx intervals in ds which correspond to da: Example:
Functions of more than one r.v. Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 28 Consider r.v.s and a function ds = region of x-space between (hyper)surfaces defined by
Functions of more than one r.v. (2) Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 29 Example: r.v.s x, y > 0 follow joint pdf f(x,y), consider the function z = xy. What is g(z)? (Mellin convolution)
More on transformation of variables Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 30 Consider a random vector with joint pdf Form n linearly independent functions for which the inverse functions exist. Then the joint pdf of the vector of functions is where J is the Jacobian determinant: For e.g. integrate over the unwanted components.
Expectation values G. Cowan Prof. M. Schumacher Stat Meth. Lectures der Datenanalyse on Kapi,1: Statistical Fundamentale Konzepten Lecture Uni. 1 Freiburg page / SoSe09 31 31 Consider continuous r.v. x with pdf f (x). Define expectation (mean) value as Notation (often): ~ centre of gravity of pdf. For a function y(x) with pdf g(y), (equivalent) Variance: Notation: Standard deviation: s ~ width of pdf, same units as x.
Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 32 Covariance and correlation Define covariance cov[x,y] (also use matrix notation V xy ) as Correlation coefficient (dimensionless) defined as If x, y, independent, i.e.,, then x and y, uncorrelated N.B. converse not always true.
Correlations Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 33
Korrelationen Prof. M. Schumacher Stat Meth. der Datenanalyse Kapi,1: Fundamentale Konzepten Uni. Freiburg / SoSe09 34