Statistics, Data Analysis, and Simulation, SS 2015
08.128.730 Statistik, Datenanalyse und Simulation
Dr. Michael O. Distler <distler@uni-mainz.de>
Mainz, 27 April 2015
What we've learned so far

Fundamental concepts:
- random variable, probability
- frequentist vs. Bayesian interpretation
- probability mass function, probability density function, cumulative distribution function
- expectation values and moments
Definitions

Probability mass function (pmf) and probability density function (pdf) of a measured value (= random variable):

[Figure: an example pmf f(n), discrete, and an example pdf f(x), continuous, both plotted for 0 to 30]

Normalization:
  f(n) ≥ 0,   Σ_n f(n) = 1
  f(x) ≥ 0,   ∫ f(x) dx = 1

Probability:
  P(n_1 ≤ n ≤ n_2) = Σ_{n=n_1}^{n_2} f(n)
  P(x_1 ≤ x ≤ x_2) = ∫_{x_1}^{x_2} f(x) dx
Expectation values and moments

Mean: if a random variable x takes on the values x_1, x_2, ..., x_n with probabilities p(x_i), then the expected value of x (the "mean") is

  <x> = E[x] = Σ_{i=1}^{n} x_i p(x_i)

The expected value of an arbitrary function h(x) of a continuous random variable is

  E[h(x)] = ∫ h(x) f(x) dx

The mean is the expected value of x itself:

  E[x] = <x> = ∫ x f(x) dx
Expectation values and moments

Moments are the expected values of x^n and of (x - <x>)^n. They are called the nth algebraic moment µ_n' and the nth central moment µ_n, respectively.

Skewness v is a measure of the asymmetry of the probability distribution of a random variable x:

  v = µ_3 / σ^3 = E[(x - E[x])^3] / σ^3

Kurtosis is a measure of the peakedness of the probability distribution of a random variable x:

  β_2 = µ_4 / σ^4 = E[(x - E[x])^4] / σ^4
  γ_2 = β_2 - 3   (excess kurtosis)
Binomial distribution

The binomial distribution is the discrete probability distribution of the number of successes r in a sequence of n independent yes/no experiments, each of which yields success with probability p (Bernoulli experiment).

  P(r) = (n choose r) p^r (1-p)^{n-r}

P(r) is normalized. Proof: binomial theorem with q = 1 - p.

The mean of r is

  <r> = E[r] = Σ_{r=0}^{n} r P(r) = np

The variance σ^2 is

  V[r] = E[(r - <r>)^2] = Σ_{r=0}^{n} (r - <r>)^2 P(r) = np(1-p)
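As a quick numerical cross-check (not part of the original slides), the pmf and its first two moments can be sketched with the standard library alone; the values n = 20 and p = 0.3 are illustrative choices:

```python
import math

def binom_pmf(r, n, p):
    # P(r) = C(n, r) * p^r * (1-p)^(n-r)
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 20, 0.3
pmf = [binom_pmf(r, n, p) for r in range(n + 1)]

total = sum(pmf)                                          # normalization: 1
mean = sum(r * P for r, P in enumerate(pmf))              # n*p = 6.0
var = sum((r - mean)**2 * P for r, P in enumerate(pmf))   # n*p*(1-p) = 4.2
```

Summing r P(r) and (r - <r>)^2 P(r) over the full range reproduces np and np(1-p) to floating-point precision.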
Poisson distribution

The Poisson distribution is given by

  P(r) = µ^r e^{-µ} / r!

The mean is <r> = µ.
The variance is V[r] = σ^2 = µ.
The skewness is v = µ_3 / σ^3 = 1/√µ.
The excess kurtosis is γ_2 = 1/µ.

[Figure: Poisson distributions for µ = 0.5, 1, 2, 4]
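The stated moments can be verified numerically by summing the pmf over a long enough range; µ = 4 is an illustrative choice, and the truncation at r = 100 leaves a negligible tail:

```python
import math

def poisson_pmf(r, mu):
    # P(r) = mu^r * e^(-mu) / r!
    return mu**r * math.exp(-mu) / math.factorial(r)

mu = 4.0
pmf = [poisson_pmf(r, mu) for r in range(100)]  # tail beyond r = 100 is negligible

mean = sum(r * P for r, P in enumerate(pmf))                       # -> mu
var = sum((r - mean)**2 * P for r, P in enumerate(pmf))            # -> mu
skew = sum((r - mean)**3 * P for r, P in enumerate(pmf)) / var**1.5  # -> 1/sqrt(mu)
```

For µ = 4 the skewness comes out as 1/√4 = 0.5, matching the formula on the slide.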
Uniform distribution

This probability distribution is constant between the limits x = a and x = b:

  f(x) = 1/(b-a)   for a ≤ x < b
  f(x) = 0         otherwise

Mean and variance:

  <x> = E[x] = (a+b)/2
  V[x] = σ^2 = (b-a)^2 / 12
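A small sketch (illustrative limits a = 2, b = 5) checks both formulas by midpoint-rule integration of x f(x) and (x - <x>)^2 f(x):

```python
a, b = 2.0, 5.0
n_steps = 100_000
dx = (b - a) / n_steps
f = 1.0 / (b - a)   # constant density between a and b

# midpoint-rule integration over [a, b]
xs = [a + (i + 0.5) * dx for i in range(n_steps)]
mean = sum(x * f * dx for x in xs)             # (a+b)/2 = 3.5
var = sum((x - mean)**2 * f * dx for x in xs)  # (b-a)^2/12 = 0.75
```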
Gaussian distribution

The most important probability distribution, also called the normal distribution:

  f(x) = (1 / (√(2π) σ)) e^{-(x-µ)^2 / (2σ^2)}

The Gaussian distribution has two parameters, the mean µ and the variance σ^2. The distribution with mean µ = 0 and variance σ^2 = 1 is called the standard normal distribution, N(0, 1) for short. The Gaussian distribution can be derived from the binomial distribution for large values of n and r, and similarly from the Poisson distribution for large values of µ.
Gaussian distribution

  ∫_{-1}^{+1} N(0,1) dx = 0.6827 = 1 - 0.3173
  ∫_{-2}^{+2} N(0,1) dx = 0.9545 = 1 - 0.0455
  ∫_{-3}^{+3} N(0,1) dx = 0.9973 = 1 - 0.0027

FWHM, useful to estimate the standard deviation:

  FWHM = 2σ √(2 ln 2) = 2.355 σ
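The coverage values above can be reproduced with the error function from the standard library (a sketch, not part of the slides):

```python
import math

def std_normal_cdf(x):
    # Phi(x) expressed via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# probability content of +-k sigma for k = 1, 2, 3
cov = {k: std_normal_cdf(k) - std_normal_cdf(-k) for k in (1, 2, 3)}
# cov[1] ~ 0.6827, cov[2] ~ 0.9545, cov[3] ~ 0.9973

fwhm_factor = 2.0 * math.sqrt(2.0 * math.log(2.0))  # ~ 2.355
```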
Chi-square distribution

If x_1, x_2, ..., x_n are independent random variables, each distributed according to the standard Gaussian distribution with mean 0 and variance 1, then the sum

  u = χ^2 = Σ_{i=1}^{n} x_i^2

is distributed according to a χ^2 distribution f_n(u) = f_n(χ^2), where n is called the number of degrees of freedom:

  f_n(u) = (1/2) (u/2)^{n/2 - 1} e^{-u/2} / Γ(n/2)

The χ^2 distribution has a maximum at n - 2 (for n ≥ 2). The mean is found to be n and the variance is 2n.
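The density, its mode at n - 2, and the moments n and 2n can be checked numerically; n = 5 and the integration range [0, 100] are illustrative choices:

```python
import math

def chi2_pdf(u, n):
    # f_n(u) = (1/2) * (u/2)^(n/2 - 1) * exp(-u/2) / Gamma(n/2)
    return 0.5 * (u / 2.0)**(n / 2.0 - 1.0) * math.exp(-u / 2.0) / math.gamma(n / 2.0)

n = 5
du = 0.001
us = [(i + 0.5) * du for i in range(int(100 / du))]  # tail beyond u = 100 is negligible

mean = sum(u * chi2_pdf(u, n) * du for u in us)             # -> n
var = sum((u - mean)**2 * chi2_pdf(u, n) * du for u in us)  # -> 2n
```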
Chi-square distribution

[Figure: χ^2 probability density functions for n = 2, 3, ..., 9 degrees of freedom]
Chi-square cumulative distribution function

The probability for χ^2_n to take on a value in the interval [0, x].

[Figure: χ^2 cumulative distribution functions for n = 2, 3, ..., 9 degrees of freedom]
χ^2 vs. χ^2_red
Chi-square distribution with 5 d.o.f.

[Figure: χ^2 pdf for 5 degrees of freedom; the central 95% confidence interval is [0.831, 12.83]]
Gamma distribution

The goal is to calculate the probability density function f(t) for the time difference t between two events, when events occur at a mean rate λ. Example: radioactive decay with a mean decay rate λ.

The probability density of the gamma distribution is given by

  f(x; k) = x^{k-1} e^{-x} / Γ(k)   with   Γ(z) = ∫_0^∞ t^{z-1} e^{-t} dt,   Γ(z+1) = z!

This is the distribution of the waiting time t = x up to the kth event of a Poisson-distributed process with mean µ = 1. The generalization to other values of µ is

  f(x; k, µ) = x^{k-1} µ^k e^{-µx} / Γ(k)
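A small numerical sketch of the generalized density (illustrative parameters k = 3, µ = 2): it should be normalized, and the mean waiting time for the kth event at rate µ is k/µ.

```python
import math

def gamma_pdf(x, k, mu=1.0):
    # f(x; k, mu) = x^(k-1) * mu^k * exp(-mu*x) / Gamma(k)
    return x**(k - 1) * mu**k * math.exp(-mu * x) / math.gamma(k)

k, mu = 3, 2.0
dx = 0.001
xs = [(i + 0.5) * dx for i in range(int(40 / dx))]  # tail beyond x = 40 is negligible

norm = sum(gamma_pdf(x, k, mu) * dx for x in xs)      # -> 1
mean = sum(x * gamma_pdf(x, k, mu) * dx for x in xs)  # -> k/mu = 1.5
```

For k = 1 the density reduces to the exponential µ e^{-µx}, the waiting-time distribution between successive events.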
Gamma distribution

[Figure: the case k = 1, µ = 1, i.e. the exponential density 1.0 * exp(-1.0 * x), plotted for 0 ≤ x ≤ 5]
Characteristic function

If x is a real random variable with distribution function F(x) and probability density function f(x), the expected value of exp(itx) is referred to as its characteristic function:

  φ(t) = E[exp(itx)]

In the case of a continuous variable this is a Fourier integral, with its well-known transformation properties:

  φ(t) = ∫ exp(itx) f(x) dx
  f(x) = (1/2π) ∫ exp(-itx) φ(t) dt

In particular, for the algebraic moments one gets:

  µ_n' = E[x^n] = ∫ x^n f(x) dx
  φ^{(n)}(t) = d^n φ(t) / dt^n = i^n ∫ x^n exp(itx) f(x) dx
  φ^{(n)}(0) = i^n µ_n'
1.5 Theorems

The law of large numbers

The law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

We perform n independent experiments (Bernoulli trials) in which the result j occurs n_j times. For the frequency h_j = n_j/n:

  p_j = E[h_j] = E[n_j/n]

The variance of a binomial distribution gives

  V[h_j] = σ^2(h_j) = σ^2(n_j/n) = (1/n^2) σ^2(n_j) = (1/n^2) n p_j (1 - p_j)

Since the product p_j(1 - p_j) is at most 1/4, we can deduce the law of large numbers:

  σ^2(h_j) < 1/n
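The bound σ^2(h_j) < 1/n can be illustrated by simulation (a sketch with an arbitrary seed and an illustrative probability p = 0.3, not part of the slides):

```python
import random

random.seed(1)
p = 0.3          # illustrative success probability for outcome j
results = {}
for n in (100, 10_000):
    reps = 200
    # observed frequency h_j = n_j / n, repeated many times
    hs = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]
    m = sum(hs) / reps
    # sample variance of h_j; scatters around p*(1-p)/n, safely below 1/n
    results[n] = sum((h - m)**2 for h in hs) / (reps - 1)
```

The observed variances shrink like 1/n, as the theorem predicts.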
The central limit theorem

The central limit theorem (CLT) states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.

Let x_i be a sequence of n independent and identically distributed random variables, each having finite expectation µ and variance σ^2 > 0. In the limit n → ∞ the random variable

  w = Σ_{i=1}^{n} x_i

will be normally distributed with mean <w> = n<x> and variance V[w] = nσ^2.
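A Monte Carlo sketch of the theorem (arbitrary seed; N = 10 and 50 000 trials are illustrative): standardize the sum of N uniform variables and check the Gaussian ±1σ content of 0.6827.

```python
import math
import random

random.seed(0)
N, trials = 10, 50_000
# the sum of N Uniform(0,1) variables has mean N/2 and variance N/12
w = [(sum(random.random() for _ in range(N)) - N / 2) / math.sqrt(N / 12)
     for _ in range(trials)]

# fraction within +-1 sigma approaches the Gaussian value 0.6827
frac = sum(abs(v) < 1.0 for v in w) / trials
```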
Illustration: The central limit theorem

[Figure: four panels, N = 1, 2, 3, 10, showing the distribution of the sum of N uniformly distributed random variables compared with the standard normal distribution ("Gauss")]
Sampling

E.g. selecting a random (or representative) subset of a population. Sample of 100 measurements:

  l_i/cm   n_i   n_i l_i/cm   n_i l_i^2/cm^2
  18.9      1      18.9         357.21
  19.1      1      19.1         364.81
  19.2      2      38.4         737.28
  19.3      1      19.3         372.49
  19.4      4      77.6        1505.44
  19.5      3      58.5        1140.75
  19.6      9     176.4        3457.44
  19.7      8     157.6        3104.72
  19.8     11     217.8        4312.44
  19.9      9     179.1        3564.09
  20.0      5     100.0        2000.00
  20.1      7     140.7        2828.07
  20.2      8     161.6        3264.32
  20.3      9     182.7        3708.81
  20.4      6     122.4        2496.96
  20.5      3      61.5        1260.75
  20.6      2      41.2         848.72
  20.7      2      41.4         856.98
  20.8      2      41.6         865.28
  20.9      2      41.8         873.62
  21.0      4      84.0        1764.00
  21.2      1      21.2         449.44
  Σ       100    2002.8       40133.62

Mean and variance:

  N = Σ n_i = 100
  l̄ = (1/N) Σ n_i l_i = 20.028 cm
  s^2 = (1/(N-1)) ( Σ n_i l_i^2 - (1/N) (Σ n_i l_i)^2 ) = 0.2176 cm^2

With their uncertainties:

  l̄ ± s/√N = (20.028 ± 0.047) cm
  s ± s/√(2(N-1)) = (0.466 ± 0.033) cm
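The numbers of the sample can be reproduced directly from the (l_i, n_i) pairs of the table; this sketch is not part of the slides:

```python
# (l_i / cm, n_i) pairs copied from the table of 100 measurements
data = [(18.9, 1), (19.1, 1), (19.2, 2), (19.3, 1), (19.4, 4), (19.5, 3),
        (19.6, 9), (19.7, 8), (19.8, 11), (19.9, 9), (20.0, 5), (20.1, 7),
        (20.2, 8), (20.3, 9), (20.4, 6), (20.5, 3), (20.6, 2), (20.7, 2),
        (20.8, 2), (20.9, 2), (21.0, 4), (21.2, 1)]

N = sum(n for _, n in data)            # 100
S = sum(n * l for l, n in data)        # sum n_i l_i   = 2002.8
SS = sum(n * l * l for l, n in data)   # sum n_i l_i^2 = 40133.62

mean = S / N                           # 20.028 cm
s2 = (SS - S * S / N) / (N - 1)        # 0.2176 cm^2
s = s2 ** 0.5                          # 0.466 cm
err_mean = s / N ** 0.5                # 0.047 cm
err_s = s / (2 * (N - 1)) ** 0.5       # 0.033 cm
```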
Sampling

[Figure: histogram of the sample ("length.dat"), frequency vs. length/cm, compared with Gaussians of (µ = 20.028 cm, σ = 0.466 cm) and (µ = 20.0 cm, σ = 0.5 cm)]
Sampling

How likely is it that the sample was taken from a normally distributed population with the parameters µ = 20.028 cm and σ = 0.466 cm?

Exercises
Numerical calculation of sample mean and variance

Well-known formulas:

  x̄ = (1/n) Σ_{i=1}^{n} x_i
  s^2 = (1/(n-1)) Σ_{i=1}^{n} (x_i - x̄)^2

This calculation requires looping twice over the whole data sample. For large data samples it can be done in one loop:

  s^2 = (1/(n-1)) Σ_{i=1}^{n} (x_i - x̄)^2 = (1/(n-1)) ( Σ_{i=1}^{n} x_i^2 - (1/n) (Σ_{i=1}^{n} x_i)^2 )

Two sums have to be calculated:

  S_x = Σ_{i=1}^{n} x_i,   S_xx = Σ_{i=1}^{n} x_i^2

Mean and variance are then given by:

  x̄ = (1/n) S_x,   s^2 = (1/(n-1)) ( S_xx - (1/n) S_x^2 )
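The one-loop accumulation of S_x and S_xx can be sketched as follows (not part of the slides):

```python
def mean_var_one_pass(xs):
    # accumulate S_x and S_xx in a single loop over the data
    n = 0
    s_x = 0.0
    s_xx = 0.0
    for x in xs:
        n += 1
        s_x += x
        s_xx += x * x
    mean = s_x / n
    var = (s_xx - s_x * s_x / n) / (n - 1)
    return mean, var
```

For well-conditioned data this matches the two-pass result exactly.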
Numerical calculation of sample mean and variance

The one-loop method may require subtracting large, nearly equal numbers. Because the number representation in a computer has finite resolution, this can lead to numerical problems. In this case it is better to use a rough estimate x_e of the mean (e.g. the first measured value):

  T_x = Σ_{i=1}^{n} (x_i - x_e),   T_xx = Σ_{i=1}^{n} (x_i - x_e)^2

This leads to

  x̄ = x_e + (1/n) T_x,   s^2 = (1/(n-1)) ( T_xx - (1/n) T_x^2 )
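The shifted variant can be sketched like this; the data with a large common offset of 1e9 is an illustrative example where the shift keeps the accumulated sums small and precise:

```python
def mean_var_shifted(xs):
    # one-pass mean/variance with offset x_e (here: the first value)
    it = iter(xs)
    x_e = next(it)
    n, t_x, t_xx = 1, 0.0, 0.0
    for x in it:
        d = x - x_e
        n += 1
        t_x += d
        t_xx += d * d
    mean = x_e + t_x / n
    var = (t_xx - t_x * t_x / n) / (n - 1)
    return mean, var

# large common offset: the naive S_xx - S_x^2/n would suffer cancellation here
xs = [1.0e9 + v for v in (0.1, 0.2, 0.3, 0.4)]
mean, var = mean_var_shifted(xs)
```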
1.6 Multidimensional distributions

Random variables in two dimensions

The multidimensional probability distribution f(x, y) of the two random variables x̃ and ỹ is defined by the probability to find the pair of variables (x̃, ỹ) in the intervals a ≤ x̃ < b and c ≤ ỹ < d:

  P(a ≤ x̃ < b, c ≤ ỹ < d) = ∫_c^d ∫_a^b f(x, y) dx dy

Normalization:

  ∫∫ f(x, y) dx dy = 1

If f(x, y) = h(x) g(y), then the two variables are independent.
Random variables in two dimensions

The definitions of mean and variance carry over directly:

  <x> = E[x] = ∫∫ x f(x, y) dx dy
  <y> = E[y] = ∫∫ y f(x, y) dx dy
  V[x] = ∫∫ (x - <x>)^2 f(x, y) dx dy = σ_x^2
  V[y] = ∫∫ (y - <y>)^2 f(x, y) dx dy = σ_y^2

If z is a function of x and y, z = z(x, y), then z is also a random variable:

  <z> = ∫∫ z(x, y) f(x, y) dx dy,   σ_z^2 = <(z - <z>)^2>
Random variables in two dimensions

Example: z(x, y) = a x + b y. The expected value of z:

  <z> = a ∫∫ x f(x, y) dx dy + b ∫∫ y f(x, y) dx dy = a <x> + b <y>

This case is uncomplicated.
Random variables in two dimensions

Variance of z(x, y) = a x + b y:

  σ_z^2 = <((a x + b y) - (a <x> + b <y>))^2>
        = <((a x - a <x>) + (b y - b <y>))^2>
        = a^2 <(x - <x>)^2> + b^2 <(y - <y>)^2> + 2ab <(x - <x>)(y - <y>)>
        = a^2 σ_x^2 + b^2 σ_y^2 + 2ab <(x - <x>)(y - <y>)>

The new term is the covariance:

  cov(x, y) = σ_xy = <(x - <x>)(y - <y>)> = ∫∫ (x - <x>)(y - <y>) f(x, y) dx dy
Random variables in two dimensions

Normalized covariance (correlation coefficient):

  ρ_xy = cov(x, y) / (σ_x σ_y)

It gives a dimensionless measure of the level of correlation between two variables:

  -1 ≤ ρ_xy ≤ +1
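For a finite sample, the correlation coefficient can be sketched as follows (not part of the slides); the two extreme cases ρ = +1 and ρ = -1 occur for exactly linear dependence:

```python
def correlation(xs, ys):
    # rho = cov(x, y) / (sigma_x * sigma_y)
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx)**2 for x in xs) / n) ** 0.5
    sy = (sum((y - my)**2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

r_pos = correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # fully correlated: +1
r_neg = correlation([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])  # fully anticorrelated: -1
```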
Random variables in two dimensions

The determinant of any covariance matrix is non-negative:

  det ( σ_x^2  σ_xy ; σ_xy  σ_y^2 ) = σ_x^2 σ_y^2 - σ_xy^2 = σ_x^2 σ_y^2 (1 - ρ^2) ≥ 0
2-dim Gaussian distribution

[Figure: covariance ellipse in the plane of the parameters a_1 and a_2]

Probability content of the covariance ellipse: 39.3%
Covariance matrix in n dimensions

In order to generalize the variance, one defines the covariance matrix:

  V_ij = <(x⃗ - <x⃗>)(x⃗ - <x⃗>)^T>

The diagonal elements of the matrix V_ij are the variances and the off-diagonal elements are the covariances:

  V_ii = var(x_i) = ∫ (x_i - <x_i>)^2 f(x⃗) dx_1 dx_2 ... dx_n
  V_ij = cov(x_i, x_j) = ∫ (x_i - <x_i>)(x_j - <x_j>) f(x⃗) dx_1 dx_2 ... dx_n
Covariance matrix in n dimensions

The covariance matrix

  V = ( var(x_1)      cov(x_1, x_2)  ...  cov(x_1, x_n)
        cov(x_2, x_1) var(x_2)       ...  cov(x_2, x_n)
        ...           ...            ...  ...
        cov(x_n, x_1) cov(x_n, x_2)  ...  var(x_n)      )

is a symmetric n × n matrix:

  V = ( σ_1^2  σ_12   ...  σ_1n
        σ_21   σ_2^2  ...  σ_2n
        ...    ...    ...  ...
        σ_n1   σ_n2   ...  σ_n^2 )
1.7 Functions of random variables

A function of a random variable is itself a random variable. Suppose x follows a pdf f_x(x); consider a function y = y(x). What is the pdf f_y(y)?

Consider the interval (x, x + dx) mapped onto (y, y + dy). In order to conserve the normalization, the probability contents have to be the same:

  f_x(x) dx = f_y(y) dy   =>   f_y(y) = f_x(x(y)) |dx/dy|
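A Monte Carlo sketch of the transformation rule (arbitrary seed; the choice x ~ Uniform(0,1) with y = x^2 is illustrative): from f_y(y) = 1/(2√y) one predicts P(a < y < b) = √b - √a.

```python
import random

random.seed(4)
# x ~ Uniform(0, 1), y = x^2:
# x(y) = sqrt(y), |dx/dy| = 1/(2 sqrt(y))  =>  f_y(y) = 1/(2 sqrt(y)) on (0, 1)
# hence P(a < y < b) = sqrt(b) - sqrt(a)
n = 200_000
ys = [random.random()**2 for _ in range(n)]
frac = sum(0.25 < v < 0.64 for v in ys) / n  # predicted: 0.8 - 0.5 = 0.3
```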
Transformation of mean and variance

Taylor expansion of y(x) around the mean <x>:

  y(x) = y(<x>) + (x - <x>) dy/dx|_{x=<x>} + (1/2) (x - <x>)^2 d^2y/dx^2|_{x=<x>} + ...

Up to second order:

  E[y] ≈ y(<x>) + E[x - <x>] dy/dx|_{x=<x>} + (1/2) E[(x - <x>)^2] d^2y/dx^2|_{x=<x>}

Since E[x - <x>] = 0:

  <y> ≈ y(<x>) + (1/2) σ_x^2 d^2y/dx^2|_{x=<x>}

The second-order term is frequently disregarded.
Error propagation

For the transformation of the variance we assume <y> ≈ y(<x>) and expand y(x) in a neighborhood of <x>:

  V[y] = E[(y - <y>)^2] ≈ E[ ( (x - <x>) dy/dx|_{x=<x>} )^2 ]
       = ( dy/dx|_{x=<x>} )^2 E[(x - <x>)^2]
       = ( dy/dx|_{x=<x>} )^2 σ_x^2

This is error propagation for a single random variable.
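The linear propagation formula can be checked by simulation (arbitrary seed; y = x^2 with µ_x = 5 and a small σ_x = 0.01 are illustrative choices so that the linearization holds):

```python
import random

random.seed(2)
mu_x, sigma_x = 5.0, 0.01  # small sigma so the linear expansion is valid
# y = x^2, dy/dx = 2x  =>  predicted sigma_y = |2 mu_x| * sigma_x = 0.1
pred = 2.0 * mu_x * sigma_x

ys = [random.gauss(mu_x, sigma_x)**2 for _ in range(100_000)]
m = sum(ys) / len(ys)
sigma_y = (sum((y - m)**2 for y in ys) / (len(ys) - 1)) ** 0.5
```

The sampled standard deviation of y agrees with the propagated one to within the Monte Carlo precision.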
1.8 Folding

Two random variables x and y are defined by their probability distributions f_x(x) and f_y(y). Obviously the sum w = x + y is also a random variable. Its probability distribution f_w(w) is calculated by folding (convolving) f_x and f_y:

  f_w(w) = ∫∫ f_x(x) f_y(y) δ(w - x - y) dx dy
         = ∫ f_x(x) f_y(w - x) dx = ∫ f_y(y) f_x(w - y) dy

In short: f_w(w) = f_x(x) ⊗ f_y(y). For the characteristic functions, the convolution becomes a product:

  φ_w(t) = φ_x(t) · φ_y(t)
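A Monte Carlo sketch of a concrete fold (arbitrary seed): the sum of two Uniform(0,1) variables has the triangular density f_w(w) = w on [0,1] and 2 - w on [1,2], so P(w < 0.5) = 0.125 and P(w < 1) = 0.5.

```python
import random

random.seed(3)
n = 200_000
# fold two Uniform(0,1) densities by sampling the sum w = x + y
w = [random.random() + random.random() for _ in range(n)]

p_half = sum(v < 0.5 for v in w) / n  # predicted 0.125
p_one = sum(v < 1.0 for v in w) / n   # predicted 0.5
```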