Statistics and Data Analysis: The Crash Course. Physics 226, Fall 2013. "There are three kinds of lies: lies, damned lies, and statistics." (Mark Twain, allegedly after Benjamin Disraeli)
2 Intro: The Crash Course
Definitions: random variables, probability, PDFs
Point estimators
Maximum likelihood, least-squares fits
Hypothesis testing, confidence limits
Monte Carlo techniques
Systematic uncertainties
3 Statistics: A Necessary Tool
Physics is an experimental science: both qualitative and quantitative understanding is required (observations, laws).
Make a set of measurements or observations, then summarize the results.
Most conclusions are drawn with some degree of (un)certainty:
» Ex: gravity exists (qualitative).
» F = G_N m_1 m_2 / r^2 (quantitative).
» But in reality G_N and the exponent n = 2 are known only to some precision, e.g. G_N = (6.67428 ± 0.00067)×10^-11 N·m^2/kg^2.
Many measurements are a priori uncertain (e.g. quantum physics) and have to be interpreted in probabilistic terms.
Estimating such probabilities from a finite amount of data, and testing whether a given model is consistent with the data, is the basic task of classical statistics.
4 Describing the Data
Data: the results of the measurements.
In physics we mostly deal with quantitative data, i.e. sets of numbers. Other fields may deal with qualitative data ("An American Robin has gray upperparts and head, and orange underparts, usually brighter in the male").
Numbers are easier to handle mathematically (duh), hence statistics deals with quantitative measurements:
Discrete data, e.g. integers (counts)
Continuous data, e.g. energies, momenta, etc.
Data are written down with some precision, typically set by the measuring apparatus or other external conditions.
5 Examples: scatter plot (figure: asymmetry in ppm vs. Δy in µm)
6 Examples: graph
7 Examples: histogram
8 Probability: Two Interpretations
Frequentist interpretation: probability is the limiting frequency with which a given outcome occurs when the experiment is repeated an infinite number of times. Measurable parameters are represented by estimators with assigned confidence levels: the CL measures the probability that an estimator would fall in a certain range, given a true value of the parameter. No probability is assigned to constants of nature.
Bayesian interpretation: more general; defines probability as a degree of belief that a given statement is true, e.g. that the true value of parameter x is in the interval [a,b]. This is somewhat subjective, but follows how most humans think.
9 Bayes' Theorem
Conditional probability of A given B: P(A|B) = P(A ∩ B) / P(B)
Interpreted within Bayesian statistics as P(theory|data) ∝ P(data|theory) P(theory), where P(theory|data) is the posterior probability, P(data|theory) is the likelihood (result of the measurement), and P(theory) is the prior probability (initial prejudice).
Allows one to interpret a single experiment as a measure of the (subjective) probability that a given hypothesis is correct (e.g. that some fundamental constant is in some range). Requires assigning a probability interpretation to prior knowledge, which is where the subjectivity comes in.
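As a minimal numerical sketch of posterior ∝ likelihood × prior, here is a grid evaluation for a single parameter θ with a flat prior and a Gaussian likelihood (all numbers are made up for illustration, not from the slides):

```python
import numpy as np

# Hypothetical setup: we "measure" x = 2.0 with Gaussian resolution sigma = 1.0,
# and take a flat prior on theta over [0, 5].
theta = np.linspace(0.0, 5.0, 1001)
prior = np.ones_like(theta)                        # flat prior (initial prejudice)
x_meas, sigma = 2.0, 1.0
likelihood = np.exp(-0.5 * ((x_meas - theta) / sigma) ** 2)

# Bayes: posterior is proportional to likelihood times prior
posterior = likelihood * prior
dtheta = theta[1] - theta[0]
posterior /= posterior.sum() * dtheta              # normalize to unit area

theta_hat = theta[np.argmax(posterior)]            # posterior mode
```

With a flat prior the posterior mode coincides with the maximum of the likelihood, i.e. the measured value itself.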
10 Random Variables
Random variable: a numerical outcome of a (repeatable) measurement.
Characterized by a probability density function (PDF) f(x; θ), which depends on a set of parameters θ.
Cumulative distribution (CDF): F(a) = ∫_{-∞}^{a} f(x) dx
11 Expectation Values
Expectation value of a function u(x): E[u(x)] = ∫ u(x) f(x) dx
Moments of a random variable x:
n-th moment: α_n ≡ E[x^n] = ∫ x^n f(x) dx
n-th central moment: m_n ≡ E[(x − α_1)^n] = ∫ (x − α_1)^n f(x) dx
Special moments: mean µ ≡ α_1; variance σ² ≡ V[x] ≡ m_2 = α_2 − µ²
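These definitions can be checked numerically on a sample: for a large Gaussian sample, the first moment approaches µ and the second central moment approaches σ² (the sample size and seed below are arbitrary):

```python
import numpy as np

# Draw a large Gaussian sample with known mu and sigma
rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0
x = rng.normal(mu, sigma, size=200_000)

alpha_1 = np.mean(x)                  # first moment: estimates the mean mu
m_2 = np.mean((x - alpha_1) ** 2)     # second central moment: estimates sigma^2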
12 Examples (09/10/2013, YGK, Phys 226)
13 Sample from a Continuous PDF
16 Sample From a Discrete PDF
19 PDF and Cumulative Distribution
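The PDF-to-CDF relation on this slide is also the basis of inverse-CDF (transform) sampling, one of the Monte Carlo techniques listed in the intro. A sketch for the exponential PDF f(x) = e^{−x}, whose CDF F(x) = 1 − e^{−x} inverts analytically:

```python
import numpy as np

# Inverse-CDF sampling: map u ~ Uniform(0,1) through the inverse CDF.
# For F(x) = 1 - exp(-x), the inverse is x = -ln(1 - u).
rng = np.random.default_rng(1)
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u)

# A unit-rate exponential has mean 1, so the sample mean should be near 1
sample_mean = x.mean()
```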
20 2D Distribution
22 Marginal PDF
23 Correlations Between Variables
Covariance: cov[x, y] = E[(x − µ_x)(y − µ_y)] = E[xy] − µ_x µ_y
Represent an N-dimensional parameter space in terms of the covariance matrix V_ij:
Diagonal elements: variances (squares of the RMS)
Off-diagonal elements: covariances
Correlation: normalized covariance ρ_xy = cov[x, y] / (σ_x σ_y)
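A quick sketch of these definitions on artificially correlated Gaussian data (the construction y = x + noise is invented for illustration; for it, ρ_xy = 1/√1.25 ≈ 0.894):

```python
import numpy as np

# Build correlated pairs: y shares the x fluctuation plus extra noise
rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = x + 0.5 * rng.normal(size=100_000)

V = np.cov(x, y)                                   # 2x2 covariance matrix V_ij
rho = V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])         # normalized covariance
```

The same number comes directly from `np.corrcoef(x, y)[0, 1]`, which normalizes the covariance matrix internally.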
24 Profile Histogram
25 Common PDFs
26 Poisson Distribution
27 Gaussian Distribution
28 χ 2 Distribution
29 Cauchy (Breit-Wigner) PDF
30 Point Estimation
Standard problem: a set of values x_1, x_2, …, x_n (the data) described by a PDF f(x; θ) with parameter(s) θ.
Point estimation: want to construct an estimator of the parameter θ.
31 Estimator Properties
Consistency: approaches the true value asymptotically for an infinite dataset
Bias: difference w.r.t. the true value for a finite dataset
Efficiency: variance of the estimator (compared to others)
Sufficiency: dependence on the true value
Robustness: sensitivity to bad data, e.g. outliers
Others: physicality, tractability, etc.
There is no ideal recipe; what is best depends on the problem.
32 Basic Estimators
Estimators for the mean and variance.
Shape of the PDF (fitting):
Maximum likelihood: most efficient, but may be biased; goodness of fit is not readily available
Least squares (χ²): equivalent to ML for Gaussian-distributed data; convenient for binned data, with analytic solutions for linear functions; automatic goodness-of-fit measure; be careful with Gaussian approximations (e.g. when a Poisson distribution is approximated by a Gaussian)
33 Mean and Variance from a Sample
Estimators (equally weighted data):
µ̂ = (1/N) Σ_{i=1}^{N} x_i  (N > 0)
σ̂² = (1/(N−1)) Σ_{i=1}^{N} (x_i − µ̂)²  (N > 1)
Variances of these estimators:
V[µ̂] = σ²/N, i.e. σ[µ̂] = σ/√N
V[σ̂²] = (1/N) (m_4 − ((N−3)/(N−1)) σ⁴)
σ[σ̂] = σ/√(2N) for a Gaussian distribution of x and large N
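The σ[µ̂] = σ/√N statement can be verified with a small ensemble test: repeat a toy "experiment" of N points many times and look at the spread of the sample means (ensemble size and seed are arbitrary choices here):

```python
import numpy as np

# Ensemble of toy experiments, each with n_pts Gaussian measurements
rng = np.random.default_rng(3)
sigma_true, n_exp, n_pts = 2.0, 5_000, 10

mu_hats = np.empty(n_exp)
for k in range(n_exp):
    x = rng.normal(0.0, sigma_true, size=n_pts)
    mu_hats[k] = x.mean()                 # sample-mean estimator for this experiment

# RMS of mu_hat over the ensemble should match sigma / sqrt(N)
sigma_mu = mu_hats.std()
expected = sigma_true / np.sqrt(n_pts)    # 2 / sqrt(10) ~ 0.632
```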
34 Maximum Likelihood Estimators
Define the likelihood for N independent measurements x_i: L(θ) = Π_{i=1}^{N} f(x_i; θ); maximize it to determine the estimators of θ.
This leads to a system of (generally nonlinear) equations for the parameters θ: ∂ln L/∂θ_i = 0, i = 1, …, n.
Solutions of these equations (often found numerically) determine the estimators. Their covariance matrix is given by (V^{-1})_ij = −∂² ln L/∂θ_i ∂θ_j, evaluated at the estimators.
The maximum likelihood method has the nice property that (in the limit of infinite statistics) it produces unbiased estimators with the smallest possible variance. But beware of small-statistics samples!
ML fits are implemented in many statistical packages (ROOT, MATLAB). They can be applied to binned or unbinned data.
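A sketch of an unbinned ML fit, using an exponential decay PDF f(t; τ) = e^{−t/τ}/τ as a toy example (my choice, not from the slides): here the ML equation solves analytically to τ̂ = mean(t), which lets us cross-check the numerical minimizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy dataset: exponential decay times with true tau = 1.5
rng = np.random.default_rng(4)
t = rng.exponential(scale=1.5, size=50_000)

def nll(tau):
    # Negative log-likelihood for f(t; tau) = exp(-t/tau) / tau:
    # -ln L = sum_i ( t_i / tau + ln tau )
    return np.sum(t / tau + np.log(tau))

# Minimize -ln L numerically (equivalent to maximizing L)
res = minimize_scalar(nll, bounds=(0.1, 10.0), method="bounded")
tau_hat = res.x
```

For this PDF the analytic ML solution is the sample mean, so `tau_hat` should agree with `t.mean()` to the minimizer's tolerance.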
35 Least Squares Estimators
For a set of Gaussian-distributed variables y_i, define:
χ²(θ) = −2 ln L(θ) + constant = Σ_{i=1}^{N} (y_i − F(x_i; θ))² / σ_i²
Or, for correlated variables: χ²(θ) = (y − F(θ))^T V^{-1} (y − F(θ))
Estimators: minimize χ²(θ).
In particular, if the function F is linear in the parameters θ, F(x_i; θ) = Σ_{j=1}^{m} θ_j h_j(x_i), the LS estimators are found by solving a system of linear equations analytically: θ̂ = (H^T V^{-1} H)^{-1} H^T V^{-1} y ≡ D y, where H_ij = h_j(x_i).
Least-squares fits are typically done on binned data, and are implemented in most statistical packages (ROOT, MATLAB, even Excel).
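The analytic linear-LS solution can be sketched for a straight-line fit y = θ_0 + θ_1 x with uncorrelated Gaussian errors (true parameters, errors, and seed below are invented for the demo):

```python
import numpy as np

# Toy straight-line data: y = 1 + 2x with Gaussian noise sigma = 0.5
rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 50)
sigma = np.full_like(x, 0.5)
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

# Design matrix H with basis functions h_1(x) = 1, h_2(x) = x
H = np.column_stack([np.ones_like(x), x])
Vinv = np.diag(1.0 / sigma**2)                     # uncorrelated: diagonal V^-1

# theta_hat = (H^T V^-1 H)^-1 H^T V^-1 y, solved as a linear system
theta_hat = np.linalg.solve(H.T @ Vinv @ H, H.T @ Vinv @ y)
```

Using `np.linalg.solve` on the normal equations avoids forming the matrix inverse explicitly, which is the numerically preferred route.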
36 Confidence Limits
37 Error Intervals From Likelihood Ratio
38 Error Intervals From Likelihood Ratio
39 Chi2 Quantiles
40 Confidence Limits
Frequentist approach: confidence belts.
Small caveat: the interval is not unique. Use central intervals (equal area on both sides) or decide based on the likelihood ratio (e.g. Feldman-Cousins).
41 Bayesian Approach
Likelihood function + prior → posterior PDF for the parameter.
Treat it as a PDF and integrate.
Caveat: the choice of prior.
42 Example (Ben Hooberman's thesis)
Figure 2.33: Likelihood as a function of the branching fractions BF(ϒ(3S)→eτ) (left) and BF(ϒ(3S)→µτ) (right), in units of 10^-6 [60]. The dotted red curve includes statistical uncertainties only; the solid blue curve includes systematic uncertainties as well. The shaded green regions bounded by the vertical lines indicate 90% of the area under the physical regions of the likelihood curves.
43 Hypothesis Testing
Setting a confidence interval is a special case of the general problem of hypothesis testing.
E.g. the hypothesis is that x is within this interval, or that x belongs to a given distribution.
Hypothesis testing is a procedure for assigning a significance (confidence) level to a test. It generally involves computing quantiles of a distribution.
44 Example: Gaussian Distribution
The probability 1 − α that the measured value will fall within ±δ of the true value:
1 − α = (1/√(2π)σ) ∫_{µ−δ}^{µ+δ} e^{−(x−µ)²/2σ²} dx = erf(δ/(√2 σ))
α          δ      |  α        δ
0.3173     1σ     |  0.2      1.28σ
4.55×10⁻²  2σ     |  0.1      1.64σ
2.7×10⁻³   3σ     |  0.05     1.96σ
6.3×10⁻⁵   4σ     |  0.01     2.58σ
5.7×10⁻⁷   5σ     |  0.001    3.29σ
2.0×10⁻⁹   6σ     |  10⁻⁴     3.89σ
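A few rows of this table can be reproduced directly from the erf expression above:

```python
from math import erf, sqrt

def coverage(n_sigma):
    # Probability 1 - alpha that a Gaussian measurement lies within
    # +/- n_sigma standard deviations of the true value
    return erf(n_sigma / sqrt(2.0))

# Tail probabilities alpha = 1 - coverage for the classic 1/2/3 sigma rows
alpha_1 = 1.0 - coverage(1.0)   # ~ 0.3173
alpha_2 = 1.0 - coverage(2.0)   # ~ 4.55e-2
alpha_3 = 1.0 - coverage(3.0)   # ~ 2.7e-3
```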
45 Example: χ² p-values
Figure: the p-value for a test (α for confidence intervals), i.e. one minus the cumulative distribution, 1 − F(χ²; n), as a function of χ² for n = 1 to 50 degrees of freedom.
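Instead of reading the curves off a figure, the same quantity is available as the χ² survival function; a sketch with an arbitrary example value (χ² = 25 for 10 degrees of freedom):

```python
from scipy.stats import chi2

# p-value: probability of observing a chi-squared at least this large,
# i.e. the survival function 1 - F(chi2; ndf)
chi2_obs, ndf = 25.0, 10
p_value = chi2.sf(chi2_obs, ndf)

# Sanity check: a fit with chi2 equal to ndf is perfectly plausible
p_typical = chi2.sf(float(ndf), ndf)
```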
46 Systematics
47 Another Class of Errors
Statistical errors: the spread in values one would see if the experiment were repeated multiple times; the RMS of the estimator for an ensemble of experiments done under the same conditions (e.g. numbers of events).
Several methods discussed:
Square root of the variance of the estimator, if the PDF is known
Curvature of log(likelihood): the Δlog(L) = 1/2 rule (or Δχ² = 1)
But there is another source of uncertainty in the results: systematics.
48 Simple Example
Mass spectrometer: m = q r² B² / (2V)
Mass spectrometer error: the resolution; statistical error: resolution/√N.
Measure V, B for each run and average out the fluctuations. Common errors do not average out:
Scale of B, V
Radius r
Velocity selection
Energy loss (residual pressure)
Etc., etc.
49 Combination of Errors
Normally, independent errors are added in quadrature. For instance, if the measurements of r, V, B are uncorrelated, then (to first order)
σ(m)/m = √[ (2σ(r)/r)² + (σ(V)/V)² + (2σ(B)/B)² ]
This is fine for a single ion. But when we average (take more data), we have to take into account the fact that the errors on r, V, B correlate the measurements of mass for each ion.
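A one-line sketch of the quadrature sum for m = q r² B²/(2V), with illustrative 1% relative errors on r, V, B (not real spectrometer numbers); the factors of 2 come from r and B entering squared:

```python
import math

# Assumed relative errors on r, V, B (illustrative values)
rel_r, rel_V, rel_B = 0.01, 0.01, 0.01

# Quadrature sum for m ~ r^2 B^2 / V:
# powers of each variable multiply its relative error
rel_m = math.sqrt((2 * rel_r) ** 2 + rel_V ** 2 + (2 * rel_B) ** 2)
```

With these inputs the terms are 4 + 1 + 4 in units of 10⁻⁴, giving a 3% relative error on m.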
50 Quadrature Sum Stat and syst errors are typically quoted separately in experimental papers (though not in PDG) E.g. σ=[15 ± 5 (stat.) ± 1 (syst.)] nb It is understood that the first number scales with the number of events while the second may not Splitting like this gives a feeling of how much a measurement could be improved with more data It is also understood that stat and syst errors are uncorrelated (if this is not the case, have to say so explicitly!) It is also understood that stat errors are uncorrelated between different experiments, while syst errors could be correlated (modeling, bias)
51 Classic Example (one of many)
52 Combining Errors
For one measurement with stat and syst errors, this is easy.
Suppose we measure x_1 = <x_1> ± σ_1 ± S.
Split into random and systematic parts: x_1 = <x_1> + x_R + x_S, with <x_R> = <x_S> = 0, <x_R²> = σ_1², <x_S²> = S².
Total variance: V[x_1] = <x_1²> − <x_1>² = <(x_R + x_S)²> = σ_1² + S²
Syst and stat errors are combined in quadrature.
53 Error Propagation
Full formula, assuming small errors (i.e. keeping the 1st Taylor term): for f(x_1, …, x_n),
V[f] ≈ Σ_{i,j} (∂f/∂x_i)(∂f/∂x_j) V_ij
Consequences:
If two measurements are correlated, it may be possible to find a combination with zero variance (det(V) = 0).
For two fully correlated measurements x_1, x_2 and X = x_1 + x_2: the errors add up linearly!
54 Error Propagation, General
For m functions f_1, …, f_m of n variables x_1, …, x_n:
U_kl = Σ_{i,j} (∂f_k/∂x_i)(∂f_l/∂x_j) V_ij
Or in matrix form: U = G V G^T, with the Jacobian G_ki = ∂f_k/∂x_i.
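A sketch of U = G V Gᵀ for the simplest case: one function f(x_1, x_2) = x_1 + x_2 of two fully correlated inputs (σ values invented for the demo), which should reproduce the linear addition of errors noted above:

```python
import numpy as np

# Two fully correlated measurements (rho = +1)
sigma1, sigma2 = 1.0, 2.0
V = np.array([[sigma1**2,        sigma1 * sigma2],
              [sigma1 * sigma2,  sigma2**2      ]])

# Jacobian of f(x1, x2) = x1 + x2: df/dx1 = df/dx2 = 1
G = np.array([[1.0, 1.0]])

U = G @ V @ G.T                  # 1x1 covariance of f
sigma_f = np.sqrt(U[0, 0])       # should equal sigma1 + sigma2
```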
55 Systematic Errors and Fitting
Use the covariance matrix in the χ²: χ²(θ) = d^T V^{-1} d, where d_i = y_i − y_i^fit(θ).
The same recipe can be applied to an ML fit (e.g. L ~ exp(−χ²/2)).
56 Practical Implications
In the full formalism, one can still use the χ²/dof test to determine the goodness of fit, but this will not work unless correlations are taken into account.
For simplicity, if all stat errors are roughly equal and all systematic errors are common, one can do the fit with stat errors only (this determines the stat errors on the parameters), and then propagate the syst errors.
Limitations:
More points do not improve the systematic error
The goodness of fit will not reveal unsuspected sources of systematics: all points move together, giving the same goodness of fit