Introduction to Error Analysis


Introduction to Error Analysis, Part 1: the Basics
Andrei Gritsan, based on lectures by Petar Maksimović
February 1, 2010

Overview: definitions; reporting results and rounding; accuracy vs precision; systematic vs statistical errors; parent distribution; mean and standard deviation; Gaussian probability distribution; what a 1σ error means.

Definitions. µ_true: the true value of the quantity x we measure; x_i: an observed value; error on µ: the difference between the observed and true value, x_i − µ_true. All measurements have errors and the true value is unattainable, so we seek the best estimate of the true value µ and the best estimate of the true error x_i − µ.

One view on reporting measurements (from the book): keep only one digit of precision on the error, everything else is noise. Example: 410.5163819 → 4 × 10². Exception: when the first digit is 1, keep two. Example: 17538 → 1.7 × 10⁴. Round off the final value of the measurement to the significant digits of the error. Example: 87654 ± 345 kg → (876 ± 3) × 10² kg. Rounding rules: 6 and above, round up; 4 and below, round down; on 5, if the digit being kept is even round down, else round up (round half to even; reason: this reduces systematic errors in rounding).

A different view on rounding, from the Particle Data Group (an authority in particle physics): http://pdg.lbl.gov/2009/reviews/rpp2009-rev-rpp-intro.pdf. If the three highest-order digits of the error lie between 100 and 354, round to two significant digits. Example: 87654 ± 345 kg → (876.5 ± 3.5) × 10² kg. If they lie between 355 and 949, round to one significant digit. Example: 87654 ± 365 kg → (877 ± 4) × 10² kg. If they lie between 950 and 999, round the error up to 1000 and keep two significant digits. Example: 87654 ± 950 kg → (87.7 ± 1.0) × 10³ kg. Bottom line: use a consistent approach to rounding which is sound and accepted in your field of study, and use common sense after all.
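As an illustration (not part of the original lecture), here is a minimal Python sketch of the PDG rule described above. The helper name pdg_round and the exact tie-breaking behaviour are assumptions; half-way cases follow Python's round-half-to-even.

```python
# Sketch of PDG-style rounding of a measurement (value +/- error).
# Hypothetical helper for illustration; assumes error > 0.
import math

def pdg_round(value, error):
    """Round value and error following the PDG convention described above."""
    exponent = math.floor(math.log10(error))
    lead3 = int(round(error / 10 ** (exponent - 2)))  # three leading digits, 100-999
    if lead3 >= 950:            # 950-999: round error up to 1000, keep two digits
        error = 10.0 ** (exponent + 1)
        ndigits = 2
    elif lead3 >= 355:          # 355-949: one significant digit
        ndigits = 1
    else:                       # 100-354: two significant digits
        ndigits = 2
    keep = -(math.floor(math.log10(error)) - (ndigits - 1))  # decimal place to keep
    return round(value, keep), round(error, keep)

print(pdg_round(87654, 365))   # 87700 +/- 400,  i.e. (877 +/- 4) x 10^2
print(pdg_round(87654, 950))   # 87700 +/- 1000, i.e. (87.7 +/- 1.0) x 10^3
```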

Accuracy vs precision. Accuracy: how close the result is to the true value. Precision: how well the result is determined (regardless of the true value); a measure of reproducibility. Example with µ = 30: x = 23 ± 2 is precise but inaccurate, from uncorrected biases (large systematic error); x = 28 ± 7 is accurate but imprecise, subsequent measurements will scatter around µ = 30 but cover the true value in most cases (large statistical, i.e. random, error). An experiment should be both accurate and precise.

Statistical vs. systematic errors. Statistical (random) errors describe by how much subsequent measurements scatter about the common average value. If limited by instrumental error, use a better apparatus; if limited by statistical fluctuations, make more measurements. Systematic errors bias all measurements in a common way. They are harder to detect: faulty calibrations, a wrong model, bias by the observer. They are also hard to quantify (there is no unique recipe); they are estimated from analysis of the experimental conditions and techniques, and may be correlated.

Parent distribution (assume no systematic errors for now). The parent distribution is the probability distribution of results in the limit of an infinite number of measurements, N → ∞. However, we only have a limited number of measurements: we observe only a sample of the parent distribution, a sample distribution. The probability distribution of our measurements approaches the parent distribution only as N → ∞. We use the observed distribution to infer the parameters of the parent distribution, e.g., the estimate of µ → µ_true as N → ∞. Notation: Greek letters for parameters of the parent distribution, Roman letters for experimental estimates of those parameters.

Mean, median, mode. Mean of the experimental (sample) distribution: x̄ ≡ (1/N) Σ_{i=1..N} x_i; mean of the parent distribution: µ ≡ lim_{N→∞} (1/N) Σ x_i. Mean = centroid = average. Median: splits the sample into two equal parts. Mode: the most likely value (highest probability density).

Variance. Deviation: d_i ≡ x_i − µ, for a single measurement. Average deviation: ⟨x_i − µ⟩ = 0 by definition; one could use α ≡ ⟨|x_i − µ|⟩, but absolute values are hard to deal with analytically. Variance: instead, use the mean of the deviations squared: σ² ≡ ⟨(x_i − µ)²⟩ = ⟨x²⟩ − µ², i.e. σ² = lim_{N→∞} (1/N) Σ x_i² − µ² ("mean of the squares minus the square of the mean").

Standard deviation: the root mean square of the deviations, σ ≡ √σ² = √(⟨x²⟩ − µ²), associated with the 2nd moment of the x_i distribution. Sample variance: replace µ by x̄: s² ≡ (1/(N−1)) Σ (x_i − x̄)²; N−1 appears instead of N because x̄ is obtained from the same data sample and not independently.
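A short numpy illustration (an addition, not from the slides) of the sample mean, the sample standard deviation with the N−1 convention, and the resulting standard error of the mean (used later in Part 3); the data values are made up.

```python
import numpy as np

# Hypothetical repeated measurements of the same quantity
x = np.array([23.1, 24.8, 23.9, 25.2, 24.1, 23.5])
N = len(x)

xbar = x.mean()                 # sample mean, best estimate of mu
s = x.std(ddof=1)               # sample standard deviation, 1/(N-1) convention
sem = s / np.sqrt(N)            # standard error of the mean (see Part 3)

print(f"xbar = {xbar:.2f}, s = {s:.2f}, error on the mean = {sem:.2f}")
```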

So what are we after? We want µ. The best estimate of µ is the sample mean x̄. The best estimate of the error on each x_i (and thus on µ) is the square root of the sample variance, s = √s². Weighted averages: for a discrete probability distribution P(x_i), replace Σ x_i with Σ P(x_i) x_i and Σ x_i² with Σ P(x_i) x_i²; by definition, the formulae using ⟨·⟩ are unchanged.

Gaussian probability distribution: unquestionably the most useful distribution in statistical analysis; a limiting case of the Binomial and Poisson distributions (which are more fundamental; see next week); seems to describe the distribution of random observations for a large number of physical measurements; so pervasive that all results of measurements are always classified as Gaussian or non-Gaussian (even on Wall Street).

Meet the Gaussian probability density function: for a random variable x, with parameters the center µ and width σ,
p_G(x; µ, σ) = 1/(σ√(2π)) · exp[ −(1/2) ((x − µ)/σ)² ]
Differential probability: the probability to observe a value in [x, x + dx] is dP_G(x; µ, σ) = p_G(x; µ, σ) dx.

Standard Gaussian distribution: replace (x − µ)/σ with a new variable z:
p_G(z) dz = (1/√(2π)) exp(−z²/2) dz
i.e. a Gaussian centered at 0 with a width of 1. Computers calculate the standard Gaussian first, and then stretch and shift it to make p_G(x; µ, σ). Mean and standard deviation: by straight application of the definitions, mean = µ (the "center") and standard deviation = σ (the "width"). This is what makes the Gaussian so convenient!

Interpretation of Gaussian errors: we measured x = x_0 ± σ_0; what does that tell us? The standard Gaussian covers 0.683 of its area from −1.0 to +1.0, so the true value of x is contained in the interval [x_0 − σ_0, x_0 + σ_0] 68.3% of the time!
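To check the 68.3% figure numerically, here is a small scipy illustration (an addition, not from the slides); the example values x0 and sigma0 are arbitrary.

```python
from scipy.stats import norm

# Probability for a standard Gaussian variable to fall in [-1, +1]
coverage_1sigma = norm.cdf(1.0) - norm.cdf(-1.0)
print(coverage_1sigma)          # ~0.6827, i.e. 68.3%

# Same statement for a measurement x0 +/- sigma0
x0, sigma0 = 10.0, 0.5
print(norm.cdf(x0 + sigma0, loc=x0, scale=sigma0)
      - norm.cdf(x0 - sigma0, loc=x0, scale=sigma0))   # also ~0.6827
```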

The Gaussian distribution coverage

Introduction to Error Analysis, Part 2: Fitting. Overview: principle of maximum likelihood; minimizing χ²; linear regression; fitting of an arbitrary curve.

Likelihood: we observed N data points from a parent population; assume a Gaussian parent distribution (mean µ, standard deviation σ). The probability to observe x_i, given the true µ and σ, is
P_i(x_i | µ, σ) = 1/(σ√(2π)) · exp[ −(1/2) ((x_i − µ)/σ)² ]
The probability to have measured µ in this single measurement, given the observed x_i and σ_i, is called the likelihood:
P_i(µ | x_i, σ_i) = 1/(σ_i√(2π)) · exp[ −(1/2) ((x_i − µ)/σ_i)² ]
For N observations, the total likelihood is L ≡ P(µ | …) = ∏_{i=1..N} P_i(µ | …).

Principle of maximum likelihood: maximizing P(µ | …) gives µ as the best estimate of µ ("the most likely population from which the data might have come is assumed to be the correct one"). For Gaussian individual probability distributions (σ_i = const = σ),
L = P(µ) = (1/(σ√(2π)))^N · exp[ −(1/2) Σ ((x_i − µ)/σ)² ]
Maximizing the likelihood means minimizing the argument of the exponential: define χ² ≡ Σ ((x_i − µ)/σ)².

Example: calculating the mean, as a cross-check.
dχ²/dµ = d/dµ Σ ((x_i − µ)/σ)² = 0
The derivative is linear, so Σ (x_i − µ)/σ² = 0, giving µ = x̄ = (1/N) Σ x_i. The mean really is the best estimate of the measured quantity.
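A quick numerical cross-check of this result (illustrative, assuming made-up data and an equal error σ on every point): scanning χ²(µ) shows that the minimum sits at the sample mean.

```python
import numpy as np

x = np.array([23.1, 24.8, 23.9, 25.2, 24.1, 23.5])   # hypothetical data
sigma = 0.5                                           # common error on each point

mu_grid = np.linspace(x.min(), x.max(), 2001)
chi2 = np.array([np.sum(((x - mu) / sigma) ** 2) for mu in mu_grid])

print("chi2 minimum at mu =", mu_grid[np.argmin(chi2)])
print("sample mean        =", x.mean())               # the two agree
```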

Linear regression: the simplest case is a linear functional dependence. Measurements y_i, model (prediction) y = f(x) = a + bx; at each point, µ_y(x_i) = a + bx_i (the special case f(x_i) = const = a gives back a = µ). Minimize
χ²(a, b) = Σ ((y_i − f(x_i))/σ_i)²
Conditions for a minimum in the 2-dimensional parameter space: ∂χ²/∂a = 0 and ∂χ²/∂b = 0. This can be solved analytically, but don't do it by hand in real life (e.g. see p. 105 in Bevington); a sketch of the analytic solution is shown below.
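For reference, a sketch of the analytic solution using the standard weighted least-squares formulas (as in Bevington); the data values are made up for illustration.

```python
import numpy as np

# Hypothetical measurements y_i +/- sigma_i at points x_i, model y = a + b*x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 3.1, 3.9, 5.2, 5.8])
sigma = np.array([0.2, 0.2, 0.3, 0.3, 0.2])

w = 1.0 / sigma**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
Delta = S * Sxx - Sx**2

a = (Sxx * Sy - Sx * Sxy) / Delta        # intercept
b = (S * Sxy - Sx * Sy) / Delta          # slope
sigma_a = np.sqrt(Sxx / Delta)           # error on intercept
sigma_b = np.sqrt(S / Delta)             # error on slope

print(f"a = {a:.3f} +/- {sigma_a:.3f},  b = {b:.3f} +/- {sigma_b:.3f}")
```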

Familiar example from day one: a linear fit. A program (e.g. ROOT) will do the minimization of χ² for you. [Figure: example analysis, y label [units] vs x label [units], data points with a fitted straight line.] The program gives you the answer for a and b (each as µ ± σ).

Fitting with an arbitrary curve: a set of measurement pairs (x_i, y_i ± σ_i) (note: no errors on x_i!). The theoretical model (prediction) may depend on several parameters {a_i} and doesn't have to be linear: y = f(x; a_0, a_1, a_2, …). Identical approach: minimize the total χ²,
χ²({a_i}) = Σ ((y_i − f(x_i; a_0, a_1, a_2, …))/σ_i)²
The minimization proceeds numerically.
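In practice the numerical minimization is delegated to a library. The lecture uses ROOT; purely as an illustrative stand-in, here is a minimal sketch with scipy.optimize.curve_fit, fitting a made-up quadratic model to made-up data.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a0, a1, a2):                    # model y = f(x; a0, a1, a2)
    return a0 + a1 * x + a2 * x**2

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([1.2, 1.9, 3.1, 4.8, 7.2, 9.9])     # hypothetical data
sigma = np.full_like(y, 0.3)                     # errors on y only

popt, pcov = curve_fit(f, x, y, sigma=sigma, absolute_sigma=True, p0=(1, 1, 1))
perr = np.sqrt(np.diag(pcov))                    # parabolic 1-sigma errors

for name, val, err in zip(("a0", "a1", "a2"), popt, perr):
    print(f"{name} = {val:.3f} +/- {err:.3f}")
```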

Fitting data points with errors on both x and y: x_i ± σ_i^x, y_i ± σ_i^y. Each term in the χ² sum gets a correction from the σ_i^x contribution:
(y_i − f(x_i))²/(σ_i^y)² → (y_i − f(x_i))² / [ (σ_i^y)² + ( (f(x_i + σ_i^x) − f(x_i − σ_i^x))/2 )² ]
[Figure: sketch of y = f(x) showing how the error on x maps through f(x ± σ^x) into an extra contribution to the y part of χ².]
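A sketch of that correction in code (illustrative; the model, data point and error values are assumptions, and the finite-difference estimate only makes sense if f changes slowly across ±σ_x).

```python
import numpy as np

def effective_sigma2(f, xi, sigma_x, sigma_y):
    """Denominator of one chi^2 term when x_i also carries an error."""
    slope_term = 0.5 * (f(xi + sigma_x) - f(xi - sigma_x))   # ~ (df/dx) * sigma_x
    return sigma_y**2 + slope_term**2

# Example with a hypothetical model and one data point
f = lambda x: 2.0 + 0.8 * x
print(effective_sigma2(f, xi=3.0, sigma_x=0.2, sigma_y=0.1))
```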

Behavior of the χ² function near its minimum: when N is large, χ²(a_0, a_1, a_2, …) becomes quadratic in each parameter near the minimum,
χ² = (a_j − â_j)²/σ_j² + C
known as the parabolic approximation. C tells us about the goodness of the overall fit (it is a function of all the uncertainties and of the other parameters {a_k} for k ≠ j). Shifting a_j by σ_j gives Δχ² = 1, valid in all cases! The parabolic error is set by the curvature at the minimum: ∂²χ²/∂a_j² = 2/σ_j².

χ² shapes near the minimum: examples. [Figure: χ² vs a parameter a near the minimum: a narrower parabola means better (smaller) errors; a higher minimum means a worse fit; a non-parabolic shape means asymmetric errors.]

Two methods for obtaining the error: 1. from the curvature, σ_j² = 2 (∂²χ²/∂a_j²)⁻¹; 2. scan each parameter around the minimum, with the others fixed, until Δχ² = 1 is reached. Method #1 is much faster to calculate; method #2 is more generic and works even when the shape of χ² near the minimum is not exactly parabolic. The scan of Δχ² = 1 defines a so-called one-sigma contour: it contains the truth with 68.3% probability (assuming Gaussian errors). A sketch of such a scan follows below.
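A minimal sketch of method #2, a Δχ² = 1 scan, for the single-parameter case of the mean; the data and error values are made up, and a simple grid scan is used instead of a proper minimizer.

```python
import numpy as np

x = np.array([23.1, 24.8, 23.9, 25.2, 24.1, 23.5])    # hypothetical data
sigma = 0.5                                            # common error per point

def chi2(mu):
    return np.sum(((x - mu) / sigma) ** 2)

mu_hat = x.mean()                      # minimum of chi2 (see Part 2)
chi2_min = chi2(mu_hat)

# Scan mu upward until chi2 rises by 1; the distance is the 1-sigma error
mu_grid = np.linspace(mu_hat, mu_hat + 2.0, 20001)
chi2_vals = np.array([chi2(m) for m in mu_grid])
mu_up = mu_grid[np.searchsorted(chi2_vals, chi2_min + 1.0)]

print("error from scan       :", mu_up - mu_hat)
print("expected sigma/sqrt(N):", sigma / np.sqrt(len(x)))   # the two should agree
```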

What to remember: in the end the fit will be done for you by a program. You supply the data, e.g. (x_i, y_i), and the fit model, e.g. y = f(x; a_1, a_2, …); the program returns a_1 = µ_1 ± σ_1, a_2 = µ_2 ± σ_2, … and the plot with the model line through the points. You need to understand what is being done; in more complex cases you may need to go deep into the code.

Introduction to Error Analysis, Part 3: Combining measurements. Overview: propagation of errors; covariance; weighted average and its error; error on the sample mean and sample standard deviation.

Propagation of errors: x is a known function of u, v, …: x = f(u, v, …). Assume that the most probable value of x is x̄ = f(ū, v̄, …), where x̄ is the mean of the x_i = f(u_i, v_i, …). By the definition of variance,
σ_x² = lim_{N→∞} (1/N) Σ (x_i − x̄)²
Expand (x_i − x̄) in a Taylor series:
x_i − x̄ ≈ (u_i − ū)(∂x/∂u) + (v_i − v̄)(∂x/∂v) + …

Variance of x:
σ_x² ≈ lim_{N→∞} (1/N) Σ [ (u_i − ū)(∂x/∂u) + (v_i − v̄)(∂x/∂v) + … ]²
= lim_{N→∞} (1/N) Σ [ (u_i − ū)²(∂x/∂u)² + (v_i − v̄)²(∂x/∂v)² + 2(u_i − ū)(v_i − v̄)(∂x/∂u)(∂x/∂v) + … ]
≈ σ_u²(∂x/∂u)² + σ_v²(∂x/∂v)² + 2σ_uv²(∂x/∂u)(∂x/∂v) + …
This is the error propagation equation. σ_uv² is the covariance; it describes the correlation between the errors on u and v. For uncorrelated errors, σ_uv² → 0.
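As a numerical illustration (not from the slides), the propagation equation for x = u·v with uncorrelated errors, cross-checked against a simple Monte Carlo; all input values are made up.

```python
import numpy as np

u, sigma_u = 10.0, 0.3
v, sigma_v = 4.0, 0.2

# Error propagation equation with sigma_uv = 0 (uncorrelated errors)
dxdu, dxdv = v, u                       # partial derivatives of x = u*v
sigma_x = np.sqrt((dxdu * sigma_u) ** 2 + (dxdv * sigma_v) ** 2)

# Monte Carlo cross-check: sample u and v, look at the spread of x
rng = np.random.default_rng(1)
x_samples = rng.normal(u, sigma_u, 1_000_000) * rng.normal(v, sigma_v, 1_000_000)

print("propagated sigma_x :", sigma_x)           # ~2.33
print("Monte Carlo std    :", x_samples.std())   # close to the propagated value
```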

Examples. x = u + a, where a = const: ∂x/∂u = 1, so σ_x = σ_u. x = au + bv, where a, b = const: σ_x² = a²σ_u² + b²σ_v² + 2ab σ_uv². The correlation can be negative, i.e. σ_uv² < 0: if an error on u is counterbalanced by a proportional error on v, σ_x can get very small!

More examples. x = auv: ∂x/∂u = av, ∂x/∂v = au, so σ_x² = (avσ_u)² + (auσ_v)² + 2a²uv σ_uv², i.e. σ_x²/x² = σ_u²/u² + σ_v²/v² + 2σ_uv²/(uv). x = a u/v: σ_x²/x² = σ_u²/u² + σ_v²/v² − 2σ_uv²/(uv). Etc., etc.

Weighted average. From Part 2, calculation of the mean means minimizing χ² = Σ ((x_i − µ)/σ_i)², with the minimum at dχ²/dµ = 0, but now σ_i ≠ const:
0 = Σ (x_i − µ)/σ_i² = Σ (x_i/σ_i²) − µ Σ (1/σ_i²)
so the so-called weighted average is
µ = Σ (x_i/σ_i²) / Σ (1/σ_i²)
Each measurement is weighted by 1/σ_i²!

Error on the weighted average. N points contribute to the weighted average µ; straight application of the error propagation equation gives
σ_µ² = Σ σ_i² (∂µ/∂x_i)², with ∂µ/∂x_i = ∂/∂x_i [ Σ_j (x_j/σ_j²) / Σ_k (1/σ_k²) ] = (1/σ_i²) / Σ_k (1/σ_k²)
Putting both together:
σ_µ² = Σ_i σ_i² [ (1/σ_i²) / Σ_k (1/σ_k²) ]², i.e. 1/σ_µ² = Σ_k 1/σ_k²
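A small numpy sketch (illustrative) of both formulas, the weighted average and its error, using the two measurements from the example on the next slide.

```python
import numpy as np

x = np.array([25.0, 20.0])          # measurements (same as the example below)
sigma = np.array([1.0, 5.0])        # their errors

w = 1.0 / sigma**2                  # each point weighted by 1/sigma_i^2
mu = np.sum(w * x) / np.sum(w)      # weighted average
sigma_mu = 1.0 / np.sqrt(np.sum(w)) # its error: 1/sigma_mu^2 = sum(1/sigma_k^2)

print(f"weighted average = {mu:.1f} +/- {sigma_mu:.1f}")   # 24.8 +/- 1.0
```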

Example of a weighted average: x_1 = 25.0 ± 1.0, x_2 = 20.0 ± 5.0. Error: σ² = 1/(1/1² + 1/5²) = 25/26 ≈ 0.96 ≈ 1.0². Weighted average: x̄ = σ² (25/1² + 20/5²) = (25/26)(25 + 0.8) ≈ 24.8. Result: x = 24.8 ± 1.0. Moral: x_2 practically doesn't matter!

Error on the mean: N measurements from the same parent population (µ, σ). From Part 1, the sample mean x̄ and sample standard deviation s are the best estimators of the parent population parameters. But note: more measurements still give the same σ; our knowledge of the shape of the parent population improves, and thus of the true error on each point. But how well do we know the true value, i.e. µ? If the N points come from the same population with width σ:
1/σ_µ² = Σ 1/σ² = N/σ², so σ_µ = σ/√N ≈ s/√N
This is the standard deviation of the mean, or standard error.

Example: Lightness vs Lycopene content, scatter plot. [Figure: scatter plot of Lycopene content vs Lightness (lyc:light).]

Example: Lightness vs Lycopene content, RMS as error ("spread" option). [Figure: Lycopene content vs Lightness with RMS error bars.] The points don't scatter enough: the error bars are too large!

Example: Lightness vs Lycopene content, error on the mean. [Figure: Lycopene content vs Lightness with errors on the means.] This looks much better!

Introduction to Error Analysis, Part 4: Dealing with non-Gaussian cases. Overview: binomial p.d.f.; Poisson p.d.f.

Binomial probability density function: a random process with exactly two outcomes (a Bernoulli process), with probability p for one outcome ("success"). The probability for exactly r successes (0 ≤ r ≤ N) in N independent trials, where the order of successes and failures doesn't matter, is the binomial p.d.f.:
f(r; N, p) = N!/(r!(N − r)!) · p^r (1 − p)^(N−r)
Mean: Np. Variance: Np(1 − p). If r and s are binomially distributed with (N_r, p) and (N_s, p), then t = r + s is binomially distributed with (N_r + N_s, p).
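A short scipy illustration (an addition, not from the slides) of the binomial p.d.f., phrased as an efficiency-type measurement: the probability to see r passing events out of N trials when the true efficiency is p; the numbers are made up.

```python
from scipy.stats import binom

N, p = 20, 0.75                      # hypothetical: 20 trials, true efficiency 0.75

print(binom.pmf(15, N, p))           # probability of exactly 15 successes, ~0.202
print(binom.mean(N, p))              # N*p       = 15.0
print(binom.var(N, p))               # N*p*(1-p) = 3.75
```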

Examples of binomial probability. The binomial distribution shows up whenever the data exhibit binary properties: an event passes or fails; an efficiency (an important experimental parameter) is defined as ε = N_pass/N_total; particles in a sample are positive or negative.

Poisson probability density function: the probability of finding exactly n events in a given interval of x (e.g., space or time), when the events are independent of each other and of x and occur at an average rate of ν per interval. Poisson p.d.f. (ν > 0):
f(n; ν) = ν^n e^(−ν) / n!
Mean: ν. Variance: ν. It is the limiting case of the binomial for many events with low probability: p → 0, N → ∞ while Np = ν. The Poisson distribution approaches a Gaussian for large ν.
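A scipy sketch (illustrative, with arbitrary values of ν) of the Poisson p.d.f. and of its Gaussian limit for large ν.

```python
import numpy as np
from scipy.stats import poisson, norm

nu = 5.0
print(poisson.pmf(3, nu))            # probability of exactly 3 events, ~0.140

# For large nu the Poisson probabilities approach a Gaussian with mu=nu, sigma=sqrt(nu)
nu = 100.0
n = np.arange(80, 121)
max_diff = np.max(np.abs(poisson.pmf(n, nu) - norm.pdf(n, loc=nu, scale=np.sqrt(nu))))
print(max_diff)                      # small: the two nearly coincide
```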

Examples of Poisson probability: it shows up in counting measurements with a small number of events, e.g. the number of watermelons with circumference c ∈ [19.5, 20.5] in.; nuclear spectroscopy in the tails of a distribution (e.g. at high channel number); the Rutherford experiment.

Fitting a histogram with Poisson-distributed content: Poisson data require special treatment in fitting! In a histogram, the i-th channel contains n_i entries. For large n_i, P(n_i) is approximately Gaussian, and the Poisson σ = √ν is approximated by √n_i (WARNING: this is what ROOT uses by default!). A Gaussian p.d.f. has symmetric errors, i.e. an equal probability to fluctuate up or down, so in a minimum-χ² fit the fitted curve (the "true value") is equally likely to be above and below the data!

Comparing Poisson and Gaussian p.d.f., for 5 observed events. [Figure, two panels vs the expected number of events x: left, probability density Prob(5 | x); right, cumulative probability.] Dashed: Gaussian at 5 with σ = √5. Solid: Poisson with ν = 5. Left: probability density functions (note: the Gaussian extends below 0!). Right: confidence integrals (p.d.f. integrated from 5).