Introduction to Statistics and Error Analysis


Introduction to Statistics and Error Analysis
Physics 116C, 4/3/06, D. Pellett

References:
- Data Reduction and Error Analysis for the Physical Sciences by Bevington and Robinson
- Particle Data Group notes on probability and statistics, etc., online at http://pdg.lbl.gov/2004/reviews/contents_sports.html (reference: S. Eidelman et al., Phys. Lett. B 592, 1 (2004))

Any original presentations copyright D. Pellett 2006

Crucial Issues for Experimenters
- Accuracy of data
- Probability distributions for data
- Statistical parameters and estimates: µ, σ
- Propagation of errors
- Comparison with theory; curve fitting (also Essick, Ch. 8)
- Significance of results: chi-square test, confidence intervals
- Physics 116C applications: radioactive decay; mean life of a nuclide; Johnson noise

Measurement and Uncertainty Estimation
Error: the difference between the measured and true value. But the true value is unknown, so we make the measurement and estimate the uncertainty. Sources of error:
- Blunders: oops! E.g., wrote down the wrong number; do it over carefully. May appear as a statistical outlier.
- Random errors (statistical fluctuations): differ from trial to trial but average to a better value. E.g., repeated measurements with a meter stick.
- Systematic errors: due to reproducible discrepancies. E.g., measurements with a cold metal ruler appear bigger since the scale has contracted.

Accuracy vs. Precision
- Accuracy: how close the result is to the true value.
- Precision: how well the result has been determined, e.g., how many decimal places in the result.
Example: you can improve a length measurement with a vernier caliper. It can be precise, and also accurate if you are careful. But if you hold it away from the piece you are measuring and estimate the length by sight, the precision will be the same while the accuracy will suffer.
Understand the use of significant figures (see Bevington, Ch. 1). It is often better to state an error estimate: l = 1.423 ± 0.003 m.

Estimating Statistical Uncertainty
Repeated measurements help us understand uncertainties; the mean value gives a better estimate. This leads to the study of probability and statistics.
- Parent distribution: that of a larger, perhaps infinite set of possible measurements (the parent population) from which a finite sample is drawn.
- Sample distribution: that of the measurements actually made.
Example: make N = 100 length measurements x_i. Histogram the results (frequency distribution); calculate the sample mean x̄ and sample standard deviation s; compare with the parent distribution (assumed Gaussian).

Statistics
Sample mean:
  x̄ = (1/N) Σ_{i=1}^{N} x_i
Parent distribution mean:
  µ = lim_{N→∞} (1/N) Σ_{i=1}^{N} x_i
Variance:
  σ² = lim_{N→∞} (1/N) Σ_{i=1}^{N} (x_i − µ)² = lim_{N→∞} [(1/N) Σ_{i=1}^{N} x_i²] − µ²
Sample variance (unbiased estimate of the variance from the sample):
  s² = [1/(N − 1)] Σ_{i=1}^{N} (x_i − x̄)²
(Divide by N − 1 since the sample mean is used.)
Standard deviation: σ = parent standard deviation, s = sample standard deviation.
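As a minimal Python sketch of these estimators (the data here are hypothetical, drawn from an assumed parent Gaussian with the µ and σ of the earlier length example):

```python
import math
import random

random.seed(1)
# Hypothetical sample: N = 100 measurements from a parent Gaussian
# with mu = 1.423 m and sigma = 0.003 m (values borrowed for illustration)
x = [random.gauss(1.423, 0.003) for _ in range(100)]
N = len(x)

mean = sum(x) / N                                  # sample mean x-bar
s2 = sum((xi - mean) ** 2 for xi in x) / (N - 1)   # unbiased sample variance
s = math.sqrt(s2)                                  # sample standard deviation
```

The divisor N − 1 (rather than N) compensates for the fact that the sample mean, not the unknown parent mean µ, is used in the residuals.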

Statistics (continued)
Some other statistics:
- Mode: the most probable value.
- Median: half the values are less, half greater (in an infinite population). For example, for a Gaussian the median equals the mean.
For a finite sample (according to MathWorld): put the samples in increasing order. If the number of samples is odd, the value of the sample in the center is the median; if the number is even, take the average of the values of the two samples on either side of the center.
Reference: http://mathworld.wolfram.com/statisticalmedian.html
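The finite-sample procedure translates directly into a short Python sketch:

```python
def statistical_median(samples):
    """Median of a finite sample, per the MathWorld procedure: sort the
    values; for an odd count take the middle value, for an even count
    average the two values on either side of the center."""
    ordered = sorted(samples)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

statistical_median([3, 1, 2])     # -> 2
statistical_median([4, 1, 3, 2])  # -> 2.5
```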

Distributions and Expectation Values
Probability density function (p.d.f.) p(x) (following Bevington's notation; usually the p.d.f. is written f(x)).
Cumulative distribution function P(x):
  P(x) = ∫_{−∞}^{x} p(x′) dx′
(again, this is usually written F(x)). P(a) is the probability that x ≤ a.
Expectation value of a function of a random variable:
  ⟨u(x)⟩ = ∫ u(x) p(x) dx
For a discrete distribution:
  ⟨u(x)⟩ = Σ_i u(x_i) P(x_i)
The area of width Δx under the p(x) curve is the normalized probability of x falling within the Δx interval.
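A small numerical sketch (not from the slides) makes the expectation-value integral concrete: approximate ⟨u(x)⟩ = ∫ u(x) p(x) dx by a Riemann sum, here with p(x) a unit Gaussian so the first two moments are known:

```python
import math

def gauss_pdf(x, mu=0.0, sigma=1.0):
    # Gaussian p.d.f. p(x)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def expectation(u, pdf, lo=-10.0, hi=10.0, n=20000):
    """<u(x)> = integral of u(x) p(x) dx, approximated by a midpoint sum;
    the integration range and grid size are illustrative choices."""
    dx = (hi - lo) / n
    return sum(u(lo + (i + 0.5) * dx) * pdf(lo + (i + 0.5) * dx)
               for i in range(n)) * dx

first = expectation(lambda x: x, gauss_pdf)       # ~ 0, the mean
second = expectation(lambda x: x * x, gauss_pdf)  # ~ 1, sigma^2 for mu = 0
```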

Parent and Sample Distributions
Simulate the results of N = 100 repeated measurements x_i (random samples) from a parent Gaussian distribution f(x) = p_gauss(x; µ, σ).
Calculate and compare the sample mean and sample standard deviation.
Compare the sample histogram with N f(x) Δx (where Δx = bin width).
[Figure: Gaussian distribution]

Joint p.d.f. Statistics
Joint probability density function: f(x, y).
Marginal p.d.f.s in x and y (the unobserved variable integrated out):
  f₁(x) = ∫ f(x, y) dy    f₂(y) = ∫ f(x, y) dx
Mean values of x, y (expectation values under the joint p.d.f.):
  µ_x = ∫∫ x f(x, y) dx dy    µ_y = ∫∫ y f(x, y) dx dy
Covariance:
  cov[x, y] ≡ ⟨(x − µ_x)(y − µ_y)⟩ = ⟨xy⟩ − µ_x µ_y
Correlation coefficient:
  ρ_xy ≡ cov[x, y] / (σ_x σ_y)    (−1 ≤ ρ_xy ≤ 1)
x, y are independent iff f(x, y) = f₁(x) f₂(y). Independence implies ρ_xy = 0 (but the converse is not true).
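The sample versions of cov[x, y] and ρ_xy can be sketched in Python; drawing x and y from independent Gaussians (widths chosen to echo the ROOT example that follows), the estimated ρ_xy should come out near zero:

```python
import math
import random

random.seed(2)
n = 40000
# Independent Gaussians in x and y (illustrative widths 0.70 and 0.40),
# so the true covariance and correlation are zero
xs = [random.gauss(0.0, 0.70) for _ in range(n)]
ys = [random.gauss(0.0, 0.40) for _ in range(n)]

mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n  # sample cov[x, y]
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
rho = cov / (sx * sy)  # sample correlation coefficient, ~ 0 here
```

With 40,000 samples the statistical fluctuation of ρ_xy is of order 1/√n ≈ 0.005, which is why small nonzero values like those on the next slide are expected even for truly independent variables.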

Example: 40,000 Samples from a Joint p.d.f.
Here f(x, y) is a product of two independent Gaussians in x and y (x, y are independent); plots done with ROOT (root.cern.ch).
Marginal distribution histograms can be found from the sums of rows or columns (add the sums in the margins of the chart).
σ_x = 0.70, σ_y = 0.40, cov[x, y] = −1.81 × 10⁻⁵, ρ_xy = −6.4 × 10⁻⁵

root [14] h2.GetCovariance(1,2)
(const Stat_t)(-1.81279999091343846e-05)
root [15] h2.GetCorrelationFactor(1,2)
(const Stat_t)(-6.44433599986376094e-05)
root [16] h2.GetRMS(1)
(const Stat_t)6.98881747937568965e-01
root [17] h2.GetRMS(2)
(const Stat_t)4.02501975150788061e-01
root [18] h2.Integral()
(const Stat_t)4.00000000000000000e+04

Example: Variables Not Independent
Sum of two functions of x and y (each a Gaussian similar to the previous plot).
Correlation coefficient = 0.77

Common Distributions and Properties
Typical probability distributions:
- Uniform distribution
- Gaussian distribution; central limit theorem and LabVIEW example
- Lorentzian distribution (a.k.a. Cauchy, Breit-Wigner)
- Propagation of errors overview (more later)
- Generation of pseudorandom distributions with LabVIEW
Next: counting statistics and the square root of N
- Binomial distribution and its Gaussian limit for large n
- Poisson distribution; relation to the binomial and Gaussian

Gaussian Histogram Generation VI
- Generate a random number from a uniform distribution with rand()
- Transform it to a sample from the unit normal distribution using the Inverse Normal Dist. VI
- Scale from z to the desired mean and standard deviation
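The same inverse-CDF recipe can be sketched in Python, with `statistics.NormalDist.inv_cdf` standing in for the Inverse Normal Dist. VI (the mean and standard deviation below are illustrative choices, not from the slides):

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(3)
unit = NormalDist(0.0, 1.0)

mu, sigma = 3.0, 0.5  # illustrative target mean and standard deviation
# 1) uniform sample in (0, 1); 2) inverse CDF gives a unit-normal z;
# 3) scale and shift z to the desired mean and standard deviation
samples = [mu + sigma * unit.inv_cdf(random.random()) for _ in range(10000)]
```

Histogramming `samples` should reproduce a Gaussian centered at µ = 3.0 with σ = 0.5, up to statistical fluctuations.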

Central Limit Theorem Example
The sum of 6 samples from a uniform distribution approximates a Gaussian with mean = 3 and standard deviation = 0.707.
Cutoff at 4.24 standard deviations; the probability of a Gaussian fluctuation exceeding 4.24 standard deviations is 2.3 × 10⁻⁵.

Central Limit Example VI
Add 6 samples from a uniform distribution.
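A minimal Python simulation of the same sum shows the central limit behavior: six uniform [0, 1) samples give mean 6 × 0.5 = 3 and standard deviation √(6 × 1/12) ≈ 0.707, with the sum confined to [0, 6], i.e., cut off at (6 − 3)/0.707 ≈ 4.24 standard deviations:

```python
import random
from statistics import fmean, stdev

random.seed(4)
# Each trial is the sum of 6 samples from a uniform [0, 1) distribution;
# 20,000 trials is an illustrative choice
sums = [sum(random.random() for _ in range(6)) for _ in range(20000)]

fmean(sums)   # ~ 3, the Gaussian-limit mean
stdev(sums)   # ~ 0.707, the Gaussian-limit standard deviation
```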

Overview: Propagation of Errors
Brief overview: suppose we have x = f(u, v), where u and v are uncorrelated random variables with Gaussian distributions. Expand f in a Taylor series around x₀ = f(u₀, v₀), where u₀, v₀ are the mean values of u and v, keeping the lowest-order terms:
  x − x₀ ≈ (∂f/∂u)(u − u₀) + (∂f/∂v)(v − v₀)
The distribution of Δx is a bivariate distribution in Δu and Δv. Under suitable conditions (see Bevington, Ch. 3) we can approximate σ_x (the standard deviation of Δx) by
  σ_x² ≈ (∂f/∂u)² σ_u² + (∂f/∂v)² σ_v²
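A small Python sketch of this formula, taking the partial derivatives numerically (the function and input values are hypothetical examples, not from the slides):

```python
import math

def propagate(f, u0, v0, su, sv, h=1e-6):
    """sigma_x^2 ~ (df/du)^2 su^2 + (df/dv)^2 sv^2 for uncorrelated u, v,
    with the partials evaluated numerically at the means (u0, v0).
    The step h is an illustrative choice for the central differences."""
    dfdu = (f(u0 + h, v0) - f(u0 - h, v0)) / (2 * h)
    dfdv = (f(u0, v0 + h) - f(u0, v0 - h)) / (2 * h)
    return math.sqrt((dfdu * su) ** 2 + (dfdv * sv) ** 2)

# For x = u + v the formula reduces to adding errors in quadrature:
# sigma_x = sqrt(su^2 + sv^2)
propagate(lambda u, v: u + v, 1.0, 2.0, 0.3, 0.4)  # -> 0.5
```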

Binomial Distribution Summary
The binomial distribution arises in flipping coins or performing a similar experiment with exactly two possible outcomes repeatedly, with independent trials. Call the outcome of a single trial "heads" with probability p or "tails" with probability q = 1 − p, allowing the possibility of unequal probabilities. Then the probability of getting x heads in n trials is
  P_B(x; n, p) = [n! / (x!(n − x)!)] p^x (1 − p)^(n−x)
The distribution has µ = np and σ² = npq (σ = √(npq)).
For n ≫ 1, the discrete distribution can be approximated by a Gaussian p.d.f. with the above mean and standard deviation.
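The Gaussian limit can be checked numerically with a short Python sketch (the choice n = 100, p = 0.5 is illustrative): near the mean, the binomial probability is close to the Gaussian p.d.f. with µ = np and σ = √(npq).

```python
import math

def binom_pmf(x, n, p):
    # P_B(x; n, p) = n!/(x!(n-x)!) p^x (1-p)^(n-x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

n, p = 100, 0.5                                  # illustrative n >> 1 case
mu, sigma = n * p, math.sqrt(n * p * (1 - p))    # mu = np, sigma = sqrt(npq)

binom_pmf(50, n, p)       # ~ 0.0796
gauss_pdf(50, mu, sigma)  # ~ 0.0798
```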