Some Statistics. V. Lindberg. May 16, 2007


An excellent reference written by physicists, with sample programs available, is Data Reduction and Error Analysis for the Physical Sciences, Philip R. Bevington and D. Keith Robinson, McGraw-Hill, 2003.

2 Basic Statistics

You all, I hope, understand the following terms:

(a) Random versus systematic error
(b) Gaussian or Normal distribution
(c) Measures of central tendency: mean, median, mode
(d) Measure of distribution spread: standard deviation

We believe that there is an underlying answer to our measurements; that is, there is some true average and standard deviation in the quantity being measured. These are called the population mean, µ, and the population standard deviation, σ. Our objective is to determine the population values based on a finite number of measurements.

Suppose we make N measurements of a quantity x; this is called sampling the distribution. The sample mean is defined as

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{1}$$

the sample variance is

$$s_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 \tag{2}$$

and the sample standard deviation is the square root of the variance. As we make more and more measurements we become more and more certain that our sample accurately represents the population, that is, that $\bar{x} \approx \mu$ and $s_x \approx \sigma$.

In some situations a variable can take only discrete values. For example, in the hydrogen spectrum only photons of certain energies exist. It is convenient to bin the measurements, giving each measured value and the number of times it is seen. So in a particular measurement of 100 photons from hydrogen we might measure 30 photons with energy 10.2 eV, 22 photons with energy 1.9 eV, etc. In this case we can describe the probability, $P(x_j)$, of observing a particular energy $x_j$.

Putting this on a formal basis: we make N measurements, and there are n values that the variable can have. A particular result $x_j$ is observed $n_j$ times. The probability of observing that particular value is

$$P(x_j) = \frac{n_j}{N} \tag{3}$$

and the average can be written using this probability:

$$\bar{x} = \frac{1}{N} \sum_{j=1}^{n} N P(x_j)\, x_j = \sum_{j=1}^{n} P(x_j)\, x_j \tag{4}$$

The variance is given by

$$s_x^2 = \sum_{j=1}^{n} P(x_j)\, (x_j - \bar{x})^2 \tag{5}$$

Example

The following measurements are made. Find the mean and standard deviation.

Value: [...]

Applying Equations (1) and (2) we get a mean of 4.27 and a standard deviation of [...]. The Excel functions AVERAGE and STDEV are useful.

A more compact representation of the data is the following:

Value    Frequency    Probability
3        [...]        0.2727
4        [...]        0.3636
5        [...]        0.2727
7        [...]        0.0909

Using Equation (4) we get

$$\text{mean} = (3)(0.2727) + (4)(0.3636) + (5)(0.2727) + (7)(0.0909) = 4.27 \tag{6}$$

The variance is done similarly.

Equations (1) through (5) apply to discrete sets of values. Many times we have variables that are continuous, and we define a probability density, $p(x)$, such that the probability, $dP(x)$, of making a measurement in an infinitesimal range from $x$ to $x + dx$ is

$$dP(x) = p(x)\, dx \tag{7}$$

Using this we can write Equations (4) and (5) as

$$\bar{x} = \int p(x)\, x\, dx \tag{8}$$

and

$$s_x^2 = \int p(x)\, (x - \bar{x})^2\, dx \tag{9}$$

3 Distributions

In one standard treatment of statistics one makes an a priori assumption that the distributions are Gaussian. In many cases this is a fine assumption. In many other cases it is completely wrong.

3.1 Gaussian Distribution (Continuous)

The Gaussian, or Normal, or bell-shaped curve is a symmetric, continuous distribution in the measured variable x. The variable x can have any value: $-\infty < x < \infty$. The population mean and standard deviation are µ and σ, and the probability of making a measurement in the range x to x + dx is

$$dP_G(x; \mu, \sigma) = p_G(x; \mu, \sigma)\, dx \tag{10}$$

where

$$p_G(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] \tag{11}$$

Our notation (after Bevington) uses the subscript G to indicate Gaussian, x as the variable, and µ, σ as the parameters describing the distribution.

Consider first the math SAT scores for college students as a whole. These may well follow a normal distribution. By contrast, the math SAT scores for physics and math majors should not be distributed normally: you are supposed to be better at math!

At one time memory chips were manufactured and then measured for speed. Those that could reliably operate at higher speed were sold at a higher price, with a different part number, than those that were slower. Imagine a normal distribution for the initial population of chips, draw a line somewhere separating the distribution into two parts, and you can see that the distribution of either part is not normal!

Part of the job of a physicist is to determine what distribution applies to an experimental situation, and to apply the appropriate statistics. This is not done nearly as carefully as it ought to be done! Fortunately the distribution is frequently Gaussian, and the researcher gets away with some unjustified assumptions.

Excel has a function NORMDIST that returns the probability density, =NORMDIST(x, mean, stdev, FALSE), or the cumulative probability given by the integral from $-\infty$ to x, =NORMDIST(x, mean, stdev, TRUE).

Example

A Gaussian distribution of energy measurements, E, has a mean of 13.6 eV with a standard deviation of 1.2 eV.

(a) What is the probability density for E = 11.0 eV?

=NORMDIST(11.0, 13.6, 1.2, FALSE) = 0.0318 per eV

(b) What is the probability of measuring a value between 10.9 and 11.1 eV?

Here the range in energies is small, so try just multiplying the probability density by the width: P = (0.0318)(0.2) = 0.00636. The exact answer is 0.00639, so this is close.

(c) What is the probability of measuring a value between 10.9 and 13.0 eV?

Now the range is too wide for that shortcut, so we evaluate the integral: P = NORMDIST(13.0, 13.6, 1.2, TRUE) - NORMDIST(10.9, 13.6, 1.2, TRUE) = 0.296

Three other distributions are seen regularly in physics: Binomial Distributions, Poisson Distributions, and Lorentzian Distributions.
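The Gaussian example above can also be checked outside Excel. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`, whose `pdf` and `cdf` methods play the roles of NORMDIST with FALSE and TRUE respectively. The variable names are illustrative and not part of the original notes.

```python
from statistics import NormalDist

# Energy distribution from the example: mean 13.6 eV, standard deviation 1.2 eV.
energy = NormalDist(mu=13.6, sigma=1.2)

# (a) Probability density at E = 11.0 eV, the analogue of NORMDIST(..., FALSE).
density = energy.pdf(11.0)

# (b) Narrow interval 10.9 to 11.1 eV: density times width, then the exact integral.
approx_narrow = density * 0.2
exact_narrow = energy.cdf(11.1) - energy.cdf(10.9)

# (c) Wide interval 10.9 to 13.0 eV: difference of cumulative probabilities,
#     the analogue of NORMDIST(..., TRUE) evaluated at both limits.
prob_wide = energy.cdf(13.0) - energy.cdf(10.9)

print(f"(a) density at 11.0 eV     : {density:.4f} per eV")
print(f"(b) density x width        : {approx_narrow:.5f}")
print(f"(b) exact, 10.9 to 11.1 eV : {exact_narrow:.5f}")
print(f"(c) exact, 10.9 to 13.0 eV : {prob_wide:.4f}")
```

The density-times-width shortcut in (b) works only because the interval is much narrower than σ; for (c) the cumulative difference is needed.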

3.2 Binomial Distribution

Consider a coin toss that can have only two outcomes, heads or tails. If n coins are tossed, what is the probability that there are exactly x heads? We allow for the possibility of a biased coin by saying that the probability of a head from a single toss of the coin is p, and this may not be the fair-coin value of 0.5. The probability is

$$P_B(x; n, p) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \tag{12}$$

The mean of the binomial distribution, not surprisingly, is

$$\mu = n p \tag{13}$$

and the standard deviation is

$$\sigma = \sqrt{n p (1-p)} \tag{14}$$

What is the difference between the Binomial and the Gaussian? The binomial has finite limits: x can only run from 0 to n. For large values of n, however, the Binomial can be approximated by the Gaussian.

Example

Preliminary measurements show that of 1000 measurements, 472 result in scattering in a forward direction and 528 result in scattering in the reverse direction. What is the standard deviation to be quoted?

Here n = 1000 and we estimate that p = 472/1000 = 0.472. From Equation (14) we get $\sigma = \sqrt{(1000)(0.472)(0.528)} \approx 15.8$. Hence we can say that the number that forward scattered is (472 ± 16).

3.3 Poisson Distribution

The Poisson distribution is regularly seen in counting statistics for photons or radioactivity, where we ask, "What is the probability of measuring x counts in a time interval when the mean number of counts in that interval is µ?" This probability is

$$P_P(x; \mu) = \frac{\mu^x e^{-\mu}}{x!} \tag{15}$$

with the mean being µ and the standard deviation being

$$\sigma = \sqrt{\mu} \tag{16}$$

Example

You count the number of photons arriving for 100 s and find 123 counts.

(i) What are the average and standard deviation of the count rate?

The mean is 123/100 = 1.23 counts per second. The standard deviation in the number of counts is $\sqrt{123} \approx 11$, so the standard deviation in the count rate based on this sample is 11/100 = 0.11 counts per second. Hence we could quote the count rate as 1.23 ± 0.11 counts/s.

(ii) If you count for 2 seconds, what is the chance of getting 0 counts? 1 count? ... 8 counts?

Using Equation (15) with the mean count in 2 seconds being 2(1.23) = 2.46, we find

n    probability
0    0.085
1    0.210
2    0.259
3    0.212
4    0.130
5    0.064
6    0.026
7    0.009
8    0.003

The Poisson distribution is different from the Binomial or Normal in one very important way: it is not symmetrical around the mean. Unlike the Binomial, which is bounded on both sides, 0 ≤ x ≤ n, the Poisson is bounded only on the lower side, 0 ≤ x < ∞. If the mean of a Poisson distribution is large, it can be approximated by a normal distribution.

3.4 Lorentzian

The Lorentzian or Cauchy distribution is used to describe the behavior of resonant systems and has the probability density

$$p_L(x; \mu, \Gamma) = \frac{1}{\pi}\, \frac{\Gamma/2}{(x-\mu)^2 + (\Gamma/2)^2} \tag{17}$$

The standard deviation is not defined for the Lorentzian. Instead the full width at half maximum, Γ, characterizes the distribution.

Figure (1) compares the different distributions for a sample with mean of 12.
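The binomial and Poisson worked examples in Sections 3.2 and 3.3 can be reproduced with a few lines of standard-library Python. This is a sketch only: the function names `binomial_pmf` and `poisson_pmf` and the variable names are illustrative choices, not part of the original notes.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials, Equation (12)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, mu):
    """Probability of exactly x counts when the mean count is mu, Equation (15)."""
    return mu**x * math.exp(-mu) / math.factorial(x)

# Binomial scattering example: 472 forward events out of 1000.
n_trials, n_forward = 1000, 472
p_forward = n_forward / n_trials
sigma_forward = math.sqrt(n_trials * p_forward * (1 - p_forward))  # Equation (14)

# Poisson counting example: 123 counts in 100 s.
counts, seconds = 123, 100
rate = counts / seconds                   # counts per second
rate_sigma = math.sqrt(counts) / seconds  # Equation (16), scaled to a rate

# Chance of seeing 0..8 counts in a 2 s window at that rate.
mu_2s = 2 * rate
poisson_table = [(k, poisson_pmf(k, mu_2s)) for k in range(9)]

print(f"forward scattered: {n_forward} +/- {sigma_forward:.1f}")
print(f"count rate: {rate:.2f} +/- {rate_sigma:.2f} counts/s")
for k, prob in poisson_table:
    print(f"  P({k} counts in 2 s) = {prob:.3f}")
```

Because `math.comb` and `math.factorial` use exact integers, these functions are adequate for small n and x; for very large arguments the powers and exponential can overflow or underflow, and a log-space formulation would be the usual alternative.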


4 Error in the Mean

Consider the following sets of measurements.

(a) Three measurements of the lifetime of an excited state, resulting in a mean value of [...] ns and a standard deviation of 0.211 ns
(b) Ten measurements, resulting in a mean of [...] ns and a standard deviation of 0.242 ns
(c) One hundred measurements, resulting in a mean of [...] ns and a standard deviation of 0.223 ns

All measurements agree that the mean is approximately 12 ns and the standard deviation is approximately 0.2 ns. Yet somehow we know that we trust the results from 100 measurements more than the results from just 3 measurements. What we want is an easy way to estimate the uncertainty or error in the value of the mean based on N measurements. This is called the standard error in the mean and is defined as

$$\sigma_\mu \equiv SE = \frac{\sigma}{\sqrt{N}} \tag{18}$$

Example

From the three cases mentioned above we would get standard errors of $0.211/\sqrt{3} = 0.12$ ns, $0.242/\sqrt{10} = 0.077$ ns, and $0.223/\sqrt{100} = 0.022$ ns, and we see that increasing the number of measurements reduces the standard error. Hence the lifetimes would be reported as [...] ± 0.12 ns, [...] ± 0.077 ns, and [...] ± 0.022 ns.

When you include either a standard deviation or a standard error, be sure to identify which it is in the text of the paper.

5 Chi-Square Test of Fit

Many times we wish to compare experimental data to a predicted function. The chi-square (χ²) statistic does this based on a histogram of the data. We consider the case of N measurements of a quantity x. The quantity may be discrete, such as the number of heads obtained from tossing 40 coins, or it may be continuous. If it is continuous it must be binned. A bin is a range of values for x, and given some data our first task is to choose the size and boundaries of the bins. I know of no definitive technique for doing this; here are two methods that have been mentioned:

Bin width = $2\sigma / \sqrt[3]{N}$

Number of bins = ln N

Suppose that there are n bins, and we know the predicted number of events for each bin. The χ² statistic is

$$\chi^2 = \sum_{\text{bins}} \frac{(\text{Observed} - \text{Expected})^2}{\sigma^2} \approx \sum_{\text{bins}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \tag{19}$$

where we use the Poisson estimate of the standard deviation, σ² = Expected.

Example

20 coins are tossed N = 40 times and the number of heads is recorded each time. On average the number of heads is 7.75, meaning the probability of getting a head is p = 7.75/20 = 0.3875. Using the Binomial Distribution, Equation (12), we can predict the distribution of heads.

Number of Heads    Times Observed    Times Predicted    Contribution to χ²
[...]

The sum of the last column is χ² = 7.50.

Is this a good χ² or not? A full statistical treatment tells us that we can expect a value roughly equal to the number of degrees of freedom. This is defined as the number of bins (12 in our example) less the number of constraints in our fit, which is 2 in our example: the number of measurements and the mean number of heads. Since our χ² < (12 - 2) we trust that we have a good fit.

6 Linear Least Squares, Simplest Case

Consider an independent variable x with no uncertainty and a dependent variable y that we expect to be related by

$$y = m x + b \tag{20}$$

We take N measurements of $(x_i, y_i)$ pairs and want to find the best values for the slope m and intercept b. Each value $y_i$ has a corresponding standard deviation $\sigma_i$. Define the deviation, $\Delta y$, as

$$\Delta y = y_i - y(x_i) = y_i - m x_i - b \tag{21}$$

and we can then form the χ² from the deviations,

$$\chi^2 = \sum_{i=1}^{N} \frac{(y_i - m x_i - b)^2}{\sigma_i^2} \tag{22}$$

For simplicity assume that all measurements have the same standard deviation, σ. We choose to define the best fit as the choice of slope and intercept that minimizes this χ². There are two variables, so we minimize by setting the partial derivatives to zero, $\partial\chi^2/\partial m = 0$ and $\partial\chi^2/\partial b = 0$, and solving the resulting equations for m and b.

$$\frac{\partial \chi^2}{\partial m} = -\frac{2}{\sigma^2} \sum (y_i - m x_i - b)\, x_i = 0 \tag{23}$$

or

$$\sum x_i y_i = m \sum x_i^2 + b \sum x_i \tag{24}$$

The other partial derivative results in

$$\sum y_i = m \sum x_i + b N \tag{25}$$

where we use the fact that $\sum_{i=1}^{N} 1 = N$. Solving the simultaneous equations results in our best-fit parameters

$$b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{N \sum x_i^2 - \left(\sum x_i\right)^2} \tag{26}$$

and

$$m = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - \left(\sum x_i\right)^2} \tag{27}$$

This technique can easily be extended to a polynomial fit, just with more simultaneous equations, typically solved by using determinants. Likewise we can sometimes make a non-polynomial function look like a polynomial. For example,

$$y = y_0 e^{-kx} \tag{28}$$

can be written as

$$\ln y = \ln y_0 - kx \tag{29}$$

One subtlety is that while the standard deviation in y may be the same for all points, the standard deviation in ln y will not be the same for all points.

Other functions cannot be linearized, and so the techniques described will not work. A simple example of such a non-linear function is

$$y = a_1 + a_2 e^{-a_4 x} + a_3 e^{-a_5 x} \tag{30}$$

Finding ways to fit such functions is very difficult, and well beyond these notes.
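The closed-form results of Equations (26) and (27), together with the linearization of Equations (28) and (29), can be sketched in plain Python as follows. The function name `fit_line` and the synthetic decay data (y0 = 5, k = 0.3, no noise) are assumptions made purely for illustration. The fit treats every point in ln y with equal weight, so it deliberately ignores the subtlety about unequal standard deviations in ln y noted above.

```python
import math

def fit_line(xs, ys):
    """Unweighted least-squares line y = m*x + b using Equations (26) and (27)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x**2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom       # Equation (27)
    b = (sum_xx * sum_y - sum_x * sum_xy) / denom  # Equation (26)
    return m, b

# Hypothetical noise-free data following y = y0 * exp(-k x) with y0 = 5, k = 0.3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [5.0 * math.exp(-0.3 * x) for x in xs]

# Linearize as in Equation (29): ln y = ln y0 - k x, then fit a straight line.
slope, intercept = fit_line(xs, [math.log(y) for y in ys])
k_fit = -slope
y0_fit = math.exp(intercept)

print(f"k  = {k_fit:.4f}")
print(f"y0 = {y0_fit:.4f}")
```

With real, noisy data a weighted fit (each term divided by its own $\sigma_i^2$, as in Equation (22)) would be the more careful choice after taking logarithms.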
