Introduction to Error Analysis

This is a brief and incomplete discussion of error analysis. It is incomplete out of necessity; there are many books devoted entirely to the subject, and we cannot hope to learn it all here! So, I will introduce a few methods of error analysis commonly used by scientists that will be helpful in our labs.

0.1 The main point to keep in mind!

It all boils down to this: Every measurement or calculated value of a physical quantity, let's say X, must include a value for the experimental uncertainty on that quantity (i.e., ΔX). If it does not include this uncertainty, the measurement is meaningless.

If you measure the length of, say, a piece of chalk, what would you report? It would depend on two things: the instrument you use to measure the length, and (not to be trivialized) how you define the length. For example, suppose you have a brand new, untouched piece of chalk, and you measured it with a ruler. You would likely just lay the chalk down next to the ruler and read the length visually. If you were careful with this, you could probably report the length to within ±1 mm, or, if you're super careful, to about ±0.5 mm. If you used a micrometer, you could likely obtain an accuracy of ±0.025 mm. Could you do better than this? Possibly, but the point I am trying to make here is that you need to report an uncertainty for each measurement. What that uncertainty value is depends upon your choice of measurement device.

To address the second point (i.e., how you define length, to continue the example), suppose we think in detail about measuring the length of the chalk. I was careful in this example to say a brand new, untouched piece of chalk, because I wanted you to picture in your head a perfect cylindrical solid, for which the idea of its length is obvious. But what about a real piece of chalk?
If you measure its length with a ruler, should you measure the length with the chalk lying on a table, standing on end, or is it better to do so on the International Space Station? Does it matter? In this case, although it DOES matter in actuality, the ruler and human eye are not precise enough to measure the difference. If you had a sufficiently precise measuring instrument, all three of the above methods would yield different results. It would be up to the scientist in this case to clearly state what
measurement method was used and under what conditions the measurement took place (i.e., horizontally, vertically, in free fall, etc.).

An additional complication arises from the fact that a real piece of chalk is not a perfect Euclidean solid: the two ends may not really be parallel, and the surfaces of the ends of the chalk are not locally flat if you look closely enough. This may all seem absurd to you, and you might be thinking: I mean, come on, you're just splitting hairs, right? Well, no, it's actually important to think about measurements in this way ALL the time when conducting scientific research. The uncertainty value you attach to a measurement is a quantitative reflection of the instrument used and of the care and conditions the object was in during the measurement. You have to boil all this down to a number and describe in the text of your paper how you arrived at this uncertainty. This being the case, let's be a little quantitative and address a common occurrence in the laboratory: how to attach an uncertainty to repeated measurements of a quantity.

0.2 Uncertainty from repeated measurements

Suppose that you measure some physical quantity repeatedly. For example, suppose you measure the time for an object to accelerate (starting from rest) through some distance, x. Suppose that you measure the following times:

TRIAL #   Time (s)
   1        5.22
   2        5.41
   3        5.63
   4        5.31
   5        5.03
   6        5.53
   7        5.41
   8        5.75

If asked to report the best value for the time, a sensible thing to do is to report the average (often called the mean) value of the time data,[1] where the average time, denoted ⟨t⟩ or t̄, is given by

    t̄ = (1/N) Σ_{j=1}^{N} t_j,

where N is the number of measurements of t; in the example above, N = 8. Hence, for the above data,

    t̄ = (5.22 + 5.41 + 5.63 + 5.31 + 5.03 + 5.53 + 5.41 + 5.75) / 8 = 43.29 / 8 = 5.41125 sec.

[1] You should be aware that there are other ways to calculate the best value; the mean is only one of several possibilities.
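If you want to check this arithmetic, the mean is easy to compute in a few lines of Python (the variable names here are my own, chosen just for illustration):

```python
# Measured times (seconds) from the table above.
times = [5.22, 5.41, 5.63, 5.31, 5.03, 5.53, 5.41, 5.75]

# Best estimate: the arithmetic mean, (1/N) * sum of the t_j.
N = len(times)
t_bar = sum(times) / N
print(t_bar)  # close to 5.41125, i.e. 5.41 to 3 significant figures
```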
Thus, we would report our best estimate for the time as roughly 5.41 seconds (rounded to 3 significant figures). Now, what is the uncertainty on this number? Again, there are several ways to go.

A conservative approach

The first method is to be very conservative (this has nothing to do with one's political leanings!) and state the maximum and minimum possible values relative to the mean. For example, in the data above, the maximum time is 5.75 seconds and the minimum time is 5.03 seconds. These values differ from the mean by +0.34 and −0.38 seconds, respectively. So, one could use the larger of these two values (to be the most cautious) and state the time plus an uncertainty as

    t = (5.41 ± 0.38) seconds

This is perhaps the most straightforward way to get the uncertainty, and it says that we are confident that the time lies pretty close to 5.41 seconds, but may vary by as much as 0.38 seconds from this value.

RMS deviation

Notice, however, that this uncertainty is very cautious; indeed, all but one measurement lies significantly closer to the mean value. Hence, it is reasonable to quote a smaller uncertainty. So here is another way to estimate the uncertainty. This second method is called the root mean square deviation (RMS deviation) and is computed by first calculating the deviation of each point from the mean, i.e.

    Δt_j = t_j − t̄.

In this way, values higher than the mean are positive deviations, and values smaller than the mean are negative deviations. The table below shows the deviations for the original data:

TRIAL #   Time (s)    Δt_j
   1        5.22     −0.19125
   2        5.41     −0.00125
   3        5.63     +0.21875
   4        5.31     −0.10125
   5        5.03     −0.38125
   6        5.53     +0.11875
   7        5.41     −0.00125
   8        5.75     +0.33875

Notice that the deviations for trials 2 and 7 are not zero; this is because I used the full value of t̄ (5.41125 seconds) in calculating Δt_j. At this point, you might be tempted to say: Oh great!
So now I just take the deviations, add them together, compute the average value, and this will be the uncertainty! Bad
luck. You see, the problem is that if you do this, you will get zero... try it and see! And the problem is even worse than this: the average deviation computed in this way will always be zero, because of the definition of the mean and the deviation we have used. So, we need a way around this trouble. This is where the terms root and mean square come into play.

The reason the average deviation is zero is that some of the deviations are positive and some are negative (in just the right amounts so that they sum to zero). So, we can make the deviations positive by first squaring them. Then, when we add them together, they will not sum to zero, and if we divide by the number of trials, we have the mean square deviation:

    (Δt)² = (1/N) Σ_{j=1}^{N} (Δt_j)².

Of course, if t has units of seconds, then (Δt)² has units of (seconds)², and the obvious way to remedy this situation is to take the square root, thus ending up with the root mean square deviation:

    Δt_RMS = √[ (1/N) Σ_{j=1}^{N} (Δt_j)² ].

Adding onto the previous data table, we have

TRIAL #   Time (s)    Δt_j       (Δt_j)²
   1        5.22     −0.19125   0.03658
   2        5.41     −0.00125   0.000002
   3        5.63     +0.21875   0.04785
   4        5.31     −0.10125   0.01025
   5        5.03     −0.38125   0.14535
   6        5.53     +0.11875   0.01410
   7        5.41     −0.00125   0.000002
   8        5.75     +0.33875   0.11475

                     average (Δt)²:  0.04611 sec²

The average squared deviation is 0.04611 s², and, taking the square root, the RMS deviation is 0.21473 seconds. Hence, by the root mean square method of calculating the uncertainty, we have, as our estimate of t:

    t = (5.41 ± 0.21) seconds

Notice that this uncertainty is considerably smaller than that derived from the conservative approach. The reasoning is that while one may occasionally measure a time that falls outside of the RMS bounds, on average the majority of measurements will fall within this range.

Mean absolute deviation

The last way of estimating the uncertainty is called the mean absolute deviation, and it proceeds along lines similar to the RMS method. However, instead of calculating the squares of the
deviations and then averaging them, we merely calculate the absolute value of each deviation and average those values. That is,

    ADev = ⟨|Δt|⟩ = (1/N) Σ_{j=1}^{N} |t_j − t̄|.

For the given data, here is what the absolute deviations would look like:

TRIAL #   Time (s)    |Δt_j|
   1        5.22     0.19125
   2        5.41     0.00125
   3        5.63     0.21875
   4        5.31     0.10125
   5        5.03     0.38125
   6        5.53     0.11875
   7        5.41     0.00125
   8        5.75     0.33875

            average |Δt|:  0.16906 sec

Hence, by this method, we would report our time and its uncertainty as

    t = (5.41 ± 0.17) seconds.

RMS vs. mean absolute deviation

Notice that of the three methods I have outlined, the mean absolute deviation has given the smallest uncertainty. The mean absolute deviation is often referred to as a robust estimate of the uncertainty, meaning that it is less sensitive to points far from the mean than the RMS method is. You can see this for yourself in the following simple example. Suppose that you measure a distance, x, three times (this is not enough times to get a good estimate of the average value; I am just using it to show the difference between the RMS and mean absolute deviations!) and obtain 3.0 m, 3.5 m, and 6.5 m. The average value is therefore 4.333 m, and the deviations are −1.333, −0.833, and +2.167 m, respectively. Hence the RMS and mean absolute deviations are:

    RMS deviation = √[ (1/3) ((−1.333)² + (−0.833)² + (2.167)²) ] = 1.55 m

    Mean absolute deviation = (1/3) (1.333 + 0.833 + 2.167) = 1.44 m

The RMS deviation is very sensitive to points far from the average, since it sums the squares of the deviations; indeed, the mean absolute deviation will consistently produce an uncertainty which is less than or equal to the RMS value. Hence, one data point far from the mean will more strongly affect an RMS uncertainty estimate. Partly because the RMS method is so sensitive to these outlying points, the mean absolute method is gaining wider usage in scientific circles these days.
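As a check on this robustness claim, the small helper functions below (names my own) compute both deviations for the three-point distance example; the outlying 6.5 m point inflates the RMS value more than the mean absolute one:

```python
import math

def rms_dev(xs):
    """Root mean square deviation of the data about its mean."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def mean_abs_dev(xs):
    """Mean absolute deviation of the data about its mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

dists = [3.0, 3.5, 6.5]   # metres; 6.5 m lies far from the mean of 4.333 m
print(round(rms_dev(dists), 2))       # about 1.55
print(round(mean_abs_dev(dists), 2))  # about 1.44

# The same comparison for the eight timing trials from earlier:
times = [5.22, 5.41, 5.63, 5.31, 5.03, 5.53, 5.41, 5.75]
print(round(rms_dev(times), 2), round(mean_abs_dev(times), 2))
```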
Which method should I use?

As far as this class is concerned, you may use any one of the three methods. But make sure you indicate which one you are using! If you do not have many measurements of a quantity (for which there is little excuse), it is probably best to err on the side of caution and use the conservative approach. But in most cases, you should be using the RMS or mean absolute deviation to compute your uncertainties. Take care to stick with one method within each lab; you might, for example, try RMS deviations in the first lab and absolute deviations in the second lab, so that you get practice with both methods.

QUESTIONS

1. Calculate the RMS and mean absolute deviations for the following data points: x = 0.10 m, 0.08 m, 0.12 m, 0.09 m, 0.10 m, and 0.11 m. Make a table showing the relevant deviations and their averages.

2. Prove the claim made earlier ("... the average deviation computed in this way will always be zero ..."); i.e., prove that for any set of N values x_j, the average deviation

    Δx̄ = (1/N) Σ_{j=1}^{N} Δx_j = 0.

3. Suppose that you measure two times with a stopwatch, t = 1.51 sec and t = 1.51 sec. What, if anything, is wrong with saying that the uncertainty in t is zero?

4. Suppose that you measure four times with a stopwatch: t = 1.51, t = 1.50, t = 1.40, and t = 1.48 seconds. What is the uncertainty you should report in your lab report? (There are several possible answers here, depending on which method you use. In your report of the uncertainty, label the method used to obtain it.)