Treatment of Error in Experimental Measurements


All measurements contain error. An experiment is truly incomplete without an evaluation of the amount of error in the results. In this course, you will learn to use some common tools for analyzing experimental uncertainties. While the methods you will learn here are by no means the most thorough, they are sufficient for many applications, and they are the basis of almost all advanced methods.

A. Some Basic Language

Experimental errors can generally be classified as one of two types:

Random (indeterminate): These errors occur with a random magnitude and sign each time the experiment is executed. The appearance of random error in every measurement is fundamentally unavoidable. However, efforts can be made to minimize the contributions from random errors.

Systematic (determinate): These errors occur with the same sign and approximate magnitude each time the experiment is executed. Systematic errors are called determinate because the experimenter should, in principle, be able to determine the source of these errors and therefore avoid them. In practice, the source(s) of systematic errors are sometimes difficult to identify.

Human errors, or mistakes, are a third type of error, which can be systematic or non-systematic, mathematical or procedural. Since errors due to mistakes tend to be larger than typical systematic or random errors, they are best spotted by comparing the values in a set of measurements: mistakes often stand out against the other, more accurate values. (This is one reason why single measurements of a property should be avoided.)

Students learning to write scientifically often cite "human error" as the principal source of error in their experiment. While it may be true that you are the principal source of error, such a vague statement is not acceptable. In most cases, you should be proficient enough in the laboratory and careful enough in your calculations that mistakes do not become the major contributor to the error.
However, if an accident occurs that does significantly influence the quality of your results, you should describe it, and its effects on the results, in detail. More often, the major source of error is random error, which can be quantified statistically (see below).

The quality of a particular measurement or set of measurements can be generally described using the terms precision and accuracy. A precise measurement is one that is highly reproducible, and thus has little associated random error. (Of course, we can only know that a measurement is reproducible if several measurements are obtained. However, the comparison is often made against some standard that was obtained from multiple measurements elsewhere, such as by the manufacturer of an instrument.) An accurate measurement is one whose value is close to the true value, and thus has little or no systematic error. Accuracy can apply to a single measurement or to the average of a set of measurements. Note that precision and accuracy are independent expressions of the quality of the measurement(s). For example, a set of imprecise measurements may still be quite accurate. Four possible combinations of precision and accuracy are illustrated in Figure 1.

Figure 1: Precision versus accuracy. The center of the target represents the true value of a property, and the bullets represent the measured values of a property. (a) A set of measurements that are both accurate and precise. (b) A set of imprecise measurements; the average is accurate, although each individual measurement is inaccurate. (c) A set of precise, but inaccurate, measurements. (d) A set of imprecise, inaccurate measurements.

B. Statistical Analysis of Random Errors

A good assessment of the error in any experiment ultimately comes down to judgement on the part of the experimentalist. In a well-designed and well-executed experiment, there will be very little systematic error. Therefore, the uncertainty in the measurement can be explained by assuming the errors arise in a random way. This is fortunate, because random errors can be analyzed statistically, while systematic errors cannot. In this course, we will make the assumption that the experiments are well designed and executed so that we can use these methods. However, you must always check this assumption after you have obtained your final results to be sure that it is valid (see section D). Statistical methods also exist for identifying mistakes ("outliers"), although we will not cover them in this course. See SGN and/or Skoog for a good discussion of statistical error analyses.

1. An Infinite Number of Measurements: The Statistical Starting Point

Clearly the goal of making a measurement is to obtain the true value of the property that we are trying to measure. However, since all measurements have error, we are unlikely to measure the true value exactly. The more measurements we make, the more likely it is that we will obtain the correct value for the property of interest.
While we will never actually work with an infinite number of measurements, it is useful to examine the errors associated with an infinite sample set because the treatments for the more realistic finite sample sets are derived from this limiting case. Regardless of the sample size, the probability of obtaining any given value as a result of making a measurement can be described by some distribution function. For an infinite sample set, the distribution of measured values (x) about the true value (µ) is described by a normalized Gaussian, or Normal, function:

    f(x) = [1/(σ√(2π))] e^(−(x − µ)²/2σ²)                                  (1)

The group of constants in front of the exponential term serves to normalize the function (a function is normalized if the integrated area under the curve is equal to one). This function is sometimes also called a bell function because the curve is bell-shaped (Figure 2). The width of the bell is determined by the constant σ, which is called the standard deviation. For infinite sample sets, the standard deviation is exactly equal to the root-mean-square error,

    σ = lim(N→∞) √[(1/N) Σᵢ (xᵢ − µ)²]                                     (2)

where N is the number of measurements in the set (infinity, in this case). The term root-mean-square (or RMS) arises often in statistics, and it means exactly that (in reverse order): the error (xᵢ − µ) is first squared, then averaged (to get the mean), and finally square-rooted. The RMS error is therefore a measure of the average magnitude of the random error in the experiment. (One place you have almost certainly seen the term RMS in chemistry is in the Kinetic Molecular Theory of gases, where the RMS velocity of the gas molecules dictates their average kinetic energy, among other things.)

Figure 2: The Normal (Gaussian) distribution function for an infinite sample set (σ = 10, µ = 50). The shaded region represents the 68.3% confidence interval.

If we have made an infinite number of measurements and if the error is indeed randomly distributed, then the mean of the distribution is exactly equal to the true value:

    µ = lim(N→∞) (1/N) Σᵢ xᵢ                                               (3)

This simple result allows us to evaluate the quality of any single measurement by identifying the probability that it lies within a certain range of the mean (true value). The range is usually expressed in units of σ. For example, we could ask: "What is the probability that a measured value lies within ±σ of the true value (i.e., that x = µ ± σ)?", which is equivalent to asking: "What is the probability that the error in a measurement is less than σ in magnitude (i.e., that |x − µ| < σ)?" We can find this probability by integrating the (normalized) probability function in equation (1) over the range of possible values of x, i.e., from µ − σ to µ + σ (the integral must be evaluated numerically):

    P = ∫ from µ−σ to µ+σ of f(x) dx = 0.683                               (4)
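The probability in equation (4) is easy to check numerically. Below is a minimal Python sketch; rather than integrating explicitly, it uses the closed-form error function, which gives the area under the Normal curve within ±Zσ of the mean:

```python
import math

def normal_cdf(z):
    # Cumulative distribution of the standard Normal, via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# P that a measurement lies within mu +/- Z*sigma, for a few standard scores
for z in (1.0, 1.96, 2.0, 3.0):
    p = normal_cdf(z) - normal_cdf(-z)
    print(f"Z = {z:4.2f}  ->  P = {p:.4f}")
# Z = 1.00 gives P = 0.6827; Z = 1.96 gives P = 0.9500
```

The Z = 1 row reproduces the 68.3% result of equation (4); the other rows anticipate the standard scores introduced below.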

The result is shown in Figure 2 as the shaded region. We can phrase the result of equation (4) in several ways:

68.3% of the measured values in an infinite sample set lie within ±σ of the true value.

A given measurement in an infinite sample set has a 68.3% chance of having an error less than σ in magnitude.

The true value has a 68.3% chance of lying within ±σ of any measured value in an infinite sample set.

There are many other permutations of these statements, but they are all equivalent. Since we will usually be interested in representing our measurement as an approximation to the true value, we will pick the third statement, and thus express the quality of our measurements in the form:

    µ = x ± σ with 68.3% probability                                       (5)

We now need to introduce some statistics terms. The value of (P × 100%) is termed the confidence level. In the above example, P = 0.683, so based on our measurement x, we feel that we know µ with 68.3% confidence. Alternatively, we are 68.3% sure that our measured value is correct. The value of the error limit is called the confidence limit, and is given the symbol λ_P. (In this case, λ_0.683 = σ.) The range of values (x ± λ_P) is called the (P × 100%) confidence interval, and is where we expect the true value to lie P × 100% of the times that we make a measurement. (In this case, the 68.3% confidence interval lies between the limits (x + σ) and (x − σ).)

While all of this language and formalism is a bit intimidating at first, it does allow us to make very specific statements about the error in our measurements and the corresponding faith that we have in our result. Most scientists are therefore willing to tolerate the somewhat cumbersome nature of statistical terminology. We are certainly not limited to describing our results with only 68% confidence; we can increase our confidence in our result by increasing the size of the confidence interval.
In general, the confidence interval is expressed in terms of Z standard deviations:

    µ = x ± Zσ                                                             (6)

where the number Z is called the standard score. Values of the confidence level (P) as a function of the standard score (Z) for infinite sample sets are tabulated in most statistics books, and are available in the table of SGN (p.45). Notice that we, the experimenters, get to manipulate the size of the error that we report by changing the value of Z! However, there is a tradeoff: if we choose a smaller confidence limit (a smaller value of Z), we will be less sure that the true value lies within the reported confidence interval. That is, we are less sure that the value that we report is correct. In practice, rather than choose a Z, most scientists choose to work at a particular confidence level (P), and use the tables to determine the corresponding value of Z. In this course, we will use 95% as our standard confidence level, for which Z = 1.96. Thus, for infinite sample sets:

    µ = x ± 1.96σ        (N = ∞, 95% confidence)

2. A Large, Finite Number of Measurements

In reality we can never obtain an infinite number of measurements. We can, however, strive to make a large number of measurements and hope to approximate an infinite sample set. Of course, we cannot exactly determine how close we are to the true value if we make a finite number of measurements, no matter how careful we are. Ultimately, the error in the final result will depend both on the quality of the approximations that we are forced to make and on the quality of the measurements themselves. If N is large enough (N > 20 is generally accepted as "large"), the distribution of measured values can be fairly well approximated by a Normal distribution:

    f(x) = [1/(s√(2π))] e^(−(x − x̄)²/2s²)                                  (7)

where the true value is now approximated by the mean (average), x̄:

    µ ≈ x̄ = (1/N) Σᵢ xᵢ                                                    (8)

and the width of the distribution (the standard deviation) is now approximated by s:

    σ ≈ s = √[(1/(N − 1)) Σᵢ (xᵢ − x̄)²]                                    (9)

According to equation (8), the average is the best approximation of the true value that we can get with a finite number of measurements. Since x̄ represents a combination of several measurements, it is reasonable to expect that x̄ carries less random error than a single measurement. This is of course why we take an average, and why we use the average as the approximation of the true value.

Whether or not the errors are in fact random, repeated experiments will produce a distribution of means about the true value.³ This distribution of means turns out to also be described by an approximately Normal distribution function, g(x̄),

    g(x̄) = [1/(s_x̄√(2π))] e^(−(x̄ − µ)²/2s_x̄²)                             (10)

where the standard deviation of g(x̄) is called the standard deviation of the mean, s_x̄. The standard deviation of the mean is a better measure of the uncertainty in our reported result than is s, just as x̄ is a better approximation of µ than is any single measurement x.
The standard deviation of the mean is related to the standard deviation of f(x), s, through the square root of the number of measurements:

³ If you are uncomfortable with this idea, imagine conducting an experiment in which you measure the value of a property a large number of times. The mean of this set of measurements, x̄, is an approximation of the true value, µ. Now you conduct the experiment again, measuring the value of the same property a large number of times. The mean of this second set of measurements is probably slightly different from the mean of the first set, since the mean is only an approximation of the true value and thus carries error. If you think about conducting the experiment a large number of times, you should be willing to believe that you would obtain a distribution of means.

    σ_x̄ ≈ s_x̄ = s/√N                                                      (11)

Examples of exact and approximate Normal distribution functions are shown in Figure 3.

Figure 3: Normal distribution functions for an infinite sample set (line; σ = 10, µ = 50) and a large finite sample set (bars).

There are several technical points to notice about equations (7)-(9):

The number of measurements used in the approximate standard deviation, s, is N − 1 (eq. 9), which differs from that in the exact standard deviation, σ (N, eq. 2). This number is called the number of degrees of freedom, which is equal to the number of independent variables. There is one fewer degree of freedom in the approximate case because one is used up in constructing x̄. The idea of degrees of freedom also arises in chemistry, physics, mathematics, and other fields.

As N → ∞, x̄ → µ and s → σ. Therefore, if N is large enough, then x̄ is a good approximation of µ and s is a good approximation of σ. In this limit, the true value will lie within ±s_x̄ of x̄ with 68.3% probability, just as in the infinite sample set case (s_x̄ is said to be statistically meaningful).

In analogy to the infinite sample set case (equation 6), the confidence interval for a large, finite sample set is expressed using the approximations introduced in this section (µ ≈ x̄ and σ_x̄ ≈ s_x̄), so that:

    µ ≈ x̄ ± Z s_x̄        (N > 20)                                          (12)

where the values of Z are the same as those for an infinite sample set. By using the mean rather than a single measurement x, we decrease the uncertainty in our reported result by a factor of √N. This is another reason why it is a good idea to make as many measurements of a single property as possible. As we have said, in this course P = 95%, which means that Z = 1.96. Thus,

    µ ≈ x̄ ± 1.96 s_x̄        (N > 20, 95% confidence)
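The large-sample treatment in equations (8), (9), (11), and (12) can be sketched in a few lines of Python. The data below are hypothetical, simulated from a Normal distribution with µ = 50 and σ = 10, and the standard score is computed from the inverse Normal cumulative distribution (Python 3.8+) rather than read from a table:

```python
import math
import random
from statistics import NormalDist

random.seed(1)
# Hypothetical data: 100 repeated measurements of a property whose
# true value is 50, with random error of sigma = 10
data = [random.gauss(50, 10) for _ in range(100)]

N = len(data)
mean = sum(data) / N                                          # eq. (8)
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (N - 1))   # eq. (9)
s_mean = s / math.sqrt(N)                                      # eq. (11)

# Standard score at the 95% confidence level (about 1.96)
z95 = NormalDist().inv_cdf((1 + 0.95) / 2)
print(f"mean = {mean:.1f} +/- {z95 * s_mean:.1f} (95% confidence, large N)")
```

Note that the reported uncertainty uses s_x̄ = s/√N, so quadrupling the number of measurements halves the confidence limit.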

3. A Small Number of Measurements

It is often impractical to obtain even 20 measurements of a single property. Unfortunately, when N < 20, the distribution of measured values is not well approximated by a Normal function. Instead, we must describe the distribution of measured values about the true value using another distribution function, called the Student t function. The form of the Student t function is complex; for our purposes you need only know that it depends on the value of N. Fortunately, even though the distribution function is different, the form of the equation for the confidence interval is quite similar to that for a large sample set:

    µ ≈ x̄ ± t(P,ν) s_x̄        (N < 20)                                     (13)

Just as before, the value of s_x̄ is calculated from equation (11). The difference here is that the standard score, Z, is replaced by the Student t value, t(P,ν), where P is the confidence level and ν = N − 1 is the number of degrees of freedom. At 95% confidence,

    µ ≈ x̄ ± t(0.95,ν) s_x̄        (N < 20, 95% confidence)

Values of t(P,ν) are tabulated in Table 3 (p.49) of SGN. As an example, the expression for the 95% confidence interval for a set with N = 4 would be:

    µ ≈ x̄ ± t(0.95,3) s_x̄ = x̄ ± 3.18 s_x̄

Notice that the uncertainty for an N = 4 sample set is quite a bit larger than for the large (N > 20) sample set case. In fact, t(P,ν) increases dramatically as N decreases, to the alarming limit of 12.7 when N = 2 (at 95% confidence). We are once again motivated to make our sample set as large as possible! In the other extreme, t(P,ν) → Z as N → ∞, and the Student t distribution reverts to the perfect Gaussian distribution of infinite sample sets.

4. Estimation of Error

Sometimes it is impractical to calculate even an approximate standard deviation. In these cases, we can only estimate the uncertainty associated with a measurement by non-statistical means.
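Before moving on, the small-sample confidence interval of equation (13) can be sketched in Python. The t values below are taken from standard two-sided 95% t-tables, and the four replicate measurements are hypothetical:

```python
import math

# Two-sided Student t values at 95% confidence, keyed by degrees of
# freedom nu = N - 1 (values from standard t-tables)
T95 = {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
       6: 2.45, 7: 2.36, 8: 2.31, 9: 2.26}

def confidence_interval_95(data):
    """Return (mean, confidence limit) per eq. (13) for a small data set."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # eq. (9)
    s_mean = s / math.sqrt(n)                                     # eq. (11)
    return mean, T95[n - 1] * s_mean

# Hypothetical set of four replicate measurements
mean, limit = confidence_interval_95([10.1, 10.4, 9.8, 10.2])
print(f"mu = {mean:.3f} +/- {limit:.3f} (95% confidence)")
```

For these four points the mean is 10.125 with s_x̄ = 0.125, so the 95% confidence limit is 3.18 × 0.125 ≈ 0.40.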
The most common approach is to estimate the random error associated with the measurement based either on a history of measurements (e.g., manufacturer specifications) or on observed random noise from an instrument. For example, the manufacturer may state that a balance is good to ±0.001 g. This estimated uncertainty can usually be treated as the 95% confidence interval. However, if the balance is in a very windy room and you observe fluctuations of several milligrams due to the wind, it would be more prudent to assign a 95% confidence interval of perhaps ±0.003 g, depending on the size of the fluctuations that you observe. In estimating the error this way, you must be sure that all possible sources of error are accounted for. For example, if the compound that you are weighing tends to stick to the weigh paper, not all of the compound that is weighed will actually be used in the experiment, and this additional uncertainty must be evaluated (it might be negligible). Of course, if you make many measurements, you could estimate the error statistically.

In general, the bulk of the contribution to the overall uncertainty in a measurement will come from only a few sources. Because it is often difficult to analyze the relative importance of all errors while making measurements, it is good practice to write down estimated errors for all measurements as you obtain the data; they can always be ignored later if they turn out to be negligible.

C. Propagation of Error

It is a rare thing indeed to be able to directly measure the value of a property of interest. More often, we must obtain its value indirectly from the measurement of some related property. Since the measurement has error, the value of our property of interest will also have error. We now turn to the task of propagating experimental uncertainties through calculations. We will examine two methods: Differential Error Analysis, which is most useful if we are calculating a value using an analytical expression (an equation), and Linear Least Squares Analysis, which is most useful if we are correlating two or more sets of data (i.e., plotting the data, although you may not need to actually make the plot).

1. Differential Error Analysis

Suppose that the value of some physical property, F, is related to the values of the measured properties x and y by a mathematical expression (the treatment here can easily be extended to more variables). The goal of differential error analysis is to determine the uncertainty in F by propagating the uncertainties in x and y through the mathematical expression. In doing so, we will assume that we have some measure of the uncertainties in x and y (either from statistics or by estimation), and that the uncertainties are uncorrelated (independent of each other).

Derivation of the DEA Equation: We begin by examining the total differential of F with respect to the measurement variables x and y:

    dF = (∂F/∂x) dx + (∂F/∂y) dy                                           (14)

The terms dF, dx, and dy represent infinitesimal changes in F, x, and y, respectively. The values of F, x, and y will vary due to experimental errors. We expect our experimental errors to be small, but probably not infinitesimally small. Therefore, we will write each term as a Taylor expansion about the mean.
For example,

    dx ≈ (x − x̄) + a(x − x̄)² + b(x − x̄)³ + …                               (15)

Similar expressions can be written for dy and dF. The expression (x − x̄) is a sort of raw error in x (the deviation of x from the mean). Since our errors are (hopefully) small, the higher order terms can be neglected (they are even smaller). Therefore, we can rewrite equation (14) as:

    (F − F̄) ≈ (∂F/∂x)(x − x̄) + (∂F/∂y)(y − ȳ)                              (16)

Recall that the standard deviation is a root-mean-square error, and that an RMS expression is constructed by squaring, then averaging, then square-rooting. Therefore, we first square equation (16),

    (F − F̄)² = (∂F/∂x)²(x − x̄)² + (∂F/∂y)²(y − ȳ)² + 2(∂F/∂x)(∂F/∂y)(x − x̄)(y − ȳ)      (17)

then construct a mean (an average). Assuming that both x and y have been measured N times,

    (1/N) Σᵢ (Fᵢ − F̄)² = (∂F/∂x)² (1/N) Σᵢ (xᵢ − x̄)² + (∂F/∂y)² (1/N) Σᵢ (yᵢ − ȳ)²
                          + 2(∂F/∂x)(∂F/∂y) (1/N) Σᵢ (xᵢ − x̄)(yᵢ − ȳ)                    (18)

There are two types of error terms in equation (18): squared terms (such as (xᵢ − x̄)²) and cross terms (such as (xᵢ − x̄)(yᵢ − ȳ)). Since the squared terms are always positive, they will always contribute to the uncertainty in F. However, since the cross terms contain products of independent uncertainties, they will sometimes be positive and sometimes be negative (that is, for some measurements (xᵢ − x̄) will be negative while (yᵢ − ȳ) is positive, and vice versa). If N is large enough, the cross error terms should cancel, and the average of the cross terms should approach zero. Therefore, we will ignore the contributions from the cross terms in equation (18). Recognizing that the averaged squared error is equal to the squared standard deviation (the mean-square error), we obtain:

    s_F̄² = (∂F/∂x)² s_x̄² + (∂F/∂y)² s_ȳ²                                   (19)

where we have used the approximate standard deviations of the means because these are the best estimates of the uncertainties in x and y. We are almost there. The final step is to multiply the entire equation by the squared standard score (or t value) at the chosen confidence level in order to convert the squared standard deviations into squared confidence limits:

Differential Error Analysis Result:

    λ_F² = (∂F/∂x)² λ_x² + (∂F/∂y)² λ_y²                                   (20)

Thus, the square of the uncertainty in F is governed by the squares of the confidence limits in the measured values x and y, and by the partial derivatives of F with respect to x and y. If we know the analytical expression for F in terms of x and y, then we can use equation (20) to determine the uncertainty in F. One benefit of differential error analysis is that the effect of each source of error on the final outcome can be evaluated independently.
For example, if the uncertainty in y (λ_y) dominates the uncertainty in F, we might try to redesign the experiment so that y is measured with less error, while preserving our technique for measuring x.
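The differential error analysis result can be evaluated numerically even when the partial derivatives are tedious to write down, by approximating each partial with a finite difference. A Python sketch follows; the density example, the numbers, and the function names are all hypothetical:

```python
import math

def propagate(F, values, limits, h=1e-6):
    """Combine confidence limits in each measured variable through
    numerically estimated partial derivatives (differential error analysis)."""
    total = 0.0
    for i, (v, lam) in enumerate(zip(values, limits)):
        shifted = list(values)
        shifted[i] = v + h
        dFdv = (F(*shifted) - F(*values)) / h  # forward-difference partial
        total += (dFdv * lam) ** 2
    return math.sqrt(total)

# Hypothetical example: density rho = m / V from a mass and a volume
def rho(m, V):
    return m / V

m, V = 12.345, 5.00            # g, mL (made-up measurements)
lam_m, lam_V = 0.002, 0.02     # 95% confidence limits in m and V
lam_rho = propagate(rho, (m, V), (lam_m, lam_V))
print(f"rho = {rho(m, V):.4f} +/- {lam_rho:.4f} g/mL (95% confidence)")
```

Dropping each variable's term from the sum in turn shows its individual contribution; here the volume term dominates, so the volume measurement would be the one to improve.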

2. Least Squares Analysis

It is often more convenient to extract the values of some physical properties through a graphical analysis rather than from a direct calculation, particularly when the system is overdetermined (there are more sets of data than values to extract). In a least squares analysis, the idea is to find the best fit of the experimental data to a function that contains some number of adjustable parameters (one or more of which represent the properties of interest). In a linear least squares analysis, the function is a linear equation. Since the equation for a line has at most two adjustable parameters (equation (21)), we must have significantly more than two data points to perform a meaningful linear least squares analysis. We will use only the linear version in this course; SGN has a good treatment of least squares analyses that includes more complicated cases (p.70-73).

Derivation of the Least Squares Line

In a linear least squares analysis, the goal is to extract values for the properties α and β by obtaining the best fit of our experimental data (x, y) to the linear function

    y = α + βx                                                             (21)

where y is the dependent variable and x is the independent variable. Three assumptions are made in performing a linear least squares analysis:

A linear relationship exists between the dependent and independent variables.

The independent variable is assumed to be exact; all of the error is assumed to be random and is forced into the dependent variable.

All measurements of the same quantity have equal uncertainties.

In general these are reasonable assumptions. However, if a very poor fit is obtained, it may be that x and y are not linearly related, or that one of the other assumptions is invalid. The least squares approach is similar to the method we would use if we were fitting a line to a set of points by eye: we want the differences between the experimental values of y (the points) and the values of y calculated from equation (21) (the line) to be small.
This amounts to adjusting the values of α and β so that the calculated and measured values of y are as close as possible. We will distinguish between measured values of a property and calculated values of a property by writing a hat (^) over the variable that is calculated from equation (21). We begin by defining a quantity called the residual, rᵢ, which represents the vertical (because all error resides in y) deviation of the measured value (no hat) from the fitted line (hats):

    rᵢ = yᵢ − ŷᵢ = yᵢ − (α̂ + β̂xᵢ)                                          (22)

Keep in mind that from the least squares point of view, the experiment has been completed, which means that the values of x and y are now fixed, whereas α and β are what we are trying to vary. Therefore, we will never have a calculated value of x, and we will always have calculated values of α and β. As in the differential error analysis treatment, we wish to work with errors that will not accidentally cancel. In a least squares analysis, this comes in the form of the chi-square function, χ², which is another kind of mean-square error:

    χ² = Σ r_i² = Σ [y_i − (α̂ + β̂ x_i)]²                              (3)

As usual, N is the number of pairs of (x, y) measurements, and the sums run from i = 1 to N. To perform a least squares analysis, we minimize ("least") the chi-square ("squares"). The minimum value of χ² will occur where its first derivatives with respect to each adjustable parameter are zero. That is, to minimize χ², we require that

    ∂χ²/∂α̂ = 0,  which gives  Σ y_i − N α̂ − β̂ Σ x_i = 0               (4a)

and

    ∂χ²/∂β̂ = 0,  which gives  Σ x_i y_i − α̂ Σ x_i − β̂ Σ x_i² = 0      (4b)

(Recall that the adjustable parameters α̂ and β̂ are the variables here.) Equations (4a) and (4b) can be solved simultaneously to yield expressions for the best values of α and β, which was our original goal:

LLS Result:

    β̂ = [N Σ x_i y_i − (Σ x_i)(Σ y_i)] / [N Σ x_i² − (Σ x_i)²]
       = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²                          (5a)

and

    α̂ = ȳ − β̂ x̄                                                       (5b)

The two expressions in equation (5a) are equivalent. For spreadsheet analyses, the second form will be more convenient.

Uncertainties in the Least Squares Fit Parameters

A differential error analysis treatment must be performed on equations (5a, 5b) in order to obtain expressions for the uncertainties in the best fit values of α and β. This is an unpleasant task at best. Only the results of this analysis are presented here, but you should understand where they come from and, in principle, how to get them. (Note that this treatment uses the first form of equation (5a) as a starting point.) For simplicity, we define the quantity D to be the denominator of equation (5a):

    D = N Σ x_i² − (Σ x_i)²                                            (6)

The approximated standard deviations in the best fit values of α and β are then given by:

Error in LLS Parameters:

    s_α̂ = sqrt[ (Σ x_i²)(Σ r_i²) / ((N − 2) D) ]                      (7a)

    s_β̂ = sqrt[ N (Σ r_i²) / ((N − 2) D) ]                            (7b)

where (N − 2) is equal to the number of degrees of freedom in the fit. (Notice that if N = 2, then there are no degrees of freedom in the fit, so the fit is not overdetermined. The quality of the fit in this case cannot be statistically determined; that is, two points exactly define a line.) The confidence intervals in the LLS parameters are calculated as usual, using ν = N − 2. For example, the 95% confidence interval for α̂ is given by:

    λ_0.95,α̂ = t_0.95,ν · s_α̂                                          (8)

The percent uncertainties in α̂ and β̂ can be used to estimate the quality of the fit and the validity of the assumptions made on p. 35. Quality in this sense is subjective: it is up to the scientist (you) to determine what is acceptable, and what is not.

D. Interpretation of Uncertainties

There is always a temptation to consider an error analysis complete once we know the size of the confidence interval for our property of interest. However, the interpretation of the error cannot be neglected. There are many ways to interpret errors, all of which depend on the application. Two common methods are comparison of the experimental result to a known value (i.e., a literature result), and comparison of the error to the experimental value. Clearly it is best to make both comparisons if possible.

1. Comparison to Known Results

If we have correctly placed the true value within our confidence interval, and if the known value is in fact correct (close to the true value), then the known value should fall within our experimental confidence interval. Therefore, one very good check for the presence of systematic error is a comparison of the experimental confidence interval with a known result. This amounts to identifying the accuracy of our experimental result, provided that we believe the known value to be a good estimate of the true value.
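This comparison is a simple interval test. A minimal sketch (the experimental slope, its confidence interval, and the "known" value below are all invented for illustration):

```python
# Interval test for agreement with a known (literature) value.
# All numbers below are invented.

def consistent_with(known, value, ci):
    """True if the known value lies inside value +/- ci, i.e. no evidence
    of systematic error at this confidence level."""
    return abs(value - known) <= ci

# Experimental slope 0.97 with 95% confidence interval +/- 0.34,
# hypothetical literature slope 1.00:
print(consistent_with(1.00, 0.97, 0.34))   # True
```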
Bear in mind, however, that systematic error is not the only possible reason why the known value would not fall within our confidence interval. Other possibilities include omission of a source of random error in our error analysis, and an unreliable known value. Be sure to consider all three options before suggesting a reason for disagreement with a known value or pronouncing an experiment accurate (or inaccurate).
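Putting the earlier formulas together, the sketch below implements equations (5a) (second form), (5b), (6), (7a), (7b), and (8) in plain Python. This is an illustration, not the manual's prescribed procedure: the data are invented, and the t value is a standard Student's t table entry for ν = 2.

```python
import math

def lls(x, y):
    """Best-fit intercept and slope, equations (5a, second form) and (5b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    beta = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))          # equation (5a)
    alpha = y_bar - beta * x_bar                           # equation (5b)
    return alpha, beta

def lls_uncertainties(x, y, alpha, beta):
    """Standard deviations of the fit parameters, equations (6)-(7b)."""
    n = len(x)
    sum_x2 = sum(xi ** 2 for xi in x)
    D = n * sum_x2 - sum(x) ** 2                           # equation (6)
    sum_r2 = sum((yi - (alpha + beta * xi)) ** 2           # residuals, eq. (2)
                 for xi, yi in zip(x, y))
    s_alpha = math.sqrt(sum_x2 * sum_r2 / ((n - 2) * D))   # equation (7a)
    s_beta = math.sqrt(n * sum_r2 / ((n - 2) * D))         # equation (7b)
    return s_alpha, s_beta

x = [0.0, 1.0, 2.0, 3.0]                  # invented data, N = 4
y = [0.1, 0.9, 2.2, 2.9]
alpha, beta = lls(x, y)                   # beta ≈ 0.97, alpha ≈ 0.07
s_alpha, s_beta = lls_uncertainties(x, y, alpha, beta)

# 95% confidence interval for the slope, equation (8);
# t(0.95, nu = N - 2 = 2) = 4.303 from a Student's t table.
lam_beta = 4.303 * s_beta
```

With these four points the fit gives a slope of about 0.97; whether its confidence interval is "small enough" is, as the text says, up to the experimenter.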

2. Comparison to Experimental Results

If the random error is very small, then we would expect that the standard deviation is small, which would make our confidence interval also small. Therefore, we can identify the precision of the experiment by examining the size of the confidence interval. Typically this is done by calculating a percent error, for example:

    % error = (λ_P / x̄) × 100%                                        (9)

Again, bear in mind that you may have omitted a source of random error from your analysis. If the error is large (large must be defined by the experimenter), you should try to identify the source of the large error (differential error analysis can help you here), and come up with suggestions for reducing the error.

References

Shoemaker, D. P., Garland, C. W., & Nibler, J. W., Experiments in Physical Chemistry, 6th Ed., McGraw-Hill, New York (1996), Chapters II and XXII. (This text is denoted SGN within this manual.)

Skoog, D. A., Principles of Instrumental Analysis, 3rd Ed. (or later), Saunders College Publishing, New York (1985).