Chem 321 Lecture 4 - Experimental Errors and Statistics 9/5/13

The tolerances noted for volumetric glassware represent the accuracy associated with their use. Accuracy is a measure of absolute error: how close a measured quantity (for example, the delivered volume) is to the true or accepted value (the expected volume). The closer the measurement is to the true value, the more accurate it is. If a pipet is rated as 10.00 ± 0.02 mL, then with proper use it should deliver a volume between 9.98 and 10.02 mL. Precision, on the other hand, is a measure of how reproducible a series of measurements is; the more reproducible the measurements, the better the precision. The difference between these two terms is illustrated in Figure 2.1.

Figure 2.1 A visual representation of accuracy and precision

When an experiment is repeated many times, the exact same result is not obtained each time. Consider the following set of results obtained in the calibration of a 2.0-mL glass transfer pipet.

Calibration Results for a 2.0-mL Transfer Pipet

1.998 mL   1.991 mL   2.001 mL   1.999 mL
2.003 mL   1.998 mL   1.997 mL   2.002 mL

Several probing questions are suggested by these measurements.

1. Why is there a range of values?
2. What value is the best estimate of the true value?
3. What can be said about the reproducibility of the results?
4. What can be said about the accuracy of the results?
5. On what basis should one or more of the data points be rejected?

Why is there a range of values?

Every measurement is associated with error that cannot be compensated for. This is called indeterminate error. It is also known as random error because there is an equal chance of the measurement having a positive or negative error. If several factors are involved in the measurement, the random errors from all factors add together. The overall effect of random errors is to make some measurements in a series smaller than the true value and some measurements larger, resulting in a range of values.

Suppose you flip a coin 10 times and record the number of heads that occur. If you repeat this hundreds of times, the measurements form a distribution of values that reflects the combinations of possible indeterminate errors. When a very large number of measurements is made, a bell-shaped plot of the frequency of a certain result versus the result value, known as a Gaussian distribution or normal error curve, is formed (see Fig. 2.2). Since random errors are equally likely to be positive or negative, this curve has a symmetrical shape. There is an exponential decrease in frequency as the measured value gets very large or very small compared to the other values. This indicates that large indeterminate errors occur less often than small ones.

Figure 2.2 A Gaussian distribution results from a very large number of measurements
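The coin-flip experiment described above can be sketched as a short simulation. This is only an illustration under stated assumptions: the function name `simulate_heads`, the choice of 500 repetitions, and the fixed random seed are not from the lecture, and the simulated numbers will not match the values behind Figure 2.2.

```python
import random
import statistics


def simulate_heads(trials, flips=10, seed=0):
    """Return the number of heads seen in each of `trials` runs of `flips` fair coin flips."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the sketch is repeatable
    return [sum(rng.random() < 0.5 for _ in range(flips)) for _ in range(trials)]


counts = simulate_heads(trials=500)

# Tally how often each possible result (0 to 10 heads) occurred.
histogram = {heads: counts.count(heads) for heads in range(11)}

# Crude text plot: one '#' per 5 occurrences, which should look roughly bell-shaped.
for heads, freq in histogram.items():
    print(f"{heads:2d} heads: {'#' * (freq // 5)}")

print(f"mean = {statistics.mean(counts):.2f}")
print(f"standard deviation = {statistics.pstdev(counts):.2f}")
```

With enough repetitions the tally should peak near 5 heads and fall off symmetrically on both sides, which is the behavior the normal error curve summarizes.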

On the other hand, systematic error, or determinate error, generally has a definite value and an assignable cause; in principle it can be measured and accounted for. Often it will result in all measurements being too large or too small. The tolerance associated with a transfer pipet is an example of a systematic error. Calibration of the pipet largely eliminates this error.

Check for Understanding 2.1

1. The figure below indicates the experimental results of 4 analysts. Each black dot indicates the result for one measurement.
a) Characterize the set of results from each analyst in terms of its accuracy and precision.
b) Indicate on the figure the determinate error associated with the results from analyst 3.
c) Indicate on the figure the largest indeterminate error associated with the results from analyst 2.

What value is the best estimate of the true value?

In the absence of systematic errors, the distribution of a very large number of measurements (n) will be centered on a value with no determinate error; that is, the true value (also known as the population mean, μ). The best estimate of the true value is generally the average, or sample mean. The average (x̄) is calculated as

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where x_i is an individual value and n is the total number of measurements. The average of the 8 pipet calibration results above is 1.998₆ mL. Note that an extra, non-significant digit is reported, as a subscripted digit, to avoid round-off errors if the average is used in other calculations. For small data sets (3-4 results) the median (middle value) may be a better estimate of the true value, especially if one of the results seems anomalous.

What can be said about the reproducibility of the results?

The best statement about the reproducibility or precision of the results is the standard deviation (σ); statisticians refer to σ as the population standard deviation. In fact, the normal error curve is defined by just two parameters, σ and μ. The Gaussian distribution is described by

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

The population standard deviation is calculated by

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where x_i is an individual value and N is the total number of measurements in the population. The statistical power of the standard deviation is that for any normal distribution 68.3% of the values fall within ±1σ of the mean (see Fig. 2.3b). This means that the smaller the value of σ, the more reproducible the data (see Fig. 2.3a).

Figure 2.3 Normal error curves. (a) The abscissa is the deviation from the mean in units of the measurement. (b) The abscissa is the deviation from the mean in units of σ. (2003 Thomson - Brooks/Cole)

Other general statements can be made regarding the normal distribution; for example, 95.5% of the values fall within ±2σ of the mean and 99.7% of the values fall within ±3σ of the mean (see Fig. 2.4).
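The mean and median discussed above can be checked for the pipet calibration data with Python's standard `statistics` module. This is a minimal sketch: the variable names are illustrative, and using only the first three results to mimic a very small data set is an assumption made for the example, not something done in the lecture.

```python
import statistics

# The 8 pipet calibration results (mL) listed earlier.
pipet_ml = [1.998, 1.991, 2.001, 1.999, 2.003, 1.998, 1.997, 2.002]

mean_all = statistics.mean(pipet_ml)      # sum of the values divided by n
median_all = statistics.median(pipet_ml)  # middle value (average of the two middle values for even n)

print(f"mean of all 8 results   = {mean_all:.4f} mL")
print(f"median of all 8 results = {median_all:.4f} mL")

# Illustration only: treat the first three results as a very small data set.
# The low 1.991 mL result pulls the mean down, while the median is unaffected.
first_three = pipet_ml[:3]
mean_three = statistics.mean(first_three)
median_three = statistics.median(first_three)

print(f"mean of first 3 results   = {mean_three:.4f} mL")
print(f"median of first 3 results = {median_three:.4f} mL")
```

Printing four decimal places mirrors the practice of carrying one extra, non-significant digit in the mean.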

Figure 2.4 Area under the normal error curve encompassed by varying numbers of standard deviations

See the table below for more details about the area under the normal error curve.
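The areas quoted above can be reproduced from the error function, since the fraction of a normal distribution lying within ±kσ of the mean equals erf(k/√2). The sketch below relies on that standard identity; the function name `fraction_within` is illustrative and not part of the lecture.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k}σ: {fraction_within(k):.4f}")
```

Non-integer multiples such as `fraction_within(1.5)` work the same way, which is convenient when a tabulated value is not at hand.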

For the coin flip measurements mentioned above and illustrated in Figure 2.2, the average number of heads was 5.04/10. This is very close to what you would expect (5.00/10) because hundreds of measurements were made and there was no determinate error (e.g., the coin was not distorted in some way that might affect the result). The standard deviation for these measurements was 1.62 (see Fig. 2.5).

Figure 2.5 Gaussian distribution for a series of coin flips

However, μ is never known exactly, so there is some error in equating x̄ with μ and a corresponding error in σ. Small sets of data (< 30 measurements) tend to underestimate the standard deviation because extremely large or small accumulated random errors are less likely to occur. This bias can be reduced by estimating the standard deviation by s (referred to as the sample standard deviation), where

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

For the 8 pipet calibration results above, s equals 0.003₇ mL. This means that approximately 68% of the measurements fall in the range 1.995-2.002 mL. Note that the standard deviation is rounded off to the rightmost place used for the mean. However, the statistical power of the standard deviation is compromised for very small data sets (n = 3-4). Note that for the above data set, 6/8 (75%) of the values are within the ±1σ range. In such cases, the mean deviation (MD) is frequently used as a measure of reproducibility.

$$\mathrm{MD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

The relative mean deviation (RMD) in ppth is also routinely used as a measure of precision for small data sets.

$$\mathrm{RMD} = \frac{\mathrm{MD}}{\bar{x}} \times 1000\ \text{ppth}$$

The smaller the MD or RMD, the more reproducible the results.

Check for Understanding 2.2

1. What is the MD and RMD (in ppth) for the 8 pipet calibration results above?

What can be said about the accuracy of the results?

A statement about the reproducibility of a set of measurements can be made from even a casual glance at the data. However, it is not at all obvious that a statement about the accuracy of these measurements can also be made. Although we do not know μ, we can set limits about x̄ within which we may expect to find the population mean with a given degree of probability. This set of limits is known as a confidence interval. For small data sets, the confidence interval is calculated by

$$\mu = \bar{x} \pm \frac{t s}{\sqrt{n}}$$

where μ is the population mean, x̄ is the measured mean, s is the estimate of the standard deviation, n is the number of measurements and t is Student's t, a factor that depends on the number of measurements and the degree of confidence desired (see table below). For the 8 pipet calibration results and a 90% confidence level,

$$\mu = 1.998_6 \pm \frac{(1.895)(0.003_7)}{\sqrt{8}} = 1.998_6 \pm 0.002_5\ \text{mL}$$

This means that there is a 90% chance that the true mean lies in the range 1.996-2.001 mL. Notice that the degrees of freedom needed to determine the value for t is taken as n-1, and that t gets smaller as you make more measurements (n-1 increases). If you want to be more confident about where the true mean is, the confidence interval will be larger. The confidence interval is usually reported to one significant figure and determines the place with uncertainty in the measured mean, and hence how to round off this result to the correct number of significant figures.

Check for Understanding 2.3

1. What is the confidence interval for the 8 pipet calibration results at the 99% confidence level?
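The small-data-set quantities above (s, MD, RMD and the confidence interval) can be sketched with the standard library. The helper names below are hypothetical, and the value t = 1.895 for 7 degrees of freedom at 90% confidence is taken from a standard Student's t table; it is hard-coded as an assumption rather than computed.

```python
import math
import statistics

# The 8 pipet calibration results (mL) listed earlier.
pipet_ml = [1.998, 1.991, 2.001, 1.999, 2.003, 1.998, 1.997, 2.002]


def mean_deviation(values):
    """Average absolute deviation of the values from their mean (MD)."""
    mean = statistics.mean(values)
    return sum(abs(x - mean) for x in values) / len(values)


def relative_mean_deviation_ppth(values):
    """MD divided by the mean, expressed in parts per thousand (RMD)."""
    return mean_deviation(values) / statistics.mean(values) * 1000


def confidence_half_width(values, t):
    """Half-width t*s/sqrt(n) of the confidence interval about the mean."""
    s = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return t * s / math.sqrt(len(values))


T_90_7DOF = 1.895  # assumption: Student's t for 7 degrees of freedom, 90% confidence

mean = statistics.mean(pipet_ml)
s = statistics.stdev(pipet_ml)
half_width = confidence_half_width(pipet_ml, T_90_7DOF)

print(f"s = {s:.4f} mL")
print(f"90% confidence interval: {mean:.4f} ± {half_width:.4f} mL")
```

Passing a different t value to `confidence_half_width`, or calling `mean_deviation(pipet_ml)` and `relative_mean_deviation_ppth(pipet_ml)`, is one way to check hand calculations for the exercises above.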

The validity of an analytical method, to a certain degree of confidence, can be assessed using confidence interval information. This is done by analyzing a sample of known composition and comparing the results with the known values. The sample of known composition is often a Standard Reference Material (SRM) from the National Institute of Standards and Technology (NIST). SRMs are homogeneous materials that have been carefully analyzed by a variety of techniques and by different laboratories. Each SRM comes with a list of certified values - the true concentrations - for various components present in the material.

Suppose an SRM is certified to contain 34.1 ppb iron. If you analyze this material repeatedly and find 32.1 ppb, 30.8 ppb, 33.9 ppb and 31.7 ppb of iron, are your results significantly different from the certified value? Calculate the mean of these results and the confidence interval at various levels of confidence. If the true value does not fall within the confidence interval, you have that level of certainty that a systematic error is present in the measurement.

The 4 results have an average of 32.1₂ ppb with a standard deviation of 1.3₀ ppb. From the table above, t = 2.353 and t = 3.182 for 3 degrees of freedom at the 90% and 95% confidence level, respectively. Thus,

90% confidence:

$$\mu = 32.1_2 \pm \frac{(2.353)(1.3_0)}{\sqrt{4}} = 32.1_2 \pm 1.5_3\ \text{ppb}$$

95% confidence:

$$\mu = 32.1_2 \pm \frac{(3.182)(1.3_0)}{\sqrt{4}} = 32.1_2 \pm 2.0_7\ \text{ppb}$$

Since the certified value does not fall within the 90% confidence interval (30.6-33.6 ppb), you have a good deal of confidence that the measured result is statistically different from the certified value. However, you cannot be 95% confident that this is the case because the certified value just barely falls within the 95% confidence interval (30.1-34.2 ppb).
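The comparison against the certified value can be sketched as follows. The t values are the ones quoted in the text for 3 degrees of freedom; the function and variable names are illustrative assumptions, not part of the lecture.

```python
import math
import statistics

iron_ppb = [32.1, 30.8, 33.9, 31.7]  # replicate results from the text
CERTIFIED_PPB = 34.1                 # certified iron content of the SRM

# Student's t for 3 degrees of freedom, as quoted in the text.
T_VALUES = {90: 2.353, 95: 3.182}


def confidence_limits(values, t):
    """Return (low, high) limits of the interval mean ± t*s/sqrt(n)."""
    mean = statistics.mean(values)
    half_width = t * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean + half_width


inside = {}
for level, t in T_VALUES.items():
    low, high = confidence_limits(iron_ppb, t)
    inside[level] = low <= CERTIFIED_PPB <= high
    verdict = "inside" if inside[level] else "outside"
    print(f"{level}% interval: {low:.2f} to {high:.2f} ppb; certified value is {verdict}")
```

A certified value outside the interval at a given level suggests, with that level of confidence, that a systematic error is present.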