Rejection of Data. Kevin Lehmann Department of Chemistry Princeton University

Copyright Kevin Lehmann, 1997. All rights reserved. You are welcome to use this document in your own classes, but commercial use is not allowed without the permission of the author. The author welcomes any constructive criticisms or other comments from either educators or students.

Goal: To demonstrate to students how a statistically objective method for removing, or rejecting, data points can improve the accuracy of estimates made from experimental data. Students will learn how to apply two commonly used techniques and, more generally, how to use numerical simulation of data to evaluate different proposed methods of data analysis.

Prerequisites: This worksheet assumes that the student is already familiar with the basic concepts of probability theory and error analysis, including the concepts of a population and a sample drawn from that population; the mean and standard deviation of both the population and the sample; the Gaussian distribution function; and the determination of confidence intervals based upon that distribution function. These topics are covered in most texts that discuss error analysis, as well as in several worksheets that I have written and that will be included in this archive (Mean_vs_Median.mcd, Gaussian_Distribution.mcd, and Linear_Regression.mcd). The worksheet makes extensive use of Mathcad's statistical functions, and it would be helpful for the student to have a copy of the manual on hand to review what each function call does.

Rejection.mcd

Introduction: The purpose of this Mathcad worksheet is to demonstrate why one will sometimes want to exclude certain data points from the calculation of population statistics, such as the mean. This is a subject that is very poorly explained in many introductory books on error analysis and statistics. The goal of any statistical analysis is to come up with the best estimate of the "true" value of a quantity that is measured with random error, along with a realistic estimate of the likely uncertainty in that measured quantity. The criteria for deciding between competing methods of analysis should be based on whether the methods meet the objective criteria of statistics, not on some "religious" commitment to retaining all data. We will compare the distribution of values calculated for the sample mean of samples of 25 data points. Each value represents a single measurement of some physical quantity, such as a voltage or the concentration of a solution. Each sample of 25 represents the results of a series of measurements with all experimentally controlled variables the same; that is, the measurements are expected to give identical values, except for the noise that is always present in any measurement. If the measurements are done by hand, 25 is a lot of times to repeat the same measurement; 3 to 5 is more typical. However, for measurements made by a computer, it is often not difficult to signal average this number of times or more. We will generate 1,000 such data sets, picked from a known "Prior Distribution" having zero mean and unit standard deviation, and compare the results of our statistical analysis of the generated data to the statistics of the Prior Distribution. In a real experiment we never have precise values of the true mean and standard deviation; if we did, there would be no reason to do the experiment! However, the type of numerical simulation we are going to perform is a useful way to test the procedures we plan to use to analyze our real data.
Such simulations are also very useful for gaining experience about what errors to expect, given our assumptions about the nature of the measurement errors. From each set of measurements, we compute an estimate of the true value of the quantity we are trying to measure. However, because of measurement error, or noise, we will not get precisely the prior value we assumed in generating the synthetic data. Further, each set of data we generate will give a different estimate. If our statistical method is unbiased, the mean of these estimates will approach our assumed prior value as the number of data sets grows. The standard deviation of the distribution of estimated values is a direct measure of our uncertainty in the true value (assuming we did not know it a priori), based upon a single data set. So we will start by considering data taken from a known Gaussian distribution.
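Outside Mathcad, the same kind of numerical experiment takes only a few lines. The sketch below is a minimal illustration in Python (standard library only; Python 3.8 or later is assumed for statistics.fmean), not part of the original worksheet: random.Random.gauss stands in for Mathcad's rnorm, each inner list plays the role of one sample of 25 measurements, and the seed and names are arbitrary choices.

```python
import random
import statistics

# Illustrative parameters: N_S points per sample, N_T samples, drawn from a
# Gaussian Prior Distribution with mean MU and standard deviation SIGMA.
MU, SIGMA = 0.0, 1.0
N_S, N_T = 25, 1000

rng = random.Random(1)  # fixed seed so the run is reproducible
samples = [[rng.gauss(MU, SIGMA) for _ in range(N_S)] for _ in range(N_T)]

# One estimate of the true value (the sample mean) per synthetic data set.
avg = [statistics.fmean(col) for col in samples]

# Sample s.d. of each data set (statistics.stdev divides by N_S - 1).
s = [statistics.stdev(col) for col in samples]

print("mean of the estimates   =", round(statistics.fmean(avg), 3))   # near MU
print("spread of the estimates =", round(statistics.stdev(avg), 3))   # near SIGMA/sqrt(N_S) = 0.2
print("rms of the sample s.d.s =", round(statistics.fmean(v * v for v in s) ** 0.5, 3))  # near SIGMA
```

Because statistics.stdev already uses the $N_s - 1$ denominator, no correction factor is needed here, unlike with Mathcad's stdev.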

$N := 25000$   Number of Gaussian random numbers to generate
$\mu := 0$   Mean of the Prior distribution
$\sigma := 1$   Standard deviation of the Prior distribution
$N_s := 25$   Number of data points per sample
$N_t := N/N_s$   Number of different samples, or sets of data points
$j := 0 \ldots N_s - 1$   Range variable for data points within a sample
$k := 0 \ldots N_t - 1$   Range variable for samples
$y^{\langle k\rangle} := \mathrm{rnorm}(N_s, \mu, \sigma)$   Mathcad function that generates $N_s$ Gaussian random numbers with mean µ and standard deviation σ.

These values are put into a two-dimensional matrix, y, in which each data sample of 25 points is one column; $y_{j,k}$ is the jth point of the kth sample. In the empty space to the right, ask Mathcad to show the y matrix by typing y=. How many columns are in this array? How many rows?

mean(y) =
stdev(y) = 1.0008
Here we check the statistics of the total set, y, of random numbers, using Mathcad functions. Why are these values for the mean and standard deviation of the set of numbers we generated not exactly equal to the values of µ and σ that we used to generate y?

Now let's generate statistics for each one of our sets of $N_s$ data points, as we would do if they were real measurements from the lab.

$\mathrm{avg}_k := \mathrm{mean}\left(y^{\langle k\rangle}\right)$   The calculated mean of the kth sample of 25 points

$s_k := \sqrt{\frac{N_s}{N_s - 1}}\;\mathrm{stdev}\left(y^{\langle k\rangle}\right)$   $s_k$ is the standard deviation (SD) of the kth sample. The square-root factor is needed because the Mathcad function stdev divides by $N_s$ instead of $N_s - 1$, and we need the latter for finite samples of data.

$\mathrm{rms}(x) := \sqrt{\mathrm{mean}\left(x^2\right)}$   Here we define the function rms, which returns the root mean square of a set of numbers contained in a vector x (the square is taken element by element).

rms(s) =   Here we apply the RMS definition to the array s. Notice that the RMS value of the set of sample SDs almost equals the standard deviation of the Prior Distribution, as it should. How many $s_k$ values are in the array s? What is the meaning of each $s_k$ value? Why do we use an RMS average instead of a traditional mean? (Hint: calculate the mean of s and compare it to what you would have expected.)

From each set of data points, we have computed a sample mean value, $\mathrm{avg}_k$, and a sample standard deviation, $s_k$. Next we compute the statistics (mean and standard deviation of the 1000 averages) of the set of $\mathrm{avg}_k$ we have determined.

mean(avg) =   This is exactly the same as the mean of y calculated above.
stdev(avg) =   The computed standard deviation of the sample means is smaller than that of the Prior Distribution by close to the expected factor of $1/\sqrt{N_s} = 0.2$.

Let us now look at plots of the distribution of both the data points and the means. We first set a range variable, i, and a vector, $x_i$, that define the bins for the histogram plots.

$\mathrm{hdata} := \mathrm{hist}(x, y)\cdot\frac{1}{N\cdot 0.06}$   hist(x, y) generates a vector whose ith value counts the number of y values that occur between $x_i$ and $x_{i+1}$. The factor after hist(x, y) normalizes the histogram to unit area.
$\mathrm{hmean} := \frac{\mathrm{hist}(x, \mathrm{avg})}{N_t}$

Ask Mathcad to display hdata and hmean. Compare these vectors to the ordinate of the Distribution of Data figure below. What is a probability density?

We then need to redefine the range variable i, as there is one less histogram value than there are elements of x.

[Figure: Distribution of Data and Sample Means. Probability density of the data and of the sample means, plotted against (Value − Mean)/Standard Deviation.]
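A rough analogue of the normalization used for hdata: count the values in bins of width 0.06 and divide by N times the bin width, so that the bar heights approximate a probability density. This Python sketch is illustrative only; the function name density_histogram, the bin limits of ±3, and the choice to ignore values outside them are our assumptions, not a description of Mathcad's hist.

```python
import random


def density_histogram(values, lo, hi, width):
    """Counts in bins [lo + i*width, lo + (i+1)*width), scaled by
    1/(len(values)*width) so the result approximates a probability density.
    Values outside [lo, hi) are ignored."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    scale = 1.0 / (len(values) * width)
    return [c * scale for c in counts]


rng = random.Random(2)
data = [rng.gauss(0.0, 1.0) for _ in range(25000)]
h = density_histogram(data, -3.0, 3.0, 0.06)

# Area under the histogram = fraction of the points that fell inside the bins.
area = sum(h) * 0.06
print(len(h), round(area, 3), round(max(h), 3))
```

The total area comes out slightly below 1 because about 0.3% of a standard Gaussian lies beyond ±3, and the tallest bars sit near the Gaussian peak density $1/\sqrt{2\pi} \approx 0.40$.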

Let us now apply Chauvenet's criterion for the rejection of data. We now consider criteria for removing, or rejecting, data points that appear to be erroneous. One of the most popular methods (at least in introductory textbooks on error analysis) is Chauvenet's criterion. We will now apply this criterion to our data samples and see how it changes the distribution of average values. To use Chauvenet's criterion, we compute the mean and standard deviation of each set of data. We then assume that the errors are normally distributed, which is how we designed our original data. We will reject a data point if the probability of a single point lying that far or farther from the sample mean is less than $1/(2N_s)$; equivalently, if the expected number of such points in a sample of $N_s$ points is less than one half. Thus, if our sample contains 25 data points, we will reject a data point if it is more than about 2.33 standard deviations from the mean, because the theory of Gaussian-distributed errors shows that the probability of finding a point 2.33 or more standard deviations from the mean (on either side) is 2% = 1/(2·25). After we have rejected data points (if any) using this criterion, we compute an estimate of the sample mean and standard deviation from the points that are left.

$\mathrm{reject\_prob} := \frac{1}{2N_s}$   We reject a data point if the probability of being that distance or a greater distance from the sample mean is less than the selected reject cutoff probability.

$\mathrm{range} := \mathrm{qnorm}\left(1 - 0.5\cdot\mathrm{reject\_prob}, \mu, \sigma\right)$
range =   For a Gaussian distribution, range is the deviation from the mean, in units of the standard deviation, beyond which (counting both sides) a fraction reject_prob of the data points lie.

We will reject data points such that $\left|y_{j,k} - \mathrm{avg}_k\right| > \mathrm{range}\cdot s_k$. Display the value for the right side of the inequality. How large is this vector and what does it contain?
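The threshold range can be reproduced with statistics.NormalDist from the Python standard library, whose inv_cdf method serves here as a stand-in for Mathcad's qnorm with µ = 0 and σ = 1. A minimal sketch; the function name chauvenet_range is ours.

```python
from statistics import NormalDist


def chauvenet_range(n_points):
    """Rejection threshold, in units of the standard deviation, used by
    Chauvenet's criterion for a sample of n_points points."""
    reject_prob = 1.0 / (2 * n_points)
    # Two-sided tail: reject_prob / 2 lies beyond +range and beyond -range,
    # the analogue of qnorm(1 - 0.5*reject_prob, 0, 1).
    return NormalDist().inv_cdf(1.0 - 0.5 * reject_prob)


for n in (5, 10, 25, 100):
    print(n, round(chauvenet_range(n), 3))
```

For $N_s = 25$ this gives about 2.33, and the threshold grows only slowly with sample size (about 2.81 for 100 points).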

Aside: One may be tempted to be more judicious in rejecting data by removing only data that falls even farther from the mean. However, one must recognize that it is simply not possible, mathematically, for a data point to be farther than $\frac{N_s - 1}{\sqrt{N_s}}\,s_k$ from the sample mean. Thus, if we make the interval too large, we will never remove a data point. The basic problem is that the value of $s_k$ will be dominated by any outliers in the data set. If we have an independent estimate of the expected standard deviation of the data, it is far better to use that in the test, especially if $N_s$ is not large. This can often be done, since we typically make many sets of measurements with the same apparatus as we vary some parameter(s). Unless we have grounds to think otherwise, we should assume that the statistical character of the fluctuations does not change from one data set to the next. Thus, we can combine the individual $s_k$ values to give a better estimate of σ for the distribution. Another useful technique when we expect outliers is to estimate σ from the mean absolute deviation of the data from the sample mean. For a Gaussian distribution, the expectation value of the mean absolute deviation is $\sqrt{2/\pi}\,\sigma \approx 0.798\,\sigma$, so σ can be estimated as 1.253 times the mean absolute deviation. The advantage is that the mean absolute deviation is much less affected by outliers in the data set than the root-mean-squared deviation. In particular, a data point can have an absolute deviation no larger than $N_s/2$ times the mean absolute deviation.

Exercise: Consider a set of $N_s - 1$ points with value 0 and one point with value 1.
1. What are the mean and standard deviation of this distribution?
2. How many sigma away from the mean is the point at 1?
3. Compare the values of s estimated from the standard deviation and from the mean absolute deviation of the sample.
4. If the point at 1 is erroneous, which method gives a better estimate of s?
End of Aside
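The configuration in the exercise is exactly the extreme case that reaches both bounds quoted in the Aside, so the bounds can be checked numerically. The Python sketch below (standard library only; the function name is illustrative) does that, and also checks the 1.253 factor for estimating σ from the mean absolute deviation of simulated Gaussian data.

```python
import math
import random
import statistics


def outlier_in_units(n):
    """For n - 1 points at 0 and one point at 1, return the deviation of the
    point at 1 from the sample mean in units of (a) the sample s.d. with an
    n - 1 denominator and (b) the mean absolute deviation."""
    data = [0.0] * (n - 1) + [1.0]
    m = statistics.fmean(data)
    s = statistics.stdev(data)
    mad = statistics.fmean(abs(x - m) for x in data)
    dev = 1.0 - m
    return dev / s, dev / mad


for n in (5, 10, 25):
    in_s, in_mad = outlier_in_units(n)
    # Analytic maxima from the Aside: (n - 1)/sqrt(n) and n/2.
    print(n, round(in_s, 3), round((n - 1) / math.sqrt(n), 3), round(in_mad, 3), n / 2)

# sigma from the mean absolute deviation: E|x - mu| = sqrt(2/pi) * sigma.
rng = random.Random(8)
g = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
g_mean = statistics.fmean(g)
sigma_from_mad = math.sqrt(math.pi / 2) * statistics.fmean(abs(x - g_mean) for x in g)
print(round(sigma_from_mad, 3))  # close to the true sigma of 1
```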

We will make use of Mathcad's Boolean functions, which are of the form (x > y). This is evaluated as 1 if the condition inside the parentheses holds and 0 if not. Thus, $\left(\left|y_{j,k} - \mathrm{avg}_k\right| > \mathrm{range}\cdot s_k\right)$ equals 1 if $y_{j,k}$ is to be rejected and 0 if not.

$\mathrm{N\_reject}_k := \sum_j \left(\left|y_{j,k} - \mathrm{avg}_k\right| > \mathrm{range}\cdot s_k\right)$   Number of data points to be rejected in the kth sample
$N_s - \mathrm{N\_reject}_k =$   Number of data points left in the kth sample

Display N_reject. How many data points are rejected from each set?

mean(N_reject) =   Average number of data points rejected per sample. This is less than the 0.5 expected for a Gaussian with known mean and σ because, in samples with a point far from the mean, the sample standard deviation $s_k$ overestimates σ.

avg2 is a vector of the means of the data points left after filtering the data by Chauvenet's criterion:
$\mathrm{avg2}_k := \frac{1}{N_s - \mathrm{N\_reject}_k}\sum_j \left(\left|y_{j,k} - \mathrm{avg}_k\right| < \mathrm{range}\cdot s_k\right)\cdot y_{j,k}$   The Boolean factor restricts the sum to only those points not rejected.

Let us now compare our estimates of the true mean with the Prior value of zero.
mean(avg2) =   The mean of our sample averages after filtering
mean(avg) =   The mean of our sample averages without filtering
stdev(avg2) = 0.26   Standard deviation of the sample averages with filtering

stdev(avg) =   Standard deviation of the sample averages without filtering

The standard deviation of avg2 is larger than that of avg. Based upon this result, would you suggest using Chauvenet's criterion to reject data? Let us make a plot to compare the distribution of means before and after filtering the data. After redefining i and $x_i$ to set up the bins:

$\mathrm{hc} := \frac{\mathrm{hist}(x, \mathrm{avg2})}{N_t}$   Mean distribution of filtered data
$\mathrm{hmean} := \frac{\mathrm{hist}(x, \mathrm{avg})}{N_t}$   Mean distribution of initial data

[Figure: Distributions of Sample Mean. Probability density of the mean of the filtered data and of the mean of the initial data, plotted against Error of Mean / Sample S.D.]
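Returning to the pure Gaussian data, a one-pass Chauvenet filter can be sketched in Python as follows (standard library only; names and seed are illustrative). It follows the worksheet's recipe: sample mean, sample s.d. with an $N_s - 1$ denominator, and rejection beyond range × s.

```python
import random
import statistics
from statistics import NormalDist


def chauvenet_keep(sample):
    """One pass of Chauvenet's criterion: keep points with
    |y - mean| <= range * s, where s uses the N - 1 denominator and
    range = qnorm(1 - 0.5 * 1/(2N)) for a standard Gaussian."""
    n = len(sample)
    threshold = NormalDist().inv_cdf(1.0 - 0.5 / (2 * n))
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return [y for y in sample if abs(y - m) <= threshold * s]


rng = random.Random(3)
N_S, N_T = 25, 2000
samples = [[rng.gauss(0.0, 1.0) for _ in range(N_S)] for _ in range(N_T)]

kept = [chauvenet_keep(col) for col in samples]
n_reject = [N_S - len(k) for k in kept]
avg = [statistics.fmean(col) for col in samples]
avg2 = [statistics.fmean(k) for k in kept]

print("mean(N_reject) =", round(statistics.fmean(n_reject), 3))  # below 0.5
print("stdev(avg)     =", round(statistics.stdev(avg), 4))
print("stdev(avg2)    =", round(statistics.stdev(avg2), 4))
```

With these settings one would expect mean(N_reject) somewhat below 0.5 and stdev(avg2) a little above stdev(avg), in line with the worksheet; the exact numbers depend on the seed.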

As we can see from the graph, by rejecting some of the data we have slightly increased the uncertainty in the mean estimated from a sample. If we know that our noise is described by a Gaussian, or normal, distribution, as is often assumed, then we get the best estimate of the true mean if we never reject data. However, if we do apply the test, the increase in the uncertainty of the mean is small. So why is it useful to sometimes reject some of the data? The reason is that in the real world, the distribution of errors is never exactly described by a Gaussian. Often, the real distribution follows a Gaussian closely near the center, but the probability of getting data points many standard deviations from the mean is much larger than predicted by a Gaussian distribution. There are many reasons for this, but they come down to the ever-present possibility of a rare but substantial disruption of the experimental apparatus. These disturbances often fall into the category of "one over f noise," so named because the noise spectrum they produce is often found to vary approximately as 1/f. Such noise is known to be ubiquitous in physical systems. Even rare disturbances can introduce sizable uncertainty in the mean, since they can pull the mean far from the correct value. In order to model this effect, we will multiply a random 1% of the data points by a factor of 100. This corresponds to a population distribution that is the sum of two Gaussian functions: a narrow one with 99% of the population and a one hundred times wider Gaussian with only 1% of the population. Although the broad Gaussian contains only 1% of the points, it dominates the sample variance, leading to a standard deviation approximately ten times larger than that of the narrow Gaussian alone ($\sqrt{0.99 + 0.01\cdot 100^2} \approx 10$). It is in such cases that methods to test data points and reject "outliers" will lead to a substantial improvement in the sharpness of our predictions.
$\lambda := 0.01$   Fraction of data from the "outlier" distribution
$\gamma := 100$   Relative standard deviation of the outlier distribution
$y_{j,k} := y_{j,k}\cdot\left[1 + (\gamma - 1)\cdot\left(\mathrm{rnd}(1) < \lambda\right)\right]$   Multiply each data point by 100 a random 1% of the time, and by 1 the other 99% of the time.
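The contaminated population can be sketched in Python by scaling a unit Gaussian deviate by γ with probability λ, the analogue of the Mathcad assignment above (standard library only; names and seed are illustrative). The sketch also evaluates the mixture's population standard deviation, $\sqrt{(1-\lambda) + \lambda\gamma^2}$, for comparison with the spread of the simulated points.

```python
import random

LAMBDA = 0.01  # fraction of points drawn from the wide "outlier" Gaussian
GAMMA = 100.0  # relative standard deviation of the outlier Gaussian


def contaminated_point(rng):
    """Unit Gaussian deviate multiplied by GAMMA with probability LAMBDA,
    mirroring y := y * (1 + (gamma - 1) * (rnd(1) < lambda))."""
    y = rng.gauss(0.0, 1.0)
    return y * (1.0 + (GAMMA - 1.0) * (rng.random() < LAMBDA))


rng = random.Random(4)
y = [contaminated_point(rng) for _ in range(100_000)]

mean_y = sum(y) / len(y)
sample_sd = (sum((v - mean_y) ** 2 for v in y) / len(y)) ** 0.5

# Population s.d. of the two-Gaussian mixture.
expected_sd = ((1 - LAMBDA) + LAMBDA * GAMMA ** 2) ** 0.5
print(round(mean_y, 3), round(expected_sd, 2), round(sample_sd, 2))
```

The overall mean stays near zero, while the spread is about ten times that of the narrow Gaussian, as described above.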

mean(y) =   This does not strongly affect the overall mean, which is still close to the Prior value of zero.
stdev(y) =   The standard deviation of the total data set is now an order of magnitude larger than before.

$\mathrm{avg}_k := \mathrm{mean}\left(y^{\langle k\rangle}\right)$ and $s_k := \sqrt{\frac{N_s}{N_s - 1}}\;\mathrm{stdev}\left(y^{\langle k\rangle}\right)$   Recompute the set of sample means and standard deviations
rms(s) =   Again, the RMS value of the sample standard deviations nearly matches the standard deviation of the total data set.
stdev(avg) = 1.999   The standard deviation of the mean is also an order of magnitude larger than before.

t-test: We can test how often the estimated 95% confidence interval about the calculated mean, $\mathrm{avg}_k$, contains the true mean (0). This test is based upon an assumption of simple Gaussian errors, which does not rigorously apply in this case.

$\mathrm{t\_range} := \mathrm{qt}\left(\frac{1 + 0.95}{2}, N_s - 1\right)\Big/\sqrt{N_s}$
t_range =   For samples following a Gaussian distribution, the absolute deviation of the sample mean from the true mean of the distribution will be less than t_range × $s_k$ (the sample standard deviation) 95% of the time.

$\frac{1}{N_t}\sum_k \left(\left|\mathrm{avg}_k\right| > \mathrm{t\_range}\cdot s_k\right) =$   Fraction of means found outside the predicted confidence interval

For this level of outliers, the t-test still works well; in fact, it overestimates the size of the 95% confidence interval.
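A minimal Python sketch of the t-test coverage check, assuming the tabulated value $t_{0.975,24} \approx 2.064$ in place of Mathcad's qt (the Python standard library has no t quantile function). It counts how often the interval $\mathrm{avg}_k \pm \mathrm{t\_range}\cdot s_k$ misses the true mean of 0, for pure Gaussian data and for data with 1% outliers; names and seeds are illustrative.

```python
import random

N_S, N_T = 25, 10_000
GAMMA = 100.0
# Two-sided 95% point of Student's t with N_S - 1 = 24 degrees of freedom,
# taken from a t table (assumed value).
T_975_24 = 2.0639
T_RANGE = T_975_24 / N_S ** 0.5


def mean_sd(values):
    """Sample mean and standard deviation (N - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return m, (sum((v - m) ** 2 for v in values) / (n - 1)) ** 0.5


def fraction_outside(lam, seed):
    """Fraction of samples whose 95% t interval misses the true mean (0)."""
    rng = random.Random(seed)
    outside = 0
    for _ in range(N_T):
        col = [rng.gauss(0.0, 1.0) * (GAMMA if rng.random() < lam else 1.0)
               for _ in range(N_S)]
        m, s = mean_sd(col)
        outside += abs(m) > T_RANGE * s
    return outside / N_T


f_clean = fraction_outside(0.0, 9)
f_outliers = fraction_outside(0.01, 10)
print("pure Gaussian:", f_clean)
print("1% outliers:  ", f_outliers)
```

For pure Gaussian data the miss rate should be close to 5%. With 1% outliers it tends to come out lower, because a sample containing an outlier also has a greatly inflated $s_k$.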

We will now use Chauvenet's criterion to "reject" data from these samples.

$\mathrm{N\_reject}_k := \sum_j \left(\left|y_{j,k} - \mathrm{avg}_k\right| > \mathrm{range}\cdot s_k\right)$
mean(N_reject) =   Note that we are more likely to reject a data point than before.

$\mathrm{avg2}_k := \frac{1}{N_s - \mathrm{N\_reject}_k}\sum_j \left(\left|y_{j,k} - \mathrm{avg}_k\right| < \mathrm{range}\cdot s_k\right)\cdot y_{j,k}$   Average of the remaining points
$\mathrm{s2}_k := \sqrt{\frac{1}{N_s - \mathrm{N\_reject}_k - 1}\sum_j \left(\left|y_{j,k} - \mathrm{avg}_k\right| < \mathrm{range}\cdot s_k\right)\cdot\left(y_{j,k} - \mathrm{avg2}_k\right)^2}$   Standard deviation of the remaining points

mean(avg) =   mean(avg2) =
stdev(avg) = 1.999   stdev(avg2) = 0.32

When we use Chauvenet's criterion to reject outliers, the standard deviation of our estimates of the true mean of the distribution (which is zero) is close to what we obtained above for the narrow Gaussian distribution alone, and almost one order of magnitude smaller than that obtained if we retain all the data points. If you believed that the assumptions used in constructing this distribution function provided a good representation of your experiments, would you recommend using Chauvenet's criterion to filter your laboratory data? Explain.

rms(s2) =   The RMS value of the SD also falls by a large factor.
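One Chauvenet pass on the contaminated samples can be sketched in Python as follows (standard library only; names and seed are illustrative). It reports the spread of the sample means and the RMS of the sample standard deviations before and after filtering.

```python
import random
from statistics import NormalDist

N_S, N_T = 25, 4000
LAMBDA, GAMMA = 0.01, 100.0
THRESHOLD = NormalDist().inv_cdf(1.0 - 0.5 / (2 * N_S))  # Chauvenet range, about 2.33


def mean_sd(values):
    """Sample mean and standard deviation (N - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return m, (sum((v - m) ** 2 for v in values) / (n - 1)) ** 0.5


def rms(values):
    """Root mean square of a sequence of numbers."""
    return (sum(v * v for v in values) / len(values)) ** 0.5


rng = random.Random(5)
avg, s, avg2, s2 = [], [], [], []
for _ in range(N_T):
    # Contaminated sample: each point scaled by GAMMA with probability LAMBDA.
    col = [rng.gauss(0.0, 1.0) * (GAMMA if rng.random() < LAMBDA else 1.0)
           for _ in range(N_S)]
    m, sd = mean_sd(col)
    kept = [y for y in col if abs(y - m) <= THRESHOLD * sd]  # one Chauvenet pass
    m2, sd2 = mean_sd(kept)
    avg.append(m)
    s.append(sd)
    avg2.append(m2)
    s2.append(sd2)

print("stdev(avg) =", round(mean_sd(avg)[1], 3), "  stdev(avg2) =", round(mean_sd(avg2)[1], 3))
print("rms(s)     =", round(rms(s), 3), "  rms(s2)     =", round(rms(s2), 3))
```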

To get a better understanding of how the use of Chauvenet's criterion reduces the uncertainty in our estimate of the true mean, let us plot the two distributions of sample means. After redefining i and the vector $x_i$ that defines the bins for the histograms:

$\mathrm{hc} := \mathrm{hist}(x, \mathrm{avg2})\cdot\frac{1}{N_t\cdot 0.06}$   Histogram of sample mean values, using Chauvenet's criterion
$\mathrm{hmean} := \frac{\mathrm{hist}(x, \mathrm{avg})}{N_t}$   Histogram of sample mean values, using all the data points

[Figure: Distributions of Sample Mean. Probability density of the mean of the filtered data and of the mean of the initial data, plotted against Error of Mean / Sample S.D.]

The two distributions look very similar near the center, but the "Filtered" data distribution is higher because it does not have the wide "wing" of highly divergent values. This last result should not be surprising when we consider that there is a probability $(1 - \lambda)^{N_s} = 0.99^{25} \approx 0.78$ that a sample does not contain even one point from the wider distribution. To make the wings of the distribution clear, we will blow up the vertical scale.

[Figure: Distributions of Sample Mean, with expanded vertical scale. Probability density of the mean of the filtered data and of the mean of the initial data, plotted against Error of Mean / Sample S.D.]

Let us summarize what we have observed so far:
1. For a truly Gaussian distribution of data, the use of Chauvenet's criterion for data rejection produces a worse estimate of the true mean. However, we get only a very modest increase in uncertainty. Qualitatively, we would obtain the same result if we had used any of the other available methods for rejecting outliers in the data.
2. We then considered a distribution function of data that consists of a narrow Gaussian distribution, but with a small percentage of points from a much wider distribution, corresponding to outliers in the data. We found that even though the probability of getting even one outlier in a set of data was small (~22%), the distribution of sample means now also had a much wider tail, with a substantial probability of an error much larger than that expected based upon the narrow distribution alone.
3. By rejecting outliers in the data with Chauvenet's criterion, we dramatically reduce the probability of a large error in our estimate of the true mean. The method is not perfect, in that we will not reject all data from the wider distribution function, and we will reject some data points that are part of the narrow Gaussian distribution. However, we obtain a much better statistical estimate than the average of all the data points, no matter how far from the mean.

A natural question to consider is whether we should now apply Chauvenet's criterion to the pruned data set. Since the RMS sample standard deviation has been significantly reduced, one may now find further points that should be eliminated. We will now try this and see whether we obtain a further reduction in the spread of the calculated mean values. Strictly, we should recalculate the value of range using $N_s - \mathrm{N\_reject}_k$ instead of $N_s$, but that would require a different value for each sample, k. Since $N_s \gg \mathrm{N\_reject}_k$, we expect only a small error from this approximation.

$\mathrm{N\_reject2}_k := \sum_j \left(\left|y_{j,k} - \mathrm{avg2}_k\right| > \mathrm{range}\cdot \mathrm{s2}_k\right)$
mean(N_reject2) =   Note that we are only modestly more likely to reject a data point than before:
mean(N_reject) =

Sample mean and standard deviation after a second pass of data rejection:
$\mathrm{avg3}_k := \frac{1}{N_s - \mathrm{N\_reject2}_k}\sum_j \left(\left|y_{j,k} - \mathrm{avg2}_k\right| < \mathrm{range}\cdot \mathrm{s2}_k\right)\cdot y_{j,k}$
$\mathrm{s3}_k := \sqrt{\frac{1}{N_s - \mathrm{N\_reject2}_k - 1}\sum_j \left(\left|y_{j,k} - \mathrm{avg2}_k\right| < \mathrm{range}\cdot \mathrm{s2}_k\right)\cdot\left(y_{j,k} - \mathrm{avg3}_k\right)^2}$

Two passes of Chauvenet's criterion: mean(avg3) =   stdev(avg3) = 0.28   rms(s3) =
One pass of Chauvenet's criterion: mean(avg2) =   stdev(avg2) = 0.32   rms(s2) =
Unfiltered data: mean(avg) =   stdev(avg) = 1.999   rms(s) =

Compare the mean, stdev, and rms values for each filtering pass. Where is the greatest improvement? How significant is the improvement?
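The one-pass versus two-pass comparison can be sketched in Python by looping the filter: each pass re-tests every original point against the mean and standard deviation of the points kept by the previous pass, with the threshold held at its $N_s$ value, as described above. Standard library only; passes=0 gives the unfiltered mean, and names and seed are illustrative.

```python
import random
from statistics import NormalDist

N_S, N_T = 25, 4000
LAMBDA, GAMMA = 0.01, 100.0
THRESHOLD = NormalDist().inv_cdf(1.0 - 0.5 / (2 * N_S))  # Chauvenet range for N_S


def mean_sd(values):
    """Sample mean and standard deviation (N - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return m, (sum((v - m) ** 2 for v in values) / (n - 1)) ** 0.5


def chauvenet_mean(col, passes):
    """Mean after `passes` Chauvenet passes (passes=0: plain sample mean).
    Each pass re-tests every original point against the mean and s.d. of
    the points kept by the previous pass; the threshold stays fixed."""
    m, sd = mean_sd(col)
    for _ in range(passes):
        kept = [y for y in col if abs(y - m) <= THRESHOLD * sd]
        m, sd = mean_sd(kept)
    return m


rng = random.Random(11)
samples = [[rng.gauss(0.0, 1.0) * (GAMMA if rng.random() < LAMBDA else 1.0)
            for _ in range(N_S)] for _ in range(N_T)]

spread = {p: mean_sd([chauvenet_mean(col, p) for col in samples])[1] for p in (0, 1, 2)}
for p, sd in spread.items():
    print(f"{p} pass(es): stdev of sample means = {sd:.3f}")
```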

In comparing methods, we concentrate on the standard deviation of the set of sample means, rather than on the mean of the sample mean values. Why is this? Let us plot the distributions of mean values for one and two passes:

$\mathrm{hc3} := \mathrm{hist}(x, \mathrm{avg3})\cdot\frac{1}{N_t}$

[Figure: Distributions of Sample Mean values. Probability density for one pass and for two passes, plotted against Error of Mean / Sample S.D.]

The two distributions look almost identical. Let's plot again with an expanded vertical axis.

[Figure: Distributions of Sample Mean values, with expanded vertical axis. Probability density for one pass and for two passes, plotted against Error of Mean / Sample S.D.]

We see that the primary effect of the second pass is to eliminate the few remaining mean values that are more than one sample SD from the true mean. The above numerical experiments demonstrate that we can make sharper predictions from a limited set of experimental data if we adopt a protocol to reject data points that appear to be far from the rest. The optimal strategy depends upon the population distribution. For a perfect Gaussian distribution, any data rejection reduces the sharpness of the estimate of the mean; however, the degradation is modest. For a population that includes a small fraction with a much broader spread than the bulk of the distribution, data rejection can pay large dividends. I encourage you to adjust the values of the different parameters used above, such as λ and γ, to see how the calculated results change. Numerical "experiments" of the type used in this worksheet are a powerful way to estimate the expected precision of proposed methods of data analysis as a function of the statistical properties of the distribution to be sampled.

The rejection of data is often viewed negatively, as fudging or even dishonest. This is certainly not true if the following rules are followed:
1. The decision to reject data is made by an objective statistical test, not by a subjective decision.
2. The algorithm used to reject data should be clearly noted when the data is reported, and all the data, including the rejected points, should be stored together.
3. The decision on the algorithm to be used should not change after the data has been taken and subjected to a first analysis. Since it helps to have an idea of what the data population distribution is when making such a decision, it is desirable to take a preliminary data set and try different algorithms on it as part of the selection process. However, this data should not be included in the overall statistical analysis.

When data is worked up by hand, a strongly deviant point will be obvious to any but the most oblivious person.
Often, these data points are simply omitted, or perhaps the entire data set is retaken. The latter is often a needless waste of effort, but it is certainly better than averaging all the data, outliers and all. This is especially true if confidence intervals based upon assumed Gaussian statistics are invoked.

Why is this? Think about the distribution of sample means shown in the plots above, compared to that obtained earlier using a true Gaussian distribution of errors.

Q-Test: Chauvenet's criterion is an efficient test for outliers when we are dealing either with a sample of reasonable size or with a prior estimate of the expected sample standard deviation. For small samples (say N < 10), the problem is that if we estimate the standard deviation from the sample itself, it is likely to be dominated by the possible outlier. This is why, above, we cautioned against making the probability for rejection of a data point much below 1/(2N), or one will never eliminate any points, no matter how deviant. For small samples, a more efficient test for outliers is known as the Q-test. One arranges the data points in ascending order. Suppose the lowest point (i.e., the first one) looks far from the pack. We define the ratio $Q = (x_2 - x_1)/(x_N - x_1)$, which is the fraction of the range of the data used up by the first interval. We define $P_Q(Q_0, N)$ as the probability that, for a random sample of N points selected from a Gaussian distribution, Q will be greater than $Q_0$. Since we are dealing with a ratio of two differences, we can compute this probability for a standard Gaussian distribution, and it will apply to any Gaussian regardless of its mean or standard deviation. I leave it for the reader to justify the following expression for this probability, remembering that if the first point is at $x_1$ and the last at $x_N$, then the points $x_2 \ldots x_{N-1}$ must lie in the interval $[x_1 + Q_0(x_N - x_1), x_N]$. I split the expression into two so it will fit on one line:

$$f_{temp}(x_1, x_N, Q_0, N) := e^{-x_N^2/2}\left[\mathrm{cnorm}(x_N) - \mathrm{cnorm}\left(x_1 + Q_0(x_N - x_1)\right)\right]^{N-2}$$

$$P_Q(Q_0, N) := \frac{N(N-1)}{2\pi}\int_{-\infty}^{\infty} e^{-x_1^2/2}\int_{x_1}^{\infty} f_{temp}(x_1, x_N, Q_0, N)\,dx_N\,dx_1$$
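The double integral for $P_Q$ can be evaluated outside Mathcad as a check. In this Python sketch (standard library only; names are illustrative), composite Simpson's rule stands in for Mathcad's numerical integrator, NormalDist().cdf plays the role of cnorm, and the infinite limits are truncated at ±8, where the Gaussian factors are negligible. A direct simulation of sorted Gaussian samples provides an independent estimate of the same probability.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, the counterpart of cnorm


def simpson(f, lo, hi, n=120):
    """Composite Simpson's rule on [lo, hi] with an even number n of intervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def p_q(q0, n, limit=8.0):
    """P_Q(q0, n) from the double integral, infinite limits cut at +/-limit."""
    def f_temp(x1, xn):
        inside = PHI(xn) - PHI(x1 + q0 * (xn - x1))
        return math.exp(-xn * xn / 2) * inside ** (n - 2)

    def outer(x1):
        return math.exp(-x1 * x1 / 2) * simpson(lambda xn: f_temp(x1, xn), x1, limit)

    return n * (n - 1) / (2 * math.pi) * simpson(outer, -limit, limit)


# Direct simulation of P(Q > 0.5) for samples of 5 Gaussian points.
rng = random.Random(7)
trials = 20_000
hits = 0
for _ in range(trials):
    xs = sorted(rng.gauss(0.0, 1.0) for _ in range(5))
    hits += (xs[1] - xs[0]) / (xs[4] - xs[0]) > 0.5
mc_estimate = hits / trials

print("P_Q(0, 5)   =", round(p_q(0.0, 5), 4))  # should be 1, since Q > 0 always
print("P_Q(0.5, 5) =", round(p_q(0.5, 5), 3), " simulation:", round(mc_estimate, 3))
```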

For example, the probability that, with 5 points, the first two will be separated by at least half of the total range is calculated by: $P_Q(0.5, 5) = 0.149$

Using the probability distribution function for a standard Gaussian, $P(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$, derive the above expression. Note that, as the points are independent, the joint probability distribution function is $P(x_1, x_2 \ldots x_N) = P(x_1)P(x_2)\cdots P(x_N)$. Also note that we can always relabel the points to make $x_1$ the smallest and $x_N$ the largest point, but we must multiply by N(N − 1), which is the number of possible positions for the lowest and highest points in the list. Points from a general Gaussian distribution, with mean m and standard deviation s, can be written as $z_i = m + s\,x_i$, where the $x_i$ are points from a standard Gaussian distribution. Show that the value of Q calculated for $[z_1, z_2 \ldots z_N]$ is the same as Q calculated in terms of $[x_1, x_2 \ldots x_N]$. The probability that the highest two points are separated by the same fraction is identical, given the symmetry of the normal distribution.

For testing data, we are more interested in the inverse of the above function, i.e., the value of $Q_0$ at which $P_Q$ falls to some given fraction p. We use Mathcad's root function to get this value, which requires an initial guess for $Q_0$:

$\mathrm{Q\_test}(p, N, Q) := \mathrm{root}\left(P_Q(Q, N) - p,\; Q\right)$

If we are rejecting data with a confidence interval of, say, 90%, we want to know the probability that either the lowest or the highest point fails the Q-test. This is given by the sum of the probability that the lowest point does plus the probability that the highest point does. This double counts cases where both the first and the last interval fail the test simultaneously, but such cases have low probability in the situations of interest. Thus, if we want 90% confidence for rejecting a point from a sample of N = 10, this point must give a Q value greater than:

Q_test(0.5·(1 − 0.90), 10, 0.5) = 0.41

We used 0.5 as an initial guess of $Q_0$ for the root search. You must change this guess if the root search does not converge or returns an unreasonable value, say $Q_0$ outside the region [0, 1]. One can just try different values in the $P_Q$ function until a probability close to the desired value is found, and use that as an initial guess. In this case, the probability of both the first and last points failing the Q-test is only $2\times 10^{-5}$, and thus it is negligible, as predicted. I leave it as an exercise for the reader to modify the $f_{temp}$ function given above to calculate this probability. Using the Q_test function, one can get $Q_0$ values for any confidence interval with any number of points. I suggest that the reader test a sample data set generated by Mathcad with known statistical properties, as we did above, and compare the number of points rejected by the Q-test with that expected. The Q-test is efficient as long as there is no more than one outlier in the data set. As a result, it is best used for relatively small data sets, or where the probability of an outlier is considered small, so that the chance of two in one particular data set can be neglected. One will only be justified in removing multiple data points if the sample size is reasonably large, at which point one should use Chauvenet's criterion.

Modify the $f_{temp}$ function above to calculate the probability that both the first and last intervals are simultaneously larger than Q times the total interval, $x_N - x_1$, and then calculate this probability for Q = 0.41 and N = 10. Hint: In what interval (in terms of $x_1$ and $x_N$) must the points $x_2 \ldots x_{N-1}$ now lie? The end points of this interval are the values at which cnorm must be calculated.

For a 90% confidence interval, what is the critical value for Q to use on data sets with 25 points? How about for 98% confidence?
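Instead of root-finding on $P_Q$, the critical value $Q_0$ can also be estimated by brute-force simulation: draw many Gaussian samples of N points, compute Q for the lowest point, and take the $(1-p)$ quantile. This is a different technique from the worksheet's root search and carries some Monte Carlo noise; the Python sketch below (standard library only; names and seed are illustrative) is meant as a cross-check of the Q_test results, including a rough answer for N = 25.

```python
import random


def q_low(xs):
    """Q = (x2 - x1)/(xN - x1) for the lowest point of an ascending list."""
    return (xs[1] - xs[0]) / (xs[-1] - xs[0])


def q_critical_mc(p, n, trials=20_000, seed=13):
    """Monte Carlo estimate of Q_0 such that P_Q(Q_0, n) = p, i.e. the
    (1 - p) quantile of Q for samples of n standard-Gaussian points."""
    rng = random.Random(seed)
    qs = sorted(q_low(sorted(rng.gauss(0.0, 1.0) for _ in range(n)))
                for _ in range(trials))
    return qs[int((1 - p) * trials)]


# 90% confidence, counting both ends: p = 0.5 * (1 - 0.90) per end.
for n in (5, 10, 25):
    print(n, round(q_critical_mc(0.5 * (1 - 0.90), n), 3))
```

The value for N = 10 can be compared with the Q_test result computed above; the two should agree to within the simulation noise.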
$Q_t := \mathrm{Q\_test}\left(0.5\cdot(1 - 0.90), N_s, 0.3\right)$
Q_t =
At the bottom of this worksheet, write the Mathcad code to apply the Q-test to the simulated data sets $y^{\langle k\rangle}$ to reject outliers. For this, you will need to use Mathcad's sort function to put each column of y in ascending order: $y^{\langle k\rangle} := \mathrm{sort}\left(y^{\langle k\rangle}\right)$

Compare the standard deviation of the sample mean of data filtered using the Q-test to that of the same data filtered using Chauvenet's criterion. Which is better for this data? Try modifying both the number of data points in each sample and the probability of an outlier, and determine empirically under what conditions one test or the other produces a sharper distribution of the sample means.

With each column sorted, let $Q^{lo}_k = \frac{y_{1,k} - y_{0,k}}{y_{N_s-1,k} - y_{0,k}}$ and $Q^{hi}_k = \frac{y_{N_s-1,k} - y_{N_s-2,k}}{y_{N_s-1,k} - y_{0,k}}$. Then

$$\mathrm{avg4}_k := \frac{\sum_{i=1}^{N_s-2} y_{i,k} + y_{0,k}\left(Q^{lo}_k < Q_t\right) + y_{N_s-1,k}\left(Q^{hi}_k < Q_t\right)}{N_s - 2 + \left(Q^{lo}_k < Q_t\right) + \left(Q^{hi}_k < Q_t\right)}$$

stdev(avg) = 1.999   No filtering
stdev(avg2) = 0.32   Chauvenet's criterion
stdev(avg4) = 0.57   Q-test

It is not obvious what the best confidence interval to take in the Q-test is. In fact, one widely used procedure is to automatically throw out the highest and lowest points and average the rest, corresponding to $Q_t = 0$. Try this and compare with the other results. Of these two methods, which one do you expect will be better for smaller sample sizes, and which for larger?

$\mathrm{avg5}_k := \frac{1}{N_s - 2}\sum_{i=1}^{N_s-2} y_{i,k}$
stdev(avg5) =

Go to the top of the worksheet and change $N_s$ to 10. How do the different methods of data rejection now compare? Also try $N_s$ equal to 5.

Often, we do not have a number of identical measurements, but rather a series of measurements as a function of some variable that changes. For example, in a kinetic study, we would measure one or more concentrations versus time. We then fit the data to a model based upon an assumed rate law. Explain how we could modify the Chauvenet and Q-tests to apply to this situation.
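A Python sketch of the comparison asked for above, on the contaminated data (standard library only; names and seed are illustrative). It applies the avg4 rule to each sorted sample, dropping the lowest and/or highest point when its Q is at least $Q_t$, and the avg5 rule of always dropping both extremes. As an assumption of this sketch, $Q_t$ is obtained by simulating pure Gaussian samples rather than from the Q_test root search.

```python
import random

N_S, N_T = 25, 4000
LAMBDA, GAMMA = 0.01, 100.0


def draw(rng):
    """One point from the contaminated (narrow + 1% wide) distribution."""
    return rng.gauss(0.0, 1.0) * (GAMMA if rng.random() < LAMBDA else 1.0)


def q_ratios(xs):
    """Q for the lowest and for the highest point of an ascending list."""
    span = xs[-1] - xs[0]
    return (xs[1] - xs[0]) / span, (xs[-1] - xs[-2]) / span


def q_critical(p, n, rng, trials=10_000):
    """Simulated Q_0 with P_Q(Q_0, n) close to p for pure Gaussian samples."""
    qs = sorted(q_ratios(sorted(rng.gauss(0.0, 1.0) for _ in range(n)))[0]
                for _ in range(trials))
    return qs[int((1 - p) * trials)]


def q_test_mean(xs, q_t):
    """avg4 rule: drop the lowest and/or highest point if its Q >= q_t."""
    low, high = q_ratios(xs)
    start = 1 if low >= q_t else 0
    stop = len(xs) - 1 if high >= q_t else len(xs)
    kept = xs[start:stop]
    return sum(kept) / len(kept)


def stdev(values):
    """Sample standard deviation (N - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return (sum((v - m) ** 2 for v in values) / (n - 1)) ** 0.5


rng = random.Random(12)
q_t = q_critical(0.5 * (1 - 0.90), N_S, rng)  # 90% confidence, as in the worksheet
avg, avg4, avg5 = [], [], []
for _ in range(N_T):
    xs = sorted(draw(rng) for _ in range(N_S))
    avg.append(sum(xs) / N_S)
    avg4.append(q_test_mean(xs, q_t))
    avg5.append(sum(xs[1:-1]) / (N_S - 2))  # always drop lowest and highest

print(f"Q_t = {q_t:.3f}")
print(f"stdev(avg)  = {stdev(avg):.3f}  (no filtering)")
print(f"stdev(avg4) = {stdev(avg4):.3f}  (Q-test)")
print(f"stdev(avg5) = {stdev(avg5):.3f}  (drop highest and lowest)")
```

Changing N_S and LAMBDA at the top is a quick way to carry out the suggested exploration of when each rule gives the sharper distribution of sample means.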
