Unless we have made a very large number of measurements, we do not have an accurate estimate of the mean or standard deviation of a data set. If we assume the values are normally distributed, we can estimate the mean and standard deviation from the data. The sample mean and sample variance are given by:

$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$   (4.4a)   Excel: AVERAGE(values)

$S_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$   (4.4b)

The sample standard deviation is the square root of the sample variance:

$S_x = \sqrt{S_x^2}$   Excel: STDEV(values)

How close are these values to the true mean and standard deviation? That depends on how many samples we have.

FiniteStatistics.doc 9/26/2008 9:37 AM
4.4 Finite Statistics

[Figure: Normal Distribution Function. p(x) versus x for Sigma = 0.5, 1.0, 2.0, and 3.0.]
For a normally distributed data set, we can state the interval about the data set mean value, $\bar{x}$, within which a single measured value, $x_i$, is expected to fall with probability P%:

$x_i = \bar{x} \pm t_{\nu,P} S_x$   (P%)   (4.5)

$t_{\nu,P}$ is referred to as the t estimator, where $\nu = N - 1$ is the number of degrees of freedom.

Example: http://www.eng.buffalo.edu/courses/mae334/notes/finitestatsexample.xls

Find the sample mean, standard deviation, 95% precision interval within which one should expect any measured value to fall, standard deviation of the means, and the 95% estimate of the true mean value.

TABLE 4.1 Sample of Variable x

  i    x_i        i    x_i
  1    0.98      11    1.02
  2    1.07      12    1.26
  3    0.86      13    1.08
  4    1.16      14    1.02
  5    0.96      15    0.94
  6    0.68      16    1.11
  7    1.34      17    0.99
  8    1.04      18    0.78
  9    1.21      19    1.06
 10    0.86      20    0.96

Mean: 1.02
Standard Deviation: 0.16
$t_{19,95\%}$: 2.093
95% Precision Interval: 0.330
Standard Deviation of the means: 0.035
95% Precision interval about the true mean value: 0.074
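The quantities requested in this example can be checked numerically. The following is a minimal sketch in Python, assuming the twenty values of the sample table as listed and taking the t estimator for 19 degrees of freedom at 95% (2.093) directly from the Student-t table rather than computing it. The variable names are illustrative and not part of the original notes. `statistics.stdev` uses the N - 1 divisor of Eq. (4.4b), as Excel's STDEV does.

```python
import math
import statistics

# Values of x_i from the sample table (N = 20).
x = [0.98, 1.07, 0.86, 1.16, 0.96, 0.68, 1.34, 1.04, 1.21, 0.86,
     1.02, 1.26, 1.08, 1.02, 0.94, 1.11, 0.99, 0.78, 1.06, 0.96]

n = len(x)
t_19_95 = 2.093  # assumption: read from the Student-t table, nu = N - 1 = 19

mean = statistics.mean(x)       # sample mean, Eq. (4.4a)
s_x = statistics.stdev(x)       # sample standard deviation, Eq. (4.4b)
precision = t_19_95 * s_x       # 95% precision interval for one value, Eq. (4.5)
s_mean = s_x / math.sqrt(n)     # standard deviation of the means
ci_mean = t_19_95 * s_mean      # 95% precision interval about the true mean

print(f"mean                    = {mean:.2f}")
print(f"standard deviation      = {s_x:.2f}")
print(f"95% precision interval  = {precision:.3f}")
print(f"std. dev. of the means  = {s_mean:.3f}")
print(f"95% interval about mean = {ci_mean:.3f}")
```

The unrounded standard deviation is carried through the later steps, which is why the precision interval comes out as 0.330 rather than 2.093 x 0.16.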
Table 4.4 Student-t Distribution

  nu     t_50     t_90     t_95     t_99
   1    1.000    6.314   12.706   63.657
   2    0.816    2.920    4.303    9.925
   3    0.765    2.353    3.182    5.841
   4    0.741    2.132    2.776    4.604
   5    0.727    2.015    2.571    4.032
   6    0.718    1.943    2.447    3.707
   7    0.711    1.895    2.365    3.499
   8    0.706    1.860    2.306    3.355
   9    0.703    1.833    2.262    3.250
  10    0.700    1.812    2.228    3.169
  11    0.697    1.796    2.201    3.106
  12    0.695    1.782    2.179    3.055
  13    0.694    1.771    2.160    3.012
  14    0.692    1.761    2.145    2.977
  15    0.691    1.753    2.131    2.947
  16    0.690    1.746    2.120    2.921
  17    0.689    1.740    2.110    2.898
  18    0.688    1.734    2.101    2.878
  19    0.688    1.729    2.093    2.861
  20    0.687    1.725    2.086    2.845
  21    0.686    1.721    2.080    2.831
  30    0.683    1.697    2.042    2.750
  40    0.681    1.684    2.021    2.704
  50    0.679    1.676    2.009    2.678
  60    0.679    1.671    2.000    2.660
 inf    0.674    1.645    1.960    2.576
Standard Deviation of the Means

If we take a set of N measurements of the same variable, then repeat this process M times, the mean of each data set will differ somewhat from the others. It can be shown that, for sufficiently large N, the mean values themselves will approximately follow a normal distribution even if the original distribution is not normal. The standard deviation of the means is given by:

$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}$   (4.6)

Notice that the standard deviation of the means decreases as the sample size increases. We can now say with a certainty of P% that the mean of a sample of N values differs from the true mean of the distribution, $x'$, by no more than $\pm t_{\nu,P} S_{\bar{x}}$:

$x' = \bar{x} \pm t_{\nu,P} S_{\bar{x}}$   (P%)   (4.7)

PROBLEM 4.6

Consider a process in which the applied measured load has a known true mean of 100 N and a variance of 400 N$^2$. An engineer takes 16 measurements at random. What is the probability that this sample will have a mean value between 90 and 110 N?

KNOWN: $x' = 100$ N, $\sigma^2 = 400$ N$^2$ (so $\sigma = 20$ N)

FIND: For N = 16, $P(90 \le \bar{x} \le 110)$
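SOLUTION

Begin by finding the z value corresponding to each limit on the sample mean. Because the question concerns the mean of N values, the standard deviation of the means, $\sigma/\sqrt{N}$, takes the place of $\sigma$:

$z = \frac{\bar{x} - x'}{\sigma/\sqrt{N}}$

For $\bar{x} = 90$ N, $z = \frac{90 - 100}{20/\sqrt{16}} = -2.0$

For $\bar{x} = 110$ N, $z = \frac{110 - 100}{20/\sqrt{16}} = 2.0$

From Table 4.3, the one-sided probability for z = 2.0 is 0.4772, so

$P(90 \le \bar{x} \le 110) = P(-2.0 \le z \le 2.0) = 2(0.4772) = 0.9544$

So, there is about a 95% chance that the sample mean falls between 90 and 110 N.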
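The same probability can be evaluated without a table lookup. This is a minimal sketch using `statistics.NormalDist` from the Python standard library, assuming the sample mean is normally distributed with standard deviation sigma/sqrt(N) as in the solution above. The variable names are illustrative.

```python
import math
from statistics import NormalDist

true_mean = 100.0   # x', in N
sigma = 20.0        # square root of the 400 N^2 variance
n = 16              # number of measurements

# Distribution of the sample mean: same center, narrower spread.
mean_dist = NormalDist(mu=true_mean, sigma=sigma / math.sqrt(n))

probability = mean_dist.cdf(110.0) - mean_dist.cdf(90.0)
print(f"P(90 <= mean <= 110) = {probability:.4f}")
```

The computed value agrees with the table-based 0.9544 to within the four-decimal rounding of the table entries.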
Table 4.3 Probability Values for Normal Error Function

One-Sided Integral Solution for $p(z_1) = \frac{1}{(2\pi)^{1/2}} \int_0^{z_1} e^{-\beta^2/2} \, d\beta$

 z_1    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
 0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
 0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
 0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
 0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
 0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
 0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
 0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
 0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
 0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
 0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
 1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
 1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
 1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
 1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
 1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
 1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
 1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
 1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
 1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
 1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
 2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
 2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
 2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
 2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
 2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
 2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
 2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
 2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
 2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
 2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
 3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
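The entries of the normal error function table can be reproduced from the error function, since the one-sided integral from 0 to z_1 equals 0.5 erf(z_1 / sqrt(2)). The sketch below uses `math.erf` from the Python standard library. The function name `one_sided_probability` is a hypothetical helper introduced here for illustration.

```python
import math

def one_sided_probability(z1):
    """Area under the standard normal curve between 0 and z1."""
    return 0.5 * math.erf(z1 / math.sqrt(2.0))

# Spot-check a few table entries.
for z1 in (1.0, 1.96, 2.0, 3.0):
    print(f"z1 = {z1:4.2f}   p(z1) = {one_sided_probability(z1):.4f}")
```

Doubling the value gives the two-sided probability, for example 2 x 0.4772 for the range -2.0 to 2.0.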
4.7 Data Outlier Detection

How do you handle spurious data points? The most common and simplest approach is to label points that lie outside the range of 99.8% probability of occurrence, $\bar{x} \pm t_{\nu,99.8\%} S_x$, as outliers. This three-sigma test works well with data sets of 10 or more points.

Example 4.11

Given 10 data points with a mean of 27 psi and a standard deviation of 3.8 psi, test for data outliers.

The three-sigma test for small data sets gives a range of

$\bar{x} \pm t_{9,99.8\%} S_x = 27 \pm 4.3 \times 3.8 = 27 \pm 16.34$, or $10.66 < x_i < 43.34$ psi.

There are no data points outside this range.

For large data sets, the modified three-sigma test for outliers can be used. A modified z variable is computed with the data set mean and standard deviation:

$z_0 = \frac{|x_i - \bar{x}|}{S_x}$

$z_0$ is calculated for each data point and the corresponding probability value for the Normal Error Function (Table 4.3) is found. If the probability value is less than 0.1%, then the data point is considered an outlier. By this test, one data point (i = 8) is identified as an outlier.

4.8 Number of Measurements Required

Some sample statistics must be known to estimate the variation in the data set and therefore estimate a confidence interval in the data yet to be acquired. The precision interval about the mean is

$x' = \bar{x} \pm t_{\nu,P} S_{\bar{x}}$   (P%)
The 95% confidence interval is therefore

$CI = \pm t_{\nu,95\%} S_{\bar{x}} = \pm t_{\nu,95\%} \frac{S_x}{\sqrt{N}}$   (95%)

The one-sided precision value d is

$d = \frac{CI}{2}$

Therefore it follows that

$N \approx \left( \frac{t_{\nu,95\%} S_x}{d} \right)^2$   (95%)

Problem 4.4

Estimate the number of measurements of a time-dependent acceleration signal obtained from a vibrating vehicle that would lead to an acceptable confidence interval about the mean of 0.1 g, if the standard deviation of the signal is expected to be 2 g.

KNOWN: CI = 0.1 g, $S_x$ = 2 g

FIND: N

SOLUTION

Let d = CI/2 = 0.05 g. We are looking for the number of measurements required to keep $t_{\nu,95\%} S_x / \sqrt{N} \le 0.05$ g at 95%:

$N \approx (t_{\nu,95\%} S_x / d)^2$

If we select a large number of measurements, such that $t_{\nu,95\%} = 1.96$, then

$N \approx (1.96 \times 2 / 0.05)^2 \approx 6150$

For this value of N, the t value remains unchanged. Thus, a large number of measurements is required because of the tight restriction on CI.
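The sample-size estimate above is a one-line calculation. The sketch below is a minimal Python version, assuming the large-sample value t = 1.96 used in the solution. The function name `required_measurements` and its parameters are hypothetical, introduced only for illustration. For small N the t value would have to be looked up for nu = N - 1 and the calculation repeated until N stops changing.

```python
import math

def required_measurements(s_x, ci, t_value=1.96):
    """Estimate N from N = (t * S_x / d)^2 with d = CI / 2.

    t_value defaults to the large-sample 95% value (assumption).
    """
    d = ci / 2.0
    return math.ceil((t_value * s_x / d) ** 2)

n_required = required_measurements(s_x=2.0, ci=0.1)
print(f"N = {n_required}")
```

Rounding up to a whole measurement gives a value slightly below the rounded figure of about 6150 quoted in the solution.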
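Returning to the outlier tests of Section 4.7, both checks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of the original spreadsheet. The function names are hypothetical, the "probability value" is taken to be the one-sided tail area 0.5 - p(z_0), and the 61-point data set is made up purely to exercise the code. The range calculation uses the rounded values quoted in the Section 4.7 example (mean 27 psi, standard deviation 3.8 psi, t = 4.3).

```python
import math
import statistics

def three_sigma_range(mean, s_x, t_value):
    """Range mean +/- t * S_x used by the three-sigma test."""
    half_width = t_value * s_x
    return mean - half_width, mean + half_width

def one_sided_probability(z):
    """Area under the standard normal curve between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def modified_three_sigma_outliers(values, threshold=0.001):
    """Indices of points whose tail probability is below threshold.

    Assumption: the tail probability is 0.5 - p(z0), with
    z0 = |x_i - mean| / S_x and threshold 0.001 (0.1%).
    """
    mean = statistics.mean(values)
    s_x = statistics.stdev(values)
    flagged = []
    for i, value in enumerate(values):
        z0 = abs(value - mean) / s_x
        if 0.5 - one_sided_probability(z0) < threshold:
            flagged.append(i)
    return flagged

# Small data set: range from the example's summary statistics.
low, high = three_sigma_range(27.0, 3.8, 4.3)
print(f"three-sigma range: {low:.2f} to {high:.2f} psi")

# Large data set (hypothetical): 60 values near 10 plus one at 14.
data = [10.0 + 0.1 * ((i % 7) - 3) for i in range(60)] + [14.0]
outliers = modified_three_sigma_outliers(data)
print("flagged indices:", outliers)
```

Note that the mean and standard deviation in the modified test include the suspect point itself, so a single extreme value inflates S_x and makes the test more conservative on small data sets.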