Descriptive Statistics Example

L. K. Gaafar

A manufacturer is investigating the operating life of laptop computer batteries. The following data are available.

Life (min.)   Life (min.)   Life (min.)   Life (min.)
130           145           126           146
164           130           132           152
145           129           133           155
140           127           139           137
131           126           145           148
125           132           126           126
126           135           131           129
147           136           129           136
156           146           130           146
132           142           132           132

Using the first two digits as the stem, we may develop the following plot:

Stem | Leaves                          | Freq.
12   | 5 6 9 7 6 6 6 9 6 9             | 10
13   | 0 1 2 0 2 5 6 2 3 9 1 0 2 7 6 2 | 16
14   | 5 0 7 5 6 2 5 6 8 6             | 10
15   | 6 2 5                           | 3
16   | 4                               | 1

Stem-and-leaf plot

The plot shows that most of the data are clustered around 130, with few data points above 150. One may conclude that the center of the data is somewhere in the 130s. Variation is harder to judge. Whether the variability is high or low can only be determined on a comparative basis at this stage. If another data set is available (perhaps for another brand), a back-to-back stem-and-leaf plot could be used to visually compare the variability in both sets. By ordering the leaves, we get the following plot:

Stem | Leaves                          | Freq.
12   | 5 6 6 6 6 6 7 9 9 9             | 10
13   | 0 0 0 1 1 2 2 2 2 2 3 5 6 6 7 9 | 16
14   | 0 2 5 5 5 6 6 6 7 8             | 10
15   | 2 5 6                           | 3
16   | 4                               | 1

Ordered stem-and-leaf plot
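The construction of the ordered plot can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `battery1` and `stem_and_leaf` are hypothetical, and the list holds the 40 Battery 1 observations typed in by hand.

```python
from collections import defaultdict

# Battery 1 lives in minutes (hypothetical variable name; values typed in by hand).
battery1 = [
    130, 145, 126, 146, 164, 130, 132, 152, 145, 129,
    133, 155, 140, 127, 139, 137, 131, 126, 145, 148,
    125, 132, 126, 126, 126, 135, 131, 129, 147, 136,
    129, 136, 156, 146, 130, 146, 132, 142, 132, 132,
]


def stem_and_leaf(values):
    """Map each stem (first two digits) to its ordered list of leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first yields ordered leaves
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(battery1)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves} | {len(plot[stem])}")
```

Dropping the `sorted` call would give leaves in the order the data are scanned, which is how an unordered plot arises.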

From the plot above, we may determine many measures of dispersion and central tendency:

Minimum = 125, Maximum = 164, Range = 164 - 125 = 39.
Mode = 126, 132 (both are repeated 5 times; bimodal data).
Median: $\tilde{x} = \dfrac{x_{[20]} + x_{[21]}}{2} = \dfrac{132 + 133}{2} = 132.5$.

Other measures require some calculations:

Average: $\bar{x} = \dfrac{1}{40}\sum_{i=1}^{40} x_i = \dfrac{130 + 164 + \dots + 146 + 132}{40} = 136.85$.

These results confirm our initial conclusion that the center is in the 130s.

Variance: $s^2 = \dfrac{\sum_{i=1}^{40} (x_i - \bar{x})^2}{39} = \dfrac{(130 - 136.85)^2 + \dots + (132 - 136.85)^2}{39} = 95.87$.
Standard deviation: $s = \sqrt{s^2} = 9.79$.

Note: The average, median, mode, variance, and standard deviation may all be determined using the Excel functions AVERAGE, MEDIAN, MODE, VAR, and STDEV, respectively.

Also, we may use the ordered stem-and-leaf plot (repeated below for convenience) to determine some probabilities:

Stem | Leaves                          | Freq.
12   | 5 6 6 6 6 6 7 9 9 9             | 10
13   | 0 0 0 1 1 2 2 2 2 2 3 5 6 6 7 9 | 16
14   | 0 2 5 5 5 6 6 6 7 8             | 10
15   | 2 5 6                           | 3
16   | 4                               | 1

For example:

Only 3 observations are not less than 155. Therefore, $P(X < 155) = 37/40 = 0.925$, or 92.5%. This means that, based on the data we have, we expect 92.5% of the batteries to fail before 155 minutes.

3 observations are 155 or above, so $P(X \ge 155) = 3/40 = 0.075$, or 7.5%. This is also the complement of the above probability (1 - 0.925).

12 observations are greater than or equal to 140 and less than or equal to 155. Therefore, $P(140 \le X \le 155) = 12/40 = 0.30$, or 30%. Based on the data we have, we expect 30% of the batteries to last no less than 140 minutes, but no more than 155 minutes.
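The measures and relative-frequency probabilities above can be cross-checked with Python's standard `statistics` module. This is a sketch, not the method used in the original example: the variable names are hypothetical, and `statistics.multimode` needs Python 3.8 or later.

```python
import statistics

# Battery 1 lives in minutes (hypothetical variable name; values typed in by hand).
battery1 = [
    130, 145, 126, 146, 164, 130, 132, 152, 145, 129,
    133, 155, 140, 127, 139, 137, 131, 126, 145, 148,
    125, 132, 126, 126, 126, 135, 131, 129, 147, 136,
    129, 136, 156, 146, 130, 146, 132, 142, 132, 132,
]
n = len(battery1)

data_range = max(battery1) - min(battery1)
mean = statistics.mean(battery1)
median = statistics.median(battery1)       # average of the 20th and 21st ordered values
modes = statistics.multimode(battery1)     # every value tied for the highest count
variance = statistics.variance(battery1)   # sample variance, divisor n - 1
std_dev = statistics.stdev(battery1)

# Probabilities estimated as relative frequencies.
p_below_155 = sum(x < 155 for x in battery1) / n
p_155_or_more = sum(x >= 155 for x in battery1) / n
p_140_to_155 = sum(140 <= x <= 155 for x in battery1) / n

print(f"range={data_range}, mean={mean:.2f}, median={median}, modes={modes}")
print(f"variance={variance:.2f}, std dev={std_dev:.2f}")
print(f"P(X<155)={p_below_155:.3f}, P(X>=155)={p_155_or_more:.3f}, "
      f"P(140<=X<=155)={p_140_to_155:.2f}")
```

Both `statistics.variance` and `statistics.stdev` use the divisor n - 1, matching the 39 in the formula above.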

Notice that all calculated probabilities are approximate estimates that will improve as the amount of data increases.

We may develop a frequency distribution table for the data by dividing its range into classes and counting the frequency of data in each class. The number of classes (c) should be between 5 and 20, but close to $\sqrt{n}$, where n is the number of data points (n = 40). In our case, we should use about 6 or 7 classes. The class width (w) may be determined as w = Range/c. In our case, w = 39/7 = 5.571. To simplify calculations, we may increase c to 8 and modify w to 5. If we start the first class at 125, its upper bound would be 130, and all other classes are determined accordingly. Since the lowest data point is 125, the lower class limit must be inclusive. The last upper class limit is one point above the maximum, guaranteeing that all data will be included. The following table shows the frequency distribution of the data.

Class Interval    Frequency   Cumulative   Relative    Cumulative Relative
                              Frequency    Frequency   Frequency
125 <= X < 130    10          10           0.250       0.250
130 <= X < 135    11          21           0.275       0.525
135 <= X < 140     5          26           0.125       0.650
140 <= X < 145     2          28           0.050       0.700
145 <= X < 150     8          36           0.200       0.900
150 <= X < 155     1          37           0.025       0.925
155 <= X < 160     2          39           0.050       0.975
160 <= X < 165     1          40           0.025       1.000

The following histogram is a graphical depiction of the frequencies above. It shows that most of the data are clustered around 135, with few points above 150.

[Figure: histogram of the frequency distribution]
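The class-counting procedure described above can be sketched as follows. The names `lower`, `width`, and `n_classes` are hypothetical and simply encode the choices made in the text (first class starting at 125, w = 5, c = 8); this is an illustration rather than a general binning routine.

```python
import math

# Battery 1 lives in minutes (hypothetical variable name; values typed in by hand).
battery1 = [
    130, 145, 126, 146, 164, 130, 132, 152, 145, 129,
    133, 155, 140, 127, 139, 137, 131, 126, 145, 148,
    125, 132, 126, 126, 126, 135, 131, 129, 147, 136,
    129, 136, 156, 146, 130, 146, 132, 142, 132, 132,
]
n = len(battery1)

suggested_classes = math.sqrt(n)      # about 6.3, hence "6 or 7 classes"
lower, width, n_classes = 125, 5, 8   # simplified choices made in the text

# Count observations per class; lower limits inclusive, upper limits exclusive.
freq = [0] * n_classes
for x in battery1:
    freq[(x - lower) // width] += 1

rows = []
cumulative = 0
for k, f in enumerate(freq):
    cumulative += f
    lo = lower + k * width
    rows.append((lo, lo + width, f, cumulative, f / n, cumulative / n))

for lo, hi, f, cum, rel, cum_rel in rows:
    print(f"{lo} <= X < {hi}: {f:2d} {cum:2d} {rel:.3f} {cum_rel:.3f}")
```

Integer division by the class width places each observation in its class directly, which works here because all classes have equal width.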

A cumulative relative frequency plot may be used to calculate various probabilities. For example, in the plot below, we see that the probability of a battery life of less than 155 minutes is 0.925.

[Figure: cumulative relative frequency plot]

If our frequency distribution were developed with inclusive upper bounds, we could obtain cumulative probabilities of the form $P(X \le x)$ directly from the graph. To do that, we should start the first class from 124 to include all data. Consequently, the upper limit of the last class would be 164.
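A short sketch of the inclusive-upper-bound variant follows, assuming classes of width 5 that run from 124 < X <= 129 up to 159 < X <= 164. The names `upper_bounds` and `cum_rel` are hypothetical.

```python
# Battery 1 lives in minutes (hypothetical variable name; values typed in by hand).
battery1 = [
    130, 145, 126, 146, 164, 130, 132, 152, 145, 129,
    133, 155, 140, 127, 139, 137, 131, 126, 145, 148,
    125, 132, 126, 126, 126, 135, 131, 129, 147, 136,
    129, 136, 156, 146, 130, 146, 132, 142, 132, 132,
]
n = len(battery1)

# Inclusive upper class limits: 129, 134, ..., 164.
upper_bounds = list(range(129, 165, 5))

# Cumulative relative frequency at each upper limit estimates P(X <= limit).
cum_rel = {u: sum(x <= u for x in battery1) / n for u in upper_bounds}

for u, p in cum_rel.items():
    print(f"P(X <= {u}) = {p:.3f}")
```

Because the lives are recorded in whole minutes, P(X <= 154) estimated this way equals P(X < 155) from the earlier table.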

Now, let us assume that another data set of 40 points is available for another brand of batteries (Battery 2). The values are listed here in ascending order.

Life (min.)   Life (min.)   Life (min.)   Life (min.)
122           128           130           134
134           134           134           135
136           136           138           138
139           140           140           140
141           141           142           142
142           143           143           144
144           145           146           146
146           146           147           147
148           150           151           151
151           151           152           160

The measures of center and dispersion for Battery 2 are:

Minimum = 122, Maximum = 160, Range = 160 - 122 = 38.
Mode = 134, 146, 151 (all repeated 4 times; multimodal data).
Median: $\tilde{x} = \dfrac{x_{[20]} + x_{[21]}}{2} = \dfrac{142 + 142}{2} = 142$.
Average: $\bar{x} = \dfrac{1}{40}\sum_{i=1}^{40} x_i \approx 142$.
Symmetric data (Average $\approx$ Median).
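The Battery 2 measures of center can be checked the same way. In this sketch the list `battery2` is an assumption: it holds the 40 values in ascending order as they can be read from the back-to-back stem-and-leaf plot, not necessarily in the order in which they were recorded.

```python
import statistics

# Battery 2 lives in minutes, ascending (assumed values, read from the
# back-to-back stem-and-leaf plot; the variable name is hypothetical).
battery2 = [
    122, 128, 130, 134, 134, 134, 134, 135, 136, 136,
    138, 138, 139, 140, 140, 140, 141, 141, 142, 142,
    142, 143, 143, 144, 144, 145, 146, 146, 146, 146,
    147, 147, 148, 150, 151, 151, 151, 151, 152, 160,
]

mean2 = statistics.mean(battery2)
median2 = statistics.median(battery2)
modes2 = statistics.multimode(battery2)   # all values tied for the highest count

print(f"mean={mean2:.3f}, median={median2}, modes={modes2}")
```

The mean and median nearly coincide, which is the basis for describing the data as symmetric.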

Variance: $s^2 = \dfrac{\sum_{i=1}^{40} (x_i - \bar{x})^2}{39} = 55.2$.
Standard deviation: $s = \sqrt{s^2} = 7.43$.

These results show numerically that Battery 2 has a higher average life with slightly less variation. An easy way to graphically compare the two sets is to develop a back-to-back stem-and-leaf plot.

Freq. | Battery 2                               | Stem | Battery 1                       | Freq.
2     |                                     8 2 | 12   | 5 6 6 6 6 6 7 9 9 9             | 10
11    |                   9 8 8 6 6 5 4 4 4 4 0 | 13   | 0 0 0 1 1 2 2 2 2 2 3 5 6 6 7 9 | 16
20    | 8 7 7 6 6 6 6 5 4 4 3 3 2 2 2 1 1 0 0 0 | 14   | 0 2 5 5 5 6 6 6 7 8             | 10
6     |                             2 1 1 1 1 0 | 15   | 2 5 6                           | 3
1     |                                       0 | 16   | 4                               | 1

Back-to-back stem-and-leaf plot

The plot above shows that more data for Battery 2 are in the 140s compared to the 130s for Battery 1. Also, the spread (variability) of Battery 2 is less than that of Battery 1. Based on these results, we may conclude that Battery 2 is a better brand (higher average and lower variability). The validity of this conclusion, however, depends on how data are collected and the sufficiency of n. These issues are typically discussed as part of Inferential Statistics and Design of Experiments.

A better graphical comparison tool is the box (box-and-whisker) plot. A plot for both data sets is shown below.

[Figure: Box Plot]

The plot above supports our previous conclusion, as the interquartile range of Battery 2 is shorter than that of Battery 1 (less variability), and is shifted to the right (higher center).
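The box-plot comparison can be made numerical by computing quartiles and interquartile ranges. This is a sketch under assumptions: it uses `statistics.quantiles` with `method="inclusive"`, which may differ from the quartile convention behind the original plot, and the `battery2` values are those read from the back-to-back stem-and-leaf plot. The variable names are hypothetical.

```python
import statistics

# Both data sets in minutes (hypothetical names). battery2 is assumed from
# the back-to-back stem-and-leaf plot and is listed in ascending order.
battery1 = [
    130, 145, 126, 146, 164, 130, 132, 152, 145, 129,
    133, 155, 140, 127, 139, 137, 131, 126, 145, 148,
    125, 132, 126, 126, 126, 135, 131, 129, 147, 136,
    129, 136, 156, 146, 130, 146, 132, 142, 132, 132,
]
battery2 = [
    122, 128, 130, 134, 134, 134, 134, 135, 136, 136,
    138, 138, 139, 140, 140, 140, 141, 141, 142, 142,
    142, 143, 143, 144, 144, 145, 146, 146, 146, 146,
    147, 147, 148, 150, 151, 151, 151, 151, 152, 160,
]


def quartile_summary(values):
    """Return (Q1, median, Q3, IQR) using the 'inclusive' quartile convention."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


summary1 = quartile_summary(battery1)
summary2 = quartile_summary(battery2)

for name, data, (q1, q2, q3, iqr) in (
    ("Battery 1", battery1, summary1),
    ("Battery 2", battery2, summary2),
):
    print(f"{name}: Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}, "
          f"s={statistics.stdev(data):.2f}")
```

Under this convention the Battery 2 box is narrower and sits further right than the Battery 1 box, in line with the conclusion drawn from the plot.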