Chapter 3: Data Description

Size: px

Start display at page:

Download "Chapter 3: Data Description"

Barnaby Rogers
6 years ago
Views:

1 Chapter 3: Data Description 3.3 Graphical Methods 3.1 a. Pie Chart b. Bar Graph 3.2 a. A pie chart would not be appropriate since we are not proportionally allocating a sample or population into a number of categories. b. Bar Graph c. Yes. There was a decline from the late 80 s to the early 90 s. An abrupt surge upward occurred in 1994, 1995, and Pie Chart and Bar Graph should be plotted. Pie chart is better since we are proportionally allocating the food dollar to seven categories. 3.4 Pie chart should be plotted. 3.5 Bar graph should be plotted. 3.6 a. Range = = 0.33 b. Frequency histogram should be plotted with 7 classes ranging from to The intervals have width.05. c. Relative frequencies are given below. Plot relative frequencies versus class intervals. Class Class Interval Frequence(f i ) Relative Frequency(f i /25) Total n= d. The probability is 7/25 =.28 that the fluoride reading would be greater than.90 ppm. Thus, we would predict the 28% of the days would have a reading greater than.90 ppm. 3.7 Two separate bar graphs could be plotted, one with Lap Belt Only and the other with Lap and Shoulder Belt. A single bar graph with the Lap Belt Only value plotted next to the Lap and Shoulder for each value of Percentage of Use is probably the most effective plot. This plot would clearly demonstrate that the increase in number of lives saved by using a shoulder belt increased considerably as the percentage use increased. 3.8 a. A relative frequency histogram with 10 classes of width 100 units would have the same shape as the stem-and-leaf plot. 5

2 b. The shape is a fairly symmetric, unimodal plot. The stem-and-leaf plot is more informative because it shows the exact crime rates within the leaves (classes) as well as the frequencies. c. Construct a histogram with 10 intervals of width 100, starting at 100 and ending at The relative frequency histogram is multimodal, skewed to the right a. Because there a few relatively very large values in the data set, a frequency table with unequal class widths is given below: Class Class Interval Frequence(f i ) Relative Frequency(f i /25) Total n= b. Plot a relative frequency histogram from the table in a. c. The graph is unimodal skewed to the right. Note that the plot in this case is a graphical distortion because the amount of area in the last two classes is not representative of the relative frequencies in these classes. d. We would estimate that there is a 6/24 = 25% chance that city taxes would be greater than $ The stem-and-leaf would be since the exact values for the 24 cities is being displayed, whereas in the relative frequency histogram the data is grouped into classes. Stem-and-Leaf Plot (Leaf Unit 100) 2 70,71, ,60, ,67,88, ,20,61, ,43, , The stem-and-leaf plot is more informative because it provides the exact values of the data as well as the frequencies within each stem The computer generated relative frequency histogram will generally have classes with equal class widths. This will produce a number of intervals having relative frequency of 0. The stem-and-leaf plot may have a different choice for stems a. Construct separate relative frequency histograms. 6

3 b. The histogram for the New Therapy has one more class than the Standard Therapy. This would indicate that the New Therapy generates a few more large values than the Standard Therapy. However, there is not convincing evidence that the New Therapy generates a longer survival time The plot has a bimodal shape. This would be an indication that there are two separate populations. However, the evidence is not very convincing because the individual plots were similar in shape with the exception that the New Therapy had a few times somewhat larger than the survival times obtained under the Standard Therapy a. The outlays have increased rapidly from 1980 to 1989, then dropped somewhat until 1991, when they increased again to The values had a slight decrease from 1992 through b. The % GNP increased rapidly form 1980 to 1985, remain steady for a couple of years, then decreased fairly rapidly from 1987 through 1997 (except a minor jump from 1991 to 1992). c. The two plots have very different trends. The public interest group s contention is not supported by either graph for the decade of the 90 s a. There appears to have been a dramatic drop in verbal scores for both sexes from 1970 to 1980, with a greater drop in female scores. There appears to have been a slight drop in math scores from 1970 to 1980, then a slow steady rise from 1980 to b. Yes. For the years , the difference between males and females has remained relatively constant. c. The female verbal scores had a larger decrease for the years than the male scores Plot of average public school teacher s salaries: The plot obtained in Exercise 3.17 would lead to the conclusion that there was a dramatic increase in salaries from 1970 to 1985, whereas the plot obtained in Exercise 3.18 shows what really happened-there was a gradual increase in salaries from 1970 to Spacing the points evenly along the horizontal axis made the increase from 1970 to 1985 more pronounced. The interval width between on the horizontal axis should be five times the width of that between , and the interval width between 1983, 1985, 1987 should be two times the width of the interval between If these interval widths between time points do not correspond to the actual length of time between data observations, the resulting trend line could be seriously misleading There are a few states having either a very small or very large number of telephones per thousand residents. The vast majority of states have values in the 450 to 700 range a. Relative frequency histograms for 1985 and 1996 should be plotted. b. The two plots are very similar in shape. The proportion of states having a high percentage of homeownerships appears to have increased relative to c. There was a very robust economy during this time period which may have allowed more people to purchase homes. 7

4 d. Congress would be able to determine that there has not been a large increase in homeownership during this period of a strong economy and decide to increase the tax deductions relative to homeowners Stem-and-Leaf Plot 3.22 The shapes of the 1985 and 1996 histograms and stem-and-leaf plots are asymmetric. The four plots are unimodal and left-skewed The plots show an upward trend from year 1 to year 5. There is a strong seasonal (cyclic) effect; the number of units sold increases dramatically in the late summer and fall months a. Plot the activity data versus time. b. The plot shows an upward linear trend. c. There is no seasonal trend in the data. 3.4 Measures of Central Tendency 3.25 Mean = 243/16 = = 15.2, Median = (14+15)/2 = 14.5, Mode = Mean = 283/16 = = 17.7, Median = (14+15)/2 = 14.5, Mode = 18 The median and mode are unalterered but the mean is inflated by the two large values % of measurements = 10% of 16 = Thus, we delete the two largest and smallest data values. Since the 2 largest values are deleted in computing the 10% Trimmed Mean, its value is the same for both data sets: 10% Trimmed Mean = ( )/12 = Thus, the extreme values do not have an effect. A 5% Trimmed Mean deletes only the smallest and largest values in the data set. Therefore, the second largest extreme value does enter into the calculation and hence would have an effect on the 5% Trimmed Mean. The effect would not be as dramatic as was seen in using the untrimmed mean Mean = 96/16 = 6, Median = (5+6)/2 = 5.5, 3.29 The following table is used to calculate the summary statistics: 8

5 Class Interval Frequency (f i ) Midpoint(y i ) f i y i Total mean 114/15 = 7.6 mode 7 median L + w f m (.5n cf b ) = [(.5)(15) 4] = a. The relative frequency histogram is unimodal, slightly right skewed. b. The following table is used to calculate the summary statistics: Class Interval Frequency (f i ) Midpoint(y i ) f i y i Total ,380 mean 18380/191 = 96.2, mode 80 median L + w f m (.5n cf b ) = [(.5)(191) 92] = c. Since the median is larger than the mean, it would indicate that the plot is somewhat left skewed. This contradiction between what is indicated in the relative frequency histogram and what is indicated by the summary statistics is due to the fact that the class intervals are of different width. The correct plot would have relative frequency divided by class width on the vertical axis. This would then produce a left skewed histogram with mode at approximately 110. d. The median is more informative since the distribution is somewhat skewed to the left which produces a mean somewhat less than the middle of the distribution. The median distance traveled would at least represent a value such that half of the buses traveled less and half greater than 101,600 miles a. The mean cannot be approximated since we do not know the endpoint for the last interval, hence cannot compute the midpoint of this interval. Mode 240. median L + w f m (.5n cf b ) = [(.5)(408) 119] = b. The median since it would give an indication of the cholestrol level for which half of men in this group have a greater of lesser value a. Mean = 8.04, Median = 1.54 b. Terrestrial: Mean = 15.01, Median = 6.03 Aquatic: Mean = 0.38, Median =.375 9

6 c. The mean is more sensitive to extreme values than is the median. d. Terrestrial: Median, because the two large values(76.50 and 41.70) results in a mean which is larger than 82% of the values in the data set. Aquatic: Mean or median since the data set is relatively symmetric a. After removing the survival times of the two individuals who left the study, we obtain Mean = The median can be calculated for all 11 patients, since we know that the values for the two indiduals who left the study were greater than the listed values of 57 and 60 which would place them in the upper half of the survival times. Thus, we obtain Median = 29. b. The median would be unchanged but the mean would increase since these two values will be greater than the mean calculated from the nine observed values a. If we use all 14 failure times, we obtain Mean > and Median = 154. In fact, we know the mean is greater than since the failure times for two of engines are greater than the reported times of 300 hours. b. The median would be unchanged if we replaced the failure times of 300 with the true failure times for the two engines that did not fail. However, the mean would be increased a. The rounded measurements: b. Sample modes = 2.10, 2.77, 2.82 c. Median = ( )/2 = 2.45 d. Mean = mode = 2.10, 2.77, 2.82 median = 2.45 mean = The mode and median stay the same but the mean increases a. The values are given below: Group Mean Median Mode I no mode II , 1.57 III b. mean = median = mode = 0.70, 1.55, 1.57 c. If we were to use one summary for the combined group, then the median would be most appropriate because the three groups are substantially different. If separate summaries are computed for each group, then the mean and median are both appropriate since the three groups have relatively symmetric distributions. 10

7 3.38 Mean = , Median = , Mode = The average of the three net group means and the mean of the complete set of measurements are the same. This will be true whenever the group have the same number of measurements, but is not true if the groups have different sample sizes. However, the average of the group modes and medians are different from the overall median and mode. 3.5 Measures of Variability 3.39 a. X = yi /n = 40 8 = 5. Yes. b. 8 i=1 (y i 5) 2 = (6 5) 2 + (3 5) 2 + (10 5) 2 + (4 5) 2 + (4 5) 2 + (2 5) 2 + (4 5) 2 + (7 5) 2 = = 46 c. s 2 = = 6.57 s = The CV = 100 s 2.56 ȳ % = % = 51%. The standard deviation is 51% of the mean a. s = 7.95 b. Because the magnitude of the racers ages is larger than that of their experience Age: s (10 2)/4 = 2. The estimate is fairly close to the actual value of Experience: s (39 18)/4 = The estimate is somewhat smaller than the actual value of i y i = 1016, ȳ = 1016/50 = 20.32, i (y i 20.32) 2 = 3, , s 2 = [ ] = , s = =

8 3.43 The quantile plot is given here. Quantile Plot of Times Treatment Times(minutes) u a. The 25th percentile is the value associated with u =.25 on the graph which is 14 minutes. Also, by definition 14 minutes is the 25th percentile since 25% of the times are less than or equal to 14 and 75% of the times are greater than or equal to 14 minutes. b. Yes; the 90th percentile is 31.5 minutes. This means that 90% of the patients have a treatment time less than or equal to 31.5 minutes (which is less than 40 minutes) a. The frequency table is given here. Class Intervals Frequency Relative Frequency Total The frequency histogram is given here. 12

9 Frequency Histogram for Trees Frequency Diameters(inches) b. ȳ = = c. s = ȳ ± s yields (5.744,9.713); 50/70 = 71.43% ȳ ± 2s yields (3.759, ); 68/70 = 95.71% ȳ ± 3s yields (1.774, ); 70/70 = 100% The percentages are very close to the percentages given by the empirical rule: 68%, 95%, and 99.7% a. Range 200 b. s 200/4 = 50 or we can use the method of estimating s from grouped data which yields s (( )2 6 + ( ) ( ) ( ) ( )46 + ( ) ( ) ( ) 2 4)) = s = 37.2 and ȳ 96.2 c. ȳ ± s yields (59, 133.4); 121/191 64% ȳ ± 2s yields (21.8, 170.6); 181/ % ȳ ± 3s yields ( 15.4, 207.8); 191/191 = 100% The empirical rule works well for this data set since the relative frequency histogram is mound-shaped. 13

10 3.46 a. Luxury: ȳ = 145.0, s = 27.6 Budget: ȳ = 46.1, s = 5.13 b. Luxury: CV = 19% Budget: CV = 11% c. Luxury hotels vary in quality, location, and price, whereas budget hotels are more competitive for the low-end market so prices tend to be similar. d. The CV would be better because it takes into account the larger difference in the means between the two types of hotels a. The time series plot is given here. Mercury Concentration (ng/g dry weight) Mercury Concentration Site 1 Site Year For Site 1, there is a steady decrease until 1980, after which the level is fairly constant but at a much lower level than the values for Site 2. The concentrations at Site 2 are very erratic from 1969 to 1980, with alternating rises and falls. From 1980 through 1992, there is a fairly steady decline in mercury concentration. b. Site 1: Median = 18.25, Mean = Site 2: Median = 184.1, Mean = Both distributions are right skewed, thus the median is a more appropriate measure of center than is the mean. Site 1 has a considerably lower center than that of Site 2. c. Site 1: s = 26.95, CV = 92% Site 2: s = 194.7, CV = 68% 14

11 Comparing the standard deviations can be misleading because is smaller in magnitude than However, the data values for Site 1 are considerably smaller in magnitude also. Therefore, it is more informative to compare the CV values. Based on CV values the concentrations from Site 1 are relatively more variable than those from Site 2. d. No, Site 1 does not have values for these years. 3.6 The Boxplot 3.48 Median (Q 2 ) = 6 Lower Quartile (Q 1 ) = 4, Upper Quartile (Q 3 ) = Q 2 = 20 Q 1 = (17+18)/2 = 17.5, Q 3 = (23+24)/2 = a. Stem-and-Leaf Plot is given here: Stem Leaf b. Min=250, Q 1 = ( )/2 = 298, Q 2 = ( )/2 = 317.5, Q 3 = ( )/2 = 334, Max=386 15

12 Number of Blood Donors On Fridays Number of Volunteers There are no outliers because Q 1 (1.5)IQR = 244 < 250 and Q 3 + (1.5)IQR = 388 > 386. The distribution is approximately symmetric with no outliers a. CAN: Q , Q , Q DRY: Q , Q , Q b. Canned dogfood is more expensive (median much greater than that for dry dogfood), highly skewed to the right with a few large outliers. Dry dogfood is slightly left skewed with a considerably less degree of variability than canned dogfood. 16

13 3.7 More Than One Variable 3.52 a. Stacked Bar Graph is given here: Literacy Level of Three Subsistence Groups Percent in Literacy Level Middle Primary Illerate Shifting Settled TownDweller Subsitence Group b. Illiterate: 46%, Primary Schooling: 4%, At Least Middle School: 50% Shifting Cultivators: 27%, Settled Agriculturists: 21%, Town Dwellers: 51% There is a marked difference in the distribution in the three literacy levels for the three subsistence groups. Town dwellers and shifting cultivators have the reverse trends in the three categories, whereas settled agriculturists fall into essentially two classes Percentage comparison based on Row Totals: Turnover Age(years) Reason < Total Resigned (n=60) Transferred (n=66) Retired/fired (n=124) 50% of workers who resign are in the youngest age group; of those who transfer, 68.2% are in their 30 s; and of those who either retire or got fired 86.24% are at least 40 years old. 17

14 3.54 Percentage comparison based on Column Totals: Turnover Age(years) Reason < Resigned Transferred Retired/fired Total 100(n=50) 100(n=60) 100(n=60) 100(n=80) 60% of workers in their 20 s resign; 75% of the workers in their 30 s transfer; 87.6% of the workers in their 40 s retire or get fired; and 68.75% of the workers who are at least 50 years old retire or get fired a. The means and standard deviations are given here: Supplier ȳ s b. Side-by-side Boxplots are given here: Deviations from Target Power Value for Three Suppliers Deviations from Target Supplier 1 Supplier 2 Supplier 3 Supplier The three distributions are relatively symmetric but supplier 3 is considerably more variable and is shifted above supplier 1 s values, which in turn are shifted above supplier 2 s values. 18

15 c. Supplier 3 not only has the largest mean but also the largest standard deviation. Suppliers 1 and 2 have similar degrees of variability but supplier 1 has a greater mean than supplier 2. d. Supplier 2 because it has the smallest mean and deviates with essentially the same degree of variability as supplier A scatterplot of M3 versus M2 is given here. Money Supply (Trillions of Dollars) M M2 a. Yes, it would since we want to determine the relative changes in the two over the 20 month period of time. b. See scatterplot. The two measures follow an approximately increasing linear relationship. 19

16 3.57 A time series plot with M2 and M3 values on the vertical axis and months on the horizontal axis is given here. Money Supply (Trillions of Dollars) Money Supply M2 M Month This is a more informative plot than the scatterplot because it shows the relative changes of the two measures of money supply across the 20 months. 20

17 Supplementary Exercises 3.58 a. Mean = 57.5, Median = 34.0 b. Median since the data has a few very large values which results in the mean being larger than all but a few of the data values. c. Range = 273, s = 70.2 d. Using the approximation, s range/4 = 273/4 = The approximation is fairly accurate. e. ȳ ± s ( 12.7, 127.7); yields 82% ȳ ± 2s ( 82.9, 197.0); yields 94% ȳ ± 3s ( 153.1, 268.1); yields 97% These percentages do not match the Empirical Rule very well: 68%, 95%, and 99.7% f. The Empirical Rule applies to data sets with roughly a mound-shaped histogram. The distribution of this data set is highly skewed right a. mode = 5, median = 15, mean = b. range = 34-4 = 30, s 30/4 = 7.5 c. s = 8.5 d. No, the histogram for the data set is skewed to the right and hence is not moundshaped a. Price per roll: Mean = , s = Price per sheet: Mean = , s = b. Price per roll: CV = = 46.03% Price per sheet: CV = = 54.13% The price per sheet is more variable relative to its mean. c. CV; The CV value is unit free, whereas the standard deviation also reflects the relative magnitude of the data values Plot the data in a scatterplot. a. No b. No, as the number of sheets increases from 50 to 100, there is just a scatter of points, no real pattern. The price per roll jumps dramatically for the two brands having the largest number of sheets. c. Paper towel sheets vary in thickness and size which will affect the price From the two boxplots, there are 5 unusual brands with regards to price per roll: $1.49, $1.56, $1.59, $1.78, and $1.98. There are 2 unusual brands with respect to sheet per roll: 180 and a. Mean = 0.65; Median = 0.55; Mode =

18 b. Mean = 1.21; Median = 0.55; Mode = 0.5 The mean increases but the median and mode remain the same a. Range = = 0.9, s = b. s 0.9/4 = (fairly close to the actual value of 0.276) c. Range = = 6.5, s = 1.974, s 6.5/4 = Range = = 67.7, s = 21.32, s 67.7/4 = Note as the data becomes more dispersed, the range increases more rapidly than s a. Construct a relative frequency histogram b. Highly skewed to the right c ± 3488 (-2063, 4912) contains 37/41 = 90.2% 1424 ± (2)3488 (-5551, 8400) contains 38/41 = 92.7% 1424 ± (3)3488 (-9039, 11888) contains 39/41 = 95.1% These values do not match the percentages from the Empirical Rule: 68%, 95%, and 99.7%. d ± 1.54) (-0.06, 3.02) contains 31/41 = 75.6% 1.48 ± (2)1.54 (-1.60, 4.56) contains 40/41 = 97.6% 1.48 ± (3)1.54 (-3.14, 6.10) contains 41/41 = 100% These values closely match the percentages from the Empirical Rule: 68%, 95%, and 99.7% The values for the two years are given here: Year Mean Median Std. Dev a. The median is more appropriate since both distributions are left skewed. b. There is not a large difference in the summary statistics between the two years a. There has been very little change from 1985 to b. Yes; For both 1985 and 1996, DC had extremely low ownership. New York and Hawaii had semi-low ownership. c. No d. The cost of homes is very high Relative frequency histogram plot a. Mode = 2.5 Median L + w f m [0.5n cf b ] = [0.5(90) 35] = 7.04 b. Mean 1 13 n i=1 f iy i = 747/90 = 8.3 c. Since the distribution is skewed to the right, the median provides a better measure of the center of the distribution. 22

19 3.70 a. 75th percentile L + w f p (0.75n cf b ) = [0.75(90) 72] = th percentile L + w f p (0.25n cf b ) = frac215[.25(90) 20] = 3.83 interquartile range = b. s n 1 i=1 f i(y i 8.3) 2 = 1 89 [2(.5 8.3)2 + 18( ) ( ) 2 ] = Thus, s = The quantile plots are given here. Homeownership in Percentage(by State) for 1985 Homeownership in Percentage(by State) for 1996 Homeownership(%) Homeownership(%) u u a. The 20th percentile for 1996 is given by reading the vertical value on the graph for u = %. Thus, approximately 20% of the 1996 homeownership percentages are less than or equal to 63%. b. The upper 10th percentile would correspond to the states having the largest 5 percentages: Michigan, Indiana, West Virginia, Minnesota, and Maine. c. In 1985, the states falling in the upper 10th percentile are Pennsylvania, South Carolina, Wyoming, Maine, and West Virginia. There are only two states which fall into both groups The relative frequency distribution is unimodal, highly right skewed a. mode = 1.0 median =

20 b. mean = 10 i=0 f iy i = 663/500 = c. Becauce the mean is larger than the mode and median, the distribution is skewed to the right Since the distribution is highly skewed to the right, the Empirical Rule may not be appropriate a. Relative frequency histogram 11 b. Mean 1 n i=1 f iy i = 1 50 [(1)(52) + (4)(67) + + (1)(202)] = 5480/50 = s n 1 i=1 f i(y i 109.6) 2 = 1 49 [47412] = s = a. South: i y i = 17930, ȳ = 17930/30 = North: i y i = 14185, ȳ = 14185/29 = West: i y i = 19707, ȳ = 19707/29 = b. ȳ = 1 n [ i n iy i ] = 1 88 [30(597.67) + (29)(489.14) + (29)(679.55)] 1 88 [51822] = c. Total for all three regions is = Thus, ȳ = 1 88 [51822] = Sample mean computed in part b is equal to that obtained in part a. Note that if we were to just average the three means obtained in a we would obtain a slightly different answer since the sample sizes for the three regions are unequal. 1 3 [ ] = a. Plot relative frequency histogram. b c. mean = 25115/23 = , median = 1039 d. Because the mean is slightly larger than the median, it is likely that the distribution is slightly skewed to the right a. Mode = 688; yes, because it would indicate the most frequently occurring production. b. Median = 688 c. Mean = 17664/26 = d. Because the mean is somewhat less than the median, it is likely that the distribution skewed to the left. 24

21 3.79 The stem-and-leaf diagram is given here: Stem Leaf Yes, the distribution is left skewed a. median(q2) = 688, Ql = 667, Q3 = 701, IQR = = 34 b. The inner fence boundaries are: 667-(1.5)(34) = 616 and 701+(1.5)(34) = 752. There is an outlier at 547. c. The boxplot is given here: 25

22 Number of Cars Produced Per Shift Number of Cars Produced The means and medians are given here: Number of Members Mean Median a. mean = 9024/83 = b. Yes, we can use the following method: [20(93.75) + 23(98.652) + 16( ) + 14( ) + 10(131.90)]/83 = /83 = Results agree with part a, except for round-off errors. c. Median = 104 d. No 3.83 a. New policy ȳ = 2.27, s = 3.26 Old policy: ȳ = 4.60, s = 2.61 b. Both the average number of sick days and the variation in number of sick days have decreased with new policy. 26

23 3.84 New policy: ȳ = 1.93, s = 2.31 Old policy: ȳ = 4.27, s = 1.79 Yes, the ranges are affected a. The sample mean will be distorted by several large values which skew the distribution. State 5 and State 11 have more than 10 times as many plants destroyed as any other state; for arrests, States 1, 2, 8 and 12 exceed the other arrest figures substantially. b. Plants: ȳ = /15 = Arrests:ȳ = 1425/15 = 95 10% trimmed mean: Plants: ȳ = /11 = Arrests: ȳ = 657/11 = % trimmed mean: Plants: ȳ = /9 = 133, Arrests: ȳ = 372/9 = For plants, the 10% trimmed mean works well since it eliminates the effect of States 5 and 11. For arrests, the means differ because each takes some of the high values out of calculation. It appears that the distribution is not skewed, but rather separated into at least two parts: states with high number of arrests and states with low numbers. By trimming the mean, we may be losing important information Although the six states with fewest plants also have the fewest arrests, the states with the most plants do not have the most arrests. We could get a better idea of the relationship by plotting the two variables in a scatterplot. Other variables which might relate to number of plants destroyed include size of police force (especially the narcotics division) and size of budget allocated to drug enforcement a. A time series plot is given here: 27

24 Four Monthly FDC Indices FDC Index Pharmaceticals Diversified Chain Wholesaler Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Month b. All the components vary throughout the year with a slight increasing trend. Wholesalers show the greatest degree of variability through the year The percent change for the four indices are given here: Month Pharmaceuticals Diversified Chain Wholesaler January February March April May June July August September October November December A graph of percent changes helps to show comparisons between components. The diversified index shows much greater monthly variability than the other components; pharmaceuticals also vary, but not as rapidly as diversified a. Average Price =

25 b. Range = c. DJIA = d. Yes; The stocks covered on the NYSE. No; The companies selected are the leading companies in the different sectors of business a. No. The distributions of the data for the two years are left skewed. b. ȳ = 66.84, s = 6.69 ȳ ± s (60.16, 73.53) contains 82% of the data values ȳ ± 2s (53.46, 80.22) contains 94% of the data values ȳ ± 3s (46.77, 86.91) contains 98% of the data values These percentages are not very consistent with those provided by the Empirical Rule: 68%, 95%, 99.7% Only the District of Columbia with 37.4% and 40.4% homeownership in 1985 and 1996, respectively, fell more than 3 standard deviations from the mean. The stem-and-leaf plot showed this value at the low end of the scale, but did not really separate it too much from the rest of the data. It is not too extreme and should not decrease the mean too much. The raw and 10% trimmed means are given here: Year Raw Mean 10% Trimmed Mean a. The job-history percentages within each source are give here: Source Job History Within Firm Related Business Unrelated Business Promoted Same position Resigned Dismissed Total 100(n=57) 100(n=21) 100(n 42) b. If for each source, we compute the percentages combined over the promoted and same position categories, we find that they are 78.94% for within firms, 57.15% for related business and 66.67% for unrelated business. This ordering by source also holds for every job history category except the promoted one in which the three sources are nearly equal. It appears that a company does best when it selects its middle managers from within its own firm and worst when it takes its choices from a related firm a. The value 62 reflects the number of respondents in coal producing states who preferred a national energy policy that encouraged coal production. The value 32.8 is of those who favored a coal policy, 32.80% came from major coal producing states. The value 41.3 tells us that 41.3% of those from coal states favored a coal policy. And the value 7.8 tells us that 7.8% of all the responses come from residents of major coal producing states who were in favor of a national energy policy that encouraged coal production. b. The column percentages because they displayed the distribution of opinions within each of the three types of states. 29

26 c. Yes. For both the coal and oil-gas states, the largest percentage of responses favored the type of energy produced in their own state Arbitration seems to win the largest wage increases. If we assume that the Empirical Rule holds for these data, then a standard error for the mean of the arbitration figures would be s/ n =.25. Thus the mean increase after arbitration is ( )/.25 = 4 standard errors above the next largest mean, that for Poststrike. Management, on the other hand, should favor negotiation. It has the smallest mean percentage wage increase and the smallest variance, or least risk. 30

TOPIC: Descriptive Statistics Single Variable

TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency