STATISTICS Unit 2 STUDY GUIDE
Topics 6-10

Name: ____________________ Block: ________

Part 1: Vocabulary
For each word, be sure you know the definition, the formula, or what the graph looks like.

A. association
B. boxplot
C. center
D. conditional distribution
E. empirical rule
F. five-number summary
G. histogram
H. independent
I. interquartile range
J. lower quartile
K. marginal distribution
L. mean
M. mean absolute deviation
N. median
O. mode
P. modified boxplot
Q. outliers
R. outlier test
S. range
T. relative risk
U. resistant
V. segmented bar graph
W. skewed left
X. skewed right
Y. side-by-side stemplot
Z. Simpson's paradox
AA. standard deviation
BB. standardization
CC. stemplot
DD. symmetric
EE. two-way table
FF. upper quartile
GG. variability
HH. z-score

1. A graph for a quantitative variable that divides a distribution into 25% segments.
2. A graph for a quantitative variable that divides a distribution into 25% segments and shows all mathematical outliers.
3. The minimum, Q1, median, Q3, and maximum.
4. The middle of a distribution that can be described by the mean, median, or mode.
5. The middle of a distribution that is also known as the average.
6. The middle of a distribution that is the most frequently occurring number.
7. The middle of a distribution that divides the list of numbers in half.
8. A graph for a quantitative variable that has a column for part of the numbers and rows for the other part of the numbers.
9. A graph for a quantitative variable and a categorical binary variable that has a column for part of the numbers and rows off to the left and right for the other part of the numbers.
10. The description of a distribution's shape that has a peak in the middle and tapers off evenly to the left and to the right.
11. The description of a distribution's shape that has a peak on the left and tapers off to the right.
12. The description of a distribution's shape that has a peak on the right and tapers off to the left.
13. 68% of the data falls between -1 and +1 standard deviations, 95% of the data falls between -2 and +2 standard deviations, and 99.7% of the data falls between -3 and +3 standard deviations.
14. Q3 - Q1
15. maximum - minimum
16. Q3 + (IQR * 1.5) and Q1 - (IQR * 1.5)
17. The proportion of an event in one category compared to the proportion of the same event in a different category. This value tells you how many times more likely the event is to occur in the first category than in the second.
18. Q3: This value divides a distribution into 75% and 25% segments.
19. Q1: This value divides a distribution into 25% and 75% segments.
20. A value (or values) significantly far away from the rest of the data.
21. A measure of spread that is calculated by (1) subtracting the mean from each number in a distribution, (2) taking the absolute value of each of the differences, then (3) taking the average of these absolute differences.
22. A measure of spread that is calculated by (1) subtracting the mean from each number in a distribution, (2) squaring the differences, (3) adding the squared values, (4) dividing that sum by n - 1, and (5) taking the square root of the quotient.
23. When one variable has an effect on another variable, there is this between them.
24. When one variable does not have any effect on another variable, they are said to be this.
25. A graph for two categorical variables where one of the variables is represented in columns and the other variable is represented as segments within the columns.
26. A value's distance from the mean, measured in standard deviations.
27. A phenomenon where overall proportions contradict proportions in separate categories.
28. The process of measuring different distributions in standard deviations so comparisons can be made between them.
29. A tool for organizing two categorical variables in rows and columns.
30. Proportions that are calculated within the columns of a two-way table.
31. Proportions that are calculated within the margins of a two-way table.
32. A measurement that doesn't change when outliers are present is said to be this.
33. Another term for the spread.
34. A graph for a quantitative variable that is similar to a dotplot, but uses columns.
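The by-hand steps listed in the mean absolute deviation and standard deviation definitions above can be sketched in a few lines of Python. This is only an illustration: the data list `scores` is made up for the example, and the function names are hypothetical, not something from the course materials.

```python
import math
import statistics

def mean_absolute_deviation(data):
    """MAD: subtract the mean, take absolute values, then average them."""
    mean = sum(data) / len(data)
    abs_devs = [abs(x - mean) for x in data]
    return sum(abs_devs) / len(abs_devs)

def sample_standard_deviation(data):
    """Standard deviation: subtract the mean, square, add, divide by n - 1, square root."""
    mean = sum(data) / len(data)
    squared = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared) / (len(data) - 1))

# Hypothetical data set, chosen so the mean is a whole number (5).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mad = mean_absolute_deviation(scores)
sd = sample_standard_deviation(scores)

print(mad)            # 1.5
print(round(sd, 3))   # 2.138

# Cross-check against the standard library's sample standard deviation.
print(math.isclose(sd, statistics.stdev(scores)))  # True
```

Both measures start from the same deviations from the mean; the standard deviation squares them, so it gives more weight to values far from the mean than the MAD does.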
Part 2: General Knowledge Questions

35. What are the different measures of center?
36. What are the different measures of spread?
37. What are the 3 different shapes a distribution can have?
38. What are the five values listed in the five-number summary?
39. Based on the five-number summary, what percent of the data falls:
    Below the Q1?
    Below the Median?
    Below the Q3?
    Between the Q1 and the Q3?
    Between the Min and the Q3?
    Between the Q1 and the Max?
40. For a normal distribution, what is the proportion of data that falls within:
    one standard deviation of the mean?
    two standard deviations of the mean?
    three standard deviations of the mean?
    What is this pattern called?

Using the histograms provided, choose the most appropriate graph for each description.

41. The mean is greater than the median.
42. The standard deviation is largest.
43. The graph is skewed left.
44. The median is greater than the mean.
45. The graph is a normal distribution.
46. The graph is skewed right.

[Histograms A, B, C, and D]
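The proportions asked about in question 40 can be checked numerically. The sketch below assumes an exactly normal distribution and uses the error function from Python's standard `math` module; the function name `proportion_within` is made up for this example.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Real data are only approximately normal, so the percentages in a sample will usually be close to these values rather than equal to them.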
Match each of the following graphs with the proper description.

47. Bar Graph
48. Box Plot
49. Dot Plot
50. Histogram
51. Modified Box Plot
52. Segmented Bar Graph
53. Stem Plot

[Graphs A, B, C, D, E, and F]

KEY:

G.
0 | 5 5 6 8
1 | 0 1 3 4 7 9 9 9
2 | 0 0 1 2 3 5

For each of the graphs, state what type of variables are represented by that type of graph and how many variables can be represented at a time.

    graph                      number of variables    type of variables
54. Bar Graph
55. Box Plot
56. Dot Plot
57. Histogram
58. Modified Box Plot
59. Scatter Plot
60. Segmented Bar Graph
61. Stem Plot
Match each term with the appropriate letter, formula, or equation. Please use capital letters.

A. Max - Min
B. z = (x - μ) / σ
C. Q1 - (1.5 * IQR)
D. Q3 + (1.5 * IQR)
E. Q3 - Q1

62. The interquartile range.
63. Test for lower outliers.
64. The range.
65. Test for upper outliers.
66. The z-score.

Part 3: Short Answer / Extended Response

Topic 6: Given the number of times an event occurs out of how many total occurrences for two different groups, you should be able to create a two-way table, a segmented bar graph and calculate the relative risk.

Toward the end of 2003, there were many warnings that the flu season would be especially severe and many more people chose to obtain a flu vaccine than in previous years. In January 2004, the Centers for Disease Control and Prevention magazine published the results of a study that looked at workers at Children's Hospital in Denver, Colorado. Of the 1000 people who had chosen to receive the flu vaccine (before November 1, 2003), 149 still developed flu-like symptoms. Of the 402 people who did not get the vaccine, 68 developed flu-like symptoms.

a. Create a two-way table for the data in the paragraph above.

[Blank two-way table with a TOTAL row and a TOTAL column]

b. Calculate the conditional distributions and write the proportions in the lower right corners of the table.
c. Create a segmented bar graph based on the conditional distributions.

[Space for the graph]
KEY:

d. What is the relative risk of developing flu-like symptoms? Show all work.
e. Are these variables independent? Why or why not?
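One way to check the Topic 6 work is to let a short script fill in the missing counts and compute the proportions. The sketch below uses the counts from the paragraph above; the function name `relative_risk`, the dictionary layout, and the choice to compare the vaccinated group to the unvaccinated group (rather than the other way around) are assumptions made for this example.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Proportion of the event in group A divided by the proportion in group B."""
    return (events_a / total_a) / (events_b / total_b)

# Counts taken from the flu vaccine paragraph.
vaccinated_sick, vaccinated_total = 149, 1000
unvaccinated_sick, unvaccinated_total = 68, 402

# Two-way table: the "no symptoms" cells and the totals follow by subtraction and addition.
table = {
    "vaccine": {
        "symptoms": vaccinated_sick,
        "no symptoms": vaccinated_total - vaccinated_sick,
        "TOTAL": vaccinated_total,
    },
    "no vaccine": {
        "symptoms": unvaccinated_sick,
        "no symptoms": unvaccinated_total - unvaccinated_sick,
        "TOTAL": unvaccinated_total,
    },
}

# Conditional proportions of flu-like symptoms within each group.
p_vaccinated = vaccinated_sick / vaccinated_total
p_unvaccinated = unvaccinated_sick / unvaccinated_total

rr = relative_risk(vaccinated_sick, vaccinated_total,
                   unvaccinated_sick, unvaccinated_total)

print(round(p_vaccinated, 3), round(p_unvaccinated, 3))  # 0.149 0.169
print(round(rr, 2))  # 0.88
```

Swapping the two groups gives the reciprocal, so it is worth stating which group is being compared to which when reporting a relative risk.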
Topic 7: Given quantitative data or quantitative data that is divided into categories, you should be able to create a histogram, a stemplot, or a side-by-side stemplot then describe the distribution using SOCS.

Arby's Sandwiches            fat/oz
Arby's Melt with Cheddar     3.5 *
Arby Q                       2.8
Bac'n Cheddar Deluxe         4.2 *
Beef 'n Cheddar              4.2 *
Giant Roast Beef             3.5
Junior Roast Beef            3.2
Regular Roast Beef           3.5
Super Roast Beef             3.1
Breaded Chicken Fillet       3.9
Chicken Cordon Bleu          3.9 *
Grilled Chicken BBQ          1.8
Grilled Chicken Deluxe       2.5 *
Roast Chicken Club           3.6 *
Roast Chicken Deluxe         2.9 *
Roast Chicken Santa Fe       3.4
French Dip                   3.2
Hot Ham 'n Swiss             2.5 *
Italian Sub                  3.6 *
Philly Beef 'n Swiss         4.5 *
Roast Beef Sub               3.9
Triple Cheese Melt           5.4 *
Turkey Sub                   2.8
Roast Beef Deluxe            1.6
Roast Chicken Deluxe         0.9
Roast Turkey Deluxe          1.0
Fish Fillet                  3.5
Ham 'n Cheese                2.4 *
Ham 'n Cheese Melt           2.7 *

a. Create a histogram of the Arby's data.
b. Describe the distribution using SOCS.
c. Create a stemplot for the Arby's data.

Rough Draft:

Final Copy:

d. Create a side-by-side stemplot for the Arby's data. (An asterisk indicates a sandwich with cheese; those without an asterisk do not have cheese.)
e. Describe the cheese distribution and the no cheese distribution using SOCS.
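For data recorded to one decimal place, like the fat/oz values, a stemplot uses the ones digit as the stem and the tenths digit as the leaf. The sketch below shows that idea on a few made-up values, not on the Arby's data; the function name `stemplot` is hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows as strings; stem = whole-number part, leaf = tenths digit."""
    rows = defaultdict(list)
    for v in sorted(values):
        # Work in tenths to avoid floating-point surprises, e.g. 2.4 -> 24 -> stem 2, leaf 4.
        stem, leaf = divmod(round(v * 10), 10)
        rows[stem].append(str(leaf))
    lo, hi = min(rows), max(rows)
    # Include every stem from lowest to highest, even if it has no leaves.
    return [f"{stem} | {' '.join(rows.get(stem, []))}" for stem in range(lo, hi + 1)]

# Made-up values for illustration only.
sample = [1.2, 1.7, 2.0, 2.4, 2.4, 3.1, 4.6]

for row in stemplot(sample):
    print(row)
# 1 | 2 7
# 2 | 0 4 4
# 3 | 1
# 4 | 6
```

Keeping empty stems in the plot matters when describing shape and spread, because gaps in the data stay visible.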
Topic 8: Given quantitative data in a list or in a table, you should be able to calculate the mean, the median and the mode. Also, look over the review packet from this topic. The main concepts were mean, median, mode, comparing dotplots, and using a calculator to generate the 3 measures of center.

a. The table represents the number of friends students reported having in their first block class. Determine each of the three measures of center from the table. (Round to the nearest tenth if rounding is necessary.)

# friends    1   2   3   4   5    6    7    8
frequency    0   1   2   7   15   18   12   10

mean =
median =
mode =

b. Use the Arby's data from the Topic 7 example to calculate each of the following measurements. Check your answers by entering the data into your calculator and using 1-Var Stats and a calculator-generated dotplot. (Round to the nearest tenth.)

mean =
median =
mode =
shape =
spread =

Topic 9: Look over your review sheets from this chapter. The main concepts were range, IQR, and standard deviation. We also calculated the MAD and standard deviation by hand, and looked at the Empirical Rule and z-scores so we could compare distributions that were measured on different scales.

Topic 10: Look over your review sheets from this chapter. The main concepts were boxplots, modified boxplots, the 5-number summary and calculating outliers. We also used our calculator to send groups, ungroup them, modify lists, sort lists (with an ID list), and create graphs.

Use the Five-Number Summary to answer the questions below.

Minimum   Q1   Median   Q3   Maximum
7         15   18       33   75

a. What is the IQR?
b. What is the range?
c. An upper outlier would be any number that falls above what value?
d. A lower outlier would be any number that falls below what value?
e. Construct a regular boxplot for the data above.
f. Would it be possible to make a modified boxplot? Why or why not?
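The IQR, range, and outlier cutoffs can be checked with the formulas from Part 2 (Q3 - Q1, Max - Min, and Q1 - 1.5 * IQR and Q3 + 1.5 * IQR). The sketch below applies them to the five-number summary above; the function name `outlier_fences` is made up for this example.

```python
def outlier_fences(q1, q3, multiplier=1.5):
    """Return (lower cutoff, upper cutoff) for the 1.5 * IQR outlier test."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

# Five-number summary from the table above.
minimum, q1, median, q3, maximum = 7, 15, 18, 33, 75

iqr = q3 - q1
data_range = maximum - minimum
lower_fence, upper_fence = outlier_fences(q1, q3)

print(iqr, data_range, lower_fence, upper_fence)  # 18 68 -12.0 60.0

# Compare the extremes to the cutoffs.
print(minimum < lower_fence)  # False
print(maximum > upper_fence)  # True
```

The five-number summary alone shows whether the minimum or maximum passes a cutoff, but not how many other values do, since the individual data values are not listed.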