Algebra 1
Topic 3: Introduction to Statistics

Table of Contents
1. Introduction to Statistics & Data
2. Graphical Displays
3. Two Way Tables
4. Describing Distributions: Shape, Skew & Center
5. Measures of Spread

What is the Study of Statistics?
Statistics is the science of data. It is the mathematical discipline that involves collecting and analyzing data.

Collecting Data
We collect data through observation, surveys, and experiments. We can collect two different types of data: categorical and quantitative.

Data Collected

Categorical Variable
Usually an adjective
Rarely a number
Examples: Gender, Race, Grade in School (Freshman, Soph., Jr., Sr.), Zip Code

Quantitative Variable
Always a number
Must be able to find the mean of the numbers
Examples: Weight, Height, Amount of money in wallet, Age

Categorical or Quantitative?
1. Survey about whether students buy lunch from the cafeteria, bring lunch from home, don't eat lunch, etc.
2. Experiment where we measure how tall a plant grows.
3. Observation where we count how many people are in each car leaving school.
4. Survey about each student's shoe size.
Graphical Displays of Data

Displaying Data
We can display data in a variety of ways. We select the best style of graph based on the type of data collected (categorical or quantitative) and the amount of data.

Displaying Categorical Data
Pie Chart: [Figure: example pie chart]
Bar Graph: [Figure: example bar graph]

Displaying Quantitative Data
[Figure: example graph with horizontal axis MPG, scaled from 14 to 34]

Dotplots
Each data value is shown as a dot above its location on a number line.

Number of Goals Scored Per Game by the 2004 US Women's Soccer Team:
3 0 2 7 8 2 4 3 5 1 1 4 5 3 1 1 3 3 3 2 1 2 2 2 4 3 5 6 1 5 5 1 1 5

How to Make a Dotplot
1. Draw a horizontal axis (a number line) and label it with the variable name.
2. Scale the axis from the minimum to the maximum value.
3. Mark a dot above the location on the horizontal axis corresponding to each data value.
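The dotplot steps above can be sketched in a few lines of Python, using the goals data from this page. This is only an illustration: the function name `text_dotplot` and the use of `*` for dots are assumptions made here, and the simple spacing only lines up for single-digit values.

```python
from collections import Counter

def text_dotplot(values, label):
    """Build a text dotplot: one '*' per data value, stacked above its number."""
    counts = Counter(values)                 # how many dots each value needs
    lo, hi = min(values), max(values)        # step 2: scale from min to max
    tallest = max(counts.values())
    lines = []
    # Step 3: draw the dots row by row, starting from the top of the tallest stack.
    for level in range(tallest, 0, -1):
        row = ["*" if counts[v] >= level else " " for v in range(lo, hi + 1)]
        lines.append(" ".join(row).rstrip())
    # Step 1: the number line and its label (assumes single-digit values).
    lines.append(" ".join(str(v) for v in range(lo, hi + 1)))
    lines.append(label)
    return "\n".join(lines)

goals = [3, 0, 2, 7, 8, 2, 4, 3, 5, 1, 1, 4, 5, 3, 1, 1, 3,
         3, 3, 2, 1, 2, 2, 2, 4, 3, 5, 6, 1, 5, 5, 1, 1, 5]
plot = text_dotplot(goals, "Goals Scored Per Game")
print(plot)
```

Each column of stars is one stack of dots, so the height of a column is the number of games with that many goals.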
Histograms
A histogram looks like a bar graph, but the bars must touch!
The x-axis is labeled with number ranges.

How to Make a Histogram
1) Divide the range of data into classes of equal size.
2) Find the count (frequency) of individuals in each class.
3) Label and scale your axes and draw the histogram.
The height of each bar equals its frequency. Adjacent bars should touch, unless a class contains no individuals.

Box and Whisker Plots
A box plot is a graphical display of the minimum, first quartile, median, third quartile, and maximum. The term "box plot" comes from the fact that the graph looks like a rectangle with lines extending from the top and bottom.

Quartiles
We can divide data into quartiles. Quartiles divide ordered data into four groups, each representing 25% of the data.
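Returning to the histogram steps on this page, here is a minimal sketch of steps 1 and 2 (choosing equal-size classes and counting frequencies). It uses the Miami travel-time data from the quartile practice in this topic. The class width of 20 minutes, the starting point of 0, and the function name `class_frequencies` are assumptions chosen for illustration.

```python
def class_frequencies(values, start, width, n_classes):
    """Count how many values fall in each equal-width class [low, low + width)."""
    freq = [0] * n_classes
    for v in values:
        freq[(v - start) // width] += 1   # index of the class containing v
    return freq

travel_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
                15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

# Assumed classes: 0-19, 20-39, 40-59, 60-79, 80-99 (minutes).
start, width, n_classes = 0, 20, 5
freq = class_frequencies(travel_times, start, width, n_classes)

# Step 3, drawn sideways: the length of each bar equals the class frequency.
for i, count in enumerate(freq):
    low = start + i * width
    print(f"{low:>2}-{low + width - 1:<2} | {'#' * count} ({count})")
```

A different class width would give a different picture of the same data, which is why the classes must be chosen before counting.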
Interquartile Range (IQR)
IQR = Q3 - Q1

How to Calculate Quartiles
To calculate the quartiles:
1) Arrange the observations in increasing order and locate the median M.
2) The first quartile Q1 is the median of the observations located to the left of the median in the ordered list.
3) The third quartile Q3 is the median of the observations located to the right of the median in the ordered list.

Let's Practice: Find the quartiles and calculate the IQR.
Travel times to work for 20 randomly selected Miami residents:
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45
In increasing order:
5 10 10 15 15 15 15 20 20 20 25 30 30 40 40 45 60 60 65 85
Q1 = 15, M = 22.5, Q3 = 42.5
IQR = Q3 - Q1 = 42.5 - 15 = 27.5 minutes

Calculate the 1st, 2nd (median) and 3rd quartiles for the following data sets:
1. 15, 17, 16, 15, 18, 19, 15, 20, 18
2. 5, 8, 9, 7, 6, 9, 8, 7, 10, 11, 4
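The quartile steps above can be checked with a short Python sketch, shown here on the travel-time data. The function name `quartiles` is an assumption for illustration. It follows the median-of-each-half method described on this page, leaving the median itself out of both halves when the number of observations is odd. Other textbooks and software use slightly different quartile rules and may give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-each-half method."""
    ordered = sorted(data)                 # step 1: increasing order
    n = len(ordered)
    lower = ordered[: n // 2]              # observations left of the median
    upper = ordered[(n + 1) // 2 :]        # observations right of the median
    return median(lower), median(ordered), median(upper)

travel_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
                15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

q1, m, q3 = quartiles(travel_times)
iqr = q3 - q1
print("Q1 =", q1, " M =", m, " Q3 =", q3, " IQR =", iqr)
```

The results match the worked answer above: Q1 = 15, M = 22.5, Q3 = 42.5, and IQR = 27.5 minutes.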
Simple Box & Whisker Plot
[Figure: simple box and whisker plot]

How to Make a Simple Box & Whisker Plot
1. Draw a number line.
2. Mark the median, Quartile 1 and Quartile 3.
3. Draw a box around Q1 and Q3.
4. Determine if there are any outliers*.
5. Mark the lowest and highest non-outlier values with a dot.
6. Draw whiskers from each end of the box to the dots.
7. Add outliers as dots.

Let's Practice: Create a Box and Whisker Plot.
Quiz Scores: 25 30 26 30 29 26 22 23 24 23 25 28

Modified Box & Whisker Plot
[Figure: modified box and whisker plot]

Outliers
Modified box & whisker plots highlight outliers.
Outliers are extreme values. They can be much higher or much lower than the rest of the data.

*How to Determine Outliers
1. Calculate the IQR.
2. Calculate the lower fence: Q1 - (1.5 * IQR)
3. Calculate the upper fence: Q3 + (1.5 * IQR)
4. Outliers are any values outside of the fences.
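As a sketch of the fence rule, the code below applies it to the Miami travel times, taking Q1 = 15 and Q3 = 42.5 from the worked quartile example in this topic. The function name `fences` is an assumption for illustration.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

travel_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
                15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

# Q1 and Q3 are taken from the worked travel-time example, not recomputed here.
lower, upper = fences(15, 42.5)

outliers = [x for x in travel_times if x < lower or x > upper]
inside = [x for x in travel_times if lower <= x <= upper]

print("Lower fence:", lower)                  # 15 - 1.5 * 27.5
print("Upper fence:", upper)                  # 42.5 + 1.5 * 27.5
print("Outliers:", outliers)
print("Whiskers end at:", min(inside), "and", max(inside))
```

The fences are only cutoffs for deciding what counts as an outlier. The whiskers stop at the lowest and highest data values that fall inside the fences, not at the fences themselves.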
Let's Practice: Create a Modified Box & Whisker Plot.
Travel times to work for 20 randomly selected Miami residents:
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

Let's Practice:
Identify the median, Q1 and Q3, and the IQR. If you had to pick one career, which one would you pick and why? (Must be a statistical reason!)
[Figure: box plots comparing careers; values in tens of thousands of dollars]

Let's Practice:
Identify the median, Q1 and Q3, and the IQR.
1. [Figure]
2. [Figure]

Two Way Tables
Two way tables describe two categorical variables, organizing counts according to a row variable and a column variable.
When a dataset involves two categorical variables, we begin by examining the counts or percents in various categories for one of the variables.
[Table: students by hair color and eye color]
1. What proportion of students have red hair?
2. What proportion of students have brown eyes and brown hair?
3. What proportion of students have blue eyes and either red or blond hair?
4. What proportion of students have black hair and eyes that are not brown?
5. What proportion of students with blond hair have blue eyes?
6. What proportion of students with hazel eyes have a hair color other than brown?

                        Member of   Member of   Member of 2
                        No Clubs    One Club    or More Clubs   Total
Rides the School Bus       55          33            20          108
Does not Ride Bus          16          44            82          142
Total                      71          77           102          250

1. What proportion of students that ride the school bus are members of two or more clubs?
2. What proportion of students that are members of no clubs do not ride the school bus?
3. What proportion of students that do not ride the school bus are members of at least one club?
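The sketch below shows how the three kinds of proportions come out of the bus and clubs table. The dictionary layout and variable names are assumptions for illustration, and the three proportions computed are deliberately different from the practice questions. The key idea is that the denominator changes: the grand total for "of all students", and a row or column total for "of students that...".

```python
from fractions import Fraction

# Counts from the two way table (rows: bus status, columns: club membership).
table = {
    "Rides the School Bus": {"No Clubs": 55, "One Club": 33, "2 or More Clubs": 20},
    "Does not Ride Bus":    {"No Clubs": 16, "One Club": 44, "2 or More Clubs": 82},
}

grand_total = sum(sum(row.values()) for row in table.values())
bus_total = sum(table["Rides the School Bus"].values())
one_club_total = sum(row["One Club"] for row in table.values())

# Marginal: of ALL students, what proportion ride the bus?
marginal = Fraction(bus_total, grand_total)

# Joint: of ALL students, what proportion ride the bus AND are in no clubs?
joint = Fraction(table["Rides the School Bus"]["No Clubs"], grand_total)

# Conditional: of students IN ONE CLUB, what proportion ride the bus?
conditional = Fraction(table["Rides the School Bus"]["One Club"], one_club_total)

print("Marginal:", marginal, "=", float(marginal))
print("Joint:", joint, "=", float(joint))
print("Conditional:", conditional, "=", float(conditional))
```

`Fraction` keeps each answer as an exact reduced fraction, which makes it easy to compare against hand work before converting to a decimal or percent.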
Describing Distributions: Shape, Skew & Center

Different Shapes of Distributions
[Figure: examples of distribution shapes]

Describe the Shape
Distributions can be described as:
Roughly symmetric
Skewed right
Skewed left

Shape Definitions:
Symmetric: the right and left sides of the graph are approximately mirror images of each other.
Skewed to the right (right skewed): the right side of the graph is much longer than the left side.
Skewed to the left (left skewed): the left side of the graph is much longer than the right side.

Other Ways to Describe Shape:
Unimodal (one peak)
Bimodal (two peaks)
Multimodal (more than two peaks)

[Figure: three graphs. DiceRolls (axis 0 to 12): symmetric. Score (axis 70 to 100): skewed left. Siblings (axis 0 to 7): skewed right.]

Skew in Box Plots
[Figure: box plots showing skew]
Measures of Center
Measures of Center = Mean and Median

Which Measure of Center?

Type of Distribution    Best Measure of Center
Symmetric               Mean
Skewed Right            Median
Skewed Left             Median

Why?

Measures of Spread
Standard Deviation, IQR and Range

Standard Deviation
Standard deviation is a number used to tell how measurements for a group are spread out from the mean.
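To make "spread out from the mean" concrete, here is a minimal sketch that computes a standard deviation step by step for the algebra test scores used in the practice problem in this topic. The handout does not say which version of the formula the course uses. This sketch assumes the population form (divide by n); many courses and calculators instead divide by n - 1 for a sample, which gives a slightly larger number.

```python
import math
from statistics import pstdev

def standard_deviation(data):
    """Population standard deviation (assumed form: divide by n)."""
    n = len(data)
    mean = sum(data) / n                               # 1. find the mean
    squared_devs = [(x - mean) ** 2 for x in data]     # 2. square each distance from the mean
    variance = sum(squared_devs) / n                   # 3. average the squared distances
    return math.sqrt(variance)                         # 4. take the square root

scores = [68, 92, 74, 75, 86, 90, 92, 81, 60, 82, 77, 80]

sd = standard_deviation(scores)
print("Mean:", sum(scores) / len(scores))
print("Standard deviation:", round(sd, 2))

# Cross-check against the standard library's population standard deviation.
print("Matches statistics.pstdev:", math.isclose(sd, pstdev(scores)))
```

Because every score's distance from the mean goes into the calculation, a single score far from the mean can change the standard deviation noticeably.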
Standard Deviation
A relatively low standard deviation value indicates that the data points tend to be very close to the mean.
A relatively high standard deviation value indicates that the data points are spread out over a large range of values.

Below are dotplots of three different distributions, A, B, and C. Which one has the largest standard deviation? Justify your answer.
[Figure: dotplots of distributions A, B, and C]

Which Measure of Spread?
Measures of Spread = IQR, Range and Standard Deviation

Type of Distribution    Best Measure of Spread
Symmetric               Standard Deviation, Range
Skewed Right            IQR
Skewed Left             IQR

Let's Practice:
Mr. Morris gave his algebra class a test, the results of which are listed below.
68, 92, 74, 75, 86, 90, 92, 81, 60, 82, 77, 80
Shania was absent on the day of the test and had to take the test late. She earned a score of 99. Which measure of the class's test results did Shania's score most change?
A. IQR
B. Mean
C. Median
D. Range
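One way to see why the table above recommends the IQR for skewed data is to compare the measures of spread with and without an extreme value. The sketch below uses the Miami travel times from this topic, with and without the 85-minute trip. The function names are assumptions for illustration, the quartiles use the median-of-each-half method from this topic, and the standard deviation is assumed to be the population form (`statistics.pstdev`, which divides by n).

```python
from statistics import median, pstdev

def iqr(data):
    """IQR = Q3 - Q1, with quartiles found as the medians of each half."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # left of the median
    upper = ordered[(n + 1) // 2 :]    # right of the median
    return median(upper) - median(lower)

def spread_summary(data):
    """Return the three measures of spread for one data set."""
    return {
        "range": max(data) - min(data),
        "iqr": iqr(data),
        "sd": pstdev(data),            # assumed: population standard deviation
    }

travel_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
                15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
without_85 = [x for x in travel_times if x != 85]

with_outlier = spread_summary(travel_times)
no_outlier = spread_summary(without_85)

for name in ("range", "iqr", "sd"):
    print(f"{name:>5}: with 85 = {with_outlier[name]:.2f}, "
          f"without 85 = {no_outlier[name]:.2f}")
```

Removing one extreme value moves the range by 20 minutes but the IQR by only 2.5 minutes, because the IQR only looks at the middle half of the data.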