MEASURING THE SPREAD OF DATA: 6F

CONTINUING WITH DESCRIPTIVE STATS: 6E, 6F, 6G, 6H, 6I

o Think about this example: suppose you are at a high school football game and you ask 40 people in the student section their age.
o Then you head to a professional game and sample 40 random people there. You find the same mean age as at the high school game.
o What is different about the two scenarios?
o Is the mean a good representation of the data collected in each case?

DISPERSION
o Sometimes the mean, median and mode don't give you an accurate description of the distribution. To get one, we need to measure both the centre and the dispersion.
o We can identify the centre, but the spread of the data can be analyzed in 3 different ways:
  o range,
  o interquartile range,
  o standard deviation.

RANGE
The range of the data, or the max minus the min, is not a particularly reliable measure of spread. Why do you think that would be true?
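One way to explore the question about the range is to add a single extreme value to a data set and watch what happens. The sketch below uses made-up ages (they are not from the slides) and a small helper, `data_range`, that is only an illustration.

```python
# Hypothetical ages, chosen only to illustrate the idea (not from the slides).
ages = [15, 16, 16, 17, 17, 17, 18]
ages_with_one_adult = ages + [62]


def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


print(data_range(ages))                 # range of the original ages
print(data_range(ages_with_one_adult))  # range after adding one extreme value
```

Only one value changed, yet the range changes a lot, while the other seven ages are exactly as spread out as before.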

THE QUARTILES AND THE INTERQUARTILE RANGE
The median divides the data into two even halves.
o The middle of the lower half is the 1st quartile, or lower quartile, $Q_1$.
o The middle of the upper half is the 3rd quartile, or upper quartile, $Q_3$.
The distance between the two quartiles is called the interquartile range (IQR). The IQR tells us the range of the middle 50% of the data.

$\text{IQR} = Q_3 - Q_1$

EXAMPLE
1) Reorder the set.
2) Find the median.
3) Find the lower quartile and upper quartile. If there is a single middle term, disregard it when finding the quartiles. If there are 2 middle terms for the median, keep the lower one in the lower half (for $Q_1$) and the upper one in the upper half (for $Q_3$).
4) Calculate the IQR.
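The four steps above can be sketched in Python. This is a minimal illustration, not the GDC's own routine: the function name `quartile_summary` and the sample data are assumptions made for the example, and the data set needs at least two values.

```python
from statistics import median


def quartile_summary(data):
    """Return (Q1, median, Q3, IQR) by splitting the ordered data into halves."""
    ordered = sorted(data)                  # step 1: reorder the set
    n = len(ordered)
    half = n // 2
    med = median(ordered)                   # step 2: find the median
    lower = ordered[:half]                  # lower half (middle term left out when n is odd)
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    q1, q3 = median(lower), median(upper)   # step 3: quartiles are the medians of the halves
    return q1, med, q3, q3 - q1             # step 4: IQR = Q3 - Q1


sample = [5, 1, 9, 3, 7, 11, 13]  # hypothetical data, not from the slides
print(quartile_summary(sample))
```

Other software may use a different quartile rule and give slightly different values for small data sets, so it is worth knowing which rule your tool applies.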

CALCULATOR EXAMPLE
Use a GDC to calculate the range, $Q_1$ and $Q_3$, and the IQR.

6G BOX AND WHISKER PLOT
Hopefully you have seen these before. Let's break it down quickly. Make sure that you use a number line that is in equal increments. Why would that need to be true?

WHAT WOULD A B&W LOOK LIKE IF... (DRAW AN EXAMPLE!)
o The data was a symmetrical distribution?
o The data was positively skewed?
o The data was negatively skewed?
Where on the number line is the outlier?
o Toward the positive side = positively skewed
o Toward the negative side = negatively skewed

OYO: TRY IT
Create a box and whisker plot (boxplot) from the data:
13, 24, 14, 11, 9, 31, 33, 33, 33, 18, 29, 28
Using a calculator can be helpful, but it doesn't label the important values for you.

PARALLEL BOXPLOTS
Simply put, two sets of data are compared on the same number line with two boxplots.

Example: A hospital is trialing a new anesthetic drug and has collected data on how long the new and old drugs take before the patient becomes unconscious. They wish to know which drug acts faster and which is more reliable.

Old drug times (s): 8, 12, 9, 8, 16, 10, 14, 7, 5, 21, 13, 10, 8, 10, 11, 8, 11, 9, 11, 14
New drug times (s): 8, 12, 7, 8, 12, 11, 9, 8, 10, 8, 10, 9, 12, 8, 8, 7, 10, 7, 9, 9

Let's put these on the same number line and compare the data. Use a 5-number summary!
o Faster?
o Reliable?
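The 5-number summaries for the two drugs can be checked with a short Python sketch. The helper `five_number_summary` is a name made up for this example, and it assumes the quartile rule described earlier in these notes (split the ordered data into halves and take the median of each half).

```python
from statistics import median

old_drug = [8, 12, 9, 8, 16, 10, 14, 7, 5, 21, 13, 10, 8, 10, 11, 8, 11, 9, 11, 14]
new_drug = [8, 12, 7, 8, 12, 11, 9, 8, 10, 8, 10, 9, 12, 8, 8, 7, 10, 7, 9, 9]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the halves-of-the-data rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


for name, times in (("Old drug", old_drug), ("New drug", new_drug)):
    lo, q1, med, q3, hi = five_number_summary(times)
    print(f"{name}: min={lo}, Q1={q1}, median={med}, Q3={q3}, max={hi}, "
          f"range={hi - lo}, IQR={q3 - q1}")
```

Comparing the medians speaks to "faster", while comparing the ranges and IQRs speaks to "reliable".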

INTERESTING TO NOTE
Old drug times (s): 8, 12, 9, 8, 16, 10, 14, 7, 5, 21, 13, 10, 8, 10, 11, 8, 11, 9, 11, 14
Are any of these outliers?

CUMULATIVE FREQUENCY GRAPHS
Before we get started:
o Cumulative frequency: the cumulative frequency of an event is the sum of the frequencies up to and including that event.
o Cumulative relative frequency: the cumulative frequency of an event divided by the total number $n$, which equals the sum of the relative frequencies up to and including that event (the percent of data used thus far).
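For the outlier question about the old drug times, one common convention treats any value more than 1.5 × IQR below $Q_1$ or above $Q_3$ as an outlier. The slides do not state which rule to use, so the 1.5 × IQR rule in this sketch is an assumption, as is the helper name `quartiles`.

```python
from statistics import median

old_drug = [8, 12, 9, 8, 16, 10, 14, 7, 5, 21, 13, 10, 8, 10, 11, 8, 11, 9, 11, 14]


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


q1, q3 = quartiles(old_drug)
iqr = q3 - q1

# Assumed rule: values beyond 1.5 * IQR from the quartiles count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [t for t in old_drug if t < lower_fence or t > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"outliers: {outliers}")
```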

EXAMPLE BY HAND
Length of steel rod to 3 decimal places.

Lengths | Tally | Frequency | Relative frequency | Cumulative frequency | Cumu. relative frequency
1.00    |       | 2         |                    |                      |
1.25    |       | 7         |                    |                      |
1.50    |       | 7         |                    |                      |
1.75    |       | 10        |                    |                      |
2.00    |       | 15        |                    |                      |
2.25    |       | 24        |                    |                      |
2.50    |       | 33        |                    |                      |
2.75    |       | 14        |                    |                      |
3.00    |       | 11        |                    |                      |
3.25    |       | 21        |                    |                      |
3.50    |       | 6         |                    |                      |
3.75    |       | 3         |                    |                      |
Total   |       |           |                    |                      |

PERCENTILES (EXACT PERCENTILES ARE NOT ON IB EXAM)
A percentile is the score below which a certain percentage of the data lies. For example:
o The 85th percentile is the score below which 85% of the data lies.
o If your score in a test is the 95th percentile, then 95% of the class have scored less than you.
Notice that:
o the lower quartile ($Q_1$) is the 25th percentile
o the median ($Q_2$) is the 50th percentile
o the upper quartile ($Q_3$) is the 75th percentile.
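The empty columns of the steel rod table are running totals, so they can be checked with a short Python sketch. It uses the frequencies from the table. The variable names are illustrative, and the last line ties in the percentile idea by finding the first length at which at least 50% of the data has been counted.

```python
from itertools import accumulate

lengths = [1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 3.75]
freqs = [2, 7, 7, 10, 15, 24, 33, 14, 11, 21, 6, 3]

total = sum(freqs)
rel_freqs = [f / total for f in freqs]          # frequency / total
cum_freqs = list(accumulate(freqs))             # running total of frequencies
cum_rel_freqs = [c / total for c in cum_freqs]  # cumulative frequency / total

print(f"{'Length':>6} {'Freq':>5} {'Rel':>6} {'Cum':>5} {'CumRel':>7}")
for row in zip(lengths, freqs, rel_freqs, cum_freqs, cum_rel_freqs):
    print("{:>6.2f} {:>5d} {:>6.3f} {:>5d} {:>7.3f}".format(*row))
print(f"Total = {total}")

# First length at which the cumulative relative frequency reaches 50%.
median_length = next(x for x, c in zip(lengths, cum_rel_freqs) if c >= 0.5)
print(f"The 50th percentile falls at length {median_length}")
```

The cumulative frequency column is exactly what gets plotted on a cumulative frequency graph: it starts near 0 and ends at the total.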

CUMULATIVE FREQUENCY GRAPH:
Represents only cumulative frequency. It starts at 0 and ends at the total (these are the boundaries).

LET'S CREATE OUR OWN
From the table on slide 18: length of steel rod.

OYO FROM YOUR BOOK

9/27/2017

THE LIMITATIONS
Range and IQR are limited in the amount of information they give. We talked about the limitations of range. What would the limitations of IQR be? We need a better way of describing the dispersion of the data!!

STANDARD DEVIATION
DEF: A measure of the deviation between the scores and the mean; a measure of the dispersal of the data.
o The larger the standard deviation, the more widely spread the data.
o The smaller the standard deviation, the less spread (less dispersed) the data.
o It describes how far each score deviates from the mean.
We calculate the standard deviation $\sigma$ by considering a data set of $n$ values $x_1, x_2, x_3, \ldots, x_n$ with mean $\bar{x}$:

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$

LET'S BREAK IT DOWN
First things first: we are talking about individual, ungrouped data.
o $n$ = total frequency
o $x_i$ = individual score
o $\bar{x}$ = mean
o $\sigma$ = the standard deviation (SD)
o $(x_i - \bar{x})$ measures how far an individual score is from the mean.
We then sum up all of those distances after we have made them all positive by squaring them. If this sum is small, then we know that most of the data values are close to $\bar{x}$. Dividing by $n$ averages the squared distances, and taking the square root corrects the units.

STANDARD DEVIATION BY HAND
This will be an expectation of mine, so learn how to do it. I will be testing you on this, but the IB papers and IA will not require you to do it by hand.
The best way to find the standard deviation by hand is to use a table. Let's look at an example and fill in the table by hand. We do it by hand to understand the mechanics of how the GDC computes the standard deviation.

EXAMPLE: IA MATH SCORES FOR WILLAMETTE HS.
Calculate the SD, or $\sigma$, for the data below. We will need to know some information before we can calculate it. What info do we need?

Math IA Scores: 4, 2, 5, 4, 5, 6, 7, 6, 4, 3
TOTAL:

$\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}}$

NOW, USING A CALCULATOR
For larger sets of data, it only makes sense to use a GDC, so let's do an example. Calculate the standard deviation of the data set:
2, 5, 4, 6, 7, 5, 6, 8, 5, 8, 3, 9, 6, 8, 1, 1, 2, 2, 2, 5
As before, we enter this into a list and use 1-variable stats to calculate the standard deviation. Make sure you always use the standard deviation of the population (this is a new development!).
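Both parts above can be mirrored in Python as a check on your own working. The first half builds the by-hand table for the Math IA scores. The column layout is an assumption, since the original table headings are not shown. The second half treats the larger data set the way a calculator would, using the standard library's `statistics.pstdev` (population SD, divide by $n$) alongside `statistics.stdev` (sample SD, divide by $n - 1$) to show why the choice matters.

```python
import math
import statistics

scores = [4, 2, 5, 4, 5, 6, 7, 6, 4, 3]  # Math IA scores from the example

n = len(scores)
mean = sum(scores) / n

# By-hand table: score, deviation from the mean, squared deviation.
rows = [(x, x - mean, (x - mean) ** 2) for x in scores]
total_sq = sum(sq for _, _, sq in rows)
sd_by_hand = math.sqrt(total_sq / n)  # divide by n, then take the square root

print(f"{'x':>3} {'x - mean':>9} {'(x - mean)^2':>13}")
for x, dev, sq in rows:
    print(f"{x:>3} {dev:>9.2f} {sq:>13.2f}")
print(f"mean = {mean:.2f}, total = {total_sq:.2f}, SD = {sd_by_hand:.4f}")

# Larger data set from the calculator example.
gdc_data = [2, 5, 4, 6, 7, 5, 6, 8, 5, 8, 3, 9, 6, 8, 1, 1, 2, 2, 2, 5]
population_sd = statistics.pstdev(gdc_data)  # divides by n (the one to use here)
sample_sd = statistics.stdev(gdc_data)       # divides by n - 1 (slightly larger)
print(f"population SD = {population_sd:.4f}, sample SD = {sample_sd:.4f}")
```

The two standard deviations differ only in the divisor, so the sample value is always a little larger. That is why it pays to check which one you are reading off the screen.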

FREQUENCY TABLES
For frequency tables, we can still find the SD by hand or by use of a GDC. By hand, we use the formula below. This simply adds one more column to our table. Let's calculate this by hand.

$\sigma = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{n}}$, where $f_i$ is the frequency of the score $x_i$ and $n = \sum f_i$.

Math IA Scores | Frequency
1              | 1
2              | 2
3              | 4
4              | 8
5              | 17
6              | 11
7              | 3

GROUPED DATA FREQUENCY TABLES
Same thing here, but we use $x$ as the midpoint of the class intervals. Let's use technology to calculate this one!
Steps for grouped data:
1. Create
2. Enter into
3. Enter freq. into
4. Press. 1
5. List
6. Freq.
7. Press calculate and find
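For the Math IA frequency table, the extra column is each squared deviation multiplied by its frequency. The sketch below works that through in Python and cross-checks it by expanding the table back into a plain list. The variable names are illustrative. For grouped data the same code applies if the keys are replaced by class midpoints.

```python
import math
import statistics

# Math IA scores and their frequencies, from the table above.
freq_table = {1: 1, 2: 2, 3: 4, 4: 8, 5: 17, 6: 11, 7: 3}

n = sum(freq_table.values())                           # total frequency
mean = sum(x * f for x, f in freq_table.items()) / n   # sum of f*x over n

# Extra column: f * (x - mean)^2 for each row, then total it.
weighted_sq = {x: f * (x - mean) ** 2 for x, f in freq_table.items()}
sd = math.sqrt(sum(weighted_sq.values()) / n)

print(f"n = {n}, mean = {mean:.4f}, SD = {sd:.4f}")

# Cross-check: write every score out f times and use the population SD.
expanded = [x for x, f in freq_table.items() for _ in range(f)]
assert math.isclose(sd, statistics.pstdev(expanded))
```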

COMPARING THE SPREAD OF TWO DATA SETS
The following exam results were recorded by two classes of students studying Spanish:

Class A: 64, 69, 74, 67, 78, 88, 76, 90, 89, 84, 83, 87, 78, 80, 95, 75, 55, 78, 81
Class B: 94, 90, 88, 81, 86, 96, 92, 93, 88, 72, 94, 61, 87, 90, 97, 95, 77, 77, 82, 90

Compare the results of the two classes, including their spread. Let's use the GDC to:
1) compare the means,
2) compare the SDs for dispersion.
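The same comparison can be sketched in Python as a check on the GDC values. It uses the population standard deviation (`statistics.pstdev`), in line with the earlier advice. The scores are copied from the lists above, which give 19 results for Class A and 20 for Class B.

```python
from statistics import mean, pstdev

class_a = [64, 69, 74, 67, 78, 88, 76, 90, 89, 84,
           83, 87, 78, 80, 95, 75, 55, 78, 81]
class_b = [94, 90, 88, 81, 86, 96, 92, 93, 88, 72,
           94, 61, 87, 90, 97, 95, 77, 77, 82, 90]

for name, marks in (("Class A", class_a), ("Class B", class_b)):
    # Mean describes the centre; population SD describes the spread.
    print(f"{name}: n = {len(marks)}, mean = {mean(marks):.1f}, "
          f"SD = {pstdev(marks):.2f}")
```

A higher mean points to stronger results on average, and a smaller SD points to more consistent results, so both numbers are needed for a fair comparison.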

THAT PUTS US AT THE END OF THE POWER STANDARD!
We will have one day of review/activity, then take the PS2 assessment!
Your homework is:
o 6G.1 #2, 4
o 6G.2 #2, 4 (use GDC!!)
o 6H #1, 4, 5
o 6I.1 #1, 3, 5
o 6I.2 #1, 2, 6, 7
o 6I.3 #1, 3