FOUNDATIONS OF MATH 11
Ch. 5 Day 1: EXPLORING DATA

VOCABULARY

A measure of central tendency is a value that is representative of a set of numerical data. These values tend to lie near the middle of a set of data. There are three common such measures.

The mean is the average of the data values.

$\mu = \dfrac{\text{sum of data values}}{\text{number of data items}}$

µ is the lower case Greek letter mu. If the data are x-values, the mean is

$\bar{x} = \dfrac{\sum x}{n}$

where Σ is the symbol for "the sum of" (capital Greek letter sigma) and n is the number of x-values.

The median is the middle data value when the data are listed in order; for an even number of data items, take the average of the two middle values.

The mode is the data value that occurs most often. If more than one value occurs most frequently, they are all modes of the data set. A data set with two modes is called bimodal.

A measure of dispersion is a value that describes the spread of the data values. There are several such measures.

The range is the difference between the maximum value and the minimum value of the set of numerical data.

exercises:
Data set A is 2, 3, 3, 5, 8, 9, 9, 9, 15
Data set B is 2, 3, 3, 3, 5, 8, 9, 9, 9, 15
a) Determine the mean of the data sets.
b) Determine the median of the data sets.
c) Determine the mode of the data sets.
d) Determine the range of the data sets.
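One way to check hand calculations for data sets A and B is a short Python sketch. It is not part of the original notes; the function name summarize is chosen here for illustration, and the sketch assumes Python 3.8 or later, where the standard statistics module provides multimode.

```python
from statistics import mean, median, multimode

def summarize(data):
    """Return (mean, median, list of modes, range) for a list of numbers."""
    return mean(data), median(data), multimode(data), max(data) - min(data)

data_a = [2, 3, 3, 5, 8, 9, 9, 9, 15]
data_b = [2, 3, 3, 3, 5, 8, 9, 9, 9, 15]

for name, data in (("A", data_a), ("B", data_b)):
    m, med, modes, spread = summarize(data)
    # multimode returns every value tied for most frequent, so a
    # bimodal data set shows up as a list with two entries.
    print(f"Data set {name}: mean={m}, median={med}, modes={modes}, range={spread}")
```

multimode is used instead of mode so that a bimodal data set reports both of its modes.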
exercises: While looking for a new plasma TV, a shopper found a consumer web site that showed the lifespan, in years, of 30 TVs from each of two manufacturers.

Brand X
3.5  6    5    8.5  6.5  7    4    5.5  6    6
8    6.5  7.5  4.5  5.5  5.5  6    7    4.5  6.5
5    7.5  6.5  5.5  6.5  5.5  7    6    6    5

Brand Y
5    5.5  6    5    7    6    6.5  5.5  7.5  6
6.5  5.5  6    6.5  5.5  6    6    7.5  6.5  6
5.5  5    7    6    6.5  6    5.5  7    4.5  5

This next table shows the frequency of TV lifespans for each manufacturer.

Lifespan in years        3.5   4    4.5   5    5.5   6    6.5   7    7.5   8    8.5
Frequency for Brand X     1    1     2    3     5    6     5    3     2    1     1
Frequency for Brand Y     0    0     1    4     6    9     5    3     2    0     0

a) Determine the mean of the data sets.
b) Determine the median of the data sets.
c) Determine the mode of the data sets.
d) Describe how the data in each set are distributed. Describe the similarities and differences between the two sets of data.
e) Which TV would be a better purchase? Why?
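The frequency table makes these calculations shorter than working from the 30 raw values. The sketch below is an illustration, not the notes' own method; the helper names mean_from_table and expand are assumptions made for this example.

```python
from statistics import median

lifespans = [3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5]
freq_x = [1, 1, 2, 3, 5, 6, 5, 3, 2, 1, 1]
freq_y = [0, 0, 1, 4, 6, 9, 5, 3, 2, 0, 0]

def mean_from_table(values, freqs):
    """Mean of tabulated data: sum of (frequency x value) over sum of frequencies."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def expand(values, freqs):
    """Rebuild the sorted list of individual data values from the table."""
    return [x for x, f in zip(values, freqs) for _ in range(f)]

for brand, freqs in (("X", freq_x), ("Y", freq_y)):
    data = expand(lifespans, freqs)
    print(
        f"Brand {brand}: n={len(data)}, mean={mean_from_table(lifespans, freqs)}, "
        f"median={median(data)}, range={max(data) - min(data)}"
    )
```

Comparing the printed ranges is one way to start on part d), since the centres of the two brands turn out to be very similar.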
FOUNDATIONS OF MATH 11
Ch. 5 Day 2: FREQUENCY TABLES AND GRAPHS

A large amount of information is easier to work with when it is organized. Sorting, tabulating, and graphing simplify the analysis of data.

FREQUENCY DISTRIBUTIONS

example: The number of minutes each of 50 teenagers watched TV last month is listed.

2675  2123  2897   545   254  2088  1766  2567  2334  2012
2867  1235  2342  2563   933  2234  2567   345   675   234
2454  1456  1098  1435   166   677  2467   576   563  4131
2986  2332   123  2099  2567  3123  1235  2236   211   561
 557  1894   341  2347  2123  1345   313   441   344   257

The data values nearly all seem to be different. We are going to group data with similar values by creating intervals.

Range =
Interval width =

Complete the following frequency distribution table.

minutes (intervals)   tally   frequency (no. of teenagers)   frequency (fraction, decimal, percent)



Total =
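The grouping step can be sketched in Python as follows. The interval width of 400 minutes is an assumption taken from the grouped table used in the Day 3 notes (0-400, 400-800, ...); any other width could be substituted. No data value in this set falls exactly on an interval boundary, so the choice of which interval owns a boundary value does not affect the counts here.

```python
from collections import Counter

minutes = [
    2675, 2123, 2897, 545, 254, 2088, 1766, 2567, 2334, 2012,
    2867, 1235, 2342, 2563, 933, 2234, 2567, 345, 675, 234,
    2454, 1456, 1098, 1435, 166, 677, 2467, 576, 563, 4131,
    2986, 2332, 123, 2099, 2567, 3123, 1235, 2236, 211, 561,
    557, 1894, 341, 2347, 2123, 1345, 313, 441, 344, 257,
]

WIDTH = 400  # assumed interval width, matching the Day 3 table

# Integer division assigns each value to interval number 0, 1, 2, ...
counts = Counter(value // WIDTH for value in minutes)
n_intervals = max(minutes) // WIDTH + 1
frequencies = [counts[k] for k in range(n_intervals)]

print(f"range = {max(minutes) - min(minutes)}")
for k, f in enumerate(frequencies):
    low, high = k * WIDTH, (k + 1) * WIDTH
    # One tally mark per data value, then the count and the percent of the total.
    print(f"{low:>4}-{high:<4}  {'|' * f:<11}  {f:>2}  {f / len(minutes):.0%}")
print(f"Total = {sum(frequencies)}")
```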
Show the frequency distribution as a histogram. Each interval has a bar, and the height of each bar corresponds to how many data values fall in that interval (the frequency of data values in the interval).

Show the frequency distribution as a frequency polygon. Place a point at the centre of each interval at the height that corresponds to how many data values fall in that interval.

The shape of the histogram would be the same even if the frequency scale were written as fractions, decimals, or percents.
FOUNDATIONS OF MATH 11
Ch. 5 Day 3: STANDARD DEVIATION

Standard deviation is a measure of dispersion; it describes the spread of the data. The symbol for standard deviation is σ, the lower case Greek letter sigma. The formula to calculate standard deviation is

$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

example: Calculate the standard deviation of the data set 3, 5, 6, 8, 8.

x       $x - \bar{x}$     $(x - \bar{x})^2$
3       −3                9
5
6
8
8

$\sum x = 30$                     $\sum (x - \bar{x})^2 =$

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{30}{5} = 6$

$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} =$

exercise: The mean of the lifespans for each brand was 6 years. Determine the standard deviation of these lifespans.

Lifespan in years        3.5   4    4.5   5    5.5   6    6.5   7    7.5   8    8.5
Frequency for Brand X     1    1     2    3     5    6     5    3     2    1     1
Frequency for Brand Y     0    0     1    4     6    9     5    3     2    0     0
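The standard deviation formula can be applied directly to a frequency table by weighting each squared deviation by its frequency. The sketch below is a minimal illustration using the example data 3, 5, 6, 8, 8 written as a table; the function name std_dev_from_table is an assumption made for this example. The same function could be pointed at the Brand X and Brand Y tables.

```python
from math import sqrt
from statistics import pstdev

def std_dev_from_table(values, freqs):
    """Return (mean, population standard deviation) for tabulated data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    # Each squared deviation is counted once per occurrence of the value.
    sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return mean, sqrt(sum_sq / n)

# Example data 3, 5, 6, 8, 8: the value 8 has frequency 2.
mean, sigma = std_dev_from_table([3, 5, 6, 8], [1, 1, 1, 2])
print(f"mean = {mean}, sigma = {sigma:.3f}")

# Cross-check against the standard library's population standard deviation.
print(f"pstdev = {pstdev([3, 5, 6, 8, 8]):.3f}")
```

The formula divides by n, so the matching library function is pstdev (population), not stdev (sample, which divides by n − 1).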
exercise: Use the frequency table below to find an estimate of the mean and standard deviation. Here x is the midpoint of each interval and n is the total of the frequencies.

minutes       f     x     (f)(x)     $(x - \bar{x})^2$     $(f)(x - \bar{x})^2$
0-400         10
400-800        8
800-1200       2
1200-1600      5
1600-2000      2
2000-2400     11
2400-2800      7
2800-3200      4
3200-3600      0
3600-4000      0
4000-4400      1

$\sum (f)(x) =$                     $\sum (f)(x - \bar{x})^2 =$

$\mu = \bar{x} = \dfrac{\sum (f)(x)}{n} =$

$\sigma = \sqrt{\dfrac{\sum (f)(x - \bar{x})^2}{n}} =$

Manually calculating the standard deviation of unorganized data is time-consuming. Spreadsheets and other technology are usually used.
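As one example of using technology for this calculation, the grouped estimate can be sketched in Python. The sketch assumes each interval is represented by its midpoint, which is the usual convention for grouped data, so the results are estimates and not the exact mean and standard deviation of the 50 raw values.

```python
from math import sqrt

# Intervals 0-400, 400-800, ..., 4000-4400 and their frequencies.
intervals = [(low, low + 400) for low in range(0, 4400, 400)]
frequencies = [10, 8, 2, 5, 2, 11, 7, 4, 0, 0, 1]

# Assumption: every value in an interval is treated as the interval midpoint.
midpoints = [(low + high) / 2 for low, high in intervals]

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
mean = sum_fx / n

sum_f_sq = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
sigma = sqrt(sum_f_sq / n)

print(f"n = {n}")
print(f"estimated mean = {mean}")
print(f"estimated standard deviation = {sigma}")
```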
FOUNDATIONS OF MATH 11
Ch. 5 Day 4: THE NORMAL DISTRIBUTION

A normal distribution is a type of frequency distribution that occurs commonly.

example: When eight coins are flipped, there could be 0 to 8 heads showing. The eight coins were flipped 1000 times, and the number of times that no heads, 1 head, 2 heads, ... occurred was tallied, put into this table, and graphed.

no. of heads                  0     1     2     3     4     5     6     7     8
frequency, f                  4    30   110   218   275   219   109    31     4
probability as a fraction
probability as a decimal
probability as a percent

[Histogram: probability (vertical axis, 0 to 0.3) against number of heads (horizontal axis, 0 to 8)]

The histogram starts to look like a bell-shaped curve. Imagine a histogram with many more bars; the shape of the combined bars will become a bell curve, the shape of all normal distributions.

NORMAL DISTRIBUTION GRAPH

o The graph is a continuous curve.
o The distribution's mean, median, and mode are at its centre.
o The graph is symmetric about the mean (the graph to one side of the mean is a mirror image of the other side).
o The total area under every normal curve is 1.
o The area under a region of a normal curve gives the probability that an event in that shaded region will happen.

[Graph: normal curve with the mean, µ, marked at its centre]
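The probability rows of the coin-flip table, and the idea that the total "area" is 1, can be checked with a short sketch. This is an illustration only; it uses Python's fractions module so that the fraction form is kept in lowest terms.

```python
from fractions import Fraction

heads = list(range(9))  # 0 to 8 heads
frequency = [4, 30, 110, 218, 275, 219, 109, 31, 4]
total = sum(frequency)  # number of times the eight coins were flipped

# Experimental probability of each outcome = frequency / total flips.
probabilities = [Fraction(f, total) for f in frequency]

for k, p in zip(heads, probabilities):
    print(f"{k} heads: {p} = {float(p):.3f} = {float(p):.1%}")

# Each histogram bar has width 1, so the sum of the bar heights is the
# total area of the histogram.
total_area = sum(probabilities)
print(f"total area = {total_area}")
```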
68-95-99.7 RULE

This rule estimates areas under the normal distribution graph in special regions. For every normal curve with mean, µ, and standard deviation, σ:

About 68% of the data is within 1 standard deviation of the mean.
[Graph: normal curve with the region from µ − σ to µ + σ shaded, 68%]

About 95% of the data is within 2 standard deviations of the mean.
[Graph: normal curve with the region from µ − 2σ to µ + 2σ shaded, 95%]

About 99.7% of the data is within 3 standard deviations of the mean.
[Graph: normal curve with the region from µ − 3σ to µ + 3σ shaded, 99.7%]

The graph of a normal distribution depends on two factors: the mean, µ, which is a measure of central tendency, and the standard deviation, σ, which is a measure of dispersion. Remember, the total area under the graph must be 1.

example: A normal distribution with µ = 67 and σ = 5 is shown below.

a) Sketch a normal distribution with µ = 77 and σ = 5.
[Graph: normal curve on a horizontal axis marked from 37 to 97 in steps of 5]
The new graph must have a different centre, but the same dispersion.
b) Sketch a normal distribution with µ = 67 and σ = 10.
[Graph: normal curve on a horizontal axis marked from 37 to 97 in steps of 5]
The new graph must have the same centre, but a different dispersion.

APPLYING THE NORMAL DISTRIBUTION

example: A university calculus course's final exam had a mean score of 67% and a standard deviation of 5%. How good is a score of 77%?
o 77% is 2 standard deviations more than the mean.
o The area under its normal distribution graph up to 77% will be 0.975, from 50% + ½(95%): half of the scores are below the mean, and half of the middle 95% lies between the mean and 77%.
Answer: 77% is as good as 97.5% of the exam scores.

exercise: A university calculus course's final exam had a mean score of 77% and a standard deviation of 5%. How good is a score of 77%? [answer: as good as 50% of the scores]

exercise: A university calculus course's final exam had a mean score of 67% and a standard deviation of 10%. How good is a score of 77%? [answer: as good as 84% of the scores]
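The "50% plus half of the middle region" reasoning can be sketched as a small function. The function name rule_percent_below is an assumption made for this illustration, and it only handles scores that are a whole number of standard deviations (0 to 3) from the mean, which is all the 68-95-99.7 rule covers. The comparison with statistics.NormalDist (Python 3.8 or later) shows that the rule's figures are rounded estimates.

```python
from statistics import NormalDist

# Percent of data within k standard deviations of the mean (68-95-99.7 rule).
WITHIN = {0: 0.0, 1: 68.0, 2: 95.0, 3: 99.7}

def rule_percent_below(x, mu, sigma):
    """Estimate the percent of data below x using the 68-95-99.7 rule."""
    k = (x - mu) / sigma
    if k != int(k) or abs(k) > 3:
        raise ValueError("the rule only covers 0, 1, 2 or 3 standard deviations")
    half = WITHIN[abs(int(k))] / 2
    # Half the data lies below the mean; add or subtract half the middle region.
    return 50 + half if k >= 0 else 50 - half

print(rule_percent_below(77, 67, 5))   # the worked example
print(rule_percent_below(77, 77, 5))   # first exercise
print(rule_percent_below(77, 67, 10))  # second exercise

# A more precise area for the worked example, for comparison with the rule.
print(round(NormalDist(67, 5).cdf(77), 4))
```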
exercise: A manufacturer offers a warranty on its toasters. The toasters have a mean lifespan of 4 years with a standard deviation of 1 year. How long should the toasters be covered by the warranty, if the manufacturer wants to repair no more than 2.5% of the toasters sold? [answer: 2 years]

exercise: The noon temperature at the airport has been as low as −6°C and as high as 36°C. If these temperatures are distributed normally, what would be a good estimate of the mean temperature and the standard deviation? [answer: 15°C, 7°C]

exercise: For the population of Canadian university students, the number of hours they are in class during a week is normally distributed with a mean of 20 hours and a standard deviation of 4 hours.
a) Mark the horizontal axis with appropriate values.
b) What is the probability that a university student is in class:
(i) between 16 and 24 hours per week? [answer: 68%]
(ii) between 12 and 28 hours per week? [answer: 95%]
(iii) between 8 and 32 hours per week? [answer: 99.7%]
(iv) more than 20 hours per week? [answer: 50%]
(v) more than 28 hours per week? [answer: 2.5%]
(vi) fewer than 24 hours per week? [answer: 84%]
FOUNDATIONS OF MATH 11
Ch. 5 Day 5: Z-SCORES

Thus far, we have only been able to answer questions about a normal distribution for a small number of values: µ, µ±σ, µ±2σ, µ±3σ.

[Graphs: three normal curves with the regions from µ − σ to µ + σ (68%), µ − 2σ to µ + 2σ (95%), and µ − 3σ to µ + 3σ (99.7%) shaded]

In order to work with other values, we will need to use the standard normal distribution and z-scores.

STANDARD NORMAL DISTRIBUTION

The standard normal distribution has mean 0 and standard deviation 1.

[Graph: standard normal curve on a horizontal axis marked −3, −2, −1, 0, 1, 2, 3]

The z-score is the number of standard deviations a data value is from the mean. The portion of the total population that has a smaller z-score can be found from a standard normal distribution table.

A data value that is 1 standard deviation above the mean has a z-score of 1. The table gives us the value 0.8413, which is 84.13%. This is the portion of the population less than 1 standard deviation above the mean.

[Graph: standard normal curve with the region to the left of z = 1 shaded]

Using only the 68-95-99.7 rule, we were not able to work with a data value that was 1.5 standard deviations above the mean. Now the standard normal distribution table tells us that a value with a z-score of 1.5 is higher than 93.32% of the data.

[Graph: standard normal curve with the region to the left of z = 1.5 shaded]
In order to use the standard normal distribution to answer questions about a normal distribution, each value from the normal distribution must be converted to its z-score. The z-score formula is

$z = \dfrac{x - \mu}{\sigma}$

where x is a value from the normal distribution.

example: A university calculus course's final exam had a mean score of 67% and a standard deviation of 5%. How good is a score of 80%?

[Graph: normal curve on a horizontal axis marked 52, 57, 62, 67, 72, 77, 82]

o The z-score for 80% is $z = \dfrac{80 - 67}{5} = 2.6$
o The area under its normal distribution graph up to 80% will be 0.9953, from the standard normal distribution table.

[Graph: standard normal curve on a horizontal axis marked −3, −2, −1, 0, 1, 2, 3]

Answer: 80% is as good as 99.53% of the exam scores.

exercise: A university calculus course's final exam had a mean score of 77% and a standard deviation of 5%. How good is a score of 80%?
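The two steps of the worked example (convert to a z-score, then look up the area) can be sketched in Python. Here statistics.NormalDist().cdf stands in for the printed standard normal distribution table; that substitution, and the function name z_score, are assumptions of this sketch and not part of the notes. NormalDist requires Python 3.8 or later.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Worked example: mean 67, standard deviation 5, score 80.
z = z_score(80, 67, 5)

# NormalDist() with no arguments is the standard normal distribution
# (mean 0, standard deviation 1); cdf gives the area to the left of z.
area = NormalDist().cdf(z)

print(f"z = {z}")
print(f"area to the left = {area:.4f}")
```

The same two lines can be reused for the exercises by changing the mean, standard deviation, and score.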
exercise: A university calculus course's final exam had a mean score of 67% and a standard deviation of 10%. How good is a score of 80%?

exercise: A university calculus course's final exam had a mean score of 67% and a standard deviation of 10%. What percent of students got a "B", a score between 73% and 85%?
example: A university calculus course's final exam had a mean score of 67% and a standard deviation of 5%. What score would be as good as 80% of the other scores?

o The area under its normal distribution graph will be 0.8.
o The z-score for this area is about 0.84.
o To find the x-value, use the z-score formula:

$0.84 = \dfrac{x - 67}{5}$
$5(0.84) = x - 67$
$4.2 + 67 = x$
$71.2 = x$

Answer: 72% (71.2% rounded up to a whole percent) will be as good as at least 80% of the other test scores.

exercise: A university calculus course's final exam had a mean score of 67% and a standard deviation of 10%. What score would be as good as 80% of the other scores?
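Working backwards from an area to a score can be sketched as follows. NormalDist().inv_cdf plays the role of reading the table in reverse, and the z-score is rounded to two decimals to mimic a printed table; both choices, and the function name score_for_area, are assumptions of this sketch.

```python
from statistics import NormalDist

def score_for_area(area, mu, sigma, decimals=2):
    """Return (z, x) where x is the value with the given area to its left."""
    # Reverse table lookup, rounded the way a printed z-table would be read.
    z = round(NormalDist().inv_cdf(area), decimals)
    # Rearranged z-score formula: x = mu + z * sigma.
    return z, mu + z * sigma

# Worked example: mean 67, standard deviation 5, area 0.8.
z, x = score_for_area(0.80, 67, 5)
print(f"z = {z}, x = {x:.1f}")
```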
FOUNDATIONS OF MATH 11
Ch. 5 Day 6: CONFIDENCE INTERVALS

It is usually impossible to obtain data for an entire population. Instead, random samples are taken and the mean and standard deviation are calculated. This information is then used to make predictions. These predictions are not a sure thing, so they need to include:
o an indication of how close we think our prediction is to the actual value, and
o how confident we are in the prediction.

Margin of Error is a measure of how close we believe the prediction is to the actual value. For example, if our prediction is that 44% of graduating high school students go on to university, with a margin of error of ±3%, then we believe that the actual percent going on to university is at least 41% and at most 47%.

Confidence Interval is the interval in which we believe the actual value lies. In the previous example, the confidence interval is 41% to 47%.

Confidence Level is a measure of how confident we are in the confidence interval (how good do we feel about the 44% ± 3% prediction?). A confidence level is the probability that the actual value is in the confidence interval, so a confidence level of 19 out of 20 means that there is a 95% likelihood that the actual value is in the confidence interval.

example: From the article below, determine the predicted value, the margin of error, the confidence interval, and the confidence level. Describe what these values actually mean.

CANADIAN MAJORITY AGAINST NEW PIPELINE CONSTRUCTION
A new survey indicates 57% of Canadians oppose government policy. A group lobbying for environmental protection commissioned a poll with the question, "Is a new pipeline an unacceptable risk to the environment?" The polling firm is independent of the environmental group, and the results are considered to reflect Canadian public opinion with an accuracy within 8%, 9 times out of 10.

predicted value =
margin of error =
confidence interval:
confidence level =
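The arithmetic behind these three terms can be sketched using the 44% ± 3%, 19 times out of 20 illustration from the definitions. The function names confidence_interval and confidence_level are assumptions made for this example.

```python
def confidence_interval(predicted, margin):
    """Interval from (predicted - margin) to (predicted + margin), in percent."""
    return predicted - margin, predicted + margin

def confidence_level(times, out_of):
    """Convert 'times out of out_of' wording into a percent."""
    return 100 * times / out_of

low, high = confidence_interval(44, 3)
level = confidence_level(19, 20)

print(f"confidence interval: {low}% to {high}%")
print(f"confidence level: {level}%")
```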
exercise: Use the following article (from the Thursday, February 2nd edition of the Vancouver Sun) to answer the questions below.

DIX SURPASSES CLARK AS TOP CHOICE FOR PREMIER
Poll finds NDP steadily climbing as Liberals lose momentum.

Premier Christy Clark suffered a significant blow in public opinion Wednesday, falling for the first time behind NDP leader Adrian Dix as people's top choice for premier.

The poll, conducted by Angus Reid Public Opinion, found that 26% of British Columbians think Dix would make the best premier, followed by 22% who chose Clark. Last November, the only other time Angus Reid has directly compared the two leaders, Clark led Dix on the best premier question with 25% support to Dix's 19%.

The result is bad news for Clark, who recently named a new chief of staff and whose party has been running weeks of negative ads against Dix, all in an apparent effort to reverse her slide in the polls.

Wednesday's poll also found that Clark's B.C. Liberal party has lost almost all the gains it made from its change in leadership, plummeting in popularity back to just above where it was when former premier Gordon Campbell announced his resignation in 2010. The Liberals had rebounded to a high of 43% as Clark was sworn in last March, but since then have steadily dropped by 15 points. Wednesday's poll found the Liberals now have 28% support, well behind the NDP's 42%. The B.C. Conservatives got 19% support in Wednesday's poll, and the Green party registered 10%.

Conducted online among 800 people between Jan. 27 and Jan. 29, the poll has a margin of error of plus or minus 3.5%, 19 times out of 20.

a) What is the sample size of this survey?
b) What is the margin of error of this survey?
c) What is the range of possible voter support for the Liberals?
d) What is the range of possible voter support for the NDP?
e) Determine the certainty of the results.
f) If there are 3.4 million registered voters in B.C., how many would you expect to vote for the Liberals?
g) How many would you expect to vote for the NDP?
h) If the sample size was increased, what would happen to the margin of error?
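Scaling a poll percentage and its margin of error up to a population can be sketched as below, using the Liberal figures from the article. The function name expected_voters is an assumption made for this illustration, and the sketch assumes that every registered voter votes and that the poll's percentages carry over to them unchanged.

```python
def expected_voters(support_percent, margin_percent, population):
    """Return (low, expected, high) voter counts for a poll result."""
    low = population * (support_percent - margin_percent) / 100
    expected = population * support_percent / 100
    high = population * (support_percent + margin_percent) / 100
    return round(low), round(expected), round(high)

REGISTERED_VOTERS = 3_400_000  # from part f) of the exercise

# Liberal support in the article: 28%, with a margin of error of 3.5%.
low, expected, high = expected_voters(28, 3.5, REGISTERED_VOTERS)
print(f"expected: {expected:,} voters (between {low:,} and {high:,})")
```

The same call with the NDP's percentage answers the matching question for that party.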