Exercises from Chapter 3, Section 1

1. Consider the following sample consisting of 20 numbers.

21 23 24 24 25 26 29 30 32 34 39 41 41 41 42 43 48 51 53 53

(a) Find the mode of the data.
(b) Find the median of the data.

Solution: (a) The mode is 41, the most frequently occurring number in the data set (it occurs three times).
(b) Because there is an even number of data values, the median is the average of the middle two values in the ordered data, which are the 10th and 11th values, 34 and 39. So the median is $\frac{34 + 39}{2} = 36.5$.

2. Consider the following data set consisting of 15 numbers.

20 22 23 23 26 26 26 31 33 36 39 41 41 43 43

(a) Find the mode of the data.
(b) Find the median of the data.

Solution: (a) The mode is 26, the most frequently occurring number in the data set (it occurs three times).
(b) The median is in the central position of the ordered data (8th place, with 7 data values above it and 7 below it), so the median is 31.

3. The test scores (not percentages) for Jennifer's honors English class are as follows, where the teacher wrote down the scores in the order of the students' last names.

99 84 100 80 92 102 105 129 109 117 99 122 119 120 89 106 107 112 129 89 97 119 125 94 101 114 127 108 111

(a) Make a stem-and-leaf display for the test scores.
(b) Find the median score for the class.
(c) All students who score more than 10 points above the median will receive a prestigious scholarship, and Jennifer's score was 119. Will she receive one of the prestigious scholarships?

Solution: (a) A stem-and-leaf plot for the test scores is as follows.

 8 | 0 4 9 9            (8 | 0 = 80)
 9 | 2 4 7 9 9
10 | 0 1 2 5 6 7 8 9
11 | 1 2 4 7 9 9
12 | 0 2 5 7 9 9

(b) There are 29 test scores, so the median score is in the (29 + 1)/2 = 15th place of the ordered scores. So the median score is 107.
(c) Jennifer's score is 12 points above the median, so yes, she will receive a scholarship.

4. (a) Given a collection of ordered data with 228 numbers, in what position is the median?
(b) Suppose the data in positions 112 to 118 for an ordered data set of size 228 are as below. Find the median of the data set.

... 51 55 56 61 62 67 67 ...

(That is, 51 is in position 112, 55 is in position 113, etc.)
(c) Given a collection of ordered data with 383 numbers, in what position is the median?
(d) Suppose the data in positions 190 to 196 for an ordered data set of size 383 are as below. Find the median of the data set.

... 51 55 56 61 62 62 67 ...

(That is, 51 is in position 190, 55 is in position 191, etc.)

Solution: (a) For this example, the data set has size n = 228, which is even. Thus the position of the median is $\frac{n + 1}{2} = \frac{228 + 1}{2} = 114.5$. Thus the median is found by adding the data values in positions 114 and 115 and dividing that result by 2.
(b) The data value in position 114 is 56 and the data value in position 115 is 61, and so the median is $\frac{56 + 61}{2} = 58.5$.
(c) For this example, the data set has size n = 383, which is odd. Thus the position of the median is $\frac{n + 1}{2} = \frac{383 + 1}{2} = 192$.
(d) By the previous part, the median is in position 192, so the median is 56.

5. A student received the following grades in their chemistry tests last quarter (there was a test every Friday except during the first and last weeks of the quarter).

5 73 77 78 85 88 91 93

(a) Find the mode.
(b) Find the median.

(c) Find the mean score. (Note: the sum of the test scores is 590.)
(d) In actuality, the instructor in that class dropped the highest and lowest test scores and then computed the mean of the remaining 6 tests (this is called a trimmed mean). Compute this trimmed mean for the test scores. What was the median of the remaining 6 test scores?
Where needed, round the answers above to one decimal place.

Solution: (a) There is no mode; each test score occurs once.
(b) The median is the average of the 4th and 5th tests: $\frac{78 + 85}{2} = 81.5$.
(c) The mean is $\frac{5 + 73 + 77 + 78 + 85 + 88 + 91 + 93}{8} = \frac{590}{8} \approx 73.8$.
(d) The trimmed mean is $\frac{73 + 77 + 78 + 85 + 88 + 91}{6} = \frac{492}{6} = 82.0$. The median remains 81.5.

6. The annual incomes in thousands of dollars for a small company with 20 employees are given below.

31.8 34.8 34.8 34.8 37.2 37.2 43.3 45.2 45.2 45.2 46.0 48.9 49.9 49.9 51.0 51.0 54.8 55.6 60.2 2289.0

(a) Compute the median annual income. Express the answer to the nearest 0.1 thousand dollars.
(b) Compute the mean annual income. Express the answer to the nearest 0.1 thousand dollars.
(c) Compute the 5% trimmed mean annual income. Express the answer to the nearest 0.1 thousand dollars.
(d) Think about the following questions: Which piece of the above information would you prefer to know as a prospective employee in the company? Which piece of the above information would you find least useful as a prospective employee in the company? What impact does the really high income have on the mean? What impact does the really high income have on the median?

Hint: To reduce your computations, note that
31.8 + 3(34.8) + 2(37.2) + 43.3 + 3(45.2) + 46.0 + 48.9 + 2(49.9) + 2(51.0) + 54.8 + 55.6 + 60.2 + 2289.0 = 3145.8

Solution: (a) The median is the average of the 10th and 11th numbers: median annual income $= \frac{45.2 + 46.0}{2} = 45.6$ thousand dollars.
(b) By the hint, the sum of all incomes (in thousands of dollars) is 3145.8, thus mean annual income $= \frac{3145.8}{20} \approx 157.3$ thousand dollars.

(c) Deleting the bottom and top 5% of the data results in deleting the lowest and highest incomes, since 5% of 20 is 1. Thus the 5% trimmed mean is $\frac{3145.8 - 31.8 - 2289.0}{18} = \frac{825.0}{18} \approx 45.8$ thousand dollars.
(d) (Thoughts will vary.) The one really high income raises the mean considerably, but its size does not influence the median. That is why many will prefer to know the median in many situations. The trimmed mean also avoids the extreme data.

7. In Xianfu's AP calculus class there were a total of 480 points on tests, 170 points on assignments and 220 points on quizzes. Suppose Xianfu achieved 80% on his tests, 93% on his assignments and 94% on his quizzes. Compute Xianfu's overall percentage as a weighted average. Round your answer to the nearest 0.1 percent (e.g. 99.3%).

Solution: In this case, the weights are 480, 170 and 220 (the total number of points in each category), so we compute

$\frac{\sum xw}{\sum w} = \frac{(480)(80\%) + (170)(93\%) + (220)(94\%)}{480 + 170 + 220} = \frac{74890}{870}\% \approx 86.1\%$

Therefore, Xianfu had an overall average of 86.1% in his AP calculus class.

8. On-time percentages are given for two airlines in Los Angeles, Phoenix and Seattle for last year.

Crashcade Airlines        Los Angeles   Phoenix   Seattle
Number of flights                 866       464      3577
On-time %                          88        95        77

Pacific Worst Airlines    Los Angeles   Phoenix   Seattle
Number of flights                 240      4509       224
On-time %                          83        90        72

(a) How do the on-time percentages of the airlines compare?
(b) Calculate the on-time percentage average for these three cities for each airline. Do this as a weighted average where the weight for each airline and city is the number of flights.
(c) Does the answer in (b) surprise you? Why or why not?

Solution: (a) Crashcade has an on-time percentage that is 5 percentage points higher in each city than Pacific Worst.
(b) For Crashcade we compute

$\frac{\sum xw}{\sum w} = \frac{(866)(88\%) + (464)(95\%) + (3577)(77\%)}{866 + 464 + 3577} = \frac{395717}{4907}\% \approx 80.64\%$

For Pacific Worst we compute

$\frac{\sum xw}{\sum w} = \frac{(240)(83\%) + (4509)(90\%) + (224)(72\%)}{240 + 4509 + 224} = \frac{441858}{4973}\% \approx 88.85\%$

(c) On the surface it may seem surprising that Pacific Worst has a better overall on-time percentage. However, this happens because Pacific Worst's schedule is heavily weighted toward flights in Phoenix, where it has its best on-time percentage, whereas Crashcade's flights are heavily weighted toward Seattle, where it has its worst on-time percentage.

9. (General Questions on Means and Medians) Consider a data set of 50 distinct measurements with undisclosed mean A and median B.
(a) If the largest number is increased by 250, what is the new mean?
(b) If the largest number is increased by 5, what is the new median?
(c) If every number is increased by 7, what is the effect on the median and mean?
(d) In what position of the ordered data is the median?

Solution: (a) The original sum of the data is 50A, so the new sum is 50A + 250, and the new mean is $\frac{50A + 250}{50} = A + \frac{250}{50} = A + 5$.
(b) The median remains the same; it is still B because the middle numbers have not changed, only the largest number was changed.
(c) Both increase by 7, so the new mean is A + 7 and the new median is B + 7.
(d) The median is in the (50 + 1)/2 = 25.5th position, which means it is the average of the data in the 25th and 26th positions.
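The mean, trimmed mean and weighted average computations used in Exercises 5 through 8 can be checked with a short Python sketch. The helper names `trimmed_mean` and `weighted_mean` are illustrative choices made here, not part of the exercise sheet; `trimmed_mean` drops a fixed count k of values from each end, which matches the solutions above (k = 1 in both Exercise 5 and Exercise 6).

```python
from statistics import mean, median

def trimmed_mean(data, k):
    """Mean after dropping the k smallest and k largest values (hypothetical helper)."""
    ordered = sorted(data)
    return mean(ordered[k:len(ordered) - k])

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights (hypothetical helper)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Exercise 5: chemistry test scores
scores = [5, 73, 77, 78, 85, 88, 91, 93]
print("median:", median(scores))
print("mean:", round(mean(scores), 1))
print("trimmed mean:", round(trimmed_mean(scores, 1), 1))

# Exercise 8: on-time percentages weighted by the number of flights
crashcade = weighted_mean([88, 95, 77], [866, 464, 3577])
pacific_worst = weighted_mean([83, 90, 72], [240, 4509, 224])
print("Crashcade:", round(crashcade, 2))
print("Pacific Worst:", round(pacific_worst, 2))
```

Because each airline's overall figure is weighted by where it flies most, Pacific Worst can come out ahead overall even though Crashcade is ahead in every individual city.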

Exercises from Chapter 3, Section 2

1. The annual incomes in thousands of dollars for a small company with 20 employees are given below.

32.1 34.7 34.7 34.7 39.3 39.3 43.7 44.1 44.1 44.1 44.3 48.7 49.6 49.6 50.8 50.8 54.3 55.3 59.9 2499.0

(a) Compute the range.
(b) Assume the data represent the entire population. Compute the variance and the standard deviation for the population. Express answers to the nearest 0.0001. Note that $\sum x = 3353.1$ and $\sum x^2 = 6284487.15$.
(c) Suppose the large income 2499.0 is that of the company owner. If the owner's income were removed from the population, do you think the standard deviation would increase, decrease or stay the same? Compute the standard deviation of this new population of 19 incomes to verify your answer.

Solution: (a) The range is $2499.0 - 32.1 = 2466.9$.
(b) Use the formulas

$\sigma^2 = \text{Population Variance} = \frac{\sum x^2 - \frac{(\sum x)^2}{20}}{20} = \frac{6284487.15 - \frac{(3353.1)^2}{20}}{20} \approx 286116.1585$

and

$\sigma = \text{Population Standard Deviation} = \sqrt{286116.1585} \approx 534.8983$

(c) Removing the largest value will leave the data less spread out, and so the standard deviation should decrease. To verify this, note that $\sum x = 3353.1 - 2499.0 = 854.1$ and $\sum x^2 = 6284487.15 - 2499.0^2 = 39486.15$. The standard deviation is then

$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{19}}{19}} = \sqrt{\frac{39486.15 - \frac{(854.1)^2}{19}}{19}} \approx \sqrt{57.4793} \approx 7.5815$

2. Stem-and-leaf plots are given for two populations each of size N = 11; in fact, the second data set is obtained by adding 30 to each number in the first set.

Population 1                      Population 2
 5 | 6 9        (5 | 6 = 56)       8 | 6 9        (8 | 6 = 86)
 6 | 2 5 5 8                       9 | 2 5 5 8
 7 | 1 6 9                        10 | 1 6 9
 8 |                              11 |
 9 | 2 5                          12 | 2 5

(a) How do you think the means of Population 1 and Population 2 compare? Explain.
(b) How do you think the medians of Population 1 and Population 2 compare? Explain.
(c) How do you think the ranges of Population 1 and Population 2 compare? Explain.

(d) How do you think the standard deviations of Population 1 and Population 2 compare? Explain.
(e) Verify your answers to (a) through (d) by finding the values of the various quantities. For your convenience the following sums are computed for you.
Population 1: $\sum x = 788$, $\sum x^2 = 58082$; Population 2: $\sum x = 1118$, $\sum x^2 = 115262$
(f) Compute the coefficient of variation for each population.

Solution: (a) The mean of Population 2 is 30 larger than the mean of Population 1 since each number is increased by 30.
(b) The median of Population 2 is 30 larger than the median of Population 1 since each number is increased by 30, and so the middle value in Population 2 will be 30 larger than the middle value in Population 1.
(c) The ranges will be equal since the spread of the data is exactly the same: the high and low are each increased by 30 in Population 2, so the difference between the high and low remains the same.
(d) The standard deviations will be equal, since the deviations of each data value from the respective means remain the same.
(e) The following computations confirm the answers above.
Population 1: the range is 39, and the other requested quantities are
$\mu = \frac{788}{11} \approx 71.6364$, Median = 68, $\sigma = \sqrt{\frac{58082 - \frac{788^2}{11}}{11}} \approx 12.1825$
Population 2: the range is 39, and the other requested quantities are
$\mu = \frac{1118}{11} \approx 101.6364$, Median = 98, $\sigma = \sqrt{\frac{115262 - \frac{1118^2}{11}}{11}} \approx 12.1825$
(f) Population 1: $CV = 100 \cdot \frac{12.1825}{71.6364} \approx 17.0\%$. Population 2: $CV = 100 \cdot \frac{12.1825}{101.6364} \approx 12.0\%$.

3. Stem-and-leaf plots are given for two populations each of size N = 11, where the data values in the second data set are double the data values in the first data set.

Population 1                      Population 2
 1 | 0 2        (1 | 0 = 10)       2 | 0 4        (2 | 0 = 20)
 2 | 1 2 2 3                       3 |
 3 | 0 2 2                         4 | 2 4 4 6
 4 | 1 3                           5 |
                                   6 | 0 4 4
                                   7 |
                                   8 | 2 6

(a) How do you think the means of Population 1 and Population 2 compare? Explain.
(b) How do you think the medians of Population 1 and Population 2 compare? Explain.

(c) How do you think the ranges of Population 1 and Population 2 compare? Explain.
(d) How do you think the standard deviations of Population 1 and Population 2 compare? What about the variances? Explain.
(e) Verify your answers to (a) through (d) by finding the values of the various quantities. For your convenience the following sums are computed for you.
Population 1: $\sum x = 288$, $\sum x^2 = 8660$; Population 2: $\sum x = 576$, $\sum x^2 = 34640$

Solution: (a) The mean of Population 2 is twice as large as the mean of Population 1 since each number in Population 2 is double the corresponding number in Population 1.
(b) The median of Population 2 is twice as large as the median of Population 1 since the middle value in Population 2 will be double the middle value in Population 1.
(c) The range in Population 2 will be double the range in Population 1 since $2H - 2L = 2(H - L)$, where H and L represent the high and low in Population 1.
(d) Because the deviation of each data value from the mean in Population 2 will be double the deviation of the corresponding number from the mean in Population 1, the variance in Population 2 will be $2^2 = 4$ times as large, and the standard deviation will be twice as large, as those in Population 1.
(e) The following computations confirm the answers above.
Population 1: the range is 33, and the other requested quantities are
$\mu = \frac{288}{11} \approx 26.1818$, Median = 23, $\sigma^2 = \frac{8660 - \frac{288^2}{11}}{11} \approx 101.7851$, $\sigma \approx 10.0889$
Population 2: the range is 66, and the other requested quantities are
$\mu = \frac{576}{11} \approx 52.3636$, Median = 46, $\sigma^2 = \frac{34640 - \frac{576^2}{11}}{11} \approx 407.1405$, $\sigma \approx 20.1777$

4. Consider the following sample consisting of 22 numbers.

24 26 28 28 31 33 34 35 37 39 41 44 46 46 46 48 51 57 58 59 59 62

(a) Find the mode of the data.
(b) Find the median of the data.
(c) Find the range of the data.
(d) Given that $\sum x = 932$ and $\sum x^2 = 42430$, find the mean, variance and standard deviation for this sample.

Solution: (a) The mode is 46, the most frequently occurring number in the data set.
(b) The median is $\frac{41 + 44}{2} = 42.5$.
(c) The range is $62 - 24 = 38$.

(d) The mean is $\bar{x} = \frac{932}{22} \approx 42.36364$. Because this is a sample, the variance is $s^2 = \frac{42430 - \frac{932^2}{22}}{21} \approx 140.33766$ and the standard deviation is $s = \sqrt{s^2} \approx \sqrt{140.33766} \approx 11.84642$.

5. At a large medical center there is some concern about the high turnover of nurses. A survey was given to a sample of nurses to determine how long in months they had been in their current positions. The responses (in months) of surveyed nurses were as follows.

5 5 5 8 10 10 12 13 16 19 21 23 24 24 27 30 31 32 33 33 35 38 40 41 44 47 48 50

Note that there are 28 pieces of data, and then find the following.
(a) The range of the data.
(b) The median of the data.
(c) The standard deviation of the data.
(d) The mean of the data.
Hint: The following may be useful: $\sum x = 724$ and $\sum x^2 = 24082$.

Solution: (a) The range is $50 - 5 = 45$.
(b) The median is in the (n + 1)/2 = 14.5th place, i.e. it is the average of the data in the 14th and 15th places, and so the median is $\frac{24 + 27}{2} = 25.5$.
(c) The standard deviation is

$s = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}} = \sqrt{\frac{24082 - (724)^2 / 28}{27}} \approx 14.0915$

We use the sample formula because the survey was not based on the entire population of nurses at the hospital.
(d) The mean is $\bar{x} = \frac{724}{28} \approx 25.8571$.

6. The depth of ground water is given in the following grouped data table.

Distance from ground to water level (ft), x    18-24   25-31   32-38   39-45
Number of wells, f                                 4       9       6       4

(a) Estimate the mean depth of the ground water.
(b) Estimate the sample standard deviation for the depth of the ground water.

Solution: (a) $\bar{x} \approx \frac{\sum xf}{n}$, where $n = \sum f = 4 + 9 + 6 + 4 = 23$ and x is the class mark, so

$\bar{x} \approx \frac{(21)(4) + (28)(9) + (35)(6) + (42)(4)}{23} = \frac{714}{23} \approx 31.0435$

(b) $s^2 \approx \frac{\sum x^2 f - (\sum xf)^2 / n}{n - 1}$, and $\sum x^2 f = (21^2)(4) + (28^2)(9) + (35^2)(6) + (42^2)(4) = 23226$, and so

$s \approx \sqrt{\frac{23226 - \frac{714^2}{23}}{22}} \approx \sqrt{48.22529644} \approx 6.9444$

7. The exam scores (not percentages) in a certain AP statistics class had a mean of $\mu = 52$ and a standard deviation of $\sigma = 6$.
(a) The tick marks on the horizontal axis given below range from 4 standard deviations below the mean to 4 standard deviations above the mean. Use the information given above to label each of the tick marks with the appropriate numerical values.

[Number line with nine tick marks, labeled µ - 4σ, µ - 2σ, µ, µ + 2σ and µ + 4σ at every second tick.]

(b) According to Chebyshev's theorem, what interval of test scores contains at least 8/9 of all data values? Illustrate that interval on a graph.
(c) According to Chebyshev's theorem, what portion of test scores will be in the range from 40 to 64?

Solution: (a) The mean $\mu$ is given to be 52; then for each tick mark to the right we successively add $\sigma = 6$, and for each tick mark to the left we successively subtract $\sigma = 6$.

   28     34     40     46     52     58     64     70     76
 µ - 4σ        µ - 2σ          µ           µ + 2σ        µ + 4σ

(b) Notice that $8/9 = 1 - 1/3^2$, so that is the interval from $\mu - 3\sigma$ to $\mu + 3\sigma$, that is, the interval from 34 to 70.

[Number line as in part (a), illustrating the interval from 34 to 70.]

(c) With reference to our graph (the number line from part (a)),

this is the interval that goes from 2 standard deviations below the mean to 2 standard deviations above the mean, so at least $1 - 1/2^2 = 3/4$ of all test scores lie in the interval from 40 to 64.

8. A population is known to have a mean of 70 and a standard deviation of 5. Use Chebyshev's theorem to answer the following questions.
(a) Find an interval that contains at least 3/4 of the population.
(b) At least what portion of the population is contained in the interval from 45 to 95?

Solution: (a) Chebyshev's theorem says that at least $1 - \frac{1}{n^2}$ of a population lies within n standard deviations of the mean. Since $\frac{3}{4} = 1 - \frac{1}{2^2}$, at least 3/4 of the population lies in the interval with endpoints $\mu \pm 2\sigma$. With $\mu = 70$ and $\sigma = 5$, at least 3/4 of the population lies within the interval from 60 to 80.
(b) Observe that $(95 - 70)/5 = 5$ and $(45 - 70)/5 = -5$, so the interval given has endpoints $\mu \pm 5\sigma$; thus at least $1 - \frac{1}{5^2} = \frac{24}{25}$ of the population lies in the given interval.

9. The exam scores (not percentages) of a college entrance test had a mean of $\mu = 470$ and a standard deviation of $\sigma = 20$.
(a) Find a score that is 3 standard deviations above the mean.
(b) Find a score that is 5 standard deviations below the mean.
(c) How many standard deviations away from the mean is a score of 520? (Clearly indicate whether it is above or below the mean in your answer.)
(d) How many standard deviations away from the mean is a score of 460? (Clearly indicate whether it is above or below the mean in your answer.)

Solution: (a) The score that is 3 standard deviations above the mean is $470 + (3)(20) = 530$.
(b) The score that is 5 standard deviations below the mean is $470 - (5)(20) = 370$.
(c) The score 520 is $\frac{520 - 470}{20} = 2.5$ standard deviations above the mean.
(d) For the score 460, we compute $\frac{460 - 470}{20} = -0.5$. Because the answer is negative, this means 460 is 0.5 standard deviations below the mean.
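The Chebyshev bounds and standard-score calculations in Exercises 7 through 9 follow two small formulas, sketched below in Python. The function names are illustrative assumptions, and the number of standard deviations is called `k` here (the solution to Exercise 8 calls it n).

```python
def chebyshev_bound(k):
    """Chebyshev's lower bound 1 - 1/k^2 on the fraction within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def interval(mu, sigma, k):
    """Endpoints mu - k*sigma and mu + k*sigma."""
    return (mu - k * sigma, mu + k * sigma)

def z_score(x, mu, sigma):
    """Signed number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Exercise 8: mean 70, standard deviation 5
print(interval(70, 5, 2), chebyshev_bound(2))      # interval for at least 3/4
print(z_score(95, 70, 5), chebyshev_bound(5))      # 45 to 95 is mu +/- 5 sigma

# Exercise 9: mean 470, standard deviation 20
print(z_score(520, 470, 20), z_score(460, 470, 20))
```

A negative value from `z_score` means the score is below the mean, as in Exercise 9(d).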

10. (Suspected Outliers) One indicator of a suspected outlier is that an observation is more than 2.5 standard deviations from the mean. Suppose that for a certain breed of dog the mean weight of an adult female is 90 pounds with a standard deviation of 8 pounds. Which of the following weights for an adult female would be a suspected outlier?
(a) 99 pounds
(b) 68 pounds
(c) 121 pounds
(d) 78 pounds
You may wish to label the points on the graph below to visually support the answer:

[Number line with tick marks at µ - 2.5σ, µ and µ + 2.5σ.]

Solution: Notice that 2.5 standard deviations in this case is $(2.5)(8) = 20.0$ lbs. Then $\mu - 2.5\sigma = 90 - 20.0 = 70.0$ and $\mu + 2.5\sigma = 90 + 20.0 = 110.0$. These points are labeled and the other points are plotted on the graph.

[Number line with the weights 68, 78, 99 and 121 plotted, and with µ - 2.5σ = 70.0, µ = 90 and µ + 2.5σ = 110.0 labeled.]

With reference to the graph above, we see that the weights (b) 68 pounds and (c) 121 pounds are more than 2.5 standard deviations from the mean and so they are suspected outliers, while (a) 99 pounds and (d) 78 pounds are within 2.5 standard deviations of the mean, so they are not suspected outliers.
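The standard deviation formulas used throughout this section work from the sums $\sum x$ and $\sum x^2$, and the 2.5 standard deviation rule of Exercise 10 is a one-line comparison. The following Python sketch, with function names chosen here purely for illustration, reproduces several of the answers above from the sums supplied in the exercises.

```python
from math import sqrt

def population_sd(sum_x, sum_x2, n):
    """Population standard deviation from the sums of x and x^2 (divide by n)."""
    return sqrt((sum_x2 - sum_x**2 / n) / n)

def sample_sd(sum_x, sum_x2, n):
    """Sample standard deviation from the sums of x and x^2 (divide by n - 1)."""
    return sqrt((sum_x2 - sum_x**2 / n) / (n - 1))

def is_suspected_outlier(x, mu, sigma, k=2.5):
    """True when x is more than k standard deviations from the mean."""
    return abs(x - mu) > k * sigma

# Exercise 1(b): the 20 incomes treated as a population
print(round(population_sd(3353.1, 6284487.15, 20), 4))

# Exercise 5(c): the 28 surveyed nurses treated as a sample
print(round(sample_sd(724, 24082, 28), 4))

# Exercise 10: mean 90 pounds, standard deviation 8 pounds
for weight in (99, 68, 121, 78):
    print(weight, is_suspected_outlier(weight, 90, 8))
```

The only difference between the two standard deviation functions is the final divisor, which mirrors the population versus sample distinction made in the solution to Exercise 5.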

Exercises from Chapter 3, Section 3

1. Suppose 7000 people took an aptitude test for entrance into a university.
(a) Clayton scored in the 77th percentile on the test. Approximately how many of the 7000 people taking the test scored as well as or better than Clayton? How many had scores that were at or below Clayton's score?
(b) Esperance scored in the 88th percentile on the test. Approximately how many of the 7000 people taking the test had scores that were at or below her score? How many people had scores that were at or above her score?
(c) To be considered for admission to the university, a score at or above the 36th percentile was required. Approximately how many of the 7000 people taking the test would be considered for admission to the university?

Solution: (a) Approximately 23% of 7000 = 1610 people taking the test scored as well as or better than Clayton. Approximately 77% of 7000 = 5390 people taking the test had scores at or below Clayton's score.
(b) Approximately 12% of 7000 = 840 people taking the test scored as well as or better than Esperance. Approximately 88% of 7000 = 6160 people taking the test had scores at or below Esperance's score.
(c) Approximately 64% of 7000 = 4480 people taking the test would be considered for admission to the university.

2. At a six-month check-up, the pediatrician told Benjamin and Crystal that their daughter Emily Elisabeth ranked at the 83rd percentile in height among all six-month-old girls.
(a) What percentage of six-month-old girls measure at or below Emily Elisabeth's height?
(b) What percentage of six-month-old girls are taller than Emily Elisabeth?

Solution: (a) 83%
(b) (100 - 83)% = 17%

3. Consider the following sample consisting of 23 numbers.

25 26 28 28 32 33 36 38 39 40 42 45 46 46 46 47 47 51 53 56 57 57 65

Find the quartiles $Q_1$, $Q_2$, $Q_3$ and the IQR for the data.

Solution: The second quartile $Q_2$ is the median, which is in the 12th place, and so $Q_2 = 45$.
The first quartile is the median of the first 11 data values (all the data below the position of the median),

25 26 28 28 32 33 36 38 39 40 42

which is in the 6th place, and so $Q_1 = 33$.

The third quartile is the median of the last 11 data values (all the data above the position of the median),

46 46 46 47 47 51 53 56 57 57 65

and so $Q_3 = 51$. The interquartile range is $IQR = Q_3 - Q_1 = 51 - 33 = 18$.

4. Consider the following sample consisting of 22 numbers.

21 22 24 24 26 28 30 33 34 36 38 43 44 44 44 46 48 52 54 55 55 59

(a) Find the mode of the data.
(b) Find the median of the data.
(c) Find the range of the data.
(d) Given that $\sum x = 860$ and $\sum x^2 = 36650$, find the mean, variance and standard deviation for this sample.
(e) Find $Q_1$, $Q_2$, $Q_3$ and the IQR for the data.

Solution: (a) The mode is 44, the most frequently occurring number in the data set.
(b) The median is $\frac{38 + 43}{2} = 40.5$.
(c) The range is $59 - 21 = 38$.
(d) The mean is $\bar{x} = \frac{860}{22} \approx 39.09091$. Because this is a sample, the variance is $s^2 = \frac{36650 - \frac{860^2}{22}}{21} \approx 144.37229$ and the standard deviation is $s = \sqrt{s^2} \approx \sqrt{144.37229} \approx 12.01550$.
(e) The quartiles are $Q_1 = 28$, $Q_2 = 40.5$ and $Q_3 = 48$, and the interquartile range is $IQR = Q_3 - Q_1 = 48 - 28 = 20$.

5. Consider the following sample consisting of 22 numbers.

20 21 22 22 24 25 27 28 29 31 32 35 36 36 36 38 40 41 43 45 45 52

(a) Find the quartiles $Q_1$, $Q_2$, $Q_3$ and the IQR for the given data.
(b) Construct a box-and-whisker plot for the given data.

Solution: (a) The second quartile $Q_2$ is the median, which is the average of the data in the 11th and 12th places, and so $Q_2 = \frac{32 + 35}{2} = 33.5$. The first quartile is the median of the first 11 data values (the lower half of the data), which is in the 6th place, so $Q_1 = 25$, and the third quartile is the

median of the last 11 data values (the upper half of the data), and so $Q_3 = 40$ (so $Q_3$ is in the 17th place). The interquartile range is $IQR = Q_3 - Q_1 = 40 - 25 = 15$.
(b) A box-and-whisker plot is as follows.

[Box-and-whisker plot on an axis from 15 to 55, with marked values 20, 25, 33.5, 40 and 52.]

6. At a large medical center there is some concern about the high turnover of nurses. A survey was completed to determine how long in months nurses had been in their current positions. The responses (in months) of 36 nurses were used to make the following box-and-whisker plot.

[Box-and-whisker plot on an axis from 0 to 65, with marked values 3, 10, 24, 29 and 58.]

(a) Find the numbers $Q_1$, $Q_2$, $Q_3$, L and H.
(b) Find the interquartile range.
(c) How many of the 36 nurses have been working at the medical center for at least 10 months?
(d) What is the range of the data?

Solution: (a) The numbers are $Q_1 = 10$, $Q_2 = 24$, $Q_3 = 29$, L = 3 and H = 58.
(b) The interquartile range is $29 - 10 = 19$.
(c) 10 is the first quartile, so 75% of the 36 nurses, or 27 nurses, have been working at least 10 months.
(d) The range of the data is $58 - 3 = 55$.

7. At a large medical center there is some concern about the high turnover of nurses. A survey was completed to determine how long in months nurses had been in their current positions. The responses (in months) of 56 nurses were used to make the following box-and-whisker plot.

[Box-and-whisker plot with marked values 1, 9, 21, 32 and 47; axis from 0 to 50]

(a) Find the interquartile range.
(b) How many of the 56 nurses have been working at the medical center for at least 9 months?
(c) How many of the 56 nurses surveyed have been working at the medical center for at least 32 months?
(d) Find the numbers $Q_1$, $Q_2$, $Q_3$, $L$ and $H$.
(e) What is the median amount of time that the 56 nurses surveyed have been working at the medical center?
(f) Have half of the nurses surveyed been working at the medical center for at least 2 years?
(g) Find the range of the data.

Solution:
(a) The interquartile range is $32 - 9 = 23$.
(b) 9 is the first quartile, so 75% of the 56 nurses, or 42 nurses, have been working at least 9 months.
(c) 32 is the third quartile, so 25% of the 56 nurses, or 14 nurses, have been working at least 32 months.
(d) The numbers are $Q_1 = 9$, $Q_2 = 21$, $Q_3 = 32$, $L = 1$ and $H = 47$.
(e) 21 months.
(f) No, the median is 21 months, which is less than two years.
(g) The range of the data is $47 - 1 = 46$.

8. At a large medical center there is some concern about the high turnover of nurses. A survey was completed to determine how long, in months, nurses had been in their current positions. The responses (in months) of the nurses were as follows.

3 5 6 6 6 7 8 9 11 12 14 15 15 17 18 19 20 22 23 25 25 25 27 27 28 29 29 31 32 33 34 36 38 38 39 41 42 44 45 47 49 50 51 52 54 55 58 61 65 67 69 72 74 75 78 82 84 86

Use this data to construct a box-and-whisker plot; notice that there are 58 data points.

Solution: There are 58 data points. Therefore, the median is the average of the data in the 29th and 30th places, that is, $Q_2 = 32.5$. Then $Q_1$ is the median of the first 29 data values, and so $Q_1 = 18$ (it is in the 15th place). Similarly, $Q_3$ is the median of the last 29 data values, so $Q_3 = 52$. Clearly, the low number is 3 and the high number is 86. Therefore, a box-and-whisker plot for the data is as follows.

[Box-and-whisker plot: low 3, $Q_1 = 18$, median 32.5, $Q_3 = 52$, high 86; axis from 0 to 90]

9. The test scores (not percentages) in Natasha's honors chemistry class are as follows, where the teacher wrote down the scores in the order of the students' last names.

109 95 110 90 101 111 115 137 119 126 109 133 129 130 116 116 121 142 99 107 129 134 105 110 123 137 99

(a) Make a stem-and-leaf plot for the test scores.
(b) Make a box-and-whisker plot for the test scores.

Solution:
(a) A stem-and-leaf plot for the test scores is as follows.

 9 | 0 5 9 9          Key: 9 | 0 = 90
10 | 1 5 7 9 9
11 | 0 0 1 5 6 6 9
12 | 1 3 6 9 9
13 | 0 3 4 7 7
14 | 2

(b) There are 27 test scores, so the median score is in the $(27 + 1)/2 = 14$th place of the ordered scores, and the median score is 116. Then $Q_1$ is the median of the first 13 test scores, so $Q_1 = 107$ (the score in the 7th place), and similarly $Q_3$ is the median of the last 13 test scores, so $Q_3 = 129$. A box-and-whisker plot is then as follows.

[Box-and-whisker plot: low 90, $Q_1 = 107$, median 116, $Q_3 = 129$, high 142; axis from 85 to 145]

10. Consider the following data set consisting of 19 numbers.

97 99 99 100 102 105 108 111 112 114 119 121 122 125 126 126 127 128 176

(a) Find the quartiles $Q_1$, $Q_2$, $Q_3$ and the IQR for the data.
(b) A suspected outlier is any data value that falls below the lower limit $Q_1 - 1.5\,(\mathrm{IQR})$ or above the upper limit $Q_3 + 1.5\,(\mathrm{IQR})$. Identify any suspected outliers in the data.
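
The rule in part (b) is easy to check by machine. The following is a minimal Python sketch, assuming the median-of-halves quartile convention used throughout these exercises; the function names `quartiles` and `suspected_outliers` are illustrative and do not come from the exercise sheet.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    Q1 is the median of the values below the median position and
    Q3 is the median of the values above it (assumed convention).
    """
    values = sorted(data)
    n = len(values)
    half = n // 2                 # size of each half, middle value excluded when n is odd
    lower = values[:half]
    upper = values[n - half:]
    return median(lower), median(values), median(upper)


def suspected_outliers(data):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [x for x in data if x < lower_limit or x > upper_limit]


# Data set from Exercise 10.
data = [97, 99, 99, 100, 102, 105, 108, 111, 112, 114,
        119, 121, 122, 125, 126, 126, 127, 128, 176]

print(quartiles(data))
print(suspected_outliers(data))
```

Library routines such as `statistics.quantiles` use other interpolation conventions by default, so they may give slightly different quartiles on other data sets.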

Solution:
(a) The second quartile $Q_2$ is the median, which is in the 10th place, so $Q_2 = 114$. The first quartile is the median of the first 9 data values (all the data below the position of the median), which is in the 5th place, so $Q_1 = 102$. The third quartile is the median of the last 9 data values (all the data above the position of the median), so $Q_3 = 126$ (the value in the 15th place). The interquartile range is

$\mathrm{IQR} = Q_3 - Q_1 = 126 - 102 = 24$

(b) The lower limit is $102 - (1.5)(24) = 66.0$ and the upper limit is $126 + (1.5)(24) = 162.0$. The only data value outside these limits is 176, which is above the upper limit. Thus 176 is a suspected outlier.

11. A review of some symbols. Select the description below that best matches the symbol or concept.

1. $\bar{x}$
2. $\sigma$
3. $s$
4. $\mu$
5. $Q_1$

(a) The sample standard deviation, it is a parameter.
(b) The population standard deviation, it is a parameter.
(c) The sample standard deviation, it is a statistic.
(d) The population standard deviation, it is a statistic.
(e) The first quartile, it is the 75th percentile in a data distribution.
(f) The first quartile, it is the 25th percentile in a data distribution.
(g) The sample mean, it is a parameter.
(h) The population mean, it is a parameter.
(i) The sample mean, it is a statistic.
(j) The population mean, it is a statistic.

Solution: 1. (i) 2. (b) 3. (c) 4. (h) 5. (f)

12. A review of some concepts. Select the description below that best matches the concept.

1. The mode
2. The median
3. The range
4. The mean
5. The interquartile range

(a) The average that takes all specific values into account.
(b) The average that represents the middle value of a data distribution.
(c) The average that represents the most frequent value of a data distribution.
(d) The average that is the arithmetic midpoint of the high and low data values of a distribution.
(e) A measure of dispersion that is computed by subtracting the low value from the high value in a data distribution.
(f) An interval of numbers that starts at the low value of the data and ends at the high value of the data.
(g) An interval of numbers that starts at the 25th percentile and ends at the 75th percentile from a data distribution.
(h) A single number computed by subtracting the 25th percentile from the 75th percentile in a data distribution.

Solution: 1. (c) 2. (b) 3. (e) 4. (a) 5. (h)

13. When computing the standard deviation, does it matter whether the data are from a sample or comprise the entire population?

(a) Yes, in the population standard deviation formula one divides by $N - 1$ and for a sample one divides by $n$.
(b) No, in both formulas one divides by $n - 1$.
(c) No, in both formulas one divides by $N$.
(d) Yes, in a sample one divides by $n - 1$ and in a population one divides by $N$.
(e) Yes, in a sample one divides by $n + 1$ and in a population one divides by $N - 1$.

Solution: Yes, the answer is (d).
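
To make the distinction in (d) concrete, here is a small Python sketch that applies both divisors to the sample from Exercise 4. The variable names are illustrative; `statistics.stdev` and `statistics.pstdev` are the Python standard-library functions for the sample and population versions and are used only as a cross-check.

```python
from math import sqrt
from statistics import pstdev, stdev

# Sample from Exercise 4 (22 values, sum 860, sum of squares 36650).
sample = [21, 22, 24, 24, 26, 28, 30, 33, 34, 36, 38,
          43, 44, 44, 44, 46, 48, 52, 54, 55, 55, 59]

n = len(sample)
sum_x = sum(sample)
sum_x2 = sum(x * x for x in sample)

# Sum of squared deviations via the computational formula.
ss = sum_x2 - sum_x ** 2 / n

s = sqrt(ss / (n - 1))    # sample standard deviation: divide by n - 1
sigma = sqrt(ss / n)      # population standard deviation: divide by N

print(round(s, 5))        # about 12.0155, as in Exercise 4(d)
print(round(sigma, 5))    # smaller than s, because the divisor is larger

# Cross-check against the standard library.
assert abs(s - stdev(sample)) < 1e-9
assert abs(sigma - pstdev(sample)) < 1e-9
```

Treating the same 22 numbers as a whole population gives a smaller value than treating them as a sample, which is why the choice of divisor matters.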