Range The range is the simplest of the three measures and is defined now.

Arron Brown

Measures of Variation

EXAMPLE
A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test. Since different chemical agents are added to each group and only six cans are involved, these two groups constitute two small populations. The results (in months) are shown. Find the mean of each group.

Brand A    Brand B
10         35
60         45
50         30
30         35
40         40
20         25

For each brand the six values sum to 210, so the mean is 210/6 = 35 months.

Since the means are equal in the example, you might conclude that both brands of paint last equally well. However, when the data sets are examined graphically, a somewhat different conclusion might be drawn. See the figure below.

[Figure: Examine the data sets graphically. Variation of paint (in months): (a) Brand A, (b) Brand B]

As the figure shows, even though the means are the same for both brands, the spread, or variation, is quite different. The figure shows that brand B performs more consistently; it is less variable. For the spread or variability of a data set, three measures are commonly used: range, variance, and standard deviation. Each measure will be discussed in this section.

Range
The range is the simplest of the three measures and is defined now.

The range is the highest value minus the lowest value. The symbol R is used for the range.

\[ R = \text{highest value} - \text{lowest value} \]

EXAMPLE Comparison of Outdoor Paint
Find the ranges for the paints.

For brand A, R = 60 - 10 = 50 months. For brand B, R = 45 - 25 = 20 months.

Make sure the range is given as a single number. The range for brand A shows that 50 months separate the largest data value from the smallest data value. For brand B, 20 months separate the largest data value from the smallest data value, which is less than one-half of brand A's range.

One extremely high or one extremely low data value can affect the range markedly, as shown in the next example.

EXAMPLE Employee Salaries
The salaries for the staff of the XYZ Manufacturing Co. are shown here. Find the range.

Staff                   Salary
Owner                   $100,000
Manager                   40,000
Sales representative      30,000
Workers                   25,000
                          15,000
                          18,000

The range is R = $100,000 - $15,000 = $85,000. Since the owner's salary is included in the data, the range is a large number. To have a more meaningful statistic to measure the variability, statisticians use measures called the variance and standard deviation.

Population Variance and Standard Deviation
Before the variance and standard deviation are defined formally, the computational procedure will be shown, since the definition is derived from the procedure.

Rounding Rule for the Standard Deviation
The rounding rule for the standard deviation is the same as that for the mean. The final answer should be rounded to one more decimal place than that of the original data.

EXAMPLE Comparison of Outdoor Paint
Find the variance and standard deviation for the data set for brand A paint in the paint fading example.

10, 60, 50, 30, 40, 20
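The range calculations above can be sketched in a few lines of Python. This is only an illustration: the function name `value_range` is made up here, and the brand B values are the ones listed for brand B later in this section.

```python
def value_range(values):
    """Return the range R = highest value - lowest value."""
    return max(values) - min(values)

# Data from the examples in this section.
brand_a = [10, 60, 50, 30, 40, 20]          # months
brand_b = [35, 45, 30, 35, 40, 25]          # months
salaries = [100_000, 40_000, 30_000, 25_000, 15_000, 18_000]  # dollars

print(value_range(brand_a))    # 50
print(value_range(brand_b))    # 20
print(value_range(salaries))   # 85000
```

Dropping the owner's salary from the list shrinks the range to 25,000, which shows how strongly a single extreme value drives this measure.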

Solution

Step 1 Find the mean for the data.
\[ \mu = \frac{\sum X}{N} = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35 \]

Step 2 Subtract the mean from each data value.
10 - 35 = -25    60 - 35 = +25    50 - 35 = +15
30 - 35 = -5     40 - 35 = +5     20 - 35 = -15

Step 3 Square each result.
(-25)^2 = 625    (+25)^2 = 625    (+15)^2 = 225
(-5)^2 = 25      (+5)^2 = 25      (-15)^2 = 225

Step 4 Find the sum of the squares.
625 + 625 + 225 + 25 + 25 + 225 = 1750

Step 5 Divide the sum by N to get the variance.
Variance = 1750 / 6 = 291.7

Step 6 Take the square root of the variance to get the standard deviation. Hence, the standard deviation equals \( \sqrt{291.7} \), or 17.1.

It is helpful to make a table.

A            B          C
Values X     X - μ      (X - μ)^2
10           -25        625
60           +25        625
50           +15        225
30           -5         25
40           +5         25
20           -15        225
                        Sum = 1750

Column A contains the raw data X. Column B contains the differences X - μ obtained in step 2. Column C contains the squares of the differences obtained in step 3.

The preceding computational procedure reveals several things. First, the square root of the variance gives the standard deviation; and vice versa, squaring the standard deviation gives the variance. Second, the variance is actually the average of the square of the distance that each value is from the mean. Therefore, if the values are near the mean, the variance will be small. In contrast, if the values are far from the mean, the variance will be large.

You might wonder why the squared distances are used instead of the actual distances. One reason is that the sum of the distances will always be zero. To verify this result for a specific case, add the values in column B of the table above. When each value is squared, the negative signs are eliminated.

Finally, why is it necessary to take the square root? The reason is that since the distances were squared, the units of the resultant numbers are the squares of the units of the original raw data. Finding the square root of the variance puts the standard deviation in the same units as the raw data.

When you are finding the square root, always use its positive or principal value, since the variance and standard deviation of a data set can never be negative.
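The six-step procedure can be followed almost line for line in code. The sketch below is an illustration only (the function name `population_variance_steps` is invented for this example); it uses the brand A values 10, 60, 50, 30, 40, 20 from the paint example and also checks the claim that the deviations from the mean sum to zero.

```python
import math

def population_variance_steps(values):
    """Follow steps 1-5: mean, deviations, squares, sum, divide by N."""
    n = len(values)
    mean = sum(values) / n                       # step 1
    deviations = [x - mean for x in values]      # step 2
    squares = [d ** 2 for d in deviations]       # step 3
    sum_of_squares = sum(squares)                # step 4
    variance = sum_of_squares / n                # step 5
    return mean, deviations, sum_of_squares, variance

brand_a = [10, 60, 50, 30, 40, 20]
mean, deviations, sum_of_squares, variance = population_variance_steps(brand_a)
std_dev = math.sqrt(variance)                    # step 6

print(mean)                 # 35.0
print(sum(deviations))      # 0.0, the distances always cancel out
print(sum_of_squares)       # 1750.0
print(round(variance, 1))   # 291.7
print(round(std_dev, 1))    # 17.1
```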

The variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is \( \sigma^2 \) (\( \sigma \) is the Greek lowercase letter sigma). The formula for the population variance is

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

where
X = individual value
μ = population mean
N = population size

The standard deviation is the square root of the variance. The symbol for the population standard deviation is \( \sigma \). The corresponding formula for the population standard deviation is

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

EXAMPLE Comparison of Outdoor Paint
Find the variance and standard deviation for brand B from the paint data in the first example. The months were

35, 45, 30, 35, 40, 25
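As a cross-check on the brand B exercise, the population formulas correspond to `pvariance` and `pstdev` in Python's standard `statistics` module. This is a sketch for verification, not the hand procedure the text asks for.

```python
import statistics

brand_b = [35, 45, 30, 35, 40, 25]   # months, from the example above

variance_b = statistics.pvariance(brand_b)   # divides by N (population)
std_dev_b = statistics.pstdev(brand_b)       # square root of the above

print(round(variance_b, 1))   # 41.7
print(round(std_dev_b, 1))    # 6.5
```

The deviations from the mean of 35 are 0, 10, -5, 0, 5, -10, so the sum of squares is 250 and the variance is 250/6, which matches the printed value.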

Since the standard deviation of brand A is 17.1 and the standard deviation of brand B is 6.5, the data are more variable for brand A. In summary, when the means are equal, the larger the variance or standard deviation is, the more variable the data are.

Sample Variance and Standard Deviation
When computing the variance for a sample, one might expect the following expression to be used:

\[ \frac{\sum (X - \bar{X})^2}{n} \]

where \( \bar{X} \) is the sample mean and n is the sample size. This formula is not usually used, however, since in most cases the purpose of calculating the statistic is to estimate the corresponding parameter. For example, the sample mean \( \bar{X} \) is used to estimate the population mean \( \mu \). The expression

\[ \frac{\sum (X - \bar{X})^2}{n} \]

does not give the best estimate of the population variance because when the population is large and the sample is small (usually less than 30), the variance computed by this formula usually underestimates the population variance. Therefore, instead of dividing by n, find the variance of the sample by dividing by n - 1, giving a slightly larger value and an unbiased estimate of the population variance.

The formula for the sample variance, denoted by \( s^2 \), is

\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

where
\( \bar{X} \) = sample mean
n = sample size

To find the standard deviation of a sample, you must take the square root of the sample variance, which was found by using the preceding formula.

Formula for the Sample Standard Deviation
The standard deviation of a sample (denoted by s) is

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

where
X = individual value
\( \bar{X} \) = sample mean
n = sample size

Shortcut formulas for computing the variance and standard deviation are presented next. These formulas are mathematically equivalent to the preceding formulas and do not involve using the mean. They save time when repeated subtracting and squaring occur in the original formulas. They are also more accurate when the mean has been rounded.

Shortcut or Computational Formulas for \( s^2 \) and s
The shortcut formulas for computing the variance and standard deviation for data obtained from samples are as follows.

Variance
\[ s^2 = \frac{n \left( \sum X^2 \right) - \left( \sum X \right)^2}{n(n - 1)} \]

Standard deviation
\[ s = \sqrt{\frac{n \left( \sum X^2 \right) - \left( \sum X \right)^2}{n(n - 1)}} \]

EXAMPLE European Auto Sales
Find the sample variance and standard deviation for the amount of European auto sales for a sample of 6 years shown. The data are in millions of dollars.

11.2, 11.9, 12.0, 12.8, 13.4, 14.3

Note that \( \sum X^2 \) is not the same as \( (\sum X)^2 \). The notation \( \sum X^2 \) means to square the values first, then sum; \( (\sum X)^2 \) means to sum the values first, then square the sum.

Variance and Standard Deviation for a Frequency Distribution
The procedure for finding the variance and standard deviation for frequency distribution data is similar to that for finding the mean for frequency distribution data, and it uses the midpoints of each class.
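A short sketch of the shortcut formula applied to the European auto sales data may help. The function name `sample_variance_shortcut` is made up for this illustration; the comparison against `statistics.variance` (which divides by n - 1) is included only to show that the shortcut and the definition agree.

```python
import math
import statistics

def sample_variance_shortcut(values):
    """s^2 = [n * sum(X^2) - (sum X)^2] / [n * (n - 1)]."""
    n = len(values)
    sum_x = sum(values)                      # sum first; squared below
    sum_x_sq = sum(x * x for x in values)    # square first, then sum
    return (n * sum_x_sq - sum_x ** 2) / (n * (n - 1))

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]   # millions of dollars

s2 = sample_variance_shortcut(sales)
s = math.sqrt(s2)

print(round(s2, 2))   # 1.28
print(round(s, 2))    # 1.13

# Same value as the definitional formula, up to floating-point error.
print(math.isclose(s2, statistics.variance(sales), abs_tol=1e-9))   # True
```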

EXAMPLE Miles Run per Week
Find the variance and the standard deviation for the frequency distribution of the data in Example 2-7. The data represent the number of miles that 20 runners ran during one week.

Class    Frequency

Solution

Step 1 Make a table as shown, and find the midpoint of each class.

A        B              C               D            E
Class    Frequency f    Midpoint X_m    f · X_m      f · X_m^2

Be sure to use the number found in the sum of column B (i.e., the sum of the frequencies) for n. Do not use the number of classes.

The steps for finding the variance and standard deviation for grouped data are summarized in this Procedure Table.

Procedure Table
Finding the Sample Variance and Standard Deviation for Grouped Data

Step 1 Make a table as shown, and find the midpoint of each class.

A        B              C               D            E
Class    Frequency f    Midpoint X_m    f · X_m      f · X_m^2

Step 2 Multiply the frequency by the midpoint for each class, and place the products in column D.

Step 3 Multiply the frequency by the square of the midpoint, and place the products in column E.

Step 4 Find the sums of columns B, D, and E. (The sum of column B is n. The sum of column D is \( \sum f \cdot X_m \). The sum of column E is \( \sum f \cdot X_m^2 \).)

Step 5 Substitute in the formula and solve to get the variance.

\[ s^2 = \frac{n \left( \sum f \cdot X_m^2 \right) - \left( \sum f \cdot X_m \right)^2}{n(n - 1)} \]

Step 6 Take the square root to get the standard deviation.

The three measures of variation are summarized below.

Summary of Measures of Variation

Measure               Definition                                                                 Symbol(s)
Range                 Distance between highest value and lowest value                            R
Variance              Average of the squares of the distance that each value is from the mean    \( \sigma^2 \), \( s^2 \)
Standard deviation    Square root of the variance                                                \( \sigma \), s
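The Procedure Table maps directly onto a small function. Because the class frequencies for the runners example are not reproduced here, the sketch below uses a small hypothetical frequency distribution (three classes, 10 observations) purely to exercise the steps; the function name and the data are assumptions, not values from the text.

```python
import math

def grouped_sample_variance(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency)."""
    n = sum(f for _, _, f in classes)          # sum of column B, not the class count
    sum_fx = 0.0                               # sum of column D
    sum_fx2 = 0.0                              # sum of column E
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2         # column C
        sum_fx += f * midpoint
        sum_fx2 += f * midpoint ** 2
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

# Hypothetical data: class boundaries and frequencies, for illustration only.
hypothetical_classes = [(0.5, 5.5, 2), (5.5, 10.5, 5), (10.5, 15.5, 3)]

s2_grouped = grouped_sample_variance(hypothetical_classes)
s_grouped = math.sqrt(s2_grouped)

print(round(s2_grouped, 1))   # 13.6
print(round(s_grouped, 1))    # 3.7
```

For this made-up table the midpoints are 3, 8 and 13, the column sums are n = 10, 85 and 845, and the formula gives (10 · 845 - 85^2) / (10 · 9) = 1225/90.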

Uses of the Variance and Standard Deviation
1. As previously stated, variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two (or more) data sets to determine which is more (most) variable.
2. The measures of variance and standard deviation are used to determine the consistency of a variable. For example, in the manufacture of fittings, such as nuts and bolts, the variation in the diameters must be small, or the parts will not fit together.
3. The variance and standard deviation are used to determine the number of data values that fall within a specified interval in a distribution. For example, Chebyshev's theorem (explained later) shows that, for any distribution, at least 75% of the data values will fall within 2 standard deviations of the mean.
4. Finally, the variance and standard deviation are used quite often in inferential statistics. These uses will be shown in later chapters of this textbook.

Coefficient of Variation
Whenever two samples have the same units of measure, the variance and standard deviation for each can be compared directly. For example, suppose an automobile dealer wanted to compare the standard deviation of miles driven for the cars she received as trade-ins on new cars. She found that for a specific year, the standard deviation for Buicks was 422 miles and the standard deviation for Cadillacs was 350 miles. She could say that the variation in mileage was greater in the Buicks. But what if a manager wanted to compare the standard deviations of two different variables, such as the number of sales per salesperson over a 3-month period and the commissions made by these salespeople?

A statistic that allows you to compare standard deviations when the units are different, as in this example, is called the coefficient of variation.

The coefficient of variation, denoted by CVar, is the standard deviation divided by the mean. The result is expressed as a percentage.

For samples,
\[ \text{CVar} = \frac{s}{\bar{X}} \cdot 100\% \]

For populations,
\[ \text{CVar} = \frac{\sigma}{\mu} \cdot 100\% \]

EXAMPLE Sales of Automobiles
The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations of the two.

For sales, CVar = 5/87 · 100% = 5.7%. For commissions, CVar = 773/5225 · 100% = 14.8%.

Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.
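The comparison in the sales example can be reproduced with a one-line function. The name `coefficient_of_variation` is chosen here for illustration; the inputs are the means and standard deviations quoted in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Return CVar as a percentage: (standard deviation / mean) * 100."""
    return std_dev / mean * 100

cvar_sales = coefficient_of_variation(5, 87)            # number of cars sold
cvar_commissions = coefficient_of_variation(773, 5225)  # dollars

print(round(cvar_sales, 1))         # 5.7
print(round(cvar_commissions, 1))   # 14.8
print(cvar_commissions > cvar_sales)   # True: commissions are more variable
```

Because the units cancel in the ratio, the two percentages can be compared even though one variable is a count and the other is in dollars.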

Range Rule of Thumb
The range can be used to approximate the standard deviation. The approximation is called the range rule of thumb.

The Range Rule of Thumb
A rough estimate of the standard deviation is

\[ s \approx \frac{\text{range}}{4} \]

In other words, if the range is divided by 4, an approximate value for the standard deviation is obtained. For example, the standard deviation for the data set 5, 8, 8, 9, 10, 12, and 13 is 2.7, and the range is 13 - 5 = 8. The range rule of thumb gives \( s \approx 8/4 = 2 \). The range rule of thumb in this case underestimates the standard deviation somewhat; however, it is in the ballpark.

A note of caution should be mentioned here. The range rule of thumb is only an approximation and should be used when the distribution of data values is unimodal and roughly symmetric.

The range rule of thumb can be used to estimate the largest and smallest "usual" data values of a data set. The smallest data value will be approximately 2 standard deviations below the mean, and the largest data value will be approximately 2 standard deviations above the mean of the data set. The mean for the previous data set is 9.3; hence,

\[ \text{minimum usual data value} = \bar{X} - 2s = 9.3 - 2(2.7) = 3.9 \]
\[ \text{maximum usual data value} = \bar{X} + 2s = 9.3 + 2(2.7) = 14.7 \]

Notice that the smallest data value was 5, and the largest data value was 13. Again, these are rough approximations. For many data sets, almost all data values will fall within 2 standard deviations of the mean. Better approximations can be obtained by using Chebyshev's theorem and the empirical rule. These are explained next.

Chebyshev's Theorem
As stated previously, the variance and standard deviation of a variable can be used to determine the spread, or dispersion, of a variable. That is, the larger the variance or standard deviation, the more the data values are dispersed. For example, if two variables measured in the same units have the same mean, say, 70, and the first variable has a standard deviation of 1.5 while the second variable has a standard deviation of 10, then the data for the second variable will be more spread out than the data for the first variable. Chebyshev's theorem, developed by the Russian mathematician Chebyshev, specifies the proportions of the spread in terms of the standard deviation.

Chebyshev's theorem
The proportion of values from a data set that will fall within k standard deviations of the mean will be at least \( 1 - 1/k^2 \), where k is a number greater than 1 (k is not necessarily an integer).

This theorem states that at least three-fourths, or 75%, of the data values will fall within 2 standard deviations of the mean of the data set. This result is found by substituting k = 2 in the expression.

\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} \text{ or } 75\% \]

For the example in which variable 1 has a mean of 70 and a standard deviation of 1.5, at least three-fourths, or 75%, of the data values fall between 67 and 73. These values are found by adding 2 standard deviations to the mean and subtracting 2 standard deviations from the mean, as shown:

\[ 70 + 2(1.5) = 70 + 3 = 73 \quad \text{and} \quad 70 - 2(1.5) = 70 - 3 = 67 \]

For variable 2, at least three-fourths, or 75%, of the data values fall between 50 and 90. Again, these values are found by adding and subtracting, respectively, 2 standard deviations to and from the mean.

\[ 70 + 2(10) = 70 + 20 = 90 \quad \text{and} \quad 70 - 2(10) = 70 - 20 = 50 \]

Furthermore, the theorem states that at least eight-ninths, or 88.89%, of the data values will fall within 3 standard deviations of the mean. This result is found by letting k = 3 and substituting in the expression.

\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \text{ or } 88.89\% \]

For variable 1, at least eight-ninths, or 88.89%, of the data values fall between 65.5 and 74.5, since

\[ 70 + 3(1.5) = 70 + 4.5 = 74.5 \quad \text{and} \quad 70 - 3(1.5) = 70 - 4.5 = 65.5 \]

For variable 2, at least eight-ninths, or 88.89%, of the data values fall between 40 and 100.

[Figure: Chebyshev's theorem. At least 75% of the values lie between \( \bar{X} - 2s \) and \( \bar{X} + 2s \); at least 88.89% lie between \( \bar{X} - 3s \) and \( \bar{X} + 3s \).]

This theorem can be applied to any distribution regardless of its shape. The next two examples illustrate the application of Chebyshev's theorem.

EXAMPLE Prices of Homes
The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000. Find the price range for which at least 75% of the houses will sell.

Chebyshev's theorem can be used to approximate the minimum percentage of data values that will fall between any two given values. The procedure is shown in the next example.

EXAMPLE Travel Allowances
A survey of local companies found that the mean amount of travel allowance for executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev's theorem, find the minimum percentage of the data values that will fall between $0.20 and $0.30.
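Both examples can be worked with two small helpers, sketched below under the assumption that the interval of interest is symmetric about the mean. The function names are invented for this illustration. For the home prices, 75% corresponds to k = 2; for the travel allowances, k is the distance from the mean to either endpoint measured in standard deviations.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

def interval_around_mean(mean, std_dev, k):
    """Return (mean - k * std_dev, mean + k * std_dev)."""
    return mean - k * std_dev, mean + k * std_dev

# Prices of homes: at least 75% corresponds to k = 2.
print(chebyshev_min_proportion(2))               # 0.75
print(interval_around_mean(50_000, 10_000, 2))   # (30000, 70000)

# Travel allowances: $0.20 to $0.30 is 0.05 either side of the mean 0.25.
k_travel = (0.30 - 0.25) / 0.02                  # about 2.5 standard deviations
travel_bound = chebyshev_min_proportion(k_travel)
print(round(travel_bound, 2))                    # 0.84, i.e. at least 84%
```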

The Empirical (Normal) Rule
Chebyshev's theorem applies to any distribution regardless of its shape. However, when a distribution is bell-shaped (or what is called normal), the following statements, which make up the empirical rule, are true.

Approximately 68% of the data values will fall within 1 standard deviation of the mean.
Approximately 95% of the data values will fall within 2 standard deviations of the mean.
Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.

For example, suppose that the scores on a national achievement exam have a mean of 480 and a standard deviation of 90. If these scores are normally distributed, then approximately 68% will fall between 390 and 570 (480 - 90 = 390 and 480 + 90 = 570). Approximately 95% of the scores will fall between 300 and 660 (480 - 2 · 90 = 300 and 480 + 2 · 90 = 660). Approximately 99.7% will fall between 210 and 750 (480 - 3 · 90 = 210 and 480 + 3 · 90 = 750). (The empirical rule is explained in greater detail in Chapter 6.)

[Figure: The empirical rule. 68% of the values lie within \( \bar{X} \pm 1s \), 95% within \( \bar{X} \pm 2s \), and 99.7% within \( \bar{X} \pm 3s \).]

from Elem. Stats., Bluman

Example
The mean of the times it takes a commuter to get to work in Baltimore is 29.7 minutes. Assume the distribution of commuter times is approximately bell-shaped.
(a) If the standard deviation is 6 minutes, within what limits would you expect 68% of the times to fall?
(b) Within what limits would you expect 95% of the times to fall?
(c) Within what limits would you expect 99.7% of the times to fall?
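One way to work the commuter example is to apply the empirical rule for k = 1, 2, 3 in a loop. This is a sketch that assumes, as the example states, a roughly bell-shaped distribution; the function name `empirical_rule_limits` is made up here.

```python
def empirical_rule_limits(mean, std_dev):
    """Map each empirical-rule percentage to its (lower, upper) limits."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (round(mean - k * std_dev, 1), round(mean + k * std_dev, 1))
        for k, label in coverage.items()
    }

limits = empirical_rule_limits(29.7, 6)   # commuter times in minutes

for label, (low, high) in limits.items():
    print(f"about {label} of times fall between {low} and {high} minutes")
```

With a mean of 29.7 and a standard deviation of 6, the three intervals are 23.7 to 35.7, 17.7 to 41.7, and 11.7 to 47.7 minutes.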

3.4 Measures of Position
A measure of position determines the position of a single value in relation to other values in a sample or a population data set. Measures of position are quartiles, percentiles, and z scores.

Quartiles and Interquartile Range
Quartiles are the summary measures that divide a sorted data set into four equal parts. Three measures will divide any data set into four equal parts. These three measures are the first quartile (denoted by \( Q_1 \)), the second quartile (denoted by \( Q_2 \)), and the third quartile (denoted by \( Q_3 \)). The data should be ranked in increasing order before the quartiles are determined. The quartiles are defined as follows.

Definition
Quartiles: Quartiles are three summary measures that divide a ranked data set into four equal parts. The second quartile is the same as the median of a data set. The first quartile is the value of the middle term among the observations that are less than the median, and the third quartile is the value of the middle term among the observations that are greater than the median.

Figure 3.11 describes the positions of the three quartiles.

[Figure 3.11 Quartiles. \( Q_1 \), \( Q_2 \), and \( Q_3 \) split the ranked data into four portions; each portion contains 25% of the observations of a data set arranged in increasing order.]

Approximately 25% of the values in a ranked data set are less than \( Q_1 \) and about 75% are greater than \( Q_1 \). The second quartile, \( Q_2 \), divides a ranked data set into two equal parts; hence, the second quartile and the median are the same. Approximately 75% of the data values are less than \( Q_3 \) and about 25% are greater than \( Q_3 \).

The difference between the third quartile and the first quartile for a data set is called the interquartile range (IQR).

Calculating Interquartile Range
The difference between the third and the first quartiles gives the interquartile range; that is,

\[ \text{IQR} = \text{Interquartile range} = Q_3 - Q_1 \]

EXAMPLE 3-20 Finding quartiles and the interquartile range
The table below gives the 2008 profits (rounded to billions of dollars) of 12 companies selected from all over the world.

Company           Profits (billions of dollars)
Merck & Co         8
IBM               12
Unilever           7
Microsoft         17
Petrobras         14
Exxon Mobil       45
Lukoil            10
AT&T              13
Nestlé            17
Vodafone          13
Deutsche Bank      9
China Mobile      11

(a) Find the values of the three quartiles. Where does the 2008 profit of Merck & Co fall in relation to these quartiles?
(b) Find the interquartile range.

Solution
(a) First we rank the data in increasing order:

7  8  9  10  11  12  13  13  14  17  17  45

The value of \( Q_2 \), which is also the median, is given by the value of the middle term in the ranked data set. For the data of this example, this value is the average of the sixth and seventh terms. Consequently, \( Q_2 \) is (12 + 13)/2 = $12.5 billion. The value of \( Q_1 \) is given by the value of the middle term of the six values that fall below the median (or \( Q_2 \)). Thus, it is obtained by taking the average of the third and fourth terms. So, \( Q_1 \) is (9 + 10)/2 = $9.5 billion. The value of \( Q_3 \) is given by the value of the middle term of the six values that fall above the median. For the data of this example, \( Q_3 \) is obtained by taking the average of the ninth and tenth terms, and it is (14 + 17)/2 = $15.5 billion.

The value of \( Q_1 \) = $9.5 billion indicates that 25% of the companies in this sample had 2008 profits less than $9.5 billion and 75% of the companies had 2008 profits higher than $9.5 billion. Similarly, we can state that half of these companies had 2008 profits less than $12.5 billion and the other half had profits greater than $12.5 billion, since the second quartile is $12.5 billion. The value of \( Q_3 \) = $15.5 billion indicates that 75% of the companies had 2008 profits less than $15.5 billion and 25% had profits greater than this value.

By looking at the position of $8 billion, which is the 2008 profit of Merck & Co, we can state that this value lies in the bottom 25% of the 2008 profits.

(b) The interquartile range is given by the difference between the values of the third and the first quartiles. Thus,

IQR = Interquartile range = \( Q_3 - Q_1 \) = 15.5 - 9.5 = $6 billion

EXAMPLE 3-21 Finding quartiles and the interquartile range
The following are the ages (in years) of nine employees of an insurance company:
(a) Find the values of the three quartiles. Where does the age of 28 years fall in relation to the ages of these employees?
(b) Find the interquartile range.
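The quartile method used in this solution (median of the lower half, median of the whole set, median of the upper half) can be sketched as follows. The helper names are invented for this illustration, and the convention of leaving the median term out of both halves when the number of values is odd is an assumption based on the definition above; statistical software often uses other interpolation rules and may give slightly different quartiles.

```python
def median_of_sorted(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                               # values below the median
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median_of_sorted(lower), median_of_sorted(data), median_of_sorted(upper)

# 2008 profits in billions of dollars, from Example 3-20.
profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]

q1, q2, q3 = quartiles(profits)
iqr = q3 - q1
print(q1, q2, q3)   # 9.5 12.5 15.5
print(iqr)          # 6.0
```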

Percentiles and Percentile Rank
Percentiles are the summary measures that divide a ranked data set into 100 equal parts. Each (ranked) data set has 99 percentiles that divide it into 100 equal parts. The data should be ranked in increasing order to compute percentiles. The kth percentile is denoted by \( P_k \), where k is an integer in the range 1 to 99. For instance, the 25th percentile is denoted by \( P_{25} \). Figure 3.12 shows the positions of the 99 percentiles.

[Figure 3.12 Percentiles. \( P_1, P_2, P_3, \ldots, P_{97}, P_{98}, P_{99} \) split the ranked data into 100 portions; each portion contains 1% of the observations of a data set arranged in increasing order.]

Thus, the kth percentile, \( P_k \), can be defined as a value in a data set such that about k% of the measurements are smaller than the value of \( P_k \) and about (100 - k)% of the measurements are greater than the value of \( P_k \).

Calculating Percentiles
The (approximate) value of the kth percentile, denoted by \( P_k \), is

\[ P_k = \text{Value of the } \left( \frac{kn}{100} \right)\text{th term in a ranked data set} \]

where k denotes the number of the percentile and n represents the sample size.

EXAMPLE 3-22 Finding the percentile for a data set
Refer to the data on 2008 profits for 12 companies given in Example 3-20. Find the value of the 42nd percentile. Give a brief interpretation of the 42nd percentile.

The data arranged in increasing order are as follows:

7  8  9  10  11  12  13  13  14  17  17  45
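The percentile formula gives a position, kn/100, that is usually not a whole number. The sketch below assumes the position is rounded to the nearest term, which is one common reading of "approximate value"; that rounding choice and the function name are assumptions, not something the formula itself specifies.

```python
def approximate_percentile(values, k):
    """Value of the (k * n / 100)th term of the ranked data, 1 <= k <= 99."""
    data = sorted(values)
    position = k * len(data) / 100        # 1-based position, may be fractional
    term = max(1, round(position))        # assumption: round to the nearest term
    return data[term - 1]                 # convert to a 0-based index

# 2008 profits in billions of dollars, from Example 3-20.
profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]

position_42 = 42 * len(profits) / 100
p42 = approximate_percentile(profits, 42)
print(position_42)   # 5.04
print(p42)           # 11
```

Under this assumption the 42nd percentile is the fifth ranked value, $11 billion, meaning roughly 42% of the companies in the sample had profits below that amount.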

We can also calculate the percentile rank for a particular value \( x_i \) of a data set by using the formula given below. The percentile rank of \( x_i \) gives the percentage of values in the data set that are less than \( x_i \).

Finding Percentile Rank of a Value

\[ \text{Percentile rank of } x_i = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100 \]

EXAMPLE 3-23 Finding the percentile rank for a data value
Refer to the data on 2008 profits for 12 companies given in Example 3-20. Find the percentile rank for the $14 billion profit of Petrobras. Give a brief interpretation of this percentile rank.

The data arranged in increasing order are as follows:

7  8  9  10  11  12  13  13  14  17  17  45
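The percentile rank formula is a simple count, as this sketch shows. The function name `percentile_rank` is chosen for the illustration; the data are the 12 company profits from Example 3-20.

```python
def percentile_rank(values, x):
    """Percentage of values in the data set that are strictly less than x."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100

# 2008 profits in billions of dollars, from Example 3-20.
profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]

rank_petrobras = percentile_rank(profits, 14)
print(round(rank_petrobras, 2))   # 66.67
```

Eight of the 12 profits are below $14 billion, so about 66.67% of the companies in this sample had 2008 profits lower than that of Petrobras.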

Box-and-Whisker Plot
A box-and-whisker plot gives a graphic presentation of data using five measures: the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences. (The inner fences are explained in Example 3-24 below.) A box-and-whisker plot can help us visualize the center, the spread, and the skewness of a data set. It also helps detect outliers. We can compare different distributions by making box-and-whisker plots for each of them.

Definition
Box-and-Whisker Plot: A plot that shows the center, spread, and skewness of a data set. It is constructed by drawing a box and two whiskers that use the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences.

EXAMPLE 3-24 Constructing a box-and-whisker plot
The following data are the incomes (in thousands of dollars) for a sample of 12 households. Construct a box-and-whisker plot for these data.

Step 1. First, rank the data in increasing order and calculate the values of the median, the first quartile, the third quartile, and the interquartile range.

Step 2. Find the points that are 1.5 × IQR below \( Q_1 \) and 1.5 × IQR above \( Q_3 \). These two points are called the lower and the upper inner fences, respectively.

Step 3. Determine the smallest and the largest values in the given data set within the two inner fences. These two values for our example are 69 and 112.

Step 4. Draw a horizontal line and mark the income levels on it such that all the values in the given data set are covered. Above the horizontal line, draw a box with its left side at the position of the first quartile and the right side at the position of the third quartile. Inside the box, draw a vertical line at the position of the median. The result of this step is shown in Figure 3.13.

[Figure 3.13: box drawn above the income axis, marking the first quartile, the median, and the third quartile.]

Step 5. By drawing two lines, join the points of the smallest and the largest values within the two inner fences to the box. These values are 69 and 112 in this example, as listed in Step 3. The two lines that join the box to these two values are called whiskers. A value that falls outside the two inner fences is shown by marking an asterisk and is called an outlier. This completes the box-and-whisker plot, as shown in Figure 3.14.

[Figure 3.14: box-and-whisker plot over the income axis, marking the smallest value within the two inner fences, the first quartile, the median, the third quartile, the largest value within the two inner fences, and an outlier.]

In Figure 3.14, about 50% of the data values fall within the box, about 25% of the values fall on the left side of the box, and about 25% fall on the right side of the box. Also, 50% of the values fall on the left side of the median and 50% lie on the right side of the median. The data of this example are skewed to the right because the lower 50% of the values are spread over a smaller range than the upper 50% of the values.

The observations that fall outside the two inner fences are called outliers. These outliers can be classified into two kinds: mild and extreme outliers. To do so, we define two outer fences: a lower outer fence at 3.0 × IQR below the first quartile and an upper outer fence at 3.0 × IQR above the third quartile. If an observation is outside either of the two inner fences but within the two outer fences, it is called a mild outlier. An observation that is outside either of the two outer fences is called an extreme outlier. For the previous example, the outer fences are at 5 and 173. Because 144 is outside the upper inner fence but inside the upper outer fence, it is a mild outlier.

For a symmetric data set, the line representing the median will be in the middle of the box and the spread of the values will be over almost the same range on both sides of the box.
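The fence rules can be sketched as two small functions. The household incomes are not reproduced in this section, so the quartiles used below, Q1 = 77 and Q3 = 101, are inferred from the stated outer fences of 5 and 173 (the fences are 168 apart, which equals 7 × IQR, so IQR = 24); treat them as an assumption. The function names are likewise invented for this illustration.

```python
def fences(q1, q3):
    """Return ((lower_inner, upper_inner), (lower_outer, upper_outer))."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return inner, outer

def classify(value, q1, q3):
    """Label a value as an extreme outlier, a mild outlier, or neither."""
    inner, outer = fences(q1, q3)
    if value < outer[0] or value > outer[1]:
        return "extreme outlier"
    if value < inner[0] or value > inner[1]:
        return "mild outlier"
    return "not an outlier"

# Assumed quartiles, inferred from the outer fences 5 and 173 quoted above.
Q1, Q3 = 77, 101

print(fences(Q1, Q3))          # ((41.0, 137.0), (5.0, 173.0))
print(classify(144, Q1, Q3))   # mild outlier
print(classify(112, Q1, Q3))   # not an outlier
print(classify(69, Q1, Q3))    # not an outlier
```

The values 69 and 112 fall inside the inner fences, consistent with their role as whisker endpoints, while 144 lies between the upper inner and upper outer fences.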

Z Scores

Key Concept
This section introduces measures that can be used to compare values from different data sets, or to compare values within the same data set. The most important concept in this section is the z score, so we should understand the role of z scores (for comparing values from different data sets) and we should develop the ability to convert data values to z scores.

z Scores
A z score (or standardized value) is found by converting a value to a standardized scale, as given in the following definition. We will use z scores extensively in Chapter 6 and later chapters, so they are extremely important.

Definition
A z score (or standardized value) is the number of standard deviations that a given value x is above or below the mean. It is found using the following expressions:

Sample
\[ z = \frac{x - \bar{x}}{s} \]

Population
\[ z = \frac{x - \mu}{\sigma} \]

(Round z to two decimal places.)

The following example illustrates how z scores can be used to compare values, even though they might come from different populations.

EXAMPLE Comparing Heights
With a height of 75 in., Lyndon Johnson was the tallest president of the past century. With a height of 85 in., Shaquille O'Neal is the tallest player on the Miami Heat basketball team. Who is relatively taller: Lyndon Johnson among the presidents of the past century, or Shaquille O'Neal among the players on his Miami Heat team? Presidents of the past century have heights with a mean of 71.5 in. and a standard deviation of 2.1 in. Basketball players for the Miami Heat have heights with a mean of 80.0 in. and a standard deviation of 3.3 in.

SOLUTION
The heights of presidents and basketball players are from very different populations, so a comparison requires that we standardize heights by converting them to z scores.

Lyndon Johnson:
\[ z = \frac{x - \mu}{\sigma} = \frac{75 - 71.5}{2.1} = 1.67 \]

Shaquille O'Neal:
\[ z = \frac{x - \mu}{\sigma} = \frac{85 - 80.0}{3.3} = 1.52 \]

INTERPRETATION
Lyndon Johnson's height is 1.67 standard deviations above the mean, and Shaquille O'Neal's height is 1.52 standard deviations above the mean. Lyndon Johnson's height among presidents of the past century is relatively greater than Shaquille O'Neal's height among the Miami Heat basketball players. Shaquille O'Neal is much taller than Lyndon Johnson, but Johnson is relatively taller when compared to colleagues.

z Scores and Unusual Values
In Section 3-3 we used the range rule of thumb to conclude that a value is unusual if it is more than 2 standard deviations away from the mean. It follows that unusual values have z scores less than -2 or greater than +2. (See Figure 3-5.) Using this criterion, Lyndon Johnson is not unusually tall when compared to presidents of the past century, and Shaquille O'Neal is not unusually tall when compared to his teammates, because neither of them has a height with a z score greater than 2.

Ordinary values: \( -2 \le z \text{ score} \le 2 \)
Unusual values: \( z \text{ score} < -2 \) or \( z \text{ score} > 2 \)

[Figure 3-5 Interpreting z Scores. Unusual values are those with z scores less than -2.00 or greater than 2.00; ordinary values lie between.]

While considering Miami Heat basketball players, the shortest player is Damon Jones with a height of 75 in. His z score is -1.52, as shown in the calculation below. (We again use \( \mu = 80.0 \) in. and \( \sigma = 3.3 \) in. for the Miami Heat.)

Damon Jones:
\[ z = \frac{x - \mu}{\sigma} = \frac{75 - 80.0}{3.3} = -1.52 \]

Damon Jones' height illustrates this principle about values that are below the mean: Whenever a value is less than the mean, its corresponding z score is negative.

z scores are measures of position in the sense that they describe the location of a value (in terms of standard deviations) relative to the mean. A z score of 2 indicates that a value is two standard deviations above the mean, and a z score of -3 indicates that a value is three standard deviations below the mean. Quartiles and percentiles are also measures of position, but they are defined differently than z scores and they are useful for comparing values within the same data set or between different sets of data.
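The three height comparisons can be checked with a short sketch. The function names `z_score` and `is_unusual` are made up for this illustration; the means and standard deviations are those given in the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

def is_unusual(z):
    """Apply the criterion: unusual if z < -2 or z > 2."""
    return z < -2 or z > 2

z_johnson = z_score(75, 71.5, 2.1)   # presidents of the past century
z_oneal = z_score(85, 80.0, 3.3)     # Miami Heat players
z_jones = z_score(75, 80.0, 3.3)     # Miami Heat players

print(round(z_johnson, 2))   # 1.67
print(round(z_oneal, 2))     # 1.52
print(round(z_jones, 2))     # -1.52
print(any(is_unusual(z) for z in (z_johnson, z_oneal, z_jones)))   # False
```

Johnson and Jones are both 75 in. tall, yet their z scores have opposite signs because each is measured against a different mean.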
