Measures of average are called measures of central tendency and include the mean, median, mode, and midrange.

CHAPTER 3 Data Description Objectives Suarize data using easures of central tendency, such as the ean, edian, ode, and idrange. Describe data using the easures of variation, such as the range, variance, and standard deviation. Identify the position of a data value in a data set using various easures of position, such as percentiles, deciles, and quartiles. Use the techniques of exploratory data analysis, including boxplots and five-nuber suaries to discover various aspects of data. Introduction Measures of average are called easures of central tendency and include the ean, edian, ode, and idrange. Measures that deterine the spread of the data values are called easures of variation and include the range, variance, and standard deviation. Measures of a specific data value s relative position in coparison with other data values are called easures of position and include percentiles, deciles, and quartiles. Section 3-1 Measures of Central Tendency A statistic is a characteristic or easure obtained by using the data values fro a saple. A paraeter is a characteristic or easure obtained by using all the data values for a specific population. I. Mean and Mode The sybol for a population ean is. The sybol for a saple ean is x (read x bar ). The ean is the su of the values, divided by the total nuber of values. saple ean population ean x is any data value fro the data set n is the total nuber of data (n is called the saple size) Rounding Rule for the Mean: The ean should be rounded to one ore decial place than occurs in the raw data.

Exaple 1: Find the ean of 4, 8, 36 Mode is the value that occurs ost often in a data set. A data set can have ore than one ode or no ode at all. Exaple : Find the ode of.3.4.8.3 4.5 3.1. Exaple 3: Find the ode of 3 4 7 8 11 13 The ode is the only easure of central tendency that can be used in finding the ost typical case when the data are categorical. The procedure for finding the ean for grouped data uses the idpoints of the f x classes. The forula for finding the ean of grouped data is x. n The odal class is the class with the largest frequency. Exaple 4: Thirty autoobiles were tested for fuel efficiency (in iles per gallon). Find the ean fuel efficiency and the odal class for the frequency distribution obtained fro the thirty autoobiles. Class boundaries Frequency 7.5 1.5 3 1.5 17.5 5 17.5.5 15.5 7.5 5 7.5 3.5

II. Median and Midrange The edian is the idpoint in a data set. The sybol for a saple edian is MD 1. Reorder the data fro sall to large. Find the data that represents the iddle position Exaple 1: Find the edian (a) 35, 48, 6, 3, 47 (b) 5.4, 6.8, 7.3, 7.5, 8.1, 6.4 Exaple : Find the edian 3, 5, 3, 6, 13, 11, 8, 19, 1, 6 Midrange is the su of the lowest and highest values in a data set, divided by. Exaple 3: Find the idrange of 17, 16, 15, 13, 17, 1, 10. Exaple 4: The average undergraduate grade point average (GPA) for the top 9 ranked-edical schools are listed below. 3.80 3.86 3.83 3.78 3.75 3.75 3.86 3.70 3.74 Find (a) the ean, (b) the edian, (c) the ode, and (d) the idrange.

III. Weighted Mean Weighted Mean ultiply each value by its corresponding weight and dividing the su of the products by the su of the weights. w1 x1 w x w3 x3... wnx x w w w... w x wx w 1 3 n n where w1, w,..., w n are the weights and x1, x,..., xn are the values. Exaple 1: An instructor gives four 1-hour exas and one final exa, which counts as two 1-hour exas. Find the student s overall average if she received 83, 65, 70, and 7 on the 1- hour exas and 78 on the final exa. Exaple : Grade distributions for a Math 7 class: In class-8%; tests-5%; coputer exa-10%; and final exa-30%. A student had grades of 8, 75, 94, and 78 respectively, In class, tests, coputer exa, and final exa. Find the student s final average. Properties of the Mean (pg 14) Uses all data values. Varies less than the edian or ode Used in coputing other statistics, such as the variance Unique, usually not one of the data values Cannot be used with open-ended classes Affected by extreely high or low values, called outliers

Properties of the Median (pg 14) Gives the idpoint Used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution. Can be used for an open-ended distribution. Affected less than the ean by extreely high or extreely low values. Properties of the Mode (pg 14) Used when the ost typical case is desired Easiest average to copute Can be used with noinal data Not always unique or ay not exist Properties of the Midrange (pg 14) Easy to copute. Gives the idpoint. Affected by extreely high or low values in a data set Types of Distributions Figure 3-1 Section 3- Measures of Variation I. Range, saple variance, and saple standard deviation Range is the highest value inus the lowest value. R = highest value lowest value Exaple 1: Find the range of 3, 78, 54, 65, 89. Text: Exaple 3-18/19: Outdoor Paint

The easures of variance and standard deviation are used to deterine the consistency of a variable. Variance is the average of the square of the distance that each value is fro the ean. Measure the dispersion away fro the ean e.g. 5, 8, 11 What is the ean? Logically su up differences then divide it by 3 To avoid the cancellation, take the squared deviations. Su of the squares = Average of the su of the squares = (Variance) Standard deviation (Take the square root of the variance) = Forulas for calculating variance and standard deviation Definition Forulas Variance of a saple ( x x) s n 1 Standard Deviation of a saple s s ( x x) n 1 variance Coputational Forulas Variance of a saple s ( n 1) x ( x) / n

Exaple : Use the definition forula to find the variance and standard deviation of 5, 8, 11 x = x x x ( x x) ( x x) = Saple variance: Saple standard deviation: Exaple 3: Use the definition forula to find the standard deviation of 5.8, 4.6, 5.3, 3.8, 6.0 x = x x x ( x x) ( x x) Saple variance: Saple standard deviation:

Exaple 4: Use the coputational forula to find the standard deviation of 5.8, 4.6, 5.3, 3.8, 6.0 x x Note: Both the ean and standard deviation are sensitive to extree observations called the outliers. The standard deviation is used to describe variability when the ean is used as a easure of central tendency. II. Variance and standard deviation for grouped data The forula is siilar to the coputational forula of s Exaple 1: f x f x / n n 1 s for a data set These data represent the net worth (in illions of dollars) of 50 businesses in a large city. Find the variance and standard deviation. Class Liits Frequency 10-0 5 1 31 10 3 4 3 43 53 7 54 64 18 65 75 7

III. Coefficient of variation The coefficient of variation is a easure of relative variability that expresses standard deviation as a percentage of the ean. s CVar 100% x When coparing the standard deviations of two different variables, the coefficient of variations are used. Exaple 1: The average score on an English final exaination was 85, with a standard deviation of 5; the average score on a history final exa was 110, with a standard deviation of 8. Copare the variations of the two. IV. Range Rule of Thub The range can be used to approxiate the standard deviation. This approxiation is called the range rule of thub. The Range Rule of Thub: Exaple: Using the range rule of thub, approxiate the standard deviation for the data set 5, 8, 8, 9, 10, 1, and 13. Note: The range rule of thub is only an approxiation and should be used when the distribution of data is uniodal and roughly syetric. Measures of Variation: Range Rule of Thub Use to approxiate the lowest value and to approxiate the highest value in a data set.

Exaple: X 10, Range 1 V. Chebyshev s Theore and Epirical Rule Chebyshev s Theore (Any distribution shape) The proportion of values fro a data set that will fall with k standard deviation of the ean will be at least 1 1/k, where k is a nuber greater than 1. Epirical Rule ( A bell-shaped distribution) Approxiately 68% of the data values will fall within 1 standard deviation of the ean. Approxiately 95% of the data values will fall within standard deviations of the ean. Approxiately 99.7% of the data values will fall within 3 standard deviations of the ean. Exaple 1: The average U.S. yearly per capita consuption of citrus fruit is 6.8 pounds. Suppose that the distribution of fruit aounts consued is bell-shaped with a standard deviation of equal to 4. pounds. What percentage of Aericans would you expect to consue in the range of 18.4 pounds to 35. pounds of citrus fruit per year?

Exaple : Using Chebyshev s theore, solve these probles for a distribution with a ean of 50 and a standard deviation of 5. At least what percentage of the values will fall between 40 and 60? Exaple 3: A saple of the labor costs per hour to asseble a certain product has a ean of $.60 and a standard deviation of $0.15. Using Chebyshev s theore, find the range in which at least 88.89% of the data will lie. Section 3-3 Measures of Position I. z score A z score represents the nuber of standard deviations that a data value lies above or below the ean. x x z s

Exaple 1: Which of these exa grades has a better relative position? (a) A grade of 56 on a test with x = 48 and s = 5. (b) A grade of 0 on a test with x = 00 and s = 10. Exaple : Huan body teperatures have a ean of 98.0 and a standard deviation of 0.6. An Eergency roo patient is found to have a teperature of 101. Convert 101 to a z score. Consider a data to be extreely unusual if its z score is less than -3.00 or greater than 3.00. Is that teperature unusually high? What does it suggest? II. Percentiles and Quartiles Percentile Forula (Percentile Rank) The percentile corresponding to a given value x is coputed by using the following forula: ( nuber of valuesbelow x) 0.5 Percentile 100% total nuber of values

Exaple 1: Find the percentile rank for each test score in the data set. 5, 15, 1, 16, 0, 1 Forula for finding a value corresponding to a given percentile ( P ) P - is the nuber that separates the botto % of the data fro the top (100- )% of the data. e.g. If your test score represented 90 th percentile eans that 90% of the people who took the test scored lower than you and only 10% scored higher than you. Finding the location of P : Evaluate ( ) n 100 1. If ( ) n is a whole nuber, then location of P is [ ( ) n +0.5]. 100 100 The percentile of P is halfway between the data value in position ( ) n and the data value in the next position. 100. If ( ) n is not a whole nuber, then location of P is the next higher 100 whole nuber. The percentile of P is the data value in this location.

Quartiles are defined as follows: The first Quartile Q 1 = P 5 The second Quartile Q = P 50 The third Quartile Q 3 = P 75 Exaple : The nuber of hoe runs hit by Aerican League hoe run leaders in the year 1959 1998. These ordered data are 3 3 3 3 33 36 36 37 39 39 39 40 40 40 40 41 4 4 43 43 44 44 44 44 45 45 46 46 48 49 49 49 49 50 51 5 56 56 61 Find the following: (a) P 77 (b) P 4 (c) Q 1 (d) Q 3

III. The Interquartile Range The interquartile range, or IQR IQR = Q 3 - Q 1 The interquartile range is not influenced by extree observations. If the edian is used as a easure of central tendency, then the interquartile range should be used to describe variability. Identifying Outliers (extreely high or low data value) Any data value that is saller than Q 1-1.5 x IQR or larger than Q 3 + 1.5x IQR is considered as an outlier. The quick way to find Q 1 and Q 3: Find the edian of the data values that fall below Q is Q 1. Find the edian of the data values that fall above Q is Q 3. Exaple 1: Consider the following ranked data:.09.14.5.37.55.55.56.60.77.77.86.93 1.15 1.34 1.41 1.75.01.3 3.69 3.90 4.50 4.88 7.79 (a) Find the interquartile range (b) Is 7.79 an outlier?

Exaple : Check the following data set for outliers. 145 119 1 118 15 100 Section 3-4 Exploratory Data Analysis I. Boxplot The edian and the interquartile range are used to describe the distribution using a graph called a boxplot. Fro a boxplot, we can detect any skewness in the shape of the distribution and identify any outliers in the data set. Constructing a boxplot Find the 5-nuber suary consisting of the Low, Q 1, Q (edian), Q 3, and High. Construct a scale with values that include the Low and High. Construct a box with two vertical sides called the hinges above Q 1 and Q 3 on the axis. Also construct a vertical line in the box above Q. Finally, connect the Low and High to the hinges using horizontal lines called the whiskers.

Exaple 1: Construct a boxplot for the nuber of calculators sold during a randoly selected week. 8, 1, 3, 5, 9, 15, 3 Exaple : The following ranked data represent the nuber of English-language Sunday newspaper in each of the 50 states. 3 3 4 4 4 4 4 5 6 6 6 7 7 7 8 10 11 11 11 1 1 13 14 14 14 15 15 16 16 16 16 16 16 18 18 19 1 1 3 7 31 35 37 38 39 40 44 6 85 Construct a boxplot for the data.

C Density Exaple 3: For the boxplot given below, (a) identify the axiu value, iniu value, first quartile, edian, third quartile, and interquartile range; (b) coent on the shape of the distribution; (c) identify a suspected outlier. --------------------- --I + I--------------------- * --------------------- --------+---------+---------+---------+---------+-------- 48 60 7 84 96 II. The distribution shape A bell-shaped distribution 14 0.3 13 0. 1 11 0.1 10 0.0 10 11 1 13 14 C

C1 Density C3 Density A slightly skewed to the right distribution 14 0.3 13 0. 1 11 0.1 10 0.0 10 11 1 13 14 C A slightly skewed to the left distribution 15 0.3 14 13 0. 1 0.1 11 10 0.0 10 11 1 13 14 15 C Suary Soe basic ways to suarize data include easures of central tendency, easures of variation or dispersion, and easures of position. The three ost coonly used easures of central tendency are the ean, edian, and ode. The idrange is also used to represent an average. The three ost coonly used easureents of variation are the range, variance, and standard deviation. The ost coon easures of position are percentiles, quartiles, and deciles. Data values are distributed according to Chebyshev s theore and in special cases, the epirical rule.