Measures of average are called measures of central tendency and include the mean, median, mode, and midrange.

CHAPTER 3 Data Description

Objectives

Summarize data using measures of central tendency, such as the mean, median, mode, and midrange.
Describe data using measures of variation, such as the range, variance, and standard deviation.
Identify the position of a data value in a data set using various measures of position, such as percentiles, deciles, and quartiles.
Use the techniques of exploratory data analysis, including boxplots and five-number summaries, to discover various aspects of data.

Introduction

Measures of average are called measures of central tendency and include the mean, median, mode, and midrange. Measures that determine the spread of the data values are called measures of variation and include the range, variance, and standard deviation. Measures of a specific data value's relative position in comparison with other data values are called measures of position and include percentiles, deciles, and quartiles.

Section 3-1 Measures of Central Tendency

A statistic is a characteristic or measure obtained by using the data values from a sample. A parameter is a characteristic or measure obtained by using all the data values for a specific population.

I. Mean and Mode

The symbol for a population mean is $\mu$. The symbol for a sample mean is $\bar{x}$ (read "x bar"). The mean is the sum of the values divided by the total number of values.

Sample mean: $\bar{x} = \frac{\sum x}{n}$

Population mean: $\mu = \frac{\sum x}{N}$

Here x is any data value from the data set and n is the total number of data values in the sample (n is called the sample size); N is the corresponding count for the population.

Rounding Rule for the Mean: The mean should be rounded to one more decimal place than occurs in the raw data.

Example 1: Find the mean of 4, 8, 36.

The mode is the value that occurs most often in a data set. A data set can have more than one mode or no mode at all.

Example 2: Find the mode of .3 .4 .8 .3 4.5 3.1.

Example 3: Find the mode of 3 4 7 8 11 13.

The mode is the only measure of central tendency that can be used in finding the most typical case when the data are categorical.

The procedure for finding the mean for grouped data uses the midpoints of the classes. The formula for finding the mean of grouped data is

$$\bar{x} = \frac{\sum f \cdot x_m}{n}$$

where f is the frequency of a class, $x_m$ is the class midpoint, and n is the total number of values. The modal class is the class with the largest frequency.

Example 4: Thirty automobiles were tested for fuel efficiency (in miles per gallon). Find the mean fuel efficiency and the modal class for the frequency distribution obtained from the thirty automobiles.

Class boundaries    Frequency
7.5 - 12.5          3
12.5 - 17.5         5
17.5 - 22.5         15
22.5 - 27.5         5
27.5 - 32.5         2
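The grouped-data mean and the modal class can be sketched in Python as follows. This is only an illustration: it assumes Example 4's table reads 7.5-12.5, 12.5-17.5, 17.5-22.5, 22.5-27.5, 27.5-32.5 with frequencies 3, 5, 15, 5, 2 (a reading chosen so that the boundaries chain and the frequencies total 30), and the function names `grouped_mean` and `modal_class` are hypothetical.

```python
# Sketch: mean and modal class for grouped data.
# The class boundaries and frequencies below are an ASSUMED reading of Example 4.

def grouped_mean(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency)."""
    n = sum(f for _, _, f in classes)
    # Each class is represented by its midpoint x_m = (lower + upper) / 2.
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total / n


def modal_class(classes):
    """Return the (lower, upper) boundaries of the class with the largest frequency."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return lo, hi


mpg_classes = [
    (7.5, 12.5, 3),
    (12.5, 17.5, 5),
    (17.5, 22.5, 15),
    (22.5, 27.5, 5),
    (27.5, 32.5, 2),
]

mean_mpg = grouped_mean(mpg_classes)  # sum of f * x_m is 590 and n is 30
mode_cls = modal_class(mpg_classes)
print(round(mean_mpg, 1), mode_cls)
```

Using midpoints means the result approximates the mean of the raw data, which a frequency table no longer contains.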

II. Median and Midrange

The median is the midpoint in a data set. The symbol for a sample median is MD.

1. Reorder the data from small to large.
2. Find the data value that represents the middle position.

Example 1: Find the median.
(a) 35, 48, 6, 3, 47
(b) 5.4, 6.8, 7.3, 7.5, 8.1, 6.4

Example 2: Find the median of 3, 5, 3, 6, 13, 11, 8, 19, 1, 6.

The midrange is the sum of the lowest and highest values in a data set, divided by 2.

Example 3: Find the midrange of 17, 16, 15, 13, 17, 1, 10.

Example 4: The average undergraduate grade point averages (GPA) for the top 9 ranked medical schools are listed below.

3.80 3.86 3.83 3.78 3.75 3.75 3.86 3.70 3.74

Find (a) the mean, (b) the median, (c) the mode, and (d) the midrange.
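As a rough check on the four measures, the GPA values from Example 4 can be run through Python's standard `statistics` module. This is a minimal sketch that takes the nine values exactly as listed; the variable names are illustrative only.

```python
import statistics

# GPA values as listed in Example 4.
gpas = [3.80, 3.86, 3.83, 3.78, 3.75, 3.75, 3.86, 3.70, 3.74]

mean_gpa = statistics.mean(gpas)
median_gpa = statistics.median(gpas)        # middle value after sorting
modes_gpa = statistics.multimode(gpas)      # every value tied for most frequent
midrange_gpa = (min(gpas) + max(gpas)) / 2  # (lowest + highest) / 2

# Rounding rule: one more decimal place than the raw data for the mean.
print(round(mean_gpa, 3), median_gpa, sorted(modes_gpa), round(midrange_gpa, 2))
```

`multimode` (Python 3.8 or later) is used instead of `mode` because a data set can have more than one mode, as this one does.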

III. Weighted Mean

To find the weighted mean, multiply each value by its corresponding weight and divide the sum of the products by the sum of the weights.

$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum w x}{\sum w}$$

where $w_1, w_2, \ldots, w_n$ are the weights and $x_1, x_2, \ldots, x_n$ are the values.

Example 1: An instructor gives four 1-hour exams and one final exam, which counts as two 1-hour exams. Find the student's overall average if she received 83, 65, 70, and 7 on the 1-hour exams and 78 on the final exam.

Example 2: Grade distributions for a Math 7 class: in class, 8%; tests, 5%; computer exam, 10%; and final exam, 30%. A student had grades of 8, 75, 94, and 78, respectively, for in class, tests, computer exam, and final exam. Find the student's final average.

Properties of the Mean (pg 14)

Uses all data values.
Varies less than the median or mode.
Used in computing other statistics, such as the variance.
Unique, usually not one of the data values.
Cannot be used with open-ended classes.
Affected by extremely high or low values, called outliers.
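The weighted mean formula translates directly into a few lines of Python. The sketch below uses hypothetical scores (not the ones in the examples) for the situation in Example 1, where the final exam counts as two 1-hour exams; `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Sum of w * x divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# HYPOTHETICAL scores: four 1-hour exams (weight 1 each) and a final (weight 2).
scores = [80, 70, 90, 60, 75]
weights = [1, 1, 1, 1, 2]

overall = weighted_mean(scores, weights)  # (80 + 70 + 90 + 60 + 2 * 75) / 6
print(overall)
```

The weights do not need to sum to 1 or to 100%, because the formula divides by their total.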

Properties of the Median (pg 14)

Gives the midpoint.
Used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution.
Can be used for an open-ended distribution.
Affected less than the mean by extremely high or extremely low values.

Properties of the Mode (pg 14)

Used when the most typical case is desired.
Easiest average to compute.
Can be used with nominal data.
Not always unique, and may not exist.

Properties of the Midrange (pg 14)

Easy to compute.
Gives the midpoint.
Affected by extremely high or low values in a data set.

Types of Distributions

Figure 3-1

Section 3-2 Measures of Variation

I. Range, sample variance, and sample standard deviation

The range is the highest value minus the lowest value.

$R = \text{highest value} - \text{lowest value}$

Example 1: Find the range of 3, 78, 54, 65, 89.

Text: Example 3-18/19: Outdoor Paint

The measures of variance and standard deviation are used to determine the consistency of a variable. Variance is the average of the squares of the distance that each value is from the mean.

Measure the dispersion away from the mean, e.g. 5, 8, 11.

What is the mean?
Logically, sum up the differences from the mean and then divide by 3. The positive and negative differences cancel, so this sum is always 0.
To avoid the cancellation, take the squared deviations.
Sum of the squares =
Average of the sum of the squares = (Variance)
Standard deviation (take the square root of the variance) =

Formulas for calculating variance and standard deviation

Definition formulas

Variance of a sample: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Standard deviation of a sample: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Computational formula

Variance of a sample: $s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$

Example 2: Use the definition formula to find the variance and standard deviation of 5, 8, 11.

$\bar{x}$ =

Columns to complete: $x$, $x - \bar{x}$, $(x - \bar{x})^2$

$\sum (x - \bar{x})^2$ =

Sample variance:
Sample standard deviation:

Example 3: Use the definition formula to find the standard deviation of 5.8, 4.6, 5.3, 3.8, 6.0.

$\bar{x}$ =

Columns to complete: $x$, $x - \bar{x}$, $(x - \bar{x})^2$

Sample variance:
Sample standard deviation:
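The definition formula and the computational formula for the sample variance give the same result, which a short Python sketch can confirm on the data set 5, 8, 11. The function names are illustrative, and both divide by n - 1 as in the sample formulas.

```python
import math


def sample_variance_definition(data):
    """Definition formula: sum of squared deviations from the mean over n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_variance_computational(data):
    """Computational formula: (sum of x^2 - (sum of x)^2 / n) over n - 1."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [5, 8, 11]
var_def = sample_variance_definition(data)
var_comp = sample_variance_computational(data)
sd = math.sqrt(var_def)  # standard deviation is the square root of the variance
print(var_def, var_comp, sd)
```

The computational form avoids computing the mean first, which is convenient by hand; in floating-point code the definition form is usually the numerically safer of the two.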

Example 4: Use the computational formula to find the standard deviation of 5.8, 4.6, 5.3, 3.8, 6.0.

Columns to complete: $x$, $x^2$

Note: Both the mean and the standard deviation are sensitive to extreme observations, called outliers. The standard deviation is used to describe variability when the mean is used as a measure of central tendency.

II. Variance and standard deviation for grouped data

The formula is similar to the computational formula of $s^2$ for a data set:

$$s^2 = \frac{\sum f \cdot x_m^2 - (\sum f \cdot x_m)^2 / n}{n - 1}$$

where f is the frequency of a class, $x_m$ is the class midpoint, and n is the total number of values.

Example 1: These data represent the net worth (in millions of dollars) of 50 businesses in a large city. Find the variance and standard deviation.

Class limits    Frequency
10 - 20         5
21 - 31         10
32 - 42         3
43 - 53         7
54 - 64         18
65 - 75         7
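The grouped-data variance can be sketched the same way as the grouped mean. The code below assumes the net-worth table reads 10-20, 21-31, 32-42, 43-53, 54-64, 65-75 with frequencies 5, 10, 3, 7, 18, 7 (a reading under which the frequencies total 50); that reading and the name `grouped_variance` are assumptions, not something the notes confirm.

```python
import math


def grouped_variance(classes):
    """classes: list of (lower_limit, upper_limit, frequency).

    Applies s^2 = (sum f*x_m^2 - (sum f*x_m)^2 / n) / (n - 1),
    where x_m is the class midpoint.
    """
    n = sum(f for _, _, f in classes)
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
    sum_fx = sum(f * xm for xm, f in mids)
    sum_fx2 = sum(f * xm ** 2 for xm, f in mids)
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


# ASSUMED reading of the net-worth frequency table (50 businesses).
net_worth = [
    (10, 20, 5),
    (21, 31, 10),
    (32, 42, 3),
    (43, 53, 7),
    (54, 64, 18),
    (65, 75, 7),
]

var_grouped = grouped_variance(net_worth)
sd_grouped = math.sqrt(var_grouped)
print(round(var_grouped, 2), round(sd_grouped, 2))
```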

III. Coefficient of variation

The coefficient of variation is a measure of relative variability that expresses the standard deviation as a percentage of the mean.

$$CVar = \frac{s}{\bar{x}} \cdot 100\%$$

When comparing the standard deviations of two different variables, the coefficients of variation are used.

Example 1: The average score on an English final examination was 85, with a standard deviation of 5; the average score on a history final exam was 110, with a standard deviation of 8. Compare the variations of the two.

IV. Range Rule of Thumb

The range can be used to approximate the standard deviation. This approximation is called the range rule of thumb.

The Range Rule of Thumb: $s \approx \frac{\text{range}}{4}$

Example: Using the range rule of thumb, approximate the standard deviation for the data set 5, 8, 8, 9, 10, 1, and 13.

Note: The range rule of thumb is only an approximation and should be used when the distribution of data is unimodal and roughly symmetric.

Measures of Variation: Range Rule of Thumb

Use $\bar{x} - 2s$ to approximate the lowest value and $\bar{x} + 2s$ to approximate the highest value in a data set.
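A small Python sketch of the coefficient of variation, applied to the English and history exam figures from Example 1, together with the range rule of thumb in its usual form (s is roughly the range divided by 4). The range-rule data set is hypothetical, and the function names are illustrative.

```python
def coefficient_of_variation(s, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return s / mean * 100


cv_english = coefficient_of_variation(5, 85)    # about 5.9%
cv_history = coefficient_of_variation(8, 110)   # about 7.3%
more_variable = "history" if cv_history > cv_english else "English"


def range_rule_sd(data):
    """Range rule of thumb: s is approximately range / 4."""
    return (max(data) - min(data)) / 4


# HYPOTHETICAL data set with range 16 - 4 = 12.
approx_sd = range_rule_sd([4, 7, 10, 12, 16])
print(round(cv_english, 1), round(cv_history, 1), more_variable, approx_sd)
```

Because the coefficient of variation is unit-free, it can compare variables measured on different scales, which the raw standard deviations cannot.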

Example: $\bar{X} = 10$, Range = 1

V. Chebyshev's Theorem and Empirical Rule

Chebyshev's Theorem (any distribution shape)

The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - 1/k^2$, where k is a number greater than 1.

Empirical Rule (a bell-shaped distribution)

Approximately 68% of the data values will fall within 1 standard deviation of the mean.
Approximately 95% of the data values will fall within 2 standard deviations of the mean.
Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.

Example 1: The average U.S. yearly per capita consumption of citrus fruit is 26.8 pounds. Suppose that the distribution of fruit amounts consumed is bell-shaped with a standard deviation equal to 4.2 pounds. What percentage of Americans would you expect to consume in the range of 18.4 pounds to 35.2 pounds of citrus fruit per year?
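Chebyshev's bound is easy to evaluate once an interval is expressed as a number of standard deviations k on either side of the mean. The sketch below uses a hypothetical distribution (mean 100, standard deviation 10) and hypothetical helper names; the empirical-rule percentages are the ones quoted in these notes and apply only to bell-shaped data.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def k_for_interval(mean, sd, low, high):
    """Number of standard deviations for an interval symmetric about the mean."""
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    if abs(k_low - k_high) > 1e-9:
        raise ValueError("interval is not symmetric about the mean")
    return k_low


# HYPOTHETICAL distribution: mean 100, standard deviation 10, interval 70 to 130.
k = k_for_interval(100, 10, 70, 130)   # 3 standard deviations
bound = chebyshev_lower_bound(k)       # at least 1 - 1/9 of the values

# Empirical rule (bell-shaped data only), for comparison with Chebyshev's bound.
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}
print(k, round(bound * 100, 2), EMPIRICAL[3] * 100)
```

Chebyshev's figure is a guaranteed minimum for any shape, so it is always looser than the empirical-rule percentage for the same k.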

Example 2: Using Chebyshev's theorem, solve this problem for a distribution with a mean of 50 and a standard deviation of 5. At least what percentage of the values will fall between 40 and 60?

Example 3: A sample of the labor costs per hour to assemble a certain product has a mean of $.60 and a standard deviation of $0.15. Using Chebyshev's theorem, find the range in which at least 88.89% of the data will lie.

Section 3-3 Measures of Position

I. z score

A z score represents the number of standard deviations that a data value lies above or below the mean.

$$z = \frac{x - \bar{x}}{s}$$
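The z score formula can be written as a one-line Python function. The sketch below uses hypothetical values (mean 50, standard deviation 10) and flags a value as extremely unusual when its z score is beyond 3.00 in absolute value, the cutoff these notes use; the function names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def is_extremely_unusual(z, cutoff=3.0):
    """True when z is less than -cutoff or greater than +cutoff."""
    return abs(z) > cutoff


# HYPOTHETICAL values: two scores on a test with mean 50 and standard deviation 10.
z_a = z_score(65, 50, 10)
z_b = z_score(85, 50, 10)
print(z_a, is_extremely_unusual(z_a))
print(z_b, is_extremely_unusual(z_b))
```

Since z scores are measured in standard deviations, values from different tests can be compared directly: the larger z score has the better relative position.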

Example 1: Which of these exam grades has a better relative position?
(a) A grade of 56 on a test with $\bar{x}$ = 48 and s = 5.
(b) A grade of 0 on a test with $\bar{x}$ = 00 and s = 10.

Example 2: Human body temperatures have a mean of 98.0 and a standard deviation of 0.6. An emergency room patient is found to have a temperature of 101. Convert 101 to a z score. Consider a data value to be extremely unusual if its z score is less than -3.00 or greater than 3.00. Is that temperature unusually high? What does it suggest?

II. Percentiles and Quartiles

Percentile Formula (Percentile Rank)

The percentile corresponding to a given value x is computed by using the following formula:

$$\text{Percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \cdot 100\%$$
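The percentile rank formula can be sketched in Python as below. The ten test scores are hypothetical, and `percentile_rank` is an illustrative name.

```python
def percentile_rank(data, x):
    """Percentile of x: (number of values below x + 0.5) / total * 100."""
    below = sum(1 for v in data if v < x)
    return 100 * (below + 0.5) / len(data)


# HYPOTHETICAL test scores.
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]

rank_12 = percentile_rank(scores, 12)   # six scores are below 12
rank_2 = percentile_rank(scores, 2)     # no score is below 2
print(rank_12, rank_2)
```

The added 0.5 counts the value itself as half below and half above, so the lowest score does not land at the 0th percentile.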

Example 1: Find the percentile rank for each test score in the data set.

5, 15, 1, 16, 0, 1

Formula for finding a value corresponding to a given percentile ($P_m$)

$P_m$ is the number that separates the bottom $m\%$ of the data from the top $(100 - m)\%$ of the data. For example, if your test score is at the 90th percentile, then 90% of the people who took the test scored lower than you and only 10% scored higher than you.

Finding the location of $P_m$: evaluate $\frac{m}{100} \cdot n$.

1. If $\frac{m}{100} \cdot n$ is a whole number, then the location of $P_m$ is $\frac{m}{100} \cdot n + 0.5$. The value of $P_m$ is halfway between the data value in position $\frac{m}{100} \cdot n$ and the data value in the next position.
2. If $\frac{m}{100} \cdot n$ is not a whole number, then the location of $P_m$ is the next higher whole number. The value of $P_m$ is the data value in this location.
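The two-case location rule for a percentile value can be sketched in Python as follows. The sketch assumes m is a whole number between 1 and 99 and uses hypothetical scores; `percentile_value` is an illustrative name, and other textbooks and software use slightly different percentile conventions.

```python
def percentile_value(data, m):
    """Value of P_m using the location rule above (positions are 1-based).

    Assumes m is an integer with 0 < m < 100.
    """
    if not 0 < m < 100:
        raise ValueError("m must be between 1 and 99")
    ordered = sorted(data)
    n = len(ordered)
    product = m * n                      # (m / 100) * n, kept as an integer test
    if product % 100 == 0:
        pos = product // 100             # whole number: positions pos and pos + 1
        return (ordered[pos - 1] + ordered[pos]) / 2
    pos = product // 100 + 1             # not whole: next higher whole position
    return ordered[pos - 1]


# HYPOTHETICAL test scores (n = 10).
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]

p25 = percentile_value(scores, 25)   # 2.5 is not whole, so use position 3
p60 = percentile_value(scores, 60)   # 6 is whole, so average positions 6 and 7
print(p25, p60)
```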

Quartiles are defined as follows:

The first quartile: $Q_1 = P_{25}$
The second quartile: $Q_2 = P_{50}$
The third quartile: $Q_3 = P_{75}$

Example 2: The numbers of home runs hit by the American League home run leaders in the years 1959-1998 are listed below in order.

3 3 3 3 33 36 36 37 39 39 39 40 40 40 40 41 4 4 43 43 44 44 44 44 45 45 46 46 48 49 49 49 49 50 51 5 56 56 61

Find the following: (a) $P_{77}$ (b) $P_{4}$ (c) $Q_1$ (d) $Q_3$

III. The Interquartile Range

The interquartile range, or IQR, is

$$IQR = Q_3 - Q_1$$

The interquartile range is not influenced by extreme observations. If the median is used as a measure of central tendency, then the interquartile range should be used to describe variability.

Identifying Outliers (extremely high or low data values)

Any data value that is smaller than $Q_1 - 1.5 \times IQR$ or larger than $Q_3 + 1.5 \times IQR$ is considered an outlier.

The quick way to find $Q_1$ and $Q_3$:

The median of the data values that fall below $Q_2$ is $Q_1$.
The median of the data values that fall above $Q_2$ is $Q_3$.

Example 1: Consider the following ranked data:

.09 .14 .5 .37 .55 .55 .56 .60 .77 .77 .86 .93 1.15 1.34 1.41 1.75 .01 .3 3.69 3.90 4.50 4.88 7.79

(a) Find the interquartile range. (b) Is 7.79 an outlier?
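The quick quartile method and the 1.5 x IQR outlier rule can be sketched in Python as below. The sketch assumes the two halves are split by position in the ordered data, leaving the median out of both halves when n is odd; the data set is hypothetical and the function names are illustrative.

```python
import statistics


def quartiles_quick(data):
    """Q1, Q2, Q3 by the quick method: medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = ordered[:half]          # values positioned below the median
    upper = ordered[n - half:]      # values positioned above the median
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


def find_outliers(data):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    q1, _, q3 = quartiles_quick(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# HYPOTHETICAL data set.
values = [5, 6, 12, 13, 15, 18, 22, 50]

q1, q2, q3 = quartiles_quick(values)
outliers = find_outliers(values)
print(q1, q2, q3, q3 - q1, outliers)
```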

Example 2: Check the following data set for outliers.

145 119 1 118 15 100

Section 3-4 Exploratory Data Analysis

I. Boxplot

The median and the interquartile range are used to describe the distribution using a graph called a boxplot. From a boxplot, we can detect any skewness in the shape of the distribution and identify any outliers in the data set.

Constructing a boxplot

1. Find the 5-number summary consisting of the Low, $Q_1$, $Q_2$ (median), $Q_3$, and High.
2. Construct a scale with values that include the Low and High.
3. Construct a box with two vertical sides, called the hinges, above $Q_1$ and $Q_3$ on the axis. Also construct a vertical line in the box above $Q_2$.
4. Finally, connect the Low and High to the hinges using horizontal lines called the whiskers.
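The first construction step, finding the 5-number summary, can be sketched in Python as follows. The quartiles are found as medians of the lower and upper halves of the ordered data (the median itself is left out of both halves when n is odd); the data set is hypothetical and `five_number_summary` is an illustrative name.

```python
import statistics


def five_number_summary(data):
    """Return (Low, Q1, Q2, Q3, High) for drawing a boxplot."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    q1 = statistics.median(ordered[:half])       # median of the lower half
    q2 = statistics.median(ordered)              # median of all values
    q3 = statistics.median(ordered[n - half:])   # median of the upper half
    return ordered[0], q1, q2, q3, ordered[-1]


# HYPOTHETICAL data set with 11 values.
sample = [33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31]

summary = five_number_summary(sample)
print(summary)
```

The box then runs from the second to the fourth number of the summary, with the median line at the third and the whiskers reaching out to the first and last.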

Example 1: Construct a boxplot for the number of calculators sold during a randomly selected week.

8, 1, 3, 5, 9, 15, 3

Example 2: The following ranked data represent the number of English-language Sunday newspapers in each of the 50 states.

3 3 4 4 4 4 4 5 6 6 6 7 7 7 8 10 11 11 11 1 1 13 14 14 14 15 15 16 16 16 16 16 16 18 18 19 1 1 3 7 31 35 37 38 39 40 44 6 85

Construct a boxplot for the data.

Example 3: For the boxplot given below, (a) identify the maximum value, minimum value, first quartile, median, third quartile, and interquartile range; (b) comment on the shape of the distribution; (c) identify a suspected outlier.

[Figure: boxplot on an axis marked 48, 60, 72, 84, 96, with one point plotted as *]

II. The distribution shape

A bell-shaped distribution

[Figure: density plot of a bell-shaped distribution]

A slightly skewed to the right distribution

[Figure: density plot of a distribution slightly skewed to the right]

A slightly skewed to the left distribution

[Figure: density plot of a distribution slightly skewed to the left]

Summary

Some basic ways to summarize data include measures of central tendency, measures of variation or dispersion, and measures of position. The three most commonly used measures of central tendency are the mean, median, and mode. The midrange is also used to represent an average. The three most commonly used measurements of variation are the range, variance, and standard deviation. The most common measures of position are percentiles, quartiles, and deciles. Data values are distributed according to Chebyshev's theorem and, in special cases, the empirical rule.