Comparing Measures of Central Tendency *


OpenStax-CNX module: m11011

David Lane

This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 1.0 (http://creativecommons.org/licenses/by/1.0).

How do the various measures of central tendency compare with each other? For symmetric distributions, the mean, median, trimean, and trimmed mean are equal, as is the mode except in bimodal distributions. Differences among the measures occur with skewed distributions. Figure 1 shows the distribution of 642 scores on an introductory psychology test. Notice that this distribution has a slight positive skew.

Figure 1: A distribution with a positive skew.

Measures of central tendency are shown in Table 1. Notice that they do not differ greatly, except that the mode is lower than the other measures. When distributions have a positive skew, the mean is usually higher than the median. For these data, the mean of 91.58 is higher than the median of 90. Typically the trimean and the trimmed mean fall between the median and the mean, although in this case the trimmed mean (89.81) is slightly lower than the median. The geometric mean (89.70) is lower than all the other measures except the mode.

*Version 2.3: Jul 11, 2003 10:22 am -0500

Table 1: Measures of central tendency for the test scores.

Measure              Value
Mode                 84.00
Median               90.00
Geometric Mean       89.70
Trimean              90.25
Mean trimmed 50%     89.81
Mean                 91.58

The distribution of baseball salaries (in 1994) shown in Figure 2 has a much more pronounced skew than the distribution in Figure 1.

Figure 2: A distribution with a very large positive skew. This histogram shows the salaries of major league baseball players (in thousands of dollars).

Table 2 shows the measures of central tendency for these data. The large skew results in very different values for these measures, and no single measure of central tendency is sufficient for data such as these. If you were asked the very general question "So, what do baseball players make?" and answered with the mean of $1,183,000, you would not have told the whole story, since only about one third of baseball players make that much. If you answered with the mode of $250,000 or the median of $500,000, you would not be giving any indication that some players make many millions of dollars.

Fortunately, there is no need to summarize a distribution with a single number. When the various measures differ, our opinion is that you should report the mean, the median, and either the trimean or the mean trimmed 50%. Sometimes it is worth reporting the mode as well. In the media, the median is usually reported to summarize the center of skewed distributions: you will hear about median salaries, median prices of houses sold, and so on. This is better than reporting only the mean, but it would be more informative to hear additional statistics.
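A minimal Python sketch of the point made above, using only the standard-library statistics module. The salary list is made-up illustrative data (in thousands of dollars), not the 1994 salaries summarized in Table 2; it simply has a long right tail, so the mean lands well above the median and only a small share of the values reach the mean.

```python
import statistics

# Made-up salaries in thousands of dollars (NOT the 1994 data in Table 2).
# Two very large values give the distribution a long right tail.
salaries = [250, 250, 250, 300, 400, 500, 600, 900, 2500, 5000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)

# Fraction of players earning at least the mean; small when the right tail is long.
share_at_or_above_mean = sum(s >= mean for s in salaries) / len(salaries)

print(f"mode={mode}  median={median}  mean={mean}")
print(f"share earning at least the mean: {share_at_or_above_mean:.0%}")
```

With these values the mode is 250, the median is 450, and the mean is 1095, and only 2 of the 10 salaries reach the mean, which is why reporting the mean alone would be misleading.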

Table 2: Measures of central tendency for baseball salaries (in thousands of dollars).

Measure              Value
Mode                   250
Median                 500
Geometric Mean         555
Trimean                792
Mean trimmed 50%       619
Mean                 1,183

Glossary

Average: 1. The (arithmetic) mean. 2. Any measure of central tendency.

Bimodal Distribution: A distribution with two distinct peaks. See Figure 3 for an example.

Bar Chart: A graphical method of presenting data from a discrete variable. A bar is drawn for each value of the variable, and the height of each bar represents the number or percentage of observations with that value. See Figure 4 for an example. See also: histogram, line graph, pie chart, box plot.

Box Plot: One of the more effective graphical summaries of a data set, the box plot generally shows the mean, median, 25th and 75th percentiles, and outliers. A standard box plot is composed of the median, upper hinge, lower hinge, higher adjacent value, lower adjacent value, outside values, and far out values. Parallel box plots are very useful for comparing distributions. See Figure 5 for an example. See also: step, H-spread.

Center (of a Distribution), Central Tendency: The center or middle of a distribution. There are many measures of central tendency. The most common are the mean, median, and mode. Others include the trimean, trimmed mean, and geometric mean.
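The box-plot components listed in the Box Plot entry can be computed directly. This is a sketch under stated assumptions: the hinges are taken as the 25th and 75th percentiles (as in the Lower Hinge and Upper Hinge entries), computed with Python's statistics.quantiles using method="exclusive", which interpolates at rank P(N + 1)/100; a step is 1.5 times the H-spread; and the inner and outer fences are assumed to lie 1 and 2 steps beyond the hinges. The function name box_plot_parts and the data are hypothetical, and other software (for example, Tukey's original hinges) may produce slightly different hinge values.

```python
import statistics


def box_plot_parts(data):
    """Box-plot components, with hinges taken as the 25th and 75th percentiles."""
    # method="exclusive" interpolates at rank P(N + 1)/100.
    lower_hinge, median, upper_hinge = statistics.quantiles(data, n=4, method="exclusive")
    step = 1.5 * (upper_hinge - lower_hinge)                             # 1.5 x H-spread
    inner_lo, inner_hi = lower_hinge - step, upper_hinge + step          # inner fences
    outer_lo, outer_hi = lower_hinge - 2 * step, upper_hinge + 2 * step  # outer fences
    inside = [x for x in data if inner_lo <= x <= inner_hi]
    return {
        "median": median,
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "lower_adjacent": min(inside),  # smallest value not below the lower inner fence
        "upper_adjacent": max(inside),  # largest value not above the upper inner fence
        # Outside: more than 1 step, but not more than 2 steps, beyond a hinge.
        "outside": sorted(x for x in data
                          if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi),
        # Far out: more than 2 steps beyond a hinge.
        "far_out": sorted(x for x in data if x < outer_lo or x > outer_hi),
    }


data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 60]  # made-up illustrative values
parts = box_plot_parts(data)
print(parts)
```

For these values the hinges are 4 and 10, so a step is 9: 25 lies between 1 and 2 steps above the upper hinge (an outside value), while 60 lies more than 2 steps above it (a far out value).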

Class Interval (Bin Width): Also known as bin width, the class interval is a division of data for use in a histogram. For instance, it is possible to partition scores on a 100-point test into class intervals of 1-25, 26-49, 50-74, and 75-100.

Class Frequency: One of the components of a histogram, the class frequency is the number of observations in each class interval. See also: relative frequency.

Continuous Variables: Variables that can take on any value in a certain range. Time and distance are continuous; gender, SAT score, and "time rounded to the nearest second" are not. Variables that are not continuous are known as discrete variables. No measured variable is truly continuous; however, discrete variables measured with enough precision can often be considered continuous for practical purposes.

Data: A collection of values to be used for statistical analysis. See also: variable.

Discrete Variables: Variables that can take on only a finite number of values are called "discrete variables." All qualitative variables are discrete. Some quantitative variables are discrete, such as performance rated as 1, 2, 3, 4, or 5, or temperature rounded to the nearest degree. Sometimes a variable that takes on enough discrete values can be considered continuous for practical purposes; one example is time to the nearest millisecond. Variables that can take on an infinite number of possible values are called continuous variables.

Distribution (Frequency Distribution): The distribution of empirical data is called a frequency distribution and consists of a count of the number of occurrences of each value. If the data are continuous, then a grouped frequency distribution is used. Typically, a distribution is portrayed using a frequency polygon or a histogram. Distributions can also be defined mathematically; the normal distribution is perhaps the best-known example. Many empirical distributions are approximated well by mathematical distributions such as the normal distribution.

Far Out Value: One of the components of a box plot, far out values are those that are more than 2 steps from the nearest hinge. They are beyond the outer fences.

Frequency Polygon: A frequency polygon is a graphical representation of a distribution. It partitions the variable on the x-axis into various contiguous class intervals of (usually) equal widths. The heights of the polygon's points represent the class frequencies. See Figure 6 for an example.
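The Class Interval, Class Frequency, and (later) Relative Frequency entries can be tied together in a short Python sketch. It reuses the four class intervals for a 100-point test given above; the scores and the helper name class_of are hypothetical, and scores are assumed to be integers so that each one falls into exactly one interval.

```python
from collections import Counter

# Class intervals for a 100-point test, taken from the Class Interval entry.
intervals = [(1, 25), (26, 49), (50, 74), (75, 100)]

# Made-up integer test scores, for illustration only.
scores = [12, 30, 45, 52, 58, 61, 67, 70, 73, 77, 81, 84, 88, 90, 95, 99]


def class_of(score):
    """Return the (low, high) class interval that contains an integer score."""
    for low, high in intervals:
        if low <= score <= high:
            return (low, high)
    raise ValueError(f"score {score} is outside every class interval")


# Class frequency: number of observations in each class interval.
frequencies = Counter(class_of(s) for s in scores)
n = len(scores)
# Relative frequency: class frequency divided by the total number of observations.
relative = {iv: frequencies[iv] / n for iv in intervals}

for low, high in intervals:
    print(f"{low:>3}-{high:<3} frequency={frequencies[(low, high)]:2d} "
          f"relative={relative[(low, high)]:.4f}")
```

The resulting counts (1, 2, 6, 7) are exactly the bar heights a histogram of these scores would use, and the relative frequencies sum to 1.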

Geometric Mean: The geometric mean of n numbers is obtained by multiplying all of them together and then taking the nth root of the product. It is one of the rarer measures of central tendency and is not to be confused with the much more common arithmetic mean.

Grouped Frequency Distribution: A grouped frequency distribution is a frequency distribution in which frequencies are displayed for ranges of data rather than for individual values. For example, the distribution of heights might be calculated by defining one-inch ranges. The frequency of individuals with various heights, rounded off to the nearest inch, would then be tabulated. See also: histogram.

Higher Adjacent Value: One of the components of a box plot, the higher adjacent value is the largest value in the data below the upper inner fence (the upper hinge plus 1 step).

Histogram: A histogram is a graphical representation of a distribution. It partitions the variable on the x-axis into various contiguous class intervals of (usually) equal widths. The heights of the bars represent the class frequencies. See Figure 7 for an example. See also: Sturges' Rule.
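The Geometric Mean entry translates directly into code. This sketch assumes the inputs are positive numbers (the usual requirement for a geometric mean) and uses Python 3.8+ standard-library functions; the two function names are hypothetical. The second version computes the same quantity as the exponential of the mean of the logarithms, which avoids overflowing the product for long lists.

```python
import math
import statistics


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch requires a non-empty list of positive numbers")
    return math.prod(values) ** (1 / len(values))


def geometric_mean_logs(values):
    """Same quantity via exp(mean of logs); avoids overflowing the product."""
    return math.exp(statistics.fmean(math.log(v) for v in values))


values = [1, 10, 100]
print(geometric_mean(values))             # cube root of 1000, i.e. about 10
print(geometric_mean_logs(values))
print(statistics.geometric_mean(values))  # standard-library version (Python 3.8+)
print(statistics.mean(values))            # arithmetic mean, 37, for comparison
```

For positive data the geometric mean never exceeds the arithmetic mean, which is consistent with the geometric mean falling below the ordinary mean in both Table 1 and Table 2.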

H-spread: One of the components of a box plot, the H-spread is the difference between the upper hinge and the lower hinge.

Levels of Measurement: Measurement scales differ in their level of measurement. There are four common levels of measurement:
1. Nominal scales are only labels.
2. Ordinal scales are ordered but are not truly quantitative. Equal intervals on the ordinal scale do not imply equal intervals on the underlying trait.
3. Interval scales are ordered, and equal intervals on the scale represent equal intervals on the underlying trait. However, interval scales do not have a true zero point.
4. Ratio scales are interval scales that do have a true zero point. With ratio scales, it is sensible to say, for example, that one value is twice as large as another.

Line Graph: Essentially a bar graph in which the height of each bar is represented by a single point, with each of these points connected by a line. Line graphs are best used to show change over time and should never be used if the X-axis is not an ordered variable.

Lower Hinge: A component of a box plot, the lower hinge is the 25th percentile. The upper hinge is the 75th percentile.

Lower Adjacent Value: A component of a box plot, the lower adjacent value is the smallest value in the data above the lower inner fence.

Mean (Arithmetic Mean): Also known as the arithmetic mean, the mean is typically what is meant by the word "average." The mean is perhaps the most common measure of central tendency. The mean of a variable is given by (the sum of all its values)/(the number of values). For example, the mean of 4, 8, and 9 is 7. The sample mean is written as M, and the population mean as the Greek letter mu (µ). Despite its popularity, the mean may not be an appropriate measure of central tendency for skewed distributions or in situations with outliers.

Median: The median is a popular measure of central tendency. It is the 50th percentile of a distribution. To find the median of a number of values, first order them, then find the observation in the middle: the median of 5, 2, 7, 9, and 4 is 5. (If there is an even number of values, take the average of the middle two: the median of 4, 6, 8, and 10 is 7.) The median is often more appropriate than the mean in skewed distributions or in situations with large outliers.

Mode: The mode is a measure of central tendency. It is the most common value in a distribution: the mode of 3, 4, 4, 5, 5, 5, 8 is 5. Note that the mode may be very different from the mean and the median: 1, 1, 1, 3, 8, 10 has mode 1, but mean 4 and median 2.

Nominal Scale: A nominal scale is one of four levels of measurement. No ordering is implied, and addition/subtraction and multiplication/division would be inappropriate for a variable on a nominal scale. {Female, Male} and {Buddhist, Christian, Hindu, Muslim} have no natural ordering (except alphabetic). Occasionally, numeric values are nominal: for instance, if a variable were coded as Female = 1, Male = 2, the set {1, 2} would still be nominal.
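The Mean, Median, and Mode entries can be checked with a few lines of Python. These are hand-written sketches of the procedures described above (the function names are our own); the mode helper assumes a single most common value and is not meant for bimodal data.

```python
from collections import Counter


def mean(values):
    # (sum of all values) / (number of values)
    return sum(values) / len(values)


def median(values):
    # Order the values, then take the middle one (or average the middle two).
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2


def mode(values):
    # Most common value; assumes there is a single most common value.
    return Counter(values).most_common(1)[0][0]


print(mean([4, 8, 9]))               # 7.0
print(median([5, 2, 7, 9, 4]))       # 5
print(median([4, 6, 8, 10]))         # 7.0
print(mode([3, 4, 4, 5, 5, 5, 8]))   # 5
skewed = [1, 1, 1, 3, 8, 10]
print(mode(skewed), mean(skewed), median(skewed))  # 1 4.0 2.0
```

The last line confirms the Mode example: for 1, 1, 1, 3, 8, 10 the mean is 24/6 = 4 and the median is (1 + 3)/2 = 2.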

Ordinal Scale: One of four levels of measurement, an ordinal scale is a set of ordered values. However, there is no set distance between scale values. For instance, (Very Poor, Poor, Average, Good, Very Good) is an ordinal scale. You can assign numerical values to an ordinal scale, rating performance as 1 for "Very Poor," 2 for "Poor," and so on, but there is no assurance that the difference between scores of 1 and 2 means the same thing as the difference between scores of 2 and 3.

Outside Value: A component of a box plot, an outside value is a value more than 1 step from the nearest hinge. See also: far out value.

Parallel Box Plots: Two or more box plots drawn on the same Y-axis. These are often useful for comparing features of distributions. See Figure 8 for an example, which portrays the times it took samples of women and men to do a task.

Percentile:
1. There is no universally accepted definition of a percentile. Using the 65th percentile as an example, some statisticians define the 65th percentile as the lowest score that is larger than 65% of the scores. Others have defined the 65th percentile as the smallest score that is greater than or equal to 65% of the scores. A more sophisticated definition is given below.
2. The first step is to compute the rank (R) of the percentile in question, using the formula
$$R = \frac{P}{100}\,(N + 1)$$
where P is the desired percentile and N is the number of numbers. If R is an integer, then the Pth percentile is the number with rank R. When R is not an integer, compute the Pth percentile by interpolation as follows:
   1. Define $I_R$ as the integer portion of R (the number to the left of the decimal point).
   2. Define $F_R$ as the fractional portion of R.
   3. Find the scores with rank $I_R$ and with rank $I_R + 1$.
   4. Interpolate by multiplying the difference between these two scores by $F_R$ and adding the result to the lower score.
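The rank-and-interpolate procedure in definition 2 of the Percentile entry can be sketched in Python as follows. The function name percentile and the sample scores are hypothetical, and the handling of ranks below 1 or above N (clamping to the smallest or largest score) is our own assumption, since the definition does not cover that case. For the quartiles of this example, the standard library's statistics.quantiles with method="exclusive" gives the same values, which serves as a cross-check.

```python
import statistics


def percentile(data, p):
    """P-th percentile from rank R = P(N + 1)/100, interpolating when R is fractional."""
    xs = sorted(data)
    n = len(xs)
    r = p * (n + 1) / 100
    # Assumption: ranks below 1 or above N are clamped to the extreme scores;
    # the definition above does not cover this case.
    if r <= 1:
        return xs[0]
    if r >= n:
        return xs[-1]
    i_r = int(r)         # I_R: integer portion of R
    f_r = r - i_r        # F_R: fractional portion of R (0 when R is an integer)
    lower = xs[i_r - 1]  # score with rank I_R (ranks are 1-based)
    upper = xs[i_r]      # score with rank I_R + 1
    return lower + f_r * (upper - lower)


scores = [3, 5, 7, 8, 9, 11, 13, 15]  # made-up example scores
print(percentile(scores, 25))  # R = 2.25 -> 5 + 0.25 * (7 - 5) = 5.5
print(percentile(scores, 50))  # R = 4.5  -> 8 + 0.5 * (9 - 8) = 8.5
# Cross-check against the standard library for the three quartiles.
print([percentile(scores, p) for p in (25, 50, 75)])
print(statistics.quantiles(scores, n=4, method="exclusive"))
```

When R is an integer, F_R is 0 and the formula simply returns the score with rank R, so one code path covers both cases in the definition.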
Pie Chart: A graphical representation of data, the pie chart shows relative frequencies of classes of data. It is a circle cut into a number of wedges, one for each class, with the area of each wedge proportional to its relative frequency. Pie charts are effective only for a small number of classes and are one of the less effective graphical representations.

Qualitative Variables (Categorical Variables): Also known as categorical variables, qualitative variables are variables with no natural sense of ordering. For instance, hair color (Black, Brown, Gray, Red, Yellow) is a qualitative variable, as is name (Adam, Becky, Christina, Dave...). Qualitative variables can be coded to appear numeric, but their numbers are meaningless, as in male = 1, female = 2. Variables that are not qualitative are known as quantitative variables.

Quantitative Variables: Variables that are measured on a numeric or quantitative scale. Ordinal, interval, and ratio scales are quantitative. A country's population, a person's shoe size, and a car's speed are all quantitative variables. Variables that are not quantitative are known as qualitative variables.

Ratio Scale: One of the four basic levels of measurement, a ratio scale is a numerical scale with a true zero point and in which a given size interval has the same interpretation for the entire scale. Weight is on a ratio scale; therefore, it is meaningful to say that a 200-pound person weighs twice as much as a 100-pound person.

Relative Frequency: The proportion of observations falling into a given class. For example, if a bag of 55 M&M's has 11 green M&M's, then the frequency of green M&M's is 11 and the relative frequency is 11/55 = 0.20. Relative frequencies arise in the creation of histograms and pie charts, and sometimes in bar graphs.

Skew: A distribution is skewed if one tail extends out further than the other. A distribution has positive skew (is skewed to the right) if the tail to the right is longer; see Figure 9 for an example. A distribution has negative skew (is skewed to the left) if the tail to the left is longer; see Figure 10 for an example.

Step: One of the components of a box plot, the step is 1.5 times the difference between the upper hinge and the lower hinge. See also: H-spread.

Sturges' Rule: One method of determining the number of classes for a histogram, Sturges' Rule is to take $1 + \log_2 N$ classes, rounded to the nearest integer.

Symmetric Distribution: In a symmetric distribution, the upper and lower halves of the distribution are mirror images of each other. For example, in the distribution shown in Figure 11, the portions above and below 50 are mirror images of each other. In a symmetric distribution, the mean is equal to the median.

Trimean: The trimean is a measure of central tendency; it is a weighted average of the 25th, 50th, and 75th percentiles. Specifically, it is computed as
$$\text{Trimean} = 0.25\,P_{25} + 0.5\,P_{50} + 0.25\,P_{75}$$
where $P_{25}$, $P_{50}$, and $P_{75}$ denote the 25th, 50th, and 75th percentiles.

Trimmed Mean: The trimmed mean is a measure of central tendency generally falling between the mean and the median. As in the computation of the median, all observations are ordered. Next, the highest and lowest alpha percent of the data are removed, where alpha ranges from 0 to 50. Finally, the mean of the remaining observations is taken. The trimmed mean has advantages over both the mean and the median, but it is computationally more difficult and analytically less tractable.

Upper Hinge: The upper hinge is one of the components of a box plot; it is the 75th percentile.
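The Trimean and Trimmed Mean entries can be sketched in Python as below. Assumptions, all ours: the percentiles come from statistics.quantiles with method="exclusive" (rank P(N + 1)/100, matching the Percentile entry); alpha is the percentage removed from each end, rounded down to whole observations; and the "mean trimmed 50%" reported in Tables 1 and 2 is presumed to correspond to alpha = 25 (25% from each end, 50% in total). The data are made up.

```python
import statistics


def trimean(data):
    """0.25*P25 + 0.5*P50 + 0.25*P75, with percentiles at rank P(N + 1)/100."""
    p25, p50, p75 = statistics.quantiles(data, n=4, method="exclusive")
    return 0.25 * p25 + 0.5 * p50 + 0.25 * p75


def trimmed_mean(data, alpha):
    """Mean after removing the lowest and highest alpha percent (0 <= alpha <= 50).

    Assumption: the count removed from each end is rounded down, and the
    middle one or two observations are always kept.
    """
    xs = sorted(data)
    n = len(xs)
    k = min(int(n * alpha / 100), (n - 1) // 2)  # observations dropped per end
    kept = xs[k:n - k]
    return sum(kept) / len(kept)


data = [1, 2, 2, 3, 4, 6, 8, 9, 12, 20, 40]  # made-up, positively skewed
print("median      ", statistics.median(data))  # 6
print("trimmed mean", trimmed_mean(data, 25))    # 25% off each end: about 6.29
print("trimean     ", trimean(data))             # 6.5
print("mean        ", statistics.mean(data))     # about 9.73
```

As the entries suggest, both the trimean and the trimmed mean land between the median and the mean for these positively skewed values; with alpha = 0 the trimmed mean reduces to the ordinary mean.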

Variables: Something that can take on different values. For example, different subjects in an experiment weigh different amounts; therefore, "weight" is a variable in the experiment. Or subjects may be given different doses of a drug, which would make "dosage" a variable. Variables can be dependent or independent, qualitative or quantitative, and continuous or discrete.

Dependent Variable: A variable that measures the experimental outcome. In most experiments, the effects of the independent variable on the dependent variables are observed. For example, if a study investigated the effectiveness of an experimental treatment for depression, then the measure of depression would be the dependent variable. Synonym: dependent measure.

Independent Variables: Variables that are manipulated by the experimenter, as opposed to dependent variables. Most experiments consist of observing the effect of the independent variable on the dependent variable(s).