Measures of Central Tendency. Mean, Median, and Mode


Population study
The population under study is the 20 students in a class.
Imagine we ask the individuals in our population how many languages they speak. Here are some (made-up) responses:
1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4
Each response is a measurement, or an outcome.
In this lecture, we will discuss ways to summarize data from a sample, using these made-up responses as an example.

Frequency distribution
A count of the number of times each outcome occurs is called a frequency distribution.
This information can be conveyed in a table or a plot.
A plot of a frequency distribution of numerical data is called a histogram.
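As a rough illustration of the idea, the frequency distribution of the made-up language responses can be tallied in a few lines. This is only a sketch: it assumes Python with its standard library, and the names `responses` and `frequency` are illustrative choices, not part of the lecture.

```python
from collections import Counter

# The made-up responses from the lecture: languages spoken by 20 students.
responses = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4]

# Count how many times each outcome occurs.
frequency = Counter(responses)

# Print a small frequency table with a text "histogram" bar per outcome.
for outcome in sorted(frequency):
    count = frequency[outcome]
    print(f"{outcome}: {count:2d} {'#' * count}")
```

The table form and the bar form carry the same information; the bars simply make the shape of the distribution visible.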

The mean of a frequency distribution
Recall our dataset:
1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4
The mean (sum/sample size) of these numbers is 2.15.
Note that no student reports knowing 2.15 languages.
Still, this number is one way of summarizing the frequency distribution: on average, the students in this class know 2.15 languages.

Another way to calculate the mean
Use the frequency distribution.
Calculate the sum of the data points by first multiplying the outcome and frequency columns together, and then adding up the results:
sum = (1*6) + (2*8) + (3*3) + (4*3) = 6 + 16 + 9 + 12 = 43
The mean is then the sum divided by the sample size (20):
mean = sum/sample size = 43/20 = 2.15

Yet another way to calculate the mean
We could also calculate the mean by multiplying the outcome column by the relative frequencies (frequency/sample size), and adding up the results:
mean = (1*0.30) + (2*0.40) + (3*0.15) + (4*0.15) = 0.30 + 0.80 + 0.45 + 0.60 = 2.15
The mean of a sample is the weighted average of the distinct outcomes.
The weights are the relative frequencies with which those outcomes occur.
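The three ways of calculating the mean can be checked against each other. The sketch below assumes Python with its standard library; the variable names are illustrative, and `math.isclose` is used because floating-point sums of relative frequencies may differ from 2.15 in the last digits.

```python
import math
from collections import Counter

# The made-up responses from the lecture.
responses = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4]
n = len(responses)
frequency = Counter(responses)

# 1. Directly: sum of all data points divided by the sample size.
mean_direct = sum(responses) / n

# 2. From the frequency table: sum of outcome * frequency, divided by n.
mean_from_counts = sum(outcome * count for outcome, count in frequency.items()) / n

# 3. As a weighted average: sum of outcome * relative frequency.
mean_weighted = sum(outcome * (count / n) for outcome, count in frequency.items())

# All three agree up to floating-point rounding.
print(mean_direct, mean_from_counts, mean_weighted)
print(math.isclose(mean_direct, mean_weighted))
```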

But the mean is not perfect
(No descriptive statistic ever is.)
Example: A ten-person first-year seminar has 9 first-year students who are 18 years old, and one who is 45.
The mean student age is 20.7.
Yet only one person in the class is even 20 or older!
The mean is not an ideal way to summarize data in this case.
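To see how far the mean sits from the typical student in this example, one can count how many students are at or above it. This is a small sketch assuming Python with its standard library; the names `ages` and `at_or_above_mean` are illustrative.

```python
from statistics import mean

# Ages from the seminar example: nine 18-year-olds and one 45-year-old.
ages = [18] * 9 + [45]

mean_age = mean(ages)

# Count how many students are at least as old as the mean age.
at_or_above_mean = sum(1 for age in ages if age >= mean_age)

print(f"mean age: {mean_age}")
print(f"students at or above the mean: {at_or_above_mean} of {len(ages)}")
```

A single extreme value is enough to pull the mean above nine of the ten data points.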

The median of a frequency distribution
The median is the middle of a frequency distribution.
Assume a sample: 12, 5, 3, 4, 5
Sorting these data gives: 3, 4, 5, 5, 12
The median value of this sample is 5.
If the sample size is even (e.g., 2, 3, 4, 5, 5, 12), the median is the mean of the middle two numbers: (4 + 5)/2 = 4.5
Q: What is the median age in the aforementioned first-year seminar?
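The sort-then-pick-the-middle procedure can be written out directly. The function below is a minimal sketch in Python, not a replacement for `statistics.median` in the standard library; the function name and the error handling for an empty sample are assumptions made for illustration.

```python
def median(values):
    """Return the median of a non-empty sample by sorting it first."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd sample size: the single middle value.
        return ordered[middle]
    # Even sample size: the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([12, 5, 3, 4, 5]))     # 5
print(median([2, 3, 4, 5, 5, 12]))  # 4.5

# The first-year seminar: nine 18-year-olds and one 45-year-old.
print(median([18] * 9 + [45]))      # 18.0
```

The last line answers the question about the seminar: the median age is 18, which describes the class far better than the mean of 20.7.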

The median is the middle point of a distribution
The median is the outcome that divides the distribution of outcomes in half.
At least 50% of the outcomes are at or below the median, and at least 50% are at or above it.
The area in a histogram to the right of the median equals the area to its left (the bar at the median itself is split in half):
Blue Area = Red Area
6*1 + 8*0.5 = 8*0.5 + 3*1 + 3*1
10 = 10

The mean is the balance point of a distribution
You can think of the mean as a distribution's center of gravity.
The total distance from the mean of all the points below it equals the total distance from the mean of all the points above it. With mean = 2.15:
(2.15-1)*6 + (2.15-2)*8 = (3-2.15)*3 + (4-2.15)*3
(1.15)*6 + (0.15)*8 = (0.85)*3 + (1.85)*3
8.1 = 8.1
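The balance-point property can be verified numerically for the language data. The sketch below assumes Python with its standard library; the names `below` and `above` are illustrative, and the comparison uses `math.isclose` to allow for floating-point rounding.

```python
import math

# The made-up responses from the lecture.
responses = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4]
mean = sum(responses) / len(responses)

# Total distance to the mean from the points on each side of it.
below = sum(mean - x for x in responses if x < mean)
above = sum(x - mean for x in responses if x > mean)

print(below, above)
print(math.isclose(below, above))
```

Both totals come out to about 8.1, matching the hand calculation. This is the same fact as saying that the deviations from the mean sum to zero.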

iclicker Q: The mean vs. the median
If a student scores in the top half (upper 50%) of the class on an exam, did the student score above the mean?
A: Yes, the student scored above the mean
B: No, the student scored below the mean
C: Not sure; there isn't enough information to answer this question
D: Not sure, because I don't understand the mean and median well enough yet to even hazard a guess (but I'm interested in learning this stuff!)

C (or D): Not Enough Information
The student is not necessarily above (or below) the mean.
The mean is the balance point of a distribution.
The median is the middle point of a distribution.
The order of these two points can vary with the distribution.

Symmetric (i.e., non-skewed) distributions
In a symmetric distribution, the left mirrors the right.
So the mean (the balance point) equals the median (the middle).
E.g., the mean and the median of the frequency distribution for the sample 1, 2, 2, 3 are both 2:
mean = median = 2

Asymmetric (i.e., skewed) distributions
In an asymmetric distribution, the left does not mirror the right.
So the mean and the median are not necessarily equal.
E.g., the mean and the median of the frequency distribution for the sample 1, 2, 2, 10 are not equal:
median = 2
mean = 3.75
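The two small samples can be compared side by side. This sketch assumes Python's standard `statistics` module; the dictionary keys and the name `gaps` are illustrative labels.

```python
from statistics import mean, median

samples = {
    "symmetric": [1, 2, 2, 3],
    "right-skewed": [1, 2, 2, 10],
}

# For each sample, record how far the mean sits above the median.
gaps = {}
for name, sample in samples.items():
    gaps[name] = mean(sample) - median(sample)
    print(f"{name}: mean = {mean(sample)}, median = {median(sample)}, "
          f"mean - median = {gaps[name]}")
```

A gap of zero matches the symmetric case; in the second sample the single large value 10 moves the mean but leaves the median where it was.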

Mean vs. Median
The gold sample (1, 2, 2, 10) has an outlier: 10.
So its histogram is skewed right.
Accordingly, its balance point (the mean) falls to the right of the middle point (the median).
Ultimately, only 25% of the outcomes are above the mean.
Symmetric sample: mean = median = 2
Skewed sample: median = 2, mean = 3.75

iclicker Q: The mean vs. the median
If a student scores in the top half (upper 50%) of the class on an exam, did the student score above the mean?
A: Yes, the student scored above the mean
B: No, the student scored below the mean
C: Not sure; there isn't enough information to answer this question
D: Not sure, because I don't understand the mean and median well enough yet to even hazard a guess (but I'm interested in learning this stuff!)

C (or D): Not Enough Information
In the skewed sample 1, 2, 2, 10, a student who scores in the upper 50% might have scored 10, but they might have scored 2.
It's impossible to tell if they scored above the mean or not!
Symmetric sample: mean = median = 2
Skewed sample: median = 2, mean = 3.75
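The answer can be made concrete by listing the scores in the upper half of a sample and checking each against the mean. This is a sketch assuming Python with its standard library; the helper `upper_half_vs_mean` is a hypothetical name, and for an odd sample size it counts the middle score as part of the upper half.

```python
from statistics import mean


def upper_half_vs_mean(sample):
    """Return (score, is_above_mean) for each score in the upper half."""
    ordered = sorted(sample)
    upper_half = ordered[len(ordered) // 2:]
    sample_mean = mean(ordered)
    return [(score, score > sample_mean) for score in upper_half]


skewed_result = upper_half_vs_mean([1, 2, 2, 10])
symmetric_result = upper_half_vs_mean([1, 2, 2, 3])

print(skewed_result)     # [(2, False), (10, True)]
print(symmetric_result)  # [(2, False), (3, True)]
```

In both samples, one of the two upper-half scores is above the mean and the other is not, so knowing only that a student is in the top half does not settle the question.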

A real-world example of a skewed distribution
Figure: Histogram of American Airlines Flight Delays
Delay times are in minutes.
A negative observation means the flight left earlier than scheduled.
The histogram is right skewed: it has a long right-hand tail.
The mean is being pulled away from the median in the direction of the tail, so we expect the mean to be greater than the median:
mean = 9.135913
median = -2

Another example of a skewed real-world distribution: public school funding

Mean vs. median, considering skew
Image source: http://i.imgur.com/yseyhha.jpg

Bush tax cuts
The GOP reported that 92 million Americans would get a tax cut, averaging $1,083.
But the median tax cut was actually less than $100.
Incredibly rich outliers received much larger tax cuts and skewed the mean.

Rule of Thumb
For skewed distributions, the mean lies in the direction of the skew (towards the longer tail) relative to the median.
But this rule of thumb is not always true!
Image source: http://www.amstat.org/publications/jse/v13n2/vonhippel_figure2.gif

Mode
The mode is the most popular outcome: the one that occurs most often.
It is the outcome associated with the highest bar in a frequency distribution.
For example, in the sample 1, 2, 3, 4, 4, 4, 3, 2, 1, the mode is 4.
In plurality voting, the winning candidate is the mode: the one with the most votes.
The mode need not be unique; some distributions are bi- or multi-modal.
Image source: https://statistics.laerd.com/statistical-guides/img/mode-1.png
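Because the mode need not be unique, a helper that finds it should be able to return more than one outcome. The sketch below assumes Python with its standard library; the function name `modes` and the hair-color sample are made up for illustration and do not come from the lecture.

```python
from collections import Counter


def modes(sample):
    """Return every outcome that occurs with the highest frequency."""
    counts = Counter(sample)
    highest = max(counts.values())
    return [outcome for outcome, count in counts.items() if count == highest]


# The numeric sample from the lecture has a single mode.
print(modes([1, 2, 3, 4, 4, 4, 3, 2, 1]))  # [4]

# A hypothetical non-numeric sample with two modes (bimodal).
hair_colors = ["brown", "black", "brown", "red", "black"]
print(modes(hair_colors))  # ['brown', 'black']
```

The second call shows that the mode works on non-numeric data, where a mean or median would not be defined.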

Measures of Central Tendency
Mean: numerical, preferably symmetric data
Median: ordered, possibly skewed data
Mode: any data, but especially non-numeric data (e.g., hair color, food items, movies, etc.)

Extras

The median is the middle point of a distribution
The median is the outcome that divides the distribution of outcomes in half.
At least 50% of the outcomes are at or below the median, and at least 50% are at or above it.
The area in a histogram to the right of the median equals the area to its left. With the bar heights expressed as percentages of the sample:
Blue Area = Red Area
30*1 + 40*0.5 = 40*0.5 + 15*1 + 15*1
50 = 50
