

Finding Quartiles
Use the median to divide the ordered data set into two halves.
If n is odd, do not include the median in either half.
If n is even, split the data set exactly in half.
Q1 is the median of the lower half of the data.
Q3 is the median of the upper half of the data.

Data: -13, -10, -3, 6, 12, 18, 45, 56, 71
n = 9 is odd. Median = 12. Excluding the median, the two halves are {-13, -10, -3, 6} and {18, 45, 56, 71}.
Q1 is the median of the lower half {-13, -10, -3, 6}, so Q1 = (-10 + (-3))/2 = -6.5.
Q3 is the median of the upper half {18, 45, 56, 71}, so Q3 = (45 + 56)/2 = 50.5.

Data: -13, -10, -3, 6, 12, 18, 45, 56, 71, 96
n = 10 is even. Median = (12 + 18)/2 = 15. Splitting the data exactly in half gives {-13, -10, -3, 6, 12} and {18, 45, 56, 71, 96}.
Q1 is the median of the lower half {-13, -10, -3, 6, 12}, so Q1 = -3.
Q3 is the median of the upper half {18, 45, 56, 71, 96}, so Q3 = 56.
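The median-of-halves rule above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `quartiles` is made up for this example, and many software packages use other quartile conventions that can give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule from the notes."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # When n is odd, the middle value is the median and is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

odd_data = [-13, -10, -3, 6, 12, 18, 45, 56, 71]
even_data = odd_data + [96]

print(quartiles(odd_data))   # (-6.5, 12, 50.5)
print(quartiles(even_data))  # (-3, 15.0, 56)
```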

Chebyshev's rule
For any distribution, at least $1 - \frac{1}{k^2}$ of the observations will fall within k standard deviations of the mean, i.e. in the interval [µ - k*σ, µ + k*σ], where k ≥ 1.
Chebyshev's rule holds for any distribution, whereas the empirical rule is valid only for approximately symmetric, unimodal (mound-shaped) distributions.
If k = 1, Chebyshev's rule gives a bound of 0, so it provides no information.
According to Chebyshev, at least 75% of observations fall within 2 standard deviations of the mean.
According to Chebyshev, at least 88.9% of observations fall within 3 standard deviations of the mean.

Examples
Suppose the mean height of Japanese people is µ = 5.5 feet and the standard deviation is σ = 1 foot.
What proportion of Japanese people have heights between 3.5 feet and 7.5 feet?
3.5 = 5.5 - k*1 and 7.5 = 5.5 + k*1, hence k = 2. By Chebyshev's rule, at least $1 - \frac{1}{2^2} = 75\%$ lie between 3.5 and 7.5 feet.
Between what heights can you find at least 93.75% of Japanese people?
93.75% = $\frac{15}{16} = 1 - \frac{1}{4^2}$, hence k = 4. By Chebyshev's rule, at least 93.75% of Japanese people have heights between 5.5 - 4*1 = 1.5 feet and 5.5 + 4*1 = 9.5 feet.
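Both directions of the height example can be checked with a short Python sketch. The helper names `chebyshev_lower_bound` and `k_for_coverage` are hypothetical, chosen only for this illustration; the second one simply solves 1 - 1/k^2 = p for k.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1 - 1 / k**2

def k_for_coverage(p):
    """Value of k whose Chebyshev bound equals the coverage p (0 <= p < 1)."""
    return 1 / math.sqrt(1 - p)

mu, sigma = 5.5, 1.0

# Question 1: how much of the data lies between 3.5 and 7.5 feet?
k = (7.5 - mu) / sigma
print(chebyshev_lower_bound(k))           # 0.75

# Question 2: which interval contains at least 93.75% of the data?
k2 = k_for_coverage(0.9375)
print(mu - k2 * sigma, mu + k2 * sigma)   # 1.5 9.5
```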

Empirical rule
For approximately symmetric, unimodal (bell-shaped or mound-shaped) distributions:
Approximately 68% of observations fall within 1 standard deviation of the mean.
Approximately 95% of observations fall within 2 standard deviations of the mean.
Approximately 99.7% of observations fall within 3 standard deviations of the mean.
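The three percentages come from the normal distribution. As a rough check, and assuming the data follow an exactly normal model (an idealization of "bell-shaped"), the proportion within k standard deviations of the mean equals erf(k/√2). The function name `normal_within` is made up for this sketch.

```python
import math

def normal_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

These exact values are much larger than the Chebyshev bounds for the same k, which is the price Chebyshev's rule pays for applying to every distribution.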


Box Plot
A box plot is another graphical representation of quantitative data. It uses the following 5-number summary:
1. Minimum value
2. Lower quartile
3. Median (the middle value)
4. Upper quartile
5. Maximum value
NOTE: Data must be ordered from lowest value to highest value before finding the 5-number summary.

Box Plots
Box plots are a representation of the five-number summary (minimum, lower quartile, median, upper quartile, maximum).
Half the data are in the box.
One-quarter of the data are in each whisker.
If one part of the plot is long, the data are skewed.
Box plots are very useful for comparing distributions.
The box plot shown on this slide indicates data that are skewed to the left.

Box Plot
A box plot is a pictorial representation of the 5-number summary.

Outliers
Any observation farther than 1.5 times the IQR from the nearest edge of the box is an outlier. If it is farther than 3 times the IQR, it is an extreme outlier; otherwise it is a mild outlier.
One can also indicate the outliers in a box plot by drawing the whiskers only up to 1.5 times the IQR on both sides and marking the outliers with stars, crosses, or other symbols.

An example
Suppose min = 2, Q1 = 18, median = 20, Q3 = 22, max = 35. Which of the following observations are outliers?
A. 10 B. 15 C. 25 D. 30
Lower fence = Q1 - 1.5*IQR = 18 - 1.5(22 - 18) = 12
Upper fence = Q3 + 1.5*IQR = 22 + 1.5(22 - 18) = 28
Note: All observations below the lower fence or above the upper fence are considered outliers. Here 10 (below 12) and 30 (above 28) are outliers, while 15 and 25 are not.
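The fence calculation and the mild/extreme distinction can be sketched as follows. The function names `fences` and `classify` are illustrative only, and the labels follow the 1.5*IQR and 3*IQR thresholds stated in these notes.

```python
def fences(q1, q3):
    """Return the (lower, upper) fences at 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def classify(x, q1, q3):
    """Label an observation by its distance from the nearest edge of the box."""
    iqr = q3 - q1
    distance = max(q1 - x, x - q3, 0)  # 0 when x lies inside the box
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"

q1, q3 = 18, 22
print(fences(q1, q3))  # (12.0, 28.0)
for x in (10, 15, 25, 30):
    print(x, classify(x, q1, q3))
# 10 mild outlier
# 15 not an outlier
# 25 not an outlier
# 30 mild outlier
```

With the same quartiles, the minimum 2 and the maximum 35 are each more than 3*IQR = 12 from the box, so `classify` labels both as extreme outliers.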

Histogram vs. Box plot
Both the histogram and the box plot capture the symmetry or skewness of a distribution.
A box plot cannot indicate the modality of the data.
A box plot is much better for finding outliers.
The shape of a histogram depends to some extent on the choice of bins.

Comparing Distributions
We can compare the distributions of different data sets using box plots (or the 5-number summary) or histograms. We shall first compare distributions using box plots.

Which type of car has the largest median time to accelerate? A. upscale B. sports C. small D. large E. family

Which type of car has the smallest median time value? A. upscale B. sports C. small D. Large E. Luxury

Which type of car always takes less than 3.6 seconds to accelerate? A. upscale B. sports C. small D. Large E. Luxury

Which type of car has the smallest IQR for time to accelerate? A. upscale B. sports C. small D. Large E. Luxury

What is the shape of the distribution of acceleration times for luxury cars? A. Left skewed B. Right skewed C. Roughly symmetric D. Cannot be determined from the information given.

What percent of luxury cars accelerate to 30 mph in less than 3.5 seconds? A. Roughly 25% B. Exactly 37.5% C. Roughly 50% D. Roughly 75% E. Cannot be determined from the information given

What percent of family cars accelerate to 30 mph in less than 3.5 seconds? A. Less than 25% B. More than 50% C. Less than 50% D. Exactly 75% E. None of the above

Z-Scores
How to compare apples with oranges?
A college admissions committee is looking at the files of two candidates, one with a total SAT score of 1500 and another with an ACT score of 22. Which candidate scored better?
How do we compare things when they are measured on different scales? We need to standardize the values.

How to standardize?
Subtract the mean from the value, then divide this difference by the standard deviation. The standardized value is the z-score:
$z = \frac{\text{value} - \text{mean}}{\text{std. dev.}}$
z-scores are free of units.

z-scores: An Example
Data: 4, 3, 10, 12, 8, 9, 3 (n = 7 in this case)
Mean = (4 + 3 + 10 + 12 + 8 + 9 + 3)/7 = 49/7 = 7. Standard deviation = 3.65.

Original value    z-score
--------------------------------------------------------------
4                 (4 - 7)/3.65 = -0.82
3                 (3 - 7)/3.65 = -1.10
10                (10 - 7)/3.65 = 0.82
12                (12 - 7)/3.65 = 1.37
8                 (8 - 7)/3.65 = 0.27
9                 (9 - 7)/3.65 = 0.55
3                 (3 - 7)/3.65 = -1.10
--------------------------------------------------------------
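The table can be reproduced with Python's standard `statistics` module. Note one assumption made explicit here: the value 3.65 corresponds to the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

data = [4, 3, 10, 12, 8, 9, 3]

m = mean(data)    # 7
s = stdev(data)   # sample standard deviation, about 3.65

# z-score = (value - mean) / standard deviation, rounded as in the table
z_scores = [round((x - m) / s, 2) for x in data]

print(m, round(s, 2))  # 7 3.65
print(z_scores)        # [-0.82, -1.1, 0.82, 1.37, 0.27, 0.55, -1.1]
```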

Interpretation of z-scores
A z-score measures the distance of a data value from the mean in units of standard deviations.
A z-score of 1 means that the data value is 1 standard deviation above the mean.
A z-score of -1.2 means that the data value is 1.2 standard deviations below the mean.
Regardless of the direction, the farther a data value is from the mean, the more unusual it is.
A z-score of -1.3 is more unusual than a z-score of 1.2.

How to use z-scores?
A college admissions committee is looking at the files of two candidates, one with a total SAT score of 1500 and another with an ACT score of 22. Which candidate scored better?
SAT score: mean = 1600, std. dev. = 500.
ACT score: mean = 23, std. dev. = 6.
SAT score 1500 has z-score = (1500 - 1600)/500 = -0.2.
ACT score 22 has z-score = (22 - 23)/6 = -0.17.
The ACT score of 22 is better than the SAT score of 1500.

Which is more unusual?
A. A 58 in. tall woman: z-score = (58 - 63.6)/2.5 = -2.24.
B. A 64 in. tall man: z-score = (64 - 69)/2.8 = -1.79.
C. They are the same.
Heights of adult women have a mean of 63.6 in. and a std. dev. of 2.5 in.
Heights of adult men have a mean of 69.0 in. and a std. dev. of 2.8 in.

Using z-scores to solve problems
An example using height data and the U.S. Army and U.S. Marine Corps height requirements.
Question: Are the height restrictions set by the U.S. Army and the U.S. Marine Corps more restrictive for men or for women, or are they roughly the same?

Data from a National Health Survey
Heights of adult women have a mean of 63.6 in. and a standard deviation of 2.5 in.
Heights of adult men have a mean of 69.0 in. and a standard deviation of 2.8 in.

Height Restrictions     Men Minimum    Women Minimum
--------------------------------------------------------------
U.S. Army               60 in          58 in
U.S. Marine Corps       64 in          58 in
--------------------------------------------------------------

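Heights of adult men have a mean of 69.0 in. and a standard deviation of 2.8 in.
Heights of adult women have a mean of 63.6 in. and a standard deviation of 2.5 in.

                        Men Minimum                               Women Minimum
--------------------------------------------------------------------------------------------------------
U.S. Army               60 in, z-score = -3.21, less restrictive  58 in, z-score = -2.24, more restrictive
U.S. Marine Corps       64 in, z-score = -1.79, more restrictive  58 in, z-score = -2.24, less restrictive
--------------------------------------------------------------------------------------------------------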
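The comparison in this table can be sketched in Python. The dictionary names are made up for this illustration. The reading "a higher (less negative) z-score for the minimum is more restrictive" assumes that men's and women's heights have distributions of similar shape, so that a cutoff closer to the mean excludes a larger share of people.

```python
# (mean, standard deviation) of adult heights in inches, from the survey data above
POPULATION = {"men": (69.0, 2.8), "women": (63.6, 2.5)}

# Minimum height requirements in inches
MINIMUM = {
    "U.S. Army": {"men": 60, "women": 58},
    "U.S. Marine Corps": {"men": 64, "women": 58},
}

def z_score(value, mean, sd):
    return (value - mean) / sd

results = {}
for service, minimums in MINIMUM.items():
    scores = {sex: round(z_score(h, *POPULATION[sex]), 2) for sex, h in minimums.items()}
    # The group whose cutoff is closest to its own mean loses the most people.
    stricter_for = max(scores, key=scores.get)
    results[service] = (scores, stricter_for)
    print(service, scores, "more restrictive for", stricter_for)
```

Under these assumptions the Army minimum comes out more restrictive for women and the Marine Corps minimum more restrictive for men, matching the table.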

Effect of Standardization
Standardization into z-scores does not change the shape of the histogram.
Standardization into z-scores changes the center of the distribution by making the mean 0.
Standardization into z-scores changes the spread of the distribution by making the standard deviation 1.
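A quick numerical check of these three claims, using the seven data values from the z-score example in these notes. This is only a sketch: the mean and standard deviation of the z-scores come out as 0 and 1 up to floating-point rounding, and "same shape" is checked here only in the limited sense that the ordering of the values is unchanged.

```python
from statistics import mean, stdev

data = [4, 3, 10, 12, 8, 9, 3]
m, s = mean(data), stdev(data)
z = [(x - m) / s for x in data]

# Center moves to 0 and spread becomes 1 (up to rounding error).
print(abs(mean(z)) < 1e-12)        # True
print(abs(stdev(z) - 1) < 1e-12)   # True

# Standardizing is a shift and a rescale, so the order of the values is preserved.
rank = lambda values: sorted(range(len(values)), key=lambda i: values[i])
print(rank(data) == rank(z))       # True
```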

Z-score and Empirical Rule
When data are bell-shaped, the z-scores of the data values follow the empirical rule.

Outlier detection with z-score
The empirical rule tells us that if the data are mound-shaped, then almost all the data points lie within 3 standard deviations of the mean. So an observation whose z-score has an absolute value larger than 3 can be considered an outlier.

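2004 Olympics Women's Heptathlon
Austra Skujyte (Lithuania): Shot Put = 16.40 m, Long Jump = 6.30 m.
Carolina Kluft (Sweden): Shot Put = 14.77 m, Long Jump = 6.78 m.

                          Shot Put    Long Jump
--------------------------------------------------------------
Mean (all contestants)    13.29 m     6.16 m
Std. Dev.                 1.24 m      0.23 m
n                         28          26
--------------------------------------------------------------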

Which performance was better?
A. Skujyte's shot put: z-score of Skujyte's shot put = 2.51.
B. Kluft's long jump: z-score of Kluft's long jump = 2.70.
C. Both were the same.

Based on shot put and long jump, whose performance was better?
A. Skujyte's. z-scores: shot put = 2.51, long jump = 0.61. Total z-score = (2.51 + 0.61) = 3.12.
B. Kluft's. z-scores: shot put = 1.19, long jump = 2.70. Total z-score = (1.19 + 2.70) = 3.89.
C. Both were the same.
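The z-scores and totals in the two heptathlon questions can be recomputed from the means and standard deviations given above. This is a minimal sketch with illustrative variable names. Adding z-scores across events treats the two events as equally important, which is the convention these slides use.

```python
MEAN = {"shot put": 13.29, "long jump": 6.16}   # metres, all contestants
SD = {"shot put": 1.24, "long jump": 0.23}

MARKS = {
    "Skujyte": {"shot put": 16.40, "long jump": 6.30},
    "Kluft": {"shot put": 14.77, "long jump": 6.78},
}

z = {}
totals = {}
for athlete, marks in MARKS.items():
    # Standardize each mark against the field for that event.
    z[athlete] = {event: round((mark - MEAN[event]) / SD[event], 2)
                  for event, mark in marks.items()}
    totals[athlete] = round(sum(z[athlete].values()), 2)

print(z["Skujyte"], totals["Skujyte"])  # {'shot put': 2.51, 'long jump': 0.61} 3.12
print(z["Kluft"], totals["Kluft"])      # {'shot put': 1.19, 'long jump': 2.7} 3.89
```

Kluft's long jump has the largest single z-score (2.70), and her total of 3.89 exceeds Skujyte's 3.12.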