The Empirical Rule, z-scores, and the Rare Event Approach

Overview

The Empirical Rule, z-scores, and the Rare Event Approach
Dr Tom Ilvento, Department of Food and Resource Economics

Look at Chebyshev's Rule and the Empirical Rule
Explore some applications of the Empirical Rule
How to calculate and use z-scores
Introducing the Rare Event strategy for inference

Interpreting the Standard Deviation

We can use the standard deviation to express the proportion of cases that might fall within one or two standard deviations of the mean.
Two rules help: Chebyshev's Rule and the Empirical Rule.

Chebyshev's Rule

Based on a mathematical theorem that holds for any data, regardless of the distribution of the variable.
The percentage of observations that fall within k standard deviations of the mean (for k > 1) must be at least $(1 - 1/k^2) \times 100\%$.
Example: for k = 2, $(1 - 1/2^2) \times 100\% = 75\%$.
At least 3/4 of the measurements will fall within ± 2 standard deviations of the mean.
At least 8/9 (88.89%) of the measurements will fall within ± 3 standard deviations of the mean.
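As a small illustration of Chebyshev's Rule, the bound can be written as a one-line function. This is only a sketch; the function name `chebyshev_min_share` is made up for this example and does not come from the slides.

```python
def chebyshev_min_share(k: float) -> float:
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the formula gives 0 or a negative number, which says nothing useful.
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_min_share(k):.2%}")
# k = 2: at least 75.00%
# k = 3: at least 88.89%
```

Because the bound holds for any distribution, it is much looser than the Empirical Rule percentages, which apply only to mound-shaped, symmetrical data.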

The Empirical Rule

Based on a symmetrical, mound-shaped distribution where the mean, median, and mode are similar.
The EPA mpg data fits this description.
[Figure: A Symmetrical Mound-Shaped Distribution]

Empirical Rule

Approximately 68% of the measurements will fall within ± 1 standard deviation of the mean.
Approximately 95% of the measurements will fall within ± 2 standard deviations of the mean.
Approximately 99.7% of the measurements will fall within ± 3 standard deviations of the mean.

MPG Car Data

Statistic                   MPG
Mean                        36.99
Standard Error              0.24
Median                      37.00
Mode                        37.00
Standard Deviation          2.42
Sample Variance             5.85
Coefficient of Variation    6.54%
Kurtosis                    0.77
Skewness                    0.05
Range                       14.90
Minimum                     30.00
Maximum                     44.90
Sum                         3699.40
Count                       100

Stem-and-Leaf Display for MPG
Stem unit: Whole number

30 | 0
31 | 8
32 | 5 7 9 9
33 | 1 2 6 8 9 9
34 | 0 2 4 5 8 8
35 | 0 1 2 3 5 6 6 7 8 9 9
36 | 0 1 2 3 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9
37 | 0 0 0 0 1 1 1 2 2 3 3 4 4 5 6 6 7 7 8 9 9
38 | 0 1 2 2 3 4 5 6 7 8
39 | 0 0 3 4 5 7 8 9
40 | 0 1 2 3 5 5 7
41 | 0 0 2
42 | 1
43 |
44 | 9
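The Empirical Rule can be checked against the MPG data itself. The sketch below rebuilds the 100 values from the stem-and-leaf display (stem = whole mpg, leaf = tenths) and counts how many fall within 1, 2, and 3 standard deviations of the mean, using the mean and standard deviation reported in the summary table. The names `STEM_LEAF` and `share_within` are illustrative only.

```python
# Leaves copied from the stem-and-leaf display: stem is whole mpg, each digit is a tenth.
STEM_LEAF = {
    30: "0",
    31: "8",
    32: "5799",
    33: "126899",
    34: "024588",
    35: "01235667899",
    36: "01233445566777888999",
    37: "000011122334456677899",
    38: "0122345678",
    39: "00345789",
    40: "0123557",
    41: "002",
    42: "1",
    43: "",
    44: "9",
}

values = [stem + int(leaf) / 10
          for stem, leaves in STEM_LEAF.items()
          for leaf in leaves]

MEAN, SD = 36.99, 2.42  # summary statistics reported for the MPG data


def share_within(k: int) -> float:
    """Share of MPG values within k standard deviations of the mean."""
    lo, hi = MEAN - k * SD, MEAN + k * SD
    return sum(lo <= v <= hi for v in values) / len(values)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.0%}")
```

On these 100 values the shares come out at 68%, 96%, and 99%, close to the 68%, 95%, and 99.7% the Empirical Rule predicts for a mound-shaped, symmetrical distribution.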

MPG Data

We would expect about 68% of the values to fall between 36.99 ± 2.42, or 34.57 to 39.41.
We would expect about 95% of the values to fall between 36.99 ± 2*2.42, or 32.15 to 41.83.
We would expect about 99.7% of the values to fall between 36.99 ± 3*2.42, or 29.73 to 44.25.

Auto Batteries Example

Grade A Battery:
Average life is 60 months.
Guarantee is for 36 months.
Standard deviation s = 10 months.
Frequency distribution is mound-shaped and symmetrical.

What percent of the Grade A Batteries will last more than 50 months?

Start by finding how many standard deviations 50 months is from the mean.
Draw it out.
Figure out the probability from the Empirical Rule.

50 months is one standard deviation to the left of the mean: 60 - 10 = 50.
The interval from 50 to 60 months represents 34% of the cases, because ± 1 standard deviation = 68%, so one standard deviation on one side = 34%.
The area to the right of the mean (60 months or more) represents 50% of the cases.
Answer: 34% + 50% = 84%.
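The battery reasoning above can be sketched as a small lookup on the Empirical Rule percentages. This is a simplified illustration that only handles cutoffs lying a whole number of standard deviations from the mean; the names `WITHIN`, `share_below`, and `share_above` are invented for the example.

```python
# Share of cases within +/- k standard deviations, per the Empirical Rule.
WITHIN = {0: 0.0, 1: 0.68, 2: 0.95, 3: 0.997}


def share_below(x: float, mean: float, sd: float) -> float:
    """Approximate share of cases below x for a mound-shaped, symmetrical distribution."""
    k = (x - mean) / sd
    if k != round(k) or abs(k) > 3:
        raise ValueError("this sketch only handles cutoffs 0 to 3 whole SDs from the mean")
    half = WITHIN[abs(int(k))] / 2  # area between the mean and the cutoff
    return 0.5 + half if k > 0 else 0.5 - half


def share_above(x: float, mean: float, sd: float) -> float:
    return 1 - share_below(x, mean, sd)


# Grade A battery: mean life 60 months, standard deviation 10 months.
print(f"more than 50 months: {share_above(50, 60, 10):.1%}")  # 34% + 50% = 84%
print(f"less than 40 months: {share_below(40, 60, 10):.1%}")  # 50% - 47.5% = 2.5%
```

The second call applies the same logic to a cutoff two standard deviations below the mean.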

What percent of the Grade A Batteries will last less than 40 months?

Start by finding how many standard deviations 40 months is from the mean.
Draw it out.
Figure out the probability.

40 months is 2 standard deviations below the mean, and ± 2 standard deviations = 95% of the cases.
I am interested in the part less than 40 months.
Half of the 95% for two standard deviations, the left-hand side of the distribution, is equal to 47.5%.
But I want the part in the left-hand tail of the distribution, which is 50% - 47.5% = 2.5%.
So it represents 2.5% of the cases.

Suppose your battery lasted 37 months. What could you infer about the manufacturer's claim?

37 months is more than 2 standard deviations below the mean.
Less than 2.5% of the batteries would fail within 37 months if the claims were true.
It's possible you just got a bad one. Do you feel lucky? Or unlucky?

Z-Scores

This is a method of transforming the data to reflect the relative standing of a value.
For a value X, subtract the mean and divide by the standard deviation:

$z_i = \frac{x_i - \bar{x}}{s}$

The result represents the distance between a given measurement X and the mean, expressed in standard deviations.
A positive z-score means that the measurement is larger than the mean.
A negative z-score means that it is smaller than the mean.
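A minimal sketch of the z-score calculation, applied to the 37-month battery from the example above. The function name `z_score` is just a label chosen for this illustration.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Distance of x from the mean, expressed in standard deviations."""
    return (x - mean) / sd


# Battery that lasted 37 months; claimed mean life 60 months, s = 10 months.
z = z_score(37, mean=60, sd=10)
print(z)  # -2.3, i.e. 2.3 standard deviations below the mean
```

A z-score of -2.3 is consistent with the statement that 37 months is more than 2 standard deviations below the mean.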

z-score example from MPG data

Mean = 36.99, s = 2.42
One value is 33.2.
The z-score is (33.2 - 36.99)/2.42 = -1.57.
This value is 1.57 standard deviations below the mean.

Create a z-score for the values 1.894, 2.05, and 2.11 for a variable with mean = 2.0007 and s = 0.0446:

z = (1.894 - 2.0007)/0.0446 = -2.392
z = (2.050 - 2.0007)/0.0446 = 1.105
z = (2.110 - 2.0007)/0.0446 = 2.45

z-scores

Suppose we convert an entire variable to z-scores.
This means creating a new variable by taking each value, subtracting the mean, and dividing by the standard deviation.
This is called a data transformation.
The new variable would have mean = 0 and standard deviation = 1.

Empirical Rule and z-scores

Approximately 68% of the measurements will have a z-score between -1 and 1.
Approximately 95% of the measurements will have a z-score between -2 and 2.
Almost all the measurements (99.7%) will have a z-score between -3 and 3.
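The worked z-scores and the "mean = 0, standard deviation = 1" property can both be checked with Python's standard `statistics` module. The first part reuses the mean and s given in the slides; the second part standardizes a small made-up sample (the five values in `sample` are hypothetical, chosen only to show the transformation).

```python
import statistics

# Part 1: reproduce the worked z-scores (mean and s as given in the slides).
MEAN, S = 2.0007, 0.0446
worked = {x: round((x - MEAN) / S, 3) for x in (1.894, 2.05, 2.11)}
print(worked)  # z-scores for 1.894, 2.05, and 2.11

# Part 2: standardize a whole variable (hypothetical sample values).
sample = [1.894, 1.95, 2.0, 2.05, 2.11]
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

z_values = [(x - sample_mean) / sample_sd for x in sample]

# Up to floating-point rounding, the new variable has mean 0 and standard deviation 1.
print(round(statistics.mean(z_values), 10), round(statistics.stdev(z_values), 10))
```

The property in Part 2 holds for any sample with a nonzero standard deviation, not just mound-shaped data; the transformation changes the scale, not the shape.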

A data example

A female bank employee believes her salary is low as a result of sex discrimination.
Her salary is $27,000.
She collects information on the salaries of her male counterparts. Their mean salary is $34,000 with a standard deviation of $2,000.
Does this information support her claim?

How to think about this problem?

What is her salary in relation to the mean male salary?
To find out, calculate a z-score for her salary to see how far below the mean her salary is in standard deviations:

$z = \frac{\$27{,}000 - \$34{,}000}{\$2{,}000} = -3.5$

Her salary is 3.5 standard deviations below the mean salary of her male counterparts.
If her salary is part of the same distribution as the salaries of the males in her bank, a z-score of -3.5 would be very rare.

The Rare-Event Approach

This approach is called the Rare-Event Approach.
Express the problem in terms of a known distribution (here, the distribution of male salaries).
Then see how rare it is to observe the value you have (here, the woman's salary).
Based on a z-score of -3.5, we might doubt that her salary comes from the same distribution, and we might conclude there is something different about her salary.
One conclusion could be discrimination.
But it could also be related to performance, time on the job, or some other factors.

What if the woman's salary were only one standard deviation below the mean?
That would not be such an unusual thing.
If the distribution were symmetrical and mound-shaped, we would expect 34% of salaries to fall between the mean and one standard deviation below it.
More importantly, 16% would be more than one standard deviation below the mean.
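To put a rough number on "very rare", the sketch below computes the salary z-score and two measures of how unusual it is. The first is Chebyshev's cap, which holds for any distribution. The second is a lower-tail probability that assumes male salaries are normally distributed; that normal model is an assumption added here for illustration, not something the example states.

```python
from statistics import NormalDist

salary, male_mean, male_sd = 27_000, 34_000, 2_000

z = (salary - male_mean) / male_sd
print(z)  # -3.5

# Chebyshev: at most 1/k^2 of ANY distribution lies k or more SDs from the mean
# (both tails combined).
chebyshev_cap = 1 / z**2
print(f"Chebyshev cap on both tails: {chebyshev_cap:.1%}")

# ASSUMPTION: salaries follow a normal distribution. Under that model, the share
# of salaries at or below z = -3.5 is the standard normal CDF evaluated at z.
normal_tail = NormalDist().cdf(z)
print(f"Normal-model lower tail: {normal_tail:.4%}")
```

Either way the observed salary sits far out in the tail, which is what motivates doubting that it comes from the same distribution.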

The Rare-Event Approach

We hypothesize a frequency distribution to describe a population of measurements.
We draw a sample from the population.
We compare the sample statistic to the hypothesized frequency distribution.
We then see how likely or unlikely it is that the sample came from the hypothesized distribution.
The decision focuses on how many standard deviations our sample statistic is from the hypothesized value.

A definition of an outlier

One way to determine whether a value is an outlier is to calculate its z-score.
If a value is more than three standard deviations from the mean, it is relatively far away.
Later, we will express this in a probability framework via the normal or t-distribution.
For now we will say that any value that is more than three standard deviations from the mean is unusual, and an outlier.

Summary

This concludes the basic description section of the course.
From graphs to central tendency to variability, all are ways to describe data.
The z-score approach is a way to express a data point as being so many standard deviations from the mean.
The rare event approach is a way to start to make inferences. Do you feel lucky?
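The three-standard-deviation outlier rule described above can be sketched as follows, using the MPG mean and standard deviation from earlier in the lecture and a handful of values taken from the MPG data. The helper name `is_outlier` and the choice of which values to test are illustrative.

```python
MEAN, SD = 36.99, 2.42  # MPG summary statistics from the lecture


def is_outlier(x: float, mean: float, sd: float, cutoff: float = 3.0) -> bool:
    """Flag a value whose z-score is more than `cutoff` standard deviations from the mean."""
    return abs((x - mean) / sd) > cutoff


candidates = [30.0, 33.2, 42.1, 44.9]  # a few values from the MPG data
flagged = [x for x in candidates if is_outlier(x, MEAN, SD)]
print(flagged)  # [44.9]
```

The maximum, 44.9 mpg, has a z-score of about 3.27 and is flagged, while the minimum, 30.0 mpg, has a z-score of about -2.89 and is not.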