Chapter 15 Sampling Distribution Models

Similar documents
STA Why Sampling? Module 6 The Sampling Distributions. Module Objectives

Chapter 18. Sampling Distribution Models. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Sampling Distribution Models. Chapter 17

Sampling Distribution Models. Central Limit Theorem

Chapter 18. Sampling Distribution Models /51

From the Data at Hand to the World at Large

STA Module 8 The Sampling Distribution of the Sample Mean. Rev.F08 1

( ) P A B : Probability of A given B. Probability that A happens

CHAPTER 7. Parameters are numerical descriptive measures for populations.

ACMS Statistics for Life Sciences. Chapter 13: Sampling Distributions

Lecture 7: Confidence interval and Normal approximation

Chapter 18. Sampling Distribution Models. Bin Zou STAT 141 University of Alberta Winter / 10

Test 3 SOLUTIONS. x P(x) xp(x)

Chapter 8: Sampling Distributions. A survey conducted by the U.S. Census Bureau on a continual basis. Sample

Chapter 18: Sampling Distribution Models

Probability and Probability Distributions. Dr. Mohammed Alahmed

Chapitre 3. 5: Several Useful Discrete Distributions

Chapter 7: Theoretical Probability Distributions Variable - Measured/Categorized characteristic

What Is a Sampling Distribution? DISTINGUISH between a parameter and a statistic

Chapter 6. Estimates and Sample Sizes

Using Dice to Introduce Sampling Distributions Written by: Mary Richardson Grand Valley State University

Carolyn Anderson & YoungShil Paek (Slide contributors: Shuai Wang, Yi Zheng, Michael Culbertson, & Haiyan Li)

Chapter 6: SAMPLING DISTRIBUTIONS

Exam #2 Results (as percentages)

Chapter 22. Comparing Two Proportions. Bin Zou STAT 141 University of Alberta Winter / 15

Each trial has only two possible outcomes success and failure. The possible outcomes are exactly the same for each trial.

Business Statistics: A Decision-Making Approach 6 th Edition. Chapter Goals

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 9.1-1

Chapter 6 The Standard Deviation as a Ruler and the Normal Model

Chapter 9 Inferences from Two Samples

Chapter 18 Sampling Distribution Models

Elementary Statistics

Chapter 7: Sampling Distributions

Statistic: a that can be from a sample without making use of any unknown. In practice we will use to establish unknown parameters.

3 Conditional Probability

DSST Principles of Statistics

Chapter 22. Comparing Two Proportions 1 /29

MTH135/STA104: Probability

Chapter 5, 6 and 7: Review Questions: STAT/MATH Consider the experiment of rolling a fair die twice. Find the indicated probabilities.

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?

7.1: What is a Sampling Distribution?!?!

Chapter 18: Sampling Distributions

STA Module 10 Comparing Two Proportions

The area under a probability density curve between any two values a and b has two interpretations:

*Karle Laska s Sections: There is no class tomorrow and Friday! Have a good weekend! Scores will be posted in Compass early Friday morning

LECTURE 12 CONFIDENCE INTERVAL AND HYPOTHESIS TESTING

Review. A Bernoulli Trial is a very simple experiment:

The Central Limit Theorem

A proportion is the fraction of individuals having a particular attribute. Can range from 0 to 1!

appstats8.notebook October 11, 2016

Binomial and Poisson Probability Distributions

Two Sample Problems. Two sample problems

Probability calculus and statistics

Chapter. Objectives. Sampling Distributions

green green green/green green green yellow green/yellow green yellow green yellow/green green yellow yellow yellow/yellow yellow

The Central Limit Theorem

Chapter 18 Summary Sampling Distribution Models

What is a parameter? What is a statistic? How is one related to the other?

AP Statistics Review Ch. 7

QUIZ 4 (CHAPTER 7) - SOLUTIONS MATH 119 SPRING 2013 KUNIYUKI 105 POINTS TOTAL, BUT 100 POINTS = 100%

CHAPTER 18 SAMPLING DISTRIBUTION MODELS STAT 203

3/30/2009. Probability Distributions. Binomial distribution. TI-83 Binomial Probability

Problems Pages 1-4 Answers Page 5 Solutions Pages 6-11

Math 243 Section 3.1 Introduction to Probability Lab

CHAPTER 9, 10. Similar to a courtroom trial. In trying a person for a crime, the jury needs to decide between one of two possibilities:

CHAPTER 14 THEORETICAL DISTRIBUTIONS

Chapter 22. Comparing Two Proportions 1 /30

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Math/Stat 3850 Exam 1

Math/Stat 352 Lecture 10. Section 4.11 The Central Limit Theorem

Population 1 Population 2

MATH 1150 Chapter 2 Notation and Terminology

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics

Review of Multiple Regression

Chapter 26: Comparing Counts (Chi Square)

ACM 116: Lecture 2. Agenda. Independence. Bayes rule. Discrete random variables Bernoulli distribution Binomial distribution

You may use your calculator and a single page of notes. The room is crowded. Please be careful to look only at your own exam.

Ch. 7 Statistical Intervals Based on a Single Sample

STT 315 This lecture is based on Chapter 2 of the textbook.

Point Estimation and Confidence Interval

One-sample categorical data: approximate inference

7 Estimation. 7.1 Population and Sample (P.91-92)

MAT2377. Rafa l Kulik. Version 2015/November/23. Rafa l Kulik

Practice Questions: Statistics W1111, Fall Solutions

The Central Limit Theorem

Sections 7.1 and 7.2. This chapter presents the beginning of inferential statistics. The two major applications of inferential statistics

Population Variance. Concepts from previous lectures. HUMBEHV 3HB3 one-sample t-tests. Week 8

Thus, P(F or L) = P(F) + P(L) - P(F & L) = = 0.553

Lecture Slides. Elementary Statistics. Tenth Edition. by Mario F. Triola. and the Triola Statistics Series

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book.

STAT 200 Chapter 1 Looking at Data - Distributions

green green green/green green green yellow green/yellow green yellow green yellow/green green yellow yellow yellow/yellow yellow

Sampling, Frequency Distributions, and Graphs (12.1)

success and failure independent from one trial to the next?

STAT 201 Assignment 6

Continuous Probability Distributions

Section 9 1B: Using Confidence Intervals to Estimate the Difference ( p 1 p 2 ) in 2 Population Proportions p 1 and p 2 using Two Independent Samples

Math 10 - Compilation of Sample Exam Questions + Answers

Margin of Error for Proportions

Transcription:

Chapter 15 Sampling Distribution Models 1

15.1 Sampling Distribution of a Proportion 2

Sampling About Evolution According to a Gallup poll, 43% believe in evolution. Assume this is true of all Americans. If many surveys were done of 1007 Americans, we could calculate the sample proportion for each. The histogram shows the distribution of a simulation of 2000 sample proportions. The distribution of all possible sample proportions from samples with the same sample size is called the sampling distribution. 3

Sampling Distributions Sampling Distribution for Proportions Symmetric Unimodal Centered at p The sampling distribution follows the Normal model. What does the sampling distribution tell us? The sampling distribution allows us to make statements about where we think the corresponding population parameter is and how precise these statements are likely to be. 4

Sampling Distribution for Smoking 18% of US adults smoke. How much would we expect the proportion of smokers in a sample of size 1000 to vary from sample to sample? A histogram was drawn to display the results of a simulation. The mean is 0.18 = the population proportion. The standard deviation was calculated as 0.0122. Normal: 68-95-99.7 rule works. 95% of all proportions are within 0.0244 of the mean. This is very close to the true value: 95.41% 5

Thinking about Many Samples Distribution of the Sample Variable was the answer to the survey question or the result of an experiment. Proportion is a fixed value that comes from the one sample. Sampling Distribution Variable is the proportion that comes from the entire sample. Many proportions that differ from one to another, each coming from a different sample. 6

Mean and Standard Deviation Sampling Distribution for Proportions Mean = p ˆ npq σ( p) = = n pq n N p, pq n 7

The Normal Model for Evolution Population: p = 0.43, n = 1007. Sampling Distribution: Mean = 0.43 Standard deviation = 0.43 0.57 σ( pˆ ) = 0.0156 1007 8

Smokers Revisited p = 0.18, n = 1000 Standard deviation = 0.18 0.82 σ( pˆ ) = 0.0121 1000 Standard deviation from simulation: 0.0122 The sample-to-sample standard deviation is called the standard error or sampling variability. The standard error is not an error, since no error has been made. 9

15.2 When Does the Normal Model Work? Assumptions and Conditions 10

When Does the Normal Model Work? Success Failure Condition: np 10, nq 10 There must be at least 10 expected successes and failures. Independent trials: Check for the Randomization Condition. 10% Condition: Sample size less than 10% of the population size 11

Understanding Health Risks 22% of US women have a BMI that is above the 25 healthy mark. Only 31 of the 200 randomly chosen women from a large college had a BMI above 25. Is this proportion unusually small? Randomization Condition: Yes, the women were randomly chosen. 10% Condition: For a large college, this is ok. Success Failure Condition: 31 10, 169 10 Yes, the Normal model can be used. 12

Understanding Health Risks: n = 200, p = 0.22, x = 31 ˆ 31 = = 0.155, = 0.22, 200 p p SD pˆ z 0.155 0.22 0.029 = 2.24 0.22 0.78 = 0.029 200 68-95-99.7 Rule: Values 2 SD below the mean occur less than 2.5% of the time. Perhaps this college has a higher proportion of healthy women, or women who lie about their weight. 13

Enough Lefty Seats? 13% of all people are left handed. A 200-seat auditorium has 15 lefty seats. What is the probability that there will not be enough lefty seats for a class of 90 students? Think Plan: 15/90 0.167, Want Model: P pˆ > 0.167 Independence Assumption: With respect to lefties, the students are independent. 10% Condition: This is out of all people. Success/Failure Condition: 15 10, 75 10 14

Enough Lefty Seats? Think Model: p = 0.13, SD pˆ 0.13 0.87 = 0.035 90 The model is: N(0.13, 0.035) Show Plot 0.167 0.13 Mechanics: z = 1.06 0.035 P( pˆ > 0.167) = P( z >1.06) 0.1446 15

Enough Lefty Seats? Tell Conclusion: There is about a 14.5% chance that there will not be enough seats for the left handed students in the class. 16

15.3 The Sampling Distribution of Other Statistics 17

The Sampling Distribution for Others There is a sampling distribution for any statistic, but the Normal model may not fit. Below are histograms showing results of simulations of sampling distributions. 18

The Sampling Distribution For Others The medians seem to be approximately Normal. The variances seem somewhat skewed right. The minimums are all over the place. In this course, we will focus on the proportions and the means. 19

Sampling Distribution of the Means For 1 die, the distribution is Uniform. For 3 dice, the sampling distribution for the means is closer to Normal. For 20 dice, the sampling distribution for the means is very close to normal. The standard deviation is much smaller. 20

15.4 The Central Limit Theorem: The Fundamental Theorem of Statistics 21

The Central Limit Theorem The Central Limit Theorem The sampling distribution of any mean becomes nearly Normal as the sample size grows. Requirements Independent Randomly collected sample The sampling distribution of the means is close to Normal if either: Large sample size Population close to Normal 22

How Normal? 23

Population Distribution and Sampling Distribution of the Means Population Distribution Sampling Distribution for the Means Normal Normal (any sample size) Uniform Bimodal Skewed Normal (large sample size) Normal (larger sample size) Normal (larger sample size) 24

Binomial Distributions and the Central Limit Theorem Consider a Bernoulli trial as quantitative: Success = 1 Failure = 0 The mean of many trials is just ˆp. This distribution of a single trial is far from Normal. By the Central Limit Theorem, the Binomial distribution is approximately normal for large sample sizes. 25

Standard Deviation of the Means Which would be more unusual: a student who is 6 9 tall in the class or a class that has mean height of 6 9? The sample means have a smaller standard deviation than the individuals. The standard deviation of the sample means goes down by the square root of the sample size: SD y = σ n 26

The Sampling Distribution Model for a Mean When a random sample is drawn from a population with mean m and standard deviation s, the sampling distribution has: Mean: m σ Standard Deviation: n For large sample size, the distribution is approximately normal regardless of the population the random sample comes from. The larger the sample size, the closer to Normal. 27

Low BMI Revisited The 200 college women with the low BMI reported a mean weight of only 140 pounds. For all 18-year-old women, m = 143.74 and s = 51.54. Does the mean weight seem exceptionally low? Randomization Condition: The women were a random sample with weights independent. Sample size Condition: Weights are approximately Normal. 200 is large enough. 28

Low BMI Revisited Mean and Standard Deviation of the sampling distribution μ( y) =143.7 The 68-95-99.7 rule suggests that the mean is low but not that unusual. Such variability is not extraordinary for samples of this size. σ 51.54 SD( y ) = = 3.64 n 200 29

Too Heavy for the Elevator? Mean weight of US men is 190 lb, the standard deviation is 59 lb. An elevator has a weight limit of 10 persons or 2500 lb. Find the probability that 10 men in the elevator will overload the weight limit. Think Plan: 10 over 2500 lb same as their mean over 250. Model: Independence Assumption: Not random, but probably independent. Sample Size Condition: Weight approx. Normal. 30

Too Heavy for the Elevator Think Model: m = 190, s = 59 By the CLT, the sampling distribution of approximately Normal: σ 59 μ( y ) =190, SD( y ) = = 18.66 n 10 Show Plot: y is 31

Too Heavy for the Elevator? Mechanics: Tell y μ 250 190 z = = 3.21 SD ( y ) 18.66 P( y > 250) P( z >3.21) 0.0007 Conclusion: There is only a 0.0007 chance that the 10 men will exceed the elevator s weight limit. 32

15.5 Sampling Distributions: A Summary 33

Sample Size and Standard Deviation SD( y ) = σ n SD( pˆ ) = pq n Larger sample size Smaller standard deviation Multiply n by 4 Divide the standard deviation by 2. Need a sample size of 100 to reduce the standard deviation by a factor of 10. 34

Billion Dollar Misunderstanding Bill and Melinda Gates Foundation found that the 12% of the top 50 performing schools were from the smallest 3%. They funded a transformation to small schools. Small schools have a smaller n, thus a higher standard deviation. y Likely to see both higher and lower means. 18% of the bottom 50 were also from the smallest 3%. 35

Distribution of the Sample vs. the Sampling Distribution Don t confuse the distribution of the sample and the sampling distribution. If the population s distribution is not Normal, then the sample s distribution will not be normal even if the sample size is very large. For large sample sizes, the sampling distribution, which is the distribution of all possible sample means from samples of that size, will be approximately Normal. 36

Two Truths About Sampling Distributions Sampling distributions arise because samples vary. Each random sample will contain different cases and, so, a different value of the statistic. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for proportions and means. This is especially important when we do not know the population s distribution. 37

38

What Can Go Wrong? Don t confuse the sampling distribution with the distribution of the sample. A histogram of the data shows the sample s distribution. The sampling distribution is more theoretical. Beware of observations that are not independent. The CLT fails for dependent samples. A good survey design can ensure independence. Watch out for small samples from skewed or bimodal populations. The CLT requires large samples or a Normal population or both. 39