Sampling Distribution Models. Central Limit Theorem


Thought Questions 1. 40% of large population disagree with new law. In parts a and b, think about role of sample size. a. If randomly sample 10 people, will exactly four (40%) disagree with law? Surprised if only two in sample disagreed? How about if none disagreed? b. If randomly sample 1000 people, will exactly 400 (40%) disagree with law? Surprised if only 200 in sample disagreed? How about if none disagreed?

The Diversity of Samples from the Same Population Working Backward from Samples to Populations Start with question about population. Collect a sample from the population, measure variable. Answer question of interest for sample. With statistics, determine how close such an answer, based on a sample, would tend to be from the actual answer for the population. Understanding Dissimilarity among Samples We need to understand what kind of differences we should expect to see in various samples from the same population

What to Expect of Sample Proportions A slice of the population 40% of population carry a certain gene Do Not Carry Gene =, Do Carry Gene = X

What to Expect of Sample Proportions Possible Samples
Sample 1: Proportion with gene = 12/25 = 0.48 = 48%
Sample 2: Proportion with gene = 9/25 = 0.36 = 36%
Sample 3: Proportion with gene = 10/25 = 0.40 = 40%
Sample 4: Proportion with gene = 7/25 = 0.28 = 28%

The Central Limit Theorem for Sample Proportions Rather than showing real repeated samples, imagine what would happen if we were to actually draw many samples. Now imagine what would happen if we looked at the sample proportions for these samples. The histogram we'd get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions. What would the histogram of all the sample proportions look like?

The Central Limit Theorem for Sample Proportions We would expect the histogram of the sample proportions to center at the true proportion, p, in the population. As far as the shape of the histogram goes, we can simulate a bunch of random samples that we didn't really draw. It turns out that the histogram is unimodal, symmetric, and centered at p. More specifically, it's an amazing and fortunate fact that a Normal model is just the right one for the histogram of sample proportions.
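The simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 40% population proportion and the sample size of 25 come from the gene example, while the function name `simulate_sample_proportions`, the 10,000 repetitions, and the fixed seed are hypothetical choices made here.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each individual carries the gene with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

props = simulate_sample_proportions(p=0.40, n=25, num_samples=10_000)
center = statistics.mean(props)    # should land near the true proportion p
spread = statistics.pstdev(props)  # sample-to-sample variability

print(f"mean of sample proportions: {center:.3f}")
print(f"SD of sample proportions:   {spread:.3f}")
```

A histogram of `props` would be expected to look roughly unimodal and symmetric around 0.40, which is the behavior the slide describes.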

Imagine, Imagine, Imagine - Predicting Election Results Imagine a research organization (R.E.) wants to know if Candidate A would win the election if held next week. They completed a well-conducted poll of 100 randomly selected voters. Imagine several other research organizations (499 of them) also completed a well-conducted poll of 100 randomly selected voters to try and answer the same question. They all conduct the polls on the same day. Imagine that the answer they are looking for is 0.53, that is Candidate A would get 53% of the vote if the election were held on the day of the polls.

Predicting Election Results
R.E.   Results
1      52/100 = .52
2      49/100 = .49
3      62/100 = .62
4      45/100 = .45
5      59/100 = .59
...and so on

[Figure: Results from 500 Polls of 100 voters. Relative frequency (0 to 0.35) against proportion in sample that would vote for Candidate A (horizontal axis from 0.2 to 0.7).]

[Figure: Results from 500 Polls of 400 voters. Relative frequency (0 to 0.4) against proportion in sample that would vote for Candidate A (horizontal axis from 0.35 to 0.65).]

[Figure: The histograms for 100 voters and 400 voters shown side by side.]

[Figure: Results from 500 Polls of 1000 voters. Relative frequency (0 to 0.3) against proportion in sample that would vote for Candidate A (horizontal axis from 0.45 to 0.57).]

[Figure: The histograms for 400 voters and 1000 voters shown side by side.]
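The polling thought experiment can be reproduced approximately by simulation. The sketch below assumes the true support of 0.53 and the 500 polls per sample size given in the slides; the names `run_polls` and `poll_sd` and the seed are hypothetical. It compares the observed spread of the simulated poll results with the model value sqrt(pq/n).

```python
import math
import random
import statistics

TRUE_P = 0.53     # true support for Candidate A, as stated in the slides
NUM_POLLS = 500   # number of imagined polling organizations

def run_polls(n_voters, num_polls=NUM_POLLS, p=TRUE_P, seed=1):
    """Simulate num_polls independent polls of n_voters each; return the proportions."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    return [
        sum(rng.random() < p for _ in range(n_voters)) / n_voters
        for _ in range(num_polls)
    ]

poll_sd = {}
for n_voters in (100, 400, 1000):
    results = run_polls(n_voters)
    poll_sd[n_voters] = statistics.pstdev(results)
    model_sd = math.sqrt(TRUE_P * (1 - TRUE_P) / n_voters)
    print(f"n={n_voters:4d}  simulated SD={poll_sd[n_voters]:.4f}  "
          f"model SD={model_sd:.4f}  "
          f"range={min(results):.2f} to {max(results):.2f}")
```

The shrinking range as n grows mirrors the narrowing horizontal axes of the three histograms.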

The Central Limit Theorem for Sample Proportions Modeling how sample proportions vary from sample to sample is one of the most powerful ideas we'll see in this course. A sampling distribution model for how a sample proportion varies from sample to sample allows us to quantify that variation and how likely it is that we'd observe a sample proportion in any particular interval. To use a Normal model, we need to specify its mean and standard deviation. We'll put µ, the mean of the Normal, at p.

The Central Limit Theorem for Sample Proportions When working with proportions, knowing the mean automatically gives us the standard deviation as well: the standard deviation we will use is $\sqrt{pq/n}$, where $q = 1 - p$. So, the distribution of the sample proportions is modeled with a probability model that is $N\left(p, \sqrt{pq/n}\right)$.

Sampling Distribution Model for Proportions Mean and Standard Deviation Let Y ~ Binom(n, p), where n is the number of trials and p is the probability of success.
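The formulas on this slide did not survive in the transcript. A standard derivation, assuming Y counts the successes in n independent trials and writing $q = 1 - p$ and $\hat{p} = Y/n$ as elsewhere in these slides, would run as follows:

```latex
\begin{align*}
E(Y) &= np, & SD(Y) &= \sqrt{npq}, \\
E(\hat{p}) &= E\!\left(\frac{Y}{n}\right) = \frac{np}{n} = p, &
SD(\hat{p}) &= \frac{SD(Y)}{n} = \frac{\sqrt{npq}}{n} = \sqrt{\frac{pq}{n}}.
\end{align*}
```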

The Central Limit Theorem for Sample Proportions A picture of what we just discussed is as follows:

The Central Limit Theorem for Sample Proportions Because we have a Normal model, for example, we know that 95% of Normally distributed values are within two standard deviations of the mean. So we should not be surprised if 95% of various polls gave results that were near the mean but varied above and below that by no more than two standard deviations. This is what we call sampling variability. The sample proportions vary from sample to sample. All possible sample proportions arrange themselves neatly under a Normal curve.

What to Expect of Sample Proportions If numerous samples of the same size are taken from a population, the frequency curve made from the proportions from the various samples will be approximately bell-shaped. In other words, the sampling distribution of possible sample proportions is approximately Normal and centered at the true population proportion, with standard deviation $\sqrt{\dfrac{(\text{true proportion})(1 - \text{true proportion})}{\text{sample size}}}$

What to Expect of Sample Proportions In reality, we don't know the true proportion. The standard error (SE) for the sampling distribution of possible sample proportions is $\sqrt{\dfrac{(\text{sample proportion})(1 - \text{sample proportion})}{\text{sample size}}}$

Assumptions and Conditions
1. Randomization Condition: The sample should be a random sample of the population.
2. 10% Condition: The sample size, n, must be no larger than 10% of the population.
3. Success/Failure Condition: The sample size has to be big enough so that both np (number of successes) and nq (number of failures) are at least 10.
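The three conditions above can be written as a small checklist function. This is a sketch only: the function name `check_proportion_conditions` and its parameters are hypothetical, and the Randomization Condition cannot be computed from numbers, so it is passed in as a judgment made by the analyst.

```python
def check_proportion_conditions(n, p, population_size, is_random_sample):
    """Return which of the three conditions for the Normal model hold.

    n                -- sample size
    p                -- assumed population proportion (q = 1 - p)
    population_size  -- size of the population being sampled
    is_random_sample -- analyst's judgment about how the sample was drawn
    """
    q = 1 - p
    return {
        "randomization": bool(is_random_sample),
        "ten_percent": n <= 0.10 * population_size,
        "success_failure": n * p >= 10 and n * q >= 10,
    }

# A poll of 100 voters with p = 0.53 from a large electorate (size assumed here).
poll_check = check_proportion_conditions(100, 0.53, 1_000_000, True)

# A sample of only 10 people with p = 0.40: np = 4 is below 10.
small_check = check_proportion_conditions(10, 0.40, 1_000_000, True)

print(poll_check)
print(small_check)
```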

A Sampling Distribution Model for a Proportion A proportion is no longer just a computation from a set of data. It is now a random variable quantity that has a probability distribution. This distribution is called the sampling distribution model for proportions. Even though we depend on sampling distribution models, we never actually get to see them. We never actually take repeated samples from the same population and make a histogram. We only imagine or simulate them.

A Sampling Distribution Model for a Proportion Still, sampling distribution models are important because they act as a bridge from the real world of data to the imaginary world of the statistic and enable us to say something about the population when all we have is data from the real world.

The Sampling Distribution Model for a Proportion Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with
Mean: Expected Value$(\hat{p}) = p$
Standard deviation: $SD(\hat{p}) = \sqrt{pq/n}$

What to Expect of Sample Proportions Example: Suppose 40% of all voters in U.S. favor candidate X. Pollsters take a sample of 2400 people. What sample proportion would be expected to favor candidate X? The sample proportion could be anything from a bell-shaped curve with mean 0.40 and standard deviation $\sqrt{\dfrac{(0.40)(1 - 0.40)}{2400}} = 0.01$
For our sample of 2400 people:
68% of sample proportions will be between 39% and 41%
95% of sample proportions will be between 38% and 42%
99.7% of sample proportions will be between 37% and 43%
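The arithmetic in this example can be checked with a short script. The helper names `proportion_sd` and `empirical_rule_intervals` are hypothetical; the values p = 0.40 and n = 2400 come from the slide.

```python
import math

def proportion_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def empirical_rule_intervals(center, sd):
    """Intervals of 1, 2, and 3 standard deviations around the center."""
    labels = {1: "68%", 2: "95%", 3: "99.7%"}
    return {labels[k]: (center - k * sd, center + k * sd) for k in (1, 2, 3)}

sd = proportion_sd(0.40, 2400)
intervals = empirical_rule_intervals(0.40, sd)

print(f"SD = {sd:.4f}")
for label, (low, high) in intervals.items():
    print(f"{label:>5} of sample proportions between {low:.2%} and {high:.2%}")
```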

Example: Do Americans Really Vote When They Say They Do? Reported in Time magazine (Nov 28, 1994): Telephone poll of 800 adults (2 days after election) 56% reported they had voted. Committee for Study of American Electorate stated only 39% of American adults had voted. Could it be the results of poll simply reflected a sample that, by chance, voted with greater frequency than general population?

Example: Do Americans Really Vote When They Say They Do? Suppose only 39% of American adults voted. We can expect sample proportions to be represented by a bell-shaped curve with mean 0.39 and standard deviation $\sqrt{\dfrac{(0.39)(1 - 0.39)}{800}} = 0.017$, or 1.7%.
68% of sample proportions will be between 37.3% and 40.7%
95% of sample proportions will be between 35.6% and 42.4%
99.7% of sample proportions will be between 33.9% and 44.1%
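One way to answer the question posed in this example is to ask how many standard deviations the reported 56% lies from the assumed 39%. The sketch below uses the slide's numbers (p = 0.39, n = 800, observed 0.56); the variable names are hypothetical.

```python
import math

assumed_p = 0.39     # proportion of adults who voted, per the Committee
observed_p = 0.56    # proportion in the poll who said they voted
n = 800              # poll size

sd = math.sqrt(assumed_p * (1 - assumed_p) / n)
z = (observed_p - assumed_p) / sd  # distance from the mean in SD units

print(f"SD of sample proportion: {sd:.4f}")
print(f"56% is {z:.1f} standard deviations above 39%")
```

Since 99.7% of sample proportions fall within three standard deviations, a result nearly ten standard deviations out is very hard to attribute to sampling variability alone under this model.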


Thought Questions 2. Mean weight of all women at large university is 135 pounds with a standard deviation of 10 pounds. a. Recalling Empirical Rule for bell-shaped curves, in what range would you expect 95% of women's weights to fall? b. If randomly sampled 10 women at university, how close do you think their average weight would be to 135 pounds? c. If sampled 1000 women, would you expect average weight to be closer to 135 pounds than for the sample of only 10 women?

What to Expect of Sample Means Example: Want to estimate average weight loss for all who attend national weight-loss clinic for 10 weeks. Unknown to us, population mean weight loss is 8 pounds and standard deviation is 5 pounds. If weight losses are approximately bell-shaped, 95% of individual weight losses will fall between -2 (a gain of 2 pounds) and 18 pounds lost.
Possible Samples (random samples of 25 people from this population)
Sample 1: 1, 1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 8, 8, 9, 9, 11, 11, 13, 13, 14, 14, 15, 16, 16
Sample 2: -2, -2, 0, 0, 3, 4, 4, 4, 5, 5, 6, 6, 8, 8, 9, 9, 9, 9, 9, 10, 11, 12, 13, 13, 16
Sample 3: -4, -4, 2, 3, 4, 5, 7, 8, 8, 9, 9, 9, 9, 9, 10, 10, 11, 11, 11, 12, 12, 13, 14, 16, 18
Sample 4: -3, -3, -2, 0, 1, 2, 2, 4, 4, 5, 7, 7, 9, 9, 10, 10, 10, 11, 11, 12, 12, 14, 14, 14, 19

What to Expect of Sample Means Results:
Sample 1: Mean = 8.32 pounds
Sample 2: Mean = 6.76 pounds
Sample 3: Mean = 8.48 pounds
Sample 4: Mean = 7.16 pounds
Each sample gave a different sample mean, but close to 8.
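The four reported means can be recomputed directly from the listed samples. One assumption is made explicit here: the leading values in Samples 2, 3, and 4 are taken to be negative (weight gains), since that reading is the one that reproduces the means stated on the slide. The dictionary name `samples` is hypothetical.

```python
import statistics

# Weight losses in pounds; negative values are weight gains (an assumption
# about the transcript, chosen because it reproduces the reported means).
samples = {
    "Sample 1": [1, 1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 8, 8, 9, 9,
                 11, 11, 13, 13, 14, 14, 15, 16, 16],
    "Sample 2": [-2, -2, 0, 0, 3, 4, 4, 4, 5, 5, 6, 6, 8, 8, 9, 9,
                 9, 9, 9, 10, 11, 12, 13, 13, 16],
    "Sample 3": [-4, -4, 2, 3, 4, 5, 7, 8, 8, 9, 9, 9, 9, 9, 10, 10,
                 11, 11, 11, 12, 12, 13, 14, 16, 18],
    "Sample 4": [-3, -3, -2, 0, 1, 2, 2, 4, 4, 5, 7, 7, 9, 9, 10, 10,
                 10, 11, 11, 12, 12, 14, 14, 14, 19],
}

sample_means = {name: statistics.mean(values) for name, values in samples.items()}

for name, mean in sample_means.items():
    print(f"{name}: n = {len(samples[name])}, mean = {mean:.2f} pounds")
```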

What to Expect of Sample Means Say the true population mean height is 68 inches and the population standard deviation is 3 inches

What to Expect of Sample Means Variation in sample means
Say the actual mean height of a population is 68 inches.
First sample mean is 67.5 inches
Second sample mean is 66 inches
Third sample mean is 69 inches
Sample means are (hopefully) close to the population mean, but they are not identical. Why? Due to sample variability: each sample is only a subset of the population.

What to Expect of Sample Means : Population of measurements is bell-shaped, and a random sample of any size is measured.

What to Expect of Sample Means: Population of measurements of interest is not bell-shaped, but a large random sample is measured. Sample of size 30 is considered large, but if there are extreme outliers, better to have a larger sample. Population Mean is 21.76


What to Expect of Sample Means If numerous samples or repetitions of the same size are taken, the frequency curve of means from various samples will be approximately bell-shaped. The mean for the sampling distribution of the sample mean is equal to the true population mean. The standard deviation (SD) for the sampling distribution of the possible sample means is $\dfrac{\text{population standard deviation}}{\sqrt{\text{sample size}}}$

What to Expect of Sample Means In reality, we won't know the population standard deviation. The standard error (SE) for the sampling distribution of the possible sample means is $\dfrac{\text{sample standard deviation}}{\sqrt{\text{sample size}}}$
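As an illustration of the standard error, the sketch below applies the formula to Sample 1 from the weight-loss example. The variable names are hypothetical, and `statistics.stdev` is used because it computes the sample standard deviation (dividing by n - 1).

```python
import math
import statistics

# Sample 1 from the weight-loss example (pounds lost by 25 people).
sample_1 = [1, 1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 8, 8, 9, 9,
            11, 11, 13, 13, 14, 14, 15, 16, 16]

n = len(sample_1)
sample_sd = statistics.stdev(sample_1)   # sample standard deviation
standard_error = sample_sd / math.sqrt(n)

print(f"sample SD      = {sample_sd:.3f} pounds")
print(f"standard error = {standard_error:.3f} pounds")
```

The result can be compared with the value of 1 pound obtained from the population standard deviation of 5 pounds, which in practice would be unknown.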

The Central Limit Theorem: The Fundamental Theorem of Statistics The sampling distribution of any mean becomes more nearly Normal as the sample size grows. We don't even care about the shape of the population distribution as long as the sample size is large enough! The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).

The Central Limit Theorem: The Fundamental Theorem of Statistics The CLT is surprising: Not only does the histogram of the sample means get closer and closer to the Normal distribution as the sample size grows, but this is true regardless of the shape of the population distribution. The CLT works better (and faster) the closer the population distribution is to a Normal itself. It also works better for larger samples.
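A small simulation can illustrate this claim for a strongly skewed population. The sketch assumes an exponential population with mean 1 (a choice made here, not taken from the slides) and measures the skewness of the simulated sample means for a few sample sizes; a skewness near 0 indicates a more symmetric, more nearly Normal shape. The function names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_sample_means(n, num_samples=5000, seed=2):
    """Means of num_samples random samples of size n from a skewed population."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

skew_by_n = {}
mean_by_n = {}
for n in (2, 30, 100):
    means = simulate_sample_means(n)
    skew_by_n[n] = skewness(means)
    mean_by_n[n] = statistics.fmean(means)
    print(f"n={n:3d}  mean of sample means={mean_by_n[n]:.3f}  "
          f"skewness={skew_by_n[n]:.3f}")
```

The skewness should shrink as n grows while the center stays near the population mean of 1.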


The Fundamental Theorem of Statistics The Central Limit Theorem (CLT) The mean of a random sample is a random variable whose sampling distribution can be approximated by a Normal model. The larger the sample, the better the approximation will be.

Assumptions and Conditions The CLT requires essentially the same assumptions we saw for modeling proportions: Independence Assumption: The sampled values must be independent of each other. Sample Size Assumption: The sample size must be sufficiently large.

Assumptions and Conditions (cont.) We can't check these directly, but we can think about whether the Independence Assumption is plausible. We can also check some related conditions:
Randomization Condition: The data values must be sampled randomly.
10% Condition: When the sample is drawn without replacement, the sample size, n, should be no more than 10% of the population.
Large Enough Sample Condition: The CLT doesn't tell us how large a sample we need. For now, you need to think about your sample size in the context of what you know about the population.

Weight-Loss Example In the weight-loss example, the population mean and standard deviation were 8 pounds and 5 pounds, respectively, and we were taking random samples of size 25. Potential sample means are represented by a bell-shaped curve with mean of 8 pounds and standard deviation $\dfrac{5}{\sqrt{25}} = 1$ pound.
For our samples of 25 people:
68% of sample means will be between 7 and 9 pounds
95% of sample means will be between 6 and 10 pounds
99.7% of sample means will be between 5 and 11 pounds

Weight-Loss Example Increasing the Size of the Sample: suppose a sample of 100 people instead of 25 was taken. Potential sample means are still represented by a bell-shaped curve with mean of 8 pounds, but with standard deviation $\dfrac{5}{\sqrt{100}} = 0.5$ pounds.
For our sample of 100 people:
68% of sample means will be between 7.5 and 8.5 pounds
95% of sample means will be between 7 and 9 pounds
99.7% of sample means will be between 6.5 and 9.5 pounds
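The two weight-loss calculations can be compared side by side. The population mean of 8 pounds and standard deviation of 5 pounds come from the slides; the function name `sd_of_sample_mean` is hypothetical.

```python
import math

POP_MEAN = 8.0   # population mean weight loss in pounds
POP_SD = 5.0     # population standard deviation in pounds

def sd_of_sample_mean(population_sd, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return population_sd / math.sqrt(n)

ranges_95 = {}
for n in (25, 100):
    sd = sd_of_sample_mean(POP_SD, n)
    # About 95% of sample means fall within two standard deviations.
    ranges_95[n] = (POP_MEAN - 2 * sd, POP_MEAN + 2 * sd)
    low, high = ranges_95[n]
    print(f"n={n:3d}: SD of sample mean = {sd} pounds, "
          f"95% of sample means between {low} and {high} pounds")
```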

Question Suppose that test scores on a particular exam have a mean of 77 and standard deviation of 5, and that they have a bell-shaped curve. Suppose you randomly select 1000 samples of size 100 from this population and calculate the sample mean test scores. Between what two values (sample means) would you expect 95% of these sample mean test scores to fall?
A. 72 and 82
B. 67 and 87
C. 76 and 78


But Which Normal? The CLT says that the sampling distribution of any mean or proportion is approximately Normal. But which Normal model? For proportions, the sampling distribution is centered at the population proportion. For means, it's centered at the population mean. But what about the standard deviations?

But Which Normal? The Normal model for the sampling distribution of the mean has a standard deviation equal to $SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$, where σ is the population standard deviation.

But Which Normal? The Normal model for the sampling distribution of the proportion has a standard deviation equal to $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}} = \dfrac{\sqrt{pq}}{\sqrt{n}}$

About Variation The standard deviation of the sampling distribution declines only with the square root of the sample size (the denominator contains the square root of n). Therefore, the variability decreases as the sample size increases. While we'd always like a larger sample, the square root limits how much we can make a sample tell about the population. (This is an example of the Law of Diminishing Returns.)
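The diminishing return can be made concrete with a short worked equation, using the same symbols as the standard deviation formulas for means and proportions. Quadrupling the sample size only halves the standard deviation:

```latex
\[
\frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}},
\qquad
\sqrt{\frac{pq}{4n}} = \frac{1}{2}\sqrt{\frac{pq}{n}}.
\]
```

The weight-loss example shows exactly this: going from 25 to 100 people reduced the standard deviation of the sample mean from 1 pound to 0.5 pounds.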

The Real World and the Model World Be careful! Now we have two distributions to deal with. The first is the real world distribution of the sample, which we might display with a histogram. The second is the math world sampling distribution of the statistic, which we model with a Normal model based on the Central Limit Theorem. Don't confuse the two!

Sampling Distribution Models There are two basic truths about sampling distributions: 1. Sampling distributions arise because samples vary. Each random sample will have different cases and, so, a different value of the statistic. 2. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.

What Can Go Wrong? Don't confuse the sampling distribution with the distribution of the sample. When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics. The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn't get. Watch out for small samples from skewed populations. The more skewed the distribution, the larger the sample size we need for the CLT to work.

Summary Sample proportions and means will vary from sample to sample; that's sampling error (sampling variability). Sampling variability may be unavoidable, but it is also predictable! In statistics, the concept of a population parameter is theoretical. Only "God" knows the "Truth." We try our best to find out what it is.

A Scientific Look at the Dangers of High Heels, NY Times, Jan., 2012 Not long ago, Neil J. Cronin, a postdoctoral researcher, and two of his colleagues at the Musculoskeletal Research Program at Griffith University in Queensland, Australia, were having coffee on the university's campus when they noticed a young woman tottering past in high heels. "She looked quite uncomfortable and unstable," Dr. Cronin says.

A Scientific Look at the Dangers of High Heels, NY Times, Jan., 2012 Some observers, particularly women, might have winced in sympathy or, alternatively, wondered where she'd bought stilettos. But the three researchers, men who study the biomechanics of walking, were struck instead by the scientific implications of her passage. "We began to consider what might be happening at the muscle and tendon level in women who wear heels," Dr. Cronin says.

Study: Long-term use of high heeled shoes alters the neuromechanics of human walking
