Carolyn Anderson & YoungShil Paek (Slide contributors: Shuai Wang, Yi Zheng, Michael Culbertson, & Haiyan Li)

Department of Educational Psychology, University of Illinois at Urbana-Champaign

Inferential methods are the main focus of the rest of the course. Understanding the concept of a sampling distribution is crucial to understanding statistical inference.

Key Points
1. Statistic vs. Parameter
2. Population Distribution, Data Distribution, and Sampling Distributions
3. Mean and Standard Deviation of the Sampling Distribution of a Proportion
4. Inference with Sampling Distribution of a Proportion

Example: Predicting California Election Results Using Exit Polls
Using exit polls, polling organizations predict winners after learning how a small number of people voted, often only a few thousand out of possibly millions of voters. The total number of voters was over nine million, and the poll sampled a small portion of them. How do we know if the sample proportion from the California exit poll is a good estimate, falling close to the population proportion? This section introduces a type of probability distribution called the sampling distribution that helps us determine how close to the population parameter a sample statistic is likely to fall.

Example: Predicting California Election Results Using Exit Polls
In California in November 2010, the gubernatorial race pitted the Republican candidate, Meg Whitman, against the Democratic candidate, Jerry Brown. In an exit poll of 3889 randomly selected voters, 53.1% said they voted for Brown and 42.4% for Whitman. At the time of the exit poll, the percentage of the entire voting population (nearly 9.5 million people) that voted for Brown was unknown.

Example: Predicting Election Results Using Exit Polls
How close can we expect a sample percentage to be to the population percentage? How does the sample size influence our analysis? The sampling distribution helps us determine how close to the population parameter a sample statistic is likely to fall.

Recall: Statistic and Parameter
A statistic is a numerical summary of sample data, such as a sample proportion or sample mean.
A parameter is a numerical summary of a population, such as a population proportion or population mean.
In practice, we seldom know the values of parameters, so we use statistics computed from sample data to estimate them.

Population Distribution
Population distribution: the probability distribution of the random variable of interest in the whole population.
Example: Let X = vote outcome, with x = 1 for Jerry Brown and x = 0 for all other responses. The possible values of the random variable X (0 and 1) and how often these values occurred in the whole population (0.462 and 0.538) give the population distribution.

Data Distribution
Data distribution: the distribution of the random variable of interest in the one sample that we obtain from the population.
Example: The possible values of the random variable X (0 and 1) and how often these values occurred (0.469 and 0.531) give the data distribution for this one sample.
With random sampling, the larger the sample size n, the more closely the data distribution resembles the population distribution.

Example: Predicting Election Results Using Exit Polls
Figure 7.1: The population (9.5 million voters) and data (n = 3889) distributions of candidate preference (0 = Not Brown, 1 = Brown).

Sampling Distribution
Sampling distribution: the probability distribution of a sample statistic. With random sampling, the sampling distribution provides probabilities for all the possible values the statistic can take.
Examples: the sampling distribution of a sample proportion; the sampling distribution of a sample mean.

Sampling Distribution
A sampling distribution is different from a population distribution and a data distribution. Rather than giving probabilities for an observation for an individual subject (as in a population or data distribution), it gives probabilities for the value of a statistic for a sample of subjects.
Sampling distributions describe the variability of the sample statistic (e.g., sample mean, sample proportion) that occurs from sample to sample.
The sampling distribution provides the key for telling us how close a sample statistic falls to the corresponding unknown parameter.

True or False: For one population distribution there is only one data distribution.
a) True
b) False

Mean and SD of the Sampling Distribution of a Proportion
For a random sample of size n from a population with proportion p of outcomes in a particular category, the sampling distribution of the proportion of the sample in that category has
Mean = p
Standard deviation = $\sqrt{p(1-p)/n}$

The Standard Error
To distinguish the standard deviation of a sampling distribution from the standard deviation of an ordinary probability distribution, we refer to it as a standard error.
The standard error of a sample statistic (e.g., sample mean, sample proportion) is the standard deviation of the sampling distribution of that statistic.

Example: 2010 California Election Revisited
Election results showed that 53.8% of the population of all voters voted for Brown. What were the mean and standard deviation of the sampling distribution of the sample proportion who voted for him?
Given that the exit poll had 3889 people (n = 3889) and 53.8% of the population supported Brown (p = .538),
Mean = p = .538
S.E. = $\sqrt{p(1-p)/n} = \sqrt{.538(1-.538)/3889} = .008$
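
The mean and standard error in this example can be reproduced with a short script. This is a minimal sketch; the function name `proportion_sampling_dist` is hypothetical and is not part of the slides.

```python
import math

def proportion_sampling_dist(p, n):
    """Return (mean, standard error) of the sampling distribution
    of a sample proportion for population proportion p and sample size n."""
    mean = p
    se = math.sqrt(p * (1 - p) / n)
    return mean, se

# Values from the 2010 California example: p = .538, n = 3889
mean, se = proportion_sampling_dist(0.538, 3889)
print(f"mean = {mean:.3f}, S.E. = {se:.3f}")
```

With the slide's values, the standard error rounds to .008.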

Suppose that 40% of men over the age of 30 suffer from lower back pain. For a random sample of 50 men over the age of 30, find the mean and the standard error of the sampling distribution of the sample proportion of men over the age of 30 who suffer from lower back pain.
a) Mean = 0.40, Standard Error = 0.0693
b) Mean = 20, Standard Error = 3.464
c) Mean = 0.40, Standard Error = 3.464
d) Mean = 20, Standard Error = 0.0693
e) Cannot be determined

Example: 2010 California Election Revisited
Q1: Given the sampling distribution of the sample proportion who voted for Brown, what values of the sample proportion would we expect to observe from random sampling (data distribution)?

Example: 2010 California Election Revisited
[Figure: sampling distribution of the sample proportion, marked at Mean - 3*S.E. = .514, Mean = .538, and Mean + 3*S.E. = .562]

Example: 2010 California Election Revisited
Q1: Given the sampling distribution of the sample proportion who voted for Brown, what values of the sample proportion would we expect to observe from random sampling (data distribution)?
Answer: Given p = .538, the sample proportion from a random sample taken from this population is likely to fall within 3 S.E. of the mean, that is, between .514 and .562.

Example: 2010 California Election Revisited
Q2: Based on the results of the exit poll, would you have been willing to predict Brown as the winner on election night while the votes were still being counted?

Example: 2010 California Election Revisited
Think it through: Our inference about the plausible population proportion will help us predict the election result.
While the votes are still being counted, we do not know the actual population proportion (p). Our best estimate of the population proportion is the sample proportion ($\hat{p}$) from the exit poll.
We can estimate the standard error of a sample proportion by substituting $\hat{p}$ for p:
$\hat{p} = .531$
$\widehat{S.E.} = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{.531(1-.531)/3889} = .008$

Example: 2010 California Election Revisited
Think it through: With the estimated mean and standard error of the sample proportion, we can find a range of plausible values for the actual population proportion as
$.531 \pm 3(.008) = [.507, .555]$
All the plausible values estimated for the population proportion of voters who voted for Brown are above 0.50, which gives Brown a majority over any other candidate. Therefore, we would be willing to predict Brown as the winner.
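
The "estimate plus or minus 3 standard errors" reasoning can be sketched as follows. The variable names are illustrative assumptions, and the standard error is rounded to three decimals only to mirror the rounding used on the slides.

```python
import math

p_hat = 0.531   # sample proportion for Brown from the exit poll
n = 3889        # exit-poll sample size

# Estimated standard error: substitute p-hat for the unknown p
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
rounded_se = round(se_hat, 3)   # rounded as on the slides

# Range of plausible values for the population proportion
lower = p_hat - 3 * rounded_se
upper = p_hat + 3 * rounded_se
print(f"estimated S.E. = {se_hat:.4f}")
print(f"plausible range: [{lower:.3f}, {upper:.3f}]")

# Predict Brown only if the whole range lies above one half
predict_brown = lower > 0.5
print("predict Brown as winner:", predict_brown)
```

Checking that the lower endpoint exceeds 0.50 is the decision rule the slide describes in words.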

Key Points Revisited
1. Statistic vs. Parameter
2. Population Distribution, Data Distribution, and Sampling Distributions
3. Mean and Standard Deviation of the Sampling Distribution of a Proportion
4. Inference with Sampling Distribution of a Proportion

Key Points
1. The Sampling Distribution of the Sample Mean
2. Effect of n on the Standard Error
3. Central Limit Theorem (CLT)
4. Calculating Probabilities of Sample Means
5. Binomial Distribution is a Sampling Distribution

The Sampling Distribution of the Sample Mean
The sample mean, $\bar{x}$, is a random variable: it varies from sample to sample. By contrast, the population mean, µ, is a single fixed number.

The Mean and Standard Deviation of the Sampling Distribution of the Sample Mean
For a random sample of size n from a population having mean µ and standard deviation σ, the sampling distribution of the sample mean has:
its center described by the mean µ (the same as the mean of the population), and
its spread described by the standard error, which equals the population standard deviation divided by the square root of the sample size: $S.E._{\bar{x}} = \sigma/\sqrt{n}$

Example 1: Pizza Sales
Daily sales at a pizza restaurant vary from day to day. The daily sales figures fluctuate around a mean µ = $900 with a standard deviation σ = $300. What are the center and spread of the sampling distribution of the average daily sales in a week?
Center: µ = $900
Spread: S.E. = 300/√7 = $113
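
A minimal sketch of the pizza calculation, assuming a week of n = 7 daily sales figures; the helper name `standard_error_of_mean` is hypothetical.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma divided by sqrt(n)."""
    return sigma / math.sqrt(n)

mu, sigma, days = 900, 300, 7   # values from the pizza example
se = standard_error_of_mean(sigma, days)
print(f"center = ${mu}, spread (S.E.) = ${se:.0f}")
```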

The Sampling Distribution of the Sample Mean When the Population Distribution is Normally Distributed
For a random sample of size n from a normally distributed population having mean µ and standard deviation σ, the sampling distribution of the sample mean:
is also normally distributed,
has its center described by the mean µ (the same as the mean of the population), and
has its spread described by the standard error, which equals the population standard deviation divided by the square root of the sample size: $S.E._{\bar{x}} = \sigma/\sqrt{n}$

The Sampling Distribution of the Sample Mean When the Population Distribution is NOT Normally Distributed
For a random sample of size n from a population that is NOT normally distributed, having mean µ and standard deviation σ, the sampling distribution of the sample mean:
approaches a normal distribution as the sample size increases,
has its center described by the mean µ (the same as the mean of the population), and
has its spread described by the standard error, which equals the population standard deviation divided by the square root of the sample size: $S.E._{\bar{x}} = \sigma/\sqrt{n}$

Central Limit Theorem (CLT)
CLT: For a random sample of size n from a population having mean µ and standard deviation σ, the sampling distribution of the sample mean:
approaches a normal distribution as the sample size increases,
has its center described by the mean µ (the same as the mean of the population), and
has its spread described by the standard error, which equals the population standard deviation divided by the square root of the sample size: $S.E._{\bar{x}} = \sigma/\sqrt{n}$
This result applies no matter what the shape of the probability distribution from which the samples are taken.

CLT: How Large a Sample?
The sampling distribution of the sample mean takes more of a bell shape as the random sample size n increases.
The more skewed the population distribution, the larger n must be before the shape of the sampling distribution is close to normal.
In practice, the sampling distribution is usually close to normal when the sample size n is at least about 30.
If the population distribution is approximately normal, then the sampling distribution is approximately normal for all sample sizes.
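
A small simulation can illustrate how the shape of the sampling distribution changes with n. The sketch below assumes a right-skewed Exponential(1) population (mean 1, standard deviation 1), which is an illustrative choice and not a population used in the slides; the helper names are hypothetical.

```python
import math
import random
import statistics

def sample_means(n, reps, rng):
    """Draw `reps` random samples of size n from an Exponential(1)
    population and return the list of their sample means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(0)   # fixed seed so the run is repeatable
results = {}
for n in (1, 5, 30):
    means = sample_means(n, 5000, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    center, spread, skew = results[n]
    print(f"n={n:2d}: mean={center:.3f}  S.E.={spread:.3f}  "
          f"(theory {1 / math.sqrt(n):.3f})  skewness={skew:.2f}")
```

The simulated means stay centered near the population mean, their spread tracks σ/√n, and the skewness shrinks toward 0 as n grows.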

CLT: Impact of increasing n

CLT Helps Us Make Inferences
For large n, the sampling distribution is approximately normal even if the population distribution is not. This enables us to make inferences about population means regardless of the shape of the population distribution.

Effect of n on the Standard Error
Knowing how to find a standard error gives us a mechanism for understanding how much variability to expect in sample statistics just by chance.
The standard error of the sample mean = $\sigma/\sqrt{n}$
As the sample size n increases, the denominator increases, so the standard error decreases.
With larger samples, the sample mean is more likely to fall closer to the population mean.
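
To make the effect of n concrete, the sketch below tabulates σ/√n for a few sample sizes. It reuses σ = 300 from the pizza example; the sample sizes 7, 28, and 112 are assumptions chosen so that each one is four times the previous.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for population SD sigma."""
    return sigma / math.sqrt(n)

sigma = 300
ses = {n: standard_error(sigma, n) for n in (7, 28, 112)}
for n, se in ses.items():
    print(f"n = {n:3d}  S.E. = {se:6.1f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.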

CLT: Impact of increasing n

Calculating Probabilities of Sample Means
The weights of milk bottles are normally distributed with a mean of 1.1 lbs and a standard deviation σ = 0.20 lbs. What is the probability that the mean of a random sample of 5 bottles will be greater than 0.99 lbs?
First calculate the mean and standard error of the sampling distribution for a random sample of 5 milk bottles. Because the population is normally distributed, $\bar{x}$ is normally distributed with mean = 1.1 and standard error = $0.2/\sqrt{5} = 0.0894$.
$P(\bar{X} > .99) = P\left(Z > \frac{.99 - 1.1}{.0894}\right) = P(Z > -1.23) = .89$
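
The same probability can be checked with Python's standard library. This is a sketch that assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1.1, 0.20, 5          # population mean, SD, and sample size
se = sigma / math.sqrt(n)            # standard error of the sample mean

# Sampling distribution of the sample mean: Normal(mu, se)
sampling_dist = NormalDist(mu, se)
z = (0.99 - mu) / se
prob = 1 - sampling_dist.cdf(0.99)   # P(sample mean > 0.99)

print(f"S.E. = {se:.4f}, z = {z:.2f}, P = {prob:.2f}")
```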

Binomial Distribution is a Sampling Distribution
In a binomial distribution, p, the probability of success in one trial, can also be regarded as the population proportion of successes.
The binomial distribution is the probability distribution of the number of successes in n independent trials. It can be regarded as the sampling distribution of the sample proportion of successes multiplied by n, where n is the sample size.

Binomial Distribution is a Sampling Distribution
For a random sample of size n from a population with proportion p of success, the sampling distribution of the sample proportion has
Mean = p
Standard error = $\sqrt{p(1-p)/n}$
Multiplying both by n gives the mean and standard deviation of the binomial distribution. The binomial distribution of the number of successes in n independent trials with probability of success p in each trial has:
Mean = np
Standard deviation = $n\sqrt{p(1-p)/n} = \sqrt{np(1-p)}$
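
The scaling relationship between the two distributions can be verified numerically. In this sketch the values p = 0.3 and n = 100 are hypothetical, chosen only for illustration, and the function names are not from the slides.

```python
import math

def proportion_moments(p, n):
    """Mean and standard error of the sample proportion."""
    return p, math.sqrt(p * (1 - p) / n)

def binomial_moments(p, n):
    """Mean and standard deviation of the binomial count of successes."""
    return n * p, math.sqrt(n * p * (1 - p))

p, n = 0.3, 100                       # hypothetical illustration values
prop_mean, prop_se = proportion_moments(p, n)
bin_mean, bin_sd = binomial_moments(p, n)

# Multiplying the proportion's mean and S.E. by n gives the binomial values
print(f"n * mean = {n * prop_mean:.3f}  vs  binomial mean = {bin_mean:.3f}")
print(f"n * S.E. = {n * prop_se:.3f}  vs  binomial SD   = {bin_sd:.3f}")
```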

Approximating the Binomial Distribution with the Normal Distribution
The binomial distribution can be well approximated by the normal distribution when the expected number of successes, np, and the expected number of failures, n(1-p), are both at least 15. This is an application of the CLT.
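
The rule of thumb and the quality of the approximation can be checked directly. This sketch uses hypothetical values (n = 100, p = 0.3, cutoff k = 35) and adds a continuity correction of 0.5, which is a common refinement but is an assumption here, since the slides do not mention it. It needs Python 3.8 or later for `math.comb` and `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_approx_ok(n, p, threshold=15):
    """Rule of thumb: expected successes and failures both at least 15."""
    return n * p >= threshold and n * (1 - p) >= threshold

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p, k = 100, 0.3, 35                # hypothetical illustration values
exact = binomial_cdf(k, n, p)

# Normal with the binomial's mean np and SD sqrt(np(1-p));
# k + 0.5 is the continuity correction (an added assumption)
approx = NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k + 0.5)

print("rule of thumb satisfied:", normal_approx_ok(n, p))
print(f"exact P(X <= {k}) = {exact:.4f}, normal approximation = {approx:.4f}")
```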

2000 Presidential Election
The 2000 US presidential election came down to votes in Florida. The official results from the Florida Department of State, Division of Elections, for the two top candidates on Sunday, November 26, 2000:
George W. Bush   2,912,790
Al Gore          2,912,253
Total            5,825,043
Bush had a lead of only 537 votes.
The sampling distribution of the proportion for Bush is approximately normal.

Example continued
Proportion for Bush: $p = 2912790/5825043 = .500046094$
$se = \sqrt{p(1-p)/n} = .000207166$
If the election was a tie, $z = (.500046094 - .5)/.000207166 = .2225$
$P(|Z| > .2225) = .82$, which equals the probability of making a mistake if the election was a tie.
(Bush would have had to win by 6,217 votes for a decisive victory.)
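
The quantities in this example can be recomputed from the certified vote counts. This is a sketch under the assumption that the quoted probability is two-sided, P(|Z| > z); the variable names are illustrative, and `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist

bush, gore = 2_912_790, 2_912_253     # certified counts quoted above
n = bush + gore                       # total votes for the two candidates
p = bush / n                          # observed proportion for Bush

se = math.sqrt(p * (1 - p) / n)       # standard error of the proportion
z = (p - 0.5) / se                    # distance from a tie, in standard errors

# Two-sided tail probability under a tie (assumed interpretation)
two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"n = {n:,}, lead = {bush - gore}")
print(f"p = {p:.9f}, se = {se:.9f}")
print(f"z = {z:.4f}, P(|Z| > z) = {two_sided:.2f}")
```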

Key Points Revisited
1. The Sampling Distribution of the Sample Mean
2. Effect of n on the Standard Error
3. Central Limit Theorem (CLT)
4. Calculating Probabilities of Sample Means
5. Binomial Distribution is a Sampling Distribution
