Occupy movement - Duke edition. Lecture 14: Large sample inference for proportions. Exploratory analysis. Another poll on the movement


Occupy movement - Duke edition

Lecture 14: Large sample inference for proportions
Statistics 101, Mine Çetinkaya-Rundel, October 20, 2011

On Tuesday we asked you how closely you're following the news about the Occupy Wall Street protests involving demonstrations in New York City and other places around the country. Below are the results. (n = )

(a) Very or somewhat closely
(b) Not too closely
(c) Not at all
(d) No opinion

Another poll on the movement

A USA Today/Gallup poll conducted October 13-15, 2011 asked 1,026 adults: "How closely are you following the news about the Occupy Wall Street protests involving demonstrations in New York City and other places around the country: very or somewhat closely, not too closely, or not at all?"

Exploratory analysis

Duke: Among Duke students, 23 said they follow the news about the Occupy movement very or somewhat closely (39%).

US: Among 1,026 Americans, 564 said they follow the news about the Occupy movement very or somewhat closely (55%).

[Figure: bar plots of the proportions answering No and Yes, one panel for Duke and one for the US; proportion axis from 0.0 to 0.6.]

Parameter and point estimate

Do these data provide convincing evidence that the proportion of Duke students who follow the news about the Occupy movement closely differs from the proportion of all Americans who do?

Parameter of interest: Difference between the proportions of all Duke students and all Americans who follow the news about the Occupy movement closely: $p_{Duke} - p_{US}$

Point estimate: Difference between the proportions of sampled Duke students and sampled Americans who follow the news about the Occupy movement closely: $\hat{p}_{Duke} - \hat{p}_{US}$

Which of the following is the correct set of hypotheses for testing whether the proportion of Duke students who follow the news about the Occupy movement closely differs from the proportion of all Americans who do?

(a) $H_0: p_{Duke} = p_{US}$, $H_A: p_{Duke} \neq p_{US}$
(b) $H_0: \hat{p}_{Duke} = \hat{p}_{US}$, $H_A: \hat{p}_{Duke} \neq \hat{p}_{US}$
(c) $H_0: p_{Duke} - p_{US} = 0$, $H_A: p_{Duke} - p_{US} \neq 0$
(d) $H_0: p_{Duke} = p_{US}$, $H_A: p_{Duke} < p_{US}$

Hypothesis testing when $p_1 = p_2$

If assumptions and conditions for inference are satisfied (are they?), we know that $\hat{p}_{Duke} - \hat{p}_{US}$ will be nearly normally distributed, with mean $p_{Duke} - p_{US} = 0$ (from $H_0$) and an SE that is also calculated assuming $H_0$ is true.

The CLT says

$SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$

But we are supposed to be doing the hypothesis test assuming that $H_0$ is true, and $H_0$ says $p_1 = p_2$ (well, it says $p_{Duke} = p_{US}$ in this case, but you get the point).

In short, we need to find a common proportion for these samples which we can use to calculate the SE.

Pooled estimate of a proportion

Since $H_0$ implies that both samples come from the same population, we pool the two samples to calculate a pooled estimate of the sample proportion. This simply means finding the proportion of total successes among the total number of observations.

Pooled estimate of a proportion:

$\hat{p} = \frac{\text{\# of successes}_1 + \text{\# of successes}_2}{n_1 + n_2}$
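Substituting the pooled estimate into the CLT formula above gives the standard error used under $H_0$ and the resulting test statistic. The short derivation below is a sketch under the slide's definitions; the symbol $Z$ for the standardized statistic is notation introduced here rather than taken from the slides.

```latex
\begin{align*}
SE_{\hat{p}_1 - \hat{p}_2}
  &= \sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}
   = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \\
Z &= \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_{\hat{p}_1 - \hat{p}_2}}
\end{align*}
```

Because the same $\hat{p}$ replaces both $p_1$ and $p_2$, the two terms under the square root differ only through the sample sizes.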

Pooled estimate of a proportion - in context

                  US       Duke
# of successes    564      23
n                 1,026    59
p̂                 0.55     0.39

$\hat{p} = \frac{\text{\# of successes}_1 + \text{\# of successes}_2}{n_1 + n_2} = \frac{564 + 23}{1{,}026 + 59} = \frac{587}{1085} = 0.54$

SE for a hypothesis test when $p_1 = p_2$

Which of the following is the correct standard error of $\hat{p}_{Duke} - \hat{p}_{US}$ for this hypothesis test?

(a) $SE = \sqrt{\frac{0.55(1 - 0.55)}{1026} + \frac{0.39(1 - 0.39)}{59}}$
(b) $SE = \sqrt{\frac{0.54(1 - 0.54)}{1026} + \frac{0.54(1 - 0.54)}{59}}$
(c) $SE = \sqrt{\frac{0.55(1 - 0.39)}{1026} + \frac{0.39(1 - 0.55)}{59}}$
(d) $SE = \sqrt{\frac{0.39(1 - 0.39)}{1026} + \frac{0.39(1 - 0.39)}{59}}$
(e) $SE = \sqrt{\frac{0.55(1 - 0.55)}{1026} + \frac{0.55(1 - 0.55)}{59}}$

Calculating the p-value

Which of the following is the correct p-value for this hypothesis test?

(a) 0.0082
(b) 0.0164
(c) 0.9918
(d) 2.40

Evaluating the study

Do you have any reservations about our findings? Is there anything about this analysis that concerns you?
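The calculation behind the questions above can be sketched in a few lines of Python. This is a minimal illustration rather than the course's own code: the helper names are hypothetical, the Duke sample size of 59 is inferred from the pooled denominator (1085 - 1,026), and the p-value uses the normal approximation from the standard library.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the slides. n_duke = 59 is inferred from the pooled
# denominator (1085 - 1026); it is not stated directly in the transcript.
x_us, n_us = 564, 1026
x_duke, n_duke = 23, 59


def success_failure_ok(successes, n, minimum=10):
    """Check the 'at least 10 successes and 10 failures' condition."""
    return successes >= minimum and (n - successes) >= minimum


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return pooled, se, z, p_value


conditions_met = (success_failure_ok(x_duke, n_duke)
                  and success_failure_ok(x_us, n_us))
pooled, se, z, p_value = two_proportion_z_test(x_duke, n_duke, x_us, n_us)

print(f"success-failure condition met: {conditions_met}")
print(f"pooled p-hat = {pooled:.3f}, SE = {se:.4f}")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

Run as written, this should give a z statistic near -2.40 and a two-sided p-value between 0.016 and 0.017. The code only checks the success-failure condition; whether the observations are independent, and whether students in one class represent all Duke students, has to be judged from how the data were collected.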

Quick recap on comparing proportions

$H_0: p_1 - p_2 = 0$
When comparing proportions, if $H_0: p_1 - p_2 = 0$, first calculate the pooled estimate, $\hat{p}$, and then use that to calculate the standard error.

$H_0: p_1 - p_2 =$ some nonzero value
If $H_0: p_1 - p_2 =$ some nonzero value, then just use the observed sample proportions, $\hat{p}_1$ and $\hat{p}_2$, for calculating the standard error.

When calculating a confidence interval always use the observed sample proportions, $\hat{p}_1$ and $\hat{p}_2$, for calculating the standard error, since there is no null hypothesis telling you what to do.

When to retreat

The inference tools that we have learned that rely on the CLT and the normal distribution require the following two assumptions:
1. The individual observations must be independent.
2. Sample size and skew should not prevent the sampling distribution from being nearly normal.
   means: n > 50, population distribution not extremely skewed
   proportions: at least 10 successes and 10 failures
In Chapter 6 we'll learn how to analyze smaller samples.

If conditions for a statistical technique are not satisfied:
1. learn new methods that are appropriate for the data
2. consult a statistician
3. ignore the failure of conditions; this option effectively invalidates any analysis and may discredit novel and interesting findings

Got ESP?

In a test for extrasensory perception (ESP), a pack of five cards is hidden and the test taker guesses the chosen card. This is repeated many times. If the test taker does not have ESP, i.e. is randomly guessing, what percent of the cards would s/he be expected to guess correctly?

(a) 0
(b) 0.5
(c) 0.20
(d) 0.25
(e) 0.99

http://www.psychicscience.org/esp3.aspx
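For the ESP setup, the chance-level success rate and the one-sided cutoff for 100 guesses can be worked out as follows. This is a sketch that assumes the normal approximation without a continuity correction; the constant and function names are illustrative, and an exact binomial calculation could move the cutoff by a card.

```python
from math import sqrt
from statistics import NormalDist

N_CARDS = 5      # cards in the pack, from the slide
N_TRIALS = 100   # number of guesses in the follow-up question
ALPHA = 0.05     # one-sided significance level

p_null = 1 / N_CARDS  # chance of a correct guess when guessing at random
se_null = sqrt(p_null * (1 - p_null) / N_TRIALS)


def one_sided_p_value(correct):
    """Upper-tail p-value for `correct` hits under H0: p = p_null."""
    z = (correct / N_TRIALS - p_null) / se_null
    return 1 - NormalDist().cdf(z)


# Smallest number of correct guesses whose p-value falls below ALPHA.
min_correct = next(k for k in range(N_TRIALS + 1)
                   if one_sided_p_value(k) < ALPHA)

print(f"expected proportion under random guessing: {p_null:.2f}")
print(f"minimum correct out of {N_TRIALS}: {min_correct}")
```

With 100 guesses the expected counts under the null are 20 successes and 80 failures, so the success-failure condition for the normal approximation holds.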

Testing for ESP

Psi-hitting is defined as guessing more cards correctly than expected by chance. At least how many cards, out of 100, should the test taker get right in order for there to be statistically significant evidence of psi-hitting at the 5% significance level?

What if?

What if a test taker gets more than out of 100 cards right? The hypothesis test would yield a p-value less than 0.05, and we would conclude that the data provide convincing evidence for the test taker having ESP. In reality nobody has ESP. What type of error might we have made?

(a) Type 1 error
(b) Type 2 error

The Pique Technique claims that people are more likely to respond to an unusual request than to a standard request because the unusual request will pique their curiosity. Researchers divided 144 volunteers into two equally sized groups. Depending on their group, subjects were asked for either a quarter or 17 cents. What are the appropriate hypotheses for evaluating the Pique Technique in this context?

(a) $H_0: p_{usual} = p_{unusual}$, $H_A: p_{usual} > p_{unusual}$
(b) $H_0: p_{usual} = p_{unusual}$, $H_A: p_{usual} \neq p_{unusual}$
(c) $H_0: p_{usual} = p_{unusual}$, $H_A: p_{usual} < p_{unusual}$
(d) $H_0: p_{unusual} = 0.5$, $H_A: p_{unusual} > 0.5$

Ramsey and Schafer, The Statistical Sleuth, 2nd ed. (Duxbury, 2002), p. 549

Pique technique

The group from which a quarter was requested had a 30.6% success rate, while the other group had a 43.1% success rate. Evaluate the hypotheses at the 5% significance level.

                  Usual              Unusual
p̂                 0.306              0.431
n                 72                 72
# of successes    72 × 0.306 ≈ 22    72 × 0.431 ≈ 31

Pooled $\hat{p} = \frac{22 + 31}{72 + 72} = \frac{53}{144} = 0.368$
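The remaining steps of the Pique technique test can be sketched as below. The code assumes the one-sided alternative $H_A: p_{usual} < p_{unusual}$, which is a choice made for this illustration since the slides pose the hypotheses as a question; the variable names are hypothetical. It also computes a 95% confidence interval with the unpooled standard error, following the recap's rule for intervals.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the slide's table: 72 subjects per group.
x_usual, n_usual = 22, 72
x_unusual, n_unusual = 31, 72
ALPHA = 0.05

p_usual = x_usual / n_usual
p_unusual = x_unusual / n_unusual
diff = p_usual - p_unusual

# Hypothesis test: pooled SE, because H0 says the two proportions are equal.
pooled = (x_usual + x_unusual) / (n_usual + n_unusual)
se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_usual + 1 / n_unusual))
z = diff / se_pooled
# Lower-tail area, assuming HA: p_usual < p_unusual (an assumption here).
p_value = NormalDist().cdf(z)
reject_h0 = p_value < ALPHA

# Confidence interval: unpooled SE, since no null hypothesis is involved.
se_unpooled = sqrt(p_usual * (1 - p_usual) / n_usual
                   + p_unusual * (1 - p_unusual) / n_unusual)
z_star = NormalDist().inv_cdf(0.975)
ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)

print(f"pooled p-hat = {pooled:.3f}, SE = {se_pooled:.4f}")
print(f"z = {z:.2f}, one-sided p-value = {p_value:.3f}, reject H0: {reject_h0}")
print(f"95% CI for p_usual - p_unusual: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the one-sided p-value comes out close to 0.06, slightly above the 5% level, and the confidence interval includes 0, so both calculations point to the same conclusion.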