Occupy movement - Duke edition. Lecture 14: Large sample inference for proportions. Exploratory analysis. Another poll on the movement

Lecture 14: Large sample inference for proportions
Statistics 101
Mine Çetinkaya-Rundel
October 20, 2011

Occupy movement - Duke edition

On Tuesday we asked you how closely you're following the news about the Occupy Wall Street protests involving demonstrations in New York City and other places around the country. Below are the results. (n = ___)

(a) Very or somewhat closely
(b) Not too closely
(c) Not at all
(d) No opinion

Another poll on the movement

A USA Today/Gallup poll conducted October 13-15, 2011 asked 1,026 adults: "How closely are you following the news about the Occupy Wall Street protests involving demonstrations in New York City and other places around the country: very or somewhat closely, not too closely, or not at all?"

Exploratory analysis

Duke: Among Duke students, 23 said they follow the news about the Occupy movement very or somewhat closely (39%).
[Bar plot, Duke: proportion of No and Yes responses, axis from 0.0 to 0.6]

US: Among 1,026 Americans, 564 said they follow the news about the Occupy movement very or somewhat closely (55%).
[Bar plot, US: proportion of No and Yes responses, axis from 0.0 to 0.6]

Parameter and point estimate

Do these data provide convincing evidence that the proportion of Duke students who follow the news about the Occupy movement closely differs from the proportion of all Americans who do?

Parameter of interest: Difference between the proportions of all Duke students and all Americans who follow the news about the Occupy movement closely.
$p_{Duke} - p_{US}$

Point estimate: Difference between the proportions of sampled Duke students and sampled Americans who follow the news about the Occupy movement closely.
$\hat{p}_{Duke} - \hat{p}_{US}$

Which of the following is the correct set of hypotheses for testing if the proportion of Duke students who follow the news about the Occupy movement closely differs from the proportion of all Americans who do?

(a) $H_0: p_{Duke} = p_{US}$; $H_A: p_{Duke} \neq p_{US}$
(b) $H_0: \hat{p}_{Duke} = \hat{p}_{US}$; $H_A: \hat{p}_{Duke} \neq \hat{p}_{US}$
(c) $H_0: p_{Duke} - p_{US} = 0$; $H_A: p_{Duke} - p_{US} \neq 0$
(d) $H_0: p_{Duke} = p_{US}$; $H_A: p_{Duke} < p_{US}$

Hypothesis testing when $p_1 = p_2$

If assumptions and conditions for inference are satisfied (are they?), we know that $\hat{p}_{Duke} - \hat{p}_{US}$ will be nearly normally distributed, with mean $p_{Duke} - p_{US} = 0$ from $H_0$, and SE also calculated assuming $H_0$ is true.

The CLT says

$SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$

But we are supposed to be doing the hypothesis test assuming that $H_0$ is true, and $H_0$ says $p_1 = p_2$ (well, it says $p_{Duke} = p_{US}$ in this case, but you get the point).

In short, we need to find a common proportion for these samples which we can use to calculate SE.

Pooled estimate of a proportion

Since $H_0$ implies that both samples come from the same population, we pool the two samples to calculate a pooled estimate of the sample proportion. This simply means finding the proportion of total successes among the total number of observations.

Pooled estimate of a proportion:

$\hat{p} = \frac{\#\text{ of successes}_1 + \#\text{ of successes}_2}{n_1 + n_2}$
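
The pooled estimate and the standard error it feeds into can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names `pooled_proportion` and `pooled_standard_error` are hypothetical, and the counts in the example are made up purely to show the arithmetic.

```python
import math


def pooled_proportion(successes_1, n_1, successes_2, n_2):
    """Total successes divided by total observations across both samples."""
    return (successes_1 + successes_2) / (n_1 + n_2)


def pooled_standard_error(successes_1, n_1, successes_2, n_2):
    """SE of p-hat_1 - p-hat_2 under H0: p1 = p2.

    The pooled estimate stands in for both p1 and p2 in the CLT formula.
    """
    p_pool = pooled_proportion(successes_1, n_1, successes_2, n_2)
    return math.sqrt(p_pool * (1 - p_pool) / n_1 + p_pool * (1 - p_pool) / n_2)


# Hypothetical counts, chosen only for illustration (not the lecture data).
p_pool = pooled_proportion(30, 100, 45, 120)
se = pooled_standard_error(30, 100, 45, 120)
print(f"pooled p-hat = {p_pool:.3f}, SE under H0 = {se:.4f}")
```

Using the same pooled value in both terms is what distinguishes this SE from the one used for a confidence interval, where each sample keeps its own observed proportion.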

Pooled estimate of a proportion - in context

                  US       Duke
# of successes    564      23
n                 1,026    59
$\hat{p}$         0.55     0.39

$\hat{p} = \frac{\#\text{ of successes}_1 + \#\text{ of successes}_2}{n_1 + n_2} = \frac{564 + 23}{1026 + 59} = \frac{587}{1085} = 0.54$

SE for a hypothesis test when $p_1 = p_2$

Which of the following is the correct standard error of $\hat{p}_{Duke} - \hat{p}_{US}$ for this hypothesis test?

(a) $SE = \sqrt{\frac{0.55 (1 - 0.55)}{1026} + \frac{0.39 (1 - 0.39)}{59}}$
(b) $SE = \sqrt{\frac{0.54 (1 - 0.54)}{1026} + \frac{0.54 (1 - 0.54)}{59}}$
(c) $SE = \sqrt{\frac{0.55 (1 - 0.39)}{1026} + \frac{0.39 (1 - 0.55)}{59}}$
(d) $SE = \sqrt{\frac{0.39 (1 - 0.39)}{1026} + \frac{0.39 (1 - 0.39)}{59}}$
(e) $SE = \sqrt{\frac{0.55 (1 - 0.55)}{1026} + \frac{0.55 (1 - 0.55)}{59}}$

Calculating the p-value

Which of the following is the correct p-value for this hypothesis test?

(a) 0.0082
(b) 0.0164
(c) 0.9918
(d) 2.40

Evaluating the study

Do you have any reservations about our findings? Is there anything about this analysis that concerns you?
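
The full test for the Duke and US samples can be sketched numerically as below. This is an illustrative sketch under stated assumptions rather than the lecture's own solution: the Duke sample size of 59 is taken as 1085 - 1026 from the pooled calculation, the alternative is assumed to be two-sided, and the normal tail area is computed with the standard library's `math.erfc`.

```python
import math

# Counts from the slides. The Duke sample size of 59 is taken as
# 1085 - 1026 from the pooled calculation; treat it as an assumption.
successes_us, n_us = 564, 1026
successes_duke, n_duke = 23, 59

p_hat_us = successes_us / n_us
p_hat_duke = successes_duke / n_duke

# Pooled estimate: total successes over total observations.
p_pool = (successes_us + successes_duke) / (n_us + n_duke)

# Standard error of p-hat_Duke - p-hat_US, assuming H0: p_Duke = p_US.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_us + 1 / n_duke))

# Test statistic: (point estimate - null value) / SE, with null value 0.
z = (p_hat_duke - p_hat_us) / se

# Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"pooled p-hat = {p_pool:.3f}")
print(f"SE = {se:.4f}, z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

Because the code works from the raw counts instead of the rounded proportions 0.55, 0.39, and 0.54, its output may differ slightly in the last digits from a hand calculation with the rounded values.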

Quick recap on comparing proportions

$H_0: p_1 - p_2 = 0$
When comparing proportions, if $H_0: p_1 - p_2 = 0$, first calculate the pooled estimate, $\hat{p}$, and then use that to calculate the standard error.

$H_0: p_1 - p_2 =$ some nonzero value
If $H_0: p_1 - p_2 =$ some nonzero value, then just use the observed sample proportions, $\hat{p}_1$ and $\hat{p}_2$, for calculating the standard error.

When calculating a confidence interval, always use the observed sample proportions, $\hat{p}_1$ and $\hat{p}_2$, for calculating the standard error, since there is no null hypothesis telling you what to do.

When to retreat

The inference tools that we have learned that rely on the CLT and the normal distribution require the following two assumptions:
1. The individual observations must be independent.
2. Sample size and skew should not prevent the sampling distribution from being nearly normal.
   means: n > 50, population distribution not extremely skewed
   proportions: at least 10 successes and 10 failures
In Chapter 6 we'll learn how to analyze smaller samples.

If conditions for a statistical technique are not satisfied:
1. learn new methods that are appropriate for the data
2. consult a statistician
3. ignore the failure of conditions: this option effectively invalidates any analysis and may discredit novel and interesting findings

Got ESP?

In a test for extrasensory perception (ESP), a pack of five cards is hidden and the test taker guesses the chosen card. This is repeated many times. If the test taker does not have ESP, i.e. is randomly guessing, what percent of the cards would s/he be expected to guess correctly?
(a) 0
(b) 0.5
(c) 0.20
(d) 0.25
(e) 0.99

http://www.psychicscience.org/esp3.aspx
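
A quick simulation can illustrate what random guessing looks like in the five-card setup. This is only a sketch: the function name `simulate_random_guessing`, the number of trials, and the fixed seed are all assumptions made for the illustration, and it models each trial as an independent, uniformly random card and an independent, uniformly random guess.

```python
import random


def simulate_random_guessing(n_trials, n_cards=5, seed=0):
    """Return the fraction of correct guesses when guessing uniformly at random.

    Each trial draws a hidden card and an independent guess from n_cards options.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    hits = 0
    for _ in range(n_trials):
        chosen = rng.randrange(n_cards)
        guess = rng.randrange(n_cards)
        if guess == chosen:
            hits += 1
    return hits / n_trials


hit_rate = simulate_random_guessing(100_000)
print(f"simulated hit rate with no ESP: {hit_rate:.3f}")
```

With many trials the simulated hit rate should settle near the chance probability of one in five, which is the null value a test for ESP would be built around.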

Testing for ESP

Psi-hitting is defined as guessing more cards correctly than expected by chance. At least how many cards, out of 100, should the test taker get right in order for there to be statistically significant evidence of psi-hitting at the 5% significance level?

What if?

What if a test taker gets more than ___ out of 100 cards right? The hypothesis test would yield a p-value less than 0.05, and we would conclude that the data provide convincing evidence for the test taker having ESP. In reality nobody has ESP. What type of error might we have made?

(a) Type 1 error
(b) Type 2 error

Pique technique

The Pique Technique claims that people are more likely to respond to an unusual request than to a standard request because the unusual request will pique their curiosity. Researchers divided 144 volunteers into two equally sized groups. Depending on their group, subjects were asked for either a quarter or 17 cents. What are the appropriate hypotheses for evaluating the Pique Technique in this context?

(a) $H_0: p_{usual} = p_{unusual}$; $H_A: p_{usual} > p_{unusual}$
(b) $H_0: p_{usual} = p_{unusual}$; $H_A: p_{usual} \neq p_{unusual}$
(c) $H_0: p_{usual} = p_{unusual}$; $H_A: p_{usual} < p_{unusual}$
(d) $H_0: p_{unusual} = 0.5$; $H_A: p_{unusual} > 0.5$

Ramsey and Schafer, The Statistical Sleuth, 2nd ed. (Duxbury, 2002), p. 549

The group from which a quarter was requested had a 30.6% success rate, while the other group had a 43.1% success rate. Evaluate the hypotheses at the 5% significance level.
                  Usual              Unusual
$\hat{p}$         0.306              0.431
n                 72                 72
# of successes    72 × 0.306 ≈ 22    72 × 0.431 ≈ 31

Pooled $\hat{p} = \frac{22 + 31}{72 + 72} = \frac{53}{144} = 0.368$
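
One way to carry the Pique technique evaluation through is sketched below. It rests on assumptions that the slides leave as questions: a one-sided alternative $H_A: p_{usual} < p_{unusual}$, and a success-failure check based on expected counts under the pooled estimate. The variable names are illustrative, and the normal tail area uses the standard library's `math.erfc`.

```python
import math

# Counts from the table above: 72 volunteers per group.
successes_usual, n_usual = 22, 72
successes_unusual, n_unusual = 31, 72

p_hat_usual = successes_usual / n_usual
p_hat_unusual = successes_unusual / n_unusual

# Pooled estimate, used because H0 says the two proportions are equal.
p_pool = (successes_usual + successes_unusual) / (n_usual + n_unusual)

# Success-failure check (assumed convention: expected counts under the
# pooled estimate should be at least 10 in each group).
expected_counts = [
    n_usual * p_pool, n_usual * (1 - p_pool),
    n_unusual * p_pool, n_unusual * (1 - p_pool),
]
conditions_ok = all(count >= 10 for count in expected_counts)

# Standard error under H0 and the test statistic.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_usual + 1 / n_unusual))
z = (p_hat_usual - p_hat_unusual) / se

# Lower-tail p-value for the assumed alternative p_usual < p_unusual:
# P(Z <= z) = 0.5 * erfc(-z / sqrt(2)).
p_value = 0.5 * math.erfc(-z / math.sqrt(2))
reject_h0 = p_value < 0.05

print(f"conditions met: {conditions_ok}")
print(f"pooled p-hat = {p_pool:.3f}, SE = {se:.4f}, z = {z:.2f}")
print(f"one-sided p-value = {p_value:.4f}, reject H0 at 5%: {reject_h0}")
```

If a two-sided alternative were chosen instead, the p-value would be twice the lower-tail area computed here.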