Chapter 6: Inference for Proportions

These notes reflect material from our text, Exploring the Practice of Statistics, by Moore, McCabe, and Craig, published by Freeman, 2014. (Spring 2017)

The Logic of a Confidence Interval for a Proportion

Suppose we are studying a binary categorical variable, such as whether a voter will vote for a specific proposition in the next election. After the election, the proportion of the entire voting population that voted for the proposition becomes known; call it $p$. If we interview a sample of $n$ voters before the election, or in an exit poll, then we know from the last chapter that the sampling distribution of the sample proportion $\hat{p}$ of voters in our sample voting for the proposition is approximately normal with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$. The proper name for the standard deviation of a statistic is standard error, so we will often write $SE = \sqrt{p(1-p)/n}$.

Now, 95% of a normal distribution falls within 1.96 standard deviations of its mean, so with probability 95%, the distance from the center $p$ of our normal sampling distribution to the sample proportion $\hat{p}$ is less than or equal to $1.96 \cdot SE$. The distance from A to B is the same as the distance from B to A, so we can turn that around and say that there is a 95% probability that an interval centered on $\hat{p}$ with radius $1.96 \cdot SE$ will contain the population proportion $p$.

Finally, 1.96 is the particular z-value associated with 95%, so let $z^* = 1.96$. Then the center of our confidence interval for a proportion is the point estimate $\hat{p}$, the margin of error of the interval is $ME = z^* \cdot SE$, and the entire confidence interval has the form

point estimate $\pm$ ME $= \hat{p} \pm z^* \cdot SE$.

[Figure: Logic of a CI. If the distance from 0 to a data point is < 2, then the distance from the data point to 0 is < 2. Horizontal axis from -3 to 4.]
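The claim that about 95% of such intervals capture $p$ can be checked by simulation. The sketch below is illustrative only: the true proportion `p`, the sample size `n`, the number of repetitions `reps`, and the seed are assumed values chosen for this example and do not come from the text. The intervals use the estimated standard error, since $p$ would be unknown in practice.

```r
# Simulate many samples and count how often the 95% interval contains p.
set.seed(1)                      # assumed seed, for reproducibility
p <- 0.4                         # assumed true population proportion
n <- 200                         # assumed sample size
reps <- 10000                    # assumed number of simulated samples

p.hat <- rbinom(reps, size = n, prob = p) / n   # one sample proportion per sample
se <- sqrt(p.hat * (1 - p.hat) / n)             # estimated standard error
z.star <- qnorm(0.975)                          # multiplier for 95% confidence

lower <- p.hat - z.star * se
upper <- p.hat + z.star * se
coverage <- mean(lower <= p & p <= upper)       # fraction of intervals containing p
coverage
```

With these settings the reported coverage should land close to, though not exactly at, 0.95, because the normal approximation is only approximate.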

Sampling distribution of $\hat{p}$

Variable    | Statistic | Shape  | Center | Standard Error     | Conditions
categorical | $\hat{p}$ | Normal | $p$    | $\sqrt{p(1-p)/n}$  | $n \cdot \min(p, 1-p) \ge 10$

The standard deviation of a statistic is its standard error:

$SE_{\hat{p}} = \sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.

When it is known, this value indicates the accuracy of $\hat{p}$. When it is not known, which is often the case, we use the estimated standard error $s_{\hat{p}}$ in its place (Rice, p. 213). It is calculated from the data:

$s_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$.

Typical R code for a 95% confidence interval for a proportion

    # Peck, 1/e, ex. 9.43
    p.hat <- 0.173
    n <- 200
    alpha <- 0.05
    z.star <- qnorm(1 - alpha/2)
    se <- sqrt(p.hat * (1 - p.hat) / n)
    ci <- p.hat + z.star * se * c(-1, 1)
    # 0.1205786 0.2254214

The above code explicitly inserts plus and minus signs to create upper and lower limits for the confidence interval. Another way to achieve the same result is to have the function qnorm return a pair of z.star values.

    z.star <- qnorm(c(alpha/2, 1 - alpha/2))
    ci <- p.hat + z.star * se
    ci
    # 0.1205786 0.2254214

Hypothesis tests for statistical inference

There are four steps in the formal process of using hypothesis tests for statistical inference (Probability and Statistics, Open Learning Initiative, CMU):

 - Hypotheses. Formulate the null and alternative hypotheses.
 - Data and sample statistic. Collect relevant data from a random sample and summarize them using an appropriate sample statistic. Verify the conditions which determine the distribution of the sample statistic.
 - p value. Calculate the associated p value, the probability of obtaining a sample statistic at least as extreme as the one observed if the null hypothesis is true.
 - Conclusion. Decide whether or not there is enough evidence to reject $H_0$ and accept $H_a$, and state the conclusion in context.

Example: an ancient coin

An ancient piece of eight from a pirate's treasure claims to be a fair coin. Skeptical onlookers ask for a bit of evidence, so the coin is flipped 100 times and comes up heads 61 times. What should we believe?

There are two hypotheses. The null hypothesis assumes that nothing is amiss. The alternative hypothesis states the contrary. Let $p$ be the probability of obtaining a head with this coin.

$H_0: p = 0.5$
$H_a: p \ne 0.5$

The sample proportion is $\hat{p} = 61/100 = 0.61$, and the standard error of this statistic, assuming that the null hypothesis is true, is

$SE = \sqrt{p(1-p)/n} = 0.05$,

so the z-score of $\hat{p}$ is

$z = (\hat{p} - p)/SE = 2.2$.

The probability of obtaining a score at least this extreme, in either direction, is 0.028, so the onlookers quickly arrive at a conclusion. What did they conclude?

R code for the ancient coin

    # hypothesis test for a proportion
    # an ancient piece of eight claims to be a fair coin
    # H_0 : p == 0.5
    # H_a : p != 0.5
    h <- 61; n <- 100
    p.hat <- h / n
    p <- 0.5
    se <- sqrt(p * (1 - p) / n)
    z <- (p.hat - p) / se             # 2.2
    p.value <- 2 * (1 - pnorm(z))     # 0.0278069

[Figure: Hypothesis Test (coin). Standard normal curve on an axis from -3 to 3, with z = 2.2 marked and p = 0.028.]
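As a cross-check on the hand calculation, the same test can be run through R's built-in `prop.test`. This is only a sketch: `p0` and `res` are names introduced here for illustration. With the continuity correction switched off, `prop.test` reports a chi-squared statistic that equals $z^2$, and its p-value should agree with the two-sided p-value computed by hand.

```r
# Cross-check the hand calculation for the coin against prop.test().
h <- 61; n <- 100; p0 <- 0.5

z <- (h / n - p0) / sqrt(p0 * (1 - p0) / n)     # z statistic, as in the notes
p.value <- 2 * (1 - pnorm(abs(z)))              # two-sided p-value

# correct = FALSE turns off the continuity correction
res <- prop.test(x = h, n = n, p = p0,
                 alternative = "two.sided", correct = FALSE)

c(z.squared = z^2, X.squared = unname(res$statistic))
c(by.hand = p.value, prop.test = res$p.value)
```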

Significance (alpha, beta, and power)

A Type I error occurs if we reject $H_0$ when $H_0$ is true. The probability of doing so is the significance level $\alpha$ of the hypothesis test.

A Type II error occurs if we fail to reject $H_0$ when $H_0$ is false. The probability of doing so is denoted by $\beta$. The power of a test is $1 - \beta$.

The power of a hypothesis test is the probability of rejecting $H_0$ in favor of the alternative hypothesis $H_a$ when the alternative hypothesis is true. If the alternative hypothesis is that a new drug is more effective than an established drug, then the power of the hypothesis test is very important to the researcher's efforts to establish that fact.

                     Decision
Reality       | Fail to reject $H_0$ | Reject $H_0$
$H_0$ true    | correct              | Type I error
$H_0$ false   | Type II error        | correct

In the illustrations below, the z-test statistic has a standard normal distribution when $H_0$ is true, illustrated by the curve on the left of each pair. The significance $\alpha$ is the area of the yellow region. If the alternative hypothesis $H_a$ is true, and the z-test statistic actually has a normal distribution with mean 4 and standard deviation 1, as in the curves on the right in the illustrations below, then $\beta$ is the area of the region in red. In the illustration on the upper right, the power is the area of the region in tan.

Notice that reducing the probability of making a Type I error amounts to shifting the dividing line separating the yellow and red areas to the right, reducing the area of the yellow region, and that will automatically increase the probability of making a Type II error, the area of the red region.

The size of $\beta$, and hence of the power, $1 - \beta$, is very sensitive to the position and shape of the distribution of the z-statistic for the alternative hypothesis. Shifting its mean to the left or right, or increasing or decreasing its spread, will affect $\beta$ and $1 - \beta$. These are sampling distributions, so the sample size, $n$, plays a significant role in determining the spread of these distributions. As $n$ gets larger, the spreads get narrower, so $\beta$ gets smaller and the power gets larger.

[Figure: four panels titled "alpha and beta", "power", "increase beta", and "decrease beta", each showing the distributions of the z-statistic under the null and alternative hypotheses.]
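The trade-off between $\alpha$ and $\beta$ described above can be put into numbers. The sketch below assumes a one-sided test that rejects $H_0$ for large z, and takes the alternative distribution from the illustration (normal with mean 4 and standard deviation 1). The function name `power.for` and the particular $\alpha$ values are hypothetical choices for this example.

```r
# Null: z ~ N(0, 1).  Alternative (as in the illustration): z ~ N(4, 1).
# Assumption: a one-sided test that rejects H_0 for large z.
mu.alt <- 4

power.for <- function(alpha, mu = mu.alt) {
  z.crit <- qnorm(1 - alpha)            # dividing line between the two regions
  beta <- pnorm(z.crit, mean = mu)      # P(fail to reject H_0 | H_a true)
  c(alpha = alpha, z.crit = z.crit, beta = beta, power = 1 - beta)
}

# Smaller alpha pushes the dividing line to the right, so beta grows.
results <- rbind(power.for(0.05), power.for(0.01), power.for(0.001))
results
```

Each row of `results` shows one choice of $\alpha$; reading down the rows, the critical value moves right, $\beta$ increases, and the power decreases.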

One proportion

At one point, we studied an ancient coin recovered from a pirate's treasure chest, and we wished to know if the coin was fair or not, that is, if it were flipped a large number of times, would the proportion of resulting heads be very close to 0.5. In this case, the relevant experiment is to flip the coin once, and the response variable is categorical: the outcome is either a head or a tail. The evidence to decide the issue will be the outcomes from a certain number of independent experiments.

There are two ways to analyze this evidence: by constructing a confidence interval or by performing a hypothesis test. The first step in both types of analysis is to establish some definitions and notation for expressing our ideas clearly. In this case, the population proportion, $p$, is the long-run probability of obtaining a head on flipping the ancient coin. The sample proportion, $\hat{p}$, is the actual proportion of heads we observe in our sample of $n$ flips. Now, $\hat{p}$ is a statistic; it varies from one sample to the next, so it has a sampling distribution. The standard deviation of that sampling distribution is called its standard error, SE. For a proportion, the standard error is given by

$SE = \sqrt{p(1-p)/n}$ when $p$ is known, and $SE = \sqrt{\hat{p}(1-\hat{p})/n}$ when $p$ is unknown.

The Central Limit Theorem, CLT, applied to proportions shows that

$\hat{p} \sim N(\text{mean} = p,\ SE = \sqrt{p(1-p)/n})$, approximately,

provided that certain conditions are satisfied: (1) the sample observations are independent, and (2) the sample size is sufficiently large, which is determined by the success-failure conditions ($np \ge 10$ and $n(1-p) \ge 10$).

A confidence interval for a proportion now takes the form

point estimate $\pm$ multiplier $\cdot$ SE $= \hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$

where the multiplier, $z^*$, depends on the desired confidence level. For instance, for significance level $\alpha = 0.05$ and 95% confidence we would use $z^* = 1.96$, and for a more general $(1 - \alpha)$ level of confidence we would calculate $z^*$ with the R command qnorm(1 - alpha/2).
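To make the dependence of the multiplier on the confidence level concrete, here is a small sketch that tabulates $z^*$ for three common levels. The vector name `conf.level` and the choice of levels are assumptions made for this illustration.

```r
# Multipliers z* for several common confidence levels.
conf.level <- c(0.90, 0.95, 0.99)
alpha <- 1 - conf.level
z.star <- qnorm(1 - alpha / 2)
round(data.frame(conf.level, alpha, z.star), 3)
# z.star: 1.645 1.960 2.576
```

Higher confidence requires a larger multiplier, and therefore a wider interval for the same data.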
A hypothesis test for a proportion posits the null hypothesis $H_0: p = p_0$, where $p_0 = 0.5$ in this case because of the null assumption that this is a fair coin, against an alternative hypothesis, which may take one of three forms:

$H_a: p \ne p_0$,  $H_a: p > p_0$,  $H_a: p < p_0$.

The test statistic is

$z = (\hat{p} - p_0)/SE$

where the formula for the standard error takes into account the presumption of the null hypothesis, since $p_0$ is the null value:

$SE = \sqrt{p_0(1-p_0)/n}$.
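The three forms of the alternative hypothesis lead to three different p-values from the same z statistic. The following sketch uses the coin data from earlier in these notes (61 heads in 100 flips); the names `p0` and `p.values` are introduced here for illustration, and the success-failure check is applied at the null value.

```r
# p-values for the three forms of the alternative hypothesis,
# using the coin data (61 heads in 100 flips) and p_0 = 0.5.
h <- 61; n <- 100; p0 <- 0.5

# Success-failure check under the null value
stopifnot(n * p0 >= 10, n * (1 - p0) >= 10)

se <- sqrt(p0 * (1 - p0) / n)       # SE uses the null value p_0
z <- (h / n - p0) / se

p.values <- c(
  two.sided = 2 * (1 - pnorm(abs(z))),   # H_a: p != p_0
  greater   = 1 - pnorm(z),              # H_a: p >  p_0
  less      = pnorm(z)                   # H_a: p <  p_0
)
round(p.values, 4)
```

The two-sided p-value is twice the smaller of the two one-sided values, which is why the direction of the alternative should be chosen before looking at the data.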

We decide between the null and alternative hypotheses by calculating the probability, called the p value, of obtaining a test statistic at least as extreme as the one we actually obtained from our evidence, under the conditions specified by the null hypothesis.

We summarize the above analyses in the following inference table for a single proportion.

Inference  | Confidence Interval                               | Test Statistic
(general)  | point estimate $\pm$ multiplier $\cdot$ SE        | (point estimate - null value) / SE
proportion | $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$     | $(\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$

Two proportions

Suppose that we now focus on two gnarly coins from the pirate's treasure. Is one of them more likely than the other to present a head when flipped, or do both coins share the same probability of coming up heads?

This time we once again have a binary response variable as an outcome, heads or tails, but the data come from two sources, the two coins. We have a binary explanatory variable with levels coin 1 and coin 2. Let $p_1$ be the probability that coin 1 comes up heads, and let $p_2$ be the probability that coin 2 comes up heads. The question becomes, is $p_1 = p_2$?

We can investigate this question with a hypothesis test whose null hypothesis is $H_0: p_1 = p_2$, or with a confidence interval which would put plausible bounds on the difference $p_1 - p_2$. In both cases, the basic experiment is to flip each coin a certain number of times and record the outcome of each flip, head or tail. Let $n_1$ be the number of flips and $\hat{p}_1$ the proportion of heads for coin 1, and let $n_2$ be the number of flips and $\hat{p}_2$ the proportion of heads for coin 2.

The confidence interval for the difference of two proportions takes the form

point estimate $\pm$ multiplier $\cdot$ SE

where the point estimate is $\hat{p}_1 - \hat{p}_2$, the multiplier is $z^*$ as before, and the standard error of our new statistic is

$SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$.

Notice that this formula for SE takes into account the variability of each coin.
Therefore, the formula for the confidence interval for the difference of two proportions takes the form

point estimate $\pm$ multiplier $\cdot$ SE $= (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$,

where in practice the unknown $p_1$ and $p_2$ inside the square root are replaced by their estimates $\hat{p}_1$ and $\hat{p}_2$.

For a hypothesis test, the test statistic takes the form z = (point estimate - null value)/SE, where the null value and the SE are calculated assuming that the null hypothesis is true.

The standard error for the difference of two proportions for confidence intervals, and for null hypotheses in which the hypothesized difference $p_1 - p_2$ is not 0, is

$SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$.

The standard error for the difference of two proportions for null hypotheses of the form $H_0: p_1 - p_2 = 0$ is

$SE = \sqrt{\frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_1} + \frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_2}}$

where

$\hat{p}_{pool} = \frac{\text{number of successes in both groups combined}}{n_1 + n_2}$.

All of the above cases for a categorical response variable are summarized in the following table.

Inference                 | Confidence Interval                                                                                         | Test Statistic
(general)                 | point estimate $\pm$ multiplier $\cdot$ SE                                                                  | (point estimate - null value) / SE
proportion                | $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$                                                               | $(\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$
difference in proportions | $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ | $\left((\hat{p}_1 - \hat{p}_2) - 0\right)/\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}}$

For hypothesis tests involving a difference in proportions, the test statistic may make use of the pooled proportion, $\hat{p}$, which pools the total number of successes and the total number of observations in the two samples:

$p_{pooled} = \frac{\#\text{ successes}}{\#\text{ observations}}$.

For significance level $\alpha$, for instance $\alpha = 0.05$ for a 95% confidence level, the multiplier $z^*$ can be calculated in R with

    alpha <- 0.05
    z.star <- qnorm(1 - alpha/2)

Many proportions

How can one compare more than two coins? Do they all share the same probability of producing a head, or is at least one of the coins different? An efficient approach to the problem of multiple comparisons goes back to the turn of the twentieth century with Karl Pearson's discovery in 1900 of the chi-squared test, one of the earliest statistical tests to be clearly explained. We will study the chi-squared test for goodness of fit and the chi-squared test for independence in a later chapter.

Difference of Two Proportions: Handedness

Wikipedia uses a hypothetical random sample of left- and right-handed people to illustrate its article on contingency tables.

        | right-handed | left-handed | total
males   | 43           | 9           | 52
females | 44           | 4           | 48
totals  | 87           | 13          | 100

A first question suggested by such a table is "How many people are left-handed?" The totals along the bottom row indicate that 13% of this sample is left-handed, which suggests that about 13% of the (hypothetical) population that this sample was drawn from is left-handed. The table also presents data on gender, so another question arises: "Are males and females equally likely to be left-handed?"

The mosaic plot on the left illustrates the distribution of handedness in the sample, and the plot on the right illustrates the proportions we would expect if handedness were independent of gender. There is a difference, but is the difference in proportions simply due to random sampling from male and female populations that are equally likely to be left-handed, or is the difference so great that we can take it as evidence that males and females actually differ in their likelihood to be left-handed?

[Figure: two mosaic plots of Handed (left, right) against Gender (m, f), titled "Handedness in the Sample" and "Expected Values if Handedness Is Independent of Gender".]

A hypothesis test of the difference of two proportions addresses that question directly. Let $p_1$ be the proportion of left-handed men, and let $p_2$ be the proportion of left-handed women. The hypotheses to be tested are

$H_0: p_1 - p_2 = 0$,
$H_a: p_1 - p_2 \ne 0$.

The point estimate of the difference of proportions is $\hat{p}_1 - \hat{p}_2 = 0.08974359$, the pooled proportion of left-handed people is $p_{pooled} = 0.13$, and the standard error of this statistic is

$SE = \sqrt{p_{pooled}(1 - p_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.06731456$,

so the test statistic is

$z = (\hat{p}_1 - \hat{p}_2)/SE = 1.333197$.
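As an aside before interpreting the test statistic, the expected counts behind the right-hand mosaic plot can be reproduced from the table's margins. This is a sketch rather than the code used to draw the original figure; the object names `observed`, `expected`, and `prop.left` are introduced here for illustration.

```r
# Observed counts from the handedness table
observed <- matrix(c(43, 44, 9, 4), nrow = 2,
                   dimnames = list(gender = c("males", "females"),
                                   handed = c("right", "left")))

# Expected count = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# Observed proportion left-handed within each gender
prop.left <- observed[, "left"] / rowSums(observed)
prop.left
```

Under independence, each gender would show the same 13% left-handed share, so the gap between the two entries of `prop.left` is the difference that the hypothesis test evaluates.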

Under the null hypothesis, the test statistic z has approximately a standard normal distribution, and the probability of obtaining a value at least this extreme, in either direction, is p = 0.1824671.

[Figure: Hypothesis Test (Handedness). Standard normal curve on an axis from -3 to 3, with z = 1.333 marked and p = 0.182.]

If males and females are equally likely to be left-handed, we would expect to see a difference in the proportions of left-handed males and females as large or larger than the one we observed in about 18% of such samples. That is not very much evidence against the null hypothesis, so we fail to reject the null hypothesis and conclude that, in the light of this case at least, the data are consistent with males and females being equally likely to be left-handed.

R Code

Here is supporting R code for the hypothesis test.

    # HT
    x1 <- 9; n1 <- 52; x2 <- 4; n2 <- 48
    p1.hat <- x1 / n1
    p2.hat <- x2 / n2
    p.pooled <- (x1 + x2) / (n1 + n2)                            # 0.13
    se <- sqrt(p.pooled * (1 - p.pooled) * (1 / n1 + 1 / n2))    # 0.06731456
    z <- (p1.hat - p2.hat) / se                                  # 1.333197
    p.value <- 2 * (1 - pnorm(z))                                # 0.1824671
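The same handedness counts can also be summarized with a confidence interval for $p_1 - p_2$, which uses the unpooled standard error rather than the pooled one. This is a sketch for illustration; `se.unpooled` is a name introduced here. Note that with only 4 left-handed women in the sample the success-failure condition is strained, so the interval should be read as an illustration of the formula rather than a careful analysis.

```r
# 95% confidence interval for p_1 - p_2 (left-handed men minus women),
# using the unpooled standard error.
x1 <- 9; n1 <- 52; x2 <- 4; n2 <- 48
p1.hat <- x1 / n1
p2.hat <- x2 / n2

se.unpooled <- sqrt(p1.hat * (1 - p1.hat) / n1 + p2.hat * (1 - p2.hat) / n2)
z.star <- qnorm(0.975)
ci <- (p1.hat - p2.hat) + z.star * se.unpooled * c(-1, 1)
ci

# Does the interval contain 0, in agreement with the test result?
ci[1] < 0 & 0 < ci[2]
```

An interval that contains 0 tells the same story as a two-sided test that fails to reject at the matching significance level, although the two procedures use slightly different standard errors.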

Difference of Two Proportions: Smoking Then and Now

Here is another hypothetical example. Suppose you come across a magazine article reporting on the incidence of smoking in Sewanee students some years ago, say in the year 1890. According to this article, a random sample of Sewanee students was asked "Have you smoked at least one cigarette in the last week?" and the article reports the number of students who responded "Yes."

"Hmm," you think. "I wonder how that would compare with Sewanee students today?" So, of course, the natural thing to do is to take another random sample of today's Sewanee students and ask the same question.

First, let's define some variables. Let $p_1$ denote the proportion of Sewanee students who smoked in 1890, and let $p_2$ denote the proportion of Sewanee students who smoke now. Here, whether a student is considered to be a smoker or not is determined by the student's answer to that same question.

There are two different ways to organize this study: calculate a confidence interval for the difference of population proportions $p_2 - p_1$ to quantify how the proportion of smokers might have changed, or create a hypothesis test to test whether these two population proportions are the same or not.

Let's create the confidence interval first. We record the results of the surveys symbolically: $\hat{p}_1$ is the proportion of the $n_1$ students reporting "Yes" in 1890, and $\hat{p}_2$ is the proportion of the $n_2$ students reporting "Yes" this year. Our point estimate for the difference in the population proportions is $\hat{p}_2 - \hat{p}_1$, and the standard error of this statistic is

$SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$,

so the desired 95% confidence interval is

$(\hat{p}_2 - \hat{p}_1) \pm 1.96 \cdot SE$.

A hypothesis test framing this question might take the form

$H_0: p_1 = p_2$,
$H_a: p_1 \ne p_2$.

In effect, the null hypothesis claims that the difference in proportions is 0, and our point estimate of this difference is $\hat{p}_2 - \hat{p}_1$. But now the standard error of this statistic must take into account the null hypothesis, which claims that there is no difference in these two proportions.
Therefore, we create a new proportion by adding together all of the smokers from both sampling years, and dividing by the total number of sampled students:

$p_{pooled} = \frac{\#\text{ of smokers}}{\#\text{ of students}}$.

The standard error of our point estimate is now

$SE = \sqrt{p_{pooled}(1 - p_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$,

the z-statistic is

$z = (\hat{p}_2 - \hat{p}_1)/SE$,

and the corresponding p-value is calculated by R as 2 * (1 - pnorm(abs(z))), since the alternative hypothesis is two-sided. The absolute value matters because z is negative whenever $\hat{p}_2 < \hat{p}_1$.

Which approach seems to shed more light on the situation, the confidence interval or the hypothesis test?
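The two analyses described above can be carried out by hand in a few lines. The sketch below borrows the fictitious counts from the R Code section of these notes (23 smokers among 120 students in 1890, 17 among 142 today) and then repeats the analysis with `prop.test`, with the continuity correction switched off, as a cross-check. The names `se.ci`, `se.ht`, and `res` are introduced here for illustration.

```r
# Fictitious counts from the R Code section of these notes
x1 <- 23; n1 <- 120      # 1890 sample: smokers, students interviewed
x2 <- 17; n2 <- 142      # current sample: smokers, students interviewed
p1.hat <- x1 / n1
p2.hat <- x2 / n2

# 95% confidence interval for p_2 - p_1 (unpooled standard error)
se.ci <- sqrt(p1.hat * (1 - p1.hat) / n1 + p2.hat * (1 - p2.hat) / n2)
ci <- (p2.hat - p1.hat) + qnorm(0.975) * se.ci * c(-1, 1)

# Hypothesis test of H_0: p_1 = p_2 (pooled standard error)
p.pooled <- (x1 + x2) / (n1 + n2)
se.ht <- sqrt(p.pooled * (1 - p.pooled) * (1 / n1 + 1 / n2))
z <- (p2.hat - p1.hat) / se.ht
p.value <- 2 * (1 - pnorm(abs(z)))   # abs() because z is negative here

# Same analysis with prop.test, continuity correction switched off;
# listing the current sample first makes the estimate p_2 - p_1
res <- prop.test(x = c(x2, x1), n = c(n2, n1), correct = FALSE)

list(ci = ci, z = z, p.value = p.value,
     prop.test.ci = as.numeric(res$conf.int),
     prop.test.p = res$p.value)
```

The hand-computed interval and p-value should agree with the `prop.test` results, and the chi-squared statistic that `prop.test` reports should equal $z^2$.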

R Code

Let's make up some totally fictitious data to illustrate how to turn the statistical discussion in the previous section on Smoking Then and Now into numerical results. Assume that in the sample from 1890, a total of 120 Sewanee students were interviewed and 23 identified themselves as smokers. For the modern sample, assume that 142 students were interviewed and 17 identified themselves as smokers.

The R function prop.test is a sophisticated and authoritative implementation of the above procedures. Its default is to use a continuity correction for more accuracy, but if we request that the continuity correction not be used, by setting correct=FALSE, then the results will match those that are outlined above.

    prop.test(x=c(23, 17), n=c(120, 142), alternative="two.sided", conf.level=0.95)
    # 2-sample test for equality of proportions with continuity correction
    # data: c(23, 17) out of c(120, 142)
    # X-squared = 2.0761, df = 1, p-value = 0.1496
    # alternative hypothesis: two.sided
    # 95 percent confidence interval:
    # -0.02411712 0.16801383
    # sample estimates:
    # prop 1 prop 2
    # 0.1916667 0.1197183

Outline

Outline for one-variable inference for sample data: The goal is to generalize from a sample to learn about a population.

 - categorical variable
   - one proportion
     - confidence interval: one-sample z CI for a proportion
     - hypothesis testing: one-sample z HT for a proportion
   - difference of two proportions
     - confidence interval: two-sample z CI for a difference in proportions
     - hypothesis testing: two-sample z HT for a difference in proportions

Statistical mythology

You are a p.value. If you are sufficiently small, you can slip under the alpha fence and run away... and escape from the spell of the null hypothesis.

[Image: fence, by Elizabeth Parrish, C'19]

Exercises

We will attempt to solve some of the following exercises as a community project in class today. Finish these solutions as homework exercises, write them up carefully and clearly, and hand them in at the beginning of class next Friday.

Homework 6a: CI for a proportion
Exercises from Section 6.1: 6.23 (Inuits), 6.24 (mates), 6.27 (seniors), 6.28 (seniors)

Homework 6b: CI and HT for a proportion
Exercises from Section 6.1: 6.30 (coffee), 6.31 (sermons), 6.32 (driving), 6.38 (accidents)

Homework 6c: comparing two proportions
Exercises from Section 6.2 and Chapter 6 exercises: 6.48 (podcasts), 6.49 (podcasts), 6.72 (golf), 6.80 (water)