Announcements. Unit 5: Inference for Categorical Data Lecture 1: Inference for a single proportion

Statistics 101, Mine Çetinkaya-Rundel, October 23, 2014

Housekeeping: Announcements
  PA 4 due Friday at 5pm (extended)
  PS 6 due Thursday, Oct 30

Main ideas

(1) Parameter and point estimate for a single proportion
  Parameter of interest, $p$: proportion of success in the population (unknown)
  Point estimate, $\hat{p}$: proportion of success in the sample

(2) Distribution of $\hat{p}$
  Central limit theorem for proportions: sample proportions will be nearly normally distributed with mean equal to the population mean, $p$, and standard error equal to $\sqrt{p(1-p)/n}$.
  $\hat{p} \sim N\left(\text{mean} = p,\ SE = \sqrt{\frac{p(1-p)}{n}}\right)$
  Conditions:
    Independence: random sample/assignment + 10% rule
    At least 10 successes and failures
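The central limit theorem statement above can be checked by simulation. The following is a minimal R sketch, not part of the original slides; the values of `p`, `n`, and `n_sims` are illustrative assumptions chosen so that the success-failure condition holds.

```r
# Sketch: compare the simulated sampling distribution of p-hat with the CLT values.
# p, n, and n_sims are illustrative assumptions, not values from the lecture.
set.seed(101)
p <- 0.3         # assumed population proportion
n <- 200         # assumed sample size (n * p = 60 and n * (1 - p) = 140)
n_sims <- 10000  # number of simulated random samples

# Each draw is the number of successes in one random sample of size n;
# dividing by n turns the counts into sample proportions.
p_hats <- rbinom(n_sims, size = n, prob = p) / n

clt_se <- sqrt(p * (1 - p) / n)  # standard error given by the CLT

cat("mean of p-hats:", mean(p_hats), " CLT mean:", p, "\n")
cat("sd of p-hats:  ", sd(p_hats), " CLT SE:  ", clt_se, "\n")
```

With these settings the simulated mean and standard deviation should land close to $p$ and $\sqrt{p(1-p)/n}$; `hist(p_hats)` would show the nearly normal shape.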

Suppose $p = 0.93$. What shape does the distribution of $\hat{p}$ have in random samples of $n = 100$?
  (a) unimodal and symmetric (nearly normal)
  (b) bimodal and symmetric

Suppose $p = 0.05$. What shape does the distribution of $\hat{p}$ have in random samples of $n = 100$?
  (a) unimodal and symmetric (nearly normal)
  (b) bimodal and symmetric

Suppose $p = 0.5$. What shape does the distribution of $\hat{p}$ have in random samples of $n = 100$?
  (a) unimodal and symmetric (nearly normal)
  (b) bimodal and symmetric

(3) Expected vs. observed counts / proportions
  When doing a HT we must assume $H_0$ is true; when constructing a CI there is no null hypothesis that governs the calculations.
  S-F: number of successes and failures for checking the success-failure condition for the nearly normal distribution of $\hat{p}$:
    CI: use observed proportion: $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$
    HT: use null value of the proportion: $np_0 \ge 10$ and $n(1-p_0) \ge 10$
  SE: proportion of success for calculating the standard error of $\hat{p}$, $SE = \sqrt{p(1-p)/n}$:
    CI: use observed proportion: $SE = \sqrt{\hat{p}(1-\hat{p})/n}$
    HT: use null value of the proportion: $SE = \sqrt{p_0(1-p_0)/n}$
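The success-failure check and the two standard error formulas can be written as small helpers. This is a sketch in R; the function names `sf_met` and `se_prop` and the values of `p_hat`, `p_0`, and `n` in the second half are hypothetical, introduced only for illustration.

```r
# Success-failure check: TRUE when both counts are at least 10.
sf_met <- function(n, prop) {
  n * prop >= 10 && n * (1 - prop) >= 10
}

# Standard error of p-hat for a given proportion and sample size.
se_prop <- function(n, prop) {
  sqrt(prop * (1 - prop) / n)
}

# The three scenarios from the questions above, all with n = 100.
for (p in c(0.93, 0.05, 0.5)) {
  cat("p =", p, " S-F met:", sf_met(100, p), "\n")
}

# CI vs. HT: same helper, different proportion plugged in.
# p_hat, p_0, and n are hypothetical illustration values.
p_hat <- 0.42
p_0 <- 0.5
n <- 150
se_ci <- se_prop(n, p_hat)  # CI: observed proportion
se_ht <- se_prop(n, p_0)    # HT: null value
```

For $n = 100$ the condition fails at $p = 0.93$ (7 expected failures) and at $p = 0.05$ (5 expected successes), and holds at $p = 0.5$.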

(4) Simulation vs. theoretical inference
  If the S-F condition is met, can do theoretical inference: Z test, Z interval
  If the S-F condition is not met, must use simulation based methods: randomization test, bootstrap interval

Single population proportion, large sample

Behavioral indicators such as positive and negative emotions are a vital measure of a society's wellbeing, and are used for evaluating countries since traditional economic indicators (e.g. GDP) alone cannot quantify the human condition. A 2012 Gallup poll found that Singaporeans are the least likely in the world to report experiencing emotions of any kind on a daily basis. 36% out of 1,095 Singaporeans polled report feeling either positive or negative emotions.

You are asked to write a newspaper article about this finding, and provide a probable range of values (a 95% confidence interval) for the true proportion of Singaporeans who experience emotions on a daily basis. Which of the following is the correct CI?
  $\hat{p} = 0.36$, $n = 1{,}095$
  (a) 0.36 ± [...]
  (b) 0.36 ± 1.65 [...]
  (c) 0.36 ± 1.96 [...]
  (d) 0.36 ± 1.96 [...]
  (e) 0.36 ± 1.96 ( [...] )

Evaluate whether these data provide convincing evidence that the majority of Singaporeans do not experience emotions on a daily basis.
  $\hat{p} = 0.36$, $n = 1{,}095$
  $p$: proportion of Singaporeans who experience emotions daily
  $H_0: p = 0.5$
  $H_A: p < 0.5$
  S-F condition: $1{,}095 \times 0.5 > 10$
  $Z = \frac{obs - null}{SE} = \frac{0.36 - 0.50}{\sqrt{0.5 \times 0.5 / 1095}} \approx -9.27$
  p-value $= P(Z < -9.27) < 0.0001$
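The interval and the test for the Gallup example can be reproduced with a few lines of base R. This is a sketch using only the figures quoted above ($\hat{p} = 0.36$, $n = 1{,}095$, null value 0.5); the variable names are illustrative.

```r
p_hat <- 0.36
n <- 1095
p_0 <- 0.5

# 95% confidence interval: the SE uses the observed proportion.
se_ci <- sqrt(p_hat * (1 - p_hat) / n)
z_star <- qnorm(0.975)  # critical value, about 1.96
ci <- p_hat + c(-1, 1) * z_star * se_ci

# Hypothesis test of H0: p = 0.5 vs. HA: p < 0.5: the SE uses the null value.
se_ht <- sqrt(p_0 * (1 - p_0) / n)
z <- (p_hat - p_0) / se_ht
p_value <- pnorm(z)     # lower-tail probability, matching HA: p < 0.5

round(ci, 3)
round(z, 2)
p_value < 0.0001
```

The interval comes out at roughly 0.33 to 0.39, and the p-value is far below 0.0001. Note that the two standard errors differ because the CI plugs in $\hat{p}$ while the test plugs in $p_0$.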

Which of the following is the best interpretation of the p-value?
  (a) If 50% of Singaporeans experience emotions daily, the probability of obtaining a random sample of 1,095 Singaporeans where less than 36% experience emotions is less than 0.001.
  (b) If 50% of Singaporeans experience emotions daily, the probability of obtaining a random sample of 1,095 Singaporeans where more than 36% experience emotions is less than 0.001.
  (c) If 50% of Singaporeans experience emotions daily, the probability of obtaining a random sample of 1,095 Singaporeans where less than 36% or more than 64% experience emotions is less than 0.001.
  (d) If 36% of Singaporeans experience emotions daily, the probability of obtaining a random sample of 1,095 Singaporeans where more than 50% experience emotions is less than 0.001.

Single population proportion, small sample

Are you left handed?
  (a) Yes
  (b) No

A variety of studies suggest that 10% of the world population is left-handed. Assuming that this class is a representative sample of Duke students, which of the following is the correct set of hypotheses for testing if the proportion of Duke students who are left-handed is different than the proportion of left-handed people in the world?
  (a) $H_0: p = 0.10$; $H_A: p < 0.10$
  (b) $H_0: p = 0.10$; $H_A: p \ne 0.10$
  (c) $H_0: \hat{p} = 0.10$; $H_A: \hat{p} < 0.10$
  (d) $H_0: \hat{p}_{Duke} = \hat{p}_{world}$; $H_A: \hat{p}_{Duke} \ne \hat{p}_{world}$
  (e) $H_0: p_{Duke} = p_{world}$; $H_A: p_{Duke} \ne p_{world}$

Simulate by hand

Describe a simulation scheme for this hypothesis test.
  10 chips in a bag: 1 red (left-handed), 9 black (right-handed).
  Sample randomly $n$ times from the bag.
  Calculate $\hat{p}$, the proportion of reds (successes) in the random sample of $n$ chips, and record this value.
  Repeat many times.
  Calculate the proportion of simulations where $\hat{p}$ is at least as different from 0.10 as the one observed.
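The chip-drawing scheme above translates almost line by line into base R. This is a sketch only: the class counts (`n` = 42 students, 8 of them left-handed) are hypothetical stand-ins, since the real numbers come from the in-class poll, and drawing with replacement is assumed because $n$ draws are taken from a bag of 10 chips.

```r
set.seed(2014)

# Hypothetical class data (NOT from the lecture): 42 students, 8 left-handed.
n <- 42
observed_left <- 8
p_hat_obs <- observed_left / n
p_0 <- 0.10  # null value: 10% of the world population is left-handed

# The bag of chips under H0: 1 red (left-handed), 9 black (right-handed).
bag <- c("red", rep("black", 9))

# Draw n chips with replacement, count the reds, and repeat many times.
n_sims <- 10000
sim_counts <- replicate(n_sims, sum(sample(bag, size = n, replace = TRUE) == "red"))
sim_p_hats <- sim_counts / n  # simulated p-hats

# Two-sided p-value: share of simulated p-hats at least as far from 0.10
# as the observed p-hat.
p_value <- mean(abs(sim_p_hats - p_0) >= abs(p_hat_obs - p_0))
p_value
```

Replacing `n` and `observed_left` with the actual class counts gives the randomization p-value for the two-sided test of $H_0: p = 0.10$.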

Simulate in R

  library(downloader)  # assumption: download() is not base R and is taken to come from the downloader package
  download("https://stat.duke.edu/~mc301/r/inference.rdata", destfile = "inference.rdata")
  load("inference.rdata")
  n_left = [fill in based on class data]      # number of left-handed students
  n_notleft = [fill in based on class data]   # number of students who are not left-handed
  class_hand = c(rep("left", n_left), rep("not left", n_notleft))
  # simulation-based two-sided test of H0: p = 0.10
  inference(class_hand, success = "left", est = "proportion", type = "ht", null = 0.10, alternative = "twosided", method = "simulation")

Recap

Recap on CLT based methods
  Calculating the necessary sample size for a CI with a given margin of error:
    If there is a previous study, use $\hat{p}$ from that study
    If not, use $\hat{p} = 0.5$:
      if you don't know any better, 50-50 is a good guess
      $\hat{p} = 0.5$ gives the most conservative estimate: highest possible sample size
  HT vs. CI for a proportion
    Success-failure condition:
      CI: at least 10 observed successes and failures
      HT: at least 10 expected successes and failures, calculated using the null value
    Standard error, $SE = \sqrt{p(1-p)/n}$:
      CI: calculate using observed sample proportion: $SE = \sqrt{\hat{p}(1-\hat{p})/n}$
      HT: calculate using the null value: $SE = \sqrt{p_0(1-p_0)/n}$

Recap on simulation methods
  If the S-F condition is not met:
    HT: Randomization test: simulate under the assumption that $H_0$ is true, then find the p-value as the proportion of simulations where the simulated $\hat{p}$ is at least as extreme as the one observed.
    CI: Bootstrap interval: resample with replacement from the original sample, and construct the interval using the percentile or standard error method.
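The bootstrap interval described in the recap can be sketched in base R without the course's `inference()` function. The sample below (8 left-handed out of 42) is hypothetical, not class data, and both the percentile and the standard error versions are shown.

```r
set.seed(19)

# Hypothetical original sample (NOT class data): 8 left-handed out of 42.
original <- c(rep("left", 8), rep("not left", 34))
n <- length(original)
p_hat <- mean(original == "left")

# Resample with replacement from the original sample and record each p-hat.
n_boot <- 10000
boot_p_hats <- replicate(n_boot, {
  resample <- sample(original, size = n, replace = TRUE)
  mean(resample == "left")
})

# Percentile method: middle 95% of the bootstrap distribution.
ci_percentile <- quantile(boot_p_hats, probs = c(0.025, 0.975))

# Standard error method: point estimate +/- z* times the bootstrap SE.
ci_se <- p_hat + c(-1, 1) * qnorm(0.975) * sd(boot_p_hats)

ci_percentile
ci_se
```

With only 8 successes the success-failure condition fails, which is the situation where the recap recommends a bootstrap interval over the Z interval.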