Estimating Proportions

SOC364 Statistics w/ Dr. Ellis Godard (SOC364 @ CSUN)
Estimating Proportions
3/1/018

Outline for Today
-- Reminders about Missing Values
-- Interpreting Confidence Intervals
-- Confidence About Proportions
   -- Proportions as Interval Variables
   -- Confidence Intervals
   -- Confidence Coefficients
   -- Examples
-- Lab Exercise (2 parts)
   -- Both involve a lil math, YAY!

Last Reminders about Missing Data
-- "How do I recode to take care of missing values?"
   -- Do NOT use TRANSFORM - RECODE or the MISSINGS column in VARIABLE VIEW
   -- That is VERY unusual: you need a good reason, and it is probably NOT what you should do
   -- The list of 7 solutions for missings was given twice, and recoding was NOT on it!
-- Cases w/ missing values don't require action
   -- They're already identified as missing; that's how they're counted
-- Invalid values counted as valid: that's trouble!
   -- In the frequency distribution, are invalid values (such as "Don't Know", "Not apply", and "No answer") in the "Valid" group of values, before the first Total (or is there only one Total)? If there are no problematic values in the "Valid" group and all of the cases identified as missing are in the "Missing" group, then everything's already been done for you.
   -- In Variable View, are there invalid values in the Values column whose numbers aren't listed in the Missings column?

Inline for Today
-- Breathe: Back-patting
-- Questions?
-- Descriptions vary by level of measurement
-- So do inferences (estimates about populations)
-- Nothing last lecture was new; 3 ideas today are (marked w/ lightning bolts)

Prose for Confidence Intervals
-- Not "5.74 and 6.9": the interval between matters, not just the bounds
-- Not just "5.74 to 6.9": what is that interval?
-- Not just "the confidence interval is 5.74 to 6.9": put it in terms that anyone can understand
   -- "We are 95% confident that the average number of music types (of the 1 given) liked by Americans falls somewhere between 5.74 and 6.9."
-- In general: "We are 95% confident that the population parameter falls within this range."

Conf. Int. for Proportions?
-- Remember: the mean and standard deviation can only be computed for an interval variable (e.g., verbal SAT scores or family income).
-- Conf. intervals for interval variables use both:
   $$\bar{X} \pm z \cdot \hat{\sigma}_{\bar{X}} = \bar{X} \pm z \cdot \frac{s}{\sqrt{n}}$$
-- What do we do if our variables are either ordinal or nominal (a.k.a. categorical)?
   -- Central tendency: mode or median, no mean!
   -- Dispersion: range & variation ratio, no std dev!

Only Intervals have Means
-- Interval data has means
   -- Arithmetic average
   -- Z scores count # of std deviations
   -- Confidence intervals are ranges, of width 2*z, around the mean
-- Nominals and ordinals don't have means
   -- Can't arithmetically average
   -- Can't count # of std deviations
      -- Measures distance from the mean
      -- Calculated based on the mean
   -- Can't calculate CIs around a mean using standard deviations
-- But nominals & ordinals do have proportions
   -- The % that is any value: Yes, Male, Black, Very Satisfied

Nominal Proportion
Consider a hypothetical relative frequency distribution for the nominal variable Political Party for a sample of 500 voters:

   Value        f     r.f.
   Democrat     265   0.53
   Republican   220   0.44
   Other         15   0.03
   Total        500   1.00

Based on this information, we could say that the proportion of Democrats (versus Republicans and others) is 0.53, or 53%. Be sure to use valid percent!

Ordinal Proportion
Consider a hypothetical frequency distribution for strength of party loyalty ("Do you support your party's position...") among a sample of 1,000 registered Democrats:

   Value              f       r.f.
   Most of the Time   630     0.63
   Some of the Time   300     0.30
   Rarely              50     0.05
   Never               20     0.02
   Total            1,000     1.00

Based on this information, we could say that the proportion of "Disloyal Democrats" (who either rarely or never support their party's position) is 0.07, or 7%.

Equation for a Proportion
A proportion is a special case of a mean. Define $Y_i = 1$ if the ith observation is in the category of interest (e.g. a Democrat in the first example or a disloyal Democrat in the 2nd) and define $Y_i = 0$ if the ith observation is not in the category. Then,
   $$p = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
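The idea that a proportion is just the mean of a 0/1 indicator can be checked with a few lines of Python. This is only a sketch: the counts (265, 220, 15) are assumed values consistent with the relative frequencies 0.53, 0.44, and 0.03 in the sample of 500 voters, and the variable names are illustrative.

```python
# Assumed counts, chosen to match relative frequencies 0.53, 0.44, 0.03 (n = 500).
counts = {"Democrat": 265, "Republican": 220, "Other": 15}
n = sum(counts.values())

# Indicator Y_i: 1 if the voter is a Democrat, 0 otherwise.
indicators = [1] * counts["Democrat"] + [0] * (n - counts["Democrat"])

# The arithmetic mean of the 0/1 indicator is the sample proportion.
p_democrat = sum(indicators) / len(indicators)

# Relative frequency of every category, for comparison with the table.
rel_freq = {party: f / n for party, f in counts.items()}

print(f"n = {n}, proportion Democrat = {p_democrat:.2f}")
for party, rf in rel_freq.items():
    print(f"{party}: {rf:.2f}")
```

Averaging the indicator and dividing the Democrat count by n give the same number, which is why the tools built for means (standard errors, Z scores, confidence intervals) carry over to proportions.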
Proportions are Interval
-- They're about a value of a nominal or ordinal variable
-- But the intervals between them are equal and consistent
   -- They range from 0% (0.0) to 100% (1.0)
   -- Remember: percents represent "hidden decimals": 50% = 50 per 100 = 50/100 = 0.50
-- We can add them and average them
-- A set of proportions has a mean and a standard deviation

Population vs. Sample Proportion
Let $\pi$ (pi) equal the proportion of the defined population classified in some specific category. The best (least biased and most efficient) estimate of the population proportion is the sample proportion, $p$ (the text writes it with a different symbol).

Means of Sample Proportions
-- Proportions vary intervally among samples
   -- Each sample has a proportion (female, atheist, tall, etc.): the sample proportion
   -- Can arithmetically average those proportions
-- The collection of all of them is a sampling distribution
   -- There's a hypothetical std. deviation to all of them
   -- It's the standard error of the sampling distribution
-- With a mean & std dev, can calculate Zs & CIs of the population mean proportion
-- So, almost nothing new today

Sampling Variance for pi
For nominal or ordinal variables, we use proportions rather than means, and the formula for the variance of proportions:
   $$\sigma_p^2 = \frac{\pi(1-\pi)}{n}$$
For example, the variance for the percent that's female, in a class of 10 students that is 50% female, would be
   $$\sigma_p^2 = \frac{0.50(1-0.50)}{10} = \frac{0.50(0.50)}{10} = \frac{0.25}{10} = \frac{1}{40} = 0.025$$
-- "But Professor Godard, you said not to use the variance"
-- Except as a step to a standard deviation

Standard Error for a Proportion
For nominal or ordinal variables, the formula for the variance of proportions:
   $$\sigma_p^2 = \frac{\pi(1-\pi)}{n}$$
The variance = the standard deviation squared, so the square root of each side gives the standard deviation of proportions:
   $$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
That standard deviation is of a sampling distribution, so it is a standard error. We estimate that standard error by using the sample proportion as an estimate of the population proportion:
   $$\hat{\sigma}_p = \sqrt{\frac{p(1-p)}{n}}$$

Sampling Distribution of a Proportion
If we calculated the sample proportion for all possible samples of size n, the distribution of these sample proportions would be approximately normally distributed around the population proportion.

(Standard Error) Largest Possible
-- When p & q are both 50%: .5 x .5 = .25
   -- But 90% female -> 0.9(1-0.9) = 0.9(0.1) = 0.09
-- Use that (.5) when you don't know p & 1-p (aka q)
   -- It's more conservative than any other guess
   -- A larger std error makes a wider conf. interval, which gives looser claims about reality
-- Review examples on p.137 in the text
-- More in next lecture
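The variance and standard error formulas for a proportion can be sketched in Python as follows. The function names are illustrative, not from the lecture; the inputs are the class of 10 students that is 50% female and the 90% female comparison.

```python
import math

def proportion_variance(p, n):
    """Sampling variance of a proportion: p(1 - p) / n."""
    return p * (1 - p) / n

def proportion_standard_error(p, n):
    """Standard error: the square root of the sampling variance."""
    return math.sqrt(proportion_variance(p, n))

# Class of 10 students, 50% female.
var_class = proportion_variance(0.50, 10)
se_class = proportion_standard_error(0.50, 10)
print(f"variance = {var_class:.3f}, standard error = {se_class:.3f}")

# The numerator p(1 - p) is largest when p = 0.5, so 0.5 is the
# most conservative guess when p is unknown.
numerators = {p: p * (1 - p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
most_conservative = max(numerators, key=numerators.get)
print(f"p(1 - p) peaks at p = {most_conservative}")
```

Because the numerator peaks at p = 0.5, plugging in 0.5 can only overstate the standard error, never understate it, which is what makes it the cautious default.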

Conf. Interval for a Proportion
-- C.I. for a mean:
   $$\bar{X} \pm z \cdot \sigma_{\bar{X}} \quad \text{or} \quad \bar{X} \pm z \cdot \frac{s}{\sqrt{n}}$$
-- C.I. for a proportion:
   $$p \pm z \cdot \sigma_p \quad \text{or} \quad p \pm z \sqrt{\frac{p(1-p)}{n}}$$

Serious Errors to Avoid
-- What you're predicting:
   -- Don't confuse mean and proportion
   -- Don't confuse sample mean & population mean
-- Calculating the interval:
   -- Mean: use the standard error (measuring dispersion in the sampling distribution), not the sample standard deviation (measuring the dispersion in the sample)
   -- Proportion: use the standard error for a proportion (which is not the std. dev. divided by the sqrt of n)
-- Where the alphas & conf. coefficients are:
   -- Don't confuse the area in the tails with the area between the tails of a sampling distribution

Remember this? Same idea
[Figure: sampling distribution with central area = 0.95 between -1.96 and +1.96 standard errors. Sample A: population mean outside the C.I. Sample B: population mean inside the C.I.]

What proportion of CSUN students have consumed marijuana?
-- Point estimate: 14 from a sample of 26 = 53.8% or .538
-- A 95% confidence interval would be:
   $$p \pm z \sqrt{\frac{p(1-p)}{n}} = .538 \pm 1.96 \sqrt{\frac{.538(1-.538)}{26}} = .538 \pm 1.96 \sqrt{\frac{.2486}{26}}$$
   $$= .538 \pm 1.96 \sqrt{.00956} = .538 \pm 1.96(.0977) = .538 \pm .1916 = .3464 \text{ to } .7296$$
-- Assuming our sample was random & unbiased, we can be 95% confident that between 34.64% & 72.96% of CSUN students have smoked pot

Prose for Proportions
If we drew repeated samples of the same size from a population and calculated the 95 percent confidence interval for each sample's proportion, then we would expect the population proportion to fall in the confidence interval 95 percent of the time.

AP Style Guide on Polls
Do not exaggerate poll results. In particular, with pre-election polls, these are the rules for deciding when to write that the poll finds one candidate is leading another:
-- If the difference between the candidates is more than twice the sampling error margin*, then the poll says one candidate is leading.
-- If the difference is less than the sampling error margin, the poll says that the race is close, that the candidates are "about even." (Do not use the term "statistical dead heat," which is inaccurate if there is any difference between the candidates; if the poll finds the candidates are tied, say they're tied.)
-- If the difference is at least equal to the sampling error but no more than twice the sampling error, then one candidate can be said to be "apparently leading" or "slightly ahead" in the race.
* They mean standard error (oops)
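As a rough check on the marijuana example (14 of 26 students) and the AP wording rules, here is a minimal Python sketch. The function names `proportion_ci` and `ap_poll_wording` are illustrative assumptions, not part of the lecture or of any library. Because the code does not round at each step, its bounds can differ from the hand calculation in the third decimal place.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Confidence interval for a proportion: p +/- z * sqrt(p(1 - p) / n)."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = z * standard_error
    return p, p - margin, p + margin

def ap_poll_wording(difference, sampling_error):
    """Apply the three AP rules to the gap between two candidates.

    Both arguments are in the same units (e.g. percentage points).
    """
    difference = abs(difference)
    if difference == 0:
        return "tied"
    if difference > 2 * sampling_error:
        return "leading"
    if difference < sampling_error:
        return "about even"
    return "apparently leading"

# 14 of 26 sampled students, 95% confidence (z = 1.96).
p, low, high = proportion_ci(14, 26)
print(f"point estimate {p:.3f}, 95% C.I. {low:.3f} to {high:.3f}")

# A 5-point gap with a 3-point sampling error falls between 1x and 2x.
print(ap_poll_wording(5, 3))
```

The default `z=1.96` corresponds to 95% confidence; passing a different z gives other confidence levels without changing the rest of the calculation.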

Proportion Example from SPSS
-- Last lab was a C.I. for the interval index MUSIC
-- Could also do a nominal, e.g. the proportion who like folk
-- Point estimate (from frequency table): 47.8%
-- Interval estimate:
   $$p \pm z \sqrt{\frac{p(1-p)}{n}} = .478 \pm 1.96 \sqrt{\frac{.478(.522)}{1429}} = .478 \pm .026 = .452 \text{ to } .504$$
-- Interpretation: We can be 95% confident that, in the population as a whole, between 45.2 and 50.4 percent like folk.
   -- Note that this includes the possibility that more than half like folk
   -- But we can't be confident that a majority do. It's possible that less than a majority do: too close to call

Going Further w/ C.I.s
-- Compare them, e.g. if we estimate that:
   -- 10-20% like hot dogs
   -- 15-30% like burgers
   -- 40-50% like pizza
[Figure: number line from 10 to 50 percent showing the hot dog, burger, and pizza intervals]
-- If any overlap, can't conclude there's any difference
   -- Both parameters could be the same value (e.g. 19% like hotdogs & burgers)
   -- Could be the reverse of what it appears (19% hotdogs, 16% burgers)
   -- Rest of the CI doesn't matter (e.g. burger mostly above hotdog? irrelevant)
-- But if the intervals don't overlap, can conclude the parameters differ
   -- We're 95% confident that pizza is more popular than either one

Lab, Part I (not in SPSS)
If the sample proportion is 50% and the standard error is 10%, construct and interpret
1. a 90% confidence interval
2. a 95% confidence interval
3. a 95.44% confidence interval

Lab, Part II (partly in SPSS)
-- Get %s from freq. distributions in SPSS
-- Don't get the S.E. or C.I. from SPSS! It assumes interval!
4. Calculate a 95% confidence interval for the proportion who like Rap
5. Calculate a 95% confidence interval for the proportion who like Opera
6. Calculate a 95% confidence interval for the proportion who like Bluegrass
7. Compare those intervals: draw a picture, and make conclusions based on how the intervals compare to each other!

Lab Hints
-- There are two parts, that count as one lab.
-- Remember, percentages are interval
   -- They include two "hidden decimals": 50% = 50 per 100 = 50/100 = 0.50
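The overlap rule for comparing confidence intervals can be sketched in Python. The intervals below are the hypothetical food example, read as 10-20% (hot dogs), 15-30% (burgers), and 40-50% (pizza); the helper name `intervals_overlap` is illustrative.

```python
from itertools import combinations

def intervals_overlap(a, b):
    """Return True if two (low, high) intervals share any value."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high

# Hypothetical 95% confidence intervals, in percent.
intervals = {
    "hot dogs": (10, 20),
    "burgers": (15, 30),
    "pizza": (40, 50),
}

# Compare every pair of intervals once.
for (name_a, ci_a), (name_b, ci_b) in combinations(intervals.items(), 2):
    if intervals_overlap(ci_a, ci_b):
        verdict = "overlap: can't conclude there's any difference"
    else:
        verdict = "no overlap: can conclude the parameters differ"
    print(f"{name_a} vs {name_b}: {verdict}")
```

Only the existence of overlap matters here, not how much: the check returns the same answer whether the intervals share one point or most of their range.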