Understanding Dissimilarity Among Samples

Announcements
Midterm is Wed. The review sheet is on the class webpage (in the list of lectures) and will be covered in discussion on Monday. Two sheets of notes are allowed, same rules as for the one sheet last time.
Office hours today, Mon, Tues are slightly revised from usual. See the webpage.
Homework (due Monday): Chapter 9: #50 (Each part counts for 1 point, so the problem is worth 6 points.)

Chapter 9, Sections 4, 5, 9
Sampling Distributions for Proportions: One proportion or difference in two

Copyright 2004 Brooks/Cole, a division of Thomson Learning, Inc., updated by Jessica Utts, Nov 2010

Understanding Dissimilarity Among Samples
Key: We need to understand what kind of dissimilarity we should expect to see in various samples from the same population.
Suppose we knew most samples were likely to provide an answer that is within 10% of the population answer. Then we would also know the population answer should be within 10% of whatever our specific sample gave.
=> We have a good guess about the population value based on just one sample value.

Statistics and Parameters
A statistic is a numerical value computed from a sample. Its value may differ for different samples. e.g. sample mean $\bar{x}$, sample standard deviation $s$, and sample proportion $\hat{p}$.
A parameter is a numerical value associated with a population. It is considered fixed and unchanging. e.g. population mean $\mu$, population standard deviation $\sigma$, and population proportion $p$.

Sampling Distributions
Each new sample taken => the sample statistic will change.
The distribution of possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic.
Many statistics of interest have sampling distributions that are approximately normal distributions.

9.4 Sampling Distribution for One Sample Proportion
Suppose (unknown to us) 40% of a population carry the gene for a disease ($p = 0.40$).
We will take a random sample of 25 people from this population and count $X$ = number with gene.
Although we expect on average to find 10 people (40%) with the gene, we know the number will vary for different samples of $n = 25$.
In this case, $X$ is a binomial random variable with $n = 25$ and $p = 0.4$.
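The claim that X changes from sample to sample can be checked with a small simulation. This is a minimal Python sketch, not part of the original slides: the function name simulate_counts, the fixed seed, and the choice of 10,000 repeated samples are assumptions made for illustration. It uses n = 25 and p = 0.40 from the gene example.

```python
import random
from statistics import mean

def simulate_counts(p, n, num_samples, seed=0):
    """Return X (number of people with the gene) for each of num_samples random samples of size n."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    counts = []
    for _ in range(num_samples):
        # Each sampled person carries the gene with probability p.
        x = sum(1 for _ in range(n) if rng.random() < p)
        counts.append(x)
    return counts

counts = simulate_counts(p=0.40, n=25, num_samples=10_000)
print("first four samples, X:", counts[:4])
print("first four sample proportions:", [x / 25 for x in counts[:4]])
print("average X over all samples:", round(mean(counts), 2))
```

The average of X should come out close to 10, while the individual samples scatter around it.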

Many Possible Samples
Four possible random samples of 25 people:
Sample 1: X = 12, proportion with gene = 12/25 = 0.48 or 48%.
Sample 2: X = 9, proportion with gene = 9/25 = 0.36 or 36%.
Sample 3: X = 10, proportion with gene = 10/25 = 0.40 or 40%.
Sample 4: X = 7, proportion with gene = 7/25 = 0.28 or 28%.
Note: Each sample gave a different answer, which did not always match the population value of 40%.
Although we cannot determine whether one sample statistic will accurately estimate the true population parameter, statisticians have determined probabilities for how far from the truth the sample values could be.

The Normal Curve Approximation Rule for Sample Proportions
Let $p$ = population proportion of interest or binomial probability of success.
Let $\hat{p}$ = sample proportion or proportion of successes.
If numerous random samples or repetitions of the same size $n$ are taken, the distribution of possible values of $\hat{p}$ is approximately a normal curve distribution with
Mean = $p$
Standard deviation = s.d.($\hat{p}$) = $\sqrt{\frac{p(1-p)}{n}}$
This approximate distribution is the sampling distribution of $\hat{p}$.

Ex: Medicine cures 60%. Sample 200 people. $\hat{p}$ = proportion of sample cured.
Sampling distribution for $\hat{p}$ is:
Approximately normal
Mean = $p$ = .60
St. dev. = $\sqrt{\frac{(.4)(.6)}{200}} = .0346$
From the Empirical Rule, expect 95% of samples to produce $\hat{p}$ in the interval mean ± 2 s.d.: $.60 \pm 2(.0346)$ or $.60 \pm .07$ or .53 to .67.

[Figure: Sampling distribution of p-hat for n = 200, p = .6. Normal, Mean = 0.6, StDev = 0.0346; the middle 95% of possible p-hat values lies between 0.53 and 0.67.]

The Normal Curve Approximation Rule for Sample Proportions, continued
The Normal Approximation Rule can be applied in two situations:
Situation 1: A random sample is taken from a population.
Situation 2: A binomial experiment is repeated numerous times.
In each situation, three conditions must be met:
Condition 1: The Physical Situation. There is an actual population or repeatable situation.
Condition 2: Data Collection. A random sample is obtained or the situation is repeated many times.
Condition 3: The Size of the Sample or Number of Trials. The size of the sample or number of repetitions is relatively large; $np$ and $n(1-p)$ must be at least 5 and preferably at least 10.

How well does the approximation work? It depends on $n$ and $p$. Try this applet: http://bcs.whfreeman.com/pbs/cat_050/pbs/clt-binomial.html
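The rule and its size condition can be written out as a short calculation. This is a sketch under stated assumptions, not code from the slides: the function names and the default threshold of 10 are illustrative choices. It uses the medicine example with p = .60 and n = 200.

```python
import math

def sampling_distribution_of_p_hat(p, n):
    """Mean and standard deviation of the approximate normal distribution of p-hat."""
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    return mean, sd

def size_condition_met(p, n, minimum=10):
    """Condition 3: both np and n(1 - p) reach the chosen minimum (5, or preferably 10)."""
    return n * p >= minimum and n * (1 - p) >= minimum

mean, sd = sampling_distribution_of_p_hat(p=0.60, n=200)
# Empirical Rule: about 95% of sample proportions fall within 2 s.d. of the mean.
low, high = mean - 2 * sd, mean + 2 * sd
print("s.d. of p-hat:", round(sd, 4))
print("95% of samples give p-hat between", round(low, 2), "and", round(high, 2))
print("sample size condition met:", size_condition_met(0.60, 200))
```

Calling size_condition_met with minimum=5 checks the weaker version of the condition.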

Examples for which Rule Applies
Polls: to estimate proportion who favor a candidate; units = all voters.
Television Ratings: to estimate proportion of households watching TV program; units = all households with TV.
Consumer Preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers.
Testing ESP: to estimate probability a person can successfully guess which of 5 symbols is on a hidden card; repeatable situation = a guess.

Example: Belief in evolution
Gallup Poll. Feb. 6-7, 2009. N = 1,018 adults nationwide. Margin of error given as +/-3%.
"Now, thinking about another historical figure: Can you tell me with which scientific theory Charles Darwin is associated?" Options rotated
Correct response (Evolution, natural selection, etc.)   55%
Incorrect response   10%
Unsure/don't know   34%
No answer   1%

Example, continued
"In fact, Charles Darwin is noted for developing the theory of evolution. Do you, personally, believe in the theory of evolution, do you not believe in evolution, or don't you have an opinion either way?" (Poll based on n = 1018 adults)
Believe in evolution   39%
Do not believe in evolution   25%
No opinion either way   36%
No answer   1%

Example, continued
Let $p$ = population proportion who believe in evolution.
Our observed $\hat{p}$ = .39, from a sample of 1018.
Based on samples of $n = 1018$, $\hat{p}$ comes from a distribution of possible values, which is approximately normal with mean $\mu = p$ and standard deviation $\sigma = \sqrt{\frac{p(1-p)}{1018}}$.
Based on this, can we use $\hat{p}$ to estimate $p$?

Estimating the Population Proportion from a Single Sample Proportion
In practice, we don't know the true population proportion $p$, so we cannot compute the standard deviation of $\hat{p}$,
s.d.($\hat{p}$) = $\sqrt{\frac{p(1-p)}{n}}$.
In practice, we only take one random sample, so we only have one sample proportion $\hat{p}$. Replacing $p$ with $\hat{p}$ in the standard deviation expression gives us an estimate that is called the standard error of $\hat{p}$:
s.e.($\hat{p}$) = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
If $\hat{p}$ = 0.39 and $n$ = 1018, then the standard error is 0.0153. So the true proportion who believe in evolution is almost surely between $0.39 - 3(0.0153) = 0.344$ and $0.39 + 3(0.0153) = 0.436$.

Independent Samples
Two samples are called independent samples when the measurements in one sample are not related to the measurements in the other sample. This happens when:
Random samples are taken separately from two populations and the same response variable is recorded.
One random sample is taken and a variable recorded, but units are categorized to form two populations.
Participants are randomly assigned to one of two treatment conditions, and the same response variable is recorded.
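The standard error calculation for the evolution poll can be reproduced in a few lines. This is a minimal sketch, assuming a sample proportion of 0.39 and a sample size of n = 1018 (the value that reproduces the standard error quoted above); the function name standard_error is illustrative.

```python
import math

def standard_error(p_hat, n):
    """Estimated standard deviation of p-hat, with p replaced by the sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

p_hat, n = 0.39, 1018
se = standard_error(p_hat, n)
# "Almost surely" interval: sample proportion plus or minus 3 standard errors.
low, high = p_hat - 3 * se, p_hat + 3 * se
print("standard error:", round(se, 4))
print("interval:", round(low, 3), "to", round(high, 3))
```

Three standard errors is the multiplier used on the slide; a narrower interval would use 2 standard errors, as in the Empirical Rule.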

Parameter 2: Difference in two population proportions, based on independent samples
Example research questions:
How much difference is there between the proportions that would quit smoking if wearing a nicotine patch versus if wearing a placebo patch?
How much difference is there in the proportion of UCI students and UC Davis students who are an only child?
Were the proportions believing in evolution the same in 1994 and 2005?
Population parameter: $p_1 - p_2$ = difference between the two population proportions.
Sample estimate: $\hat{p}_1 - \hat{p}_2$ = difference between the two sample proportions.

Sampling distribution for the difference in two proportions
Approximately normal
Mean is $p_1 - p_2$ = true difference in the population proportions
Standard deviation of $\hat{p}_1 - \hat{p}_2$ is
s.d.($\hat{p}_1 - \hat{p}_2$) = $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

Ex: 2 drugs, cure rates of 60% and 65%. What is the probability that drug 1 will cure more in the sample than drug 2 if we sample 200 taking each drug?
Want $P(\hat{p}_1 - \hat{p}_2 > 0)$.
Sampling distribution for $\hat{p}_1 - \hat{p}_2$ is:
Approximately normal
Mean = $.60 - .65 = -.05$
s.d. = $\sqrt{\frac{.6(1-.6)}{200} + \frac{.65(1-.65)}{200}} = .048$
The probability is the area above 0, which is 0.1488.

[Figure: Sampling distribution for difference in proportions (200 in each sample). Normal, Mean = -0.05, StDev = 0.048; shaded area above 0 is 0.1488. Horizontal axis: possible differences in proportions cured (Drug 1 - Drug 2).]

General format for all sampling distributions in Chapter 9
The sampling distribution of the sample estimate (the sample statistic) is:
Approximately normal
Mean = population parameter
Standard deviation is called the standard deviation of ______, where the blank is filled in with the name of the statistic (p-hat, x-bar, etc.)
The estimated standard deviation is called the standard error of ______.

Standard Error of the Difference Between Two Sample Proportions
s.e.($\hat{p}_1 - \hat{p}_2$) = $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Are more UCI than UCD children an only child?
$n_1$ = 358 (UCI, classes combined), $n_2$ = 173 (UCD)
UCI: 40 of the 358 students were an only child, so $\hat{p}_1$ = .11
UCD: 14 of the 173 students were an only child, so $\hat{p}_2$ = .08
So $\hat{p}_1 - \hat{p}_2 = .11 - .08 = .03$ and
s.e.($\hat{p}_1 - \hat{p}_2$) = $\sqrt{\frac{.11(1-.11)}{358} + \frac{.08(1-.08)}{173}} = .0264$
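Both two-sample calculations use the same formula, so one helper covers them. This is a hedged sketch rather than anything from the slides: the function name sd_of_difference is an assumption, and the inputs are sample sizes of 200 per drug and the rounded proportions .11 (n = 358) and .08 (n = 173) for the only-child comparison. It uses NormalDist from Python's standard statistics module for the normal curve area.

```python
import math
from statistics import NormalDist

def sd_of_difference(p1, n1, p2, n2):
    """s.d. of (p-hat 1 - p-hat 2); with sample proportions as inputs it is the standard error."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Two drugs with cure rates 60% and 65%, 200 people taking each drug.
mean_diff = 0.60 - 0.65
sd_diff = sd_of_difference(0.60, 200, 0.65, 200)
# Probability that drug 1 cures more in the sample: area above 0.
prob_drug1_better = 1 - NormalDist(mean_diff, sd_diff).cdf(0)
print("s.d. of difference:", round(sd_diff, 3))
print("P(difference > 0):", round(prob_drug1_better, 3))

# Only-child comparison, using the rounded sample proportions.
se_diff = sd_of_difference(0.11, 358, 0.08, 173)
print("s.e. of difference:", round(se_diff, 4))
```

Because the code keeps the unrounded standard deviation, the probability it reports can differ in the third decimal place from a value computed with the rounded s.d. of .048.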

Sampling distribution of $\hat{p}_1 - \hat{p}_2$
Suppose the population proportions are the same, so the true difference $p_1 - p_2 = 0$.
Then the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is:
Approximately normal
Mean = population parameter = 0
The estimated standard deviation is .0264
The observed difference of .03 (.031 from the unrounded proportions, 40/358 and 14/173) is $z = .031/.0264 = 1.174$ standard errors above the mean of 0.
The area above the observed difference is .120.

[Figure: Normal density, Mean = 0, StDev = 0.0264; shaded area of 0.120 above the observed difference. Horizontal axis: possible values of difference in sample proportions.]

Standardized Statistics for sampling distributions
Recall the general form for standardizing a random variable $x$ when it has a normal distribution:
$z = \frac{x - \mu}{\sigma}$
For all 5 parameters we will consider, we can find where our observed sample statistic falls if we hypothesize a specific number for the population parameter:
$z = \frac{\text{sample statistic} - \text{population parameter}}{\text{s.d.(sample statistic)}}$

Example: Do college students watch less TV?
In general, there isn't much correlation between age and hours of TV per day.
In the 2008 General Social Survey (very large n), 73% watched 2 or more hours per day. So assume the population proportion is .73.
In a sample of 175 college students (at Penn State), 105 said they watched 2 or more hours per day. Is it likely that the population proportion for students is also .73?
$\hat{p} = \frac{105}{175} = .60$
s.d.($\hat{p}$) = $\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.73(1-0.73)}{175}} = 0.034$
$z = \frac{.60 - .73}{.034} = -3.82$
This z-score is too small! The area below it is .00007. Students are different from the general population.

Case Study 9.1 Do Americans Really Vote When They Say They Do?
Election of 1994:
Time Magazine Poll: n = 800 adults (two days after election), 56% reported that they had voted.
Info from Committee for the Study of the American Electorate: only 39% of American adults had voted.
If $p = 0.39$ then sample proportions for samples of size $n = 800$ should vary approximately normally with
mean = $p$ = 0.39 and s.d.($\hat{p}$) = $\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.39(1-0.39)}{800}} = 0.017$
If respondents were telling the truth, the sample percent should be no higher than 39% + 3(1.7%) = 44.1%, nowhere near the reported percentage of 56%.
If 39% of the population voted, the standardized score for the reported value of 56% is
$z = \frac{0.56 - 0.39}{0.017} = 10.0$
It is virtually impossible to obtain a standardized score of 10.
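The standardized score for a sample proportion follows one recipe in both examples: hypothesize p, compute the s.d. under that p, and divide. This is a minimal sketch with an illustrative function name (z_for_proportion), assuming the TV example uses 105 of 175 students and the voting example uses n = 800. It keeps the unrounded standard deviation, so the scores it prints are slightly different from slide values that plug in the rounded s.d.

```python
import math
from statistics import NormalDist

def z_for_proportion(p_hat, p0, n):
    """Standardized score of p-hat when the population proportion is hypothesized to be p0."""
    sd = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / sd

# College students and TV: 105 of 175 students, hypothesized p = .73.
z_tv = z_for_proportion(105 / 175, 0.73, 175)
area_below_tv = NormalDist().cdf(z_tv)  # area under the standard normal curve below z

# Voting case study: 56% reported voting, hypothesized p = .39, n = 800.
z_vote = z_for_proportion(0.56, 0.39, 800)

print("TV example z:", round(z_tv, 2), "area below:", area_below_tv)
print("voting example z:", round(z_vote, 2))
```

Either way the conclusion is the same: both scores are far outside the range of plus or minus 3 where nearly all standardized scores fall.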