Chapter 3
Sampling for Proportions and Percentages

Sampling Theory | Chapter 3 | Sampling for Proportions | Shalabh, IIT Kanpur

In many situations, the characteristic under study, on which the observations are collected, is qualitative in nature. For example, the responses of customers in many marketing surveys are based on replies like "yes" or "no", "agree" or "disagree", etc. Sometimes the respondents are asked to arrange several options in order, like first choice, second choice, etc. Sometimes the objective of the survey is to estimate the proportion or the percentage of brown-eyed persons, unemployed persons, graduate persons, persons favoring a proposal, etc. In such situations, the first question is how to do the sampling, and the second is how to estimate population parameters like the population mean, the population variance, etc.

Sampling procedure:
The same sampling procedures that are used for drawing a sample in the case of a quantitative characteristic can also be used for drawing a sample for a qualitative characteristic. So the sampling procedures remain the same irrespective of the nature of the characteristic under study, either qualitative or quantitative. For example, the SRSWOR and SRSWR procedures for drawing the samples remain the same for qualitative and quantitative characteristics. Similarly, other sampling schemes like stratified sampling, two-stage sampling, etc. also remain the same.

Estimation of population proportion:
The population proportion in the case of a qualitative characteristic can be estimated in a similar way as the population mean in the case of a quantitative characteristic.

Consider a qualitative characteristic based on which the population can be divided into two mutually exclusive classes, say C and C*. For example, if C is the part of the population of persons saying "yes" or agreeing with the proposal, then C* is the part of the population of persons saying "no" or disagreeing with the proposal. Let A be the number of units in C and (N - A) the number of units in C* in a population of size N. Then the proportion of units in C is
$$P = \frac{A}{N}$$
and the proportion of units in C* is
$$Q = \frac{N - A}{N} = 1 - P.$$
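An indicator variable Y can be associated with the characteristic under study, and then for $i = 1, 2, \ldots, N$
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th unit belongs to } C \\ 0 & \text{if the } i\text{th unit belongs to } C^*. \end{cases}$$
Now the population total is
$$Y_{TOTAL} = \sum_{i=1}^{N} Y_i = A$$
and the population mean is
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i = \frac{A}{N} = P.$$
Suppose a sample of size n is drawn from a population of size N by simple random sampling. Let a be the number of units in the sample which fall into class C and (n - a) the number of units which fall into class C*. Then the sample proportion of units in C is
$$p = \frac{a}{n},$$
which can be written as
$$p = \frac{a}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}.$$
Since $\sum_{i=1}^{N} Y_i^2 = A = NP$, we can write $S^2$ and $s^2$ in terms of P and Q as follows:
$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right) = \frac{1}{N-1}\left(NP - NP^2\right) = \frac{N}{N-1}PQ.$$
Similarly, $\sum_{i=1}^{n} y_i^2 = a = np$ and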
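$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right) = \frac{1}{n-1}\left(np - np^2\right) = \frac{n}{n-1}pq.$$
Note that the quantities $\bar{y}$, $\bar{Y}$, $s^2$ and $S^2$ have been expressed as functions of the sample and population proportions. Since the sample has been drawn by simple random sampling and the sample proportion is the same as the sample mean, the properties of the sample proportion in SRSWOR and SRSWR can be derived directly from the properties of the sample mean.

(i) SRSWOR
Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$ in the case of SRSWOR, i.e., $E(\bar{y}) = \bar{Y}$, we have
$$E(p) = E(\bar{y}) = \bar{Y} = P$$
and p is an unbiased estimator of P.

Using the expression of $Var(\bar{y})$, the variance of p can be derived as
$$Var(p) = Var(\bar{y}) = \frac{N-n}{Nn}S^2 = \frac{N-n}{Nn}\cdot\frac{N}{N-1}PQ = \frac{N-n}{N-1}\cdot\frac{PQ}{n}.$$
Similarly, using the estimate of $Var(\bar{y})$, the estimate of the variance can be derived as
$$\widehat{Var}(p) = \widehat{Var}(\bar{y}) = \frac{N-n}{Nn}s^2 = \frac{N-n}{Nn}\cdot\frac{n}{n-1}pq = \frac{N-n}{N(n-1)}pq.$$

(ii) SRSWR
Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$ in the case of SRSWR, the sample proportion satisfies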
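$$E(p) = E(\bar{y}) = \bar{Y} = P,$$
i.e., p is an unbiased estimator of P.

Using the expression of the variance of $\bar{y}$ and its estimate in the case of SRSWR, the variance of p and its estimate can be derived as follows:
$$Var(p) = Var(\bar{y}) = \frac{N-1}{Nn}S^2 = \frac{N-1}{Nn}\cdot\frac{N}{N-1}PQ = \frac{PQ}{n},$$
$$\widehat{Var}(p) = \frac{n}{n-1}\cdot\frac{pq}{n} = \frac{pq}{n-1}.$$

Estimation of the population total (total count)
It is easy to see that an estimate of the population total A (the total count of units in class C) is
$$\hat{A} = Np = \frac{Na}{n},$$
its variance is
$$Var(\hat{A}) = N^2\,Var(p)$$
and the estimate of the variance is
$$\widehat{Var}(\hat{A}) = N^2\,\widehat{Var}(p).$$

Confidence interval estimation of P
If N and n are large, then
$$\frac{p - P}{\sqrt{Var(p)}}$$
approximately follows $N(0, 1)$. With this approximation, we can write
$$P\left[-Z_{\alpha/2} \le \frac{p - P}{\sqrt{Var(p)}} \le Z_{\alpha/2}\right] = 1 - \alpha$$
and the $100(1-\alpha)\%$ confidence interval of P is
$$\left(p - Z_{\alpha/2}\sqrt{Var(p)},\; p + Z_{\alpha/2}\sqrt{Var(p)}\right).$$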
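The estimators and the normal-approximation interval above can be put together in a few lines. The following is a minimal sketch, not part of the original notes: the function name `proportion_summary` and its arguments are hypothetical, and the unknown Var(p) in the interval is replaced by its estimate, which is an assumption of the sketch.

```python
from math import sqrt
from statistics import NormalDist


def proportion_summary(a, n, N, alpha=0.05, with_replacement=False):
    """Estimate P, A and a normal-approximation confidence interval.

    a: number of sampled units in class C, n: sample size, N: population size.
    The interval plugs the estimated variance in place of Var(p) (assumption).
    """
    p = a / n
    q = 1 - p
    if with_replacement:
        var_hat = p * q / (n - 1)                    # SRSWR: pq/(n-1)
    else:
        var_hat = (N - n) / (N * (n - 1)) * p * q    # SRSWOR: (N-n)pq/(N(n-1))
    z = NormalDist().inv_cdf(1 - alpha / 2)          # Z_{alpha/2}
    half_width = z * sqrt(var_hat)
    return {
        "p": p,
        "var_hat": var_hat,
        "ci": (p - half_width, p + half_width),
        "A_hat": N * p,                              # estimated total count
        "var_A_hat": N**2 * var_hat,
    }
```

For instance, `proportion_summary(40, 100, 1000)` describes a sample of 100 units from 1000 in which 40 fall into class C; passing `with_replacement=True` switches to the SRSWR variance estimate.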
Note that in constructing this confidence interval, a discrete random variable is approximated by a continuous random variable, so a continuity correction of $1/(2n)$ can be introduced in the confidence limits, and the limits become
$$\left(p - Z_{\alpha/2}\sqrt{Var(p)} - \frac{1}{2n},\; p + Z_{\alpha/2}\sqrt{Var(p)} + \frac{1}{2n}\right).$$

Use of hypergeometric distribution:
When SRS is applied for the sampling of a qualitative characteristic, the methodology is to draw the units one by one, and so the probability of selection of every unit remains the same at every step. If n sampling units are selected together from N units, then the probability of selection of units does not remain the same as in the case of SRS.

Consider a situation in which the sampling units in a population are divided into two mutually exclusive classes. Let P and Q be the proportions of sampling units in the population belonging to classes "1" and "2", respectively. Then NP and NQ are the total numbers of sampling units in the population belonging to classes "1" and "2", respectively, and so NP + NQ = N. The probability that a sample of n units selected out of N units by SRS contains $n_1$ units belonging to class "1" and $n_2$ units belonging to class "2", with $n_1 + n_2 = n$, is governed by the hypergeometric distribution:
$$P(n_1) = \frac{\binom{NP}{n_1}\binom{NQ}{n_2}}{\binom{N}{n}}.$$
As N grows large, the hypergeometric distribution tends to the binomial distribution, and $P(n_1)$ is approximated by
$$P(n_1) = \binom{n}{n_1}P^{n_1}(1 - P)^{n_2}.$$

Inverse sampling
In general, the SRS methodology for a qualitative characteristic assumes that the attribute under study is not a rare attribute. If the attribute is rare, then the procedure of estimating the population proportion P by the sample proportion $p = a/n$ is not suitable. Some such situations are, e.g., estimation of the frequency of a rare type of gene, the proportion of some rare type
of cancer cells in a biopsy, the proportion of a rare type of blood cell affecting the red blood cells, etc. In such cases, the methodology of inverse sampling can be used.

In inverse sampling, the sampling is continued until a predetermined number of units possessing the attribute under study occur in the sample, which is useful for estimating the population proportion. The sampling units are drawn one by one with equal probability and without replacement. The sampling is discontinued as soon as the number of units in the sample possessing the characteristic or attribute equals the predetermined number.

Let m denote the predetermined number of units possessing the characteristic. The sampling is continued until m such units are obtained. Therefore, the sample size n required to attain m becomes a random variable.

Probability distribution function of n
In order to find the probability distribution function of n, consider the stage t of drawing such that at t = n the sample completes the m units with the attribute. Thus the first (t - 1) draws contain (m - 1) units possessing the characteristic out of the NP such units in the population. Equivalently, they contain (t - m) units which do not possess the characteristic out of the NQ such units in the population. Note that the last draw must select a unit that possesses the characteristic.

So the probability distribution function of n can be expressed as
$$P(n) = P\left[\begin{array}{l}\text{in a sample of } (n-1) \text{ units drawn from } N,\\ (m-1) \text{ units possess the attribute}\end{array}\right] \times P\left[\begin{array}{l}\text{the unit drawn at the } n\text{th draw}\\ \text{possesses the attribute}\end{array}\right]$$
$$= \left[\frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\right]\frac{NP - m + 1}{N - n + 1}, \qquad n = m, m+1, \ldots, m + NQ.$$
Note that the first term (in square brackets) is derived using the hypergeometric distribution as the probability of drawing a sample of size (n - 1) in which (m - 1) units are from the NP units and (n - m) units are from the NQ units. The second term, $\frac{NP - m + 1}{N - n + 1}$, is the probability associated with the last draw, where it is assumed that we get a unit possessing the characteristic.

Note that
$$\sum_{n=m}^{m+NQ} P(n) = 1.$$
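The distribution of n is easy to evaluate numerically, which gives a quick check that the probabilities add up to one. This is a minimal sketch, not part of the original notes: the function name `inverse_sampling_pmf` is hypothetical, the class sizes NP and NQ are passed as integers, and it assumes 1 <= m <= NP. Exact fractions are used so that the check is free of rounding error.

```python
from fractions import Fraction
from math import comb


def inverse_sampling_pmf(n, m, NP, NQ):
    """P(n): probability that the m-th unit with the attribute appears at draw n.

    NP, NQ: numbers of population units with and without the attribute.
    Assumes 1 <= m <= NP; returns 0 outside n = m, ..., m + NQ.
    """
    N = NP + NQ
    if n < m or n > m + NQ:
        return Fraction(0)
    # Hypergeometric term: (m-1) attribute units among the first (n-1) draws.
    first_draws = Fraction(comb(NP, m - 1) * comb(NQ, n - m), comb(N, n - 1))
    # Last draw: one of the remaining attribute units is selected.
    last_draw = Fraction(NP - m + 1, N - n + 1)
    return first_draws * last_draw


def total_probability(m, NP, NQ):
    """Sum of P(n) over its support n = m, ..., m + NQ."""
    return sum(inverse_sampling_pmf(n, m, NP, NQ) for n in range(m, m + NQ + 1))
```

With, say, NP = 5, NQ = 15 and m = 2, `total_probability(2, 5, 15)` sums the probabilities over n = 2, ..., 17.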
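Estimate of population proportion
Consider the expectation of $\frac{m-1}{n-1}$:
$$E\left[\frac{m-1}{n-1}\right] = \sum_{n=m}^{m+NQ}\frac{m-1}{n-1}P(n)$$
$$= \sum_{n=m}^{m+NQ}\frac{m-1}{n-1}\cdot\frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\cdot\frac{NP - m + 1}{N - n + 1}$$
$$= P\sum_{n=m}^{m+NQ}\frac{\binom{NP-1}{m-2}\binom{NQ}{n-m}}{\binom{N-1}{n-2}}\cdot\frac{NP - m + 1}{N - n + 1},$$
where the summand is obtained by replacing NP by (NP - 1), N by (N - 1), m by (m - 1) and n by (n - 1) in the earlier expression for P(n). The remaining sum is therefore a sum of probabilities and equals 1. Thus
$$E\left[\frac{m-1}{n-1}\right] = P.$$
So $\hat{P} = \frac{m-1}{n-1}$ is an unbiased estimator of P.

Estimate of variance of $\hat{P}$
Now we derive an estimate of the variance of $\hat{P}$. By definition,
$$Var(\hat{P}) = E(\hat{P}^2) - [E(\hat{P})]^2 = E(\hat{P}^2) - P^2.$$
Thus
$$\widehat{Var}(\hat{P}) = \hat{P}^2 - \text{Estimate of } P^2.$$
In order to obtain an estimate of $P^2$, consider the expectation of $\frac{(m-1)(m-2)}{(n-1)(n-2)}$, i.e.,
$$E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] = \sum_{n \ge m}\frac{(m-1)(m-2)}{(n-1)(n-2)}P(n)$$
$$= \frac{P(NP - 1)}{N - 1}\sum_{n \ge m}\left[\frac{\binom{NP-2}{m-3}\binom{NQ}{n-m}}{\binom{N-2}{n-3}}\cdot\frac{NP - m + 1}{N - n + 1}\right],$$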
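where the last term inside the square brackets is obtained by replacing NP by (NP - 2), N by (N - 2), m by (m - 2) and n by (n - 2) in the probability distribution function of n, so the sum equals 1. This solves further to
$$E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] = \frac{NP^2}{N-1} - \frac{P}{N-1}.$$
Thus an unbiased estimate of $P^2$ is
$$\text{Estimate of } P^2 = \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{\hat{P}}{N}$$
$$= \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N}\cdot\frac{m-1}{n-1}.$$
Finally, an estimate of the variance of $\hat{P}$ is
$$\widehat{Var}(\hat{P}) = \hat{P}^2 - \text{Estimate of } P^2$$
$$= \left(\frac{m-1}{n-1}\right)^2 - \left[\frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N}\cdot\frac{m-1}{n-1}\right]$$
$$= \frac{m-1}{n-1}\left[\frac{m-1}{n-1} - \frac{(N-1)(m-2)}{N(n-2)} - \frac{1}{N}\right].$$
For large N, the hypergeometric distribution tends to the negative binomial distribution with probability mass function
$$\binom{n-1}{m-1}P^m Q^{n-m}.$$
So
$$\hat{P} = \frac{m-1}{n-1}$$
and
$$\widehat{Var}(\hat{P}) = \frac{(m-1)(n-m)}{(n-1)^2(n-2)} = \frac{\hat{P}(1 - \hat{P})}{n-2}.$$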
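The inverse-sampling estimator and its variance estimate can be computed directly from m, n and, when it is known, N. The following is a minimal sketch, not part of the original notes: the function name `inverse_sampling_estimate` is hypothetical, passing `N=None` is this sketch's way of selecting the large-N (negative binomial) case, and it assumes m >= 2 and n >= 3 so that the denominators are nonzero.

```python
from fractions import Fraction


def inverse_sampling_estimate(m, n, N=None):
    """Return (P_hat, estimated Var(P_hat)) under inverse sampling.

    m: predetermined number of units with the attribute,
    n: number of draws that were needed, N: population size.
    N=None selects the large-N limit (an assumption of this sketch).
    """
    p_hat = Fraction(m - 1, n - 1)
    t = Fraction((m - 1) * (m - 2), (n - 1) * (n - 2))
    if N is None:
        p_squared_hat = t                                   # large-N limit
    else:
        p_squared_hat = Fraction(N - 1, N) * t + p_hat / N  # unbiased for P^2
    # Variance estimate: P_hat^2 minus the estimate of P^2.
    return p_hat, p_hat**2 - p_squared_hat
```

For example, `inverse_sampling_estimate(5, 50)` covers the case where the fifth unit with the attribute was found at the 50th draw from a very large population. In that case the variance estimate agrees with $\hat{P}(1 - \hat{P})/(n - 2)$. Fractions are returned to keep the arithmetic exact, and `float(...)` converts them for reporting.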
Estimation of proportion for more than two classes
We have assumed up to now that there are only two classes into which the population can be divided based on a qualitative characteristic. There can be situations when the population is to be divided into more than two classes. For example, the taste of a coffee can be divided into four categories: very strong, strong, mild and very mild. Similarly, the damage to a crop due to a storm can be classified into categories like heavily damaged, damaged, minor damage and no damage.

These types of situations can be represented by dividing the population of size N into, say, k mutually exclusive classes $C_1, C_2, \ldots, C_k$, where $C_i$ also denotes the number of units in the ith class. Corresponding to these classes, let
$$P_1 = \frac{C_1}{N},\; P_2 = \frac{C_2}{N},\; \ldots,\; P_k = \frac{C_k}{N}$$
be the proportions of units in the classes $C_1, C_2, \ldots, C_k$, respectively. Let a sample of size n be observed such that $c_1, c_2, \ldots, c_k$ units have been drawn from $C_1, C_2, \ldots, C_k$, respectively. Then the probability of observing $c_1, c_2, \ldots, c_k$ is
$$P(c_1, c_2, \ldots, c_k) = \frac{\binom{C_1}{c_1}\binom{C_2}{c_2}\cdots\binom{C_k}{c_k}}{\binom{N}{n}}.$$
The population proportions $P_i$ can be estimated by
$$p_i = \frac{c_i}{n}, \qquad i = 1, 2, \ldots, k.$$
It can be easily shown that, with $Q_i = 1 - P_i$ and $q_i = 1 - p_i$,
$$E(p_i) = P_i, \qquad i = 1, 2, \ldots, k,$$
$$Var(p_i) = \frac{N-n}{N-1}\cdot\frac{P_i Q_i}{n}$$
and
$$\widehat{Var}(p_i) = \frac{N-n}{N}\cdot\frac{p_i q_i}{n-1}.$$
For estimating the number of units in the ith class,
$$\hat{C}_i = Np_i,$$
$$Var(\hat{C}_i) = N^2\,Var(p_i)$$
and
$$\widehat{Var}(\hat{C}_i) = N^2\,\widehat{Var}(p_i).$$
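The class-wise estimates above apply the two-class formulas to each class in turn. The following is a minimal sketch under SRSWOR, not part of the original notes: the function name `class_estimates` and the dictionary keys are hypothetical.

```python
def class_estimates(sample_counts, N):
    """Per-class estimates of P_i and C_i from the counts c_1, ..., c_k.

    sample_counts: observed numbers of sampled units in each class,
    N: population size. SRSWOR is assumed.
    """
    n = sum(sample_counts)
    estimates = []
    for c in sample_counts:
        p = c / n
        # Estimated variance of p_i: (N - n) / N * p_i q_i / (n - 1)
        var_hat = (N - n) / (N * (n - 1)) * p * (1 - p)
        estimates.append({
            "p": p,
            "var_hat": var_hat,
            "C_hat": N * p,                # estimated class size
            "var_C_hat": N**2 * var_hat,
        })
    return estimates
```

For the coffee example, `class_estimates([10, 20, 30, 40], 1000)` would give the estimated proportions and class sizes for the four taste categories from a sample of 100 out of 1000 units (the counts here are made up for illustration).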
The confidence intervals can be obtained based on a single $p_i$, as in the case of two classes.

If N is large, then the probability of observing $c_1, c_2, \ldots, c_k$ can be approximated by the multinomial distribution given by
$$P(c_1, c_2, \ldots, c_k) = \frac{n!}{c_1!\,c_2!\cdots c_k!}P_1^{c_1}P_2^{c_2}\cdots P_k^{c_k}.$$
For this distribution,
$$E(p_i) = P_i, \qquad i = 1, 2, \ldots, k,$$
$$Var(p_i) = \frac{P_i(1 - P_i)}{n}$$
and, matching the SRSWR result for two classes,
$$\widehat{Var}(p_i) = \frac{p_i(1 - p_i)}{n - 1}.$$
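The quality of the multinomial approximation can be checked by evaluating both probabilities for the same sample counts. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from math import comb, factorial, prod


def multivariate_hypergeom_pmf(sample_counts, class_sizes):
    """Exact probability of c_1, ..., c_k when sampling without replacement."""
    N = sum(class_sizes)
    n = sum(sample_counts)
    numerator = prod(comb(C, c) for C, c in zip(class_sizes, sample_counts))
    return numerator / comb(N, n)


def multinomial_pmf(sample_counts, proportions):
    """Large-N approximation: n!/(c_1!...c_k!) * P_1^c_1 ... P_k^c_k."""
    coefficient = factorial(sum(sample_counts))
    for c in sample_counts:
        coefficient //= factorial(c)   # exact: each partial quotient is an integer
    return coefficient * prod(P**c for P, c in zip(proportions, sample_counts))
```

Calling the two functions with class sizes such as (30000, 50000, 20000), the matching proportions (0.3, 0.5, 0.2) and a small sample such as (3, 5, 2) shows how close the two probabilities are when N is much larger than n.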