Two phase stratified sampling with ratio and regression methods of estimation

CHAPTER IV

4.1 Introduction

In sample surveys a survey sampler may wish to use a size variable x either (i) for stratification, (ii) for incorporation in the estimation procedure, or (iii) for selecting a sample. Sometimes one might think of using x both for (i) and (ii), or for (i) and (iii). In this chapter we consider the situation where an auxiliary variable x is used both for stratification and for the ratio or regression method of estimation.

Let the finite population $U$ of size $N$ consist of $L$ strata of sizes $N_1, N_2, \ldots, N_L$, with $N_h$ units belonging to the h-th stratum. When the sizes of the strata are not known, an initial SRSWOR sample $s'$ of fixed size $n'$ is selected and then classified into the different strata, with $n'_h$ units falling in the h-th stratum, i.e. in $s'_h$ $(h = 1, 2, \ldots, L)$, where $\sum_{h=1}^{L} n'_h = n'$. In the second phase an SRSWOR sample of size $n_h$ is drawn from $s'_h$ (of size $n'_h$), independently for each h, to observe the main variable y. We assume that $n'$ is so large that $n'_h > 0$ for each h. We also assume that at the second phase a constant proportion $g_h = n_h/n'_h$ of the units of the initial sample is sampled from the h-th stratum.
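The two-phase selection scheme just described can be sketched in Python as follows. This is a minimal illustration, not the author's procedure: the function name `draw_two_phase_sample`, the toy population and the rule of rounding $g_h n'_h$ to the nearest integer (with at least one unit) are assumptions, since the text simply takes $n_h = g_h n'_h$.

```python
import random
from typing import Dict, List, Sequence


def draw_two_phase_sample(
    labels: Sequence[int],
    n_first: int,
    g: Dict[int, float],
    rng: random.Random,
) -> Dict[int, Dict[str, List[int]]]:
    """Two-phase stratified selection (hypothetical helper).

    labels[i] is the stratum of unit i; in practice it becomes known only
    for units that enter the first-phase sample.
    """
    # Phase 1: SRSWOR of fixed size n' from the whole population.
    first = rng.sample(range(len(labels)), n_first)

    # Classify the first-phase units into strata: s'_h holds n'_h units.
    first_by_stratum: Dict[int, List[int]] = {}
    for unit in first:
        first_by_stratum.setdefault(labels[unit], []).append(unit)

    # Phase 2: independent SRSWOR of n_h ~ g_h * n'_h units within each s'_h.
    # Rounding to an integer (minimum 1) is an assumption of this sketch.
    sample = {}
    for h, units in first_by_stratum.items():
        n_h = max(1, round(g[h] * len(units)))
        sample[h] = {"first": units, "second": rng.sample(units, n_h)}
    return sample


# Hypothetical population of N = 200 units in L = 3 strata.
labels = [1] * 40 + [2] * 100 + [3] * 60
g_example = {1: 0.5, 2: 0.2, 3: 0.4}
sample = draw_two_phase_sample(labels, n_first=50, g=g_example, rng=random.Random(1))
for h in sorted(sample):
    print(h, len(sample[h]["first"]), len(sample[h]["second"]))
```

Only the first-phase units are ever classified, which mirrors the situation in which the stratum sizes $N_h$ are unknown.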

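4.2 Ratio method of estimation

Let us define a ratio estimator under two phase stratified sampling as

$$\bar y_{R,st} = \sum_{h=1}^{L} w'_h \frac{\bar y_h}{\bar x_h}\,\bar x'_h, \tag{4.2.1}$$

where $\bar y_h$ and $\bar x_h$ are the sample means of the h-th stratum based on the second-phase sample of size $n_h$, $\bar x'_h$ is the sample mean of the h-th stratum based on the first-phase sample of size $n'_h$, and $w'_h = n'_h/n'$.

Theorem 4.1 Under two phase stratified random sampling, $\bar y_{R,st}$ is an approximately unbiased estimator of $\bar Y$ for large values of $n'$.

Proof:

$$E(\bar y_{R,st}) = E_1 E_2\left(\sum_{h=1}^{L} w'_h \frac{\bar y_h}{\bar x_h}\,\bar x'_h\right) \approx E_1\left(\sum_{h=1}^{L} w'_h \bar y'_h\right) = \bar Y, \tag{4.2.2}$$

where $E_2$ denotes expectation over the second phase for a given first-phase sample and $E_1$ expectation over the first phase, $E(w'_h) = N_h/N = W_h$, and $\bar y'_h$ is the mean of y based on the $n'_h$ first-phase units of the h-th stratum, with $E(\bar y'_h) = \bar Y_h$. Note that $\sum_h w'_h \bar y'_h$ is simply the first-phase sample mean of y.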
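As a concrete illustration of (4.2.1), here is a minimal Python sketch that computes $\bar y_{R,st}$ from the first-phase stratum counts $n'_h$, the first-phase means $\bar x'_h$ and the second-phase $(y, x)$ values. The function name and argument layout are hypothetical, and the numbers in the example are invented for illustration.

```python
from statistics import fmean
from typing import Sequence


def ratio_estimator_two_phase(
    n1_h: Sequence[int],
    xbar1_h: Sequence[float],
    y2_h: Sequence[Sequence[float]],
    x2_h: Sequence[Sequence[float]],
) -> float:
    """Separate ratio estimator (4.2.1): sum_h w'_h * (ybar_h / xbar_h) * xbar'_h."""
    n1 = sum(n1_h)  # n' = sum_h n'_h
    estimate = 0.0
    for nh1, xb1, ys, xs in zip(n1_h, xbar1_h, y2_h, x2_h):
        w1 = nh1 / n1  # w'_h = n'_h / n'
        # Second-phase ratio ybar_h / xbar_h scaled by the first-phase mean of x.
        estimate += w1 * (fmean(ys) / fmean(xs)) * xb1
    return estimate


# Invented two-stratum example (not data from the text).
est = ratio_estimator_two_phase(
    n1_h=[6, 4],
    xbar1_h=[10.0, 20.0],
    y2_h=[[2.0, 4.0], [10.0, 14.0]],
    x2_h=[[4.0, 6.0], [18.0, 22.0]],
)
print(est)
```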

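Theorem 4.2 If the first sample is a random sample of size $n'$ and the second sample is a stratified random sample from the first, with fixed $g_h$ $(0 < g_h < 1)$, then

$$V(\bar y_{R,st}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 + \sum_{h=1}^{L} \frac{W_h S_{rh}^2}{n'}\left(\frac{1}{g_h} - 1\right), \tag{4.2.3}$$

where $S_{rh}^2 = S_{yh}^2 + R_h^2 S_{xh}^2 - 2R_h S_{yxh}$, $R_h = \bar Y_h/\bar X_h$ and $S_y^2 = \frac{1}{N-1}\sum_{h=1}^{L}\sum_{j=1}^{N_h}(y_{hj} - \bar Y)^2$; $S_{yh}^2$ and $S_{xh}^2$ are the population variances of y and x for the h-th stratum, respectively, and $S_{yxh}$ is the population covariance between x and y for the h-th stratum.

Proof:

$$V(\bar y_{R,st}) = E_1 V_2(\bar y_{R,st}) + V_1 E_2(\bar y_{R,st}). \tag{4.2.4}$$

Now, to the first order of approximation,

$$V_2(\bar y_{R,st}) = \sum_{h=1}^{L} w'^2_h\, E_2\left(\frac{\bar y_h}{\bar x_h}\,\bar x'_h - \bar y'_h\right)^2 \approx \sum_{h=1}^{L} w'^2_h\left(\frac{1}{n_h} - \frac{1}{n'_h}\right) s'^2_{rh},$$

where $s'^2_{rh} = s'^2_{yh} + r'^2_h s'^2_{xh} - 2r'_h s'_{yxh}$; $s'^2_{yh}$ and $s'^2_{xh}$ are the variances based on the $n'_h$ sampled units in the initial sample of the h-th stratum, $s'_{yxh}$ is the covariance based on the same $n'_h$ units, and $r'_h = \bar y'_h/\bar x'_h$. Since $n_h = g_h n'_h$ and $w'_h/n'_h = 1/n'$, we have $w'^2_h(1/n_h - 1/n'_h) = (1/g_h - 1)\,w'_h/n'$, so that

$$E_1 V_2(\bar y_{R,st}) \approx \sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right)\frac{W_h S_{rh}^2}{n'} \tag{4.2.5}$$

and

$$V_1 E_2(\bar y_{R,st}) \approx V_1\left(\sum_{h=1}^{L} w'_h r'_h \bar x'_h\right) = V_1\left(\sum_{h=1}^{L} w'_h \bar y'_h\right) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2. \tag{4.2.6}$$

From (4.2.4), (4.2.5) and (4.2.6) we find

$$V(\bar y_{R,st}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 + \sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right)\frac{W_h S_{rh}^2}{n'}.$$

Theorem 4.3 An unbiased estimator of $V(\bar y_{R,st})$ is

$$v(\bar y_{R,st}) = \frac{N-1}{N(n'-1)}\sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right) w'_h s_{rh}^2 + \frac{N-n'}{N(n'-1)}\left[\sum_{h=1}^{L}\frac{w'_h}{n_h}\sum_{j=1}^{n_h} y_{hj}^2 - \bar y_{R,st}^2\right], \tag{4.2.7}$$

where $s_{rh}^2 = s_{yh}^2 + \hat R_h^2 s_{xh}^2 - 2\hat R_h s_{yxh}$ is computed from the second-phase sample, $y_{hj}$ is the j-th observation of the h-th stratum and $\hat R_h = \bar y_h/\bar x_h$.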

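Proof:

$$\text{Est.}\,(N-1)S_y^2 = \text{Est.}\left(\sum_{h=1}^{L}\sum_{j=1}^{N_h} y_{hj}^2 - N\bar Y^2\right) \tag{4.2.8}$$

and

$$E\left(\sum_{h=1}^{L}\frac{w'_h}{n_h}\sum_{j=1}^{n_h} y_{hj}^2\right) = \frac{1}{N}\sum_{h=1}^{L}\sum_{j=1}^{N_h} y_{hj}^2. \tag{4.2.9}$$

From (4.2.8) and (4.2.9),

$$\text{Est.}\,(N-1)S_y^2 = N\sum_{h=1}^{L}\frac{w'_h}{n_h}\sum_{j=1}^{n_h} y_{hj}^2 - N\,\text{Est.}(\bar Y^2).$$

It can easily be seen that

$$\text{Est.}(\bar Y^2) = \bar y_{R,st}^2 - v(\bar y_{R,st}) \tag{4.2.10}$$

and

$$v(\bar y_{R,st}) = \left(\frac{1}{n'} - \frac{1}{N}\right)\text{Est.}(S_y^2) + \frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right) w'_h s_{rh}^2. \tag{4.2.11}$$

From (4.2.10) and (4.2.11), we write

$$\frac{N(n'-1)}{n'}\,\text{Est.}(S_y^2) = N\left[\sum_{h=1}^{L}\frac{w'_h}{n_h}\sum_{j=1}^{n_h} y_{hj}^2 - \bar y_{R,st}^2 + \frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right) w'_h s_{rh}^2\right]. \tag{4.2.12}$$

Substituting $\text{Est.}(S_y^2)$ from (4.2.12) into (4.2.11) gives (4.2.7). Hence the result.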
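The variance (4.2.3) and its estimator (4.2.7) translate directly into code. The sketch below is a minimal illustration under the chapter's assumptions, with $n_h \ge 2$ in every stratum so that $s_{rh}^2$ is defined; the function names are hypothetical and the two-stratum sample is invented. The helper `s2_r` evaluates $s_{y}^2 + r^2 s_{x}^2 - 2 r s_{yx}$ exactly as written in the text, so it serves for both $S_{rh}^2$ and $s_{rh}^2$.

```python
from statistics import fmean
from typing import Sequence


def s2_r(s2_y: float, s2_x: float, s_yx: float, r: float) -> float:
    """s_r^2 = s_y^2 + r^2 s_x^2 - 2 r s_yx (population or sample version)."""
    return s2_y + r * r * s2_x - 2.0 * r * s_yx


def ratio_variance(N: int, n1: int, S2_y: float, W: Sequence[float],
                   S2_rh: Sequence[float], g: Sequence[float]) -> float:
    """Approximate variance (4.2.3) from population quantities."""
    second = sum((1.0 / gh - 1.0) * Wh * S2 for Wh, S2, gh in zip(W, S2_rh, g))
    return (1.0 / n1 - 1.0 / N) * S2_y + second / n1


def ratio_variance_estimate(N: int, n1_h: Sequence[int], xbar1_h: Sequence[float],
                            y2_h: Sequence[Sequence[float]],
                            x2_h: Sequence[Sequence[float]]) -> float:
    """Variance estimator (4.2.7) from first- and second-phase data."""
    n1 = sum(n1_h)
    estimate = 0.0  # ybar_{R,st}
    resid = 0.0     # sum_h (1/g_h - 1) w'_h s_rh^2
    mean_sq = 0.0   # sum_h (w'_h / n_h) sum_j y_hj^2
    for nh1, xb1, ys, xs in zip(n1_h, xbar1_h, y2_h, x2_h):
        nh = len(ys)                 # needs n_h >= 2
        w1, gh = nh1 / n1, nh / nh1  # w'_h and g_h = n_h / n'_h
        yb, xb = fmean(ys), fmean(xs)
        r = yb / xb                  # R-hat_h = ybar_h / xbar_h
        s2y = sum((y - yb) ** 2 for y in ys) / (nh - 1)
        s2x = sum((x - xb) ** 2 for x in xs) / (nh - 1)
        syx = sum((y - yb) * (x - xb) for y, x in zip(ys, xs)) / (nh - 1)
        estimate += w1 * r * xb1
        resid += (1.0 / gh - 1.0) * w1 * s2_r(s2y, s2x, syx, r)
        mean_sq += w1 * sum(y * y for y in ys) / nh
    return ((N - 1) * resid + (N - n1) * (mean_sq - estimate ** 2)) / (N * (n1 - 1))


# S_rh^2 for stratum 1 of the peach data in Section 4.3.2 (table value: 658).
print(round(s2_r(8699, 5186, 6462, 1.29133), 1))

# Invented two-stratum sample with N = 1000.
v_hat = ratio_variance_estimate(1000, [4, 6], [5.0, 10.0],
                                [[2.0, 4.0], [5.0, 7.0, 9.0]],
                                [[4.0, 6.0], [8.0, 10.0, 12.0]])
print(v_hat)
```

Writing the two fractions of (4.2.7) over the common denominator $N(n'-1)$ keeps the final expression close to the printed formula.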

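4.2.1 Optimum allocation

Consider the cost function

$$C = c'n' + \sum_{h=1}^{L} c_h n_h, \tag{4.2.1.1}$$

where $c'$ is the cost per unit in the first-phase sample and $c_h$ is the cost per unit in the second-phase sample from the h-th stratum. Since $n_h$ is a random variable, the expected cost is

$$E(C) = C^* = c'n' + n'\sum_{h=1}^{L} c_h g_h W_h, \tag{4.2.1.2}$$

because $E(n_h) = E(g_h n'_h) = g_h E(n'_h) = n' W_h g_h$. Writing (4.2.3) as

$$V(\bar y_{R,st}) + \frac{S_y^2}{N} = \frac{1}{n'}\left[S_y^2 - \sum_{h=1}^{L} W_h S_{rh}^2 + \sum_{h=1}^{L}\frac{W_h S_{rh}^2}{g_h}\right],$$

the product

$$C^*\left\{V(\bar y_{R,st}) + \frac{S_y^2}{N}\right\} = \left[S_y^2 - \sum_{h=1}^{L} W_h S_{rh}^2 + \sum_{h=1}^{L}\frac{W_h S_{rh}^2}{g_h}\right]\left[c' + \sum_{h=1}^{L} c_h g_h W_h\right]$$

does not depend on $n'$ and, by the Cauchy-Schwarz inequality, is minimised if and only if

$$\frac{c'}{S_y^2 - \sum_{h=1}^{L} W_h S_{rh}^2} = \frac{c_h g_h^2 W_h}{W_h S_{rh}^2}, \qquad h = 1, \ldots, L. \tag{4.2.1.3}$$

This gives the optimum value of $g_h$ as

$$g_h = \frac{S_{rh}}{\sqrt{c_h}}\sqrt{\frac{c'}{S_y^2 - \sum_{h=1}^{L} W_h S_{rh}^2}}. \tag{4.2.1.4}$$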
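The allocation rule (4.2.1.4), together with the expected cost (4.2.1.2) solved for $n'$, can be sketched as below. This is an illustrative implementation only: the function name is hypothetical, and it simply raises an error when the unconstrained optimum gives $g_h > 1$, a case the text does not treat. Passing $S_{yh}^2(1-\rho_h^2)$ in place of $S_{rh}^2$ gives the regression allocation (4.3.1.1). The example uses the peach-orchard summaries of Section 4.3.2 with $c' = 0.15$, $c_h = 0.5$ and $C^* = 50$.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def optimum_allocation(S2_y: float, W: Sequence[float], S2_within: Sequence[float],
                       c1: float, c: Sequence[float], C_star: float
                       ) -> Tuple[List[float], float, List[float]]:
    """Return (g_h, n', E(n_h)) for a fixed expected cost C*.

    S2_within[h] is S_rh^2 for the ratio method, as in (4.2.1.4).
    """
    a2 = S2_y - sum(w * s for w, s in zip(W, S2_within))  # S_y^2 - sum_h W_h S_h^2
    g = [sqrt(s * c1 / (ch * a2)) for s, ch in zip(S2_within, c)]
    if any(gh > 1.0 for gh in g):
        raise ValueError("unconstrained optimum gives g_h > 1")
    # Solve C* = c' n' + n' sum_h c_h g_h W_h, i.e. (4.2.1.2), for n'.
    n1 = C_star / (c1 + sum(ch * gh * w for ch, gh, w in zip(c, g, W)))
    expected_nh = [n1 * w * gh for w, gh in zip(W, g)]  # E(n_h) = n' W_h g_h
    return g, n1, expected_nh


W = [0.184, 0.461, 0.355]
S2_rh = [658.0, 573.0, 2706.0]
g, n1, e_nh = optimum_allocation(6465.0378, W, S2_rh, 0.15, [0.5] * 3, 50.0)
print([round(x, 4) for x in g], round(n1), [round(x) for x in e_nh])

# Regression version: S_yh^2 (1 - rho_h^2) replaces S_rh^2.
S2_reg = [s * (1 - r * r) for s, r in zip([8699.0, 4614.0, 7311.0], [0.962, 0.938, 0.807])]
g_reg, n1_reg, e_nh_reg = optimum_allocation(6465.0378, W, S2_reg, 0.15, [0.5] * 3, 50.0)
print([round(x, 4) for x in g_reg], round(n1_reg), [round(x) for x in e_nh_reg])
```

With these inputs the sketch reproduces, up to rounding, the sampling fractions and expected sample sizes reported in Section 4.3.2.1 for both methods.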

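Hence, the optimum variance is

$$V_{opt}(\bar y_{R,st}) = \frac{1}{C^*}\left[\sqrt{c'\left(S_y^2 - \sum_{h=1}^{L} W_h S_{rh}^2\right)} + \sum_{h=1}^{L} W_h S_{rh}\sqrt{c_h}\right]^2 - \frac{S_y^2}{N}. \tag{4.2.1.5}$$

4.3 Regression method of estimation

Let us define a regression estimator under two phase stratified random sampling as

$$\bar y_{Reg,st} = \sum_{h=1}^{L} w'_h\left[\bar y_h + \beta_h(\bar x'_h - \bar x_h)\right], \tag{4.3.1}$$

where $\beta_h$ is the known population regression coefficient for the h-th stratum.

Theorem 4.4 Under two phase stratified random sampling, $\bar y_{Reg,st}$ is an unbiased estimator of $\bar Y$.

Proof: Since $E_2(\bar y_h) = \bar y'_h$ and $E_2(\bar x_h) = \bar x'_h$,

$$E(\bar y_{Reg,st}) = E_1 E_2\left[\sum_{h=1}^{L} w'_h\left\{\bar y_h + \beta_h(\bar x'_h - \bar x_h)\right\}\right] = E_1\left(\sum_{h=1}^{L} w'_h \bar y'_h\right) = \bar Y, \tag{4.3.2}$$

where $\bar y'_h$ is the sample mean of the h-th stratum based on a sample of size $n'_h$.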
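For comparison with the ratio sketch, the regression estimator (4.3.1) can be computed as follows. This is a minimal illustration assuming the within-stratum slopes $\beta_h$ are known, as the text requires; the function name and the example numbers are hypothetical. Because $\beta_h$ is fixed, the estimator is linear in the sample means, which is why Theorem 4.4 holds exactly rather than approximately.

```python
from statistics import fmean
from typing import Sequence


def regression_estimator_two_phase(
    n1_h: Sequence[int],
    xbar1_h: Sequence[float],
    beta_h: Sequence[float],
    y2_h: Sequence[Sequence[float]],
    x2_h: Sequence[Sequence[float]],
) -> float:
    """Regression estimator (4.3.1) with known within-stratum slopes beta_h."""
    n1 = sum(n1_h)
    estimate = 0.0
    for nh1, xb1, b, ys, xs in zip(n1_h, xbar1_h, beta_h, y2_h, x2_h):
        w1 = nh1 / n1  # w'_h = n'_h / n'
        # Adjust the second-phase mean of y by the first-phase information on x.
        estimate += w1 * (fmean(ys) + b * (xb1 - fmean(xs)))
    return estimate


# Invented two-stratum example (not data from the text).
est_reg = regression_estimator_two_phase(
    n1_h=[4, 6],
    xbar1_h=[5.5, 9.0],
    beta_h=[0.5, 1.0],
    y2_h=[[2.0, 4.0], [5.0, 7.0, 9.0]],
    x2_h=[[4.0, 6.0], [8.0, 10.0, 12.0]],
)
print(est_reg)
```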

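Theorem 4.5 If the first sample is a random sample of size $n'$ and the second sample is a stratified random sample from the first with fixed $g_h$ $(0 < g_h < 1)$, then

$$V(\bar y_{Reg,st}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 + \sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right)\frac{W_h S_{yh}^2(1-\rho_h^2)}{n'}, \tag{4.3.3}$$

where $S_{yh}^2$ is the population variance of y for the h-th stratum and $\rho_h$ is the population correlation coefficient between x and y for the h-th stratum.

Proof:

$$V(\bar y_{Reg,st}) = E_1 V_2(\bar y_{Reg,st}) + V_1 E_2(\bar y_{Reg,st}). \tag{4.3.4}$$

Now

$$E_1 V_2(\bar y_{Reg,st}) = E_1\left[\sum_{h=1}^{L} w'^2_h\left(\frac{1}{n_h} - \frac{1}{n'_h}\right)\left(s'^2_{yh} + \beta_h^2 s'^2_{xh} - 2\beta_h s'_{yxh}\right)\right] = \sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right)\frac{W_h}{n'}\left(S_{yh}^2 + \beta_h^2 S_{xh}^2 - 2\beta_h S_{yxh}\right)$$

$$= \sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right)\frac{W_h S_{yh}^2(1-\rho_h^2)}{n'}, \tag{4.3.5}$$

since $\beta_h = S_{yxh}/S_{xh}^2$, and

$$V_1 E_2(\bar y_{Reg,st}) = V_1\left[\sum_{h=1}^{L} w'_h\left\{\bar y'_h + \beta_h(\bar x'_h - \bar x'_h)\right\}\right] = V_1\left(\sum_{h=1}^{L} w'_h \bar y'_h\right) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2. \tag{4.3.6}$$

That completes the proof.

Theorem 4.6 An estimator of $V(\bar y_{Reg,st})$ is

$$v(\bar y_{Reg,st}) = \frac{N-1}{N(n'-1)}\sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right) w'_h s_{yh}^2(1-\hat\rho_h^2) + \frac{N-n'}{N(n'-1)}\left[\sum_{h=1}^{L}\frac{w'_h}{n_h}\sum_{j=1}^{n_h} y_{hj}^2 - \bar y_{Reg,st}^2\right], \tag{4.3.7}$$

where $\hat\rho_h$ is the estimated value of $\rho_h$ based on the $n_h$ second-phase units of the h-th stratum.

Proof: Writing $(N-1)S_y^2 = \sum_{h=1}^{L}\sum_{j=1}^{N_h} y_{hj}^2 - N\bar Y^2$, we can see that it has the unbiased estimate

$$\text{Est.}\,(N-1)S_y^2 = N\sum_{h=1}^{L}\frac{w'_h}{n_h}\sum_{j=1}^{n_h} y_{hj}^2 - N\left\{\bar y_{Reg,st}^2 - v(\bar y_{Reg,st})\right\}. \tag{4.3.8}$$

Also

$$v(\bar y_{Reg,st}) = \left(\frac{1}{n'} - \frac{1}{N}\right)\text{Est.}(S_y^2) + \frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{g_h} - 1\right) w'_h s_{yh}^2(1-\hat\rho_h^2). \tag{4.3.9}$$

The result follows from (4.3.8) and (4.3.9), exactly as in the proof of Theorem 4.3.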
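A minimal sketch of the estimator (4.3.7) is given below. The text does not say how $\hat\rho_h$ is formed; this sketch assumes the usual sample correlation from the $n_h$ second-phase pairs, so that $s_{yh}^2(1-\hat\rho_h^2) = s_{yh}^2 - s_{yxh}^2/s_{xh}^2$. The function name and the data are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def regression_variance_estimate(N: int, n1_h: Sequence[int], xbar1_h: Sequence[float],
                                 beta_h: Sequence[float],
                                 y2_h: Sequence[Sequence[float]],
                                 x2_h: Sequence[Sequence[float]]) -> float:
    """Estimator (4.3.7) of V(ybar_{Reg,st}) from first- and second-phase data."""
    n1 = sum(n1_h)
    estimate = 0.0  # ybar_{Reg,st}
    resid = 0.0     # sum_h (1/g_h - 1) w'_h s_yh^2 (1 - rho-hat_h^2)
    mean_sq = 0.0   # sum_h (w'_h / n_h) sum_j y_hj^2
    for nh1, xb1, b, ys, xs in zip(n1_h, xbar1_h, beta_h, y2_h, x2_h):
        nh = len(ys)                 # needs n_h >= 2
        w1, gh = nh1 / n1, nh / nh1  # w'_h and g_h = n_h / n'_h
        yb, xb = fmean(ys), fmean(xs)
        s2y = sum((y - yb) ** 2 for y in ys) / (nh - 1)
        s2x = sum((x - xb) ** 2 for x in xs) / (nh - 1)
        syx = sum((y - yb) * (x - xb) for y, x in zip(ys, xs)) / (nh - 1)
        # s_yh^2 (1 - rho-hat_h^2) = s_yh^2 - s_yxh^2 / s_xh^2
        resid += (1.0 / gh - 1.0) * w1 * (s2y - syx ** 2 / s2x)
        estimate += w1 * (yb + b * (xb1 - xb))
        mean_sq += w1 * sum(y * y for y in ys) / nh
    return ((N - 1) * resid + (N - n1) * (mean_sq - estimate ** 2)) / (N * (n1 - 1))


# Invented two-stratum sample with N = 1200 and g_h = 0.5 in both strata.
v_reg_hat = regression_variance_estimate(
    N=1200, n1_h=[6, 6], xbar1_h=[6.0, 10.0], beta_h=[0.5, 1.0],
    y2_h=[[2.0, 4.0, 3.0], [5.0, 9.0, 7.0]],
    x2_h=[[4.0, 6.0, 8.0], [8.0, 10.0, 12.0]],
)
print(v_reg_hat)
```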

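4.3.1 Optimum allocation

Considering the cost function in (4.2.1.1), the optimum allocation is obtained by minimising

$$\left\{V(\bar y_{Reg,st}) + \frac{S_y^2}{N}\right\}\left(c' + \sum_{h=1}^{L} c_h g_h W_h\right) = \left[S_y^2 - \sum_{h=1}^{L} W_h(1-\rho_h^2)S_{yh}^2 + \sum_{h=1}^{L}\frac{W_h(1-\rho_h^2)S_{yh}^2}{g_h}\right]\left(c' + \sum_{h=1}^{L} c_h g_h W_h\right)$$

with respect to $g_h$, and the minimum is attained if and only if

$$\frac{c'}{S_y^2 - \sum_{h=1}^{L} W_h(1-\rho_h^2)S_{yh}^2} = \frac{c_h g_h^2}{(1-\rho_h^2)S_{yh}^2}.$$

Hence the optimum value of $g_h$ is

$$g_h = S_{yh}\sqrt{\frac{(1-\rho_h^2)\,c'}{c_h\left\{S_y^2 - \sum_{h=1}^{L} W_h(1-\rho_h^2)S_{yh}^2\right\}}}, \tag{4.3.1.1}$$

and the optimum value of $n'$ can then be obtained from the expected cost (4.2.1.2). Substituting the optimum values of $n'$ and $g_h$, the optimum value of the variance is obtained as

$$V_{opt}(\bar y_{Reg,st}) = \frac{1}{C^*}\left[\sqrt{c'\left(S_y^2 - \sum_{h=1}^{L} W_h(1-\rho_h^2)S_{yh}^2\right)} + \sum_{h=1}^{L} W_h S_{yh}\sqrt{(1-\rho_h^2)\,c_h}\right]^2 - \frac{S_y^2}{N}. \tag{4.3.1.2}$$

4.3.2 Numerical illustration

Consider the data collected in a complete enumeration of 256 commercial peach orchards in North Carolina in June 1946 (Finker, 1950). Here the area is divided geographically into three strata. The number of peach trees in an orchard is denoted by $x_{hi}$ and the estimated production of peaches (in bushels) by $y_{hi}$.

Stratum | $W_h$ | $S_{yh}^2$ | $S_{xh}^2$ | $S_{yxh}$ | $\bar X_h$ | $\bar Y_h$ | $R_h$ | $S_{rh}^2$ | $\rho_h$
1 | 0.184 | 8699 | 5186 | 6462 | 53.80 | 69.48 | 1.29133 | 658 | 0.962
2 | 0.461 | 4614 | 2367 | 3100 | 31.07 | 43.64 | 1.40475 | 573 | 0.938
3 | 0.355 | 7311 | 4877 | 4817 | 56.97 | 66.39 | 1.16547 | 2706 | 0.807

Let the expected cost of the experiment be $C^* = 50$ and the cost for each unit of the sample at the second phase be $c_h = c = 0.5$. Hence for SRS a sample of size $n = C^*/c = 100$ is permissible. Now

$$S_y^2 \approx \sum_{h=1}^{3} W_h S_{yh}^2 + \sum_{h=1}^{3} W_h(\bar Y_h - \bar Y)^2 = 6465.0378.$$

Hence, since $N = 256$,

$$V(\bar y_{srs}) = \left(\frac{1}{n} - \frac{1}{N}\right) S_y^2 = 39.3963.$$

From (4.2.1.5) we have

$$V_{opt}(\bar y_{R,st}) = \frac{\left(71.5485\sqrt{c'} + 24.1985\right)^2}{50} - 25.2541.$$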
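The quantities just quoted can be checked from the stratum summaries in the table. The short Python script below is an illustrative sketch with hypothetical variable names; like the text, it approximates $S_y^2$ by the $W_h$-weighted within-stratum and between-stratum terms, ignoring the finite-population factors. Here `a` and `b` are the two coefficients in the expression for $V_{opt}(\bar y_{R,st})$ above.

```python
from math import sqrt

# Stratum summaries from the peach-orchard table (Finker, 1950).
W = [0.184, 0.461, 0.355]         # W_h
S2_yh = [8699.0, 4614.0, 7311.0]  # S_yh^2
Ybar = [69.48, 43.64, 66.39]      # Ybar_h
S2_rh = [658.0, 573.0, 2706.0]    # S_rh^2
N, C_star, c = 256, 50.0, 0.5

Y = sum(w * yb for w, yb in zip(W, Ybar))
# S_y^2 ~ within-strata part + between-strata part.
S2_y = (sum(w * s for w, s in zip(W, S2_yh))
        + sum(w * (yb - Y) ** 2 for w, yb in zip(W, Ybar)))

n_srs = C_star / c                   # n = C*/c = 100
V_srs = (1 / n_srs - 1 / N) * S2_y   # (1/n - 1/N) S_y^2

# Coefficients of (4.2.1.5): V_opt = (a*sqrt(c') + b)^2 / C* - S_y^2 / N
a = sqrt(S2_y - sum(w * s for w, s in zip(W, S2_rh)))
b = sum(w * sqrt(s) for w, s in zip(W, S2_rh)) * sqrt(c)

print(round(S2_y, 4), round(V_srs, 4), round(a, 4), round(b, 4), round(S2_y / N, 4))
```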

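We find $V_{opt}(\bar y_{R,st}) < V(\bar y_{srs})$ if $c' < 0.2083$; hence, taking the cost for each unit of the sample at the initial stage as $c' = 0.15$, we find $V_{opt}(\bar y_{R,st}) = 28.6370$. Also, from (4.3.1.2) with $c' = 0.15$ and $c = 0.5$, we have

$$V_{opt}(\bar y_{Reg,st}) = \frac{1}{C^*}\left[\sqrt{c'\left(S_y^2 - \sum_{h} W_h(1-\rho_h^2)S_{yh}^2\right)} + \sum_{h} W_h S_{yh}\sqrt{(1-\rho_h^2)\,c}\right]^2 - \frac{S_y^2}{N} = \frac{\left(72.0068\sqrt{c'} + 33.4661\sqrt{c}\right)^2}{50} - 25.2541 = 27.8985.$$

The relative precision of the various methods can be summarized as follows:

Table 4-1
Sampling method | Method of estimation | Relative precision (%)
1. Simple random | Mean per unit | 100.00
2. Stratified random | Two phase | 100.96
3. Stratified random | Two phase ratio | 377.62
4. Stratified random | Two phase regression | 390.92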
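The optimum variances and the break-even first-phase cost quoted above can be reproduced with a few lines of Python. This is a sketch, assuming a common second-phase cost $c_h = c = 0.5$ and the value $S_y^2 = 6465.0378$ obtained above; the helper names are hypothetical. The break-even $c'$ follows from setting (4.2.1.5) equal to $V(\bar y_{srs})$ with $n = C^*/c$, which reduces to $a\sqrt{c'} + b = \sqrt{c\,S_y^2}$.

```python
from math import sqrt

W = [0.184, 0.461, 0.355]
S2_yh = [8699.0, 4614.0, 7311.0]
rho = [0.962, 0.938, 0.807]
S2_rh = [658.0, 573.0, 2706.0]
S2_y, N, C_star, c = 6465.0378, 256, 50.0, 0.5  # S_y^2 as obtained in the text


def coefficients(S2_within):
    """(a, b) such that V_opt = (a*sqrt(c') + b)^2 / C* - S_y^2 / N."""
    a = sqrt(S2_y - sum(w * s for w, s in zip(W, S2_within)))
    b = sum(w * sqrt(s * c) for w, s in zip(W, S2_within))
    return a, b


def v_opt(S2_within, c1):
    """Optimum variance (4.2.1.5) or (4.3.1.2) for first-phase unit cost c1."""
    a, b = coefficients(S2_within)
    return (a * sqrt(c1) + b) ** 2 / C_star - S2_y / N


S2_reg = [s * (1 - r * r) for s, r in zip(S2_yh, rho)]  # S_yh^2 (1 - rho_h^2)
v_ratio = v_opt(S2_rh, 0.15)
v_reg = v_opt(S2_reg, 0.15)

# Break-even c' for the ratio method: a*sqrt(c') + b = sqrt(c * S_y^2).
a_r, b_r = coefficients(S2_rh)
c1_break = ((sqrt(c * S2_y) - b_r) / a_r) ** 2

print(round(v_ratio, 4), round(v_reg, 4), round(c1_break, 4))
```

Treating both methods through one `v_opt` helper reflects that (4.2.1.5) and (4.3.1.2) differ only in the within-stratum variance used: $S_{rh}^2$ for the ratio method and $S_{yh}^2(1-\rho_h^2)$ for the regression method.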

4.3.2.1 Determination of sample size

Further, from (4.2.1.4) the optimum values of the sampling fractions are $g_1 = 0.1964$, $g_2 = 0.1832$ and $g_3 = 0.3982$, and hence from the expected cost given by (4.2.1.2) we find $n' = 178$, $E(n_1) = 6$, $E(n_2) = 15$ and $E(n_3) = 25$. Also, from (4.3.1.1), the optimum values of the sampling fractions are $g_1 = 0.1937$, $g_2 = 0.1791$ and $g_3 = 0.3841$, and hence from the expected cost given by (4.2.1.2) we find $n' = 180$, $E(n_1) = 6$, $E(n_2) = 15$ and $E(n_3) = 25$.

Summary and Conclusion

In this chapter an attempt has been made to construct ratio and regression estimators under two phase stratified random sampling in the presence of one auxiliary variable. The numerical illustration shows that the regression estimator under two phase stratified sampling performs better in terms of efficiency than the other competing estimators considered.