Optimum Stratification in Bivariate Auxiliary Variables under Neyman Allocation

Journal of Modern Applied Statistical Methods
Volume 17, Issue 1, Article 3, 6-9-2018

Faizan Danish, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, J&K, India, secsportsbsc@gmail.com
S.E.H. Rizvi, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, J&K, India, rizvisehstat@gmail.com

Recommended Citation: Danish, Faizan and Rizvi, S.E.H. (2018) "Optimum Stratification in Bivariate Auxiliary Variables under Neyman Allocation," Journal of Modern Applied Statistical Methods: Vol. 17 : Iss. 1, Article 3. DOI: 0.37/jmasm/594867
Available at: https://digitalcommons.wane.edu/jmasm/vol7/iss/3


Journal of Modern Applied Statistical Methods
May 2018, Vol. 17, No. 1, ep 580
doi: 0.37/jmasm/594867
Copyright 2018 JMASM, Inc. ISSN 1538-9472
Accepted: September 9, 2017; Published: June 9, 2018.
Correspondence: Faizan Danish, danishstat@gmail.com

Faizan Danish, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, Jammu, India
S. E. H. Rizvi, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, Jammu, India

When the complete data set of the study variable is unknown, it is a stumbling block for many stratification techniques. A technique is proposed under Neyman allocation when stratification is done on two auxiliary variables with one estimation variable under consideration. Because of the complexity of the minimal equations, approximate optimum strata boundaries are obtained. An empirical study illustrates the proposed method when the auxiliary variables have standard Cauchy and power distributions.

Keywords: Stratification points, bivariate distribution, power distribution, standard Cauchy distribution

Introduction

A population may be homogeneous or heterogeneous. For the latter, one approach is to divide it into sub-populations, which are known as strata. Minimizing the variance by stratifying is known as optimum stratification. Several factors are responsible for minimum variance: the choice of the variable on which stratification is based, the total number of strata, the design by which the sample is selected from each stratum, and, most influential, the demarcation of the strata. The use of a single stratification variable may be problematic, and in any case using more than one stratification variable increases the level of precision. Dalenius (1950) pioneered the work of obtaining optimum strata boundaries by minimizing the variance. See also Dalenius and Gurney (1951), Mahalanobis

(1952), Aoyama (1954), Dalenius and Hodges (1959), Durbin (1959), Singh (1975, 1977), and Verma (2008). However, their equations were mostly obtained by using the estimation variable itself as the stratification variable. Others used a variable highly correlated with the study variable as the stratification variable, such as Taga (1967), Serfling (1968), Singh and Sukhatme (1969, 1972, 1973), Singh (1971), Singh and Parkash (1975), Schneeberger and Goller (1979), and Rizvi, Gupta, and Singh (2002). Iterative procedures were also proposed by Rivest (2002) for obtaining stratification points, and Gunning and Horgan (2004) proposed a new algorithm for skewed populations.

The motivation behind the present investigation is to consider a single study variable and two variables highly correlated with it. The stratification is conducted on the auxiliary variables. For numerical illustration of the proposed method, two different distributions are considered for the auxiliary variables.

Stratification Points

Let a population of N units be divided into T × U strata, and let $N_{rs}$ denote the number of units in the (r, s)th stratum. Suppose a sample of size n is to be taken from the whole population, and let $n_{rs}$ denote the sample size allocated to the (r, s)th stratum and $z_{rsi}$ the values of the population units in the (r, s)th stratum, $i = 1, 2, 3, \ldots, N_{rs}$. Let Z be the study variable, with population mean

$$\bar{Z} = \frac{1}{N}\sum_{r=1}^{T}\sum_{s=1}^{U}\sum_{i=1}^{N_{rs}} z_{rsi}.$$

The unbiased estimate of the population mean is

$$\bar{z}_{st} = \sum_{r=1}^{T}\sum_{s=1}^{U} W_{rs}\,\bar{z}_{rs},$$

where $W_{rs}$ is the weight of the (r, s)th stratum and $\bar{z}_{rs}$ is the corresponding sample mean. For obtaining strata boundaries, assume that a finite population consists of N units. The stratification points $[z_{rs}]$ for the case of optimum allocation can be obtained from the following equations:

$$\frac{\sigma_{rsz}^{2} + (z_{rs} - \mu_{rsz})^{2}}{\sigma_{rsz}} = \frac{\sigma_{ijz}^{2} + (z_{rs} - \mu_{ijz})^{2}}{\sigma_{ijz}}$$

where $i = h + 1, h + 2, \ldots, T$ and $j = k + 1, k + 2, \ldots, U$, and $\mu_{rsz}$ denotes the mean of the population for Z in the (r, s)th stratum. The minimal equations given above were obtained by Dalenius (1950) when the stratification variable is the same as the estimation variable. Even when the density functions of the auxiliary variables Y and X are known, the distribution of Z is not, so the auxiliary variables are used to obtain the optimum points. Assume the regression of Z on Y and X is given by

$$Z = \lambda(Y, X) + e \tag{1}$$

where $\lambda(Y, X)$ is a function of Y and X and e denotes the error term such that $E(e \mid y, x) = 0$ and $V(e \mid y, x) = \phi(y, x) > 0$ for all y and x defined in (a, b). Let $f(z, y, x)$ denote the joint density function of (Z, Y, X) and let $f(y, x)$ denote the joint marginal density function of Y and X. Also, let $f(y)$ and $f(x)$ be the marginal density functions of Y and X, respectively. According to the model defined in (1),

$$\mu_{rsz} = \mu_{rs\lambda} = \frac{1}{W_{rs}}\int_{y_{r-1}}^{y_r}\int_{x_{s-1}}^{x_s} \lambda(y, x)\, f(y, x)\, dx\, dy \tag{2}$$

which denotes the mean of the (r, s)th stratum, where

$$W_{rs} = \int_{y_{r-1}}^{y_r}\int_{x_{s-1}}^{x_s} f(y, x)\, dx\, dy \tag{3}$$

denotes the weight of the (r, s)th stratum, and the variance of the stratum is given by

$$\sigma_{rsz}^{2} = \sigma_{rs\lambda}^{2} + \mu_{rs\phi} \tag{4}$$

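$$\sigma_{rs\lambda}^{2} = \frac{1}{W_{rs}}\int_{y_{r-1}}^{y_r}\int_{x_{s-1}}^{x_s} \lambda^{2}(y, x)\, f(y, x)\, dx\, dy - \mu_{rs\lambda}^{2} \tag{5}$$

where $(y_{r-1}, y_r, x_{s-1}, x_s)$ denotes the boundaries and $\mu_{rs\phi}$ is the expected value of the function $\phi(y, x)$ in the (r, s)th stratum.

Optimum Variance Equation

Let (g, h) and (k, L) be the defined intervals for the variables Y and X, over which the stratification points $(y_r, x_s)$ are to be found so that the variance of the estimate is minimum. The stratification points are obtained by taking partial derivatives of (4) with respect to the stratification points. To obtain the stratification points of the (r, s)th stratum, find the partial derivatives. Differentiating (3) with respect to $y_r$ and $x_s$ gives

$$\alpha = \frac{\partial W_{rs}}{\partial y_r} = \int_{x_{s-1}}^{x_s} f(y_r, x)\, dx \tag{6}$$

and

$$\beta = \frac{\partial W_{rs}}{\partial x_s} = \int_{y_{r-1}}^{y_r} f(y, x_s)\, dy \tag{7}$$

where $\alpha$ and $\beta$ are the first partial derivatives of (3) with respect to $y_r$ and $x_s$, respectively. Also, by differentiating

$$\mu_{rs\phi} = \frac{1}{W_{rs}}\int_{y_{r-1}}^{y_r}\int_{x_{s-1}}^{x_s} \phi(y, x)\, f(y, x)\, dx\, dy$$

with respect to $y_r$ and $x_s$,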

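$$\gamma = \frac{\partial \mu_{rs\phi}}{\partial y_r} = \frac{1}{W_{rs}}\int_{x_{s-1}}^{x_s} f(y_r, x)\, A_1\, dx \tag{8}$$

$$\delta = \frac{\partial \mu_{rs\phi}}{\partial x_s} = \frac{1}{W_{rs}}\int_{y_{r-1}}^{y_r} f(y, x_s)\, A_2\, dy \tag{9}$$

where $A_1 = \phi(y_r, x) - \mu_{rs\phi}$, $A_2 = \phi(y, x_s) - \mu_{rs\phi}$, and $\gamma$ and $\delta$ denote the first partial derivatives of $\mu_{rs\phi}$ with respect to $y_r$ and $x_s$, respectively. Similarly, the first partial derivatives of (2) with respect to $y_r$ and $x_s$ are, respectively,

$$\frac{\partial \mu_{rs\lambda}}{\partial y_r} = \int_{x_{s-1}}^{x_s} A_3\, dx \tag{10}$$

$$\frac{\partial \mu_{rs\lambda}}{\partial x_s} = \int_{y_{r-1}}^{y_r} A_4\, dy \tag{11}$$

where

$$A_3 = \frac{f(y_r, x)}{W_{rs}}\left[\lambda(y_r, x) - \mu_{rs\lambda}\right] \quad\text{and}\quad A_4 = \frac{f(y, x_s)}{W_{rs}}\left[\lambda(y, x_s) - \mu_{rs\lambda}\right].$$

Again, partially differentiating (5) with respect to $y_r$ and $x_s$:

$$\frac{\partial \sigma_{rs\lambda}^{2}}{\partial y_r} = \frac{1}{W_{rs}}\int_{x_{s-1}}^{x_s} A_5\, dx \tag{12}$$

$$\frac{\partial \sigma_{rs\lambda}^{2}}{\partial x_s} = \frac{1}{W_{rs}}\int_{y_{r-1}}^{y_r} A_6\, dy \tag{13}$$

where

$$A_5 = f(y_r, x)\left\{\left[\lambda(y_r, x) - \mu_{rs\lambda}\right]^{2} - \sigma_{rs\lambda}^{2}\right\} \quad\text{and}\quad A_6 = f(y, x_s)\left\{\left[\lambda(y, x_s) - \mu_{rs\lambda}\right]^{2} - \sigma_{rs\lambda}^{2}\right\}.$$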
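The boundary derivatives above can be sanity-checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes a hypothetical joint density f(y, x) = y + x on the unit square, a hypothetical regression function lam(y, x), and arbitrary stratum boundaries, and compares the single-integral forms of the derivatives of the stratum weight and stratum mean with central finite differences in the boundary y_r. All function and variable names are illustrative.

```python
def integrate_1d(g, a, b, n=100):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def integrate_2d(g, y0, y1, x0, x1, n=100):
    """Midpoint-rule approximation of the double integral of g(y, x)."""
    return integrate_1d(
        lambda y: integrate_1d(lambda x: g(y, x), x0, x1, n), y0, y1, n
    )


# Hypothetical ingredients, chosen only for illustration.
def f(y, x):
    return y + x                      # joint density on the unit square


def lam(y, x):
    return 1.0 + 2.0 * y + 3.0 * x    # regression function lambda(y, x)


def weight(y0, y1, x0, x1):
    """Stratum weight: integral of f over the rectangle."""
    return integrate_2d(f, y0, y1, x0, x1)


def mean_lam(y0, y1, x0, x1):
    """Stratum mean of lambda: integral of lambda * f divided by the weight."""
    num = integrate_2d(lambda y, x: lam(y, x) * f(y, x), y0, y1, x0, x1)
    return num / weight(y0, y1, x0, x1)


y_lo, y_r, x_lo, x_s = 0.2, 0.6, 0.1, 0.5   # hypothetical stratum boundaries
eps = 1e-3

W = weight(y_lo, y_r, x_lo, x_s)
mu = mean_lam(y_lo, y_r, x_lo, x_s)

# Single-integral forms along the boundary y = y_r.
dW_formula = integrate_1d(lambda x: f(y_r, x), x_lo, x_s)
dmu_formula = integrate_1d(
    lambda x: (lam(y_r, x) - mu) * f(y_r, x), x_lo, x_s
) / W

# Central finite differences in the boundary y_r.
dW_fd = (weight(y_lo, y_r + eps, x_lo, x_s)
         - weight(y_lo, y_r - eps, x_lo, x_s)) / (2 * eps)
dmu_fd = (mean_lam(y_lo, y_r + eps, x_lo, x_s)
          - mean_lam(y_lo, y_r - eps, x_lo, x_s)) / (2 * eps)

print("dW/dy_r :", dW_formula, dW_fd)
print("dmu/dy_r:", dmu_formula, dmu_fd)
```

The two numbers on each printed line should agree closely. The same pattern, differentiating with respect to x_s instead, checks the companion formulas.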

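Similarly, for the adjoining strata (r + 1, s) and (r, s + 1), taking the partial derivatives with respect to $y_r$ and $x_s$ gives

$$\frac{\partial W_{(r+1)s}}{\partial y_r} = -\int_{x_{s-1}}^{x_s} f(y_r, x)\, dx \tag{14}$$

$$\frac{\partial W_{r(s+1)}}{\partial x_s} = -\int_{y_{r-1}}^{y_r} f(y, x_s)\, dy \tag{15}$$

$$\frac{\partial \mu_{(r+1)s\phi}}{\partial y_r} = -\int_{x_{s-1}}^{x_s} A_7\, dx \tag{16}$$

$$\frac{\partial \mu_{r(s+1)\phi}}{\partial x_s} = -\int_{y_{r-1}}^{y_r} A_8\, dy \tag{17}$$

$$\frac{\partial \mu_{(r+1)s\lambda}}{\partial y_r} = -\int_{x_{s-1}}^{x_s} A_9\, dx \tag{18}$$

$$\frac{\partial \mu_{r(s+1)\lambda}}{\partial x_s} = -\int_{y_{r-1}}^{y_r} A_{10}\, dy \tag{19}$$

$$\frac{\partial \sigma_{(r+1)s\lambda}^{2}}{\partial y_r} = -\frac{1}{W_{(r+1)s}}\int_{x_{s-1}}^{x_s} f(y_r, x)\left\{\left[\lambda(y_r, x) - \mu_{(r+1)s\lambda}\right]^{2} - \sigma_{(r+1)s\lambda}^{2}\right\} dx \tag{20}$$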

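$$\frac{\partial \sigma_{r(s+1)\lambda}^{2}}{\partial x_s} = -\frac{1}{W_{r(s+1)}}\int_{y_{r-1}}^{y_r} f(y, x_s)\left\{\left[\lambda(y, x_s) - \mu_{r(s+1)\lambda}\right]^{2} - \sigma_{r(s+1)\lambda}^{2}\right\} dy \tag{21}$$

where

$$A_7 = \frac{f(y_r, x)}{W_{(r+1)s}}\left[\phi(y_r, x) - \mu_{(r+1)s\phi}\right], \qquad A_8 = \frac{f(y, x_s)}{W_{r(s+1)}}\left[\phi(y, x_s) - \mu_{r(s+1)\phi}\right],$$

$$A_9 = \frac{f(y_r, x)}{W_{(r+1)s}}\left[\lambda(y_r, x) - \mu_{(r+1)s\lambda}\right], \qquad A_{10} = \frac{f(y, x_s)}{W_{r(s+1)}}\left[\lambda(y, x_s) - \mu_{r(s+1)\lambda}\right].$$

Under Neyman allocation, the variance of the sample mean is

$$V(\bar{z}_{st}) = \frac{1}{n}\left(\sum_{r=1}^{T}\sum_{s=1}^{U} W_{rs}\,\sigma_{rsz}\right)^{2} \tag{22}$$

If the finite population correction is ignored, minimizing (22) is equivalent to minimizing $\sum_{r}\sum_{s} W_{rs}\,\sigma_{rsz}$. Using (4), this can be rewritten as

$$\sum_{r=1}^{T}\sum_{s=1}^{U} W_{rs}\sqrt{\sigma_{rs\lambda}^{2} + \mu_{rs\phi}} \tag{23}$$

By differentiating (23) with respect to $y_r$ and then equating to zero,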

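$$\frac{\partial}{\partial y_r}\left[W_{rs}\sqrt{\sigma_{rs\lambda}^{2} + \mu_{rs\phi}}\right] + \frac{\partial}{\partial y_r}\left[W_{(r+1)s}\sqrt{\sigma_{(r+1)s\lambda}^{2} + \mu_{(r+1)s\phi}}\right] = 0 \tag{24}$$

Using the partial derivatives obtained above,

$$\frac{\partial}{\partial y_r}\left[W_{rs}\sqrt{\sigma_{rs\lambda}^{2} + \mu_{rs\phi}}\right] = \int_{x_{s-1}}^{x_s} \frac{f(y_r, x)\left[I_1 + \phi(y_r, x) + \mu_{rs\phi}\right]}{2\sqrt{\sigma_{rs\lambda}^{2} + \mu_{rs\phi}}}\, dx \tag{25}$$

$$\frac{\partial}{\partial y_r}\left[W_{(r+1)s}\sqrt{\sigma_{(r+1)s\lambda}^{2} + \mu_{(r+1)s\phi}}\right] = -\int_{x_{s-1}}^{x_s} \frac{f(y_r, x)\left[I_2 + \phi(y_r, x) + \mu_{(r+1)s\phi}\right]}{2\sqrt{\sigma_{(r+1)s\lambda}^{2} + \mu_{(r+1)s\phi}}}\, dx \tag{26}$$

where

$$I_1 = \left[\lambda(y_r, x) - \mu_{rs\lambda}\right]^{2} + \sigma_{rs\lambda}^{2} \quad\text{and}\quad I_2 = \left[\lambda(y_r, x) - \mu_{(r+1)s\lambda}\right]^{2} + \sigma_{(r+1)s\lambda}^{2}.$$

The minimal equations are obtained by substituting (25) and (26) into (24):

$$\int_{x_{s-1}}^{x_s} \frac{I_3 + \mu_{rs\phi}\, f(y_r, x)}{\sqrt{\sigma_{rs\lambda}^{2} + \mu_{rs\phi}}}\, dx = \int_{x_{s-1}}^{x_s} \frac{I_4 + \mu_{(r+1)s\phi}\, f(y_r, x)}{\sqrt{\sigma_{(r+1)s\lambda}^{2} + \mu_{(r+1)s\phi}}}\, dx \tag{27}$$

where

$$I_3 = f(y_r, x)\left\{\left[\lambda(y_r, x) - \mu_{rs\lambda}\right]^{2} + \sigma_{rs\lambda}^{2} + \phi(y_r, x)\right\}, \qquad I_4 = f(y_r, x)\left\{\left[\lambda(y_r, x) - \mu_{(r+1)s\lambda}\right]^{2} + \sigma_{(r+1)s\lambda}^{2} + \phi(y_r, x)\right\}.$$

By differentiating (23) with respect to $x_s$ and equating to zero, the minimal equations are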

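$$\int_{y_{r-1}}^{y_r} \frac{I_5 + \mu_{rs\phi}\, f(y, x_s)}{\sqrt{\sigma_{rs\lambda}^{2} + \mu_{rs\phi}}}\, dy = \int_{y_{r-1}}^{y_r} \frac{I_6 + \mu_{r(s+1)\phi}\, f(y, x_s)}{\sqrt{\sigma_{r(s+1)\lambda}^{2} + \mu_{r(s+1)\phi}}}\, dy \tag{28}$$

where

$$I_5 = f(y, x_s)\left\{\left[\lambda(y, x_s) - \mu_{rs\lambda}\right]^{2} + \sigma_{rs\lambda}^{2} + \phi(y, x_s)\right\}, \qquad I_6 = f(y, x_s)\left\{\left[\lambda(y, x_s) - \mu_{r(s+1)\lambda}\right]^{2} + \sigma_{r(s+1)\lambda}^{2} + \phi(y, x_s)\right\}.$$

Equations (27) and (28) give the strata boundaries $(y_r, x_s)$ corresponding to the minimum of $V(\bar{z}_{st})$, in terms of a function $\psi(y, x)$ defined on (g, h) and (k, L):

[Definition of $\psi(y, x)$]

where $I_7 = \phi(y, x)$. Assume that the regression model defined in (1) is linear, of the form $z = a + by + cx + e$, so that

$$\sigma_{rsz}^{2} = b^{2}\sigma_{rsy}^{2} + c^{2}\sigma_{rsx}^{2} + \sigma_{e}^{2} \tag{29}$$

where the expected value and variance of the error term e are 0 and $\sigma_{e}^{2}$, respectively, and assume that the auxiliary variables are independent of each other. Equation (23) can then be written as

$$\sum_{r=1}^{T}\sum_{s=1}^{U} W_{rs}\sqrt{h} \tag{30}$$

where $h = b^{2}\sigma_{rsy}^{2} + c^{2}\sigma_{rsx}^{2} + \sigma_{e}^{2}$. From the above,

[Equation (31)]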

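where $K_r = [y_r - y_{r-1}]$, $K_s = [x_s - x_{s-1}]$, and $\varphi_r$, $\varphi_s$ represent constant values of the marginal density functions of Y and X in the (r, s)th stratum, respectively.

Lemma 1. If the function $I_{ij}(y, x)$ is defined as

$$I_{ij}(y, x) = \int_{y}^{y+k_1}\int_{x}^{x+k_2} (t_1 - y)^{i}\,(t_2 - x)^{j}\, f(t_1, t_2)\, dt_2\, dt_1,$$

where $f(t_1, t_2)$ is a function of two variables, then

$$I_{ij}(y, x) = \frac{k_1^{i+1}k_2^{j+1}}{(i+1)(j+1)}\, f + \frac{k_1^{i+2}k_2^{j+1}}{(i+2)(j+1)}\, f_{t_1} + \frac{k_1^{i+1}k_2^{j+2}}{(i+1)(j+2)}\, f_{t_2} + \frac{1}{2!}\left[\frac{k_1^{i+3}k_2^{j+1}}{(i+3)(j+1)}\, f_{t_1 t_1} + \frac{2\,k_1^{i+2}k_2^{j+2}}{(i+2)(j+2)}\, f_{t_1 t_2} + \frac{k_1^{i+1}k_2^{j+3}}{(i+1)(j+3)}\, f_{t_2 t_2}\right] + O\!\left(k^{\,i+j+5}\right) \tag{32}$$

Proof. If $(t_1, t_2)$ is near $(y, x)$ and the derivatives of f are continuous, expand $f(t_1, t_2)$ with the help of Taylor's theorem. Write $t_1 = y + (t_1 - y)$ and $t_2 = x + (t_2 - x)$, so that $f(t_1, t_2) = f\big(y + (t_1 - y),\, x + (t_2 - x)\big)$. Using the Taylor series formula for a function of two variables, the expansion of $f(t_1, t_2)$ is given by

$$f(t_1, t_2) = f(y, x) + (t_1 - y)\frac{\partial f}{\partial t_1} + (t_2 - x)\frac{\partial f}{\partial t_2} + \frac{(t_1 - y)^{2}}{2!}\frac{\partial^{2} f}{\partial t_1^{2}} + \frac{(t_2 - x)^{2}}{2!}\frac{\partial^{2} f}{\partial t_2^{2}} + (t_1 - y)(t_2 - x)\frac{\partial^{2} f}{\partial t_1 \partial t_2} + \cdots$$

which yields

$$(t_1 - y)^{i}(t_2 - x)^{j} f(t_1, t_2) = (t_1 - y)^{i}(t_2 - x)^{j}\left[f(y, x) + (t_1 - y)\frac{\partial f}{\partial t_1} + (t_2 - x)\frac{\partial f}{\partial t_2} + \cdots\right]$$

and, integrating term by term over the rectangle,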

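$$I_{ij}(y, x) = \frac{k_1^{i+1}k_2^{j+1}}{(i+1)(j+1)}\, f + \frac{k_1^{i+2}k_2^{j+1}}{(i+2)(j+1)}\frac{\partial f}{\partial t_1} + \frac{k_1^{i+1}k_2^{j+2}}{(i+1)(j+2)}\frac{\partial f}{\partial t_2} + \frac{k_1^{i+3}k_2^{j+1}}{2!\,(i+3)(j+1)}\frac{\partial^{2} f}{\partial t_1^{2}} + \frac{k_1^{i+1}k_2^{j+3}}{2!\,(i+1)(j+3)}\frac{\partial^{2} f}{\partial t_2^{2}} + \frac{k_1^{i+2}k_2^{j+2}}{(i+2)(j+2)}\frac{\partial^{2} f}{\partial t_1 \partial t_2} + O\!\left(k^{\,i+j+5}\right)$$

with the derivatives evaluated at $t_1 = y$, $t_2 = x$, where $k_1$ and $k_2$ are the widths of the integration intervals. Denote by $f, f_{t_1}, f_{t_2}, f_{t_1 t_1}, f_{t_2 t_2}, f_{t_1 t_2}$ the function and its partial derivatives at $(y, x)$. Then $I_{ij}(y, x)$ can be written as

$$I_{ij}(y, x) = \frac{k_1^{i+1}k_2^{j+1}}{(i+1)(j+1)}\, f + \frac{k_1^{i+2}k_2^{j+1}}{(i+2)(j+1)}\, f_{t_1} + \frac{k_1^{i+1}k_2^{j+2}}{(i+1)(j+2)}\, f_{t_2} + \frac{1}{2!}\left[\frac{k_1^{i+3}k_2^{j+1}}{(i+3)(j+1)}\, f_{t_1 t_1} + \frac{k_1^{i+1}k_2^{j+3}}{(i+1)(j+3)}\, f_{t_2 t_2} + \frac{2\,k_1^{i+2}k_2^{j+2}}{(i+2)(j+2)}\, f_{t_1 t_2}\right] + O\!\left(k^{\,i+j+5}\right) \tag{33}$$

where k indicates $k_1$ or $k_2$. For $i = j = 0$,

$$I_{00}(y, x) = k_1 k_2\, f + \frac{k_1^{2}k_2\, f_{t_1} + k_1 k_2^{2}\, f_{t_2}}{2} + \frac{1}{2!}\left[\frac{k_1^{3}k_2}{3}\, f_{t_1 t_1} + \frac{k_1^{2}k_2^{2}}{2}\, f_{t_1 t_2} + \frac{k_1 k_2^{3}}{3}\, f_{t_2 t_2}\right] + O\!\left(k^{5}\right) \tag{34}$$

Lemma 2. Let $\mu_{\eta}(y, x)$ denote the conditional expectation of the function $\eta(t_1, t_2)$, so that

$$\mu_{\eta}(y, x) = \frac{\displaystyle\int_{y}^{y+k_1}\int_{x}^{x+k_2} \eta(t_1, t_2)\, f(t_1, t_2)\, dt_2\, dt_1}{\displaystyle\int_{y}^{y+k_1}\int_{x}^{x+k_2} f(t_1, t_2)\, dt_2\, dt_1}.$$

Then the series expansion of $\mu_{\eta}(y, x)$ at the point $(t_1, t_2) = (y, x)$ is given by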

[Equation (35)]

Proof. From the definition of $\mu_{\eta}(y, x)$,

$$\mu_{\eta}(y, x)\, I_{00}(y, x) = \int_{y}^{y+k_1}\int_{x}^{x+k_2} \eta(t_1, t_2)\, f(t_1, t_2)\, dt_2\, dt_1.$$

Using the Taylor series expansion, defined for two variables, of $\eta(t_1, t_2)$ about the point $(t_1, t_2) = (y, x)$,

$$\eta(t_1, t_2) = \eta + (t_1 - y)\,\eta_{t_1} + (t_2 - x)\,\eta_{t_2} + \frac{(t_1 - y)^{2}}{2!}\,\eta_{t_1 t_1} + \frac{(t_2 - x)^{2}}{2!}\,\eta_{t_2 t_2} + (t_1 - y)(t_2 - x)\,\eta_{t_1 t_2} + \frac{1}{3!}\left[(t_1 - y)^{3}\eta_{t_1 t_1 t_1} + 3(t_1 - y)^{2}(t_2 - x)\,\eta_{t_1 t_1 t_2} + 3(t_1 - y)(t_2 - x)^{2}\eta_{t_1 t_2 t_2} + (t_2 - x)^{3}\eta_{t_2 t_2 t_2}\right] + \cdots$$

Thus,

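$$\mu_{\eta}(y, x)\, I_{00}(y, x) = \int_{y}^{y+k_1}\int_{x}^{x+k_2}\left[\eta + (t_1 - y)\,\eta_{t_1} + (t_2 - x)\,\eta_{t_2} + \frac{(t_1 - y)^{2}}{2!}\,\eta_{t_1 t_1} + \frac{(t_2 - x)^{2}}{2!}\,\eta_{t_2 t_2} + (t_1 - y)(t_2 - x)\,\eta_{t_1 t_2} + \cdots\right] f(t_1, t_2)\, dt_2\, dt_1$$

$$= I_{00}\,\eta + I_{10}\,\eta_{t_1} + I_{01}\,\eta_{t_2} + \frac{1}{2!}\, I_{20}\,\eta_{t_1 t_1} + \frac{1}{2!}\, I_{02}\,\eta_{t_2 t_2} + I_{11}\,\eta_{t_1 t_2} + O\!\left(k^{5}\right),$$

where each $I_{ij}$ is evaluated at $(y, x)$. Neglecting the higher-order terms, it can be written as

$$\mu_{\eta}(y, x) = \eta + \frac{1}{I_{00}(y, x)}\left[I_{10}\,\eta_{t_1} + I_{01}\,\eta_{t_2} + \frac{1}{2!}\, I_{20}\,\eta_{t_1 t_1} + \frac{1}{2!}\, I_{02}\,\eta_{t_2 t_2} + I_{11}\,\eta_{t_1 t_2}\right] \tag{36}$$

Substituting the values of $I_{00}(y, x)$, $I_{10}(y, x)$, $I_{01}(y, x)$, $I_{20}(y, x)$, $I_{02}(y, x)$, and $I_{11}(y, x)$ from (33) in (36) gives the expansion (35). Continuing in a similar manner and utilizing Taylor's theorem at the point z, the expansion for $\mu_{\eta}(y, x)$ is obtained as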

[Equation (37)]

Lemma 3. If $\sigma_{\eta}^{2}(y, x)$ denotes the conditional variance of the function $\eta(t_1, t_2)$ in the interval $(y, y + k_1)$, $(x, x + k_2)$, such that

$$\sigma_{\eta}^{2}(y, x) = \mu_{\eta^{2}}(y, x) - \left[\mu_{\eta}(y, x)\right]^{2},$$

then

[Equation (38)]

Proof. This result can be established by using the expansion of $\mu_{\eta}(y, x)$, replacing the function $\eta(t_1, t_2)$ by $\eta^{2}(t_1, t_2)$:

OPTIMUM STRATIFICATION UNDER NEYMAN ALLOCATION 3 4k k + 3k k μ η, η ηη ηη ηη η ηη 6 ( + ( k k + k k + + + ( + k k ( ηη ( η ηη ηη ( η ηη ( η ηη 4 3 k k + + + + + + + + 4 3 3 ηη kk + + ( ηη + ηη + ( η + ηη + ( η + ηη 6 4 kk 6 + ηη + ( η + ηη + O k 4 k k k k k k k k k k kk + +! + + ( 3 ( 3 3 3 where and η are unctions and their derivations are evaluated at the point (t, t (,. Similarl to equation (33, η 4 4 3 3 η + ( k k + k k + k k 4 μ, + O ( ( k 7 η 4 3 4 3 3 k k k k k k k k + k k + k k + + + 3 3 where and η are unctions and their derivations are evaluated at the point (t, t (,. Ater simpliication, 4 3 3 3 4k k + 3k k μ η (, ηη ( k k + k k + ηη ηη ( η ηη 6 + + + η 4 4 3 3 ( k k + k k + k k 4 k k k k k k k k + + + + 4 4 4 4 3 3 3 k k 6
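Lemmas 1 to 3 all rest on integrating a bivariate Taylor expansion term by term over a small rectangle. As a sanity check of that step, the sketch below compares the second-order expansion of I_00 (the case i = j = 0 of Lemma 1, taking k_1 = k_2 = k) with a closed-form integral. The test function exp(-t1 - 2 t2) and the evaluation point are assumptions made purely for illustration. If the remainder is of order k^5, halving k should shrink the error by a factor close to 2^5 = 32.

```python
import math


def f(t1, t2):
    """Hypothetical smooth test function (not from the paper)."""
    return math.exp(-t1 - 2.0 * t2)


def exact_I00(y, x, k1, k2):
    """Closed-form integral of f over [y, y + k1] x [x, x + k2]."""
    part_y = math.exp(-y) * (1.0 - math.exp(-k1))
    part_x = math.exp(-2.0 * x) * (1.0 - math.exp(-2.0 * k2)) / 2.0
    return part_y * part_x


def taylor_I00(y, x, k1, k2):
    """Second-order term-by-term expansion of I_00 about (y, x)."""
    f0 = f(y, x)
    f1, f2 = -f0, -2.0 * f0                  # first partial derivatives
    f11, f12, f22 = f0, 2.0 * f0, 4.0 * f0   # second partial derivatives
    return (
        f0 * k1 * k2
        + (f1 * k1 ** 2 * k2 + f2 * k1 * k2 ** 2) / 2.0
        + 0.5 * (
            f11 * k1 ** 3 * k2 / 3.0
            + 2.0 * f12 * k1 ** 2 * k2 ** 2 / 4.0
            + f22 * k1 * k2 ** 3 / 3.0
        )
    )


def error(k, y=0.3, x=0.2):
    """Absolute truncation error with k1 = k2 = k."""
    return abs(exact_I00(y, x, k, k) - taylor_I00(y, x, k, k))


ratio = error(0.1) / error(0.05)
print("error ratio when k is halved:", ratio)
```

A ratio near 32 is consistent with a remainder of order k^5; a ratio near 16 or 64 would point to a missing or spurious term in the expansion.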

DANISH & RIZVI Lemma 4. ( 4 4 ( k k k k η η η, + O ( k 7 3 4k k k k (39 Proo. To prove the lemma, use the relation obtained in equations (35 and (36. On multipling and taking square roots on both sides, obtain η ( k k kk, η η + O k kk kk + k k + Continuing the simpliication, ( Hence the lemma is proven. Lemma 5. 4 η ( k k kk + O kk kk k k 4 4 ( k k k k η η η, + O ( k 7 3 4k k k k 4 ( k 3 ( t, t t t ( tt ( t, t t t + O( k (40 Proo. Consider a unction λ ( t, t t t ( t, t t t 7

OPTIMUM STRATIFICATION UNDER NEYMAN ALLOCATION so that λ λ λ 0 Because there are the initial coeicients o k or k and series epansion o λ( about n, ind where k is either k or k. Lemma 6. λ 3 ( z O( k,, O k or 3 ( t t t t ( tt ( t t t t + ( k,, + O 3 ( t t t t ( tt ( t t t t ( k k in the Talor ( kk ( t, t t t ( t, t t t + O( k (4 Proo. To prove the above lemma, epand the term ( t t, t t in powers o k and k. Using Talor s theorem and epanding ( t, t point (t, t (,, obtain about the 8

DANISH & RIZVI ( t ( t ( t, t t t ( t, t + + + O( k t t This ma be rewritten as Using Lemmas -6, where Similarl, k k k k kk t, t + + + O k 4 4 5 kk + + 4 3 ( k k ( t, t ( O( k ( kk ( t, t t t O( k + ( kk ( t, t t t ( t, t t t + O( k ( h, T U W h (, 3 rsrs r s (4 r s g 3 g G 3 G (, T U W h (, 3 rsrs r s (43 r s Again using the approimation method discussed above, 9

[Definition of $g(y, x)$ and Equation (44)]

From equations (42), (43), (44), and (30),

[Equation (45)]

Every pair (y, x) consists of random variables, so the linear relationship between them can be obtained, and the regression coefficients follow from it.

Numerical Illustration

The proposed method is applicable when the probability density functions of the stratification variables are known. For example, let the auxiliary variable Y follow the standard Cauchy distribution with density function

$$f(y) = \frac{1}{\pi\left(1 + y^{2}\right)} \tag{46}$$

and let the other auxiliary variable X have the density function

$$f(x) = \begin{cases} \dfrac{\delta\, x^{\delta - 1}}{\theta^{\delta}}, & 0 \le x \le \theta \\[1ex] 0, & \text{otherwise} \end{cases} \tag{47}$$

where $\delta > 0$ and $\theta > 0$. In order to find the stratification points, find the values of (3) and (5),

[Equations (48), (49), and (50)]

where $I_1 = \tan^{-1}(v_r + y_{r-1}) - \tan^{-1}(y_{r-1})$ and $I_2 = (u_s + x_{s-1})^{\delta} - (x_{s-1})^{\delta}$. By substituting the values obtained in equations (48), (49), and (50) in (45), the optimum strata boundaries can be obtained.

Figure 1. OSB for bivariate auxiliary variables
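To make the ingredients of the numerical illustration concrete, the sketch below computes stratum weights, within-stratum standard deviations, and a Neyman allocation for a 4 × 4 grid. It is not the paper's computation and does not reproduce its optimum boundaries. It assumes, for illustration only: Y standard Cauchy truncated to [0, 1] and renormalized, X power-distributed with θ = 1 and δ = 3, independent Y and X, equally spaced boundaries, a linear model with b = c = 1 and error variance 0.01, and a total sample size of 500. All function and variable names are hypothetical.

```python
import math


def cauchy_pdf(y):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + y * y))


def power_pdf(x, theta, delta):
    """Power-distribution density on [0, theta]."""
    if 0.0 <= x <= theta:
        return delta * x ** (delta - 1.0) / theta ** delta
    return 0.0


def moments(pdf, lo, hi, n=2000):
    """(mass, mean, variance) of pdf restricted to [lo, hi], midpoint rule."""
    h = (hi - lo) / n
    m0 = m1 = m2 = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        w = pdf(t) * h
        m0 += w
        m1 += w * t
        m2 += w * t * t
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean * mean


# Illustrative settings (assumptions, not the paper's values).
theta, delta = 1.0, 3.0
b, c, sigma_e2 = 1.0, 1.0, 0.01
n_total = 500
y_bounds = [0.0, 0.25, 0.5, 0.75, 1.0]
x_bounds = [0.0, 0.25, 0.5, 0.75, 1.0]

y_parts = [moments(cauchy_pdf, lo, hi)
           for lo, hi in zip(y_bounds, y_bounds[1:])]
x_parts = [moments(lambda x: power_pdf(x, theta, delta), lo, hi)
           for lo, hi in zip(x_bounds, x_bounds[1:])]
y_mass = sum(p[0] for p in y_parts)   # renormalizes the truncated Cauchy
x_mass = sum(p[0] for p in x_parts)

strata = {}
for r, (my, _, vy) in enumerate(y_parts, start=1):
    for s, (mx, _, vx) in enumerate(x_parts, start=1):
        W = (my / y_mass) * (mx / x_mass)      # weight under independence
        sd = math.sqrt(b * b * vy + c * c * vx + sigma_e2)
        strata[(r, s)] = (W, sd)

# Neyman allocation: n_rs proportional to W_rs * sigma_rs.
total = sum(W * sd for W, sd in strata.values())
allocation = {key: n_total * W * sd / total
              for key, (W, sd) in strata.items()}
variance = total ** 2 / n_total   # finite population correction ignored

for key in sorted(allocation):
    print(key, round(allocation[key], 2))
print("variance of the stratified mean:", variance)
```

Replacing `y_bounds` and `x_bounds` with candidate boundaries and comparing the resulting `variance` is one simple way to see how the choice of boundaries affects precision under these assumptions.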

Consider a population of size ,000, which is to be divided into 16 strata with T = 4 and U = 4, and from which a sample of size 500 is to be taken. Apply equations (48), (49), and (50) in condition (44), using the initial values Y = 0 and X = 0, the maximum values of Y and X, and the parameter values θ and δ = 3. The OSB so obtained are displayed in Figure 1, with the corresponding total variance in Table 1.

Table 1. Strata boundaries and total variance

OSB $(y_r, x_s)$:
(0.47, 0.45)     (0.4765, 0.45)     (0.6785, 0.45)     (.0000, 0.45)
(0.47, 0.394)    (0.4765, 0.394)    (0.6785, 0.394)    (.0000, 0.394)
(0.47, 0.784)    (0.4765, 0.784)    (0.6785, 0.784)    (.0000, 0.784)
(0.47, .0000)    (0.4765, .0000)    (0.6785, .0000)    (.0000, .0000)

Total variance: 0.0546

Conclusion

Most of the time the complete set of data on the study variable is unknown, which is a stumbling block to obtaining stratification points. In such situations, using auxiliary variables tends to increase precision. A method was proposed under Neyman allocation for the case of one study variable and two auxiliary variables, with stratification based on the auxiliary variables. A numerical example demonstrated the decreasing pattern of the variance as the number of strata is increased. It can therefore be concluded that this method for finding stratification points can be recommended in place of existing techniques.

References

Aoyama, H. (1954). A study of the stratified random sampling. Annals of the Institute of Statistical Mathematics, 6, 1-36. doi: 0.007/b096054

Dalenius, T. (1950). The problem of optimum stratification. Scandinavian Actuarial Journal, 33(3-4), 203-213. doi: 0.080/034638.950.04304

Dalenius, T., & Gurney, M. (1951). The problem of optimum stratification. II. Scandinavian Actuarial Journal, 34(1-2), 133-148. doi: 0.080/034638.95.04334

Dalenius, T., & Hodges, J. L., Jr. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54(285), 88-101. doi: 0.080/06459.959.05050

Durbin, J. (1959). Review of Sampling in Sweden. Journal of the Royal Statistical Society. Series A (General), 122, 246-248. doi: 0.307/346

Gunning, P., & Horgan, J. M. (2004). A new algorithm for the construction of stratum boundaries in skewed populations. Survey Methodology, 30, 159-166. Retrieved from http://www.statcan.gc.ca/pub/-00- /00400/article/7749-eng.pd

Mahalanobis, P. C. (1952). Some aspects of the design of sample surveys. Sankhyā: The Indian Journal of Statistics, 12(1/2), 1-7. Available from http://www.jstor.org/stable/504808

Rivest, L.-P. (2002). A generalization of the Lavallée and Hidiroglou algorithm for stratification in business surveys. Survey Methodology, 28, 191-198. Retrieved from http://www.statcan.gc.ca/pub/-00- /0000/article/643-eng.pd

Rizvi, S. E. H., Gupta, J. P., & Singh, R. (2002). Approximately optimum stratification for two study variable using auxiliary information. Journal of the Indian Society of Agricultural Statistics, 53(3), 287-298.

Schneeberger, H., & Goller, W. (1979). On the problem of feasibility of optimal stratification points according to Dalenius. Statistische Hefte, 20(4), 250-256. doi: 0.007/b093794

Serfling, R. J. (1968). Approximately optimal stratification. Journal of the American Statistical Association, 63(324), 1298-1309. doi: 0.080/06459.968.048098

Singh, R. (1971). Approximately optimum stratification on the auxiliary variable. Journal of the American Statistical Association, 66(336), 829-833. doi: 0.080/06459.97.04835

Singh, R. (1975). An alternative method of stratification on the auxiliary variable. Sankhyā: The Indian Journal of Statistics, Series C, 37, 100-108.

Singh, R. (1977). A note on optimum stratification for equal allocation with ratio and regression methods of estimation. Australian & New Zealand Journal of Statistics, 19, 96-104. doi: 0./j.467-84.977.tb075.

Singh, R., & Parkash, D. (1975). Optimum stratification for equal allocation. Annals of the Institute of Statistical Mathematics, 27, 273-280. doi: 0.007/b0504646

Singh, R., & Sukhatme, B. V. (1969). Optimum stratification. Annals of the Institute of Statistical Mathematics, 21(3), 515-528. doi: 0.007/b05375

Singh, R., & Sukhatme, B. V. (1972). Optimum stratification in sampling with varying probabilities. Annals of the Institute of Statistical Mathematics, 24, 485-494. doi: 0.007/b0479777

Singh, R., & Sukhatme, B. V. (1973). Optimum stratification with ratio and regression methods of estimation. Annals of the Institute of Statistical Mathematics, 25, 627-633. doi: 0.007/b0479404

Taga, Y. (1967). On optimum stratification for the objective variable based on concomitant variables using prior information. Annals of the Institute of Statistical Mathematics, 19, 101-129. doi: 0.007/b09670

Verma, M. R. (2008). Approximately optimum stratification for ratio and regression methods of estimation. Applied Mathematics Letters, 21, 200-207. doi: 0.06/j.aml.007.03.0