Communications in Statistics - Theory and Methods, 36: 2075-2081, 2007
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0926 print/1532-415X online
DOI: 10.1080/03610920601144046

Sampling Theory

Improvement in Variance Estimation in Simple Random Sampling

CEM KADILAR AND HULYA CINGI
Department of Statistics, Hacettepe University, Ankara, Turkey

Received June 4, 2005; Accepted October 13, 2006
Address correspondence to Cem Kadilar, Department of Statistics, Hacettepe University, Beytepe, Ankara, Turkey; E-mail: kadilar@hacettepe.edu.tr

We propose a new estimator for the population variance using an auxiliary variable in simple random sampling and obtain the equations for its mean square error (MSE) and bias. In addition, we show theoretically that the proposed estimator is more efficient than the traditional ratio and regression estimators, suggested by Isaki (1983), under certain conditions that are defined in this article. A numerical example shows that these conditions are satisfied.

Keywords Auxiliary information; Bias; Efficiency; Mean square error; Ratio estimator; Regression estimator; Variance estimator.

Mathematics Subject Classification 62D05.

1. Introduction

Isaki (1983) presented the ratio estimator for the population variance as

$$ s_{ratio}^2 = s_y^2 \frac{S_x^2}{s_x^2} \tag{1} $$

where $s_y^2$ and $s_x^2$ are unbiased estimators of the population variances $S_y^2$ and $S_x^2$, respectively. Prasad and Singh (1990) obtained the MSE and the bias equations of this estimator as follows:

$$ \mathrm{MSE}\left(s_{ratio}^2\right) \cong \lambda S_y^4 \left[\beta_2(y) + \beta_2(x) - 2\theta\right] \tag{2} $$

$$ B\left(s_{ratio}^2\right) \cong \lambda S_y^2 \left[\left(\beta_2(x) - 1\right) - \left(\theta - 1\right)\right] \tag{3} $$

respectively, where $\lambda = 1/n$, $\beta_2(y) = \mu_{40}/\mu_{20}^2$ is the kurtosis for the population of the study variable, $\beta_2(x) = \mu_{04}/\mu_{02}^2$ is the kurtosis for the population of the auxiliary variable, $\theta = \mu_{22}/(\mu_{20}\mu_{02})$, and $\mu_{rs} = \frac{1}{N}\sum_{j=1}^{N} (y_j - \bar{Y})^r (x_j - \bar{X})^s$ (Kendall and Stuart, 1963). Here $n$ is the sample
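size, $N$ is the population size, and $\bar{X}$ and $\bar{Y}$ are the population means of the auxiliary variable $x_i$ and the study variable $y_i$, respectively.

Isaki (1983) also presented the regression estimator for the population variance using an auxiliary variable as

$$ s_{reg}^2 = s_y^2 + b\left(S_x^2 - s_x^2\right) \tag{4} $$

where $b$ is a constant. The MSE of the estimator is minimized when $b = B = V_R \frac{\theta - 1}{\beta_2(x) - 1}$, where $V_R = S_y^2 / S_x^2$. Garcia and Cebrain (1996) obtained the MSE of this estimator as follows:

$$ \mathrm{MSE}\left(s_{reg}^2\right) \cong \lambda S_y^4 \left\{ \beta_2(y) - 1 - \frac{(\theta - 1)^2}{\beta_2(x) - 1} \right\} \tag{5} $$

Note that this estimator is unbiased. When we compare the MSE Eqs. (5) and (2), we see that the MSE of the regression estimator is smaller than the MSE of the ratio estimator for $\beta_2(x) > 1$, because the difference $\mathrm{MSE}(s_{ratio}^2) - \mathrm{MSE}(s_{reg}^2) \cong \lambda S_y^4 \left[\beta_2(x) - \theta\right]^2 / \left[\beta_2(x) - 1\right]$ is then nonnegative. In other words, the regression estimator, given in (4), is more efficient than the ratio estimator, given in (1), for $\beta_2(x) > 1$.

There are other important studies on this topic in the literature, such as Das and Tripathi (1978), Singh et al. (1988), Agrawal and Sthapit (1995), and Arcos et al. (2005). In these studies, the following population variance estimators are proposed:

$$ s_1^2 = s_y^2 \left(\frac{\bar{X}}{\bar{x}}\right)^{\alpha}, \qquad s_2^2 = s_y^2 \left(\frac{S_x^2}{s_x^2}\right)^{\alpha} $$

$$ s_3^2 = s_y^2 \frac{\bar{X}}{\bar{X} + \alpha(\bar{x} - \bar{X})}, \qquad s_4^2 = s_y^2 \frac{S_x^2}{S_x^2 + \alpha\left(s_x^2 - S_x^2\right)} $$

$$ s_5^2 = \omega_1 s_y^2 + \omega_2(\bar{X} - \bar{x}), \qquad s_6^2 = \omega_1 s_y^2 + \omega_2\left(S_x^2 - s_x^2\right) $$

$$ s_8^2 = s_y^2 + \alpha_1\left(S_x^2 - s_x^2\right) + \alpha_2(\bar{X} - \bar{x}) $$

together with a further estimator, $s_7^2$, which combines $\bar{r}\, s_x^2$, $s_{rd}$, and $s_y^2$. Here $\alpha$, $\omega_1$, $\omega_2$, $\alpha_1$, $\alpha_2$ are suitable constants that minimize the MSE of the estimators,

$$ \bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i \quad \text{and} \quad s_{rd} = \frac{1}{n-1}\sum_{i=1}^{n} (r_i - \bar{r})(d_i - \bar{d}) $$

with $r_i = \left(\frac{y_i - \bar{Y}}{x_i - \bar{X}}\right)^2$, $d_i = (x_i - \bar{X})^2$, and $\bar{d} = \sum_{i=1}^{n} d_i / n$.

2. The Suggested Estimator

Shabbir and Yaab (2003) proposed an estimator for the population mean as

$$ \bar{y}_{SY} = (1 - J)\,\bar{y} + J\, t_b \bar{X} \tag{6} $$

where $J$ is a constant and $t_b = \frac{\bar{y}}{\bar{x}} \left[ \frac{1 + \gamma C_{xy}}{1 + \gamma C_x^2} \right]$ is called Beale's estimator. Here, $C_x$ and $C_y$ are the population coefficients of variation of $x_i$ and $y_i$, respectively. $C_{xy} = \rho_{xy} C_x C_y$,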
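$\gamma = (1 - f)/n$, $f = n/N$, $\rho_{xy}$ is the population coefficient of correlation between $x_i$ and $y_i$, and $\bar{y}$ and $\bar{x}$ are the sample means of $y_i$ and $x_i$, respectively. Note that the optimum value of $J$ is $J^* = \rho_{xy} C_y / C_x$.

Adapting the estimator of Shabbir and Yaab (2003), given in (6), to the estimation of the population variance, we develop the following estimator:

$$ s_{pr}^2 = \omega_1 s_y^2 + \omega_2 t_b \bar{X} \tag{7} $$

where $\omega_1$ and $\omega_2$ are weights that satisfy the condition $\omega_1 + \omega_2 = 1$.

The MSE of the proposed estimator can be found using the first-degree approximation in the Taylor series method defined by

$$ \mathrm{MSE}\left(s_{pr}^2\right) \cong d\, \Sigma\, d' \tag{8} $$

where

$$ d = \left[ \frac{\partial h(a,b,c)}{\partial a} \quad \frac{\partial h(a,b,c)}{\partial b} \quad \frac{\partial h(a,b,c)}{\partial c} \right]_{(S_y^2,\, \bar{X},\, \bar{Y})} $$

$$ \Sigma = \begin{bmatrix} V(s_y^2) & \operatorname{cov}(s_y^2, \bar{x}) & \operatorname{cov}(s_y^2, \bar{y}) \\ \operatorname{cov}(\bar{x}, s_y^2) & V(\bar{x}) & \operatorname{cov}(\bar{x}, \bar{y}) \\ \operatorname{cov}(\bar{y}, s_y^2) & \operatorname{cov}(\bar{y}, \bar{x}) & V(\bar{y}) \end{bmatrix} $$

(see Wolter, 1985). Here, $h(a,b,c) = h(s_y^2, \bar{x}, \bar{y}) = s_{pr}^2$. According to this definition, we obtain $d$ for the proposed estimator as follows:

$$ d = \left[ \omega_1 \quad -\omega_2 \kappa R \quad \omega_2 \kappa \right] $$

where $R = \bar{Y}/\bar{X}$ and $\kappa = \frac{1 + \gamma C_{xy}}{1 + \gamma C_x^2}$ is a constant whose value is approximately 1 in general for a fixed sample size. Taking $\kappa \cong 1$, we obtain the MSE of the proposed estimator using (8) as

$$ \mathrm{MSE}\left(s_{pr}^2\right) \cong \omega_1^2 V(s_y^2) + \omega_2^2 R^2 V(\bar{x}) + \omega_2^2 V(\bar{y}) - 2\omega_1\omega_2 R \operatorname{cov}(\bar{x}, s_y^2) + 2\omega_1\omega_2 \operatorname{cov}(\bar{y}, s_y^2) - 2\omega_2^2 R \operatorname{cov}(\bar{x}, \bar{y}) \tag{9} $$

To find the variance, autocovariance, and cross-covariance terms in (9), we use the following equations:

$$ V(m_r) \cong \lambda\left( \mu_{2r} - \mu_r^2 + r^2 \mu_2 \mu_{r-1}^2 - 2r \mu_{r-1}\mu_{r+1} \right) \tag{10} $$

$$ V(m'_r) = \lambda\left( \mu'_{2r} - \mu'^{\,2}_r \right), \qquad V(m'_s) = \lambda\left( \mu'_{2s} - \mu'^{\,2}_s \right) \tag{11} $$

$$ \operatorname{cov}(m_r, m_q) \cong \lambda\left( \mu_{r+q} - \mu_r\mu_q + rq\,\mu_2\mu_{r-1}\mu_{q-1} - r\mu_{r-1}\mu_{q+1} - q\mu_{r+1}\mu_{q-1} \right) $$

$$ \operatorname{cov}(m_r, m'_1) \cong \lambda\left( \mu_{r+1} - r\mu_2\mu_{r-1} \right) \tag{12} $$

$$ \begin{aligned} \operatorname{cov}(m_{rs}, m_{uv}) \cong \lambda\big( & \mu_{r+u,s+v} - \mu_{rs}\mu_{uv} + ru\,\mu_{20}\mu_{r-1,s}\mu_{u-1,v} + sv\,\mu_{02}\mu_{r,s-1}\mu_{u,v-1} \\ & + rv\,\mu_{11}\mu_{r-1,s}\mu_{u,v-1} + su\,\mu_{11}\mu_{r,s-1}\mu_{u-1,v} - u\,\mu_{r+1,s}\mu_{u-1,v} \\ & - v\,\mu_{r,s+1}\mu_{u,v-1} - r\,\mu_{r-1,s}\mu_{u+1,v} - s\,\mu_{r,s-1}\mu_{u,v+1} \big) \end{aligned} $$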
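$$ \operatorname{cov}(m'_{rs}, m'_{uv}) = \lambda\left( \mu'_{r+u,s+v} - \mu'_{rs}\mu'_{uv} \right) \tag{13} $$

$$ \operatorname{cov}(m'_{rs}, m_{uv}) \cong \lambda\left( \mu_{r+u,s+v} - \mu_{rs}\mu_{uv} - u\,\mu_{r+1,s}\mu_{u-1,v} - v\,\mu_{r,s+1}\mu_{u,v-1} \right) \tag{14} $$

respectively (see Kendall and Stuart, 1963, pp. 229-235), where

$$ m_r = \frac{1}{n}\sum_{j=1}^{n} (y_j - \bar{Y})^r, \quad m'_r = \frac{1}{n}\sum_{j=1}^{n} y_j^r, \quad m_{rs} = \frac{1}{n}\sum_{j=1}^{n} (y_j - \bar{Y})^r (x_j - \bar{X})^s, \quad m'_{rs} = \frac{1}{n}\sum_{j=1}^{n} y_j^r x_j^s $$

$$ \mu_r = \frac{1}{N}\sum_{j=1}^{N} (y_j - \bar{Y})^r, \quad \mu'_r = \frac{1}{N}\sum_{j=1}^{N} y_j^r, \quad \mu_{rs} = \frac{1}{N}\sum_{j=1}^{N} (y_j - \bar{Y})^r (x_j - \bar{X})^s, \quad \mu'_{rs} = \frac{1}{N}\sum_{j=1}^{N} y_j^r x_j^s $$

When $r = 2$ in (10), we have

$$ V(s_y^2) \cong \lambda S_y^4 \left[\beta_2(y) - 1\right] \tag{15} $$

When $r = 1$ and $s = 1$ in (11), we have

$$ V(\bar{x}) \cong \lambda S_x^2 \tag{16} $$

$$ V(\bar{y}) \cong \lambda S_y^2 \tag{17} $$

When $r = 2$ in (12), we have

$$ \operatorname{cov}(s_y^2, \bar{y}) = \operatorname{cov}(\bar{y}, s_y^2) \cong \lambda \mu_{30} \tag{18} $$

When $r = 0$, $s = 1$, $u = 1$, and $v = 0$ in (13), we have

$$ \operatorname{cov}(\bar{x}, \bar{y}) \cong \lambda S_{yx} \tag{19} $$

where $S_{yx}$ is the population covariance between the study and auxiliary variables. When $r = 0$, $s = 1$, $u = 2$, and $v = 0$ in (14), we have

$$ \operatorname{cov}(\bar{x}, s_y^2) \cong \lambda \mu_{21} \tag{20} $$

Using (15)-(20) in (9), we obtain the MSE equation of the proposed estimator as follows:

$$ \mathrm{MSE}\left(s_{pr}^2\right) \cong \lambda S_y^4 \left[ \omega_1^2 \beta_2^*(y) - 2\omega_1\omega_2 A + \omega_2^2 B \right] \tag{21} $$

where $\beta_2^*(y) = \beta_2(y) - 1$,

$$ A = \frac{R\mu_{21} - \mu_{30}}{S_y^4} \quad \text{and} \quad B = \frac{R^2}{V_R S_y^2} + \frac{1}{S_y^2} - \frac{2R\rho}{V_R S_y S_x} $$

In other words, $B = n\,\mathrm{MSE}(\bar{y}_r)/S_y^4$. Here, $\bar{y}_r$ is the ratio estimator for the population mean in simple random sampling and $\mathrm{MSE}(\bar{y}_r) \cong \lambda\left(S_y^2 - 2R\rho S_y S_x + R^2 S_x^2\right)$ (see Cochran, 1977).

Setting $\omega_2 = 1 - \omega_1$ in (21) and differentiating with respect to $\omega_1$, the optimum values of $\omega_1$ and $\omega_2$ that minimize the MSE of $s_{pr}^2$ are

$$ \omega_1^* = \frac{A + B}{2A + B + \beta_2^*(y)} \quad \text{and} \quad \omega_2^* = \frac{A + \beta_2^*(y)}{2A + B + \beta_2^*(y)} \tag{22} $$

The bias of the proposed estimator can be found using the second-degree approximation, $O(n^{-2})$, in the Taylor series method defined by

$$ B\left(s_{pr}^2\right) \cong \frac{1}{2} \sum_{i=1}^{3}\sum_{j=1}^{3} d_{ij}\, E\left[ (\hat{\theta}_i - \theta_i)(\hat{\theta}_j - \theta_j) \right] \tag{23} $$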
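(see Wolter, 1985), where $d_{ij} = \left.\frac{\partial^2 h(t)}{\partial \hat{\theta}_i\, \partial \hat{\theta}_j}\right|_{t=T}$, $h(t) = h(s_y^2, \bar{x}, \bar{y}) = s_{pr}^2$, $\hat{\theta}_1 = s_y^2$, $\hat{\theta}_2 = \bar{y}$, $\hat{\theta}_3 = \bar{x}$ and $\theta_1 = S_y^2$, $\theta_2 = \bar{Y}$, $\theta_3 = \bar{X}$. By these definitions, we can rewrite (23) as

$$ \begin{aligned} B\left(s_{pr}^2\right) \cong \frac{1}{2}\big\{ & d_{11} V(s_y^2) + d_{21}\operatorname{cov}(\bar{y}, s_y^2) + d_{31}\operatorname{cov}(\bar{x}, s_y^2) + d_{12}\operatorname{cov}(s_y^2, \bar{y}) + d_{22} V(\bar{y}) \\ & + d_{32}\operatorname{cov}(\bar{x}, \bar{y}) + d_{13}\operatorname{cov}(s_y^2, \bar{x}) + d_{23}\operatorname{cov}(\bar{y}, \bar{x}) + d_{33} V(\bar{x}) \big\} \end{aligned} \tag{24} $$

Using the definition of $d_{ij}$ and (15)-(20) in (24), we can easily obtain the bias of the proposed estimator as follows:

$$ B\left(s_{pr}^2\right) \cong \frac{\lambda \omega_2}{\bar{X}} \left( -S_{yx} + R S_x^2 \right) \tag{25} $$

3. Efficiency Comparisons

In this section, we first compare the MSE of the proposed estimator with the MSE of the traditional ratio estimator given in (2). We have the following condition:

$$ \mathrm{MSE}_{\min}\left(s_{pr}^2\right) < \mathrm{MSE}\left(s_{ratio}^2\right) $$

$$ \left(\omega_1^{*2} - 1\right)\beta_2^*(y) - 1 - \beta_2(x) + 2\theta - 2\omega_1^*\omega_2^* A + \omega_2^{*2} B < 0 \tag{26} $$

When the condition (26) is satisfied, the proposed estimator is more efficient than the traditional ratio estimator given in (1).

Secondly, we compare the MSE of the proposed estimator with the MSE of the traditional regression estimator given in (5) as follows:

$$ \mathrm{MSE}_{\min}\left(s_{pr}^2\right) < \mathrm{MSE}\left(s_{reg}^2\right) $$

$$ \left(\omega_1^{*2} - 1\right)\beta_2^*(y) + \frac{(\theta - 1)^2}{\beta_2^*(x)} - 2\omega_1^*\omega_2^* A + \omega_2^{*2} B < 0 \tag{27} $$

where $\beta_2^*(x) = \beta_2(x) - 1$. When the condition (27) is satisfied, the proposed estimator is more efficient than the traditional regression estimator given in (4).

As a result, we show that the MSE of the proposed estimator is smaller than the MSE of the traditional ratio and regression estimators under the conditions (26) and (27), respectively.

If we compare the bias of the proposed estimator, given in (25), with the bias of the ratio estimator, given in (3), we obtain the following condition:

$$ \frac{\omega_2}{\bar{X}} \left( -S_{yx} + R S_x^2 \right) - S_y^2 \left[ \left(\beta_2(x) - 1\right) - \left(\theta - 1\right) \right] < 0 \tag{28} $$

When the condition (28) is satisfied, the proposed estimator is less biased than the traditional ratio estimator, given in (1).

4. Numerical Example

We use the data of Kadilar and Cingi (2003) in this section. However, we consider the data of only the East Anatolia Region of Turkey, as we are interested in simple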
random sampling in this article. We apply the proposed and traditional estimators to this data set concerning the level of apple production (in 100 tonnes) as the study variable and the number of apple trees as the auxiliary variable in 104 villages in the East Anatolia Region in 1999 (Source: Institute of Statistics, Republic of Turkey). The statistics of these data are given in Table 1.

Table 1
Data statistics

$N = 104$          $\bar{Y} = 6.254$          $\lambda = 0.050$          $A = -0.0845$
$n = 20$           $\bar{X} = 13931.683$      $\gamma = 0.040$           $B = 0.0018$
$\rho = 0.865$     $S_y = 11.670$             $\theta = 14.398$          $R = 0.0004$
$C_y = 1.866$      $S_x = 23026.133$          $V_R = 2.57\mathrm{E}{-07}$   $\mu_{21} = 966567.33$
$C_x = 1.653$      $\beta_2(x) = 17.516$      $\omega_1^* = -0.005$      $\mu_{30} = 5910.8$
$C_{yx} = 2.668$   $\beta_2(y) = 16.523$      $\omega_2^* = 1.005$       $\kappa = 0.998$
$S_{yx} = 232470.06$

Using the statistics in Table 1, we obtain the values of the left-hand sides of conditions (26) and (27) as -5.241 and -4.653, respectively, for all sample sizes (n = 10, 20, 30, 40, etc.). Therefore, the conditions are always satisfied for this data set. Thus, we can infer that the proposed estimator is more efficient than the traditional regression and ratio estimators.

In addition, when we compare the bias of the proposed estimator with the bias of the traditional ratio estimator, we obtain the value of the condition (28) as approximately -390 for all sample sizes (n = 10, 20, 30, 40, etc.). Therefore, the bias of the proposed estimator is also smaller than that of the traditional ratio estimator for this data set.

5. Conclusion

We have developed a new population variance estimator whose MSE is smaller than the MSE of the traditional ratio and regression estimators under the conditions (26) and (27), respectively. This theoretical inference is also supported by the result of an application with the original data presented in Kadilar and Cingi (2003).

In forthcoming studies, we hope to adapt the estimator presented in this article to stratified random sampling, as in Kadilar and Cingi (2005a), and to develop a variance estimator using two auxiliary variables, as in the estimator of Kadilar and Cingi (2005b).

References

Agrawal, M.
C., Sthapit, A. B. (1995). Unbiased ratio-type variance estimation. Statist. Probab. Lett. 25:361-364.
Arcos, A., Rueda, M., Martinez, M. D., Gonzalez, S., Roman, Y. (2005). Incorporating the auxiliary information available in variance estimation. Appl. Math. Computat. 160:387-399.
Cochran, W. G. (1977). Sampling Techniques. New York: John Wiley and Sons.
Das, A. K., Tripathi, T. P. (1978). Use of auxiliary information in estimating the finite population variance. Sankhya 40:139-148.
Garcia, M. R., Cebrain, A. A. (1996). Repeated substitution method: The ratio estimator for the population variance. Metrika 43:101-105.
Isaki, C. T. (1983). Variance estimation using auxiliary information. J. Amer. Statist. Assoc. 78:117-123.
Kadilar, C., Cingi, H. (2003). Ratio estimators in stratified random sampling. Biometrical J. 45:218-225.
Kadilar, C., Cingi, H. (2005a). A new ratio estimator in stratified random sampling. Commun. Statist. Theor. Meth. 34:597-602.
Kadilar, C., Cingi, H. (2005b). A new estimator using two auxiliary variables. Appl. Math. Computat. 162:901-908.
Kendall, M., Stuart, A. (1963). The Advanced Theory of Statistics: Distribution Theory. Vol. 1. London: Griffin.
Prasad, B., Singh, H. P. (1990). Some improved ratio-type estimators of finite population variance in sample surveys. Commun. Statist. Theor. Meth. 19:1127-1139.
Shabbir, J., Yaab, M. Z. (2003). Improvement over transformed auxiliary variable in estimating the finite population mean. Biometrical J. 45:723-729.
Singh, H. P., Upadhyaya, L. N., Namjoshi, U. D. (1988). Estimation of finite population variance. Curr. Sci. 57:1331-1334.
Wolter, K. M. (1985). Introduction to Variance Estimation. Springer-Verlag.
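
The MSE comparisons of the ratio, regression, and proposed variance estimators can be illustrated numerically. The following is a minimal Python sketch, not the authors' code, that evaluates the first-order MSE expressions of the ratio estimator, the regression estimator, and the proposed estimator at its MSE-minimizing weights. Several things here are assumptions made only for illustration: the function names `central_moment` and `first_order_mse`, the small made-up population, the choice to compute $S_y^2$, $S_x^2$, and $S_{yx}$ with divisor $N$ (so that they coincide with $\mu_{20}$, $\mu_{02}$, $\mu_{11}$), and taking the Beale correction constant equal to 1.

```python
def central_moment(y, x, r, s):
    """Population central moment mu_rs = (1/N) * sum (y_j - Ybar)^r * (x_j - Xbar)^s."""
    N = len(y)
    y_bar = sum(y) / N
    x_bar = sum(x) / N
    return sum((yj - y_bar) ** r * (xj - x_bar) ** s for yj, xj in zip(y, x)) / N


def first_order_mse(y, x, n):
    """First-order MSEs of the ratio, regression and proposed variance estimators."""
    N = len(y)
    lam = 1.0 / n                      # lambda = 1/n
    R = (sum(y) / N) / (sum(x) / N)    # R = Ybar / Xbar

    def mu(r, s):
        return central_moment(y, x, r, s)

    # Assumption: S_y^2, S_x^2 and S_yx use divisor N, i.e. mu_20, mu_02, mu_11.
    sy2, sx2, syx = mu(2, 0), mu(0, 2), mu(1, 1)
    beta2_y = mu(4, 0) / sy2 ** 2      # kurtosis of the study variable
    beta2_x = mu(0, 4) / sx2 ** 2      # kurtosis of the auxiliary variable
    theta = mu(2, 2) / (sy2 * sx2)
    beta2_y_star = beta2_y - 1.0

    # Ratio estimator: lambda * S_y^4 * [beta2(y) + beta2(x) - 2*theta]
    mse_ratio = lam * sy2 ** 2 * (beta2_y + beta2_x - 2.0 * theta)
    # Regression estimator: lambda * S_y^4 * {beta2(y) - 1 - (theta-1)^2 / (beta2(x)-1)}
    mse_reg = lam * sy2 ** 2 * (beta2_y_star - (theta - 1.0) ** 2 / (beta2_x - 1.0))

    # Constants A and B of the proposed estimator's MSE.
    A = (R * mu(2, 1) - mu(3, 0)) / sy2 ** 2
    B = (sy2 - 2.0 * R * syx + R ** 2 * sx2) / sy2 ** 2

    def mse_pr(w1, w2):
        return lam * sy2 ** 2 * (w1 ** 2 * beta2_y_star - 2.0 * w1 * w2 * A + w2 ** 2 * B)

    # Weights minimizing the MSE subject to w1 + w2 = 1.
    denom = 2.0 * A + B + beta2_y_star
    w1 = (A + B) / denom
    w2 = (A + beta2_y_star) / denom

    return {
        "mse_ratio": mse_ratio,
        "mse_reg": mse_reg,
        "mse_pr_opt": mse_pr(w1, w2),
        "mse_sy2": mse_pr(1.0, 0.0),   # weights (1, 0): s_y^2 used on its own
        "w1": w1,
        "w2": w2,
    }


# Hypothetical toy population (not the apple-production data of the article).
y_pop = [3.0, 4.0, 4.0, 6.0, 7.0, 9.0, 10.0, 15.0]
x_pop = [12.0, 15.0, 20.0, 22.0, 30.0, 34.0, 41.0, 55.0]
res = first_order_mse(y_pop, x_pop, n=4)
for name, value in res.items():
    print(f"{name}: {value:.6g}")
```

Whether the proposed estimator beats the regression estimator depends on the population, which is exactly what the efficiency conditions express; the sketch only guarantees that the regression MSE does not exceed the ratio MSE and that the optimal weights sum to 1.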