On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling

International Journal of Statistics and Analysis. ISSN 2248-9959 Volume 7, Number 1 (2017), pp. 19-26 Research India Publications http://www.ripublication.com On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling P. A. Patel 1 and Shraddha Bhatt 2 Department of Statistics, Sardar Patel University, Vallabh Vidyanagar 388 120 Abstract This article deals with the ratio estimation of the finite population total under the Midzuno-Sen sampling scheme when complete auxiliary information is not available but another auxiliary variable, closely related to first auxiliary variable but remotely related to the study variable, is available. A chain ratio estimator in two-phase sampling is suggested. An empirical study is carried out to investigate the performance of the suggested estimator relative to the convention two-phase estimator. key words: Two-phase Sampling, Midzuno-Sen sampling, Ratio estimator, Regression estimator. MS Classification: 62D05: 62F10 1. INTRODUCTION Consider a finite population U = {1,, i, N}. Let y and x be the study variable and auxiliary variable taking the values y i and x i respectively for the population unit i. When the two variables are strongly related but no information is available on the population total X = U x i (or mean X), we wish to estimate the population total Y = U y i of y from a sample s, obtained through a two-phase selection. Scheme I: The first-phase sample s of fixed size n is drawn using simple random sampling without replacement (srswor) to observe only x in order to estimate X. Given

20 P. A. Patel and Shraddha Bhatt s, the second-phase sample s of fixed size n is drawn again using srswor to observe y only. The two-phase sampling ratio estimator is defined as Y Rd = Nx y x = N x s y s n x s (1) where x (x s ) is the mean (total) of x for the sample s, y (y s ), x (x s ) are means (totals) of y and x for the sample s. Suppose that the population mean Z of another auxiliary variable z closely related to x but compare to x remotely related to y is available. For instance, y could be acres of corn harvested for grain in 1964, x acres under corn in 1964 and z acres under corn in 1959. Chand (1975), Kiregyera (1980, 1984), Srivastava et al. (1990) used a second auxiliary variable z closely related to x to suggest different improved estimators under Scheme I. Raj (1965) suggests the following two-phase sample selection procedure. Scheme II: The initial sample s of size n is selected with probability proportional to z with replacement and the second phase sample s of size n is a subsample of s, selected with equal probabilities without replacement. An unbiased estimator of Y is provided by Y DRd = (y i np i s ) + k [ (x i n p i s ) (x i np i )] (2) s where k is a suitably chosen scalar. To compute this estimator it is necessary to assess the value of k. Often this is difficult. Motivated by this Srivenkataramana and Tracy (1989) have modified the sampling scheme. Scheme III: Select the first phase sample s of size n with probability proportional to z and select a subsample s of size n units with probability proportional to x z, with replacement. Under this scheme an unbiased estimator of Y is Y STd = Z (y i nx i ) (x i n z i ) (3) s s Experience shows that of all the unequal probability sampling without replacement schemes available in literature those due to Midzuno (1952) and Sen (1953), and Rao et al. (1962) have wider practical applicability since these procedures are quite simple and involve less computations at estimation stage compared to others. In this article to estimate the population total we will use the following two-phase sampling procedure.

On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling 21 Scheme IV: A first-phase sample s, of size n, is drawn from U according to a design srswor. Given s, a second-phase sample s, of size n, is drawn from s according to the Midzuno-Sen (MS) sampling. The MS scheme of sampling describes the selection of the first unit with probability proportional to a given size measure (x) and remaining (n-1) units of the sample by srswor. The size of the measure is an auxiliary variable and is positively correlated with the study variable. Under this scheme the ordinary ratio estimator is known to be unbiased for the population total. The variance of this estimator is not available in literature in a meaningful form. Rao and Vijayan (1977) investigated the problem of estimating the variance of the ratio estimator, given at (1), under the MS sampling. The same problem addressed by Patel and Patel (2008, 2010a, 2010b). In Section 2 we suggest a chain ratio estimator using two auxiliary variables. To study the performance of the suggested estimator a small scale simulation is presented in Section 3 and a comparison of strategy is given in Section 4. The conclusion is given in Section 5. 2. THE SUGGESTED ESTIMATOR In this section a chain ratio estimator Y Rd is suggested. = N(x Z z )(y x) (4) Let S uv (s uv ) denote the population (sample) covariance between u and v variables, C u = S u U, the coefficient of variation of u variable and ρ uv = S uv S u S v, the correlation coefficient of u and v (u, v = y, x, z). Let B( ) and Var( ) denote respectively the bias and variance of an estimator under a given design. Then, under srswor, for large sample approximations: y = Y(1 + δ 1 ), x = X(1 + δ 2 ), z = Z(1 + δ 3 ) such that E(δ i ) = 0 for i = 1,2,3 we have the following results. Here we assume that δ i < 1 i. Theorem. Under Scheme IV, approximate bias and variance, too(n 1 ), of Y Rd are given by B(Y Rd ) = 1 f n Y(C 2 z ρ yz C y C z ) (5) Var(Y Rd ) = 1 f n Y 2 (C 2 y + C 2 z 2ρ yz C y C z ) + V [1 + 3 1 f n C 2 z 2( ZV) ] (6) ZV

22 P. A. Patel and Shraddha Bhatt where = 1 N [[ a iiy 2 i z i + a ( a ii y 2 i z j + 2 a ij y i y j z j ) i U i j U i j U + b a ij y i y j z k ] i j k U with a ij sand V are defined below and a = (n 1), b = (n 1)(n 2) and (N 1) (N 1)(N 2) f = n N. Proof. The expectation of Y Rd is evaluated using the conditional argument E( ) = E 1 E 2 ( s ) as E(Y Rd ) = NE 1 [(Z z )E 2 (x y x)] = NE 1 [(Z z )y ] and consequently using the formula for approximate bias of the ordinary ratio estimator (see, Cochran, 1977) of the population total under SRSWOR we obtain the bias of Y Rd as given in (5). Next, the formula for variance of Y Rd is The first component is readily seen to be Var(Y Rd ) = V 1 E 2 (Y Rd ) + E 1 V 2 (Y Rd ) (7) V 1 E 2 (Y Rd ) = V 1 (Ny Z z ) 1 f n Y 2 (C y 2 + C z 2 2ρ yz C y C z ) (8) Note that at second phase the ratio estimator x y x is unbiased for y under MS sampling design with variance (see, Rao, 1972) where a ii = 2 v = a ii y i + a ij y i y j s s X n 1 ) 1 1 and a S i x ij = X s ( N 1 n 1 ) 1 1 S i&j x s ( N 1 Also, note that v is unbiased for V = n N a 2 iiy i + n (n 1) U N(N 1) a ij y i y j

On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling 23 at first phase (i.e., under srswor). The second component is written as E 1 V 2 (Y Rd ) = E 1 [(Z z ) 2 V 2 (x y x) ] = E 1 [( Z 2 z ) v ] Inserting v = V(1 + δ 4 ), δ 4 < 1, and using the expansion (1 + δ 3 ) 2 = 1 2δ 3 + 3δ 3 2 + O(n 1 ), after omitting the terms of δ s having power greater than two, we obtain E 1 V 2 (Y Rd ) V[1 + 3Var(δ 3 ) 2Cov(δ 3, δ 4 )] (9) Noting that Var(δ 3 ) = 1 f 2 C n z Cov(δ 3, δ 4 ) = [E(z v ) ZV] ZV = [ ZV] ZV (9) is written as E 1 V 2 (Y Rd ) V [1 + 3 1 f C 2 2( ZV) n z ] (10) ZV Inserting (8) and (10) in (7) we obtain (6). 3. SIMULATION STUDY The preceding estimators were compared empirically on 3 natural populations given below. For comparison of the estimators,a two-phase sample was drawn using srswor and the M-S sampling scheme from each of the populations and these estimators were computed. This procedure was repeated M = 5000 times. For each estimator Y d, the relative percentage bias RB(Y d) = 100 (Y Y) Y and the relative efficiency ascompared to Y were calculated, where RE(Y d) = MSE sim (Y ) MSE sim ( Y d) Y = M j=1 Y dj M M and MSE sim (Y d) = (Y dj Y) 2 j=1 (M 1)

24 P. A. Patel and Shraddha Bhatt Data set I: Murthy (1967) y: No of cultivators x : Area in square miles z : No. of households Data set II: Murthy (1967) y: workers at household industry x : Cultivated area (in acres) z : No. of households Data set III: Fisher data (combined all three data sets) y: Patel width x : Sepal length z : Patel length The following table shows corresponding relative bias and MSE for the data sets. Data Set n n N Estimator RB% MSE sim RE% I 10 30 128 Y Rd 0.60 3.79E+08 - Y Rd 1.02 2.95E+08 128 II 10 30 128 Y Rd 0.89 1.46E+07 - Y Rd 0.71 1.28E+07 114 III 15 40 150 Y Rd -3.16 1.01E+03 - Y Rd -3.32 7.07E+02 143 The above simulation reveals that (1) the absolute RBs % of both the estimators are in reasonable range and (2) the efficiency of the suggested estimator is substantial as compared to the conventional estimator. 4. COMPARISON OF STRATEGIES Here, we compare empirically the suggested strategy (mean a pair of design and estimator) given at (4) with the conventional strategy given at (1), denoted respectively by H 2 (srswor MS sampling,, Y Rd ) and H 1 (srswor srswor, Y Rd ). For the comparison of two strategies H 1 (p 1, Y 1) and H 2 (p 2, Y 2) we have computed the percentage gain in efficiency of H 2 over H 1 as Gain (H 2, H 1 ) = ( V p 1 (Y 1) 1) 100% V p2 (Y 2)

On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling 25 The following table presents gain due to MS sampling scheme over srswor in twophase sampling. Data Set I II III Gain(H 2, H 1 ) 33% -4% 26% Remark. We have compared empirically Des Raj Strategy, Srivenkataramana and Tracy strategy and suggested strategy given at (2), (3) and (4) with the conventional strategy given at (1), for many real data. It has been observed that for most for the populations considered under simulation study Des Raj strategy performed very well. For sake of brevity the simulated results are not presented here. 5. CONCLUSION It can be concluded that there may be situations in which it is considered important to select the sample units with probability proportionate to some measure of size x, information on which is not readily available but could be collected at moderate cost for a fairly large sample and if the population mean Z of another auxiliary variable z closely related to x but compare to x remotely related to y is available then incorporating this extra auxiliary information at the estimation stage improves the convention estimator. However, pps without replacement schemes for general n are not easy to implement but the MS scheme is easy to execute. REFERENCES [1] Chand, L. (1975). Some ratio-type estimator based on two or more auxiliary variables. Unpublished Ph. D. dissertation, Iowa State University, Iowa. [2] Cochran, W. G. (1977). Sampling Techniques 3 rd edn (New York, John Wiley) [3] Fisher, R. A. (1936). The use of multiple measurements in taxonomic problem, Annals of Eugenics, 7, 179-188. [4] Patel, J.S. and Patel, P.A. (2008). A Monte Carlo comparison of some estimators of finite population Total under Midzuno sampling, Journal of the Indian Statistical Association, 46, 2, 141-153. [5] Patel, J.S. and Patel, P.A. (2010a). On Non-negative and Improved variance estimation for ratio estimator under Midzuno-Sen sampling scheme, Statistics in Transition-new series, 10(3), 371-385 [6] Patel, J.S. and Patel, P.A. (2010b). Comparison of some variance estimators of the ratio estimator in presence of two auxiliary variables, Interstat.

26 P. A. Patel and Shraddha Bhatt [7] Kiregyera, B. (1980). A chain ratio-type estimator in finite population double sampling using two auxiliary variables, Metrika 27, 217-23. [8] Kiregyera, B. (1984). Regression-type estimators using two auxiliary variables and model of double sampling from finite populations, Metrika 31, 215-26. [9] Midzuno, H.(1952). On the sampling system with probability proportionate to sum of sizes. Ann. Inst. Statist. Math., Tokyo, 3, 99-107. [10] Murthy, M. N. (1967) Sampling Theory and Methods, Calcutta: Statistical Publishing Society. [11] Raj, D. (1964). On double sampling for pps estimation, Ann. Math. Statist. 35 (2), 900--902. [12] Raj, D. (1965). On sampling over two occasions with probability proportional to size. Ann. Math. Statisti. 36, 327-30 [13] Rao, J.N.K., Hartley, H.O. and Cochran, W.G. (1962). On a simple procedure ofunequal probability sampling without replacement. J. Roy. Statist. Soc. B24, 482-491. [14] Rao, J.N.K. and Vijayan, K. (1977). On estimating the variance in sampling with probability proportional to aggregate size. Journal of the American Statistical Association, 72, 579-584. [15] Sen, A. R. (1953). On estimate of the variance in sampling with varying probabilities, J. Ind. Soc. Statist., 7, 119-127 [16] Srivastava, R. S., Srivastava, S. R., Khare, B. B. (1990). A generalized chain ratio estimator for mean of finite population. J. Ind. Soc. Agricult. Statist. 42, 108-17 [17] Srivenkataramana, T. and Tracy, D. S. (1989). Two-phase sampling for selection with probability proportional to size in sample surveys. Biometrika 76 (4), 818-21