VOL. 3, NO.6, June 2013 ISSN ARPN Journal of Science and Technology All rights reserved.

New Estimation Method in Two-Stage Cluster Sampling Using Finite Population

D. I. Lanlege, O. M. Adetutu, L. A. Nafiu

Department of Mathematics and Computer Science, Ibrahim Badamasi Babangida University, Lapai, Niger State, Nigeria
Department of Mathematics and Statistics, Federal University of Technology, Minna, Niger State, Nigeria

ABSTRACT
This research investigates the use of a two-stage cluster sampling design in estimating the population total. We focus on a special design in which a certain number of visits is considered for estimating the population size and a weighted factor of / is introduced. In particular, an attempt was made at deriving a new method for a three-stage sampling design. In this study, we compared the newly proposed estimator with some of the existing estimators in a two-stage sampling design. Eight (8) data sets were used to support this paper. The first four (4) data sets were obtained from Horvitz and Thompson (), Raj (7), Cochran (77) and Okafor (00) respectively, while the second four (4) data sets represent the number of diabetic patients in Niger State, Nigeria, for the years 00, 006, 007 and 008 respectively. The computation was done with software developed in the Microsoft Visual C++ programming language. The population totals were obtained for the illustrated data and the life data, and the biases of the estimated population totals were computed for both. For all the populations considered, the biases of our proposed estimator are the least among all the estimators compared. The variance of the newly proposed estimator is smaller than the variances of the existing estimators. All the estimated population totals are also found to fall within the computed confidence intervals for α = %. The coefficients of variation obtained for the estimated population totals using both the illustrated and the life data show that our newly proposed estimator has the least coefficient of variation.
Therefore, our newly proposed estimator is recommended when considering a two-stage cluster sampling design.

Keywords: Sampling, cluster, two-stage, design, finite population, estimator, bias and variance.

1. INTRODUCTION
In a census, each unit (such as a person, household or local government area) is enumerated, whereas in a sample survey only a sample of units is enumerated, and the information provided by the sample is used to make estimates relating to all units []. In designing a study, it can be advantageous to sample units in more than one stage. The criteria for selecting a unit at a given stage typically depend on attributes observed in the previous stages []. Recent work on two-stage sampling includes that of [] and [4]. In [], a comparison between two-stage cluster sampling and simple random sampling was made, and it was observed that two-stage cluster sampling is better in terms of efficiency. Also, [6] gives administrative convenience as the reason for multistage sampling. The work of [7] states that multistage sampling makes fieldwork and supervision relatively easy. Multistage sampling is more efficient than single-stage cluster sampling ([8]; [4] and []). It is concluded in [10] that subsampling has a great variety of applications. Figure 1 shows a schematic representation of a two-stage sample in which two units are selected from each cluster, as given by [10].

Fig. 1: Schematic Representation of a Two-Stage Cluster Sampling Design

The sources of variability in cluster sampling are as follows:
a. In one-stage cluster sampling, the estimate varies due to one source: different samples of primary units yield different estimates.
b. In two-stage cluster sampling, the estimate varies due to two sources: different samples of primary units, and then different samples of secondary units within primary units.

2. AIM AND OBJECTIVES
The aim of this research is to model a new estimator for a two-stage sampling design, and the main objectives are to:

a. Investigate some of the existing estimators used in two-stage cluster sampling design and compare them in terms of efficiency and administrative convenience.
b. Develop a new estimator that is more efficient and precise than the existing estimators in two-stage cluster sampling design.
c. Compare these estimators (conventional and newly proposed) using eight (8) data sets.

3. DATA USED FOR THIS STUDY
There are eight (8) categories of data used in this paper. The first four (4) data sets were obtained from [] and used as illustration. The second four (4) data sets are of secondary type and were collected from [] and []. We constructed a sampling frame from all diabetic patients with chronic eye disease (glaucoma and retinopathy) in the twenty-five (25) Local Government Areas of the state between the years 00 and 008.

4. MATERIALS AND METHODS
Let $N$ denote the number of primary units in the population and $n$ the number of primary units in the sample. Let $M_i$ be the number of secondary units in the $i$th primary unit. The total number of secondary units in the population is

$$M = \sum_{i=1}^{N} M_i \qquad (1)$$

Let $y_{ij}$ denote the value of the variable of interest for the $j$th secondary unit in the $i$th primary unit. The total of the $y$-values in the $i$th primary unit is

$$Y_i = \sum_{j=1}^{M_i} y_{ij} \qquad (2)$$

Accordingly, the population total in a two-stage design is given as

$$Y = \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij} \qquad (3)$$

For any estimator $\hat{y}_h$ in the $h$th cell based on completely arbitrary probabilities of selection, the total variance is the sum of the variances for all strata. The symbol $E$ is used for the operator of expectation, $V$ for the variance, and $\hat{V}$ for the unbiased estimate of $V$. We may then write

$$V(\hat{y}_h) = V_1\big(E_{>1}(\hat{y}_h)\big) + E_1\big(V_{>1}(\hat{y}_h)\big) \qquad (4)$$

where $>1$ is the symbol that represents all stages of sampling after the first. The expression (4) may be written in three components as:

$$V(\hat{y}_h) = V_1\big(E_2(E_{>2}(\hat{y}_h))\big) + E_1\big(V_2(E_{>2}(\hat{y}_h))\big) + E_1\big(E_2(V_{>2}(\hat{y}_h))\big) \qquad (5)$$

In line with [4] and [7], we now present our proposed estimator for two-stage cluster sampling design. Here, a sample of primary units is selected and then a sample of secondary units is chosen from each of the selected primary units. For example, the total number of individuals who are involved in the treatment of diabetes at hospitals in a state can be estimated by selecting a sample of hospitals in the cities within the Local Government Areas in the state and then collecting the number of diabetic patients in each sampled hospital.

Let $N$ be the number of primary units (cities) in the population. For $i = 1, \ldots, N$, let $M_i$ be the size of the population of secondary units (hospitals) in the $i$th primary unit engaged in the treatment of diabetes within the state. Assuming that each secondary unit engages in the variable of interest at only one primary unit, then

$$M = \sum_{i=1}^{N} M_i \qquad (6)$$

Let $n$ be the number of primary units sampled without replacement, $M_i$ the number of secondary units in the $i$th sampled primary unit and $m_i$ the number of secondary units selected without replacement from the $i$th sampled primary unit, for $i = 1, \ldots, n$. An unbiased estimator of the population total at the $i$th primary unit in the sample is expressed in terms of the known sampling fraction in the $i$th primary unit, for $i = 1, \ldots, n$, and the number of secondary units in the sample from the $i$th primary unit that engage in the variable of interest:

(7)

A proposed unbiased estimator of the population total is:

(8)

where

(9)

and

(10)

Theorem 1: The proposed estimator is unbiased for the population total $Y$.

Proof: The expectation conditional on a sample of primary units is known, that is;

(11)

Therefore;

(12)

To obtain the expected value over all possible samples of primary units, a variable is defined for each primary unit such that

(13)

is the inclusion probability for a primary unit under simple random sampling. Taking the expectation of the proposed estimator over all possible samples of primary units then shows that the proposed estimator is unbiased.

Hence, the variance of the newly proposed estimator of the population total is derived as follows:

(14)

where

(15)

and

(16)

The first term on the right of the equality in equation (14) is the variance that would be obtained if every secondary unit in a selected primary unit were observed, that is, if the primary-unit totals were known for the sampled primary units. The second term contains the variance due to estimating these totals from a subsample of secondary units within the selected primary units. An unbiased estimator of the variance given in equation (14) is:

(17)

where

(18)

and

(19)

Theorem 2: The variance estimator in equation (17) is unbiased for the variance in equation (14).

Proof: We note that

(20)

Next, we note that

(21)

In addition, we also note that

(22)

Together, equations (20), (21) and (22) imply

(23)

Finally, we note that

(24)

Together, equations (23) and (24) give the required result. Hence, the variance estimator is an unbiased sample estimator of the variance of the proposed estimator in two-stage cluster sampling design.

5. RESULTS

5.1 Estimated Population Totals
The estimated population totals, computed with the aid of the software developed, are given in Table 1 for the illustrated data and in Table 2 for the life data.

Table 1: Estimated Population Totals for Illustrated Data
Estimator   Case I   Case II   Case III   Case IV
48 0,76 8 4,0 40,6 0,0 8,867 4,8 84 4,74,886 4,77 40,0 40 8, 7 4,7 47 7,64 4,8 47 6,76 0,07

Table 2: Estimated Population Totals for Life Data
Estimator   POP 1   POP 2   POP 3   POP 4
8,00 8,60,40,46 4,6 6,0 6,60 8, 6, 6,60 8,664,,804 7, 8, 8,7 6,8 6,4 7,8 7,64 7, 8,64,40, 4, 6,8 8,,04,84 6,67 7,04,00

5.2 Biases for the Estimated Population Totals
We consider the biases arising from the estimated population totals using a two-stage cluster sampling design. The values are presented in Table 3 for the illustrated data and in Table 4 for the life data.

Table 3: Biases of Estimated Population Totals for Illustrated Data
Estimator   Case I   Case II   Case III   Case IV
0 87 7 6 0 8 0 6 8 4 0 4 6 66 48 08 4 6 6 486 666

4

Table 4: Biases of Estimated Population Totals for Life Data
Estimator   POP 1   POP 2   POP 3   POP 4
8 6 0 48 48 6 8 8 46 70 7 0 08 47 0 87 4 07 0

5.3 Estimated Variances for the Estimated Population Totals
The estimated variances computed using the software developed are given in Table 5 for the illustrated data and in Table 6 for the life data.

Table 5: Variances of the Estimated Population Totals for Illustrated Data
Estimator   Case I   Case II   Case III   Case IV
,47.0,8,08.4 8.46 60,87.67,660.0,66,00.4.67,40.600,48.60,8,7.00.8 6,46.6,4.60,,76.76.80,0.4,4.706,46,00.00.800,80.4 0,0.40 8,760,0.46 4.674,0.8 8.40,70,40.04.7 4,86.404.600,67,06.0. 4,077.4

Table 6: Variances of the Estimated Population Totals for Life Data
Estimator   Population 1   Population 2   Population 3   Population 4
,404.0 8 8,484.60 7,066.78 6,8. 4 7,460.6 8,0.476 0,70.44 6,777.84 7 4,0.44 6,.0,770.6 6 4,74.8 6,8.0 4,07.70,66.0,77.7 8,0.8,8.48 6,.67 4,.00 8 0,8.40,7.76,70.86,40.07 4 0,60.860 6,674.880,477.7 0,. 6 0,. 7 0,807.08 7 0,8.86,70. 8

5.4 Standard Errors for the Estimated Population Totals
Tables 7 and 8 give the standard errors of the estimated population totals using a two-stage cluster sampling design.

Table 7: Standard Errors for Estimated Population Totals for Illustrated Data
Estimator   Case I   Case II   Case III   Case IV
44.,76.68.8484 46.6 40.746,6.7.8076 4. 8.046,48.867.6 7.8 7.6776,4.68.4 6.44 7.48,4.46.40 6.87.4408,6.67..777.488,.6.0 4.7 0.8,.08.0.4

Table 8: Standard Errors for Estimated Population Totals for Life Data
Estimator   POP 1   POP 2   POP 3   POP 4
.0.68.6.774.86 4.4 8.0..07 7.480 07.08.67.408 4.44 06.7 7.76 0.4.6 04.86 6.648 04.0788.700 0.67.80 0.06 08.004 0.66 0. 00.64 0.77.464 08.860

5.5 Confidence Intervals for the Estimated Population Totals
The confidence intervals for the estimated population totals are presented in Table 9 for the illustrated data and in Table 10 for the life data.

Table 9: Confidence Intervals for Estimated Population Totals for Illustrated Data
Estimator   Case I   Case II   Case III   Case IV
(4,6 8) (00, 0760) (4,4) (6,4 7) (70, 0) (66, 06) (47,4) (468, 8)

(7,46 7) (46,4 ) (6,0 ) (4,46 ) (40, 4) (8,47 6) (8, 00) (76, 44) (88, 44) (6, 0) (46, 8) (47, ) (,) (47, 04) (6,) (4,4 4) (7,4) (4747, 67) (4, (,4 87) (4,47) (0,4 448) (8,) (477, 64)

Table 10: Confidence Intervals of Estimated Population Totals for Life Data
Population 1: (7740,8 80) (00,44 0) (60,67 0) (80,60 0) (680,6 0) (70,7 (400,44 0) (640,60
Population 2: (840,88 80) (640,67 60) (660,68 0) (700,7 (60,66 0) (840,88 70) (60,67 0) (6400,68 0)
Population 3: (80,6 60) (670,68 (840,88 70) (80,87 (70,77 (00,6 00) (80,8 0) (700,74 00)
Population 4: (70, 680) (7870, 880) (880, 460) (860, 00) (740, 7880) (00, 760) (880, 60) (00, 0)

5.6 Coefficients of Variation for the Estimated Population Totals
Coefficients of variation for the estimated population totals are given in Table 11 for the illustrated data and in Table 12 for the life data.

Table 11: Coefficients of Variation for Illustrated Data
Estimator   Case I   Case II   Case III   Case IV
.8%     .66%    4.86%   .64%
.0%     .64%    .6%     .6%
.8%     .7%     4.%     .6%
.8%     0.8%    4.6%    .0%
8.7%    .%      .6%     .0%
8.0%    .4%     .0%     .64%
6.67%   0.6%    .%      .67%
7.6%    .%      .70%    .46%

Table 12: Coefficients of Variation for Life Data
Estimator   POP 1   POP 2   POP 3   POP 4
0.%     0.48%   0.46%   0.44%
0.%     0.%     0.4%    0.46%
0.46%   0.46%   0.%     0.4%
0.44%   0.4%    0.4%    0.4%
0.4%    0.4%    0.40%   0.4%
0.%     0.7%    0.%     0.%
0.4%    0.4%    0.8%    0.8%
0.8%    0.%     0.7%    0.7%

6. DISCUSSION
The population totals obtained for the illustrated data are given in Table 1, while those obtained for the life data are given in Table 2. Table 3 gives the biases of the estimated population totals for the illustrated data for our own estimator as , 4, and for Cases I to IV respectively, while Table 4 gives those of the four life data sets as 4, 07, 0 and respectively. This implies that our own estimator has the least biases for both data sets. Table 5 shows the variances obtained using the illustrated data for our own estimator as .600, 6706.0, . and 4077.4 for Cases I to IV respectively, while Table 6 shows those of the life data sets as 0.7, 0807.087, 08.86 and 70.8 respectively, meaning that our own estimator has the least variances for both data sets.

Table 7 shows the standard errors obtained for the estimated population totals using the illustrated data for our own estimator as 0.8, .08, .0 and .4 for Cases I to IV respectively, while Table 8 shows those of the life data sets as 00.64, 0.77, .464 and 08.860 respectively, meaning that our own estimator has the least standard errors for both data sets. The confidence intervals of the estimated population totals in Table 1 are given in Table 9 using α = %. The confidence intervals of the estimated population totals in Table 2 are given in Table 10 for the same confidence level. Tables 9 and 10 show that all the estimated population totals fall within the computed intervals, as expected. For our own estimator, Table 11 gives the coefficients of variation for the estimated population totals using the illustrated data as 7.6%, .%, .7% and .46% for Cases I to IV respectively, while Table 12 gives those of the life data sets as 0.8%, 0.%, 0.7% and 0.7% respectively, which means that our newly proposed two-stage cluster estimator has the least coefficient of variation.
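The summary measures compared in the discussion above (variance, standard error, confidence interval and coefficient of variation) can be illustrated with a minimal Python sketch. It does not implement the estimator proposed in this paper. It assumes instead the conventional textbook two-stage estimator of a population total under simple random sampling without replacement at both stages, a normal-approximation interval with z = 1.96, and made-up sample values. The function names `two_stage_total` and `summarize` are hypothetical.

```python
import math
from statistics import variance


def two_stage_total(N, clusters):
    """Conventional two-stage estimate of a population total and its variance.

    N        -- number of primary units in the population
    clusters -- one (M_i, sampled_values) pair per sampled primary unit, where
                M_i is the number of secondary units in that primary unit
    Assumes simple random sampling without replacement at both stages and at
    least two sampled primary units and two sampled values per primary unit.
    """
    n = len(clusters)
    # Expand each subsample to an estimated primary-unit total: (M_i / m_i) * sum(y)
    unit_totals = [M_i * sum(ys) / len(ys) for M_i, ys in clusters]
    total = N / n * sum(unit_totals)

    # Between-primary-unit component of the estimated variance
    between = N * (N - n) * variance(unit_totals) / n
    # Within-primary-unit component (error from subsampling secondary units)
    within = N / n * sum(
        M_i * (M_i - len(ys)) * variance(ys) / len(ys) for M_i, ys in clusters
    )
    return total, between + within


def summarize(total, var, z=1.96):
    """Standard error, normal-approximation interval and CV (%) of an estimate."""
    se = math.sqrt(var)
    return {
        "se": se,
        "ci": (total - z * se, total + z * se),
        "cv_percent": 100 * se / total,
    }


# Hypothetical data: 2 of N = 4 primary units sampled, with subsamples of
# 2 out of 4 and 3 out of 6 secondary units
sample = [(4, [2, 4]), (6, [1, 3, 5])]
total, var = two_stage_total(N=4, clusters=sample)
stats = summarize(total, var)
print(total, var, stats)
```

With these made-up inputs the estimated total is 60 and the estimated variance is 136. Because the coefficient of variation divides the standard error by the estimate, it puts estimators with totals of different magnitudes on a common scale, which is why it is a convenient final criterion when several estimators are compared.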

7. CONCLUSION
When an unbiased estimator of high precision and an unbiased sample estimate of its variance are required for a two-stage sampling design, the proposed estimator is preferred and hence recommended for estimating population totals.

REFERENCES
[1] Kish, L. (67). Survey Sampling. New York: John Wiley and Sons.
[2] Kuk, A. (88). Estimation of Distribution Functions and Medians under Sampling with Unequal Probabilities. Biometrika, 7: 7-0.
[3] Goldstein, H. (). Multilevel Statistical Models. New York: Halstead Press.
[4] Tate, J. E. and Hudgens, M. G. (007). Estimating Population Size with Two-Stage and Three-Stage Sampling Designs. American Journal of Epidemiology, 6(): 4-0.
[5] Fink, A. (00). How to Sample in Surveys. Thousand Oaks, CA: Sage Publications.
[6] Kalton, G. (8). Introduction to Survey Sampling. CA: Sage Publications.
[7] Okafor, F. (00). Sample Survey Theory with Applications. Nigeria: Afro-Orbis Publications.
[8] Henry, G. T. (0). Practical Sampling. Thousand Oaks, CA: Sage Publications.
[9] Nafiu, L. A., Oshungade, I. O. and Adewara, A. A. (0). Alternative Estimation Method for a Three-Stage Cluster Sampling in Finite Population. American Journal of Mathematics and Statistics (AJMS), Vol. , No. 6: -7.
[10] Cochran, W. G. (77). Sampling Techniques. Third Edition. New York: Wiley and Sons.
[11] Nafiu, L. A. (0). An Alternative Estimation Method for Multistage Cluster Sampling in Finite Population. Ph.D. thesis, University of Ilorin, Nigeria.
[12] Niger State (007). Niger State Statistical Year Book. Nigeria: Niger Press Printing & Publishing.
[13] National Bureau of Statistics (007). Directory of Health Establishments in Nigeria. Nigeria: Abuja Printing Press.
[14] Thompson, S. K. (). Sampling. New York: John Wiley and Sons.

AUTHORS' PROFILES
Mr. D. I. Lanlege received the degree in Mathematics from University of Ado-Ekiti, Ado-Ekiti, Ekiti State, Nigeria. He is a research student of the Federal University of Technology, Minna, Niger State, Nigeria. Currently, he is an Assistant Lecturer at the Ibrahim Badamasi Babangida University, Lapai, Niger State, Nigeria.

Mr. O. M. Adetutu received the degree in Statistics from University of Ilorin, Ilorin, Kwara State, Nigeria. He is currently an Assistant Lecturer at the Federal University of Technology, Minna, Niger State, Nigeria.

Dr. L. A. Nafiu received the degree in Statistics from University of Ilorin, Ilorin, Kwara State, Nigeria. Currently, he is a Lecturer I at the Federal University of Technology, Minna, Niger State, Nigeria.