VOL. 3, NO.6, June 2013 ISSN ARPN Journal of Science and Technology All rights reserved.

Similar documents
SAMPLING III BIOS 662

Predictor Corrector Methods of High Order for Numerical Integration of Initial Value Problems

Sampling from Finite Populations Jill M. Montaquila and Graham Kalton Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A.

The Effect of Level of Significance (α) on the Performance of Hotelling-T 2 Control Chart

SAMPLING BIOS 662. Michael G. Hudgens, Ph.D. mhudgens :55. BIOS Sampling

ANALYSIS OF SURVEY DATA USING SPSS

O.O. DAWODU & A.A. Adewara Department of Statistics, University of Ilorin, Ilorin, Nigeria

Development and Implementation of an Empirical Model on Road Traffic Accidents.

Sample size and Sampling strategy

Unequal Probability Designs

OPTIMAL CONTROLLED SAMPLING DESIGNS

Module 4 Approaches to Sampling. Georgia Kayser, PhD The Water Institute

Sampling: A Brief Review. Workshop on Respondent-driven Sampling Analyst Software

Georgia Kayser, PhD. Module 4 Approaches to Sampling. Hello and Welcome to Monitoring Evaluation and Learning: Approaches to Sampling.

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures

Indirect Sampling in Case of Asymmetrical Link Structures

Fundamentals of Applied Sampling

Figure Figure

SAS/STAT 13.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures

INTERVAL ESTIMATION OF THE DIFFERENCE BETWEEN TWO POPULATION PARAMETERS

Part 7: Glossary Overview

BIOSTATISTICS. Lecture 4 Sampling and Sampling Distribution. dr. Petr Nazarov

The New sampling Procedure for Unequal Probability Sampling of Sample Size 2.

Sampling and Estimation in Agricultural Surveys

Lectures of STA 231: Biostatistics

Sampling. General introduction to sampling methods in epidemiology and some applications to food microbiology study October Hanoi

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM

Bayesian Estimation Under Informative Sampling with Unattenuated Dependence

STA304H1F/1003HF Summer 2015: Lecture 11

SAMPLE SIZE ESTIMATION FOR MONITORING AND EVALUATION: LECTURE NOTES

Sampling. Jian Pei School of Computing Science Simon Fraser University

Non-uniform coverage estimators for distance sampling

TECH 646 Analysis of Research in Industry and Technology

Diurnal and Seasonal Variation of Surface Refractivity in Minna and Lapai, North Central Nigeria

Sampling. Module II Chapter 3

Day 8: Sampling. Daniel J. Mallinson. School of Public Affairs Penn State Harrisburg PADM-HADM 503

Ethiopian Journal of Environmental Studies and Management Vol. 6 Supplement 2013

What is Survey Weighting? Chris Skinner University of Southampton

Sampling: What you don t know can hurt you. Juan Muñoz

Statistics for Managers Using Microsoft Excel 5th Edition

Classification Exercise UCSB July 2006

The Use of Survey Weights in Regression Modelling

Longitudinal Modeling with Logistic Regression

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2012

We would like to describe this population. Central tendency (mean) Variability (standard deviation) ( X X ) 2 N

Spatial Variation in Hospitalizations for Cardiometabolic Ambulatory Care Sensitive Conditions Across Canada

Application of Statistical Analysis in Population and Sampling Population

SAS/STAT 14.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR

ICES training Course on Design and Analysis of Statistically Sound Catch Sampling Programmes

From the help desk: It s all about the sampling

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

Survey of Smoking Behavior. Survey of Smoking Behavior. Survey of Smoking Behavior

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2013

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS

Trip Generation Model Development for Albany

Nonparametric meta-analysis for diagnostic accuracy studies Antonia Zapf

Supplementary Material

Study on Method of Mass Communication Research 传播研究方法 (6) Dr. Yi Mou 牟怡

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication,

The ESS Sample Design Data File (SDDF)

Business Statistics: A First Course

On Efficiency of Midzuno-Sen Strategy under Two-phase Sampling

Sampling Theory and Methods

Sampling : Error and bias

Statistics in medicine

Eric V. Slud, Census Bureau & Univ. of Maryland Mathematics Department, University of Maryland, College Park MD 20742

VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA

Part I. Sampling design. Overview. INFOWO Lecture M6: Sampling design and Experiments. Outline. Sampling design Experiments.

C. J. Skinner Cross-classified sampling: some estimation theory

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy.

On Consistency of Tests for Stationarity in Autoregressive and Moving Average Models of Different Orders

SAMPLING- Method of Psychology. By- Mrs Neelam Rathee, Dept of Psychology. PGGCG-11, Chandigarh.

Facilitating Survey Sampling in Low- and Middle-Income Contexts Through Satellite Maps and Basic Mobile Computing Technology

SAMPLING TECHNIQUES INTRODUCTION

Multiple Regression of Students Performance Using forward Selection Procedure, Backward Elimination and Stepwise Procedure

HISTORICAL PERSPECTIVE OF SURVEY SAMPLING

CRISP: Capture-Recapture Interactive Simulation Package

CLUSTER EFFECTS AND SIMULTANEITY IN MULTILEVEL MODELS

FINAL EXAM STAT 5201 Fall 2016

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model

Sampling. February 24, Multilevel analysis and multistage samples

Model Assisted Survey Sampling

Introduction to Survey Sampling

Inference of Adaptive methods for Multi-Stage skew-t Simulated Data

Cluster Sampling 2. Chapter Introduction

Sampling Concepts. IUFRO-SPDC Snowbird, UT September 29 Oct 3, 2014 Drs. Rolfe Leary and John A. Kershaw, Jr.

FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E

11 CHI-SQUARED Introduction. Objectives. How random are your numbers? After studying this chapter you should

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Modi ed Ratio Estimators in Strati ed Random Sampling

Stat472/572 Sampling: Theory and Practice Instructor: Yan Lu

International Journal of Pure and Applied Mathematics Volume 21 No , THE VARIANCE OF SAMPLE VARIANCE FROM A FINITE POPULATION

Estimating Bounded Population Total Using Linear Regression in the Presence of Supporting Information

3. When a researcher wants to identify particular types of cases for in-depth investigation; purpose less to generalize to larger population than to g

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics

UNIVERSITY OF TORONTO MISSISSAUGA. SOC222 Measuring Society In-Class Test. November 11, 2011 Duration 11:15a.m. 13 :00p.m.

Discussing Effects of Different MAR-Settings

SAMPLING II BIOS 662. Michael G. Hudgens, Ph.D. mhudgens :37. BIOS Sampling II

Transcription:

VOL., NO.6, June 0 ISSN -77 0-0. All rights reserved. New Estimation Method in Two-Stage Cluster Sampling Using Finite n D. I. Lanlege, O. M. Adetutu, L.A. Nafiu Department of Mathematics and Computer Science, Ibrahim Badamasi Babangida University, Lapai, Niger State, Nigeria, Department of Mathematics and Statistics, Federal University of Technology, Minna, Niger State, Nigeria ABSTRACT This research investigates the use of a two-stage cluster sampling design in estimating the population total. We focus on a special design where certain number of visits is being considered for estimating the population size and a weighted factor of / is introduced. In particular, attempt was made at deriving a new method for a three-stage sampling design. In this study, we compared the newly proposed estimator with some of the existing estimators in a two-stage sampling design. Eight (8) data sets were used to justify this paper. The first four (4) data sets were obtained from Horvitz and Thompson (), Raj (7), Cochran (7 7) and Okafor (00) respectively while the second four (4) data sets represent the number of diabetic patients in Niger state, Nigeria for the years 00, 006, 007 and 008 respectively. The computation was done with software developed in Microsoft Visual C++ programming language. The population totals were obtained for illustrated data and life data. We obtained the biases of the estimated population totals for illustrated and life data respectively. For all the populations considered, the biases of our proposed estimator are the least among all estimators compared. The variance of the newly proposed method is less than the variances of those of the existing methods. All the estimated population totals are also found to fall within the computed confidence intervals for α = %. The coefficients of variations obtained for the estimated population totals using both illustrated and life data show that our newly proposed estimator has the least coefficient of variation. Therefore, our newly proposed estimator ( considering a two-stage cluster sampling design. Keywords: Sampling, cluster, two-stage, design, finite population, estimator, bias and variance.. INTRODUCTION In a census, each unit (such as person, household or local government area) is enumerated, whereas in a sample survey, only a sample of units is enumerated and information provided by the sample is used to make estimates relating to all units []. In designing a study, it can be advantageous to sample units in more than onestage. The criteria for selecting a unit at a given stage typically depend on attributes observed in the previous stages []. Recent work on two-stage sampling includes those of [] and [4]. ) is recommended when In [], comparison between two-stage cluster sampling and simple random sampling was made; and observed that two-stage cluster sampling is better in terms of efficiency. Also, [6] gives the reason for multistage sampling as administrative convenience. The work of [7] states that multistage sampling makes fieldwork and supervision relatively easy. Multistage sampling is more efficient than single stage cluster sampling ([8]; [4] and []). It is concluded in [0] that sub sampling has a great variety of applications. Figure shows specifically a schematic representation of a two-stage sample in which two units are selected from each cluster as given by [0]. Fig : Schematic Representation of a Two-Stage Cluster Sampling Design Variability in two-stage sampling includes the following: a. In one-stage cluster sampling, the estimate varies due to one source: different samples of primary units yield different estimates. b. In two-stage cluster sampling, the estimate varies due to two sources: different samples of primary units and then different samples of secondary units within primary units.. AIM AND OBJECTIVES The aim of this research is to model a new estimator for two-stage sampling design and the main objectives are to: 78

VOL., NO.6, June 0 ISSN -77 0-0. All rights reserved. a. Investigate some of the existing estimators used in two-stage cluster sampling design and compare them in terms of efficiency and administrative convenience. b. Develop new estimator that is more efficient and precise than already existing estimators in twostage cluster sampling design. c. Compare these estimators ( conventional and newly proposed) using eight (8) data sets.. DATA USED FOR THIS STUDY There are eight (8) categories of data used in this paper. The first four (4) data sets were obtained and used as illustration as contained in []. The second four (4) data sets used are of secondary type and were collected from [] and []. We constructed a sampling frame from all diabetic patients with chronic eye disease (Glaucoma and Retinopathy) in the twenty-five () Local Government Areas of the state between years 00 and 008. 4. MATERIALS AND METHODS Let N denote the number of primary units in the population and n the number of primary units in the sample. Let M be the number of secondary units in the i primary unit. The total number of secondary units in the population is M N M i i () where > is the symbol to represent all stages of sampling after the first. The expression (4) may be written into three components as: V( h) V( h))) V( h))) V( h))) In line with [4] and [7], we now present our proposed estimator for two-stage cluster sampling design. Here, a sample of primary unit is selected and then a sample of secondary units is chosen from each of the selected primary units. For example, the total number of individuals who are involved in the treatment of diabetes at hospitals in a state can be estimated by selecting a sample of Hospitals in the Cities within the Local Government Areas in the state and then collect the number of diabetic patients in each sampled hospital. Let be the number of primary units (Cities) in the population. For, let be the size of a population of secondary units (Hospitals) in the primary unit engaged in the treatment of diabetes within the state. Assuming that each secondary unit engages in the variable of interest at only one primary unit, then (6) () Let y ij denote the value of the variable of interest of the jth secondary unit in the it primary unit. The total of the y-values in the it primary unit is M i Y i y ij j Accordingly, the population total for overall sample in a two-stage is given as Y N M i i j y ij For any estimation h in the hath cell based on completely arbitrary probabilities of selection, the total variance is then the sum of the variances for all strata. The symbol E is used for the operator of expectation, V for the variance, and V for the unbiased estimate of V. We may then write V ( h ) V ( h )) V ( h )) () () (4) Let be the number of primary units sampled without replacement, the number of secondary units in the sampled primary unit and the number of secondary units selected without replacement from the sampled primary unit for. An unbiased estimator of the total population at the primary unit in the sample is: where is the known sampling fraction in the (7) primary unit for and denotes the number of secondary units in the sample from the primary unit who engage in the variable of interest. total is: A proposed unbiased estimator of the population (8) 7

VOL., NO.6, June 0 ISSN -77 0-0. All rights reserved. Where () and (0) Theorem : is unbiased for the population total Y Proof: We know that the expectation of is conditional on a sample of primary units equals, that is; () Therefore; ( ) To obtain the expected value of over all possible samples of primary units, let such that (), the inclusion probability for a primary units under simple random sampling. where and for (4) () (6) is: Then, the expectation of the proposed estimator The first term on the right of the equality in equation (4) is the variance that would be obtained if every secondary units in a selected primary unit were observed, that is, if the s were known for. The second term contains variance due to estimating s from a subsample of secondary units within the selected primary units. An unbiased estimator of the variance given in equation (4) is: () This implies that the proposed estimator is unbiased. Hence, the variance of the newly proposed estimator of the population total is derived as follows: We use (7) where (8) and for () Theorem : is unbiased for We note that (0) 80

VOL., NO.6, June 0 ISSN -77 0-0. All rights reserved. Next we note that Hence, is an unbiased sample estimator of the proposed estimator ( ) in two-stage cluster sampling design. In addition, we also note that Together equations (0), () and () imply () (). RESULTS. Estimated n Totals The estimated population totals computed with the aid of software developed are given in table for the illustrated data and in table for the life data respectively. Table : Estimated n Totals for Illustrated Data Estimator Case I Case II Case III Case IV 48 0,76 8 4,0 40,6 0,0 8,867 4,8 84 4,74,886 4,77 40,0 40 8, 7 4,7 47 7,64 4,8 47 6,76 0,07 Finally, we note that () Table : Estimated n Totals for Life Data Estimator POP POP POP POP 4 8,00 8,60,40,46 4,6 6,0 6,60 8, 6, 6,60 8,664,,804 7, 8, 8,7 6,8 6,4 7,8 7,64 7, 8,64,40, 4, 6,8 8,,04,84 6,67 7,04,00 Together, equations () and (4) become That is; (4). Biases for the Estimated n Totals We consider the biases arising from estimated population totals using a two-stage cluster sampling design and the values presented in table for illustrated data and in table 4 for life data respectively. Table : Biases of Estimated n Totals for Illustrated Data Estimator Case I Case II Case III Case IV 0 87 7 6 0 8 0 6 8 4 0 4 6 66 48 08 4 6 6 486 666 8

VOL., NO.6, June 0 ISSN -77 0-0. All rights reserved. 4 Table 4: Biases of Estimated n Totals for Life Data Estimator POP POP POP POP 4 8 6 0 48 48 6 8 8 46 70 7 0 08 47 0 87 4 07 0. Estimated Variances for the Estimated n Totals The estimated variances computed using the software developed are also given in table for illustrated data and in table 6 for life data respectively. Table : Variances Of The Estimated n Totals For Illustrated Data 4 0,60.860 6,674.880,477.7 0,. 6 0,. 7 0,807.08 7 0,8.86,70. 8.4 Standard Errors for the Estimated n Totals Tables 7 and 8 give the standard errors of the estimated population totals using a two-stage cluster sampling design. Table 7: Standard Errors for Estimated n Totals for Illustrated Data Estimator Case I Case II Case Case IV III 44.,76.68.8484 46.6 40.746,6.7.8076 4. 8.046,48.867.6 7.8 7.6776,4.68.4 6.44 7.48,4.46.40 6.87.4408,6.67..777.488,.6.0 4.7 0.8,.08.0.4 Estimato r Case I Case II Case Case IV III,47.0,8,08.4 8.46 60,87.67,660.0,66,00.4.67,40.600,48.60,8,7.00.8 6,46.6,4.60,,76.76.80,0.4,4.706,46,00.00.800,80.4 0,0.40 8,760,0.46 4.674,0.8 8.40,70,40.04.7 4,86.404.600,67,06.0. 4,077.4 Table 6: Variances of the Estimated n Totals for Life Data Estimator n n n n 4,404.0 8 8,484.60 7,066.78 6,8. 4 7,460.6 8,0.476 0,70.44 6,777.84 7 4,0.44 6,.0,770.6 6 4,74.8 6,8.0 4,07.70,66.0,77.7 8,0.8,8.48 6,.67 4,.00 8 0,8.40,7.76,70.86,40.07 Table 8: Standard Errors for Estimated n Totals for Life Data Estimato r POP POP POP POP 4.0.68.6.774.86 4.4 8.0..07 7.480 07.08.67.408 4.44 06.7 7.76 0.4.6 04.86 6.648 04.0788.700 0.67.80 0.06 08.004 0.66 0. 00.64 0.77.464 08.860. Confidence Intervals for the Estimated n Totals The confidence intervals for estimated population totals are presented in table for illustrated data and in table 0 for life data. Table : Confidence Intervals for Estimated n Totals for Illustrated Data Estimator Case I Case II Case Case IV III (4,6 8) (00, 0760) (4,4) (6,4 7) (70, 0) (66, 06) (47,4) (468, 8) 8

VOL., NO.6, June 0 ISSN -77 0-0. All rights reserved. (7,46 7) (46,4 ) (6,0 ) (4,46 ) (40, 4) (8,47 6) (8, 00) (76, 44) (88, 44) (6, 0) (46, 8) (47, ) (,) (47, 04) (6,) (4,4 4) (7,4) (4747, 67) (4, (,4 87) (4,47) (0,4 448) (8,) (477, 64) Table 0: Confidence Intervals of Estimated n Totals for Life Data Estimato r n (7740,8 80) (00,44 0) (60,67 0) (80,60 0) (680,6 0) (70,7 (400,44 0) (640,60 n (840,88 80) (640,67 60) (660,68 0) (700,7 (60,66 0) (840,88 70) (60,67 0) (6400,68 0) n (80,6 60) (670,68 (840,88 70) (80,87 (70,77 (00,6 00) (80,8 0) (700,74 00) Popula tion 4 (70, 680) (7870, 880) (880, 460) (860, 00) (740, 7880) (00, 760) (880, 60) (00, 0) Table : Coefficients of Variation for Life Data Estimator POP POP POP POP 4 0.% 0.48% 0.46% 0.44% 0.% 0.% 0.4% 0.46% 0.46% 0.46% 0.% 0.4% 0.44% 0.4% 0.4% 0.4% 0.4% 0.4% 0.40% 0.4% 0.% 0.7% 0.% 0.% 0.4% 0.4% 0.8% 0.8% 0.8% 0.% 0.7% 0.7% 6. DISCUSSION The population totals obtained for illustrated data are given in table while the population totals obtained for life data are given in table. Table give the biases of the estimated population totals for illustrated data for our own estimator as, 4, and for cases I IV respectively while table 4 gives those of the four life data sets as 4, 07, 0 and respectively. This implies that our own estimator has the least biases using both data sets. Table shows the variances obtained using illustrated data for our own estimator as.600, 6706.0,. and 4077.4 for cases I IV respectively while table 6 shows those of life data sets as 0.7, 0807.087, 08.86 and 70.8 respectively meaning that our own estimator has the least variances using both data sets..6 Coefficients of Variation for the Estimated n Totals Coefficients of Variations for the estimated population totals are given in table for illustrated data and in table for life data. Table : Coefficients of Variation for Illustrated Data Estimator Case I Case II Case III Case IV.8%.66% 4.86%.64%.0%.64%.6%.6%.8%.7% 4.%.6%.8% 0.8% 4.6%.0% 8.7%.%.6%.0% 8.0%.4%.0%.64% 6.67% 0.6%.%.67% 7.6%.%.70%.46% Table 7 shows the obtained standard errors for the estimated population totals using illustrated data for our own estimator as 0.8,.08,.0 and.4 for cases I IV respectively while table 8 shows those of life data sets as 00.64, 0.77,.464 and 08.860 respectively meaning that our own estimator has the least standard errors using both data sets. The confidence intervals of the estimated populations in table are given in using α = %. The confidence intervals of the estimated populations in table are also given in table 0 for the same confidence intervals. Tables and 0 show that all the estimated population totals fall within the computed intervals as expected. For our own estimator, table gives the coefficients of variations for the estimated population totals using illustrated data as 7.6%,.%,.7% and.46% for cases I IV respectively while table gives that of life data sets as 0.8%, 0.%, 0.7% and 0.7% respectively which means that our newly proposed twostage cluster estimator has the least coefficient of variation. 8

VOL., NO.6, June 0 ISSN -77 0-0. All rights reserved. 7. CONCLUSION When an unbiased estimator of high precision and an unbiased sample estimate of its variance is required for a two-stage sampling design, estimator is preferred and hence recommended for estimating population totals. REFERENCES [] Kish, L. (67). Survey Sampling. New York: John Wiley and Sons. [] Kuk, A. (88). Estimation of Distribution Functions and Medians under Sampling with Unequal Probabilities. Biometrika, 7: 7-0. [] Goldstein, H. (). Multilevel Statistical Models. New York: Halstead Press. [4] Tate, J.E. and M. G. Hudgens (007). Estimating n Size with Two-stage and Three-stage sampling Designs. American Journal of Epidemiology, 6(): 4-0. [] Fink, A. (00). How To Sample In Surveys. Thousand Oaks, C.A.: Sage Publications. [6] Kalton, G. (8). Introduction to Survey Sampling. C.A.: Sage Publications. [7] Okafor, F. (00). Sample Survey Theory with Applications. Nigeria: Afro-Orbis Publications. [8] Henry, G. T. (0). Practical Sampling. Thousand Oaks, C.A.: Sage Publications. [] Nafiu, L. A., Oshungade, I. O. and Adewara, A. A., (0): Alternative Estimation Method for a Three-Stage Cluster Sampling in Finite n. American Journal of Mathematics and Statistics (AJMS). Vol.. No. 6: -7. [0] Cochran, W.G. (77). Sampling Techniques. Third Edition. New York: Wiley and Sons. [] Nafiu, L. A. (0). An Alternative Estimation Method for Multistage Cluster Sampling in Finite n, Ph.D. thesis. University of Ilorin, Nigeria. [] Niger State (007). Niger State Statistical Year Book. Nigeria: Niger Press Printing & Publishing. [] National Bureau of Statistics (007). Directory of Health Establishments in Nigeria. Nigeria: Abuja Printing Press. [4] Thompson, S. K. (). Sampling. New York: John Wiley and Sons. AUTHORS PROFILES Mr. D. I. Lanlege received the degree in Mathematics from University of Ado-Ekiti, Ado-Ekiti, Ekiti State, Nigeria. He is a research student of Federal University of Technology, Minna, Niger State, Nigeria. Currently, he is an Assistant Lecturer at the Ibrahim Badamasi Babangida University, Lapai, Niger State, Nigeria. Mr. O. M. Adetutu received the degree in Statistics from University of Ilorin, Ilorin, Kwara State, Nigeria. He is currently an Assistant Lecturer at the Federal University of Technology, Minna, Niger State, Nigeria. Dr. L. A. Nafiu received the degree in Statistics from University of Ilorin, Ilorin, Kwara State, Nigeria. Currently, he is a Lecturer I at the Federal University of Technology, Minna, Niger State, Nigeria. 84