Taking into account sampling design in DAD. Population SAMPLING DESIGN AND DAD

Similar documents
ANALYSIS OF SURVEY DATA USING SPSS

Sample size and Sampling strategy

Design of 2 nd survey in Cambodia WHO workshop on Repeat prevalence surveys in Asia: design and analysis 8-11 Feb 2012 Cambodia

Introduction to Survey Data Analysis

Part 7: Glossary Overview

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2012

Formalizing the Concepts: Simple Random Sampling. Juan Muñoz Kristen Himelein March 2013

New Developments in Econometrics Lecture 9: Stratified Sampling

MN 400: Research Methods. CHAPTER 7 Sample Design

POPULATION AND SAMPLE

Sampling. Sampling. Sampling. Sampling. Population. Sample. Sampling unit

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM

Lesson 6 Population & Sampling

FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E

Sampling and Estimation in Agricultural Surveys

A SAS MACRO FOR ESTIMATING SAMPLING ERRORS OF MEANS AND PROPORTIONS IN THE CONTEXT OF STRATIFIED MULTISTAGE CLUSTER SAMPLING

The ESS Sample Design Data File (SDDF)

Statistics for Managers Using Microsoft Excel 5th Edition

Sampling Techniques. Esra Akdeniz. February 9th, 2016

Sampling in Space and Time. Natural experiment? Analytical Surveys

Sampling: What you don t know can hurt you. Juan Muñoz

Lecture 5: Sampling Methods

Sampling Weights. Pierre Foy

ECON 214 Elements of Statistics for Economists

Sampling Methods and the Central Limit Theorem GOALS. Why Sample the Population? 9/25/17. Dr. Richard Jerz

Now we will define some common sampling plans and discuss their strengths and limitations.

Session 2.1: Terminology, Concepts and Definitions

Explain the role of sampling in the research process Distinguish between probability and nonprobability sampling Understand the factors to consider

Fundamentals of Applied Sampling

Examine characteristics of a sample and make inferences about the population

EC969: Introduction to Survey Methodology

Sampling. Vorasith Sornsrivichai, M.D., FETP Cert. Epidemiology Unit, Faculty of Medicine, PSU

Business Statistics: A First Course

CPT Section D Quantitative Aptitude Chapter 15. Prof. Bharat Koshti

You are allowed 3? sheets of notes and a calculator.

BOOK REVIEW Sampling: Design and Analysis. Sharon L. Lohr. 2nd Edition, International Publication,

Research Methods in Environmental Science

Day 8: Sampling. Daniel J. Mallinson. School of Public Affairs Penn State Harrisburg PADM-HADM 503

Variance estimation for Generalized Entropy and Atkinson indices: the complex survey data case

Field data acquisition

Data Collection: What Is Sampling?

SAMPLING- Method of Psychology. By- Mrs Neelam Rathee, Dept of Psychology. PGGCG-11, Chandigarh.

SAMPLING II BIOS 662. Michael G. Hudgens, Ph.D. mhudgens :37. BIOS Sampling II

Main sampling techniques

SAMPLING III BIOS 662

A4. Methodology Annex: Sampling Design (2008) Methodology Annex: Sampling design 1

ICES training Course on Design and Analysis of Statistically Sound Catch Sampling Programmes

Georgia Kayser, PhD. Module 4 Approaches to Sampling. Hello and Welcome to Monitoring Evaluation and Learning: Approaches to Sampling.

Preparing Spatial Data

SYA 3300 Research Methods and Lab Summer A, 2000

Passing-Bablok Regression for Method Comparison

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures

Q.1 Define Population Ans In statistical investigation the interest usually lies in the assessment of the general magnitude and the study of

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

Model Assisted Survey Sampling

A MODEL-BASED EVALUATION OF SEVERAL WELL-KNOWN VARIANCE ESTIMATORS FOR THE COMBINED RATIO ESTIMATOR

Introduction to Sample Survey

Sampling Scheme for 2003 General Social Survey of China

Sampling approaches in animal health studies. Al Ain November 2017

Stat472/572 Sampling: Theory and Practice Instructor: Yan Lu

What if we want to estimate the mean of w from an SS sample? Let non-overlapping, exhaustive groups, W g : g 1,...G. Random

Module 4 Approaches to Sampling. Georgia Kayser, PhD The Water Institute

Socio-Economic Atlas of Tajikistan. The World Bank THE STATE STATISTICAL COMMITTEE OF THE REPUBLIC OF TAJIKISTAN

Lecturer: Dr. Adote Anum, Dept. of Psychology Contact Information:

Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Error Analysis of Sampling Frame in Sample Survey*

Instance Selection. Motivation. Sample Selection (1) Sample Selection (2) Sample Selection (3) Sample Size (1)

5.3 LINEARIZATION METHOD. Linearization Method for a Nonlinear Estimator

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting)

Task. Variance Estimation in EU-SILC Survey. Gini coefficient. Estimates of Gini Index

From the help desk: It s all about the sampling

Part 3: Inferential Statistics

Why Sample? Selecting a sample is less time-consuming than selecting every item in the population (census).

Bayesian Estimation Under Informative Sampling with Unattenuated Dependence

Module 16. Sampling and Sampling Distributions: Random Sampling, Non Random Sampling

SAS/STAT 13.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures

Tropentag 2007 University of Kassel-Witzenhausen and University of Göttingen, October 9-11, 2007

Introduction to Survey Sampling

SAMPLING ERRORS IN HOUSEHOLD SURVEYS NATIONAL HOUSEHOLD SURVEY CAPABILITY PROGRAMME UNFPA/UN/INT-92-P80-15E

Sampling Theory in Statistics Explained - SSC CGL Tier II Notes in PDF

Conducting Fieldwork and Survey Design

Implementing Rubin s Alternative Multiple Imputation Method for Statistical Matching in Stata

3. When a researcher wants to identify particular types of cases for in-depth investigation; purpose less to generalize to larger population than to g

CROSS SECTIONAL STUDY & SAMPLING METHOD

Application of Statistical Analysis in Population and Sampling Population

BUSINESS STATISTICS. MBA, Pokhara University. Bijay Lal Pradhan, Ph.D.

APPENDIX A SAMPLE DESIGN

Survey Sample Methods

STATISTICAL INFERENCE FROM COMPLEX SAMPLE WITH SAS ON THE EXAMPLE OF HOUSEHOLD BUDGET SURVEYS

Figure Figure

Esri EADA10. ArcGIS Desktop Associate. Download Full Version :

Powers functions. Composition and Inverse of functions.

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities

Weighting Missing Data Coding and Data Preparation Wrap-up Preview of Next Time. Data Management

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Glossary. Appendix G AAG-SAM APP G

Prepared by: DR. ROZIAH MOHD RASDI Faculty of Educational Studies Universiti Putra Malaysia

Data Integration for Big Data Analysis for finite population inference

Stochastic calculus for summable processes 1

The required region is shown in the diagram above. The

Transcription:

Taking into account sampling design in DAD SAMPLING DESIGN AND DAD With version 4.2 and higher of DAD, the Sampling Design (SD) of the database can be specified in order to calculate the correct asymptotic sampling distribution of the various indices and statistics provided by DAD. Data from sample surveys usually display four important characteristics: 1- they come with sampling weights (SW), also called inverse probability weights; 2- they are stratified; 3- they are clustered; 4- sample observations provide aggregate information (such as household expenditures) on a number of statistical units (such as individuals) Figure 1 shows a graphical SD representation for the case of Simple Random Sampling (SRS), in which it is supposed that sample observations are directly and randomly selected from a base of sampling units (s) (e.g., the list of all households within in a country). Figure 1: Simple Random Sampling Population 1 2 3 4 5 6 7 8 9 10 Sample observations (e.g., households), or selected sample units Units within 4 Units within sample observation 4 (e.g., all individuals in household 4) Random Selection Sample observations Complete Selection 1

SRS is rarely used to generate household surveys. Hence, most SD encountered in practice will not look like that in Figure 1. Most SD will look instead like that of Figure 2. A country is first divided into geographical or administrative zones and areas, called strata. Each zone or area thus represents a strata in Figure 2. The first random selection takes place within the Primary Sampling Units (denoted as P s) of each stratum. Within each stratum, a number of P s are randomly selected. This random selection of P s provides clusters of information. P s are often provinces, departments, villages, etc. Within each P, there may then be other levels of random selection. For instance, within each province, a number of villages may be randomly selected, and within every selected village, a number of households may be randomly selected. The final sample observations constitute the Last Sampling Units ( s). Each sample observation may then provide aggregate information (such as household expenditures) on all individuals or agents found within that. These individuals or agents are not selected information on all on them appears in the sample. They therefore do not represent the s in statistical terminology. Figure 2: Sampling Design with two levels of random selection Strata Strata 2 Strata 1 Strata 3 I P(1,1) P(1,2) P(3,1) P(3,2) P(2,1) P(2,2) Primary Sampling Units P(i,j) for stata i II 1,1,1 1,1,2 1,2,1 1,2,2 3,1,1 3,1,2 3,2,1 3,2,2 2,1,1 2,2,1 Last Sampling Units () for each P Sub-Units Sub-Units within each Random Selection Stratification Complete Selection 2

IMPACT OF SD ON THE SAMPLING ERROR OF DAD S ESTIMATORS a) Impact of stratification Generally speaking, a variable of interest, such as household income, tends to be less variable within strata than across the entire population. This is because households within the same stratum typically share to a greater extent than in the entire population some socio-economic characteristics, such as geographical locations, climatic conditions, and demographic characteristics, and that these characteristics are determinants of the living standards of these households. Stratification ensures that a certain number of observations are selected from each of a certain number of strata. Hence, it helps generate sample information from a diversity of socio-economic areas. Because information from a broader spectrum of the population leads on average to more precise estimates, stratification generally decreases the sampling variance of estimators. For instance, suppose at the extreme that household income is the same for all households in a stratum, and this, for all strata. In this case, supposing also that the population size of each stratum is known, it is sufficient to draw one household from each stratum to know exactly the distribution of income in the population. b) Impact of clustering (or multi-stage sampling) Multi-stage sampling implies observations end up in a sample only subsequently to a process of multiple selection. Groups of observations are first randomly selected within a population (which may be stratified); this is followed by further sampling within the selected groups, which may be followed by yet another process of random selection within the subgroups selected in the previous stage. The first selection stage takes place at the level of P s, and generates what are often called clusters. Generally, variables of interest (such as living standards) vary less within a cluster than between clusters. Hence, multi-stage selection reduces the diversity of information generated by sampling. The impact of clustering sample observations is therefore to tend to decrease the precision of populations estimators, and thus to increase their sampling variance. Ceteris paribus, the lower the variability of a variable of interest within clusters, the larger the loss of information that there is in sampling further within the same clusters. To see this, suppose for instance an extreme case in which household income happens to be the same for all households in a cluster, and this, for all clusters. In such cases, it is clearly wasteful to adopt multistage sampling: it would be sufficient to draw one household from each cluster in order to know the distribution of income within that cluster. It would be more informative to draw randomly other clusters. SAMPLING DESIGN IN DAD By default, when a data file is loaded in DAD, the type of SD assigned to the data is the SRS presented in Figure 1. Once the data are loaded, the exact SD structure can nevertheless be easily specified. Up to 5 vectors can help specify that structure: 3

Vectors Strata P SW FPC Table 1: Description of vectors used in DAD to specify the SD Description Specifies the name of the variable (integer type) that contains stratum identifiers Specifies the name of the variable (integer type) that contains identifiers for the Primary Sampling Units Specifies the name of the variable (integer type) that contains identifiers for the Last Sampling Units Specifies the name of the variable for the Sampling Weights. Sampling weights are the inverse of the sampling rate. Roughly speaking, they equal the number of observations in the underlying population that are represented by each sample observation. Specifies the name of the variable for the Finite Population Correction factor. With FPC, DAD derives an indicator f h for each observation h, which is then used to compute SD-corrected sampling errors. If the variable FCP is not specified, f_h=0 for all observations; When the variable specified has values <= 1, it is directly interpreted as a stratum sampling rate f_h =n_h/n_h, where n_h = number of Ps sampled from the strata to which h belongs and N_h = total number of Ps in the population belonging to stratum h. When the variable specified has values greater than or equal to n_h, it is interpreted as representing N_h; f_h is then set to n_h/n_h. The following table contains an example of vectors used to specify the type of SD shown in Figure 2. Table 2: Example of SD. OBS Strata P SW 1 1 1 1 6 2 1 1 2 6 3 1 2 1 6 4 1 2 2 6 5 3 1 1 5 6 3 1 2 5 7 3 2 1 5 8 3 2 2 5 9 2 1 1 3 10 2 2 1 3 M 3 6 10 50 Omitting SW will systematically bias both the estimators of the values of indices and points on curves as well as the estimation of the sampling variance of those estimators. Consider for instance the estimation of total population income from the data shown in table 2. 4 households appear in strata 1, but the population number of households in that strata is six times as large (that is, 24), and this is captured by the SW 4

variable. Total population income for strata 1 would therefore be estimated to be six times that of total sample income for strata 1. Table 3: Example of SD. OBS Strata SW N_h 1 1 1 6 24 2 1 2 6 24 3 1 3 6 24 4 1 4 6 24 5 3 1 5 20 6 3 2 5 20 7 3 3 5 20 8 3 4 5 20 9 2 1 3 6 10 2 2 3 6 M 3 10 50 --- The FPC factor accounts for the reduction in sampling variance that occurs when a sample is drawn without replacement from a finite population (as compared to sampling with replacement). According to table 3, the four s of strata 1 were selected without replacement from a population of 24 s. These fuor s are then necessarily distinct by design. If sampling had been done with replacement, then multiple observations of the same population s could have been generated. Because sampling without replacement guarantees that sample observations represent different sampling units, it therefore generates greater sampling information and leads to smaller sampling variances than with sampling with replacement. For strata 1 of Table 3, data from four distinct s (or P s) out of 24 are necessarily generated after sampling. The f h factor for that strata is then 4/24=0.1666. REMARK: We can initialise and use the FPC correction just when the SD is based on one stage of random selection of s. In this case P s and s are equivalent. To initialize the SD after loading the database, select from the main menu the item Edit->Set Sample Design. The following window then appears. 5

This allows DAD to take into account a wide variety of possible SD. This is made by selecting (or not selecting) vectors for any of the five choices offered above. In the case of SRS within a number of strata, there would be an indicator of a strata vector without any indication of a vector of P s. The following table presents some of these combinations. Strata P SW FPC Indication SD is SRS without sampling weights X X SD is stratified with SW X X X No stratification, but multi-stage sampling and SW X X Random (one-stage) sampling of s with -specific selection probabilities. This can occur for instance if, once an individual is selected, all individuals in his household are also automatically selected. Implicitly, then, it is the household that is selected as a X X X Stratification with only the first sampling stage specified by the user X X Stratification with one-stage sampling and sampling weights (wrongly?) omitted X X X Stratification with one-stage sampling and sampling weights (wrongly?) omitted X X X Stratification with multi-stage sampling and sampling weights (wrongly?) omitted X X X X Stratification with multi-stage sampling and sampling weights provided X X X X X Stratification with multi-stage sampling and sampling weights provided. The finite population correction factor is also provided; this supposes that sampling for the statistical inferences X: Indicate that the variable is selected 6

Note that when DAD finds the values of the strata-psu-lsu variables to be the same across observations, it supposes that these observations comefrom just one. If the option Auto-compute FPC is activated, DAD generates implicitly the FPC vector. REMARKS:. After initialization of the SD information, the dataset is automatically ordered by (when specified) strata, P s and s. There should be more than one P within each stratum. e.g.:1) before initialization of the SD 2) after initialization of the SD: data is ordered according to strata, P and 7

To show the SD information, select from main menu the item Edit->Summarize Sample Design. The following window appears. 8