Maximizing Overlap of Large Primary Sampling Units in Repeated Sampling: A Comparison of Ernst's Method with Ohlsson's Method

Reid Rottach and Padraic Murphy 1
U.S. Census Bureau
4600 Silver Hill Road, Washington DC 20233
padraic.a.murphy@census.gov, reid.a.rottach@census.gov

Abstract

Many large repeated or continuous demographic surveys employ a multi-stage design where large geographic areas (such as counties or clusters of contiguous counties) are sampled in the first or primary stage. Usually, a new sample of these primary sample units (PSUs) is selected periodically in order to account for changes in population, survey objectives, or other considerations. But because hiring and training new interviewers can be expensive, and replacing experienced interviewers with inexperienced ones may have an adverse effect on data quality, there is often a strong incentive to retain as many as possible of the PSUs from the old sample design when selecting the new PSU sample. At the same time, one wishes to also retain the advantages of having a probability sample. Various methods have been proposed to coordinate repeated samples with these two considerations in mind. This paper discusses and compares two such methods. The first method, due to Ernst (1986), has been used for demographic surveys at the U.S. Census Bureau. This method does not require independent sampling between strata in the previous design, and is cast as a constrained optimization problem, so in some respect the solution is optimal. The second method, due to Ohlsson (1996, 2001), uses exponential sampling, and does have the requirement of independent sampling; but it may be used repeatedly because it does not destroy independence in the current design.

Key Words: Repeated Sampling, Coordinated Sampling, Maximizing Overlap, Exponential Sampling, Permanent Random Numbers (PRNs)

1. Introduction

The Census Bureau is currently in the research phase of a sample redesign for several major demographic surveys. The sample will be selected following the 2010 Census. One of the areas of research is that of maximum overlap of its PSUs.
We define a method of maximum overlap as one that increases the probability of reselecting PSUs already in sample compared to independent selections, while maintaining unconditional probability proportional to size (pps) sampling. We are interested in comparing overlap procedures that would be suitable given the constraint that they can be used repeatedly across multiple designs. The method of Ernst (1986) was first used at the Census Bureau following the 1980 redesign, and has been used in the 1990 and 2000 redesigns as well. The method of Ohlsson (1996, 2001) was an important development since it appears to be the only method that does not lead to dependent selections in the current design. This is at the heart of how Ohlsson's method satisfies the requirement for repeated use, whereas Ernst's method satisfies the requirement by not requiring independent sampling from stratum to stratum in the old design.

Our interest in presenting a direct comparison between the two methods comes from this feature they have in common, and from the lack of a direct numerical comparison of the two methods in the statistical literature. Ernst (1999) discusses several different features of several methods of overlap, although he does not include their expected overlaps. Ohlsson (1996) compares the expected overlaps of several different methods, but he does not include Ernst's method, saying that in part it was to avoid linear programming. For our numerical comparisons we use data from the previous two redesigns of the Current Population Survey (CPS), in which we formed and restratified PSUs following the 1990 and 2000 Censuses.

1 This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed on statistical issues are those of the authors and not necessarily those of the U.S. Census Bureau.

2. PSU creation, stratification, and probabilities of selection

The primary motivation for PSU creation is to form areas that allow manageable interviewer workloads. Many PSUs are single counties, although they may be formed from any number of contiguous counties, or in some cases, county-equivalents. The PSUs are then stratified into like groups, such as by choosing the stratification that minimizes a sampling variance. PSUs are then assigned probabilities of selection that are proportional to size. For surveys that select one PSU per stratum, this is the measure of size of the PSU divided by the measure of size of the stratum. Otherwise, for the selection of two PSUs, the probability of selection is twice the measure of size of the PSU divided by the measure of size of the stratum; this is appropriate for without-replacement sampling. Within each stratum, the joint selection probabilities for selecting pairs of PSUs are controlled using Durbin's formula (1967). When selecting PSUs, we restrict ourselves to pps sampling, but do not necessarily constrain joint probabilities of selection of PSUs in different strata. In fact, we may follow an approach that leads to unknown joint probabilities of selection.

3. Overlap

We define overlap to be an indicator of whether a PSU, or some portion of the PSU, was in sample in two consecutive designs. For the current design, the sum of these indicator variables is the number of PSUs that were sampled in the previous design. Expected overlap is the expected value of the number of PSUs selected in both designs. This variable is defined at the stratum level for the current design. It does not depend on any realization in the old or new designs, but integrates over all possible outcomes.
From this, we may present the expected number of continuing PSUs (those sampled in both designs), which would be the sum of expected overlaps, or similarly, an average expected overlap. Our working definition of maximum overlap is a method of sampling PSUs that:

- Is a probability sample; that is, has known selection probabilities
- Has a higher average expected overlap than sampling independently from the previous design

4. Sampling PSUs

4.1 Notation

In this paper we will identify PSUs as though their definition had not changed across designs, although in fact that will not be the case. The PSUs that changed definition were divided into pieces, and these pieces were treated as PSUs for the sake of overlap. For a given stratum in the new design:

- $i$ represents a PSU
- $\pi_i$ is its probability of selection in the new design
- $p_i$ was its probability of selection in the old design
- Sums indexed by $i$ are over all PSUs in the new design stratum

4.2 Independent Sampling (A Lower Bound for Expected Overlap)

The overlap procedures we examine will perform at least as well as independent sampling in each stratum, so the expected overlap of independent sampling is an obvious lower bound. Furthermore, we would like to consider the possibility of using this approach if we can't show there are real benefits to using maximum overlap procedures. With independent selection, we ignore the outcome of the previous design when selecting PSUs in the new design, so for each PSU the probability that it is in both designs is the product of its selection probabilities in the two designs. For each new design stratum, with $p_i$ and $\pi_i$ the old and new design selection probabilities of PSU $i$, the expected overlap for independent sampling is:

$$\text{overlap}_{ind} = \sum_i p_i \pi_i$$

4.3 Poisson Sampling (An Upper Bound for Expected Overlap)

If we allowed variable sample sizes, we could implement a Poisson sampling procedure that would achieve an expected overlap higher than the procedures we are considering. Poisson sampling refers to an approach in which each PSU is selected independently of every other PSU in the stratum. That is, the PSUs are subjected to independent Bernoulli trials, in which the expected number of PSUs selected is the sum of the probabilities of selection. So, for example, if we were to select an expected one PSU per stratum, we may end up with some strata with no PSUs in sample, as well as strata with multiple PSUs. Brewer, Early, and Joyce (1972) discuss an approach to sampling in which a PRN from a uniform [0,1] distribution is assigned to every PSU, and the PSU is selected if the PRN is less than the target number of PSUs times the probability of selecting that PSU. Using these PRNs in the next design will result in a maximum overlap approach to sampling, and one that is in fact optimal. Following an approach other than Poisson sampling, in which we add the constraint of a fixed sample size, will lead to an expected overlap no greater than that of the Poisson approach. We discuss Poisson sampling only as an upper bound for the expected overlap of the methods we will consider.
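The independent lower bound and the PRN-based Poisson selection described above can be sketched in Python. This is a minimal illustration and not the authors' implementation: the function names, the three-PSU stratum, and the seed are all hypothetical, and the stratum is assumed to contain the same PSUs in both designs.

```python
import random

def independent_overlap(p_old, pi_new):
    """Expected overlap under independent selection: sum of p_i * pi_i."""
    return sum(p * q for p, q in zip(p_old, pi_new))

def poisson_select(prns, probs, n_target=1):
    """Select each PSU whose PRN is below n_target times its selection probability."""
    return [x < n_target * q for x, q in zip(prns, probs)]

# Hypothetical stratum with three PSUs (illustrative values only)
p_old = [0.5, 0.3, 0.2]   # old design selection probabilities
pi_new = [0.4, 0.4, 0.2]  # new design selection probabilities

# Permanent random numbers: drawn once, then reused in both designs
rng = random.Random(2010)
prns = [rng.random() for _ in p_old]

old_sample = poisson_select(prns, p_old)
new_sample = poisson_select(prns, pi_new)

# Realized overlap: number of PSUs selected in both designs
overlap = sum(a and b for a, b in zip(old_sample, new_sample))
print(independent_overlap(p_old, pi_new), old_sample, new_sample, overlap)
```

Because the same PRN is compared with both thresholds, a PSU lands in both samples exactly when its PRN falls below the smaller of its two selection probabilities. The sample size is random here, which is why the paper treats this design only as a bound.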
For each new design stratum, the expected overlap for Poisson sampling is:

$$\text{overlap}_{poi} = \sum_i \min(p_i, \pi_i)$$

4.4 Ernst's Method

Ernst's method is a variant of an approach outlined in Causey, Cox, and Ernst (CCE, 1985). These authors address the problem of constrained optimization directly, in which the expected overlap is maximized using numerical techniques subject to the constraints on sample size and probabilities of selection. So, CCE is truly optimal, but has the drawback that it can only be used once for pps sampling since it requires the knowledge of joint probabilities of selection. These are difficult enough to determine after it is implemented that we consider them effectively unknown. The way in which Ernst (1986) avoids the requirement of independent sampling from stratum to stratum is by selecting only one stratum from the old design to overlap with, similar to an earlier method described in Perkins (1970). Essentially, the expected overlap is optimized given the requirement that we will select just one stratum in the old design to overlap with. It is superior to Perkins' procedure in this respect, but it is not necessarily optimal among a broader class of overlapping algorithms. Using Ernst's method, the old design stratum is chosen probabilistically, with the probabilities determined via the optimization procedure. The expected overlap for Ernst's method is determined by the optimization procedure and does not have a closed form. The expected overlap is the value of the objective function we will maximize by linear programming (PROC LP in SAS).

4.5 Ohlsson's Method

As with Poisson sampling, Ohlsson's method uses PRNs. For a one-PSU-per-stratum design, the approach is to transform the uniformly distributed PRNs and select the PSU with the smallest assigned value. In particular, for a given PSU $i$ with PRN equal to $X$, the transformed number is $-\log(1 - X)/\pi_i$. It is very simple to implement, and correlates with the selections in the old design only through the PRN. Although not immediately apparent, it can be shown to be a method of maximizing overlap; it satisfies our constraint of being a probability sample that increases the expected overlap when compared to sampling independently from the old design. For each new design stratum, the expected overlap for Ohlsson's method is

$$\text{overlap}_{ohl} = \sum_i \frac{p_i \pi_i}{\pi_i \sum_{j \in A_i} p_j + p_i \sum_{j \in A'_i} \pi_j}$$

where for each $i$, define the following:

- $D_i$ is the set of PSUs $\{j\}$ in the same old and new strata as unit $i$ that satisfy $p_j \pi_i \le p_i \pi_j$.
- $A_i$ is the set of PSUs in the same old design stratum as $i$, but not in $D_i$.
- $A'_i$ is the set of PSUs in the same new stratum as unit $i$, except those units in $A_i$.

This approach has been expanded to the selection of n>1 PSUs per stratum (Ohlsson, 1999), but that case will not be considered here.

4.6 A Hybrid Approach

As already discussed, if we are to maintain probability sampling we cannot use Ohlsson's method without first selecting independently. One option would be to phase out Ernst's method and phase in Ohlsson's across multiple designs, by selecting independently first in some states. For example, if half the states were selected using Ernst's method, and the other half independently, then the average expected overlap would be approximately halfway between that of the two methods.

5. Results

Table 1. Average Expected Overlap

Method                       Average Expected Overlap
Ernst                        60%
Ohlsson                      61%
Independent (Lower Bound)    35%
Poisson (Upper Bound)        81%
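The selection rule of Section 4.5 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code. The function names and example probabilities are hypothetical, and the expected-overlap function covers only the simplified case in which the old and new strata contain the same PSUs. In that case the chance that PSU $i$ is selected in both designs reduces to $1 / \sum_j \max(p_j/p_i, \pi_j/\pi_i)$, because the transformed PRNs behave as independent exponential variables.

```python
import math

def ohlsson_select(prns, probs):
    """Return the index of the PSU with the smallest value of -log(1 - X) / prob."""
    keys = [-math.log(1.0 - x) / q for x, q in zip(prns, probs)]
    return min(range(len(keys)), key=keys.__getitem__)

def ohlsson_expected_overlap(p_old, pi_new):
    """Expected overlap, assuming the old and new strata hold the same PSUs.

    For PSU i the probability of selection in both designs is
    1 / sum_j max(p_j / p_i, pi_j / pi_i); the j = i term contributes 1.
    """
    total = 0.0
    for p_i, pi_i in zip(p_old, pi_new):
        denom = sum(max(p_j / p_i, pi_j / pi_i) for p_j, pi_j in zip(p_old, pi_new))
        total += 1.0 / denom
    return total

# Hypothetical stratum with three PSUs (illustrative values only)
p_old = [0.5, 0.3, 0.2]   # old design selection probabilities
pi_new = [0.4, 0.4, 0.2]  # new design selection probabilities

lower = sum(p * q for p, q in zip(p_old, pi_new))      # independent sampling
upper = sum(min(p, q) for p, q in zip(p_old, pi_new))  # Poisson sampling
ohlsson = ohlsson_expected_overlap(p_old, pi_new)
print(lower, ohlsson, upper)
```

For these illustrative values the Ohlsson figure falls between the independent and Poisson bounds, and when the two designs are identical the function returns 1, since the same PSU is then always reselected.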

Figure 1. Expected Overlap for 374 Non-self-representing Strata. [Scatter plot: Ohlsson's Method on the horizontal axis, Ernst's Method on the vertical axis, both ranging from 0 to 1.]

The average expected overlaps of Ernst and Ohlsson were very close, at 60% and 61%, respectively. Independent selection was 35% on average, and the upper bound of Poisson sampling resulted in 81%. It is interesting to note the differing distributions of expected overlap in Ernst and Ohlsson, as shown in Figure 1. The diagonal line represents equality of the two axes. Ohlsson's method seems to perform better at the lower end of the scale, while Ernst's method seems to perform better at the higher end. Lower expected overlaps may suggest larger strata relative to the PSU sizes, which may also be related to the number of strata that are overlapped with. A possible reason for Ohlsson's method performing better at the lower end is that the method uses information from all strata overlapped in the old design, rather than having to select just one to overlap with. Ernst's method will be optimal when stratum definitions do not change, and it seems that in general the method will work better when there are fewer old design strata that overlap, which may explain why it performs better at the higher end.

References

Brewer, K.R.W., Early, L.J. and Joyce, S.F. (1972). Selecting several samples from a single population. Australian Journal of Statistics, 14, 231-239.

Durbin, J. (1967). Design of Multi-Stage Surveys for the Estimation of Sampling Errors. Applied Statistics, 16, 152-164.

Ernst, L.R. (1986). Maximizing the Overlap Between Surveys When Information is Incomplete. European Journal of Operational Research, 27, 192-200.

Ernst, L.R. (1999). The Maximization and Minimization of Sample Overlap Problems: A Half Century of Results. International Statistical Institute, Proceedings, Invited Papers, IASS Topics, 168-182.

Ernst, L.R. (2000). Discussion Paper - Session 31: Coordinating Sampling Between and Within Surveys. The Second International Conference on Establishment Surveys. Alexandria, VA: American Statistical Association, 265-267.

Ohlsson, E. (1996). Methods for PPS Size One Sample Coordination. Institute of Actuarial Mathematics and Mathematical Statistics, Stockholm University, No. 194.

Ohlsson, E. (1999). Comparison of PRN Techniques for Small Sample Size PPS Sample Coordination. Institute of Actuarial Mathematics and Mathematical Statistics, Stockholm University, No. 210.

Ohlsson, E. (2000). Coordination of PPS Samples Over Time. The Second International Conference on Establishment Surveys. Alexandria, VA: American Statistical Association, 255-264.

Perkins, W.M. (1970). 1970 CPS Redesign: Proposed Method for Deriving Sample PSU Selection Probabilities Within 1970 NSR Strata. Memorandum to Joseph Waksberg, U.S. Bureau of the Census.