Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling
Jong-Min Kim* and Jon E. Anderson
jongmink@mrs.umn.edu
Statistics Discipline, Division of Science and Mathematics
University of Minnesota at Morris
2004 JSM, August 11

Outline
- Jackknife Background and Definition
- Ratio Estimator in Simple Random Sampling
  - Jackknife variance estimators for two samples
  - Simulation study
- Ratio Estimator in Stratified Random Sampling
  - Jackknife variance estimators for two samples
  - Simulation study
- Conclusions

Jackknife Background
- Introduced by Quenouille (1949, 1956) as a method to reduce bias
- Popularized by Tukey (1958), who used it for variances and confidence intervals
- Arvesen (1969) was the first to propose a two-sample jackknife estimator

Jackknife Definition
- Let $\hat{\theta}$ be an estimate
- Let $\hat{\theta}(-j)$ be an estimator of the same form with observation $j$ deleted
- The jackknife estimate of the variance of $\hat{\theta}$ is
\[ \frac{n-1}{n} \sum_{j=1}^{n} \left[ \hat{\theta}(-j) - \hat{\theta} \right]^2 \]
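The definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `jackknife_variance` and the toy data are assumptions, not part of the talk. For the sample mean, the delete-one jackknife variance reduces to the familiar $s^2/n$, which gives a quick sanity check.

```python
from statistics import mean, variance


def jackknife_variance(data, estimator):
    """Delete-one jackknife variance: (n-1)/n * sum_j [theta(-j) - theta]^2."""
    n = len(data)
    theta_hat = estimator(data)
    # Recompute the estimator n times, each time leaving out observation j.
    replicates = [estimator(data[:j] + data[j + 1:]) for j in range(n)]
    return (n - 1) / n * sum((rep - theta_hat) ** 2 for rep in replicates)


# Hypothetical toy data, used only to exercise the function.
sample = [4.0, 7.0, 1.0, 9.0, 5.0]
v_jack = jackknife_variance(sample, mean)
v_classic = variance(sample) / len(sample)  # s^2 / n for the sample mean
```

For the mean the two quantities `v_jack` and `v_classic` agree up to rounding; for nonlinear estimators such as a ratio they generally differ, which is where the jackknife becomes useful.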

Naive Jackknife Variance Estimator for Two Samples, Two-Phase, Simple Random Sampling
- Two first-phase simple random samples, $s'_1$ of size $n'_1$ and $s'_2$ of size $n'_2$, are taken without replacement from a population of $N$ elements.
- Simple random subsamples $s_1$ of size $n_1$ and $s_2$ of size $n_2$ are taken without replacement from $s'_1$ and $s'_2$.
- The ratio estimators of $\bar{Y}$ are
\[ \bar{y}_{r1} = (\bar{y}_1/\bar{x}_1)\,\bar{x}'_1 = \hat{R}_1 \bar{x}'_1, \qquad \bar{y}_{r2} = (\bar{y}_2/\bar{x}_2)\,\bar{x}'_2 = \hat{R}_2 \bar{x}'_2, \]
where $\bar{x}'_1$ and $\bar{x}'_2$ are the means for the first-phase samples $s'_1$ and $s'_2$, and $(\bar{x}_1, \bar{y}_1)$ and $(\bar{x}_2, \bar{y}_2)$ are the means for the second-phase samples $s_1$ and $s_2$.

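Naive Jackknife Variance Estimator for Two Samples, SRS
Define
\[ \bar{y}_{rk}(-j) = \frac{\bar{y}_k(-j)}{\bar{x}_k(-j)}\,\bar{x}'_k(-j), \quad \text{for all } j \in s'_k,\ k = 1, 2, \]
where
\[ \bar{x}_k(-j) = \begin{cases} (n_k \bar{x}_k - x_j)/(n_k - 1), & \text{if } j \in s_k, \\ \bar{x}_k, & \text{if } j \in s'_k - s_k, \end{cases} \]
\[ \bar{y}_k(-j) = \begin{cases} (n_k \bar{y}_k - y_j)/(n_k - 1), & \text{if } j \in s_k, \\ \bar{y}_k, & \text{if } j \in s'_k - s_k, \end{cases} \]
and
\[ \bar{x}'_k(-j) = (n'_k \bar{x}'_k - x_j)/(n'_k - 1) \quad \text{for all } j \in s'_k. \]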

Naive Jackknife Variance Estimator for Two Samples, SRS
Apply the usual jackknife method to $\bar{y}_{rk}(-j)$ to get
\[ v_{Jrk} = \frac{n'_k - 1}{n'_k} \sum_{j \in s'_k} \left[ \bar{y}_{rk}(-j) - \bar{y}_{rk} \right]^2. \]
The jackknife variance estimator is a weighted average of the two estimators, given by
\[ v_{Jr} = \frac{n'_1 v_{Jr1} + n'_2 v_{Jr2}}{n'_1 + n'_2} = \frac{n'_1 - 1}{n'_1 + n'_2} \sum_{j \in s'_1} \left[ \bar{y}_{r1}(-j) - \bar{y}_{r1} \right]^2 + \frac{n'_2 - 1}{n'_1 + n'_2} \sum_{l \in s'_2} \left[ \bar{y}_{r2}(-l) - \bar{y}_{r2} \right]^2. \]
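A minimal Python sketch of the naive two-sample estimator follows. The data layout is an assumption made for illustration: each first-phase sample is a list of `(x, y)` pairs, with `y` set to `None` for units that were not selected into the second phase. The function names and the toy numbers are hypothetical. Deleting unit `j` from the list and recomputing the ratio estimator reproduces the case definitions of the delete-one means given above.

```python
def ratio_estimate(first_phase):
    """Two-phase ratio estimator (ybar / xbar) * xbar', from (x, y) pairs.

    y is None for first-phase units outside the second-phase subsample.
    """
    second_phase = [(x, y) for x, y in first_phase if y is not None]
    xbar_first = sum(x for x, _ in first_phase) / len(first_phase)
    xbar = sum(x for x, _ in second_phase) / len(second_phase)
    ybar = sum(y for _, y in second_phase) / len(second_phase)
    return ybar / xbar * xbar_first


def naive_jackknife(first_phase):
    """(n'-1)/n' times the sum over first-phase units of squared deviations."""
    n_first = len(first_phase)
    estimate = ratio_estimate(first_phase)
    replicates = [
        ratio_estimate(first_phase[:j] + first_phase[j + 1:])
        for j in range(n_first)
    ]
    return (n_first - 1) / n_first * sum((r - estimate) ** 2 for r in replicates)


def pooled_variance(sample_1, sample_2):
    """Average of the two jackknife variances, weighted by first-phase sizes."""
    n1, n2 = len(sample_1), len(sample_2)
    return (n1 * naive_jackknife(sample_1) + n2 * naive_jackknife(sample_2)) / (n1 + n2)


# Hypothetical toy samples (not from the talk).
sample_1 = [(10.0, 8.5), (12.0, 9.0), (9.0, 7.5), (11.0, None), (13.0, None), (8.0, None)]
sample_2 = [(10.5, 8.0), (9.5, 8.2), (12.5, None), (11.5, None)]
v_1 = naive_jackknife(sample_1)
v_2 = naive_jackknife(sample_2)
v_pooled = pooled_variance(sample_1, sample_2)
```

Each second-phase subsample needs at least two units so that a ratio can still be formed after one of them is deleted.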

Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Simple Random Sampling
- Let's consider an adjusted jackknife variance estimator using a data imputation framework.
- For sample $k = 1, 2$, we define our estimator of $\bar{Y}$ as
\[ \bar{y}_{kI} = \frac{1}{n'_k} \sum_{i \in s'_k} y^*_i. \]
- If the observation is part of the second-phase sample $s_k$, then $y^*_i = y_i$, because $y_i$ is observed.
- If the value of $y$ is not directly observed because the unit is part of $s'_k - s_k$, the value is obtained through ratio imputation as $y^*_i = (\bar{y}_k/\bar{x}_k)\,x_i$.

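Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Simple Random Sampling
Rao and Sitter (1995) proposed the following device:
\[ \hat{z}_{ki}(-j) = y^*_{ki} + \left\{ \frac{\bar{y}_k(-j)}{\bar{x}_k(-j)}\,x_{ki} - \frac{\bar{y}_k}{\bar{x}_k}\,x_{ki} \right\}, \]
for sample $k = 1, 2$. Under this formulation, for an imputed unit $i$ (so that $y^*_{ki} = (\bar{y}_k/\bar{x}_k)\,x_{ki}$),
- $\hat{z}_{ki}(-j) = y^*_{ki} = (\bar{y}_k/\bar{x}_k)\,x_{ki}$ for $j \in s'_k - s_k$ in sample $k = 1, 2$, and
- $\hat{z}_{ki}(-j) = (\bar{y}_k(-j)/\bar{x}_k(-j))\,x_{ki}$ for $j \in s_k$ in sample $k = 1, 2$.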

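Define the adjusted estimator,
\[ \bar{y}^a_{kI}(-j) = \frac{1}{n'_k - 1} \sum_{i=1}^{n'_k} \hat{z}_{ki}(-j). \]
Define the jackknife variance estimator for sample $k$,
\[ v_{Jrk} = \frac{n'_k - 1}{n'_k} \sum_{j \in s'_k} \left[ \bar{y}^a_{kI}(-j) - \bar{y}_{kI} \right]^2, \]
where $\bar{y}_{kI} = \bar{y}_{rk}$ under ratio imputation.
The jackknife variance estimator based on the adjusted imputed estimators $\bar{y}_{kI}$, $k = 1, 2$, is a weighted average of two estimators, given by
\[ v^a_{Jr} = \frac{n'_1 v_{Jr1} + n'_2 v_{Jr2}}{n'_1 + n'_2} = \frac{n'_1 - 1}{n'_1 + n'_2} \sum_{j \in s'_1} \left[ \bar{y}^a_{1I}(-j) - \bar{y}_{1I} \right]^2 + \frac{n'_2 - 1}{n'_1 + n'_2} \sum_{l \in s'_2} \left[ \bar{y}^a_{2I}(-l) - \bar{y}_{2I} \right]^2. \]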
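The adjusted estimator for a single sample can be sketched in Python as follows. Several choices here are assumptions rather than statements from the slides: the data layout (`(x, y)` pairs with `y = None` for imputed units), the function name, and in particular the reading that the average defining the adjusted estimator runs over the $n'_k - 1$ units retained after deleting unit $j$, with observed units keeping their observed $y$ and only imputed units receiving the adjustment term.

```python
def adjusted_jackknife(first_phase):
    """Adjusted jackknife variance for one sample under ratio imputation.

    first_phase: list of (x, y); y is None for units in s' - s (to be imputed).
    """
    n_first = len(first_phase)
    sum_x = sum(x for x, y in first_phase if y is not None)
    sum_y = sum(y for x, y in first_phase if y is not None)
    ratio = sum_y / sum_x  # ybar_k / xbar_k from the second-phase units

    # Ratio-imputed values y*_i: observed y if available, else ratio * x.
    y_star = [y if y is not None else ratio * x for x, y in first_phase]
    imputed_mean = sum(y_star) / n_first

    total = 0.0
    for j, (x_j, y_j) in enumerate(first_phase):
        # Delete-one ratio ybar(-j)/xbar(-j); unchanged if j was not observed.
        ratio_j = (sum_y - y_j) / (sum_x - x_j) if y_j is not None else ratio
        z_sum = 0.0
        for i, (x_i, y_i) in enumerate(first_phase):
            if i == j:
                continue  # assumption: the deleted unit is left out of the average
            if y_i is not None:
                z_sum += y_i  # observed values are used as they are
            else:
                z_sum += y_star[i] + (ratio_j - ratio) * x_i  # adjusted imputation
        adjusted_mean = z_sum / (n_first - 1)
        total += (adjusted_mean - imputed_mean) ** 2
    return (n_first - 1) / n_first * total


# Hypothetical toy sample (not from the talk).
sample = [(10.0, 8.5), (12.0, 9.0), (9.0, 7.5), (11.0, None), (13.0, None), (8.0, None)]
v_adjusted = adjusted_jackknife(sample)
```

The two-sample estimator is then the average of `adjusted_jackknife(sample_1)` and `adjusted_jackknife(sample_2)` weighted by the two first-phase sample sizes.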

Simulation Study Design: Simple Random Sampling
- Population size is 10000
- Pop Y is related to Pop X: $Y = 0.8X + \epsilon$
[Figure: scatterplot of Y against X]
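A population of this kind, and one two-phase draw from it, could be generated along the following lines. The slides state only the population size and the relation $Y = 0.8X + \epsilon$; the normal distributions, their parameters, the seed, the sample sizes and the function name below are all assumptions chosen for illustration.

```python
import random

rng = random.Random(2004)  # arbitrary seed, for reproducibility only
N = 10000

# Assumed distributions: X ~ Normal(100, 10) and eps ~ Normal(0, 5).
# The talk gives only Y = 0.8 X + eps, not these parameters.
population = []
for _ in range(N):
    x = rng.gauss(100.0, 10.0)
    y = 0.8 * x + rng.gauss(0.0, 5.0)
    population.append((x, y))


def draw_two_phase(pop, n_first, n_second, rng):
    """Draw a first-phase SRS, then an SRS subsample in which y is kept.

    Returns (x, y) pairs; y is None for units outside the second phase.
    """
    first = rng.sample(pop, n_first)
    keep = set(rng.sample(range(n_first), n_second))
    return [(x, y if i in keep else None) for i, (x, y) in enumerate(first)]


# First-phase size set to twice the second-phase size (an illustrative choice).
sample = draw_two_phase(population, 400, 200, rng)
```

Repeating the draw many times and recording the jackknife variance for each replicate would give curves against the second-phase sample size.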

Simulation: Simple Random Sampling
[Figure: One Sample Jackknife Variance for Ratio Estimator. Vertical axis: J. Variance; horizontal axis: Second Phase Sample Size; legend: 2*SecondPhase, 3*SecondPhase]

Simulation: Simple Random Sampling
[Figure: Two Sample Jackknife Variance for Ratio Estimator. Vertical axis: J. Variance; horizontal axis: Second Phase Sample Size; legend: 2*SecondPhase, 3*SecondPhase]

Simulation: Simple Random Sampling
[Figure: Comparison of One vs Two Sample Estimators. Vertical axis: St. Dev of Ests; horizontal axis: Second Phase Sample Size; legend: One Sample St. Dev, Two Sample St. Dev]

Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Stratified Random Sampling
- Suppose the population of $N$ units consists of $L$ strata such that the $h$-th stratum consists of $N_h$ units and $\sum_{h=1}^{L} N_h = N$.
- Suppose that an auxiliary variable $x$, closely related to an item $y$, is observed on all sample units $s'_{hk}$ in sample $k = 1, 2$ for stratum $h$.
- Ratio imputation uses $y^*_{hki} = (\bar{y}_{hk}/\bar{x}_{hk})\,x_{hki}$ for $i \in s'_{hk} - s_{hk}$, where $\bar{y}_{hk}$ and $\bar{x}_{hk}$ are the means of $y$ and $x$ for the respondents $s_{hk}$ in stratum $h$.

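Under ratio imputation,
\[ \bar{y}_{kI} = \sum_{h=1}^{L} W_h \bar{y}_{hkI} = \sum_{h=1}^{L} W_h\,(\bar{y}_{hk}/\bar{x}_{hk})\,\bar{x}'_{hk}, \]
where $\bar{x}'_{hk}$ is the $x$ mean for the full sample $s'_{hk}$ from stratum $h$.
Also,
\[ y^*_{hki}(-hkj) = \left[ \bar{y}_{hk}(-hkj)/\bar{x}_{hk}(-hkj) \right] x_{hki} \]
under ratio imputation when the $hkj$-th respondent is deleted, where
\[ \bar{y}_{hk}(-hkj) = \left[ n_{hk}\bar{y}_{hk} - y_{hj} \right]/(n_{hk} - 1) \quad \text{and} \quad \bar{x}_{hk}(-hkj) = \left[ n_{hk}\bar{x}_{hk} - x_{hj} \right]/(n_{hk} - 1), \]
for $k = 1, 2$.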

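Define
\[ \hat{z}_{hki}(-hkj) = y^*_{hki} + \left\{ \frac{\bar{y}_{hk}(-hkj)}{\bar{x}_{hk}(-hkj)}\,x_{hki} - \frac{\bar{y}_{hk}}{\bar{x}_{hk}}\,x_{hki} \right\}, \]
for sample $k = 1, 2$, stratum $h$. Under this formulation,
- $\hat{z}_{hki}(-hkj) = y^*_{hki} = (\bar{y}_{hk}/\bar{x}_{hk})\,x_{hki}$ for $hkj \in s'_{hk} - s_{hk}$ in sample $k = 1, 2$, and
- $\hat{z}_{hki}(-hkj) = (\bar{y}_{hk}(-hkj)/\bar{x}_{hk}(-hkj))\,x_{hki}$ for $hkj \in s_{hk}$ in sample $k = 1, 2$, stratum $h$.
Define the adjusted estimator,
\[ \bar{y}^a_{hkI}(-hkj) = \frac{1}{n'_{hk} - 1} \sum_{i=1}^{n'_{hk}} \hat{z}_{hki}(-hkj). \]
Using these values, the jackknife variance estimator is given by
\[ v_{Jr}(\bar{y}_{kI}) = \sum_{h=1}^{L} \frac{n'_{hk} - 1}{n'_{hk}} \sum_{j=1}^{n'_{hk}} \left[ \bar{y}^a_{kI}(-hkj) - \bar{y}_{kI} \right]^2. \]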

Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Stratified Random Sampling
Noting that
\[ \bar{y}^a_{kI}(-hkj) - \bar{y}_{kI} = W_h \left[ \bar{y}^a_{hkI}(-hkj) - \bar{y}_{hkI} \right], \]
where $\bar{y}^a_{hkI}(-hkj)$ is the adjusted imputed estimator of the $h$-th stratum mean $\bar{Y}_h$ when the $hkj$-th sample unit is deleted, we get
\[ v_{Jr}(\bar{y}_{kI}) = \sum_{h=1}^{L} W_h^2\, v_{Jr}(\bar{y}_{hkI}) = \sum_{h=1}^{L} W_h^2\, \frac{n'_{hk} - 1}{n'_{hk}} \sum_{j=1}^{n'_{hk}} \left( \bar{y}^a_{hkI}(-hkj) - \bar{y}_{hkI} \right)^2, \]
for $k = 1, 2$.

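The jackknife variance estimator is a weighted average of two estimators, given by
\[ v_{Jrs} = \frac{n'_1 v_{Jr}(\bar{y}_{1I}) + n'_2 v_{Jr}(\bar{y}_{2I})}{n'_1 + n'_2} \]
\[ = \frac{n'_1}{n'_1 + n'_2} \sum_{h=1}^{L} W_h^2\, \frac{n'_{h1} - 1}{n'_{h1}} \sum_{j=1}^{n'_{h1}} \left( \bar{y}^a_{h1I}(-h1j) - \bar{y}_{h1I} \right)^2 + \frac{n'_2}{n'_1 + n'_2} \sum_{h=1}^{L} W_h^2\, \frac{n'_{h2} - 1}{n'_{h2}} \sum_{j=1}^{n'_{h2}} \left( \bar{y}^a_{h2I}(-h2j) - \bar{y}_{h2I} \right)^2, \]
where $n'_1 = \sum_{h=1}^{L} n'_{h1}$ and $n'_2 = \sum_{h=1}^{L} n'_{h2}$.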
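The stratified combination can be sketched in Python as below. The data layout (a dict from stratum label to a list of `(x, y)` pairs, `y = None` for nonrespondents), the function names and the toy data are assumptions. As a simplification, the stratum-level replicate is obtained here by recomputing the ratio-imputed stratum mean after deleting each first-phase unit, which stands in for the adjusted estimator rather than reproducing its construction step by step.

```python
def imputed_mean(units):
    """Ratio-imputed stratum mean: (ybar / xbar) * xbar' within one stratum."""
    respondents = [(x, y) for x, y in units if y is not None]
    ratio = sum(y for _, y in respondents) / sum(x for x, _ in respondents)
    return ratio * sum(x for x, _ in units) / len(units)


def stratum_jackknife(units):
    """Delete-one jackknife variance of the imputed mean within one stratum."""
    n_first = len(units)
    estimate = imputed_mean(units)
    replicates = [imputed_mean(units[:j] + units[j + 1:]) for j in range(n_first)]
    return (n_first - 1) / n_first * sum((r - estimate) ** 2 for r in replicates)


def stratified_variance(strata, weights):
    """Sum over strata of W_h^2 times the stratum-level jackknife variance."""
    return sum(weights[h] ** 2 * stratum_jackknife(units) for h, units in strata.items())


def pooled_stratified_variance(strata_a, strata_b, weights):
    """Average of the two samples' variances, weighted by first-phase sizes."""
    n_a = sum(len(units) for units in strata_a.values())
    n_b = sum(len(units) for units in strata_b.values())
    v_a = stratified_variance(strata_a, weights)
    v_b = stratified_variance(strata_b, weights)
    return (n_a * v_a + n_b * v_b) / (n_a + n_b)


# Hypothetical toy data with two strata (not from the talk).
weights = {"low": 0.4, "high": 0.6}
strata_1 = {
    "low": [(80.0, 65.0), (85.0, 66.5), (82.0, None), (88.0, None)],
    "high": [(105.0, 84.0), (110.0, 90.0), (115.0, 91.0), (108.0, None)],
}
strata_2 = {
    "low": [(83.0, 67.0), (87.0, 68.0), (81.0, None)],
    "high": [(112.0, 88.0), (106.0, 86.0), (118.0, None), (109.0, None)],
}
v_1 = stratified_variance(strata_1, weights)
v_2 = stratified_variance(strata_2, weights)
v_pooled = pooled_stratified_variance(strata_1, strata_2, weights)
```

Each stratum needs at least two respondents in each sample so that the ratio is still defined after a respondent is deleted.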

Simulation Study Design: Stratified Random Sampling
- Population size is $N = 10000$
- Pop Y is related to Pop X: $Y = 0.8X + \epsilon$
- Three strata: $X < 90$, $90 \le X \le 110$, $110 < X$.
- Stratum 1 size $N_1 = 1633$, so that $W_1 = N_1/N = 0.1633$.
- Stratum 2 size $N_2 = 6805$, so that $W_2 = N_2/N = 0.6805$.
- Stratum 3 size $N_3 = 1562$, so that $W_3 = N_3/N = 0.1562$.
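The stratum weights and the stratum assignment rule above amount to a few lines of arithmetic, sketched here in Python. The variable and function names are hypothetical; the stratum sizes and cut points are the ones stated on the slide.

```python
# Stratum sizes as stated in the simulation design.
stratum_sizes = {1: 1633, 2: 6805, 3: 1562}
N = sum(stratum_sizes.values())

# Stratum weights W_h = N_h / N.
weights = {h: n_h / N for h, n_h in stratum_sizes.items()}


def stratum_of(x):
    """Assign a unit to a stratum from its X value: X < 90, 90 <= X <= 110, X > 110."""
    if x < 90:
        return 1
    if x <= 110:
        return 2
    return 3
```

The weights sum to one because the three strata partition the population.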

Simulation Study Design: Stratified Random Sampling
[Figure: scatterplot of Population Y Values against Population X Values, with points labeled Stratum=1, Stratum=2, Stratum=3]

Simulation: Stratified Random Sampling
[Figure: One Sample Jackknife Variance, Stratified Sampling. Vertical axis: J. Variance; horizontal axis: Second Phase Sample Size; legend: 2*SecondPhase, 3*SecondPhase]

Simulation: Stratified Random Sampling
[Figure: Two Sample Jackknife Variance, Stratified Sampling. Vertical axis: J. Variance; horizontal axis: Second Phase Sample Size; legend: 2*SecondPhase, 3*SecondPhase]

Simulation: Stratified Random Sampling
[Figure: Comparison of One and Two Samples, Stratified RS. Vertical axis: St. Dev of Ests; horizontal axis: Second Phase Sample Size; legend: One Sample St. Dev, Two Sample St. Dev]

Simulation: Stratified Random Sampling
[Figure: Comparison of Complete and Missing Data (Stratified RS, First SS = 2*SecondPhase). Vertical axis: St. Dev of Ests; horizontal axis: Second Phase Sample Size; legend: Complete Data, Missing Data Case]

Calibration Approach to Jackknife Variance Estimation
Three major advantages of the calibration approach in survey sampling:
- Leads to consistent estimates.
- Provides an important class of techniques for the efficient combination of data sources.
- Has a computational advantage for estimates.
Apply the Tracy et al. (2003) calibration in stratified and double sampling to the jackknife variance estimator.

Conclusions and Future Study
- The jackknife variance estimator for two samples has a smaller SD (less variation) than the jackknife variance estimator for one sample.
- The adjusted jackknife variance estimator for two samples under stratified random sampling is more efficient than the adjusted jackknife variance estimator for two samples under SRS.
- As the sample sizes increase, the adjusted jackknife variance estimator for two samples is shown to be consistent.
- Future: apply the jackknife variance estimator for two samples to stratified multistage sampling.