Chapter 5: Models used in conjunction with sampling
J. Kim, W. Fuller (ISU)


Nonresponse
- Unit nonresponse: weight adjustment
- Item nonresponse: imputation

Two-Phase Setup for Unit Nonresponse
- Phase one ($A$): observe $x_i$
- Phase two ($A_R$): observe $(x_i, y_i)$
- $\pi_{1i} = \Pr[i \in A]$: phase-one inclusion probability (known)
- $\pi_{2i|1i} = \Pr[i \in A_R \mid i \in A]$: phase-two inclusion probability (unknown)
We are interested in estimating the population mean of $Y$ using the weighted mean of the observations:
$$\bar{y}_R = \frac{\sum_{i \in A_R} w_i y_i}{\sum_{i \in A_R} w_i}, \qquad w_i = \pi_{1i}^{-1}\,\hat{\pi}_{2i|1i}^{-1}.$$
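A minimal numerical sketch of this weighted respondent mean, assuming the phase-one probabilities and estimated response probabilities are already available as arrays; all names and numbers below are illustrative, not from the slides.

```python
import numpy as np

def nonresponse_adjusted_mean(y, pi1, pi2_hat, respondent):
    """Weighted mean over respondents with w_i = 1 / (pi_{1i} * pi_hat_{2i|1i})."""
    r = np.asarray(respondent, dtype=bool)
    w = 1.0 / (pi1[r] * pi2_hat[r])                # combined two-phase weights
    return np.sum(w * y[r]) / np.sum(w)

# toy illustration (hypothetical numbers)
y = np.array([3.0, 5.0, 4.0, 6.0, 2.0])
pi1 = np.full(5, 0.1)                              # known phase-one probabilities
pi2_hat = np.array([0.8, 0.6, 0.7, 0.9, 0.5])      # estimated response probabilities
respondent = np.array([1, 1, 0, 1, 0])             # R_i
print(nonresponse_adjusted_mean(y, pi1, pi2_hat, respondent))
```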

Two-Phase Setup for Unit Nonresponse (Cont'd)
Regression weighting approach:
$$\bar{y}_{reg,1} = \bar{x}_N'\hat{\beta} \quad\text{or}\quad \bar{y}_{reg,2} = \bar{x}_1'\hat{\beta},$$
where $\bar{x}_1 = \big(\sum_{i\in A}\pi_{1i}^{-1}\big)^{-1}\sum_{i\in A}\pi_{1i}^{-1}x_i$ and $\hat{\beta} = \big(\sum_{i\in A_R}\pi_{1i}^{-1}x_ix_i'\big)^{-1}\sum_{i\in A_R}\pi_{1i}^{-1}x_iy_i$.
Response model approach: make a parametric model assumption $\pi_{2i|1i} = p(x_i;\phi)$ and use $\hat{\pi}_{2i|1i} = p(x_i;\hat{\phi})$.
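A sketch of the regression weighting estimator $\bar{y}_{reg,1} = \bar{x}_N'\hat{\beta}$, assuming the population mean $\bar{x}_N$ of the auxiliary vector is known; the array names are illustrative.

```python
import numpy as np

def ybar_reg1(x_resp, y_resp, pi1_resp, xbar_N):
    """Regression weighting: beta_hat from respondents, evaluated at the known xbar_N."""
    d = 1.0 / pi1_resp                               # phase-one design weights
    XtWX = (x_resp * d[:, None]).T @ x_resp          # sum_i d_i x_i x_i'
    XtWy = (x_resp * d[:, None]).T @ y_resp          # sum_i d_i x_i y_i
    beta_hat = np.linalg.solve(XtWX, XtWy)
    return xbar_N @ beta_hat

# toy illustration with an intercept column (hypothetical numbers)
x_resp = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
y_resp = np.array([4.0, 5.0, 9.0])
pi1_resp = np.full(3, 0.1)
xbar_N = np.array([1.0, 3.5])
print(ybar_reg1(x_resp, y_resp, pi1_resp, xbar_N))
```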

Theorem 5.1.1
Theorem. Assume
(i) $V[N^{-1}\sum_{i\in A}\pi_{1i}^{-1}(x_i, y_i)\mid \mathcal{F}] = O(n^{-1})$,
(ii) $V[\hat{V}(\bar{Y}_{HT})\mid \mathcal{F}] = O(n^{-3})$,
(iii) $K_L < \pi_{2i|1i} < K_U$ and $\pi_{2i|1i}^{-1} = x_i'\alpha$ for some $\alpha$,
(iv) $x_i'\lambda = 1$ for all $i$, for some $\lambda$,
(v) the $R_i$ are independent.
Then
$$\bar{y}_{reg,1} - \bar{y}_N = \frac{1}{N}\sum_{i\in A_R}\pi_{2i}^{-1}e_i + O_p(n^{-1}),$$
where $\pi_{2i} = \pi_{1i}\pi_{2i|1i}$, $e_i = y_i - x_i'\beta_N$, and $\beta_N = \big(\sum_{i\in U}\pi_{2i|1i}x_ix_i'\big)^{-1}\sum_{i\in U}\pi_{2i|1i}x_iy_i$.

Proof of Theorem 5.1.1
Since $\hat{\beta} = \big(\sum_{i\in A_R}\pi_{1i}^{-1}x_ix_i'\big)^{-1}\sum_{i\in A_R}\pi_{1i}^{-1}x_iy_i$, we have $\hat{\beta} - \beta_N = O_p(n^{-1/2})$, where $\beta_N = \big(\sum_{i\in U}\pi_{2i|1i}x_ix_i'\big)^{-1}\sum_{i\in U}\pi_{2i|1i}x_iy_i$.
Write
$$\bar{y}_{reg,1} - \bar{y}_N = \bar{x}_N'\hat{\beta} - \bar{x}_N'\beta_N - \frac{1}{N}\sum_{i=1}^N (y_i - x_i'\beta_N).$$
Since $\pi_{2i|1i}^{-1} = x_i'\alpha$,
$$\frac{1}{N}\sum_{i=1}^N (y_i - x_i'\beta_N) = \frac{1}{N}\sum_{i=1}^N (\alpha'x_i)\,\pi_{2i|1i}\,(y_i - x_i'\beta_N) = 0$$
by the definition of $\beta_N$. Using $\pi_{2i|1i}^{-1} = x_i'\alpha$ and transforming $x_i$, one can show
$$\bar{x}_N'(\hat{\beta} - \beta_N) = \frac{1}{N}\sum_{i\in A_R}\pi_{2i}^{-1}e_i + O_p(n^{-1}).$$

Variance Estimation for $\bar{y}_{reg,1}$
$$\bar{y}_{reg,1} = \sum_{i\in A_R}\bar{x}_N'\Big(\sum_{j\in A_R}\pi_{1j}^{-1}x_jx_j'\Big)^{-1}\pi_{1i}^{-1}x_iy_i =: \frac{1}{N}\sum_{i\in A_R}\frac{1}{\pi_{1i}}\frac{1}{\hat{\pi}_{2i|1i}}\,y_i$$
For small $f = n/N$, let $\hat{b}_j = \hat{\pi}_{2j|1j}^{-1}\hat{e}_j$ with $\hat{e}_j = y_j - x_j'\hat{\beta}$, and use
$$\hat{V} = \frac{1}{N^2}\sum_{i\in A_R}\sum_{j\in A_R}\frac{\pi_{1ij}-\pi_{1i}\pi_{1j}}{\pi_{1ij}}\,\frac{\hat{b}_i}{\pi_{1i}}\,\frac{\hat{b}_j}{\pi_{1j}}.$$

Justification
Variance:
$$V\Big(\sum_{i\in A_R}w_{2i}e_i \,\Big|\, \mathcal{F}\Big) = \sum_{i\in U}\sum_{j\in U}(\pi_{2ij}-\pi_{2i}\pi_{2j})\,w_{2i}w_{2j}e_ie_j$$
$$= \sum_{i\neq j;\, i,j\in U}\pi_{2i|1i}\pi_{2j|1j}(\pi_{1ij}-\pi_{1i}\pi_{1j})\,w_{2i}w_{2j}e_ie_j + \sum_{i\in U}(\pi_{2i}-\pi_{2i}^2)\,w_{2i}^2e_i^2,$$
where
$$\pi_{2ij} = \begin{cases}\pi_{1ij}\,\pi_{2i|1i}\,\pi_{2j|1j} & i\neq j\\ \pi_{1i}\,\pi_{2i|1i} & i = j.\end{cases}$$

Justification (Cont'd)
Expectation of the variance estimator:
$$E\Big[\sum_{i\in A_R}\sum_{j\in A_R}\pi_{1ij}^{-1}(\pi_{1ij}-\pi_{1i}\pi_{1j})\,w_{2i}e_i\,w_{2j}e_j \,\Big|\, \mathcal{F}\Big]$$
$$= \sum_{i\in U}(\pi_{1i}-\pi_{1i}^2)\,\pi_{2i|1i}\,w_{2i}^2e_i^2 + \sum_{i\neq j;\,i,j\in U}\pi_{2i|1i}\pi_{2j|1j}(\pi_{1ij}-\pi_{1i}\pi_{1j})\,w_{2i}e_i\,w_{2j}e_j$$
$$= \sum_{i\in U}\sum_{j\in U}(\pi_{2ij}-\pi_{2i}\pi_{2j})\,w_{2i}e_i\,w_{2j}e_j + \sum_{i\in U}\pi_{2i}(\pi_{2i}-\pi_{1i})\,w_{2i}^2e_i^2,$$
where $w_{2i} = N^{-1}\pi_{2i}^{-1}$. The second term is the bias of the variance estimator and is of order $O(N^{-1})$.

Variance Estimation for $\bar{y}_{reg,2}$
$$\bar{y}_{reg,2} = \bar{x}_1'\hat{\beta}$$
$$\bar{y}_{reg,2} - \bar{y}_N = (\bar{x}_1 - \bar{x}_N)'\beta_N + \bar{x}_N'(\hat{\beta}-\beta_N) + O_p(n^{-1}) = (\bar{x}_1-\bar{x}_N)'\beta_N + N^{-1}\sum_{i\in A_R}\pi_{2i}^{-1}(y_i - x_i'\beta_N) + O_p(n^{-1}).$$
Variance estimator:
$$\hat{V}_2 = \frac{1}{N^2}\sum_{i\in A}\sum_{j\in A}\frac{\pi_{1ij}-\pi_{1i}\pi_{1j}}{\pi_{1ij}}\,\frac{\hat{b}_{i2}}{\pi_{1i}}\,\frac{\hat{b}_{j2}}{\pi_{1j}},$$
where $\hat{b}_{j2} = (x_j-\bar{x}_1)'\hat{\beta} + (N\bar{x}_1)'\Big(\sum_{i\in A_R}\pi_{1i}^{-1}x_ix_i'\Big)^{-1}x_j R_j\hat{e}_j$.

Response Model Approach (Propensity Score Approach)
Let
$$R_i = \begin{cases}1 & \text{if } y_i \text{ is observed}\\ 0 & \text{otherwise.}\end{cases}$$
Assume that the true response mechanism satisfies
$$\Pr(R=1\mid x, y) = \Pr(R=1\mid x) = p(x;\phi_0) \tag{1}$$
for some $\phi_0$. The first equality is often called missing at random (MAR).
Under the response model (1), a consistent estimator of $\phi_0$ can be obtained by solving
$$\hat{U}_h(\phi) \equiv \sum_{i\in A} d_i\left\{\frac{R_i}{p(x_i;\phi)} - 1\right\}h(x_i;\phi) = 0, \tag{2}$$
where $d_i = 1/\pi_i$, for some $h(x;\phi)$ such that $\partial\hat{U}_h(\phi)/\partial\phi$ is of full rank.

Once $\hat{\phi}_h$ is computed from (2), the propensity score adjusted (PSA) estimator of $Y = \sum_{i=1}^N y_i$ is given by
$$\hat{Y}_{PSA} = \sum_{i\in A_R} d_i\, g(x_i;\hat{\phi}_h)\, y_i, \tag{3}$$
where $g(x_i;\hat{\phi}_h) = \{p(x_i;\hat{\phi}_h)\}^{-1}$. The PSA estimator $\hat{Y}_{PSA}$ is asymptotically equivalent to
$$\tilde{Y}_{PSA} = \sum_{i\in A_R} d_i\, g(x_i;\phi_0)\, y_i + \Big\{\sum_{i\in A} d_i h_i - \sum_{i\in A_R} d_i\, g(x_i;\phi_0)\, h_i\Big\}' B_z, \tag{4}$$
where
$$B_z = \Big(\sum_{i=1}^N p_i z_i h_i'\Big)^{-1}\sum_{i=1}^N p_i z_i y_i,$$
$p_i = P(R_i = 1\mid x_i)$, and $z_i = \partial g(x_i;\phi)/\partial\phi$ evaluated at $\phi = \phi_0$.
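A sketch of one way to compute the PSA estimator, assuming a logistic response model $p(x;\phi) = \{1+\exp(-x'\phi)\}^{-1}$ and $h(x;\phi) = x$ in (2), solved by a plain Newton iteration. The function names, the toy data, and the fixed iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_response_prob(X, R, d, n_iter=50):
    """Solve sum_i d_i (R_i / p_i - 1) x_i = 0 for a logistic p(x; phi) by Newton steps."""
    phi = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ phi))
        U = X.T @ (d * (R / p - 1.0))                       # estimating function (2)
        # Jacobian: -sum_i d_i R_i (1 - p_i)/p_i x_i x_i'
        J = -(X * (d * R * (1.0 - p) / p)[:, None]).T @ X
        phi -= np.linalg.solve(J, U)                        # no step control, sketch only
    return phi

def psa_total(X, y, R, d):
    phi = fit_response_prob(X, R, d)
    p = 1.0 / (1.0 + np.exp(-X @ phi))
    r = R.astype(bool)
    return np.sum(d[r] * y[r] / p[r])                       # equation (3)

# toy illustration (hypothetical numbers); first column is an intercept
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = 2.0 + X[:, 1] + rng.normal(size=200)
R = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + X[:, 1]))))
d = np.full(200, 5.0)                                       # design weights 1/pi_i
print(psa_total(X, y, R, d))
```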

Thus, the asymptotic variance is
$$V(\tilde{Y}_{PSA}\mid\mathcal{F}_N) = V(\hat{Y}_{HT}\mid\mathcal{F}_N) + V\Big\{\sum_{i\in A_R} d_i\, p_i^{-1}(y_i - h_i'B_z)\,\Big|\,\mathcal{F}_N\Big\}$$
$$= V(\hat{Y}_{HT}\mid\mathcal{F}_N) + E\Big\{\sum_{i\in A} d_i^2(p_i^{-1}-1)(y_i - h_i'B_z)^2\,\Big|\,\mathcal{F}_N\Big\},$$
where $p_i = p(x_i;\phi_0)$ and the second equality follows from the independence of the $R_i$'s.

Note that
$$E\Big[\sum_{i\in A} d_i^2(p_i^{-1}-1)(y_i - h_i'B_z)^2\Big] = E\Big[\sum_{i\in A} d_i^2(p_i^{-1}-1)\{y_i - E(y_i\mid x_i) + E(y_i\mid x_i) - h_i'B_z\}^2\Big]$$
$$= E\Big[\sum_{i\in A} d_i^2(p_i^{-1}-1)\{y_i - E(y_i\mid x_i)\}^2\Big] + E\Big[\sum_{i\in A} d_i^2(p_i^{-1}-1)\{E(y_i\mid x_i) - h_i'B_z\}^2\Big],$$
and the cross-product term is zero because $y_i - E(y_i\mid x_i)$ is conditionally unbiased for zero, conditional on $x_i$ and $A$.

Thus, we have
$$V(\tilde{Y}_{PSA}\mid\mathcal{F}_N) \ge V(\hat{Y}_{HT}\mid\mathcal{F}_N) + E\Big[\sum_{i\in A} d_i^2(p_i^{-1}-1)\{y_i - E(y_i\mid x_i)\}^2\,\Big|\,\mathcal{F}_N\Big],$$
where the equality holds if $\hat{\phi}_h$ satisfies
$$\sum_{i\in A} d_i\left\{\frac{R_i}{p(x_i;\phi)} - 1\right\}E(Y\mid x_i) = 0. \tag{5}$$
Condition (5) provides a way of constructing an optimal PSA estimator. If $E(Y\mid x) = \beta_0 + \beta_1 x$, an optimal PSA estimator of $\theta$ can be obtained by solving
$$\sum_{i\in A} d_i\frac{R_i}{p(x_i;\phi)}(1, x_i) = \sum_{i\in A} d_i(1, x_i). \tag{6}$$

We now discuss variance estimation of PSA estimators of the form (3), where $\hat{p}_i = p_i(\hat{\phi})$ is constructed to satisfy (2). By (4), we can write
$$\hat{Y}_{PSA} = \sum_{i\in A} d_i\,\eta_i(\phi_0) + o_p(n^{-1/2}N), \tag{7}$$
where
$$\eta_i(\phi) = h_i'B_z + \frac{R_i}{p_i(\phi)}\big(y_i - h_i'B_z\big). \tag{8}$$
To derive the variance estimator, we assume that the variance estimator $\hat{V} = \sum_{i\in A}\sum_{j\in A}\Omega_{ij}q_iq_j$ satisfies $\hat{V}/V(\hat{q}_{HT}\mid\mathcal{F}_N) = 1 + o_p(1)$ for some $\Omega_{ij}$ related to the joint inclusion probabilities, where $\hat{q}_{HT} = \sum_{i\in A} d_iq_i$ for any $q$ with a finite fourth moment.

To obtain the total variance, the finite population is divided into two groups, a population of respondents and a population of nonrespondents, so the response indicator is extended to the entire population as $R_N = \{R_1, R_2, \ldots, R_N\}$. Given the population, the sample $A$ is selected according to a probability sampling design; thus the sample $A$ contains both respondents and nonrespondents. The total variance of $\hat{\eta}_{HT} = \sum_{i\in A} d_i\eta_i$ can be written as
$$V(\hat{\eta}_{HT}\mid\mathcal{F}_N) = E\{V(\hat{\eta}_{HT}\mid\mathcal{F}_N, R_N)\mid\mathcal{F}_N\} + V\{E(\hat{\eta}_{HT}\mid\mathcal{F}_N, R_N)\mid\mathcal{F}_N\} = V_1 + V_2. \tag{9}$$

The conditional variance term $V(\hat{\eta}_{HT}\mid\mathcal{F}_N, R_N)$ in (9) can be estimated by
$$\hat{V}_1 = \sum_{i\in A}\sum_{j\in A}\Omega_{ij}\,\hat{\eta}_i\hat{\eta}_j, \tag{10}$$
where $\hat{\eta}_i = \eta_i(\hat{\phi})$ is defined in (8) with $B_z$ replaced by a consistent estimator such as
$$\hat{B}_z = \Big(\sum_{i\in A_R} d_i\hat{z}_i h_i'\Big)^{-1}\sum_{i\in A_R} d_i\hat{z}_i y_i$$
and $\hat{z}_i = z(x_i;\hat{\phi})$ is the value of $z_i = \partial g(x_i;\phi)/\partial\phi$ evaluated at $\phi = \hat{\phi}$.

The second term $V_2$ in (9) is
$$V\{E(\hat{\eta}_{HT}\mid\mathcal{F}_N, R_N)\mid\mathcal{F}_N\} = V\Big(\sum_{i=1}^N \eta_i\,\Big|\,\mathcal{F}_N\Big) = \sum_{i=1}^N \frac{1-p_i}{p_i}\big(y_i - h_i'B_z\big)^2.$$
A consistent estimator of $V_2$ can be derived as
$$\hat{V}_2 = \sum_{i\in A_R} d_i\,\frac{1-\hat{p}_i}{\hat{p}_i^2}\big(y_i - h_i'\hat{B}_z\big)^2. \tag{11}$$

Therefore,
$$\hat{V}(\hat{Y}_{PSA}) = \hat{V}_1 + \hat{V}_2 \tag{12}$$
is consistent for the variance of the PSA estimator defined in (3) with $\hat{p}_i = p_i(\hat{\phi})$ satisfying (2), where $\hat{V}_1$ is in (10) and $\hat{V}_2$ is in (11).
Note that the first term of the total variance is $V_1 = O_p(n^{-1}N^2)$, but the second term is $V_2 = O_p(N)$. Thus, when the sampling fraction $nN^{-1}$ is negligible, that is, $nN^{-1} = o(1)$, the second term $V_2$ can be ignored and $\hat{V}_1$ is a consistent estimator of the total variance. Otherwise, the second term $V_2$ should be taken into consideration so that a consistent variance estimator can be constructed as in (12).
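A sketch of the two-component estimator (12), specialized to simple random sampling so that $\hat{V}_1$ reduces to the usual SRS variance formula applied to the linearized values $\hat{\eta}_i$, with $h_i = x_i$ and a logistic response model. This specialization and the names below are illustrative assumptions, not the general formula with arbitrary $\Omega_{ij}$.

```python
import numpy as np

def psa_variance_srs(X, y, R, p_hat, N):
    """V_hat = V1_hat + V2_hat (eq. 12) for the PSA total under SRS, with h_i = x_i."""
    n = len(R)
    d = np.full(n, N / n)                                    # SRS design weights 1/pi_i
    r = R.astype(bool)
    # For a logistic model, dg/dphi is proportional to (1-p)/p * x; the constant of
    # proportionality cancels inside B_z, so z_i = (1-p)/p * x_i suffices here.
    z = X * ((1.0 - p_hat) / p_hat)[:, None]
    Bz = np.linalg.solve((z[r] * d[r, None]).T @ X[r],
                         (z[r] * d[r, None]).T @ y[r])       # B_z_hat (slide before last)
    resid = np.zeros(n)
    resid[r] = y[r] - X[r] @ Bz
    eta = X @ Bz + resid / p_hat                             # eq. (8) at phi = phi_hat
    V1 = N**2 * (1.0 - n / N) * np.var(eta, ddof=1) / n      # SRS variance of sum d_i eta_i
    V2 = np.sum(d[r] * (1.0 - p_hat[r]) / p_hat[r]**2 * resid[r]**2)   # eq. (11)
    return V1 + V2
```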

5.2 Imputation
Meaning: fill in missing values with a plausible value (or a set of plausible values).
Why imputation?
- It provides a complete data file, so standard complete-data methods can be applied.
- By filling in missing values, analyses from different users are consistent.
- With a proper choice of imputation model, we may reduce the nonresponse bias.
- We do not want to delete records with partial information: imputation makes full use of the available information (i.e., reduces the variance).

Basic setup
- $y_i$: study variable, subject to missingness.
- $x_i$: auxiliary variable, always observed.
- $R_i$: response indicator for $y_i$.
Imputed estimator of the total $Y = \sum_{i=1}^N y_i$:
$$\hat{Y}_I = \sum_{i\in A}\frac{1}{\pi_i}\{R_iy_i + (1-R_i)y_i^*\}, \tag{13}$$
where $y_i^*$ is the imputed value of $y_i$. How do we find $y_i^*$?
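A minimal sketch of the imputed estimator (13); how to construct the imputed values $y_i^*$ is the subject of the following slides, and all names and numbers here are illustrative.

```python
import numpy as np

def imputed_total(y, y_star, R, pi):
    """Y_hat_I = sum_i (1/pi_i) { R_i y_i + (1 - R_i) y_i^* }."""
    y_filled = np.where(np.asarray(R) == 1, y, y_star)   # observed value when available
    return np.sum(y_filled / pi)

# toy illustration (hypothetical numbers)
y      = np.array([7.0, np.nan, np.nan, 14.0, 3.0])
y_star = np.array([0.0, 9.0, 5.0, 0.0, 0.0])             # imputed values for missing cases
R      = np.array([1, 0, 0, 1, 1])
pi     = np.full(5, 0.1)
print(imputed_total(y, y_star, R, pi))                   # (7+9+5+14+3)/0.1 = 380
```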

Lemma 1: If $y_i$ is not observed when $R_i = 0$ and we can find $y_i^*$ satisfying
$$E(y_i^*\mid R_i = 0) = E(y_i\mid R_i = 0), \tag{14}$$
then the imputed estimator $\hat{Y}_I$ in (13) is unbiased for $Y$ in the sense that $E(\hat{Y}_I - Y) = 0$.
How to get $y_i^*$ satisfying (14)?
- Deterministic imputation: use an estimator of $E(y_i\mid R_i = 0)$.
- Stochastic imputation: generate $y_i^*$ from $f(y_i\mid R_i = 0)$.

Approaches to specifying the conditional distribution $f(y_i\mid R_i = 0)$:
- Assume Missing Completely At Random (MCAR):
$$f(y_i\mid R_i = 0) = f(y_i\mid R_i = 1). \tag{15}$$
Under MCAR, we can estimate the parameters using the set of respondents. However, MCAR may not be realistic.
- Assume that there exists an auxiliary vector $x_i$ such that
$$f(y_i\mid x_i, R_i = 0) = f(y_i\mid x_i, R_i = 1). \tag{16}$$
Condition (16) is called Missing At Random (MAR). Under MAR, we have
$$E(y_i\mid R_i = 0) = E\{E(y_i\mid x_i, R_i = 0)\mid R_i = 0\} = E\{E(y_i\mid x_i, R_i = 1)\mid R_i = 0\}.$$
Thus, we only have to generate $y_i^*$ from $f(y_i\mid x_i, R_i = 1)$.

Lemma 2: Let $y_i^*$ be the imputed value of $y_i$. If
$$E(y_i^*\mid x_i, R_i = 1) = E(y_i\mid x_i, R_i = 1) \tag{17}$$
and the MAR condition holds, then the imputed estimator $\hat{Y}_I$ in (13) is unbiased.

When does the MAR condition hold? If the response mechanism satisfies
$$\Pr(R_i = 1\mid y_i, x_i) = \Pr(R_i = 1\mid x_i),$$
then (16) holds.
Commonly used imputation methods
1. Business surveys: ratio, regression, nearest neighbor imputation.
2. Socio-economic surveys: random donor (within classes), stochastic ratio or regression, fractional imputation, multiple imputation.

A Hot Deck Imputation Procedure
- Partition the sample into $G$ groups: $A = A_1\cup A_2\cup\cdots\cup A_G$.
- In group $g$, we have $n_g$ elements, $r_g$ respondents, and $m_g = n_g - r_g$ nonrespondents.
- For each group $A_g$, select $m_g$ imputed values from the $r_g$ respondents with replacement (or without replacement).
- Imputation model: $y_i \sim \mathrm{iid}(\mu_g, \sigma_g^2)$, $i\in A_g$ (respondents and nonrespondents).
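A minimal sketch of this procedure, drawing donors with replacement within each cell; the function name and toy data are illustrative assumptions.

```python
import numpy as np

def hot_deck_impute(y, R, cell, rng=None):
    """Within each imputation cell, fill missing y by donors drawn with replacement."""
    rng = np.random.default_rng(rng)
    y_imp = y.copy()
    for g in np.unique(cell):
        resp = (cell == g) & (R == 1)
        miss = (cell == g) & (R == 0)
        if miss.any():
            y_imp[miss] = rng.choice(y[resp], size=miss.sum(), replace=True)
    return y_imp

# toy illustration (hypothetical numbers)
y    = np.array([7.0, np.nan, np.nan, 14.0, 3.0, 15.0, 8.0, 9.0, 2.0, np.nan])
R    = np.where(np.isnan(y), 0, 1)
cell = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
print(hot_deck_impute(y, R, cell, rng=0))
```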

Example 5.2.1: Hot Deck Imputation Under SRS
- $A_g = A_{Rg}\cup A_{Mg}$ with $A_{Rg} = \{i\in A_g;\, R_i = 1\}$ and $A_{Mg} = \{i\in A_g;\, R_i = 0\}$.
- Imputation: $y_j^* = y_i$ with probability $1/r_g$ for $i\in A_{Rg}$ and $j\in A_{Mg}$.
- Imputed estimator of $\bar{y}_N$:
$$\bar{y}_I = n^{-1}\sum_{i\in A}\{R_iy_i + (1-R_i)y_i^*\} =: n^{-1}\sum_{i\in A} y_{Ii}$$

Variance of Hot Deck Imputed Mean
$$V(\bar{y}_I) = V\{E_I(\bar{y}_I\mid \mathbf{y}_n)\} + E\{V_I(\bar{y}_I\mid \mathbf{y}_n)\} = V\Big\{n^{-1}\sum_{g=1}^G n_g\bar{y}_{Rg}\Big\} + E\Big\{n^{-2}\sum_{g=1}^G m_g\big(1 - r_g^{-1}\big)S_{Rg}^2\Big\},$$
where $\bar{y}_{Rg} = r_g^{-1}\sum_{i\in A_{Rg}}y_i$, $S_{Rg}^2 = (r_g-1)^{-1}\sum_{i\in A_{Rg}}(y_i - \bar{y}_{Rg})^2$, and $\mathbf{y}_n = (y_1, y_2, \ldots, y_n)$.

Variance of Hot Deck Imputed Mean (2)
Model: $y_i \sim \mathrm{iid}(\mu_g, \sigma_g^2)$, $i\in A_g$.
$$V\{\bar{y}_I\} = V\{\bar{y}_n\} + n^{-2}\sum_{g=1}^G\big\{n_gm_gr_g^{-1} + m_g(1-r_g^{-1})\big\}\sigma_g^2 = V\{\bar{y}_n\} + n^{-2}\sum_{g=1}^G c_g\sigma_g^2$$
- Reduced sample size: $n^{-2}\sum_{g=1}^G n_g^2\big(r_g^{-1} - n_g^{-1}\big)\sigma_g^2$
- Randomness due to stochastic imputation: $n^{-2}\sum_{g=1}^G m_g\big(1 - r_g^{-1}\big)\sigma_g^2$

Variance Estimation
- Naive approach: treat the imputed values as if they were observed.
- The naive approach underestimates the true variance!
Example. Naive estimator: $\hat{V}_I = n^{-1}S_I^2$, where
$$E\{S_I^2\} = E\Big\{(n-1)^{-1}\sum_{i=1}^n(y_{Ii}-\bar{y}_I)^2\Big\} \doteq (n-1)^{-1}\Big[\sum_{i=1}^n E\{(y_{Ii}-\mu)^2\} - nV\{\bar{y}_I\}\Big] \doteq E(S_{y,n}^2).$$
Bias-corrected estimator:
$$\hat{V} = \hat{V}_I + n^{-2}\sum_{g=1}^G c_g S_{Rg}^2.$$
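A sketch of the naive and bias-corrected variance estimators under SRS, using $c_g = n_g m_g/r_g + m_g(1 - 1/r_g)$ from the variance decomposition two slides back; the function name is illustrative.

```python
import numpy as np

def hot_deck_variance(y_imp, y, R, cell):
    """Naive variance n^{-1} S_I^2 plus the correction n^{-2} sum_g c_g S_{Rg}^2."""
    n = len(y_imp)
    V_naive = np.var(y_imp, ddof=1) / n
    correction = 0.0
    for g in np.unique(cell):
        in_g = (cell == g)
        n_g = in_g.sum()
        r_g = (in_g & (R == 1)).sum()
        m_g = n_g - r_g
        S2_Rg = np.var(y[in_g & (R == 1)], ddof=1)      # respondent variance in cell g
        c_g = n_g * m_g / r_g + m_g * (1.0 - 1.0 / r_g)
        correction += c_g * S2_Rg
    return V_naive + correction / n**2
```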

Other Approaches for Variance Estimation
- Multiple imputation: Rubin (1987)
- Adjusted jackknife: Rao and Shao (1992)
- Fractional imputation: Kim and Fuller (2004), Kim (2011)
- Linearization: Shao and Steel (1999), Kim and Rao (2009)

Fractional Imputation
Basic idea:
- Split the record with a missing item into $M$ imputed values.
- Assign fractional weights.

Example 5.2.1: Artificial bivariate data from SRS

Table 5.1: Sample with missing data

ID   Weight  Cell for x  Cell for y  x  y
1    0.10    1           1           1  7
2    0.10    1           1           2  M
3    0.10    1           2           3  M
4    0.10    1           1           M  14
5    0.10    1           2           1  3
6    0.10    2           1           2  15
7    0.10    2           2           3  8
8    0.10    2           1           3  9
9    0.10    2           2           2  2
10   0.10    2           1           M  M

M: missing

- $x$: categorical variable with three categories (1, 2, 3). Two imputation cells:
  Cell 1: $x = 1$ with observed prob. 0.50, $x = 2$ with observed prob. 0.25, $x = 3$ with observed prob. 0.25.
  Cell 2: $x = 1$ with observed prob. 0.00, $x = 2$ with observed prob. 0.50, $x = 3$ with observed prob. 0.50.
- $y$: continuous variable. Four possible donors for cell one and three possible donors for cell two.

Fractional hot deck imputation:

Table 5.2: Fractionally Imputed Data Set

ID   Donor x  Donor y  w*_ij0   Final Weight  Cell for x  Cell for y  x  y
1    0        0                 0.1000        1           1           1  7
2    0        1        0.3333   0.0289        1           1           2  7
     0        6        0.3333   0.0396        1           1           2  15
     0        8        0.3333   0.0315        1           1           2  9
3    0        5        0.3333   0.0333        1           2           3  3
     0        7        0.3333   0.0333        1           2           3  8
     0        9        0.3333   0.0333        1           2           3  2
4    -        0        0.5000   0.0500        1           1           1  14
     -        0        0.2500   0.0250        1           1           2  14
     -        0        0.2500   0.0250        1           1           3  14
5    0        0                 0.1000        1           2           1  3
6    0        0                 0.1000        2           1           2  15
7    0        0                 0.1000        2           2           3  8
8    0        0                 0.1000        2           1           3  9
9    0        0                 0.1000        2           2           2  2
10   -        8        0.2500   0.0225        2           1           2  9
     -        4        0.2500   0.0275        2           1           2  14
     -        1        0.2500   0.0209        2           1           3  7
     -        6        0.2500   0.0291        2           1           3  15

Q. How to compute the fractional weights?
[Step 1] Compute the cell mean $\bar{z}_c$ of the control variable $z_i$, where $z_i$ consists of $I(x_i=1)$, $I(x_i=2)$, $I(x_i=3)$, and $y_i$.
[Step 2] Apply the calibration weighting method to find $w_{ij}^*$ satisfying
$$\sum_{i\in A_{Rc}} w_i z_i + \sum_{i\in A_{Mc}} w_i\sum_{j=1}^M w_{ij}^* z_{ij}^* = \sum_{i\in A_c} w_i\bar{z}_c \quad\text{and}\quad \sum_{j=1}^M w_{ij}^* = 1.$$

Nearest Neighbor Imputation
Models for nearest neighbor imputation
Model: $Y_i = g(x_i, \beta) + e_i$
- Semiparametric model: if $g(\cdot)$ were known, one could use model-based imputation such as $Y_i^* = g(x_i, \hat{\beta}) + \hat{e}_i^*$.
- Nonparametric model: the form of $g(\cdot)$ is unknown, except that it is a smooth function of $x_i$.

Models for nearest neighbor imputation
Fay (1999)'s model:
$$E_\zeta(y_i) = E_\zeta\big(y_{nn1(i)}\big) = E_\zeta\big(y_{nn2(i)}\big), \qquad \mathrm{Var}_\zeta(y_i) = \mathrm{Var}_\zeta\big(y_{nn1(i)}\big) = \mathrm{Var}_\zeta\big(y_{nn2(i)}\big),$$
and the $y$-variables are uncorrelated, where $nn1(i)$ is the index of the nearest neighbor of unit $i$ and $nn2(i)$ is the index of the second nearest neighbor of unit $i$.

Models for nearest neighbor imputation
Alternative representation of Fay (1999)'s model:
$$Y_j \overset{\mathrm{indep}}{\sim} (\mu_{g_i}, \sigma_{g_i}^2), \quad j\in A_{g_i},$$
where $A_{g_i} = \{i, nn1(i), nn2(i)\}$ is the index set containing unit $i$ and its two nearest neighbors. Thus, Fay (1999)'s model is a special case of the cell mean model with a much finer cell definition. Kim and Fuller (2004) proposed a variance estimation method for fractional imputation under the cell mean model.

Example: Simple Random Sample

Element  Sample Weight  Auxiliary Variable (House Rent)  Y (Income)
1        0.1            1                                1
2        0.1            1                                2
3        0.1            1                                3
4        0.1            1                                4
5        0.1            1                                5
6        0.1            1                                ?
7        0.1            0                                3
8        0.1            0                                6
9        0.1            0                                9
10       0.1            0                                ?

? : missing

Example: Jackknife for Fractional Imputation in Kim and Fuller (2004)

Unit  w_i w*_ij  X  Y  w^(1)_i w*^(1)_ij   w^(2)_i w*^(2)_ij  ...  w^(5)_i w*^(5)_ij
1     0.10       1  1  0                   0.111              ...  0.111
2     0.10       1  2  0.111               0                  ...  0.111
3     0.10       1  3  0.111               0.111              ...  0.111
4     0.10       1  4  0.111               0.111              ...  0.111
5     0.10       1  5  0.111               0.111              ...  0
6     0.05       1  1  0.111(0.5 - δ1)     0.055              ...  0.111(0.5 + δ5)
      0.05       1  5  0.111(0.5 + δ1)     0.055              ...  0.111(0.5 - δ5)
7     0.10       0  3  0.111               0.111              ...  0.111
8     0.10       0  6  0.111               0.111              ...  0.111
9     0.10       0  9  0.111               0.111              ...  0.111
10    0.05       0  3  0.055               0.055              ...  0.055
      0.05       0  6  0.056               0.056              ...  0.056

Example (Continued)

Unit  w_i w*_ij  X  Y  w^(6)_i w*^(6)_ij  w^(7)_i w*^(7)_ij  ...  w^(10)_i w*^(10)_ij
1     0.10       1  1  0.111              0.111              ...  0.111
2     0.10       1  2  0.111              0.111              ...  0.111
3     0.10       1  3  0.111              0.111              ...  0.111
4     0.10       1  4  0.111              0.111              ...  0.111
5     0.10       1  5  0.111              0.111              ...  0.111
6     0.05       1  1  0                  0.055              ...  0.055
      0.05       1  5  0                  0.056              ...  0.056
7     0.10       0  3  0.111              0                  ...  0.111
8     0.10       0  6  0.111              0.111              ...  0.111
9     0.10       0  9  0.111              0.111              ...  0.111
10    0.05       0  3  0.055              0.111(0.5 - δ7)    ...  0
      0.05       0  6  0.056              0.111(0.5 + δ7)    ...  0

Example: Continued
If $\theta = E(X)$:
$$\hat{\theta}_I^{(k)} - \hat{\theta}_I = \hat{\theta}_n^{(k)} - \hat{\theta}_n, \quad k = 1, 2, \ldots, 10.$$
If $\theta = E(Y)$:
$$\hat{\theta}_I^{(1)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(1)} - \hat{\theta}_I + \delta_1(y_5 - y_1)$$
$$\hat{\theta}_I^{(5)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(5)} - \hat{\theta}_I + \delta_5(y_1 - y_5)$$
$$\hat{\theta}_I^{(7)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(7)} - \hat{\theta}_I + \delta_7(y_8 - y_7)$$
$$\hat{\theta}_I^{(8)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(8)} - \hat{\theta}_I + \delta_8(y_7 - y_8)$$
$$\hat{\theta}_I^{(k)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(k)} - \hat{\theta}_I, \quad k = 2, 3, 4, 6, 9, 10.$$

Variance estimation for fractional imputation
The variance estimator is a function of the $\delta_k$'s:
$$\hat{V}_\delta = \sum_{k=1}^L c_k\Big(\sum_{i\in A_R}\alpha_{\delta,i}^{(k)}y_i - \sum_{i\in A_R}\alpha_i y_i\Big)^2$$
- Naive variance estimator ($\delta_k \equiv 0$): underestimation.
- Increasing $\delta_k$ increases the value of the variance estimator.
How to choose $\delta_k$?
$$E\big(\hat{V}_\delta\big) - \mathrm{Var}\big(\hat{\theta}_I\big) = E\Big[\sum_{g=1}^G\sum_{i\in A_{Rg}}\Big\{\sum_{k=1}^L c_k\big(\alpha_{\delta,i}^{(k)} - \alpha_i\big)^2 - \alpha_i^2\Big\}\sigma_g^2\Big]$$

Variance estimation for fractional imputation
Kim and Fuller (2004) showed that if
$$\sum_{j} w_{ij}^{*(k)} = 1 \quad\text{for each } i \text{ and each } k \tag{C.1}$$
and
$$\sum_{i\in A_{Rg}}\sum_{k=1}^L c_k\big(\alpha_i^{(k)} - \alpha_i\big)^2 = \sum_{i\in A_{Rg}}\alpha_i^2, \tag{C.2}$$
then the replication variance estimator defined by
$$\hat{V}_I = \sum_{k=1}^L c_k\big(\hat{\theta}_I^{(k)} - \hat{\theta}_I\big)^2, \quad\text{where } \hat{\theta}_I^{(k)} = \sum_{i\in A_R}\alpha_i^{(k)}y_i,$$
is unbiased for the total variance under the cell mean model.

Variance estimation for fractional imputation
- Condition (C.1) is needed so that the replication weights can also be used for other, completely responding variables.
- Condition (C.2) is used to unbiasedly estimate the imputation variance.
- The $\delta_k$ is determined by solving a quadratic equation in $\delta_k$:
$$\sum_{i\in A_{Rg}} c_k\Big\{\big(\alpha_{\delta,i}^{(k)} - \alpha_i\big)^2 - \big(\alpha_{0,i}^{(k)} - \alpha_i\big)^2\Big\} = \alpha_k^2 - \sum_{s=1}^L c_s\big(\alpha_{0,k}^{(s)} - \alpha_k\big)^2,$$
where $\alpha_{0,i}^{(k)} = \sum_{j\in A} w_j^{(k)} w_{ji}^* d_{ji}$ is the $k$-th replicate of the total weight of donor $i$ under the naive variance estimator ($\delta_k \equiv 0$).

Example: Continued
Calculation of $\delta_1$: solve
$$0.9\{0.044 - 0.111\delta_1 - 0.14\}^2 + 0.9\{0.178 + 0.111\delta_1 - 0.16\}^2 - 0.9\{0.044 - 0.14\}^2 - 0.9\{0.178 - 0.16\}^2$$
$$= 0.14^2 - 0.9\{0.044 - 0.14\}^2 - 0.9\{0.155 - 0.14\}^2 - 8\times 0.9\{0.111 - 0.14\}^2,$$
which gives $\delta_1 = 0.303$. Similarly, we can calculate $\delta_5 = 0.429$, $\delta_7 = 0.377$, and $\delta_8 = 0.377$.

Conclusion
- Kim, Fuller, and Bell (2011) applied the method to variance estimation for the 2000 US Census long form income data.
- Nonparametric method: theoretically challenging but practically very attractive (and popular).
Other references
1. Chen and Shao (2001)
2. Beaumont and Bocci (2009)
3. Kim and Fuller (2004), Fuller and Kim (2005), Kim (2011)

Small area estimation
Basic setup
- The original sample $A$ is decomposed into $G$ domains such that $A = A_1\cup\cdots\cup A_G$ and $n = n_1 + \cdots + n_G$.
- $n$ is large but $n_g$ can be very small.
Direct estimator of $Y_g = \sum_{i\in U_g} y_i$:
$$\hat{Y}_{d,g} = \sum_{i\in A_g}\frac{1}{\pi_i}\,y_i$$
- Unbiased.
- May have high variance.

If some auxiliary information is available, then we can do better.
Synthetic estimator of $Y_g$:
$$\hat{Y}_{s,g} = X_g'\hat{\beta},$$
where $X_g = \sum_{i\in U_g}x_i$ is the known total of $x_i$ in $U_g$ and $\hat{\beta}$ is an estimated regression coefficient.
- Low variance (if $x_i$ does not contain the domain indicator).
- Could be biased (unless $\sum_{i\in U_g}(y_i - x_i'B) = 0$).

Composite estimation: consider
$$\hat{Y}_{c,g} = \alpha_g\hat{Y}_{d,g} + (1-\alpha_g)\hat{Y}_{s,g}$$
for some $\alpha_g\in(0,1)$. We are interested in finding the $\alpha_g^*$ that minimizes the MSE of $\hat{Y}_{c,g}$. The optimal choice is
$$\alpha_g^* = \frac{MSE(\hat{Y}_{s,g})}{MSE(\hat{Y}_{d,g}) + MSE(\hat{Y}_{s,g})}.$$
- For the direct estimator, $MSE(\hat{Y}_{d,g}) = V(\hat{Y}_{d,g})$ can be estimated.
- For the synthetic estimator, $MSE(\hat{Y}_{s,g}) = E\{(\hat{Y}_{s,g} - Y_g)^2\}$ cannot be computed directly without assuming some error model.
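A one-line sketch of the composite estimator with the optimal weight, assuming MSE estimates for both components are available (how to obtain them is the subject of the next slides); the numbers are hypothetical.

```python
def composite(Y_direct, Y_synth, mse_direct, mse_synth):
    """Composite estimate with alpha_g = MSE(synth) / (MSE(direct) + MSE(synth))."""
    alpha = mse_synth / (mse_direct + mse_synth)
    return alpha * Y_direct + (1 - alpha) * Y_synth

print(composite(120.0, 100.0, mse_direct=36.0, mse_synth=9.0))  # alpha = 0.2, result 104.0
```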

Area level estimation
Basic setup
- Parameter of interest: $\bar{Y}_g = N_g^{-1}\sum_{i\in U_g}y_i$.
- Model: $\bar{Y}_g = X_g'\beta + u_g$ with $u_g\sim(0, \sigma_u^2)$.
- Also, we have $\hat{\bar{Y}}_{d,g}\sim(\bar{Y}_g, V_g)$ with $V_g = V(\hat{\bar{Y}}_{d,g})$.

The two models can be written as
$$\hat{\bar{Y}}_{d,g} = \bar{Y}_g + e_g, \qquad X_g'\beta = \bar{Y}_g - u_g,$$
where $e_g$ and $u_g$ are independent error terms with mean zero and variances $V_g$ and $\sigma_u^2$, respectively. Thus, the best linear unbiased predictor (BLUP) can be written as
$$\hat{\bar{Y}}_g^* = \alpha_g^*\hat{\bar{Y}}_{d,g} + \big(1 - \alpha_g^*\big)X_g'\beta, \tag{18}$$
where $\alpha_g^* = \sigma_u^2/(V_g + \sigma_u^2)$.

MSE: If $\beta$, $V_g$, and $\sigma_u^2$ are known, then
$$MSE\big(\hat{\bar{Y}}_g^*\big) = V\big(\hat{\bar{Y}}_g^* - \bar{Y}_g\big) = V\big\{\alpha_g^*\big(\hat{\bar{Y}}_{d,g} - \bar{Y}_g\big) + \big(1-\alpha_g^*\big)\big(X_g'\beta - \bar{Y}_g\big)\big\}$$
$$= \big(\alpha_g^*\big)^2V_g + \big(1-\alpha_g^*\big)^2\sigma_u^2 = \alpha_g^*V_g = \big(1-\alpha_g^*\big)\sigma_u^2.$$
Note that, since $0 < \alpha_g^* < 1$,
$$MSE\big(\hat{\bar{Y}}_g^*\big) < V_g \quad\text{and}\quad MSE\big(\hat{\bar{Y}}_g^*\big) < \sigma_u^2.$$

When $\beta$ is unknown (and $V_g$ and $\sigma_u^2$ are known):
$$\hat{\beta} = \Big(\sum_{g=1}^G w_gX_gX_g'\Big)^{-1}\sum_{g=1}^G w_gX_g\hat{\bar{Y}}_{d,g},$$
where $w_g = \big(\sigma_u^2 + V_g\big)^{-1}$. The EBLUP is
$$\hat{\bar{Y}}_g^*(\hat{\beta}) = \alpha_g^*\hat{\bar{Y}}_{d,g} + \big(1-\alpha_g^*\big)X_g'\hat{\beta}, \tag{19}$$
which takes the form of the composite estimator.

The MSE is
$$MSE\big\{\hat{\bar{Y}}_g^*(\hat{\beta})\big\} = V\big\{\hat{\bar{Y}}_g^*(\hat{\beta}) - \bar{Y}_g\big\} = V\big\{\alpha_g^*\big(\hat{\bar{Y}}_{d,g} - \bar{Y}_g\big) + \big(1-\alpha_g^*\big)\big(X_g'\hat{\beta} - \bar{Y}_g\big)\big\}$$
$$= \big(\alpha_g^*\big)^2V_g + \big(1-\alpha_g^*\big)^2\big\{\sigma_u^2 + X_g'V(\hat{\beta})X_g\big\} = \alpha_g^*V_g + \big(1-\alpha_g^*\big)^2X_g'V(\hat{\beta})X_g,$$
where
$$V(\hat{\beta}) = \Big(\sum_{g=1}^G w_gX_gX_g'\Big)^{-1}.$$

If $\beta$ and $\sigma_u^2$ are unknown:
1. Find consistent estimators of $\beta$ and $\sigma_u^2$.
2. Use
$$\hat{\bar{Y}}_g^*(\hat{\alpha}_g, \hat{\beta}) = \hat{\alpha}_g\hat{\bar{Y}}_{d,g} + \big(1-\hat{\alpha}_g\big)X_g'\hat{\beta}, \tag{20}$$
where $\hat{\alpha}_g = \hat{\sigma}_u^2/(\hat{V}_g + \hat{\sigma}_u^2)$.
Estimation of $\sigma_u^2$: method of moments,
$$\hat{\sigma}_u^2 = \frac{G}{G-p}\sum_{g=1}^G k_g\Big\{\big(\hat{\bar{Y}}_{d,g} - X_g'\hat{\beta}\big)^2 - \hat{V}_{d,g}\Big\},$$
where $k_g \propto \big\{\hat{\sigma}_u^2 + \hat{V}_g\big\}^{-1}$ and $\sum_{g=1}^G k_g = 1$. If $\hat{\sigma}_u^2$ is negative, set it to zero.
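A sketch of the area-level EBLUP (20), iterating between the weighted regression for $\hat{\beta}$ and a moment estimate of $\sigma_u^2$. For simplicity the moment step below uses the plain unweighted form $\hat{\sigma}_u^2 = (G-p)^{-1}\sum_g\{(\hat{\bar{Y}}_{d,g} - X_g'\hat{\beta})^2 - V_g\}$, a common variant rather than exactly the weighted formula on the slide; names and numbers are illustrative.

```python
import numpy as np

def area_level_eblup(ybar_d, X, V, n_iter=20):
    """Area-level EBLUP: ybar_d ~ (X beta + u_g, V_g), u_g ~ (0, sigma_u^2)."""
    G, p = X.shape
    sigma2_u = max(np.var(ybar_d, ddof=1) - np.mean(V), 0.0)     # starting value
    for _ in range(n_iter):
        w = 1.0 / (sigma2_u + V)
        beta = np.linalg.solve((X * w[:, None]).T @ X, (X * w[:, None]).T @ ybar_d)
        resid = ybar_d - X @ beta
        sigma2_u = max(np.sum(resid**2 - V) / (G - p), 0.0)      # method of moments
    alpha = sigma2_u / (sigma2_u + V)
    return alpha * ybar_d + (1 - alpha) * (X @ beta), sigma2_u

# toy illustration (hypothetical numbers)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(15), rng.normal(size=15)])
V = np.full(15, 0.5)
ybar_d = X @ np.array([2.0, 1.0]) + rng.normal(scale=1.0, size=15) + rng.normal(scale=np.sqrt(V))
print(area_level_eblup(ybar_d, X, V))
```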

MSE
$$MSE\big\{\hat{\bar{Y}}_g^*(\hat{\alpha}_g, \hat{\beta})\big\} = V\big\{\hat{\bar{Y}}_g^*(\hat{\alpha}_g, \hat{\beta}) - \bar{Y}_g\big\} = V\big\{\hat{\alpha}_g\big(\hat{\bar{Y}}_{d,g} - \bar{Y}_g\big) + \big(1-\hat{\alpha}_g\big)\big(X_g'\hat{\beta} - \bar{Y}_g\big)\big\}$$
$$= \big(\alpha_g^*\big)^2V_g + \big(1-\alpha_g^*\big)^2\big\{\sigma_u^2 + X_g'V(\hat{\beta})X_g\big\} + V(\hat{\alpha}_g)\big\{V_g + \sigma_u^2\big\}$$
$$= \alpha_g^*V_g + \big(1-\alpha_g^*\big)^2X_g'V(\hat{\beta})X_g + V(\hat{\alpha}_g)\big\{V_g + \sigma_u^2\big\}.$$
MSE estimation (Prasad and Rao, 1990):
$$\widehat{MSE}\big\{\hat{\bar{Y}}_g^*(\hat{\alpha}_g, \hat{\beta})\big\} = \hat{\alpha}_g\hat{V}_g + \big(1-\hat{\alpha}_g\big)^2X_g'\hat{V}(\hat{\beta})X_g + 2\hat{V}(\hat{\alpha}_g)\big\{\hat{V}_g + \hat{\sigma}_u^2\big\}.$$

Extensions
Unit level estimation: Battese, Harter, and Fuller (1988). Use a unit-level model
$$y_{gi} = x_{gi}'\beta + u_g + e_{gi}$$
and
$$\hat{Y}_g = \sum_{i\in U_g}\big\{x_{gi}'\hat{\beta} + \hat{u}_g\big\}, \quad\text{where}\quad \hat{u}_g = \hat{E}\big(u_g\mid \hat{\bar{X}}_g, \hat{\bar{Y}}_g\big) = \frac{\sigma_u^2}{\sigma_u^2 + \hat{V}_g}\big(\hat{\bar{Y}}_g - \hat{\bar{X}}_g'\hat{\beta}\big).$$
It can be shown that
$$\hat{\bar{Y}}_g^* = \hat{\alpha}_g\bar{Y}_{reg,g} + \big(1-\hat{\alpha}_g\big)\bar{Y}_{s,g},$$
where $\bar{Y}_{reg,g} = \hat{\bar{Y}}_{d,g} + \big(\bar{X}_g - \hat{\bar{X}}_{d,g}\big)'\hat{\beta}$ and $\bar{Y}_{s,g} = \bar{X}_g'\hat{\beta}$.

Benchmarked small area estimation: Wang, Fuller, and Qu (2009).
- The sum of the small area estimates is not necessarily equal to $\hat{Y} = \sum_{i\in A}\frac{1}{\pi_i}y_i$.
- It is desired that the benchmarking condition holds:
$$\sum_{g=1}^G N_g\hat{\bar{Y}}_g^* = \hat{Y}.$$
- Idea: since $\hat{\bar{Y}}_g^* = X_g'\hat{\beta} + \alpha_g^*\big(\hat{\bar{Y}}_{d,g} - X_g'\hat{\beta}\big)$, we can adjust $\sigma_u^2$ so that
$$\sum_{g=1}^G N_g\,\alpha_g^*\big(\hat{\bar{Y}}_{d,g} - X_g'\hat{\beta}\big) = 0.$$
For other applications, read Small Area Estimation by Rao (2003).

Measurement Error
Measurement error: errors due to inaccurate measurement (e.g., interviewer effect; ambiguous questions; inaccurate memory; impossible to measure directly).
Two aspects:
1. Bias: very hard to measure; a validation subsample is needed.
2. Variance: repeated independent determinations for the same item.

Measurement Error Model
- $X_i$: observed value of item $x$ for unit $i$.
- $x_i$: true value of item $x$ for unit $i$.
Measurement error Model 1:
$$X_i = x_i + u_i,$$
where $u_i\mid x_i \sim \mathrm{iid}(0, \sigma_u^2)$.
Measurement error Model 2:
$$X_i = \gamma_0 + \gamma_1x_i + u_i,$$
where $u_i\mid x_i \sim \mathrm{iid}(0, \sigma_u^2)$. A possible model if the $x_i$ are observed in a validation sample.

Simple Estimators
Horvitz-Thompson estimator: $\hat{T}_X = \sum_{i\in A}w_iX_i$. Parameter: $T_x = \sum_{i=1}^N x_i$.
1. The H-T estimator is unbiased under Model 1:
$$\hat{T}_X - T_x = \big(\hat{T}_x - T_x\big) + \big(\hat{T}_X - \hat{T}_x\big).$$
The first term has zero mean over the sampling mechanism and the second term has zero mean under Model 1.
2. Variance:
$$V\big\{\hat{T}_X - T_x\mid \mathcal{F}_x\big\} = V\big\{\hat{T}_x - T_x\mid \mathcal{F}_x\big\} + E\Big\{\sum_{i\in A}w_i^2\sigma_u^2\Big\} = V\big\{\hat{T}_x - T_x\mid \mathcal{F}_x\big\} + E\Big\{\sum_{i=1}^N w_i\sigma_u^2\Big\}$$

Naive Variance Estimator
$$\hat{V}\big(\hat{T}_X\big) = \sum_{i\in A}\sum_{j\in A}\pi_{ij}^{-1}\big(\pi_{ij} - \pi_i\pi_j\big)w_iw_jX_iX_j$$
$$E\big\{\hat{V}(\hat{T}_X)\big\} = E\big\{\hat{V}(\hat{T}_x)\mid \mathcal{F}_x\big\} + E\big\{\hat{V}(\hat{T}_u)\big\} = V\big(\hat{T}_x - T_x\mid \mathcal{F}_x\big) + \sum_{i=1}^N(1-\pi_i)w_i\sigma_u^2.$$
The bias of $\hat{V}(\hat{T}_X)$ as an estimator of $V\{\hat{T}_X - T_x\mid \mathcal{F}_x\}$ is $-N\sigma_u^2$, which is negligible if $n/N = o(1)$.
Remark: the bias is negligible under the assumption that the measurement errors are independent. If that assumption does not hold, the variance estimator is no longer asymptotically unbiased and a different variance estimator may be needed.

Complex Estimators
Observe $(X_i, y_i)$, where $X_i$ is subject to measurement error.
Regression model (under Model 1):
$$y_i = \beta_0 + \beta_1x_i + e_i, \qquad X_i = x_i + u_i,$$
where $e_i\mid x_i \sim (0, \sigma_e^2)$, $u_i\mid x_i \sim (0, \sigma_u^2)$, and $e_i$ and $u_i$ are independent.
Naive approach: the OLS estimator associated with
$$y_i = \beta_0 + \beta_1X_i + a_i,$$
where $a_i = e_i - \beta_1u_i$. The OLS estimator is biased because $E(a_i\mid X_i)\neq 0$.

Regression Coefficient
Under the assumptions on $e_i$ and $u_i$,
$$E\Big\{n^{-1}\sum_{i=1}^n\big(X_i-\bar{X}\big)y_i\mid \mathcal{F}_x\Big\} = n^{-1}\sum_{i=1}^n(x_i-\bar{x})(\beta_0+\beta_1x_i) + o(1)$$
and
$$E\Big\{n^{-1}\sum_{i=1}^n\big(X_i-\bar{X}\big)^2\mid \mathcal{F}_x\Big\} = n^{-1}\sum_{i=1}^n(x_i-\bar{x})^2 + \sigma_u^2 + o(1),$$
so we have
$$E\big\{\hat{\beta}_{1,OLS}\mid \mathcal{F}_x\big\} \doteq \beta_1\frac{\sigma_{xx}}{\sigma_{xx}+\sigma_u^2} = \beta_1\kappa_{xx},$$
where $\sigma_{xx} = E\big\{n^{-1}\sum_{i=1}^n(x_i-\bar{x})^2\big\}$. Thus, the effect of measurement error is to bias the slope estimate toward zero. Bias of this nature is commonly referred to as attenuation.

If $\kappa_{xx}$ is known, a simple bias-correction method is to use
$$\hat{\beta}_1 = \kappa_{xx}^{-1}\hat{\beta}_{1,OLS}.$$
If $\sigma_u^2$ is known, then we use
$$\hat{\beta}_1 = \big\{S_X^2 - \sigma_u^2\big\}^{-1}S_{XY}.$$
If the ratio $\sigma_e^2/\sigma_u^2$ is known, a bias-corrected method can be obtained by minimizing
$$Q(x_1, \ldots, x_n, \beta_0, \beta_1) = \sum_{i=1}^n\Big\{\frac{(y_i - \beta_0 - \beta_1x_i)^2}{\sigma_e^2} + \frac{(X_i - x_i)^2}{\sigma_u^2}\Big\}.$$
Here the $x_i$ are treated as parameters. This is essentially the least squares method with measurement errors. The solution can be obtained by minimizing
$$Q(\beta_0, \beta_1) = \sum_{i=1}^n\frac{(y_i - \beta_0 - \beta_1X_i)^2}{\sigma_e^2 + \beta_1^2\sigma_u^2}.$$
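A small simulation sketch illustrating attenuation and the correction $\hat{\beta}_1 = \{S_X^2 - \sigma_u^2\}^{-1}S_{XY}$ when $\sigma_u^2$ is known; all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta0, beta1, sigma_u = 5000, 1.0, 2.0, 0.8
x = rng.normal(0.0, 1.0, n)                    # true covariate
X = x + rng.normal(0.0, sigma_u, n)            # observed with measurement error
y = beta0 + beta1 * x + rng.normal(0.0, 0.5, n)

S_XX = np.var(X, ddof=1)
S_XY = np.cov(X, y, ddof=1)[0, 1]
beta1_ols = S_XY / S_XX                        # attenuated toward zero
beta1_corrected = S_XY / (S_XX - sigma_u**2)   # bias-corrected, sigma_u^2 known
print(beta1_ols, beta1_corrected)              # roughly 2/(1 + 0.64) ~ 1.2 vs about 2.0
```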