Fractional Imputation in Survey Sampling: A Comparative Review

Shu Yang and Jae-Kwang Kim
Iowa State University

Joint Statistical Meetings, August 2015
Outline

- Introduction
- Fractional imputation
- Features
- Numerical illustration
- Conclusion

Yang & Kim (Iowa State University), Fractional Imputation, 2015 JSM
Introduction: Basic Setup

- U = {1, 2, ..., N}: finite population.
- A ⊂ U: sample, selected by a probability sampling design.
- Under complete response, suppose that
  $\hat{\eta}_{n,g} = \sum_{i \in A} w_i g(y_i)$
  is an unbiased estimator of $\eta_g = N^{-1} \sum_{i=1}^{N} g(y_i)$.
- Here, g(·) is a known function. For example, g(y) = I(y < 3) leads to $\eta_g = P(Y < 3)$.
Introduction: Basic Setup (Cont'd)

- A = A_R ∪ A_M, where the y_i are observed in A_R and missing in A_M.
- R_i = 1 if i ∈ A_R and R_i = 0 if i ∈ A_M.
- $y_i^*$: imputed value for y_i, i ∈ A_M.
- Imputed estimator of $\eta_g$:
  $\hat{\eta}_{I,g} = \sum_{i \in A_R} w_i g(y_i) + \sum_{i \in A_M} w_i g(y_i^*)$
- Need $E\{g(y_i^*) \mid R_i = 0\} = E\{g(y_i) \mid R_i = 0\}$.
Introduction: ML estimation under the missing-data setup

- Often, we can find x (always observed) such that missing at random (MAR) holds:
  $f(y \mid x, R = 0) = f(y \mid x)$.
- Imputed values are created from f(y | x). Computing the conditional expectation can be challenging:
  1. We do not know the true parameter θ in f(y | x) = f(y | x; θ):
     $E\{g(y_i) \mid x_i\} = E\{g(y_i) \mid x_i; \theta\}$ depends on the unknown θ.
  2. Even if we know θ, computing the conditional expectation can be numerically difficult.
Introduction: Imputation

Imputation: a Monte Carlo approximation of the conditional expectation given the observed data,
  $E\{g(y_i) \mid x_i\} \approx \frac{1}{M} \sum_{j=1}^{M} g(y_i^{*(j)})$.

1. Bayesian approach: generate $y_i^*$ from the posterior predictive distribution
   $f(y_i \mid x_i, y_{obs}) = \int f(y_i \mid x_i, \theta)\, p(\theta \mid x, y_{obs})\, d\theta$.
2. Frequentist approach: generate $y_i^*$ from $f(y_i \mid x_i; \hat{\theta})$, where $\hat{\theta}$ is a consistent estimator.
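As a concrete sketch of the frequentist approach, suppose $f(y \mid x; \theta)$ is a normal linear model; the fitted parameter values below are hypothetical placeholders, not numbers from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted parameters of f(y | x; theta-hat) = N(b0 + b1*x, s^2).
b0, b1, s = 1.0, 0.5, 1.0

def mc_expectation(x_i, g, M=100_000):
    """Monte Carlo approximation of E{g(y) | x_i} using draws from f(y | x_i; theta-hat)."""
    draws = rng.normal(b0 + b1 * x_i, s, size=M)
    return g(draws).mean()

# With g(y) = I(y < 3), this approximates P(y < 3 | x = 2),
# which equals Phi(1) ~ 0.84 under the parameters above.
p_hat = mc_expectation(2.0, lambda y: y < 3.0)
```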
Comparison

                      Bayesian                 Frequentist
Model                 Posterior distribution   Prediction model
                      f(latent, θ | data)      f(latent | data, θ)
Computation           Data augmentation        EM algorithm
Prediction            I-step                   E-step
Parameter update      P-step                   M-step
Parameter estimation  Posterior mode           ML estimation
Imputation            Multiple imputation      Fractional imputation
Variance estimation   Rubin's formula          Linearization or resampling
Fractional Imputation: Basic Idea

Approximate $E\{g(y_i) \mid x_i\}$ by
  $E\{g(y_i) \mid x_i\} \approx \sum_{j=1}^{M_i} w_{ij}^* g(y_i^{*(j)})$,
where $w_{ij}^*$ is the fractional weight assigned to $y_i^{*(j)}$, the j-th imputed value of $y_i$.

The fractional weights satisfy $\sum_{j=1}^{M_i} w_{ij}^* = 1$.
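A minimal numeric illustration of the weighted approximation, using hypothetical imputed values and unnormalized weights for a single missing unit:

```python
import numpy as np

# Hypothetical imputed values y_i^{*(j)} for one unit and unnormalized weights.
y_imp = np.array([2.1, 3.4, 0.9, 1.7])
w_raw = np.array([0.8, 0.3, 1.1, 0.6])

w_star = w_raw / w_raw.sum()            # fractional weights: sum_j w*_ij = 1
g_hat = np.sum(w_star * (y_imp < 3.0))  # approximates E{I(y_i < 3) | x_i}
```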
Fractional Imputation for categorical data

If y_i is a categorical variable, we can use
- M_i = total number of possible values of y_i;
- $y_i^{*(j)}$ = the j-th possible value of y_i;
- $w_{ij}^* = P(y_i = y_i^{*(j)} \mid x_i; \hat{\theta})$, where $\hat{\theta}$ is the pseudo MLE of θ.
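In the categorical case the imputed values enumerate the whole support and the fractional weights are the estimated category probabilities; the probabilities below are hypothetical fitted values for one unit:

```python
import numpy as np

# Categorical y_i with support {1, 2, 3}; the fractional weights are
# w*_ij = P(y_i = j | x_i; theta-hat) -- hypothetical fitted values here.
support = np.array([1, 2, 3])
w_star = np.array([0.2, 0.5, 0.3])   # already sum to one

# Fractional-imputation estimate of E{I(y_i < 3) | x_i}.
g_hat = np.sum(w_star * (support < 3))
```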
Parametric Fractional Imputation (Kim, 2011)

More generally, write $y_i = (y_{i1}, \ldots, y_{ip})$, partitioned into $(y_{i,obs}, y_{i,mis})$.

1. More than one (say M) imputed values of $y_{i,mis}$, denoted $y_{i,mis}^{*(1)}, \ldots, y_{i,mis}^{*(M)}$, are generated from some density $h(y_{i,mis} \mid y_{i,obs})$.
2. Create the weighted data set
   $\{(w_i w_{ij}^*, y_{ij}^*);\ j = 1, \ldots, M;\ i \in A\}$,
   where $\sum_{j=1}^{M} w_{ij}^* = 1$, $y_{ij}^* = (y_{i,obs}, y_{i,mis}^{*(j)})$,
   $w_{ij}^* \propto f(y_{ij}^*; \hat{\theta}) / h(y_{i,mis}^{*(j)} \mid y_{i,obs})$,
   $\hat{\theta}$ is the (pseudo) maximum likelihood estimator of θ, and f(y; θ) is the joint density of y.
3. The weights $w_{ij}^*$ are normalized importance weights and can be called fractional weights.
EM algorithm using PFI

1. Initial imputation: generate $y_{i,mis}^{*(j)} \sim h(y_{i,mis} \mid y_{i,obs})$.
2. E-step: compute
   $w_{ij(t)}^* \propto f(y_{ij}^*; \hat{\theta}_{(t)}) / h(y_{i,mis}^{*(j)} \mid y_{i,obs})$,
   where $\sum_{j=1}^{M} w_{ij(t)}^* = 1$.
3. M-step: update $\hat{\theta}_{(t+1)}$ as the solution to
   $\sum_{i \in A} \sum_{j=1}^{M} w_i w_{ij(t)}^* S(\theta; y_{ij}^*) = 0$,
   where $S(\theta; y) = \partial \log f(y; \theta) / \partial \theta$ is the score function of θ.
4. Repeat Steps 2 and 3 until convergence.

We may add an optional step that checks whether $w_{ij(t)}^*$ is too large for some j; in that case, $h(y_{i,mis} \mid y_{i,obs})$ needs to be changed.
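To make the steps concrete, here is a minimal self-contained sketch for a normal regression model $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ with simulated missingness; the data, proposal choice, and equal sampling weights $w_i = 1$ are illustrative assumptions, not part of the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: y | x ~ N(1 + 2x, 1), with about 30% of y missing
# completely at random (a simplification; w_i = 1 for all units).
n = 500
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
miss = rng.random(n) < 0.3
obs = ~miss

# Proposal h(y_mis | y_obs): a normal fit from the complete cases.
Xo = np.column_stack([np.ones(obs.sum()), x[obs]])
bh, *_ = np.linalg.lstsq(Xo, y[obs], rcond=None)
sh = np.std(y[obs] - Xo @ bh)

# Step 1 (initial imputation): M draws per missing unit from h.
M = 50
xm = x[miss]
mu_h = bh[0] + bh[1] * xm[:, None]
y_imp = mu_h + sh * rng.standard_normal((miss.sum(), M))

def norm_dens(u, mu, s):
    # Density up to the 1/sqrt(2*pi) constant, which cancels on normalization.
    return np.exp(-0.5 * ((u - mu) / s) ** 2) / s

h_dens = norm_dens(y_imp, mu_h, sh)
theta = np.array([bh[0], bh[1], sh])
for _ in range(50):
    # E-step: fractional weights w*_ij(t) proportional to f(y*_ij; theta_t) / h(y*_ij).
    f_dens = norm_dens(y_imp, theta[0] + theta[1] * xm[:, None], theta[2])
    w = f_dens / h_dens
    w /= w.sum(axis=1, keepdims=True)        # sum_j w*_ij(t) = 1
    # M-step: solve the weighted normal score equations (weighted least squares).
    X_all = np.concatenate([x[obs], np.repeat(xm, M)])
    y_all = np.concatenate([y[obs], y_imp.ravel()])
    wt = np.concatenate([np.ones(obs.sum()), w.ravel()])
    Xd = np.column_stack([np.ones(X_all.size), X_all])
    WX = Xd * wt[:, None]
    beta = np.linalg.solve(Xd.T @ WX, WX.T @ y_all)
    sig = np.sqrt(np.sum(wt * (y_all - Xd @ beta) ** 2) / wt.sum())
    theta = np.array([beta[0], beta[1], sig])
```

After convergence, `theta` should be close to the data-generating values (1, 2, 1), up to sampling error.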
Approximation: Calibration Fractional Imputation

In large-scale survey sampling, we prefer a smaller M. Two-step method for fractional imputation:

1. Create a set of fractionally imputed data of size nM (say M = 1000).
2. Use an efficient sampling and weighting method to get a final set of fractionally imputed data of size nm (say m = 10).

Thus, we treat the step-one imputed data as a finite population and the step-two imputed data as a sample. We can use an efficient sampling technique (such as systematic sampling or stratification) to obtain the final imputed data and use a calibration technique for the fractional weighting.
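A sketch of the second step for a single unit, assuming step one produced M = 1000 imputed values with fractional weights; systematic sampling proportional to those weights selects m = 10 of them. The values and weights are simulated placeholders, and the calibration adjustment of the final weights is omitted:

```python
import numpy as np

rng = np.random.default_rng(2)

# Step-one output for one unit: M imputed values with fractional weights.
M, m = 1000, 10
y_big = rng.normal(size=M)
w_big = rng.random(M)
w_big /= w_big.sum()

# Step two: systematic sampling proportional to the fractional weights.
cum = np.cumsum(w_big)
points = (rng.random() + np.arange(m)) / m
picks = np.minimum(np.searchsorted(cum, points), M - 1)  # guard float round-off
y_small = y_big[picks]
w_small = np.full(m, 1.0 / m)   # equal final fractional weights (pre-calibration)
```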
Why Fractional Imputation?

1. To improve the efficiency of the point estimator (vs. single imputation).
2. To obtain valid frequentist inference without the congeniality condition (vs. multiple imputation).
3. To handle an informative sampling mechanism.

We will discuss the third issue first.
Informative sampling design

- Let f(y | x) be the conditional distribution of y given x.
- x is always observed, but y is subject to missingness.
- A sampling design is called noninformative (w.r.t. f) if it satisfies
  $f(y \mid x, I = 1) = f(y \mid x)$,   (1)
  where I_i = 1 if i ∈ A and I_i = 0 otherwise.
- If (1) does not hold, the sampling design is informative.
Missing At Random

Two versions of Missing At Random (MAR):

1. PMAR (Population Missing At Random): $Y \perp R \mid X$.
2. SMAR (Sample Missing At Random): $Y \perp R \mid (X, I)$.

Fractional imputation assumes PMAR, while multiple imputation assumes SMAR. Under a noninformative sampling design, PMAR = SMAR.
Imputation under informative sampling

Two approaches under informative sampling when PMAR holds:

1. Weighting approach: use the weighted score equation to estimate θ in f(y | x; θ). The imputed values are generated from $f(y \mid x; \hat{\theta})$.
2. Augmented model approach: include w in the model covariates to get the augmented model f(y | x, w; φ). The augmented model makes the sampling design noninformative in the sense that f(y | x, w) = f(y | x, w, I = 1). The imputed values are generated from $f(y \mid x, w; \hat{\phi})$, where $\hat{\phi}$ is computed from the unweighted score equation.
Berg, Kim, and Skinner (2015)

[Figure: A Directed Acyclic Graph (DAG) with nodes X, Y, U, W, I, R for a setup where PMAR holds but SMAR does not hold. Variable U is latent in the sense that it is never observed.]

Here f(y | x, R) = f(y | x) holds, but f(y | x, w, R) ≠ f(y | x, w).
Imputation under informative sampling

The weighting approach generates imputed values from f(y | x, R = 1), and
  $\hat{\theta}_I = \sum_{i \in A} w_i \{R_i y_i + (1 - R_i) y_i^*\}$   (2)
is unbiased under PMAR.

The augmented model approach generates imputed values from f(y | x, w, I = 1, R = 1), and (2) is unbiased when
  $f(y \mid x, w, I = 1, R = 1) = f(y \mid x, w, I = 1, R = 0)$   (3)
holds. PMAR does not necessarily imply condition (3).

Fractional imputation is based on the weighting approach.
Numerical illustration

- A pseudo finite population constructed from a single month of data from the Monthly Retail Trade Survey (MRTS) at the U.S. Census Bureau.
- N = 7,260 retail business units in five strata.
- Three variables in the data: h (stratum), x_hi (inventory values), and y_hi (sales).
[Figure: Box plots of sales and inventory values by stratum, on the log scale.]
Imputation model

$\log(y_{hi}) = \beta_{0h} + \beta_1 \log(x_{hi}) + e_{hi}$, where $e_{hi} \sim N(0, \sigma^2)$.
[Figure: Residuals-vs-fitted plot and normal Q-Q plot for the regression of log(y) against log(x) and the stratum indicators.]
Stratified random sampling

Table: Sample allocation in stratified simple random sampling.

Stratum              1      2      3      4      5
Stratum size N_h     352    566    1963   2181   2198
Sample size n_h      28     32     46     46     48
Sampling weight      12.57  17.69  42.67  47.41  45.79
Response mechanism: PMAR

- Variable x_hi is always observed; only y_hi is subject to missingness.
- PMAR: $R_{hi} \sim \mathrm{Bernoulli}(\pi_{hi})$, with $\pi_{hi} = 1/[1 + \exp\{4 - 0.3 \log(x_{hi})\}]$.
- The overall response rate is about 0.6.
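The response mechanism can be simulated directly. The log-inventory distribution below (normal with mean 15, matching the rough range of the MRTS box plots) is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical log-inventory values; N(15, 1) is an illustrative assumption.
log_x = rng.normal(15.0, 1.0, 100_000)

# Response mechanism from the slide.
pi = 1.0 / (1.0 + np.exp(4.0 - 0.3 * log_x))
R = rng.random(log_x.size) < pi
rate = R.mean()   # close to the quoted overall response rate of about 0.6
```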
Simulation Study

Table 1: Monte Carlo bias and variance of the point estimators.

Parameter   Estimator        Bias   Variance   Std Var
θ = E(Y)    Complete sample  0.00   0.42       100
            MI               0.00   0.59       134
            FI               0.00   0.58       133

Table 2: Monte Carlo relative bias of the variance estimator.

Parameter   Imputation   Relative bias (%)
V(θ̂)        MI           18.4
            FI           2.7
Discussion

Rubin's formula is based on the decomposition
  $V(\hat{\theta}_{MI}) = V(\hat{\theta}_n) + V(\hat{\theta}_{MI} - \hat{\theta}_n)$,
where $\hat{\theta}_n$ is the complete-sample estimator of θ. Basically, the $W_M$ term estimates $V(\hat{\theta}_n)$ and the $(1 + M^{-1}) B_M$ term estimates $V(\hat{\theta}_{MI} - \hat{\theta}_n)$.

In the general case, however,
  $V(\hat{\theta}_{MI}) = V(\hat{\theta}_n) + V(\hat{\theta}_{MI} - \hat{\theta}_n) + 2\,\mathrm{Cov}(\hat{\theta}_{MI} - \hat{\theta}_n, \hat{\theta}_n)$,
and Rubin's variance estimator ignores the covariance term. Thus, a sufficient condition for the validity of the variance estimator is $\mathrm{Cov}(\hat{\theta}_{MI} - \hat{\theta}_n, \hat{\theta}_n) = 0$. Meng (1994) called this condition congeniality. Congeniality holds when $\hat{\theta}_n$ is the MLE of θ (a self-efficient estimator).
Discussion

For example, there are two estimators of θ = E(Y) when log(Y) follows $N(\beta_0 + \beta_1 x, \sigma^2)$:

1. Maximum likelihood method:
   $\hat{\theta}_{MLE} = n^{-1} \sum_{i=1}^{n} \exp\{\hat{\beta}_0 + \hat{\beta}_1 x_i + 0.5 \hat{\sigma}^2\}$
2. Method of moments:
   $\hat{\theta}_{MME} = n^{-1} \sum_{i=1}^{n} y_i$

The MME of θ = E(Y) does not satisfy congeniality, and Rubin's variance estimator is biased (R.B. = 58.5%). Rubin's variance estimator is essentially unbiased for the MLE of θ (R.B. = -1.9%), but the MLE is rarely used in practice.
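The two point estimators can be compared on simulated data; the regression design and parameter values below are hypothetical. Both target the same θ = E(Y), so their values should agree up to sampling error:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate log(y) = b0 + b1*x + e, e ~ N(0, sig^2); theta = E(Y).
n = 5000
x = rng.normal(0.0, 1.0, n)
log_y = 0.5 + 0.3 * x + rng.normal(0.0, 0.4, n)
y = np.exp(log_y)

# Method of moments: the sample mean of y.
theta_mme = y.mean()

# Maximum likelihood: fit the lognormal regression, then average the
# model-implied means exp(b0-hat + b1-hat*x_i + 0.5*sig2-hat).
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, log_y, rcond=None)
sig2 = np.mean((log_y - X @ beta) ** 2)
theta_mle = np.mean(np.exp(X @ beta + 0.5 * sig2))
```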
Summary

- Fractional imputation was developed as a frequentist imputation method; multiple imputation is motivated from a Bayesian framework.
- The frequentist validity of multiple imputation requires congeniality. Fractional imputation does not require the congeniality condition and works well for method-of-moments estimators.
- Under informative sampling, the augmented model approach does not necessarily achieve SMAR. Fractional imputation uses the weighting approach for informative sampling.
The end