6. Fractional Imputation in Survey Sampling
1 Introduction

Consider a finite population of N units identified by a set of indices U = {1, 2, ..., N} with N known. Associated with each unit i in the population are study variables z_i = (x_i, y_i), where y_i = (y_{i1}, ..., y_{ip}) is the vector of study variables subject to missingness and x_i is the vector of auxiliary variables that are always observed. We are interested in estimating η, defined as the (unique) solution to the population estimating equation

    Σ_{i=1}^N U(η; z_i) = 0.

Examples of η include:

1. Population mean: U(η; x, y) = y − η
2. Population proportion of Y less than q: U(η; x, y) = I(y < q) − η
3. Population p-th quantile: U(η; x, y) = I(y < η) − p
4. Population regression coefficient: U(η; x, y) = (y − xη)x
5. Domain mean: U(η; x, y) = (y − η)d(x)

Let A denote the set of indices for the units in a sample selected by a probability sampling mechanism with sample size n. Under complete response, a consistent estimator of η is obtained by solving

    Σ_{i∈A} w_i U(η; z_i) = 0,    (1)

where w_i = 1/π_i is the inverse of the first-order inclusion probability. Under some regularity conditions, we can establish that the solution η̂_n to (1) converges in probability to η and is asymptotically normally distributed.
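As a concrete illustration of (1), the sketch below (pure Python; the design weights and data are hypothetical) solves the weighted estimating equation by bisection, assuming Σ_{i∈A} w_i U(η; z_i) is monotone decreasing in η on the bracketing interval.

```python
# Sketch: solving the design-weighted estimating equation (1) by bisection.
# The weights and data below are hypothetical; w_i = 1/pi_i are design weights.

def solve_ee(U, w, y, lo, hi, tol=1e-10):
    """Find eta with sum_i w_i * U(eta; y_i) = 0, assuming the weighted
    sum is monotone decreasing in eta on [lo, hi]."""
    def g(eta):
        return sum(wi * U(eta, yi) for wi, yi in zip(w, y))
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

w = [2.0, 2.0, 4.0, 4.0]   # design weights 1/pi_i (hypothetical)
y = [1.0, 3.0, 2.0, 6.0]

# Population mean: U(eta; y) = y - eta.
eta_mean = solve_ee(lambda eta, yi: yi - eta, w, y, lo=0.0, hi=10.0)

# Population proportion of Y less than q = 3: U(eta; y) = I(y < q) - eta.
eta_prop = solve_ee(lambda eta, yi: float(yi < 3.0) - eta, w, y, lo=0.0, hi=1.0)
```

For U(η; x, y) = y − η the root has the closed form Σ_{i∈A} w_i y_i / Σ_{i∈A} w_i (the Hájek estimator), which the bisection reproduces.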
Things to consider

- Complex sampling: unequal probabilities of selection, multi-stage sampling.
- General-purpose estimation: we do not know which parameter η the data analyst will use at the time of imputation.
- Multivariate missingness with an arbitrary missing pattern.
- We cannot use a large M. (We do not want to create a huge data file.)

Two concepts of missing at random (MAR) (for simplicity, assume p = 1):

- Population missing at random (PMAR): MAR holds at the population level,

      f(y | x) = f(y | x, δ),

  that is, Y ⊥ δ | x.

- Sample missing at random (SMAR): MAR holds at the sample level,

      f(y | x, I = 1) = f(y | x, I = 1, δ),

  that is, Y ⊥ δ | (x, I = 1), where I_i = 1 if unit i ∈ A and I_i = 0 otherwise.

If the sampling design is such that P(I = 1 | x, y) = P(I = 1 | x), which is often called a noninformative sampling design, then PMAR implies SMAR. For an informative sampling design, PMAR does not necessarily imply SMAR.

Under PMAR, an imputed value y*_i of a missing y_i satisfies

    E{y*_i | x_i, δ_i = 0} = E{y_i | x_i, δ_i = 0},    (2)

while, under SMAR, it satisfies

    E{y*_i | x_i, I_i = 1, δ_i = 0} = E{y_i | x_i, I_i = 1, δ_i = 0}.    (3)

Roughly speaking, fractional imputation is based on the PMAR assumption, while multiple imputation (which will be covered next week) is based on the SMAR assumption.
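The gap between PMAR and SMAR under informative sampling can be illustrated by a small Monte Carlo sketch (pure Python; the population model, response mechanism, and selection probabilities are all hypothetical). Response is independent of y given x, so PMAR holds, but selection depends on y for nonrespondents, so the distribution of y among sampled nonrespondents is tilted and SMAR fails.

```python
import random

# Monte Carlo sketch: PMAR holds in the population, but under an informative
# design (selection depends on y and delta) the y-distribution of sampled
# nonrespondents differs from that of population nonrespondents.

rng = random.Random(1)
N = 100_000
pop = []
for _ in range(N):
    x = 1.0 if rng.random() < 0.5 else 0.0
    y = x + rng.gauss(0.0, 1.0)
    delta = rng.random() < 0.5                          # response independent of y: PMAR
    p_sel = 0.5 if delta else (0.8 if y > x else 0.2)   # informative selection
    I = rng.random() < p_sel
    pop.append((x, y, delta, I))

pop_nonresp = [y for (x, y, d, I) in pop if x == 0.0 and not d]         # delta = 0
smp_nonresp = [y for (x, y, d, I) in pop if x == 0.0 and not d and I]   # and I = 1

mean_pop = sum(pop_nonresp) / len(pop_nonresp)   # near 0 under the model
mean_smp = sum(smp_nonresp) / len(smp_nonresp)   # shifted upward by selection
```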
2 Parametric fractional imputation

We assume that the finite population at hand is a realization from an infinite population, called the superpopulation. In the superpopulation model, we often postulate a parametric distribution f(y | x; θ), which is known up to the parameter θ with parameter space Ω. The parametric model has joint density

    f(y | x; θ) = f_1(y_1 | x; θ_1) f_2(y_2 | x, y_1; θ_2) ⋯ f_p(y_p | x, y_1, ..., y_{p−1}; θ_p),    (4)

where θ_k is the parameter in the conditional distribution of y_k given x and y_1, ..., y_{k−1}. For each y_i = (y_{1i}, ..., y_{pi}), we have δ_i = (δ_{1i}, ..., δ_{pi}), where δ_{ki} = 1 if y_{ki} is observed and δ_{ki} = 0 otherwise. For example, for p = 3 there are 2³ = 8 possible missing patterns:

    A = A_{111} ∪ A_{110} ∪ ⋯ ∪ A_{000},

where, for example, A_{100} is the set of sample indices with δ_{1i} = 1, δ_{2i} = 0, and δ_{3i} = 0. For i ∈ A_{100}, we need to create imputed values for y_{2i} and y_{3i} from f(y_{2i}, y_{3i} | x_i, y_{1i}). Without loss of generality, we can express y_i = (y_{obs,i}, y_{mis,i}), where y_{obs,i} and y_{mis,i} are the observed and missing parts of y_i, respectively.

Three steps for PFI under complex sampling:

1. Compute the pseudo maximum likelihood estimator of θ using the EM-by-PFI method with a sufficiently large imputation size M (say, M = 1,000).
2. Select m (say, m = 10) imputed values from the set of M imputed values.
3. Construct the final fractional weights for the m imputed values.

The first step is called the Fully Efficient Fractional Imputation (FEFI) step, the second the Sampling step, and the third the Weighting step.

Step 1 (FEFI step): The pseudo maximum likelihood estimator of θ is computed by the following EM algorithm.
1. [I-step]: Set t = 0. Obtain M imputed values of y_{mis,i} generated from a proposal distribution h(y_{mis,i} | x_i, y_{obs,i}). One simple choice of h(·) is

       h(y_{mis,i} | x_i, y_{obs,i}) = f(y_{mis,i} | x_i, y_{obs,i}; θ̂^(0)),    (5)

   where θ̂^(0) is the initial estimator of θ obtained from the available respondents. Generating samples from (5) may require an MCMC method or the SIR (sampling importance resampling) method; see Appendix B for an illustration of SIR. Let w*_{ij(0)} = 1/M be the initial fractional weight for y^{(j)}_{mis,i}.

2. [M-step]: Update the parameter θ̂^(t+1) by solving the imputed score equation

       θ̂^(t+1): solution to Σ_{i∈A} w_i Σ_{j=1}^M w*_{ij(t)} S(θ; x_i, y*_{ij}) = 0,

   where y*_{ij} = (y_{obs,i}, y^{(j)}_{mis,i}) and S(θ; x, y) = ∂ log f(y | x; θ)/∂θ is the score function of θ.

3. [W-step]: Set t = t + 1. Using the current parameter estimate θ̂^(t), compute the fractional weights

       w*_{ij(t)} ∝ f(y_{obs,i}, y^{(j)}_{mis,i} | x_i; θ̂^(t)) / h(y^{(j)}_{mis,i} | x_i, y_{obs,i})

   with Σ_{j=1}^M w*_{ij(t)} = 1. For t = 0, we set w*_{ij(t)} = 1/M.

4. Check whether w*_{ij(t)} > 1/m for some j = 1, ..., M. If yes, update the proposal distribution with θ̂^(0) replaced by θ̂^(t) and go to [I-step]. If no, go to [M-step]. Stop when θ̂^(t) meets the convergence criterion.

[I-step] is the imputation step, [W-step] is the weighting step, and [M-step] is the maximization step. Note that the imputed values are not changed in the EM iteration; only the fractional weights are updated.

Step 2 (Sampling step): For each i, we have M possible imputed values z^{(j)}_i = (x_i, y_{obs,i}, y^{(j)}_{mis,i}) with their fractional weights w*_{ij}, where w*_{ij} is computed from
the EM algorithm after convergence. For each i, we treat z*_i = {z^{(j)}_i ; j = 1, 2, ..., M} as a weighted finite population (with weights w*_{ij}) and use an unequal-probability sampling method to select a sample of size m from z*_i using w*_{ij} as the selection probability. (We can use PPS sampling or systematic πps sampling to obtain an imputed data set of size m.) Let z̃^{(1)}_i, ..., z̃^{(m)}_i be the m elements sampled from z*_i by the PPS sampling. That is,

    Pr(z̃^{(k)}_i = z^{(j)}_i) = w*_{ij},  j = 1, ..., M; k = 1, ..., m.

The initial fractional weights for the final m imputed values are given by w*_{ij0} = 1/m.

Step 3 (Weighting step): Modify the initial fractional weights w*_{ij0} = 1/m to satisfy the calibration constraint

    Σ_{i∈A} w_i Σ_{j=1}^m w*_{ij} S(θ̂; x_i, ỹ*_{ij}) = 0    (6)

with Σ_{j=1}^m w*_{ij} = 1, where θ̂ is the pseudo MLE of θ computed from the FEFI step and ỹ*_{ij} = (y_{obs,i}, ỹ^{(j)}_{mis,i}); that is, ỹ*_{ij} is the j-th imputed version of y_i selected by the PPS sampling in Step 2. A solution to this calibration problem is

    w*_{ij} = w*_{ij0} − { Σ_{i∈A} w_i S̄_{Ii0} }′ T̂^{−1} w*_{ij0} (S*_{ij} − S̄_{Ii0}),

where

    S*_{ij} = S(θ̂; x_i, ỹ*_{ij}),
    S̄_{Ii0} = Σ_{j=1}^m w*_{ij0} S*_{ij},
    T̂ = Σ_{i∈A} w_i Σ_{j=1}^m w*_{ij0} (S*_{ij} − S̄_{Ii0})(S*_{ij} − S̄_{Ii0})′.

Once the final fractional weights are computed, the PFI estimator of η is obtained by solving

    Σ_{i∈A} w_i Σ_{j=1}^m w*_{ij} U(η; x_i, ỹ*_{ij}) = 0.    (7)
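A minimal sketch of the Weighting step for a scalar score (pure Python; the design weights and score values are hypothetical, and T̂ is computed with the initial weights w*_{ij0} = 1/m): starting from equal weights, the closed-form adjustment makes the imputed score equation (6) hold exactly while each unit's fractional weights still sum to one.

```python
# Sketch of the Weighting Step for a scalar score S.
# S[i][j] holds S(theta_hat; x_i, y*_{ij}); all values below are hypothetical.

def calibrate_weights(w, S, m):
    w0 = 1.0 / m
    Sbar = [sum(w0 * Sij for Sij in Si) for Si in S]          # within-unit mean
    T = sum(wi * sum(w0 * (Sij - Sb) ** 2 for Sij in Si)
            for wi, Si, Sb in zip(w, S, Sbar))                # scalar T-hat
    lam = sum(wi * Sb for wi, Sb in zip(w, Sbar)) / T
    # regression-type adjustment; row sums stay equal to one
    return [[w0 - lam * w0 * (Sij - Sb) for Sij in Si]
            for Si, Sb in zip(S, Sbar)]

w = [3.0, 5.0]                                  # design weights (hypothetical)
S = [[0.4, -0.1, 0.3], [-0.2, 0.5, 0.1]]        # score values (hypothetical)
wstar = calibrate_weights(w, S, m=3)

# Left-hand side of the calibration constraint (6); zero after adjustment.
lhs = sum(wi * sum(wij * Sij for wij, Sij in zip(Wi, Si))
          for wi, Wi, Si in zip(w, wstar, S))
```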
Note that the fractionally imputed estimating equation (7) is an approximation to the expected estimating equation

    Σ_{i∈A} w_i E{U(η; x_i, y_{obs,i}, Y_{mis,i}) | x_i, y_{obs,i}; θ̂} = 0.

For variance estimation, we can use replication methods (such as the jackknife or the bootstrap). Details are given in Appendix C.

3 Nonparametric approach: Fractional hot deck imputation for multivariate continuous variables

We do not want to make a parametric model assumption about f(y_1, ..., y_p | x). However, some assumption on the joint distribution of (y_1, ..., y_p) is needed in order to preserve the correlation structure between the items. This is easy if the data are categorical.

Example (SRS of size n = 10):

    ID   Weight   x      y_1        y_2
    1    0.10     x_1    y_{1,1}    y_{2,1}
    2    0.10     x_2    y_{1,2}    M
    3    0.10     x_3    M          y_{2,3}
    4    0.10     x_4    y_{1,4}    y_{2,4}
    5    0.10     x_5    y_{1,5}    y_{2,5}
    6    0.10     x_6    y_{1,6}    y_{2,6}
    7    0.10     x_7    M          y_{2,7}
    8    0.10     x_8    M          M
    9    0.10     x_9    y_{1,9}    y_{2,9}
    10   0.10     x_10   y_{1,10}   y_{2,10}

    M: missing

Fractional imputation idea: If both y_1 and y_2 are categorical, then fractional imputation is easy to apply. There are only a finite number of possible values, and the imputed values are exactly these possible values.
The fractional weights are the conditional probabilities of the possible values given the observations. One can use the EM-by-weighting method of Ibrahim (1990) to compute the fractional weights.

Example (y_1, y_2: dichotomous, taking values 0 or 1). Each unit with missing items is expanded into one row per possible value:

    ID   Weight           x      y_1        y_2
    1    0.10             x_1    y_{1,1}    y_{2,1}
    2    0.10 w*_{2,1}    x_2    y_{1,2}    0
    2    0.10 w*_{2,2}    x_2    y_{1,2}    1
    3    0.10 w*_{3,1}    x_3    0          y_{2,3}
    3    0.10 w*_{3,2}    x_3    1          y_{2,3}
    4    0.10             x_4    y_{1,4}    y_{2,4}
    5    0.10             x_5    y_{1,5}    y_{2,5}
    6    0.10             x_6    y_{1,6}    y_{2,6}
    7    0.10 w*_{7,1}    x_7    0          y_{2,7}
    7    0.10 w*_{7,2}    x_7    1          y_{2,7}
    8    0.10 w*_{8,1}    x_8    0          0
    8    0.10 w*_{8,2}    x_8    0          1
    8    0.10 w*_{8,3}    x_8    1          0
    8    0.10 w*_{8,4}    x_8    1          1
    9    0.10             x_9    y_{1,9}    y_{2,9}
    10   0.10             x_10   y_{1,10}   y_{2,10}

The fractional weights are the conditional probabilities of the imputed values given the observations. For example,

    w*_{2,1} = P̂(y_2 = 0 | x = x_2, y_1 = y_{1,2}),
    w*_{3,1} = P̂(y_1 = 0 | x = x_3, y_2 = y_{2,3}),
    w*_{7,1} = P̂(y_1 = 0 | x = x_7, y_2 = y_{2,7}),
    w*_{8,1} = P̂(y_1 = 0, y_2 = 0 | x = x_8).

The conditional probabilities are computed from the joint probabilities.

M-step: Update the joint probability π_{bc|a} = P(y_1 = b, y_2 = c | x = a) by

    π̂_{bc|a} = [ Σ_{i=1}^n Σ_{j=1}^{M_i} w_i w*_{ij} I(x_i = a, y^{(j)}_{1i} = b, y^{(j)}_{2i} = c) ] / [ Σ_{i=1}^n w_i I(x_i = a) ].
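The EM-by-weighting cycle above can be sketched as follows (pure Python; the covariate x is suppressed for brevity, i.e., a single x-category, and the data are hypothetical). The E-step expands each incomplete unit over its compatible cells with fractional weights proportional to the current cell probabilities, and the M-step recomputes the weighted joint probabilities.

```python
# Sketch of EM-by-weighting for two dichotomous items, single x-category.
# Each unit is a (y1, y2) pair with None marking a missing item; data and
# design weights are hypothetical.

def em_cells(data, w, iters=50):
    pi = {(b, c): 0.25 for b in (0, 1) for c in (0, 1)}   # initial joint probs
    for _ in range(iters):
        counts = {cell: 0.0 for cell in pi}
        for (y1, y2), wi in zip(data, w):
            # cells compatible with the observed part of this unit
            cells = [(b, c) for b in (0, 1) for c in (0, 1)
                     if (y1 is None or b == y1) and (y2 is None or c == y2)]
            tot = sum(pi[cell] for cell in cells)
            for cell in cells:
                counts[cell] += wi * pi[cell] / tot       # fractional weight
        total = sum(counts.values())
        pi = {cell: counts[cell] / total for cell in counts}
    return pi

data = [(1, 1), (1, None), (None, 0), (0, 0), (1, 0), (None, None)]
w = [1.0] * len(data)
pi = em_cells(data, w)
```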
For continuous y, let us consider an approximation using a categorical transformation. For simplicity, let Y = (Y_1, Y_2, Y_3) be the study variables that have missingness.

1. Preliminary step: For each item k, create a transformation of Y_k into Ỹ_k, a discrete version of Y_k. The values of Ỹ_k serve as imputation cells for Y_k. If Y_k is missing, then Ỹ_k is also missing. Let M_k be the number of cells for item Y_k. The maximum number of cells for p = 3 is then G = M_1 × M_2 × M_3.

2. FEFI step: Two-stage imputation is used. For each i in the sample, ỹ_i is decomposed into ỹ_i = (ỹ_{obs,i}, ỹ_{mis,i}). In the stage 1 imputation, we impute the imputation cells; in the stage 2 imputation, we impute the missing observations within imputation cells. To perform the two-stage imputation, we first compute the estimated joint probability π_{ijk} = Pr(Ỹ_1 = i, Ỹ_2 = j, Ỹ_3 = k) using the EM algorithm (or other estimation methods).

   (a) Stage 1 imputation: For each i, identify all possible values of ỹ_{mis,i}. Let G_i be the number of possible values of ỹ_{mis,i}. In the stage 1 FEFI method, we create G_i imputed values of ỹ_{mis,i}, where the fractional weight corresponding to ỹ_{mis,i}(g) is

           w*_{ig(1)} = π(ỹ_{obs,i}, ỹ_{mis,i}(g)) / Σ_{g′} π(ỹ_{obs,i}, ỹ_{mis,i}(g′)),    (8)

       where ỹ_{mis,i}(g) is the g-th realization of ỹ_{mis,i}, the missing part of Ỹ for unit i.

   (b) Stage 2 imputation: For the g-th imputed cell from stage 1, we identify the donor set among the respondents of the missing item whose cells match those of unit i, and impute the missing values within the same cell. For example, suppose we observe y_{1i} and y_{3i} but y_{2i} is not observed. In this case, the donor set for unit i is

           D_i = {j ∈ A; δ_{2j} = 1, ỹ_{1j} = ỹ_{1i}, ỹ_{3j} = ỹ_{3i}, ỹ_{2j} = ỹ^{(g)}_{2i}}.

       The within-cell fractional
weight for donor j is then

    w*_{ij(2)} = w_j / Σ_{j∈D_i} w_j.

The final fractional weight for donor j of unit i is

    w*_{ij} = w*_{ig(1)} w*_{ij(2)}.    (9)

Note that Σ_j w*_{ij} = 1. Note also that, for the fractional weight w*_{ij}, the imputed value for y_i = (y_{obs,i}, y_{mis,i}) is y*_{ij} = (y_{obs,i}, y_{mis(i),j}), where y_{mis(i),j} is the value of the missing item(s) of unit i taken from donor j.

3. Sampling step: From the two-stage FEFI data, we use PPS sampling (with rejective sampling) and calibration weighting to obtain an approximation. That is, for each i, from the set {(w*_{ij}, y*_{ij}); j ∈ A}, we perform systematic PPS sampling of size m using w*_{ij} in (9) as the size measure. Let y*_{i1}, ..., y*_{im} be the m imputed values from the PPS selection. The initial fractional weight assigned to y*_{ij} is given by w*_{ij0} = 1/m.

4. Weighting step: The fractional weights are further adjusted to match the marginal probabilities π_{i++}, π_{+j+}, and π_{++k}. That is, we may require

    Σ_{i∈A} w_i { δ_i I(ỹ_i = g) + (1 − δ_i) Σ_{j=1}^m w*_{ij} I(ỹ*_{ij} = g) } = π̂_g,  g = 1, ..., G.

A raking ratio estimation method can be used to achieve these calibration constraints.

For variance estimation, we can use a replication method, repeating [Step 2] to [Step 4] to obtain replicated fractional weights.
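The systematic PPS selection used in the Sampling step can be sketched as follows (pure Python; the size measures are hypothetical fractional weights summing to one, and the rejective-sampling refinement is omitted). Each selected imputed value then receives the initial weight w*_{ij0} = 1/m.

```python
import random

# Sketch of systematic PPS selection of m imputed values, using the
# fractional weights as size measures. Weights below are hypothetical.

def systematic_pps(weights, m, u=None):
    """Select m indices; index i is expected to appear m * weights[i] times.
    weights must be normalized to sum to one."""
    if u is None:
        u = random.random() / m                  # random start in [0, 1/m)
    points = [u + k / m for k in range(m)]
    picks, idx, cum = [], 0, weights[0]
    for p in points:
        while p >= cum and idx < len(weights) - 1:
            idx += 1
            cum += weights[idx]
        picks.append(idx)
    return picks

# With a fixed start, counts equal m * weights exactly here.
picks = systematic_pps([0.5, 0.3, 0.2], m=10, u=0.05)
```

A weight larger than 1/m forces its index to be selected more than once, which is why the FEFI step checks for w*_{ij} > 1/m before sampling.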
Appendix

A. Lemma 6.1

Lemma 6.1: If either (2) or (3) holds, the imputed estimator of Y = Σ_{i=1}^N y_i of the form

    Ŷ_I = Σ_{i∈A} w_i {δ_i y_i + (1 − δ_i) y*_i}

is unbiased for Y in the sense that E(Ŷ_I − Y) = 0.

Proof. We first introduce an extended definition of δ_i, where δ_i = 1 if unit i responds when sampled and δ_i = 0 otherwise. With this extended definition, δ_i is defined throughout the finite population. Fay (1992), Shao and Steel (1999), and Kim and Rao (2009) also used this extended definition.

We first show that (3) implies unbiasedness. Let Ŷ_n = Σ_{i∈A} w_i y_i be an unbiased estimator of Y. Note that

    Ŷ_I − Ŷ_n = Σ_{i=1}^N I_i w_i (1 − δ_i)(y*_i − y_i).    (10)

Thus, writing I = (I_1, I_2, ..., I_N) and δ = (δ_1, ..., δ_N),

    E{Ŷ_I − Ŷ_n | I, δ} = Σ_{i=1}^N I_i w_i (1 − δ_i) E{y*_i − y_i | I_i = 1, δ_i = 0} = 0.    (11)

Since we can write

    E{Ŷ_I − Y} = E{Ŷ_I − Ŷ_n} + E{Ŷ_n − Y},    (12)

the first term is zero by (11) and the second term is zero by the design-unbiasedness of Ŷ_n.

Finally, we show that condition (2) also implies unbiasedness. From (10), by taking the expectation with respect to the sampling design, we have

    E{Ŷ_I − Ŷ_n | δ, Y} = Σ_{i=1}^N (1 − δ_i)(y*_i − y_i)

and so

    E{Ŷ_I − Ŷ_n | δ} = Σ_{i=1}^N (1 − δ_i) E(y*_i − y_i | δ_i = 0) = 0.

Therefore, the unbiasedness of the imputed estimator also follows, as the first term of (12) is zero by (2).
B. SIR algorithm in the I-step

To discuss the I-step in Step 1, we give an illustration for p = 2; the extension to p > 2 is straightforward. Note that the joint density can be written

    f(y_1, y_2 | x) = f_1(y_1 | x; θ_1) f_2(y_2 | x, y_1; θ_2).

The sample is partitioned into four sets, A_{11}, A_{10}, A_{01}, A_{00}, according to the missing patterns. We first obtain the initial parameter estimate θ̂^(0) = (θ̂_{1(0)}, θ̂_{2(0)}) using the available respondents. That is, we use the observations in A_{11} ∪ A_{10} to estimate θ_1 and the observations in A_{11} to estimate θ_2.

Now we want to generate M imputed values from (5). In the case of p = 2, the proposal distribution can be written as

    h(y_{mis,i} | x_i, y_{obs,i}) =
        f_2(y_{2i} | x_i, y_{1i}; θ̂_{2(0)})   if i ∈ A_{10},
        f(y_{1i} | x_i, y_{2i}; θ̂^(0))        if i ∈ A_{01},
        f(y_{1i}, y_{2i} | x_i; θ̂^(0))        if i ∈ A_{00},

where

    f(y_{1i} | x_i, y_{2i}; θ̂^(0)) = f_1(y_{1i} | x_i; θ̂_{1(0)}) f_2(y_{2i} | x_i, y_{1i}; θ̂_{2(0)}) / ∫ f_1(y_{1i} | x_i; θ̂_{1(0)}) f_2(y_{2i} | x_i, y_{1i}; θ̂_{2(0)}) dy_{1i}.    (13)

Except for some special cases, such as normal f_1 and normal f_2, the conditional distribution in (13) is not of known form. Thus, some computational tool (such as the Metropolis-Hastings algorithm) is needed to generate samples from (13) for i ∈ A_{01}. We introduce the SIR (sampling importance resampling) algorithm as an alternative computational tool for generating imputed values from (13). SIR consists of the following steps:

1. Generate B (say, B = 100) samples y*_{1i} from f_1(y_{1i} | x_i; θ̂_{1(0)}).
2. Select a PPS sample of size one from the B elements y*_{1i} with size measure f_2(y_{2i} | x_i, y*_{1i}; θ̂_{2(0)}).
3. Repeat Step 1 and Step 2 independently M times to obtain M imputed values.

Once we obtain the M imputed values of y_{1i}, we can use

    ĥ(y_{mis,i} | x_i, y_{obs,i}) ∝ f_1(y_{1i} | x_i; θ̂_{1(0)}) f_2(y_{2i} | x_i, y_{1i}; θ̂_{2(0)})
as an estimator of the proposal density in (5). Since the fractional weights are normalized so that Σ_{j=1}^M w*_{ij} = 1, we do not need to compute the normalizing constant (the denominator) of the conditional density in (13).

C. Replication variance estimation

For variance estimation, we use a replication variance method. Let Ŷ_n = Σ_{i∈A} w_i y_i be the complete-sample estimator of Y under complete response, and let

    V̂_rep = Σ_{k=1}^n c_k (Ŷ_n^(k) − Ŷ_n)²

be a replication variance estimator, with Ŷ_n^(k) = Σ_{i∈A} w_i^(k) y_i.

To discuss variance estimation for the PFI method presented in Section 2, recall that PFI consists of three steps: (1) FEFI step, (2) Sampling step, (3) Weighting step. We mimic these procedures for each replicate but want to avoid regenerating the imputed values. The proposed variance estimation employs similar steps but uses the replication weights w_i^(k) instead of the original weights w_i.

1. FEFI step: Compute the replicate θ̂^(k) of θ̂ by applying the same EM algorithm with w_i replaced by w_i^(k).

2. Sampling step: We use the same imputed data for every replicate. The replicated fractional weights for the final m imputed values are given by

       w*^(k)_{ij0} ∝ f(y_{obs,i}, ỹ^{(j)}_{mis,i} | x_i; θ̂^(k)) / f(y_{obs,i}, ỹ^{(j)}_{mis,i} | x_i; θ̂)    (14)

   with Σ_{j=1}^m w*^(k)_{ij0} = 1.

3. Weighting step: Modify the initial fractional weights w*^(k)_{ij0} in (14) to satisfy the calibration constraint

       Σ_{i∈A} w_i^(k) Σ_{j=1}^m w*^(k)_{ij} S(θ̂^(k); x_i, ỹ*_{ij}) = 0

   with Σ_{j=1}^m w*^(k)_{ij} = 1, where θ̂^(k) is the pseudo MLE of θ computed from the FEFI step using the replication weights.
Once the final replicated fractional weights are computed, the variance estimator of η̂_PFI obtained from (7) is

    V̂_PFI = Σ_{k=1}^n c_k (η̂_PFI^(k) − η̂_PFI)²,

where η̂_PFI^(k) is computed by solving

    Σ_{i∈A} w_i^(k) Σ_{j=1}^m w*^(k)_{ij} U(η; x_i, ỹ*_{ij}) = 0.
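As a minimal illustration of the replication idea, not the full PFI recipe, the sketch below applies the delete-one jackknife with c_k = (n − 1)/n to the complete-response sample mean under simple random sampling (hypothetical data). In the PFI case, η̂^(k) would instead be re-solved from the replicated fractionally imputed estimating equation for each k.

```python
# Sketch: delete-one jackknife variance for the sample mean under SRS.
# Replicate k drops unit k; c_k = (n - 1) / n. Data are hypothetical.

def jackknife_variance(y):
    """Return sum_k c_k * (theta_hat^(k) - theta_hat)^2 for the sample mean."""
    n = len(y)
    theta = sum(y) / n
    reps = [sum(v for i, v in enumerate(y) if i != k) / (n - 1)
            for k in range(n)]
    c = (n - 1) / n
    return sum(c * (r - theta) ** 2 for r in reps)

y = [2.0, 4.0, 6.0, 8.0]
v = jackknife_variance(y)
```

For the sample mean, this jackknife reproduces the textbook variance estimate s²/n exactly.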
More information2 Naïve Methods. 2.1 Complete or available case analysis
2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7
MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is
More informationPreliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference
1 / 172 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2014 2 / 172 Unpaid advertisement
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationLast lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton
EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares
More informationWeb-based Supplementary Materials for Multilevel Latent Class Models with Dirichlet Mixing Distribution
Biometrics 000, 1 20 DOI: 000 000 0000 Web-based Supplementary Materials for Multilevel Latent Class Models with Dirichlet Mixing Distribution Chong-Zhi Di and Karen Bandeen-Roche *email: cdi@fhcrc.org
More informationEmpirical Likelihood Inference for Two-Sample Problems
Empirical Likelihood Inference for Two-Sample Problems by Ying Yan A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Statistics
More informationIntroduction An approximated EM algorithm Simulation studies Discussion
1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More informationJong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris
Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling Jong-Min Kim* and Jon E. Anderson jongmink@mrs.umn.edu Statistics Discipline Division of Science and Mathematics
More informationPOLI 8501 Introduction to Maximum Likelihood Estimation
POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,
More informationStatistical Analysis of List Experiments
Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys
More informationEmpirical Likelihood Methods for Sample Survey Data: An Overview
AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use
More informationBayesian Nonparametric Rasch Modeling: Methods and Software
Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar
More informationReview and continuation from last week Properties of MLEs
Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationRobustness to Parametric Assumptions in Missing Data Models
Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice
More informationFractional hot deck imputation
Biometrika (2004), 91, 3, pp. 559 578 2004 Biometrika Trust Printed in Great Britain Fractional hot deck imputation BY JAE KWANG KM Department of Applied Statistics, Yonsei University, Seoul, 120-749,
More informationEconometrics I, Estimation
Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the
More informationMixtures of Rasch Models
Mixtures of Rasch Models Hannah Frick, Friedrich Leisch, Achim Zeileis, Carolin Strobl http://www.uibk.ac.at/statistics/ Introduction Rasch model for measuring latent traits Model assumption: Item parameters
More informationWeighting in survey analysis under informative sampling
Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationA Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java
IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java To cite this
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationChapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling
Chapter 2 Section 2.4 - Section 2.9 J. Kim (ISU) Chapter 2 1 / 26 2.4 Regression and stratification Design-optimal estimator under stratified random sampling where (Ŝxxh, Ŝxyh) ˆβ opt = ( x st, ȳ st )
More informationANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW
SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved
More informationEmpirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design
1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary
More information7 Sensitivity Analysis
7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption
More informationOptimal Auxiliary Variable Assisted Two-Phase Sampling Designs
MASTER S THESIS Optimal Auxiliary Variable Assisted Two-Phase Sampling Designs HENRIK IMBERG Department of Mathematical Sciences Division of Mathematical Statistics CHALMERS UNIVERSITY OF TECHNOLOGY UNIVERSITY
More informationMultiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models
Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models Hui Xie Assistant Professor Division of Epidemiology & Biostatistics UIC This is a joint work with Drs. Hua Yun
More informationStatistical Estimation
Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from
More information. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)
Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationModel Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao
Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley
More informationChapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28
Chapter 4 Replication Variance Estimation J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Jackknife Variance Estimation Create a new sample by deleting one observation n 1 n n ( x (k) x) 2 = x (k) = n
More informationEmpirical Likelihood Methods
Handbook of Statistics, Volume 29 Sample Surveys: Theory, Methods and Inference Empirical Likelihood Methods J.N.K. Rao and Changbao Wu (February 14, 2008, Final Version) 1 Likelihood-based Approaches
More informationMissing Data: Theory & Methods. Introduction
STAT 992/BMI 826 University of Wisconsin-Madison Missing Data: Theory & Methods Chapter 1 Introduction Lu Mao lmao@biostat.wisc.edu 1-1 Introduction Statistical analysis with missing data is a rich and
More information