A note on multiple imputation for general purpose estimation
Shu Yang and Jae Kwang Kim
SSC meeting, June 16, 2015
Introduction: Basic Setup

Assume simple random sampling, for simplicity.
- Under complete response, suppose that $\hat\eta_{n,g} = n^{-1}\sum_{i=1}^n g(y_i)$ is an unbiased estimator of $\eta_g = E\{g(Y)\}$, for known $g(\cdot)$.
- $\delta_i = 1$ if $y_i$ is observed and $\delta_i = 0$ otherwise.
- $y_i^*$: imputed value for $y_i$ for unit $i$ with $\delta_i = 0$.
- Imputed estimator of $\eta_g$:
  $$\hat\eta_{I,g} = n^{-1}\sum_{i=1}^n \{\delta_i g(y_i) + (1-\delta_i)\, g(y_i^*)\}.$$
- Need $E\{g(y_i^*) \mid \delta_i = 0\} = E\{g(y_i) \mid \delta_i = 0\}$.
Introduction: ML estimation under the missing data setup

Often, we find $x$ (always observed) such that missing at random (MAR) holds:
$$f(y \mid x, \delta = 0) = f(y \mid x).$$
Imputed values are created from $f(y \mid x)$. Computing the conditional expectation can be a challenging problem:
1. We do not know the true parameter $\theta$ in $f(y \mid x) = f(y \mid x; \theta)$: $E\{g(y) \mid x\} = E\{g(y) \mid x; \theta\}$.
2. Even if we knew $\theta$, computing the conditional expectation can be numerically difficult.
Introduction: Imputation

Imputation is a Monte Carlo approximation of the conditional expectation (given the observed data):
$$E\{g(y_i) \mid x_i\} \approx \frac{1}{m}\sum_{j=1}^m g\big(y_i^{*(j)}\big).$$
1. Bayesian approach: generate $y_i^*$ from
   $$f(y_i \mid x_i, y_{obs}) = \int f(y_i \mid x_i, \theta)\, p(\theta \mid x_i, y_{obs})\, d\theta.$$
2. Frequentist approach: generate $y_i^*$ from $f(y_i \mid x_i; \hat\theta)$, where $\hat\theta$ is a consistent estimator.
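The frequentist approach above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the normal linear model, the sample sizes, and the choice $g(y) = I(y \ge 0.15)$ are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y follows a linear model in x; y is missing when delta == 0.
n = 1000
x = rng.exponential(1.0, n)
y = 0.1 * x + rng.normal(0.0, np.sqrt(0.5), n)
delta = rng.random(n) < 0.7                      # response indicator

# Step 1: estimate theta = (beta, sigma^2) from the respondents (MLE under normality).
xr, yr = x[delta], y[delta]
X = np.column_stack([np.ones(xr.size), xr])
beta_hat, *_ = np.linalg.lstsq(X, yr, rcond=None)
resid = yr - X @ beta_hat
sigma2_hat = resid @ resid / xr.size

# Step 2: draw m imputed values per missing unit from f(y | x; theta_hat).
m = 50
xm = x[~delta]
y_imp = (beta_hat[0] + beta_hat[1] * xm)[:, None] \
        + rng.normal(0.0, np.sqrt(sigma2_hat), (xm.size, m))

# Monte Carlo approximation of E{g(y_i) | x_i} for g(y) = I(y >= 0.15):
cond_mean = (y_imp >= 0.15).mean(axis=1)
```

For the Bayesian variant one would additionally draw $\theta$ from its posterior before each imputation, rather than fixing it at $\hat\theta$.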
Introduction: Basic Setup (Cont'd)

Thus, imputation is a computational tool for computing the conditional expectation $E\{g(y_i) \mid x_i\}$ for each missing unit $i$.
- To compute the conditional expectation, we need to specify a model $f(y \mid x; \theta)$ evaluated at $\theta = \hat\theta$. Thus, we can write $\hat\eta_{I,g} = \hat\eta_{I,g}(\hat\theta)$.
- To estimate the variance of $\hat\eta_{I,g}$, we need to take into account the sampling variability of $\hat\theta$ in $\hat\eta_{I,g} = \hat\eta_{I,g}(\hat\theta)$.
Introduction: Basic Setup (Cont'd)

Three approaches:
- Bayesian approach: multiple imputation; Rubin (1978, 1987), Rubin and Schenker (1986), etc.
- Resampling approach: Rao and Shao (1992), Efron (1994), Rao and Sitter (1995), Shao and Sitter (1996), Kim and Fuller (2004), Fuller and Kim (2005).
- Linearization approach: Clayton et al. (1998), Shao and Steel (1999), Robins and Wang (2000), Kim and Rao (2009).
Comparison

                        Bayesian                          Frequentist
Model                   Posterior distribution            Prediction model
                        f(latent, theta | data)           f(latent | data, theta)
Computation             Data augmentation                 EM algorithm
Prediction              I-step                            E-step
Parameter update        P-step                            M-step
Parameter est'n         Posterior mode                    ML estimation
Imputation              Multiple imputation               Fractional imputation
Variance estimation     Rubin's formula                   Linearization or bootstrap
Multiple Imputation

Step 1. (Imputation) Create $M$ complete datasets by filling in missing values with imputed values generated from the posterior predictive distribution. To create the $j$th imputed dataset, first generate $\theta^{(j)}$ from the posterior distribution $p(\theta \mid X_n, y_{obs})$, and then generate $y_i^{*(j)}$ from the imputation model $f(y \mid x_i; \theta^{(j)})$ for each missing $y_i$.

Step 2. (Analysis) Apply the user's complete-sample estimation procedure to each imputed dataset. Let $\hat\eta_I^{(j)}$ be the complete-sample estimator of $\eta = E\{g(Y)\}$ applied to the $j$th imputed dataset and $\hat V^{(j)}$ be the complete-sample variance estimator of $\hat\eta_I^{(j)}$.
Multiple Imputation

Step 3. (Summarize) Use Rubin's combining rule to summarize the results from the multiply imputed datasets. The multiple imputation estimator of $\eta$, denoted by $\hat\eta_{MI}$, is
$$\hat\eta_{MI} = \frac{1}{M}\sum_{j=1}^M \hat\eta_I^{(j)}.$$
Rubin's variance estimator is
$$\hat V_{MI}(\hat\eta_{MI}) = W_M + \Big(1 + \frac{1}{M}\Big) B_M,$$
where $W_M = M^{-1}\sum_{j=1}^M \hat V^{(j)}$ and $B_M = (M-1)^{-1}\sum_{j=1}^M \big(\hat\eta_I^{(j)} - \hat\eta_{MI}\big)^2$.
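The combining rule in Step 3 is a one-liner once the $M$ completed-data analyses are in hand. A minimal sketch (the helper name `rubin_combine` is ours, not from the slides):

```python
import numpy as np

def rubin_combine(eta_hats, v_hats):
    """Apply Rubin's combining rules to M completed-data analyses.

    eta_hats: complete-sample point estimates from the M imputed datasets.
    v_hats:   the corresponding complete-sample variance estimates.
    """
    eta_hats = np.asarray(eta_hats, dtype=float)
    v_hats = np.asarray(v_hats, dtype=float)
    M = eta_hats.size
    eta_mi = eta_hats.mean()                 # MI point estimate
    w_m = v_hats.mean()                      # within-imputation variance W_M
    b_m = eta_hats.var(ddof=1)               # between-imputation variance B_M
    v_mi = w_m + (1.0 + 1.0 / M) * b_m       # Rubin's total variance
    return eta_mi, v_mi

# e.g. three imputed-data analyses with made-up numbers:
eta_mi, v_mi = rubin_combine([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
```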
Multiple Imputation: Motivating Example

Suppose that you are interested in estimating $\eta = P(Y \ge 3)$. Assume a normal model for $f(y \mid x; \theta)$ for multiple imputation. Two choices for $\hat\eta_n$:
1. Method-of-moments (MOM) estimator: $\hat\eta_{n1} = n^{-1}\sum_{i=1}^n I(y_i \ge 3)$.
2. Maximum-likelihood estimator: $\hat\eta_{n2} = n^{-1}\sum_{i=1}^n P(Y \ge 3 \mid x_i; \hat\theta)$, where $P(Y \ge 3 \mid x_i; \hat\theta) = \int_3^\infty f(y \mid x_i; \hat\theta)\, dy$.

Rubin's variance estimator is nearly unbiased for $\hat\eta_{n2}$, but provides conservative variance estimation for $\hat\eta_{n1}$ (30-50% overestimation of the variance in most cases).
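The two complete-sample estimators can be compared directly. A sketch under an assumed normal model $Y = 2 + X + e$ (the model, seed, and sample size are illustrative, not from the slides):

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)

def normal_sf(z):
    """Survival function of the standard normal, 1 - Phi(z)."""
    return 0.5 * erfc(z / sqrt(2.0))

# Hypothetical complete sample from y = 2 + x + e, e ~ N(0, 1), x ~ N(0, 1).
n = 500
x = rng.normal(0.0, 1.0, n)
y = 2.0 + x + rng.normal(0.0, 1.0, n)

# 1. Method-of-moments estimator of eta = P(Y >= 3): a sample proportion.
eta_mom = (y >= 3.0).mean()

# 2. MLE under the normal working model: average of P(Y >= 3 | x_i; theta_hat).
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / n)
eta_mle = np.mean([normal_sf((3.0 - mu) / sigma_hat) for mu in X @ beta_hat])
```

Both target the same $\eta$, but the MLE borrows strength from the model, which is exactly why Rubin's formula treats the two differently.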
Introduction

The goal is to
1. understand why MI provides biased variance estimation for MOM estimation;
2. characterize the asymptotic bias of the MI variance estimator when the MOM estimator is used in the complete-sample analysis;
3. give an alternative variance estimator that provides asymptotically valid inference for MOM estimation.
Multiple Imputation

Rubin's variance estimator is based on the following decomposition:
$$\operatorname{var}(\hat\eta_{MI}) = \operatorname{var}(\hat\eta_n) + \operatorname{var}(\hat\eta_{MI,\infty} - \hat\eta_n) + \operatorname{var}(\hat\eta_{MI} - \hat\eta_{MI,\infty}), \quad (1)$$
where $\hat\eta_n$ is the complete-sample estimator of $\eta$ and $\hat\eta_{MI,\infty}$ is the probability limit of $\hat\eta_{MI}$ as $M \to \infty$. Under some regularity conditions, the $W_M$ term estimates the first term, the $B_M$ term estimates the second term, and the $M^{-1} B_M$ term estimates the last term of (1), respectively.
Multiple Imputation

In particular, Kim et al. (2006, JRSSB) proved that the bias of Rubin's variance estimator is
$$\operatorname{Bias}(\hat V_{MI}) = -2\operatorname{cov}(\hat\eta_{MI} - \hat\eta_n,\, \hat\eta_n). \quad (2)$$
- The decomposition (1) is equivalent to assuming that $\operatorname{cov}(\hat\eta_{MI} - \hat\eta_n, \hat\eta_n) = 0$, which is called the congeniality condition by Meng (1994).
- The congeniality condition holds when $\hat\eta_n$ is the MLE of $\eta$. In such cases, Rubin's variance estimator is asymptotically unbiased.
- If the method-of-moments (MOM) estimator is used to estimate $\eta = E\{g(Y)\}$, Rubin's variance estimator can be asymptotically biased.
MI for the MOM estimator: No x variable

Assume that $y$ is observed for the first $r$ elements. The MI estimator for the MOM estimator of $\eta = E\{g(Y)\}$ is
$$\hat\eta_{MI} = \frac{1}{n}\sum_{i=1}^r g(y_i) + \frac{1}{n}\sum_{i=r+1}^n \frac{1}{M}\sum_{j=1}^M g\big(y_i^{*(j)}\big),$$
where the $y_i^{*(j)}$ are generated from $f(y \mid \theta^{(j)})$ and the $\theta^{(j)}$ are generated from $p(\theta \mid y_{obs})$.
MI for the MOM estimator: No x variable (Cont'd)

Conditional on the observed data,
$$\operatorname{plim}_{M\to\infty} \frac{1}{M}\sum_{j=1}^M g\big(y_i^{*(j)}\big) = E\big[E\{g(Y); \theta\} \mid y_{obs}\big] \approx E\{g(Y); \hat\theta\},$$
so the MI estimator of $\eta$ (for $M \to \infty$) can be written as
$$\hat\eta_{MI} = \frac{r}{n}\, \hat\eta_{MME,r} + \frac{n-r}{n}\, \hat\eta_{MLE,r}.$$
Thus, $\hat\eta_{MI}$ is a convex combination of $\hat\eta_{MME,r}$ and $\hat\eta_{MLE,r}$, where $\hat\eta_{MLE,r} = E\{g(Y); \hat\theta\}$ and $\hat\eta_{MME,r} = r^{-1}\sum_{i=1}^r g(y_i)$.
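The convex-combination limit can be checked numerically. The sketch below uses frequentist imputation from $f(y; \hat\theta)$, which shares the same large-$M$ limit as the Bayesian version up to the posterior's concentration at $\hat\theta$; the setting ($y \sim N(1,1)$, $g(y) = I(y < 1)$, sample sizes) is an assumed toy example.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical setting with no covariate: y ~ N(mu, 1), first r of n units
# observed, g(y) = I(y < 1) so that eta = pr(Y < 1).
n, r, M = 200, 120, 5000
y_obs = rng.normal(1.0, 1.0, r)

# Impute the n - r missing units M times each from f(y; theta_hat).
mu_hat = y_obs.mean()
y_imp = rng.normal(mu_hat, 1.0, (n - r, M))

# MI estimator: observed part plus per-unit Monte Carlo means.
eta_mi = ((y_obs < 1.0).sum() + (y_imp < 1.0).mean(axis=1).sum()) / n

p = r / n
eta_mme_r = (y_obs < 1.0).mean()        # MOM estimator from the respondents
eta_mle_r = normal_cdf(1.0 - mu_hat)    # E{g(Y); theta_hat} under the model
combo = p * eta_mme_r + (1 - p) * eta_mle_r
# eta_mi approaches combo as M grows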
MI for the MOM estimator: No x variable (Cont'd)

Since $\operatorname{Bias}(\hat V_{MI}) = -2\operatorname{Cov}(\hat\eta_{MME,n},\, \hat\eta_{MI} - \hat\eta_{MME,n})$, we only have to evaluate the covariance term. Writing $\hat\eta_{MME,n} = p\, \hat\eta_{MME,r} + (1-p)\, \hat\eta_{MME,n-r}$, where $p = r/n$, we can obtain
$$\begin{aligned}
\operatorname{Cov}(\hat\eta_{MME,n},\, \hat\eta_{MI} - \hat\eta_{MME,n})
&= \operatorname{Cov}\{\hat\eta_{MME,n},\, (1-p)(\hat\eta_{MLE,r} - \hat\eta_{MME,n-r})\} \\
&= p(1-p)\operatorname{Cov}\{\hat\eta_{MME,r}, \hat\eta_{MLE,r}\} - (1-p)^2 V\{\hat\eta_{MME,n-r}\} \\
&= p(1-p)\{V(\hat\eta_{MLE,r}) - V(\hat\eta_{MME,r})\} \le 0,
\end{aligned}$$
using $\operatorname{Cov}\{\hat\eta_{MME,r}, \hat\eta_{MLE,r}\} = V(\hat\eta_{MLE,r})$ and $(1-p)^2 V\{\hat\eta_{MME,n-r}\} = p(1-p) V(\hat\eta_{MME,r})$. This proves $\operatorname{Bias}(\hat V_{MI}) \ge 0$.
Bias of Rubin's variance estimator

Theorem 1. Let $\hat\eta_n = n^{-1}\sum_{i=1}^n g(y_i)$ be the method-of-moments estimator of $\eta = E\{g(Y)\}$ under complete response. Assume that $E(\hat V^{(j)}) = \operatorname{var}(\hat\eta_n)$ holds for $j = 1, \ldots, M$. Then for $M \to \infty$, the bias of Rubin's variance estimator is
$$\operatorname{Bias}(\hat V_{MI}) = 2n^{-1}(1-p)\big(E[\operatorname{var}\{g(Y) \mid X\} \mid \delta = 0] - \dot m_{\theta,0}^T I_\theta^{-1} \dot m_{\theta,1}\big),$$
where $p = r/n$, $I_\theta = E\{-\partial^2 \log f(Y \mid X; \theta)/\partial\theta\, \partial\theta^T\}$, $m(x; \theta) = E\{g(Y) \mid x; \theta\}$, $\dot m_\theta(x) = \partial m(x; \theta)/\partial\theta$, $\dot m_{\theta,0} = E\{\dot m_\theta(X) \mid \delta = 0\}$, and $\dot m_{\theta,1} = E\{\dot m_\theta(X) \mid \delta = 1\}$.
Bias of Rubin's variance estimator

Under MCAR, the bias simplifies to
$$\operatorname{Bias}(\hat V_{MI}) = 2p(1-p)\{\operatorname{var}(\hat\eta_{r,MME}) - \operatorname{var}(\hat\eta_{r,MLE})\},$$
where $\hat\eta_{r,MME} = r^{-1}\sum_{i=1}^r g(y_i)$ and $\hat\eta_{r,MLE} = r^{-1}\sum_{i=1}^r E\{g(Y) \mid x_i; \hat\theta\}$. This shows that Rubin's variance estimator is unbiased if and only if the method-of-moments estimator is as efficient as the maximum likelihood estimator, that is, $\operatorname{var}(\hat\eta_{r,MME}) = \operatorname{var}(\hat\eta_{r,MLE})$. Otherwise, Rubin's variance estimator is positively biased.
Rubin's variance estimator can be negatively biased

Now consider a simple linear regression model which contains one covariate $X$ and no intercept. By Theorem 1,
$$\operatorname{Bias}(\hat V_{MI}) = \frac{2(1-p)\sigma^2}{n}\left\{1 - \frac{E(X \mid \delta=0)\, E(X \mid \delta=1)}{E(X^2 \mid \delta=1)}\right\},$$
which can be zero, positive, or negative. If
$$E(X \mid \delta=0) > \frac{E(X^2 \mid \delta=1)}{E(X \mid \delta=1)},$$
then Rubin's variance estimator is negatively biased.
Alternative variance estimation

Decompose
$$\operatorname{var}(\hat\eta_{MI,\infty}) = n^{-1} V_1 + r^{-1} V_2,$$
where
$$V_1 = \operatorname{var}\{g(Y)\} - (1-p)\, E[\operatorname{var}\{g(Y) \mid X\} \mid \delta = 0]$$
and
$$V_2 = \dot m_\theta^T I_\theta^{-1} \dot m_\theta - p^2\, \dot m_{\theta,1}^T I_\theta^{-1} \dot m_{\theta,1}.$$
- The first term, $n^{-1} V_1$, is the variance of the sample mean of $\delta_i g(y_i) + (1-\delta_i)\, m(x_i; \theta)$.
- The second term, $r^{-1} V_2$, reflects the variability associated with using the estimated value $\hat\theta$ instead of the true value $\theta$ in the imputed values.
Alternative variance estimation: Estimate $n^{-1} V_1$

The first term $n^{-1} V_1$ can be estimated by $\tilde W_M = W_M - C_M$, where $W_M = M^{-1}\sum_{j=1}^M \hat V^{(j)}$ and
$$C_M = \frac{1}{n^2(M-1)} \sum_{k=1}^M \sum_{i=r+1}^n \left\{ g\big(y_i^{*(k)}\big) - \frac{1}{M}\sum_{l=1}^M g\big(y_i^{*(l)}\big) \right\}^2,$$
since $E(W_M) = n^{-1}\operatorname{var}\{g(Y)\}$ and $E(C_M) = n^{-1}(1-p)\, E[\operatorname{var}\{g(Y) \mid X\} \mid \delta = 0]$.
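Given the $(n-r) \times M$ array of transformed imputed values, $C_M$ is a single vectorized expression. A minimal sketch (the helper name `c_m` and the array layout are our conventions):

```python
import numpy as np

def c_m(g_imp, n):
    """Within-imputation correction C_M.

    g_imp: (n - r) x M array with g_imp[i, k] = g(y_i^{*(k)}) for the missing units.
    n:     full sample size (respondents plus nonrespondents).
    """
    g_imp = np.asarray(g_imp, dtype=float)
    M = g_imp.shape[1]
    # Deviations of each imputed value from that unit's across-imputation mean.
    dev = g_imp - g_imp.mean(axis=1, keepdims=True)
    return (dev ** 2).sum() / (n ** 2 * (M - 1))
```

For example, one missing unit with imputed values $g = (1, 3)$ and $n = 2$ gives $C_M = 0.5$.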
Alternative variance estimation: Estimate $r^{-1} V_2$

To estimate the second term, we use over-imputation, in which imputed values are generated for the respondents as well:

Table: Over-Imputation Data Structure (entry $g_i^{(k)} = g(y_i^{*(k)})$)

  Row        1             ...   M             average
  1          g_1^(1)       ...   g_1^(M)       g-bar_1
  ...        ...           ...   ...           ...
  r          g_r^(1)       ...   g_r^(M)       g-bar_r
  r+1        g_{r+1}^(1)   ...   g_{r+1}^(M)   g-bar_{r+1}
  ...        ...           ...   ...           ...
  n          g_n^(1)       ...   g_n^(M)       g-bar_n
  average    eta-hat_n^(1) ...   eta-hat_n^(M) eta-hat_n
Alternative variance estimation: Estimate $r^{-1} V_2$

The second term $r^{-1} V_2$ can be estimated by $D_M = D_{M,n} - D_{M,r}$, where
$$D_{M,n} = \frac{1}{M-1}\sum_{k=1}^M \left(\frac{1}{n}\sum_{i=1}^n d_i^{(k)}\right)^2 - \frac{1}{M-1}\sum_{k=1}^M \frac{1}{n^2}\sum_{i=1}^n \big(d_i^{(k)}\big)^2,$$
$$D_{M,r} = \frac{1}{M-1}\sum_{k=1}^M \left(\frac{1}{n}\sum_{i=1}^r d_i^{(k)}\right)^2 - \frac{1}{M-1}\sum_{k=1}^M \frac{1}{n^2}\sum_{i=1}^r \big(d_i^{(k)}\big)^2,$$
with $d_i^{(k)} = g\big(y_i^{*(k)}\big) - M^{-1}\sum_{l=1}^M g\big(y_i^{*(l)}\big)$, since $E(D_{M,n}) = r^{-1}\, \dot m_\theta^T I_\theta^{-1} \dot m_\theta$ and $E(D_{M,r}) = r^{-1} p^2\, \dot m_{\theta,1}^T I_\theta^{-1} \dot m_{\theta,1}$.
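Reading the over-imputation table as an $n \times M$ array, $D_M$ can be computed as below. This is a sketch under our own conventions (helper name `d_m`, respondents stored in the first $r$ rows), mirroring the two formulas above term by term:

```python
import numpy as np

def d_m(g_over, r):
    """Over-imputation estimator D_M = D_{M,n} - D_{M,r}.

    g_over: n x M array with g_over[i, k] = g(y_i^{*(k)}) from over-imputation;
            rows 0..r-1 are the respondents (also imputed), the rest are
            the nonrespondents.
    """
    g_over = np.asarray(g_over, dtype=float)
    n, M = g_over.shape
    # d_i^{(k)}: deviation from each unit's across-imputation mean.
    d = g_over - g_over.mean(axis=1, keepdims=True)

    def component(dmat):
        # First piece: variance of the column means (scaled by n).
        first = ((dmat.sum(axis=0) / n) ** 2).sum() / (M - 1)
        # Second piece: average of the squared deviations, scaled by n^{-2}.
        second = (dmat ** 2).sum() / (n ** 2 * (M - 1))
        return first - second

    return component(d) - component(d[:r])
```

For the toy array `[[0, 2], [1, 3]]` with $r = 1$, the two components are 1 and 0, so $D_M = 1$.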
Alternative variance estimation

Theorem 2. Under the assumptions of Theorem 1, the new multiple imputation variance estimator
$$\tilde V_{MI} = \tilde W_M + D_M + M^{-1} B_M,$$
where $\tilde W_M = W_M - C_M$ and $B_M$ is the usual between-imputation variance, is asymptotically unbiased for the variance of the multiple imputation estimator as $n \to \infty$.
Simulation study: Setup 1

Samples of size $n = 2{,}000$ are independently generated from $Y_i = \beta X_i + e_i$, where $\beta = 0.1$, $X_i \sim \operatorname{Exp}(1)$, and $e_i \sim N(0, \sigma_e^2)$ with $\sigma_e^2 = 0.5$. Let $\delta_i$ be the response indicator of $y_i$, with $\delta_i \sim \operatorname{Bernoulli}(p_i)$ and $p_i = 1/\{1 + \exp(-\phi_0 - \phi_1 x_i)\}$. We consider two scenarios: (i) $(\phi_0, \phi_1) = (-1.5, 2)$ and (ii) $(\phi_0, \phi_1) = (3, -3)$. The parameters of interest are $\eta_1 = E(Y)$ and $\eta_2 = \operatorname{pr}(Y < 0.15)$.
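One replicate of this setup can be generated as follows; the function name `generate_sample` and the seed are ours, with scenario (i) parameters as defaults:

```python
import numpy as np

rng = np.random.default_rng(2015)

def generate_sample(n=2000, beta=0.1, sigma2_e=0.5, phi0=-1.5, phi1=2.0):
    """One sample from simulation setup 1 (scenario (i) by default)."""
    x = rng.exponential(1.0, n)
    y = beta * x + rng.normal(0.0, np.sqrt(sigma2_e), n)
    p = 1.0 / (1.0 + np.exp(-phi0 - phi1 * x))   # response probability
    delta = rng.random(n) < p                    # response indicator
    return x, y, delta

x, y, delta = generate_sample()
eta1_full = y.mean()               # complete-data estimate of eta_1 = E(Y)
eta2_full = (y < 0.15).mean()      # complete-data estimate of eta_2 = pr(Y < 0.15)
```

Each Monte Carlo run would then impute the $y_i$ with $\delta_i = 0$ and compare Rubin's variance estimator with the new one.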
Simulation study: Result 1

Table: Relative biases of the two variance estimators, and mean widths and coverages of the two interval estimates, under the two scenarios in simulation one

                   Relative bias (%)    Mean width of 95% C.I.    Coverage of 95% C.I.
Scenario           Rubin     New        Rubin     New             Rubin     New
1        eta_1     96.8      0.7        0.038     0.027           0.99      0.95
         eta_2     123.7     2.9        0.027     0.018           1.00      0.95
2        eta_1     -19.8     0.4        0.061     0.069           0.91      0.95
         eta_2     -9.6      -0.4       0.037     0.039           0.93      0.95

C.I., confidence interval; eta_1 = E(Y); eta_2 = pr(Y < 0.15).
Simulation study: Result 1

For $\eta_1 = E(Y)$ under scenario (i), the relative bias of Rubin's variance estimator is 96.8%, which is consistent with our result, since $E_1(X^2) - E_0(X)E_1(X) > 0$, where $E_1(X^2) = 3.38$, $E_1(X) = 1.45$, and $E_0(X) = 0.48$. Under scenario (ii), the relative bias of Rubin's variance estimator is -19.8%, which is consistent with our result, since $E_1(X^2) - E_0(X)E_1(X) < 0$, where $E_1(X^2) = 0.37$, $E_1(X) = 0.47$, and $E_0(X) = 1.73$. The empirical coverage for Rubin's method can be above or below the nominal coverage, due to variance overestimation or underestimation. In contrast, the new variance estimator is essentially unbiased in these scenarios.
Simulation study: Setup 2

Samples of size $n = 200$ are independently generated from $Y_i = \beta_0 + \beta_1 X_i + e_i$, where $\beta = (\beta_0, \beta_1) = (3, 1)$, $X_i \sim N(2, 1)$, and $e_i \sim N(0, \sigma_e^2)$ with $\sigma_e^2 = 1$. The parameters of interest are $\eta_1 = E(Y)$ and $\eta_2 = \operatorname{pr}(Y < 1)$. We consider two different factors:
1. The response mechanism, MCAR or MAR. For MCAR, $\delta_i \sim \operatorname{Bernoulli}(0.6)$. For MAR, $\delta_i \sim \operatorname{Bernoulli}(p_i)$, where $p_i = 1/\{1 + \exp(-0.28 - 0.1 x_i)\}$, with an average response rate of about 0.6.
2. The imputation size, with two levels $M = 10$ and $M = 30$.
Simulation study: Result 2

Table: Relative biases of the two variance estimators, and mean widths and coverages of the two interval estimates, under the two scenarios of missingness in simulation two

                  Relative bias (%)    Mean width of 95% C.I.    Coverage of 95% C.I.
         M        Rubin     New        Rubin     New             Rubin     New
Missing completely at random
eta_1    10       -0.9      -1.58      0.240     0.250           0.95      0.95
         30       -0.6      -1.68      0.230     0.235           0.95      0.95
eta_2    10       22.7      -1.14      0.083     0.083           0.98      0.95
         30       23.8      -1.23      0.082     0.075           0.98      0.95
Missing at random
eta_1    10       -1.0      -1.48      0.230     0.250           0.95      0.95
         30       -0.9      -1.59      0.231     0.230           0.95      0.95
eta_2    10       20.7      -1.64      0.081     0.081           0.98      0.95
         30       21.5      -1.74      0.074     0.071           0.98      0.95

C.I., confidence interval; eta_1 = E(Y); eta_2 = pr(Y < 1).
Simulation study: Result 2

Both Rubin's variance estimator and our new variance estimator are unbiased for $\eta_1 = E(Y)$. Rubin's variance estimator is biased upward for $\eta_2 = \operatorname{pr}(Y < 1)$, with absolute relative bias as high as 24%, whereas our new variance estimator reduces the absolute relative bias to less than 1.74%. For Rubin's method, the empirical coverage for $\eta_2 = \operatorname{pr}(Y < 1)$ reaches 98% for 95% confidence intervals, due to variance overestimation. In contrast, our new method provides accurate coverage for both $\eta_1 = E(Y)$ and $\eta_2 = \operatorname{pr}(Y < 1)$ at the 95% level.
Conclusion

- We investigated the asymptotic properties of Rubin's variance estimator.
- If the method-of-moments estimator is used, Rubin's variance estimator can be asymptotically biased.
- A new variance estimator, based on multiple over-imputation, provides valid variance estimation in this case.
- Our method can be extended to a more general class of parameters obtained from estimating equations,
  $$\sum_{i=1}^n U(\eta; x_i, y_i) = 0.$$
  This is a topic of future study.
The end