arxiv:math/ v1 [math.st] 23 Jun 2004

Size: px

Start display at page:

Download "arxiv:math/ v1 [math.st] 23 Jun 2004"

Edmund Miles
5 years ago
Views:

1 The Annals of Statistics 2004, Vol. 32, No. 2, DOI: / c Institute of Mathematical Statistics, 2004 arxiv:math/ v1 [math.st] 23 Jun 2004 FINITE SAMPLE PROPERTIES OF MULTIPLE IMPUTATION ESTIMATORS By Jae Kwang Kim 1 Yonsei University Finite sample properties of multiple imputation estimators under the linear regression model are studied. The exact bias of the multiple imputation variance estimator is presented. A method of reducing the bias is presented simulation is used to make comparisons. We also show that the suggested method can be used for a general class of linear estimators. 1. Introduction. Multiple imputation, proposed by Rubin (1978), is a procedure for hling missing data that allows the data analyst to use stard techniques of analysis designed for complete data, while providing a method to estimate the uncertainty due to the missing data. Repeated imputations are drawn from the posterior predictive distribution of the missing values under the specified model given a suitable prior distribution. Schenker Welsh [(1988), hereafter SW] studied the asymptotic properties of multiple imputation in the linear-model framework, where the scalar outcome variable Y i is assumed to follow the model (1) Y i = x i β + e i, e i i.i.d. N(0,σ 2 ). The p-dimensional x i s are observed on the complete sample are assumed to be fixed. To describe the imputation procedure, we adopt matrix notation. Without loss of generality, we assume that the first r units are the respondents. Let y r = (Y 1,Y 2,...,Y r ) X r = (x 1,x 2,...,x r ). Also, let y n r = (Y r+1,y r+2,...,y n ) X n r = (x r+1,x r+2,...,x n ). The suggested method of multiple imputation for model (1) is as follows: Received September 2001; revised January Supported by Korea Research Foundation Grant KRF C AMS 2000 subject classifications. Primary 62D05; secondary 62J99. Key words phrases. Missing data, nonresponse, variance estimation. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2004, Vol. 32, No. 2, This reprint differs from the original in pagination typographic detail. 1

2 2 J. K. KIM [M1] For each repetition of the imputation, k = 1,..., M, draw (2) σ 2 (k) y r i.i.d. (r p)ˆσ 2 r/χ 2 r p, where ˆσ 2 r = (r p) 1 y r [I X r(x r X r) 1 X r ]y r. [M2] Draw (3) β (k) (y r,σ (k) ) i.i.d. N(ˆβ r,(x rx r ) 1 σ 2 (k) ), where ˆβ r = (X rx r ) 1 X ry r. [M3] For each missing unit j = r + 1,...,n draw (4) e j(k) (β (k),σ (k) ) i.i.d. N(0,σ(k) 2 ). Then Yj(k) = x jβ (k) +e j(k) is the imputed value associated with unit j for the kth imputation. [M4] Repeat [M1] [M3] independently M times. The above procedure assumes a constant prior for (β,logσ) an ignorable response mechanism in the sense of Rubin (1976). At each repetition of the imputation, k = 1,...,M, we can calculate the imputed version of the full sample estimators ( n ) 1 [ r ] ˆβ I(k),n = x i x i x i y i + x i Yi(k) i=1 i=1 i=r+1 ( n ) ˆV I(k),n = x i x i 1ˆσ I(k),n 2, i=1 where [ r ] ˆσ I(k),n 2 = (n p) 1 (Y i x iˆβ I(k),n ) 2 + (Yi(k) x iˆβ I(k),n ) 2. i=1 i=r+1 The proposed point estimator for the regression coefficient based on M repeated imputations is (5) ˆβ M M,n = M 1 ˆβ I(k),n. The proposed estimator for the variance of the point estimator (5) is (6) where (7) k=1 ˆV M,n = W M,n + (1 + M 1 )B M,n, W M,n = M 1 M k=1 ˆV I(k),n

3 MULTIPLE IMPUTATION ESTIMATORS 3 (8) M B M,n = (M 1) 1 (ˆβ I(k),n ˆβ M,n )(ˆβ I(k),n ˆβ M,n ). k=1 Rubin (1987) called W M,n the within-imputation variance called B M,n the between-imputation variance. We call ˆV M,n Rubin s variance estimator. SW studied the asymptotic properties of the point estimator (5) its variance estimator (6). Under regularity conditions they showed that (9) (10) lim E(ˆβ n M,n β) = 0 lim n{e(ˆv M,n ) Var(ˆβ n M,n )} = 0, where the reference distribution in (9) (10) is the regression model (1) with an ignorable response mechanism. Note that (9) (10) require that the sample size n go to infinity, for fixed M, M > 1. Finite sample properties are not discussed by SW. The next section gives finite sample properties of the multiple imputation estimators. In Section 3 a simple modified version of the SW method is proposed to minimize the finite sample bias of the multiple imputation variance estimator. In Section 4 extensions are made to a more general class of estimators. In Section 5 results of a simulation study are reported. In Section 6 concluding remarks are made. 2. Finite sample properties. The following lemma provides the covariance structure of the multiply-imputed data set generated by [M1] [M4]. Lemma 2.1. Let Y i be the observed value of the ith unit, i = 1,2,...,r (r > p+2), let Yj(k) be the imputed value associated with the jth unit for the kth repetition of the multiple imputation generated by the steps [M1] [M4]. Then, under model (1) with an ignorable response mechanism, (11) (12) Cov(Y i(k),yj(s) ) Cov(Y i,y j(k) ) = x i (X r X r) 1 x j σ 2 (1 + λ)x i (X rx r ) 1 x i σ 2 + λσ 2, if i = j k = s, = (1 + λ)x i (X rx r ) 1 x j σ 2, if i j k = s, x i (X rx r ) 1 x j σ 2, if k s, where λ = (r p 2) 1 (r p) the expectations in (11) (12) are taken over the joint distribution of model (1) the imputation mechanism with the indices of respondents fixed.

4 4 J. K. KIM can be decomposed into three indepen- For the proof, see Appendix A. Note that the imputed value Y dent components as (13) i(k) Y i(k) = x iˆβ r +x i (β (k) ˆβ r ) + e i(k). The first component x iˆβr has mean x i β variance x i (X r X r) 1 x i σ 2, the second component has mean zero variance λx i (X rx r ) 1 x i σ 2 the third component has mean zero variance λσ 2. The following theorem gives the mean variance of the point estimator ˆβ M,n of the regression coefficient the mean of the multiple imputation variance estimator ˆV M,n. Again, the expectations in the following theorem are taken over the joint distribution of model (1) the imputation mechanism with the indices of respondents fixed. (14) (15) (16) (17) Theorem 2.1. Under the assumptions of Lemma 2.1, E(ˆβ M,n ) = β, Var(ˆβ M,n ) = (X rx r ) 1 σ 2 + M 1 λ[(x rx r ) 1 (X nx n ) 1 ]σ 2, E(W M,n ) = (X n X n) 1 {1 + (n p) 1 (λ 1)(n r)}σ 2 E(B M,n ) = λ[(x rx r ) 1 (X nx n ) 1 ]σ 2, where λ = (r p 2) 1 (r p), W M,n is the within-imputation variability defined in (7) B M,n is the between-imputation variability defined in (8). The bias of the multiple imputation variance estimator is (18) E(ˆV M,n ) Var(ˆβ M,n ) = (X n X n) 1 (n p) 1 (λ 1)(n r)σ 2 + (λ 1)[(X r X r) 1 (X n X n) 1 ]σ 2. For the proof, see Appendix B. As is observed from (15), the point estimator ˆβ M,n for infinite M achieves the same efficiency of ˆβ r, the estimator based on the respondents. In fact, lim ˆβ M,n M ( n = + = ˆβ r, ) 1 { x i x r i=1 i=1 ( n ) 1 { x i x i=1 x i [x iˆβ r + (y i x iˆβ r )] n i=r+1 } [ M x i x iˆβ r + M 1 k=1 (x i (β (k) ˆβ r ) + e i(k) ) ]}

5 MULTIPLE IMPUTATION ESTIMATORS 5 because r i=1 x i (y i x iˆβ r ) = 0 by stard regression theory, lim M M 1 Mk=1 (β (k) ˆβ r ) = 0 by the law of large numbers lim M M 1 Mk=1 e i(k) = 0 by the law of large numbers. By (15) the variance of ˆβ M,n can be written as (19) Var(ˆβ M,n ) = Var(ˆβ r ) + M 1 λ[var(ˆβ r ) Var(ˆβ n )]. The second part in the right-h side of (19) is the increase in variance due to using ˆβ M,n instead of ˆβ r. By (17) that increase can be unbiasedly estimated by M 1 B M,n. Thus, an alternative estimator for the total variance of ˆβ M,n that is unbiased for (15) is (20) ˆ Var(ˆβ r ) + M 1 B M,n, where ˆ Var(ˆβ r ) is the stard variance estimator that treats the respondents as if they are the original sample. In large samples, λ. = 1 the bias of the multiple imputation variance estimator for the imputed regression coefficient is negligible. The bias term (18) is an exact bias for r > p + 2. The total variance of ˆβ M,n can be decomposed into three parts: Var(ˆβ M,n ) = Var(ˆβ n ) + Var(ˆβ r ˆβ n ) + Var(ˆβ M,n ˆβ r ). The first part, the second part the third part can be called the sampling variance, the variance due to missingness the variance due to imputation, respectively. Rubin s multiple imputation uses W M,n to estimate the sampling variance, B M,n to estimate the variance due to missingness M 1 B M,n to estimate the variance due to M repeated imputations. The first term on the right-h side of (18) is the bias of W M,n as an estimator of the sampling variance the second term on the right-h side of (18) is the bias of B M,n as an estimator of the variance due to missingness. The imputation variance is unbiasedly estimated by M 1 B M,n. 3. A modification. We consider ways of reducing the bias of the multiple imputation variance estimator. Recall that multiple imputation is characterized by the method of generating the imputed values by the variance formula. The variance formula directly uses the complete sample variance estimator so that it can be implemented easily using existing software. One estimator of variance is the alternative variance estimator in (20), which involves direct estimation of parameters from the respondents. A similar idea was used by Wang Robins (1998). Then Rubin s variance estimator ˆV M,n in (6) is no longer required. If we want to use Rubin s variance formula, one approach is to modify the imputation method to minimize the bias term in (18). To find a best imputation procedure, instead of fixing a constant prior for log σ, we use a

6 6 J. K. KIM class of prior distributions indexed by hyperparameters to express a class of imputation methods. As a conjugate prior distribution for σ 2, we choose a scaled inverse chi-square with degrees of freedom ν 0 scale parameter σ0 2 as the hyperparameters; that is, the prior distribution of σ 2 is the distribution of ν 0 σ0 2/χ2 ν 0. In the modified imputation method we determine the values of the hyperparameters, ν 0 σ0 2, to remove the bias of the multiple imputation variance estimator. Using the hyperparameters, the posterior distribution for σ 2 is written (21) σ 2 (k) y r i.i.d. [ν 0 σ (r p)ˆσ 2 r]/χ 2 (ν 0 +r p), so that (22) E(σ 2 (k) ) = (ν 0 + r p 2) 1 (ν 0 σ (r p)σ 2 ). Note that SW used ν 0 = 0. Using the arguments of the proofs of Lemma 2.1 Theorem 2.1, the bias of the multiple imputation variance estimator based on the posterior distribution in (21) is (23) Bias(ˆV M,n ) = (X nx n ) 1 (n p) 1 {(λ 0 1)(n r)σ 2 + λ 1 (n r)σ 2 0} + {(X r X r) 1 (X n X n) 1 }{(λ 0 1)σ 2 + λ 1 σ 2 0 }, where λ 0 = (ν 0 +r p 2) 1 (r p) λ 1 = (ν 0 +r p 2) 1 ν 0. The bias is zero when ν 0 = 2 σ0 2 = 0, which is equivalent to generating the posterior values of σ 2 from the inverse chi-square distribution with degrees of freedom ν = r p + 2 instead of ν = r p in (2). Thus, the choice of ν = r p + 2 in (2) makes Rubin s variance estimator unbiased for a finite sample. We can also derive the optimal prior using the recent work of Meng Zaslavsky (2002) on single observation unbiased priors (SOUP). Meng Zaslavsky [2002, Section 6] showed that (24) π(σ 2 ) (σ 2 ) 2 is the unique SOUP among all continuously relatively scale invariant priors for the scale family distribution. Note that the prior density of the scaled inverse chi-square distribution can be written as (25) π(σ 2 ) (σ 2 ) (νo/2+1) exp[ ν 0 σ 2 0 /(2σ2 )]. Thus, the unique SOUP for σ 2 in (24) corresponds to the scaled inverse chi-square distribution with ν 0 = 2 σ0 2 = 0, which makes the posterior mean in (22) unbiased for σ 2.

7 MULTIPLE IMPUTATION ESTIMATORS 7 4. Extensions. In this section we investigate the properties of the modified method when it is applied to estimators other than the regression coefficients. Let the complete sample point estimator be a linear estimator of the form (26) ˆθ n = α i Y i i=1 for some known coefficients α i. Also, let the complete sample estimator for the variance of ˆθ n be a quadratic function of the sample values of the form (27) ˆV n = Ω ij Y i Y j i=1 j=1 for some known coefficients Ω ij. For the ˆV M,n to be asymptotically unbiased for the variance of ˆθ M,n, we need the congeniality assumption as defined in Meng (1994). The congeniality assumption in our context implies (28) Var(ˆθ,n ) = Var(ˆθ n ) + Var(ˆθ,n ˆθ n ). For the case of ˆθ n = ˆβ n discussed in Section 2, congeniality holds because ˆθ,n = ˆβ r Var(ˆβ r ˆβ n ) = (X r X r) 1 σ 2 (X n X n) 1 σ 2 = Var(ˆβ r ) Var(ˆβ n ). We restrict our attention to the case of congenial multiple imputation estimation because otherwise the variance estimator will be biased even asymptotically. The following theorem expresses the bias of Rubin s variance estimator applied to the general class of estimators ˆθ n in (26) ˆV n in (27) under the SW method. Theorem 4.1. Let the assumptions of Lemma 2.1 hold. Assume that the complete sample variance estimator ˆV n in (27) is unbiased for the variance of the complete sample point estimator ˆθ n in (26) under model (1). Let the congeniality assumption (28) hold. Then the multiple imputation point estimator ˆθ M,n is unbiased with variance { r } r Var(ˆθ M,n ) = α 2 i + 2 α i α j h ij + α i α j h ij σ 2 (29) i=1 + M 1 λ i=1 j=r+1 { n α i α j h ij + } α 2 i i=r+1 σ 2.

8 8 J. K. KIM The bias of ˆV M,n as an estimator of Var(ˆθ M,n ) is (30) E(ˆV M,n ) Var(ˆθ M,n ) { r = 2 Ω ij h ij + (1 + λ) Ω ij h ij i=1 j=r+1 + (λ 1) j=r+1 Ω jj } { n + (λ 1) α 2 j + α i α j h ij }σ 2, j=r+1 j=r+1 j=r+1 σ 2 where λ = (r p 2) 1 (r p) h ij = x i (X rx r ) 1 x j. For the proof, see Appendix C. The first term on the right-h side of (30) is the bias of W M,n as an estimator of Var(ˆθ n ) the second term is the bias of (1 + M 1 )B M,n as an estimator of Var(ˆθ M,n ˆθ n ). Note that the bias term in (30) can be written (31) where U = Bias(ˆV M,n ) = 2 { n i=1 j=r+1 (Ω ij + α i α j )h ij + Ω ij h ij σ 2 + (λ 1)Uσ 2, j=r+1 (Ω jj + α 2 j ) } = trace {(Ω n r + α n r α n r )[X n r(x r X r) 1 X n r + I n r]} with Ω n r the lower-right (n r) (n r) partition of Ω = [Ω ij ] α n r = (α r+1,...,α n ). By the nonnegative definiteness of Ω n r, α n r α n r, X n r (X rx r ) 1 X n r I n r, the U term is nonnegative. Thus, if we use the modified method suggested in Section 3 so that we have λ = 1, the variance term (29) decreases the bias will be reduced to (32) Bias(ˆV M,n ) = 2 Ω ij h ij σ 2, i=1 j=r+1 which is always smaller than the original bias in (31) because λ = (r p 2) 1 (r p) > 1. A sufficient condition for the bias term in (32) to be zero is (33) Ω = c[i n X n (X nx n ) 1 X n]

9 MULTIPLE IMPUTATION ESTIMATORS 9 for some constant c > 0. To show this, let δ ij be the (i,j)th element of I n. Then Ω ij h ij = c[δ ij x i (X n X n) 1 x j ][x i (X r X r) 1 x j ] i=1 i=1 [ n ] = cx j (X r X r) 1 x j cx j (X n X n) 1 x i x i (X r X r) 1 x j = 0 i=1 the bias term in (32) equals zero, which alternatively justifies our assertion of zero bias in Section Simulation study. To see the effect of changing ν = r p into ν = r p + 2, we performed a limited simulation. The simulation study can be described as a factorial design with L = 50,000 samples within each cell, where each sample is generated from (34) Y i = 2 + 4x i + e i, where x i = (n + 1) 1 i e i are independently identically distributed from the stard normal distribution. Thus, the population mean of (X,Y ) is (10, 42). The factors are as follows: 1. factor A, method of multiple imputation SW method (ν = r 2), new method (ν = r); 2. factor B, response rate (r/n) 0.8, 0.6, 0.4; 3. factor C, sample size (n) 20, 200. We used a uniform response mechanism M = 5 repeated imputations. Table 1 presents the mean, the variance the percentage relative efficiency of the point estimators under the two imputation schemes. The percentage relative efficiency (PRE) is PRE = [Var L (ˆθ SW )] 1 Var L (ˆθ new ) 100, where the subscript L denotes the distribution generated by the Monte Carlo simulation. Both imputation methods are unbiased for the two parameters the Monte Carlo results are consistent with that property. The new procedure is slightly more efficient than the SW procedure the efficiency is greater for lower response rates. Table 2 presents the relative bias z-statistics for the variance estimators. The relative bias of ˆV as an estimator of the variance of ˆθ is calculated as [Var L (ˆθ)] 1 [E L (ˆV ) Var L (ˆθ)], the z-statistic for testing H 0 :E(ˆV ) = Var(ˆθ) is (35) z-statistic = L 1/2 [E L ˆV ) VarL (ˆθ)] {E L [ˆV E L (ˆV ) + Var L (ˆθ) (ˆθ E L (ˆθ)) 2 ] 2 } 1/2.

10 10 J. K. KIM Table 1 Mean, variance the percentage relative efficiency (PRE) of the multiple imputation point estimators under the two different imputation schemes (50,000 samples) Mean Variance PRE Parameter n r/n SW New SW New (%) Mean Slope A heuristic argument for the justification of the z-statistic is made in Appendix D. Under the SW imputation, the relative bias is larger for smaller samples for smaller response rates. The new imputation produces much smaller relative bias for the variance estimator. For large sample sizes both imputation methods produce negligible relative biases of the variance estimators. Table 2 Relative bias (RB) the z-statistic of Rubin s variance estimators under the two different imputation schemes (50,000 samples) RB z-statistic Parameter n r/n SW New SW New Mean Slope

11 MULTIPLE IMPUTATION ESTIMATORS 11 Table 3 Mean length the coverage of 95% confidence intervals under the two different imputation schemes (50,000 samples) Mean length Coverage (%) Parameter n r/n SW New SW New Mean Slope Table 3 displays the mean lengths the coverages of 95% confidence intervals. The confidence intervals are (ˆθ t ˆV, ˆθ+t ˆV ), where t = t0.025,ν ν is computed using the method of Barnard Rubin (1999). The coverages of the confidence intervals are all close to the nominal level. For small sample sizes the confidence intervals based on the new imputation are slightly narrower than the confidence intervals based on the SW method. Interval estimation shows less dramatic results for small sample size than variance estimation. This is partly because the distributions of point estimates are bell-shaped partly because the degrees of freedom of Barnard Rubin (1999) attenuate the effect of small sample bias of the variance estimator. 6. Discussion. We study the mean the covariance structure of the data set generated by the conventional multiple-imputation method under the regression model. Using the mean the covariance structure of the multiply-imputed data set, we investigate the exact bias of Rubin s variance estimator. The bias of Rubin s variance estimator is negligible for large sample sizes, as discussed by SW, but the bias may be sizable for small sample sizes. We propose a simple modified imputation method that is more efficient than the SW method makes Rubin s variance estimator unbiased. When applied to a general class of linear estimators, the proposed method produces more efficient estimates has smaller bias for variance estimation than that of the SW method. In a simulation study we found that the bias of Rubin s variance estimator under the SW method is remarkably large for small sample sizes. The bias of Rubin s variance estimator under the new method is reasonably small in the simulation.

12 12 J. K. KIM In practice, the small sample bias of the multiple imputation variance estimator is of special concern when the scale parameter σ is generated with small degrees of freedom. One such example is stratified sampling, where the sample selection is performed independently across the strata. In a stratified sample the assumption of equal σ across the strata is not a reasonable assumption. Thus, the scale parameters have to be generated independently within each stratum, using only the respondents in the stratum, which often makes the degrees of freedom very small even for a large data set. The new method will significantly reduce the bias in this case. A commonly used imputation model is the cell mean model, where the study variables are assumed homogeneous within each cell. Under the cell mean model Rubin Schenker (1986) considered various multiple imputation methods. Since the imputation is performed separately within each cell, the scale parameters are generated independently within each cell. Thus, the methods considered by Rubin Schenker (1986) are subject to small sample biases. The biases can be significant when there are a small number of respondents within each cell. Recently, Kim (2002) proposed an alternative imputation method of making the variance estimator unbiased under the cell mean model. APPENDIX A Proof of Lemma 2.1. By (3), (A.1) So E(x j β (k) y r) = x j ˆβ r. Cov(Y i,y j(k) ) = Cov {Y i,e(y j(k) y r)} = Cov(Y i,x j ˆβ r ) = x i (X r X r) 1 x j σ 2. Now, by (2), (A.2) where λ = (r p 2) 1 (r p). So (A.3) E(σ 2 (k) ) = E(σ 2 (k) y r) = E(λˆσ 2 r) = λσ 2, Cov(x iβ (k),x jβ (k) y r) = Cov[E(x iβ (k) y r,σ (k) ),E(x jβ (k) y r,σ (k) ) y r], for k s, (A.4) + E[Cov(x i β (k),x j β (k) y r,σ (k) ) y r] = E[x i (X r X r) 1 x j σ 2 (k) y r] = x i (X r X r) 1 x jˆσ 2 r λ, Cov(x i β (k),x j β (s) y r) = 0.

13 Hence, by (A.1) (A.3), (A.5) MULTIPLE IMPUTATION ESTIMATORS 13 Cov(x iβ (k),x jβ (k) ) = Cov[E(x iβ (k) y r),e(x jβ (k) y r)], for k s, by (A.1) (A.4), (A.6) + E[Cov(x iβ (k),x jβ (k) y r)] = Cov(x iˆβ r,x j ˆβ r ) + E{x i (X r X r) 1 x jˆσ 2 r λ} = (1 + λ)x i (X r X r) 1 x j σ 2 Cov(x iβ (k),x jβ (s) ) = E[Cov(x iβ (k),x jβ (s) y r)] By (4) we have, for i j, (A.7) (A.8) + Cov[E(x i β (k) y r),e(x j β (s) y r)] = Cov(x iˆβ r,x j ˆβ r ) = x i(x X) 1 x j σ 2. Cov(Yi(k),Yj(s) ) = Cov(x i β (k),x j β (s) ) { Cov(Yi(k) Var(x,Yj(k) ) = i β (k) ) + E(σ 2 (k)), if i = j, Cov(x i β (k),x j β (k)), if i j. Therefore, inserting (A.2), (A.5) (A.6) into (A.7) (A.8), result (12) follows. APPENDIX B Proof of Theorem 2.1. Before we calculate the variance of the multiple imputation estimator we provide the following matrix identity. Lemma B.1. Let X n be an n p matrix of the form X n = (X r,x n r), where X r is an r p matrix X n r is an (n r) p matrix. Assume that X nx n X rx r are nonsingular. Then (X r X r) 1 = (X n X n) 1 + (X n X n) 1 X n r X n r(x n X n) 1 (B.1) + (X nx n ) 1 X n rx n r (X rx r ) 1 X n rx n r (X nx n ) 1. Proof. Using the identity [e.g., Searle (1982), page 261] (B.2) (D CA 1 B) 1 = D 1 + D 1 C(A BD 1 C) 1 BD 1 with A = I, B = X n r, C = X n r D = X n X n, we have (X r X r) 1 = (X n X n) 1 + (X n X n) 1 X n r [I X n r(x n X n) 1 X n r (B.3) ] 1 X n r (X n X n) 1.

14 14 J. K. KIM Using (B.2) again with A = X nx n, B = X n r, C = X n r D = I, we have (B.4) [I X n r (X n X n) 1 X n r ] 1 = I + X n r (X r X r) 1 X n r. Inserting (B.4) into (B.3), we have (B.1). Note that ˆβ M,n = M 1 M k=1 ˆβI(k),n the ˆβ I(1),n, ˆβ I(2),n,..., ˆβ I(M),n are identically distributed. Thus, (B.5) Var(ˆβ M,n ) = (1 M 1 )Cov(ˆβ I(1),n, ˆβ I(2),n ) + M 1 Var(ˆβ I(1),n ). Define a i = (X nx n ) 1 x i h ij = x i (X rx r ) 1 x j. By (11) (12) (B.6) (B.7) Cov(ˆβ I(1),n, ˆβ r r I(2),n ) = a i a iσ a i h ij a jσ 2 i=1 i=1 j=r+1 + a i h ij a j σ2 r r Var(ˆβ I(1),n ) = a i a i σ2 + 2 a i h ij a j σ2 i=1 i=1 j=r+1 + (1 + λ) a i h ij a j σ2 + λ a i a i σ2. i=r+1 By the definition of a i h ij, we have (B.8) (B.9) (B.10) r a i a i = (X nx n ) 1 X rx r (X nx n ) 1, i=1 r a i h ij a j = (X n X n) 1 X n r X n r(x n X n) 1 i=1 j=r+1 = a i a j a i h ij a j = (X n X n) 1 X n r X n r(x r X r) 1 X n r X n r(x n X n) 1.

15 MULTIPLE IMPUTATION ESTIMATORS 15 Hence, inserting (B.8) (B.10) into (B.6) (B.7), applying X rx r + X n r X n r = X n X n (B.1), we have (B.11) (B.12) Cov(ˆβ I(1),n, ˆβ I(2),n ) = (X r X r) 1 σ 2 Var(ˆβ I(1),n ) = (X r X r) 1 σ 2 + λ[(x r X r) 1 (X n X n) 1 ]σ 2. Thus, (15) is proved by (B.5). To show (17), because the ˆβ I(1),n, ˆβ I(2),n,..., ˆβ I(M),n are identically distributed, (B.13) E(B M,n ) = Var(ˆβ I(1),n ) Cov(ˆβ I(1),nˆβI(2),n ). Thus, (17) is proved by inserting (B.11) (B.12) into (B.13). To show (16), we define Ỹ (k) = (y r,y(k) ) to be the vector of the augmented data set at the kth repeated imputation, where y(k) = (Y r+1(k),y r+2(k),...,y k = 1,2,...,M. Then (n p)ˆσ 2 I(k),n = Ỹ (k) [I n X n (X nx n ) 1 X n]ỹ(k), under the regression model (1), E{Ỹ (k) [I n X n (X nx n ) 1 X n]ỹ(k)} = trace {[I n X n (X nx n ) 1 X n]v (Ỹ (k) )}. By (11) (12) we have ( V (Ỹ (k) ) = I r X n r (X r X r) 1 X r X r (X r X r) 1 X n r (1 + λ)x n r (X r X r) 1 X n r + λi n r Let P x = X n (X n X n) 1 X n Q = V (Ỹ (k) )σ 2 I n. Then (B.14) E{(n p)ˆσ 2 I(k),n } = E{(I P x)(q x + I n )σ 2 } = trace{i n P x }σ 2 + trace{[i n P x ]Q}σ 2. ) σ 2. By the classical regression theory trace{i n P x } = n p. For the second term note that the left-upper r r elements of Q are all zeros. Define C = (X n X n) 1 X n r X n r D = (X r X r) 1 X n r X n r. Then (B.15) trace(q) = trace {(1 + λ)x n r (X r X r) 1 X n r + (λ 1)I n r} = (1 + λ)trace(d) + (λ 1)(n r) trace(p x Q) = 2trace {X n r (X n X n) 1 X n r } (B.16) + (1 + λ)trace {X n r (X n X n) 1 X n r X n r(x r X r) 1 X n r } + (λ 1)trace {X n r (X nx n ) 1 X n r} = (1 + λ)trace(c + CD). n(k) ),

16 16 J. K. KIM Note that, using X rx r + X n rx n r = X nx n, we have (B.17) trace(d CD) = trace {[I (X nx n ) 1 X n rx n r ](X rx r ) 1 X n rx n r } = trace {(X nx n ) 1 X n rx n r } = trace(c). Thus, by applying (B.17) to (B.15) (B.16), we have (B.18) trace(q P x Q) = (λ 1)(n r). Therefore, inserting (B.18) into (B.14), we have (16). APPENDIX C Proof of Theorem 4.1. The variance formula (29) directly follows by applying the multiple imputation variance formula (B.5) to ˆθ M,n using a i = α i in (B.6) (B.7). To show (30), we decompose the total variance into three parts: (C.1) Var(ˆθ M,n ) = Var(ˆθ n ) + Var(ˆθ M,n ˆθ n ) + 2Cov(ˆθ n, ˆθ M,n ˆθ n ). To compute the bias of W M,n as an estimator of Var(ˆθ n ), we first express it as ˆV n = Y nωy n, where Y n = (y r,y n r) Ω = [Ω ij ]; then the withinimputation variance term can be written W M,n = M 1 M k=1 Ỹ (k) ΩỸ(k). By E(Ỹ(k)) = E(Y n ), E(ˆV (k) ) = E(Ỹ (k) )ΩE(Ỹ(k)) + trace {ΩVar(Ỹ(k))} = E(ˆV n ) + trace {Ω[Var(Ỹ(k)) Var(Y n )]}. By the unbiasedness of ˆV n by the covariance structures in (11) (12) we have (C.2) E(W M,n ) Var(ˆθ n ) = 2 r i=1 j=r+1 + (λ 1)σ 2 n Ω ij h ij σ 2 + (σ 2 + λσ 2 ) j=r+1 Ω jj. Ω ij h ij For the B M,n term it can be shown that, using the same argument as for (B.6) (B.7), (C.3) E(B M,n ) = Var(ˆθ I(1),n ) Cov(ˆθ I(1),nˆθI(2),n ) ( n ) = α i α j h ij + λσ 2. α 2 i j=r+1

17 By the covariance structures in (11) MULTIPLE IMPUTATION ESTIMATORS 17 (C.4) ( Cov(ˆθ M,n, ˆθ r r n ) = α 2 i + α i α j h ij )σ 2. i=1 i=1 j=r+1 Thus, by (29), (C.4) Var(ˆθ n ) = n i=1 α 2 i σ2, Var(ˆθ M,n ˆθ n ) = Var(ˆθ M,n ) + Var(ˆθ n ) 2Cov(ˆθ M,n, ˆθ n ) (C.5) ( = (1 + M 1 n n ) λ) α 2 i + α i α j h ij, by (C.3) (C.5), i=r+1 σ 2 (C.6) E[(1 + M 1 )B M,n ] Var(ˆθ M,n ˆθ n ) [ n = (λ 1) α 2 i + n i=r+1 α i α j h ij ]σ 2. For the covariance term in (C.1) note that (C.7) Cov(ˆθ n, ˆθ M,n ˆθ n ) = Cov(ˆθ n, ˆθ,n ˆθ n ) + Cov(ˆθ n, ˆθ M,n ˆθ,n ) = 0, because the first term on the right-h side of the above equality is zero by the congeniality condition (28) the second term is also zero because Cov(ˆθ n, ˆθ M,n ˆθ,n ) = Cov(ˆθ n, ˆθ M,n ) Cov(ˆθ n, ˆθ,n ) = 0, by the fact that ˆθ I(k), k = 1,2,...,M, are identically distributed. Therefore, (30) follows from (C.2), (C.6) (C.7). APPENDIX D D.1. Justification for z-statistic in (35). Let (ˆθ i, ˆV i ), i = 1,2,...,L, be i.i.d. samples from a bivariate distribution G(ˆθ, ˆV ) with second moments. Then E(ˆV ) is unbiasedly estimated by E L (ˆV ) = L 1 L i=1 ˆVi Var(ˆθ) is unbiasedly estimated by (L 1) 1 L E L [(ˆθ E L (ˆθ)) 2 ]. = L 1 L i=1 (ˆθ E L (ˆθ)) 2, where E L (ˆθ) = L 1 L i=1 ˆθi. Thus, by the central limit theorem, (D.1) Z = E L(ˆV ) E L [(ˆθ E L (ˆθ)) 2 ] [E(ˆV ) Var(ˆθ)] Var{E L (ˆV ) E L [(ˆθ E L (ˆθ)) 2 ]}

18 18 J. K. KIM converges to a N(0,1) distribution as L. As E L (ˆV ) E L [(ˆθ E L (ˆθ)) 2 ] = L 1 L i=1 [ˆV i (ˆθ i E L (ˆθ)) 2 ], the variance term in the denominator of (D.1) is consistently estimated by L 1 E L {[ˆV (ˆθ E L (ˆθ)) 2 E L [ˆV (ˆθ E L (ˆθ)) 2 ]] 2 }. = L 1 E L {[ˆV (ˆθ E L (ˆθ)) 2 E L (ˆV ) + Var L (ˆθ)] 2 }. Thus, using Slutsky s theorem, the z-statistic in (35) converges to a N(0, 1) distribution under H 0 :E(ˆV ) = Var(ˆθ) as L. Acknowledgments. The author thanks Wayne Fuller, Lou Rizzo the referees for their useful comments suggestions. REFERENCES Barnard, J. Rubin, D. B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika MR Kim, J. K. (2002). A note on approximate Bayesian bootstrap imputation. Biometrika MR Meng, X.-L. (1994). Multiple-imputation inferences with uncongenial sources of input (with discussion). Statist. Sci Meng, X.-L. Zaslavsky, A. M. (2002). Single observation unbiased priors. Ann. Statist MR Rubin, D. B. (1976). Inference missing data (with discussion). Biometrika MR Rubin, D. B. (1978). Multiple imputations in sample surveys: A phenomenological Bayesian approach to nonresponse (with discussion). In Proc. Survey Research Methods Section Amer. Statist. Assoc., Alexria, VA. Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley, New York. MR Rubin, D. B. Schenker, N. (1986). Multiple imputation for interval estimation from simple rom samples with ignorable nonresponse. J. Amer. Statist. Assoc MR Schenker, N. Welsh, A. H. (1988). Asymptotic results for multiple imputation. Ann. Statist MR Searle, S. R. (1982). Matrix Algebra Useful for Statistics. Wiley, New York. MR Wang, N. Robins, J. M. (1998). Large sample theory for parametric multiple imputation procedures. Biometrika MR Department of Applied Statistics Yonsei University 134 Sinchon-dong, Seodaemun-gu Seoul Korea kimj@yonsei.ac.kr

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume