Inferences on missing information under multiple imputation and two-stage multiple imputation

Size: px

Start display at page:

Download "Inferences on missing information under multiple imputation and two-stage multiple imputation"

Leo Scott
5 years ago
Views:

1 p. 1/4 Inferences on missing information under multiple imputation and two-stage multiple imputation Ofer Harel Department of Statistics University of Connecticut Prepared for the Missing Data Approaches in the Health and Social Sciences: A Modern Survey meeting, October, 2010.

2 p. 2/4 Outline Background Methods Multiple imputation (MI) Two-stage MI Asymptotic results Number of imputations Discussion Current/future work

3 Background The missing data problem Most statistical analysis and estimation procedures were not designed to handle missing values. Even a small amount of missing values can cause great difficulty. The missing-data aspect is nuisance, not of primary interest. The Goal To make statistically valid inferences about population parameters from an incomplete data set. Not to estimate, predict, or recover the missing values themselves. Untestable assumptions are inevitable. Sensitivity analysis is helpful. p. 3/4

4 p. 4/4 Background Information The information concept was introduced by Fisher in the 30 s. The use of information, information matrices and other functions of the information concepts have become very common. For example The inverse of the covariance matrix. Information based tests such as AIC and BIC. The information concept became important for both frequentists and bayesians.

5 p. 5/4 Background Information I am interested in using the information concept in an incomplete-data situation. Rubin introduced the rate of the missing information, after partitioning the complete information into observed and missing parts. Both Rubin and Schafer argue for the importance of this measure. I am going to develop this concept and show its importance. I am going to touch on possible extensions. How do we define the rates of missing information?

6 p. 6/4 Why do we care? Can we use MI (or any other missing data procedure) if we have 1%, 5%, 10%, 25%, 50%, 75%, 90% missing values? what is the cut point?

7 p. 7/4 Why do we care? Can we use MI (or any other missing data procedure) if we have 1%, 5%, 10%, 25%, 50%, 75%, 90% missing values? what is the cut point? How confident are we in the results from incomplete data? When using MI, how do we determine the number of imputations? Can we evaluate the effect of a missing value or a group of missing values? Can we evaluate our missing data model, and/or its effect on our inference?

8 p. 8/4 Multiple imputation Multiple imputation (Rubin, 1987): a simulation-based approach to missing data. Imputation: Create m imputations of the missing data, Y (1) (m) mis,...,y mis, under a suitable model. Analysis: Analyze each of the m completed data sets in the same way. Combination: Combine the m sets of estimates and variances using Rubin s (1987) rules.

9 p. 9/4 Multiple imputation MI is separated into 3 stages: Imputation stage, Analysis stage, and Combining Results stage. Observed Data???? Im putation 1 2 m

10 p. 10/4 Multiple imputation Imputation: Under an ignorability assumption we draw m sets of imputations from P(Y mis Y obs ), while under non-ignorable models we draw the imputations from P(Y mis Y obs,m). Analysis: Given the complete data sets we can use any complete-data procedures to analyze the data.

11 p. 11/4 Multiple imputation Combining rules: Calculate and store the estimates ˆQ (j) and variancesu (j) for j = 1,...,m and combine: Q = m 1 m ˆQ (j) j=1 U = m 1 m j=1 U (j) m (ˆQ(j) Q) 2 B = (m 1) 1 j=1 T = U +(1+m 1 )B

12 p. 12/4 MI - Cont. Which leads to an approximate 95% interval for Q Q ±t ν T, where the degrees of freedom are ν = (m 1) [ ] (1+m 1 2 )B. T The rate of missing information is λ = U 1 (ν +1)(ν +3) 1 T 1 U 1. This is approximately λ = B U+B when ν is large.

13 p. 13/4 Rates of missing information The rate of missing information is the proportion of missing information to the complete information. The rate of missing information governs the convergence rates of the EM algorithm and the data augmentation. The lower the rate of missing information, the faster the convergence will be. The rate of missing information is important in assessing how missing information contributes to inferential uncertainty about the population quantity of interest.

14 p. 14/4 Rates of missing information The naive estimate of the rate of missing information is the rate of missing values. The naive estimate will be accurate in only very limited situations. The rate of missing information is a function not only of the amount of missing values but also of their importance. When using MI with a small number of imputations, the estimate of the rates of missing information can be very noisy.

15 p. 15/4 Two-types of missing values In many situations, missing values may be oftwo qualitatively different types. Examples include: planned missingness versus unplanned. unit non-response versus item non-response. refusal versus don t know. latent variables versus missing manifest items. deaths versus other types of dropout. dropouts versus subjects who return. dropout for reasons that might be closely related to outcomes, versus dropout for other reasons.

16 p. 16/4 Two-stage multiple imputation Extends Rubin s (1987) framework to two types of missing values. Imputation: Createmimputations ofymis A the first level of the missing data, Y A(1) A(m) mis,...,ymis. Then, for eachymis, A generate n imputations of Ymis B from a conditional predictive distribution given Y A mis. Analysis: Analyze each of the mn completed data sets in the same way. Combination: Combine the mn sets of estimates and variances by Shen s (2000) rules.

17 p. 17/4 Two-stage MI m A A A B B A B 1 n 1 n 1 n 1 n

18 Shen rules From Shen (2000, unpublished Ph.D. thesis) calculate and store: ˆQ (j,k) = estimate of Q U (j,k) = standard error 2 for j = 1,...,m and k = 1,...,n. Then: Q.. = 1 mn m n j=1 k=1 ˆQ (j,k) Qj. = 1 n n k=1 ˆQ (j,k) U = B = 1 mn W = 1 m m 1 m 1 n j=1 k=1 m j=1 U (j,k) m ) 2 (ˆQ (j.) Q.. j=1 1 n 1 n (ˆQ(j,k) Q ) 2 j. k=1 p. 18/4

19 p. 19/4 Shen rules - Cont. The total variance is T = U.. +(1+ 1 m )B +(1 1 n )W. An approximate 95% interval for Q is Q ±t ν T, where the degrees of freedom are ν 1 = 1 m(n 1) ( (1 1 n )W T ) (m 1) ( (1+ 1 m )B ) 2. T

20 p. 20/4 Shen rules - Cont. The overall estimated rate of missing information is ˆλ = B +(1 n 1 )W U.. +B +(1 n 1 )W. The estimated rate of missing information due to Y B mis if Y A mis was known is ˆλ B A = W U.. +W, and the difference ˆλ A = ˆλ ˆλ B A represents the decrease in the rate of missing information if Y A mis becomes known.

21 p. 21/4 Rates of missing information The overall estimated rate of missing information in two-stage MI is equivalent to the rate of missing information in conventional MI. The partition of the rates of missing information is based on the partition of the missing values. There are many ways to partition the missing data. Similar to conventional MI, a small number of imputations may lead to a noisy estimate of the rates of missing information.

22 p. 22/4 Asymptotic Results - Conventional MI The random imputations, Y (j) mis P(Y mis Y obs ), induces a probability distribution to B. Rubin argues that (m 1)B B χ 2 m 1, where B = V(ˆQ(Y obs,y mis ) Y obs ). Assuming that the above is reasonable, the Central Limit Theorem indicates that m/2(b B ) B N(0,1) asm, where E(B) = B and V(B) = 2B 2 /(m 1).

23 p. 23/4 Asymptotic Results - Conventional MI Because ˆλ = B/(B +U), we can use the delta method to derive m(ˆλ λ) 2B 2 U 2 (B+ U) 4 which may also be written as N(0,1), m(ˆλ λ) 2λ(1 λ) N(0,1). Therefore, an approximate 95% confidence interval for the population rate of missing information is ˆλ± 1.96ˆλ(1 ˆλ) m/2.

24 p. 24/4 Asymptotic Results - Conventional MI The rate of missing information is a proportion. In situations where the rate of missing information is close to zero or one, it may be useful to apply a normal approximation on the logit scale, Logit(p) = log[p/(1 p)]; this result in m(logit(ˆλ) Logit(λ)) N(0,2). the logit transformation is a first order variance-stabilizing transformation for ˆλ

25 p. 25/4 Asymptotic Results - two-stage MI In two-stage MI, the sources of the variance can be separated into a between and within nest. ˆQ jk Q.. = ( Q j. Q.. )+(ˆQ jk Q j. ). Source DF SS MS E(MS) Between m 1 n m j=1 ( Q j. Q.. ) 2 nb σw 2 +nσ2 B Within m(n 1) m n j=1 k=1 (ˆQ jk Q j. ) 2 W σw 2

26 p. 26/4 Asymptotic Results - two-stage MI From the ANOVA literature it is known that the mean square errors are independent chi-squares random variables such that, ( σ 2 W +nσb 2 m 1 ) 1 nb χ 2 m 1 ( σ 2 W m(n 1) ) 1 W χ 2 m(n 1).

27 p. 27/4 Asymptotic Results - two-stage MI A first-order approximation to the joint distribution ofb and W is m V B W E B W N (0,I), 0 V 2 where V 1 = 2(σ 2 W +nσ2 B )2 /n 2 and V 2 = 2(σ 2 W )2 /(n 1). Using a multivariate delta method expansion, similar to the procedure we have done for conventional MI we get

28 p. 28/4 Asymptotic Results - two-stage MI mσ 1 1 λ λ B A ˆλ ˆ λ B A N (0,I), where Σ 1 = V 1 U 2 +V 2 U 2 (1 n 1 ) V 2 U 2 (1 n 1 ) (U+B+(1 n 1 )W) 4 (U+B+(1 n 1 )W) 2 (U+W) 2 V 2 U 2 (1 n 1 ) V 2 U 2 (U+B+(1 n 1 )W) 2 (U+W) 2 (U+W) 4.

29 Asymptotic Results - two-stage MI Rewriting the covariance matrixσ1 as a function of λ andλ B A gives Σ 1 = [ λ 2(1 λ)4 1 λ λb A n 1 1 λ B A n 2(1 λ) 2 (λ B A ) 2 n ] 2 2(1 λ) 2 (λ B A ) 2 n 2(1 λ B A ) 2 (λ B A ) 2 (n 1). When the rates of missing information are close to zero or one, we can calculate an approximation on a logit scale mσ 1 Logit(λ) 2 Logit(ˆλ) Logit(λ B A ) Logit( λ B A ˆ ) N (0,I), where Σ 2 = 2(1 λ) 2 λ 2 [ λ 1 λ λb A 2(1 λ)λ B A λ(1 λ B A )n ] 2 n 1 2(1 λ)λ B A 1 λ B A n λ(1 λ B A )n 2 (n 1). p. 29/4

30 p. 30/4 Asymptotic Results - simulations Simulation studies were designed to evaluate the theoretical distributions. Simulation studies were performed for both conventional MI and two-stage MI. Different levels of rates of missing information and number of imputations were simulated for conventional MI and the estimates, variances and coverage were tested. For two-stage MI, the rates of missing information and the number of imputations in each level were simulated, and again the estimates variances (covariances) and coverage were tested. The simulated and theoretical results were very close.

31 p. 31/4 Simulations results Comparison of asymptotic and simulated behavior of the estimated rate of missing information. Plot of SE(ˆλ) 100

32 p. 32/4 Simulations results Comparison of asymptotic and simulated behavior of the estimated rate of missing information. Plot of SE(ˆλ) 100

33 Number of imputations Rubin (1987) demonstrated that a limited number of imputations (5-20) is enough. This is true since the relative efficiency on the variance scale of a point estimate based on m imputations, to one based on an infinite number of imputations, is(1+λ/m) 1. m λ p. 33/4

34 p. 34/4 Number of imputations - Cont. From the standpoint of point-estimation alone, there is little to gain from using more than a few imputations unless the rates of missing information are unusually large. When we are interested in the rates of missing information, estimates of these rates can be very unstable (Schafer, 1997). I will determine the number of imputations necessary for obtaining reliable estimates for the missing information rate.

35 p. 35/4 Number of imputations - Cont. Table 1: Recommendations for choosing m in Conventional MI to estimateλwith given accuracy ˆλ SE(ˆλ) = SE(ˆλ) = SE(ˆλ) =

36 p. 36/4 Number of imputations - Cont. For two-stage MI, one can invoke similar arguments for the relative efficiency (1+λ /m) 1, where λ = λ λb A (1 λ)(1 n 1 ). (1 λ B A ) Because0 λ B A λ, the relative efficiency lies between (1+λ/m) 1 and (1+λ/(mn)) 1.

37 Number of imputations - Cont. Recommendations for choosing m and n in two-stage MI where SE(ˆλ B A ) SE(ˆλ) 0.03 ˆλ B A n ˆλ p. 37/4

38 Number of imputations - Cont. Recommendations for choosing m and n in two-stage MI where SE(ˆλ B A ) SE(ˆλ) 0.02 ˆλ B A n ˆλ p. 38/4

39 Number of imputations - Cont. Recommendations for choosing m and n in two-stage MI where SE(ˆλ B A ) SE(ˆλ) 0.01 ˆλ B A n ˆλ p. 39/4

40 p. 40/4 Discussion When the researcher s main interest is point estimates (and their variances), it is sufficient to use a modest number of imputations. When the rates of missing information are a concern, more imputations are recommended. Simulations have been made to test the accuracy of the asymptotic distribution. In most cases, producing more imputations is not a problem. The rates of missing information are an important model diagnostic tool.

41 p. 41/4 Extensions There are numerous measures that assess the effect of an observation, group of observations, a variable, or variables and observations on the regression estimation (Outlier, influence point etc.). Can we assess the effect of a missing observation, a group of missing observations, an incomplete variable or any combination of these, on the overall estimation? We want this measure to be applicable under any missing data mechanism. And applicable to any data analysis problem.

42 p. 42/4 Extensions I call this measure "Outfluence". The Outfluence is a relative measure comparing the effect of a missing value relative to the effect of the rest of the missing values. For example, consider a survey study in which we are interested in the effect of missing income on the estimation. The outfluence will measure the effect of the missing income compared to all other missing values. We already defined the measure (Harel, 2008) and introduce its limiting distribution (Harel & Stratton, 2009). More work needs to be devoted to the development of methods that will evaluate incomplete-data methodology.

43 p. 43/4 Outfluence Consider the next setup:

44 p. 44/4 Marijuana Example A pilot study presented by Weil et al (1968) and analyzed by Schafer (1997). Three treatments (low dose, high dose and placebo) were given to each subject. The change in heart beat was recorded. We will look at the mean of placebo in 90 minute since Schafer (1997) showed missing data have a large effect on this estimate.

45 p. 45/4 Example Data 15 minutes 90 minutes Subject Placebo Low High Placebo Low High NA NA NA NA NA Means Source: Weil et al (1967)

46 p. 46/4 Example Results Missing values ˆ (4,+) (4,4) (5,4) (+,4) (9,+) B A ˆ ˆ

47 p. 47/4 Extensions and Future work Model diagnosis is an importance concept in statistics The rates of missing information can be used as a diagnostic tool (e.g. Harel & Miglioretti 2007). Many measures were developed to compare the performance of competing models in complete data. To date there is almost no research about the performance of methodology with incomplete data. How can we use measures such as AIC,BIC, DIC to evaluate our missing data modeling?

48 p. 48/4 References Harel, O. (2007) "Inferences on missing information under multiple imputation and two-stage multiple imputation" Statistical Methodology, 4, Harel, O., Miglioretti, D. (2007) "Latent class analysis and the rate of missing information" Journal of Data Science, 5, Harel, O. (2008) "Outfluence - The impact of missing values"model Assisted Statistics and Applications, 3(2), Harel, O., Stratton, J. (2009) "Inferences on the Outfluence How do missing values impact your analysis?" Communications in Statistics-Theory & Methods, In press. Rubin, D.B. (1987)Multiple Imputation for Nonresponse in Surveys. J. Wiley & Sons, New York. Schafer, J.L. (1997)Analysis of Incomplete Multivariate Data. Chapman & Hall, London. Shen, Z. (2000). Nested Multiple Imputation. Ph.D. thesis, Harvard University, Department of Statistics. Weil, A., Zinberg, N., Nelson, J. (1968) Clinical and psychological effects of marihuana in man. Science, 162,

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing