Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models

Hui Xie, Assistant Professor, Division of Epidemiology & Biostatistics, UIC
This is joint work with Drs. Hua Yun Chen and Yi Qian.
Multiple Imputation by OR model p. 1/35

Outline
- Review
- The Semiparametric OR Model (SOR)
- Multiple Imputation (MI) under SOR
- A Simulation Study
- Empirical Application
- Conclusion

Review: Multiple Imputation (MI)
- Missing data are prevalent in practice.
- Improper handling of missing data can cause bias and loss of efficiency.
- MI (Rubin 1987) stands out as a popular method for missing data analysis.
- Software packages for MI:
  - SAS Proc MI and Proc MIANALYZE; S-Plus library missing.
  - Stand-alone packages: MICE, IVEware.
- These packages have made MI easy to apply in practical analysis.

Review: MI
- Let the full data from a sampling unit be denoted by $Y = (Y_1, \ldots, Y_t)$; we observe $n$ i.i.d. replicates of $Y$.
- Let $R = (R_1, \ldots, R_t)$ be the missing data indicator for $Y$, where
  $$R_j = \begin{cases} 1 & \text{if } Y_j \text{ is observed,} \\ 0 & \text{if } Y_j \text{ is missing.} \end{cases}$$
- MI makes multiple draws from the posterior predictive distribution $f(y_{mis} \mid Y_{obs}, R)$.
- For an arbitrary pattern of missingness, a key step is to specify $f(y) = f(y_{mis}, y_{obs})$.

Review: MI
Common approaches to specifying $f(y)$:
- Joint model approach, e.g. $f(y) \sim MVN(\mu, \Sigma)$ (e.g. impgaussian in S-Plus).
- Sequential model approach (e.g. MICE in R):
  1: $f(y_1 \mid Y_2, \ldots, Y_t)$.
  2: $f(y_2 \mid Y_1, Y_3, \ldots, Y_t)$.
  ...
  t: $f(y_t \mid Y_1, Y_2, \ldots, Y_{t-1})$.

Review: Limitations of Existing MI Software
- Inflexibility in modeling mixed discrete and continuous data (Kenward and Carpenter 2007, Schafer 1997):
  - The joint normal model is applied to discrete data.
  - The joint normal model has difficulty incorporating interaction and higher-order terms (van Buuren 2007; Yu, Burton and Rivero-Arias 2007).
- Alternative methods, such as the sequential imputation approach, have the following limitations:
  - Potential incompatibility in model specification (van Buuren, Boshuizen and Knook 1999; Raghunathan et al. 2001; Gelman and Raghunathan 2001).
  - Lack of theory to support the use of the method for MI.

A New Approach: MI under SOR
We propose a novel imputation framework for MI using the conditional Semiparametric Odds Ratio (SOR) model, with the following features:
- It generalizes generalized linear models (GLM).
- It makes no parametric distributional assumptions.
- It is flexible enough to model a mixture of discrete and continuous variables.
- It easily handles bounded or semi-continuous variables, which can be a problem for other imputation approaches.
- It simultaneously addresses the inflexibility of the joint normal model and the potential inconsistency of sequential imputation models.

A New Approach: MI under SOR (cont.)
- Like the hot-deck approach, the proposed approach imputes a missing value by weighted draws from the combinations of the observed values from different missing groups.
- Unlike the hot-deck approach, our imputation is model-based and is proper in Rubin's sense.

A New Approach: MI under SOR
Outline of our work:
- We study Bayesian inference under the SOR model.
- We propose using a Dirichlet process prior (Ferguson 1973, 1974) for the nonparametric parameters in the model.
- We devise an efficient posterior sampling method using a Gibbs sampler combined with the hybrid Monte Carlo method.

SOR model
- Let the full data from a sampling unit be denoted by $Y$; we observe $n$ i.i.d. replicates of $Y$.
- Let the density of $Y = (Y_t, \ldots, Y_1)$ under a product of Lebesgue measures and count measures be decomposed into consecutive conditional densities as
  $$g(y_t, \ldots, y_1) = \prod_{j=1}^{t} g_j(y_j \mid y_{j-1}, \ldots, y_1).$$

SOR model
- For any given conditional density $g_j(y_j \mid y_{j-1}, \ldots, y_1)$, define the odds ratio function relative to a sample point $(y_{j0}, \ldots, y_{10})$ as
  $$\eta_j\{y_j; (y_{j-1}, \ldots, y_1) \mid y_{j0}, \ldots, y_{10}\} = \frac{g_j(y_j \mid y_{j-1}, \ldots, y_1) / g_j(y_{j0} \mid y_{j-1}, \ldots, y_1)}{g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10}) / g_j(y_{j0} \mid y_{(j-1)0}, \ldots, y_{10})}.$$
- For notational simplicity, we use $\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}$ to denote $\eta_j\{y_j; (y_{j-1}, \ldots, y_1) \mid y_{j0}, \ldots, y_{10}\}$.
- Chen (2003, 2004, 2007) showed that the conditional density can be rewritten as
  $$g_j(y_j \mid y_{j-1}, \ldots, y_1) = \frac{\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}\, g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})}{\int \eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}\, g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})\, dy_j}.$$

SOR model
Note that
$$g_j(y_j \mid y_{j-1}, \ldots, y_1) = \frac{\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}\, g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})}{\int \eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}\, g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})\, dy_j}.$$
The SOR model decomposes $g_j(y_j \mid y_{j-1}, \ldots, y_1)$ into two parts:
- A marginal-like density function: $f_j(y_j) = g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})$.
- An odds-ratio function: $\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}$.

SOR model
- A nonparametric model for the marginal-like density function: we model $g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})$ nonparametrically by $f_j(y_j)$ and assign point mass to the observed data values of $y_j$.
- A parametric log-bilinear model for the odds-ratio function:
  $$\log \eta_j\{y_j; (y_{j-1}, \ldots, y_1)\} = \sum_{k=1}^{j-1} \theta_{jk} (y_j - y_{j0})(y_k - y_{k0}).$$
- More generally, to include interaction and higher-order terms,
  $$\log \eta_j\{y_j; (y_{j-1}, \ldots, y_1), \theta\} = \sum_{k=1}^{j-1} \sum_{m_k=1}^{M_k} \sum_{l=1}^{L} \theta_{j l k m_k} \prod_{u=1}^{d_j} (y_{ju} - y_{ju0})^{l_u} \prod_{v=1}^{d_k} (y_{kv} - y_{kv0})^{m_{kv}}.$$
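As a minimal sketch of how the two pieces combine, the following Python function evaluates the conditional probability mass of $y_j$ on a finite support, using the log-bilinear odds ratio and a baseline $f_j$ that puts mass on the observed values. The function name, argument names, and the use of zero as a default reference point in the example are assumptions made for illustration; they do not come from the slides.

```python
import math


def sor_conditional_pmf(support, baseline, theta, y_prev, y_ref, prev_ref):
    """Conditional pmf of y_j on `support`, given conditioning values y_prev.

    support  : candidate values of y_j (for example, the observed data points)
    baseline : f_j evaluated on `support` (positive, need not be normalized)
    theta    : coefficients theta_jk, one per conditioning variable
    y_ref    : reference point y_j0
    prev_ref : reference points y_k0 for the conditioning variables
    """
    # Linear part of the log-bilinear odds ratio: sum_k theta_jk * (y_k - y_k0).
    lin = sum(t * (yk - yk0) for t, yk, yk0 in zip(theta, y_prev, prev_ref))
    # log{eta_j(y) * f_j(y)} for every support point.
    log_w = [(y - y_ref) * lin + math.log(f) for y, f in zip(support, baseline)]
    # Normalize on the log scale for numerical stability.
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


# Binary y_j with a flat baseline and one conditioning variable.
pmf = sor_conditional_pmf([0, 1], [0.5, 0.5], [1.0], [2.0], 0.0, [0.0])
```

With all coefficients set to zero the odds ratio is identically one, so the function returns the normalized baseline, which is the independence model.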

SOR model
In summary, the density function of $Y$ is
$$g(y_t, \ldots, y_1 \mid \theta_2, \ldots, \theta_t; f_1, \ldots, f_t) = \prod_{j=1}^{t} \frac{\eta_j\{y_j; (y_{j-1}, \ldots, y_1), \theta_j\}\, f_j(y_j)}{\int \eta_j\{y_j; (y_{j-1}, \ldots, y_1), \theta_j\}\, f_j(y_j)\, dy_j}.$$

Relation to GLM
- SOR nests GLM. Suppose
  $$g_j(y_j \mid y_{j-1}, \ldots, y_1) = \exp\left[\frac{1}{\phi_j}\{\theta_j y_j - b(\theta_j)\} + c(y_j, \phi_j)\right].$$
- One can show that the marginal-like density is
  $$g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10}) = \exp\left[\frac{1}{\phi_j}\{\theta_{j0} y_j - b(\theta_{j0})\} + c(y_j, \phi_j)\right],$$
  and the odds ratio function is
  $$\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\} = \exp\{(y_j - y_{j0})(\theta_j - \theta_{j0}) / \phi_j\}.$$
- With the canonical link function, $\theta_j = \beta_0 + \beta_1 y_1 + \cdots + \beta_{j-1} y_{j-1}$, and the parameters in the odds ratio function are $(\beta_k / \phi_j,\ k = 1, \ldots, j-1)$.
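The nesting can be checked numerically in the simplest case. The sketch below assumes a logistic regression of a binary outcome on one covariate (so $\phi_j = 1$ and $y_{j0} = 0$), builds the conditional probability from the baseline and the odds ratio function, and compares it with the usual logistic formula. The function names and the numeric coefficients are hypothetical.

```python
import math


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def sor_bernoulli(x, beta0, beta1, x_ref=0.0):
    """P(Y = 1 | x) rebuilt from the SOR decomposition of a logistic model."""
    # Marginal-like density: the logistic model evaluated at the reference x_ref.
    f1 = expit(beta0 + beta1 * x_ref)
    f0 = 1.0 - f1
    # Odds ratio function at y = 1 with reference y0 = 0: exp{beta1 * (x - x_ref)}.
    eta1 = math.exp(beta1 * (x - x_ref))
    # eta at y = 0 equals 1, so the normalizer is f0 + eta1 * f1.
    return eta1 * f1 / (f0 + eta1 * f1)


direct = expit(-0.4 + 0.7 * 1.3)
via_sor = sor_bernoulli(1.3, beta0=-0.4, beta1=0.7)
```

The two values agree for any choice of reference point, which reflects the fact that only $\beta_1$ enters the odds ratio function while the intercept is absorbed into the baseline.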

MI under SOR
- First consider Bayesian inference for SOR with complete data. The likelihood under SOR is
  $$\prod_{i=1}^{n} \prod_{j=1}^{t} \frac{\eta_j\{Y_j^i; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, f_j(Y_j^i)}{\int \eta_j\{y_j; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, f_j(y_j)\, dy_j}.$$
- Priors:
  $$\theta_j \sim \psi_j(\theta_j), \qquad f_j \sim D_j(c_j F_j),$$
  where $c_j > 0$ and $F_j$ is a probability distribution.

MI under SOR
Given the above model specification, the posterior distribution of the model parameters is
$$P(\theta_j, f_j \mid Y^i, i = 1, \ldots, n) \propto \left\{\prod_{i=1}^{n} p_j(Y_j^i \mid Y_{j-1}^i, \ldots, Y_1^i, \theta_j, f_j)\right\} D_j(f_j)\, \psi_j(\theta_j)$$
$$\propto \left\{\prod_{i=1}^{n} \frac{\eta_j\{Y_j^i; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}}{\int \eta_j\{y_j; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, f_j(y_j)\, dy_j}\right\} \psi_j(\theta_j)\, D_j\left(c_j F_j + \sum_{i=1}^{n} \delta_{Y_j^i}\right),$$
where $\delta_{Y_j}$ denotes the point measure at $Y_j$.

MI under SOR
- To simplify computation, we set the Dirichlet process prior so that its mean distribution has probability mass on the observed data points.
- This is analogous to using the empirical distribution to approximate the true continuous distribution when $Y_j$ is continuous.
- This is equivalent to replacing $D_j(c_j F_j + n F_{nj})$ with $D((c_j + n) F_{nj})$, where $F_{nj} = n^{-1} \sum_i \delta_{Y_j^i}$. Note that
  $$c_j F_j + n F_{nj} = (c_j + n) F_{nj} + c_j (F_j - F_{nj}).$$
  The second term on the right-hand side is of lower order in $n$ than the first, which suggests that the replacement is approximately right for large $n$.

MI under SOR
- Denote the unique values that $Y_j^i$, $i = 1, \ldots, n$, take by $\{y_{jk}\}$, $k = 1, \ldots, K_j$.
- Let $\delta_{jk}$ denote the frequency with which $Y_j^i = y_{jk}$ for $i = 1, \ldots, n$.
- Let the Dirichlet distribution approximating the prior have parameters $\alpha_{jk}$, $k = 1, \ldots, K_j$.
- Let $\lambda_{jk} = \log(f_{jk} / f_{jK_j})$, $k = 1, \ldots, K_j$.
- The sampling distribution for $(\lambda_j, \theta_j)$ is
  $$P(\theta_j, \lambda_j) \propto \left\{\prod_{i=1}^{n} \frac{\eta_j\{Y_j^i; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}}{\sum_{k=1}^{K_j} \eta_j\{y_{jk}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk}}}\right\} \psi_j(\theta_j) \prod_{k=1}^{K_j} \exp\{(\delta_{jk} + \alpha_{jk} - 1) \lambda_{jk}\}.$$

Updating: A hybrid Monte Carlo (HMC) algorithm
- We apply hybrid Monte Carlo (Liu 2001, Chapter 9) to sample $\lambda_j$ and $\theta_j$.
- Let $U(\lambda_j, \theta_j) = -\ln P(\theta_j, \lambda_j)$ and
  $$H\{(\lambda_j, \theta_j), (p_j, q_j)\} = U(\lambda_j, \theta_j) + \frac{1}{2}\left\{\sum_{k=1}^{K_j - 1} \frac{p_{jk}^2}{m_{jk}} + \sum_{k=1}^{D_j} \frac{q_{jk}^2}{n_{jk}}\right\},$$
  where $(p_j, q_j)$ are auxiliary variables.
- Starting from $(\lambda_j^{old}, \theta_j^{old})$, HMC uses the leap-frog algorithm to propose a candidate draw $(\lambda_j^{new}, \theta_j^{new})$. The candidate is then accepted with probability
  $$\min\left(1, \exp\left[-H\{(\lambda_j^{new}, \theta_j^{new}), (p_j^{new}, q_j^{new})\} + H\{(\lambda_j^{old}, \theta_j^{old}), (p_j^{old}, q_j^{old})\}\right]\right).$$

Updating: Leap-frog Algorithm
- Let $(\lambda_j^0, \theta_j^0) = (\lambda_j^{old}, \theta_j^{old})$.
- Draw $p_j$ from the normal distribution with mean 0 and variance $\mathrm{diag}(m_{j1}, \ldots, m_{j(K_j - 1)})$, and draw $q_j$ from the normal distribution with mean 0 and variance $\mathrm{diag}(n_{j1}, \ldots, n_{jD_j})$.
- The initial momenta $p_j^0$ and $q_j^0$ then have elements
  $$p_{jk}^0 = p_{jk} - \frac{\epsilon}{2} \frac{\partial U}{\partial \lambda_{jk}}\{\lambda_j^0, \theta_j^0\}, \qquad q_{jk}^0 = q_{jk} - \frac{\epsilon}{2} \frac{\partial U}{\partial \theta_{jk}}\{\lambda_j^0, \theta_j^0\},$$
  where $\epsilon$ is the stepsize.

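Updating: Leap-frog Algorithm
From the initial phase space $(\lambda_j^0, \theta_j^0, p_j^0, q_j^0)$ of the system, we run the leap-frog algorithm for $S$ steps to generate a new phase space $(\lambda_j^S, \theta_j^S, p_j^S, q_j^S)$, where at step $s$
$$\lambda_{jk}^s = \lambda_{jk}^{s-1} + \epsilon \frac{p_{jk}^{s-1}}{m_{jk}}, \qquad \theta_{jk}^s = \theta_{jk}^{s-1} + \epsilon \frac{q_{jk}^{s-1}}{n_{jk}},$$
$$p_{jk}^s = p_{jk}^{s-1} - \epsilon_s \frac{\partial U}{\partial \lambda_{jk}}\{\lambda_j^s, \theta_j^s\}, \qquad q_{jk}^s = q_{jk}^{s-1} - \epsilon_s \frac{\partial U}{\partial \theta_{jk}}\{\lambda_j^s, \theta_j^s\},$$
for $s = 1, \ldots, S$, with $\epsilon_s = \epsilon$ for $s < S$ and $\epsilon_s = \epsilon / 2$ if $s = S$, where $\epsilon$ is the user-specified stepsize.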
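The leap-frog proposal and the accept/reject step can be sketched generically in Python. This is an illustration for an arbitrary potential $U$ with a diagonal mass matrix, not the SOR-specific sampler: the function names, the single position vector standing in for $(\lambda_j, \theta_j)$, and the standard normal target in the usage lines are all assumptions.

```python
import math
import random


def leapfrog(x, p, grad_U, mass, eps, S):
    """Initial half momentum step, then S position steps; the last momentum step is a half step."""
    x = list(x)
    p = [pi - 0.5 * eps * gi for pi, gi in zip(p, grad_U(x))]
    for s in range(1, S + 1):
        x = [xi + eps * pi / mi for xi, pi, mi in zip(x, p, mass)]
        step = eps if s < S else 0.5 * eps
        p = [pi - step * gi for pi, gi in zip(p, grad_U(x))]
    return x, p


def hamiltonian(x, p, U, mass):
    """H = U(x) + (1/2) * sum_k p_k^2 / m_k."""
    return U(x) + 0.5 * sum(pi * pi / mi for pi, mi in zip(p, mass))


def hmc_step(x, U, grad_U, mass, eps, S, rng):
    """One HMC update: draw momentum, run leap-frog, accept with prob min(1, exp(-H_new + H_old))."""
    p_old = [rng.gauss(0.0, math.sqrt(m)) for m in mass]
    h_old = hamiltonian(x, p_old, U, mass)
    x_new, p_new = leapfrog(x, p_old, grad_U, mass, eps, S)
    h_new = hamiltonian(x_new, p_new, U, mass)
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new
    return list(x)


# Usage on a standard normal target, where U(x) = x^2 / 2 and grad U(x) = x.
rng = random.Random(0)
U = lambda x: 0.5 * sum(v * v for v in x)
grad_U = lambda x: list(x)
state = [3.0]
draws = []
for _ in range(6000):
    state = hmc_step(state, U, grad_U, [1.0], 0.3, 5, rng)
    draws.append(state[0])
```

The stepsize and number of steps used here (0.3 and 5) are arbitrary tuning values chosen for the toy target; in practice they would be tuned toward the acceptance rate discussed on the next slide.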

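Updating: Leap-frog Algorithm
The derivatives are
$$\frac{\partial U}{\partial \lambda_{jk}} = -(\delta_{jk} + \alpha_{jk} - 1) + \sum_{i=1}^{n} \frac{\eta_j\{y_{jk}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk}}}{\sum_{k'=1}^{K_j} \eta_j\{y_{jk'}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk'}}},$$
$$\frac{\partial U}{\partial \theta_{jk}} = -\frac{\partial}{\partial \theta_{jk}} \log \psi_j(\theta_j) - \sum_{i=1}^{n} \frac{\partial}{\partial \theta_{jk}} \log \eta_j\{Y_j^i; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\} + \sum_{i=1}^{n} \frac{\sum_{k'=1}^{K_j} \frac{\partial}{\partial \theta_{jk}} \eta_j\{y_{jk'}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk'}}}{\sum_{k'=1}^{K_j} \eta_j\{y_{jk'}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk'}}}.$$
HMC has a high optimal acceptance rate (65%, Beskos et al. 2010) while still exploring all areas of the target distribution quickly by exploiting gradient information.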

Updating: A hybrid Monte Carlo algorithm
The Gibbs sampler for sampling $(\lambda_j, \theta_j)$, $j = 1, \ldots, t$, iteratively can be described as follows.
1. Fit an independence model to the data, which is equivalent to setting $\eta \equiv 1$ (or $\theta = 0$) and $f$ to the empirical marginal probability mass function.
2. Given the data, sample $(\theta_j, \lambda_j)$ using the hybrid Monte Carlo approach.
3. Do step 2 for $j = 1, \ldots, t$.
4. Repeat steps 2 and 3 until convergence.

MI under SOR
The MCMC algorithm for MI under SOR is as follows:
1. Initially, the missing values are imputed using the independence model or another method (e.g. MICE).
2. Carry out one step of the hybrid Monte Carlo sampling algorithm for $(\lambda_j, \theta_j)$, $j = 1, \ldots, t$.
3. Draw $\gamma$ from the distribution with density
   $$P(\gamma \mid R^i, Y^i, i = 1, \ldots, n) \propto \prod_{i=1}^{n} \prod_{r} \{\pi(r, Y_1^i, \ldots, Y_t^i, \gamma)\}^{1\{R^i = r\}}\, \xi(\gamma).$$

MI under SOR
The MCMC algorithm for MI under SOR, continued:
4. For each $i = 1, \ldots, n$ and missing group $Y_j$, $j = 1, \ldots, t$, if $Y_j^i$ is missing, impute $Y_j^i$ from the conditional distribution of $Y_j^i$ given $(Y_{-j}^i, R^i, \gamma, \theta_j, f_j)$, which is the discrete distribution proportional to
   $$\pi(R^i, Y_1^i, \ldots, Y_t^i, \gamma) \prod_{l=j}^{t} \frac{\eta_l\{Y_l^i; (Y_{l-1}^i, \ldots, Y_1^i), \theta_l\}}{\int \eta_l\{y_l; (Y_{l-1}^i, \ldots, Y_1^i), \theta_l\}\, f_l(y_l)\, dy_l}\, f_j(Y_j^i).$$
5. Repeat steps 2-4 until convergence.
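To make step 4 concrete, here is a minimal Python sketch for the smallest non-trivial case: two scalar variables $(Y_1, Y_2)$ with $Y_1$ missing and $Y_2$ observed. It assumes the missingness factor $\pi$ does not depend on the missing value, so that it drops out of the weights, and it uses the log-bilinear odds ratio with reference points at zero. These simplifications, and all function names, are assumptions made for illustration.

```python
import math
import random


def log_eta(y, x, theta, y_ref=0.0, x_ref=0.0):
    """Log-bilinear odds ratio: theta * (y - y_ref) * (x - x_ref)."""
    return theta * (y - y_ref) * (x - x_ref)


def imputation_probs_y1(y2_obs, support1, f1, support2, f2, theta):
    """Discrete imputation distribution of a missing Y1 over its observed support.

    Weight of candidate y1: f1(y1) * eta_2{y2_obs; y1} / sum_k eta_2{y_2k; y1} * f2(y_2k).
    The j = 1 factor has no conditioning variables, so only f1 remains from it.
    """
    weights = []
    for y1, f1k in zip(support1, f1):
        numer = math.exp(log_eta(y2_obs, y1, theta))
        denom = sum(math.exp(log_eta(y2, y1, theta)) * f2k
                    for y2, f2k in zip(support2, f2))
        weights.append(f1k * numer / denom)
    total = sum(weights)
    return [w / total for w in weights]


def draw_imputation(support, probs, rng):
    """Weighted draw of one imputed value from the observed support."""
    return rng.choices(support, weights=probs, k=1)[0]


probs = imputation_probs_y1(1, [0, 1], [0.5, 0.5], [0, 1], [0.5, 0.5], theta=1.0)
imputed = draw_imputation([0, 1], probs, random.Random(0))
```

Because candidates are restricted to values that were actually observed, the draw resembles a hot-deck imputation, while the weights come from the fitted model.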

A Simulation Study
We simulated complete data from the following joint distribution of six variables:
- $Y_1 \sim N(0, 1)$.
- Given $Y_1$, $Y_2$ is binary with $\mathrm{logit}\{P(Y_2 = 1 \mid Y_1)\} = \beta_{20} + \beta_{21} Y_1$.
- Given $Y_1$ and $Y_2$, $Y_3$ is normally distributed with unit variance and mean $\mu_1(Y_1, Y_2) = \beta_{30} + \beta_{31} Y_1 + \beta_{32} Y_2 + \beta_{33} Y_1 Y_2$.
- Given $Y_1, Y_2, Y_3$, $Y_4$ is Poisson distributed with rate parameter given by $\ln \lambda(Y_1, Y_2, Y_3) = \beta_{40} + \beta_{41} Y_1 + \beta_{42} Y_2 + \beta_{43} Y_3$.

A Simulation Study
Complete data model, continued:
- Given $Y_1, \ldots, Y_4$, $Y_5$ is normally distributed with unit variance and mean $\mu_2(Y_j, j = 1, \ldots, 4) = \beta_{50} + \beta_{51} Y_1 + \beta_{52} Y_2 + \beta_{53} Y_3 + \beta_{54} Y_4 + \beta_{55} Y_3 Y_4$.
- Given $Y_j$, $j = 1, \ldots, 5$, $Y_6$ is binary with $\mathrm{logit}\{P(Y_6 = 1 \mid Y_j, j = 1, \ldots, 5)\} = \beta_{60} + \beta_{61} Y_1 + \beta_{62} Y_2 + \beta_{63} Y_3 + \beta_{64} Y_4 + \beta_{65} Y_5 + \beta_{66} Y_2 Y_4$.
Two scenarios:
- No interactions: $(\beta_{33}, \beta_{55}, \beta_{66}) = (0, 0, 0)$.
- With interactions: $(\beta_{33}, \beta_{55}, \beta_{66}) = (1, 1, 1)$.
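The data-generating model can be sketched in plain Python as follows. The interaction coefficients (0 or 1) follow the two scenarios on the slide; every other coefficient in `BETA` is a hypothetical placeholder, since the slides do not list the values used. The small Poisson sampler is Knuth's multiplication method, included only to keep the sketch free of third-party dependencies.

```python
import math
import random

# Hypothetical main-effect coefficients (intercept first); NOT the values used in the study.
BETA = {
    2: (0.0, 0.5),
    3: (0.0, 0.5, 0.5),
    4: (0.0, 0.2, 0.2, 0.2),
    5: (0.0, 0.5, 0.5, 0.5, 0.2),
    6: (0.0, 0.5, 0.5, 0.5, 0.2, 0.5),
}


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rpois(lam, rng):
    """Poisson draw by Knuth's method (adequate for moderate rates)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def linpred(beta, values):
    """Intercept plus the main-effect terms."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], values))


def simulate_unit(rng, interaction):
    """One draw of (Y1, ..., Y6); interaction coefficients are 1 or 0 as on the slide."""
    b = 1.0 if interaction else 0.0
    y1 = rng.gauss(0.0, 1.0)
    y2 = int(rng.random() < expit(linpred(BETA[2], [y1])))
    y3 = rng.gauss(linpred(BETA[3], [y1, y2]) + b * y1 * y2, 1.0)
    y4 = rpois(math.exp(linpred(BETA[4], [y1, y2, y3])), rng)
    y5 = rng.gauss(linpred(BETA[5], [y1, y2, y3, y4]) + b * y3 * y4, 1.0)
    y6 = int(rng.random() < expit(linpred(BETA[6], [y1, y2, y3, y4, y5]) + b * y2 * y4))
    return (y1, y2, y3, y4, y5, y6)


rng = random.Random(1)
rows = [simulate_unit(rng, interaction=True) for _ in range(2000)]
```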

A Simulation Study
Missing data model with two scenarios:
(1) MCAR: Randomly set 10% of each variable to be missing.
(2) MAR: The last three variables are subject to missingness with
    $$\mathrm{logit}\{P(R_k = 1 \mid Y_j, j = 1, \ldots, 6)\} = \alpha_{k0} + \alpha_{k1} Y_1 + \alpha_{k2} Y_2 + \alpha_{k3} Y_3, \quad k = 4, 5, 6,$$
    where $\alpha_{k0} = 2$ and $\alpha_{k1} = \alpha_{k2} = \alpha_{k3} = 0.5$.
Both scenarios lead to about 50% complete cases.
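The two mechanisms can be sketched in Python as below. The sketch treats $R_k = 1$ as "observed", following the definition of $R_j$ given earlier, takes the $\alpha$ values exactly as printed on the slide, and implements MCAR as an independent 10% deletion per entry rather than deleting exactly 10% of each column. These are assumptions, as are the function names and the use of `None` to mark a missing value.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def apply_mcar(row, rng, rate=0.10):
    """Delete each entry independently with probability `rate`."""
    return [None if rng.random() < rate else v for v in row]


def apply_mar(row, rng, alpha=(2.0, 0.5, 0.5, 0.5)):
    """Delete Y4, Y5, Y6 with probabilities that depend only on Y1, Y2, Y3."""
    y1, y2, y3 = row[0], row[1], row[2]
    p_observed = expit(alpha[0] + alpha[1] * y1 + alpha[2] * y2 + alpha[3] * y3)
    out = list(row)
    for k in (3, 4, 5):  # zero-based positions of Y4, Y5, Y6
        if rng.random() >= p_observed:
            out[k] = None
    return out


rng = random.Random(2)
row = [0.3, 1, -0.2, 2, 1.1, 0]
mcar_rows = [apply_mcar(row, rng) for _ in range(5000)]
mar_rows = [apply_mar(row, rng) for _ in range(5000)]
complete_fraction = sum(None not in r for r in mcar_rows) / len(mcar_rows)
```

Under the MCAR sketch the expected fraction of complete cases is $0.9^6 \approx 0.53$, in line with the roughly 50% figure on the slide.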

A Simulation Study
Analysis of simulated data:
- Imputation step. The following methods are applied for imputation:
  - MI using SOR.
  - MI using impgaussian or impcgm in the S-Plus library missing.
  - The R package MICE.
- Analysis step. Each imputed dataset is analyzed using the respective parametric models given earlier. Rubin's rule for combination is then applied to the estimates from the multiply imputed datasets.
- The simulation results are based on 500 replicates, each with a full-data sample size of 400.
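For reference, Rubin's combination rule for a scalar parameter can be sketched as follows: the pooled estimate is the average of the per-dataset estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by $1 + 1/m$. The function name and the example numbers are hypothetical.

```python
import math


def rubin_pool(estimates, variances):
    """Pool m point estimates and their squared standard errors by Rubin's rule.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    u_bar = sum(variances) / m                      # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = u_bar + (1.0 + 1.0 / m) * b             # total variance
    return q_bar, math.sqrt(total)


# Three imputed datasets with made-up estimates and variances.
pooled, se = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In this toy example the within variance is 0.5 and the between variance is 1, so the total variance is $0.5 + (4/3) \cdot 1$.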

Table 1. Simulation results for the MCAR data without interaction.
Columns: Parameter; FD (Bias, SE, CR); CC (Bias, SE, CR); JN (Bias, SE (RSE), CR); MICE (Bias, SE (RSE), CR); Imp (Bias, SE (RSE), CR).
$\beta_{21}$ = (0.22) (0.22) (0.22) 96
$\beta_{31}$ = (0.07) (0.07) (0.07) 95
$\beta_{32}$ = (0.14) (0.14) (0.13) 94
$\beta_{41}$ = (0.07) (0.07) (0.07) 96
$\beta_{42}$ = (0.17) (0.17) (0.17) 94
$\beta_{43}$ = (0.04) (0.04) (0.04) 94
$\beta_{51}$ = (0.10) (0.10) (0.10) 95
$\beta_{52}$ = (0.25) (0.25) (0.25) 94
$\beta_{53}$ = (0.06) (0.06) (0.06) 94
$\beta_{54}$ = (0.04) (0.04) (0.04) 96
$\beta_{61}$ = (0.25) (0.26) (0.26) 93
$\beta_{62}$ = (0.61) (0.61) (0.61) 97
$\beta_{63}$ = (0.21) (0.21) (0.21) 94
$\beta_{64}$ = (0.12) (0.12) (0.12) 96
$\beta_{65}$ = (0.15) (0.15) (0.15)
Biometrics,
Note: JN denotes MI assuming a joint normal distribution, fitted using the function impgauss in the missing data library of Splus 8.0 (Insightful). After the imputation, the imputed values are post-processed to conform to the data type as follows: for binary variables, the imputed value is converted to the closer of one or zero; for count variables, the imputed value is rounded to the closest integer, and negative integer values are then changed to zero. MICE is multiple imputation using chained equations. The R package MICE 1.16 is used. The default imputation method is used to impute each variable given all the rest: predictive mean matching for numeric data, logistic regression imputation for binary data, and polytomous regression imputation for categorical data. Imp is multiple imputation using our proposed method. Bias: estimate minus truth; SE: standard error estimate from the simulation; RSE: average of Rubin's standard error estimate; CR: 95% confidence interval coverage rate (in percent).

Table 2. Simulation results for the MAR data without interaction.
Columns: Parameter; FD (Bias, SE, CR); CC (Bias, SE, CR); JN (Bias, SE (RSE), CR); MICE (Bias, SE (RSE), CR); Imp (Bias, SE (RSE), CR).
$\beta_{21}$ = (0.21) (0.21) (0.21) 95
$\beta_{31}$ = (0.06) (0.06) (0.06) 96
$\beta_{32}$ = (0.13) (0.13) (0.13) 95
$\beta_{41}$ = (0.07) (0.07) (0.07) 95
$\beta_{42}$ = (0.17) (0.17) (0.17) 95
$\beta_{43}$ = (0.04) (0.04) (0.04) 96
$\beta_{51}$ = (0.10) (0.09) (0.09) 95
$\beta_{52}$ = (0.25) (0.24) (0.24) 94
$\beta_{53}$ = (0.06) (0.06) (0.06) 96
$\beta_{54}$ = (0.05) (0.05) (0.05) 94
$\beta_{61}$ = (0.26) (0.26) (0.26) 95
$\beta_{62}$ = (0.61) (0.63) (0.63) 95
$\beta_{63}$ = (0.21) (0.22) (0.22) 93
$\beta_{64}$ = (0.13) (0.13) (0.14) 94
$\beta_{65}$ = (0.16) (0.17) (0.17) 94
See Table 1 for definitions of abbreviations.
Imputation Through Odds Ratio Models 29

Table 3. Simulation results for the MCAR data with interactions.
Columns: Parameter; FD (Bias, SE, CR); CC (Bias, SE, CR); CGM (Bias, SE (RSE), CR); MICE (Bias, SE (RSE), CR); Imp (Bias, SE (RSE), CR).
$\beta_{21}$ = (0.14) (0.16) (0.14) 96
$\beta_{31}$ = (0.08) (0.12) (0.08) 96
$\beta_{32}$ = (0.12) (0.23) (0.12) 95
$\beta_{33}$ = (0.12) (0.20) (0.12) 95
$\beta_{41}$ = (0.08) (0.09) (0.08) 95
$\beta_{42}$ = (0.16) (0.20) (0.15) 95
$\beta_{43}$ = (0.04) (0.05) (0.04) 94
$\beta_{51}$ = (0.19) (0.23) (0.14) 97
$\beta_{52}$ = (0.41) (0.48) (0.32) 98
$\beta_{53}$ = (0.14) (0.20) (0.10) 97
$\beta_{54}$ = (0.11) (0.15) (0.07) 96
$\beta_{55}$ = (0.05) (0.07) (0.03) 93
$\beta_{61}$ = (0.27) (0.26) (0.27) 96
$\beta_{62}$ = (0.63) (0.63) (0.63) 94
$\beta_{63}$ = (0.20) (0.19) (0.20) 95
$\beta_{64}$ = (0.28) (0.33) (0.29) 95
$\beta_{65}$ = (0.05) (0.05) (0.05) 96
$\beta_{66}$ = (0.32) (0.37) (0.33)
Note: CGM stands for Conditional Gaussian Model, which implements the location-scale model for a mixture of discrete and continuous outcomes. In this approach, a log-linear model is used for the categorical variables, and a conditional joint normal model is then used for the remaining continuous outcomes. In the simulation study, $Y_2$ and $Y_6$ are categorical variables and the rest are treated as continuous. The method imputes $Y_4$, which is a count variable, as a normal outcome. The imputed values of $Y_4$ are rounded to the closest integers, and negative integer values are then changed to zero. The function impcgm in the missing data library of Splus 8.0 (Insightful Corp.) is used. MICE is multiple imputation using chained equations. The R package MICE 1.16 is used. The default imputation method is used to impute each variable given all the rest: predictive mean matching for numeric data, logistic regression imputation for binary data, and polytomous regression imputation for categorical data. Interaction terms $Y_1 Y_2$, $Y_2 Y_4$ and $Y_3 Y_4$ are used for imputing all terms.

Table 4. Simulation results for the MAR data with interactions.
Columns: Parameter; FD (Bias, SE, CR); CC (Bias, SE, CR); CGM (Bias, SE (RSE), CR); MICE (Bias, SE (RSE), CR); Imp (Bias, SE (RSE), CR).
$\beta_{21}$ = (0.13) (0.13) (0.13) 95
$\beta_{31}$ = (0.08) (0.08) (0.08) 94
$\beta_{32}$ = (0.11) (0.11) (0.11) 96
$\beta_{33}$ = (0.11) (0.11) (0.11) 94
$\beta_{41}$ = (0.08) (0.09) (0.08) 96
$\beta_{42}$ = (0.15) (0.16) (0.15) 95
$\beta_{43}$ = (0.04) (0.04) (0.04) 94
$\beta_{51}$ = (0.23) (0.33) (0.14) 98
$\beta_{52}$ = (0.47) (0.68) (0.29) 97
$\beta_{53}$ = (0.19) (0.35) (0.12) 96
$\beta_{54}$ = (0.11) (0.18) (0.08) 96
$\beta_{55}$ = (0.07) (0.18) (0.05) 90
$\beta_{61}$ = (0.26) (0.27) (0.27) 94
$\beta_{62}$ = (0.63) (0.61) (0.63) 95
$\beta_{63}$ = (0.21) (0.21) (0.21) 95
$\beta_{64}$ = (0.30) (0.30) (0.30) 96
$\beta_{65}$ = (0.05) (0.05) (0.05) 95
$\beta_{66}$ = (0.34) (0.33) (0.34) 96
See Table 3 for definitions of abbreviations.

A Simulation Study
- When no interaction exists, all MI methods (MI using SOR, the joint normal model, and the sequential imputation method MICE) perform reasonably well and better than CC.
- JN and the sequential imputation method can perform poorly in accommodating interactions: JN cannot model interaction terms, and the conditional models used in sequential imputation conflict with each other.
- SOR provides a robust and flexible alternative to existing MI software.

Application: Bone Fracture Data
- A case-control study of risk factors for hip fracture among male veterans (Barengolts et al. 2001).
- Nine risk factors are considered in this study.
- All risk factors are subject to missing values. Only 237 out of 436 subjects have complete data.
- MI runs 2000 iterations as a burn-in period. After burn-in, 20 imputed datasets are generated, one every 150 iterations.
- A logistic model for the hip fracture outcome was used to analyze the imputed datasets, and the results are pooled using Rubin's combination rule.

Application: Bone Fracture Data
Table 1: Analysis of the imputed bone fracture data

Variable   CC            MICE          CGM           IMPA          IMPB
Etoh       1.39(0.39)    1.23(0.31)    1.18(0.30)    1.24(0.31)    1.30(0.34)
Smoke      0.93(0.40)    0.62(0.30)    0.51(0.29)    0.67(0.32)    0.64(0.32)
Dementia   2.51(0.72)    1.61(0.47)    1.54(0.45)    1.56(0.46)    1.58(0.45)
Antiseiz   3.31(1.06)    2.51(0.64)    2.44(0.60)    2.56(0.62)    2.56(0.62)
LevoT4     2.01(1.02)    0.92(0.64)    0.88(0.55)    0.97(0.62)    0.85(0.60)
AntiChol   -1.92(0.77)   -1.49(0.59)   -0.91(0.48)   -1.62(0.55)   -1.56(0.56)
Albumin    -0.91(0.35)   -1.03(0.28)   -1.01(0.26)   -1.01(0.30)   -0.90(0.29)
BMI        -0.10(0.04)   -0.10(0.03)   -0.11(0.03)   -0.10(0.03)   -0.10(0.03)
log(hgb)   -2.60(1.20)   -3.39(0.93)   -3.18(0.88)   -3.20(0.96)   -3.38(0.99)

Application: Bone Fracture Data
Table 2: Analysis of the imputed bone fracture data

Variable           CC            MICE          CGM           IMPA          IMPB
Etoh               1.41(0.40)    1.13(0.29)    1.15(0.30)    1.27(0.31)    1.31(0.30)
Smoke              -9.21(5.69)   -5.32(4.34)   -3.05(4.52)   -2.97(4.54)   -3.14(4.63)
Dementia           2.80(0.79)    1.69(0.47)    1.54(0.47)    1.60(0.48)    1.63(0.47)
Antiseiz           4.12(1.29)    2.45(0.62)    2.51(0.63)    2.67(0.66)    2.76(0.65)
LevoT4             3.15(1.34)    0.41(0.65)    1.03(0.63)    1.00(0.66)    0.89(0.62)
AntiChol           5.08(4.15)    -0.72(1.99)   -1.26(2.34)   -2.87(2.29)   -3.32(2.20)
Albumin            5.90(4.04)    -3.07(3.40)   2.53(2.97)    2.80(3.02)    2.60(3.40)
BMI                -0.12(0.04)   -0.12(0.03)   -0.11(0.03)   -0.11(0.03)   -0.10(0.03)
log(hgb)           4.60(5.99)    -7.56(4.80)   1.02(4.35)    1.46(4.43)    1.26(4.76)
smoke*loghgb       4.05(2.28)    2.40(1.74)    1.40(1.79)    1.82(1.80)    1.64(1.84)
AntiChol*albumin   -2.36(1.40)   0.02(0.55)    0.07(0.62)    0.36(0.65)    0.50(0.63)
Albumin*loghgb     -2.67(1.67)   0.95(1.35)    -1.43(1.19)   -1.58(1.22)   (1.36)

Discussion
- We proposed a new MI approach based on the semiparametric odds ratio model.
- The approach is more flexible than existing methods:
  - Unlike the joint model, higher-order terms can be incorporated into the model.
  - Unlike the sequential imputation model, the SOR approach is compatible.
  - It can accommodate different shapes of distributions.
  - The imputation model is more general than the GLM commonly used for data analysis.
- More research is needed on model selection for nonlinear terms in the model.
