Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models

1 Multiple Imputation for Missing Values Through Conditional Semiparametric Odds Ratio Models
Hui Xie, Assistant Professor, Division of Epidemiology & Biostatistics, UIC
This is joint work with Drs. Hua Yun Chen and Yi Qian.
Multiple Imputation by OR model p. 1/35

7 Outline
- Review
- The Semiparametric OR Model (SOR)
- Multiple Imputation (MI) under SOR
- A Simulation Study
- Empirical Application
- Conclusion

8 Review: Multiple Imputation (MI)
- Missing data are prevalent in practice.
- Improper handling of missing data can cause bias and loss of efficiency.
- MI (Rubin 1987) stands out as a popular method for missing data analysis.
- Software packages for MI:
  - SAS Proc MI and Proc MIANALYZE; the S-Plus library missing.
  - Stand-alone packages: MICE, IVEware.
- These packages have made MI easy to apply in practical analysis.

9 Review: MI
- Let the full data from a sampling unit be denoted by $Y = (Y_1, \ldots, Y_t)$; we observe $n$ i.i.d. replicates of $Y$.
- Let $R = (R_1, \ldots, R_t)$ be the missing data indicator for $Y$, where
$$R_j = \begin{cases} 1 & \text{if } Y_j \text{ is observed,} \\ 0 & \text{if } Y_j \text{ is missing.} \end{cases}$$
- MI makes multiple draws from the posterior predictive distribution $f(y_{mis} \mid Y_{obs}, R)$.
- For an arbitrary pattern of missingness, a key step is to specify $f(y) = f(y_{mis}, y_{obs})$.

10 Review: MI
Common approaches to specifying $f(y)$:
- Joint model approach, e.g. $f(y) \sim MVN(\mu, \Sigma)$ (e.g. impgaussian in S-Plus).
- Sequential model approach (e.g. MICE in R):
  - 1: $f(y_1 \mid Y_2, \ldots, Y_t)$.
  - 2: $f(y_2 \mid Y_1, Y_3, \ldots, Y_t)$.
  - $\vdots$
  - t: $f(y_t \mid Y_1, Y_2, \ldots, Y_{t-1})$.

11 Review: Limitations of Existing MI Software
- Inflexibility in modeling mixed discrete and continuous data (Kenward and Carpenter 2007, Schafer 1997):
  - The joint normal model is applied to discrete data.
  - The joint normal model has difficulty incorporating interaction and higher-order terms (van Buuren 2007; Yu, Burton and Rivero-Arias 2007).
- Alternative methods, such as the sequential imputation approach, have the following limitations:
  - Potential incompatibility in model specification (van Buuren, Boshuizen and Knook 1999; Raghunathan et al. 2001; Gelman and Raghunathan 2001).
  - Lack of theory to support the use of the method for MI.

12 A New Approach: MI under SOR
We propose a novel imputation framework for MI using the conditional semiparametric odds ratio (SOR) model, with the following features.
- It generalizes generalized linear models (GLM).
- It makes no parametric distributional assumptions.
- It is flexible enough to model a mixture of discrete and continuous variables.
- It easily handles bounded or semi-continuous variables, which can be a problem for other imputation approaches.
- It simultaneously addresses both the inflexibility of the joint normal model and the potential inconsistency of sequential imputation models.

13 A New Approach: MI under SOR
Features of the proposed framework (cont.):
- Like the hot-deck approach, the proposed approach imputes a missing value by weighted draws from the combinations of the observed values from different missing groups.
- Unlike the hot-deck approach, our imputation is model-based and is proper in Rubin's sense.

14 A New Approach: MI under SOR
Outline of our work:
- We study Bayesian inference under the SOR model.
- We propose using a Dirichlet process prior (Ferguson 1973, 1974) for the nonparametric parameters in the model.
- We devise an efficient posterior sampling method using the Gibbs sampler combined with the hybrid Monte Carlo method.

15 SOR model
- Let the full data from a sampling unit be denoted by $Y$; we observe $n$ i.i.d. replicates of $Y$.
- Let the density of $Y = (Y_t, \ldots, Y_1)$ under a product of Lebesgue measures and count measures be decomposed into consecutive conditional densities as
$$g(y_t, \ldots, y_1) = \prod_{j=1}^{t} g_j(y_j \mid y_{j-1}, \ldots, y_1).$$

16 SOR model
- For any given conditional density $g_j(y_j \mid y_{j-1}, \ldots, y_1)$, define the odds ratio function relative to a sample point $(y_{j0}, \ldots, y_{10})$ as
$$\eta_j\{y_j; (y_{j-1}, \ldots, y_1) \mid y_{j0}, \ldots, y_{10}\} = \frac{g_j(y_j \mid y_{j-1}, \ldots, y_1) / g_j(y_{j0} \mid y_{j-1}, \ldots, y_1)}{g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10}) / g_j(y_{j0} \mid y_{(j-1)0}, \ldots, y_{10})}.$$
- For notational simplicity, we use $\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}$ to denote $\eta_j\{y_j; (y_{j-1}, \ldots, y_1) \mid y_{j0}, \ldots, y_{10}\}$.
- Chen (2003, 2004, 2007) showed that the conditional density can be rewritten as
$$g_j(y_j \mid y_{j-1}, \ldots, y_1) = \frac{\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}\, g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})}{\int \eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}\, g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})\, dy_j}.$$

17 SOR model
- Note that
$$g_j(y_j \mid y_{j-1}, \ldots, y_1) = \frac{\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}\, g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})}{\int \eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}\, g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})\, dy_j}.$$
- The SOR model decomposes $g_j(y_j \mid y_{j-1}, \ldots, y_1)$ into two parts:
  - A marginal-like density function: $f_j(y_j) = g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})$.
  - An odds-ratio function: $\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\}$.

18 SOR model
- A nonparametric model for the marginal-like density function: we model $g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10})$ nonparametrically by $f_j(y_j)$ and assign point mass to the observed data values of $y_j$.
- A parametric log-bilinear model for the odds-ratio function:
$$\log \eta_j\{y_j; (y_{j-1}, \ldots, y_1)\} = \sum_{k=1}^{j-1} \theta_{jk}\,(y_j - y_{j0})(y_k - y_{k0}).$$
- More generally, to include interaction and higher-order terms,
$$\log \eta_j\{y_j; (y_{j-1}, \ldots, y_1), \theta\} = \sum_{k=1}^{j-1} \sum_{m_k=1}^{M_k} \sum_{l=1}^{L} \theta_{jlkm_k} \prod_{u=1}^{d_j} (y_{ju} - y_{ju0})^{l_u} \prod_{v=1}^{d_k} (y_{kv} - y_{kv0})^{m_{kv}}.$$
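The two-part structure above (point masses for the marginal-like density, a log-bilinear odds ratio) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function name `sor_conditional_pmf`, the reference point at zero, and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def sor_conditional_pmf(support, f, theta, x, y0, x0):
    """Conditional pmf of Y_j on its observed support under a log-bilinear odds ratio.

    support : (K,) unique observed values y_jk
    f       : (K,) point masses of the marginal-like density f_j (sum to 1)
    theta   : (p,) odds-ratio parameters theta_jk
    x       : (p,) conditioning values (y_1, ..., y_{j-1})
    y0, x0  : reference point for y_j and for the conditioning values
    """
    support = np.asarray(support, dtype=float)
    shift = np.dot(theta, np.asarray(x, dtype=float) - np.asarray(x0, dtype=float))
    log_eta = (support - y0) * shift                      # log-bilinear odds ratio
    w = np.asarray(f, dtype=float) * np.exp(log_eta - log_eta.max())  # stabilised weights
    return w / w.sum()                                    # eta * f / sum(eta * f)

# Hypothetical toy inputs (not from the source).
support = np.array([0.0, 1.0, 2.0])
f = np.array([0.5, 0.3, 0.2])
theta = np.array([0.8])

p_ref = sor_conditional_pmf(support, f, theta, x=[0.0], y0=0.0, x0=[0.0])
p_up = sor_conditional_pmf(support, f, theta, x=[1.5], y0=0.0, x0=[0.0])
```

At the reference value of the conditioning variable the pmf reduces to `f`, which is why `f_j` is described as marginal-like; away from it, the weights are tilted by the odds ratio.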

19 SOR model
In summary, the density function of $Y$ is
$$g(y_t, \ldots, y_1 \mid \theta_2, \ldots, \theta_t; f_1, \ldots, f_t) = \prod_{j=1}^{t} \frac{\eta_j\{y_j; (y_{j-1}, \ldots, y_1), \theta_j\}\, f_j(y_j)}{\int \eta_j\{y_j; (y_{j-1}, \ldots, y_1), \theta_j\}\, f_j(y_j)\, dy_j}.$$

20 Relation to GLM
- SOR nests GLM:
$$g_j(y_j \mid y_{j-1}, \ldots, y_1) = \exp\left[\frac{1}{\phi_j}\{\theta_j y_j - b(\theta_j)\} + c(y_j, \phi_j)\right].$$
- One can show that the marginal-like density is
$$g_j(y_j \mid y_{(j-1)0}, \ldots, y_{10}) = \exp\left[\frac{1}{\phi_j}\{\theta_{j0} y_j - b(\theta_{j0})\} + c(y_j, \phi_j)\right],$$
and the odds ratio function is
$$\eta_j\{y_j; (y_{j-1}, \ldots, y_1)\} = \exp\{(y_j - y_{j0})(\theta_j - \theta_{j0})/\phi_j\}.$$
- With the canonical link function, $\theta_j = \beta_0 + \beta_1 y_1 + \cdots + \beta_{j-1} y_{j-1}$, and the parameters in the odds ratio function are $(\beta_k/\phi_j,\ k = 1, \ldots, j-1)$.
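The closed form for the odds ratio function can be checked against its definition as a ratio of conditional densities. The sketch below does this for a Poisson regression with canonical (log) link, where $\phi_j = 1$; the coefficients and evaluation points are hypothetical values chosen only for the check.

```python
import math

def poisson_pmf(y, theta):
    """Poisson pmf in canonical form: exp{theta*y - exp(theta) - log(y!)}."""
    return math.exp(theta * y - math.exp(theta) - math.lgamma(y + 1))

def odds_ratio(y, theta, y0, theta0):
    """Odds ratio function computed directly from the conditional densities."""
    num = poisson_pmf(y, theta) / poisson_pmf(y0, theta)
    den = poisson_pmf(y, theta0) / poisson_pmf(y0, theta0)
    return num / den

beta0, beta1 = 0.2, 0.5        # hypothetical regression coefficients
x, x0 = 1.3, 0.0               # covariate value and its reference value
y, y0 = 4, 1                   # outcome value and its reference value
theta, theta0 = beta0 + beta1 * x, beta0 + beta1 * x0

direct = odds_ratio(y, theta, y0, theta0)
closed_form = math.exp((y - y0) * (theta - theta0))   # phi_j = 1 for the Poisson case
```

The terms $b(\theta)$ and $c(y, \phi)$ cancel in the ratio, so only the bilinear term $\beta_1 (y - y_0)(x - x_0)$ survives.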

21 MI under SOR
- First consider Bayesian inference for SOR with complete data. The likelihood under SOR is
$$\prod_{i=1}^{n} \prod_{j=1}^{t} \frac{\eta_j\{Y_j^i; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, f_j(Y_j^i)}{\int \eta_j\{y_j; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, f_j(y_j)\, dy_j}.$$
- Priors:
$$\theta_j \sim \psi_j(\theta_j), \qquad f_j \sim D_j(c_j F_j),$$
where $c_j > 0$ and $F_j$ is a probability distribution.

22 MI under SOR
Given the above model specification, the posterior distribution of the model parameters is
$$P(\theta_j, f_j \mid Y^i, i = 1, \ldots, n) \propto \left\{\prod_{i=1}^{n} p_j(Y_j^i \mid Y_{j-1}^i, \ldots, Y_1^i, \theta_j, f_j)\right\} D_j(f_j)\, \psi_j(\theta_j)$$
$$\propto \left\{\prod_{i=1}^{n} \frac{\eta_j\{Y_j^i; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}}{\int \eta_j\{y_j; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, f_j(y_j)\, dy_j}\right\} \psi_j(\theta_j)\, D_j\!\left(c_j F_j + \sum_{i=1}^{n} \delta_{Y_j^i}\right),$$
where $\delta_{Y_j}$ denotes the point measure at $Y_j$.

23 MI under SOR
- To simplify computation, we set the Dirichlet process prior with the mean distribution having probability mass on the observed data points.
- This is analogous to using the empirical distribution to approximate the true continuous distribution when $Y_j$ is continuous.
- This is equivalent to replacing $D_j(c_j F_j + n F_{nj})$ with $D_j((c_j + n) F_{nj})$, where $F_{nj} = n^{-1} \sum_i \delta_{Y_j^i}$. Note that
$$c_j F_j + n F_{nj} = (c_j + n) F_{nj} + c_j (F_j - F_{nj}).$$
- The second term on the right-hand side is of lower order in $n$ than the first, which suggests that the replacement is approximately right for large $n$.

24 MI under SOR
- Denote the unique values that $Y_j^i$, $i = 1, \ldots, n$, take by $\{y_{jk}\}$, $k = 1, \ldots, K_j$.
- Let $\delta_{jk}$ denote the frequency with which $Y_j^i = y_{jk}$ for $i = 1, \ldots, n$.
- Let the Dirichlet distribution approximating the prior have parameters $\alpha_{jk}$, $k = 1, \ldots, K_j$.
- Let $\lambda_{jk} = \log(f_{jk}/f_{jK_j})$, $k = 1, \ldots, K_j$.
- The sampling distribution for $(\lambda_j, \theta_j)$ appears as
$$P(\theta_j, \lambda_j) \propto \left\{\prod_{i=1}^{n} \frac{\eta_j\{Y_j^i; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}}{\sum_{k=1}^{K_j} \eta_j\{y_{jk}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk}}}\right\} \psi_j(\theta_j) \prod_{k=1}^{K_j} \exp\{(\delta_{jk} + \alpha_{jk} - 1)\lambda_{jk}\}.$$

25 Updating: A hybrid Monte Carlo (HMC) algorithm
- We apply hybrid Monte Carlo (Liu 2001, Chapter 9) to sample $\lambda_j$ and $\theta_j$.
- Let $U(\lambda_j, \theta_j) = -\ln P(\theta_j, \lambda_j)$ and
$$H\{(\lambda_j, \theta_j), (p_j, q_j)\} = U(\lambda_j, \theta_j) + \frac{1}{2} \sum_{k=1}^{K_j - 1} \frac{p_{jk}^2}{m_{jk}} + \frac{1}{2} \sum_{k=1}^{D_j} \frac{q_{jk}^2}{n_{jk}},$$
where $(p_j, q_j)$ are auxiliary variables.
- Starting from $(\lambda_j^{old}, \theta_j^{old})$, HMC uses the leap-frog algorithm to propose a candidate draw $(\lambda_j^{new}, \theta_j^{new})$.
- The candidate sample is then accepted with probability
$$\min\left(1, \exp\left[-H\{(\lambda_j^{new}, \theta_j^{new}), (p_j^{new}, q_j^{new})\} + H\{(\lambda_j^{old}, \theta_j^{old}), (p_j^{old}, q_j^{old})\}\right]\right).$$

26 Updating: Leap-frog Algorithm
- Let $(\lambda_j^0, \theta_j^0) = (\lambda_j^{old}, \theta_j^{old})$.
- Draw $p_j$ from the normal distribution with mean 0 and variance $\mathrm{diag}(m_{j1}, \ldots, m_{j(K_j - 1)})$, and draw $q_j$ from the normal distribution with mean 0 and variance $\mathrm{diag}(n_{j1}, \ldots, n_{jD_j})$.
- The initial momenta $p_j^0$ and $q_j^0$ then have elements
$$p_{jk}^0 = p_{jk} - \frac{\epsilon}{2} \frac{\partial U}{\partial \lambda_{jk}}\{\lambda_j^0, \theta_j^0\}, \qquad q_{jk}^0 = q_{jk} - \frac{\epsilon}{2} \frac{\partial U}{\partial \theta_{jk}}\{\lambda_j^0, \theta_j^0\},$$
where $\epsilon$ is the stepsize.

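27 Updating: Leap-frog Algorithm
From the initial phase space $(\lambda_j^0, \theta_j^0, p_j^0, q_j^0)$ of the system, we run the leap-frog algorithm for $S$ steps to generate a new phase space $(\lambda_j^S, \theta_j^S, p_j^S, q_j^S)$, where for step $s$
$$\lambda_{jk}^s = \lambda_{jk}^{s-1} + \epsilon\, \frac{p_{jk}^{s-1}}{m_{jk}}, \qquad \theta_{jk}^s = \theta_{jk}^{s-1} + \epsilon\, \frac{q_{jk}^{s-1}}{n_{jk}},$$
$$p_{jk}^s = p_{jk}^{s-1} - \epsilon_s \frac{\partial U}{\partial \lambda_{jk}}\{\lambda_j^s, \theta_j^s\}, \qquad q_{jk}^s = q_{jk}^{s-1} - \epsilon_s \frac{\partial U}{\partial \theta_{jk}}\{\lambda_j^s, \theta_j^s\},$$
where $s = 1, \ldots, S$, $\epsilon_s = \epsilon$ for $s < S$ and $\epsilon_s = \epsilon/2$ if $s = S$, and $\epsilon$ is the user-specified stepsize.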
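The leap-frog proposal and the accept/reject rule can be sketched generically. The code below is a minimal illustration for an arbitrary target proportional to $\exp(-U)$, with a single parameter block in place of the $(\lambda_j, \theta_j)$ pair; the function name `hmc_step`, the standard normal toy target, and the tuning values are assumptions, not the authors' code.

```python
import numpy as np

def hmc_step(x, U, grad_U, eps, S, mass, rng):
    """One hybrid Monte Carlo update of x for a target density proportional to exp(-U)."""
    p_old = rng.normal(0.0, np.sqrt(mass))         # momentum ~ N(0, diag(mass))
    x_new = x.copy()
    p_new = p_old - 0.5 * eps * grad_U(x_new)      # initial half step for the momentum
    for s in range(1, S + 1):
        x_new = x_new + eps * p_new / mass         # full step for the position
        step = eps if s < S else 0.5 * eps         # last momentum update is a half step
        p_new = p_new - step * grad_U(x_new)
    h_old = U(x) + 0.5 * np.sum(p_old ** 2 / mass)
    h_new = U(x_new) + 0.5 * np.sum(p_new ** 2 / mass)
    # Accept with probability min(1, exp(-H_new + H_old)).
    accept = np.log(rng.uniform()) < h_old - h_new
    return (x_new, True) if accept else (x, False)

# Toy target (an assumption for illustration): bivariate standard normal.
rng = np.random.default_rng(1)
U = lambda x: 0.5 * np.sum(x ** 2)
grad_U = lambda x: x

x = np.zeros(2)
draws, n_acc = [], 0
for _ in range(2000):
    x, acc = hmc_step(x, U, grad_U, eps=0.3, S=10, mass=np.ones(2), rng=rng)
    draws.append(x)
    n_acc += acc
draws = np.array(draws)
```

In the setting of the slides, `x` would stack $\lambda_j$ and $\theta_j$, and `mass` would stack the $m_{jk}$ and $n_{jk}$.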

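28 Updating: Leap-frog Algorithm
- The derivatives are
$$\frac{\partial U}{\partial \lambda_{jk}} = -(\delta_{jk} + \alpha_{jk} - 1) + \sum_{i=1}^{n} \frac{\eta_j\{y_{jk}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk}}}{\sum_{k'=1}^{K_j} \eta_j\{y_{jk'}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk'}}},$$
$$\frac{\partial U}{\partial \theta_{jk}} = -\frac{\partial}{\partial \theta_{jk}} \log \psi_j(\theta_j) - \sum_{i=1}^{n} \frac{\partial}{\partial \theta_{jk}} \log \eta_j\{Y_j^i; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\} + \sum_{i=1}^{n} \frac{\sum_{k'=1}^{K_j} \frac{\partial}{\partial \theta_{jk}} \eta_j\{y_{jk'}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk'}}}{\sum_{k'=1}^{K_j} \eta_j\{y_{jk'}; (Y_{j-1}^i, \ldots, Y_1^i), \theta_j\}\, e^{\lambda_{jk'}}}.$$
- HMC has a high optimal acceptance rate (65%, Beskos et al. 2010) while being able to quickly explore all areas of the target distribution by exploiting the gradient information.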
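The expression for $\partial U / \partial \lambda_{jk}$ can be sanity-checked against finite differences. The sketch below assumes a single scalar conditioning variable, a log-bilinear odds ratio with reference point zero, a flat prior $\psi_j$, and $\alpha_{jk} = 1$; these choices, the simulated toy data, and the function names are illustrative assumptions only.

```python
import numpy as np

def log_eta(y, x, theta):
    """Log-bilinear odds ratio with reference point (0, 0)."""
    return theta * y * x

def U(lam_free, theta, y_obs, x_obs, support, counts, alpha):
    """U = -log P(theta, lambda) up to a constant, with a flat prior on theta."""
    lam = np.append(lam_free, 0.0)                  # lambda_{jK_j} = log(f_K / f_K) = 0
    le = log_eta(support[None, :], x_obs[:, None], theta) + lam[None, :]   # (n, K)
    log_norm = np.log(np.exp(le).sum(axis=1))
    log_post = (log_eta(y_obs, x_obs, theta) - log_norm).sum()
    log_post += ((counts + alpha - 1) * lam).sum()
    return -log_post

def grad_U_lambda(lam_free, theta, x_obs, support, counts, alpha):
    """Analytic dU/dlambda_k for the free components k = 1, ..., K-1."""
    lam = np.append(lam_free, 0.0)
    w = np.exp(log_eta(support[None, :], x_obs[:, None], theta) + lam[None, :])
    w /= w.sum(axis=1, keepdims=True)               # eta*e^lambda / sum(eta*e^lambda)
    return (-(counts + alpha - 1) + w.sum(axis=0))[:-1]

# Toy data (hypothetical).
rng = np.random.default_rng(0)
x_obs = rng.normal(size=50)
y_obs = rng.integers(0, 3, size=50).astype(float)
support, counts = np.unique(y_obs, return_counts=True)
alpha = np.ones(support.size)
lam_free = np.linspace(0.3, -0.2, support.size - 1)
theta = 0.4

analytic = grad_U_lambda(lam_free, theta, x_obs, support, counts, alpha)
h = 1e-6
numeric = np.array([
    (U(lam_free + h * e, theta, y_obs, x_obs, support, counts, alpha)
     - U(lam_free - h * e, theta, y_obs, x_obs, support, counts, alpha)) / (2 * h)
    for e in np.eye(lam_free.size)
])
```

Agreement between `analytic` and `numeric` is consistent with $U$ being the negative log of the sampling distribution.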

29 Updating: A hybrid Monte Carlo algorithm
The Gibbs sampler for sampling $(\lambda_j, \theta_j)$, $j = 1, \ldots, t$, iteratively can be described as follows.
1. Fit an independence model to the data, which is equivalent to setting $\eta \equiv 1$ (or $\theta = 0$) and $f$ to the empirical marginal probability mass function.
2. Given the data, sample $(\theta_j, \lambda_j)$ using the hybrid Monte Carlo approach.
3. Do step 2 for $j = 1, \ldots, t$.
4. Repeat steps 2 and 3 until convergence.

30 MI under SOR
The MCMC algorithm for MI under SOR is as follows:
1. Initially, the missing values are imputed using the independence model or another method (e.g. MICE).
2. Carry out one step of the hybrid Monte Carlo sampling algorithm for $(\lambda_j, \theta_j)$, $j = 1, \ldots, t$.
3. Draw $\gamma$ from the distribution with density proportional to
$$P(\gamma \mid R^i, Y^i, i = 1, \ldots, n) \propto \prod_{i=1}^{n} \prod_{r} \{\pi(r, Y_1^i, \ldots, Y_t^i, \gamma)\}^{1\{R^i = r\}}\, \xi(\gamma).$$

31 MI under SOR
The MCMC algorithm for MI under SOR, continued:
4. For each $i = 1, \ldots, n$ and missing group $Y_j$, $j = 1, \ldots, t$: if $Y_j^i$ is missing, impute $Y_j^i$ from the conditional distribution of $Y_j^i$ given $(Y_{-j}^i, R^i, \gamma, \theta_j, f_j)$, which is the discrete distribution proportional to
$$\pi(R^i, Y_1^i, \ldots, Y_t^i, \gamma) \prod_{l=j}^{t} \frac{\eta_l\{Y_l^i; (Y_{l-1}^i, \ldots, Y_1^i), \theta_l\}}{\int \eta_l\{y_l; (Y_{l-1}^i, \ldots, Y_1^i), \theta_l\}\, f_l(y_l)\, dy_l}\, f_j(Y_j^i).$$
5. Repeat steps 2-4 until convergence.
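Step 4 can be made concrete in the smallest case: two variables, with $Y_1$ missing and $Y_2$ observed. The sketch below assumes the missingness term $\pi$ does not depend on the missing value, so it drops out of the weights; that assumption, the function name `impute_y1`, and the toy inputs are illustrative and not from the source.

```python
import numpy as np

def impute_y1(y2_obs, support1, f1, support2, f2, theta2, rng, y10=0.0, y20=0.0):
    """Draw a missing Y_1 given an observed Y_2 in a two-variable SOR chain.

    Weights are f_1(y1) * g_2(y2_obs | y1) over the observed support of Y_1,
    assuming pi is constant in the missing value.
    """
    # g_2(y2 | y1) for every candidate y1 (rows) and support value y2 (columns).
    log_eta = theta2 * (support2[None, :] - y20) * (support1[:, None] - y10)
    g2 = np.exp(log_eta) * f2[None, :]
    g2 /= g2.sum(axis=1, keepdims=True)
    k2 = int(np.flatnonzero(support2 == y2_obs)[0])   # column of the observed Y_2
    weights = f1 * g2[:, k2]
    probs = weights / weights.sum()
    return rng.choice(support1, p=probs), probs

# Hypothetical toy inputs.
rng = np.random.default_rng(0)
support1 = np.array([0.0, 1.0])
f1 = np.array([0.5, 0.5])
support2 = np.array([0.0, 1.0])
f2 = np.array([0.5, 0.5])

draw, probs = impute_y1(1.0, support1, f1, support2, f2, theta2=1.0, rng=rng)
```

With a positive odds-ratio parameter and the larger value of $Y_2$ observed, the draw is tilted toward the larger value of $Y_1$, and it always falls on an observed value, as in a hot-deck draw.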

32 A Simulation Study
We simulated complete data from the following joint distribution of six variables:
- $Y_1 \sim N(0, 1)$.
- Given $Y_1$, $Y_2$ is binary with $\mathrm{logit}\{P(Y_2 = 1 \mid Y_1)\} = \beta_{20} + \beta_{21} Y_1$.
- Given $Y_1$ and $Y_2$, $Y_3$ is normally distributed with unit variance and mean $\mu_1(Y_1, Y_2) = \beta_{30} + \beta_{31} Y_1 + \beta_{32} Y_2 + \beta_{33} Y_1 Y_2$.
- Given $Y_1, Y_2, Y_3$, $Y_4$ is Poisson distributed with rate parameter $\ln \lambda(Y_1, Y_2, Y_3) = \beta_{40} + \beta_{41} Y_1 + \beta_{42} Y_2 + \beta_{43} Y_3$.

33 A Simulation Study
Complete data model, continued:
- Given $Y_1, \ldots, Y_4$, $Y_5$ is normally distributed with unit variance and mean $\mu_2(Y_j, j = 1, \ldots, 4) = \beta_{50} + \beta_{51} Y_1 + \beta_{52} Y_2 + \beta_{53} Y_3 + \beta_{54} Y_4 + \beta_{55} Y_3 Y_4$.
- Given $Y_j$, $j = 1, \ldots, 5$, $Y_6$ is binary with $\mathrm{logit}\{P(Y_6 = 1 \mid Y_j, j = 1, \ldots, 5)\} = \beta_{60} + \beta_{61} Y_1 + \beta_{62} Y_2 + \beta_{63} Y_3 + \beta_{64} Y_4 + \beta_{65} Y_5 + \beta_{66} Y_2 Y_4$.
Two scenarios:
- No interactions: $(\beta_{33}, \beta_{55}, \beta_{66}) = (0, 0, 0)$.
- With interactions: $(\beta_{33}, \beta_{55}, \beta_{66}) = (1, 1, 1)$.

34 A Simulation Study
Missing data model, with two scenarios:
- (1) MCAR: randomly set 10% of each variable to be missing.
- (2) MAR: the last three variables are subject to missingness, with
$$\mathrm{logit}\{P(R_k = 1 \mid Y_j, j = 1, \ldots, 6)\} = \alpha_{k0} + \alpha_{k1} Y_1 + \alpha_{k2} Y_2 + \alpha_{k3} Y_3, \quad k = 4, 5, 6,$$
where $\alpha_{k0} = 2$ and $\alpha_{k1} = \alpha_{k2} = \alpha_{k3} = 0.5$.
Both scenarios lead to about 50% complete cases.
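The data-generating model and the MCAR mechanism can be sketched as follows. The main-effect coefficient values are not given in this transcript, so every numeric coefficient below is a hypothetical placeholder; only the interaction settings (0 or 1) come from the slides. MCAR is implemented here as independent deletion with probability 0.1 per cell, which is one reading of "randomly set 10% of each variable to be missing".

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_complete(n, rng, interactions=False):
    """Simulate (Y_1, ..., Y_6); main-effect coefficients are hypothetical placeholders."""
    b = 1.0 if interactions else 0.0               # beta_33 = beta_55 = beta_66
    y1 = rng.normal(size=n)
    y2 = rng.binomial(1, expit(0.0 + 0.5 * y1))
    y3 = rng.normal(0.5 + 0.5 * y1 + 0.5 * y2 + b * y1 * y2)
    y4 = rng.poisson(np.exp(0.2 + 0.2 * y1 + 0.2 * y2 + 0.1 * y3))
    y5 = rng.normal(0.5 + 0.2 * y1 + 0.2 * y2 + 0.2 * y3 + 0.1 * y4 + b * y3 * y4)
    y6 = rng.binomial(1, expit(-0.5 + 0.2 * y1 + 0.2 * y2 + 0.2 * y3
                               + 0.1 * y4 + 0.05 * y5 + b * y2 * y4))
    return np.column_stack([y1, y2, y3, y4, y5, y6]).astype(float)

def mcar_mask(shape, rate, rng):
    """True marks a cell set to missing, independently with probability `rate`."""
    return rng.uniform(size=shape) < rate

rng = np.random.default_rng(0)
data = simulate_complete(400, rng, interactions=True)   # n = 400 as in the study
mask = mcar_mask(data.shape, 0.10, rng)
data_mcar = np.where(mask, np.nan, data)
complete_fraction = (~mask).all(axis=1).mean()
```

Under independent 10% deletion across six variables, the expected complete-case fraction is $0.9^6 \approx 0.53$, in line with the roughly 50% stated above.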

35 A Simulation Study
Analysis of simulated data:
- Imputation step: the following methods are applied for imputation.
  - MI using SOR.
  - MI using impgaussian or impcgm in the S-Plus library missing.
  - R package MICE.
- Analysis step: each imputed dataset is analyzed using the respective parametric models given earlier. Rubin's rule for combination is then applied to the estimates from the multiply imputed datasets.
- The simulation results are based on 500 replicates, each with a full-data sample size of 400.
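Rubin's combination rule for a scalar parameter can be sketched as below. This is the standard rule (pooled estimate as the mean, total variance as within plus inflated between-imputation variance), written as a minimal illustration; the function name `rubin_pool` and the toy numbers are assumptions.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rule."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = est.size
    q_bar = est.mean()                   # pooled point estimate
    within = var.mean()                  # average within-imputation variance
    between = est.var(ddof=1)            # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, np.sqrt(total)

# Toy inputs: three imputed-data estimates with their variances.
pooled, pooled_se = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The pooled standard error corresponds to the RSE column reported in the simulation tables (the average of Rubin's standard error estimate across replicates).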

36 Table 1. Simulation results for the MCAR data without interaction.
Columns in the original: Parameter; FD (Bias, SE, CR); CC (Bias, SE, CR); JN (Bias, SE (RSE), CR); MICE (Bias, SE (RSE), CR); Imp (Bias, SE (RSE), CR). Only the RSE values and the final CR column survive in this transcript.

| Parameter | JN (RSE) | MICE (RSE) | Imp (RSE) | Imp CR |
| β21 | (0.22) | (0.22) | (0.22) | 96 |
| β31 | (0.07) | (0.07) | (0.07) | 95 |
| β32 | (0.14) | (0.14) | (0.13) | 94 |
| β41 | (0.07) | (0.07) | (0.07) | 96 |
| β42 | (0.17) | (0.17) | (0.17) | 94 |
| β43 | (0.04) | (0.04) | (0.04) | 94 |
| β51 | (0.10) | (0.10) | (0.10) | 95 |
| β52 | (0.25) | (0.25) | (0.25) | 94 |
| β53 | (0.06) | (0.06) | (0.06) | 94 |
| β54 | (0.04) | (0.04) | (0.04) | 96 |
| β61 | (0.25) | (0.26) | (0.26) | 93 |
| β62 | (0.61) | (0.61) | (0.61) | 97 |
| β63 | (0.21) | (0.21) | (0.21) | 94 |
| β64 | (0.12) | (0.12) | (0.12) | 96 |
| β65 | (0.15) | (0.15) | (0.15) | |

Biometrics
Note: JN denotes MI assuming a joint normal distribution and is fitted using the function impgauss in the missing data library of Splus 8.0 (Insightful). After the imputation, the imputed values are post-processed to conform to the data type as follows: for binary variables, the imputed value is converted to the closer of one and zero; for count variables, the imputed value is rounded to the closest integer, and negative integer values are then changed to zero. MICE is multiple imputation using chained equations. The R package MICE 1.16 is used. The default imputation method is used to impute each variable given all the rest: predictive mean matching for numeric data, logistic regression imputation for binary data, and polytomous regression imputation for categorical data. Imp is multiple imputation using our proposed method. Bias: estimate − truth; SE: standard error estimate from simulation; RSE: average of Rubin's standard error estimate; CR: 95% confidence interval coverage rate (in percent).

37 Table 2. Simulation results for the MAR data without interaction.
Columns in the original: Parameter; FD (Bias, SE, CR); CC (Bias, SE, CR); JN (Bias, SE (RSE), CR); MICE (Bias, SE (RSE), CR); Imp (Bias, SE (RSE), CR). Only the RSE values and the final CR column survive in this transcript.

| Parameter | JN (RSE) | MICE (RSE) | Imp (RSE) | Imp CR |
| β21 | (0.21) | (0.21) | (0.21) | 95 |
| β31 | (0.06) | (0.06) | (0.06) | 96 |
| β32 | (0.13) | (0.13) | (0.13) | 95 |
| β41 | (0.07) | (0.07) | (0.07) | 95 |
| β42 | (0.17) | (0.17) | (0.17) | 95 |
| β43 | (0.04) | (0.04) | (0.04) | 96 |
| β51 | (0.10) | (0.09) | (0.09) | 95 |
| β52 | (0.25) | (0.24) | (0.24) | 94 |
| β53 | (0.06) | (0.06) | (0.06) | 96 |
| β54 | (0.05) | (0.05) | (0.05) | 94 |
| β61 | (0.26) | (0.26) | (0.26) | 95 |
| β62 | (0.61) | (0.63) | (0.63) | 95 |
| β63 | (0.21) | (0.22) | (0.22) | 93 |
| β64 | (0.13) | (0.13) | (0.14) | 94 |
| β65 | (0.16) | (0.17) | (0.17) | 94 |

See Table 1 for definitions of abbreviations.
Imputation Through Odds Ratio Models

38 Table 3. Simulation results for the MCAR data with interactions.
Columns in the original: Parameter; FD (Bias, SE, CR); CC (Bias, SE, CR); CGM (Bias, SE (RSE), CR); MICE (Bias, SE (RSE), CR); Imp (Bias, SE (RSE), CR). Only the RSE values and the final CR column survive in this transcript.

| Parameter | CGM (RSE) | MICE (RSE) | Imp (RSE) | Imp CR |
| β21 | (0.14) | (0.16) | (0.14) | 96 |
| β31 | (0.08) | (0.12) | (0.08) | 96 |
| β32 | (0.12) | (0.23) | (0.12) | 95 |
| β33 | (0.12) | (0.20) | (0.12) | 95 |
| β41 | (0.08) | (0.09) | (0.08) | 95 |
| β42 | (0.16) | (0.20) | (0.15) | 95 |
| β43 | (0.04) | (0.05) | (0.04) | 94 |
| β51 | (0.19) | (0.23) | (0.14) | 97 |
| β52 | (0.41) | (0.48) | (0.32) | 98 |
| β53 | (0.14) | (0.20) | (0.10) | 97 |
| β54 | (0.11) | (0.15) | (0.07) | 96 |
| β55 | (0.05) | (0.07) | (0.03) | 93 |
| β61 | (0.27) | (0.26) | (0.27) | 96 |
| β62 | (0.63) | (0.63) | (0.63) | 94 |
| β63 | (0.20) | (0.19) | (0.20) | 95 |
| β64 | (0.28) | (0.33) | (0.29) | 95 |
| β65 | (0.05) | (0.05) | (0.05) | 96 |
| β66 | (0.32) | (0.37) | (0.33) | |

Note: CGM stands for Conditional Gaussian Model, which implements the location-scale model for a mixture of discrete and continuous outcomes. In this approach, a log-linear model is used for the categorical variables, and a conditional joint normal model is then used for the remaining continuous outcomes. In the simulation study, Y2 and Y6 are categorical variables and the rest are treated as continuous. The method imputes Y4, which is a count variable, as a normal outcome. The imputed values of Y4 are rounded to the closest integers, and negative integer values are then changed to zero. The function impcgm in the missing data library of Splus 8.0 (Insightful Corp.) is used. MICE is multiple imputation using chained equations. The R package MICE 1.16 is used. The default imputation method is used to impute each variable given all the rest: predictive mean matching for numeric data, logistic regression imputation for binary data, and polytomous regression imputation for categorical data. Interaction terms Y1Y2, Y2Y4 and Y3Y4 are used for imputing all terms.

39 Table 4. Simulation results for the MAR data with interactions.
Columns in the original: Parameter; FD (Bias, SE, CR); CC (Bias, SE, CR); CGM (Bias, SE (RSE), CR); MICE (Bias, SE (RSE), CR); Imp (Bias, SE (RSE), CR). Only the RSE values and the final CR column survive in this transcript.

| Parameter | CGM (RSE) | MICE (RSE) | Imp (RSE) | Imp CR |
| β21 | (0.13) | (0.13) | (0.13) | 95 |
| β31 | (0.08) | (0.08) | (0.08) | 94 |
| β32 | (0.11) | (0.11) | (0.11) | 96 |
| β33 | (0.11) | (0.11) | (0.11) | 94 |
| β41 | (0.08) | (0.09) | (0.08) | 96 |
| β42 | (0.15) | (0.16) | (0.15) | 95 |
| β43 | (0.04) | (0.04) | (0.04) | 94 |
| β51 | (0.23) | (0.33) | (0.14) | 98 |
| β52 | (0.47) | (0.68) | (0.29) | 97 |
| β53 | (0.19) | (0.35) | (0.12) | 96 |
| β54 | (0.11) | (0.18) | (0.08) | 96 |
| β55 | (0.07) | (0.18) | (0.05) | 90 |
| β61 | (0.26) | (0.27) | (0.27) | 94 |
| β62 | (0.63) | (0.61) | (0.63) | 95 |
| β63 | (0.21) | (0.21) | (0.21) | 95 |
| β64 | (0.30) | (0.30) | (0.30) | 96 |
| β65 | (0.05) | (0.05) | (0.05) | 95 |
| β66 | (0.34) | (0.33) | (0.34) | 96 |

See Table 3 for definitions of abbreviations.

40 A Simulation Study
- When no interaction exists, all MI methods (MI using SOR, the joint normal model, and the sequential imputation method MICE) perform reasonably well and better than CC.
- JN and the sequential imputation method can perform poorly in accommodating interactions: JN cannot model interaction terms, and the conditional models used in sequential imputation are in conflict with each other.
- SOR provides a robust and flexible alternative to existing MI software.

41 Application: Bone Fracture Data
- A case-control study of risk factors for hip fracture among male veterans (Barengolts et al. 2001).
- Nine risk factors are considered in this study. All risk factors are subject to missing values.
- Only 237 out of 436 subjects have complete data.
- MI runs 2000 iterations as a burn-in period. After burn-in, 20 imputed datasets are generated, one every 150 iterations.
- A logistic model for the hip fracture outcome was used to analyze the imputed datasets, and the results are pooled using Rubin's combination rule.

42 Application: Bone Fracture Data
Table 1: Analysis of the imputed bone fracture data. Entries are estimates with standard errors in parentheses.

| Variable | CC | MICE | CGM | IMPA | IMPB |
| Etoh | 1.39 (0.39) | 1.23 (0.31) | 1.18 (0.30) | 1.24 (0.31) | 1.30 (0.34) |
| Smoke | 0.93 (0.40) | 0.62 (0.30) | 0.51 (0.29) | 0.67 (0.32) | 0.64 (0.32) |
| Dementia | 2.51 (0.72) | 1.61 (0.47) | 1.54 (0.45) | 1.56 (0.46) | 1.58 (0.45) |
| Antiseiz | 3.31 (1.06) | 2.51 (0.64) | 2.44 (0.60) | 2.56 (0.62) | 2.56 (0.62) |
| LevoT4 | 2.01 (1.02) | 0.92 (0.64) | 0.88 (0.55) | 0.97 (0.62) | 0.85 (0.60) |
| AntiChol | -1.92 (0.77) | -1.49 (0.59) | -0.91 (0.48) | -1.62 (0.55) | -1.56 (0.56) |
| Albumin | -0.91 (0.35) | -1.03 (0.28) | -1.01 (0.26) | -1.01 (0.30) | -0.90 (0.29) |
| BMI | -0.10 (0.04) | -0.10 (0.03) | -0.11 (0.03) | -0.10 (0.03) | -0.10 (0.03) |
| log(hgb) | -2.60 (1.20) | -3.39 (0.93) | -3.18 (0.88) | -3.20 (0.96) | -3.38 (0.99) |

43 Application: Bone Fracture Data
Table 2: Analysis of the imputed bone fracture data. Entries are estimates with standard errors in parentheses.

| Variable | CC | MICE | CGM | IMPA | IMPB |
| Etoh | 1.41 (0.40) | 1.13 (0.29) | 1.15 (0.30) | 1.27 (0.31) | 1.31 (0.30) |
| Smoke | -9.21 (5.69) | -5.32 (4.34) | -3.05 (4.52) | -2.97 (4.54) | -3.14 (4.63) |
| Dementia | 2.80 (0.79) | 1.69 (0.47) | 1.54 (0.47) | 1.60 (0.48) | 1.63 (0.47) |
| Antiseiz | 4.12 (1.29) | 2.45 (0.62) | 2.51 (0.63) | 2.67 (0.66) | 2.76 (0.65) |
| LevoT4 | 3.15 (1.34) | 0.41 (0.65) | 1.03 (0.63) | 1.00 (0.66) | 0.89 (0.62) |
| AntiChol | 5.08 (4.15) | -0.72 (1.99) | -1.26 (2.34) | -2.87 (2.29) | -3.32 (2.20) |
| Albumin | 5.90 (4.04) | -3.07 (3.40) | 2.53 (2.97) | 2.80 (3.02) | 2.60 (3.40) |
| BMI | -0.12 (0.04) | -0.12 (0.03) | -0.11 (0.03) | -0.11 (0.03) | -0.10 (0.03) |
| log(hgb) | 4.60 (5.99) | -7.56 (4.80) | 1.02 (4.35) | 1.46 (4.43) | 1.26 (4.76) |
| smoke*loghgb | 4.05 (2.28) | 2.40 (1.74) | 1.40 (1.79) | 1.82 (1.80) | 1.64 (1.84) |
| AntiChol*albumin | -2.36 (1.40) | 0.02 (0.55) | 0.07 (0.62) | 0.36 (0.65) | 0.50 (0.63) |
| Albumin*loghgb | -2.67 (1.67) | 0.95 (1.35) | -1.43 (1.19) | -1.58 (1.22) | (1.36) |

44 Discussion
- We proposed a new MI approach based on the semiparametric odds ratio model.
- The approach is more flexible than existing methods:
  - Unlike the joint model, higher-order terms can be incorporated into the model.
  - Unlike the sequential imputation model, the SOR approach is compatible.
  - It can accommodate different shapes of distributions.
  - The imputation model is more general than GLM, which is commonly used for data analysis.
- More research is needed on model selection for nonlinear terms in the model.

Compatibility of conditionally specified models

Compatibility of conditionally specified models Compatibility of conditionally specified models Hua Yun Chen Division of epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West Taylor Street, Chicago, IL 60612

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Longitudinal analysis of ordinal data

Longitudinal analysis of ordinal data Longitudinal analysis of ordinal data A report on the external research project with ULg Anne-Françoise Donneau, Murielle Mauer June 30 th 2009 Generalized Estimating Equations (Liang and Zeger, 1986)

More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

Some methods for handling missing values in outcome variables. Roderick J. Little

Some methods for handling missing values in outcome variables. Roderick J. Little Some methods for handling missing values in outcome variables Roderick J. Little Missing data principles Likelihood methods Outline ML, Bayes, Multiple Imputation (MI) Robust MAR methods Predictive mean

More information

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data  snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

In marketing applications, it is common that some key covariates in a regression model, such as marketing

In marketing applications, it is common that some key covariates in a regression model, such as marketing CELEBRATING 30 YEARS Vol. 30, No. 4, July August 2011, pp. 717 736 issn 0732-2399 eissn 1526-548X 11 3004 0717 doi 10.1287/mksc.1110.0648 2011 INFORMS No Customer Left Behind: A Distribution-Free Bayesian

More information

Bayesian Multilevel Latent Class Models for the Multiple. Imputation of Nested Categorical Data

Bayesian Multilevel Latent Class Models for the Multiple. Imputation of Nested Categorical Data Bayesian Multilevel Latent Class Models for the Multiple Imputation of Nested Categorical Data Davide Vidotto Jeroen K. Vermunt Katrijn van Deun Department of Methodology and Statistics, Tilburg University

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Convergence Properties of a Sequential Regression Multiple Imputation Algorithm

Convergence Properties of a Sequential Regression Multiple Imputation Algorithm Convergence Properties of a Sequential Regression Multiple Imputation Algorithm Jian Zhu 1, Trivellore E. Raghunathan 2 Department of Biostatistics, University of Michigan, Ann Arbor Abstract A sequential

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Introduction An approximated EM algorithm Simulation studies Discussion

Introduction An approximated EM algorithm Simulation studies Discussion 1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

MISSING or INCOMPLETE DATA

MISSING or INCOMPLETE DATA MISSING or INCOMPLETE DATA A (fairly) complete review of basic practice Don McLeish and Cyntha Struthers University of Waterloo Dec 5, 2015 Structure of the Workshop Session 1 Common methods for dealing

More information

Predictive mean matching imputation of semicontinuous variables
