A weighted simulation-based estimator for incomplete longitudinal data models

Size: px
Start display at page:

Download "A weighted simulation-based estimator for incomplete longitudinal data models"

Transcription

1 To appear in Statistics and Probability Letters, 113 (2016), doi /j.spl A weighted simulation-based estimator for incomplete longitudinal data models Daniel H. Li 1 and Liqun Wang 2 1 Juno Therapeutics, Seattle, Washington, USA 2 University of Manitoba, Winnipeg, Canada Final Version, February 2016 Abstract Recently, Li and Wang (2012a, b) and Wang (2007) have proposed a simulation-based estimator for generalized linear and nonlinear mixed models with complete longitudinal data. This estimator is constructed using the simulation-by-parts technique which leads to the unique feature that it is consistent even using finite number of simulated random points. This paper extends the methodology to deal with incomplete longitudinal data by applying the inverse probability weighting method for the monotone missing-at-random response data. The finite sample performance of this estimator is investigated through simulation studies and compared with the multiple imputation approach. Keywords: Simulation-based estimator. Generalized linear mixed model; Inverse probability weighting; Missing data; 1 Introduction In biomedical, environmental and social sciences research, longitudinal data analysis is widely used and constitutes the fundamental statistical research methodologies. Generalized linear mixed models (GLMM) have been widely used in the modeling of longitudinal data. Li and Wang (2012a) proposed a simulation-based estimator (SBE) for GLMM based on the first two marginal moments of the response variables, which does not rely on the normality distribution assumption for random effects. Li and Wang (2012b) extended the SBE to the GLMM where some covariates are Corresponding author, Department of Statistics, University of Manitoba, 186 Dysart Road, Winnipeg, Manitoba, Canada R3T 2N2. liqun.wang@umanitoba.ca 1

2 Li and Wang (2016): A weighted simulation-based estimator 2 measured with error. This approach was originally studied by Wang (2007) for nonlinear mixed effects models. The SBE is constructed using a novel simulation-by-parts technique to ensure its consistency by using finite number of simulated random points. This is the key difference from many other simulation-based estimators proposed in the literature, where they require the number of simulated random points go to infinity to achieve consistency. So far, the SBE is only studied under complete data settings although incomplete or missing data are common in longitudinal studies. For example, in clinical trials, missing data are almost inevitable because subjects may decide to withdraw from the study at anytime prior to completion or subjects are not compliant to protocol for scheduled assessments. Problems arise if the mechanism leading to the missing data depends on the response process. It is known that ignoring missing data and using naive methods may introduce bias, reduce the power of inference and lead to misleading conclusions (Little and Rubin, 2002). The extension of the SBE to account for incomplete longitudinal data is non-trivial and needs to be addressed to allow this estimator used in more general settings. In this paper, we discuss the validity of SBE under different missing data mechanism and modify it for the data missing at random (MAR) with monotone missingness through the inverse probability weighting (IPW) method. The IPW is a general methodology for constructing parameter estimators in semi-parametric models with complete as well as missing data (Robins, et al., 1995; Yi and Cook, 2002). Another popular approach to deal with missing data is the multiple imputation (Rubin, 1987; Schafer, 1997). We also investigate the performance of the SBE using this strategy. The structure of the paper is as follows. In Section 2, we introduce and review the missing data mechanism, pattern and estimation. Section 3 provides details on the proposed weighted simulationbased estimator, and addresses some practical computational issues. In particular, Section 3.2 handles the data missing completely at random (MCAR), while Section 3.3 focuses on the data MAR. Some simulation studies are conducted in Section 4 to examine the finite sample performance of the proposed estimator, and concluding remarks are given in section 5. 2 Missing Data Framework and Notation 2.1 Missing data mechanism To obtain valid inferences, it is essential to consider the reason for missingness. Let Y ij be the j th response for the i th subject, R i = (r i1, r i2,, r in ) be the vector of missing data indicators for Y i = (y i1,, y in ), such that r ij = 1 if response y ij is observed, and 0 otherwise. We partition Y i into Yi O and Yi M, where Yi O contains those y ij for which r ij = 1 and Yi M contains the remaining components. Assuming X i = (x i1,, x in ) to be a vector of covariates always observed, Little and Rubin (2002) classified missing data mechanism into three types: (1) MCAR, where the missingness is unrelated to the responses so that P (R i Y i, X i ) = P (R i X i ). (2) MAR, where the missingness depends only on the observed responses so that P (R i Y i, X i ) = P (R i Yi O, X i ). This is a weaker

3 Li and Wang (2016): A weighted simulation-based estimator 3 and more plausible assumption than MCAR. (3) MNAR, where the missingness depends on both observed and unobserved responses. 2.2 Missing data patterns There are two broad classes of missing data patterns: intermittent missing and dropout. Intermittent missing pattern refers to the scenario that a subject completes the study but skips a few occasions in the middle of the study period. Dropout (attrition, lost of follow-up) is a particular example of monotone pattern of missingness, which means if one observation is missing, then all subsequent observations are unobserved. Intermittent missing is often easier to deal with because the subject is still participating in the study and the reason of missing values can be ascertained. Dropout is more serious because the subject is no longer available and it is not certain whether the dropout is related to the observed or unobserved outcome. MAR mechanisms are commonly assumed when the interest lies on the parameter estimation (Robins et al., 1995: Lindsey, 2000). 2.3 Estimation of missing data process Let λ ij = P (r ij = 1 r i,j 1 = 1, X i, Yi O ) be the conditional probability that subject i is observed at time j, given that the subject is present at time j 1; and π ij = P (r ij = 1 X i, Yi O ) be the marginal probability that subject i is present at time j. Then π ij = j t=2 λ it. Generally it is assumed that all individuals are observed on the first occasion so that r i1 = λ i1 = 1. Further, let π ijk = P (r ij = 1, r ik = 1 X i, Yi O ) be the probability of observing both y ij and y ik given the response history and covariates. Usually λ ij is estimated using a logistic regression model logitλ ij = A ij α, where A ij is a vector consisting of information on X i and response history, and α is the vector of parameters (Diggle and Kenward, 1994; Fitzmaurice etal., 1996; Molenberghs etal., 1997; Yi and Cook 2002). 3 Weighted Simulation-based Estimator 3.1 GLMM formulation Suppose subject i is measured repeatedly on n i occasions. In a GLMM it is assumed that, given the covariates and random effects b i IR q, the responses y ij are conditionally independent and have distribution from the exponential family { } ωij y ij a(ω ij ) f(y ij b i, X i, Z i ) = exp + c(y ij, φ), i = 1,, N, j = 1,, n i, φ (3.1) where φ is a dispersion parameter, ω ij is the canonical parameter and a( ) and c( ) are known functions. The conditional mean and variance µ c ij = E(y ij b i, X i, Z i ) = a (1) (ω ij ) (3.2) vij c = V ar(y ij b i, X i, Z i ) = φa (2) (ω ij ) (3.3)

4 Li and Wang (2016): A weighted simulation-based estimator 4 satisfy g 1 {µ c ij } = x ij β + z ij b i and vij c = φν(µc ij ), where a(d) denotes the d th derivative against ω ij, g 1 ( ) and ν( ) are known link and variance functions, respectively. The random effects are assumed to have mean zero and distribution f b (u; θ) with unknown parameters θ IR r. Our approach is motivated by the fact that all the model parameters of interest can be identified and consistently estimated using the first two marginal moments E(y ij X i, Z i ) = g(x ijβ + z iju)f b (u; θ)du, (3.4) E(y ij y ik X i, Z i ) = g(x ijβ + z iju)g(x ik β + z ik u)f b(u; θ)du + δ jk φ ν(g(x ijβ + z iju))f b (u; θ)du j k, (3.5) where δ jk = 1 if j = k and 0 otherwise. For example, in a linear model y ij = x ij β + z ij b i + ɛ ij with b i N(0, D(θ)), the first two marginal moments are E(y ij X i, Z i ) = x ij β and E(y ijy ik X i, Z i ) = (x ij β)(x ik β) + z ij D(θ)z ik + δ jk φ. Another example is a logistic model logitp (y ij = 1 b i ) = x ij β + z ij b i, where the moments are given in (3.4)-(3.5) with logit link function g(w) = (1 + e w ) 1. In general the integrals on the right-hand sides of (3.4)-(3.5) are intractable but can be approximated using the Monte Carlo simulation techniques such as importance sampling. 3.2 Simulation-based estimator for the data MCAR Li and Wang (2012a, b) and Wang (2007) used a simulation-by-parts technique to construct two sets of simulated moments which are unbiased estimates of the true moments. First, a known density h(u) is chosen such that its support covers that of the integrands in (3.4)-(3.5). Second, a set of random points u is, s = 1, 2,..., 2S are generated from h(u) which are used to construct the simulated moments as µ ij,1 (ψ) = 1 S η ijk,1 (ψ) = 1 S S g(x ij β + z ij u is)f b (u is ; θ), (3.6) h(u is ) s=1 S g(x ij β + z ij u is)g(x ik β + z ik u is)f b (u is ; θ) + h(u is ) s=1 δ jk φ S S ν(g(x ij β + z ij u is))f b (u is ; θ) s=1 h(u is ) and µ ij,2 (ψ) and η ijk,2 (ψ) are constructed similarly using the second half of the points u is, s = S + 1, S + 2,..., 2S. Finally the SBE for ψ = (β, θ, φ) is obtained by minimizing (3.7) Q N,S (ψ) = N ρ i,1(ψ)w i ρ i,2 (ψ) (3.8) i=1 within a compact parameter space Ψ, where ρ i,t (ψ) = (y ij µ ij,t (ψ), 1 j n i, y ij y ik η ijk,t (ψ), 1 j k n i ), t = 1, 2, and W i = W (X i, Z i ) is a nonnegative definite weight matrix. As is shown in

5 Li and Wang (2016): A weighted simulation-based estimator 5 Li and Wang (2012a, b) that in the case of complete data the SBE is consistent and asymptotically normal as N for any finite S. Their simulation studies and real data applications have also shown that the SBE works well in the finite sample situations with moderately large S. Now for the case of data MCAR, we define the SBE ˆψ N,S as the solution of the score equation N i=1 ρ i,1 (ψ) W i i ρ i,2 (ψ) = 0, (3.9) where i = diag(r ij, 1 i n i, r ij r ik, 1 j k n i ). It is easy to see that (3.9) is an unbiased estimating equation because under MCAR i does not depend on Y i and therefore [ ρ ] [ ρ ] E W i i ρ i,2 (ψ 0 ) = E W i i E(ρ i,2 (ψ 0 ) X i, Z i ) = 0, where ψ 0 = (β 0, θ 0, φ 0) denotes the true parameter value. Moreover, following Li and Wang (2012a) it is straightforward to show that for a finite S, N( ˆψ N,S ψ 0 ) L N(0, B 1 CB 1 ), where [ ρ ] ρ i,2 (ψ 0 ) B = E W i i (3.10) and [ ρ ] C = E W i i ρ ρ i,2(ψ 0 )ρ i,2 (ψ 0 ) i W i. (3.11) An approximately optimal choice of W i as derived in Li and Wang (2012a) is given by A( ˆψ N1 ) = 1 N N ρ i,1 ( ˆψ N1 ) i ρ i,2( ˆψ N1 ), (3.12) i=1 where ˆψ N1 is an initial consistent estimator of ψ. 3.3 Weighted simulation-based estimator for the data MAR Now we modify the SBE to handle the data MAR by using the IPW method. The idea is to weight each subject s contribution in the estimation by the inverse probability that the subject drops out at the time of dropping out (Robins et al., 1995). The weights are obtained based on models for the missing data process as specified in section 2.3. Specifically, let i = diag(r ij /π ij, 1 j n i, r ij r ik /π ijk, 1 j k n i ) be the weight matrix accommodating missingness. Then we define the weighted SBE (WSBE) ψ N,S as the solution of N i=1 ρ i,1 (ψ) W i i ρ i,2 (ψ) = 0. (3.13) This is an unbiased estimating equation because under MAR E[ i X i, Z i, Y i ] is an identity matrix and therefore by the law of iterated expectation we have [ ρ ] [ ρ E W i i ρ i,2 (ψ 0 ) = E [ ρ = E ] W i E[ i X i, Z i, Y i ]ρ i,2 (ψ 0 ) ] W i ρ i,2 (ψ 0 ) = 0,

6 Li and Wang (2016): A weighted simulation-based estimator 6 It follows that ψ N,S is consistent and asymptotically normally distributed with asymptotic covariance matrix B 1 CB 1, where B and C are given by (3.10) and (3.11) respectively with i replaced by i. Similarly, the approximately optimal weight Ãi is calculated as in (3.12) with i replaced by i. For the computation of A i or Ãi, the moment estimator described in Li and Wang (2012a) needs to be modified because the length of ρ i (ψ) is different across subjects. The second-order marginal moments can be calculated using (3.7), and the third- and fourth-order moments can be calculated using the same simulation method by constructing the conditional moments first. For all j, k, l, t, cov(y ij, y ik y il b i, X i, Z i ) = E(y ij y ik y il b i, X i, Z i ) E(y ij b i, X i, Z i )E(y ik y il b i, X i, Z i ), and cov(y ij y ik, y il y it b i, X i, Z i ) = E(y ij y ik y il y it b i, X i, Z i ) E(y ij y ik b i, X i, Z i )E(y il y it b i, X i, Z i ). Alternatively, one can adopt the idea of working variance matrix (Prentice and Zhao, 1991; Vonesh et al., 2002) to construct Ãi. For example, assuming y i is multivariate normal, then cov(y ij, y ik y il b i, X i, Z i ) = µ c il σ ijk + µ c ik σ ijl, and cov(y ij y ik, y il y it b i, X i, Z i ) = σ ijl σ ikt + σ ijt σ ikl + µ c ik µc il σ ijt + µ c ij µc il σ ikt + µ c ik µc it σ ijl + µ c ij µ itσikl c, where σ ijk = E[(y ij u ij )(y ik u ik ) b i, X i, Z i ]. Thus, both third and fourth moments can be obtained through the first two moments. We can also assume independence among the elements of y i, in which case the third and fourth moments of y i are respectively given by E[(y ij u c ij )3 ] + 2µ c ij σ ijj 2(µ c ij )3 if j = k = l, σ ijj µ c ik if j = l k, cov(y ij, y ik y il b i, X i, Z i ) = σ ijj µ c il if j = k l, 0 otherwise and E[yij 4 ] (µc ij )2 σ ijj if j = k = l = t, E[(y ij u c ij cov(y ij y ik, y il y it b i, X i, Z i ) = )3 ]µ it + 2µ c ij µc it σ ijj if j = k = l t, E[(y ij u ij ) 3 ]µ c il + 2µc ij µc il σ ijj if j = k = t l, 0 otherwise. If we further assume that the distribution of y i is symmetric, then we have E[(y ij u c ij )3 ] = 0. 4 Monte Carlo Simulation Studies In this section we conduct simulation studies to assess the finite sample performance of the proposed WSBE under the MCAR and MAR scenarios with various amount of missing data. We consider two models for two types of response variables. In particular, we consider a linear mixed model for the continuous response y ij = β 0 + β 1 x ij + b i + ɛ ij with ɛ ij N(0, 1), and a mixed Poisson model for the count data log E(y ij b i ) = β 0 + β 1 x ij + b i. In both models we set β = (1, 1) and b i N(0, θ) with θ = The covariate x ij is generated from normal distribution N(1, 1) in the linear model and N(0.5, 1) in the Poisson model. The missing indicator r ij is generated from the logistic model logitλ ij = α 0 + α 1 y i,j 1, with α = (2, 0), (2, 0.5), (2, 1) for the linear model and α = (3, 0), (0.5, 0.1), (0.5, 0.5) for the Poisson model respectively. Note that α 1 = 0 represents the

7 Li and Wang (2016): A weighted simulation-based estimator 7 scenario of data MCAR, while α 1 0 represents data MAR. For a given α 0, the smaller α 1 results in higher percentage of missing data. The combined choice of α 0 and α 1 leads to about 10-40% drop-out, spread over time points 2-4. Therefore, these parameter setups not only lead to different missing data mechanism but also different percentage of missing data. For comparisons, we also calculate the naive SBE that ignores the missing data and the SBE based on the multiple imputed data. The multiple imputation is done using the R package MICE with predictive mean matching method and iteration time as the seed for random number generation (Horton and Lipsitz 2001). Further, we set the number of multiple imputations to be 5 which is generally sufficient to yield efficient results (Rubin, 1987). The sample sizes are N = 50, 100, 200, 500 and the number of observations per subject is n i = 4. In each simulation we generate M = 500 datasets and report the average biases (1/M) M i=1 ˆψ i ψ 0 and the root mean square errors (RMSE) ( M i=1 ( ˆψ i ψ 0 ) 2 /M) 1/2. Table 1 and 2 contain the numerical results for two models respectively. These results show that in the case of MCAR (α 1 = 0), the SBE and WSBE perform similarly, which is consistent with theory. However, in linear model the MI-SBE has larger bias and RMSE than the other two estimators except for the variance parameter θ due to the relatively high percentage of missing data. When we repeat the simulation with lower percentage of missing data, the MI-SBE performs actually slightly better than the SBE and WSBE. Moreover, the RMSE of all estimators reduce when the sample size increases. In the case of MAR (α 1 0), the WSBE has smaller bias and RMSE than the SBE for variance parameters. For the regression parameters, the WSBE performs similarly as the SBE for small sample sizes, but improves fast and is significantly better for large sample sizes. In general, MI-SBE clearly outperforms the WSBE and SBE, especially in the Poisson model. This is not surprising since it is documented in the literature that MI is generally more efficient than the IPW method (Robins et al., 1995). More general discussions about the IPW and MI methods can be found in e.g., Carpenter et al. (2006) and Seaman et al. (2012). To improve efficiency, one may consider applying augmented inverse probability weight method (Robins et al., 1995). Furthermore, we notice that the numerical computation of the MI-SBE is more stable. We have repeated our simulations with various values of (α 0, α 1 ) and observed similar patterns as discussed above. 5 Concluding remarks Incomplete longitudinal data are common in practical applications. For a valid analysis, a study of the missing mechanism is necessary. Although comprehensive theoretical work and application of the SBE for GLMM were discussed in Li and Wang (2012a, b), there is still a strong need to examine this approach when missing data are present. In this paper, we show that the SBE based on observed data is only valid for data MCAR, and hence we adopt the inverse probability weighting method to construct the WSBE for data MAR. We also investigate the performance of

8 Li and Wang (2016): A weighted simulation-based estimator 8 SBE for incomplete longitudinal data by the means of multiple imputation. Our simulation studies demonstrate that the proposed WSBE is feasible to compute, performs well under finite sample sizes, and is comparable to the multiple imputation approach in many cases. Furthermore, this paper suggests a few ways to compute the optimal weight matrix under the incomplete longitudinal data setting. Since the weight matrix contains the third and fourth moments, the computation can be cumbersome even using simulated moments. In our experience, diagonal weight works quite well and can reduce the computational burden substantially. The proposed WSBE is formulated under the GLMM framework, however, it can be easily extended to nonlinear mixed effects models. In principle, the WSBE and MI-SBE can also be used for data with intermittent missing pattern or longitudinal data with unequally spaced repeated measures. Missing data and measurement error often arise simultaneously in a real world problem, so it would be valuable to develop the proposed methodology to cope with these situations. Another future research is to further extend the SBE to deal with the non-ignorable missing data problems. Acknowledgments The authors are grateful to the editor and two anonymous referees for their valuable comments and suggestions that helped to improve this paper. The research was supported by grant RGPIN/ from the Natural Sciences and Engineering Research Council of Canada (NSERC). References [1] Carpenter, J.R., Kenward, M.G., Vansteelandt, S. (2006). A comparison of multiple imputation and doubly robust estimation for analyses with missing data. Journal of the Royal Statistical Society, Series A, 169, [2] Diggle, P. and Kenward, M. G. (1994). Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics, 43, [3] Fitzmaurice, G.M., Laird, N.M. and Zahner, G.E.P. (1996). Multivariate logistic models for incomplete binary responses. Journal of the American Statistical Association, 91, [4] Horton, N. J. and Lipsitz, S. R. (2001). Multiple imputation in practice: comparison of software packages for regression models with missing variables. The American Statistician, 55, [5] Li, H. and Wang, L. (2012a) A consistent simulation-based estimator in generalized linear mixed models. Journal of Statistical Computation and Simulation, 82, [6] Li, H. and Wang, L. (2012b) Consistent estimation in generalized linear mixed models with measurement error. Journal of Biometrics and Biostatistics, S7:007. DOI / S7-007.

9 Li and Wang (2016): A weighted simulation-based estimator 9 [7] Lindsey, J. K. (2000). Dropouts in longitudinal studies: definitions and models. Journal of Biopharmaceutical Statistics, 10, [8] Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, Second edition, John Wiley & Sons, Hoboken, N.J.. [9] Molenberghs, G., Kenward, M.G. and Lesaffre, E. (1997). The analysis of longitudinal ordinal data with nonrandom drop-out. Biometrika, 84, [10] Prentice, R.L. and Zhao, L.P. (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics, 47, [11] Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, [12] Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley. [13] Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data, New York: Chapman & Hall. [14] Seaman, S.R., White, I.R., Copas, A.J., and Li L. (2012). Combining multiple imputation and inverse-probability weighting. Biometrics, 68, [15] Vonesh, E. F., Wang, H., Nie L. and Majumdar, D. (2002). Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed effects models. Journal of the American Statistical Association, 97, [16] Wang, L. (2007). A unified approach to estimation of nonlinear mixed effects and Berkson measurement error models. The Canadian Journal of Statistics, 35, [17] Yi, G. Y. and Cook, R. J. (2002). Marginal methods for incomplete longitudinal data arising in clusters. Journal of the American Statistical Association, 97,

10 Li and Wang (2016): A weighted simulation-based estimator 10 Table 1: Simulation results for the linear regression model Missingness N SBE WSBE MI-SBE Bias RMSE Bias RMSE Bias RMSE (2,0) 50 β β θ φ β β θ φ β β θ φ β β θ φ (2,0.5) 50 β β θ φ β β θ φ β β θ φ β β θ φ (2,1) 50 β β θ φ β β θ φ β β θ φ β β θ φ

11 Li and Wang (2016): A weighted simulation-based estimator 11 Table 2: Simulation results for the Poisson regression model Missingness N SBE WSBE MI-SBE Bias RMSE Bias RMSE Bias RMSE (3,0) 50 β β θ β β θ β β θ β β θ (0.5,0.1) 50 β β θ β β θ β β θ β β θ (0.5,0.5) 50 β β θ β β θ β β θ β β θ

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

5 Methods Based on Inverse Probability Weighting Under MAR

5 Methods Based on Inverse Probability Weighting Under MAR 5 Methods Based on Inverse Probability Weighting Under MAR The likelihood-based and multiple imputation methods we considered for inference under MAR in Chapters 3 and 4 are based, either directly or indirectly,

More information

On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation

On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation Structures Authors: M. Salomé Cabral CEAUL and Departamento de Estatística e Investigação Operacional,

More information

Identification Problem for The Analysis of Binary Data with Non-ignorable Missing

Identification Problem for The Analysis of Binary Data with Non-ignorable Missing Identification Problem for The Analysis of Binary Data with Non-ignorable Missing Kosuke Morikawa, Yutaka Kano Division of Mathematical Science, Graduate School of Engineering Science, Osaka University,

More information

Modelling Dropouts by Conditional Distribution, a Copula-Based Approach

Modelling Dropouts by Conditional Distribution, a Copula-Based Approach The 8th Tartu Conference on MULTIVARIATE STATISTICS, The 6th Conference on MULTIVARIATE DISTRIBUTIONS with Fixed Marginals Modelling Dropouts by Conditional Distribution, a Copula-Based Approach Ene Käärik

More information

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data  snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

A Flexible Bayesian Approach to Monotone Missing. Data in Longitudinal Studies with Nonignorable. Missingness with Application to an Acute

A Flexible Bayesian Approach to Monotone Missing. Data in Longitudinal Studies with Nonignorable. Missingness with Application to an Acute A Flexible Bayesian Approach to Monotone Missing Data in Longitudinal Studies with Nonignorable Missingness with Application to an Acute Schizophrenia Clinical Trial Antonio R. Linero, Michael J. Daniels

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

Longitudinal analysis of ordinal data

Longitudinal analysis of ordinal data Longitudinal analysis of ordinal data A report on the external research project with ULg Anne-Françoise Donneau, Murielle Mauer June 30 th 2009 Generalized Estimating Equations (Liang and Zeger, 1986)

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

7 Sensitivity Analysis

7 Sensitivity Analysis 7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Imputation Algorithm Using Copulas

Imputation Algorithm Using Copulas Metodološki zvezki, Vol. 3, No. 1, 2006, 109-120 Imputation Algorithm Using Copulas Ene Käärik 1 Abstract In this paper the author demonstrates how the copulas approach can be used to find algorithms for

More information

Analyzing Pilot Studies with Missing Observations

Analyzing Pilot Studies with Missing Observations Analyzing Pilot Studies with Missing Observations Monnie McGee mmcgee@smu.edu. Department of Statistical Science Southern Methodist University, Dallas, Texas Co-authored with N. Bergasa (SUNY Downstate

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

2 Naïve Methods. 2.1 Complete or available case analysis

2 Naïve Methods. 2.1 Complete or available case analysis 2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

A Correlated Binary Model for Ignorable Missing Data: Application to Rheumatoid Arthritis Clinical Data

A Correlated Binary Model for Ignorable Missing Data: Application to Rheumatoid Arthritis Clinical Data Journal of Data Science 14(2016), 365-382 A Correlated Binary Model for Ignorable Missing Data: Application to Rheumatoid Arthritis Clinical Data Francis Erebholo 1, Victor Apprey 3, Paul Bezandry 2, John

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract Far East J. Theo. Stat. 0() (006), 179-196 COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS Department of Statistics University of Manitoba Winnipeg, Manitoba, Canada R3T

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

T E C H N I C A L R E P O R T KERNEL WEIGHTED INFLUENCE MEASURES. HENS, N., AERTS, M., MOLENBERGHS, G., THIJS, H. and G. VERBEKE

T E C H N I C A L R E P O R T KERNEL WEIGHTED INFLUENCE MEASURES. HENS, N., AERTS, M., MOLENBERGHS, G., THIJS, H. and G. VERBEKE T E C H N I C A L R E P O R T 0465 KERNEL WEIGHTED INFLUENCE MEASURES HENS, N., AERTS, M., MOLENBERGHS, G., THIJS, H. and G. VERBEKE * I A P S T A T I S T I C S N E T W O R K INTERUNIVERSITY ATTRACTION

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Measuring Social Influence Without Bias

Measuring Social Influence Without Bias Measuring Social Influence Without Bias Annie Franco Bobbie NJ Macdonald December 9, 2015 The Problem CS224W: Final Paper How well can statistical models disentangle the effects of social influence from

More information

Linear Mixed Models for Longitudinal Data with Nonrandom Dropouts

Linear Mixed Models for Longitudinal Data with Nonrandom Dropouts Journal of Data Science 4(2006), 447-460 Linear Mixed Models for Longitudinal Data with Nonrandom Dropouts Ahmed M. Gad and Noha A. Youssif Cairo University Abstract: Longitudinal studies represent one

More information

Bootstrapping Sensitivity Analysis

Bootstrapping Sensitivity Analysis Bootstrapping Sensitivity Analysis Qingyuan Zhao Department of Statistics, The Wharton School University of Pennsylvania May 23, 2018 @ ACIC Based on: Qingyuan Zhao, Dylan S. Small, and Bhaswar B. Bhattacharya.

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline

More information

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies

An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies Paper 177-2015 An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies Yan Wang, Seang-Hwane Joo, Patricia Rodríguez de Gil, Jeffrey D. Kromrey, Rheta

More information

Pooling multiple imputations when the sample happens to be the population.

Pooling multiple imputations when the sample happens to be the population. Pooling multiple imputations when the sample happens to be the population. Gerko Vink 1,2, and Stef van Buuren 1,3 arxiv:1409.8542v1 [math.st] 30 Sep 2014 1 Department of Methodology and Statistics, Utrecht

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Multiple Imputation for Missing Data in Repeated Measurements Using MCMC and Copulas

Multiple Imputation for Missing Data in Repeated Measurements Using MCMC and Copulas Multiple Imputation for Missing Data in epeated Measurements Using MCMC and Copulas Lily Ingsrisawang and Duangporn Potawee Abstract This paper presents two imputation methods: Marov Chain Monte Carlo

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

MISSING or INCOMPLETE DATA

MISSING or INCOMPLETE DATA MISSING or INCOMPLETE DATA A (fairly) complete review of basic practice Don McLeish and Cyntha Struthers University of Waterloo Dec 5, 2015 Structure of the Workshop Session 1 Common methods for dealing

More information

Comparison of multiple imputation methods for systematically and sporadically missing multilevel data

Comparison of multiple imputation methods for systematically and sporadically missing multilevel data Comparison of multiple imputation methods for systematically and sporadically missing multilevel data V. Audigier, I. White, S. Jolani, T. Debray, M. Quartagno, J. Carpenter, S. van Buuren, M. Resche-Rigon

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

Calibration Estimation for Semiparametric Copula Models under Missing Data

Calibration Estimation for Semiparametric Copula Models under Missing Data Calibration Estimation for Semiparametric Copula Models under Missing Data Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Economics and Economic Growth Centre

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

ATINER's Conference Paper Series STA

ATINER's Conference Paper Series STA ATINER CONFERENCE PAPER SERIES No: LNG2014-1176 Athens Institute for Education and Research ATINER ATINER's Conference Paper Series STA2014-1255 Parametric versus Semi-parametric Mixed Models for Panel

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics

More information

Introduction An approximated EM algorithm Simulation studies Discussion

Introduction An approximated EM algorithm Simulation studies Discussion 1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Inferences on missing information under multiple imputation and two-stage multiple imputation

Inferences on missing information under multiple imputation and two-stage multiple imputation p. 1/4 Inferences on missing information under multiple imputation and two-stage multiple imputation Ofer Harel Department of Statistics University of Connecticut Prepared for the Missing Data Approaches

More information

arxiv: v2 [stat.me] 17 Jan 2017

arxiv: v2 [stat.me] 17 Jan 2017 Semiparametric Estimation with Data Missing Not at Random Using an Instrumental Variable arxiv:1607.03197v2 [stat.me] 17 Jan 2017 BaoLuo Sun 1, Lan Liu 1, Wang Miao 1,4, Kathleen Wirth 2,3, James Robins

More information

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective

More information

Columbia University. Columbia University Biostatistics Technical Report Series. Handling Missing Data by Deleting Completely Observed Records

Columbia University. Columbia University Biostatistics Technical Report Series. Handling Missing Data by Deleting Completely Observed Records Columbia University Columbia University Biostatistics Technical Report Series Year 2006 Paper 13 Handling Missing Data by Deleting Completely Observed Records Cuiling Wang Myunghee Cho Paik Columbia University,

More information

Improved Doubly Robust Estimation when Data are Monotonely Coarsened, with Application to Longitudinal Studies with Dropout

Improved Doubly Robust Estimation when Data are Monotonely Coarsened, with Application to Longitudinal Studies with Dropout Biometrics 64, 1 23 December 2009 DOI: 10.1111/j.1541-0420.2005.00454.x Improved Doubly Robust Estimation when Data are Monotonely Coarsened, with Application to Longitudinal Studies with Dropout Anastasios

More information

TUTORIAL IN BIOSTATISTICS Handling drop-out in longitudinal studies

TUTORIAL IN BIOSTATISTICS Handling drop-out in longitudinal studies STATISTICS IN MEDICINE Statist. Med. 2004; 23:1455 1497 (DOI: 10.1002/sim.1728) TUTORIAL IN BIOSTATISTICS Handling drop-out in longitudinal studies Joseph W. Hogan 1; ;, Jason Roy 2; and Christina Korkontzelou

More information

Estimating the parameters of hidden binomial trials by the EM algorithm

Estimating the parameters of hidden binomial trials by the EM algorithm Hacettepe Journal of Mathematics and Statistics Volume 43 (5) (2014), 885 890 Estimating the parameters of hidden binomial trials by the EM algorithm Degang Zhu Received 02 : 09 : 2013 : Accepted 02 :

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Some methods for handling missing values in outcome variables. Roderick J. Little

Some methods for handling missing values in outcome variables. Roderick J. Little Some methods for handling missing values in outcome variables Roderick J. Little Missing data principles Likelihood methods Outline ML, Bayes, Multiple Imputation (MI) Robust MAR methods Predictive mean

More information

Chapter 4. Parametric Approach. 4.1 Introduction

Chapter 4. Parametric Approach. 4.1 Introduction Chapter 4 Parametric Approach 4.1 Introduction The missing data problem is already a classical problem that has not been yet solved satisfactorily. This problem includes those situations where the dependent

More information

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

LATENT VARIABLE MODELS FOR LONGITUDINAL STUDY WITH INFORMATIVE MISSINGNESS

LATENT VARIABLE MODELS FOR LONGITUDINAL STUDY WITH INFORMATIVE MISSINGNESS LATENT VARIABLE MODELS FOR LONGITUDINAL STUDY WITH INFORMATIVE MISSINGNESS by Li Qin B.S. in Biology, University of Science and Technology of China, 1998 M.S. in Statistics, North Dakota State University,

More information

Introduction to mtm: An R Package for Marginalized Transition Models

Introduction to mtm: An R Package for Marginalized Transition Models Introduction to mtm: An R Package for Marginalized Transition Models Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington 1 Introduction Marginalized transition

More information

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori On Properties of QIC in Generalized Estimating Equations Shinpei Imori Graduate School of Engineering Science, Osaka University 1-3 Machikaneyama-cho, Toyonaka, Osaka 560-8531, Japan E-mail: imori.stat@gmail.com

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

Longitudinal Response and Covariate Data

Longitudinal Response and Covariate Data Weighted Generalized Estimating Functions for Incomplete Longitudinal Response and Covariate Data Baojiang Chen, Grace Y. Yi and Richard J. Cook Department of Statistics and Actuarial Science University

More information

Integrated approaches for analysis of cluster randomised trials

Integrated approaches for analysis of cluster randomised trials Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool

More information

Comparing Group Means When Nonresponse Rates Differ

Comparing Group Means When Nonresponse Rates Differ UNF Digital Commons UNF Theses and Dissertations Student Scholarship 2015 Comparing Group Means When Nonresponse Rates Differ Gabriela M. Stegmann University of North Florida Suggested Citation Stegmann,

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Downloaded from:

Downloaded from: Hossain, A; DiazOrdaz, K; Bartlett, JW (2017) Missing binary outcomes under covariate-dependent missingness in cluster randomised trials. Statistics in medicine. ISSN 0277-6715 DOI: https://doi.org/10.1002/sim.7334

More information

Parametric fractional imputation for missing data analysis

Parametric fractional imputation for missing data analysis 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????),??,?, pp. 1 15 C???? Biometrika Trust Printed in

More information

REGRESSION ANALYSIS IN LONGITUDINAL STUDIES WITH NON-IGNORABLE MISSING OUTCOMES

REGRESSION ANALYSIS IN LONGITUDINAL STUDIES WITH NON-IGNORABLE MISSING OUTCOMES REGRESSION ANALYSIS IN LONGITUDINAL STUDIES WITH NON-IGNORABLE MISSING OUTCOMES by Changyu Shen B.S. in Biology University of Science and Technology of China, 1998 Submitted to the Graduate Faculty of

More information

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving

More information

Unbiased estimation of exposure odds ratios in complete records logistic regression

Unbiased estimation of exposure odds ratios in complete records logistic regression Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology

More information

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007) Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

This is the submitted version of the following book chapter: stat08068: Double robustness, which will be

This is the submitted version of the following book chapter: stat08068: Double robustness, which will be This is the submitted version of the following book chapter: stat08068: Double robustness, which will be published in its final form in Wiley StatsRef: Statistics Reference Online (http://onlinelibrary.wiley.com/book/10.1002/9781118445112)

More information

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random Paul J. Rathouz University of Chicago Abstract. We consider the problem of attrition under a logistic

More information

Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates

Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates by c Ernest Dankwa A thesis submitted to the School of Graduate

More information

Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing

Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing Alessandra Mattei Dipartimento di Statistica G. Parenti Università

More information

LONGITUDINAL DATA ANALYSIS

LONGITUDINAL DATA ANALYSIS References Full citations for all books, monographs, and journal articles referenced in the notes are given here. Also included are references to texts from which material in the notes was adapted. The

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

Bayesian Nonparametric Analysis of Longitudinal. Studies in the Presence of Informative Missingness

Bayesian Nonparametric Analysis of Longitudinal. Studies in the Presence of Informative Missingness Bayesian Nonparametric Analysis of Longitudinal Studies in the Presence of Informative Missingness Antonio R. Linero Department of Statistics, Florida State University, 214 Rogers Building, 117 N. Woodward

More information

Miscellanea A note on multiple imputation under complex sampling

Miscellanea A note on multiple imputation under complex sampling Biometrika (2017), 104, 1,pp. 221 228 doi: 10.1093/biomet/asw058 Printed in Great Britain Advance Access publication 3 January 2017 Miscellanea A note on multiple imputation under complex sampling BY J.

More information

Comparison of methods for repeated measures binary data with missing values. Farhood Mohammadi. A thesis submitted in partial fulfillment of the

Comparison of methods for repeated measures binary data with missing values. Farhood Mohammadi. A thesis submitted in partial fulfillment of the Comparison of methods for repeated measures binary data with missing values by Farhood Mohammadi A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Biostatistics

More information

Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data

Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data Sankhyā : The Indian Journal of Statistics 2009, Volume 71-B, Part 1, pp. 55-78 c 2009, Indian Statistical Institute Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information