MULTILEVEL IMPUTATION 1

Supplement B: MCMC Sampling Steps and Distributions for Two-Level Imputation

This document gives technical details of the full conditional distributions used to draw regression coefficients, random effects, and covariance matrices (or variance estimates) for the two-level imputation scheme from the Enders, Keller, and Levy (2018) paper in Psychological Methods. Additional estimation details are widely available in the literature, as these distributions largely borrow from established Bayesian estimation procedures for multilevel models (Browne & Draper, 2000; Cowles, 1996; Gelman et al., 2014; Goldstein et al., 2009; Kasim & Raudenbush, 1998; Schafer, 1997; Schafer & Yucel, 2002; Sinharay et al., 2001; van Buuren, 2012; Yucel, 2008). For the remainder of the document, we abandon the scalar notation from the paper in favor of a more succinct matrix representation of the multilevel model

\[
y_j = X_j \beta + Z_j u_j + \varepsilon_j
\tag{1}
\]

where y_j is the vector of outcome scores for cluster j, X_j is the corresponding matrix of predictor variables (level-1 or level-2), including a unit vector for the intercept, Z_j is a subset of the level-1 variables in X_j that have a random influence on the outcome (including a unit vector for the intercept), u_j is the column vector of level-2 residuals for cluster j, and ε_j is a vector of within-cluster residuals. In the context of FCS, y is an incomplete variable that is the target of imputation at a particular step, and X and Z contain complete and previously imputed variables. Level-2 imputation applies a single-level regression model to a cluster-level data set with J records. In matrix format, the model is as follows.

\[
y = X \beta + u
\tag{2}
\]

Two algorithmic options are available for the Bayesian estimation sequence that provides the necessary parameter estimates for imputation. Following derivations from Rubin (1987, pp. 162-166), the MICE computational option is consistent with the original formulation of fully conditional specification (van Buuren, 2012; van Buuren, Brand, Groothuis-Oudshoorn, & Rubin, 2006), whereby only those cases with observed data on the variable to be imputed are used to estimate the imputation model parameters. In contrast, the GIBBS option uses a conventional Gibbs sampler that draws parameters from a distribution that conditions on the observed and imputed data. The Gibbs sampler for missing data imputation is described in various Bayesian analysis texts (Jackman, 2009; Lynch, 2007).

Level-1 Gibbs Sampler Steps for Continuous Variables

Step 1: Draw regression coefficients from a multivariate normal distribution, conditional on the current random effects, parameter estimates, and imputations. Assuming a uniform prior, the full conditional distribution is as follows.

\[
\beta \sim \text{MVN}(\hat{\beta}, \Sigma_\beta), \quad
\hat{\beta} = \Big(\sum_j X_j' X_j\Big)^{-1} \sum_j X_j'(y_j - Z_j u_j), \quad
\Sigma_\beta = \sigma^2_{\varepsilon_j} \Big(\sum_j X_j' X_j\Big)^{-1}
\tag{3}
\]

We use a j subscript on the within-cluster residual variance σ²_εj to allow for the possibility of heterogeneous within-cluster variances (discussed below), noting that all σ²_εj are the same in the homogeneous case, which is the default in Blimp.

Step 2: Draw cluster-specific random effects from a multivariate normal distribution, conditional on the current parameter estimates and imputations.

\[
u_j \sim \text{MVN}(\hat{u}_j, V_{u_j}), \quad
V_{u_j} = \big(\sigma_{\varepsilon_j}^{-2} Z_j' Z_j + \Sigma_u^{-1}\big)^{-1}, \quad
\hat{u}_j = \sigma_{\varepsilon_j}^{-2} V_{u_j} Z_j'(y_j - X_j \beta)
\tag{4}
\]

Step 3: We define 1/σ²_ε as a gamma random variable. Then, for a homogeneous within-cluster variance (the HOV keyword of the OPTIONS command, the default), draw the reciprocal of the residual variance from a gamma distribution, conditional on the current parameter estimates, level-2 residuals, and imputations.

\[
\frac{1}{\sigma^2_{\varepsilon}} \sim \text{gamma}\!\left(\frac{N + df_p}{2}, \frac{S + S_p}{2}\right), \quad
S = \sum_j \varepsilon_j' \varepsilon_j, \quad
\varepsilon_j = y_j - X_j \beta - Z_j u_j
\tag{5}
\]

The df_p and S_p terms are the prior distribution's hyperparameters. The default setting (the PRIOR1 keyword of the OPTIONS command) specifies S_p = 1 and df_p = 2 (i.e., a gamma(1, .5) prior). Specifying the PRIOR2 keyword of the OPTIONS command sets S_p = 0 and df_p = 2, which is equivalent to a uniform prior on the variance. Finally, the PRIOR3 keyword of the OPTIONS command sets S_p = 0 and df_p = 0, which gives a Jeffreys prior. For a heteroscedastic within-cluster variance (the HEV keyword of the OPTIONS command), we implement the procedure described in Kasim and Raudenbush (1998).

Step 4: We define the inverse of the covariance matrix (i.e., the precision matrix, Σ_u⁻¹) as a Wishart random variable. The level-2 precision matrix is sampled from a Wishart distribution, conditional on the current parameter estimates, level-2 residuals, and imputations.

\[
\Sigma_u^{-1} \sim W\big((S + S_p)^{-1},\; J + df_p\big), \quad
S = \sum_j u_j u_j'
\tag{6}
\]

The S_p term can be viewed as the prior sums of squares matrix based on df_p degrees of freedom (i.e., prior observations). As such, S + S_p is a sums of squares and cross products matrix based on J + df_p observations. The default setting (the PRIOR1 keyword of the OPTIONS command) specifies S_p = I and df_p = p + 1, where p is the dimension of Σ_u. This prior corresponds to marginal uniform priors between -1 and 1 for all correlations and a marginal inverse gamma prior IG(1, .5) for the variance elements. Specifying the PRIOR2 keyword of the OPTIONS command sets S_p = 0 and df_p = -(p + 1), which is equivalent to a uniform prior on the elements in Σ_u. Finally, the PRIOR3 keyword sets S_p = 0 and df_p = 0. For random intercept models with a single level-2 variance component, we define the reciprocal of the variance, 1/σ²_u, as a gamma random variable.

\[
\frac{1}{\sigma^2_u} \sim \text{gamma}\!\left(\frac{J + df_p}{2}, \frac{S + S_p}{2}\right), \quad
S = \sum_j u_j^2
\tag{7}
\]

The priors are the univariate analogs of those given for the Wishart. The default setting (the PRIOR1 keyword of the OPTIONS command) specifies S_p = 1 and df_p = 2 (i.e., a gamma(1, .5) prior). Specifying the PRIOR2 keyword of the OPTIONS command sets S_p = 0 and df_p = 2, which is equivalent to a uniform prior on the variance. Finally, the PRIOR3 keyword of the OPTIONS command sets S_p = 0 and df_p = 0, which gives a Jeffreys prior.

Step 5: Draw the imputation for case i in cluster j from a univariate normal posterior predictive distribution, conditional on the current parameter estimates, level-2 residual terms, and previously imputed values.

\[
y_{ij(\text{mis})} \sim N(X_{ij} \beta + Z_{ij} u_j,\; \sigma^2_{\varepsilon})
\tag{8}
\]

Level-1 Gibbs Sampler Steps for Categorical Variables

The initial sampling step that draws threshold parameters, conditional on the underlying latent scores and current parameter values, applies only to ordinal variables with K > 2 categories. Albert and Chib (1993) described an approach for updating thresholds, but this procedure converges very slowly. Instead, Blimp implements the procedure described by Cowles (1996). Cowles' procedure uses a Metropolis-Hastings step within the Gibbs sampler to draw each threshold from a normal proposal distribution, and it accepts the threshold draws at some prespecified probability. In the interest of space, we refer readers to Cowles (1996) for details on sampling threshold parameters, as the procedure is rather involved.

The second step draws latent variable scores for the complete cases. For ordinal variables, latent values are drawn from a truncated normal distribution. For nominal variables, latent scores are drawn that conform to the necessary rank and magnitude conditions given in the manuscript (e.g., the discrete response has the largest latent score). Both situations are described in the body of the manuscript, so

we do not repeat that information here. The sampling steps for β, u_j, and Σ_u are the same as those in Equations 3, 4, and 6, and no sampling step is needed for σ²_ε because this term is fixed at unity. The primary computational difference is that y*_j (a vector of latent variable scores for cluster j) replaces y_j in all expressions. For nominal variables, these sampling steps are repeated for each of the K - 1 latent variable difference scores, whereas they are performed only once for ordinal variables. After drawing parameter values and level-2 residual terms, latent variable imputations for the incomplete cases are drawn from an unrestricted normal distribution, as described in the text. The final step converts the latent imputes to discrete values using the functions described in the paper.

Level-2 Gibbs Sampler Steps for Continuous Variables

After completing a single iteration of level-1 imputation, Blimp creates a level-2 data set with J records, such that each row contains level-2 scores and the manifest-variable cluster means of the level-1 variables (i.e., the arithmetic average of scores for cluster j). The program then applies a single-level imputation scheme to the incomplete level-2 variables. The remainder of the document describes the sampling steps for the single-level regression model in Equation 2. Note that we use u to denote the residuals in this single-level model because these terms reflect between-cluster variation.

Step 1: Draw regression coefficients from a multivariate normal distribution, conditional on the current parameter values and imputations. Assuming a uniform prior, the full conditional distribution is as follows.

\[
\beta \sim \text{MVN}(\hat{\beta}, \Sigma_\beta), \quad
\hat{\beta} = (X'X)^{-1} X'y, \quad
\Sigma_\beta = \sigma^2_u (X'X)^{-1}
\tag{9}
\]

Step 2: We define 1/σ²_u as a gamma random variable. Assuming a Jeffreys prior, the full conditional distribution is as follows.

\[
\frac{1}{\sigma^2_u} \sim \text{gamma}\!\left(\frac{J}{2}, \frac{S}{2}\right), \quad
S = u'u, \quad
u = y - X\beta
\tag{10}
\]

Step 3: Draw an imputation for cluster j from a univariate normal distribution, conditional on the current parameter values and data.

\[
y_{j(\text{mis})} \sim N(X_j \beta,\; \sigma^2_u)
\tag{11}
\]

Level-2 Gibbs Sampler Steps for Categorical Variables

The level-2 Gibbs steps for categorical variables are as follows. First, draw threshold parameters for ordinal variables with K > 2 response options. This step follows Cowles (1996), as described previously. Second, draw latent variable scores for the complete cases. For ordinal variables, latent values are drawn from a truncated normal distribution. For nominal variables, latent scores are drawn that conform to the necessary rank and magnitude conditions given in the paper (e.g., the discrete response has the largest latent score). Third, draw regression coefficients from the distribution in Equation 9, where y* (the vector of latent variable scores) replaces y. For nominal variables, this step is repeated for each of the K - 1 latent difference scores. Fourth, draw latent imputations for the incomplete cases from an unrestricted normal distribution, as described in the manuscript. Finally, convert the latent imputes to discrete values using the functions given in the paper.
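To make the sampling steps concrete, the sketch below performs one iteration of the level-1 conditional draws (regression coefficients, random effects, and variance components) for a random-intercept model with a homogeneous within-cluster variance and complete, simulated data, using the PRIOR1 hyperparameters S_p = 1 and df_p = 2. This is an illustrative sketch only, not Blimp's implementation; the variable names and the simulated data set are ours.

```python
# One Gibbs iteration for a two-level random-intercept model (sketch).
# Step numbers refer to the level-1 sampler steps in this supplement.
import numpy as np

rng = np.random.default_rng(1)

# Simulated complete data: J clusters, n_j observations each, p predictors.
J, n_j, p = 20, 30, 2
X = [np.column_stack([np.ones(n_j), rng.normal(size=n_j)]) for _ in range(J)]
Z = [np.ones((n_j, 1)) for _ in range(J)]  # random intercept only
beta_true = np.array([1.0, 0.5])
u_true = rng.normal(0.0, 1.0, size=J)
y = [X[j] @ beta_true + Z[j].ravel() * u_true[j] + rng.normal(0, 1, n_j)
     for j in range(J)]

# Current parameter state (a real sampler would cycle these draws).
beta = np.zeros(p)
u = np.zeros(J)
sigma2_eps = 1.0  # within-cluster residual variance
sigma2_u = 1.0    # level-2 (random intercept) variance

# Step 1: draw beta | u, sigma2_eps from its multivariate normal
# full conditional (posterior mean and covariance as in the text).
XtX = sum(X[j].T @ X[j] for j in range(J))
Xty = sum(X[j].T @ (y[j] - Z[j].ravel() * u[j]) for j in range(J))
beta_hat = np.linalg.solve(XtX, Xty)
Sigma_beta = sigma2_eps * np.linalg.inv(XtX)
beta = rng.multivariate_normal(beta_hat, Sigma_beta)

# Step 2: draw each random intercept u_j | beta, sigma2_eps, sigma2_u
# (scalar specialization of the multivariate normal full conditional).
for j in range(J):
    V_uj = 1.0 / (Z[j].ravel() @ Z[j].ravel() / sigma2_eps + 1.0 / sigma2_u)
    u_hat = V_uj * (Z[j].ravel() @ (y[j] - X[j] @ beta)) / sigma2_eps
    u[j] = rng.normal(u_hat, np.sqrt(V_uj))

# Step 3: draw 1/sigma2_eps from gamma((N + df_p)/2, (S + S_p)/2)
# with the PRIOR1 hyperparameters S_p = 1, df_p = 2.
# numpy's gamma takes shape and SCALE, so scale = 2 / (S + S_p).
N = J * n_j
S = sum(np.sum((y[j] - X[j] @ beta - Z[j].ravel() * u[j]) ** 2)
        for j in range(J))
sigma2_eps = 1.0 / rng.gamma((N + 2) / 2.0, 2.0 / (S + 1.0))

# Step 4 (univariate analog): draw 1/sigma2_u with the same prior,
# based on the J level-2 residuals.
S_u = np.sum(u ** 2)
sigma2_u = 1.0 / rng.gamma((J + 2) / 2.0, 2.0 / (S_u + 1.0))

print(beta, sigma2_eps, sigma2_u)
```

In a full imputation run these four draws would be followed by the posterior predictive draw for the missing values (Step 5) and the whole cycle repeated for each incomplete variable.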