Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, )

Reference: Dunson and Perreault (2001), Biometrics 57, 302-308.

Consider data in which multiple outcomes are collected for subunits nested within subjects, and some of the observations may be missing. The latent trait models discussed in Lecture 16 can be generalized and applied to this setting. We are particularly interested in the case in which missingness may depend on the subject's latent traits, and hence is informative. We will consider Bayesian methods motivated by applications in toxicology and epidemiology.

Motivation

In many reproductive toxicology and epidemiology studies, multiple outcomes are measured for each subunit (e.g., sperm, embryo) that survives to the time of observation. Since subunits within healthier subjects may be more likely to have a variety of favorable outcomes, including survival, censoring due to subunit mortality is potentially informative about differences between subjects in the primary outcomes.

Such informative censoring also occurs in longitudinal studies in which subjects who tend to miss visits frequently, perhaps due to illness or death, respond differently than other subjects.

Since the censored data are not missing at random (Little and Rubin, 1987), failure to account for within-subject dependency between the missingness indicators and the observed outcomes can result in biased inference.

Spermatotoxicity Application

Interest is in studying the effect of an agent known to cause infertility at high dosages after short-term exposure. A study was designed to detect toxicant-induced changes in sperm motility at relatively low levels of exposure.

Adult male rats were randomized to one of four dose groups (0, 8, 24, 72), with 10 animals in each group. Rats were dosed daily for 14 days, and sperm were obtained from the proximal cauda epididymidis of each animal on day 15.

These sperm had been exposed to the toxicant during their final differentiation in the testis and their maturation in the caput and corpus epididymidis. It is during this time that the capability for progressive motion develops in sperm.

Using computer-aided sperm analysis (CASA) (reviewed by Boyers, Davis, and Katz, 1989), the proportion of motile (surviving) sperm was recorded for a sample of 100-200 sperm from each rat, as were the (x, y) coordinates of points (60 per second) along the path travelled by each surviving sperm. We use three kinematic measures (LDV: linear displacement velocity, LNR: linearity, and PRD: predictability) to quantify the progressive motility of each surviving sperm.

Summary of sperm motility data (LDV = linear displacement velocity, LNR = linearity, PRD = predictability).

Outcome       Dose    Mean     SD       Corr. with % Survival
% Survival       0    83.6     0.042     1.000
                 8    80.4     0.065     1.000
                24    79.5     0.078     1.000
                72    62.2     0.126     1.000
LDV              0    88.4     9.21      0.153
                 8    76.1     7.54      0.146
                24    82.1     15.6      0.053
                72    77.2     13.3     -0.687
LNR              0    0.219    0.013     0.079
                 8    0.216    0.013     0.069
                24    0.207    0.012    -0.043
                72    0.206    0.020    -0.386
PRD              0    25.5     2.70      0.336
                 8    22.0     2.52      0.202
                24    24.3     5.05      0.131
                72    23.6     3.38     -0.516

Statistics are based on average values for each animal.

There is a decreasing trend in the proportion of surviving sperm with increasing dose.

There is a decline in LDV, LNR, and PRD with increasing dose, though a conventional MANOVA test for an overall effect of dose was not significant (P = 0.224).

Within-animal correlation between each of the kinematic measures and the proportion of surviving sperm tended to decrease from positive values in the unexposed and low dose groups to negative values in the higher dose groups.

Such a trend could be due to variability between rats in the stage of the sperm production cycle when treatment occurs combined with differences in the effect of exposure at different stages. We are interested in assessing agent-related changes in sperm survival and in progressive motility (as quantified by the kinematic measures). Conventional analyses that do not adjust for sperm survival can produce biased inferences in the presence of nonignorable missingness due to within-animal dependency in the proportion of surviving sperm and the motility of the survivors.

Models for nonignorable missing data

There are three general classes, depending on how the joint distribution of the primary data ($y$) and missingness indicators ($z$) is specified (Little and Rubin, 1987):

1. Selection approaches assume different parameters for the primary data model $p(y)$ and for $p(z \mid y)$.
2. Pattern mixture approaches assume different parameters for $p(y \mid z)$ and for $p(z)$.
3. Shared parameter approaches instead incorporate common parameters into models for $p(y)$ and $p(z)$.
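The three classes can be written as factorizations of the joint density. This is a notational sketch only: the parameter symbols $\theta$ and $\psi$ and the shared subject-specific quantity $b$ are labels introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
\text{Selection:} \quad
  & p(y, z \mid \theta, \psi) = p(y \mid \theta)\, p(z \mid y, \psi), \\
\text{Pattern mixture:} \quad
  & p(y, z \mid \theta, \psi) = p(y \mid z, \theta)\, p(z \mid \psi), \\
\text{Shared parameter:} \quad
  & p(y, z \mid \theta, \psi) = \int p(y \mid b, \theta)\, p(z \mid b, \psi)\, p(b)\, db .
\end{align*}
```

In the third form, $y$ and $z$ are conditionally independent given $b$ but dependent marginally, which is what makes the missingness nonignorable.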

Relevant Literature

Cowles, Carlin, and Connett (1996) and Bradlow and Zaslavsky (1999) proposed Bayesian selection models for ordinal data with nonignorable missingness. Both approaches assume that there are several independent normal latent variables underlying the observed response. Cowles et al. accounted for nonignorable missingness by placing informative priors directly on the missing data, while Bradlow and Zaslavsky instead consider missingness as a category of the outcome variable.

In clustered and longitudinal data settings, in which missingness may be associated with the true underlying response for a subject, it is natural to use a shared parameter model.

Several shared parameter approaches have been proposed for analyzing longitudinal data when the time to censoring depends on a subject's underlying rate of change. Wu and Carroll (1988) described a pseudo-maximum likelihood approach under a probit censoring model. Wu and Bailey (1989) approximated this shared parameter model by conditioning on the time to censoring, and Schluchter (1992) and De Gruttola and Tu (1994) developed maximum likelihood procedures.

Outline of proposed approach

Use a type of Bayesian shared parameter model for clustered multivariate data subject to nonignorable missingness.

Instead of following the conventional shared parameter approach of incorporating the same subject-specific parameters into the primary data and missingness models, we link latent variables related to censoring to latent variables related to the primary outcomes through a linear model.

A factor analytic structure is then used to relate these latent variables to the primary data and missingness indicators.

Subunit-specific factors are incorporated to accommodate dependency between the multiple measurements on each subunit (e.g., sperm), and covariate effects are included in each level of the hierarchy.

Multilevel Factor Analytic Model

Consider a study involving $N$ clusters (e.g., subjects), the $i$th of which contains $n_i$ subunits.

For subunit $j$ ($j = 1, \ldots, n_i$) within cluster $i$ ($i = 1, \ldots, N$), $M$ measures of health are collected: $y_{ij} = (y_{ij1}, \ldots, y_{ijM})^T$.

Each subunit is either uncensored ($z_{ij} = 1$), in which case $y_{ij}$ is completely observed, or censored ($z_{ij} = 0$), in which case $y_{ij}$ is missing.

Model for Primary Outcomes

We model the primary outcomes $y_{ij}$ using a multilevel factor analytic model:

$$y_{ij} = \beta u_{i1} + \Lambda_1 V_{i1} a_{i1} + \Lambda_2 V_{i2} b_{ij} + \epsilon_{ij}, \quad i = 1, \ldots, N; \; j = 1, \ldots, n_i, \tag{1}$$

where

$\beta$ is an $M \times r_1$ matrix of parameters with row vectors $\beta_1, \ldots, \beta_M$,
$u_{i1}$ is an $r_1 \times 1$ vector of cluster-level covariates,
$a_{i1}$ is a $p_1 \times 1$ vector of cluster-specific latent variables,
$\Lambda_1$ is an $M \times p_1$ factor loadings matrix,
$v_{i1}$ is a $p_1 \times 1$ vector of covariates modifying the expression of $a_{i1}$, with $V_{i1} = \mathrm{diag}(v_{i1})$,
$b_{ij}$ is a $q \times 1$ vector of subunit-specific latent variables,
$\Lambda_2$ is an $M \times q$ factor loadings matrix,
$v_{i2}$ is a $q \times 1$ vector of covariates modifying the expression of $b_{ij}$, with $V_{i2} = \mathrm{diag}(v_{i2})$,
$\epsilon_{ij}$ is an $M \times 1$ vector of residuals with independently distributed elements $\epsilon_{ijk} \sim N(0, \tau_k^{-1})$, $k = 1, \ldots, M$, where $\tau = (\tau_1, \ldots, \tau_M)^T$ are precision parameters.

$a_i$ and $b_{ij}$ are independent, and $b_{ijl} \sim N(0, \nu_l^{-1})$, $l = 1, \ldots, q$, where $\nu = (\nu_1, \ldots, \nu_q)^T$ are precision parameters.
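To make the structure of model (1) concrete, the following is a minimal simulation sketch, assuming NumPy. The function name `simulate_primary`, the dimensions, the unit loadings, the precisions, and the binary covariate are all illustrative assumptions and are not taken from the study. For simplicity the modifier matrices $V_{i1}$ and $V_{i2}$ are set to the identity and $a_{i1}$ is drawn as standard normal.

```python
import numpy as np

def simulate_primary(N=40, n_i=150, M=3, seed=0):
    """Draw y_ij from model (1) with illustrative (assumed) parameter values."""
    rng = np.random.default_rng(seed)
    r1, p1, q = 2, 1, 1                    # assumed dimensions
    beta = rng.normal(size=(M, r1))        # M x r1 regression coefficients
    Lambda1 = np.ones((M, p1))             # cluster-level loadings (assumed)
    Lambda2 = np.ones((M, q))              # subunit-level loadings (assumed)
    tau = np.full(M, 4.0)                  # residual precisions tau_k
    nu = np.full(q, 1.0)                   # subunit-factor precisions nu_l
    V1, V2 = np.eye(p1), np.eye(q)         # V_i1, V_i2 fixed at identity here
    y = np.empty((N, n_i, M))
    for i in range(N):
        u_i1 = np.array([1.0, float(rng.integers(0, 2))])   # intercept + binary covariate
        a_i1 = rng.normal(size=p1)                           # cluster-specific latent variable
        b = rng.normal(scale=nu ** -0.5, size=(n_i, q))      # b_ijl ~ N(0, 1/nu_l)
        eps = rng.normal(scale=tau ** -0.5, size=(n_i, M))   # eps_ijk ~ N(0, 1/tau_k)
        cluster_mean = beta @ u_i1 + Lambda1 @ V1 @ a_i1     # shared by all subunits in cluster i
        y[i] = cluster_mean + b @ (Lambda2 @ V2).T + eps
    return y

y = simulate_primary()
# Average within-cluster variance of the first measure; under these assumed
# values it should be close to 1/nu + 1/tau = 1.25.
within_var = y[:, :, 0].var(axis=1, ddof=1).mean()
```

The cluster-level term is shared by every subunit in a cluster, so it induces between-subunit dependence, while $b_{ij}$ induces dependence among the $M$ measures on the same subunit.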

Model for Censoring Process

We assume that the censoring probability relates to the latent traits $a_{i2}$ and to the covariates $u_{i2}$ through the generalized linear model:

$$h\{\Pr(z_{ij} = 1 \mid a_i, u_i, v_i)\} = u_{i2}^T \gamma + \lambda_3^T V_{i3} a_{i2}, \tag{2}$$

where

$h(\cdot)$ is a monotonic link function (e.g., logistic),
$\gamma$ is an $r_2 \times 1$ vector of parameters,
$\lambda_3$ is a $p_2 \times 1$ factor loadings vector,
$v_{i3}$ is a $p_2 \times 1$ vector of covariates modifying the expression of $a_{i2}$, and $V_{i3} = \mathrm{diag}(v_{i3})$.
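As a small worked example of model (2), the sketch below evaluates $\Pr(z_{ij} = 1)$ under a logistic link, assuming NumPy. The function name `prob_uncensored` and the numeric inputs are hypothetical, and the logistic link is just one of the monotonic links the model allows.

```python
import numpy as np

def prob_uncensored(u_i2, gamma, lambda3, v_i3, a_i2):
    """Pr(z_ij = 1) from model (2), assuming a logistic link h."""
    V_i3 = np.diag(v_i3)                       # V_i3 = diag(v_i3)
    eta = u_i2 @ gamma + lambda3 @ V_i3 @ a_i2  # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))          # inverse logit

# Illustrative (assumed) values: intercept-only covariate, one latent trait.
u = np.array([1.0])
gamma = np.array([0.0])
lam = np.array([1.0])
v = np.array([1.0])

p_mid = prob_uncensored(u, gamma, lam, v, np.array([0.0]))
p_low = prob_uncensored(u, gamma, lam, v, np.array([-1.0]))
p_high = prob_uncensored(u, gamma, lam, v, np.array([1.0]))
```

With a positive loading, clusters with larger $a_{i2}$ have a higher probability that each subunit is observed.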

Comments

Without additional structure, the models for the primary outcomes and censoring mechanism assume an ignorable censoring mechanism.

To allow for nonignorable missingness, we could:

1. Use cluster-specific summary statistics of $y_{ij}$ as covariates in the censoring model (selection approach).
2. Include the proportion of surviving subunits as a covariate in the primary outcome model (pattern-mixture approach).
3. Let $a_{i2} = a_{i1}$ (shared parameter approach).

Linking the Primary Outcome and Censoring Models

We instead specify a two-stage hierarchical model for the cluster-level latent variables:

$$a_{i1} = \alpha_{11} x_{i1} + \alpha_{12}\, w(x_{i1}, a_{i2}) + e_{i1}, \qquad a_{i2} = \alpha_2 x_{i2} + e_{i2}, \tag{3}$$

where

$\alpha_{11}$ is a $p_1 \times s_1$ matrix of parameters,
$x_{i1}$ is an $s_1 \times 1$ covariate vector,
$\alpha_{12}$ is a $p_1 \times s_2$ matrix of parameters,
$w_i = w(x_{i1}, a_{i2})$ is an $s_2 \times 1$ vector with elements that are functions of $x_{i1}$ and $a_{i2}$,
$e_{i1}$ is a $p_1 \times 1$ vector of residuals with independently distributed elements $e_{i1l} \sim N(0, \phi_l^{-1})$, $l = 1, \ldots, p_1$, where $\phi = (\phi_1, \ldots, \phi_{p_1})^T$ are precision parameters,
$\alpha_2$ is a $p_2 \times t$ matrix of parameters,
$x_{i2}$ is a $t \times 1$ covariate vector,
$e_{i2}$ is a $p_2 \times 1$ vector of residuals with independently distributed elements $e_{i2l} \sim N(0, 1)$, $l = 1, \ldots, p_2$.
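The following simulation sketch, assuming NumPy, illustrates how the link in (3) makes censoring informative. It is a deliberately reduced version of the model: one outcome ($M = 1$), scalar latent variables, no covariates, unit loadings, $w(x_{i1}, a_{i2}) = a_{i2}$, and a logistic link. The function name `simulate_linked` and all numeric values are assumptions made for illustration.

```python
import numpy as np

def simulate_linked(N=2000, n_i=50, alpha12=1.0, lambda3=1.5, gamma=0.0, seed=1):
    """Simulate a reduced version of models (1)-(3) with scalar latent variables."""
    rng = np.random.default_rng(seed)
    a2 = rng.normal(size=N)                    # a_i2 = e_i2 ~ N(0, 1), no covariates
    a1 = alpha12 * a2 + rng.normal(size=N)     # a_i1 = alpha12 * w + e_i1, with w = a_i2
    # One outcome per subunit: cluster factor plus subunit factor plus residual.
    y = a1[:, None] + rng.normal(size=(N, n_i)) + rng.normal(size=(N, n_i))
    p_obs = 1.0 / (1.0 + np.exp(-(gamma + lambda3 * a2)))   # Pr(z_ij = 1 | a_i2)
    z = rng.random((N, n_i)) < p_obs[:, None]               # True = uncensored
    return y, z

# Linked latent variables (alpha12 != 0): observed subunits over-represent
# clusters with large a_i2, which also tend to have large y.
y, z = simulate_linked()
full_mean = y.mean()            # mean over all subunits, censored or not
observed_mean = y[z].mean()     # pooled mean over uncensored subunits only

# Unlinked latent variables (alpha12 = 0): censoring carries no information about y.
y0, z0 = simulate_linked(alpha12=0.0)
gap_unlinked = y0[z0].mean() - y0.mean()
```

In this sketch the pooled mean of the uncensored outcomes sits well above the full-data mean when $\alpha_{12} \neq 0$, and the gap disappears when $\alpha_{12} = 0$, which corresponds to the ignorable case.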

Comments

Our model assumes that the data are missing at random (MAR) conditional on the latent variables $a_{i1}$ and $a_{i2}$.

If the latent variables were observed, expression (3) would be in the form of a pattern mixture model.

Since the censoring-related latent variables ($a_{i2}$) are incorporated into the model for the factors ($a_{i1}$) underlying the primary outcomes, our hierarchical model can be viewed as a shared parameter model.

Our approach accommodates a broader class of data structures and missing data mechanisms than current univariate shared parameter approaches (e.g., De Gruttola and Tu, 1994) that use the same subject-specific parameters in the primary data and censoring models.
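The conditional MAR statement can be spelled out as a likelihood contribution for cluster $i$. This is a sketch derived from models (1)-(3) with covariates and parameters suppressed in the conditioning; the symbols $\mu_{ij}$, $\Sigma_i$, and $L_i$ are shorthand introduced here, not notation from the slides.

```latex
\begin{align*}
p(y_i^{\mathrm{obs}}, z_i \mid a_{i1}, a_{i2})
  &= \prod_{j=1}^{n_i} \Pr(z_{ij} = 1 \mid a_{i2})^{z_{ij}}
     \{1 - \Pr(z_{ij} = 1 \mid a_{i2})\}^{1 - z_{ij}}\,
     p(y_{ij} \mid a_{i1})^{z_{ij}}, \\
p(y_{ij} \mid a_{i1})
  &= N_M\!\left(y_{ij};\, \mu_{ij},\, \Sigma_i\right), \qquad
     \mu_{ij} = \beta u_{i1} + \Lambda_1 V_{i1} a_{i1}, \\
\Sigma_i
  &= \Lambda_2 V_{i2}\, \mathrm{diag}(\nu)^{-1} V_{i2} \Lambda_2^T
     + \mathrm{diag}(\tau)^{-1}, \\
L_i
  &= \iint p(y_i^{\mathrm{obs}}, z_i \mid a_{i1}, a_{i2})\,
     p(a_{i1} \mid a_{i2})\, p(a_{i2})\; da_{i1}\, da_{i2}.
\end{align*}
```

Given the latent variables, the outcome and censoring terms factor, so a censored subunit contributes only its censoring probability. After integrating over $a_{i1}$ and $a_{i2}$, the dependence of $a_{i1}$ on $a_{i2}$ in (3) ties $y_i^{\mathrm{obs}}$ to $z_i$, so the marginal mechanism is nonignorable unless $\alpha_{12} = 0$.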

The factor analytic form of models (1) and (2) has several advantages over conventional random-effects approaches (Laird and Ware, 1982). By incorporating outcome-specific factor loadings, flexible models can be formulated that include a single latent variable measuring the health of a subject. In sperm motility applications, this latent variable provides a subject-specific motility score that can be used both to identify subjects with high or low motility and to simplify analyses relating sperm motility to other endpoints (e.g. fertility). By including covariate effects on the subject-specific latent variable instead of on each of the measured outcomes, one can reduce the dimensionality of the analysis. The parameters characterizing such effects have appealing interpretations.