Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Similar documents
Multilevel Statistical Models: 3 rd edition, 2003 Contents

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Markov Chain Monte Carlo methods

7.1 The Hazard and Survival Functions

Chapter 4 Multi-factor Treatment Designs with Multiple Error Terms 93

Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing

Analysing geoadditive regression data: a mixed model approach

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

CTDL-Positive Stable Frailty Model

Bayesian Methods in Multilevel Regression

Bayesian Methods for Machine Learning

Stock Sampling with Interval-Censored Elapsed Duration: A Monte Carlo Analysis

Monte Carlo in Bayesian Statistics

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Contents. Part I: Fundamentals of Bayesian Inference 1

Frailty Modeling for clustered survival data: a simulation study

Statistical Inference and Methods

Incorporating unobserved heterogeneity in Weibull survival models: A Bayesian approach

STA 4273H: Statistical Machine Learning

3003 Cure. F. P. Treasure

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION

Machine Learning Techniques for Computer Vision

Lecture 5 Models and methods for recurrent event data

Joint Modeling of Longitudinal Item Response Data and Survival

Longitudinal breast density as a marker of breast cancer risk

Step-Stress Models and Associated Inference

Review Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and Issues

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation

Multivariate Survival Analysis

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

Multistate Modeling and Applications

General Regression Model

Longitudinal and Multilevel Methods for Multinomial Logit David K. Guilkey

Stat 5101 Lecture Notes

Gibbs Sampling in Latent Variable Models #1

Nonparametric Bayes Estimator of Survival Function for Right-Censoring and Left-Truncation Data

Ph.D. course: Regression models. Introduction. 19 April 2012

STA 4273H: Statistical Machine Learning

Bayesian Regression Linear and Logistic Regression

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

A comparison of fully Bayesian and two-stage imputation strategies for missing covariate data

Factor Analytic Models of Clustered Multivariate Data with Informative Censoring (refer to Dunson and Perreault, 2001, Biometrics 57, )

Lecture 2: From Linear Regression to Kalman Filter and Beyond

TMA 4275 Lifetime Analysis June 2004 Solution

Probabilistic Machine Learning

Bayesian Modeling of Accelerated Life Tests with Random Effects

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

ABC methods for phase-type distributions with applications in insurance risk problems

MAS3301 / MAS8311 Biostatistics Part II: Survival

Bayesian and Non Bayesian Estimations for. Birnbaum-Saunders Distribution under Accelerated. Life Testing Based oncensoring sampling

Dynamic Scheduling of the Upcoming Exam in Cancer Screening

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University

Bayesian Graphical Models

Bayesian Multivariate Logistic Regression

Multistate models and recurrent event models

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Default Priors and Effcient Posterior Computation in Bayesian

Limited Dependent Variable Panel Data Models: A Bayesian. Approach

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

Hmms with variable dimension structures and extensions

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Density Estimation. Seungjin Choi

Pattern Recognition and Machine Learning

Bayesian Inference and MCMC

A general mixed model approach for spatio-temporal regression data

Beyond MCMC in fitting complex Bayesian models: The INLA method

Bayesian Estimation for the Generalized Logistic Distribution Type-II Censored Accelerated Life Testing

Analysis of competing risks data and simulation of data following predened subdistribution hazards

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods. MIT , Fall Due: Wednesday, 07 November 2007, 5:00 PM

Reconstruction of individual patient data for meta analysis via Bayesian approach

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014

Statistical Methods for Alzheimer s Disease Studies

Optimal rules for timing intercourse to achieve pregnancy

Nonparametric Bayesian Methods (Gaussian Processes)

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Lecture 9: Quantile Methods 2

Logistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Survival analysis with long-term survivors and partially observed covariates

Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial

Approximate Bayesian Computation

Multistate models and recurrent event models

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

CPSC 540: Machine Learning

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

Probability and Statistics

Dynamic Models Part 1

Technical Report - 7/87 AN APPLICATION OF COX REGRESSION MODEL TO THE ANALYSIS OF GROUPED PULMONARY TUBERCULOSIS SURVIVAL DATA

Lecture 3. Truncation, length-bias and prevalence sampling

University of Southampton Research Repository eprints Soton

MAS3301 / MAS8311 Biostatistics Part II: Survival

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation

Transcription:

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals John W. Mac McDonald & Alessandro Rosina Quantitative Methods in the Social Sciences Seminar - Analysis of Longitudinal Data QMSS p.1/45

Hutterite data natural high fertility population no use of contraception first birth interval 432 closed birth intervals 18 open birth intervals 2+ birth intervals 2,544 closed birth intervals 301 open birth intervals QMSS p.2/45

Analysis of Hutterite birth intervals use complete birth histories simultaneously analyse all intervals including open intervals simultaneously model waiting time to conception primary sterility secondary sterility QMSS p.3/45

New estimation approach Bayesian approach Gibbs sampling flexible but computationally demanding QMSS p.4/45

Why mixture models for survival data? problem of long-term survivors time to event of interest equals infinity e.g. sterile never close birth interval bias if problem ignored use mixture model proper inference few demographic applications to date QMSS p.5/45

Mixture model for two populations 1 : those eventually having the event 0 : those at zero risk of having the event Y = 1 if eventually having the event 0 if zero risk, i.e., long-term survivor QMSS p.6/45

Mixture model for two populations... Y is partially observable known for uncensored observation missing for right-censored observation logistic model for distribution of Y logit(pr(y = 1)) = X β QMSS p.7/45

Generic survival model individual with vector of covariates X uncensored observation at time t: likelihood term is the probability of experiencing the event at time t, pr(y = 1 X) f(t Y = 1, X) (1) QMSS p.8/45

Generic survival model... censored observation at time t: likelihood term is the probability of being a long-term survivor plus the probability of experiencing the event after time t pr(y = 0 X) + pr(y = 1 X) S(t Y = 1, X) (2) likelihood is product of (1) & (2) terms QMSS p.9/45

Alternative models for a single event Weibull Farewell (1982) accelerated failure time Yamaguchi (1992) Yamaguchi and Ferguson (1995) piecewise constant hazard Li and Choe (1997) Wang and Murphy (1998) QMSS p.10/45

Logistic-normal-geometric model discrete-time survival model model for a single event or recurrent events including unobserved heterogeneity can be extended to account for long-term survivors McDonald and Rosina (2001) QMSS p.11/45

Mixture modelling of recurrent events simultaneous estimation of logistic model for long-term survivors estimates effects of covariates on the probability of the event occurring simultaneous estimation of survival model estimates effects of covariates on the timing of the event, given that it will occur QMSS p.12/45

Algorithms for model fitting Expectation-Maximization algorithm commonly used simple, but slow standard errors not obtained Markov chain Monte Carlo methods, e.g. Gibbs sampling QMSS p.13/45

Discrete-time event history model individual event history is a series of Bernoulli or success/failure trials at each discrete time point, e.g., month let T = waiting time to conception (success) in months, t = 1, 2, 3,... discrete hazard is h t pr(t = t T t) QMSS p.14/45

Discrete-time event history model... probability of conception at time t is pr(t = t) = (1 h 1 ) (1 h 2 ) (1 h t 1 ) h t geometric distribution has constant hazard, i.e., h t = p, and pr(t = t) = q t 1 p where q 1 p QMSS p.15/45

Geometric distributed waiting times uncensored observation at time t: likelihood term is pr(t = t) = q t 1 p or 1 success out of t trials censored observation at time t: likelihood term is pr(t t) = q t 1 QMSS p.16/45

Geometric distributed waiting times... trick: software for fitting logistic models can be used to fit geometric distributed waiting times and allow for covariates! logit(hazard) = X β observed heterogeneity in risk is modelled by a geometric regression model QMSS p.17/45

Logistic-normal-geometric model unobserved heterogeneity in risk is modelled by a mixed-geometric regression model for waiting times logit(hazard) = X β + Z σ Z N(0, 1) Z unobserved covariate value σ standard deviation parameter Z σ random effect QMSS p.18/45

Logistic-normal-geometric model... trick: software for fitting random effects logistic models can be used to fit mixed-geometric distributed waiting times and allow for observed and unobserved heterogeneity! QMSS p.19/45

Constant hazard: recurrent events T k = time between k 1 & k th conception T = T 1 + + T k probability of kth conception at time t is pr(t = t) = q t1 1 p q t2 1 p q tk 1 p uncensored observation at time t: likelihood term is pr(t = t) = q t k p k QMSS p.20/45

1st birth interval/primary sterility 1st birth interval covariates F and effects γ logit(hazard fecund) = F γ + Z σ primary sterility covariates P and effects α logit(primary sterility) = P α QMSS p.21/45

2+ birth intervals/secondarysterility 2+ birth intervals covariates H and effects δ logit(hazard fecund) = H δ + Z σ secondary sterility covariates S and effects β logit(secondary sterility) = S β QMSS p.22/45

All birth intervals 1st birth interval logit(hazard fecund) = F γ + Z σ 2+ birth intervals logit(hazard fecund) = H δ + Z σ common unobserved Z N(0, 1) common effect σ common normal random effect QMSS p.23/45

Logistic-normal-geometric model... logit(hazard) = F γ + Z σ have joint likelihood l(γ, σ, Z) but each Z is unknown obtainmarginal likelihood l(γ, σ) by integrating out Z over its prior N(0, 1) exact integration impossible as logistic is a nonlinear model QMSS p.24/45

Estimation methods quadrature approximate using numerical integration quasi-likelihood Taylor series approximation nonlinear model becomes linear Bayesian inference via Gibbs sampling samples [γ, σ, Z] from joint likelihood for inference on each parameter QMSS p.25/45

Bayesian Inference parameters random with distributions prior to data posterior given data apply Bayes Theorem posterior likelihood prior uninformative locally uniform priors = posterior standardized likelihood QMSS p.26/45

Estimation by Gibbs sampling sample from joint posterior by sampling univariately from full conditionals sample γ 1 [γ σ 0, Z 0 ] sample σ 1 [σ γ 1, Z 0 ] sample Z 1 [Z γ 1, σ 1 ] generate a joint sample (γ 1, σ 1, Z 1 ) and so on 2, 3, discard early burn-in phase QMSS p.27/45

Estimation by Gibbs sampling... fitting mixed-geometric survival model with long-term survivors is straightforward extension WinBUGS for fitting Bayesian model can use the univariate sample average for inference on each parameter QMSS p.28/45

Hutterite data 724 unions exclusions for incomplete information dataset for analysis 450 families 2,976 births 3,295 birth intervals QMSS p.29/45

Hutterite data... first birth interval 432 closed birth intervals 18 open birth intervals 2+ birth intervals 2,544 closed birth intervals 301 open birth intervals QMSS p.30/45

Effective conditional fecundability effective conception leads to live birth conditional on being fecund (not sterile) bias if open intervals excluded some women may not be sterile throughout open interval QMSS p.31/45

Model waiting time to conception waiting time to conception birth interval 8 months = 1, 2, 3, discrete-time survival model QMSS p.32/45

Effects on primary sterility mean 2.5% 97.5% constant -2.92-4.18-1.82 cohort 1910 19-1.30-3.29 0.33 >1919-0.97-2.18 0.27 age at < 21-0.17-1.50 1.17 marriage > 23 0.70-0.77 2.11 cohort & age at marriage not significant QMSS p.33/45

Effects on secondary sterility mean 2.5% 97.5% constant -6.25-8.26-6.14 cohort 1910 19 0.10-0.44 0.64 >1919 1.06 0.11 1.96 <21-79.63-221.80-2.29 24 26 0.56-1.57 2.87 age at 27 29 0.64-2.00 3.18 start of 30 32 0.84-1.99 3.53 birth 33 35 1.35-1.69 4.05 interval 36 38 1.70-1.36 4.51 39 41 3.35 0.27 6.29 >41 5.99 2.95 8.88 previous child died 0.46-0.61 1.41 previous child died not significant but one cohort & late age at start significant QMSS p.34/45

Effects on secondary sterility... mean 2.5% 97.5% 3 5-0.32-2.22 1.53 6 8 0.08-1.91 2.32 duration 9 11 0.82-1.17 3.26 of 12 14 1.15-0.85 3.81 marriage 15 17 0.94-1.17 3.67 18 20 1.45-0.63 4.15 >20 1.76-0.35 4.50 duration of marriage not significant, but positive trend QMSS p.35/45

Effects on 2+ birth spacing mean 2.5% 97.5% 3 5-0.21-0.35-0.06 6 8-0.34-0.52-0.15 duration 9 11-0.33-0.55-0.12 of 12 14-0.34-0.61-0.08 marriage 15 17-0.42-0.72-0.11 18 20-0.39-0.76-0.02 >20-0.59-1.13-0.04 sigma 0.104 0.005 0.256 duration of marriage significant with negative trend QMSS p.36/45

Effects on 2+ birth spacing mean 2.5% 97.5% constant -2.37-2.51-2.23 cohort 1910 19 0.18 0.08 0.29 >1919 0.18 0.07 0.29 <21-0.11-0.38 0.15 24 26 0.02-0.13 0.17 age at 27 29-0.02-0.21 0.16 start of 30 32-0.01-0.23 0.20 birth 33 35-0.07-0.33 0.20 interval 36 38-0.08-0.37 0.21 39 41-0.24-0.61 0.11 >41 0.05-0.58 0.66 previous child died 0.40 0.18 0.61 previous child died significant, cohort significant & age at start not significant QMSS p.37/45

Effects on first birth interval spacing mean 2.5% 97.5% constant -1.21-1.50-0.92 cohort 1910 19-0.10-0.43 0.24 >1919 0.02-0.26 0.31 age at <21-0.16-0.40 0.08 marriage >23-0.37-0.70-0.05 cohort not significant last age at start significant QMSS p.38/45

Results effects on primary sterility cohort not significant age at marrige not significant QMSS p.39/45

Results... effects on secondary sterility age at start of interval positive trend 39-41, 41+ significant duration of marrige not significant, but positive trend cohort: 1919+ significant previous child died not significant QMSS p.40/45

Results... effects on first birth spacing last age at start of interval significant cohort not significant QMSS p.41/45

Results... effects on 2+ birth spacing duration of marrige significant, negative trend age at start of interval not significant, no trend cohort: 1910-19, 1919+ significant previous child died significant, positive QMSS p.42/45

Results... unobserved heterogeneity σ mean.1042 median.0955 s.d..0692 95% CI [.0048,.2560] QMSS p.43/45

New birth interval/sterility model uses complete birth histories simultaneously analyse all intervals simultaneously models determinants of waiting time to conception primary sterility secondary sterility allows for zero risk subpopulations estimates fecundability for those fecund QMSS p.44/45

Posterior probability of sterility posterior probability that an individual with vector of explanatory variables x comes from population Y = 0, given that no event has occurred by time t pr(y = 0 X, T > t) = pr(y = 0 X) pr(y = 0 X) + pr(y = 1 X) S(t Y = 1, X) QMSS p.45/45