CTDL-Positive Stable Frailty Model

Similar documents
Multivariate Survival Analysis

A Regression Model For Recurrent Events With Distribution Free Correlation Structure

Models for Multivariate Panel Count Data

Frailty Modeling for clustered survival data: a simulation study

Markov Chain Monte Carlo

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

Bayesian data analysis in practice: Three simple examples

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models

Statistical Inference for Stochastic Epidemic Models

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

STAT 518 Intro Student Presentation

Gibbs Sampling in Linear Models #2

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

STA 216, GLM, Lecture 16. October 29, 2007

Markov Chain Monte Carlo in Practice

Metropolis-Hastings Algorithm

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Frailty Probit model for multivariate and clustered interval-censor

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Hierarchical Modeling for Univariate Spatial Data

Modelling geoadditive survival data

Statistical Inference and Methods

Bayesian Multivariate Logistic Regression

Research & Reviews: Journal of Statistics and Mathematical Sciences

Richard D Riley was supported by funding from a multivariate meta-analysis grant from

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects

Default Priors and Effcient Posterior Computation in Bayesian

Likelihood-free MCMC

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract

Bayesian Linear Regression

Hierarchical Modelling for Univariate Spatial Data

Markov Chain Monte Carlo (MCMC)

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Stat 5101 Lecture Notes

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Bayesian Methods for Machine Learning

Randomized Quasi-Monte Carlo for MCMC

Lecture 5 Models and methods for recurrent event data

Markov Chain Monte Carlo methods

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

An Introduction to Bayesian Linear Regression

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

Bayesian linear regression

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

A general mixed model approach for spatio-temporal regression data

Bayesian inference for multivariate skew-normal and skew-t distributions

Principles of Bayesian Inference

CPSC 540: Machine Learning

Part 8: GLMs and Hierarchical LMs and GLMs

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

CURE FRACTION MODELS USING MIXTURE AND NON-MIXTURE MODELS. 1. Introduction

Reconstruction of individual patient data for meta analysis via Bayesian approach

Analysing geoadditive regression data: a mixed model approach

A Joint Model with Marginal Interpretation for Longitudinal Continuous and Time-to-event Outcomes

Tests of independence for censored bivariate failure time data

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Report and Opinion 2016;8(6) Analysis of bivariate correlated data under the Poisson-gamma model

Bayesian Analysis for Markers and Degradation

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

MARGINAL MARKOV CHAIN MONTE CARLO METHODS

Multistate models and recurrent event models

Classical and Bayesian inference

Bayesian Multivariate Extreme Value Thresholding for Environmental Hazards

MULTILEVEL IMPUTATION 1

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Simulation-based robust IV inference for lifetime data

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Bayesian Inference for Regression Parameters

STA 4273H: Statistical Machine Learning

Hierarchical Modelling for Univariate Spatial Data

MCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

Bayesian Modelling of Extreme Rainfall Data

Practical Bayesian Quantile Regression. Keming Yu University of Plymouth, UK

Simulating Random Variables

Multivariate Normal & Wishart

Lecture 8: The Metropolis-Hastings Algorithm

Bayesian Linear Models

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks

Computational statistics

Frailty Models and Copulas: Similarities and Differences

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Bayesian Linear Models

Multistate models and recurrent event models

ABC methods for phase-type distributions with applications in insurance risk problems

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Down by the Bayes, where the Watermelons Grow

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Bayesian Linear Models

Multivariate spatial modeling

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

MCMC algorithms for fitting Bayesian models

Correlated Gamma Frailty Models for Bivariate Survival Data

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 9. Linear models and regression

Transcription:

CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland Abstract: The non-ph Canonical Time Dependent Logistic (CTDL) survival regression model is extended by incorporating a positive stable frailty component into the hazard function within the Bayesian framework. The resulting model is compared numerically with the Weibull-positive stable frailty model, using data from a placebo controlled randomized trial of gamma interferon in chronic granulotomous disease (CGD). Moreover, supremum bounds of the ratio-of-uniforms (ROU) algorithm, used for sampling from complete conditional distributions, are obtained analytically thus yielding a more efficient form of the algorithm. Keywords: Canonical Logistic, Frailty Models, Positive Stable, Non-PH, Ratioof-Uniforms. 1 Introduction A flexible non-ph model is the Canonical Time-Dependent Logistic (CTDL) model described by MacKenzie (1996, 1997). In our earlier work, the CTDL model was extended to univariate and multivariate gamma frailty models within the frequentist framework using a marginal Likelihood approach and its properties compared with the Weibull-Gamma models analytically and numerically. It was revealed via an extensive simulation study that the Weibull and Weibull-gamma models gave more precise results, in terms of standard errors, than the CTDL and CTDL-gamma models. However, analysis of real data revealed that the CTDL based models provided superior fits to the data. It has been shown (Qiou et al, 1999) that allowance is made for a higher degree of heterogeneity among subjects by using infinite variance frailty distributions than would be possible using finite variance frailty, such as gamma. A positive stable frailty term has previously been mixed with PH models. but never with non-ph basic models. Therefore, it was timely to develop a such a new model for the multivariate shared frailty setting using the CTDL hazard as the basic model and comparing its performance with that of the Weibull-positive stable frailty model. Inference was carried out in a Bayesian framework, and bounds of components necessary for the ratio-of-uniforms (ROU) algorithm for sampling from complete conditional

2 CTDL-Positive Stable Frailty Model distributions were derived successfully, thus improving the efficiency of the algorithm. 2 Positive stable frailty models Let the random variable U denote unobservable individual frailties. Buckle (1995) gives the joint density of n iid 4-parameter stable distributed rv s by using the joint pdf of U i and Y, f(u i, y ω), from which the marginal density of U i turns out to be the stable pdf. f(u i ω) = ω u i 1/(ω 1) ω 1 where 1/2 1/2 τ ω (y) = sin(πωy + ψ ω) cosπy ] u i 1 exp τ ω (y) ω/(ω 1) τ ω (y) ω/(ω 1) dy (1) cosπy cos(π(ω 1)y + ψ ω ) ω (0, 1), y ( 1/2, 1/2) and ψ ω = min(ω, 2 ω)π/2. ] (ω 1)/ω The observed data for the jth time observation for the ith individual (or alternatively for the jth individual in the ith group) is (t ij, δ ij, x ij ). Let D obs denote all such triplets. The unobserved data are the frailties u = (u 1,..., u n ). So the complete data is D = (D obs, u). Note that u is based on a vector of auxiliary variables y = (y 1,..., y n ) in equation (1). So given the data D obs and the parameters of interest, a likelihood and prior for parameters are needed so that a posterior density may be obtained. 2.1 CTDL-positive stable frailty model A non-ph CTDL regression model is defined by the hazard function λ(t x) = λ exp(tα + x β)/1 + exp(tα + x β)} (2) where λ > 0 is a scalar, α is a scalar measuring the effect of time and β is a p 1 vector of regression parameters associated with fixed covariates x = (x 1,..., x p ). The corresponding survival function is S(t x) = (1 + exp(tα + x β))/(1 + exp(x β))} λ α (3) So the complete data likelihood is given by: n m i L(λ, α, β, ω D) = λ(t ij λ, α, β, u i )] δij S(t ij λ, α, β, u i ) i=1 j=1

Blagojevic & MacKenzie. 3 = n m i ui λ exp(t ij α + x ij β) } δij 1 + exp(tij α + x ij β) } u i λ α (4) i=1 j=1 The observed data likelihood, which is simply the marginal model once the frailty components have been integrated out, is: L(λ, α, β, ω D obs ) = n i=1 mi ui λ exp(t ij α + x ij β) } δij 1 + exp(tij α + x ij β) j=1 ω u i 1/(ω 1) ω 1 1 ] u i exp τ ω (y i ) ω/(ω 1) 1/2 1/2 } u i λ α τ ω (y i ) ω/(ω 1) dy i du i (5) The posterior density, expressed in terms of the observed data likelihood and the joint prior for the parameters is: π(λ, α, β, ω D obs ) L(λ, α, β, ω D obs )p(λ)p(α)p(β)p(ω) (6) where π( ) denotes the posterior and p( ) the prior distribution. Note that independence among all parameters is assumed. The integrals in equation (5) do not have a closed form, so instead the unknown parameter vector (λ, α, β, ω) is augmented with vectors u and y and MCMC methods are used to obtain samples for (λ, α, β, ω, u, y). Complete conditional distributions are needed for λ, α, β, ω, u and y from which the corresponding samples are to be drawn. They are derived as being proportional to (6) and only the components involving the parameter of interest are retained. The choice of priors is given in table 1. All resulting complete conditional distributions are of non-standard forms and we use rejection algorithm for generation of y, Metropolis-Hastings algorithm with a beta proposal density in the case of ω and ratio of uniforms (ROU) algorithm in all other instances. We have developed a more efficient way of implementing the ROU algorithm, namely deriving its components analytically instead of performing numerical bisection. In order to demonstrate the new approach clearly, we outline briefly the ROU algorithm, exemplified by generation of α. Suppose V and W are real variables that are uniform on A= (v, w) : 0 < v < f( w v )}, where f( ) is the complete conditional distribution of interest, then the variable α = W V has the pdf that is proportional to f(α). The result is most useful when the set A is contained in a rectangle 0, a] b 1, b 2 ], in which case a type of rejection sampling can be used: Step 1: Generate V 1, V 2 U(0, 1). Step 2: Let V = av 1 and W = b 1 + (b 2 b 1 )V 2, then (V, W ) is a point

4 CTDL-Positive Stable Frailty Model randomly chosen in the rectangle 0, a] b 1, b 2 ]. Step 3: If (V, W ) A, then accept α = W V, otherwise repeat from step 1. A 0, a] b 1, b 2 ] where a = sup α f(α) b 1 = sup α 0 (α 2 f(α)) b 2 = sup α 0 (α 2 f(α)) Hence, step 3 above may be re-written as: Step 3: Accept α = W V if and only if V 2 f( W V ). Derivation of quantities a, b 1 and b 2 is necessary, and it has been claimed by various authors that this cannot be done analytically. However, we show that it is in fact possible to obtain their bounds, details will appear elsewhere. 2.2 Weibull-positive stable frailty model The familiar Weibull regression distribution has the following hazard and survival function λ(t x) = λρ(tλ) ρ 1 exp(x β) (7) S(t x) = exp( (tλ) ρ e x β ) (8) So the complete data likelihood is given by: L(λ, ρ, β, ω D) = n m i i=1 j=1 u i λρ(λt ij ) ρ 1 e x ij β} δ ij exp( ui (λt ij ) ρ e x ij β ) (9) The observed data likelihood is then: n mi L(λ, ρ, β, ω D obs ) = u i λρ(λt ij ) ρ 1 e x β} δ ij ij exp( ui (λt ij ) ρ e x ij β ) The posterior density is: i=1 j=1 ω u i 1/(ω 1) ω 1 1 1/2 1/2 ] u i exp τ ω (y i ) ω/(ω 1) τ ω (y i ) ω/(ω 1) dy i du i (10) π(λ, ρ, β, ω D obs ) L(λ, ρ, β, ω D obs )p(λ)p(ρ)p(β)p(ω) (11) Again, the integrals in (10) do not have a closed form, so MCMC methods are used to obtain samples for (λ, ρ, β, ω, u, y) as in the previous section. The choice of priors is given in table 1. It should be mentioned that only

Blagojevic & MacKenzie. 5 the prior for λ is conjugate with the likelihood, giving a standard gamma complete conditional distribution. All other complete conditional distributions are of non-standard forms and ROU sampling algorithm is employed except in the instances for y and ω as in the previous section. last section only mentioned ω as a clear exception 3 Example Data Analysis Consider now an application of the models outlined above to a data set from a placebo controlled randomized trial of gamma interferon in chronic granulotomous disease (CGD). The CGD study is described in detail in a report by the International CGD Cooperative Study Group (1991). The treatment was given to each of the 128 patients at the first scheduled visit for that patient. The data for each patient give the time to first and any recurrent serious infections, from study entry until the first scheduled visit of the patient. There is a minimum of 1 record per patient, with an additional record for each serious infection occurring up to the study completion date. For bivariate data, only information pertaining to record(s) 1, 2 and 3 is needed from which the gap times as well as censoring indicators are calculated. Only one factor, gender, is included in the analysis. Prior and hyperparameter specifications for the two models are given in table 1. Gibbs sampler is used to generate samples from the derived complete conditional distributions; S-Plus (V4.5) was used for programming. 5000 iterations were taken as burn-in and a further 5000 iterations taken for inference purposes. Table 2 gives Gelman and Rubin convergence statistic for the parameters of the two models; successful convergence is indicated for all parameters. Posterior distributions are summarized in terms of means and standard errors of each parameter in the CTDL and Weibull models with positive stable frailty in table 3. Gender is statistically significant in both models, and its negative estimated effect means that females are at a lower risk of being infected. Note that all other parameters are statistically significant too. Values ω = 0.751 in the CTDL case and ω = 0.622 in the Weibull case correspond to a reasonable degree of dependence between times of each patient (note that the value of 1(0) implies maximum dependence(independence)), more so in the CTDL case. 4 Remarks and Future work So we have shown that the CTDL-positive stable frailty model confirmed a higher degree of dependence between individual observations than the Weibull-positive stable frailty model. Sensitivity to prior specifications for models considered in this paper will be the subject of future work. We have

6 CTDL-Positive Stable Frailty Model already developed the CTDL and Weibull models with gamma distributed frailty in Bayesian framework and a comparison of these with their corresponding frequentist counterparts will be the subject of future work. Key References Buckle, D.J. (1995) Bayesian Inference for Stable Distributions. Journal of the American Statistical Association, 90, 605-613. MacKenzie, G. (1996) Regression models for survival data: the generalised time dependent logistic family. JRSS Series D, 45, 21-34. MacKenzie, G. (1997) On a non-proportional hazards regression model for repeated medical random counts. Statistics in Medicine, 16, 1831-1843. Wakefield, J.C. et al. (1991) Efficient generation of random variates via the ratio-of-uniforms method. Stat. Comput. 1, 129-133.

Blagojevic & MacKenzie. 7 TABLE 1. Prior and Hyperparameter specifications Parameter Prior Hyperparameter specifications λ w Gamma(γ, γ) γ = 0.001 λ c ρ Gamma(µ, µ) µ = 0.001 α Normal(ξ, ν) ξ = 0, ν = 1000 β Normal(ɛ, m) ɛ = 0, m = 1000 ω p(ω) = 1 0 < ω < 1 NB: ˆλ c = λ in the CTDL, ˆλ w = λ in the Weibull. TABLE 2. Gelman and Rubin statistics of Model Parametres CTDL + Positive Stable Frailty Parameter Gelman and Rubin (97.5% quantile) λ c 1.00 (1.00) α 1.02(1.02) β 1.00(1.00) ω 1.00(1.00) Weibull + Positive Stable Frailty Parameter Gelman and Rubin (97.5% quantile) λ w 1.00(1.00) ρ 1.06(1.10) β 1.00(1.00) ω 1.00(1.00) TABLE 3. Posterior Summary of Model Parameters CTDL + Positive Stable Frailty Parameter Posterior Mean Posterior S.E. λ c 0.236 0.017 α -0.097 0.032 β -0.098 0.039 ω 0.751 0.082 Weibull + Positive Stable Frailty Parameter Posterior Mean Posterior S.E. λ w 0.058 0.015 ρ 1.342 0.321 β -0.073 0.021 ω 0.622 0.066