Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

Size: px

Start display at page:

Download "Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial"

Dorthy Powers
5 years ago
Views:

1 Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial Jin Kyung Park International Vaccine Institute Min Woo Chae Seoul National University R. Leon Ochiai International Vaccine Institute Yongdai Kim Seoul National University

2 Outlines 1. Motivational data 2. Frailty model, parameters, and priors 3. MCMC algorithm 4. Deviance Information Criterion 5. Simulation 6. Example 7. Summary 1

3 1. Motivational data To assess typhoid vaccine effectiveness, the International Vaccine Institute (IVI) coordinated a clinical trial such that Clustered randomization trial (CRT) Two-year surveillance at Kolkata, India from 2004 to 2006 Two-year surveillance at Karachi, Pakistan from 2002/2003 to 2004/2005 Time-to-first event of typhoid or censoring Risk factors: gender, age group etc. 2

4 Figure 1: Geographic clusters in Kolkata, India Source: The New England Journal of Medicine,

5 Table 1: Basic statistics Country Clusters Subjects Males Children Cases no. no. no.(%) no.(%) no.(ir /1000) Kolkata, India (51.49) 2192 (18.33) 109 (9.12) Karachi, Pakistan (51.74) 6481 (23.80) 79 (2.9) Vaccinees (2-16 years aged people) 5 yrs aged Incidence rate(no. of cases/no. of subjects) Clustering effect (random effects) Variance component of random effects by country Common effect of the treatment (vaccine effectiveness) We propose a way of selecting a model for the multi-country CRT based on the deviance information criterion (DIC) 4

6 2. Frailty model Observations: (T ijk, δ ijk, z ijk ), i = 1,, n s, j = 1, 2,, n ic, k = 1, 2,, n ijl where T ijk = min{x ijk, C ijk }, δ ijk = I(X ijk C ijk ),z ijk = (z ijk1,..., z ijkp ), X ijk and C ijk are survival and censoring times respectively n s is the number of sites (i.e., countries) n ic is the number of clusters of ith site n ijl the number of subjects in the jth cluster of ith site p is the number of covariates 5

7 Model a i (t z ijk, η ij ) = η ij exp(z ijkβ)a i (t) where β = (β 1,..., β p ) are the regression coefficients, a i (t) are baseline hazard functions of ith site and η ij (i = 1,, n s, j = 1, 2,, n ic ) are cluster specific frailty terms. Frailty η ij Γ(α i, α i ) It is assumed that η ij are independent random variables distributed by the gamma distribution with mean 1 and variance 1/α i. Here, α i can be considered as the degree of cluster heterogeneity of ith site (Clayton, 1978). 6

8 Parameters: β, a i ( ), α i Priors Normal prior for β m N(0, σm), 2 m = 1,..., p Gamma prior for α i, α i Γ(τ i1, τ i2 ) Bayesian bootstrap prior for a i ( ) 7

9 Bayesian Bootstrap (Kim and Lee, 2003) To approximate the full Bayesian posterior by Bayesian bootstrap posterior that is proportional to the product of the empirical likelihood and prior The conditional likelihood of θ = (β, a( )) given η = (η 1,, η nc ) L(θ η) = = n c n i i=1 j=1 n c n i i=1 j=1 ( ) ( η i exp(z ijβ)a(t δij ij ) exp ( ) ( η i exp(z ijβ)da(t δij ij ) exp tij 0 tij 0 η i exp(z ijβ)a(s)ds ) η i exp(z ijβ)da(s) ) where A(t) = t 0 a(s)ds is the cumulative baseline hazard function. 8

10 Let q be the number of distinct uncensored observations Let 0 < t 1 < < t q be the corresponding ordered uncensored observations Then, the empirical likelihood is obtained by assuming that A is a step function having jumps only at t 1,..., t q and replacing da(t) by A(t) = A(t) A(t ) L E (θ η) = n c n i i=1 j=1 ( ) η i exp(z ijβ) A(t δij ij ) exp Bayesian bootstrap prior for A (Kim and Lee, 2003) π( A) q k=1 1 A(t k ) Posterior Empirical likelihood BB prior η i exp(zijβ) A(t k ) k:t k t ij 9

11 Advantages of the Bayesian bootstrap Finitely many parameters: standard Bayes theorem can be used for posterior calculation Parameters: β = (β 1,, β p ), { A(t k ), k = 1,..., q}, α Computation of posterior is easy The resulting posterior is always proper, approximates the full Bayesian posterior well, and has desirable large sample properties. Kim and Lee (2003) Annals of Statistics 10

12 3. MCMC algorithm Step 1 : updating cumulative baseline hazards (A i ) Step 2 : updating regression coefficients (β 1,..., β p ) Step 3 : updating cluster specific random effects (η ij ) Step 4 : updating variance of cluster specific random effects (α i ) 11

13 A i (t l ) others Gamma(a il, b il ), where and b il = where a il = (i,j,k) D(t l ) (i,j,k) R(t l ) δ ijk η ij exp(z ijkβ) R(t) = {(i, j, k) : T ijk t, i = 1,..., n s, j = 1, 2,, n c, k = 1, 2,, n j } and D(t) = {(i, j, k) : T ijk = t, δ ijk = 1, i = 1,..., n c, j = 1, 2,, n i, k = 1, 2,, n j }. π p (β 1,..., β p others) q η ij exp(z ijkβ) exp Ai (t l ) l=1 (i,j,k) D(t l ) (i,j,k) R(t l ) Metropolis-Hastings algorithm is used for generating β. η ij exp(z ijkβ) π(β) 12

14 η ij others Gamma(c ij, d ij ), where c ij = n c j=1 δ ijk + α i and d ij = n c j=1 exp(z ijkβ)a(t ijk ) + α i. π p (α i others) [ nc j=1 ] α α i i Γ(α i ) ηα i 1 ij exp( α i η ij ) α τ i1 1 i exp( τ i2 α i ). Metropolis-Hastings algorithm is used for generating α i. 13

15 Exemplary data: Model selection using CRT data of two countries Table 2: Models: - mean that the corresponding parameters are all different Model Treatment effect Effect of other risk factors Cluster heterogeneity M M2 β 11 = = β nc M3 - β 1( 1) = = β nc ( 1) - M4 β 11 = = β nc 1 β 1( 1) = = β nc ( 1) - M5 - - α 1 = = α nc M6 β 11 = = β nc 1 - α 1 = = α nc M7 - β 1( 1) = = β nc ( 1) α 1 = = α nc M8 β 11 = = β nc 1 β 1( 1) = = β nc ( 1) α 1 = = α nc 14

16 4. Deviance Information Criterion (DIC) Bayesian Model selection criterion (Spiegelhalter et al., 2002) DIC(θ) = 4E θ [log f(y θ) y] + 2 log f(y θ(y)) where f(y θ) be the density of data y and parameter of interest, θ DIC is defined as the sum of the expected deviance and the effective number of parameters A simple and intuitively appealing extension of the Akaike Information Criterion (AIC, Akaike 1973) 15

17 Extensions of DIC (Celux et al., 2006) Observed DIC Use the marginal likelihood to define the DIC Often the observed DIC is difficult to calculate Alternatives Complete DIC Conditional DIC 16

18 Complete DIC Defined as the posterior expectation of the DIC obtained assuming the random effects are known DIC comp1 = 4E θ,w [log f(y, w θ) y] + 2E w [log f(y, w θ(y, w)) y] where f(y, w θ) be the density of y and w y and w represent observed data and random effects, respectively Computationally demanding when θ(y, w) does not have a closed form solution 17

19 Complete DIC type 2 Replace the term E w [log f(y, w θ(y, w)) y] by log f(y, w(y) θ(y)) DIC comp1 = 4E θ,w [log f(y, w θ) y] + 2E w [log f(y, w θ(y, w)) y] DIC comp2 = 4E θ,w [log f(y, w θ) y] + 2 log f(y, w(y) θ(y)) ( w(y), θ(y)) is an reasonable estimator of (w, θ) by treating w as an additional parameter (of interest) Tends to overestimate the effective number of parameters 18

20 Complete DIC type 3 Replace θ(y, w) in the DIC comp1 by θ(y) DIC comp1 = 4E θ,w [log f(y, w θ) y] + 2E w [log f(y, w θ(y, w)) y] DIC comp2 = 4E θ,w [log f(y, w θ) y] + 2 log f(y, w(y) θ(y)) DIC comp3 = 4E θ,w [log f(y, w θ) y] + 2E w [log f(y, w θ(y)) y] If we let θ = E θ [θ y], which is easily obtained using MCMC samples 19

21 Conditional DIC The conditional likelihood of y given w and θ is used DIC cond1 = 4E θ,w [log f(y w, θ) y] + 2E w [log f(y w, θ(y, w)) y] DIC cond2 = 4E θ,w [log f(y w, θ) y] + 2 log f(y w(y), θ(y)) DIC cond3 = 4E θ,w [log f(y w, θ) y] + 2E w [log f(y w, θ(y)) y] Note that the DIC cond1, DIC cond2 and DIC cond3 are conditional likelihood versions of the DIC comp1, DIC comp2 and DIC comp3, respectively. 20

22 5. Simulations Complete/Conditional DIC type 2 and 3 are used for model selection Generate 100 data sets with no. of sites, n s = 2 no. of clusters per site, n c = 20 no. of subjects per cluster, n k = 5 Generate Z 1 and Z 2 as covariates independently from the symmetric Bernoulli distribution with success probability 0.5. For the baseline hazard functions, we let a 1 (t) = 1 and a 2 (t) = 2 21

23 Consider the four models: 1. Equal parameter of interest and equal nuisance parameters SM1: β 11 = β 21 = 1.0, β 12 = β 22 = 0.5, α 1 = α 2 = Different parameter of interest and equal nuisance parameters SM2: (β 11, β 21 ) = ( 1.0, 0.0) and β 12 = β 22 = 0.5, α 1 = α 2 = Equal parameter of interest and different nuisance parameters SM3: β 11 = β 21 = 1.0, (β 12, β 22 ) = (1.0, 0.0), (α 1, α 2 ) = (0.3, 0.7) 4. Different parameter of interest and different nuisance parameters SM4: (β 11, β 21 ) = ( 1.0, 0.0), (β 12, β 22 ) = (1.0, 0.0), (α 1, α 2 ) = (0.3, 0.7) 22

24 Table 3: No frailty: the frequencies of the models selected by DICs among 100 data sets Censoring Simulation Partial likelihood Full likelihood prob. Model SM1 SM2 SM3 SM4 SM1 SM2 SM3 SM4 0.0 SM SM SM SM SM SM SM SM Selection by using AIC 23

25 Table 4: Frailty: the frequencies of the models selected by Complete DICs among 100 data sets True model COMPLETE DICs Censoring Simulation Type 2 Type 3 prob. Model SM1 SM2 SM3 SM4 SM1 SM2 SM3 SM4 0.0 SM SM SM SM SM SM SM SM

26 Table 5: Frailty: the frequencies of the models selected by Conditional DICs among 100 data sets True model CONDITIONAL DICs Censoring Simulation Type 2 Type 3 prob. Model SM1 SM2 SM3 SM4 SM1 SM2 SM3 SM4 0.0 SM SM SM SM SM SM SM SM

27 Summary of simulations DIC using full likelihood selects model well where the data has no frailties Complete DICs highly depend on frailty distribution Conditional DICs is proposed to be used for the model selection of the frailty models 26

28 DIC cond2 = 4E θ,w [log f(y w, θ) y] + 2 log f(y w(y), θ(y)) DIC cond3 = 4E θ,w [log f(y w, θ) y] + 2E w [log f(y w, θ(y)) y] DIC comp2 = 4E θ,w [log f(y w, θ) y] + 2 log f(y w(y), θ(y)) 4E θ,w [log π(w θ)] + 2 log π( w(y) θ(y)) DIC comp3 = 4E θ,w [log f(y w, θ) y] + 2E w [log f(y w, θ(y)) y] 4E θ,w [log π(w θ)] + 2E w [log π(w θ(y))] where w = η i and the distribution of η i follows η i Gamma(α, α) 27

29 6. Exemplary data: model selection using CRT data of two countries Table 6: Models: - mean that the corresponding parameters are all different Model Treatment effect Effect of other risk factors Cluster heterogeneity M M2 β 11 = = β nc M3 - β 1( 1) = = β nc ( 1) - M4 β 11 = = β nc 1 β 1( 1) = = β nc ( 1) - M5 - - α 1 = = α nc M6 β 11 = = β nc 1 - α 1 = = α nc M7 - β 1( 1) = = β nc ( 1) α 1 = = α nc M8 β 11 = = β nc 1 β 1( 1) = = β nc ( 1) α 1 = = α nc 28

30 MCMC sampling 500 times of burn-in no. of repetitions = no. of updates =

31 Table 7: Results of fitting data to frailty model: Type 2 DICs Conditional DIC Complete DIC Model DIC p D DIC p D M M M M M M M M

32 Table 8: Results of fitting data to frailty model: Type 3 DICs Conditional DIC Complete DIC Model DIC p D DIC p D M M M M M M M M

33 Table 9: Results of fitting data Model Country Effect coeff HR 95% CI of HR M4 I+II Vaccine (0.306, 0.707) Child (1.446, 2.670) Male (0.387, 1.454) Variance of frailties of Kolkata 1/ ˆα B 1 = (0.387, 1.236) Karachi 1/ ˆα B 2 = (0.768, 2.560) M2 I+II Vaccine (0.310, 0.728) I Child (1.287, 3.007) Male (0.811, 1.740) II Child (1.262, 3.076) Male (0.608, 1.402) Variance of frailties of Kolkata 1/ ˆα B 1 = (0.403, 1.233) Karachi 1/ ˆα B 2 = (0.772, 2.756) ˆα i B is the Bayes estimate of α i (i = 1, 2) 32

34 Summary of the example Model selection is done using the conditional DICs Data of CRT from two countries are combined The cluster effects are heterogeneous among countries (variance component of countries) 33

35 7. Summary Introduce frailty model for the cluster randomized trial Bayesian bootstrap prior for baseline hazard function Gamma prior frailty Introduce model selection for frailty model Complete DIC Conditional DIC Perform intensive simulation for the proportional hazards model and frailty model Conditional DIC is proposed for the frailty model selection 34

36 Acknowledgement This work was supported by the Bill and Melinda Gates Foundation through the Diseases of the Most Impoverished (DOMI) program, administered by the International Vaccine Institute (IVI), Seoul, Korea. Vaccine, Children, and a Better World 35

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland