Bayesian semiparametric analysis of short- and long-term hazard ratios with covariates
Department of Statistics, ITAM, Mexico
9th Conference on Bayesian nonparametrics, Amsterdam, June 10-14, 2013
Contents
- Motivating example
- Model
- Extension to covariates
- Bayesian inference
- Data analysis
Study
Survival study: ovarian cancer patients who have undergone treatment with erythropoietin (EPO) stimulating agents (ESA).
- ESA are used to relieve chemotherapy-induced anemia in patients with cancer
- EPO is a glycoprotein hormone that controls red blood cell production
- In contrast, other studies have shown that ESA treatments can compromise survival for some types of cancer
- It was believed that the effects of ESA were induced via the canonical EPO receptor (EpoR)
- Objective: test whether the new ephrin-type B receptor 4 (EphB4) can better explain the effect of ESA
Study
- n = 174 patients
- Covariates:
  X_1 = ESA treatment indicator (0 or 1); 51% received treatment
  X_2 = EphB4 level of expression (low or high); 39% high expression
  X_3 = EpoR level of expression (low or high)
  X_4 = Cytoreduction (suboptimal or optimal)
  X_5 = Disease stage (I-V)
  X_6 = Age (20-92); average of 58 years
- The proportionality assumption was tested for each covariate; X_1 and X_4 showed non-proportional behaviour
Study
Figure: Kaplan-Meier (KM) survival curves for the ESA treatment indicator (survival probability against time).
Definitions
In survival analysis, let
- $h_T(t)$ be the hazard rate for the treatment group
- $h_C(t)$ be the hazard rate for the control group
We define the hazard ratio as
$$HR(t) = \frac{h_T(t)}{h_C(t)}$$
The three most common behaviours of the HR are:
- $HR(t) = c$: proportional hazards
- $\lim_{t \to \infty} HR(t) = 1$: proportional odds model
- Non-monotone $HR(t)$: if $HR(t) < 1$ for some $t$ and $HR(t) > 1$ for other $t$, the hazards cross
Semiparametric model
Yang and Prentice (2005) proposed a model that accounts for the previous three behaviours of a HR.
Their model is
$$S_C(t) = \{1 + R(t)\}^{-1} \quad\text{and}\quad S_T(t) = \left\{1 + \frac{\lambda}{\theta} R(t)\right\}^{-\theta}$$
This implies that
$$HR(t) = \frac{\lambda\theta}{\lambda + (\theta - \lambda)/\{1 + R(t)\}}$$
where $\lambda, \theta > 0$ and $R(t)$ is monotone nondecreasing.
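The step from the two survival functions to the hazard ratio can be spelled out as follows. This is a short derivation sketch that assumes, for illustration only, that $R(t)$ is differentiable with derivative $R'(t) > 0$; it uses the standard identity $h(t) = -\frac{d}{dt}\log S(t)$.

```latex
\begin{align*}
h_C(t) &= -\frac{d}{dt}\log S_C(t) = \frac{R'(t)}{1 + R(t)}, \\
h_T(t) &= -\frac{d}{dt}\log S_T(t)
        = \theta \cdot \frac{(\lambda/\theta)\, R'(t)}{1 + (\lambda/\theta) R(t)}
        = \frac{\lambda\theta\, R'(t)}{\theta + \lambda R(t)}, \\
HR(t) &= \frac{h_T(t)}{h_C(t)}
       = \frac{\lambda\theta\,\{1 + R(t)\}}{\theta + \lambda R(t)}
       = \frac{\lambda\theta\,\{1 + R(t)\}}{\lambda\{1 + R(t)\} + (\theta - \lambda)}
       = \frac{\lambda\theta}{\lambda + (\theta - \lambda)/\{1 + R(t)\}}.
\end{align*}
```

Setting $R(t) = 0$ gives $HR = \lambda$, and letting $R(t) \to \infty$ gives $HR \to \theta$.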
Semiparametric model
The model's name is justified by
$$\lim_{t \to 0} HR(t) = \lambda \quad\text{and}\quad \lim_{t \to \infty} HR(t) = \theta$$
- If $\lambda = \theta$: proportional hazards
- If $\theta = 1$: proportional odds model
- If $(\lambda - 1)(\theta - 1) < 0$: hazard functions cross
The model is semiparametric because it includes the parameters $(\lambda, \theta)$ and the function
$$R(t) = \frac{1 - S_C(t)}{S_C(t)},$$
which is the odds function of the control group.
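The special cases can be checked numerically. The following is a minimal sketch of the two-sample model, not the author's code; the function names are illustrative, and the model is written in terms of the value of the odds function $R$ so that no particular form of $R(t)$ has to be assumed.

```python
import numpy as np


def survival_control(R):
    """S_C = {1 + R}^(-1), where R is the control-group odds function value."""
    return 1.0 / (1.0 + np.asarray(R, dtype=float))


def survival_treatment(R, lam, theta):
    """S_T = {1 + (lam/theta) R}^(-theta)."""
    return (1.0 + (lam / theta) * np.asarray(R, dtype=float)) ** (-theta)


def hazard_ratio(R, lam, theta):
    """HR = lam*theta / [lam + (theta - lam)/{1 + R}]."""
    R = np.asarray(R, dtype=float)
    return lam * theta / (lam + (theta - lam) / (1.0 + R))


def classify(lam, theta, tol=1e-12):
    """Label the behaviour implied by (lam, theta), following the slide's cases."""
    if abs(lam - theta) < tol:
        return "proportional hazards"
    if abs(theta - 1.0) < tol:
        return "proportional odds"
    if (lam - 1.0) * (theta - 1.0) < 0:
        return "crossing hazards"
    return "no crossing"


if __name__ == "__main__":
    # The three (lam, theta) settings used in the deck's survival-curve figure.
    for lam, theta in [(1.0, 1.0), (2.0, 0.5), (0.5, 3.0)]:
        short_term = hazard_ratio(0.0, lam, theta)   # R = 0 corresponds to t -> 0
        long_term = hazard_ratio(1e12, lam, theta)   # large R corresponds to t -> infinity
        print(lam, theta, classify(lam, theta), float(short_term), float(long_term))
```

Because the hazard ratio depends on $t$ only through $R(t)$, evaluating it at $R = 0$ and at a very large $R$ is enough to see the short-term and long-term limits.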
Semiparametric model
Figure: Survival curves S(t) against t for varying (λ, θ): (1, 1) (solid line), (2, 0.5) (dashed line), and (0.5, 3) (dotted-dashed line).
Extensions
Nieto-Barajas (2013) extended the model to include covariates with
$$S(t \mid x_i) = \left\{1 + \frac{\lambda_i}{\theta_i} R(t)\right\}^{-\theta_i}, \quad \lambda_i = \exp(\delta' x_i) \quad\text{and}\quad \theta_i = \exp(\gamma' x_i)$$
where $\delta$ and $\gamma$ do not have an intercept.
- Here we can have more than two groups
- $HR_i(t) = h(t \mid x_i)/h(t \mid 0)$ has the same interpretation as in the two-group model
- Some individuals can have proportional hazards, others proportional odds, and others crossing hazards
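To illustrate how different individuals can fall under different regimes, here is a small sketch of the covariate version. The coefficient values and covariate rows are hypothetical (they are not the estimates reported later), and the function names are illustrative.

```python
import numpy as np


def short_long_term_ratios(x, delta, gamma):
    """Return (lambda_i, theta_i) = (exp(delta'x_i), exp(gamma'x_i)) for each row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return np.exp(x @ delta), np.exp(x @ gamma)


def survival_given_x(R, x, delta, gamma):
    """S(t | x_i) = {1 + (lambda_i/theta_i) R(t)}^(-theta_i), with R the value of R(t)."""
    lam, theta = short_long_term_ratios(x, delta, gamma)
    return (1.0 + (lam / theta) * R) ** (-theta)


# Hypothetical coefficients for two covariates (no intercept, as in the model).
delta = np.array([0.3, -0.7])
gamma = np.array([1.2, 0.5])

# Three hypothetical individuals; the first is the baseline x = 0.
x = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

lam, theta = short_long_term_ratios(x, delta, gamma)
crossing = (lam - 1.0) * (theta - 1.0) < 0   # hazards cross relative to baseline

if __name__ == "__main__":
    for row, l, th, cr in zip(x, lam, theta, crossing):
        print(row, round(float(l), 3), round(float(th), 3), bool(cr))
```

With no intercept, the baseline individual $x = 0$ has $\lambda = \theta = 1$, so $S(t \mid 0) = \{1 + R(t)\}^{-1}$, which matches the control group of the two-sample model.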
We follow a Bayesian approach for inference on $\delta$, $\gamma$ and $R(t)$.
Concentrate first on $R(t)$:
- $R(t)$ is a monotone nondecreasing function
- $\lim_{t \to 0} R(t) = 0$ and $\lim_{t \to \infty} R(t) = \infty$
- $R(t)$ behaves like a cumulative hazard function (c.h.f.)
There is a vast list of priors for c.h.f.:
- beta and beta-Stacy (NTR) (Hjort, 1990; Walker and Muliere, 1997)
- mixtures of gamma and weighted gamma (Dykstra and Laud, 1981; Lo and Weng, 1989)
- Lévy-driven processes (Nieto-Barajas and Walker, 2004)
We will use the class of Lévy-driven processes.
Lévy-driven prior: a mixture of a kernel $k(t, s)$ with respect to a Lévy process (or IAP) $r(s)$,
$$R(t) = \int_0^\infty k(t, s)\, r(ds)$$
- $k(t, s)$ is nondecreasing and right-continuous as a function of $t$
- $\lim_{t \to 0} k(t, s) = 0$ for all $s \ge 0$
This is a general class. If
- $k(t, s) = h(t, s)\, I(s \le t)\, \beta(s)$ and $r(s)$ is a gamma process: Lo and Weng (1989)
- $h(t, s) = (t - s)$ and $r(s)$ is an extended gamma process: Dykstra and Laud (1981)
- $k(t, s) = \{1 - e^{-a(t - s)}\}\, I(s \le t)$ and $r(s)$ is a finite process: Nieto-Barajas and Walker (2004)
For the kernel we will take
- Location Weibull kernel: $k(t, s) = \{1 - e^{-a(t - s)^b}\}\, I(s \le t)$, $a, b > 0$
- Mean Weibull kernel: $k(t, s) = 1 - e^{-\{\Gamma(1 + b^{-1})\, t/s\}^b}$, $b > 0$
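The two kernels are Weibull distribution functions in $t$: the first is shifted to start at location $s$, the second is scaled so that its mean is $s$. A minimal sketch follows; the function names are illustrative, and the mean Weibull kernel is assumed to be evaluated at $s > 0$.

```python
import math

import numpy as np


def location_weibull_kernel(t, s, a, b):
    """k(t, s) = {1 - exp(-a (t - s)^b)} I(s <= t), with a, b > 0."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(s, dtype=float)
    # Clipping the lag at zero implements the indicator I(s <= t)
    # and avoids taking a fractional power of a negative number.
    lag = np.clip(t - s, 0.0, None)
    return 1.0 - np.exp(-a * lag ** b)


def mean_weibull_kernel(t, s, b):
    """k(t, s) = 1 - exp(-{Gamma(1 + 1/b) t / s}^b), with b > 0 and s > 0."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(s, dtype=float)
    return 1.0 - np.exp(-(math.gamma(1.0 + 1.0 / b) * t / s) ** b)


if __name__ == "__main__":
    grid = np.linspace(0.0, 10.0, 6)
    print(location_weibull_kernel(grid, 2.0, a=1.0, b=2.0))
    print(mean_weibull_kernel(grid, 2.0, b=2.0))
```

Both functions vanish at $t = 0$ and are nondecreasing in $t$, which are the two conditions required of $k(t, s)$.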
Figure: Kernel comparison, R(t) against t. Location exponential (dotted line), location Weibull (solid line), and mean Weibull (dashed line).
For the Lévy intensity $\nu(dv, ds) = \rho(dv \mid s)\,\alpha(ds)$ we will take
- A non-homogeneous version of the generalized gamma (GG) process,
  $$\rho(dv \mid s) = \Gamma(1 - \epsilon)^{-1}\, v^{-(1 + \epsilon)}\, e^{-\beta(s) v}\, dv,$$
  where $\epsilon \in (0, 1) \cup \{-1\}$ and $\beta(s) = c s^d$, with $c > 0$, $d \in \{-1, 0, 1\}$, $s \ge 0$
- Together with $\alpha(s) = \omega P_0(s)$
In notation, $r \sim \text{nh-GG}(\epsilon, \beta(\cdot), \omega; P_0)$.
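For one special case the prior on $R(t)$ is easy to simulate. With $\epsilon = -1$ and $d = 0$ the intensity reduces to $e^{-cv}\,dv\;\omega P_0(ds)$, which has finite total mass $\omega/c$, so $r$ has a Poisson($\omega/c$) number of jumps with exponential(rate $c$) sizes at locations drawn from $P_0$. The sketch below covers only this case and is not the author's sampler; the choice of $P_0$ as an exponential distribution with mean 1 and the use of the location Weibull kernel are assumptions made for illustration.

```python
import numpy as np


def simulate_R(t_grid, omega, c, a, b, rng):
    """Draw one path of R(t) = sum_j v_j k(t, s_j) for epsilon = -1, beta(s) = c.

    Assumptions (illustrative, not from the source): P_0 is Exponential(mean 1)
    and k is the location Weibull kernel with parameters (a, b).
    """
    t_grid = np.asarray(t_grid, dtype=float)
    n_jumps = rng.poisson(omega / c)                      # total mass of the intensity
    locations = rng.exponential(scale=1.0, size=n_jumps)  # s_j ~ P_0 (assumed)
    sizes = rng.exponential(scale=1.0 / c, size=n_jumps)  # v_j with density c exp(-c v)
    # Kernel matrix with entry (i, j) equal to k(t_i, s_j).
    lag = np.clip(t_grid[:, None] - locations[None, :], 0.0, None)
    kernel = 1.0 - np.exp(-a * lag ** b)
    return kernel @ sizes


if __name__ == "__main__":
    rng = np.random.default_rng(2013)
    grid = np.linspace(0.0, 20.0, 101)
    path = simulate_R(grid, omega=10.0, c=1.0, a=1.0, b=2.0, rng=rng)
    # Prior draw of the baseline survival function S(t | 0) = 1 / {1 + R(t)}.
    print(np.round(1.0 / (1.0 + path[::20]), 3))
```

For $\epsilon \in (0, 1)$ the process has infinitely many jumps and a different simulation scheme would be needed.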
Figure: Prior $E\{S(t \mid \lambda, \theta)\}$ against t. Location Weibull kernel with (a, b) = (1, 2) (top row), mean Weibull kernel with b = 2 (bottom row). Decreasing β (first column), constant β (second column), and increasing β (third column). (λ, θ): (1, 1) (solid line), (2, 0.5) (dashed line), and (0.5, 3) (dotted line).
For the vector parameters $\delta$ and $\gamma$ we will take
- $\delta_s \sim N(0, \sigma_\delta^2)$, for $s = 1, \ldots, p$
- $\gamma_s \sim N(0, \sigma_\gamma^2)$, for $s = 1, \ldots, p$
- $\delta_s$ and $\gamma_s$ independent a priori
Posterior characterization: via a Gibbs sampler with the conditionals $[r \mid t, \delta, \gamma]$, $[\delta \mid t, r, \gamma]$ and $[\gamma \mid t, r, \delta]$.
The tricky part is to find $[r \mid t, \delta, \gamma]$. This will be done by computing
$$E\left\{ e^{-\int \phi(s)\, r(ds)} \,\middle|\, A \right\} = \frac{E\left\{ e^{-\int \phi(s)\, r(ds)}\, P(A \mid Z) \right\}}{E\{P(A \mid Z)\}}$$
where $A = (A_1, \ldots, A_n)$ is the data, with $A_i$ either an exact or a right-censored observation.
To find $[r \mid t, \delta, \gamma]$ we require latent variables:
- A pair $(u_i, w_i)$ for each exact observation, and
- A single $u_i$ for each right-censored observation
The $u_i$'s are dependent and the $w_i$'s are independent.
$$[r \mid t, w] = \int [r \mid t, w, u]\,[u \mid t, w]\, du,$$
with $[r \mid t, w, u]$ an updated IAP with fixed jump locations at the $w_i$'s.
To simplify computations on $[\delta \mid t, r, \gamma]$ and $[\gamma \mid t, r, \delta]$ we introduce:
- Another latent variable $v_i$ for each observation
- These $v_i$'s are independent with gamma conditional distributions
Data analysis
- Implemented our semiparametric model with 6 covariates
- The same covariates were used in $\lambda_i$ and $\theta_i$

Table: LPML statistics for different prior settings.

(ω, ε)       (c, d)     Loc. Weib. K.   Mean Weib. K.
(1, 0.5)     (1, -1)    -243.85         -244.43
(1, 0.5)     (1, 0)     -239.39         -240.76
(1, 0.5)     (1, 1)     -240.54         -243.54
(10, 0.5)    (1, -1)    -244.37         -247.96
(10, 0.5)    (1, 0)     -240.97         -241.66
(10, 0.5)    (1, 1)     -241.56         -245.70
(1, -1)      (1, 0)     -239.84         -241.89
Prop. Hazards           -284.48
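The slides do not say how the LPML (log pseudo marginal likelihood) was computed. A common Monte Carlo estimator, shown here as a sketch and not necessarily the one used for the table, takes each conditional predictive ordinate $CPO_i$ as the harmonic mean of the likelihood contributions of observation $i$ across posterior draws and sums the logs. Larger LPML values indicate better predictive fit. The input matrix `log_lik` is a hypothetical array of per-draw, per-observation log-likelihood contributions (log density for exact observations, log survival for right-censored ones).

```python
import numpy as np


def lpml(log_lik):
    """LPML = sum_i log CPO_i, with CPO_i = [ (1/M) sum_m 1 / L_i^(m) ]^(-1).

    log_lik: array of shape (M draws, n observations) holding log L_i^(m).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    neg = -log_lik
    # Numerically stable log-sum-exp over draws of -log L_i^(m).
    shift = neg.max(axis=0)
    log_sum = shift + np.log(np.exp(neg - shift).sum(axis=0))
    log_cpo = np.log(n_draws) - log_sum
    return float(log_cpo.sum())


if __name__ == "__main__":
    # Hypothetical log-likelihood draws: 1000 posterior draws, 5 observations.
    rng = np.random.default_rng(1)
    fake_log_lik = -1.5 + 0.1 * rng.standard_normal((1000, 5))
    print(lpml(fake_log_lik))
```

When comparing prior settings or models on the same data, the same set of observations must be used for every LPML so that the values are comparable.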
Data analysis
Figure: Posterior estimates of $R(t)$ (left) and baseline hazard $h(t \mid 0)$ (right), plotted against t. Using mean Weibull kernel (solid) and using location Weibull kernel (dashed).
Data analysis
Table: Posterior estimates of covariate coefficients δ and γ. Location Weibull kernel.

Parameter   Mean      95% CI               P(< 0)
δ_1          0.337    (-0.160, 0.883)      0.066
δ_2          1.442    (0.885, 2.060)       0.000
δ_3          0.213    (-0.401, 0.944)      0.282
δ_4         -0.683    (-1.219, -0.136)     0.990
δ_5          0.195    (-0.135, 0.494)      0.114
δ_6          1.646    (-0.617, 3.719)      0.070
γ_1          1.261    (0.558, 1.685)       0.000
γ_2          0.949    (0.250, 1.591)       0.010
γ_3         -0.356    (-0.894, 0.049)      0.955
γ_4          0.500    (-0.242, 0.952)      0.080
γ_5          0.032    (-0.279, 0.338)      0.421
γ_6          2.453    (-0.834, 5.750)      0.091
Data analysis
Figure: Posterior predictive hazard rates h(t) against t for hypothetical individuals x_1 = (0, 0, 0, 1, 3, 92), x_2 = (0, 1, 0, 1, 3, 20), x_3 = (1, 0, 0, 1, 3, 92), and x_4 = (1, 1, 0, 1, 3, 20).
References
- Yang, S. and Prentice, R. (2005). Semiparametric analysis of short-term and long-term hazard ratios with two sample survival data. Biometrika 92, 1-17.
- Nieto-Barajas, L.E. (2013). Bayesian semiparametric analysis of short- and long-term hazard ratios with covariates. Computational Statistics and Data Analysis. In press. DOI: 10.1016/j.csda.2013.03.012