Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

Steffen Unkel
Department of Medical Statistics
University Medical Center Göttingen, Germany
Winter term 2018/19

Right-censored time-to-event data with covariates

Suppose the following data are available:

$$D_n^t = \{(t_i, \delta_i, x_i),\ i = 1, \dots, n\},$$

where
$t_i$: observed survival time for the $i$th individual,
$\delta_i$: censoring indicator,
$x_i = (x_{i1}, \dots, x_{ip})^\top$: vector of covariates.

Non-informative random censoring with $t_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$, where $T_i$ is the survival time, $C_i$ is the censoring time and $I(\cdot)$ denotes the indicator function.

The covariates are assumed not to depend on time $t$.
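
The construction of $(t_i, \delta_i, x_i)$ from latent survival and censoring times can be sketched as follows. This is a minimal illustration, not part of the original slides: the exponential distributions, the rates, the binary covariate and the function name `simulate_right_censored` are all assumptions, and the survival times are generated independently of the covariate.

```python
import random

def simulate_right_censored(n, event_rate=0.1, censor_rate=0.05, seed=1):
    """Return a list of (t_i, delta_i, x_i) with t_i = min(T_i, C_i)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.choice([0, 1])              # hypothetical binary covariate
        T = rng.expovariate(event_rate)     # latent survival time T_i (assumed exponential)
        C = rng.expovariate(censor_rate)    # latent censoring time C_i (assumed exponential)
        t = min(T, C)                       # observed time
        delta = 1 if T <= C else 0          # 1 = event observed, 0 = censored
        data.append((t, delta, x))
    return data

sample = simulate_right_censored(5)
```

Each tuple in `sample` plays the role of one element of $D_n^t$.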

Regression models for time-to-event data

How can we study the effect of a number of covariates on the survival experience in a manner similar to other regression models?

There may be settings in which the distribution of survival time has a known parametric form.

A fully parametric regression model accomplishes two goals simultaneously:
1. it describes the basic underlying distribution of survival time (error component), and
2. it characterizes how the distribution changes as a function of the covariates (systematic component).

Proportional hazards assumption

Suppose that patients are randomised to receive either a standard treatment or a new treatment.

Let $h_S(t)$ ($h_N(t)$) be the hazard of death at time $t$ for patients on the standard treatment (new treatment).

Proportional hazards assumption:

$$h_N(t) = \psi\, h_S(t),$$

where $\psi$ is a constant, known as the hazard ratio.

If $\psi < 1$ ($\psi > 1$), the hazard of death at $t$ is smaller (greater) for an individual on the new treatment, relative to an individual on the standard treatment.

General proportional hazards model

Let $h_0(t)$ be the hazard function for an individual for whom $x = 0$, known as the baseline hazard function.

The hazard function for the $i$th individual can then be written as

$$h_i(t) = \exp(x_i^\top \theta)\, h_0(t),$$

where $\theta = (\theta_1, \dots, \theta_p)^\top$ is the vector of coefficients of the explanatory variables $x_1, \dots, x_p$.

Linear model for the logarithm of the hazard ratio:

$$\ln\left(\frac{h_i(t)}{h_0(t)}\right) = x_i^\top \theta.$$
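
As a small worked example of the log-linear form, the hazard ratio between two individuals depends only on the difference of their covariate vectors, not on $t$ or $h_0(t)$. The coefficient values and the covariate meanings (treatment, age) below are hypothetical.

```python
import math

def hazard_ratio(x_i, x_j, theta):
    """h_i(t) / h_j(t) = exp((x_i - x_j)' theta); the baseline hazard cancels."""
    return math.exp(sum((a - b) * th for a, b, th in zip(x_i, x_j, theta)))

theta = [math.log(0.5), 0.02]                 # hypothetical coefficients: treatment, age
hr = hazard_ratio([1, 60], [0, 60], theta)    # treated vs. untreated at the same age
```

With these assumed values the treated individual has half the hazard of the untreated one at every time point.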

Parametric proportional hazards models

In semiparametric proportional hazards models, the form of $h_0(t)$ is unspecified.

In parametric models a specific probability distribution is assumed for the survival times, and this imposes a particular parametric form on $h_0(t)$.

However, relatively few probability distributions can be used with parametric proportional hazards models.

Moreover, the distributions that are available, such as the Weibull and Gompertz distributions, lead to hazard functions that increase or decrease monotonically.

Parametric regression structure

The distribution of $T$ as a function of covariates $x$ is characterized via the equation

$$T = \exp(x^\top \beta)\exp(\sigma\epsilon),$$

where $\beta = (\beta_0, \beta_1, \dots, \beta_p)^\top$ is the vector of regression coefficients, $\epsilon$ is the error component and $\sigma$ is a scale parameter for the distribution of $\epsilon$.

Log-linear form of the model:

$$\ln(T) = x^\top \beta + \sigma\epsilon.$$

Survival time models that can be linearized by taking logs are called accelerated failure time (AFT) models.

Accelerated failure time assumption

Let $S_S(t)$ and $S_{NS}(t)$ denote the survival functions of smokers and non-smokers, respectively.

AFT assumption:

$$S_{NS}(t) = S_S(\gamma t),$$

where $\gamma > 0$ is a constant called the acceleration factor.

The AFT assumption can also be expressed as

$$\gamma T_{NS} = T_S,$$

where $T_{NS}$ is a random variable representing the survival time for non-smokers and $T_S$ is the analogous one for smokers.

Acceleration factor

The acceleration factor is a ratio of time-quantiles corresponding to any fixed value of $S(t)$.

[Figure: survival curves $S(t)$ for Group 1 ($G = 1$) and Group 2 ($G = 2$) with $\gamma = 2$. At any fixed value of $S(t)$, the horizontal distance from the $S(t)$ axis to the curve for $G = 2$ is twice the distance to the curve for $G = 1$.]

Figure: Acceleration factor $\gamma$ as ratio of time-quantiles corresponding to any fixed value of $S(t)$. For $\gamma > 1$ ($\gamma < 1$): exposure benefits (is harmful to) survival for Group $G = 2$.
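
The quantile-ratio reading of $\gamma$ can be checked numerically. The sketch below assumes a Weibull survival curve for Group 1 with hypothetical shape and scale values, defines Group 2 through $S_2(\gamma t) = S_1(t)$, and recovers the time-quantiles by bisection; none of these choices come from the slides.

```python
import math

GAMMA = 2.0                  # acceleration factor, as in the figure
SHAPE, SCALE = 1.5, 10.0     # hypothetical Weibull parameters for Group 1

def surv_g1(t):
    return math.exp(-((t / SCALE) ** SHAPE))

def surv_g2(t):
    # AFT relation between the groups: S_2(gamma * t) = S_1(t)
    return surv_g1(t / GAMMA)

def quantile(surv, s, lo=0.0, hi=1e6):
    """Solve surv(t) = s by bisection (surv is decreasing in t)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if surv(mid) > s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Ratio of time-quantiles at S(t) = 0.25, 0.50 and 0.75
ratios = [quantile(surv_g2, s) / quantile(surv_g1, s) for s in (0.25, 0.50, 0.75)]
```

All three entries of `ratios` agree with `GAMMA` up to numerical error, which is the sense in which the ratio is constant for every fixed value of $S(t)$.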

Acceleration factor in a regression framework

The acceleration factor allows us to evaluate the effect of predictor variables on the survival time.

AFT model:

$$Y = \ln(T) = x^\top \beta + \sigma\epsilon \quad\Longleftrightarrow\quad T = \exp(x^\top \beta)\,\underbrace{\exp(\sigma\epsilon)}_{T_0},$$

where $T_0$ denotes the baseline survival time.

Often, the baseline survival time is defined as $T_0 = \exp(\beta_0 + \sigma\epsilon)$.

Let $S_{T_0}$ denote the baseline survival function; then it holds that

$$S_T(t) = S_{T_0}\bigl(t \exp(-x^\top \beta)\bigr).$$
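
The relation between $S_T$ and the baseline survival function can be illustrated with a short sketch. The unit exponential baseline and the coefficient value are assumptions chosen only to make the arithmetic transparent.

```python
import math

def aft_survival(t, x, beta, baseline_surv):
    """S_T(t) = S_T0(t * exp(-x'beta)) for a given baseline survival function."""
    lin = sum(xj * bj for xj, bj in zip(x, beta))
    return baseline_surv(t * math.exp(-lin))

def baseline_surv(t):
    return math.exp(-t)          # hypothetical baseline: unit exponential

beta = [math.log(2.0)]           # hypothetical coefficient: exp(beta) = 2
s_exposed = aft_survival(4.0, [1], beta, baseline_surv)
s_baseline = baseline_surv(2.0)
```

Here `s_exposed` equals `s_baseline`: with $\exp(x^\top\beta) = 2$, the exposed individual reaches at $t = 4$ the survival probability that the baseline individual reaches at $t = 2$.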

Genesis of AFT models

Various choices for the distribution of $\epsilon$ can be made:

Distribution of ε                           Distribution of T
------------------------------------------  -----------------
Standard Gumbel (minimum) with σ = 1        Exponential
Standard Gumbel (minimum) with σ ≠ 1        Weibull
Standard logistic                           Log-logistic
Standard normal                             Log-normal

Note that the Gumbel distribution is also referred to as the extreme value type I distribution.

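Exponential regression model

AFT model with $\sigma = 1$:

$$Y = \ln(T) = x^\top \beta + \epsilon,$$

where $\epsilon$ follows the standard Gumbel (minimum) distribution, denoted as $G(0, 1)$, with density

$$f_\epsilon(\epsilon) = \exp(\epsilon - \exp(\epsilon)) \quad \text{for } \epsilon \in \mathbb{R}.$$

Density of survival time $T$:

$$f_T(t) = \exp(-x^\top \beta)\exp\bigl(-t \exp(-x^\top \beta)\bigr).$$

Set $\lambda := \exp(-x^\top \beta)$, then

$$f_T(t) = \lambda \exp(-\lambda t) \quad\Rightarrow\quad T \sim E(\lambda).$$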
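
The step from the Gumbel density to the exponential density is a change of variables, $f_T(t) = f_\epsilon(\ln t - x^\top\beta)/t$. The following sketch checks this identity numerically at a few time points; the value of the linear predictor is hypothetical.

```python
import math

def gumbel_min_pdf(e):
    """Density of the standard Gumbel (minimum) distribution G(0, 1)."""
    return math.exp(e - math.exp(e))

def density_via_transformation(t, lin_pred):
    """f_T(t) = f_eps(ln t - x'beta) / t, obtained from ln(T) = x'beta + eps."""
    return gumbel_min_pdf(math.log(t) - lin_pred) / t

def exponential_pdf(t, lam):
    return lam * math.exp(-lam * t)

lin_pred = 1.2                       # hypothetical value of x'beta
lam = math.exp(-lin_pred)            # lambda := exp(-x'beta)
max_gap = max(abs(density_via_transformation(t, lin_pred) - exponential_pdf(t, lam))
              for t in (0.1, 1.0, 5.0, 20.0))
```

`max_gap` is zero up to floating-point error, so the two expressions for the density coincide.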

Weibull regression model

AFT model with $\sigma \neq 1$:

$$\ln(T) = x^\top \beta + \sigma\epsilon,$$

where $\epsilon$ follows the standard Gumbel (minimum) distribution.

$T \sim WB(\alpha, \lambda)$.

The Weibull regression model is an AFT model that has proportional hazards.

The correspondence between the AFT representation and the proportional hazards representation is such that

$$\lambda = \exp\left(-\frac{x^\top \beta}{\sigma}\right), \quad \alpha = \frac{1}{\sigma}, \quad \theta_j = -\frac{\beta_j}{\sigma} \quad (j = 1, \dots, p).$$
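
The correspondence can be sketched in code. The block assumes the Weibull parametrisation $S(t) = \exp(-\lambda t^\alpha)$, so that $h(t) = \lambda\alpha t^{\alpha-1}$; this parametrisation, the function names and all numerical values are assumptions made for illustration.

```python
import math

def aft_to_ph(beta0, beta, sigma):
    """Map Weibull AFT parameters to PH quantities.

    Assumes S(t) = exp(-lambda * t**alpha), i.e. h(t) = lambda * alpha * t**(alpha - 1).
    """
    alpha = 1.0 / sigma
    lambda0 = math.exp(-beta0 / sigma)        # lambda when all covariates are zero
    theta = [-b / sigma for b in beta]        # log hazard ratios
    return alpha, lambda0, theta

def weibull_hazard_aft(t, x, beta0, beta, sigma):
    """Hazard implied by the AFT parameters under the assumed parametrisation."""
    lin = beta0 + sum(xj * bj for xj, bj in zip(x, beta))
    lam = math.exp(-lin / sigma)
    alpha = 1.0 / sigma
    return lam * alpha * t ** (alpha - 1.0)

beta0, beta, sigma = 3.0, [0.4], 0.5          # hypothetical AFT estimates
alpha, lambda0, theta = aft_to_ph(beta0, beta, sigma)
hazard_ratios = [weibull_hazard_aft(t, [1], beta0, beta, sigma)
                 / weibull_hazard_aft(t, [0], beta0, beta, sigma)
                 for t in (0.5, 2.0, 10.0)]
```

Every entry of `hazard_ratios` equals `math.exp(theta[0])`, so a positive AFT coefficient (longer survival) corresponds to a hazard ratio below one.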

Log-logistic regression model

AFT model:

$$\ln(T) = x^\top \beta + \sigma\epsilon,$$

where $\epsilon$ follows the standard logistic distribution with density

$$f_\epsilon(\epsilon) = \frac{\exp(\epsilon)}{(1 + \exp(\epsilon))^2} \quad \text{for } \epsilon \in \mathbb{R}.$$

$T$ has a log-logistic distribution with parameters $\alpha$ and $\gamma$.

In the log-logistic model, the regression coefficients can be expressed in such a way that they can be interpreted as odds ratios.

The log-logistic regression model is an AFT model that has proportional odds.
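
The proportional odds property can be illustrated numerically. With standard logistic errors, $S_\epsilon(\epsilon) = 1/(1 + \exp(\epsilon))$, so the survival odds at time $t$ are $\exp(-(\ln t - x^\top\beta)/\sigma)$ and the odds ratio between two covariate values does not depend on $t$. The parameter values below are hypothetical.

```python
import math

def loglogistic_survival(t, lin_pred, sigma):
    """S_T(t) = 1 / (1 + exp(eps)) with eps = (ln t - x'beta) / sigma."""
    eps = (math.log(t) - lin_pred) / sigma
    return 1.0 / (1.0 + math.exp(eps))

def survival_odds(t, lin_pred, sigma):
    s = loglogistic_survival(t, lin_pred, sigma)
    return s / (1.0 - s)

sigma, beta0, beta1 = 0.7, 2.0, 0.5      # hypothetical parameter values
odds_ratios = [survival_odds(t, beta0 + beta1, sigma) / survival_odds(t, beta0, sigma)
               for t in (1.0, 5.0, 25.0)]
```

Each entry of `odds_ratios` equals $\exp(\beta_1/\sigma)$, which is the sense in which a rescaled coefficient can be read as a log odds ratio.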

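Log-normal regression model

AFT model:

$$\ln(T) = x^\top \beta + \sigma\epsilon, \quad \epsilon \sim N(0, 1).$$

$Y = \ln(T) \sim N(x^\top \beta, \sigma^2)$ with

$$h_Y(y) = \frac{\dfrac{1}{\sigma}\,\varphi\!\left(\dfrac{y - x^\top \beta}{\sigma}\right)}{1 - \Phi\!\left(\dfrac{y - x^\top \beta}{\sigma}\right)},$$

where $\varphi(\cdot)$ and $\Phi(\cdot)$ denote the pdf and cdf of the standard normal distribution, respectively.

$T \sim LN(x^\top \beta, \sigma^2)$ with

$$h_T(t) = \frac{1}{t}\, h_Y(\ln(t)).$$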
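
The two hazard formulas translate directly into code. The sketch below uses `statistics.NormalDist` from the Python standard library for $\varphi$ and $\Phi$; the linear predictor and scale are hypothetical values.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()    # standard normal: pdf = phi, cdf = Phi

def hazard_log_time(y, lin_pred, sigma):
    """h_Y(y) = (1/sigma) * phi(z) / (1 - Phi(z)) with z = (y - x'beta) / sigma."""
    z = (y - lin_pred) / sigma
    return STD_NORMAL.pdf(z) / (sigma * (1.0 - STD_NORMAL.cdf(z)))

def hazard_time(t, lin_pred, sigma):
    """h_T(t) = h_Y(ln t) / t."""
    return hazard_log_time(math.log(t), lin_pred, sigma) / t

lin_pred, sigma = 0.0, 1.0   # hypothetical values of x'beta and sigma
hazards = [hazard_time(t, lin_pred, sigma) for t in (0.1, 1.0, 10.0)]
```

For these values the hazard is larger at $t = 1$ than at $t = 0.1$ or $t = 10$, so the log-normal hazard is not monotone in $t$.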

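Log-Likelihood

Likelihood:

$$L(\theta \mid D_n^t) = \prod_{i=1}^{n} [f_i(t_i \mid \theta)]^{\delta_i}\,[S_i(t_i \mid \theta)]^{1-\delta_i} = \prod_{i=1}^{n} [h_i(t_i \mid \theta)]^{\delta_i}\, S_i(t_i \mid \theta),$$

where $\theta = (\beta, \sigma)$ is the vector of unknown parameters.

Log-likelihood:

$$l(\theta \mid D_n^t) = \sum_{i=1}^{n} \bigl[\delta_i \ln(f_i(t_i \mid \theta)) + (1 - \delta_i)\ln(S_i(t_i \mid \theta))\bigr] = \sum_{i=1}^{n} \bigl[\delta_i \ln(h_i(t_i \mid \theta)) + \ln(S_i(t_i \mid \theta))\bigr].$$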

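Log-Likelihood (2)

Let $\epsilon_i(t_i) := (\ln(t_i) - x_i^\top \beta)/\sigma$. Then

$$S_i(t_i) = S_\epsilon(\epsilon_i(t_i)), \quad f_i(t_i) = \frac{1}{\sigma t_i} f_\epsilon(\epsilon_i(t_i)), \quad h_i(t_i) = \frac{1}{\sigma t_i} h_\epsilon(\epsilon_i(t_i)).$$

The log-likelihood can then be written as

$$l(\beta, \sigma \mid D_n^t) = \sum_{i=1}^{n} \bigl[-\delta_i \ln(\sigma t_i) + \delta_i \ln(f_\epsilon(\epsilon_i(t_i))) + (1 - \delta_i)\ln(S_\epsilon(\epsilon_i(t_i)))\bigr]$$
$$= c_1 + \sum_{i=1}^{n} \bigl[-\delta_i \ln(\sigma) + \delta_i \ln(f_\epsilon(\epsilon_i(t_i))) + (1 - \delta_i)\ln(S_\epsilon(\epsilon_i(t_i)))\bigr],$$

or alternatively

$$l(\beta, \sigma \mid D_n^t) = \sum_{i=1}^{n} \bigl[-\delta_i \ln(\sigma t_i) + \delta_i \ln(h_\epsilon(\epsilon_i(t_i))) + \ln(S_\epsilon(\epsilon_i(t_i)))\bigr]$$
$$= c_1 + \sum_{i=1}^{n} \bigl[-\delta_i \ln(\sigma) + \delta_i \ln(h_\epsilon(\epsilon_i(t_i))) + \ln(S_\epsilon(\epsilon_i(t_i)))\bigr],$$

where $c_1 = -\sum_{i=1}^{n} \delta_i \ln(t_i)$ does not depend on $(\beta, \sigma)$.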

Log-Likelihood of the transformed data

For the transformations $y_i = \min\{\ln(T_i), \ln(C_i)\}$ we denote the data as $D_n^y = \{(y_i, \delta_i, x_i),\ i = 1, \dots, n\}$.

With $\theta = (\beta, \sigma)$ and $\epsilon_i(y_i) = (y_i - x_i^\top \beta)/\sigma$ it holds that

$$S_i(y_i) = S_\epsilon(\epsilon_i(y_i)), \quad f_i(y_i) = \frac{1}{\sigma} f_\epsilon(\epsilon_i(y_i)), \quad h_i(y_i) = \frac{1}{\sigma} h_\epsilon(\epsilon_i(y_i)).$$

Log-likelihood for $D_n^y$:

$$l(\beta, \sigma \mid D_n^y) = \sum_{i=1}^{n} \bigl[-\delta_i \ln(\sigma) + \delta_i \ln(f_\epsilon(\epsilon_i(y_i))) + (1 - \delta_i)\ln(S_\epsilon(\epsilon_i(y_i)))\bigr]$$
$$= \sum_{i=1}^{n} \bigl[-\delta_i \ln(\sigma) + \delta_i \ln(h_\epsilon(\epsilon_i(y_i))) + \ln(S_\epsilon(\epsilon_i(y_i)))\bigr].$$

It follows that $l(\beta, \sigma \mid D_n^y) + c_2 = l(\beta, \sigma \mid D_n^t)$.
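
The two log-likelihoods can be compared on a toy data set. The sketch assumes standard Gumbel (minimum) errors, for which $\ln f_\epsilon(\epsilon) = \epsilon - \exp(\epsilon)$ and $\ln S_\epsilon(\epsilon) = -\exp(\epsilon)$; the data, parameter values and function names are hypothetical, and the constant is taken to be $-\sum_i \delta_i \ln(t_i)$.

```python
import math

def log_f_eps(e):
    return e - math.exp(e)        # ln f_eps for the standard Gumbel (minimum)

def log_s_eps(e):
    return -math.exp(e)           # ln S_eps for the standard Gumbel (minimum)

def loglik(data, beta, sigma, time_scale):
    """Censored-data log-likelihood for data = [(t_i, delta_i, x_i), ...].

    time_scale=True gives l(. | D^t); time_scale=False gives l(. | D^y), y_i = ln(t_i).
    """
    total = 0.0
    for t, delta, x in data:
        eps = (math.log(t) - sum(xj * bj for xj, bj in zip(x, beta))) / sigma
        jacobian = math.log(sigma * t) if time_scale else math.log(sigma)
        total += -delta * jacobian + delta * log_f_eps(eps) + (1 - delta) * log_s_eps(eps)
    return total

# Hypothetical toy data: x_i = (1, covariate), so beta[0] is the intercept
toy = [(2.0, 1, (1, 0)), (5.5, 0, (1, 1)), (1.2, 1, (1, 1)), (7.0, 1, (1, 0))]
beta, sigma = (1.0, 0.3), 0.8
const = -sum(delta * math.log(t) for t, delta, _ in toy)
gap = loglik(toy, beta, sigma, True) - (loglik(toy, beta, sigma, False) + const)
```

`gap` is zero up to rounding for any choice of `beta` and `sigma`, so both versions of the log-likelihood are maximised by the same parameter values.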

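Score function

The first derivatives of the log-likelihood with respect to the unknown parameters are

$$s_\beta(\beta, \sigma) = \frac{\partial l(\beta, \sigma)}{\partial \beta} = -\frac{1}{\sigma}\sum_{i=1}^{n} a_i x_i,$$

$$s_\sigma(\beta, \sigma) = \frac{\partial l(\beta, \sigma)}{\partial \sigma} = -\frac{1}{\sigma}\sum_{i=1}^{n} (\delta_i + \epsilon_i a_i)$$

with

$$a_i = \delta_i\,\frac{d \ln h_\epsilon(\epsilon_i(y_i))}{d\epsilon} - h_\epsilon(\epsilon_i(y_i)).$$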
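
As a sanity check, the analytic score can be compared with a finite-difference derivative of the log-likelihood. The sketch assumes standard Gumbel (minimum) errors, where $h_\epsilon(\epsilon) = \exp(\epsilon)$ and hence $a_i = \delta_i - \exp(\epsilon_i)$; the toy data on the log-time scale and the parameter values are hypothetical.

```python
import math

def residuals(data, beta, sigma):
    """eps_i = (y_i - x_i'beta) / sigma for data = [(y_i, delta_i, x_i), ...]."""
    return [(y - sum(xj * bj for xj, bj in zip(x, beta))) / sigma for y, _, x in data]

def loglik(data, beta, sigma):
    """Log-time-scale log-likelihood with ln h_eps(e) = e and ln S_eps(e) = -exp(e)."""
    return sum(-d * math.log(sigma) + d * e - math.exp(e)
               for (_, d, _), e in zip(data, residuals(data, beta, sigma)))

def score(data, beta, sigma):
    eps = residuals(data, beta, sigma)
    a = [d - math.exp(e) for (_, d, _), e in zip(data, eps)]     # a_i for Gumbel errors
    s_beta = [-sum(ai * x[j] for ai, (_, _, x) in zip(a, data)) / sigma
              for j in range(len(beta))]
    s_sigma = -sum(d + e * ai for (_, d, _), e, ai in zip(data, eps, a)) / sigma
    return s_beta, s_sigma

toy = [(0.7, 1, (1, 0)), (1.7, 0, (1, 1)), (0.2, 1, (1, 1)), (1.9, 1, (1, 0))]
beta, sigma, step = [1.0, 0.3], 0.8, 1e-6
s_beta, s_sigma = score(toy, beta, sigma)

# Central finite differences of the log-likelihood
num_sigma = (loglik(toy, beta, sigma + step) - loglik(toy, beta, sigma - step)) / (2 * step)
num_beta0 = (loglik(toy, [beta[0] + step, beta[1]], sigma)
             - loglik(toy, [beta[0] - step, beta[1]], sigma)) / (2 * step)
```

The analytic and numerical derivatives agree closely, which supports the signs in the score formulas.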

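Hessian matrix

The matrix of second derivatives has entries

$$\frac{\partial^2 l(\beta, \sigma)}{\partial \beta\, \partial \beta^\top} = \frac{1}{\sigma^2}\sum_{i=1}^{n} b_i x_i x_i^\top,$$

$$\frac{\partial^2 l(\beta, \sigma)}{\partial \beta\, \partial \sigma} = \frac{1}{\sigma^2}\sum_{i=1}^{n} \bigl[a_i + \epsilon_i(y_i)\, b_i\bigr] x_i,$$

$$\frac{\partial^2 l(\beta, \sigma)}{\partial \sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} \bigl[\delta_i + 2\epsilon_i(y_i)\, a_i + (\epsilon_i(y_i))^2\, b_i\bigr]$$

with

$$b_i = \frac{d a_i}{d\epsilon} = \delta_i\,\frac{d^2 \ln h_\epsilon(\epsilon_i(y_i))}{d\epsilon^2} - \frac{d h_\epsilon(\epsilon_i(y_i))}{d\epsilon}.$$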

Confidence intervals

The inverse of the observed Fisher information matrix provides estimators of the variances and covariances:

$$\widehat{\operatorname{Cov}}(\hat\beta) = I(\hat\beta)^{-1}.$$

Typically, software packages provide estimates of the standard errors of each of the model coefficients, which are the square roots of the elements on the main diagonal of $\widehat{\operatorname{Cov}}(\hat\beta)$.

The endpoints of a $100(1 - \alpha)\%$ confidence interval for the $j$th coefficient are

$$\hat\beta_j \pm z_{1-\alpha/2}\ \text{s.e.}(\hat\beta_j),$$

where $\text{s.e.}(\hat\beta_j)$ denotes the standard error of the estimator of the coefficient $j$.
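
The interval formula amounts to a few lines of code. The estimate and standard error below are hypothetical numbers, and the normal quantile $z_{1-\alpha/2}$ is taken from `statistics.NormalDist` in the Python standard library.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Endpoints estimate -/+ z_{1 - alpha/2} * s.e. with alpha = 1 - level."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return estimate - z * std_error, estimate + z * std_error

lower, upper = wald_interval(0.42, 0.15)    # hypothetical estimate and standard error
```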

Testing of linear hypotheses

Testing a linear relationship among $x_1, \dots, x_p$ is equivalent to testing the null hypothesis that there is a linear relationship among $\beta_1, \dots, \beta_p$.

The null hypothesis can be written in general as

$$H_0: C\beta = d,$$

where $C$ is a matrix of constants for the linear hypothesis and $d$ is a known column vector of constants.

Likelihood ratio test

The likelihood ratio test statistic is defined as

$$Q = 2\bigl[\ln L(\hat\beta) - \ln L(\tilde\beta)\bigr],$$

where

$$\hat\beta = \arg\max_{\beta}\ l(\beta), \qquad \tilde\beta = \arg\max_{C\beta = d}\ l(\beta)$$

are the ML estimators obtained without and with the restrictions under $H_0$ imposed on the parameters, respectively.

Under $H_0$ it holds that $Q \overset{a}{\sim} \chi^2(\operatorname{rank}(C))$.
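
A small example makes the statistic concrete. The sketch assumes an exponential regression model with a single binary covariate and tests $H_0: \beta_1 = 0$; in that special case the ML estimate of each group rate is events divided by total follow-up time, so both maximised log-likelihoods have closed forms. The toy data are hypothetical, and the $\chi^2(1)$ tail probability is computed as $\operatorname{erfc}(\sqrt{Q/2})$.

```python
import math

def exp_loglik(events, total_time, rate):
    """Exponential log-likelihood: sum over i of delta_i * ln(rate) - rate * t_i."""
    return events * math.log(rate) - rate * total_time

def lrt_two_groups(data):
    """data = [(t_i, delta_i, g_i), ...] with g_i in {0, 1}; needs events in both groups."""
    d = [0, 0]
    tt = [0.0, 0.0]
    for t, delta, g in data:
        d[g] += delta
        tt[g] += t
    # Unrestricted fit: one rate per group. Restricted fit: one common rate.
    loglik_full = sum(exp_loglik(d[g], tt[g], d[g] / tt[g]) for g in (0, 1))
    loglik_restricted = exp_loglik(sum(d), sum(tt), sum(d) / sum(tt))
    q = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    p_value = math.erfc(math.sqrt(q / 2.0))     # upper tail of chi-square(1)
    return q, p_value

toy = [(2.0, 1, 0), (3.0, 1, 0), (4.0, 0, 0), (1.0, 1, 0),
       (6.0, 1, 1), (9.0, 0, 1), (8.0, 1, 1), (12.0, 0, 1)]
q, p_value = lrt_two_groups(toy)
```

Since one restriction is tested, `q` is compared with a $\chi^2$ distribution with one degree of freedom.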

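Wald test

The Wald statistic is defined as

$$W = (C\hat\beta - d)^\top \bigl[C\,\widehat{\operatorname{Cov}}(\hat\beta)\,C^\top\bigr]^{-1} (C\hat\beta - d),$$

where $C\,\widehat{\operatorname{Cov}}(\hat\beta)\,C^\top$ is the covariance matrix of $C\hat\beta$.

The Wald statistic only needs the ML estimator for the unrestricted model.

Under $H_0$ it holds that $W \overset{a}{\sim} \chi^2(\operatorname{rank}(C))$.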
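
For a single linear restriction the matrix inverse reduces to a division. The sketch below handles only this case, where $C$ has one row; the estimates and covariance matrix are hypothetical numbers, not output from a fitted model.

```python
import math

def wald_statistic(beta_hat, cov, c_row, d):
    """Wald statistic for one linear restriction c'beta = d (rank(C) = 1)."""
    diff = sum(c * b for c, b in zip(c_row, beta_hat)) - d
    k = len(c_row)
    var = sum(c_row[i] * cov[i][j] * c_row[j] for i in range(k) for j in range(k))
    w = diff ** 2 / var
    p_value = math.erfc(math.sqrt(w / 2.0))     # upper tail of chi-square(1)
    return w, p_value

beta_hat = [2.1, 0.6, -0.3]                     # hypothetical (beta_0, beta_1, beta_2)
cov = [[0.04, 0.00, 0.00],
       [0.00, 0.09, 0.01],
       [0.00, 0.01, 0.16]]                      # hypothetical covariance estimate
w, p_value = wald_statistic(beta_hat, cov, [0, 1, -1], 0.0)   # H0: beta_1 - beta_2 = 0
```

Only the unrestricted estimates and their covariance matrix enter the computation.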

Score test

The score statistic is defined as

$$S = s(\tilde\beta)^\top I(\tilde\beta)^{-1} s(\tilde\beta),$$

where $s(\tilde\beta)$ is the score vector evaluated at $\tilde\beta$.

The score statistic only needs the ML estimator for the restricted model.

Under $H_0$ it holds that $S \overset{a}{\sim} \chi^2(\operatorname{rank}(C))$.
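
A minimal sketch of the score statistic, under strong simplifying assumptions: an intercept-only exponential model $\ln(T) = \beta_0 + \epsilon$ and the null hypothesis $H_0: \beta_0 = b_0$, so the restriction fixes the only parameter and no restricted fit is needed. In this case $l(\beta_0) = -\beta_0 \sum_i \delta_i - \exp(-\beta_0)\sum_i t_i$, the score is $-\sum_i \delta_i + \exp(-\beta_0)\sum_i t_i$ and the observed information is $\exp(-\beta_0)\sum_i t_i$. The toy data and the null value are hypothetical.

```python
import math

def score_test_intercept(data, beta0_null):
    """Score test of H0: beta_0 = beta0_null for data = [(t_i, delta_i), ...]."""
    events = sum(delta for _, delta in data)
    total_time = sum(t for t, _ in data)
    score = -events + math.exp(-beta0_null) * total_time     # s(beta_0) under H0
    information = math.exp(-beta0_null) * total_time         # observed information under H0
    stat = score ** 2 / information
    p_value = math.erfc(math.sqrt(stat / 2.0))               # upper tail of chi-square(1)
    return stat, p_value

toy = [(2.0, 1), (3.0, 1), (4.0, 0), (1.0, 1), (6.0, 1)]
stat, p_value = score_test_intercept(toy, beta0_null=math.log(2.0))
```

Everything is evaluated at the null value, which mirrors the remark that only the restricted model has to be fitted.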