STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

Explanatory Variables for Failure Times

Explanatory variables usually explain or predict why some units fail quickly and others survive a long time. Typical examples:
- Continuous variables such as stress, temperature, voltage, and pressure.
- Discrete variables such as the number of hardening treatments or the number of simultaneous users of a system.
- Categorical variables such as manufacturer, design, and location.

A regression model relates the failure-time distribution to the explanatory variables $x = (x_1, x_2, \ldots, x_k)$:

$$\Pr(T \le t) = F(t) = F(t; x).$$

Failure-Time Regression Analysis: Parameters as Functions of Explanatory Variables

The material in this chapter extends statistical regression analysis for normally distributed data whose mean is $\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, where the $x_i$ are explanatory variables. The ideas presented here are more general:
- The data are not necessarily from a normal distribution.
- The data may be censored.
- Nonstandard regression models may relate life to the explanatory variables.

Failure-Time Regression Analysis: Scale-Accelerated Failure-Time Model

The scale-accelerated failure-time (SAFT) model is commonly used to describe the effect that explanatory variables $x$ have on time. The model has a simple time-scaling acceleration factor that is a function of $x$, defined by

$$T(x) = \frac{T(x_0)}{AF(x)}, \qquad AF(x) > 0, \qquad AF(x_0) = 1,$$

where $T(x)$ is the time at condition $x$ and $T(x_0)$ is the corresponding time at some baseline condition $x_0$.

Scale-Accelerated Failure-Time Model

Commonly used forms for the time-scale factor $AF(x)$ are the log-linear relationships:
- $\log[AF(x)] = \beta_1 x$, with $x_0 = 0$, for a scalar $x$.
- $\log[AF(x)] = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$, with $x_0 = 0$, for a vector $x$.

When $AF(x) > 1$, $T(x) < T(x_0)$: time at $x$ is accelerated relative to time at $x_0$.
When $0 < AF(x) < 1$, $T(x) > T(x_0)$: time at $x$ is decelerated relative to time at $x_0$.

Scale-Accelerated Failure-Time Model

In terms of cdfs, with baseline cdf $F(t; x_0)$:

$$F(t; x) = F[AF(x)\, t;\, x_0].$$

In terms of quantiles:

$$t_p(x) = \frac{t_p(x_0)}{AF(x)}, \quad \text{i.e.,} \quad \ln[t_p(x)] - \ln[t_p(x_0)] = -\ln[AF(x)].$$

SAFT models are often suggested by the physical theory of some simple failure mechanisms, but they do not hold universally.
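The cdf and quantile relations of the SAFT model can be checked numerically. The sketch below assumes a Weibull baseline and a log-linear acceleration factor $AF(x) = \exp(\beta_1 x)$; the parameter values (`eta0`, `shape`, `beta1`, `x`) are hypothetical and chosen only for illustration.

```python
import math

def weibull_cdf(t, eta, shape):
    """Weibull cdf with scale eta and shape parameter shape."""
    return 1.0 - math.exp(-((t / eta) ** shape))

def weibull_quantile(p, eta, shape):
    """p-quantile of the Weibull distribution."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / shape)

def acceleration_factor(x, beta1):
    """Log-linear AF(x) = exp(beta1 * x), so AF(0) = 1 at the baseline x0 = 0."""
    return math.exp(beta1 * x)

# Hypothetical values, not taken from the slides.
eta0, shape = 1000.0, 1.5   # baseline Weibull scale and shape
beta1, x = 0.8, 2.0
af = acceleration_factor(x, beta1)

# Under SAFT, T(x) = T(x0) / AF(x): for a Weibull baseline this is again
# Weibull with scale eta0 / AF(x) and the same shape.
t = 300.0
cdf_at_x = weibull_cdf(t, eta0 / af, shape)
cdf_scaled_baseline = weibull_cdf(af * t, eta0, shape)   # F[AF(x) t; x0]

p = 0.1
tp_x = weibull_quantile(p, eta0 / af, shape)
tp_x0 = weibull_quantile(p, eta0, shape)
log_quantile_shift = math.log(tp_x) - math.log(tp_x0)    # should equal -log(AF)
```

Because the shift in log quantiles does not depend on `p`, the quantile curves at different conditions are parallel on a log-time scale, which is one informal way to judge whether a SAFT model is plausible.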

Location-Scale Regression Model

With one explanatory variable (the model can be extended to more complicated situations), the location-scale simple regression model is

$$\Pr(Y \le y) = F(y; \mu, \sigma) = F(y; \beta_0, \beta_1, \sigma) = \Phi\!\left(\frac{y - \mu}{\sigma}\right),$$

where $\mu = \beta_0 + \beta_1 x$ and $\sigma$ does not depend on the explanatory variable $x$. The quantile function

$$y_p = \mu + \Phi^{-1}(p)\,\sigma = \beta_0 + \beta_1 x + \Phi^{-1}(p)\,\sigma$$

is linear in $x$. Choosing $\Phi$ determines the shape of the distribution for a particular value of $x$.

Distribution       $\Phi(z)$
Normal             $\int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{w^2}{2}\right) dw$
Extreme value      $1 - \exp[-\exp(z)]$
Logistic           $\frac{\exp(z)}{1 + \exp(z)}$
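A minimal sketch of the quantile function $y_p = \beta_0 + \beta_1 x + \Phi^{-1}(p)\sigma$ for the three choices of $\Phi$. It assumes the extreme value entry is the smallest extreme value cdf $\Phi(z) = 1 - \exp[-\exp(z)]$; the function name `quantile` and the parameter values are illustrative only.

```python
import math
from statistics import NormalDist

# Inverse standardized cdfs, Phi^{-1}(p), for each distribution.
STD_QUANTILE = {
    "normal": lambda p: NormalDist().inv_cdf(p),
    # Smallest extreme value: Phi(z) = 1 - exp(-exp(z))  (assumed form)
    "sev": lambda p: math.log(-math.log(1.0 - p)),
    # Logistic: Phi(z) = exp(z) / (1 + exp(z))
    "logistic": lambda p: math.log(p / (1.0 - p)),
}

def quantile(p, x, beta0, beta1, sigma, dist="normal"):
    """p-quantile of Y at explanatory value x: mu(x) + Phi^{-1}(p) * sigma."""
    mu = beta0 + beta1 * x
    return mu + STD_QUANTILE[dist](p) * sigma

# Hypothetical parameter values.
beta0, beta1, sigma = 5.0, -0.4, 0.6

# For every distribution and every p, the quantile changes by beta1 per unit of x.
slopes = {
    dist: quantile(0.1, 3.0, beta0, beta1, sigma, dist)
          - quantile(0.1, 2.0, beta0, beta1, sigma, dist)
    for dist in STD_QUANTILE
}
```

The slope of $y_p$ in $x$ is $\beta_1$ regardless of $p$ or the choice of $\Phi$; only the intercept $\beta_0 + \Phi^{-1}(p)\sigma$ changes.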

A simple form of the proportional hazards model is

$$h(t; x) = h_0(t)\,\psi(x, \beta),$$

in which the explanatory vector $x$ does not change over time for any individual. The proportional hazards model can also be written as

$$S(t; x) = [S(t; x_0)]^{\psi(x, \beta)},$$

or

$$F(t; x) = 1 - [1 - F(t; x_0)]^{\psi(x, \beta)},$$

or

$$\ln\{-\ln[1 - F(t; x)]\} - \ln\{-\ln[1 - F(t; x_0)]\} = \ln[\psi(x, \beta)].$$

When $\psi(x, \beta) > 1$, the model accelerates time, i.e., $F(t; x) > F(t; x_0)$ for all $t$.
When $\psi(x, \beta) < 1$, the model decelerates time, i.e., $F(t; x) < F(t; x_0)$ for all $t$.
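The last form says that, on the complementary log-log scale, the cdf at $x$ is the baseline cdf shifted vertically by $\ln \psi$, whatever the baseline is. The sketch below checks this with a log-logistic baseline, which is an assumption made purely for illustration, as are the numerical values.

```python
import math

def baseline_cdf(t):
    """Hypothetical log-logistic baseline F(t; x0) (illustrative assumption)."""
    return 1.0 - 1.0 / (1.0 + (t / 100.0) ** 2)

def ph_cdf(t, psi):
    """F(t; x) = 1 - [1 - F(t; x0)]^psi under proportional hazards."""
    return 1.0 - (1.0 - baseline_cdf(t)) ** psi

def cloglog(F):
    """Complementary log-log transform ln{-ln(1 - F)}."""
    return math.log(-math.log(1.0 - F))

log_psi = 0.7                 # hypothetical value of ln psi(x, beta)
psi = math.exp(log_psi)
time_points = (10.0, 50.0, 200.0)

# The vertical shift should equal ln(psi) at every time point.
shifts = [cloglog(ph_cdf(t, psi)) - cloglog(baseline_cdf(t)) for t in time_points]
```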

For the Weibull distribution (and only the Weibull distribution), a proportional hazards regression model is also a SAFT regression model.

Three commonly used parameterizations of $\psi$ may be considered:
- Log-linear form: $\psi(x; \beta) = \exp(\beta^T x)$
- Linear form: $\psi(x; \beta) = 1 + \beta^T x$
- Logistic form: $\psi(x; \beta) = \log(1 + e^{\beta^T x})$

Discrimination among these forms may be achieved by fitting an augmented family. For example,

$$\psi(x; \beta, \kappa) = (1 + \kappa\, \beta^T x)^{1/\kappa}$$

includes the linear and log-linear models as special cases, $\kappa = 1$ and $\kappa \to 0$, respectively.
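Two of these statements are easy to check numerically. The sketch below evaluates the augmented family at $\kappa = 1$ and near $\kappa = 0$, and verifies that a Weibull baseline raised to the power $\psi$ equals the same baseline with time rescaled by $AF = \psi^{1/\text{shape}}$ (this follows from $\exp[-\psi (t/\eta)^{\text{shape}}] = \exp[-(\psi^{1/\text{shape}} t/\eta)^{\text{shape}}]$). Function names and numbers are illustrative assumptions.

```python
import math

def psi_augmented(eta, kappa):
    """(1 + kappa * eta)^(1 / kappa), with eta = beta^T x.

    kappa = 1 gives the linear form; kappa = 0 is taken as the limit exp(eta).
    """
    if kappa == 0.0:
        return math.exp(eta)
    return (1.0 + kappa * eta) ** (1.0 / kappa)

def weibull_survival(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

eta = 0.3                                   # hypothetical value of beta^T x
linear_case = psi_augmented(eta, 1.0)       # 1 + eta
near_log_linear = psi_augmented(eta, 1e-8)  # close to exp(eta)

# Weibull baseline: proportional hazards with factor psi is the same as
# SAFT with acceleration factor AF = psi^(1 / shape).
scale0, shape = 500.0, 2.0                  # hypothetical baseline parameters
psi = math.exp(eta)
af = psi ** (1.0 / shape)
t = 120.0
ph_survival = weibull_survival(t, scale0, shape) ** psi
saft_survival = weibull_survival(af * t, scale0, shape)
```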

First we assume that the survival times have continuous distributions and are recorded exactly (no ties). Let $t_1 < t_2 < \cdots < t_n$ denote the failure times of the $n$ units. We consider inferences about $\beta$ when the baseline hazard function $h_0(t)$ is completely unknown.

The Cox proportional hazards model:
- is a semiparametric model;
- makes no assumptions about the form of $h_0(t)$ (the nonparametric part of the model);
- assumes a parametric form for the effect of the predictors on the hazard.

The risk set at time $t_j$ is denoted by $R(t_j) = \{i : t_i \ge t_j\}$.

In the absence of knowledge of $h_0(t)$, the $t_j$ (the actual failure times) can provide little or no information about $\beta$, because their distribution depends heavily on $h_0(t)$.

The Cox partial likelihood estimation method for $\beta$ corresponds to the method of maximum likelihood in which only the ranks of the failure times (and the positions of the censored observations among those ranks) are considered.

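Partial Likelihood: the conditional probability that unit $k$ fails at $t_j$, given that one individual from the risk set $R(t_j)$ fails at $t_j$, is simply

$$\frac{h(t_j; x_k)}{\sum_{i \in R(t_j)} h(t_j; x_i)} = \frac{\psi(x_k, \beta)}{\sum_{i \in R(t_j)} \psi(x_i, \beta)}.$$

Indexing the units by $i$ and the members of each risk set by $j$, the overall partial likelihood is

$$L(\beta) = \prod_{i=1}^{n} \left[ \frac{\psi(x_i, \beta)}{\sum_{j \in R(t_i)} \psi(x_j, \beta)} \right]^{\delta_i},$$

where the $\delta_i$ denote the censoring indicators ($\delta_i = 1$ for an observed failure and $\delta_i = 0$ for a censored time). Let $\hat\beta$ denote the maximum (partial) likelihood estimate of $\beta$, obtained by maximizing the partial log-likelihood function $\ln L(\beta)$.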

If $\psi(x_i, \beta) = \exp(\beta^T x_i)$, we have

$$\ln L(\beta) = \sum_{i=1}^{n} \delta_i\, (\beta^T x_i) - \sum_{i=1}^{n} \delta_i \ln\!\left[ \sum_{j \in R(t_i)} \exp(\beta^T x_j) \right].$$

The first derivative of $\ln L(\beta)$ with respect to $\beta$ is called the vector of efficient scores, given by

$$U(\beta) = \frac{d \ln L(\beta)}{d\beta} = \delta^T X - \sum_{i=1}^{n} \delta_i\, \frac{\sum_{j \in R(t_i)} \exp(\beta^T x_j)\, X_{(j,\cdot)}}{\sum_{j \in R(t_i)} \exp(\beta^T x_j)},$$

where $\delta = (\delta_1, \ldots, \delta_n)^T$ denotes the vector of censoring indicators, and $X$ is the $n \times p$ matrix of covariate values, with the $j$-th row containing the covariate values of the $j$-th individual, $X_{(j,\cdot)} = x_j^T$.
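A minimal sketch of the partial log-likelihood and the score vector for the log-linear form $\psi(x, \beta) = \exp(\beta^T x)$, using only the standard library. It assumes no tied failure times and takes $\delta_i = 1$ to mean an observed failure; the toy data and the helper names (`risk_set`, `dot`) are made up for illustration.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def risk_set(times, i):
    """Indices j with t_j >= t_i, i.e. R(t_i)."""
    return [j for j, tj in enumerate(times) if tj >= times[i]]

def log_partial_likelihood(beta, times, delta, X):
    """sum_i delta_i * [beta^T x_i - ln sum_{j in R(t_i)} exp(beta^T x_j)]."""
    total = 0.0
    for i in range(len(times)):
        if delta[i] == 1:
            denom = sum(math.exp(dot(beta, X[j])) for j in risk_set(times, i))
            total += dot(beta, X[i]) - math.log(denom)
    return total

def score(beta, times, delta, X):
    """Score vector: sum_i delta_i * [x_i - weighted mean of x over R(t_i)]."""
    u = [0.0] * len(beta)
    for i in range(len(times)):
        if delta[i] == 1:
            rs = risk_set(times, i)
            w = [math.exp(dot(beta, X[j])) for j in rs]
            wsum = sum(w)
            for k in range(len(beta)):
                weighted_mean = sum(wj * X[j][k] for wj, j in zip(w, rs)) / wsum
                u[k] += X[i][k] - weighted_mean
    return u

# Toy data (hypothetical): 5 units, 2 covariates, delta = 1 marks a failure.
times = [2.0, 3.0, 5.0, 7.0, 11.0]
delta = [1, 0, 1, 1, 0]
X = [[1.0, 0.5], [0.0, 1.2], [1.0, -0.3], [0.0, 0.8], [1.0, 0.1]]

loglik_at_zero = log_partial_likelihood([0.0, 0.0], times, delta, X)
score_at_zero = score([0.0, 0.0], times, delta, X)
```

At $\beta = 0$ every unit in a risk set is equally likely to fail, so the partial likelihood reduces to the product of $1/r_i$ over the observed failures, where $r_i$ is the size of $R(t_i)$; here the three failures have risk sets of size 5, 3, and 2.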

The information matrix $I(\beta)$ is given by the negative of the second derivative of $\ln L(\beta)$. Let $1_{R(t_i)}$ denote the indicator vector of the risk set $R(t_i)$, i.e., the $j$-th element of $1_{R(t_i)}$ is 1 when $t_j \ge t_i$, and 0 otherwise. Then the information matrix takes the form

$$I(\beta) = -\frac{d^2 \ln L(\beta)}{d\beta^T\, d\beta} = \sum_{i=1}^{n} \frac{\delta_i}{[w_i(\beta)]^2}\, X_{(i)}^T \left\{ w_i(\beta)\, \mathrm{diag}(e^{X\beta}) - [e^{X\beta}][e^{X\beta}]^T \right\} X_{(i)},$$

where $w_i(\beta) = 1_{R(t_i)}^T \exp(X\beta)$ are scalars; for a vector $v$, $\mathrm{diag}(v)$ is the diagonal matrix with main diagonal $v$, and $\exp(v)$ is defined elementwise; and $X_{(i)} = \mathrm{diag}(1_{R(t_i)})\, X$.

The inverse of the information matrix, $I^{-1}(\hat\beta)$, is a consistent estimate of the variance-covariance matrix of $\hat\beta$.
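For a single covariate ($p = 1$) the information reduces to a sum, over observed failures, of the weighted variance of $x$ within each risk set: $\delta_i\,[w_i s_{2,i} - s_{1,i}^2]/w_i^2$ with $s_{1,i} = \sum_{j \in R(t_i)} x_j e^{\beta x_j}$ and $s_{2,i} = \sum_{j \in R(t_i)} x_j^2 e^{\beta x_j}$. The sketch below uses this scalar case to run Newton-Raphson, $\beta \leftarrow \beta + U(\beta)/I(\beta)$, and to form a standard error from $I^{-1}(\hat\beta)$. The data are hypothetical, ties are assumed absent, and no safeguards such as step halving are included.

```python
import math

def risk_sums(beta, times, x, i):
    """Sums over R(t_i) of e^{beta x_j}, x_j e^{beta x_j}, x_j^2 e^{beta x_j}."""
    w = s1 = s2 = 0.0
    for tj, xj in zip(times, x):
        if tj >= times[i]:
            e = math.exp(beta * xj)
            w += e
            s1 += xj * e
            s2 += xj * xj * e
    return w, s1, s2

def score_and_information(beta, times, delta, x):
    """Scalar score U(beta) and information I(beta) for one covariate."""
    u = info = 0.0
    for i in range(len(times)):
        if delta[i] == 1:
            w, s1, s2 = risk_sums(beta, times, x, i)
            u += x[i] - s1 / w
            info += (w * s2 - s1 * s1) / (w * w)
    return u, info

def newton_raphson(times, delta, x, beta=0.0, tol=1e-10, max_iter=50):
    """Iterate beta <- beta + U / I until the step is below tol."""
    for _ in range(max_iter):
        u, info = score_and_information(beta, times, delta, x)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data (hypothetical); delta = 1 marks an observed failure.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
delta = [1, 1, 1, 1, 1, 0]
x = [0.5, -1.0, 1.5, 0.0, 2.0, -0.5]

beta_hat = newton_raphson(times, delta, x)
u_hat, info_hat = score_and_information(beta_hat, times, delta, x)
std_error = math.sqrt(1.0 / info_hat)   # from the inverse information
```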

Tests of hypotheses about the regression parameters $\beta$, for example $H_0: \beta = \beta_0$:

- Wald test: $\chi^2_W = (\hat\beta - \beta_0)^T\, I(\hat\beta)\, (\hat\beta - \beta_0)$
- Likelihood ratio test: $\chi^2_{LR} = 2[\ln L(\hat\beta) - \ln L(\beta_0)]$
- Score test: $\chi^2_S = U(\beta_0)^T\, I^{-1}(\beta_0)\, U(\beta_0)$, with $U(\beta_0)$ written as a column vector

Each statistic has a chi-squared distribution with $p$ degrees of freedom under $H_0$ for large samples.
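For one covariate and $H_0: \beta = 0$, the three statistics can be computed side by side. The sketch below is self-contained and assumes the log-linear form $\psi = \exp(\beta x)$, no ties, and the score test in its standard form $U(\beta_0)^2 / I(\beta_0)$. The data are hypothetical. With one degree of freedom the chi-squared upper tail is $\operatorname{erfc}(\sqrt{s/2})$, which avoids any dependency outside the standard library.

```python
import math

def log_pl_score_info(beta, times, delta, x):
    """Partial log-likelihood, score U, and information I for scalar beta."""
    loglik = u = info = 0.0
    for ti, di, xi in zip(times, delta, x):
        if di == 1:
            risk = [xj for tj, xj in zip(times, x) if tj >= ti]
            e = [math.exp(beta * xj) for xj in risk]
            w = sum(e)
            s1 = sum(xj * ej for xj, ej in zip(risk, e))
            s2 = sum(xj * xj * ej for xj, ej in zip(risk, e))
            loglik += beta * xi - math.log(w)
            u += xi - s1 / w
            info += (w * s2 - s1 * s1) / (w * w)
    return loglik, u, info

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Toy data (hypothetical); delta = 1 marks an observed failure.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
delta = [1, 1, 1, 1, 1, 0]
x = [0.5, -1.0, 1.5, 0.0, 2.0, -0.5]

# Newton-Raphson for the maximum partial likelihood estimate.
beta0 = 0.0
beta_hat = beta0
for _ in range(50):
    _, u, info = log_pl_score_info(beta_hat, times, delta, x)
    step = u / info
    beta_hat += step
    if abs(step) < 1e-10:
        break

ll_hat, _, info_hat = log_pl_score_info(beta_hat, times, delta, x)
ll_0, u_0, info_0 = log_pl_score_info(beta0, times, delta, x)

wald = (beta_hat - beta0) ** 2 * info_hat
lr = 2.0 * (ll_hat - ll_0)
score_stat = u_0 ** 2 / info_0
p_values = {name: chi2_1df_pvalue(s)
            for name, s in (("wald", wald), ("lr", lr), ("score", score_stat))}
```

The score test needs no fitting at all, since it only evaluates $U$ and $I$ at $\beta_0$; the Wald and likelihood ratio tests both require $\hat\beta$.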

Partial Likelihood for Discrete Failure Times: Ties

Notation:
- $t_1 < t_2 < \cdots < t_D$: the $D$ distinct, ordered failure times.
- $d_i$: the number of deaths at $t_i$.
- $D_i$: the set of all individuals who die at time $t_i$.
- $r_i$: the number of individuals in $R(t_i)$.

In the absence of knowledge of the true order (the real case), we have to consider all possible orders of the observed $d_i$ tied survival times. For each $t_i$, the observed $d_i$ tied survival times can be ordered in $d_i!$ different possible ways.

For each of these possible orders, the corresponding $d_i$ survival times contribute a product of terms as in the continuous case. For large $r_i$, constructing and computing the exact partial likelihood function is a very tedious task. To approximate the exact partial likelihood function, the following two methods can be used when each $d_i$ is small compared to $r_i$.

Breslow (1974, International Statistical Review):

$$L(\beta) = \prod_{i=1}^{D} \frac{\prod_{j \in D_i} \psi(x_j; \beta)}{\left[ \sum_{j \in R(t_i)} \psi(x_j; \beta) \right]^{d_i}}$$

Efron (1977, JASA):

$$L(\beta) = \prod_{i=1}^{D} \frac{\prod_{j \in D_i} \psi(x_j; \beta)}{\prod_{j=1}^{d_i} \left\{ \sum_{k \in R(t_i)} \psi(x_k; \beta) - \frac{j-1}{d_i} \sum_{k \in D_i} \psi(x_k; \beta) \right\}}$$
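The two approximations differ only in the denominators: Breslow uses the full risk-set sum for every tied failure, while Efron subtracts a fraction $(j-1)/d_i$ of the tied units' sum for the $j$-th factor. A minimal sketch for one covariate with $\psi(x; \beta) = \exp(\beta x)$ follows; the function name, the `method` argument, and the data are illustrative assumptions, not an existing library interface.

```python
import math

def tied_log_partial_likelihood(beta, times, delta, x, method="breslow"):
    """Log partial likelihood with ties, Breslow or Efron approximation."""
    if method not in ("breslow", "efron"):
        raise ValueError("method must be 'breslow' or 'efron'")
    loglik = 0.0
    # Distinct observed failure times t_1 < ... < t_D.
    failure_times = sorted({t for t, d in zip(times, delta) if d == 1})
    for ti in failure_times:
        # Sum of psi over the risk set R(t_i) = {j : t_j >= t_i}.
        psi_risk = sum(math.exp(beta * xj)
                       for tj, xj in zip(times, x) if tj >= ti)
        # psi for the units in D_i (failures exactly at t_i).
        psi_dead = [math.exp(beta * xj)
                    for tj, dj, xj in zip(times, delta, x)
                    if tj == ti and dj == 1]
        d_i = len(psi_dead)
        loglik += sum(math.log(v) for v in psi_dead)
        for j in range(1, d_i + 1):
            if method == "breslow":
                denom = psi_risk
            else:  # Efron: remove a fraction (j - 1) / d_i of the tied units
                denom = psi_risk - (j - 1) / d_i * sum(psi_dead)
            loglik -= math.log(denom)
    return loglik

# Toy data (hypothetical) with ties at t = 1 and t = 3; delta = 1 is a failure.
times = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0, 4.0]
delta = [1, 1, 0, 1, 1, 0, 1]
x = [0.2, -0.4, 1.0, 0.7, -1.1, 0.3, 0.0]

breslow_at_zero = tied_log_partial_likelihood(0.0, times, delta, x, "breslow")
efron_at_zero = tied_log_partial_likelihood(0.0, times, delta, x, "efron")
```

Since every Efron denominator is no larger than the corresponding Breslow denominator, the Efron value is never below the Breslow value, and the two coincide when there are no ties.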