MAS3301 / MAS8311 Biostatistics Part II: Survival

MAS3301 / MAS8311 Biostatistics Part II: Survival
M. Farrow
School of Mathematics and Statistics, Newcastle University
Semester 2, 2009-10

8 Parametric models

8.1 Introduction

In the last few sections (the KM estimate and hypothesis tests) we haven't made any assumptions about the form of the underlying hazard function, e.g. constant, decaying, increasing. In the next few sections we introduce the idea of assuming some specific functional form for the hazard function, and fitting it to the data. We start off by looking at different exploratory plots of the estimated survivor function that we use to assess which parametric assumptions are appropriate.

8.2 Identification of distributions

8.2.1 Exponential

The exponential distribution has the following density, hazard, and survivor functions:

$$f(t) = \lambda \exp(-\lambda t), \qquad h(t) = \lambda, \qquad S(t) = \exp(-\lambda t),$$

where $\lambda$ is a positive constant. It follows that

$$S(t) = \exp(-\lambda t) \quad\text{so}\quad \log[S(t)] = -\lambda t.$$

We plot $\log[\hat{S}(t)]$ against $t$. If the lifetime distribution underlying the data is exponential, then the plot will approximately be a straight line through the origin with gradient $-\lambda$.

8.2.2 Weibull

The Weibull distribution has the following density, hazard, and survivor functions:

$$f(t) = \lambda\gamma t^{\gamma-1} \exp(-\lambda t^{\gamma}), \qquad h(t) = \lambda\gamma t^{\gamma-1}, \qquad S(t) = \exp(-\lambda t^{\gamma}),$$

where $\lambda, \gamma$ are positive constants.

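It follows that

$$S(t) = \exp(-\lambda t^{\gamma}) \quad\text{so}\quad \log\left(-\log[S(t)]\right) = \log\lambda + \gamma \log t.$$

We plot $\log\left(-\log[\hat{S}(t)]\right)$ against $\log t$. If the lifetime distribution is Weibull, then the plot will approximately be a straight line with gradient $\gamma$ and intercept $\log\lambda$. If the distribution is exponential then the gradient will be approximately 1.

You can plot the data in this way using R as follows:

    library(survival)
    # times: observed times; cens: 1 = failure observed, 0 = right-censored
    my.surv <- Surv(times, cens)
    # Kaplan-Meier estimate; current versions of survfit need a formula
    surv.fn <- survfit(my.surv ~ 1)
    # log(-log S) against log t: roughly linear if the data are Weibull
    plot(log(surv.fn$time), log(-log(surv.fn$surv)), type = "s")

8.2.3 Log-logistic

The log-logistic distribution has the following density, hazard, and survivor functions:

$$f(t) = \frac{\gamma\rho(\rho t)^{\gamma-1}}{\left(1 + (\rho t)^{\gamma}\right)^{2}}, \qquad h(t) = \frac{\gamma\rho(\rho t)^{\gamma-1}}{1 + (\rho t)^{\gamma}}, \qquad S(t) = \frac{1}{1 + (\rho t)^{\gamma}},$$

where $\gamma, \rho$ are positive constants.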

What sort of plot could we use to recognize a survivor function with this sort of distribution?

$$\frac{1 - S(t)}{S(t)} = (\rho t)^{\gamma} \quad\text{so}\quad \log\left(\frac{1 - S(t)}{S(t)}\right) = \gamma\log\rho + \gamma\log t.$$

We plot $\log\left(\dfrac{1 - \hat{S}(t)}{\hat{S}(t)}\right)$ against $\log t$. If the lifetime distribution is log-logistic then this should approximately give a straight line with intercept $\gamma\log\rho$ and gradient $\gamma$.

Example: Given the parametric form $S(t) = (1 + \lambda^{2}t^{2})^{-1/2}$ for a survivor function, suggest a suitable plot of the estimated survivor function to assess whether this form is suitable and describe how an estimate for $\lambda$ can be obtained from the graph.

Plot $1/\hat{S}^{2}$ against $t^{2}$. The graph will have intercept 1 and gradient $\lambda^{2}$ if the data are distributed with the given survivor function.

8.3 Some other lifetime distributions

8.3.1 The log normal distribution

Let $X \sim N(\mu, \sigma^{2})$ be a normal random variable. If $T = \exp X$ and $t = \exp x$ then

$$f_T(t) = f_X(x)\,\frac{dx}{dt} = \frac{1}{t\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\log t - \mu)^{2}}{2\sigma^{2}}\right).$$

Unfortunately the survivor and hazard functions can only be expressed in terms of integrals:

$$S(t) = \Pr(T > t) = \Pr(X > \log t) = 1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right),$$

where $\Phi$ is the standard normal distribution function

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp(-u^{2}/2)\,du.$$

Clearly this isn't very convenient. The log-logistic distribution is easier to use and is very similar.

8.3.2 The gamma distribution

You might also see the gamma distribution used for lifetime random variables:

$$f_T(t) = \frac{1}{\Gamma(\alpha)}\,\lambda^{\alpha} t^{\alpha-1} \exp(-\lambda t),$$

where $\Gamma$ is the gamma function (the constant is used to make sure the density integrates to 1). Again, this isn't very convenient to use since the survivor function can only be expressed using integrals. The Weibull distribution is similar and simpler to use.

8.4 The log linear representation and R

Lemma. Suppose $T$ has a Weibull distribution with $S(t) = \exp(-\lambda t^{\gamma})$. If we define a random variable $E$ by $E = \lambda T^{\gamma}$ then

$$\log T = -\frac{1}{\gamma}\log\lambda + \frac{1}{\gamma}\log E$$

and $E \sim \exp(1)$.

Proof: The definition of $E$ gives

$$\log E = \log\lambda + \gamma\log T$$

and therefore

$$\log T = -\frac{1}{\gamma}\log\lambda + \frac{1}{\gamma}\log E. \qquad (*)$$

Furthermore

$$\Pr(E > \eta) = \Pr(\lambda T^{\gamma} > \eta) = \Pr\left(T > (\eta/\lambda)^{1/\gamma}\right) = S\left([\eta/\lambda]^{1/\gamma}\right) = \exp(-\eta)$$

and so $E \sim \exp(1)$.

Transformations of this kind are quite useful, and we refer to equation $(*)$ as the log-linear representation of $T$.

For the log-logistic distribution the log-linear representation has the form

$$\log T = -\log\rho + \frac{1}{\gamma}\log E,$$

where $E$ is a standard log-logistic random variable (one with $\gamma = \rho = 1$).

Proof:

$$T = \exp\left\{-\log\rho + \frac{1}{\gamma}\log E\right\} = \rho^{-1} E^{1/\gamma}$$

so

$$E = (\rho T)^{\gamma}$$

and

$$\Pr(E > \eta) = \Pr[(\rho T)^{\gamma} > \eta] = \Pr\left(T > \rho^{-1}\eta^{1/\gamma}\right) = S\left(\rho^{-1}\eta^{1/\gamma}\right) = \frac{1}{1 + \eta}.$$

In general the transformation has the form

$$\log T = \mu + \sigma\log E$$

for some constants $\mu, \sigma$ usually referred to as the intercept and scale parameter respectively.

Why is this useful? R fits parametric models using this sort of representation. This is carried out using the survreg function: it fits models of the form

$$\log T = \mu + (\text{a term that depends on covariates}) + \sigma\log E.$$

Compare this to a normal regression formula:

$$Y = \mu + \beta x + \sigma\epsilon,$$

where $\epsilon \sim N(0, 1)$ is an error term. You can see how survreg is carrying out a similar task to regular regression. We'll see in the practicals how to use the function, and we'll refer back to the little bit of theory presented here.

9 Maximum likelihood for parametric models

In the last section we looked at how exploratory plots of the estimated survivor function can suggest a specific functional form for the underlying lifetime distribution. In this section we consider how to fit the parameters to the observed data. The way we do this is with maximum likelihood. If you look back at your notes for MAS2305 you'll see how to estimate the mean and standard deviation for a set of data assumed to be normally distributed. We're going to do exactly the same kind of thing here for survival data: we'll just be using different distributions and we'll have to take censoring into account.
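Since survreg itself fits by maximum likelihood, the log-linear representation of Section 8.4 can be checked on simulated data. The following is only a sketch: the parameter values, the uniform censoring scheme and all variable names are assumptions made for illustration. It relies on the Weibull relations $\mu = -\frac{1}{\gamma}\log\lambda$ and $\sigma = 1/\gamma$ from the lemma.

```r
library(survival)

set.seed(1)
lambda <- 0.5    # hypothetical true Weibull parameters
gamma  <- 1.5
n      <- 2000

E      <- rexp(n)                     # E ~ exp(1)
t_fail <- (E / lambda)^(1 / gamma)    # inverts E = lambda * T^gamma
t_cens <- runif(n, 0, 3)              # assumed right-censoring times
time   <- pmin(t_fail, t_cens)        # observed time
status <- as.numeric(t_fail <= t_cens)  # 1 = failure observed, 0 = censored

# Intercept-only fit of log T = mu + sigma * log E
fit <- survreg(Surv(time, status) ~ 1, dist = "weibull")

mu_hat    <- unname(coef(fit)[1])
sigma_hat <- fit$scale

# Convert back to the (lambda, gamma) parameterization used in these notes
gamma_hat  <- 1 / sigma_hat
lambda_hat <- exp(-mu_hat / sigma_hat)
print(c(lambda_hat = lambda_hat, gamma_hat = gamma_hat))
```

With a sample of this size the recovered values should sit close to the ones used in the simulation, which is a useful sanity check on the parameter conversion before fitting real data.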

9.1 Likelihood functions

9.1.1 Notation

Probability density function: $f(t)$
Distribution function: $F(t)$
Survivor function: $S(t) = 1 - F(t)$
Hazard function: $h(t) = f(t)/S(t)$

Data: $n$ independent subjects, of whom
$n_R$ have right-censored times,
$n_Q$ have left-censored times,
$n_I$ have interval-censored times,
$n_D$ have observed failure times.

$$n = n_D + n_R + n_Q + n_I$$

Define sets:

$R = \{i : \text{right censored}\}$
$Q = \{i : \text{left censored}\}$
$I = \{i : \text{interval censored}\}$
$D = \{i : \text{failure (death) time observed}\}$
$A = D \cup R \cup Q \cup I$ ("A" is for all.)

Denote sums and products over these sets using, e.g., $\sum_{i \in D} t_i$, $\prod_{i \in D} t_i$, $\sum_{i \in R} t_i$ etc.

9.1.2 All survival times observed

$$D = A, \qquad n_D = n$$

Likelihood:

$$L = \prod_{i \in A} f(t_i)$$

Log likelihood:

$$l = \sum_{i \in A} \log[f(t_i)]$$
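When every failure time is observed, the log-likelihood is just a sum of log densities. As a small sketch for the Weibull density of Section 8.2.2, the code below evaluates that sum twice: once directly from the formula and once with R's built-in dweibull. The data and parameter values are made up, and the conversion shape = $\gamma$, scale = $\lambda^{-1/\gamma}$ is the mapping between these notes and R's own Weibull parameterization.

```r
# Hypothetical fully observed failure times and trial parameter values
t_obs  <- c(1.2, 0.7, 2.5, 3.1, 0.4)
lambda <- 0.5
gamma  <- 1.5

# Direct evaluation: sum of log f(t_i), f(t) = lambda*gamma*t^(gamma-1)*exp(-lambda*t^gamma)
ll_direct <- sum(log(lambda * gamma * t_obs^(gamma - 1)) - lambda * t_obs^gamma)

# Same quantity via dweibull, which uses shape and scale instead of lambda and gamma
ll_builtin <- sum(dweibull(t_obs, shape = gamma,
                           scale = lambda^(-1 / gamma), log = TRUE))

print(c(direct = ll_direct, builtin = ll_builtin))
```

Agreement between the two values confirms the parameter conversion, which is easy to get wrong when moving between textbook and software conventions.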

9.1.3 Some observations right-censored

$$A = D \cup R, \qquad n = n_D + n_R$$

Likelihood:

$$L = \prod_{i \in D} f(t_i) \prod_{i \in R} S(t_i)$$

Log likelihood:

$$l = \sum_{i \in D} \log[f(t_i)] + \sum_{i \in R} \log[S(t_i)]$$

The contribution of a right-censored observation to the likelihood is $S(t_i)$, the probability that the patient is still alive at time $t_i$.

9.1.4 Left-censoring

Similarly, a left-censored observation at time $t_i$ would contribute $F(t_i) = 1 - S(t_i)$ to the likelihood, the probability that the subject dies before time $t_i$.

9.1.5 Interval-censoring

Likewise, an interval-censored observation in the interval $(t_{Li}, t_{Ui})$ contributes

$$[F(t_{Ui}) - F(t_{Li})] = [S(t_{Li}) - S(t_{Ui})] = \int_{t_{Li}}^{t_{Ui}} f(t)\,dt$$

to the likelihood, the probability that the subject dies in the interval $(t_{Li}, t_{Ui})$.

9.1.6 Example

Given a survivor function of the form

$$S(t) = \exp[-(at + bt^{2})]$$

write down and simplify the log-likelihood function for a set of right-censored data.

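Start off by calculating $f(t)$:

$$f(t) = -\frac{dS(t)}{dt} = (a + 2bt)\exp[-(at + bt^{2})].$$

Then

$$L(a, b) = \prod_{i \in D} \left((a + 2bt_i)\exp[-(at_i + bt_i^{2})]\right) \prod_{i \in R} \exp[-(at_i + bt_i^{2})]$$

and

$$l(a, b) = \sum_{i \in D} \log(a + 2bt_i) - \sum_{i \in D}(at_i + bt_i^{2}) - \sum_{i \in R}(at_i + bt_i^{2}) = \sum_{i \in D} \log(a + 2bt_i) - \sum_{i \in A}(at_i + bt_i^{2}).$$

9.2 Exponential distribution

Suppose the lifetime distribution is exponential, so that

$$f(t) = \lambda e^{-\lambda t}, \quad\text{and}\quad S(t) = e^{-\lambda t}.$$

Let's suppose we have $n_D$ failure times and $n_R$ right censored times so that

$$A = D \cup R, \qquad n = n_R + n_D.$$

The likelihood is

$$L = \prod_{i \in D} f(t_i) \prod_{i \in R} S(t_i) = \prod_{i \in D} \lambda e^{-\lambda t_i} \prod_{i \in R} e^{-\lambda t_i} = \lambda^{n_D} \exp\left(-\lambda \sum_{i \in A} t_i\right)$$

and

$$l = \log(L) = n_D \log\lambda - \lambda \sum_{i \in A} t_i.$$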
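The exponential log-likelihood $l = n_D\log\lambda - \lambda\sum_{i \in A} t_i$ is simple enough to evaluate and maximize numerically. The sketch below does this with R's one-dimensional optimize function; the eight observations, the status coding (1 = failure, 0 = right-censored) and the search interval are all invented for illustration.

```r
# Hypothetical data: observed times and censoring indicators
time   <- c(2, 5, 7, 9, 12, 15, 20, 24)
status <- c(1, 1, 0, 1, 1, 0, 1, 0)   # 1 = failure observed, 0 = right-censored

n_D     <- sum(status)   # number of observed failures
total_t <- sum(time)     # sum of t_i over all subjects (set A)

# Exponential log-likelihood with right-censoring
loglik <- function(lambda) n_D * log(lambda) - lambda * total_t

# Numerical maximization over an assumed plausible range for lambda
opt <- optimize(loglik, interval = c(1e-6, 1), maximum = TRUE)
lambda_num <- opt$maximum
print(lambda_num)
```

Note that censored subjects still contribute their time at risk through the sum over $A$, even though they add nothing to $n_D$.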

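Differentiating gives

$$\frac{\partial l}{\partial\lambda} = \frac{n_D}{\lambda} - \sum_{i \in A} t_i, \qquad \frac{\partial^{2} l}{\partial\lambda^{2}} = -\frac{n_D}{\lambda^{2}}.$$

The second derivative is negative, so solving $\partial l/\partial\lambda = 0$ gives a maximum. The MLE is

$$\hat{\lambda} = \frac{n_D}{\sum_{i \in A} t_i}.$$

The mean of the exponential distribution with parameter $\lambda$ is $1/\lambda$, so the maximum likelihood estimate of the mean is

$$\hat{\mu} = \frac{\sum_{i \in A} t_i}{n_D}.$$

Similarly, the median $t_m$ is the solution to

$$1/2 = 1 - F(t_m) = \exp(-\lambda t_m).$$

Rearranging gives $t_m = (\log 2)/\lambda$, so $\hat{t}_m = \hat{\mu}\log 2$.

Standard theory about MLEs shows that the approximate large-sample variance of $\hat{\lambda}$ is

$$\mathrm{var}(\hat{\lambda}) \approx \left[-E\left(\frac{\partial^{2} l}{\partial\lambda^{2}}\right)\right]^{-1}.$$

Substituting in the value for the second derivative we obtained above gives

$$\mathrm{var}(\hat{\lambda}) \approx \frac{\lambda^{2}}{n_D} \approx \frac{\hat{\lambda}^{2}}{n_D}.$$

An approximate 95% CI for $\lambda$ is therefore $\hat{\lambda} \pm 1.96\,\hat{\lambda}/\sqrt{n_D}$, assuming $\hat{\lambda}$ is approximately normally distributed.

10 Maximum likelihood for some other distributions

10.1 The Weibull Distribution

Suppose that a lifetime random variable follows a Weibull distribution with

$$f(t) = \lambda\gamma t^{\gamma-1}\exp(-\lambda t^{\gamma}), \quad\text{and}\quad S(t) = \exp(-\lambda t^{\gamma}).$$

Let's suppose we have $n_D$ failure times and $n_R$ right censored times so that

$$A = D \cup R, \qquad n = n_R + n_D.$$

The likelihood is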

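$$L = \prod_{i \in D} f(t_i) \prod_{i \in R} S(t_i) = \prod_{i \in D} \left\{\lambda\gamma t_i^{\gamma-1}\exp(-\lambda t_i^{\gamma})\right\} \prod_{i \in R} \exp(-\lambda t_i^{\gamma}) = \lambda^{n_D}\gamma^{n_D}\left(\prod_{i \in D} t_i^{\gamma-1}\right)\exp\left(-\lambda\sum_{i \in A} t_i^{\gamma}\right)$$

and

$$l = \log(L) = n_D\log\lambda + n_D\log\gamma + (\gamma - 1)\sum_{i \in D}\log t_i - \lambda\sum_{i \in A} t_i^{\gamma}.$$

In order to carry out the differentiation, we need the following:

Lemma.

$$\frac{\partial}{\partial x} t^{x} = [\log(t)]\,t^{x}.$$

Proof: From properties of logs:

$$t^{x} = e^{x\log(t)}.$$

Differentiating both sides with respect to $x$ gives

$$\frac{\partial}{\partial x} t^{x} = [\log(t)]\,e^{x\log(t)} = [\log(t)]\,t^{x}.$$

Differentiating with respect to $\lambda$ gives

$$\frac{\partial l}{\partial\lambda} = \frac{n_D}{\lambda} - \sum_{i \in A} t_i^{\gamma}.$$

Solving $\partial l/\partial\lambda = 0$ gives the following relationship between the MLEs:

$$\hat{\lambda} = \frac{n_D}{\sum_{i \in A} t_i^{\hat{\gamma}}}.$$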

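Differentiating with respect to $\gamma$ (via the lemma) gives

$$\frac{\partial l}{\partial\gamma} = \frac{n_D}{\gamma} + \sum_{i \in D}\log t_i - \lambda\sum_{i \in A} t_i^{\gamma}\log t_i.$$

Setting $\partial l/\partial\gamma = 0$ gives a second relationship between the MLEs:

$$0 = \frac{n_D}{\hat{\gamma}} + \sum_{i \in D}\log t_i - \hat{\lambda}\sum_{i \in A} t_i^{\hat{\gamma}}\log t_i.$$

We can then substitute in the value for $\hat{\lambda}$:

$$0 = \frac{n_D}{\hat{\gamma}} + \sum_{i \in D}\log t_i - n_D\,\frac{\sum_{i \in A} t_i^{\hat{\gamma}}\log t_i}{\sum_{i \in A} t_i^{\hat{\gamma}}}.$$

We cannot solve this equation algebraically so we need a numerical procedure. The Newton-Raphson algorithm is one way to solve such equations numerically. In practice the Newton-Raphson algorithm is used to solve both equations

$$\frac{\partial l}{\partial\lambda} = 0, \qquad \frac{\partial l}{\partial\gamma} = 0$$

simultaneously. Let

$$\begin{bmatrix}\hat{\lambda}\\ \hat{\gamma}\end{bmatrix}_{j}$$

denote the $j$th approximation to the MLE. We pick some suitable values to start the process off at $j = 0$. For example, you can use a graphical method as we saw in previous lectures to obtain rough estimates for the parameters. To obtain the $(j + 1)$th approximation we evaluate

$$\begin{bmatrix}\dfrac{\partial l}{\partial\lambda}\\[2mm] \dfrac{\partial l}{\partial\gamma}\end{bmatrix}_{j} \quad\text{and}\quad \begin{bmatrix}\dfrac{\partial^{2} l}{\partial\lambda^{2}} & \dfrac{\partial^{2} l}{\partial\lambda\,\partial\gamma}\\[2mm] \dfrac{\partial^{2} l}{\partial\lambda\,\partial\gamma} & \dfrac{\partial^{2} l}{\partial\gamma^{2}}\end{bmatrix}_{j}$$

using the values $(\hat{\lambda}, \hat{\gamma})_j$ for $\lambda, \gamma$. At iteration $(j + 1)$, the approximation is then

$$\begin{bmatrix}\hat{\lambda}\\ \hat{\gamma}\end{bmatrix}_{j+1} = \begin{bmatrix}\hat{\lambda}\\ \hat{\gamma}\end{bmatrix}_{j} - \begin{bmatrix}\dfrac{\partial^{2} l}{\partial\lambda^{2}} & \dfrac{\partial^{2} l}{\partial\lambda\,\partial\gamma}\\[2mm] \dfrac{\partial^{2} l}{\partial\lambda\,\partial\gamma} & \dfrac{\partial^{2} l}{\partial\gamma^{2}}\end{bmatrix}_{j}^{-1} \begin{bmatrix}\dfrac{\partial l}{\partial\lambda}\\[2mm] \dfrac{\partial l}{\partial\gamma}\end{bmatrix}_{j}$$

where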

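$$\frac{\partial^{2} l}{\partial\lambda^{2}} = -\frac{n_D}{\lambda^{2}}, \qquad \frac{\partial^{2} l}{\partial\lambda\,\partial\gamma} = -\sum_{i \in A} t_i^{\gamma}\log t_i, \qquad \frac{\partial^{2} l}{\partial\gamma^{2}} = -\frac{n_D}{\gamma^{2}} - \lambda\sum_{i \in A} t_i^{\gamma}(\log t_i)^{2}.$$

Standard MLE theory gives that, asymptotically,

$$\mathrm{var}\begin{bmatrix}\hat{\lambda}\\ \hat{\gamma}\end{bmatrix} \approx \left[-E\begin{bmatrix}\dfrac{\partial^{2} l}{\partial\lambda^{2}} & \dfrac{\partial^{2} l}{\partial\lambda\,\partial\gamma}\\[2mm] \dfrac{\partial^{2} l}{\partial\lambda\,\partial\gamma} & \dfrac{\partial^{2} l}{\partial\gamma^{2}}\end{bmatrix}\right]^{-1}.$$

We therefore have that when the MLE is found, an approximation to the variance is

$$\mathrm{var}\begin{bmatrix}\hat{\lambda}\\ \hat{\gamma}\end{bmatrix} \approx -\begin{bmatrix}\dfrac{\partial^{2} l}{\partial\lambda^{2}} & \dfrac{\partial^{2} l}{\partial\lambda\,\partial\gamma}\\[2mm] \dfrac{\partial^{2} l}{\partial\lambda\,\partial\gamma} & \dfrac{\partial^{2} l}{\partial\gamma^{2}}\end{bmatrix}^{-1}$$

where the matrix is evaluated at the MLE. This is a useful by-product of the Newton-Raphson algorithm.

Note that, if the value of $\gamma$ is known, then

$$\hat{\lambda} = \frac{n_D}{\sum_{i \in A} t_i^{\gamma}}$$

and

$$\mathrm{var}(\hat{\lambda}) \approx -\left(\frac{d^{2} l}{d\lambda^{2}}\right)^{-1} = \frac{\lambda^{2}}{n_D} \approx \frac{\hat{\lambda}^{2}}{n_D} = \frac{n_D}{\left(\sum_{i \in A} t_i^{\gamma}\right)^{2}}.$$

10.2 Log-logistic

$$f(t) = \frac{\gamma\rho(\rho t)^{\gamma-1}}{\left(1 + (\rho t)^{\gamma}\right)^{2}}, \qquad S(t) = \frac{1}{1 + (\rho t)^{\gamma}}$$

With the same censoring assumptions as above you get a likelihood function

$$L = \prod_{i \in D} f(t_i) \prod_{i \in R} S(t_i).$$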
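As an alternative to working out the derivatives by hand, the log-logistic likelihood above can be coded directly and passed to a general-purpose optimizer. The sketch below uses R's optim with its default Nelder-Mead method, which is not Newton-Raphson and needs no derivatives. The simulated data, the true parameter values, the uniform censoring and all names are assumptions for illustration only.

```r
set.seed(2)
gamma_true <- 2      # hypothetical true parameters
rho_true   <- 0.1
n          <- 2000

u      <- runif(n)
e      <- u / (1 - u)                        # standard log-logistic: Pr(E > eta) = 1/(1 + eta)
t_fail <- e^(1 / gamma_true) / rho_true      # T = rho^(-1) * E^(1/gamma)
t_cens <- runif(n, 0, 40)                    # assumed right-censoring times
time   <- pmin(t_fail, t_cens)
status <- as.numeric(t_fail <= t_cens)       # 1 = failure observed, 0 = censored

# Negative log-likelihood; parameters on the log scale keep gamma, rho positive
negloglik <- function(par) {
  g <- exp(par[1])
  r <- exp(par[2])
  z <- (r * time)^g
  log_f <- log(g) + log(r) + (g - 1) * log(r * time) - 2 * log1p(z)
  log_S <- -log1p(z)
  -(sum(log_f[status == 1]) + sum(log_S[status == 0]))
}

# Rough starting values: gamma = 1, rho = 1 / (sample median of observed times)
start <- c(log(1), log(1 / median(time)))
fit   <- optim(start, negloglik)

gamma_hat <- exp(fit$par[1])
rho_hat   <- exp(fit$par[2])
print(c(gamma_hat = gamma_hat, rho_hat = rho_hat))
```

Here the starting value for $\rho$ uses the fact that the log-logistic median is $1/\rho$; with real data the graphical method of Section 8.2.3 would serve the same purpose.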

It's straightforward to substitute in the formulae for $f$ and $S$, differentiate, and come up with a Newton-Raphson scheme, although the calculation isn't short. As for the Weibull distribution, initial estimates of the parameters can be obtained via a graphical method.

10.3 Gamma and Log-normal

Since the survivor functions cannot be written down in a closed form they have to be evaluated numerically, and maximizing the likelihood cannot be performed using a straightforward Newton-Raphson algorithm.

Tutorial Examples 3

1. The survival time for individuals is modelled as a random variable $T$. Define the hazard function $h(t)$ associated with $T$ in terms of a limit. From the definition prove that

$$h(t) = -\frac{d}{dt}\log[S(t)]$$

where $S(t) = \mathrm{Prob}(T \geq t)$.

2. A study was performed to investigate the survival times of patients following heart bypass surgery. It was found that 50% of patients lived for 24 months or longer, while 30% of patients lived for 60 months or longer. Assuming the survival time for the population has a Weibull distribution, obtain approximate estimates of the parameters $\lambda$ and $\gamma$. Then estimate:
(a) the probability of an individual living 72 months or longer, and
(b) the probability of an individual dying between 24 and 60 months, given she was alive at 24 months.

3. The time until relapsing back to injecting heroin is recorded for drug addicts taking part in a study of replacement therapy. The response variables are the time until relapse to injecting and the censoring status (some times are right-censored). There are two explanatory variables: $p$, a binary variable which takes value 1 if the addict has had a previous prison sentence, and 0 otherwise; and $x$, the dose of the replacement therapy given. The parameter estimates, when both explanatory variables are fitted, are as follows.

Variable    Estimated coefficient    Standard error
p           0.327                    0.67
x           -0.035                   0.006

(a) What is the estimated hazard ratio for a subject who has not been to prison compared to a subject who has, given that they receive the same dose of the replacement therapy?
(b) What is the estimated hazard ratio for a subject receiving a dose $x = 30$ compared to another receiving a dose $x = 50$, given that neither has a previous prison sentence?
(c) In each case give an approximate 95% confidence interval for the hazard ratio. (Assume that the sample is large enough for the approximation.)