Survival Analysis I (CHL5209H)


Olli Saarela
Dalla Lana School of Public Health, University of Toronto
olli.saarela@utoronto.ca
January 7, 2015

Literature

Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really useful as a reference text, but an interesting pedagogical approach.
Kalbfleisch JD & Prentice RL (2002): The Statistical Analysis of Failure Time Data, Second Edition. Introductory; serves as a reference text.
Klein JP & Moeschberger ML (2003): Survival Analysis: Techniques for Censored and Truncated Data, Second Edition. Introductory; serves as a reference text.
Aalen OO, Borgan Ø & Gjessing H (2008): Survival and Event History Analysis: A Process Point of View. For those looking for something more theoretical.


Survival analysis?

Survival analysis focuses on a single event per individual (say, first marriage, graduation, diagnosis of a disease, or death); analysis of multiple events would be referred to as event history analysis. Despite the name, a course in survival analysis is mostly not about survival. Rather, most of the time we concentrate on estimating or modeling a more fundamental quantity: the rate parameter λ, or its time-dependent version, the hazard function λ(t). The survival probability in turn is determined by the hazard function. The rates can be, for example, mortality or incidence rates.

More about rates

The rate parameter is the parameter of the event time distribution, characterizing the rate of occurrence of the events of interest. The expected number of events µ in a total of Y years of follow-up time and λ are connected by µ = λY. The observed number of events D in Y years of follow-up time is distributed as D ~ Poisson(λY). How do we estimate the rate parameter λ?

An estimator for λ

A possible estimator is suggested by µ = λY, or equivalently λ = µ/Y. It seems reasonable to replace here the expected number of events µ with the observed number of events D and take λ̂ = D/Y. This is in fact the maximum likelihood estimator of λ.

Maximum likelihood criterion

The maximum likelihood estimate is the parameter value that maximizes the probability of observing the data D. The probability of the observed data is given by the statistical model, which is now D ~ Poisson(λY). Probabilities under the Poisson distribution are given by

(λY)^D e^{−λY} / D!

We consider this probability as a function of λ, and call it the likelihood of λ.

Maximizing the likelihood

We may ignore any multiplicative terms not depending on the parameter, and instead maximize the expression λ^D e^{−λY}, or, for mathematical convenience, its logarithm D log λ − λY. How do we find the argument value that maximizes a function? Set the first derivative to zero and solve with respect to λ:

D/λ − Y = 0  ⟺  λ = D/Y.

Then check that the second derivative is negative: −D/λ² < 0.
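The maximization above can be confirmed numerically. The following is a quick Python sketch (not part of the original slides) that checks that λ̂ = D/Y gives a higher Poisson log-likelihood than nearby parameter values, using the D = 2, Y = 36 example from the follow-up data:

```python
import math

def log_lik(lam, D, Y):
    # Poisson log-likelihood with multiplicative constants dropped:
    # D * log(lambda) - lambda * Y
    return D * math.log(lam) - lam * Y

D, Y = 2, 36          # events and person-time from the follow-up example
mle = D / Y           # closed-form maximizer derived above

# The log-likelihood at the MLE should exceed its value at nearby points.
grid = [mle * f for f in (0.5, 0.8, 1.2, 2.0)]
assert all(log_lik(mle, D, Y) > log_lik(lam, D, Y) for lam in grid)
print(round(mle, 3))  # 0.056
```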

Follow-up data

Clayton & Hills (1993, p. 41): [Figure: follow-up timelines for seven individuals, showing outcome events and censoring.]

Estimate

For the 7 subjects (individuals) there is a total of 36 time units of follow-up (person-time), and 2 outcome events (for individuals 2 and 6). The follow-up of the other individuals was terminated by censoring (e.g., by events other than the outcome event of interest). In fact, censoring is the reason why we need to approach the estimation problem through rates, rather than looking at the event times directly. We arrive at λ̂ = D/Y = 2/36 ≈ 0.056. To recap:

Estimand/parameter/object of inference: λ
Estimator: D/Y
Estimate: 0.056

Interpretation?

We will define the rate parameter more precisely shortly, but let us first consider the interpretation. Unlike the risk parameter (the probability of an event occurring within a specific time period), the rate parameter does not correspond to a follow-up period of a fixed length. Rather, it characterizes the instantaneous occurrence of the outcome event at any given time. The rate parameter is not a probability, but it can be characterized in terms of the risk parameter when the follow-up period is very short.

Time unit

Suppose that each of the N = 36 time bins in the figure is of length h = 0.05 years. In total there is Y = Nh = 36 × 0.05 = 1.8 years of follow-up.

From risk to rate

The empirical rate is given by λ̂ = 2/1.8 ≈ 1.11 per person-year, or, say, 1110 per 1000 person-years. Per person-year, the empirical rate would be the same had we instead split the person-time into 180 bins of length 0.01 years. Suppose that we have made the time bins short enough that at most one event can occur in each bin. Whether an event occurred in a particular bin of length h is now a Bernoulli-distributed variable, with the expected number of events equal to the risk π. Thus, because the rate is the expected count divided by person-time, when h is small we have

λ = π/h  ⟺  π = λh.

This connection is important in understanding how the rate is related to the survival probability.

Survival probability

One particular property of the natural logarithm and its inverse is that when x is close to zero, e^x ≈ 1 + x, and conversely, log(1 + x) ≈ x. Suppose that we are interested in the probability of surviving T years. Split the timescale into N = T/h bins, so that T = Nh. The probability of surviving through a single time bin of length h, conditional on surviving until the start of this bin, is 1 − π = 1 − λh. By the multiplicative rule, the T-year survival probability is thus (1 − λh)^N. (This motivates the well-known Kaplan-Meier estimator, to be encountered later.) In turn, the logarithm of this is

N log(1 − λh) ≈ −Nλh = −λT.

Survival and cumulative hazard

The quantity λT is known as the cumulative hazard. We have (approximately, without calculus) obtained a fundamental relationship of survival analysis, namely that the T-year survival probability is

(1 − λh)^N ≈ e^{−λT}.

Let us test whether this approximation actually works. Now λ̂ = 1.11. If T = 1 and h = 0.05, then N = 20 and we get (1 − 1.11 × 0.05)^20 ≈ 0.319. The exact one-year survival probability is e^{−1.11 × 1} ≈ 0.330. We should get a better approximation through a finer split of the timescale: if h = 0.01, then N = 100 and (1 − 1.11 × 0.01)^100 ≈ 0.328.
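The numbers above can be reproduced directly. A short Python sketch (added for illustration; the slides do the same arithmetic by hand):

```python
import math

# Approximation (1 - lambda*h)^N vs. the exact exp(-lambda*T),
# using the empirical rate lambda-hat = 1.11 from the slides.
lam, T = 1.11, 1.0
exact = math.exp(-lam * T)            # exact one-year survival, ~0.330

for h in (0.05, 0.01):
    N = int(T / h)                    # number of time bins
    approx = (1 - lam * h) ** N       # product of per-bin survivals
    print(h, round(approx, 3))        # 0.05 -> 0.319, 0.01 -> 0.328

print(round(exact, 3))                # 0.33
```

As expected, the finer split (h = 0.01) gives a value closer to the exact exponential.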

Regression models

Recall the relationship µ = λY. If γ = log λ, equivalently we have the log-linear form

log µ = γ + log Y,

where we call log Y an offset term. γ is an unknown parameter, which we could estimate in an obvious way. (How?) Such a one-parameter model is not very interesting, but serves as a starting point for modeling. Consider now the expected number of events µ1 in Y1 years of exposed person-time and the expected number of events µ0 in Y0 years of unexposed person-time. The corresponding probability models are now D1 ~ Poisson(µ1) and D0 ~ Poisson(µ0), where µ1 = λ1 Y1 and µ0 = λ0 Y0. Here µ1 + µ0 = µ and D1 + D0 = D.

Combining the two models

We may now parametrize the two log-rates in terms of an intercept term α and a coefficient β as log λ0 = α and log λ1 = α + β. By introducing an exposure variable Z, with Z = 1 (Z = 0) indicating the exposed (unexposed) person-time, we can express these definitions as a single equation:

log λ_Z = α + βZ  ⟺  λ_Z = e^{α+βZ}.

This results in a single statistical model, namely

D_Z ~ Poisson(Y_Z e^{α+βZ}).

What is the interpretation of the coefficient? Now we have

λ1/λ0 = e^{α+β}/e^α = e^β,

or β = log(λ1/λ0), that is, the log rate ratio.
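In this saturated two-group model the maximum likelihood estimates have closed forms: each group's rate is estimated by its own events/person-years, and β̂ is the log of their ratio. A small Python sketch with hypothetical counts (the numbers below are illustrative, not from the slides):

```python
import math

# Hypothetical two-group data: events and person-years, for illustration only.
D1, Y1 = 14, 700.0   # exposed
D0, Y0 = 8, 900.0    # unexposed

lam1_hat = D1 / Y1               # group-specific rate estimates
lam0_hat = D0 / Y0
alpha_hat = math.log(lam0_hat)   # log baseline (unexposed) rate
beta_hat = math.log(lam1_hat / lam0_hat)   # log rate ratio

print(round(math.exp(beta_hat), 2))        # rate ratio: 2.25
```

Fitting the corresponding Poisson GLM with an offset would reproduce exactly these estimates.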

Reparametrizing rates

The model can easily be extended to accommodate more than one determinant. For example, unadjusted comparisons of rates are susceptible to confounding; we can move on to consider confounder-adjusted rate ratios. Consider the following dataset: [Table: event counts and person-years by energy intake and age group; the numbers are entered into R on a later slide.] Introduce an exposure variable taking values Z = 1 (energy intake < 2750 kcals) and Z = 0 (≥ 2750 kcals), and an age group indicator taking values X = 0 (40-49), X = 1 (50-59) and X = 2 (60-69).

The original parameters

There are now six rate parameters λ_ZX, corresponding to each exposure-age combination:

         Z = 0    Z = 1
X = 0    λ00      λ10
X = 1    λ01      λ11
X = 2    λ02      λ12

The corresponding statistical distributions are

         Z = 0                     Z = 1
X = 0    D00 ~ Poisson(Y00 λ00)    D10 ~ Poisson(Y10 λ10)
X = 1    D01 ~ Poisson(Y01 λ01)    D11 ~ Poisson(Y11 λ11)
X = 2    D02 ~ Poisson(Y02 λ02)    D12 ~ Poisson(Y12 λ12)

Transformed parameters

Now, we are not primarily interested in estimating six rates; rather, we are interested in the rate ratio between the exposure categories, adjusting for age. We could parametrize the rates with respect to the baseline, or reference, rate λ00, which is then modified by the exposure and age (cf. Clayton & Hills 1993, p. 220). Define

         Z = 0             Z = 1
X = 0    λ00 = λ00         λ10 = λ00 θ
X = 1    λ01 = λ00 φ1      λ11 = λ00 θ φ1
X = 2    λ02 = λ00 φ2      λ12 = λ00 θ φ2

Now θ is the rate ratio within each age group (verify).
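The "verify" can be done numerically. A quick Python check (with arbitrary illustrative parameter values, not estimates from the data) that the exposed/unexposed ratio equals θ in every age group under this multiplicative parametrization:

```python
# Arbitrary illustrative values for the baseline rate, exposure effect
# theta, and age effects phi1, phi2 (not estimates from the slides).
lam00, theta, phi1, phi2 = 0.005, 2.0, 1.3, 1.8

# rates[(z, x)] = lambda_00 * theta^z * phi_x, with phi_0 = 1
rates = {
    (0, 0): lam00,          (1, 0): lam00 * theta,
    (0, 1): lam00 * phi1,   (1, 1): lam00 * theta * phi1,
    (0, 2): lam00 * phi2,   (1, 2): lam00 * theta * phi2,
}

for x in (0, 1, 2):
    ratio = rates[(1, x)] / rates[(0, x)]
    assert abs(ratio - theta) < 1e-12   # same rate ratio in each age group
```

The age effects φ1, φ2 cancel out of the ratio, which is exactly why θ has the adjusted rate-ratio interpretation.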

Regression parameters

As before, we can specify the reparametrization in terms of a link function and a linear predictor as

log λ_ZX = α + βZ + γ1 1{X=1} + γ2 1{X=2}.

Since λ_ZX = e^{α + βZ + γ1 1{X=1} + γ2 1{X=2}}, we have that λ00 = e^α, θ = e^β, φ1 = e^{γ1} and φ2 = e^{γ2}. The rates are now given by the equation as

         Z = 0             Z = 1
X = 0    λ00 = e^α         λ10 = e^{α+β}
X = 1    λ01 = e^{α+γ1}    λ11 = e^{α+β+γ1}
X = 2    λ02 = e^{α+γ2}    λ12 = e^{α+β+γ2}

The number of parameters has been reduced from six to four.

Specification in terms of expected counts

A model is always specified in terms of the expected event count: D_ZX ~ Poisson(µ_ZX). The model for the expected count is specified by

µ_ZX = Y_ZX λ_ZX = Y_ZX e^{α + βZ + γ1 1{X=1} + γ2 1{X=2}} = e^{α + βZ + γ1 1{X=1} + γ2 1{X=2} + log Y_ZX}.

We have obtained the model

D_ZX ~ Poisson(e^{α + βZ + γ1 1{X=1} + γ2 1{X=2} + log Y_ZX}).

When fitting the model, log person-years has to be included in the linear predictor as an offset variable.

Fitting the model

The data as frequency records are entered into R as:

d <- c(4,5,8,2,12,14)
y <- c(607.9,1272.1,888.9,311.9,878.1,667.5)
z <- c(0,0,0,1,1,1)
x <- c(0,1,2,0,1,2)

The model is specified as

model <- glm(d ~ z + as.factor(x) + offset(log(y)),
             family=poisson(link="log"))

The as.factor(x) term specifies that we want to estimate separate age group effects (rather than assume that the X variable modifies the log-rate additively).

Results

Call:
glm(formula = d ~ z + as.factor(x) + offset(log(y)),
    family = poisson(link = "log"))

Deviance Residuals:
       1        2        3        4        5        6
 0.73940 -0.58410  0.04255 -0.77385  0.42800 -0.03191

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    -5.4177     0.4421 -12.256  < 2e-16 ***
z               0.8697     0.3080   2.823  0.00476 **
as.factor(x)1   0.1290     0.4754   0.271  0.78609
as.factor(x)2   0.6920     0.4614   1.500  0.13366
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 14.5780  on 5  degrees of freedom
Residual deviance:  1.6727  on 2  degrees of freedom
AIC: 31.796

Number of Fisher Scoring iterations: 4
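The fitted rates on the next slide follow from these coefficients by exponentiating the linear predictor. A Python sketch (added for illustration) that backs them out; small discrepancies against the slide's table are due to the output printing rounded coefficients:

```python
import math

# Coefficients as printed in the R output above (rounded to 4 decimals).
alpha, beta = -5.4177, 0.8697
gamma = {0: 0.0, 1: 0.1290, 2: 0.6920}   # age-group effects, X = 0 is reference

# Fitted rates per 1000 person-years: 1000 * exp(alpha + beta*Z + gamma_X).
for x in (0, 1, 2):
    for z in (0, 1):
        rate = 1000 * math.exp(alpha + beta * z + gamma[x])
        print(f"Z={z}, X={x}: {rate:.2f}")
```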

Proportional hazards

From the model output, we may calculate estimates for the original rate parameters (per 1000 person-years) as

         Z = 0          Z = 1
X = 0    λ̂00 = 4.44     λ̂10 = 10.59
X = 1    λ̂01 = 5.05     λ̂11 = 12.05
X = 2    λ̂02 = 8.86     λ̂12 = 21.20

Note that the rate ratio stays constant across the age groups. This is forced by the earlier model specification; it is a modeling assumption, namely the proportional hazards assumption. Compare these estimates to the corresponding six empirical rates. Is assuming proportionality of the hazard rates justified? How could one test this? Or relax this assumption?


Counting process

In survival analysis, the outcome is generally a pair of random variables (T_i, E_i), where T_i represents the event time and E_i the type of the event that occurred at T_i. Usually we have to consider at least two types of events, namely the outcome event of interest (say, E_i = 1) and censoring (say, E_i = 0), that is, termination of the follow-up due to some reason other than the outcome event of interest. Equivalently, we can characterize the outcomes in terms of the cause-specific counting processes

N_ij(t) = 1{T_i ≤ t, E_i = j},  j = 0, 1.

(What is a process?) We can also introduce an overall counting process N_i(t) = Σ_{j=0}^{1} N_ij(t) for any type of event. If there is no censoring, this is all we need.
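Concretely, each N_ij(t) is just an indicator that jumps from 0 to 1 at the event time. A minimal Python sketch (the time and event type below are hypothetical, for illustration):

```python
def N(t, T_i, E_i, j):
    # Cause-specific counting process: 1 if individual i has had an
    # event of type j by time t, else 0.
    return 1 if (T_i <= t and E_i == j) else 0

# Hypothetical individual: outcome event of interest (type 1) at time 4.
T_i, E_i = 4.0, 1

assert N(3.0, T_i, E_i, 1) == 0   # nothing counted before the event time
assert N(5.0, T_i, E_i, 1) == 1   # outcome event counted once t >= 4
assert N(5.0, T_i, E_i, 0) == 0   # this individual was never censored
# Overall process is the sum over event types: N_i(t) = N_i0(t) + N_i1(t).
assert N(5.0, T_i, E_i, 0) + N(5.0, T_i, E_i, 1) == 1
```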

History

The observed histories of such counting processes just before time t are denoted as

F_t = σ({N_ij(u) : i = 1, ..., n; j = 0, 1; 0 ≤ u < t}).

The mathematical characterization of this as a generated σ-algebra is not important at this point; we only need to understand F_t as all observed information before t. Consider now the conditional probability of an event (of any type) occurring at t, that is, P(dN_i(t) = 1 | F_t), where dN_i(t) ≡ N_i(t + dt) − N_i(t).

Hazard function

Alternatively, this may be characterized as

P(dN_i(t) = 1 | F_t) = E[dN_i(t) | F_t] = P(t ≤ T_i < t + dt | F_t) ≈ λ(t) dt,

or

λ(t) = P(t ≤ T_i < t + dt | F_t) / dt.

When T_i ≥ t and the outcome of individual i does not depend on the outcomes of other individuals, this is equivalent to the familiar definition

λ(t) = lim_{Δt→0} P(t ≤ T_i < t + Δt | T_i ≥ t) / Δt.

Connection to the survival function

Now

P(t ≤ T_i < t + dt | T_i ≥ t) = P(t ≤ T_i < t + dt) / P(T_i ≥ t),

so λ(t) = f(t)/S(t), where f(t) is the density function and S(t) the survival function of the event time distribution. Note that S(t) = 1 − F(t) and f(t) = dF(t)/dt. Further,

d[log F(t)]/dt = f(t)/F(t)  and  d[log S(t)]/dt = −f(t)/S(t).

This gives us again the fundamental relationship

S(t) = exp{−∫₀ᵗ λ(u) du},

where ∫₀ᵗ λ(u) du ≡ Λ(t) is the cumulative hazard.
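For a constant hazard (exponential event times) all of these relationships can be verified by direct computation. A Python sketch using the earlier empirical rate λ̂ = 1.11 as the illustrative constant hazard:

```python
import math

# Constant hazard lambda: exponential event time distribution, where
# f(t) = lambda * exp(-lambda*t), S(t) = exp(-lambda*t), Lambda(t) = lambda*t.
lam = 1.11

for t in (0.5, 1.0, 2.0):
    S = math.exp(-lam * t)             # survival function
    f = lam * math.exp(-lam * t)       # density function
    Lambda = lam * t                   # cumulative hazard

    assert abs(f / S - lam) < 1e-12            # lambda(t) = f(t)/S(t)
    assert abs(S - math.exp(-Lambda)) < 1e-12  # S(t) = exp(-Lambda(t))
```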

Cause-specific hazards

In the presence of censoring, we are not interested in the overall hazard rate but rather the cause-specific hazard rate

P(dN_ij(t) = 1 | F_t) = E[dN_ij(t) | F_t] = P(t ≤ T_i < t + dt, E_i = j | F_t) ≈ λ_j(t) dt

for event type j. It is easy to see that these are related to the overall hazard by λ(t) = Σ_{j=0}^{1} λ_j(t). The sub-density function corresponding to event type j is given by

f_j(t) ≡ P(t ≤ T_i < t + dt, E_i = j) / dt = λ_j(t) S(t),

where S(t) is the overall survival function.