Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek


Survival Analysis
732G34 Statistisk analys av komplexa data
Krzysztof Bartoszek (krzysztof.bartoszek@liu.se)
10, 11 January 2018
Department of Computer and Information Science, Linköping University

Survival analysis
In brief
Study of time-to-event questions.
- What are the chances of a person surviving till pension age? What are the chances of a pension being taken out for at least 10 years? (demographics, economics)
- What are the chances of cancer remission within 5 years of successful treatment? (medicine)
- What are the chances of a computer not requiring replacement within 2 years? (engineering, economics)
- Do the survival times between two groups receiving different treatments differ significantly? (medicine, biology)
- What are the chances of a person being arrested within a year of release?

Lesson structure
- Two lectures: 10 January (13:15-15), 11 January (08:15-10)
- Computer examination: 15 January (10:15-12)
- Hand-in of examination at latest 22 January 10:15
- Answer in Swedish or English, as you prefer.
- Electronic reports as .pdf (or .txt).
- INCLUDE YOUR NAME IN THE REPORT!
- Work individually. Disclose ALL collaborations and sources.
- Provide source code (if used).
- E-mail contact: krzysztof.bartoszek@liu.se

Course materials, software
- Lecture slides
- 2015/16 lecture slides (in Swedish, Karl Wahlin)
- Articles on Kaplan-Meier curves, Cox regression, hazards ratio on course www
- http://www.ats.ucla.edu/stat/examples/alda.htm
- https://cran.r-project.org/web/packages/survival/index.html (vignettes)
- Paul D. Allison, Event history and survival analysis, SAGE Publishing, 2014 (if anyone feels they need a book)
- R (survival package)
- R (KMsurv package, data sets)
- SPSS (e.g. Karl Wahlin's lecture)
- SAS (e.g. P. D. Allison's books)

Survival analysis: the applications
Where?
- Medical studies
- Engineering studies (reliability theory)
- Social studies (duration analysis)
- Any time-to-event questions
What are people interested in?
- Visualize data
- Test hypotheses
- Make predictions
- Model phenomena
- Estimate parameters

Important terms
- Event: something that we are interested in occurring, e.g. death, birth, getting arrested, obtaining a salary raise, onset of disease, being cured of disease, etc.
- Failure of an individual: the event occurs for a given individual, e.g. dying, giving birth, being born, landing in prison, getting a pay increase, falling ill, becoming healthy, etc.
- Individual at risk: the event has not yet occurred but may in the future.
- Population at risk: the collection of individuals for which the event can occur.

Experimental setups

The survival function: definition
T: random variable, time to event
T's cumulative distribution function: $F(t) := \Pr(T \le t)$
Survival function: $S(t) := \Pr(T > t)$
S(t) is the probability to survive longer than t.
Recap questions: How is S(t) related to the CDF? And to T's density (if it exists)?
Question: Is S(t) a monotonic function?
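The recap questions can be answered directly from the two definitions. The following is a short worked sketch, where $f$ denotes the density of $T$ (a symbol introduced here, assuming the density exists):

```latex
\begin{align*}
S(t) &= \Pr(T > t) = 1 - \Pr(T \le t) = 1 - F(t), \\
f(t) &= \frac{d}{dt} F(t) = -\frac{d}{dt} S(t), \\
t_1 \le t_2 &\;\Rightarrow\; S(t_1) \ge S(t_2)
  \quad \text{(since $F$ is non-decreasing, $S$ is non-increasing).}
\end{align*}
```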

The survival function: toy example
Discrete time, we have minute bins.

individ  time  status
1        1     2
2        2     2
3        1     2
4        3     2
5        3     2
6        2     2
7        1     2
8        4     2
9        2     2
10       5     2

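Life table: toy example
x: time; n_x: # at start; w_x: censored; r_x: at risk; d_x: deaths; q_x: Pr. death; p_x: Pr. surv. x; S_x: Pr. surv. > x
(no observation is censored in this example, so w_x = 0 throughout)

x  n_x  w_x  r_x  d_x  q_x  p_x  S_x
1  10   0    10   3    0.3  0.7  7/10
2  7    0    7    3    3/7  4/7  28/70
3  4    0    4    2    0.5  0.5  14/70
4  2    0    2    1    0.5  0.5  7/70
5  1    0    1    1    1    0    0

$q_x = d_x / r_x$, $p_x = 1 - q_x$, $S_x = \prod_{y \le x} p_y$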
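The life table can be recomputed from the ten failure times of the toy example. This is a minimal base R sketch; the variable names are chosen here for illustration, and it assumes no censoring, so the number at risk equals the number at the start of each bin.

```r
# Failure times of the ten individuals in the toy example (all failed)
times <- c(1, 2, 1, 3, 3, 2, 1, 4, 2, 5)

x <- sort(unique(times))                          # distinct time bins
d <- as.vector(table(factor(times, levels = x)))  # deaths d_x per bin
n <- length(times) - c(0, cumsum(d)[-length(d)])  # n_x: alive at start of bin
q <- d / n                                        # q_x = d_x / r_x (r_x = n_x here)
p <- 1 - q                                        # p_x = 1 - q_x
S <- cumprod(p)                                   # S_x = product of p_y for y <= x

life_table <- data.frame(x = x, n_x = n, d_x = d, q_x = q, p_x = p, S_x = S)
print(life_table)
```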

The survival function: toy example
[Figure: cumulative survival (0.0 to 1.0) against time (0 to 5) for the toy data]

The survival function: R code
    library(survival)
    ## data for survival analysis needs a binary status variable,
    ## e.g. 1 alive, 2 dead; see ?Surv
    ## failure times of the toy example (as printed below)
    times.toydata <- c(1, 2, 1, 3, 3, 2, 1, 4, 2, 5)
    dftoydata <- cbind(individ = 1:10, time = times.toydata, status = 2)
    surv.toydata <- with(as.data.frame(dftoydata), Surv(time, status))
    print(surv.toydata)
    ## [1] 1 2 1 3 3 2 1 4 2 5
    dftoydata[, "time"]
    ## [1] 1 2 1 3 3 2 1 4 2 5
    ## Old versions would accept survfit(surv.toydata)
    sfit.toydata <- survfit(surv.toydata ~ 1)
    plot(sfit.toydata, conf.int = FALSE, xlab = "time",
         ylab = "Cumulative survival", lwd = 3)

The survival function: toy example 2

individ  time  status  treatment
1        3     2       a
2        3     2       a
3        2     2       b
4        4     2       b
5        1     2       b
6        1     2       a
7        5     2       a
8        4     2       b
9        1     2       a
10       4     2       b

Exercise: Construct separate life tables for groups a and b.

The survival function: toy example 2
[Figure: cumulative survival (0.0 to 1.0) against time (0 to 5), one curve each for groups a and b]

The survival function: R code
    library(survival)
    ## now we see why we need a data frame
    dftoydata2 <- data.frame(individ = 1:10,
                             time = sample(1:5, 10, replace = TRUE),
                             status = 2,
                             treatment = sample(c("a", "b"), 10, replace = TRUE))
    times.toydata2 <- with(dftoydata2, Surv(time, status))
    ## and fit one survival curve per treatment group
    sfit.toydata2 <- survfit(times.toydata2 ~ treatment, data = dftoydata2)
    plot(sfit.toydata2, lwd = 3, lty = c(1, 2), xlab = "time",
         ylab = "Cumulative survival", cex.lab = 1.5)
    legend("topright", legend = c("a", "b"), lty = c(1, 2), lwd = 3,
           cex = 2, bty = "n")

Censored data
Right censoring
- What if an entity does not fail to survive? e.g. patient does not die within observation period
- What if an entity dropped out of the analysis? e.g. patient died of other causes
Left censoring
- What if we know an entity failed to survive but not when failure occurred? e.g. exact time of death unknown
Interval censoring
- We only know failure occurred inside a given time interval.

Censored data: toy example 3

individ  time    status
1        93.06   2
2        266.68  1
3        87.56   1
4        270.87  1
5        206.27  2
6        25.69   2
7        153.26  2
8        16.75   2
9        170.52  1
10       134.20  2

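Life table: toy example
Time intervals can be arbitrary (as useful). $r_x = n_x - w_x/2$
x: end of interval; n_x: # at start; w_x: censored; r_x: at risk; d_x: deaths; q_x: Pr. death; p_x: Pr. surv. x; S_x: Pr. surv. > x

x    n_x  w_x  r_x  d_x  q_x    p_x    S_x
20   10   0    10   1    0.1    0.9    0.9
30   9    0    9    1    1/9    8/9    72/90
100  8    1    7.5  1    10/75  65/75  0.693
150  6    0    6    2    1/3    2/3    0.462
200  4    1    3.5  0    0      1      0.462
250  3    0    3    1    1/3    2/3    0.308
300  2    2    1    0    0      1      0.308

Note: individual 7 fails at 153.26, which strictly falls in the interval ending at 200; the table counts it in the interval ending at 150.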
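The actuarial adjustment $r_x = n_x - w_x/2$ can be sketched in base R on the data of toy example 3. The variable names are illustrative. The sketch bins each time strictly into right-closed intervals, so the failure at 153.26 is counted in (150, 200]; rows from 150 onward therefore need not match a table that counts that failure in the interval ending at 150.

```r
# Toy example 3: status 2 = failed, 1 = censored
time   <- c(93.06, 266.68, 87.56, 270.87, 206.27, 25.69, 153.26, 16.75, 170.52, 134.20)
status <- c(2, 1, 1, 1, 2, 2, 2, 2, 1, 2)

upper <- c(20, 30, 100, 150, 200, 250, 300)              # interval end points
bin   <- cut(time, breaks = c(0, upper), labels = upper)  # right-closed intervals

d <- as.vector(table(bin[status == 2]))   # deaths per interval
w <- as.vector(table(bin[status == 1]))   # censored per interval
n <- length(time) - c(0, cumsum(d + w)[-length(upper)])  # number at start
r <- n - w / 2                            # at risk, half-credit for the censored
q <- d / r                                # probability of death in the interval
S <- cumprod(1 - q)                       # probability of surviving past x

life_table_cens <- data.frame(x = upper, n_x = n, w_x = w, r_x = r,
                              d_x = d, q_x = q, p_x = 1 - q, S_x = S)
print(life_table_cens)
```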

Kaplan-Meier estimation
Edward Kaplan and Paul Meier published a paper in 1958 on how to deal with incomplete observations.
We need to know the moments of censoring, i.e. continuous time.

records  n.max  n.start  events  *rmean  *se(rmean)  median  0.95LCL  0.95UCL
10.0     10.0   10.0     6.0     161.7   30.3        153.3   93.1     NA
* restricted mean with upper limit = 271

To obtain 95% CIs use the normal approximation: estimate ± 1.96 SE.
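The Kaplan-Meier estimate multiplies, at each observed failure time, the fraction of the individuals at risk that survive it; censored individuals leave the risk set without contributing a factor. A minimal base R sketch on the data of toy example 3 follows. The variable names are illustrative, and the shortcut for the risk set assumes there are no tied times, which holds for these data.

```r
# Toy example 3: status 2 = failed, 1 = censored
time   <- c(93.06, 266.68, 87.56, 270.87, 206.27, 25.69, 153.26, 16.75, 170.52, 134.20)
status <- c(2, 1, 1, 1, 2, 2, 2, 2, 1, 2)

ord      <- order(time)
t_sorted <- time[ord]
dead     <- status[ord] == 2
at_risk  <- rev(seq_along(time))        # 10, 9, ..., 1 (valid because no ties)

# Factor (1 - 1/at_risk) at failures, factor 1 at censoring times
S_km <- cumprod(1 - dead / at_risk)

km <- data.frame(time = t_sorted, at_risk = at_risk, dead = dead, S = S_km)
print(km)

# Median survival: first time at which the estimate drops to 0.5 or below
median_km <- min(km$time[km$S <= 0.5])
print(median_km)
```

The median obtained this way can be compared with the value in the printed `survfit` summary.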

Kaplan-Meier estimation: toy example 3
[Figure: cumulative survival (0.0 to 1.0) against time (0 to 250)]

Kaplan-Meier estimation: R code
    library(survival)
    ## 30 individuals, uniform times, about 20% censored (status 1)
    dftoydata4 <- data.frame(individ = 1:30,
                             time = runif(30, min = 0, max = 300),
                             status = sample(c(1, 2), 30, prob = c(0.2, 0.8), replace = TRUE),
                             treatment = sample(c("a", "b"), 30, replace = TRUE))
    times.toydata4 <- with(dftoydata4, Surv(time, status))
    sfit.toydata4 <- survfit(times.toydata4 ~ treatment, data = dftoydata4)
    print(sfit.toydata4, print.rmean = TRUE)
    plot(sfit.toydata4, lwd = 1, lty = c(1, 2), xlab = "time",
         ylab = "Cumulative survival", cex.lab = 1.5, conf.int = TRUE)
    legend("bottomleft", legend = c("a", "b"), lty = c(1, 2), lwd = 3,
           cex = 2, bty = "n")

Kaplan-Meier estimation: two group comparison
[Figure: cumulative survival (0.0 to 1.0) against time (0 to 250), one curve each for groups a and b]

Risk, odds

              # people   # events (hay fever)  # non-events
Medication A  N_A = 144  D_A = 19              H_A = 125
Medication B  N_B = 146  D_B = 33              H_B = 113

Risk: R_A = ??, R_B = ??
Risk ratio: RR_A/B = ??
Odds: O_A = ??, O_B = ??
Odds ratio: OR_A/B = ??

Risk, odds
https://en.wikipedia.org/wiki/odds_ratio

              # people   # events (hay fever)  # non-events
Medication A  N_A = 144  D_A = 19              H_A = 125
Medication B  N_B = 146  D_B = 33              H_B = 113

Risk: R_A = D_A/N_A = 0.132, R_B = D_B/N_B = 0.226
Risk ratio: RR_A/B = R_A/R_B = 0.584, RR_B/A = R_B/R_A = 1.713
Odds (alternative formula?): O_A = D_A/H_A = 0.152, O_B = D_B/H_B = 0.292
Odds ratio: OR_A/B = O_A/O_B = 0.520, OR_B/A = O_B/O_A = 1.921
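The figures above can be reproduced in a few lines of base R. The sketch also checks one candidate for the "alternative formula", namely that the odds can be written in terms of the risk as R/(1 - R); the variable names mirror the slide's symbols.

```r
# Counts from the hay fever table
N_A <- 144; D_A <- 19; H_A <- 125
N_B <- 146; D_B <- 33; H_B <- 113

R_A <- D_A / N_A            # risk under medication A
R_B <- D_B / N_B            # risk under medication B
RR_AB <- R_A / R_B          # risk ratio A versus B

O_A <- D_A / H_A            # odds under medication A
O_B <- D_B / H_B            # odds under medication B
OR_AB <- O_A / O_B          # odds ratio A versus B

# Odds expressed through the risk: D/H = (D/N) / (1 - D/N)
O_A_alt <- R_A / (1 - R_A)

print(round(c(R_A = R_A, R_B = R_B, RR_AB = RR_AB,
              O_A = O_A, O_B = O_B, OR_AB = OR_AB), 3))
```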

Hazard rate (function)
We are interested in the chances of failure if we know survival occurred up to time t.
$h(t) := \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
This is the conditional density of failing at time t given survival till t.
If time is discrete, then $\Delta t = 1$.
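For a continuous T the hazard can be rewritten in terms of the survival function. This is a short worked derivation, where $f$ denotes the density of $T$ (a symbol introduced here, assuming it exists) and $S(t) = \Pr(T > t) = \Pr(T \ge t)$:

```latex
\begin{align*}
h(t) &= \lim_{\Delta t \to 0}
        \frac{\Pr(t \le T < t + \Delta t)}{\Delta t \, \Pr(T \ge t)}
      = \frac{f(t)}{S(t)}
      = -\frac{d}{dt} \log S(t), \\
S(t) &= \exp\!\left( -\int_0^t h(u)\, du \right).
\end{align*}
```

So the hazard function determines the survival function and vice versa.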

Log-rank test
We have two groups (e.g. smokers/non-smokers, no chemo/chemo, men/women).
Consider $HR := h_1(t)/h_2(t)$.
$H_0: HR = 1$
$H_a: HR \ne 1$
- The test looks at observations at each time point.
- Non-parametric.
- Under $H_0$ the test statistic has (approximately) a $\chi^2$ distribution.
- If the p-value is below the significance level (e.g. 0.05), then the survival distributions can be assumed different.
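To illustrate how the test "looks at observations at each time point", the following base R sketch computes the standard two-group log-rank statistic by hand on the data of toy example 2. At each failure time it compares the observed failures in group a with the number expected if both groups shared the same hazard. The variable names are illustrative; `survdiff` from the survival package performs this test in practice.

```r
# Toy example 2: all ten individuals fail (no censoring)
time  <- c(3, 3, 2, 4, 1, 1, 5, 4, 1, 4)
group <- c("a", "a", "b", "b", "b", "a", "a", "b", "a", "b")
event <- rep(TRUE, length(time))

event_times <- sort(unique(time[event]))
O1 <- E1 <- V <- numeric(length(event_times))

for (j in seq_along(event_times)) {
  tj   <- event_times[j]
  n_j  <- sum(time >= tj)                    # at risk in both groups
  n_1j <- sum(time >= tj & group == "a")     # at risk in group a
  d_j  <- sum(time == tj & event)            # failures at tj, both groups
  O1[j] <- sum(time == tj & event & group == "a")  # observed in group a
  E1[j] <- d_j * n_1j / n_j                        # expected under H0
  # Hypergeometric variance; zero when only one individual is at risk
  V[j] <- if (n_j > 1) {
    d_j * (n_1j / n_j) * (1 - n_1j / n_j) * (n_j - d_j) / (n_j - 1)
  } else 0
}

chisq   <- sum(O1 - E1)^2 / sum(V)                 # test statistic, 1 df
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(c(chisq = chisq, p_value = p_value))
```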

Cox regression
- To estimate h(t) from data with a fully parametric model we need a parametric form for it in terms of explanatory variables, and justifying a given model can be difficult.
- Regression methods are popular to relate predictor and response variables (here survival).
- Cox regression can be used to study effects of predictors on survival time, compare groups under different treatments, predict survival time.
- Semi-parametric method: does not assume a particular form for the hazard function but does assume a regression framework for explanatory variables.
- Explanatory variables can be discrete or continuous.
- Explanatory variables can be fixed (e.g. race, ever smoked) or time dependent (blood pressure, salary).
- Response variable: hazard rate.

Cox regression
$h(t) = h_0(t) \exp(b_1 x_1 + b_2 x_2 + \ldots + b_k x_k)$
- $h(t)$: hazard rate for an individual at time t
- $h_0(t)$: baseline hazard (all $x_i = 0$), the intercept sits here
- $b_i$s are coefficients that modify the baseline hazard
- $x_i$s are explanatory variables
- $b_i$s are obtained by e.g. (partial) maximum likelihood ($h_0(t)$ is usually not of interest)
- It is a proportional hazards model: covariates are multiplicatively related to the hazard.
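The name "proportional hazards" can be made explicit with a short worked equation. Take two individuals with covariate values $x$ and $x^*$ (notation introduced here for illustration); the baseline hazard cancels in their ratio:

```latex
\begin{align*}
\frac{h(t \mid x)}{h(t \mid x^*)}
  &= \frac{h_0(t) \exp(b_1 x_1 + \ldots + b_k x_k)}
          {h_0(t) \exp(b_1 x_1^* + \ldots + b_k x_k^*)}
   = \exp\!\left( \sum_{i=1}^{k} b_i (x_i - x_i^*) \right).
\end{align*}
```

The ratio does not depend on t. In particular, increasing $x_j$ by one unit while holding the other covariates fixed multiplies the hazard by $\exp(b_j)$.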

Cox regression: example
veteran data set in survival package
Randomized trial of two treatment regimens for lung cancer.
D. Kalbfleisch and R. L. Prentice (1980), The Statistical Analysis of Failure Time Data. Wiley, New York.
- trt: 1=standard, 2=test
- celltype: 1=squamous, 2=smallcell, 3=adeno, 4=large
- time: survival time
- status: censoring status
- karno: Karnofsky performance score (100=good)
- diagtime: months from diagnosis to randomization
- age: in years (MAY WE USE IT?)
- prior: prior therapy 0=no, 10=yes

Cox regression: example
[Figure: cumulative survival (0.0 to 1.0) against time (0 to 1000), one curve each for trt=1 and trt=2]

Cox regression: R code
    library(survival)
    v.times <- with(veteran, Surv(time, status))
    ## Kaplan-Meier curves per treatment
    v.survfit <- survfit(v.times ~ trt, data = veteran)
    plot(v.survfit, lty = c(1, 2), lwd = 1, xlab = "time",
         ylab = "Cumulative survival", cex.lab = 1.5, conf.int = TRUE)
    legend("topright", legend = c("trt=1", "trt=2"), lty = c(1, 2),
           lwd = 3, cex = 2, bty = "n")
    print(v.survfit, print.rmean = TRUE)
    ## log-rank tests
    print(survdiff(v.times ~ veteran$trt))
    print(survdiff(v.times ~ veteran$celltype))
    ## Cox regressions with an increasing number of covariates
    summary(coxph(v.times ~ karno, data = veteran))
    summary(coxph(v.times ~ karno + trt, data = veteran))
    summary(coxph(v.times ~ karno + trt + celltype, data = veteran))


Model validation
Schoenfeld residuals, also called partial residuals.
We want to test if the covariates' effect on the hazard is independent of time.
Schoenfeld (1982), Partial Residuals for The Proportional Hazards Regression Model.
A residual for each covariate c for each individual i:
$r_{ci} = x_{ci} - \sum_{k \in R_i} x_{ck}\, p_{ck}$
$p_{ck} = \exp(\beta^T X_k) / \sum_{r \in R_i} \exp(\beta^T X_r)$
The sum is an expectation w.r.t. the likelihood of failure of each individual in the risk set; $R_i$ is the set of individuals at risk when i fails.
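As a rough sketch of how these residuals can be inspected, the code below extracts them with `residuals(..., type = "schoenfeld")` from a Cox fit on the veteran data and regresses the karno residuals on the failure times. The object names are illustrative, and the plain linear regression on untransformed time is only a crude stand-in for the scaled-residual test that `cox.zph` performs.

```r
library(survival)

# Cox model on the veteran data (the same covariates as in the lecture example)
fit <- coxph(Surv(time, status) ~ karno + trt + celltype, data = veteran)

# One row per failure, one column per coefficient
res <- residuals(fit, type = "schoenfeld")
dim(res)

# Row names hold the failure times; regress the karno residuals on them
event_time  <- as.numeric(rownames(res))
slope_karno <- coef(lm(res[, "karno"] ~ event_time))[2]
print(slope_karno)   # a slope near 0 is what proportional hazards suggests
```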

Schoenfeld residuals
- Only defined for uncensored individuals, i.e. those that failed.
- Should be independent of time. In other words the slope of the regression of the residuals on time should be 0.
    library(survival)
    v.times <- with(veteran, Surv(time, status))
    CoxKarnoTrtCell <- coxph(v.times ~ karno + trt + celltype, data = veteran)
    ## terms=FALSE (survival >= 3.0) gives one row per coefficient;
    ## older versions do this by default, so drop the argument there
    ZPHktc <- cox.zph(CoxKarnoTrtCell, terms = FALSE)
    ## one plot per coefficient: karno, trt and the three celltype contrasts
    for (v in 1:5) plot(ZPHktc, var = v)

Schoenfeld residuals
[Figure: five panels of Beta(t) against Time, one each for karno, trt2, celltypesmallcell, celltypeadeno and celltypelarge]
http://www.ats.ucla.edu/stat/examples/asa/test proportionality.htm

Extending the proportional hazards model WHY?

Extending the proportional hazards model
- The basic assumption is that the hazard ratio between different groups is constant with time.
- But covariates can be time dependent, e.g. age, salary, etc.
- Modelled as an interaction term between time and covariate.
- Discrete case (i.e. fixed for time intervals): create a separate entry for every time interval where the individual has a constant value, (start, stop] version in Surv.
- Continuous case: need a parametric form for the interaction, e.g. $h(t) = h_0(t) \exp(b_1 x_1 + \ldots + b_k x_k + c\, x_{k+1} t)$.
- See the Using Time Dependent Covariates vignette, https://cran.r-project.org/web/packages/survival/index.html, especially for the veteran data analysis.
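The (start, stop] layout of the discrete case can be sketched with `survSplit` from the survival package, which cuts each individual's follow-up into one row per time interval. The cut points 90 and 180 and the names `vet2`, `tgroup` and `fit_tv` are illustrative choices; the model lets the effect of karno differ between the three periods, which is one way to relax proportional hazards, not the only one.

```r
library(survival)

# Split follow-up at days 90 and 180: one row per individual per interval
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran,
                  cut = c(90, 180), episode = "tgroup", id = "id")

# Each row now covers the interval (tstart, time]
head(vet2[, c("id", "tstart", "time", "status", "tgroup", "karno")])

# Counting-process form of Surv: (start, stop, event);
# a separate karno coefficient for each time period
fit_tv <- coxph(Surv(tstart, time, status) ~ trt + karno:strata(tgroup),
                data = vet2)
print(fit_tv)
```

Splitting only adds censored rows for the earlier intervals, so the number of failures in `vet2` is the same as in `veteran`.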

Recap
- Modelling time to event in a population.
- Kaplan-Meier curves.
- Modelling how different treatments affect failure time.
- Proportional hazards model, Cox regression.