Survival Analysis. Stat 526. April 13, 2018

Similar documents
Lecture 22 Survival Analysis: An Introduction

MAS3301 / MAS8311 Biostatistics Part II: Survival

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Survival Analysis. STAT 526 Professor Olga Vitek

Power and Sample Size Calculations with the Additive Hazards Model

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Extensions of Cox Model for Non-Proportional Hazards Purpose

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

Lecture 4 - Survival Models

Survival Analysis Math 434 Fall 2011

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

MAS3301 / MAS8311 Biostatistics Part II: Survival

8. Parametric models in survival analysis General accelerated failure time models for parametric regression

Survival analysis in R

Economics 508 Lecture 22 Duration Models

Multistate Modeling and Applications

Survival Regression Models

Survival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University

STAT331. Cox s Proportional Hazards Model

Survival Analysis I (CHL5209H)

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Duration Analysis. Joan Llull

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

STAT 135 Lab 3 Asymptotic MLE and the Method of Moments

Cox s proportional hazards model and Cox s partial likelihood

Key Words: survival analysis; bathtub hazard; accelerated failure time (AFT) regression; power-law distribution.

β j = coefficient of x j in the model; β = ( β1, β2,

A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR. Agnieszka Rossa

Lecture 8 Stat D. Gillen

Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction

A Regression Model For Recurrent Events With Distribution Free Correlation Structure

Logistic regression model for survival time analysis using time-varying coefficients

Multistate models and recurrent event models

5. Parametric Regression Model

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Unobserved Heterogeneity

Multistate models in survival and event history analysis

Beyond GLM and likelihood

Meei Pyng Ng 1 and Ray Watson 1

Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times

Multistate models and recurrent event models

Estimation for Modified Data

Exercises. (a) Prove that m(t) =

Reliability Engineering I

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

Chapter 4 Fall Notations: t 1 < t 2 < < t D, D unique death times. d j = # deaths at t j = n. Y j = # at risk /alive at t j = n

Multi-state Models: An Overview

Statistical Inference and Methods

Package threg. August 10, 2015

TMA 4275 Lifetime Analysis June 2004 Solution

Survival analysis in R

ST495: Survival Analysis: Maximum likelihood

Lecture 5 Models and methods for recurrent event data

DAGStat Event History Analysis.

ST745: Survival Analysis: Cox-PH!

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where

GOV 2001/ 1002/ E-2001 Section 10 1 Duration II and Matching

Residuals and model diagnostics

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

Cox s proportional hazards/regression model - model assessment

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Tied survival times; estimation of survival probabilities

Lecture 7. Poisson and lifetime processes in risk analysis

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

Lecture 7 Time-dependent Covariates in Cox Regression

The Weibull Distribution

Lecture 3. Truncation, length-bias and prevalence sampling

Frailty Modeling for clustered survival data: a simulation study

THESIS for the degree of MASTER OF SCIENCE. Modelling and Data Analysis

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Multi-state models: prediction

Modelling geoadditive survival data

Dynamic Models Part 1

Comparison of Hazard, Odds and Risk Ratio in the Two-Sample Survival Problem

3003 Cure. F. P. Treasure

Models for Multivariate Panel Count Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Longitudinal + Reliability = Joint Modeling

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

Likelihood Construction, Inference for Parametric Survival Distributions

Survival Prediction Under Dependent Censoring: A Copula-based Approach

Variable Selection in Competing Risks Using the L1-Penalized Cox Model

Nonparametric Model Construction

Step-Stress Models and Associated Inference

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

STAT 526 Spring Final Exam. Thursday May 5, 2011

Survival Model Predictive Accuracy and ROC Curves

7.1 The Hazard and Survival Functions

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters

Accelerated Failure Time Models

Chapter 4 Regression Models

ST745: Survival Analysis: Parametric

Philosophy and Features of the mstate package

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis

4 Testing Hypotheses. 4.1 Tests in the regression setting. 4.2 Non-parametric testing of survival between groups

Lecture 7. Proportional Hazards Model - Handling Ties and Survival Estimation Statistics Survival Analysis. Presented February 4, 2016

Ph.D. course: Regression models. Introduction. 19 April 2012

Transcription:

Survival Analysis Stat 526 April 13, 2018 1 Functions of Survival Time Let T be the survival time for a subject Then P [T < 0] = 0 and T is a continuous random variable The Survival function is defined as S(t) = P [T > t] = 1 F (t) It is clear that S(0) = 1 and S( ) = 0 The survival function can be explained by the probability of the subject to live longer than t The hazard function is defined by It is clear that h(t) = lim t 0 h(t) = P (t < T t + t T t) t f(t) 1 F (t) = f(t) S(t), where f and F are density or distribution of T It can be explained as an instant hazard of the people exposed The cumulative hazard function is defined as H(t) = t 0 h(x)dx = log[s(t)] It can be explained as the cumulative hazard of the people exposed until time t 2 Some Well Known Distributions The density function of the exponential distribution with parameter λ > 0 is f(t) = λe λt, for t > 0 It is clear that S(t) = e λt, h(t) = λ and H(t) = λt This distribution is the constant hazard distribution The density of the Weibull distribution with parameter α and β is f(t) = αγ(γt) α 1 e (γt)α 1

It is cleat that when α = 1 it is exponential distribution The distribution is F (t) = 1 e (γt)α It is clear that S(t) = e (γt)α, h(t) = αγe (γt)α = αγ(γt) α 1, and H(t) = (γt) α Thus, for Weibull distribution log[ log S(t)] = α(log γ + log t) Therefore, it is suggested to take a look at the estimated log[ log S(t)] versus log t to diagnose Weibull distribution R defines the output for the parameters as 1 The scale parameter α, and p 1 log(γ) = β 0 + β j x j 2 If α are equal for all groups, then Weibull distribution has the proportional hazard property In the default, R requires the scale parameters are the same If log(y ) follow a normal distribution, then we call Y follows a log-normal distribution Therefore, the distribution function of log-normal with parameter µ and σ 2 is F (t) = Φ( log t µ ) σ The density function of Gamma-distribution with parameter α and β is f(t) = βα Γ(α) tα 1 e βt 3 Non-parametric Estimate the Survival Function There is no assumption for the distributions Let us look at the leukemia dataset In Table 1 There are total 42 observations in this dataset The objective is to test the drug effect Thus, 21 of them took drugs and 21 of them took placebo The assignment for drugs or placebo are completely randomized The response is the time (weeks) that the patient live The predictor censor tells us the corresponding patient drop off from the experiment at the time Thus, we only know that the patient live longer than the time we record In this dataset, 1 means not dropoff or relapse and 0 means dropoff or censored In this dataset, no patients in placebo dataset dropped off 2

Table 1: Survival Data For Leukemia treat censor weeks treat censor weeks drug 0 6 placebo 1 1 drug 1 6 placebo 1 1 drug 1 6 placebo 1 2 drug 1 6 placebo 1 2 drug 1 7 placebo 1 3 drug 0 9 placebo 1 4 drug 0 10 placebo 1 4 drug 1 10 placebo 1 5 drug 0 11 placebo 1 5 drug 1 13 placebo 1 8 drug 1 16 placebo 1 8 drug 0 17 placebo 1 8 drug 0 19 placebo 1 8 drug 0 20 placebo 1 11 drug 1 22 placebo 1 11 drug 1 23 placebo 1 12 drug 0 25 placebo 1 12 drug 0 32 placebo 1 15 drug 0 32 placebo 1 17 drug 0 34 placebo 1 22 drug 0 35 placebo 1 23 3

Suppose there is no dropoff Then, the empirical-survival function is Ŝ(t) = 1 #{P atient > t} #P atient where #{P atient > t} means the number of the patient live longer than t Next, let us consider the dataset with censoring subjects Let (T i, C i ) be the record for observation i with observation (t i, δ i ), where T i is the time and C i is the indicator of censoring Assume that t 1 t 2 t n Then, if observation i relapse (δ i = 1) then the instant density is f(t i ) and if observation i censors (δ i = 0) then the observed probability is S(t i ) Thus, we have the likelihood function is L(t 1,, t n ) = f(t i ) S(t i ) = h(t i ) δ i δ i =1 δ i =0 δ i =1 n S(t i ) i=1 Thus, we can assume i Ŝ(t i ) = (1 λ i ), where 0 λ 1,, λ n < 1, because S(t) is right-continuous Then, the mass function at t 1,, t n is i 1 ˆf(t i ) = Ŝ(t i 1) Ŝi = λ i (1 λ j ) Let d i die and n i happen at t i Then, the likelihood at t i part is ˆL i = ˆf(t i ) d i Ŝ(t i ) n i d i = λ d i i (1 λ i ) n i d i Thus, we have L = i λ d i i (1 λ i ) n i d i Thus, Therefore, we have ˆλ i = d i n i Ŝ(t) = (1 d i ), n i where d i is the number of death and n i is the number of at risk (remaining) It is clear that Thus, log Ŝ(t) = log(1 d i ), n i Vˆar[log Ŝ(t)] = d i n i (n i d i ) There is another options (Greenwood s Formula) Vˆ ar[ŝ(t)] = d i Ŝ(t)2 n i (n i d i ) 4

This estimator is called Kaplan Meier estimator It is the ML estimator of the survival function Let us take a look at the estimator of the treatment group for the leukemia dataset The event happened at points: 6, 7, 9, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 34, 35 totally 16 points The number of death and at risk at the above time is (3, 21), (1, 17), (0, 16), It is clear that when t < 6, Ŝ(t) = 1; when 6 t < 7, d i = 3 and n i = 21, when 7 t < 9, Ŝ(t) = (1 3 21 ) = 6 7 ; Ŝ(t) = 6 7 (1 1 17 ); and so on The significance test is based on a log-rank test See detail in my output file The analysis result for this data set tells us a significant drug effect exist because the p-value is 00000417 4 Parametric Method Let us still consider the leukemia dataset If we assume the distribution is exponential, then there are two parameter λ t and λ p for treatment and placebo groups Let us look at the treatment group We have h(t) = λ t and S(t) = e λtt Then, the likelihood function is δi n L = λt exp[ λ t t i ] i=1 Thus, ˆλ t = ni=1 δ i ni=1 t i For the treatment group, we have ˆλ t = 9 359 In R, the estimate of treatment is defined by ˆβ t = log(ˆλ t ) = 369 Similarly, we have ˆβ p = 216 They are significant different because the p-value is 0000049 Let us look at the Weibull distribution The hazard function is h(t) = αλ α t α 1 It is clear that the hazard is a decreasing function of t if α < 1 and a increasing function of t if α > 1 When α > 1, we call it is an accelerated life models It is not easy to compute the estimate of the parameters for Weibull distribution Thus, a numerical method is suggested R has a function to fit the Weibull model R rewrite the model into h(t) = αt α 1 exp[ β x], 5

and call α scale parameter The output tells us that ˆα = 07322, and ˆβ t = 35157, ˆβ p = 2248 Thus, the hazard of treatment group is less than the hazard of placebo group We assume the scale parameter α is the same for the two groups Thus, if we rewrite αβ by β Then, the hazard function becomes h(t) = αt α 1 exp( β x) = h 0 (t) exp( β x) It is clear that h 0 (t) does not dependent on the covariates or treatment proportional hazard model We call this model 5 Semi-parametric method: proportional hazards model The assumption for cox-proportional hazards model is h(t) = h 0 (t)e β x, where x is a vector of predictors and β is a vector of parameters independent of time t, h 0 is only a function of t; that is where x j are covariates or treatment p 1 β x = β 0 + β j x j, Cox suggests to look at the partial likelihood L p = n e β x i=1 T t i e β x to estimate the parameters This function is called partial likelihood function It is the hazards of observation at t i over the sum of the hazard of all the remaining We can prove that the definition above is equivalent to S(t) = [S 0 (t)] eβ x The estimates are the values to maximize the partial likelihood function In the leukemia example, β t = 0 and β p = 157 The p-value is 0000014 Thus, the treatment is significant Once the parameters are estimated, we can estimate the baseline hazard function R has those functions In R output, we can directly read the survival function Ŝ(t) which should be a step function Then, the hazard function at time t i is and or h(t i ) = S t i 1 S ti S ti ; Ĥ(t i ) = j i h(t i )(t j+1 t j ); Ĥ(t i ) = j i h(t i )(t j t j 1 ) The non-parametric Kaplan-Meier estimator is the MLE of survival function It is suggested to compare the result with this estimator to diagnose the model 6