REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA: THE PROPORTIONAL HAZARDS (COX) MODEL
ST520, Department of Statistics, North Carolina State University
Presented by: Butch Tsiatis, Department of Statistics, NCSU

Outline
- Why standard regression models aren't used with censored survival data
- Modeling the hazard rate as a function of covariates
- The Proportional Hazards Model
- Interpreting the regression coefficients

Regression Models with Survival Data
A major focus in any study is to characterize the relationship between a response $Y$ and covariates (prognostic factors) $X_1, \ldots, X_k$.
For a continuous response $Y$, the most popular model is the multiple linear regression model
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + e$$
The regression coefficients $\beta_0, \beta_1, \ldots, \beta_k$ in such a model have a nice interpretation: they describe the direction and strength of the relationship between each prognostic factor and $Y$.
Reminder: if the $j$-th variable $X_j$ is increased by one unit and all other variables are kept the same, the mean response changes by $\beta_j$ units.

Multiple Linear Regression
Such models accommodate
- continuous variables
- discrete variables (dummy variables)
- interactions
- polynomial regression
The parameters are easy to estimate using least squares.
The properties of these models have been studied extensively.

Multiple Linear Regression
In chronic disease clinical trials we argued that the primary endpoint (response variable) is survival time $T$.
Since $T$ is a continuous positive random variable, it would seem natural to model the relationship between survival time (or possibly some simple transformation of survival time) and covariates using multiple linear regression:
$$\log T = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + e$$
This may be a reasonable strategy if the survival time $T$ were observed (uncensored) for everyone in the study.

Difficulties with Linear Regression Models
- Survival times are most often right censored. This creates difficulties in estimating the parameters $\beta_0, \beta_1, \ldots, \beta_k$: least squares no longer provides good estimates.
- Linear regression does not accommodate time-dependent covariates (i.e., covariates that change over time), for example:
  - cumulative exposure to a risk factor
  - blood pressure
  - heart transplant status
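To illustrate why least squares breaks down under right censoring, here is a minimal simulation sketch that is not part of the original slides. The sample size, coefficients, and censoring distribution are arbitrary assumptions chosen for illustration. It treats the censored time on study as if it were the survival time and compares the result with the fit based on the true survival times, which would not be available in practice.

```r
# Simulated illustration; all numeric settings below are assumptions.
set.seed(520)
n      <- 2000
x      <- rnorm(n)                      # a single covariate
logT   <- 1 + 0.5 * x + rnorm(n)        # true model: log T = 1 + 0.5 X + e
T_true <- exp(logT)                     # true survival times
C      <- rexp(n, rate = 0.2)           # independent censoring times
U      <- pmin(T_true, C)               # observed time on study
delta  <- as.numeric(T_true <= C)       # 1 = death observed, 0 = censored

fit_true  <- lm(logT ~ x)               # only possible without censoring
fit_naive <- lm(log(U) ~ x)             # treats censored times as deaths

prop_censored <- mean(delta == 0)       # fraction of censored patients
coef(fit_true)
coef(fit_naive)
```

Because the time on study can never exceed the survival time, the naive fit works with responses that are too small for every censored patient, so it systematically understates survival. How large the distortion is depends on the amount of censoring.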

Modeling the Hazard Function
Data from a clinical trial with a survival endpoint can be summarized as
$$(U_i, \Delta_i, X_{1i}, \ldots, X_{ki}), \quad i = 1, \ldots, n,$$
where, for patient $i$ among a sample of $n$ patients,
- $U_i$ denotes the time on study
- $\Delta_i$ denotes the failure indicator (1 = death, 0 = censored)
- $X_{1i}, \ldots, X_{ki}$ denote the values of the $k$ covariates
With such data it turns out to be more convenient to model the hazard function of dying rather than the survival time itself.

The hazard rate
The hazard rate or hazard function is defined as
$$\lambda(t) = \lim_{h \to 0} \left\{ \frac{P(t \le T < t + h \mid T \ge t)}{h} \right\}.$$
Models for the hazard rate are given by considering $\lambda(t \mid X_1, \ldots, X_k)$, where $\lambda(t \mid X_1 = x_1, \ldots, X_k = x_k)$ means the hazard rate (mortality rate) at time $t$ for individuals in the population whose $X_1$ value equals $x_1$, ..., and whose $X_k$ value equals $x_k$.
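The limit in this definition can be checked numerically. The sketch below assumes Weibull survival times (the shape, scale, and time point are illustrative choices, not from the slides) and compares the conditional failure probability divided by $h$ with the known closed-form Weibull hazard as $h$ shrinks.

```r
# Weibull survival times; shape, scale and t0 are illustrative assumptions.
shape <- 1.5
scale <- 10
t0    <- 5

# P(t <= T < t + h | T >= t) / h, computed from the survival function
hazard_approx <- function(t, h) {
  S_t  <- pweibull(t,     shape, scale, lower.tail = FALSE)  # P(T >= t)
  S_th <- pweibull(t + h, shape, scale, lower.tail = FALSE)  # P(T >= t + h)
  (S_t - S_th) / S_t / h
}

# Closed-form Weibull hazard: (shape / scale) * (t / scale)^(shape - 1)
hazard_exact <- (shape / scale) * (t0 / scale)^(shape - 1)

h_values      <- c(1, 0.1, 0.01, 0.001)
approx_values <- sapply(h_values, function(h) hazard_approx(t0, h))
cbind(h = h_values, approx = approx_values, exact = hazard_exact)
```

As $h$ decreases, the approximation should settle on the exact hazard at `t0`. Changing `t0` shows that the hazard is a function of time, not a single number.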

Hazard rate
So, for example, suppose survival time is measured as length of life after treatment for leukemia, $X_1$ denotes age at time of treatment, and $X_2$ denotes gender (0 = male, 1 = female). Then, roughly speaking, $\lambda(5 \mid X_1 = 55, X_2 = 1) = .10$ means that, given that a woman who was 55 years of age when starting treatment is still alive 5 years after treatment, the probability of dying in the next year is .10.
Notice that both the time $t$ at which the hazard is measured and the covariates $X_1, \ldots, X_k$ are important in this relationship.
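The phrase "roughly speaking" can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the hazard stays constant at .10 per year over the following year. Under that assumption the conditional probability of dying within the year is $1 - \exp(-.10)$, which is close to, but not exactly, .10.

```r
hazard <- 0.10                          # hazard per year at t = 5, from the example
# Assumption: the hazard is constant over the next year.
p_next_year <- 1 - exp(-hazard * 1)     # P(die within 1 year | alive at 5 years)
round(p_next_year, 4)                   # 0.0952
```

The shorter the interval, the closer the conditional probability is to the hazard times the interval length.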

Modeling the hazard rate
Models explore the relationship of the hazard rate to time and covariates.
The most popular model is the proportional hazards regression model introduced by D.R. Cox (1972), often referred to as the Cox regression model.
In this model it is assumed that
$$\lambda(t \mid X_1, \ldots, X_k) = \lambda_0(t) \exp(\beta_1 X_1 + \cdots + \beta_k X_k)$$

How to interpret the Cox Model
$$\lambda(t \mid X) = \lambda_0(t) \exp(\beta X)$$
First let us consider only one covariate.
- If $X = 0$, then the hazard at time $t$ is $\lambda_0(t)$.
- $\lambda_0(t)$ is referred to as the baseline hazard function.
- No assumption is made regarding the shape of this function over time $t$.
- This makes the Cox model a semiparametric model.

How to interpret the Cox Model
This model can also be written as
$$\frac{\lambda(t \mid X = x)}{\lambda_0(t)} = \exp(\beta x),$$
where $\lambda(t \mid X = x) / \lambda_0(t)$ is the ratio of the hazard rate at time $t$ for an individual whose covariate value is $X = x$ to the hazard rate at time $t$ for an individual whose covariate value is $X = 0$.
For two covariate values $x_1$ and $x_0$,
$$\frac{\lambda(t \mid X = x_1)}{\lambda(t \mid X = x_0)} = \frac{\lambda_0(t) \exp(\beta x_1)}{\lambda_0(t) \exp(\beta x_0)} = \exp\{\beta(x_1 - x_0)\}$$
This ratio of hazard rates is sometimes referred to as the relative risk.
The Cox model implicitly assumes that the relative risk is constant over time; this is the so-called proportional hazards assumption.
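The cancellation of the baseline hazard can be seen numerically. In the sketch below, the baseline hazard `lambda0` and the coefficient `beta` are hypothetical values picked only for illustration. The hazard ratio between two covariate values is evaluated at several times and compared with $\exp\{\beta(x_1 - x_0)\}$.

```r
# Hypothetical baseline hazard and coefficient (illustration only).
lambda0 <- function(t) 0.02 * sqrt(t)            # baseline hazard, varies with t
beta    <- 0.5
hazard  <- function(t, x) lambda0(t) * exp(beta * x)

x1 <- 3
x0 <- 1
times <- c(0.5, 1, 2, 5, 10)

ratio    <- hazard(times, x1) / hazard(times, x0)   # hazard ratio at each time
expected <- exp(beta * (x1 - x0))                   # exp{beta (x1 - x0)}
cbind(time = times, ratio = ratio, expected = expected)
```

The ratio is the same at every time point even though the baseline hazard itself changes with $t$. Replacing `lambda0` with any other positive function leaves the ratio unchanged.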

How to interpret the Cox Model
Suppose $x_1 > x_0$.
- If $\beta > 0$, then $\exp\{\beta(x_1 - x_0)\} > 1$, implying that the hazard rate is higher for individuals with $X = x_1$ than for those with $X = x_0$. Moreover, because of proportional hazards, this higher hazard rate holds at all times. Consequently, the greater the value of $X$, the higher the hazard of dying, resulting in shorter survival times on average.
- If $\beta = 0$, then the hazard rate doesn't change with the value of $X$. This corresponds to the null hypothesis that $X$ has no effect on survival: everyone in the population, regardless of their value of $X$, has the same hazard rate $\lambda_0(t)$.
- If $\beta < 0$, then the hazard rate decreases with increasing $X$, resulting in longer survival times.

Example
Let $X$ be a binary indicator. For example, let $X = 1$ denote women with Stage II, node-positive breast cancer receiving high intensity CAF therapy and $X = 0$ those receiving low intensity therapy.
$\beta = -.33$; i.e.,
$$\frac{\lambda(t \mid X = 1)}{\lambda(t \mid X = 0)} = \exp(-.33) = .72$$
This means that the hazard of dying for women receiving high intensity therapy is .72 times that of women receiving low intensity therapy; i.e., high-dose therapy increased longevity.

Example
Let $X$ denote the number of involved nodes at the time of treatment, with $\beta = .06$.
$$\frac{\lambda(t \mid X = x_1)}{\lambda(t \mid X = x_0)} = \exp\{\beta(x_1 - x_0)\}$$
So, for example, to derive the relative risk for a woman with 11 involved nodes at time of treatment compared to a woman with 2 involved nodes, we take $x_1 = 11$ and $x_0 = 2$ to obtain a relative risk of
$$\exp\{.06(11 - 2)\} = 1.72$$
The woman with 11 involved nodes has about 1.7 times (almost twice) the risk of death of the woman with 2 involved nodes.
Clearly, the fact that $\beta$ is positive also implies that the risk of death increases with the number of involved nodes.
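The arithmetic in the involved-nodes example is easy to reproduce. The short sketch below uses the coefficient .06 from the slide; the helper name `relative_risk` is introduced here for illustration only.

```r
beta_nodes <- 0.06   # coefficient for number of involved nodes, from the example

# Relative risk for covariate value x1 versus x0 (helper name is hypothetical)
relative_risk <- function(x1, x0, beta) exp(beta * (x1 - x0))

rr_one_node <- relative_risk(1, 0, beta_nodes)    # one additional node
rr_11_vs_2  <- relative_risk(11, 2, beta_nodes)   # 11 nodes versus 2 nodes
round(c(rr_one_node, rr_11_vs_2), 2)              # 1.06 1.72
```

Note that the relative risk depends only on the difference in node counts, so 11 versus 2 nodes gives the same value as 20 versus 11.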

Multiple Covariates
The model
$$\lambda(t \mid X_1, \ldots, X_k) = \lambda_0(t) \exp(\beta_1 X_1 + \cdots + \beta_k X_k)$$
implies that
$$\frac{\lambda(t \mid X_1, \ldots, X_k)}{\lambda_0(t)} = \exp(\beta_1 X_1 + \cdots + \beta_k X_k)$$
More importantly, the regression coefficient $\beta_j$ associated with the covariate $X_j$ indicates the direction and strength of the relationship that $X_j$ has with the risk of dying, adjusting for the effect of the other covariates.

Multiple Covariates
For example, suppose we are jointly considering the relationship of $X_1, \ldots, X_k$ to the risk of dying using a proportional hazards model.
The effect of increasing $X_j$ by one unit on the risk of dying, keeping all other variables the same, is
$$\frac{\lambda(t \mid X_1 = x_1, \ldots, X_{j-1} = x_{j-1}, X_j = x_j + 1, X_{j+1} = x_{j+1}, \ldots, X_k = x_k)}{\lambda(t \mid X_1 = x_1, \ldots, X_{j-1} = x_{j-1}, X_j = x_j, X_{j+1} = x_{j+1}, \ldots, X_k = x_k)}$$
$$= \frac{\lambda_0(t) \exp(\beta_1 x_1 + \cdots + \beta_{j-1} x_{j-1} + \beta_j (x_j + 1) + \beta_{j+1} x_{j+1} + \cdots + \beta_k x_k)}{\lambda_0(t) \exp(\beta_1 x_1 + \cdots + \beta_{j-1} x_{j-1} + \beta_j x_j + \beta_{j+1} x_{j+1} + \cdots + \beta_k x_k)} = \exp(\beta_j)$$
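The same cancellation can be checked numerically with several covariates. In this sketch the coefficients, baseline hazard, and covariate values are all hypothetical. One covariate is increased by one unit while the others are held fixed, and the resulting hazard ratio is compared with $\exp(\beta_j)$.

```r
# Hypothetical coefficients and baseline hazard (illustration only).
beta    <- c(0.3, -0.2, 0.05)
lambda0 <- function(t) 0.01 * t
hazard  <- function(t, x) lambda0(t) * exp(sum(beta * x))

x <- c(1, 0, 12)            # arbitrary covariate values
j <- 3                      # covariate to increase by one unit
x_plus    <- x
x_plus[j] <- x[j] + 1

ratio <- hazard(4, x_plus) / hazard(4, x)    # evaluated at t = 4
c(ratio = ratio, exp_beta_j = exp(beta[j]))
```

The ratio does not depend on the time point or on the values of the other covariates, which is what "adjusting for the effect of the other covariates" means in this model.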

Statistical inference
As in any statistical problem, we don't get to see the true population relationship between the hazard rate and the covariates. That is, we never know the true values of the regression coefficients $\beta$.
Instead, these must be estimated from a sample of data $(U_i, \Delta_i, X_{1i}, \ldots, X_{ki})$, $i = 1, \ldots, n$.
Consequently, we obtain estimators of $\beta_1, \ldots, \beta_k$, which are denoted by $\hat\beta_1, \ldots, \hat\beta_k$.
Estimators for the $\beta$'s are obtained by maximizing the partial likelihood (an incredibly clever idea that Cox developed).
This methodology also provides standard errors for the estimators of $\beta$.
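The slides do not write out the partial likelihood, so the following is only a sketch of the standard form for data without tied death times: each observed death contributes the ratio of that patient's $\exp(\beta x)$ to the sum of $\exp(\beta x)$ over everyone still at risk at that time. The toy data are made up for illustration, and `optimize` is used in place of the Newton-type algorithm that `coxph` uses.

```r
# Toy data (made up for illustration): no tied times, one covariate.
time   <- c(2, 3, 5, 7, 8, 11, 13, 17)
status <- c(1, 1, 0, 1, 1, 0, 1, 1)       # 1 = death, 0 = censored
x      <- c(1, 0, 1, 1, 0, 0, 1, 0)

# Negative log partial likelihood for a single coefficient beta.
neg_log_pl <- function(beta) {
  eta <- beta * x
  contrib <- sapply(which(status == 1), function(i) {
    at_risk <- time >= time[i]            # risk set at the i-th death time
    eta[i] - log(sum(exp(eta[at_risk])))
  })
  -sum(contrib)
}

fit      <- optimize(neg_log_pl, interval = c(-5, 5))
beta_hat <- fit$minimum
beta_hat

# Optional comparison, run only if the survival package is installed.
if (requireNamespace("survival", quietly = TRUE)) {
  library(survival)
  print(coef(coxph(Surv(time, status) ~ x)))
}
```

With no ties, the two estimates should agree up to the optimizer's tolerance. Censored patients never contribute a term of their own, but they do appear in the risk sets of earlier deaths, which is how the partial likelihood uses the information they provide.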

Statistical Inference
Therefore, we can get a good idea of the range in which the coefficient $\beta_j$ lies by considering the 95% confidence interval
$$\hat\beta_j \pm 1.96 \, se(\hat\beta_j),$$
where $se(\hat\beta_j)$ is the standard error of the estimator $\hat\beta_j$.
If the value $\beta_j = 0$ (the null hypothesis that variable $X_j$ has no effect on survival) is not contained in the confidence interval, this can be used as evidence that $X_j$ has a significant effect on survival (the direction of the effect depends on the sign of $\hat\beta_j$).
You can also assess the strength of the evidence by computing a p-value; that is, how far out in the tail of a standard normal distribution the standardized test statistic
$$\frac{\hat\beta_j}{se(\hat\beta_j)}$$
lies.

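Example Using R

    library(survival)

    # Read the breast cancer data (columns used here: time, status, treatment)
    data <- read.table("cal8541.dat")
    time <- data[, 1]
    status <- data[, 2]
    trt <- data[, 3]

    # Keep treatment arms 1 and 2 only, as in the SAS example
    newdata <- subset(data, trt < 3)
    newtime <- newdata[, 1]
    newstatus <- newdata[, 2]
    newtrt <- newdata[, 3] - 1   # recode treatment as 0/1

    # Fit the proportional hazards model with treatment as the only covariate
    ph <- coxph(Surv(newtime, newstatus) ~ newtrt)
    ph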

Example Using R (Results)

    Call:
    coxph(formula = Surv(newtime, newstatus) ~ newtrt)

            coef exp(coef) se(coef)    z      p
    newtrt 0.329      1.39    0.105 3.13 0.0018

    Likelihood ratio test=9.86 on 1 df, p=0.00169
    n= 987, number of events= 367

One can also fit a multiple regression, for example:

    meno <- newdata[, 4]
    ph <- coxph(Surv(newtime, newstatus) ~ newtrt + meno)
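As a sketch of how the confidence interval and p-value described earlier follow from this kind of output, the calculation below uses the rounded coefficient and standard error reported for `newtrt` in the single-covariate fit. Because it starts from rounded values, the p-value may differ slightly in the last digit from the one R prints.

```r
coef_hat <- 0.329     # coef for newtrt, from the reported output
se_hat   <- 0.105     # se(coef) for newtrt, from the reported output

ci_beta <- coef_hat + c(-1, 1) * 1.96 * se_hat   # 95% CI for beta
ci_hr   <- exp(ci_beta)                          # 95% CI for the hazard ratio
z       <- coef_hat / se_hat                     # standardized test statistic
p_value <- 2 * pnorm(-abs(z))                    # two-sided p-value

round(ci_beta, 3)
round(ci_hr, 2)
round(z, 2)
signif(p_value, 2)
```

The interval for $\beta$ excludes 0 (equivalently, the interval for the hazard ratio excludes 1), which matches the small p-value in the output.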

Example Using SAS

    options ps=59 ls=80;

    data bcancer;
      infile 'tsiatis/butch/cal8541.dat';
      input days cens trt meno tsize nodes er;
      years = days/365.25;

    data bcancer1;
      set bcancer;
      if trt = 1 or trt = 2;

    proc phreg data=bcancer1;
      model years*cens(0) = trt meno;
    run;