Cox regression part 2. Outline: Recapitulation. Estimation of cumulative hazards and survival probabilities. Ørnulf Borgan


Cox regression part 2
Ørnulf Borgan, Department of Mathematics, University of Oslo
NORBIS course, University of Oslo, 4-8 December 2017

Outline:
Recapitulation
Estimation of cumulative hazards and survival probabilities
Assumptions for Cox regression and check of model assumptions

Recapitulation

Assume that we have a sample of $n$ individuals, and let $N_i(t)$ count the observed occurrences of the event of interest for individual $i$ as a function of (study) time $t$.

We have the decomposition (observation = signal + noise)

$$dN_i(t) = \lambda_i(t)\,dt + dM_i(t)$$

The intensity process for individual $i$ may be given as

$$\lambda_i(t) = Y_i(t)\,\alpha(t \mid x_i)$$

where $Y_i(t)$ is the at-risk indicator and $\alpha(t \mid x_i)$ is the hazard rate (intensity). Time-dependency of the covariates is suppressed in the notation.

Assume that the hazard rate for individual $i$ takes the form

$$\alpha(t \mid x_i) = \alpha_0(t)\, r(\beta, x_i(t))$$

The common choice of relative risk function is

$$r(\beta, x_i(t)) = \exp\{\beta^T x_i(t)\} = \exp\{\beta_1 x_{i1}(t) + \cdots + \beta_p x_{ip}(t)\}$$

which gives Cox's regression model. Here $\alpha_0(t)$ is the baseline hazard and $r(\beta, x_i(t))$ is the hazard ratio (relative risk).

$e^{\beta_j}$ is the hazard ratio (HR), often called relative risk, for one unit's increase in the $j$-th covariate, keeping all other covariates the same.
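The interpretation of $e^{\beta_j}$ as a hazard ratio can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the data are simulated (not the course data), the baseline hazard is taken to be constant, and the names `beta_true`, `event_time` and `censor_time` are hypothetical.

```r
library(survival)

set.seed(42)
n <- 2000
x <- rnorm(n)                       # one numeric covariate (simulated)
beta_true <- 0.5                    # assumed true log hazard ratio

# Assumed constant baseline hazard 0.1, so the hazard is 0.1 * exp(beta_true * x)
event_time  <- rexp(n, rate = 0.1 * exp(beta_true * x))
censor_time <- rexp(n, rate = 0.05) # independent censoring
time   <- pmin(event_time, censor_time)
status <- as.numeric(event_time <= censor_time)

fit <- coxph(Surv(time, status) ~ x)

beta_hat <- unname(coef(fit))
hr_hat   <- exp(beta_hat)           # hazard ratio for a one-unit increase in x
print(c(beta_hat = beta_hat, HR = hr_hat))
print(exp(confint(fit)))            # 95% confidence interval on the HR scale
```

With a sample of this size the estimate should land close to the assumed value 0.5, so the estimated hazard ratio should be close to $e^{0.5}$.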

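Partial likelihood and estimation of β

Ordinary ML-estimation does not work for the relative risk regression models (due to the nonparametric baseline). Instead we have to use Cox's partial likelihood

$$L(\beta) = \prod_{T_j} \frac{\exp\{\beta^T x_{i_j}(T_j)\}}{\sum_{l \in \mathcal{R}_j} \exp\{\beta^T x_l(T_j)\}}$$

Here $i_j$ is the index of the individual who experiences an event at $T_j$, and $\mathcal{R}_j$ is the risk set at $T_j$. The value of $\beta$ that maximizes the partial likelihood is denoted $\hat\beta$.

Cumulative hazards and survival probabilities

We will estimate the cumulative baseline hazard

$$A_0(t) = \int_0^t \alpha_0(u)\,du$$

We take the aggregated counting process $N_\bullet(t) = \sum_{i=1}^n N_i(t)$ as our starting point. Its intensity process is given by

$$\lambda_\bullet(t) = \sum_{i=1}^n \lambda_i(t) = \Big\{ \sum_{i=1}^n Y_i(t) \exp\{\beta^T x_i(t)\} \Big\}\, \alpha_0(t)$$

If we had known $\beta$, we could have repeated the argument we used to derive the Nelson-Aalen estimator to show that we could estimate $A_0(t)$ by

$$\hat A_0(t; \beta) = \sum_{T_j \le t} \frac{1}{\sum_{l \in \mathcal{R}_j} \exp\{\beta^T x_l(T_j)\}}$$

Since $\beta$ is unknown, we replace it by $\hat\beta$ to obtain the Breslow estimator:

$$\hat A_0(t) = \sum_{T_j \le t} \frac{1}{\sum_{l \in \mathcal{R}_j} \exp\{\hat\beta^T x_l(T_j)\}}$$

If all covariates are fixed, the cumulative hazard corresponding to an individual with covariate vector $x_0$ is

$$A(t \mid x_0) = \exp(\beta^T x_0)\, A_0(t)$$

and it may be estimated by

$$\hat A(t \mid x_0) = \exp(\hat\beta^T x_0)\, \hat A_0(t)$$

The corresponding survival function is given by

$$S(t \mid x_0) = \prod_{u \le t} \{1 - dA(u \mid x_0)\} = \exp\{-A(t \mid x_0)\}$$

and it may be estimated by

$$\hat S(t \mid x_0) = \prod_{T_j \le t} \{1 - \Delta \hat A(T_j \mid x_0)\}$$

Alternatively we may use (as is done in R):

$$\tilde S(t \mid x_0) = \exp\{-\hat A(t \mid x_0)\}$$

For practical purposes there is little difference between the two estimators.

The estimators of the cumulative hazards and survival functions are approximately normal, and their variances may be estimated as described in section 4.1.6 in ABG.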
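To make the Breslow estimator concrete, the sum over event times can be computed by hand and compared with what the `survival` package returns. This is a sketch on simulated data with one fixed covariate and no tied event times; the variable names (`A0_hand`, `S_tilde`, and so on) are illustrative assumptions, not part of the course material.

```r
library(survival)

set.seed(1)
n <- 200
x <- rnorm(n)                                  # one fixed covariate (simulated)
event_time  <- rexp(n, rate = exp(0.7 * x))    # assumed true beta = 0.7
censor_time <- rexp(n, rate = 0.3)
time   <- pmin(event_time, censor_time)
status <- as.numeric(event_time <= censor_time)

fit <- coxph(Surv(time, status) ~ x, ties = "breslow")
beta_hat <- unname(coef(fit))

# Breslow estimator by hand: at each event time T_j add
# 1 / sum over the risk set of exp(beta_hat * x_l)
event_times <- sort(time[status == 1])
increments <- sapply(event_times, function(tj) {
  at_risk <- time >= tj                        # risk set at T_j
  1 / sum(exp(beta_hat * x[at_risk]))
})
A0_hand <- cumsum(increments)

# Cumulative baseline hazard from the package, evaluated at x = 0
bh <- basehaz(fit, centered = FALSE)
A0_pkg <- bh$hazard[match(event_times, bh$time)]
print(all.equal(A0_hand, A0_pkg, tolerance = 1e-6))

# Survival estimate exp{-A(t | x0)} for an individual with x0 = 1
sf <- survfit(fit, newdata = data.frame(x = 1))
S_tilde <- exp(-exp(beta_hat * 1) * A0_hand)
S_pkg   <- as.numeric(sf$surv)[match(event_times, sf$time)]
print(all.equal(S_tilde, S_pkg, tolerance = 1e-6))
```

The second comparison reflects the remark that R uses the exponential form of the survival estimator rather than the product form.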

Melanoma data

We first compare Nelson-Aalen estimates (black lines) with the cumulative hazards obtained from a Cox model with sex as only covariate (red lines).

[Figure: cumulative hazard against years since operation, for females and males.]

Using R

The results on the previous slide are obtained by the following commands:

    library(survival)

    # We consider a model with sex as the only covariate and start by
    # making Nelson-Aalen plots for females and males:
    fit.ss=coxph(Surv(lifetime,status==1)~strata(sex),data=melanoma)
    surv.ss=survfit(fit.ss)
    plot(surv.ss,fun="cumhaz", mark.time=FALSE,xlim=c(0,10),ylim=c(0,0.7),
         xlab="Years since operation",ylab="Cumulative hazard",lty=c(1,3),lwd=2)
    legend("topleft",c("Females","Males"),lty=c(1,3),lwd=2)

    # We then fit a Cox model with sex as the only covariate and plot
    # the model based estimates of the cumulative hazards in the same plot:
    fit.s=coxph(Surv(lifetime,status==1)~factor(sex),data=melanoma)
    surv.s=survfit(fit.s,newdata=data.frame(sex=c(1,2)))
    lines(surv.s,fun="cumhaz", mark.time=FALSE, lty=c(1,3),lwd=2,col="red")

Then we consider a model with sex, thickness and ulceration:

                  β̂       HR     se(β̂)     Z        P
    Sex           0.459   1.58   0.267    1.72    0.085
    Thickness     0.113   1.12   0.038    2.99    0.0028
    Ulceration   -1.667   0.31   0.311   -3.75    0.00018

We will estimate cumulative hazards and survival functions for females with the following combinations of tumor thickness and ulceration:
1) Thickness: 1 mm, ulceration: absent
2) Thickness: 2 mm, ulceration: absent
3) Thickness: 2 mm, ulceration: present
4) Thickness: 4 mm, ulceration: present

Estimated cumulative hazards: [figure]
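The columns of the coefficient table are linked by simple arithmetic: HR $= e^{\hat\beta}$, $Z = \hat\beta / se(\hat\beta)$, and $P$ is the two-sided normal tail probability of $Z$. The sketch below recomputes these for the sex and thickness rows, reading the estimates as 0.459 and 0.113 with standard errors 0.267 and 0.038. Because these inputs are rounded, the recomputed values may differ slightly in the last digit from those on the slide.

```r
# Estimates and standard errors as read from the table (rounded values)
beta <- c(sex = 0.459, thickness = 0.113)
se   <- c(sex = 0.267, thickness = 0.038)

hr <- exp(beta)              # hazard ratio per one-unit increase
z  <- beta / se              # Wald test statistic
p  <- 2 * pnorm(-abs(z))     # two-sided P-value from the normal approximation

print(round(cbind(beta, HR = hr, se, Z = z, P = p), 4))
```

For thickness, the hazard ratio applies per millimetre, so under the fitted model a difference of 2 mm corresponds to a hazard ratio of `exp(2 * 0.113)`.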

Estimated survival functions: [figure]

Using R

The results on the previous slides are obtained by the following commands:

    # We consider the model with sex, ulceration and thickness:
    fit.stu=coxph(Surv(lifetime,status==1)~factor(sex)+factor(ulcer)+thickn,
                  data=melanoma)
    summary(fit.stu)

    # We plot the cumulative hazards for females for four covariate combinations:
    # 1) thickn=1, ulcer=2   2) thickn=2, ulcer=2
    # 3) thickn=2, ulcer=1   4) thickn=4, ulcer=1
    new.covariates=data.frame(sex=c(1,1,1,1), ulcer=c(2,2,1,1), thickn=c(1,2,2,4))
    surv.stu=survfit(fit.stu,newdata=new.covariates)
    plot(surv.stu,fun="cumhaz", mark.time=FALSE, xlim=c(0,10),
         xlab="Years since operation",ylab="Cumulative hazard",lty=1:4,lwd=2)
    legend("topleft",c("female, 1 mm, absent","female, 2 mm, absent",
           "female, 2 mm, present","female, 4 mm, present"), lty=1:4,lwd=2)

    # To plot the survival functions for females for the same combinations of the
    # covariates we just omit the "cumhaz" option

Assumptions for Cox regression

We consider a Cox regression model with fixed covariates:

$$\alpha(t \mid x) = \alpha_0(t) \exp(\beta^T x)$$

Note that the model assumes:

1) Log-linearity:

$$\log\{\alpha(t \mid x)\} = \log\{\alpha_0(t)\} + \beta^T x$$

2) Proportional hazards:

$$\frac{\alpha(t \mid x_2)}{\alpha(t \mid x_1)} = \exp\{\beta^T (x_2 - x_1)\} \quad \text{(independent of time)}$$

A number of methods exist for checking these assumptions, and we will have a look at two of them (this material is not in the ABG-book, cf. page 134).

Check of log-linearity

We check log-linearity for a numeric covariate, say covariate 1, assuming that log-linearity is ok for the other covariates. We may fit a penalized smoothing spline $s(x_1)$ for the effect of covariate 1:

$$\alpha(t \mid x) = \alpha_0(t) \exp\{ s(x_1) + \beta_2 x_2 + \cdots + \beta_p x_p \}$$

and see if the spline estimate becomes fairly linear.

Melanoma data: Checking log-linearity by using a spline for tumor thickness in a model with sex and ulceration as the other covariates

[Figure: partial effect for pspline(thickn) plotted against thickn.]

The non-linear parts of the smoothing spline have a significant effect (P=.38)
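The spline check of log-linearity can be tried out on data where the truth is known. The sketch below is an illustration on simulated data, not the melanoma data: the covariate `x`, the binary covariate `z` and the assumption that the log-hazard is linear in `log2(x)` are all hypothetical choices made for the example.

```r
library(survival)

set.seed(1)
n <- 500
x <- runif(n, 0.1, 10)            # numeric covariate whose functional form is checked
z <- rbinom(n, 1, 0.5)            # a second covariate, assumed log-linear
# Assumed truth: the log-hazard is linear in log2(x), not in x itself
time   <- rexp(n, rate = 0.1 * exp(0.5 * log2(x) + 0.3 * z))
status <- rep(1, n)               # no censoring, to keep the sketch short
d <- data.frame(time, status, x, z, log2x = log2(x))

# Penalized smoothing spline for x on the original scale and on the log2 scale
fit_raw <- coxph(Surv(time, status) ~ z + pspline(x),     data = d)
fit_log <- coxph(Surv(time, status) ~ z + pspline(log2x), data = d)

# The printed tables split each spline term into a linear and a nonlinear part
print(fit_raw)
print(fit_log)

# Plot the estimated spline for the second term of each model
termplot(fit_raw, se = TRUE, terms = 2)
termplot(fit_log, se = TRUE, terms = 2)
```

If the assumed truth holds, the spline for `x` should show clear curvature while the spline for `log2x` should look roughly like a straight line.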

When the effect of a numeric covariate is not log-linear, we may transform the covariate or use a grouped version of it. For the melanoma data, the plots indicate that we may use log-thickness as covariate (and then log2 is a good choice).

Melanoma data: Checking log-linearity by using a spline for log2 of tumor thickness in a model with sex and ulceration as the other covariates

[Figure: partial effect for pspline(log2thick) plotted against log2thick.]

The non-linear parts of the smoothing spline do not have a significant effect (P=0.41)

Using R

The results on the previous slide are obtained by the following commands:

    # To check log-linearity for thickness, we fit a model with sex, ulceration
    # and penalized smoothing spline for the effect of thickness:
    fit.spstu=coxph(Surv(lifetime,status==1)~factor(sex)+factor(ulcer)+pspline(thickn),
                    data=melanoma)
    print(fit.spstu)
    termplot(fit.spstu,se=TRUE,terms=3)

    # To check log-linearity for log2(thickness) [which has to be defined as a
    # new covariate], we fit a model with sex, ulceration and penalized smoothing
    # spline for the effect of log2(thickness):
    melanoma$log2thick=log2(melanoma$thickn)
    fit.spslogtu=coxph(Surv(lifetime,status==1)~factor(sex)+factor(ulcer)+pspline(log2thick),
                       data=melanoma)
    print(fit.spslogtu)
    termplot(fit.spslogtu,se=TRUE,terms=3)

Check of proportional hazards

One way to check if we have proportional hazards is to fit a model of the form

$$\alpha(t \mid x_i) = \alpha_0(t) \exp\{ \beta_{11} x_{i1} + \beta_{12} x_{i1} g(t) + \cdots + \beta_{p1} x_{ip} + \beta_{p2} x_{ip} g(t) \}$$

for a known function $g(t)$, e.g. $g(t) = \log t$.

We then test the null hypothesis that one or all of the $\beta_{j2}$ are equal to 0.

For the melanoma data:

                      chisq    p
    factor(sex)2       0.23   0.631
    factor(ulcer)2     0.96   0.328
    log2thick          4.02   0.045
    GLOBAL             8.77   0.033

Melanoma data: Plots that indicate possible time dependent effects of the covariates [figure]

The test and plots indicate that there may be a non-proportional (i.e. time-dependent) effect of log-thickness.
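One reason log2 is a convenient base is interpretability: a one-unit increase in $\log_2$(thickness) is a doubling of thickness, so $e^{\beta}$ becomes the hazard ratio per doubling. The base only rescales the coefficient and does not change the fit. The sketch below illustrates this on simulated data; `thick` and the assumed exponent 0.4 are hypothetical.

```r
library(survival)

set.seed(7)
n <- 500
thick  <- exp(rnorm(n))                       # hypothetical positive covariate
time   <- rexp(n, rate = 0.1 * thick^0.4)     # assumed: log-hazard linear in log(thick)
status <- rep(1, n)                           # no censoring in this sketch

fit_ln   <- coxph(Surv(time, status) ~ log(thick))    # natural logarithm
fit_log2 <- coxph(Surv(time, status) ~ log2(thick))   # base-2 logarithm

b_ln   <- unname(coef(fit_ln))
b_log2 <- unname(coef(fit_log2))

# Since log(x) = log(2) * log2(x), the two coefficients differ by the factor log(2)
print(c(b_ln_times_log2 = b_ln * log(2), b_log2 = b_log2))

# Hazard ratio for a doubling of the covariate
hr_doubling <- exp(b_log2)
print(hr_doubling)
```

Both models give the same partial likelihood, so tests and fitted hazards agree; only the unit in which the effect is reported changes.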

Using R

The results on the previous slides are obtained by the commands below.

    # Model with sex, ulceration and log2(thickness), as implied by the rows
    # of the cox.zph output shown earlier:
    fit.slogtu=coxph(Surv(lifetime,status==1)~factor(sex)+factor(ulcer)+log2thick,
                     data=melanoma)

    # We will do a formal test for proportionality of the covariates. This is done by,
    # for each covariate x, adding the time-dependent covariate x*log(t), and testing
    # whether the time-dependent covariates are significant using a score test:
    cox.zph(fit.slogtu,transform='log')

    # The test indicates that the effect of tumor thickness is not proportional.
    # The estimate we get for log-thickness is then a weighted average of the
    # time-varying effect

    # We also make plots that give nonparametric estimates of the (possible)
    # time dependent effect of the covariates:
    par(mfrow=c(1,3))
    plot(cox.zph(fit.slogtu))
    par(mfrow=c(1,1))
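To see what the proportionality test reacts to, it can help to run it on data where one covariate violates proportional hazards by construction. The following sketch uses simulated data, not the melanoma data: the group indicator `g` has Weibull lifetimes with different shape parameters in the two groups (so the hazard ratio changes with time), while `z` is an unrelated covariate. All names and parameter values are assumptions made for the illustration.

```r
library(survival)

set.seed(11)
n <- 1000
g <- rbinom(n, 1, 0.5)                 # group with a time-varying effect
z <- rnorm(n)                          # covariate with no effect at all

# Different Weibull shapes give a hazard ratio between the groups
# that is proportional to t^1.3, i.e. not constant over time
time   <- rweibull(n, shape = ifelse(g == 1, 2, 0.7), scale = 1)
status <- rep(1, n)                    # no censoring in this sketch

fit <- coxph(Surv(time, status) ~ g + z)

# Test of proportional hazards with g(t) = log t
zp <- cox.zph(fit, transform = "log")
print(zp)

p_g <- zp$table["g", "p"]              # P-value for the group covariate
```

With this construction the test for `g` should reject clearly, while the result for `z` carries no systematic signal.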