Cox s proportional hazards/regression model - model assessment

Similar documents
Survival Analysis Math 434 Fall 2011

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

1 Introduction. 2 Residuals in PH model

Cox s proportional hazards model and Cox s partial likelihood

Residuals and model diagnostics

Time-dependent covariates

Lecture 10. Diagnostics. Statistics Survival Analysis. Presented March 1, 2016

MAS3301 / MAS8311 Biostatistics Part II: Survival

Time-dependent coefficients

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine

Exercises. (a) Prove that m(t) =

Chapter 4 Regression Models

Checking model assumptions with regression diagnostics

Maximum likelihood estimation for Cox s regression model under nested case-control sampling

Survival Regression Models

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

Extensions of Cox Model for Non-Proportional Hazards Purpose

Power and Sample Size Calculations with the Additive Hazards Model

Goodness-Of-Fit for Cox s Regression Model. Extensions of Cox s Regression Model. Survival Analysis Fall 2004, Copenhagen

Philosophy and Features of the mstate package

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Survival Analysis. Stat 526. April 13, 2018

Reduced-rank hazard regression

TESTINGGOODNESSOFFITINTHECOX AALEN MODEL

Generalized logit models for nominal multinomial responses. Local odds ratios

Semiparametric Regression

β j = coefficient of x j in the model; β = ( β1, β2,

ST745: Survival Analysis: Cox-PH!

Weighted Least Squares

Introduction to Reliability Theory (part 2)

Extensions of Cox Model for Non-Proportional Hazards Purpose

( t) Cox regression part 2. Outline: Recapitulation. Estimation of cumulative hazards and survival probabilites. Ørnulf Borgan

Chapter 1 Statistical Inference

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

Survival Analysis for Case-Cohort Studies

Survival analysis in R


The coxvc_1-1-1 package

Understanding the Cox Regression Models with Time-Change Covariates

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science

Multivariate Survival Analysis

TMA 4275 Lifetime Analysis June 2004 Solution

USING MARTINGALE RESIDUALS TO ASSESS GOODNESS-OF-FIT FOR SAMPLED RISK SET DATA

Cox regression: Estimation

Frailty Modeling for clustered survival data: a simulation study

Chapter 7: Hypothesis testing

Lecture 8 Stat D. Gillen

Lecture 5 Models and methods for recurrent event data

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Lecture 7 Time-dependent Covariates in Cox Regression

STAT331. Cox s Proportional Hazards Model

On a connection between the Bradley-Terry model and the Cox proportional hazards model

Survival analysis in R

Consider Table 1 (Note connection to start-stop process).

Goodness-of-fit test for the Cox Proportional Hazard Model

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Validation. Terry M Therneau. Dec 2015

Quantile Regression for Residual Life and Empirical Likelihood

Generalized Linear Models

Tied survival times; estimation of survival probabilities

Definitions and examples Simple estimation and testing Regression models Goodness of fit for the Cox model. Recap of Part 1. Per Kragh Andersen

Multiple Linear Regression

Survival Analysis. STAT 526 Professor Olga Vitek

Accelerated Failure Time Models

More on Cox-regression

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

STAT 4385 Topic 06: Model Diagnostics

A Regression Model For Recurrent Events With Distribution Free Correlation Structure

ST495: Survival Analysis: Maximum likelihood

Building a Prognostic Biomarker

Notes from Survival Analysis

Survival Models for the Social and Political Sciences Week 6: More on Cox Regression

Lecture 2: Martingale theory for univariate survival analysis

Statistics in medicine

Müller: Goodness-of-fit criteria for survival data

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek

Remedial Measures, Brown-Forsythe test, F test

Follow this and additional works at: Part of the Applied Mathematics Commons

Faculty of Health Sciences. Cox regression. Torben Martinussen. Department of Biostatistics University of Copenhagen. 20. september 2012 Slide 1/51

Single-level Models for Binary Responses

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

THESIS for the degree of MASTER OF SCIENCE. Modelling and Data Analysis

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

8. Parametric models in survival analysis General accelerated failure time models for parametric regression

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

8 Nominal and Ordinal Logistic Regression

The Big Picture. Model Modifications. Example (cont.) Bacteria Count Example

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

University of California, Berkeley

Inference Methods for the Conditional Logistic Regression Model with Longitudinal Data Arising from Animal Habitat Selection Studies

Transcription:

Cox s proportional hazards/regression model - model assessment Rasmus Waagepetersen September 27, 2017

Topics: Plots based on estimated cumulative hazards Cox-Snell residuals: overall check of fit Martingale residuals: assessment of functional form of covariate Deviance residuals: detection of outliers Score-process residual: check of proportional hazards for each covariate Detection of influential observations.

Why not just proceed as for linear normal models? Issues: censoring. for Cox ph model we do not have a fully specified model - thus we do not know distribution of residuals. Generally, residual analysis is a bit tricky not only for survival data but for non-normal data in general - residuals tend to look ugly even if the model is correct.

Model with one factor Suppose we have observations (t ij, δ ij ) i = 1,..., K and model for the ith group h i (t ij ) = h 0 (t ij ) exp(β i ) Compute a cumulative hazard estimate Ĥ i for each group. Recall H i (t) = H 0 (t) exp(β i ) log H i (t) = log H 0 (t) + β i Various types of plots can be considered 1. log Ĥ i (t) s against t 2. log Ĥ i vs log Ĥ j 3. Ĥ i vs Ĥ j 4. log Ĥ i (t) log Ĥ 1 (t) s vs t. The last three alternatives require a bit of programming since the estimates are not obtained for the same ts.

Stratified Cox process Suppose we have several covariates and the first is a factor dividing subjects into K groups. Then a stratified Cox model is specified by h i (t z) = h 0i (t) exp(z T 1β 1 ) where h i ( z 1 ) is the hazard for a subject in the ith group with remaining covariate vector z 1 = (z 2,..., z p ) T. That is, a separate baseline hazard h 0i for each group/strata. If proportional hazards holds for the factor used for stratification then H 0i (t) = H 0 (t) exp(β 1 ). So we can make plots similar to those on the previous slide to assess proportional hazards for the factor considered. If we want to assess ph for a quantitative covariate then we can initially discretize it into a factor variable.

Martingale residuals Martingale residuals: r M i = δ i Ĥ0(t i ) exp(z T i ˆβ) Very skewed with values in interval ], 1]. Not useful for detecting outliers. May be used for assessing functional form of covariate by computing ri M for model without covariate and plotting ri M against the omitted covariate. Curve fitted to scatter plot may give indication of possible transformation of covariate. Reason for terminology will be more clear when we later on discuss counting processes and martingales.

Cox-Snell Cox-Snell residuals based on results for continuous random variable X with survivor function S and cumulative hazard and H: S(X ) Unif(]0, 1[) H(X ) Exp(1). Cox-Snell residual: r C i = Ĥ 0 (t i ) exp(z T i ˆβ) = δ i r M i Cox-Snell residuals should look like censored sample of unit-rate exponential random variables which have H(t) = t. This can be checked by considering estimated cumulative hazard for r C i. Cox-Snell residuals may be used for checking overall fit of model - but see reservations in practical notes in KM page 358-359.

Deviance residuals Deviance residuals are obtained by applying symmetrizing transformation to martingale residuals: r D i = sign(ri M )[ 2(ri M + δ i log(δ i ri M ))] 1/2. These residuals should look (approximately) like a sample of iid normal random variables if model correct. However, if heavy censoring distribution becomes bimodal. May be useful for spotting outliers.

Schoenfeld residuals and score process Let x 1,..., x n denote the set of death times (where we only observe x i if δ i = 1) and let R 1,..., R n denote the associated ranks. Recall score function u(β) for Cox s partial likelihood is a sum of terms (p-dimensional vectors) u (i) (β) = δ (i) (z (i) E[z R(i) H i ]) = δ (i) [z (i) e((i))] The components of these terms are also known as Schoenfeld residuals (KM page 376). We can define the score process (KM page 376) as u(β, t) = l D: t l t u l (β) By definition u( ˆβ, t) = 0 for t greater than the maximal observed death time.

KM suggest to plot score process u( ˆβ, t) against time and compare with 95% boundaries of Brownian bridge process. Martinussen and Scheike (2006) Dynamic regression models for survival data, suggest to compare with simulations of score process under assumed model. The score process can also be expressed as n u(β, t) = δ i (z i e(i)) exp(zi T β) i=1 l D: t l t (z l e(l)) k R(t l ) exp(zt k β) (we will see later why, when considering counting processes and martingales). The score residuals are given by the components of u(β, t i ), i = 1,..., n. These are also available from the residuals function and can be cumulated to obtain score process.

Assessment of timevarying effects Suppose that we do not have proportional hazards for the jth covariate in the sense that the true effect of z j is timevarying: β j (t) = β j + γ j g(t). Let r S j,i be Schoenfeld residual scaled with the covariance matrix of ˆβ. Then the expected value of r S j,i is approximately equal to γ j g(t i ). Thus a plot of scaled Schoenfeld residuals versus time may reveal deviations from proportional hazards. Implemented in the cox.zph procedure. This is not covered in KM. See e.g. book by Collett.

Influential observations Do some observations have unusually large influence on estimation of β? Let ˆβ and ˆβ i denote estimates of β based on full data set and data with ith observation omitted. Want to look for i where ˆβ ˆβ i is an outlier. Based on score process residuals it is possible to compute approximation of ˆβ i - i.e. we do not need to fit Cox model for all datasets obtained by omitting one observation. These estimates are called dfbeta in the residual function for coxph objects.

Use of formal testing? KM note 5 page on 380 advocates use of graphical checks rather than formal tests. This is because we know that any statistical model is just an approximation and thus is bound to be rejected if the sample size is large enough. Remember the famous quote by Box: all models are wrong but some are useful Graphical checks may reveal if there are any serious deviations between model and data and possibly also hint to the cause of such deviations.