Proportional hazards regression

Similar documents
Cox regression: Estimation

Accelerated Failure Time Models

Semiparametric Regression

Tied survival times; estimation of survival probabilities

Residuals and model diagnostics

The Weibull Distribution

One-sample categorical data: approximate inference

Time-dependent coefficients

Censoring mechanisms

Sampling distribution of GLM regression coefficients

Outline of GLMs. Definitions

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

Other likelihoods. Patrick Breheny. April 25. Multinomial regression Robust regression Cox regression

Multistate models and recurrent event models

Multivariate Survival Analysis

Generalized Linear Models

Multistate models and recurrent event models

STAT5044: Regression and Anova

Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times

Linear Regression Models P8111

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

The classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know

The classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

MAS3301 / MAS8311 Biostatistics Part II: Survival

Lecture 7 Time-dependent Covariates in Cox Regression

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

Logistic regression: Miscellaneous topics

Survival Analysis Math 434 Fall 2011

The t-test Pivots Summary. Pivots and t-tests. Patrick Breheny. October 15. Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/18

Logistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014

Lecture 16 Solving GLMs via IRWLS

MAS3301 / MAS8311 Biostatistics Part II: Survival

Ph.D. course: Regression models. Introduction. 19 April 2012

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

Continuous case Discrete case General case. Hazard functions. Patrick Breheny. August 27. Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21

FSAN815/ELEG815: Foundations of Statistical Learning

UNIVERSITY OF CALIFORNIA, SAN DIEGO

A Practitioner s Guide to Generalized Linear Models

Survival Regression Models

Extensions of Cox Model for Non-Proportional Hazards Purpose

Survival Analysis. Lu Tian and Richard Olshen Stanford University

POLI 8501 Introduction to Maximum Likelihood Estimation

Lecture 8 Stat D. Gillen

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Modelling geoadditive survival data

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

Poisson regression: Further topics

Introduction to General and Generalized Linear Models

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Chapter 5: Logistic Regression-I

Cox s proportional hazards model and Cox s partial likelihood

Ph.D. course: Regression models

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

Logistic regression model for survival time analysis using time-varying coefficients

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

STAT 526 Spring Final Exam. Thursday May 5, 2011

β j = coefficient of x j in the model; β = ( β1, β2,

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

Lecture 8. Poisson models for counts

Theoretical results for lasso, MCP, and SCAD

Generalized Linear Models. Kurt Hornik

University of California, Berkeley

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

One-parameter models

STAT 7030: Categorical Data Analysis

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

Lecture 5: LDA and Logistic Regression

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Stability and the elastic net

Neural networks (not in book)

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Exponential smoothing is, like the moving average forecast, a simple and often used forecasting technique

TMA 4275 Lifetime Analysis June 2004 Solution

Machine Learning. Module 3-4: Regression and Survival Analysis Day 2, Asst. Prof. Dr. Santitham Prom-on

Statistical Distribution Assumptions of General Linear Models

Survival Analysis. Stat 526. April 13, 2018

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

Likelihood Construction, Inference for Parametric Survival Distributions

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek

Week 7: Binary Outcomes (Scott Long Chapter 3 Part 2)

Count and Duration Models

Chap 2. Linear Classifiers (FTH, ) Yongdai Kim Seoul National University

Approximate Normality, Newton-Raphson, & Multivariate Delta Method

Lecture 22 Survival Analysis: An Introduction

Goodness-Of-Fit for Cox s Regression Model. Extensions of Cox s Regression Model. Survival Analysis Fall 2004, Copenhagen

Building a Prognostic Biomarker

SB1a Applied Statistics Lectures 9-10

Count and Duration Models

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science

Survival analysis in R

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016

LOGISTIC REGRESSION Joseph M. Hilbe

Lecture 7. Proportional Hazards Model - Handling Ties and Survival Estimation Statistics Survival Analysis. Presented February 4, 2016

Statistics in medicine

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Transcription:

Proportional hazards regression Patrick Breheny October 8 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/28

Introduction The model Solving for the MLE Inference Today we will begin discussing regression models for time-to-event data There are a number of ways one could think about modeling the dependency between the time to an event and the factors that might affect it The two most common approaches are known as proportional hazards models and accelerated failure time models Patrick Breheny Survival Data Analysis (BIOS 7210) 2/28

Proportional hazards The model Solving for the MLE Inference We ll start with proportional hazards models As the name implies, the idea here is to model the hazard function directly: λ i (t) = λ(t) exp(x T i β) Here, the covariates act in a multiplicative manner upon the hazard function; note that the exponential function ensures that λ i (t) is always positive In this model, the hazard function for the ith subject always has the same general shape λ(t), but can be, say, doubled or halved depending on a patient s risk factors Patrick Breheny Survival Data Analysis (BIOS 7210) 3/28

The model Solving for the MLE Inference In general, any hazard function can be used; today, we ll restrict attention to the constant hazard for the sake of simplicity Thus, the exponential regression model is: λ i (t) = λ exp(x T i β) Note that if x i contains an intercept term, we will have a problem with identifiability there is no way to distinguish β 0 and λ Patrick Breheny Survival Data Analysis (BIOS 7210) 4/28

Identifiability The model Solving for the MLE Inference For a variety of reasons (convenience, simplicity, numerical stability, accuracy of approximate inferential procedures), it is preferable to estimate β 0 rather than λ, so this is the parameterization we will use Of course, having estimated β 0, one can easily obtain estimates and confidence intervals for λ through the transformation λ = exp(β 0 ) In today s lecture notes, we will discuss how to estimate the regression coefficients and carry out inference concerning them, and then illustrate these results using the pbc data Patrick Breheny Survival Data Analysis (BIOS 7210) 5/28

The model Solving for the MLE Inference Solving a nonlinear system of equations Maximum likelihood estimation of β is complicated in exponential regression by the need to solve a nonlinear system of equations This cannot be done in closed form; some sort of iterative procedure is required The basic idea is to construct a linear approximation to the nonlinear system of equations, solve for β, re-approximate, and so on until convergence (this is known as the Newton-Raphson algorithm) We will begin by working out the score and Hessian with respect to the linear predictor, η i = x T i β Patrick Breheny Survival Data Analysis (BIOS 7210) 6/28

Log-likelihood, score, and Hessian The model Solving for the MLE Inference Under independent censoring and assuming T i x i Exp(λ i ), the log-likelihood contribution of the ith subject in exponential regression is l i (η i ) = d i η i t i e η i The score and Hessian are therefore u i (η i ) = d i t i e η i H i (η i ) = t i e η i Patrick Breheny Survival Data Analysis (BIOS 7210) 7/28

Vector/matrix versions The model Solving for the MLE Inference Letting µ denote the vector with ith element t i e η i and W denote the diagonal matrix with ith diagonal element t i e η i, we can rewrite the score and Hessian as u(η) = d µ H(η) = W As we remarked earlier, solving for µ = 0 is complicated because µ is a nonlinear function of η; thus, consider the Taylor series approximation about η where µ and W are fixed at η u(η) u( η) + H( η)(η η) = d µ + W( η η) Patrick Breheny Survival Data Analysis (BIOS 7210) 8/28

Solving for β The model Solving for the MLE Inference So far, of course, we ve ignored the fact that η = Xβ and that we re really estimating β Substituting this expression into the previous equation and solving for β, we obtain β (X T WX) 1 X T (d µ) + β Again, this is an iterative process, which means that this is not an exact solution for β; rather, we must solve for β, recompute µ and W, re-solve for β, and so on The Newton-Raphson algorithm will converge to the MLE (although this is not absolutely guaranteed) provided that the likelihood is log-concave and coercive, both of which (typically) hold for exponential regression Patrick Breheny Survival Data Analysis (BIOS 7210) 9/28

Crude R code The model Solving for the MLE Inference Below is some crude R code providing an implementation of this algorithm b <- rep(0, ncol(x)) for (i in 1:20) { eta <- as.numeric(x%*%b) mu <- t*exp(eta) W <- diag(t*exp(eta)) b <- solve(t(x) %*% W %*% X) %*% t(x) %*% (d-mu) + b } This is crude in the sense that it isn t as efficient as it could be and in that it assumes convergence will occur in 20 iterations; a better algorithm would check for convergence by examining whether β has stopped changing Patrick Breheny Survival Data Analysis (BIOS 7210) 10/28

Wald approach The model Solving for the MLE Inference Since β is the MLE, our derivation of the Wald results from earlier means that β. N ( β, I 1) ; we just have to work out the information matrix with respect to β Applying the chain rule, we have β. N ( β, (X T WX) 1) It is very easy, therefore, to construct confidence intervals for β j with β j ± z 1 α/2 SE j, where SE j = (X T WX) 1 jj Patrick Breheny Survival Data Analysis (BIOS 7210) 11/28

Likelihood ratio approach The model Solving for the MLE Inference The likelihood ratio approach, while desirable, is somewhat complicated in multiparameter settings where we lack closed-form estimates Consider the problem of obtaining a likelihood ratio confidence interval for β j If β j was the only parameter, this is simply a root-finding problem in which we determine the values β L and β U where 2(l( β j ) l(β j )) = χ 2 1,.95 Patrick Breheny Survival Data Analysis (BIOS 7210) 12/28

The profile likelihood The model Solving for the MLE Inference However, β j is not the only parameter, and in particular, if β j was restricted to equal β L, all the other MLEs would change as a consequence In other words, evaluating l(β j ) is not simple, because it involves re-solving for β j at every value of β j that we try out in our root-finding procedure The likelihood L(β j, β j (β j )) is known as the profile likelihood, and the re-solving procedure is sometimes referred to as profiling Note that obtaining a confidence interval using either the score or likelihood ratio approaches involve profiling, but the Wald approach does not Patrick Breheny Survival Data Analysis (BIOS 7210) 13/28

Availability of LRCIs The model Solving for the MLE Inference In summary, it is much faster and more convenient to obtain Wald CIs, since score and LR CIs involve profiling Certainly, it is possible to write code that carries out profiling, and some software packages have implemented functions to do this for you (e.g., glm) Often, however, likelihood ratio confidence intervals are not provided by software packages; in particular, the survival package does not provide them Patrick Breheny Survival Data Analysis (BIOS 7210) 14/28

pbc data: Setup pbc example Accounting for uncertainty in other parameters Diagnostics To illustrate, let s fit an exponential regression model to the pbc data, and include the following four factors as predictors: trt: Treatment (D-penicillamine, placebo) stage: Histologic stage of disease (1, 2, 3, 4) hepato: Presence of hepatomegaly (enlarged liver) bili: Serum bilirunbin (mg/dl) We can fit this model using our crude R code (the survival package does have a function for exponential regression, but its setup doesn t exactly match ours today, so I m postponing coverage of the function to next week) Patrick Breheny Survival Data Analysis (BIOS 7210) 15/28

Results pbc example Accounting for uncertainty in other parameters Diagnostics trt 0.39 stage < 0.0001 hepato 0.10 bili < 0.0001 0.6 0.4 0.2 0.0 0.2 0.4 0.6 0.8 1.0 β Patrick Breheny Survival Data Analysis (BIOS 7210) 16/28

Interpretation of coefficients pbc example Accounting for uncertainty in other parameters Diagnostics As in other regression models, the interpretation of the regression coefficients involves the effect of changing one factor while all others remain the same Consider a hypothetical comparison between two individuals whose explanatory variables are the same, except for variable j, where it differs by δ j = x 1j x 2j : λ 1 (t) λ 2 (t) = exp(δ jβ j ) Patrick Breheny Survival Data Analysis (BIOS 7210) 17/28

Hazard ratios pbc example Accounting for uncertainty in other parameters Diagnostics Note that for any proportional hazard model, λ 1 (t)/λ 2 (t) is a constant with respect to time This constant is known as the hazard ratio, and typically abbreviated HR, although some authors refer to it as the relative risk (RR) Thus, the interpretation of a regression coefficient in a proportional hazards model is that e δβ is the hazard ratio for a δ-unit change in that covariate In particular, HR = e β for a one-unit change So, for stage in our pbc example, HR = e 0.564 = 1.76; a one-unit change in stage increases a patient s hazard by 76% Patrick Breheny Survival Data Analysis (BIOS 7210) 18/28

Results (hazard ratios; δ bili = 5) pbc example Accounting for uncertainty in other parameters Diagnostics trt 0.39 stage < 0.0001 hepato 0.10 bili < 0.0001 0.5 1.0 1.5 2.0 2.5 Hazard ratio Patrick Breheny Survival Data Analysis (BIOS 7210) 19/28

Predicted survival: Some examples pbc example Accounting for uncertainty in other parameters Diagnostics We can also predict survival curves at the individual level 1.0 Progression free survival 0.8 0.6 0.4 0.2 0.0 0 2 4 6 8 10 Time (years) Patrick Breheny Survival Data Analysis (BIOS 7210) 20/28

pbc example Accounting for uncertainty in other parameters Diagnostics Uncertainty and information in the multiparameter setting Let s take a moment now to consider the subtle distinction between (I jj ) 1 and (I 1 ) jj The second expression, (I 1 ) jj, is the correct one to use when calculating Wald standard errors, because it accounts for the uncertainty in all the other regression coefficient estimates The first expression, (I jj ) 1, would be correct only if some all-knowing oracle told us exactly what all the values of β j were Patrick Breheny Survival Data Analysis (BIOS 7210) 21/28

An example pbc example Accounting for uncertainty in other parameters Diagnostics To make this more concrete, let s consider the standard error of the coefficient for hepatomegaly The actual Wald SE is (I 1 ) jj = 0.126 The naïve standard error is (I jj ) 1 = 0.024 As we have remarked previously, hepatomegaly is strongly correlated with stage (it s also moderately correlated with bilirubin); any uncertainty in the true effect of stage means increased uncertainty about the effect of hepatomegaly The naïve approach fails to account for this, and greatly underestimates the true uncertainty Patrick Breheny Survival Data Analysis (BIOS 7210) 22/28

Diagnostic plot (original scale) pbc example Accounting for uncertainty in other parameters Diagnostics As a diagnostic plot to check whether the exponential distribution seems reasonable, we can plot the Kaplan-Meier estimate against the best exponential fit: 1.0 Progression free survival 0.8 0.6 0.4 0.2 0.0 0 2 4 6 8 10 Time (years) Patrick Breheny Survival Data Analysis (BIOS 7210) 23/28

Diagnostic plot (linear) pbc example Accounting for uncertainty in other parameters Diagnostics Alternatively, since the exponential model implies log S(t) = λt, we can obtain a linear version of the diagnostic plot: 1.0 0.8 log S(t) 0.6 0.4 0.2 0.0 0 2 4 6 8 10 Time (years) Patrick Breheny Survival Data Analysis (BIOS 7210) 24/28

Limitations pbc example Accounting for uncertainty in other parameters Diagnostics These diagnostic plots, although useful for identifying gross lack of fit, have some clear limitations The main limitation is that our model does not assume T i Exp(λ), but rather that T i x i Exp(λ i ) Thus, we may see a departure from linearity in the plot on the previous page, but it doesn t necessarily imply a violation of model assumptions Patrick Breheny Survival Data Analysis (BIOS 7210) 25/28

Diagnostic plot (simulated) pbc example Accounting for uncertainty in other parameters Diagnostics For example, consider this simulated diagnostic plot for two groups, each independently following an exponential distribution, but with different rate parameters: 6 5 log S(t) 4 3 2 1 0 0 2 4 6 8 10 12 Time Patrick Breheny Survival Data Analysis (BIOS 7210) 26/28

Comments pbc example Accounting for uncertainty in other parameters Diagnostics Nevertheless, these diagnostic plots are generally useful provided that the covariates do not have an overwhelming effect on survival (covariates do not dominate ) If any covariates do have overwhelming effects, one may considering stratifying the diagnostic plots For example, we may wish to construct separate diagnostic plots for each stage in our pbc example Patrick Breheny Survival Data Analysis (BIOS 7210) 27/28

Residuals? pbc example Accounting for uncertainty in other parameters Diagnostics In linear regression, of course, we don t face these issues because we can directly examine residuals In survival analysis, however, residuals are more complicated in that some of them will be censored There are ways of dealing with this, and of obtaining residuals for time-to-event regression models, but we will postpone this discussion for a later lecture Patrick Breheny Survival Data Analysis (BIOS 7210) 28/28