Relative-risk regression and model diagnostics. 16 November, 2015


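Relative risk regression

More general multiplicative intensity model: the intensity for individual $i$ at time $t$ is

\[ \lambda_i(t) = Y_i(t)\, r(\beta, x_i; t)\, \alpha_0(t). \]

Cox proportional hazards regression: $r(\beta, x) = e^{x^{\top}\beta}$.

Estimate $\beta$ by maximum partial likelihood:

\[ L_P(\beta) = \prod_{t_j} \frac{r(\beta, x_{i_j}(t_j); t_j)}{\sum_{l \in R_j} r(\beta, x_l(t_j); t_j)}, \]

where $i_j$ is the individual with an event at time $t_j$ and $R_j$ is the risk set at $t_j$.

Breslow estimator for the baseline cumulative hazard:

\[ \hat A_0(t) = \int_0^t \frac{dN(u)}{\sum_{i=1}^n Y_i(u)\, r(\hat\beta, x_i(u))} = \sum_{t_j \le t} \frac{1}{\sum_{i \in R_j} r(\hat\beta, x_i(t_j))}. \]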
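As an illustration of the two formulas above, the following base-R sketch evaluates the Cox partial log-likelihood and the Breslow estimator for a single covariate. It assumes no tied event times in the likelihood; the function names `cox_partial_loglik` and `breslow_cumhaz` and the toy data are hypothetical and are not part of the lecture.

```r
# Partial log-likelihood for r(beta, x) = exp(x * beta), one covariate, no ties.
cox_partial_loglik <- function(beta, time, status, x) {
  event_idx <- which(status == 1)
  contrib <- sapply(event_idx, function(j) {
    at_risk <- time >= time[j]                      # risk set R_j
    x[j] * beta - log(sum(exp(x[at_risk] * beta)))  # log of one factor of L_P
  })
  sum(contrib)
}

# Breslow estimator of the cumulative baseline hazard at each event time.
breslow_cumhaz <- function(beta, time, status, x) {
  event_times <- sort(unique(time[status == 1]))
  increments <- sapply(event_times, function(tj) {
    d_j <- sum(time == tj & status == 1)            # number of events at t_j
    d_j / sum(exp(x[time >= tj] * beta))
  })
  data.frame(time = event_times, cumhaz = cumsum(increments))
}

# Hypothetical toy data: three subjects, all with observed events.
toy_time   <- c(1, 2, 3)
toy_status <- c(1, 1, 1)
toy_x      <- c(0, 1, 0)

ll0 <- cox_partial_loglik(0, toy_time, toy_status, toy_x)
A0  <- breslow_cumhaz(0, toy_time, toy_status, toy_x)
```

At $\beta = 0$ every relative risk equals 1, so the partial likelihood reduces to $1/3 \cdot 1/2 \cdot 1$ and the Breslow estimator reduces to the Nelson-Aalen estimator.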

Tied data

Suppose at time $t_j$ there are 5 individuals at risk, with relative risks $r_1, r_2, r_3, r_4, r_5$. Individuals 1 and 2 have events at time $t_j$. The contribution to the partial likelihood is

\[ \frac{r_1}{r_1 + r_2 + r_3 + r_4 + r_5} \cdot \frac{r_2}{r_2 + r_3 + r_4 + r_5} + \frac{r_2}{r_1 + r_2 + r_3 + r_4 + r_5} \cdot \frac{r_1}{r_1 + r_3 + r_4 + r_5}. \]

In general, if $d_j$ individuals have events, there are $d_j!$ terms.

Efron's alternative:

\[ \frac{r_1 r_2}{(r_1 + r_2 + r_3 + r_4 + r_5)\left(\tfrac{1}{2}(r_1 + r_2) + r_3 + r_4 + r_5\right)}. \]

In other words,

\[ \ell_P(\beta) = \sum_{t_j} \left[ \sum_{i \in D_j} \log r(\beta, x_i(t_j)) - \sum_{k=0}^{d_j - 1} \log\left( \sum_{i \in R_j} r(\beta, x_i(t_j)) - \frac{k}{d_j} \sum_{i \in D_j} r(\beta, x_i(t_j)) \right) \right]. \]

Applying the same principle to the baseline hazard estimate,

\[ \hat A_0(t) = \sum_{t_j \le t} \sum_{k=0}^{d_j - 1} \left( \sum_{i \in R_j} r(\hat\beta, x_i(t_j)) - \frac{k}{d_j} \sum_{i \in D_j} r(\hat\beta, x_i(t_j)) \right)^{-1}. \]

Breslow's alternative: ignore the decrements to the total hazard at risk,

\[ \ell_P^{\mathrm{Breslow}}(\beta) = \sum_{t_j} \left[ \sum_{i \in D_j} \log r(\beta, x_i(t_j)) - d_j \log \sum_{i \in R_j} r(\beta, x_i(t_j)) \right]. \]
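To make the three treatments of ties concrete, here is a small base-R sketch for the five-individual example, with individuals 1 and 2 tied. The function name `tie_contributions` and the numeric relative risks are hypothetical. Because the exact expression is a sum of $d_j! = 2$ terms, it is divided by $2$ before comparison; that constant does not depend on $\beta$ and so does not change the maximiser.

```r
# Contribution of one event time with two tied events (individuals 1 and 2).
tie_contributions <- function(r) {
  tied <- c(1, 2)
  d <- length(tied)
  S <- sum(r)          # total relative risk in the risk set
  D <- sum(r[tied])    # total relative risk of the tied individuals

  # Exact: sum over both possible orderings of the two events.
  exact <- r[1] / S * r[2] / (S - r[1]) + r[2] / S * r[1] / (S - r[2])
  # Efron: remove the average tied risk k/d * D at step k = 0, ..., d - 1.
  efron <- prod(r[tied]) / prod(S - (0:(d - 1)) / d * D)
  # Breslow: ignore the decrements entirely.
  breslow <- prod(r[tied]) / S^d

  c(exact_avg = exact / factorial(d), efron = efron, breslow = breslow)
}

res_equal   <- tie_contributions(rep(1, 5))                   # all risks equal
res_unequal <- tie_contributions(c(1.5, 0.8, 1.0, 2.0, 0.7))  # hypothetical risks
```

When all relative risks are equal, Efron's expression coincides with the averaged exact contribution; Breslow's is always the smallest of the three denominators' reciprocals because it never shrinks the risk total.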

[Figure: maintenance and nonmaintenance groups on a time axis from 10 to 50.]

              Maintenance                                Non-Maintenance (control)
t_i   Y(t_i)  d_i   ĥ_i   Ŝ(t_i)   Ĥ_i   S̃(t_i)     Y(t_i)  d_i   ĥ_i   Ŝ(t_i)   Ĥ_i   S̃(t_i)
  5     11     0    0.00   1.00    0.00   1.00         12     2    0.17   0.83    0.17   0.85
  8     11     0    0.00   1.00    0.00   1.00         10     2    0.20   0.67    0.37   0.69
  9     11     1    0.09   0.91    0.09   0.91          8     0    0.00   0.67    0.37   0.69
 12     10     0    0.00   0.91    0.09   0.91          8     1    0.12   0.58    0.49   0.61
 13     10     1    0.10   0.82    0.19   0.83          7     0    0.00   0.58    0.49   0.61
 18      8     1    0.12   0.72    0.32   0.73          6     0    0.00   0.58    0.49   0.61
 23      7     1    0.14   0.61    0.46   0.63          6     1    0.17   0.49    0.66   0.52
 27      6     0    0.00   0.61    0.46   0.63          5     1    0.20   0.39    0.86   0.42
 30      5     0    0.00   0.61    0.46   0.63          4     1    0.25   0.29    1.11   0.33
 31      5     1    0.20   0.49    0.66   0.52          3     0    0.00   0.29    1.11   0.33
 33      4     0    0.00   0.49    0.66   0.52          3     1    0.33   0.19    1.44   0.24
 34      4     1    0.25   0.37    0.91   0.40          2     0    0.00   0.19    1.44   0.24
 43      3     0    0.00   0.37    0.91   0.40          2     1    0.50   0.10    1.94   0.14
 45      3     0    0.00   0.37    0.91   0.40          1     1    1.00   0.00    2.94   0.05
 48      2     1    0.50   0.18    1.41   0.24          0     0
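The non-maintenance columns of the table can be recomputed from the at-risk counts and event counts alone. The sketch below does this in base R; the variable names are hypothetical, and the reading of the last column as $\exp(-\hat H_i)$ is an inference from the tabulated numbers rather than something stated on the slide.

```r
# At-risk counts Y(t_i) and event counts d_i for the non-maintenance group,
# copied from the table (times 5, 8, ..., 45; the row for t = 48 has nobody at risk).
t_i <- c(5, 8, 9, 12, 13, 18, 23, 27, 30, 31, 33, 34, 43, 45)
Y   <- c(12, 10, 8, 8, 7, 6, 6, 5, 4, 3, 3, 2, 2, 1)
d   <- c(2, 2, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

h_hat   <- d / Y                 # hazard increments
S_km    <- cumprod(1 - h_hat)    # Kaplan-Meier survival estimate
H_na    <- cumsum(h_hat)         # Nelson-Aalen cumulative hazard
S_tilde <- exp(-H_na)            # survival estimate based on the cumulative hazard

nonmaint <- data.frame(t_i, Y, d,
                       h_hat = round(h_hat, 2), S_km = round(S_km, 2),
                       H_na = round(H_na, 2), S_tilde = round(S_tilde, 2))
```

Printing `nonmaint` gives a table that can be compared directly with the right-hand half of the one above.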

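[Figure: partial likelihood $L_P(\beta)$ plotted against $\beta$ for $\beta$ from 0.6 to 1.4; vertical axis from 1.6e-18 to 2.8e-18.]

\[ L_P(\beta) = \frac{e^{2\beta}}{(12e^{\beta} + 11)(11e^{\beta} + 11)} \cdot \frac{e^{2\beta}}{(10e^{\beta} + 11)(9e^{\beta} + 11)} \cdot \frac{1}{8e^{\beta} + 11} \cdot \frac{e^{\beta}}{8e^{\beta} + 10} \cdot \frac{1}{7e^{\beta} + 10} \cdot \frac{1}{6e^{\beta} + 8} \cdot \frac{e^{\beta}}{(6e^{\beta} + 7)(5.5e^{\beta} + 6.5)} \cdot \frac{e^{\beta}}{5e^{\beta} + 6} \cdot \frac{e^{\beta}}{4e^{\beta} + 5} \cdot \frac{1}{3e^{\beta} + 5} \cdot \frac{e^{\beta}}{3e^{\beta} + 4} \cdot \frac{1}{2e^{\beta} + 4} \cdot \frac{e^{\beta}}{2e^{\beta} + 3} \cdot \frac{e^{\beta}}{e^{\beta} + 3} \cdot \frac{1}{2} \]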
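The partial likelihood for the aml data can be maximised numerically without any survival package. The sketch below rebuilds the Efron log partial likelihood from the at-risk and event counts tabulated in this document and maximises it with base R's `optimize`; the variable and function names are hypothetical, and the search interval is an arbitrary choice.

```r
# At-risk and event counts at each event time (5, 8, 9, ..., 48),
# M = maintenance (relative risk 1), N = non-maintenance (relative risk exp(beta)).
YM <- c(11, 11, 11, 10, 10, 8, 7, 6, 5, 5, 4, 4, 3, 3, 2)
dM <- c(0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
YN <- c(12, 10, 8, 8, 7, 6, 6, 5, 4, 3, 3, 2, 2, 1, 0)
dN <- c(2, 2, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0)

efron_loglik <- function(beta) {
  e <- exp(beta)
  total <- 0
  for (j in seq_along(YM)) {
    d <- dM[j] + dN[j]            # number of tied events at this time
    S <- YN[j] * e + YM[j]        # total relative risk in the risk set
    D <- dN[j] * e + dM[j]        # total relative risk of the tied individuals
    k <- 0:(d - 1)
    total <- total + dN[j] * beta - sum(log(S - k / d * D))
  }
  total
}

fit <- optimize(efron_loglik, interval = c(-3, 3), maximum = TRUE)
beta_hat <- fit$maximum
```

The maximiser `beta_hat` can be compared with the `coef` reported by `coxph` for the same data.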

[Figure: curves on a vertical axis from 0.0 to 1.0 against time from 0 to 60.]

Table 9.1: Output of the coxph function run on the aml data set.

coxph(formula = Surv(time, status) ~ x, data = aml)

                coef  exp(coef)  se(coef)     z      p
Nonmaintained  0.916        2.5     0.512  1.79  0.074

Likelihood ratio test=3.38 on 1 df, p=0.0658  n= 23

        Maintenance       Non-Maintenance (control)    Baseline
t_i   Y^M(t_i)  d^M_i      Y^N(t_i)  d^N_i             ĥ_0(t_i)   Ĥ_0(t_i)   S̃_0(t_i)
  5      11       0           12       2                0.050      0.050      0.951
  8      11       0           10       2                0.058      0.108      0.898
  9      11       1            8       0                0.032      0.140      0.869
 12      10       0            8       1                0.033      0.174      0.841
 13      10       1            7       0                0.036      0.210      0.811
 18       8       1            6       0                0.043      0.254      0.776
 23       7       1            6       1                0.095      0.348      0.706
 27       6       0            5       1                0.054      0.403      0.669
 30       5       0            4       1                0.067      0.469      0.625
 31       5       1            3       0                0.080      0.549      0.577
 33       4       0            3       1                0.087      0.636      0.529
 34       4       1            2       0                0.111      0.747      0.474
 43       3       0            2       1                0.125      0.872      0.418
 45       3       0            1       1                0.182      1.054      0.348
 48       2       1            0       0                0.500      1.554      0.211
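The baseline columns of the table appear to follow from applying Efron's tie correction to the baseline hazard with the fitted coefficient 0.916. The base-R sketch below recomputes them under that assumption; the variable names are hypothetical, and the use of Efron's (rather than Breslow's) increments is inferred from the tabulated values.

```r
# Counts copied from the table; M has relative risk 1, N has relative risk exp(beta_hat).
YM <- c(11, 11, 11, 10, 10, 8, 7, 6, 5, 5, 4, 4, 3, 3, 2)
dM <- c(0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
YN <- c(12, 10, 8, 8, 7, 6, 6, 5, 4, 3, 3, 2, 2, 1, 0)
dN <- c(2, 2, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0)

beta_hat <- 0.916                  # coefficient reported in Table 9.1
e <- exp(beta_hat)

h0 <- numeric(length(YM))
for (j in seq_along(YM)) {
  d <- dM[j] + dN[j]               # tied events at this time
  S <- YN[j] * e + YM[j]           # total relative risk at risk
  D <- dN[j] * e + dM[j]           # relative risk of those with events
  k <- 0:(d - 1)
  h0[j] <- sum(1 / (S - k / d * D))   # Efron-type baseline hazard increment
}
H0 <- cumsum(h0)                   # cumulative baseline hazard
S0 <- exp(-H0)                     # baseline survival
```

For example, at $t = 5$ the increment is $1/(12e^{\hat\beta} + 11) + 1/(11e^{\hat\beta} + 11)$, which rounds to the tabulated 0.050.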

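Testing

Estimate the variance of the parameter estimates from the observed Fisher partial information $J_P$.

Single parameter: test $H_0: \beta_q = 0$ using the asymptotically normal test statistic

\[ \frac{\hat\beta_q}{\sqrt{\left[J_P(\hat\beta)^{-1}\right]_{qq}}}. \]

Multiple parameters: test $H_0: \beta = \beta_0$.

Wald statistic: \[ \chi^2_W := (\hat\beta - \beta_0)^{\top} J_P(\beta_0) (\hat\beta - \beta_0); \]
Score statistic: \[ \chi^2_{SC} = U(\beta_0)^{\top} J_P(\beta_0)^{-1} U(\beta_0); \]
Likelihood ratio statistic: \[ \chi^2_{LR} = 2\left[\ell_P(\hat\beta) - \ell_P(\beta_0)\right], \]

where $\ell_P := \log L_P$ and $U$ is the score vector (the gradient of $\ell_P$). Under the null hypothesis these are all asymptotically chi-squared distributed with $p$ degrees of freedom.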
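As a quick arithmetic check of these tests, the p-values in the aml output (coef 0.916, se 0.512, likelihood ratio statistic 3.38 on 1 df) can be reproduced in base R. The variable names are hypothetical; the inputs are the rounded values printed in Table 9.1, so the results agree only up to rounding.

```r
coef_hat <- 0.916   # estimated coefficient from Table 9.1
se_hat   <- 0.512   # its standard error
lr_stat  <- 3.38    # likelihood ratio statistic

z          <- coef_hat / se_hat                 # single-parameter test statistic
p_wald     <- 2 * pnorm(-abs(z))                # two-sided normal p-value
wald_chisq <- z^2                               # same test on the chi-squared scale
p_lr       <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
```

With one parameter the Wald chi-squared statistic is just the square of $z$, so `pchisq(wald_chisq, 1, lower.tail = FALSE)` returns the same p-value as `p_wald`.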

Dutch Cancer Institute (NKI) breast cancer data

One of the first studies of gene expression in cancer.
Research question: Does the expression level of the oestrogen receptor gene affect prognosis?

Data:
patnr            Patient identification number
d                Survival status; 1 = death; 0 = censored
tyears           Time in years until death or last follow-up
diameter         Diameter of the primary tumor
posnod           Number of positive lymph nodes
age              Age of the patient
mlratio          Oestrogen expression level
chemotherapy     Chemotherapy used (yes/no)
hormonaltherapy  Hormonal therapy used (yes/no)
typesurgery      Type of surgery (excision or mastectomy)
histolgrade      Histological grade (intermediate, poorly, or well differentiated)
vasc.invasion    Vascular invasion (-, +, or +/-)

library(survival)
# Survival object: follow-up time in years and death indicator
nki.surv = with(nki, Surv(tyears, d))
# First model, without vascular invasion and type of surgery
nki.cox = with(nki, coxph(nki.surv ~ posnodes + chemotherapy + hormonaltherapy
    + histolgrade + age + mlratio + diameter + posnodes))
# Second model (overwrites the first), adding vasc.invasion and typesurgery
nki.cox = with(nki, coxph(nki.surv ~ posnodes + chemotherapy + hormonaltherapy
    + histolgrade + age + mlratio + diameter + posnodes + vasc.invasion + typesurgery))

> summary(nki.cox)
Call:
coxph(formula = nki.surv ~ posnodes + chemotherapy + hormonaltherapy +
    histolgrade + age + mlratio + diameter + posnodes + vasc.invasion +
    typesurgery)

  n= 295, number of events= 79

                          coef  exp(coef)  se(coef)       z  Pr(>|z|)
posnodes                0.0744      1.077    0.0528   1.409     0.159
chemotherapyyes        -0.423       0.655    0.298   -1.421     0.155
hormonaltherapyyes     -0.172       0.842    0.442   -0.388     0.698
histolgradepoorly diff  0.266       1.30     0.281    0.946     0.344
histolgradewell diff   -1.31        0.270    0.548   -2.39      0.0170  *
age                    -0.039       0.96     0.0195  -2.02      0.0438  *
mlratio                -0.750       0.472    0.211   -3.550     0.00039 ***
diameter                0.0198      1.02     0.0132   1.493     0.135
vasc.invasion+          0.603       1.83     0.253    2.38      0.0172  *
vasc.invasion+/-       -0.146       0.864    0.491   -0.297     0.767
typesurgerymastectomy   0.150       1.16     0.249    0.605     0.545
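To read one row of this output, the hazard ratio, its z statistic, and an approximate 95% confidence interval can be recovered from `coef` and `se(coef)`. The sketch below does so for `mlratio`; the variable names are hypothetical, and the interval uses the usual normal approximation on the log-hazard scale, which is an assumption rather than something shown on the slide.

```r
coef_hat <- -0.750   # mlratio coefficient from the summary output
se_hat   <- 0.211    # its standard error

hr <- exp(coef_hat)                                   # hazard ratio per unit of mlratio
z  <- coef_hat / se_hat                               # Wald z statistic
ci <- exp(coef_hat + c(-1, 1) * qnorm(0.975) * se_hat)  # approximate 95% CI for the HR
```

The hazard ratio matches the `exp(coef)` column, and the interval runs from about 0.31 to 0.71, so it excludes 1, in line with the small p-value for `mlratio`.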

Stratified baselines

Individual $i$ belongs to category $c_i$ with baseline hazard $\alpha_{c0}(t)$. So the hazard rate for individual $i$ is

\[ \alpha\big(t \mid x_i(t), c_i\big) = \alpha_{c_i 0}(t)\, r\big(\beta, x_i(t)\big). \]

We estimate $\beta$ with the stratified partial likelihood

\[ L_P(\beta) = \prod_{c=1}^{k} \prod_{t_j : c_{i_j} = c} \frac{r(\beta, x_{i_j}(t_j))}{\sum_{l \in R_j : c_l = c} r(\beta, x_l(t_j))}. \]

Example: NHANES data. Looking at the effect of systolic and diastolic BP on survival. Different ethnic groups and sexes will have distinct baseline mortality, which is potentially confounded with the parameters of interest.

C1=coxph(formula = with(nhanesc, Surv(age,age+yrsfu,mrtHrt+mrtOtherCVD) ) ~ meansys + meandias + strata(race) + strata(female),data=nhanesc)
C2=coxph(formula = with(nhanesc, Surv(age,age+yrsfu,mrtHrt+mrtOtherCVD) ) ~ meansys + meandias + race + female,data=nhanesc)
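The difference between stratifying on a variable and entering it as a covariate can be seen on simulated data. The sketch below assumes the `survival` package is installed; the data-generating values (rates, effect size, sample size, seed) are hypothetical and chosen only for illustration.

```r
library(survival)

set.seed(1)
n <- 400
g <- rbinom(n, 1, 0.5)                           # group defining the stratum
x <- rnorm(n)                                    # covariate of interest
rate <- ifelse(g == 1, 0.5, 2) * exp(0.5 * x)    # group-specific baseline, log-HR 0.5 for x
event_time <- rexp(n, rate)
cens_time  <- rexp(n, 0.3)                       # independent censoring
sim <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.numeric(event_time <= cens_time),
                  x = x, g = g)

# Separate baseline hazard per group: no coefficient is estimated for g.
fit_strata <- coxph(Surv(time, status) ~ x + strata(g), data = sim)
# Common baseline hazard: g enters as a proportional-hazards covariate.
fit_covar  <- coxph(Surv(time, status) ~ x + g, data = sim)
```

In the stratified fit the group effect is absorbed into the baseline hazards, so only the coefficient for `x` is reported; this mirrors the contrast between `C1` and `C2`.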

> summary(C1)
Call:
coxph(formula = with(nhanesc, Surv(age, age + yrsfu, mrtHrt + mrtOtherCVD)) ~
    meansys + meandias + strata(race) + strata(female), data = nhanesc)

  n= 10289, number of events= 560

              coef exp(coef)  se(coef)      z Pr(>|z|)
meansys   0.010433  1.010487  0.002284  4.568 4.92e-06 ***
meandias -0.003964  0.996043  0.004547 -0.872    0.383
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         exp(coef) exp(-coef) lower.95 upper.95
meansys      1.010     0.9896   1.0060    1.015
meandias     0.996     1.0040   0.9872    1.005

Concordance= 0.55  (se = 0.031 )
Rsquare= 0.002   (max possible= 0.437 )
Likelihood ratio test= 23.07  on 2 df,   p=9.771e-06
Wald test            = 24.12  on 2 df,   p=5.772e-06
Score (logrank) test = 24.13  on 2 df,   p=5.755e-06

> summary(C2)
Call:
coxph(formula = with(nhanesc, Surv(age, age + yrsfu, mrtHrt + mrtOtherCVD)) ~
    meansys + meandias + race + female, data = nhanesc)

  n= 10289, number of events= 560

               coef exp(coef)  se(coef)      z Pr(>|z|)
meansys    0.010641  1.010698  0.002254  4.721 2.35e-06 ***
meandias  -0.005255  0.994759  0.004515 -1.164  0.24445
raceother -0.366528  0.693137  0.118370 -3.096  0.00196 **
racewhite -0.439763  0.644189  0.100398 -4.380 1.19e-05 ***
female    -0.507339  0.602096  0.088151 -5.755 8.65e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower.95 upper.95
meansys      1.0107     0.9894   1.0062   1.0152
meandias     0.9948     1.0053   0.9860   1.0036
raceother    0.6931     1.4427   0.5496   0.8741
racewhite    0.6442     1.5523   0.5291   0.7843
female       0.6021     1.6609   0.5066   0.7156

Concordance= 0.621  (se = 0.013 )
Rsquare= 0.008   (max possible= 0.535 )
Likelihood ratio test= 83.2  on 5 df,   p=2.22e-16
Wald test            = 85.96  on 5 df,   p=0
Score (logrank) test = 86.94  on 5 df,   p=0

Model diagnostics: General approaches

1. Compare features of the data to features characteristic of the model, such as:
   i. proportional hazards, additive hazards assumption
   ii. regression assumptions, e.g., logarithmic link in Cox model
   iii. important outliers in otherwise well-specified model
2. Compare expected features of the fitted model to observed features of the real data.
3. Simulate from the fitted model and examine how typical the real data would be for a simulated data set.
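One common way to check item 1(i), the proportional hazards assumption, is a test based on scaled Schoenfeld residuals. The sketch below assumes the `survival` package and uses its `cox.zph` function on the aml fit from Table 9.1; it is one possible diagnostic, not necessarily the one used in the lecture.

```r
library(survival)

# Refit the aml model from Table 9.1 (x is the maintenance indicator).
fit <- coxph(Surv(time, status) ~ x, data = aml)

# Test for a time trend in the scaled Schoenfeld residuals;
# a small p-value would suggest non-proportional hazards.
zph <- cox.zph(fit)
print(zph)

# plot(zph) would show the residuals against time with a smoothed trend.
```

A roughly flat smoothed curve in `plot(zph)` is consistent with a constant log hazard ratio over time.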

[Figure: Survival for 3 different gene expression levels (based on the Cox model). Survival (0.0 to 1.0) against Time (Years, 0 to 15), with curves for mlratio = -1.2 (10th percentile), -0.1 (median), and 0.35 (90th percentile).]

[Figure: Survival for 3 different gene expression strata (Kaplan-Meier estimates). Survival (0.0 to 1.0) against time (0 to 15), with curves for mlratio quantiles 1, 2-4, and 5.]

[Figure: NKI survival. Survival (0.0 to 1.0) against Time (Years, 0 to 15), showing curves for mlratio = -1.2, -0.1, and 0.35 together with curves for mlratio quantiles 1, 2-4, and 5.]