STK4080/9080 Survival and event history analysis

Lecture 7: Regression modelling. Relative risk regression

Topics: partial likelihood; estimation of cumulative hazards and survival probabilities; martingale residuals and model check; stratified models.

Regression models

Assume that we have a sample of n individuals, and let $N_i(t)$ count the observed occurrences of the event of interest for individual i as a function of (study) time t. We have the decomposition

$$dN_i(t) = \lambda_i(t)\,dt + dM_i(t)$$

(observation = signal + noise).

We will consider regression models where the intensity process $\lambda_i(t)$ for individual i depends on a vector of (possibly) time-dependent covariates

$$x_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{ip}(t))^T .$$

The intensity process for individual i may be given as

$$\lambda_i(t) = Y_i(t)\,\alpha(t \mid x_i),$$

where $Y_i(t)$ is the at-risk indicator and $\alpha(t \mid x_i)$ is the hazard rate (intensity). The time-dependency of the covariates is suppressed in the notation.

A regression model specifies how the hazard rate depends on the covariates. We will consider two types of regression models:

- Relative risk regression models (section 4.1)
- Additive regression models (section 4.2)

A note on covariates

We assume that the intensity processes depend on the covariate processes

$$x_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{ip}(t))^T, \quad i = 1, \ldots, n.$$

Throughout we will assume that the covariate processes are predictable. This implies that:

- fixed covariates should be measured in advance (i.e. at time zero) and remain fixed throughout the study
- the values at time t of time-dependent covariates should be known "just before" time t

You should never let covariates depend on information from the future!

It is useful to distinguish between external (or exogenous) and internal (or endogenous) covariates.

Examples of external covariates are:

- Fixed covariates
- Defined time-dependent covariates: the complete covariate path is given at the outset of the study (e.g. a person's age at study time t)
- Ancillary time-dependent covariates: the path of a stochastic process that is not influenced by the event being studied (e.g. observed level of air pollution)

Time-dependent covariates that are not external are called internal. One example of an internal covariate is a biomarker measured for the individuals during follow-up.

Interpretation of regression analyses with internal time-dependent covariates is not at all straightforward!

Relative risk regression models

Assume that the hazard rate for individual i takes the form

$$\alpha(t \mid x_i) = \alpha_0(t)\, r(\beta, x_i(t)),$$

where $\alpha_0(t)$ is the baseline hazard and $r(\beta, x_i(t))$ is the hazard ratio (relative risk).

We assume $r(\beta, 0) = 1$, so the baseline hazard is the hazard for an individual with all covariates equal to zero.

We make no assumptions about the form of the baseline hazard. Thus the model contains a nonparametric part (the baseline hazard) and a parametric part (the relative risk function). We say that the model is semiparametric.

The common choice of relative risk function is

$$r(\beta, x_i(t)) = \exp\{\beta^T x_i(t)\} = \exp\{\beta_1 x_{i1}(t) + \cdots + \beta_p x_{ip}(t)\},$$

which gives Cox's regression model.

Consider two individuals, indexed 1 and 2, and assume that all components of $x_1(t)$ and $x_2(t)$ are equal, except the j-th component where $x_{2j}(t) = x_{1j}(t) + 1$. Then:

$$\frac{\alpha(t \mid x_2)}{\alpha(t \mid x_1)} = \frac{\alpha_0(t)\exp\{\beta^T x_2(t)\}}{\alpha_0(t)\exp\{\beta^T x_1(t)\}} = \exp\{\beta_j (x_{2j}(t) - x_{1j}(t))\} = e^{\beta_j}$$

Thus $e^{\beta_j}$ is the hazard ratio for one unit's increase in the j-th covariate, keeping all other covariates constant.

Other possible choices of the relative risk function are:

- The additive risk function: $r(\beta, x_i(t)) = 1 + \beta^T x_i(t)$
- The excess relative risk function: $r(\beta, x_i(t)) = \prod_{j=1}^{p} \{1 + \beta_j x_{ij}(t)\}$

Cox regression is the only relative risk regression model implemented in R.
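The interpretation of $e^{\beta_j}$ as a hazard ratio can be checked numerically. The following is only an illustrative sketch in R: the coefficients, the baseline hazard `alpha0` and the covariate vectors are made-up values, not taken from the lecture.

```r
# Hypothetical Cox model with two fixed covariates
beta   <- c(0.4, -0.2)                 # made-up regression coefficients
alpha0 <- function(t) 0.1 * t          # made-up baseline hazard
hazard <- function(t, x) alpha0(t) * exp(sum(beta * x))

# Two individuals who differ by one unit in the first covariate only
x1 <- c(1, 3)
x2 <- c(2, 3)

# Hazard ratio at several time points
t_grid <- c(0.5, 1, 2, 5)
ratio  <- sapply(t_grid, function(t) hazard(t, x2) / hazard(t, x1))

# The ratio is the same at every time point and equals exp(beta[1])
print(ratio)
print(exp(beta[1]))
```

The baseline hazard cancels in the ratio, which is why its form never has to be specified in order to interpret the coefficients.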

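Partial likelihood and estimation of β

Ordinary ML-estimation does not work for the relative risk regression models (due to the nonparametric baseline). Instead we have to use a partial likelihood. We will see how this may be derived.

The intensity process of $N_i(t)$ is given as

$$\lambda_i(t) = Y_i(t)\,\alpha(t \mid x_i) = Y_i(t)\,\alpha_0(t)\, r(\beta, x_i(t)).$$

The intensity process of the aggregated counting process $N_\bullet(t) = \sum_{l=1}^{n} N_l(t)$ takes the form (assuming no joint events)

$$\lambda_\bullet(t) = \sum_{l=1}^{n} \lambda_l(t) = \sum_{l=1}^{n} Y_l(t)\,\alpha_0(t)\, r(\beta, x_l(t)).$$

We consider the conditional probability of observing an event for individual i at time t, given the past and given that an event is observed at time t:

$$\pi(i \mid t) = P(dN_i(t) = 1 \mid dN_\bullet(t) = 1, \mathcal{F}_{t-}) = \frac{P(dN_i(t) = 1 \mid \mathcal{F}_{t-})}{P(dN_\bullet(t) = 1 \mid \mathcal{F}_{t-})} = \frac{\lambda_i(t)}{\lambda_\bullet(t)} = \frac{Y_i(t)\, r(\beta, x_i(t))}{\sum_{l=1}^{n} Y_l(t)\, r(\beta, x_l(t))}$$

Then the intensity process of $N_i(t)$ may be factorized as

$$\lambda_i(t) = \lambda_\bullet(t)\,\pi(i \mid t).$$

We obtain the partial likelihood by multiplying together the conditional probabilities over all observed event times $T_1 < T_2 < \cdots$ (thereby disregarding the information on the regression coefficients contained in the aggregated process).

Then, if $i_j$ is the index of the individual who experiences an event at $T_j$, the partial likelihood becomes

$$L(\beta) = \prod_{T_j} \pi(i_j \mid T_j) = \prod_{T_j} \frac{r(\beta, x_{i_j}(T_j))}{\sum_{l \in R_j} r(\beta, x_l(T_j))},$$

where $R_j = \{l : Y_l(T_j) = 1\}$ is the risk set at $T_j$.

We will show (later) that the maximum partial likelihood estimator $\hat\beta$ enjoys "the usual properties" of ML-estimators.

Thus $\hat\beta$ is approximately multivariate normally distributed around the true value of $\beta$, with a covariance matrix that may be estimated by $I(\hat\beta)^{-1}$, where $I(\beta) = -\partial^2 \log L(\beta) / \partial\beta\,\partial\beta^T$ is the observed information matrix.

For general relative risk functions it may be better to use the expected information matrix. But as this coincides with the observed information matrix for Cox regression, we will not go into these details (cf. section 4.1.5).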
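To make the construction of the partial likelihood concrete, here is a minimal sketch in base R for Cox's model with one fixed covariate. The six observations are made-up toy data without tied event times, and the function name `log_partial_lik` is hypothetical; in practice one would use `coxph` from the survival package.

```r
# Toy data (made up): follow-up times, event indicators (1 = event, 0 = censored)
# and one fixed binary covariate
time   <- c(2, 3, 5, 7, 8, 11)
status <- c(1, 0, 1, 1, 0, 0)
x      <- c(1, 0, 1, 0, 1, 0)

# Log partial likelihood with relative risk function r(beta, x) = exp(beta * x)
log_partial_lik <- function(beta) {
  event_times <- time[status == 1]
  contrib <- sapply(event_times, function(tj) {
    i_j  <- which(time == tj & status == 1)  # individual with an event at T_j
    risk <- which(time >= tj)                # risk set R_j
    beta * x[i_j] - log(sum(exp(beta * x[risk])))
  })
  sum(contrib)
}

# Maximize over beta; the log partial likelihood is concave for Cox's model
fit      <- optimize(log_partial_lik, interval = c(-5, 5), maximum = TRUE)
beta_hat <- fit$maximum

# For these toy data the score equation reduces to u^2 - u - 4 = 0 with u = exp(beta),
# so the numerical maximizer can be compared with the closed-form root
print(beta_hat)
print(log((1 + sqrt(17)) / 2))
```

Note that the baseline hazard never enters the function: only the ordering of the event times and the composition of the risk sets matter.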

To test the null hypothesis $H_0: \beta_j = 0$, we may use the Wald test statistic

$$Z = \frac{\hat\beta_j}{se(\hat\beta_j)},$$

which is approximately standard normally distributed under the null hypothesis.

To obtain a confidence interval for the hazard ratio $e^{\beta_j}$, we transform the limits of the standard confidence interval for $\beta_j$ to get the 95% confidence interval:

$$\exp\{\hat\beta_j \pm 1.96\, se(\hat\beta_j)\}$$

To test the simple null hypothesis $H_0: \beta = \beta_0$ for a specified value of $\beta_0$ (typically $\beta_0 = 0$) we may apply the usual likelihood based test statistics:

- The likelihood ratio test statistic: $\chi^2_{LR} = 2\{\log L(\hat\beta) - \log L(\beta_0)\}$
- The score test statistic: $\chi^2_{SC} = U(\beta_0)^T I(\beta_0)^{-1} U(\beta_0)$, where $U(\beta) = \partial \log L(\beta) / \partial\beta$ is the vector of score functions
- The Wald test statistic: $\chi^2_{W} = (\hat\beta - \beta_0)^T I(\hat\beta)(\hat\beta - \beta_0)$

All the test statistics are approximately chi-squared distributed with p df under the null hypothesis.

All the tests may be generalized to a composite null hypothesis, where one wants to test the hypothesis that r of the regression coefficients are zero (or equivalently, after a reparameterization, that there are r linear restrictions among the regression coefficients).

In particular, if $\beta^*$ is the maximum partial likelihood estimator under the null hypothesis, the likelihood ratio test statistic takes the form

$$\chi^2_{LR} = 2\{\log L(\hat\beta) - \log L(\beta^*)\}$$

and it is approximately chi-squared distributed with r df under the null hypothesis.

Using R

For illustration we use the melanoma data (cf. practical exercises 1 and 2).

    library(survival)

    # Read data (the term folder "h14" in the path is an assumption):
    path="http://www.uio.no/studier/emner/matnat/math/STK4080/h14/melanoma.txt"
    melanoma=read.table(path,header=TRUE)

    # We first consider the model with log-thickness as the only covariate:
    fit.t=coxph(Surv(lifetime,status==1)~log2(thickn),data=melanoma)
    summary(fit.t)
    # Note that we use base 2 logarithms for ease of interpretation

    # Then we consider the model with log-thickness and sex as covariates:
    fit.ts=coxph(Surv(lifetime,status==1)~log2(thickn)+sex,data=melanoma)
    summary(fit.ts)
    # Note that since sex is a binary covariate (coded 1 and 2), we get the same
    # estimates if we treat sex as a numeric covariate or as a categorical
    # covariate [by using factor(sex) in the coxph-command]

    # The two models may be compared using the likelihood ratio test:
    anova(fit.t,fit.ts,test="Chisq")
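The Wald test, the confidence interval for a hazard ratio and the likelihood ratio test described in this section are simple to compute by hand. The sketch below uses made-up numbers for the estimate, its standard error and the two log partial likelihoods; none of them come from the melanoma data.

```r
# Made-up estimate and standard error for one regression coefficient
beta_hat <- 0.50
se_beta  <- 0.20

# Wald test of H0: beta_j = 0
z      <- beta_hat / se_beta
p_wald <- 2 * pnorm(-abs(z))

# Hazard ratio with 95% confidence interval (transform the limits for beta_j)
hr    <- exp(beta_hat)
ci_hr <- exp(beta_hat + c(-1, 1) * 1.96 * se_beta)

# Likelihood ratio test for a composite null hypothesis with r restrictions,
# using made-up maximized log partial likelihoods
loglik_null <- -285.0   # model under the null hypothesis
loglik_full <- -281.5   # larger model
r           <- 1
chi2_lr <- 2 * (loglik_full - loglik_null)
p_lr    <- pchisq(chi2_lr, df = r, lower.tail = FALSE)

print(c(z = z, p_wald = p_wald))
print(c(hr = hr, lower = ci_hr[1], upper = ci_hr[2]))
print(c(chi2_lr = chi2_lr, p_lr = p_lr))
```

The interval is computed on the log scale and then exponentiated, so it is not symmetric around the estimated hazard ratio.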

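Estimation of cumulative hazards and survival probabilities

We will estimate the cumulative baseline hazard

$$A_0(t) = \int_0^t \alpha_0(u)\,du.$$

We take the aggregated counting process $N_\bullet(t) = \sum_{l=1}^{n} N_l(t)$ as our starting point. Its intensity process is given by

$$\lambda_\bullet(t) = \Big\{ \sum_{l=1}^{n} Y_l(t)\, r(\beta, x_l(t)) \Big\}\, \alpha_0(t).$$

If we had known β, this would have been an example of the multiplicative intensity model. For a given value of β, we may therefore estimate $A_0(t)$ by

$$\hat A_0(t; \beta) = \int_0^t \frac{dN_\bullet(u)}{\sum_{l=1}^{n} Y_l(u)\, r(\beta, x_l(u))} = \sum_{T_j \le t} \frac{1}{\sum_{l \in R_j} r(\beta, x_l(T_j))}.$$

Since β is unknown, we replace it by $\hat\beta$ to obtain the Breslow estimator:

$$\hat A_0(t) = \sum_{T_j \le t} \frac{1}{\sum_{l \in R_j} r(\hat\beta, x_l(T_j))}$$

If all covariates are fixed, the cumulative hazard corresponding to an individual with a given covariate vector $x_0$ is

$$A(t \mid x_0) = \int_0^t \alpha(u \mid x_0)\,du = r(\beta, x_0)\, A_0(t)$$

and it may be estimated by

$$\hat A(t \mid x_0) = r(\hat\beta, x_0)\, \hat A_0(t).$$

For a given path of an external time-dependent covariate, the cumulative hazard may be estimated by

$$\hat A(t \mid x_0) = \int_0^t r(\hat\beta, x_0(u))\, d\hat A_0(u) = \sum_{T_j \le t} \frac{r(\hat\beta, x_0(T_j))}{\sum_{l \in R_j} r(\hat\beta, x_l(T_j))}.$$

The corresponding survival function is given by

$$S(t \mid x_0) = \prod_{u \le t} \{1 - dA(u \mid x_0)\}$$

and it may be estimated by

$$\hat S(t \mid x_0) = \prod_{T_j \le t} \{1 - \Delta\hat A(T_j \mid x_0)\}.$$

Alternatively we may use (as is done in R):

$$\tilde S(t \mid x_0) = \exp\{-\hat A(t \mid x_0)\}$$

For practical purposes there is little difference between the two estimators.

The estimators of the cumulative hazards and survival functions are approximately normal, and their variances may be estimated as described in section 4.1.6 (which is not part of the curriculum).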
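The Breslow estimator and the two survival estimators can be written out in a few lines of base R. This is a sketch on made-up toy data with one fixed covariate and no tied event times; `beta_hat` is set to the value that maximizes the partial likelihood for these particular toy data, and the covariate value `x0` is an arbitrary choice for illustration.

```r
# Toy data (made up): times, event indicators and one fixed binary covariate
time   <- c(2, 3, 5, 7, 8, 11)
status <- c(1, 0, 1, 1, 0, 0)
x      <- c(1, 0, 1, 0, 1, 0)

# Maximum partial likelihood estimate for these toy data
# (exp(beta) solves u^2 - u - 4 = 0)
beta_hat <- log((1 + sqrt(17)) / 2)

# Increments of the Breslow estimator: 1 / sum over the risk set of exp(beta_hat * x_l)
event_times <- sort(time[status == 1])
dA0 <- sapply(event_times, function(tj) 1 / sum(exp(beta_hat * x[time >= tj])))
A0_hat <- cumsum(dA0)                      # Breslow estimator at the event times

# Cumulative hazard and survival estimates for an individual with covariate x0
x0     <- 1
A_hat  <- exp(beta_hat * x0) * A0_hat
S_prod <- cumprod(1 - exp(beta_hat * x0) * dA0)   # product form
S_exp  <- exp(-A_hat)                             # exponential form (as used in R)

print(data.frame(time = event_times, A0_hat, A_hat, S_prod, S_exp))
```

With only three event times the increments are large, so the two survival estimates differ visibly here; the product form is never above the exponential form because $1 - a \le e^{-a}$, and the gap shrinks as the increments become small.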

Using R

For illustration we continue to use the melanoma data.

    # Axis limits below (0 to 10 years, cumulative hazard 0 to 0.7) are assumed values

    # We first consider ulceration as the only covariate and start by
    # making Nelson-Aalen plots for patients with and without ulceration:
    fit.su=coxph(Surv(lifetime,status==1)~strata(ulcer),data=melanoma)
    surv.su=survfit(fit.su)
    plot(surv.su,fun="cumhaz",mark.time=FALSE,xlim=c(0,10),ylim=c(0,0.7),
         xlab="Years since operation",ylab="Cumulative hazard",lty=1:2)
    legend("topleft",c("ulceration","no ulceration"),lty=1:2)

    # We then fit a Cox model with ulceration as the only covariate and plot
    # the model based estimates of the cumulative hazards in the same plot:
    fit.u=coxph(Surv(lifetime,status==1)~ulcer,data=melanoma)
    surv.u=survfit(fit.u,newdata=data.frame(ulcer=c(1,2)))
    lines(surv.u,fun="cumhaz",mark.time=FALSE,conf.int=FALSE,lty=1:2,col="red")

    # We then consider the model with ulceration and log-thickness
    fit.ut=coxph(Surv(lifetime,status==1)~ulcer+log2(thickn),data=melanoma)
    summary(fit.ut)

    # We will plot the cumulative hazards for the four covariate combinations
    # 1) ulcer=2, thickn=1
    # 2) ulcer=2, thickn=4
    # 3) ulcer=1, thickn=4
    # 4) ulcer=1, thickn=8
    new.covariates=data.frame(ulcer=c(2,2,1,1),thickn=c(1,4,4,8))
    surv.ut=survfit(fit.ut,newdata=new.covariates)
    plot(surv.ut,fun="cumhaz",mark.time=FALSE,xlim=c(0,10),
         xlab="Years since operation",ylab="Cumulative hazard",lty=1:4)
    legend("topleft",c("1","2","3","4"),lty=1:4)

    # To plot the survival functions for the same combinations of the
    # covariates we just omit the "cumhaz" option:
    plot(surv.ut,mark.time=FALSE,xlim=c(0,10),
         xlab="Years since operation",lty=1:4)
    legend("bottomleft",c("1","2","3","4"),lty=1:4)

Martingale residuals and model check

We know that the processes

$$M_i(t) = N_i(t) - \Lambda_i(t), \quad \text{with} \quad \Lambda_i(t) = \int_0^t \lambda_i(u)\,du = \int_0^t Y_i(u)\, r(\beta, x_i(u))\, \alpha_0(u)\,du,$$

are martingales if the model is correctly specified.

We may estimate $\Lambda_i(t)$ by inserting $\hat\beta$ for $\beta$ and $d\hat A_0(u)$ for $\alpha_0(u)\,du$, where $\hat A_0(t)$ is the Breslow estimator.

Estimated cumulative intensity processes:

$$\hat\Lambda_i(t) = \int_0^t Y_i(u)\, r(\hat\beta, x_i(u))\, d\hat A_0(u) = \sum_{T_j \le t} \frac{Y_i(T_j)\, r(\hat\beta, x_i(T_j))}{\sum_{l \in R_j} r(\hat\beta, x_l(T_j))}$$

Martingale residual processes:

$$\hat M_i(t) = N_i(t) - \hat\Lambda_i(t)$$

Martingale residuals:

$$\hat M_i = \hat M_i(\tau),$$

where $\tau$ is the upper time limit of the study.
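The martingale residuals defined above can be computed directly from the increments of the Breslow estimator. The following base R sketch again uses made-up toy data with one fixed covariate and no ties, with `beta_hat` set to the maximum partial likelihood estimate for these data. It also checks two algebraic properties that follow from the formulas: the residuals sum to zero, and at $\hat\beta$ they are orthogonal to the covariate (this second sum is the score function).

```r
# Toy data (made up): times, event indicators and one fixed binary covariate
time   <- c(2, 3, 5, 7, 8, 11)
status <- c(1, 0, 1, 1, 0, 0)
x      <- c(1, 0, 1, 0, 1, 0)

# Maximum partial likelihood estimate for these toy data
# (exp(beta) solves u^2 - u - 4 = 0)
beta_hat <- log((1 + sqrt(17)) / 2)

# Increments of the Breslow estimator at the ordered event times
event_times <- sort(time[status == 1])
dA0 <- sapply(event_times, function(tj) 1 / sum(exp(beta_hat * x[time >= tj])))

# Estimated cumulative intensity at the end of follow-up for each individual:
# sum over event times T_j at which individual i is still at risk
Lambda_hat <- sapply(seq_along(time), function(i) {
  at_risk <- event_times <= time[i]        # Y_i(T_j) = 1
  exp(beta_hat * x[i]) * sum(dA0[at_risk])
})

# Martingale residuals: observed minus "expected" number of events
M_hat <- status - Lambda_hat

print(round(M_hat, 4))
print(sum(M_hat))        # zero up to rounding error
print(sum(x * M_hat))    # zero up to rounding error at beta_hat
```

Each residual is at most 1 and can be arbitrarily negative, which is one reason the residuals are usually cumulated or smoothed rather than inspected one by one.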

In the ABG-book (section 4.1.3) a method is described for checking goodness-of-fit for relative risk regression models using grouped martingale residual processes.

We will not consider this method, but rather present the methods of Lin et al. (Biometrika 1993) for checking the assumptions of Cox regression using cumulative sums of martingale residual processes.

So consider Cox's regression model with fixed covariates:

$$\alpha(t \mid x) = \alpha_0(t)\exp(\beta^T x)$$

The model assumes:

1) Log-linearity: $\log\{\alpha(t \mid x)\} = \log\{\alpha_0(t)\} + \beta^T x$

2) Proportional hazards: $\dfrac{\alpha(t \mid x_2)}{\alpha(t \mid x_1)} = \exp\{\beta^T (x_2 - x_1)\}$ (independent of time)

For checking log-linearity, i.e. if the k-th covariate has the correct functional form, we may consider

$$W_k(x) = \sum_{i=1}^{n} I(x_{ik} \le x)\, \hat M_i = \sum_{i=1}^{n} I(x_{ik} \le x)\, N_i(\tau) - \sum_{T_j} \frac{\sum_{l \in R_j} I(x_{lk} \le x) \exp(\hat\beta^T x_l)}{\sum_{l \in R_j} \exp(\hat\beta^T x_l)}$$

The two terms are the observed and expected number of failures for covariate values $\le x$.

Illustration for melanoma data with ulceration and tumor thickness (not log-transformed) as covariates:

[Figure: observed test process $W_k(x)$ plotted against tumor thickness]

If the model is correctly specified, the test process should fluctuate around zero. So «large» values indicate that the covariate has a wrong functional form. But how large is «large»?

Lin et al. (1993) showed that if the model is correctly specified, $W_k(x)$ is asymptotically distributed as a mean zero Gaussian process.

The limiting distribution is intractable, but Lin et al. suggested a way to approximate the distribution using Monte Carlo simulations.

The trick is to consider an asymptotic approximation of $W_k(x)$ and to replace $dM_i(t)$ in this approximation by $G_i\, dN_i(t)$, where the $G_i$'s are sampled from a standard normal distribution (keeping the data fixed).

[Figure: cumulative MG-residuals against tumor thickness. Plot of the observed test process together with 5 simulated processes (assuming a correct model)]

The computation may be performed using the timereg package in R, cf. below and section 6.2 in Martinussen & Scheike (Springer 2006).

The plot indicates that the model predicts too many deaths for thin tumors.

To get a P-value we compare $\sup_x |W_k(x)|$ with simulated processes, giving P=.64.

For a model with log thickness and ulceration we get the following result:

[Figure: cumulative MG-residuals against log tumor thickness]

Here the assumption of a log-linear effect seems fine (P=.3).

For checking proportional hazards, we for the k-th covariate consider

$$U_k(t) = \sum_{i=1}^{n} x_{ik}\, \hat M_i(t) = \sum_{i=1}^{n} x_{ik}\, N_i(t) - \sum_{T_j \le t} \frac{\sum_{l \in R_j} x_{lk} \exp(\hat\beta^T x_l)}{\sum_{l \in R_j} \exp(\hat\beta^T x_l)}$$

Illustration for melanoma data with log tumor thickness and ulceration:

[Figure: cumulative MG-residuals against years since operation, two panels: prop(ulcer) (P=.4) and prop(log2(thickn)) (P=.5)]

Using R

For illustration we continue to use the melanoma data. We will use the timereg package, so this needs to be installed and loaded.

    library(timereg)

    # The values weighted.test=0, rate.sim=0 and n.sim=1000 are assumed settings

    # We first consider a model with ulceration and thickness (not log-transformed)
    fit.ut=cox.aalen(Surv(lifetime,status==1)~prop(ulcer)+prop(thickn),
                     data=melanoma,weighted.test=0,residuals=1,rate.sim=0,n.sim=1000)

    # Check of log-linearity
    resids.ut=cum.residuals(fit.ut,data=melanoma,cum.resid=1)
    plot(resids.ut,score=2,xlab="Tumor thickness")
    summary(resids.ut)

    # We then check log-linearity and proportional hazards for a model
    # with log-transformed thickness
    fit.ult=cox.aalen(Surv(lifetime,status==1)~prop(ulcer)+prop(log2(thickn)),
                      data=melanoma,weighted.test=0,residuals=1,rate.sim=0,n.sim=1000)
    resids.ult=cum.residuals(fit.ult,data=melanoma,cum.resid=1)
    plot(resids.ult,score=2,xlab="log tumor thickness")
    summary(resids.ult)
    par(mfrow=c(1,2))
    plot(fit.ult,score=TRUE,xlab="Years since operation")
    summary(fit.ult)

Stratified models

So far we have assumed a common baseline hazard for all individuals, i.e.

$$\alpha(t \mid x_i) = \alpha_0(t)\, r(\beta, x_i(t)).$$

When this is not a realistic assumption, one may adopt a stratified version of the model.

Then the study population is grouped into k strata, and for an individual i in stratum s we assume that the hazard takes the form:

$$\alpha(t \mid x_i, \text{stratum } s) = \alpha_{s0}(t)\, r(\beta, x_i(t))$$

Note that the effects of the covariates are assumed to be the same across strata, while the baseline hazard may vary between strata.

We now estimate β by maximizing the partial likelihood

$$L(\beta) = \prod_{s=1}^{k} \prod_{T_{sj}} \frac{r(\beta, x_{i_{sj}}(T_{sj}))}{\sum_{l \in R_{sj}} r(\beta, x_l(T_{sj}))},$$

where $T_{s1} < T_{s2} < \cdots$ are the observed event times in stratum s, $i_{sj}$ is the index of the individual who experiences an event at $T_{sj}$, and $R_{sj}$ is the risk set in this stratum at time $T_{sj}$.

The maximum partial likelihood estimator enjoys similar properties as for the situation without stratification, and statistical tests may be performed as before.

We may estimate the stratum-specific cumulative baseline hazards

$$A_{s0}(t) = \int_0^t \alpha_{s0}(u)\,du$$

by the Breslow estimators

$$\hat A_{s0}(t) = \sum_{T_{sj} \le t} \frac{1}{\sum_{l \in R_{sj}} r(\hat\beta, x_l(T_{sj}))}.$$

As before these provide the basis for estimating cumulative hazards and survival functions for given values of fixed covariates (or given paths of external time-varying covariates).

Using R

For illustration we continue to use the melanoma data.

    # We fit a model where we stratify on ulceration and use log-thickness as covariate
    fit.strat=coxph(Surv(lifetime,status==1)~log2(thickn)+strata(ulcer),data=melanoma)
    summary(fit.strat)

    # We may plot the cumulative baseline hazards for the two ulceration strata
    # (thickn=1 gives log2(thickn)=0; the upper axis limit of 10 years is an assumed value):
    baseline.covar=data.frame(thickn=1)
    surv.strat=survfit(fit.strat,newdata=baseline.covar)
    plot(surv.strat,fun="cumhaz",mark.time=FALSE,xlim=c(0,10),
         xlab="Years since operation",ylab="Cumulative hazard",lty=1:2)