Outline. Recall the Aalen additive hazards model and its semiparametric version


Clustered survival data (additive models)
Additive-multiplicative hazards model

Advanced Survival Analysis 2010, Copenhagen
Giuliana Cortese, gco@biostat.ku.dk

Outline

Clustered survival data
Marginal additive models
Additive-multiplicative hazards model
Cox-Aalen model
Proportional excess hazards models

Aalen additive model

Recall the Aalen additive hazards model
$$\lambda_i(t) = Y_i(t) X_i^T(t)\beta(t) = Y_i(t) \sum_{j=1}^{p} X_{ij}(t)\beta_j(t), \tag{1}$$
and the semiparametric version
$$\lambda_i(t) = Y_i(t)\left[ X_i^T(t)\beta(t) + Z_i^T(t)\gamma \right]. \tag{2}$$
These are models for the intensity of the observed counting process, i.e.
$$\lambda(t)\,dt = E\left[ dN(t) \mid \sigma(N(s), X(s), Z(s), Y(s),\ s \in [0, t)) \right].$$

Clustered survival data

In clustered data, failure times within each cluster may be correlated. Many examples:
Time to blindness in both eyes
Twin studies
Recurrent events

Here, the choice of survival model depends on the aims of the study, such as
1) correlation of failure times within clusters (conditional models)
2) comparison of failure times across clusters (marginal models)
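As a worked illustration of models (1) and (2), consider a single binary covariate. The symbols $x_i$, $B_j$ and $S$ below are notation assumed for this sketch; they are not taken from the slides.

```latex
% Assumed setting: X_i = (1, x_i)^T with x_i in {0, 1}, subject i at risk (Y_i(t) = 1).
\lambda_i(t) = \beta_0(t) + \beta_1(t)\,x_i,
\qquad
B_j(t) = \int_0^t \beta_j(s)\,ds,
\qquad
S(t \mid x_i) = \exp\{-B_0(t) - B_1(t)\,x_i\}.
% beta_1(t) is the hazard difference (excess risk) between x_i = 1 and x_i = 0 at time t.
% Semiparametric version (2) with Z_i = x_i: lambda_i(t) = beta_0(t) + gamma * x_i,
% so the excess risk gamma is constant in time.
```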

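Clustered survival data

Two basic approaches:
1) Frailty models: given random effects "Z" the survival times are independent with hazards $\lambda_0(t) Z \exp(X^T\beta)$.
2) Marginal models: given covariates the marginal rate (hazard) is
$$Y(t)\lambda_0(t)\exp(X^T\beta) = E(dN(t) \mid X, Y(t) = 1)/dt.$$

For $k = 1, \dots, K$, $i = 1, \dots, n$, let $\tilde T_{ik}$ and $C_{ik}$ be the failure and censoring times for the $i$th individual in the $k$th cluster, and let $X_{ik}(t)$ be a $p$-vector of covariates. Put $\tilde T_k = (\tilde T_{1k}, \dots, \tilde T_{nk})$, $C_k = (C_{1k}, \dots, C_{nk})$, $X_k(t) = (X_{1k}(t), \dots, X_{nk}(t))$.

We assume that $(\tilde T_k, C_k, X_k(\cdot))$, $k = 1, \dots, K$, are independent and identically distributed and that these variables follow the model described in the following. The right-censored failure time is denoted $T_{ik} = \tilde T_{ik} \wedge C_{ik}$ and as usual we let $Y_{ik}(t) = 1(T_{ik} \ge t)$.

Marginal models

Right-censored failure times: $T_{ik} = \tilde T_{ik} \wedge C_{ik}$, $Y_{ik}(t) = 1(T_{ik} \ge t)$, $N_{ik}(t) = 1(T_{ik} \le t,\ T_{ik} = \tilde T_{ik})$.

A marginal (intensity) model is a model for the intensity with respect to the marginal filtration
$$\mathcal{F}_t^{ik} = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s) : s \le t\}. \tag{3}$$
Cox model:
$$\lambda_{ik}^{\mathcal{F}^{ik}}(t) = Y_{ik}(t)\lambda_0(t)\exp(X_{ik}^T(t)\beta). \tag{4}$$
It is important to note that (4) is not the intensity with respect to the observed filtration
$$\mathcal{F}_t = \bigvee_k \mathcal{F}_t^{k}, \tag{5}$$
where $\mathcal{F}_t^{k} = \sigma\{N_{ik}(s), Y_{ik}(s), X_{ik}(s) : i = 1, \dots, n,\ s \le t\}$ is the information generated by observing all the individuals in the $k$th cluster.

Marginal additive models

Instead of the Cox model we assume the Aalen additive model for the marginal intensities,
$$\lambda_{ik}^{\mathcal{F}^{ik}}(t) = Y_{ik}(t) X_{ik}^T(t)\beta(t). \tag{6}$$
Possible interaction terms are included in the covariate vector $X_{ik}(t)$.

Let $\tilde X_k(t)$ be the $n \times p$ matrix with $i$th row $(Y_{ik}(t)X_{ik1}(t), \dots, Y_{ik}(t)X_{ikp}(t))$, let $B(t) = \int_0^t \beta(s)\,ds$ be the cumulative regression coefficients, and let
$$N_k(t) = (N_{1k}(t), \dots, N_{nk}(t))^T, \qquad M_k(t) = N_k(t) - \int_0^t \tilde X_k(s)\,dB(s).$$
The $i$th component of $M_k(t)$, $M_{ik}(t)$, is a martingale with respect to $\mathcal{F}_t^{ik}$, but $M_k(t)$ is not a martingale with respect to the observed filtration $\mathcal{F}_t$.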
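The counting-process notation above can be made concrete with a small base-R sketch. The data are simulated, and the names `d`, `at_risk` and `counting` are hypothetical helpers introduced only for this illustration.

```r
# Hypothetical toy data: K clusters of size n (e.g. two eyes per patient).
set.seed(1)
K <- 5; n <- 2
d <- data.frame(
  cluster = rep(seq_len(K), each = n),
  t_fail  = rexp(K * n, rate = 0.2),   # latent failure times (T tilde)
  t_cens  = runif(K * n, 0, 10)        # censoring times C
)
d$time   <- pmin(d$t_fail, d$t_cens)          # T = min(T tilde, C)
d$status <- as.integer(d$t_fail <= d$t_cens)  # 1 if the failure was observed

# At-risk indicator Y(t) = 1(T >= t)
at_risk <- function(t) as.integer(d$time >= t)
# Counting process N(t) = 1(T <= t and the failure was observed)
counting <- function(t) as.integer(d$time <= t & d$status == 1)

at_risk(2)   # who is still under observation at t = 2
counting(2)  # who has had an observed failure by t = 2
```

Each individual contributes one row, and the cluster index is only used later, when the variance has to account for within-cluster dependence.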

Marginal additive models

The (unweighted) working independence estimator of $B(t)$ is
$$\hat B(t) = \int_0^t \Big[ \sum_{k=1}^{K} \tilde X_k^T(s)\tilde X_k(s) \Big]^{-1} \sum_{k=1}^{K} \tilde X_k^T(s)\,dN_k(s).$$
We have that
$$K^{1/2}(\hat B(t) - B(t)) = K^{-1/2} \sum_{k=1}^{K} \int_0^t \Big[ K^{-1} \sum_{j=1}^{K} \tilde X_j^T(s)\tilde X_j(s) \Big]^{-1} \tilde X_k^T(s)\,dM_k(s) = K^{-1/2} \sum_{k=1}^{K} \epsilon_k(t), \tag{7}$$
which is essentially a sum of i.i.d. components (replace $K^{-1}\sum_k \tilde X_k^T(t)\tilde X_k(t)$ by its limit in probability).

One may also show that (7) has the same limit distribution as
$$K^{-1/2} \sum_{k=1}^{K} \hat\epsilon_k(t) G_k, \tag{8}$$
where
$$\hat\epsilon_k(t) = \int_0^t \Big[ K^{-1} \sum_{j=1}^{K} \tilde X_j^T(s)\tilde X_j(s) \Big]^{-1} \tilde X_k^T(s)\,d\hat M_k(s), \qquad \hat M_k(t) = N_k(t) - \int_0^t \tilde X_k(s)\,d\hat B(s),$$
and $G_1, \dots, G_K$ are independent standard normal variables.

The asymptotic covariance matrix of $K^{1/2}(\hat B(t) - B(t))$ is estimated consistently by
$$K^{-1} \sum_{k=1}^{K} \hat\epsilon_k^{\otimes 2}(t), \qquad a^{\otimes 2} = a a^T.$$
Resampling from (8) may be used to construct confidence bands and obtain p-values in tests about $B(t)$.

Example: Diabetic retinopathy data

Subset of 197 patients with diabetic retinopathy. The purpose of the study was to assess the efficacy of laser treatment in delaying onset of blindness. Treatment was assigned randomly to one eye of each patient; the other eye was left untreated.

   id  time        status  trteye  treat  adult
1   5  46.2496727  0       2       1      2
2   5  46.275538   0       2       0      2
3  14  42.568362   0       1       1      1
4  14  31.3414453  1       1       0      1
5  16  42.39841    0       1       1      1
6  16  42.274643   0       1       0      1
7  25  2.6443229   0       2       1      1
8  25  2.635121    0       2       0      1
9  29  38.7829463  0       2       1      1
10 29  .3153673    1       2       0      1

> adult.treat<-(diabetes$adult==2)*(diabetes$treat)
> fit<-aalen(Surv(time,status)~adult+treat+adult.treat, diabetes,clusters=diabetes$id)
> plot(fit)
> summary(fit)

Additive Aalen Model

Test for nonparametric terms

Test for non-significant effects
             sup| hat B(t)/SD(t) |  p-value H_0: B(t)=0
(Intercept)  2.9                    .387
adult        1.86                   .534
treat        2.49                   .18
adult.treat  2.99                   .57

Test for time invariant effects
             sup| B(t) - (t/tau)B(tau)|  p-value H_0: B(t)=b t
(Intercept)  .187                        .791
adult        .144                        .765
treat        .11                         .553
adult.treat  .154                        .89

             int (B(t)-(t/tau)B(tau))^2dt  p-value H_0: B(t)=b t
(Intercept)  .597                          .669
adult        .161                          .94
treat        .849                          .713
adult.treat  .433                          .658
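The working independence estimator and its cluster-robust standard error can be sketched in base R as follows. This is only an illustration on simulated data, not the timereg implementation. The simulation design is an assumption: a shared gamma effect `v` added to the hazard makes the two members of a cluster dependent while keeping the marginal effect of `x` additive, so the cumulative coefficient of `x` should be close to `0.2 * tau`. The sample sizes and all variable names are also assumptions.

```r
set.seed(42)
K <- 300; n <- 2                        # hypothetical: 300 clusters of size 2
cl <- rep(seq_len(K), each = n)         # cluster index
x  <- rbinom(K * n, 1, 0.5)             # one binary covariate
v  <- rep(rgamma(K, shape = 2, rate = 20), each = n)  # shared cluster effect
tf <- rexp(K * n, rate = 0.1 + 0.2 * x + v)           # latent failure times
tc <- runif(K * n, 0, 6)                              # censoring times
time <- pmin(tf, tc); status <- as.integer(tf <= tc)

X   <- cbind(1, x)                      # design matrix with intercept
tau <- 3                                # evaluate B-hat at t = tau
ev  <- sort(time[status == 1 & time <= tau])  # observed failure times up to tau

B   <- c(0, 0)                          # B-hat(t), accumulated over event times
eps <- matrix(0, K, 2)                  # eps-hat_k(tau), one row per cluster
for (s in ev) {
  Xs   <- X * (time >= s)               # X-tilde(s): rows zeroed when not at risk
  dN   <- as.numeric(time == s & status == 1)
  Ainv <- solve(crossprod(Xs))          # [sum_k X_k^T X_k]^{-1}
  dB   <- drop(Ainv %*% crossprod(Xs, dN))  # least-squares increment of B-hat
  B    <- B + dB
  dM   <- dN - drop(Xs %*% dB)          # increments of the residuals M-hat
  # K * Ainv equals [K^{-1} sum_k X_k^T X_k]^{-1}; rowsum() adds rows within clusters
  eps  <- eps + rowsum(Xs * dM, cl) %*% (K * Ainv)
}
# Var(B-hat) is estimated by K^{-2} * sum_k eps_k eps_k^T
se <- sqrt(diag(crossprod(eps))) / K
rbind(estimate = B, robust.se = se)
```

Summing the residual contributions within clusters before squaring is what makes the standard error robust to within-cluster correlation; treating all `K * n` rows as independent would ignore it.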

Example: Diabetic retinopathy data

The marginal additive model may be simplified by assuming a time-constant effect for the interaction term.

> fit.s<-aalen(Surv(time,status)~adult+treat+const(adult.treat), diabetes,clusters=diabetes$id)
> par(mfrow=c(2,2));plot(fit,sim.ci=2)

Test for non-significant effects
             Supremum-test of significance  p-value H_0: B(t)=0
(Intercept)  2.51                           .118
adult        2.1                            .3
treat        2.48                           .169

Test for time invariant effects
             Kolmogorov-Smirnov test  p-value H_0: constant effect
(Intercept)  .243                     .73
adult        .127                     .3
treat        .689                     .828

Parametric terms :
                    Coef.  SE    Robust SE  z      P-val
const(adult.treat)  -.921  .382  .339       -2.41  .16

Figure: Diabetes data. Cumulative regression estimators (panels: (Intercept), adult, treat, adult.treat; vertical axis: cumulative coefficients) along with 95% confidence intervals (full lines) and uniform bands (broken lines).

What are the conclusions about the interaction? What does its regression coefficient express?

Marginal semiparametric additive models

One can also consider a semiparametric additive model where the marginal intensities are assumed to be
$$\lambda_{ik}^{\mathcal{F}^{ik}}(t) = Y_{ik}(t)\left[ X_{ik}^T(t)\beta(t) + Z_{ik}^T(t)\gamma \right], \tag{9}$$
where $\gamma$ is an unknown $q$-dimensional vector of time-constant coefficients.

The working independence estimators of $B(t)$ and $\gamma$ can be found similarly to the nonparametric model. Again, an i.i.d. decomposition for $K^{1/2}(\hat B(t, \hat\gamma) - B(t))$ and for $K^{1/2}(\hat\gamma - \gamma)$ can be obtained and used for resampling.

Multiplicative and additive hazards models

If one compares the two models

Multiplicative intensity model:
$$\alpha_i(t) = Y_i(t)\exp\left( \beta(t)^T X_i(t) + \gamma^T Z_i(t) \right)$$
Additive model:
$$\alpha_i(t) = Y_i(t)\left( \beta(t)^T X_i(t) + \gamma^T Z_i(t) \right)$$

The multiplicative model is very attractive, but smoothing is needed!
The multiplicative model leads to a relative-risk type summary.
The additive model is very appealing because it has no smoothing parameter, but hazard predictions may be negative.
The additive model leads to an excess risk interpretation.
Some effects are additive (competing risks model), and some will be multiplicative (smoking).
Goodness-of-fit will seldom clearly decide which model is the correct one.
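The remark that additive hazard predictions may be negative can be illustrated with a short base-R sketch. All numbers are made up for illustration, and taking the running maximum of the cumulative hazard is just one simple correction assumed here, not necessarily the one used in any package.

```r
# Hypothetical cumulative coefficients B(t) on a time grid (illustrative values only)
t_grid <- seq(0, 5, by = 0.5)
B0 <- 0.10 * t_grid                     # cumulative baseline
B1 <- 0.05 * t_grid - 0.02 * t_grid^2   # cumulative effect of x; its slope turns negative

x <- 3
cumhaz   <- B0 + B1 * x                 # additive model: Lambda(t | x) = B0(t) + B1(t) * x
surv_raw <- exp(-cumhaz)                # increases in t wherever the hazard is negative

# One simple correction (an assumption of this sketch):
# force the cumulative hazard to be nonnegative and nondecreasing
cumhaz_mono <- pmax(cummax(cumhaz), 0)
surv_mono   <- exp(-cumhaz_mono)

# Multiplicative counterpart with a constant log relative risk of 0.3 per unit of x:
# the hazard is positive by construction, so no correction is needed
surv_cox <- exp(-B0 * exp(0.3 * x))

round(cbind(t = t_grid, surv_raw, surv_mono, surv_cox), 3)
```

The uncorrected curve `surv_raw` is not a valid survival function here, while `surv_mono` and `surv_cox` are both nonincreasing and bounded by 1.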

Survival estimation for proportional/additive models

The survival estimate is
$$S_i(t) = \exp\Big( -\int_0^t \lambda_i(s)\,ds \Big).$$
Therefore estimate $\int_0^t \lambda_i(s)\,ds$:
Cox: $\int_0^t \exp(\beta^T X_i)\,d\Lambda_0(s)$.
Multiplicative: $\int_0^t \exp(\beta(s)^T X_i)\,d\Lambda_0(s)$.
Aalen: $\int_0^t dB(s)^T X_i$.
Cox-Aalen: $\int_0^t (dB(s)^T X_i)\exp(\gamma^T Z_i)$.

In the multiplicative model $S_i(t)$ needs the estimates of $\beta(s)$, and thus it depends on the bandwidth. Asymptotics are more complicated.
In the additive model $S_i(t)$ does not need smoothing; problems with negative hazard predictions can be handled with post-processing.

Multiplicative-additive hazards models

The additive and multiplicative hazards models may be combined as follows:

Cox-Aalen model
$$\lambda_i(t) = Y_i(t)\left[ X_i^T(t)\alpha(t) \right]\exp\{Z_i^T(t)\beta\}.$$
Here there is an additive model for the baseline based on the covariates $X$. The interpretation of relative risks for $Z$ is the same as in the Cox model, while the coefficients $\alpha(t)$ represent the excess baseline due to the presence of $X$.

Proportional excess hazards model
$$\lambda_i(t) = Y_i(t)X_i^T(t)\alpha(t) + \rho_i(t)\lambda_0(t)\exp\{Z_i^T(t)\beta\},$$
where $Y_i(t)$ and $\rho_i(t)$ are at-risk indicators and $\lambda_0(t)$ is the baseline hazard of the excess risk term.

Estimation in Cox-Aalen

$$\lambda_i(t) = Y_i(t)\left[ X_i^T(t)\alpha(t) \right]\exp\{Z_i^T(t)\beta\}.$$
The log-likelihood function leads to the score equations for $\beta$ and $dA(t)$
$$\int_0^\tau Z(t)^T\{dN(t) - Y(\beta, t)\,dA(t)\} = 0,$$
$$Y(\beta, t)^T W(t)\{dN(t) - Y(\beta, t)\,dA(t)\} = 0,$$
where $W(t) = \mathrm{diag}(w_i(t))$ with
$$w_i(t) = \frac{Y_i(t)}{\lambda_i(t)} = \frac{Y_i(t)\exp(-Z_i(t)^T\beta)}{X_i(t)^T\alpha(t)}$$
for $i = 1, \dots, n$.

For known $\beta$ this leads to the estimator of the cumulative intensity
$$\hat A(\beta, t) = \int_0^t Y^-(\beta, s)\,dN(s), \qquad Y^-(\beta, t) = (Y(\beta, t)^T W(t) Y(\beta, t))^{-1} Y(\beta, t)^T W(t).$$
Inserting this estimator into the score equation for $\beta$ gives $U(\beta) = 0$ with
$$U(\beta) = U(\beta, \tau) = \int_0^\tau Z^T(t) G(\beta, t)\,dN(t), \tag{10}$$
where $G(\beta, t) = I - Y(\beta, t)Y^-(\beta, t)$ is the projection onto the orthogonal complement of the space spanned by the columns of $Y(\beta, t)$.

It is easy to see that
$$U(\beta, t) = \int_0^t Z^T(s) G(\beta, s)\,dM(s)$$
is a square integrable martingale.
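The step described as easy to see can be spelled out as follows. This is a sketch that assumes the quantities are evaluated at the true parameter values, so that $dN(t) = Y(\beta, t)\,dA(t) + dM(t)$ and the integrand is predictable.

```latex
% Step 1: Y^- is a left inverse of Y, so G annihilates the columns of Y.
Y^-(\beta,t)\,Y(\beta,t)
  = (Y^T W Y)^{-1} Y^T W Y = I_p
\quad\Longrightarrow\quad
G(\beta,t)\,Y(\beta,t) = Y(\beta,t) - Y(\beta,t)\,Y^-(\beta,t)\,Y(\beta,t) = 0.

% Step 2: substitute dN = Y dA + dM into the score process.
U(\beta,t)
  = \int_0^t Z^T(s)\,G(\beta,s)\,\{Y(\beta,s)\,dA(s) + dM(s)\}
  = \int_0^t Z^T(s)\,G(\beta,s)\,dM(s).
```

The compensator term drops out because of Step 1, leaving a stochastic integral with respect to the martingale $M$.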

Example: PBC data

> fit <- cox.aalen(Surv(time/365, status) ~ prop(age) + edema +
+   prop(log(bili)) + prop(log(alb)) + log(protime), pbc,
+   max.time = 3000/365)
> summary(fit)

Test for time invariant effects
              sup| B(t) - (t/tau)B(tau)|  p-value H_0: B(t)=b t
(Intercept)   4.557                       .41
edema         .596                        .334
log(protime)  1.815                       .424

Proportional Cox terms :
                 Coef.   Std. Error  Robust SE  D2log(L)^-1
prop(age)        .35     .7          .1         .8
prop(log(bili))  .8      .78         .87        .87
prop(log(alb))   -2.451  .676        .647       .675

Score Tests for Proportionality
                 sup| hat U(t) |  p-value H_0
prop(age)        75.46            .686
prop(log(bili))  17.358           .14
prop(log(alb))   .516             .992

Figure: PBC data. Cumulative regression functions for the nonparametric terms ((Intercept), edema, log(protime)) plotted against time.

Proportional excess hazard models

It can be seen as an excess risk model (Martinussen & Scheike, 2002):
$$\lambda_i(t) = Y_i(t)X_i^T(t)\alpha(t) + \rho_i(t)\lambda_0(t)\exp\{Z_i^T(t)\beta\}. \tag{11}$$
$\rho_i = 1$ for all $i$ gives a mix of Aalen's and Cox's models.
The Cox term represents the excess risk for an exposed subject with covariates $Z$.
$\rho_i$ is an excess indicator, e.g. $I(d_i > 0)$ with $d_i$ the dose for the $i$th subject.

The additive term $X^T(t)\alpha(t)$ can be estimated from the study. It can also be replaced by a known function of $X$, $\alpha(t, X)$, which represents the background mortality rate of a control population (often in cancer studies) (Sasieni, 1996):
$$\lambda_i(t) = Y_i(t)\left[ \alpha(t, X_i) + \lambda_0(t)\exp\{Z_i^T(t)\beta\} \right]. \tag{12}$$

Example: Melanoma data

For model (11):

data(melanoma)
lt<-log(melanoma$thick)
excess<-(melanoma$thick>=210) # excess risk for thick tumors
fit<-prop.excess(Surv(days/365,status==1)~ sex + ulc + cox(sex)+ cox(ulc) + cox(lt), melanoma, excess=excess, n.sim=100)

For model (12):

data(mela.pop)
out<-pe.sasieni(Surv(start,stop,status==1)~ age + sex, mela.pop, id=1:205, max.time=7, offsets=mela.pop$rate, n.sim=100)
summary(out)

Proportional terms:
     coef  se(coef)  z     p
age  .327  .125      .261  .794
sex  .472  .365      1.29  .196

Test for Proportionality
     sup| hat U(t) |  p-value H_0
age  56.              .89
sex  3.77             .43

# pointwise 95% confidence limits for the cumulative baseline
ul<-out$cum[,2]+1.96*out$var.cum[,2]^.5
ll<-out$cum[,2]-1.96*out$var.cum[,2]^.5
plot(out$cum,type="s",ylim=range(ul,ll))
lines(out$cum[,1],ul,type="s"); lines(out$cum[,1],ll,type="s")

Exercise on Diabetes data

Consider the Diabetes data in the timereg package. Start from the following model:

fit.s<-aalen(Surv(time,status)~adult+treat+const(adult.treat), diabetes,clusters=diabetes$id)

Try to simplify the above marginal semiparametric additive model further, using successive GOF tests. What is a possible final model?
Solve Exercise 9.1 of the book by Martinussen & Scheike (2006).

Exercise on TRACE data

The TRACE study group conducted a study aimed at assessing the prognostic importance of various risk factors on mortality for approximately 6000 patients with myocardial infarction. The patients had various risk factors recorded, such as age, sex (male=1), congestive heart failure (chf) (present=1), diabetes (present=1) and ventricular fibrillation (vf) (present=1).

Consider the TRACE data in the timereg package (a subset of patients from the original data) with the above covariates.

fit.tr <- cox.aalen(Surv(time,status==9)~ chf + age + sex + prop(diabetes) + vf, TRACE)
plot(fit.tr, sim.ci=2, xlab="Time (years)")

Conclude which model or models (Cox, additive, Cox-Aalen) are best by fitting them to the TRACE data, and arrive at a final conclusion by using GOF procedures.
Compare the final model with results from a Cox model with "vf" as a stratification factor. Can this be obtained from cox.aalen?
Plot survival predictions, given some covariate values, for your final model and the above stratified Cox model. By comparing the estimated curves, what may we conclude?