ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox's regression analysis. Time-dependent explanatory variables

Henrik Ravn, Bandim Health Project, Statens Serum Institut
4 November 2011

Outline
- Survival Data
- Example: Malignant Melanoma Data
- The Cox Model
- Cox in SAS
- Choice of Time-Scale
- Example: Guinea-Bissau Data
- Delayed entries
- Time dependent explanatory variables

$$\prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$

Survival Data
Time to death or other event of interest. One time-scale, including a well-defined starting time (time-origin):
- Time from start of randomized clinical trial to death.
- Time from first employment to pension.
- Time from filling of a tooth until the filling falls out.
What is special about survival data?
- Right-skewed. No problem.
- CENSORING: For some individuals we will only know a lower bound of the lifetime.

[Figure: Simple data. y-axis: Individual (1 to 12); x-axis: Times (months), 0 to 25.]

Survival and hazard function
Let T be the TIME to the event of interest:
- $S(t) = P(T > t)$ = probability of survival to time t after entry at time 0
- $\lambda(t)$ = incidence, rate, or hazard
Relationship:
$$S(t) = \exp\left(-\int_0^t \lambda(s)\,ds\right) = \exp(-\Lambda(t))$$
$\Lambda(t)$ is called the integrated hazard function.

$\lambda(t) = \lambda$, $S(t) = e^{-\lambda t}$, $\Lambda(t) = \lambda t$
[Figure: three panels against Time (t), 0 to 150: Hazard rate, Survival Function, Integrated hazard.]
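As an illustrative check of the relationship $S(t) = \exp(-\Lambda(t))$, the Python sketch below integrates a hazard numerically and compares the result with the closed form for a constant hazard. The function name `survival_from_hazard`, the rate 0.003 and the time point 100 are assumptions chosen for illustration; they are not taken from the lecture.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-integral_0^t hazard(s) ds) with the midpoint rule."""
    width = t / steps
    cumulative = sum(hazard((k + 0.5) * width) for k in range(steps)) * width
    return math.exp(-cumulative)

lam = 0.003  # assumed constant hazard (illustrative value)

# Numerical integration of the hazard versus the closed form exp(-lam * t)
s_numeric = survival_from_hazard(lambda s: lam, 100.0)
s_closed = math.exp(-lam * 100.0)
```

Any other hazard function can be passed in place of the constant one, for example a hazard that increases with time.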

Kaplan-Meier estimate of survival function
Death times $t_1, \ldots, t_d$ (ordered). $Y(t_i)$ = # alive just before $t_i$.
$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{1}{Y(t_i)}\right)$$
[Figure: Risk sets. y-axis: Individual (1 to 12); x-axis: Times (months), 0 to 25.]
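The product formula can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function name `kaplan_meier` and the five follow-up times are hypothetical, and the factor is written as 1 - (deaths at t)/Y(t), which reduces to the formula above when there are no tied death times.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times:  follow-up times
    events: 1 = death, 0 = censored
    """
    surv = 1.0
    curve = []
    death_times = sorted({u for u, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)  # Y(t): alive just before t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical data: five individuals, two of them censored
times = [2, 4, 5, 7, 9]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
```

Censored individuals never trigger a factor of their own; they only leave the risk set, which is why the number at risk drops between death times.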

[Figure: Kaplan-Meier survival estimate. y-axis: Survival probability (0.00 to 1.00); x-axis: Time (months), 0 to 25. Number at risk: 12, 10, 9, 6, 5, 4, 1.]

Malignant Melanoma Data
In the period 1962-77 a total of 205 patients had their tumor removed and were followed until 1977. At the end of 1977:
- 57 had died of malignant melanoma (status=1)
- 134 were still alive (status=2)
- 14 had died of causes unrelated to malignant melanoma (status=3): competing risk
Purpose: Study the effect on survival of sex, age, thickness of tumor, ulceration, etc.
[Figure: study period, 1962 to 1977.]

Malignant melanoma

  N   time  status  sex  age  year  thickness  ulcer
  1     10       3    1   76  1972       6.76      1
  2     30       3    1   56  1968       0.65      0
  3     35       2    1   41  1977       1.34      0
  4     99       3    0   71  1968       2.90      0
  5    185       1    1   52  1965      12.08      1
  6    204       1    1   28  1971       4.84      1
  7    210       1    1   77  1972       5.16      1
  8    232       3    0   60  1974       3.22      1
  9    232       1    1   49  1968      12.88      1
 10    279       1    0   68  1971       7.41      1
...
203   4688       2    0   42  1965       0.48      0
204   4926       2    0   50  1964       2.26      0
205   5565       2    0   41  1962       2.90      0

The Cox Model
The Cox model assumes that the rate for the ith individual is
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip})$$
where $\beta_1, \beta_2, \ldots, \beta_p$ are regression parameters, $X_{i1}$ is the value of covariate 1 for individual i, etc. Finally, $\lambda_0(t)$ is the baseline hazard.
Time t is the time-scale of choice, e.g. age, time since randomization, or time since operation.
As formulated here, the only quantity on the right-hand side of the equal sign that depends on time is the baseline hazard $\lambda_0(t)$.
If all covariates (X's) are zero we get $\lambda_i(t) = \lambda_0(t)$. The baseline hazard is thus interpreted as the hazard of an individual who has all covariates equal to zero.

The Cox model
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip})$$
can also be written on the log-scale (natural log):
$$\log(\lambda_i(t)) = \log(\lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip})) = \log(\lambda_0(t)) + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip}.$$
The Cox model assumes that
- the effects of covariates are additive and linear on the log rate scale, just like Poisson regression.
- the CORNER, i.e. the baseline hazard, is non-parametric and depends on time, and time is thus adjusted for.
We now turn to the interpretation of the regression parameters $\beta_1, \beta_2, \ldots, \beta_p$.

One binary covariate
To keep things simple we study only the effect of a single binary covariate, e.g. sex, on the risk of dying:
$$X_i = \begin{cases} 0 & \text{if individual } i \text{ is a female} \\ 1 & \text{if individual } i \text{ is a male} \end{cases}$$
The Cox model is $\lambda_i(t) = \lambda_0(t) \exp(\beta X_i)$. With $X_i$ defined as above we get
$$\lambda_i(t) = \begin{cases} \lambda_0(t) & \text{if individual } i \text{ is a female} \\ \lambda_0(t) \exp(\beta) & \text{if individual } i \text{ is a male} \end{cases}$$

Mortality Rate Ratio, Hazard Ratio
If
$$\lambda_i(t) = \begin{cases} \lambda_0(t) & \text{if individual } i \text{ is a female} \\ \lambda_0(t) \exp(\beta) & \text{if individual } i \text{ is a male} \end{cases}$$
then the RATE RATIO (RR) between males and females is
$$RR = \frac{\lambda_0(t) \exp(\beta)}{\lambda_0(t)} = \exp(\beta).$$
Importantly, the ratio is independent of time, i.e. we have PROPORTIONAL HAZARDS over time. The Cox model is also called the proportional hazards model.
How to estimate $\beta$? And what about the baseline hazard $\lambda_0(t)$?

Likelihood Function
The baseline hazard is regarded as a nuisance and is in general not estimated, although it is possible to do so.
Let $t_1, \ldots, t_d$ be the ordered death times. It can be shown that all we need is to find the $\beta$ that maximizes the following function, called Cox's partial likelihood function:
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$
where $R(t_i)$ is the RISK SET at death time $t_i$, i.e. the set of individuals at risk of dying (under observation) just before time $t_i$.
The resulting estimate $\hat{\beta}$ is called the MAXIMUM LIKELIHOOD ESTIMATE of $\beta$.

Likelihood Function: a closer look
Death times $t_1, \ldots, t_d$, numbering individuals with deaths first: $i = 1, 2, \ldots, d, d+1, \ldots, n$, with times and covariates
$$t_1, t_2, \ldots, t_d, t_{d+1}, \ldots, t_n$$
$$X_1, X_2, \ldots, X_d, X_{d+1}, \ldots, X_n$$
At each death time we have the RISK SET, the individuals alive and at risk of dying just before the death time:
$$R(t_1), R(t_2), \ldots, R(t_d)$$

[Figure: Risk sets. y-axis: Individual (1 to 12); x-axis: Times (months), 0 to 25.]

For the Cox model $\lambda_i(t) = \lambda_0(t) \exp(\beta X_i)$ we use the Cox likelihood function to estimate $\beta$:
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)} = \frac{\exp(\beta X_1)}{\sum_{j \in R(t_1)} \exp(\beta X_j)} \cdot \frac{\exp(\beta X_2)}{\sum_{j \in R(t_2)} \exp(\beta X_j)} \cdots \frac{\exp(\beta X_d)}{\sum_{j \in R(t_d)} \exp(\beta X_j)}$$
We index individuals in the risk sets using the letter j. Writing $\sum_{j \in R(t_1)} \exp(\beta X_j)$ means summing over the individuals in the risk set for death time $t_1$. If we assume here that no one was censored before the first death time, all individuals are in the risk set $R(t_1)$ and the sum is
$$\exp(\beta X_1) + \exp(\beta X_2) + \cdots + \exp(\beta X_n).$$

For example, for the Cox model $\lambda_i(t) = \lambda_0(t) \exp(\beta \cdot \text{sex})$. Sex: 1=male, 0=female.
Likelihood function:
$$\frac{\exp(\beta)}{\sum_{j \in R(t_1)} \exp(\beta X_j)} \cdot \frac{1}{\sum_{j \in R(t_2)} \exp(\beta X_j)} \cdots \frac{\exp(\beta)}{\sum_{j \in R(t_d)} \exp(\beta X_j)}.$$
If we again assume that no one was censored before the first death time, all individuals are in the risk set $R(t_1)$ and the sum is
$$\exp(\beta) + 1 + \cdots + \exp(\beta) = N_M \exp(\beta) + N_F,$$
where $N_M$ and $N_F$ are the numbers of males and females, respectively, in $R(t_1)$.
The risk sets also play a crucial role in nested case-control studies; more on this later in the course.
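To make the role of the risk sets concrete, here is a minimal Python sketch that evaluates the log of Cox's partial likelihood for one binary covariate and maximizes it with a simple ternary search. The six observations, the function names and the search interval are all assumptions made for illustration; the sketch assumes no tied death times and no delayed entry, and it is not how SAS computes the estimate.

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Log of Cox's partial likelihood for one covariate (no tied death times)."""
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored individuals only contribute through risk sets
        # Risk set R(t_i): everyone still under observation just before t_i
        risk_sum = sum(math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += beta * x[i] - math.log(risk_sum)
    return total

def maximize(f, lo=-5.0, hi=5.0, iterations=100):
    """Ternary search; adequate here because the log partial likelihood is concave."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Hypothetical data: one censored individual (event = 0); male = 1, female = 0
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
male = [1, 1, 0, 1, 0, 0]

beta_hat = maximize(lambda b: log_partial_likelihood(b, times, events, male))
rate_ratio = math.exp(beta_hat)  # estimated RR, males versus females
```

At $\beta = 0$ every term reduces to minus the log of the risk set size, which is a convenient sanity check on the risk set bookkeeping.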

So far the following assumptions have been made for the Cox model:
- The baseline hazard is assumed non-parametric, i.e. assumed to vary freely.
- The effects of covariates are additive and linear on the log rate scale.
- The ratio of the hazard rates for two subjects is constant over time. In other words, there is no interaction between the covariates and the time variable.
Let us look at the Melanoma data using SAS.

[Figure: Kaplan-Meier survival estimates, by sex (female, male). y-axis: 0.00 to 1.00; x-axis: Time (years), 0 to 15.]
What is the estimate of the RR between males and females?

Cox in SAS 9.1.3
In SAS, proc phreg and proc tphreg can be used for estimation in the Cox model. We will use proc tphreg, as this procedure handles categorical variables much more easily than proc phreg.
Using proc tphreg we define the variable sex to be categorical using the class statement. For the variable sex, 1 is males and 0 is females.

    proc tphreg data=melanom;
      class sex;
      model time*status(2,3) = sex;
    run;

Please note that we have two censoring codes, namely 2 and 3.
NB: In SAS 9.2 proc phreg now handles class variables and proc tphreg is obsolete.

Part of output from proc tphreg:

Analysis of Maximum Likelihood Estimates

               Parameter  Standard                           Hazard
Parameter  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio
sex 0       1   -0.66214   0.26513      6.2370      0.0125    0.516

The column Parameter Estimate is $\hat{\beta}$.
For a class variable SAS will automatically choose the highest number (here 1) as the reference. Thus, the rate ratio or Hazard Ratio is females compared to males.
There is no estimate statement in proc (t)phreg, but a similar so-called contrast statement exists. Instead we can use the ref option in the class statement to change the reference level.
Note also the option risklimits in the model statement, which calculates the confidence interval for the hazard ratio.

    proc tphreg data=melanom;
      class sex(ref="0");
      model time*status(2,3) = sex / risklimits;
    run;

...
Analysis of Maximum Likelihood Estimates

               Parameter  Standard                           Hazard   95% Hazard Ratio
Parameter  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio  Confidence Limits
sex 1       1    0.66214   0.26513      6.2370      0.0125    1.939     1.153    3.260
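The hazard ratio, its confidence limits and the Wald chi-square in this output can be reproduced by hand from the parameter estimate and its standard error. The Python sketch below is an illustration under the usual Wald construction (estimate plus or minus 1.96 standard errors on the log scale); the variable names are ours, and small differences in the last digit can arise because the printed estimate and standard error are rounded.

```python
import math

beta_hat = 0.66214   # Parameter Estimate for sex 1 (males vs females), from the output
se = 0.26513         # Standard Error, from the output
z = 1.959964         # 97.5% quantile of the standard normal distribution

hazard_ratio = math.exp(beta_hat)
ci_low = math.exp(beta_hat - z * se)    # lower 95% confidence limit
ci_high = math.exp(beta_hat + z * se)   # upper 95% confidence limit
wald = (beta_hat / se) ** 2             # Wald chi-square on 1 degree of freedom
```

Changing the reference level only flips the sign of the estimate, which is why the earlier hazard ratio 0.516 is the reciprocal of 1.939.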

Melanoma data, thickness of tumor given by variable gtyk:
- gtyk = 1 if <2 mm
- gtyk = 2 if 2-5 mm
- gtyk = 3 if >5 mm

    proc tphreg data=melanom;
      class gtyk;
      model time*status(2,3) = gtyk / risklimits;
    run;

Type 3 Tests

                 Wald
Effect  DF  Chi-Square  Pr > ChiSq
gtyk     2     25.6749      <.0001

Analysis of Maximum Likelihood Estimates

               Parameter  Standard                           Hazard   95% Hazard Ratio
Parameter  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio  Confidence Limits
gtyk 1      1   -1.67324   0.38572     18.8176      <.0001    0.188     0.088    0.400
gtyk 2      1   -0.11055   0.32391      0.1165      0.7329    0.895     0.475    1.689

Melanoma data, + age in years

    proc tphreg data=melanom;
      class gtyk sex;
      model time*status(2,3) = gtyk sex age / risklimits;
    run;

Type 3 Tests

                 Wald
Effect  DF  Chi-Square  Pr > ChiSq
sex      1      2.3660      0.1240
gtyk     2     21.3752      <.0001
age      1      1.5241      0.2170

Analysis of Maximum Likelihood Estimates

               Parameter  Standard                           Hazard   95% Hazard Ratio
Parameter  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio  Confidence Limits
sex 0       1   -0.41608   0.27050      2.3660      0.1240    0.660     0.388    1.121
gtyk 1      1   -1.53827   0.39232     15.3738      <.0001    0.215     0.100    0.463
gtyk 2      1   -0.08180   0.32849      0.0620      0.8033    0.921     0.484    1.754
age         1    0.01052   0.00852      1.5241      0.2170    1.011     0.994    1.028

Likelihood Ratio Test

    proc tphreg data=melanom;
      class gtyk sex;
      model time*status(2,3) = gtyk sex;
    run;

Model Fit Statistics

              Without        With
Criterion  Covariates  Covariates
-2 LOG L      566.398     532.244
AIC           566.398     538.244
SBC           566.398     544.373

-------------------------------------------

    proc tphreg data=melanom;
      class sex;
      model time*status(2,3) = sex;
    run;

Model Fit Statistics

              Without        With
Criterion  Covariates  Covariates
-2 LOG L      566.398     560.248
AIC           566.398     562.248
SBC           566.398     564.291

$LR = 560.248 - 532.244 = 28.0 \sim \chi^2_2$ (2 degrees of freedom)

SAS: p-value from chi-square test

    data temp;
      chisquare=28;
      df=2;
      p=1-probchi(chisquare,df);
    run;
    proc print data=temp;
    run;

Obs  chisquare  df           p
  1         28   2  .000000832
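The same p-value can be checked without SAS. For 2 degrees of freedom the upper tail probability of the chi-square distribution has the closed form exp(-x/2), so a few lines of Python suffice. The function name `chi2_sf_2df` is ours, and the shortcut holds only for df = 2; other degrees of freedom need a statistical library.

```python
import math

def chi2_sf_2df(x):
    """Upper tail probability P(X > x) for a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)

# Likelihood ratio statistic from the two -2 LOG L values in the model fit tables
lr = 560.248 - 532.244

p_lr = chi2_sf_2df(lr)     # p-value for the unrounded statistic
p_28 = chi2_sf_2df(28.0)   # p-value for the rounded value used in the SAS data step
```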

Choice of Time-Scale
- A study may be conducted over calendar time even though the natural time-scale is time since treatment (Melanoma study).
- Cohort studies are often conducted by recruiting a random sample of the population at the start of the study, and these subjects are then followed for a number of years (Framingham).
- A natural time-scale may be age rather than time in study, which most often is an artificial time-scale constructed by the investigators.
- What would the time-origin be if age were chosen as time-scale?

Vaccinations in Guinea-Bissau 1990-96
Rural Guinea-Bissau: 5274 children under 7 months of age were visited twice at home, with an interval of six months.
Information about vaccination (BCG, DTP, measles vaccine) was collected at each visit, and at the second visit deaths during follow-up were registered.
Some children moved away during follow-up (censored) or survived until the next visit (also censored).
Below are some of the variable names from the bissau data.

fuptime  Follow-up time in days
dead     0 = censored, 1 = dead
bcg      1 = Yes, 2 = No
agem     Age at first visit in months

Is the risk of dying associated with vaccination?

                               Outcome
Exposure             Died          Survived  Total
BCG vaccinated       125 (3.8%)        3176   3301
not BCG vaccinated    97 (4.9%)        1876   1973
Total                222 (4.2%)        5052   5274
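As a quick arithmetic check of the table, the risks and a crude risk ratio can be computed directly from the counts. This Python sketch is only an illustration; the crude risk ratio ignores follow-up time and censoring, and the variable names are ours.

```python
# Counts taken from the 2x2 table above
died_bcg, total_bcg = 125, 3301
died_no_bcg, total_no_bcg = 97, 1973

risk_bcg = died_bcg / total_bcg            # proportion dying among BCG vaccinated
risk_no_bcg = died_no_bcg / total_no_bcg   # proportion dying among not vaccinated
risk_all = (died_bcg + died_no_bcg) / (total_bcg + total_no_bcg)

# Crude risk ratio, vaccinated versus not vaccinated
crude_risk_ratio = risk_bcg / risk_no_bcg
```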

    proc tphreg data=bissau;
      class bcg;
      model fuptime*dead(0)=bcg / rl;
    run;

Testing Global Null Hypothesis: BETA=0

Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio      4.2824   1      0.0385
Score                 4.3761   1      0.0364
Wald                  4.3474   1      0.0371

Type 3 Tests

                 Wald
Effect  DF  Chi-Square  Pr > ChiSq
bcg      1      4.3474      0.0371

Analysis of Maximum Likelihood Estimates

               Parameter  Standard                           Hazard   95% Hazard Ratio
Parameter  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio  Confidence Limits
bcg 1       1   -0.28214   0.13532      4.3474      0.0371    0.754     0.578    0.983

    proc tphreg data=bissau;
      class bcg agem;
      model fuptime*dead(0)=bcg agem / rl;
    run;

Type 3 Tests

                 Wald
Effect  DF  Chi-Square  Pr > ChiSq
bcg      1      5.6510      0.0174
agem     6      7.7246      0.2590

Analysis of Maximum Likelihood Estimates

               Parameter  Standard                           Hazard   95% Hazard Ratio
Parameter  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio  Confidence Limits
bcg 1       1   -0.34720   0.14605      5.6510      0.0174    0.707     0.531    0.941
agem 0      1    0.01053   0.35339      0.0009      0.9762    1.011     0.506    2.020
agem 1      1    0.12553   0.34494      0.1324      0.7159    1.134     0.577    2.229
agem 2      1   -0.24631   0.35903      0.4707      0.4927    0.782     0.387    1.580
agem 3      1    0.20946   0.34502      0.3686      0.5438    1.233     0.627    2.425
agem 4      1    0.34300   0.34265      1.0020      0.3168    1.409     0.720    2.758
agem 5      1    0.34118   0.34699      0.9668      0.3255    1.407     0.713    2.777

Delayed entries
[Figure: two panels for individuals 1 to 12. Left panel: Time in study, x-axis Times (months), 0 to 25. Right panel: Age as time, x-axis Age (months), 0 to 25.]

Subjects are only at risk from their age of entry and onwards. They are not at risk in our world of analysis before age of entry!
Delayed entries are easily handled by careful control of the RISK SET $R(t_i)$ at death time $t_i$ in the likelihood function:
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$
Only individuals at risk and under observation are included in the risk set $R(t_i)$ at time $t_i$.
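The control of the risk set under delayed entry can be sketched as a simple filter. In this Python illustration the three subjects and their ages are hypothetical, and the convention that a subject is at risk on the interval (entry, exit] is an assumption stated here, not something spelled out on the slide.

```python
def risk_set(subjects, t):
    """Ids of subjects under observation just before time t: entry < t <= exit."""
    return [s["id"] for s in subjects if s["entry"] < t <= s["exit"]]

# Hypothetical subjects with age at entry and age at exit (months)
subjects = [
    {"id": "A", "entry": 0, "exit": 10},
    {"id": "B", "entry": 5, "exit": 20},
    {"id": "C", "entry": 12, "exit": 15},
]

at_5 = risk_set(subjects, 5)     # B enters at age 5 and is not yet at risk
at_10 = risk_set(subjects, 10)   # A leaves at age 10 and is still counted
at_13 = risk_set(subjects, 13)   # A has left, C has entered
```

With time in study as time-scale every entry is 0 and the filter reduces to the ordinary risk set.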

Delayed entries in SAS

    data bissau2;
      set bissau;
      outage=age+fuptime;
    run;

    proc tphreg data=bissau2;
      class bcg;
      model (age,outage)*dead(0)= bcg / rl;
    run;

Analysis of Maximum Likelihood Estimates

               Parameter  Standard                           Hazard   95% Hazard Ratio
Parameter  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio  Confidence Limits
bcg 1       1   -0.35542   0.14065      6.3854      0.0115    0.701     0.532    0.923

Time dependent explanatory variables
The Cox model can be expanded to include time-varying covariates:
$$\lambda_i(t) = \lambda_0(t) \exp(\beta X_i(t)).$$
The likelihood function for death times $t_1, \ldots, t_d$ becomes
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i(t_i))}{\sum_{j \in R(t_i)} \exp(\beta X_j(t_i))}.$$
From this we can see that we just need to know the values of the covariates at the death times: $X_i(t_1), X_i(t_2), \ldots, X_i(t_d)$.
The covariate values at any time different from a death time are not used in the likelihood function.

The simplest time-varying covariate is a binary variable that is allowed to change once during follow-up, e.g. new BCG vaccinations registered between visits in the Bissau data:
$$X_i(t) = \begin{cases} 0 & \text{if no BCG before time } t \\ 1 & \text{if BCG-time} \le t \end{cases}$$

A child being BCG-vaccinated after 3 months of follow-up.
[Figure: BCG (0 or 1) against Follow up (months), 0 to 6.]
The time-varying covariate is 0 in the time interval 0 to 3 months and 1 for the rest of follow-up.
For a child who was BCG vaccinated before the first visit, the time-varying covariate is 1 during all of the follow-up.

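Multi-state Model
States: 0 = Unexposed, 1 = Exposed, 2 = Dead.
Transitions: 0 to 1 with rate $\lambda_{01}(t)$, 0 to 2 with rate $\lambda_{02}(t)$, 1 to 2 with rate $\lambda_{12}(t)$.
We want to compare $\lambda_{02}(t)$ and $\lambda_{12}(t)$. The transition $\lambda_{01}(t)$ is not modeled here.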

Instead of time of follow-up we will use age as time-scale to illustrate the use of BCG as a time-varying covariate in the Bissau data.
At visit 2 the vaccination cards were seen for the children at home and an age of BCG vaccination (bcgage) was calculated:

 id  fuptime  dead  age  bcg  bcgage  outage
...
486      159     0  199    1     107     358
487      183     0   97    1      20     280
488      183     0   43    2     174     226
489      137     1  140    1      40     277
490      183     0  165    1      46     348
...
499      157     0  186    1      64     343
500       25     1  191    2       .     216
501      157     0  183    1      61     340
...

Binary time-varying covariate in SAS (I)

    proc tphreg data=bcg;
      if . < bcgage < outage then bcg_t=1; else bcg_t=0;
      model (age,outage)*dead(0)=bcg_t / rl;
    run;

Analysis of Maximum Likelihood Estimates

               Parameter  Standard                           Hazard   95% Hazard Ratio
Parameter  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio  Confidence Limits
bcg_t       1   -1.08278   0.14046     59.4286      <.0001    0.339     0.257    0.446

The if-statement

    if . < bcgage < outage then bcg_t=1; else bcg_t=0;

is recalculated at each death time. The outage in the model statement refers to the death time currently being evaluated (i.e. a $t_i$ in the likelihood).
For the first death time, which is $t_1 = 23$ days of age, the if-statement becomes

    if . < bcgage < 23 then bcg_t=1; else bcg_t=0;

and is calculated for all children at risk at age 23 days (in $R(t_1 = 23)$) with their individual bcgage-values.
This is a recalculation of the time-varying covariate at each death time, cf. the likelihood function.
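The recalculation at each death time can be mimicked in Python. The sketch below is an illustration only: it uses four rows copied from the Bissau listing (ids 487, 488, 489 and 500), represents a missing bcgage as None, and assumes the at-risk convention age < t <= outage. The helper names are ours.

```python
# (id, age, outage, dead, bcgage) copied from the Bissau listing; None = missing
rows = [
    (487, 97, 280, 0, 20),
    (488, 43, 226, 0, 174),
    (489, 140, 277, 1, 40),
    (500, 191, 216, 1, None),
]

def bcg_t(bcgage, t):
    """Mirror of the SAS if-statement: 1 if BCG was given before age t, else 0."""
    return 1 if bcgage is not None and bcgage < t else 0

def covariates_at(t):
    """bcg_t for every child at risk at age t (assumed: age < t <= outage)."""
    return {
        child: bcg_t(bcgage, t)
        for child, age, outage, dead, bcgage in rows
        if age < t <= outage
    }

at_216 = covariates_at(216)  # death time of child 500
at_277 = covariates_at(277)  # death time of child 489
```

Child 488 shows why the recalculation matters: at an earlier age such as 100 days the covariate is 0 (`bcg_t(174, 100)`), while at age 216 it is 1.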

Binary time-varying covariate in SAS (II)
Split each person with a changing time-varying covariate into two records:
- from age to bcgage: bcgvacc=0, status=0
- from bcgage to outage: bcgvacc=1, status=dead
and use delayed entries. Thus, we need to generate a new data set.

    data splitbcg;
      set bcg;
      if bcgage=. or bcgage>outage then do;
        bcgvacc=0; entryage=age; exitage=outage; status=dead; output;
      end;
      if . < bcgage <= age then do;
        bcgvacc=1; entryage=age; exitage=outage; status=dead; output;
      end;
      if age < bcgage <= outage then do;
        bcgvacc=0; entryage=age;    exitage=bcgage; status=0;    output;
        bcgvacc=1; entryage=bcgage; exitage=outage; status=dead; output;
      end;
    run;

 id  fuptime  dead  age  bcg  bcgage  outage  bcgvacc  entryage  exitage  status
488      183     0   43    2     174     226        0        43      174       0
488      183     0   43    2     174     226        1       174      226       0
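For readers without SAS, the same record splitting can be sketched in Python. The function below follows the three branches of the data step; the function name `split_record` and the tuple layout are ours, and a missing bcgage is represented as None.

```python
def split_record(age, outage, dead, bcgage):
    """Return episodes (bcgvacc, entryage, exitage, status), mirroring the data step."""
    # Never vaccinated, or vaccinated only after the end of follow-up
    if bcgage is None or bcgage > outage:
        return [(0, age, outage, dead)]
    # Vaccinated at or before entry: exposed during all of follow-up
    if bcgage <= age:
        return [(1, age, outage, dead)]
    # Vaccinated during follow-up: unexposed episode (censored), then exposed episode
    return [(0, age, bcgage, 0), (1, bcgage, outage, dead)]

# Child 488 from the listing: age 43, outage 226, alive, bcgage 174
episodes_488 = split_record(43, 226, 0, 174)
```

The first episode always ends with status 0, because the child is known to be alive at the age of vaccination.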

    proc tphreg data=splitbcg;
      class bcgvacc(ref="0");
      model (entryage,exitage)*status(0)=bcgvacc / rl;
    run;

               Parameter  Standard                           Hazard   95% Hazard Ratio
Parameter  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio  Confidence Limits
bcgvacc 1   1   -1.08278   0.14046     59.4286      <.0001    0.339     0.257    0.446

Other time-varying covariates
Effect of binary X (0,1) changes at $t_0$:
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_i + \beta_2 X_i I(t \ge t_0)),$$
where
$$I(t \ge t_0) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}$$
Can be handled by methods I and II.
Effect of binary X (0,1) decreases or increases with time:
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_i + \beta_2 (X_i \cdot t)).$$
Can be handled by method I, by splitting at each failure, or by special options.
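A short worked step, not on the original slides, showing what the two models imply for the rate ratio between $X_i = 1$ and $X_i = 0$. In the first model
$$\frac{\lambda_0(t) \exp(\beta_1 + \beta_2 I(t \ge t_0))}{\lambda_0(t)} = \begin{cases} \exp(\beta_1) & \text{if } t < t_0 \\ \exp(\beta_1 + \beta_2) & \text{if } t \ge t_0 \end{cases}$$
and in the second model
$$\frac{\lambda_0(t) \exp(\beta_1 + \beta_2 t)}{\lambda_0(t)} = \exp(\beta_1 + \beta_2 t).$$
So $\beta_2$ is the change in the log rate ratio at $t_0$ in the first model and the change per unit of time in the second; in both cases $\beta_2 = 0$ gives back proportional hazards.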

Stanford Heart Transplant Data (p. 235)
In a report (Crowley and Hu, J Amer. Statist Assoc. 1977) on the Stanford Heart Transplantation Study, patients identified as being eligible (N=103) for a heart transplant were followed until death or censoring.
In total 65 received a transplant during follow-up, whereas 38 did not.
Assess whether transplanted patients survive better.
On the next slide you will find the variables in the transplant data set. Here we will discuss how to analyse the data, and at the exercises we will do some of the analyses.

Stanford Heart Transplant Data variables

age       age (in years) at entry into the study.
cens      0 = Censoring, 1 = Dead
days      number of days from entry to death/censoring.
trans     1 = if the person had a heart transplantation, 0 = otherwise.
wait      number of days from entry to transplantation. NB: if trans = 0 then wait = -1
mismatch  1 = mismatch between HLA type in donor and patient, 0 = no mismatch. NB: if trans = 0 then mismatch = -1.

Obs  age  cens  days  trans  wait  mismatch
 52   56     1    90      1    27         1
 53   53     1    96      1    67         0
 54   48     1   100      1    46         0
 55   41     1   102      0    -1        -1
 56   28     0   109      1    96         1
 57   46     1   110      1    60         0
 58   23     0   131      1    21         1
 59   41     1   149      0    -1        -1
 60   47     1   153      1    26         0
 61   43     1   165      1     4         0
 62   26     0   180      1    13         0
 63   52     1   186      1   160         1
 64   47     1   188      1    41         0
 65   51     1   207      1   139         1
 66   51     1   219      1    83         1
 67    8     1   263      0    -1        -1
 68   47     0   265      1    28         0
 69   48     1   285      1    32         1
 70   19     1   285      1    57         0
 71   49     1   308      1    28         0

Piecewise Constant Hazard Rate = Poisson regression
Divide the time scale into K pieces and assume piecewise constant but different hazard rates in each of the intervals. This may provide a sensible summary of many phenomena and is often used in epidemiology.
[Figure: Age axis with cut points $c_0 = 0, c_1, c_2, c_3, \ldots, c_{K-1}, c_K$ and rates $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_K$ in the successive intervals.]
Thus $\lambda(t) = \lambda_k$ for $t \in (c_{k-1}, c_k]$, $k = 1, \ldots, K$.
The intervals do not need to be of the same length. We only need to keep a record of the total number of deaths and the exposure time in each group.
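The bookkeeping of deaths and exposure time per interval can be sketched as follows. This Python illustration uses hypothetical cut points and three hypothetical subjects; the function names are ours, and the rate in each interval is estimated as deaths divided by exposure time, the standard estimate under a piecewise constant hazard.

```python
def person_time(entry_time, exit_time, cuts):
    """Time at risk contributed to each interval (cuts[k-1], cuts[k]]."""
    return [
        max(0.0, min(exit_time, hi) - max(entry_time, lo))
        for lo, hi in zip(cuts, cuts[1:])
    ]

def piecewise_rates(subjects, cuts):
    """Return (deaths, exposure, rates) per interval for (entry, exit, dead) records."""
    k = len(cuts) - 1
    deaths = [0] * k
    exposure = [0.0] * k
    for entry_time, exit_time, dead in subjects:
        for i, pt in enumerate(person_time(entry_time, exit_time, cuts)):
            exposure[i] += pt
        if dead:
            for i, (lo, hi) in enumerate(zip(cuts, cuts[1:])):
                if lo < exit_time <= hi:
                    deaths[i] += 1  # the death falls in interval (lo, hi]
    rates = [d / y if y > 0 else float("nan") for d, y in zip(deaths, exposure)]
    return deaths, exposure, rates

# Hypothetical cut points c_0..c_3 and subjects (entry, exit, dead)
cuts = [0, 5, 10, 20]
subjects = [(0, 7, 1), (2, 12, 0), (6, 18, 1)]
deaths, exposure, rates = piecewise_rates(subjects, cuts)
```

The resulting deaths and exposure times per interval are exactly the inputs a Poisson regression with log exposure time as offset would use.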

We can further divide each interval into categories of covariates, e.g. sex (F=females, M=males):
[Figure: Age axis with cut points $c_0 = 0, c_1, c_2, c_3, \ldots, c_{K-1}, c_K$; rates $\lambda_{1F}, \lambda_{2F}, \lambda_{3F}, \ldots, \lambda_{KF}$ for females and $\lambda_{1M}, \lambda_{2M}, \lambda_{3M}, \ldots, \lambda_{KM}$ for males.]
Splitting the time-scale is not straightforward in SAS, but so-called user-written SAS macros exist. See for example: http://staff.pubhealth.ku.dk/~bxc/lexis/lexis.sas
- Stata: use the stsplit command.
- R: packages exist (e.g. the Epi package).
- SPSS?