
You know I'm not goin' diss you on the internet
'Cause my mama taught me better than that
I'm a survivor (What?)
I'm not goin' give up (What?)
I'm not goin' stop (What?)
I'm goin' work harder (What?)
Sir David Cox

Statistics 745: Lecture 12
Eric B. Laber
Department of Statistics, North Carolina State University
February 28, 2012

Where we've been: unconditional modeling of survival functions; parametric regression models; accelerated failure time models; Cox proportional hazards models.

Where we're going: application of these methods (a few more data examples and projects); extending these ideas; personalized treatment and machine learning methods...


So, you've got some data. You observe a training set $D = \{(Y_i, X_i)\}_{i=1}^n$, where $Y \in \mathbb{R}$ and $X \in \mathbb{R}^p$. Your goal is to model the mean of $Y$ given $X$. How do you get started?

Now suppose instead that you observe the training set $D = \{(X_i, \Delta_i, Z_i)\}_{i=1}^n$, where $X_i = \min(T_i, C_i) \in \mathbb{R}_+$, $\Delta_i = 1\{T_i \le C_i\}$, and $Z_i \in \mathbb{R}^p$. Your goal is to model the conditional survival function of $T$ given $Z$. How do you get started?
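To make the survival data structure concrete, here is a minimal Python sketch of how one observed record $(X_i, \Delta_i, Z_i)$ arises from a latent failure time $T_i$ and censoring time $C_i$. The helper name `observe` and all numbers are hypothetical (they are not the BMT data); in practice only $X_i$ and $\Delta_i$ are seen, never both $T_i$ and $C_i$.

```python
def observe(T, C):
    """Observed data from a latent failure time T and censoring time C.

    Returns (X, Delta) with X = min(T, C) and Delta = 1{T <= C}:
    Delta = 1 means the failure was seen, Delta = 0 means right censored.
    """
    X = min(T, C)
    delta = 1 if T <= C else 0
    return X, delta


# Hypothetical latent times and covariates, for illustration only.
T = [5.0, 2.0, 7.5]
C = [6.0, 1.5, 7.5]
Z = [[0.3], [1.2], [-0.4]]
D = [(*observe(t, c), z) for t, c, z in zip(T, C, Z)]
# D == [(5.0, 1, [0.3]), (1.5, 0, [1.2]), (7.5, 1, [-0.4])]
```

Note that a tie $T_i = C_i$ counts as an observed failure under the convention $\Delta_i = 1\{T_i \le C_i\}$.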


Basics of data analysis: TALK TO THE SCIENTISTS! Exploratory data analysis (EDA; this is where we will focus our attention today): univariate plots, model diagnostics. Survival analysis is more complex than regression (why?)

BMT example: bone marrow transplant data (Example 1.3 in K&M). Response: patient survival time, right-censored. Many baseline predictors: age, sex, donor demographics, transplant waiting time, degree of need, center, etc. Time-dependent covariates: time to graft-versus-host disease, time for platelets to return to normal levels, etc.

Looking for relationships...


Binary predictors. Case 1: Z is binary, e.g., sex.
[Figure: estimated survival curves for Male and Female; survival probability (0.0 to 1.0) versus time (0 to 2500).]
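The plot compares survival curves estimated separately within each level of Z. Assuming these are Kaplan-Meier estimates (the slide does not say so explicitly), a minimal Python sketch of the computation is below. The function names `kaplan_meier` and `km_by_group` and the toy data are hypothetical.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) = prod_{t_j <= t} (1 - d_j / n_j).

    times  : observed times X_i = min(T_i, C_i)
    events : failure indicators Delta_i (1 = failure, 0 = censored)
    Returns a list of (t_j, S(t_j)) at each distinct failure time t_j.
    """
    deaths = defaultdict(int)
    for t, d in zip(times, events):
        if d:
            deaths[t] += 1
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for x in times if x >= t)  # n_j: still under observation at t_j
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def km_by_group(times, events, groups):
    """Fit a separate Kaplan-Meier curve within each level of a covariate Z."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, z in enumerate(groups) if z == g]
        out[g] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    return out


# Hypothetical toy data, not the BMT data.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
sex = ["M", "F", "M", "F", "M"]
curves = km_by_group(times, events, sex)
```

Plotting each group's curve as a step function against time reproduces the kind of display on this slide. The same grouping works unchanged when Z has more than two levels.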


Categorical predictors. Case 2: Z is categorical, e.g., center.
[Figure: estimated survival curves by center (OSU, Alfred, St. Vincent, Hahnemann); survival probability (0.0 to 1.0) versus time (0 to 2500).]


Continuous predictors. Case 3: Z is continuous, e.g., patient age. What to do? Categorize, then apply Case 1 or 2.
[Figure: estimated survival curves for Age < 35 and Age >= 35; survival probability (0.0 to 1.0) versus time (0 to 2500).]

A finer categorization into four age groups:
[Figure: estimated survival curves for Age <= 21, 21 < Age <= 28, 28 < Age <= 35, and 35 < Age; survival probability (0.0 to 1.0) versus time (0 to 2500).]
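Categorizing a continuous Z is just a binning step. Here is a minimal Python sketch, assuming right-closed intervals as in the four-group split above (cut points 21, 28, 35 taken from the slide). The function `categorize` and the ages are hypothetical.

```python
from bisect import bisect_left


def categorize(values, cuts, name="Age"):
    """Label each value by the right-closed interval of sorted cut points it falls in.

    cuts = [21, 28, 35] gives the labels
    'Age <= 21', '21 < Age <= 28', '28 < Age <= 35', '35 < Age'.
    """
    labels = [f"{name} <= {cuts[0]}"]
    labels += [f"{lo} < {name} <= {hi}" for lo, hi in zip(cuts, cuts[1:])]
    labels.append(f"{cuts[-1]} < {name}")
    # bisect_left returns the index of the first cut >= v, so a value equal
    # to a cut point lands in the interval that ends at that cut.
    return [labels[bisect_left(cuts, v)] for v in values]


ages = [18, 21, 22, 28, 30, 35, 40]  # hypothetical ages
groups = categorize(ages, [21, 28, 35])
```

The resulting labels can then be treated as a categorical Z (Case 2). The two-group split (Age < 35 vs. Age >= 35) puts the boundary value in the upper group instead. `bisect_right` gives that convention, with the label text adjusted to match. The choice of cut points is a judgment call, and results can be sensitive to it.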

Model assumptions...


Checking proportional hazards. Case 1: binary covariate Z. If the proportional hazards model holds, then $\Lambda(t \mid Z = 1) = e^{\beta}\,\Lambda(t \mid Z = 0)$, where $\Lambda(t \mid Z = z)$ denotes the cumulative hazard function. Convince your neighbor that this is true (1 minute). Taking logs of both sides of this relationship gives
$$\log \Lambda(t \mid Z = 1) - \log \Lambda(t \mid Z = 0) = \beta.$$
Thus, we can estimate the two cumulative hazards and plot the left-hand side of the above equation against t. A roughly constant (flat) trend indicates that the proportional hazards model may be appropriate.
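One way to check the claim: integrate the proportional hazards model $\lambda(t \mid Z = z) = \lambda_0(t)\,e^{\beta z}$ over $[0, t]$. Here $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$ is the baseline cumulative hazard.

```latex
\begin{align*}
\Lambda(t \mid Z = z) &= \int_0^t \lambda_0(u)\, e^{\beta z}\, du = e^{\beta z}\, \Lambda_0(t) \\
\Rightarrow\quad \Lambda(t \mid Z = 1) &= e^{\beta}\, \Lambda_0(t) = e^{\beta}\, \Lambda(t \mid Z = 0) \\
\Rightarrow\quad \log \Lambda(t \mid Z = 1) - \log \Lambda(t \mid Z = 0) &= \beta
\end{align*}
```

The factor $e^{\beta z}$ comes out of the integral because it does not depend on $u$. This is exactly what fails when the hazard ratio changes over time.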

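Checking proportional hazards cont'd. Recall that the Nelson-Aalen estimator is $\hat\Lambda(t) = \sum_{u \le t} dN(u)/Y(u)$, where $dN(u)$ is the number of observed failures at time $u$ and $Y(u)$ is the number of subjects at risk at $u$. Estimate $\Lambda(t \mid Z = 1)$ and $\Lambda(t \mid Z = 0)$ by applying the Nelson-Aalen estimator separately to the disjoint subsets of the data with $Z = 1$ and $Z = 0$.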
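A minimal Python sketch of this check, assuming a 0/1 covariate and a user-chosen grid of times. It fits Nelson-Aalen on each subset and returns $\log\hat\Lambda(t \mid Z=1) - \log\hat\Lambda(t \mid Z=0)$, which should look roughly flat under proportional hazards. All function names are hypothetical.

```python
import math
from collections import defaultdict


def nelson_aalen(times, events):
    """Nelson-Aalen estimate Lambda_hat(t) = sum_{u <= t} dN(u) / Y(u),
    returned as a list of (u, Lambda_hat(u)) at the observed failure times u."""
    deaths = defaultdict(int)
    for t, d in zip(times, events):
        if d:
            deaths[t] += 1
    cum, steps = 0.0, []
    for u in sorted(deaths):
        at_risk = sum(1 for x in times if x >= u)  # Y(u)
        cum += deaths[u] / at_risk                 # dN(u) / Y(u)
        steps.append((u, cum))
    return steps


def evaluate(steps, t):
    """Value of the right-continuous step function at t (0 before the first jump)."""
    value = 0.0
    for u, v in steps:
        if u > t:
            break
        value = v
    return value


def log_cumhaz_difference(times, events, z, grid):
    """log Lambda_hat(t | Z=1) - log Lambda_hat(t | Z=0) on a grid of times.
    Grid points where either estimate is still 0 are skipped (log undefined)."""
    fits = {}
    for level in (0, 1):
        idx = [i for i, zi in enumerate(z) if zi == level]
        fits[level] = nelson_aalen([times[i] for i in idx], [events[i] for i in idx])
    out = []
    for t in grid:
        a, b = evaluate(fits[1], t), evaluate(fits[0], t)
        if a > 0 and b > 0:
            out.append((t, math.log(a) - math.log(b)))
    return out
```

Equivalently, one can plot both $\log\hat\Lambda$ curves on a common axis and look for parallel curves, as on the next slide. Estimates in the tails rest on few subjects at risk and are noisy.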

Binary predictors: look for parallel log cumulative hazards.
[Figure: estimated cumulative hazards for Male and Female on a log scale (0.01 to 1.00) versus time (0 to 2500).]

To be continued... To test proportional hazards for continuous predictors Z, we'll need some more tools...

Time-dependent covariates. So far we have considered models built on baseline information. Often, however, data are collected during the course of a study, and using this evolving patient information can lead to a better understanding of the survival function. Does this matter? Yes: we can obtain better estimates of the survival function and better understand the interplay between evolving patient health characteristics and survival.

Time-dependent covariates cont'd. Suppose that for each patient, in addition to observing $X_i = \min(T_i, C_i)$ and $\Delta_i = 1\{T_i \le C_i\}$, we observe $Z_i(t) = \{Z_{i1}(t), Z_{i2}(t), \ldots, Z_{ip}(t)\}$ for $0 \le t \le T_i$. How can we incorporate time-dependent covariates into the model?

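Basic Cox model. Observed data: $D = \{(X_i, \Delta_i, \{Z_i(t) : 0 \le t \le T_i\})\}_{i=1}^n$. Assume that censoring time and failure time are conditionally independent. Under this assumption the partial likelihood is
$$L(\beta) = \prod_{i=1}^{D} \frac{\exp\left\{\sum_{j=1}^{p} \beta_j Z_{(i)j}(t_i)\right\}}{\sum_{k \in \mathcal{R}(t_i)} \exp\left\{\sum_{j=1}^{p} \beta_j Z_{kj}(t_i)\right\}},$$
where $t_1 < t_2 < \cdots < t_D$ denote the distinct failure times. $Z_{(i)}(t_i)$ denotes the covariates of the subject who fails at $t_i$, and $\mathcal{R}(t_i)$ denotes the risk set at $t_i$ (the subjects still under observation just before $t_i$). Every covariate in the ratio is evaluated at the failure time $t_i$.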
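As a concrete reading of this formula, here is a minimal Python sketch of the log partial likelihood. It assumes no tied failure times and represents each subject's covariate path as a function `z_i(t)` returning the list $[Z_{i1}(t), \ldots, Z_{ip}(t)]$. That representation and the toy data are hypothetical.

```python
import math


def log_partial_likelihood(beta, data):
    """Cox log partial likelihood with time-dependent covariates:

        sum over failures i of
        beta'Z_(i)(t_i) - log sum_{k : X_k >= t_i} exp(beta'Z_k(t_i)).

    data: list of (X_i, delta_i, z_i) with z_i(t) -> list of p floats.
    Assumes no tied failure times.
    """
    def lin(z):  # linear predictor beta'z
        return sum(b * zj for b, zj in zip(beta, z))

    total = 0.0
    for X_i, delta_i, z_i in data:
        if not delta_i:
            continue  # censored subjects only enter through risk sets
        t_i = X_i
        # every subject in the risk set is evaluated at the failure time t_i
        eta = [lin(z_k(t_i)) for X_k, _, z_k in data if X_k >= t_i]
        m = max(eta)  # log-sum-exp for numerical stability
        total += lin(z_i(t_i)) - (m + math.log(sum(math.exp(e - m) for e in eta)))
    return total


# Hypothetical toy data: the third subject's covariate switches on after t = 1.
data = [
    (2.0, 1, lambda t: [1.0]),
    (3.0, 1, lambda t: [0.0]),
    (4.0, 0, lambda t: [1.0 if t > 1.0 else 0.0]),
]
```

At $\beta = 0$ each failure contributes $-\log|\mathcal{R}(t_i)|$. For these data that gives $-\log 3 - \log 2 = -\log 6$.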

Basic Cox model cont'd. Estimation and inference for β can proceed as usual: maximize the partial likelihood to find an estimator, then use the observed Fisher information and asymptotic normality to construct confidence sets, etc.
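For a single covariate, the recipe above fits in a few lines. Here is a minimal Python sketch of Newton-Raphson on the log partial likelihood, followed by a 95% Wald interval $\hat\beta \pm 1.96/\sqrt{I(\hat\beta)}$. It assumes no tied failure times and that the maximizer is finite. The function names and toy data are hypothetical.

```python
import math


def score_info(beta, data):
    """Score U(beta) and observed information I(beta) for one covariate.

    data: list of (X_i, delta_i, z_i) where z_i(t) returns Z_i(t) as a float.
    Assumes no tied failure times.
    """
    U, I = 0.0, 0.0
    for X_i, d_i, z_i in data:
        if not d_i:
            continue
        zs = [z_k(X_i) for X_k, _, z_k in data if X_k >= X_i]  # risk set at t_i = X_i
        w = [math.exp(beta * z) for z in zs]
        s0 = sum(w)
        mean = sum(wk * z for wk, z in zip(w, zs)) / s0
        second = sum(wk * z * z for wk, z in zip(w, zs)) / s0
        U += z_i(X_i) - mean     # d log L / d beta
        I += second - mean ** 2  # -d^2 log L / d beta^2 (a weighted variance)
    return U, I


def cox_fit_scalar(data, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson for beta_hat; returns (beta_hat, se, 95% Wald CI)."""
    for _ in range(max_iter):
        U, I = score_info(beta, data)
        step = U / I
        beta += step
        if abs(step) < tol:
            break
    _, I = score_info(beta, data)
    se = 1.0 / math.sqrt(I)
    return beta, se, (beta - 1.96 * se, beta + 1.96 * se)


# Hypothetical toy data with a baseline binary covariate.
data = [(1.0, 1, lambda t: 1.0), (2.0, 1, lambda t: 0.0),
        (3.0, 1, lambda t: 1.0), (4.0, 1, lambda t: 0.0)]
beta_hat, se, ci = cox_fit_scalar(data)
```

For these data the score equation reduces to $e^{2\beta} - e^{\beta} - 4 = 0$, so $\hat\beta = \log\{(1 + \sqrt{17})/2\} \approx 0.94$. Newton-Raphson is a natural choice here because the log partial likelihood is concave in $\beta$.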

Basic Cox model cont'd. Common examples of time-varying covariates: BMI, blood pressure, number of hospitalizations, presence of a comorbid condition, a depression inventory (e.g., HAMD), treatment adherence... Of course, defining $Z_j(t) = Z_j$ for all $t$ makes any baseline covariate time-dependent. The derivation of the partial likelihood assumes that $Z(t)$ is predictable, i.e., that its value is known given all information available just prior to time $t$.
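As an illustration of predictability, consider the BMT covariate "time to graft-versus-host disease". One way to encode it (an assumption, not the only coding) is the indicator $Z(t) = 1\{T_{\text{GVHD}} < t\}$. The strict inequality makes $Z$ left-continuous, so its value at $t$ is determined just before $t$. Here is a minimal Python sketch. The helper names are hypothetical.

```python
def indicator_after(event_time):
    """Hypothetical helper: Z(t) = 1{event_time < t}.

    The strict inequality makes Z left-continuous, so Z(t) is known just
    before t (predictable). event_time = None means the intermediate event
    (e.g., graft-versus-host disease) was never observed, so Z(t) = 0.
    """
    def z(t):
        return 1 if event_time is not None and event_time < t else 0
    return z


def constant(value):
    """A baseline covariate viewed as time-dependent: Z(t) = value for all t."""
    return lambda t: value


z_gvhd = indicator_after(100.0)  # hypothetical: GVHD diagnosed on day 100
z_age = constant(28)             # hypothetical baseline age
```

Functions of this form can be evaluated at each failure time when building risk-set sums in the partial likelihood.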

Back to testing model adequacy. Suppose that Z is continuous (not time-dependent). We would like to test the proportional hazards assumption $\lambda(t \mid Z) = \lambda_0(t)\exp\{\beta_1 Z\}$. A common approach is to use a time-dependent proportional hazards model. For a fixed function $g(t)$, define the time-dependent covariate $W = g(t)Z$ and consider the proportional hazards model $\lambda(t \mid Z, W) = \lambda_0(t)\exp\{\beta_1 Z + \beta_2 W\}$.

Back to testing model adequacy cont'd. Consider the alternative model $\lambda(t \mid Z, W) = \lambda_0(t)\exp\{\beta_1 Z + \beta_2 W\}$ with $W = g(t)Z$. Under it, the hazard ratio for a one-unit increase in Z is $\exp\{\beta_1 + \beta_2 g(t)\}$, which changes over time unless $\beta_2 = 0$. So if the posited proportional hazards model (i.e., the one that depends only on Z) is correct, then $\beta_2 = 0$. Idea! Test $H_0: \beta_2 = 0$ to test the validity of proportional hazards.
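Here is a minimal Python sketch of this test for a single baseline continuous Z. It makes the following assumptions:
- $g(t) = \log t$, one common choice; the slide leaves $g$ unspecified.
- No tied failure times.
- A Wald test of $H_0: \beta_2 = 0$; a score or likelihood ratio test would also work.

The function names and the simulated data are hypothetical. Because $W$ changes with $t$, it is recomputed inside every risk set.

```python
import math
import random


def score_info_2d(beta, data, g):
    """Score vector and 2x2 observed information for
    lambda(t | Z) = lambda_0(t) exp{beta1 Z + beta2 W(t)}, W(t) = g(t) Z.
    data: list of (X_i, delta_i, Z_i) with baseline scalar Z_i; no tied failures."""
    U = [0.0, 0.0]
    I = [[0.0, 0.0], [0.0, 0.0]]
    for X_i, d_i, z_i in data:
        if not d_i:
            continue
        gt = g(X_i)
        xs = [(z, gt * z) for X_k, _, z in data if X_k >= X_i]  # (Z_k, W_k(t_i)) over risk set
        w = [math.exp(beta[0] * a + beta[1] * b) for a, b in xs]
        s0 = sum(w)
        m = [sum(wk * x[j] for wk, x in zip(w, xs)) / s0 for j in range(2)]
        x_i = (z_i, gt * z_i)
        for j in range(2):
            U[j] += x_i[j] - m[j]
            for l in range(2):
                second = sum(wk * x[j] * x[l] for wk, x in zip(w, xs)) / s0
                I[j][l] += second - m[j] * m[l]
    return U, I


def inverse_2x2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def wald_test_beta2(data, g=math.log, tol=1e-10, max_iter=50):
    """Newton-Raphson fit, then Wald test of H0: beta2 = 0.
    Returns (beta_hat, z statistic, two-sided p-value)."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        U, I = score_info_2d(beta, data, g)
        Iinv = inverse_2x2(I)
        step = [Iinv[j][0] * U[0] + Iinv[j][1] * U[1] for j in range(2)]
        beta = [beta[j] + step[j] for j in range(2)]
        if max(abs(s) for s in step) < tol:
            break
    _, I = score_info_2d(beta, data, g)
    z = beta[1] / math.sqrt(inverse_2x2(I)[1][1])  # se from inverse information
    p = math.erfc(abs(z) / math.sqrt(2))           # P(|N(0,1)| > |z|)
    return beta, z, p


# Hypothetical simulated data satisfying proportional hazards in Z.
random.seed(745)
data = []
for _ in range(200):
    z = random.gauss(0.0, 1.0)
    T = random.expovariate(math.exp(0.5 * z))  # hazard exp(0.5 z), constant in t
    C = random.expovariate(0.3)
    data.append((min(T, C), int(T <= C), z))
beta_hat, z_stat, p_value = wald_test_beta2(data)
```

A small p-value is evidence that the effect of Z drifts with $g(t)$, i.e., against proportional hazards. A large one only says no drift of that particular shape was detected, so the result depends on the choice of $g$.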