ST745: Survival Analysis: Parametric

Eric B. Laber, Department of Statistics, North Carolina State University, January 13, 2015

"...the statistician knows... that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world."
Nicole Polizzi

Warm-up
Define and explain:
1. Survivor function
2. Hazard function
3. Cumulative hazard function
4. How does the density relate to the hazard function?
True or false:
(T/F) Hazard fn is always positive
(T/F) Hazard fn is always less than one
(T/F) Hazard fn integrates to one
What is censoring?

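Fact from a past life
Recall that if $X \sim \text{Gamma}(\alpha, \beta)$ then $X$ has density
$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} \exp\{-\beta x\}, \quad x > 0,$$
where $\Gamma(u) = \int_0^\infty x^{u-1} \exp\{-x\}\, dx$ is the gamma function.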
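As a quick sanity check of the formula, the sketch below codes the density by hand and compares it with R's built-in `dgamma`. It assumes the rate parameterization above, which corresponds to `dgamma(x, shape = alpha, rate = beta)`; the helper name `gamma_dens` and the values $\alpha = 2.5$, $\beta = 0.7$ are illustrative only.

```r
## Gamma(alpha, beta) density written out from the formula (rate parameterization)
gamma_dens = function(x, alpha, beta) {
  beta^alpha / gamma(alpha) * x^(alpha - 1) * exp(-beta * x)
}

x = seq(0.1, 10, length.out = 50)
## R's dgamma with shape = alpha and rate = beta should agree to rounding error
max(abs(gamma_dens(x, alpha = 2.5, beta = 0.7) - dgamma(x, shape = 2.5, rate = 0.7)))

## The gamma function as the integral of x^(u - 1) exp(-x) over (0, Inf), here u = 2.5
integrate(function(s) s^(2.5 - 1) * exp(-s), lower = 0, upper = Inf)$value
gamma(2.5)
```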

Parametric models
The first part of this course will focus on parametric models, which play an important role in statistics:
- Data-impoverished settings
- Gaining intuition
- Reference distributions for test cases
- Sometimes justified by the underlying process
With parametric models we can focus on the usual likelihood-based approaches:
- Maximum likelihood estimation
- Wald-type confidence intervals
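To make "the usual likelihood-based approaches" concrete, here is a minimal sketch using the exponential model from the next slide with independent right censoring. It relies on the standard result that with $d$ observed events and total follow-up $\sum_i t_i$, the log-likelihood is $d \log\lambda - \lambda \sum_i t_i$, so $\hat\lambda = d / \sum_i t_i$ with observed information $d/\lambda^2$. The true rate, the censoring distribution, and the sample size are hypothetical choices.

```r
## Exponential MLE and Wald-type CI under independent right censoring (a sketch)
set.seed(745)
n = 2000
lam_true = 0.5                            # hypothetical true rate
t_event = rexp(n, rate = lam_true)        # latent event times
t_cens = rexp(n, rate = 0.2)              # hypothetical censoring times
obs_time = pmin(t_event, t_cens)          # observed follow-up time
delta = as.numeric(t_event <= t_cens)     # 1 = event observed, 0 = censored

## Closed-form MLE and Wald interval
d = sum(delta)
lam_hat = d / sum(obs_time)
se_hat = lam_hat / sqrt(d)                # from observed information d / lambda^2
ci = lam_hat + c(-1, 1) * qnorm(0.975) * se_hat
c(estimate = lam_hat, lower = ci[1], upper = ci[2])

## Numerical maximization over log(lambda) gives the same estimate
negloglik = function(log_lam) -(d * log_lam - exp(log_lam) * sum(obs_time))
fit = optimize(negloglik, interval = c(-10, 5))
exp(fit$minimum)
```

The numerical route matters more than the closed form: for most of the models that follow there is no closed-form MLE, and the same recipe (maximize, then invert the observed information) still applies.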

Example: Exponential distribution
"If you can't get your estimator to work assuming an exponential generative model, you're in a bad way."
Adam Baldwin as Animal Mother in Full Metal Jacket
- The exponential distribution is defined by a constant hazard, $h(t) \equiv \lambda$
- Exercise: Find $f(t)$ and $S(t)$
- Note: sometimes parameterized by $\theta = 1/\lambda$; $T \sim \exp(\theta)$ implies $ET = \theta$ and $\text{Var}\, T = \theta^2$
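A small numerical check of the claims on this slide: the ratio $f(t)/S(t)$ should be the same constant at every $t$, and in the mean parameterization the mean and variance should be $\theta$ and $\theta^2$. R's `rexp` uses the rate $\lambda = 1/\theta$; $\theta = 3$ and the Monte Carlo sample size are arbitrary choices for illustration.

```r
theta = 3                       # mean parameterization; rate lambda = 1 / theta
lam = 1 / theta
t = c(0.5, 1, 5, 20)

## Hazard h(t) = f(t) / S(t) should equal lambda at every t
haz = dexp(t, rate = lam) / pexp(t, rate = lam, lower.tail = FALSE)
haz

## Monte Carlo check of E T = theta and Var T = theta^2
set.seed(1)
x = rexp(1e6, rate = lam)
c(mean = mean(x), var = var(x))   # should be close to 3 and 9
```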

Example: Weibull distribution
- Very widely used, esp. in manufacturing
- Closed under the minimum operation: the minimum of independent Weibulls with the same shape parameter is Weibull (think weakest link in a chain)
- If $T \sim \text{Weib}(\lambda, \beta)$ then $h(t) = \lambda\beta(\lambda t)^{\beta - 1}$
- Exercise: find $f(t)$ and $S(t)$
- On board: for $T \sim \text{Weib}(\lambda, \beta)$ find $ET^r$
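The "weakest link" closure property follows because cumulative hazards of independent lifetimes add: $\sum_i (\lambda_i t)^\beta = (\lambda_* t)^\beta$ with $\lambda_* = (\sum_i \lambda_i^\beta)^{1/\beta}$. The simulation below is a sketch of that check; it assumes R's `rweibull` with `scale = 1/lambda` matches the $(\lambda, \beta)$ parameterization above, and the rates and shape are illustrative values.

```r
set.seed(2)
n = 1e5
shp = 1.5                       # common shape parameter beta (illustrative)
lams = c(0.5, 1, 2)             # rates lambda_i (illustrative)
## rweibull's scale is 1 / lambda under h(t) = lambda * beta * (lambda * t)^(beta - 1)
tmin = do.call(pmin, lapply(lams, function(l) rweibull(n, shape = shp, scale = 1 / l)))

## Cumulative hazards add: sum_i (lambda_i t)^beta = (lambda_star t)^beta
lam_star = sum(lams^shp)^(1 / shp)

## Empirical survivor function of the minimum vs. the Weibull(lambda_star, beta) survivor
t0 = c(0.1, 0.3, 0.6)
rbind(empirical = sapply(t0, function(s) mean(tmin > s)),
      weibull   = pweibull(t0, shape = shp, scale = 1 / lam_star, lower.tail = FALSE))
```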

Example: Weibull distribution cont'd
The Weibull distribution gives a reasonable amount of flexibility:
- $\beta > 1$: increasing hazard
- $\beta < 1$: decreasing hazard
- $\beta = 1$: constant hazard
The combination of flexibility and analytic tractability makes the Weibull an appealing choice in practice.
[Figure: Weibull hazard $h(t)$ versus $t \in [0, 5]$, panels for $\beta = 1/2$, $\beta = 1$, and $\beta = 2$]
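The three panels can be sketched in R as below. The slide does not state the value of $\lambda$ used, so $\lambda = 1$ here is an assumption; the helper `weib_haz` is a hypothetical name that simply codes $h(t) = \lambda\beta(\lambda t)^{\beta-1}$.

```r
## Weibull hazard h(t) = lambda * beta * (lambda * t)^(beta - 1)
weib_haz = function(t, lambda, beta) lambda * beta * (lambda * t)^(beta - 1)

t = seq(0.05, 5, length.out = 200)
par(mfrow = c(1, 3))
for (b in c(1/2, 1, 2)) {
  ## lambda = 1 is an assumption; the original figure does not state it
  plot(t, weib_haz(t, lambda = 1, beta = b), type = "l", lwd = 2,
       xlab = "t", ylab = "h(t)", main = paste("Weibull hazard, beta =", b))
}
```

The same curves can be obtained as `dweibull(t, shape = b, scale = 1) / pweibull(t, shape = b, scale = 1, lower.tail = FALSE)`, which is a useful cross-check of the hazard formula.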

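Example: Weibull distribution cont'd
- The book sometimes uses $T \sim \text{Weib}(\alpha, \beta)$, where $\lambda = 1/\alpha$
- $\beta$ is sometimes called the shape parameter
- $\lambda$ can be seen to compress or stretch the time axis
$$f(t; \lambda, \beta) = \lambda\beta(\lambda t)^{\beta - 1} \exp\{-(\lambda t)^\beta\}$$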
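A sketch of how the two parameterizations line up with R: `dweibull(t, shape, scale)` takes `shape` $= \beta$ and `scale` $= \alpha = 1/\lambda$. The second check shows the "stretch the time axis" interpretation, $f(t; \lambda, \beta) = \lambda f(\lambda t; 1, \beta)$. Parameter values and the helper name `weib_dens` are illustrative.

```r
## Weibull density in the (lambda, beta) parameterization
weib_dens = function(t, lambda, beta) {
  lambda * beta * (lambda * t)^(beta - 1) * exp(-(lambda * t)^beta)
}

lambda = 0.4; shp = 2.5            # illustrative values
alpha = 1 / lambda                 # the book's scale parameter
t = seq(0.1, 8, length.out = 100)

## R's dweibull: shape = beta, scale = alpha = 1 / lambda
max(abs(weib_dens(t, lambda, shp) - dweibull(t, shape = shp, scale = alpha)))

## lambda only rescales time: f(t; lambda, beta) = lambda * f(lambda * t; 1, beta)
max(abs(weib_dens(t, lambda, shp) - lambda * weib_dens(lambda * t, 1, shp)))
```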

Code break
## Generate 100 samples from exp distn with mean 3
n = 100
lam = 1/3
x = rexp(n, rate = lam)
hist(x, freq = FALSE, col = "gray", main = "")

## Plot density and survivor function
x = seq(0, 5, length = 1000)
plot(x, dexp(x, rate = lam), lwd = 4, type = "l")
plot(x, 1 - pexp(x, rate = lam), lwd = 4, type = "l")

Code break II
## Generate 1000 points from weib(1, 2): lambda = 1 (scale = 1/lambda), beta = 2 (shape)
n = 1000
shp = 2
scl = 1
x = rweibull(n, shape = shp, scale = scl)
hist(x, col = "gray", freq = FALSE)

## Plot density and cumulative hazard
x = seq(0, max(x), length = 1000)
plot(x, dweibull(x, shape = shp, scale = scl), type = "l")
ST = 1 - pweibull(x, shape = shp, scale = scl)
plot(x, -log(ST), type = "l")
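The last plot uses $H(t) = -\log S(t)$. For the Weibull this should equal $(\lambda t)^\beta$, which the sketch below checks. It also shows a numerical detail: computing $S(t)$ as `1 - pweibull(...)` underflows to zero far in the tail, while `lower.tail = FALSE, log.p = TRUE` returns $\log S(t)$ directly.

```r
## For T ~ Weib(lambda, beta), H(t) = -log S(t) = (lambda * t)^beta
lambda = 1; shp = 2                 # the weib(1, 2) case from the code above
t = seq(0, 3, length.out = 100)
H_numeric = -pweibull(t, shape = shp, scale = 1 / lambda, lower.tail = FALSE, log.p = TRUE)
H_closed = (lambda * t)^shp
max(abs(H_numeric - H_closed))

## Far in the tail, 1 - p rounds to 0 and the naive formula breaks down
-log(1 - pweibull(7, shape = shp, scale = 1))                          # Inf
-pweibull(7, shape = shp, scale = 1, lower.tail = FALSE, log.p = TRUE)  # 49
```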

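Example: extreme value distribution
If $T$ is Weibull then $\log T$ follows an extreme value (EV) distribution. $Y = \log T$ takes values in $\mathbb{R}$ (why?), with
$$f(y) = b^{-1} \exp\left\{\frac{y - u}{b} - \exp\left(\frac{y - u}{b}\right)\right\}, \qquad S(y) = \exp\left\{-\exp\left(\frac{y - u}{b}\right)\right\},$$
where $b > 0$ and $u \in \mathbb{R}$.
- Does the form of $f(y)$ look familiar?
- If $T \sim \text{Weib}(\alpha, \beta)$ then we say $Y = \log T \sim \text{EV}(u, b)$ with $b = 1/\beta$ and $u = \log\alpha$
- If $Y \sim \text{EV}(u, b)$ then $(Y - u)/b \sim \text{EV}(0, 1)$; EV(0, 1) is called the standard extreme value distribution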
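The Weibull-to-EV link can be checked by simulation. The sketch below draws $T \sim \text{Weib}(\alpha, \beta)$ with R's `rweibull(shape = beta, scale = alpha)`, takes logs, and compares the empirical survivor function of $Y$ with $S(y)$ above. The values $\alpha = 2$, $\beta = 1.5$ are illustrative. It also uses the known fact that the standard EV mean is $-\gamma_E \approx -0.5772$ (Euler's constant).

```r
## Extreme value survivor function and density from the slide
ev_surv = function(y, u, b) exp(-exp((y - u) / b))
ev_dens = function(y, u, b) (1 / b) * exp((y - u) / b - exp((y - u) / b))

set.seed(3)
alpha = 2; shp = 1.5                  # illustrative Weibull scale and shape
y = log(rweibull(1e5, shape = shp, scale = alpha))
u = log(alpha); b = 1 / shp           # EV(u, b) parameters implied by the Weibull

## Empirical vs. theoretical survivor function of Y at a few points
y0 = c(-1, 0, 0.5, 1)
rbind(empirical = sapply(y0, function(s) mean(y > s)),
      ev        = ev_surv(y0, u, b))

## Standardizing gives EV(0, 1), whose mean is minus Euler's constant
z = (y - u) / b
mean(z)

## The EV(0, 1) density integrates to one
integrate(ev_dens, -Inf, Inf, u = 0, b = 1)$value
```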

Example: extreme value distribution cont'd
Question: let $X$ denote covariates; how can we (conveniently) build a model so that $T \mid X$ follows a Weibull distribution?
- Let $u(x)$ and $b(x)$ be location and scale functions, e.g., $u(x) = x^\top\theta$ and $b(x) = \exp\{x^\top\gamma\}$
- Assume
$$\frac{\log T - u(X)}{b(X)} \,\Big|\, X \sim \text{EV}(0, 1),$$
so that $T \mid X = x$ is Weibull with $\alpha = \exp\{u(x)\}$ and $\beta = 1/b(x)$; this is an example of our old friend the location-scale model!
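A minimal simulation of this model, assuming a single binary covariate and hypothetical coefficient values for $\theta$ and $\gamma$. Standard EV errors are drawn as $\log$ of an Exp(1) variable (that is, $\log$ of a $\text{Weib}(1, 1)$), using the Weibull-to-EV link from the previous slide. Within each covariate group the simulated mean should match the Weibull mean $\exp\{u(x)\}\,\Gamma(1 + b(x))$.

```r
## Simulate T | X from a log location-scale model with EV(0, 1) errors
## u(x) = x' theta, b(x) = exp(x' gamma); theta and gamma are hypothetical values
set.seed(4)
n = 1e5
x = cbind(1, rbinom(n, 1, 0.5))       # intercept plus one binary covariate
theta = c(1, -0.5)
gam = c(-0.3, 0.4)
u = drop(x %*% theta)
b = exp(drop(x %*% gam))

w = log(rexp(n))                      # standard EV draws
t_obs = exp(u + b * w)                # T = exp{u(X) + b(X) W}

## Given X = x, T is Weibull with shape 1 / b(x) and scale exp(u(x)),
## so E(T | x) = exp(u(x)) * Gamma(1 + b(x))
grp = x[, 2] == 1
rbind(x1 = c(empirical = mean(t_obs[grp]),  weibull = exp(sum(theta)) * gamma(1 + exp(sum(gam)))),
      x0 = c(empirical = mean(t_obs[!grp]), weibull = exp(theta[1]) * gamma(1 + exp(gam[1]))))
```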

Example: log-normal distribution (aka the Lormal distn)
$T$ is said to be log-normally distributed if $Y = \log T \sim N(\mu, \sigma^2)$, which we denote $T \sim \log N(\mu, \sigma^2)$.
- A log-normal has density
$$f(t) = \frac{1}{\sqrt{2\pi\sigma^2}\, t} \exp\left\{-\frac{1}{2}\left(\frac{\log t - \mu}{\sigma}\right)^2\right\}, \quad t > 0;$$
how can we derive this density?
- Derive the hazard function and show $h(0) = 0$ and $\lim_{t \to \infty} h(t) = 0$
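A numerical companion to the two exercises, assuming $\mu = 0$, $\sigma = 1$ for illustration. The first check is the change-of-variables identity $f_T(t) = f_Y(\log t)/t$; the second evaluates $h(t) = f(t)/S(t)$ on the log scale, because both $f$ and $S$ underflow far in the right tail and a direct ratio would give `NaN`. The helper name `ln_haz` is hypothetical.

```r
## Density by change of variables: f_T(t) = f_Y(log t) / t with Y ~ N(mu, sigma^2)
mu = 0; sigma = 1                       # illustrative values
t = seq(0.01, 10, length.out = 500)
max(abs(dnorm(log(t), mu, sigma) / t - dlnorm(t, meanlog = mu, sdlog = sigma)))

## Hazard h(t) = f(t) / S(t), computed on the log scale
ln_haz = function(t, mu, sigma) {
  exp(dlnorm(t, mu, sigma, log = TRUE) -
      plnorm(t, mu, sigma, lower.tail = FALSE, log.p = TRUE))
}
ln_haz(c(1e-6, 1, 1e6), mu, sigma)      # near 0, about 0.8, near 0 again
plot(t, ln_haz(t, mu, sigma), type = "l", lwd = 2, xlab = "t", ylab = "h(t)")
```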

Example: log-normal distribution cont'd
- $h(t)$ is continuous and satisfies $h(0) = \lim_{t \to \infty} h(t) = 0$
- Corresponds to a mix of individuals with short and long lifetimes, e.g.,
  - Some forms of cancer
  - Marriages
  - Some electronics
  - Waiting time of my dog

Code break III
- Question: Using only what we've already covered in class, how can we generate data from a standard EV in R?
- Question: Using only what we've already covered in class, how can we generate data from an EV(u, b)?
- Using the function rnorm, which generates data from a standard normal distribution, how would you generate data from a $\log N(\mu, \sigma^2)$?

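Example: log-logistic distribution
Also called the Fisk distribution. Density, survivor, and hazard functions, where $\alpha, \beta > 0$:
$$f(t) = \frac{(\beta/\alpha)(t/\alpha)^{\beta - 1}}{[1 + (t/\alpha)^\beta]^2}, \qquad S(t) = [1 + (t/\alpha)^\beta]^{-1}, \qquad h(t) = \frac{(\beta/\alpha)(t/\alpha)^{\beta - 1}}{1 + (t/\alpha)^\beta}$$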

Example: log-logistic distribution
- As the name suggests, $\log T$ follows a logistic distribution
- The hazard function can be monotone or non-monotone depending on the choice of $\beta$: it is decreasing for $\beta \le 1$ and rises then falls for $\beta > 1$
- $\alpha$ shrinks or expands the time axis
[Figure: log-logistic hazard $h(t)$ versus $t \in [0, 3]$ for $\beta = 1/2, 1, 2, 4, 8$]
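A sketch that codes the survivor and hazard functions from the previous slide and reports where each hazard peaks on a grid, for the $\beta$ values in the figure ($\alpha = 1$ is an assumption). It then checks the "log-logistic" name: $\log T$ is logistic with location $\log\alpha$ and scale $1/\beta$, so $S(t)$ should match base R's `plogis` evaluated at $\log t$. The helper names are hypothetical.

```r
## Log-logistic survivor and hazard (alpha, beta > 0)
ll_surv = function(t, alpha, beta) 1 / (1 + (t / alpha)^beta)
ll_haz = function(t, alpha, beta) (beta / alpha) * (t / alpha)^(beta - 1) / (1 + (t / alpha)^beta)

t = seq(0.01, 3, length.out = 300)
alpha = 1                                  # illustrative scale

## Decreasing hazards peak at the left end of the grid; beta > 1 peaks in the interior
for (b in c(1/2, 1, 2, 4, 8)) {
  h = ll_haz(t, alpha, b)
  cat("beta =", b, " hazard peaks at t =", round(t[which.max(h)], 2), "\n")
}

## log T is logistic with location log(alpha) and scale 1 / beta
b = 2
max(abs(ll_surv(t, alpha, b) -
        plogis(log(t), location = log(alpha), scale = 1 / b, lower.tail = FALSE)))
```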

Log location-scale models
Recall a parametric location-scale model for $Y \in \mathbb{R}$:
$$f(y) = \frac{1}{b} f_0\left(\frac{y - u}{b}\right),$$
where $u$ is the location and $b$ is the scale parameter, so that
$$F(y) = F_0\left(\frac{y - u}{b}\right), \qquad S(y) = 1 - F_0\left(\frac{y - u}{b}\right) = S_0\left(\frac{y - u}{b}\right).$$
Common examples:
- $S_0(y) = \exp\{-e^y\}$: extreme value
- $S_0(y) = 1 - \Phi(y)$: normal
- $S_0(y) = (1 + e^y)^{-1}$: logistic
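The identity $S(y) = S_0((y - u)/b)$ can be checked for all three families. Base R exposes location and scale arguments for the normal (`pnorm`) and logistic (`plogis`), but has no extreme value functions, so the EV case is checked through $\log T$ with $T \sim$ Weibull(shape $= 1/b$, scale $= e^u$). The values $u = 0.5$, $b = 2$ and the helper names are illustrative.

```r
## Standard survivor functions S0 from the slide
S0_ev       = function(z) exp(-exp(z))
S0_normal   = function(z) 1 - pnorm(z)
S0_logistic = function(z) 1 / (1 + exp(z))
loc_scale_surv = function(y, u, b, S0) S0((y - u) / b)

u = 0.5; b = 2                          # illustrative location and scale
y = seq(-3, 3, length.out = 13)

max(abs(loc_scale_surv(y, u, b, S0_normal) - pnorm(y, mean = u, sd = b, lower.tail = FALSE)))
max(abs(loc_scale_surv(y, u, b, S0_logistic) - plogis(y, location = u, scale = b, lower.tail = FALSE)))
## EV(u, b) via P(log T > y) = P(T > exp(y)) with T ~ Weibull(shape = 1/b, scale = exp(u))
max(abs(loc_scale_surv(y, u, b, S0_ev) - pweibull(exp(y), shape = 1 / b, scale = exp(u), lower.tail = FALSE)))
```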

Log location-scale models cont'd
- Any location-scale model can be made into a lifetime distn through exponentiation, $T = \exp\{Y\}$
- Thus $P(T \ge t) = S_0((\log t - u)/b)$
- Log location-scale models are useful for simulating data:
1. Generate $Y \mid X \sim (\mu(X), \sigma^2(X))$ from a location-scale family
2. Set $T = \exp\{Y\}$
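The two-step recipe can be sketched with normal errors, which makes $T \mid X$ log-normal. The location and scale functions `mu_fn` and `sigma_fn` below are hypothetical choices; the check compares the simulated $P(T \ge t \mid x)$ at a fixed $x$ with $S_0((\log t - \mu(x))/\sigma(x))$, $S_0 = 1 - \Phi$.

```r
## Hypothetical location and scale functions of a scalar covariate
mu_fn = function(x) 1 + 2 * x
sigma_fn = function(x) exp(-0.5 + 0.5 * x)

sim_lifetimes = function(x) {
  y = mu_fn(x) + sigma_fn(x) * rnorm(length(x))   # step 1: Y | X from the location-scale model
  exp(y)                                          # step 2: T = exp(Y)
}

set.seed(5)
x = runif(1e5)
t_sim = sim_lifetimes(x)                          # lifetimes for heterogeneous covariates

## Check P(T >= t | x) = S0((log t - mu(x)) / sigma(x)) at x = 0.5
t_x0 = sim_lifetimes(rep(0.5, 1e5))
t0 = exp(2.5)
c(empirical = mean(t_x0 >= t0),
  theory = 1 - pnorm((log(t0) - mu_fn(0.5)) / sigma_fn(0.5)))
```

Swapping `rnorm` for another standard generator (for example `rlogis`) gives the corresponding log location-scale lifetime model without changing step 2.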

Piecewise constant hazard functions
Recall that a piecewise constant hazard function takes the form
$$h(t) = \lambda_j \quad \text{if } a_{j-1} \le t < a_j, \quad j = 1, \ldots, m,$$
where $0 = a_0 < a_1 < \cdots < a_m = \infty$.
[Figure: an example piecewise constant hazard $h(t)$ versus $t$]

Piecewise constant hazard functions, cont'd
- A piecewise constant hazard can closely* approximate a large class of functions
- These will play a role later on when we study nonparametrics
- Note that the cumulative hazard function is given by
$$H(t) = \sum_{j=1}^{m(t)-1} \lambda_j (a_j - a_{j-1}) + \lambda_{m(t)} (t - a_{m(t)-1}),$$
where $m(t)$ satisfies $a_{m(t)-1} \le t < a_{m(t)}$
- What is $S(t)$ in this case? What is $f(t)$ in this case?
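The cumulative hazard formula amounts to summing $\lambda_j$ times the part of $[a_{j-1}, a_j)$ that lies below $t$. The sketch below codes that directly, with hypothetical breakpoints $a = (0, 1, 2, \infty)$ and rates $\lambda = (0.5, 2, 1)$, and uses the general relation $S(t) = \exp\{-H(t)\}$. The helper names `pc_haz` and `pc_cumhaz` are illustrative.

```r
## Hypothetical breakpoints a_0, ..., a_m and rates lambda_1, ..., lambda_m (m = 3)
a = c(0, 1, 2, Inf)
lam = c(0.5, 2, 1)

## h(t) = lambda_j for a_{j-1} <= t < a_j
pc_haz = function(t, a, lam) lam[findInterval(t, a[-length(a)])]

## H(t) = sum_j lambda_j * (length of [a_{j-1}, a_j) below t)
pc_cumhaz = function(t, a, lam) {
  lower = a[-length(a)]
  upper = a[-1]
  sapply(t, function(s) sum(lam * pmax(0, pmin(s, upper) - lower)))
}

pc_haz(c(0.5, 1.5, 3), a, lam)            # 0.5, 2, 1
pc_cumhaz(c(0.5, 1.5, 3), a, lam)         # 0.25, 1.5, 3.5
exp(-pc_cumhaz(c(0.5, 1.5, 3), a, lam))   # S(t) = exp(-H(t))
```

With a single interval, `a = c(0, Inf)`, the function reduces to the exponential cumulative hazard $\lambda t$.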

Piecewise constant hazard functions, cont'd
Piecewise constant hazards are discontinuous, which may not be appealing in some settings. Common alternatives are:
1. Linear spline: $h(t) = \alpha_0 + \sum_{j=1}^{m-1} \lambda_j (t - a_j)_+$, where $(u)_+ \equiv \max(0, u)$
2. Cubic spline: $h(t) = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + \alpha_3 t^3 + \sum_{j=1}^{m} \lambda_j (t - a_j)_+^3$
3. Another approach is to model $\log h(t)$ with a nonlinear basis expansion
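A sketch of the linear spline option. Because $\int_0^t (s - a_j)_+ \, ds = (t - a_j)_+^2/2$, the cumulative hazard stays in closed form, which the code checks against numerical integration. The knots and coefficients are hypothetical and were chosen so that $h(t) \ge 0$; in general the coefficients must be constrained to keep the hazard nonnegative, which is one motivation for modeling $\log h(t)$ instead (option 3).

```r
## Linear spline hazard h(t) = alpha0 + sum_j lambda_j (t - a_j)_+
lin_spline_haz = function(t, alpha0, knots, lam) {
  basis = outer(knots, t, function(k, s) pmax(0, s - k))   # (t - a_j)_+, one row per knot
  alpha0 + colSums(lam * basis)
}

## Closed-form cumulative hazard: H(t) = alpha0 t + sum_j lambda_j (t - a_j)_+^2 / 2
lin_spline_cumhaz = function(t, alpha0, knots, lam) {
  basis = outer(knots, t, function(k, s) pmax(0, s - k))
  alpha0 * t + colSums(lam * basis^2) / 2
}

alpha0 = 0.5; knots = c(1, 2); lam = c(1, -0.5)   # hypothetical; h(t) >= 0 for t >= 0
lin_spline_cumhaz(3, alpha0, knots, lam)           # 3.25
integrate(lin_spline_haz, 0, 3, alpha0 = alpha0, knots = knots, lam = lam)$value
```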

Regression models
A common question of interest is how covariates affect survival, e.g.,
- Tumor stage and cancer survival
- Treatment received and time to relapse
- Manufacturing conditions and time to failure
Link parametric models with covariates by making parameters functions of covariates, e.g.,
- $S(t \mid x) = \exp\{-\lambda(x) t\}$
- $P(Y \ge y \mid x) = S_0((y - \beta^\top x)/b)$
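A minimal sketch of the first example, assuming a log link $\lambda(x) = \exp\{x^\top\beta\}$ (a common choice that keeps the rate positive, not one fixed by the slide), uncensored data, and hypothetical coefficients. The model is fit by maximizing the log-likelihood $\sum_i [x_i^\top\beta - e^{x_i^\top\beta} t_i]$ with `optim`, and Wald intervals come from the observed information $\sum_i \lambda(x_i) t_i x_i x_i^\top$.

```r
set.seed(6)
n = 5000
x = cbind(1, rnorm(n))                     # intercept plus one covariate
beta_true = c(-1, 0.5)                     # hypothetical coefficients
t_obs = rexp(n, rate = exp(drop(x %*% beta_true)))

## Negative log-likelihood and its gradient under lambda(x) = exp(x' beta)
negloglik = function(bvec) {
  eta = drop(x %*% bvec)
  -sum(eta - exp(eta) * t_obs)
}
grad = function(bvec) {
  eta = drop(x %*% bvec)
  -drop(crossprod(x, 1 - exp(eta) * t_obs))
}

start = c(-log(mean(t_obs)), 0)            # intercept-only MLE as a starting value
fit = optim(start, negloglik, grad, method = "BFGS")

## Wald standard errors from the observed information
w = exp(drop(x %*% fit$par)) * t_obs
se = sqrt(diag(solve(crossprod(x * w, x))))
cbind(estimate = fit$par,
      lower = fit$par - qnorm(0.975) * se,
      upper = fit$par + qnorm(0.975) * se)
```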