Lecture 4 - Survival Models

Similar documents
Survival Analysis. Stat 526. April 13, 2018

MAS3301 / MAS8311 Biostatistics Part II: Survival

Estimation for Modified Data

Lecture 22 Survival Analysis: An Introduction

Survival Distributions, Hazard Functions, Cumulative Hazards

Applied Linear Statistical Methods

MAS3301 / MAS8311 Biostatistics Part II: Survival

Reliability Engineering I

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Continuous case Discrete case General case. Hazard functions. Patrick Breheny. August 27. Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21

Logistic regression model for survival time analysis using time-varying coefficients

Multistate models and recurrent event models

10 Introduction to Reliability

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

Survival Analysis. STAT 526 Professor Olga Vitek

Chapter 4 Fall Notations: t 1 < t 2 < < t D, D unique death times. d j = # deaths at t j = n. Y j = # at risk /alive at t j = n

Multistate models and recurrent event models

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

Multistate Modeling and Applications

Censoring mechanisms

Extensions of Cox Model for Non-Proportional Hazards Purpose

Power and Sample Size Calculations with the Additive Hazards Model

Statistical Inference and Methods

Package threg. August 10, 2015

Joint Modeling of Longitudinal Item Response Data and Survival

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Textbook: Survivial Analysis Techniques for Censored and Truncated Data 2nd edition, by Klein and Moeschberger

Survival Analysis I (CHL5209H)

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016

3003 Cure. F. P. Treasure

Nonparametric Model Construction

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

Censoring and Truncation - Highlighting the Differences

PROBABILITY DISTRIBUTIONS

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters

Lecture 5 Models and methods for recurrent event data

Cox s proportional hazards model and Cox s partial likelihood

Transform Techniques - CF

Duration Analysis. Joan Llull

Constrained estimation for binary and survival data

Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction

TMA 4275 Lifetime Analysis June 2004 Solution

Statistics for Engineers Lecture 4 Reliability and Lifetime Distributions

Transform Techniques - CF

1. Reliability and survival - basic concepts

ST745: Survival Analysis: Parametric

Multistate models in survival and event history analysis

Multi-state Models: An Overview

Dynamic Models Part 1

Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2)

4 Testing Hypotheses. 4.1 Tests in the regression setting. 4.2 Non-parametric testing of survival between groups

IE 303 Discrete-Event Simulation

Lecture 3. Truncation, length-bias and prevalence sampling

Transform Techniques - CF

λ = pn µt = µ(dt)(n) Phylomath Lecture 4

SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000)

Analysis of Time-to-Event Data: Chapter 2 - Nonparametric estimation of functions of survival time

The Weibull in R is actually parameterized a fair bit differently from the book. In R, the density for x > 0 is

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Proportional hazards regression

Definitions and examples Simple estimation and testing Regression models Goodness of fit for the Cox model. Recap of Part 1. Per Kragh Andersen

Estimating transition probabilities for the illness-death model The Aalen-Johansen estimator under violation of the Markov assumption Torunn Heggland

A multi-state model for the prognosis of non-mild acute pancreatitis

The Weibull Distribution

UNIVERSITY OF CALIFORNIA, SAN DIEGO

An Overview of Methods for Applying Semi-Markov Processes in Biostatistics.

Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

Generalized Linear Models for Non-Normal Data

ON THE SUM OF EXPONENTIALLY DISTRIBUTED RANDOM VARIABLES: A CONVOLUTION APPROACH

ST5212: Survival Analysis

Kernel density estimation in R

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Dependence structures with applications to actuarial science

ECE531 Lecture 10b: Maximum Likelihood Estimation

Survival Models. Lecture: Weeks 2-3. Lecture: Weeks 2-3 (Math 3630) Survival Models Fall Valdez 1 / 31

Direct likelihood inference on the cause-specific cumulative incidence function: a flexible parametric regression modelling approach

Continuous Random Variables

Miscellaneous Errors in the Chapter 6 Solutions

Right-truncated data. STAT474/STAT574 February 7, / 44

Lecture 7. Poisson and lifetime processes in risk analysis

Variable Selection in Competing Risks Using the L1-Penalized Cox Model

Failure rate in the continuous sense. Figure. Exponential failure density functions [f(t)] 1

ISyE 2030 Practice Test 1

Lecture 10: Normal RV. Lisa Yan July 18, 2018

3 Continuous Random Variables

HW7 Solutions. f(x) = 0 otherwise. 0 otherwise. The density function looks like this: = 20 if x [10, 90) if x [90, 100]

CHAPTER 2 A Function for Size Distribution of Incomes

Brief Review of Probability

Parameter Estimation

Econometric Analysis of Games 1

Lectures prepared by: Elchanan Mossel Yelena Shvets

Distribution Fitting (Censored Data)

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Economics 101 Lecture 5 - Firms and Production

Checking the Reliability of Reliability Models.

Quiz #2 A Mighty Fine Review

ST495: Survival Analysis: Maximum likelihood

Transcription:

Lecture 4 - Survival Models Survival Models Definition and Hazards Kaplan Meier Proportional Hazards Model Estimation of Survival in R

GLM Extensions: Survival Models Survival Models are a common and incredibly useful extension of the generalized linear model. They are linked on a basic level to Poisson arrivals, which as we learned earlier, yield an exponential distribution of arrival times. Survival models are used across many fields Medicine and biostatistics: Many drugs are used to prolong life in the face of serious illness. Firm survival and death. How long do businesses live? Eg: conditional on entering a market (or new market) today, what is the probability of bankruptcy in 12 months? One can imagine survival being used to model time spent on webpages, shopping, Facebook, etc... In this part of the course, we ll learn the basics of survival models using the GLM methodology, and then discuss extensions.

GLM Extensions: Survival Models Let y be survival time, and f(y) be the pdf of survival times. Probability of surviving less than y is: F (y) = Pr (Y < y) = y 0 f(t)dt By the property of complements, the probability of surviving longer then y is the survivor function S (y) = 1 F (y) The hazard function, h(y) is the probability of death within a small period between y and δy, given they have survived until t. F(y + δy) F(y) h(y) = lim δ 0 δy = f(y) S(y) 1 S(y) This is essentially a conditional probability. Conditional on surviving up to y or later, S(y), what is the instantaneous probability of death?

GLM Extensions: Survival Models For a few more definitions, it is straightforward to show that the hazard function is linked to the survivor function: h(y) = d ds(y) dy log (S(y)) = dy S(y) = f(y) S(y) Finally, the cumulative hazard function, H(y) is written as H(y) = log (S(y)) Example: Exponential Distribution f(y) = θ exp ( θy) F(y) = y 0 θ exp ( θt) dt = ( exp ( θt) y 0 Exponential Survivor function and Hazard: S(y) = exp ( θy), h(y) = θ = 1 exp ( θy) Note that the hazard does not depend on age. Thus, the exponential distribution is "memoryless". When is this a good or bad property?

GLM Extensions: Survival Models The memoryless property makes the exponential distribution unsuitable for a number of applications. The Weibull distribution nests the exponential distribution. f(y) = λφy λ 1 exp φy λ Under what condition is this identical to the exponential distribution? The survival function of Weibull: S(y) = t = exp φy λ Hence, the hazard is written as: λφt λ 1 exp φt λ dt h(y) = λφy λ 1 The link between y and the hazard may be either positive of negative. What are some economic examples of each?

Simple Estimation: Survival Models One way to estimate survival models is to construct a Kaplan-Meier estimate of the survivor function For this, individuals are ordered by time of death from 1 to n y (1) y (2) y (k), where n j is the number of individuals alive just before y (j) and, d j the number of deaths that occur at time y (j) First, consider the probability of survival just before y (1). S y [0, y (1) ) = 1 Next, probability of survival just before y (2). S y [y (1), y (2) ) = 1 n 1 d 1 n 1 Next, probability of survival just before y (3). S y [y (2), y (3) ) = 1 n 1 d 1 n 1 n 2 d 2 n 2

Simple Estimation: Survival Models In general, the Kaplan-Meier estimate of the survivor function at time y (s) is the following: S s nj d j y (s) = This can be compared to the survivor function for Exponential and Weibull distributions Exponential : S (y) = exp ( θ y) Weibull : S (y) = exp φy λ How do we choose between the two distributions? Take logs of the survivor functions: Exponential : log (S (y)) = θ y Weibull : log (S (y)) = φy λ Log of KM estimate should be approximately linear for exponential, non-linear for Weibull n j

Example: Kaplan-Meier To study survival models, we will use an influential study, the "Gehan-Freirich" Survival Data Data available on course website in stata format The data show the length of remission in weeks for two groups of leukemia patients, treated and control weeks: Weeks in remission (effectively survival) relapse: 1 if a relapse observed, 0 otherwise (this is censoring) group: 1 if respondent was in treatment group, 0 if in control The library "survival" contains many function that were useful for survival models. To construct Kaplan-Meier Estimates: fit <- survfit(surv(weeks, relapse)~group, data = g) plot(fit, lty = 2:3) legend(23, 1, c("control", "Treatment"), lty = 2:3)

Estimation: Survival Models The importance of the survival function and hazard function become apparent when estimating rigorously by maximum likelihood. For survival analysis, the data are recorded by subject j y j is the survival time of individual j δ j = 1 is a variable identifying uncensored observations, δ j = 0 if censored. x j a vector of explanatory variables for j. Order j such that j = 1..r are uncensored, and j = r + 1..n are censored Censored individuals are still "surviving" at the end of the data collection. We do not observe when censored individuals actually die. For uncensored data, the likelihood function is written as: n L = f y j

Estimation: Survival Models With censored data, the likelihood function is written as: r L = f n y j S y j j=r+1 f y j is the pdf at yj, which is appropriate for uncensored data. S y j is the probability that we observe yj or greater, which is the appropriate likelihood to consider for censored observations. We know that a censored individual j survives y j or longer, so the likelihood of this event is S y j Rearranging the likelihood function, we get: n L = f δj y j S 1 δj y j We can now place this in log-likelihood form, and impose the distributional assumptions.

Estimation: Survival Models In log-likelihood form: n l = δ j log f y j + 1 δj log S yj Intuition: = = = n δ j log f y j + log S yj δj log S y j n δ j log f yj log S yj + log S yj n δ j log h y j + log S yj All individuals survive until y j. This is accounted for in log S y j For individuals with δ j = 1, they die at y j. So, we account for this within the likelihood function using the hazard function, log h y j

Estimation: Exponential Survival The exponential distribution has convenient forms for h y j and S(yj ). h y j = θ, S(yj ) = exp θy j Thus, log-likelihood is: l = n δ j log θ j θj y j This looks a lot like a Poisson likelihood function, with δ j as the dependent variable. To get it even closer, write: n l = δ j log θ j y j θj y j δ j log y j Defining µ j = θ j y j, we have n l = δ j log µ j µj δ j log y j We choose µ j to maximize the log-likelihood.

Estimation: Exponential Survival Often, we assume a proportional hazards model, where the hazard function is related to observables, θ j = exp (xβ) While exponential is memoryless, the probability of dying at y is a function of observables (treatment vs control, for example). Thus, substituting into µ j = θ j y j, we have Taking logs: µ j = exp (xβ) y j log µ j = xβ + log yj Exponential with proportional hazards can be estimated by glm in R, Poisson as family log link (µ to xβ) Offset (of the log mean) by log(y j )

Estimation: Proportional Hazards Model in R Estimated the simple exponential survival model using R form<-as.formula(relapse~group+offset(log(weeks)) haz_glm<-glm(form,family=poisson("log"),data=g) summary(haz_glm) To interpret, note that the hazard is estimated as: θ treat = exp (β 0 + β 1 Treat) Note that θ control = exp (β 0 ). Hence: = exp (β 0 ) exp (β 1 Treat) θ treat = θ control exp (β 1 Treat) θ treat θ control = exp (β 1 ) θ treat θ control θ control = exp (β 1 ) 1 = exp( 1.53) 1 = 0.783 78% reduction in the hazard of relapse relative to control.