ST495: Survival Analysis: Maximum likelihood

ST495: Survival Analysis: Maximum likelihood Eric B. Laber Department of Statistics, North Carolina State University February 11, 2014

Everything is deception: seeking the minimum of illusion, keeping within the ordinary limitations, seeking the maximum. In the first case one cheats the Good, by trying to make it too easy for oneself to get it, and the Evil by imposing all too unfavorable conditions of warfare on it. In the second case one cheats the Good by keeping as aloof from it as possible, and the Evil by hoping to make it powerless through intensifying it to the utmost. Franz Kafka

Last time Introduced parametric models commonly used in survival analysis (exponential, Weibull, gamma, extreme value, log-normal, log-logistic); discussed their densities, hazards, survivor functions, and CDFs; and showed how to draw from these distributions using R. We also discussed location-scale models and how to use the location-scale framework to incorporate covariate information, as well as flexible models for the hazard function, including piecewise-constant hazards and basis expansions.

Last time: burning questions How to choose a distribution? How to estimate the parameters indexing a chosen distribution? How can we accommodate different types of censoring? How can I use R to do the foregoing estimation steps?

Warm-up Explain to your stat buddy 1. Hazard function 2. How a bathtub hazard might arise 3. How an increasing hazard might arise 4. How a decreasing hazard might arise True or false: (T/F) The minimum of independent Weibull r.v.'s is Weibull (T/F) Measles increases fecundity in goats (T/F) An exponential distribution would be a good model for mortality in humans Who is generally credited with discovering maximum likelihood estimation?

Warm-up cont'd Some concepts and notation for today Let $a_1, \ldots, a_n$ be a sequence of constants; then $\prod_{i=1}^n a_i = a_1 a_2 \cdots a_n$. Let $f(x)$ denote a function from $\mathbb{R}^p$ into $\mathbb{R}$; then $x^* = \arg\max_x f(x)$ satisfies $f(x^*) \ge f(x)$ for all $x \in \mathbb{R}^p$. We use $1\{\text{Statement}\}$ to denote the function that equals one if Statement is true and zero otherwise. Thus, $1\{t \ge 1\}$ equals one if $t \ge 1$ and zero otherwise.

Observation schemes Why is survival analysis its own sub-field of statistics? Abundance of important applications Fundamental contributions to statistical theory (esp. in semi-parametrics) Dealing with partial information due to censoring Recall that when we only observe partial information about a failure time we say that it's censored: $T > C$ (right censored), $T < L$ (left censored), $V \le T < U$ (interval censored)

Likelihood If the generative model is indexed by a parameter $\theta$ then the likelihood is $L(\theta) \propto P(\text{Data}; \theta)$, which is viewed as a function of $\theta$ with the data held fixed. The maximum likelihood estimator is $\widehat{\theta}_n = \arg\max_\theta L(\theta)$. Warm-up: Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ and $\theta = (\mu, \sigma^2)$; derive $L(\theta)$ and $\widehat{\theta}_n$. Check your answer with your stat buddy.
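
For the warm-up, a minimal R sketch (not from the slides) that maximizes the normal log-likelihood numerically with optim and compares the result to the closed-form MLEs (the sample mean and the 1/n variance); the simulated data and the log-scale parameterization of sigma are my own choices.

set.seed(1)
x = rnorm(100, mean = 2, sd = 3)
negLogLik = function(theta) {
  mu = theta[1]; sigma = exp(theta[2])     # work on the log scale so sigma > 0
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
fit = optim(c(0, 0), negLogLik)
c(muHat = fit$par[1], sigma2Hat = exp(fit$par[2])^2)   # numerical MLEs
c(mean(x), mean((x - mean(x))^2))                      # closed-form MLEs for comparison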

Likelihood cont d Ex. Let T 1,..., T n be an iid draw from distn with density f (t; θ) indexed by θ then L(θ) P(Data; θ) = n P(T i = t i ; θ) = n f (t i ; θ), Let T denote a generic observation distd according to f (t; θ). How can we use L(θ) to estimate: The mean of T? The CDF of T? The hazard of T?
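
One answer is the plug-in principle: compute $\widehat{\theta}_n$ and substitute it into whatever functional of the distribution you need. Below is a minimal R sketch of that idea (not from the slides), assuming an uncensored Weibull sample and using MASS::fitdistr for the MLE; the simulated data and the quantities evaluated are my own choices.

library(MASS)
set.seed(1)
t = rweibull(200, shape = 1.5, scale = 2)
fit = fitdistr(t, "weibull")                        # MLE of (shape, scale)
shapeHat = fit$estimate["shape"]; scaleHat = fit$estimate["scale"]
meanHat = scaleHat * gamma(1 + 1/shapeHat)          # plug-in estimate of E(T)
FHat = function(t0) pweibull(t0, shapeHat, scaleHat)                     # plug-in CDF
hHat = function(t0) dweibull(t0, shapeHat, scaleHat) /
  pweibull(t0, shapeHat, scaleHat, lower.tail = FALSE)                   # plug-in hazard
c(meanHat, FHat(2), hHat(2))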

Nonparametric maximum likelihood Ex. Let $T_1, \ldots, T_n$ be an iid draw from a distribution with density $f(t)$. Suppose now, however, that we don't put any restrictions on $f(t)$ (other than it being a density). In this case $f$ is our parameter: $L(f) \propto P(\text{Data}) = \prod_{i=1}^n f(t_i)$; how can we maximize this over densities $f$? Claim: our estimate of $f$, say $\widehat{f}$, should only put positive mass on $t_1, \ldots, t_n$. (Why?) If $f$ puts mass $\alpha_i$ on $t_i$, $i = 1, \ldots, n$, then maximizing the likelihood is equivalent to solving $\max_{\alpha_1, \ldots, \alpha_n \ge 0} \prod_{i=1}^n \alpha_i$ subject to $\sum_{i=1}^n \alpha_i = 1$. Some painful calculus shows $\widehat{f}$ is the pmf with $\widehat{f}(t_i) = 1/n$, $i = 1, \ldots, n$.

Nonparametric maximum likelihood cont'd Thus $\widehat{f}(t)$ is a discrete distribution with $\widehat{f}(t_i) = 1/n$. The estimated CDF is given by $\widehat{F}(t) = \frac{1}{n} \sum_{i=1}^n 1\{t_i \le t\}$; this is called the empirical cumulative distribution function (ECDF).

Computing ECDF in R n = 50; x = rnorm(n); FHat = ecdf(x); plot(FHat, xlab = "x") [Plot: the step-function ECDF Fn(x) of the 50 simulated values, rising from 0 to 1 over roughly (-3, 3).]

Observation schemes

Truncated estimation Suppose $t_1, \ldots, t_n$ compose a random sample from subjects with lifetimes less than or equal to one year. The LH takes the form $\prod_{i=1}^n f(t_i \mid T_i \le 1) = \prod_{i=1}^n \frac{f(t_i)}{F(1)}$. Why? Suppose we posited a parametric model $f(t; \theta)$ for $T$; how can we estimate $\theta$ using this truncated data? Deceptively difficult stat question: Suppose $T_1, \ldots, T_n$ are drawn independently from an $\exp(\theta)$ distribution but are truncated at one. When does the maximum likelihood estimator exist?
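
A minimal R sketch of the parametric version (not from the slides), assuming an exponential model truncated at one year; the conditional density is $f(t; \theta)/F(1; \theta)$ for $t \le 1$, and the simulated data and true rate are my own choices.

set.seed(1)
raw = rexp(5000, rate = 2)
t = raw[raw <= 1]                     # only subjects with lifetimes <= 1 year are sampled
negLogLik = function(theta)
  -sum(dexp(t, rate = theta, log = TRUE) - pexp(1, rate = theta, log.p = TRUE))
optimize(negLogLik, interval = c(1e-6, 50))$minimum   # MLE; should be near 2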

Right-censoring Recall that a failure time is right censored at $C$ if we only observe that $T > C$. Our goal in the next few slides is to derive the LH under different right-censoring mechanisms. Notation: observe $\{(T_i, \delta_i)\}_{i=1}^n$ where $T_i$ is the observation time and $\delta_i$ is the censoring indicator: $\delta_i = 1$ if the failure time is observed and $\delta_i = 0$ if right censored. Big result of the day: under a variety of right-censoring mechanisms, $\text{LH} \propto \prod_{i=1}^n f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i}$.

Type I censoring In Type I censoring each individual has a fixed (non-random) censoring time $C > 0$: if $T \le C$ then the failure time is observed; if $T > C$ then it is right-censored. Ex. Odense Malignant Melanoma Data: n = 205 subjects enrolled between 1962 and 1972 at the Odense Dept. of Plastic Surgery had tumors and surrounding tissue removed. Patients were followed until their death or until the study concluded in 1977. Note: these data are contained in the boot package in R.
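
A short R sketch (not from the slides) for pulling these data into the $(t_i, \delta_i)$ form; the variable coding below reflects my recollection of the melanoma data frame (time in days, status == 1 meaning death from melanoma), so check ?boot::melanoma before relying on it.

library(boot)
data(melanoma)
t = melanoma$time                            # follow-up time in days (per my recollection)
delta = as.numeric(melanoma$status == 1)     # 1 if death from melanoma observed, else censored
table(delta)                                 # counts of failures vs. censored observations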

Type I censoring cont'd Using the book's notation: define $t_i = \min(T_i, C_i)$ and $\delta_i = 1\{T_i \le C_i\}$; then: If $\delta_i = 1$, $t_i$ is the failure time, so $(t_i, \delta_i)$ contributes $f(t_i)$ to the LH. If $\delta_i = 0$, $t_i$ is the censoring time, so $(t_i, \delta_i)$ contributes $S(t_i+)$ to the LH. Thus, the LH is $\prod_{i=1}^n f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i}$.

Type I censoring cont'd In class: Suppose $T_1, \ldots, T_n$ are iid $\exp(\theta)$ but subject to Type I censoring; let $\delta_1, \ldots, \delta_n$ denote the censoring indicators. Derive the MLE for $\theta$.
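
Once you have a candidate answer, here is a small simulation check in R (my own sketch, not part of the slides): it generates Type I censored exponential data, computes the closed-form estimate $\sum_i \delta_i / \sum_i t_i$ that the derivation leads to (the same form appears in the worked example at the end of these slides), and compares it to a numerical maximizer; the parameter values are arbitrary.

set.seed(1)
n = 200; theta = 0.5; C = 2                      # fixed Type I censoring time
Tfail = rexp(n, rate = theta)
t = pmin(Tfail, C); delta = as.numeric(Tfail <= C)
thetaClosed = sum(delta) / sum(t)                # candidate closed-form MLE
negLogLik = function(th) -sum(delta * dexp(t, th, log = TRUE) +
  (1 - delta) * pexp(t, th, lower.tail = FALSE, log.p = TRUE))
thetaNumeric = optimize(negLogLik, c(1e-6, 20))$minimum
c(thetaClosed, thetaNumeric)                     # both should be close to theta = 0.5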

Independent random censoring Assume the lifetime $T$ and censoring time $C$ are independent random variables. Often more realistic Ex. Random study enrollment times Ex. Subjects moving out of town... Let $G(t)$ and $g(t)$ denote the survivor and density function for $C$ resp.; define $t_i = \min(T_i, C_i)$ and $\delta_i = 1\{T_i \le C_i\}$; then $f(t_i, \delta_i) = [f(t_i) G(t_i+)]^{\delta_i} [S(t_i+) g(t_i)]^{1-\delta_i}$. (Why?)

Independent random censoring cont'd The LH for $n$ iid observations is $\left( \prod_{i=1}^n f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i} \right) \left( \prod_{i=1}^n g(t_i)^{1-\delta_i} G(t_i+)^{\delta_i} \right)$; note that if $g(t)$ and $G(t)$ do not contain information about $f(t)$ or $S(t)$ then the LH is proportional to $\prod_{i=1}^n f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i}$. It's the same LH as before!!!
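
A quick numerical illustration in R (my own sketch, not from the slides): with independent exponential censoring, the censoring factor does not involve the failure-time rate, so maximizing the full LH and the reduced (working) LH return the same estimate; the rates below are arbitrary.

set.seed(1)
n = 200; lambda = 1; rateC = 0.5
Tfail = rexp(n, lambda); C = rexp(n, rateC)
t = pmin(Tfail, C); delta = as.numeric(Tfail <= C)
negWork = function(l) -sum(delta * dexp(t, l, log = TRUE) +
  (1 - delta) * pexp(t, l, lower.tail = FALSE, log.p = TRUE))
negFull = function(l) negWork(l) -               # add the censoring factor (constant in l)
  sum((1 - delta) * dexp(t, rateC, log = TRUE) +
      delta * pexp(t, rateC, lower.tail = FALSE, log.p = TRUE))
c(optimize(negWork, c(1e-6, 20))$minimum, optimize(negFull, c(1e-6, 20))$minimum)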

Type II censoring Observe individuals until the rth failure is observed, so that we observe the r smallest lifetimes $t_{(1)} \le \cdots \le t_{(r)}$. All n units start at the same time Follow-up stops at the time of the rth failure Follow-up time is random Using properties of order statistics, the LH is $\frac{n!}{(n-r)!} \left( \prod_{i=1}^r f(t_{(i)}) \right) S(t_{(r)}+)^{n-r} \propto \prod_{i=1}^n f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i}$
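
A small R sketch of Type II censoring for exponential lifetimes (my own simulation, not from the slides): each of the $n - r$ unfailed units carries observation time $t_{(r)}$, so the big-result formula reduces to r divided by the total time on test.

set.seed(1)
n = 100; r = 60; theta = 0.5
Tfail = rexp(n, rate = theta)
tOrd = sort(Tfail)[1:r]                      # the r smallest lifetimes
t = c(tOrd, rep(tOrd[r], n - r))             # the n - r unfailed units are censored at t_(r)
delta = c(rep(1, r), rep(0, n - r))
sum(delta) / sum(t)                          # MLE r / (total time on test); near theta = 0.5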

Code break Go over mle.r in R Studio

For those who are adventurous

Counting process notation Goal: Show the form of the LH for right-censoring applies in very general settings. For clarity we assume discrete time $t = 0, 1, \ldots$. Let $h_i(t)$ and $S_i(t)$ denote the hazard and survivor function for the ith subject, resp.; further define $Y_i(t) \equiv 1\{T_i \ge t,\ \text{ith subj. not yet censored}\}$, i.e., $Y_i(t) = 1$ if the ith subj. hasn't failed or been censored before $t$ and $0$ otherwise. If $Y_i(t) = 1$ then we say the ith subj. is at risk at time $t$.

Counting process notation cont'd Define $dN_i(t) \equiv Y_i(t) 1\{T_i = t\}$, which equals 1 if the ith subj. is at risk and fails at $t$ and 0 otherwise, and $dC_i(t) \equiv Y_i(t) 1\{\text{ith subj. censored at } t\}$, which equals 1 if the ith subj. is at risk and censored at $t$ and 0 otherwise. Claim: $\{dN_i(t), dC_i(t), t \ge 0\}$ has a single 1 and the rest zeros.

Counting process notation cont'd Even more definitions: $dN(t) \equiv (dN_1(t), \ldots, dN_n(t))$, $dC(t) \equiv (dC_1(t), \ldots, dC_n(t))$, $H(t) \equiv \{(dN(s), dC(s)),\ s = 0, 1, \ldots, t-1\}$; we say $H(t)$ is the history of the survival process up to time $t$.

Counting process notation: the likelihood Note that $\lim_{t \to \infty} H(t)$ contains all the information in the collected data (why?), thus $P(\text{Data}) = P(dN(0))\, P(dC(0) \mid dN(0))\, P(dN(1) \mid H(1))\, P(dC(1) \mid dN(1), H(1)) \cdots = \prod_{t=0}^{\infty} P(dN(t) \mid H(t))\, P(dC(t) \mid dN(t), H(t))$

Counting process notation: the likelihood cont'd To make the horrible expression tractable we'll assume conditional independence across subjects given $H(t)$ and $P(dN_i(t) = 1 \mid H(t)) = Y_i(t) h_i(t)$; explain this expression to your stat buddy. We will also assume that the terms inside $P(dC(t) \mid dN(t), H(t))$ are not informative for the parameters in $h_i(t)$. When the above assumptions hold we say the censoring is non-informative.

Counting process notation: the likelihood cont'd Under the foregoing assumptions the LH is given by $\prod_{i=1}^n \prod_{t=0}^{\infty} h_i(t)^{dN_i(t)} (1 - h_i(t))^{Y_i(t)(1 - dN_i(t))}$. To see this, we'll consider two cases. Case 1: the ith subject's failure time is observed at $t_i$; then they're at risk at $t = 0, 1, \ldots, t_i$, and $dN_i(t) = 1\{t = t_i\}$, thus $\prod_{t=0}^{\infty} h_i(t)^{dN_i(t)} (1 - h_i(t))^{Y_i(t)(1 - dN_i(t))} = h_i(t_i) \prod_{s=0}^{t_i - 1} (1 - h_i(s)) = f_i(t_i)$. Case 2: the ith subject is censored at time $t_i$; then they're at risk at times $t = 0, 1, \ldots, t_i$, and $dC_i(t) = 1\{t = t_i\}$, thus $\prod_{t=0}^{\infty} h_i(t)^{dN_i(t)} (1 - h_i(t))^{Y_i(t)(1 - dN_i(t))} = \prod_{t=0}^{t_i} (1 - h_i(t)) = S_i(t_i + 1)$.
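
A tiny numerical check of Case 1 (my own sketch, not from the slides): in discrete time, $h(t) \prod_{s=0}^{t-1} (1 - h(s))$ recovers the pmf; here I use a constant hazard, which corresponds to the geometric distribution on $0, 1, 2, \ldots$.

p = 0.3; tvec = 0:10
h = rep(p, length(tvec))                              # constant discrete-time hazard
f_from_h = h * cumprod(c(1, 1 - h))[1:length(tvec)]   # h(t) * prod_{s < t} (1 - h(s))
f_direct = dgeom(tvec, prob = p)                      # geometric pmf f(t) = p (1 - p)^t
all.equal(f_from_h, f_direct)                         # TRUE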

Counting process notation: the likelihood cont'd Putting it all together shows the LH is proportional to $\prod_{i=1}^n f_i(t_i)^{\delta_i} S_i(t_i + 1)^{1-\delta_i}$. Limiting arguments show the LH is the same (with $S_i(t_i + 1)$ replaced by $S_i(t_i+)$) in the continuous case. Note: we'll see later, using the framework of partial likelihood, that maximizing the above LH is appropriate in even more general settings.

LH-based inference For parametric models the LH provides an efficient framework for estimation and inference. Let $\theta \in \Theta \subseteq \mathbb{R}^p$ index the survival distribution of interest, and define $L(\theta)$, the LH; $l(\theta) = \log L(\theta)$, the log-LH; $u(\theta) = \frac{d}{d\theta} l(\theta)$, the score function; and $I(\theta) = -\frac{d^2}{d\theta\, d\theta^\top} l(\theta)$, the Fisher information.

LH-based inference cont'd Recall the maximum LH estimator $\widehat{\theta}_n = \arg\max_{\theta \in \Theta} L(\theta)$ solves $u(\theta) = 0$. Under mild regularity conditions, $\sqrt{n}(\widehat{\theta}_n - \theta^*) \rightsquigarrow N(0, I^{-1}(\theta^*))$.

LH-based inference example Assume $T_1, \ldots, T_n$ are iid $\exp(\lambda)$ and subject to noninformative censoring. Let $\delta_1, \ldots, \delta_n$ denote the censoring indicators. Find the MLE for $\lambda$, the Fisher information, and a 95% confidence interval for $\lambda$. The LH, $L(\lambda)$, is $\prod_{i=1}^n f(t_i; \lambda)^{\delta_i} S(t_i; \lambda)^{1-\delta_i} = \prod_{i=1}^n \lambda^{\delta_i} \exp\{-\lambda t_i \delta_i\} \exp\{-\lambda t_i (1 - \delta_i)\} = \lambda^{\sum_{i=1}^n \delta_i} \exp\{-\lambda \sum_{i=1}^n t_i\}$. The log-LH, $l(\lambda)$, is $\left( \sum_{i=1}^n \delta_i \right) \log \lambda - \lambda \sum_{i=1}^n t_i$.

LH-based inference example cont'd The score function $u(\lambda)$ is given by $u(\lambda) = \frac{\sum_{i=1}^n \delta_i}{\lambda} - \sum_{i=1}^n t_i$; setting this to zero and solving yields $\widehat{\lambda}_n = \frac{\sum_{i=1}^n \delta_i}{\sum_{i=1}^n t_i}$.

LH-based inference example cont'd We take the negative derivative of $u(\lambda)$ to get $I(\lambda) = \sum_{i=1}^n \delta_i / \lambda^2$. How do we get a 95% confidence interval for $\lambda$?
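
To close the loop, a minimal R sketch (my own, not from the slides) that simulates noninformatively censored exponential data and builds the Wald interval $\widehat{\lambda}_n \pm 1.96\, \widehat{\lambda}_n / \sqrt{\sum_i \delta_i}$, using the fact that $I(\widehat{\lambda}_n)^{-1} = \widehat{\lambda}_n^2 / \sum_i \delta_i$; the rates used are arbitrary.

set.seed(1)
n = 200; lambda = 0.5
Tfail = rexp(n, lambda); C = rexp(n, 0.3)        # independent random censoring
t = pmin(Tfail, C); delta = as.numeric(Tfail <= C)
lambdaHat = sum(delta) / sum(t)                  # MLE
seHat = lambdaHat / sqrt(sum(delta))             # sqrt of 1 / I(lambdaHat)
lambdaHat + c(-1, 1) * qnorm(0.975) * seHat      # approximate 95% Wald interval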