Analysis of Time-to-Event Data: Chapter 2 - Nonparametric estimation of functions of survival time

Steffen Unkel, Department of Medical Statistics, University Medical Center Göttingen, Germany. Winter term 2018/19.

The case of complete observations
Suppose that we have a single sample of survival times in which none of the observations is censored. The survivor function $S(t)$ can then be estimated by the empirical survivor function
$$\hat S(t) = \frac{\text{number of individuals with survival times} > t}{\text{number of individuals in the data set}}.$$
Let $t_1, t_2, \dots, t_n$ be the exact survival times of the $n$ individuals under study. We relabel them in ascending order such that $t_{(1)} \le t_{(2)} \le \dots \le t_{(n)}$.

The case of complete observations (2)
The survivor function at $t_{(i)}$ can be estimated as
$$\hat S(t_{(i)}) = \frac{n-i}{n} = 1 - \frac{i}{n},$$
where $n-i$ is the number of individuals surviving longer than $t_{(i)}$. If two or more $t_{(i)}$ are equal (tied observations), the largest value of $i$ is used, which gives a conservative estimate for the tied observations. Since every individual is alive at the beginning of the study and no one survives longer than $t_{(n)}$, $\hat S(t_{(0)}) = 1$ and $\hat S(t_{(n)}) = 0$.
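A minimal Python sketch of the empirical survivor function for uncensored data, assuming the survival times are held in a plain list. The function name `empirical_survivor` and the five survival times below are hypothetical illustrations, not the lung cancer data of the example. The tie convention above (use the largest $i$) falls out automatically, because all tied times are counted as $\le t$.

```python
from bisect import bisect_right


def empirical_survivor(times, t):
    """Empirical survivor function for uncensored data:
    S_hat(t) = (number of survival times > t) / n.
    """
    ordered = sorted(times)
    # bisect_right returns the number of observations <= t (ties included),
    # so the remaining observations are those surviving beyond t.
    n_le = bisect_right(ordered, t)
    return (len(ordered) - n_le) / len(ordered)


# Hypothetical survival times in months (two tied at 3).
times = [1, 3, 3, 5, 8]
for t_i in sorted(set(times)):
    print(t_i, empirical_survivor(times, t_i))
```

At the tied time 3 the estimate is $1 - 3/5 = 0.4$, i.e. the largest rank among the ties is used.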

Example: Computation of $\hat S(t)$ for 10 lung cancer patients.

Plot of the empirical survivor function
[Figure: Step function $\hat S(t)$ of lung cancer patients; x-axis: months.]

Estimating S(t) in the presence of censored observations
When censored observations are present, a different method of estimating $S(t)$ is required. Before attempting to fit a theoretical distribution to a set of survival data, we discuss nonparametric methods for estimating $S(t)$. These methods are also called distribution-free, since they do not require specific assumptions about the underlying distribution of the survival times. If the main objective is to find a model for the data, estimates obtained by nonparametric and graphical methods can help in choosing a distribution.

Kaplan-Meier estimator
Let $n$ be the total number of individuals whose survival times, censored or not, are available. Relabel the $n$ survival times in order of increasing magnitude such that $t_{(1)} \le t_{(2)} \le \dots \le t_{(n)}$. The Kaplan-Meier (product-limit) estimate of the survivor function is
$$\hat S(t) = \prod_{r:\, t_{(r)} \le t} \frac{n-r}{n-r+1},$$
where $r$ runs through those positive integers for which $t_{(r)} \le t$ and $t_{(r)}$ is uncensored.

Example
Table: Kaplan-Meier estimates $\hat S(t)$ of remission durations (months) of 10 patients with solid tumors; + denotes a censored remission time.

Remission time | Rank | $r$ | $(n-r)/(n-r+1)$ | $\hat S(t)$
3.0            | 1    | 1   | 9/10            | 0.900
4.0+           | 2    |     |                 |
5.7+           | 3    |     |                 |
6.5            | 4    | 4   | 6/7             | 9/10 × 6/7 = 0.771
6.5            | 5    | 5   | 5/6             | 9/10 × 6/7 × 5/6 = 0.643
8.4+           | 6    |     |                 |
10.0           | 7    | 7   | 3/4             | 9/10 × 6/7 × 5/6 × 3/4 = 0.482
10.0+          | 8    |     |                 |
12.0           | 9    | 9   | 1/2             | 9/10 × 6/7 × 5/6 × 3/4 × 1/2 = 0.241
15.0           | 10   | 10  | 0               | 0
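The rank form of the product-limit estimate can be sketched in a few lines of Python, assuming the data are given as parallel lists of times and censoring flags. The function name `km_rank_form` is hypothetical. At a tie between a death and a censored time (10.0 and 10.0+ in the remission table) the death is ranked first, as in the table.

```python
def km_rank_form(times, censored):
    """Kaplan-Meier estimate via S_hat(t) = prod (n - r) / (n - r + 1),
    taken over the uncensored ranks r with t_(r) <= t.

    Returns (time, S_hat) after each uncensored rank.
    """
    n = len(times)
    # Tuples sort by time first, then False (death) before True (censored).
    ordered = sorted(zip(times, censored))
    s_hat = 1.0
    steps = []
    for r, (t, is_censored) in enumerate(ordered, start=1):
        if not is_censored:
            s_hat *= (n - r) / (n - r + 1)
            steps.append((t, s_hat))
    return steps


# Remission durations from the table; True marks a censored time (+).
times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
censored = [False, True, True, False, False, True, False, True, False, False]
steps = km_rank_form(times, censored)
for t, s in steps:
    print(f"{t:5.1f}  {s:.3f}")
```

Rounded to three decimals, the printed values reproduce the $\hat S(t)$ column of the table.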

Plot of the Kaplan-Meier survival curve
[Figure: Function $\hat S(t)$ for remission data; x-axis: months.]

Alternative formulation of the Kaplan-Meier (product-limit) estimate
Consider $m$ distinct ordered failure times $t_{(1)} < t_{(2)} < \dots < t_{(m)}$. The Kaplan-Meier estimate of $S(t)$ can be written as
$$\hat S(t) = \begin{cases} 1, & t < t_{(1)}, \\ \prod_{t_{(k)} \le t} \left(1 - \dfrac{d_k}{n_k}\right), & t \ge t_{(1)}, \end{cases}$$
where $n_k$ is the number of individuals at risk just prior to $t_{(k)}$ and $d_k$ is the number of failures at $t_{(k)}$ ($k = 1, \dots, m$). The discrete hazard for the interval $[t_{(k-1)}, t_{(k)})$ is $\lambda_k = P(T \in [t_{(k-1)}, t_{(k)}) \mid T \ge t_{(k-1)})$, and the probability of surviving the $k$th interval, given that it has been reached, is $p_k = 1 - \lambda_k = P(T \ge t_{(k)} \mid T \ge t_{(k-1)})$.
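A sketch of the distinct-time form, assuming `events[i]` is 1 for an observed failure and 0 for a censored time (the function name `km_distinct` is hypothetical). Tied failures enter through a single factor: at $t = 6.5$ the factor is $1 - 2/7 = 5/7$, the same as the two rank-form factors $6/7 \times 5/6$, so both formulations give identical estimates.

```python
def km_distinct(times, events):
    """Kaplan-Meier estimate over distinct failure times:
    S_hat(t) = prod_{t_(k) <= t} (1 - d_k / n_k).

    events[i] is 1 for an observed failure, 0 for a censored time.
    Returns a list of (t_k, d_k, n_k, S_hat(t_k)).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s_hat = 1.0
    table = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        # Count failures and censorings tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            s_hat *= 1 - d / n_at_risk
            table.append((t, d, n_at_risk, s_hat))
        # Everyone who failed or was censored at t leaves the risk set afterwards.
        n_at_risk -= d + c
    return table


# Remission data: 1 = remission ended (event), 0 = censored (+).
times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
events = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
table = km_distinct(times, events)
for t, d, n_k, s in table:
    print(f"{t:5.1f}  d={d}  n={n_k:2d}  S={s:.3f}")
```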

Greenwood's formula
The variance of the Kaplan-Meier estimator is estimated by Greenwood's formula:
$$\widehat{\mathrm{Var}}[\hat S(t)] = [\hat S(t)]^2 \sum_{t_{(k)} \le t} \frac{d_k}{n_k(n_k - d_k)}.$$
The standard error of the Kaplan-Meier estimator is $\mathrm{se}\{\hat S(t)\} = \{\widehat{\mathrm{Var}}[\hat S(t)]\}^{1/2}$.
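A minimal Python sketch of Greenwood's formula on the remission data. The helper names `risk_table` and `km_greenwood` are hypothetical; $n_k$ is taken as the number of observed times $\ge t_{(k)}$, i.e. those at risk just prior to $t_{(k)}$. When $n_k = d_k$ (the last time here, where $\hat S$ drops to 0) the sum is undefined, and the sketch reports `nan`.

```python
import math
from collections import Counter


def risk_table(times, events):
    """Return (t_k, d_k, n_k) at each distinct failure time t_k."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    # n_k counts everyone with an observed time >= t_k.
    return [(t, deaths[t], sum(u >= t for u in times)) for t in sorted(deaths)]


def km_greenwood(times, events):
    """Kaplan-Meier estimate with Greenwood variance and standard error."""
    s_hat, gw_sum, out = 1.0, 0.0, []
    for t, d, n in risk_table(times, events):
        s_hat *= 1 - d / n
        if n > d:
            gw_sum += d / (n * (n - d))
            var = s_hat ** 2 * gw_sum
        else:
            var = math.nan  # Greenwood's sum is undefined once n_k = d_k
        out.append((t, s_hat, var, math.sqrt(var)))
    return out


times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
events = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
rows = km_greenwood(times, events)
for t, s, var, se in rows:
    print(f"{t:5.1f}  S={s:.3f}  se={se:.4f}")
```

For example, at $t = 3.0$ the variance is $0.9^2 \times 1/(10 \cdot 9) = 0.009$.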

Nelson-Aalen estimator
The Nelson-Aalen estimator of the cumulative hazard function $H(t)$ is defined as
$$\tilde H(t) = \begin{cases} 0, & t < t_{(1)}, \\ \sum_{t_{(k)} \le t} \dfrac{d_k}{n_k}, & t \ge t_{(1)}. \end{cases}$$
The estimated variance of the Nelson-Aalen estimator is
$$\widehat{\mathrm{Var}}[\tilde H(t)] = \sum_{t_{(k)} \le t} \frac{d_k}{n_k^2}.$$
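A sketch of the Nelson-Aalen estimator and its variance, assuming the same data layout as before (times plus 0/1 event indicators). The names `risk_table` and `nelson_aalen` are hypothetical.

```python
from collections import Counter


def risk_table(times, events):
    """Return (t_k, d_k, n_k) at each distinct failure time t_k."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    return [(t, deaths[t], sum(u >= t for u in times)) for t in sorted(deaths)]


def nelson_aalen(times, events):
    """H_tilde(t) = sum d_k / n_k with variance sum d_k / n_k**2."""
    h, var, out = 0.0, 0.0, []
    for t, d, n in risk_table(times, events):
        h += d / n
        var += d / n ** 2
        out.append((t, h, var))
    return out


times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
events = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
na = nelson_aalen(times, events)
for t, h, var in na:
    print(f"{t:5.1f}  H={h:.3f}  var={var:.4f}")
```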

Breslow estimator
Based on the Nelson-Aalen estimator of the cumulative hazard, the Breslow estimator of the survival function is $\tilde S(t) = \exp[-\tilde H(t)]$. Its estimated variance is
$$\widehat{\mathrm{Var}}[\tilde S(t)] = [\tilde S(t)]^2 \sum_{t_{(k)} \le t} \frac{d_k}{n_k^2}.$$
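A minimal sketch of the Breslow estimator built on the Nelson-Aalen sums, using the remission data (the names `risk_table` and `breslow` are hypothetical). Its output can be compared with the $\tilde S(t)$ column of the remission example; differences in the third decimal can arise from rounding.

```python
import math
from collections import Counter


def risk_table(times, events):
    """Return (t_k, d_k, n_k) at each distinct failure time t_k."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    return [(t, deaths[t], sum(u >= t for u in times)) for t in sorted(deaths)]


def breslow(times, events):
    """S_tilde(t) = exp(-H_tilde(t)) with variance S_tilde(t)**2 * sum d_k / n_k**2."""
    h, var_h, out = 0.0, 0.0, []
    for t, d, n in risk_table(times, events):
        h += d / n
        var_h += d / n ** 2
        s_tilde = math.exp(-h)
        out.append((t, s_tilde, s_tilde ** 2 * var_h))
    return out


times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
events = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
br = breslow(times, events)
for t, s, var in br:
    print(f"{t:5.1f}  S_tilde={s:.3f}  var={var:.4f}")
```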

Example
Table: Nelson-Aalen estimates $\tilde H(t)$ and $\tilde S(t)$ and Kaplan-Meier estimates $\hat S(t)$ of remission times of 10 patients.

$t_{(k)}$ | $d_k$ | $n_k$ | $d_k/n_k$ | $\tilde H(t)$ | $\tilde S(t)$ | $\hat S(t)$
3.0       | 1     | 10    | 0.100     | 0.100         | 0.905         | 0.900
4.0+      | 0     | 9     | 0         |               |               |
5.7+      | 0     | 8     | 0         |               |               |
6.5       | 2     | 7     | 0.286     | 0.386         | 0.680         | 0.643
8.4+      | 0     | 5     | 0         |               |               |
10.0      | 1     | 4     | 0.250     | 0.636         | 0.529         | 0.482
10.0+     | 0     | 3     | 0         |               |               |
12.0      | 1     | 2     | 0.500     | 1.136         | 0.321         | 0.241
15.0      | 1     | 1     | 1.000     | 2.136         | 0.118         | 0

Life-table estimate (1)
The life-table (or actuarial) estimate of the survivor function allows event times to be grouped into intervals. It is one of the most traditional methods for analyzing durations and lifetimes, is used mainly in demography and life insurance, and is also suitable for data that are available only in grouped form. Consider a discretization of the time axis into $q + 1$ adjacent, non-overlapping intervals $[a_{k-1}, a_k)$, $k = 1, \dots, q+1$, where $0 = a_0 < a_1 < a_2 < \dots < a_q < a_{q+1} = \infty$.

Life-table estimate (2)
Notation:
$n$: number of individuals at the start of the study.
$d_k$: number of deaths in the $k$th interval.
$c_k$: number of censored survival times in the $k$th interval.
$n_k$: number of individuals who are alive at the start of the $k$th interval; $n_1 = n$ and $n_k = n_{k-1} - d_{k-1} - c_{k-1}$ for $k = 2, \dots, q+1$.
$n_k'$: number of individuals at risk of experiencing the event in the $k$th interval, assuming that censored survival times occur uniformly throughout the interval: $n_k' = n_k - c_k/2$.

Life-table estimate (3)
The conditional proportion of deaths in the $k$th interval, given exposure to the risk of death in that interval, is $d_k/n_k'$. The life-table estimate of $S(a_k)$ is
$$S^*(a_k) = S^*(a_{k-1})\left(1 - \frac{d_k}{n_k'}\right) = \prod_{j=1}^{k}\left(1 - \frac{d_j}{n_j'}\right), \qquad k = 1, \dots, q,$$
with $S^*(a_0) = 1$. The estimated variance of the life-table estimate is
$$\widehat{\mathrm{Var}}[S^*(a_k)] = [S^*(a_k)]^2 \sum_{j=1}^{k} \frac{d_j}{n_j'(n_j' - d_j)}, \qquad k = 1, \dots, q,$$
and the standard error of $S^*(a_0) = 1$ is zero.

Example
Table: Life-table estimate of the survivor function for some data.

$k$ | $a_{k-1}$ | $n_k$ | $d_k$ | $c_k$ | $n_k'$ | $1 - d_k/n_k'$ | $S^*(a_k)$
1   | 0         | 24    | 4     | 2     | 23     | 0.826          | 0.826
2   | 182       | 18    | 0     | 0     | 18     | 1.000          | 0.826
3   | 365       | 18    | 2     | 1     | 17.5   | 0.886          | 0.732
4   | 547       | 15    | 1     | 1     | 14.5   | 0.931          | 0.681
5   | 730       | 13    | 1     | 2     | 12     | 0.917          | 0.624
6   | 912       | 10    | 0     | 1     | 9.5    | 1.000          | 0.624
7   | 1095      | 9     | 2     | 0     | 9      | 0.778          | 0.486
8   | 1277      | 7     | 0     | 1     | 6.5    | 1.000          | 0.486
9   | 1460      | 6     | 1     | 2     | 5      | 0.800          | 0.389
10  | 1642      | 3     | 0     | 3     | 1.5    | 1.000          | 0.389
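The actuarial recursion can be checked against the life-table example with a short Python sketch. The function name `life_table` is hypothetical, and the sketch assumes $n_k' > d_k$ in every interval so that the variance term stays finite.

```python
def life_table(n, deaths, censored):
    """Actuarial (life-table) estimate of the survivor function.

    deaths[k-1] = d_k and censored[k-1] = c_k for intervals k = 1, 2, ...
    Returns one row per interval: (n_k, n_k', 1 - d_k/n_k', S*(a_k), Var[S*(a_k)]).
    Assumes n_k' > d_k in every interval.
    """
    rows = []
    n_k, s_star, var_sum = n, 1.0, 0.0
    for d, c in zip(deaths, censored):
        n_prime = n_k - c / 2          # censorings assumed uniform within the interval
        cond_surv = 1 - d / n_prime    # 1 - d_k / n_k'
        s_star *= cond_surv            # S*(a_k) = S*(a_{k-1}) (1 - d_k / n_k')
        var_sum += d / (n_prime * (n_prime - d))
        rows.append((n_k, n_prime, cond_surv, s_star, s_star ** 2 * var_sum))
        n_k -= d + c                   # n_{k+1} = n_k - d_k - c_k
    return rows


# Counts d_k and c_k from the life-table example (n = 24 at the start).
deaths = [4, 0, 2, 1, 1, 0, 2, 0, 1, 0]
censored = [2, 0, 1, 1, 2, 1, 0, 1, 2, 3]
rows = life_table(24, deaths, censored)
for k, (n_k, n_prime, cond_surv, s_star, var) in enumerate(rows, start=1):
    print(f"{k:2d}  {n_k:3d}  {n_prime:5.1f}  {cond_surv:.3f}  {s_star:.3f}  se={var ** 0.5:.3f}")
```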

Plot of the life-table estimate
[Figure: Life-table estimate $S(t)$ plotted against $t$ (0 to about 1500).]

Cohort life tables
A cohort is a group of individuals who share a common origin from which the event time is calculated. They are followed over time, and each event time or censoring time is recorded as falling in one of $q + 1$ adjacent, non-overlapping intervals $I_k = [a_{k-1}, a_k)$, $k = 1, \dots, q+1$, with $a_0 = 0$ and $a_{q+1} = \infty$. A traditional cohort life table presents the actual mortality experience of the cohort from the birth of each individual to the death of the last surviving member of the cohort. Censoring may occur because some individuals migrate out of the study area or drop out of observation.

Basic construction of a cohort life table
1. The 1st column gives the intervals $I_k$, $k = 1, \dots, q+1$.
2. The 2nd column gives the number of subjects $n_k$ entering the $k$th interval who have not experienced the event.
3. The 3rd column gives the number of censored survival times in the $k$th interval, $c_k$.
4. The 4th column gives the number of individuals at risk of experiencing the event in the $k$th interval, $n_k'$.
5. The 5th column reports the number of individuals $d_k$ who experience the event in the $k$th interval.
6. The 6th column gives the estimated survival function at the start of the $k$th interval, $S^*(a_{k-1})$, with $S^*(a_0) = 1$.

Basic construction of a cohort life table (2)
7. The 7th column gives the estimated pdf at the midpoint of the $k$th interval, $a_{mk} = (a_k + a_{k-1})/2$:
$$\hat f(a_{mk}) = \frac{S^*(a_{k-1}) - S^*(a_k)}{a_k - a_{k-1}}.$$
8. The 8th column gives the estimated hazard rate at the midpoint $a_{mk}$ of the $k$th interval:
$$\hat h(a_{mk}) = \frac{\hat f(a_{mk})}{S^*(a_{mk})} = \frac{\hat f(a_{mk})}{S^*(a_k) + [S^*(a_{k-1}) - S^*(a_k)]/2} = \frac{2\hat f(a_{mk})}{S^*(a_k) + S^*(a_{k-1})}.$$
Note that $S^*(a_{mk})$ is based on a linear approximation between the estimates of the survivor function at the endpoints of the interval.
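A sketch of columns 7 and 8, applied to the first nine (finite-width) intervals of the life-table example; the open last interval $[a_q, \infty)$ has no finite width and is skipped. The function name `midpoint_pdf_hazard` is hypothetical, and the sketch assumes $S^*(a_{k-1}) + S^*(a_k) > 0$ in every interval.

```python
def midpoint_pdf_hazard(a, s_star):
    """Estimated pdf and hazard at interval midpoints of a cohort life table.

    a      = [a_0, a_1, ..., a_q]           (finite interval endpoints)
    s_star = [S*(a_0), S*(a_1), ..., S*(a_q)] with S*(a_0) = 1
    Returns (a_mk, f_hat, h_hat) for k = 1, ..., q.
    """
    out = []
    for k in range(1, len(a)):
        f_hat = (s_star[k - 1] - s_star[k]) / (a[k] - a[k - 1])
        # Linear approximation: S*(a_mk) = (S*(a_k) + S*(a_{k-1})) / 2
        h_hat = 2 * f_hat / (s_star[k] + s_star[k - 1])
        out.append(((a[k] + a[k - 1]) / 2, f_hat, h_hat))
    return out


# Interval endpoints and counts from the life-table example (first q = 9 intervals).
a = [0, 182, 365, 547, 730, 912, 1095, 1277, 1460, 1642]
deaths = [4, 0, 2, 1, 1, 0, 2, 0, 1]
censored = [2, 0, 1, 1, 2, 1, 0, 1, 2]
s_star, n_k = [1.0], 24
for d, c in zip(deaths, censored):
    s_star.append(s_star[-1] * (1 - d / (n_k - c / 2)))  # actuarial recursion
    n_k -= d + c
out = midpoint_pdf_hazard(a, s_star)
for a_mk, f_hat, h_hat in out:
    print(f"{a_mk:7.1f}  f={f_hat:.6f}  h={h_hat:.6f}")
```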

Basic construction of a cohort life table (3)
9. The 9th column gives the standard error of survival at the beginning of the $k$th interval, $\mathrm{se}\{S^*(a_{k-1})\} = \{\widehat{\mathrm{Var}}[S^*(a_{k-1})]\}^{1/2}$ for $k = 2, \dots, q+1$, with $\mathrm{se}\{S^*(a_0)\} = 0$.
10. The 10th column shows the standard error of the pdf at the midpoint of the $k$th interval.
11. The 11th column shows the standard error of the hazard function at the midpoint of the $k$th interval.

Example of a cohort life table
[Figure: Life table for weaning example (Klein and Moeschberger 2003, p. 156).]

Interval estimates
An estimate of the survivor function summarizes the mortality experience of a given population, and its standard error indicates how precise that estimate is. We can use these estimates to construct a pointwise confidence interval (CI) for the value of the survivor function at a fixed time $t$. The intervals are constructed so that, with a prescribed probability, the true value of the survival function at a predetermined time $t$ falls in the interval we construct.

Standard normal distribution
[Figure: Percentiles of the standard normal distribution.]

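Confidence interval
A pointwise $100(1-\alpha)\%$ confidence interval for $S(t)$, for a given value of $t$, is
$$\hat S(t) \pm z_{1-\alpha/2}\, \mathrm{se}\{\hat S(t)\},$$
where $\mathrm{se}\{\hat S(t)\} = \{\widehat{\mathrm{Var}}[\hat S(t)]\}^{1/2}$ and $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution. For each fixed $t$ it holds asymptotically that
$$P\left(\hat S(t) - z_{1-\alpha/2}\,\mathrm{se}\{\hat S(t)\} \le S(t) \le \hat S(t) + z_{1-\alpha/2}\,\mathrm{se}\{\hat S(t)\}\right) \approx 1 - \alpha$$
for a given confidence level $1 - \alpha$.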
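A sketch of the untransformed pointwise interval with Greenwood standard errors, using `statistics.NormalDist` from the Python standard library for $z_{1-\alpha/2}$. The names `risk_table` and `km_linear_ci` are hypothetical; the loop stops once $n_k = d_k$, where Greenwood's variance is undefined.

```python
import math
from collections import Counter
from statistics import NormalDist


def risk_table(times, events):
    """Return (t_k, d_k, n_k) at each distinct failure time t_k."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    return [(t, deaths[t], sum(u >= t for u in times)) for t in sorted(deaths)]


def km_linear_ci(times, events, alpha=0.05):
    """Pointwise 100(1 - alpha)% CI: S_hat(t) +/- z_{1-alpha/2} * se (Greenwood)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    s_hat, gw_sum, out = 1.0, 0.0, []
    for t, d, n in risk_table(times, events):
        if n == d:
            break  # S_hat drops to 0 and Greenwood's variance is undefined
        s_hat *= 1 - d / n
        gw_sum += d / (n * (n - d))
        se = s_hat * math.sqrt(gw_sum)
        out.append((t, s_hat, s_hat - z * se, s_hat + z * se))
    return out


times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
events = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
ci = km_linear_ci(times, events)
for t, s, lo, hi in ci:
    print(f"{t:5.1f}  S={s:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

The untransformed interval is not restricted to $[0, 1]$: at $t = 3.0$ its upper limit exceeds 1.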

Confidence intervals based on transformations of $\hat S(t)$
Alternative CIs can be constructed by first transforming $\hat S(t)$ and obtaining a CI for the transformed value. The resulting confidence limits are then back-transformed to give a confidence interval for $S(t)$ itself. Possible transformations include the log transformation, $\ln[\hat S(t)]$, the logistic transformation, $\ln[\hat S(t)/\{1 - \hat S(t)\}]$, and the complementary log-log transformation, $\ln[-\ln\{\hat S(t)\}]$. Such intervals can lead to better coverage probabilities.
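A sketch of the complementary log-log interval, assuming $0 < \hat S(t) < 1$. By the delta method, $\widehat{\mathrm{Var}}[\ln\{-\ln \hat S(t)\}] \approx \hat\sigma_S^2(t)/[\ln \hat S(t)]^2$, where $\hat\sigma_S^2(t) = \sum_{t_{(k)} \le t} d_k/\{n_k(n_k - d_k)\}$ is Greenwood's sum; back-transforming gives $[\hat S(t)^{1/\theta}, \hat S(t)^{\theta}]$ with $\theta = \exp\{z_{1-\alpha/2}\hat\sigma_S(t)/\ln \hat S(t)\}$. This derivation and the function name `loglog_ci` are our own illustration. The inputs below are the remission-data values at $t = 3.0$ ($\hat S = 0.9$) and $t = 10.0$ ($\hat S = 27/56$).

```python
import math
from statistics import NormalDist


def loglog_ci(s_hat, sigma2_s, alpha=0.05):
    """CI for S(t) based on the transformation ln[-ln S_hat(t)].

    sigma2_s is Greenwood's sum, so Var[S_hat] = s_hat**2 * sigma2_s.
    Requires 0 < s_hat < 1. Returns (lower, upper).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # theta < 1 because ln(s_hat) < 0.
    theta = math.exp(z * math.sqrt(sigma2_s) / math.log(s_hat))
    return s_hat ** (1 / theta), s_hat ** theta


lo3, hi3 = loglog_ci(0.9, 1 / 90)                           # t = 3.0
lo10, hi10 = loglog_ci(27 / 56, 1 / 90 + 2 / 35 + 1 / 12)   # t = 10.0
print(f"t=3.0:  ({lo3:.3f}, {hi3:.3f})")
print(f"t=10.0: ({lo10:.3f}, {hi10:.3f})")
```

Unlike the untransformed interval, these limits always lie strictly inside $(0, 1)$.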

Simultaneous confidence bands
Pointwise confidence intervals are valid only for a single fixed time at which the inference is to be made. A simultaneous confidence band for the survival function should satisfy
$$P\left(S(t) \in \hat S(t) \pm c_\alpha(t_L, t_U) \ \text{for all } t \in [t_L, t_U]\right) \approx 1 - \alpha$$
for a given confidence level $1 - \alpha$. We present two approaches for constructing confidence bands for $S(t)$.

Equal precision (EP) bands
Equal precision (EP) bands are obtained as
$$\hat S(t) \pm c_\alpha(a_L, a_U)\, \mathrm{se}\{\hat S(t)\},$$
where the coefficient $c_\alpha(a_L, a_U)$ depends on $0 < a_L < a_U < 1$, given by
$$a_L = \frac{n\hat\sigma_S^2(t_L)}{1 + n\hat\sigma_S^2(t_L)}, \qquad a_U = \frac{n\hat\sigma_S^2(t_U)}{1 + n\hat\sigma_S^2(t_U)},$$
and
$$\hat\sigma_S^2(t) = \frac{\widehat{\mathrm{Var}}[\hat S(t)]}{\hat S(t)^2} = \sum_{t_{(k)} \le t} \frac{d_k}{n_k(n_k - d_k)}.$$

Equal precision (EP) bands (2)
To construct $100(1-\alpha)\%$ confidence bands for $S(t)$ over the range $[t_L, t_U]$, we find a confidence coefficient $c_\alpha(a_L, a_U)$. We pick $t_L < t_U$ so that $t_L$ is greater than or equal to the smallest observed event time and $t_U$ is less than or equal to the largest observed event time. Values of $c_\alpha(a_L, a_U)$ can be obtained from Table C.3 in Appendix C of Klein and Moeschberger (2003). EP bands are proportional to the pointwise confidence intervals.
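A sketch of the EP band computation, assuming $t_L$ and $t_U$ are observed event times. The sketch computes $a_L$ and $a_U$ (needed to look up $c_\alpha$ in Table C.3) but does not compute $c_\alpha$ itself: `C_ALPHA_PLACEHOLDER` is a hypothetical value chosen only to make the code run, NOT a tabulated coefficient. The names `risk_table` and `ep_band` are also hypothetical.

```python
import math
from collections import Counter


def risk_table(times, events):
    """Return (t_k, d_k, n_k) at each distinct failure time t_k."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    return [(t, deaths[t], sum(u >= t for u in times)) for t in sorted(deaths)]


def ep_band(times, events, t_lower, t_upper, c_alpha):
    """Equal precision band S_hat(t) +/- c_alpha * se{S_hat(t)} on [t_lower, t_upper].

    c_alpha must be looked up (Table C.3) for the returned a_L, a_U.
    t_lower and t_upper must be observed event times with n_k > d_k.
    """
    n = len(times)
    s_hat, sigma2, est = 1.0, 0.0, {}
    for t, d, n_k in risk_table(times, events):
        if n_k == d:
            break  # sigma2_S(t) is undefined once n_k = d_k
        s_hat *= 1 - d / n_k
        sigma2 += d / (n_k * (n_k - d))
        est[t] = (s_hat, sigma2)
    a_l = n * est[t_lower][1] / (1 + n * est[t_lower][1])
    a_u = n * est[t_upper][1] / (1 + n * est[t_upper][1])
    band = []
    for t, (s, sig2) in est.items():
        if t_lower <= t <= t_upper:
            half = c_alpha * s * math.sqrt(sig2)  # c_alpha * se{S_hat(t)}
            band.append((t, s - half, s + half))
    return a_l, a_u, band


times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
events = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
C_ALPHA_PLACEHOLDER = 2.5  # hypothetical; NOT taken from Table C.3
a_l, a_u, band = ep_band(times, events, 3.0, 12.0, C_ALPHA_PLACEHOLDER)
print(f"a_L={a_l:.3f}, a_U={a_u:.3f}")
```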

Hall-Wellner bands
An alternative set of confidence bands are the Hall-Wellner bands, which are obtained as
$$\hat S(t) \pm \frac{k_\alpha(a_L, a_U)}{\sqrt{n}}\left[1 + n\hat\sigma_S^2(t)\right]\hat S(t).$$
To construct a $100(1-\alpha)\%$ confidence band for $S(t)$ over the region $[t_L, t_U]$, we find a confidence coefficient $k_\alpha(a_L, a_U)$. For these bands, a lower limit $t_L$ of zero is allowed.

Hall-Wellner bands (2)
Values of $k_\alpha(a_L, a_U)$ can be obtained from Table C.4 in Appendix C of Klein and Moeschberger (2003). Hall-Wellner bands are not proportional to the pointwise confidence intervals. As with pointwise confidence intervals, other forms of the confidence bands based on transformations are available.
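A sketch of the Hall-Wellner half-width $k_\alpha n^{-1/2}[1 + n\hat\sigma_S^2(t)]\hat S(t)$. As with EP bands, $k_\alpha$ must come from Table C.4; the value `K_ALPHA_PLACEHOLDER = 1.0` below is hypothetical and chosen only so the arithmetic is easy to follow. The names `risk_table` and `hall_wellner_band` are also hypothetical.

```python
import math
from collections import Counter


def risk_table(times, events):
    """Return (t_k, d_k, n_k) at each distinct failure time t_k."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    return [(t, deaths[t], sum(u >= t for u in times)) for t in sorted(deaths)]


def hall_wellner_band(times, events, t_lower, t_upper, k_alpha):
    """Band S_hat(t) +/- k_alpha / sqrt(n) * [1 + n * sigma2_S(t)] * S_hat(t)."""
    n = len(times)
    s_hat, sigma2, band = 1.0, 0.0, []
    for t, d, n_k in risk_table(times, events):
        if n_k == d:
            break  # sigma2_S(t) is undefined once n_k = d_k
        s_hat *= 1 - d / n_k
        sigma2 += d / (n_k * (n_k - d))
        if t_lower <= t <= t_upper:
            half = k_alpha / math.sqrt(n) * (1 + n * sigma2) * s_hat
            band.append((t, s_hat - half, s_hat + half))
    return band


times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
events = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
K_ALPHA_PLACEHOLDER = 1.0  # hypothetical; NOT taken from Table C.4
band = hall_wellner_band(times, events, 0.0, 12.0, K_ALPHA_PLACEHOLDER)
for t, lo, hi in band:
    print(f"{t:5.1f}  ({lo:.3f}, {hi:.3f})")
```

At $t = 3.0$, $[1 + 10 \cdot (1/90)] \times 0.9 = 1$, so the half-width is exactly $k_\alpha/\sqrt{10}$.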

Point estimates of quantiles of survival times
Recall that the $p$th quantile ($0 \le p \le 1$) of a random variable $T$ with survival function $S(t)$ is defined as $t_p = \inf\{t : S(t) \le 1 - p\}$. To estimate $t_p$, we find the smallest time $\hat t_p$ at which the estimated survivor function is less than or equal to $1 - p$, that is, $\hat t_p = \inf\{t : \hat S(t) \le 1 - p\}$. Some software packages use a different estimator.
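Because $\hat S(t)$ is a right-continuous step function that drops only at event times, the infimum is attained at an event time, so it suffices to scan the event times in order. A sketch on the remission data (the names `risk_table` and `km_quantile` are hypothetical; `None` is returned if $\hat S$ never falls to $1 - p$):

```python
from collections import Counter


def risk_table(times, events):
    """Return (t_k, d_k, n_k) at each distinct failure time t_k."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    return [(t, deaths[t], sum(u >= t for u in times)) for t in sorted(deaths)]


def km_quantile(times, events, p):
    """t_hat_p = inf{t : S_hat(t) <= 1 - p}, or None if never reached."""
    s_hat = 1.0
    for t, d, n_k in risk_table(times, events):
        s_hat *= 1 - d / n_k
        if s_hat <= 1 - p:
            return t
    return None


times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
events = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
for p in (0.25, 0.5, 0.75):
    print(p, km_quantile(times, events, p))
```

For the remission data the estimated median is $\hat t_{0.5} = 10.0$ months, the first time at which $\hat S(t) = 0.482 \le 0.5$.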

Interval estimates of quantiles of survival times
An estimator of the variance of $\hat t_p$ may be obtained by applying the delta method. The suggested estimator of the variance of the estimated $p$th quantile is
$$\widehat{\mathrm{Var}}(\hat t_p) = \frac{\widehat{\mathrm{Var}}[\hat S(\hat t_p)]}{[\hat f(\hat t_p)]^2}.$$
The estimator of the pdf often used is
$$\hat f(\hat t_p) = \frac{\hat S(\hat u_p) - \hat S(\hat l_p)}{\hat l_p - \hat u_p}.$$

Interval estimates of quantiles of survival times (2)
The values $\hat u_p$ and $\hat l_p$ are chosen such that $\hat u_p < \hat t_p < \hat l_p$ and are obtained as
$$\hat u_p = \max\{t : \hat S(t) \ge 1 - p + \epsilon\}, \qquad \hat l_p = \min\{t : \hat S(t) \le 1 - p - \epsilon\}$$
for small values of $\epsilon$. The endpoints of a $100(1-\alpha)\%$ confidence interval are
$$\hat t_p \pm z_{1-\alpha/2}\, \mathrm{s.e.}(\hat t_p), \qquad \text{where } \mathrm{s.e.}(\hat t_p) = \{\widehat{\mathrm{Var}}(\hat t_p)\}^{1/2}.$$
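A sketch of the delta-method interval for $\hat t_p$, assuming the max and min defining $\hat u_p$ and $\hat l_p$ are taken over the observed event times, that both sets are non-empty, and that $\epsilon = 0.05$. These are illustrative choices, not prescriptions from the slides; the names `risk_table` and `quantile_ci` are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def risk_table(times, events):
    """Return (t_k, d_k, n_k) at each distinct failure time t_k."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    return [(t, deaths[t], sum(u >= t for u in times)) for t in sorted(deaths)]


def quantile_ci(times, events, p, eps=0.05, alpha=0.05):
    """Delta-method CI for the p-th quantile of a Kaplan-Meier curve."""
    steps = []  # (t_k, S_hat(t_k), Greenwood sum up to t_k)
    s_hat, gw = 1.0, 0.0
    for t, d, n_k in risk_table(times, events):
        s_hat *= 1 - d / n_k
        gw += d / (n_k * (n_k - d)) if n_k > d else math.nan
        steps.append((t, s_hat, gw))
    t_p = next(t for t, s, _ in steps if s <= 1 - p)
    u_p = max(t for t, s, _ in steps if s >= 1 - p + eps)
    l_p = min(t for t, s, _ in steps if s <= 1 - p - eps)
    at = {t: (s, g) for t, s, g in steps}
    f_hat = (at[u_p][0] - at[l_p][0]) / (l_p - u_p)   # pdf estimate at t_p
    var_s = at[t_p][0] ** 2 * at[t_p][1]              # Greenwood Var[S_hat(t_p)]
    se = math.sqrt(var_s) / f_hat
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return t_p, u_p, l_p, se, (t_p - z * se, t_p + z * se)


times = [3.0, 4.0, 5.7, 6.5, 6.5, 8.4, 10.0, 10.0, 12.0, 15.0]
events = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
t_p, u_p, l_p, se, ci = quantile_ci(times, events, 0.5)
print(f"median={t_p}, u={u_p}, l={l_p}, se={se:.3f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

For the median of the remission data this gives $\hat u_{0.5} = 6.5$, $\hat l_{0.5} = 12.0$ and $\hat f(\hat t_{0.5}) = (0.643 - 0.241)/5.5$.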