Bios 323: Applied Survival Analysis
Qingxia (Cindy) Chen
Chapter 4, Fall 2012

4.2 Estimators of the survival and cumulative hazard functions for RC data

Suppose $X$ is a continuous random failure time with survival function $S(t)$ and cumulative hazard function $H(t)$. $X$ may be subject to noninformative right censoring (RC), hence we observe $\{(T_i, \delta_i),\ i = 1, \ldots, n\}$, where $T_i = \min(X_i, C_i)$ and $\delta_i = I(T_i = X_i)$.

Notations:
- $t_1 < t_2 < \cdots < t_D$: the $D$ unique death times
- $d_j$ = # deaths at $t_j$ $= \sum_{i=1}^{n} I(T_i = t_j, \delta_i = 1)$, $j = 1, \ldots, D$
- $Y_j$ = # at risk/alive at $t_j$ $= \sum_{i=1}^{n} I(T_i \ge t_j)$

Kaplan-Meier estimator for $S(t)$:

$$\hat{S}(t) = \begin{cases} 1, & t < t_1, \\ \prod_{t_j \le t} \left(1 - \dfrac{d_j}{Y_j}\right), & t_1 \le t. \end{cases} \tag{4.2.1}$$

Greenwood's variance estimator for $\hat{S}(t)$:

$$\hat{V}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_j \le t} \frac{d_j}{Y_j (Y_j - d_j)}. \tag{4.2.2}$$

Nelson-Aalen estimator for $H(t)$:

$$\tilde{H}(t) = \begin{cases} 0, & t < t_1, \\ \sum_{t_j \le t} \dfrac{d_j}{Y_j}, & t_1 \le t. \end{cases} \tag{4.2.3}$$

Aalen's variance estimator for $\tilde{H}(t)$:

$$\sigma_H^2(t) = \sum_{t_j \le t} \frac{d_j}{Y_j^2}. \tag{4.2.4}$$
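The definitions in (4.2.1), (4.2.3) and (4.2.4) can be sketched in a few lines of base R. This is a minimal illustration rather than a reference implementation: the variable names (`obs_time`, `delta`, `t_j`, ...) are chosen here for readability, and the five observations are a small right-censored sample used only for illustration.

```r
# Five illustrative right-censored observations: time and event indicator.
obs_time <- c(0.5, 1, 0.75, 0.25, 0.75)
delta    <- c(1, 1, 1, 0, 0)

# Distinct death times, deaths d_j and risk-set sizes Y_j at each of them.
t_j <- sort(unique(obs_time[delta == 1]))
d_j <- sapply(t_j, function(t) sum(obs_time == t & delta == 1))
Y_j <- sapply(t_j, function(t) sum(obs_time >= t))

S_hat   <- cumprod(1 - d_j / Y_j)   # Kaplan-Meier, (4.2.1)
H_tilde <- cumsum(d_j / Y_j)        # Nelson-Aalen, (4.2.3)
var_H   <- cumsum(d_j / Y_j^2)      # Aalen's variance, (4.2.4)
S_tilde <- exp(-H_tilde)            # survival estimate based on Nelson-Aalen

print(data.frame(t_j, d_j, Y_j, S_hat, H_tilde, S_tilde, var_H))
```

For this sample the risk sets at the three death times have sizes 4, 3 and 1, so the Kaplan-Meier estimate steps through 0.75, 0.5 and 0. Greenwood's formula (4.2.2) is left out of the sketch because its last term divides by $Y_j - d_j = 0$ once the estimate reaches zero.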

An alternative estimator for $H(t)$ is $\hat{H}(t) = -\log \hat{S}(t)$, and an alternative estimator for $S(t)$ is $\tilde{S}(t) = \exp\{-\tilde{H}(t)\}$.

$\hat{S}$ is also called the product-limit estimator. As $n \to \infty$, $D \to \infty$ and the number of terms in the product $\to \infty$ for continuous $X$. For any fixed $n < \infty$ we have $D < \infty$, but in the limit the product involves the whole support of $X$.

Example: Continuous $X$ is subject to right censoring. Let $T = \min(X, C)$ and $\delta = I(X < C)$. A random sample of size 5 is given in the table.

$T_i$        0.5   1   0.75   0.25   0.75
$\delta_i$   1     1   1      0      0

Question: What are $\hat{S}$, $\tilde{H}$ and $\tilde{S}$?

Remarks
- $\hat{S}$, $\tilde{H}$ and $\hat{H}$ are right continuous, even though $S$ and $H$ may be continuous.
- $\hat{S}$ and $\tilde{S}$ are consistent for $S$ for all $t \le \tau$ such that $P(T \ge \tau) > 0$.

- $\hat{S}$ and $\tilde{S}$ are asymptotically equivalent.
- Recall that $\hat{S}$ and $\tilde{H}$ are only defined for $t \le \max(T_i)$.

Some methods to extend $\hat{S}$ to $t > \max(T_i) = t_{\max}$:
- Efron (1967): $\hat{S}(t) = 0$, $t > t_{\max}$
- Gill (1980): $\hat{S}(t) = \hat{S}(t_{\max})$, $t > t_{\max}$
- Brown, Hollander and Korwar (1974): $\hat{S}(t) = \exp\{t \log[\hat{S}(t_{\max})] / t_{\max}\}$, $t > t_{\max}$

Example 4.1: We consider the data in Section 1.2 on the time to relapse of patients in a clinical trial of 6-MP against a placebo. The study included 42 children with acute leukemia who had a complete or partial remission of their leukemia induced by treatment with a drug. In the maintenance phase, 6-MP was used to prevent relapse and its efficacy was of interest. The trial was conducted by matching pairs of patients by remission status and randomizing within each pair to either 6-MP or placebo maintenance therapy. Patients were followed until relapse or until the end of the study. For now we consider only the 6-MP patients and would like to estimate $S(t)$ and $H(t)$.

4.3 Pointwise confidence intervals for the survival function

A $100(1-\alpha)\%$ linear confidence interval for $S(t)$ is

$$\hat{S}(t) - Z_{1-\alpha/2}\, \hat{S}(t)\, \sigma_S(t), \qquad \hat{S}(t) + Z_{1-\alpha/2}\, \hat{S}(t)\, \sigma_S(t), \tag{4.3.1}$$

where

$$\sigma_S^2(t) = \frac{\hat{V}[\hat{S}(t)]}{\hat{S}^2(t)} = \sum_{t_j \le t} \frac{d_j}{Y_j (Y_j - d_j)}.$$

Alternative: a CI based on some transformation $g$. Consider $g(\hat{S})$, where $g$ is a known, monotone and differentiable function.

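Construct a $100(1-\alpha)\%$ CI for $g(S(t))$:

$$g(\hat{S}(t)) \pm Z_{1-\alpha/2}\, |g'(\hat{S}(t))|\, \hat{S}(t)\, \sigma_S(t).$$

Retransform to find a $100(1-\alpha)\%$ CI for $S(t)$:

$$g^{-1}\left\{ g(\hat{S}(t)) \pm Z_{1-\alpha/2}\, |g'(\hat{S}(t))|\, \hat{S}(t)\, \sigma_S(t) \right\}.$$

Why does transforming help?

A $100(1-\alpha)\%$ arcsine-square root confidence interval for $S(t)$ is

$$\sin^2\left\{ \max\left[ 0,\ \arcsin\big(\hat{S}(t)^{1/2}\big) - 0.5\, Z_{1-\alpha/2}\, \sigma_S(t) \left( \frac{\hat{S}(t)}{1 - \hat{S}(t)} \right)^{1/2} \right] \right\} \le S(t) \le \sin^2\left\{ \min\left[ \frac{\pi}{2},\ \arcsin\big(\hat{S}(t)^{1/2}\big) + 0.5\, Z_{1-\alpha/2}\, \sigma_S(t) \left( \frac{\hat{S}(t)}{1 - \hat{S}(t)} \right)^{1/2} \right] \right\}. \tag{4.3.3}$$

A $100(1-\alpha)\%$ log-transformed (more commonly called log-log transformed) confidence interval for $S(t)$ is

$$\left[ \hat{S}(t)^{1/\theta},\ \hat{S}(t)^{\theta} \right], \quad \text{where } \theta = \exp\left\{ \frac{Z_{1-\alpha/2}\, \sigma_S(t)}{\log[\hat{S}(t)]} \right\}. \tag{4.3.2}$$

Klein and Moeschberger work with $\log\{H(t)\} = \log\{-\log(S(t))\}$:

$$g(u) = \log\{-\log(u)\}, \qquad g'(u) = \frac{1}{u \log(u)}, \qquad g^{-1}(u) = \exp\{-\exp(u)\}.$$

A 95% CI for $g(S(t))$ is

$$\log\{-\log(\hat{S}(t))\} \pm 1.96\, \frac{\sigma_S(t)}{\log(\hat{S}(t))}.$$

Retransform to get the 95% CI for $S(t)$ in (4.3.2).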
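The retransformation step can be written out explicitly. The following derivation is a sketch that uses only the definitions of $g$, $g^{-1}$ and $\theta$ given above, written for a general level $1-\alpha$:

$$
\begin{aligned}
g^{-1}\left\{ \log[-\log \hat{S}(t)] \pm \frac{Z_{1-\alpha/2}\, \sigma_S(t)}{\log \hat{S}(t)} \right\}
&= \exp\left\{ -\exp\left( \log[-\log \hat{S}(t)] \pm \frac{Z_{1-\alpha/2}\, \sigma_S(t)}{\log \hat{S}(t)} \right) \right\} \\
&= \exp\left\{ \log \hat{S}(t) \cdot \exp\left( \pm \frac{Z_{1-\alpha/2}\, \sigma_S(t)}{\log \hat{S}(t)} \right) \right\} \\
&= \hat{S}(t)^{\theta^{\pm 1}}.
\end{aligned}
$$

Since $\log \hat{S}(t) < 0$ whenever $0 < \hat{S}(t) < 1$, we have $0 < \theta < 1$, so $\hat{S}(t)^{1/\theta} \le \hat{S}(t) \le \hat{S}(t)^{\theta}$ and both endpoints stay inside $[0, 1]$ automatically, which the linear interval (4.3.1) does not guarantee.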

Exercise: What is a $100(1-\alpha)\%$ log-transformed CI for $S(t)$? (Use $g(u) = \log(u)$.)

Both the log-log and arcsine-square root transformations give CIs which have better coverage than the linear CI.

R/Splus codes:

    library(survival)
    # Observed times and event indicators (1 = event, 0 = censored).
    # Note: naming a vector T masks R's shorthand for TRUE.
    T = c(1.5, 2.5, 1.4, 6.2, 2.8, 5.3, 4.5)
    ind = c(1, 0, 0, 1, 1, 1, 0)
    # Current versions of survfit() need a formula; "~ 1" fits a single curve.
    fit1 = survfit(Surv(T, ind) ~ 1, type = "kaplan-meier", error = "greenwood",
                   conf.int = 0.95, conf.type = "log")
    summary(fit1)

Output:

    > summary(fit1)
    Call: survfit(formula = Surv(T, ind))

     time n.risk n.event survival std.err lower 95% CI upper 95% CI
      1.5      6       1    0.833   0.152        0.583            1
      2.8      4       1    0.625   0.213        0.320            1
      5.3      2       1    0.313   0.245        0.067            1
      6.2      1       1    0.000      NA           NA           NA

Options for conf.type: "none" (suppress CI), "plain" (linear), "log" (default, g(u) = log(u)), "log-log" (today's recommendation).
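With $g(u) = \log(u)$ we have $g'(u) = 1/u$, so the general recipe gives $\log \hat{S}(t) \pm Z_{1-\alpha/2}\, \sigma_S(t)$ on the log scale and $\hat{S}(t) \exp\{\pm Z_{1-\alpha/2}\, \sigma_S(t)\}$ after retransforming. As a check, the sketch below recomputes this interval in base R for the seven observations of the survfit example. The variable names are chosen here, and capping the upper limit at 1 is an assumption made to match the printed output.

```r
# Same seven observations as in the survfit() example above.
obs_time <- c(1.5, 2.5, 1.4, 6.2, 2.8, 5.3, 4.5)
event    <- c(1, 0, 0, 1, 1, 1, 0)

t_j <- sort(unique(obs_time[event == 1]))
d_j <- sapply(t_j, function(t) sum(obs_time == t & event == 1))
Y_j <- sapply(t_j, function(t) sum(obs_time >= t))

# sigma_S(t) is undefined once S_hat reaches 0 (Y_j == d_j), so drop that time.
keep    <- Y_j > d_j
S_hat   <- cumprod(1 - d_j / Y_j)[keep]
sigma_S <- sqrt(cumsum(d_j / (Y_j * (Y_j - d_j))))[keep]

z     <- qnorm(0.975)
lower <- S_hat * exp(-z * sigma_S)
upper <- pmin(1, S_hat * exp(z * sigma_S))   # capped at 1 (assumption, see lead-in)

print(round(data.frame(time = t_j[keep], survival = S_hat,
                       std.err = S_hat * sigma_S, lower, upper), 3))
```

The lower limits round to 0.583, 0.320 and 0.067, in agreement with the `conf.type = "log"` output shown earlier.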

4.4 Confidence bands for the survival function

Here we find $L(t)$ and $U(t)$ such that

$$P\{L(t) \le S(t) \le U(t) \text{ for all } t_L \le t \le t_U\} = 1 - \alpha,$$

where $t_L \ge$ the smallest observed event time and $t_U \le$ the largest observed event time. $[L(t), U(t)]$ is then called a $100(1-\alpha)\%$ confidence band for $S(t)$, $t_L \le t \le t_U$.

Define

$$a_L = \frac{n \sigma_S^2(t_L)}{1 + n \sigma_S^2(t_L)} \quad \text{and} \quad a_U = \frac{n \sigma_S^2(t_U)}{1 + n \sigma_S^2(t_U)},$$

which satisfy $0 < a_L < a_U < 1$.

Equal probability (EP) bands for $S(t)$, $t_L \le t \le t_U$: From Table C.3 in Appendix C, find the confidence coefficient $c_\alpha(a_L, a_U)$ and construct EP bands over the range $[t_L, t_U]$ by replacing $Z_{1-\alpha/2}$ in (4.3.1)-(4.3.3) with $c_\alpha(a_L, a_U)$.

Hall and Wellner (HW) confidence bands for $S(t)$, $t_L \le t \le t_U$: Replace $Z_{1-\alpha/2}\, \sigma_S(t)$ in (4.3.1)-(4.3.3) with $k_\alpha(a_L, a_U)\,[1 + n \sigma_S^2(t)]\, n^{-1/2}$, where $k_\alpha(a_L, a_U)$ can be found in Table C.4 in Appendix C.

Comparisons of different confidence bands for $S(t)$:

EP        Sample size    HW        Sample size
Linear    200            Linear    20
Log       20             Log       20
Arcsine   20             Arcsine   20

4.5 Point and interval estimates of the mean and median survival time

The mean survival time is $\mu = E(X) = \int_0^\infty S(t)\,dt$. Naturally, $\hat{\mu} = \int_0^\infty \hat{S}(t)\,dt$.

$\hat{S}(t)$ is not defined beyond the last observation if that observation is censored. However, the right tail of $\hat{S}(t)$ can be completed.

First solution: If the last observation is censored and we use Efron's method, we obtain the estimate $\int_0^{t_{\max}} \hat{S}(t)\,dt$.

Second solution: Estimate the mean restricted to a preassigned interval $[0, \tau]$, where $\tau$ can be the largest observation or some value prespecified by the investigators:

$$\hat{\mu}_\tau = \int_0^\tau \hat{S}(t)\,dt. \tag{4.5.1}$$

The variance of this estimator is

$$\hat{V}(\hat{\mu}_\tau) = \sum_{j=1}^{D} \left[ \int_{t_j}^{\tau} \hat{S}(t)\,dt \right]^2 \frac{d_j}{Y_j (Y_j - d_j)}. \tag{4.5.2}$$

A $100(1-\alpha)\%$ CI for the mean is

$$\hat{\mu}_\tau \pm Z_{1-\alpha/2} \sqrt{\hat{V}(\hat{\mu}_\tau)}. \tag{4.5.3}$$

Example 4.1 continued: Example 4.1 uses the data in Section 1.2 on the time to relapse of patients in a clinical trial of 6-MP against a placebo. We consider estimating the mean survival time and its standard error for the 6-MP patients based on the following product-limit estimator of $S(t)$.

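$t_j$   $d_j$   $Y_j$   $\hat{S}(t_j)$
6       3       21      0.85714
7       1       17      0.80672
10      1       15      0.75294
13      1       12      0.69020
16      1       11      0.62745
22      1       7       0.53782
23      1       6       0.44818

Recall that the $p$th quantile of $X$ with survival function $S(x)$ is defined by $x_p = \inf\{t : S(t) \le 1 - p\}$. When $p = 0.5$, $x_p$ is the median time to the event of interest. The estimator is $\hat{x}_p = \inf\{t : \hat{S}(t) \le 1 - p\}$.

A $100(1-\alpha)\%$ CI for $x_p$, based on the linear CI, is the set of all time points $t$ which satisfy

$$-Z_{1-\alpha/2} \le \frac{\hat{S}(t) - (1 - p)}{\hat{V}^{1/2}[\hat{S}(t)]} \le Z_{1-\alpha/2}. \tag{4.5.4}$$

A $100(1-\alpha)\%$ CI for $x_p$, based on the log-transformed CI, is the set of all time points $t$ satisfying

$$-Z_{1-\alpha/2} \le \frac{\left[ \log\{-\log(\hat{S}(t))\} - \log\{-\log(1 - p)\} \right] \hat{S}(t) \log(\hat{S}(t))}{\hat{V}^{1/2}[\hat{S}(t)]} \le Z_{1-\alpha/2}. \tag{4.5.5}$$

A $100(1-\alpha)\%$ CI for $x_p$, based on the arcsine-square root CI, is the set of all time points $t$ satisfying

$$-Z_{1-\alpha/2} \le \frac{2 \left\{ \arcsin\big(\sqrt{\hat{S}(t)}\big) - \arcsin\big(\sqrt{1 - p}\big) \right\} \{\hat{S}(t)(1 - \hat{S}(t))\}^{1/2}}{\hat{V}^{1/2}[\hat{S}(t)]} \le Z_{1-\alpha/2}. \tag{4.5.6}$$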
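The table and formulas (4.5.1), (4.5.2) and (4.5.4) can be tied together in a short base R sketch. It uses only the rows of the table above, so it is not the full textbook example: $\tau = 23$ (the last death time listed) is an illustrative choice made here, and the variable names are hypothetical.

```r
# Rows of the product-limit table above (6-MP group).
t_j <- c(6, 7, 10, 13, 16, 22, 23)
d_j <- c(3, 1, 1, 1, 1, 1, 1)
Y_j <- c(21, 17, 15, 12, 11, 7, 6)

S_hat <- cumprod(1 - d_j / Y_j)                             # (4.2.1)
se_S  <- S_hat * sqrt(cumsum(d_j / (Y_j * (Y_j - d_j))))    # sqrt of (4.2.2)

# Restricted mean (4.5.1): area under the step function on [0, tau].
# Assumes tau is at least as large as every death time listed.
tau    <- 23                       # illustrative choice, not from the notes
pieces <- diff(c(0, t_j, tau)) * c(1, S_hat)
mu_tau <- sum(pieces)

# Variance (4.5.2): area to the right of each t_j, squared and weighted.
area_right <- rev(cumsum(rev(pieces[-1])))
var_mu     <- sum(area_right^2 * d_j / (Y_j * (Y_j - d_j)))

# Median estimate and the linear-CI set (4.5.4), checked at the death times.
p       <- 0.5
x_p_hat <- min(t_j[S_hat <= 1 - p])
z_stat  <- (S_hat - (1 - p)) / se_S
in_ci   <- t_j[abs(z_stat) <= qnorm(0.975)]

print(c(mu_tau = mu_tau, se_mu = sqrt(var_mu), median = x_p_hat))
print(in_ci)
```

Because $\hat{S}$ is constant between death times, a death time $t_j$ that satisfies (4.5.4) brings the whole interval $[t_j, t_{j+1})$ into the confidence set. With these rows the set starts at 13 and still contains 23, so the upper limit cannot be read off from the table alone.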

Example 4.2: This example uses the data described in Section 1.3 on bone marrow transplantation for leukemia. We shall estimate and construct 95% confidence intervals for the median disease-free survival time for the ALL group.

4.6 Estimators of the survival function for left-truncated and right-censored data

Right-censored and left-truncated (LT) data: $\{(L_i, T_i, \delta_i),\ i = 1, \ldots, n\}$, where $L_i$ is the age at entering the study and $T_i$ is the death or censoring time.

Let the death times be $t_1 < t_2 < \cdots < t_D$. At time $t_j$, $j = 1, \ldots, D$, let $d_j$ = number of deaths and let

$$Y_j = \sum_{i=1}^{n} I(L_i \le t_j \le T_i).$$

Using the modified $Y_j$ for the LT data, all the estimation procedures defined in Sections 4.2-4.4 are now applicable.

Note that we are estimating the conditional survival function $S_L(t) = P(X > t \mid X > L)$. Likewise, the Nelson-Aalen estimator estimates $H_L(t) = -\ln P(X > t) + \ln P(X > L)$. The corresponding hazard rate is $H_L'(t) = h(t)$.

Example 4.3: A survival study of residents of the Channing House retirement center located in California. Here the truncation times are the ages, in months, at which individuals entered the community. We want to estimate the conditional survival function. First look at the males. The first subject entered the center at age 751 and died at 777, the second entered the study at age 759 and died at 781, and the third subject appeared at age 782. If we used the K-M estimator directly to estimate the conditional survival function, what is $\hat{S}_L(t)$?
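A short worked calculation shows the problem. It assumes, as the description above suggests, that the first two men are the only ones in the risk set before age 782; the counts $Y_j$ and $d_j$ below follow from that assumption.

$$
\begin{aligned}
t = 777:&\quad Y = 2,\ d = 1, \quad \hat{S}_L(777) = 1 - \tfrac{1}{2} = \tfrac{1}{2}, \\
t = 781:&\quad Y = 1,\ d = 1, \quad \hat{S}_L(781) = \tfrac{1}{2}\left(1 - \tfrac{1}{1}\right) = 0.
\end{aligned}
$$

A product-limit estimate can never increase, so $\hat{S}_L(t) = 0$ for all $t \ge 781$, even though subjects who enter later are observed beyond that age. The early risk sets are simply too small to be informative.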

Instead, we estimate $S_a(t) = P(X > t \mid X \ge a)$, where $t > a$:

$$\hat{S}_a(t) = \prod_{a \le t_j \le t} \left( 1 - \frac{d_j}{Y_j} \right), \quad t \ge a,$$

$$\hat{V}[\hat{S}_a(t)] = [\hat{S}_a(t)]^2 \sum_{a \le t_j \le t} \frac{d_j}{Y_j (Y_j - d_j)},$$

$$\hat{H}_a(t) = \sum_{a \le t_j \le t} \frac{d_j}{Y_j}, \quad t > a.$$

4.7 Summary curves for competing risks

Example: Bone marrow transplantation in Section 1.3. Suppose we are interested in the time to treatment failure, which includes death in remission or relapse, whichever comes first. If death in remission is of primary interest, then relapse is the competing risk.

Competing risk data: $\{(T_i, \delta_i),\ i = 1, \ldots, n\}$, where $T_i = \min(X_{1i}, X_{2i}, C_i)$ and

$$\delta_i = \begin{cases} 1 & \text{if } T_i = X_{1i}, \\ 2 & \text{if } T_i = X_{2i}, \\ 0 & \text{if } T_i = C_i. \end{cases}$$

Can we still apply the K-M estimator to competing risk data?

In the competing risks setting, the standard Kaplan-Meier method is commonly applied to a type $k$ event in one of two ways:
- KM-C considers only the first event if it is of type $k$ and censors the non-type $k$ events. In the presence of dependence, KM-C results in an inflated estimate of the cause-specific failure probability.
- KM-I considers any event of type $k$ while ignoring all other non-type $k$ events.

As KM-I analyses are conducted separately for each cause of failure, the resulting component failure probabilities may exceed the total probability of failure.

For cause $k$, the cumulative incidence function is defined as

$$F_k(t) = P(X \le t, \delta = k) = \int_0^t h_k(u)\, S(u-)\,du,$$

where $S(u) = P(X > u)$ and

$$h_k(u) = \lim_{\Delta u \to 0} \frac{1}{\Delta u} P(u \le X < u + \Delta u,\ \delta = k \mid X \ge u).$$

Let $t_1 < t_2 < \cdots < t_M$ be the distinct times where one of the competing risks occurs. At time $t_j$, let
- $Y_j$ = the number of subjects at risk at $t_j$,
- $r_j$ = the number of subjects with an occurrence of the event of interest at $t_j$,
- $d_j$ = the number of subjects with an occurrence of any of the other events of interest at $t_j$.

$F_k(t)$ can be estimated by

$$\hat{F}_k(t) = \begin{cases} 0, & t < t_1, \\ \displaystyle \sum_{t_i \le t} \left[ \prod_{j=1}^{i-1} \left( 1 - \frac{d_j + r_j}{Y_j} \right) \right] \frac{r_i}{Y_i} = \sum_{t_i \le t} \hat{S}(t_i-)\, \frac{r_i}{Y_i}, & t \ge t_1, \end{cases}$$

where $\hat{S}(t_i-)$ is the K-M estimator evaluated just before $t_i$. The variance of $\hat{F}_k(t)$ is given in (4.7.2) in the textbook.

This estimator and its inference have been implemented in an R package called cmprsk; the specific function name is cuminc. They are also implemented in a SAS macro: http://www.mcw.edu/biostatistics/research/software.htm
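To see how the cumulative incidence estimator differs from KM-C, here is a minimal base R sketch that evaluates the formula for $\hat{F}_k$ by hand. It does not call cmprsk. The five observations are hypothetical toy data invented for this illustration (status 0 = censored, 1 = event of interest, 2 = competing event), and all variable names are chosen here.

```r
# Hypothetical toy data, not taken from the notes.
obs_time <- c(1, 2, 3, 4, 5)
status   <- c(1, 2, 0, 1, 2)

# Distinct times at which any of the competing events occurs.
t_i <- sort(unique(obs_time[status != 0]))
Y_i <- sapply(t_i, function(t) sum(obs_time >= t))                  # at risk
r_i <- sapply(t_i, function(t) sum(obs_time == t & status == 1))    # events of interest
d_i <- sapply(t_i, function(t) sum(obs_time == t & status == 2))    # other events

# Overall K-M estimate (any event counts as a failure), and its value
# just before each event time.
S_hat    <- cumprod(1 - (d_i + r_i) / Y_i)
S_before <- c(1, head(S_hat, -1))

F_hat <- cumsum(S_before * r_i / Y_i)     # cumulative incidence for cause 1
km_c  <- 1 - cumprod(1 - r_i / Y_i)       # KM-C: competing events treated as censored

print(data.frame(t_i, Y_i, r_i, d_i, S_before, F_hat, km_c))
```

On this toy sample the cumulative incidence for cause 1 ends at 0.5, while the KM-C failure probability ends at 0.6, which illustrates the inflation described for KM-C. Standard errors are not computed here; the cuminc function in cmprsk also reports variance estimates.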