A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR. Agnieszka Rossa

Similar documents
A SIMPLE IMPROVEMENT OF THE KAPLAN-MEIER ESTIMATOR. Agnieszka Rossa

Survival Analysis. Stat 526. April 13, 2018

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters

Power and Sample Size Calculations with the Additive Hazards Model

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Continuous Time Survival in Latent Variable Models

Survival Distributions, Hazard Functions, Cumulative Hazards

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

STAT331. Cox s Proportional Hazards Model

Survival Analysis I (CHL5209H)

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Multivariate Survival Analysis

Meei Pyng Ng 1 and Ray Watson 1

Survival Analysis Math 434 Fall 2011

Assessing the effect of a partly unobserved, exogenous, binary time-dependent covariate on -APPENDIX-

Extensions of Cox Model for Non-Proportional Hazards Purpose

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Simulation-based robust IV inference for lifetime data

Adaptive Prediction of Event Times in Clinical Trials

Multistate models in survival and event history analysis

Maximum likelihood estimation of a log-concave density based on censored data

Key Words: survival analysis; bathtub hazard; accelerated failure time (AFT) regression; power-law distribution.

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Multistate models and recurrent event models

Quantile Regression for Residual Life and Empirical Likelihood

Logistic regression model for survival time analysis using time-varying coefficients

Technical Report - 7/87 AN APPLICATION OF COX REGRESSION MODEL TO THE ANALYSIS OF GROUPED PULMONARY TUBERCULOSIS SURVIVAL DATA

Likelihood Construction, Inference for Parametric Survival Distributions

STAT 526 Spring Final Exam. Thursday May 5, 2011

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek

Multistate models and recurrent event models

Randomization Tests for Regression Models in Clinical Trials

PMC-optimal nonparametric quantile estimator. Ryszard Zieliński Institute of Mathematics Polish Academy of Sciences

Multistate Modeling and Applications

Lecture 22 Survival Analysis: An Introduction

Lecture 3. Truncation, length-bias and prevalence sampling

Accelerated life testing in the presence of dependent competing causes of failure

Cox s proportional hazards model and Cox s partial likelihood

Survival Analysis for Case-Cohort Studies

Multi-state Models: An Overview

Longitudinal + Reliability = Joint Modeling

Chapter 4 Fall Notations: t 1 < t 2 < < t D, D unique death times. d j = # deaths at t j = n. Y j = # at risk /alive at t j = n

Multi-state models: prediction

Chapter 2. Data Analysis

Extensions of Cox Model for Non-Proportional Hazards Purpose

3003 Cure. F. P. Treasure

STAT 6350 Analysis of Lifetime Data. Probability Plotting

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Tests of independence for censored bivariate failure time data

Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored Data

Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction

ST495: Survival Analysis: Maximum likelihood

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000)

Statistical Modeling and Analysis for Survival Data with a Cure Fraction

Lecture 7. Proportional Hazards Model - Handling Ties and Survival Estimation Statistics Survival Analysis. Presented February 4, 2016

SIMULTANEOUS CONFIDENCE BANDS FOR THE PTH PERCENTILE AND THE MEAN LIFETIME IN EXPONENTIAL AND WEIBULL REGRESSION MODELS. Ping Sa and S.J.

Package threg. August 10, 2015

Analysis of Time-to-Event Data: Chapter 2 - Nonparametric estimation of functions of survival time

Modified maximum likelihood estimation of parameters in the log-logistic distribution under progressive Type II censored data with binomial removals

5. Parametric Regression Model

Analysis of transformation models with censored data

Survival Analysis. STAT 526 Professor Olga Vitek

Economics 508 Lecture 22 Duration Models

The Design of a Survival Study

Step-Stress Models and Associated Inference

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

MAS3301 / MAS8311 Biostatistics Part II: Survival

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Package SimSCRPiecewise

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

Instrumental variables estimation in the Cox Proportional Hazard regression model

CTDL-Positive Stable Frailty Model

STAT 6385 Survey of Nonparametric Statistics. Order Statistics, EDF and Censoring

MODELING MISSING COVARIATE DATA AND TEMPORAL FEATURES OF TIME-DEPENDENT COVARIATES IN TREE-STRUCTURED SURVIVAL ANALYSIS

Investigation of goodness-of-fit test statistic distributions by random censored samples

Efficiency of an Unbalanced Design in Collecting Time to Event Data with Interval Censoring

NEW METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO SURVIVAL ANALYSIS AND STATISTICAL REDUNDANCY ANALYSIS USING GENE EXPRESSION DATA SIMIN HU

Credit risk and survival analysis: Estimation of Conditional Cure Rate

Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects

Lecture 4 - Survival Models

Distribution Fitting (Censored Data)

CURE FRACTION MODELS USING MIXTURE AND NON-MIXTURE MODELS. 1. Introduction

Bayesian semiparametric inference for the accelerated failure time model using hierarchical mixture modeling with N-IG priors

Frailty Models and Copulas: Similarities and Differences

Part III Measures of Classification Accuracy for the Prediction of Survival Times

A MEDIAN UNBIASED ESTIMATOR OF THE AR(1) COEFFICIENT

Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times

Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes

Semiparametric Regression

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Survival SVM: a Practical Scalable Algorithm

Parametric and Non Homogeneous semi-markov Process for HIV Control

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

Nonparametric Model Construction

Transcription:

A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR Agnieszka Rossa Dept. of Stat. Methods, University of Lódź, Poland Rewolucji 1905, 41, Lódź e-mail: agrossa@krysia.uni.lodz.pl and Ryszard Zieliński Inst. Math. Polish Acad. Sc. P.O.Box 137 Warszawa, Poland e-mail: rziel@impan.gov.pl ABSTRACT The celebrated Kaplan-Meter estimator (KME) suffers from a disadvantage: it may happen that estimated probabilities of survival for two different times t 1 and t 2 are equal each to other while t 1 and t 2 differ substantially. We propose a smoothinq of KME in such a way that the resulting estimator is a strictly decreasing function of time. The smoothed KME appears to be more accurate than the original one. 1. INTRODUCTION The celebrated Kaplan-Meter estimator (KME) suffers from a disadvantage: it may happen that estimated probabilities of survival for two different times t 1 and t 2 are equal each to other while t 1 and t 2 differ substantially. It is a consequence of the fact that KME, like typical empirical distiribution function, is piecewise constant. The disadvantage has been recognized since long ago and some smoothed versions have been, explicitly or implicitly, presented in the literature. Typical approach is to choose a smooth and strictly decreasing parametric representation for 1

the survival probablity and to estimate that from observations at hand. For example exponential and Weibull models has been used in Greenhouse and Silliman 1, Gompertz model in Gieser et al. 2 ), logistic, log-logistic and Weibull in Hauck et al. 3. Biganzoli et al. 4 presented a smoothed estimate of the discrete hazard function through artificial neural network (ANN) developed as Partial Logistic regression models with ANN (PLANN). A smooth prediction through a parametric transformation of the time axis is discussed in Byers et al. 5. An interesting nonparametric smoothing for survival distribution with strictly decreasing probability distribution function one can find in Xu and Prorok 6. The literature is abundant; to not overload our note with quotations we confine ourselves to the most recent results presented in Statistics in Medice. Our proposal for smoothing KME is to approximate a slightly modified version of KME locally by a suitable Weibull survival function. In practice it means that we fit the Weibull curve to two adjoining jump points of the original KME. The resulting estimator is a strictly decreasing function of time. It appears to be more accurate than the original KME. 2. MODEL AND ESTIMATION We assume a nonparametrical model: the survival probablity function is any continuous and strictly decreasing function F (t) for t 0 with F (0) = 1 and lim t F (t) = 0. Typical representatives are exponential, Weibull, gamma, generalized gamma, lognormal, Gompertz, Pareto, log-logistic, and exponential-power distribtuions, to mention the most popular among them (see e.g. Kalbfleisch and Prentice 7, Klein et al. 8 ). Every survival probability function may be locally approximated with a prescribed level of accuracy by a Weibull W (t; λ, α) survival probability function of the form W (t; λ, α) = exp{ λt α }. For that reason we construct our estimator, to be denoted by S 2 (t) (a reason for the subcript will become clear later), as follows. Denote by t 1, t 2,..., t N the jump points of KME, by P 1, P 2,..., P N the values of KME at those points, and by P 1, P 2,..., P N the arithmetic means of KME in close left-hand and right-hand vinicities of a given point t (by the very definition, 2

at the point t i KME jumps down from the level P i 1 to the level P i (we define t 0 = 0 and P 0 = 1). Hence we define P i = (P i 1 + P i )/2 for i = 1, 2,..., N 1; for i = N we define P N = P N /2 if the last observation is censored and P N = P N otherwise. We shall illustrate our considerations using the well known data on the effect of 6-mercaptopurine on the duration of steroid-induced remission in acute leukemia taken from Freireich at al. 9 (see also Marubini and Valsecchi 10 ). The survival times of 21 clinical patients were 6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35 (1) where denotes a censored observation. Kaplan-Meier estimator for that data is presented in Fig. 1 and in the following table: 1.00 0.75 0.50 0.25 0 6 7 10 13 16 22 23 35 Fig.1. Kaplan-Meier estimator for data (3) Tab.1. KME and modified KME for data (1) i 1 2 3 4 5 6 7 8 t i 6 7 10 13 16 22 23 35 P i.857.807.753.690.627.538.448.448 P i.928.832.780.722.659.583.493.448 To estimate the survival probability for a given t we define our estimator S 2 (t) as follows. If t = t i for some i, then S 2 (t) = P i. 3

If 0 < t t N and t i < t < t i+1 then we choose a Weibull survival probability function which pass through the points (t i, P i ) and (t i+1, P i+1 ) and then as the value of our estimator S 2 (t) we take the value of the fitted Weibull survival probability function at that point t. It amounts to finding values of λ and α, say ˆλ and ˆα, such that W (t i ; ˆλ, ˆα) = P i and W (t i+1 ; ˆλ, ˆα) = P i+1 (2) Then S 2 (t) = W (t; ˆλ, ˆα). Solving (1) amounts to solving, with respect to Λ and α, the simple set of two linear equations { α log ti + Λ = log( log P i ) α log t i+1 + Λ = log( log P i+1 ) (2 ) with Λ = log λ. If t > t N than we proceed as follows: if the last observed t N is a censoring time, our estimator, like the original KME, is not defined; otherwise we solve (2) for i = N 1 (we extrapolate the Weibull curve which is based on two largest not censored observations). 1.00. 0.75 0.50 0.25 0 6 7 10 13 16 22 23 35 Fig.4. Kaplan-Meier and S 2 estimators for data (3) 4

Estimator S 2 (t) for data (1), as well as original KME, are presented in Fig. 2. For example, if t = 25 or t = 33 the original KME gives us the predicted survival equal to 0.448 in both cases, while our estimator gives us S 2 (25) = 0.484 and s 2 (33) = 0.456, respectively. Similarly, for t = 17 and t = 20 KME is equal to 0.627, S 2 (17) = 0.645, and S 2 (20) = 0.607. Between the two points KME is constant while S 2 (t) strictly decreases. 3. SIMULATION To assess teh accuracy of the new estimator we performed a great number of computer simulations. It appeared that Mean Square Error and Mean Absolute Deviation were significantly smaller. Also Pitman s Measure of Closeness advocates for our estimator. Detailed numerical results are given in a technical report (Rossa and Zieliński 11 ) which we can sent to an interested reader in a TeX-file form. 4. DISCUSSION The proposed estimator S 2 (t) is based on a local fitting a Weibull survival probability to two neighbouring step points of the modified KME. One could expect that a similar estimator S k (t) based on k neighbours would perform better. It evidently gives us a better smoothing but any interval on time axis which contains k > 2 neighbouring points is of course larger than that for k = 2 which may result in a poorer local approximation of an unknown survival curve from a nonparametric family by a Weibull one. Also some practical questions arise: instead of solving (2) or (2 ) one has to apply a technic of fitting two-parameter Weibull curve to k > 2 points, for example a version of the least square method. All these advocate for a very simple but still quite satisfactory estimator S 2 (t). REFERENCES 1. Greenhouse, J.B., and Silliman, N.P. Applications of a mixture survival model with covariates to the analysis of depression prevention trial SM 15, 2077-2094 (1996) 5

2. Gieser, P.W., Chang, M.N., Rao, P.V., Shuster, J.J., and Pullen, J. Modelling cure rates using the Gompertz model with covariate information SM 17, 831-839 (1998) 3. Hauck, W.W., McKee, L.J., and Turner, B.J. Two-part survival models applied to administrative data for determining rate of and predictors for maternalchild transmission of HIV SM 16, 1683-1694 (1997) 4. Biganzoli, E., Boracchi, P., Mariani, L., and Marubini, E. Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach, Statistics in Medicine, 17, 1169-1186 (1998) 5. Byers, R.H. Jr., Caldwell, M.B., Davis, S., Gwinn, M., and Lindegren, M.L. Projection of AIDS and HIV incidence among children born infected with HIV SM 17, 169-181 (1998) 6. Xu, J.-L. and Prorok, P.C. Non-parametric estimation of the post-lead-time survival distribution of screen-detected cancer cases, Statistics in Medicine, 14,, 2715 2725 (1995). 7. Kalbfleisch, J.D. and Prentice, R.L. The statistical analysis of failure time data. Wiley (1980) 8. Klein, J.P., Lee, S.C. and Moeschberger, M.L. A partially parametric estimator of survival in the presence of randomly censored data Biometrics, 46, 795 811 (1990). 9. Freireich, E.O. et al. The effect of 6-mercaptopurine on the duration of steroid-induced remission in acute leukemia: a model for evaluation of other potentially useful therapy, Blood, 21, 699 716 (1963). 10. Marubini, E. and Valsecchi, M.G. Analysing Survival Data from Clinical Trials and Observational Studies, Wiley (1995). 11. Rossa, A. and Zieliński, R. Locally Weibull Smoothed Kaplan Meier EStimator, Institute of Mathematics Polish Academy of Sciences, Preprint 599, November 1999. 6