Statistical Analysis of Competing Risks With Missing Causes of Failure


Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session STS9) p.1223

Isha Dewan (1,3) and Uttara V. Naik-Nimbalkar (2)
1 Indian Statistical Institute, New Delhi, INDIA
2 Department of Statistics, University of Pune, Pune, INDIA
3 Corresponding Author: Isha Dewan, isha@isid.ac.in

April 15, 2013

Abstract

In the competing risks model, a unit is exposed to several risks at the same time, but it is assumed that the eventual failure of the unit is due to only one of these risks, which is called a cause of failure. Thus the competing risks data consist of the failure time and the cause of failure of each unit on test. Statistical inference procedures for the case in which both the time to failure and the cause of failure are observed for each unit are well documented. In this paper we address the problem in which the cause of failure may be unknown for some units. Several articles have proposed estimation of the survival or the sub-survival function in this situation. However, the problem of testing whether the risks are equal or some risk dominates the other has not received much attention. We review some of the estimation procedures and propose tests for the equality of the risks based on the sub-distribution functions, sub-survival functions and cause-specific hazard rates.

Key Words: Failure time, missing failure causes, Kaplan-Meier, cause-specific failure rate

1 Introduction

Consider a series system of $k$ components. When the system fails we know the system failure time $T$ and a random variable $\delta$, where $\delta = j$, $j = 1, 2, \ldots, k$, if the failure of the $j$th component led to system failure.

Apart from reliability, such competing risks data also arise in survival analysis, where $T$ is the failure time and $\delta$ the cause of failure of the patient; in economics, where $T$ is the time spent in an unemployment register and $\delta$ indicates the type of job an individual gets; and in the social sciences, where $T$ could be the time of marriage and $\delta$ indicates the cause of the end of the marriage, say death or divorce. For other examples see Crowder (2001).

The joint distribution of $(T, \delta)$ can be expressed in terms of the sub-distribution function $F(j, t)$ or the sub-survival function $S(j, t)$ of risk $j$, $j = 1, \ldots, k$, which are defined, respectively, as

$$F(j, t) = P[T \le t, \delta = j], \tag{1.1}$$

$$S(j, t) = P[T > t, \delta = j]. \tag{1.2}$$

The overall distribution function and the survivor function of the lifetime $T$ are given by $F(t)$ and $S(t)$, respectively. Let $f(j, t)$ denote the sub-density function corresponding to risk $j$ and let the density function of $T$ be $f(t)$. We know that

$$F(t) = \sum_{j=1}^{k} F(j, t), \qquad S(t) = \sum_{j=1}^{k} S(j, t), \qquad f(t) = \sum_{j=1}^{k} f(j, t).$$

The cause-specific hazard rates for $j = 1, \ldots, k$ are defined as

$$\lambda(j, t) = f(j, t)/S(t).$$

This is the instantaneous rate of failure of the unit due to the $j$th cause, given that the unit has survived up to time $t$.

One would want to know whether all risks are equally important for the unit. For example, if a series system fails repeatedly due to the same component, then one could improve the reliability of that component so as to ensure better functioning of the system. In mathematical terms such hypotheses can be written in terms of the sub-distribution functions, sub-survival functions or cause-specific hazards as follows.

$$H_{01}: F(1, t) = \cdots = F(k, t),$$
$$H_{02}: S(1, t) = \cdots = S(k, t),$$
$$H_{03}: \lambda(1, t) = \cdots = \lambda(k, t).$$

Dewan and Deshpande (2005) noted the following result when there are 2 risks of failure.

Theorem 1: Under the null hypothesis of bivariate symmetry of the hypothetical failure times due to two risks we have
(i) $F(1, t) = F(2, t)$ for all $t$,
(ii) $S(1, t) = S(2, t)$ for all $t$,
(iii) $\lambda(1, t) = \lambda(2, t)$ for all $t$,
(iv) $P[\delta = 1] = P[\delta = 2]$,
(v) $T$ and $\delta$ are independent.

Thus, the following testing problems can be looked at:

$$\begin{aligned}
&H_0: F(1, t) = F(2, t) \text{ for all } t \quad \text{against} \quad H_1: F(1, t) \le F(2, t) \text{ with strict inequality for some } t, \\
&H_0: S(1, t) = S(2, t) \text{ for all } t \quad \text{against} \quad H_2: S(1, t) \le S(2, t) \text{ with strict inequality for some } t, \\
&H_0: \lambda(1, t) = \lambda(2, t) \text{ for all } t \quad \text{against} \quad H_3: \lambda(1, t) \le \lambda(2, t) \text{ with strict inequality for some } t.
\end{aligned} \tag{1.3}$$
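To make definitions (1.1) and (1.2) concrete, the following minimal Python sketch computes empirical versions of the sub-distribution and sub-survival functions for the simplest setting, in which every failure time and every cause is observed. The function names and the toy data are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def empirical_subdistribution(times, causes, j, t):
    """Empirical version of F(j, t) = P[T <= t, delta = j] (all causes observed)."""
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes)
    return float(np.mean((times <= t) & (causes == j)))


def empirical_subsurvival(times, causes, j, t):
    """Empirical version of S(j, t) = P[T > t, delta = j] (all causes observed)."""
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes)
    return float(np.mean((times > t) & (causes == j)))


# Hypothetical toy data: six units, two risks, no missing causes.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
causes = [1, 2, 2, 1, 2, 2]

F1 = empirical_subdistribution(times, causes, 1, 3.0)
F2 = empirical_subdistribution(times, causes, 2, 3.0)
S1 = empirical_subsurvival(times, causes, 1, 3.0)
S2 = empirical_subsurvival(times, causes, 2, 3.0)

# F1 + F2 is the overall empirical distribution function at t = 3, and
# F(j, t) + S(j, t) is the empirical proportion of failures from cause j.
print(F1, F2, S1, S2)
```

In this toy sample risk 2 has the larger empirical sub-distribution and sub-survival function at $t = 3$, which is the direction of the one-sided alternatives in (1.3).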

Under each of the three alternatives, risk 2 is stochastically more effective than risk 1. The three hypotheses mentioned above are not equivalent, but $H_3$ implies both $H_1$ and $H_2$. Sometimes the sub-distribution functions may cross while the sub-survival functions are ordered, or vice versa. These hypotheses are discussed by Carriere and Kochar (2000).

The experimenter may know only the failure time for the units under consideration; the cause of failure of some units may not be available. Kodel and Chen (1987) considered an example from animal bioanalysis where all causes were not available. Lapidus et al. (1994) observed that 4 percent of the death certificates of people who had died in motor accidents had no information on causes. Dinse (1982), Dewanji (1992), Goetghebeur and Ryan (1995), Dewanji and Sengupta (2003), and Lu and Tsiatis (2005) considered likelihood based estimation of the cause-specific failure rates when causes of failure are unknown. Goetghebeur and Ryan (1990) considered a modified log rank test for competing risks with missing failure type. Miyakawa (1984) obtained MLEs and MVUEs of the parameters of the exponential distribution for the missing case. Kundu and Basu (2000) discussed approximate and asymptotic properties of these estimators and obtained confidence intervals. For recent work on competing risks with missing data see Hyun et al. (2012), Wang and Yu (2012), Yu and Li (2012), Sun et al. (2012), Datta et al. (2010), etc.

In what follows we consider a nonparametric test of $H_0$ against $H_3$ when we have information on the time to failure $T_1, T_2, \ldots, T_n$ for all $n$ units but the causes of failure $\delta_1, \delta_2, \ldots, \delta_n$ may not all be known. Let $O_i$ be an indicator variable which takes the value one if $\delta_i$ is observed and zero if $\delta_i$ is missing. The indicator variables $O_1, \ldots, O_n$ are observed.
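The observed data in this setting are triples of failure time, reported cause and observation indicator. The sketch below generates such data under assumptions that are ours and not the paper's: two independent exponential latent failure times, and causes that go missing completely at random with a fixed probability. The function name, the parameter names and the use of 0 to encode a missing cause are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_masked_competing_risks(n, rate1, rate2, p_observed, seed=0):
    """Simulate (T_i, reported cause, O_i) for two risks with masked causes.

    Assumptions (illustrative only): independent exponential latent times
    and causes missing completely at random with probability 1 - p_observed.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.exponential(scale=1.0 / rate1, size=n)  # latent time for risk 1
    x2 = rng.exponential(scale=1.0 / rate2, size=n)  # latent time for risk 2
    t = np.minimum(x1, x2)                           # observed failure time T_i
    delta = np.where(x1 <= x2, 1, 2)                 # true cause delta_i
    o = rng.binomial(1, p_observed, size=n)          # O_i = 1 if cause observed
    delta_reported = np.where(o == 1, delta, 0)      # 0 encodes a missing cause
    return t, delta_reported, o


t, delta_reported, o = simulate_masked_competing_risks(
    n=200, rate1=1.0, rate2=2.0, p_observed=0.7
)
print(t[:5], delta_reported[:5], o[:5])
```

The failure times are available for every unit, while the reported cause carries information only for units with $O_i = 1$.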
2 Test $H_0$ against $H_3$

We base the test on the following counting processes.

$$N_1^{(n)}(t) = \sum_{i=1}^{n} I[T_i \le t, \delta_i = 1, O_i = 1], \qquad N_2^{(n)}(t) = \sum_{i=1}^{n} I[T_i \le t, \delta_i = 2, O_i = 1],$$

$$N_3^{(n)}(t) = \sum_{i=1}^{n} I[T_i \le t, \delta_i = 1, O_i = 0], \qquad N_4^{(n)}(t) = \sum_{i=1}^{n} I[T_i \le t, \delta_i = 2, O_i = 0].$$

The corresponding intensity functions are $\alpha(j; t) Y^{(n)}(t)$, $j = 1, 2, 3, 4$, where $Y^{(n)}(t) = \sum_{i=1}^{n} I[T_i > t]$ and the $\alpha(j; t)$ are the cause-specific hazard functions. Let $A_1(t)$, $A_2(t)$, $A_{1,3}(t)$, $A_{2,4}(t)$ and $A_{3,4}(t)$ denote the cumulative hazard functions corresponding to the counting processes $N_1^{(n)}$, $N_2^{(n)}$, $N_{1,3}^{(n)} = N_1^{(n)} + N_3^{(n)}$, $N_{2,4}^{(n)} = N_2^{(n)} + N_4^{(n)}$ and $N_{3,4}^{(n)} = N_3^{(n)} + N_4^{(n)}$. The processes $N_3^{(n)}$ and $N_4^{(n)}$ are not observable, but their sum $N_{3,4}^{(n)}$ is. Further suppose that

$$\beta(t) = P[\delta = 1 \mid O = 0, T = t]$$

is a known function. Our interest is in comparing $\lambda(1; t)$ and $\lambda(2; t)$, which are the cause-specific hazard functions corresponding respectively to the processes $N_{1,3}^{(n)}$ and $N_{2,4}^{(n)}$. Then

$$\lambda(1; t) = \alpha(1; t) + \alpha(3; t), \qquad \lambda(2; t) = \alpha(2; t) + \alpha(4; t).$$
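As an illustration of which quantities are actually computable from the data, the sketch below evaluates the observable counting processes $N_1^{(n)}(t)$, $N_2^{(n)}(t)$ and $N_{3,4}^{(n)}(t)$ together with $Y^{(n)}(t)$ at a single time point. The function name, the toy data and the convention of storing a missing cause as 0 are assumptions made for this example.

```python
import numpy as np


def counting_processes(times, causes, observed, t):
    """Return (N1(t), N2(t), N34(t), Y(t)) for two risks with masked causes.

    causes[i] is only used when observed[i] == 1; a missing cause may be
    stored as any placeholder (0 here, a hypothetical convention).
    """
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes)
    observed = np.asarray(observed)
    failed = times <= t
    n1 = int(np.sum(failed & (observed == 1) & (causes == 1)))
    n2 = int(np.sum(failed & (observed == 1) & (causes == 2)))
    n34 = int(np.sum(failed & (observed == 0)))  # N3 + N4, the observable sum
    y = int(np.sum(times > t))                   # Y(t) = number with T_i > t
    return n1, n2, n34, y


# Hypothetical toy data: six units, two of them with a missing cause.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
causes = [1, 2, 0, 2, 0, 2]
observed = [1, 1, 0, 1, 0, 1]

n1, n2, n34, y = counting_processes(times, causes, observed, t=3.0)
print(n1, n2, n34, y)
```

At any time point the three counts and the number still at risk add up to the sample size, since every unit has either failed with an observed cause, failed with a missing cause, or not yet failed.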

Suppose $\alpha_{3,4}(t) Y^{(n)}(t)$ denotes the intensity function of the process $N_{3,4}^{(n)}(t)$. Then we have

$$\alpha(3; t) = \beta(t)\,\alpha_{3,4}(t) \quad \text{and} \quad \alpha(4; t) = (1 - \beta(t))\,\alpha_{3,4}(t).$$

The Nelson-Aalen estimators of $A_{1,3}(t)$ and $A_{2,4}(t)$ are then, respectively,

$$\hat{A}_{1,3}(t) = \int_0^t \frac{1}{Y^{(n)}(s)}\, dN_1^{(n)}(s) + \int_0^t \frac{\beta(s)}{Y^{(n)}(s)}\, dN_{3,4}^{(n)}(s),$$

$$\hat{A}_{2,4}(t) = \int_0^t \frac{1}{Y^{(n)}(s)}\, dN_2^{(n)}(s) + \int_0^t \frac{1 - \beta(s)}{Y^{(n)}(s)}\, dN_{3,4}^{(n)}(s).$$

A test of the hypothesis $H_0: \lambda(1; t) = \lambda(2; t)$ for all $t$ against $H_3$ is based on the statistic

$$S_{2n}(\tau) = \hat{A}_{2,4}(\tau) - \hat{A}_{1,3}(\tau),$$

where $\tau$ is some fixed time point. Under the null hypothesis $S_{2n}(\tau)$ has zero mean. Large positive values of $S_{2n}$ show evidence against $H_0$. Using the Doob-Meyer decompositions of the counting processes and the Rebolledo martingale central limit theorem (Andersen et al., 1993, p. 83) we obtain the following theorem.

Theorem 2: Suppose the sub-distribution functions are absolutely continuous. As $n \to \infty$ and under $H_0: \lambda(1; t) = \lambda(2; t)$, the process $\{\sqrt{n}\, S_{2n}(t),\ 0 \le t \le \tau\}$ converges weakly to a process $U = \{U(t),\ 0 \le t \le \tau\}$ in $D[0, \tau]$ with the Skorohod topology, where $\{U(t),\ 0 \le t \le \tau\}$ is a continuous zero-mean martingale and $\operatorname{cov}(U(s), U(t)) = \sigma^2(\min(t, s))$ with

$$\sigma^2(t) = \int_0^t \frac{\alpha(1; s) + \alpha(2; s) + (1 - 2\beta(s))^2\, \alpha_{3,4}(s)}{S(s)}\, ds.$$

Here $D[0, \tau]$ denotes the space of real valued functions on $[0, \tau]$ that are continuous from the right and have limits from the left everywhere.

A consistent estimator for $\sigma^2(t)$ is given by

$$\hat{\sigma}_n^2(t) = n \left[ \int_0^t \frac{dN_1^{(n)}(s)}{\left(Y^{(n)}(s)\right)^2} + \int_0^t \frac{dN_2^{(n)}(s)}{\left(Y^{(n)}(s)\right)^2} + \int_0^t \frac{(1 - 2\beta(s))^2\, dN_{3,4}^{(n)}(s)}{\left(Y^{(n)}(s)\right)^2} \right],$$

where the factor $n$ matches the $\sqrt{n}$ scaling of the statistic, since $Y^{(n)}(s)/n$ estimates $S(s)$. From the above results we have that

$$\frac{\sqrt{n}\, S_{2n}(\tau)}{\hat{\sigma}_n(\tau)} \to N(0, 1) \text{ in distribution, as } n \to \infty.$$

Thus a value of $\sqrt{n}\, S_{2n}(\tau)/\hat{\sigma}_n(\tau)$ larger than 1.64 shows evidence in favor of $H_3$ against $H_0$ at the 5% level of significance.
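The test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: failure times are assumed untied (in line with the absolute continuity condition), `beta` is a user-supplied known function, the at-risk count at a failure time $s$ is taken as the number of units with $T_i \ge s$ (the usual convention for Nelson-Aalen increments, which keeps the denominator positive at the last failure), and the variance estimate is $n$ times the sum of squared increments so that it pairs with the $\sqrt{n}$-scaled statistic. The function name and the toy data are hypothetical.

```python
import math

import numpy as np


def equal_hazards_test(times, causes, observed, beta, tau):
    """Return (S_2n(tau), variance estimate, standardized statistic).

    times, causes, observed : per-unit failure time, cause (1 or 2, ignored
        when the cause is missing) and indicator O_i.
    beta : callable t -> P[delta = 1 | O = 0, T = t], assumed known.
    """
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes)
    observed = np.asarray(observed)
    n = len(times)
    a13 = a24 = var_sum = 0.0
    for t_i, d_i, o_i in zip(times, causes, observed):
        if t_i > tau:
            continue
        y = int(np.sum(times >= t_i))  # at risk just before t_i (assumption)
        if o_i == 1 and d_i == 1:      # jump of N1
            a13 += 1.0 / y
            var_sum += 1.0 / y**2
        elif o_i == 1 and d_i == 2:    # jump of N2
            a24 += 1.0 / y
            var_sum += 1.0 / y**2
        else:                          # jump of N34, split by beta
            b = beta(t_i)
            a13 += b / y
            a24 += (1.0 - b) / y
            var_sum += (1.0 - 2.0 * b) ** 2 / y**2
    s2n = a24 - a13
    sigma2 = n * var_sum
    z = math.sqrt(n) * s2n / math.sqrt(sigma2)
    return float(s2n), float(sigma2), float(z)


# Hypothetical toy data: four units, one missing cause, beta(t) = 0.5.
times = [1.0, 2.0, 3.0, 4.0]
causes = [1, 2, 0, 2]
observed = [1, 1, 0, 1]
s2n, sigma2, z = equal_hazards_test(times, causes, observed, lambda t: 0.5, tau=4.0)
print(s2n, sigma2, z)  # z is about 1.0 here, below the 1.64 cutoff
```

With $\beta(t) = 1/2$ the missing-cause failures contribute equally to both cumulative hazard estimates and add nothing to the variance, so the comparison is driven entirely by the failures with observed causes. A sample of four units is far too small for the normal approximation and is used only to keep the arithmetic checkable by hand.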

3 Conclusion

We propose a test of the equality of two risks against the dominance of one of them. We have assumed that the function $\beta(t)$ is known. In practice a form will have to be assumed for it. A Kolmogorov-Smirnov type test of $H_0$ against $H_1$ can be obtained with the estimators of the sub-distribution functions based on the counting processes defined in the previous section.

Two other ways of handling missing causes of failure are either to ignore the missing data and use the available tests with a reduced sample size, or to use imputation procedures, that is, to generate the missing data. In the first case the size of the sample is random, and in the second case some distributional assumptions are needed.

4 References

Andersen, P. K., Borgan, Ø., Gill, R. D. and Keiding, N. (1993). Statistical models based on counting processes. New York: Springer.
Carriere, K. C. and Kochar, S. C. (2000). Comparing sub-survival functions in a competing risks model. Lifetime Data Anal., 6, 85-97.
Crowder, M. J. (2001). Classical Competing Risks. Chapman and Hall.
Datta, S., Bandyopadhyay, D. and Satten, G. A. (2010). Inverse probability of censoring weighted U-statistics for right-censored data with an application to testing hypotheses. Scandinavian Journal of Statistics, 37, 680-700.
Dewan, I. and Deshpande, J. V. (2005). Tests for some statistical hypotheses for dependent competing risks - A review. Modern Statistical Methods in Reliability, eds. Wilson, A. et al., 137-152, World Scientific, New Jersey.
Dewanji, A. (1992). A note on a test for competing risks with general missing pattern in failure types. Biometrika, 79, 855-857.
Dewanji, A. and Sengupta, D. (2003). Estimation of competing risks with general missing pattern in failure types. Biometrics, 59, 1063-1070.
Dinse, G. E. (1986). Nonparametric prevalence and mortality estimators for animal experiments with incomplete cause-of-death data. J. Amer. Statist. Assoc., 81, 328-336.
Goetghebeur, E. and Ryan, L. (1990). A modified log rank test for competing risks with missing failure type. Biometrika, 77, 207-211.
Goetghebeur, E. and Ryan, L. (1995). Analysis of competing risks survival data when some failure types are missing. Biometrika, 82, 821-833.
Hyun, S., Lee, J. and Sun, Y. (2012). Proportional hazards model for competing risks data with missing cause of failure. J. Statist. Plann. Inference, 142, 1767-1779.
Kochar, S. C. (1995). A review of some distribution-free tests for the equality of cause specific hazard rates. Analysis of Censored Data, eds. Koul, H. L. and Deshpande, J. V., IMS, Hayward, 147-162.
Kodel, R. K. and Chen, J. J. (1987). Handling cause of death in equivocal cases using the EM algorithm. Comm. Statist. Theory Methods, 16, 2565-263.
Kundu, D. and Basu, S. (2000). Analysis of incomplete data in presence of competing risks. J. Statist. Plann. Inference, 87, 221-239.
Lapidus, G., Braddock, M., Schwartz, R., Banco, L. and Jacobs, L. (1994). Accuracy of fatal motorcycle injury reporting on death certificates. Accident Anal. Prevention, 26, 535-542.
Lu, K. and Tsiatis, A. (2005). Comparison between two partial likelihood approaches for the competing risks model with missing cause of failure. Lifetime Data Anal., 11, 29-40.
Miyakawa, M. (1984). Analysis of incomplete data in competing risks model. IEEE Transactions on Reliability, 33, 293-296.
Wang, J. and Yu, Q. (2012). Consistency of the generalized MLE with interval-censored and masked competing risks data. Comm. Statist. Theory Methods, 41, 4360-4377.
Yu, Q. and Li, J. (2012). The NPMLE of the joint distribution function with right-censored and masked competing risks data. J. Nonparametr. Stat., 24, 753-764.
Sun, Y., Wang, H. J. and Gilbert, P. B. (2012). Quantile regression for competing risks data with missing cause of failure. Statist. Sinica, 22, 703-728.