TESTS FOR LOCATION WITH K SAMPLES UNDER THE KOZIOL-GREEN MODEL OF RANDOM CENSORSHIP

Ke Wu
Department of Mathematics
University of Mississippi
University, MS 38677

Key Words: K-sample location test; Koziol-Green model; random censorship; maximum likelihood estimator; product limit estimator.

ABSTRACT

This paper considers tests for location with k samples and randomly right-censored data. Under the Koziol-Green model of random censorship, in which the survival distribution of the censoring times is some power of the survival distribution of the lifetimes, a class of k-sample location tests is developed. The tests are similar to those discussed by James (1987) and allow shapes to differ in the k populations. They are based on general estimating functions and do not require full parametric assumptions for large samples. The asymptotic distribution of the test scores is studied.

1. INTRODUCTION

In survival analysis, one is often interested in comparing several groups (or treatments) in terms of their means, medians, or distributions when the data are possibly censored. Several tests for right-censored data have been proposed in the literature to test equality of distributions or equality of medians; see, for example, Gehan (1965), Breslow (1970), Peto and Peto (1972), Prentice (1978), Gill (1980), Brookmeyer & Crowley (1982), Harrington & Fleming (1982), and Leurgans (1983). James (1987) proposed a class of tests for location with k samples and right-censored data, which allows shapes to differ in the distributions.

The Koziol-Green (1976) model is an important particular pattern of censorship. In the Koziol-Green (K-G) model one assumes that the lifetime distribution function $F$ and the censoring distribution function $G$ satisfy $1 - G = (1 - F)^{\gamma}$ for some constant $\gamma > 0$. Koziol & Green (1976) show that this pattern of censorship often occurs in clinical trials. In this paper we develop a class of tests, similar to those discussed by James (1987), for location with k samples under the K-G model of random censorship. The tests are based on distribution-free estimating equations similar to those discussed by James (1986) and Buckley & James (1979). They do not require full parametric assumptions for large samples, but they are analogous in form to parametric score tests.

Abdushukurov (1984) and Cheng and Lin (1984, 1987) independently proposed the maximum likelihood estimator (MLE) of the survival distribution function and discussed the advantage of using the MLE instead of the product-limit estimator (PLE) of Kaplan & Meier (1958) under the K-G model. Wu, Yang and Duran (1999) have shown that, in the two-sample scale problem with right censoring, the minimum $L_2$-distance estimator of the scale parameter based on the MLE of the survival function is asymptotically more efficient than the estimator based on the PLE of the survival function under the K-G model. Stute (1992) has shown that in the estimation of the integral $\int \phi\,dF$, where $F$ is the lifetime distribution function and $\phi$ is a preassigned $F$-integrable function, $\int \phi\,dF_n$ outperforms $\int \phi\,d\hat{F}_n$ under the K-G model of random censorship, where $F_n$ is the MLE of $F$ and $\hat{F}_n$ is the PLE of $F$. The asymptotic distribution of $\int \phi\,dF_n$ has been established by Dikta (1995). In our new tests for location with k samples under the K-G model we use the MLE of the survival distribution function instead of the PLE that is used in James (1987) and Buckley & James (1979).

Section 2 contains the general formulation and rationale of the k-sample location tests under the K-G model of random censorship, where distributional shapes are not constrained to be equal in the k populations.
Section 3 gives the asymptotic distribution of the test statistics.

2. K-SAMPLE LOCATION TESTS UNDER THE K-G MODEL OF RANDOM CENSORSHIP

Let $Y_{ij}$ denote the $j$th independent observation drawn from the $i$th population, where $j = 1, \ldots, n_i$, $i = 1, \ldots, k$, and let $C_{ij}$ denote the censoring value corresponding to $Y_{ij}$. In the right censoring model, the observable quantities are

$$Z_{ij} = \min(Y_{ij}, C_{ij}), \qquad \delta_{ij} = I(Y_{ij} \le C_{ij}), \tag{2.1}$$

where $I$ is the indicator function. We say that the observation $Z_{ij}$ is uncensored if $\delta_{ij} = 1$ and censored if $\delta_{ij} = 0$. Denote by $S_i(y) = \mathrm{pr}(Y_{ij} > y)$ the survival function for the $i$th population, and by $F_i = 1 - S_i$ the distribution function of the survival times. Denote by $G_i(y) = \mathrm{pr}(C_{ij} \le y)$ the distribution function of the censoring times. We assume that $F_i$ and $G_i$ are absolutely continuous, with density functions and finite variances, and that $Y_{i1}, \ldots, Y_{in_i}$ are independent of $C_{i1}, \ldots, C_{in_i}$.

We consider the following k-sample location model:

$$S_i(y) = H(y - \alpha - \beta x_i;\ \theta_i), \qquad i = 1, \ldots, k, \tag{2.2}$$

where $H$ is an absolutely continuous function with density function $h$, $\alpha$ is a scalar parameter, $\beta$ is a $(k-1)$-dimensional row vector of parameters, $x_i$ is a $(k-1)$-dimensional column vector of dummy covariates associated with each $Y_{ij}$ which indicates membership of the $i$th sample, and $\theta_i$ is a vector of $p$ nuisance parameters which allow the shapes of the distributions to differ in different populations.

If $H$ is known, the log likelihood function may be written

$$\log l = \sum_{i,j} \bigl\{ \delta_{ij} \log h(Y_{ij} - \alpha - \beta x_i;\ \theta_i) + (1 - \delta_{ij}) \log H(Z_{ij} - \alpha - \beta x_i;\ \theta_i) \bigr\}, \tag{2.3}$$

and the score functions may be obtained as

$$\frac{\partial \log l}{\partial \alpha} = \sum_{i,j} \bigl\{ \delta_{ij}\,\phi_{ij} + (1 - \delta_{ij})\,E(\phi_{ij} \mid Y_{ij} > C_{ij}) \bigr\}, \tag{2.4}$$

$$\frac{\partial \log l}{\partial \beta_s} = \sum_{i,j} \bigl\{ x_{is} \bigl[ \delta_{ij}\,\phi_{ij} + (1 - \delta_{ij})\,E(\phi_{ij} \mid Y_{ij} > C_{ij}) \bigr] \bigr\}, \tag{2.5}$$

$$\frac{\partial \log l}{\partial \theta_{ir}} = \sum_{j} \bigl\{ \delta_{ij}\,\eta_{ijr} + (1 - \delta_{ij})\,E(\eta_{ijr} \mid Y_{ij} > C_{ij}) \bigr\}, \tag{2.6}$$

for $s = 1, \ldots, k-1$, $r = 1, \ldots, p$, where $\beta_s$ and $x_{is}$ are the $s$th components of $\beta$ and $x_i$ respectively, $\theta_{ir}$ is the $r$th component of $\theta_i$, and

$$\phi_{ij} = \frac{\partial \log h(Y_{ij} - \alpha - \beta x_i;\ \theta_i)}{\partial \alpha}, \tag{2.7}$$

$$\eta_{ijr} = \frac{\partial \log h(Y_{ij} - \alpha - \beta x_i;\ \theta_i)}{\partial \theta_{ir}}. \tag{2.8}$$

By equating (2.4), (2.5) and (2.6) to zero, we can obtain the maximum likelihood estimates of $\alpha$, $\beta = (\beta_1, \ldots, \beta_{k-1})$, and $\theta = (\theta_1, \ldots, \theta_k)$. Suppose that we are interested in testing $\beta = \beta_0$. The score statistic is then obtained from the vector $[\partial \log l / \partial \beta]_{\hat\alpha, \hat\theta, \beta_0}$, where $\hat\alpha$ and $\hat\theta$ are the restricted maximum likelihood estimates of $\alpha$ and $\theta$, obtained by equating (2.4) and (2.6) to zero with $\beta = \beta_0$ fixed.

Our purpose here is to test $\beta = \beta_0$ when $H$ is not specified, but instead we have some functions $w_i$, $i = 1, \ldots, k$, such that if $W_{ij} = w_i(Y_{ij} - \alpha - \beta x_i)$ then $E(W_{ij}) = 0$ for all $i$, $j$. For example, if $w_i(u) = u$ the model becomes $Y_{ij} = \alpha + \beta x_i + \varepsilon_{ij}$ with $E(\varepsilon_{ij}) = 0$, and we are testing hypotheses of equality of means in k populations. If $w_i(u) = I(u > 0) - \tfrac{1}{2}$, the $\varepsilon_{ij}$ have median 0 and we are testing equality of k medians.

Note that, from Buckley & James (1979), we have

$$E\bigl\{ \delta_{ij} W_{ij} + (1 - \delta_{ij})\,E(W_{ij} \mid Y_{ij} > C_{ij}) \bigr\} = E(W_{ij}) = 0. \tag{2.9}$$

This suggests that the suitable estimating functions analogous to (2.4) and (2.5) would be

$$\sum_{i,j} \bigl\{ \delta_{ij} W_{ij} + (1 - \delta_{ij})\,E(W_{ij} \mid Y_{ij} > C_{ij}) \bigr\}$$

and

$$\sum_{i,j} \bigl\{ x_{is} \bigl[ \delta_{ij} W_{ij} + (1 - \delta_{ij})\,E(W_{ij} \mid Y_{ij} > C_{ij}) \bigr] \bigr\},$$

respectively. The conditional expectation $E(W_{ij} \mid Y_{ij} > C_{ij})$ depends on the unknown survival function $S_i$. Buckley & James (1979) proposed an estimator for $E(W_{ij} \mid Y_{ij} > C_{ij})$ based on the product-limit estimator (Kaplan and Meier, 1958) of the survival function $S_i$ for specified $\alpha$ and $\beta$, given by

$$\hat{S}_i(e;\ \alpha, \beta) = \prod_{\hat{e}_{(ij)} \le e} \bigl( 1 - d_{(ij)} / n_{(ij)} \bigr)^{\delta_{(ij)}}, \tag{2.10}$$

where $\hat{e}_{ij} = z_{ij} - \alpha - \beta x_i$; $\hat{e}_{(i1)} < \hat{e}_{(i2)} < \cdots < \hat{e}_{(in_i)}$ are the ordered values of $\hat{e}_{ij}$; $n_{(ij)}$ is the number at risk at $\hat{e}_{(ij)}$; $d_{(ij)}$ is the number dying at $\hat{e}_{(ij)}$; and $\delta_{(ij)} = 1$ if $d_{(ij)} > 0$, and $\delta_{(ij)} = 0$ otherwise. The convention in computing the Kaplan-Meier estimator $\hat{S}_i(e;\ \alpha, \beta)$ is to assign the remaining mass to the largest observation $Z_{(in_i)}$ in the $i$th sample if it is censored, and therefore $\hat{S}_i(y) = 0$ for $y > Z_{(in_i)}$ (Miller, 1981). Hence, for $C_{ij} < Z_{(in_i)}$, the estimator of $E(W_{ij} \mid Y_{ij} > C_{ij})$ is

$$\hat{E}(W_{ij} \mid Y_{ij} > C_{ij};\ \alpha, \beta) = \int_{C_{ij}}^{\infty} w_i(y - \alpha - \beta x_i)\,d\hat{F}_i(y) \Big/ \hat{S}_i(C_{ij}), \tag{2.11}$$

where $\hat{F}_i = 1 - \hat{S}_i$. From Efron's (1967) self-consistency representation of the product limit estimator $\hat{S}_i$, it has been shown in James (1986) that

$$\sum_{j} \bigl\{ \delta_{ij} W_{ij} + (1 - \delta_{ij})\,\hat{E}(W_{ij} \mid Y_{ij} > C_{ij};\ \alpha, \beta) \bigr\} = n_i \int_{-\infty}^{\infty} w_i(y - \alpha - \beta x_i)\,d\hat{F}_i(y). \tag{2.12}$$

The test statistic for testing $\beta = \beta_0$ would be obtained from the estimating function (2.12) in the right random censoring model.

An important particular case of the general right random censoring model is the so-called Koziol-Green model, in which $F_i$ and $G_i$ are connected by the assumption that for some constant $\gamma_i > 0$,

$$1 - G_i = (1 - F_i)^{\gamma_i}, \qquad i = 1, \ldots, k, \tag{2.13}$$

where $\gamma_i$ is referred to as the censoring parameter. Indeed,

$$\pi_i = \Pr(Y_{ij} \le C_{ij}) = \int_{-\infty}^{\infty} \{1 - G_i(t)\}\,dF_i(t) = \frac{1}{1 + \gamma_i}$$

is the expected proportion of uncensored observations in the $i$th sample. The case $\gamma_i = 0$ corresponds to no censoring. Let $L_i$ denote the distribution function of the observable $Z_{ij}$, $j = 1, \ldots, n_i$. By independence,

$$1 - L_i = (1 - F_i)(1 - G_i) = (1 - F_i)^{1 + \gamma_i}.$$

Hence

$$1 - F_i = (1 - L_i)^{\pi_i}. \tag{2.14}$$

Abdushukurov (1984) and Cheng and Lin (1984, 1987) independently proposed the maximum likelihood estimator of $S_i = 1 - F_i$, given by

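$$\tilde{S}_i(e;\ \alpha, \beta) = \bigl( 1 - M_{in_i}(e;\ \alpha, \beta) \bigr)^{\pi_{in_i}}, \tag{2.15}$$

where $M_{in_i}$ is the empirical distribution function of $Z_{ij} - \alpha - \beta x_i$, that is,

$$M_{in_i}(e;\ \alpha, \beta) = \frac{1}{n_i} \sum_{j=1}^{n_i} I(z_{ij} - \alpha - \beta x_i \le e), \tag{2.16}$$

and

$$\pi_{in_i} = \frac{1}{n_i} \sum_{j=1}^{n_i} \delta_{ij} \tag{2.17}$$

is the sample proportion of uncensored observations in the $i$th sample.

Paralleling the approach to the estimating function (2.12) proposed by James (1987), we propose an estimating function that uses the MLE of $S_i$ instead of the Kaplan-Meier PLE under the Koziol-Green model of censorship:

$$\sum_{j} \bigl\{ \delta_{ij} W_{ij} + (1 - \delta_{ij})\,\tilde{E}(W_{ij} \mid Y_{ij} > C_{ij};\ \alpha, \beta) \bigr\} = n_i \int_{-\infty}^{\infty} w_i(y - \alpha - \beta x_i)\,d\tilde{F}_i(y), \tag{2.18}$$

where $\tilde{F}_i = 1 - \tilde{S}_i$ is the MLE of $F_i$ under the K-G model, and

$$\tilde{E}(W_{ij} \mid Y_{ij} > C_{ij};\ \alpha, \beta) = \frac{n_i}{n_i - n_i \pi_{in_i}} \sum_{k} \Bigl\{ \delta_{ik}\, w_i(Z_{ik} - \alpha - \beta x_i) \Bigl[ \tilde{F}(r_{ik}) - \frac{1}{n_i} \Bigr] \Bigr\} + n_i\, w_i(Z_{ij} - \alpha - \beta x_i)\, \tilde{F}(r_{ij}), \tag{2.19}$$

with

$$\tilde{F}(r_{ij}) = \Bigl( 1 - \frac{r_{ij} - 1}{n_i} \Bigr)^{\pi_{in_i}} - \Bigl( 1 - \frac{r_{ij}}{n_i} \Bigr)^{\pi_{in_i}}, \tag{2.20}$$

where $r_{ij}$ is the rank of $Z_{ij}$ in the $i$th sample, $j = 1, \ldots, n_i$.

Consequently, corresponding to (2.4), the estimator $\hat\alpha$ of $\alpha$ is obtained as a solution of the estimating equation

$$\sum_{i} \tau_i = 0, \tag{2.21}$$

where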

$$\tau_i = n_i \int_{-\infty}^{\infty} w_i(y - \hat\alpha - \beta_0 x_i)\,d\tilde{F}_i(y). \tag{2.22}$$

Corresponding to (2.5), the vector of test statistics for testing $\beta = \beta_0$ is then $q = (q_1, \ldots, q_{k-1})^T$, where

$$q_s = \sum_{i} x_{is}\,\tau_i, \qquad s = 1, \ldots, k-1. \tag{2.23}$$

Note that $\tau_i$ is in the form of the functional $\int \phi\,d\tilde{F}_i$, $i = 1, \ldots, k$, and $q_s$ is a linear combination of $\{\tau_1, \ldots, \tau_k\}$, $s = 1, \ldots, k-1$.

3. ASYMPTOTIC DISTRIBUTION OF THE TEST STATISTIC

In this section the joint asymptotic distribution of $q = (q_1, \ldots, q_{k-1})^T$, suitably normalized, is derived under the null hypothesis $\beta = \beta_0$. The following set of assumptions is required.

(i) $w_i$ is an $F_i$-integrable function for $i = 1, \ldots, k$.

(ii) $n_i / n \to \lambda_i \in (0, 1)$ as $n = \sum_i n_i \to \infty$, for $i = 1, \ldots, k$, where $\sum_i \lambda_i = 1$.

(iii) $\hat\alpha \to \alpha$ in probability as $n \to \infty$ if $\beta = \beta_0$, where $\hat\alpha$ satisfies (2.21).

(iv) For some $\epsilon_i > 0$,

$$\int \bigl[ |\varphi_i| (1 - L_i)^{\pi_i - 1} \bigr]^{2 + \epsilon_i}\,dL_i < \infty, \qquad i = 1, \ldots, k,$$

and

$$\int |\varphi_i| (1 - L_i)^{\pi_i - 2}\,dL_i < \infty, \qquad i = 1, \ldots, k,$$

where $\pi_i \in (0, 1]$ is the expected proportion of uncensored observations in the $i$th sample, $L_i$ is the distribution function of $Z_{ij}$, $j = 1, \ldots, n_i$, and $\varphi_i(y) = w_i(y - \alpha - \beta_0 x_i)$, $i = 1, \ldots, k$.

Suppose that assumptions (i)-(iv) hold, the K-G model (2.13) holds, and $\beta = \beta_0$. Then, by the result of Dikta (1995), we have

$$\sqrt{n_i} \Bigl( \int \varphi_i\,d\tilde{F}_i - \int \varphi_i\,dF_i \Bigr) \Rightarrow N(0, \sigma_i^2), \qquad i = 1, \ldots, k, \quad \text{as } n \to \infty, \tag{3.1}$$

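where "$\Rightarrow$" means "converges weakly to", and

$$\begin{aligned}
\sigma_i^2 ={}& \pi_i (1 - \pi_i) \Bigl\{ \int \bigl[ S_{L_i}^{\pi_i - 1} + \pi_i (\ln S_{L_i})\, S_{L_i}^{\pi_i - 1} \bigr] \varphi_i\,dL_i \Bigr\}^2 + \pi_i^2 \Bigl\{ \int \bigl[ S_{L_i}^{\pi_i - 1} \varphi_i \bigr]^2 dL_i - \Bigl( \int \varphi_i\,dF_i \Bigr)^2 \Bigr\} \\
&+ 2 \pi_i^3 (\pi_i - 1) \int (S_{L_i}(x))^{\pi_i - 1} \varphi_i(x) \Bigl[ \int_{-\infty}^{x} (S_{L_i}(y))^{\pi_i - 2} \varphi_i(y)\,dL_i(y) \Bigr] dL_i(x),
\end{aligned} \tag{3.2}$$

where $S_{L_i} = 1 - L_i$ is the survival function of $Z_{ij}$, $j = 1, \ldots, n_i$. It is worth noting that, if $\pi_i = 1$, the data are uncensored and $\sigma_i^2$ is the same as the variance in the classical CLT.

Suppose now that

$$\psi_i(a) = \int_{-\infty}^{\infty} w_i(y - a - \beta_0 x_i)\,dF_i(y)$$

is differentiable at $a = \alpha$. Imitating the decomposition of Lemma 3.1 of Brookmeyer & Crowley (1982) and using arguments completely analogous to Result 1 of James (1987), we have that

$$\frac{q}{\sqrt{n}} = \Bigl( \sum_i x_{i1} \sqrt{\tfrac{n_i}{n}}\, \sqrt{n_i} \int_{-\infty}^{\infty} w_i(y - \hat\alpha - \beta_0 x_i)\,d\tilde{F}_i(y),\ \ldots,\ \sum_i x_{i(k-1)} \sqrt{\tfrac{n_i}{n}}\, \sqrt{n_i} \int_{-\infty}^{\infty} w_i(y - \hat\alpha - \beta_0 x_i)\,d\tilde{F}_i(y) \Bigr)^T \tag{3.3}$$

asymptotically has a $(k-1)$-variate normal distribution $N_{k-1}(0, D)$, where

$$D = X A \Sigma A^T X^T, \qquad X = (x_1, \ldots, x_k), \qquad \Sigma = \mathrm{diag}(\sigma_i^2), \qquad A = (a_{rs}), \tag{3.4}$$

where $\sigma_i^2$, as defined in (3.2), is the asymptotic variance of $\sqrt{n_i} \int \varphi_i\,d\tilde{F}_i$, $i = 1, \ldots, k$; $a_{rr} = \sqrt{\lambda_r}\,(1 - \lambda_r Q_r)$; $a_{rs} = -\lambda_r Q_r \sqrt{\lambda_s}$ if $r \ne s$; and $Q_r = \psi_r'(\alpha) / \sum_i \lambda_i \psi_i'(\alpha)$, for $r = 1, \ldots, k$, $s = 1, \ldots, k$. Consequently, if $\hat{D}$ is a consistent estimator of $D$, then

$$n^{-1} q^T \hat{D}^{-1} q \Rightarrow \chi^2(k - 1), \qquad \text{as } n \to \infty. \tag{3.5}$$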
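The entries of $A$ in (3.4) can be motivated by a first-order expansion around $\alpha$. The following is only a heuristic sketch under assumptions (i)-(iv), not the rigorous argument of James (1987). The shorthand $U_i$ and the index $t$ are introduced here for illustration, and remainder terms are ignored.

```latex
% Shorthand (an assumption of this sketch):
%   U_i = \sqrt{n_i} \int \varphi_i \, d\tilde F_i ,
% which by (3.1) is asymptotically N(0, \sigma_i^2) under \beta = \beta_0,
% because \int \varphi_i \, dF_i = \psi_i(\alpha) = 0.
\begin{align*}
\sqrt{n_r}\int w_r(y-\hat\alpha-\beta_0 x_r)\,d\tilde F_r(y)
  &\approx U_r + \sqrt{n_r}\,\psi_r'(\alpha)\,(\hat\alpha-\alpha), \\
\text{so that (2.21) gives}\quad \hat\alpha-\alpha
  &\approx -\,\frac{\sum_s \sqrt{n_s}\,U_s}{\sum_i n_i\,\psi_i'(\alpha)}, \\
\sqrt{n_r}\int w_r(y-\hat\alpha-\beta_0 x_r)\,d\tilde F_r(y)
  &\approx U_r - \sqrt{\lambda_r}\,Q_r \sum_s \sqrt{\lambda_s}\,U_s, \\
\frac{q_t}{\sqrt{n}}
  &\approx \sum_r x_{rt}\,\sqrt{\lambda_r}
     \Bigl(U_r - \sqrt{\lambda_r}\,Q_r \sum_s \sqrt{\lambda_s}\,U_s\Bigr)
   = \sum_r x_{rt} \sum_s a_{rs}\,U_s .
\end{align*}
```

Reading off the coefficients gives $a_{rr} = \sqrt{\lambda_r}(1 - \lambda_r Q_r)$ and $a_{rs} = -\lambda_r Q_r \sqrt{\lambda_s}$ for $r \ne s$. Hence $q/\sqrt{n} \approx X A U$ with $U = (U_1, \ldots, U_k)^T$, whose components are independent because the k samples are independent, which is consistent with $D = X A \Sigma A^T X^T$.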

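The Section 2 quantities can be checked numerically on simulated data. The sketch below is illustrative only and is not code from the paper. It assumes unit-exponential lifetimes, so that exponential censoring times with rate gamma satisfy (2.13), and it takes $\alpha = 0$ and $\beta x_i = 0$ so that residuals equal the observations. The function names `kg_sample`, `acl_mass`, `ranks`, `functional` and `imputed_sum` are hypothetical. It computes the point masses (2.20), the functional $\int w\,d\tilde{F}$, and both sides of the identity (2.18) using (2.19) for a single sample.

```python
import math
import random


def kg_sample(n, gamma, rng):
    """Simulate (Z, delta) for one sample under an assumed K-G setup.

    Lifetimes are Exp(1), so S_F(t) = exp(-t); censoring times are Exp(gamma),
    so 1 - G(t) = exp(-gamma * t) = S_F(t) ** gamma, as in (2.13).
    """
    z, delta = [], []
    for _ in range(n):
        y = rng.expovariate(1.0)
        c = rng.expovariate(gamma)
        z.append(min(y, c))
        delta.append(1 if y <= c else 0)
    return z, delta


def acl_mass(r, n, pi_hat):
    """Mass that the MLE of F puts on the observation of rank r, as in (2.20)."""
    return (1 - (r - 1) / n) ** pi_hat - (1 - r / n) ** pi_hat


def ranks(z):
    """Rank (1..n) of each observation within its sample; ties are not handled."""
    order = sorted(range(len(z)), key=z.__getitem__)
    rk = [0] * len(z)
    for pos, idx in enumerate(order, start=1):
        rk[idx] = pos
    return rk


def functional(z, delta, w):
    """Integral of w with respect to the MLE of F, written as a finite sum."""
    n = len(z)
    pi_hat = sum(delta) / n  # sample proportion of uncensored values, (2.17)
    return sum(w(zj) * acl_mass(rj, n, pi_hat) for zj, rj in zip(z, ranks(z)))


def imputed_sum(z, delta, w):
    """Left-hand side of (2.18), with censored terms replaced using (2.19).

    Requires at least one censored and one uncensored observation.
    """
    n = len(z)
    pi_hat = sum(delta) / n
    mass = [acl_mass(r, n, pi_hat) for r in ranks(z)]
    common = n / (n - n * pi_hat) * sum(
        d * w(zj) * (m - 1 / n) for d, zj, m in zip(delta, z, mass)
    )
    return sum(
        d * w(zj) + (1 - d) * (common + n * w(zj) * m)
        for d, zj, m in zip(delta, z, mass)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    z, delta = kg_sample(2000, 0.5, rng)
    w = lambda u: u  # w(u) = u corresponds to the test for means
    print("proportion uncensored:", sum(delta) / len(z))
    print("n * integral of w dF~:", len(z) * functional(z, delta, w))
    print("imputed sum from (2.19):", imputed_sum(z, delta, w))
```

With gamma = 0.5 the expected proportion of uncensored observations is $1/(1 + \gamma) = 2/3$, and the last two printed values should agree up to rounding error, since (2.18) is an algebraic identity for any choice of $w$.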
BIBLIOGRAPHY

Abdushukurov, A.A. (1984). On some estimates of the distribution function under random censorship. In: Conference of Young Scientists, Math. Inst. Acad. Sci. Uzbek SSR, Tashkent. VINITI No. 8756-V (in Russian).

Breslow, N. (1970). A generalized Kruskal-Wallis test for comparing K samples subject to unequal patterns of censorship. Biometrika, 57, 579-594.

Brookmeyer, R. and Crowley, J. (1982). A k-sample median test for censored data. J. Amer. Statist. Assoc., 77, 433-440.

Buckley, J. and James, I. (1979). Linear regression with censored data. Biometrika, 66, 429-436.

Cheng, P.E. and Lin, G.D. (1987). Maximum likelihood estimation of a survival function under the Koziol-Green proportional hazards model. Statist. Probab. Letters, 5, 75-80.

Dikta, G. (1995). Asymptotic normality under the Koziol-Green model. Commun. Statist.-Theory Meth., 24, 1537-1549.

Efron, B. (1967). The two-sample problem with censored data. Proc. 5th Berkeley Symp., 4, 831-853.

Gehan, E. (1965). A generalized Wilcoxon test for comparing arbitrarily singly censored samples. Biometrika, 52, 203-223.

Gill, R.D. (1980). Censoring and Stochastic Integrals. Mathematical Centre Tracts, 124. Amsterdam: Mathematisch Centrum.

Harrington, D.P. and Fleming, T.R. (1982). A class of rank test procedures for censored survival data. Biometrika, 69, 553-566.

James, I.R. (1986). On estimating equations with censored data. Biometrika, 73, 35-42.

James, I.R. (1987). Tests for location with k samples and censored data. Biometrika, 74, 599-607.

Kaplan, E.L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc., 53, 457-481.

Koziol, J.A. and Green, S.B. (1976). A Cramér-von Mises statistic for randomly censored data. Biometrika, 63, 465-474.

Leurgans, S. (1983). Three classes of censored data rank tests: strengths and weaknesses under censoring. Biometrika, 70, 651-658.

Miller, R.G. (1981). Survival Analysis. New York: Wiley.

Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant test procedures (with discussion). J. R. Statist. Soc. A, 135, 185-206.

Prentice, R.L. (1978). Linear rank tests with right censored data. Biometrika, 65, 167-179.

Stute, W. (1992). Strong consistency under the Koziol-Green model. Statist. and Probab. Letters, 14, 313-320.

Wu, K., Yang, S. and Duran, B.S. (1999). On minimum distance estimation in two-sample scale problem with right censoring. Commun. in Statist.-Theory Meth., 28, 6, 1461-1477.