Investigation of goodness-of-fit test statistic distributions by random censored samples

Novosibirsk State Technical University, November 22, 2010

Outline

1. Nonparametric goodness-of-fit tests for complete data
2. Modified nonparametric goodness-of-fit tests for random censored data
3. Investigation of test statistic distributions for different distributions of censoring times
4. Transformation of censored sample to complete sample by means of randomization
5. RRN χ² test for censored data
6. Test power

Nonparametric goodness-of-fit tests for complete data

The Kolmogorov test statistic is

$$D_n = \sup_{|x|<\infty} \left| F_n(x) - F(x,\theta) \right|, \qquad (1)$$

where $F_n(x)$ is the empirical distribution function. In testing simple hypotheses, the limiting distribution of $\sqrt{n}\,D_n$ is the Kolmogorov distribution $K(S)$. The Cramer-von Mises-Smirnov test statistic is

$$W_n^2 = n\int_{-\infty}^{\infty} \left(F_n(t) - F(t)\right)^2 dF(t), \qquad (2)$$

and the Anderson-Darling test statistic is

$$A_n^2 = n\int_{-\infty}^{\infty} \frac{\left(F_n(t) - F(t)\right)^2}{F(t)\left(1 - F(t)\right)}\, dF(t). \qquad (3)$$

In testing a simple hypothesis, statistic (2) has the limiting distribution a1(s), and statistic (3) has the limiting distribution a2(s).
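For a simple hypothesis (fully specified $F$), statistics (1), (2), (3) are commonly evaluated with the standard closed-form expressions in the ordered values $u_{(i)} = F(x_{(i)}, \theta)$. The following Python sketch shows those expressions. The function name `complete_data_statistics` and the exponential example are illustrative assumptions, not part of the original work.

```python
import math


def complete_data_statistics(sample, cdf):
    """Return (D_n, W_n^2, A_n^2) for a complete sample and a fully specified CDF."""
    n = len(sample)
    u = sorted(cdf(x) for x in sample)  # u[i] = F(x_(i+1)); ranks are i + 1
    # Kolmogorov: D_n = max(D_n^+, D_n^-)
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    d_n = max(d_plus, d_minus)
    # Cramer-von Mises-Smirnov: W_n^2 = 1/(12n) + sum (u_(i) - (2i-1)/(2n))^2
    w2 = 1.0 / (12 * n) + sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u))
    # Anderson-Darling: A_n^2 = -n - 2 sum [p_i ln u_(i) + (1 - p_i) ln(1 - u_(i))], p_i = (2i-1)/(2n)
    a2 = -n - 2.0 * sum(
        (2 * i + 1) / (2 * n) * math.log(ui)
        + (1 - (2 * i + 1) / (2 * n)) * math.log(1 - ui)
        for i, ui in enumerate(u)
    )
    return d_n, w2, a2


# Usage: simple hypothesis H0: X ~ Exponential(1)
d_n, w2, a2 = complete_data_statistics([0.3, 1.2, 0.7, 2.5, 0.1], lambda x: 1.0 - math.exp(-x))
```

Under a simple hypothesis these values are referred to $K(S)$ (after scaling $D_n$ by $\sqrt{n}$), a1(s) and a2(s). For composite hypotheses, the distribution models discussed next are needed instead.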

Approximations of test statistic distributions

In the case of composite hypotheses, the test statistic distributions $G(S|H_0)$ are affected by a number of factors: the form of the tested distribution $F(t;\theta)$, the number of estimated parameters, and the estimation method used. In the papers by Lemeshko (2009), approximating models of the statistic distributions were obtained for testing composite hypotheses for a wide range of distribution laws when maximum likelihood estimates of the unknown parameters are used. These papers are available on the Internet:

http://ami.nstu.ru/~headrd/seminar/publik_html/Models_Part_I_eng.pdf
http://ami.nstu.ru/~headrd/seminar/publik_html/Models_Part_II_eng.pdf

Modified nonparametric goodness-of-fit tests for random censored data

Independent random censoring

Let the lifetime $T$ and the censoring time $C$ be independent random variables with distribution functions $F(t)$ and $F_C(t)$, respectively. All lifetimes and censoring times are assumed to be mutually independent, and $F_C(t)$ is assumed not to depend on any of the parameters of $F(t)$. The observations are therefore $t_i = \min(T_i, C_i)$ and $\delta_i = 1\{T_i \le C_i\}$, $i = 1, \dots, n$.
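The censoring model can be simulated directly from its definition. A minimal sketch, assuming a Weibull lifetime with shape 2 and scale 2 (the slides only say "parameters (2, 2)", so this parameterization is an assumption) and a hypothetical exponential censoring time with mean 3; all function names are illustrative.

```python
import math
import random


def weibull_inverse_cdf(p, shape, scale):
    """Quantile of F(t) = 1 - exp(-(t / scale) ** shape)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def random_censored_sample(n, shape, scale, censor_mean, rng):
    """Observed pairs: t_i = min(T_i, C_i), delta_i = 1{T_i <= C_i}."""
    t, delta = [], []
    for _ in range(n):
        life = weibull_inverse_cdf(rng.random(), shape, scale)  # T_i ~ F(t)
        cens = rng.expovariate(1.0 / censor_mean)  # C_i ~ F_C(t), independent of T_i (hypothetical law)
        t.append(min(life, cens))
        delta.append(1 if life <= cens else 0)
    return t, delta


rng = random.Random(2010)
t, delta = random_censored_sample(100, 2.0, 2.0, 3.0, rng)
censoring_degree = 1.0 - sum(delta) / len(delta)  # share of censored observations
```

Changing `censor_mean` (or the censoring law itself) changes the censoring degree, which is the factor varied in the simulation study below.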

Modified Kolmogorov test

The Kolmogorov test statistic is

$$D_n = \sup_{|t|<\infty} \left| \hat F_n(t) - F(t;\theta) \right|,$$

where $\hat F_n(t)$ is the Kaplan-Meier estimator. Formulas for calculation, with $t_{(1)} \le \dots \le t_{(n)}$ the ordered observations:

$$D_n = \max\left(D_n^+, D_n^-\right),$$

$$D_n^+ = \max_{i:\,\delta_i=1} \left\{ \hat F_n\left(t_{(i)}\right) - F\left(t_{(i)};\theta\right) \right\}, \qquad D_n^- = \max_{i:\,\delta_i=1} \left\{ F\left(t_{(i)};\theta\right) - \hat F_n\left(t_{(i-1)}\right) \right\}.$$
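A sketch of the Kaplan-Meier estimator and of the modified Kolmogorov statistic computed by the formulas above. Assumptions of this sketch: tied times are handled by placing failures before censorings and applying the product-limit factors one observation at a time, which gives the same values as the usual $(1 - d/n)$ factor; $\hat F_n(t_{(i-1)})$ is taken as the Kaplan-Meier value just before $t_{(i)}$. Function names are hypothetical.

```python
def kaplan_meier_jumps(t, delta):
    """For each complete observation (delta = 1), in time order, return
    (t_(i), F_hat just before t_(i), F_hat at t_(i)), where F_hat = 1 - KM survival."""
    n = len(t)
    order = sorted(range(n), key=lambda k: (t[k], -delta[k]))  # failures before censorings at ties
    surv, jumps = 1.0, []
    for rank, i in enumerate(order):
        if delta[i] == 1:
            before = 1.0 - surv
            surv *= 1.0 - 1.0 / (n - rank)  # n - rank observations are still at risk
            jumps.append((t[i], before, 1.0 - surv))
    return jumps


def modified_kolmogorov(t, delta, cdf):
    """D_n = max(D_n^+, D_n^-), evaluated at the complete observations as on the slide."""
    jumps = kaplan_meier_jumps(t, delta)
    d_plus = max(after - cdf(x) for x, _, after in jumps)
    d_minus = max(cdf(x) - before for x, before, _ in jumps)
    return max(d_plus, d_minus)
```

Without censoring the Kaplan-Meier estimate reduces to the empirical distribution function, so the result coincides with statistic (1).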

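Modified Cramer-von Mises-Smirnov test

The Cramer-von Mises-Smirnov test statistic (Koziol and Green (1976)) is

$$W_n^2 = r\int_{-\infty}^{\infty} \left(\hat F_n(t) - F(t;\theta)\right)^2 dF(t;\theta),$$

$$W_n^2 = r\sum_{j:\,\delta_j=1} \left\{ \hat F_n^2\left(t_{(j)}\right)\left[F\left(t_{(j+1)};\theta\right) - F\left(t_{(j)};\theta\right)\right] - \hat F_n\left(t_{(j)}\right)\left[F^2\left(t_{(j+1)};\theta\right) - F^2\left(t_{(j)};\theta\right)\right] \right\} + \frac{r}{3},$$

where $r$ is the number of complete observations, $t_{(j+1)}$ here denotes the complete observation that follows $t_{(j)}$, and $F(t_{(j+1)};\theta) = 1$ for the last complete observation.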
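The computing formula can be sketched as follows, with the multiplier $r$ taken from the slide. The helper `km_jumps` repeats a compact Kaplan-Meier estimator so that the block runs on its own; all names are hypothetical. Without censoring, $r = n$ and the result coincides with the classical $W_n^2 = 1/(12n) + \sum_i \left(u_{(i)} - (2i-1)/(2n)\right)^2$, which gives a simple check of the sketch.

```python
def km_jumps(t, delta):
    """(t_(j), F_hat just before t_(j), F_hat(t_(j))) for each complete observation, in time order."""
    n, surv, out = len(t), 1.0, []
    for rank, i in enumerate(sorted(range(n), key=lambda k: (t[k], -delta[k]))):
        if delta[i] == 1:
            before = 1.0 - surv
            surv *= 1.0 - 1.0 / (n - rank)  # product-limit factor, n - rank still at risk
            out.append((t[i], before, 1.0 - surv))
    return out


def modified_cvm(t, delta, cdf):
    """W_n^2 = r * sum_j {F_hat^2 (u_next - u_j) - F_hat (u_next^2 - u_j^2)} + r / 3."""
    jumps = km_jumps(t, delta)
    r = len(jumps)  # number of complete observations
    total = 0.0
    for j, (x, _, f_hat) in enumerate(jumps):
        u_j = cdf(x)
        # F at the next complete observation; taken as 1 after the last one
        u_next = cdf(jumps[j + 1][0]) if j + 1 < r else 1.0
        total += f_hat ** 2 * (u_next - u_j) - f_hat * (u_next ** 2 - u_j ** 2)
    return r * total + r / 3.0
```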

Modified Anderson-Darling test

The Anderson-Darling test statistic is

$$A_n^2 = r\int_{-\infty}^{\infty} \frac{\left(\hat F_n(t) - F(t;\theta)\right)^2}{F(t;\theta)\left(1 - F(t;\theta)\right)}\, dF(t;\theta),$$

$$A_n^2 = -r + r\sum_{j:\,\delta_j=1} \left\{ \left[\hat F_n^2\left(t_{(j-1)}\right) - \hat F_n^2\left(t_{(j)}\right)\right] \log F\left(t_{(j)};\theta\right) - \left[\left(1 - \hat F_n\left(t_{(j-1)}\right)\right)^2 - \left(1 - \hat F_n\left(t_{(j)}\right)\right)^2\right] \log\left(1 - F\left(t_{(j)};\theta\right)\right) \right\},$$

where $r$ is the number of complete observations.
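A sketch of the point-sum formula above, again with a compact Kaplan-Meier helper so the block is self-contained; names are hypothetical. $\hat F_n(t_{(j-1)})$ is taken as the Kaplan-Meier value just before $t_{(j)}$. Without censoring, the result reproduces the classical $A_n^2 = -n - \frac{1}{n}\sum_i (2i-1)\left[\ln u_{(i)} + \ln\left(1 - u_{(n+1-i)}\right)\right]$.

```python
import math


def km_jumps(t, delta):
    """(t_(j), F_hat just before t_(j), F_hat(t_(j))) for each complete observation, in time order."""
    n, surv, out = len(t), 1.0, []
    for rank, i in enumerate(sorted(range(n), key=lambda k: (t[k], -delta[k]))):
        if delta[i] == 1:
            before = 1.0 - surv
            surv *= 1.0 - 1.0 / (n - rank)  # product-limit factor, n - rank still at risk
            out.append((t[i], before, 1.0 - surv))
    return out


def modified_ad(t, delta, cdf):
    """A_n^2 = -r + r * sum_j {[F_prev^2 - F_j^2] log u_j - [(1-F_prev)^2 - (1-F_j)^2] log(1-u_j)}."""
    jumps = km_jumps(t, delta)
    r = len(jumps)  # number of complete observations
    total = 0.0
    for x, f_prev, f_j in jumps:  # f_prev = F_hat(t_(j-1)), the value just before t_(j)
        u = cdf(x)
        total += (f_prev ** 2 - f_j ** 2) * math.log(u)
        total -= ((1.0 - f_prev) ** 2 - (1.0 - f_j) ** 2) * math.log(1.0 - u)
    return -r + r * total
```

If the largest observation is censored, $\hat F_n$ stays below 1 and the integral in the definition diverges near $F = 1$; the point-sum formula, as written on the slide, leaves out that divergent term.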

These modified goodness-of-fit tests are discussed in many papers on the statistical analysis of censored data, for example, Anderson and Darling (1952), Hjort (1992), Nair (1981), Reineke (2004), Lawless (2003) and many others. It is assumed that, when a goodness-of-fit hypothesis is tested, the p-value can be calculated from a simulated statistic distribution $G(S|H_0)$. The main purpose of this work is to investigate, by computer simulation, the distributions of the modified test statistics for various distributions of censoring times.
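Because $G(S|H_0)$ of a modified statistic depends on the censoring distribution, it is usually obtained by simulation. A generic sketch: the caller supplies the statistic and a generator of samples under $H_0$ that must also reproduce the censoring mechanism (both are hypothetical callables here); the p-value is estimated as the share of simulated values at least as large as the observed one.

```python
import random


def simulate_null_distribution(statistic, generate_sample, n_rep, seed=0):
    """Empirical G(S | H0): sorted statistic values over n_rep samples simulated under H0."""
    rng = random.Random(seed)
    return sorted(statistic(generate_sample(rng)) for _ in range(n_rep))


def monte_carlo_p_value(observed, null_values):
    """Estimate P{S >= observed | H0} as the share of simulated values at least as large."""
    return sum(1 for s in null_values if s >= observed) / len(null_values)
```

With a modified statistic, the same observed value can give noticeably different p-values under different censoring generators, which is exactly the dependence investigated below.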

Simulation study of statistic distributions

Case 1: $F(t;\theta)$ is the Weibull distribution with parameters (2, 2) (red curve); $F_C(t)$ is the Beta-I distribution.

Figure: Considered distributions $F(t;\theta)$ and $F_C(t)$

Figure: Kolmogorov test statistic distributions for different censoring degrees when testing the composite hypothesis of goodness-of-fit with the Weibull distribution, n = 100

Case 2: $F(t;\theta)$ is the Weibull distribution with parameters (2, 2) (red curve); $F_C(t)$ is the Weibull distribution with other parameter values.

Figure: Considered distributions $F(t;\theta)$ and $F_C(t)$

Figure: Kolmogorov test statistic distributions for different distributions of censoring times when testing the composite hypothesis of goodness-of-fit with the Weibull distribution, censoring degree is 60%

Algorithm for simulation of a random censored sample

1. Generate a complete sample of size $n$ from the hypothetical distribution: $T_i = F^{-1}(\xi_i;\hat\theta_n)$, $i = 1,\dots,n$, where $\xi_i \sim U(0,1)$.
2. Calculate the Kaplan-Meier estimate $\hat F_c(t)$ of the censoring distribution from the inverted original sample.
3. Generate censoring times $C_i$, $i = 1,\dots,n$, by the formula
$$C_i = \begin{cases} \dfrac{\xi_i\, c_1}{\hat F_c(c_1)}, & 0 < \xi_i \le \hat F_c(c_1), \\[2mm] c_j + \dfrac{\left(\xi_i - \hat F_c(c_j)\right)\left(c_{j+1} - c_j\right)}{\hat F_c(c_{j+1}) - \hat F_c(c_j)}, & \hat F_c(c_j) < \xi_i \le \hat F_c(c_{j+1}),\ j = 1,\dots,k-1, \\[2mm] c_k + c_k\left(\xi_i - \hat F_c(c_k)\right), & \xi_i > \hat F_c(c_k), \end{cases}$$
where $\xi_i \sim U(0,1)$, $c_1 < \dots < c_k$ are the distinct censoring times of the original sample in increasing order, and $k$ is the number of distinct censoring times in the original sample.
4. $t_i = \min(T_i, C_i)$, $\delta_i = 1\{T_i \le C_i\}$, $i = 1,\dots,n$.

Inversion of a censored sample

The inverted sample is the original sample in which the indicators $\delta_i = 1$ have been replaced with $\delta_i = 0$ and vice versa.
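The simulation algorithm can be sketched in Python as follows. The piecewise-linear rule for $C_i$ follows the slide; the handling of tied times, the requirement that the original sample contain at least one censored observation, non-negative observation times, and the caller-supplied quantile function `inverse_cdf` (standing for $F^{-1}(\cdot;\hat\theta_n)$) are assumptions of this sketch, and all names are hypothetical.

```python
import bisect
import random


def censoring_time_sampler(t, delta):
    """Steps 2-3: Kaplan-Meier estimate F_c of the censoring CDF from the inverted
    sample, and the piecewise-linear rule C = G(xi). Needs >= 1 censored observation."""
    inv = [1 - d for d in delta]  # inverted sample: censorings become "events"
    n, surv = len(t), 1.0
    c, f = [], []  # distinct censoring times c_1 < ... < c_k and F_c(c_j)
    for rank, i in enumerate(sorted(range(n), key=lambda k: (t[k], -inv[k]))):
        if inv[i] == 1:
            surv *= 1.0 - 1.0 / (n - rank)
            if c and c[-1] == t[i]:
                f[-1] = 1.0 - surv  # tied censoring times share one point
            else:
                c.append(t[i])
                f.append(1.0 - surv)

    def draw(xi):
        if xi <= f[0]:
            return xi * c[0] / f[0]
        if xi > f[-1]:
            return c[-1] + c[-1] * (xi - f[-1])
        j = bisect.bisect_left(f, xi)  # f[j - 1] < xi <= f[j]
        return c[j - 1] + (xi - f[j - 1]) * (c[j] - c[j - 1]) / (f[j] - f[j - 1])

    return draw


def simulate_censored_sample(t, delta, inverse_cdf, rng):
    """Steps 1-4: T_i from the fitted hypothetical law, C_i from the censoring sampler."""
    draw_c = censoring_time_sampler(t, delta)
    sim_t, sim_delta = [], []
    for _ in range(len(t)):
        life = inverse_cdf(rng.random())  # step 1: T_i = F^{-1}(xi_i; theta_hat)
        cens = draw_c(rng.random())  # step 3
        sim_t.append(min(life, cens))  # step 4
        sim_delta.append(1 if life <= cens else 0)
    return sim_t, sim_delta
```

Repeating `simulate_censored_sample` many times and computing a modified statistic on each result gives a simulated $G(S|H_0)$ that reflects the censoring pattern of the observed sample.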

Transformation of censored sample to complete sample by means of randomization

Randomization

In the sample of observations $(t_1, \delta_1), (t_2, \delta_2), \dots, (t_n, \delta_n)$, we replace every censored observation $(t_i, \delta_i = 0)$, i.e. $t_i = C_i$, by a time $\hat T_i$ simulated from the hypothetical distribution.

This replacement yields a complete sample, to which the goodness-of-fit tests with statistics (1), (2), (3) for complete data can be applied. After the transformation, the unknown parameters of the hypothetical distribution must be re-estimated from the completed sample. The distributions of statistics (1), (2), (3) computed on the transformed samples are then the same as in the case of originally complete data, so the approximations of the statistic distributions obtained by Lemeshko (2009) can be used to calculate the p-value.
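A minimal sketch of the replacement step. The slides only say that censored times are replaced by times simulated from the hypothetical distribution; this sketch assumes (an assumption, not stated on the slides) that $\hat T_i$ is drawn from $F$ conditional on $T > C_i$, by inverse-CDF sampling on $(F(C_i;\theta), 1)$. The re-estimation of the parameters from the completed sample is not shown, and the names and the Weibull parameters are hypothetical.

```python
import math
import random


def randomize_censored(t, delta, cdf, inverse_cdf, rng):
    """Complete the sample: keep t_i with delta_i = 1 and replace each censored C_i by a
    simulated T_hat_i. ASSUMPTION: T_hat_i is drawn from F conditional on T > C_i."""
    completed = []
    for ti, di in zip(t, delta):
        if di == 1:
            completed.append(ti)
        else:
            p = cdf(ti)  # F(C_i; theta)
            completed.append(inverse_cdf(p + rng.random() * (1.0 - p)))  # inverse CDF on (F(C_i), 1)
    return completed


# Usage with a hypothetical fitted Weibull law (shape 2, scale 2)
def weibull_cdf(x, shape=2.0, scale=2.0):
    return 1.0 - math.exp(-((x / scale) ** shape))


def weibull_quantile(p, shape=2.0, scale=2.0):
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


completed = randomize_censored([0.8, 1.5, 2.1], [1, 0, 1], weibull_cdf, weibull_quantile, random.Random(7))
```

The parameters would then be re-estimated from `completed`, and statistics (1), (2), (3) computed as for complete data.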

An example of testing a goodness-of-fit hypothesis

Consider a sample of size n = 100 that contains 10 right-censored observations. H0: F(t) belongs to the family of lognormal distributions.

An example of testing a goodness-of-fit hypothesis (continued)

Figure: Kaplan-Meier estimate for the considered censored sample, together with the Weibull and lognormal distributions (MLEs are used)

RRN χ² test for censored data

In the papers by Nikulin et al. (2010), the χ² test with the statistic

$$Y_n^2(\hat\theta_n) = Z^T \hat V^{-} Z$$

has been suggested for randomly censored data, where $\hat V^{-}$ is a generalized inverse of the matrix $\hat V$. Under the true hypothesis $H_0$, the limiting distribution of $Y_n^2(\hat\theta_n)$ is the χ² distribution with $r = \operatorname{rank}(V)$ degrees of freedom.
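Once $Z$ and $\hat V$ have been computed as described by Bagdonavicius and Nikulin (not reproduced here), the statistic is a quadratic form. A sketch with NumPy, assuming the Moore-Penrose pseudo-inverse as the generalized inverse (one possible choice); the function name is hypothetical.

```python
import numpy as np


def rrn_statistic(z, v_hat):
    """Y_n^2 = Z^T V^- Z and its degrees of freedom rank(V).
    Z and V_hat must be computed beforehand; they are inputs here."""
    z = np.asarray(z, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    v_minus = np.linalg.pinv(v_hat)  # Moore-Penrose inverse used as the generalized inverse
    y2 = float(z @ v_minus @ z)
    df = int(np.linalg.matrix_rank(v_hat))
    return y2, df
```

The p-value is then $P\{\chi^2_r \ge Y_n^2\}$, for example `scipy.stats.chi2.sf(y2, df)` if SciPy is available.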

Simulation study of statistic distributions

Figure: RRN χ² test statistic distributions for different censoring degrees when testing the composite hypothesis of goodness-of-fit with the Weibull distribution, n = 100, K = 5

Figure: RRN χ² test statistic distributions for sample sizes n = 100 and n = 500 when testing the composite hypothesis of goodness-of-fit with the Weibull distribution, censoring degree is about 30%, K = 5

Figure: RRN χ² test statistic distributions for different distributions of censoring times when testing the composite hypothesis of goodness-of-fit with the Weibull distribution, censoring degree is about 50%, n = 100, K = 5

Test power

H0: Weibull distribution; H1: lognormal distribution. The power of the Kolmogorov, Cramer-von Mises-Smirnov and Anderson-Darling tests was calculated on samples completed by randomization. Sample size n = 200, α = 0.1.

Table: The test power comparison (columns give the censoring degree)

| Goodness-of-fit test | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% |
|---|---|---|---|---|---|---|---|---|
| Kolmogorov test | 0.74 | 0.64 | 0.54 | 0.45 | 0.35 | 0.25 | 0.19 | 0.13 |
| Cramer-von Mises-Smirnov test | 0.84 | 0.74 | 0.63 | 0.52 | 0.39 | 0.28 | 0.21 | 0.14 |
| Anderson-Darling test | 0.88 | 0.78 | 0.67 | 0.56 | 0.44 | 0.32 | 0.24 | 0.16 |
| RRN χ² test, K = 3 | 0.87 | 0.81 | 0.76 | 0.69 | 0.62 | 0.56 | 0.46 | 0.35 |
| RRN χ² test, K = 5 | 0.90 | 0.86 | 0.81 | 0.75 | 0.69 | 0.58 | 0.47 | 0.34 |
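Power values like those in the table are Monte Carlo estimates. A generic sketch: simulate $G(S|H_0)$ to obtain the critical value at level α, then count rejections on samples drawn under $H_1$. In the setting above, `sample_h0` and `sample_h1` would generate censored Weibull and lognormal samples of size n = 200 (completed by randomization for the nonparametric tests), and `statistic` would re-estimate the parameters before computing the test statistic; these callables are hypothetical and not shown.

```python
import random


def estimate_power(statistic, sample_h0, sample_h1, alpha, n_rep, seed=0):
    """Monte Carlo power: simulate G(S | H0) to get the critical value, then count
    how often the statistic on samples drawn under H1 exceeds it."""
    rng = random.Random(seed)
    null = sorted(statistic(sample_h0(rng)) for _ in range(n_rep))
    critical = null[min(int((1.0 - alpha) * n_rep), n_rep - 1)]  # upper alpha-quantile
    rejections = sum(1 for _ in range(n_rep) if statistic(sample_h1(rng)) > critical)
    return rejections / n_rep
```

The precision of such an estimate is limited by `n_rep`: with `n_rep` replications, the standard error of a power value p is about $\sqrt{p(1-p)/n_{rep}}$.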


Conclusions

- The distributions of the modified Kolmogorov, Cramer-von Mises-Smirnov and Anderson-Darling test statistics strongly depend on the distribution of censoring times, so these tests cannot be recommended for use in practice.
- The randomization procedure yields a complete sample, to which the goodness-of-fit tests with statistics (1), (2), (3) for complete data can be applied. The distributions of these statistics computed on completed samples are the same as in the case of originally complete data.
- The RRN χ² test has a number of advantages over the considered nonparametric tests; in the power comparison, one of the RRN χ² variants is the most powerful at every censoring degree considered.

References

[1] Anderson, T.W. Asymptotic Theory of Certain Goodness of Fit Criteria Based on Stochastic Processes / T.W. Anderson, D.A. Darling // The Annals of Mathematical Statistics. - 1952. - Vol. 23, No. 3. - P. 193-212.
[2] Nair, V. Plots and tests for goodness of fit with randomly censored data / V. Nair // Biometrika. - 1981. - Vol. 68. - P. 99-103.
[3] Reineke, D. Estimation of Hazard, Density and Survivor Functions for Randomly Censored Data / D. Reineke, J. Crown // Journal of Applied Statistics. - 2004. - Vol. 31, No. 10. - P. 1211-1225.
[4] Koziol, J.A. A Cramer-von Mises Statistic for Randomly Censored Data / J.A. Koziol, S.B. Green // Biometrika. - 1976. - Vol. 63, No. 3. - P. 465-474.
[5] Lawless, J.F. Statistical Models and Methods for Lifetime Data / J.F. Lawless. - New Jersey : Wiley-Interscience, 2003. - 630 p.
[6] Hjort, N.L. On Inference in Parametric Survival Data Models / N.L. Hjort // International Statistical Review. - 1992. - Vol. 60, No. 3. - P. 355-387.
[7] Lemeshko, B.Yu. Distribution models for nonparametric tests for fit in verifying complicated hypotheses and maximum-likelihood estimators. Part I / B.Yu. Lemeshko, S.B. Lemeshko // Measurement Techniques. - 2009. - Vol. 52, No. 6. - P. 555-565.
[8] Lemeshko, B.Yu. Models for statistical distributions in nonparametric fitting tests on composite hypotheses based on maximum-likelihood estimators. Part II / B.Yu. Lemeshko, S.B. Lemeshko // Measurement Techniques. - 2009. - Vol. 52, No. 8. - P. 799-812.
[9] Bagdonavicius, V. Chi-square goodness-of-fit test for right censored data / V. Bagdonavicius, M. Nikulin // The International Journal of Applied Mathematics and Statistics (IJAMAS). - Accepted for publication.
[10] Bagdonavicius, V. Nonparametric Tests for Censored Data / V. Bagdonavicius, J. Kruopis, M. Nikulin. - Wiley-ISTE, 2010.