Goodness-of-Fit Tests With Right-Censored Data
by Edsel A. Peña
Department of Statistics, University of South Carolina
Colloquium Talk, August 31, 2
Research supported by an NIH Grant
1. Practical Problem
Right-censored survival data for lung cancer patients from Gatsonis, Hsieh and Korwar (1985), with 86 observations (63 complete and 23 right-censored).
[Figure: Probability histograms of the complete values, time[status == 1], and of the censored values, time[status == 0].]
[Figure: Product-limit estimator and best-fitting exponential. Survival Probability versus Time to Death.]
Question: Did the survival data come from the family of exponential distributions? Or was it from the family of Weibull distributions?
2. On Densities and Hazards
$T$ = a positive-valued continuous failure-time variable, e.g.,
- time-to-failure of a mechanical or electronic system
- time-to-occurrence of an event
- survival time of a patient in a clinical trial
$f(t)$ = density function of $T$. Practical interpretation: $f(t)\,\Delta t \approx P\{T \in [t, t+\Delta t)\}$.
$F(t) = P\{T \le t\}$ = distribution function.
$\bar F(t) = 1 - F(t)$ = survivor function.
$\lambda(t) = f(t)/\bar F(t)$ = hazard rate function. Practical interpretation: $\lambda(t)\,\Delta t \approx P\{T \in [t, t+\Delta t) \mid T \ge t\}$.
$\Lambda(t) = \int_0^t \lambda(w)\,dw = -\log[\bar F(t)]$ = (cumulative) hazard function.
Equivalences: $\bar F(t) = e^{-\Lambda(t)}$ and $f(t) = \lambda(t)\,e^{-\Lambda(t)}$.
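Two Simple Examples:
Exponential: $f(t;\eta) = \eta e^{-\eta t}$, $\bar F(t;\eta) = e^{-\eta t}$, $\lambda(t;\eta) = \eta$, $\Lambda(t;\eta) = \eta t$.
Two-Parameter Weibull: $f(t;\alpha,\eta) = (\alpha\eta)(\eta t)^{\alpha-1} e^{-(\eta t)^{\alpha}}$, $\bar F(t;\alpha,\eta) = e^{-(\eta t)^{\alpha}}$, $\lambda(t;\alpha,\eta) = (\alpha\eta)(\eta t)^{\alpha-1}$, $\Lambda(t;\alpha,\eta) = (\eta t)^{\alpha}$.
Qualitative Aspects from Plots of Hazards
[Figure 1: Weibull Hazard Plots. Hazard Rate versus Time.]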
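The relations between hazard, cumulative hazard, survivor function and density can be checked numerically for the Weibull example. The sketch below is only an illustration: the parameter names `alpha` (shape) and `eta` (rate-type scale) and the helper `integrate_midpoint` are naming assumptions, not notation taken from the talk.

```python
import math

def weibull_hazard(t, alpha, eta):
    """Hazard rate (alpha*eta) * (eta*t)**(alpha - 1), for t > 0."""
    return alpha * eta * (eta * t) ** (alpha - 1)

def weibull_cum_hazard(t, alpha, eta):
    """Cumulative hazard (eta*t)**alpha."""
    return (eta * t) ** alpha

def weibull_survivor(t, alpha, eta):
    """Survivor function exp(-cumulative hazard)."""
    return math.exp(-weibull_cum_hazard(t, alpha, eta))

def weibull_density(t, alpha, eta):
    """Density = hazard * survivor."""
    return weibull_hazard(t, alpha, eta) * weibull_survivor(t, alpha, eta)

def integrate_midpoint(func, a, b, steps=1000):
    """Midpoint-rule integral of func over [a, b] (hypothetical helper)."""
    width = (b - a) / steps
    return sum(func(a + (k + 0.5) * width) for k in range(steps)) * width

alpha, eta, t = 2.0, 0.8, 1.5

# Integrating the hazard should reproduce the cumulative hazard.
numeric_cum_hazard = integrate_midpoint(lambda w: weibull_hazard(w, alpha, eta), 0.0, t)
print(numeric_cum_hazard, weibull_cum_hazard(t, alpha, eta))

# Integrating the density should reproduce 1 - survivor.
numeric_cdf = integrate_midpoint(lambda w: weibull_density(w, alpha, eta), 0.0, t)
print(numeric_cdf, 1.0 - weibull_survivor(t, alpha, eta))
```

With `alpha < 1` the hazard decreases in `t`, with `alpha = 1` it is constant (the exponential case), and with `alpha > 1` it increases, which is the qualitative behavior the hazard plots display.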
3. On Hazard-Based Modeling
Advantages of specifying models via hazards:
- Vantage point in density modeling: time origin. [`What proportion are going to fail in $[t, t+\Delta t)$?']
- Vantage point in hazard modeling: `Present, together with information that accumulated in the past.' [`Given history until time $t$, what proportion are going to fail among those at risk in $[t, t+\Delta t)$?']
- Qualitative aspects (e.g., IFR, or bath-tub) can be incorporated.
- Incorporates dynamic evolution. Relevant in reliability systems modeling, where failure rates of components of a system may drastically change due to the failure of other components (Arjas and Norros; Lawless; Hollander and Peña; Lynch and Padgett; etc.).
- Likelihood construction natural via product integrals.
- Adapts well in the presence of right-censored or truncated data.
- Conducive to modeling with point processes (popularized by Aalen; Andersen and Gill; etc.).
Theory to be presented is applicable to more general models, but only the following models will be considered.
IID Model: $T_1, T_2, \ldots, T_n$ IID with common unknown hazard rate function $\lambda(t)$. Observable vectors are $(Z_1,\delta_1), (Z_2,\delta_2), \ldots, (Z_n,\delta_n)$ with
$\delta_i = 1 \Rightarrow T_i = Z_i$ and $\delta_i = 0 \Rightarrow T_i > Z_i$.
Cox PH Model (also Andersen and Gill Model): Let $(T_1,X_1), (T_2,X_2), \ldots, (T_n,X_n)$ be such that
$\lambda_{T|X}(t \mid x) = \lambda(t)\exp\{\beta^{t} x\}$,
with $\lambda(\cdot)$ an unknown hazard rate function and $\beta$ a regression coefficient vector. The observable vectors are $(Z_1,\delta_1,X_1), (Z_2,\delta_2,X_2), \ldots, (Z_n,\delta_n,X_n)$ with
$\delta_i = 1 \Rightarrow T_i = Z_i$ and $\delta_i = 0 \Rightarrow T_i > Z_i$.
4. Problems, Issues, and Prior Works
Problem (Goodness-of-Fit): Given $\{(z_i,\delta_i), i = 1, 2, \ldots, n\}$ in the IID model, or $\{(z_i,\delta_i,X_i), i = 1, 2, \ldots, n\}$ in the Cox model, decide whether
$\lambda(\cdot) \in \mathcal{C} = \{\lambda_0(\cdot;\eta): \eta \in \Gamma\}$,
where $\eta$ is a nuisance parameter vector. $\mathcal{C}$ could be: Exponential, Weibull, Pareto or IFRA class.
Importance: knowing $\lambda(\cdot) \in \mathcal{C}$ may improve inference procedures.
Previous works on GOF problem: Akritas (88, JASA), Hjort (90, AS), Hollander and Peña (92, JASA), Li and Doss (93, AS), and others.
How to generalize the Pearson-type statistic
$\chi^2_P = \sum_j \frac{(O_j - \hat E_j)^2}{\hat E_j}$?
Difficulty in extending Pearson statistic: with right-censoring, the $O_j$'s are not computable.
Optimality properties?
Problem (Model Validation): Given $\{(z_i,\delta_i), i = 1, 2, \ldots, n\}$ or $\{(z_i,\delta_i,X_i), i = 1, 2, \ldots, n\}$, how to assess the viability of model assumptions?
Unit Exponentiality Property (UEP): $T \sim \Lambda(\cdot) \Rightarrow \Lambda(T) \sim \mathrm{EXP}(1)$.
IID model: If $\Lambda(\cdot)$ is the true hazard function, then with $R_i^* = \Lambda(Z_i)$, the vectors $(R_1^*,\delta_1), (R_2^*,\delta_2), \ldots, (R_n^*,\delta_n)$ form a right-censored sample from EXP(1).
Since $\Lambda(\cdot)$ is not known, the $R_i^*$'s are estimated by $R_i$'s with
$R_i = \hat\Lambda(Z_i)$, $i = 1, 2, \ldots, n$,
where $\hat\Lambda(\cdot)$ is an estimator of $\Lambda(\cdot)$ based on the $(Z_i,\delta_i)$'s.
Idea: The $(R_i,\delta_i)$'s are assumed to form an approximate right-censored sample from EXP(1), so to validate the model, test whether the $(R_i,\delta_i)$'s are a right-censored sample from EXP(1).
Question: How good is the approximation, even in the limit???
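The unit exponentiality property can be illustrated by simulation when the true cumulative hazard is known. This is a minimal sketch under stated assumptions: Weibull failure times with cumulative hazard (eta*t)**alpha, independent exponential censoring, and a hypothetical function name `simulate_transformed_sample`. If the transformed pairs behave like a right-censored EXP(1) sample, the usual rate estimate (events divided by total transformed exposure) should be close to 1.

```python
import random

def simulate_transformed_sample(n, alpha, eta, censor_rate, seed=12345):
    """Simulate right-censored Weibull data and transform by the TRUE cumulative hazard."""
    rng = random.Random(seed)
    residuals, status = [], []
    for _ in range(n):
        # random.weibullvariate takes (scale, shape); scale 1/eta gives hazard (eta*t)**alpha
        t = rng.weibullvariate(1.0 / eta, alpha)
        c = rng.expovariate(censor_rate)          # independent censoring time
        z = min(t, c)
        d = 1 if t <= c else 0
        residuals.append((eta * z) ** alpha)      # Lambda(Z_i) under the true model
        status.append(d)
    return residuals, status

residuals, status = simulate_transformed_sample(5000, alpha=2.0, eta=1.0, censor_rate=0.5)

# Rate estimate for a right-censored exponential sample: events / total exposure.
rate_estimate = sum(status) / sum(residuals)
print(rate_estimate)
```

This check uses the true cumulative hazard. The question raised on the slide concerns what happens when an estimate replaces it, which the sketch does not address.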
For the Cox PH model, the analogous expressions for $R_i^*$ and $R_i$ are:
$R_i^* = \Lambda(Z_i)\exp\{\beta^{t} X_i\}$, $i = 1, 2, \ldots, n$,
$R_i = \hat\Lambda(Z_i)\exp\{\hat\beta^{t} X_i\}$, $i = 1, 2, \ldots, n$,
where $\hat\Lambda(\cdot)$ is an estimator of $\Lambda(\cdot)$ [e.g., Aalen-Breslow estimator], while $\hat\beta$ is an estimator of $\beta$ [partial likelihood MLE].
The $R_i^*$'s are true generalized residuals (Cox and Snell (68, JRSS)), while the $R_i$'s are estimated generalized residuals.
Generalized residuals are analogs of the linear model residuals: "(Observed Value) minus (Fitted Value)."
Question: What are the effects of substituting estimators for the unknown parameters??
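5. Class of GOF Tests
Convert observed data $(Z_1,\delta_1,X_1), (Z_2,\delta_2,X_2), \ldots, (Z_n,\delta_n,X_n)$ into stochastic processes. For $i = 1, 2, \ldots, n$, and $t \ge 0$, let
$N_i(t) = I\{Z_i \le t, \delta_i = 1\}$ = No. of uncensored failures,
$Y_i(t) = I\{Z_i \ge t\}$ = No. at risk,
$A_i(t;\lambda(\cdot),\beta) = \int_0^t Y_i(w)\lambda(w)\exp\{\beta^{t} X_i\}\,dw$,
$M_i(t;\lambda(\cdot),\beta) = N_i(t) - A_i(t;\lambda(\cdot),\beta)$.
If $\lambda(\cdot)$ and $\beta$ are the true parameters, $M(t) = (M_1(t;\lambda(\cdot),\beta), \ldots, M_n(t;\lambda(\cdot),\beta))$ are orthogonal sq-int zero-mean martingales with predictable quadratic variation processes
$\langle M_i, M_i \rangle(t) = A_i(t;\lambda(\cdot),\beta)$.
Problem: Test $H_0: \lambda(\cdot) \in \mathcal{C} = \{\lambda_0(\cdot;\eta): \eta \in \Gamma\}$ versus $H_1: \lambda(\cdot) \notin \mathcal{C}$.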
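For a concrete feel of these processes, the sketch below evaluates the failure indicator, the at-risk indicator, and the compensator for a tiny made-up data set in the simplest setting: no covariates and a constant hazard `eta`. The data values and function names are hypothetical. With `eta` set to events divided by total exposure, the martingale residuals evaluated past the last observation time sum to zero.

```python
def counting_process(z_i, delta_i, t):
    """N_i(t): 1 if unit i has an observed (uncensored) failure by time t."""
    return 1 if (z_i <= t and delta_i == 1) else 0

def at_risk(z_i, t):
    """Y_i(t): 1 if unit i is still under observation just before time t."""
    return 1 if z_i >= t else 0

def compensator_constant_hazard(z_i, t, eta):
    """A_i(t) = integral_0^t Y_i(w) * eta dw = eta * min(z_i, t) (no covariates)."""
    return eta * min(z_i, t)

def martingale_residual(z_i, delta_i, t, eta):
    """M_i(t) = N_i(t) - A_i(t)."""
    return counting_process(z_i, delta_i, t) - compensator_constant_hazard(z_i, t, eta)

# Hypothetical right-censored sample: times and status (1 = failure, 0 = censored).
z = [0.5, 1.2, 0.3, 2.5, 1.8, 0.9, 3.0, 0.7]
delta = [1, 1, 0, 1, 0, 1, 1, 1]

failures_by_1 = sum(counting_process(zi, di, 1.0) for zi, di in zip(z, delta))
at_risk_at_1 = sum(at_risk(zi, 1.0) for zi in z)

eta_hat = sum(delta) / sum(z)          # events / total exposure
residual_sum = sum(martingale_residual(zi, di, 10.0, eta_hat) for zi, di in zip(z, delta))
print(failures_by_1, at_risk_at_1, residual_sum)
```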
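Idea: If $\lambda(\cdot)$ is the true hazard rate function, then under $H_0$ there is some $\eta \in \Gamma$ such that $\lambda(\cdot) = \lambda_0(\cdot;\eta)$. Define
$\kappa(t;\eta) = \log\left[\frac{\lambda(t)}{\lambda_0(t;\eta)}\right]$.
Denote by $\mathcal{K}$ the collection of such $\{\kappa(\cdot;\eta): \eta \in \Gamma\}$. Consider a basis set (e.g., trigonometric, polynomial, wavelet, etc.) for $\mathcal{K}$ given by $\{\psi_1(\cdot;\eta), \psi_2(\cdot;\eta), \ldots\}$, so
$\kappa(t;\eta) = \sum_{j=1}^{\infty} \theta_j \psi_j(t;\eta)$.
For an appropriate order $K$ (smoothing order), approximate $\kappa(\cdot;\eta)$ by
$\kappa(t;\eta) \approx \sum_{j=1}^{K} \theta_j \psi_j(t;\eta)$.
Equivalently,
$\lambda(t) \approx \lambda_0(t;\eta)\exp\left\{\sum_{j=1}^{K} \theta_j \psi_j(t;\eta)\right\}$.
Define the class
$\mathcal{C}_K = \left\{\lambda_K(\cdot;\theta_K,\eta) = \lambda_0(\cdot;\eta)\exp\left\{\sum_{j=1}^{K} \theta_j \psi_j(\cdot;\eta)\right\}: \theta_K \in \mathbb{R}^K, \eta \in \Gamma\right\}$.
The null class is embedded in it: $\mathcal{C} \subset \mathcal{C}_K$ (take $\theta_K = 0$).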
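Goodness-of-Fit Tests: Score tests for
$H_0: \theta_K = 0, (\eta,\beta) \in \Gamma \times B$ versus $H_1: \theta_K \ne 0, (\eta,\beta) \in \Gamma \times B$.
Tests introduced in Peña (1998, JASA; 1998, AS). Being score tests, they possess some optimality properties.
Score function for $\theta_K$ at $\theta_K = 0$:
$Q(\eta,\beta) = \sum_{i=1}^{n} \int_0^{\tau} \psi_K(\eta)\left\{dN_i - Y_i \lambda_0(\eta)\exp\{\beta^{t} X_i\}\,dt\right\}$.
Not a statistic, since $\eta$ and $\beta$ are unknown. Need to plug in estimators for $\eta$ and $\beta$ under the restriction $\theta_K = 0$.
Estimate $\beta$ by $\hat\beta$, which solves the estimating equation
$S(\beta) \equiv \sum_{i=1}^{n} \int_0^{\tau} [X_i - E(\beta)]\,dN_i = 0$,
where
$E(t;\beta) = \frac{S^{(1)}(t;\beta)}{S^{(0)}(t;\beta)}$ and $S^{(m)}(t;\beta) = \sum_{i=1}^{n} X_i^{\otimes m} Y_i \exp\{\beta^{t} X_i\}$, $m = 0, 1, 2$.
Estimate $\eta$ by $\hat\eta$, which solves the profile estimating equation
$R(\eta,\hat\beta) \equiv \sum_{i=1}^{n} \int_0^{\tau} \rho(\eta)\left\{dN_i - Y_i \lambda_0(\eta)\exp\{\hat\beta^{t} X_i\}\,dt\right\} = 0$,
where
$\rho(t;\eta) = \frac{\partial}{\partial\eta}\log\lambda_0(t;\eta)$.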
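As a worked step, the score function can be obtained by differentiating the log-likelihood of the embedding class. The derivation below is a sketch that uses the standard log-likelihood for hazard-based models with right-censored data; the symbols ($\lambda_0$ for the null hazard, $\psi_K$ for the basis vector, $\theta_K$ for its coefficients, $\eta$ for the nuisance parameter, $\beta$ for the regression vector, $\tau$ for the end of the observation window) are naming assumptions, since the extracted slides lost their Greek letters.

```latex
\begin{align*}
\ell(\theta_K,\eta,\beta)
  &= \sum_{i=1}^{n}\int_0^{\tau}
     \Bigl\{\log\lambda_0(t;\eta) + \theta_K^{t}\psi_K(t;\eta) + \beta^{t}X_i\Bigr\}\,dN_i(t) \\
  &\quad - \sum_{i=1}^{n}\int_0^{\tau}
     Y_i(t)\,\lambda_0(t;\eta)\exp\{\theta_K^{t}\psi_K(t;\eta)\}\exp\{\beta^{t}X_i\}\,dt, \\[4pt]
\left.\frac{\partial \ell}{\partial \theta_K}\right|_{\theta_K=0}
  &= \sum_{i=1}^{n}\int_0^{\tau}\psi_K(t;\eta)
     \Bigl\{dN_i(t) - Y_i(t)\,\lambda_0(t;\eta)\exp\{\beta^{t}X_i\}\,dt\Bigr\}
   = Q(\eta,\beta), \\[4pt]
\left.\frac{\partial \ell}{\partial \eta}\right|_{\theta_K=0}
  &= \sum_{i=1}^{n}\int_0^{\tau}\rho(t;\eta)
     \Bigl\{dN_i(t) - Y_i(t)\,\lambda_0(t;\eta)\exp\{\beta^{t}X_i\}\,dt\Bigr\}
   = R(\eta,\beta),
\qquad \rho(t;\eta) = \frac{\partial}{\partial\eta}\log\lambda_0(t;\eta).
\end{align*}
```

Both scores are integrals of a weight function against the martingale increments $dM_i$, which is why martingale limit theory applies. In the procedure described on the slide, $\beta$ is estimated from the partial likelihood equation rather than from this full likelihood.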
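Test statistic:
$S_K = \frac{1}{n}\hat Q^{t}\,\hat\Xi^{-1}\hat Q$,
with
$\hat Q = Q(\hat\eta,\hat\beta) = \sum_{i=1}^{n} \int_0^{\tau} \psi_K(\hat\eta)\left\{dN_i - Y_i \lambda_0(\hat\eta)\exp\{\hat\beta^{t} X_i\}\,dt\right\}$,
where $\hat\Xi$ is an estimator of the limiting covariance matrix of $\frac{1}{\sqrt{n}}\hat Q$.
$S_K$ is a function of the generalized residuals $(R_1,\delta_1), \ldots, (R_n,\delta_n)$.
6. Asymptotics
Proposition: If the parameters are known,
$\frac{1}{\sqrt{n}}\begin{bmatrix} Q(\eta,\beta) \\ R(\eta,\beta) \\ S(\beta) \end{bmatrix} \xrightarrow{d} N(0, \Sigma)$, with $\Sigma = (\Sigma_{jk})$ partitioned conformably,
so $\frac{1}{\sqrt{n}} Q(\eta,\beta) \xrightarrow{d} N(0, \Sigma_{11})$.
Theorem: With estimated parameters,
$\frac{1}{\sqrt{n}}\hat Q \equiv \frac{1}{\sqrt{n}} Q(\hat\eta,\hat\beta) \xrightarrow{d} N_K(0, \Xi)$,
where
$\Xi = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} + (\Delta_1 - \Sigma_{12}\Sigma_{22}^{-1}\Delta_2)\,\Sigma_{33}^{-1}\,(\Delta_1 - \Sigma_{12}\Sigma_{22}^{-1}\Delta_2)^{t}$.
Proofs rely on the martingale central limit theorem of Rebolledo.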
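7. Effects of the Plug-In Procedure
From the covariance matrix
$\Xi = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} + (\Delta_1 - \Sigma_{12}\Sigma_{22}^{-1}\Delta_2)\,\Sigma_{33}^{-1}\,(\Delta_1 - \Sigma_{12}\Sigma_{22}^{-1}\Delta_2)^{t}$,
plugging in $(\hat\eta,\hat\beta)$ for $(\eta,\beta)$ to obtain the statistic $\hat Q$ has no asymptotic effect if
$\Sigma_{12} = 0$ and $\Delta_1 = 0$,
since $\Sigma_{11}$ is the limiting covariance for $\frac{1}{\sqrt{n}} Q(\eta,\beta)$.
Essence of "adaptiveness" [notion in semiparametrics: BKRW ('93); Cox and Reid ('87)]: it does not matter that the nuisance parameters $(\eta,\beta)$ are unknown in $Q(\eta,\beta)$, since replacing them by their estimators does not make the asymptotic distribution of $Q(\hat\eta,\hat\beta)$ different from that of $Q(\eta,\beta)$.
$\Sigma_{12} = 0$ is an orthogonality condition between $\psi_K(\cdot;\eta)$ and $\rho(\cdot;\eta)$.
$\Delta_1 = 0$ is an orthogonality condition between $\psi_K(\cdot;\eta)$ and $e(\cdot;\beta)$.
Orthogonality is defined in an appropriate Hilbert space with inner product
$\langle f, g \rangle = \int_0^{\tau} f g\,\nu(dt)$
on the class of square-integrable functions $L^2\{[0,\tau], \nu\}$, with
$\nu(A) = \int_A s^{(0)}(t;\beta)\,\lambda_0(t;\eta)\,dt$.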
Can we always choose the $\psi_K$ to satisfy the orthogonality conditions? Yes, via a Gram-Schmidt process, but this is hard to implement!
If the orthogonality conditions are not satisfied, substituting $(\hat\eta,\hat\beta)$ for $(\eta,\beta)$ in $Q(\eta,\beta)$ changes the asymptotic distribution of $\hat Q$, even though these estimators are consistent.
The effect is contained in the last two terms in the expression for $\Xi$. The second term is the result of estimating $\eta$ by $\hat\eta$, while the third term, which is an increase in the variance, is the effect of estimating $\beta$ by the partial MLE $\hat\beta$.
Estimating $\beta$ by $\hat\beta$ leads to an increase in variance because this estimator is less efficient than the full MLE of $\beta$.
Ignoring the effect on variance could have dire consequences in the testing. If the overall effect is a variance reduction, ignoring it may result in a highly conservative test and may lead to concluding model appropriateness when in fact the model is inappropriate.
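8. Form of the Test Procedure
Recall:
$\hat Q = Q(\hat\eta,\hat\beta) = \sum_{i=1}^{n} \int_0^{\tau} \psi_K(\hat\eta)\left\{dN_i - Y_i \lambda_0(\hat\eta)\exp\{\hat\beta^{t} X_i\}\,dt\right\}$.
Estimating the Limiting Covariance Matrix, $\Xi$: With
$\hat\Sigma = \frac{1}{2n}\sum_{i=1}^{n} \int_0^{\tau} \begin{bmatrix} \psi_K(\hat\eta) \\ \rho(\hat\eta) \\ X_i - E(\hat\beta) \end{bmatrix}^{\otimes 2} \left\{dN_i + Y_i \lambda_0(\hat\eta)\exp\{\hat\beta^{t} X_i\}\,dt\right\}$,
$\hat\Delta_1 = \frac{1}{2n}\sum_{i=1}^{n} \int_0^{\tau} \psi_K(\hat\eta)\,E(\hat\beta)^{t} \left\{dN_i + Y_i \lambda_0(\hat\eta)\exp\{\hat\beta^{t} X_i\}\,dt\right\}$, and
$\hat\Delta_2 = \frac{1}{2n}\sum_{i=1}^{n} \int_0^{\tau} \rho(\hat\eta)\,E(\hat\beta)^{t} \left\{dN_i + Y_i \lambda_0(\hat\eta)\exp\{\hat\beta^{t} X_i\}\,dt\right\}$,
an estimator of $\Xi$ is
$\hat\Xi = \hat\Sigma_{11} - \hat\Sigma_{12}\hat\Sigma_{22}^{-1}\hat\Sigma_{21} + (\hat\Delta_1 - \hat\Sigma_{12}\hat\Sigma_{22}^{-1}\hat\Delta_2)\,\hat\Sigma_{33}^{-1}\,(\hat\Delta_1 - \hat\Sigma_{12}\hat\Sigma_{22}^{-1}\hat\Delta_2)^{t}$.
Form of Asymptotic $\alpha$-Level Test: Reject $H_0: \lambda(\cdot) \in \mathcal{C}, \beta \in B$ whenever
$S_K = \frac{1}{n}\hat Q^{t}\,\hat\Xi^{-}\hat Q \ge \chi^2_{K^*;\alpha}$,
where $\hat\Xi^{-}$ = Moore-Penrose generalized inverse of $\hat\Xi$ and $K^* = \mathrm{rank}(\hat\Xi)$.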
9. Choices for $\psi_K$
Partition $[0,\tau]$: $0 = a_0 < a_1 < \cdots < a_{K-1} < a_K = \tau$. Let
$\psi_K(t) = \left[ I_{[0,a_1]}(t), I_{(a_1,a_2]}(t), \ldots, I_{(a_{K-1},\tau]}(t) \right]^{t}$.
Then
$\hat Q = [O_1 - E_1, O_2 - E_2, \ldots, O_K - E_K]^{t}$,
where
$O_j = \sum_{i=1}^{n} \int_{a_{j-1}}^{a_j} dN_i(t)$ and $E_j = \sum_{i=1}^{n} \int_{a_{j-1}}^{a_j} Y_i(t)\exp\{\hat\beta^{t} X_i\}\,\lambda_0(t;\hat\eta)\,dt$.
The $O_j$'s are observed frequencies. The $E_j$'s are estimated dynamic expected frequencies.
Extends Pearson's statistic, but `counts are in a dynamic fashion.'
The resulting test statistic is, in general, not of the form
$\sum_{j=1}^{K} \frac{(O_j - E_j)^2}{E_j}$
because the correction terms in the variance destroy the diagonal nature of the covariance matrix. If the adjustments are ignored, the distribution is not chi-squared.
Procedure extends Akritas (1988) and Hjort (1990).
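Example: Consider the no-covariate case (so $X_i = 0$), with $\mathcal{C} = \{\lambda_0(t;\eta) = \eta\}$ (Exponential distribution). For $j = 1, 2, \ldots, K$,
$O_j = \sum_{i=1}^{n} \int_{a_{j-1}}^{a_j} dN_i(t)$,
$E_j = \hat\eta \sum_{i=1}^{n} \int_{a_{j-1}}^{a_j} Y_i(t)\,dt$,
$\hat\eta = \frac{\sum_{i=1}^{n} \int_0^{\tau} dN_i(t)}{\sum_{i=1}^{n} \int_0^{\tau} Y_i(t)\,dt} = \frac{\text{\# of events}}{\text{total exposure}}$.
The resulting test statistic becomes
$S_K = E_{\bullet}\,[p - \hat\pi]^{t}\left[\mathrm{Dg}(\hat\pi) - \hat\pi\hat\pi^{t}\right]^{-}[p - \hat\pi]$,
where
$E_{\bullet} = \sum_{j=1}^{K} E_j$, $p = \frac{1}{E_{\bullet}}(O_1, O_2, \ldots, O_K)^{t}$, $\hat\pi = \frac{1}{E_{\bullet}}(E_1, E_2, \ldots, E_K)^{t}$.
$E_{\bullet} \ne n$. The appropriate df for the statistic is $K - 1$, since $1^{t}\hat\pi = 1$.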
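The exponential example can be computed directly from right-censored data. The following is a minimal sketch, not the author's code: the function name `exponential_gof`, the data, and the cut points are made up for illustration. It relies on one algebraic fact for this particular example: because the estimated rate is events over total exposure, the observed and expected counts have the same total, so the vector of differences sums to zero and the generalized-inverse quadratic form equals the sum of (O_j - E_j)^2 / E_j (the inverse of the diagonal matrix is a generalized inverse of the diagonal-minus-outer-product matrix). All E_j must be positive.

```python
def exponential_gof(z, delta, cuts):
    """Partition-based test of exponentiality with right-censored data (no covariates).

    z: observed times; delta: 1 = failure, 0 = censored;
    cuts: a_0 < a_1 < ... < a_K with a_0 = 0 and a_K >= max(z).
    Returns (eta_hat, observed, expected, s_k).
    """
    K = len(cuts) - 1
    observed = [0.0] * K   # O_j: failures falling in interval j
    exposure = [0.0] * K   # time at risk accumulated in interval j
    for zi, di in zip(z, delta):
        for j in range(K):
            lo, hi = cuts[j], cuts[j + 1]
            exposure[j] += max(0.0, min(zi, hi) - lo)
            if di == 1 and lo < zi <= hi:
                observed[j] += 1
    eta_hat = sum(observed) / sum(exposure)          # events / total exposure
    expected = [eta_hat * e for e in exposure]       # E_j, dynamic expected counts
    # Reduced form of the quadratic form, valid for this exponential example.
    s_k = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return eta_hat, observed, expected, s_k

# Hypothetical data: times and status (1 = failure, 0 = censored).
z = [0.5, 1.2, 0.3, 2.5, 1.8, 0.9, 3.0, 0.7]
delta = [1, 1, 0, 1, 0, 1, 1, 1]
cuts = [0.0, 1.0, 2.0, 3.0]

eta_hat, observed, expected, s_k = exponential_gof(z, delta, cuts)
print(eta_hat, observed, expected, s_k)
```

The value `s_k` would be compared with a chi-squared distribution on K - 1 degrees of freedom, as stated on the slide. Note that the expected counts sum to the number of observed failures, not to the sample size.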
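Polynomial-type of basis:
$\psi_K(t;\eta) = \left[1, \Lambda_0(t;\eta), \Lambda_0(t;\eta)^2, \ldots, \Lambda_0(t;\eta)^{K-1}\right]^{t}$.
The resulting $\hat Q$ vector has components
$\hat Q_j = \sum_{i=1}^{n} \exp\{-(j-1)\hat\beta^{t} X_i\} \int w^{j-1}\left\{dN_i^R(w) - Y_i^R(w)\,dw\right\}$,
where
$N_i^R(w) = I\{R_i \le w, \delta_i = 1\}$, $Y_i^R(w) = I\{R_i \ge w\}$, $R_i = \Lambda_0(Z_i;\hat\eta)\exp\{\hat\beta^{t} X_i\}$.
The $N_i^R$'s and $Y_i^R$'s are generalized residual processes. Generalizes Hyde's (mid '70's).
Total-Time-On-Test Specification: For $K = 1$, define
$\hat\tau = \Lambda_0(\tau;\hat\eta)$, $N^R(t) = N[\Lambda_0^{-1}(t;\hat\eta)]$, $Y^R(t) = Y[\Lambda_0^{-1}(t;\hat\eta)]$, $R^R(t) = \int_0^t Y^R(s)\,ds$.
Let
$\psi(t) = \frac{R^R(t)}{R^R(\hat\tau)} - \frac{1}{2}$.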
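For the polynomial basis, the score components can be written in closed form in the simplest setting. The sketch below assumes no covariates and an exponential null, so the generalized residuals are R_i = eta_hat * Z_i. It also assumes the component formula is the sum over units of the integral of w^(j-1) against dN_i^R(w) - Y_i^R(w) dw, which then evaluates to delta_i * R_i^(j-1) - R_i^j / j. The function name and data are hypothetical.

```python
def polynomial_scores(z, delta, K):
    """Score components Q_1..Q_K for the polynomial basis, exponential null, no covariates."""
    eta_hat = sum(delta) / sum(z)                 # events / total exposure
    residuals = [eta_hat * zi for zi in z]        # generalized residuals R_i
    scores = []
    for j in range(1, K + 1):
        # integral of w^(j-1) dN_i^R(w) = delta_i * R_i^(j-1);
        # integral of w^(j-1) Y_i^R(w) dw = R_i^j / j
        q_j = sum(d * r ** (j - 1) - r ** j / j for r, d in zip(residuals, delta))
        scores.append(q_j)
    return residuals, scores

# Hypothetical data: times and status (1 = failure, 0 = censored).
z = [0.5, 1.2, 0.3, 2.5, 1.8, 0.9, 3.0, 0.7]
delta = [1, 1, 0, 1, 0, 1, 1, 1]

residuals, scores = polynomial_scores(z, delta, K=4)
print(scores)
```

The first component is zero by construction, because the constant basis function reproduces the estimating equation for the rate. That is one way to see why a statistic built on K polynomial terms carries fewer than K degrees of freedom.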
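Resulting Test: Reject $H_0$ whenever
$S_1 = \left[\frac{N^R(\hat\tau)}{R^R(\hat\tau)}\right] \frac{[Q^R(\hat\tau)]^2}{[1 - 12\,\hat\varphi(\hat\tau)]} \ge \chi^2_{1;\alpha}$,
where
$Q^R(\hat\tau) = \sqrt{12 N^R(\hat\tau)} \left\{ \frac{1}{N^R(\hat\tau)} \int_0^{\hat\tau} \frac{R^R(t)}{R^R(\hat\tau)}\,dN^R(t) - \frac{1}{2} \right\}$,
$\hat\varphi(\hat\tau) = \frac{\hat\Sigma_{12}\hat\Sigma_{22}^{-1}\hat\Sigma_{12}^{t}}{n^{-1} R^R(\hat\tau)}$.
In notation usable for "teaching purposes,"
$\hat Q = \sqrt{12 n_{\bullet}} \left[ \frac{1}{n_{\bullet}} \sum_{i=1}^{n} \frac{W_{(i)}}{W_{(n)}}\,\delta_{(i)} - \frac{1}{2} \right]$,
where
$n_{\bullet} = \sum_{i=1}^{n} \delta_i$ = number of observed failures,
$W_{(i)} = \sum_{j=1}^{i} (n - j + 1)(R_{(j)} - R_{(j-1)})$,
$R_{(j)}$ = $j$th smallest generalized residual.
$Q^R(\hat\tau)$: generalizes the normalized spacings test applied to the generalized residual processes $(N^R, Y^R)$.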
Good for detecting IFR alternatives.
Bonus: Able to show that this normalized spacings test is a score test!
$[1 - 12\,\hat\varphi(\hat\tau)]^{-1}$ represents the variance adjustment due to the estimation of the nuisance parameter $\eta$ by $\hat\eta$.
If $\mathcal{C}$ is the constant hazard class, no correction is needed even with censored data. Adaptiveness rules!
If $\mathcal{C}$ is the two-parameter Weibull, the correction term is not ignorable. For complete data,
$[1 - 12\,\varphi(\infty)]^{-1} = \left[1 - \frac{18(\log 2)^2}{\pi^2}\right]^{-1} = 8.08\ldots$
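The size of the Weibull correction for complete data is a one-line computation. The sketch assumes the garbled expression on the slide reads as the reciprocal of 1 - 18 (log 2)^2 / pi^2, which is the reading that reproduces the printed digits once the zeros dropped by the text extraction are restored.

```python
import math

# Variance adjustment factor for the two-parameter Weibull null, complete data:
# 1 / (1 - 18 * (log 2)^2 / pi^2)
adjustment = 1.0 / (1.0 - 18.0 * math.log(2.0) ** 2 / math.pi ** 2)
print(round(adjustment, 2))  # 8.08
```

A factor of about 8 means the uncorrected statistic would be far too small, so ignoring the correction would make the test very conservative.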
10. Levels of Tests for Different Smoothing Order, K
Polynomial: $\psi_K(t;\eta) = \left[1, \Lambda_0(t;\eta), \ldots, \Lambda_0(t;\eta)^{K-1}\right]^{t}$, $K \in \{1, 2, 3, 4\}$; censoring proportion $\in \{25\%, 50\%\}$; true failure time models were exponential and 2-Weibull. # of Reps = 2000.

Null Dist.: Exponential($\eta$)
Parameters          $\eta = 2$        $\eta = 5$
% Uncensored        75%     50%     75%     50%
Level               5%      5%      5%      5%
n      K
50     2            4.65    6.65    5.0     5.6
50     3            5.45    5.5     4.35    4.95
50     4            6.55    5.5     5.1     5.85
50     5            6.4     5.0     5.2     4.2
100    2            4.9     4.75    4.45    4.35
100    3            4.55    4.35    4.65    4.25
100    4            5.7     5.3     5.1     4.8
100    5            5.75    4.9     5.3     4.75

Null Dist.: Weibull($\alpha$, $\eta$)
Parameters          ($\alpha$, $\eta$) = (2, 1)    ($\alpha$, $\eta$) = (3, 2)
% Uncensored        75%     50%     75%     50%
Level               5%      5%      5%      5%
n      K
50     2            4.3     4.8     4.6     6.2
50     3            5.4     5.15    6.3     5.7
50     4            4.8     4.8     6.1     5.3
50     5            5.25    3.45    6.7     4.6
100    2            3.9     4.95    4.2     4.8
100    3            5.75    4.65    5.15    5.25
100    4            5.55    4.15    5.0     4.3
100    5            6.0     4.65    5.8     5.3
11. Power Function of Tests
Legend: Solid line (K = 2); Dots (K = 3); Short dashes (K = 4); Long dashes (K = 5)
[Figure 2: Simulated Powers for Null: Exponential vs. Alt: 2-Weibull. Power versus Alpha (Weibull Shape Parameter).]
[Figure 3: Simulated Powers for Null: Exponential vs. Alt: 2-Gamma. Power versus Alpha (Gamma Shape Parameter).]
[Figure 4: Simulated Powers for Null: 2-Weibull vs. Alt: 2-Gamma. Power versus Alpha (Gamma Shape Parameter).]
Some Observations:
- Appropriate smoothing order $K$ depends on the alternative considered.
- Not always necessary to have large $K$!
- Tests based on $S_3$ and $S_4$ could serve as omnibus tests, at least for the models in these simulations.
- Calls for a formal method to dynamically determine $K$.
12. Back to the Lung Cancer Data
Product-Limit Estimator (PLE) of the survival curve together with confidence band, and the best-fitting exponential survival curve.
[Figure: Survival Probability versus Time to Death.]
Testing Exponentiality: Values of the test statistics using the polynomial-type specification, together with their p-values, are:
$S_2 = 1.92$ ($p = .1661$), $S_3 = 1.94$ ($p = .3788$), $S_4 = 7.56$ ($p = .0561$), $S_5 = 12.85$ ($p = .0121$).
Close to a constant function, but with high frequency terms.
Testing Two-Parameter Weibull: The value of $S_3$, together with its p-value, is
$S_3 = 8.35$ ($p = .0153$).
Values of $S_4$ and $S_5$ both indicate rejection of the two-parameter Weibull model.
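The reported p-values can be checked against chi-squared tail probabilities. This sketch assumes the reference distribution for $S_K$ has K - 1 degrees of freedom, which the slides do not state explicitly for these data but which reproduces the reported values to about three decimals. The function `chi2_sf` is a hypothetical helper that uses closed-form expressions for 1 to 4 degrees of freedom only.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable, closed forms for df = 1..4."""
    half = x / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df == 2:
        return math.exp(-half)
    if df == 3:
        return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    if df == 4:
        return math.exp(-half) * (1.0 + half)
    raise ValueError("only df = 1, 2, 3, 4 are implemented in this sketch")

# Exponential null: statistics S_2..S_5 as reported, with df = K - 1 (an assumption).
exponential_results = {2: 1.92, 3: 1.94, 4: 7.56, 5: 12.85}
exponential_pvalues = {K: chi2_sf(s, K - 1) for K, s in exponential_results.items()}

# Two-parameter Weibull null: S_3 as reported, again with df = K - 1 = 2.
weibull_pvalue = chi2_sf(8.35, 2)

print(exponential_pvalues, weibull_pvalue)
```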
13. Concluding Remarks
- Presented a formal approach to develop GOF tests with censored data.
- Able to see effects of estimating nuisance parameters: (Don't Ignore!)
- Potential problems when using generalized residuals in model validation: (Intuitive considerations may Fail!)
- Promising possibility: Wavelets as Basis??!
- Discrete hazard modeling. Almost finished with this.
Open problems:
- How about a detailed comparison with existing tests: KS, CVM, etc.?
- How to determine smoothing order $K$ adaptively?
- What if $\mathcal{C}$ is a nonparametric life distribution class such as the IFRA?