Survival Times (in months) Survival Times (in months) Relative Frequency. Relative Frequency

Smooth Goodness-of-Fit Tests in Hazard-Based Models by Edsel A. Pe~na Department of Statistics University of South Carolina at Columbia E-Mail: pena@stat.sc.edu Univ. of Virginia Colloquium Talk Department of Statistics December 1, 2 Research supported by an NIH Grant 1

1. Practical Problem ffl Right-censored survival data for lung cancer patients from Gatsonis, Hsieh and Korwar (1985). ffl Survival times (in months): n = 86 with 23 right-censored..99, 1.28, 1.77, 1.97, 2.17, 2.63, 2.66, 2.76, 2.79, 2.86, 2.99, 3.6, 3.15, 3.45, 3.71, 3.75, 3.81, 4.11, 4.27, 4.34, 4.4, 4.63, 4.73, 4.93, 4.93, 5.3, 5.16, 5.17, 5.49, 5.68, 5.72, 5.85, 5.98, 8.15, 8.26, 8.48, 8.61, 9.46, 9.53, 1.5, 1.15, 1.94, 1.94, 11.4+, 11.24, 11.63, 12.26, 12.65, 12.78, 13.18, 13.47, 13.53+, 13.96, 14.23+, 14.65+, 14.88, 14.91+, 15.5, 15.31, 15.47+, 16.13, 16.46, 16.49+, 17.5+, 17.28+, 17.45, 17.61, 17.88+, 17.97+, 18.2, 18.37 18.83+ 19.6 19.55+ 19.58+ 19.75+ 19.78+ 19.95+ 2.4+, 2.24+, 2.7, 2.73+, 21.55+, 21.98+, 22.54, 23.36 ffl Probability histograms of the Complete and Censored Values. Relative Frequency..2.4.6.8 Relative Frequency..2.4.6.8.1.12.14 5 1 15 2 25 Survival Times (in months) 1 12 14 16 18 2 22 Survival Times (in months) 2

ffl NPMLE of Survivor Function: Product-Limit Estimator ^μf (t) = Y 8 < : 1 N(s) 9 = Y (s) ; ; fs»tg where N(s) = No. of uncensored times up to time s; Y (s) = No. at risk at time s: ffl Product-limit estimator and best-fitting exponential survivor function. Survivor Function Estimates..2.4.6.8 1. 5 1 15 2 Survival Time (in months) ffl Question: Did the survival data come from the family of exponential distributions? Or was it from the family of Weibull distributions? 3

ffl Another Data Set: Times to withdrawal (in hours) of 171 car tires in Davis and Lawrance (1989, Scand. J. Statist.) from a car-tire testing study, with withdrawal either due to failure or right-censoring. ffl The pneumatic tires were subjected to laboratory testing by rotating each tire against a steel drum until either failure [actually, there were several competing causes] or removal (right-censoring). ffl Plots of the product-limit estimator, best-fitting exponential, and bestfitting Weibull distribution. Survivor Function Estimates..2.4.6.8 1. 5 1 15 2 25 3 35 Removal Times (in hours) ffl Question: Is this failure-time car tires data consistent with an underlying Weibull distribution? ffl Both situations point to a goodness-of-fit problem with right-censored data. 4

2. Densities and Hazards ffl T = a positive-valued continuous failure-time variable, e.g., Π time-to-failure of a mechanical or electronic system Π time-to-occurrence of an event Π survival time of a patient in a clinical trial ffl f(t) =density function of T. f(t) t ß P ft 2 [t; t + t)g : ffl F (t) = PfT» tg = distribution function ffl μ F (t) = 1 F (t) =survivor function ffl (t) = f(t)= μ F (t) =hazard rate function. (t) t ß P ft 2 [t; t + t)jt tg : ffl Λ(t) = R t (w)dw = log[ μf (t)] = (cumulative) hazard function ffl Equivalences: μf (t) = expf Λ(t)g f(t) = (t)expf Λ(t)g 5

ffl Two Common Examples: Π Exponential: f(t; ) = expf tg μf (t; ) =expf tg (t; ) = Λ(t; ) = t Π Two-Parameter Weibull: f(t; ff; ) =(ff )( t) ff 1 expf ( t) ff g μf (t; ff; ) = expf ( t) ff g (t; ff; ) =(ff )( t) ff 1 Λ(t; ff; ) =( t) ff Π Qualitative Aspects from Plots of Hazards Weibull Hazard Plots Hazard Rate 1.5 2. 2.5 3. 3 3 33 1 3 2 22 2 2 22 2 2 2 4 6 8 1 Time 6

3. Hazard-Based Models Π IID Model: T 1 ;T 2 ;:::;T n IID with common unknown hazard rate function (t). Observable vectors: (Z 1 ;ffi 1 ); (Z 2 ;ffi 2 );:::;(Z n ;ffi n ) with ffi i = 1 ) T i = Z i ffi i = ) T i > Z i : Π Cox PH Model (also Andersen and Gill Model): (T 1 ;X 1 ); (T 2 ;X 2 );:::;(T n ;X n ) with T jx (tjx) = (t)expffi t Xg ( ) an unknown hazard rate function, and fi a regression coefficient vector. Observable vectors: (Z 1 ;ffi 1 ;X 1 ); (Z 2 ;ffi 2 ;X 2 );:::;(Z n ;ffi n ;X n ) with ffi i = 1 ) T i = Z i ffi i = ) T i > Z i : 7

4. Problems and Issues ffl Problem (Goodness-of-Fit): Given f(z i ;ffi i );i= 1; 2;:::;ng in the IID model, or f(z i ;ffi i ;X i );i= 1; 2;:::;ng in the Cox model, decide whether ( ) 2 C = f ( ; ) : 2 g where is a nuisance parameter vector. Π C could be: Exponential, Weibull, Gamma; or IFRA class. Π Importance? Π Previous works on GOF problem with censored data: Akritas (88, JASA), Hjort (9, AS), Hollander and Pe~na (92, JASA), Li and Doss (93, AS), and others. Π Generalize Pearson's: χ 2 P = X (O j ^E j ) 2 ^E j? Π Difficulty: O j 's not computable with censored data. Also, any optimality properties of existing tests? 8

ffl Problem (Model Validation): Given f(z i ;ffi i );i = 1; 2;:::;ng or f(z i ;ffi i ;X i );i = 1; 2;:::;ng, how to assess model assumptions (e.g., IID assumption, model structure)? Π Unit Exponentiality Property (UEP) T ο Λ( ) ) Λ(T ) ο EXP(1) Π IID model: If Λ ( ) is the true hazard function, then with R i = Λ (Z i ), (R1;ffi 1 ); (R2;ffi 2 );:::;(Rn;ffi n ) is a right-censored sample from EXP(1). Π Since Λ ( ) is not known, the R i 's are estimated by R i 's with R i = ^Λ(Z i ); i = 1; 2;:::;n; ^Λ( ) is an estimator of Λ ( ) based on the (Z i ;ffi i )'s. Π Idea: (R i ;ffi i )'s assumed to form an approximate right-censored sample from EXP(1), so to validate model, test whether (R i ;ffi i )'s is a right-censored sample from EXP(1). Π Question: How good is this approximation, even in the limit??? 9

Π For Cox PH model, the analogous expressions for R i and R i are: R i = Λ (Z i ) expffix i g; i = 1; 2;:::;n; R i = ^Λ(Z i )expf ^fix i g; i = 1; 2;:::;n: Λ ^Λ( ) is an estimator of Λ ( ) [e.g., Aalen-Breslow estimator]. Λ ^fi is an estimator of fi [partial likelihood MLE]. Π R i 's are true generalized residuals (Cox and Snell (68, JRSS)). Π R i 's are estimated generalized residuals. Π Generalized residuals are analogs of linear model residuals: (Observed Value) minus (Fitted Value)." Π Question: Effects of substituting estimators for the unknown parameters, even when the estimators are consistent?? 1

5. Class of Smooth GOF Tests ffl Convert observed data (Z 1 ;ffi 1 ;X 1 ); (Z 2 ;ffi 2 ;X 2 );:::;(Z n ;ffi n ;X n ) into stochastic processes. ffl For i = 1; 2;:::;n, and t, let N i (t) = IfZ i» t; ffi i = 1g; Y i (t) =IfZ i tg: ffl A i (t; ( );fi)= so Z t Y i (w) (w)expffi t X i gdw; Pf N i (t) = 1jF t g = Y i (t) (t) expffi t X i gdt: ffl M i (t; ( );fi)=n i (t) A i (t; ( );fi): ffl If ( ) and fi are the true parameters, M (t) = (M 1 (t; ( );fi );:::;M n (t; ( );fi )) are orthogonal sq-int zero-mean martingales with predictable quadratic variation processes hm i ;M i i(t) = A i (t; ( );fi ): 11

ffl Problem: Test H : ( ) 2 C = f ( ; ) : 2 g versus H 1 : ( ) =2 C: ffl Idea: If ( ) is the true hazard rate function, then under H there is some 2 such that ( ) = ( ; ): ffl Define 2»(t; ) = log 4 3 (t) 5 : (t; ) ffl K = collection of such f»( ; ) : 2 g. ffl Basis set (e.g., trigonometric, polynomial, wavelet, etc.) for K: Ψ = fψ 1 ( ; );ψ 2 ( ; );:::g; X»(t; ) = 1 j ψ(t; ) = Ψ: j=1 ffl For smoothing order K, approximate»( ; ) by»(t; ) = K X j=1 j ψ(t; ) = KΨ K : ffl Equivalently, 8 9 < KX = (t) ß (t; )exp: j ψ(t; ); : j=1 ffl Define: 8 8 9 9 < C K = : < KX = K( ; ; ) = ( ; )exp: j ψ j ( ; ); : = K 2 < K ; 2 ; : ffl Embedding: H ρ C K : j=1 12

ffl Goodness-of-Fit Tests: Score tests for H : K = ;fi 2B versus H 1 : K 6= ;fi 2 B: ffl Tests introduced in Pe~na (1998, JASA; 1998, AS). ffl Since score tests, they possess local optimality properties. ffl Likelihood function: obtained via product integration. ffl Score function for K at K = : Q( ; fi) = n X Z fi Ψ K ( ) n dn i Y i ( )expffi t X i gdt o : ffl Not a statistic since and fi are unknown. ffl Plug-in estimators for and fi under the restriction K =. ffl Estimate fi by ^fi which solves S(fi) n X S (m) (t; fi) = n X ffl Estimate by ^ which solves R( ; ^fi) n X Z fi Z fi [X i E(fi)]dN i = ; E(t; fi) = S(1) (t; fi) S () (t; fi) ; X Ωm i Y i expffi t X i g; m = ; 1; 2: ρ ρ( ) dn i Y i ( )expf ^fi ff t X i gdt = ; ρ(t; ) = @ @ log (t; ): 13

ffl Test statistic: S K = 1 n ρ ffρ^ξ 1 ffρ^qff ^Qt ; with ^Q = Q(^ ; ^fi) = n X Z fi ρ Ψ K (^ ) dn i Y i (^ )expf ^fi ff t X i gdt ffl ^Ξ is an estimator of the limiting covariance matrix of 1 p n ^Q. ffl S K is a function of the generalized residuals (R 1 ;ffi 1 );:::;(R n ;ffi n ): 6. Main Asymptotic Results ffl Proposition: If the parameters are known, 2 3 2 1 1 Q( ;fi ) p 6 n 4 R( ;fi ) d 7 5! N 6 4 B @ C A ; ± = B @ S(fi ) so 1 p n Q( ;fi ) d! N(; ± 11 ): ± 11 ± 12 ± 21 ± 22 ± 33 13 C A 7 5 ; ffl Theorem: With estimated parameters, 1 1 p ^Q pn Q(^ ; ^fi) n d! N K (; Ξ) where Ξ = ± 11 ± 12 ± 1 22 ± 21 +( 1 ± 12 ± 1 22 2)± 1 33 ( 1 ± 12 ± 1 22 2) t : ffl Proof: Relies on Rebolledo's MCT. 14

7. Effects of the Plug-In Procedure Π From Ξ = ± 11 ± 12 ± 1 22 ± 21 +( 1 ± 12 ± 1 22 2)± 1 33 ( 1 ± 12 ± 1 22 2) t ; replacing ( ; fi) by(^ ; ^fi) to obtain the statistic ^Q has no asymptotic effect if ± 12 = and 1 = ; since ± 11 is the limiting covariance for 1 p n Q( ;fi ). Π Essence of adaptiveness:" it does not matter that nuisance parameters ( ; fi) are unknown in Q( ; fi) since replacing them by their estimators does not make theasymptotic distribution of Q(^ ; ^fi) different from Q( ;fi ). Π ± 12 = is an orthogonality condition between Ψ K ( ) and ρ( ). Π 1 = is an orthogonality condition between ρ( ) and e( ;fi ). Π Orthogonality defined in a Hilbert space whose inner product is hf;gi = Z fi fgν (dt); with Z ν (A) = A s() ( ;fi ) ( )dt; s () (t; ;fi ) = plim 1 n S() (t; ;fi ): 15

Π Can we always choose Ψ K to satisfy orthogonality conditions? Yes, via a Gram-Schmidt process... but hard to implement! Π If orthogonality conditions are not satisfied, substituting (^ ; ^fi) for ( ;fi ) in Q( ;fi ) impacts on the asymptotic distribution of ^Q, even if the estimators are consistent. Π Second term in Ξ: effect of estimating by ^. Π Third term in Ξ: effect of estimating fi by the partial MLE ^fi. Π Estimating fi by ^fi leads to an increase in variance because this estimator is less efficient than the full MLE of fi. Π Ignoring effect on variance could have dire consequences in the goodness-of-fit testing. Π If overall effect is a variance reduction, ignoring effect may result in a highly conservative test and lead into concluding model appropriateness when model is inappropriate. 16

8. Form of Test Procedure ffl Recall: ^Q = Q(^ ; ^fi) X = n Z fi ρ Ψ K (^ ) dn i Y i (^ )expf ^fi ff t X i gdt : ffl Estimating Limiting Covariance Matrix, Ξ: With ^± = 1 2n nx ^ 1 = 1 2n ^ 2 = 1 2n Z fi nx nx 2 6 4 Z fi an estimator of Ξ is Ψ K (^ ) ρ(^ ) X i E( ^fi) Z fi 3Ω2 ρ 7 dni 5 + Yi (^ )expf ^fi ff t Xigdt ; Ψ K (^ )E( ^fi) ρdn t i + Y i (^ ) expf ^fi ff t X i gdt ; ρ(^ )E( ^fi) ρdn t i + Y i (^ ) expf ^fi ff t X i gdt ; ^Ξ = ^± 11 ^± ^± 1 ^± 12 22 21 +(^ 1 ^± ^± 1 1 12 22 ^ 2)^± 33 ( ^ 1 ^± ^± 1 ^ 2) t 12 22 : ffl Asymptotic ff-level Test: Reject H : ( ) 2 C; fi 2 B whenever S K = 1 ρ ^Q ffρ^ξ t ffρ^q ff χ 2 K n ;ff; Λ ^Ξ = Moore-Penrose generalized inverse of ^Ξ and K Λ = rank(^ξ): 17

9. Choices for Ψ K ffl Partition [;fi]: = a < a 1 < ::: < a K 1 < a K = fi ffl Let Ψ K (t) = h I [;a1 ](t);i (a1 ;a 2 ](t);:::;i (ak 1 ;fi ](t) i t : ffl Then ^Q = [O 1 E 1 ;O 2 E 2 ;:::;O K E K ] t where E j = n X O j = n X Z aj Z aj a j 1 dn i (t) a j 1 Y i (t) expf ^fi t X i g (t;^ )dt: ffl O j 's are observed frequencies. ffl E j 's are estimated dynamic expected frequencies. ffl Extends Pearson's statistic, but `counts are in a dynamic fashion.' ffl Resulting test statistic not of form KX j=1 (O j E j ) 2 E j because correction terms in variance destroys diagonal nature of covariance matrix. ffl If adjustments are ignored, distribution is not chi-squared. ffl Procedure extends Akritas (1988) and Hjort (199). 18

ffl Example: Consider no covariates (so X i = ); C = f (t; ) = g, that is, Exponential distribution. ffl For j = 1; 2;:::;K, ^ = ffl Resulting test statistic: O j = n X E j = ^ n X P n R fi dn i(t) P n R fi Y i(t)dt Z aj a j 1 dn i (t); Z aj a j 1 Y i (t)dt; = Number of events Total exposure : S K = E ffl [p ^ß] t h Dg(^ß) ^ß^ß ti [p ^ß] ; where X E ffl = K E j ; j=1 p = 1 E ffl (O 1 ;O 2 ;:::;O K ) t ; ^ß = 1 E ffl (E 1 ;E 2 ;:::;E K ) t : ffl E ffl 6= n. ffl Appropriate degrees-of-freedom for the statistic is K 1 since 1 t^ß = 1: 19

ffl Polynomial-type: Ψ K (t; ) = h 1; Λ (t; ); Λ (t; ) 2 ; :::; Λ (t; ) K 1i t : ffl Resulting ^Q vector has components ^Q j ;j = 1; 2;:::;K given by ^Q j = n X where, with expf (j 1) ^fix Z fi i g w n j 1 dn R (w) Y R i i (w)dw o ; R i = Λ (Z i ;^ )expf ^fix i g being the estimated residuals, N R i (w) =IfR i» w; ffi i = 1g; Y R i (w) =IfR i wg: ffl N R i 's and Y R i 's are generalized residual processes. ffl Resulting test generalizes test proposed by Hyde's (mid '7's). 2

1. Levels of Tests for Different K's ffl Polynomial: Ψ K (t; ) = 1; Λ (t; );:::;Λ (t; ) K 1 t ffl K 2 f1; 2; 3; 4g; censoring proportion 2 f25%; 5%g and true failure time model were exponential and 2-Weibull. # of Reps = 2 Null Dist. Exponential( ) Parameters = 2 = 5 % Uncensored 75% 5% 75% 5% Level 5% 5% 5% 5% n K 2 4.65 6.65 5. 5.6 5 3 5.45 5.5 4.35 4.95 4 6.55 5.5 5.1 5.85 5 6.4 5. 5.2 4.2 2 4.9 4.75 4.45 4.35 1 3 4.55 4.35 4.65 4.25 4 5.7 5.3 5.1 4.8 5 5.75 4.9 5.3 4.75 Null Dist. Weibull(ff; ) Parameters (ff; ) =(2; 1) (ff; ) = (3; 2) % Uncensored 75% 5% 75% 5% Level 5% 5% 5% 5% n K 2 4.3 4.8 4.6 6.2 5 3 5.4 5.15 6.3 5.7 4 4.8 4.8 6.1 5.3 5 5.25 3.45 6.7 4.6 2 3.9 4.95 4.2 4.8 1 3 5.75 4.65 5.15 5.25 4 5.55 4.15 5. 4.3 5 6. 4.65 5.8 5.3 21

11. Power Function of Tests Simulation Parameters: n = 1; 25% censoring. Legend: Solid line (K = 2); Dots (K = 3); Short dashes (K = 4); Long dashes (K = 5). Figure 1: Simulated Powers for Null: Exponential vs. Alt: 2-Weibull Power 2 4 6 8 1.6.8 1. 1.2 1.4 1.6 Alpha (Weibull Shape Parameter) Figure 2: Simulated Powers for Null: Exponential vs. Alt: 2-Gamma Power 2 4 6 8 1 1 2 3 4 Alpha (Gamma Shape Parameter) 22

Figure 3: Simulated Powers for Null: 2-Weibull vs. Alt: 2-Gamma Power 1 2 3 4 5 6 5 1 15 2 25 Alpha (Gamma Shape Parameter) Some Observations: ffl Appropriate smoothing order K depends on alternative considered. ffl Not always necessary to have large K! ffl Tests based on S 3 and S 4 could serve as omnibus tests, at least for the models in these simulations. ffl Calls for a formal method to dynamically determine K. Plan: to utilize information-based criteria to determine appropriate smoothing order. 23

12. Back to the Lung Cancer Data ffl Product-Limit Estimator (PLE) of survival curve together with confidence band, and the best fitting exponential survival curve. Survivor Function Estimates..2.4.6.8 1. 5 1 15 2 Survival Time (in months) ffl Testing Exponentiality: Values of test statistics using the polynomialtype specification, together with their p-values are: S 2 = 1:92(p = :1661); S 3 = 1:94(p = :3788); S 4 = 7:56(p = :561); S 5 = 12:85(p = :121): ffl Close to a constant function, but with high frequency terms. ffl Testing Two-Parameter Weibull: Value of S 3, together with its p-value, is S 3 = 8:35 (p = :153) Values of S 4 and S 5 both indicate rejection of two-parameter Weibull model. 24

13. Back to the Car Tire Example Survivor Function Estimates..2.4.6.8 1. 5 1 15 2 25 3 35 Removal Times (in hours) Results for Testing Exponentiality Summary of Results of Smooth Goodness-of-Fit Test Testing for Exponential Distribution Input FileName: C: Talks TalkATUG davislawrancecartiredata.txt Output FileName: C: Talks TalkATUG cartire.out Sample Size = 171 Sum of Z_i = 36457. Sum of Delta_i = 15 Estimate of Eta = 4.11443618594769E-3 Estimate of Mean = 243 k S_k DF P-Value 1. 1. 2 172.371 1. 3 174.9297 2. 4 175.7617 3. 5 175.7839 4. Conclusion: Results very consistent with the graphical display: the exponential model does not hold. 25

Results for Testing the Weibull Class Summary of Results of Smooth Goodness-of-Fit Test Testing for the Two-Parameter Weibull Distribution Using the Polynomial Specification for Psi Input FileName: C: Talks TalkATUG davislawrancecartiredata.txt Output FileName: C: Talks TalkATUG cartireweibull.txt Sample Size = 171 # of Uncensored Values = 15. Estimate of alpha = 3.4189555692639 Estimate of eta = 4.85166613135E-3 k S_k DF P-Value 1. 1. 2.513 1.47891 3.555 2.75769 4 6.423 3.9286 5 6.5183 4.16364 Conclusions From Analysis ffl Cannot reject the hypothesis that the Weibull class holds for this car tire data. ffl Notice in particular the p-values associated with the different smoothing order. ffl Result of test consistent with the plots of the survivor estimates! ffl Are they, really? Results Tests Using Larger Smoothing Orders k S_k DF P-Value 5 6.5183 4.16364 6 9.881 5.7871 7 27.3555 6.12 8 31.423 7.5 9 31.416 8.12 1 31.8217 9.21 26

14. Extensions and Open Problems ffl When failure times are discrete or interval=censored. Paper with V. Nair now in draft form. New twist is that embedding of the discrete hazards is through their odds! ffl How about if the failure times are of a mixed-type, i.e., with continuous and discrete components? ffl Problem of adaptively determining the smoothing order to be addressed. ffl Promising possibility for Ψ K : Wavelets as Basis??! 15. Conclusions ffl Presented a formal approach for developing GOF tests with censored data, and extends in a more natural way Neyman's smooth gof tests (cf., Rayner and Best (1989), Smooth Goodness of Fit Tests). ffl Resulting tests possess optimality conditions by virtue of being score tests. ffl Able to see effects of estimating nuisance parameters: (Morale: Don't Ignore!) ffl Potential problems when using generalized residuals in model validation: (Morale: Intuitive considerations may Fail!) 27