Goodness-of-fit test for the Cox Proportional Hazard Model


Rui Cui
Department of Economics, UC3M

I very much appreciate the help and support from Miguel Delgado and Winfried Stute. All errors are my own.

Abstract

In this paper, we develop new goodness-of-fit tests for the Cox proportional hazard model. We derive the principal component decomposition of the cumulative martingale residual process and construct new tests based on its estimated components, which outperform the corresponding omnibus test. The omnibus test, which is consistent against general alternatives, is in fact a weighted average of all components, whereas each of our tests is based on a single component: it cannot detect all possible alternatives, but it is very powerful in certain high-frequency directions. Smooth tests, which are unweighted averages of a few components, are also constructed. The finite-sample performance of the tests is illustrated by means of a Monte Carlo experiment.

JEL: C12; C52.
Keywords: Duration analysis; Goodness-of-fit; Principal component decomposition; Right censoring.

1 Introduction

The Cox proportional hazard model has been widely used in many fields, including economics, since it was proposed by David Cox in 1972. The model specifies the duration of interest through its hazard rate, which is a natural candidate for describing dynamic, time-dependent phenomena. It also incorporates covariable effects, which makes regression analysis possible for duration data under censoring. The estimation of the Cox model was studied by Cox (1972, 1975) through a partial likelihood approach. Its large-sample properties have been studied by Tsiatis (1981) and Andersen and Gill (1982), among others. Andersen and Gill (1982) adopted a counting process approach and extended the results to recurrent event processes. The counting process approach, which is equivalent to the hazard approach, became popular because it makes martingale theory available for duration analysis. A comprehensive review is given in Fleming and Harrington (1991).

The Cox model might fail in two ways. On the one hand, the Cox model assumes that the hazard rates of different individuals are proportional, i.e., that the hazard ratio is time invariant, and this proportionality assumption might fail. On the other hand, the covariable effect might be misspecified, either in the functional form of the covariables or in the exponential form of the link function. For model checking, various graphical methods and goodness-of-fit tests have been proposed in the literature. The most common method uses the martingale residuals defined by Barlow and Prentice (1988). The martingale residuals, which come from the Doob-Meyer decomposition of the counting process, provide a basis for goodness-of-fit tests of hazard models, e.g. Lin, Wei and Ying (1993) and Martinussen and Scheike (2007). For the Cox model, the landmark paper is Lin, Wei and Ying (1993). They developed a class of goodness-of-fit tests, including an omnibus test and special tests for the proportional hazard assumption, the functional form of the covariables and the form of the link function. Their method is based on the cumulative sum of martingale residuals.
Principal component decomposition, commonly used in functional analysis, has been used to develop more powerful goodness-of-fit tests for different models. The landmark paper is by Durbin, Knott and Taylor (1975). They studied the standard empirical process with estimated parameters to test the specification of a distribution function and derived its principal components. These components not only help to solve the problem caused by parameter estimation, but also provide a basis for more powerful tests in certain directions. Stute (1997) studied the marked residual process and its principal components to test the specification of nonparametric regression models. Anh and Stute (2012) studied the principal component analysis of the martingale part of the empirical process to test parametric hazard models. In all these cases, the components obtained serve as special experts for detecting certain deviations. In this paper, more powerful tests are developed for the Cox model using a conditional principal component decomposition approach. I consider the CUSUM of martingale residuals as in Lin, Wei and Ying (1993) and derive its principal component decomposition in the time dimension. The components obtained are sensitive to certain deviations from the proportional hazard assumption; for instance, higher-frequency deviations are reflected more in later components. The decomposition method in this paper is applicable to any model that has a martingale interpretation, including hazard models and transformation models. However, we focus on the Cox model in the present paper.

A brief introduction to the Cox model, together with some other important models in duration analysis and the omnibus test proposed by Lin, Wei and Ying (1993), is given in Section 2. Section 3 contains the main results: the principal component decomposition, the asymptotic results for the component processes and the test statistics based on the components. Simulation studies illustrating the finite-sample performance of our tests are presented in Section 4.

2 Omnibus Test for the Cox Model

2.1 The Cox Proportional Hazard Model

In the framework of regression analysis with right-censored duration data, consider a sample $\{(Z_i, \Delta_i, X_i)\}$, $i = 1, \dots, n$, of i.i.d. realizations of $(Z, \Delta, X)$. Here $Z$ is the minimum of the non-negative failure time $T$ and censoring time $C$, i.e., $Z = \min(T, C)$. The indicator $\Delta = 1_{\{T \le C\}}$ records which of $T$ and $C$ is actually observed, and $X$ is the covariable vector. The conditional distribution of the failure time is usually better described through its hazard function than through its density. The conditional cumulative hazard function is given by
$$\Lambda(t \mid X) = \int_0^t \frac{dF(u \mid X)}{1 - F(u- \mid X)},$$
where $F$ is the conditional distribution function of the failure time. If $F$ admits a density $f$, we have
$$d\Lambda(t \mid X) = \lambda(t \mid X)\,dt = \frac{f(t \mid X)}{1 - F(t \mid X)}\,dt.$$
The function $\lambda(t \mid X) = f(t \mid X)/(1 - F(t \mid X))$ is called the hazard function. It also has a conditional probability expression,
$$\lambda(t \mid X) = \lim_{h \downarrow 0} h^{-1} P(t \le T < t + h \mid T \ge t, X).$$
In the Cox proportional hazard model, the hazard rate is assumed to have the multiplicative form
$$\lambda(t \mid X) = \lambda_0(t)\exp(\beta_0' X),$$
where $\lambda_0(t)$ is an unspecified baseline hazard function.

Another approach to censored-data regression models is based on the analysis of counting processes. Define the two processes
$$N(t) = 1_{\{Z \le t,\ \Delta = 1\}}, \qquad Y(t) = 1_{\{Z \ge t\}}.$$
Here $N$ is the counting process and $Y$ is the at-risk process. By the Doob-Meyer decomposition, there is a unique predictable process $A$ such that $N - A$ is a martingale; $A$ is called the compensator of $N$. In the counting process approach, the compensator process is modeled instead of the conditional hazard rate of $T$. Notice that the information contained in $(Z, \Delta)$ is equivalent to that contained in $(N, Y)$. In fact, the two approaches are equivalent when $T$ and $C$ are conditionally independent given $X$. To be more specific, the process
$$M(t) = N(t) - \int_0^t Y(u)\,d\Lambda(u \mid X)$$
is a martingale with respect to the filtration $\mathcal{F}_t = \sigma\{X, N(u), Y(u+) : u \le t\}$. Modeling the compensator $\int_0^t Y(s)\,d\Lambda(s \mid X)$ is therefore equivalent to modeling the conditional hazard. Hence, if the Cox specification is correct for a given sample, there exist $\beta_0$ and $\lambda_0(t)$ such that
$$M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\beta_0' X_i)\lambda_0(s)\,ds, \qquad i = 1, \dots, n,$$
are martingales. The corresponding martingale residuals are defined as
$$\hat M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\hat\beta' X_i)\,d\hat\Lambda_0(s), \qquad (2.1)$$

where $\hat\beta$ is an estimator of $\beta_0$ and $\hat\Lambda_0(t)$ is an estimator of the cumulative baseline hazard function $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. These martingale residuals provide a basis for goodness-of-fit tests of the Cox model.

Cox (1972, 1975) suggested estimation by partial likelihood inference. The partial likelihood score function for $\beta$ is
$$U(\beta) = \sum_{i=1}^n \int_0^\infty \big(X_i - \bar X(\beta, t)\big)\,dN_i(t), \qquad \bar X(\beta, t) = \frac{\sum_{i=1}^n Y_i(t) e^{\beta' X_i} X_i}{\sum_{i=1}^n Y_i(t) e^{\beta' X_i}}.$$
The partial likelihood estimator $\hat\beta$ is the solution to $U(\beta) = 0$. Under some mild regularity conditions, $n^{1/2}(\hat\beta - \beta_0)$ converges in distribution to a centered Gaussian variable with covariance matrix $\Sigma(\beta_0)^{-1}$. The matrix $\Sigma(\beta)$ is defined as
$$\Sigma(\beta) = E\left[\int_0^\infty \big(X - \bar x(\beta, s)\big)^{\otimes 2}\, Y(s) e^{\beta' X}\lambda_0(s)\,ds\right], \qquad \bar x(\beta, t) = \frac{E[Y(t) e^{\beta' X} X]}{E[Y(t) e^{\beta' X}]},$$
with $\bar x(\beta, t)$ being the limit of $\bar X(\beta, t)$. The cumulative baseline hazard is estimated by the Breslow (1974) estimator
$$\hat\Lambda_0(t) = \int_0^t \frac{\sum_{i=1}^n dN_i(u)}{\sum_{i=1}^n Y_i(u) e^{\hat\beta' X_i}}.$$

2.2 Other Important Models in Duration Analysis

The Cox proportional hazard model assumes that the conditional hazard rate of the duration is the product of a baseline hazard and the covariable effect. For this reason, it is also called the multiplicative hazard model. Another important hazard model is Aalen's additive hazard model, proposed by Aalen (1980), in which the hazard rate is assumed to be a sum of covariable effects. The multiplicative and additive hazard models are suitable for regression analysis of duration data; however, they are not the only important models in duration analysis. There are two general classes of regression models in duration analysis: the transformation model and the accelerated failure time model. In fact, the Cox model is a special case of the transformation model.

A transformation model is
$$H(T) = -\beta' X + \varepsilon, \qquad (2.2)$$
with $H(\cdot)$ an unknown monotone transformation and $\varepsilon$ an error term with a known distribution. The transformation model has a martingale interpretation: if $\Lambda_\varepsilon$ denotes the known cumulative hazard function of $\varepsilon$, then
$$M(t) = N(t) - \int_0^t Y(u)\,d\Lambda_\varepsilon\big(\beta' X + H(u)\big)$$
is a martingale. One special case is the Cox model, in which $\varepsilon$ follows the extreme-value distribution with $\Lambda_\varepsilon(t) = e^t$ and the transformation is $H(\cdot) = \ln \Lambda_0(\cdot)$. Another special case is the proportional odds model, in which $\varepsilon$ follows the standard logistic distribution. The accelerated failure time model assumes
$$\log(T) = \beta' X + \varepsilon, \qquad (2.3)$$
with an unspecified distribution of $\varepsilon$. It is simply a transformed version of the ordinary linear model. Inference for the accelerated failure time model is not as easy as for the Cox model because of censoring: there is no direct martingale structure in (2.3). In the standard linear regression model, the parameter is easily interpreted as the effect on the mean of $\log(T)$. This interpretation is less clear when $T$ is censored. For the transformation model, Chen et al. (2002) proposed an estimating equation approach based on the martingale structure; the estimator coincides with the partial likelihood estimator in the special case of the Cox model. A brief review of transformation models and accelerated failure time models can be found in Martinussen and Scheike (2007). The method used to construct goodness-of-fit tests in this paper is applicable to models that have a martingale structure, e.g. hazard models and transformation models. The tests we propose are therefore helpful for model selection in the analysis of duration data. We demonstrate the method under the Cox model. In the simulations, we generate data from transformation models and accelerated failure time models as alternatives to study the power of our tests.
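The estimation steps of Section 2.1 can be sketched in Python with NumPy for a scalar covariable. These steps are the partial likelihood score $U(\beta)$, the Breslow estimator and the martingale residuals (2.1). The sketch makes simplifying assumptions that are not part of the paper:

- there are no tied event times;
- $\beta$ is found by a plain Newton-Raphson iteration started at $\beta = 0$;
- the residuals are evaluated on a user-chosen time grid.

The function names `cox_fit`, `breslow_jumps` and `martingale_residuals` are hypothetical, and so are the simulated data values.

```python
import numpy as np

def cox_fit(Z, delta, X, n_iter=50, tol=1e-10):
    """Partial likelihood estimate of a scalar beta by Newton-Raphson (no ties assumed)."""
    Z, delta, X = (np.asarray(a, dtype=float) for a in (Z, delta, X))
    at_risk = (Z[None, :] >= Z[:, None]).astype(float)   # at_risk[k, i] = Y_i(Z_k)
    beta = 0.0
    for _ in range(n_iter):
        w = np.exp(beta * X)
        S0 = at_risk @ w                                 # sum_i Y_i(Z_k) e^{beta X_i}
        S1 = at_risk @ (w * X)
        S2 = at_risk @ (w * X**2)
        Xbar = S1 / S0                                   # \bar X(beta, Z_k)
        score = np.sum(delta * (X - Xbar))               # U(beta)
        info = np.sum(delta * (S2 / S0 - Xbar**2))       # -U'(beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def breslow_jumps(Z, delta, X, beta):
    """Jump of the Breslow estimator at each Z_k (zero when Z_k is censored)."""
    Z, delta, X = (np.asarray(a, dtype=float) for a in (Z, delta, X))
    at_risk = (Z[None, :] >= Z[:, None]).astype(float)
    return delta / (at_risk @ np.exp(beta * X))

def martingale_residuals(Z, delta, X, beta, t_grid):
    """M_hat[i, g] = N_i(t_g) - exp(beta X_i) * Lambda0_hat(min(t_g, Z_i))."""
    Z, delta, X = (np.asarray(a, dtype=float) for a in (Z, delta, X))
    t_grid = np.asarray(t_grid, dtype=float)
    order = np.argsort(Z)
    cum = np.concatenate(([0.0], np.cumsum(breslow_jumps(Z, delta, X, beta)[order])))
    stop = np.minimum(t_grid[None, :], Z[:, None])              # t_g ^ Z_i
    Lam = cum[np.searchsorted(Z[order], stop, side="right")]    # Lambda0_hat(t_g ^ Z_i)
    N = ((delta[:, None] == 1) & (Z[:, None] <= t_grid[None, :])).astype(float)
    return N - np.exp(beta * X)[:, None] * Lam

# Illustrative data from a Cox model with lambda0 = 1 and beta0 = 0.5 (assumed values)
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1.0, 1.0, n)
T = rng.exponential(1.0 / np.exp(0.5 * X))
C = rng.uniform(0.0, 3.0, n)
Z, delta = np.minimum(T, C), (T <= C).astype(float)
beta_hat = cox_fit(Z, delta, X)
M_hat = martingale_residuals(Z, delta, X, beta_hat, np.linspace(0.0, Z.max(), 50))
```

With the Breslow estimator, $\sum_i \hat M_i(\infty) = 0$ holds for any value of $\beta$. So `M_hat[:, -1].sum()` should be zero up to rounding. This is a quick sanity check of the implementation, not a goodness-of-fit diagnostic.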
2.3 Omnibus Test

To test the specification of the Cox model, i.e., to test
$$H_0: \lambda(t \mid X) = \lambda_0(t)\exp(\beta_0' X) \text{ for some } \beta_0 \text{ and } \lambda_0(t),$$
we could consider the CUSUM of the martingales
$$R_n(t, x) = n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}} M_i(t), \qquad (2.4)$$
where $M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\beta_0' X_i)\lambda_0(s)\,ds$, $i = 1, \dots, n$, are martingales under $H_0$. Lin, Wei and Ying (1993) proposed an omnibus test for the Cox model by considering the process with estimated $\beta_0$ and $\Lambda_0$,
$$\hat R_n(t, x) = n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}}\hat M_i(t), \qquad (2.5)$$
where $\hat M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\hat\beta' X_i)\,d\hat\Lambda_0(s)$, $i = 1, \dots, n$, are the martingale residuals. They showed that, under the null hypothesis, the process $\hat R_n(t, x)$ converges weakly to a centered Gaussian process $R_\infty(t, x)$ in the space $D([0, \infty) \times [-1, 1])$. A Kolmogorov-type statistic is constructed based on this process. To simplify the notation, we only consider the univariate case, i.e., real-valued $X$. In the next section, we decompose $R_n$ into a countable sum of component processes and use these component processes to construct new test statistics.

3 Tests Based on Component Processes

3.1 Conditional Principal Component Analysis

Notice that the process $R_n$ in (2.4) is bivariate, with non-independent components $x$ and $t$. Hence, a direct Karhunen-Loève representation is not available in this case. Instead, I adopt a conditional principal component decomposition, i.e., the process is decomposed conditionally on $X$. From now on, we impose the following assumptions.

(A1). $T$ and $C$ are independent conditional on $X$.
(A2). $X$ is bounded, without loss of generality by 1.
(A3). $C$ is independent of $X$.
(A4). For each $\tau < \infty$, $P\{Y(\tau) = 1\} > 0$.
(A5). $\Sigma(\beta_0) = E\left[\int_0^\infty (X - \bar x(\beta_0, s))^{\otimes 2}\, Y(s) e^{\beta_0' X}\lambda_0(s)\,ds\right]$ is positive definite.

The first two assumptions are standard in the Cox model. The third one is needed to justify the consistency of the estimator of the martingale conditional variance. The last two assumptions are needed to obtain the asymptotic distribution of the partial likelihood estimator $\hat\beta$; see Andersen and Gill (1982), Theorem 4.2.

Let us begin with the decomposition of the martingale $M_i(t)$ conditional on $X_i$. The covariance of $M_i$ conditional on $X_i$ is
$$\begin{aligned} E\big(M_i(s) M_i(t) \mid X_i\big) &= E\left[\int_0^{s \wedge t} Y_i(u) e^{\beta_0 X_i}\lambda_0(u)\,du \,\Big|\, X_i\right] \\ &= \int_0^{s \wedge t} E[Y_i(u) \mid X_i]\, e^{\beta_0 X_i}\lambda_0(u)\,du \\ &= \int_0^{s \wedge t} P(T_i \ge u, C_i \ge u \mid X_i)\, e^{\beta_0 X_i}\lambda_0(u)\,du \\ &= \int_0^{s \wedge t} P(T_i \ge u \mid X_i)\, P(C_i \ge u \mid X_i)\, e^{\beta_0 X_i}\lambda_0(u)\,du \\ &= \int_0^{s \wedge t} \exp\big(-\Lambda_0(u) e^{\beta_0 X_i}\big)\, P(C_i \ge u)\, e^{\beta_0 X_i}\lambda_0(u)\,du. \end{aligned}$$
The first equality follows from martingale properties (Fleming and Harrington (1991), Theorem 2.5.1). The last two equalities follow from assumptions (A1) and (A3), respectively. Let us denote the conditional covariance function by $T(s \wedge t, x) = E(M_i(s) M_i(t) \mid X_i = x)$, with
$$T(t, x) := E\big(M^2(t) \mid X = x\big) = \int_0^t \exp\big(-\Lambda_0(u) e^{\beta_0 x}\big)\, P(C \ge u)\, e^{\beta_0 x}\lambda_0(u)\,du.$$
Notice that the function $T$ is non-decreasing in $t$, with $T(0, x) = 0$ and $T(\infty, x) \le 1$.

Remark. Without censoring, $T(t, x) = 1 - \exp(-\Lambda_0(t) e^{\beta_0 x}) = F_T(t \mid X = x)$, i.e., the conditional covariance function of the martingale equals the conditional distribution function of $T$. In this case, $T(\infty, x) = 1$.

Let
$$\mu_j = \frac{4}{\pi^2 (2j - 1)^2}, \qquad \phi_j(t) = \sqrt{2}\sin\frac{(2j - 1)\pi t}{2}, \qquad j = 1, 2, \dots,$$
be the eigenvalues and eigenfunctions of the standard Brownian motion with covariance structure $K(s, t) = s \wedge t$. For each $x$, let $f_j$ be the transformation $f_j(t, x) = \phi_j\big(T(t, x)/T(\infty, x)\big)$.
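The eigenpairs $(\mu_j, \phi_j)$ quoted above can be checked numerically. The Python/NumPy sketch below discretizes $K(s, t) = s \wedge t$ on a midpoint grid over $[0, 1]$; the grid size is an arbitrary choice. It then checks three things:

- the eigen-equation $\int_0^1 (s \wedge t)\phi_j(s)\,ds = \mu_j\phi_j(t)$ for $j = 1, \dots, 5$;
- the orthonormality of $\phi_1, \dots, \phi_5$;
- the trace identity $\sum_j \mu_j = \int_0^1 t\,dt = 1/2$ implied by Mercer's theorem.

```python
import numpy as np

def mu(j):
    """Eigenvalues mu_j = 4 / (pi^2 (2j - 1)^2) of K(s, t) = min(s, t) on [0, 1]."""
    j = np.asarray(j, dtype=float)
    return 4.0 / (np.pi**2 * (2.0 * j - 1.0) ** 2)

def phi(j, t):
    """Eigenfunctions phi_j(t) = sqrt(2) sin((2j - 1) pi t / 2)."""
    return np.sqrt(2.0) * np.sin((2 * j - 1) * np.pi * np.asarray(t) / 2.0)

m = 1000
u = (np.arange(m) + 0.5) / m          # midpoints of a uniform grid on [0, 1]
du = 1.0 / m
K = np.minimum(u[:, None], u[None, :])

# Largest residual of the discretized eigen-equation over j = 1, ..., 5
eig_err = max(np.max(np.abs(K @ phi(j, u) * du - mu(j) * phi(j, u))) for j in range(1, 6))

# Gram matrix of phi_1, ..., phi_5 in L2[0, 1]; should be the identity
P = np.vstack([phi(j, u) for j in range(1, 6)])
gram = P @ P.T * du

# Mercer: sum_j mu_j = int_0^1 K(t, t) dt = 1/2 (series truncated at 10^5 terms)
trace = mu(np.arange(1, 100_001)).sum()
```

Because $T(s \wedge t, x)/T(\infty, x) = u(s) \wedge u(t)$ with $u(t) = T(t, x)/T(\infty, x)$, the same check carries over to $f_j(\cdot, x)$ after the change of variables $u = T(t, x)/T(\infty, x)$.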

With this choice, $\{f_j(\cdot, x)\}$ form an orthonormal basis of a subspace of $L^2(\mathbb{R}_+, T(\cdot, x)/T(\infty, x))$, the Hilbert space of all square-integrable functions on $\mathbb{R}_+$ with the inner product
$$\langle \rho, g\rangle_x = \int_{\mathbb{R}_+} \rho(t) g(t)\,\frac{T(dt, x)}{T(\infty, x)}.$$
Indeed,
$$\langle f_j, f_h\rangle_x = \int_{\mathbb{R}_+} \phi_j\Big(\frac{T(t, x)}{T(\infty, x)}\Big)\phi_h\Big(\frac{T(t, x)}{T(\infty, x)}\Big)\frac{T(dt, x)}{T(\infty, x)} = \int_0^1 \phi_j(u)\phi_h(u)\,du = \begin{cases} 1 & j = h, \\ 0 & j \ne h. \end{cases}$$
Moreover, $\{f_j(\cdot, x)\}$ are the eigenfunctions of the covariance structure $T(s \wedge t, x)/T(\infty, x)$ with associated eigenvalues $\{\mu_j\}$, i.e.,
$$\int_{\mathbb{R}_+} \frac{T(s \wedge t, x)}{T(\infty, x)} f_j(s, x)\,\frac{T(ds, x)}{T(\infty, x)} = \mu_j f_j(t, x).$$
By Mercer's theorem, the covariance function can be decomposed as
$$\frac{T(s \wedge t, x)}{T(\infty, x)} = \sum_{j=1}^\infty \mu_j f_j(s, x) f_j(t, x).$$
Since $T(s \wedge t, x)/T(\infty, x)$ is the conditional covariance function of the process $M_i(t)\,T(\infty, x)^{-1/2}$ given $X_i = x$, we have the decomposition
$$M_i(t)\,T(\infty, X_i)^{-1/2} = \sum_{j=1}^\infty \mu_j^{1/2} z_{ij} f_j(t, X_i) \quad \text{a.s.}, \qquad (3.1)$$
where
$$z_{ij} := \mu_j^{-1/2}\, T(\infty, X_i)^{-1/2}\,\langle M_i, f_j(\cdot, X_i)\rangle_{X_i} = \mu_j^{-1/2}\, T(\infty, X_i)^{-3/2}\int_{\mathbb{R}_+} M_i(t) f_j(t, X_i)\,T(dt, X_i).$$
$z_{ij}$ is the $j$th principal component of $M_i(t)\,T(\infty, X_i)^{-1/2}$ conditional on $X_i$. For each $j$ and $h \ne j$, it has the following properties:
$$E(z_{ij} \mid X_i) = 0, \qquad E(z_{ij}^2 \mid X_i) = 1, \qquad E(z_{ij} z_{ih} \mid X_i) = 0. \qquad (3.2)$$
Hence, from (3.1), the Karhunen-Loève representation of $R_n$ can be written as
$$R_n(t, x) = n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}}\, T(\infty, X_i)^{1/2}\sum_{j=1}^\infty \mu_j^{1/2} z_{ij} f_j(t, X_i) = \sum_{j=1}^\infty \mu_j^{1/2}\left[n^{-1/2}\sum_{i=1}^n z_{ij} 1_{\{X_i \le x\}}\, T(\infty, X_i)^{1/2} f_j(t, X_i)\right].$$

I call the term in brackets the $j$th component process of $R_n$ and denote it by
$$c_{n,j}(t, x) := n^{-1/2}\sum_{i=1}^n z_{ij} 1_{\{X_i \le x\}}\, T(\infty, X_i)^{1/2} f_j(t, X_i). \qquad (3.3)$$
Thus we have the following proposition.

Proposition 1. Under the null hypothesis and (A1)-(A5), the CUSUM of martingale processes (2.4) can be decomposed into a weighted sum of component processes,
$$R_n(t, x) = \sum_{j=1}^\infty \mu_j^{1/2} c_{n,j}(t, x). \qquad (3.4)$$
The weights are the square roots of the eigenvalues of the standard Brownian motion.

3.2 Asymptotic Theory of Component Processes

The expression (3.3) of the component process looks complicated, but it can be simplified considerably. Let us define another function $g_j$ corresponding to $f_j$,
$$g_j(t, x) := \varphi_j\big(T(t, x)/T(\infty, x)\big), \qquad \varphi_j(u) = \sqrt{2}\cos\frac{(2j - 1)\pi u}{2}.$$
By changing the order of integration and changing variables, the component processes can be rewritten as
$$c_{n,j}(t, x) = n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}} f_j(t, X_i)\int_0^\infty g_j(s, X_i)\,dM_i(s). \qquad (3.5)$$
The key step is $\int_u^1 \phi_j(v)\,dv = \mu_j^{1/2}\varphi_j(u)$, which, after integration by parts, gives $z_{ij} = T(\infty, X_i)^{-1/2}\int_0^\infty g_j(s, X_i)\,dM_i(s)$. The following theorem establishes the convergence of the component process. It follows from a tightness result and the fact that $c_{n,j}$ is a normalized sum of i.i.d. centered random functions with variance
$$H_j(t, x) := E\big[1_{\{X \le x\}}\, T(\infty, X) f_j^2(t, X)\big] = \int_{-1}^x T(\infty, s) f_j^2(t, s)\,F(ds).$$
Here $F(\cdot)$ is the distribution function of $X$.

Theorem 1. Under the null hypothesis and (A1)-(A5), for each $j$, the process $c_{n,j}(t, x)$ converges weakly to a centered Gaussian process in the space $D([0, \infty) \times [-1, 1])$, $c_{n,j} \xrightarrow{d} c_{\infty,j}$. The limit Gaussian process $c_{\infty,j}$ has covariance structure
$$K(t_1, t_2, x_1, x_2) = \int_{-1}^{x_1 \wedge x_2} T(\infty, s) f_j(t_1, s) f_j(t_2, s)\,F(ds).$$

Moreover, $c_{\infty,j}$ and $c_{\infty,h}$ are independent for $j \ne h$.

In order to obtain test statistics, we need to consider the component process after estimation,
$$\hat c_{n,j}(t, x) := n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}}\hat f_j(t, X_i)\int_0^\infty \hat g_j(s, X_i)\,d\hat M_i(s). \qquad (3.6)$$
Here $\hat M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\hat\beta X_i)\,d\hat\Lambda_0(s)$, $\hat f_j(t, x) = \phi_j(\hat T(t, x)/\hat T(\infty, x))$ and $\hat g_j(t, x) = \varphi_j(\hat T(t, x)/\hat T(\infty, x))$. As remarked earlier, this involves the estimators of $\beta_0$ and $\Lambda_0$ and an estimator of the conditional covariance function $T(t, x)$. For $\hat T(t, x)$, recall that
$$T(t, x) = \int_0^t \exp\big(-\Lambda_0(u) e^{\beta_0 x}\big)\, P(C \ge u)\, e^{\beta_0 x}\lambda_0(u)\,du.$$
A natural consistent estimator is
$$\hat T(t, x) = \int_0^t \exp\big(-\hat\Lambda_0(u) e^{\hat\beta x}\big)\big(1 - \hat G(u-)\big) e^{\hat\beta x}\,d\hat\Lambda_0(u),$$
where $\hat G$ is the Kaplan-Meier estimator of the distribution function of $C$. In the appendix, it is shown that $\hat c_{n,j}(t, x)$ has the same asymptotic distribution as
$$\tilde c_{n,j}(t, x) := n^{-1/2}\sum_{i=1}^n \int_0^\infty \Big[1_{\{X_i \le x\}} f_j(t, X_i) g_j(s, X_i) - l_j(\beta_0, t, x, s)\Big]\,dM_i(s) - A_j(t, x)\,\Sigma(\beta_0)^{-1} n^{-1/2}\sum_{i=1}^n \int_0^\infty \big(X_i - \bar x(\beta_0, s)\big)\,dM_i(s),$$
with $\bar x(\beta, t)$ and $\Sigma(\beta)$ defined in Section 2.1, and
$$l_j(\beta, t, x, s) = \frac{E[Y(s) e^{\beta X} 1_{\{X \le x\}} f_j(t, X) g_j(s, X)]}{E[Y(s) e^{\beta X}]},$$
$$A_j(t, x) = E\left[\int_0^\infty Y(s) e^{\beta_0 X}\big(X - \bar x(\beta_0, s)\big)\lambda_0(s)\, 1_{\{X \le x\}} f_j(t, X) g_j(s, X)\,ds\right].$$
The process $\tilde c_{n,j}(t, x)$ can be written in the form
$$\tilde c_{n,j}(t, x) = n^{-1/2}\sum_{i=1}^n \int_0^\infty h_{ij}(\beta_0, t, x, s)\,dM_i(s), \qquad (3.7)$$
with
$$h_{ij}(\beta, t, x, s) = 1_{\{X_i \le x\}} f_j(t, X_i) g_j(s, X_i) - l_j(\beta, t, x, s) - A_j(t, x)\,\Sigma(\beta)^{-1}\big(X_i - \bar x(\beta, s)\big).$$
The asymptotic distribution of $\hat c_{n,j}(t, x)$ is given by the following theorem.

Theorem 2. Under the null hypothesis and (A1)-(A5), for each $j = 1, 2, \dots$, the process $\hat c_{n,j}(t, x)$ converges weakly to a centered Gaussian process in the space $D([0, \infty) \times [-1, 1])$, $\hat c_{n,j} \xrightarrow{d} \tilde c_{\infty,j}$. The limit Gaussian process $\tilde c_{\infty,j}(t, x)$ has covariance structure
$$K(t_1, t_2, x_1, x_2) = E\left[\int_0^\infty h_j(\beta_0, t_1, x_1, s)\, h_j(\beta_0, t_2, x_2, s)\, Y(s) e^{\beta_0 X}\lambda_0(s)\,ds\right].$$

In addition to single component processes, finite weighted sums of component processes can also be used for model checking; in this way, we combine information from different components. Consider the first $m$ component processes with weights $w = \{w_j\}_{j=1}^m$, i.e., the process $\sum_{j=1}^m w_j \hat c_{n,j}(t, x)$. It has the same asymptotic distribution as the process
$$\sum_{j=1}^m w_j \tilde c_{n,j}(t, x) = n^{-1/2}\sum_{i=1}^n \int_0^\infty \Big(\sum_{j=1}^m w_j h_{ij}(\beta_0, t, x, s)\Big)\,dM_i(s).$$
The asymptotic distribution is given in the following theorem.

Theorem 3. Under the null hypothesis and (A1)-(A5), for any given weights $w = \{w_j\}_{j=1}^m$, the process $\sum_{j=1}^m w_j \hat c_{n,j}(t, x)$ converges weakly to a centered Gaussian process in the space $D([0, \infty) \times [-1, 1])$, $\sum_{j=1}^m w_j \hat c_{n,j} \xrightarrow{d} c_w$. The limit Gaussian process $c_w(t, x)$ has covariance structure
$$K(t_1, t_2, x_1, x_2) = E\left[\int_0^\infty \Big(\sum_{j=1}^m w_j h_j(\beta_0, t_1, x_1, s)\Big)\Big(\sum_{j=1}^m w_j h_j(\beta_0, t_2, x_2, s)\Big) Y(s) e^{\beta_0 X}\lambda_0(s)\,ds\right].$$

3.3 Test Statistics

The omnibus tests are based on the original CUSUM of the martingales. By the continuous mapping theorem, the Kolmogorov-Smirnov and Cramér-von Mises type statistics have the following asymptotic distributions:
$$KS^o = \sup_{t,x}|\hat R_n(t, x)| \xrightarrow{d} \sup_{t,x}|R_\infty(t, x)|,$$
$$CvM^o = \int\!\!\int [\hat R_n(t, x)]^2\, F_n(dx)\,dt \xrightarrow{d} \int\!\!\int [R_\infty(t, x)]^2\, F(dx)\,dt.$$
Here $F_n(\cdot)$ is the empirical distribution function of $X$.

The component processes we derived provide a basis for new specification tests of the Cox model. I propose to construct Kolmogorov-Smirnov and Cramér-von Mises type statistics based on each component process. For each $j = 1, 2, \dots$, this gives the following tests, which I call component tests:
$$KS_{nj} = \sup_{t,x}|\hat c_{n,j}(t, x)| \xrightarrow{d} \sup_{t,x}|\tilde c_{\infty,j}(t, x)|,$$
$$CvM_{nj} = \int\!\!\int [\hat c_{n,j}(t, x)]^2\, F_n(dx)\,dt \xrightarrow{d} \int\!\!\int [\tilde c_{\infty,j}(t, x)]^2\, F(dx)\,dt.$$

Note that in (3.4), the weight of the $j$th component process is $\mu_j^{1/2} = 2/((2j - 1)\pi)$, which decreases like $1/j$. Consequently, the later components are down-weighted in the original process. In fact, each component reflects a certain aspect of a deviation from the null hypothesis; for example, high-frequency deviations are reflected more in later components. The omnibus test gives low weights to later components and therefore has low power against such deviations. The tests based on later components, by contrast, are specially designed for high-frequency alternatives. Different aspects of a deviation are distinguished through its components, and it is difficult to decide which component to use before model checking. We should therefore construct tests based on each component and reject the null hypothesis if any of them rejects. In practice, deviations should not be of very high frequency, so we can focus on the first few components, say no more than ten in general.

In addition, smooth test statistics based on reweighted sums of component processes can be constructed. For weights $w = \{w_j\}_{j=1}^m$ and some fixed $m$, the Kolmogorov-Smirnov and Cramér-von Mises type statistics based on the weighted sum of the first $m$ components are
$$KS_{nm} = \sup_{t,x}\Big|\sum_{j=1}^m w_j \hat c_{n,j}(t, x)\Big| \xrightarrow{d} \sup_{t,x}|c_w(t, x)|,$$
$$CvM_{nm} = \int\!\!\int \Big[\sum_{j=1}^m w_j \hat c_{n,j}(t, x)\Big]^2 F_n(dx)\,dt \xrightarrow{d} \int\!\!\int [c_w(t, x)]^2\, F(dx)\,dt.$$
The smooth tests provide a compromise between the omnibus tests and the tests based on a single component.
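A possible implementation of the estimated component process (3.6) and of the statistics $KS_{nj}$ and $CvM_{nj}$ is sketched below in Python with NumPy for a scalar covariable. It rests on assumptions not stated in the paper:

- there are no ties, and $\hat\beta$ is supplied by the user;
- $\hat T(\infty, x)$ is the value of $\hat T(\cdot, x)$ after the last event time;
- $x$ is evaluated at the sample points $X_l$, so that $F_n(dx)$ puts mass $1/n$ on each;
- the $dt$ integral is a Riemann sum on a user-supplied time grid.

The integral $\int \hat g_j(s, X_i)\,d\hat M_i(s)$ is computed exactly from the jump of $N_i$ at $Z_i$ and the Breslow jumps. All function names are hypothetical.

```python
import numpy as np

def breslow(Z, delta, X, beta):
    """Sorted event times u_k and Breslow jumps dLambda0_hat(u_k) (no ties assumed)."""
    at_risk = (Z[None, :] >= Z[:, None]).astype(float)
    dLam = delta / (at_risk @ np.exp(beta * X))
    ev = np.flatnonzero(delta == 1)
    ev = ev[np.argsort(Z[ev])]
    return Z[ev], dLam[ev]

def censoring_surv_left(Z, delta, s):
    """Kaplan-Meier estimate of 1 - G_hat(s-) = P(C >= s), censorings treated as events."""
    order = np.argsort(Z)
    at_risk = len(Z) - np.arange(len(Z))
    prefix = np.concatenate(([1.0], np.cumprod(1.0 - (1.0 - delta[order]) / at_risk)))
    return prefix[np.searchsorted(Z[order], s, side="left")]

def T_hat(t, x, u, dLam, surv_C, beta):
    """T_hat(t, x) as a sum over event times u_k <= t; returns shape (len(x), len(t))."""
    ex = np.exp(beta * np.asarray(x, dtype=float))[:, None]
    incr = np.exp(-np.cumsum(dLam)[None, :] * ex) * surv_C[None, :] * ex * dLam[None, :]
    csum = np.hstack((np.zeros((ex.shape[0], 1)), np.cumsum(incr, axis=1)))
    return csum[:, np.searchsorted(u, np.asarray(t, dtype=float), side="right")]

def component_test(Z, delta, X, beta, j, t_grid):
    """KS_nj, CvM_nj and the process c_hat_{n,j}(t_g, X_l) for the j-th component."""
    Z, delta, X = (np.asarray(a, dtype=float) for a in (Z, delta, X))
    t_grid = np.asarray(t_grid, dtype=float)
    n = len(Z)
    u, dLam = breslow(Z, delta, X, beta)
    SC = censoring_surv_left(Z, delta, u)
    T_inf = T_hat([np.inf], X, u, dLam, SC, beta)[:, 0]        # T_hat(inf, X_i)
    w = (2 * j - 1) * np.pi / 2
    ratio = lambda t: T_hat(t, X, u, dLam, SC, beta) / T_inf[:, None]
    f = np.sqrt(2) * np.sin(w * ratio(t_grid))                   # f_hat_j(t_g, X_i)
    g_u = np.sqrt(2) * np.cos(w * ratio(u))                      # g_hat_j(u_k, X_i)
    g_Z = np.sqrt(2) * np.cos(w * np.diag(ratio(Z)))             # g_hat_j(Z_i, X_i)
    # int g_hat_j dM_hat_i = delta_i g(Z_i) - e^{beta X_i} sum_{u_k <= Z_i} g(u_k) dLam_k
    jumps = (u[None, :] <= Z[:, None]) * dLam[None, :]
    integral = delta * g_Z - np.exp(beta * X) * np.sum(g_u * jumps, axis=1)
    ind = (X[None, :] <= X[:, None]).astype(float)               # ind[l, i] = 1{X_i <= X_l}
    c = ind @ (f * integral[:, None]) / np.sqrt(n)               # c_hat_{n,j}(t_g, X_l)
    dt = np.diff(np.concatenate(([0.0], t_grid)))
    ks = np.abs(c).max()
    cvm = np.mean(c**2 @ dt)                                     # mean over l is F_n(dx)
    return ks, cvm, c, T_inf
```

For the smooth statistics, take the processes `c` for $j = 1, \dots, m$, sum them with weights $w_j$, and only then take the supremum or the integral. Adding the KS statistics themselves would not give $KS_{nm}$.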
The smooth test is the one that takes equal weights, $w = (1, \dots, 1)$. The test based on the $j$th component process is the one that takes $w$ as the $j$th unit vector, i.e., $w = (0, \dots, 0, 1, 0, \dots, 0)$. However, one has to choose a suitable $w$ before model checking. Instead, we can take into account the information from all the component processes by considering a new test that behaves as an intersection of the component tests. The idea is based on the Bonferroni method: we run the first $m$ component tests and record the decision of each one. We then accept $H_0$ if all $m$ tests accept, and reject $H_0$ if any of them rejects. Specifically, let $T_1, T_2, \dots, T_m$ be the first $m$ component tests with common size $x$, where $T_k = 1$ denotes rejection. The Bonferroni test $T$ is
$$T = \begin{cases} 0 & \text{if } T_1 = \dots = T_m = 0, \\ 1 & \text{otherwise.} \end{cases}$$
The probability that our test accepts under $H_0$ is $P(T_1 = 0, \dots, T_m = 0)$, which satisfies the inequality
$$P(T_1 = 0, \dots, T_m = 0) \ge P(T_1 = 0) + \dots + P(T_m = 0) - (m - 1) = (1 - x) + \dots + (1 - x) - (m - 1) = 1 - mx.$$
For a significance level $\alpha$, we can choose $x = \alpha/m$. The size of our test is then $1 - P(T_1 = 0, \dots, T_m = 0) \le mx = \alpha$, i.e., the Bonferroni test has size bounded by $\alpha$.

To approximate the distribution of the limit process $\tilde c_{\infty,j}(t, x)$, we follow the suggestion of Lin, Wei and Ying (1993) and use Monte Carlo simulation. Recall from (3.7) that $\tilde c_{n,j}(t, x)$ is a martingale integral. To approximate its asymptotic distribution, the integrand $h_{ij}(\beta_0, t, x, s)$ can be replaced by a consistent estimator, but the distribution of the martingale $M_i(t)$ is unknown. Lin, Wei and Ying (1993) suggested replacing $M_i(t)$ by a similar process with a known distribution. The candidate is $N_i(t) G_i$, where $N_i$ is the observed counting process and $\{G_i; i = 1, \dots, n\}$ is a random sample of standard normal variables. Because of the martingale property $E[M^2(t)] = E[N(t)]$, the processes $M_i(t)$ and $N_i(t) G_i$ have the same variance function.
Finally, all unknown quantities in $h_{ij}(\beta_0, t, x, s)$ are replaced by consistent estimators. That is, $\beta_0$, $\Lambda_0(t)$, $f_j(t, x)$ and $g_j(t, x)$ are replaced by $\hat\beta$, $\hat\Lambda_0(t)$, $\phi_j(\hat T(t, x)/\hat T(\infty, x))$ and $\varphi_j(\hat T(t, x)/\hat T(\infty, x))$. Similarly, $\bar x(\beta, t)$, $l_j(\beta, t, x, s)$, $A_j(t, x)$ and $\Sigma(\beta)$ are replaced by their sample analogues. Given the observed data, the process obtained after these replacements has, in the limit, the same distribution as $\tilde c_{n,j}(t, x)$.
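The Monte Carlo step and the Bonferroni rule can be combined as in the Python/NumPy sketch below. $N_i$ jumps at most once, at $Z_i$ when $\Delta_i = 1$. The multiplier version of (3.7) therefore reduces to $n^{-1/2}\sum_i G_i \Delta_i \hat h_{ij}(\hat\beta, t, x, Z_i)$.

The sketch assumes the user has already evaluated $\hat h_{ij}$ at $s = Z_i$ on a flattened $(t, x)$ grid, passed in as the array `H`; computing $\hat h_{ij}$ itself is not shown. The $(1 + \#)/(1 + B)$ Monte Carlo p-value is a common convention rather than one stated in the paper, and the function names are hypothetical.

```python
import numpy as np

def multiplier_sup(H, delta, n_rep=1000, rng=None):
    """Draws of sup_{t,x} |n^{-1/2} sum_i G_i Delta_i h_i(t, x, Z_i)| with G_i iid N(0, 1).

    H[i, p] holds the estimated integrand h_i at grid point p = (t, x) and s = Z_i."""
    H = np.asarray(H, dtype=float)
    delta = np.asarray(delta, dtype=float)
    rng = np.random.default_rng(rng)
    n = H.shape[0]
    G = rng.standard_normal((n_rep, n))             # one row of multipliers per replicate
    sims = (G * delta[None, :]) @ H / np.sqrt(n)    # (n_rep, number of grid points)
    return np.abs(sims).max(axis=1)

def multiplier_pvalue(observed, sims):
    """Monte Carlo p-value of an observed KS-type statistic."""
    return (1 + np.sum(np.asarray(sims) >= observed)) / (1 + len(sims))

def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H0 if any of the m component tests rejects at level alpha / m."""
    p = np.asarray(pvalues, dtype=float)
    return bool(np.any(p <= alpha / p.size))
```

For example, with $m = 5$ components and $\alpha = 0.05$, each component test is run at level $0.01$. P-values $(0.20, 0.009, 0.50, 0.30, 0.04)$ then lead to rejection, because $0.009 \le 0.01$.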

4 Simulation Study

As discussed earlier, the accelerated failure time model and the transformation model provide general frameworks for studying covariable effects in duration data. In our simulation study, we take several alternatives from these models to study the power of our tests. We consider the following DGPs, with explanations afterwards.

Cox: $\lambda(t \mid X) = \lambda_0(t)\exp(\beta_0 X)$.
DGP1: Weibull hazard rate $\lambda(t \mid X) = (0.2X)\, t^{0.2X - 1}$.
DGP2: Log-normal model $\ln(T) = \beta_0 X + \epsilon$, where $\epsilon$ is a standard normal variable. This model is a special case of the accelerated failure time model.
DGP3: Transformation model $\Lambda_0(T) e^{\beta_0 X} = \text{Pareto}$, where Pareto is a standard Pareto variable, which has hazard rate $x^{-1}$ for $x > 1$.
DGP4: Transformation model $\Lambda_0(T) e^{\beta_0 X} = A_1$, where $A_1$ is a positive random variable with hazard rate $\lambda(t) = 1 + \sin(3\pi t/2)$.
DGP5: Transformation model $\Lambda_0(T) e^{\beta_0 X} = A_2$, where $A_2$ is a positive random variable with hazard rate $\lambda(t) = 1 + \cos(3\pi t/2)$.
DGP6: Transformation model $\Lambda_0(T) e^{\beta_0 X} = A_3$, where $A_3$ is a positive random variable with hazard rate $\lambda(t) = 1 + \sin(5\pi t/2)$.

DGP7: Transformation model $\Lambda_0(T) e^{\beta_0 X} = A_4$, where $A_4$ is a positive random variable with hazard rate $\lambda(t) = 1 + \cos(5\pi t/2)$.

DGP1 is the Weibull hazard model, in which the hazards for different values of the covariable are non-proportional. DGP2 is commonly used in economics and belongs to the class of accelerated failure time models. DGP3-DGP7 are transformation models with unspecified transformation $\ln\Lambda_0(\cdot)$. The Cox model, as a special case of the transformation model, can be expressed as $\Lambda_0(T) e^{\beta_0 X} = E$, where $E$ is a standard exponential variable with constant hazard rate. In DGP3, we replace the exponential by a Pareto variable, which has a decreasing hazard rate. We call DGP4-DGP7 high-frequency alternatives, in the sense that the variables $A_1, A_2, A_3, A_4$ have periodic rather than constant hazard rates.

We take $\beta_0 = 0.2$, $\lambda_0(t) = 1$, $\Lambda_0(t) = t$, and $X = 0, 1, \dots, 9$ with equal proportions. The censoring variable in each case is drawn from a uniform distribution such that the percentage of censoring is around 3%. We run the experiments for sample sizes n = 5, 1, 15, and use 1 realizations of the Gaussian process to estimate the distribution of each statistic. We run 1 replications for each DGP. The results for the omnibus, smooth and Bonferroni tests are shown in Tables 1 and 2, and those for the component tests in Tables 3 and 4. The omnibus test is based on the original process $\hat R_n(t, x)$. The smooth test is based on the reweighted sum of the first five component processes, as discussed in the previous section. The component tests are based on the first five component processes. I use bold type to indicate the test with the largest power. Since the Weibull data are highly non-proportional, the omnibus test works well, but the test based on the second component has larger power. The log-normal data seem to fit the Cox model well, but the test based on the second component still has the largest power. The same holds for the Pareto alternative.
Finally, the results for the high-frequency alternatives DGP4-DGP7 show clearly how the components serve as special experts for certain deviations from the proportional hazard assumption. As the alternative becomes more frequent in the time domain, the tests based on later components perform better.
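Data from the Cox model and DGP3-DGP7 can be generated by inverse transform. With $\Lambda_0(t) = t$, the model $\Lambda_0(T) e^{\beta_0 X} = A$ gives $T = A e^{-\beta_0 X}$. $A$ is obtained by solving $\Lambda_A(A) = E$ for a standard exponential $E$, where $\Lambda_A$ is the cumulative hazard of $A$:

- for hazard $1 + \sin(\omega t)$, $\Lambda_A(t) = t + (1 - \cos\omega t)/\omega$;
- for hazard $1 + \cos(\omega t)$, $\Lambda_A(t) = t + \sin(\omega t)/\omega$;
- for the standard Pareto, $\Lambda_A(t) = \ln t$, so $A = e^E$.

The Python/NumPy sketch below solves the periodic cases by vectorized bisection. The upper bound `c_max` of the uniform censoring distribution is a placeholder that would have to be tuned to match the target censoring percentage. That bound and the function names are assumptions, not the paper's code.

```python
import numpy as np

def sample_periodic(E, omega, kind):
    """Solve Lambda_A(A) = E, where A has hazard 1 + sin(omega t) or 1 + cos(omega t)."""
    E = np.asarray(E, dtype=float)
    if kind == "sin":
        cum_hazard = lambda t: t + (1.0 - np.cos(omega * t)) / omega
    else:
        cum_hazard = lambda t: t + np.sin(omega * t) / omega
    # |Lambda_A(t) - t| <= 2 / omega, so the root lies in this bracket
    lo = np.maximum(E - 2.0 / omega, 0.0)
    hi = E + 2.0 / omega
    for _ in range(80):                                  # vectorised bisection
        mid = 0.5 * (lo + hi)
        below = cum_hazard(mid) < E
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)

# (kind, omega) of the hazards of A_1, ..., A_4 in DGP4-DGP7
PERIODIC = {4: ("sin", 1.5 * np.pi), 5: ("cos", 1.5 * np.pi),
            6: ("sin", 2.5 * np.pi), 7: ("cos", 2.5 * np.pi)}

def simulate(dgp, n, beta0=0.2, c_max=10.0, rng=None):
    """Return (Z, Delta, X) for dgp in {"cox", 3, 4, 5, 6, 7}, with Lambda0(t) = t."""
    rng = np.random.default_rng(rng)
    X = np.resize(np.arange(10.0), n)                    # X = 0, 1, ..., 9 in equal proportions
    E = rng.exponential(size=n)
    if dgp == "cox":
        A = E                                            # standard exponential
    elif dgp == 3:
        A = np.exp(E)                                    # standard Pareto, hazard 1/x on x > 1
    else:
        kind, omega = PERIODIC[dgp]
        A = sample_periodic(E, omega, kind)
    T = A * np.exp(-beta0 * X)                           # Lambda0(T) e^{beta0 X} = A
    C = rng.uniform(0.0, c_max, size=n)                  # c_max: placeholder, not from the paper
    return np.minimum(T, C), (T <= C).astype(float), X
```

Bisection is used here only because the cumulative hazards are monotone with explicit bounds. It needs no derivative and is robust near the zeros of the hazard $1 + \sin(\omega t)$.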

Table 1: Estimated size and power of KS tests at 5%. Columns: Cox, DGP1-DGP7, each for several sample sizes n. Rows: omnibus, smooth, Bonferroni.

Table 2: Estimated size and power of CvM tests at 5%. Columns: Cox, DGP1-DGP7, each for several sample sizes n. Rows: omnibus, smooth, Bonferroni.

Table 3: Estimated size and power of KS component tests at 5%. Columns: Cox, DGP1-DGP7, each for several sample sizes n. Rows: 1st, 2nd, 3rd, 4th and 5th component.

Table 4: Estimated size and power of CvM component tests at 5%. Columns: Cox, DGP1-DGP7, each for several sample sizes n. Rows: 1st, 2nd, 3rd, 4th and 5th component.

5 Conclusion

We have used a conditional principal component decomposition to decompose the martingale process in hazard models with regression. The component processes provide a basis for more powerful specification tests. The decomposition is in the time domain, and each component process reflects certain deviations from the proportional hazard assumption. The method is applicable to any hazard model with regression and to transformation models. However, these components do not help when the deviations come from misspecification of the covariable effect, for example, an omitted variable or a wrong link function. To obtain more powerful tests against these deviations, we need a decomposition of the bivariate process $R_n(t, x)$ in $x$. Since $x$ and $t$ play different roles in $R_n(t, x)$, the decomposition method has to be different. The decomposition in $x$ will be discussed in a subsequent paper.

6 Appendix: Proofs

Proof of Theorem 1: Note that each $f_j$ and $g_j$ is bounded and differentiable. The tightness of $c_{n,j}$ follows from Lemma 1 in Lin, Wei and Ying (1993). It then follows from the multivariate CLT that the process converges weakly to a centered Gaussian process. The independence between $c_{\infty,j}$ and $c_{\infty,h}$ follows from the Gaussian property and the conditional uncorrelatedness of $z_{ij}$ and $z_{ih}$.

Proof of the asymptotic equivalence of $\hat c_{n,j}(t, x)$ and $\tilde c_{n,j}(t, x)$: The asymptotic properties of $\hat\beta$ and $\hat\Lambda_0$ are given by Tsiatis (1981) and Andersen and Gill (1982). By a Taylor expansion of $\hat c_{n,j}(t, x)$ and of the score function $U(\beta)$ at $\beta_0$, we have
$$\begin{aligned}\hat c_{n,j}(t, x) = {}& n^{-1/2}\sum_{i=1}^n \int_0^\infty 1_{\{X_i \le x\}}\hat f_j(t, X_i)\hat g_j(s, X_i)\,dM_i(s) \\ &- n^{-1/2}\sum_{i=1}^n \int_0^\infty \frac{\sum_{k=1}^n Y_k(s) e^{\beta_0 X_k} 1_{\{X_k \le x\}}\hat f_j(t, X_k)\hat g_j(s, X_k)}{\sum_{k=1}^n Y_k(s) e^{\beta_0 X_k}}\,dM_i(s) \\ &- \left[n^{-1}\sum_{i=1}^n \int_0^\infty Y_i(s) e^{\beta_0 X_i}\big(X_i - \bar X(\beta_0, s)\big)\lambda_0(s)\, 1_{\{X_i \le x\}}\hat f_j(t, X_i)\hat g_j(s, X_i)\,ds\right]\Sigma(\beta_0)^{-1}\, n^{-1/2}\sum_{i=1}^n \int_0^\infty \big(X_i - \bar X(\beta_0, s)\big)\,dM_i(s) \\ &+ o_p(1).\end{aligned}$$
By the strong consistency of $\hat\beta$, $\hat\Lambda_0$ and the Kaplan-Meier estimator, together with the continuous mapping theorem, $\hat f_j$ and $\hat g_j$ are strongly consistent. Hence, for the first term on the right-hand side of the above equation, by the martingale property and the strong consistency and boundedness of $\hat f_j$ and $\hat g_j$, we have
$$\begin{aligned} &E\left[n^{-1/2}\sum_{i=1}^n \int_0^\infty \Big(1_{\{X_i \le x\}}\hat f_j(t, X_i)\hat g_j(s, X_i) - 1_{\{X_i \le x\}} f_j(t, X_i) g_j(s, X_i)\Big)\,dM_i(s)\right]^2 \\ &= E\left[n^{-1}\sum_{i=1}^n \int_0^\infty \Big(1_{\{X_i \le x\}}\hat f_j(t, X_i)\hat g_j(s, X_i) - 1_{\{X_i \le x\}} f_j(t, X_i) g_j(s, X_i)\Big)^2 Y_i(s) e^{\beta_0 X_i}\lambda_0(s)\,ds\right] \\ &= E\left[\int_0^\infty \Big(1_{\{X_i \le x\}}\hat f_j(t, X_i)\hat g_j(s, X_i) - 1_{\{X_i \le x\}} f_j(t, X_i) g_j(s, X_i)\Big)^2 Y_i(s) e^{\beta_0 X_i}\lambda_0(s)\,ds\right] \to 0,\end{aligned}$$
thus
$$n^{-1/2}\sum_{i=1}^n \int_0^\infty \Big(1_{\{X_i \le x\}}\hat f_j(t, X_i)\hat g_j(s, X_i) - 1_{\{X_i \le x\}} f_j(t, X_i) g_j(s, X_i)\Big)\,dM_i(s) = o_p(1).$$
The same argument applies to the second term, since, by the strong consistency of $\hat f_j$ and $\hat g_j$ and the uniform SLLN,
$$n^{-1}\sum_{i=1}^n Y_i(s) e^{\beta_0 X_i}\, 1_{\{X_i \le x\}}\big(\hat f_j(t, X_i)\hat g_j(s, X_i) - f_j(t, X_i) g_j(s, X_i)\big) = o_p(1).$$
For the third term, we have
$$n^{-1}\sum_{i=1}^n \int_0^\infty Y_i(s) e^{\beta_0 X_i}\big(X_i - \bar X(\beta_0, s)\big)\lambda_0(s)\, 1_{\{X_i \le x\}}\big(\hat f_j(t, X_i)\hat g_j(s, X_i) - f_j(t, X_i) g_j(s, X_i)\big)\,ds = o_p(1)$$
and
$$n^{-1/2}\sum_{i=1}^n \int_0^\infty \big(X_i - \bar X(\beta_0, s)\big)\,dM_i(s) \xrightarrow{d} N\big(0, \Sigma(\beta_0)\big).$$

Thus, $\hat c_{n,j}(t, x)$ and $\tilde c_{n,j}(t, x)$ have the same asymptotic distribution.

Proof of Theorem 2: To show the tightness of $\hat c_{n,j}(t, x)$, it suffices to show the tightness of $\tilde c_{n,j}(t, x)$. Recall that
$$\tilde c_{n,j}(t, x) = n^{-1/2}\sum_{i=1}^n \int_0^\infty \Big[1_{\{X_i \le x\}} f_j(t, X_i) g_j(s, X_i) - l_j(\beta_0, t, x, s)\Big]\,dM_i(s) - A_j(t, x)\,\Sigma(\beta_0)^{-1} n^{-1/2}\sum_{i=1}^n \int_0^\infty \big(X_i - \bar x(\beta_0, s)\big)\,dM_i(s).$$
By Lemma 1 in Lin, Wei and Ying (1993), the first term is tight. The second term is tight since $n^{-1/2}\sum_{i=1}^n \int_0^\infty (X_i - \bar x(\beta_0, s))\,dM_i(s)$ converges in distribution. It then follows from the multivariate CLT that $\hat c_{n,j}(t, x)$ converges weakly to a centered Gaussian process.

References

[1] Aalen, O. O. (1980). A model for non-parametric regression analysis of life times. In Mathematical Statistics and Probability Theory (eds. W. Klonecki, A. Kozek and J. Rosinski), Lecture Notes in Statistics, vol. 2. Springer-Verlag, New York.
[2] Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. The Annals of Statistics.
[3] Anh, T. L. and Stute, W. (2012). Principal component analysis of martingale residuals. Indian Statist. Assoc.
[4] Barlow, W. E. and Prentice, R. L. (1988). Residuals for relative risk regression. Biometrika.
[5] Bickel, P. J. and Wichura, M. J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. The Annals of Mathematical Statistics.
[6] Billingsley, P. (2013). Convergence of Probability Measures. John Wiley & Sons.
[7] Breslow, N. (1974). Covariance analysis of censored survival data. Biometrics.
[8] Chen, K., Jin, Z. and Ying, Z. (2002). Semiparametric analysis of transformation models with censored data. Biometrika.
[9] Cheng, S. C., Wei, L. J. and Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika.
[10] Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological).
[11] Cox, D. R. (1975). Partial likelihood. Biometrika, 62(2).
[12] Dabrowska, D. M. and Doksum, K. A. (1988). Partial likelihood in transformation models with censored data. Scandinavian Journal of Statistics.
[13] Delgado, M. A. and Stute, W. (2008). Distribution-free specification tests of conditional models. Journal of Econometrics.
[14] Durbin, J., Knott, M. and Taylor, C. C. (1975). Components of Cramér-von Mises statistics. II. Journal of the Royal Statistical Society. Series B (Methodological).

[15] Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. Wiley, New York.
[16] Lin, D. Y., Wei, L. J. and Ying, Z. (1993). Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika, 80(3).
[17] Martinussen, T. and Scheike, T. H. (2007). Dynamic Regression Models for Survival Data. Springer Science & Business Media.
[18] Schoenfeld, D. (1980). Chi-squared goodness-of-fit tests for the proportional hazards regression model. Biometrika, 67(1).
[19] Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model. Biometrika.
[20] Stute, W. (1993). Consistent estimation under random censorship when covariables are present. Journal of Multivariate Analysis, 45(1).
[21] Stute, W. (1997). Nonparametric model checks for regression. The Annals of Statistics.
[22] Therneau, T. M., Grambsch, P. M. and Fleming, T. R. (1990). Martingale-based residuals for survival models. Biometrika.
[23] Tsiatis, A. A. (1981). A large sample study of Cox's regression model. The Annals of Statistics.
