Goodness-of-fit test for the Cox Proportional Hazard Model


Rui Cui
Department of Economics, UC3M

Abstract

In this paper, we develop new goodness-of-fit tests for the Cox proportional hazard model. We derive the principal component decomposition of the cumulative martingale residual process and construct new tests based on its estimated components, which outperform the corresponding omnibus test. The omnibus test, consistent in the direction of nonparametric alternatives, is in fact a weighted average of all components, whereas each of our tests is based on a single component: it cannot detect all possible alternatives, but it is very powerful in some high-frequency directions. Smooth tests, which are unweighted averages of a few components, are also constructed. The finite sample performance of the tests is illustrated by means of a Monte Carlo experiment.

JEL: C12; C52
Keywords: Duration analysis; Goodness-of-fit; Principal component decomposition; Right-censorship

I very much appreciate the help and support from Miguel Delgado and Winfried Stute. All errors are my own.

1 Introduction

The Cox proportional hazard model has been widely used in many fields, including economics, since it was proposed by David Cox in 1972. The model specifies the duration of interest through its hazard rate, which is a natural candidate for describing a dynamic, time-dependent phenomenon. It also introduces covariable effects,

which make regression analysis possible for duration data under censorship. Estimation of the Cox model was studied by Cox (1972, 1975) through a partial likelihood approach. Its large sample properties were studied by Tsiatis (1981) and Andersen and Gill (1982), among others. Andersen and Gill (1982) adopted a counting process approach and extended the results to recurrent event processes. The counting process approach, which is equivalent to the hazard approach, has become popular because it brings in martingale theory, which greatly facilitates duration analysis. A comprehensive review is in Fleming and Harrington (1991).

The Cox model might fail in two ways. On the one hand, the Cox model assumes that the hazard rates of different individuals are proportional, i.e., the hazard ratio is time invariant. This proportionality assumption might fail. On the other hand, the covariable effect might be misspecified, either in the functional form of the covariables or in the exponential form of the link function.

For model checking, various graphical methods and goodness-of-fit tests have been proposed in the literature. The most common method uses the martingale residuals defined by Barlow and Prentice (1988). The martingale residuals, which come from the Doob-Meyer decomposition of the counting process, provide a basis for goodness-of-fit tests of hazard models, e.g. Lin, Wei and Ying (1993) and Martinussen and Scheike (2007). For the Cox model, the landmark paper is Lin, Wei and Ying (1993). They developed a class of goodness-of-fit tests, including an omnibus test and special tests for the proportional hazard assumption, the functional form of the covariables and the form of the link function. Their method is based on the cumulative sum of martingale residuals.
The principal component decomposition approach, commonly used in functional analysis, has been used to develop more powerful goodness-of-fit tests for different models. The landmark paper is Durbin, Knott and Taylor (1975). They studied the standard empirical process with estimated parameters to test the specification of a distribution function, and derived its principal components. These components not only help to solve the problem caused by estimation, but also provide a basis for more powerful tests in certain directions. Stute (1997) studied the marked residual process and its principal components to test the specification of a nonparametric regression model. Anh and Stute (2012) studied the principal component analysis of the martingale part of the empirical process to test parametric hazard models. In all these cases, the components obtained serve as specialized experts that detect certain deviations.

In this paper, more powerful tests are developed for the Cox model using a conditional principal component decomposition approach. I consider the CUSUM of martingale residuals as in Lin, Wei and Ying (1993), and derive its principal component decomposition in the time dimension. The components obtained are sensitive to particular deviations from the proportional hazard assumption; for instance, higher-frequency deviations are reflected more strongly in later components. The decomposition method in this paper is applicable to any model that has a martingale interpretation, including hazard models and transformation models. However, we focus on the Cox model in the present paper.

Section 2 gives a brief introduction to the Cox model, together with some other important models in duration analysis and the omnibus test proposed by Lin, Wei and Ying (1993). Section 3 contains the main results: the principal component decomposition, the asymptotic results for the component processes and the test statistics based on the components. Simulation studies illustrating the finite sample performance of our tests are presented in Section 4.

2 Omnibus Test for the Cox Model

2.1 The Cox Proportional Hazard Model

In the framework of regression analysis with right-censored duration data, consider a sample $\{Z_i, \delta_i, X_i\}$, $i = 1, \dots, n$, of i.i.d. realizations of $\{Z, \delta, X\}$. Here $Z$ is the minimum of the non-negative failure time $T$ and censoring time $C$, i.e., $Z = \min(T, C)$. The indicator $\delta = 1_{\{T \le C\}}$ records which of $T$ and $C$ is actually observed, and $X$ is the covariable vector.

The conditional distribution of the failure time is usually better described through its hazard function than through its density. The conditional cumulative hazard function is given by
$$\Lambda(t \mid X) = \int_0^t \frac{dF(u \mid X)}{1 - F(u \mid X)},$$
where $F$ is the conditional distribution function of the failure time. If $F$ admits a density $f$, we have
$$d\Lambda(t \mid X) = \lambda(t \mid X)\,dt = \frac{f(t \mid X)}{1 - F(t \mid X)}\,dt.$$
The function
$$\frac{f(t \mid X)}{1 - F(t \mid X)}$$

is called the hazard function. It also has a conditional probability expression,
$$\lambda(t \mid X) = \lim_{h \to 0} h^{-1} P(t \le T < t + h \mid T \ge t, X).$$
In the Cox proportional hazard model, the hazard rate is assumed to have the multiplicative form
$$\lambda(t \mid X) = \lambda_0(t)\exp(\beta' X),$$
where $\lambda_0(t)$ is an unspecified baseline hazard function.

Another approach to regression models for censored data is based on the analysis of counting processes. Define the following two processes:
$$N(t) = 1_{\{Z \le t,\ \delta = 1\}}, \qquad Y(t) = 1_{\{Z \ge t\}}.$$
Here $N$ is the counting process and $Y$ is the at-risk process. By the Doob-Meyer decomposition, there is a unique predictable process $A$ such that $N - A$ is a martingale; $A$ is called the compensator of $N$. In the counting process approach, the compensator process is modeled instead of the conditional hazard rate of $T$. Notice that the information contained in $\{Z, \delta\}$ is equivalent to that contained in $\{N, Y\}$. In fact, the two approaches are equivalent when $T$ and $C$ are conditionally independent given $X$. To be more specific, the process given by
$$M(t) = N(t) - \int_0^t Y(u)\,d\Lambda(u \mid X)$$
is a martingale with respect to the filtration $\mathcal{F}_t = \sigma\{X, N(u), Y(u+) : 0 \le u \le t\}$. Modeling the compensator $\int_0^t Y(s)\,d\Lambda(s \mid X)$ is therefore equivalent to modeling the conditional hazard. Hence, if the Cox specification is correct for a given sample, there exist $\beta_0$ and $\lambda_0(t)$ such that
$$M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\beta_0' X_i)\lambda_0(s)\,ds, \qquad i = 1, \dots, n,$$
are martingales. The corresponding martingale residuals are defined as
$$\hat M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\hat\beta' X_i)\,d\hat\Lambda_0(s), \tag{2.1}$$

where $\hat\beta$ is an estimator of $\beta_0$ and $\hat\Lambda_0(t)$ is an estimator of the cumulative baseline hazard function $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$. These martingale residuals provide a basis for goodness-of-fit tests for the Cox model.

Estimation by partial likelihood inference was suggested by Cox (1972, 1975). The partial likelihood score function for $\beta$ is
$$U(\beta) = \sum_{i=1}^n \int_0^\infty \left(X_i - \bar X(\beta, t)\right) dN_i(t),$$
where
$$\bar X(\beta, t) = \frac{\sum_{i=1}^n Y_i(t)e^{\beta' X_i} X_i}{\sum_{i=1}^n Y_i(t)e^{\beta' X_i}}.$$
The partial likelihood estimator $\hat\beta$ is the solution to $U(\beta) = 0$. Under some mild regularity conditions, $n^{1/2}(\hat\beta - \beta_0)$ converges in distribution to a centered Gaussian variable with covariance matrix $\Sigma(\beta_0)^{-1}$. The matrix $\Sigma(\beta)$ is defined as
$$\Sigma(\beta) = E\left[\int_0^\infty \left(X_i - \bar x(\beta, s)\right)^{\otimes 2} Y_i(s)e^{\beta' X_i}\lambda_0(s)\,ds\right],$$
with
$$\bar x(\beta, t) = \frac{E[Y(t)e^{\beta' X}X]}{E[Y(t)e^{\beta' X}]}$$
being the limit of $\bar X(\beta, t)$. The cumulative baseline hazard is estimated by the Breslow (1974) estimator
$$\hat\Lambda_0(t) = \int_0^t \frac{\sum_{i=1}^n dN_i(u)}{\sum_{i=1}^n Y_i(u)e^{\hat\beta' X_i}}.$$

2.2 Other Important Models in Duration Analysis

The Cox proportional hazard model assumes that the conditional hazard rate of the duration is the product of a baseline hazard and the covariable effect. In this sense, it is also called the multiplicative hazard model. Another important hazard model is Aalen's additive hazard model, proposed by Aalen (1980), in which the hazard rate is assumed to be a sum of covariable effects.

The multiplicative and additive hazard models are suitable for regression analysis of duration data; however, they are not the only important models in duration analysis. There are two general classes of regression models in duration analysis: the transformation model and the accelerated failure time model. In fact, the Cox model is a special case of the transformation model.
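As an illustration of the estimators in Section 2.1, the following is a minimal Python sketch of the partial likelihood estimator, the Breslow estimator and the martingale residuals evaluated at $t = \infty$. It assumes a scalar covariate and no tied failure times; the function names (`cox_fit_univariate`, `breslow`, `martingale_residuals`) are hypothetical and not taken from the paper or from any library.

```python
import numpy as np

def cox_fit_univariate(z, delta, x, n_iter=25):
    """Newton-Raphson on the partial-likelihood score U(beta).

    Assumptions (not from the paper): scalar covariate, no tied failure times.
    z: observed times, delta: failure indicators (1 = failure), x: covariate.
    """
    beta = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i in np.flatnonzero(delta == 1):
            at_risk = z >= z[i]                      # Y_k(Z_i) = 1{Z_k >= Z_i}
            w = np.exp(beta * x[at_risk])
            xbar = np.sum(w * x[at_risk]) / np.sum(w)        # X-bar(beta, Z_i)
            x2bar = np.sum(w * x[at_risk] ** 2) / np.sum(w)
            score += x[i] - xbar                     # contribution to U(beta)
            info += x2bar - xbar ** 2                # minus the derivative of U
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta

def breslow(z, delta, x, beta):
    """Breslow estimator of the cumulative baseline hazard at each Z_i."""
    risk = np.exp(beta * x)
    # sum_k Y_k(u) exp(beta x_k), evaluated at u = Z_i
    denom = np.array([risk[z >= zi].sum() for zi in z])
    jumps = delta / denom                            # jump of Lambda0-hat at Z_i
    return np.array([jumps[z <= zi].sum() for zi in z])

def martingale_residuals(z, delta, x, beta):
    """M-hat_i(infinity) = delta_i - exp(beta x_i) * Lambda0-hat(Z_i)."""
    return delta - np.exp(beta * x) * breslow(z, delta, x, beta)
```

With the Breslow estimator the residuals sum to zero for any value of `beta`, and the covariate-weighted sum of the residuals equals the score $U(\beta)$, so it vanishes at the partial likelihood estimate. Both identities are convenient sanity checks for an implementation.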

A transformation model is
$$H(T) = -\beta' X + \varepsilon, \tag{2.2}$$
with $H(\cdot)$ an unknown monotone transformation and $\varepsilon$ an error term with a known distribution. The transformation model has a martingale interpretation: if we denote by $\Lambda_\varepsilon$ the known cumulative hazard function of $\varepsilon$, then
$$M(t) = N(t) - \int_0^t Y(u)\,d\Lambda_\varepsilon(\beta' X + H(u))$$
is a martingale. One special case is the Cox model, in which $\varepsilon$ follows the extreme-value distribution with $\Lambda_\varepsilon(t) = e^t$ and the transformation is $H(\cdot) = \ln(\Lambda_0(\cdot))$. Another special case is the proportional odds model, in which $\varepsilon$ follows the standard logistic distribution.

The accelerated failure time model assumes
$$\log(T) = \beta' X + \varepsilon, \tag{2.3}$$
with the distribution of $\varepsilon$ unspecified. It is just a transformed version of the ordinary linear model. Inference for the accelerated failure time model is not as easy as for the Cox model because of censorship: there is no direct martingale structure in (2.3). Although the parameter is easily interpreted as the effect on the mean of $\log(T)$ in the standard linear regression model, its interpretation is less clear when $T$ is subject to censoring.

For the transformation model, Chen et al. (2002) proposed an estimating equation approach based on the martingale structure. The resulting estimator coincides with the partial likelihood estimator in the special case of the Cox model. A brief review of the transformation model and the accelerated failure time model can be found in Martinussen and Scheike (2007).

The method used in this paper to construct goodness-of-fit tests is applicable to models that have a martingale structure, e.g. hazard models and transformation models. The tests we propose are therefore helpful for model selection in the analysis of duration data. We demonstrate the method under the Cox model, and in the simulations we generate data from transformation models and accelerated failure time models as alternatives in order to study the power of our tests.
2.3 Omnibus Test

To test the specification of the Cox model, i.e., to test
$$H_0 : \lambda(t \mid X) = \lambda_0(t)\exp(\beta' X) \quad \text{for some } \beta \text{ and } \lambda_0(t),$$
we could consider the CUSUM of the martingales
$$R_n(t, x) = n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}} M_i(t), \tag{2.4}$$
where $M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\beta_0' X_i)\lambda_0(s)\,ds$, $i = 1, \dots, n$, are martingales under $H_0$. Lin, Wei and Ying (1993) proposed an omnibus test for the Cox model by considering the process with estimated $\beta_0$ and $\Lambda_0$,
$$\hat R_n(t, x) = n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}} \hat M_i(t), \tag{2.5}$$
where $\hat M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\hat\beta' X_i)\,d\hat\Lambda_0(s)$, $i = 1, \dots, n$, are the martingale residuals. They showed that, under the null hypothesis, the process $\hat R_n(t, x)$ converges weakly to a centered Gaussian process $R_\infty(t, x)$ in the space $D([0, \infty) \times [-1, 1])$. A Kolmogorov type statistic is constructed from this process.

To simplify the notation, we only consider the univariate case, i.e., real-valued $X$. In the next section, we decompose $R_n$ into a countable sum of component processes, and use these component processes to construct new test statistics.

3 Tests Based on Component Processes

3.1 Conditional Principal Component Analysis

Notice that the process $R_n$ in (2.4) is indexed by two arguments, $x$ and $t$, which are not independent. Hence, a direct Karhunen-Loève representation is not available in this case. Instead, I adopt a conditional principal component decomposition, i.e., I decompose the process conditional on $X$. From now on, we impose the following assumptions.

(A1). $T$ and $C$ are independent conditional on $X$.
(A2). $X$ is bounded, without loss of generality by 1.
(A3). $C$ is independent of $X$.
(A4). For each $\tau < \infty$, $P\{Y(\tau) = 1\} > 0$.
(A5). $\Sigma(\beta_0) = E\left[\int_0^\infty (X_i - \bar x(\beta_0, s))^{\otimes 2} Y_i(s)e^{\beta_0' X_i}\lambda_0(s)\,ds\right]$ is positive definite.

The first two assumptions are standard in the Cox model. The third one is needed to justify consistency of the estimator of the martingale conditional variance. The last two assumptions are needed to obtain the asymptotic distribution of the partial likelihood estimator $\hat\beta$,

see Andersen and Gill (1982), Theorem 4.2.

Let us begin with the decomposition of the martingale $M_i(t)$ conditional on $X_i$. The conditional covariance of $M_i$ given $X_i$ is
$$\begin{aligned}
E(M_i(s)M_i(t) \mid X_i) &= E\left[\int_0^{s \wedge t} Y_i(u)e^{\beta_0' X_i}\lambda_0(u)\,du \,\Big|\, X_i\right] \\
&= \int_0^{s \wedge t} E[Y_i(u) \mid X_i]\,e^{\beta_0' X_i}\lambda_0(u)\,du \\
&= \int_0^{s \wedge t} P(T_i \ge u, C_i \ge u \mid X_i)\,e^{\beta_0' X_i}\lambda_0(u)\,du \\
&= \int_0^{s \wedge t} P(T_i \ge u \mid X_i)P(C_i \ge u \mid X_i)\,e^{\beta_0' X_i}\lambda_0(u)\,du \\
&= \int_0^{s \wedge t} \exp(-\Lambda_0(u)e^{\beta_0' X_i})P(C_i \ge u)\,e^{\beta_0' X_i}\lambda_0(u)\,du.
\end{aligned}$$
The first equality follows from martingale properties (Fleming and Harrington (1991), Theorem 2.5.1). The last two equalities follow from assumptions (A1) and (A3), respectively. Let us denote the conditional covariance function by
$$T(s \wedge t, x) = E(M_i(s)M_i(t) \mid X_i = x),$$
with
$$T(t, x) := E(M^2(t) \mid X = x) = \int_0^t \exp(-\Lambda_0(u)e^{\beta_0' x})P(C \ge u)\,e^{\beta_0' x}\lambda_0(u)\,du.$$
Notice that the function $T$ is non-decreasing in $t$, with $T(0, x) = 0$ and $T(\infty, x) \le 1$.

Remark. Suppose there is no censorship. Then
$$T(t, x) = 1 - \exp(-\Lambda_0(t)e^{\beta_0' x}) = F_T(t \mid X = x),$$
i.e., the conditional covariance function of the martingale equals the conditional distribution function of $T$. In this case, $T(\infty, x) = 1$.

Let
$$\mu_j = \frac{4}{\pi^2(2j - 1)^2}, \qquad \phi_j(t) = \sqrt{2}\sin\left(\frac{(2j - 1)\pi t}{2}\right), \qquad j = 1, 2, \dots$$
be the eigenvalues and eigenfunctions of the standard Brownian motion on $[0, 1]$, with covariance structure $K(s, t) = s \wedge t$. For each $x$, let $f_j$ be the transformation
$$f_j(t, x) = \phi_j\left(T(t, x)/T(\infty, x)\right).$$

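Then $\{f_j(\cdot, x)\}$ form an orthonormal basis of a subspace of $L^2(\mathbb{R}_+, T(\cdot, x)/T(\infty, x))$, the Hilbert space of all square integrable functions on $\mathbb{R}_+$ with the inner product
$$\langle \rho, g \rangle_x = \int_{\mathbb{R}_+} \rho(t)g(t)\,\frac{T(dt, x)}{T(\infty, x)}.$$
Indeed,
$$\langle f_j, f_h \rangle_x = \int_{\mathbb{R}_+} \phi_j\left(\frac{T(t, x)}{T(\infty, x)}\right)\phi_h\left(\frac{T(t, x)}{T(\infty, x)}\right)\frac{T(dt, x)}{T(\infty, x)} = \int_0^1 \phi_j(u)\phi_h(u)\,du = \begin{cases} 1 & j = h, \\ 0 & j \ne h. \end{cases}$$
Moreover, $\{f_j(\cdot, x)\}$ are the eigenfunctions of the covariance structure $T(s \wedge t, x)/T(\infty, x)$ with associated eigenvalues $\{\mu_j\}$, i.e.,
$$\int_{\mathbb{R}_+} \frac{T(s \wedge t, x)}{T(\infty, x)}\,f_j(s, x)\,\frac{T(ds, x)}{T(\infty, x)} = \mu_j f_j(t, x).$$
By Mercer's theorem, the covariance function can be decomposed as
$$\frac{T(s \wedge t, x)}{T(\infty, x)} = \sum_{j=1}^\infty \mu_j f_j(s, x)f_j(t, x).$$
Since $T(s \wedge t, x)/T(\infty, x)$ is the conditional covariance function of the process $M_i(t)(T(\infty, x))^{-1/2}$ given $X_i = x$, we have the decomposition
$$M_i(t)(T(\infty, X_i))^{-1/2} = \sum_{j=1}^\infty \mu_j^{1/2} z_{ij} f_j(t, X_i) \quad \text{a.s.}, \tag{3.1}$$
where
$$z_{ij} := \mu_j^{-1/2}(T(\infty, X_i))^{-1/2}\langle M_i, f_j(\cdot, X_i) \rangle_{X_i} = \mu_j^{-1/2}(T(\infty, X_i))^{-3/2}\int_{\mathbb{R}_+} M_i(t)f_j(t, X_i)\,T(dt, X_i).$$
The variable $z_{ij}$ is the $j$-th principal component of $M_i(t)(T(\infty, X_i))^{-1/2}$ conditional on $X_i$. For each $j$ and each $h \ne j$, it has the following properties:
$$E(z_{ij} \mid X_i) = 0, \qquad E(z_{ij}^2 \mid X_i) = 1, \qquad E(z_{ij}z_{ih} \mid X_i) = 0. \tag{3.2}$$
Hence, from (3.1), the Karhunen-Loève representation of $R_n$ can be written as
$$\begin{aligned}
R_n(t, x) &= n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}}\left[(T(\infty, X_i))^{1/2}\sum_{j=1}^\infty \mu_j^{1/2} z_{ij} f_j(t, X_i)\right] \\
&= \sum_{j=1}^\infty \mu_j^{1/2}\left[n^{-1/2}\sum_{i=1}^n z_{ij} 1_{\{X_i \le x\}}(T(\infty, X_i))^{1/2} f_j(t, X_i)\right].
\end{aligned}$$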

I call the term in brackets the $j$-th component process of $R_n$ and denote it by
$$c_{n,j}(t, x) := n^{-1/2}\sum_{i=1}^n z_{ij} 1_{\{X_i \le x\}}(T(\infty, X_i))^{1/2} f_j(t, X_i). \tag{3.3}$$
Thus we have the following proposition.

Proposition 1. Under the null hypothesis and (A1)-(A5), the CUSUM of martingale processes (2.4) can be decomposed into a weighted sum of component processes,
$$R_n(t, x) = \sum_{j=1}^\infty \mu_j^{1/2} c_{n,j}(t, x). \tag{3.4}$$
The weights are the square roots of the eigenvalues of the standard Brownian motion.

3.2 Asymptotic Theory of Component Processes

The expression for the component process in (3.3) looks complicated, but it can be simplified considerably. Let us define another function $g_j$, corresponding to $f_j$, as
$$g_j(t, x) := \varphi_j\left(T(t, x)/T(\infty, x)\right) = \sqrt{2}\cos\left(\frac{(2j - 1)\pi}{2}\,\frac{T(t, x)}{T(\infty, x)}\right).$$
By changing the order of integration and a change of variables, the component processes can be rewritten as
$$c_{n,j}(t, x) = n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}} f_j(t, X_i)\int_0^\infty g_j(s, X_i)\,dM_i(s). \tag{3.5}$$
The following theorem shows the convergence of the component process. It follows from a tightness result and the fact that the process is a sum of i.i.d. centered random functions with variance
$$H_j(t, x) := E\left[1_{\{X_i \le x\}} T(\infty, X_i) f_j^2(t, X_i)\right] = \int_{-1}^{x} T(\infty, s) f_j^2(t, s)\,F(ds).$$
Here $F(\cdot)$ is the distribution function of $X$.

Theorem 1. Under the null hypothesis and (A1)-(A5), for each $j$, the process $c_{n,j}(t, x)$ converges weakly to a centered Gaussian process in the space $D([0, \infty) \times [-1, 1])$,
$$c_{n,j} \xrightarrow{d} c_{\infty,j}.$$
The limit Gaussian process $c_{\infty,j}$ has covariance structure
$$K(t_1, t_2, x_1, x_2) = \int_{-1}^{x_1 \wedge x_2} T(\infty, s) f_j(t_1, s) f_j(t_2, s)\,F(ds).$$

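Moreover, $c_{\infty,j}$ and $c_{\infty,h}$ are independent for $j \ne h$.

In order to obtain test statistics, we need to consider the component process after estimation,
$$\hat c_{n,j}(t, x) := n^{-1/2}\sum_{i=1}^n 1_{\{X_i \le x\}} \hat f_j(t, X_i)\int_0^\infty \hat g_j(s, X_i)\,d\hat M_i(s). \tag{3.6}$$
Here
$$\hat M_i(t) = N_i(t) - \int_0^t Y_i(s)\exp(\hat\beta' X_i)\,d\hat\Lambda_0(s), \qquad \hat f_j(t, x) = \phi_j\left(\hat T(t, x)/\hat T(\infty, x)\right), \qquad \hat g_j(t, x) = \varphi_j\left(\hat T(t, x)/\hat T(\infty, x)\right).$$
The process involves the estimators of $\beta_0$ and $\Lambda_0$ as well as an estimator of the conditional covariance function $T(t, x)$. For $\hat T(t, x)$, recall that
$$T(t, x) = \int_0^t \exp(-\Lambda_0(u)e^{\beta_0' x})P(C \ge u)\,e^{\beta_0' x}\lambda_0(u)\,du.$$
A natural consistent estimator is
$$\hat T(t, x) = \int_0^t \exp(-\hat\Lambda_0(u)e^{\hat\beta' x})(1 - \hat G(u-))\,e^{\hat\beta' x}\,d\hat\Lambda_0(u),$$
where $\hat G$ is the Kaplan-Meier estimator of the distribution function of $C$.

In the appendix, it is shown that $\hat c_{n,j}(t, x)$ has the same asymptotic distribution as
$$\tilde c_{n,j}(t, x) := n^{-1/2}\sum_{i=1}^n \int_0^\infty \left[1_{\{X_i \le x\}} f_j(t, X_i)g_j(s, X_i) - l_j(\beta_0, t, x, s)\right] dM_i(s) - A_j(t, x)\Sigma(\beta_0)^{-1} n^{-1/2}\sum_{i=1}^n \int_0^\infty (X_i - \bar x(\beta_0, s))\,dM_i(s),$$
with $\bar x(\beta, t)$ and $\Sigma(\beta)$ defined in Section 2.1, and
$$l_j(\beta, t, x, s) = \frac{E[Y(s)e^{\beta' X} 1_{\{X \le x\}} f_j(t, X)g_j(s, X)]}{E[Y(s)e^{\beta' X}]},$$
$$A_j(t, x) = E\left[\int_0^\infty Y(s)e^{\beta_0' X}(X - \bar x(\beta_0, s))\lambda_0(s) 1_{\{X \le x\}} f_j(t, X)g_j(s, X)\,ds\right].$$
The process $\tilde c_{n,j}(t, x)$ can be written in the form
$$\tilde c_{n,j}(t, x) = n^{-1/2}\sum_{i=1}^n \int_0^\infty h_{ij}(\beta_0, t, x, s)\,dM_i(s), \tag{3.7}$$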

with
$$h_{ij}(\beta, t, x, s) = 1_{\{X_i \le x\}} f_j(t, X_i)g_j(s, X_i) - l_j(\beta, t, x, s) - A_j(t, x)\Sigma(\beta)^{-1}(X_i - \bar x(\beta, s)).$$
The asymptotic distribution of $\hat c_{n,j}(t, x)$ is given by the following theorem.

Theorem 2. Under the null hypothesis and (A1)-(A5), for each $j = 1, 2, \dots$, the process $\hat c_{n,j}(t, x)$ converges weakly to a centered Gaussian process in the space $D([0, \infty) \times [-1, 1])$,
$$\hat c_{n,j} \xrightarrow{d} \tilde c_{\infty,j}.$$
The limit Gaussian process $\tilde c_{\infty,j}(t, x)$ has covariance structure
$$K(t_1, t_2, x_1, x_2) = E\left[\int_0^\infty h_j(\beta_0, t_1, x_1, s)h_j(\beta_0, t_2, x_2, s)Y(s)e^{\beta_0' X}\lambda_0(s)\,ds\right].$$

In addition to a single component process, a finite weighted sum of component processes can also be used for model checking. In this way, we combine information from different components. Consider the first $m$ component processes with weights $w = \{w_j\}_{j=1}^m$, i.e., the process $\sum_{j=1}^m w_j \hat c_{n,j}(t, x)$. It has the same asymptotic distribution as the process
$$\sum_{j=1}^m w_j \tilde c_{n,j}(t, x) = n^{-1/2}\sum_{i=1}^n \int_0^\infty \left(\sum_{j=1}^m w_j h_{ij}(\beta_0, t, x, s)\right) dM_i(s).$$
The asymptotic distribution is given in the following theorem.

Theorem 3. Under the null hypothesis and (A1)-(A5), for any given weights $w = \{w_j\}_{j=1}^m$, the process $\sum_{j=1}^m w_j \hat c_{n,j}(t, x)$ converges weakly to a centered Gaussian process in the space $D([0, \infty) \times [-1, 1])$,
$$\sum_{j=1}^m w_j \hat c_{n,j} \xrightarrow{d} \tilde c_w.$$
The limit Gaussian process $\tilde c_w(t, x)$ has covariance structure
$$K(t_1, t_2, x_1, x_2) = E\left[\int_0^\infty \left(\sum_{j=1}^m w_j h_j(\beta_0, t_1, x_1, s)\right)\left(\sum_{j=1}^m w_j h_j(\beta_0, t_2, x_2, s)\right) Y(s)e^{\beta_0' X}\lambda_0(s)\,ds\right].$$

3.3 Test Statistics

The omnibus tests are based on the original CUSUM of the martingales. By the continuous mapping theorem, we have the following asymptotic distributions of the Kolmogorov-Smirnov and Cramér-von Mises type statistics:
$$KS_o = \sup_{t,x}\left|\hat R_n(t, x)\right| \xrightarrow{d} \sup_{t,x}\left|R_\infty(t, x)\right|,$$

$$CvM_o = \iint \left[\hat R_n(t, x)\right]^2 F_n(dx)\,dt \xrightarrow{d} \iint \left[R_\infty(t, x)\right]^2 F(dx)\,dt.$$
Here $F_n(\cdot)$ is the empirical distribution function of $X$.

The component processes we derived provide a basis for new specification tests for the Cox model. I propose to construct Kolmogorov-Smirnov and Cramér-von Mises type statistics based on each component process, i.e., for each $j = 1, 2, \dots$, we have the following statistics, which I call component tests:
$$KS_{nj} = \sup_{t,x}\left|\hat c_{n,j}(t, x)\right| \xrightarrow{d} \sup_{t,x}\left|\tilde c_{\infty,j}(t, x)\right|,$$
$$CvM_{nj} = \iint \left[\hat c_{n,j}(t, x)\right]^2 F_n(dx)\,dt \xrightarrow{d} \iint \left[\tilde c_{\infty,j}(t, x)\right]^2 F(dx)\,dt.$$
Note that in (3.4) the weight of the $j$-th component process is $\mu_j^{1/2}$, which decreases rapidly in $j$. Consequently, the later components are down-weighted in the original process. In fact, each component reflects a certain aspect of a deviation from the null hypothesis. For example, high-frequency deviations are reflected more strongly in later components. Therefore, the omnibus test, which gives low weights to later components, has low power against high-frequency alternatives, while the tests based on later components are specially designed for such alternatives.

Since different aspects of a deviation are distinguished through its components, and it is difficult to decide which component to use before model checking, we should construct tests based on each component and reject the null hypothesis if any of them rejects. In practice, deviations are unlikely to be of very high frequency, hence we can focus on the first few components, say no more than ten in general.

In addition, smooth test statistics based on a reweighted sum of component processes can be constructed. Given weights $w = \{w_j\}_{j=1}^m$ on the first $m$ components, for some fixed $m$, the Kolmogorov-Smirnov and Cramér-von Mises type statistics can be constructed as
$$KS_{nm} = \sup_{t,x}\left|\sum_{j=1}^m w_j \hat c_{n,j}(t, x)\right| \xrightarrow{d} \sup_{t,x}\left|\tilde c_w(t, x)\right|,$$
$$CvM_{nm} = \iint \left[\sum_{j=1}^m w_j \hat c_{n,j}(t, x)\right]^2 F_n(dx)\,dt \xrightarrow{d} \iint \left[\tilde c_w(t, x)\right]^2 F(dx)\,dt.$$
The smooth tests provide a compromise between the omnibus tests and the tests based on one component.
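To make the down-weighting of later components concrete, the following Python sketch evaluates the Brownian motion eigenvalues $\mu_j$ and eigenfunctions $\phi_j$ used in Section 3.1, checks orthonormality on $[0, 1]$ numerically, and checks the truncated Mercer expansion of $s \wedge t$. The function names and the truncation level are illustrative choices, not part of the paper.

```python
import numpy as np

def mu(j):
    """Eigenvalue mu_j = 4 / (pi^2 (2j - 1)^2) of the kernel min(s, t) on [0, 1]."""
    return 4.0 / (np.pi ** 2 * (2 * j - 1) ** 2)

def phi(j, t):
    """Eigenfunction phi_j(t) = sqrt(2) sin((2j - 1) pi t / 2)."""
    return np.sqrt(2.0) * np.sin((2 * j - 1) * np.pi * t / 2.0)

def trapezoid(y, u):
    """Plain trapezoidal rule on the grid u."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))

def gram_matrix(m, n_grid=20001):
    """Numerical inner products <phi_j, phi_h> on [0, 1] for j, h = 1..m."""
    u = np.linspace(0.0, 1.0, n_grid)
    return np.array([[trapezoid(phi(j, u) * phi(h, u), u)
                      for h in range(1, m + 1)] for j in range(1, m + 1)])

def mercer_min(s, t, n_terms=2000):
    """Truncated Mercer expansion sum_j mu_j phi_j(s) phi_j(t), close to min(s, t)."""
    j = np.arange(1, n_terms + 1)
    return float(np.sum(mu(j) * phi(j, s) * phi(j, t)))

# Weights mu_j^{1/2} given to the first five component processes in (3.4).
weights = np.sqrt(mu(np.arange(1, 6)))
```

The weights are $2/((2j - 1)\pi)$, so the fifth component already receives one ninth of the weight of the first, which is why the omnibus statistic is dominated by the leading components.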
The smooth test is the one that takes $w = (1, \dots, 1)$. The test based on the $j$-th component process is the one that takes $w$ as the $j$-th unit vector, i.e., $w = (0, \dots, 1, \dots, 0)$. However, the problem is that one has to choose a suitable $w$ before model checking.

Actually, we can take into account the information from all the component processes by considering a new test that behaves as an intersection of the component tests. The idea is based on the Bonferroni method: we run the first $m$ component tests and record the decision of each one. We accept $H_0$ if all $m$ tests accept, and reject $H_0$ if any of them rejects. Specifically, let $T_1, T_2, \dots, T_m$ be the first $m$ component tests with common size $x$. The Bonferroni test $T$ is
$$T = \begin{cases} 0 & \text{if } T_1 = \dots = T_m = 0, \\ 1 & \text{otherwise.} \end{cases}$$
The probability that our test accepts under $H_0$ is $P(T_1 = 0, \dots, T_m = 0)$, and it satisfies the inequality
$$P(T_1 = 0, \dots, T_m = 0) \ge P(T_1 = 0) + \dots + P(T_m = 0) - (m - 1) = (1 - x) + \dots + (1 - x) - (m - 1) = 1 - mx.$$
For a significance level $\alpha$, we can choose $x = \alpha/m$; the size of our test is then
$$1 - P(T_1 = 0, \dots, T_m = 0) \le mx = \alpha,$$
i.e., the size of the Bonferroni test is bounded by $\alpha$.

To approximate the distribution of the limit process $\tilde c_{\infty,j}(t, x)$, we follow the suggestion of Lin, Wei and Ying (1993) and use Monte Carlo simulation. Recall from (3.7) that $\tilde c_{n,j}(t, x)$ is a martingale integral. To approximate its asymptotic distribution, the integrand $h_{ij}(\beta_0, t, x, s)$ can be replaced by a consistent estimator, but we do not know the distribution of the martingale $M_i(t)$. Lin, Wei and Ying (1993) suggested replacing $M_i(t)$ by a similar process which has a known distribution. The candidate is $N_i(t)G_i$, where $N_i$ is the observed counting process and $\{G_i;\ i = 1, \dots, n\}$ is a random sample of standard normal variables. By the martingale property $E[M^2(t)] = E[N(t)]$, the processes $M_i(t)$ and $N_i(t)G_i$ have the same variance function.
Finally, replace all the unknown quantities in $h_{ij}(\beta_0, t, x, s)$ by their consistent estimators, i.e., replace $\beta_0$, $\Lambda_0(t)$, $f_j(t, x)$, $g_j(t, x)$ by $\hat\beta$, $\hat\Lambda_0(t)$, $\phi_j(\hat T(t, x)/\hat T(\infty, x))$, $\varphi_j(\hat T(t, x)/\hat T(\infty, x))$, and replace $\bar x(\beta, t)$ and $l_j(\beta, t, x, s)$ by their sample analogues. Given the observed data, the process obtained after these replacements has the same limit distribution as $\tilde c_{n,j}(t, x)$.
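The resampling step and the Bonferroni rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the caller is assumed to have already evaluated, on a finite grid of $(t, x)$ points, each individual's estimated contribution $n^{-1/2}\delta_i \hat h_{ij}(\hat\beta, t, x, Z_i)$, stored in a hypothetical matrix `contributions` of shape `(n, n_grid)`. Replacing $dM_i(s)$ by $G_i\,dN_i(s)$ then turns each simulated process into a linear combination of the rows of that matrix.

```python
import numpy as np

def multiplier_pvalue(stat_obs, contributions, n_draws=1000, seed=0):
    """Approximate p-value of a sup-type statistic by Gaussian multipliers.

    contributions[i, g]: individual i's estimated contribution to the process
    at grid point g (hypothetical input, already scaled by n^{-1/2}).
    Each draw forms sum_i G_i * contributions[i, :] with G_i iid N(0, 1)
    and records the supremum of its absolute value over the grid.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_draws, contributions.shape[0]))
    sims = np.abs(g @ contributions).max(axis=1)
    return float(np.mean(sims >= stat_obs))

def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H0 if any of the m component tests rejects at level alpha / m."""
    m = len(pvalues)
    return bool(np.min(pvalues) <= alpha / m)
```

For example, with five component tests at $\alpha = 0.05$, the null hypothesis is rejected as soon as one component p-value falls below $0.01$. A Cramér-von Mises version would replace the maximum over the grid by a weighted sum of squares.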

4 Simulation Study

As discussed earlier, the accelerated failure time model and the transformation model provide general frameworks for studying covariable effects in duration data. In our simulation study, we take several alternatives from these models to study the power of our tests. We consider the following DGPs, with explanations afterwards.

Cox: $\lambda(t \mid X) = \lambda_0(t)\exp(\beta' X)$.

DGP1: Weibull hazard rate, $\lambda(t \mid X) = (0.2X)\,t^{0.2X - 1}$.

DGP2: Log-normal model, $\ln(T) = \beta' X + \epsilon$. Here we take $\epsilon$ to be a standard normal variable. This model is a special case of the accelerated failure time model.

DGP3: Transformation model, $\Lambda_0(T)e^{\beta' X} = \text{Pareto}$, where Pareto is a standard Pareto variable, which has hazard rate $x^{-1}$ for $x > 1$.

DGP4: Transformation model, $\Lambda_0(T)e^{\beta' X} = A_1$, where $A_1$ is a positive random variable with hazard rate $\lambda(t) = 1 + \sin(3\pi t/2)$.

DGP5: Transformation model, $\Lambda_0(T)e^{\beta' X} = A_2$, where $A_2$ is a positive random variable with hazard rate $\lambda(t) = 1 + \cos(3\pi t/2)$.

DGP6: Transformation model, $\Lambda_0(T)e^{\beta' X} = A_3$, where $A_3$ is a positive random variable with hazard rate $\lambda(t) = 1 + \sin(5\pi t/2)$.

DGP7: Transformation model, $\Lambda_0(T)e^{\beta' X} = A_4$, where $A_4$ is a positive random variable with hazard rate $\lambda(t) = 1 + \cos(5\pi t/2)$.

DGP1 is the Weibull hazard model, in which the hazards for different values of the covariable are non-proportional. DGP2 is a model commonly used in economics, and it belongs to the class of accelerated failure time models. DGP3-7 are transformation models with unspecified transformation $\ln(\Lambda_0(\cdot))$. The Cox model, as a special case of the transformation model, can be expressed as $\Lambda_0(T)e^{\beta' X} = E$, where $E$ is a standard exponential variable, which has a constant hazard rate. In DGP3, we replace the exponential variable by a Pareto variable, which has a decreasing hazard rate. We call DGP4-7 high-frequency alternatives, in the sense that the variables $A_1, A_2, A_3, A_4$ have periodic rather than constant hazard rates.

We take $\beta = 0.2$, $\lambda_0(t) = 1$, $\Lambda_0(t) = t$, and $X = 0, 1, \dots, 9$ with equal proportions. The censoring variable in each case is drawn from a uniform distribution such that the percentage of censorship is around 3%. We run for sample size n = 5, 1, 15, and use 1 realizations of the Gaussian process to estimate the distribution of each statistic. We run 1 replications for each DGP.

The results are shown in Tables 1 and 2. The omnibus test is based on the original process $\hat R_n(t, x)$. The smooth test is based on the reweighted sum of the first five component processes, as discussed in the previous section. The last five lines in the tables are for the tests based on the first five component processes. I use bold type to indicate the test that has the largest power. Since the Weibull hazard rate data are highly non-proportional, the omnibus test works well, but the test based on the second component has larger power. The log-normal data seem to fit the Cox model well, but the test based on the second component still has the largest power. The same holds for the Pareto alternative.
Finally, the results for the high-frequency alternatives DGP4-7 show clearly how these components serve as specialized experts for certain deviations from the proportional hazard assumption. As the alternative becomes more frequent in the time domain, the tests based on later components perform better.
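For readers who wish to reproduce a high-frequency alternative, the following Python sketch generates data from DGP4 with $\Lambda_0(t) = t$. The variable $A_1$ is drawn by inverting its cumulative hazard $a + \frac{2}{3\pi}(1 - \cos(3\pi a/2))$ at a standard exponential draw, using bisection. The upper bound `c_max` of the uniform censoring distribution is a hypothetical value, since the text only states the target censoring rate; the function names are illustrative.

```python
import numpy as np

def cum_hazard_A1(a):
    """Integral of the hazard 1 + sin(3 pi t / 2) from 0 to a."""
    return a + (2.0 / (3.0 * np.pi)) * (1.0 - np.cos(1.5 * np.pi * a))

def draw_A1(rng, size, n_bisect=60):
    """Draw A1 by solving cum_hazard_A1(A1) = E with E ~ Exp(1) (bisection)."""
    e = rng.exponential(1.0, size)
    # a <= cum_hazard_A1(a) <= a + 4 / (3 pi), so the root lies in [lo, hi].
    lo = np.maximum(0.0, e - 4.0 / (3.0 * np.pi))
    hi = e.copy()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        below = cum_hazard_A1(mid) < e
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)

def simulate_dgp4(n, beta=0.2, c_max=3.0, seed=0):
    """Simulate (Z, delta, X) from DGP4 with Lambda0(t) = t.

    c_max is an assumed tuning value for the uniform censoring time.
    """
    rng = np.random.default_rng(seed)
    x = (np.arange(n) % 10).astype(float)      # X = 0, ..., 9 in equal proportions
    t = draw_A1(rng, n) * np.exp(-beta * x)    # Lambda0(T) e^{beta X} = A1
    c = rng.uniform(0.0, c_max, n)
    z = np.minimum(t, c)
    delta = (t <= c).astype(int)
    return z, delta, x
```

The other high-frequency alternatives follow the same pattern with the corresponding cumulative hazard, and in practice `c_max` would be tuned until the observed censoring rate matches the target.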

Table 1: Estimated size and power of KS tests at 5%. Columns: Cox, DGP1, DGP2, DGP3 (first panel) and DGP4, DGP5, DGP6, DGP7 (second panel), each by sample size n. Rows: omnibus, smooth, Bonferroni. [Table entries not recoverable.]

Table 2: Estimated size and power of CvM tests at 5%. Columns: Cox, DGP1, DGP2, DGP3 (first panel) and DGP4, DGP5, DGP6, DGP7 (second panel), each by sample size n. Rows: omnibus, smooth, Bonferroni. [Table entries not recoverable.]

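Table 3: Estimated size and power of KS component tests at 5%. Columns: Cox, DGP1, DGP2, DGP3 (first panel) and DGP4, DGP5, DGP6, DGP7 (second panel), each by sample size n. Rows: 1st, 2nd, 3rd, 4th, 5th component. [Table entries not recoverable.]

Table 4: Estimated size and power of CvM component tests at 5%. Columns: Cox, DGP1, DGP2, DGP3 (first panel) and DGP4, DGP5, DGP6, DGP7 (second panel), each by sample size n. Rows: 1st, 2nd, 3rd, 4th, 5th component. [Table entries not recoverable.]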

5 Conclusion

We have used a conditional principal component decomposition method to decompose the martingale process in hazard models with regression. The component processes provide a basis for more powerful specification tests. The decomposition is in the time domain, and each component process reflects certain deviations from the proportional hazard assumption. The method is applicable to any hazard model with regression and to transformation models.

However, these components do not help when the deviations come from misspecification of the covariable effect, for example a missing variable or a wrong link function. To obtain more powerful tests against these deviations, we need the decomposition of the bivariate process $R_n(t, x)$ in $x$. Since $x$ and $t$ play different roles in $R_n(t, x)$, the decomposition method should be different. The decomposition in $x$ will be discussed in a subsequent paper.

6 Appendix: Proofs

Proof of Theorem 1: Note that each $f_j$ and $g_j$ is bounded and differentiable. The tightness of $c_{n,j}$ follows from Lemma 1 in Lin, Wei and Ying (1993). It then follows from the multivariate CLT that the process converges weakly to a centered Gaussian process. The independence of $c_{\infty,j}$ and $c_{\infty,h}$ follows from the Gaussian property and the conditional uncorrelatedness of $z_{ij}$ and $z_{ih}$.

Proof of the asymptotic equivalence of $\hat c_{n,j}(t, x)$ and $\tilde c_{n,j}(t, x)$: The asymptotic properties of $\hat\beta$ and $\hat\Lambda_0$ are given by Tsiatis (1981) and Andersen and Gill (1982). By taking the Taylor expansion of $\hat c_{n,j}(t, x)$ and the score function

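$U(\beta)$ at $\beta_0$, we have
$$\begin{aligned}
\hat c_{n,j}(t, x) ={}& n^{-1/2}\sum_{i=1}^n \int_0^\infty 1_{\{X_i \le x\}} \hat f_j(t, X_i)\hat g_j(s, X_i)\,dM_i(s) \\
&- n^{-1/2}\sum_{i=1}^n \int_0^\infty \frac{\sum_{k=1}^n Y_k(s)e^{\beta_0' X_k} 1_{\{X_k \le x\}} \hat f_j(t, X_k)\hat g_j(s, X_k)}{\sum_{k=1}^n Y_k(s)e^{\beta_0' X_k}}\,dM_i(s) \\
&- \left[n^{-1}\sum_{i=1}^n \int_0^\infty Y_i(s)e^{\beta_0' X_i}(X_i - \bar X(\beta_0, s))\lambda_0(s) 1_{\{X_i \le x\}} \hat f_j(t, X_i)\hat g_j(s, X_i)\,ds\right] \Sigma(\beta_0)^{-1}\, n^{-1/2}\sum_{i=1}^n \int_0^\infty (X_i - \bar X(\beta_0, s))\,dM_i(s) \\
&+ o_p(1).
\end{aligned}$$
By the strong consistency of $\hat\beta$, $\hat\Lambda_0$ and the Kaplan-Meier estimator, together with the continuous mapping theorem, $\hat f_j$ and $\hat g_j$ are strongly consistent. Hence, for the first term on the right hand side of the above equation, by the martingale property and the strong consistency and boundedness of $\hat f_j$ and $\hat g_j$, we have
$$\begin{aligned}
& E\left[n^{-1/2}\sum_{i=1}^n \int_0^\infty \left(1_{\{X_i \le x\}} \hat f_j(t, X_i)\hat g_j(s, X_i) - 1_{\{X_i \le x\}} f_j(t, X_i)g_j(s, X_i)\right) dM_i(s)\right]^2 \\
&\quad = E\left[n^{-1}\sum_{i=1}^n \int_0^\infty \left(1_{\{X_i \le x\}} \hat f_j(t, X_i)\hat g_j(s, X_i) - 1_{\{X_i \le x\}} f_j(t, X_i)g_j(s, X_i)\right)^2 Y_i(s)e^{\beta_0' X_i}\lambda_0(s)\,ds\right] \\
&\quad = E\left[\int_0^\infty \left(1_{\{X_i \le x\}} \hat f_j(t, X_i)\hat g_j(s, X_i) - 1_{\{X_i \le x\}} f_j(t, X_i)g_j(s, X_i)\right)^2 Y_i(s)e^{\beta_0' X_i}\lambda_0(s)\,ds\right] \to 0,
\end{aligned}$$
thus
$$n^{-1/2}\sum_{i=1}^n \int_0^\infty \left(1_{\{X_i \le x\}} \hat f_j(t, X_i)\hat g_j(s, X_i) - 1_{\{X_i \le x\}} f_j(t, X_i)g_j(s, X_i)\right) dM_i(s) = o_p(1).$$
The same argument applies to the second term, since from the strong consistency of $\hat f_j$ and $\hat g_j$ and the uniform SLLN we have
$$n^{-1}\sum_{i=1}^n Y_i(s)e^{\beta_0' X_i} 1_{\{X_i \le x\}}\left(\hat f_j(t, X_i)\hat g_j(s, X_i) - f_j(t, X_i)g_j(s, X_i)\right) = o_p(1).$$
For the third term, we have
$$n^{-1}\sum_{i=1}^n \int_0^\infty Y_i(s)e^{\beta_0' X_i}(X_i - \bar X(\beta_0, s))\lambda_0(s) 1_{\{X_i \le x\}}\left(\hat f_j(t, X_i)\hat g_j(s, X_i) - f_j(t, X_i)g_j(s, X_i)\right) ds = o_p(1),$$
and
$$n^{-1/2}\sum_{i=1}^n \int_0^\infty (X_i - \bar X(\beta_0, s))\,dM_i(s) \xrightarrow{d} N(0, \Sigma(\beta_0)).$$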

Thus, $\hat c_{n,j}(t, x)$ and $c_{n,j}(t, x)$ have the same asymptotic distribution.

Proof of Theorem 2: To show the tightness of $\hat c_{n,j}(t, x)$, it suffices to show the tightness of $c_{n,j}(t, x)$. Recall

\[
\begin{aligned}
c_{n,j}(t,x) ={}& n^{-1/2}\sum_{i=1}^{n}\int \big[1_{\{X_i\le x\}}f_j(t,X_i)g_j(s,X_i) - l(\beta_0,t,x,s)\big]\,dM_i(s) \\
&- A(t,x)\,\Sigma(\beta_0)^{-1}\,n^{-1/2}\sum_{i=1}^{n}\int \big(X_i-\bar X(\beta_0,s)\big)\,dM_i(s).
\end{aligned}
\]

From Lemma 1 in Lin, Wei and Ying (1993), the first term is tight. The second term is tight since $n^{-1/2}\sum_{i=1}^{n}\int (X_i-\bar X(\beta_0,s))\,dM_i(s)$ converges in distribution. It then follows from the multivariate CLT that $\hat c_{n,j}(t, x)$ converges weakly to a centered Gaussian process.
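The objects handled in these proofs can be made concrete numerically. The following is a minimal sketch, not the paper's procedure: it fits a one-covariate Cox model by Newton-Raphson, forms martingale residuals with the Breslow cumulative hazard, and accumulates them over the covariate, which is the kind of cumulative martingale residual process that $R_n(t, x)$ is assumed to be here (evaluated at the end of follow-up). All function names, the simulation design and the unit baseline hazard are assumptions made for illustration.

```python
import math
import random


def simulate(n, beta, seed=0):
    """Right-censored data from a Cox model with unit baseline hazard (illustrative design)."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    t = [rng.expovariate(math.exp(beta * xi)) for xi in x]  # event times, hazard exp(beta * x)
    c = [rng.expovariate(0.5) for _ in range(n)]            # independent censoring times
    times = [min(ti, ci) for ti, ci in zip(t, c)]
    events = [ti <= ci for ti, ci in zip(t, c)]
    return times, events, x


def risk_sums(times, x, beta, t):
    """Weighted sums S0, S1, S2 over the risk set {k : T_k >= t}."""
    s0 = s1 = s2 = 0.0
    for tk, xk in zip(times, x):
        if tk >= t:
            w = math.exp(beta * xk)
            s0 += w
            s1 += w * xk
            s2 += w * xk * xk
    return s0, s1, s2


def fit_cox(times, events, x, iters=30):
    """Newton-Raphson on the partial-likelihood score for a single covariate."""
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        for ti, di, xi in zip(times, events, x):
            if di:
                s0, s1, s2 = risk_sums(times, x, beta, ti)
                score += xi - s1 / s0
                info += s2 / s0 - (s1 / s0) ** 2
        beta += score / info
    return beta


def martingale_residuals(times, events, x, beta):
    """M_i = delta_i - exp(beta * x_i) * (Breslow cumulative hazard at T_i)."""
    jumps = [(tk, 1.0 / risk_sums(times, x, beta, tk)[0])
             for tk, dk in zip(times, events) if dk]
    res = []
    for ti, di, xi in zip(times, events, x):
        cum_haz = sum(j for tk, j in jumps if tk <= ti)
        res.append((1.0 if di else 0.0) - math.exp(beta * xi) * cum_haz)
    return res


def cumulative_residual_process(x, res):
    """n^{-1/2} * sum_i 1{x_i <= u} M_i, evaluated at each observed covariate value u."""
    n = len(x)
    out, running = [], 0.0
    for xi, mi in sorted(zip(x, res)):
        running += mi
        out.append((xi, running / math.sqrt(n)))
    return out


# Small demonstration on simulated data (values depend on the seed).
times, events, x = simulate(n=120, beta=0.5, seed=1)
beta_hat = fit_cox(times, events, x)
res = martingale_residuals(times, events, x, beta_hat)
process = cumulative_residual_process(x, res)
sup_stat = max(abs(v) for _, v in process)
print(beta_hat, sup_stat)
```

Two algebraic identities make useful sanity checks: the residuals sum to zero for any $\beta$ when the Breslow estimator is used, and $\sum_i X_i \hat M_i$ equals the partial-likelihood score, so it vanishes at $\hat\beta$. The supremum computed at the end is only an uncalibrated summary; critical values would have to come from a resampling scheme such as the one in Lin, Wei and Ying (1993).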

References

[1] Aalen, O. O. (1980). A model for non-parametric regression analysis of life times. In Mathematical Statistics and Probability Theory (eds W. Klonecki, A. Kozek and J. Rosinski), Lecture Notes in Statistics, vol. 2, Springer-Verlag, New York.
[2] Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. The Annals of Statistics.
[3] Anh, T. L. and Stute, W. (2012). Principal component analysis of martingale residuals. Indian Statist. Assoc.
[4] Barlow, W. E. and Prentice, R. L. (1988). Residuals for relative risk regression. Biometrika.
[5] Bickel, P. J. and Wichura, M. J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. The Annals of Mathematical Statistics.
[6] Billingsley, P. (2013). Convergence of Probability Measures. John Wiley & Sons.
[7] Breslow, N. (1974). Covariance analysis of censored survival data. Biometrics.
[8] Chen, K., Jin, Z. and Ying, Z. (2002). Semiparametric analysis of transformation models with censored data. Biometrika.
[9] Cheng, S. C., Wei, L. J. and Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika.
[10] Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological).
[11] Cox, D. R. (1975). Partial likelihood. Biometrika, 62(2).
[12] Dabrowska, D. M. and Doksum, K. A. (1988). Partial likelihood in transformation models with censored data. Scandinavian Journal of Statistics.
[13] Delgado, M. A. and Stute, W. (2008). Distribution-free specification tests of conditional models. Journal of Econometrics.
[14] Durbin, J., Knott, M. and Taylor, C. C. (1975). Components of Cramér-von Mises statistics. II. Journal of the Royal Statistical Society. Series B (Methodological).
[15] Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. Wiley, New York.
[16] Lin, D. Y., Wei, L. J. and Ying, Z. (1993). Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika, 80(3).
[17] Martinussen, T. and Scheike, T. H. (2007). Dynamic Regression Models for Survival Data. Springer Science & Business Media.
[18] Schoenfeld, D. (1980). Chi-squared goodness-of-fit tests for the proportional hazards regression model. Biometrika, 67(1).
[19] Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model. Biometrika.
[20] Stute, W. (1993). Consistent estimation under random censorship when covariables are present. Journal of Multivariate Analysis, 45(1).
[21] Stute, W. (1997). Nonparametric model checks for regression. The Annals of Statistics.
[22] Therneau, T. M., Grambsch, P. M. and Fleming, T. R. (1990). Martingale-based residuals for survival models. Biometrika.
[23] Tsiatis, A. A. (1981). A large sample study of Cox's regression model. The Annals of Statistics.


More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

The Proportional Hazard Model and the Modelling of Recurrent Failure Data: Analysis of a Disconnector Population in Sweden. Sweden

The Proportional Hazard Model and the Modelling of Recurrent Failure Data: Analysis of a Disconnector Population in Sweden. Sweden PS1 Life Cycle Asset Management The Proportional Hazard Model and the Modelling of Recurrent Failure Data: Analysis of a Disconnector Population in Sweden J. H. Jürgensen 1, A.L. Brodersson 2, P. Hilber

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

Multistate Modeling and Applications

Multistate Modeling and Applications Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

More information

Modified Kolmogorov-Smirnov Test of Goodness of Fit. Catalonia-BarcelonaTECH, Spain

Modified Kolmogorov-Smirnov Test of Goodness of Fit. Catalonia-BarcelonaTECH, Spain 152/304 CoDaWork 2017 Abbadia San Salvatore (IT) Modified Kolmogorov-Smirnov Test of Goodness of Fit G.S. Monti 1, G. Mateu-Figueras 2, M. I. Ortego 3, V. Pawlowsky-Glahn 2 and J. J. Egozcue 3 1 Department

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

From semi- to non-parametric inference in general time scale models

From semi- to non-parametric inference in general time scale models From semi- to non-parametric inference in general time scale models Thierry DUCHESNE duchesne@matulavalca Département de mathématiques et de statistique Université Laval Québec, Québec, Canada Research

More information

9 Estimating the Underlying Survival Distribution for a

9 Estimating the Underlying Survival Distribution for a 9 Estimating the Underlying Survival Distribution for a Proportional Hazards Model So far the focus has been on the regression parameters in the proportional hazards model. These parameters describe the

More information

Linear life expectancy regression with censored data

Linear life expectancy regression with censored data Linear life expectancy regression with censored data By Y. Q. CHEN Program in Biostatistics, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, U.S.A.

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Survival Analysis. Lu Tian and Richard Olshen Stanford University 1 Survival Analysis Lu Tian and Richard Olshen Stanford University 2 Survival Time/ Failure Time/Event Time We will introduce various statistical methods for analyzing survival outcomes What is the survival

More information

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates

Analysis of Gamma and Weibull Lifetime Data under a General Censoring Scheme and in the presence of Covariates Communications in Statistics - Theory and Methods ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20 Analysis of Gamma and Weibull Lifetime Data under a

More information

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky

EMPIRICAL ENVELOPE MLE AND LR TESTS. Mai Zhou University of Kentucky EMPIRICAL ENVELOPE MLE AND LR TESTS Mai Zhou University of Kentucky Summary We study in this paper some nonparametric inference problems where the nonparametric maximum likelihood estimator (NPMLE) are

More information