Empirical likelihood for average derivatives of hazard regression functions


Metrika, pp. 93–112
DOI 10.1007/s

Xuewen Lu · Jie Sun · Yongcheng Qi

Received: 26 May 2006 / Published online: 9 February 2007
© Springer-Verlag 2007

X. Lu (✉) · J. Sun
Department of Mathematics and Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
e-mail: lux@math.ucalgary.ca

Y. Qi
Department of Mathematics and Statistics, University of Minnesota-Duluth, Duluth, MN 55812, USA

Abstract In this paper, we propose an empirical likelihood ratio method for inference about average derivatives in semiparametric hazard regression models for competing risks data. The empirical loglikelihood ratio for the vector of average derivatives of a hazard regression function is defined, and an adjusted version of it is shown to be asymptotically chi-squared with degrees of freedom equal to the dimension of the covariate vector. Monte Carlo simulation studies are presented to compare the empirical likelihood ratio method with the normal-approximation-based method.

Keywords Average derivative · Competing risks data · Empirical likelihood · Nonparametric hazard regression · Single-index model

1 Introduction

In survival analysis or censored data modelling, the relationship between a survival time and a set of covariates is often modelled through hazard functions. Typically, there are two types of models: the multiplicative model and the additive model (Cox 1972; Aalen 1980). In these models, the effects of covariates are modelled linearly through some explicit risk functions.

When the functional form of the risk function is not obvious, nonparametric or semiparametric modelling of the covariate effects is required (Sasieni 1992; Dabrowska 1987; Hastie and Tibshirani 1990; Fan et al. 1997; Huang 1999). However, these models rely essentially on the proportional hazards assumption. Moreover, when there are many covariates, existing nonparametric methods may suffer from the curse of dimensionality.

For complete data, many models have recently been developed for the nonparametric estimation of a regression function and its derivatives, the latter being of special interest to biostatisticians, econometricians and other applied researchers. One of these is the single-index model, in which the regression function is assumed to be a nonparametric function of a linear combination of the covariates (Härdle and Stoker 1989; Powell et al. 1989; Ichimura 1993; Horowitz and Härdle 1996). A survey of the theory of single-index models is given by Geenens and Delecroix (2005). In models for complete data, interest centres on the mean regression function. With censored data, however, it is often more convenient and informative to study hazard rates instead of means.

There has been little research on the estimation of average derivatives of hazard functions and on single-index models in survival analysis. Retaining the proportional hazards assumption, Nielsen et al. (1998) and Lu et al. (2006) investigated single-index models for censored data. Without this assumption, Lu and Burke (2005) proposed a synthetic data method for estimating average derivatives of mean regression functions when the censoring mechanism does not depend on the covariates. Görgens (2004) proposed kernel-based estimators for the average derivatives of risk-specific hazard functions for competing risks data. Görgens's method can handle censored data in which the censoring mechanism may depend on the covariates, and it combines the appealing features of single-index models with the ability to deal with censoring. However, it requires the calculation of a covariance matrix, and because its inference procedure is based on the normal approximation, its finite sample properties may be poor in small samples.

In this paper, we apply empirical likelihood methods to inference about the average derivatives of hazard functions for competing risks data, and we conduct finite sample Monte Carlo simulations to compare the performance of the empirical likelihood method and the normal-approximation-based method. Usually, the empirical loglikelihood ratio statistic has a limiting chi-squared distribution. It has been used for testing and for constructing confidence regions (or intervals) in a variety of problems; see, e.g., Thomas and Grunkemeier (1975), Owen (1988, 1990) and Qin and Lawless (1994). An appealing feature of the empirical likelihood approach is that the shape and orientation of its confidence regions are determined entirely by the data. It has several advantages over classical and modern alternatives such as the normal-approximation-based method and the bootstrap: it imposes no prior constraints on the shape of the region, it requires neither a pivotal quantity nor the calculation of any asymptotic covariance matrix, the region is range-preserving and transformation-respecting, and it has better small sample performance than the normal-approximation-based method (Hall and La Scala 1990).

For inference about the average derivatives of hazard functions, we define an empirical likelihood function based on the estimation procedure of Görgens (2004). This procedure involves nonparametric estimation of the conditional subdistribution functions of the failure time and their derivatives, so the quantities used to estimate the average derivatives are dependent. As a consequence, the empirical loglikelihood ratio does not have a standard chi-squared limiting distribution. Instead, we show in Theorem 2 that an adjusted empirical loglikelihood ratio is asymptotically chi-squared.

The paper is organized as follows. In Sect. 2, we introduce the empirical likelihood method for inference about the average derivatives of hazard functions and present our main results. In Sect. 3, we report Monte Carlo simulation results that demonstrate its advantages numerically. Concluding remarks are given in Sect. 4. All proofs are presented in the Appendix.

2 Methodology and main results

First, we introduce the average derivative method of Görgens (2004) for hazard functions, and then we propose the empirical likelihood method for the average derivatives. In a competing risks problem, failure may occur from several causes. Random censoring is a special case with two risks, the second of which can be thought of as censoring. We use the notation of Görgens (2004). Let $Y$ be the failure time, $S$ an indicator of the type of failure, and $X$ a $q$-vector of explanatory variables. For the failure type $s$ of interest, define the conditional distribution functions

$$F_1(y\mid x)=P(Y\le y,\,S=s\mid X=x),\qquad F_2(y\mid x)=P(Y\ge y\mid X=x).$$

Let $\xi(x)$ denote the density of $X$ with respect to Lebesgue measure.
Define the joint subdistributions of $Y$ and $X$ by

$$A_1(y,x)=P(Y\le y,\,S=s\mid X=x)\,\xi(x),\qquad A_2(y,x)=P(Y\ge y\mid X=x)\,\xi(x).$$

Then the conditional cumulative hazard function of $Y$ given $X$ is, by definition,

$$H(y\mid x)=\int_0^y\frac{F_1(dv\mid x)}{F_2(v\mid x)}.\tag{1}$$
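To see what (1) computes, consider the case without covariates: replacing $F_1$ and $F_2$ by their empirical versions turns the integral into a sum over observed failures of type $s$, each weighted by the inverse of the number still at risk, which is the Nelson–Aalen estimator. The Python sketch below is our own illustration under that simplification (the function name is hypothetical); the kernel estimators used later in this section replace the plain counts by kernel weights $K_b(x-X_i)$.

```python
from typing import List


def cumulative_hazard_plugin(times: List[float], types: List[int], y: float, s: int = 1) -> float:
    """Plug-in version of (1) without covariates: sum of dF1_n(Y_i) / F2_n(Y_i) over Y_i <= y.

    F1_n puts mass 1/n on each observed failure of type s, and F2_n(v) is the
    proportion of observations with Y_j >= v (the risk set), so every term equals
    1 / (number at risk), i.e. the Nelson-Aalen increment.
    """
    n = len(times)
    total = 0.0
    for yi, si in zip(times, types):
        if si == s and yi <= y:
            at_risk = sum(1 for yj in times if yj >= yi)  # n * F2_n(Y_i)
            total += (1.0 / n) / (at_risk / n)            # F1_n(dv) / F2_n(v) at v = Y_i
    return total


# four observations; type 0 marks the second risk (censoring)
print(cumulative_hazard_plugin([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1], y=3.0))  # prints 0.75
```

The two failures of type 1 before $y=3$ contribute $1/4$ (four at risk) and $1/2$ (two at risk); the censored observation only shrinks the risk set.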

Equivalently, since the density $\xi(x)$ cancels, the conditional cumulative hazard (1) can be written as

$$H(y\mid x)=\int_0^y\frac{A_1(dv,x)}{A_2(v,x)}.$$

The following assumptions were used by Görgens (2004) to derive the estimators of the average derivatives and to prove their asymptotic normality.

A1. There exist a function $\bar H$ and a vector $\beta$ such that $H(y\mid x)=\bar H(y\mid x^\tau\beta)$ for all $y$ and $x$, where $a^\tau$ denotes the transpose of a column vector $a$.

A2. The sequence $\{(Y_i,S_i,X_i^\tau)^\tau\}_{i=1}^n$ is a random sample.

A3. For some $k\in\mathbb N^+$ (the set of positive integers), the following conditions are satisfied:
1. The $q$-vector $X$ is absolutely continuous.
2. $\xi$ is bounded.
3. $\partial_x^jA_1(dy,\cdot)$ exist and are bounded and continuous for $j=1,\dots,1+k$.
4. $\partial_x^jA_2$ exist and are bounded and continuous for $j=1,\dots,1+k$.

A4. The function $w:\mathbb R_+\times\mathbb R^q\to\mathbb R$ is a weight function satisfying the following conditions:
1. The quantity $C=\int\int w(y,x)A_2(y,x)^2\,\partial_2\bar H(dy\mid x^\tau\beta)\,dx$ is finite and nonzero.
2. $w(y,x)$ is bounded.
3. $\partial_xw$ and $\partial_x^2w$ exist and are bounded.

A5. For $k\in\mathbb N^+$, the kernel function $K:\mathbb R^q\to\mathbb R$ satisfies the following conditions:
1. $K$ is a bounded kernel with support $[-1,1]^q$ and order at least $k$; that is, $\int K(x)\,dx=1$ and $\int x^jK(x)\,dx=0$ for $j=1,2,\dots,k-1$.
2. $\partial_xK$ exists and is bounded and continuous on $\mathbb R^q$.

If the hazard function has the index form of A1, then $\partial_xH(dy\mid x)=\partial_2\bar H(dy\mid x^\tau\beta)\,\beta$, where $\partial_2$ denotes differentiation with respect to the second (index) argument. Let $W$ be another weight function. The weighted average derivative of $H$ is defined as

$$\beta^*=\int\int W(y,x)\,\partial_xH(dy\mid x)\,dx.$$

Under A1, $\beta^*$ is intimately related to $\beta$: $\beta^*=\gamma\beta$, where $\gamma=\int\int W(y,x)\,\partial_2\bar H(dy\mid x^\tau\beta)\,dx$ is a nonzero scalar. Because of this relationship, $\beta$ can be estimated (up to scale) by estimating $\beta^*$. Görgens (2004) constructs such an estimator in the spirit of the density-weighted average derivative estimator of Powell et al. (1989).
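The proportionality $\beta^*=\gamma\beta$ under A1 can be illustrated numerically. The Python sketch below is our own illustration, not part of Görgens's procedure: it takes the log-logistic index model of Example 2 in Sect. 3, whose cumulative hazard is $H(t\mid x)=\log\{1+t\exp(x^\tau\beta_0)\}$, fixes $t_0=0.5$, and averages central-difference gradients of $x\mapsto H(t_0\mid x)$ over a simulated covariate sample. This corresponds to the weight $W(y,x)=\xi(x)I(y\le t_0)$, chosen only for the illustration, as are the seed, sample size and step $h$. Whatever the value of $\gamma$, the normalized average derivative should reproduce $\beta_0$.

```python
import math
import random


def cum_hazard(t, x, beta):
    """Log-logistic single-index model: H(t | x) = log(1 + t * exp(x'beta))."""
    u = sum(xk * bk for xk, bk in zip(x, beta))
    return math.log1p(t * math.exp(u))


def average_gradient(f, xs, h=1e-5):
    """Sample average of central-difference gradients of f over the points xs."""
    q = len(xs[0])
    acc = [0.0] * q
    for x in xs:
        for k in range(q):
            up, down = list(x), list(x)
            up[k] += h
            down[k] -= h
            acc[k] += (f(up) - f(down)) / (2.0 * h)
    return [a / len(xs) for a in acc]


beta0 = (1 / math.sqrt(3), math.sqrt(2) / math.sqrt(3))  # unit-length index vector
rng = random.Random(2007)
# X1 ~ N(0, 1); X2 ~ chi-square(3), drawn as a sum of three squared standard normals
xs = [(rng.gauss(0.0, 1.0), sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))
      for _ in range(2000)]
t0 = 0.5
avg = average_gradient(lambda x: cum_hazard(t0, x, beta0), xs)
gamma = math.hypot(*avg)              # equals |gamma| because ||beta0|| = 1
direction = [a / gamma for a in avg]  # should reproduce beta0
print(direction, gamma)
```

Each gradient is a positive multiple of $\beta_0$, so the average is too; only the scalar $\gamma$ depends on the weight and on the covariate distribution.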

To estimate $\beta^*$ while avoiding random denominators in the estimator of $\partial_xH(dy\mid x)$, choose $W(y,x)=w(y,x)A_2(y,x)^2$, where $w$ is a weight function satisfying Assumption A4. It follows that

$$\beta^*=\int\int w(y,x)A_2(y,x)\,\partial_xA_1(dy,x)\,dx-\int\int w(y,x)\,\partial_xA_2(y,x)\,A_1(dy,x)\,dx.$$

Define $K_b(x)=b^{-q}K(b^{-1}x)$ and the following estimators of $A_1(y,x)$ and $A_2(y,x)$:

$$A_{1n}(y,x)=\frac1n\sum_{i=1}^nK_b(x-X_i)I(Y_i\le y)I(S_i=s),\qquad A_{2n}(y,x)=\frac1n\sum_{i=1}^nK_b(x-X_i)I(Y_i\ge y).\tag{2}$$

Then Görgens's estimator of $\beta^*$ is

$$\begin{aligned}\beta_n^*&=\int\int w(y,x)A_{2n}(y,x)\,\partial_xA_{1n}(dy,x)\,dx-\int\int w(y,x)\,\partial_xA_{2n}(y,x)\,A_{1n}(dy,x)\,dx\\&=\frac1{n^2}\sum_{i=1}^n\sum_{j=1}^n\int\partial_xK_b(x-X_i)\,K_b(x-X_j)\,w(Y_i,x)I(Y_j\ge Y_i)I(S_i=s)\,dx\\&\quad-\frac1{n^2}\sum_{i=1}^n\sum_{j=1}^n\int K_b(x-X_i)\,\partial_xK_b(x-X_j)\,w(Y_i,x)I(Y_j\ge Y_i)I(S_i=s)\,dx.\end{aligned}$$

Under appropriate conditions, $\beta_n^*$ is a uniformly consistent and asymptotically normal estimator of $\beta^*$, as stated below.

Theorem 1 [Theorem 1(iii), Görgens (2004)] Suppose Assumptions A1–A5 hold. If $nb^{2q+2}\to\infty$ and $nb^{2k}\to0$, then

$$\beta_n^*-\beta^*-\frac1n\sum_{i=1}^n\Phi(Y_i,S_i,X_i)=o_p(n^{-1/2})\quad\text{and}\quad n^{1/2}(\beta_n^*-\beta^*)\xrightarrow{D}N(0,\Sigma),$$

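where $\xrightarrow{D}$ denotes convergence in distribution, and $\Sigma$ is the covariance matrix of $\Phi(Y,S,X)$, with, for a generic observation $(y,s',x)$,

$$\begin{aligned}\Phi(y,s',x)={}&2\int_0^{+\infty}w(v,x)I(y\ge v)\,\partial_xA_1(dv,x)-2w(y,x)\,\partial_xA_2(y,x)\,I(s'=s)\\&+\int_0^{+\infty}\partial_xw(v,x)I(y\ge v)\,A_1(dv,x)-\partial_xw(y,x)\,A_2(y,x)\,I(s'=s)-2\beta^*.\end{aligned}\tag{3}$$

It follows from (3) that $E\Phi(Y,S,X)=0$ and $\Sigma=E\{\Phi(Y,S,X)\Phi(Y,S,X)^\tau\}$. To make inference about $\beta^*$, one needs to estimate $\Sigma$, the asymptotic covariance matrix of $n^{1/2}\beta_n^*$. Let $T=(Y,S,X^\tau)^\tau$, $t=(y,s',x^\tau)^\tau$ and $t_i=(y_i,s_i,x_i^\tau)^\tau$ for $1\le i\le n$. Define

$$\rho_0(t_i,t_j)=\int\partial_xK_b(x-x_i)\,K_b(x-x_j)\,w(y_i,x)I(y_j\ge y_i)I(s_i=s)\,dx-\int K_b(x-x_i)\,\partial_xK_b(x-x_j)\,w(y_i,x)I(y_j\ge y_i)I(s_i=s)\,dx$$

and

$$r_n(t)=\frac1n\sum_{j=1}^n\{\rho_0(t,T_j)+\rho_0(T_j,t)\}.\tag{4}$$

Then

$$\beta_n^*=\frac1{n^2}\sum_{i=1}^n\sum_{j=1}^n\rho_0(T_i,T_j)=\frac1{2n}\sum_{i=1}^nr_n(T_i),$$

and $\Sigma$ can be consistently estimated by

$$\Sigma_n=\frac1n\sum_{i=1}^n\{r_n(T_i)-2\beta_n^*\}\{r_n(T_i)-2\beta_n^*\}^\tau=\frac4n\sum_{i=1}^n\Big\{\frac{r_n(T_i)}2-\beta_n^*\Big\}\Big\{\frac{r_n(T_i)}2-\beta_n^*\Big\}^\tau.$$

Therefore, a large sample $(1-\alpha)$-level confidence region for the true value $\beta_0^*$ of $\beta^*$ based on the normal approximation is

$$R_{NA,\alpha}=\{\beta^*:\ n(\beta_n^*-\beta^*)^\tau\Sigma_n^{-1}(\beta_n^*-\beta^*)\le\chi_q^2(\alpha)\},$$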

where $\chi_q^2(\alpha)$ is the $(1-\alpha)$th quantile of the chi-squared distribution with $q$ degrees of freedom.

In (3), for $t=(y,s',x^\tau)^\tau$, let $\eta(t)=\eta(y,s',x)=\tfrac12\{\Phi(y,s',x)+2\beta^*\}$. Then

$$\eta(t)=\frac12\Big\{2\int_0^{+\infty}w(v,x)I(y\ge v)\,\partial_xA_1(dv,x)-2w(y,x)\,\partial_xA_2(y,x)I(s'=s)+\int_0^{+\infty}\partial_xw(v,x)I(y\ge v)\,A_1(dv,x)-\partial_xw(y,x)A_2(y,x)I(s'=s)\Big\}.$$

Since $E\eta(T_i)=\beta^*$, $i=1,\dots,n$, testing whether $\beta_0^*$ is the true value of $\beta^*$ is equivalent to testing whether $E\eta(T_i)=\beta_0^*$, $i=1,\dots,n$. To do this, we apply the empirical likelihood method of Owen (1990). Let $p=(p_1,\dots,p_n)$ be a probability vector, i.e., $\sum_{i=1}^np_i=1$ and $p_i\ge0$ for $i=1,\dots,n$. Let $F_p$ be the distribution function that assigns probability $p_i$ to the point $\eta(T_i)$, and write $\beta^*(F_p)=\sum_{i=1}^np_i\eta(T_i)$. The empirical likelihood, evaluated at the true value $\beta_0^*$, is defined by

$$L(\beta_0^*)=\sup\Big\{\prod_{i=1}^np_i:\ \sum_{i=1}^np_i=1,\ \beta^*(F_p)=\beta_0^*\Big\}.$$

Since $\beta^*(F_p)$ depends on the unknown functions $A_1(y,x)$ and $A_2(y,x)$, we replace them by their kernel estimators $A_{1n}(y,x)$ and $A_{2n}(y,x)$ given in (2). Hence, an estimated empirical likelihood, evaluated at $\beta_0^*$, is defined by

$$\hat L(\beta_0^*)=\sup\Big\{\prod_{i=1}^np_i:\ \sum_{i=1}^np_i=1,\ \beta_n^*(F_p)=\beta_0^*\Big\},$$

where $\beta_n^*(F_p)=\sum_{i=1}^np_i\eta_n(T_i)$ and $\eta_n(T_i)=r_n(T_i)/2$. For simplicity, write $W_{ni}=\eta_n(T_i)-\beta_0^*$. Then, by the method of Lagrange multipliers,

$$p_i=\frac1n\{1+\lambda^\tau W_{ni}\}^{-1},\qquad i=1,\dots,n,$$

where $\lambda=(\lambda_1,\dots,\lambda_q)^\tau$ is the solution of

$$\frac1n\sum_{i=1}^n\frac{W_{ni}}{1+\lambda^\tau W_{ni}}=0.\tag{5}$$

Note that $\prod_{i=1}^np_i$, subject to $\sum_{i=1}^np_i=1$, attains its maximum $n^{-n}$ at $p_i=n^{-1}$. We therefore define the empirical likelihood ratio at $\beta_0^*$ by

$$R(\beta_0^*)=\prod_{i=1}^n(np_i)=\prod_{i=1}^n\{1+\lambda^\tau W_{ni}\}^{-1},$$

and the corresponding empirical loglikelihood ratio statistic by

$$\mathcal L(\beta_0^*)=-2\log R(\beta_0^*)=2\sum_{i=1}^n\log\{1+\lambda^\tau W_{ni}\}.\tag{6}$$

The following theorem gives the asymptotic distribution of the adjusted empirical loglikelihood ratio $(1/4)\mathcal L(\beta_0^*)$.

Theorem 2 Let $\beta_0^*$ be the true value of $\beta^*$. Then, under the assumptions of Theorem 1,

$$\tfrac14\mathcal L(\beta_0^*)\xrightarrow{D}\chi_q^2,$$

where $\chi_q^2$ is a chi-squared random variable with $q$ degrees of freedom.

It is well known that the empirical loglikelihood ratio statistic for the mean of i.i.d. random vectors has a limiting chi-squared distribution. This is because the limiting covariance matrix of the (scaled) sample mean agrees with the limit of the sample covariance matrix. In our setting, the vectors $\eta(T_i)-\beta_0^*$ are i.i.d., but they involve the unknown functions $A_1(y,x)$ and $A_2(y,x)$, which have to be estimated; the $W_{ni}$ are used instead to define the empirical likelihood ratio. Although $\eta(T_i)-\beta_0^*$ and $W_{ni}$ are so close that both sequences have the same limiting sample covariance matrix $\tilde\Sigma=\Sigma/4$ (Lemma 2 in the Appendix), the scaled sample mean $n^{-1/2}\sum_{i=1}^nW_{ni}=n^{1/2}(\beta_n^*-\beta_0^*)$ has limiting covariance matrix $\Sigma$ (Theorem 1). This mismatch, by a factor of 4, explains why the empirical loglikelihood ratio must be adjusted.

Theorem 2 can be used to construct an empirical likelihood confidence region for $\beta_0^*$:

$$R_{EL,\alpha}=\{\beta^*\in\mathbb R^q:\ \tfrac14\mathcal L(\beta^*)\le\chi_q^2(\alpha)\},$$

since $P(\beta_0^*\in R_{EL,\alpha})=P\{\tfrac14\mathcal L(\beta_0^*)\le\chi_q^2(\alpha)\}\to1-\alpha$ as $n\to\infty$.

In some applications, we need to make inference about a particular index coefficient or about a linear combination of the components of $\beta_0^*$. This can be done with an empirical likelihood confidence interval for a linear combination $\theta=a^\tau\beta^*$, as the following theorem shows.

Theorem 3 Under the conditions of Theorem 2, for any parameter $\theta_0=a^\tau\beta_0^*$, where $a$ is a constant $q$-vector,

$$\tfrac14\mathcal L_a(\theta_0)\xrightarrow{D}\chi_1^2,$$

where $\mathcal L_a(\theta_0)$ is defined by (6) with $W_{ni}$ replaced by $W_{ni,a}=a^\tau\eta_n(T_i)-\theta_0$. A large sample $(1-\alpha)$-level confidence interval for $\theta_0$ based on this approximation is

$$R_{a,\alpha}=\{\theta_a:\ \tfrac14\mathcal L_a(\theta_a)\le\chi_1^2(\alpha)\}.$$

3 Simulation studies

We conduct several Monte Carlo simulations to compare the performance of the empirical likelihood method and the normal-approximation-based method in three censored hazard regression models. In all simulations, the weight function is simply $w(y,x)=1$. The kernel function is the product kernel $K(x_1,x_2)=K_u(x_1)K_u(x_2)$ for bivariate covariates and $K_u(x_1)$ for univariate covariates, where $K_u$ is the univariate fourth-order ($k=4$) kernel suggested by Görgens (2004):

$$K_u(v)=\frac{105}{64}(1-5v^2+7v^4-3v^6)\,I(|v|\le1).$$

The simulations are implemented in R, using the function el.test in the R package emplik written by Mai Zhou, which is adapted from Owen's S-Plus code.

Example 1 (Exponential regression). Consider the following data generating process for a single-index exponential hazard regression model with two covariates ($q=2$); it is a competing risks model with two risks, the second of which is viewed as censoring. Assume that the conditional hazard function of the survival time $T$ is

$$h(t\mid X=x)=\exp(u),$$

which is a proportional hazards model, where $u=x^\tau\beta_0$ is the single index with true parameter vector $\beta_0=(\sqrt2/2,\sqrt2/2)^\tau$, $X=(X_1,X_2)^\tau$, and $X_1$ and $X_2$ are independent with $X_1\sim N(0,1)$ and $X_2\sim\chi_3^2$. The conditional cumulative hazard function is $H(t\mid X=x)=t\exp(u)$.
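Because the hazard in Example 1 does not depend on $t$, $T$ given $X=x$ is exponential with rate $\exp(x^\tau\beta_0)$, and $H(T\mid X)$ is a unit exponential variable. A minimal Python sketch of drawing $(T,X)$ under exactly these stated distributions follows; the seed and sample size are arbitrary, and the final check uses the unit-exponential property rather than anything from the authors' R code.

```python
import math
import random

BETA0 = (math.sqrt(2) / 2, math.sqrt(2) / 2)


def draw_example1_failure(n, seed=0):
    """Draw (T_i, X_i) with X1 ~ N(0,1), X2 ~ chi-square(3) and hazard exp(x'beta0)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = (rng.gauss(0.0, 1.0), sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))
        u = x[0] * BETA0[0] + x[1] * BETA0[1]  # single index u = x'beta0
        t = rng.expovariate(math.exp(u))       # constant hazard exp(u) given X = x
        sample.append((t, x))
    return sample


def cumulative_hazard(t, x):
    """H(t | X = x) = t * exp(u) for Example 1."""
    return t * math.exp(x[0] * BETA0[0] + x[1] * BETA0[1])


sample = draw_example1_failure(50000, seed=1)
# H(T | X) is unit exponential, so its sample mean should be close to 1
mean_h = sum(cumulative_hazard(t, x) for t, x in sample) / len(sample)
print(mean_h)
```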

The censoring variable $C$ is exponential with conditional hazard function $h_C(s\mid X=x)=\exp(\nu+x^\tau\beta_C)$, where $\beta_C=(1,0)^\tau$ and $\nu$ is a constant whose three values are chosen so that the censoring proportion cp is 10, 20 and 50%, respectively. The sample size $n$ is 30, 60 and 120. The aim is to examine the effects of the degree of censoring and of the sample size. For this model, the $q$-vector of true average derivatives $\beta_0^*$ depends on the censoring proportion but is proportional to $\beta_0$. The observations are $\{(Y_i,\delta_i,X_i)\}_{i=1}^n$, where $Y_i=\min(T_i,C_i)$ and $\delta_i=I[T_i\le C_i]=I[S_i=1]$; the event $[S_i=1]$ indicates that the failure is caused by the first risk, which is of interest. The nominal confidence level $1-\alpha$ is 95%. To assess the sensitivity of the estimates and the coverage probabilities to the bandwidth, simulations are conducted for different bandwidth values. Let $\sigma=(\sqrt{\mathrm{Var}(X_1)},\sqrt{\mathrm{Var}(X_2)})^\tau=(1,\sqrt6)^\tau$ be the vector of standard deviations of $(X_1,X_2)^\tau$ and $b=c\sigma$, where $c$ is a constant factor. For a range of $b$ values, Table 1 reports the simulation results based on 1000 replicates.

Table 1 Coverage probabilities (%) of empirical likelihood (EL) and normal approximation (NA) confidence regions at different censoring proportions (cp), sample sizes (n) and bandwidths ($b=c\sigma$, where $\sigma=(\sqrt{\mathrm{Var}(X_1)},\sqrt{\mathrm{Var}(X_2)})^\tau=(1,\sqrt6)^\tau$ is the vector of standard deviations of $(X_1,X_2)^\tau$ and $c$ is a constant factor). The nominal level is 95%. The conditional survival function is exponential. [Layout: NA and EL columns at five bandwidths for each censoring level, with rows indexed by cp and n. cp = 10%: b = 0.6σ, 0.7σ, 0.8σ, 0.9σ, 1.0σ; cp = 20%: b = 0.7σ, 0.8σ, 0.9σ, 1.0σ, 1.1σ; cp = 50%: b = 1.2σ, 1.3σ, 1.4σ, 1.5σ, 1.6σ.]

Example 2 (Log-logistic regression). In this example, we assume that the survival time $T$ follows a log-logistic distribution with conditional hazard function

$$h(t\mid X=x)=\frac{\exp(\beta_0^\tau x)}{1+t\exp(\beta_0^\tau x)},$$

which is not a proportional hazards regression model, where $\beta_0=(1/\sqrt3,\sqrt2/\sqrt3)^\tau$. All other random variables and parameters are set as in Example 1, except that $\beta_C=(\sqrt2/2,\sqrt2/2)^\tau$ and that the values of $\nu$ are chosen so that the censoring proportion cp is again 10, 20 and 50%, respectively. Table 2 summarizes the results. The appropriate bandwidth varies less with the censoring proportion in this example than in Example 1.

Table 2 Coverage probabilities (%) of empirical likelihood (EL) and normal approximation (NA) confidence regions at different censoring proportions (cp), sample sizes (n) and bandwidths ($b=c\sigma$, where $\sigma=(1,\sqrt6)^\tau$ is the vector of standard deviations of $(X_1,X_2)^\tau$ and $c$ is a constant factor). The nominal level is 95%. The conditional survival function is log-logistic. [Layout: NA and EL columns at b = 1.0σ, 1.1σ, 1.2σ, 1.3σ, 1.4σ for each of cp = 10%, 20% and 50%, with rows indexed by cp and n.]

Example 3 (Exponential regression with a univariate covariate). As suggested by a referee, it is of interest to compare the average lengths of the confidence intervals produced by the two methods. To do so, we use a model analogous to that of Example 1, except that the covariate is univariate ($q=1$). Namely, we assume that $h(t\mid X=x)=\exp(u)$, where $u=(x-3)\beta_0$ with $\beta_0=1$ and $X\sim\chi_3^2$, and $h_C(s\mid X=x)=\exp(\nu+x\beta_C)$, where $\beta_C=1$ and $\nu$ is a constant controlling the censoring proportion. We set the censoring proportion at cp = 50% ($\nu=-3$) and the sample size at $n=120$. At the 95% nominal confidence level, 1000 replicates were simulated for a range of $b$ values.
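In Example 3 both covariate slopes equal 1, so the ratio of the failure and censoring hazards, $\exp\{(x-3)-\nu-x\}=\exp(-3-\nu)$, does not depend on $x$, and the censoring proportion has the closed form $P(C<T)=1/\{1+\exp(-3-\nu)\}$; under the model as stated, cp = 50% requires $\nu=-3$. The Python sketch below (function names hypothetical; sample size and seed arbitrary) draws data from this design and checks the empirical censoring proportion against the closed form.

```python
import math
import random


def simulate_example3(n, nu, seed=0):
    """Draw (Y_i, delta_i, X_i) for Example 3 with beta0 = beta_C = 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))  # X ~ chi-square(3)
        t = rng.expovariate(math.exp(x - 3.0))                # hazard exp((x - 3) * beta0)
        c = rng.expovariate(math.exp(nu + x))                 # hazard exp(nu + x * beta_C)
        data.append((min(t, c), int(t <= c), x))              # (Y, delta = I[T <= C], X)
    return data


def censoring_proportion(nu):
    """P(C < T) = 1 / (1 + exp(-3 - nu)); the hazard ratio is free of x here."""
    return 1.0 / (1.0 + math.exp(-3.0 - nu))


def nu_for(cp):
    """Invert censoring_proportion: the nu that gives a target censoring proportion."""
    return math.log(cp / (1.0 - cp)) - 3.0


data = simulate_example3(20000, nu_for(0.5), seed=1)
cp_hat = 1.0 - sum(d for _, d, _ in data) / len(data)
print(nu_for(0.5), cp_hat)
```

The same inversion gives the $\nu$ for any other target proportion in this design; for Examples 1 and 2 the hazard ratio depends on $x$, so $\nu$ would have to be found numerically instead.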
Table 3 presents the results, from which we make the following observations: the coverage probability of both methods decreases as the bandwidth increases and crosses the nominal level at what may be regarded as the optimal bandwidth; the average length of the confidence intervals of both methods follows a similar pattern; and the empirical likelihood method gives more accurate coverage probabilities than the normal-approximation-based method, with slightly wider confidence intervals. Based on these results, we conducted an additional simulation study in which the two methods are compared at the same coverage probability; Table 4 provides the results. The empirical likelihood method has a slightly shorter average length than the normal-approximation-based method when the two are compared at the same coverage probability, and the difference becomes more pronounced as the bandwidth moves below the optimal bandwidth.

Table 3 Coverage probabilities (%) and average lengths (in parentheses) of empirical likelihood (EL) and normal approximation (NA) confidence intervals at different bandwidths ($b=c\sigma$, where $\sigma=\sqrt{\mathrm{Var}(X)}=\sqrt6$ is the standard deviation of $X$ and $c$ is a constant factor). The nominal level is 95%, the censoring proportion is cp = 50% and the sample size is $n=120$. [Layout: NA and EL columns at b = 0.7σ, 0.8σ, 0.9σ, 1.0σ, 1.1σ, 1.2σ, 1.3σ, 1.4σ.]

Table 4 Average lengths (in parentheses) of empirical likelihood (EL) and normal approximation (NA) confidence intervals compared at the same coverage probability (%), given in the first row. The censoring proportion is cp = 50% and the sample size is $n=120$. [Layout: NA and EL columns at b = 0.7σ, 1.0σ and 1.3σ.]

From the simulation results in Tables 1, 2, 3 and 4, we draw the following conclusions. The coverage probability varies with the bandwidth for both methods, but the empirical likelihood method outperforms the normal-approximation-based method in terms of coverage probability, average length of confidence intervals and sensitivity to the bandwidth. In particular, the former is less sensitive to changes in the bandwidth, because it tends to give higher coverage than the latter at the same bandwidth. At the same coverage probability, the former also tends to give slightly narrower confidence intervals than the latter.
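The interval computations compared above reduce, in the scalar case of Theorem 3 and Example 3, to solving (5) for a single $\lambda$ and evaluating (6). The paper used el.test from the R package emplik; the following is an independent Python sketch of the same kind of computation (Owen's empirical likelihood for a mean), not a port of that function, followed by the adjusted interval $R_{a,\alpha}$ with the factor 1/4. The input `eta` stands for the pseudo-values $a^\tau\eta_n(T_i)$, assumed to be already computed; the function names and the placeholder data are ours.

```python
import math

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of the chi-square distribution with 1 df


def el_log_ratio(w, iters=200):
    """2 * sum(log(1 + lam * w_i)) with lam solving sum(w_i / (1 + lam * w_i)) = 0.

    Scalar version of (5)-(6). The estimating function is strictly decreasing in lam
    on the interval where every 1 + lam * w_i > 0, so bisection on that interval suffices.
    """
    lo_w, hi_w = min(w), max(w)
    if lo_w >= 0.0 or hi_w <= 0.0:
        return math.inf                # 0 lies outside the convex hull of the w_i
    a, b = -1.0 / hi_w, -1.0 / lo_w   # open interval of admissible lam
    lam = 0.0
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if not a < mid < b:            # bracket has shrunk to adjacent floats
            break
        lam = mid
        if sum(wi / (1.0 + lam * wi) for wi in w) > 0.0:
            a = lam
        else:
            b = lam
    return 2.0 * sum(math.log1p(lam * wi) for wi in w)


def adjusted_el_interval(eta, crit=CHI2_1_95, iters=100):
    """Endpoints of {theta : el_log_ratio(eta - theta) / 4 <= crit}, cf. R_{a,alpha}."""
    center = sum(eta) / len(eta)      # the statistic vanishes at the sample mean

    def inside(theta):
        return 0.25 * el_log_ratio([e - theta for e in eta]) <= crit

    ends = []
    for outer in (min(eta), max(eta)):  # the data extremes always lie outside
        good, bad = center, outer
        for _ in range(iters):
            mid = 0.5 * (good + bad)
            if inside(mid):
                good = mid
            else:
                bad = mid
        ends.append(good)
    return ends[0], ends[1]


eta = [math.sin(i) for i in range(1, 41)]  # placeholder pseudo-values, illustration only
print(adjusted_el_interval(eta))
```

Bisection on each side is valid because empirical likelihood regions for a mean are convex, so the region is an interval containing the sample mean; dropping the factor 0.25 in `inside` gives the unadjusted interval, which by Theorem 3 would undercover.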
In summary, the empirical likelihood method appears to be superior to the normal-approximation-based method for inference about the average derivatives of hazard functions. In all cases, when the bandwidth is appropriately chosen, the coverage probability agrees with the nominal level. However, the coverage accuracy of both methods is very sensitive to the bandwidth, and the bandwidth at which the best coverage occurs depends on the censoring proportion and the sample size. In our simulations, suitable values of $b$ lie between $0.7\sigma$ and $1.4\sigma$, where $\sigma$ is the vector of standard deviations of the covariates. Hence, we can select the bandwidth as $b=c\sigma$, where $c$ is a constant depending on the sample size, the censoring proportion and the underlying distributions. For example, an empirical formula for $c$ of order $n^{-1/7}$, with a factor depending on cp, can be fitted for the model in Example 1, and the resulting bandwidth satisfies the conditions of Theorem 1 with $q=2$ and $k=4$. Although no bandwidth selection criterion has yet been developed for the estimation of censored average derivatives, our Monte Carlo studies suggest that reasonable small sample performance is obtained by setting $b$ between 0.7 and 1.4 standard deviations of the covariates. A formal criterion would clearly be desirable for applying either method, and it requires a full investigation.

4 Concluding remarks

We have proposed a semiparametric method, based on empirical likelihood, for inference about the average derivatives of hazard functions. The method applies to general hazard regression models and makes no assumption on the model structure beyond a weak index assumption; nor does it need any assumptions beyond those required for the normal-approximation-based method. In our simulations, its finite sample behaviour is comparable with that of the latter in general, and it is less sensitive to changes in the bandwidth. Moreover, the confidence region based on empirical likelihood has no predetermined symmetry and does not require the calculation of any covariance matrix. The factor 1/4 in the adjusted empirical loglikelihood ratio can be viewed as the cost of estimating the unknown functions in constructing the empirical likelihood. Therefore, our method is a useful and competitive alternative to the normal-approximation-based method.
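The bandwidth discussion at the end of Sect. 3 relies on the kernel order $k$ and on the rate conditions of Theorem 1. Both can be checked mechanically. In the Python sketch below (helper names are ours), the first part integrates $v^jK_u(v)$ by Simpson's rule to confirm that $K_u$ has order 4 (its fourth moment is $-1/33$); the second part works out, for a bandwidth shrinking like $b\propto n^{-r}$, the window of exponents $r$ that satisfies $nb^{2q+2}\to\infty$ and $nb^{2k}\to0$. The window is nonempty only when $k>q+1$; for $q=2$ and $k=4$ it is $(1/8,1/6)$, which contains $r=1/7$.

```python
from fractions import Fraction


def K_u(v):
    """Fourth-order kernel (105/64)(1 - 5v^2 + 7v^4 - 3v^6) I(|v| <= 1) from Sect. 3."""
    return 105.0 / 64.0 * (1 - 5 * v**2 + 7 * v**4 - 3 * v**6) if abs(v) <= 1.0 else 0.0


def moment(j, m=4000):
    """Composite Simpson approximation of int_{-1}^{1} v^j K_u(v) dv (m must be even)."""
    h = 2.0 / m
    s = 0.0
    for i in range(m + 1):
        v = -1.0 + i * h
        coef = 1 if i in (0, m) else (4 if i % 2 else 2)
        s += coef * v**j * K_u(v)
    return s * h / 3.0


def admissible_rates(q, k):
    """Exponent window for b proportional to n**(-r) under Theorem 1's rate conditions.

    n * b**m behaves like n**(1 - m * r), so n*b**(2q+2) -> inf needs r < 1/(2q+2)
    and n*b**(2k) -> 0 needs r > 1/(2k); the window is empty unless k > q + 1.
    """
    lower, upper = Fraction(1, 2 * k), Fraction(1, 2 * q + 2)
    return (lower, upper) if lower < upper else None


moments = [moment(j) for j in range(5)]  # 1, 0, 0, 0, -1/33 up to quadrature error
window = admissible_rates(q=2, k=4)      # (Fraction(1, 8), Fraction(1, 6))
print(moments, window)
```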
Appendix: Proof of Theorem 2

To prove Theorem 2, we need the following lemmas.

Lemma 1 Under the assumptions of Theorem 2,
(a) $\max_{1\le i\le n}\|W_{ni}\|=o_p(n^{1/2})$;
(b) $\|\lambda\|=O_p(n^{-1/2})$.

Lemma 2 Under the assumptions of Theorem 2,
(a) $\Sigma_n\xrightarrow{P}\Sigma$, where $\xrightarrow{P}$ denotes convergence in probability.

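(b) Set $\tilde\Sigma_n=\frac1n\sum_{i=1}^n(\eta_n(T_i)-\beta_n^*)(\eta_n(T_i)-\beta_n^*)^\tau$ and $\tilde\Sigma=E[\{\eta(Y,S,X)-\beta_0^*\}\{\eta(Y,S,X)-\beta_0^*\}^\tau]$. Then $\tilde\Sigma=\frac14\Sigma$ and $\tilde\Sigma_n\xrightarrow{P}\tilde\Sigma$.
(c) $\frac1n\sum_{i=1}^nW_{ni}W_{ni}^\tau\xrightarrow{P}\tilde\Sigma$.

Proof of Lemma 1 Part (b) can be proved by arguments similar to those used by Owen (1990) to show that $\|\lambda\|=O_p(n^{-1/2})$. Next, we prove (a). Set $d_1(t)=E\{\rho_0(t,T)\}$ and $d_2(t)=E\{\rho_0(T,t)\}$. It is easily seen that $\max_{1\le i\le n}\|\rho_0(T_i,T_i)\|=o_p(n^{1/2})$ and $\max_{1\le i\le n}\|d_j(T_i)-\beta_0^*\|=o_p(n^{1/2})$ for $j=1,2$. Since, for each $1\le i\le n$,

$$\begin{aligned}W_{ni}&=\eta_n(T_i)-\beta_0^*=\frac1{2n}\sum_{j=1}^n\{\rho_0(T_i,T_j)+\rho_0(T_j,T_i)\}-\beta_0^*\\&=\frac1n\{\rho_0(T_i,T_i)-\beta_0^*\}+\frac{n-1}{2n}\cdot\frac1{n-1}\sum_{j\ne i}\big[\{\rho_0(T_i,T_j)-\beta_0^*\}+\{\rho_0(T_j,T_i)-\beta_0^*\}\big]\end{aligned}$$

and

$$\begin{aligned}\frac1{n-1}\sum_{j\ne i}\{\rho_0(T_i,T_j)-\beta_0^*\}&=\frac1{n-1}\sum_{j\ne i}\{\rho_0(T_i,T_j)-d_1(T_i)\}+d_1(T_i)-\beta_0^*,\\\frac1{n-1}\sum_{j\ne i}\{\rho_0(T_j,T_i)-\beta_0^*\}&=\frac1{n-1}\sum_{j\ne i}\{\rho_0(T_j,T_i)-d_2(T_i)\}+d_2(T_i)-\beta_0^*,\end{aligned}$$

it suffices to show

$$\max_{1\le i\le n}\Big\|\frac1{n-1}\sum_{j\ne i}\rho_0(T_i,T_j)-d_1(T_i)\Big\|=o_p(n^{1/2})\tag{7}$$

and

$$\max_{1\le i\le n}\Big\|\frac1{n-1}\sum_{j\ne i}\rho_0(T_j,T_i)-d_2(T_i)\Big\|=o_p(n^{1/2}).\tag{8}$$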

We prove (7) only; (8) can be proved by similar arguments. Set

$$\rho_{01}(T_i,T_j)=\int\partial_xK_b(x-X_i)\,K_b(x-X_j)\,w(Y_i,x)I(Y_j\ge Y_i)I(S_i=s)\,dx,$$

$$\rho_{02}(T_i,T_j)=\int K_b(x-X_i)\,\partial_xK_b(x-X_j)\,w(Y_i,x)I(Y_j\ge Y_i)I(S_i=s)\,dx.$$

Then $\rho_0(T_i,T_j)=\rho_{01}(T_i,T_j)-\rho_{02}(T_i,T_j)$. Therefore, to prove (7) it suffices to show

$$\max_{1\le i\le n}\Big\|\frac1{n-1}\sum_{j\ne i}\rho_{01}(T_i,T_j)-d_{01}(T_i)\Big\|=o_p(n^{1/2})\tag{9}$$

and

$$\max_{1\le i\le n}\Big\|\frac1{n-1}\sum_{j\ne i}\rho_{02}(T_i,T_j)-d_{02}(T_i)\Big\|=o_p(n^{1/2}),$$

where $d_{0j}(t)=E\rho_{0j}(t,T)$ for $j=1,2$. We prove the first equation only, since the second can be proved in the same way. For any $t_i=(y_i,s_i,x_i^\tau)^\tau$, let $\rho_n(t_i)=(n-1)^{-1}\sum_{j\ne i}\rho_{01}(t_i,T_j)-d_{01}(t_i)$. Note that $\rho_n(t_i)$ is a $q$-vector; without loss of generality, we consider its first element, say $\rho_{n1}(t_i)$. Let $P_n(t_i)=P(|\rho_{n1}(t_i)|>C)$ for some constant $C>0$, let $\sigma_n^2=\mathrm{Var}\{\rho_{n1}(t_i)\}$, and let $K^{(1)}(x)$ be the first element of $\partial_xK(x)$. Then

$$\begin{aligned}\sigma_n^2&=\frac{n-1}{(n-1)^2b^{2(2q+1)}}\,\mathrm{Var}\Big\{\int K^{(1)}\Big(\frac{x-x_i}b\Big)K\Big(\frac{x-X_j}b\Big)w(y_i,x)I(Y_j\ge y_i)I(s_i=s)\,dx\Big\}\\&\le\frac1{(n-1)b^{2(2q+1)}}\,E\Big\{\int K^{(1)}\Big(\frac{x-x_i}b\Big)K\Big(\frac{x-X_j}b\Big)w(y_i,x)I(Y_j\ge y_i)I(s_i=s)\,dx\Big\}^2.\end{aligned}$$

By the Cauchy–Schwarz inequality and a change of variables,

$$\begin{aligned}&E\Big\{\int K^{(1)}\Big(\frac{x-x_i}b\Big)K\Big(\frac{x-X_j}b\Big)w(y_i,x)I(Y_j\ge y_i)I(s_i=s)\,dx\Big\}^2\\&\le E\Big[\int\Big\{K^{(1)}\Big(\frac{x-x_i}b\Big)K\Big(\frac{x-X_j}b\Big)w(y_i,x)I(Y_j\ge y_i)I(s_i=s)\Big\}^2dx\int I\Big(\Big\|\frac{x-X_j}b\Big\|_\infty\le1\Big)dx\Big]\\&\le2^qb^q\,E\int\Big\{K^{(1)}\Big(\frac{x-x_i}b\Big)K\Big(\frac{x-X_j}b\Big)w(y_i,x)I(Y_j\ge y_i)I(s_i=s)\Big\}^2dx\\&\le2^qb^{3q}\Big\{w^2(y_i,x_i)A_2(y_i,x_i)\int\int\{K^{(1)}(u)\}^2K^2(v)\,du\,dv+O(b)\Big\}.\end{aligned}$$

Therefore,

$$\sigma_n^2\le\frac{2^q}{(n-1)b^{q+2}}\Big\{w^2(y_i,x_i)A_2(y_i,x_i)\int\int\{K^{(1)}(u)\}^2K^2(v)\,du\,dv+O(b)\Big\}.$$

Let $K_{\max}^{(1)}=\max_x|K^{(1)}(x)|$, $K_{\max}=\max_x|K(x)|$, $w_{\max}=\max_{y,x}|w(y,x)|$ and $K_n=w_{\max}K_{\max}^{(1)}K_{\max}/\{(n-1)b^{2q+1}\}$. Applying Bernstein's inequality (Serfling 1980, p. 95), we obtain

$$\begin{aligned}P_n(t_i)&\le2\exp\Big[-\frac{C^2}{2\sigma_n^2+(2/3)K_nC}\Big]\le2\exp\Big[-\frac{C(n-1)b^{2q+1}}{2C_1b^{q-1}/C+(2/3)w_{\max}K_{\max}^{(1)}K_{\max}}\Big]\\&\le2\exp\{-C_2C(n-1)b^{2q+1}\},\end{aligned}$$

where $C_1$ and $C_2$ are constants independent of $t_i$. For any $\varepsilon>0$, take $C=\varepsilon n^{1/2}$. Since $nb^{2q+2}\to\infty$, for any $M>0$ we have $(n-1)^{3/2}b^{2q+1}>M\log n$ for all sufficiently large $n$; choose $M$ such that $C_2\varepsilon M>1$. Then we get

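$$\begin{aligned}P\Big[\max_{1\le i\le n}|\rho_{n1}(T_i)|>\varepsilon n^{1/2}\Big]&\le\sum_{i=1}^nP\big[|\rho_{n1}(T_i)|>\varepsilon n^{1/2}\big]=\sum_{i=1}^nE\{P_n(T_i)\}\\&\le2n\exp\{-C_2\varepsilon n^{1/2}(n-1)b^{2q+1}\}\le2n\exp\{-C_2\varepsilon(n-1)^{3/2}b^{2q+1}\}\\&\le2n\exp\{-(C_2\varepsilon M)\log n\}=o(1),\end{aligned}$$

which implies (9). The proof of Lemma 1 is complete.

Proof of Lemma 2 Lemma 2(a) is a direct consequence of Theorem 1 in Görgens (2004). Noting that $\tilde\Sigma_n=\frac14\Sigma_n$, we see that Lemma 2(a) implies Lemma 2(b). Set $S_n=\frac1n\sum_{i=1}^nW_{ni}W_{ni}^\tau$. Since Lemma 2(b) holds, to prove Lemma 2(c) we only need to show that $S_n=\tilde\Sigma_n+o_p(1)$. For any $a\in\mathbb R^q$,

$$\begin{aligned}a^\tau(S_n-\tilde\Sigma_n)a&=\frac1n\sum_{i=1}^n\big[2\{a^\tau\eta_n(T_i)\}\{a^\tau(\beta_n^*-\beta_0^*)\}+(a^\tau\beta_0^*)^2-(a^\tau\beta_n^*)^2\big]\\&=\frac1n\sum_{i=1}^n2a^\tau\{\eta_n(T_i)-\eta(T_i)\}\,a^\tau(\beta_n^*-\beta_0^*)+\frac1n\sum_{i=1}^n2a^\tau\eta(T_i)\,a^\tau(\beta_n^*-\beta_0^*)\\&\quad-\{a^\tau(\beta_n^*-\beta_0^*)\}^2-2(a^\tau\beta_0^*)\{a^\tau(\beta_n^*-\beta_0^*)\}\\&=R_{n1}+R_{n2}+R_{n3}+R_{n4}.\end{aligned}$$

Note that

$$\sup_i\|\eta_n(T_i)-\eta(T_i)\|\le\sup_i\|\eta_n(T_i)-E\{\eta_n(T_i)\mid T_i\}\|+\sup_i\|E\{\eta_n(T_i)\mid T_i\}-\eta(T_i)\|=o_p(n^{1/2})+o(b)=o_p(n^{1/2}),$$

as shown in the proof of (7). By the assumptions of Theorem 1, we obtain

$$|R_{n1}|\le2\|a\|^2\sup_i\|\eta_n(T_i)-\eta(T_i)\|\,\|\beta_n^*-\beta_0^*\|=o_p(1),\qquad|R_{n2}|\le2\|a\|^2\|\beta_n^*-\beta_0^*\|\,\Big\|\frac1n\sum_{i=1}^n\eta(T_i)\Big\|=O_p(n^{-1/2}),$$

$$|R_{n3}|\le\|a\|^2\|\beta_n^*-\beta_0^*\|^2=O_p(n^{-1}),\qquad|R_{n4}|\le2|a^\tau\beta_0^*|\,\|a\|\,\|\beta_n^*-\beta_0^*\|=O_p(n^{-1/2}).$$

Therefore, $a^\tau(S_n-\tilde\Sigma_n)a=o_p(1)$. Hence $S_n=\tilde\Sigma_n+o_p(1)$, and Lemma 2(c) is proved. This completes the proof of Lemma 2.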
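The factor 4 that separates Lemma 2(b) from Theorem 1 can be reproduced in a toy problem that is not part of the paper. For $\theta=\mu^2$ estimated by the V-statistic $n^{-2}\sum_i\sum_jX_iX_j=\bar X^2$, the analogue of $\eta_n(T_i)$ is $X_i\bar X$: its sample variance estimates $\mu^2\sigma^2$, whereas $n\,\mathrm{Var}(\bar X^2)\to4\mu^2\sigma^2$. The Python sketch below uses the quadratic form $n\bar W^2/(n^{-1}\sum_iW_i^2)$, to which the proof of Theorem 2 reduces (6), in place of the exact statistic; normal data with $\mu=\sigma=1$, $n=200$, 2000 replications and the seed are arbitrary choices made only for illustration.

```python
import random

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square(1)


def quadratic_el(w):
    """n * mean(w)**2 / mean(w**2): the quadratic approximation to (6), scalar case."""
    n = len(w)
    mean = sum(w) / n
    second = sum(wi * wi for wi in w) / n
    return n * mean * mean / second


def toy_statistic(rng, n, mu, sigma):
    """Toy analogue of L(beta_0*): theta = mu**2 estimated by the V-statistic xbar**2."""
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    # analogue of W_ni = eta_n(T_i) - beta_0*: pseudo-value X_i * xbar minus the true theta
    w = [xi * xbar - mu * mu for xi in x]
    return quadratic_el(w)


rng = random.Random(12345)
stats = [toy_statistic(rng, n=200, mu=1.0, sigma=1.0) for _ in range(2000)]
mean_stat = sum(stats) / len(stats)                                # expected near 4, not 1
cover_raw = sum(s <= CHI2_1_95 for s in stats) / len(stats)        # expected well below 0.95
cover_adj = sum(s / 4.0 <= CHI2_1_95 for s in stats) / len(stats)  # expected near 0.95
print(round(mean_stat, 2), round(cover_raw, 3), round(cover_adj, 3))
```

Without the adjustment, the nominal 95% region behaves like $P(4\chi_1^2\le3.84)\approx0.67$; dividing by 4 restores the nominal level, mirroring Theorem 2.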

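Proof of Theorem 2 A Taylor expansion of $\mathcal L(\beta_0^*)$ in (6) with respect to $\lambda^\tau W_{ni}$ gives

$$\mathcal L(\beta_0^*)=2\sum_{i=1}^n\{\lambda^\tau W_{ni}-\tfrac12(\lambda^\tau W_{ni})^2\}+R_n,\tag{10}$$

where, in light of Lemmas 1 and 2(c), $R_n$ satisfies, for some constant $C>0$,

$$|R_n|\le C\sum_{i=1}^n|\lambda^\tau W_{ni}|^3\le C\|\lambda\|^3\max_{1\le i\le n}\|W_{ni}\|\sum_{i=1}^n\|W_{ni}\|^2=o_p(1).$$

Using Lemmas 1 and 2(c) and similar arguments, we obtain

$$\sum_{i=1}^n\frac{(\lambda^\tau W_{ni})^3}{1+\lambda^\tau W_{ni}}=o_p(1).\tag{11}$$

From (5) we have

$$0=\sum_{i=1}^n\frac{\lambda^\tau W_{ni}}{1+\lambda^\tau W_{ni}}=\sum_{i=1}^n\lambda^\tau W_{ni}-\sum_{i=1}^n(\lambda^\tau W_{ni})^2+\sum_{i=1}^n\frac{(\lambda^\tau W_{ni})^3}{1+\lambda^\tau W_{ni}},$$

which together with (11) yields

$$\sum_{i=1}^n\lambda^\tau W_{ni}=\sum_{i=1}^n(\lambda^\tau W_{ni})^2+o_p(1).\tag{12}$$

Again by (5),

$$0=\frac1n\sum_{i=1}^n\frac{W_{ni}}{1+\lambda^\tau W_{ni}}=\frac1n\sum_{i=1}^nW_{ni}\Big[1-\lambda^\tau W_{ni}+\frac{(\lambda^\tau W_{ni})^2}{1+\lambda^\tau W_{ni}}\Big]=\frac1n\sum_{i=1}^nW_{ni}-\Big(\frac1n\sum_{i=1}^nW_{ni}W_{ni}^\tau\Big)\lambda+\frac1n\sum_{i=1}^n\frac{W_{ni}(\lambda^\tau W_{ni})^2}{1+\lambda^\tau W_{ni}}.$$

Since

$$\Big\|\frac1n\sum_{i=1}^n\frac{W_{ni}(\lambda^\tau W_{ni})^2}{1+\lambda^\tau W_{ni}}\Big\|\le C\|\lambda\|^2\max_{1\le i\le n}\|W_{ni}\|\,\frac1n\sum_{i=1}^n\|W_{ni}\|^2=o_p(n^{-1/2})$$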

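by Lemmas 1 and 2(c), we conclude that

$$\lambda=\Big(\sum_{i=1}^nW_{ni}W_{ni}^\tau\Big)^{-1}\sum_{i=1}^nW_{ni}+\Big(\frac1n\sum_{i=1}^nW_{ni}W_{ni}^\tau\Big)^{-1}\frac1n\sum_{i=1}^n\frac{W_{ni}(\lambda^\tau W_{ni})^2}{1+\lambda^\tau W_{ni}}=\Big(\sum_{i=1}^nW_{ni}W_{ni}^\tau\Big)^{-1}\sum_{i=1}^nW_{ni}+o_p(n^{-1/2}).\tag{13}$$

Consequently, by (10), (12) and (13),

$$\begin{aligned}\mathcal L(\beta_0^*)&=\sum_{i=1}^n(\lambda^\tau W_{ni})^2+o_p(1)=\lambda^\tau\Big(\sum_{i=1}^nW_{ni}W_{ni}^\tau\Big)\lambda+o_p(1)\\&=\Big(n^{-1/2}\sum_{i=1}^nW_{ni}\Big)^\tau\Big(\frac1n\sum_{i=1}^nW_{ni}W_{ni}^\tau\Big)^{-1}\Big(n^{-1/2}\sum_{i=1}^nW_{ni}\Big)+o_p(1)\\&=\Big(n^{-1/2}\sum_{i=1}^nW_{ni}\Big)^\tau\tilde\Sigma^{-1}\Big(n^{-1/2}\sum_{i=1}^nW_{ni}\Big)+o_p(1)\\&=\Big(\Sigma^{-1/2}n^{-1/2}\sum_{i=1}^nW_{ni}\Big)^\tau\big(\Sigma^{1/2}\tilde\Sigma^{-1}\Sigma^{1/2}\big)\Big(\Sigma^{-1/2}n^{-1/2}\sum_{i=1}^nW_{ni}\Big)+o_p(1).\end{aligned}$$

Since $\tilde\Sigma=\frac14\Sigma$, we have $\Sigma^{1/2}\tilde\Sigma^{-1}\Sigma^{1/2}=4I_q$, where $I_q$ denotes the $q\times q$ identity matrix. Hence

$$\mathcal L(\beta_0^*)=4\Big(\Sigma^{-1/2}n^{-1/2}\sum_{i=1}^nW_{ni}\Big)^\tau\Big(\Sigma^{-1/2}n^{-1/2}\sum_{i=1}^nW_{ni}\Big)+o_p(1).$$

By Theorem 1, $\Sigma^{-1/2}n^{-1/2}\sum_{i=1}^nW_{ni}=\Sigma^{-1/2}n^{1/2}(\beta_n^*-\beta_0^*)\xrightarrow{D}N(0,I_q)$. Therefore, Theorem 2 is proved.

Acknowledgments The authors thank the editor and an anonymous referee for their valuable and constructive comments. Lu's research was partly supported by the NSERC Discovery Grant of Canada, and Qi's research was supported by NSF grant DMS of the USA.

References

Aalen OO (1980) A model for nonparametric regression analysis of counting processes. In: Klonecki N, Kosek A, Rosiński J (eds) Lecture notes in statistics, vol 2. Springer, New York, pp 1–25
Cox DR (1972) Regression models and life-tables (with discussion). J R Statist Soc Ser B 34:
Dabrowska DM (1987) Non-parametric regression with censored survival time data. Scand J Statist 14:181–192
Fan J, Gijbels I, King M (1997) Local likelihood and local partial likelihood in hazard regression. Ann Statist 25:1661–1690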

Geenens G, Delecroix M (2005) A survey about single-index models theory. Discussion paper #0508, Institut de Statistique, Université Catholique de Louvain, Belgium. Available from
Görgens T (2004) Average derivatives for hazard functions. Econom Theory 20:
Hall P, La Scala B (1990) Methodology and algorithms of empirical likelihood. Int Statist Rev 58:109–127
Härdle W, Stoker TM (1989) Investigating smooth multiple regression by the method of average derivative. J Am Statist Assoc 84:
Hastie T, Tibshirani R (1990) Generalized additive models. Chapman and Hall, London
Horowitz JL, Härdle W (1996) Direct semiparametric estimation of single-index models with discrete covariates. J Am Statist Assoc 91:
Huang J (1999) Efficient estimation of the partly linear additive Cox model. Ann Statist 27:
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econom 58:71–120
Lu X, Burke MD (2005) Censored multiple regression by the method of average derivatives. J Multivariate Anal 95:
Lu X, Chen G, Song X-K, Singh RS (2006) A class of partially linear single-index survival models. Can J Statist 34:97–112
Nielsen JP, Linton OB, Bickel PJ (1998) On a semiparametric survival model with flexible covariate effect. Ann Statist 26:215–241
Owen A (1988) Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75:
Owen A (1990) Empirical likelihood ratio confidence regions. Ann Statist 18:90–120
Powell JL, Stock JH, Stoker TM (1989) Semiparametric estimation of index coefficients. Econometrica 57:
Qin J, Lawless JF (1994) Empirical likelihood and general estimating equations. Ann Statist 22:
Sasieni P (1992) Non-orthogonal projections and their application to calculating the information in a partly linear Cox model. Scand J Statist 19:
Serfling RJ (1980) Approximation theorems of mathematical statistics. Wiley, New York
Thomas DR, Grunkemeier GL (1975) Confidence interval estimation of survival probabilities for censored data. J Am Statist Assoc 70:866–871
