arxiv: v1 [stat.me] 26 May 2017

Size: px

Start display at page:

Download "arxiv: v1 [stat.me] 26 May 2017"

Noah O’Neal’
5 years ago
Views:

1 Nearly Semiparametric Efficient Estimation of Quantile Regression Kani CHEN, Yuanyuan LIN, Zhanfeng WANG and Zhiliang YING arxiv: v [stat.me] 26 May 207 ABSTRACT: As a competitive alternative to least squares regression, quantile regression is popular in analyzing heterogenous data. For quantile regression model specified for one single quantile level τ, major difficulties of semiparametric efficient estimation are the unavailability of a parametric efficient score and the conditional density estimation. In this paper, with the help of the least favorable submodel technique, we first derive the semiparametric efficient scores for linear quantile regression models that are assumed for a single quantile level, multiple quantile levels and all the quantile levels in (0, ) respectively. Our main discovery is a one-step (nearly) semiparametric efficient estimation for the regression coefficients of the quantile regression models assumed for multiple quantile levels, which has several advantages: it could be regarded as an optimal way to pool information across multiple/other quantiles for efficiency gain; it is computationally feasible and easy to implement, as the initial estimator is easily available; due to the nature of quantile regression models under investigation, the conditional density estimation is straightforward by plugging in an initial estimator. The resulting estimator is proved to achieve the corresponding semiparametric efficiency lower bound under regularity conditions. Numerical studies including simulations and an example of birth weight of children confirms that the proposed estimator leads to higher efficiency compared with the Koenker-Bassett quantile regression estimator for all quantiles of interest. KEY WORDS: Quantile regression; Semiparametric efficient score; Least favorable submodel; One-step estimation;. INTRODUCTION Quantile regression is a statistical methodology for the modeling and inference of conditional quantile functions. Following Koenker and Bassett (978), we model the τth conditional quantile function of Y R given X R p as Q Y X (τ) = X β τ, ()

2 for certain specific τ (0,), and β τ is p-vector usually including an intercept. Let (x i,y i ), i =,2,...,n, be independent and identically distributed copies of (X,Y). For the τth quantile, the classical Koenker-Bassett estimate of β τ, denoted as ˆβ τ, c is obtained by minimizing the following objective function n ρ τ (y i x i β τ), (2) i= over β τ, where ρ τ (u) = u(τ I(u < 0)). The computation of ˆβ τ c is straightforward with the help of linear programming. There is vast literature on the estimation and inference for one or several percentile levels for model (); see Yu and Jones (998), He (997), Koenker and Geling (200), Koenker and Xiao (2002), He and Zhu (2003), Koenker (2005), Peng and Huang (2008), Peng and Fine (2009), Bondell, Reich and Wang (200), Wang, Wu and Li (202), Jiang, Wang and Bondell (203), He, Wang and Hong (203), Kato (20, 202), Zheng, Peng and He (205), among many others. When there are commonality of quantile coefficients across multiple quantiles, the composite quantile regression (CQR) is proposed to combine information shared across a number of quantiles to improve estimation efficiency; see Zou and Yuan (2008), Wang and Wang (2009), Kai et al. (200), Wang, Li and He (202), Wang and Li (203). But the novelty of CQR lies in the key assumption that there exist common covariate effects across multiple quantile levels. Recently, important findings in Bayesian inference for quantile regression were reported in Yang and He (20), Kim and Yang (20) and Feng, Chen and He (205). Typically, model () can be expressed as the following linear regression model Y = X β τ +ǫ τ, (3) wheretheτthpercentileofǫ τ isassumedtobe0. Forspecificτ, undertheindependence assumption of X and ǫ τ, it can be shown that ˆβ τ c is semiparametric efficient by a straightforward argument to be discussed in section 2. As a special case, when τ = 0.5, the least absolute deviation (LAD) is semiparametric efficient for model (3) with the independence assumption of X and ǫ τ (Zhou and Portnoy, 998). However, we point out that, without assuming independence of X and ǫ τ, ˆβ τ c is not semiparametric efficient and the semiparametric efficient estimation of model () or model(3) is indeed 2

3 a sophisticated issue. The most difficult part is the estimation of the density of ǫ τ given X in the semiparametric score function (Kato, 204), which suffers from the curse of dimensionality. When model () is specified for each τ (0,), following Portnoy (2003), we consider the quantile regression model Q Y X (τ) = X β(τ), for all τ (0,), (4) where Y and X are the same as in model (), and the regression parameter β(τ) = (β (τ),β 2 (τ),,β p (τ)) T is a function of τ. With the linearity assumption for all quantiles, the true unknown function β 0 (τ) is suffice to describe the entire conditional distribution of Y given X. Important results on the estimation of the quantile process with survival data can be found in Portnoy (2003), Peng and Huang (2008). Recently, there are some breakthroughs on Bayesian nonparametric regression models on all quantiles; see Müller & Quintana (2004), Dunson & Taylor (2005) and Chung & Dunson (2009), Reich et al.(20), Qu & Yoon (205), etc. To summarize, there are two main approaches for the estimation of quantile process: linear interpolation and basis representation. The linear interpolation approach consists of two steps: the first step is to estimate the quantile regression coefficients separately at certain proper grid of τ-values, and the second step is to interpolate linearly between grid values or apply rearrangement. For the basis representation method, the quantile function is represented by basis functions or some specific functions after transformation. Nevertheless, both methods reviewed above are in Bayesian framework and their theoretical properties remain unclear. To the best of our knowledge, there is no specific construction of a semiparametric efficient estimate of β(τ) of model (4) in the literature. We point out that for model (4), the likelihood function is n i= /{x i β(τ i )} where y i = x i β(τ i ) and β( ) is the derivative of β( ). However, the maximum likelihood method as in Zeng and Lin (2006,2007) involves enormous technical/numerical difficulty. In our view, one of the main reasons lies in the nature of model (4) that the quantile process β( ) and the nuisance parameter f Y X are not separable. The numerical maximization of the estimated likelihood subject to n constrains y i = x i β(τ i) is rather unstable and wild. The numerical difficulties here are in the same spirit as that in numerically searching for 3

4 the maximum likelihood estimation (MLE) of θ for Uniform[0, θ], where the solutions would often go to the boundary. Moreover, due to data sparsity, the estimated β(τ) or β(τ) would be unstable when τ is close to 0 or. In view of the technical/numerical complications involved in the semiparametric efficient estimation of β(τ) in model (4), we thus take one step back and consider the following quantile regression model Q Y X (τ l ) = X β(τ l ), for all l =,2,...,L, (5) where 0 < τ < τ 2 < < τ L <. Model (5) is intermediate of model () and model (4). With the explicit expression of the semiparametric efficient score function of β(τ k ), k =,2,...,L, derived by the least favorable submodel technique in section 2, we propose a one-step estimation with the estimated score function, that leads to the semiparametric efficient estimation of β(τ k ). The proposed procedure is numerically doable and stable. Most importantly, one can show that when the maximum space of {τ l τ l,l =,2,...,L+} tends to 0, the semiparametric efficient score of model (5) approaches to that of model (4). As the impetus for this work was to pursue semiparametric efficient estimation of β(τ) in model (4), theoretically, one can use efficient estimator of β(τ k ) with model (5) to approximate that of model (4). Hence, we refer the proposed procedure as nearly semiparametric efficient estimation for quantile regression. The rest of the paper is organized as follows. Section 2 introduces the model and the proposed estimation with detailed discussions. Extensive simulation studies with supportive evidence are demonstrated in section 3. In section 4, the proposed method is illustrated using a real data of birth weight of children from the National Center for Health Statistics. All technical derivation and proofs are in Appendix. 2. METHODOLOGIES AND MAIN RESULTS First, consider model (5), by the definition of quantile, F Y X (Q Y X (τ l )) = τ l F Y X (X β(τ l )) = τ l, l =,2,,L, (6) 4

5 where F Y X is the cumulative distribution function of Y given X. Let f Y X (t) be the density function of Y conditional on X. Let β 0 (τ l ) = (β 0 (τ l ),,β p0 (τ l )) be the true value of β(τ l ) = (β (τ l ),,β p (τ l )). By the nature of quantile regression model, x β(τ l ) is τ l -quantile of Y given X = x. Without loss of generality, we assume that x β(τ ) < x β(τ 2 ) < < x β(τ L ). 2.. Semiparametric efficient scores. In quantile regression, estimation of the quantile regression coefficient or the quantile process is inseparably linked to the nuisance parameter, the conditional density function. In such a case, the least favorable submodel method (Kato, 204) plays a roleto derive a semiparametric efficient score function of β(τ l ), l =,...,L of model (5) and their variance lower bound. It is known that the least favorable submodel technique is to reduce a high dimensional problem to a problem involving a finite-dimensional least favorable submodel ; see Begun et al.(983), Bickel et al.(993), among others. Following section 25.4 in van der Vaart (998), we begin with the construction of a parametric submodel of model (5) based on the cumulative distribution function with parameter θ in a neighborhood of 0, F Y X (t;θ) = F Y X (t)+θg Y X (t), (7) where G Y X (t) is a function of t satisfying certain conditions. Differentiating (7) we get f Y X (t;θ) = f Y X (t)+θg Y X (t), (8) where f Y X (t;θ), f Y X (t) and g Y X (t) are derivatives of F Y X (t;θ), F Y X (t) and G Y X (t) respectively. To guarantee f Y X (t;θ) is a density function for all θ, the first restriction of G Y X (t) is that + g Y X (u)du = 0. Moreover, undermodel(5),letx β(τ l ;θ)betheτ l quantileof F Y X (t;θ)andx β(τ l ;0) = X β 0 (τ l ), for l =,2,...,L. Hence, we have the identity τ l = F Y X (x β(τ l ;θ);θ). By a Taylor expansion of the right hand side of this identity as a function of θ in a neighborhood of 0, we obtain the second restriction that G Y X (X β 0 (τ l )) = f Y X (X β 0 (τ l ))X d(τ l ), 5

6 for l =,2...,L, where d(τ l ) is the derivative of β(τ l ;θ) at θ = 0. Clearly, the derivative of log-likelihood of θ based on the density function f Y X (t) at θ = 0 is g Y X (t)/f Y X (t), denoted as ξ. By the information theory in Bickel et al.(993), we are able to approximate the least favorable submodel by searching for the lower bound of E(ξ 2 ), which as a result would lead to the semiparametric efficient score. We defer the details to Appendix I. The resulting semiparametric efficient score of β(τ k ) can be regarded as an optimal way to combine information from all the quantile levels τ,...,τ L. Let U = BAB and W be a pl pl diagonal matrix with diagonal elements being the reciprocal of diagonal of matrix U, where A and B are defined in (A.0 ) and (A. ) in Appendix I. Set (u,u 2,,u pl ) = U W, where u i is a vector with length pl. The following proposition presents the semiparametric efficient score of β(τ k ), k L and their variance lower bound. Proposition. For model (5), the semiparametric efficient score of β(τ k ), k L, is S k (y,x) = L+ l= f Y X (x β(τ l ))x D l f Y X (x β(τ l ))x D l τ l τ l [I{x β(τ l ) < y < x β(τ l )} ] (τ l τ l ). (9) Moreover, for the estimate of the j th component of β(τ k ), its variance has a lower bound σkj 2 = u kj Uu, j =,2,...,p, kj where D l is p p matrix, l = 0,,...,L +, D 0 = D L+ = 0, (D,D 2,...,D L ) = (u k,u k2,...,u kp ) ; f Y X (x β 0 (0)) = f Y X (x β 0 ()) = 0; β 0 (0) =, β 0 () = + ; τ 0 = 0 and τ L+ =. Remark. When L =, model (5) reduces to model () for one single quantile point τ. By Proposition, the semiparametric efficient score of β τ in model () is S(y,x) = f Y X (x { β τ ) τ I(y < x β τ ) } D x. (0) ( τ)τ By the definitions of U and W, D = U W is a constant matrix not depending on random variable X. For the corresponding linear model (3), under the assumption 6

7 that the τ-quantile of ǫ τ is 0 and the error term ǫ τ is independent of covariate X, f Y X (x β τ ) = f Y X β τ X(0) = f ǫτ X(0) is also not relevant to X. In this case, the efficient score in (0) is exactly the efficient score in classical quantile regression model specified at a single quantile level, such as the least absolute deviation estimate (LAD) for median regression; see Zhou and Portnoy(998) and Kato(204). However, without the crucial independence assumption of X and ǫ τ, as conventional quantile regression models allows heterogeneity, the distribution of ǫ τ depends on X implying f Y X (x β τ ) also depends on x. As a result, the Koenker-Bassett estimate is not semiparametric efficient. Remark 2. When L and the maximum space of {τ l τ l,l =,2,...,L+ } tends to 0, model (5) approaches model (4). Next, we intend to show that the semiparametric efficient score (9) of β(τ k ) approaches that of model (4) as L. In fact, for the j-th component of β(τ k ), a similar calculation as that of (9) reveals that semiparametric efficient score of β j (τ k ) in model (4) is S kj (y,x) = (f Y X(x β(τ))x d(τ)) τ where d(τ) = [d (τ),...,d p (τ)] is a minimizer of E[{S kj (Y,X)}2 ] = {. } = f Y X (x β(τ))x d(τ), () [ { f Y X (X β(τ))x d(τ)} t=x β(τ) ] 2 dt, (2) subject to d j (τ k ) =. We defer the detailed derivations of this finding in Appendix II. We point out that, it is infeasible to pursue the semiparametric efficient estimation of β j (τ k ) in model (4) based on (), as the numerical minimization of (2) is intractable. Fortunately, the semiparametric efficient score of β j (τ k ) in (9) can be rewritten as, S kj (y,x) = L+ l= f Y X (x β(τ l ))x d(τ l ) f Y X (x β(τ l ))x d(τ l ) τ l τ l I{x β(τ l ) < y < x β(τ l )}, (3) where d = [d(τ ),...,d(τ L ) ] is a minimizer of the quadratic form E[{S kj (Y,X)} 2 ] d BAB d subject to d j (τ k ) =. It is straightforward to check that S kj (y,x) S kj (y,x) and E[{S kj(y,x)} 2 ] E[{S kj (Y,X)}2 ] (4) 7

8 as L. This finding motivates us to use the efficient score in (9) to approximate the efficient score in (), which leads to a nearly semiparametric efficient estimator of β(τ k ) in model (4). Remark 3. The key idea of this work is to borrow information across quantiles and search for the most efficient estimation. This remark provides more insights in this idea. Intuitively, for certain quantile level τ k, the estimation of β(τ k ) in traditional quantile regression does not depend on the information on Y at other quantiles {τ i, i k}, especially those quantiles far away from τ k. The intuition is true when the number of covariates (including an intercept term) is, that is p =. For this special case, one can rewrite (3) as { d(τ k )/ S kj (y,x) = β(τ k ) d(τ k+ )/ β(τ k+ ) d(τ k )/ β(τ k ) d(τ k )/ β(τ } k ) τ k+ τ k τ k τ k [τ k I{y < β(τ k )x}], (5) from which one can see that S kj (y,x) is not relevant to the model information at other quantiles {τ l,l k}. Appendix III contains the proofs of (5). In other words, for model (5) with p =, the semiparametric efficiency for the estimation of β j (τ k ) can be achieved using only the information at τ k. However, besides an intercept, there is generally at least one covariate in the model, namely p 2. Hence, the efficient estimator of β j (τ k ) generally depends on the information at other quantiles. In view of this fact, borrowing information across other quantiles via the efficient score (5) is able to improve the estimation efficiency of β(τ k ) when p 2. In addition, Proposition tells that the variance of estimates of β j (τ k ) have a lower bound σ 2 kj. For illustration, we consider a toy example for model (5) with L = 2. To estimate β (τ ), if we use only the model information at single quantile τ and ignore the information at τ 2, then [ E[{S (Y,X)} 2 ] = E τ ( τ ) { fy X (X β(τ )) } 2{ X d(τ ) } ] 2.= E(Q ). On the other hand, by incorporating the model information at τ 2 for the estimation of 8

9 β (τ ), we have shown in Appendix IV that E[{S (Y,X)} 2 ] [ { =E fy X (X β(τ )) } 2{ X d(τ ) } 2 { + fy X (X β(τ 2 )) } 2{ X d(τ 2 ) } 2 τ τ 2 + { fy X (X β(τ ))X d(τ ) f Y X (X β(τ 2 ))X d(τ 2 ) } ] 2 τ 2 τ. =E(Q 2 ). (6) Most importantly, we have shown Q 2 Q > 0 which leads to E(Q 2 ) E(Q ) > 0. In summary, our theoretical analysis validates that combining information across quantiles can generally reduce the variance of the estimate of β(τ k ) The nearly semiparametric efficient estimation. In this subsection, we introduce the proposed nearly semiparametric efficient estimation procedure for the regression coefficients of mode (4). As discussed earlier, we make use of the score (9) in the construction of the proposed estimator. Since (9) involves the density function of Y given X, we need to find an appropriate estimate of f Y X (x β(τ l )), l =,2,...,L. Recall that f Y X (X β(τ l )) = X β(τ, l =,2,...,L. (7) l ) Hence, instead of estimating the conditional density function directly, we estimate β(τ l ). A natural estimate of β(τ l ) is ˆ β(τl ) = {ˆβ c (τ l +h) ˆβ c (τ l h)}/(2h), where ˆβ c (τ) is the Koenker-Bassett estimate of β(τ) by minimizing (2) and h is the bandwidth. Thus, the density function f Y X (X β(τ l )) can be estimated by /X ˆ β(τl ) for l =,2,...,L. Next, we define the proposed one-step estimator of β(τ k ), denoted by ˆβ(τ k ), as n ˆβ j (τ k ) = ˆβ j c (τ k)+ ˆσ kj 2 i=ŝkj(y i,x i ), j =,2,...,p, (8) n where Ŝkj(y,x) is the j-th component of the estimated score Ŝk(y,x) by plugging ˆ β(τ l ) and ˆβ c (τ l ), l =,...,L, into (9), ˆσ kj 2 is the estimated variance lower bound by plugging ˆ β(τl ) and ˆβ c (τ l ), l =,...,L, into σ 2 kj in Proposition. Under regularity conditions given in Appendix V, the resulting estimate of ˆβj (τ k ) can be proved to achieve the semiparametric efficiency lower bound. The following theorem presents the main results. 9

10 Theorem. Assume model (4) and conditions () (3) in Appendix V hold. Then, for j =,2,...,p and k =,2,...,L, n {ˆβj (τ k ) β 0j (τ k )} N(0, σkj 2 ) (9) in distribution as n, where β 0j (τ k ) is the j-th component of β 0 (τ k ). Moreover, the asymptotic variance of ˆβ j (τ k ) achieves the semiparametric efficiency bound σkj 2. The implementation of the one-step estimation is as follows: for each k =,,L, j =,,p, Step. For each l =,,L, compute the initial estimator ˆβ c (τ l ); Step 2. For each l =,,L, calculate ˆ β(τl ) and the conditional density function f Y X (x i β(τ l )) is /x i ˆ β(τ l ); Step 3. Compute Ŝkj(y,x) and ˆσ kj 2 by plugging the initial estimator in step and the estimated density in step 2 into S kj (y,x) and σkj 2 ; Step 4. Obtain ˆβ j (τ k ) according to (8). Remark 4. Actually, in the above one-step estimation, we only need to estimate the conditional density function f Y X (x β(τ l )) at quantile levels {τ l,l =,...,L}. In this regard, we only need to assume the linear quantile regression model is specified in a neighborhood of each τ l, l =,...,L, and do not need to assume a linear quantile regression model for all τ (0,). 3. SIMULATION STUDIES Simulations are conducted to evaluate the performance of our proposed method. In the simulation, for a quantile level τ k of interest, we consider three methods for the estimation of ˆβ j (τ k ): the Koenker-Bassett quantile estimate ˆβ τ, c denoted by TQE; the proposed one-step estimate based on the semiparametric efficient score of β(τ k ), referred as EFF; the one-step estimate based on the score function (0) ignoring the model information at other quantiles, referred as (SEF). The simulated data is generated from the following quantile regression model with two covariates, Q Y X (τ) = X β (τ)+x 2 β 2 (τ), (20) 0

11 where β (τ) and β 2 (τ) takes each of the following 5 forms: M : β (τ) = 2 and β 2 (τ) = +Φ (τ); M2 : β (τ) = 2+Φ (τ) and β 2 (τ) = 2+Φ (τ); M3 : β (τ) = 2 and β 2 (τ) = +log{τ/( τ)}; M4 : β (τ) = 2 and β 2 (τ) = +tan{π (τ 0.5)}; M5 : β (τ) = +log{τ/( τ)} and β 2 (τ) = 2+tan{π (τ 0.5)}. The covariate X is constant for M, M3 and M4, and it follows log-normal distribution for M2 and M5. Another covariate X 2 follows log-normal distribution for all cases. In particular, model (20) with cases M and M2 are equivalent to Y = 2+X 2 +X 2 ǫ, and Y = 2+2X 2 +(+X 2 )ǫ, respectively, where ǫ follows the standard normal distribution. The sample size n = 000 and All simulations are repeated 000 times. We first consider the two quantiles 0.5 and 0.7. The simulation results are summarized in Table. One can see that the parameter estimates are generally unbiased. In all configurations, EFF has the smallest standard deviation (SD) compared with TQE and SEF. And SEF have much smaller SD compared to TQE. For example, for case M3 and n = 000, the ratio of the standard deviations of TQE and EFF ranges from.343 to And the ratio of the standard deviations of SEF and EFF ranges from.026 to.062. In other words, EFF improves efficiency of TQE for at least 80% and it improves efficiency of the SEF for around 5% to 2%, which confirms our theoretical findings. In addition, we also compare the numerical performance of the three methods with quantiles 0.5 and 0.9, a higher quantile. Table 2 reports the estimation results for the 5 cases, fromwhich similar conclusion to that of τ = 0.5 and 0.7 can be drawn. Specially, EFFhasthesmallest standarderrosandsefismoreefficient thantqe. Thisconfirms the theory that, if a higher quantile is of particular interest, it is beneficial to combine the model information across other quantile levels, for example, some moderate quantile τ = 0.5, for more efficient and stable estimation.

12 Table : Simulation results for five models with quantiles 0.5 and 0.7. τ = 0.5 τ = 0.7 Model n β (τ) β 2 (τ) β (τ) β 2 (τ) M True TQE (0.052) (0.0899) 2.003(0.0547).595(0.096) SEF (0.0238) (0.0547) (0.0265).549(0.0560) EFF 2.005(0.0227) (0.0533) (0.0247).5200(0.0529) 2000 TQE.9992(0.0365).000(0.0652) (0.0370).523(0.0653) SEF (0.059) (0.036) (0.074).590(0.0376) EFF (0.045) (0.0352) (0.050).5224(0.0365) M2 True TQE.9976(0.92).9987(0.55) (0.244) (0.229) SEF.9989(0.0896).998(0.0875) (0.089) (0.0903) EFF.9985(0.088).9982(0.0870) (0.0883) (0.088) 2000 TQE.9980(0.0834) (0.0844) (0.0877) (0.0833) SEF.9990(0.067) (0.064) (0.063) (0.0605) EFF.9988(0.0608) (0.0608) (0.0624) (0.0602) M3 True TQE 2.00(0.0822) (0.437) (0.0907).8397(0.592) SEF 2.004(0.038) (0.0874) (0.0445).8305(0.0929) EFF 2.002(0.0365) (0.0852) 2.009(0.0420).8400(0.0875) 2000 TQE.9987(0.0585).007(0.042) (0.065).8424(0.082) SEF (0.0256) (0.0575) 2.006(0.0290).8378(0.0622) EFF (0.0230) (0.056) 2.002(0.0250).8436(0.0607) M4 True TQE (0.0669) (0.44) (0.0930).722(0.62) SEF 2.004(0.036) (0.0699) 2.066(0.049).705(0.0952) EFF (0.0287) (0.0677) 2.004(0.0480).772(0.0925) 2000 TQE.9990(0.0469).003(0.0824) (0.0629).7228(0.097) SEF (0.0207) (0.046) 2.003(0.0327).78(0.0646) EFF (0.088) (0.0449) 2.006(0.0289).7227(0.0628) M5 True TQE (0.797).9982(0.555).8467(0.2073) (0.2072) SEF (0.344).9972(0.79).8440(0.488) 2.724(0.50) EFF 0.997(0.35).9984(0.73).8449(0.465) (0.474) 2000 TQE (0.258) 2.003(0.39).8449(0.459) (0.396) SEF (0.092) (0.083).8448(0.052) (0.0) EFF (0.09) (0.087).8462(0.039) (0.004) Standard deviations are in parentheses. 2

13 Table 2: Simulation results for five models with a high quantile. τ = 0.5 τ = 0.9 Model n β (τ) β 2 (τ) β (τ) β 2 (τ) M True TQE (0.052) (0.0899) 2.07(0.0725) (0.296) SEF (0.0238) (0.0547) 2.058(0.0362) (0.0784) EFF 2.004(0.0226) (0.0530) (0.0377) (0.0772) 2000 TQE.9992(0.0365).000(0.0652) (0.0482) (0.0877) SEF (0.059) (0.036) (0.023) (0.0543) EFF (0.042) (0.0347) (0.0207) (0.050) M2 True TQE.9976(0.92).9987(0.55) (0.622) 3.275(0.648) SEF.9989(0.0896).998(0.0875) 3.288(0.227) (0.237) EFF.9982(0.0879).9984(0.0868) (0.89) (0.200) 2000 TQE.9980(0.0834) (0.0844) 3.286(0.49) (0.27) SEF.9990(0.067) (0.064) (0.085) (0.0862) EFF.9986(0.0607) (0.0607) (0.082) (0.0839) M3 True TQE 2.00(0.0822) (0.437) (0.40) 3.834(0.253) SEF 2.004(0.038) (0.0874) (0.0720) 3.530(0.533) EFF (0.0366) (0.0848) 2.007(0.0668) 3.87(0.458) 2000 TQE.9987(0.0585).007(0.042) 2.086(0.0938) 3.807(0.708) SEF (0.0256) (0.0575) 2.020(0.0458) 3.79(0.06) EFF (0.0226) (0.0555) (0.0403) 3.872(0.0995) M4 True TQE (0.0669) (0.44) 2.044(0.409) 4.079(0.7658) SEF 2.004(0.036) (0.0699) 2.874(0.2729) (0.4643) EFF (0.0286) (0.0672) (0.2469) (0.4447) 2000 TQE.999(0.0469).003(0.0824) (0.2667) (0.5028) SEF (0.0207) (0.046) 2.08(0.550) 3.973(0.399) EFF (0.083) (0.0444) (0.358) (0.3075) M5 True TQE (0.797).9982(0.555) (0.4602) 5.270(0.87) SEF (0.344).9972(0.79) 3.208(0.3645) 5.003(0.624) EFF (0.33).9987(0.66) 3.208(0.3424) 5.070(0.5705) 2000 TQE (0.258) (0.39) 3.258(0.328) 5.080(0.534) SEF (0.092) (0.083) (0.258) (0.4226) EFF 0.998(0.0906) (0.084) (0.2400) (0.3935) Standard deviations are in parentheses. 3

14 4. APPLICATION We apply the proposed method to analyze a birth data (birth) released annually by the National Center for Health Statistics. The data includes information on nearly all live births from United States. Education of mother of each birth is recorded as 5 classes based on years of education. For illustration, we only consider the births that occurred in the month of June, 997, and had mothers with smoking cigarettes and education class 2 (7 to years of education). There are 9832 birth children consisting of 486 female and 497 male. In this paper, our interest is to study the relationship of the birth weight of child (in grams) and the covariates: the age of mother (Mage), the age of father (Fage) and the total number of prenatal care visits (Nprevist). All variables are taken the logarithmic transformation before analysis. We apply model (5) with τ = 0.3,0.5,0.7 for analyzing the dataset. Tables 3-4 present the estimation results of regression coeffecients by TQE, SEF and EFF, which are defined the same as in section 3. In Tables 3-4, Est represents the parameter estimate, Esd is the variance estimate of Est by 000 boostrap resampling method and the P-value is computed by Φ( Est/Esd ) where Φ( ) is the cumulative distribution function of the standard normal distribution. It can be seen that at nominal significance level 0.05, all the three methods detect Nprevist for all quantiles, detect ages ofparents at τ = 0.3 and0.5. And at τ = 0.7, the three methods identify father age of the female children data. However, one significant finding in the analysis is that at τ = 0.7, Fage and Mage of the male children data do not have significantly nonzero coefficients, however, for female data, Mage is only detected by EFF with a significant nonzero coefficients, while TQE and SEF do not detect this. Overall, Tables 3-4 report that Nprevist and ages of parents have positive and negative coefficients, respectively, which suggests that the birth weights of children become heavier when their mothers are younger and have more prenatal care visits. In addition, the effect of the three covariates to the birth weights of children are more significant at lower quantile (τ = 0.3) compared with that of higher quantile (τ = 0.7). 4

15 Table 3: Analysis of birth data with male child. Intercept Mage Fage Nprevist τ model Est Esd P value Est Esd P value Est Esd P value Est Esd P value 0.3 TQE < < SEF < < EFF < < TQE < < SEF < < EFF < < TQE < < SEF < < EFF < < Table 4: Analysis of birth data with female child. Intercept Mage Fage Nprevist τ model Est Esd P value Est Esd P value Est Esd P value Est Esd P value 0.3 TQE < < < SEF < < < EFF < < TQE < < < SEF < < EFF < TQE < < SEF < < EFF < References [] Begun, J. M., Hall, W. J., Huang, W. M. and Wellner, J. A. (983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist., [2] Bickel, P. J., Klaassen, C. A., Ritov, Y. and Wellner, J. A. (993). Efficient and adaptive estimation for semiparametric models. Baltimore: Johns Hopkins University Press. [3] Bondell, H. D., Reich, B. J. and Wang, H. (200). Noncrossing quantile regression curve estimation. Biometrika, 97, [4] Chung, Y. and Dunson, D. B. (2009). Nonparametric Bayes conditional distribution modeling with variable selection. J. Amer. Statist. Assoc. 04,

16 [5] Dunson, D. B. and Taylor, J. A. (2005). Approximate Bayesian inference for quantiles. J. Nonparametri. Stat. 7, [6] Feng, Y., Chen, Y., and He, X. (205). Bayesian quantile regression with approximate likelihood. Bernoulli 2, [7] He, X. (997). Quantile curves without crossing. The American Statistician 5, [8] He, X., Wang, L., and Hong, H.G.(203). Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Statist. 4, [9] He, X. and Zhu, L. X. (2003). A lack-of-fit test for quantile regression. J. Amer. Statist. Assoc. 98, [0] Jiang, L., Wang, H. J. and Bondell, H. D. (203). Interquantile shrinkage in regression models. J. Comp. Graph. Statist., 22, [] Kai, B., Li, R. and Zou, H. (20). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann. Statist. 39, [2] Kato, K. (20). Group Lasso for high dimensional sparse quantile regression models. arxiv preprint arxiv: [3] Kato, K. (202). Estimation in functional linear quantile regression. Ann. Statist. 40, [4] Kato, K. (204). Semiparametric efficiency bound for linear quantile regression. Technical report. [5] Kim, M. O. and Yang, Y. (20). Semiparametric approach to a random effects quantile regression model. J. Amer. Statist. Assoc. 06, [6] Koenker, R. (2005). Quantile Regression. Econometric Society Monographs 38. Cambridge Univ. Press, Cambridge. [7] Koenker, R. and Bassett, G. JR. (978). Regression quantiles. Econometrica 46,

17 [8] Koenker, R. and Geling, O. (200). Reappraising medfly longevity: A quantile regression survival analysis. J. Amer. Statist. Assoc. 96, [9] Koenker, R. and Xiao, Z. (2002). Inference on the quantile regression process. Econometrica 70, [20] Müller, P. and Quintana, F. A. (2004). Nonparametric Bayesian data analysis. Stat. Sci. 9, [2] Peng, L. and Fine, J. P. (2009). Competing risks quantile regression. J. Amer. Statist. Assoc. 04, [22] Peng, L. and Huang, Y. (2008). Survival analysis with quantile regression models. J. Amer. Statist. Assoc. 03, [23] Portnoy, S. (2003). Censored regression quantiles. J. Amer. Statist. Assoc. 98, [24] Qu, Z. and Yoon, J.(205). Nonparametric estimation and inference on conditional quantile processes. J. Econometrics 85, -9. [25] Reich, B. J., Fuentes, M., and Dunson, D. B. (20). Bayesian spatial quantile regression. J. Amer. Statist. Assoc. 06, [26] van der Vaart A. W. (998). Asymptotic Statistics. Cambridge University Press. [27] Wang, H. J. and Li, D. (203). Estimation of Extreme Conditional Quantiles Through Power Transformation. J. Amer. Statist. Assoc. 08, [28] Wang, H. J., Li, D. and He, X. (202). Estimation of high conditional quantiles for heavy-tailed distributions. J. Amer. Statist. Assoc. 07, [29] Wang, H. J. and Wang, L. (2009). Locally weighted censored quantile regression. J. Amer. Statist. Assoc. 04, [30] Wang, L., Wu, Y., and Li, R.(202). Quantile regression for analyzing heterogeneity in ultra-high dimension. J. Amer. Statist. Assoc. 07,

18 [3] Yang, Y. and He, X. (202). Bayesian empirical likelihood for quantile regression. Ann. Statist. 40, [32] Yu, K. and Jones, M. C. (998). Local linear quantile regression. J. Amer. Statist. Assoc. 93, [33] Zeng, D. and Lin, D. Y. (2006). Efficient estimation in semiparametric transformation models for counting process. Biometrika 93, [34] Zeng, D. and Lin, D. Y. (2007). Maximum likelihood estimation in semiparametric regression models with censored data. J. R. Statist. Soc. B 69, [35] Zheng, Q., Peng, L., and He, X. (205). Globally adaptive quantile regression with ultra-high dimensional data. Ann. Statist. 43, [36] Zhou, K. Q. and Portnoy, S. L. (998). Statistical inference on heteroscedastic models based on regression quantiles. J. Nonparametric Stat.9, [37] Zou, H. and Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. Ann. Statist. 36,

19 APPENDIX Appendix I We firstly consider the following quantile regression model, Q Y X (τ l ) = X β(τ l ), l =,2,,L, where X = (X,X 2,,X p ), β(t) = (β (t),β 2 (t),,β p (t)). This model focuses on parameter estimation of the L quantile points. A semiparametric efficient score of β(τ k ) is calculated by using methodology of the least favorable submodel (Bickel et al., 993). Without loss of generality, the efficient score of the j-th component of β(τ k ) is constructed firstly. We consider the following parametric submodels based on cumulative distribution function with parameter θ in a neighborhood of 0, F Y X (t;θ) = F Y X (t)+θg Y X (t), (A.) where G Y X (t) is a function on t. Easily, we have f Y X (t;θ) = f Y X (t)+θg Y X (t), (A.2) where f Y X (t;θ), f Y X (t) and g Y X (t) are derivatives of F Y X (t;θ), F Y X (t) and G Y X (t) with respect to t. To guarantee the f Y X (t;θ) is a density function for all of θ, the g Y X (t) satisfies that + g Y X (u)du = 0. (A.3) Let x β(τ l ;θ) be the τ l quantile of distribution of F Y X (t;θ) and x β(τ l ;0) = x β 0 (τ l ), for l =,2,...,L. From (A. ) and the Taylor expansion, we have τ l = F Y X (x β(τ l ;θ);θ) =F Y X (x β 0 (τ l ))+f Y X (x β 0 (τ l ))x (β(τ l ;θ) β 0 (τ l ))+θg Y X (x β 0 (τ l ))+o( θ ) =τ l +f Y X (x β 0 (τ l ))x (β(τ l ;θ) β 0 (τ l ))+θg Y X (x β 0 (τ l ))+o( θ ), which suggests that G Y X (x β 0 (τ l ))θ = f Y X (x β 0 (τ l ))(β(τ l ;θ) β 0 (τ l )) x+o( θ ). (A.4) 9

20 Let d(τ l ) be derivative value of β(τ l ;θ) on θ = 0. Note that d(τ l ) is a vector with length p and β(τ l ;θ) β l0 = d(τ l )θ+o( θ ). From (A.4 ), we have G Y X (x β 0 (τ l ))θ = f Y X (x β 0 (τ l ))x d(τ l )θ+o( θ ), (A.5) which indicates that G Y X (x β 0 (τ l )) = f Y X (x β 0 (τ l ))x d(τ l ), for l =,2...,L. (A.6) Thus, we study the studied parametric submodel (A. ), subject to constraints (A.3 ) and (A.6 ). It is well-known that Var(β j (τ k ;ˆθ)) can be generally expressed with d j (τ k ) 2 Var(ˆθ), where d j (τ k ) and β j (τ k ;ˆθ) are the j th components of d(τ k ) and β k respectively, and ˆθ is an estimator of θ. When d j (τ k ) equals to, Var(β j (τ k ;ˆθ)) can be approximated by Var(ˆθ). This paper takes d j (τ k ) =. Based on the density function f Y X (t), we show Then we have ( x β 0 (τ ) E(ξ 2 x) = + ( x β 0 (τ ) = + log f Y X (t) θ L x β 0 (τ l ) l=2 x β 0 (τ l ) x β 0 (τ l ) L l=2 ( x β 0(τ ) g Y X (t)dt) 2 = x β 0 (τ ) L+ l= L+ = l= f Y X (t)dt x β 0 (τ l ) + = g Y X(t) θ=0 f Y X (t) L l=2 + + x β 0 (τ L ) + + x β 0 (τ L ) (G(x β 0 (τ l )) G(x β 0 (τ l ))) 2 τ l τ l. = ξ. (A.7) ) g 2 Y X (t) f Y X (t) dt ) (gy X (t) ) 2dt f /2 Y X (t) ( x β 0 (τ l ) x β 0 (τ l ) g Y X(t)dt) 2 x β 0 (τ l ) x β 0 (τ l ) f Y X(t)dt + ( + x β 0 (τ L ) g Y X(t)dt) 2 + x β 0 (τ L ) f Y X(t)dt (f Y X (x β 0 (τ l ))x d(τ l ) f Y X (x β 0 (τ l ))x d(τ l )) 2 τ l τ l, (A.8) where d(τ 0 ) = d(τ L+ ) = 0. Hence, we get L+ E(ξ 2 ) = E(E(ξ 2 x)) l= ( ) 2 E f Y X (x β 0 (τ l ))x d(τ l ) f Y X (x β 0 (τ l ))x d(τ l ) τ l τ l =d BAB d, (A.9) 20

21 where d = (d(τ ),d(τ 2 ),...,d(τ L ) ) T, ( ) ( ) E(fY X (X β 0 (τ )) 2 XX ) E(fY X (X β 0 (τ L )) 2 XX ) A =, A L+ = τ p p τ L A l = E(f Y X (X β 0 (τ l )) 2 XX ) τ l τ l E(f Y X(X β 0 (τ l ))f Y X (X β 0 (τ l ))XX ) τ l τ l E(f Y X(X β 0 (τ l ))f Y X (X β 0 (τ l ))XX ) τ l τ l E(f Y X (X β 0 (τ l )) 2 XX ) τ l τ l l = 2,,L, p p 2p 2p, A = A A A L 0, (A.0) A L+ 2pL 2pL B = I p I p I p I p I p I p 0 0. (A.) I p I p pl 2pL The equation holds if and only if L+ g Y X (t) f /2 Y X (t) = {a l f /2 Y X (t)+b l}i(x β 0 (τ l ) < y < x β 0 (τ l )). l= It follows from conditions (A.3 ) and (A.6 ) that b l = 0, a l = ( f Y X (x β 0 (τ l ))d(τ l ) f Y X (x β 0 (τ l ))d(τ l )) x τ l τ l, l =,,L+. 2

22 Therefore, a semiparametric efficient score for β j (τ k ) is, g Y X (t) f Y X (t) L+ = l= f Y X (x β 0 (τ l ))x d(τ l ) f Y X (x β 0 (τ l ))x d(τ l ) τ l τ l I(x β 0 (τ l ) < y < x β 0 (τ l )) L+ f Y X (x β 0 (τ l ))x d(τ l ) f Y X (x β 0 (τ l ))x d(τ l ) = τ l τ l l= { } I(x β 0 (τ l ) < y < x β 0 (τ l )) (τ l τ l ). (A.2) Due to unknown d, we need to compute d by minimizing E(ξ 2 ), that is minimizing quadratic function d BAB d on d subject to d j (τ k ) =. Let U = BAB and W be a p p diagonal matrix with diagonal components same as /diag(u ). Denote (u,u 2,,u pl ) = U W. By Lagrange multiplier method, we have L(d,λ) = d Ud+λ{d j (τ k ) }. From L(d,λ)/ d = 0 and d j (τ k ) =, we get d = u kj. (A.3) Therefore, for j p, the semiparametric efficient score of β j (τ k ) is L+ f Y X (x β 0 (τ l ))x d(τ l ) f Y X (x β 0 (τ l ))x d(τ l ) S kj (y,x) = τ l τ l l= { } I(x β 0 (τ l ) < y < x β 0 (τ l )) (τ l τ l ), with d = u kj. Naturally, the semiparametric efficient score of β(τ k ) is L+ f Y X (x β 0 (τ l ))x D l f Y X (x β 0 (τ l ))x D l S k (y,x) = τ l τ l l= { } I(x β 0 (τ l ) < y < x β 0 (τ l )) (τ l τ l ), where (D,D 2,...,D L ) = (u k,u k2,...,u kp ), D l is p p matrix, l = 0,,...,L+, and D 0 = D L+ = 0. Appendix II 22

23 Consider another quantile regression model, Q Y X (τ) = x β(τ), 0 < τ <, where x = (x,x 2,,x p ), β(t) = (β (t),β 2 (t),,β p (t)). This model assumes all of quantiles for response Y given X have a linear form. Since the cumulative distribution function of Y given X is F Y X (β(τ) X) = τ, and density function of Y given X = x, satisfies f Y X (β(τ) x) = x β(τ). Asemiparametric efficient score of β j (τ k ), j =,2,...,p, is calculated by using the least favorable submodel similar to the previous parametric submodel. The parametric submodels with parameter θ in a neighborhood of 0 is, F Y X (t;θ) = F Y X (t)+θg Y X (t), (A.4) where G Y X (t) is a function on t. Then we have f Y X (t;θ) = f Y X (t)+θg Y X (t), (A.5) where f Y X (t;θ), f Y X (t) and g Y X (t) are derivatives of F Y X (t;θ), F Y X (t) and G Y X (t) with respect to t. Similar derivations to (A.3 ) and (A.6 ), we have constraints on g as follows, and + g Y X (u)du = 0, (A.6) G Y X (x β 0 (τ)) = f Y X (x β 0 (τ))x d(τ), for 0 < τ <, (A.7) where d(τ) is derivative value of β(τ;θ) with respect to θ at point 0, and d j (τ k ) = ; x β 0 (τ) and x β(τ;θ) are τ quantiles of F Y X (t) and F Y X (t;θ), respectively; and β(τ;0) = β 0 (τ). Since x β 0 (τ) is a monotone and increasing function on τ, from (A.6 ) we have G Y X (x β 0 (0)) = G Y X (x β 0 ()) = 0. From (A.7 ), it shows G Y X (x β 0 (τ)) τ = (f Y X(x β 0 (τ))x d(τ)), τ 23

24 which indicates that g Y X (x β 0 (τ)) f Y X (x β 0 (τ)) = g Y X(x β 0 (τ))x β 0 (τ) = (f Y X(x β 0 (τ))x d(τ)) τ (. ) = f Y X (x β 0 (τ))x d(τ). (A.8) Hence, we have log f Y X (t) θ = g ( Y X(t) θ=0 f Y X (t) = f Y X (x β 0 (τ))x d(τ)) t=x β 0 (τ). = ξ, (A.9) and the semiparametric efficient score of β j (τ k ), ( ) S kj (y,x) = f Y X (x β 0 (τ))x d(τ), (A.20) where d(τ) is a minimizer of { ( E(ξ 2 ) = f Y X (x β 0 (τ))x d(τ)) subject to d j (τ k ) =. t=x β 0 (τ) } 2 dt, Obviously, it is intractable to compute the semiparametric score (A.20 ) of β j (τ k ), and can not be used to estimate the β 0 (τ) directly. Appendix III For the jth component of β(τ k ), j =,...,p, it follows from (3) that d can be solved by minimizing quadratic form d BAB d on d, subject to d j (τ k ) =. For l k, letting derivative of d BAB d on d(τ l ) be 0, we have ) (τ l+ τ l )E (f Y X (X β(τ l ))f Y X (X β(τ l ))XX d(τ l ) ) (τ l+ τ l )E (f Y X (X β(τ l ))f Y X (X β(τ l ))XX d(τ l ) ) +(τ l τ l )E (f Y X (X β(τ l ))f Y X (X β(τ l+ ))XX d(τ l+ ) = 0. Based on model (4) and cumulative distribution function (6) with p =, we have which leads to f Y X (Xβ(τ l )) = X β(τ l ), l =,2,...,L, f Y X (Xβ(τ l ))X = β(τ l ), l =,2,...,L. 24 (A.2)

25 Thus, from (A.2 ), we show that 0 =(τ l+ τ l ) d(τ l ) β(τ l ) (τ l+ τ l ) d(τ l) β(τ l ) +(τ l τ l ) d(τ l+) β(τ l+ ) ( d(τl ) =(τ l+ τ l ) β(τ l ) d(τ ) ( l) d(τl ) (τ l τ l ) β(τ l ) β(τ l ) d(τ ) l+). (A.22) β(τ l+ ) Hence, the score (3) becomes S kj (y,x) L+ = = = l= f Y X (xβ(τ l ))xd(τ l ) f Y X (xβ(τ l ))xd(τ l ) τ l τ l I(xβ(τ l ) < y < xβ(τ l )) L ( fy X (xβ(τ l ))xd(τ l ) f Y X (xβ(τ l+ ))xd(τ l+ ) l= τ l+ τ l f Y X(xβ(τ l ))xd(τ l ) f Y X (xβ(τ l ))xd(τ l ) ) (τ l I(y < β(τ l )x)) τ l τ l ( L d(τ l )/ β(τ l ) d(τ l+ )/ β(τ l+ ) d(τ l )/ β(τ l ) d(τ l )/ β(τ ) l ) τ l+ τ l τ l τ l l= (τ l I(y < β(τ l )x)) ( d(τ k )/ = β(τ k ) d(τ k+ )/ β(τ k+ ) d(τ k )/ β(τ k ) d(τ k )/ β(τ ) k ) τ k+ τ k τ k τ k (τ k I(y < β(τ k )x)). (A.23) Appendix IV We take estimator of β (τ ) as an example with p 2 and L = 2. If only use single quantile τ without considering model informationof quantile τ 2, from (3) with L =, we have E(S (Y,X) 2 ) = E( τ ( τ ) f Y X(X β(τ )) 2 (X d(τ )) 2 ) =. E(Q ). (A.24) And taking quantile τ 2 into account (L = 2 in (3)), we get { E(S (Y,X) 2 ) = E f Y X (X β(τ )) 2 (X d(τ )) 2 + f Y X (X β(τ 2 )) 2 (X d(τ 2 )) 2 τ τ 2 + ( 2 } f Y X (X β(τ ))X d(τ ) f Y X (X β(τ 2 ))X.= d(τ 2 )) E(Q2 ). (A.25) τ 2 τ 25

26 Then we have Q 2 Q = τ (f Y X (X β(τ ))) 2 (X d(τ )) 2 + τ 2 (f Y X (X β(τ 2 ))) 2 (X d(τ 2 )) 2 + ( f Y X (X β(τ ))X d(τ ) f Y X (X β(τ 2 ))X d(τ 2 ) τ 2 τ τ ( τ ) f Y X(X β(τ )) 2 (X d(τ )) 2 = τ 2 τ ( τ2 τ f Y X (X β(τ )) 2 (X d(τ )) 2 + τ τ 2 f Y X (X β(τ 2 )) 2 (X d(τ 2 )) 2 ) 2 τ 2 τ f Y X (X β(τ ))(X d(τ ))f Y X (X β(τ 2 ))(X d(τ 2 )) 0. ) 2 (A.26) We know that when Q 2 Q = 0 holds, there exist two constants a and b for all X = x such that τ2 f Y X (x β(τ ))(x τ d(τ )) = a f Y X (x β(τ 2 ))(x d(τ 2 ))+b, τ τ 2 which obviously does not satisfy. Appendix V: Proofs of Theorem on the one-step efficient estimation. Let ˆβ c (τ) be the classical Koenker-Bassett regression quantile estimator of β 0 (τ) at any given quantile level τ and let h be the bandwidth for the estimation of β 0 (τ), the derivative of β 0 (τ). More conditions are needed. Assumption The covariate X satisfies that X i M for some constant M uniformly in i =,...,n. Assumption 2 The function β(τ) is bounded away from 0 for all ǫ τ ǫ and 0 < ǫ <. Assumption 3 The bandwidth h for the derivative estimation satisfies h = o(n δ ) with 0 < δ < /2. Hereafter, mathematic operators of vectors (matrices) A and B, such as A+B and A/B, stand for the corresponding operators of each component of A and B. Proof of Theorem. We prove Theorem in the several steps. 26

27 Step. To prove ( sup ˆfY X (t) f Y X (t) = O p + {log(n)}3/2 M ǫ<t<m ǫ nh +h 2 ), (A.27) where ˆf Y X (t) = /{X ˆ β(τ)} with t = X β(τ) for any fixed ǫ τ ǫ, ˆ β(τ) ˆβ c (τ +h) ˆβ c (τ h), (A.28) 2h and M ǫ is certain constant large enough depending on ǫ and M. To this end, first, standard approximation using Taylor expansion shows that which implies β 0 (τ +h) β 0 (τ h) = [β 0 (τ +h) β 0 (τ)] [β 0 (τ h) β 0 (τ)] = β 0 (τ) 2h+O(h 3 ), (A.29) β 0 (τ +h) β 0 (τ h) 2h = β 0 (τ)+o(h 2 ). (A.30) Next, by a result in Portnoy(202, page 733), we have ( ) ˆβ c (τ) β 0 (τ) = O p n + (logn)3/2, n ( ) ˆ β(τ) β 0 (τ) = O p + (logn)3/2 +h 2, (A.3) nh uniformly for all ǫ τ ǫ. A straightforward calculation yields that ˆ β(τ) = ˆ β(τ) β 0 (τ) β 0 (τ) [ ] 2 + β 0 (τ) ] 2 [ˆ β(τ) β 0 (τ) [ 2. (A.32) β(τ)] ˆ β(τ) Under condition (A2), inf ǫ<τ< ǫ ˆ β(τ) > c > 0 for some positive constant c. Together with (A.3 ) and (A.32 ), we have sup ǫ<τ< ǫ ˆ β(τ) β 0 (τ) = O p ( ) + (logn)3/2 +h 2. (A.33) nh Next, by the boundedness of X in assumption (A), we have ( ) sup ǫ<τ< ǫ X ˆ β(τ) X β 0 (τ) = O p + (logn)3/2 +h 2, (A.34) nh 27

28 which implies ( sup ˆfY X (t) f Y X (t) = O p + {log(n)}3/2 M ǫ<t<m ǫ nh +h 2 ). Step 2. To prove sup l L ˆD ( l D l = Op + {log(n)}3/2 nh +h 2 ). To this end, first, we need to evaluate the order of /{X ˆ β(τ)} 2 /{X β 0 (τ)} 2 uniformly in τ (ǫ, ǫ). We show that {ˆ β(τ) } 2 { β 0 (τ) } 2 = { β 0 (τ)+ ˆ β(τ)}{ β 0 (τ) ˆ β(τ)} { } 2 } 2. (A.35) β 0 (τ) {ˆ β(τ) Under Assumption 2, inf ǫ<τ< ǫ β(τ) > 0 andinf ǫ<τ< ǫ ˆ β(τ) > c > 0for some positive constant c. Then, sup } ǫ<τ< ǫ 2 {ˆ β(τ) { β 0 (τ) } 2 = C sup ˆ β(τ) β 0 (τ) ǫ<τ< ǫ ( ) = O p + {log(n)}3/2 +h 2, (A.36) nh for n large enough. For brevity, we denote θ l β 0 (τ l ) and ˆθ l ˆ β(τl ). Write, for any l = 2,...,L+, = θ l ˆθ l + θ l ˆθ l. ˆθ lˆθl θ l θ l ˆθ lˆθl θ l ˆθ l θ l θ l Under Assumption (A2), one can check that sup 2 l (L+) ˆθ lˆθl θ l θ l = C{ sup θ l ˆθ l + sup θ l ˆθ l } 2 l (L+) l L for n large enough. Hence, sup 2 l (L+) ˆ β(τ l )ˆ β(τl ) β 0 (τ l ) β 0 (τ l ) = O p( + {log(n)}3/2 +h 2 ). nh Given that X is bounded, we have sup 2 l (L+) X ˆ β(τl ) X ˆ β(τl X β 0 (τ l ) X β 0 (τ l ) ( ) =O p + {log(n)}3/2 +h 2. (A.37) nh 28

29 From the definition matrices A and B in section 2, combining (A.36 ) and (A.37 ), we have sup l L ˆD ( l D l = Op + {log(n)}3/2 nh +h 2 ). (A.38) Step 3. For ease of presentation, let ˆη l /X ˆθl and η l /X θ l. Recall that θ l β 0 (τ l ) and ˆθ l ˆ β 0 (τ l ). Consider η l D l ˆη l ˆDl = η l D l (ˆη l η l +η l )( ˆD l D l +D l ) = η l D l {(ˆη l η l )( ˆD l D l )+η l ( ˆD l D l )+D l (ˆη l η l )+η l D l } = (ˆη l η l )( ˆD l D l ) η l ( ˆD l D l ) D l (ˆη l η l ). Hence, sup 2 l L+ = sup 2 l L+ η l D l ˆη l ˆDl (ˆη l η l )( ˆD l D l )+η l ( ˆD l D l )+D l (ˆη l η l ) Using (A.33 ) and (A.38 ), we have shown sup 2 l L+ ( η l D l ˆη l ˆDl = Op + {log(n)}3/2 nh +h 2 ). Step 4. To evaluate the estimated score function Ŝk(y,x) in Proposition by plugging in ˆf Y X ( ), ˆDl and ˆβ c (τ l ), l =,...,L into S k (y,x). To be concise, we define Ŝ k (y i,x i ) L+ l= â l â [ l I{x i τ l τ l ˆβ c (τ l ) < y i < x i ] ˆβ c (τ l )} (τ l τ l ), (A.39) where â l = ˆη l ˆDl and a l = η l D l, ˆη l. First, write We also write â l â l τ l τ l = a l a l τ l τ l + âl a l (â l a l ) τ l τ l. (A.40) I{x i ˆβ c (τ l ) < y i < x i ˆβ c (τ l )} = I{y i > x i ˆβ c (τ l )} I{y i > x i ˆβ c (τ l )}. (A.4) 29

30 We first consider the first term in (A.4 ) and write I{y i > x i ˆβ c (τ l )} = I{y i > x i ˆβ c (τ l )} I{y i > x i β 0 (τ l )}+I{y i > x i β 0 (τ l )} = I{x i β 0 (τ l ) < y i < x i ˆβ c (τ l )}+I{y i > x i β 0 (τ l )} ˆ i l +I{y i > x i β 0(τ l )} (A.42) Based on the fact that Var(ˆ i l ) = O( + (logn)3/2 nh +h 2 ), we can show ( sup ˆ i l = O p l L nh 2 + (logn)3/2 nh +h 2 ) Then, by the monotonicity implied by the quantile regression model,. (A.43) I{x i ˆβ c (τ l ) < y i < x i ˆβ c (τ l )} = ˆ i l ˆ i l +I{x i β 0(τ l ) < y i < x i β 0(τ l )} (A.44) Using (A.40 ) and (A.44 ), we have Ŝ k (y i,x i ) L+ { al a l + (â } l a l ) (â l a l ) τ l τ l τ l τ l l= [ˆ i l ˆ ] i l +I{x i β 0(τ l ) < y i < x i β 0(τ l )} (τ l τ l ), L+ { } al a l = [ˆ i l τ l τ ˆ ] i l +I{x i β 0(τ l ) < y i < x i β 0(τ l )} (τ l τ l ), l l= L+ { } (âl a l ) (â l a l ) + [ˆ i l τ l τ ˆ ] i l +I{x i β 0 (τ l ) < y i < x i β 0 (τ l )} (τ l τ l ), l l= L+ = l= L+ + l= L+ + l= a l a l [ I{x i τ l τ β 0(τ l ) < y i < x i β 0(τ l )} (τ l τ l ) ] l (a l a l )(ˆ i l ˆ i l ) L+ + τ l τ l l= (â l a l ) (â l a l ) τ l τ l (ˆ i l ˆ i l) (â l a l ) (â l a l ) τ l τ l [ I{x i β 0(τ l ) < y i < x i β 0(τ l )} (τ l τ l ) ] S 0 k (y i,x i )+Π +Π 2 +Π 3. (A.45) We first consider Π. For L large but fixed, as obviously E(ˆ i l ) 0, then it can be 30

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows