Local Rank Inference for Varying Coefficient Models


Lan WANG, Bo KAI, and Runze LI

By allowing the regression coefficients to change with certain covariates, the class of varying coefficient models offers a flexible approach to modeling nonlinearity and interactions between covariates. This article proposes a novel estimation procedure for the varying coefficient models based on local ranks. The new procedure provides a highly efficient and robust alternative to the local linear least squares method, and can be conveniently implemented using an existing R software package. Theoretical analysis and numerical simulations both reveal that the gain of the local rank estimator over the local linear least squares estimator, measured by the asymptotic mean squared error or the asymptotic mean integrated squared error, can be substantial. In the normal error case, the asymptotic relative efficiency for estimating both the coefficient functions and the derivative of the coefficient functions is above 96%; even in the worst case scenarios, the asymptotic relative efficiency has a lower bound 88.96% for estimating the coefficient functions, and a lower bound 89.91% for estimating their derivatives. The new estimator may achieve the nonparametric convergence rate even when the local linear least squares method fails due to infinite random error variance. We establish the large sample theory of the proposed procedure by utilizing results from generalized U-statistics, whose kernel function may depend on the sample size. We also extend a resampling approach, which perturbs the objective function repeatedly, to the generalized U-statistics setting, and demonstrate that it can accurately estimate the asymptotic covariance matrix.

KEY WORDS: Asymptotic relative efficiency; Local linear regression; Local rank; Varying coefficient model.

1. INTRODUCTION

As introduced in Cleveland, Grosse, and Shyu (1992) and Hastie and Tibshirani (1993), the varying coefficient model provides a natural and useful extension of the classical linear regression model by allowing the regression coefficients to depend on certain covariates. Due to its flexibility to explore the dynamic features which may exist in the data and its easy interpretation, the varying coefficient model has been widely applied in many scientific areas. It has also experienced rapid developments in both theory and methodology; see Fan and Zhang (2008) for a comprehensive survey. Fan and Zhang (1999) proposed a two-step estimation procedure for the varying coefficient model when the coefficient functions have possibly different degrees of smoothness. Kauermann and Tutz (1999) investigated the use of varying coefficient models for diagnosing the lack-of-fit of regression, regarding the varying coefficient model as an alternative to a parametric null model. Cai, Fan, and Li (2000) developed a more efficient estimation procedure for varying coefficient models in the framework of generalized linear models. As special cases of varying coefficient models, time-varying coefficient models are particularly appealing in longitudinal studies, survival analysis, and time series data, since they allow one to explore the time-varying effect of covariates over the response. Pioneering works on novel applications of time-varying coefficient models to longitudinal data include Brumback and Rice (1998), Hoover et al. (1998), Wu, Chiang, and Hoover (1998), and Fan and Zhang (2000), among others. For more details, readers are referred to Fan and Li (2006) and the references therein. Time-varying coefficient models are also popular in modeling and predicting nonlinear time series data and survival data; see Fan and Zhang (2008) for related literature.

Estimation procedures in the aforementioned papers are built on either local least squares type or local likelihood type methods. Although these estimators remain asymptotically normal for a large class of random error distributions, their efficiency can deteriorate dramatically when the true error distribution deviates from normality. Furthermore, these estimators are very sensitive to outliers. Even a few outlying data points may introduce undesirable artificial features in the estimated functions. These considerations motivate us to develop a novel local rank estimation procedure that is highly efficient, robust, and computationally simple. In particular, the proposed local rank regression estimator may achieve the nonparametric convergence rate even when the local linear least squares method fails to consistently estimate the regression coefficient functions due to infinite random error variance, which occurs for instance when the random error has a Cauchy distribution. The new approach can substantially improve upon the commonly used local linear least squares procedure for a wide class of error distributions. Theoretical analysis reveals that the asymptotic relative efficiency (ARE), measured by the asymptotic mean squared error (or the asymptotic mean integrated squared error), of the local rank regression estimator in comparison with the local linear least squares estimator has an expression that is closely related to that of the Wilcoxon-Mann-Whitney rank test in comparison with the two-sample t-test. However, different from the two-sample test scenario, where the efficiency is completely determined by the asymptotic variance, in the current setting of estimating an infinite-dimensional parameter both bias and variance contribute to the asymptotic efficiency. The value of the ARE is often significantly greater than one. For example, the ARE is 167% for estimating the regression coefficient functions when the random error has a t_3 distribution, is 240% for the exponential random error distribution, and is 493% for the lognormal random error distribution.

Lan Wang is Associate Professor, School of Statistics, University of Minnesota, Minneapolis, MN (E-mail: lan@stat.umn.edu). Bo Kai is Assistant Professor, Department of Mathematics, College of Charleston, Charleston, SC (E-mail: kaib@cofc.edu). Runze Li is Professor, Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA (E-mail: rli@stat.psu.edu). Wang's research is supported by a National Science Foundation grant (DMS). Kai's research is supported by National Science Foundation grants (DMS) as a research assistant. Li's research is supported by NIDA, NIH grants R21 DA and P50 DA10075. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH. The authors thank an associate editor and two anonymous referees for their insightful and constructive comments.

© 2009 American Statistical Association, Journal of the American Statistical Association, December 2009, Vol. 104, No. 488, Theory and Methods, DOI: /jasa2009tm

A striking feature of the local rank procedure is that its pronounced efficiency gain comes with only a little loss when the random error actually has a normal distribution, for which the ARE of the local rank regression estimator relative to the local linear least squares estimator is above 96% for estimating both the coefficient functions and their derivatives. For estimating the regression coefficient functions, the ARE has a sharp lower bound 88.96%, which implies that the efficiency loss is at most 11.04% in the worst case scenario. For estimating the first derivative of the regression coefficient functions, the ARE possesses a lower bound 89.91%. Kim (2007) developed a quantile regression procedure for varying coefficient models when the random errors are assumed to have a certain quantile equal to zero. She used the regression splines method and derived the convergence rate, but the lack of an asymptotic normality result does not allow the comparison of the relative efficiency. On the other hand, one may extend the local quantile regression approach (Yu and Jones 1998) to the varying coefficient models. However, this is expected to yield an estimator which still suffers from loss of efficiency and may have near zero ARE relative to the local linear least squares estimator in the worst case scenario.

The new estimator proposed in this article minimizes a convex objective function based on local ranks. The implementation of the minimization can be conveniently carried out using existing functions in the R statistical software package via a simple algorithm (Section 4.1). The objective function has the form of a generalized U-statistic whose kernel varies with the sample size. Under some mild conditions, we establish the asymptotic representation of the proposed estimator and further prove its asymptotic normality. We derive the formula of the asymptotic relative efficiency of the local rank estimator relative to the local linear least squares estimator, which confirms the efficiency advantage of the local rank approach. We also extend a resampling approach, which perturbs the objective function repeatedly, to the generalized U-statistics setting, and demonstrate that it can accurately estimate the asymptotic covariance matrix.

This article is organized as follows. Section 2 presents the local rank procedure for estimating the varying coefficient models. Section 3 discusses its large sample properties and proposes a resampling method for estimating the asymptotic covariance matrix. In Section 4, we address issues related to practical implementation and present Monte Carlo simulation results. We further illustrate the proposed procedure via analyzing an environmental dataset. Regularity conditions and technical proofs are presented in the Appendix.

2. LOCAL RANK ESTIMATION PROCEDURE

Let Y be a response variable, and U and X be the covariates. The varying coefficient model is defined by

    Y = a_0(U) + X^T a(U) + ε,    (1)

where a_0(·) and a(·) are both unknown smooth functions. The random error ε has probability density function g(·) which has finite Fisher information, that is, ∫ {g(x)}^{-1} {g'(x)}² dx < ∞. In this article, it is assumed that U is a scalar and X is a p-dimensional vector. The proposed procedures can be extended to the case of multivariate U, with more complicated notation, by following the same ideas of this article.

Suppose that {U_i, X_i, Y_i}, i = 1, ..., n, is a random sample from model (1). Write X_i = (X_{i1}, ..., X_{ip})^T and a(·) = (a_1(·), ..., a_p(·))^T. For u in a neighborhood of any given u_0, we locally approximate each coefficient function by a Taylor expansion,

    a_m(u) ≈ a_m(u_0) + a_m'(u_0)(u − u_0),    m = 0, 1, ..., p.    (2)

Denote α_1 = a_0(u_0), α_2 = a_0'(u_0), β_m = a_m(u_0), and β_{p+m} = a_m'(u_0), for m = 1, ..., p. Based on the above approximation, we obtain the residual for estimating Y_i at U_i = u_0:

    e_i = Y_i − α_1 − α_2(U_i − u_0) − Σ_{m=1}^p [β_m + β_{p+m}(U_i − u_0)] X_{im}.    (3)

We define the local rank objective function to be

    Q_n(β, α_2) = [n(n − 1)]^{-1} Σ_{1≤i≠j≤n} |e_i − e_j| K_h(U_i − u_0) K_h(U_j − u_0),    (4)

where β = (β_1, ..., β_p, β_{p+1}, ..., β_{2p})^T, and for a given kernel function K(·) and a bandwidth h, K_h(t) = h^{-1} K(t/h). Note that Q_n(β, α_2) does not depend on α_1, because α_1 cancels out in e_i − e_j. The objective function Q_n(β, α_2) is a local version of Gini's mean difference, which is a classical measure of concentration or dispersion (David 1998). Without the kernel functions, [n(n − 1)]^{-1} Σ_{1≤i≠j≤n} |e_i − e_j| is the global rank objective function that leads to the classical rank estimator in linear models based on Wilcoxon scores. Rank-based statistical procedures have played a fundamental role in nonparametric analysis of linear models due to their high efficiency and robustness. We refer to the review paper of McKean (2004) for many useful references.

For any given u_0, minimizing Q_n(β, α_2) yields the local Wilcoxon rank estimator of (β_0^T, α_2)^T, where β_0 = β(u_0) = (a_1(u_0), ..., a_p(u_0), a_1'(u_0), ..., a_p'(u_0))^T. Denote the minimizer of Q_n(β, α_2) by (β̂^T, α̂_2)^T. Then for m = 1, ..., p, â_m(u_0) = β̂_m, â_m'(u_0) = β̂_{p+m}, and â_0'(u_0) = α̂_2. In the sequel, we also use the vector notation â(u_0) = (â_1(u_0), ..., â_p(u_0))^T and â'(u_0) = (â_1'(u_0), ..., â_p'(u_0))^T when convenient.

The location parameter a_0(u_0) needs to be estimated separately. This is analogous to the scenario of global rank estimation of the intercept in the linear regression model. In order to make the intercept identifiable, it is essential to have an additional location constraint on the random errors. We adopt the commonly used constraint that ε_i has median zero. Given (β̂^T, α̂_2)^T, we estimate a_0(u_0) by α̂_1, the value of α_1 that minimizes

    n^{-1} Σ_{i=1}^n |Y_i − α_1 − α̂_2(U_i − u_0) − Σ_{m=1}^p [β̂_m + β̂_{p+m}(U_i − u_0)] X_{im}| K_h(U_i − u_0),    (5)

which is a local version of a weighted L_1-norm objective function.
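As a concrete illustration of (4), the following R sketch evaluates the local rank objective for a candidate (β^T, α_2)^T at a fixed u_0. This is our own illustration; the function names and arguments are not from the paper.

    # Minimal sketch: evaluate the local rank objective (4) at a point u0.
    # beta = (beta_1, ..., beta_p, beta_{p+1}, ..., beta_{2p}); alpha2 is scalar.
    epanechnikov <- function(t) 0.75 * (1 - t^2) * (abs(t) < 1)

    local_rank_objective <- function(beta, alpha2, Y, U, X, u0, h,
                                     kern = epanechnikov) {
      n <- length(Y); p <- ncol(X)
      # residuals (3); alpha1 is omitted because it cancels in e_i - e_j
      e <- Y - alpha2 * (U - u0) -
        X %*% beta[1:p] - ((U - u0) * X) %*% beta[(p + 1):(2 * p)]
      w <- kern((U - u0) / h) / h                       # K_h(U_i - u0)
      D <- abs(outer(as.vector(e), as.vector(e), "-"))  # |e_i - e_j|
      W <- outer(w, w)                                  # product of kernel weights
      diag(W) <- 0                                      # exclude the i = j terms
      sum(D * W) / (n * (n - 1))
    }

Minimizing this convex function over (β^T, α_2)^T gives the local Wilcoxon rank estimator; Section 4.1 describes a much faster equivalent formulation as a weighted L_1 regression.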

3. THEORETICAL PROPERTIES

3.1 Large Sample Distributions

In this subsection, we investigate the asymptotic properties of β̂ and α̂_2. The main challenge comes from the nonsmoothness of the objective function Q_n(β, α_2). To overcome this difficulty, we first derive an asymptotic representation of β̂ and α̂_2 via a quadratic approximation of Q_n(β, α_2), which holds uniformly in a local neighborhood of the true parameter values. Aided with this asymptotic representation, we further establish the asymptotic normality of the local rank estimator.

Let us begin with some new notation. Let γ_n = (nh)^{-1/2}, and define

    β* = γ_n^{-1}(β_1 − a_1(u_0), ..., β_p − a_p(u_0), h(β_{p+1} − a_1'(u_0)), ..., h(β_{2p} − a_p'(u_0)))^T,
    α* = (α_1*, α_2*)^T = γ_n^{-1}(α_1 − a_0(u_0), h(α_2 − a_0'(u_0)))^T,
    Δ_i(u_0) = Σ_{m=1}^p [a_m(U_i) − a_m(u_0) − a_m'(u_0)(U_i − u_0)] X_{im} + [a_0(U_i) − a_0(u_0) − a_0'(u_0)(U_i − u_0)].

Write e_i*(β*, α_2*) = ε_i − γ_n α_2*(U_i − u_0)/h − γ_n β*^T Z_i + Δ_i(u_0), where Z_i = (X_i^T, ((U_i − u_0)/h) X_i^T)^T. Let (β̂_n*^T, α̂_{2n}*)^T be the value of (β*^T, α_2*)^T that minimizes the following reparametrized objective function:

    Q_n*(β*, α_2*) = [n(n − 1)]^{-1} Σ_{1≤i≠j≤n} |e_i*(β*, α_2*) − e_j*(β*, α_2*)| K_h(U_i − u_0) K_h(U_j − u_0).    (6)

Let H = diag(1, h) ⊗ I_p, where ⊗ denotes the Kronecker product and I_p denotes the p × p identity matrix. Then it can be easily seen that β̂_n* = (nh)^{1/2} H(β̂ − β_0) and α̂_{2n}* = (nh³)^{1/2}[α̂_2 − a_0'(u_0)].

We next show that the nonsmooth function Q_n*(β*, α_2*) can be locally approximated by a quadratic function of (β*^T, α_2*)^T. Let μ_i = ∫ t^i K(t) dt, i = 1, 2, and ν_i = ∫ t^i K²(t) dt, i = 0, 1, 2. In this article, we assume that the kernel function K(·) is symmetric. This is not restrictive, considering that most of the commonly used kernel functions, such as the Epanechnikov kernel K(t) = 0.75(1 − t²) I(|t| < 1), are symmetric. We use S_n(β*, α_2*) = (S_{n1}^T(β*, α_2*), S_{n2}(β*, α_2*))^T to denote the gradient function of Q_n*(β*, α_2*), that is, S_{n1}(β*, α_2*) = (∂/∂β*) Q_n*(β*, α_2*) and S_{n2}(β*, α_2*) = (∂/∂α_2*) Q_n*(β*, α_2*). More specifically,

    S_{n1}(β*, α_2*) = 2γ_n [n(n − 1)]^{-1} Σ_{i≠j} [I(e_i*(β*, α_2*) ≤ e_j*(β*, α_2*)) − 1/2] (Z_i − Z_j) K_h(U_i − u_0) K_h(U_j − u_0)

and

    S_{n2}(β*, α_2*) = 2γ_n [n(n − 1)]^{-1} Σ_{i≠j} [I(e_i*(β*, α_2*) ≤ e_j*(β*, α_2*)) − 1/2] ((U_i − U_j)/h) K_h(U_i − u_0) K_h(U_j − u_0).

Furthermore, we consider the following quadratic function of (β*^T, α_2*)^T:

    B_n(β*, α_2*) = γ_n^{-1} (β*^T, α_2*)(S_{n1}^T(0, 0), S_{n2}(0, 0))^T + (1/2) γ_n (β*^T, α_2*) A (β*^T, α_2*)^T + γ_n^{-1} Q_n*(0, 0),    (7)

where

    A = 4τ f²(u_0) diag(Σ(u_0), μ_2 Σ(u_0), μ_2),    (8)

Σ(u_0) = E[X_i X_i^T | U_i = u_0], 0 denotes a matrix (or vector) of zeros whose dimension is determined by the context, τ = ∫ g²(t) dt is the Wilcoxon constant, and g(·) is the density function of the random error ε.

Lemma 3.1. Suppose that Conditions (C1)-(C4) in the Appendix hold. Then for every ε > 0 and c > 0,

    P( sup_{‖(β*^T, α_2*)‖ ≤ c} |γ_n^{-1} Q_n*(β*, α_2*) − B_n(β*, α_2*)| ≥ ε ) → 0,

where ‖·‖ denotes the Euclidean norm.

Lemma 3.1 implies that the nonsmooth objective function Q_n*(β*, α_2*) can be uniformly approximated by the quadratic function B_n(β*, α_2*) in a neighborhood around 0. In the Appendix, it is also shown that the minimizer of B_n(β*, α_2*) is asymptotically within an o(1) neighborhood of (β̂_n*^T, α̂_{2n}*)^T. This further allows us to derive the asymptotic distribution.

The local linear Wilcoxon estimator of a(u_0) = (a_1(u_0), ..., a_p(u_0))^T is â(u_0). The theorem below provides an asymptotic representation of â(u_0) and the asymptotic normal distribution. Let S_{n1}(0, 0) = (S_{n11}^T(0, 0), S_{n12}^T(0, 0))^T, where S_{n11}(0, 0) and S_{n12}(0, 0) are both p × 1 vectors.

Theorem 3.2. Suppose that Conditions (C1)-(C4) in the Appendix hold. Then we have the following asymptotic representation:

    (nh)^{1/2}[â(u_0) − a(u_0)] = −γ_n^{-2} [4τ f²(u_0) Σ(u_0)]^{-1} S_{n11}(0, 0) + o_P(1),    (9)

where f(u) is the density function of U. Furthermore,

    (nh)^{1/2}[â(u_0) − a(u_0) − (μ_2 h²/2) a''(u_0) + o(h²)] → N(0, [ν_0/(12τ² f(u_0))] Σ^{-1}(u_0))    (10)

in distribution, where a''(u_0) = (a_1''(u_0), ..., a_p''(u_0))^T.
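The asymptotic bias and variance in (10) involve the kernel only through the constants μ_2 = ∫ t² K(t) dt and ν_0 = ∫ K²(t) dt. As a quick numerical check (our own, not from the paper), for the Epanechnikov kernel these constants are μ_2 = 0.2 and ν_0 = 0.6:

    # Kernel constants for the Epanechnikov kernel K(t) = 0.75 (1 - t^2) I(|t| < 1)
    K <- function(t) 0.75 * (1 - t^2) * (abs(t) < 1)
    mu2 <- integrate(function(t) t^2 * K(t), -1, 1)$value  # 0.2
    nu0 <- integrate(function(t) K(t)^2, -1, 1)$value      # 0.6
    c(mu2 = mu2, nu0 = nu0)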

Remark. For the estimators of the derivatives of the coefficient functions, we have the following asymptotic representations:

    (nh³)^{1/2}[α̂_2 − a_0'(u_0)] = −γ_n^{-2} [4τ f²(u_0) μ_2]^{-1} S_{n2}(0, 0) + o_P(1),    (11)
    (nh³)^{1/2}[â'(u_0) − a'(u_0)] = −γ_n^{-2} [4τ f²(u_0) μ_2 Σ(u_0)]^{-1} S_{n12}(0, 0) + o_P(1).    (12)

Following a similar proof as that for Theorem 3.2 in the Appendix, it can be shown that (nh³)^{1/2}[α̂_{2n} − a_0'(u_0)] and (nh³)^{1/2}[â'(u_0) − a'(u_0)] are both asymptotically normal. The proof of the asymptotic normality of α̂_2 and â'(u_0) is given in the technical report version of this article (Wang, Kai, and Li 2009).

3.2 Asymptotic Relative Efficiency

We now compare the estimation efficiency of the local rank estimator [denoted by â_R(u_0)] with that of the local linear least squares estimator [denoted by â_LS(u_0)] for estimating a(u_0) in the varying coefficient model. To measure efficiency, we consider both the asymptotic mean squared error (MSE) at a given u_0 and the asymptotic mean integrated squared error (MISE). When evaluating both criteria, we plug in the theoretical optimal bandwidth. Zhang and Lee (2000) give the asymptotic MSE of â_LS(u_0) for estimating a(u_0):

    MSE_LS(h; u_0) = E‖â_LS(u_0) − a(u_0)‖² = (μ_2² ‖a''(u_0)‖²/4) h⁴ + [ν_0 σ²/f(u_0)] tr{Σ^{-1}(u_0)} (nh)^{-1},

where σ² = Var(ε) is assumed to be finite and positive. Thus, the theoretical optimal bandwidth, which minimizes the asymptotic MSE of â_LS(u_0), is

    h_LS^opt(u_0) = [ν_0 σ² tr{Σ^{-1}(u_0)} / (μ_2² ‖a''(u_0)‖² f(u_0))]^{1/5} n^{-1/5}.    (13)

From (10), the asymptotic MSE of the local rank estimator â_R(u_0) is

    MSE_R(h; u_0) = E‖â_R(u_0) − a(u_0)‖² = (μ_2² ‖a''(u_0)‖²/4) h⁴ + [ν_0/(12τ² f(u_0))] tr{Σ^{-1}(u_0)} (nh)^{-1}.

The theoretical optimal bandwidth for the local rank estimator thus is

    h_R^opt(u_0) = [ν_0 tr{Σ^{-1}(u_0)} / (12τ² μ_2² ‖a''(u_0)‖² f(u_0))]^{1/5} n^{-1/5}.    (14)

This allows us to calculate the local asymptotic relative efficiency.

Theorem 3.3. The asymptotic relative efficiency of the local rank estimator to the local linear least squares estimator for a(u_0) is

    ARE(u_0) = MSE_LS{h_LS^opt(u_0); u_0} / MSE_R{h_R^opt(u_0); u_0} = (12σ²τ²)^{4/5}.

This asymptotic relative efficiency has a lower bound 0.8896, which is attained at the random error density f(t) = (3/(20√5))(5 − t²) I(|t| ≤ √5).

Remark 1. Alternatively, we may consider the asymptotic relative efficiency obtained by comparing the MISE, which is defined as MISE(h) = E ∫ ‖â(u) − a(u)‖² w(u) du with a weight function w(·). This provides a global measurement. Interestingly, it leads to the same relative efficiency. This follows by observing that the theoretical optimal global bandwidths for the local linear least squares estimator and the local rank estimator are

    h_LS^opt = [ν_0 σ² ∫ w(u) tr{Σ^{-1}(u)}/f(u) du / (μ_2² ∫ ‖a''(u)‖² w(u) du)]^{1/5} n^{-1/5}    (15)

and

    h_R^opt = [ν_0 ∫ w(u) tr{Σ^{-1}(u)}/f(u) du / (12τ² μ_2² ∫ ‖a''(u)‖² w(u) du)]^{1/5} n^{-1/5},    (16)

respectively. Thus, with the theoretical optimal bandwidths,

    ARE = MISE_LS(h_LS^opt) / MISE_R(h_R^opt) = (12σ²τ²)^{4/5}.

Define φ = (12σ²τ²)^{4/5}. Then ARE(u_0) = ARE = φ. Note that the above ARE is closely related to the asymptotic relative efficiency of the Wilcoxon-Mann-Whitney rank test in comparison with the two-sample t-test. Table 1 depicts the value of φ for some commonly used error distributions. It can be seen that the desirable high efficiency of traditional rank methods for estimating a finite-dimensional parameter completely carries over to the local rank method for estimating an infinite-dimensional parameter. By a similar calculation, we can show that the asymptotic relative efficiencies of the local rank estimator to the local linear estimator for a'(u_0) and a'(·) both equal ψ = (12σ²τ²)^{8/11}, which has a lower bound 0.8991. This value is also reported in Table 1 for some common error distributions.

Remark 2. We may also apply the local median approach (Yu and Jones 1998) to estimate the coefficient functions and their first derivatives. Similarly, we can prove that such estimators are asymptotically normal. The ARE of the local median estimator versus the local linear least squares estimator is closely related to that of the sign test versus the t-test. It is known that the ARE of the sign test versus the t-test for the normal distribution is 0.63. Thus we expect the efficiency loss of the local median procedure to be substantial for normal random error.

Table 1. Asymptotic relative efficiency: values of φ and ψ for the normal, Laplace, t_3, exponential, log-normal, and Cauchy error distributions.
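As a numerical illustration of Theorem 3.3 (our own sketch, assuming standard normal errors), τ = ∫ g²(t) dt = 1/(2√π) and σ² = 1, so 12σ²τ² = 3/π ≈ 0.955, which gives φ ≈ 0.964 and ψ ≈ 0.967, consistent with the statement that the ARE is above 96% in the normal case:

    # ARE of the local rank estimator vs. local least squares for N(0,1) errors
    tau  <- integrate(function(t) dnorm(t)^2, -Inf, Inf)$value  # 1 / (2 sqrt(pi))
    base <- 12 * 1 * tau^2      # 12 * sigma^2 * tau^2 = 3 / pi
    phi  <- base^(4 / 5)        # ARE for the coefficient functions
    psi  <- base^(8 / 11)       # ARE for their first derivatives
    c(phi = phi, psi = psi)     # approximately 0.964 and 0.967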

3.3 Asymptotic Normality of α̂_1

Following (5), α̂_1* = (nh)^{1/2}{α̂_1 − a_0(u_0)} is the value of α_1* that minimizes

    Q_{n0}(α_1*, α̂_2, β̂) = n^{-1} Σ_{i=1}^n |ε_i − γ_n α_1* − (α̂_2 − a_0'(u_0))(U_i − u_0) − Σ_{m=1}^p [(β̂_m − a_m(u_0)) + (β̂_{p+m} − a_m'(u_0))(U_i − u_0)] X_{im} + Δ_i(u_0)| K_h(U_i − u_0).

Similarly as in Lemma 3.1, we can establish the following local quadratic approximation, which holds uniformly in a neighborhood around 0:

    γ_n^{-1} Q_{n0}(α_1*, α̂_2, β̂) = γ_n^{-1} α_1* S_{n0} + γ_n g(0) f(u_0) α_1*² + γ_n^{-1} Q_{n0}(0, a_0'(u_0), β_0) + o_p(1),    (17)

where

    S_{n0} = 2γ_n n^{-1} Σ_{i=1}^n [I(ε_i ≤ −Δ_i(u_0)) − 1/2] K_h(U_i − u_0).    (18)

This further leads to an asymptotic representation of α̂_1:

    (nh)^{1/2}(α̂_1 − a_0(u_0)) = −γ_n^{-2} [2g(0) f(u_0)]^{-1} S_{n0} + o_p(1).    (19)

The theorem below gives the asymptotic distribution of α̂_1.

Theorem 3.4. Under the conditions of Theorem 3.2, we have

    (nh)^{1/2}[α̂_1 − a_0(u_0) − (μ_2 h²/2) a_0''(u_0) + o(h²)] → N(0, [12g²(0) f(u_0)]^{-1} ν_0)

in distribution.

3.4 Estimation of the Standard Errors

To make statistical inference based on the local rank methodology, one needs to estimate the standard error of the resulting estimator. As indicated by Theorem 3.2, the asymptotic covariance matrix of the local rank estimator is rather complex and involves unknown functions. Here we propose a standard error estimator using a simple resampling method proposed by Jin, Ying, and Wei (2001). Let V_1, ..., V_n be independent and identically distributed nonnegative random variables with mean 1/2 and variance 1. We consider a stochastic perturbation of (4):

    Q̃_n(β, α_2) = [n(n − 1)]^{-1} Σ_{1≤i≠j≤n} (V_i + V_j)|e_i − e_j| K_h(U_i − u_0) K_h(U_j − u_0),    (20)

where e_i is defined in (3). In Q̃_n(β, α_2), the data {Y_i, U_i, X_i} are considered to be fixed, and the randomness comes from the V_i's. Let (β̃^T, α̃_2)^T be the value of (β^T, α_2)^T that minimizes Q̃_n(β, α_2). It is easy to obtain (β̃^T, α̃_2)^T by applying the simple algorithm described in Section 4.1. Jin, Ying, and Wei (2001) established the validity of the resampling method when the objective function has a U-statistic structure. Although their theory covers many important applications, they require that the U-statistic have a fixed kernel. We extend their result to our setting, where the U-statistic involves a variable kernel due to nonparametric smoothing. Let ã(u_0) be the local rank estimator of a(u_0) based on the perturbed objective function (20); that is, it is the subvector that consists of the first p components of β̃. Its asymptotic normality is given in the theorem below.

Theorem 3.5. Under the conditions of Lemma 3.1, conditional on almost surely every sequence of data {Y_i, U_i, X_i},

    (nh)^{1/2}[ã(u_0) − â(u_0)] → N(0, [ν_0/(12τ² f(u_0))] Σ^{-1}(u_0))

in distribution.

This theorem suggests that to estimate the asymptotic covariance matrix of â(u_0), one can repeatedly perturb (4) by generating a large number of independent random samples {V_i}_{i=1}^n. For each perturbed objective function, one solves for ã(u_0). The sample covariance matrix of ã(u_0) based on a large number of independent perturbations provides a good approximation. The accuracy of the resulting standard error estimate will be tested in the next section.

The perturbed estimator has conditional bias equal to zero. It has been found that the standard bootstrap method, which resamples from the empirical distribution of the data, also estimates the bias as zero when estimating nonparametric curves (Hall and Kang 2001). It is possible to use a more delicate bootstrap technique to estimate the bias of a nonparametric curve estimator. Although some of the ideas may be adapted to the method of perturbing the objective function, this is beyond the scope of our article and is not pursued further here.
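The following R sketch outlines this perturbation scheme; it is our own illustration, not code from the paper. Here local_rank_fit stands for the pseudo-observation fit described in Section 4.1 below, extended to accept the multiplicative pair weights V_i + V_j of (20), and the V_i are drawn from a Gamma(0.25, scale = 2) distribution, which has mean 1/2 and variance 1 as required.

    # Sketch of the resampling standard-error estimator of Section 3.4.
    # local_rank_fit(Y, U, X, u0, h, v) is assumed to return a list whose
    # component a is the local rank estimate of a(u0) from the perturbed
    # objective (20), in which pair (i, j) carries the weight v[i] + v[j].
    resample_se <- function(Y, U, X, u0, h, local_rank_fit, B = 1000) {
      fits <- matrix(NA_real_, nrow = B, ncol = ncol(X))
      for (b in seq_len(B)) {
        # V_1, ..., V_n iid with mean 1/2 and variance 1
        v <- rgamma(length(Y), shape = 0.25, scale = 2)
        fits[b, ] <- local_rank_fit(Y, U, X, u0, h, v)$a
      }
      # the sample covariance across perturbations approximates the
      # asymptotic covariance of a-hat(u0); its diagonal gives the SEs
      sqrt(diag(cov(fits)))
    }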
4. NUMERICAL STUDIES

4.1 A Pseudo-Observation Algorithm

The local rank estimator can be obtained by applying an efficient and reliable algorithm. Note that the local rank estimator of (β_0^T, a_0'(u_0))^T can be solved by fitting a weighted L_1 regression on the n(n − 1)/2 pseudo-observations (x_i − x_j, Y_i − Y_j) with weights w_ij = K((U_i − u_0)/h) K((U_j − u_0)/h), where x_i = ((U_i − u_0), X_i^T, (U_i − u_0)X_i^T)^T, 1 ≤ i < j ≤ n. Given (β̂^T, α̂_2)^T, the estimator of a_0(u_0) can be obtained by another weighted L_1 regression on (1, Y_i − α̂_2(U_i − u_0) − Σ_{m=1}^p [β̂_m + β̂_{p+m}(U_i − u_0)] X_{im}) with weights w_i = K((U_i − u_0)/h), 1 ≤ i ≤ n. Many statistical software packages can implement weighted L_1 regression. In our numerical studies, we use the function rq in the R package quantreg.
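A minimal sketch of this pseudo-observation algorithm is given below; it is our own illustration of the two steps just described, with function and variable names of our own choosing. The weighted L_1 fits are median regressions, so rq is called with tau = 0.5. The argument v supplies the optional perturbation weights V_i of (20); its default value 1/2 makes V_i + V_j = 1, which reproduces the unperturbed objective (4).

    # Sketch of the pseudo-observation algorithm of Section 4.1.
    library(quantreg)

    local_rank_fit <- function(Y, U, X, u0, h, v = rep(0.5, length(Y))) {
      n <- length(Y); p <- ncol(X)
      K <- function(t) 0.75 * (1 - t^2) * (abs(t) < 1)   # Epanechnikov kernel
      x <- cbind(U - u0, X, (U - u0) * X)                # x_i (no intercept)
      w <- K((U - u0) / h)
      ij <- t(combn(n, 2))                               # all pairs i < j
      i <- ij[, 1]; j <- ij[, 2]
      dY <- Y[i] - Y[j]                                  # pseudo-responses
      dx <- x[i, , drop = FALSE] - x[j, , drop = FALSE]  # pseudo-covariates
      wij <- (v[i] + v[j]) * w[i] * w[j]                 # pair weights
      keep <- wij > 0                                    # drop zero-weight pairs
      fit1 <- rq(dY[keep] ~ dx[keep, ] - 1, tau = 0.5, weights = wij[keep])
      theta <- coef(fit1)   # (alpha2-hat, beta-hat_1..p, beta-hat_{p+1}..2p)
      # second step: weighted L1 fit of the intercept a_0(u0)
      r <- as.vector(Y - x %*% theta)
      fit2 <- rq(r[w > 0] ~ 1, tau = 0.5, weights = w[w > 0])
      list(a = theta[2:(p + 1)], a_deriv = theta[(p + 2):(2 * p + 1)],
           a0 = coef(fit2)[1], a0_deriv = theta[1])
    }

For example, local_rank_fit(Y, U, X, u0 = 0.5, h = 0.1)$a returns â(0.5), and passing v = rgamma(n, shape = 0.25, scale = 2) yields one perturbed fit for the resampling scheme of Section 3.4.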

4.2 Bandwidth Selection

Bandwidth selection is an important issue for all statistical models that involve nonparametric smoothing. Although we have derived the theoretical optimal bandwidth for the local rank estimator in (14) and (16), it is difficult to use the plug-in method to estimate it, due to the many unknown quantities involved. We propose below an alternative bandwidth selection method that is practically feasible. This approach is based on the relationship between h_R^opt and h_LS^opt. From Section 3.2, we see that

    h_R^opt(u_0) = (12τ²σ²)^{-1/5} h_LS^opt(u_0) and h_R^opt = (12τ²σ²)^{-1/5} h_LS^opt.    (21)

Thus, we can first use existing bandwidth selectors (e.g., Zhang and Lee 2000) to estimate h_LS^opt(u_0) or h_LS^opt. The error variance σ² can be estimated based on the residuals; in particular, when robustness is of concern, it can be estimated using the MAD of the residuals. Hettmansperger and McKean (1998, p. 181) discussed in detail how to estimate τ, which can be obtained by the function wilcoxontau in the R software developed by Terpstra and McKean (2005). In the end, we plug these estimators into (21) to get the bandwidth for the local rank estimator.

Alternatively, instead of the above two-step procedure, we may directly use computationally intensive cross-validation to estimate the bandwidth for the local rank procedure. Note that under outlier contamination, standard cross-validation can lead to extremely biased bandwidth estimates, because it can be adversely influenced by extreme prediction errors. A robust cross-validation method, such as that developed by Leung (2005), is therefore preferred.
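A sketch of the two-step rule (21) is given below (our illustration; h_LS is assumed to come from any existing least squares bandwidth selector). Following the text, σ is estimated by the MAD of pilot residuals; for τ = ∫ g², instead of wilcoxontau we substitute a simple alternative, using the fact that ∫ g² equals the density of ε_1 − ε_2 at zero and estimating it with a kernel density estimate of the pairwise residual differences.

    # Sketch of the bandwidth rule (21): h_R = (12 tau^2 sigma^2)^(-1/5) h_LS
    rank_bandwidth <- function(h_LS, resid) {
      sigma_hat <- mad(resid)                  # robust estimate of sigma
      d <- outer(resid, resid, "-")            # pairwise differences
      d <- d[upper.tri(d)]
      dens <- density(d)                       # KDE of the law of eps_1 - eps_2
      tau_hat <- approx(dens$x, dens$y, xout = 0)$y   # estimates int g^2
      (12 * tau_hat^2 * sigma_hat^2)^(-1/5) * h_LS
    }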
4.3 Examples

We conduct Monte Carlo simulations to assess the finite sample performance, and illustrate the proposed methodology on a real environmental dataset. In the analysis, we use the Epanechnikov kernel K(u) = 0.75(1 − u²) I(|u| < 1).

Example 1. We generate random data from

    Y = a_0(U) + a_1(U)X_1 + a_2(U)X_2 + ε,

where a_0(u) = exp(2u − 1), a_1(u) = 8u(1 − u), and a_2(u) = 2 sin²(2πu). The covariate U follows a uniform distribution on [0, 1] and is independent of (X_1, X_2), where the covariates X_1 and X_2 are standard normal random variables with correlation coefficient 2^{-1/2}. The coefficient functions and the mechanism to generate U and (X_1, X_2) are the same as those in Cai, Fan, and Li (2000). We consider six different error distributions: N(0, 1), Laplace, standard Cauchy, the t-distribution with 3 degrees of freedom, the mixture of normals 0.9N(0, 1) + 0.1N(0, 10²), and the log-normal distribution. Except for the Cauchy error, all the other generated random errors are standardized to have median 0 and variance 1. We consider sample sizes n = 400 and 800, and conduct 400 simulations for each case.
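One replication of this design can be generated as follows (our sketch, following the description above; only the normal error case is shown, and the other error laws can be swapped in for eps):

    # Generate one dataset from the Example 1 design with N(0,1) errors.
    gen_example1 <- function(n = 400) {
      U   <- runif(n)
      rho <- 2^(-1/2)                  # corr(X1, X2) = 2^{-1/2}
      X1  <- rnorm(n)
      X2  <- rho * X1 + sqrt(1 - rho^2) * rnorm(n)
      a0  <- exp(2 * U - 1)
      a1  <- 8 * U * (1 - U)
      a2  <- 2 * sin(2 * pi * U)^2
      eps <- rnorm(n)
      data.frame(Y = a0 + a1 * X1 + a2 * X2 + eps, U = U, X1 = X1, X2 = X2)
    }
    dat <- gen_example1(400)

With local_rank_fit from Section 4.1, the estimate â(u_0) at u_0 = 0.5 is then local_rank_fit(dat$Y, dat$U, cbind(dat$X1, dat$X2), u0 = 0.5, h = 0.1)$a.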

We compare the performance of the local rank estimate with the local least squares estimate using the square root of average squared errors (RASE), defined by

    RASE = {n_grid^{-1} Σ_{m=1}^p Σ_{k=1}^{n_grid} [â_m(u_k) − a_m(u_k)]²}^{1/2},

where {u_k : k = 1, ..., n_grid} is a set of grid points uniformly placed on [0, 1] with n_grid = 200. The sample mean and standard deviation of the RASEs over the 400 simulations are presented in Figures 1 and 2, for sample sizes n = 400 and 800, respectively. The two figures clearly demonstrate that the local rank estimator performs almost as well as the local least squares estimator for normal random error, and has smaller RASE for the other, heavier-tailed error distributions. The efficiency gain can be substantial; see, for example, the mixture normal case, where the observed relative efficiency of the local rank estimator versus the local least squares estimator is above 2 for most choices of bandwidth. For Cauchy random error, the local rank estimator yields a (nh)^{1/2}-consistent estimator, but the local least squares estimator is inconsistent, which is reflected by the extremely large values of RASE for the local least squares estimator. Figure 3 depicts the estimated coefficient functions for the normal random error and the mixture normal random error for a typical sample, which is selected in such a way that its RASE value is the median of the 400 RASE values. For this typical sample, we observe that the local rank estimator is almost identical to the local least squares estimator for normal random error, but falls much closer to the truth than the local least squares estimator does for mixture normal random error. Figure 4 plots the estimated coefficient functions for all 400 simulations when the random error has a mixture normal distribution. It is clear that the local rank estimator has smaller variance. In these two figures, we set the bandwidth to be the theoretical optimal one, h_opt, calculated using (15) and (16), for both the local rank estimator and the local least squares estimator.

At the end, we evaluate the resampling method (Section 3.4) for estimating the standard errors. We randomly perturb the objective function 1,000 times; each time the random variables V_i in (20) are generated from the Gamma(0.25, 2) distribution. Table 2 summarizes the simulation results at the three points u_0 = 0.25, 0.50, and 0.75. In the table, SD denotes the standard deviation of the 400 estimates â_m(u_0) and can be regarded as the true standard error; SE (std(SE)) denotes the mean (standard deviation) of the 400 estimated standard errors from the resampling method. Bandwidths are set to be the optimal ones. We observe that the resampling method estimates the standard error very accurately.

Example 2. As an illustration, we now apply the local rank procedure to the environmental dataset in Fan and Zhang (1999). Of interest is to study the relationship between the levels of pollutants and the number of total hospital admissions for circulatory and respiratory problems on every Friday from January 1, 1994 to December 31, 1995. The response variable is the logarithm of the number of total hospital admissions, and the covariates include the level of sulfur dioxide (X_1), the level of nitrogen dioxide (X_2), and the level of dust (X_3). A scatter plot of the response variable over time is given in Figure 5(a). We analyze this dataset using the following varying coefficient model:

    Y = a_0(u) + a_1(u)X_1 + a_2(u)X_2 + a_3(u)X_3 + ε,

where u denotes time and is scaled to the interval [0, 1]. We select the bandwidth via the relation (21). More specifically, we first use 20-fold cross-validation to select a bandwidth ĥ_LS for the local least squares estimator. We then use the function wilcoxontau in the R package for rank regression by Terpstra and McKean to estimate (12τ²)^{-1/2}, and use the MAD of the residuals to robustly estimate σ. All these lead to the selected bandwidth for the local rank estimator: ĥ_R = 0.26. The estimated coefficient functions are depicted in Figures 5(b), (c), and (d), where the two dashed curves around the solid line are the estimated function plus/minus twice the standard errors estimated by the resampling method. These two dashed lines can be regarded as a pointwise confidence interval with the bias ignored. The figures suggest clearly that the coefficient functions vary with time. The fitted curve is shown in Figure 5(a).

Now we demonstrate the robustness of the local rank procedure. To this end, we artificially perturb the dataset by moving the response value of the 68th observation from 5.89 to 6.89, and the response value of the 34th observation from 5.07 to 3.07. We refit the data with both the local least squares procedure and the local rank procedure; see Figure 6. We observe that the local least squares estimator changes dramatically due to the presence of these two artificial outliers. In contrast, the local rank estimator is nearly unaffected.

Figure 1. Bar graphs of the RASE with standard error for sample size n = 400 over 400 simulations. The light gray bar denotes the local least squares method and the dark gray bar denotes the local rank method. The horizontal axis is in units of h_opt, which is calculated separately for each method and error specification using either (13) or (14).

APPENDIX: PROOFS

Regularity Conditions

(C1) Assume that {U_i, X_i, Y_i} are independent and identically distributed, and that the random error ε and the covariates {U, X} are independent. Furthermore, assume that ε has probability density function g(·) which has finite Fisher information, that is, ∫ {g(x)}^{-1} {g'(x)}² dx < ∞; and that U has probability density function f(·).

(C2) The function a_m(·), m = 0, 1, ..., p, has a continuous second-order derivative in a neighborhood of u_0.

(C3) Assume that E(X_i | U_i = u_0) = 0 and that Σ(u) = E(X_i X_i^T | U_i = u) is continuous at u = u_0. The matrix Σ(u_0) is positive definite.

(C4) The kernel function K(·) is symmetric about the origin and has a bounded support. Assume that h → 0 and nh² → ∞ as n → ∞.

These conditions are used to facilitate the proofs, but may not be the weakest ones. The assumptions on the random errors in (C1) are the same as those for multiple linear rank regression (Hettmansperger and McKean 1998). (C2) imposes a smoothness requirement on the coefficient functions. In (C3), the assumption E(X_i | U_i = u_0) = 0 [also adopted by Kim (2007)] makes the presentation simpler but can be relaxed. It can be shown that the asymptotic normality still holds without this assumption. The conditions on the kernel function and the bandwidth in (C4) are common for nonparametric kernel smoothing.

In our proofs, we will use some results on generalized U-statistics, where the kernel function is allowed to depend on the sample size n. The generalized U-statistic has the form U_n = [n(n − 1)]^{-1} Σ_{i≠j} H_n(D_i, D_j), where {D_i}_{i=1}^n is a random sample and H_n is symmetric in its arguments, that is, H_n(D_i, D_j) = H_n(D_j, D_i). In this article, D_i = (X_i^T, U_i, ε_i)^T. Define r_n(D_i) = E[H_n(D_i, D_j) | D_i], r̄_n = E[r_n(D_i)], and Û_n = r̄_n + 2n^{-1} Σ_{i=1}^n [r_n(D_i) − r̄_n]. We will repeatedly use the following lemma, taken from Powell, Stock, and Stoker (1989).

Figure 2. Bar graphs of the RASE with standard error for sample size n = 800 over 400 simulations. The light gray bar denotes the local least squares method and the dark gray bar denotes the local rank method. The horizontal axis is in units of h_opt, which is calculated separately for each method and error specification using either (13) or (14).

Figure 3. Plot of estimated coefficient functions for a typical dataset.

Figure 4. (a) and (c) are plots of the 400 local least squares estimators of a_1(·) and a_2(·) over the 400 simulations, respectively. (b) and (d) are plots of the 400 local rank estimators of a_1(·) and a_2(·), respectively.

Table 2. Standard deviations of the local rank estimators with n = 400: SD and SE (std(SE)) for â_1(u_0) and â_2(u_0) at u_0 = 0.25, 0.50, and 0.75, under the normal, Laplace, mixture normal, t_3, and log-normal error distributions.

Lemma A1. If E[‖H_n(D_i, D_j)‖²] = o(n), then √n(U_n − Û_n) = o_p(1) and U_n = r̄_n + o_p(1).

We need the following two lemmas to prove Lemma 3.1. Denote

    A_{n11} = 2E[(Z_i − Z_j)(Z_i − Z_j)^T K_h(U_i − u_0) K_h(U_j − u_0)],
    A_{n12} = 2E[(Z_i − Z_j)((U_i − U_j)/h) K_h(U_i − u_0) K_h(U_j − u_0)],
    A_{n21} = A_{n12}^T,
    A_{n22} = 2E[((U_i − U_j)/h)² K_h(U_i − u_0) K_h(U_j − u_0)],

and define

    A_n = τ ( A_{n11}  A_{n12} ; A_{n21}  A_{n22} ).

Lemma A2. Suppose that Conditions (C1)-(C4) hold; then A_n → A, where A is defined in (8).

Proof. We can write

    A_{n11} = ( A¹_{n11}  A²_{n11} ; A³_{n11}  A⁴_{n11} ).

Let A¹_{n11} = 2E[(X_i − X_j)(X_i − X_j)^T K_h(U_i − u_0) K_h(U_j − u_0)]. Calculating the expectation by conditioning on U_i and U_j first, A¹_{n11} becomes

    2 ∫∫ E[(X_i − X_j)(X_i − X_j)^T | U_i = u, U_j = v] K_h(u − u_0) K_h(v − u_0) f(u) f(v) du dv.

Using Condition (C3), a straightforward calculation gives A¹_{n11} → 4f²(u_0)Σ(u_0). Let

    A²_{n11} = 2E{(X_i − X_j)[X_i(U_i − u_0)/h − X_j(U_j − u_0)/h]^T K_h(U_i − u_0) K_h(U_j − u_0)}.

Figure 5. (a) Scatterplot of the log of the number of total hospital admissions over time, where the solid curve is an estimator of the expected log of the number of hospital admissions over time at the average pollutant levels, that is, â_0(u) + â_1(u)X̄_1 + â_2(u)X̄_2 + â_3(u)X̄_3. (b), (c), and (d) are the estimated coefficient functions via the local rank estimator for a_k(·), k = 1, 2, and 3, respectively.

Figure 6. (a) Scatterplot of the perturbed data (without the two outliers shown, as they are outside of the range), and the local LS and the local rank estimators of the expected log of the number of hospital admissions. (b), (c), and (d) are the local LS and the local rank estimators of the coefficient functions a_k(·), k = 1, 2, and 3, respectively.

Using Condition (C3) and noticing that K(·) is symmetric, it can be shown that A²_{n11} → 0. By symmetry, A³_{n11} → 0. Similarly, we have

    A⁴_{n11} = 2E{[X_i(U_i − u_0)/h − X_j(U_j − u_0)/h][X_i(U_i − u_0)/h − X_j(U_j − u_0)/h]^T K_h(U_i − u_0) K_h(U_j − u_0)} → 4f²(u_0) Σ(u_0) μ_2.

Thus

    A_{n11} → 4f²(u_0) ( Σ(u_0)  0 ; 0  μ_2 Σ(u_0) ).

Similarly, we can show that A_{n12} = A_{n21}^T → 0, and

    A_{n22} = 2 ∫∫ (t_1 − t_2)² K(t_1) K(t_2) f(u_0 + ht_1) f(u_0 + ht_2) dt_1 dt_2 → 4f²(u_0) μ_2.

Lemma A3. Under Conditions (C1)-(C4), we have

    γ_n^{-1}[S_n(β*, α_2*) − S_n(0, 0)] = γ_n A (β*^T, α_2*)^T + o_p(1).

Proof. Let U_n = γ_n^{-1}[S_n(β*, α_2*) − S_n(0, 0)] = [n(n − 1)]^{-1} Σ_{i≠j} W_n(D_i, D_j), where

    W_n(D_i, D_j) = 2[I(e_i*(β*, α_2*) ≤ e_j*(β*, α_2*)) − I(e_i*(0, 0) ≤ e_j*(0, 0))] (Z_i^T − Z_j^T, (U_i − U_j)/h)^T K_h(U_i − u_0) K_h(U_j − u_0).

Let H_n(D_i, D_j) = [W_n(D_i, D_j) + W_n(D_j, D_i)]/2; then U_n = [n(n − 1)]^{-1} Σ_{i≠j} H_n(D_i, D_j) has the form of a generalized U-statistic. Note that E[‖H_n(D_i, D_j)‖²] ≤ (1/2)E[‖W_n(D_i, D_j)‖²] + (1/2)E[‖W_n(D_j, D_i)‖²] = E[‖W_n(D_i, D_j)‖²]. Furthermore,

    E[‖W_n(D_i, D_j)‖²] ≤ 4E{[(Z_i − Z_j)^T(Z_i − Z_j) + ((U_i − U_j)/h)²] K_h²(U_i − u_0) K_h²(U_j − u_0)} = O(h^{-2}) = o(n),

since nh² → ∞ by assumption. Thus U_n = E[H_n(D_i, D_j)] + o_p(1) by Lemma A1. Furthermore,

    E[H_n(D_i, D_j)] = 2E{ ∫ [G(ε + Δ_j(u_0) − Δ_i(u_0) − γ_n α_2*(U_j − U_i)/h − γ_n β*^T(Z_j − Z_i)) − G(ε + Δ_j(u_0) − Δ_i(u_0))] g(ε) dε (Z_i^T − Z_j^T, (U_i − U_j)/h)^T K_h(U_i − u_0) K_h(U_j − u_0) }
    = 2γ_n E{ ∫ g(ε + Δ_j(u_0) − Δ_i(u_0)) g(ε) dε (Z_i^T − Z_j^T, (U_i − U_j)/h)^T (Z_i^T − Z_j^T, (U_i − U_j)/h) K_h(U_i − u_0) K_h(U_j − u_0) } (β*^T, α_2*)^T {1 + o(1)}
    = γ_n A_n (β*^T, α_2*)^T {1 + o(1)},

where G(·) denotes the distribution function of ε. The proof is completed by using Lemma A2.

Proof of Lemma 3.1. In view of Lemma A3, it follows that the gradient of γ_n^{-1}[Q_n*(β*, α_2*) − B_n(β*, α_2*)] satisfies

    γ_n^{-1}[S_n(β*, α_2*) − S_n(0, 0)] − γ_n A (β*^T, α_2*)^T = o_p(1).

The proof then follows along the same lines as the proof of Theorem A.3.7 of Hettmansperger and McKean (1998), using a diagonal subsequencing argument and convexity.

Proof of Theorem 3.2. By Lemma 3.1, γ_n^{-1} Q_n*(s_1, s_2) = B_n(s_1, s_2) + r_n(s_1, s_2), where r_n(s_1, s_2) → 0 in probability uniformly over any bounded set. Note that γ_n^{-1} Q_n*(s_1, s_2) is minimized by (β̂_n*^T, α̂_{2n}*)^T, and B_n(s_1, s_2) is minimized by

    (β̄_n*^T, ᾱ_{2n}*)^T = −γ_n^{-2} A^{-1} (S_{n1}^T(0, 0), S_{n2}(0, 0))^T.

We first establish the asymptotic representation by following a similar argument as in Hjort and Pollard (1993). For any constant c > 0, define

    T_n = inf_{‖(s_1^T, s_2) − (β̄_n*^T, ᾱ_{2n}*)‖ = c} B_n(s_1, s_2) − B_n(β̄_n*, ᾱ_{2n}*),
    R_n = sup_{‖(s_1^T, s_2) − (β̄_n*^T, ᾱ_{2n}*)‖ ≤ c} |γ_n^{-1} Q_n*(s_1, s_2) − B_n(s_1, s_2)|;

then R_n → 0 in probability as n → ∞. Let (s_1^T, s_2)^T be an arbitrary point outside the ball {(s_1^T, s_2)^T : ‖(s_1^T, s_2) − (β̄_n*^T, ᾱ_{2n}*)‖ ≤ c}; then we can write (s_1^T, s_2)^T = (β̄_n*^T, ᾱ_{2n}*)^T + l·1_{2p+1}, where l > c is a positive constant and 1_d denotes a unit vector of length d. By the convexity of γ_n^{-1} Q_n*(s_1, s_2), we have

    (c/l) γ_n^{-1} Q_n*(s_1, s_2) + (1 − c/l) γ_n^{-1} Q_n*(β̄_n*, ᾱ_{2n}*) ≥ γ_n^{-1} Q_n*(β̄_n* + c1_{2p}, ᾱ_{2n}* + c).

Thus,

    (c/l)[γ_n^{-1} Q_n*(s_1, s_2) − γ_n^{-1} Q_n*(β̄_n*, ᾱ_{2n}*)]
    ≥ γ_n^{-1} Q_n*(β̄_n* + c1_{2p}, ᾱ_{2n}* + c) − γ_n^{-1} Q_n*(β̄_n*, ᾱ_{2n}*)
    = B_n(β̄_n* + c1_{2p}, ᾱ_{2n}* + c) + r_n(β̄_n* + c1_{2p}, ᾱ_{2n}* + c) − B_n(β̄_n*, ᾱ_{2n}*) − r_n(β̄_n*, ᾱ_{2n}*)
    ≥ T_n − 2R_n.

If R_n ≤ T_n/2, then γ_n^{-1} Q_n*(s_1, s_2) > γ_n^{-1} Q_n*(β̄_n*, ᾱ_{2n}*) for all (s_1^T, s_2)^T outside the ball. This implies that if R_n ≤ T_n/2, then the minimizer of

γ_n^{-1} Q_n* must be inside the ball. Thus

    P(‖(β̂_n*^T, α̂_{2n}*)^T − (β̄_n*^T, ᾱ_{2n}*)^T‖ ≥ c) ≤ P(R_n ≥ T_n/2) = P(R_n ≥ λc²/2) → 0,

where λ is the smallest eigenvalue of A. Therefore, (β̂_n*^T, α̂_{2n}*)^T = (β̄_n*^T, ᾱ_{2n}*)^T + o_p(1). This in particular implies the asymptotic representations (9), (11), and (12).

We next show the asymptotic normality of â(u_0). From (9), we have

    (nh)^{1/2}(â(u_0) − a(u_0)) = −γ_n^{-2}[4τ f²(u_0) Σ(u_0)]^{-1} S_{n11}(0, 0) + o_p(1),    (A1)

where

    S_{n11}(0, 0) = 2γ_n [n(n − 1)]^{-1} Σ_{i≠j} [I(ε_i + Δ_i(u_0) ≤ ε_j + Δ_j(u_0)) − 1/2] (X_i − X_j) K_h(U_i − u_0) K_h(U_j − u_0).    (A2)

By (A2), let us rewrite −γ_n^{-2} S_{n11}(0, 0) = S_{nA}(0, 0) + S_{nB}(0, 0), where

    S_{nA}(0, 0) = 2γ_n^{-1} [n(n − 1)]^{-1} Σ_{i≠j} [I(ε_i ≤ ε_j) − 1/2] (X_j − X_i) K_h(U_i − u_0) K_h(U_j − u_0),
    S_{nB}(0, 0) = 2γ_n^{-1} [n(n − 1)]^{-1} Σ_{i≠j} [I(ε_i + Δ_i(u_0) ≤ ε_j + Δ_j(u_0)) − I(ε_i ≤ ε_j)] (X_j − X_i) K_h(U_i − u_0) K_h(U_j − u_0).

We next prove that

    S_{nA}(0, 0) → N(0, (4/3) f³(u_0) ν_0 Σ(u_0)) in distribution.    (A3)

Note that we can write S_{nA}(0, 0) = √n [n(n − 1)]^{-1} Σ_{i≠j} H_n(D_i, D_j), where H_n(D_i, D_j) = W_n(D_i, D_j) + W_n(D_j, D_i) with

    W_n(D_i, D_j) = h^{-3/2} [I(ε_i ≤ ε_j) − 1/2] (X_j − X_i) K((U_i − u_0)/h) K((U_j − u_0)/h).

Similarly to the arguments in the proof of Lemma A3, it can be shown that E[‖H_n(D_i, D_j)‖²] = o(n). By Lemma A1, this implies that S_{nA}(0, 0) = 2n^{-1/2} Σ_{i=1}^n r_n(D_i) + o_p(1), since it is easy to check that r̄_n = 0. We have

    r_n(D_i) = E[H_n(D_i, D_j) | D_i]
    = 2h^{-1/2} [G(ε_i) − 1/2] K((U_i − u_0)/h) [∫ K(t) f(u_0 + ht) dt X_i − ∫ E(X_j | U_j = u_0 + ht) K(t) f(u_0 + ht) dt].

Furthermore,

    E[r_n(D_i) r_n^T(D_i)] = 4E{ h^{-1} [G(ε_i) − 1/2]² K²((U_i − u_0)/h) [∫ K(t) f(u_0 + ht) dt X_i − ∫ E(X_j | U_j = u_0 + ht) K(t) f(u_0 + ht) dt] [∫ K(t) f(u_0 + ht) dt X_i^T − ∫ E(X_j^T | U_j = u_0 + ht) K(t) f(u_0 + ht) dt] } → (1/3) f³(u_0) ν_0 Σ(u_0).

To prove the asymptotic normality of S_{nA}(0, 0), it is sufficient to check the Lindeberg-Feller condition: for every ε > 0, n^{-1} Σ_{i=1}^n E[‖r_n(D_i)‖² I(‖r_n(D_i)‖ > ε√n)] → 0. This can be easily verified by applying the dominated convergence theorem.

Next we show that

    S_{nB}(0, 0) = 2h² γ_n^{-1} τ f²(u_0) μ_2 Σ(u_0) a''(u_0)[1 + o(1)] + o_p(1).    (A4)

We may write S_{nB}(0, 0) = [n(n − 1)]^{-1} Σ_{i≠j} H_n*(D_i, D_j), where H_n*(D_i, D_j) = W_n*(D_i, D_j) + W_n*(D_j, D_i) with

    W_n*(D_i, D_j) = γ_n^{-1} [I(ε_i + Δ_i(u_0) ≤ ε_j + Δ_j(u_0)) − I(ε_i ≤ ε_j)] (X_j − X_i) K_h(U_i − u_0) K_h(U_j − u_0).

Note that

    Δ_j(u_0) − Δ_i(u_0) = (1/2)[(U_j − u_0)² X_j^T − (U_i − u_0)² X_i^T] a''(u_0) + (1/2)[(U_j − u_0)² − (U_i − u_0)²] a_0''(u_0) + o((U_i − u_0)²) + o((U_j − u_0)²).

By applying Lemma A1, it can be shown that S_{nB}(0, 0) = E[H_n*(D_i, D_j)] + o_p(1). It follows, by using the same arguments as those in the proof of Lemma A2, that

    E[H_n*(D_i, D_j)] = 2γ_n^{-1} E{ ∫ [G(ε + Δ_j(u_0) − Δ_i(u_0)) − G(ε)] g(ε) dε (X_j − X_i) K_h(U_i − u_0) K_h(U_j − u_0) }
    = 2γ_n^{-1} [τ + O(h)] E[(Δ_j(u_0) − Δ_i(u_0))(X_j − X_i) K_h(U_i − u_0) K_h(U_j − u_0)] (1 + o(1))
    = 2h² γ_n^{-1} τ f²(u_0) μ_2 Σ(u_0) a''(u_0)[1 + o(1)].

This proves (A4). By combining (A3) and (A4) and using the approximation given in (A1), we obtain (10).

Proof of Theorem 3.3. A result of Hodges and Lehmann (1956) indicates that 12σ²τ² has a lower bound 108/125, so the ARE has a lower bound (108/125)^{4/5} = 0.8896, with this lower bound being attained at the density f(t) = (3/(20√5))(5 − t²) I(|t| ≤ √5).

Proof of Theorem 3.4. Let

    V_n(α_1*, ξ_1, ξ_2, ξ_3) = n^{-1} Σ_{i=1}^n |ε_i − γ_n α_1* − ξ_1(U_i − u_0) − ξ_2^T X_i − ξ_3^T (U_i − u_0) X_i + Δ_i(u_0)| K_h(U_i − u_0),

where α_1* = γ_n^{-1}(α_1 − a_0(u_0)), ξ_1 ∈ R, ξ_2 ∈ R^p, and ξ_3 ∈ R^p. The subgradient of V_n(α_1*, ξ_1, ξ_2, ξ_3) with respect to α_1* is

    S_n*(α_1*, ξ_1, ξ_2, ξ_3) = 2γ_n n^{-1} Σ_{i=1}^n [I(ε_i ≤ γ_n α_1* + ξ_1(U_i − u_0) + ξ_2^T X_i + ξ_3^T (U_i − u_0) X_i − Δ_i(u_0)) − 1/2] K_h(U_i − u_0).

We have S_n*(0, 0, 0, 0) = 2γ_n n^{-1} Σ_{i=1}^n [I(ε_i ≤ −Δ_i(u_0)) − 1/2] K_h(U_i − u_0), which is the same as the S_{n0} defined in (18). Let U_n(α_1*, ξ_1, ξ_2, ξ_3) = γ_n^{-1}[S_n*(α_1*, ξ_1, ξ_2, ξ_3) − S_n*(0, 0, 0, 0)]; then

    U_n(α_1*, ξ_1, ξ_2, ξ_3) = 2n^{-1} Σ_{i=1}^n [I(ε_i ≤ γ_n α_1* + ξ_1(U_i − u_0) + ξ_2^T X_i + ξ_3^T (U_i − u_0) X_i − Δ_i(u_0)) − I(ε_i ≤ −Δ_i(u_0))] K_h(U_i − u_0).

For any positive constants c_i, i = 1, 2, 3, and ξ_1, ξ_2, ξ_3 such that |ξ_1| ≤ c_1 h^{-1} γ_n, ‖ξ_2‖ ≤ c_2 γ_n, and ‖ξ_3‖ ≤ c_3 h^{-1} γ_n, we have

    U_n(α_1*, ξ_1, ξ_2, ξ_3) = 2γ_n g(0) f(u_0) α_1* + o_p(1).    (A5)

This can be proved by directly checking the mean and the variance. More specifically,

    E[U_n(α_1*, ξ_1, ξ_2, ξ_3)] = 2E{ [G(γ_n α_1* + ξ_1(U_i − u_0) + ξ_2^T X_i + ξ_3^T (U_i − u_0) X_i − Δ_i(u_0)) − G(−Δ_i(u_0))] K_h(U_i − u_0) }
    = 2g(0) E{ [γ_n α_1* + ξ_1(U_i − u_0) + ξ_2^T X_i + ξ_3^T (U_i − u_0) X_i] K_h(U_i − u_0) }(1 + O(h))
    = 2γ_n g(0) f(u_0) α_1* (1 + O(h)).

And

    Var[U_n(α_1*, ξ_1, ξ_2, ξ_3)] ≤ 4n^{-1} E{ [I(ε_i ≤ γ_n α_1* + ξ_1(U_i − u_0) + ξ_2^T X_i + ξ_3^T (U_i − u_0) X_i − Δ_i(u_0)) − I(ε_i ≤ −Δ_i(u_0))]² K_h²(U_i − u_0) } ≤ 4n^{-1} E[K_h²(U_i − u_0)] = O(n^{-1} h^{-1}) = o(1).

By (A5) and a similar proof as that for Lemma 3.1, we have

    γ_n^{-1} V_n(α_1*, ξ_1, ξ_2, ξ_3) = V_n*(α_1*) + o_p(1),    (A6)

where V_n*(α_1*) = γ_n^{-1} S_n*(0, 0, 0, 0) α_1* + γ_n g(0) f(u_0) α_1*² + γ_n^{-1} V_n(0, 0, 0, 0). Because the function V_n(α_1*, ξ_1, ξ_2, ξ_3) is convex in its arguments, (A6) can be strengthened to uniform convergence (by the convexity lemma; see Pollard 1991), that is,

    sup |γ_n^{-1} V_n(α_1*, ξ_1, ξ_2, ξ_3) − V_n*(α_1*)| = o_p(1),

where the supremum is taken over α_1* ∈ C, |ξ_1| ≤ c_1 h^{-1} γ_n, ‖ξ_2‖ ≤ c_2 γ_n, ‖ξ_3‖ ≤ c_3 h^{-1} γ_n, and C is a compact set in R. By Theorem 3.2, α̂_2 − a_0'(u_0) = O_p(h^{-1} γ_n), â(u_0) − a(u_0) = O_p(γ_n), and â'(u_0) − a'(u_0) = O_p(h^{-1} γ_n); we thus have

    sup_{α_1* ∈ C} |γ_n^{-1} V_n(α_1*, α̂_2 − a_0'(u_0), â(u_0) − a(u_0), â'(u_0) − a'(u_0)) − V_n*(α_1*)| = o_p(1).

Note that V_n(α_1*, α̂_2 − a_0'(u_0), â(u_0) − a(u_0), â'(u_0) − a'(u_0)) = Q_{n0}(α_1*, α̂_2, β̂) and S_n*(0, 0, 0, 0) = S_{n0}, where Q_{n0} and S_{n0} are defined in Section 3.3. The quadratic function V_n*(α_1*) is minimized by ᾱ_{1n}* = −(1/2)γ_n^{-2}[g(0) f(u_0)]^{-1} S_{n0}. A similar argument as that for Theorem 3.2 shows that α̂_{1n}* = ᾱ_{1n}* + o_p(1). Thus we have (19).

We can write γ_n^{-2} S_{n0} = T_{1n} + T_{2n}, where

    T_{1n} = 2γ_n^{-1} n^{-1} Σ_{i=1}^n [I(ε_i ≤ 0) − 1/2] K_h(U_i − u_0),
    T_{2n} = 2γ_n^{-1} n^{-1} Σ_{i=1}^n [I(ε_i ≤ −Δ_i(u_0)) − I(ε_i ≤ 0)] K_h(U_i − u_0).

By the Lindeberg-Feller central limit theorem, T_{1n} → N(0, f(u_0)ν_0/3) in distribution. By checking the mean and variance, we have T_{2n} = −h² γ_n^{-1} g(0) f(u_0) a_0''(u_0) μ_2 (1 + o(1)) + o_p(1). Combining the above results and using (19), the proof is completed.

To prove Theorem 3.5, we first extend Lemma A1 to almost sure convergence.

Lemma A4. If E[‖H_n(D_i, D_j)‖²] = O(h^{-2}), then U_n − Û_n = o(1) almost surely and U_n = r̄_n + o(1) a.s.

Proof. The proof of Powell, Stock, and Stoker (1989) for Lemma A1 suggests that E[‖U_n − Û_n‖²] = O(n^{-2} h^{-2}). By Theorem 1.3.5 of Serfling (1980), Σ_n E[‖U_n − Û_n‖²] = O(Σ_n n^{-2} h^{-2}) < ∞. This implies that U_n − Û_n = o(1) almost surely. The second result follows by an application of the strong law of large numbers to Û_n.

Proof of Theorem 3.5. Let β* and α_2* be defined the same as before. We introduce the perturbed reparametrized objective function Q̃_n*(β*, α_2*), obtained by inserting the factor V_i + V_j into (6). Let S̃_n(β*, α_2*) = (S̃_{n1}^T(β*, α_2*), S̃_{n2}(β*, α_2*))^T denote the gradient function of Q̃_n*(β*, α_2*), which is defined similarly as in Section 3.1. We first show that S̃_n(β*, α_2*) has a similar local linear approximation as stated in Lemma A3. To make the proof concise, we prove this for S̃_{n1}(β*, α_2*), where

    S̃_{n1}(β*, α_2*) = 2γ_n [n(n − 1)]^{-1}

    × Σ_{i≠j} (V_i + V_j)[I(e_i*(β*, α_2*) ≤ e_j*(β*, α_2*)) − 1/2] (Z_i − Z_j) K_h(U_i − u_0) K_h(U_j − u_0).

Let U_n = γ_n^{-1}[S̃_{n1}(β*, α_2*) − S̃_{n1}(0, 0)] = [n(n − 1)]^{-1} Σ_{i≠j} (V_i + V_j) M_n(D_i, D_j, β*, α_2*), where M_n(D_i, D_j, β*, α_2*) = (1/2)[m_n(D_i, D_j, β*, α_2*) + m_n(D_j, D_i, β*, α_2*)] and

    m_n(D_i, D_j, β*, α_2*) = 2[I(e_i*(β*, α_2*) ≤ e_j*(β*, α_2*)) − I(e_i*(0, 0) ≤ e_j*(0, 0))] (Z_i − Z_j) K_h(U_i − u_0) K_h(U_j − u_0).

Note that U_n = 2n^{-1} Σ_{i=1}^n V_i [(n − 1)^{-1} Σ_{j≠i} M_n(D_i, D_j, β*, α_2*)]. Conditional on {D_i}_{i=1}^n, this is a weighted average of the V_i. Note that

    E(U_n | {D_i}_{i=1}^n) = [n(n − 1)]^{-1} Σ_{i≠j} M_n(D_i, D_j, β*, α_2*),
    Var(U_n | {D_i}_{i=1}^n) = 4n^{-2} Σ_{i=1}^n [(n − 1)^{-1} Σ_{j≠i} M_n(D_i, D_j, β*, α_2*)]².

By Lemma A4, it can be shown that [n(n − 1)]^{-1} Σ_{i≠j} M_n(D_i, D_j, β*, α_2*) = γ_n Ã β* + o(1) almost surely, where Ã = 4τ f²(u_0) diag(Σ(u_0), μ_2 Σ(u_0)). It is also easy to check that 4n^{-2} Σ_{i=1}^n [(n − 1)^{-1} Σ_{j≠i} M_n(D_i, D_j, β*, α_2*)]² = o(1) almost surely. Thus for almost surely every sequence {D_i}_{i=1}^n,

    U_n = γ_n Ã β* + o_p(1),

where the o_p(1) is in the probability space generated by {V_i}_{i=1}^n. The proofs of Lemma 3.1 and of the asymptotic representation in Theorem 3.2 can be similarly carried out to show that, for almost surely every sequence {D_i}_{i=1}^n,

    (nh)^{1/2}[ã_n(u_0) − a(u_0)] = −γ_n^{-2}[4τ f²(u_0) Σ(u_0)]^{-1} S̃_{nA}(0, 0) + o_p(1),    (A7)

where the o_p(1) is in the probability space generated by {V_i}_{i=1}^n, and

    S̃_{nA}(0, 0) = 2γ_n [n(n − 1)]^{-1} Σ_{i≠j} (V_i + V_j)[I(ε_i + Δ_i(u_0) ≤ ε_j + Δ_j(u_0)) − 1/2] (X_i − X_j) K_h(U_i − u_0) K_h(U_j − u_0).

The approximation (A1) can be strengthened to almost sure convergence, that is,

    (nh)^{1/2}[â_n(u_0) − a(u_0)] = −γ_n^{-2}[4τ f²(u_0) Σ(u_0)]^{-1} S_{n11}(0, 0) + o(1) a.s.    (A8)

Combining (A7) and (A8), we have that for almost surely every sequence {D_i}_{i=1}^n,

    (nh)^{1/2}[ã_n(u_0) − â_n(u_0)] = −γ_n^{-2}[4τ f²(u_0) Σ(u_0)]^{-1}[S̃_{nA}(0, 0) − S_{n11}(0, 0)] + o_p(1).

Note that

    γ_n^{-2}[S̃_{nA}(0, 0) − S_{n11}(0, 0)] = 2γ_n^{-1}[n(n − 1)]^{-1} Σ_{i≠j} [(V_i − 1/2) + (V_j − 1/2)][I(ε_i + Δ_i(u_0) ≤ ε_j + Δ_j(u_0)) − 1/2] (X_i − X_j) K_h(U_i − u_0) K_h(U_j − u_0)
    = 4γ_n^{-1} n^{-1} Σ_{i=1}^n (V_i − 1/2) { (n − 1)^{-1} Σ_{j≠i} [I(ε_i + Δ_i(u_0) ≤ ε_j + Δ_j(u_0)) − 1/2] (X_i − X_j) K_h(U_i − u_0) K_h(U_j − u_0) },

and E{γ_n^{-2}[S̃_{nA}(0, 0) − S_{n11}(0, 0)] | {D_i}_{i=1}^n} = 0. We have

    Var{γ_n^{-2}[S̃_{nA}(0, 0) − S_{n11}(0, 0)] | {D_i}_{i=1}^n} = W_1 + W_2,

where

    W_1 = 16γ_n^{-2} n^{-2} (n − 1)^{-2} Σ_{i=1}^n Σ_{j≠i} [I(ε_i + Δ_i(u_0) ≤ ε_j + Δ_j(u_0)) − 1/2]² (X_i − X_j)(X_i − X_j)^T K_h²(U_i − u_0) K_h²(U_j − u_0),
    W_2 = 16γ_n^{-2} n^{-2} (n − 1)^{-2} Σ_{i=1}^n Σ_{j_1≠i} Σ_{j_2≠i, j_2≠j_1} [I(ε_i + Δ_i(u_0) ≤ ε_{j_1} + Δ_{j_1}(u_0)) − 1/2][I(ε_i + Δ_i(u_0) ≤ ε_{j_2} + Δ_{j_2}(u_0)) − 1/2] (X_i − X_{j_1})(X_i − X_{j_2})^T K_h²(U_i − u_0) K_h(U_{j_1} − u_0) K_h(U_{j_2} − u_0).

Lemma A4 can be used to show that W_1 = o(1) almost surely, and a minor extension of Lemma A4 to third-order U-statistics can be used to show that W_2 = (4/3) f³(u_0) ν_0 Σ(u_0) + o(1) almost surely. The asymptotic normality of γ_n^{-2}[S̃_{nA}(0, 0) − S_{n11}(0, 0)] follows by showing that the condition of the Lindeberg-Feller central limit theorem for triangular arrays holds almost surely. We have, for almost surely every sequence {D_i}_{i=1}^n,

    γ_n^{-2}[S̃_{nA}(0, 0) − S_{n11}(0, 0)] → N(0, (4/3) f³(u_0) ν_0 Σ(u_0))

in distribution. This completes the proof.

[Received January 2009. Revised June 2009.]

REFERENCES

Brumback, B., and Rice, J. A. (1998), "Smoothing Spline Models for the Analysis of Nested and Crossed Samples of Curves" (with discussion), Journal of the American Statistical Association, 93.

Cai, Z., Fan, J., and Li, R. (2000), "Efficient Estimation and Inferences for Varying-Coefficient Models," Journal of the American Statistical Association, 95.

Cleveland, W. S., Grosse, E., and Shyu, W. M. (1992), "Local Regression Models," in Statistical Models in S, eds. J. M. Chambers and T. J. Hastie, Pacific Grove, CA: Wadsworth & Brooks, pp.


More information

232 Calculus and Structures

232 Calculus and Structures 3 Calculus and Structures CHAPTER 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS FOR EVALUATING BEAMS Calculus and Structures 33 Copyrigt Capter 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS 17.1 THE

More information

A New Diagnostic Test for Cross Section Independence in Nonparametric Panel Data Model

A New Diagnostic Test for Cross Section Independence in Nonparametric Panel Data Model e University of Adelaide Scool of Economics Researc Paper No. 2009-6 October 2009 A New Diagnostic est for Cross Section Independence in Nonparametric Panel Data Model Jia Cen, Jiti Gao and Degui Li e

More information

Financial Econometrics Prof. Massimo Guidolin

Financial Econometrics Prof. Massimo Guidolin CLEFIN A.A. 2010/2011 Financial Econometrics Prof. Massimo Guidolin A Quick Review of Basic Estimation Metods 1. Were te OLS World Ends... Consider two time series 1: = { 1 2 } and 1: = { 1 2 }. At tis

More information

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES Ronald Ainswort Hart Scientific, American Fork UT, USA ABSTRACT Reports of calibration typically provide total combined uncertainties

More information

IEOR 165 Lecture 10 Distribution Estimation

IEOR 165 Lecture 10 Distribution Estimation IEOR 165 Lecture 10 Distribution Estimation 1 Motivating Problem Consider a situation were we ave iid data x i from some unknown distribution. One problem of interest is estimating te distribution tat

More information

Local Orthogonal Polynomial Expansion (LOrPE) for Density Estimation

Local Orthogonal Polynomial Expansion (LOrPE) for Density Estimation Local Ortogonal Polynomial Expansion (LOrPE) for Density Estimation Alex Trindade Dept. of Matematics & Statistics, Texas Tec University Igor Volobouev, Texas Tec University (Pysics Dept.) D.P. Amali Dassanayake,

More information

Differentiation in higher dimensions

Differentiation in higher dimensions Capter 2 Differentiation in iger dimensions 2.1 Te Total Derivative Recall tat if f : R R is a 1-variable function, and a R, we say tat f is differentiable at x = a if and only if te ratio f(a+) f(a) tends

More information

MANY scientific and engineering problems can be

MANY scientific and engineering problems can be A Domain Decomposition Metod using Elliptical Arc Artificial Boundary for Exterior Problems Yajun Cen, and Qikui Du Abstract In tis paper, a Diriclet-Neumann alternating metod using elliptical arc artificial

More information

New families of estimators and test statistics in log-linear models

New families of estimators and test statistics in log-linear models Journal of Multivariate Analysis 99 008 1590 1609 www.elsevier.com/locate/jmva ew families of estimators and test statistics in log-linear models irian Martín a,, Leandro Pardo b a Department of Statistics

More information

A Simple Matching Method for Estimating Sample Selection Models Using Experimental Data

A Simple Matching Method for Estimating Sample Selection Models Using Experimental Data ANNALS OF ECONOMICS AND FINANCE 6, 155 167 (2005) A Simple Matcing Metod for Estimating Sample Selection Models Using Experimental Data Songnian Cen Te Hong Kong University of Science and Tecnology and

More information

Kernel Smoothing and Tolerance Intervals for Hierarchical Data

Kernel Smoothing and Tolerance Intervals for Hierarchical Data Clemson University TigerPrints All Dissertations Dissertations 12-2016 Kernel Smooting and Tolerance Intervals for Hierarcical Data Cristoper Wilson Clemson University, cwilso6@clemson.edu Follow tis and

More information

Fast optimal bandwidth selection for kernel density estimation

Fast optimal bandwidth selection for kernel density estimation Fast optimal bandwidt selection for kernel density estimation Vikas Candrakant Raykar and Ramani Duraiswami Dept of computer science and UMIACS, University of Maryland, CollegePark {vikas,ramani}@csumdedu

More information

Kernel Density Estimation

Kernel Density Estimation Kernel Density Estimation Univariate Density Estimation Suppose tat we ave a random sample of data X 1,..., X n from an unknown continuous distribution wit probability density function (pdf) f(x) and cumulative

More information

Click here to see an animation of the derivative

Click here to see an animation of the derivative Differentiation Massoud Malek Derivative Te concept of derivative is at te core of Calculus; It is a very powerful tool for understanding te beavior of matematical functions. It allows us to optimize functions,

More information

1. Which one of the following expressions is not equal to all the others? 1 C. 1 D. 25x. 2. Simplify this expression as much as possible.

1. Which one of the following expressions is not equal to all the others? 1 C. 1 D. 25x. 2. Simplify this expression as much as possible. 004 Algebra Pretest answers and scoring Part A. Multiple coice questions. Directions: Circle te letter ( A, B, C, D, or E ) net to te correct answer. points eac, no partial credit. Wic one of te following

More information

Boosting Kernel Density Estimates: a Bias Reduction. Technique?

Boosting Kernel Density Estimates: a Bias Reduction. Technique? Boosting Kernel Density Estimates: a Bias Reduction Tecnique? Marco Di Marzio Dipartimento di Metodi Quantitativi e Teoria Economica, Università di Cieti-Pescara, Viale Pindaro 42, 65127 Pescara, Italy

More information

Bootstrap prediction intervals for Markov processes

Bootstrap prediction intervals for Markov processes arxiv: arxiv:0000.0000 Bootstrap prediction intervals for Markov processes Li Pan and Dimitris N. Politis Li Pan Department of Matematics University of California San Diego La Jolla, CA 92093-0112, USA

More information

Brazilian Journal of Physics, vol. 29, no. 1, March, Ensemble and their Parameter Dierentiation. A. K. Rajagopal. Naval Research Laboratory,

Brazilian Journal of Physics, vol. 29, no. 1, March, Ensemble and their Parameter Dierentiation. A. K. Rajagopal. Naval Research Laboratory, Brazilian Journal of Pysics, vol. 29, no. 1, Marc, 1999 61 Fractional Powers of Operators of sallis Ensemble and teir Parameter Dierentiation A. K. Rajagopal Naval Researc Laboratory, Wasington D. C. 2375-532,

More information

Symmetry Labeling of Molecular Energies

Symmetry Labeling of Molecular Energies Capter 7. Symmetry Labeling of Molecular Energies Notes: Most of te material presented in tis capter is taken from Bunker and Jensen 1998, Cap. 6, and Bunker and Jensen 2005, Cap. 7. 7.1 Hamiltonian Symmetry

More information

Estimating Peak Bone Mineral Density in Osteoporosis Diagnosis by Maximum Distribution

Estimating Peak Bone Mineral Density in Osteoporosis Diagnosis by Maximum Distribution International Journal of Clinical Medicine Researc 2016; 3(5): 76-80 ttp://www.aascit.org/journal/ijcmr ISSN: 2375-3838 Estimating Peak Bone Mineral Density in Osteoporosis Diagnosis by Maximum Distribution

More information

Logistic Kernel Estimator and Bandwidth Selection. for Density Function

Logistic Kernel Estimator and Bandwidth Selection. for Density Function International Journal of Contemporary Matematical Sciences Vol. 13, 2018, no. 6, 279-286 HIKARI Ltd, www.m-ikari.com ttps://doi.org/10.12988/ijcms.2018.81133 Logistic Kernel Estimator and Bandwidt Selection

More information

INFINITE ORDER CROSS-VALIDATED LOCAL POLYNOMIAL REGRESSION. 1. Introduction

INFINITE ORDER CROSS-VALIDATED LOCAL POLYNOMIAL REGRESSION. 1. Introduction INFINITE ORDER CROSS-VALIDATED LOCAL POLYNOMIAL REGRESSION PETER G. HALL AND JEFFREY S. RACINE Abstract. Many practical problems require nonparametric estimates of regression functions, and local polynomial

More information

Linearized Primal-Dual Methods for Linear Inverse Problems with Total Variation Regularization and Finite Element Discretization

Linearized Primal-Dual Methods for Linear Inverse Problems with Total Variation Regularization and Finite Element Discretization Linearized Primal-Dual Metods for Linear Inverse Problems wit Total Variation Regularization and Finite Element Discretization WENYI TIAN XIAOMING YUAN September 2, 26 Abstract. Linear inverse problems

More information

Copyright c 2008 Kevin Long

Copyright c 2008 Kevin Long Lecture 4 Numerical solution of initial value problems Te metods you ve learned so far ave obtained closed-form solutions to initial value problems. A closedform solution is an explicit algebriac formula

More information

Numerical Experiments Using MATLAB: Superconvergence of Nonconforming Finite Element Approximation for Second-Order Elliptic Problems

Numerical Experiments Using MATLAB: Superconvergence of Nonconforming Finite Element Approximation for Second-Order Elliptic Problems Applied Matematics, 06, 7, 74-8 ttp://wwwscirporg/journal/am ISSN Online: 5-7393 ISSN Print: 5-7385 Numerical Experiments Using MATLAB: Superconvergence of Nonconforming Finite Element Approximation for

More information

158 Calculus and Structures

158 Calculus and Structures 58 Calculus and Structures CHAPTER PROPERTIES OF DERIVATIVES AND DIFFERENTIATION BY THE EASY WAY. Calculus and Structures 59 Copyrigt Capter PROPERTIES OF DERIVATIVES. INTRODUCTION In te last capter you

More information

Function Composition and Chain Rules

Function Composition and Chain Rules Function Composition and s James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 8, 2017 Outline 1 Function Composition and Continuity 2 Function

More information

Numerical Differentiation

Numerical Differentiation Numerical Differentiation Finite Difference Formulas for te first derivative (Using Taylor Expansion tecnique) (section 8.3.) Suppose tat f() = g() is a function of te variable, and tat as 0 te function

More information

SECTION 3.2: DERIVATIVE FUNCTIONS and DIFFERENTIABILITY

SECTION 3.2: DERIVATIVE FUNCTIONS and DIFFERENTIABILITY (Section 3.2: Derivative Functions and Differentiability) 3.2.1 SECTION 3.2: DERIVATIVE FUNCTIONS and DIFFERENTIABILITY LEARNING OBJECTIVES Know, understand, and apply te Limit Definition of te Derivative

More information

New Distribution Theory for the Estimation of Structural Break Point in Mean

New Distribution Theory for the Estimation of Structural Break Point in Mean New Distribution Teory for te Estimation of Structural Break Point in Mean Liang Jiang Singapore Management University Xiaou Wang Te Cinese University of Hong Kong Jun Yu Singapore Management University

More information

Mass Lumping for Constant Density Acoustics

Mass Lumping for Constant Density Acoustics Lumping 1 Mass Lumping for Constant Density Acoustics William W. Symes ABSTRACT Mass lumping provides an avenue for efficient time-stepping of time-dependent problems wit conforming finite element spatial

More information

LIMITS AND DERIVATIVES CONDITIONS FOR THE EXISTENCE OF A LIMIT

LIMITS AND DERIVATIVES CONDITIONS FOR THE EXISTENCE OF A LIMIT LIMITS AND DERIVATIVES Te limit of a function is defined as te value of y tat te curve approaces, as x approaces a particular value. Te limit of f (x) as x approaces a is written as f (x) approaces, as

More information

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point MA00 Capter 6 Calculus and Basic Linear Algebra I Limits, Continuity and Differentiability Te concept of its (p.7 p.9, p.4 p.49, p.55 p.56). Limits Consider te function determined by te formula f Note

More information

THE STURM-LIOUVILLE-TRANSFORMATION FOR THE SOLUTION OF VECTOR PARTIAL DIFFERENTIAL EQUATIONS. L. Trautmann, R. Rabenstein

THE STURM-LIOUVILLE-TRANSFORMATION FOR THE SOLUTION OF VECTOR PARTIAL DIFFERENTIAL EQUATIONS. L. Trautmann, R. Rabenstein Worksop on Transforms and Filter Banks (WTFB),Brandenburg, Germany, Marc 999 THE STURM-LIOUVILLE-TRANSFORMATION FOR THE SOLUTION OF VECTOR PARTIAL DIFFERENTIAL EQUATIONS L. Trautmann, R. Rabenstein Lerstul

More information

Nonparametric Covariance Model. Yin, J., Geng, Z., Li, R., & Wang, H. The Pennsylvania State University. Technical Report Series #08-90

Nonparametric Covariance Model. Yin, J., Geng, Z., Li, R., & Wang, H. The Pennsylvania State University. Technical Report Series #08-90 Nonparametric Covariance Model Yin, J., Geng, Z., Li, R., & Wang, H. Te Pennsylvania State University Tecnical Report Series #08-90 College of Healt and Human Development Te Pennsylvania State University

More information

Estimation of boundary and discontinuity points in deconvolution problems

Estimation of boundary and discontinuity points in deconvolution problems Estimation of boundary and discontinuity points in deconvolution problems A. Delaigle 1, and I. Gijbels 2, 1 Department of Matematics, University of California, San Diego, CA 92122 USA 2 Universitair Centrum

More information

Uniform Consistency for Nonparametric Estimators in Null Recurrent Time Series

Uniform Consistency for Nonparametric Estimators in Null Recurrent Time Series Te University of Adelaide Scool of Economics Researc Paper No. 2009-26 Uniform Consistency for Nonparametric Estimators in Null Recurrent Time Series Jiti Gao, Degui Li and Dag Tjøsteim Te University of

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics 1

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics 1 Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics 1 By Jiti Gao 2 and Maxwell King 3 Abstract We propose a simultaneous model specification procedure for te conditional

More information

NUMERICAL DIFFERENTIATION. James T. Smith San Francisco State University. In calculus classes, you compute derivatives algebraically: for example,

NUMERICAL DIFFERENTIATION. James T. Smith San Francisco State University. In calculus classes, you compute derivatives algebraically: for example, NUMERICAL DIFFERENTIATION James T Smit San Francisco State University In calculus classes, you compute derivatives algebraically: for example, f( x) = x + x f ( x) = x x Tis tecnique requires your knowing

More information

Homework 1 Due: Wednesday, September 28, 2016

Homework 1 Due: Wednesday, September 28, 2016 0-704 Information Processing and Learning Fall 06 Homework Due: Wednesday, September 8, 06 Notes: For positive integers k, [k] := {,..., k} denotes te set of te first k positive integers. Wen p and Y q

More information

arxiv: v1 [math.oc] 18 May 2018

arxiv: v1 [math.oc] 18 May 2018 Derivative-Free Optimization Algoritms based on Non-Commutative Maps * Jan Feiling,, Amelie Zeller, and Cristian Ebenbauer arxiv:805.0748v [mat.oc] 8 May 08 Institute for Systems Teory and Automatic Control,

More information

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab To appear in: Advances in Neural Information Processing Systems 9, eds. M. C. Mozer, M. I. Jordan and T. Petsce. MIT Press, 997 Bayesian Model Comparison by Monte Carlo Caining David Barber D.Barber@aston.ac.uk

More information

Department of Econometrics and Business Statistics

Department of Econometrics and Business Statistics ISSN 1440-771X Australia Department of Econometrics and Business Statistics ttp://www.buseco.monas.edu.au/depts/ebs/pubs/wpapers/ Bayesian Bandwidt Estimation in Nonparametric Time-Varying Coefficient

More information

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these.

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these. Mat 11. Test Form N Fall 016 Name. Instructions. Te first eleven problems are wort points eac. Te last six problems are wort 5 points eac. For te last six problems, you must use relevant metods of algebra

More information

Tail Conditional Expectations for Extended Exponential Dispersion Models

Tail Conditional Expectations for Extended Exponential Dispersion Models American Researc Journal of Matematics Original Article ISSN 378-704 Volume 1 Issue 4 015 Tail Conditional Expectations for Extended Exponential Dispersion Models Ye (Zoe) Ye Qiang Wu and Don Hong 1 Program

More information

Finding and Using Derivative The shortcuts

Finding and Using Derivative The shortcuts Calculus 1 Lia Vas Finding and Using Derivative Te sortcuts We ave seen tat te formula f f(x+) f(x) (x) = lim 0 is manageable for relatively simple functions like a linear or quadratic. For more complex

More information

Generic maximum nullity of a graph

Generic maximum nullity of a graph Generic maximum nullity of a grap Leslie Hogben Bryan Sader Marc 5, 2008 Abstract For a grap G of order n, te maximum nullity of G is defined to be te largest possible nullity over all real symmetric n

More information

Material for Difference Quotient

Material for Difference Quotient Material for Difference Quotient Prepared by Stepanie Quintal, graduate student and Marvin Stick, professor Dept. of Matematical Sciences, UMass Lowell Summer 05 Preface Te following difference quotient

More information

HOMEWORK HELP 2 FOR MATH 151

HOMEWORK HELP 2 FOR MATH 151 HOMEWORK HELP 2 FOR MATH 151 Here we go; te second round of omework elp. If tere are oters you would like to see, let me know! 2.4, 43 and 44 At wat points are te functions f(x) and g(x) = xf(x)continuous,

More information

EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING

EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING Statistica Sinica 13(2003), 641-653 EFFICIENT REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLING J. K. Kim and R. R. Sitter Hankuk University of Foreign Studies and Simon Fraser University Abstract:

More information

Empirical models for estimating liquefaction-induced lateral spread displacement

Empirical models for estimating liquefaction-induced lateral spread displacement Empirical models for estimating liquefaction-induced lateral spread displacement J.J. Zang and J.X. Zao Institute of Geological & Nuclear Sciences Ltd, Lower Hutt, New Zealand. 2004 NZSEE Conference ABSTRACT:

More information

Differential Calculus (The basics) Prepared by Mr. C. Hull

Differential Calculus (The basics) Prepared by Mr. C. Hull Differential Calculus Te basics) A : Limits In tis work on limits, we will deal only wit functions i.e. tose relationsips in wic an input variable ) defines a unique output variable y). Wen we work wit

More information

Exercises for numerical differentiation. Øyvind Ryan

Exercises for numerical differentiation. Øyvind Ryan Exercises for numerical differentiation Øyvind Ryan February 25, 2013 1. Mark eac of te following statements as true or false. a. Wen we use te approximation f (a) (f (a +) f (a))/ on a computer, we can

More information

DEPARTMENT MATHEMATIK SCHWERPUNKT MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE

DEPARTMENT MATHEMATIK SCHWERPUNKT MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE U N I V E R S I T Ä T H A M B U R G A note on residual-based empirical likeliood kernel density estimation Birte Musal and Natalie Neumeyer Preprint No. 2010-05 May 2010 DEPARTMENT MATHEMATIK SCHWERPUNKT

More information

. If lim. x 2 x 1. f(x+h) f(x)

. If lim. x 2 x 1. f(x+h) f(x) Review of Differential Calculus Wen te value of one variable y is uniquely determined by te value of anoter variable x, ten te relationsip between x and y is described by a function f tat assigns a value

More information

ch (for some fixed positive number c) reaching c

ch (for some fixed positive number c) reaching c GSTF Journal of Matematics Statistics and Operations Researc (JMSOR) Vol. No. September 05 DOI 0.60/s4086-05-000-z Nonlinear Piecewise-defined Difference Equations wit Reciprocal and Cubic Terms Ramadan

More information

THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Math 225

THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Math 225 THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Mat 225 As we ave seen, te definition of derivative for a Mat 111 function g : R R and for acurveγ : R E n are te same, except for interpretation:

More information

Taylor Series and the Mean Value Theorem of Derivatives

Taylor Series and the Mean Value Theorem of Derivatives 1 - Taylor Series and te Mean Value Teorem o Derivatives Te numerical solution o engineering and scientiic problems described by matematical models oten requires solving dierential equations. Dierential

More information

Computers and Mathematics with Applications. A nonlinear weighted least-squares finite element method for Stokes equations

Computers and Mathematics with Applications. A nonlinear weighted least-squares finite element method for Stokes equations Computers Matematics wit Applications 59 () 5 4 Contents lists available at ScienceDirect Computers Matematics wit Applications journal omepage: www.elsevier.com/locate/camwa A nonlinear weigted least-squares

More information

Exam 1 Review Solutions

Exam 1 Review Solutions Exam Review Solutions Please also review te old quizzes, and be sure tat you understand te omework problems. General notes: () Always give an algebraic reason for your answer (graps are not sufficient),

More information

INTRODUCTION AND MATHEMATICAL CONCEPTS

INTRODUCTION AND MATHEMATICAL CONCEPTS INTODUCTION ND MTHEMTICL CONCEPTS PEVIEW Tis capter introduces you to te basic matematical tools for doing pysics. You will study units and converting between units, te trigonometric relationsips of sine,

More information

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LAURA EVANS.. Introduction Not all differential equations can be explicitly solved for y. Tis can be problematic if we need to know te value of y

More information

On the Identifiability of the Post-Nonlinear Causal Model

On the Identifiability of the Post-Nonlinear Causal Model UAI 9 ZHANG & HYVARINEN 647 On te Identifiability of te Post-Nonlinear Causal Model Kun Zang Dept. of Computer Science and HIIT University of Helsinki Finland Aapo Hyvärinen Dept. of Computer Science,

More information

STAT Homework X - Solutions

STAT Homework X - Solutions STAT-36700 Homework X - Solutions Fall 201 November 12, 201 Tis contains solutions for Homework 4. Please note tat we ave included several additional comments and approaces to te problems to give you better

More information

An Empirical Bayesian interpretation and generalization of NL-means

An Empirical Bayesian interpretation and generalization of NL-means Computer Science Tecnical Report TR2010-934, October 2010 Courant Institute of Matematical Sciences, New York University ttp://cs.nyu.edu/web/researc/tecreports/reports.tml An Empirical Bayesian interpretation

More information

3. Using your answers to the two previous questions, evaluate the Mratio

3. Using your answers to the two previous questions, evaluate the Mratio MASSACHUSETTS INSTITUTE OF TECHNOLOGY DEPARTMENT OF MECHANICAL ENGINEERING CAMBRIDGE, MASSACHUSETTS 0219 2.002 MECHANICS AND MATERIALS II HOMEWORK NO. 4 Distributed: Friday, April 2, 2004 Due: Friday,

More information

Notes on Multigrid Methods

Notes on Multigrid Methods Notes on Multigrid Metods Qingai Zang April, 17 Motivation of multigrids. Te convergence rates of classical iterative metod depend on te grid spacing, or problem size. In contrast, convergence rates of

More information

REVIEW LAB ANSWER KEY

REVIEW LAB ANSWER KEY REVIEW LAB ANSWER KEY. Witout using SN, find te derivative of eac of te following (you do not need to simplify your answers): a. f x 3x 3 5x x 6 f x 3 3x 5 x 0 b. g x 4 x x x notice te trick ere! x x g

More information

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x)

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x) Calculus. Gradients and te Derivative Q f(x+) δy P T δx R f(x) 0 x x+ Let P (x, f(x)) and Q(x+, f(x+)) denote two points on te curve of te function y = f(x) and let R denote te point of intersection of

More information