Local Adaptive Smoothing in Kernel Regression Estimation


Clemson University TigerPrints, All Theses

Local Adaptive Smoothing in Kernel Regression Estimation
Qi Zheng, Clemson University

Recommended Citation: Zheng, Qi, "Local Adaptive Smoothing in Kernel Regression Estimation" (2009). All Theses. This thesis is brought to you for free and open access by the Theses at TigerPrints. It has been accepted for inclusion in All Theses by an authorized administrator of TigerPrints.

Local Adaptive Smoothing in Kernel Regression Estimation

A Master's Thesis Presented to the Graduate School of Clemson University, In Partial Fulfillment of the Requirements for the Degree Master of Science, Mathematical Science

by Qi Zheng, August 2009

Accepted by: Dr. Karunarathna B. Kulasekera, Committee Co-Chair; Dr. Colin Gallagher, Committee Co-Chair; Dr. Chanseok Park

Abstract

We consider nonparametric estimation of a smooth regression function of one variable. In practice it is quite popular to use the data to select one global smoothing parameter. Such global selection procedures cannot sufficiently account for local sparseness of the covariate, nor can they adapt to local curvature of the regression function. We propose a new method to select local smoothing parameters which takes into account sparseness and adapts to local curvature of the regression function. A Bayesian method allows the smoothing parameter to adapt to the local sparseness of the covariate and provides the basis for a local cross-validation procedure which adjusts smoothing according to local curvature of the regression function. Simulation evidence indicates that the method can result in significant reductions of both point-wise mean squared error and integrated mean squared error of the estimators.

Dedication

I dedicate this work to my loving parents. It is their love and support that made this work a complete one.

Acknowledgments

I would like to express my heartiest gratitude to my co-advisors, Dr. K. B. Kulasekera and Dr. C. Gallagher, for their guidance and support. Their mathematical insights inspired me and helped me make this a success. My sincere thanks to Dr. C. Park for his insightful suggestions.

Table of Contents

Title Page
Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
1 Introduction
2 Adaptive bandwidth selection for kernel estimation of regression
  2.1 Bayesian Smoothing window
  2.2 Local cross-validation
3 Numerical results
4 Conclusions
Appendices
  A Proof of Theorem 2.1
  B More numerical results
Bibliography

List of Tables

3.1 EIMSE for m(x) = …, σ = 0.3
3.2 EIMSE for m(x) = 0.3 exp{-64(x - 0.25)^2} + 0.7 exp{-256(x - 0.75)^2}, σ = 0.1
3.3 EIMSE for m(x) = x + 0.5 exp{-50(x - 0.5)^2}, σ = 0.3

List of Figures

3.1 m(x) = …, σ = 0.3
3.2 m(x) = 0.3 exp{-64(x - 0.25)^2} + 0.7 exp{-256(x - 0.75)^2}, σ = 0.1
3.3 Regression estimation of Ethanol Data
1 m(x) = …, σ = 0.1
2 m(x) = 0.3 exp{-64(x - 0.25)^2} + 0.7 exp{-256(x - 0.75)^2}, σ = 0.3
3 m(x) = x + 0.5 exp{-50(x - 0.5)^2}, σ = 0.1
4 m(x) = x + 0.5 exp{-50(x - 0.5)^2}, σ = 0.3
5 m(x) = sin(5πx), σ = 0.1
6 m(x) = sin(5πx), σ = 0.3
7 m(x) = 10 exp(-10x), σ = 0.1
8 m(x) = 10 exp(-10x), σ = 0.3

Chapter 1 Introduction

The kernel method for regression estimation has become a standard statistical technique in many areas of research. It is well known that the performance of kernel methods depends crucially on the smoothing parameter (the bandwidth). This fact has motivated the data-driven bandwidth procedures appearing in the literature. Among these, we mention three: least squares cross-validation (LSCV), proposed by Craven and Wahba (1979); plug-in methods (e.g., see Hardle and Bowman (1988)); and AIC-based methods (e.g., see Akaike (1973)). However, in many situations a single fixed bandwidth does not generally provide a good estimator of a regression function. One reason for this is that the amount of smoothing needed for estimating a function where the data are dense is different from that required where the data are sparse. Besides that, a bandwidth chosen to optimally estimate the function at or near a local mode may differ from the bandwidth required to get an accurate estimate of the regression function where the function is flat. In order to overcome the disadvantages of a fixed bandwidth, it is necessary to allow the bandwidth to somehow adapt locally. Apart from the direct plug-in method, which requires knowledge of the unknown mean function, very few data-based methods to select local bandwidths have been proposed. Fan et al. (1996) proposed an adaptive procedure to obtain local bandwidths. Their technique uses a plug-in approach coupled with an arbitrary approximation scheme to develop a bandwidth as a function of the point of estimation. However, as the authors mention, their procedure does not work well for moderate sample sizes.

To allow a smoothing method to adapt to sparseness in the data, the choice of a local smoothing parameter should depend on the design density of the covariate. The local sparseness

of the covariate would be reflected by a data-based local bandwidth for a design density estimator. Gangopadhyay and Cheung (2002) and Kulasekera and Padgett (2006) proposed adaptive Bayesian bandwidth selection for density estimation. They treat the bandwidth as a scale parameter and use a prior distribution on it to compensate for the lack of information in moderate to small sample sizes. As a result, their approaches perform well for moderate sample sizes. In this article, we combine Bayesian estimation of a bandwidth for the design density with a local cross-validation to get local bandwidths that enhance nonparametric regression estimator performance. We describe our procedures for local linear regression, but the method can be adapted to other smoothed estimators of regression functions. We show that judicious choices of the prior parameters lead to local bandwidths tending to zero as the sample size increases, and thus the estimator using bandwidths from our procedure is consistent.

This article is structured as follows. In Section 2 we develop the Bayesian cross-validation procedure. Results obtained from a simulation study, followed by a real data example, are provided in Section 3. All the proofs are presented in the Appendix.

Chapter 2 Adaptive bandwidth selection for kernel estimation of regression

We describe the development of the Bayesian cross-validation approach for kernel regression estimators and discuss asymptotic properties of the local smoothing parameters from our procedure. We begin by developing some notation. Let (X_i, Y_i), i = 1, 2, ..., n, be n independent and identically distributed (i.i.d.) bivariate observations with joint density function f(·, ·). Let the marginal density of X be f(·). Let K be a classical second-order kernel function, that is,

(1) \int K(u)\,du = 1, \quad (2) \int u K(u)\,du = 0, \quad (3) M_2 = \int u^2 K(u)\,du \ne 0, \quad (4) V = \int K(u)^2\,du < \infty.

Our goal is to estimate the regression function, which is the conditional expectation

m(x) = E[Y \mid X = x]    (2.1)

(assuming f(x) > 0). Then the model can be written as

Y = m(X) + ε

where E[ε | X] = 0 and V[ε | X] = σ^2. In this article, we consider the local linear estimator of m(x) given by

\hat m_h(x) = \frac{1}{n h(x)} \sum_{i=1}^{n} K\!\left(\frac{X_i - x}{h(x)}\right) \frac{M_{2n}(x) - \frac{X_i - x}{h(x)} M_{1n}(x)}{M_{2n}(x) M_{0n}(x) - M_{1n}^2(x)}\, Y_i    (2.2)

where M_{jn}(x) = (n h(x))^{-1} \sum_{i=1}^{n} K\!\left(\frac{X_i - x}{h(x)}\right) \left(\frac{X_i - x}{h(x)}\right)^j, j = 0, 1, 2, and h(x) is the bandwidth to be selected at x.

Our aim is to select a bandwidth that depends on the data. Noticing that the amount of smoothing needed where X is dense differs from that needed where X is sparse, we initially obtain an adaptive smoothing window based on the X observations, using a Bayesian approach (see Kulasekera and Padgett, 2006). This window will then be used to conduct a local cross-validation resulting in the h(x) in (2.2). The reason for a local cross-validation is to accommodate different mean functions with the same design density for X.

2.1 Bayesian Smoothing window

Here we develop an initial bandwidth which adapts to the local sparseness of the data. Define

f_h(x) = (f * K_h)(x) = \int f(u) K_h(x - u)\,du = E[K_h(X - x)]    (2.3)

where K_h(x) = \frac{1}{h} K\!\left(\frac{x}{h}\right). Now, considering h as a parameter of f_h, for a prior density π(h) the posterior distribution of h is

\pi(h \mid x) = \frac{f_h(x)\,\pi(h)}{\int f_h(x)\,\pi(h)\,dh}    (2.4)

Since f_h(x) is unknown, we cannot compute π(h | x) directly. However, it is natural to estimate f_h(x) by the sample mean

\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(X_i - x).

Substituting this in (2.4), we get

\hat\pi(h \mid X_1, \ldots, X_n, x) = \frac{\hat f_h(x)\,\pi(h)}{\int \hat f_h(x)\,\pi(h)\,dh}    (2.5)

Then, for the squared-error loss, the best local bandwidth h = h(x) is given by the posterior mean

h_1(x) = \int h\,\hat\pi(h \mid X_1, \ldots, X_n, x)\,dh    (2.6)

Note that with this approach the posterior is a function of h only and, with a well-selected prior and kernel, \hat\pi(h | X_1, ..., X_n, x) and h_1(x) can be explicitly obtained. Although this approach works in principle for any kernel and the most suitable prior (preferably a conjugate prior), for the remainder of this article we shall use a normal kernel and an inverted-gamma prior, due to their algebraic simplicity. In particular, for the normal kernel

K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}, \quad -\infty < u < \infty,

coupled with the inverted-gamma prior

\pi(h) = \frac{2}{\Gamma(\alpha)\,\beta^{\alpha}\, h^{2\alpha+1}} \exp\!\left\{-\frac{1}{\beta h^2}\right\}, \quad \alpha > 0,\ \beta > 0,\ h > 0,

we get

\hat\pi(h \mid X_1, \ldots, X_n, x) = \frac{\sum_{i=1}^{n} h^{-(2\alpha+2)} \exp\{-(1/h^2)((X_i - x)^2/2 + 1/\beta)\}}{\sum_{i=1}^{n} (\Gamma(\alpha + 1/2)/2)\,\{(X_i - x)^2/2 + 1/\beta\}^{-(\alpha+1/2)}}

resulting in

h_1(x) = \frac{\Gamma(\alpha) \sum_{i=1}^{n} \{2/(\beta (X_i - x)^2 + 2)\}^{\alpha}}{\sqrt{\beta}\,\Gamma(\alpha + 1/2) \sum_{i=1}^{n} \{2/(\beta (X_i - x)^2 + 2)\}^{\alpha + 1/2}}    (2.7)

The following theorem shows that if one picks the prior parameters in a suitable manner (fix α and let β diverge at a proper rate as the sample size increases), one can get a sequence of h_1(x) that converges to zero almost surely, for every x.

Theorem 2.1. Let f(·) be continuous and bounded away from 0 at x. If α > 0 is fixed, β → ∞, and nβ^{-1/2} → ∞ as n → ∞, then h_1(x) → 0 and nh_1(x) → ∞ with probability one, as n → ∞.
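Both (2.2) and (2.7) are closed-form and straightforward to compute. The following R sketch is ours, not from the thesis; the function names, the interface, and the rescaling by max(w) (to keep w^α from underflowing when α is large) are our own choices, assuming a Gaussian kernel throughout.

```r
# Local linear estimator (2.2) at a single point x; Gaussian kernel K.
loclin <- function(x, X, Y, h) {
  n  <- length(X)
  u  <- (X - x) / h
  Ku <- dnorm(u)                         # K((X_i - x)/h)
  M0 <- sum(Ku)       / (n * h)          # M_{0n}(x)
  M1 <- sum(Ku * u)   / (n * h)          # M_{1n}(x)
  M2 <- sum(Ku * u^2) / (n * h)          # M_{2n}(x)
  # equivalent-kernel form of (2.2); the denominator can degenerate (0/0)
  # when h is too small for the local design
  sum(Ku * (M2 - u * M1) * Y) / (n * h * (M2 * M0 - M1^2))
}

# Bayesian local bandwidth (2.7): posterior mean of h given X_1, ..., X_n and x.
h1.bayes <- function(x, X, alpha, beta) {
  w  <- 2 / (beta * (X - x)^2 + 2)       # each term lies in (0, 1]
  r  <- w / max(w)                       # rescale so r^alpha does not underflow
  Ca <- exp(lgamma(alpha) - lgamma(alpha + 0.5))  # Gamma(alpha)/Gamma(alpha+1/2)
  Ca * sum(r^alpha) / (sqrt(beta * max(w)) * sum(r^(alpha + 0.5)))
}
```

Since the terms w_i are largest for the X_i nearest x, a large α makes h_1(x) track the local design: the window shrinks where observations crowd around x and widens where they are sparse.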

The proof of this theorem is given in the Appendix.

Remark 2.1. Implementation of the Bayesian procedure requires a specification of the parameters α and β of the prior distribution. From the proof of Theorem 2.1 in Appendix A, we can see that the asymptotic rate of the Bayesian bandwidth h_1(x) lies between β^{-1/2} and β^{-(2α-1)/(4α+2)}. Thus, if we let α be sufficiently large and choose β between n^{2/5} and n^{4/5}, the Bayesian bandwidths converge to 0 at a rate close to that of the mean squared optimal bandwidths.

2.2 Local cross-validation

The above Bayesian bandwidth only accounts for the influence of the distribution of X. In kernel regression estimation, however, the bandwidths must also reflect the impact of the responses. This motivates the following modification of the Bayesian bandwidths to obtain the final bandwidths for estimating the regression function. Since m(·) is continuous and our object is to estimate m(·) locally at x, we argue that only the segment of observations in the neighborhood of x is vitally important. In particular, for estimating m at x, we propose to conduct a cross-validation using only the observations falling in I_x = [x - h_1(x), x + h_1(x)]. The version of local CV in this context is developed as follows. First let

\hat m_{-i}(X_i) = \frac{1}{(n-1)h} \sum_{j \ne i} K\!\left(\frac{X_i - X_j}{h}\right) \frac{M_{2,-i}(X_i) - \frac{X_i - X_j}{h} M_{1,-i}(X_i)}{M_{2,-i}(X_i) M_{0,-i}(X_i) - M_{1,-i}^2(X_i)}\, Y_j

and

M_{k,-i}(X_i) = ((n-1)h)^{-1} \sum_{j \ne i} K\!\left(\frac{X_i - X_j}{h}\right)\left(\frac{X_i - X_j}{h}\right)^k, \quad k = 0, 1, 2.

Now, let l(x) denote the number of covariate values falling in I_x, and let (X_i', Y_i'), i = 1, ..., l(x), denote the corresponding observations. We define the local cross-validation bandwidth h_2(x) by

h_2(x) = \arg\min_h \sum_{i=1}^{l(x)} \left(Y_i' - \hat m_{-i}(X_i')\right)^2    (2.8)

Remark 2.2. Note that the bandwidth h_2(x) is selected from a set of candidate bandwidths when minimizing the leave-one-out cross-validation criterion above. In global cross-validation procedures this set is usually taken as [n^{-1+η}, n^{-η}] for some small η > 0. Such cross-validations produce bandwidths of order n^{-1/5} with probability tending to 1 as the sample size n → ∞ (see Girard, 1998). Since the number of observations contained in the interval I_x is of order nh_1(x) under reasonable conditions on the density of X, h_2 above will be of order [nh_1(x)]^{-1/5}. Thus, the proposed adaptive bandwidth h_2 can achieve a near asymptotically optimal rate for a suitably chosen β in the prior.
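A direct implementation of (2.8) restricted to I_x might look as follows. This sketch is ours; it uses the equally spaced candidate grid and the fewer-than-three-observations fallback described in Chapter 3.

```r
# Local CV bandwidth (2.8): leave-one-out CV over the observations in
# I_x = [x - h1, x + h1], minimized over a grid of candidates in (0, h1].
h2.localcv <- function(x, X, Y, h1) {
  idx <- which(abs(X - x) <= h1)       # observations falling in I_x
  l   <- length(idx)
  if (l < 3) return(h1)                # too few points: keep the Bayes window
  grid <- (1:l) * h1 / l               # l equally spaced candidate bandwidths
  cv <- sapply(grid, function(h)
    sum((Y[idx] - sapply(idx, function(i)
      loclin(X[i], X[-i], Y[-i], h)))^2))   # leave-one-out local linear fit
  grid[which.min(cv)]                  # which.min skips NaN (degenerate fits)
}
```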

Chapter 3 Numerical results

To illustrate the improvement of our Bayesian locally adaptive CV procedure over a few existing methods, we conducted a Monte Carlo simulation. Implementation of our technique is straightforward; however, there are two steps that require care. The first is how to choose the parameters α and β of the prior distribution. In our numerical work, we chose α = 100 and β = n for small and moderate sample sizes. The second is the cross-validation. Notice that one needs at least three observations to do the local cross-validation for local linear estimators. Thus, in practice, if l(x) is at least 3, we do the local cross-validation to modify the Bayesian bandwidth h_1(x); for windows with fewer than 3 observations, we used the Bayesian bandwidth. Also, in practice, cross-validation is generally carried out over a sequence of candidate bandwidths. In our simulations, the grid of h values for each x is given by j h_1(x)/l(x), j = 1, ..., l(x), which equally spaces the initial Bayesian window according to the number of observations l(x) falling in the window.

The performance of the local linear estimator using different bandwidth selection procedures is related to the sample size, the distribution of X, the true regression function, and the true standard deviation of the errors. Although only a few selected results are reported here for the sake of brevity, the following settings of these factors were examined, with 1,000 replications for each factor combination.

(a) sample size: n = 25 (small), 50 (moderate)

(b) distributions of X: Uniform(0, 1), Normal(0.5, 0.25)

(c) regression functions:

(i) m(x) = … (a polynomial regression function with a trend)

(ii) m(x) = sin(5πx) (a function without much fine structure)

(iii) m(x) = 10 exp(-10x) (a function with a trend but no fine structure)

(iv) m(x) = x + 0.5 exp{-50(x - 0.5)^2} (a function with different degrees of curvature)

(v) m(x) = 0.3 exp{-64(x - 0.25)^2} + 0.7 exp{-256(x - 0.75)^2} (a function with noticeably different curvature)

(d) error standard deviation: σ = 0.1, 0.3

Most of the regression functions above were used in earlier studies (Ruppert et al., 1995; Hart and Yi, 1996; Herrmann, 1997; Eubank, 1999). We used R (Ihaka and Gentleman, 1996) for all computations. We compare the local linear estimators \hat m_l(x) using the Bayes type local bandwidth to local linear estimators \hat m_g(x) using the global CV bandwidth by comparing the estimated MSE (ESMSE) of each estimator based on the simulations. Specifically, we consider the logged ratio

r(x) = \log\!\left(\frac{ESMSE(\hat m_g(x))}{ESMSE(\hat m_l(x))}\right)

and the estimated integrated MSE

EIMSE(\hat m) = \frac{1}{p}\sum_{i=1}^{p} ESMSE(\hat m(x_i))

over (0, 1), since (0, 1) contains more than 95% of the observations for both design densities in our simulation study. Here ESMSE(\hat m(x)) = \sum_{j=1}^{N} (\hat m_j(x) - m(x))^2 / N at any given x for any estimator \hat m(x) of m(x), with N being the number of simulations and \hat m_j(x) the estimate from the j-th simulation; the x_i, i = 1, ..., p, are equally spaced points in (0, 1). We chose p = 100 for our simulations.

Table 3.1: EIMSE for m(x) = …, σ = 0.3
n    Design Density    Bayesian Local CV    Global CV
25   Uniform           …                    …
25   Normal            …                    …
50   Uniform           …                    …
50   Normal            …                    …
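To make the Monte Carlo summaries concrete, here is a compressed sketch of one factor combination. The code is ours; it assumes the loclin, h1.bayes, and h2.localcv functions sketched in Chapter 2 and uses a reduced number of replications.

```r
set.seed(1)
n <- 50; sigma <- 0.3; N <- 200            # thesis uses 1,000 replications
m  <- function(x) 0.3 * exp(-64 * (x - 0.25)^2) + 0.7 * exp(-256 * (x - 0.75)^2)
xg <- seq(0.005, 0.995, length.out = 100)  # p = 100 grid points in (0, 1)

fit.one <- function() {                    # one replication, uniform design
  X <- runif(n)
  Y <- m(X) + rnorm(n, sd = sigma)
  sapply(xg, function(x) {
    h1 <- h1.bayes(x, X, alpha = 100, beta = n)
    loclin(x, X, Y, h2.localcv(x, X, Y, h1))
  })
}
fits <- t(replicate(N, fit.one()))         # N x p matrix of estimates

esmse <- colMeans(sweep(fits, 2, m(xg))^2) # pointwise ESMSE over the grid
eimse <- mean(esmse)                       # EIMSE
# r(x) additionally needs the analogous ESMSE vector esmse.g for the
# global-CV estimator:  r <- log(esmse.g / esmse)
```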

[Figure 3.1: m(x) = …, σ = 0.3. Panels (a) n = 25 and (b) n = 50 show r(x) (solid: uniform design, dashed: normal design); panel (c) shows the regression function.]

Table 3.1 and Figures 3.1-3.2 give some results for two typical mean functions: (1) m(x) = …, (2) m(x) = 0.3 exp{-64(x - 0.25)^2} + 0.7 exp{-256(x - 0.75)^2}. The results with other mean functions and parameter combinations were very similar, and are presented in Appendix B. Figure 3.1 gives the logged ratio r(x) for the regression function m(x) = … with σ = 0.3, where the global CV works well for small and moderate sample sizes with the uniform design. As shown in Figure 3.1, our procedure outperforms the global CV over a large proportion of the interval even for the uniform design when the sample size is small. Although it does not outperform the global CV everywhere for the uniform design when the sample size is moderate, it is still preferable, since the integrated MSE (IMSE) of \hat m_l(x) was less than that of \hat m_g(x) (see Table 3.1). As for the normal design, the Bayesian local CV dominates the global CV nearly everywhere, especially where the data are sparse. Figure 3.1 shows that our method not only works well for those regression functions on which the global CV performs well; moreover, it gives a much better estimate when the design is not uniform. This phenomenon is simply a manifestation of the fact that our technique takes the sparseness of the design into account and adapts to the design density.

Figure 3.2 shows the logged ratio r(x) for the mean function m(x) = 0.3 exp{-64(x - 0.25)^2} + 0.7 exp{-256(x - 0.75)^2} with σ = 0.1.

[Figure 3.2: m(x) = 0.3 exp{-64(x - 0.25)^2} + 0.7 exp{-256(x - 0.75)^2}, σ = 0.1. Panels (a) n = 25 and (b) n = 50 show r(x) (solid: uniform design, dashed: normal design); panel (c) shows the regression function.]

Table 3.2: EIMSE for m(x) = 0.3 exp{-64(x - 0.25)^2} + 0.7 exp{-256(x - 0.75)^2}, σ = 0.1
n    Design Density    Bayesian Local CV    Global CV
25   Uniform           …                    …
25   Normal            …                    …
50   Uniform           …                    …
50   Normal            …                    …

This mean function has two modes and the degrees of its curvature are noticeably different; hence the global CV performs relatively poorly, especially in areas where the design density is sparse. Figure 3.2 clearly indicates that our Bayesian local cross-validation selection is better than the global CV method for small and moderate sample sizes. This is expected, since our procedure adapts to the mean function and design density locally. The improvement of our method is not significant for sample size 100 or larger, since the global CV performs well with a large sample size.

In addition to the above, we examined the performance of our method against the improved AIC method (AIC_c) proposed by Hurvich et al. (1998) and the adaptive bandwidth selection mentioned in Fan et al. (1996). Fan et al. (1996) explicitly discussed smoothing parameter selection for density estimation only. They suggested, however, that their method, although not performing well for small to moderate sample sizes, can be extended to regression estimation. Following their approach, we used the same spline approximation to the optimal bandwidth function h(x), where the

proposed cross-validation over a large set of cubic spline interpolants of prechosen knot-bandwidth pairs {(a_1, h_1), ..., (a_p, h_p)} (in their notation) was conducted by minimizing

\sum_{i=1}^{n} \left(\hat m_{\hat h_i}(X_i) - Y_i\right)^2

where \hat m_{\hat h_i}(X_i) is the estimated regression function for the bandwidth calculated at X_i for each spline function. The knots (a_1, ..., a_p) for our simulation were chosen to be equispaced on [0, 1] with p = 6. We chose four values {0.5\hat h_G, \hat h_G, 1.5\hat h_G, 2\hat h_G} for each coordinate h_i, i = 1, ..., p, where \hat h_G was the global cross-validation bandwidth. The MSE performance of local linear estimators with Fan et al. type bandwidths and of those with AIC_c were both inferior to that of our proposed method. We provide the EIMSE for local linear estimators using these bandwidths in Table 3.3.

Table 3.3: EIMSE for m(x) = x + 0.5 exp{-50(x - 0.5)^2}, σ = 0.3
n    Design Density    Bayesian Local CV    Fan et al. type bandwidth    AIC_c
50   Uniform           …                    …                            …
50   Normal            …                    …                            …

The use of this Bayesian local cross-validation procedure is not restricted to the local linear estimator. We examined the Nadaraya-Watson estimator, proposed by Nadaraya (1964) and Watson (1964), with our local bandwidths and obtained performance similar to the above.

To illustrate the use of the proposed idea, we now apply the proposed methodology to analyze a real data set. This data set pertains to an exhaust study using ethanol as the fuel in an experimental engine (Cleveland, 1993). The experiment determines the dependence of the concentration of oxides of nitrogen, major air pollutants, on various engine settings. In particular, the response variable Y is the concentration of nitric oxide plus the concentration of nitrogen dioxide in the exhaust of an experimental engine when the engine is set at different equivalence ratios (the amount of ethanol). We modeled Y as a function of the equivalence ratio x using the local linear estimator. The Bayesian local CV estimated regression functions for several pairs of prior parameters, along with the global CV type estimator, are presented in Figure 3.3.

[Figure 3.3: regression estimation of the Ethanol Data (n = 88). Solid: α = 100, β = n^{2/5}; dashed: α = 100, β = n; dotted: global CV. Horizontal axis: amount of ethanol (equivalence ratio); vertical axis: NOx.]
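Cleveland's ethanol data are available in R as the `ethanol` data frame of the lattice package (variables NOx and E, the equivalence ratio; 88 observations). The thesis does not say how the data were loaded, so that is our assumption; with the functions sketched in Chapter 2, a fit along the lines of Figure 3.3 could be produced as follows.

```r
library(lattice)                        # supplies Cleveland's ethanol data frame
data(ethanol)
X <- ethanol$E                          # equivalence ratio (amount of ethanol)
Y <- ethanol$NOx                        # oxides of nitrogen
n <- length(X)                          # n = 88
xg <- seq(min(X), max(X), length.out = 200)
fit <- sapply(xg, function(x) {
  h1 <- h1.bayes(x, X, alpha = 100, beta = n^(2/5))  # one prior pair from Fig. 3.3
  loclin(x, X, Y, h2.localcv(x, X, Y, h1))
})
plot(X, Y, xlab = "Equivalence ratio", ylab = "NOx")
lines(xg, fit)
```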

Chapter 4 Conclusions

The Bayesian local cross-validation procedure for smoothing parameter selection proposed here has several advantages, as seen in our simulations. The finite sample dominance of these estimators, coupled with reasonable asymptotic properties, makes such bandwidths highly desirable. However, there are several issues in the Bayesian approach that need further investigation. These include the examination of the Bayes criterion coupled with other smoothing parameter selection criteria, such as AIC and AIC_c.

Appendices

Appendix A Proof of Theorem 2.1

Here we show that h_1(x) → 0 and nh_1(x) → ∞ as n → ∞. Since β diverges as the sample size increases, we write β as β_n. Choose ε_n such that (1) ε_n → ∞ and (2) ε_n β_n^{-1/2} → 0, as n → ∞. Since Γ(α) is a constant for fixed α, (2.7) can be rewritten as

h_1(x) = \frac{C_\alpha}{\sqrt{\beta_n}} \cdot \frac{\sum_{i=1}^{n} \left(\frac{2}{\beta_n (X_i - x)^2 + 2}\right)^{\alpha}}{\sum_{i=1}^{n} \left(\frac{2}{\beta_n (X_i - x)^2 + 2}\right)^{\alpha + 1/2}}    (1)

where C_α = Γ(α)/Γ(α + 1/2). Consider the numerator of (1). Splitting on whether |X_i - x| ≤ ε_n β_n^{-1/2}, and noting that each summand is at most 1 while β_n(X_i - x)^2 > ε_n^2 on the complementary event,

\sum_{i=1}^{n} \left(\frac{2}{\beta_n (X_i - x)^2 + 2}\right)^{\alpha} \le \sum_{i=1}^{n} I\left[|X_i - x| \le \varepsilon_n \beta_n^{-1/2}\right] + n \left(\frac{2}{\varepsilon_n^2 + 2}\right)^{\alpha}.

Consider the denominator of (1). Keeping only the terms with |X_i - x| ≤ ε_n^{-1} β_n^{-1/2},

\sum_{i=1}^{n} \left(\frac{2}{\beta_n (X_i - x)^2 + 2}\right)^{\alpha + 1/2} \ge \sum_{i=1}^{n} I\left[|X_i - x| \le \varepsilon_n^{-1} \beta_n^{-1/2}\right] \left(\frac{2}{\varepsilon_n^{-2} + 2}\right)^{\alpha + 1/2}.

Let f_n^{*}(x) denote \frac{\sqrt{\beta_n}}{2 n \varepsilon_n} \sum_{i=1}^{n} I[|X_i - x| \le \varepsilon_n \beta_n^{-1/2}], and let f_n^{**}(x) denote \frac{\varepsilon_n \sqrt{\beta_n}}{2n} \sum_{i=1}^{n} I[|X_i - x| \le \varepsilon_n^{-1} \beta_n^{-1/2}]; both are density estimators of f with uniform kernels and bandwidths ε_n β_n^{-1/2} and ε_n^{-1} β_n^{-1/2}, respectively.

Then, combining the two inequalities above, we get

\frac{\sum_{i=1}^{n} (2/(\beta_n (X_i - x)^2 + 2))^{\alpha}}{\sum_{i=1}^{n} (2/(\beta_n (X_i - x)^2 + 2))^{\alpha + 1/2}} \le \frac{2 \varepsilon_n \beta_n^{-1/2} f_n^{*}(x) + (2/(\varepsilon_n^2 + 2))^{\alpha}}{2 \varepsilon_n^{-1} \beta_n^{-1/2} \left(2/(\varepsilon_n^{-2} + 2)\right)^{\alpha + 1/2} f_n^{**}(x)}.

Since both \sum_{n=1}^{\infty} \exp(-\gamma n (\varepsilon_n \beta_n^{-1/2})^2) and \sum_{n=1}^{\infty} \exp(-\gamma n (\varepsilon_n^{-1} \beta_n^{-1/2})^2) converge for every γ > 0 with our choices of ε_n and β_n, we have sup_x |f_n^{*}(x) - f(x)| → 0 and sup_x |f_n^{**}(x) - f(x)| → 0 almost surely, as n → ∞, by Theorem 2.1.3 of Prakasa Rao (1983). Notice that ε_n → ∞ and (2/(ε_n^{-2} + 2))^{α+1/2} → 1 as n → ∞; since f is bounded away from 0 at x, it follows that, with probability one, for sufficiently large n,

h_1(x) \le C \left(\varepsilon_n^2 \beta_n^{-1/2} + \varepsilon_n^{1 - 2\alpha}\right)

for some constant C. The optimal rate of ε_n minimizing the right-hand side is β_n^{1/(2(2α+1))}. Then

h_1(x) \le C \beta_n^{-(2\alpha - 1)/(4\alpha + 2)}.

Also, since 2/(β_n(X_i - x)^2 + 2) ≤ 1, we have

\sum_{i=1}^{n} \left(\frac{2}{\beta_n (X_i - x)^2 + 2}\right)^{\alpha} \ge \sum_{i=1}^{n} \left(\frac{2}{\beta_n (X_i - x)^2 + 2}\right)^{\alpha + 1/2},

so that h_1(x) ≥ C_α β_n^{-1/2}. Therefore

C_\alpha \beta_n^{-1/2} \le h_1(x) \le C \beta_n^{-(2\alpha - 1)/(4\alpha + 2)}    (2)

By the assumptions β_n → ∞ and nβ_n^{-1/2} → ∞, we get h_1(x) → 0 and nh_1(x) → ∞ as n → ∞. Hence, h_1(x) is a proper bandwidth for the regression estimator at every x.
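As a quick arithmetic check of (2) (our calculation, based on the bounds reconstructed above): taking β_n = n^{2/5} gives

C_\alpha \beta_n^{-1/2} = C_\alpha n^{-1/5}, \qquad C \beta_n^{-(2\alpha - 1)/(4\alpha + 2)} = C n^{-(2\alpha - 1)/(5(2\alpha + 1))} \to C n^{-1/5} \quad (\alpha \to \infty),

so for large α both endpoints of (2) are of the mean squared optimal order n^{-1/5}, in agreement with Remark 2.1.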

Appendix B More numerical results

More figures from our simulation study are presented here. These figures show the improvement of our procedure convincingly.

[Figure 1: m(x) = …, σ = 0.1. Panels (a) n = 25, (b) n = 50: r(x); solid: uniform design, dashed: normal design. Bottom panel: m(x).]

[Figure 2: m(x) = 0.3 exp{-64(x - 0.25)^2} + 0.7 exp{-256(x - 0.75)^2}, σ = 0.3. Panels (a) n = 25, (b) n = 50: r(x); solid: uniform design, dashed: normal design. Bottom panel: m(x).]

[Figure 3: m(x) = x + 0.5 exp{-50(x - 0.5)^2}, σ = 0.1. Panels (a) n = 25, (b) n = 50: r(x); solid: uniform design, dashed: normal design. Bottom panel: m(x).]

[Figure 4: m(x) = x + 0.5 exp{-50(x - 0.5)^2}, σ = 0.3. Panels (a) n = 25, (b) n = 50: r(x); solid: uniform design, dashed: normal design. Bottom panel: m(x).]

[Figure 5: m(x) = sin(5πx), σ = 0.1. Panels (a) n = 25, (b) n = 50: r(x); solid: uniform design, dashed: normal design. Bottom panel: m(x).]

[Figure 6: m(x) = sin(5πx), σ = 0.3. Panels (a) n = 25, (b) n = 50: r(x); solid: uniform design, dashed: normal design. Bottom panel: m(x).]

[Figure 7: m(x) = 10 exp(-10x), σ = 0.1. Panels (a) n = 25, (b) n = 50: r(x); solid: uniform design, dashed: normal design. Bottom panel: m(x).]

[Figure 8: m(x) = 10 exp(-10x), σ = 0.3. Panels (a) n = 25, (b) n = 50: r(x); solid: uniform design, dashed: normal design. Bottom panel: m(x).]

Bibliography

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proc. 2nd Int. Symp. Information Theory (eds. B. N. Petrov and F. Csàki), pp. 267-281. Budapest: Akadémiai Kiadó.

Cleveland, W. (1993). Visualizing Data, Hobart Press, Lafayette.

Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions. Numerische Mathematik, 31, 377-403.

Eubank, R. L. (1999). Nonparametric Regression and Spline Smoothing, Marcel Dekker, New York.

Fan, J., Hall, P., Martin, M. and Patil, P. (1996). On local smoothing of nonparametric curve estimators. Journal of the American Statistical Association, 91, 258-266.

Gangopadhyay, A. and Cheung, K. (2002). Bayesian approach to the choice of smoothing parameter in kernel density estimation. Journal of Nonparametric Statistics, 14(6), 655-664.

Girard, D. A. (1998). Asymptotic comparison of (partial) cross-validation, GCV and randomized GCV in nonparametric regression. Annals of Statistics, 26(1), 315-334.

Hardle, W. and Bowman, A. W. (1988). Bootstrapping in nonparametric regression: local adaptive smoothing and confidence bands. Journal of the American Statistical Association, 83, 102-110.

Hart, J. D. and Yi, S. (1996). One-sided cross-validation. Unpublished manuscript.

Herrmann, E. (1997). Local bandwidth choice in kernel regression estimation. Journal of Computational and Graphical Statistics, 6, 35-54.

Hurvich, C. M., Simonoff, J. S. and Tsai, C.-L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Series B, 60, 271-293.

Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5, 299-314.

Kulasekera, K. B. and Padgett, W. J. (2006). Bayes bandwidth selection in kernel density estimation with censored data. Journal of Nonparametric Statistics, 18, 129-143.

Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and its Applications, 9(1), 141-142.

Prakasa Rao, B. L. S. (1983). Nonparametric Functional Estimation, Academic Press, London.

Ruppert, D., Sheather, S. J. and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association, 90, 1257-1270.

Watson, G. S. (1964). Smooth regression analysis. Sankhyā, Series A, 26, 359-372.


More information

Adaptive Kernel Estimation of The Hazard Rate Function

Adaptive Kernel Estimation of The Hazard Rate Function Adaptive Kernel Estimation of The Hazard Rate Function Raid Salha Department of Mathematics, Islamic University of Gaza, Palestine, e-mail: rbsalha@mail.iugaza.edu Abstract In this paper, we generalized

More information

A Design Unbiased Variance Estimator of the Systematic Sample Means

A Design Unbiased Variance Estimator of the Systematic Sample Means American Journal of Theoretical and Applied Statistics 2015; 4(3): 201-210 Published online May 29, 2015 (http://www.sciencepublishinggroup.com/j/ajtas) doi: 10.1148/j.ajtas.20150403.27 ISSN: 232-8999

More information

Bootstrap prediction and Bayesian prediction under misspecified models

Bootstrap prediction and Bayesian prediction under misspecified models Bernoulli 11(4), 2005, 747 758 Bootstrap prediction and Bayesian prediction under misspecified models TADAYOSHI FUSHIKI Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569,

More information

An alternative local polynomial estimator for the error-in-variables problem

An alternative local polynomial estimator for the error-in-variables problem To appear in the Journal of Nonparametric Statistics Vol. 00, No. 00, Month 20XX, 1 23 An alternative local polynomial estimator for the error-in-variables problem Xianzheng Huang a and Haiming Zhou b

More information

Nonparametric estimation of extreme risks from heavy-tailed distributions

Nonparametric estimation of extreme risks from heavy-tailed distributions Nonparametric estimation of extreme risks from heavy-tailed distributions Laurent GARDES joint work with Jonathan EL METHNI & Stéphane GIRARD December 2013 1 Introduction to risk measures 2 Introduction

More information

PARAMETER ESTIMATION AND ORDER SELECTION FOR LINEAR REGRESSION PROBLEMS. Yngve Selén and Erik G. Larsson

PARAMETER ESTIMATION AND ORDER SELECTION FOR LINEAR REGRESSION PROBLEMS. Yngve Selén and Erik G. Larsson PARAMETER ESTIMATION AND ORDER SELECTION FOR LINEAR REGRESSION PROBLEMS Yngve Selén and Eri G Larsson Dept of Information Technology Uppsala University, PO Box 337 SE-71 Uppsala, Sweden email: yngveselen@ituuse

More information

Threshold Autoregressions and NonLinear Autoregressions

Threshold Autoregressions and NonLinear Autoregressions Threshold Autoregressions and NonLinear Autoregressions Original Presentation: Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Threshold Regression 1 / 47 Threshold Models

More information

Week 4 Spring Lecture 7. Empirical Bayes, hierarchical Bayes and random effects

Week 4 Spring Lecture 7. Empirical Bayes, hierarchical Bayes and random effects Week 4 Spring 9 Lecture 7 Empirical Bayes, hierarchical Bayes and random effects Empirical Bayes and Hierarchical Bayes eview: Let X N (; ) Consider a normal prior N (; ) Then the posterior distribution

More information

STATISTICAL COMPUTING USING R/S. John Fox McMaster University

STATISTICAL COMPUTING USING R/S. John Fox McMaster University STATISTICAL COMPUTING USING R/S John Fox McMaster University The S statistical programming language and computing environment has become the defacto standard among statisticians and has made substantial

More information