Local linear multiple regression with variable bandwidth in the presence of heteroscedasticity


Local linear multiple regression with variable bandwidth in the presence of heteroscedasticity

Azhong Ye (1), Rob J Hyndman (2), Zinai Li (3)

23 January 2006

Abstract: We present a local linear estimator with variable bandwidth for multivariate nonparametric regression. We prove its consistency and asymptotic normality in the interior of the sample space and obtain its rates of convergence. For heteroscedastic models, the local linear estimator with variable bandwidth has better goodness of fit than the local linear estimator with constant bandwidth.

JEL Classification: C12; C15; C52

Keywords: heteroscedasticity; kernel smoothing; local linear regression; variable bandwidth.

(1) Azhong Ye, College of Management, Fuzhou University, Fuzhou, 350002, China. Email: ye2004@fzu.edu.cn
(2) Rob J Hyndman, Department of Econometrics and Business Statistics, Monash University, Clayton, Victoria 3800, Australia. Email: Rob.Hyndman@buseco.monash.edu.au
(3) Zinai Li, School of Economics and Management, Tsinghua University, Beijing, 100084, China. Email: lizinai@mail.tsinghua.edu.cn

1 Introduction

We are interested in the problem of heteroscedasticity in nonparametric regression, especially when applied to economic data. Heteroscedasticity is very common in economics, and heteroscedasticity in linear regression is covered in almost every econometrics textbook. Applications of nonparametric regression in economics are growing, and Yatchew (1998) argued that it will become an indispensable tool for every economist because it typically assumes little about the shape of the regression function. Consequently, we believe that heteroscedasticity in nonparametric regression is an important problem that has received limited attention to date. We seek to develop new estimators that have better goodness of fit than the common estimators in nonparametric econometric models. In particular, we are interested in using heteroscedasticity to improve nonparametric regression estimation.

There have been a few papers on related topics. Testing for heteroscedasticity in nonparametric regression has been discussed by Eubank and Thomas (1993) and Dette and Munk (1998). Ruppert and Wand (1994) discussed multivariate locally weighted least squares regression when the variances of the disturbances are not constant. Ruppert, Wand, Holst and Hössjer (1997) presented the local polynomial estimator of the conditional variance function in a heteroscedastic, nonparametric regression model using linear smoothing of squared residuals. Sharp-optimal and adaptive estimators for heteroscedastic nonparametric regression using the classical trigonometric Fourier basis are given by Efromovich and Pinsker (1996). Our approach is to exploit the heteroscedasticity by using variable bandwidths in local linear regression. Müller and Stadtmüller (1987) discussed variable bandwidth kernel estimators of regression curves. Fan and Gijbels (1992, 1995, 1996) discussed the local linear estimator with variable bandwidth for nonparametric regression models with a single covariate.
In this paper, we extend these papers by presenting a local linear estimator with variable bandwidth for nonparametric multiple regression models. We demonstrate that the local linear estimator has optimal conditional mean squared error when

its variable bandwidth is a function of the density of the explanatory variables and the conditional variances. Numerical simulation shows that the local linear estimator with this variable bandwidth has better goodness of fit than the local linear estimator with constant bandwidth for heteroscedastic models.

2 Local linear regression with a variable bandwidth

Suppose we have a univariate response variable Y and a d-dimensional set of covariates X, and we observe the random vectors (X_1, Y_1), ..., (X_n, Y_n), which are independent and identically distributed. It is assumed that each variable in X has been scaled so that they have similar measures of spread. Our aim is to estimate the regression function m(x) = E(Y | X = x). We can regard the data as being generated from the model

    Y = m(X) + u,    (1)

where E(u | X) = 0, Var(u | X = x) = σ^2(x), and the marginal density of X is denoted by f(x). We assume that the second-order derivatives of m(x) are continuous, that f(x) is bounded away from 0, and that σ^2(x) is continuous and bounded. Let K be a d-variate kernel which is symmetric, nonnegative and compactly supported, with ∫ K(u) du = 1 and ∫ u u^T K(u) du = μ_2(K) I, where μ_2(K) ≠ 0 and I is the d × d identity matrix. In addition, all odd-order moments of K vanish; that is, ∫ u_1^{l_1} ... u_d^{l_d} K(u) du = 0 for all nonnegative integers l_1, ..., l_d such that their sum is odd. Let K_h(u) = h^{-d} K(u/h). Then the local linear estimator of m(x) with variable bandwidth is

    m̂_n(x, h_n, α) = e_1^T (X_x^T W_{x,α} X_x)^{-1} X_x^T W_{x,α} Y,    (2)

where h_n = c n^{-1/(d+4)}, c is a constant that depends only on K, α(x) is the variable bandwidth function, e_1^T = (1, 0, ..., 0), Y = (Y_1, ..., Y_n)^T, X_x = (X_{x,1}, ..., X_{x,n})^T with X_{x,i} = (1, (X_i - x)^T)^T, and

    W_{x,α} = diag( K_{h_n α(X_1)}(X_1 - x), ..., K_{h_n α(X_n)}(X_n - x) ).

We assume α(x) is continuously differentiable. We now state our main result.

Theorem 1. Let x be a fixed point in the interior of {x : f(x) > 0}, let H_m(x) = (∂^2 m(x) / ∂x_i ∂x_j)_{d×d}, and let s(H_m(x)) be the sum of the diagonal elements of H_m(x). Then

1. E[ m̂_n(x, h_n, α) - m(x) | X_1, ..., X_n ] = 0.5 h_n^2 μ_2(K) α^2(x) s(H_m(x)) + o_p(h_n^2);

2. Var[ m̂_n(x, h_n, α) | X_1, ..., X_n ] = n^{-1} h_n^{-d} R(K) α^{-d}(x) σ^2(x) f^{-1}(x) + o_p(n^{-1} h_n^{-d}), where R(K) = ∫ K^2(u) du; and

3. n^{2/(d+4)} [ m̂_n(x, h_n, α) - m(x) ] →_d N( 0.5 c^2 μ_2(K) α^2(x) s(H_m(x)), c^{-d} R(K) α^{-d}(x) σ^2(x) f^{-1}(x) ).

When α(x) = 1, results 1 and 2 coincide with Theorem 2.1 of Ruppert and Wand (1994). By result 2 and the law of large numbers, we find that m̂_n(x, h_n, α) is consistent. From result 3 we know that the rate of convergence of m̂_n(x, h_n, α) at interior points is O(n^{-2/(d+4)}), which, according to Stone (1980, 1982), is the optimal rate of convergence for estimating a nonparametric regression function.

3 Using heteroscedasticity to improve local linear regression

Although Fan and Gijbels (1992) and Ruppert and Wand (1994) discuss the local linear estimator of m(x), no one has previously developed an improved estimator using the information in the heteroscedasticity. We now show how this can be achieved.

Using Theorem 1, we can give an expression for the conditional mean squared error of the local linear estimator with variable bandwidth:

    MSE(x) = E{ [ m̂_n(x, h_n, α) - m(x) ]^2 | X_1, ..., X_n }
           = (1/4) [h_n α(x)]^4 μ_2^2(K) s^2(H_m(x)) + R(K) σ^2(x) / ( n [h_n α(x)]^d f(x) ) + o_p(h_n^4) + o_p(n^{-1} h_n^{-d}).    (3)

Minimizing MSE(x) and ignoring the higher order terms, we obtain the optimal variable bandwidth

    α_opt(x) = c^{-1} ( d R(K) σ^2(x) / ( μ_2^2(K) f(x) s^2(H_m(x)) ) )^{1/(d+4)}.

Note that the constant c will cancel out in the bandwidth expression h_n α_opt(x). Therefore, without loss of generality, we can set c = { d R(K) μ_2^{-2}(K) }^{1/(d+4)}, which simplifies the above expression to

    α_opt(x) = ( σ^2(x) / ( f(x) s^2(H_m(x)) ) )^{1/(d+4)}.    (4)

To apply this new estimator, we need to replace f(x), H_m(x) and σ^2(x) with estimators. We can use the kernel estimator of f(x) and the local polynomial estimator of H_m(x). To estimate σ^2(x) = E(u^2 | X = x), we first obtain the estimated residuals û_i = Y_i - Ŷ_i, where Ŷ_i = m̂_n(X_i, h_n, 1). Then we estimate σ^2(x) using local polynomial regression with the model û_i^2 = σ^2(X_i) + v_i, where the v_i are iid with zero mean.

In our examples, we measure relative goodness of fit by the mean of standard squared errors:

    MSSE = ( n^{-1} Σ_{i=1}^n [ m̂_n(X_i, h_n, α) - m(X_i) ]^2 )^{1/2}.    (5)
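To make the recipe concrete, here is a minimal sketch (not the authors' implementation) of the estimator (2), the optimal bandwidth factor (4) and the error measure (5). It assumes a Gaussian product kernel, whose normalising constant cancels in the weighted least squares fit, and it takes the values of σ^2(x), f(x) and s(H_m(x)) as supplied by the caller rather than estimated.

```python
import numpy as np

def local_linear(x0, X, Y, h, alpha):
    """Local linear estimate at x0 (eq. 2) with per-observation
    bandwidths h * alpha(X_i); Gaussian product kernel."""
    n, d = X.shape
    diffs = X - x0                                  # rows are X_i - x0
    bw = h * np.array([alpha(xi) for xi in X])      # h_n * alpha(X_i)
    w = np.exp(-0.5 * np.sum((diffs / bw[:, None]) ** 2, axis=1))
    Xd = np.hstack([np.ones((n, 1)), diffs])        # design matrix X_x
    beta = np.linalg.solve(Xd.T @ (w[:, None] * Xd), Xd.T @ (w * Y))
    return beta[0]                                  # e_1' beta = m_hat(x0)

def alpha_opt(sigma2, f, s_Hm, d):
    """Optimal variable-bandwidth factor, eq. (4), at a single point."""
    return (sigma2 / (f * s_Hm ** 2)) ** (1.0 / (d + 4))

def msse(m_hat, m_true):
    """Mean of standard squared errors, eq. (5)."""
    m_hat, m_true = np.asarray(m_hat), np.asarray(m_true)
    return float(np.sqrt(np.mean((m_hat - m_true) ** 2)))
```

Note that alpha_opt inflates the bandwidth where the error variance σ^2(x) is large and shrinks it where the design density f(x) or the curvature s(H_m(x)) is large, which is exactly the mechanism the paper exploits.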

4 Numerical studies with univariate regression

This section examines the performance of the proposed variable bandwidth method on several simulated univariate regression data sets generated from known functions, and compares it with the performance of the constant bandwidth method. As the true regression function is known in each case, the performance of the variable and constant bandwidth methods is measured by MSSE. We consider the five simulated models given in the following subsection. In each of these models Y = m(X) + σ(X) u with u ~ N(0, 1), and the covariate X has a Uniform(-2, 2) distribution.

4.1 True regression models

Model A: m_A(x) = x^2 + x, σ^2_A(x) = 32 x^2 + 0.04.

Model B: m_B(x) = (1 + x) sin(1.5 x), σ^2_B(x) = 3.2 x^2 + 0.04.

Model C: m_C(x) = x + 2 exp(-2 x^2), σ^2_C(x) = 16 (x^2 - 0.01) I(x^2 > 0.01) + 0.04.

Model D: m_D(x) = sin(2 x) + 2 exp(-2 x^2),

σ^2_D(x) = 16 (x^2 - 0.01) I(x^2 > 0.01) + 0.04.

Model E: m_E(x) = exp(-(x + 1)^2) + 2 exp(-2 x^2), σ^2_E(x) = 32 (x^2 - 0.01) I(x^2 > 0.01) + 0.04.

4.2 Bandwidth selectors

For the constant bandwidth method, we use direct plug-in methodology to select the bandwidth of a local linear Gaussian kernel regression estimate, as described by Ruppert, Sheather and Wand (1995). For the variable bandwidth method, we use direct plug-in methodology to choose the variable bandwidth. First, we use direct plug-in methodology to select the bandwidth of a kernel density estimate for f(x). Second, we estimate σ^2(x) = E(u^2 | X = x) by local linear or kernel regression with the model û_i^2 = σ^2(X_i) + v_i, where û_i = Y_i - m̂_n(X_i, ĥ_n, 1), the v_i are iid with zero mean, and ĥ_n is chosen by the direct plug-in methodology. Third, we use three methods to estimate m''(x). For the first method, we assume that m(x) = α_1 + α_2 x + α_3 x^2 + α_4 x^3 + α_5 x^4, so that m''(x) = 2 α_3 + 6 α_4 x + 12 α_5 x^2. For the second method, we use a revised rule of thumb for bandwidth selection (Fan and Gijbels, 1996), replacing the rule-of-thumb bandwidth selector (formula (4.2) in their book) with the ideal choice of bandwidth (formula (3.20) in their book). For the third method, we average the first and second estimators of m''(x) obtained by the previous two methods. Finally, we

have the direct plug-in bandwidth

    ĥ(x) = ( σ̂^2(x) / ( 2 n √π f̂(x) [m̂''(x)]^2 ) )^{1/5}.

4.3 Simulated results

We draw 1000 random samples of size 200 from each model for each method. Table 1 presents a summary of the results for the variable and constant bandwidth methods. It shows that the variable bandwidth method has a smaller mean of standard squared errors than the constant bandwidth method. Moreover, among the 1000 samples, the percentage for which the variable bandwidth method beats the constant bandwidth method is always more than half. We plot the true regression functions (solid line) and four typical estimated curves in Figures 1 and 2. These correspond to the 10th, 30th, 70th and 90th percentiles. The dashed line is for the constant bandwidth method and the dotted line is for the variable bandwidth method. We find that, for the same percentile, the dotted line (the variable bandwidth method) is generally closer to the solid line (the true regression function) than the dashed line (the constant bandwidth method). Therefore we conclude that, for heteroscedastic models, the local linear estimator with variable bandwidth has better goodness of fit than the local linear estimator with constant bandwidth.

5 Numerical studies with bivariate regression

This section examines the performance of the proposed variable bandwidth method on several simulated bivariate regression data sets generated from known functions, and compares it with the performance of the constant bandwidth method. We consider the four simulated models given in the following subsection. In each of these models Y = m(X_1, X_2) + σ(X_1) u with u ~ N(0, 1), and the covariates X_1 and X_2 are independent, each with a Uniform(-2, 2) distribution. We draw 200 random samples of size 400 from each model.
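The variance estimation step in the plug-in recipe above (regressing the squared pilot residuals û_i^2 on the covariate) can be sketched as follows. This is an illustrative stand-in rather than the authors' code: it uses a Nadaraya-Watson smoother in place of the local linear or local polynomial fit described in the text, and the pilot regression fit m_pilot and smoothing bandwidth h are assumed to be supplied by the caller.

```python
import numpy as np

def estimate_sigma2(X, Y, m_pilot, h):
    """Estimate sigma^2(x) = E(u^2 | X = x) by smoothing squared pilot
    residuals, following the model u_hat_i^2 = sigma^2(X_i) + v_i."""
    u2 = (Y - m_pilot(X)) ** 2          # squared residuals u_hat_i^2

    def sigma2_hat(x0):
        # Nadaraya-Watson weights around x0 (Gaussian kernel)
        w = np.exp(-0.5 * ((X - x0) / h) ** 2)
        return np.sum(w * u2) / np.sum(w)

    return sigma2_hat
```

The resulting σ̂^2(x), together with estimates of f(x) and of the second derivatives, supplies the ingredients of the direct plug-in bandwidths.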

5.1 True regression models

Model A: m_A(x_1, x_2) = x_1 x_2, σ^2_A(x_1, x_2) = (x_1^2 - 0.04) I(x_1^2 > 0.04) + 0.01.

Model B: m_B(x_1, x_2) = x_1 exp(-2 x_2^2), σ^2_B(x_1, x_2) = 2.5 (x_1^2 - 0.04) I(x_1^2 > 0.04) + 0.025.

Model C: m_C(x_1, x_2) = x_1 + 2 sin(1.5 x_2), σ^2_C(x_1, x_2) = (x_1^2 - 0.04) I(x_1^2 > 0.04) + 0.01.

Model D: m_D(x_1, x_2) = sin(x_1 + x_2) + 2 exp(-2 x_2^2), σ^2_D(x_1, x_2) = 3 (x_1^2 - 0.04) I(x_1^2 > 0.04) + 0.03.

5.2 Bandwidth selectors

To estimate the second derivatives of m(x_1, x_2), we assume that

    m(x_1, x_2) = α_1 + α_2 x_1 + α_3 x_2 + α_4 x_1^2 + α_5 x_1 x_2 + α_6 x_2^2 + α_7 x_1^3 + α_8 x_1^2 x_2 + α_9 x_1 x_2^2

        + α_10 x_2^3 + α_11 x_1^4 + α_12 x_1^3 x_2 + α_13 x_1^2 x_2^2 + α_14 x_1 x_2^3 + α_15 x_2^4,

and then we can obtain estimators of the α's. Hence we easily get estimators of the second derivatives of m(x_1, x_2) by using

    ∂^2 m / ∂x_1^2 = 2 α_4 + 6 α_7 x_1 + 2 α_8 x_2 + 12 α_11 x_1^2 + 6 α_12 x_1 x_2 + 2 α_13 x_2^2,
    ∂^2 m / ∂x_1 ∂x_2 = α_5 + 2 α_8 x_1 + 2 α_9 x_2 + 3 α_12 x_1^2 + 4 α_13 x_1 x_2 + 3 α_14 x_2^2,
    ∂^2 m / ∂x_2^2 = 2 α_6 + 2 α_9 x_1 + 6 α_10 x_2 + 2 α_13 x_1^2 + 6 α_14 x_1 x_2 + 12 α_15 x_2^2.

For the constant bandwidth method, we use direct plug-in methodology to select the bandwidth of a local linear Gaussian kernel regression estimate. To estimate σ^2, we first obtain the residuals û_i = Y_i - Ŷ_i, where Ŷ_i = m̂_n(X_i, ĥ_n, 1), ĥ_n = min(ĥ_1, ĥ_2), and ĥ_1, ĥ_2 are chosen respectively by the direct plug-in methodology for Y = m_1(X_1) + u_1 and Y = m_2(X_2) + u_2. Then σ̂^2 = n^{-1} Σ_{i=1}^n û_i^2. Finally, we have the direct plug-in bandwidth

    ĥ = ( σ̂^2 / ( π Σ_{i=1}^n s^2(Ĥ_m(X_i)) ) )^{1/6}.

For the variable bandwidth method, we also use direct plug-in methodology to choose the variable bandwidth. First, we use direct plug-in methodology to select the bandwidth of a kernel density estimate for X_1 and X_2 respectively, as described by Wand and Jones (1995). Second, we estimate σ^2(x_1) using local linear regression with the model û_i^2 = σ^2(X_{1i}) + v_i, where the v_i are iid with zero mean. Finally, we have the direct plug-in bandwidth

    ĥ(x) = ( σ̂^2(x) / ( n π f̂(x) s^2(Ĥ_m(x)) ) )^{1/6}.
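The second derivative formulas above map directly to code. In this sketch the array a is assumed to hold the fitted coefficients α_1, ..., α_15 in order (so a[0] = α_1):

```python
def hessian_quartic(x1, x2, a):
    """Second derivatives of the fitted quartic m(x1, x2) of Section 5.2;
    a[0..14] hold the fitted coefficients alpha_1..alpha_15."""
    d11 = 2*a[3] + 6*a[6]*x1 + 2*a[7]*x2 + 12*a[10]*x1**2 + 6*a[11]*x1*x2 + 2*a[12]*x2**2
    d12 = a[4] + 2*a[7]*x1 + 2*a[8]*x2 + 3*a[11]*x1**2 + 4*a[12]*x1*x2 + 3*a[13]*x2**2
    d22 = 2*a[5] + 2*a[8]*x1 + 6*a[9]*x2 + 2*a[12]*x1**2 + 6*a[13]*x1*x2 + 12*a[14]*x2**2
    return d11, d12, d22
```

As a check, for m(x_1, x_2) = x_1^2 x_2^2 (only α_13 nonzero) the function returns the analytic derivatives (2 x_2^2, 4 x_1 x_2, 2 x_1^2).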

5.3 Simulated results

We draw 200 random samples of size 200 from each model. Table 2 presents a summary of the results for the variable and constant bandwidth methods. It shows that the variable bandwidth method has a smaller mean of standard squared errors than the constant bandwidth method. Moreover, among the 200 samples, the percentage for which the variable bandwidth method beats the constant bandwidth method is always more than half. We plot the true regression functions with one variable held fixed (solid line) and four typical estimated curves in Figures 3 to 6. These correspond to the 10th, 30th, 70th and 90th percentiles. The dashed line is for the constant bandwidth method and the dotted line is for the variable bandwidth method. We find that, for the same percentile, the dotted line (the variable bandwidth method) is generally closer to the solid line (the true regression function) than the dashed line (the constant bandwidth method). Therefore we conclude that, for heteroscedastic models, the local linear estimator with variable bandwidth has better goodness of fit than the local linear estimator with constant bandwidth.

6 Summary

We have presented a local linear nonparametric estimator with variable bandwidth for multivariate regression models. We have shown that the estimator is consistent and asymptotically normal in the interior of the sample space, and that its convergence rate is optimal for nonparametric regression (Stone, 1980, 1982). By minimizing the conditional mean squared error of the estimator, we have derived the optimal variable bandwidth as a function of the density of the explanatory variables and the conditional variance. We have also provided a plug-in algorithm for computing the estimator. Numerical simulation shows that our local linear estimator with variable bandwidth has better goodness of fit than the local linear estimator with constant bandwidth for heteroscedastic models.

Acknowledgements

We thank X. Zhang for his helpful comments.

References

Dette, H., Munk, A., 1998. Testing heteroscedasticity in nonparametric regression. Journal of the Royal Statistical Society, Series B, 60, Part 4, 693-708.

Efromovich, S., Pinsker, M., 1996. Sharp-optimal and adaptive estimation for heteroscedastic nonparametric regression. Statistica Sinica, 6, 925-942.

Eubank, R.L., Thomas, W., 1993. Detecting heteroscedasticity in nonparametric regression. Journal of the Royal Statistical Society, Series B, 55(1), 145-155.

Fan, J., Gijbels, I., 1992. Variable bandwidth and local linear regression smoothers. The Annals of Statistics, 20(4), 2008-2036.

Fan, J., Gijbels, I., 1995. Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. Journal of the Royal Statistical Society, Series B, 57(2), 371-394.

Fan, J., Gijbels, I., 1996. Local polynomial modelling and its applications. Chapman and Hall.

Härdle, W., 1990. Applied nonparametric regression. Cambridge University Press.

Müller, H.G., Stadtmüller, U., 1987. Variable bandwidth kernel estimators of regression curves. The Annals of Statistics, 15(1), 182-201.

Opsomer, J., Yang, Y., Wang, Y., 2001. Nonparametric regression with correlated errors. Statistical Science, 16(2), 134-153.

Pagan, A., Ullah, A., 1999. Nonparametric econometrics. Cambridge University Press.

Ruppert, D., Sheather, S.J., Wand, M.P., 1995. An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association, 90, 1257-1270.

Ruppert, D., Wand, M.P., 1994. Multivariate locally weighted least squares regression. The Annals of Statistics, 22(3), 1346-1370.

Ruppert, D., Wand, M.P., Holst, U., Hössjer, O., 1997. Local polynomial variance-function

estimation. Technometrics, 39, 262-273.

Stone, C.J., 1980. Optimal convergence rates for nonparametric estimators. The Annals of Statistics, 8, 1348-1360.

Stone, C.J., 1982. Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, 10, 1040-1053.

Wand, M.P., Jones, M.C., 1995. Kernel smoothing. Chapman and Hall.

White, H., 1984. Asymptotic theory for econometricians. Academic Press.

Xia, Y., Tong, H., Li, W.K., Zhu, L.X., 2002. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society, Series B, 64, Part 3, 363-410.

Yatchew, A., 1998. Nonparametric regression techniques in economics. Journal of Economic Literature, 36, 669-721.

Appendix A: Proof of Theorem 1

Note that

    e_1^T (X_x^T W_{x,α} X_x)^{-1} X_x^T W_{x,α} X_x [ m(x) ; D_m(x) ] = e_1^T [ m(x) ; D_m(x) ] = m(x),    (6)

where D_m(x) = [ ∂m(x)/∂x_1, ..., ∂m(x)/∂x_d ]^T. Therefore, by a second-order Taylor expansion of m about x,

    m̂_n(x, h_n, α) - m(x) = e_1^T (X_x^T W_{x,α} X_x)^{-1} X_x^T W_{x,α} [ 0.5 Q_m(x) + U ],    (7)

where Q_m(x) = [Q_{m,1}(x), ..., Q_{m,n}(x)]^T, Q_{m,i}(x) = (X_i - x)^T H_m(z_i(x, X_i)) (X_i - x), H_m(x) = (∂^2 m(x)/∂x_i ∂x_j)_{d×d}, z_i(x, X_i) lies between x and X_i, and U = (u_1, ..., u_n)^T. We can deduce that {z_i(x, X_i)}_{i=1}^n are independent because {X_i}_{i=1}^n are independent. It is obvious that

    n^{-1} X_x^T W_{x,α} X_x =
    [ n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x)              n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) (X_i - x)^T ]
    [ n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) (X_i - x)    n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) (X_i - x)(X_i - x)^T ].    (8)

We begin with a lemma, using the notation of (2) and Theorem 1.

Lemma 1.

    n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) = f(x) + o_p(1),    (9)

    n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) (X_i - x) = h_n^2 α^3(x) G(α, f, x) + o_p(h_n^2 1),    (10)

    n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) (X_i - x)(X_i - x)^T = μ_2(K) h_n^2 α^2(x) f(x) I + o_p(h_n^2 1),    (11)

where

    G(α, f, x) = f(x) ∫_{supp(K)} u D_α^T(x) u u^T D_K(u) du + μ_2(K) [ d f(x) D_α(x) + α^{-1}(x) D_f(x) ],

D_α, D_f and D_K denote the gradients of α, f and K, and 1 is a generic matrix having each entry equal to 1, the dimensions of which will be clear from the context. Moreover,

    (n^{-1} X_x^T W_{x,α} X_x)^{-1} =
    [ f^{-1}(x) + o_p(1)    B^T(x, α) + o_p(1) ]
    [ B(x, α) + o_p(1)      μ_2^{-1}(K) h_n^{-2} α^{-2}(x) f^{-1}(x) I + o_p(h_n^{-2} 1) ],    (12)

where B(x, α) = -μ_2^{-1}(K) f^{-2}(x) α(x) G(α, f, x);

    n^{-1} X_x^T W_{x,α} Q_m(x) = h_n^2 [ f(x) μ_2(K) α^2(x) s(H_m(x)) ; 0 ] + o_p(h_n^2 1);    (13)

    (n h_n^d)^{1/2} (n^{-1} X_x^T W_{x,α} U) →_d [ N(0, R(K) α^{-d}(x) σ^2(x) f(x)) ; 0 ].    (14)

We only prove results (10), (13) and (14), as the other results can be proved similarly.

Proof of (10). It is easy to show that

    n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x)(X_i - x) = E[ K_{h_n α(X_1)}(X_1 - x)(X_1 - x) ] + O_p( (n^{-1} Ψ)^{1/2} ),    (15)

where Ψ = ( Var(K_{h_n α(X_1)}(X_1 - x)(X_{11} - x_1)), ..., Var(K_{h_n α(X_1)}(X_1 - x)(X_{d1} - x_d)) )^T. Because x is a fixed point in the interior of supp(f) = {x : f(x) > 0}, we have supp(K) ⊂ {z : x + h_n α(x) z ∈ supp(f)}, provided that the bandwidth h_n is small enough.

Due to the continuity of f, K and α, we have

    E[ K_{h_n α(X_1)}(X_1 - x)(X_1 - x) ]
      = ∫_{supp(f)} h_n^{-d} α^{-d}(X_1) K( h_n^{-1} α^{-1}(X_1) (X_1 - x) ) (X_1 - x) f(X_1) dX_1
      = ∫_{Ω_n} ( α(x + h_n Q) )^{-d} K( Q (α(x + h_n Q))^{-1} ) f(x + h_n Q) h_n Q dQ
      = h_n^2 α^3(x) G(α, f, x) + o(h_n^2 1),    (16)

where Ω_n = {Q : x + h_n Q ∈ supp(f)}. It is easy to see that

    Var[ K_{h_n α(X_1)}(X_1 - x)(X_1 - x) ]
      = E{ [K_{h_n α(X_1)}(X_1 - x)(X_1 - x)] [K_{h_n α(X_1)}(X_1 - x)(X_1 - x)]^T }
        - E[ K_{h_n α(X_1)}(X_1 - x)(X_1 - x) ] E[ K_{h_n α(X_1)}(X_1 - x)(X_1 - x) ]^T.    (17)

Again by the continuity of f, K and α, we have

    E{ [K_{h_n α(X_1)}(X_1 - x)(X_1 - x)] [K_{h_n α(X_1)}(X_1 - x)(X_1 - x)]^T }
      = E[ K_{h_n α(X_1)}^2(X_1 - x) (X_1 - x)(X_1 - x)^T ]
      = ∫_{supp(f)} [ h_n^{-d} α^{-d}(X_1) K( h_n^{-1} α^{-1}(X_1)(X_1 - x) ) ]^2 (X_1 - x)(X_1 - x)^T f(X_1) dX_1
      = h_n^{-d+2} ∫_{Ω_n} ( α(x + h_n Q) )^{-2d} K( Q (α(x + h_n Q))^{-1} )^2 f(x + h_n Q) Q Q^T dQ
      = h_n^{-d+2} ∫_{Ω_n} ( α(x) )^{-2d} K( Q (α(x))^{-1} )^2 f(x) Q Q^T dQ + O(h_n^{-d+2} 1)
      = O(h_n^{-d+2} 1).    (18)

Therefore we have

    O_p( (n^{-1} Ψ)^{1/2} ) = o_p(h_n^2 1).    (19)

Then (10) follows from (15)-(19).

Proof of (13). It is straightforward to show that

    n^{-1} X_x^T W_{x,α} Q_m(x) =
    [ n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) (X_i - x)^T H_m(z_i(x, X_i)) (X_i - x) ]
    [ n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) [(X_i - x)^T H_m(z_i(x, X_i)) (X_i - x)] (X_i - x) ].    (20)

It is easy to prove that

    n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) (X_i - x)^T H_m(z_i(x, X_i)) (X_i - x)
      = h_n^2 f(x) μ_2(K) α^2(x) s(H_m(x)) + o_p(h_n^2),    (21)

    n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) [(X_i - x)^T H_m(z_i(x, X_i)) (X_i - x)] (X_i - x) = O_p(h_n^3 1).    (22)

Therefore (13) holds.

Proof of (14). It is obvious that

    E[ n^{-1} X_x^T W_{x,α} U ] = E{ E[ n^{-1} X_x^T W_{x,α} U | X_1, ..., X_n ] } = 0,

    n^{-1} X_x^T W_{x,α} U =
    [ n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) u_i ]
    [ n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) (X_i - x) u_i ].    (23)

By the continuity of f, K, σ^2 and α, we have

    Var[ n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) u_i ] = n^{-1} Var[ K_{h_n α(X_1)}(X_1 - x) u_1 ]
      = n^{-1} ∫_{supp(f)} [ h_n^{-d} α^{-d}(X_1) K( h_n^{-1} α^{-1}(X_1)(X_1 - x) ) ]^2 σ^2(X_1) f(X_1) dX_1
      = n^{-1} h_n^{-d} ∫_{Ω_n} ( (α(x + h_n Q))^{-d} K( Q (α(x + h_n Q))^{-1} ) )^2 σ^2(x + h_n Q) f(x + h_n Q) dQ
      = n^{-1} h_n^{-d} ∫_{Ω_n} ( (α(x))^{-d} K( Q (α(x))^{-1} ) )^2 σ^2(x) f(x) dQ + o(n^{-1} h_n^{-d})

      = n^{-1} h_n^{-d} R(K) α^{-d}(x) σ^2(x) f(x) + o(n^{-1} h_n^{-d}),    (24)

    Var[ n^{-1} Σ_{i=1}^n K_{h_n α(X_i)}(X_i - x) (X_i - x) u_i ] = o(n^{-1} h_n^{-d+2} 1).    (25)

Then (14) holds.

Proof of Theorem 1

Proof of Theorem 1(1). By (12) and (13), we have

    E[ m̂_n(x, h_n, α) - m(x) | X_1, ..., X_n ] = 0.5 e_1^T (X_x^T W_{x,α} X_x)^{-1} X_x^T W_{x,α} Q_m(x)
      = 0.5 h_n^2 μ_2(K) α^2(x) s(H_m(x)) + o_p(h_n^2).    (26)

Therefore Theorem 1(1) holds.

Proof of Theorem 1(2). Denote V = diag{ σ^2(X_1), ..., σ^2(X_n) }. It is obvious that

    Var[ m̂_n(x, h_n, α) | X_1, ..., X_n ]
      = e_1^T (X_x^T W_{x,α} X_x)^{-1} X_x^T W_{x,α} V W_{x,α} X_x (X_x^T W_{x,α} X_x)^{-1} e_1,    (27)

and

    n^{-1} X_x^T W_{x,α} V W_{x,α} X_x =
    [ a_11(x, h_n, α)    a_21(x, h_n, α)^T ]
    [ a_21(x, h_n, α)    a_22(x, h_n, α)   ],    (28)

where

    a_11(x, h_n, α) = n^{-1} Σ_{i=1}^n ( K_{h_n α(X_i)}(X_i - x) )^2 σ^2(X_i),
    a_21(x, h_n, α) = n^{-1} Σ_{i=1}^n ( K_{h_n α(X_i)}(X_i - x) )^2 (X_i - x) σ^2(X_i),
    a_22(x, h_n, α) = n^{-1} Σ_{i=1}^n ( K_{h_n α(X_i)}(X_i - x) )^2 (X_i - x)(X_i - x)^T σ^2(X_i).

It is easy to prove that

    n^{-1} Σ_{i=1}^n ( K_{h_n α(X_i)}(X_i - x) )^2 σ^2(X_i) = h_n^{-d} R(K) α^{-d}(x) σ^2(x) f(x) + o_p(h_n^{-d}),
    n^{-1} Σ_{i=1}^n ( K_{h_n α(X_i)}(X_i - x) )^2 (X_i - x)^T σ^2(X_i) = O_p(h_n^{-d+1} 1),
    n^{-1} Σ_{i=1}^n ( K_{h_n α(X_i)}(X_i - x) )^2 (X_i - x)(X_i - x)^T σ^2(X_i) = O_p(h_n^{-d+2} 1).    (29)

By (27)-(29) and (12), we have

    Var[ m̂_n(x, h_n, α) | X_1, ..., X_n ] = n^{-1} h_n^{-d} R(K) α^{-d}(x) σ^2(x) f^{-1}(x) + o_p(n^{-1} h_n^{-d}).    (30)

Therefore Theorem 1(2) holds.

Proof of Theorem 1(3). By (13), (14) and the central limit theorem, we have

    n^{2/(d+4)} ( n^{-1} X_x^T W_{x,α} [ 0.5 Q_m(x) + U ] ) →_d
    [ N( 0.5 c^2 μ_2(K) α^2(x) f(x) s(H_m(x)), c^{-d} R(K) α^{-d}(x) σ^2(x) f(x) ) ]
    [ 0 ].    (31)

Applying White (1984, Proposition 2.26) together with (12), we can easily deduce Theorem 1(3).

Table 1: The percentage of the 1000 samples in which the variable bandwidth method is better than the constant bandwidth method, and the mean of standard squared errors for the two methods.

    Model and method            Percentage   MSSE, constant   MSSE, variable
    (for estimating m''(x))                  bandwidth        bandwidth
    Model A, method 1           75           1.358062         1.11497
    Model A, method 2           62.1         1.357988         1.211857
    Model A, method 3           69.5         1.36389          1.174379
    Model B, method 1           63.6         0.4990762        0.4346853
    Model B, method 2           53           0.4897841        0.4534872
    Model B, method 3           56           0.49308          0.446344
    Model C, method 1           75.7         0.9738531        0.7994637
    Model C, method 2           65.7         1.001366         0.8889062
    Model C, method 3           71           0.9757689        0.8387869
    Model D, method 1           68.5         0.9737332        0.8523557
    Model D, method 2           59.8         0.9750131        0.9011074
    Model D, method 3           63.5         0.9776387        0.8683526
    Model E, method 1           80           1.364086         1.100882
    Model E, method 2           66.6         1.390835         1.232905
    Model E, method 3           72.2         1.329069         1.122972

Table 2: The percentage of the 200 samples in which the variable bandwidth method is better than the constant bandwidth method, and the mean of standard squared errors for the two methods.

    Model      Percentage   MSSE, constant   MSSE, variable
                            bandwidth        bandwidth
    Model A    54.6         0.04108433       0.03972653
    Model B    54           0.08431403       0.08162663
    Model C    53           0.0935101        0.08142993
    Model D    52           0.09994177       0.09068134

Figure 1: The results for the simulated univariate regression data of models A, B and C. The true regression function (solid line) and four typical estimated curves are presented, corresponding to the 10th, 30th, 70th and 90th percentiles. The dashed line is for the constant bandwidth method and the dotted line is for the variable bandwidth method. [Nine panels: models A, B and C (rows) by methods 1-3 (columns).]

Figure 2: The results for the simulated univariate regression data of models D and E. The true regression function (solid line) and four typical estimated curves are presented, corresponding to the 10th, 30th, 70th and 90th percentiles. The dashed line is for the constant bandwidth method and the dotted line is for the variable bandwidth method. [Six panels: models D and E (rows) by methods 1-3 (columns).]

Figure 3: The results for the simulated bivariate data of model A. The true regression function (solid line) and four typical estimated curves are presented, corresponding to the 10th, 30th, 70th and 90th percentiles. The dashed line is for the constant bandwidth method and the dotted line is for the variable bandwidth method. [Six panels: m plotted against x_1 with x_2 fixed at 1, 0, -1 (left column) and against x_2 with x_1 fixed at 1, 0, -1 (right column).]

Figure 4: The results for the simulated bivariate data of model B. The true regression function (solid line) and four typical estimated curves are presented, corresponding to the 10th, 30th, 70th and 90th percentiles. The dashed line is for the constant bandwidth method and the dotted line is for the variable bandwidth method. [Six panels: m plotted against x_1 with x_2 fixed at 1, 0, -1 (left column) and against x_2 with x_1 fixed at 1, 0, -1 (right column).]

Figure 5: The results for the simulated bivariate data of model C. The true regression function (solid line) and four typical estimated curves are presented, corresponding to the 10th, 30th, 70th and 90th percentiles. The dashed line is for the constant bandwidth method and the dotted line is for the variable bandwidth method. [Six panels: m plotted against x_1 with x_2 fixed at 1, 0, -1 (left column) and against x_2 with x_1 fixed at 1, 0, -1 (right column).]

Figure 6: The results for the simulated bivariate data of model D. The true regression function (solid line) and four typical estimated curves are presented, corresponding to the 10th, 30th, 70th and 90th percentiles. The dashed line is for the constant bandwidth method and the dotted line is for the variable bandwidth method. [Six panels: m plotted against x_1 with x_2 fixed at 1, 0, -1 (left column) and against x_2 with x_1 fixed at 1, 0, -1 (right column).]