Multivariate Locally Weighted Polynomial Fitting and Partial Derivative Estimation


Journal of Multivariate Analysis 59, 187-205 (1996), Article No. 0060

Multivariate Locally Weighted Polynomial Fitting and Partial Derivative Estimation

Zhan-Qian Lu

Geophysical Statistics Project, National Center for Atmospheric Research, Boulder, Colorado 80307

Nonparametric regression estimation based on locally weighted least squares fitting has been studied by Fan and by Ruppert and Wand. The latter paper also studies, in the univariate case, nonparametric derivative estimators given by a locally weighted polynomial fit. Compared with traditional kernel estimators, these estimators are often of simpler form and possess better properties. In this paper, we develop current work on locally weighted regression and generalize locally weighted polynomial fitting to the estimation of partial derivatives in a multivariate regression context. Specifically, for both the regression and partial derivative estimators we prove joint asymptotic normality and derive explicit asymptotic expansions for their conditional bias and conditional covariance matrix (given observations of the predictor variables) in each of the two important cases of local linear fit and local quadratic fit. © 1996 Academic Press, Inc.

Received February 5, 1995; revised July 1996.
AMS 1991 subject classification: 62G07.
Key words and phrases: locally weighted regression, joint asymptotic normality, asymptotic bias, asymptotic variance, kernel estimator, nonparametric regression.
0047-259X/96 $18.00. Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

1. INTRODUCTION

Nonparametric regression estimation from a locally weighted least squares fit has been studied by Stone (1977, 1980), Cleveland (1979), Cleveland and Devlin (1988), Fan (1993), and Ruppert and Wand (1994). The last paper also studied, in the univariate case, nonparametric derivative estimators given by a locally weighted polynomial fit. In this paper, we develop current work on locally weighted regression and generalize locally weighted polynomial fitting to the estimation of partial derivatives in a multivariate regression context.

We consider the following setup. We are given i.i.d. random vectors $\{(X_i, Y_i),\ i = 1, \dots, n\}$, with $Y_i \in \mathbb{R}$ and $X_i \in \mathbb{R}^p$, where $X_i$ has density function $f$. The statistical issues are estimation of the regression function

88 ZHAN-QIAN LU m(x)=e(y X=x) and its partial derivatives at any given point x # R p in a domain of interest. Alternatively, if EY < we can write Y i =m(x i )+& (X i ) = i,,..., n, (.) = i 's are i.i.d. scalar random variables with E(= i X i )=0, Var(= i X i )=, and & is called the variance function. This setup is called the random design model, as opposed to the fixed design model, given by Y i =m(x i )+& (x i ) = i,,..., n, the x i 's are predetermined quantities and nonrandom. In this paper, all results are given for the random design model, although similar results hold for the fixed design model as well. The two kernel regression estimators, namely the NadarayaWatson (NW ) estimator (Nadaraya, 964; Watson, 964) and the GasserMu ller (GM ) estimator (Gasser and Mu ller, 979), have been studied extensively in the literature. Substantial recent attention focuses on a larger class of kernel estimators given by the locally weighted polynomial fitting. (The NW estimator corresponds to the local constant fit). Chu and Marron (99) found it difficult to compare the NW and GM estimators in a random design model. Subsequently Fan (993) studied the nonparametric regression estimator based on the local linear fit and showed that the local linear regression smoother has advantages over both the NW and GM estimators as well as a certain optimality property. Another important issue is the boundary effect in the NW and GM estimators, namely both have slower convergence rates near boundary points and require some boundary modification for global estimation. As shown in Fan and Gijbels (99) the local linear regression estimator does not have this drawback. Derivative estimation often arises in bandwidth selection problems (Ha rdle, 990) and in modeling growth curves in longitudinal studies (Mu ller, 988). Our motivation for studying partial derivative estimation arises from a practical problem in chaos. Specifically, estimating Lyapunov exponents is an important issue which involves estimating first-order partial derivatives of a multivariate autoregression function. McCaffrey et al. (99) have proposed applying nonparametric regression to this problem and they employed the derivative estimators given by differentiating certain nonparametric regression estimators. Traditionally another often employed approach to derivative estimation is by kernel estimators using higher-order kernels. However, adaptations of these approaches for partial derivative estimation in a random design model are not very satisfactory since the derived estimators often have complicated form and are not easy to analyze. For example, the asymptotic bias and variance associated with the differentiation approach are difficult

to work out. Furthermore, these approaches suffer from drawbacks similar to those of the NW and GM estimators discussed earlier.

We will show, generalizing Ruppert and Wand (1994), that the partial derivative estimators given by the local quadratic fit have desirable properties. The following results will be proved in this paper. In each of the two important cases of the local linear fit and the local quadratic fit, for both the regression and derivative estimators, explicit asymptotic expansions are derived for their conditional bias and conditional covariance matrix, given observations of the predictor variables (in Theorem 1 and Theorem 3). In each case, the joint asymptotic normality is also proved (in Theorem 2 and Theorem 4, respectively). The results on bias and variance calculations of the regression estimators in Theorem 1 and Theorem 3 correspond to those of Ruppert and Wand (1994), but more explicit expressions are given here. The results on derivative estimators generalize Ruppert and Wand (1994) to the multivariate regression case. In Lu (1995b; see also Lu, 1994), the multivariate local polynomial fitting has been generalized to the multivariate time series context, the joint asymptotic normality is proved, and the method is applied to estimating the spectra of local Lyapunov exponents.

Higher order fits, such as the local cubic fit, can also be studied similarly, but the number of parameters to be estimated at each fitting increases very rapidly and may not be practical except for very large data sets. It should be pointed out that, due to the curse of dimensionality in nonparametric smoothing methods, in order for the local polynomial fitting to be consistent, the required sample size grows exponentially fast as the dimension $p$ increases. So for large $p$ and a moderate amount of data some dimension reduction principle should be employed.

This paper is organized as follows. Notation is given in Section 2. The locally weighted polynomial fit is introduced in Section 3, where the local linear fit is studied. Section 4 studies the main case of the local quadratic fit. The proofs of the theorems are given in Section 5.

2. NOTATION

Given a $p \times p$ matrix $A$, $A^T$ denotes its transpose. For square matrices $A_1, \dots, A_k$ ($k > 1$), we denote by
$$\mathrm{diag}[A_1, \dots, A_k] = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_k \end{pmatrix}$$
the block diagonal matrix, where the suppressed elements are zeros.

For a given $p \times q$ matrix $A = (a_1, a_2, \dots, a_q)$, let $\mathrm{vec}\,A = (a_1^T, a_2^T, \dots, a_q^T)^T$. If $A = (a_{ij})$ is a symmetric matrix of order $p$, let $\mathrm{vech}\,A$ denote the column vector of dimension $p(p+1)/2$ formed by stacking the elements on and below the diagonal; that is, $\mathrm{vech}\,A = (a_{11}, \dots, a_{p1}, a_{22}, \dots, a_{p2}, \dots, a_{pp})^T$. Also $\mathrm{vech}^T A = (\mathrm{vech}\,A)^T$.

Let $U$ denote an open neighborhood of $x = (x_1, \dots, x_p)^T$ in $\mathbb{R}^p$, and let $C^d(U)$ be the class of functions which have continuous partial derivatives up to order $d$ in $U$. For any $g = g(x_1, \dots, x_p) \in C^d(U)$ and a positive integer $k$ (less than $d$), the $k$th-order differential $D^k_g(x, u)$ at any given point $u = (u_1, \dots, u_p)^T \in \mathbb{R}^p$ is defined by
$$D^k_g(x, u) = \sum_{i_1, \dots, i_p} C^k_{i_1 \cdots i_p}\, \frac{\partial^k g(x)}{\partial x_1^{i_1} \cdots \partial x_p^{i_p}}\, u_1^{i_1} \cdots u_p^{i_p},$$
where the summation is over all distinct nonnegative integers $i_1, \dots, i_p$ such that $i_1 + \cdots + i_p = k$, and $C^k_{i_1 \cdots i_p} = k!/(i_1! \cdots i_p!)$. We also denote $D_g(x) = (\partial g(x)/\partial x_1, \dots, \partial g(x)/\partial x_p)^T$ for $g \in C^1(U)$, and the Hessian matrix by $H_g(x) = (\partial^2 g(x)/\partial x_i \partial x_j)$ for $g \in C^2(U)$.

For a $p \times p$ matrix $A$, we denote its determinant by $|A|$ and a certain norm by $\|A\|$, for example $\|A\| = (\sum_{i,j=1}^p A(i,j)^2)^{1/2}$. Given a random sequence $\{a_n\}$, we write $a_n = o_p(\gamma_n)$ if $\gamma_n^{-1} a_n$ tends to zero in probability, and $a_n = O_p(\gamma_n)$ if $\gamma_n^{-1} a_n$ is bounded in probability. For a sequence of $p \times q$ random matrices $\{A_n\}$, write $A_n = o_p(\gamma_n)$, or $O_p(\gamma_n)$, if and only if each component satisfies $A_n(i,j) = o_p(\gamma_n)$, or $O_p(\gamma_n)$, $i = 1, \dots, p$, $j = 1, \dots, q$. Then $A_n = o_p(\gamma_n)$ ($O_p(\gamma_n)$) if and only if $\|A_n\| = o_p(\gamma_n)$ ($O_p(\gamma_n)$).

3. LOCAL LINEAR FIT

The local linear estimators of the regression function and its partial derivatives at any given $x$ are given by the locally weighted least squares fit of a linear function; i.e., they are derived by minimizing the weighted sum of squares
$$\sum_{i=1}^n [Y_i - a - b^T(X_i - x)]^2\, |H|^{-1} K(H^{-1}(X_i - x)), \qquad (3.1)$$
where $K(\cdot)$ is the weighting function, $H$ is the bandwidth matrix, and $a$ and $b$ are parameters. We denote $Y = (Y_1, \dots, Y_n)^T$,
$$W = \mathrm{diag}\{|H|^{-1} K(H^{-1}(X_1 - x)), \dots, |H|^{-1} K(H^{-1}(X_n - x))\},$$

and we use $X$ to denote the $n \times (p+1)$ design matrix
$$X = \begin{pmatrix} 1 & (X_1 - x)^T \\ \vdots & \vdots \\ 1 & (X_n - x)^T \end{pmatrix}.$$
If there are at least $p+1$ points with positive weights, $X^T W X$ is invertible with probability one, and there is a unique solution to the minimization of (3.1), given in matrix form by
$$\hat\beta_L = (X^T W X)^{-1} X^T W Y, \qquad (3.2)$$
which is an estimator of $\beta_L(x) = (m(x), D_m^T(x))^T$.

Use of a bandwidth matrix $H$ in a multivariate smoothing context is sometimes advantageous, as discussed by Ruppert and Wand (1994) (whose $H^{1/2}$ corresponds to the $H$ here), and is also adopted in this paper following the suggestion of a referee. We assume that:

(A1) The bandwidth matrix $H$ is symmetric and strictly positive definite.

One choice is $H = hI$, where $h$ is a scalar bandwidth and $I$ is the identity matrix of dimension $p$. This choice implies that all predictor variables are scaled equally. However, since this special bandwidth matrix is simple and only one smoothing parameter needs to be specified, it remains the predominant choice in practice. If the predictor variables do not have the same scale, some transformation before smoothing, such as normalization by the respective standard deviations, may be advisable.

The weighting or kernel function $K$ is generally a nonnegative integrable function. For simplicity, we make the following assumptions on the kernel function:

(A2) The kernel $K$ is a spherically symmetric density function; i.e., there exists a univariate function $k$ such that $K(x) = k(\|x\|)$ for all $x \in \mathbb{R}^p$. Furthermore, we will assume that the kernel $K$ has eighth-order marginal moments, i.e., $\int u_1^8 K(u_1, \dots, u_p)\, du_1 \cdots du_p < \infty$.

Consequently, the odd-ordered moments of $K$ and $K^2$, when they exist, are zero; i.e., for $l = 1, 2$,
$$\int u_1^{i_1} u_2^{i_2} \cdots u_p^{i_p} K^l(u)\, du = 0 \qquad \text{if } \sum_{j=1}^p i_j \text{ is odd}.$$
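[Editorial illustration, not part of the original paper.] To make the estimator (3.1)-(3.2) concrete, here is a minimal numerical sketch of the multivariate local linear fit. The function name `local_linear_fit`, the Gaussian choice of $K$, and the scalar bandwidth $H = hI$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def local_linear_fit(X, Y, x0, h):
    """Locally weighted linear fit at x0, a sketch of (3.1)-(3.2).

    X : (n, p) predictors, Y : (n,) responses, x0 : (p,) target point,
    h : scalar bandwidth, i.e. H = h * I.
    Returns (m_hat, grad_hat): estimates of m(x0) and the gradient D_m(x0).
    """
    n, p = X.shape
    Z = X - x0                                        # X_i - x
    # |H|^{-1} K(H^{-1}(X_i - x)) with a spherically symmetric Gaussian kernel
    w = np.exp(-0.5 * np.sum((Z / h) ** 2, axis=1)) / ((2 * np.pi) ** (p / 2) * h ** p)
    Xd = np.hstack([np.ones((n, 1)), Z])              # design rows (1, (X_i - x)^T)
    WXd = w[:, None] * Xd                             # W X
    beta = np.linalg.solve(Xd.T @ WXd, WXd.T @ Y)     # (X^T W X)^{-1} X^T W Y
    return beta[0], beta[1:]                          # m_hat(x0), gradient estimate
```

With $H = hI$ the weight $|H|^{-1}K(H^{-1}(X_i - x))$ reduces to $h^{-p}K((X_i - x)/h)$, which is exactly what the sketch computes.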

Also let $\mu_l = \int u_1^l K(u)\, du$ and $J_l = \int u_1^l K^2(u)\, du$ for any nonnegative integer $l$. Intuitively, by using a spherically symmetric kernel the weight is a function of the Mahalanobis distance between the data and the point of interest, and the bandwidth matrix $H$ controls both the shape and the scale of the smoothing.

The following smoothness assumptions are made on the regression function and the design density:

(A3) For the given point $x = (x_1, \dots, x_p)^T$ with $f(x) > 0$ and $\nu(x) > 0$, there is an open neighborhood $U$ of $x$ such that $m \in C^3(U)$, $f \in C^1(U)$, $\nu \in C^0(U)$.

Since unconditional moments of the estimators considered here may not exist in general, following the tradition of Fan (1993) and Ruppert and Wand (1994) we will work with their conditional moments (given observations of the predictor variables $X_i$). Their large-sample behavior as $\|H\| \to 0$, $n|H| \to \infty$ is studied in detail. We also establish joint asymptotic normality of the estimators.

The following theorem gives the asymptotic expansions for the conditional bias and conditional covariance matrix of the local linear estimators. Recall that $H_m(x)$ denotes the Hessian matrix of $m$ at $x$, i.e., $(\partial^2 m(x)/\partial x_i \partial x_j)$.

Theorem 1. Under model (1.1) and assumptions (A1)-(A3), as $\|H\| \to 0$ and $n|H| \to \infty$ when $n \to \infty$, the conditional bias of the local linear regression and derivative estimators given by (3.2) has the asymptotic expansion
$$E\left\{\begin{pmatrix} \widehat{m(x)} \\ \widehat{D_m(x)} \end{pmatrix} - \begin{pmatrix} m(x) \\ D_m(x) \end{pmatrix} \Big|\, X_1, X_2, \dots, X_n\right\} = B_L(x, H) + \mathrm{diag}[1, H^{-1}]\, \|H\|^2\, [o_p(\|H\|) + O_p((n|H|)^{-1/2})], \qquad (3.3)$$
where
$$B_L(x, H) = \mathrm{diag}[1, H^{-1}] \begin{pmatrix} \dfrac{\mu_2}{2}\, \mathrm{Tr}(H_m(x) H^2) \\[2mm] \dfrac{1}{3!\,\mu_2}\, b(m, H) + \dfrac{1}{2\mu_2 f(x)}\, b_1(m, H) \end{pmatrix}, \qquad (3.4)$$
$$b(m, H) = \int u\, D^3_m(x, Hu)\, K(u)\, du, \qquad (3.5)$$
$$b_1(m, H) = \int u\, [u^T H H_m(x) H u]\, [D_f^T(x) H u]\, K(u)\, du - \mu_2^2\, H D_f(x)\, \mathrm{Tr}[H_m(x) H^2].$$

The conditional variance-covariance matrix has the asymptotic expansion
$$\mathrm{Cov}\left\{\begin{pmatrix} \widehat{m(x)} \\ \widehat{D_m(x)} \end{pmatrix} \Big|\, X_1, X_2, \dots, X_n\right\} = \frac{\nu(x)}{n|H|\, f(x)}\, \mathrm{diag}[1, H^{-1}] \left[\begin{pmatrix} J_0 & 0 \\ 0 & (J_2/\mu_2^2)\, I \end{pmatrix} + o_p(1)\right] \mathrm{diag}[1, H^{-1}]. \qquad (3.6)$$

The regression result of Theorem 1 duplicates exactly Theorem 2.1 of Ruppert and Wand (1994) (hereafter R&W), and the derivative result is a new contribution of this paper and serves as a comparison to the local quadratic fit in the next section.

The joint asymptotic normality of the local linear estimators follows under the additional assumption that

(A4) there exists a $\delta > 0$ such that $E|Y|^{2+\delta} < \infty$.

Theorem 2. Under the conditions of Theorem 1 and (A4), we have that
$$(n|H|)^{1/2}\, \mathrm{diag}[1, H]\, \big[\hat\beta_L - \beta_L(x) - [B_L(x, H) + \mathrm{diag}[1, H^{-1}]\, o(\|H\|^3)]\big]$$
tends in distribution to $N(0, [\nu(x)/f(x)]\, \mathrm{diag}[J_0, (J_2/\mu_2^2) I])$, where $B_L(x, H)$ is given by (3.4).

Remark 1. For the results on the regression estimator alone to hold, weaker assumptions in (A3), such as $m \in C^2(U)$ and $f \in C^0(U)$, will suffice.

Remark 2. The convergence rate of the derivative estimator corresponding to $H = O(n^{-1/(p+6)})\, I$ is of order $n^{-2/(p+6)}$, which attains the optimal rate established in Stone (1980). For $p = 1$, the conditional mean squared error of $\widehat{m'(x)}$ is approximated by
$$h^4 \left\{ \frac{\mu_4}{3!\,\mu_2}\, m^{(3)}(x) + \frac{\mu_4 - \mu_2^2}{2\mu_2}\, \frac{m^{(2)}(x) f'(x)}{f(x)} \right\}^2 + \frac{\nu(x)\, J_2}{\mu_2^2\, f(x)\, n h^3}.$$
It can be checked that the above expression corresponds to Theorem 4.1 of R&W (with $p = 1$, $r = 1$ in R&W's notation).

Remark 3. The bias of the local linear derivative estimator depends on the partial derivatives of the design density. This may be undesirable in some sense, e.g., in the minimax sense, analogous to the criticism of the NW estimator in Fan (1993). Furthermore, the local linear derivative estimators have a boundary effect; i.e., at boundary points the asymptotic bias is of order $O(\|H\|)$.
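[Editorial illustration, not from the paper.] As a quick numerical sketch of the bias-variance behavior described in Theorem 1 and Remark 2, one can fit the local linear estimator at a fixed interior point for several scalar bandwidths; the regression function, sample sizes, and bandwidth values below are arbitrary choices, and the weighted least squares step simply repeats (3.2) inline.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4000, 2
X = rng.uniform(-1.0, 1.0, size=(n, p))
m = lambda x: np.sin(x[..., 0]) + x[..., 0] * x[..., 1] ** 2     # known regression function
Y = m(X) + 0.1 * rng.standard_normal(n)

x0 = np.array([0.2, -0.3])
true_grad = np.array([np.cos(x0[0]) + x0[1] ** 2, 2.0 * x0[0] * x0[1]])

for h in (0.10, 0.20, 0.40):
    Z = X - x0
    w = np.exp(-0.5 * np.sum((Z / h) ** 2, axis=1)) / ((2 * np.pi) ** (p / 2) * h ** p)
    Xd = np.hstack([np.ones((n, 1)), Z])             # local linear design, rows (1, (X_i - x)^T)
    beta = np.linalg.solve(Xd.T @ (w[:, None] * Xd), Xd.T @ (w * Y))
    print(f"h={h:.2f}  |m_hat - m(x0)| = {abs(beta[0] - m(x0)):.4f}  "
          f"|grad_hat - grad| = {np.linalg.norm(beta[1:] - true_grad):.4f}")
```

Small $h$ inflates the variance of the gradient estimate (the $1/(nh^{p+2})$ term) while large $h$ inflates its bias; the crossover between the two is what the rate $n^{-2/(p+6)}$ in Remark 2 describes.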

4. LOCAL QUADRATIC FIT

The local quadratic fit is sometimes desirable for regression estimation, as discussed in Cleveland and Devlin (1988). We will show that the local quadratic fit is suitable for first-order partial derivative estimation. The local quadratic estimators at a given point $x$ are derived by minimizing the weighted sum of squares
$$\sum_{i=1}^n [Y_i - a - b^T(X_i - x) - (X_i - x)^T L (X_i - x)]^2\, |H|^{-1} K(H^{-1}(X_i - x)), \qquad (4.1)$$
where $a$, $b$, $L$ are parameters and $L$ is restricted to be a lower triangular matrix for identifiability. Note that the total number of parameters in (4.1) is $q = (p+1)(p+2)/2$. Denote $Y$, $W$ as in Section 3, and let
$$X = \begin{pmatrix} 1 & (X_1 - x)^T & \mathrm{vech}^T[(X_1 - x)(X_1 - x)^T] \\ \vdots & \vdots & \vdots \\ 1 & (X_n - x)^T & \mathrm{vech}^T[(X_n - x)(X_n - x)^T] \end{pmatrix}_{n \times q}.$$
If there are at least $q$ points with positive weights in (4.1), then $X^T W X$ is invertible with probability one, and there is a unique solution given in matrix form by
$$\hat\beta = (X^T W X)^{-1} X^T W Y, \qquad (4.2)$$
which is an estimator of $\beta(x) = (m(x), D_m^T(x), \mathrm{vech}^T[L(x)])^T$. Here $L(x) = (l_{ij})$ satisfies $l_{ij} = h_{ij}$ if $i > j$ and $l_{ii} = h_{ii}/2$, where $H_m(x) = (h_{ij})$ is the Hessian. We will assume that:

(A5) The kernel $K$ is as in (A2) and has twelfth-order marginal moments, i.e., $\int u_1^{12} K(u_1, \dots, u_p)\, du_1 \cdots du_p < \infty$.

(A6) For the given point $x$ with $f(x) > 0$ and $\nu(x) > 0$, there is an open neighborhood $U$ of $x$ such that $m \in C^4(U)$, $f \in C^1(U)$, $\nu \in C^0(U)$.

Some additional notation is introduced. We define a square matrix $C(H)$ of order $p(p+1)/2$ such that $\mathrm{vech}[H u u^T H] = C(H)\, \mathrm{vech}[u u^T]$ for any $u \in \mathbb{R}^p$. Explicitly, $C(H) = L_1 (H \otimes H) D_1$, where $\otimes$ denotes the Kronecker product, $L_1$ is the elimination matrix defined so that $L_1\, \mathrm{vec}\,A = \mathrm{vech}\,A$ for any $p \times p$ matrix $A$, and $D_1$ is the duplication matrix defined so that $D_1\, \mathrm{vech}\,A = \mathrm{vec}\,A$ for any symmetric matrix $A$.
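[Editorial illustration, not part of the original paper.] A minimal sketch of the local quadratic fit (4.1)-(4.2), again under the illustrative assumptions of a Gaussian kernel and $H = hI$; the helper `vech` stacks the on-and-below-diagonal elements as in Section 2, and none of the names come from the paper.

```python
import numpy as np

def vech(A):
    """Stack the on-and-below-diagonal elements of a square matrix column by column."""
    p = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(p)])

def local_quadratic_fit(X, Y, x0, h):
    """Locally weighted quadratic fit at x0, a sketch of (4.1)-(4.2).

    Returns (m_hat, grad_hat, vech_L_hat): estimates of m(x0), D_m(x0) and
    vech[L(x0)], where l_ij estimates h_ij for i > j and h_ii / 2 for i = j.
    """
    n, p = X.shape
    Z = X - x0
    w = np.exp(-0.5 * np.sum((Z / h) ** 2, axis=1)) / ((2 * np.pi) ** (p / 2) * h ** p)
    quad = np.array([vech(np.outer(z, z)) for z in Z])   # vech[(X_i - x)(X_i - x)^T]
    Xd = np.hstack([np.ones((n, 1)), Z, quad])           # q = (p+1)(p+2)/2 columns
    WXd = w[:, None] * Xd
    beta = np.linalg.solve(Xd.T @ WXd, WXd.T @ Y)
    return beta[0], beta[1:p + 1], beta[p + 1:]
```

Regressing on $\mathrm{vech}[(X_i - x)(X_i - x)^T]$ reproduces the quadratic form $(X_i - x)^T L (X_i - x)$ with $L$ lower triangular, so the returned coefficient block is exactly the $\mathrm{vech}[L(x)]$ parameterization used in (4.2). Remark 5 below can be checked numerically with this sketch: the gradient block has the same asymptotic variance as the local linear one but without the design-density bias term.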

Further properties of these special matrices are given in Magnus and Neudecker (1980). In particular, we have $C(H)^i = L_1 (H^i \otimes H^i) D_1$ for any $i = \dots, -2, -1, 0, 1, 2, \dots$. Letting $\|\cdot\|_2$ denote the matrix norm given by the largest singular value and using the fact that $H$ is symmetric, we have $\|C(H)\|_2 = \|H\|_2^2$. Hence $\|C(H)\| = O(\|H\|^2)$ for any matrix norm $\|\cdot\|$ as $\|H\| \to 0$.

As in Section 3, we analyze the conditional bias and conditional covariance matrix of $\hat\beta$. The large-sample asymptotic expansions are given in the next theorem.

Theorem 3. Under model (1.1) and assumptions (A1), (A5), and (A6), as $\|H\| \to 0$ and $n|H| \to \infty$ when $n \to \infty$, the conditional bias has the asymptotic expansion
$$E\left\{\begin{pmatrix} \widehat{m(x)} \\ \widehat{D_m(x)} \\ \mathrm{vech}[\widehat{L(x)}] \end{pmatrix} - \begin{pmatrix} m(x) \\ D_m(x) \\ \mathrm{vech}[L(x)] \end{pmatrix} \Big|\, X_1, X_2, \dots, X_n\right\} = B(x, H) + \mathrm{diag}[1, H^{-1}, C(H)^{-1}]\, \|H\|^3\, [o_p(\|H\|) + O_p((n|H|)^{-1/2})], \qquad (4.3)$$
where
$$B(x, H) = \mathrm{diag}[1, H^{-1}, C(H)^{-1}] \begin{pmatrix} \dfrac{1}{4!}\, \theta(m, H) + \dfrac{1}{3!\, f(x)}\, \theta_1(m, H) \\[2mm] \dfrac{1}{3!\,\mu_2}\, b(m, H) \\[2mm] \dfrac{1}{4!}\, \gamma(m, H) + \dfrac{1}{3!\, f(x)}\, \gamma_1(m, H) \end{pmatrix}, \qquad (4.4)$$
where $b(m, H)$ is as in Theorem 1, and
$$\theta(m, H) = d^{-1} \int D^4_m(x, Hu)\, K(u)\, du - c\, \mathrm{vech}^T[I] \int \mathrm{vech}[u u^T]\, D^4_m(x, Hu)\, K(u)\, du,$$
$$\theta_1(m, H) = d^{-1} \int D^3_m(x, Hu)\, [D_f^T(x) H u]\, K(u)\, du - c\, \mathrm{vech}^T[I] \int \mathrm{vech}[u u^T]\, D^3_m(x, Hu)\, [D_f^T(x) H u]\, K(u)\, du,$$

$$\gamma(m, H) = -c\, \mathrm{vech}[I] \int D^4_m(x, Hu)\, K(u)\, du + E^{-1} \int \mathrm{vech}[u u^T]\, D^4_m(x, Hu)\, K(u)\, du,$$
$$\gamma_1(m, H) = -c\, \mathrm{vech}[I] \int D^3_m(x, Hu)\, [D_f^T(x) H u]\, K(u)\, du - Q^T(x, H) \int u\, D^3_m(x, Hu)\, K(u)\, du + E^{-1} \int \mathrm{vech}[u u^T]\, D^3_m(x, Hu)\, [D_f^T(x) H u]\, K(u)\, du,$$
and
$$d = \frac{\mu_4 - \mu_2^2}{\mu_4 + (p-1)\mu_2^2}, \qquad c = \frac{\mu_2}{\mu_4 - \mu_2^2},$$
$$E = \mathrm{diag}[\,\mu_4 - \mu_2^2, \underbrace{\mu_2^2, \dots, \mu_2^2}_{p-1}, \mu_4 - \mu_2^2, \underbrace{\mu_2^2, \dots, \mu_2^2}_{p-2}, \dots, \mu_4 - \mu_2^2, \mu_2^2, \mu_4 - \mu_2^2\,], \qquad (4.5)$$
$$Q(x, H) = -c\, H D_f(x)\, \mathrm{vech}^T[I] + \mu_2^{-1} \left\{ \int u\, \mathrm{vech}^T[u u^T]\, [D_f^T(x) H u]\, K(u)\, du \right\} E^{-1}.$$

The conditional variance-covariance matrix has the asymptotic expansion
$$\mathrm{Cov}\left\{\begin{pmatrix} \widehat{m(x)} \\ \widehat{D_m(x)} \\ \mathrm{vech}[\widehat{L(x)}] \end{pmatrix} \Big|\, X_1, X_2, \dots, X_n\right\} = \frac{1}{n|H|}\, \mathrm{diag}[1, H^{-1}, C(H)^{-1}]\, [\Sigma(x) + o_p(1)]\, \mathrm{diag}[1, H^{-1}, C^T(H)^{-1}], \qquad (4.6)$$
where
$$\Sigma(x) = \frac{\nu(x)}{f(x)} \begin{pmatrix} \rho & 0 & \tau\, \mathrm{vech}^T[I] \\ 0 & (J_2/\mu_2^2)\, I & 0 \\ \tau\, \mathrm{vech}[I] & 0 & \Lambda + \dfrac{J_2^2 - 2 J_2 \mu_2 + J_0 \mu_2^2}{(\mu_4 - \mu_2^2)^2}\, \mathrm{vech}[I]\, \mathrm{vech}^T[I] \end{pmatrix}, \qquad (4.7)$$

where
$$\rho = (\mu_4 - \mu_2^2)^{-2}\, [\, J_0 (\mu_4 + (p-1)\mu_2^2)^2 - 2 p J_2 \mu_2 (\mu_4 + (p-1)\mu_2^2) + p \mu_2^2 (J_4 + (p-1) J_2^2)\, ],$$
$$\tau = (\mu_4 - \mu_2^2)^{-2}\, [\, J_2 \mu_4 + (2p-1) J_2 \mu_2^2 - (p-1) J_2^2 \mu_2 - J_4 \mu_2 - J_0 \mu_2 \mu_4 - (p-1) J_0 \mu_2^3\, ],$$
$$\Lambda = \mathrm{diag}[\,\lambda_1, \underbrace{\lambda_2, \dots, \lambda_2}_{p-1}, \lambda_1, \underbrace{\lambda_2, \dots, \lambda_2}_{p-2}, \dots, \lambda_1, \lambda_2, \lambda_1\,],$$
$$\lambda_1 = (J_4 - J_2^2)(\mu_4 - \mu_2^2)^{-2}, \qquad \lambda_2 = J_2^2\, \mu_2^{-4}.$$

Further, in regard to the matrix $Q(x, H)$ defined in (4.5), it can be checked that the following result holds.

Lemma 1. Let $(a_1, a_2, \dots, a_p)^T = H D_f(x)$. Then the matrix $Q(x, H)$ of (4.5) equals $\mu_2^{-1}$ times the $p \times p(p+1)/2$ matrix whose column indexed by the diagonal position $(i,i)$ has the single nonzero entry $a_i$ in row $i$, and whose column indexed by the off-diagonal position $(i,j)$, $i > j$, has the entry $a_j$ in row $i$ and the entry $a_i$ in row $j$, where the suppressed elements are zeros.

Proof. Let
$$G(x, H) = \int u\, \mathrm{vech}^T[u u^T]\, [D_f^T(x) H u]\, K(u)\, du, \qquad (4.8)$$
which has the following form: its column indexed by $(i,i)$ equals $\mu_4 a_i$ in row $i$ and $\mu_2^2 a_k$ in each row $k \ne i$, while its column indexed by $(i,j)$, $i > j$, equals $\mu_2^2 a_j$ in row $i$ and $\mu_2^2 a_i$ in row $j$, with the remaining elements zero. It can be checked that $Q(x, H)$ then has the given form. ∎
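[Editorial illustration, not from the paper.] The identity $\mathrm{vech}[H u u^T H] = C(H)\,\mathrm{vech}[u u^T]$ with $C(H) = L_1 (H \otimes H) D_1$ can be checked numerically. The sketch below builds the elimination and duplication matrices directly from their defining properties ($L_1\,\mathrm{vec}\,A = \mathrm{vech}\,A$, $D_1\,\mathrm{vech}\,A = \mathrm{vec}\,A$ for symmetric $A$) rather than from any library routine; all function names are my own.

```python
import numpy as np

def vech(A):
    p = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(p)])

def elimination_duplication(p):
    """Return (L1, D1) with L1 vec(A) = vech(A) and D1 vech(A) = vec(A) for symmetric A."""
    q = p * (p + 1) // 2
    L1 = np.zeros((q, p * p))
    D1 = np.zeros((p * p, q))
    k = 0
    for j in range(p):                 # column-major (vec) ordering
        for i in range(j, p):
            L1[k, j * p + i] = 1.0     # pick a_{ij} out of vec A
            D1[j * p + i, k] = 1.0     # put it back at (i, j) ...
            D1[i * p + j, k] = 1.0     # ... and at (j, i) by symmetry
            k += 1
    return L1, D1

# numerical check of vech[H u u^T H] = C(H) vech[u u^T]
p = 3
rng = np.random.default_rng(1)
B = rng.standard_normal((p, p))
H = B @ B.T + p * np.eye(p)            # symmetric positive definite bandwidth matrix
u = rng.standard_normal(p)
L1, D1 = elimination_duplication(p)
C = L1 @ np.kron(H, H) @ D1
assert np.allclose(C @ vech(np.outer(u, u)), vech(H @ np.outer(u, u) @ H))
```

The check relies only on the column-major identity $\mathrm{vec}(HXH) = (H \otimes H)\,\mathrm{vec}(X)$ for symmetric $H$, which is why the Kronecker-product construction of $C(H)$ works.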

The regression part of Theorem 3 corresponds to the results in Section 3 of R&W. The derivative part of this theorem in the multivariate case is new and is the main contribution of this paper. In order to see more clearly the connections between the results in this paper and the corresponding results in R&W, we give the explicit calculation of the matrix $A(H)$ (corresponding to $N_x$ of R&W) which is used in our proofs. Define
$$A(H) = \int \begin{pmatrix} 1 \\ u \\ \mathrm{vech}[u u^T] \end{pmatrix} (1,\ u^T,\ \mathrm{vech}^T[u u^T])\, K(u)\, f(x + Hu)\, du.$$
We have the following lemma, whose proof is given in Section 5.

Lemma 2. If $f \in C^1(U)$ and $f(x) > 0$, then as $\|H\| \to 0$ we have
$$A^{-1}(H) = f(x)^{-1} \begin{pmatrix} d^{-1} & 0 & -c\, \mathrm{vech}^T[I] \\ 0 & \mu_2^{-1} I & -f(x)^{-1} Q(x, H) \\ -c\, \mathrm{vech}[I] & -f(x)^{-1} Q(x, H)^T & E^{-1} \end{pmatrix} + o(\|H\|),$$
where $d$, $c$, $E$, $Q(x, H)$ are as in Theorem 3.

Explicit expressions for Theorem 3 and Theorem 4 are available in the important case that $H = hI$.

Corollary 1. In the special case $H = hI$, we have that $C(H) = h^2 I$ and $|H| = h^p$, where here $I$ is the identity matrix of dimension $p(p+1)/2$, and the bias expressions in (4.4) have the explicit forms
$$b(m, H) = h^3 \left( \mu_4\, \frac{\partial^3 m(x)}{\partial x_j^3} + 3\mu_2^2 \sum_{i \le p,\, i \ne j} \frac{\partial^3 m(x)}{\partial x_j\, \partial x_i^2} \right)_{j = 1, \dots, p}, \qquad (4.9)$$

and
$$\theta(m, H) = h^4 \left[ \frac{\mu_4^2 - \mu_2 \mu_6}{\mu_4 - \mu_2^2} \sum_{i=1}^p \frac{\partial^4 m(x)}{\partial x_i^4} - 6\mu_2^2 \sum_{1 \le i < j \le p} \frac{\partial^4 m(x)}{\partial x_i^2\, \partial x_j^2} \right],$$
$$\theta_1(m, H) = h^4 \left[ \frac{\mu_4^2 - \mu_2 \mu_6}{\mu_4 - \mu_2^2} \sum_{i=1}^p \frac{\partial^3 m(x)}{\partial x_i^3} \frac{\partial f(x)}{\partial x_i} - 3\mu_2^2 \sum_{1 \le i, j \le p,\, i \ne j} \frac{\partial^3 m(x)}{\partial x_i\, \partial x_j^2} \frac{\partial f(x)}{\partial x_i} \right];$$
and $\gamma(m, H)$ and $\gamma_1(m, H)$ are vectors of dimension $p(p+1)/2$ with components
$$\gamma(m, H) = (\gamma_{11}, \dots, \gamma_{p1}, \gamma_{22}, \dots, \gamma_{p2}, \dots, \gamma_{pp})^T,$$
$$\gamma_{ii} = h^4 \left[ \frac{\mu_6 - \mu_2 \mu_4}{\mu_4 - \mu_2^2}\, \frac{\partial^4 m(x)}{\partial x_i^4} + 6\mu_2 \sum_{k \le p,\, k \ne i} \frac{\partial^4 m(x)}{\partial x_i^2\, \partial x_k^2} \right], \qquad \text{for } 1 \le i \le p,$$
$$\gamma_{ij} = h^4 \left[ \frac{4\mu_4}{\mu_2} \left\{ \frac{\partial^4 m(x)}{\partial x_i^3\, \partial x_j} + \frac{\partial^4 m(x)}{\partial x_i\, \partial x_j^3} \right\} + 12\mu_2 \sum_{k \le p,\, k \ne i, j} \frac{\partial^4 m(x)}{\partial x_k^2\, \partial x_i\, \partial x_j} \right], \qquad \text{for } j < i \le p,$$
and $\gamma_1(m, H) = (\gamma_{11}(1), \dots, \gamma_{p1}(1), \gamma_{22}(1), \dots, \gamma_{p2}(1), \dots, \gamma_{pp}(1))^T$, with
$$\gamma_{ii}(1) = h^4 \left[ \frac{\mu_2 \mu_6 - \mu_4^2}{\mu_2(\mu_4 - \mu_2^2)}\, \frac{\partial^3 m(x)}{\partial x_i^3}\, \frac{\partial f(x)}{\partial x_i} + 3\mu_2 \sum_{k \le p,\, k \ne i} \frac{\partial^3 m(x)}{\partial x_i^2\, \partial x_k}\, \frac{\partial f(x)}{\partial x_k} \right], \qquad \text{for } 1 \le i \le p,$$
$$\gamma_{ij}(1) = h^4 \left[ \frac{3(\mu_4 - \mu_2^2)}{\mu_2} \left\{ \frac{\partial^3 m(x)}{\partial x_i\, \partial x_j^2}\, \frac{\partial f(x)}{\partial x_j} + \frac{\partial^3 m(x)}{\partial x_i^2\, \partial x_j}\, \frac{\partial f(x)}{\partial x_i} \right\} + 6\mu_2 \sum_{k \le p,\, k \ne i, j} \frac{\partial^3 m(x)}{\partial x_k\, \partial x_i\, \partial x_j}\, \frac{\partial f(x)}{\partial x_k} \right], \qquad \text{for } j < i \le p.$$

The joint asymptotic normality of the local quadratic estimators is given as follows.

Theorem 4. Under the conditions of Theorem 3 and (A4), we have that
$$(n|H|)^{1/2}\, \mathrm{diag}[1, H, C(H)]\, \big[\hat\beta - \beta(x) - [B(x, H) + \mathrm{diag}[1, H^{-1}, C(H)^{-1}]\, o(\|H\|^4)]\big]$$
tends in distribution to $N(0, \Sigma(x))$, where $B(x, H)$ and $\Sigma(x)$ are given in (4.4) and (4.7) of Theorem 3, respectively.

The following remarks are relevant.

Remark 4. For the results on the first-order partial derivatives to hold, the assumptions in (A6) can be weakened to $m \in C^3(U)$, $f \in C^0(U)$.

Remark 5. It is seen that the local quadratic derivative estimator eliminates the extra bias term $[h^2/(2\mu_2 f(x))]\, b_1(m, K)$ of the local linear derivative estimator while retaining the same asymptotic covariance matrix.

Remark 6. In the special case $H = hI$, the conditional mean squared distance of error (CMSDE) for the local quadratic gradient estimator $\widehat{D_m(x)}$ is given by
$$E[\|\widehat{D_m(x)} - D_m(x)\|^2 \mid X_1, X_2, \dots, X_n] \approx \frac{h^4}{(3!\,\mu_2)^2}\, \|b(m)\|^2 + \frac{p J_2\, \nu(x)}{\mu_2^2\, n h^{p+2}\, f(x)}, \qquad (4.10)$$
where $b(m) = b(m, H)/h^3$. The locally optimal $h$ which minimizes (4.10) is given by
$$h_{\mathrm{opt}}(x) = \left\{ \frac{9 p (p+2)\, J_2\, \nu(x)}{n\, f(x)\, \|b(m)\|^2} \right\}^{1/(p+6)}.$$
The minimum pointwise CMSDE is obtained by plugging in $h_{\mathrm{opt}}(x)$:
$$\mathrm{CMSDE}_{\mathrm{opt}}(x) = \frac{p+6}{p+2} \cdot \frac{[p(p+2)\, J_2\, \nu(x)]^{4/(p+6)}\, \|b(m)\|^{2(p+2)/(p+6)}}{4 \cdot 3^{2(p+2)/(p+6)}\, \mu_2^2\, f(x)^{4/(p+6)}}\; n^{-4/(p+6)}.$$
The asymptotic analysis provides some insight into the behavior of the estimator. The bias is quantified by the amount of smoothing and the third-order partial derivatives at $x$ for each coordinate. The bias is increased when there is more third-order nonlinearity, as quantified by $b(m, K)$, and more smoothing. On the other hand, the conditional variance is increased when there is less smoothing and sparser data near $x$.

5. PROOFS

We only give outlines of the proofs of Theorem 3 and Theorem 4. The proof of Theorem 3 follows along the same lines as Fan (1993) and Ruppert and Wand (1994), and readers can refer to Lu (1995a) for a more detailed proof. Theorem 1 and Theorem 2 can be proved similarly.

5.1. Outline of the Proof of Theorem 3

The conditional bias and conditional covariance matrix are given respectively by
$$E(\hat\beta \mid X_1, \dots, X_n) - \beta(x) = (X^T W X)^{-1} X^T W (M - X\beta) = \mathrm{diag}[1, H^{-1}, C(H)^{-1}]\, S_n^{-1} R_n,$$
$$\mathrm{Cov}(\hat\beta \mid X_1, \dots, X_n) = (X^T W X)^{-1} X^T W V W X (X^T W X)^{-1} = \frac{1}{n|H|}\, \mathrm{diag}[1, H^{-1}, C(H)^{-1}]\, S_n^{-1} C_n S_n^{-1}\, \mathrm{diag}[1, H^{-1}, C^T(H)^{-1}],$$
where $M = (m(X_1), \dots, m(X_n))^T$, $V = \mathrm{diag}[\nu(X_1), \dots, \nu(X_n)]$,
$$S_n = \frac{1}{n} \sum_{i=1}^n \tilde X_i \tilde X_i^T\, |H|^{-1} K(H^{-1}(X_i - x)), \qquad C_n = \frac{1}{n} \sum_{i=1}^n \tilde X_i \tilde X_i^T\, \nu(X_i)\, |H|^{-1} K^2(H^{-1}(X_i - x)),$$
$$R_n = \frac{1}{n} \sum_{i=1}^n \tilde X_i \left[ m(X_i) - m(x) - D_m^T(x)(X_i - x) - \tfrac{1}{2}(X_i - x)^T H_m(x)(X_i - x) \right] |H|^{-1} K(H^{-1}(X_i - x)),$$
and
$$\tilde X_i = \begin{pmatrix} 1 \\ H^{-1}(X_i - x) \\ \mathrm{vech}[H^{-1}(X_i - x)(X_i - x)^T H^{-1}] \end{pmatrix}.$$
The proof of the bias part of the theorem consists of combining the following key steps with Lemma 2:
$$S_n = A(H) + O_p((n|H|)^{-1/2}), \qquad S_n^{-1} = A^{-1}(H) + O_p((n|H|)^{-1/2}), \qquad (5.1)$$
$$R_n = R(x, H) + \|H\|^3\, [o(\|H\|) + O_p((n|H|)^{-1/2})],$$

where
$$R(x, H) = \frac{1}{3!} \begin{pmatrix} \displaystyle\int D^3_m(x, Hu)\, [D_f^T(x) H u]\, K(u)\, du \\[2mm] f(x) \displaystyle\int u\, D^3_m(x, Hu)\, K(u)\, du \\[2mm] \displaystyle\int \mathrm{vech}[u u^T]\, D^3_m(x, Hu)\, [D_f^T(x) H u]\, K(u)\, du \end{pmatrix} + \frac{f(x)}{4!} \begin{pmatrix} \displaystyle\int D^4_m(x, Hu)\, K(u)\, du \\[2mm] 0 \\[2mm] \displaystyle\int \mathrm{vech}[u u^T]\, D^4_m(x, Hu)\, K(u)\, du \end{pmatrix}.$$
The proof of the covariance part of the theorem consists of combining (5.1) and Lemma 2 with
$$C_n = C(x) + O(\|H\|) + O_p((n|H|)^{-1/2}),$$
where
$$C(x) = \nu(x) f(x) \begin{pmatrix} J_0 & 0 & J_2\, \mathrm{vech}^T[I] \\ 0 & J_2 I & 0 \\ J_2\, \mathrm{vech}[I] & 0 & E_J + J_2^2\, \mathrm{vech}[I]\, \mathrm{vech}^T[I] \end{pmatrix}$$
and
$$E_J = \mathrm{diag}[\,J_4 - J_2^2, \underbrace{J_2^2, \dots, J_2^2}_{p-1}, J_4 - J_2^2, \underbrace{J_2^2, \dots, J_2^2}_{p-2}, \dots, J_4 - J_2^2, J_2^2, J_4 - J_2^2\,].$$

5.2. Proof of Lemma 2

Using the Taylor expansion $f(x + Hu) = f(x) + D_f^T(x) H u + o(\|H\|)$ as $\|H\| \to 0$, we obtain that
$$A(H) = f(x) \begin{pmatrix} 1 & 0 & \mu_2\, \mathrm{vech}^T[I] \\ 0 & \mu_2 I & 0 \\ \mu_2\, \mathrm{vech}[I] & 0 & D \end{pmatrix} + \begin{pmatrix} 0 & \mu_2\, (H D_f(x))^T & 0 \\ \mu_2\, H D_f(x) & 0 & G(x, H) \\ 0 & G(x, H)^T & 0 \end{pmatrix} + O(\|H\|^2),$$

where $D = E + \mu_2^2\, \mathrm{vech}[I]\, \mathrm{vech}^T[I]$, with $E$ as in Theorem 3 and $G(x, H)$ as in (4.8). Denote $A(H) = A_0 + B(H) + o(\|H\|)$, where $A_0$ and $B(H)$ correspond to the two matrices in the expansion of $A(H)$ above. Note that
$$A(H)^{-1} = A_0^{-1} - A_0^{-1} B(H) A_0^{-1} + o(\|H\|), \qquad A_0^{-1} = f(x)^{-1} \begin{pmatrix} d^{-1} & 0 & -c\, \mathrm{vech}^T[I] \\ 0 & \mu_2^{-1} I & 0 \\ -c\, \mathrm{vech}[I] & 0 & E^{-1} \end{pmatrix},$$
using matrix inverse formulae such as those on page 33 of Rao (1973). Now we only need to check that the second term $A_0^{-1} B(H) A_0^{-1}$, which is $f(x)^{-2}$ multiplied by
$$\begin{pmatrix} d^{-1} & 0 & -c V^T \\ 0 & \mu_2^{-1} I & 0 \\ -c V & 0 & E^{-1} \end{pmatrix} \begin{pmatrix} 0 & \mu_2 (H D_f)^T & 0 \\ \mu_2 H D_f & 0 & G \\ 0 & G^T & 0 \end{pmatrix} \begin{pmatrix} d^{-1} & 0 & -c V^T \\ 0 & \mu_2^{-1} I & 0 \\ -c V & 0 & E^{-1} \end{pmatrix} = \begin{pmatrix} 0 & d^{-1}(H D_f)^T - c \mu_2^{-1} V^T G^T & 0 \\ d^{-1} H D_f - c \mu_2^{-1} G V & 0 & Q \\ 0 & Q^T & 0 \end{pmatrix},$$
where $V = \mathrm{vech}[I]$, $G = G(x, H)$, and $Q = Q(x, H)$. Since $G(x, H) V = (\mu_4 + (p-1)\mu_2^2)\, H D_f$, which can easily be checked from the explicit expression for $G(x, H)$ in Lemma 1, we thus have
$$c\, \mu_2^{-1}\, G(x, H) V = \frac{\mu_4 + (p-1)\mu_2^2}{\mu_4 - \mu_2^2}\, H D_f = d^{-1} H D_f;$$
i.e., $d^{-1} H D_f - c \mu_2^{-1} G(x, H) V = 0$ and $d^{-1}(H D_f)^T - c \mu_2^{-1} V^T G(x, H)^T = 0$. The lemma is thus proved. ∎

5.3. Outline of the Proof of Theorem 4

We write
$$(n|H|)^{1/2}\, \mathrm{diag}[1, H, C(H)]\, \big[\hat\beta - \beta(x) - \mathrm{diag}[1, H^{-1}, C(H)^{-1}]\, S_n^{-1} R_n\big] = S_n^{-1} Z_n, \qquad (5.2)$$
where
$$Z_n = (n|H|)^{-1/2} \sum_{i=1}^n \tilde X_i\, K(H^{-1}(X_i - x))\, \nu^{1/2}(X_i)\, \varepsilon_i.$$
The joint asymptotic normality of $Z_n$ can be established using the Cramér-Wold device and Theorem 1.9.3 of Serfling (1980) under the additional assumption (A4). The theorem is then proved by combining this with the previous results (5.1) in Subsection 5.1 and replacing the left-hand side of (5.2) by the bias given in Theorem 3, using Slutsky's theorem (Theorem 1.5.4 of Serfling (1980)).

ACKNOWLEDGMENTS

This paper is based on part of the author's doctoral thesis completed in the Department of Statistics, University of North Carolina, under the supervision of Professor Richard L. Smith. Revision of this work was supported by the NCAR Geophysical Statistics Project, sponsored by the National Science Foundation under Grant DMS93-686. This work benefitted from discussions with Professors Steve Marron and Jianqing Fan. Constructive suggestions from two anonymous referees have led to this condensed and improved version.

REFERENCES

[1] Chu, C. K., and Marron, J. S. (1991). Choosing a kernel regression estimator (with discussion). Statist. Sci. 6, No. 4, 404-436.
[2] Cleveland, W. (1979). Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74, No. 368, 829-836.
[3] Cleveland, W., and Devlin, S. (1988). Locally weighted regression: An approach to regression analysis by local fitting. J. Amer. Statist. Assoc. 83, No. 403, 596-610.
[4] Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Ann. Statist. 21, No. 1, 196-216.
[5] Fan, J., and Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. Ann. Statist. 20, No. 4, 2008-2036.
[6] Gasser, T., and Müller, H. G. (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, Lecture Notes in Math., Vol. 757, pp. 23-68. Springer-Verlag, New York.
[7] Härdle, W. (1990). Applied Nonparametric Regression. Cambridge Univ. Press, Boston.

[8] Magnus, J. R., and Neudecker, H. (1980). The elimination matrix: Some lemmas and applications. SIAM J. Alg. Disc. Methods 1, No. 4, 422-449.
[9] Lu, Z. Q. (1994). Estimating Lyapunov Exponents in Chaotic Time Series with Locally Weighted Regression. Ph.D. dissertation, University of North Carolina, Chapel Hill.
[10] Lu, Z. Q. (1995a). Multivariate locally weighted polynomial fit and partial derivative estimation. Unpublished.
[11] Lu, Z. Q. (1995b). Statistical estimation of local Lyapunov exponents: Toward characterizing predictability in nonlinear systems. Submitted.
[12] McCaffrey, D., Nychka, D., Ellner, S., and Gallant, A. R. (1992). Estimating Lyapunov exponents with nonparametric regression. J. Amer. Statist. Assoc. 87, No. 419, 682-695.
[13] Müller, H.-G. (1988). Nonparametric Regression Analysis of Longitudinal Data. Springer-Verlag, Berlin.
[14] Nadaraya, E. A. (1964). On estimating regression. Theory Probab. Appl. 9, 141-142.
[15] Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. Wiley, New York.
[16] Ruppert, D., and Wand, M. P. (1994). Multivariate locally weighted least squares regression. Ann. Statist. 22, No. 3, 1346-1370.
[17] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
[18] Stone, C. J. (1977). Consistent nonparametric regression (with discussion). Ann. Statist. 5, No. 4, 595-645.
[19] Stone, C. J. (1980). Optimal rates of convergence for nonparametric estimators. Ann. Statist. 8, No. 6, 1348-1360.
[20] Watson, G. S. (1964). Smooth regression analysis. Sankhyā Ser. A 26, 359-372.