
Nonparametric Trending Regression with Cross-Sectional Dependence

Peter M. Robinson
London School of Economics

January 2010

Abstract

Panel data, whose series length T is large but whose cross-section size N need not be, are assumed to have a common time trend. The time trend is of unknown form, the model includes additive, unknown, individual-specific components, and we allow for spatial or other cross-sectional dependence and/or heteroscedasticity. A simple smoothed nonparametric trend estimate is shown to be dominated by an estimate which exploits the availability of cross-sectional data. Asymptotically optimal choices of bandwidth are justified for both estimates. Feasible optimal bandwidths, and feasible optimal trend estimates, are asymptotically justified, the finite-sample performance of the latter being examined in a Monte Carlo study. A number of potential extensions are discussed.

JEL Classifications: C13; C14; C23

Keywords: Panel data; nonparametric time trend; cross-sectional dependence; generalized least squares; optimal bandwidth

E-mail address: p.m.robinson@lse.ac.uk

1. INTRODUCTION

Much econometric modelling of nonstationary time series employs deterministic trending functions that are polynomial, indeed frequently linear. However, the penalties of mis-specifying parametric functions are well appreciated, and nonparametric modelling is increasingly widely accepted, at least in samples of reasonable size. The inability of polynomials to approximate general functions of time satisfactorily over the whole observation period deters study of polynomial functions whose order increases slowly with sample size, and rather leads one to consider a smooth trend mapped into the unit interval and approximated by smoothed kernel regression. For example, Starica and Granger (2005) employed this approach in modelling series of stock prices. There is a huge literature on such fixed-design nonparametric regression, principally in the setting of a single time series. Here we are concerned with panel data, where N series of length T have a common, nonparametric, time trend but also additive, fixed, individual effects, for which we have to correct before being able to form a trend estimate. We assume an asymptotic framework in which T is large, but not necessarily N, so that the cross-sectional mean at a given time point is not necessarily consistent for the trend; hence the recourse to smoothed nonparametric regression.

A major feature of the paper is concern for possible cross-sectional correlation and/or heteroscedasticity. These influence the asymptotic variance of our trend estimate, and thence also the mean squared error and the consequent optimal rules for bandwidth choice. The availability of cross-sectional data enables us to propose a trend estimate, based on the generalized least squares principle, that reduces the asymptotic variance. This estimate, along with its asymptotic variance (and that of the original trend estimate), depends on the cross-sectional covariance matrix. In general this is not wholly known, and possibly not known at all. Using residuals from the fitted trend, we consistently estimate

its elements, so as to obtain a feasible improved trend estimate and a consistent estimate of its variance, as well as feasible optimal bandwidths that are asymptotically equivalent to the infeasible versions. These results are valid with N remaining fixed as T increases, and they continue to hold if N is also allowed to increase, in which case there is a faster rate of convergence; in this latter situation our results hold irrespective of whether or not the covariance matrix is finitely parameterized. Section 2 describes the basic model. In Section 3 we present a simple trend estimate and its mean squared error properties. Improved estimation is discussed in Section 4. In Section 5 optimal bandwidths are reported. Section 6 suggests estimates of the cross-sectional covariance matrix, with asymptotic justification for their insertion in the optimal bandwidths and improved trend estimates. Section 7 contains a Monte Carlo study of finite-sample performance, and Section 8 suggests some directions for further research. Proof details may be found in two appendices.

2. PANEL DATA NONPARAMETRIC MODEL

We observe y_it, i = 1, ..., N, t = 1, ..., T, generated by

    y_it = α_i + β_t + x_it,                                            (2.1)

where the α_i and β_t are unknown constants, and the x_it are unobservable zero-mean random variables, uncorrelated and homoscedastic across time, but possibly correlated and heteroscedastic over the cross-section. Thus we impose

Assumption 1  For all i, t,

    E(x_it) = 0,                                                        (2.2)

for all i, j, t there exist finite constants ω_ij such that

    E(x_it x_jt) = ω_ij,                                                (2.3)

and for all i, j, t, u,

    E(x_it x_ju) = 0,  t ≠ u.                                           (2.4)

Our focus is on estimating the time trend. Superficially this is represented in (2.1) by β_t, but the α_i and β_t are identified only up to a location shift. To resolve this problem we initially impose the (arbitrary) restriction

    Σ_{i=1}^{N} α_i = 0.                                                (2.5)

An immediate consequence of (2.5) is the relationship

    y_At = β_t + x_At,                                                  (2.6)

defining the cross-sectional means

    x_At = N^{-1} Σ_{i=1}^{N} x_it,   y_At = N^{-1} Σ_{i=1}^{N} y_it.   (2.7)

We use the A subscript to denote averaging, in (2.7) with respect to i and subsequently also with respect to t. In view of (2.6) an obvious estimate of β_t is thus

    β̂_t = y_At.                                                         (2.8)

This can be a good estimate if N is large. For any fixed t, it is trivially seen that β̂_t is mean-square consistent for β_t if

    lim_{N→∞} N^{-2} Σ_{i=1}^{N} Σ_{j=1}^{N} ω_ij = 0.                  (2.9)

Condition (2.9) is trivially satisfied if the x_it are uncorrelated over i, but more generally if cross-sectional dependence is limited by the condition

    lim_{N→∞} N^{-1} Σ_{i=1}^{N} Σ_{j=1}^{N} |ω_ij| < ∞.                (2.10)

We mention (2.10) because it is analogous to a common weak dependence condition for time series; indeed it would correspond to an extension of the latter condition to stationary spatial lattice processes. Also (2.10) is slightly weaker than Condition C.3

of Bai and Ng (2002), in a different panel data context. The more general condition (2.9) permits a type of cross-sectional long-range dependence. While it can also cover certain models including additive factors, (2.9) does not, however, hold for factor models in which the x_it, for all i, are influenced by the same factor or factors. For this reason, and because we do not wish to model the β_t in terms of finitely many parameters (as would be the case for a polynomial trend, say), we use nonparametric smoothing across time, which requires T to be large, but not necessarily N. In order to achieve consistent estimates we assume the existence of a function β(τ), 0 ≤ τ ≤ 1, that is suitably smooth, such that

    β_t = β(t/T),  t = 1, ..., T.                                       (2.11)

The T-dependent argument in (2.11) enables information to be borrowed so as to permit consistent estimation of β(τ) for any fixed τ as T → ∞. Thus β_t, and hence in turn y_it, should be regarded as triangular arrays, i.e. β_t = β_tT, y_it = y_itT, but for ease of notation we suppress reference to the T subscript. Also, though our work is relevant in part to the case of a fixed N (as T → ∞), we also allow N → ∞ (slowly, as T → ∞). In the latter circumstances the α_i should also be regarded as triangular arrays, α_i = α_iN, in view of the restriction (2.5), and this would imply dependence of the y_it on N also, though again we suppress reference to an N subscript. In terms of practical applications, on the other hand, one envisages data for which T is much larger than N, one example being many frequent time series observations on a relatively modest number of stock prices.

Smoothed nonparametric estimation has been considered previously in a panel data setting. For example, Ruckstuhl, Welsh and Carroll (2000) considered a model in which β_t is replaced by a nonparametric function of a stochastic explanatory variable, which can vary across both i and t, the α_i are stochastic, and independent and homoscedastic across i, t, and N is fixed. Thus there is an explicit factor structure

built in, with no other source of cross-sectional dependence (and no heteroscedasticity). We discuss a factor structure for x_it, but as a special case only, and here as elsewhere we consider the possibility that N, like T, diverges. Also, Hart and Wehrly (1986) considered the special case of (2.1) with no individual-specific effects, i.e. α_i ≡ 0, and with no cross-sectional correlation or heteroscedasticity. A deterministic nonparametric trend also features in the panel data partly linear semiparametric regression models considered by Severini and Staniswalis (1994) and Moyeed and Diggle (1994), for example.

3. SIMPLE TREND ESTIMATION

We introduce a kernel function k(u), -∞ < u < ∞, satisfying

    ∫ k(u) du = 1,                                                      (3.1)

and a positive scalar bandwidth h = h_T. Then, with the abbreviation

    k_t = k((Tτ - t)/(Th)),                                             (3.2)

define the estimate

    β̃(τ) = Σ_{t=1}^{T} k_t y_At / Σ_{t=1}^{T} k_t,   τ ∈ (0, 1).        (3.3)

Important measures of goodness of nonparametric estimates, which lead to optimal choices of the bandwidth h, are mean squared error, i.e.

    MSE{β̃(τ)} = E{β̃(τ) - β(τ)}²,                                        (3.4)

and mean integrated squared error, i.e.

    MISE{β̃} = ∫_0^1 E{β̃(u) - β(u)}² du.                                 (3.5)
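
As a concrete illustration of (3.2)-(3.3), the following minimal Python sketch computes β̃(τ) from the cross-sectional means. It assumes only numpy; the function name and the uniform-kernel default are illustrative choices, not part of the paper.

    import numpy as np

    def simple_trend_estimate(y, tau, h, kernel=lambda u: 0.5 * (np.abs(u) <= 1)):
        # y: (N, T) array of y_it; tau: evaluation point in (0, 1); h: bandwidth.
        # kernel: a function k(u) integrating to one; the uniform kernel on [-1, 1]
        # is used here only as an illustrative default.
        N, T = y.shape
        y_At = y.mean(axis=0)                    # cross-sectional means y_At, as in (2.7)
        t = np.arange(1, T + 1)
        k_t = kernel((tau * T - t) / (T * h))    # k_t = k((T*tau - t)/(T*h)), as in (3.2)
        return np.sum(k_t * y_At) / np.sum(k_t)  # beta_tilde(tau), as in (3.3)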

To approximate the MSE and MISE we require conditions on β, k and h.

Assumption 2  β(u) is twice continuously differentiable, 0 ≤ u ≤ 1.

Assumption 3  k(u) satisfies (3.1) and is even, non-negative, and twice boundedly differentiable at all but possibly finitely many points, k(u) and its derivative k′(u) = (d/du)k(u) satisfying

    k(u) = O((1 + u⁴)^{-1}),   k′(u) = O((1 + |u|^c)^{-1}),  some c > 1.   (3.6)

Assumption 4  As T → ∞,

    h + (Th)^{-1} → 0.                                                  (3.7)

The smoothness condition on β in Assumption 2 could be relaxed at the cost of inferior optimal rates of convergence, or on the other hand strengthened, which would lead to better rates if Assumption 3 were modified to permit higher-order kernels; in any case the non-negativity condition on k is imposed only to simplify proofs. Assumption 4 is as usual a minimal condition for consistent estimation. Define the N × N cross-sectional disturbance covariance matrix Ω, having (i, j)-th element ω_ij (see Assumption 1), define by ℓ the N × 1 vector of 1's, and also define

    μ = ∫ k²(u) du,   κ = ∫ u² k(u) du,                                 (3.8)

    ψ(τ) = β″(τ)²,   χ = ∫_0^1 β″(u)² du,                                (3.9)

where β″(u) = (d²/du²) β(u).

Assumption 5  ψ(τ) > 0, χ > 0.

Theorem 1  Let Assumptions 1-5 hold. Then as T → ∞,

    MSE{β̃(τ)} ~ (ℓ′Ωℓ/N²) μ/(Th) + κ² ψ(τ) h⁴/4,                        (3.10)

    MISE{β̃} ~ (ℓ′Ωℓ/N²) μ/(Th) + κ² χ h⁴/4.                             (3.11)

Theorem 1 contains nothing new in itself, for fixed N, ℓ′Ωℓ/N² being just the variance of the dependent variable y_At in the nonparametric regression. Versions of this result were given long ago, e.g. by Benedetti (1977), though the conditions employed in Theorem 1 are essentially taken from Robinson (1997), and no proof is required. Hart and Wehrly (1986) also considered nonparametric regression of cross-sectional means in a panel data setting, but without allowing for individual-specific effects. However, the availability of cross-sectional data offers the possibility of improved estimation, in terms of variance reduction, as explored in the following section.

4. IMPROVED TREND ESTIMATION

Improved estimation of the trend requires it to be identified in a different way from that in Section 2, in particular by shifting its location. Consider the representation

    y_it = α_i^(w) + β_t^(w) + x_it,                                    (4.1)

where the bracketed superscript w represents a vector w = (w_1, ..., w_N)′ of weights such that

    w′ℓ = 1,                                                            (4.2)

    w′α^(w) = 0,                                                        (4.3)

where ℓ is again the N × 1 vector of 1's. This represents a generalisation of (2.1), (2.5), in which w = (1/N, ..., 1/N)′. It is convenient to write (4.1), for i = 1, ..., N, in

N-dimensional column vector form as

    y_t = α^(w) + β_t^(w) ℓ + x_t,                                      (4.4)

where

    α^(w) = (α_1^(w), ..., α_N^(w))′,   x_t = (x_1t, ..., x_Nt)′,   y_t = (y_1t, ..., y_Nt)′.   (4.5)

For given w consider a smooth function β^(w)(τ) such that β_t^(w) = β^(w)(t/T). For any two weight vectors w_a, w_b we have

    α^(w_a) + β^(w_a)(τ) ℓ = α^(w_b) + β^(w_b)(τ) ℓ.                    (4.6)

Thus

    β^(w_b)(τ) = β^(w_a)(τ) + w_b′ α^(w_a),                             (4.7)

    α^(w_b) = α^(w_a) + {β^(w_a)(τ) - β^(w_b)(τ)} ℓ = (I_N - ℓ w_b′) α^(w_a),   (4.8)

where I_N is the N × N identity matrix. The location shift in β_t^(w_b) relative to β_t^(w_a) is thus +w_b′α^(w_a) (and that in α_i^(w_b) relative to α_i^(w_a) is -w_b′α^(w_a), for each i). However, the time trend is scale- and shape-invariant to the choice of w. Moreover, defining the estimate

    β̃^(w)(τ) = Σ_{t=1}^{T} k_t w′y_t / Σ_{t=1}^{T} k_t,                  (4.9)

we have, with k_t as defined in Section 3,

    E{β̃^(w)(τ)} - β^(w)(τ) = Σ_{t=1}^{T} {β^(w)(t/T) - β^(w)(τ)} k_t / Σ_{t=1}^{T} k_t
                           = Σ_{t=1}^{T} {β(t/T) - β(τ)} k_t / Σ_{t=1}^{T} k_t   (4.10)

for any w, so the bias is invariant to w. Thus,

    MSE{β̃^(w)(τ)} = E{β̃^(w)(τ) - β^(w)(τ)}²                             (4.11)

and

    MISE{β̃^(w)} = ∫_0^1 E{β̃^(w)(u) - β^(w)(u)}² du                      (4.12)

are affected by w only through the variance. This is approximated by the leading term on the right-hand sides of (4.13) and (4.14) in Theorem 2 below, which requires the additional

Assumption 6  Ω is non-singular.

Theorem 2  Let Assumptions 1-6 hold. Then as T → ∞,

    MSE{β̃^(w)(τ)} ~ (w′Ωw) μ/(Th) + κ² ψ(τ) h⁴/4,                       (4.13)

    MISE{β̃^(w)} ~ (w′Ωw) μ/(Th) + κ² χ h⁴/4.                            (4.14)

Again no proof is needed. The bias contributions in Theorems 1 and 2 are identical, as discussed above, or alternatively because β and β^(w) have identical second derivatives. The right-hand sides of (4.13) and (4.14) are minimized, subject to (4.2), by w = w*, where

    w* = Ω^{-1}ℓ / (ℓ′Ω^{-1}ℓ).                                          (4.15)

For notational ease denote β*(τ) = β^(w*)(τ) = β(τ) + (ℓ′Ω^{-1}α)/(ℓ′Ω^{-1}ℓ), where α = (α_1, ..., α_N)′, and β̃*(τ) = β̃^(w*)(τ). Disregarding the vertical shift, β̃*(τ) can be said to be at least as good an estimate as β̃(τ). We have

    w*′ Ω w* = (ℓ′Ω^{-1}ℓ)^{-1}.                                         (4.16)
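
A minimal sketch of the improved estimate follows, assuming numpy and that Ω (or an estimate of it) is available; the names are illustrative. It forms the weights w* of (4.15) and applies them in (4.9).

    import numpy as np

    def gls_trend_estimate(y, tau, h, omega, kernel=lambda u: 0.5 * (np.abs(u) <= 1)):
        # omega: (N, N) cross-sectional covariance matrix Omega (or an estimate of it).
        N, T = y.shape
        ell = np.ones(N)
        w_star = np.linalg.solve(omega, ell)          # Omega^{-1} ell
        w_star /= ell @ w_star                        # w* of (4.15), so that w*'ell = 1
        t = np.arange(1, T + 1)
        k_t = kernel((tau * T - t) / (T * h))
        return np.sum(k_t * (w_star @ y)) / np.sum(k_t)   # beta_tilde^(w*)(tau), as in (4.9)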

By (4.16), the asymptotic variance, and the MSE and MISE, of β̃*(τ) equal those of β̃(τ) when

    ℓ′Ωℓ/N² = (ℓ′Ω^{-1}ℓ)^{-1}.                                          (4.17)

This is true if Ω has ℓ as an eigenvector. One such situation of interest arises when the cross-sectional dependence in x_it is described by a spatial autoregressive model with row- and column-normalized weight matrix: in particular,

    x_t = λ W x_t + ε_t,                                                (4.18)

where ε_t is an N × 1 vector of zero-mean, uncorrelated, homoscedastic random variables (with variance constant also across t), the scalar λ ∈ (-1, 1), and W is an N × N matrix with zero diagonal elements satisfying Wℓ = W′ℓ = ℓ. In practice the elements of W are a measure of inverse economic distance between the corresponding elements of x_t. Thus W is often, though not always, symmetric, in which case if one of the latter equalities holds, so do both. This is the case in the familiar "farmers-districts" setting in which W is block-diagonal with symmetric blocks, including spatial correlation across farmers within a district, but not across districts. However, if W is non-symmetric, and only row-normalized, β̃*(τ) is better than β̃(τ). As another example, suppose x_it has factor structure such that

    Ω = a I_N + b b′,                                                   (4.19)

where the scalar a > 0 and b is an N × 1 non-null vector. Then

    ℓ′Ωℓ/N² = a/N + (ℓ′b)²/N²,                                           (4.20)

    (ℓ′Ω^{-1}ℓ)^{-1} = a {N - (ℓ′b)²/(a + b′b)}^{-1},                     (4.21)

and (4.17) holds if and only if b is proportional to ℓ, i.e. if all factor weights are identical. Another simple example is when there is no cross-sectional dependence in x_it but not all variances ω_ii are identical: ℓ′Ωℓ/N² and (ℓ′Ω^{-1}ℓ)^{-1} are then N^{-1} times the arithmetic and harmonic means of the ω_ii respectively, so (4.17) fails and β̃*(τ) offers a strict variance reduction over β̃(τ).
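
To fix ideas, the sketch below (again assuming numpy; the helper names are hypothetical) constructs Ω for the two examples just discussed and compares ℓ′Ωℓ/N² with (ℓ′Ω^{-1}ℓ)^{-1}, so that (4.17) can be checked numerically.

    import numpy as np

    def spatial_ar_cov(W, lam, sigma2=1.0):
        # Omega implied by x_t = lam * W x_t + eps_t with Var(eps_t) = sigma2 * I,
        # i.e. Omega = sigma2 * (I - lam W)^{-1} (I - lam W')^{-1}.
        N = W.shape[0]
        A = np.linalg.inv(np.eye(N) - lam * W)
        return sigma2 * A @ A.T

    def factor_cov(a, b):
        # Omega = a * I_N + b b', as in (4.19).
        b = np.asarray(b, dtype=float)
        return a * np.eye(b.size) + np.outer(b, b)

    def variance_ratio(omega):
        # Ratio of the two asymptotic variance constants in (4.17):
        # (ell'Omega ell / N^2) / (ell'Omega^{-1} ell)^{-1}.  It is >= 1 by the
        # Cauchy inequality, with equality exactly when ell is an eigenvector of Omega.
        N = omega.shape[0]
        ell = np.ones(N)
        simple = ell @ omega @ ell / N**2
        gls = 1.0 / (ell @ np.linalg.solve(omega, ell))
        return simple / gls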

In terms of the impact on (4.13) and (4.14) of N increasing in the asymptotic theory, note that ℓ′Ω^{-1}ℓ ≥ N/‖Ω‖, denoting by ‖A‖ the square root of the greatest eigenvalue of A′A. Thus if ‖Ω‖ increases more slowly than N (it cannot increase faster), the variance components of (4.13) and (4.14) decrease faster than if N were fixed; indeed if ‖Ω‖ remains bounded the variance rate is (NTh)^{-1}, and the latter property can hold even in the factor case (4.19), where ‖Ω‖ increases at rate N but, as (4.21) indicates, (ℓ′Ω^{-1}ℓ)^{-1} = O(N^{-1}) if

    lim_{N→∞} (ℓ′b)² / {N(a + b′b)} < 1,                                 (4.22)

where the expression whose upper limit is taken is clearly less than 1 for all fixed N (by the Cauchy inequality), but converges to 1 as N → ∞ if, say, b is proportional to ℓ; essentially the requirement for (6.17) below to hold for this factor model is one of sufficient variability in the factor weights.

5. OPTIMAL BANDWIDTH CHOICE

A key question in implementing either β̃ or β̃* is the choice of the bandwidth h. Choices that are optimal in the sense of minimizing asymptotic MSE or MISE are conventional. The following theorem differs from well-known results only in indicating the dependence of the optimal choices on Ω and N, and so again no proof is given.

Theorem 3  Let Assumptions 1-5 hold. The h minimizing the asymptotic MSE and MISE of β̃ are respectively

    h_{1,MSE}(τ) = {ℓ′Ωℓ μ / (T N² κ² ψ(τ))}^{1/5},                       (5.1)

    h_{1,MISE} = {ℓ′Ωℓ μ / (T N² κ² χ)}^{1/5}.                            (5.2)

Let Assumption 6 also hold. The h minimizing the asymptotic MSE and MISE of β̃* are respectively

    h_{2,MSE}(τ) = {μ / ((ℓ′Ω^{-1}ℓ) T κ² ψ(τ))}^{1/5},                   (5.3)

    h_{2,MISE} = {μ / ((ℓ′Ω^{-1}ℓ) T κ² χ)}^{1/5}.                        (5.4)

For fixed N, these optimal h all have the conventional T^{-1/5} rate. If N is regarded as increasing also then, provided (2.9) holds, the rates of (5.1) and (5.2), and thus of (5.3) and (5.4) (in view of (4.15)), are of smaller order. In any case (4.15) implies generally smaller optimal bandwidths for β̃* than for β̃.

6. FEASIBLE OPTIMAL BANDWIDTH CHOICE AND TREND ESTIMATION

In practice the optimal bandwidths of the previous section cannot be computed. The constants μ and κ are trivially calculated, but ψ(τ) and χ are unknown. Discussion of their estimation can be found in the nonparametric smoothing literature, see e.g. Gasser, Kneip and Kohler (1991), and there is nothing about our setting to require additional treatment here, apart from the improved estimation possible by averaging over the cross-section. More notable is the need to approximate the partly or wholly unknown Ω. This arises also if, instead of employing a plug-in proxy to the optimal bandwidths of the previous section, some automatic method such as cross-validation is employed. Estimation of Ω is also required in order to form a feasible version of the optimal trend estimate β̃*(τ). In the simplest realistic situation, of no cross-sectional heteroscedasticity or dependence, Ω = ω² I_N with ω² unknown, in which case β̃*(τ) ≡ β̃(τ). More generally we may have a parametric structure permitting dependence and/or heteroscedasticity, such as in the examples of the previous section.
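
Since the asymptotic MISE has the form V μ/(Th) + κ²χh⁴/4, with V = ℓ′Ωℓ/N² for β̃ and V = (ℓ′Ω^{-1}ℓ)^{-1} for β̃*, setting its derivative with respect to h to zero gives h = {Vμ/(Tκ²χ)}^{1/5}, which is what the following sketch computes (numpy assumed; the function name is illustrative, and for the uniform kernel on [-1, 1], μ = 1/2 and κ = 1/3).

    import numpy as np

    def optimal_bandwidths(omega, T, chi, mu=0.5, kappa=1.0 / 3.0):
        # MISE-optimal bandwidths (5.2) and (5.4).  chi is int beta''(u)^2 du,
        # which must be supplied or estimated in practice.
        N = omega.shape[0]
        ell = np.ones(N)
        v_simple = ell @ omega @ ell / N**2                 # variance constant of beta_tilde
        v_gls = 1.0 / (ell @ np.linalg.solve(omega, ell))   # variance constant of beta_tilde*
        h1 = (v_simple * mu / (T * kappa**2 * chi)) ** 0.2  # h_{1,MISE}, (5.2)
        h2 = (v_gls * mu / (T * kappa**2 * chi)) ** 0.2     # h_{2,MISE}, (5.4)
        return h1, h2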

Of course if N remains fixed as T increases, Ω is by definition parametric, so there is no theoretical loss in regarding Ω as an unrestricted positive definite N × N matrix. If instead N is allowed to increase, this corresponds to a nonparametric treatment of Ω. In all cases Ω is estimated by means of residuals. It is natural to base the residuals on the originally parameterized model (2.1), but there is a choice of residuals depending on whether or not N is regarded as increasing with T, and, if it does, on the properties of the sequence Ω as N increases. Define the temporal means

    x_iA = T^{-1} Σ_{t=1}^{T} x_it,   y_iA = T^{-1} Σ_{t=1}^{T} y_it,   (6.1)

and the overall means

    x_AA = N^{-1} Σ_{i=1}^{N} x_iA,   y_AA = N^{-1} Σ_{i=1}^{N} y_iA.   (6.2)

Then we have, using (2.5),

    y_it - y_iA + y_AA = β_t + x_it - x_iA + x_AA.                      (6.3)

Employing also (2.6) gives

    y_it - y_iA - y_At + y_AA = x_it - x_iA - x_At + x_AA.              (6.4)

This suggests the residual

    x̃_it = y_it - y_iA - y_At + y_AA.                                   (6.5)

Clearly x_iA = O_p(T^{-1/2}) uniformly in i and x_AA = O_p(T^{-1/2}), helping to justify (6.5). However, it is necessary also that x_At = o_p(1), and this requires N → ∞ and, as discussed previously, will not hold even then in models such as (4.19), because of the strength of the cross-sectional dependence. Moreover, a satisfactory convergence rate

may be required in order to show that the optimal bandwidth, and β̃*(τ), are suitably closely approximated. Thus, instead of considering (6.5) further, we rely solely on T being large and, in view of (6.3), consider instead the residual

    x̂_it = y_it - y_iA - β̃_t + y_AA,                                    (6.6)

where β̃_t = β̃(t/T). We then estimate ω_ij by

    ω̂_ij = T^{-1} Σ_{t=1}^{T} x̂_it x̂_jt.                                (6.7)

We strengthen Assumption 4 to

Assumption 7  As T → ∞,

    h + (Th³)^{-1} → 0.                                                 (6.8)

Theorem 4  Let Assumptions 1-3, 6 and 7 hold. Then as T → ∞,

    E|ω̂_ij - ω_ij| = O(h³ + h^{3/2} T^{-1/2} + (Th)^{-1}),               (6.9)

uniformly in i and j.

The proof appears in Appendix A. Now define Ω̂ to be the N × N matrix with (i, j)-th element ω̂_ij, and consider the replacement of Ω by Ω̂ in the optimal bandwidths (5.1)-(5.4). For completeness we also assume estimates ψ̂(τ), χ̂ of ψ(τ), χ, respectively. Notice that from Theorem 4,

    ‖Ω̂ - Ω‖ ≤ {Σ_{i=1}^{N} Σ_{j=1}^{N} (ω̂_ij - ω_ij)²}^{1/2} = O_p(N h³ + N h^{3/2} T^{-1/2} + N (Th)^{-1}).   (6.10)
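
A minimal sketch of the residual-based covariance estimate (6.6)-(6.7) follows, assuming numpy; the pilot bandwidth and kernel are passed in, and β̃(t/T) is recomputed at every t with a plain loop kept for clarity rather than speed.

    import numpy as np

    def estimate_omega(y, h, kernel=lambda u: 0.5 * (np.abs(u) <= 1)):
        # Residual-based covariance estimate: residuals (6.6) and omega_hat (6.7).
        N, T = y.shape
        t_grid = np.arange(1, T + 1)
        y_At = y.mean(axis=0)
        beta_tilde = np.empty(T)
        for s in range(T):                           # beta_tilde(t/T) for t = 1, ..., T
            k_t = kernel((t_grid[s] - t_grid) / (T * h))
            beta_tilde[s] = np.sum(k_t * y_At) / np.sum(k_t)
        y_iA = y.mean(axis=1, keepdims=True)         # temporal means, as in (6.1)
        y_AA = y.mean()                              # overall mean, as in (6.2)
        x_hat = y - y_iA - beta_tilde[None, :] + y_AA    # residuals x_hat_it, (6.6)
        return x_hat @ x_hat.T / T                   # omega_hat_ij = T^{-1} sum_t x_hat_it x_hat_jt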

Assumption 8

    ψ̂(τ) - ψ(τ) = O_p(‖Ω̂ - Ω‖),   χ̂ - χ = O_p(‖Ω̂ - Ω‖).                 (6.11)

Assumption 8 is expressed in a somewhat unprimitive way, but the order in (6.10) is actually what is required to ensure that the effect of estimating the bias is negligible, and as the bias issue is of very secondary importance here we do not go into details about the rates attainable by a particular estimate. However, the rate would depend inter alia on the rate of decay of a bandwidth (not necessarily h) used in the nonparametric estimation of β″ or, more particularly, in view of (6.11), on its rate relative to that of h. Note that if N remains fixed as T increases the rate in (6.11) must be O_p(h³ + h^{3/2}T^{-1/2} + (Th)^{-1}), whereas if N increases then the latter rate is still relevant in, for example, the factor case (4.19), but if ‖Ω‖ does not increase, or increases more slowly than N, then (6.11) entails a milder restriction. Define

    ĥ_{1,MSE}(τ) = {ℓ′Ω̂ℓ μ / (T N² κ² ψ̂(τ))}^{1/5},                       (6.12)

    ĥ_{1,MISE} = {ℓ′Ω̂ℓ μ / (T N² κ² χ̂)}^{1/5},                            (6.13)

    ĥ_{2,MSE}(τ) = {μ / ((ℓ′Ω̂^{-1}ℓ) T κ² ψ̂(τ))}^{1/5},                    (6.14)

    ĥ_{2,MISE} = {μ / ((ℓ′Ω̂^{-1}ℓ) T κ² χ̂)}^{1/5}.                         (6.15)

Assumption 9  If N → ∞ as T → ∞ then

    N h³ + N h^{3/2} T^{-1/2} + N (Th)^{-1} → 0.                          (6.16)

Assumption 10  If N → ∞ as T → ∞ then the smallest eigenvalue of Ω remains bounded away from zero and

    (ℓ′Ωℓ)(ℓ′Ω^{-1}ℓ) / N² = O(1).                                        (6.17)

The eigenvalue requirement in Assumption 10 extends the non-singularity of Assumption 6 uniformly as N grows. By the Cauchy inequality, the left-hand side of (6.17) is no less than 1, so it cannot converge to zero. On the other hand, since

    ℓ′Ωℓ · ℓ′Ω^{-1}ℓ ≤ N² ‖Ω‖ ‖Ω^{-1}‖,                                    (6.18)

it follows that if also the greatest eigenvalue of Ω is bounded, (6.17) holds. But the greatest eigenvalue of Ω can diverge with N, as in the factor example (4.19), whereas (6.17) may hold even in this case: direct calculation from (4.19)-(4.21) gives (ℓ′Ωℓ)(ℓ′Ω^{-1}ℓ)/N² ≤ 1 + (ℓ′b)²/(aN), so it suffices that (ℓ′b)²/N remains bounded, which, like (4.22), is essentially a requirement of sufficient variability in the factor weights.

Theorem 5  Let Assumptions 1-3 and 5-10 hold. Then as T → ∞, and possibly N → ∞ also,

    ĥ_{1,MSE}(τ)/h_{1,MSE}(τ),  ĥ_{1,MISE}/h_{1,MISE},  ĥ_{2,MSE}(τ)/h_{2,MSE}(τ),  ĥ_{2,MISE}/h_{2,MISE}  →_p  1.   (6.19)

The proof is in Appendix A. We can also use Ω̂ to obtain a feasible version of β̃*(τ), namely

    β̂*(τ) = {ℓ′Ω̂^{-1} / (ℓ′Ω̂^{-1}ℓ)} Σ_{t=1}^{T} k_t y_t / Σ_{t=1}^{T} k_t.   (6.20)

Theorem 2 indicates that

    β̃*(τ) - β*(τ) = O_e({(ℓ′Ω^{-1}ℓ)(Th)}^{-1/2} + h²),                    (6.21)

where O_e denotes an exact stochastic order. It is of interest to show that β̂*(τ) achieves the same asymptotic MSE as β̃*(τ), in other words that β̂*(τ) - β̃*(τ) is of smaller order than (6.21). For this purpose we introduce:

Assumption 11  If N → ∞ as T → ∞ then

    N h + N² (Th)^{-1} → 0.                                              (6.22)

Theorem 6  Let Assumptions 1-4 and 5-11 hold. Then as T → ∞, and possibly N → ∞ also,

    β̂*(τ) - β̃*(τ) = o_p({(ℓ′Ω^{-1}ℓ)(Th)}^{-1/2} + h²).                   (6.23)

The proof is in Appendix A.

7. MONTE CARLO STUDY OF FINITE-SAMPLE PERFORMANCE

As always when large-sample asymptotic results are presented, the issue of finite-sample relevance arises. In the present case, one interesting question is the extent to which β̂*(τ) matches the efficiency of β̃*(τ), and whether it is actually better than β̃(τ), given the sampling error in estimating Ω. We study this question by Monte Carlo simulation in the case where Ω has the factor structure (4.19). In (2.1), we thus take

    x_it = b_i ξ_t + √a ε_it,                                            (7.1)

where the ξ_t and ε_it have mean zero and variance 1, and there is independence throughout {ξ_t, t = 1, ..., T; ε_it, i, t = 1, 2, ...}; more particularly we take the ξ_t and ε_it to be normally distributed. We tried several values of a, and set the b_i factor weights by first generating b_1, ..., b_N independently from a normal distribution, but then keeping them fixed across replications; in particular we considered the three choices b ~ N(0, I_N), N(0, 5 I_N) and N(0, 10 I_N). With respect to the deterministic component of (2.1), we took the same smooth trend function β(u) throughout, and fixed the individual effects α_i by first generating α_1, ..., α_{N-1} independently from the standard normal distribution, then taking α_N = -α_1 - ... - α_{N-1} in order to satisfy (2.5), then keeping

these α_i fixed across replications. The estimates β̃(τ), β̃*(τ) and β̂*(τ) were computed at the points τ = 1/4, 1/2, 3/4. For k we used the uniform kernel on [-1, 1], and three bandwidth values h were employed, together with two (N, T) combinations, the second having larger N and T than the first. Monte Carlo MSE, denoted "MSE" in Tables 1-3, was computed in each case across the replications. In Table 1 Var(b_i) = 1, in Table 2 Var(b_i) = 5, and in Table 3 Var(b_i) = 10.

(Tables 1-3 about here)

For the smaller (N, T) combination it was found that in every case β̂* had larger MSE than β̃* but smaller MSE than β̃. However, for the larger (N, T) combination β̂* was always worse than β̃ when the b_i were generated from a distribution with variance 1, though when the latter variance was increased to 5, β̂* was better in nearly half of the cases, and when this variance was 10, β̂* was worse than β̃ in only one case. It would appear that these results reflect the fact that the difference between the variances of β̃ and β̃*, namely

    ℓ′Ωℓ/N² - (ℓ′Ω^{-1}ℓ)^{-1},                                          (7.2)

has to be large enough relative to the variability increase due to estimation of Ω in order for β̂* to beat β̃. In some sense, the farther Ω is from an identity matrix the better for β̂*'s relative performance, explaining why a large variance in generating the b_i helps, though for a given such variance increasing N hurts β̂* relatively (despite the helpful simultaneous increase of T).

The above description of the results is not informative about the extent to which β̂* and β̃* are close, or about the margins by which β̂* beats or is beaten by β̃, and Tables 1-3 reveal these details. As a increases so does the variance of β̃, while the relative advantage of β̃* shrinks, so again the potential for β̂* to improve over β̃ is reduced, and the tables illustrate this. For the smaller (N, T) combination, β̂*'s performance is often very roughly midway between those of β̃* and β̃. Also for this combination, MSE is often U-shaped in h in Tables 1 and 2, and also for β̃* and β̂* in Table 3, but tends to be decreasing in h for β̃ in Table 3.
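
For concreteness, one replication of this design can be generated as below (numpy assumed). The trend function, a and the (N, T) sizes shown are illustrative placeholders rather than the study's exact design values, and the α_i are simply recentred, which is one way of imposing (2.5); the factor-weight variance of 5 corresponds to the Table 2 choice.

    import numpy as np

    def one_replication(N, T, a, b, alpha, beta_fun, rng):
        # One draw of the panel under (2.1) with the factor disturbance (7.1):
        # x_it = b_i * xi_t + sqrt(a) * eps_it, so that Omega = a I_N + b b'.
        xi = rng.standard_normal(T)                 # common factor, iid over t
        eps = rng.standard_normal((N, T))           # idiosyncratic noise
        x = np.outer(b, xi) + np.sqrt(a) * eps
        beta = beta_fun(np.arange(1, T + 1) / T)    # trend beta(t/T)
        return alpha[:, None] + beta[None, :] + x

    rng = np.random.default_rng(0)
    N, T, a = 5, 100, 1.0                           # illustrative sizes and a
    b = np.sqrt(5.0) * rng.standard_normal(N)       # factor weights, kept fixed across replications
    alpha = rng.standard_normal(N)
    alpha -= alpha.mean()                           # imposes (2.5)
    trend = lambda u: (1.0 + u) ** 2                # a smooth illustrative trend, not the paper's exact beta
    y = one_replication(N, T, a, b, alpha, trend, rng)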

The pattern of performance relative to h is less clear for the larger (N, T) combination.

In these simulations we have not pursued the discussion of optimal bandwidth choice (see Theorem 3) by incorporating feasible optimal bandwidths (as justified in Theorem 5). Some discussion of this issue was presented in Section 6, but a more explicit account is given by the following sequence of steps, sketched in code below. Given a twice-differentiable k, a feasible ĥ_{1,MISE} (6.13) can be computed (or approximated, perhaps by cross-validation) and employed in place of h in the estimate β̃(τ) of (3.3). Then the residuals (6.6) are calculated, followed by Ω̂, using (6.7). Finally, ĥ_{2,MISE} (6.15) is computed and used in the estimate β̂*(τ) of (6.20).
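
A compact sketch of that plug-in pipeline follows, assuming numpy together with the illustrative helpers estimate_omega and gls_trend_estimate introduced earlier; χ̂ (an estimate of ∫β″(u)² du) and the pilot bandwidth are taken as given rather than estimated here.

    import numpy as np

    def feasible_gls_trend(y, tau, h_pilot, chi_hat, mu=0.5, kappa=1.0 / 3.0,
                           kernel=lambda u: 0.5 * (np.abs(u) <= 1)):
        # Pilot bandwidth -> residual covariance estimate (6.7)
        # -> feasible bandwidth h_hat_{2,MISE} (6.15) -> feasible estimate (6.20).
        omega_hat = estimate_omega(y, h_pilot, kernel)
        N, T = y.shape
        ell = np.ones(N)
        v_gls = 1.0 / (ell @ np.linalg.solve(omega_hat, ell))
        h2_hat = (v_gls * mu / (T * kappa**2 * chi_hat)) ** 0.2
        return gls_trend_estimate(y, tau, h2_hat, omega_hat, kernel)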

8. FURTHER DIRECTIONS FOR RESEARCH

1. Our asymptotic variance formulae for β̃(τ) and β̃*(τ) appear also in central limit theorems, under some additional conditions; indeed one could develop joint central limit theorems for both β̃(τ) and β̃*(τ) at finitely many, r, fixed points τ_1, ..., τ_r, with asymptotic independence across the τ_i. When the bias is negligible relative to the standard deviation, the convergence rate will be (Th)^{1/2} when N is fixed, and faster if N is allowed to increase with T. We could also develop a central limit theorem for β̂*(τ_i), i = 1, ..., r, giving the same limit distribution.

2. The assumption of regular spacing across t is easily relaxed, with some regularity conditions on the spacings, since much of the fixed-design nonparametric regression estimation literature permits this.

3. Temporal correlation can also be introduced. For example, suppose

    x_t = Σ_{j=0}^{∞} A_j ε_{t-j},                                       (8.1)

where the ε_t are uncorrelated with covariance matrix I_N, and the N × N matrices A_j satisfy

    Σ_{j=0}^{∞} A_j z^j ≠ 0,  |z| = 1,                                   (8.2)

    Σ_{j=0}^{∞} ‖A_j‖ < ∞.                                               (8.3)

Then as T → ∞,

    Var{β̃(τ)} ~ 2π (ℓ′f(0)ℓ/N²) μ/(Th),                                  (8.4)

where

    f(0) = (2π)^{-1} (Σ_{j=0}^{∞} A_j)(Σ_{j=0}^{∞} A_j)′,                 (8.5)

which is proportional to the value of the spectral density matrix of x_t at frequency zero. Likewise, as T → ∞,

    Var{β̃*(τ)} ~ 2π {ℓ′f(0)^{-1}ℓ}^{-1} μ/(Th).                           (8.6)

We can consistently estimate f(0) by smoothed spectral density estimation, and thence extend the results of Section 6.

4. We could proceed further by relaxing (8.3) to permit long memory, or on the other hand relaxing (8.2) to permit antipersistence. Relevant variance formulae for univariate fixed-design regression with long-memory or antipersistent disturbances can be found in Robinson (1997), whose results can be extended to our otherwise more general setting, one further issue arising being the possibility of varying memory parameters over the cross-section.

5. Unknown individual-specific multiplicative effects can also be incorporated. To extend (2.1), write

    y_t = α + λ β_t + x_t,                                               (8.7)

where λ is an N × 1 unknown vector of multiplicative effects and α = (α_1, ..., α_N)′. Since the scale, as well as the location, of β_t is now not fixed, we need a normalization restriction on λ. Imposing

    ℓ′λ = N,                                                             (8.8)

in addition to (2.5), we readily see that β̃(τ) retains the properties described in Theorem 1. However, improving on β̃(τ) along the lines discussed in Section 4 is more problematic. It was seen in Section 4 that changing w does not change the bias of β̃^(w)(τ) as an estimate of β^(w)(τ), and changes the asymptotic variance by a factor that depends only on w. But consider any α^(w), λ^(w), β^(w)(τ), where

    w′α^(w) = 0,   w′λ^(w) = 1,                                          (8.9)

and

    α + λ β(τ) = α^(w) + λ^(w) β^(w)(τ).                                 (8.10)

We deduce that

    β^(w)(τ) = w′α + (w′λ) β(τ),                                         (8.11)

    λ^(w) = λ / (w′λ),                                                   (8.12)

and thence

    α^(w) = {I_N - λ w′/(w′λ)} α.                                        (8.13)

Then the bias

    E{β̃^(w)(τ)} - β^(w)(τ) = Σ_{t=1}^{T} {β^(w)(t/T) - β^(w)(τ)} k_t / Σ_{t=1}^{T} k_t = (w′λ) BIAS{β̃(τ)},   (8.14)

where BIAS{β̃(τ)} is the quantity in (4.10). Thus, varying w changes the bias by a factor which reflects the unknown multiplicative effects, so the incorporation of these complicates the study of optimal trend estimation.

6. Analogous methods and theory could be developed for corresponding models in which the regressor in the nonparametric function is stochastic (and possibly multivariate), rather than deterministic. Robinson (007) considers static stochastic-design nonparametric regression but without distinguishing between a time and cross-sectional dimension, and without individual e ects, but his conditions on spatial dependence in regressors and disturbances are quite general and could be employed in more general settings, such as that just envisaged. Appendix A: Proofs of Theorems Proof of Theorem 4 We have (^x it^x jt (x it x jt! ij ) : (A.) ^! ij! ij = T x it x jt ) + T t= t= The second term on the right has mean zero and variance bounded by CT, where C throughout denotes a generic constant (so in the present case there is uniformity in i and j). From (6.3) and (6.6) we may write ^x it = x it + d i + e t ; (A.) where d i = x AA x ia ; e t = t ~ t : (A.3) Then ^x it^x jt x it x jt = (^x it x it ) (^x jt x jt ) + x it (^x jt x jt ) + (^x it x it ) x jt = (d i + e t ) (d j + e t ) + x it (d j + e t ) + (d i + e t ) x jt : (A.4) The rst term on the right of (A.4) is bounded in absolute value by d i + d j + e t. From Assumption, E(d i ) CT uniformly in i. Next write e t = f t =k t m t =k t ; (A.5) 3

where f t = () k st ( t s ); (A.6) s= m t = () k st x As ; s= k t = () k st : s= (A.7) (A.8) By Lemma 3, jk t j C uniformly in t and T. Next, m t has mean zero and variance bounded by C() kst C() ; applying Lemma. By the mean value theorem s t s t = 0 t + s t 00 T T st; s= (A.9) (A.0) where 0 t = 0 (t=t ) and j 00 stj C: Thus jf t j j 0 s t s tj () k st + () T T s= s= t jk st 00 stj : (A.) The last term is O(h ) by Lemma. The rst factor of the rst term on the right is uniformly bounded and the second factor may be written Z h s t () k st uk(u)du : s= (A.) From Lemma this is bounded by ( Ch + T h + 3 t T + t ) (A.3) for t T. For other t we use the bound Ch from Lemma. It follows 4

from Lemma 3 that f t =k t has the same bounds. Thus T t= ft Thus, for large T, k t C + C C h 3 + ( E T C t= T + T 4 h + h 4 T T + T 4 h + h : 4 T h 3 + T + T 4 h 4 d i + d j + e t ) C Next, looking at the second term in (A.4) we have E d j T x it t= ( Ed j T! X (t=) t : (A.4) h 3 + : (A.5) t= ) E x it CT : (A.6) Also E T ( x it f t =k t C T t= C T t= ft k t ) h 3= + T + T h : (A.7) Write Thus m t E x it k t t= x it m t = k(0) x itx At + x it k(0) E t= 8 0 + >< >: E B @ x it x At k t t= k st x As : s= s6=t x it k t 9 > = C k st x As A >; s= s6=t (A.8) : (A.9) 5

The rst term on the right is bounded by C=(T h). The second term is C 8 >< >: E t= t= s= x it k t k sv 0 B @ 9 > = C k st x As A >; s= s6=t! C=h + E 0 B @ t= v= v6=t ktv x it x Av x iv x At C k t k v A (A.0) by Lemma. It follows that E T x it m t =k t C=( ): t= (A.) Thus, E j^! ij! ij j C h 3 + + C h 3= + T T + T h C h 3 + h3= + T + C : (A.) The proof is completed. Proof of Theorem 5 It su ces to consider only the MISE-optimal bandwidths, the proofs for the MSE-optimal ones being identical. We have 8 =5 < ^h ;MISE h ;MISE = T N : `0 ^` ^! =5 `0` 9 = ; =5 : (A.3) By the mean value theorem, the last factor is bounded in absolute value by 5 j~r j `0 ^ ` + 5 j~r j ^ ; (A.4) where ~r lies between (`0`) 4=5 =5 and (`0 ^`) 4=5^ =5 and ~r lies between (`0`) =5 6=5 6

6=5 and (`0 ^`) =5^. Then straightforwardly we deduce that (A.4) is n o (`0`) 4=5 N ^ + (`0`) ^ = O p N =5 ^ + kk ^ = O p N =5 ^ (A.5) by Assumption 8. Thus (A.3) is =5 `0` N =5 ^! O p T N `0`! =5 `0` = O p ^ = o T N p (h ;MSE ) : (A.6) Next consider ^h ;MSE h ;MSE = T 8 =5 >< `0 ^ ` >: ^=5 =5 (`0 `) =5 =5 9 >= >; : (A.7) The last factor is bounded in absolute value by 5 j~s j `0 ^ ` `0 ` + 5 j~s j ^ ; (A.8) where ~s lies between (`0 `) 4=5 =5 and `0 ^ ` 4=5 =5 ^ and ~s lies between (`0 `) =5 6=5 and `0 ^ ` =5 6=5 ^. Now `0 ^ ` `0 ` `0 ^ ` `0 ^ ` (`0 `) `0 ^ ^ ` = `0 ^ ` (`0 `) (`0 `) = O p ^ (`0 `) 0 ^ = O p @ A : (A.9) N 7

Thus (A.8) is O p 0 @ `0 `4=5 ^ + `0 ` =5 ^ A N = O p (`0 N `) 4=5 ^ + kk ^! : (A.30) It follows from Assumption 8 that! 4=5 (`0 `) ^h ;MISE h ;MISE = O p ^ NT =5 = O p T `0 ` =5 ^ = o p (h ;MISE ) : (A.3) The proof is completed. Proof of Theorem 6 We have ( ^() ~() = = ( `0 ^ `0 ^ ` `0 ^ `0 ^ ` ) `0 `0 ` k t ( + t` + x t ) = t= t= ) `0 `0 ` k t ( + x t ) = k t : t= t= k t (A.3) It clearly su ces to show that the factor in braces has norm o p (`0 `) h + N (). This norm is bounded by `0 ^ `0 ^ ` `0 ` + `0 ` `0(^ ) : (A.33) From the proof of Theorem 5, 0 `0 ^ ` `0 ` ^ = O @ A p ; N (A.34) 8

so since `0 ^ N ^ the rst term is Op N `0(^ ) = O p `0 ` ^ ^. Also, (A.35) and thus from Assumption 0 the second term has the same bound. This bound is O p `0 ` ^ O p `0 ` N h 3 + h3= T + : (A.36) Now Nh 3 = o(h ) if Nh! 0 and Nh 3= =T = o(h ) if N = o (). Also, N=( ) = o () if N = o(t ) but this is implied by the last condition. The conclusion follows. Appendix B: Technical Lemmas Lemma C ( + u c ). Then For given d 0 let the function g(u) be such that for c > d, jg(u)j max tt ( s t d ) s t g C: s= Proof. The expression in braces is bounded by C() d X js tju for any positive U. This is bounded by Choosing U gives the result. js tj d + C() c d X js tju (B.) js tj d c ; (B.) C() d U d+ + C() c d U d c+ : (B.3) Lemma Let the function g be twice boundedly di erentiable, and let g(u) = ( + juj c ) (B.4) 9

for c >, and let g(u) and its derivative g 0 (u) be integrable. Then for t, t T, s= s g t Z Proof. The left side is g(u)du C + T h + 3 (s Zt)= s= (s t )= Z g(u)du s t g Z t= g(u)du: c t T + g(u) du t c! : (B.5) (B.6) (B.7) (T t)= The rst integrand is s t s t g 0 u + s t g00 st where jgstj 00 C. Thus (B.6) is bounded in absolute value by s t () g0 + CT () C 3 + C T h ; 3 s= u ; (B.8) using Lemma. On the other hand (B.7) is bounded in absolute value by (B.9) T t C c + C c t : (B.0) Lemma 3 For all su ciently large T min k t =8: tt (B.) Proof. De ne t = Z (T t)=() k(u)du: (B.) t=() 30

Then The integrand is k t t = s= Z (T t)=() t=() s t s t k 0 k s t s t u + C t k(u) du: (B.3) u ; (B.4) where max C t < : tt Thus (B.3) is bounded in absolute value by () s t k0 + C t T () 3 C() + CT h 3 s= (B.5) =8; (B.6) say, for large enough T, using Lemma and Assumption 7. Now for large enough T, min t tt Z =(h) 0 k(u)du 4 (B.7) and thus min k t = min t tt tt max jk t t j > =8: (B.8) tt Acknowledgement This research was supported by ESRC Grant RES-06-3-0036. I am grateful for the helpful comments of a referee and Jungyoon Lee, and to the latter also for carrying out the simulations. 3

References

Bai, J., S. Ng, 2002. Determining the number of factors in approximate factor models. Econometrica 70, 191-221.

Benedetti, J.K., 1977. On the nonparametric estimation of regression functions. Journal of the Royal Statistical Society, Series B 39, 248-257.

Gasser, T., A. Kneip, W. Kohler, 1991. A flexible and fast method for automatic smoothing. Journal of the American Statistical Association 86, 647-652.

Hart, J.D., T.E. Wehrly, 1986. Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association 81, 1080-1088.

Moyeed, R.A., P.J. Diggle, 1994. Rates of convergence in semi-parametric modelling of longitudinal data. Australian Journal of Statistics 36, 75-93.

Robinson, P.M., 1997. Large-sample inference for nonparametric regression with dependent errors. Annals of Statistics 25, 2054-2083.

Robinson, P.M., 2007. Asymptotic properties of nonparametric regression with spatial data. Preprint.

Ruckstuhl, A.F., A.H. Welsh, R.J. Carroll, 2000. Nonparametric function estimation of the relationship between two repeatedly measured variables. Statistica Sinica 10, 51-71.

Severini, T.A., J.G. Staniswalis, 1994. Quasi-likelihood estimation in semiparametric models. Journal of the American Statistical Association 89, 501-511.

Starica, C., C. Granger, 2005. Nonstationarities in stock returns. Review of Economics and Statistics 87, 503-522.