Nonparametric Time Varying Coefficient Panel Data Models with Fixed Effects

Nonparametric Time Varying Coefficient Panel Data Models with Fixed Effects By Degui Li, Jia Chen and Jiti Gao Monash University and University of Adelaide Abstract This paper is concerned with developing a nonparametric time varying coefficient model with fixed effects to characterize nonstationarity and trending phenomenon in a nonlinear panel data model. We develop two methods to estimate the trend function and the coefficient function without taking the first difference to eliminate the fixed effects. The first one eliminates the fixed effects by taking cross sectional averages, and then uses a nonparametric local linear method to estimate both the trend and coefficient functions. The asymptotic theory for this approach reveals that although the estimates of both the trend function and the coefficient function are consistent, the estimate of the coefficient function has a rate of convergence of /2, which is slower than N /2 as the rate of convergence for the estimate of the trend function. To estimate the coefficient function more efficiently, we propose a pooled local linear dummy variable approach. This is motivated by a least squares dummy variable method proposed in parametric panel data analysis. This method removes the fixed effects by deducting a smoothed version of cross time average from each individual. It estimates both the trend and coefficient functions with a rate of convergence of N /2. The asymptotic distributions of both of the estimates are established when T tends to infinity and N is fixed or both T and N tend to infinity. Both the simulation results and real data analysis are provided to illustrate the finite sample behavior of the proposed estimation methods. JEL Classifications: C3, C4, C23. Abbreviated Title: Time Varying Coefficient Models Keywords: Fixed effects, local linear estimation, nonstationarity, panel data, time varying coefficient function.

. Introduction Panel data analysis has received a lot of attention during the last two decades due to applications in many disciplines, such as economics, finance and biology. The double index panel data models enable researchers to estimate complex models and extract information which may be difficult to obtain by applying purely cross section or time series models. There exists a rich literature on parametric linear and nonlinear panel data models see the books by Baltagi 995, Arellano 2003 and Hsiao 2003. However, it is known that the parametric panel data models may be misspecified, and estimators obtained from misspecified models are often inconsistent. To address such issues, some nonparametric methods have been used in both panel data model estimation and specification testing see, for example, Ullah and Roy 998, Lin and Ying 200, Fan and Li 2004, Hjellvik, Chen and Tjøstheim 2004, Cai and Li 2008, Henderson, Carroll and Li 2008, Mammen, Støve and Tjøstheim 2009, and Zhang, Fan and Sun 2009. Meanwhile, trending econometric modeling of nonstationary processes has also gained a great deal of attention in recent years. For example, it is generally believed that the increase in carbon dioxide emissions through the 20th century has caused global warming problem and it is important to model the trend of the global temperature. Some existing literature, such as Gao and Hawthorne 2006, revealed that the parametric linear trend does not approximate well the behavior of global temperature data. Hence, nonparametric modelling of the trending phenomenon has since attracted interest. One of the key features of the nonparametric trending model is that it allows for the data to speak for themselves with regard to choosing the form of the trend. For the recent development in nonparametric and semiparametric trending modelling of time series or panel data, see Gao and Hawthorne 2006, Cai 2007, Gao 2007, Robinson 2008, Atak, Linton and iao 2009 and the references therein. While there is a rich literature on parametric and nonparametric time varying coefficient time series models Robinson 989; Phillips 200; Cai 2007, as far as we know, few work has been done in the panel data case. The recent work by Robinson 2008 may be among the first to introduce a trending time varying model for the panel data case where cross sectional dependence is incorporated. In both theory and applications, various explanatory variables are of significant interest when modeling the trend of a panel data. Thus, it may be more informative and useful to add such explanatory variables into a time varying panel data model when modeling the trend of a panel data. 2

This paper thus proposes using a nonparametric trending time varying coefficient panel data model of the form Y it = f t + d β t,j it,j + α i + e it j= = f t + it β t + α i + e it, i =,, N, t =,, T,. where it = it,,, it,d, β t = β t,,, β t,d, all β t and f t are unknown functions, {α i } reflects unobserved individual effect, and {e it } is stationary and weakly dependent for each i and independent of { it } and {α i }, T is the time series length and N is the cross section size. Model. is called a fixed effects model when {α i } is allowed to be correlated with { it } with an unknown correlation structure. Model. is called a random effects model when {α i } is uncorrelated with { it }. For the purpose of identification, we assume that throughout the paper. N α i = 0,.2 i= Model. includes many interesting nonparametric panel data models. For example, when {β t } does not vary over time and reduces to a vector of constants, model. becomes a semiparametric trending panel data model with fixed effects. When β t 0 d 0 d is a d dimensional null vector, model. reduces to a nonparametric trending panel data model as discussed in Robinson 2008, which allows for cross sectional dependence for {e it }. The aim of this paper is to construct consistent estimates for the time trend f t and time varying coefficient vector β t before we establish asymptotic properties for the estimates. As in Robinson 989, 2008 and Cai 2007, we suppose that the time trend function f t and the coefficient vector β t satisfy t f t = f T and β t,j = β j t T where f and β j are unknown smooth functions., t =,, T,.3 In this paper, we consider two classes of local linear estimates. As N α i i= = 0, the first method eliminates the fixed effects by taking cross sectional averages, and we call it the averaged local linear method. We establish asymptotic distributions for the resulting estimates of f and β under mild conditions. The asymptotic results reveals that as both T and N tend to infinity, the rate of convergence for the estimate of the coefficient function 3

β is O P /2 while the rate of convergence for the estimate of the trend function f is O P N /2. To improve the rate of convergence for the coefficient function, a local linear dummy variable approach is proposed. This is motivated by a least squares dummy variable method proposed in parametric panel data analysis see Chapter 3 of Hsiao 2003 for example. This method removes the fixed effects by deducting a smoothed version of cross time average from each individual. As a consequence, it is shown that the rate of convergence of O P N /2 for both of the estimates is achievable. The simulation study in Section 3 confirms that the local linear dummy variable estimate of β outperforms the averaged local linear estimate. The rest of this paper is organized as follows. Two classes of local linear estimates as well as their asymptotic distributions are given in Section 2. The simulated example is provided in Section 3. A real data example is given in Section 4. Section 5 summarizes some conclusions and discusses future research. All the mathematical proofs of the asymptotic results are given in Appendix A. 2. Nonparametric estimation method and asymptotic theory In this section, we introduce two classes of local linear estimates and establish the asymptotic distributions of the proposed estimates. In Section 2., we consider the averaged local linear estimation method. Section 2.2 discusses the local linear dummy variable approach. 2.. Averaged local linear estimation To introduce the estimation method, we introduce some notation. Define Y t = N N Y it, i= t = N N it i= and e t = N N e it. i= By taking averages over i and using N α i = 0, we have i= Y t = f t + t β t + e t, t =,, T, 2. where the individual effects α i s are eliminated. Letting Y = Y,, Y T, f = f,, f T, B, β = β,, T β T and e = e,, e T, model 2. can then be rewritten as the following vector form: Y = f + B, β + e. 2.2 4

We then adopt the conventional local linear approach see, for example, Fan and Gijbels 996 to estimate β = f, β,, β d. For given 0 < τ <, define Mτ =.. T τt. T τt τt. T τt T and τt T τt W τ = diag K,, K. Assuming that β has continuous derivatives of up to the second order, by Taylor expansion we have β t T t t 2 = β τ + β τ T τ + O T τ, 2.3 where 0 < τ < and β is the derivative of β. Based on the local linear approximation in 2.3, β τ, h β τ can then be estimated by solving the optimization problem: arg min Y Mτa, b W τ Y Mτa, b. 2.4 a R d+,b R d+ Following the standard argument, the local linear estimator of β τ is [ β τ = [I d+, 0 d+ ] M τw τmτ] M τw τy, 2.5 where I d+ is a d + d + identity matrix and 0 d+ is a d + d + null matrix. To establish asymptotic results, we need to introduce the following regularity conditions. Here and in the sequel, define µ j = u j Kudu and ν j = u j K 2 udu for j = 0,, 2. A The probability kernel function K is symmetric and Lipschitz continuous with a compact support [, ]. A2 i { i, e i, i } is a sequence of independent and identically distributed i.i.d. variables, where i = it, t and e i = e it, t. Furthermore, for each i, { it, e it, t } is stationary and α mixing with mixing coefficient α k satisfying α k = O k τ, where τ > δ+2 δ for some δ > 0 involved in ii below. 5

. Fur- ii E it = 0 d and there exists a positive definite matrix Σ := E thermore, E it 22+δ <, where is the L 2 distance. it it iii The error process {e it } is independent of { it } with E [e it ] = 0 and σe 2 = E [ e 2 it] < [. Furthermore, E e it 2+δ] < and c tc e t is positive definite, where t= c t = E is i,s+t and c e t = E e is e i,s+t. A3 The trend function f and the coefficient function β have continuous derivatives of up to the second order. A4 The bandwidth h satisfies that h 0 and as T. Remark 2.. i The conditions on the kernel function K in A are imposed for brevity of our proofs and they can be weakened. For example, the compact support assumption can be removed if we impose certain restriction on the tail of the kernel function. In A2 i, we assume that { i, e i, i } is cross sectional independent and each time series component is α mixing. Such assumptions are reasonable and verifiable and cover many linear and nonlinear time series models see, for example, Fan and Yao 2003; Gao 2007; Li and Racine 2007. Condition A2iii imposes the homoscedasticity assumption on {e it }. In fact, the independence between {e it } and { it } may be weakened through allowing e it = σ it, tɛ it, where σx, t is a positive function and Lipschitz continuous in t, and {ɛ it } satisfies A2iii with E[ɛ it ] = 0 and E[ɛ 2 it ] =. ii Both A3 and A4 are mild common conditions on the smoothness of the functions and the bandwidth involved in the local linear fitting for the fixed design case. Since both the stationarity and mixing condition are only required to impose on the parametric regressors { it }, and the nonparametric kernel estimation is only involved in the fixed design case, there is no need to impose any kind of relationship between the mixing coefficient α k and the bandwidth h in Assumption A4. Define = 0 d 0 d Σ, Λ = ν 0 σe 2 + 2 c e t 0 d t= 0 d c tc e t. We state an asymptotic distribution for β τ in the following theorem. 6

Theorem 2.. Consider models. and.2. Suppose that A A4 are satisfied. Then, as T, D NT β τ β τ bτh 2 + o P h 2 d N for given 0 < τ <, where D NT = diag f, β,, β d. In particular, N fτ fτ b f τh 2 + o P h 2 d N Λ 0 d+, 2.6 N, Id, bτ = 2 µ 2β τ and β = 0, ν 0 σe 2 + 2 c e t 2.7 and βτ βτ b β τh 2 + o P h 2 d N 0 d, Σ ν 0 c tc e t Σ, t= 2.8 where b f τ = 2 µ 2f τ, b β τ = 2 µ 2β τ and β = β,, β d. Remark 2.2. i The asymptotic distribution 2.6 is established by letting T. This implies that two cases are included: i T tends to infinity and N is fixed and ii both T and N tend to infinity. Note that Assumption A4 is stronger than the condition h 0 and T Nh as either T or both T and N. Such a condition is reasonable and sufficient because the panel data series is being treated as a pooled time series with a sample size of NT in the construction of our estimation method. ii The above theorem complements some existing results see Robinson 989, 2008; Cai 2007 for example. Theorem 2. considers only the interior point in the interval 0, and the case of the boundary points can be dealt with analogously see Theorem 4 in Cai 2007 for example. It can be seen from 2.7 and 2.8 that the rate of convergence of βτ is slower than that of fτ, which is confirmed by the simulation study in Section 3 below. 2.2. Local linear dummy variable approach As shown in Theorem 2., the rate of convergence of the averaged local linear estimate of β is /2. To get an estimate that has faster rate of convergence, we next propose a local linear dummy variable approach. Recently, Su and Ullah 2006 considered a profile likelihood dummy variable approach in partially linear models with fixed effects, and Sun, Carroll and Li 2009 discussed a local linear dummy variable method in varying coefficient models with fixed effects. Note that. can be rewritten as Ỹ = f + B, β + Dα + ẽ, 2.9 7

where Ỹ = Y,, YN, Yi = Y i,, Y it, ẽ = e,, e N, ei = e i,, e it, f = I N f,, f T = I N f, B, β = β,, T β T, 2β,, NT β T, α = α,, α N, D = I N I T, in which I k is a k dimensional vector of ones and f is as defined in Section 2.. As N α i = 0, 2.9 can be further rewritten as i= where α = α 2,, α N and D = Ỹ = f + B, β + D α + ẽ, 2.0 I N, I N IT. Based on the Taylor expansion in 2.3, we have f + B, β Mτ β τ, h β τ, where β = f, β,, β d and M τ = M τ,, M N τ with M i τ = i.. it τt. T τt τt i. T τt it. The algorithm for the local linear dummy variable method is described as follows. Define W τ = I N W τ. For given 0 < τ <, minimize Ỹ Mτ β τ, h β τ D α W τ Ỹ Mτ β τ, h β τ D α with respect to β τ, h β τ and α. 2. Taking derivative of 2. with respect to α and setting the result to zero, we obtain α := α τ = D W τ D D W τ Ỹ Mτ β τ, h β τ. Replacing α in 2. by α, we obtain the concentrated weighted least squares: Ỹ Mτ β τ, h β τ W τ Ỹ Mτ β τ, h β τ, 2.2 8

where W τ = K τ W τ Kτ and Kτ = I NT D D W τ D D W τ. Observe that for any τ, Kτ D α = 0. Hence, the fixed effects term D α is eliminated in 2.2. as Minimizing 2.2 with respect to β τ, h β τ, we obtain the estimate of β τ β τ = [I d+, 0 d+ ] [ M τ W τ Mτ] M τ W τỹ. 2.3 Then β τ is called the local linear dummy variable estimator of β τ and its asymptotic distribution is given in the following theorem. Theorem 2.2. Consider models. and.2. Suppose that A A4 are satisfied. For given 0 < τ <, as T, N β τ β τ bτh 2 + o P h 2 d N 0 d+, Λ. 2.4 In particular, N fτ fτ b f τh 2 + o P h 2 d N 0, ν 0 σe 2 + 2 c e t 2.5 and N βτ βτ b β τh 2 + o P h 2 d N 0 d, Σ ν 0 c tc e t t= Σ. 2.6 Remark 2.3. As in Theorem 2., Theorem 2.2 covers the case where N is either fixed or N goes to infinity. Moreover, the local linear dummy variable estimates of both f and β have a rate of convergence of N /2. This implies that the local linear dummy variable estimate of β is asymptotically more efficient than the averaged local linear estimate of β. In addition, as the averaged local linear method, the individual fixed effects are eliminated in the estimation procedure without taking the first difference, and the fixed effects do not affect the asymptotic distributions of the two estimates. In Section 3 below, we provide a simulated example to compare the small sample behavior of the proposed local linear estimation methods with that of the ordinary local linear approach ignoring the fixed effects in nonparametric random effects panel data models. 3. Simulations 9

Throughout this section, we use a quadratic kernel function Ku = 3 4 x2 I x. In the averaged local linear estimation method, a conventional leave-one-out cross validation method is used to the averaged model 2. to select an optimal bandwidth. In contrast, in the local linear dummy variable method, a leave-one-unit-out cross validation method, which was proposed by Sun, Carroll and Li 2009, is applied to select an optimal bandwidth, as the conventional leave-one-out cross validation method fails to provide satisfactory results due to the existence of the fixed effects. See Sun, Carroll and Li 2009 for more details on this leave-one-unit-out method. Example 3.. Consider a trending time varying coefficient model of the form Y it = ft/t + βt/t it + α i + e it, i =,, N, t =,, T, 3. where fu = u 2 + u +, βu = sinπu, { it } is generated by the AR process it = 2 i,t + x it, t, i,0 = 0, {x it, t } is a sequence of independent and identically distributed N0, random variables, {x it, t } is independent of {x jt, t } for i j, i = T T it, θ 0 N α i = θ 0 i + u i, i =,, N, α N = α i, 3.2 = 0,, 2, {u i, i } is a sequence of independent and identically distributed N0, random variables, {e it } is an AR process generated by i= e it = 2 e i,t + η it, {η it, t } is a sequence of independent and identically distributed N0, random variables, {η it, t } is independent of {η jt, t } for i j, and {η it, i N, t T } is independent of { it, i N, t T }. Model 3. is similar to the simulated example in Sun, Carroll and Li 2009 and they studied the case of random design varying coefficient panel data models. It is easy to check that 3. becomes the random effects case when θ 0 = 0 in 3.2. Otherwise θ 0 =, 2, model 3. is a fixed effects panel data model. We next compare three classes of local linear estimation methods: the averaged local linear estimates ALLE, local linear dummy variable estimates LLDVE and the ordinary local linear estimates OLLE ignoring the 0

fixed effects. We compare the average mean squared errors AMSE of the three estimators. The AMSE of an estimator f of f is defined as AMSE f = R T 2 f R T r t/t ft/t, r= where f r denotes the estimate of f in the rth replication and R is the number of replications which is chosen as R = 500 in our simulation. estimator of β is AMSE β = R R r= T T 2 β r t/t βt/t. Similarly, the AMSE of an The simulation results for estimates of f and β with different values of N and T N, T = 0, 20, 30 and with both random effects θ 0 = 0 and fixed effects θ 0 =, 2 are summarized in Tables 3.a c and Tables 3.2a c. Tables 3.a c contain the AMSEs and their standard deviations in parentheses of the three estimates of f for the three cases of θ 0 = 0, and 2, and those of β are listed in Tables 3.2a c. From Tables 3. and 3.2, we can see that the performances of all the three estimates of the trend function f are satisfactory even when both N and T are small. And all of the three estimates of f improve as either T or N increases. However, when we observe the simulation results for estimates of β, we find that when keep T fixed, the performance of the average local linear estimate of the coefficient function β does not necessarily improve as N increases. But as T increases, its performance improves significantly. In contrast, the local linear dummy variable estimate and the ordinary local linear estimate perform better and better when either N or T increases. This confirms the asymptotic theory given in Section 2: the rate of convergence of /2 for the averaged local linear estimate of β is slower than the rate of convergence of N /2 for the local linear dummy variable estimate. However, the averaged local linear estimate and the local linear dummy variable estimate of f have the same rate of convergence of N /2. The simulation results also reveal that in the random effects case θ 0 = 0, the performances of the local linear dummy variable estimate and the ordinary local linear estimator are comparable with the local linear dummy variable estimate performing slightly better than the ordinary local linear estimate. As θ 0 increases, the performance of the ordinary local linear estimate becomes worse, while the performance of the dummy variable local linear dummy variable estimate is not influenced by the increase of θ 0 and remains quite stable and satisfactory.

Table 3.a AMSE for estimators of f with random effects θ 0 = 0 AMSE for estimators of f N\T 0 20 30 0 ALLE 0.008 0.345 0.0482 0.0458 0.0359 0.0300 LLDVE 0.0669 0.0652 0.0490 0.0362 0.0338 0.0267 OLLE 0.0722 0.0732 0.0523 0.0387 0.0349 0.0282 20 ALLE 0.0523 0.0587 0.0264 0.0267 0.024 0.08 LLDVE 0.0357 0.0290 0.0253 0.089 0.0272 0.068 OLLE 0.0378 0.0305 0.0262 0.095 0.0283 0.073 30 ALLE 0.0389 0.0478 0.022 0.097 0.033 0.008 LLDVE 0.0244 0.085 0.0230 0.039 0.042 0.0096 OLLE 0.025 0.088 0.0240 0.046 0.045 0.0099 Table 3.b AMSE for estimators of f with fixed effects θ 0 = AMSE for estimators of f N\T 0 20 30 0 ALLE 0.073 0.33 0.0478 0.0435 0.0350 0.0274 LLDVE 0.0750 0.0630 0.0440 0.0349 0.0322 0.0229 OLLE 0.0928 0.082 0.0489 0.0386 0.0334 0.0239 20 ALLE 0.055 0.0777 0.0386 0.0370 0.0263 0.020 LLDVE 0.0360 0.0294 0.0337 0.099 0.020 0.049 OLLE 0.042 0.033 0.0354 0.023 0.027 0.053 30 ALLE 0.0362 0.0372 0.0222 0.02 0.0224 0.092 LLDVE 0.026 0.0200 0.078 0.027 0.036 0.0095 OLLE 0.0292 0.0226 0.084 0.030 0.038 0.0096 2

Table 3.c AMSE for estimators of f with fixed effects θ 0 = 2 AMSE for estimators of f N\T 0 20 30 0 ALLE 0.099 0.24 0.0505 0.0465 0.0367 0.030 LLDVE 0.0703 0.0573 0.0654 0.0379 0.0335 0.0289 OLLE 0.00 0.559 0.089 0.0590 0.0358 0.0308 20 ALLE 0.0529 0.060 0.0290 0.026 0.0274 0.0228 LLDVE 0.036 0.0282 0.0342 0.0209 0.068 0.023 OLLE 0.048 0.0402 0.0383 0.0236 0.079 0.04 30 ALLE 0.047 0.0589 0.090 0.064 0.033 0.03 LLDVE 0.0262 0.020 0.079 0.023 0.036 0.0090 OLLE 0.0353 0.0305 0.097 0.035 0.045 0.0094 Table 3.2a AMSE for estimators of β with random effects θ 0 = 0 AMSE for estimators of β N\T 0 20 30 0 ALLE 0.7644.2696 0.37 0.369 0.2474 0.827 LLDVE 0.0852 0.0394 0.0292 0.022 0.0276 0.083 OLLE 0.520 0.425 0.084 0.0775 0.0629 0.0577 20 ALLE 0.8655.88 0.3220 0.3303 0.3006 0.3583 LLDVE 0.0280 0.090 0.069 0.022 0.039 0.0079 OLLE 0.0784 0.0783 0.0467 0.0382 0.0508 0.0463 30 ALLE.026.3883 0.3972 0.3986 0.2 0.966 LLDVE 0.095 0.033 0.037 0.0072 0.008 0.0050 OLLE 0.05 0.0450 0.0426 0.0358 0.0248 0.024 3

Table 3.2b AMSE for estimators of β with fixed effects θ 0 = AMSE for estimators of β N\T 0 20 30 0 ALLE 0.7638.043 0.3235 0.3252 0.2457 0.70 LLDVE 0.0543 0.0428 0.0376 0.0244 0.0287 0.088 OLLE 0.2029 0.2430 0.0942 0.0965 0.066 0.0649 20 ALLE 0.9463.3754 0.540 0.6736 0.2929 0.3756 LLDVE 0.029 0.0203 0.089 0.007 0.023 0.008 OLLE 0.237 0.25 0.0790 0.0588 0.0388 0.0337 30 ALLE 0.8382 0.9442 0.3929 0.4229 0.3202 0.3308 LLDVE 0.098 0.030 0.06 0.0074 0.0083 0.0054 OLLE 0.044 0.0906 0.0442 0.0407 0.027 0.0220 Table 3.2c AMSE for estimators of β with fixed effects θ 0 = 2 AMSE for estimators of β N\T 0 20 30 0 ALLE 0.8282.3952 0.384 0.306 0.806 0.707 LLDVE 0.0579 0.0467 0.0378 0.0233 0.036 0.0206 OLLE 0.429 0.469 0.246 0.2895 0.077 0.0686 20 ALLE 0.8852.3935 0.3420 0.3886 0.3376 0.3589 LLDVE 0.0276 0.09 0.088 0.008 0.064 0.000 OLLE 0.3279 0.265 0.535 0.24 0.0548 0.0572 30 ALLE 0.906.3650 0.3009 0.30 0.2032 0.84 LLDVE 0.0202 0.038 0.02 0.0072 0.0086 0.0057 OLLE 0.298 0.268 0.02 0.0846 0.0525 0.047 4

4. Real Data Analysis We analyze a climate data set from the UK met office web site: http://www.metoffice.gov. uk/climate/uk/stationdata/. The data set contains monthly mean maximum temperatures in Celsius degrees, mean minimum temperatures in Celsius degrees, days of air frost in days, total rainfall in millimeters and total sunshine duration in hours from 37 stations covering UK. Here we study the common trend in the mean maximum temperature series and the relationship of the mean maximum temperatures with total rainfall and total sunshine duration during the period of January/999 to December/2008. Data from 6 stations are selected according to data availability records start at different time for different stations and data for some part of the period Jan/999 Dec/2008 are missing at some stations. Denote by Y it the log transformed mean maximum temperature in t-th month in station i, and it, and it,2 the log transformed total rainfall and total sunshine duration, respectively. Let it = it,, it,2. As there exists seasonality in both Y and, we first remove the seasonality from the observations of Y and. We also remove the trends in, as we assume in the paper that is stationary. Let Y it be the seasonally adjusted values of Y it and it be the seasonally adjusted and detrended values of it. We then assume the following model: Y it = ft/t + β t/t it, + β 2 t/t it,2 + α i + e it, i N, t T 4. with α i being the station specific effects and i = 6 and T = 20. Our purpose is to estimate f, β and β 2 to get the common trend in the mean maximum temperature series from the 6 stations and to see the relationship between the mean maximum temperatures with the total rainfall and total sunshine duration. As evidenced by the simulation results in Section 3, the local linear dummy variable method generally outperforms the averaged local linear method. Hence, we use the local linear dummy variable method to estimate f, β and β 2 in 4.. The estimated curves are given in Figure 4.a c. The estimated trend curve in Figure 4. a shows that from the beginning of 999 to the end of 2000, there is a slight decrease in the monthly mean maximum temperatures. Thereafter, there is an overall upward trend from the beginning of 200 to the end of 2006. Then from the beginning of 2007 to the end of 2008, there is a drop in the maximum temperatures. On the other hand, the estimated coefficient functions in Figure 4. b and c do exhibit certain variance over time. And the coefficient function for the total sunshine 5

2.7 a 2.6 2.5 999 2000 200 2002 2003 2004 2005 2006 2007 2008 2009 0. b 0!0. 999 2000 200 2002 2003 2004 2005 2006 2007 2008 2009 0.2 c 0!0.2 999 2000 200 2002 2003 2004 2005 2006 2007 2008 2009 Figure 4.. The local linear dummy variable estimates of a f, b β, c β 2. duration β 2 is generally above zero, which indicates that there is an overall positive relationship between the mean maximum temperatures and the total sunshine duration. In other words, longer sunshine duration tends to result in higher maximum temperatures. 5. Conclusions and Discussion We have considered a nonparametric time varying coefficient panel data model with fixed effects. Two classes of nonparametric estimates have been proposed and studied. The first one is based on an averaged local linear estimation method while the second one relies on a local linear dummy variable approach. Asymptotic distributions of the proposed estimates have been established with the second estimation method providing a faster rate of convergence than the first one. Both the simulation results and real data analysis have been provided to illustrate the asymptotic theory and support the finite sample performance of the proposed estimates. The asymptotic distribution of the averaged local linear and dummy variable estimators for the case of E it 0 d can be obtained as corollaries of Theorems 2. and 2.2. Note 6

that. can be rewritten as Y it = f t + it β t + α i + e it = f t + it β t + α i + e it, 5. where f t = f t + E it β t and it = it E it. By Theorems 2. and 2.2, we can establish the asymptotic distributions of the averaged local linear estimator and dummy variable estimator of f τ, β τ. Then by the Cramér Wold device, we can further obtain asymptotic distributions for the two estimators of fτ, β τ by noting that f τ, β τ = E it 0 d I d There are some limitations in this paper. fτ, β τ. The first one is the assumption on cross sectional independence. One future topic is to extend the discussion of Robinson 2008 to model. under cross sectional dependence. The second one is that there is no endogeneity between {e it } and { it }. Another future topic is to accommodate such endogeneity in a general model. 6. Acknowledgments This project was supported by the Australian Research Council Discovery Grant Program under Grant Number: DP0879088. Appendix A: Proofs of the main results In this section, we provide the proofs for the asymptotic results in Section 2. Proof of Theorem 2.. We only consider the case when T and N tend to infinity simultaneously. The proof for the case when T tends to infinity and N is fixed is similar and we therefore omit the details here. Note that β τ = [ [I d+, 0 d+ ] M τw τmτ] M τw τy = [ [I d+, 0 d+ ] M τw τmτ] M τw τ B, β + e A. and β τ β τ = { [ } [I d+, 0 d+ ] M τw τmτ] M τw τ [f + B, β] β τ [ + [I d+, 0 d+ ] M τw τmτ] M τw τe =: I NT + I NT 2. A.2 7

Note that 2.7 and 2.8 can be derived from 2.6. Hence, we only provide the detailed proof of 2.6, which follows from the following three propositions. Proposition A.. Suppose that A, A2 i, ii and A4 are satisfied. Then, as T, N simultaneously, PNT M τw τmτp NT Λ µ = o P, A.3 where P NT = diag, N I d,, N I d, Λ µ = diag µ 0, µ 2 and is as defined in Section 2. Proof. Observe that P NT M τw τmτp NT = T T t t K T t τt t t where t = PNT, t and P NT = diag We need only to prove that T t τt, K t τt N I d. T t t t t t τt t τt K t τt 2 K t τt, t τt t t K = µ 0 + o P, A.4 since the proofs for the other components in PNT M τw τmτp NT are analogous. For simplicity, define Q t = t t, K t = K and it = P NT, it. By A2iii, we have N 2 E N it i= N it i= = t τt 0 d 0 d Σ =. By A and the above equation, it is easy to show that as T, N, T E [Q t K t ] = T N N K N 2 t E = = µ 0 it i= N T K t E N 2 + O. it i= it i= N it i= A.5 T We then consider the variance of Q t K t. Note that T T T Var Q t K t = Var Q t K t + Cov Q t K t, Q s K s s t =: Π NT + Π NT 2. 8

It follows from A2 iii that [ sup E t 22+γ] C 2+γ, t T A.6 for some 0 γ δ and some constant C [ > 0, where δ is as defined in A2, and we have used an existing inequality of the form E N 22+γ] [ N 2+γ E t 22+δ]. i= it Furthermore, by A and A.6 with γ = 0, we have Π NT = T Kt 2 VarQ t = T C T 2 h 2 Meanwhile, for Π NT 2, notice that T Kt 2 = O. Kt 2 Var t t A.7 Π NT 2 = T Cov Q t K t, Q s K s + T 0< s t q T = Π NT 3 + Π NT 4, s t >q T Cov Q t K t, Q s K s A.8 where q T and q T = o. As { i, i } is cross sectional independent and for each i, and { it, t } is stationary and α mixing, both { t } and {Q t } are still stationary and α mixing with the same mixing coefficient α k. As a consequence, we are able to employ some existing results for α mixing processes. By A, A2 iii, A.6 and Lemma A. of Gao 2007, we have Π NT 4 C T K t C T K T 2 h 2 t α δ/2+δ k k>q T s t >q T Cov Q t, Q s = o, A.9 as q T and α k = Ok τ for τ > δ + 2/δ. Since K is Lipschitz continuous by A, we have Standard calculation gives K t K s C q T when t s q T. A.0 Π NT 3 = T Kt 2 T K t Cov Q t, Q s 0< s t q T K t K s Cov Q t, Q s 0< s t q T =: Π NT 5 + Π NT 6. A. 9

By A.0 and the fact that CovQ, Q t = O 2, we have t=2 Π NT 6 Cq T T 3 h 3 Furthermore, similarly to the proof of A.7, we can show that Π NT 5 C T 2 h 2 T T K t = o. A.2 Kt 2 = O. A.3 In view of A.7 A.9 and A. A.3, we have as T Var Q t K t = O. A.4 By A.5 and A.4, we have shown that A.4 holds. The proof of Proposition A. is thus completed. Proposition A.2. Suppose that A, A2 iii, A3 and A4 are satisfied. Then, as T, N, I NT = 2 µ 2β τh 2 + o P h 2. Proof. By A3 and Taylor expansion, we have β t/t β τ + β τt/t τ + 2 β τt/t τ 2. A.5 Equation A.5 then follows immediately from the definition of I NT, the above equation and Proposition A.. Proposition A.3. Suppose that A, A2 and A4 are satisfied. Then, as T, N, d D NT I NT 2 N 0 d+, Λ, A.6 where D NT and Λ are defined as in Theorem 2.. Proof. By A2 iiii, it is easy to check that E I NT 2 = 0 d+. A.7 We then calculate the variance of D NT I NT 2. Noticing that N 2 E t t e 2 t = N 2 N N 4 E N N it jt N e kt = N 2 = Σ σ 2 e i= N N i= k= j= k= E it it Ee 2 kt l= e lt 20

and N 2 Cov t e t, s e s = E it is E e it e is = c s t c e s t, we have as T, N, N 2 T T Var t e t t= c tc e t. A.8 T By A.8, the fact that Kt 2 = ν 0 + o and following the proof of A.4, we have N 2 T Var K t t e t = ν 0 c tc e t + o. A.9 t= and Analogously, we have N Var T K t e t = ν 0 σe 2 + 2 c e t + o A.20 N 3/2 T E Kt 2 t e 2 t = 0 d. A.2 Furthermore, by A.9 A.2 and Proposition A. and noting that D NT I NT 2 = D NT [I d+, 0 d+ ] P NT [P NT M τw τmτp NT ] P NT M τw τe, we have Var D NT I NT 2 = Λ + o. A.22 By equations A2, A.7, A.22 and Theorem 2.2 of Peligrad and Utev 997, we have shown that A.6 holds. Proof of Theorem 2.2. Note that [ β Mτ] τ β τ = [I d+, 0 d+ ] M τ W τ M τ W τỹ β τ { [ Mτ] = [I d+, 0 d+ ] M τ W τ M τ W τ f + B, } β β τ + [I d+, 0 d+ ] [ M τ W τ Mτ] M τ W τ D α + [I d+, 0 d+ ] [ M τ W τ Mτ] M τ W τẽ =: Ξ NT + Ξ NT 2 + Ξ NT 3. By the definition of W τ, we have A.23 W τ = K τ W τ Kτ = W τ W τ D D W τ D D W τ, 2

which implies that Ξ NT 2 [ = [I d+, 0 d+ ] M τ W τ Mτ ] M τ W τ D α [ [I d+, 0 d+ ] M τ W τ Mτ ] M τ W τ D D W τ D D W τ D α [ = [I d+, 0 d+ ] M τ W τ Mτ ] M τ W τ D α [ [I d+, 0 d+ ] M τ W τ Mτ ] M τ W τ D α 0 d+. Therefore, in order to establish the asymptotic distribution 2.4 in Theorem 2.2, we need only to consider Ξ NT and Ξ NT 3. The following propositions establish the asymptotic properties of Ξ NT and Ξ NT 3, which lead to 2.4. And 2.4 implies 2.5 and 2.6. Theorem 2.2 holds for both N being fixed and N tending to infinity, but we only give the proof for the case of N tending to infinity simultaneously with T. The proof for the case of fixed N is similar. Proposition A.4. Suppose that A, A2 i, ii and A4 are satisfied. Then, as T, N simultaneously, where Λ µ is defined as in Proposition A.. Proof. By the definition of W τ, we have N M τ W τ Mτ Λ µ = o P, A.24 M τ W τ Mτ = M τ W τ Mτ M τ W τ D D W τ D D W τ Mτ. Following the proof of Proposition A., we have N M τ W τ Mτ = Λ µ + o P. A.25 A.26 In view of A.25, we need only to prove M τ W τ D D W τ D D W τ Mτ = o P N. A.27 To do so, we first consider the term D W τ D. Define Z T = T 22 K t T τ. By the

definition of D and standard calculation, we have 2Z T Z T Z T Z T Z T Z T D W τ D =... =... Z T Z T 2Z T Z T Z T Z T + diag Z T,, Z T. Letting A = diag Z T,, Z T, B = Z T,, Z T, C = and D = ZT,, Z T, then D W τ D = A + BCD. Noticing that A + BCD = A A BDA B + C DA, A.28 which can be found in Poirier 995, we have Z T D W τ D NZ = T Z T. A.29... Z T Define Z T i = T K t T τ it and Z T i = T t T τ K t T τ it. By standard arguments, we have 0 0 0 M τ W τ D Z T 2 Z T Z T 3 Z T Z T N Z T =. 0 0 0 Z T 2 Z T Z T 3 Z T Z T N Z T which, together with A.29, implies M τ W τ D D W τ D D W τ Mτ = where 0 0 0 0 0 A NT 0 C NT 0 0 0 0 0 D NT 0 B NT, A.30 N A NT = A NT kz T k Z T, k=2 23

N B NT = B NT k Z T k Z T, C NT = D NT = k=2 N k=2 N k=2 A NT k Z T k Z T, B NT kz T k Z T, A NT k = Z T k N B NT k = Z T k N N j=, k N j=, k Z T j = Z T k Z T Z T j = Z T k Z T N Z T i and i= N Z T i. By the weak consistency of the nonparametric estimator in the fixed design case, we have for each i, i= Z T = µ 0 + o P, Z T i = o P, Z T i = o P. A.3 By A.30 and A.3, we have shown that A.27 holds. The proof of Proposition A.4 is completed. Proposition A.5. Suppose that A, A2 iii, A3 and A4 are satisfied. Then, as T, N simultaneously, Ξ NT = 2 µ 2β τh 2 + o P h 2 = bτh 2 + o P. A.32 Proof. By A3, Taylor expansion of β at τ and Proposition A.4, the proof of A.32 follows. Proposition A.6. Suppose that A, A2 and A4 are satisfied. Then, we have d NΞNT 3 N 0 d+, Λ A.33 as T, N simultaneously. Proof. By Propositions A.4 and A.5, to prove A.33, it suffices to prove M τ W τẽ d N 0 2d+2, Λ ν Λ, N A.34 where Λ ν = diag ν 0, ν 2. 24

By the definition of W τ, we have M τ W τẽ = M τ W τẽ M τ W τ D D W τ D D W τẽ. A.35 By A.35, it is enough to show that M τ W τẽ d N 0 2d+2, Λ ν Λ. N A.36 and M τ W τ D D W τ D D W τẽ = o P N. A.37 Define for j = 0,. t T τ L j = To prove A.36, we need only to prove By A2, we have N N T i= t T τ L j it t T τ j t T τ K e it N Var N T t T τ L j i= it d N 0 d+, ν 2j Λ. e it = ν 2j Λ + o. A.38 A.39 Denote {ξ j, j =,, NT } = {, e,, T, e T, 2, e 2,, NT, e NT }, then {ξ} is stationary and mixng with α mixing coefficient α k or 0, k < T, α k = 0, k T. By A.39 and Theorem 2.2 of Peligrad and Utev 997, equation A.38 holds. We then turn to the proof of A.37. Let e T i = T K t T τ e it. Following the proof of A.30, we have D W τẽ = e T 2 e T,, e T N e T. A.40 By A.29, A.30 and A.40, we have M τ W τ D D W τ D D W τẽ = 25 0, U NT, 0, V NT, A.4

where and U NT = V NT = N A NT ke T k e T = k=2 N B NT ke T k e T = k=2 N A NT ke T k k= N B NT ke T k, in which A NT k and B NT k are defined in the proof of Proposition A.4. By A2 iiii, we have k= E UNT 2 = on and E VNT 2 = on, which implies U NT = o P N and V NT = o P N. A.42 In view of A.4 and A.42, A.37 holds. The proof of Proposition A.6 is therefore completed. References Arellano, M. 2003. Panel Data Econometrics. Oxford University Press, Oxford. Atak, A., Linton, O., iao, Z. 2009. Kingdom. Manuscript. A semiparametric panel model for climate change in the United Baltagi, B. H. 995. Econometrics Analysis of Panel Data. John Wiley, New York. Cai, Z. 2007. Trending time varying coefficient time series models with serially correlated errors. Journal of Econometrics 36, 63-88. Cai, Z., Li, Q. 2008. Nonparametric estimation of varying coefficient dynamic panel data models. Econometric Theory 24, 32-342. Fan, J., Gijbels, I. 996. Local Polynomial Modelling and Its Applications. Chapman and Hall, London. Fan, J., Li, R. 2004. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association 99, 70 723. Fan, J., Yao, Q. 2003. Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York. Gao, J. 2007. Nonlinear Time Series: Semiparametric and Nonparametric Methods. Chapman & Hall/CRC, London. Gao, J., Hawthorne, K. 2006. Semiparametric estimation and testing of the trend of temperature series. Econometrics Journal 9, 332-355. Henderson, D., Carroll, R., Li, Q. 2008. Nonparametric estimation and testing of fixed effects panel data models. Journal of Econometrics 44, 257-275. 26

Hjellvik, V., Chen, R., Tjøstheim, D. 2004. Nonparametric estimation and testing in panels of intercorrelated time series. Journal of Time Series Analysis 25, 83-872. Hsiao, C. 2003. Analysis of Panel Data. Cambridge University Press, Cambridge. Li, Q., Racine, J. 2007. Nonparametric Econometrics: Theory and Practice. Princeton University Press, Princeton. Lin, D., Ying, Z. 200. Semiparametric and nonparametric regression analysis of longitudinal data with discussion. Journal of the American Statistical Association 96, 03-26. Mammen, E., Støve, B., Tjøstheim, D. 2009. Nonparametric additive models for panels of time series. Econometric Theory 25, 442-48. Peligrad, M., Utev, S. 997. Central limit theorems for linear processes. Annals of Probability 25, 443-456. Phillips, P. C. B. 200. Trending time series and macroeconomic activity: some present and future challengers. Journal of Econometrics 00, 2-27. Poirier, D. J. 995. Intermediate Statistics and Econometrics: a Comparative Approach. The MIT Press. Robinson, P. 989. Nonparametric estimation of time varying parameters. Statistical Analysis and Forecasting of Economic Structural Change. Hackl, P. Ed.. Springer, Berlin, pp. 64-253. Robinson, P. 2008. Nonparametric trending regression with cross-sectional dependence. Manuscript. Su, L., Ullah, A. 2006. Profile likelihood estimation of partially linear panel data models with fixed effects. Economics Letters 92, 75-8. Sun, Y., Carroll, R. J., Li, D. 2009. Semiparametric estimation of fixed effects panel data varying coefficient models. Advances in Econometrics 25, 0-29. Ullah, A., Roy, N. 998. Nonparametric and semiparametric econometrics of panel data. Handbook of Applied Economics and Statistics, Ullah, A., Giles, D.E.A. Eds.. Marcel Dekker, New York, pp. 579-604. Zhang, W., Fan, J., Sun, Y. 2009. A semiparametric model for cluster data. Annals of Statistics 37, 2377 2408. 27