Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics 1

By Jiti Gao 2 and Maxwell King 3

1 The first author would like to thank Song Xi Chen, Oliver Linton and Dag Tjøstheim for some constructive discussions. The authors also acknowledge comments from seminar participants at University of Western Australia, Monash University, Catholique University de Louvain in Belgium, London School of Economics, Cornell University and Yale University, in particular, Donald Andrews, Irène Gijbels, Yongmiao Hong, Peter Phillips, Peter Robinson, Howell Tong and Qiwei Yao. Thanks also go to the Australian Research Council for its financial support.

2 Jiti Gao is from the Department of Statistics, School of Mathematics and Statistics, The University of Western Australia, Crawley WA 6009, Australia. Email: jiti@maths.uwa.edu.au

3 Maxwell King is with the Faculty of Business and Economics, Monash University, Melbourne, Vic. 3168, Australia. Email: max.king@buseco.monash.edu.au

Abstract

We propose a simultaneous model specification procedure for the conditional mean and conditional variance in nonparametric and semiparametric time series econometric models. An adaptive and optimal model specification test procedure is then constructed and its asymptotic properties are investigated. The main results extend and generalize existing results for testing the mean of a fixed design nonparametric regression model to the testing of both the conditional mean and conditional variance of a class of nonparametric and semiparametric time series econometric models. In addition, we develop computer intensive bootstrap simulation procedures for the selection of an interval of bandwidth parameters as well as the choice of asymptotic critical values. An example is given to show how to implement the proposed simultaneous model specification procedure in practice. Moreover, finite sample studies are presented to support the proposed procedure.

KEYWORDS: Continuous time model, diffusion process, kernel estimation, nonparametric estimation, optimal test, semiparametric method, time series econometrics.

1. Introduction and Motivation

Consider a continuous time diffusion process of the form
$$dr_t = \mu(r_t)\,dt + \sigma(r_t)\,dB_t,$$
where $\mu(\cdot)$ and $\sigma(\cdot) > 0$ are respectively the univariate drift and volatility functions of the process, and $B_t$ is standard Brownian motion. Recently, Aït-Sahalia (1996a) developed a simple methodology for testing both the drift and the diffusion. Using the forward Kolmogorov equation, the author derived a relationship between the marginal density of $r_t$ and the pair $(\mu, \sigma)$. Then, instead of testing both the drift and the volatility simultaneously, the author considered testing whether the density function belongs to a parametric family of density functions. The approach has the advantage of using discrete data without discretizing the continuous time model (see also Aït-Sahalia 1996b). The use of the marginal density is computationally convenient and can detect a wide range of alternatives.

For a discrete time series regression model, however, it is difficult to establish a corresponding relationship between the marginal density of the time series and the pair formed by the conditional mean and the conditional variance of the model. Therefore, specifying the marginal density alone may not be adequate for the specification of both the conditional mean and the conditional variance of a general time series regression model. This motivates the discussion of a simultaneous model specification for both the conditional mean and the conditional variance of a class of time series econometric models of the form
$$Y_t = g(X_t) + \sigma(X_t)\, e_t, \qquad t = 1, 2, \ldots, T, \tag{1.1}$$
where both $g(\cdot)$ and $\sigma(\cdot) > 0$ are unknown functions defined over $R^d$, the data $\{(X_t, Y_t) : t \ge 1\}$ are either independent observations or dependent time series, $\{e_t\}$ is a sequence of independent and identically distributed (i.i.d.) errors with mean zero and variance one, and $T$ is the number of observations.

In recent years, nonparametric and semiparametric techniques have been used to construct model specification tests for the mean function of model (1.1). Interest focuses on tests for a parametric form versus a nonparametric form, tests for a semiparametric (partially linear or single-index) form against a nonparametric form, and tests for the significance of a subset of the nonparametric regressors.
For example, Härdle and Mammen (1993) have developed consistent tests for a parametric specification by employing the kernel regression estimation technique; Hong and White (1995) and others have applied the method of series estimation to consistent testing for a parametric regression model; Eubank and Spiegelman (1990), Eubank and Hart (1992), Wooldridge (1992), Yatchew (1992), Gozalo (1993), Samarov (1993), Whang and Andrews (1993), Horowitz and Härdle (1994), Hjellvik and Tjøstheim (1995), Fan and Li (1996), Jayasuriya (1996), Zheng (1996), Hjellvik, Yao and Tjøstheim (1998), Li and Wang (1998), Chen and Fan (1999), Li (1999), Gao and King (2001), Chen, Härdle and Li (2003), and others have developed consistent tests for a semiparametric model (partially linear or single-index) versus a nonparametric alternative for either the i.i.d. case or the time series case. Other related studies include Robinson (1988, 1989), Andrews (1997), Li and Hsiao (1998), Whang (2000), Aït-Sahalia, Bickel and Stoker (2001), Fan and Huang (2001), Gozalo and Linton (2001), Gao, Tong and Wolff (2002), Hong and Lee (2002), and Sperlich, Tjøstheim and Yang (2002).

Recently, Horowitz and Spokoiny (HS) (2001) have developed a new test of a parametric model of a mean function against a nonparametric alternative. The test adapts to the unknown smoothness of the alternative model and is uniformly consistent against alternatives whose distance from the parametric model converges to zero at the fastest possible rate. This rate is slower than $T^{-1/2}$, where $T$ is the number of observations. Another feature of the HS test is that one can avoid choosing a particular bandwidth parameter for testing purposes when using kernel based test statistics. Existing studies consider using an estimation based optimal value 4 for fixing the bandwidth parameter involved. This choice may not be justified in either theory or practice, as estimation based optimal values may not be optimal for testing purposes. For a kernel based testing problem, as suggested in the HS paper, one needs to choose the bandwidth parameter so that the power of the resulting test is maximized at (or near) the chosen bandwidth. The HS paper has successfully used an interval of bandwidth parameters for constructing an adaptive and optimal test for testing the mean of a fixed design nonparametric regression model.

To the best of our knowledge, however, the problem of testing both the conditional mean and the conditional variance of model (1.1) simultaneously has attracted less attention. Recently, Chen and Gao (2003) constructed an empirical likelihood (EL) based test statistic to test both the mean and the variance of a nonparametric regression model, and proposed a bootstrap simulation procedure for the implementation of the proposed test. The current paper proposes two novel classes of test statistics and constructs an adaptive and optimal test. The proposed adaptive test is consistent against some local alternatives with an optimal rate. In addition, this paper develops computer intensive simulation procedures for the choice of kernel bandwidth parameters and asymptotic critical values. In summary, our approach has the following features:

(i) It proposes simultaneous test procedures for testing both the conditional mean and the conditional variance of a class of nonparametric time series econometric models for both independent and strongly dependent error processes. Sound and novel theoretical properties for the simultaneous test procedures are established.

(ii) It extends and generalizes the results of Horowitz and Spokoiny (2001) for testing the mean of fixed design nonparametric regression to the simultaneous testing of both the conditional mean and the conditional variance of a class of nonparametric and semiparametric time series econometric models.

(iii) It is applicable to a wide variety of models, which include general nonparametric regression models for both the i.i.d. case and the time series case. The test procedure is also applicable to continuous time model specification. Both the methodology and theoretical techniques developed in this paper can be used to improve economic and financial model building and forecasting.

4 Usually, a cross validation selection procedure is used for choosing an optimum bandwidth parameter to ensure that the average mean squared error of the resulting estimator is minimized. See Härdle, Liang and Gao (2000, §2.1.3) for example.

The rest of the paper is organised as follows. Section 2 proposes two classes of model specification test statistics. An adaptive test procedure is discussed in Section 3 and some asymptotic consistency results are established. Section 4 provides an application of the adaptive test procedure to a discrete nonlinear time series model. Section 5 concludes the paper with some remarks on extensions. Mathematical details are relegated to Appendices A and B.

2. Model specification tests

Throughout this section, we consider model (1.1). For convenience, let $m_1(x) = E(Y_t \mid X_t = x) = g(x)$ and $m_2(x) = \mathrm{var}(Y_t \mid X_t = x) = \sigma^2(x)$ for $x \in S \subset R^d$. Let $m(x) = (m_1(x), m_2(x))^\tau$ be a bivariate vector and let $\{m_\theta(\cdot) = (m_{1\theta}(\cdot), m_{2\theta}(\cdot))^\tau : \theta \in \Theta\}$ be a parametric model that specifies parametric forms for the conditional mean and conditional variance of $Y_t$ given $X_t$, where $\theta \in R^q$ is an unknown parameter taking a value in the parameter space $\Theta \subset R^q$. The interest of this paper is to test, for some $\theta \in \Theta$,
$$H_0 : m_1(x) = m_{1\theta}(x) \ \text{ and } \ m_2(x) = m_{2\theta}(x) \tag{2.1}$$
against
$$H_1 : m_1(x) = m_{1\theta}(x) + C_{1T}\Delta_{1T}(x) \ \text{ and } \ m_2(x) = m_{2\theta}(x) + C_{2T}\Delta_{2T}(x),$$
where both $\Delta_{1T}(x)$ and $\Delta_{2T}(x)$ are continuous and bounded functions over $R^d$. Note that the above hypotheses are equivalent to
$$H_0 : m(x) = m_\theta(x) \quad \text{versus} \quad H_1 : m(x) = m_\theta(x) + C_T \Delta_T(x)$$
for all $x \in S$, where $C_T = (C_{1T}, C_{2T})^\tau$ is a vector of two non-random sequences tending to zero as $T \to \infty$, $\Delta_T(x) = (\Delta_{1T}(x), \Delta_{2T}(x))^\tau$, and the product $C_T \Delta_T(x)$ is taken componentwise. This contains the parametric case where $\Delta_T(\cdot) \equiv 0$. Let $\theta_0 \in \Theta$ denote the true value of $\theta$ if $H_0$ is true. That is, $m(x) = m_{\theta_0}(x)$ for all $x \in S$ if $H_0$ is true.

We first introduce a nonparametric kernel estimator for $m(\cdot)$. Let $K$ be a $d$-dimensional bounded probability density function with compact support on the $d$-dimensional cube $[-1, 1]^d$. Assume that $K(\cdot)$ satisfies the moment conditions
$$\int u K(u)\,du = 0 \quad \text{and} \quad \int u u^\tau K(u)\,du = \sigma_K^2 I_d,$$
where $I_d$ is the $d$-dimensional identity matrix and $\sigma_K^2$ is a positive constant. Let $h$ be a smoothing bandwidth satisfying $h \to 0$ and $T h^d \to \infty$ as $T \to \infty$.

Define $K_h(u) = h^{-d} K(u/h)$. The Nadaraya-Watson (NW) estimators of $m_l(x)$ for $l = 1, 2$ are defined by
$$\tilde m_1(x) = \frac{\sum_{t=1}^T K_h(x - X_t)\, Y_t}{\sum_{t=1}^T K_h(x - X_t)} \quad \text{and} \quad \tilde m_2(x) = \frac{\sum_{t=1}^T K_h(x - X_t)\,(Y_t - \tilde m_1(X_t))^2}{\sum_{t=1}^T K_h(x - X_t)}. \tag{2.2}$$
This paper uses a single smoothing parameter $h$. One can use two different bandwidth parameters $h_1$ and $h_2$ for $l = 1$ and $l = 2$ respectively, but the representation for that case is more complicated. See Chen and Gao (2003).

Similarly, for the parametric models, one can estimate $m_{l,\theta}$ by
$$\tilde m_{l,\tilde\theta}(x) = \frac{\sum_{t=1}^T K_h(x - X_t)\, m_{l,\tilde\theta}(X_t)}{\sum_{t=1}^T K_h(x - X_t)} \tag{2.3}$$
for $l = 1, 2$, where $\tilde\theta$ is a consistent estimator of $\theta$ under $H_0$. Let $\tilde m(x) = (\tilde m_1(x), \tilde m_2(x))^\tau$ and $\tilde m_{\tilde\theta}(x) = (\tilde m_{1,\tilde\theta}(x), \tilde m_{2,\tilde\theta}(x))^\tau$.

The test statistics we are going to consider are based on the difference between $\tilde m_{\tilde\theta}(\cdot)$ and $\tilde m(\cdot)$, rather than directly between $m_{\tilde\theta}(\cdot)$ and $\tilde m(\cdot)$. Because (2.2) and (2.3) apply the same kernel smoothing to the data and to the fitted parametric model, the bias associated with the nonparametric estimation cancels in the difference. The local linear estimator can also be used to replace the NW estimator in estimating $m(\cdot)$. As we use $\tilde m$ and $\tilde m_{\tilde\theta}$ to construct each test statistic, however, the possible bias associated with the NW estimator is not an issue here. In addition, the NW estimator has a simpler analytic form. Extension of our approach to a test procedure based on the local linear estimator can be discussed in a similar fashion, although the proof will be more technical.

We now introduce the following notation. Let
$$\epsilon_t = Y_t - m_1(X_t), \quad \eta_t = \epsilon_t^2 - m_2(X_t), \quad \sigma_{ij}(x) = E\left[\epsilon_t^i \eta_t^j \mid X_t = x\right]$$
for $i, j = 0, 1, 2$, and $s_0(x) = |\Sigma_0(x)|^{-1}$, where $|A|$ is the determinant of a matrix $A$ and
$$\Sigma_0(x) = \begin{pmatrix} \sigma_{20}(x) & \sigma_{11}(x) \\ \sigma_{11}(x) & \sigma_{02}(x) \end{pmatrix}.$$
Let $f(x)$ be the marginal density of $\{X_t\}$. We assume without loss of generality that $R(K) = \int K^2(x)\,dx = 1$. Let $\Sigma(x) = f^{-1}(x)\,\Sigma_0(x)$.

In this section, we construct two different classes of model specification tests and establish their asymptotic distributions. Section 3 discusses an optimal version of one of the proposed tests. Empirical comparisons of the two tests are given in Section 4.

2.1. Class I of Test Statistics
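Both classes of test statistics start from the kernel smooths in (2.2) and (2.3). As a minimal sketch, assuming scalar $X_t$ ($d = 1$) and an Epanechnikov kernel as one admissible compactly supported $K$, they might be computed as follows. The function names `epanechnikov`, `nw_smooth` and `nw_mean_and_variance` are illustrative and do not come from the paper.

```python
import numpy as np

def epanechnikov(u):
    """Compactly supported kernel on [-1, 1]; an illustrative choice of K."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def nw_smooth(x_eval, x, values, h):
    """Nadaraya-Watson smooth of `values` observed at `x`, evaluated at `x_eval`.

    The factor h**(-d) in K_h cancels between numerator and denominator.
    Evaluation points with no observation within distance h give 0/0.
    """
    weights = epanechnikov((x_eval[:, None] - x[None, :]) / h)
    return weights @ values / weights.sum(axis=1)

def nw_mean_and_variance(x_eval, x, y, h):
    """Estimators of m_1 and m_2 in (2.2), using a single bandwidth h."""
    m1_at_data = nw_smooth(x, x, y, h)                  # m_1 estimate at each X_t
    m1 = nw_smooth(x_eval, x, y, h)
    m2 = nw_smooth(x_eval, x, (y - m1_at_data) ** 2, h)
    return m1, m2
```

The smoothed parametric fit in (2.3) uses the same weights: under these assumptions it is `nw_smooth(x_eval, x, m_theta_at_data, h)`, where `m_theta_at_data` (a hypothetical name) holds the fitted parametric values at the observed $X_t$.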

To construct the first class of our test statistics, we first consider the following null hypothesis:
$$H_{01} : m_1(x) = m_{1\theta}(x) \quad \text{against} \quad H_{11} : m_1(x) = m_{1\theta}(x) + C_{1T}\Delta_{1T}(x). \tag{2.4}$$
For testing (2.4), Härdle and Mammen (1993) suggested using the test statistic
$$HM_T = (T h^d) \int \left(\tilde m_1(x) - \tilde m_{1\tilde\theta}(x)\right)^2 \pi(x)\,dx, \tag{2.5}$$
where $\pi(x)$ is a positive weight function satisfying $\int \pi^2(x)\,dx < \infty$. The authors showed that under $H_{01}$
$$\overline{HM}_T = \frac{HM_T - \mu_0}{\sigma_0} \to_D N(0, 1), \tag{2.6}$$
where
$$\mu_0 = K^{(2)}(0) \int \frac{\sigma^2(x)\pi(x)}{f(x)}\,dx \quad \text{and} \quad \sigma_0^2 = 2 h^d K^{(4)}(0) \int \left(\frac{\sigma^2(x)\pi(x)}{f(x)}\right)^2 dx,$$
in which $f(x)$ is the density function of $X_t$ and $\sigma^2(x) = \mathrm{Var}(Y_t \mid X_t = x)$.

For testing (2.1), equation (2.5) thus motivates the use of a test statistic of the form
$$N_{1T}(h) = (T h^d) \int \{\tilde m(x) - \tilde m_{\tilde\theta}(x)\}^\tau\, \hat\Sigma^{-1}(x)\, \{\tilde m(x) - \tilde m_{\tilde\theta}(x)\}\, \pi(x)\,dx \tag{2.7}$$
provided that $\hat\Sigma^{-1}(x)$ exists, where $\hat\Sigma^{-1}(x) = \hat f(x)\,\hat\Sigma_0^{-1}(x)$,
$$\hat f(x) = \frac{1}{T h^d} \sum_{t=1}^T K\left(\frac{x - X_t}{h}\right), \qquad \hat\Sigma_0(x) = \begin{pmatrix} \hat\sigma_{20}(x) & \hat\sigma_{11}(x) \\ \hat\sigma_{11}(x) & \hat\sigma_{02}(x) \end{pmatrix}, \tag{2.8}$$
and for $i, j = 0, 1, 2$,
$$\hat\sigma_{ij}(x) = \frac{\sum_{t=1}^T K\left(\frac{x - X_t}{h}\right) \hat\epsilon_t^i\, \hat\eta_t^j}{\sum_{t=1}^T K\left(\frac{x - X_t}{h}\right)}, \qquad \hat\epsilon_t = Y_t - \tilde m_1(X_t), \qquad \hat\eta_t = \hat\epsilon_t^2 - \tilde m_2(X_t).$$

The use of the weight function $\pi(\cdot)$ is due to both theoretical and practical considerations. On the theoretical side, one does not need to assume that the support of the marginal density $f(\cdot)$ of $\{X_t\}$ is compact. This avoids excluding some important distributions such as Gaussian distributions, which are particularly important in financial modelling. On the practical side, when the support of $f(\cdot)$ is not compact, one can use $\pi(\cdot)$ for approximation and truncation purposes.

Before establishing the asymptotic distribution of (2.7), we give the following remark.

Remark 2.1. We should point out that (2.7) is a natural extension of (2.5) and is asymptotically equivalent to the test statistic based on the empirical likelihood method (see Chen and Gao 2003).

We now state the main result of this section; the proof is relegated to Appendix A.

Theorem 2.1. (i) Suppose that Assumptions A.1-A.4 hold. Then under $H_0$
$$L_{1T} = L_{1T}(h) = \frac{N_{1T}(h) - 2\mu_\pi}{\sigma_h} \to_D N(0, 1) \tag{2.9}$$
as $T \to \infty$, where $\mu_\pi = \int \pi(x)\,dx$, $\sigma_h^2 = 4 h^d C(K, \pi)$, $C(K, \pi) = K^{(4)}(0)\, R^{-2}(K) \int \pi^2(x)\,dx$, $K^{(j)}(\cdot)$ denotes the $j$-times convolution product of $K(\cdot)$, and $R(K) = \int K^2(u)\,du$.

(ii) Assume that the conditions of (i) hold. In addition, assume that there is a random data driven $\hat h$ such that $\hat h / h - 1 \to_p 0$ as $T \to \infty$. Then under $H_0$
$$\hat L_{1T} = L_{1T}(\hat h) = \frac{N_{1T}(\hat h) - 2\mu_\pi}{\sigma_{\hat h}} \to_D N(0, 1) \tag{2.10}$$
as $T \to \infty$.

Remark 2.2. Both (2.9) and (2.10) are already in normalized form. It follows from (2.9) or (2.10) that $L_{1T}$ or $\hat L_{1T}$ is asymptotically normal under the null hypothesis $H_0$. In general, $H_0$ should be rejected if $L_{1T}$ or $\hat L_{1T}$ exceeds a critical value, $L_{10}$, of the normal distribution. As can be seen from (2.10), the normalization of the test statistic $\hat L_{1T}$ involves the kernel function only, so the statistic is directly applicable to real data.

Remark 2.3. Theorem 2.1(ii) shows that the asymptotic normality remains unchanged when $h$ is replaced with the random data driven $\hat h$, which is known as the plug-in method. Recently, Gao and King (2001) and Lavergne (2001) suggested using the plug-in method. Apart from the plug-in method, there are other methods for testing purposes. For example, Horowitz and Spokoiny (2001) adopted the maximum of a test statistic over a bandwidth interval. For our case, their test statistic is similar to $\max_{h \in H_T} L_{1T}(h)$, in which $H_T$ is an interval of bandwidths. We discuss an extension of Horowitz and Spokoiny (2001) to our case in Section 3.

Theorem 2.1 gives the asymptotic normality of the test statistics for the simultaneous testing problem. When the null hypothesis is rejected, one needs to further test
$$H_{01} : m_1(x) = m_{1\theta}(x) \quad \text{against} \quad H_{11} : m_1(x) = m_{1\theta}(x) + C_{1T}\Delta_{1T}(x)$$
or
$$H_{02} : m_2(x) = m_{2\theta}(x) \quad \text{against} \quad H_{12} : m_2(x) = m_{2\theta}(x) + C_{2T}\Delta_{2T}(x).$$
Define
$$N_{11T}(h) = (T h^d) \int \{\tilde m_1(x) - \tilde m_{1,\tilde\theta}(x)\}^2\, \hat\sigma_{20}^{-1}(x)\, \pi(x)\,dx$$
and
$$N_{12T}(h) = (T h^d) \int \{\tilde m_2(x) - \tilde m_{2,\tilde\theta}(x)\}^2\, \hat\sigma_{02}^{-1}(x)\, \pi(x)\,dx.$$
We now have the following theorem.

Theorem 2.2. (i) Under the conditions of Theorem 2.1(i), under $H_{01}$ or $H_{02}$ we have for $i = 1$ or $2$, as $T \to \infty$,
$$L_{1iT} = \frac{N_{1iT}(h) - \mu_\pi}{\sigma_{1h}} \to_D N(0, 1), \tag{2.11}$$
where $\sigma_{1h}^2 = 2 h^d C(K, \pi)$.

(ii) Under the conditions of Theorem 2.1(ii), under $H_{01}$ or $H_{02}$ we have for $i = 1$ or $2$, as $T \to \infty$,
$$L_{1iT}(\hat h) = \frac{N_{1iT}(\hat h) - \mu_\pi}{\hat\sigma_{1h}} \to_D N(0, 1), \tag{2.12}$$
where $\hat\sigma_{1h}^2 = 2 \hat h^d C(K, \pi)$.

Theorem 2.2 shows that we can test either the conditional mean or the conditional variance. The conclusion of Theorem 2.2(i) is similar to those obtained previously for test statistics based on kernel estimation or series estimation. Unlike for the existing test statistics, the normalization of our test statistics depends only on $h$, $K$ and the chosen weight function $\pi$. It follows from (2.5) and (2.6) that the test statistic of Härdle and Mammen (1993) depends on $\sigma^2(x) = \mathrm{Var}(Y_t \mid X_t = x)$. Obviously, $\sigma_0$ of (2.6) needs to be estimated when using the normalized statistic $L_{0T}$ of (2.6) in practice. By contrast, $\sigma_{1h}$ of (2.11) does not involve any unknown function such as $\sigma^2(x)$.

As can be seen from the construction of $L_{1T}$, random denominators are involved. Our experience suggests that the involvement of random denominators could reduce the power of the proposed tests. This motivates the construction of the second class of our test statistics below.

2.2. Class II of Test Statistics

In order to explain the motivation for the construction of the second class of our test statistics, we need to look at some relevant test statistics for testing the null hypothesis (2.4):
$$H_{01} : m_1(x) = m_{1\theta}(x) \quad \text{against} \quad H_{11} : m_1(x) = m_{1\theta}(x) + C_{1T}\Delta_{1T}(x).$$
For testing (2.4), several authors have proposed novel test statistics. See Li and Wang (1998) and Gao and King (2001). Let $p_{st} = K((X_s - X_t)/h)$. To test (2.4), we suggest using a test statistic of the form
$$L_{21T} = L_{21T}(h) = \frac{\sum_{s=1}^T \sum_{t=1, t \ne s}^T p_{st}\, \hat U_t \hat U_s}{S_{21T}}, \tag{2.13}$$
where $S_{21T}^2 = 2 \sum_{s=1}^T \sum_{t=1}^T p_{st}^2\, \hat U_t^2 \hat U_s^2$ and $\hat U_t = Y_t - m_{1,\tilde\theta}(X_t)$.

Similar to the test statistic of Horowitz and Spokoiny (2001, p. 606), we can also construct a test statistic of the form
$$\tilde L_{21T} = \tilde L_{21T}(h) = \frac{\sum_{t=1}^T \sum_{s=1, s \ne t}^T A_{st}\, \hat U_s \hat U_t}{\hat S_{21T}}, \tag{2.14}$$
where $\hat S_{21T}^2 = 2 \sum_{t=1}^T \sum_{s=1}^T A_{st}^2\, \hat U_t^2 \hat U_s^2$, $A_{st}$ is the $(s, t)$ element of the $T \times T$ matrix $A = W_h^\tau W_h$, and $W_h$ is the $T \times T$ matrix whose $(s, t)$ element is
$$w_h(X_s, X_t) = \frac{K((X_s - X_t)/h)}{\sum_{u=1}^T K((X_s - X_u)/h)}.$$
Theoretically, $\tilde L_{21T}$ is much more complicated than $L_{21T}$, as the latter involves only a double summation while the former involves not only a triple summation, but also several random denominators.

Let $P = \{p_{st}\}$ be the $T \times T$ matrix with $p_{st}$ as its $(s, t)$ element and $\hat U = (\hat U_1, \ldots, \hat U_T)^\tau$. Then the numerator of (2.13) can be expressed as
$$\sum_{t=1}^T \sum_{s \ne t} p_{st}\, \hat U_t \hat U_s = \hat U^\tau P \hat U - \sum_{t=1}^T p_{tt}\, \hat U_t^2.$$
This suggests using the following form for testing the null hypothesis (2.1):
$$(\hat U^\tau, \hat V^\tau) \begin{pmatrix} P & P \\ P & P \end{pmatrix} \begin{pmatrix} \hat U \\ \hat V \end{pmatrix},$$
where $\hat V = (\hat V_1, \ldots, \hat V_T)^\tau$ and $\hat V_t = \hat U_t^2 - m_{2,\tilde\theta}(X_t)$. A simple decomposition implies that
$$(\hat U^\tau, \hat V^\tau) \begin{pmatrix} P & P \\ P & P \end{pmatrix} \begin{pmatrix} \hat U \\ \hat V \end{pmatrix} = (\hat U + \hat V)^\tau P\, (\hat U + \hat V). \tag{2.15}$$
Equations (2.13) and (2.15) finally motivate the use of the following test statistic for testing the null hypothesis (2.1):
$$L_{2T} = L_{2T}(h) = \frac{\sum_{t=1}^T \sum_{s=1, s \ne t}^T p_{st}\, \hat W_s \hat W_t}{\hat\sigma_h} = \frac{\hat W^\tau P \hat W - \hat\mu_h}{\hat\sigma_h}, \tag{2.16}$$
where $\hat\sigma_h^2 = 2 \sum_{s=1}^T \sum_{t=1}^T p_{st}^2\, \hat W_t^2 \hat W_s^2$, $\hat W_t = \hat U_t + \hat V_t$, $\hat W = (\hat W_1, \ldots, \hat W_T)^\tau$, and $\hat\mu_h = \sum_{t=1}^T p_{tt}\, \hat W_t^2 = K(0) \sum_{t=1}^T \hat W_t^2$.

Other alternatives include
$$\tilde L_{2T} = \tilde L_{2T}(h) = \frac{\sum_{t=1}^T \sum_{s=1, s \ne t}^T A_{st}\, \hat W_s \hat W_t}{\tilde\sigma_h}, \tag{2.17}$$
where $\tilde\sigma_h^2 = 2 \sum_{s=1}^T \sum_{t=1}^T A_{st}^2\, \hat W_t^2 \hat W_s^2$ and $A_{st}$ is as defined in (2.14).

As can be seen from (2.16) and (2.17), the two forms are similar theoretically. Empirically, our small sample studies suggest that $L_{2T}$ is more powerful than $\tilde L_{2T}$. Thus, we suggest using $L_{2T}$ of (2.16) throughout the rest of this paper.
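The statistic in (2.16) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: $X_t$ is scalar ($d = 1$), $K$ is taken to be the Epanechnikov kernel, the sum in $\hat\sigma_h^2$ is read literally as running over all pairs $(s, t)$, and the names `epanechnikov` and `l2t_statistic` are hypothetical. The example data are i.i.d. draws chosen only for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Compactly supported kernel on [-1, 1]; an illustrative choice of K."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def l2t_statistic(y, x, m1_fit, m2_fit, h):
    """L_2T(h) of (2.16) for scalar X_t.

    m1_fit and m2_fit hold the fitted parametric conditional mean and
    conditional variance evaluated at each X_t.
    """
    u_hat = y - m1_fit                    # U_t = Y_t - m_1(X_t; theta)
    v_hat = u_hat**2 - m2_fit             # V_t = U_t^2 - m_2(X_t; theta)
    w_hat = u_hat + v_hat                 # W_t = U_t + V_t
    p = epanechnikov((x[:, None] - x[None, :]) / h)    # p_st
    mu_hat = np.sum(np.diag(p) * w_hat**2)             # K(0) * sum_t W_t^2
    numerator = w_hat @ p @ w_hat - mu_hat             # sum over s != t
    sigma_hat = np.sqrt(2.0 * (w_hat**2) @ (p**2) @ (w_hat**2))
    return numerator / sigma_hat

# Illustrative data generated under a correctly specified model.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 + 0.5 * x + np.sqrt(1.0 + 0.5 * x**2) * rng.standard_normal(200)
stat = l2t_statistic(y, x, 1.0 + 0.5 * x, 1.0 + 0.5 * x**2, h=0.2)
print(stat)
```

Because the statistic is a ratio of a quadratic form to its own scale, rescaling the kernel by a constant leaves it unchanged, which is why no normalization of $K$ beyond the choice of its shape is needed in the sketch.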

We now conclude our construction and discussion with the following remark.

Remark 2.4. (i) Equation (2.16) extends (2.13) from the univariate case to the bivariate case. Comparing (2.13) with (2.16), one can see the similarities of the two forms. This also suggests that one can easily construct a similar form for other multiple test problems, such as testing the first four moments.

(ii) It follows from the construction of $L_{2T}$ that the form of $L_{2T}$ depends on the use of (2.15). Before finally using (2.15), we also considered the following alternative:
$$(\hat U^\tau, \hat V^\tau) \begin{pmatrix} P & -P \\ -P & P \end{pmatrix} \begin{pmatrix} \hat U \\ \hat V \end{pmatrix} = (\hat U - \hat V)^\tau P\, (\hat U - \hat V).$$
Obviously, one can replace $\hat W_t = \hat U_t + \hat V_t$ by $\hat W_t = \hat U_t - \hat V_t$ in (2.16). As our asymptotic and empirical studies show that there is little difference between the two forms, we suggest using $L_{2T}$ of (2.16) throughout this paper.

(iii) As can be seen from (2.7) and (2.16), the test statistic $L_{1T}$ involves not only a triple summation, but also several random denominators. By contrast, $L_{2T}$ involves just a double summation and no random denominator is involved in the numerator. Theoretically, the form of (2.7) looks much more complicated than that of (2.16), although the two test statistics have similar asymptotic properties. Empirically, our small sample studies in Section 4 show that $L_{2T}$ is more powerful than $L_{1T}$.

We now state the main result of this section; the proof is relegated to Appendix A.

Theorem 2.3. (i) Suppose that Assumptions A.1-A.4 hold. Then under $H_0$
$$L_{2T} = L_{2T}(h) \to_D N(0, 1)$$
as $T \to \infty$.

(ii) Assume that the conditions of (i) hold. In addition, assume that there is a random data driven $\hat h$ such that $\hat h / h - 1 \to_p 0$ as $T \to \infty$. Then under $H_0$
$$L_{2T}(\hat h) \to_D N(0, 1)$$
as $T \to \infty$.

Similar to (2.13), we can construct a test statistic for the univariate test problem $H_{02}$ introduced above Theorem 2.2. The test statistic is given by
$$L_{22T} = L_{22T}(h) = \frac{\sum_{t=1}^T \sum_{s=1, s \ne t}^T p_{st}\, \hat V_s \hat V_t}{S_{22T}}, \tag{2.18}$$
where $S_{22T}^2 = 2 \sum_{t=1}^T \sum_{s=1}^T p_{st}^2\, \hat V_s^2 \hat V_t^2$. We now have the following theorem; its proof follows from that of Theorem 2.3.

Theorem 2.4. (i) Under the conditions of Theorem 2.3(i), under $H_{01}$ or $H_{02}$ we have for $i = 1$ or $2$,
$$L_{2iT}(h) \to_D N(0, 1)$$
as $T \to \infty$.

(ii) Under the conditions of Theorem 2.3(ii), under $H_{01}$ or $H_{02}$ we have for $i = 1$ or $2$,
$$L_{2iT}(\hat h) \to_D N(0, 1)$$
as $T \to \infty$.

Sections 2.1 and 2.2 mainly discuss how to establish asymptotically consistent test statistics for the null hypothesis problem of the form (2.1), in which both $m_{1\theta}(\cdot)$ and $m_{2\theta}(\cdot)$ are parametric functions. As a matter of fact, one can construct similar test statistics for two different test problems: in the first, both $m_{1\theta}(\cdot)$ and $m_{2\theta}(\cdot)$ are of partially linear form, and in the second, both $m_{1\theta}(\cdot)$ and $m_{2\theta}(\cdot)$ are of single-index form. This extension covers semiparametric null models as alternatives to the parametric null models considered so far.

2.3. Some extensions and generalizations

Assume that there are two pairs of unknown parameters, $(\alpha, \beta)$ and $(\gamma, \delta)$, and a pair of unknown functions, $(\phi, \psi)$, such that
$$m_{1\theta}(X_t) = U_t^\tau \alpha + \phi(V_t) \quad \text{and} \quad m_{2\theta}(X_t) = Z_t^\tau \beta + \psi(W_t) \tag{2.19}$$
or
$$m_{1\theta}(X_t) = U_t^\tau \alpha + \phi(V_t^\tau \gamma) \quad \text{and} \quad m_{2\theta}(X_t) = Z_t^\tau \beta + \psi(W_t^\tau \delta), \tag{2.20}$$
where $\theta = (\alpha, \beta)$ for (2.19), $\theta = (\alpha, \gamma, \beta, \delta)$ for (2.20), and $U_t$, $V_t$, $Z_t$ and $W_t$ are either subsets of $X_t$ or the entire $X_t$.

When $\{X_t\}$ is a sequence of i.i.d. random variables and $U_t$, $V_t$, $Z_t$ and $W_t$ are subsets of $X_t$, Härdle, Liang and Gao (2000, Chapter 2) constructed some consistent estimators for $(\alpha, \beta)$ and $(\phi, \psi)$ in (2.19). Similarly, one can establish consistent estimators for the parameters and functions when $\{X_t\}$ is a stationary process. See Härdle, Liang and Gao (2000, Chapter 6). Li (1999) already considered testing a conditional mean of the form of the first part of (2.19).

When $U_t = V_t = X_t$ and $\{X_t\}$ is a sequence of dependent processes in (2.20), the conditional mean becomes
$$m_{1\theta}(X_t) = X_t^\tau \alpha + \phi(X_t^\tau \gamma). \tag{2.21}$$
For model (2.21), Xia, Tong and Li (1999) established asymptotically normal estimators for the parameters and function involved. Li (1999) already constructed a consistent test statistic for testing the null hypothesis of the form of (2.21) with $\alpha \equiv 0$. Similarly, one can establish asymptotically normal estimators for the parameters and functions involved in model (2.20).

Assume that $\tilde\alpha$, $\tilde\beta$, $\tilde\gamma$, $\tilde\delta$, $\tilde\phi(\cdot)$ and $\tilde\psi(\cdot)$ are consistent estimators of the parameters and functions involved in (2.19) or (2.20). The detailed construction of the estimators is similar to Li (1999) and Härdle, Liang and Gao (2000, Chapter 2) for (2.19), or Li (1999), Xia, Tong and Li (1999) and Härdle, Liang and Gao (2000, Chapter 2) for (2.20). We now define
$$m_{1,\tilde\theta}(X_t) = U_t^\tau \tilde\alpha + \tilde\phi(V_t) \quad \text{and} \quad m_{2,\tilde\theta}(X_t) = Z_t^\tau \tilde\beta + \tilde\psi(W_t)$$
for (2.19), and
$$m_{1,\tilde\theta}(X_t) = U_t^\tau \tilde\alpha + \tilde\phi(V_t^\tau \tilde\gamma) \quad \text{and} \quad m_{2,\tilde\theta}(X_t) = Z_t^\tau \tilde\beta + \tilde\psi(W_t^\tau \tilde\delta)$$
for (2.20).

Substituting the new estimator $\tilde m_{\tilde\theta}(x) = (\tilde m_{1,\tilde\theta}(x), \tilde m_{2,\tilde\theta}(x))^\tau$ into (2.7), one can establish the corresponding test statistic $L_{1T}(h)$ of (2.9) for testing the null hypothesis problem of the form (2.19) or (2.20). Similarly, for the construction of the corresponding test statistic $L_{2T}(h)$ of (2.16), one needs to replace $\hat U_t$ and $\hat V_t$ there by
$$\hat U_t = \left[Y_t - m_{1,\tilde\theta}(X_t)\right] \tilde f(V_t) \quad \text{and} \quad \hat V_t = \left\{\left[Y_t - m_{1,\tilde\theta}(X_t)\right]^2 - m_{2,\tilde\theta}(X_t)\right\} \tilde f^2(V_t)\, \tilde f(W_t)$$
for the case of (2.19), and
$$\hat U_t = \left[Y_t - m_{1,\tilde\theta}(X_t)\right] \tilde f(V_t^\tau \tilde\gamma) \quad \text{and} \quad \hat V_t = \left\{\left[Y_t - m_{1,\tilde\theta}(X_t)\right]^2 - m_{2,\tilde\theta}(X_t)\right\} \tilde f^2(V_t^\tau \tilde\gamma)\, \tilde f(W_t^\tau \tilde\delta)$$
for the case of (2.20), where $\tilde f(\cdot)$ is the usual kernel density estimator based on the data involved. Therefore, for the null hypothesis problem (2.19) or (2.20), we can establish results corresponding to Theorems 2.1 and 2.3. The detailed conditions and the proofs of the resulting theorems are similar to those for Theorems 2.1 and 2.3.

Similarly, one can consider nonparametric significance testing for both the conditional mean and conditional variance of model (1.1). To do so, one needs to extend some existing results, such as Fan and Li (1996), Lavergne and Vuong (2000), and Aït-Sahalia, Bickel and Stoker (2001), to the simultaneous setting. As they are extremely technical, we shall not provide the details, which are, however, available upon request from the first author.
We need to point out that the test statistics proposed in Sections 2.1 and 2.2 are already normalized and that their asymptotic distributions are standard normal. The rate of convergence to the normal limit is not expected to be fast. Thus, Theorems 2.1-2.4 can only give a rough idea about the behaviour of the test statistics involved when the sample size is small, and we need to consider using a bootstrap method when implementing the test statistics in practice. As our small sample studies suggest that $L_{2T}(h)$ is at least as powerful as $L_{1T}(h)$ for each fixed $h$, we need only to modify $L_{2T}(h)$ to an optimal test statistic and show, in Section 3 below, that the modified test statistic is consistent against alternatives of the form (2.1).

3. An adaptive test procedure

Section 2 establishes the asymptotic normality of the test statistics for testing
$$H_0 : m(x) = m_\theta(x) \quad \text{versus} \quad H_1 : m(x) = m_\theta(x) + C_T \Delta_T(x),$$
where $\Delta_T(x)$ is as defined before. The test statistics have nontrivial power only if $C_T$ converges to zero more slowly than $T^{-1/2}$. Define $\|C_T\| = \sqrt{C_{1T}^2 + C_{2T}^2}$. In this section, we consider local alternative models of the form
$$m_T(x) = m_{\theta_1}(x) + C_T \Delta_T(x), \tag{3.1}$$
where $\theta_1 \in \Theta$.

Similar to our tests, the tests of Andrews (1997), Bierens (1982), Bierens and Ploberger (1997), and Hart (1997) are consistent against alternatives of the form (3.1) whenever $C_T$ converges to zero more slowly than $T^{-1/2}$. This section considers the case where the testing problem is a simultaneous one for the dependent time series case. The main results of this section correspond to Theorems 1-4 of Horowitz and Spokoiny (2001).

3.1. Asymptotic Behaviour of the Test Statistic under the Null Hypothesis

As discussed in Section 2, the proposed test statistics depend on the bandwidth $h$. This section therefore suggests using
$$L_2^* = \max_{h \in H_T} L_{2T}(h), \tag{3.2}$$
where $H_T = \{h = h_{\max} a^k : h \ge h_{\min},\ k = 0, 1, 2, \ldots\}$, in which $0 < h_{\min} < h_{\max}$ and $0 < a < 1$. Let $J_T$ denote the number of elements of $H_T$. In this case, $J_T \le \log_{1/a}(h_{\max}/h_{\min}) + 1$.

Simulation Scheme: Throughout this section, we use the notation $L^* = L_2^*$. We now discuss how to obtain a critical value for $L^*$. The exact $\alpha$-level critical value, $l_\alpha^*$ ($0 < \alpha < 1$), is the $1 - \alpha$ quantile of the exact finite-sample distribution of $L^*$. Because $\theta_0$ is unknown, $l_\alpha^*$ cannot be evaluated in practice. We therefore suggest choosing a simulated $\alpha$-level critical value, $l_\alpha$, by using the following simulation procedure:

1. For each $t = 1, 2, \ldots, T$, generate $Y_t^* = m_{1\tilde\theta}(X_t) + \sqrt{m_{2\tilde\theta}(X_t)}\; e_t^*$, where $\{e_t^*\}$ is sampled randomly from a specified distribution with $E[e_t^*] = 0$ and $E[(e_t^*)^2] = 1$. In addition, assume that the third and fourth moments of $\{e_t^*\}$ exist.

2. Use the data set $\{(Y_t^*, X_t) : t = 1, 2, \ldots, T\}$ to estimate $\theta$. Denote the resulting estimate by $\hat\theta$. Compute the statistic $\hat L^*$ that is obtained by replacing $Y_t$ and $\tilde\theta$ with $Y_t^*$ and $\hat\theta$ on the right hand side of (3.2).

3. Repeat the above steps $M$ times and produce $M$ versions of $\hat L^*$, denoted by $\hat L_m^*$ for $m = 1, 2, \ldots, M$. Use the $M$ values of $\hat L_m^*$ to construct their empirical bootstrap distribution function, that is, $F^*(u) = \frac{1}{M} \sum_{m=1}^M I(\hat L_m^* \le u)$. Use the empirical bootstrap distribution function to estimate the asymptotic critical value, $l_\alpha$.

We now state the following result; its proof is relegated to Appendix B.

Theorem 3.1. Assume that Assumptions A.1-A.2 and B.1-B.3 hold. Then under $H_0$
$$\lim_{T \to \infty} P(L^* > l_\alpha) = \alpha.$$

The main result on the behaviour of the test statistic $L^*$ under $H_0$ is that $l_\alpha$ is an asymptotically correct $\alpha$-level critical value under any model in $H_0$.

3.2. Consistency Against a Fixed Alternative

We now show that $L^*$ is consistent against a fixed alternative model. Assume that model (1.1) holds. Let the parameter set $\Theta$ be an open subset of $R^q$. Let $\mathcal{M} = \{m_\theta(\cdot) : \theta \in \Theta\}$ satisfy Assumption B.1 listed in Appendix B. For $i = 1, 2$, let $M_i(\theta) = (m_{i\theta}(X_1), \ldots, m_{i\theta}(X_T))^\tau$, $m_i = (m_i(X_1), \ldots, m_i(X_T))^\tau$, $M(\theta) = (M_1(\theta)^\tau, M_2(\theta)^\tau)^\tau$ and $m = (m_1^\tau, m_2^\tau)^\tau$. Measure the distance between $m$ and $\mathcal{M}$ by the normalized $\ell_2$ distance
$$\rho(m, \mathcal{M}) = \inf_{\theta \in \Theta} \left[\frac{1}{2T} \|m - M(\theta)\|^2\right]^{1/2} = \inf_{\theta \in \Theta} \left[\frac{1}{2T} \|m_1 - M_1(\theta)\|^2 + \frac{1}{2T} \|m_2 - M_2(\theta)\|^2\right]^{1/2}. \tag{3.3}$$
If $H_0$ is false, then $\rho(m, \mathcal{M}) \ge c_\rho$ for all sufficiently large $T$ and some $c_\rho > 0$. A consistent test will reject a false $H_0$ with probability approaching one as $T \to \infty$. The following theorem establishes the consistency.

Theorem 3.2. Assume that the conditions of Theorem 3.1 hold. In addition, if there is some $C_\rho > 0$ such that $\lim_{T \to \infty} P(\rho(m, \mathcal{M}) \ge C_\rho) = 1$ holds, then
$$\lim_{T \to \infty} P(L^* > l_\alpha) = 1.$$

The proof of Theorem 3.2 is relegated to Appendix B.

3.3. Consistency Against a Sequence of Local Alternatives
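Before turning to local alternatives, the simulation scheme of Section 3.1 (the bandwidth set $H_T$, the maximum in (3.2), and bootstrap steps 1 to 3) can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes scalar $X_t$, an Epanechnikov kernel, standard normal bootstrap errors, and i.i.d. example data. The helper names (`bandwidth_grid`, `adaptive_statistic`, `simulated_critical_value`, `mean_fn`, `var_fn`, `fit_fn`) and the least squares fitting rule are hypothetical choices made only for this example.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def l2t_statistic(y, x, m1_fit, m2_fit, h):
    """L_2T(h) of (2.16) for scalar X_t."""
    u_hat = y - m1_fit
    w_hat = u_hat + (u_hat**2 - m2_fit)
    p = epanechnikov((x[:, None] - x[None, :]) / h)
    numerator = w_hat @ p @ w_hat - np.sum(np.diag(p) * w_hat**2)
    return numerator / np.sqrt(2.0 * (w_hat**2) @ (p**2) @ (w_hat**2))

def bandwidth_grid(h_min, h_max, a):
    """H_T = {h_max * a**k : h_max * a**k >= h_min, k = 0, 1, 2, ...}."""
    grid, k = [], 0
    while h_max * a**k >= h_min:
        grid.append(h_max * a**k)
        k += 1
    return np.array(grid)

def adaptive_statistic(y, x, theta, mean_fn, var_fn, grid):
    """L* = max over h in H_T of L_2T(h), as in (3.2)."""
    m1_fit, m2_fit = mean_fn(x, theta), var_fn(x, theta)
    return max(l2t_statistic(y, x, m1_fit, m2_fit, h) for h in grid)

def simulated_critical_value(x, theta_hat, mean_fn, var_fn, fit_fn,
                             grid, alpha, n_boot, rng):
    """Bootstrap the null distribution of L*; return its 1 - alpha quantile."""
    draws = np.empty(n_boot)
    for b in range(n_boot):
        e_star = rng.standard_normal(x.size)     # step 1: mean 0, variance 1
        y_star = mean_fn(x, theta_hat) + np.sqrt(var_fn(x, theta_hat)) * e_star
        theta_star = fit_fn(x, y_star)           # step 2: re-estimate theta
        draws[b] = adaptive_statistic(y_star, x, theta_star, mean_fn, var_fn, grid)
    return np.quantile(draws, 1.0 - alpha)       # step 3: empirical quantile

# Hypothetical parametric null: linear mean, variance theta[2] * (1 + 0.5 x^2).
def mean_fn(x, theta):
    return theta[0] + theta[1] * x

def var_fn(x, theta):
    return theta[2] * (1.0 + 0.5 * x**2)

def fit_fn(x, y):
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return np.array([coef[0], coef[1], np.mean(resid**2 / (1.0 + 0.5 * x**2))])

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=100)
y = 1.0 + 0.5 * x + np.sqrt(1.0 + 0.5 * x**2) * rng.standard_normal(100)
grid = bandwidth_grid(h_min=0.1, h_max=0.5, a=0.7)
theta_hat = fit_fn(x, y)
l_star = adaptive_statistic(y, x, theta_hat, mean_fn, var_fn, grid)
l_alpha = simulated_critical_value(x, theta_hat, mean_fn, var_fn, fit_fn,
                                   grid, alpha=0.05, n_boot=200, rng=rng)
reject = l_star > l_alpha
```

Note that the bootstrap draws use the fitted parametric variance directly, so no nonparametric variance estimate is required. The number of replications, the error distribution and the grid constants are tuning choices that the sketch fixes arbitrarily.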

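In this section, we consider the consistency of $L^*$ under local alternatives of the form
$$m_T(x) = m_{\theta_1}(x) + C_T \Delta_T(x)$$
with $\|C_T\| \ge C_0\, T^{-1/2}\, h_{\max}^{-d/4}\, (\log\log T)^{1/4}$ for some constant $C_0 > 0$ and $\theta_1 \in \Theta$, where $m_T(x) = (m_{1T}(x), m_{2T}(x))^\tau$, $m_{1T}(x) = m_{1\theta_1}(x) + C_{1T}\Delta_{1T}(x)$ and $m_{2T}(x) = m_{2\theta_1}(x) + C_{2T}\Delta_{2T}(x)$.

Throughout this section, for $i = 1, 2$ let $m_{iT} = (m_{iT}(X_1), \ldots, m_{iT}(X_T))^\tau$, $\Delta_{iT} = (\Delta_{iT}(X_1), \ldots, \Delta_{iT}(X_T))^\tau$, $m_T = (m_{1T}^\tau, m_{2T}^\tau)^\tau$ and $\Delta_T = (\Delta_{1T}^\tau, \Delta_{2T}^\tau)^\tau$. For $k = 1, 2$, let $\nabla_\theta M_k(\theta)$ be the $T \times q$ matrix whose $(i, j)$ element is $\partial m_{k\theta}(X_i) / \partial\theta_j$, and let $\nabla_\theta M(\theta) = ((\nabla_\theta M_1(\theta))^\tau, (\nabla_\theta M_2(\theta))^\tau)^\tau$.

We assume that $\Delta_T(x)$ is a continuous function that is normalized so that
$$\frac{1}{2T} \|\Delta_T\|^2 = \frac{1}{2T} \left(\sum_{t=1}^T \Delta_{1T}(X_t)^2 + \sum_{t=1}^T \Delta_{2T}(X_t)^2\right) \approx 1. \tag{3.4}$$
We also suppose that $\Delta_T$ is not an element of the space spanned by the columns of $\nabla_\theta M(\theta_1)$. That is, for some $\delta > 0$,
$$\|\Delta_T - \Pi_1 \Delta_T\| \ge \delta\, \|\Delta_T\|, \tag{3.5}$$
where
$$\Pi_1 = \nabla_\theta M(\theta_1) \left[\nabla_\theta M(\theta_1)^\tau\, \nabla_\theta M(\theta_1)\right]^{-1} \nabla_\theta M(\theta_1)^\tau$$
is the projection operator onto the column space of $\nabla_\theta M(\theta_1)$.

Conditions (3.4) and (3.5) exclude functions $\Delta_T(\cdot)$ for which $\|m_T - M(\theta_{T,0})\| = o(\|C_T\|)$ for some nonstochastic sequence $\{\theta_{T,0}\} \subset \Theta$. Thus, (3.4) and (3.5) ensure that the rate of convergence of $m_T$ to the parametric model $M(\theta_1)$ is the same as the rate of convergence of $C_T$ to zero. In particular, when (3.4) and (3.5) hold in probability,
$$\inf_{\theta \in \Theta} \left[\frac{1}{2T} \|m_T - M(\theta)\|^2\right]^{1/2} \ge \delta\, \|C_T\|\, (1 - o(1)) \tag{3.6}$$
holds in probability.

We now state the following consistency result; its proof is relegated to Appendix B.

Theorem 3.3. Assume that Assumptions A.1-A.2 and B.1-B.3 hold. Let $\tilde\theta$ be a $\sqrt{T}$-consistent estimator of $\theta$. Let $m_T$ satisfy (3.1) with $\|C_T\| \ge C\, T^{-1/2}\, h_{\max}^{-d/4}\, (\log\log T)^{1/4}$ for some constant $C > 0$. In addition, let conditions (3.4) and (3.5) hold in probability. Then
$$\lim_{T \to \infty} P(L^* > l_\alpha) = 1.$$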

The result shows that the power of the adaptive, rate-optimal test approaches one as $T \to \infty$ for any function $\Delta_T(\cdot)$ and sequence $\{C_T\}$ that satisfy the conditions of Theorem 3.3.

3.4. Consistency Against a Sequence of Smooth Alternatives

This section shows that $L^*$ is consistent uniformly over alternatives in a Hölder smoothness class whose distance from the parametric model approaches zero at the fastest possible rate. The results can be extended to Sobolev and Besov classes under more technical conditions.

Before specifying our smoothness classes, we introduce the following notation. Let $j = (j_1, \ldots, j_d)$, where $j_1, \ldots, j_d \ge 0$ are integers, be a multi-index. For $i = 1, 2$, define
$$|j| = \sum_{k=1}^d j_k \quad \text{and} \quad D^j m_i(x) = \frac{\partial^{|j|} m_i(x)}{\partial x_1^{j_1} \cdots \partial x_d^{j_d}}$$
whenever the derivative exists. Define the Hölder norm
$$\|m\|_{H,s} = \sup_{x \in S} \sum_{|j| \le s} \left(|D^j m_1(x)| + |D^j m_2(x)|\right).$$
The smoothness classes that we consider consist of functions $m \in S(H, s) \equiv \{m : \|m\|_{H,s} \le c_H\}$ for some (unknown) $s \ge \max(2, d/4)$ and $c_H < \infty$. For some $s \ge \max(2, d/4)$ and all sufficiently large $c_m < \infty$, define
$$B_{H,T} = \left\{m \in S(H, s) : \lim_{T \to \infty} P\left(\rho(m, \mathcal{M}) \ge c_m \left(T^{-1} \sqrt{\log\log T}\right)^{2s/(4s+d)}\right) = 1\right\}, \tag{3.7}$$
where $\rho(m, \mathcal{M})$ is as defined in (3.3).

We now state the following consistency result; its proof is relegated to Appendix B.

Theorem 3.4. Assume that Assumptions A.1-A.2 and B.1-B.3 hold. Then for $0 < \alpha < 1$ and $B_{H,T}$ as defined in (3.7),
$$\lim_{T \to \infty} P(L^* > l_\alpha) = 1.$$

Remark 3.1. Theorems 3.1-3.4 extend Theorems 1-4 of Horowitz and Spokoiny (2001) from testing the mean of a fixed design regression model to the testing of both the conditional mean and the conditional variance of nonparametric $\alpha$-mixing time series. Moreover, we consider the simultaneous test case where both the mean and variance functions can be tested simultaneously. Because of this property, we do not need to estimate the conditional variance directly for the simulation procedure proposed at the beginning of Section 3.

Remark 3.2. As can be seen from the above, the implementation of the adaptive test requires an intensive computing process.
In particular, one needs to select both the interval of bandwidth parameters, $H_T$, and the asymptotic critical value, $l^{*}_{\alpha}$. Moreover, it is quite difficult to select a bandwidth parameter, $h$, for implementing the test statistic, $L_{1T}(h)$, as existing theory provides no theoretical criteria on how this kind of choice should be made. It should be pointed out that existing selection criteria for $h$ for estimation purposes may not be applicable or suitable, as estimation-based optimal values are not necessarily optimal for testing purposes. Our experience suggests that the choice of $h$ should be based on the assessment of the power of the test involved. In Section 4 below, we provide two detailed simulation procedures for the choice of both $H_T$ and the asymptotic critical value.

4. An example of implementation

This section illustrates the proposed adaptive tests by a simulated example. In this example, we use simulated data to compare some small sample properties of $L_{1T}(h)$ and the adaptive test statistic $L_2^{*}$ of (3.2).

Example 4.1. Consider a nonlinear time series model of the form

$$Y_t = \alpha + \beta X_t + \sigma\sqrt{1 + 0.5 X_t^2}\; e_t, \quad \text{in which } X_t = 0.5 X_{t-1} + \epsilon_t, \quad t = 1, 2, \ldots, T, \tag{4.1}$$

where $\alpha$, $\beta$ and $\sigma > 0$ are unknown parameters to be estimated, $\{\epsilon_t : t \ge 1\}$ and $\{e_t : t \ge 1\}$ are mutually independent and identically distributed, and independent of $X_0$, $\epsilon_t \sim U(-0.5, 0.5)$, $X_0 \sim U(-1, 1)$, and $\{e_t\}$ is either the standard $N(0, 1)$ error or the normalized exponential $\mathrm{Exp}(1) - 1$ error, which has mean zero and variance one. Define the true forms of the conditional mean and conditional standard deviation by $g_\theta(X_t) = \alpha + \beta X_t$ and $\sigma_\theta(X_t) = \sigma\sqrt{1 + 0.5 X_t^2}$.

We now consider a sequence of alternative models of the form

$$Y_t = g_T(X_t) + \sigma_T(X_t) e_t, \tag{4.2}$$

where

$$g_T(x) = g_\theta(x) + C_T \phi(x/D_T) \quad \text{and} \quad \sigma_T(x) = \sigma_\theta(x) + C_T \phi(x/D_T), \tag{4.3}$$

in which $D_T = \left(T^{-1}\sqrt{\log\log T}\right)^{1/9}$, $C_T = D_T^{4}$ and $\phi(\cdot)$ is the probability density function of the standard normal distribution. The choice of (4.2) and (4.3) ensures that (3.7) holds with $s = 2$ and $d = 1$. This implies that the adaptive test is consistent against the sequence with an optimal rate.

In the following detailed simulation, we consider using a class of alternatives of the form

$$Y_t = \alpha + \beta X_t + \frac{1}{\psi}\phi(X_t/\psi) + \left(\sigma\sqrt{1 + 0.5 X_t^2} + \frac{1}{\psi}\phi(X_t/\psi)\right) e_t, \tag{4.4}$$

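where $\psi \ge 0$ is defined as the truncation parameter to be chosen, and the others are as defined in (4.1). In Table 4.1 below, we calculate both the size and the power of our adaptive test for various cases. The vector of unknown parameters, $\theta = (\alpha, \beta, \sigma)$, involved in (4.1) was then estimated using the pseudo-maximum likelihood method, which is quite common in the estimation of parametric ARCH models. Due to the structure of (4.1), we choose the following weight function and kernel function:

$$\pi(x) = \begin{cases} \frac{1}{2} & \text{if } x \in [-1, 1] \\ 0 & \text{otherwise} \end{cases} \tag{4.5}$$

and

$$K(x) = \begin{cases} \frac{15}{16}(1 - x^2)^2 & \text{if } x \in [-1, 1] \\ 0 & \text{otherwise.} \end{cases} \tag{4.6}$$

Let $x_i = i/n$ and $n = [T^{1/5}]$, where $[x] \le x$ denotes the largest integer part of $x$. Define

$$\hat N_{1T}(h) = \frac{1}{n}\sum_{i=1}^{n} (Th)\,\{\tilde m(x_i) - \tilde m_{\tilde\theta}(x_i)\}^{\tau}\, \hat\Sigma^{-1}(x_i)\, \{\tilde m(x_i) - \tilde m_{\tilde\theta}(x_i)\}, \tag{4.7}$$

where $\tilde m(x) = (\tilde m_1(x), \tilde m_2(x))^{\tau}$, $\tilde m_{\tilde\theta}(x) = (\tilde m_{1,\tilde\theta}(x), \tilde m_{2,\tilde\theta}(x))^{\tau}$, $\tilde\theta$ is an estimator of $\theta$,

$$\tilde m_1(x) = \frac{\sum_{t=1}^{T} K((x - X_t)/h)\, Y_t}{\sum_{t=1}^{T} K((x - X_t)/h)}, \qquad \tilde m_2(x) = \frac{\sum_{t=1}^{T} K((x - X_t)/h)\,(Y_t - \hat m_1(X_t))^2}{\sum_{t=1}^{T} K((x - X_t)/h)},$$

$$\tilde m_{l,\tilde\theta}(x) = \frac{\sum_{t=1}^{T} K((x - X_t)/h)\, m_{l,\tilde\theta}(X_t)}{\sum_{t=1}^{T} K((x - X_t)/h)} \quad \text{for } l = 1, 2,$$

$m_{1,\tilde\theta}(X_t) = \tilde\alpha + \tilde\beta X_t$, $m_{2,\tilde\theta}(X_t) = \tilde\sigma^2 [1 + 0.5 X_t^2]$,

$$\hat\Sigma^{-1}(x) = \hat f(x)\,\hat\Sigma_0^{-1}(x), \qquad \hat\Sigma_0(x) = \begin{pmatrix} \hat\sigma_{20}(x) & \hat\sigma_{11}(x) \\ \hat\sigma_{11}(x) & \hat\sigma_{02}(x) \end{pmatrix}, \qquad \hat f(x) = \frac{1}{Th}\sum_{t=1}^{T} K\left(\frac{x - X_t}{h}\right),$$

and for $i, j = 0, 1, 2$,

$$\hat\sigma_{ij}(x) = \frac{\sum_{t=1}^{T} K\left(\frac{x - X_t}{h}\right)\hat\epsilon_t^{\,i}\,\hat\eta_t^{\,j}}{\sum_{t=1}^{T} K\left(\frac{x - X_t}{h}\right)}, \qquad \hat\epsilon_t = Y_t - \hat m_1(X_t), \qquad \hat\eta_t = \hat\epsilon_t^{\,2} - \hat m_2(X_t),$$

and $K(\cdot)$ is as defined in (4.6). Alternatively, one can generate $x_i$ from the density $\pi(\cdot)$ as many as $Q = 1000$ times and then define

$$N_{1T}(h) = \frac{Th}{Q}\sum_{Q\ \text{replications}}\left\{\frac{1}{n}\sum_{i=1}^{n}\{\tilde m(x_i) - \tilde m_{\tilde\theta}(x_i)\}^{\tau}\,\hat\Sigma^{-1}(x_i)\,\{\tilde m(x_i) - \tilde m_{\tilde\theta}(x_i)\}\right\}. \tag{4.8}$$

With the choice of $\pi(\cdot)$ and $K(\cdot)$ in (4.5) and (4.6), the constant $C(K, \pi)$ involved in $L_{1T}$ is $\sqrt{93}/10$. In order to calculate $L_2^{*}$ of (3.2), one needs to find $H_T$, which is chosen by the following simulation procedure: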

1. For the simulation, we start with some initial values for $\theta_0$ and $X_0$.

2. For each $t = 1, 2, \ldots, T$, generate the data $(X_t, Y_t)$ from (4.2) and (4.3). Use the data set $\{(Y_t, X_t) : t = 1, 2, \ldots, T\}$ to estimate $\theta$. Denote the resulting estimate by $\tilde\theta$.

3. For each fixed $h$, compute the resulting function of $h$ given by

$$\hat L_1(h) = L_{1T}(h) = \frac{N_{1T}(h) - 2}{\sqrt{186}/5}.$$

4. Repeat the above steps $M = 1000$ times and produce $M$ versions of $\hat L_1(h)$ denoted by $\hat L_{1m}(h)$ for $m = 1, 2, \ldots, M$. Use the $M$ functions of $h$, $\hat L_{1m}(h)$ for $m = 1, 2, \ldots, M$, to construct their empirical bootstrap distribution function, that is,

$$F_1(u) = \frac{1}{M}\sum_{m=1}^{M} I(\hat L_{1m}(h) \le u),$$

where $I(U \le u)$ is the usual indicator function.

5. For the given value $l_{0.05} = 1.65$, one can calculate the power function $\phi_1(h) = 1 - F_1(l_{0.05})$. Find approximately at which value of $h$ the power function $\phi_1(h)$ is maximized. Denote the maximizer by $\tilde h$.

6. Similarly, one can find the maximizer, $\bar h$, of the corresponding power function $\phi_2(h)$ for

$$L_2(h) = \frac{\sum_{t=1}^{T}\sum_{s=1, s \ne t}^{T} p_{st}\,\hat W_s \hat W_t}{\hat\sigma_h},$$

where $\hat\sigma_h^2 = 2\sum_{t=1}^{T}\sum_{s=1}^{T} p_{st}^2\,\hat W_t^2\,\hat W_s^2$, $\hat W_t = \hat U_t + \hat V_t$, $\hat U_t = Y_t - m_{1,\tilde\theta}(X_t)$, $\hat V_t = \hat U_t^2 - m_{2,\tilde\theta}(X_t)$, $p_{ts} = K((X_t - X_s)/h)$, and $K(\cdot)$ and $\tilde\theta$ are as defined before. Using $\bar h$, construct $H_T$.

We can now calculate the following test statistic:

$$L_1^{*} = \hat L_1(\tilde h) = \frac{N_{1T}(\tilde h) - 2}{\sqrt{186}/5}. \tag{4.9}$$

For the chosen $H_T$, we can compute $L_2^{*}$ of (3.2), given by

$$L_2^{*} = \max_{h \in H_T} \frac{\sum_{t=1}^{T}\sum_{s=1, s \ne t}^{T} p_{st}\,\hat W_s \hat W_t}{\hat\sigma_h}. \tag{4.10}$$

In order to compute the rejection rates of the test statistics, one needs to find the corresponding simulated critical values. We suggest choosing two simulated 5% level critical values, $l_{1,0.05}$ and $l_{2,0.05}$, by using the following simulation procedure:

1. For the simulation, we start with some initial values $\theta_0$ and $X_0$.

2. For each $t = 1, 2, \ldots, T$, generate the data $(X_t, Y_t)$ from model (4.1). Use the data set $\{(Y_t, X_t) : t = 1, 2, \ldots, T\}$ to estimate $\theta$. Denote the resulting estimate by $\tilde\theta$.

3. For the chosen $H_T$, compute the statistics $L_1^{*}$ and $L_2^{*}$ given by (4.9) and (4.10).

4. Repeat steps 2–3 $M = 1000$ times and produce $M$ versions of $L_1^{*}$ and $L_2^{*}$ denoted by $L_{1m}^{*}$ and $L_{2m}^{*}$ for $m = 1, 2, \ldots, M$.

5. Use the $M$ values of $L_{1m}^{*}$ and $L_{2m}^{*}$ to construct their empirical bootstrap distribution functions, that is, $F_i(u) = \frac{1}{M}\sum_{m=1}^{M} I(L_{im}^{*} \le u)$ for $i = 1, 2$. Use the empirical bootstrap distribution functions to calculate the two bootstrap simulated critical values, $l_{1,0.05}$ and $l_{2,0.05}$.

For each case where both $\psi$ and $T$ are chosen, we can compute the rejection rates. For calculating the rejection rates when $H_0$ is true, one needs to use the data $\{(X_t, Y_t)\}$ where each $(X_t, Y_t)$ is generated from (4.1). For calculating the rejection rates when $H_1$ is true, one needs to use the data $\{(X_t, Y_t)\}$ where each $(X_t, Y_t)$ is generated from (4.2). The number of simulations in producing Table 4.1 below was 1000. The detailed results are given in Table 4.1 below.

Table 4.1 near here

Remark 4.1. (i) First, one needs to point out that before modifying $L_{2T}(h)$ of (2.16) to be adaptive, we conducted some small sample studies for both $L_{1T}(h)$ and $L_{2T}(h)$. Our studies showed that $L_{2T}(h)$ was more powerful than $L_{1T}(h)$ uniformly in $h$. Moreover, Table 4.1 shows that $L_2^{*}$ of (4.10) is more powerful than $L_1^{*}$ of (4.9) for all the cases under consideration. We also tried to compare the power of $L_2^{*}$ of (3.2) with that of the proposed CGL test given in (3.1) of Chen and Gao (2003). Because the detailed comparison requires very intensive and extremely lengthy computation, as well as the implementation of both the proposed simulation scheme given in Section 3.1 and the so-called empirical likelihood based bootstrap simulation procedure proposed in Chen and Gao (2003), we have not been able to finish the detailed comparison for Example 4.1.
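
The critical-value simulation for the adaptive statistic can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the parameter values, the bandwidth set standing in for $H_T$, and the small number of replications are hypothetical, only the standard Normal error is drawn, and ordinary least squares with a moment estimate of $\sigma^2$ replaces the pseudo-maximum likelihood step. The statistic follows the kernel-weighted double sum of (4.10) with the kernel of (4.6).

```python
import numpy as np

def quartic_kernel(u):
    """Kernel (4.6): 15/16 * (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def simulate(T, theta, rng, psi=None):
    """Draw (X_t, Y_t) from (4.1) when psi is None, otherwise from (4.4)."""
    alpha, beta, sigma = theta
    x_prev = rng.uniform(-1.0, 1.0)                 # X_0 ~ U(-1, 1)
    eps = rng.uniform(-0.5, 0.5, size=T)
    X = np.empty(T)
    for t in range(T):
        x_prev = 0.5 * x_prev + eps[t]
        X[t] = x_prev
    e = rng.standard_normal(T)                      # N(0, 1) errors only
    mean = alpha + beta * X
    scale = sigma * np.sqrt(1.0 + 0.5 * X**2)
    if psi is not None:
        bump = np.exp(-0.5 * (X / psi) ** 2) / (np.sqrt(2.0 * np.pi) * psi)
        mean, scale = mean + bump, scale + bump
    return X, mean + scale * e

def fit_theta(X, Y):
    """Stand-in estimator (an assumption): OLS for (alpha, beta) and a
    moment estimate of sigma^2, not pseudo-maximum likelihood."""
    A = np.column_stack([np.ones_like(X), X])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    resid = Y - A @ coef
    sigma2 = np.mean(resid**2 / (1.0 + 0.5 * X**2))
    return coef[0], coef[1], sigma2

def L2(X, Y, h):
    """Kernel-weighted double sum over s != t, divided by sigma_hat."""
    alpha, beta, sigma2 = fit_theta(X, Y)
    U = Y - (alpha + beta * X)
    V = U**2 - sigma2 * (1.0 + 0.5 * X**2)
    W = U + V
    P = quartic_kernel((X[:, None] - X[None, :]) / h)
    WW = np.outer(W, W)
    numer = np.sum(P * WW) - np.sum(np.diag(P) * W**2)   # drop s = t terms
    denom = np.sqrt(2.0 * np.sum(P**2 * WW**2))
    return numer / denom

def adaptive_L2(X, Y, grid):
    """Maximise L2(h) over a finite bandwidth set."""
    return max(L2(X, Y, h) for h in grid)

def critical_value(T, theta, grid, M, level, rng):
    """Empirical (1 - level) quantile of the adaptive statistic under (4.1)."""
    draws = [adaptive_L2(*simulate(T, theta, rng), grid) for _ in range(M)]
    return float(np.quantile(draws, 1.0 - level))

rng = np.random.default_rng(1)
theta0 = (1.0, 0.5, 1.0)          # hypothetical (alpha, beta, sigma)
grid = [0.2, 0.3, 0.45]           # hypothetical bandwidth set
crit = critical_value(T=100, theta=theta0, grid=grid, M=50, level=0.05, rng=rng)
X_alt, Y_alt = simulate(100, theta0, rng, psi=0.25)
print("critical value:", crit, "reject:", adaptive_L2(X_alt, Y_alt, grid) > crit)
```

Repeating the last two lines over many draws from (4.4) and averaging the rejection indicator gives an estimate of power for a given $\psi$ and $T$; replacing the draws with data from (4.1) gives the empirical size.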
(ii) As can be seen from the first part of Table 4.1, for the standard Normal error the power can be close to one when $T = 500$ and the value of $\psi^{-1}$ is between 4% and 10%. This may show that $L_2^{*}$ is not only asymptotically optimal but also practically applicable to both the small and medium sample cases, since the differences between $H_0$ and $H_1$ were made deliberately close. We also computed the power of the tests for the case where $\psi = 1$ or $0.25$; our small sample results showed that the power of $L_2^{*}$ was already 100% even when $T = 250$. In the second part of Table 4.1, we have provided some small sample results for the case where the error is the normalized exponential random variable. The results show that the power of $L_2^{*}$ is uniformly higher than that for the standard $N(0, 1)$ case. This may show that $L_2^{*}$ is capable of capturing the skewness and kurtosis due to the flexible structure of $\{e_t\}$ allowed in the Simulation Scheme.

As pointed out in Section 2.2, for some cases one may need only to test either the conditional mean or the conditional variance. For the one-sided test case, it would be interesting to know whether there would be any significant reduction of the power when using $L_2^{*}$ while $H_1$ was different from $H_0$ only in either the conditional mean or the conditional variance. In other words, we would be interested to know whether $L_2^{*}$ would be much more powerful than either $L_{21}^{*} = \max_{h \in H_{1T}} L_{21T}(h)$ or $L_{22}^{*} = \max_{h \in H_{2T}} L_{22T}(h)$ when testing a one-sided problem, where $L_{21T}(h)$ and $L_{22T}(h)$ are as defined in (2.13) and (2.18) respectively, and the choice of $H_{1T}$ and $H_{2T}$ is similar to that for $H_T$. We have conducted some small sample studies for $L_{21}^{*}$, $L_{22}^{*}$ and $L_2^{*}$ for the one-sided test case. The number of simulations in producing Table 4.1 below was 1000. The detailed results are given in Tables 4.2 and 4.3 below.

Table 4.2 near here

Table 4.3 near here

Remark 4.2. (i) Tables 4.2 and 4.3 provide some detailed values for the power of the simultaneous test and the power of the two one-sided tests when $C_{2T} \equiv 0$ or $C_{1T} \equiv 0$. Our small sample results show that the simultaneous test was just slightly less powerful than the corresponding one-sided test in both cases, even when the simultaneous test was used for testing either the conditional mean or the conditional variance alone. This may suggest that one can consider testing both the conditional mean and the conditional variance simultaneously when it is difficult to determine which component (the conditional mean or the conditional variance) may cause a model specification problem. We observed that the reduction of the power of the simultaneous test for the case of $C_{2T} \equiv 0$ was smaller than that for the case of $C_{1T} \equiv 0$. We also observed that both the simultaneous and the one-sided tests for the case of $C_{1T} \equiv 0$ were less powerful than the corresponding tests for the case of $C_{2T} \equiv 0$.
We have not been able to explain these phenomena, although we think that this may be due to the increase in variability when testing the conditional variance only. It is also observed that the sizes of the three tests were all quite close to 5%.

(ii) When comparing the individual values for the power of the simultaneous test with those for the power of the one-sided tests for the Normal error distribution and the normalized exponential error distribution, we found some superiority of the tests for the normalized exponential error distribution over those for the Normal error distribution, although the superiority may not be significant. This finding is similar to that drawn from Table I of Horowitz and Spokoiny (2001).

5. Conclusion

In this paper, we considered the general nonparametric time series regression model (1.1) and then proposed several model specification test statistics for testing the mean and the variance under the α-mixing condition. Furthermore, we established the adaptive test. Several consistency results about the power of the test statistics were then developed. The consistency results extend the main results of Theorems 1–4 of Horowitz and Spokoiny (2001) from the fixed design case to the α-mixing time series case. The proposed optimal tests were illustrated through a simulated example in Section 4.

The results given in this paper can be extended in a number of directions. First, the results for the short-range dependent time series case can be extended to the long-range dependent time series case, which is also relevant to some economic and financial data problems. Second, one can relax the strict stationarity and the mixing condition, as the recent work by Karlsen and Tjøstheim (2001) indicates that it is possible to do such work without the stationarity and the mixing condition.(5) This part is particularly important for two reasons: (i) for the long-range dependent case one needs to avoid assuming both the long-range dependence and the mixing condition, as they contradict each other; and (ii) some important models are nonstationary and long-range dependent. See, for example, Robinson (1995, 1997) and Gao (2002). Some of these issues are left for possible future research.

(5) One also needs to point out that for the continuous time case, Aït-Sahalia (1999) is applicable to the nonstationarity case.

Appendix A

This appendix lists the necessary assumptions for the establishment and the proof of the main results given in Section 2.

A.1. Assumptions

Assumption A.1. (i) Assume that the process $(X_t, Y_t)$ is strictly stationary and α-mixing with the mixing coefficient $\alpha(t) = C_\alpha \alpha^{t}$ defined by

$$\alpha(t) = \sup\{|P(A \cap B) - P(A)P(B)| : A \in \Omega_1^{s},\ B \in \Omega_{s+t}^{\infty}\}$$

for all $s, t \ge 1$, where $0 < C_\alpha < \infty$ and $0 < \alpha < 1$ are constants, and $\Omega_i^{j}$ denotes the σ-field generated by $\{(X_t, Y_t) : i \le t \le j\}$.

(ii) Assume that $P\left(0 < \min_{t \ge 1}\sigma(X_t) \le \max_{t \ge 1}\sigma(X_t) < \infty\right) = 1$ and that for all $t \ge 1$ and $1 \le i \le 4$, $P\left(E[e_t^{i} \mid \Omega_{t-1}] = \mu_i\right) = 1$, where $\mu_1 = 0$, $\mu_2 = 1$, $\mu_3$ and $\mu_4$ are real constants, and $\Omega_t = \sigma\{(X_{s+1}, Y_s) : 1 \le s \le t\}$ is a sequence of σ-fields generated by $\{(X_{s+1}, Y_s) : 1 \le s \le t\}$.

(iii) Let $\zeta_t = \epsilon_t$ or $\eta_t$. In addition,

$$E\left[|\zeta_t|^{4+\alpha}\right] < \infty \quad \text{and} \quad E\left[\left|\zeta_{t_1}^{i_1}\zeta_{t_2}^{i_2}\cdots\zeta_{t_l}^{i_l}\right|^{1+\beta}\right] < \infty$$

for some small constants $\alpha > 0$ and $\beta > 0$, where $2 \le l \le 4$ is an integer, $0 \le i_j \le 4$ and $\sum_{j=1}^{l} i_j \le 8$.

Assumption A.2. (i) Let $\zeta_t = \epsilon_t$ or $\eta_t$ and $\mu_i(x) = E[\zeta_t^{i} \mid X_t = x]$ for $1 \le i \le 4$. Assume that the following Lipschitz condition is satisfied:

$$\max_{1 \le i \le 4}|\mu_i(u + v) - \mu_i(u)| \le D(u)\,\|v\|$$

with $v \in S$ (any compact set of $R^{q}$) and $E\left[|D(X_t)|^{2+\gamma}\right] < \infty$ for some small constant $\gamma > 0$, where $\|\cdot\|$ denotes the Euclidean norm.

(ii) Let $S_\pi$ be a compact subset of $R^{d}$. Assume that $\pi(\cdot)$ is a positive weight function supported on $S_\pi$ and satisfies $0 < \int \pi^2(x)\,dx \le C$ for some constant $C$. Let $S_f = \{x \in R^{d} : f(x) > 0\}$ and $S_X$ be the projection of $S_\pi$ in $S_f$.(6) Assume that the marginal density function, $f(x)$, of $X_t$, together with the first two derivatives of $f(x)$ and $m_i(x)$, $i = 1, 2$, is continuous on $R^{d}$, that $\inf_{x \in S_X} m_2(x) \ge C_m > 0$ for some constant $C_m$, and that on $S_X$ the density function $f(x)$ is bounded below by $C_f$ and above by $C_f^{-1}$ for some $C_f > 0$, where $m_1(x) = E[Y_t \mid X_t = x]$ and $m_2(x) = \mathrm{var}[Y_t \mid X_t = x]$.

(iii) Let $f_{\tau_1, \tau_2, \ldots, \tau_l}(\cdot)$ be the joint probability density of $(X_{1+\tau_1}, \ldots, X_{1+\tau_l})$ $(1 \le l \le 4)$. Assume that $f_{\tau_1, \tau_2, \ldots, \tau_l}(\cdot)$ exists and satisfies the following Lipschitz condition:

$$\left|f_{\tau_1, \ldots, \tau_l}(x_1 + v_1, \ldots, x_l + v_l) - f_{\tau_1, \ldots, \tau_l}(x_1, \ldots, x_l)\right| \le D_{\tau_1, \ldots, \tau_l}(x_1, \ldots, x_l)\,\|v\|$$

for $v \in S$, where $S$ is any compact subset of $R^{d}$ and $D_{\tau_1, \ldots, \tau_l}(x_1, \ldots, x_l)$ is integrable and satisfies the following conditions:

$$\int D_{\tau_1, \ldots, \tau_l}(x_1, \ldots, x_l)\,\|x\|^{2\theta}\,dx < M_1 < \infty, \qquad \int D_{\tau_1, \ldots, \tau_l}(x_1, \ldots, x_l)\, f_{\tau_1, \ldots, \tau_l}(x_1, \ldots, x_l)\,dx < M_2 < \infty$$

for some $\theta > 1$ and constants $M_1 > 0$ and $M_2 > 0$.

Assumption A.3. (i) Assume that the univariate kernel function $k(\cdot)$ is nonnegative, symmetric, and supported on $[-1, 1]$. In addition, $k(x)$ is continuous on $[-1, 1]$. This paper considers using

$$K(x_1, \ldots, x_d) = \prod_{i=1}^{d} k(x_i).$$

(ii) The bandwidth parameter $h$ satisfies

$$\lim_{T \to \infty} h = 0, \qquad \lim_{T \to \infty} T h^{d} = \infty \qquad \text{and} \qquad \limsup_{T \to \infty} T h^{5d} < \infty.$$

Assumption A.4. Assume that for any parametric estimator, $\tilde\theta$, of $\theta$,

$$\max_{1 \le i \le 2}\ \max_{1 \le t \le T}\left|m_{i\tilde\theta}(X_t) - m_{i\theta}(X_t)\right| = O_p(T^{-1/2}).$$

(6) In other words, $S_X = S_\pi \cap S_f$.
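
The kernel and bandwidth conditions of Assumption A.3 can be illustrated with a short sketch. The choice of the quartic function as the univariate kernel $k$ and the bandwidth rule $h = c\,T^{-1/(5d)}$ are assumptions made here for illustration; that rule is only one of many that satisfy the three limits in A.3(ii).

```python
import numpy as np

def k_quartic(u):
    """Univariate kernel: nonnegative, symmetric, continuous, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def product_kernel(x):
    """K(x_1, ..., x_d) = prod_i k(x_i), evaluated row by row."""
    return np.prod(k_quartic(np.atleast_2d(x)), axis=1)

def bandwidth(T, d, c=1.0):
    """Illustrative rule h = c * T^(-1/(5d)): h -> 0, T*h^d = c^d * T^(4/5)
    diverges, and T*h^(5d) = c^(5d) stays bounded."""
    return c * T ** (-1.0 / (5.0 * d))

d = 2
for T in (10**2, 10**4, 10**6):
    h = bandwidth(T, d)
    print(f"T={T:>8d}  h={h:.4f}  T*h^d={T * h**d:.1f}  T*h^(5d)={T * h**(5 * d):.3f}")

print(product_kernel(np.array([[0.0, 0.0], [0.5, 1.5]])))
```

A point with any coordinate outside $[-1, 1]$ receives zero weight, which is what the compact-support requirement on $k$ implies for the product kernel.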

Remark A.1. Assumptions A.1(i)(ii), A.2(ii), A.3 and A.4 are novel conditions. Assumptions A.1(iii) and A.2(i)(iii) are similar to some parts of Condition (A1) of Li (1999, p.107). All the conditions are quite natural in this kind of problem. Note that we have not assumed the independence between $\{X_t\}$ and $\{e_t\}$. When $\{X_t\}$ and $\{e_t\}$ are independent, Assumption A.1(ii) holds naturally. For this case, model (1.1) becomes a nonparametric ARCH model when $X_t = (Y_{t-1}, \ldots, Y_{t-d})$ and $\{e_t\}$ is a sequence of i.i.d. random errors. We also have not assumed that the marginal density of $X_t$ has a compact support. Instead, we impose some restrictions on the support of the weight function $\pi(\cdot)$. Assumption A.2 ensures that

$$0 < \inf_{x \in S_X}\mu_2(x) \le \sup_{x \in S_X}\mu_2(x) < \infty \quad \text{and} \quad 0 < \inf_{x \in S_X}\mu_4(x) \le \sup_{x \in S_X}\mu_4(x) < \infty.$$

These two conditions are required to ensure that $\Sigma^{-1}(x)$ exists and that the smallest eigenvalue of $\Sigma^{-1}(x)$ is positive uniformly in $x$. Assumption A.4, which requires the $\sqrt{T}$-rate of convergence for the parametric case, is a standard condition. It holds when each $m_{i\theta}(\cdot)$ is differentiable in $\theta$ and $\tilde\theta$ is a $\sqrt{T}$-consistent estimator of $\theta$.

A.2. Technical Lemmas

The following lemmas are necessary for the proof of the main results stated in Section 2. Throughout the rest of this paper, we use $f(x_{i_1}, \ldots, x_{i_d})$ to represent the joint density function of $(X_{i_1}, \ldots, X_{i_d})$ for $1 \le i_1 < \cdots < i_d \le d$.

Lemma A.1. Suppose that $M_m^{n}$ are the σ-fields generated by a stationary α-mixing process $\xi_i$ with the mixing coefficient $\alpha(i)$. For some positive integers $m$ let $\eta_i \in M_{s_i}^{t_i}$, where $s_1 < t_1 < s_2 < t_2 < \cdots < t_m$, and suppose $t_i - s_i > \tau$ for all $i$. Assume further that $\|\eta_i\|_{p_i}^{p_i} = E|\eta_i|^{p_i} < \infty$ for some $p_i > 1$ for which $Q = \sum_{i=1}^{l}\frac{1}{p_i} < 1$. Then

$$\left|E\left[\prod_{i=1}^{l}\eta_i\right] - \prod_{i=1}^{l}E[\eta_i]\right| \le 10(l - 1)\,\alpha(\tau)^{(1 - Q)}\prod_{i=1}^{l}\|\eta_i\|_{p_i}.$$

Proof: See Roussas and Ioannides (1987).

Lemma A.2. Let $\xi_t$ be an $r$-dimensional strictly stationary and strong mixing (α-mixing) stochastic process. Let $\phi(\cdot, \cdot)$ be a symmetric Borel function defined on $R^{r} \times R^{r}$. Assume that for any fixed $x \in R^{r}$, $E[\phi(\xi_1, x)] = 0$ and $E[\phi(\xi_i, \xi_j) \mid \Omega_0^{j-1}] = 0$ for any $i < j$, where $\Omega_i^{j}$ denotes the σ-field generated by $\{\xi_s : i \le s \le j\}$. Let $\phi_{st} = \phi(\xi_s, \xi_t)$, $\sigma_{st}^2 = \mathrm{var}(\phi_{st})$ and $\sigma_T^2 = \sum_{1 \le s < t \le T}\sigma_{st}^2$. For some small constant $0 < \delta < 1$, let

$$M_{T1} = \max_{1 \le i < j < k \le T}\max\left\{E|\phi_{ik}\phi_{jk}|^{1+\delta},\ \int|\phi_{ik}\phi_{jk}|^{1+\delta}\,dP(\xi_i)\,dP(\xi_j, \xi_k)\right\},$$

$$M_{T21} = \max_{1 \le i < j < k \le T}\max\left\{E|\phi_{ik}\phi_{jk}|^{2(1+\delta)},\ \int|\phi_{ik}\phi_{jk}|^{2(1+\delta)}\,dP(\xi_i)\,dP(\xi_j, \xi_k)\right\},$$

$$M_{T22} = \max_{1 \le i < j < k \le T}\max\left\{\int|\phi_{ik}\phi_{jk}|^{2(1+\delta)}\,dP(\xi_i, \xi_j)\,dP(\xi_k),\ \int|\phi_{ik}\phi_{jk}|^{2(1+\delta)}\,dP(\xi_i)\,dP(\xi_j)\,dP(\xi_k)\right\},$$

$$M_{T3} = \max_{1 \le i < j < k \le T}E|\phi_{ik}\phi_{jk}|^{2},$$

$$M_{T4} = \max_{\substack{1 < i, j, k \le 2T \\ i, j, k\ \text{different}}}\left\{\max_{P}\int|\phi_{1i}\phi_{jk}|^{2(1+\delta)}\,dP\right\},$$