SPECIFICATION TESTING FOR ERRORS-IN-VARIABLES MODELS

Size: px

Start display at page:

Download "SPECIFICATION TESTING FOR ERRORS-IN-VARIABLES MODELS"

Lesley Chambers
6 years ago
Views:

1 SPECIFICATION TESTING FOR ERRORS-IN-VARIABLES MODELS TAISUKE OTSU AND LUKE TAYLOR Astract. This paper considers specification testing for regression models with errors-invariales and proposes a test statistic comparing the distance etween the parametric and nonparametric fits ased on deconvolution techniques. In contrast to the method proposed y Hall and Ma 2007, our test allows general nonlinear regression models. Since our test employs the smoothing approach, it complements the nonsmoothing one y Hall and Ma in terms of local power properties. The other existing method, y Song 2008, is shown to possess trivial power under certain alternatives. We estalish the asymptotic properties of our test statistic for the ordinary and supersmooth measurement error densities and develop a ootstrap method to approximate the critical value. We apply the test to the specification of Engel curves in the US. Finally, some simulation results endorse our theoretical findings: our test has advantages in detecting high-frequency alternatives and dominates the existing tests under certain specifications. 1. Introduction As is the case with most decisions, the choice to employ nonparametric techniques over parametric ones is not always ovious and making the wrong decision can e costly. If we are ale to confirm that a parametric model is correctly specified, we can gain consideraly y using parametric estimators. Meanwhile, if we are not fully convinced of this, we should appeal to nonparametric estimation. A popular solution to this prolem involves comparing the distance etween some parametric and nonparametric estimators; this has een studied in detail y Härdle and Mammen Other tests for the suitaility of parametric models have een studied y Azzalini, Bowman and Härdle 1989, Euank and Spiegelman 1990, Horowitz and Spokoiny 2001, and Fan and Huang 2001 among many others. Measurement error is a prolem that is rife in datasets from many disciplines. Examples from iology, economics, geography, medicine, and physics are aundant see, e.g., Fuller, 1987, and Meister, Determining the validity of a parametric model ecomes even more important in the presence of measurement error ecause in this setting nonparametric estimators have The authors gratefully acknowledge financial support from the ERC Consolidator Grant SNP Otsu and the ESRC Taylor. 1

2 even slower convergence properties whilst in many cases, parametric estimators retain their n- consistency. However, when the data are contaminated y measurement error, conventional specification tests have incorrect size in general and may also suffer from low power properties. In this paper, we propose a specification, or goodness-of-fit test, for possily nonlinear regression models with errors-in-variales y comparing the distance etween the parametric and nonparametric fits ased on deconvolution techniques. We estalish asymptotic properties of the test statistic and propose a ootstrap critical value. As we discuss elow, in contrast to existing methods, our test allows nonlinear regression models and possesses desirale power properties. In the enormous literature on specification testing, relatively little attention has een given to the issue of measurement error despite its ovious need. Papers such as Zhu, Song and Cui 2003, Zhu and Cui 2005, and Cheng and Kukush 2004 proposed χ 2 statistics ased on moment conditions of oservales implied from errors-in-variales regression models. However, as is the case without measurement error, these tests are generally inconsistent for some fixed alternatives. Song 2008 proposed a consistent specification test for linear errors-in-variales regression models y comparing nonparametric and model-ased estimators on the conditional mean function of the dependent variale Y given the mismeasured oservale covariates W, that is E[Y W ]. As we clarify at the end of Section 2, this approach may not have sensile local power for the original hypothesis on E[Y X], where X is a vector of error-free unoservale covariates. Hall and Ma 2007 proposed a nonsmoothing specification test for regression models with errors-in-variales, which is ale to detect local alternatives at the n-rate. We propose a smoothing specification test that complements Hall and Ma s 2007 test see further discussion elow. 1 Consistent specification tests can e roadly split into those that use a nonparametric estimator called smoothing tests and those that do not called nonsmoothing or integral-transform tests. In contrast to Hall and Ma 2007 which adopted the nonsmoothing approach, we propose a kernel-ased smoothing test for the goodness-of-fit of parametric regression models with errors-in-variales. There are two important features of our test. First, our smoothing test is not restricted to polynomial models and allows testing of general nonlinear regression models. Second, analogous to the literature on conventional specification testing, our smoothing test complements Hall and Ma s 2007 test if applied to polynomial models due to its distinct 1 Other papers that study specification testing under measurement error includes Butucea 2007, Holzmann and Boysen 2006, Holzmann, Bissantz and Munk 2007, and Ma et al for testing proaility densities, Koul and Song 2009, 2010 for Berkson measurement error models, and Song 2009 and Xu and Zhu 2015 for errors-in-variales models with validation data. 2

3 power properties. Rosenlatt 1975 explained that although local power properties of nonsmoothing tests suggest they are more powerful than smoothing tests, there are other types of local alternatives for which tests ased on density estimates are more powerful. Fan and Li 2000 showed that in the non-measurement error case, smoothing tests are generally more powerful for high-frequency alternatives and nonsmoothing tests are more powerful for low-frequency alternatives. Thus, smoothing tests should e viewed as complements to, rather than sustitutes for, [nonsmoothing tests]. Our simulation results suggest that this phenomenon extends to errors-in-variales models. In contrast to the aove papers and our own, Ma et al moved away from Wald-type tests where restricted and unrestricted estimates are compared. They proposed a local test that is more analogous to the score test where only the model under the null hypothesis must e estimated. They extended this idea to an omnius test that is ale to detect departures from the null in virtually all directions using a system of different asis functions with which to test against. To determine critical values for our smoothing test, we propose a ootstrap procedure. Measurement error can cause difficulties in applying conventional ootstrap procedures ecause the true regressor, regression error, and measurement error are all unoserved. Moreover, in order to estimate the distriutions of test statistics, deconvolution techniques are typically required which converge at a much slower rate than n. Hall and Ma 2007 discussed this issue and noted, the ootstrap is seldom used in the context of errors-in-variales. They outline a procedure which involves estimating the distriution of the unoservale regressor using a kernel deconvolution estimator and otained ootstrap counterparts for the regression error using a wild ootstrap method. We propose a much simpler procedure involving a perturation of each summand of our test statistic. This paper is organized as follows. Section 2 descries the setup in detail and introduces the test statistic and its motivation. Section 3 outlines the main asymptotic properties of the test statistic and discusses how to implement the test in the case where the distriution of the measurement error is unknown ut repeated measurements on the contaminated covariates are availale. Section 4 analyses the small sample properties of the test through a Monte Carlo experiment and Section 5 applies the test to the specification of Engel curves. All mathematical proofs are deferred to the Appendix. 3

4 2. Setup and test statistic Consider the nonparametric regression model Y = mx + U with E[U X] = 0, where Y R is a response variale, X R d is a vector of covariates, and U R is the error term. In this paper, we focus on the situation where X is not directly oservale due to the measurement mechanism or nature of the environment. Instead a vector of variales W is oserved through W = X + ɛ, where ɛ R d is a vector of measurement errors that has a known density f ɛ and is independent of Y, X. The case of unknown density f ɛ will e discussed in Section 3.1. We are interested in specification, or goodness-of-fit, testing of a parametric functional form of the regression function m. More precisely, for a parametric model m θ, we wish to test the hypothesis H 0 : mx = m θ x for almost every x R d, H 1 : H 0 is false, ased on the random sample {Y i, W i } n i=1 of oservales whilst X i is unoservale. To test the null H 0, we adapt the approach of Härdle and Mammen 1993, which compares nonparametric and parametric regression fits, to the errors-in-variales model. As a nonparametric estimator of m, we use the deconvolution kernel estimator see, e.g., Fan and Truong, 1993, and Meister, 2009, for a review ˆmx = n i=1 Y ik x W i n i=1 K x W i, where K a = 1 2π d e it a Kft t fɛ ft t dt, is the so-called deconvolution kernel, i = 1, is a andwidth, and K ft and f ft ɛ are the Fourier transforms of a kernel function K and the measurement error density f ɛ, respectively. 2 Throughout the paper we assume f ft ɛ t 0 for all t R d and K ft has compact support so that the aove integral is well-defined. On the other hand, if one imposes a parametric functional 2 To simplify the exposition, we concentrate on the case where all elements of X are mismeasured. If X contains oth correctly measured and mismeasured covariates denoted y X 1 and X 2, respectively, then the kernel ni=1 Y estimator is modified as ˆmx = i K 1 x 1 X 1i K x 2 W i, where K ni=1 K 1 x 1 X 1i K x 2 W i 1a = 1 a d 1 K1 and K1 is a conventional kernel function for X 1, and analogous results can e estalished. 4

5 form m θ on the regression function, several methods are availale to estimate θ under certain regularity conditions. For example, ased on Butucea and Taupin 2008, we can estimate the parameter θ y the weighted least squares regression of Y on the implied conditional mean function E[m θ X W ]. In this paper, we do not specify the construction of the estimator ˆθ for θ except for a mild assumption on the convergence rate see Section 3 for details. In order to construct a test statistic for H 0, as in Härdle and Mammen 1993, we compare the nonparametric and parametric estimators of the regression function ased on the L 2 -distance, D n = n ˆmx ˆfx [K mˆθ ˆf]x 2 dx, where is the Euclidean norm, ˆfx = 1 n n i=1 K x W i is the deconvolution kernel density estimator for X, K x = 1 d K x, and [K mˆθ ˆf]x = K x amˆθa ˆfada is a convolution. The convolution y the original kernel function K plays an analogous role to the smoothing operator in Härdle and Mammen Note that the Fourier transform of a convolution is given y the product of the Fourier transforms. Thus y Parseval s identity, the distance D n is alternatively written as D n = n K ft t 2 2π d fɛ ft t 2 1 n n Y i e it W i [mˆθ ˆf] ft tfɛ ft t i=1 2 dt. Based on this expression, the distance D n can e interpreted as a contrast of the nonparametric and model-ased estimators for E[Y e it W ]. Let ζ i t = Y i e it W i e is W i m fṱt s Kft s dsf ft θ f ft ɛ t. To define the test statistic for H 0, we further decompose D n as ɛ s D n = 1 n n i=1 B n + T n, 1 K ft t 2 2π d fɛ ft t 2 ζ it 2 dt K ft t 2 n 2π d f ft ɛ t 2 ζ itζ j tdt 1 where ζ j t is the complex conjugate of ζ j t. The second term T n plays a dominant role in the limiting ehaviour of D n and the first term B n is considered a ias term. That is to say, using very similar techniques to those in the Appendix, B n can e shown to e of smaller asymptotic order than T n. Therefore, we neglect B n and employ T n as our test statistic for H 0. In the next section, we study the asymptotic ehaviour of T n. We close this section y a remark on an alternative testing approach. To test the null hypothesis H 0, one may consider testing some implication of H 0 on the conditional mean E[Y W ] of oservales, i.e., consider H 0 : f W we[y W = w] = m θ w uf X w uf ɛ udu for almost every w, and test H 0 y a conventional method, such as Härdle and Mammen This 5

6 approach was employed y Song To clarify the rationale of our testing approach ased on T n against the conventional approach for H 0, let us consider the following local alternative hypothesis for the regression function sin x m n x = m θ x + 2a n cosa n x, x where a n 0 and A n as n. In this case, m n converges to m θ at the rate of a n under the L 2 -norm, and the test ased on T n will have non-trivial power for a certain rate of a n. On the other hand, local power of the test ased on the implied null H 0 is determined y the L 2 -norm of the convolution {m n m θ f X } f ɛ. By Parseval s identity and the Fourier shift formula, we have {m n m θ f X } f ɛ 2 = a 2 n {q ft A n + q ft + A n }fɛ ft 2, where qx = sin x x fx x. For example, if f ɛ is Laplace with fɛ ft t = 1/1 + t 2, then we can see that the L 2 -norm {m n m θ f X } f ɛ is of order a n /A 2 n. By letting A n diverge at an aritrarily fast rate, the rate a n /A 2 n ecomes aritrarily fast so that any conventional test for H 0 fails to detect deviations from this null. Therefore, as far as the researcher is concerned with testing the functional form of the regression function m, we argue that our statistic T n tests directly the null hypothesis H 0 and possesses desirale local power properties compared to the conventional tests on H 0. Song 2008 is very open aout the fact that the test proposed in that paper is not consistent and explains that consistency is only achieved when f ɛ z, as a distriution family with parameter z R d, forms a complete family. 3. Asymptotic properties In this section, we present asymptotic properties of the test statistic T n defined in 1. We first derive the limiting distriution of T n under the null hypothesis H 0. To this end, we impose the following assumptions. Assumption D. i: {Y i, X i, ɛ i } n i=1 are i.i.d. ɛ is independent of Y, X and has a known density f ɛ. ii: f ft, m ft, θ mft θ L 1R d L 2 R d. iii: K ft t is compactly supported on [ 1, 1] d, is symmetric around zero i.e., K ft t = K ft t, and is ounded. iv: As n, it holds that 0 and n d. 6

7 Assumption D i is common in the literature of classical measurement error. Extensions to the case of unknown f ɛ will e discussed in Section 3.1. Assumption D ii contains oundedness conditions on the Fourier transforms of the density f of X and the regression function m, as well as the derivative, with respect to θ, of the Fourier transform of m θ. Assumption D iii and iv contain standard conditions on the kernel function K and andwidth, respectively. A popular choice for the kernel function in the context of deconvolution methods is the sinc kernel Kx = sin x πx whose Fourier transform is equal to Kft t = I{ 1 t 1}. For simplicity, we use product kernels in the following way. Following Masry 1993, let K e a univariate kernel and K ft denote its Fourier transform. Define the univariate deconvolution kernel as where dimx j=1 K x j = 1 2π f ft ɛ j is the Fourier transform of ɛ j. K x j. write f ft ɛ t = dimt j=1 e ita K ft t f ɛ ft j t/ dt. Finally, set Kx = dimx j=1 Kx j and K x = Since we assume that ɛ is vector valued with independent elements, we can f ft ɛ j t j. Comining these facts, we have K ft x = dimx j=1 K ft x j. We impose additional assumptions ased on the ounds on the decay rate of the tail of the characteristic function of the measurement error, f ft ɛ. Let σ 2 x = E[U 2 X = x] e the conditional variance of the error term. The first case, called the ordinary smooth measurement error case, contains the following assumptions. Assumption O. i: f ft ɛ t 0 for all t R d and there exist positive constants c, C, and α such that c s α ft f ɛ j s C s α, for all 1 j d as s. ii: [mf] ft t 2 dt <, [m 2 f] ft t 2 dt <, and [σ 2 f] ft t 2 dt <. iii: ˆθ θ = o p n 1/2 d 1 4 +α under H 0. Assumption O i requires that the Fourier transform f ft ɛ j decays in some finite power. A popular example of an ordinary smooth density is the Laplace density. Assumption O ii contains oundedness conditions on the Fourier transforms of the density f of X, regression function m, and conditional error variance σ 2. Assumption O iii is on the convergence rate of the estimator ˆθ for θ when the parametric model is correctly specified. Note that this assumption is satisfied if ˆθ is n-consistent for θ. When the regression model under the null hypothesis is linear i.e., 7

8 m θ x = x θ, we can employ the methods in, for example, Gleser 1981, Bickel and Ritov 1987, or van der Vaart For nonlinear regression, we may choose the estimators y e.g., Taupin 2001 or Butucea and Taupin 2008 under certain regularity conditions. It is interesting to note that in contrast to the no measurement error case as in Härdle and Mammen 1993, the limiting distriution of the estimation error nˆθ θ does not influence the first-order asymptotic properties of the test statistic T n. This is due to the fact that the measurement error slows down the convergence rate of the dominant term of T n. For the second case, known as the supersmooth measurement error case, we impose the following assumptions. Assumption S. and γ > 1 such that i: f ft ɛ t 0 for all t R d and there exist positive constants C ɛ, µ, γ 0, f ft ɛ j s C ɛ s γ 0 e s γ /µ, as s. Also, there exist constants A > 0 and β 0 such that K ft 1 s = As β + os β, as s 0. ii: E[Y 4 ] <, E[W 4 ] <, t 2dβ θ mft θ t 2 dt <, and t 2dβ m ft t 2 dt <. iii: ˆθ θ = o p n 1/2 dγ 1/2+γβ+γ 0 e d/µγ. Assumption S i is adopted from Holzmann and Boysen This assumption requires that the Fourier transform f ft ɛ decays at an exponential rate. An example of the supersmooth density satisfying this assumption is the normal density, where C ɛ = 1, γ 0 = 0, γ = 2, and µ = 2. However, due to the requirement γ > 1, the Cauchy density is excluded. As is clarified in the proof of Theorem 1 iii elow, the condition γ > 1 is imposed to make a ias term negligile. Assumption S i also contains an additional condition on the kernel function. For example, the sinc kernel Kx = sin x πx satisfies this assumption with A = 1 and β = 0. Similarly to the ordinary smooth case, Assumption S ii contains oundedness conditions on the Fourier transforms, and Assumption S iii regards the convergence rate of the estimator ˆθ. Again, the n-consistency of ˆθ is sufficient. Under these assumptions, the null distriution of T n is otained as follows. Theorem 1. 8

9 i: Suppose that Assumptions D and O hold true. Then under H 0, C 1/2 d 2 V, T n N 0, 2π 2d, where C V, = O d1+4α is defined in 3 in the Appendix. ii: Suppose that Assumptions D and S hold true and ɛ N0, 1. Then under H 0, ϕt n d λ k Zk 2 1, k=1 2π where ϕ = d with the gamma function Γ, {Z k} is an independent sequence of standard normal random variales and {λ k } is defined in 13 in d1+4β e d/2 A 2d Γ1+2dβ the Appendix. iii: Suppose that Assumptions D and S hold true with d = 1. Then under H 0, φt n d λ k Zk 2 1, k=1 where φ = 2π d γ 1+2dβ C 2d ɛ µ 1+2dβ dγ 1+2γβ+2γ 0 e 2d/µγ A 2 Γ1+2dβ, {Z k} is an independent sequence of standard normal random variales and {λ k } is defined in 13 in the Appendix. Theorem 1 i says that for the ordinary smooth case, the test statistic T n is asymptotically normal. The normalizing term C V, comes from the variance of the U-statistic of the leading term in T n. Note that the convergence rate C 1/2 V, = O d α of the statistic T n is slower than the rate O d/2 of Härdle and Mammen s 1993 statistic for the no measurement error case. As the dimension d of X or the decay rate α of f ft ɛ increases, the convergence rate of T n ecomes slower. This theorem can also e used to show the rule-of-thum rate for the andwidth is of the order = n 1 4+d+2α. Theorem 1 ii focuses on the case of normal measurement error and shows that the test statistic converges to a weighted sum of chi-squared random variales. The normalizing term ϕ is characterized y the shape of the kernel function specified in Assumption S i. For example, if we employ the sinc kernel i.e., A = 1 and β = 0, the normalization ecomes ϕ = 2πd. d e d/2 Γd In this supersmooth case, the non-normal limiting distriution emerges ecause the leading term of the statistic T n is characterized y the degenerate U-statistic with a fixed kernel see, e.g., Serfling, 1980, Theorem In contrast, for the ordinary smooth case in Part i of this theorem, the leading term is characterized y a U-statistic with a varying kernel so that the 9

10 central limit theorem in Hall 1984 applies. An analogous result is otained in Holzmann and Boysen 2006 for the integrated squared error of the deconvolution density estimator. Theorem 1 iii presents the limiting null distriution of the test statistic for the case of general supersmooth measurement error. In this case, after normalization y ϕ, the test statistic oeys the same limiting distriution as the normal case in Part ii of this theorem. Thus, similar comments to Part ii apply. The normalization term ϕ is characterized y the shapes of the kernel function and Fourier transform f ft ɛ t of the measurement error specified in Assumption S i. Theorem 1 can e applied to otain critical values for testing the null H 0 ased on T n. Alternatively, we can compute the critical values y ootstrap methods. A ootstrap counterpart of T n is given y perturing each summand in T n as follows Tn = 1 νi νj 1 K ft t 2 n 2π d fɛ ft t 2 ζ itζ j tdt, 2 where {ν i }n i=1 is an i.i.d. sequence which is mean zero, unit variance, and independent of {Y i, W i } n i=1. The asymptotic validity of this ootstrap procedure follows y a similar argument to Delgado, Dominguez and Lavergne 2006, Theorem 6. In order to investigate the power properties of the test ased on T n, we consider a local alternative hypothesis of the form H 1n : mx = m θ x + c n x, for almost every x R d where c n 0 and x is a non-zero function such that the limits lim n n and lim n Υ n defined in 15 and 16, respectively, in the Appendix exist. The local power properties are otained as follows. 10

11 Theorem 2. i: Suppose that Assumptions D and O hold true. Then under H 1n with c n = n 1/2 d 1 4 +α, C 1/2 d V, T n N lim 2 n, n 2π 2d. ii: Suppose that Assumptions D and S hold true with d = 1 and ɛ N0, 1. Then under H 1n with c n = n 1/2 d1/2+2β e d/22, ϕt n d lim n Υ n + λ k Zk 2 1. k=1 iii: Suppose that Assumptions D and S hold true with d = 1. Then under H 1n with c n = dλ 1/2+λβ+λ 0 e d/µλ, ϕt n d lim n Υ n + λ k Zk 2 1, k=1 Theorem 2 i says that under the ordinary smooth case, our test has non-trivial power against local alternatives drifting with the rate of c n = n 1/2 d 1 4 +α. This is a nonparametric rate, and the test ased on T n ecomes less powerful as the dimension d of X or the decay rate α of f ft ɛ increases. For the no measurement error case, Härdle and Mammen s 1993 statistic has non-trivial power for local alternatives with the rate of n 1/2 d/4. Therefore, as expected, the test ecomes less powerful in the presence of measurement error. Theorem 2 ii and iii present local power properties of our test for the normal and general supersmooth measurement error cases, respectively. Except for the normalizing constants, the test statistic has the same limiting distriution. Also, for c n 0, the andwidth should decay at a log rate. As an example, consider the case of ɛ N0, 1. In this case, if we choose d 1/2, logn then the rate for the local alternative will e c n d d1/4+β. logn So the rate at which we can detect local alternatives is typically a log rate in the supersmooth measurement error case Case of unknown f ɛ. In practical applications, it is unrealistic to assume that the density of the measurement error, f ɛ, is known to the researcher. In the literature on nonparametric deconvolution, several estimation methods for f ɛ are availale, these are typically ased on additional data see, e.g., Section 2.6 of Meister 2009 for a review. Although the analysis of the asymptotic properties is different, we can modify the test statistic T n y inserting the estimated Fourier transform of the measurement error density, ˆf ft ɛ. For example, suppose the researcher has access to repeated measurements on X in the form of W = X + ɛ and W r = X + ɛ r, where ɛ and ɛ r are identically distriuted and X, ɛ, ɛ r 11

12 are mutually independent, see Delaigle, Hall and Meister 2008 for a list of examples of such repeated measurements. If we further assume that the Fourier transform f ft ɛ is real-valued that is the density f ɛ is symmetric around zero, then we can employ the estimator proposed y Delaigle, Hall and Meister 2008 ˆf ft ɛ t = 1 n n cos{tw i Wi r } i=1 1/2. Delaigle, Hall and Meister 2008 studied the asymptotic properties of the deconvolution density and regression estimators using ˆf ft ɛ and found conditions to guarantee that the differences etween the estimators with known f ɛ and those with unknown f ɛ are asymptotically negligile. Under similar conditions, we can expect that the asymptotic distriutions of the test statistic T n otained aove remain unchanged when we replace f ft ɛ remove the assumption that f ft ɛ with ˆf ft ɛ. If the researcher wishes to is real-valued and ɛ and ɛ r are identically distriuted, it may e possile to employ the estimator y Li and Vuong 1998 ased on Kotlarski s identity. 4. Simulation We evaluate the small sample performance of our test through a Monte Carlo experiment. To egin, we consider the same data generating process as Hall and Ma 2007 for ease of comparison. We also compare our test to Song Recall that although Song s 2008 and Hall and Ma s 2007 test are confined to polynomial regression models, our test allows nonlinear models. We take the true unoservale regressor {X i } n i=1 to e distriuted as U[ 3, 4] and Y i = X i + C cosx i + U i, where U i N0, 1 and C is a constant to e varied. The contaminated regressor is given y W i = X i + ɛ i. We consider two distriutions for ɛ i to e drawn from. For the ordinary smooth case, we use the Laplace distriution with variance of 0.5. For the supersmooth case, we use N0, 1. We use the following kernel for all of our simulations Fan, 1992 Kx = 48 cosx πx sinx x 2 2 πx 5 5x 2. We report results for a range of sample sizes, andwidths, and nominal levels of the test. Specifically, for the ordinary and supersmooth cases, we choose the andwidths according to the rules of thum = c 5σ 4 1/9 n and = c 2σ 2 1/2, logn respectively, where σ is the standard deviation of the measurement error and c varies in the grid {0.01, 0.05, 0.1, 0.5, 1, 1.5} so we can analyse the sensitivity of our test to the andwidth. For the parametric estimator, we use the polynomial estimator of degree 2 proposed y Cheng and Schneeweiss 1998 so as to remain consistent with 12

13 the experiment conducted y Hall and Ma For the test of Song 2008 we use the same kernel as for our test and choose andwidths y cross-validation there was little dependence on the andwidths so we report only for these cross-validated values. All results are ased on 1000 Monte Carlo replications. Tale 1 takes C = 0 so as to assess the level accuracy of our test. To study the power properties of the test, we take C = 1.5 in Tale 2. The critical values for all tests are ased on 99 replications of the ootstrap procedure results were very similar for 199 replications and hence are not reported. The perturation random variale ν for the ootstrap is drawn from the Rademacher distriution. Finally, to highlight the power advantages of our test under high-frequency alternatives, we consider the slightly altered data generating process Y i = X i + cosπδx i + U i, where δ is a constant to e varied; larger values corresponding to higher frequency alternatives. All other parameter settings remain unchanged. Results for these experiments are shown in Tales 3-5. The column laelled HM corresponds to the power of the test proposed y Hall and Ma 2007 and the column laelled S corresponds to the power of the test proposed y Song Tale 1: Y = X + U Ordinary Smooth Bandwidth n Level % 3.0% 2.6% 2.3% 3.1% 2.4% 0.6% 50 5% 7.6% 7.1% 6.3% 6.5% 6.4% 2.4% 10% 12.7% 11.9% 10.7% 9.8% 10.1% 5.7% 1% 2.2% 2.0% 2.8% 2.3% 2.2% 0.6% 100 5% 4.9% 5.6% 6.8% 6.3% 6.9% 3.4% Super Smooth 10% 11.6% 10.5% 11.9% 12.4% 11.1% 7.3% 1% 2.1% 1.9% 1.3% 1.4% 1.9% 1.2% 50 5% 5.3% 5.2% 4.9% 5.4% 5.7% 4.3% 10% 11.1% 9.3% 9.6% 10.8% 10.4% 7.7% 1% 2.9% 2.4% 2.7% 2.0% 1.7% 1.9% 100 5% 6.9% 6.5% 6.0% 6.6% 5.3% 5.8% 10% 12.3% 10.7% 10.3% 10.6% 10.8% 10.5% 13

14 Tale 2: Y = X cosx + U Ordinary Smooth Bandwidth n Level HM S 1% 43.2% 33.7% 26.5% 40.6% 81.8% 63.3% 71.0% 86.1% 50 5% 63.1% 52.7% 45.5% 61.6% 92.1% 78.3% 76.9% 92.8% 10% 73.5% 63.1% 57.8% 72.2% 94.8% 86.2% 85.3% 95.3% 1% 67.4% 50.1% 38.5% 66.5% 99.3% 97.6% 95.8% 88.6% 100 5% 84.2% 71.3% 61.4% 85.1% 99.8% 99.2% 97.1% 99.1% Super Smooth 10% 91.0% 80.1% 73.0% 92.3% 99.9% 99.9% 99.1% 99.9% 1% 28.1% 22.3% 19.7% 12.2% 24.5% 36.1% 66.2% 67.2% 50 5% 46.6% 38.7% 33.5% 24.1% 44.5% 55.2% 72.0% 84.2% 10% 57.4% 48.4% 43.2% 34.9% 56.9% 67.3% 80.4% 88.6% 1% 54.0% 35.9% 23.5% 15.4% 47.2% 69.0% 94.3% 89.3% 100 5% 70.7% 52.5% 41.5% 28.8% 63.8% 84.6% 95.9% 94.5% 10% 78.1% 64.5% 53.9% 42.1% 73.7% 89.8% 97.7 % 97.8% 14

15 Tale 3: Y = X + cosπx + U Ordinary Smooth Bandwidth n Level HM S 1% 21.3% 15.9% 13.7% 12.9% 6.6% 2.4% 13.2% 8.9% 100 5% 40.3% 30.3% 26.5% 27.5% 17.6% 6.8% 27.5% 19.6% 10% 51.4% 45.0% 38.2% 38.0% 26.5% 13.6% 38.7% 26.8% 1% 35.6% 19.7% 12.7% 19.4% 11.0% 2.6% 20.3% 14.4% 200 5% 54.5% 37.2% 27.9% 36.2% 22.2% 8.7% 37.4% 24.4% Super Smooth 10% 66.4% 50.0% 39.8% 49.0% 32.2% 17.0% 50.0% 39.0% 1% 15.3% 12.4% 7.3% 4.3% 4.0% 2.6% 4.7% 11.2% 100 5% 29.0% 22.3% 18.1% 11.5% 9.6% 7.1% 11.0% 17.4% 10% 38.7% 31.7% 27.3% 19.5% 17.1% 14.8% 20.3% 30.3% 1% 23.8% 16.7% 10.6% 4.3% 5.3% 3.0% 5.7% 14.7% 200 5% 40.9% 29.1% 22.0% 13.7% 11.0% 8.2% 14.2% 26.0% 10% 52.9% 38.4% 32.3% 21.3% 18.6% 12.7% 24.3% 34.9% 15

16 Tale 4: Y = X + cos2πx + U Ordinary Smooth Bandwidth n Level HM S 1% 20.9% 16.7% 13.8% 6.9% 5.6% 1.8% 9.2% 7.3% 100 5% 38.7% 31.5% 28.0% 17.1% 12.5% 5.6% 20.6% 13.6% 10% 49.3% 43.2% 37.8% 25.4% 19.5% 9.9% 29.7% 23.7% 1% 35.9% 20.8% 15.3% 7.8% 4.8% 1.3% 9.3% 13.6% 200 5% 55.9% 37.4% 28.9% 18.4% 11.2% 4.6% 21.7% 23.8% Super Smooth 10% 66.8% 49.8% 40.4% 28.6% 17.6% 10.1% 31.4% 31.9% 1% 16.1% 11.2% 9.1% 5.3% 4.0% 3.1% 5.2% 7.6% 100 5% 30.4% 22.4% 17.9% 12.6% 11.0% 7.3% 11.6% 17.3% 10% 41.3% 35.0% 26.1% 20.6% 17.4% 13.7% 19.3% 27.1% 1% 23.6% 13.4% 9.4% 5.1% 4.7% 3.7% 5.3% 8.9% 200 5% 39.4% 25.0% 20.0% 11.8% 12.0% 8.4% 13.1% 25.0% 10% 50.8% 35.8% 30.5% 20.5% 19.4% 13.9% 20.3% 32.2% 16

17 Tale 5: Y = X + cos3πx + U Ordinary Smooth Bandwidth n Level HM S 1% 22.8% 17.0% 11.8% 7.9% 5.4% 1.8% 9.4% 7.3% 100 5% 39.3% 32.6% 28.2% 17.7% 12.9% 5.9% 20.8% 19.0% 10% 50.6% 46.1% 38.1% 27.2% 19.8% 10.4% 31.8% 27.1% 1% 36.4% 20.1% 14.0% 8.1% 4.2% 1.9% 9.9% 7.9% 200 5% 54.1% 36.3% 29.2% 19.4% 10.7% 5.0% 21.8% 21.8% Super Smooth 10% 67.1% 48.9% 40.0% 27.6% 18.0% 10.3% 32.0% 27.7% 1% 17.4% 10.9% 7.5% 4.7% 4.6% 3.3% 5.6% 8.5% 100 5% 31.4% 22.8% 19.6% 12.8% 10.1% 9.1% 12.0% 17.4% 10% 42.0% 33.2% 28.3% 20.8% 16.9% 14.3% 21.0% 26.1% 1% 21.5% 15.6% 9.3% 5.5% 4.6% 2.9% 4.8% 14.0% 200 5% 39.1% 29.0% 21.6% 12.3% 11.6% 8.1% 14.1% 22.5% 10% 50.9% 37.5% 30.2% 21.1% 17.9% 13.7% 22.6% 27.8% The results are encouraging and seem to e consistent with the theory. Tale 1 indicates that our test tracks the nominal level relatively closely. There does appear to e some dependence on the andwidth; smaller andwidths tending to lead to an over-rejection and larger andwidths leading to an under-rejection of the null hypothesis. Tale 2 gives a direct comparison to the tests proposed in Hall and Ma 2007 and Song As we expected, in this low-frequency alternative setting, our test is generally slightly less powerful than the other tests. Having said this, in the ordinary smooth case for several choices of andwidth our test does display the highest power. Hall and Ma s 2007 test is ale to detect local alternatives at the n-rate for oth ordinary and supersmooth measurement error distriutions, and the test of Song 2008 is ale to detect local alternatives at the rate n d/2 in oth cases. However, our test achieves a slower polynomial rate in the ordinary smooth case and only a logn-rate in the super smooth case. Thus it is not surprising to see our test underperform when faced with Gaussian measurement error. However, the test is still ale to enjoy considerale power in this case especially for larger sample sizes. 17

18 On the other hand, as mentioned earlier, we suspect that our test is etter suited to detecting high-frequency alternatives than Hall and Ma This is confirmed in Tales 3-5. We find that for smaller andwidths our test is more powerful across the range of δ. Unfortunately, the power of our test shows considerale variation across the andwidth choices. For smaller andwidths, the power is generally much higher. This is intuitive and is explained in Fan and Li Nonsmoothing tests can e thought of as smoothing tests ut with a fixed andwidth. Thus it is the asymptotically vanishing nature of the andwidth in smoothing tests that allow for the superior detection of high-frequency alternatives. When smaller andwidths are employed, the test is etter ale to pick up on these rapid changes. As discussed at the end of Section 2, the test of Song 2008 will have poor power properties for some high-frequency alternatives due to testing the hypothesis ased on E[Y W ] rather than E[Y X]. This fact is also reflected in the Monte Carlo simulations where the power falls as we move to higher frequency alternatives and is inferior to the test we propose. Interestingly the test of Song 2008 appears to dominate the test of Hall and Ma 2007 for the supersmooth case ut not for the ordinary smooth case. We can learn from these simulations that for reasonaly small samples with supersmooth measurement error, perhaps the tests proposed y Hall and Ma 2007 or Song 2008 would e a wiser choice if one suspects deviations from the null of a low-frequency type, otherwise our test appears to e superior. However, we suggest that to avoid any risk of very low power the test proposed in this paper may e the est option. In order to account for the dependence of our test on the andwidth, we may look to employ the ideas of Horowitz and Spokoiny 2001 to construct a test that is adaptive to the smoothness of the regression function. In order to do this we could construct a test statistic of the form T A,n = max n B n T n where B n is a finite set of andwidths. To otain valid critical values we can use a ootstrap procedure similar to the one proposed in Section 3. Specifically we construct a ootstrap counterpart as T A,n = max n B n T n where T n is defined in 2. It is eyond the scope of this paper to determine the asymptotic properties of such a test ut this could prove to e a fruitful area for future research. 18

19 5. Empirical Example We apply our test to the specification of Engel curves for food, clothing and transport. An Engel curve descries the relationship etween an individual s purchases of a particular good and their total resources and hence provides an estimate of a good s income elasticity. Much work has een carried out on the estimation of Engel curves and in particular, the correct functional form which has een shown to significantly affect estimates of income elasticity see for example Leser, Hausmann, Newey and Powell 1995 highlighted the prolem that measurement error plays in the estimation of Engel curves. To the est of our knowledge, no previous work has tested the parametric specification of Engel curves whilst accounting for the inherent measurement error in the data. We concentrate on the Working-Leser specification put forward y Leser 1963 Y gi = a 0 + a 1 Xi logx i + a 2 X i + U i where Y gi is the expenditure on good g of consumer i and X i is the true total expenditure of consumer i. It is commonly elieved that the measurement error in total expenditure is multiplicative, hence we take X i = logx i as our true regressor and adjust the specification accordingly, as in Schennach We use data from the Consumer Expenditure Survey where we take the third quarter of 2014 as our sample, giving 4312 oservations. To account for the measurement error we make use of repeated measurements of X. Specifically, we use total expenditure from the current quarter as one measurement and total expenditure from the previous quarter as the other. To estimate the parametric form we employ the estimator proposed y Schennach For the nonparametric estimator, we use Delaigle, Hall and Meister 2008 and select the andwidth using the cross-validation approach also proposed in that paper. To analyse the sensitivity of our test to the choice of andwidth we report results for various andwidths around the cross-validated choice of We use the same kernel and ootstrap procedure as implemented in the Monte Carlo simulations. We should riefly mention that endogeneity may e an additional complication in this setting see for example Blundell, Browning and Crawford, To the est of our knowledge, there are no tests ale to deal with this dual prolem, however the results of Adusumilli and Otsu 2017 may provide a possile solution. Of course, in that situation one would also require a suitale instrument which may prove to e elusive. 19

20 Tale 6 reports the p-value for our specification test on food, clothing and transport for a range of andwidths. Tale 6: Engel curve p-values Good Bandwidth Food Clothing Transport We can see that the test is fairly roust to the choice of andwidth. The parametric specification is rejected for all andwidths apart from 0.5 where we fail to reject the null hypothesis at the 5% level for food and at the 10% level for clothing. Interestingly Härdle and Mammen 1993 otained similar findings in the case of transport ut tended to fail to reject the Working-Leser specification for food. Thus, it appears that accounting for measurement error is indeed very important to draw correct conclusions and must not simply e ignored. 20

21 Appendix A. Mathematical Appendix Hereafter, f g means f/g 1 as 0. A.1. Proof of Theorem 1. A.1.1. Proof of i. First, we define the normalization term C V, and characterize its asymptotic order. Let ξ i t = Y i e it W i H i,j = K ft t 2 e is W i m ft t s Kft s ft fɛ ft dsfɛ t, s f ft ɛ t 2 ξ itξ j tdt. Then C V, is defined as C V, = E[H1,2] 2 K ft t 1 2 K ft t 2 2 = fɛ ft t 1 2 fɛ ft t { } [m 2 f] ft t 1 + t 2 + [σ 2 f] ft t 1 + t 2 fɛ ft t 1 + t 2 fw ft s 1 + s 2 m ft t 1 s 1 m ft t 2 s 2 Kft s 1 K ft s 2 fɛ ft s 1 fɛ ft s 2 ds 1ds 2 fɛ ft t 1 fɛ ft t 2 [mf] ft t 2 + s 1 fɛ ft t 2 + s 1 m ft t 1 s 1 Kft s 1 fɛ ft s 1 ds 1fɛ ft t 1 [mf] ft t 1 + s 1 fɛ ft t 1 + s 1 m ft t 2 s 1 Kft s 1 fɛ ft s 1 ds 1fɛ ft 2 t 2 dt 1 dt 2. To find the order of C V,, we consider the square of each of these four terms and all of their cross products. For example, the square of the first term is ounded y 3 K ft t 1 2 K ft t 2 2 { } fɛ ft t 1 2 fɛ ft t 2 2 [m 2 f] ft t 1 + t 2 + [σ 2 f] ft t 1 + t 2 fɛ ft t 1 + t 2 2 dt 1 dt 2 2d 4dα K ft a 1 2 K ft a 2 2 a 1 2dα a 2 2dα 1 + a1 + a 2 / 2 dα [m 2 + σ 2 f] ft a 1 + a 2 / 2 da 1 da 2 d 4dα 1 + a 2 dα [m 2 + σ 2 f] ft a 2 da K ft a 2 4 a 2 4dα da 2 = O d1+4α, 4 where the first wave relation follows from the change of variales a 1, a 2 = t 1, t 2 and Assumption O i, the second wave relation follows from the change of variales a = a 1 + a 2 /, and the equality follows from Assumption D iii and O ii. Since all other squared and cross terms can e ounded in the same manner, we otain C V, = O d1+4α. 21

22 Second, we show that the estimation error of θ is negligile for the limiting distriution of T n. Decompose ζ i t = ξ i t + ρ i t, where ρ i t = Then the test statistic T n is written as T n = 1 n + 1 n e is W i {m ft θ t s mfṱ 1 K ft t 2 2π d fɛ ft t 2 ξ itξ j tdt + 1 n θ t s s}kft fɛ ft s dsf ft ɛ t. 1 K ft t 2 2π d fɛ ft t 2 ρ itρ j tdt 1 K ft t 2 2π d fɛ ft t 2 ρ itξ j tdt K ft t 2 n 2π d f ft ɛ t 2 ξ itρ j tdt T n + R 1n + R 2n + R 3n. By an expansion around ˆθ = θ and Assumption O iii, the term R 1n satisfies R 1n = o p d/2 2α 1 1 K ft t 2 n 2 2π d fɛ ft t 2 ρ 1itρ 1j tdt, 5 where ρ 1i t = e is W i mft K ft s dsf ft fɛ ft s ɛ t. By the Cauchy-Schwarz inequality and Assumption D ii, θ t s θ [ K ft t 2 ] E fɛ ft t 2 ρ 1itρ 1j tdt = K ft t 2 f ft s m ft θ θ t s K ft sds = O dt Also, y applying the same argument to 4 under Assumption O ii, we have E [ K ft t 2 f ft ɛ t 2 ρ 1itρ 1j tdt 2 ] = O d1+4α. 7 Comining 5-7 and C V, = O d1+4α, we otain C 1/2 V, R 1n = o p 1. In the same manner we can show C 1/2 V, R 2n = o p 1 and C 1/2 V, R 3n = o p 1 under Assumption O ii-iii and thus C 1/2 V, T n = C 1/2 V, T n + o p 1. Thirdly, we derive the limiting distriution of C 1/2 T n. Note that T n is written as T n = 1 n 1 2π d H i,j and is a U-statistic with zero mean ecause E[Y expit W ] = [m θ f] ft tf ft ɛ t under H 0. To prove the asymptotic normality of T n, we apply the central limit theorem in Hall 1984, Theorem 1. To this end, it is enough to show V, E[H1,2 4 ] ne[h1,2 2 0, and E[G 2 1,2 ] ]2 E[H1,2 2 0, 8 ]2 22

23 where G i,j = E[H 1,i H 1,j Y 1, W 1 ]. Recall that C V, = E[H 2 1,2 ] defined in 3 satisfies C V, = O d 4α. By a similar argument to ound E[H1,2 2 ] in 4, we can show [ E[H1,2] 4 = E 4 k=1 For E[G 2 1,2 ], we can equivalently look at3 ] K ft t k 2 fɛ ft t k 2 ξ 1t k ξ 2 t k dt 1 dt 4 = O 3d1+8α. = E[H 1,3 H 1,4 H 2,3 H 2,4 ] 4 K ft t k 2 fɛ ft t k 2 ξ 1t 1 ξ 3 t 1 ξ 1 t 2 ξ 4 t 2 ξ 2 t 3 ξ 3 t 3 ξ 2 t 4 ξ 4 t 4 dt 1 dt 4 k=1 = O d1+8α. These results comined with Assumption D iv guarantee the conditions in 8. Thus, Hall 1984, Theorem 1 implies and the conclusion follows. C 1/2 V, T n d N 0, 2 2π 2d, A.1.2. Proof of ii. A similar argument to the proof of Part i guarantees ϕt n = ϕ T n + 2π o p 1, where ϕ = d. Thus, we hereafter derive the limiting distriution d2+4β e d/2 A 2d Γ1+2dβ of T n. Decompose T n = T n + r 1n + r 2n + r 3n, where T n = 1 1 K ft t 2 n 2π d f ft ɛ t 2 Y ie itw i Y j e itw jdt, 9 r 1n = 1 1 K ft t 2 n 2π d fɛ ft t 2 e isw i m ft t s Kft s ft fɛ ft dsfɛ t s e isw jm ft t s Kft s fɛ ft s K ft t 2 fɛ ft t 2 Y ie itw i r 2n = 1 1 n 2π d r 3n = 1 1 K ft t 2 n 2π d fɛ ft t 2 dsf ft ɛ t dt, e isw jm ft t s Kft s fɛ ft dsf ft ɛ t s e isw i m ft t s Kft s ft fɛ ft dsfɛ t s dt, Y j e itw jdt. 3 See Hall 1984 p.5. 23

24 First, we derive the limiting distriution of T n. Oserve that T n = 1 1 n 2π d K ft t 2 e dt2 Y i Y j {costw i costw j + sintw i sintw j }dt = 1 } 1 n d 2π d K ft t 2 twi twj twi twj e dt/2 Y i Y j {cos cos + sin sin dt 1 1 = d 2π d K ft t 2 1 } Wi Wj Wi Wj e dt/2 dt Y i Y j {cos cos + sin sin n +O p d2+4β e d/2 1 1 d 2π d K ft t 2 e dt/2 dt T n + O p d2+4β e d/2, 10 where the first equality follows from f ft ɛ t = e dt2 /2 and e itw i = costw i + i sintw i, the second equality follows from a change of variales, and the third equality follows from a simple multivariate extension of Holzmann and Boysen 2006, Theorem 1 ased on Assumption S ii. Note that T n = 1 Y i Y j n { cos Xi cos ɛ i sin Xi sin } { ɛ i cos { + sin Xi cos ɛ i + cos Xi sin } { ɛ i sin From van Es and Uh 2005, proof of Lemma 6, it holds ɛi mod 2π d V ɛ i U[0, 2π] as 0 for each i, where V ɛ i Xi Xj Xj cos ɛ j Xj sin sin ɛ j } cos ɛ j Xj + cos sin ɛ j } d mod 2π V X i 11 U[0, 2π] and is independent from Y i, Vi X. Thus y applying Holzmann and Boysen 2006, Lemma 1, Tn has the same limiting distriution as T V n = 1 n hq i, Q j, where Q i = Y i, Vi X, Vi ɛ and hq i, Q j = Y i Y j { { cosv X i cosvi ɛ sinv i X sinvi ɛ} cosvj X cosvj ɛ sinv j X + { sinvi X cosvi ɛ + cosv i X Oserve that Cov hq 1, Q 2, hq 1, Q 3 = 0 ecause E[cosV ɛ i { sinvi ɛ} sinvj X cosvj ɛ + cosv j X sinv ɛ j } sinv ɛ j } ] = E[sinV ɛ] = 0. Therefore, y applying the limit theorem for degenerate U-statistics with a fixed kernel h Serfling, 1980, Theorem 5.5.2, we otain T V n d λ k Zk 2 1, 12 k=1 where {Z k } is an independent sequence of standard normal random variales and {λ k } are the eigenvalues of the integral operator i.. ΛgQ 1 = λgq

25 where ΛgQ 1 = E[hQ 1, Q 2 gq 2 Q 1 ]. Also, an extension of van Es and Uh 2005, Lemma 5 gives Comining 10-14, 1 2π d ϕ T n K ft t 2 e dt/2 dt d ϕ. 14 d λ i Zi 2 1. k=1 Next, we show that r 1n is negligile. Oserve that r 1n = = = 1 1 n 3d 2π 1 n 3d 1 2π t s K K ft t 2 e iswi/ m ft ft s f ft ɛ s/ ds t s K e iswj/ m ft ft s fɛ ft s/ ds dt { cos s1 W i m ft t s 1 } K ft s 1 fɛ ft s 1 / ds 1 { 1 s2 W K ft t 2 cos j m ft s 2 } t K ft s 2 fɛ ft s 2 / ds 2 { 2π + sin s1 W i m ft } dt t s 1 K ft s 1 f ɛ fts 1/ ds 1 { s2 W sin j m ft } s 2 t K ft s 2 fɛ ft s 2/ ds 2 K ft t 2 K ft s 1 K ft s 2 t fɛ ft s 1 /fɛ ft m ft s1 m ft s2 t ds 1 ds 2 dt s 2 / { } Wi Wj Wi Wj cos cos + sin sin + O p d2+4β e d/2, 1 n 3d where the first equality follows from a change of variales, the second equality follows from a direct calculation using e isw i = cossw i + i sinsw i, the third equality follows from Holzmann and Boysen 2006, Theorem 1 ased on Assumption S ii. By a similar argument used to show 12, it holds 1 n { cos Wi cos Wj + sin Wi sin Wj } = O p 1. 25

26 Also, we have 1 K ft t 2 K ft s 1 K ft s 2 t 2π d fɛ ft s 1 /fɛ ft m ft s1 m ft s2 t ds 1 ds 2 dt s 2 / = 4d e d/2 2π d K ft t 2 K ft 1 2 v 1 K ft 1 2 v 2 d1 2 v d1 2 v e 2 2 e 2 2 m ft t 1+ 2 v 1 A2d d4+4β e d/2 2π d v dβ 1 vdβ t 1 1 t K ft t 2 m ft m ft d1 2 v e 2 2 d1 2 v e 2 2 dv 1 dv 2 dt A2d d4+4β e d/2 t 1 1 t 2π d K ft t 2 m ft m ft dt A2d Γ1 + dβ 2 d5+6β e d/2 2π d t 2dβ m ft t 2 dt = O d5+6β e d/2, m ft 1 2 v 2 t v dβ 1 e v 1 dv 1 dv 1 dv 2 dt v dβ 2 e v 2 dv 2 where the first equality follows from changes of variales s 1 = 1 2 v 1 and s 2 = 1 2 v 2, the wave relations follow from Assumption S i and L Hopital s rule, and the last equality follows from Assumption S ii. Comining these results, ϕr 1n = O p d1+2β, and thus r 1n is negligile. Similar arguments imply that the terms r 2n and r 3n are also asymptotically negligile. Therefore, the conclusion follows. A.1.3. Proof of iii. The proof for the general supersmooth case follows the same steps as in the proof of Part ii for the normal case. As the proof is similar, we omit the most part. Hereafter we show why the condition γ > 1 is imposed in this case. The dominant term T n defined in 9 satisfies T n 1 n d 1 2π d Cɛ 2d { cos K ft t 2 t twi twj cos 2dγ 0 e 2d t γ + sin µ γ Y i Y j twi sin twj } dt. 26

27 We now show that D cos 1 n d 1 2π d Cɛ 2d 1 2π d C 2d ɛ K ft t 2 t K ft t 2 t 2dγ 0 e 2d t γ 2dγ 0 e 2d t γ µ γ dt twi µ γ Y i Y j {cos 1 Y i Y j {cos n cos Wi twj cos } dt Wj is asymptotically negligile, as well as the correspondingly defined D sin. We have seen that each term is zero mean. Following the proof of Holzmann and Boysen 2006, Theorem 1, we otain cos twi cos twj cos Wi cos Wj 1 t W i + W k. Thus, similar arguments to van Es and Uh 2005, Lemmas 1 and 5 using Assumption S ii imply VarD cos = O n 2 2 d4γ t K ft t 2 t 2dγ 0 e 2d t γ µ γ dt E [ Y i 2 Y j 2 W i + W k 2] = O d4γ 0 4 2γ = O 4γ 0 4 γ2+2β e 2 2 µ γ, v K ft 1 γ v 2 1 γ v 2γ 0 e 2 1 γ v γ 2 µ γ dv using similar arguments as in the proof of Theorem 1 ii, and we otain D cos = O p 2γ 1+2γβ+2γ 0 e 2 µ γ. The same argument applies to D sin. Note that 1 1 T n = d 2π d K ft t 2 t Cɛ 2d 1 Wi Y i Y j {cos cos n +O d2γ 1+2γβ+2γ0 e 2d µ γ 2dγ 0 e 2d t γ µ γ dt Wj + sin Wi sin Wj = A2d µ 1+2dβ dγ 1+2γβ+2γ0 e 2d µ γ Γ1 + 2dβ T λ 1+2dβ 2π d n + O d2γ 1+2γβ+2γ0 e 2d µ γ Cɛ 2d T n ϕ + O d2γ 1+2γβ+2γ0 e 2d µ γ, where the second equality follows from the definition of T n in 11 and a modification of van Es and Uh 2005, Lemma 5. Therefore, we otain ϕt n = T n + O dγ 1. } } 27

28 The limiting distriution of T n is otained in the proof of Part ii. The remainder term ecomes negligile if we impose γ > 1. A.2. Proof of Theorem 2. A.2.1. Proof of i. By a similar argument to the proof of Theorem 1 i, the estimation error ˆθ θ is negligile for the asymptotic properties of T n and thus it is written as T n = 1 n + 1 n 1 K ft t 2 2π d fɛ ft t 2 ξ itξ j tdt + 1 n 1 K ft t 2 2π d fɛ ft t 2 η itη j tdt 1 K ft t 2 2π d fɛ ft t 2 ξ itη j tdt K ft t 2 n 2π d f ft ɛ t 2 η itξ j tdt + o p C 1/2 V, T n + R 1n + R 2n + R 3n + o p C 1/2 V,, where η i t = e is W i {m ft t s m ft θ t s s}kft ft fɛ ft dsfɛ t s = c n e is W i ft t s Kft s ft fɛ ft dsfɛ t, s under H 1n. By Theorem 1 i, it holds C 1/2 V, E[C 1/2 V, R1n] = n 1c2 n 2π d C 1/2 V, d T n N 2 0,. For R 2π 2d 1n, oserve that K ft t 2 K ft s 1 K ft s 2 ft t s 1 ft s 2 tf ft s 1 f ft s 2 ds 1 ds 2 dt n. 15 By the definition of c n, C V, = O d1+4α otained in the proof of Theorem 1 i, and Assumption D ii, it holds E[C 1/2 V, R1n ] = O1 and the limit of n exists. Also, a similar argument to 4 yields E[R1n] 2 = c 4 n K ft t 1 2 K ft t 2 2 K ft s 1 K ft s 2 K ft s 3 K ft s 4 fɛ ft s 1 fɛ ft s 2 fɛ ft s 3 fɛ ft fw ft s 1 + s 3 fw ft s 2 s 4 s 4 ft t 1 s 1 ft s 2 t 1 ft t 2 s 3 ft s 4 t 2 ds 1 ds 4 dt 1 dt 2 = O n 2 d1+6α. Therefore, VarC 1/2 V, R1n 0 and we otain C 1/2 V, R 1n arguments comined with E[ξ i t] = 0, we can show that C 1/2 V, R2n Comining these results, the conclusion follows. p lim n n. Finally, using similar p 0 and C 1/2 V, R 3n p 0. 28

SPECIFICATION TESTING FOR ERRORS-IN-VARIABLES MODELS

SPECIFICATION TESTING FOR ERRORS-IN-VARIABLES MODELS TAISUKE OTSU AND LUKE TAYLOR Astract. This paper considers specification testing for regression models with errors-invariales and proposes a test statistic