Breaking the curse of dimensionality in nonparametric testing


Pascal Lavergne (Simon Fraser University) and Valentin Patilea (CREST-ENSAI)

November 2005

Abstract

For tests based on nonparametric methods, power crucially depends on the dimension of the conditioning variables, and specifically decreases with this dimension. This is known as the curse of dimensionality. We propose a general approach to circumvent this problem and we show how to implement it when testing for a parametric regression. The resulting test behaves against directional local alternatives almost as if the dimension of the regressors were one.

Keywords: Curse of dimensionality, Testing, Nonparametric methods.

JEL classification: Primary C52; Secondary C14.

We thank participants of ESRC 2005 (Bristol), CESG 2005 (Vancouver), in particular Yanqin Fan, and of the econometrics seminar at the University of Toulouse, for helpful comments. Special thanks to Jean-Marie Rolin for a very helpful suggestion. Address correspondence to: Pascal Lavergne, Dept. of Economics, Simon Fraser University, 8888 University Drive, Burnaby BC, V5A 1S6 CANADA. Emails: pascal lavergne@sfu.ca, patilea@ensai.fr

1 Introduction

The expression "curse of dimensionality" refers to the poor performance of local smoothing methods for multivariate data. Because of the sparsity of data in multidimensional spaces, the behavior of nonparametric smooth estimators quickly deteriorates as the number of dimensions increases, see Stone (1980). This issue is a prominent reason for the study of dimension-reduction models in nonparametric estimation. For instance, when the regression function depends only on a single linear index of the variables, the nonparametric estimator performs as in the one-dimensional case. The single-index regression model has been widely studied in econometrics, see e.g. Stoker (1986), Härdle and Stoker (1989), Powell, Stock and Stoker (1989), Ichimura (1993), Sherman (1994b), Delecroix, Hristache and Patilea (2005).

Many consistent specification tests of a (semi)parametric model contrast the latter model with a completely nonparametric one. Since nonparametric estimators suffer from the curse of dimensionality, so too does the power of the related tests. Specifically, most specification tests of a parametric regression are consistent against directional local alternatives that go further away from the null hypothesis when the dimension of the regressors increases, see for instance Härdle and Mammen (1993) and Zheng (1996). Another approach looks at the uniform consistency of the test against a class of regular alternatives, see Spokoiny (1996), Horowitz and Spokoiny (2001), Guerre and Lavergne (2002), and essentially reaches the same conclusion. The adverse effect of dimension on the tests' power is also found to be significant in practice, as illustrated by our simulations in Section 4.

Very little research has been aimed at alleviating the curse of dimensionality in testing. Zhu (2003) proposed a dimension-reduction type test for a parametric regression, but his null hypothesis is actually the independence of residuals and regressors. This is too strong a hypothesis for econometric applications, in particular as data often exhibit conditional volatility. After we wrote a first version of this paper, we discovered an earlier work by Zhu and Li (1998), who put forward an idea similar to the one we develop here for testing a linear parametric regression, but did not study the related test.

The purpose of this paper is to propose a cure for the curse of dimensionality in nonparametric testing. In Section 2, we propose a general approach that could apply to many nonparametric testing problems. To show the potential benefits of our approach, we apply it to testing for the parametric form of a regression and derive its properties in Section 3. Our main finding is that the resulting test behaves against directional alternatives almost as if the dimension of the regressors were one. Hence there is a cost to dimensionality, but this cost can be set arbitrarily low and is paid only once, as it does not increase with the dimension of the regressors. Our simulation study in Section 4 confirms these theoretical findings.

Before entering into details, let us give a flavor of our general approach. As shown in Section 2, many testing problems consider a null hypothesis of the form

$$H_0: \; E[U(\theta_0) \mid X] = 0 \ \text{almost surely (a.s.) for some } \theta_0, \qquad (1.1)$$

where $\theta_0$ is an unknown parameter to be estimated and $X \in \mathbb{R}^q$. That is, we want to check whether a zero conditional moment restriction holds for almost any value of the variables $X$. Our proposal for reducing the dimensionality of the problem is to use single linear indices $X'\beta$ as a conditioning variable instead of $X$, and then to look at $E[U(\theta_0) \mid X'\beta]$ for all directions defined by $\beta$ of norm one in $\mathbb{R}^q$. This clearly helps in breaking the curse of dimensionality, but is it sufficient to obtain a consistent test? Our fundamental lemma in Section 2 shows that it is indeed the case. Hence looking at the index that makes an estimate of $E[U(\theta_0) \mid X'\beta]$ furthest away from zero yields a consistent test, and hopefully a powerful one. Under the null hypothesis however, the latter quantity is zero for every direction $\beta$. Hence, we introduce a criterion that favors one of the directions under $H_0$ and then yields a simple behavior. Our dimension-reduction approach is thus the testing counterpart of single-index models in estimation, but with the fundamental difference that the function under test does not need to depend on a single index only.
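The idea of conditioning on single indices can be illustrated numerically. The sketch below is not taken from the paper: it assumes a toy design with two independent standard normal regressors and $Z = X_1 X_2 + \text{noise}$, and uses a hypothetical helper `index_signal` that estimates $E[E^2(Z \mid X'\beta)]$ by averaging within quantile bins of the index. In this design $E(Z \mid X)$ is not zero, the coordinate directions carry no signal, but the diagonal direction does, since $E(Z \mid (X_1 + X_2)/\sqrt{2} = t) = (t^2 - 1)/2$.

```python
import numpy as np

def index_signal(Z, X, beta, bins=20):
    """Crude binned estimate of E[ E(Z | X'beta)^2 ] (hypothetical helper)."""
    t = X @ beta
    # Quantile bins of the index, so that every bin holds about n/bins points.
    edges = np.quantile(t, np.linspace(0.0, 1.0, bins + 1))
    labels = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, bins - 1)
    sums = np.bincount(labels, weights=Z, minlength=bins)
    counts = np.bincount(labels, minlength=bins)
    means = sums / np.maximum(counts, 1)
    # Average of squared within-bin means, weighted by bin frequency.
    return float(np.sum(counts * means**2) / len(Z))

# Toy design (an assumption made for illustration only).
rng = np.random.default_rng(0)
n = 20000
X = rng.standard_normal((n, 2))
Z = X[:, 0] * X[:, 1] + 0.5 * rng.standard_normal(n)

e1 = np.array([1.0, 0.0])
diag = np.array([1.0, 1.0]) / np.sqrt(2.0)
signal_e1 = index_signal(Z, X, e1)      # close to zero: E(Z | X_1) = 0
signal_diag = index_signal(Z, X, diag)  # clearly positive
print(signal_e1, signal_diag)
```

A test that only looked at the coordinate directions would miss this departure from the null, which is why the approach searches over all directions of norm one.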

2 Dimension reduction in nonparametric testing

2.1 Testing against nonparametric alternatives

A common feature of many nonparametric tests is to consider a zero conditional moment restriction of the form (1.1). The unknown parameter $\theta_0$ can be of finite or infinite dimension and should be estimated either before constructing the test or at the same time. Many testing problems can be recast into this framework. We detail here some important ones, beginning with the one that we will look at in Section 3.

Example 1: Testing for a parametric regression. A large literature has been devoted to checking the functional form of a regression function. In that case, $U(\theta) = Y - \mu(X; \theta)$, where $\mu(\cdot\,; \cdot)$ belongs to a parametric family and $\theta$ belongs to a subset of $\mathbb{R}^d$. Tests using smoothing methods have been proposed by Härdle and Mammen (1993), Hong and White (1995), and Zheng (1996), among others, see also Hart (1997) for a review. Against general directional alternatives of the form $E[Y \mid X] = \mu(X; \theta_0) + r_n \delta(X)$, $r_n$ should be of higher order than $n^{-1/2} h^{-q/4}$ to obtain consistency. When looking at the uniform consistency of the test against a class of alternatives of known smoothness $s$, it is found that the alternatives should be at distance $n^{-2s/(4s+q)}$ for consistency, see Guerre and Lavergne (2002). When the smoothness index $s$ is unknown, the so-called adaptive rate is less by a small factor, see Spokoiny (1996), Horowitz and Spokoiny (2001), and Guerre and Lavergne (2005). Another class of consistent tests for a parametric regression is based on various transforms of the cumulative process of residuals obtained from estimation of the parametric model, see in particular Bierens (1982, 1990) and Stute (1997). Theoretical results are mixed: while such tests do not theoretically suffer from the curse of dimensionality under directional alternatives, they exhibit poor performance against sets of regular alternatives, see Guerre and Lavergne (2002).
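To get a feel for how the two rates quoted in Example 1 deteriorate with the dimension $q$, one can simply evaluate them. The short sketch below does this; the function names and the particular values of $n$, $h$ and $s$ are illustrative assumptions, not choices made in the paper.

```python
def directional_rate(n, h, q):
    """Threshold n^{-1/2} h^{-q/4}: r_n must be of higher order for consistency."""
    return n ** -0.5 * h ** (-q / 4.0)

def minimax_rate(n, s, q):
    """Distance n^{-2s/(4s+q)} for alternatives of known smoothness s."""
    return n ** (-2.0 * s / (4.0 * s + q))

# Illustrative values only: n = 1000 observations, bandwidth h = 0.3, smoothness s = 2.
for q in (1, 3, 10):
    print(q, directional_rate(1000, 0.3, q), minimax_rate(1000, 2.0, q))
```

Both quantities grow with $q$ for a bandwidth below one, which means that only larger departures from the null can be detected as regressors are added.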

Example 2: Testing conditional moment restrictions. In econometrics, we are frequently interested in conditional moment restrictions beyond the regression case. In our framework, $U(\theta) = \rho(Y, X, \theta)$, where $\rho(\cdot, \cdot, \cdot)$ is a multivariate function known up to a finite-dimensional parameter $\theta$. A simple instance is testing for homoscedasticity, where $\rho(Y, X, \theta) = Y^2 - \theta^2$. Delgado, Dominguez and Lavergne (2005) provide several more instances. Stinchcombe and White (1998), Koul and Stute (1999), and Whang (2001) study single conditional moment restrictions, while Donald, Imbens and Newey (2003) and Delgado, Dominguez and Lavergne (2005) study multiple ones.

Example 3: Testing nonparametric restrictions. In this context, $\theta$ is infinite-dimensional. Some instances are as follows. When testing for additivity, $U(\theta) = Y - \sum_{l=1}^{q} m_l(X_l)$, where the unknown univariate functions $m_l(\cdot)$ are properly normalized, see Gozalo and Linton (2001). When testing for a single-index model, $U(\theta) = Y - m(X'\beta)$, for an unknown $\beta$ and an unknown univariate function $m(\cdot)$, see Fan and Li (1996), Stute and Zhu (2005), and Xia et al. (2005). When testing for the significance of some regressors $Z$ in a nonparametric regression on $X = (\tilde X', Z')'$, $U(\theta) = Y - E(Y \mid \tilde X)$, see Fan and Li (1996), Lavergne and Vuong (2000), Aït-Sahalia, Bickel and Stoker (2001), Delgado and Gonzalez-Manteiga (2001), Lavergne (2001). Chen and Fan (1999) consider other types of nonparametric restrictions.

Another class of testing problems is closely related to our framework. Consider for instance testing for a parametric conditional distribution function. The null hypothesis is then

$$H_0: \; E[I(Y \le y) - F(y \mid X, \theta_0) \mid X] = 0 \ \text{a.s. for all } y \in \mathcal{Y}, \text{ for some } \theta_0,$$

where $F(\cdot \mid X, \theta)$ is a parametric conditional cumulative distribution function and $I(\cdot)$ denotes the indicator function, see Andrews (1997). Here, one faces a set of conditional moment restrictions indexed not only by (random) $X$, but also by (non-random) $y$. Such a pattern also appears in other instances such as testing for conditional independence, see Delgado and Gonzalez-Manteiga (2001). Though we do not pursue this issue, our approach could be generalized to these more general hypotheses, for instance by rewriting $H_0$ as a conditional moment restriction upon $X$ only through an integral over the domain of $y$, see Hall and Yatchew (2005).

2.2 The fundamental lemma

Our approach relies on the following lemma, which shows that for checking constancy of a conditional expectation, it is equivalent to consider expectations conditional on $X$ and expectations conditional on single linear indices of $X$.

Lemma 2.1 Let $X \in \mathbb{R}^q$ and $Z \in \mathbb{R}^c$ be random vectors, with $E|Z_j| < \infty$, $j = 1, \ldots, c$.

A) The following statements are equivalent:

(i) for any (non-random) $\beta \in \mathbb{R}^q$ with $\|\beta\| = 1$,

$$E(Z \mid X'\beta) = E(Z) \ \text{almost surely}; \qquad (2.1)$$

(ii) $E(Z \mid X) = E(Z)$ almost surely.

B) If $X$ is bounded and $P[E(Z \mid X) = E(Z)] < 1$, then the set $S = \{\beta \in \mathbb{R}^q : \|\beta\| = 1, \ E(Z \mid X'\beta) = E(Z) \text{ almost surely}\}$ is included in a finite union of contours on the sphere $\{\beta \in \mathbb{R}^q : \|\beta\| = 1\}$. In particular, $S$ has Lebesgue measure on the sphere equal to zero and it is not dense on the sphere.

Proof. A) That (ii) implies (i) is immediate. To prove that (i) implies (ii), it suffices to consider the case $c = 1$ and $E(Z) = 0$. Note that for any $\beta \neq 0$, the $\sigma$-field generated by $X'\beta$ is the same as the $\sigma$-field generated by $X'\beta / \|\beta\|$. By Condition (2.1) and elementary properties of the conditional expectation, we obtain that for any $\beta$, including $\beta = 0$,

$$0 = E[\exp\{iX'\beta\} E(Z \mid X'\beta)] = E[\exp\{iX'\beta\} Z] = E[\exp\{iX'\beta\} E(Z \mid X)],$$

where $i = \sqrt{-1}$. Write $Z = Z^+ - Z^-$, where $Z^+$ and $Z^-$ are the positive and negative parts of $Z$, and deduce that for any $\beta$

$$E[\exp\{iX'\beta\} E(Z^+ \mid X)] = E[\exp\{iX'\beta\} E(Z^- \mid X)].$$

As distinct positive finite measures cannot have the same characteristic function, this implies that $E(Z^+ \mid X) = E(Z^- \mid X)$ and hence $E(Z \mid X) = 0$ almost surely.

B) Without loss of generality, take $c = 1$ and $E(Z) = 0$. By Theorem 1 of Bierens and Ploberger (1997), the set $A = \{\beta \in \mathbb{R}^q : E[\exp\{iX'\beta\} Z] = 0\}$ has Lebesgue measure zero and is not dense in $\mathbb{R}^q$. Since $S \subset A$, the same conclusion holds for $S$. A careful inspection of the proofs of Lemma 1 of Bierens (1990) and Theorems 1 and 2 in Bierens (1982) actually shows that when $P[E(Z \mid X) = 0] < 1$,

$$A \subset B = \{A_1 \times \mathbb{R}^{q-1}\} \cup \{\mathbb{R} \times A_2 \times \mathbb{R}^{q-2}\} \cup \ldots \cup \{\mathbb{R}^{q-1} \times A_q\},$$

where $A_1, \ldots, A_q \subset \mathbb{R}$ contain only isolated points. The intersection of $B$ with the set of vectors with $\|\beta\| = 1$ is a finite union of circles and points, and the result follows.

Our lemma readily yields a new formulation of the null hypothesis of interest.

Corollary 2.2 Consider random vectors $U(\theta) \in \mathbb{R}^c$ depending on a parameter $\theta \in \Theta$, such that $E|U_j(\theta)| < \infty$, $j = 1, \ldots, c$, for all $\theta$, and $X \in \mathbb{R}^q$. Then for any function $\omega(\cdot)$ such that for any $\beta$, $\omega(X'\beta) > 0$ on the support of $E(U(\theta_0) \mid X'\beta)$, (1.1) is equivalent to

$$\max_{\|\beta\| = 1} E[U'(\theta_0) E(U(\theta_0) \mid X'\beta)\, \omega(X'\beta)] = 0 \ \text{for some } \theta_0 \in \Theta. \qquad (2.2)$$

Lemma 2.1 can actually be deduced from Theorem 1 of Bierens (1982), who showed that $E[Z \mid X] = 0$ is equivalent to $E[Z \exp\{iX'\beta\}] = 0$ for all $\beta$. Part A can also be found in Chen (1991). Stinchcombe and White (1998) extended Bierens' result, showing that $E[Z \mid X] = 0$ is equivalent to $E[Z \phi(X'\beta)] = 0$ for any $\beta$, whenever $\phi(\cdot)$ is analytic nonpolynomial. Our approach is closely related to theirs, but different in a key aspect. Instead of working with a particular known $\phi(\cdot)$ at the outset, we choose for each $\beta$ the function of $X'\beta$ maximizing squared correlation with $Z$. Intuitively, this strategy should enable better detection of departures from the null hypothesis. It is easily shown that the solution is $E(Z \mid X'\beta)$, so that our null hypothesis writes $E[Z E(Z \mid X'\beta)] = 0$ for any $\beta$. Now, looking for the least favorable direction $\beta$ for the null hypothesis yields (2.2) with $\omega(\cdot) \equiv 1$. This is in the spirit of the well-known union-intersection principle in classical multivariate analysis, cf. Roy (1953). A similar reasoning applies if one maximizes the square of $E[Z \phi(X'\beta)\, \omega(X'\beta)]$ and $\omega(\cdot)$ is not identically one.

2.3 A general dimension-reduction approach

In view of the previous corollary, our goal is to estimate the quantity in (2.2). Assume we have at our disposal a consistent estimator $\hat\theta_n$ of $\theta_0$ and denote by $U_i(\theta)$ the data-dependent vector function of $\theta$ for observation $i$. Let $\hat\gamma_i(X_i'\beta; \theta)$ be a consistent estimator of $E(U(\theta) \mid X_i'\beta)\, \omega(X_i'\beta)$ and

$$Q_n(\theta, \beta) = \frac{1}{n} \sum_{i=1}^{n} U_i'(\theta)\, \hat\gamma_i(X_i'\beta; \theta). \qquad (2.3)$$

Under suitable conditions, $Q_n(\hat\theta_n, \beta)$ should converge uniformly in $\beta$ to

$$Q(\theta, \beta) = E[U'(\theta) \gamma(X'\beta; \theta)] = E[U'(\theta) E(U(\theta) \mid X'\beta)\, \omega(X'\beta)].$$

When the restriction given by $H_0$ does not hold, the maximum of $Q_n(\hat\theta_n, \beta)$ over $\beta$ should stay away from zero almost surely and a test based on it should be consistent. Under $H_0$ however, $Q_n(\hat\theta_n, \beta)$ converges to zero for any $\beta$. This means a potentially erratic behavior for $\max_{\|\beta\| = 1} Q_n(\hat\theta_n, \beta)$, which would yield high critical values and then low power. To avoid this, we introduce a penalized criterion and define

$$\hat\beta_n = \arg\max_{\|\beta\| = 1} \left\{ Q_n(\hat\theta_n, \beta) - \pi_n(\beta - \beta_0) \right\}.$$

Our penalization scheme is related to Tikhonov's regularization in estimation, see Carrasco, Florens and Renault (2006), but applied to a testing problem. Here $\beta_0$ is a fixed vector chosen by the practitioner. The penalty $\pi_n(\cdot)$ is a nonnegative function that equals zero only at zero. This forces the maximum to be attained at $\beta_0$ under $H_0$, provided the penalty is large enough with respect to $\max_{\|\beta\| = 1} Q_n(\hat\theta_n, \beta)$. However, the penalty should not perturb the behavior of the maximum when $H_0$ does not hold, hence we should have $\pi_n(t) \to 0$ fast enough for all $t$ as $n$ grows. A normalized version of $Q_n(\hat\theta_n, \hat\beta_n)$ is then taken as the test statistic.

As will become apparent in the next section, the choice of the vector $\beta_0$ is theoretically irrelevant for consistency of the test. However, the test should have greater power when $E[U(\theta_0) \mid X'\beta_0]$ is different from zero, and in practice the choice of $\beta_0$ may have some influence. By contrast, the choice of the penalty is crucial to control the level of the test and to ensure high power.

3 Testing for a parametric regression

3.1 The test

Consider a random vector $(Y, X')' \in \mathbb{R}^{1+q}$. We consider the $q$-variate regression $m(X) = E(Y \mid X)$ and continuous $X$, as discrete regressors do not (strictly speaking) yield a curse of dimensionality. Consider the parametric regression model $\{\mu(\cdot\,; \theta) : \theta \in \Theta\}$ with $\Theta \subset \mathbb{R}^d$. The model is correctly specified if and only if $H_0$ as defined in (1.1) holds with $U(\theta) = Y - \mu(X; \theta)$. We thus apply our general approach to this testing problem using kernel estimators. To avoid handling denominators close to zero, set the weight function $\omega(\cdot)$ in (2.2) equal to the density of $X'\beta$, denoted by $f_\beta(\cdot)$, which is assumed to exist for any $\beta$. Define

$$Q(\theta, \beta) = E\{U(\theta) E[U(\theta) \mid X'\beta] f_\beta(X'\beta)\} = E\{E^2[U(\theta) \mid X'\beta] f_\beta(X'\beta)\}.$$

By Corollary 2.2, the regression model is then correctly specified iff $\max_{\|\beta\| = 1} Q(\theta_0, \beta) = 0$.

Assume $(Y_i, X_i')'$, $i = 1, \ldots, n$, is a random sample from the distribution of $(Y, X')'$. The parameter $\theta_0$ can be estimated in a variety of ways. For instance, $\hat\theta_n$ can be the nonlinear least-squares (NLLS) estimator of $\theta$ solving

$$\hat\theta_n = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n} (Y_i - \mu(X_i; \theta))^2, \qquad (3.1)$$

with an appropriate convention in case of ties. In view of Equation (2.3), define the estimator of $Q(\theta, \beta)$ as

$$Q_n(\theta, \beta) = \frac{1}{n(n-1)} \sum_{j \neq i} U_i(\theta) U_j(\theta) \frac{1}{h} K_h((X_i - X_j)'\beta),$$

where $U_i(\theta) = Y_i - \mu(X_i; \theta)$ and $K_h(\cdot) = K(\cdot / h)$, where $K(\cdot)$ is a kernel and $h$ a bandwidth. For a fixed $\beta$, the estimator $Q_n(\hat\theta_n, \beta)$ is the statistic studied by Li and Wang (1998) and Zheng (1996) applied to the index $X'\beta$, and has an asymptotic centered normal distribution with rate $nh^{1/2}$ under $H_0$.

Zhu and Li (1998) first proposed to use the maximum over $\beta$ of a statistic close to $Q_n(\hat\theta_n, \beta)$ for checking a linear regression model. However, their test is based on the maximum plus a term of the form $(1/n) \sum_{i=1}^{n} U_i(\hat\theta_n) \phi(\|X_i\|)$, where $\phi(\cdot)$ is the standard normal univariate density (or any other known function). Hence, they combine a consistent test based on nonparametric methods with an inconsistent M-type test, so that the asymptotic behavior under $H_0$ is completely driven by the M-test statistic. Instead, we apply our penalization method and we choose $\beta$ as

$$\hat\beta_n = \arg\max_{\|\beta\| = 1} \left\{ Q_n(\hat\theta_n, \beta) - \alpha_n I[\beta \neq \beta_0] \right\}, \qquad (3.2)$$

where $\beta_0$ is user-chosen and $\alpha_n$, $n \ge 1$, is a sequence of positive real numbers decreasing to zero at an appropriate rate. Our choice for the penalty function corresponds to the one of Bierens (1990) and is made for simplicity. We will prove that $\hat\beta_n = \beta_0$ with probability tending to 1 under $H_0$. Since $Q_n(\hat\theta_n, \hat\beta_n)$ then behaves like $Q_n(\hat\theta_n, \beta_0)$, a test is easily constructed. Given a consistent estimator $\hat v_n^2(\beta)$ of the variance of $nh^{1/2} Q_n(\hat\theta_n, \beta)$, let

$$T_n = nh^{1/2} \frac{Q_n(\hat\theta_n, \hat\beta_n)}{\hat v_n(\beta_0)}.$$

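An asymptotic $\alpha$-level test is given by $I(T_n \ge z_\alpha)$, where $z_\alpha$ is the $(1-\alpha)$-th quantile of the standard normal distribution. Moreover, as both $\hat v_n^2(\hat\beta_n)$ and $\hat v_n^2(\beta_0)$ estimate the variance of $nh^{1/2} Q_n(\hat\theta_n, \hat\beta_n)$ under $H_0$, we can also consider $I(\tilde T_n \ge z_\alpha)$, where

$$\tilde T_n = nh^{1/2} \frac{Q_n(\hat\theta_n, \hat\beta_n)}{\min\left(\hat v_n(\beta_0), \hat v_n(\hat\beta_n)\right)}.$$

The purpose of taking the minimum of the two variance estimators is to improve the small sample power of our test.

3.2 Assumptions

We consider the following assumptions on the data-generating process.

Assumption D
(a) The random vectors $(\varepsilon_1, X_1')', \ldots, (\varepsilon_n, X_n')'$ are independent copies of the random vector $(\varepsilon, X')' \in \mathbb{R}^{q+1}$ with $E(\varepsilon \mid X) = 0$ and $E(\varepsilon^4) < \infty$.
(b) Let $\sigma^2(x) = E(\varepsilon^2 \mid X = x)$. There exist constants $\underline\sigma^2$ and $\bar\sigma^2$ such that for any $x$, $0 < \underline\sigma^2 \le \sigma^2(x) \le \bar\sigma^2 < \infty$.
(c) For any $\beta$ of norm one, $X'\beta$ admits a density $f_\beta(\cdot)$ that is bounded uniformly in $\beta$.

Next, we introduce assumptions on the regression model. For any matrix $A$ of generic element $a_{kl}$, let $\|A\|$ denote the matrix norm $[\sum_{kl} a_{kl}^2]^{1/2}$.

Assumption M
a) Let $\Theta \subset \mathbb{R}^d$ be a compact set. For any $\theta_1, \theta_2 \in \Theta$,

$$\mu(\cdot\,; \theta_1) - \mu(\cdot\,; \theta_2) = (\theta_1 - \theta_2)' \dot\mu(\cdot\,; \theta_2) + (\theta_1 - \theta_2)' \ddot\mu(\cdot\,; \theta_1, \theta_2)(\theta_1 - \theta_2),$$

where (i) $\dot\mu(\cdot\,; \theta)$ is such that $\sup_{\theta \in \Theta} \|\dot\mu(X; \theta)\| \le \Phi_1(X)$ with $E[\Phi_1^4(X)] < \infty$; (ii) $\ddot\mu(\cdot\,; \theta_1, \theta_2)$ is such that $\sup_{\theta_1, \theta_2 \in \Theta} \|\ddot\mu(X; \theta_1, \theta_2)\| \le \Phi_2(X)$ with $E[\Phi_2^4(X)] < \infty$; and (iii) for all $\varepsilon > 0$, there is an $\eta > 0$ such that $E \sup_{\|\theta_1 - \theta_2\| \le \eta} \|\ddot\mu(X; \theta_1, \theta_2)\| < \varepsilon$.
b) (Identification condition) There exists a real-valued function $\Phi_3(\cdot)$ that is not almost surely zero such that for any $\theta \in \Theta$ and $X$, $|\mu(X; \theta) - \mu(X; \theta_0)| \ge \Phi_3(X) \|\theta - \theta_0\|$.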

A large range of parametric models satisfies Assumption M. Together with our assumptions on the design, the latter ensures the $\sqrt{n}$-consistency of the NLLS estimator (3.1), as stated in Lemma 6.1 of Section 6. We make the following assumptions on the kernel and bandwidth.

Assumption K
a) The kernel $K(\cdot)$ is a bounded symmetric density of bounded variation.
b) $h \to 0$ and $(nh^2)^{\alpha} / \ln n \to \infty$ for some $\alpha \in (0, 1)$.

Last, we need some assumptions to estimate the asymptotic variance of $nh^{1/2} Q_n(\hat\theta_n, \beta)$, which writes, conditionally upon the $X_i$,

$$v_n^2(\beta) = \frac{2}{n(n-1)} \sum_{j \neq i} \sigma^2(X_i) \sigma^2(X_j)\, h^{-1} K_h^2((X_i - X_j)'\beta).$$

In general, the conditional variance $\sigma^2(\cdot)$ is unknown. However, given a nonparametric estimator of the conditional variance such that

$$\sup_{1 \le i \le n} \left| \hat\sigma^2(X_i) - \sigma^2(X_i) \right| = o_P(1), \qquad (3.3)$$

$v_n^2(\beta)$ can be consistently estimated by

$$\hat v_n^2(\beta) = \frac{2}{n(n-1)} \sum_{j \neq i} \hat\sigma^2(X_i) \hat\sigma^2(X_j)\, h^{-1} K_h^2((X_i - X_j)'\beta).$$

For instance, one can consider

$$\hat\sigma^2(x) = \frac{\sum_{i=1}^{n} Y_i^2 I\{\|x - X_i\| \le b_n\}}{\sum_{i=1}^{n} I\{\|x - X_i\| \le b_n\}} - \left( \frac{\sum_{i=1}^{n} Y_i I\{\|x - X_i\| \le b_n\}}{\sum_{i=1}^{n} I\{\|x - X_i\| \le b_n\}} \right)^2,$$

where $b_n$ is a bandwidth parameter chosen independently of $h$. Guerre and Lavergne (2005) provide some primitive conditions for (3.3). Then it is straightforward to show that $\hat v_n^2(\beta) / v_n^2(\beta) = 1 + o_P(1)$ for any $\beta$. Given our focus, we shall proceed under (3.3).
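The ingredients introduced so far, namely $Q_n(\theta, \beta)$, the penalized choice of direction (3.2), the variance estimator $\hat v_n^2(\beta)$ and the statistic $T_n$, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the residuals `U` and variance estimates `sigma2` are taken as given, the maximization over the sphere is replaced by a finite list of candidate directions, the biweight kernel is the one used in the simulations of Section 4, and all function names are hypothetical.

```python
import numpy as np

def biweight(u):
    """Biweight kernel on [-1, 1], a bounded symmetric density."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def _kernel_matrix(X, beta, h):
    """Matrix with entries h^{-1} K((X_i - X_j)'beta / h) and zero diagonal."""
    index = X @ beta
    k = biweight((index[:, None] - index[None, :]) / h) / h
    np.fill_diagonal(k, 0.0)  # the sums run over j != i
    return k

def q_stat(U, X, beta, h):
    """Q_n(theta, beta) computed from residuals U_i(theta)."""
    n = len(U)
    return U @ _kernel_matrix(X, beta, h) @ U / (n * (n - 1))

def v2_hat(sigma2, X, beta, h):
    """Variance estimator: 2/(n(n-1)) sum_{j != i} s_i s_j h^{-1} K^2(./h)."""
    n = len(sigma2)
    k = _kernel_matrix(X, beta, h)  # equals K(./h) / h
    return 2.0 * h * (sigma2 @ (k**2) @ sigma2) / (n * (n - 1))

def penalized_direction(U, X, beta0, candidates, h, alpha_n):
    """Criterion (3.2) over a finite candidate set: leave beta0 only if the gain exceeds alpha_n."""
    q0 = q_stat(U, X, beta0, h)
    q_cand = np.array([q_stat(U, X, b, h) for b in candidates])
    best = int(np.argmax(q_cand))
    if q_cand[best] - alpha_n > q0:
        return candidates[best], q_cand[best]
    return beta0, q0

def t_stat(U, X, sigma2, beta0, candidates, h, alpha_n):
    """T_n = n h^{1/2} Q_n(beta_hat) / v_hat(beta0), with the selected direction."""
    n = len(U)
    beta_hat, q_hat = penalized_direction(U, X, beta0, candidates, h, alpha_n)
    t_n = n * np.sqrt(h) * q_hat / np.sqrt(v2_hat(sigma2, X, beta0, h))
    return t_n, beta_hat
```

With a very large penalty the procedure reduces to a univariate test along $\beta_0$, while with a zero penalty it returns the largest value of $Q_n$ over the candidates. The test compares $T_n$ with the normal quantile $z_\alpha$ or with a bootstrap critical value.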

3.3 Behavior under the null hypothesis

Our first task is to study the behavior of the process $Q_n(\hat\theta_n, \beta)$ as indexed by $\beta$ under $H_0$. Since $U_i(\hat\theta_n) = \varepsilon_i - \{\mu(X_i; \hat\theta_n) - \mu(X_i; \theta_0)\}$ under $H_0$, it has the following decomposition:

$$Q_n(\hat\theta_n, \beta) = Q_{0n}(\beta) + 2 Q_{1n}(\hat\theta_n, \beta) + Q_{2n}(\hat\theta_n, \beta),$$

where

$$Q_{0n}(\beta) = \frac{1}{n(n-1)} \sum_{j \neq i} \varepsilon_i \varepsilon_j\, h^{-1} K_h((X_i - X_j)'\beta),$$

$$Q_{1n}(\hat\theta_n, \beta) = -\frac{1}{n(n-1)} \sum_{j \neq i} \varepsilon_i \{\mu(X_j; \hat\theta_n) - \mu(X_j; \theta_0)\}\, h^{-1} K_h((X_i - X_j)'\beta),$$

$$Q_{2n}(\hat\theta_n, \beta) = \frac{1}{n(n-1)} \sum_{j \neq i} \{\mu(X_i; \hat\theta_n) - \mu(X_i; \theta_0)\} \{\mu(X_j; \hat\theta_n) - \mu(X_j; \theta_0)\}\, h^{-1} K_h((X_i - X_j)'\beta).$$

Lemma 3.1 Let Assumptions D, M, and K hold. Then
(i) $\sup_{\|\beta\| = 1} |Q_{0n}(\beta)| = O_P(n^{-1} h^{-(1/2+\gamma)})$ for any $\gamma > 0$;
(ii) if $\hat\theta_n - \theta_0 = O_P(n^{-1/2})$, $\sup_{\|\beta\| = 1} |2 Q_{1n}(\hat\theta_n, \beta) + Q_{2n}(\hat\theta_n, \beta)| = o_P(n^{-1} h^{-1/2})$.

The proof is given in Section 6. We now describe the behavior of $\hat\beta_n$ under $H_0$.

Lemma 3.2 Let Assumptions D, M, and K hold. Consider a positive sequence $\alpha_n$ such that $\alpha_n n h^{1/2+\gamma} \to C > 0$ for some $\gamma > 0$. Under $H_0$, $P(\hat\beta_n = \beta_0) \to 1$.

Proof. By definition, for all $n$, $Q_n(\hat\theta_n, \beta_0) \ge Q_n(\hat\theta_n, \hat\beta_n) - \alpha_n I(\hat\beta_n \neq \beta_0)$. This implies that

$$0 \le I(\hat\beta_n \neq \beta_0) \le \alpha_n^{-1} \left\{ Q_n(\hat\theta_n, \hat\beta_n) - Q_n(\hat\theta_n, \beta_0) \right\}.$$

From Lemma 6.1, $\hat\theta_n - \theta_0 = O_P(n^{-1/2})$ under $H_0$, and then from Lemma 3.1, $Q_n(\hat\theta_n, \hat\beta_n) - Q_n(\hat\theta_n, \beta_0) = O_P(n^{-1} h^{-(1/2+\gamma/2)})$. Then $\alpha_n n h^{1/2+\gamma} \to C > 0$ yields $I(\hat\beta_n \neq \beta_0) = O_P(h^{\gamma/2}) = o_P(1)$. Use the boundedness of $I(\cdot)$ to conclude that $P(\hat\beta_n \neq \beta_0) = E[I(\hat\beta_n \neq \beta_0)] \to 0$.

The asymptotic behavior of our tests under the null hypothesis can then be stated.

Theorem 3.3 Under Assumptions D, M, K, and (3.3), if $\alpha_n n h^{1/2+\gamma} \to C > 0$ for some $\gamma > 0$, then the tests based on $T_n$ or $\tilde T_n$ have asymptotic level $\alpha$.

Proof. From Lemma 3.2, $P[Q_n(\hat\theta_n, \hat\beta_n) = Q_n(\hat\theta_n, \beta_0)]$ and $P[\hat v_n^2(\hat\beta_n) = \hat v_n^2(\beta_0)]$ both converge to one. By Condition (3.3), $\hat v_n^2(\beta_0) = v_n^2(\beta_0)(1 + o_P(1))$. From Lemmas 6.1 and 3.1, $nh^{1/2} Q_n(\hat\theta_n, \beta_0) = nh^{1/2} Q_{0n}(\beta_0) + o_P(1)$. From Lemma 2-(i) by Guerre and Lavergne (2005), $nh^{1/2} Q_{0n}(\beta_0) / v_n(\beta_0)$ converges to a standard normal conditionally upon the $X_i$ if $\mathrm{Sp}(W_{\beta_0}) \|W_{\beta_0}\|^{-1} \stackrel{p}{\to} 0$, where $W_{\beta_0} = [I(i \neq j) K_h((X_i - X_j)'\beta_0) / (h\, n(n-1)), \ i, j = 1, \ldots, n]$ and $\mathrm{Sp}(W_\beta)$ is the spectral radius of the matrix $W_\beta$. Lemma 6.2 allows us to conclude.

Technical comments. Lemma 3.1 is the theoretical key that drives our results. The quantity $Q_{0n}(\beta)$ is in probability of order $(nh^{1/2})^{-1}$ for any $\beta$; however, the supremum over $\beta$ is not regular enough to be shown to be of the same order, at least from the results of Sherman (1994a) we use here. It is an open question whether the self-normalized process $Q_{0n}(\beta) / v_n(\beta)$ has a more regular behavior. The study of $Q_{1n}(\hat\theta_n, \beta)$ raises a similar problem. Namely, for fixed $\beta$, standard empirical process methods show that $Q_{1n}(\hat\theta_n, \beta) = O_P(n^{-1})$, see e.g. Guerre and Lavergne (2005), but the same does not seem to hold uniformly over $\beta$. All the trouble here comes from the fact that any $\beta$ is a solution of (2.2). As will be seen shortly, the study under fixed and directional alternatives is simpler.

3.4 Behavior under directional alternatives

A simple inequality is at the heart of the consistency of our test. Indeed, we have

$$\tilde T_n \ge T_n = nh^{1/2} \frac{Q_n(\hat\theta_n, \hat\beta_n)}{\hat v_n(\beta_0)} = \frac{nh^{1/2}}{\hat v_n(\beta_0)} \left[ \max_{\|\beta\| = 1} \left\{ Q_n(\hat\theta_n, \beta) - \alpha_n I(\beta \neq \beta_0) \right\} + \alpha_n I(\hat\beta_n \neq \beta_0) \right]$$

$$\ge \frac{nh^{1/2}}{\hat v_n(\beta_0)} \left[ \max_{\|\beta\| = 1} Q_n(\hat\theta_n, \beta) - \alpha_n \right] \ge \frac{nh^{1/2}}{v_n(\beta_0)(1 + o_P(1))} \left\{ Q_n(\hat\theta_n, \beta) - \alpha_n \right\} \ \text{for any } \beta. \qquad (3.4)$$

Hence, the test based on $T_n$ (or $\tilde T_n$) is consistent if the last lower bound stays away from zero with probability tending to one for some $\beta$. It is easily seen that our test is consistent under the assumptions of Theorem 3.3 provided $\hat\theta_n$ converges to some pseudo-true value $\theta^*$. Indeed, when the model is misspecified, there exists at least one $\beta$ for which $Q(\theta^*, \beta) > 0$.

Let us now investigate the ability of our test to detect directional departures from the null hypothesis. Consider a real-valued function $\delta(X)$ such that

$$E[\delta(X) \dot\mu(X; \theta_0)] = 0 \ \text{ and } \ 0 < E[\delta^4(X)] < \infty, \qquad (3.5)$$

and the sequence of alternatives defined as

$$H_n: \; E[Y \mid X] = \mu(X; \theta_0) + r_n \delta(X), \quad n \ge 1. \qquad (3.6)$$

Note that there is no smoothness restriction on the function $\delta(\cdot)$ as is frequent in this kind of analysis, see e.g. Zheng (1996). Under $H_n$, $\hat\theta_n - \theta_0 = O_P(n^{-1/2})$ as proved by Lemma 6.1 in Section 6. We show below that such directional alternatives can be detected if $\alpha_n / r_n^2$ tends to zero. Given the conditions of Theorem 3.3, this means that $r_n^2 n h^{1/2+\gamma} \to \infty$ for some small $\gamma > 0$, where $h$ applies to the univariate variable defined by a single linear index in $X$. By comparison, when one uses a standard multidimensional smooth test, $r_n^2 n h^{q/2} \to \infty$ is needed for consistency. In other words, from the theoretical point of view, our test does not suffer from the curse of dimensionality against directional alternatives: whatever the number of regressors, the power can remain arbitrarily close to the power obtained in the unidimensional case.

Theorem 3.4 Under Assumptions D, M, K, and (3.3), if $r_n^2 n h^{1/2} \to \infty$ and $\alpha_n / r_n^2 \to 0$, the tests based on $T_n$ and $\tilde T_n$ are consistent against the sequence of alternatives $H_n$ with $\delta(X)$ satisfying (3.5).

Proof. By Assumption D-(b), $v_n^2(\beta) \le 2 \bar\sigma^4 n^2 h \|W_\beta\|^2$, where $W_\beta$ is the matrix with generic element $I(i \neq j) K_h((X_i - X_j)'\beta) / (h\, n(n-1))$. Lemma 6.2 then ensures that $v_n^2(\beta_0)$ is bounded in probability from above for any $\beta_0$. Under $H_n$, $U_i(\hat\theta_n) = \mu(X_i; \theta_0) + r_n \delta(X_i) + \varepsilon_i - \mu(X_i; \hat\theta_n)$. Then by simple algebra, $Q_n(\hat\theta_n, \beta)$ can be decomposed for any $\beta$ as

$$Q_{0n}(\beta) + 2 Q_{1n}(\hat\theta_n, \beta) + Q_{2n}(\hat\theta_n, \beta) + 2 Q_{3n}(\hat\theta_n, \beta) + 2 Q_{4n}(\beta) + Q_{5n}(\beta),$$

where

$$Q_{3n}(\hat\theta_n, \beta) = -\frac{r_n}{n(n-1)} \sum_{j \neq i} \delta(X_i) \{\mu(X_j; \hat\theta_n) - \mu(X_j; \theta_0)\}\, h^{-1} K_h((X_i - X_j)'\beta),$$

$$Q_{4n}(\beta) = \frac{r_n}{n(n-1)} \sum_{j \neq i} \varepsilon_i \delta(X_j)\, h^{-1} K_h((X_i - X_j)'\beta),$$

$$Q_{5n}(\beta) = \frac{r_n^2}{n(n-1)} \sum_{j \neq i} \delta(X_i) \delta(X_j)\, h^{-1} K_h((X_i - X_j)'\beta).$$

Since $v_n^2(\beta) \le 2 \bar\sigma^4 n^2 h \|W_\beta\|^2 = O_P(1)$, $nh^{1/2} Q_{0n}(\beta) = O_P(1)$ for any $\beta$. Lemma 3.1-(ii) deals with $Q_{1n}(\hat\theta_n, \beta)$ and $Q_{2n}(\hat\theta_n, \beta)$. It is shown in Section 6 that for any $\beta$

$$Q_{3n}(\hat\theta_n, \beta) = O_P(r_n n^{-1/2}), \qquad (3.7)$$

$$Q_{4n}(\beta) = O_P(r_n n^{-1/2}), \qquad (3.8)$$

$$Q_{5n}(\beta) = r_n^2 E\left[ E^2[\delta(X) \mid X'\beta] f_\beta(X'\beta) \right] + o_P(r_n^2). \qquad (3.9)$$

Collecting results and using $\alpha_n / r_n^2 \to 0$, it follows that for any $\beta$

$$\frac{nh^{1/2} \left\{ Q_n(\hat\theta_n, \beta) - \alpha_n \right\}}{v_n(\beta_0)(1 + o_P(1))} \ge C nh^{1/2} \left\{ r_n^2 E\left[ E^2[\delta(X) \mid X'\beta] f_\beta(X'\beta) \right] + o_P(r_n^2) \right\}.$$

Choose $\beta$ such that $E[E^2[\delta(X) \mid X'\beta] f_\beta(X'\beta)] > 0$, which is possible from Lemma 2.1. The conclusion then follows from Inequality (3.4).

4 Small sample implementation

4.1 Bootstrap critical values

The wild bootstrap, initially proposed by Wu (1986), is often used in smooth tests to compute small sample critical values, see e.g. Härdle and Mammen (1993). Here we use a generalization of this method, the smooth conditional moments bootstrap introduced by Gozalo (1997). It consists in drawing $n$ i.i.d. random variables $\omega_i$ independent from the original sample with $E\omega_i = 0$, $E\omega_i^2 = 1$, and $E\omega_i^4 < \infty$, and generating bootstrap observations of $Y_i$ as $Y_i^* = \mu(X_i, \hat\theta_n) + \hat\sigma(X_i) \omega_i$, $i = 1, \ldots, n$. Bootstrap test statistics are built from the bootstrap sample in the same way as the original test statistic. When this scheme is repeated many times, the bootstrap critical value $z^*_{\alpha,n}$ at level $\alpha$ is the empirical $(1-\alpha)$-th quantile of the bootstrapped test statistic. This critical value is then compared to the initial test statistic.

Theorem 4.1 Under the assumptions of Theorem 3.3, the bootstrap critical values yield a test based on $T_n$ or $\tilde T_n$ with asymptotic level $\alpha$.

The proof follows easily from our previous results and is thus omitted.

4.2 Simulation study

Our focus was first to compare the small sample power of our test to the multivariate test of Zheng (1996) and Li and Wang (1998), and second to determine the sensitivity of our test to the penalty $\alpha_n$, the direction defined by $\beta_0$ and the smoothing parameter $h$. For simplicity, we considered the null hypothesis $H_0: E(Y \mid X) = 0$. We generated samples of 50 observations from independent uniformly distributed variables $X_1, X_2, X_3$. Each variable was distributed as $U[-\sqrt{3}, \sqrt{3}]$ to get unit variance. We sampled errors from a standard normal distribution and constructed the response variable as

$$Y_i = d \left\{ \cosh\left( \frac{X_{1i} + 2X_{2i} + 3X_{3i}}{\sqrt{3}} \right) - e \right\} + \varepsilon_i, \quad i = 1, \ldots, 50,$$

with $e$ a centering constant equal to $\sinh(1)\sinh(2)\sinh(3)/6$. We considered (i) Zheng's test when the index $(X_1 + 2X_2 + 3X_3)/\sqrt{14}$ is considered as the only regressor; (ii) Zheng's test when all three regressors are taken into account; (iii) our test based on $T_n$ (results for $\tilde T_n$ differed little and are not reported). To speed up computations, we assumed that the error variance was known for all the tests. The optimization was carried out on a grid of 5000 points sampled from the uniform distribution on the unit sphere in $\mathbb{R}^3$. From 5000 samples generated under the null hypothesis, i.e. with $d = 0$, we computed the test statistics and obtained small sample critical values that allow us to calibrate the level of the different tests at 5%. This is equivalent to bootstrapping since no parameter is estimated under $H_0$. We then drew the power curves of the different tests based on 2000 samples for each point of the grid $d = 0.1, 0.2, \ldots$

To compute the test statistics, we used a biweight kernel with support $[-1, 1]$ and we selected the bandwidth as $h = b\, n^{-2/(8+q)}$, with $q = 3$ in Case (ii) and $q = 1$ in the other cases, where $b$ varies in $\{0.5, 1, 1.5, \ldots, 4\}$. To set $\alpha_n$, we computed $v_0$, the mean of $\hat v_n(\hat\beta_n)$, which was found to vary little with $\beta_0$. We then chose $\alpha_n = a\, v_0\, n^{-1} h^{-0.5}$, with $v_0 = 0.65$ in our case, and we let $a$ vary in a grid from 2 to 10.

We first set $\beta_0$ to $(1, 1, 1)'/\sqrt{3}$, a natural choice if one does not favor any regressor at the outset. Figure 1 compares the power curves of Zheng's tests and of our test for the different values of $\alpha_n$ when $a = 2$, 7 and 10 and the bandwidth constant $b$ is set to one. The first striking fact is the large loss in power for Zheng's test when going from dimension one to three. In practice however, the test based on the unknown single linear index is infeasible. The second striking fact is that our test largely outperforms Zheng's test in dimension 3. The power curve of our dimension-reduction test is very close to that of the infeasible test for $a = 2$ and moves away from it as $\alpha_n$ increases, as expected. Still, the gain in power with respect to Zheng's test is large for $a = 10$.

We then considered two polar cases. In Figure 2, $\beta_0$ was chosen as the true unknown index. In that case, the power curves of the infeasible test and of our test are almost identical whatever the value of $\alpha_n$. Figure 3 depicts the less favorable case where $\beta_0$ is set to $(0, 1, 0)'$, that is, we favor alternatives depending upon $X_2$ only.
When α n is small, our test performs well, but its power decreases when α n increases. For the largest considered penalty, our test is beaten by Zheng s test for small alternatives, but the reverse holds for larger alternatives. In Figure 4 we drew the power of the tests as a function of the bandwidth constant b for d = 0.4 to illustrate that our main findings were very little dependent of the chosen 8

smoothing parameter. For a small bandwidth, our test can even outperform the infeasible test. Moreover, the power of our test is very stable while the performance of Zheng's test is much more variable in either dimension.

In a second set of simulations, the true regression depended on two linear indices, and we generated samples as
$$Y_i = 0.3\, d \left\{ \cosh(3X_{1i} + X_{2i}) + \cosh(X_{2i} + 3X_{3i}) - e \right\} + \varepsilon_i, \qquad i = 1, \ldots, n,$$
with $e$ a centering constant equal to $2 \sinh(1) \sinh(3)/3$. Other features of the experiments were unchanged, except that we now considered as a benchmark Zheng's test based on the two linear indices entering the regression function. Figures 5 to 8 sum up our findings for different values of $\alpha_n$, $\beta_0$ and $h$. Compared to our first set of experiments, the benchmark test has power close to that of Zheng's test in dimension 3, and our test performs better than the benchmark test for different values of $\beta_0$ and $h$, as long as $\alpha_n$ is not too large.

5 Conclusion

We have proposed a general approach to circumvent the curse of dimensionality in testing moment restrictions conditional upon a multivariate random variable $X$. Lemma 2.1 is the key to our approach. It shows that for testing $E(Z \mid X) = 0$, it is sufficient to test whether $E[Z\, E(Z \mid X'\beta)] = 0$ for all $\beta$ of norm 1. In practice, an index is selected by maximizing an estimator of the previous quantity minus a penalty function that aims at obtaining a simple behavior under the null hypothesis. Our approach applies to many testing problems, as explained in Section 2.1. We have applied it to testing for a parametric regression function. The test has known asymptotic critical values and behaves against directional alternatives almost as if the dimension of $X$ was one. Our simulation results confirm the good power of the test.

Much work remains to be done to elaborate further on our general approach. The uniform behavior of our test against smooth alternatives should be studied. Also, from our

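key result, other testing procedures could be constructed, such as an integrated conditional moment test in the spirit of Bierens (1982). We are currently investigating these issues. Finally, future work should be devoted to applying our approach to other testing problems.

6 Technicalities

In the following, $C$ is a positive constant that may vary from line to line.

Lemma 6.1 Under Assumptions D(a) and M, $\hat\theta_n - \theta_0 = O_P(n^{-1/2})$ under $H_0$ and $H_{1n}$.

Proof. We will proceed under $H_{1n}$, and the result under $H_0$ follows taking $r_n = 0$. By a uniform law of large numbers for Euclidean families, see e.g. Pakes and Pollard (1989, Lemma 2.8),
$$\sup_{\theta,\,\delta(\cdot)} \left| \frac{1}{n} \sum_{i=1}^n \{\varepsilon_i + \mu(X_i;\theta_0) + r_n\delta(X_i) - \mu(X_i;\theta)\}^2 - E\{\varepsilon + \mu(X;\theta_0) - \mu(X;\theta)\}^2 \right| = o_P(1).$$
The Euclidean property is ensured by Assumption M(a). As $\theta_0$ is identifiable from Assumption M(b), deduce that $\hat\theta_n \to_p \theta_0$. Now, by definition of $\hat\theta_n$,
$$0 \le \frac{1}{n} \sum_{i=1}^n [\varepsilon_i + r_n\delta(X_i)]^2 - \frac{1}{n} \sum_{i=1}^n \big[ \varepsilon_i + r_n\delta(X_i) - \{\mu(X_i;\hat\theta_n) - \mu(X_i;\theta_0)\} \big]^2$$
$$= -\frac{1}{n} \sum_{i=1}^n \{\mu(X_i;\hat\theta_n) - \mu(X_i;\theta_0)\}^2 + \frac{2}{n} \sum_{i=1}^n \{\varepsilon_i + r_n\delta(X_i)\} \{\mu(X_i;\hat\theta_n) - \mu(X_i;\theta_0)\}$$
$$\le -\|\hat\theta_n - \theta_0\|^2 \Big\{ \frac{1}{n} \sum_{i=1}^n \Phi_3^2(X_i) \Big\} + (\hat\theta_n - \theta_0)' \Big\{ \frac{2}{n} \sum_{i=1}^n [\varepsilon_i + r_n\delta(X_i)]\, \dot\mu(X_i;\theta_0) \Big\} + (\hat\theta_n - \theta_0)' \Big\{ \frac{2}{n} \sum_{i=1}^n [\varepsilon_i + r_n\delta(X_i)]\, \ddot\mu(X_i;\hat\theta_n,\theta_0) \Big\} (\hat\theta_n - \theta_0)$$
$$=: -A_n \|\hat\theta_n - \theta_0\|^2 + (\hat\theta_n - \theta_0)' B_n + (\hat\theta_n - \theta_0)' C_n (\hat\theta_n - \theta_0).$$
Now $A_n - A = O_P(n^{-1/2})$, where $A = E[\Phi_3^2(X)] > 0$, and $B_n = O_P(n^{-1/2} + r_n n^{-1/2})$. On the event $E_n = \{A_n \ge 3A/4\} \cap \{Sp(C_n) \le A/4\}$, we then have
$$\frac{A}{2} \|\hat\theta_n - \theta_0\|^2 - \|B_n\|\, \|\hat\theta_n - \theta_0\| \le 0,$$
that is $\|\hat\theta_n - \theta_0\| \le 2A^{-1} \|B_n\|$. As $\hat\theta_n \to_p \theta_0$ and by Assumption M(a), $P(E_n) \to 1$ and thus $\hat\theta_n - \theta_0 = O_P(n^{-1/2})$.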
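Lemma 6.1 can be illustrated numerically in a toy case. The sketch below is only an illustration under assumptions that are not taken from the paper: a linear regression $\mu(x;\theta) = \theta x$ with scalar $X$ uniform on $[-1, 1]$, standard normal errors, a deviation $\delta(x) = x^2 - 1/3$ chosen so that $E[\delta(X) X] = 0$, and a local-alternative rate $r_n = n^{-1/4}$. Under these assumptions the scaled error $n^{1/2} |\hat\theta_n - \theta_0|$ should stay of the same order as $n$ grows.

```python
import math
import random


def ols_slope_error(n, theta0=1.0, seed=0):
    """Return sqrt(n) * |theta_hat - theta0| for one sample drawn under a local alternative."""
    rng = random.Random(seed)
    r_n = n ** -0.25                      # hypothetical rate of the local alternative
    sxy = 0.0
    sxx = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        delta = x * x - 1.0 / 3.0         # E[delta(X) * X] = 0 by symmetry
        y = theta0 * x + r_n * delta + rng.gauss(0.0, 1.0)
        sxy += x * y
        sxx += x * x
    theta_hat = sxy / sxx                 # least-squares estimator for mu(x; theta) = theta * x
    return math.sqrt(n) * abs(theta_hat - theta0)


for n in (100, 1000, 10000):
    print(n, round(ols_slope_error(n), 3))
```

In this toy design the scaled error has a standard deviation of roughly $\sqrt{3}$, so the printed values should remain moderate at every sample size even though the regression is misspecified by $r_n \delta(X)$.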

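For real random variables, $A_n \asymp_P B_n$ means that $P(1/C \le A_n/B_n \le C)$ goes to 1 when $n$ grows.

Lemma 6.2 Let $W_\beta$ be the matrix with generic element $I(i \ne j)\, K_h((X_i - X_j)'\beta) / (h\, n(n-1))$. Under Assumptions D(c) and K, $Sp(W_\beta) = O_P(n^{-1})$ and $n h^{1/2} \|W_\beta\| \asymp_P 1$ for any $\beta$.

Proof. By definition, $Sp(W_\beta) = \sup_{u \ne 0} \|W_\beta u\| / \|u\|$ and for any $u \in R^n$,
$$\|W_\beta u\|^2 = \sum_{i=1}^n \Big( \sum_{j=1, j \ne i}^n \frac{K_h((X_i - X_j)'\beta)}{h\, n(n-1)}\, u_j \Big)^2 \le \sum_{i=1}^n \Big( \sum_{j=1, j \ne i}^n \frac{K_h((X_i - X_j)'\beta)}{h\, n(n-1)} \Big) \Big( \sum_{j=1, j \ne i}^n \frac{K_h((X_i - X_j)'\beta)}{h\, n(n-1)}\, u_j^2 \Big) \le \|u\|^2 \Big( \max_{1 \le i \le n} \sum_{j=1, j \ne i}^n \frac{K_h((X_i - X_j)'\beta)}{h\, n(n-1)} \Big)^2.$$
Hence
$$n\, Sp(W_\beta) \le \max_{1 \le i \le n} \frac{1}{(n-1) h} \sum_{j \ne i} K_h((X_i - X_j)'\beta).$$
For all $j$ and $\beta$, $|K_h((x - X_j)'\beta)| \le C$ and $Var[K_h((x - X_j)'\beta)] \le C$. Thus the Bernstein inequality yields for any $t > 0$
$$P\Big( \Big( \frac{(nh^2)^{\alpha}}{\ln n} \Big)^{1/2} \max_{1 \le i \le n} \Big| \frac{1}{n-1} \sum_{j \ne i} h^{-1} K_h((X_i - X_j)'\beta) - E\big[ h^{-1} K_h((X_i - X_j)'\beta) \mid X_i \big] \Big| \ge t \Big)$$
$$\le \sum_{1 \le i \le n} E\Big[ P\Big( \Big| \frac{1}{n-1} \sum_{j \ne i} K_h((X_i - X_j)'\beta) - E\big[ K_h((X_i - X_j)'\beta) \mid X_i \big] \Big| \ge t h \Big( \frac{\ln n}{(nh^2)^{\alpha}} \Big)^{1/2} \,\Big|\, X_i \Big) \Big]$$
$$\le 2n \exp\Big[ -\frac{t^2 (nh^2) (\ln n)}{2C \big( (nh^2)^{\alpha} + t h (nh^2)^{\alpha/2} (\ln n)^{1/2} \big)} \Big] \le 2 \exp\Big[ \ln(n) - \frac{t^2}{C} (\ln n) (nh^2)^{1-\alpha} \Big] \to 0,$$
since $nh^2 \to \infty$. Moreover, using the proof of (3.9) with $\delta(x) \equiv 1$, deduce
$$E\big\{ E\big[ h^{-1} K_h((X_i - X_j)'\beta) \mid X_i \big] \big\} \to E[f_\beta(X_i'\beta)] \le C$$
for some $C$ independent of $\beta$. By Markov's inequality, $E[h^{-1} K_h((X_i - X_j)'\beta) \mid X_i] = O_P(1)$. This gives the first result. For the second result,
$$n^2 h \|W_\beta\|^2 = \frac{1}{(n-1)^2 h} \sum_{i \ne j} K_h^2((X_i - X_j)'\beta) \to_p E[f_\beta(X'\beta)] \int K^2(u)\, du$$
follows by adapting the proof of (3.9) below with $\delta(x) \equiv 1$. The last quantity is bounded from above and below by Assumptions D(c) and K(a).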
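The deterministic bound used in the proof of Lemma 6.2, namely that $n\, Sp(W_\beta)$ is at most the largest kernel row average, can be checked numerically. The sketch below is only an illustration: it assumes $K_h(u) = K(u/h)$ with a biweight kernel, works directly with scalar index values $X_i'\beta$, and approximates the spectral radius of the symmetric matrix $W_\beta$ by power iteration. All function names are hypothetical.

```python
import math
import random


def biweight(u):
    """Biweight kernel with support [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def weight_matrix(idx, h):
    """W with entries I(i != j) K((idx_i - idx_j)/h) / (h n (n-1))."""
    n = len(idx)
    c = 1.0 / (h * n * (n - 1))
    return [[0.0 if i == j else c * biweight((idx[i] - idx[j]) / h)
             for j in range(n)] for i in range(n)]


def spectral_radius(W, iters=200):
    """Power iteration; for symmetric W each iterate ||W u|| is a lower bound on Sp(W)."""
    n = len(W)
    u = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        v = [sum(W[i][j] * u[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in v))
        if lam == 0.0:
            return 0.0
        u = [x / lam for x in v]
    return lam


def row_average_bound(idx, h):
    """max_i (1/((n-1) h)) * sum_{j != i} K((idx_i - idx_j)/h)."""
    n = len(idx)
    return max(sum(biweight((idx[i] - idx[j]) / h) for j in range(n) if j != i)
               for i in range(n)) / ((n - 1) * h)


rng = random.Random(1)
index_values = [rng.uniform(-1.0, 1.0) for _ in range(60)]
sp = spectral_radius(weight_matrix(index_values, 0.3))
print(60 * sp, "<=", row_average_bound(index_values, 0.3))
```

The comparison printed at the end holds for any sample, because it only relies on the Cauchy-Schwarz step of the proof and on the symmetry of the kernel.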

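Proof of Lemma 3.1. (i) Consider the degenerate U-process
$$U_n g = \frac{1}{n(n-1)} \sum_{j \ne i} \varepsilon_i \varepsilon_j K_h((X_i - X_j)'\beta)$$
defined by the functions $g$ indexed by $h$ and $\beta$ with $\|\beta\| = 1$. By Assumption D, Lemma 22(ii) of Nolan and Pollard (1987) and Lemma 2.14(ii) of Pakes and Pollard (1989), the family $\{g : \|\beta\| = 1, h > 0\}$ is Euclidean for an envelope with bounded fourth moment. By Sherman's (1994a) Main Corollary with $p = 1$ and Hölder's inequality,
$$E \sup_{\|\beta\| = 1, h > 0} |n U_n g| \le \Lambda \Big[ E \Big\{ \sup_{\|\beta\| = 1, h > 0} (U_{2n} g^2)^{\alpha} \Big\} \Big]^{1/2} \le \Lambda\, E^{\alpha/2} \Big\{ \sup_{\|\beta\| = 1, h > 0} U_{2n} g^2 \Big\}, \qquad (6.10)$$
where $\Lambda$ is a universal constant and $0 < \alpha < 1$. Apply Hoeffding's decomposition for $U_{2n} g^2$ and Corollary 4(i) of Sherman (1994a) to deduce that
$$E \Big\{ \sup_{\|\beta\| = 1, h > 0} U_{2n} g^2 \Big\} \le E \Big\{ \sup_{\|\beta\| = 1, h > 0} \big| U_{2n} g^2 - E(g^2) \big| \Big\} + \sup_{\|\beta\| = 1, h > 0} E(g^2) \le O(n^{-1/2}) + \sup_{\|\beta\| = 1, h > 0} E(g^2).$$
Using the boundedness of $\sigma^2(\cdot)$ and $f_\beta(\cdot)$,
$$E(g^2) \le \sup_x \sigma^4(x)\, E\big\{ K_h^2((X_i - X_j)'\beta) \big\} \le \sup_x \sigma^4(x)\, h \int_R \Big\{ \int_R K^2(t) f_\beta(v + th)\, dt \Big\} f_\beta(v)\, dv \le C h$$
with $C > 0$ independent of $h$ and $\beta$. Inequality (6.10) and $nh^2 \to \infty$ then yield
$$E \sup_{\|\beta\| = 1} \big| n h^{(1/2 + \gamma)} Q_{0n}(\beta) \big| = E \sup_{\|\beta\| = 1} \big| n h^{(-1/2 + \gamma)} U_n g \big| = O(h^{\gamma + (\alpha - 1)/2}).$$
Choose $\alpha > 1 - 2\gamma$ to obtain the result.

(ii) Consider $V_n(\theta_0) = \{\theta \in \Theta : \|\theta - \theta_0\| \le n^{-1/2} M\}$, a shrinking neighborhood of $\theta_0$. By Lemma 6.1, $P[\hat\theta_n \in V_n(\theta_0)] \to 1$ whenever $M \to \infty$. Let $W = (\varepsilon, X')'$ and
$$g_{\theta,h,\beta}(W_i, W_j) = \varepsilon_i \{\mu(X_j;\theta) - \mu(X_j;\theta_0)\} K_h((X_i - X_j)'\beta),$$
which is such that $E[g_{\theta,h,\beta}(W_i, W_j) \mid W_j] = 0$. From our assumptions, the class of functions $g_{\theta,h,\beta}(\cdot,\cdot)$, $\theta \in \Theta$, $h \in (0, 1]$, $\|\beta\| = 1$, is Euclidean for a squared-integrable envelope $F(W_i, W_j) = |\varepsilon_i| \Phi(X_j)$ where $\Phi(\cdot) = C \sum_{i=1}^{2} \Phi_i(\cdot)$, for some suitable constant $C$, cf. Nolan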

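and Pollard (1987, Lemma 22(ii)) and Pakes and Pollard (1989, Lemma 2.13 and Lemma 2.14(ii)). Apply Hoeffding's decomposition to the U-process $h Q_{1n}(\theta,\beta)$ and consider the second-order degenerate U-process in this decomposition, $U_n \tilde g_{\theta,h,\beta}$, with
$$\tilde g_{\theta,h,\beta}(W_i, W_j) = g_{\theta,h,\beta}(W_i, W_j) - E[g_{\theta,h,\beta}(W_i, W_j) \mid W_i].$$
By Lemma 5 of Sherman (1994a), the family $\tilde g_{\theta,h,\beta}$, $\theta \in \Theta$, $h \in (0, 1]$, $\|\beta\| = 1$, is Euclidean for a squared-integrable envelope. From the Main Corollary of Sherman with $p = 1$ and $k = 2$,
$$E \Big[ \sup_{\theta \in V_n(\theta_0), h, \beta} |n U_n \tilde g_{\theta,h,\beta}| \Big] \le \Lambda \Big[ E \sup_{\theta \in V_n(\theta_0), h, \beta} \big\{ U_{2n} \tilde g_{\theta,h,\beta}^2 \big\}^{\alpha} \Big]^{1/2}, \qquad (6.11)$$
where $\Lambda$ is a universal constant and $0 < \alpha < 1$. We have
$$|\tilde g_{\theta,h,\beta}(W_i, W_j)| \le C \|\theta - \theta_0\|\, |\varepsilon_i| \big\{ \Phi(X_j) + E[\Phi(X_j) \mid W_i] \big\} \le C' \|\theta - \theta_0\|\, |\varepsilon_i| \{\Phi(X_j) + 1\}$$
for some constants $C$, $C'$. Hence, from Inequality (6.11),
$$E \Big[ \sup_{\theta \in V_n(\theta_0), h, \beta} |n U_n \tilde g_{\theta,h,\beta}| \Big] \le C \Big( \frac{M^2}{n} \Big)^{\alpha/2},$$
for some $C > 0$, and by Chebyshev's inequality,
$$\sup_{\|\beta\| = 1} \big| n h^{-1/2} U_n \tilde g_{\hat\theta_n,h,\beta} \big| = O_P\big( (n h^{1/\alpha})^{-\alpha/2} \big). \qquad (6.12)$$
We now study the U-process of order 1 in Hoeffding's decomposition of $h Q_{1n}(\theta,\beta)$. Let $P_n \bar g$ denote this empirical process, where
$$\bar g(W_i) = \bar g_{\theta,h,\beta}(W_i) = E[g_{\theta,h,\beta}(W_i, W_j) \mid W_i] = \varepsilon_i\, E\big[ \{\mu(X_j;\theta) - \mu(X_j;\theta_0)\} K_h((X_i - X_j)'\beta) \mid X_i \big]$$
$$= (\theta - \theta_0)' E\big[ \dot\mu(X_j;\theta_0) K_h((X_i - X_j)'\beta) \mid X_i \big] \varepsilon_i + (\theta - \theta_0)' E\big[ \ddot\mu(X_j;\theta,\theta_0) K_h((X_i - X_j)'\beta) \mid X_i \big] (\theta - \theta_0)\, \varepsilon_i$$
$$=: (\theta - \theta_0)' g_1(W_i) + (\theta - \theta_0)' g_2(W_i) (\theta - \theta_0).$$
Let $g_{1,s}(\cdot)$, $1 \le s \le d$, denote the components of $g_1(\cdot)$. For each $s$, by our assumptions, Lemma 22(ii) of Nolan and Pollard (1987) and Lemma 5 of Sherman (1994a), the family of functions $g_{1,s}(\cdot)$, indexed by $h$ and $\beta$, is Euclidean for a squared-integrable envelope. The Main Corollary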

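of Sherman with $p = k = 1$ yields
$$E \Big[ \sup_{h,\beta} \big| n^{1/2} P_n g_{1,s} \big| \Big] \le \Lambda \Big[ E \sup_{h,\beta} \big\{ P_{2n} g_{1,s}^2 \big\}^{\alpha} \Big]^{1/2}, \qquad s = 1, \ldots, p, \qquad (6.13)$$
where $\Lambda$ is a universal constant and $0 < \alpha < 1$. If $\dot\mu_s(\cdot\,;\cdot)$ denotes the $s$th component of $\dot\mu(\cdot\,;\cdot)$,
$$|g_{1,s}(W_i)| = \big| E\big[ \dot\mu_s(X_j;\theta_0) K_h((X_i - X_j)'\beta) \mid X_i \big] \varepsilon_i \big| \le E\big[ \Phi(X_j) K_h((X_i - X_j)'\beta) \mid X_i \big] |\varepsilon_i| \le E^{1/4}\big[ \Phi^4(X) \big]\, E^{3/4}\big[ K_h^{4/3}((X_i - X_j)'\beta) \mid X_i \big] |\varepsilon_i| \le C h^{3/4} |\varepsilon_i|$$
for some $C > 0$. From (6.13) and Chebyshev's inequality, $\sup_\beta |n^{1/2} P_n g_1| = O_P(h^{3\alpha/4})$, thus
$$\sup_{\theta \in V_n(\theta_0), \beta} \big| n h^{-1/2} (\theta - \theta_0)' P_n g_1 \big| = O_P\big( h^{(3\alpha/4) - 1/2} \big). \qquad (6.14)$$
Similar arguments apply to each of the components $g_{2,kl}$, $1 \le k, l \le d$, of the square matrix $g_2$, so that
$$\sup_{\theta \in V_n(\theta_0), \beta} \big| n h^{-1/2} (\theta - \theta_0)' P_n g_2\, (\theta - \theta_0) \big| = O_P\big( n^{-1/2} h^{(3\alpha/4) - 1/2} \big). \qquad (6.15)$$
From Equations (6.12), (6.14), and (6.15) with $\alpha > 2/3$ and using $nh^2 \to \infty$,
$$\sup_{\|\beta\| = 1} \big| n h^{1/2} Q_{1n}(\hat\theta_n, \beta) \big| = o_P(1).$$
For $Q_{2n}(\hat\theta_n, \beta)$, use the expansion of $\mu(\cdot\,;\theta)$ and similar arguments to show that
$$\sup_{\theta \in V_n(\theta_0), \|\beta\| = 1} \big| n h^{1/2} [Q_{2n}(\theta,\beta) - E Q_{2n}(\theta,\beta)] \big| = o_P(1).$$
Last, for $\theta \in V_n(\theta_0)$,
$$|E Q_{2n}(\theta,\beta)| = \big| E\big[ \{\mu(X_i;\theta) - \mu(X_i;\theta_0)\} \{\mu(X_j;\theta) - \mu(X_j;\theta_0)\}\, h^{-1} K_h((X_i - X_j)'\beta) \big] \big| \le \|\theta - \theta_0\|^2\, E^{1/2}\big[ \Phi^4(X) \big]\, E^{3/4}\big[ h^{-4/3} K_h^{4/3}((X_i - X_j)'\beta) \big] = O_P(n^{-1} h^{-1/4}) = o_P(n^{-1} h^{-1/2}).$$

Proof of (3.7). Since $|u' W_\beta v| \le \|u\| \|v\| Sp(W_\beta)$, then for any $\beta$,
$$\Big| \frac{1}{n(n-1)} \sum_{j \ne i} \delta(X_i) \{\mu(X_j;\hat\theta_n) - \mu(X_j;\theta_0)\}\, h^{-1} K_h((X_i - X_j)'\beta) \Big| \le n \Big[ \frac{1}{n} \sum_{i=1}^n \delta^2(X_i) \Big]^{1/2} \Big[ \frac{1}{n} \sum_{i=1}^n \big( \mu(X_i;\hat\theta_n) - \mu(X_i;\theta_0) \big)^2 \Big]^{1/2} Sp(W_\beta) \le O_P(1) \Big[ \frac{1}{n} \sum_{i=1}^n \big( \mu(X_i;\hat\theta_n) - \mu(X_i;\theta_0) \big)^2 \Big]^{1/2},$$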

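by Lemma 6.2 and the weak law of large numbers. Now, under Assumption M,
$$\big( \mu(X_i;\hat\theta_n) - \mu(X_i;\theta_0) \big)^2 \le \Phi^2(X_i)\, \|\hat\theta_n - \theta_0\|^2$$
for some $\Phi(\cdot)$ with bounded fourth moment and
$$\frac{1}{n} \sum_{i=1}^n \big( \mu(X_i;\hat\theta_n) - \mu(X_i;\theta_0) \big)^2 = O_P(n^{-1}).$$

Proof of (3.8). Denote by $E_n$ the conditional expectation given the $X_i$ and let
$$\bar\delta(X_i) = \frac{1}{n(n-1)} \sum_{j=1, j \ne i}^n \delta(X_j)\, h^{-1} K_h((X_i - X_j)'\beta).$$
Then Marcinkiewicz-Zygmund's and Minkowski's inequalities imply that for any $\beta$, there is some constant $C$ independent of $n$ such that
$$E_n^{1/2} \Big| \sum_{i=1}^n \varepsilon_i \bar\delta(X_i) \Big|^2 \le C\, E_n^{1/2} \Big| \sum_{i=1}^n \varepsilon_i^2 \bar\delta^2(X_i) \Big| \le C \Big\{ \sum_{i=1}^n \bar\delta^2(X_i) E_n \varepsilon_i^2 \Big\}^{1/2} \le C \Big\{ \sum_{i=1}^n \bar\delta^2(X_i) \Big\}^{1/2} \le C n^{1/2} \Big\{ \frac{1}{n} \sum_{i=1}^n \delta^2(X_i) \Big\}^{1/2} Sp(W_\beta) = O_P(n^{-1/2}),$$
using $\|(\bar\delta(X_1), \ldots, \bar\delta(X_n))'\| = \|W_\beta (\delta(X_1), \ldots, \delta(X_n))'\| \le \|(\delta(X_1), \ldots, \delta(X_n))'\|\, Sp(W_\beta)$, Lemma 6.2 and the weak law of large numbers.

Proof of (3.9). Consider $U_n = r_n^{-2} Q_{5n}(\beta)$. By straightforward computations,
$$Var(U_n) \le \frac{C}{n} Var\big[ \delta(X_1) \delta(X_2)\, h^{-1} K_h((X_1 - X_2)'\beta) \big] \le \frac{C}{n} E\big[ \delta^4(X) \big]\, E^{1/2}\big[ h^{-4} K_h^4((X_i - X_j)'\beta) \big] = O(n^{-1} h^{-3/2}) = o(1).$$
Now, denoting by $\hat K(\cdot)$ the Fourier transform of $K(\cdot)$,
$$E(U_n) = E\big[ \delta(X_1) \delta(X_2)\, h^{-1} K_h((X_1 - X_2)'\beta) \big] = \frac{1}{2\pi} \int E\big\{ \delta(X_1) \delta(X_2)\, h^{-1} \exp\big( it (X_1 - X_2)'\beta / h \big) \big\} \hat K(t)\, dt = \frac{1}{2\pi} \int \big| E\big[ E[\delta(X) \mid X'\beta] \exp(it X'\beta) \big] \big|^2 \hat K(ht)\, dt.$$
As $E[\delta(X) \mid X'\beta]\, f_\beta(X'\beta) \in L^1(R) \cap L^2(R)$, we obtain by the Plancherel theorem that
$$\frac{1}{2\pi} \int \big| E\big[ E[\delta(X) \mid X'\beta] \exp(it X'\beta) \big] \big|^2 dt = E\big[ E^2[\delta(X) \mid X'\beta]\, f_\beta(X'\beta) \big],$$
see Rudin (1987). Since $|\hat K(\cdot)| \le 1$ and $\hat K(0) = 1$, the Lebesgue dominated convergence theorem yields
$$E(U_n) \to E\big[ E^2[\delta(X) \mid X'\beta]\, f_\beta(X'\beta) \big].$$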
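The limit obtained at the end of the proof of (3.9) can be checked numerically in a simple case. The sketch below is an illustration under assumptions that are not taken from the paper: scalar $X$ uniform on $[-1, 1]$ with $\beta = 1$, $\delta(x) = x$, a biweight kernel and $K_h(u) = K(u/h)$. In that case $E[E^2[\delta(X) \mid X'\beta] f_\beta(X'\beta)] = \int_{-1}^{1} x^2 (1/2)^2\, dx = 1/6$, and $E(U_n)$ is approximated by a midpoint rule on a grid.

```python
def biweight(u):
    """Biweight kernel with support [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def expected_un(h, grid=400):
    """Midpoint-rule approximation of E[delta(X1) delta(X2) K((X1 - X2)/h) / h]
    for X1, X2 i.i.d. uniform on [-1, 1] and delta(x) = x."""
    step = 2.0 / grid
    pts = [-1.0 + (k + 0.5) * step for k in range(grid)]
    total = 0.0
    for x1 in pts:
        for x2 in pts:
            total += x1 * x2 * biweight((x1 - x2) / h)
    # 0.25 is the product of the two uniform densities on [-1, 1].
    return total * step * step * 0.25 / h


limit = 1.0 / 6.0  # integral of x^2 * (1/2)^2 over [-1, 1]
for h in (0.2, 0.1, 0.05):
    print(h, expected_un(h), limit)
```

The gap between the approximation and $1/6$ shrinks with $h$; in this design it is driven mostly by the kernel mass lost near the boundary of the support.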

REFERENCES

Aït-Sahalia, Y., Bickel, P., and Stoker, T. (2001). Goodness-of-fit tests for kernel regression with an application to option implied volatilities. J. Econometrics 105.
Andrews, D.W.K. (1997). A conditional Kolmogorov test. Econometrica 65.
Bierens, H.J. (1982). Consistent model specification tests. J. Econometrics 20.
Bierens, H.J. (1990). A consistent conditional moment test of functional form. Econometrica 58.
Bierens, H.J., and Ploberger, W. (1997). Asymptotic theory of integrated conditional moment tests. Econometrica 65.
Carrasco, M., Florens, J.P., and Renault, E. (2006). Linear inverse problems in structural econometrics: estimation based on spectral decomposition and regularization. Handbook of Econometrics vol. 6, J. Heckman and E. Leamer eds. North Holland.
Chen, X., and Fan, Y. (1999). Consistent hypothesis testing in semiparametric and nonparametric models for econometric time series. J. Econometrics 91.
Chen, P.D. (1991). L -theory of approximation by ridge functions. Chinese Sci. Bull. 36.
Delecroix, M., Hristache, M., and Patilea, V. (2005). On semiparametric M-estimation in single-index regression. J. Statist. Plann. Inference, forthcoming.
Delgado, M.A., and González-Manteiga, W. (2001). Significance testing in nonparametric regression based on the bootstrap. Ann. Statist. 29.
Delgado, M.A., Dominguez, M.A., and Lavergne, P. (2005). Consistent tests of conditional moment restrictions. Annales d'Economie et Statistique, forthcoming.
Donald, S.G., Imbens, G.W., and Newey, W.K. (2003). Empirical likelihood estimation and consistent tests with conditional moment restrictions. J. Econometrics 117.
Fan, Y., and Li, Q. (1996). Consistent model specification tests: omitted variables and semiparametric functional forms. Econometrica 64.
Gozalo, P.L. (1997). Nonparametric bootstrap analysis with applications to demographic effects in demand functions. J. Econometrics 81.

Gozalo, P.L., and Linton, O.B. (2001). Testing additivity in generalized nonparametric regression models with estimated parameters. J. Econometrics 104, 1-48.
Guerre, E., and Lavergne, P. (2002). Optimal minimax rates for nonparametric specification testing in regression models. Econometric Theory 18.
Guerre, E., and Lavergne, P. (2005). Data-driven rate-optimal specification testing in regression models. Ann. Statist. 33.
Hall, P., and Yatchew, A. (2005). Unified approach for testing functional hypotheses in semiparametric contexts. J. Econometrics 127.
Härdle, W., and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist. 21.
Härdle, W., and Stoker, T. (1989). Investigating smooth multiple regression by the method of average derivatives. J. Amer. Statist. Assoc. 84.
Hart, J.D. (1997). Nonparametric smoothing and lack-of-fit tests. Springer-Verlag, New York.
Hong, Y., and White, H. (1995). Consistent specification tests via nonparametric series regression. Econometrica 63.
Horowitz, J.L., and Spokoiny, V.G. (2001). An adaptive, rate-optimal test of a parametric model against a nonparametric alternative. Econometrica 69.
Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econometrics 58.
Lavergne, P. (2001). An equality test across nonparametric regressions. J. Econometrics 103.
Lavergne, P., and Vuong, Q. (2000). Nonparametric significance testing. Econometric Theory 16.
Li, Q., and Wang, S. (1998). A simple consistent bootstrap test for a parametric regression function. J. Econometrics 87.
Nolan, D., and Pollard, D. (1987). U-processes: rates of convergence. Ann. Statist. 15.
Pakes, A., and Pollard, D. (1989). Simulation and the asymptotics of optimization estimators. Econometrica 57.
Powell, J.L., Stock, J.H., and Stoker, T.M. (1989). Semiparametric estimation of index coefficients. Econometrica 57.

Roy, S.N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Ann. Math. Statist. 24.
Rudin, W. (1987). Real and complex analysis. McGraw-Hill.
Sherman, R.P. (1994a). Maximal inequalities for degenerate U-processes with applications to optimization estimators. Ann. Statist. 22.
Sherman, R.P. (1994b). U-processes in the analysis of a generalized semiparametric regression estimator. Econometric Theory 10.
Spokoiny, V.G. (1996). Adaptive hypothesis testing using wavelets. Ann. Statist. 24.
Stinchcombe, M.B., and White, H. (1998). Consistent specification testing with nuisance parameters present only on the alternative. Econometric Theory 14.
Stoker, T.M. (1986). Consistent estimation of scaled coefficients. Econometrica 54.
Stone, C.J. (1980). Optimal rates of convergence for nonparametric estimators. Ann. Statist. 8.
Stute, W. (1997). Nonparametric model checks for regression. Ann. Statist. 25.
Stute, W., and Zhu, L.X. (2005). Nonparametric checks for single-index models. Ann. Statist. 33.
Whang, Y.J. (2001). Consistent specification testing for conditional moment restrictions. Economics Letters 71.
Wu, C.F.J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis (with discussion). Ann. Statist. 14.
Xia, Y., Li, W.K., Tong, H., and Zhang, D. (2004). A goodness-of-fit test for single-index models. Statist. Sinica 14, 1-39 (with discussion).
Zheng, J.X. (1996). A consistent test of functional form via nonparametric estimation techniques. J. Econometrics 75.
Zhu, L.X., and Li, R. (1998). Dimension-reduction type test for linearity of a stochastic model. Acta Math. Appli. Sinica 14.
Zhu, L.X. (2003). Model checking of dimension-reduction type for regression. Statist. Sinica 13.

Figure 1: $\beta_0 = (1, 1, 1)/\sqrt{3}$. Figure 2: $\beta_0 = (1, 2, 3)/\sqrt{14}$. Figure 3: $\beta_0 = (0, 1, 0)$. Figure 4: $\beta_0 = (1, 1, 1)/\sqrt{3}$ and $d = 0.4$.
[Each panel plots the empirical power of Zheng's test in dimensions 1 and 3 and of our test with $a = 2$, 7 and 10. The horizontal axis is the deviation $d$ in Figures 1 to 3 and the bandwidth constant $b$ in Figure 4.]

Figure 5: $\beta_0 = (1, 1, 1)/\sqrt{3}$. Figure 6: $\beta_0 = (3, 1, 0)/\sqrt{10}$. Figure 7: $\beta_0 = (0, 1, 0)$. Figure 8: $\beta_0 = (1, 1, 1)/\sqrt{3}$ and fixed $d$.
[Each panel plots the empirical power of Zheng's test in dimensions 2 and 3 and of our test with $a = 2$, 7 and 10. The horizontal axis is the deviation $d$ in Figures 5 to 7 and the bandwidth constant $b$ in Figure 8.]
