Bootstrapping The Order Selection Test
Chien-Feng Chen, Jeffrey D. Hart, and Suojin Wang

ABSTRACT

We consider bootstrap versions of the order selection test of Eubank and Hart (1992) and Kuchibhatla and Hart (1996) for testing lack-of-fit of regression models. For homoscedastic data, conditions are established under which the bootstrap level error is smaller (asymptotically) than that of the large-sample test. A new statistic is proposed to deal with the case of heteroscedastic data. The limiting distribution of this test statistic is derived and shown to depend on the unknown error variance function. This dependency makes using the asymptotic distribution a formidable task in practice. An alternative approximation is to apply bootstrap procedures. We propose various bootstrap tests, including ones based on the wild bootstrap. Simulation studies indicate that the wild bootstrap generally has good level and power properties, although power can sometimes be increased by appropriate smoothing of squared residuals. A real-data example is also considered to further illustrate the methodology.

KEY WORDS: Bootstrap; Heteroscedasticity; Nonparametric smoothing; Order selection test; Wild bootstrap.

Chien-Feng Chen is Senior Statistician, Insulins Product Team, Eli Lilly and Company, Indianapolis, IN. Jeffrey D. Hart is Professor, Department of Statistics, Texas A&M University, College Station, TX. Suojin Wang is Professor, Department of Statistics, Texas A&M University, College Station, TX. Hart's research was supported in part by NSF Grant DMS. Wang's research was supported by the Texas Advanced Research Program, the National Cancer Institute (CA 57030) and the Texas A&M Center of Environmental and Rural Health through a grant from the National Institute of Environmental Health Sciences (P30-E50906). The authors are grateful to an Associate Editor and a referee for their review of our work and helpful comments.
1. INTRODUCTION

In recent years using nonparametric smoothing techniques in testing lack-of-fit of regression models has drawn much attention. Smoothing-based tests surpass the classical nonparametric tests, such as the von Neumann test and cusum tests, in more than one way. They tend to be more powerful and can provide estimates of the regression function when the null hypothesis is rejected. As a consequence of smoothing, most of the proposed tests depend on smoothing parameters. To have a desired level of significance, smoothing parameters need to be fixed in advance to carry out the tests. Furthermore, the choice of smoothing parameters has an effect on the power of the test. Unlike most of the smoothing-based tests, the order selection test of Eubank and Hart (1992) does not depend on arbitrarily chosen smoothing parameters. The test utilizes an orthogonal series estimate of the underlying regression function, using the truncation point (a smoothing parameter) of the estimate as the test statistic. Therefore, the test statistic is itself a data-driven smoothing parameter. An equivalent form of the test pointed out by Kuchibhatla and Hart (1996) provides a continuous-valued test statistic that makes computation of P-values relatively straightforward. The order selection test is consistent against fixed alternatives and can detect local alternatives that converge to the null at the rate $n^{-1/2}$.

This paper focuses on testing the no-effect hypothesis, i.e., the hypothesis that the regression function is constant. In doing so we make use of the Kuchibhatla and Hart (1996) version of the order selection test. The exact distribution of one version of the test statistic is known when the errors are independent and identically distributed (i.i.d.) Gaussian. The asymptotic distribution was obtained by Eubank and Hart (1992) assuming only that the errors are i.i.d. with finite fourth moments.
However, the validity of approximating the sampling distribution by an asymptotic one depends on factors such as sample size, error distribution and estimation of error variance. An alternative and often better approximation is to apply bootstrap procedures. One goal of this paper is to provide theoretical justification for the bootstrap in the case of i.i.d. errors.

Another aim of the paper is to deal with heteroscedastic regression models. The asymptotic distribution of the test statistic of Kuchibhatla and Hart (1996) is derived and shown to depend on the unknown variance function. The discrepancy between the asymptotic distribution and its homoscedastic counterpart is non-negligible. Ignoring model heteroscedasticity will invalidate the use of the order selection test. Approximating the heteroscedastic asymptotic distribution involves, among other things, estimation of the error variance function, which is not an easy task. The wild bootstrap method is a convenient tool for producing a consistent estimator of a statistic's sampling distribution when the errors have nonconstant variances. We employ wild bootstrap methods in this research to approximate the sampling distribution of test statistics for the no-effect hypothesis. A new test statistic (denoted $T_n^{het}$), equivalent in the asymptotic sense to that of Kuchibhatla and Hart (1996) in the
i.i.d. case, is proposed. Although the limiting distribution of $T_n^{het}$ also depends on the unknown error variance, the dependence is shown to be minor. The asymptotic distribution under i.i.d. assumptions can be considered a ballpark substitute for the heteroscedastic one.

The rest of the paper is organized as follows. In Section 2 we briefly review the development of the order selection test. Bootstrap procedures are discussed and applied to a homoscedastic model in Section 3. Here, it is shown that the bootstrap level error is asymptotically smaller than that of the asymptotic test. Section 4 deals with heteroscedastic models. Asymptotic theory and bootstrap procedures are explored. Simulation results for both i.i.d. and heteroscedastic cases are presented in Section 5. In Section 6 we apply the two test statistics, through asymptotic tests, bootstrap tests and wild bootstrap tests, to an example from a diabetes clinical trial. The conclusions reached in this research and some open questions for future research are given in Section 7. The proofs of two theorems are presented in the Appendix.

2. THE ORDER SELECTION TEST

Consider the simple regression model

$$Y_i = r(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$

where $Y_1, \ldots, Y_n$ are the observed responses, $r$ is the regression function, $x_i = (i - .5)/n$, $i = 1, \ldots, n$, and $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. error terms with zero mean and variance $\sigma^2$. As long as the regression function is piecewise smooth on $[0, 1]$, then at all its continuity points it can be represented by the Fourier series

$$r(x) = \phi_0 + 2\sum_{j=1}^{\infty}\phi_j\cos(\pi jx), \qquad (2)$$

where the Fourier coefficients are

$$\phi_j = \int_0^1 r(x)\cos(\pi jx)\,dx, \qquad j = 0, 1, \ldots.$$

In analogy to the Fourier series above, we may estimate $r(x)$ by the truncated series

$$\hat r(x; m) = \hat\phi_0 + 2\sum_{j=1}^{m}\hat\phi_j\cos(\pi jx), \qquad (3)$$

where $m$ is a nonnegative integer less than $n$ and

$$\hat\phi_j = n^{-1}\sum_{i=1}^{n} Y_i\cos(\pi jx_i), \qquad j = 0, 1, \ldots, n - 1. \qquad (4)$$

A fundamental problem in regression is testing the no-effect hypothesis,

$$H_0: r(x) = C \quad \text{for all } x \in [0, 1],$$
where $C$ is an unknown constant. When (2) holds, the no-effect hypothesis is equivalent to the hypothesis that $\phi_j = 0$ for all $j \ge 1$. The Eubank and Hart (1992) order selection test for no-effect is based on the data-driven truncation point $\hat m$, an estimator of the $m$ in (3) that minimizes an estimated risk function. Specifically, $\hat m$ is the maximizer of $J(m; \gamma_\alpha)$, where $J(0; \gamma_\alpha) = 0$,

$$J(m; \gamma_\alpha) = \sum_{j=1}^m \frac{2n\hat\phi_j^2}{\hat\sigma^2} - \gamma_\alpha m, \qquad 1 \le m \le n - 1,$$

$\hat\sigma^2$ is any consistent estimator of $\sigma^2$, and $\gamma_\alpha$ is a constant that depends upon the desired significance level $\alpha$. A value of $\hat m \ge 1$ is evidence that at least one $\phi_j$ is nonzero; hence the test rejects the null hypothesis at level $\alpha$ if $\hat m \ge 1$. Taking $\gamma_\alpha = 3.22$ and $4.79$ yields asymptotic tests of size .10 and .05, respectively. An attractive feature of this test is that once the null hypothesis is rejected, an immediate point estimate of the regression function is at hand:

$$\hat r(x; \hat m) = \hat\phi_0 + 2\sum_{j=1}^{\hat m}\hat\phi_j\cos(\pi jx).$$

An equivalent form of the test proposed by Kuchibhatla and Hart (1996) uses the continuous-valued test statistic

$$T_n = \max_{1\le m\le n-1}\frac{1}{m}\sum_{j=1}^m \frac{2n\hat\phi_j^2}{\hat\sigma^2}, \qquad (5)$$

and $H_0$ is rejected for large values of $T_n$. As long as the errors are assumed to be i.i.d. with finite fourth moments, $T_n$ converges in distribution (under $H_0$) to

$$T = \sup_{m\ge1}\frac{1}{m}\sum_{j=1}^m Z_j^2,$$

where $Z_1, Z_2, \ldots$ are i.i.d. standard normal random variables. As shown by Spitzer (1956), the distribution of $T$ can be determined to any desired accuracy. The asymptotic $\alpha$-level critical value of the $T_n$-based test is precisely the value $\gamma_\alpha$ that induces an asymptotic level of $\alpha$ for the $\hat m$ version of the order selection test.

In cases where design points are fixed but not evenly spaced, there are at least two solutions. First, one may test for constancy of the regression quantile function, as defined by Parzen (1981). Let $u_j = (j - .5)/n$, $j = 1, \ldots, n$, and suppose the (unevenly spaced) design points satisfy $x_j = Q_n(u_j)$, $j = 1, \ldots, n$, where $Q_n$ is a piecewise constant empirical quantile function that converges to some $Q$ as $n \to \infty$. The hypothesis $H_0: r(x_j) = C$, $j = 1, \ldots, n$, is equivalent to $H_0: r(Q_n(u_j)) = C$, $j = 1, \ldots, n$.
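The sample Fourier coefficients (4) and the statistic $T_n$ in (5) are straightforward to compute. A minimal sketch in pure Python (function names are our own; the evenly spaced design $x_i = (i - .5)/n$ is that of model (1)):

```python
import math

def phi_hat(y, j):
    # Sample Fourier coefficient (4): (1/n) * sum_i Y_i cos(pi j x_i),
    # with the evenly spaced design x_i = (i - 0.5)/n of model (1).
    n = len(y)
    return sum(yi * math.cos(math.pi * j * (i + 0.5) / n)
               for i, yi in enumerate(y)) / n

def t_n(y):
    # Order selection statistic (5):
    #   T_n = max_{1 <= m <= n-1} (1/m) sum_{j=1}^m 2n phi_hat_j^2 / sigma2_hat,
    # here using sigma2_hat = (1/n) * sum (Y_i - Ybar)^2.
    n = len(y)
    ybar = sum(y) / n
    sigma2 = sum((v - ybar) ** 2 for v in y) / n
    best = running = 0.0
    for m in range(1, n):
        running += 2.0 * n * phi_hat(y, m) ** 2 / sigma2
        best = max(best, running / m)
    return best
```

Large values of `t_n(y)` are evidence against the no-effect hypothesis; in practice they would be compared with percentiles of the limiting distribution of $T$.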
Hence the test procedures described previously can be applied to the regression quantile function $r(Q_n(\cdot))$. A second approach is to define a version of $T_n$ in terms of basis functions that are orthogonal with respect to the design points. Given any set of basis functions, one may easily construct an orthogonal basis from them by a Gram-Schmidt
procedure, but in fact doing so is not necessary, as discussed in Eubank and Hart (1992). The asymptotic distribution theory for these two methods is the same as that for $T_n$. Extensions to the random design case are also straightforward by conditioning on the observed $x$-values.

3. BOOTSTRAPPING WITH I.I.D. ERRORS

Our main purpose in this section is to show that a bootstrap method often better approximates the null distribution of $T_n$ than does the large-sample distribution. In our bootstrap algorithm, we need to simulate data from a model that assumes $H_0$ to be true, which is in keeping with one of the two bootstrap guidelines set forth by Hall and Wilson (1991). To this end, let $\bar Y$ be the sample mean of $Y_1, \ldots, Y_n$, and define bootstrap data by

$$Y_i^* = \bar Y + \varepsilon_i^*, \qquad i = 1, \ldots, n,$$

where $\varepsilon_1^*, \ldots, \varepsilon_n^*$ are i.i.d. as $F_n$, the empirical distribution of $e_1 = Y_1 - \bar Y, \ldots, e_n = Y_n - \bar Y$. Define bootstrap Fourier coefficients by

$$\hat\phi_j^* = n^{-1}\sum_{i=1}^n Y_i^*\cos(\pi jx_i), \qquad j = 1, \ldots, n, \qquad (6)$$

and $\hat\sigma^{*2}$ to be exactly the same function of $Y_1^*, \ldots, Y_n^*$ as $\hat\sigma^2$ is of $Y_1, \ldots, Y_n$. For the remainder of this section we assume that $\hat\sigma^2 = n^{-1}\sum_{i=1}^n (Y_i - \bar Y)^2$. Our test statistic will be

$$T_n(k_0) = \max_{1\le k\le k_0}\frac{1}{k}\sum_{j=1}^k \frac{2n\hat\phi_j^2}{\hat\sigma^2}$$

and its bootstrap counterpart

$$T_n^*(k_0) = \max_{1\le k\le k_0}\frac{1}{k}\sum_{j=1}^k \frac{2n\hat\phi_j^{*2}}{\hat\sigma^{*2}}.$$

The statistic $T_n(k_0)$ satisfies the second Hall and Wilson (1991) guideline, namely it is an (asymptotic) pivotal quantity. The number $k_0$ is fixed, but allowed to be arbitrarily large. Ideally, we would choose $k_0 = n - 1$ (as in Section 2), but this leads to technical difficulties in proving a bootstrap accuracy result. In practice, we have found that the choice of $k_0$ is a very minor point, since it is extremely rare for the maximum of $k^{-1}\sum_{j=1}^k 2n\hat\phi_j^2/\hat\sigma^2$ to occur at a $k$ larger than 5.

In order to show that the bootstrap distribution accurately estimates the null distribution of the test statistic $T_n(k_0)$, we will make use of the theoretical results of Qumsiyeh (1990, 1994), which were developed for the bootstrap in multiple regression models. Let $X$ be the $n \times (k_0 + 1)$ matrix with first column all 1's and entry $\sqrt{2}\cos(\pi(j - 1)x_i)$ in the $i$th row and $j$th column, $i = 1, \ldots, n$, $j = 2, \ldots, k_0 + 1$.
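The bootstrap algorithm just described is simple to implement by Monte Carlo. A hedged sketch in pure Python (the truncation point $k_0 = 5$, the number of resamples $B$, the seed, and all function names are our own choices for illustration):

```python
import math
import random

def t_n_k0(y, k0=5):
    # T_n(k0) = max_{1<=k<=k0} (1/k) sum_{j=1}^k 2n phi_hat_j^2 / sigma2_hat.
    n = len(y)
    ybar = sum(y) / n
    sigma2 = sum((v - ybar) ** 2 for v in y) / n
    if sigma2 == 0.0:
        return 0.0  # degenerate sample: no evidence against H_0
    phi = [sum(yi * math.cos(math.pi * j * (i + 0.5) / n)
               for i, yi in enumerate(y)) / n for j in range(1, k0 + 1)]
    best = running = 0.0
    for k in range(1, k0 + 1):
        running += 2.0 * n * phi[k - 1] ** 2 / sigma2
        best = max(best, running / k)
    return best

def bootstrap_pvalue(y, B=200, seed=1):
    # Resample the centered residuals e_i = Y_i - Ybar (the null model),
    # form Y*_i = Ybar + e*_i, and compare T_n(k0) with its bootstrap copies.
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    e = [v - ybar for v in y]
    t_obs = t_n_k0(y)
    exceed = 0
    for _ in range(B):
        ystar = [ybar + rng.choice(e) for _ in range(n)]
        if t_n_k0(ystar) >= t_obs:
            exceed += 1
    return exceed / B
```

For data with a strong cosine component the observed statistic dwarfs the bootstrap copies and the p-value is near zero; in applications $B$ would of course be much larger than 200.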
The discussion above indicates that, for some large enough but fixed $k_0 < n$, model (1) may be well approximated by the model $Y = X\beta + \varepsilon$
with $Y = (Y_1, \ldots, Y_n)'$, $\beta = (\phi_0, \sqrt{2}\phi_1, \ldots, \sqrt{2}\phi_{k_0})'$ and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$. Note that $X'X = nI_p$ for $p = k_0 + 1$, and, as can easily be seen from (4) and (6),

$$\hat\beta = (\hat\phi_0, \sqrt{2}\hat\phi_1, \ldots, \sqrt{2}\hat\phi_{k_0})' = (X'X)^{-1}X'Y = n^{-1}X'Y,$$

the least squares estimate of $\beta$, and $\hat\beta^* = (\hat\phi_0^*, \sqrt{2}\hat\phi_1^*, \ldots, \sqrt{2}\hat\phi_{k_0}^*)' = n^{-1}X'Y^*$.

We make the following assumptions on regularity conditions:

A1. The $\varepsilon_i$'s are i.i.d. with mean 0 and finite variance $\sigma^2 > 0$.

A2. $\varepsilon_1$ has a non-zero absolutely continuous component which has a positive density on an open subset of $R$.

A3. $\varepsilon_1$ has a finite $2s$-th absolute moment for some integer $s \ge 3$.

Let $\Phi_r$ and $\psi_r$ be the standard normal distribution and its density on $R^r$, respectively. Denote the mapping defining $T_n(k_0)$ from $R^{k_0}$ to $R$ by $Q$: $Q(x) = \max_{1\le k\le k_0} k^{-1}\sum_{j=1}^k x_j^2$. We also denote by $P^*$ the probability under $F_n$. Further, define $\mathcal{B}$ as a class of all Borel subsets $E$ of $R^{k_0}$ satisfying (i) $\sup_{E\in\mathcal{B}} \Phi_p((\partial E)^\eta) = O(\eta)$ as $\eta \to 0$, where $(\partial E)^\eta$ is the set of points in $R^p$ within $\eta$ of the boundary $\partial E$ of $E$, and (ii) each set $E \in \mathcal{B}$ corresponds to a Borel set $D \subset R$ with $E = Q^{-1}(D)$.

Theorem 1. Let Assumptions A1-A3 hold and suppose that $s$ in A3 is greater than $p$. Then under the null hypothesis $H_0$ that $r$ is constant, we have a.s. as $n \to \infty$

$$\sup_t \big|P^*(T_n^*(k_0) \le t) - P(T_n(k_0) \le t)\big| = o(n^{-1/2}), \qquad (7)$$

and hence the bootstrap approximation for the distribution of $T_n(k_0)$ is asymptotically more accurate than the normal approximation, which generally has an error of $O(n^{-1/2})$.

Proof. A complete proof is very long. We will sketch the main steps by applying the results of Qumsiyeh (1990, 1994). First we show the following two-term Edgeworth expansion for $P(T_n(k_0) \in D)$:

$$\sup_{E\in\mathcal{B}} \Big|P(T_n(k_0) \in D) - \int_E \big[1 + n^{-1/2}P_1(x)\big]\psi_{k_0}(x)\,dx\Big| = o(n^{-1/2}), \qquad (8)$$

where $P_1$ is a polynomial function with its coefficients depending on the $\nu$-th and lower order cumulants $\kappa_\nu$ of $\hat\beta$ for some $\nu$. We use an argument similar to that in Qumsiyeh's (1994) proof of his Theorem 2.4 under $H_0$ for the studentized vector $W_n = n^{1/2}(\hat\beta - \beta_0)/\hat\sigma$, where $\beta_0 = (\phi_0, 0, \ldots, 0)'$. The only difference between $W_n$ and the statistic in Theorem 2.4 of Qumsiyeh (1994) is that $\hat\sigma$ in $W_n$ uses the null model residuals $e_i$, while Qumsiyeh's statistic uses the full model residuals $\hat\varepsilon_i = Y_i - X_i\hat\beta$, to estimate the standard error. However, under $H_0$, it is easy to see that the difference between the estimated standard errors is of $O_p(n^{-1})$. Therefore,

$$\sup_{E\in\mathcal{B}} \Big|P(W_n \in E) - \int_E \big[1 + n^{-1/2}P_1(x)\big]\psi_{k_0}(x)\,dx\Big| = o(n^{-1/2}) \qquad (9)$$

for some polynomial $P_1(x)$ as defined above. Equation (8) then follows since $T_n(k_0)$ is obtained by applying $Q$ to $W_n$, and thus $P(T_n(k_0) \in D) = P(W_n \in E)$ for $E = Q^{-1}(D)$.
Next, we give the bootstrap version of equation (8): a.s. as $n \to \infty$,

$$\sup_{E\in\mathcal{B}} \Big|P^*(T_n^*(k_0) \in D) - \int_E \big[1 + n^{-1/2}\hat P_1(x)\big]\psi_{k_0}(x)\,dx\Big| = o(n^{-1/2}), \qquad (10)$$

where $\hat P_1$ is the same polynomial as $P_1$ but with the cumulants $\kappa_\nu$ replaced by the conditional cumulants $\hat\kappa_\nu$ of $\hat\beta^*$ given the $e_i$'s. A proof of (10) can be obtained similarly to that of Theorem 3.3 of Qumsiyeh (1994), in the same way that (8) was derived by first arriving at (9). There are two differences between our bootstrap version $W_n^*$ of $W_n$ and Qumsiyeh's bootstrap statistic. The first is between using the null model and the full model residuals in estimating the standard errors of the bootstrap statistics. This difference leads to a negligible error of $O_p(n^{-1})$ under $H_0$. The second difference is that we resample the $\varepsilon_i^*$ from the null model residuals $e_i$ rather than from the full model residuals $\hat\varepsilon_i$. However, this difference contributes only a negligible error, and the following continues to hold when the $\varepsilon_i^*$ are sampled from the $e_i$:

$$\sup_{E\in\mathcal{B}} \Big|P^*(W_n^* \in E) - \int_E \big[1 + n^{-1/2}\hat P_1(x)\big]\psi_{k_0}(x)\,dx\Big| = o(n^{-1/2})$$

a.s. as $n \to \infty$, since it is readily seen that Lemma 3.1 of Qumsiyeh (1994) on Cramér's condition for the empirical distribution function is still valid when the $\varepsilon_i^*$ are sampled from the $e_i$ instead of the $\hat\varepsilon_i$. Expression (10) then follows. Using the fact that $\hat\kappa_\nu \to \kappa_\nu$ a.s. under $H_0$ and combining (8) and (10) yields (7), thus completing the proof.

Another important consideration is how the bootstrap test behaves when the alternative hypothesis is true, in which case the main concern is power. If the alternative holds, the empirical distribution function of $Y_1 - \bar Y, \ldots, Y_n - \bar Y$ is not a consistent estimator of the underlying error distribution. Nonetheless, under mild conditions, the bootstrap distribution $P^*(T_n^*(k_0) \le t)$ is a consistent estimator of the null distribution of the test statistic, as seen in the following theorem. The consistency of $P^*$ is enough to ensure that our bootstrap test has desirable properties, such as consistency against a large class of alternatives.

Theorem 2. Under model (1), suppose that $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. with finite fourth moments and that $r$ is piecewise continuous on $[0, 1]$. Furthermore, let $Z_1, \ldots, Z_{k_0}$ be i.i.d. standard normal random variables and define $T(k_0) = \max_{1\le k\le k_0} k^{-1}\sum_{j=1}^k Z_j^2$. Then

$$\sup_t \big|P^*(T_n^*(k_0) \le t) - P(T(k_0) \le t)\big|$$

converges in probability to 0 as $n \to \infty$. In addition, if $\int_0^1 r(x)\cos(\pi jx)\,dx \ne 0$ for some $j \le k_0$, then the power of the bootstrap test tends to 1 as $n \to \infty$.

Proof. Let $G(t) = P(T(k_0) \le t)$ and define $\tilde T_n^*(k_0) = (\hat\sigma^{*2}/\hat\sigma^2)\,T_n^*(k_0)$. Then for any positive $\delta$, we have

$$\big|P^*(T_n^*(k_0) \le t) - G(t)\big| \le P^*\big(|\hat\sigma^{*2}/\hat\sigma^2 - 1| > \delta\big) + \big|P^*\big(\tilde T_n^*(k_0) \le t\,\hat\sigma^{*2}/\hat\sigma^2,\ |\hat\sigma^{*2}/\hat\sigma^2 - 1| \le \delta\big) - G(t)\big|. \qquad (11)$$
Using the moment conditions on $\varepsilon_1$ and the piecewise continuity of $r$, it is straightforward to show that $P^*(|\hat\sigma^{*2}/\hat\sigma^2 - 1| > \delta)$ converges in probability to 0. It is enough, then, to investigate the second term on the right-hand side of inequality (11). That term is bounded by $\max\{|p_{1n}(t) - G(t)|,\ |p_{2n}(t) - G(t)|\}$, where

$$p_{1n}(t) = P^*\big(\tilde T_n^*(k_0) \le (1 + \delta)t,\ |\hat\sigma^{*2}/\hat\sigma^2 - 1| \le \delta\big)$$

and $p_{2n}(t)$ is defined exactly the same but with $(1 + \delta)t$ replaced by $(1 - \delta)t$. We consider only $|p_{1n}(t) - G(t)|$, since $|p_{2n}(t) - G(t)|$ is handled in exactly the same way. It is elementary to show that

$$|p_{1n}(t) - G(t)| \le \big|P^*\big(\tilde T_n^*(k_0) \le (1 + \delta)t\big) - G\big((1 + \delta)t\big)\big| + \big|G\big((1 + \delta)t\big) - G(t)\big| + 2P^*\big(|\hat\sigma^{*2}/\hat\sigma^2 - 1| > \delta\big).$$

The random variable $T(k_0)$ has a density $g$ such that $\sup_{t>0} t\,g(t) \le C$ for some constant $C$, which implies that $\sup_{t>0} |G((1 + \delta)t) - G(t)| \le C\delta$. The last term can be made arbitrarily small by choosing $\delta$ small enough. The only term left to deal with is $|P^*(\tilde T_n^*(k_0) \le (1 + \delta)t) - G((1 + \delta)t)|$, for which we use a Berry-Esséen type result in Bhattacharya and Ranga Rao (1976). An application of their Theorem 13.3 bounds $\sup_x |P^*(\tilde T_n^*(k_0) \le x) - G(x)|$ by a term of order $n^{-1/2}$ that depends on the data only through $a(k_0)$, a constant depending only on $k_0$, and the standardized fourth moment $n^{-1}\sum_{i=1}^n (Y_i - \bar Y)^4/\hat\sigma^4$. The result now follows from the piecewise continuity of $r$ and the fact that $\varepsilon_1$ has a finite fourth moment.

To prove the consistency of the test, note that the first part of the theorem establishes that the bootstrap percentiles consistently estimate those of $T(k_0)$. It is enough, then, to argue that $T_n(k_0)$ tends to infinity in probability as $n \to \infty$. By assumption, there exists $J \le k_0$ such that

$$\phi_J \stackrel{def}{=} \int_0^1 r(x)\cos(\pi Jx)\,dx \ne 0.$$

We have $T_n(k_0) \ge 2n\hat\phi_J^2/(J\hat\sigma^2)$, from which the result follows since $\hat\phi_J$ is consistent for $\phi_J$ and $\hat\sigma^2$ is consistent for $\sigma^2 + \int_0^1 (r(x) - \bar r)^2\,dx$, where $\bar r = \int_0^1 r(x)\,dx$.

4. HETEROSCEDASTICITY

In practice the assumption of equal error variances is sometimes violated. Under such circumstances, inferences based on homoscedasticity may be invalid regardless of whether parametric or nonparametric methods are used. A reasonable model for heteroscedasticity is to assume that the error variances follow a smooth function of some known variable. Here we assume that the error variance is a function of the predictor $x$. In
Section 4.1 we consider how heteroscedasticity affects the large-sample distribution of $T_n$ and introduce another statistic that seems better suited for heteroscedastic data. In Section 4.2 we propose a wild bootstrap algorithm for approximating the distribution of statistics in the presence of heteroscedasticity.

4.1 Large Sample Distribution of Statistics

Consider the model

$$Y_i = r(x_i) + \sigma(x_i)\eta_i, \qquad i = 1, \ldots, n, \qquad (12)$$

where $E(\eta_i) = 0$, $\mathrm{Var}(\eta_i) = 1$, $i = 1, \ldots, n$, and $\sigma(\cdot)$ is some positive function. The following theorem provides the large-sample distribution of $T_n$ under model (12) when $r \equiv C$. The proof of this result is given in the Appendix.

Theorem 3. Assume that $\eta_1, \ldots, \eta_n$ in model (12) are independent with finite fourth moments, and that the error variance function $\sigma^2(x)$ has two continuous derivatives on $[0, 1]$. Then, if $r$ is identical to a constant,

$$T_n \xrightarrow{D} \sup_{m\ge1}\frac{1}{m}\sum_{j=1}^m (1 + c_j)Z_j^2,$$

where

$$c_j = \frac{\int_0^1 \sigma^2(x)\cos(2\pi jx)\,dx}{\int_0^1 \sigma^2(x)\,dx},$$

$Z_1, Z_2, \ldots, Z_m$ are jointly normal for all $m$, $Z_j \sim N(0, 1)$, and

$$\mathrm{Cov}(Z_j, Z_k) = \frac{\int_0^1 \sigma^2(x)\cos(\pi jx)\cos(\pi kx)\,dx}{\big[\int_0^1 \sigma^2(x)\cos^2(\pi jx)\,dx\,\int_0^1 \sigma^2(x)\cos^2(\pi kx)\,dx\big]^{1/2}}.$$

Not surprisingly, the asymptotic distribution of $T_n$ depends on the unknown error variance function. Hence, to put Theorem 3 to practical use, one is faced with the daunting task of estimating $\sigma^2(x)$. Another possible test statistic is

$$T_n^{het} = \max_{1\le k\le n-1}\frac{1}{k}\sum_{j=1}^k \frac{\hat\phi_j^2}{\widehat{\mathrm{Var}}(\hat\phi_j)}.$$

On the surface $T_n^{het}$ seems better suited for heteroscedasticity than does $T_n$, since each $\hat\phi_j$ is correctly standardized. Note that

$$\mathrm{Var}(\hat\phi_j) = n^{-2}\sum_{i=1}^n \sigma^2(x_i)\cos^2(\pi jx_i), \qquad (13)$$

which is asymptotic to $n^{-1}\int_0^1 \sigma^2(x)\cos^2(\pi jx)\,dx$ under the conditions of Theorem 3. In the following theorem we obtain a large-sample distribution for $T_n^{het}$.
Theorem 4. Assume that the conditions of Theorem 3 hold, and that $\widehat{\mathrm{Var}}(\hat\phi_j)$ is uniformly consistent for $\mathrm{Var}(\hat\phi_j)$ in the sense that

$$\max_{1\le j\le n-1}\Big|\frac{\widehat{\mathrm{Var}}(\hat\phi_j)}{\mathrm{Var}(\hat\phi_j)} - 1\Big| \xrightarrow{P} 0. \qquad (14)$$

Then, if $r$ is identical to a constant,

$$T_n^{het} \xrightarrow{D} \sup_{m\ge1}\frac{1}{m}\sum_{j=1}^m Z_j^2,$$

where $Z_1, Z_2, \ldots$ have the same distribution as in Theorem 3.

Estimation of $\mathrm{Var}(\hat\phi_j)$ requires estimation of the variance function $\sigma^2(x)$, which may be done either parametrically or nonparametrically. The latter approach can be achieved by smoothing the squared residuals $e_1^2, \ldots, e_n^2$. Another, simpler, possibility is to use the estimator

$$\widehat{\mathrm{Var}}(\hat\phi_j) = n^{-2}\sum_{i=1}^n e_i^2\cos^2(\pi jx_i).$$

In the case of a parametric model for $\sigma^2(x)$, if weakly consistent estimators of the model parameters are available, then one may show that the uniform consistency condition in Theorem 4 is achieved by using a version of $\widehat{\mathrm{Var}}(\hat\phi_j)$ that replaces each $\sigma^2(x_i)$ in (13) by its parametric estimator.

It is of interest to investigate how different the limit distributions of $T_n$ and $T_n^{het}$ are from each other and from the limit distribution, call it $F_{OS}$, in the case of homoscedasticity. One observation is immediate: the limit distribution of $T_n$ is more complicated than that of $T_n^{het}$, in that it depends on the variance function through the constants $c_1, c_2, \ldots$ as well as through the covariance function of the process $Z_1, Z_2, \ldots$. For this reason, we conjecture that the test based on $T_n^{het}$ will generally be the more robust of the two to heteroscedasticity when $F_{OS}$ is used to obtain critical values for each test. It is interesting to note, however, that the limiting distributions of the two statistics are the same if the variance function is linear in $x$, since in that case $c_j = 0$ for all $j$.

To investigate the question of robustness, we shall measure how far the limit distribution of $T_n^{het}$ is from $F_{OS}$ in the case of a quadratic variance function, i.e., $\sigma^2(x) = \beta_0 + \beta_1 x + \beta_2 x^2$.
Here, for $j \ne k$ we have

$$\mathrm{Cov}(Z_j, Z_k) = \frac{A_{jk}}{\sqrt{B_j B_k}},$$

where

$$A_{jk} = \begin{cases} \beta_2\Big[\dfrac{1}{\pi^2(j-k)^2} + \dfrac{1}{\pi^2(j+k)^2}\Big], & j + k \text{ even}, \\[1ex] -(\beta_1 + \beta_2)\Big[\dfrac{1}{\pi^2(j-k)^2} + \dfrac{1}{\pi^2(j+k)^2}\Big], & j + k \text{ odd}, \end{cases}$$

and

$$B_j = \frac{2\beta_0 + \beta_1}{4} + \frac{\beta_2}{6} + \frac{\beta_2}{(2\pi j)^2}.$$
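The constants $c_j$ of Theorem 3 are easy to evaluate numerically for any candidate variance function, which makes it simple to gauge how far a given $\sigma^2(\cdot)$ moves the limit distribution from the homoscedastic one. A sketch using the midpoint rule (the function name and grid size are ours):

```python
import math

def c_j(sigma2, j, ngrid=20000):
    # c_j = (integral of sigma^2(x) cos(2 pi j x) dx over [0,1])
    #       / (integral of sigma^2(x) dx over [0,1])     (Theorem 3),
    # approximated by the midpoint rule on ngrid subintervals.
    num = den = 0.0
    for i in range(ngrid):
        x = (i + 0.5) / ngrid
        s = sigma2(x)
        num += s * math.cos(2.0 * math.pi * j * x)
        den += s
    return num / den
```

For any variance function linear in $x$ the numerator integral is zero, so $c_j = 0$ for every $j$; for the pure quadratic $\sigma^2(x) = x^2$ one gets $c_1 = 3/(2\pi^2) \approx 0.152$.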
By letting $\beta_1 = \beta_2 = 0$ one minimizes $|\mathrm{Cov}(Z_j, Z_k)|$, and by letting $\beta_0 = \beta_1 = 0$ with $\beta_2 > 0$ one maximizes it. In the latter case the value of $\mathrm{Cov}(Z_j, Z_k)$ for $j + k$ even is

$$\Big[\frac{1}{\pi^2(j-k)^2} + \frac{1}{\pi^2(j+k)^2}\Big]\Big/\sqrt{\Big(\frac16 + \frac{1}{(2\pi j)^2}\Big)\Big(\frac16 + \frac{1}{(2\pi k)^2}\Big)}. \qquad (15)$$

We now use simulation to investigate the limit distribution of $T_n^{het}$. Let $Z_1, Z_2, \ldots, Z_K$ have a multivariate normal distribution such that $E(Z_i) = 0$ and $\mathrm{Var}(Z_i) = 1$, $i = 1, \ldots, K$, and $\mathrm{Cov}(Z_j, Z_k)$ ($j \ne k$) given by (15). Ten thousand replications of $T_K^{het} = \max_{1\le k\le K} k^{-1}\sum_{j=1}^k Z_j^2$ were generated for each of several values of $K$ to approximate the 95th percentile of the limiting distribution of $T_n^{het}$. The results in Table 1 indicate that the departure of this percentile from that of $F_{OS}$, 4.793, is insubstantial.

Table 1. Approximations to the 95th percentile of $T_K^{het}$ for a quadratic variance function.

The calculations above lead us to conjecture that the correlation among the $Z_j$'s has very little impact on the limiting distribution of $T_n^{het}$. We can conclusively show that the impact is small when $K = 2$. Let $T(\rho)$ denote the random variable $\max\{Z_1^2, (Z_1^2 + Z_2^2)/2\}$ when the correlation between $Z_1$ and $Z_2$ is $\rho$. Using numerical integration, we may compute $P(T(\rho) \le t)$ for any given $t$ and $\rho$. The 95th and 99th percentiles of $T(0)$ are 4.077 and 6.736, respectively (Hart (1997), page 179). For various $\rho$, we have computed $P(T(\rho) > t)$ for $t$ equal to these percentiles. The results are shown in Figure 1 and indicate that the difference $P(T(0) > t) - P(T(\rho) > t)$ tends to be quite small in absolute value for $\rho \le .8$. For very large $\rho$, the difference is slightly larger, but positive, implying that, at least for $K = 2$, a false assumption of homoscedasticity would lead to a conservative test.
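The $K = 2$ calculation can also be checked by direct Monte Carlo. A sketch (the replication count and seed are arbitrary choices of ours; 6.736 is the 99th percentile of $T(0)$ quoted above):

```python
import random

def tail_prob(rho, t, reps=200000, seed=2):
    # Estimate P(T(rho) > t) for T(rho) = max(Z1^2, (Z1^2 + Z2^2)/2),
    # where (Z1, Z2) is bivariate standard normal with correlation rho.
    rng = random.Random(seed)
    a = (1.0 - rho * rho) ** 0.5
    hits = 0
    for _ in range(reps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + a * rng.gauss(0.0, 1.0)
        if max(z1 * z1, 0.5 * (z1 * z1 + z2 * z2)) > t:
            hits += 1
    return hits / reps
```

With `rho = 0` and `t = 6.736` the returned tail probability is close to .01, as it should be for the 99th percentile.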
Figure 1. Tail probability $P(T(\rho) > t)$ as a function of the correlation $\rho$, for $t$ equal to selected percentiles of $T(0)$, including $t = 4.077$.

4.2 Wild Bootstrap Algorithm

Ideally we desire a method that correctly adjusts a statistic's critical values to account for heteroscedasticity. The wild bootstrap is such a method. It was proposed by Wu (1986), studied further by Liu (1988), and considered in nonparametric regression and given its name by Härdle and Mammen (1993). This procedure is called the wild bootstrap since $n$ different distributions are estimated from $n$ residuals. The wild bootstrap algorithm we propose is as follows. Let $e_i = Y_i - \bar Y$, $i = 1, \ldots, n$, and define bootstrap data $Y_1^*, \ldots, Y_n^*$ by

$$Y_i^* = \bar Y + e_i\eta_i, \qquad i = 1, \ldots, n,$$

where $\eta_1, \ldots, \eta_n$ are a random sample from an arbitrary distribution having first moment zero and second and third moments both equal to 1. The null distribution of a statistic $S(Y_1, \ldots, Y_n)$ is approximated by using Monte Carlo methods to generate many independent copies of $S(Y_1^*, \ldots, Y_n^*)$. A popular and simple choice for the distribution of $\eta_i$ is the two-point distribution that assigns probabilities $(5 + \sqrt5)/10$ and $(5 - \sqrt5)/10$ to $(1 - \sqrt5)/2$ and $(1 + \sqrt5)/2$, respectively. This distribution and two continuous possibilities were proposed by Mammen (1993). The intuitive motivation for the wild bootstrap is that, at least in many cases, it ensures that the first three moments of the bootstrap distribution asymptotically match those of the underlying null distribution. Mammen (1993) provides some theoretical backing for the wild bootstrap in the setting of a linear model whose number of parameters increases sufficiently slowly with the sample size. These results would undoubtedly
be useful in a theoretical study of the wild bootstrap in our current setting. However, we defer such a study to future research, and restrict attention in the rest of the paper to simulations and a real-data example.

5. SIMULATION STUDY

The validity and power of our tests are studied in this section via simulation. Validity is investigated using a nominal significance level of .05. To detect level differences of 0.01 or larger, 2000 replications were conducted for each simulation study. The advice of Efron and Tibshirani (1993) is followed by using 1000 bootstrap samples on each replication, since the nominal test level is .05. Finally, sample sizes of 15, 30 and 80 were used.

5.1 I.I.D. cases

We first consider the simple linear regression model $Y_i = \beta x_i + \varepsilon_i$, where $x_i = (i - 0.5)/n$, $i = 1, \ldots, n$, and $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. random variables with mean 0 and variance $\sigma^2$. The variance $\sigma^2$ was taken to be 0.01, and $\beta = 0$, 0.03, 0.09, 0.18 and 0.4. Four different choices for the error distribution were considered: Gaussian, exponential, $t$ with 4 degrees of freedom, and uniform. Each of these distributions was shifted and rescaled as needed to yield a mean of 0 and variance 0.01. We considered a bootstrap test, a wild bootstrap test, an asymptotic test and a parametric $t$-test. Both types of bootstrap test and the asymptotic test use the test statistic $T_n$ defined in (5). Critical values of the asymptotic test are percentiles of the distribution $F_{OS}$ (Hart (1997), p. 178). Critical values of the bootstrap and wild bootstrap tests were obtained as described in Sections 3 and 4.2, respectively. For the distribution of $\eta_i$ in the wild bootstrap, we used the two-point distribution defined in Section 4.2. The parametric $t$-test is simply the classical $t$-test of $H_0: \beta = 0$ based on the assumption of normally distributed errors. Finally, two additional tests were considered to investigate how the power of the bootstrap tests is affected (if at all) by using the wrong error distribution when the alternative hypothesis is true.
For a data set $Y_1, \ldots, Y_n$, define

$$E_i = Y_i - \hat\beta_0 - \hat\beta_1 x_i, \qquad i = 1, \ldots, n,$$

where $\hat\beta_0$ and $\hat\beta_1$ are the least squares estimates of intercept and slope, respectively. One may carry out bootstrap and wild bootstrap tests based on these residuals in exactly the same way tests are done with the residuals $e_i = Y_i - \bar Y$, $i = 1, \ldots, n$. We will refer to the two types of tests as (wild) bootstrap $e$ and (wild) bootstrap $E$ tests. So, a total of six tests were carried out for each data set generated.

We first discuss the results for i.i.d. Gaussian errors (Figure 2). To save space we only show what happened for $n = 15$ and 30, although our comments apply to $n = 80$ as well. Both bootstrap $e$ and bootstrap $E$ are satisfactory and very similar in terms of level and power, despite the fact that the residuals $E_i$ use knowledge of the regression function. The wild bootstrap tests perform almost as well as the bootstrap
tests except when $n = 15$, this in spite of the true model being homoscedastic. Interestingly, the empirical levels of wild bootstrap $e$ tend to the nominal level from below as sample size increases, while those of wild bootstrap $E$ have the opposite behavior. Use of the asymptotic distribution, $F_{OS}$, provides satisfactory results when $n$ is large. However, the empirical level of the asymptotic test is below 0.05 when $n = 15$. Consequently, this test has lower power than do the bootstrap tests, and thus seems less desirable when $n$ is small. As $n$ increases, the five nonparametric tests perform similarly. The parametric $t$-test provides correct test levels and, as expected, is the most powerful test among the six.

Figure 2. Power comparison for the six tests with i.i.d. Gaussian errors (two panels, $n = 15$ and $n = 30$; rejection rate versus slope). The flat line here and in subsequent Section 5 figures indicates the nominal level of .05.

The performance of each test was practically invariant to error distribution type, and hence we do not show any results for the nonnormal cases. So far all wild bootstrap results have used the two-point distribution of Section 4.2 for $\eta_i$. We also tried two other schemes for generating i.i.d. $\eta_i$'s, both of which were proposed by Mammen (1993): $\eta_i^{(2)} = V_i/\sqrt2 + (V_i^2 - 1)/2$, where $V_i \sim N(0, 1)$, and a scheme $\eta_i^{(3)}$ defined in terms of constants $\delta_1$ and $\delta_2$ and independent $N(0, 1)$ random variables $V_i$ and $U_i$, again chosen so that the first three moments are 0, 1 and 1. For large $n$, the three wild bootstrap methods were not much different. However, the two-point distribution produced more consistent results across the different error distributions, in that test levels did not exceed the
nominal level. Use of $\eta_i^{(3)}$ leads to higher power than the other two methods in small-sample cases, but has test levels that are slightly high. It is not surprising that each wild bootstrap test performs relatively poorly when $n = 15$, as the wild bootstrap does not take advantage of the i.i.d. error structure.

We also considered the regression functions $r(x) = b\sin(\pi x)$ and $r(x) = b\sin(3\pi x)$ with the errors i.i.d. Gaussian. Figures 3 and 4 reveal that the bootstrap test performs better than the asymptotic test for both functions when $n = 15$. However, there is not much difference in power among the three tests at $n = 30$.

Figure 3. Rejection rates of tests with $r(x) = b\sin(\pi x)$ and i.i.d. Gaussian errors (two panels, $n = 15$ and $n = 30$; rejection rate versus $b$).

Figure 4. Rejection rates of tests with $r(x) = b\sin(3\pi x)$ and i.i.d. Gaussian errors (two panels, $n = 15$ and $n = 30$; rejection rate versus $b$).
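The wild bootstrap resampling step of Section 4.2, with the two-point distribution for $\eta_i$, can be sketched as follows (the function name is ours):

```python
import math
import random

def wild_bootstrap_sample(y, rng):
    # One wild bootstrap data set: Y*_i = Ybar + e_i * eta_i, where
    # e_i = Y_i - Ybar and eta_i has the two-point distribution of Section 4.2:
    #   eta = (1 - sqrt5)/2  with probability (5 + sqrt5)/10,
    #   eta = (1 + sqrt5)/2  with probability (5 - sqrt5)/10,
    # so that E(eta) = 0 and E(eta^2) = E(eta^3) = 1.
    s5 = math.sqrt(5.0)
    p = (5.0 + s5) / 10.0
    n = len(y)
    ybar = sum(y) / n
    out = []
    for v in y:
        eta = (1.0 - s5) / 2.0 if rng.random() < p else (1.0 + s5) / 2.0
        out.append(ybar + (v - ybar) * eta)
    return out
```

Because each residual $e_i$ stays attached to its own design point, nonconstant error variance is preserved in the resamples; constant data are reproduced exactly, since all residuals are then zero.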
5.2 Heteroscedastic cases

We now consider the model $Y_i = \beta x_i + \sigma(x_i)\eta_i$, $i = 1, \ldots, n$, where $\eta_1, \ldots, \eta_n$ are i.i.d. $N(0, 1)$ and the variance function is the hump-shaped $\sigma^2(x) \propto x - x^2$. The same values of $\beta$ as in Section 5.1 were used, with the scale chosen so that $\int_0^1 \sigma^2(x)\,dx = 0.01$, making the setting more or less comparable to that in Section 5.1. The same six tests as before are used. The simulation shows (Figures 5 and 6) that the level of the parametric $t$-test is much higher than the nominal level. The test based on $T_n$ and the i.i.d. limit distribution $F_{OS}$ also has excessive empirical levels. Bootstrap $e$ and bootstrap $E$ fail to maintain the correct test levels, as expected. The only valid tests seem to be the wild bootstrap tests, although their empirical levels tend to be low, leading to low power for $n = 15$.

Figure 5. Power comparison of the six tests in a heteroscedastic case with hump-shaped $\sigma^2(x) \propto x - x^2$, using test statistic $T_n$ (two panels, $n = 15$ and $n = 30$; rejection rate versus slope).

As argued in Section 4.1, $T_n^{het}$ may have an advantage over $T_n$ in heteroscedastic cases. Evidence to this effect is seen in the simulation. When both $T_n^{het}$ and $T_n$ are compared to the large-sample homoscedastic critical values, the $T_n^{het}$-based test has better level accuracy. (Compare Figures 5 and 6.) Likewise, level accuracy of the wild bootstrap $e$ and $E$ tests is better for $T_n^{het}$ than for $T_n$. In other words, $T_n^{het}$ seems to be the more robust statistic to departures from homoscedasticity.
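The statistic $T_n^{het}$, with the simple residual-based estimator of $\mathrm{Var}(\hat\phi_j)$ from Section 4.1, can be sketched as follows (names are ours):

```python
import math

def t_het(y, kmax=None):
    # T_het = max_{1<=k<=kmax} (1/k) sum_{j=1}^k phi_hat_j^2 / VarHat(phi_hat_j),
    # with VarHat(phi_hat_j) = n^{-2} sum_i e_i^2 cos^2(pi j x_i) and
    # e_i = Y_i - Ybar (the simple estimator of Section 4.1).
    n = len(y)
    if kmax is None:
        kmax = n - 1
    ybar = sum(y) / n
    e2 = [(v - ybar) ** 2 for v in y]
    best = running = 0.0
    for j in range(1, kmax + 1):
        cos_ji = [math.cos(math.pi * j * (i + 0.5) / n) for i in range(n)]
        phi = sum(yi * c for yi, c in zip(y, cos_ji)) / n
        var = sum(ei * c * c for ei, c in zip(e2, cos_ji)) / n ** 2
        running += phi * phi / var
        best = max(best, running / j)
    return best
```

Note that $T_n^{het}$ is invariant under linear transformations $Y_i \mapsto aY_i + b$: each $\hat\phi_j$ ($j \ge 1$) scales by $a$ and each $\widehat{\mathrm{Var}}(\hat\phi_j)$ by $a^2$.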
Figure 6. Power comparison of the six tests in a heteroscedastic case with hump-shaped $\sigma^2(x) \propto x - x^2$, using test statistic $T_n^{het}$ (two panels, $n = 15$ and $n = 30$; rejection rate versus slope).

We also tried a U-shaped variance function; see Figures 7 and 8. Here, the levels of the bootstrap and asymptotic tests based on $T_n$, and of the parametric $t$-test, are significantly lower than the nominal level. The hump- and U-shaped variance functions affect the test levels in opposite directions. However, tests based on $T_n^{het}$ still produce results with more accurate test levels, showing its robustness again to heteroscedasticity. The U-shaped variance function does not seriously affect power relative to the homoscedastic case. The bootstrap and wild bootstrap tests based on $T_n^{het}$ tend to have higher power in general, which is undoubtedly a consequence of their superior level properties.

Simulations for the regression function $r(x) = b\sin(3\pi x)$ with both hump- and U-shaped error variance functions were also performed. Of the bootstrap, wild bootstrap and homoscedastic asymptotic tests based on $T_n$, only the wild bootstrap test maintained the correct level. The tests based on $T_n^{het}$ all had correct levels, but their power was very low when $n = 15$. We did not see this phenomenon in the straight-line model. A possible explanation for the low power of $T_n^{het}$ is that the large variation of $\widehat{\mathrm{Var}}(\hat\phi_j)$ hinders the detection of a regression model with more local curvature. This problem could perhaps be alleviated by smoothing the squared residuals in order to stabilize the estimators of $\mathrm{Var}(\hat\phi_j)$.
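Smoothing the squared residuals is straightforward. A minimal sketch using a Nadaraya-Watson (kernel-weighted average) smoother as a simple stand-in for the local linear smoother used in Section 5.2 (the Gaussian kernel, bandwidth and names are ours):

```python
import math

def smooth_sq_residuals(y, h=0.15):
    # Estimate sigma^2(x_i) by kernel smoothing of the squared residuals
    # e_i^2 = (Y_i - Ybar)^2 over the design x_i = (i - 0.5)/n,
    # using a Gaussian kernel with bandwidth h.
    n = len(y)
    ybar = sum(y) / n
    e2 = [(v - ybar) ** 2 for v in y]
    x = [(i + 0.5) / n for i in range(n)]
    out = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        sw = sum(w)
        out.append(sum(wi * ei for wi, ei in zip(w, e2)) / sw)
    return out
```

The smoothed values would replace the raw $e_i^2$ in the estimator of $\mathrm{Var}(\hat\phi_j)$, stabilizing it when the regression function has substantial local curvature.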
Figure 7. Power comparison of the six tests in a heteroscedastic case with a ∧-shaped variance function σ²(x), using test statistic T_n.

Figure 8. Power comparison of the six tests in a heteroscedastic case with a ∧-shaped variance function σ²(x), using test statistic T_n^het.

Using a local linear smoother to obtain variance estimates, we improved the power of the homoscedastic large-sample test. For the wild bootstrap test, we need to smooth the squared residuals for the test statistic and all bootstrap statistics. If a data-driven bandwidth is used for the test statistic, then ideally we would
choose a different bandwidth in each wild bootstrap sample. To avoid time-consuming computations, the same bandwidth was used (in a given data set Y_1, …, Y_n) for T_n^het and all wild bootstrap statistics. The one-sided cross-validation procedure of Hart and Yi (1998) was used to select the bandwidth in T_n^het. This implementation led to a power improvement for the wild bootstrap test. In the straight-line model (both i.i.d. and heteroscedastic cases), the tests based on T_n and T_n^het had comparable power at the smaller sample size. No improvement in power was found by applying the smoothing technique to T_n^het in the straight-line model, which tends to validate our conjecture that the large variation of the estimators of Var(φ̂_j) is a hindrance only when the regression model has sufficient curvature.

6. EXAMPLE

People with diabetes generally have high blood glucose levels. Uncontrolled high blood glucose can lead to long-term complications such as retinopathy, neuropathy, nephropathy, and amputation. The American Diabetes Association (2000) recommends keeping the pre-meal blood glucose level between 80 and 120 mg/dl (4.4 to 6.7 mmol/l), or hemoglobin A1c (HbA1c) below 7%. Daily blood glucose levels can easily be monitored by diabetics themselves. However, the measurements are short-term and relatively unstable, often affected by emotion, meal intake, etc. On the other hand, HbA1c measures long-term glycemic control, but requires laboratory testing. In medical practice, daily morning fasting blood glucose (FBG) has been targeted for day-to-day glycemic control and is considered correlated with HbA1c. Other opinion has expressed the importance of targeting postprandial blood glucose (usually two hours after meals, 2PPBG) to improve long-term glycemic control. The example we consider here is from a clinical trial studying glycemic control in diabetics. The order selection test was used to study the relationships between HbA1c and the blood glucose measurements (FBG or 2PPBG). Scatter plots of HbA1c versus FBG and 2PPBG are shown in Figure 9.
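The smooth curves overlaid on Figure 9, discussed below, are Fourier series estimates whose truncation point is chosen by the order selection criterion. A minimal sketch of such an estimator follows; the criterion's normalization and the difference-based variance estimate are illustrative assumptions, and the regression on covariate ranks used for the unevenly spaced data is omitted.

```python
import numpy as np

def os_fourier_fit(y, x, xgrid):
    """Cosine series estimate with truncation point khat maximizing the
    order selection criterion (1/k) sum_{j<=k} 2 n phihat_j^2 / sigma2hat
    (a sketch; normalization assumed)."""
    n = len(y)
    phihat = np.array([np.mean(y * np.cos(j * np.pi * x)) for j in range(1, n)])
    # difference-based variance estimate (assumes x is sorted)
    sigma2hat = np.sum(np.diff(y) ** 2) / (2 * (n - 1))
    crit = np.cumsum(2 * n * phihat ** 2 / sigma2hat) / np.arange(1, n)
    khat = int(np.argmax(crit)) + 1
    fit = np.full(len(xgrid), y.mean())      # constant term of the series
    for j in range(1, khat + 1):
        fit += 2 * phihat[j - 1] * np.cos(j * np.pi * xgrid)
    return khat, fit

rng = np.random.default_rng(4)
n = 100
x = (np.arange(1, n + 1) - 0.5) / n
y = np.cos(np.pi * x) + 0.25 * rng.standard_normal(n)
khat, fit = os_fourier_fit(y, x, x)
```

Once the no-effect hypothesis is rejected, this estimate comes at essentially no extra cost, since the φ̂_j and the criterion were already computed for the test.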
The variation of HbA1c appears to decrease approximately linearly as FBG increases, whereas the variation of HbA1c seems to be more or less constant in 2PPBG. Since the covariate values are not evenly spaced, we carried out the no-effect tests by regressing HbA1c on the ranks of each covariate (as discussed in Section 2). Table 2 provides strong evidence against the hypothesis that FBG has no effect on HbA1c. P-values for the large-sample test (using F_OS), the bootstrap test and the wild bootstrap test are less than .005 and very similar to each other, for both T_n and T_n^het. The parametric t-test assuming a straight-line relationship between HbA1c and FBG results in a similarly small P-value for testing H_0: β = 0. As for the relationship between 2PPBG and HbA1c, the P-values are even more significant, less than 0.0001, for all the tests. These data support the importance of targeting either the morning fasting blood glucose or the postprandial blood glucose to improve the long-term glycemic control measured by HbA1c. However, the postprandial glucose seems to
Figure 9. HbA1c versus morning fasting and two-hour postprandial blood glucose measurements (mmol/l).

Table 2. P-values for tests of the hypotheses that FBG and 2PPBG have no effect on HbA1c. (Rows: F_OS, Bootstrap, Wild Bootstrap; columns: T_n and T_n^het under each of FBG and 2PPBG.)

have a greater effect on HbA1c. As we mentioned in Section 2, an attractive feature of the order selection test is that once the null hypothesis is rejected, an immediate smooth estimate of the regression function is at hand. The smooth curves in Figure 9 are Fourier series estimates whose truncation points maximize the order selection criterion.

7. CONCLUSION

In this research we have implemented bootstrap methods to approximate the sampling distribution of order selection test statistics. For homoscedastic errors, the bootstrap is a better method than the large-sample test, especially when the sample size is small. The wild bootstrap, meant for models with heteroscedastic errors,
also has satisfactory performance in i.i.d. cases when the sample size is not small. This is encouraging since in practice there is uncertainty about heteroscedasticity.

As for the residuals used in the bootstrap algorithm, we found two reasons why the residuals e_i = Y_i − Ȳ, i = 1, …, n, are a good choice for the no-effect hypothesis. First, computing the e_i's is straightforward since no estimate of a regression model is required. Secondly, when the null hypothesis is true, we are sampling directly from the appropriate empirical distribution by resampling from e_1, …, e_n. By the same token, resampling from ε̂_i = Y_i − Ŷ_i would be appropriate when testing the adequacy of the fitted values Ŷ_1, …, Ŷ_n.

The order selection tests based on either the i.i.d. asymptotic distribution or the bootstrap are at best approximately valid in heteroscedastic cases. Wild bootstrapping, however, leads to tests that are asymptotically valid. A new test statistic T_n^het has been proposed and studied. In contrast to the statistic T_n, T_n^het explicitly estimates the unknown error variance function. Tests based on T_n^het have power comparable to ones based on T_n when the regression function is a straight line. An advantage of T_n^het is that it is more robust than T_n to heteroscedasticity when each statistic is compared with critical values based on the assumption of homoscedasticity. A disadvantage of T_n^het is that its larger variation (under heteroscedasticity) can hinder its ability to detect regression functions with substantial curvature, such as trigonometric functions. Fortunately, this disadvantage can be repaired by smoothing squared residuals.

What is the best test? There is no simple answer to this question. None of the tests is best for every situation. However, if the sample size is not too small (n ≥ 30), the wild bootstrap test seems a good choice.
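The two resampling schemes compared in this section can be sketched side by side. Under the no-effect null the residuals are e_i = Y_i − Ȳ; the ordinary bootstrap resamples them with replacement, while the wild bootstrap multiplies each e_i by an independent weight with mean 0 and variance 1, so that the heteroscedastic pattern of the e_i² is preserved. Mammen's two-point weight distribution used below is one standard choice and an assumption here, not the paper's stated one.

```python
import numpy as np

def resample_null(y, rng, wild=False):
    """One bootstrap sample under the no-effect null hypothesis (a sketch).

    Residuals e_i = Y_i - Ybar require no fitted regression model, and
    resampling them imposes the null on the bootstrap world.
      wild=False: ordinary bootstrap, Y*_i = Ybar + e*_i, with the e*_i
                  drawn i.i.d. with replacement from e_1, ..., e_n.
      wild=True:  wild bootstrap, Y*_i = Ybar + e_i * v_i, with i.i.d.
                  weights v_i such that E(v_i) = 0 and E(v_i^2) = 1
                  (Mammen's two-point weights; an assumption)."""
    n = len(y)
    e = y - y.mean()
    if wild:
        a, b = -(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2
        p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))
        v = np.where(rng.random(n) < p, a, b)    # E(v) = 0, E(v^2) = 1
        return y.mean() + e * v
    return y.mean() + rng.choice(e, size=n, replace=True)

rng = np.random.default_rng(3)
y = rng.normal(size=40)
ystar = resample_null(y, rng)                 # ordinary bootstrap sample
ywild = resample_null(y, rng, wild=True)      # wild bootstrap sample
```

Recomputing the test statistic on many such samples and comparing it with the statistic for the original data yields the bootstrap P-value.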
It is the only asymptotically valid test (among those considered) in the heteroscedastic case, and it performs comparably to the bootstrap and large-sample tests in i.i.d. cases. In case the sample size is small, the identification of heteroscedasticity will be crucial. The bootstrap test is the best choice for models with i.i.d. errors, whereas the wild bootstrap is better when heteroscedasticity is present. We have seen in the simulation that the test based on the smoothed version of T_n^het and using critical values from F_OS maintains the correct level remarkably well and also has satisfactory power in the heteroscedastic cases. However, further study is needed to determine its robustness under other patterns of heteroscedasticity.

The simulation was done with evenly spaced designs. We briefly discussed in Section 2 two possible solutions for cases with unevenly spaced or random designs. It would be interesting to see which has the best performance. The effects, if any, that a random design has on the null distribution of the order selection test statistic will be of interest. The exploration of proper bootstrap methods in this setting is a topic for future research.

Thus far we have studied the order selection test of no effect in simple regression models. Of interest are extensions to testing the fit of general parametric models and also multiple regression settings (i.e., more than one predictor). Extensions in each of these directions are certainly possible. This can be done by combining
ideas in Aerts et al. (1999) and Aerts et al. (2000) with those in the current paper.

8. APPENDIX

Proof of Theorem 3. Since E(e_i²) = σ_i² + O(n⁻¹) uniformly in i, we can use a law of large numbers to establish that

  σ̂² →_P σ̄² ≡ ∫₀¹ σ²(x) dx.

By Slutsky's Theorem, T_n has the same limit distribution as Ť_n, where

  Ť_n = max_{1≤k≤n−1} (1/k) Σ_{j=1}^k 2n φ̂_j² / σ̄².

We may write

  Ť_n = max_{1≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}²,

where c_{jn} = 2n Var(φ̂_j)/σ̄² and Z_{jn}² = φ̂_j²/Var(φ̂_j). Letting K_n be a sequence that goes to ∞ at a slower rate than n, the proof consists of demonstrating (i) and (ii) below:

(i) P( max_{K_n≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² > t ) → 0;

(ii) P( max_{1≤k<K_n} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² ≤ t ) − P( sup_{k≥1} (1/k) Σ_{j=1}^k c_j Z_j² ≤ t ) → 0.

We first prove (i). It is straightforward to show that (1/k) Σ_{j=1}^k c_j Z_j² → 1 almost surely as k → ∞, and hence in considering P(Ť_n > t) we may take t > 1. Note that

  P( max_{K_n≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² > t ) = P( max_{K_n≤k≤n−1} (1/k) [ Σ_{j=1}^k c_{jn}(Z_{jn}² − 1) + Σ_{j=1}^k c_{jn} ] > t ).

It may be shown that (1/k) Σ_{j=1}^k c_{jn} → 1 as k → ∞ and n → ∞, and hence, for all sufficiently large n, the last probability is at most

  P( max_{K_n≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn}(Z_{jn}² − 1) > (t − 1)/2 ).

Let Q_{jn} = c_{jn}(Z_{jn}² − 1). It suffices to show that for any δ > 0,

  P( max_{K_n≤k≤n−1} (1/k) | Σ_{j=1}^k Q_{jn} | > δ ) → 0.
Let j₁ be the largest integer such that 2^{j₁} ≤ K_n, and let j₂ be the largest integer such that 2^{j₂} ≤ n. For each n and j₁ ≤ j ≤ j₂, define

  ξ_{jn} = max_{2^j ≤ k < 2^{j+1}} | Σ_{r=2^j}^k Q_{rn} |.

If k is such that 2^j ≤ k < 2^{j+1}, then

  (1/k) | Σ_{r=1}^k Q_{rn} | ≤ 2^{−j} ( | Σ_{r=1}^{2^j − 1} Q_{rn} | + ξ_{jn} ).

Thus we have

  { max_{K_n≤k≤n−1} (1/k) | Σ_{r=1}^k Q_{rn} | > δ } ⊂ ∪_{j=j₁}^{j₂} ( { 2^{−j} | Σ_{r=1}^{2^j − 1} Q_{rn} | > δ/2 } ∪ { 2^{−j} ξ_{jn} > δ/2 } ).

Hence, by Markov's inequality, we have

  P( max_{K_n≤k≤n−1} (1/k) | Σ_{r=1}^k Q_{rn} | > δ )
    ≤ Σ_{j=j₁}^{j₂} P( | Σ_{r=1}^{2^j − 1} Q_{rn} | > δ 2^{j−1} ) + Σ_{j=j₁}^{j₂} P( ξ_{jn} > δ 2^{j−1} )
    ≤ Σ_{j=j₁}^{j₂} (4 / (δ² 4^j)) E( Σ_{r=1}^{2^j − 1} Q_{rn} )² + Σ_{j=j₁}^{j₂} P( ξ_{jn} > δ 2^{j−1} ).   (16)

Using the moment conditions and the boundedness of the cosine function, E( Σ_{r=1}^{2^j − 1} Q_{rn} )² = Var( Σ_{r=1}^{2^j − 1} c_{rn} Z_{rn}² ) = O(2^j), and hence the first sum on the right-hand side of inequality (16) is of the order Σ_{j=j₁}^{j₂} 2^{−j}, which tends to 0 as K_n → ∞. Again, by Markov's inequality, we have

  Σ_{j=j₁}^{j₂} P( ξ_{jn} > δ 2^{j−1} ) ≤ Σ_{j=j₁}^{j₂} (4 / (δ² 4^j)) E ξ_{jn}².

We now use a result of Serfling (1970) to deal with E ξ_{jn}². Denote the joint distribution function of the random variables Q_{a+1,n}, …, Q_{a+k,n} by F_{a,k}. There exists a constant D such that E( Σ_{i=a+1}^{a+k} Q_{in} )² ≤ kD. Define g(F_{a,k}) to be the following functional:

  g(F_{a,k}) = kD.
Obviously g(F_{a,k}) + g(F_{a+k,l}) ≤ g(F_{a,k+l}), and

  E( Σ_{i=a+1}^{a+k} Q_{in} )² ≤ g(F_{a,k}).

Applying Theorem A of Serfling (1970), we have

  E ξ_{jn}² ≤ ( log(2 · 2^j) / log 2 )² 2^j D.

Hence

  Σ_{j=j₁}^{j₂} P( ξ_{jn} > δ 2^{j−1} ) ≤ Σ_{j=j₁}^{j₂} (4 / (δ² 4^j)) ( log(2 · 2^j) / log 2 )² 2^j D → 0.

Combining the preceding results,

  P( max_{K_n≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² > t ) → 0,

which clearly leads to (i).

Next we show (ii). For any positive integer K, let

  T_n(K) = max_{1≤k≤K} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}²,  T(K) = max_{1≤k≤K} (1/k) Σ_{j=1}^k c_j Z_j²,  T = sup_{k≥1} (1/k) Σ_{j=1}^k c_j Z_j²,

and define the events A_K = { T_n(K) ≤ t } and B_K = { max_{K<k≤K_n} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² ≤ t }. We need to show that for any δ > 0,

  | P( T_n(K_n) ≤ t ) − P( T ≤ t ) | ≤ δ

for all n sufficiently large. Since P( T_n(K_n) ≤ t ) = P( A_K ∩ B_K ), we have

  | P( T_n(K_n) ≤ t ) − P( T ≤ t ) | ≤ | P(A_K) − P(T(K) ≤ t) | + | P(T(K) ≤ t) − P(T ≤ t) | + P(B_K^c).

One may show that |c_{jn} − c_j| ≤ ε, 1 ≤ j ≤ n, for any ε > 0 and all n sufficiently large. Using this fact, the joint asymptotic normality of (φ̂_1, …, φ̂_K), and the continuous mapping theorem, we have

  | P(A_K) − P(T(K) ≤ t) | → 0   (17)

for all fixed K as n → ∞. Clearly there exists K_a such that

  | P(T(K_a) ≤ t) − P(T ≤ t) | ≤ δ/3.   (18)
Arguing as in our proof of (i), for any ε > 0 there exists K₀ = K₀(ε) such that

  P( max_{K₀≤k≤K_n} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² > t ) ≤ ε   (19)

for all n sufficiently large. Using (17), (18) and (19) with ε = δ/3, and choosing K_b = max( K₀(δ/3), K_a ), we have

  | P(T_n(K_n) ≤ t) − P(T ≤ t) | ≤ | P(A_{K_b}) − P(T(K_b) ≤ t) | + | P(T(K_b) ≤ t) − P(T ≤ t) | + P(B_{K_b}^c) ≤ δ

for all n sufficiently large, which completes the proof of Theorem 3.

Proof of Theorem 4. Let v_j = Var(φ̂_j) and let v̂_j denote its estimator. Using essentially the same argument as in the proof of Theorem 3, it can be shown that

  max_{1≤k≤n−1} (1/k) Σ_{j=1}^k φ̂_j² / v_j →_D sup_{k≥1} (1/k) Σ_{j=1}^k Z_j².   (20)

Now,

  T_n^het = max_{1≤k≤n−1} (1/k) Σ_{j=1}^k φ̂_j² / v̂_j = max_{1≤k≤n−1} (1/k) Σ_{j=1}^k [ φ̂_j² / v_j + φ̂_j² ( 1/v̂_j − 1/v_j ) ].

Note that

  (1/k) | Σ_{j=1}^k φ̂_j² ( 1/v̂_j − 1/v_j ) | ≤ max_{1≤j≤n−1} | v_j/v̂_j − 1 | · (1/k) Σ_{j=1}^k φ̂_j² / v_j.

Using this last inequality, (20) and assumption (4), it now follows that

  max_{1≤k≤n−1} (1/k) | Σ_{j=1}^k φ̂_j² ( 1/v̂_j − 1/v_j ) |

converges in probability to 0, and the result follows.

REFERENCES

Aerts, M., Claeskens, G., and Hart, J. D. (1999), "Testing the Fit of a Parametric Function," Journal of the American Statistical Association, 94.

Aerts, M., Claeskens, G., and Hart, J. D. (2000), "Testing Lack of Fit in Multiple Regression," Biometrika, 87.

American Diabetes Association (2000), "Standards of Medical Care for Patients With Diabetes Mellitus," Diabetes Care, 23, Supplement 1, S32–S42.

Efron, B., and Tibshirani, R. J. (1993), An Introduction to the Bootstrap, New York: Chapman and Hall.
Eubank, R. L., and Hart, J. D. (1992), "Testing Goodness of Fit in Regression via Order Selection Criteria," The Annals of Statistics, 20.

Hall, P., and Wilson, S. R. (1991), "Two Guidelines for Bootstrap Hypothesis Testing," Biometrics, 47.

Härdle, W., and Mammen, E. (1993), "Comparing Nonparametric Versus Parametric Regression Fits," The Annals of Statistics, 21.

Hart, J. D. (1997), Nonparametric Smoothing and Lack-of-Fit Tests, New York: Springer-Verlag.

Hart, J. D., and Yi, S. (1998), "One-Sided Cross-Validation," Journal of the American Statistical Association, 93.

Kuchibhatla, M., and Hart, J. D. (1996), "Smoothing-Based Lack-of-Fit Test: Variations on a Theme," Journal of Nonparametric Statistics, 7, 1–22.

Liu, R. Y. (1988), "Bootstrap Procedures Under Some Non-I.I.D. Models," The Annals of Statistics, 16.

Mammen, E. (1993), "Bootstrap and Wild Bootstrap for High Dimensional Linear Models," The Annals of Statistics, 21.

Parzen, E. (1981), "Nonparametric Statistical Data Science: A Unified Approach Based on Density Estimation and Testing for White Noise," Technical report, Department of Statistics, Texas A&M University.

Qumsiyeh, M. B. (1990), "Edgeworth Expansion in Regression Models," Journal of Multivariate Analysis, 35.

Qumsiyeh, M. B. (1994), "Bootstrapping and Empirical Edgeworth Expansions in Multiple Linear Regression Models," Communications in Statistics, Theory and Methods, 23.

Serfling, R. J. (1970), "Moment Inequalities for the Maximum Cumulative Sum," The Annals of Mathematical Statistics, 41.

Spitzer, F. (1956), "A Combinatorial Lemma and Its Application to Probability Theory," Transactions of the American Mathematical Society, 82.

Wu, C. F. J. (1986), "Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis," The Annals of Statistics, 14.
More informationUnderstanding Regressions with Observations Collected at High Frequency over Long Span
Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University
More informationSmooth nonparametric estimation of a quantile function under right censoring using beta kernels
Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Chanseok Park 1 Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 Short Title: Smooth
More informationDS-GA 1002 Lecture notes 2 Fall Random variables
DS-GA 12 Lecture notes 2 Fall 216 1 Introduction Random variables Random variables are a fundamental tool in probabilistic modeling. They allow us to model numerical quantities that are uncertain: the
More informationLarge Sample Properties of Estimators in the Classical Linear Regression Model
Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in
More informationThe Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests
Working Paper 2013:8 Department of Statistics The Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests Jianxin Wei Working Paper 2013:8 June 2013 Department of Statistics Uppsala
More informationA NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1
A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1 Jinglin Zhou Hong Wang, Donghua Zhou Department of Automation, Tsinghua University, Beijing 100084, P. R. China Control Systems Centre,
More informationInferential statistics
Inferential statistics Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample. Ahmed-Refat-ZU Null and alternative hypotheses In hypotheses testing,
More informationLeast Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions
Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error
More informationRank-sum Test Based on Order Restricted Randomized Design
Rank-sum Test Based on Order Restricted Randomized Design Omer Ozturk and Yiping Sun Abstract One of the main principles in a design of experiment is to use blocking factors whenever it is possible. On
More informationBootstrap Tests: How Many Bootstraps?
Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002
More informationInference in VARs with Conditional Heteroskedasticity of Unknown Form
Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg
More informationWorking Paper No Maximum score type estimators
Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior
More informationPower and Sample Size Calculations with the Additive Hazards Model
Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine
More informationA Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators
Statistics Preprints Statistics -00 A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Jianying Zuo Iowa State University, jiyizu@iastate.edu William Q. Meeker
More informationDiscussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis
Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,
More informationBootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator
Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos
More informationRecursive Least Squares for an Entropy Regularized MSE Cost Function
Recursive Least Squares for an Entropy Regularized MSE Cost Function Deniz Erdogmus, Yadunandana N. Rao, Jose C. Principe Oscar Fontenla-Romero, Amparo Alonso-Betanzos Electrical Eng. Dept., University
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares
Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit
More informationEco517 Fall 2014 C. Sims FINAL EXAM
Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,
More informationAdditive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535
Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June
More informationBootstrap (Part 3) Christof Seiler. Stanford University, Spring 2016, Stats 205
Bootstrap (Part 3) Christof Seiler Stanford University, Spring 2016, Stats 205 Overview So far we used three different bootstraps: Nonparametric bootstrap on the rows (e.g. regression, PCA with random
More informationA NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL
Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl
More informationWooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics
Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).
More informationBootstrap-Based Improvements for Inference with Clustered Errors
Bootstrap-Based Improvements for Inference with Clustered Errors Colin Cameron, Jonah Gelbach, Doug Miller U.C. - Davis, U. Maryland, U.C. - Davis May, 2008 May, 2008 1 / 41 1. Introduction OLS regression
More informationThe Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University
The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor
More informationUniformly and Restricted Most Powerful Bayesian Tests
Uniformly and Restricted Most Powerful Bayesian Tests Valen E. Johnson and Scott Goddard Texas A&M University June 6, 2014 Valen E. Johnson and Scott Goddard Texas A&MUniformly University Most Powerful
More informationIdentification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case
Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College Original December 2016, revised July 2017 Abstract Lewbel (2012)
More informationInference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation
Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix
More informationSmooth simultaneous confidence bands for cumulative distribution functions
Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang
More informationGMM Estimation of a Maximum Entropy Distribution with Interval Data
GMM Estimation of a Maximum Entropy Distribution with Interval Data Ximing Wu and Jeffrey M. Perloff January, 2005 Abstract We develop a GMM estimator for the distribution of a variable where summary statistics
More informationBootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap
Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap Emmanuel Flachaire To cite this version: Emmanuel Flachaire. Bootstrapping heteroskedastic regression models: wild bootstrap
More information