Bootstrapping The Order Selection Test
Chien-Feng Chen, Jeffrey D. Hart, and Suojin Wang

ABSTRACT

We consider bootstrap versions of the order selection test of Eubank and Hart (1992) and Kuchibhatla and Hart (1996) for testing lack-of-fit of regression models. For homoscedastic data, conditions are established under which the bootstrap level error is smaller (asymptotically) than that of the large-sample test. A new statistic is proposed to deal with the case of heteroscedastic data. The limiting distribution of this test statistic is derived and shown to depend on the unknown error variance function. This dependency makes using the asymptotic distribution a formidable task in practice. An alternative approximation is to apply bootstrap procedures. We propose various bootstrap tests, including ones based on the wild bootstrap. Simulation studies indicate that the wild bootstrap generally has good level and power properties, although power can sometimes be increased by appropriate smoothing of squared residuals. A real-data example is also considered to further illustrate the methodology.

KEY WORDS: Bootstrap; Heteroscedasticity; Nonparametric smoothing; Order selection test; Wild bootstrap.

Chien-Feng Chen is Senior Statistician, Insulins Product Team, Eli Lilly and Company, Indianapolis, IN. Jeffrey D. Hart is Professor, Department of Statistics, Texas A&M University, College Station, TX. Suojin Wang is Professor, Department of Statistics, Texas A&M University, College Station, TX. Hart's research was supported in part by NSF Grant DMS. Wang's research was supported by the Texas Advanced Research Program, the National Cancer Institute (CA 57030) and the Texas A&M Center of Environmental and Rural Health through a grant from the National Institute of Environmental Health Sciences (P30-E50906). The authors are grateful to an Associate Editor and a referee for their review of our work and helpful comments.
1. INTRODUCTION

In recent years using nonparametric smoothing techniques in testing lack-of-fit of regression models has drawn much attention. Smoothing-based tests surpass the classical nonparametric tests, such as the von Neumann test and cusum tests, in more than one way. They tend to be more powerful and can provide estimates of the regression function when the null hypothesis is rejected. As a consequence of smoothing, most of the proposed tests depend on smoothing parameters. To have a desired level of significance, smoothing parameters need to be fixed in advance to carry out the tests. Furthermore, the choice of smoothing parameters has an effect on the power of the test. Unlike most of the smoothing-based tests, the order selection test of Eubank and Hart (1992) does not depend on arbitrarily chosen smoothing parameters. The test utilizes an orthogonal series estimate of the underlying regression function, using the truncation point (a smoothing parameter) of the estimate as the test statistic. Therefore, the test statistic is itself a data-driven smoothing parameter. An equivalent form of the test pointed out by Kuchibhatla and Hart (1996) provides a continuous-valued test statistic that makes computation of P-values relatively straightforward. The order selection test is consistent against fixed alternatives and can detect local alternatives that converge to the null at the rate $n^{-1/2}$.

This paper focuses on testing the no-effect hypothesis, i.e., the hypothesis that the regression function is constant. In doing so we make use of the Kuchibhatla and Hart (1996) version of the order selection test. The exact distribution of one version of the test statistic is known when the errors are independent and identically distributed (i.i.d.) Gaussian. The asymptotic distribution was obtained by Eubank and Hart (1992) assuming only that the errors are i.i.d. with finite fourth moments.
However, the validity of approximating the sampling distribution by an asymptotic one depends on factors such as sample size, error distribution and estimation of error variance. An alternative and often better approximation is to apply bootstrap procedures. One goal of this paper is to provide theoretical justification for the bootstrap in the case of i.i.d. errors.

Another aim of the paper is to deal with heteroscedastic regression models. The asymptotic distribution of the test statistic of Kuchibhatla and Hart (1996) is derived and shown to depend on the unknown variance function. The discrepancy between the asymptotic distribution and its homoscedastic counterpart is non-negligible. Ignoring model heteroscedasticity will invalidate the use of the order selection test. Approximating the heteroscedastic asymptotic distribution involves, among other things, estimation of the error variance function, which is not an easy task. The wild bootstrap method is a convenient tool for producing a consistent estimator of a statistic's sampling distribution when the errors have nonconstant variances. We employ wild bootstrap methods in this research to approximate the sampling distribution of test statistics for the no-effect hypothesis. A new test statistic (denoted $T_n^{het}$), equivalent in the asymptotic sense to that of Kuchibhatla and Hart (1996) in the
i.i.d. case, is proposed. Although the limiting distribution of $T_n^{het}$ also depends on the unknown error variance, the dependence is shown to be minor. The asymptotic distribution under i.i.d. assumptions can be considered a ballpark substitute for the heteroscedastic one.

The rest of the paper is organized as follows. In Section 2 we briefly review the development of the order selection test. Bootstrap procedures are discussed and applied to a homoscedastic model in Section 3. Here, it is shown that the bootstrap level error is asymptotically smaller than that of the asymptotic test. Section 4 deals with heteroscedastic models. Asymptotic theory and bootstrap procedures are explored. Simulation results for both i.i.d. and heteroscedastic cases are presented in Section 5. In Section 6 we apply the two test statistics, through asymptotic tests, bootstrap tests and wild bootstrap tests, to an example from a diabetes clinical trial. The conclusions reached in this research and some open questions for future research are given in Section 7. The proofs of two theorems are presented in the Appendix.

2. THE ORDER SELECTION TEST

Consider the simple regression model

$$Y_i = r(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$

where $Y_1, \ldots, Y_n$ are the observed responses, $r$ is the regression function, $x_i = (i - .5)/n$, $i = 1, \ldots, n$, and $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. error terms with zero mean and variance $\sigma^2$. As long as the regression function is piecewise smooth on $[0, 1]$, then at all its continuity points it can be represented by the Fourier series

$$r(x) = \phi_0 + 2\sum_{j=1}^{\infty}\phi_j\cos(\pi jx), \qquad (2)$$

where the Fourier coefficients are

$$\phi_j = \int_0^1 r(x)\cos(\pi jx)\,dx, \qquad j = 0, 1, \ldots.$$

In analogy to the Fourier series above, we may estimate $r(x)$ by the truncated series

$$\hat r(x; m) = \hat\phi_0 + 2\sum_{j=1}^{m}\hat\phi_j\cos(\pi jx), \qquad (3)$$

where $m$ is a nonnegative integer less than $n$ and

$$\hat\phi_j = n^{-1}\sum_{i=1}^{n} Y_i\cos(\pi jx_i), \qquad j = 0, 1, \ldots, n - 1. \qquad (4)$$

A fundamental problem in regression is testing the no-effect hypothesis,

$$H_0: r(x) = C \quad \text{for all } x \in [0, 1],$$
where $C$ is an unknown constant. When (2) holds, the no-effect hypothesis is equivalent to the hypothesis that $\phi_j = 0$ for all $j \ge 1$. The Eubank and Hart (1992) order selection test for no-effect is based on the data-driven truncation point $\hat m$, an estimator of the $m$ in (3) that minimizes an estimated risk function. Specifically, $\hat m$ is the maximizer of $J(m; \gamma_\alpha)$, where $J(0; \gamma_\alpha) = 0$,

$$J(m; \gamma_\alpha) = \sum_{j=1}^m \frac{2n\hat\phi_j^2}{\hat\sigma^2} - \gamma_\alpha m, \qquad 1 \le m \le n - 1,$$

$\hat\sigma^2$ is any consistent estimator of $\sigma^2$, and $\gamma_\alpha$ is a constant that depends upon the desired significance level $\alpha$. A value of $\hat m \ge 1$ is evidence that at least one $\phi_j$ is nonzero; hence the test rejects the null hypothesis at level $\alpha$ if $\hat m \ge 1$. Taking $\gamma_\alpha = 3.22$ and $4.79$ yields asymptotic tests of size .10 and .05, respectively. An attractive feature of this test is that once the null hypothesis is rejected, an immediate point estimate of the regression function is at hand:

$$\hat r(x; \hat m) = \hat\phi_0 + 2\sum_{j=1}^{\hat m}\hat\phi_j\cos(\pi jx).$$

An equivalent form of the test proposed by Kuchibhatla and Hart (1996) uses the continuous-valued test statistic

$$T_n = \max_{1\le m\le n-1}\frac{1}{m}\sum_{j=1}^m \frac{2n\hat\phi_j^2}{\hat\sigma^2}, \qquad (5)$$

and $H_0$ is rejected for large values of $T_n$. As long as the errors are assumed to be i.i.d. with finite fourth moments, $T_n$ converges in distribution (under $H_0$) to

$$T = \sup_{m\ge1}\frac{1}{m}\sum_{j=1}^m Z_j^2,$$

where $Z_1, Z_2, \ldots$ are i.i.d. standard normal random variables. As shown by Spitzer (1956), the distribution of $T$ can be determined to any desired accuracy. The asymptotic $\alpha$-level critical value of the $T_n$-based test is precisely the value $\gamma_\alpha$ that induces an asymptotic level of $\alpha$ for the $\hat m$ version of the order selection test.

In cases where design points are fixed but not evenly spaced, there are at least two solutions. First, one may test for constancy of the regression quantile function, as defined by Parzen (1981). Let $u_j = (j - .5)/n$, $j = 1, \ldots, n$, and suppose the (unevenly spaced) design points satisfy $x_j = Q_n(u_j)$, $j = 1, \ldots, n$, where $Q_n$ is a piecewise constant empirical quantile function that converges to some $Q$ as $n \to \infty$. The hypothesis $H_0: r(x_j) = C$, $j = 1, \ldots, n$, is equivalent to $H_0: r(Q_n(u_j)) = C$, $j = 1, \ldots, n$.
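The sample Fourier coefficients (4) and the statistic $T_n$ in (5) are straightforward to compute. A minimal sketch in pure Python (function names are our own; the evenly spaced design $x_i = (i - .5)/n$ is that of model (1)):

```python
import math

def phi_hat(y, j):
    # Sample Fourier coefficient (4): (1/n) * sum_i Y_i cos(pi j x_i),
    # with the evenly spaced design x_i = (i - 0.5)/n of model (1).
    n = len(y)
    return sum(yi * math.cos(math.pi * j * (i + 0.5) / n)
               for i, yi in enumerate(y)) / n

def t_n(y):
    # Order selection statistic (5):
    #   T_n = max_{1 <= m <= n-1} (1/m) sum_{j=1}^m 2n phi_hat_j^2 / sigma2_hat,
    # here using sigma2_hat = (1/n) * sum (Y_i - Ybar)^2.
    n = len(y)
    ybar = sum(y) / n
    sigma2 = sum((v - ybar) ** 2 for v in y) / n
    best = running = 0.0
    for m in range(1, n):
        running += 2.0 * n * phi_hat(y, m) ** 2 / sigma2
        best = max(best, running / m)
    return best
```

Large values of `t_n(y)` are evidence against the no-effect hypothesis; in practice they would be compared with percentiles of the limiting distribution of $T$.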
Hence the test procedures described previously can be applied to the regression quantile function $r(Q_n(\cdot))$. A second approach is to define a version of $T_n$ in terms of basis functions that are orthogonal with respect to the design points. Given any set of basis functions, one may easily construct an orthogonal basis from them by a Gram-Schmidt
procedure, but in fact doing so is not necessary, as discussed in Eubank and Hart (1992). The asymptotic distribution theory for these two methods is the same as that for $T_n$. Extensions to the random design case are also straightforward by conditioning on the observed $x$-values.

3. BOOTSTRAPPING WITH I.I.D. ERRORS

Our main purpose in this section is to show that a bootstrap method often better approximates the null distribution of $T_n$ than does the large-sample distribution. In our bootstrap algorithm, we need to simulate data from a model that assumes $H_0$ to be true, which is in keeping with one of the two bootstrap guidelines set forth by Hall and Wilson (1991). To this end, let $\bar Y$ be the sample mean of $Y_1, \ldots, Y_n$, and define bootstrap data by

$$Y_i^* = \bar Y + \varepsilon_i^*, \qquad i = 1, \ldots, n,$$

where $\varepsilon_1^*, \ldots, \varepsilon_n^*$ are i.i.d. as $F_n$, the empirical distribution of $e_1 = Y_1 - \bar Y, \ldots, e_n = Y_n - \bar Y$. Define bootstrap Fourier coefficients by

$$\hat\phi_j^* = n^{-1}\sum_{i=1}^n Y_i^*\cos(\pi jx_i), \qquad j = 1, \ldots, n, \qquad (6)$$

and $\hat\sigma^{*2}$ to be exactly the same function of $Y_1^*, \ldots, Y_n^*$ as $\hat\sigma^2$ is of $Y_1, \ldots, Y_n$. For the remainder of this section we assume that $\hat\sigma^2 = n^{-1}\sum_{i=1}^n (Y_i - \bar Y)^2$. Our test statistic will be

$$T_n(k_0) = \max_{1\le k\le k_0}\frac{1}{k}\sum_{j=1}^k \frac{2n\hat\phi_j^2}{\hat\sigma^2}$$

and its bootstrap counterpart

$$T_n^*(k_0) = \max_{1\le k\le k_0}\frac{1}{k}\sum_{j=1}^k \frac{2n\hat\phi_j^{*2}}{\hat\sigma^{*2}}.$$

The statistic $T_n(k_0)$ satisfies the second Hall and Wilson (1991) guideline, namely it is an (asymptotic) pivotal quantity. The number $k_0$ is fixed, but allowed to be arbitrarily large. Ideally, we would choose $k_0 = n - 1$ (as in Section 2), but this leads to technical difficulties in proving a bootstrap accuracy result. In practice, we have found that the choice of $k_0$ is a very minor point, since it is extremely rare for the maximum of $k^{-1}\sum_{j=1}^k 2n\hat\phi_j^2/\hat\sigma^2$ to occur at a $k$ larger than 5.

In order to show that the bootstrap distribution accurately estimates the null distribution of the test statistic $T_n(k_0)$, we will make use of the theoretical results of Qumsiyeh (1990, 1994), which were developed for the bootstrap in multiple regression models. Let $X$ be the $n \times (k_0 + 1)$ matrix with first column all 1's and entry $\sqrt{2}\cos(\pi(j - 1)x_i)$ in the $i$th row and $j$th column, $i = 1, \ldots, n$, $j = 2, \ldots, k_0 + 1$.
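The bootstrap algorithm just described is simple to implement by Monte Carlo. A hedged sketch in pure Python (the truncation point $k_0 = 5$, the number of resamples $B$, the seed, and all function names are our own choices for illustration):

```python
import math
import random

def t_n_k0(y, k0=5):
    # T_n(k0) = max_{1<=k<=k0} (1/k) sum_{j=1}^k 2n phi_hat_j^2 / sigma2_hat.
    n = len(y)
    ybar = sum(y) / n
    sigma2 = sum((v - ybar) ** 2 for v in y) / n
    if sigma2 == 0.0:
        return 0.0  # degenerate sample: no evidence against H_0
    phi = [sum(yi * math.cos(math.pi * j * (i + 0.5) / n)
               for i, yi in enumerate(y)) / n for j in range(1, k0 + 1)]
    best = running = 0.0
    for k in range(1, k0 + 1):
        running += 2.0 * n * phi[k - 1] ** 2 / sigma2
        best = max(best, running / k)
    return best

def bootstrap_pvalue(y, B=200, seed=1):
    # Resample the centered residuals e_i = Y_i - Ybar (the null model),
    # form Y*_i = Ybar + e*_i, and compare T_n(k0) with its bootstrap copies.
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    e = [v - ybar for v in y]
    t_obs = t_n_k0(y)
    exceed = 0
    for _ in range(B):
        ystar = [ybar + rng.choice(e) for _ in range(n)]
        if t_n_k0(ystar) >= t_obs:
            exceed += 1
    return exceed / B
```

For data with a strong cosine component the observed statistic dwarfs the bootstrap copies and the p-value is near zero; in applications $B$ would of course be much larger than 200.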
The discussion above indicates that, for some large enough but fixed $k_0 < n$, model (1) may be well approximated by the model $Y = X\beta + \varepsilon$
with $Y = (Y_1, \ldots, Y_n)'$, $\beta = (\phi_0, \sqrt{2}\phi_1, \ldots, \sqrt{2}\phi_{k_0})'$ and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$. Note that $X'X = nI_p$ for $p = k_0 + 1$, and, as can easily be seen from (4) and (6),

$$\hat\beta = (\hat\phi_0, \sqrt{2}\hat\phi_1, \ldots, \sqrt{2}\hat\phi_{k_0})' = (X'X)^{-1}X'Y = n^{-1}X'Y,$$

the least squares estimate of $\beta$, and $\hat\beta^* = (\hat\phi_0^*, \sqrt{2}\hat\phi_1^*, \ldots, \sqrt{2}\hat\phi_{k_0}^*)' = n^{-1}X'Y^*$.

We make the following assumptions on regularity conditions:

A1. The $\varepsilon_i$'s are i.i.d. with mean 0 and finite variance $\sigma^2 > 0$.

A2. $\varepsilon_1$ has a non-zero absolutely continuous component which has a positive density on an open subset of $R$.

A3. $\varepsilon_1$ has a finite $2s$-th absolute moment for some integer $s \ge 3$.

Let $\Phi_r$ and $\psi_r$ be the standard normal distribution and its density on $R^r$, respectively. Denote the mapping defining $T_n(k_0)$ from $R^{k_0}$ to $R$ by $Q$: $Q(x) = \max_{1\le k\le k_0} k^{-1}\sum_{j=1}^k x_j^2$. We also denote by $P^*$ the probability under $F_n$. Further, define $\mathcal{B}$ as a class of all Borel subsets $E$ of $R^{k_0}$ satisfying (i) $\sup_{E\in\mathcal{B}} \Phi_p((\partial E)^\eta) = O(\eta)$ as $\eta \to 0$, where $(\partial E)^\eta$ is the set of points in $R^p$ within $\eta$ of the boundary $\partial E$ of $E$, and (ii) each set $E \in \mathcal{B}$ corresponds to a Borel set $D \subset R$ with $E = Q^{-1}(D)$.

Theorem 1. Let Assumptions A1-A3 hold and suppose that $s$ in A3 is greater than $p$. Then under the null hypothesis $H_0$ that $r$ is constant, we have a.s. as $n \to \infty$

$$\sup_t \big|P^*(T_n^*(k_0) \le t) - P(T_n(k_0) \le t)\big| = o(n^{-1/2}), \qquad (7)$$

and hence the bootstrap approximation for the distribution of $T_n(k_0)$ is asymptotically more accurate than the normal approximation, which generally has an error of $O(n^{-1/2})$.

Proof. A complete proof is very long. We will sketch the main steps by applying the results of Qumsiyeh (1990, 1994). First we show the following two-term Edgeworth expansion for $P(T_n(k_0) \in D)$:

$$\sup_{E\in\mathcal{B}} \Big|P(T_n(k_0) \in D) - \int_E \big[1 + n^{-1/2}P_1(x)\big]\psi_{k_0}(x)\,dx\Big| = o(n^{-1/2}), \qquad (8)$$

where $P_1$ is a polynomial function with its coefficients depending on the $\nu$-th and lower order cumulants $\kappa_\nu$ of $\hat\beta$ for some $\nu$. We use an argument similar to that in Qumsiyeh's (1994) proof of his Theorem 2.4 under $H_0$ for the studentized vector $W_n = n^{1/2}(\hat\beta - \beta_0)/\hat\sigma$, where $\beta_0 = (\phi_0, 0, \ldots, 0)'$. The only difference between $W_n$ and the statistic in Theorem 2.4 of Qumsiyeh (1994) is that $\hat\sigma$ in $W_n$ uses the null model residuals $e_i$, while Qumsiyeh's statistic uses the full model residuals $\hat\varepsilon_i = Y_i - X_i\hat\beta$, to estimate the standard error. However, under $H_0$, it is easy to see that the difference between the estimated standard errors is of $O_p(n^{-1})$. Therefore,

$$\sup_{E\in\mathcal{B}} \Big|P(W_n \in E) - \int_E \big[1 + n^{-1/2}P_1(x)\big]\psi_{k_0}(x)\,dx\Big| = o(n^{-1/2}) \qquad (9)$$

for some polynomial $P_1(x)$ as defined above. Equation (8) then follows since $T_n(k_0)$ is obtained by applying $Q$ to $W_n$, and thus $P(T_n(k_0) \in D) = P(W_n \in E)$ for $E = Q^{-1}(D)$.
Next, we give the bootstrap version of equation (8): a.s. as $n \to \infty$,

$$\sup_{E\in\mathcal{B}} \Big|P^*(T_n^*(k_0) \in D) - \int_E \big[1 + n^{-1/2}\hat P_1(x)\big]\psi_{k_0}(x)\,dx\Big| = o(n^{-1/2}), \qquad (10)$$

where $\hat P_1$ is the same polynomial as $P_1$ but with the cumulants $\kappa_\nu$ replaced by the conditional cumulants $\hat\kappa_\nu$ of $\hat\beta^*$ given the $e_i$'s. A proof of (10) can be obtained similarly to that of Theorem 3.3 of Qumsiyeh (1994), in the same way that (8) was derived by first arriving at (9). There are two differences between our bootstrap version $W_n^*$ of $W_n$ and Qumsiyeh's bootstrap statistic. The first is between using the null model and the full model residuals in estimating the standard errors of the bootstrap statistics. This difference leads to a negligible error of $O_p(n^{-1})$ under $H_0$. The second difference is that we resample the $\varepsilon_i^*$ from the null model residuals $e_i$ rather than from the full model residuals $\hat\varepsilon_i$. However, this difference contributes only a negligible error, and the following continues to hold when the $\varepsilon_i^*$ are sampled from the $e_i$:

$$\sup_{E\in\mathcal{B}} \Big|P^*(W_n^* \in E) - \int_E \big[1 + n^{-1/2}\hat P_1(x)\big]\psi_{k_0}(x)\,dx\Big| = o(n^{-1/2})$$

a.s. as $n \to \infty$, since it is readily seen that Lemma 3.1 of Qumsiyeh (1994) on Cramér's condition for the empirical distribution function is still valid when the $\varepsilon_i^*$ are sampled from the $e_i$ instead of the $\hat\varepsilon_i$. Expression (10) then follows. Using the fact that $\hat\kappa_\nu \to \kappa_\nu$ a.s. under $H_0$ and combining (8) and (10) yields (7), thus completing the proof.

Another important consideration is how the bootstrap test behaves when the alternative hypothesis is true, in which case the main concern is power. If the alternative holds, the empirical distribution function of $Y_1 - \bar Y, \ldots, Y_n - \bar Y$ is not a consistent estimator of the underlying error distribution. Nonetheless, under mild conditions, the bootstrap distribution $P^*(T_n^*(k_0) \le t)$ is a consistent estimator of the null distribution of the test statistic, as seen in the following theorem. The consistency of $P^*$ is enough to ensure that our bootstrap test has desirable properties, such as consistency against a large class of alternatives.

Theorem 2. Under model (1), suppose that $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. with finite fourth moments and that $r$ is piecewise continuous on $[0, 1]$. Furthermore, let $Z_1, \ldots, Z_{k_0}$ be i.i.d. standard normal random variables and define $T(k_0) = \max_{1\le k\le k_0} k^{-1}\sum_{j=1}^k Z_j^2$. Then

$$\sup_t \big|P^*(T_n^*(k_0) \le t) - P(T(k_0) \le t)\big|$$

converges in probability to 0 as $n \to \infty$. In addition, if $\int_0^1 r(x)\cos(\pi jx)\,dx \ne 0$ for some $j \le k_0$, then the power of the bootstrap test tends to 1 as $n \to \infty$.

Proof. Let $G(t) = P(T(k_0) \le t)$ and define $\tilde T_n^*(k_0) = (\hat\sigma^{*2}/\hat\sigma^2)\,T_n^*(k_0)$. Then for any positive $\delta$, we have

$$\big|P^*(T_n^*(k_0) \le t) - G(t)\big| \le P^*\big(|\hat\sigma^{*2}/\hat\sigma^2 - 1| > \delta\big) + \big|P^*\big(\tilde T_n^*(k_0) \le t\,\hat\sigma^{*2}/\hat\sigma^2,\ |\hat\sigma^{*2}/\hat\sigma^2 - 1| \le \delta\big) - G(t)\big|. \qquad (11)$$
Using the moment conditions on $\varepsilon_1$ and the piecewise continuity of $r$, it is straightforward to show that $P^*(|\hat\sigma^{*2}/\hat\sigma^2 - 1| > \delta)$ converges in probability to 0. It is enough, then, to investigate the second term on the right-hand side of inequality (11). That term is bounded by $\max\{|p_{1n}(t) - G(t)|,\ |p_{2n}(t) - G(t)|\}$, where

$$p_{1n}(t) = P^*\big(\tilde T_n^*(k_0) \le (1 + \delta)t,\ |\hat\sigma^{*2}/\hat\sigma^2 - 1| \le \delta\big)$$

and $p_{2n}(t)$ is defined exactly the same but with $(1 + \delta)t$ replaced by $(1 - \delta)t$. We consider only $|p_{1n}(t) - G(t)|$, since $|p_{2n}(t) - G(t)|$ is handled in exactly the same way. It is elementary to show that

$$|p_{1n}(t) - G(t)| \le \big|P^*\big(\tilde T_n^*(k_0) \le (1 + \delta)t\big) - G\big((1 + \delta)t\big)\big| + \big|G\big((1 + \delta)t\big) - G(t)\big| + 2P^*\big(|\hat\sigma^{*2}/\hat\sigma^2 - 1| > \delta\big).$$

The random variable $T(k_0)$ has a density $g$ such that $\sup_{t>0} t\,g(t) \le C$ for some constant $C$, which implies that $\sup_{t>0} |G((1 + \delta)t) - G(t)| \le C\delta$. The last term can be made arbitrarily small by choosing $\delta$ small enough. The only term left to deal with is $|P^*(\tilde T_n^*(k_0) \le (1 + \delta)t) - G((1 + \delta)t)|$, for which we use a Berry-Esséen type result in Bhattacharya and Ranga Rao (1976). An application of their Theorem 13.3 bounds $\sup_x |P^*(\tilde T_n^*(k_0) \le x) - G(x)|$ by a term of order $n^{-1/2}$ that depends on the data only through $a(k_0)$, a constant depending only on $k_0$, and the standardized fourth moment $n^{-1}\sum_{i=1}^n (Y_i - \bar Y)^4/\hat\sigma^4$. The result now follows from the piecewise continuity of $r$ and the fact that $\varepsilon_1$ has a finite fourth moment.

To prove the consistency of the test, note that the first part of the theorem establishes that the bootstrap percentiles consistently estimate those of $T(k_0)$. It is enough, then, to argue that $T_n(k_0)$ tends to infinity in probability as $n \to \infty$. By assumption, there exists $J \le k_0$ such that

$$\phi_J \stackrel{def}{=} \int_0^1 r(x)\cos(\pi Jx)\,dx \ne 0.$$

We have $T_n(k_0) \ge 2n\hat\phi_J^2/(J\hat\sigma^2)$, from which the result follows since $\hat\phi_J$ is consistent for $\phi_J$ and $\hat\sigma^2$ is consistent for $\sigma^2 + \int_0^1 (r(x) - \bar r)^2\,dx$, where $\bar r = \int_0^1 r(x)\,dx$.

4. HETEROSCEDASTICITY

In practice the assumption of equal error variances is sometimes violated. Under such circumstances, inferences based on homoscedasticity may be invalid regardless of whether parametric or nonparametric methods are used. A reasonable model for heteroscedasticity is to assume that the error variances follow a smooth function of some known variable. Here we assume that the error variance is a function of the predictor $x$. In
Section 4.1 we consider how heteroscedasticity affects the large-sample distribution of $T_n$ and introduce another statistic that seems better suited for heteroscedastic data. In Section 4.2 we propose a wild bootstrap algorithm for approximating the distribution of statistics in the presence of heteroscedasticity.

4.1 Large Sample Distribution of Statistics

Consider the model

$$Y_i = r(x_i) + \sigma(x_i)\eta_i, \qquad i = 1, \ldots, n, \qquad (12)$$

where $E(\eta_i) = 0$, $\mathrm{Var}(\eta_i) = 1$, $i = 1, \ldots, n$, and $\sigma(\cdot)$ is some positive function. The following theorem provides the large-sample distribution of $T_n$ under model (12) when $r \equiv C$. The proof of this result is given in the Appendix.

Theorem 3. Assume that $\eta_1, \ldots, \eta_n$ in model (12) are independent with finite fourth moments, and that the error variance function $\sigma^2(x)$ has two continuous derivatives on $[0, 1]$. Then, if $r$ is identical to a constant,

$$T_n \xrightarrow{D} \sup_{m\ge1}\frac{1}{m}\sum_{j=1}^m (1 + c_j)Z_j^2,$$

where

$$c_j = \frac{\int_0^1 \sigma^2(x)\cos(2\pi jx)\,dx}{\int_0^1 \sigma^2(x)\,dx},$$

$Z_1, Z_2, \ldots, Z_m$ are jointly normal for all $m$, $Z_j \sim N(0, 1)$, and

$$\mathrm{Cov}(Z_j, Z_k) = \frac{\int_0^1 \sigma^2(x)\cos(\pi jx)\cos(\pi kx)\,dx}{\big[\int_0^1 \sigma^2(x)\cos^2(\pi jx)\,dx\,\int_0^1 \sigma^2(x)\cos^2(\pi kx)\,dx\big]^{1/2}}.$$

Not surprisingly, the asymptotic distribution of $T_n$ depends on the unknown error variance function. Hence, to put Theorem 3 to practical use, one is faced with the daunting task of estimating $\sigma^2(x)$. Another possible test statistic is

$$T_n^{het} = \max_{1\le k\le n-1}\frac{1}{k}\sum_{j=1}^k \frac{\hat\phi_j^2}{\widehat{\mathrm{Var}}(\hat\phi_j)}.$$

On the surface $T_n^{het}$ seems better suited for heteroscedasticity than does $T_n$, since each $\hat\phi_j$ is correctly standardized. Note that

$$\mathrm{Var}(\hat\phi_j) = n^{-2}\sum_{i=1}^n \sigma^2(x_i)\cos^2(\pi jx_i), \qquad (13)$$

which is asymptotic to $n^{-1}\int_0^1 \sigma^2(x)\cos^2(\pi jx)\,dx$ under the conditions of Theorem 3. In the following theorem we obtain a large-sample distribution for $T_n^{het}$.
Theorem 4. Assume that the conditions of Theorem 3 hold, and that $\widehat{\mathrm{Var}}(\hat\phi_j)$ is uniformly consistent for $\mathrm{Var}(\hat\phi_j)$ in the sense that

$$\max_{1\le j\le n-1}\Big|\frac{\widehat{\mathrm{Var}}(\hat\phi_j)}{\mathrm{Var}(\hat\phi_j)} - 1\Big| \xrightarrow{P} 0. \qquad (14)$$

Then, if $r$ is identical to a constant,

$$T_n^{het} \xrightarrow{D} \sup_{m\ge1}\frac{1}{m}\sum_{j=1}^m Z_j^2,$$

where $Z_1, Z_2, \ldots$ have the same distribution as in Theorem 3.

Estimation of $\mathrm{Var}(\hat\phi_j)$ requires estimation of the variance function $\sigma^2(x)$, which may be done either parametrically or nonparametrically. The latter approach can be achieved by smoothing the squared residuals $e_1^2, \ldots, e_n^2$. Another, simpler, possibility is to use the estimator

$$\widehat{\mathrm{Var}}(\hat\phi_j) = n^{-2}\sum_{i=1}^n e_i^2\cos^2(\pi jx_i).$$

In the case of a parametric model for $\sigma^2(x)$, if weakly consistent estimators of the model parameters are available, then one may show that the uniform consistency condition in Theorem 4 is achieved by using a version of $\widehat{\mathrm{Var}}(\hat\phi_j)$ that replaces each $\sigma^2(x_i)$ in (13) by its parametric estimator.

It is of interest to investigate how different the limit distributions of $T_n$ and $T_n^{het}$ are from each other and from the limit distribution, call it $F_{OS}$, in the case of homoscedasticity. One observation is immediate: the limit distribution of $T_n$ is more complicated than that of $T_n^{het}$, in that it depends on the variance function through the constants $c_1, c_2, \ldots$ as well as through the covariance function of the process $Z_1, Z_2, \ldots$. For this reason, we conjecture that the test based on $T_n^{het}$ will generally be the more robust of the two to heteroscedasticity when $F_{OS}$ is used to obtain critical values for each test. It is interesting to note, however, that the limiting distributions of the two statistics are the same if the variance function is linear in $x$, since in that case $c_j = 0$ for all $j$.

To investigate the question of robustness, we shall measure how far the limit distribution of $T_n^{het}$ is from $F_{OS}$ in the case of a quadratic variance function, i.e., $\sigma^2(x) = \beta_0 + \beta_1 x + \beta_2 x^2$.
Here, for $j \ne k$ we have

$$\mathrm{Cov}(Z_j, Z_k) = \frac{A_{jk}}{\sqrt{B_j B_k}},$$

where

$$A_{jk} = \begin{cases} \beta_2\Big[\dfrac{1}{\pi^2(j-k)^2} + \dfrac{1}{\pi^2(j+k)^2}\Big], & j + k \text{ even}, \\[1ex] -(\beta_1 + \beta_2)\Big[\dfrac{1}{\pi^2(j-k)^2} + \dfrac{1}{\pi^2(j+k)^2}\Big], & j + k \text{ odd}, \end{cases}$$

and

$$B_j = \frac{2\beta_0 + \beta_1}{4} + \frac{\beta_2}{6} + \frac{\beta_2}{(2\pi j)^2}.$$
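The constants $c_j$ of Theorem 3 are easy to evaluate numerically for any candidate variance function, which makes it simple to gauge how far a given $\sigma^2(\cdot)$ moves the limit distribution from the homoscedastic one. A sketch using the midpoint rule (the function name and grid size are ours):

```python
import math

def c_j(sigma2, j, ngrid=20000):
    # c_j = (integral of sigma^2(x) cos(2 pi j x) dx over [0,1])
    #       / (integral of sigma^2(x) dx over [0,1])     (Theorem 3),
    # approximated by the midpoint rule on ngrid subintervals.
    num = den = 0.0
    for i in range(ngrid):
        x = (i + 0.5) / ngrid
        s = sigma2(x)
        num += s * math.cos(2.0 * math.pi * j * x)
        den += s
    return num / den
```

For any variance function linear in $x$ the numerator integral is zero, so $c_j = 0$ for every $j$; for the pure quadratic $\sigma^2(x) = x^2$ one gets $c_1 = 3/(2\pi^2) \approx 0.152$.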
By letting $\beta_1 = \beta_2 = 0$ one minimizes $|\mathrm{Cov}(Z_j, Z_k)|$, and by letting $\beta_0 = \beta_1 = 0$ with $\beta_2 > 0$ one maximizes it. In the latter case the value of $\mathrm{Cov}(Z_j, Z_k)$ for $j + k$ even is

$$\Big[\frac{1}{\pi^2(j-k)^2} + \frac{1}{\pi^2(j+k)^2}\Big]\Big/\sqrt{\Big(\frac16 + \frac{1}{(2\pi j)^2}\Big)\Big(\frac16 + \frac{1}{(2\pi k)^2}\Big)}. \qquad (15)$$

We now use simulation to investigate the limit distribution of $T_n^{het}$. Let $Z_1, Z_2, \ldots, Z_K$ have a multivariate normal distribution such that $E(Z_i) = 0$ and $\mathrm{Var}(Z_i) = 1$, $i = 1, \ldots, K$, and $\mathrm{Cov}(Z_j, Z_k)$ ($j \ne k$) given by (15). Ten thousand replications of $T_K^{het} = \max_{1\le k\le K} k^{-1}\sum_{j=1}^k Z_j^2$ were generated for each of several values of $K$ to approximate the 95th percentile of the limiting distribution of $T_n^{het}$. The results in Table 1 indicate that the departure of this percentile from that of $F_{OS}$, 4.793, is insubstantial.

Table 1. Approximations to the 95th percentile of $T_K^{het}$ for a quadratic variance function.

The calculations above lead us to conjecture that the correlation among the $Z_j$'s has very little impact on the limiting distribution of $T_n^{het}$. We can conclusively show that the impact is small when $K = 2$. Let $T(\rho)$ denote the random variable $\max\{Z_1^2, (Z_1^2 + Z_2^2)/2\}$ when the correlation between $Z_1$ and $Z_2$ is $\rho$. Using numerical integration, we may compute $P(T(\rho) \le t)$ for any given $t$ and $\rho$. The 95th and 99th percentiles of $T(0)$ are 4.077 and 6.736, respectively (Hart (1997), page 179). For various $\rho$, we have computed $P(T(\rho) > t)$ for $t$ equal to these percentiles. The results are shown in Figure 1 and indicate that the difference $P(T(0) > t) - P(T(\rho) > t)$ tends to be quite small in absolute value for $\rho \le .8$. For very large $\rho$, the difference is slightly larger, but positive, implying that, at least for $K = 2$, a false assumption of homoscedasticity would lead to a conservative test.
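The $K = 2$ calculation can also be checked by direct Monte Carlo. A sketch (the replication count and seed are arbitrary choices of ours; 6.736 is the 99th percentile of $T(0)$ quoted above):

```python
import random

def tail_prob(rho, t, reps=200000, seed=2):
    # Estimate P(T(rho) > t) for T(rho) = max(Z1^2, (Z1^2 + Z2^2)/2),
    # where (Z1, Z2) is bivariate standard normal with correlation rho.
    rng = random.Random(seed)
    a = (1.0 - rho * rho) ** 0.5
    hits = 0
    for _ in range(reps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + a * rng.gauss(0.0, 1.0)
        if max(z1 * z1, 0.5 * (z1 * z1 + z2 * z2)) > t:
            hits += 1
    return hits / reps
```

With `rho = 0` and `t = 6.736` the returned tail probability is close to .01, as it should be for the 99th percentile.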
Figure 1. Tail probability $P(T(\rho) > t)$ as a function of the correlation $\rho$, for $t$ equal to selected percentiles of $T(0)$, including $t = 4.077$.

4.2 Wild Bootstrap Algorithm

Ideally we desire a method that correctly adjusts a statistic's critical values to account for heteroscedasticity. The wild bootstrap is such a method. It was proposed by Wu (1986), studied further by Liu (1988), and considered in nonparametric regression and given its name by Härdle and Mammen (1993). This procedure is called the wild bootstrap since $n$ different distributions are estimated from $n$ residuals. The wild bootstrap algorithm we propose is as follows. Let $e_i = Y_i - \bar Y$, $i = 1, \ldots, n$, and define bootstrap data $Y_1^*, \ldots, Y_n^*$ by

$$Y_i^* = \bar Y + e_i\eta_i, \qquad i = 1, \ldots, n,$$

where $\eta_1, \ldots, \eta_n$ are a random sample from an arbitrary distribution having first moment zero and second and third moments both equal to 1. The null distribution of a statistic $S(Y_1, \ldots, Y_n)$ is approximated by using Monte Carlo methods to generate many independent copies of $S(Y_1^*, \ldots, Y_n^*)$. A popular and simple choice for the distribution of $\eta_i$ is the two-point distribution that assigns probabilities $(5 + \sqrt5)/10$ and $(5 - \sqrt5)/10$ to $(1 - \sqrt5)/2$ and $(1 + \sqrt5)/2$, respectively. This distribution and two continuous possibilities were proposed by Mammen (1993). The intuitive motivation for the wild bootstrap is that, at least in many cases, it ensures that the first three moments of the bootstrap distribution asymptotically match those of the underlying null distribution. Mammen (1993) provides some theoretical backing for the wild bootstrap in the setting of a linear model whose number of parameters increases sufficiently slowly with the sample size. These results would undoubtedly
be useful in a theoretical study of the wild bootstrap in our current setting. However, we defer such a study to future research, and restrict attention in the rest of the paper to simulations and a real-data example.

5. SIMULATION STUDY

The validity and power of our tests are studied in this section via simulation. Validity is investigated using a nominal significance level of .05. To detect level differences of 0.01 or larger, 2000 replications were conducted for each simulation study. The advice of Efron and Tibshirani (1993) is followed by using 1000 bootstrap samples on each replication, since the nominal test level is .05. Finally, sample sizes of 15, 30 and 80 were used.

5.1 I.I.D. cases

We first consider the simple linear regression model $Y_i = \beta x_i + \varepsilon_i$, where $x_i = (i - 0.5)/n$, $i = 1, \ldots, n$, and $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. random variables with mean 0 and variance $\sigma^2$. The variance $\sigma^2$ was taken to be 0.01, and $\beta = 0$, 0.03, 0.09, 0.18 and 0.4. Four different choices for the error distribution were considered: Gaussian, exponential, $t$ with 4 degrees of freedom, and uniform. Each of these distributions was shifted and rescaled as needed to yield a mean of 0 and variance 0.01. We considered a bootstrap test, a wild bootstrap test, an asymptotic test and a parametric $t$-test. Both types of bootstrap test and the asymptotic test use the test statistic $T_n$ defined in (5). Critical values of the asymptotic test are percentiles of the distribution $F_{OS}$ (Hart (1997), p. 178). Critical values of the bootstrap and wild bootstrap tests were obtained as described in Sections 3 and 4.2, respectively. For the distribution of $\eta_i$ in the wild bootstrap, we used the two-point distribution defined in Section 4.2. The parametric $t$-test is simply the classical $t$-test of $H_0: \beta = 0$ based on the assumption of normally distributed errors. Finally, two additional tests were considered to investigate how the power of the bootstrap tests is affected (if at all) by using the wrong error distribution when the alternative hypothesis is true.
For a data set $Y_1, \ldots, Y_n$, define

$$E_i = Y_i - \hat\beta_0 - \hat\beta_1 x_i, \qquad i = 1, \ldots, n,$$

where $\hat\beta_0$ and $\hat\beta_1$ are the least squares estimates of intercept and slope, respectively. One may carry out bootstrap and wild bootstrap tests based on these residuals in exactly the same way tests are done with the residuals $e_i = Y_i - \bar Y$, $i = 1, \ldots, n$. We will refer to the two types of tests as (wild) bootstrap $e$ and (wild) bootstrap $E$ tests. So, a total of six tests were carried out for each data set generated.

We first discuss the results for i.i.d. Gaussian errors (Figure 2). To save space we only show what happened for $n = 15$ and 30, although our comments apply to $n = 80$ as well. Both bootstrap $e$ and bootstrap $E$ are satisfactory and very similar in terms of level and power, despite the fact that the residuals $E_i$ use knowledge of the regression function. The wild bootstrap tests perform almost as well as the bootstrap
tests except when $n = 15$, this in spite of the true model being homoscedastic. Interestingly, the empirical levels of wild bootstrap $e$ tend to the nominal level from below as sample size increases, while those of wild bootstrap $E$ have the opposite behavior. Use of the asymptotic distribution, $F_{OS}$, provides satisfactory results when $n$ is large. However, the empirical level of the asymptotic test is below 0.05 when $n = 15$. Consequently, this test has lower power than do the bootstrap tests, and thus seems less desirable when $n$ is small. As $n$ increases, the five nonparametric tests perform similarly. The parametric $t$-test provides correct test levels and, as expected, is the most powerful test among the six.

Figure 2. Power comparison for the six tests with i.i.d. Gaussian errors (two panels, $n = 15$ and $n = 30$; rejection rate versus slope). The flat line here and in subsequent Section 5 figures indicates the nominal level of .05.

The performance of each test was practically invariant to error distribution type, and hence we do not show any results for the nonnormal cases. So far all wild bootstrap results have used the two-point distribution of Section 4.2 for $\eta_i$. We also tried two other schemes for generating i.i.d. $\eta_i$'s, both of which were proposed by Mammen (1993): $\eta_i^{(2)} = V_i/\sqrt2 + (V_i^2 - 1)/2$, where $V_i \sim N(0, 1)$, and a scheme $\eta_i^{(3)}$ defined in terms of constants $\delta_1$ and $\delta_2$ and independent $N(0, 1)$ random variables $V_i$ and $U_i$, again chosen so that the first three moments are 0, 1 and 1. For large $n$, the three wild bootstrap methods were not much different. However, the two-point distribution produced more consistent results across the different error distributions, in that test levels did not exceed the
nominal level. Use of $\eta_i^{(3)}$ leads to higher power than the other two methods in small-sample cases, but has test levels that are slightly high. It is not surprising that each wild bootstrap test performs relatively poorly when $n = 15$, as the wild bootstrap does not take advantage of the i.i.d. error structure.

We also considered the regression functions $r(x) = b\sin(\pi x)$ and $r(x) = b\sin(3\pi x)$ with the errors i.i.d. Gaussian. Figures 3 and 4 reveal that the bootstrap test performs better than the asymptotic test for both functions when $n = 15$. However, there is not much difference in power among the three tests at $n = 30$.

Figure 3. Rejection rates of tests with $r(x) = b\sin(\pi x)$ and i.i.d. Gaussian errors (two panels, $n = 15$ and $n = 30$; rejection rate versus $b$).

Figure 4. Rejection rates of tests with $r(x) = b\sin(3\pi x)$ and i.i.d. Gaussian errors (two panels, $n = 15$ and $n = 30$; rejection rate versus $b$).
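The wild bootstrap resampling step of Section 4.2, with the two-point distribution for $\eta_i$, can be sketched as follows (the function name is ours):

```python
import math
import random

def wild_bootstrap_sample(y, rng):
    # One wild bootstrap data set: Y*_i = Ybar + e_i * eta_i, where
    # e_i = Y_i - Ybar and eta_i has the two-point distribution of Section 4.2:
    #   eta = (1 - sqrt5)/2  with probability (5 + sqrt5)/10,
    #   eta = (1 + sqrt5)/2  with probability (5 - sqrt5)/10,
    # so that E(eta) = 0 and E(eta^2) = E(eta^3) = 1.
    s5 = math.sqrt(5.0)
    p = (5.0 + s5) / 10.0
    n = len(y)
    ybar = sum(y) / n
    out = []
    for v in y:
        eta = (1.0 - s5) / 2.0 if rng.random() < p else (1.0 + s5) / 2.0
        out.append(ybar + (v - ybar) * eta)
    return out
```

Because each residual $e_i$ stays attached to its own design point, nonconstant error variance is preserved in the resamples; constant data are reproduced exactly, since all residuals are then zero.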
5.2 Heteroscedastic cases

We now consider the model $Y_i = \beta x_i + \sigma(x_i)\eta_i$, $i = 1, \ldots, n$, where $\eta_1, \ldots, \eta_n$ are i.i.d. $N(0, 1)$ and the variance function is the hump-shaped $\sigma^2(x) \propto x - x^2$. The same values of $\beta$ as in Section 5.1 were used, with the scale chosen so that $\int_0^1 \sigma^2(x)\,dx = 0.01$, making the setting more or less comparable to that in Section 5.1. The same six tests as before are used. The simulation shows (Figures 5 and 6) that the level of the parametric $t$-test is much higher than the nominal level. The test based on $T_n$ and the i.i.d. limit distribution $F_{OS}$ also has excessive empirical levels. Bootstrap $e$ and bootstrap $E$ fail to maintain the correct test levels, as expected. The only valid tests seem to be the wild bootstrap tests, although their empirical levels tend to be low, leading to low power for $n = 15$.

Figure 5. Power comparison of the six tests in a heteroscedastic case with hump-shaped $\sigma^2(x) \propto x - x^2$, using test statistic $T_n$ (two panels, $n = 15$ and $n = 30$; rejection rate versus slope).

As argued in Section 4.1, $T_n^{het}$ may have an advantage over $T_n$ in heteroscedastic cases. Evidence to this effect is seen in the simulation. When both $T_n^{het}$ and $T_n$ are compared to the large-sample homoscedastic critical values, the $T_n^{het}$-based test has better level accuracy. (Compare Figures 5 and 6.) Likewise, level accuracy of the wild bootstrap $e$ and $E$ tests is better for $T_n^{het}$ than for $T_n$. In other words, $T_n^{het}$ seems to be the more robust statistic to departures from homoscedasticity.
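The statistic $T_n^{het}$, with the simple residual-based estimator of $\mathrm{Var}(\hat\phi_j)$ from Section 4.1, can be sketched as follows (names are ours):

```python
import math

def t_het(y, kmax=None):
    # T_het = max_{1<=k<=kmax} (1/k) sum_{j=1}^k phi_hat_j^2 / VarHat(phi_hat_j),
    # with VarHat(phi_hat_j) = n^{-2} sum_i e_i^2 cos^2(pi j x_i) and
    # e_i = Y_i - Ybar (the simple estimator of Section 4.1).
    n = len(y)
    if kmax is None:
        kmax = n - 1
    ybar = sum(y) / n
    e2 = [(v - ybar) ** 2 for v in y]
    best = running = 0.0
    for j in range(1, kmax + 1):
        cos_ji = [math.cos(math.pi * j * (i + 0.5) / n) for i in range(n)]
        phi = sum(yi * c for yi, c in zip(y, cos_ji)) / n
        var = sum(ei * c * c for ei, c in zip(e2, cos_ji)) / n ** 2
        running += phi * phi / var
        best = max(best, running / j)
    return best
```

Note that $T_n^{het}$ is invariant under linear transformations $Y_i \mapsto aY_i + b$: each $\hat\phi_j$ ($j \ge 1$) scales by $a$ and each $\widehat{\mathrm{Var}}(\hat\phi_j)$ by $a^2$.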
Figure 6. Power comparison of the six tests in a heteroscedastic case with hump-shaped $\sigma^2(x) \propto x - x^2$, using test statistic $T_n^{het}$ (two panels, $n = 15$ and $n = 30$; rejection rate versus slope).

We also tried a U-shaped variance function; see Figures 7 and 8. Here, the levels of the bootstrap and asymptotic tests based on $T_n$, and of the parametric $t$-test, are significantly lower than the nominal level. The hump- and U-shaped variance functions affect the test levels in opposite directions. However, tests based on $T_n^{het}$ still produce results with more accurate test levels, showing its robustness again to heteroscedasticity. The U-shaped variance function does not seriously affect power relative to the homoscedastic case. The bootstrap and wild bootstrap tests based on $T_n^{het}$ tend to have higher power in general, which is undoubtedly a consequence of their superior level properties.

Simulations for the regression function $r(x) = b\sin(3\pi x)$ with both hump- and U-shaped error variance functions were also performed. Of the bootstrap, wild bootstrap and homoscedastic asymptotic tests based on $T_n$, only the wild bootstrap test maintained the correct level. The tests based on $T_n^{het}$ all had correct levels, but their power was very low when $n = 15$. We did not see this phenomenon in the straight-line model. A possible explanation for the low power of $T_n^{het}$ is that the large variation of $\widehat{\mathrm{Var}}(\hat\phi_j)$ hinders the detection of a regression model with more local curvature. This problem could perhaps be alleviated by smoothing the squared residuals in order to stabilize the estimators of $\mathrm{Var}(\hat\phi_j)$.
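Smoothing the squared residuals is straightforward. A minimal sketch using a Nadaraya-Watson (kernel-weighted average) smoother as a simple stand-in for the local linear smoother used in Section 5.2 (the Gaussian kernel, bandwidth and names are ours):

```python
import math

def smooth_sq_residuals(y, h=0.15):
    # Estimate sigma^2(x_i) by kernel smoothing of the squared residuals
    # e_i^2 = (Y_i - Ybar)^2 over the design x_i = (i - 0.5)/n,
    # using a Gaussian kernel with bandwidth h.
    n = len(y)
    ybar = sum(y) / n
    e2 = [(v - ybar) ** 2 for v in y]
    x = [(i + 0.5) / n for i in range(n)]
    out = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        sw = sum(w)
        out.append(sum(wi * ei for wi, ei in zip(w, e2)) / sw)
    return out
```

The smoothed values would replace the raw $e_i^2$ in the estimator of $\mathrm{Var}(\hat\phi_j)$, stabilizing it when the regression function has substantial local curvature.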
Figure 7. Power comparison of the six tests in a heteroscedastic case with a ∧-shaped variance function σ²(x), using test statistic T_n.

Figure 8. Power comparison of the six tests in a heteroscedastic case with a ∧-shaped variance function σ²(x), using test statistic T_n^het.

Using a local linear smoother to obtain variance estimates, we improved the power of the homoscedastic large-sample test. For the wild bootstrap test, we need to smooth the squared residuals for the test statistic and all bootstrap statistics. If a data-driven bandwidth is used for the test statistic, then ideally we would
choose a different bandwidth in each wild bootstrap sample. To avoid time-consuming computations, the same bandwidth was used (in a given data set Y_1, …, Y_n) for T_n^het and all wild bootstrap statistics. The one-sided cross-validation procedure of Hart and Yi (1998) was used to select the bandwidth in T_n^het. This implementation led to a power improvement for the wild bootstrap test. In the straight-line model (both i.i.d. and heteroscedastic cases), the tests based on T_n and T_n^het had comparable power at the smaller sample size. No improvement in power was found by applying the smoothing technique to T_n^het in the straight-line model, which tends to validate our conjecture that the large variation of the estimators of Var(φ̂_j) is a hindrance only when the regression model has sufficient curvature.

6. EXAMPLE

People with diabetes generally have high blood glucose levels. Uncontrolled high blood glucose can lead to long-term complications such as retinopathy, neuropathy, nephropathy, and amputation. The American Diabetes Association (2000) recommends keeping the pre-meal blood glucose level between 80 and 120 mg/dl (4.4 to 6.7 mmol/l), or hemoglobin A1c (HbA1c) below 7%. Daily blood glucose levels can easily be monitored by diabetics themselves. However, the measurements are short-term and relatively unstable, often affected by emotion, meal intake, etc. On the other hand, HbA1c measures long-term glycemic control, but requires laboratory testing. In medical practice, daily morning fasting blood glucose (FBG) has been targeted for day-to-day glycemic control and is considered correlated with HbA1c. Other opinion has expressed the importance of targeting postprandial blood glucose (usually two hours after meals, 2PPBG) to improve long-term glycemic control. The example we consider here is from a clinical trial studying glycemic control in diabetics. The order selection test was used to study the relationships between HbA1c and the blood glucose measurements (FBG or 2PPBG). Scatter plots of HbA1c versus FBG and 2PPBG are shown in Figure 9.
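The smooth curves overlaid on Figure 9, discussed below, are Fourier series estimates whose truncation point is chosen by the order selection criterion. A minimal sketch of such an estimator follows; the criterion's normalization and the difference-based variance estimate are illustrative assumptions, and the regression on covariate ranks used for the unevenly spaced data is omitted.

```python
import numpy as np

def os_fourier_fit(y, x, xgrid):
    """Cosine series estimate with truncation point khat maximizing the
    order selection criterion (1/k) sum_{j<=k} 2 n phihat_j^2 / sigma2hat
    (a sketch; normalization assumed)."""
    n = len(y)
    phihat = np.array([np.mean(y * np.cos(j * np.pi * x)) for j in range(1, n)])
    # difference-based variance estimate (assumes x is sorted)
    sigma2hat = np.sum(np.diff(y) ** 2) / (2 * (n - 1))
    crit = np.cumsum(2 * n * phihat ** 2 / sigma2hat) / np.arange(1, n)
    khat = int(np.argmax(crit)) + 1
    fit = np.full(len(xgrid), y.mean())      # constant term of the series
    for j in range(1, khat + 1):
        fit += 2 * phihat[j - 1] * np.cos(j * np.pi * xgrid)
    return khat, fit

rng = np.random.default_rng(4)
n = 100
x = (np.arange(1, n + 1) - 0.5) / n
y = np.cos(np.pi * x) + 0.25 * rng.standard_normal(n)
khat, fit = os_fourier_fit(y, x, x)
```

Once the no-effect hypothesis is rejected, this estimate comes at essentially no extra cost, since the φ̂_j and the criterion were already computed for the test.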
The variation of HbA1c appears to decrease approximately linearly as FBG increases, whereas the variation of HbA1c seems to be more or less constant in 2PPBG. Since the covariate values are not evenly spaced, we carried out the no-effect tests by regressing HbA1c on the ranks of each covariate (as discussed in Section 2). Table 2 provides strong evidence against the hypothesis that FBG has no effect on HbA1c. P-values for the large-sample test (using F_OS), the bootstrap test and the wild bootstrap test are less than .005 and very similar to each other, for both T_n and T_n^het. The parametric t-test assuming a straight-line relationship between HbA1c and FBG results in a similarly small P-value for testing H_0: β = 0. As for the relationship between 2PPBG and HbA1c, the P-values are even more significant, less than 0.0001, for all the tests. These data support the importance of targeting either the morning fasting blood glucose or the postprandial blood glucose to improve the long-term glycemic control measured by HbA1c. However, the postprandial glucose seems to
Figure 9. HbA1c versus morning fasting and two-hour postprandial blood glucose measurements (mmol/l).

Table 2. P-values for tests of the hypotheses that FBG and 2PPBG have no effect on HbA1c. (Rows: F_OS, Bootstrap, Wild Bootstrap; columns: T_n and T_n^het under each of FBG and 2PPBG.)

have a greater effect on HbA1c. As we mentioned in Section 2, an attractive feature of the order selection test is that once the null hypothesis is rejected, an immediate smooth estimate of the regression function is at hand. The smooth curves in Figure 9 are Fourier series estimates whose truncation points maximize the order selection criterion.

7. CONCLUSION

In this research we have implemented bootstrap methods to approximate the sampling distribution of order selection test statistics. For homoscedastic errors, the bootstrap is a better method than the large-sample test, especially when the sample size is small. The wild bootstrap, meant for models with heteroscedastic errors,
also has satisfactory performance in i.i.d. cases when the sample size is not small. This is encouraging since in practice there is uncertainty about heteroscedasticity.

As for the residuals used in the bootstrap algorithm, we found two reasons why the residuals e_i = Y_i − Ȳ, i = 1, …, n, are a good choice for the no-effect hypothesis. First, computing the e_i's is straightforward since no estimate of a regression model is required. Secondly, when the null hypothesis is true, we are sampling directly from the appropriate empirical distribution by resampling from e_1, …, e_n. By the same token, resampling from ε̂_i = Y_i − Ŷ_i would be appropriate when testing the adequacy of the fitted values Ŷ_1, …, Ŷ_n.

The order selection tests based on either the i.i.d. asymptotic distribution or the bootstrap are at best approximately valid in heteroscedastic cases. Wild bootstrapping, however, leads to tests that are asymptotically valid. A new test statistic T_n^het has been proposed and studied. In contrast to the statistic T_n, T_n^het explicitly estimates the unknown error variance function. Tests based on T_n^het have power comparable to ones based on T_n when the regression function is a straight line. An advantage of T_n^het is that it is more robust than T_n to heteroscedasticity when each statistic is compared with critical values based on the assumption of homoscedasticity. A disadvantage of T_n^het is that its larger variation (under heteroscedasticity) can hinder its ability to detect regression functions with substantial curvature, such as trigonometric functions. Fortunately, this disadvantage can be repaired by smoothing squared residuals.

What is the best test? There is no simple answer to this question. None of the tests is best for every situation. However, if the sample size is not too small (n ≥ 30), the wild bootstrap test seems a good choice.
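The two resampling schemes compared in this section can be sketched side by side. Under the no-effect null the residuals are e_i = Y_i − Ȳ; the ordinary bootstrap resamples them with replacement, while the wild bootstrap multiplies each e_i by an independent weight with mean 0 and variance 1, so that the heteroscedastic pattern of the e_i² is preserved. Mammen's two-point weight distribution used below is one standard choice and an assumption here, not the paper's stated one.

```python
import numpy as np

def resample_null(y, rng, wild=False):
    """One bootstrap sample under the no-effect null hypothesis (a sketch).

    Residuals e_i = Y_i - Ybar require no fitted regression model, and
    resampling them imposes the null on the bootstrap world.
      wild=False: ordinary bootstrap, Y*_i = Ybar + e*_i, with the e*_i
                  drawn i.i.d. with replacement from e_1, ..., e_n.
      wild=True:  wild bootstrap, Y*_i = Ybar + e_i * v_i, with i.i.d.
                  weights v_i such that E(v_i) = 0 and E(v_i^2) = 1
                  (Mammen's two-point weights; an assumption)."""
    n = len(y)
    e = y - y.mean()
    if wild:
        a, b = -(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2
        p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))
        v = np.where(rng.random(n) < p, a, b)    # E(v) = 0, E(v^2) = 1
        return y.mean() + e * v
    return y.mean() + rng.choice(e, size=n, replace=True)

rng = np.random.default_rng(3)
y = rng.normal(size=40)
ystar = resample_null(y, rng)                 # ordinary bootstrap sample
ywild = resample_null(y, rng, wild=True)      # wild bootstrap sample
```

Recomputing the test statistic on many such samples and comparing it with the statistic for the original data yields the bootstrap P-value.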
It is the only asymptotically valid test (among those considered) in the heteroscedastic case, and it performs comparably to the bootstrap and large-sample tests in i.i.d. cases. In case the sample size is small, the identification of heteroscedasticity will be crucial. The bootstrap test is the best choice for models with i.i.d. errors, whereas the wild bootstrap is better when heteroscedasticity is present. We have seen in the simulation that the test based on the smoothed version of T_n^het and using critical values from F_OS maintains the correct level remarkably well and also has satisfactory power in the heteroscedastic cases. However, further study is needed to determine its robustness under other patterns of heteroscedasticity.

The simulation was done with evenly spaced designs. We briefly discussed in Section 2 two possible solutions for cases with unevenly spaced or random designs. It would be interesting to see which has the best performance. The effects, if any, that a random design has on the null distribution of the order selection test statistic will be of interest. The exploration of proper bootstrap methods in this setting is a topic for future research.

Thus far we have studied the order selection test of no effect in simple regression models. Of interest are extensions to testing the fit of general parametric models and also multiple regression settings (i.e., more than one predictor). Extensions in each of these directions are certainly possible. This can be done by combining
ideas in Aerts et al. (1999) and Aerts et al. (2000) with those in the current paper.

8. APPENDIX

Proof of Theorem 3. Since E(e_i²) = σ_i² + O(n⁻¹) uniformly in i, we can use a law of large numbers to establish that

  σ̂² →_P σ̄² ≡ ∫₀¹ σ²(x) dx.

By Slutsky's Theorem, T_n has the same limit distribution as Ť_n, where

  Ť_n = max_{1≤k≤n−1} (1/k) Σ_{j=1}^k 2n φ̂_j² / σ̄².

We may write

  Ť_n = max_{1≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}²,

where c_{jn} = 2n Var(φ̂_j)/σ̄² and Z_{jn}² = φ̂_j²/Var(φ̂_j). Letting K_n be a sequence that goes to ∞ at a slower rate than n, the proof consists of demonstrating (i) and (ii) below:

(i) P( max_{K_n≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² > t ) → 0;

(ii) P( max_{1≤k<K_n} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² ≤ t ) − P( sup_{k≥1} (1/k) Σ_{j=1}^k c_j Z_j² ≤ t ) → 0.

We first prove (i). It is straightforward to show that (1/k) Σ_{j=1}^k c_j Z_j² → 1 almost surely as k → ∞, and hence in considering P(Ť_n > t) we may take t > 1. Note that

  P( max_{K_n≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² > t ) = P( max_{K_n≤k≤n−1} (1/k) [ Σ_{j=1}^k c_{jn}(Z_{jn}² − 1) + Σ_{j=1}^k c_{jn} ] > t ).

It may be shown that (1/k) Σ_{j=1}^k c_{jn} → 1 as k → ∞ and n → ∞, and hence, for all sufficiently large n, the last probability is at most

  P( max_{K_n≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn}(Z_{jn}² − 1) > (t − 1)/2 ).

Let Q_{jn} = c_{jn}(Z_{jn}² − 1). It suffices to show that for any δ > 0,

  P( max_{K_n≤k≤n−1} (1/k) | Σ_{j=1}^k Q_{jn} | > δ ) → 0.
Let j₁ be the largest integer such that 2^{j₁} ≤ K_n, and let j₂ be the largest integer such that 2^{j₂} ≤ n. For each n and j₁ ≤ j ≤ j₂, define

  ξ_{jn} = max_{2^j ≤ k < 2^{j+1}} | Σ_{r=2^j}^k Q_{rn} |.

If k is such that 2^j ≤ k < 2^{j+1}, then

  (1/k) | Σ_{r=1}^k Q_{rn} | ≤ 2^{−j} ( | Σ_{r=1}^{2^j − 1} Q_{rn} | + ξ_{jn} ).

Thus we have

  { max_{K_n≤k≤n−1} (1/k) | Σ_{r=1}^k Q_{rn} | > δ } ⊂ ∪_{j=j₁}^{j₂} ( { 2^{−j} | Σ_{r=1}^{2^j − 1} Q_{rn} | > δ/2 } ∪ { 2^{−j} ξ_{jn} > δ/2 } ).

Hence, by Markov's inequality, we have

  P( max_{K_n≤k≤n−1} (1/k) | Σ_{r=1}^k Q_{rn} | > δ )
    ≤ Σ_{j=j₁}^{j₂} P( | Σ_{r=1}^{2^j − 1} Q_{rn} | > δ 2^{j−1} ) + Σ_{j=j₁}^{j₂} P( ξ_{jn} > δ 2^{j−1} )
    ≤ Σ_{j=j₁}^{j₂} (4 / (δ² 4^j)) E( Σ_{r=1}^{2^j − 1} Q_{rn} )² + Σ_{j=j₁}^{j₂} P( ξ_{jn} > δ 2^{j−1} ).   (16)

Using the moment conditions and the boundedness of the cosine function, E( Σ_{r=1}^{2^j − 1} Q_{rn} )² = Var( Σ_{r=1}^{2^j − 1} c_{rn} Z_{rn}² ) = O(2^j), and hence the first sum on the right-hand side of inequality (16) is of the order Σ_{j=j₁}^{j₂} 2^{−j}, which tends to 0 as K_n → ∞. Again, by Markov's inequality, we have

  Σ_{j=j₁}^{j₂} P( ξ_{jn} > δ 2^{j−1} ) ≤ Σ_{j=j₁}^{j₂} (4 / (δ² 4^j)) E ξ_{jn}².

We now use a result of Serfling (1970) to deal with E ξ_{jn}². Denote the joint distribution function of the random variables Q_{a+1,n}, …, Q_{a+k,n} by F_{a,k}. There exists a constant D such that E( Σ_{i=a+1}^{a+k} Q_{in} )² ≤ kD. Define g(F_{a,k}) to be the following functional:

  g(F_{a,k}) = kD.
Obviously g(F_{a,k}) + g(F_{a+k,l}) ≤ g(F_{a,k+l}), and

  E( Σ_{i=a+1}^{a+k} Q_{in} )² ≤ g(F_{a,k}).

Applying Theorem A of Serfling (1970), we have

  E ξ_{jn}² ≤ ( log(2 · 2^j) / log 2 )² 2^j D.

Hence

  Σ_{j=j₁}^{j₂} P( ξ_{jn} > δ 2^{j−1} ) ≤ Σ_{j=j₁}^{j₂} (4 / (δ² 4^j)) ( log(2 · 2^j) / log 2 )² 2^j D → 0.

Combining the preceding results,

  P( max_{K_n≤k≤n−1} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² > t ) → 0,

which clearly leads to (i).

Next we show (ii). For any positive integer K, let

  T_n(K) = max_{1≤k≤K} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}²,  T(K) = max_{1≤k≤K} (1/k) Σ_{j=1}^k c_j Z_j²,  T = sup_{k≥1} (1/k) Σ_{j=1}^k c_j Z_j²,

and define the events A_K = { T_n(K) ≤ t } and B_K = { max_{K<k≤K_n} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² ≤ t }. We need to show that for any δ > 0,

  | P( T_n(K_n) ≤ t ) − P( T ≤ t ) | ≤ δ

for all n sufficiently large. Since P( T_n(K_n) ≤ t ) = P( A_K ∩ B_K ), we have

  | P( T_n(K_n) ≤ t ) − P( T ≤ t ) | ≤ | P(A_K) − P(T(K) ≤ t) | + | P(T(K) ≤ t) − P(T ≤ t) | + P(B_K^c).

One may show that |c_{jn} − c_j| ≤ ε, 1 ≤ j ≤ n, for any ε > 0 and all n sufficiently large. Using this fact, the joint asymptotic normality of (φ̂_1, …, φ̂_K), and the continuous mapping theorem, we have

  | P(A_K) − P(T(K) ≤ t) | → 0   (17)

for all fixed K as n → ∞. Clearly there exists K_a such that

  | P(T(K_a) ≤ t) − P(T ≤ t) | ≤ δ/3.   (18)
Arguing as in our proof of (i), for any ε > 0 there exists K₀ = K₀(ε) such that

  P( max_{K₀≤k≤K_n} (1/k) Σ_{j=1}^k c_{jn} Z_{jn}² > t ) ≤ ε   (19)

for all n sufficiently large. Using (17), (18) and (19) with ε = δ/3, and choosing K_b = max( K₀(δ/3), K_a ), we have

  | P(T_n(K_n) ≤ t) − P(T ≤ t) | ≤ | P(A_{K_b}) − P(T(K_b) ≤ t) | + | P(T(K_b) ≤ t) − P(T ≤ t) | + P(B_{K_b}^c) ≤ δ

for all n sufficiently large, which completes the proof of Theorem 3.

Proof of Theorem 4. Let v_j = Var(φ̂_j) and let v̂_j denote its estimator. Using essentially the same argument as in the proof of Theorem 3, it can be shown that

  max_{1≤k≤n−1} (1/k) Σ_{j=1}^k φ̂_j² / v_j →_D sup_{k≥1} (1/k) Σ_{j=1}^k Z_j².   (20)

Now,

  T_n^het = max_{1≤k≤n−1} (1/k) Σ_{j=1}^k φ̂_j² / v̂_j = max_{1≤k≤n−1} (1/k) Σ_{j=1}^k [ φ̂_j² / v_j + φ̂_j² ( 1/v̂_j − 1/v_j ) ].

Note that

  (1/k) | Σ_{j=1}^k φ̂_j² ( 1/v̂_j − 1/v_j ) | ≤ max_{1≤j≤n−1} | v_j/v̂_j − 1 | · (1/k) Σ_{j=1}^k φ̂_j² / v_j.

Using this last inequality, (20) and assumption (4), it now follows that

  max_{1≤k≤n−1} (1/k) | Σ_{j=1}^k φ̂_j² ( 1/v̂_j − 1/v_j ) |

converges in probability to 0, and the result follows.

REFERENCES

Aerts, M., Claeskens, G., and Hart, J. D. (1999), "Testing the Fit of a Parametric Function," Journal of the American Statistical Association, 94.

Aerts, M., Claeskens, G., and Hart, J. D. (2000), "Testing Lack of Fit in Multiple Regression," Biometrika, 87.

American Diabetes Association (2000), "Standards of Medical Care for Patients With Diabetes Mellitus," Diabetes Care, 23, Supplement 1, S32–S42.

Efron, B., and Tibshirani, R. J. (1993), An Introduction to the Bootstrap, New York: Chapman and Hall.
Eubank, R. L., and Hart, J. D. (1992), "Testing Goodness of Fit in Regression via Order Selection Criteria," The Annals of Statistics, 20.

Hall, P., and Wilson, S. R. (1991), "Two Guidelines for Bootstrap Hypothesis Testing," Biometrics, 47.

Härdle, W., and Mammen, E. (1993), "Comparing Nonparametric Versus Parametric Regression Fits," The Annals of Statistics, 21.

Hart, J. D. (1997), Nonparametric Smoothing and Lack-of-Fit Tests, New York: Springer-Verlag.

Hart, J. D., and Yi, S. (1998), "One-Sided Cross-Validation," Journal of the American Statistical Association, 93.

Kuchibhatla, M., and Hart, J. D. (1996), "Smoothing-Based Lack-of-Fit Test: Variations on a Theme," Journal of Nonparametric Statistics, 7, 1–22.

Liu, R. Y. (1988), "Bootstrap Procedures Under Some Non-I.I.D. Models," The Annals of Statistics, 16.

Mammen, E. (1993), "Bootstrap and Wild Bootstrap for High Dimensional Linear Models," The Annals of Statistics, 21.

Parzen, E. (1981), "Nonparametric Statistical Data Science: A Unified Approach Based on Density Estimation and Testing for White Noise," Technical report, Department of Statistics, Texas A&M University.

Qumsiyeh, M. B. (1990), "Edgeworth Expansion in Regression Models," Journal of Multivariate Analysis, 35.

Qumsiyeh, M. B. (1994), "Bootstrapping and Empirical Edgeworth Expansions in Multiple Linear Regression Models," Communications in Statistics, Theory and Methods, 23.

Serfling, R. J. (1970), "Moment Inequalities for the Maximum Cumulative Sum," The Annals of Mathematical Statistics, 41.

Spitzer, F. (1956), "A Combinatorial Lemma and Its Application to Probability Theory," Transactions of the American Mathematical Society, 82.

Wu, C. F. J. (1986), "Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis," The Annals of Statistics, 14.
More informationUnderstanding Regressions with Observations Collected at High Frequency over Long Span
Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University
More informationSmooth nonparametric estimation of a quantile function under right censoring using beta kernels
Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Chanseok Park 1 Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 Short Title: Smooth
More informationDS-GA 1002 Lecture notes 2 Fall Random variables
DS-GA 12 Lecture notes 2 Fall 216 1 Introduction Random variables Random variables are a fundamental tool in probabilistic modeling. They allow us to model numerical quantities that are uncertain: the
More informationLarge Sample Properties of Estimators in the Classical Linear Regression Model
Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in
More informationThe Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests
Working Paper 2013:8 Department of Statistics The Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests Jianxin Wei Working Paper 2013:8 June 2013 Department of Statistics Uppsala
More informationA NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1
A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1 Jinglin Zhou Hong Wang, Donghua Zhou Department of Automation, Tsinghua University, Beijing 100084, P. R. China Control Systems Centre,
More informationInferential statistics
Inferential statistics Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample. Ahmed-Refat-ZU Null and alternative hypotheses In hypotheses testing,
More informationLeast Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions
Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error
More informationRank-sum Test Based on Order Restricted Randomized Design
Rank-sum Test Based on Order Restricted Randomized Design Omer Ozturk and Yiping Sun Abstract One of the main principles in a design of experiment is to use blocking factors whenever it is possible. On
More informationBootstrap Tests: How Many Bootstraps?
Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002
More informationInference in VARs with Conditional Heteroskedasticity of Unknown Form
Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg
More informationWorking Paper No Maximum score type estimators
Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior
More informationPower and Sample Size Calculations with the Additive Hazards Model
Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine
More informationA Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators
Statistics Preprints Statistics -00 A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Jianying Zuo Iowa State University, jiyizu@iastate.edu William Q. Meeker
More informationDiscussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis
Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,
More informationBootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator
Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos
More informationRecursive Least Squares for an Entropy Regularized MSE Cost Function
Recursive Least Squares for an Entropy Regularized MSE Cost Function Deniz Erdogmus, Yadunandana N. Rao, Jose C. Principe Oscar Fontenla-Romero, Amparo Alonso-Betanzos Electrical Eng. Dept., University
More informationWooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares
Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit
More informationEco517 Fall 2014 C. Sims FINAL EXAM
Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,
More informationAdditive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535
Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June
More informationBootstrap (Part 3) Christof Seiler. Stanford University, Spring 2016, Stats 205
Bootstrap (Part 3) Christof Seiler Stanford University, Spring 2016, Stats 205 Overview So far we used three different bootstraps: Nonparametric bootstrap on the rows (e.g. regression, PCA with random
More informationA NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL
Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl
More informationWooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics
Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).
More informationBootstrap-Based Improvements for Inference with Clustered Errors
Bootstrap-Based Improvements for Inference with Clustered Errors Colin Cameron, Jonah Gelbach, Doug Miller U.C. - Davis, U. Maryland, U.C. - Davis May, 2008 May, 2008 1 / 41 1. Introduction OLS regression
More informationThe Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University
The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor
More informationUniformly and Restricted Most Powerful Bayesian Tests
Uniformly and Restricted Most Powerful Bayesian Tests Valen E. Johnson and Scott Goddard Texas A&M University June 6, 2014 Valen E. Johnson and Scott Goddard Texas A&MUniformly University Most Powerful
More informationIdentification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case
Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College Original December 2016, revised July 2017 Abstract Lewbel (2012)
More informationInference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation
Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix
More informationSmooth simultaneous confidence bands for cumulative distribution functions
Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang
More informationGMM Estimation of a Maximum Entropy Distribution with Interval Data
GMM Estimation of a Maximum Entropy Distribution with Interval Data Ximing Wu and Jeffrey M. Perloff January, 2005 Abstract We develop a GMM estimator for the distribution of a variable where summary statistics
More informationBootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap
Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap Emmanuel Flachaire To cite this version: Emmanuel Flachaire. Bootstrapping heteroskedastic regression models: wild bootstrap
More information