Bootstrapping The Order Selection Test


Chien-Feng Chen, Jeffrey D. Hart, and Suojin Wang

ABSTRACT

We consider bootstrap versions of the order selection test of Eubank and Hart (1992) and Kuchibhatla and Hart (1996) for testing lack-of-fit of regression models. For homoscedastic data, conditions are established under which the bootstrap level error is smaller (asymptotically) than that of the large sample test. A new statistic is proposed to deal with the case of heteroscedastic data. The limiting distribution of this test statistic is derived and shown to depend on the unknown error variance function. This dependency makes using the large sample test a formidable task in practice. An alternative approximation is to apply bootstrap procedures. We propose various bootstrap tests, including ones based on the wild bootstrap. Simulation studies indicate that the wild bootstrap generally has good level and power properties, although power can sometimes be increased by appropriate smoothing of squared residuals. A real-data example is also considered to further illustrate the methodology.

KEY WORDS: Bootstrap; Heteroscedasticity; Nonparametric smoothing; Order selection test; Wild bootstrap.

Chien-Feng Chen is Senior Statistician, Insulins Product Team, Eli Lilly and Company, Indianapolis, IN. Jeffrey D. Hart is Professor, Department of Statistics, Texas A&M University, College Station, TX. Suojin Wang is Professor, Department of Statistics, Texas A&M University, College Station, TX. Hart's research was supported in part by an NSF grant (DMS). Wang's research was supported by the Texas Advanced Research Program, the National Cancer Institute (CA 57030) and the Texas A&M Center of Environmental and Rural Health through a grant from the National Institute of Environmental Health Sciences (P30-E50906). The authors are grateful to an Associate Editor and a referee for their review of our work and helpful comments.

1. INTRODUCTION

In recent years the use of nonparametric smoothing techniques in testing lack-of-fit of regression models has drawn much attention. Smoothing-based tests surpass the classical nonparametric tests, such as the von Neumann test and cusum tests, in more than one way. They tend to be more powerful and can provide estimates of the regression function when the null hypothesis is rejected. As a consequence of smoothing, most of the proposed tests depend on smoothing parameters. To have a desired level of significance, smoothing parameters need to be fixed in advance to carry out the tests. Furthermore, the choice of smoothing parameters has an effect on the power of the test.

Unlike most of the smoothing-based tests, the order selection test of Eubank and Hart (1992) does not depend on arbitrarily chosen smoothing parameters. The test utilizes an orthogonal series estimate of the underlying regression function, using the truncation point (a smoothing parameter) of the estimate as the test statistic. Therefore, the test statistic is itself a data-driven smoothing parameter. An equivalent form of the test pointed out by Kuchibhatla and Hart (1996) provides a continuous-valued test statistic that makes computation of P-values relatively straightforward. The order selection test is consistent against fixed alternatives and can detect local alternatives that converge to the null at the rate $n^{-1/2}$.

This paper focuses on testing the no-effect hypothesis, i.e., the hypothesis that the regression function is constant. In doing so we make use of the Kuchibhatla and Hart (1996) version of the order selection test. The exact distribution of one version of the test statistic is known when the errors are independent and identically distributed (i.i.d.) Gaussian. The asymptotic distribution was obtained by Eubank and Hart (1992) assuming only that the errors are i.i.d. with finite fourth moments. However, the validity of approximating the sampling distribution by an asymptotic one depends on factors such as sample size, error distribution and estimation of error variance. An alternative and often better approximation is to apply bootstrap procedures. One goal of this paper is to provide theoretical justification for the bootstrap in the case of i.i.d. errors.

Another aim of the paper is to deal with heteroscedastic regression models. The asymptotic distribution of the test statistic of Kuchibhatla and Hart (1996) is derived and shown to depend on the unknown variance function. The discrepancy between the asymptotic distribution and its homoscedastic counterpart is non-negligible, and ignoring model heteroscedasticity will invalidate the use of the order selection test. Approximating the heteroscedastic asymptotic distribution involves, among other things, estimation of the error variance function, which is not an easy task. The wild bootstrap method is a convenient tool for producing a consistent estimator of a statistic's sampling distribution when the errors have nonconstant variances. We employ wild bootstrap methods in this research to approximate the sampling distribution of test statistics for the no-effect hypothesis. A new test statistic (denoted $T_{het,n}$), equivalent in the asymptotic sense to that of Kuchibhatla and Hart (1996) in the

i.i.d. case, is proposed. Although the limiting distribution of $T_{het,n}$ also depends on the unknown error variance, the dependence is shown to be minor, and the asymptotic distribution under i.i.d. assumptions can be considered a ballpark substitute for the heteroscedastic one.

The rest of the paper is organized as follows. In Section 2 we briefly review the development of the order selection test. Bootstrap procedures are discussed and applied to a homoscedastic model in Section 3. Here, it is shown that the bootstrap level error is asymptotically smaller than that of the asymptotic test. Section 4 deals with heteroscedastic models, for which asymptotic theory and bootstrap procedures are explored. Simulation results for both i.i.d. and heteroscedastic cases are presented in Section 5. In Section 6 we apply the two test statistics, via asymptotic tests, bootstrap tests and wild bootstrap tests, to an example from a diabetes clinical trial. The conclusions reached in this research and some open questions for future research are given in Section 7. The proofs of two theorems are presented in the Appendix.

2. THE ORDER SELECTION TEST

Consider the simple regression model

$$Y_i = r(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$

where $Y_1, \ldots, Y_n$ are the observed responses, $r$ is the regression function, $x_i = (i - .5)/n$, $i = 1, \ldots, n$, and $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. error terms with zero mean and variance $\sigma^2$. As long as the regression function is piecewise smooth on $[0,1]$, then at all its continuity points it can be represented by the Fourier series

$$r(x) = \phi_0 + 2\sum_{j=1}^{\infty} \phi_j \cos(\pi j x), \qquad (2)$$

where the Fourier coefficients are

$$\phi_j = \int_0^1 r(x)\cos(\pi j x)\,dx, \qquad j \ge 0.$$

In analogy to the Fourier series above, we may estimate $r(x)$ by the truncated series

$$\hat r(x; m) = \hat\phi_0 + 2\sum_{j=1}^{m} \hat\phi_j \cos(\pi j x), \qquad (3)$$

where $m$ is a nonnegative integer less than $n$ and

$$\hat\phi_j = n^{-1}\sum_{i=1}^{n} Y_i \cos(\pi j x_i), \qquad j = 0, \ldots, n-1. \qquad (4)$$

A fundamental problem in regression is testing the no-effect hypothesis,

$$H_0: r(x) = C \quad \text{for all } x \in [0,1],$$

where $C$ is an unknown constant. When (2) holds, the no-effect hypothesis is equivalent to the hypothesis that $\phi_j = 0$ for all $j \ge 1$.

The Eubank and Hart (1992) order selection test for no-effect is based on the data-driven truncation point $\hat m$, an estimator of the $m$ in (3) that minimizes an estimated risk function. Specifically, $\hat m$ is the maximizer of $J(\cdot\,; \gamma_\alpha)$, where $J(0; \gamma_\alpha) = 0$,

$$J(m; \gamma_\alpha) = \sum_{j=1}^{m} \frac{2n\hat\phi_j^2}{\hat\sigma^2} - \gamma_\alpha m, \qquad 1 \le m \le n-1,$$

$\hat\sigma^2$ is any consistent estimator of $\sigma^2$ and $\gamma_\alpha$ is a constant that depends upon the desired significance level $\alpha$. A value of $\hat m \ge 1$ is evidence that at least one $\phi_j$ is nonzero; hence the test rejects the null hypothesis at level $\alpha$ if $\hat m \ge 1$. Taking $\gamma_\alpha = 3.22$, $4.179$ and ... yields asymptotic tests of size .10, .05 and .01, respectively. An attractive feature of this test is that once the null hypothesis is rejected, an immediate point estimate of the regression function is at hand:

$$\hat r(x; \hat m) = \hat\phi_0 + 2\sum_{j=1}^{\hat m} \hat\phi_j \cos(\pi j x).$$

An equivalent form of the test pointed out by Kuchibhatla and Hart (1996) uses the continuous-valued test statistic

$$T_n = \max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{2n\hat\phi_j^2}{\hat\sigma^2}, \qquad (5)$$

and $H_0$ is rejected for large values of $T_n$. As long as the errors are assumed to be i.i.d. with finite fourth moments, $T_n$ converges in distribution (under $H_0$) to

$$T = \sup_{k \ge 1} \frac{1}{k}\sum_{j=1}^{k} Z_j^2,$$

where $Z_1, Z_2, \ldots$ are i.i.d. standard normal random variables. As shown by Spitzer (1956), the distribution of $T$ can be determined to any desired accuracy. The asymptotic $\alpha$ level critical value of the $T_n$-based test is precisely the value $\gamma_\alpha$ that induces an asymptotic level of $\alpha$ for the $\hat m$ version of the order selection test.
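To make the definitions above concrete, the following is a minimal numerical sketch (ours, not the authors') of the quantities in (4) and (5) for the evenly spaced design. The function name and the toy data are illustrative, and the simple variance estimator $\hat\sigma^2 = n^{-1}\sum_i(Y_i - \bar Y)^2$, anticipated from Section 3, is used for $\hat\sigma^2$.

import numpy as np

def order_selection_statistic(y):
    """Kuchibhatla-Hart statistic T_n of (5) for the no-effect hypothesis, together
    with the k at which the running average is maximized (the data-driven
    truncation point).  Assumes the evenly spaced design x_i = (i - 0.5)/n."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = (np.arange(1, n + 1) - 0.5) / n
    j = np.arange(1, n)                              # frequencies 1, ..., n-1
    phi = (np.cos(np.pi * np.outer(j, x)) @ y) / n   # sample Fourier coefficients, eq. (4)
    sigma2 = np.mean((y - y.mean()) ** 2)            # simple consistent estimator of sigma^2
    terms = 2.0 * n * phi ** 2 / sigma2              # 2n phi_j^2 / sigma^2
    running = np.cumsum(terms) / np.arange(1, n)     # k^{-1} sum_{j<=k} terms
    k_hat = int(np.argmax(running)) + 1
    return running[k_hat - 1], k_hat

# No-effect data versus data with a smooth trend (sigma = 0.1)
rng = np.random.default_rng(0)
n = 50
x = (np.arange(1, n + 1) - 0.5) / n
print(order_selection_statistic(rng.normal(0.0, 0.1, n)))
print(order_selection_statistic(0.3 * np.sin(np.pi * x) + rng.normal(0.0, 0.1, n)))

Under $H_0$ the first call typically returns a value below the .05 critical point of $F_{OS}$, while the second call, which adds a single smooth bump, typically returns a much larger value attained at a small truncation point.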

In cases where design points are fixed but not evenly spaced, there are at least two solutions. First, one may test for constancy of the regression quantile function, as defined by Parzen (1981). Let $u_j = (j - .5)/n$, $j = 1, \ldots, n$, and suppose the (unevenly spaced) design points satisfy $x_j = Q_n(u_j)$, $j = 1, \ldots, n$, where $Q_n$ is a piecewise constant empirical quantile function that converges to some $Q$ as $n \to \infty$. The hypothesis $H_0: r(x_j) = C$, $j = 1, \ldots, n$, is equivalent to $H_0: r(Q_n(u_j)) = C$, $j = 1, \ldots, n$. Hence the test procedures described previously can be applied to the regression quantile function $r(Q_n(\cdot))$. A second approach is to define a version of $T_n$ in terms of basis functions that are orthogonal with respect to the design points. Given any set of basis functions, one may easily construct an orthogonal basis from them by a Gram-Schmidt procedure, but in fact doing so is not necessary, as discussed in Eubank and Hart (1992). The asymptotic distribution theory for these two methods is the same as that for $T_n$. Extensions to the random design case are also straightforward by conditioning on the observed x-values.

3. BOOTSTRAPPING WITH I.I.D. ERRORS

Our main purpose in this section is to show that a bootstrap method often better approximates the null distribution of $T_n$ than does the large sample distribution. In our bootstrap algorithm, we need to simulate data from a model that assumes $H_0$ to be true, which is in keeping with one of the two bootstrap guidelines set forth by Hall and Wilson (1991). To this end, let $\bar Y$ be the sample mean of $Y_1, \ldots, Y_n$, and define bootstrap data by

$$Y_i^* = \bar Y + \varepsilon_i^*, \qquad i = 1, \ldots, n,$$

where $\varepsilon_1^*, \ldots, \varepsilon_n^*$ are i.i.d. as $\hat F_n$, the empirical distribution of $e_1 = Y_1 - \bar Y, \ldots, e_n = Y_n - \bar Y$. Define bootstrap Fourier coefficients by

$$\hat\phi_j^* = n^{-1}\sum_{i=1}^{n} Y_i^* \cos(\pi j x_i), \qquad j = 1, \ldots, n-1, \qquad (6)$$

and $\hat\sigma^{*2}$ to be exactly the same function of $Y_1^*, \ldots, Y_n^*$ as $\hat\sigma^2$ is of $Y_1, \ldots, Y_n$. For the remainder of this section we assume that $\hat\sigma^2 = n^{-1}\sum_{i=1}^{n}(Y_i - \bar Y)^2$. Our test statistic will be

$$T_n(k_0) = \max_{1 \le k \le k_0} \frac{1}{k}\sum_{j=1}^{k} \frac{2n\hat\phi_j^2}{\hat\sigma^2}$$

and its bootstrap counterpart

$$T_n^*(k_0) = \max_{1 \le k \le k_0} \frac{1}{k}\sum_{j=1}^{k} \frac{2n\hat\phi_j^{*2}}{\hat\sigma^{*2}}.$$

The statistic $T_n(k_0)$ satisfies the second Hall and Wilson (1991) guideline, namely it is an (asymptotic) pivotal quantity. The number $k_0$ is fixed, but allowed to be arbitrarily large. Ideally, we would choose $k_0 = n-1$ (as in Section 2), but this leads to technical difficulties in proving a bootstrap accuracy result. In practice, we have found that the choice of $k_0$ is a very minor point since it is extremely rare for the maximum of $k^{-1}\sum_{j=1}^{k} 2n\hat\phi_j^2/\hat\sigma^2$ to occur at a $k$ larger than 5.
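The resampling scheme just described is easy to implement. Below is a sketch (our own illustration, not code from the paper) of the bootstrap test based on $T_n(k_0)$; the function names, the default $k_0 = 10$, the number of bootstrap samples and the example data are arbitrary illustrative choices.

import numpy as np

def tn_k0(y, k0):
    """T_n(k_0): the statistic of Section 3, with the maximum restricted to k <= k_0."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = (np.arange(1, n + 1) - 0.5) / n
    phi = (np.cos(np.pi * np.outer(np.arange(1, k0 + 1), x)) @ y) / n
    sigma2 = np.mean((y - y.mean()) ** 2)
    return np.max(np.cumsum(2.0 * n * phi ** 2 / sigma2) / np.arange(1, k0 + 1))

def bootstrap_no_effect_test(y, k0=10, n_boot=1000, seed=0):
    """Residual bootstrap test of H0: r = constant.  Bootstrap data are
    Y_i* = Ybar + e_i*, with e_i* resampled from the centered residuals e_i = Y_i - Ybar."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    t_obs = tn_k0(y, k0)
    e = y - y.mean()
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        y_star = y.mean() + rng.choice(e, size=n, replace=True)
        t_star[b] = tn_k0(y_star, k0)
    return t_obs, np.mean(t_star >= t_obs)           # statistic and bootstrap P-value

rng = np.random.default_rng(1)
x = (np.arange(1, 31) - 0.5) / 30
y = 0.2 * np.sin(np.pi * x) + rng.normal(0.0, 0.1, 30)
print(bootstrap_no_effect_test(y))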

In order to show that the bootstrap distribution accurately estimates the null distribution of the test statistic $T_n(k_0)$, we will make use of the theoretical results of Qumsiyeh (1990, 1994) that were developed for the bootstrap in multiple regression models. Let $X$ be the $n \times k_0$ matrix with first column all 1s and entry $\sqrt{2}\cos(\pi(j-1)x_i)$ in the $i$th row and $j$th column, $i = 1, \ldots, n$, $j = 2, \ldots, k_0$. The discussion above indicates that, for some large enough but fixed $k_0 < n$, model (1) may be well approximated by the model

$$Y = X\beta + \varepsilon$$

with $Y = (Y_1, \ldots, Y_n)'$, $\beta = (\phi_0, \sqrt{2}\phi_1, \ldots, \sqrt{2}\phi_{k_0-1})'$ and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$. Note that $X'X = nI_{p \times p}$ for $p = k_0$, and, as can easily be seen from (4) and (6),

$$\hat\beta = (\hat\phi_0, \sqrt{2}\hat\phi_1, \ldots, \sqrt{2}\hat\phi_{k_0-1})' = (X'X)^{-1}X'Y = n^{-1}X'Y,$$

the least squares estimate of $\beta$, and

$$\hat\beta^* = (\hat\phi_0^*, \sqrt{2}\hat\phi_1^*, \ldots, \sqrt{2}\hat\phi_{k_0-1}^*)' = n^{-1}X'Y^*.$$

We make the following assumptions on regularity conditions:

A1. The $\varepsilon_i$'s are i.i.d. with mean 0 and finite variance $\sigma^2 > 0$.

A2. $\varepsilon_1$ has a non-zero absolutely continuous component which has a positive density on an open subset of $R$.

A3. $\varepsilon_1$ has a finite $2s$-th absolute moment for some integer $s \ge 3$.

Let $\Phi_r$ and $\psi_r$ be the standard normal distribution and its density on $R^r$, respectively. Denote the mapping of $T_n(k_0)$ from $R^{k_0}$ to $R$ by $Q$: $Q(x) = \max_{1 \le k \le k_0} k^{-1}\sum_{j=1}^{k} x_j^2$. We also denote by $P^*$ the probability under $\hat F_n$. Further, define $\mathcal{B}$ as a class of all Borel subsets of $R^{k_0}$ satisfying (i) $\sup_{E \in \mathcal{B}} \Phi_p((\partial E)^\eta) = O(\eta)$ as $\eta \to 0$, where $(\partial E)^\eta$ is the set of points in $R^p$ within $\eta$ of the boundary of $E \in \mathcal{B}$, and (ii) each set $E \in \mathcal{B}$ corresponds to a Borel set $D \subset R$ with $E = Q^{-1}(D)$.

Theorem 1. Let Assumptions A1-A3 hold and suppose that $s$ in A3 is greater than $p$. Then under the null hypothesis $H_0$ that $r$ is constant, we have, a.s. as $n \to \infty$,

$$\sup_t \bigl| P^*(T_n^*(k_0) \le t) - P(T_n(k_0) \le t) \bigr| = o(n^{-1/2}), \qquad (7)$$

and hence the bootstrap approximation for the distribution of $T_n(k_0)$ is asymptotically more accurate than the normal approximation, which generally has an error of $O(n^{-1/2})$.

Proof. A complete proof is very long; we will sketch the main steps by applying the results of Qumsiyeh (1990, 1994). First we show the following two-term Edgeworth expansion for $P(T_n(k_0) \in D)$:

$$\sup_{E \in \mathcal{B}} \Bigl| P(T_n(k_0) \in D) - \int_E \bigl[1 + n^{-1/2}P_1(x)\bigr]\psi_{k_0}(x)\,dx \Bigr| = o(n^{-1/2}), \qquad (8)$$

where $P_1$ is a polynomial function with its coefficients depending on the $\nu$-th and lower order cumulants $\kappa_\nu$ of $\hat\beta$ for some $\nu$. We use an argument similar to that in Qumsiyeh's (1994) proof of his Theorem 2.4, under $H_0$, for the studentized vector $W_n = n^{1/2}(\hat\beta - \beta_0)/\hat\sigma$, where $\beta_0 = (\phi_0, 0, \ldots, 0)'$. The only difference between $W_n$ and the statistic $\tilde W_n$ in Theorem 2.4 of Qumsiyeh (1994) is that $\hat\sigma$ in $W_n$ uses the null model residuals $e_i$ to estimate the standard error, while $\tilde W_n$ uses the full model residuals $\hat\varepsilon_i = Y_i - X_i\hat\beta$. However, under $H_0$, it is easy to see that the difference between the estimated standard errors is of $O_p(n^{-1})$. Therefore,

$$\sup_{E \in \mathcal{B}} \Bigl| P(W_n \in E) - \int_E \bigl[1 + n^{-1/2}P_1(x)\bigr]\psi_{k_0}(x)\,dx \Bigr| = o(n^{-1/2}) \qquad (9)$$

for some polynomial $P_1(x)$ as defined above. Equation (8) then follows since $T_n(k_0) = Q(W_n)$ and thus $P(T_n(k_0) \in D) = P(W_n \in E)$ for $D = Q(E)$.

Next, we give the bootstrap version of equation (8): a.s. as $n \to \infty$,

$$\sup_{E \in \mathcal{B}} \Bigl| P^*(T_n^*(k_0) \in D) - \int_E \bigl[1 + n^{-1/2}\hat P_1(x)\bigr]\psi_{k_0}(x)\,dx \Bigr| = o(n^{-1/2}), \qquad (10)$$

where $\hat P_1$ is the same polynomial as $P_1$ but with the cumulants $\kappa_\nu$ replaced by the conditional cumulants $\hat\kappa_\nu$ of $\hat\beta^*$ given the $e_i$'s. A proof of (10) can be obtained similarly to that of Theorem 3.3 of Qumsiyeh (1994), in the same way (8) was derived by first arriving at (9). There are two differences between our bootstrap version $W_n^*$ of $W_n$ and Qumsiyeh's $\tilde W_n^*$. The first one is between using the null model and the full model residuals in estimating the standard errors of the bootstrap statistics. This difference leads to a negligible error of $O_p(n^{-1})$ under $H_0$. The second difference is that we resample the $\varepsilon_i^*$ from the null model residuals $e_i$ rather than from the full model residuals $\hat\varepsilon_i$. However, this difference contributes only a negligible error, and the following continues to hold when the $\varepsilon_i^*$ are sampled from the $e_i$:

$$\sup_{E \in \mathcal{B}} \Bigl| P^*(W_n^* \in E) - \int_E \bigl[1 + n^{-1/2}\hat P_1(x)\bigr]\psi_{k_0}(x)\,dx \Bigr| = o(n^{-1/2})$$

a.s. as $n \to \infty$, since it is readily seen that Lemma 3.1 of Qumsiyeh (1994) on Cramér's condition for the empirical distribution function is still valid when the $\varepsilon_i^*$ are sampled from the $e_i$ instead of the $\hat\varepsilon_i$. Expression (10) then follows. Using the fact that $\hat\kappa_\nu - \kappa_\nu \to 0$ a.s. under $H_0$ and combining (8) and (10) yields (7), thus completing the proof.

Another important consideration is how the bootstrap test behaves when the alternative hypothesis is true, in which case the main concern is power. If the alternative holds, the empirical distribution function of $Y_1 - \bar Y, \ldots, Y_n - \bar Y$ is not a consistent estimator of the underlying error distribution. Nonetheless, under mild conditions, the bootstrap distribution $P^*(T_n^*(k_0) \le t)$ is a consistent estimator of the null distribution of the test statistic, as seen in the following theorem. The consistency of $P^*$ is enough to ensure that our bootstrap test has desirable properties, such as consistency against a large class of alternatives.

Theorem 2. Under model (1), suppose that $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. with finite fourth moments and that $r$ is piecewise continuous on $[0,1]$. Furthermore, let $Z_1, \ldots, Z_{k_0}$ be i.i.d. standard normal random variables and define $T(k_0) = \max_{1 \le k \le k_0} k^{-1}\sum_{j=1}^{k} Z_j^2$. Then

$$\sup_t \bigl| P^*(T_n^*(k_0) \le t) - P(T(k_0) \le t) \bigr|$$

converges in probability to 0 as $n \to \infty$. In addition, if $\int_0^1 r(x)\cos(\pi j x)\,dx \ne 0$ for some $j \le k_0$, then the power of the bootstrap test tends to 1 as $n \to \infty$.

Proof. Let $G(t) = P(T(k_0) \le t)$ and define $\check T_n^*(k_0) = (\hat\sigma^{*2}/\sigma^2)\,T_n^*(k_0)$. Then for any positive $\delta$, we have

$$P^*\bigl(T_n^*(k_0) \le t\bigr) - G(t) \le P^*\bigl(|\hat\sigma^{*2}/\sigma^2 - 1| > \delta\bigr) + P^*\bigl(\check T_n^*(k_0) \le t\,\hat\sigma^{*2}/\sigma^2,\ |\hat\sigma^{*2}/\sigma^2 - 1| \le \delta\bigr) - G(t). \qquad (11)$$

Using the moment conditions on $\varepsilon_1$ and the piecewise continuity of $r$, it is straightforward to show that $P^*(|\hat\sigma^{*2}/\sigma^2 - 1| > \delta)$ converges in probability to 0. It is enough, then, to investigate the second term on the right hand side of inequality (11). That term is bounded in absolute value by $\max\{|p_{1n}(t) - G(t)|,\ |p_{2n}(t) - G(t)|\}$, where

$$p_{1n}(t) = P^*\bigl(\check T_n^*(k_0) \le (1+\delta)t,\ |\hat\sigma^{*2}/\sigma^2 - 1| \le \delta\bigr)$$

and $p_{2n}(t)$ is defined exactly the same but with $(1+\delta)t$ replaced by $(1-\delta)t$. We consider only $p_{1n}(t) - G(t)$ since $p_{2n}(t) - G(t)$ is handled in exactly the same way. It is elementary to show that

$$p_{1n}(t) - G(t) \le \bigl[P^*\bigl(\check T_n^*(k_0) \le (1+\delta)t\bigr) - G\bigl((1+\delta)t\bigr)\bigr] + \bigl[G\bigl((1+\delta)t\bigr) - G(t)\bigr] + 2P^*\bigl(|\hat\sigma^{*2}/\sigma^2 - 1| > \delta\bigr).$$

The random variable $T(k_0)$ has a density $g$ such that $\sup_{t>0} t\,g(t) \le C$ for some constant $C$, which implies that

$$\sup_{t>0} \bigl| G\bigl((1+\delta)t\bigr) - G(t) \bigr| \le C\delta.$$

The last term can be made arbitrarily small by choosing $\delta$ small enough. The only term left to deal with is $P^*(\check T_n^*(k_0) \le (1+\delta)t) - G((1+\delta)t)$, for which we use a Berry-Esséen type result in Bhattacharya and Ranga Rao (1976). Applying their Theorem 3.3, we have

$$\sup_x \bigl| P^*\bigl(\check T_n^*(k_0) \le x\bigr) - G(x) \bigr| \le a(k_0)\, n^{-1/2}\, \frac{n^{-1}\sum_{i=1}^{n}(Y_i - \bar Y)^4}{\hat\sigma^4},$$

where $a(k_0)$ is a constant that depends only on $k_0$. The result now follows from the piecewise continuity of $r$ and the fact that $\varepsilon_1$ has a finite fourth moment.

To prove the consistency of the test, note that the first part of the theorem establishes that the bootstrap percentiles consistently estimate those of $T(k_0)$. It is enough, then, to argue that $T_n(k_0)$ tends to infinity in probability as $n \to \infty$. By assumption, there exists $J \le k_0$ such that $\phi_J \stackrel{\rm def}{=} \int_0^1 r(x)\cos(\pi J x)\,dx \ne 0$. We have

$$T_n(k_0) \ge \frac{2n\hat\phi_J^2}{J\hat\sigma^2},$$

from which the result follows since $\hat\phi_J$ is consistent for $\phi_J$ and $\hat\sigma^2$ is consistent for $\sigma^2 + \int_0^1 (r(x) - \bar r)^2\,dx$, where $\bar r = \int_0^1 r(x)\,dx$.

4. HETEROSCEDASTICITY

In practice the assumption of equal error variances is sometimes violated. Under such circumstances, inferences based on homoscedasticity may be invalid regardless of whether parametric or nonparametric methods are used. A reasonable model for heteroscedasticity is to assume that the error variances follow a smooth function of some known variable. Here we assume that the error variance is a function of the predictor $x$.

In Section 4.1 we consider how heteroscedasticity affects the large sample distribution of $T_n$ and of another statistic that seems better suited for heteroscedastic data. In Section 4.2 we propose a wild bootstrap algorithm for approximating the distribution of statistics in the presence of heteroscedasticity.

4.1 Large Sample Distribution of Statistics

Consider the model

$$Y_i = r(x_i) + \sigma(x_i)\eta_i, \qquad i = 1, \ldots, n, \qquad (12)$$

where $E(\eta_i) = 0$, $\mathrm{Var}(\eta_i) = 1$, $i = 1, \ldots, n$, and $\sigma(\cdot)$ is some positive function. The following theorem provides the large sample distribution of $T_n$ under model (12) when $r \equiv C$. The proof of this result is given in the Appendix.

Theorem 3. Assume that $\eta_1, \ldots, \eta_n$ in model (12) are independent with finite fourth moments, and that the error variance function $\sigma^2(x)$ has two continuous derivatives on $[0,1]$. Then, if $r$ is identically equal to a constant,

$$T_n \stackrel{D}{\longrightarrow} \sup_{k \ge 1} \frac{1}{k}\sum_{j=1}^{k} c_j Z_j^2,$$

where $Z_1, Z_2, \ldots, Z_m$ are jointly normal for all $m$, $Z_j \sim N(0,1)$,

$$c_j = 1 + \frac{\int_0^1 \sigma^2(x)\cos(2\pi j x)\,dx}{\int_0^1 \sigma^2(x)\,dx},$$

and

$$\mathrm{Cov}(Z_j, Z_k) = \frac{\int_0^1 \sigma^2(x)\cos(\pi j x)\cos(\pi k x)\,dx}{\bigl[\int_0^1 \sigma^2(x)\cos^2(\pi j x)\,dx \int_0^1 \sigma^2(x)\cos^2(\pi k x)\,dx\bigr]^{1/2}}.$$

Not surprisingly, the asymptotic distribution of $T_n$ depends on the unknown error variance function. Hence, to put Theorem 3 to practical use, one is faced with the daunting task of estimating $\sigma^2(x)$. Another possible test statistic is

$$T_{het,n} = \max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{\hat\phi_j^2}{\widehat{\mathrm{Var}}(\hat\phi_j)}.$$

On the surface $T_{het,n}$ seems better suited for heteroscedasticity than does $T_n$ since each $\hat\phi_j$ is correctly standardized. Note that

$$\mathrm{Var}(\hat\phi_j) = n^{-2}\sum_{i=1}^{n} \sigma^2(x_i)\cos^2(\pi j x_i), \qquad (13)$$

which is asymptotic to $n^{-1}\int_0^1 \sigma^2(x)\cos^2(\pi j x)\,dx$ under the conditions of Theorem 3. In the following theorem we obtain a large sample distribution for $T_{het,n}$.

Theorem 4. Assume that the conditions of Theorem 3 hold, and that $\widehat{\mathrm{Var}}(\hat\phi_j)$ is uniformly consistent for $\mathrm{Var}(\hat\phi_j)$ in the sense that

$$\max_{1 \le j \le n-1} \Bigl| \frac{\widehat{\mathrm{Var}}(\hat\phi_j)}{\mathrm{Var}(\hat\phi_j)} - 1 \Bigr| \stackrel{P}{\longrightarrow} 0. \qquad (14)$$

Then, if $r$ is identically equal to a constant,

$$T_{het,n} \stackrel{D}{\longrightarrow} \sup_{k \ge 1} \frac{1}{k}\sum_{j=1}^{k} Z_j^2,$$

where $Z_1, Z_2, \ldots$ have the same distribution as in Theorem 3.

Estimation of $\mathrm{Var}(\hat\phi_j)$ requires estimation of the variance function $\sigma^2(x)$, which may be done either parametrically or nonparametrically. The latter approach can be achieved by smoothing the squared residuals $e_1^2, \ldots, e_n^2$. Another, simpler, possibility is to use the estimator

$$\widehat{\mathrm{Var}}(\hat\phi_j) = n^{-2}\sum_{i=1}^{n} e_i^2 \cos^2(\pi j x_i).$$

In the case of a parametric model for $\sigma^2(x)$, if weakly consistent estimators of the model parameters are available, then one may show that the uniform consistency condition in Theorem 4 is achieved by using a version of $\widehat{\mathrm{Var}}(\hat\phi_j)$ that replaces each $\sigma^2(x_i)$ in (13) by its parametric estimator.
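As a concrete illustration of the simpler estimator above, the sketch below (our own, with illustrative names and data) computes $T_{het,n}$ with $\widehat{\mathrm{Var}}(\hat\phi_j) = n^{-2}\sum_i e_i^2\cos^2(\pi j x_i)$ plugged in for each $j$.

import numpy as np

def t_het(y):
    """T_het,n with the simple residual-based estimator of Var(phi_j):
    Var_hat(phi_j) = n^{-2} sum_i e_i^2 cos^2(pi j x_i), where e_i = Y_i - Ybar."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = (np.arange(1, n + 1) - 0.5) / n
    j = np.arange(1, n)
    cosjx = np.cos(np.pi * np.outer(j, x))
    e = y - y.mean()
    phi = cosjx @ y / n                              # sample Fourier coefficients
    var_hat = (cosjx ** 2) @ (e ** 2) / n ** 2       # estimated Var(phi_j), j = 1, ..., n-1
    terms = phi ** 2 / var_hat
    return np.max(np.cumsum(terms) / np.arange(1, n))

# Heteroscedastic noise without and with a regression effect
rng = np.random.default_rng(2)
n = 40
x = (np.arange(1, n + 1) - 0.5) / n
print(t_het(0.1 * (1 + x) * rng.normal(size=n)))
print(t_het(0.3 * np.sin(np.pi * x) + 0.1 * (1 + x) * rng.normal(size=n)))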

It is of interest to investigate how different the limit distributions of $T_n$ and $T_{het,n}$ are from each other and from the limit distribution, call it $F_{OS}$, in the case of homoscedasticity. One observation is immediate: the limit distribution of $T_n$ is more complicated than that of $T_{het,n}$ in that it depends on the variance function through the constants $c_1, c_2, \ldots$ as well as through the covariance function of the process $Z_1, Z_2, \ldots$. For this reason, we conjecture that the test based on $T_{het,n}$ will generally be the more robust of the two to heteroscedasticity when $F_{OS}$ is used to obtain critical values for each test. It is interesting to note, however, that the limiting distributions of the two statistics are the same if the variance function is linear in $x$, since in that case $c_j = 1$ for all $j$.

To investigate the question of robustness, we shall measure how far the limit distribution of $T_{het,n}$ is from $F_{OS}$ in the case of a quadratic variance function, i.e., $\sigma^2(x) = \beta_0 + \beta_1 x + \beta_2 x^2$. Here, we have

$$\mathrm{Cov}(Z_j, Z_k) = \frac{A_{jk}}{(B_j B_k)^{1/2}}, \qquad j \ne k,$$

where

$$A_{jk} = \frac{\beta_2}{\pi^2}\Bigl[\frac{1}{(j-k)^2} + \frac{1}{(j+k)^2}\Bigr] \text{ if } j+k \text{ is even}, \qquad A_{jk} = -\frac{\beta_1+\beta_2}{\pi^2}\Bigl[\frac{1}{(j-k)^2} + \frac{1}{(j+k)^2}\Bigr] \text{ if } j+k \text{ is odd},$$

and

$$B_j = \frac{2\beta_0 + \beta_1}{4} + \frac{\beta_2}{6} + \frac{\beta_2}{(2\pi j)^2}.$$

By letting $\beta_1/\beta_2 \to 0$ and $\beta_0/\beta_2 \to 0$, one maximizes $\mathrm{Cov}(Z_j, Z_k)$. The limiting value of $\mathrm{Cov}(Z_j, Z_k)$, $j \ne k$, is

$$\frac{\bigl[\pi(j-k)\bigr]^{-2} + \bigl[\pi(j+k)\bigr]^{-2}}{\bigl\{\bigl[1/6 + (2\pi j)^{-2}\bigr]\bigl[1/6 + (2\pi k)^{-2}\bigr]\bigr\}^{1/2}}. \qquad (15)$$

We now use simulation to investigate the limit distribution of $T_{het,n}$. Let $Z_1, Z_2, \ldots, Z_K$ have a multivariate normal distribution such that $E(Z_i) = 0$ and $\mathrm{Var}(Z_i) = 1$, $i = 1, \ldots, K$, and $\mathrm{Cov}(Z_j, Z_k)$ ($j \ne k$) given by (15). Ten thousand replications of

$$T^K_{het} = \max_{1 \le k \le K} \frac{1}{k}\sum_{j=1}^{k} Z_j^2$$

were generated for each of several values of $K$ to approximate the 95th percentile of the limiting distribution of $T_{het,n}$. The results in Table 1 indicate that the departure of this percentile from that of $F_{OS}$, 4.1793, is insubstantial. An empirical approximation of $P(T^{80}_{het} > 4.1793)$ is ...

Table 1. Approximations to the 95th percentile of $T^K_{het}$ for a quadratic variance function.

K        Percentile

The calculations above lead us to conjecture that the correlation among the $Z_j$'s has very little impact on the limiting distribution of $T_{het,n}$. We can conclusively show that the impact is small when $K = 2$. Let $T(\rho)$ denote the random variable $\max\{Z_1^2, (Z_1^2 + Z_2^2)/2\}$ when the correlation between $Z_1$ and $Z_2$ is $\rho$. Using numerical integration, we may compute $P(T(\rho) > t)$ for any given $t$ and $\rho$. The 90th, 95th and 99th percentiles of $T(0)$ are ..., 4.077 and 6.736, respectively (Hart (1997), page 179). For various $\rho$, we have computed $P(T(\rho) > t)$ for $t$ equal to each of these percentiles. The results are shown in Figure 1 and indicate that the difference $P(T(0) > t) - P(T(\rho) > t)$ tends to be quite small in absolute value for $\rho \le .8$. For very large $\rho$, the difference is slightly larger, but positive, implying that, at least for $K = 2$, a false assumption of homoscedasticity would lead to a conservative test.

Figure 1. Tail probability $P(T(\rho) > t)$ as a function of the correlation $\rho$, for $t$ equal to the 90th, 95th ($t = 4.077$) and 99th ($t = 6.736$) percentiles of $T(0)$.
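The Monte Carlo approximation described above is straightforward to reproduce in outline. In the sketch below (ours, not the authors' code) the limiting correlations of $Z_1, \ldots, Z_K$ are obtained by numerically integrating the covariance formula of Theorem 3 for a user-supplied $\sigma^2(x)$, rather than from the closed form (15); the function names and the particular quadratic variance function are illustrative assumptions.

import numpy as np

def limiting_corr_matrix(sigma2, K, n_grid=2000):
    """Correlation matrix of (Z_1, ..., Z_K) from Theorem 3: the Gram matrix of
    cos(pi j x) under the weight sigma2(x), normalized to unit diagonal.
    Integrals over [0, 1] are approximated by midpoint sums."""
    x = (np.arange(n_grid) + 0.5) / n_grid
    w = sigma2(x)
    C = np.cos(np.pi * np.outer(np.arange(1, K + 1), x))     # rows: cos(pi j x)
    gram = (C * w) @ C.T / n_grid                            # int sigma2 cos(pi j x) cos(pi k x) dx
    d = np.sqrt(np.diag(gram))
    return gram / np.outer(d, d)

def percentile_t_het(sigma2, K=80, n_rep=10000, q=95, seed=0):
    """Approximate the q-th percentile of max_{1<=k<=K} k^{-1} sum_{j<=k} Z_j^2."""
    rng = np.random.default_rng(seed)
    R = limiting_corr_matrix(sigma2, K)
    Z = rng.multivariate_normal(np.zeros(K), R, size=n_rep, method="svd")
    stat = np.max(np.cumsum(Z ** 2, axis=1) / np.arange(1, K + 1), axis=1)
    return np.percentile(stat, q)

# Quadratic variance function dominated by its x^2 term (illustrative choice)
print(percentile_t_het(lambda x: 0.01 + 0.01 * x + 1.0 * x ** 2))

With a quadratic $\sigma^2(x)$ dominated by its $x^2$ term, the simulated 95th percentile stays close to the homoscedastic value, in line with the conclusion drawn from Table 1.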

4.2 Wild Bootstrap Algorithm

Ideally we desire a method that correctly adjusts a statistic's critical values to account for heteroscedasticity. The wild bootstrap is such a method. It was proposed by Wu (1986), studied further by Liu (1988), and considered in nonparametric regression and given its name by Härdle and Mammen (1993). This procedure is called the wild bootstrap since $n$ different distributions are estimated from the $n$ residuals. The wild bootstrap algorithm we propose is as follows. Let $e_i = Y_i - \bar Y$, $i = 1, \ldots, n$, and define bootstrap data $Y_1^*, \ldots, Y_n^*$ by

$$Y_i^* = \bar Y + e_i \eta_i^*, \qquad i = 1, \ldots, n,$$

where $\eta_1^*, \ldots, \eta_n^*$ are a random sample from an arbitrary distribution having first moment zero and second and third moments both equal to 1. The null distribution of a statistic $S(Y_1, \ldots, Y_n)$ is approximated by using Monte Carlo methods to generate many independent copies of $S(Y_1^*, \ldots, Y_n^*)$. A popular and simple choice for the distribution of $\eta_i^*$ is the two-point distribution that assigns probabilities $(\sqrt{5}+1)/(2\sqrt{5})$ and $(\sqrt{5}-1)/(2\sqrt{5})$ to the values $-(\sqrt{5}-1)/2$ and $(\sqrt{5}+1)/2$, respectively. This distribution and two continuous possibilities were proposed by Mammen (1993). The intuitive motivation for the wild bootstrap is that, at least in many cases, it ensures that the first three moments of the bootstrap distribution asymptotically match those of the underlying null distribution. Mammen (1993) provides some theoretical backing for the wild bootstrap in the setting of a linear model whose number of parameters increases sufficiently slowly with the sample size.
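A compact version of this algorithm, applied to the no-effect statistic $T_n$, might look as follows. This is our own sketch: the statistic could equally be $T_{het,n}$, and the function names, sample size and error structure in the example are illustrative.

import numpy as np

def tn(y):
    """Order selection statistic T_n of (5), as in Section 2 (evenly spaced design)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = (np.arange(1, n + 1) - 0.5) / n
    phi = (np.cos(np.pi * np.outer(np.arange(1, n), x)) @ y) / n
    sigma2 = np.mean((y - y.mean()) ** 2)
    return np.max(np.cumsum(2.0 * n * phi ** 2 / sigma2) / np.arange(1, n))

def wild_bootstrap_test(y, stat=tn, n_boot=1000, seed=0):
    """Wild bootstrap test of the no-effect hypothesis, Section 4.2:
    Y_i* = Ybar + e_i * eta_i*, with eta_i* drawn from the two-point distribution."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    e = y - y.mean()
    s5 = np.sqrt(5.0)
    values = np.array([-(s5 - 1) / 2, (s5 + 1) / 2])             # support points
    probs = np.array([(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)])  # their probabilities
    t_obs = stat(y)
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        eta = rng.choice(values, size=n, p=probs)
        t_star[b] = stat(y.mean() + e * eta)
    return t_obs, np.mean(t_star >= t_obs)                        # statistic and P-value

# Heteroscedastic example: noise variance increasing away from x = 0.5
rng = np.random.default_rng(3)
n = 30
x = (np.arange(1, n + 1) - 0.5) / n
y = 0.1 * np.sin(np.pi * x) + 0.1 * (1 + np.abs(x - 0.5)) * rng.normal(size=n)
print(wild_bootstrap_test(y))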

These results would undoubtedly be useful in a theoretical study of the wild bootstrap in our current setting. However, we defer such a study to future research, and restrict attention in the rest of the paper to simulations and a real-data example.

5. SIMULATION STUDY

The validity and power of our tests are studied in this section via simulation. Validity is investigated using a nominal significance level of 0.05. To detect level differences of 0.01 or larger, 2000 replications were conducted for each simulation study. The advice of Efron and Tibshirani (1993) is followed by using 1000 bootstrap samples on each replication, since the nominal test level is 0.05. Finally, sample sizes of 15, 30 and 80 were used.

5.1 I.I.D. cases

We first consider the simple linear regression model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $x_i = (i - 0.5)/n$, $i = 1, \ldots, n$, and $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. random variables with mean 0 and variance $\sigma^2$. The variance $\sigma^2$ was taken to be 0.01, and $\beta_1 = 0$, 0.03, 0.09, 0.18 and 0.4. Four different choices for the error distribution were considered: Gaussian, exponential, t with 4 degrees of freedom, and uniform. Each of these distributions was shifted and rescaled as needed to yield a mean of 0 and variance 0.01. We considered a bootstrap test, a wild bootstrap test, an asymptotic test and a parametric t-test. Both types of bootstrap test and the asymptotic test use the test statistic $T_n$ defined in (5). Critical values of the asymptotic test are percentiles of the distribution $F_{OS}$ (Hart (1997), p. 178). Critical values of the bootstrap and wild bootstrap tests were obtained as described in Sections 3 and 4.2, respectively. For the distribution of $\eta_i^*$ in the wild bootstrap, we used the two-point distribution defined in Section 4.2. The parametric t-test is simply the classical t-test of $H_0: \beta_1 = 0$ based on the assumption of normally distributed errors. Finally, two additional tests were considered to investigate how the power of the bootstrap tests is affected (if at all) by using the wrong error distribution when the alternative hypothesis is true. For a data set $Y_1, \ldots, Y_n$, define

$$E_i = Y_i - \hat\beta_0 - \hat\beta_1 x_i, \qquad i = 1, \ldots, n,$$

where $\hat\beta_0$ and $\hat\beta_1$ are the least squares estimates of intercept and slope, respectively. One may carry out bootstrap and wild bootstrap tests based on these residuals in exactly the same way tests are done with the residuals $e_i = Y_i - \bar Y$, $i = 1, \ldots, n$. We will refer to the two types of tests as (wild) bootstrap e and (wild) bootstrap E tests. So, a total of six tests were carried out for each data set generated.

We first discuss the results for i.i.d. Gaussian errors (Figure 2). To save space we only show what happened for $n = 15$ and 30, although our comments apply to $n = 80$ as well. Both bootstrap e and bootstrap E are satisfactory and very similar in terms of level and power despite the fact that the residuals $E_i$ use knowledge of the regression function. The wild bootstrap tests perform almost as well as the bootstrap

tests except when $n = 15$, this in spite of the true model being homoscedastic. Interestingly, the empirical levels of wild bootstrap e tend to the nominal level from below as the sample size increases, while those of wild bootstrap E have the opposite behavior. Use of the asymptotic distribution, $F_{OS}$, provides satisfactory results when $n$ is large. However, the empirical level of the asymptotic test is below 0.05 when $n = 15$. Consequently, this test has lower power than do the bootstrap tests, and thus seems less desirable when $n$ is small. As $n$ increases, the five nonparametric tests perform similarly. The parametric t-test provides correct test levels and, as expected, is the most powerful test among the six.

Figure 2. Power comparison for the six tests (bootstrap e, bootstrap E, wild bootstrap e, wild bootstrap E, asymptotic, parametric t) with i.i.d. Gaussian errors, plotted against the slope for $n = 15$ and $n = 30$. The flat line here and in subsequent Section 5 figures indicates the nominal level of 0.05.

The performance of each test was practically invariant to error distribution type, and hence we do not show any results for the nonnormal cases. So far all wild bootstrap results have used the two-point distribution of Section 4.2 for $\eta_i^*$. We also tried two other schemes for generating i.i.d. $\eta_i^*$'s, both of which were proposed by Mammen (1993):

$$\eta_i^{(2)} = V_i/\sqrt{2} + (V_i^2 - 1)/2,$$

where $V_i \sim N(0,1)$, and

$$\eta_i^{(3)} = \bigl(\delta_1 + V_i/\sqrt{2}\bigr)\bigl(\delta_2 + U_i/\sqrt{2}\bigr) - \delta_1\delta_2,$$

where $\delta_1 = (0.75 + \sqrt{17}/12)^{1/2}$, $\delta_2 = (0.75 - \sqrt{17}/12)^{1/2}$, and $V_i$ and $U_i$ are i.i.d. $N(0,1)$ random variables. For large $n$, the three wild bootstrap methods were not much different.
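For completeness, the following sketch (ours) generates draws from the two-point distribution and from the two continuous schemes $\eta^{(2)}$ and $\eta^{(3)}$ as written above, and checks empirically that each has first, second and third moments close to 0, 1 and 1; the function names are illustrative.

import numpy as np

def two_point(rng, size):
    """Two-point distribution of Section 4.2: mean 0, second and third moments 1."""
    s5 = np.sqrt(5.0)
    values = np.array([-(s5 - 1) / 2, (s5 + 1) / 2])
    probs = np.array([(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)])
    return rng.choice(values, size=size, p=probs)

def eta2(rng, size):
    """eta^(2) = V/sqrt(2) + (V^2 - 1)/2 with V standard normal."""
    v = rng.normal(size=size)
    return v / np.sqrt(2) + (v ** 2 - 1) / 2

def eta3(rng, size):
    """eta^(3) = (d1 + V/sqrt(2))(d2 + U/sqrt(2)) - d1*d2."""
    d1 = np.sqrt(0.75 + np.sqrt(17) / 12)
    d2 = np.sqrt(0.75 - np.sqrt(17) / 12)
    v, u = rng.normal(size=size), rng.normal(size=size)
    return (d1 + v / np.sqrt(2)) * (d2 + u / np.sqrt(2)) - d1 * d2

rng = np.random.default_rng(4)
for gen in (two_point, eta2, eta3):
    eta = gen(rng, 10 ** 6)
    # first three moments should all be close to (0, 1, 1)
    print(gen.__name__, np.round([eta.mean(), (eta ** 2).mean(), (eta ** 3).mean()], 3))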

However, the two-point distribution produced more consistent results across the different error distributions, in that test levels did not exceed the nominal level. Use of $\eta_i^{(3)}$ leads to higher power than the other two methods in small sample cases, but has test levels that are slightly high. It is not surprising that each wild bootstrap test performs relatively poorly when $n = 15$, as the wild bootstrap does not take advantage of the i.i.d. error structure.

We also considered the regression functions $r(x) = b\sin(\pi x)$ and $r(x) = b\sin(3\pi x)$ with the errors i.i.d. Gaussian. Figures 3 and 4 reveal that the bootstrap test performs better than the asymptotic test for both functions when $n = 15$. However there is not much difference in power among the three tests at $n = 30$.

Figure 3. Rejection rates of tests with $r(x) = b\sin(\pi x)$ and i.i.d. Gaussian errors, plotted against $b$ for $n = 15$ and $n = 30$.

Figure 4. Rejection rates of tests with $r(x) = b\sin(3\pi x)$ and i.i.d. Gaussian errors, plotted against $b$ for $n = 15$ and $n = 30$.

5.2 Heteroscedastic cases

We now consider the model $Y_i = \beta_0 + \beta_1 x_i + \sigma(x_i)\eta_i$, $i = 1, \ldots, n$, where $\eta_1, \ldots, \eta_n$ are i.i.d. $N(0,1)$ and $\sigma^2(x)$ is a $\cup$-shaped quadratic variance function. The same values of $\beta_1$ as in Section 5.1 were used, and the variance function was scaled so that $\int_0^1\sigma^2(x)\,dx = 0.01$, making the scale more or less comparable to that in Section 5.1. The same six tests as before are used. The simulation shows (Figures 5 and 6) that the level of the parametric t-test is much higher than the nominal level. The test based on $T_n$ and the i.i.d. limit distribution $F_{OS}$ also has excessive empirical levels. Bootstrap e and bootstrap E fail to maintain the correct test levels, as expected. The only valid test seems to be the wild bootstrap, although its empirical level tends to be low, leading to low power for $n = 15$.

Figure 5. Power comparison of the six tests in a heteroscedastic case with the $\cup$-shaped variance function and using test statistic $T_n$, plotted against the slope for $n = 15$ and $n = 30$.

As argued in Section 4.1, $T_{het,n}$ may have an advantage over $T_n$ in heteroscedastic cases. Evidence to this effect is seen in the simulation. When both $T_{het,n}$ and $T_n$ are compared to the large sample homoscedastic critical values, the $T_{het,n}$-based test has better level accuracy. (Compare Figures 5 and 6.) Likewise, level accuracy of the e and E tests is better for $T_{het,n}$ than for $T_n$. In other words, $T_{het,n}$ seems to be the more robust statistic to departures from homoscedasticity.

Figure 6. Power comparison of the six tests in a heteroscedastic case with the $\cup$-shaped variance function and using test statistic $T_{het,n}$, plotted against the slope for $n = 15$ and $n = 30$.

We also tried a $\cap$-shaped quadratic variance function $\sigma^2(x)$; see Figures 7 and 8. Here, the levels of the bootstrap and asymptotic tests based on $T_n$ and of the parametric t-test are significantly lower than the nominal level. The $\cup$- and $\cap$-shaped variance functions affect the test levels in opposite directions. However, tests based on $T_{het,n}$ still produce results with more accurate test levels, again showing its robustness to heteroscedasticity. The $\cap$-shaped variance function does not seriously affect power relative to the homoscedastic case. The bootstrap and wild bootstrap tests based on $T_{het,n}$ tend to have higher power in general, which is undoubtedly a consequence of their superior level properties.

Simulations for the regression function $r(x) = b\sin(3\pi x)$ with both $\cup$- and $\cap$-shaped error variance functions were also performed. Of the bootstrap, wild bootstrap and homoscedastic asymptotic tests based on $T_n$, only the wild bootstrap test maintained the correct level. The tests based on $T_{het,n}$ all had correct levels, but their power was very low when $n = 15$. We did not see this phenomenon in the straight line model. A possible explanation for the low power of $T_{het,n}$ is that the large variation of $\widehat{\mathrm{Var}}(\hat\phi_j)$ hinders the detection of a regression model with more local curvature. This problem could perhaps be alleviated by smoothing the squared residuals in order to stabilize the estimators $\widehat{\mathrm{Var}}(\hat\phi_j)$.

Figure 7. Power comparison of the six tests in a heteroscedastic case with the $\cap$-shaped variance function and using test statistic $T_n$, plotted against the slope for $n = 15$ and $n = 30$.

Figure 8. Power comparison of the six tests in a heteroscedastic case with the $\cap$-shaped variance function and using test statistic $T_{het,n}$, plotted against the slope for $n = 15$ and $n = 30$.

Using a local linear smoother to obtain variance estimates, we improved the power of the homoscedastic asymptotic test based on $T_{het,n}$. For the wild bootstrap test, we need to smooth the squared residuals for the test statistic and all bootstrap statistics.
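A simple version of this smoothing step might look as follows. This sketch is ours: it uses a Gaussian-kernel local linear smoother with a fixed bandwidth rather than a data-driven one, and the function names and bandwidth value are illustrative.

import numpy as np

def local_linear_smooth(x, y, h):
    """Local linear smoother with a Gaussian kernel and fixed bandwidth h,
    evaluated at the design points x."""
    fitted = np.empty_like(y, dtype=float)
    for i, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)            # kernel weights
        X = np.column_stack([np.ones_like(x), x - x0])    # local linear design
        beta = np.linalg.lstsq(X * w[:, None] ** 0.5, y * w ** 0.5, rcond=None)[0]
        fitted[i] = beta[0]                               # intercept = fit at x0
    return fitted

def smoothed_var_phi(y, h=0.2):
    """Smoothed estimates of Var(phi_j): plug sigma2_hat(x_i), obtained by smoothing
    the squared residuals e_i^2, into (13)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = (np.arange(1, n + 1) - 0.5) / n
    e2 = (y - y.mean()) ** 2
    sigma2_hat = np.maximum(local_linear_smooth(x, e2, h), 1e-12)  # keep positive
    cos2 = np.cos(np.pi * np.outer(np.arange(1, n), x)) ** 2
    return (cos2 @ sigma2_hat) / n ** 2      # n^{-2} sum_i sigma2_hat(x_i) cos^2(pi j x_i)

rng = np.random.default_rng(5)
n = 30
x = (np.arange(1, n + 1) - 0.5) / n
y = 0.1 * (1 + x) * rng.normal(size=n)       # heteroscedastic noise, no signal
print(smoothed_var_phi(y)[:5])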

If a data-driven bandwidth is used for the test statistic, then ideally we would choose a different bandwidth in each wild bootstrap sample. To avoid time-consuming computations, the same bandwidth was used (in a given data set $Y_1, \ldots, Y_n$) for $T_{het,n}$ and all wild bootstrap statistics. The one-sided cross-validation procedure of Hart and Yi (1998) was used to select the bandwidth in $T_{het,n}$. This implementation led to power improvement for the wild bootstrap test. In the straight line model (both i.i.d. and heteroscedastic cases), the tests based on $T_n$ and $T_{het,n}$ had comparable power for $n = 15$. No improvement in power was found by applying the smoothing technique to $T_{het,n}$, which tends to validate our conjecture that the large variation of $\widehat{\mathrm{Var}}(\hat\phi_j)$ is a hindrance only when the regression model has sufficient curvature.

6. EXAMPLE

People with diabetes generally have high blood glucose levels. Uncontrolled high blood glucose can lead to long term complications such as retinopathy, neuropathy, nephropathy, and amputation. The American Diabetes Association (2000) recommends keeping the pre-meal blood glucose level between 80 and 120 mg/dl (4.4 to 6.7 mmol/l), or hemoglobin A1c (HbA1c) below 7%. Daily blood glucose levels can be easily monitored by diabetics themselves. However, the measurements are short-term and relatively unstable, often affected by emotion, meal intake, etc. On the other hand, HbA1c measures long term glycemic control, but requires laboratory testing. In medical practice, daily morning fasting blood glucose (FBG) has been targeted for day to day glycemic control and is considered correlated with HbA1c. Other opinion has expressed the importance of targeting postprandial blood glucose (usually two hours after meals, 2PPBG) to improve long term glycemic control.

The example we consider here is from a clinical trial studying glycemic control in diabetics. The order selection test was used to study the relationships between HbA1c and blood glucose measurements (FBG or 2PPBG). Scatter plots of HbA1c versus FBG and 2PPBG are shown in Figure 9. The variation of HbA1c appears to decrease approximately linearly as FBG increases, whereas the variation of HbA1c seems to be more or less constant in 2PPBG. Since the covariate values are not evenly spaced, we carried out the no-effect tests by regressing HbA1c on the ranks of each covariate (as discussed in Section 2). Table 2 provides strong evidence against the hypothesis that FBG has no effect on HbA1c. P-values for the asymptotic test (using $F_{OS}$), bootstrap test and wild bootstrap test are less than .005 and very similar to each other, for both $T_n$ and $T_{het,n}$. The parametric t-test assuming a straight line relationship between HbA1c and FBG results in a similar P-value for testing $H_0: \beta_1 = 0$. As for the relationship between 2PPBG and HbA1c, the P-values are even more significant, less than 0.0001, for all the tests. These data support the importance of targeting either the morning fasting blood glucose or the postprandial blood glucose to improve the long term glycemic control measured by HbA1c. However, the postprandial glucose seems to

have a greater effect on HbA1c.

Figure 9. HbA1c (%) versus morning fasting blood glucose (mmol/l) and two hour postprandial blood glucose (mmol/l).

Table 2. P-values for tests of the hypotheses that FBG and 2PPBG have no effect on HbA1c.

                        FBG                         2PPBG
Test                    $T_n$      $T_{het,n}$      $T_n$      $T_{het,n}$
$F_{OS}$
Bootstrap
Wild Bootstrap

As we mentioned in Section 2, an attractive feature of the order selection test is that once the null hypothesis is rejected, an immediate smooth estimate of the regression function is at hand. The smooth curves in Figure 9 are Fourier series estimates using truncation points equal to

$$\arg\max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{2n\hat\phi_j^2}{\hat\sigma^2},$$

which in each case was 1.

7. CONCLUSION

In this research we have implemented bootstrap methods to approximate the sampling distribution of order selection test statistics. For homoscedastic errors, the bootstrap is a better method than the asymptotic approximation, especially when the sample size is small. The wild bootstrap, meant for models with heteroscedastic errors,

also has satisfactory performance in i.i.d. cases when the sample size is not small. This is encouraging since in practice there is uncertainty about heteroscedasticity.

As for the residuals used in the bootstrap algorithm, we found two reasons why the residuals $e_i = Y_i - \bar Y$, $i = 1, \ldots, n$, are a good choice for the no-effect hypothesis. First, computing the $e_i$'s is straightforward since no estimate of a regression model is required. Secondly, when the null hypothesis is true, we are sampling directly from the appropriate empirical distribution by resampling from $e_1, \ldots, e_n$. By the same token, resampling from $\hat\varepsilon_i = Y_i - \hat Y_i$ would be appropriate when testing the adequacy of fitted values $\hat Y_1, \ldots, \hat Y_n$.

The order selection tests based on either the i.i.d. asymptotic distribution or the bootstrap are at best approximately valid in heteroscedastic cases. Wild bootstrapping, however, leads to tests that are asymptotically valid. A new test statistic $T_{het,n}$ has been proposed and studied. In contrast to the statistic $T_n$, $T_{het,n}$ explicitly estimates the unknown error variance function. Tests based on $T_{het,n}$ have power comparable to ones based on $T_n$ when the regression function is a straight line. An advantage of $T_{het,n}$ is that it is more robust than $T_n$ to heteroscedasticity when each statistic is compared with critical values based on the assumption of homoscedasticity. A disadvantage of $T_{het,n}$ is that its larger variation (under heteroscedasticity) can hinder its ability to detect regression functions with substantial curvature, such as trigonometric functions. Fortunately, this disadvantage can be repaired by smoothing squared residuals.

What is the best test? There is no simple answer to this question, and none of the tests is best for every situation. However, if the sample size is not too small ($n \ge 30$), the wild bootstrap test seems a good choice. It is the only asymptotically valid test (among those considered) in the heteroscedastic case and performs comparably to the bootstrap and asymptotic tests in i.i.d. cases. In case the sample size is small, the identification of heteroscedasticity will be crucial. The bootstrap test is the best choice for models with i.i.d. errors, whereas the wild bootstrap is better when heteroscedasticity is present. We have seen in the simulation that the test based on the smoothed version of $T_{het,n}$ and using critical values from $F_{OS}$ maintains the correct level remarkably well and also has satisfactory power in the heteroscedastic cases. However, further study is needed to determine its robustness under other patterns of heteroscedasticity.

The simulation was done with evenly spaced designs. We briefly discussed in Section 2 two possible solutions for cases with unevenly spaced or random designs. It would be interesting to see which has the best performance. The effects, if any, that a random design has on the null distribution of the order selection test statistic will be of interest. The exploration of proper bootstrap methods in this setting is a topic for future research.

Thus far we have studied the order selection test of no-effect in simple regression models. Of interest are extensions to testing the fit of general parametric models and also multiple regression settings (i.e., more than one predictor). Extensions in each of these directions are certainly possible. This can be done by combining

ideas in Aerts et al. (1999) and Aerts et al. (2000) with those in the current paper.

APPENDIX

Proof of Theorem 3. Since $E(e_i^2) = \sigma^2(x_i) + O(n^{-1})$ uniformly in $i$, we can use a law of large numbers to establish that

$$\hat\sigma^2 \stackrel{P}{\longrightarrow} \bar\sigma^2 \equiv \int_0^1 \sigma^2(x)\,dx.$$

By Slutsky's Theorem, $T_n$ has the same limit distribution as $\check T_n$, where

$$\check T_n = \max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{2n\hat\phi_j^2}{\bar\sigma^2}.$$

We may write

$$\check T_n = \max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} c_{jn} Z_{jn}^2,$$

where $c_{jn} = 2n\,\mathrm{Var}(\hat\phi_j)/\bar\sigma^2$ and $Z_{jn}^2 = \hat\phi_j^2/\mathrm{Var}(\hat\phi_j)$. Letting $K_n$ be a sequence that goes to $\infty$ at a slower rate than $n$, the proof consists of demonstrating (i) and (ii) below:

(i) $P\bigl(\max_{K_n \le k \le n-1} k^{-1}\sum_{j=1}^{k} c_{jn}Z_{jn}^2 > t\bigr) \to 0$;

(ii) $P\bigl(\max_{1 \le k \le K_n} k^{-1}\sum_{j=1}^{k} c_{jn}Z_{jn}^2 > t\bigr) \to P\bigl(\sup_{k \ge 1} k^{-1}\sum_{j=1}^{k} c_j Z_j^2 > t\bigr)$.

We first prove (i). It is straightforward to show that $k^{-1}\sum_{j=1}^{k} c_j Z_j^2 \to 1$ almost surely as $k \to \infty$, and hence in considering $P(T_n > t)$ we may take $t > 1$. Note that

$$P\Bigl(\max_{K_n \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} c_{jn}Z_{jn}^2 > t\Bigr) \le P\Bigl(\max_{K_n \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} c_{jn}\bigl(Z_{jn}^2 - 1\bigr) > t - \max_{K_n \le k \le n-1}\frac{1}{k}\sum_{j=1}^{k} c_{jn}\Bigr).$$

It may be shown that $k^{-1}\sum_{j=1}^{k} c_{jn} \to 1$ as $k \to \infty$ and $n \to \infty$, and hence, for all sufficiently large $n$, the last probability is at most

$$P\Bigl(\max_{K_n \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} c_{jn}\bigl(Z_{jn}^2 - 1\bigr) > \frac{t-1}{2}\Bigr).$$

Let $Q_{jn} = c_{jn}(Z_{jn}^2 - 1)$. It suffices to show that for any $\delta > 0$,

$$P\Bigl(\max_{K_n \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} Q_{jn} > \delta\Bigr) \to 0.$$

Let $j_1$ be the largest integer such that $2^{j_1} \le K_n$, and let $j_2$ be the largest integer such that $2^{j_2} \le n-1$. For each $n$ and each $j$ with $j_1 \le j \le j_2$, define

$$\xi_{jn} = \max_{2^j \le k < 2^{j+1}} \Bigl| \sum_{r=2^j+1}^{k} Q_{rn} \Bigr|.$$

If $k$ is such that $2^j \le k < 2^{j+1}$ and $k \le n-1$, then

$$\frac{1}{k}\Bigl|\sum_{r=1}^{k} Q_{rn}\Bigr| \le \frac{1}{2^j}\Bigl(\Bigl|\sum_{r=1}^{2^j} Q_{rn}\Bigr| + \xi_{jn}\Bigr).$$

Thus we have

$$\Bigl\{\max_{K_n \le k \le n-1} \frac{1}{k}\sum_{r=1}^{k} Q_{rn} > \delta\Bigr\} \subset \bigcup_{j=j_1}^{j_3}\Bigl(\Bigl\{\Bigl|\sum_{r=1}^{2^j} Q_{rn}\Bigr| > 2^j\delta/2\Bigr\} \cup \bigl\{\xi_{jn} > 2^j\delta/2\bigr\}\Bigr),$$

where $j_3 = j_2 + 1$ if $2^{j_2} < n-1$, and $j_3 = j_2$ otherwise. Hence, by Markov's inequality, we have

$$P\Bigl(\max_{K_n \le k \le n-1} \frac{1}{k}\sum_{r=1}^{k} Q_{rn} > \delta\Bigr) \le \sum_{j=j_1}^{j_3} P\Bigl(\Bigl|\sum_{r=1}^{2^j} Q_{rn}\Bigr| > 2^j\delta/2\Bigr) + \sum_{j=j_1}^{j_3} P\bigl(\xi_{jn} > 2^j\delta/2\bigr)$$

$$\le \frac{4}{\delta^2}\sum_{j=j_1}^{j_3} 2^{-2j}\, E\Bigl(\sum_{r=1}^{2^j} Q_{rn}\Bigr)^2 + \sum_{j=j_1}^{j_3} P\bigl(\xi_{jn} > 2^j\delta/2\bigr). \qquad (16)$$

Using the moment conditions and the boundedness of the cosine function,

$$E\Bigl(\sum_{r=1}^{2^j} Q_{rn}\Bigr)^2 = \mathrm{Var}\Bigl(\sum_{r=1}^{2^j} c_{rn}Z_{rn}^2\Bigr) = O(2^j),$$

and hence the first sum on the right-hand side of inequality (16) is of the order $\sum_{j=j_1}^{j_3} 2^{-j}$, which tends to 0 as $K_n \to \infty$. Again, by Markov's inequality, we have

$$\sum_{j=j_1}^{j_3} P\bigl(\xi_{jn} > 2^j\delta/2\bigr) \le \frac{4}{\delta^2}\sum_{j=j_1}^{j_3} 2^{-2j}\, E\xi_{jn}^2.$$

We now use a result of Serfling (1970) to deal with $E\xi_{jn}^2$. Denote the joint distribution function of the random variables $Q_{a+1,n}, \ldots, Q_{a+k,n}$ by $F_{a,k}$. Using the moment conditions again, there exists a constant $D$ such that $E\bigl(\sum_{i=a+1}^{a+k} Q_{in}\bigr)^2 \le Dk$. Define $g(F_{a,k})$ to be the following functional: $g(F_{a,k}) = Dk$.

Obviously $g(F_{a,k}) + g(F_{a+k,l}) \le g(F_{a,k+l})$, and $E\bigl(\sum_{i=a+1}^{a+k} Q_{in}\bigr)^2 \le g(F_{a,k})$. Applying Theorem A of Serfling (1970), we have

$$E\xi_{jn}^2 \le \bigl[\log(2 \cdot 2^j)/\log 2\bigr]^2\, 2^j D.$$

Hence

$$\sum_{j=j_1}^{j_3} P\bigl(\xi_{jn} > 2^j\delta/2\bigr) \le \frac{4}{\delta^2}\sum_{j=j_1}^{j_3} 2^{-j}\bigl[\log(2 \cdot 2^j)/\log 2\bigr]^2 D \to 0.$$

Combining the preceding results,

$$P\Bigl(\max_{K_n \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} c_{jn}Z_{jn}^2 > t\Bigr) \to 0,$$

which clearly leads to (i).

Next we show (ii). For any positive integer $K$, let

$$T_n(K) = \max_{1 \le k \le K} \frac{1}{k}\sum_{j=1}^{k} c_{jn}Z_{jn}^2, \qquad T(K) = \max_{1 \le k \le K} \frac{1}{k}\sum_{j=1}^{k} c_j Z_j^2, \qquad T = \sup_{k \ge 1} \frac{1}{k}\sum_{j=1}^{k} c_j Z_j^2,$$

and define the events

$$A_K = \bigl\{T_n(K) > t\bigr\} \quad \text{and} \quad B_K = \Bigl\{\max_{K < k \le K_n} \frac{1}{k}\sum_{j=1}^{k} c_{jn}Z_{jn}^2 > t\Bigr\}.$$

We need to show that for any $\delta > 0$,

$$\bigl| P\bigl(T_n(K_n) > t\bigr) - P(T > t) \bigr| \le \delta$$

for all $n$ sufficiently large. Since $P(T_n(K_n) > t) = P(A_K \cup B_K)$, we have

$$\bigl| P\bigl(T_n(K_n) > t\bigr) - P(T > t) \bigr| \le \bigl| P(A_K) - P\bigl(T(K) > t\bigr) \bigr| + \bigl| P\bigl(T(K) > t\bigr) - P(T > t) \bigr| + P\bigl(B_K \cap A_K^c\bigr).$$

One may show that $\max_{1 \le j \le n-1} |c_{jn} - c_j| \le \varepsilon$ for any $\varepsilon > 0$ and all $n$ sufficiently large. Using this fact, the joint asymptotic normality of $(\hat\phi_1, \ldots, \hat\phi_K)$, and the continuous mapping theorem, we have

$$\bigl| P(A_K) - P\bigl(T(K) > t\bigr) \bigr| \to 0 \qquad (17)$$

for all fixed $K$ as $n \to \infty$. Clearly there exists $K_a$ such that

$$\bigl| P\bigl(T(K_a) > t\bigr) - P(T > t) \bigr| \le \delta/3. \qquad (18)$$

Arguing as in our proof of (i), for any $\varepsilon > 0$ there exists $K_0 = K_0(\varepsilon)$ such that

$$P\Bigl(\max_{K_0 \le k \le K_n} \frac{1}{k}\sum_{j=1}^{k} c_{jn}Z_{jn}^2 > t\Bigr) \le \varepsilon \qquad (19)$$

for all $n$ sufficiently large. Using equations (17), (18), (19) and choosing $K_b = \max\{K_0(\delta/3), K_a\}$, we have

$$\bigl| P\bigl(T_n(K_n) > t\bigr) - P(T > t) \bigr| \le \bigl| P(A_{K_b}) - P\bigl(T(K_b) > t\bigr) \bigr| + \bigl| P\bigl(T(K_b) > t\bigr) - P(T > t) \bigr| + P\bigl(B_{K_b} \cap A_{K_b}^c\bigr) \le \delta$$

for all $n$ sufficiently large, which completes the proof of Theorem 3.

Proof of Theorem 4. Using essentially the same argument as in the proof of Theorem 3, it can be shown that

$$\max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{\hat\phi_j^2}{\mathrm{Var}(\hat\phi_j)} \stackrel{D}{\longrightarrow} \sup_{k \ge 1} \frac{1}{k}\sum_{j=1}^{k} Z_j^2. \qquad (20)$$

Now,

$$T_{het,n} = \max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{\hat\phi_j^2}{\widehat{\mathrm{Var}}(\hat\phi_j)} = \max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{\hat\phi_j^2}{\mathrm{Var}(\hat\phi_j)} \cdot \frac{\mathrm{Var}(\hat\phi_j)}{\widehat{\mathrm{Var}}(\hat\phi_j)}.$$

Note that

$$\Bigl| T_{het,n} - \max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{\hat\phi_j^2}{\mathrm{Var}(\hat\phi_j)} \Bigr| \le \max_{1 \le j \le n-1}\Bigl|\frac{\mathrm{Var}(\hat\phi_j)}{\widehat{\mathrm{Var}}(\hat\phi_j)} - 1\Bigr| \cdot \max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{\hat\phi_j^2}{\mathrm{Var}(\hat\phi_j)}.$$

Using this last inequality, (20) and assumption (14), it now follows that

$$T_{het,n} - \max_{1 \le k \le n-1} \frac{1}{k}\sum_{j=1}^{k} \frac{\hat\phi_j^2}{\mathrm{Var}(\hat\phi_j)}$$

converges in probability to 0, and the result follows.

REFERENCES

Aerts, M., Claeskens, G., and Hart, J. D. (1999), "Testing the Fit of a Parametric Function," Journal of the American Statistical Association, 94.

Aerts, M., Claeskens, G., and Hart, J. D. (2000), "Testing Lack of Fit in Multiple Regression," Biometrika, 87.

American Diabetes Association (2000), "Standards of Medical Care for Patients With Diabetes Mellitus," Diabetes Care, 23, Supplement, S32-S42.

Bhattacharya, R. N., and Ranga Rao, R. (1976), Normal Approximation and Asymptotic Expansions, New York: Wiley.

Efron, B., and Tibshirani, R. J. (1993), An Introduction to the Bootstrap, New York: Chapman and Hall.

Eubank, R. L., and Hart, J. D. (1992), "Testing Goodness of Fit in Regression via Order Selection Criteria," The Annals of Statistics, 20.

Hall, P., and Wilson, S. R. (1991), "Two Guidelines for Bootstrap Hypothesis Testing," Biometrics, 47.

Härdle, W., and Mammen, E. (1993), "Comparing Nonparametric Versus Parametric Regression Fits," The Annals of Statistics, 21.

Hart, J. D. (1997), Nonparametric Smoothing and Lack-of-Fit Tests, New York: Springer-Verlag.

Hart, J. D., and Yi, S. (1998), "One-Sided Cross-Validation," Journal of the American Statistical Association, 93.

Kuchibhatla, M., and Hart, J. D. (1996), "Smoothing-Based Lack-of-Fit Tests: Variations on a Theme," Journal of Nonparametric Statistics, 7, 1-22.

Liu, R. Y. (1988), "Bootstrap Procedures Under Some Non-I.I.D. Models," The Annals of Statistics, 16.

Mammen, E. (1993), "Bootstrap and Wild Bootstrap for High Dimensional Linear Models," The Annals of Statistics, 21.

Parzen, E. (1981), "Nonparametric Statistical Data Science: A Unified Approach Based on Density Estimation and Testing for White Noise," Technical report, Department of Statistics, Texas A&M University.

Qumsiyeh, M. B. (1990), "Edgeworth Expansion in Regression Models," Journal of Multivariate Analysis, 35.

Qumsiyeh, M. B. (1994), "Bootstrapping and Empirical Edgeworth Expansions in Multiple Linear Regression Models," Communications in Statistics - Theory and Methods, 23.

Serfling, R. J. (1970), "Moment Inequalities for the Maximum Cumulative Sum," The Annals of Mathematical Statistics, 41.

Spitzer, F. (1956), "A Combinatorial Lemma and Its Application to Probability Theory," Transactions of the American Mathematical Society, 82.

Wu, C. F. J. (1986), "Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis," The Annals of Statistics, 14.


More information

Optimal Estimation of a Nonsmooth Functional

Optimal Estimation of a Nonsmooth Functional Optimal Estimation of a Nonsmooth Functional T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania http://stat.wharton.upenn.edu/ tcai Joint work with Mark Low 1 Question Suppose

More information

Minimum distance tests and estimates based on ranks

Minimum distance tests and estimates based on ranks Minimum distance tests and estimates based on ranks Authors: Radim Navrátil Department of Mathematics and Statistics, Masaryk University Brno, Czech Republic (navratil@math.muni.cz) Abstract: It is well

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Improving linear quantile regression for

Improving linear quantile regression for Improving linear quantile regression for replicated data arxiv:1901.0369v1 [stat.ap] 16 Jan 2019 Kaushik Jana 1 and Debasis Sengupta 2 1 Imperial College London, UK 2 Indian Statistical Institute, Kolkata,

More information

THE information capacity is one of the most important

THE information capacity is one of the most important 256 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 1, JANUARY 1998 Capacity of Two-Layer Feedforward Neural Networks with Binary Weights Chuanyi Ji, Member, IEEE, Demetri Psaltis, Senior Member,

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

HANDBOOK OF APPLICABLE MATHEMATICS

HANDBOOK OF APPLICABLE MATHEMATICS HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester

More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Simulating Uniform- and Triangular- Based Double Power Method Distributions Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model

More information

high-dimensional inference robust to the lack of model sparsity

high-dimensional inference robust to the lack of model sparsity high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

Test Volume 11, Number 1. June 2002

Test Volume 11, Number 1. June 2002 Sociedad Española de Estadística e Investigación Operativa Test Volume 11, Number 1. June 2002 Optimal confidence sets for testing average bioequivalence Yu-Ling Tseng Department of Applied Math Dong Hwa

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Topics in Probability Theory and Stochastic Processes Steven R. Dunbar. Worst Case and Average Case Behavior of the Simplex Algorithm

Topics in Probability Theory and Stochastic Processes Steven R. Dunbar. Worst Case and Average Case Behavior of the Simplex Algorithm Steven R. Dunbar Department of Mathematics 203 Avery Hall University of Nebrasa-Lincoln Lincoln, NE 68588-030 http://www.math.unl.edu Voice: 402-472-373 Fax: 402-472-8466 Topics in Probability Theory and

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

A comparison study of the nonparametric tests based on the empirical distributions

A comparison study of the nonparametric tests based on the empirical distributions 통계연구 (2015), 제 20 권제 3 호, 1-12 A comparison study of the nonparametric tests based on the empirical distributions Hyo-Il Park 1) Abstract In this study, we propose a nonparametric test based on the empirical

More information

Information Theoretic Asymptotic Approximations for Distributions of Statistics

Information Theoretic Asymptotic Approximations for Distributions of Statistics Information Theoretic Asymptotic Approximations for Distributions of Statistics Ximing Wu Department of Agricultural Economics Texas A&M University Suojin Wang Department of Statistics Texas A&M University

More information

University of California San Diego and Stanford University and

University of California San Diego and Stanford University and First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008 K-sample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota Submitted to the Annals of Statistics arxiv: math.pr/0000000 SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION By Wei Liu and Yuhong Yang University of Minnesota In

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

The Finite Sample Properties of the Least Squares Estimator / Basic Hypothesis Testing

The Finite Sample Properties of the Least Squares Estimator / Basic Hypothesis Testing 1 The Finite Sample Properties of the Least Squares Estimator / Basic Hypothesis Testing Greene Ch 4, Kennedy Ch. R script mod1s3 To assess the quality and appropriateness of econometric estimators, we

More information

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th DISCUSSION OF THE PAPER BY LIN AND YING Xihong Lin and Raymond J. Carroll Λ July 21, 2000 Λ Xihong Lin (xlin@sph.umich.edu) is Associate Professor, Department ofbiostatistics, University of Michigan, Ann

More information

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Chanseok Park 1 Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 Short Title: Smooth

More information

DS-GA 1002 Lecture notes 2 Fall Random variables

DS-GA 1002 Lecture notes 2 Fall Random variables DS-GA 12 Lecture notes 2 Fall 216 1 Introduction Random variables Random variables are a fundamental tool in probabilistic modeling. They allow us to model numerical quantities that are uncertain: the

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

The Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests

The Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests Working Paper 2013:8 Department of Statistics The Number of Bootstrap Replicates in Bootstrap Dickey-Fuller Unit Root Tests Jianxin Wei Working Paper 2013:8 June 2013 Department of Statistics Uppsala

More information

A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1

A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1 A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1 Jinglin Zhou Hong Wang, Donghua Zhou Department of Automation, Tsinghua University, Beijing 100084, P. R. China Control Systems Centre,

More information

Inferential statistics

Inferential statistics Inferential statistics Inference involves making a Generalization about a larger group of individuals on the basis of a subset or sample. Ahmed-Refat-ZU Null and alternative hypotheses In hypotheses testing,

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Rank-sum Test Based on Order Restricted Randomized Design

Rank-sum Test Based on Order Restricted Randomized Design Rank-sum Test Based on Order Restricted Randomized Design Omer Ozturk and Yiping Sun Abstract One of the main principles in a design of experiment is to use blocking factors whenever it is possible. On

More information

Bootstrap Tests: How Many Bootstraps?

Bootstrap Tests: How Many Bootstraps? Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002

More information

Inference in VARs with Conditional Heteroskedasticity of Unknown Form

Inference in VARs with Conditional Heteroskedasticity of Unknown Form Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators

A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Statistics Preprints Statistics -00 A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Jianying Zuo Iowa State University, jiyizu@iastate.edu William Q. Meeker

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Recursive Least Squares for an Entropy Regularized MSE Cost Function

Recursive Least Squares for an Entropy Regularized MSE Cost Function Recursive Least Squares for an Entropy Regularized MSE Cost Function Deniz Erdogmus, Yadunandana N. Rao, Jose C. Principe Oscar Fontenla-Romero, Amparo Alonso-Betanzos Electrical Eng. Dept., University

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Eco517 Fall 2014 C. Sims FINAL EXAM

Eco517 Fall 2014 C. Sims FINAL EXAM Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,

More information

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June

More information

Bootstrap (Part 3) Christof Seiler. Stanford University, Spring 2016, Stats 205

Bootstrap (Part 3) Christof Seiler. Stanford University, Spring 2016, Stats 205 Bootstrap (Part 3) Christof Seiler Stanford University, Spring 2016, Stats 205 Overview So far we used three different bootstraps: Nonparametric bootstrap on the rows (e.g. regression, PCA with random

More information

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Bootstrap-Based Improvements for Inference with Clustered Errors

Bootstrap-Based Improvements for Inference with Clustered Errors Bootstrap-Based Improvements for Inference with Clustered Errors Colin Cameron, Jonah Gelbach, Doug Miller U.C. - Davis, U. Maryland, U.C. - Davis May, 2008 May, 2008 1 / 41 1. Introduction OLS regression

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Uniformly and Restricted Most Powerful Bayesian Tests

Uniformly and Restricted Most Powerful Bayesian Tests Uniformly and Restricted Most Powerful Bayesian Tests Valen E. Johnson and Scott Goddard Texas A&M University June 6, 2014 Valen E. Johnson and Scott Goddard Texas A&MUniformly University Most Powerful

More information

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College Original December 2016, revised July 2017 Abstract Lewbel (2012)

More information

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

GMM Estimation of a Maximum Entropy Distribution with Interval Data

GMM Estimation of a Maximum Entropy Distribution with Interval Data GMM Estimation of a Maximum Entropy Distribution with Interval Data Ximing Wu and Jeffrey M. Perloff January, 2005 Abstract We develop a GMM estimator for the distribution of a variable where summary statistics

More information

Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap

Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap Emmanuel Flachaire To cite this version: Emmanuel Flachaire. Bootstrapping heteroskedastic regression models: wild bootstrap

More information