Coverage Error Optimal Confidence Intervals
Sebastian Calonico    Matias D. Cattaneo    Max H. Farrell

August 3, 2018

Abstract

We propose a framework for ranking confidence interval estimators in terms of their uniform coverage accuracy. The key ingredient is the (existence and) quantification of the error in coverage of competing confidence intervals, uniformly over some empirically-relevant class of data generating processes. The framework employs the check function to quantify coverage error loss, which allows researchers to incorporate their preferences over under- and over-coverage; confidence intervals attaining the best-possible uniform coverage error are minimax optimal. We demonstrate the usefulness of our framework with three distinct applications. First, we establish novel uniformly valid Edgeworth expansions for nonparametric local polynomial regression, offering some technical results that may be of independent interest, and use them to characterize the coverage error of, and rank, confidence interval estimators for the regression function and its derivatives. As a second application, we consider inference in least squares linear regression under potential misspecification, ranking interval estimators utilizing uniformly valid expansions already established in the literature. Third, we study heteroskedasticity-autocorrelation robust inference to showcase how our framework can unify existing conclusions. Several other potential applications are mentioned.

Keywords: minimax bound, Edgeworth expansion, nonparametric regression, robust bias correction, linear models, bootstrap, heteroskedasticity-autocorrelation robust inference, optimal inference.

The second author gratefully acknowledges financial support from the National Science Foundation (SES and SES ). We thank Federico Bugni, Chris Hansen, Michael Jansson, Andres Santos, Azeem Shaikh, Rocio Titiunik, and participants at various seminars and conferences for comments.
Department of Economics, University of Miami. Department of Economics and Department of Statistics, University of Michigan. Booth School of Business, University of Chicago.
1 Introduction

Researchers typically have a range of options for constructing confidence intervals for a parameter of interest in empirical work. These options arise from many sources: how the sampling distribution is approximated, how standard errors or quantiles are constructed, how tuning and smoothing parameters are selected, or what concept of validity/robustness is used. Often many competing interval estimators are valid in some principled sense, but it is difficult to choose which are best among them. When two options are asymptotically equivalent to first order, for a given data generating process, practical guidance does not follow from theory. Further, it is rare that economic or other subject-specific theory dictates a choice of inference procedure. Instead, the model and accompanying assumptions formalize what the researcher believes to be a plausible class of distributions that could have generated the data, and she would like some assurance that the chosen confidence interval is accurate in level regardless of the specific data generating process. It is natural and desirable to know if any confidence intervals are more accurate than others over this class of distributions. We propose a framework that quantifies inference quality according to the coverage error of competing confidence interval estimators, uniformly over a class of data-generating processes (DGPs), thereby allowing us to rank those estimators. In a nutshell, we look at the worst-case coverage error for a given confidence region over the distributions allowed by the researcher's assumptions and then use this information to characterize an optimal inference procedure: a minimax notion of optimality. We employ the check function loss to quantify coverage error, which allows for asymmetric penalization of under- or over-coverage in the optimality criterion.
For example, it is common practice to prefer conservative confidence intervals, perhaps at the expense of interval length, and our proposed framework can incorporate such a preference directly. We necessarily restrict attention to classes of confidence regions for which the uniform coverage error can be quantified in some way. In the generic framework outlined in Section 2, we give nested levels of knowledge required of such a quantification. At heart, each of these involves some degree of higher-order asymptotic analysis (which is precisely how we can distinguish between first-order equivalent procedures). The weakest assumption is simply bounds on the rate of decay of the worst-case coverage error. When these bounds are nontrivial, we are able to conclusively rank
some procedures. The strongest assumption consists of a full and precise quantification of the leading terms of coverage error, in which case we can rank all procedures, identify minimax optimal rates (of coverage error decay), and single out optimal inference procedures. Optimization over constants is also possible in our framework. We show through examples that the required levels of knowledge are available in many contexts, and even the most stringent requirement is often met. In an application to nonparametric regression, we develop a uniformly valid coverage error expansion and use it to give a novel recipe for optimal inference. In other applications, we rely on existing results, from bounds to expansions, and show how our framework can be used to unify and extend previous rankings among competing inference procedures. In all cases, our framework gives principled guidance to practitioners. In our framework, both the class of confidence intervals and the class of data generating processes play a crucial role: together they determine the lower ("min") and upper ("max") portions of the minimax criterion, and in particular, neither class should be too large nor too small if one is to obtain useful and interesting results. If the class of intervals is too small, it will not reflect the range of choices available to the practitioner, while if it is too large, the optimal procedure may be infeasible or not useful. For example, the interval estimator set to the real line with probability (1 − α) and empty otherwise yields perfect coverage over any possible distribution, but is uninformative. In much the same way, if the assumed class of distributions is too small, it is unlikely to be rich enough to be useful in real-world applications.
Economic or other field-specific theory often does not impose tight restrictions on the distribution of the data (such as Gaussianity), and any inference procedure designed to be optimal or valid under such restrictions may exhibit poor behavior when the restrictions do not hold: the class is too small for the uniformity to be interesting. On the other hand, some restrictions on the distribution are necessary, because if the class is too large then uniformly valid and informative inference procedures cannot be constructed, an idea with a long history that has been studied in a variety of settings (see Bahadur and Savage, 1956; Dufour, 1997; Romano and Wolf, 2000; Romano, 2004; Hirano and Porter, 2012, just to name a few examples). The interplay between the two classes will be crucial for our results. To illustrate, consider nonparametric regression, which we study in Section 3. In this context, the class of intervals may restrict attention to only low-order approximations, for example, all methods removing bias up to a certain order, even if the underlying functions of the DGP are assumed to possess more smoothness.
Alternatively, it could be that the intervals are based on approximations that try to utilize more smoothness than is available. The optimal coverage error rate is affected by these assumptions, as we study in detail below. This interplay impacts the optimal coverage error rate in other contexts as well, and applies to conditions other than smoothness, such as orthogonality of the errors in a linear model, as studied by Kline and Santos (2012). We employ their work to provide a second application of our generic framework in Section 4. Highlighting the importance of this interplay between the class of DGPs and the class of confidence intervals when developing coverage error minimax optimality is an additional methodological contribution that emerges naturally from our general framework (and the specific examples we study). The remainder of this paper proceeds as follows. Section 2 proposes our generic framework for ranking confidence interval estimators. Sections 3, 4, and 5 then consider three distinct applications of our framework to, respectively, nonparametric local polynomial regression, traditional least squares linear regression, and heteroskedasticity-autocorrelation robust (HAR) inference. The breadth of these applications spans parametric and nonparametric estimands and nuisance parameters, standard and nonstandard limiting distributions, and cross-sectional and dependent data. Section 6 briefly mentions several other possible applications and concludes. The online supplement contains omitted formulas and proofs. Beyond its use as an ingredient in our framework, Section 3 contains new technical and methodological results that may be of independent interest.
We establish uniformly valid Edgeworth expansions for local polynomial regression, and use them to derive inference-optimal bandwidth choices that minimize coverage error or, alternatively, balance coverage error against interval length, which may lead to a shorter (more powerful) interval that is still valid. Calonico, Cattaneo, and Farrell (2018b) develop these ideas further for the specific case of regression discontinuity designs.

1.1 Related Literature

Romano (2004), Wasserman (2006), and Romano, Shaikh, and Wolf (2010) give introductions to uniform validity and optimality of confidence interval estimators in particular, and statistical inference more generally, thus providing background review and references for this paper. Hall and Jing (1995) is the paper most closely related to our work: they establish minimax bounds for one-sided confidence interval estimators based on t-test statistics, relying on Edgeworth expansions
for the Studentized sample mean. Our general framework was inspired by their paper but is quite different from their work and, to the best of our knowledge, new in the literature. To be more specific, while we also consider a minimax optimality criterion to rank confidence intervals, our framework applies more generally to a large class of (one- or two-sided) confidence intervals and under a wide range of conceptually distinct sufficient conditions, which can be verified in many other settings beyond the case of a parametric location model under i.i.d. data. The idea of ranking inference procedures using coverage error, or the equivalent notion of error in rejection probability, has appeared before in the econometrics literature. Two important examples are Jansson (2004) and Bugni (2010, 2016), who give Berry-Esseen-type bounds for inference procedures as a means of ranking them. Our framework encompasses this type of bound and conclusion as one possible (weak) way of ranking confidence intervals, although it goes beyond their specific approach: neither of these works, nor others that we are aware of, have laid out the minimax framework of Section 2. Nevertheless, these papers are discussed in more detail below in the context of our framework and specific examples. Ranking inference procedures in general, sometimes uniformly over data-generating processes, has a longer history in econometrics and statistics. We cannot hope to do justice to this literature, and hence only mention a few relevant and recent examples beyond those already cited. Beran (1982) studies the uniform optimality of bootstrap inference procedures. Backus (1989) and Donoho (1994) construct minimax confidence intervals for regression under parametric (e.g., Gaussian) assumptions, with the latter reference also showing some interval length optimality properties. Rothenberg (1984) develops higher-order (size and power) comparisons of classical testing procedures.
Horowitz and Spokoiny (2001) discuss minimax optimal rates for hypothesis testing. Schafer and Stark (2009) construct confidence intervals with optimal expected size and some minimax optimality. Elliott, Müller, and Watson (2015) establish upper bounds on power for tests involving a nuisance parameter. Müller and Norets (2016) propose a notion of betting-based uniformity guarantees as a measure of (minimax) inference quality. More references and related approaches are discussed in these works.
2 Framework

Our framework centers around inference on a parameter denoted θ, and we write θ_F for the value of the true parameter when the data are generated according to distribution F. We study confidence interval estimators for θ_F, denoted I, that have nominal 100(1 − α)% coverage. Our ultimate goal is to provide a ranking of all confidence intervals I in a class I, uniformly over a class F of DGPs, where the ranking is determined by the accuracy of coverage with respect to the nominal level. This ranking requires some knowledge of the coverage error, and naturally the more that is known, the more precise the ranking will be. This is formalized in the three assumptions below, Assumptions CEB, CER, and CEE, which are progressively stronger. We first discuss our measure of coverage accuracy as well as the underlying classes F and I. The class F of plausible DGPs is defined by the modeling assumptions and the empirical regularities of the application of interest, and the researcher would like some assurance that coverage is accurate no matter which F ∈ F generated the data. Thus, it is reasonable to evaluate a confidence interval I ∈ I by studying the worst-case coverage error within the plausible set of DGPs. We thus measure the worst-case coverage error as

    sup_{F ∈ F} L( P_F[θ_F ∈ I] − (1 − α) ),    (2.1)

where L(e) = L_τ(e) = e(τ − 1{e < 0}) is the check function for a given τ ∈ (0, 1). We focus on L_τ(e) for its concreteness and usefulness, but could use other well-behaved loss functions, or augment the loss with a penalty for interval length to rule out interval estimators that are infinite with positive probability. Using the check function loss allows the researcher, through their choice of τ, to evaluate inference procedures according to their preferences over over- and under-coverage. Setting τ = 1/2 recovers the usual, symmetric measure of coverage error. Guarding more against undercoverage requires choosing τ < 1/2.
For example, setting τ = 1/3 encodes the belief that undercoverage is twice as bad as the same amount of overcoverage. Intuitively, a good confidence interval is one for which this maximal coverage error is minimized. More precisely, we seek a minimax optimal confidence interval. Since coverage error cannot be written as expected loss, our analysis does not fit neatly within traditional minimax risk analyses, and the corresponding established
tools and results do not apply. Lastly, controlling (2.1) is qualitatively different from uniform size control: see Remark 1 below. The identity and properties of an optimal confidence interval will depend not only on what is assumed about F, but also on which confidence intervals are considered, that is, the class I of confidence intervals (more generally, inference procedures) under consideration. We wish to be agnostic about both F and I, so that our results are as useful as possible and provide tight guidance for empirical practice. That is, the larger F, the more likely it is that a given data set is generated by some F ∈ F, and the larger I, the more certain the researcher can be that she is using the best procedure for inference. At the same time, both must be restricted in order to obtain effective bounds on (2.1). The sizes of the two classes, F and I, must be considered together, and interesting problems naturally balance their sizes/complexities. To illustrate these ideas, and motivate our results, consider the case of forming a confidence interval for the mean of a scalar random variable. Let the data be a random sample {X_i : i = 1, …, n} from a scalar random variable X ∼ F, and let θ_F = E_F[X]. First, consider the class I. In order to give useful and tight guidance to a researcher, I can be neither too small nor too large. If I contains very few interval types, we would not be comparing all the procedures available to a researcher. On the other hand, to see that I cannot be too large, consider the interval which is set to the real line with probability (1 − α) and empty otherwise. This interval has uniformly perfect coverage, that is, (2.1) is exactly zero, but is entirely uninformative to the researcher. This applies for any F, no matter how large, but such a seemingly powerful conclusion is only possible because I is not usefully defined.
To illustrate the role of the class F in studying (2.1), suppose that I includes the standard t interval: I = [X̄ ± t_{1−α/2} s/√n], where X̄ is the sample mean, s is the sample standard deviation, and t_{1−α/2} is the 1 − α/2 quantile of the t distribution with n − 1 degrees of freedom. If F is sufficiently restricted, then the t interval is optimal: if only Gaussian DGPs are plausible, that is, F = {F : P_F[X ≤ x] = Φ((x − θ)/σ), θ ∈ ℝ, σ² > 0}, then (2.1) is again exactly zero. The optimality of the t interval holds for any class I, no matter how large, but in parallel to the discussion above, this powerful statement is only possible because F is unrealistically small. On the other hand, well-known results, dating back at least to Bahadur and Savage (1956), show that if F is too large it is impossible to construct an effective confidence interval that controls the worst-case coverage.
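The check loss and the worst-case criterion (2.1) can be made concrete with a small Monte Carlo sketch. This is illustrative only: the three-member "class" of DGPs, the sample size, and all tuning choices are our own assumptions, and a maximum over a finite set merely mimics the supremum in (2.1).

```python
import numpy as np

T_QUANTILE_29DF = 2.0452  # precomputed t quantile, 1 - alpha/2 = 0.975, 29 df

def check_loss(e, tau=0.5):
    # Check function L_tau(e) = e * (tau - 1{e < 0}); tau < 1/2 penalizes
    # undercoverage (e < 0) more than the same amount of overcoverage.
    return e * (tau - (e < 0.0))

def t_interval_coverage(draw, n=30, alpha=0.05, reps=20_000, seed=0):
    # Monte Carlo coverage of the t interval [Xbar +/- t * s / sqrt(n)]
    # when the data come from `draw`, a mean-zero sampler (theta_F = 0).
    rng = np.random.default_rng(seed)
    x = draw(rng, (reps, n))
    xbar, s = x.mean(axis=1), x.std(axis=1, ddof=1)
    return float(np.mean(np.abs(xbar) <= T_QUANTILE_29DF * s / np.sqrt(n)))

# A toy, finite stand-in for the class F: three mean-zero distributions.
dgps = {
    "normal": lambda rng, size: rng.normal(size=size),
    "centered_exponential": lambda rng, size: rng.exponential(size=size) - 1.0,
    "student_t3": lambda rng, size: rng.standard_t(3, size=size),
}
tau = 1 / 3  # undercoverage counts double relative to overcoverage
losses = {name: check_loss(t_interval_coverage(d) - 0.95, tau)
          for name, d in dgps.items()}
worst_case = max(losses.values())  # finite-class analogue of (2.1)
```

Under the Gaussian DGP the coverage error is (up to simulation noise) zero, matching the exact-coverage discussion above, while the skewed DGP drives the worst case; enlarging the toy class of DGPs can only increase worst_case.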
The restriction that I be effective rules out setting I = ℝ with a certain probability, and other such examples. (We may augment the coverage error loss function so that such intervals are not optimal, but rarely would ranking these be of interest to researchers.) Again, this indicates that F and I must be defined together so that the problem is interesting and the results useful. In the artificial examples above it is possible to reduce (2.1) to zero, but in general this is not possible. However, it is often true that (2.1) vanishes asymptotically, as the sample size grows. Such a confidence interval is uniformly consistent. But even this is not guaranteed. For example, the t interval is pointwise consistent, that is, P_F[θ_F ∈ I] → 1 − α, provided that F permits a central limit theorem to hold for √n(X̄ − θ_F)/s, but (2.1) will not vanish without further restrictions on F. Our central aim will be to quantify the rate at which (2.1) vanishes asymptotically, and to show how this rate depends on F and I. Our rankings formalize the intuition that intervals for which (2.1) vanishes faster are preferred over those for which the rate is slower, and intervals with the fastest possible rate are minimax optimal. This gives a way to rank inference procedures that may otherwise appear equivalent. Depending on what is known about the worst-case coverage, which we will now successively build up, we can provide more informative rankings. In the remainder of the paper, all quantities may vary with n, including F and I (and their members F and I), and limits are taken as n → ∞ unless explicitly stated otherwise. We focus on scalar θ_F for concreteness, but our framework extends naturally to other types of estimands. We begin with a weak notion of ranking confidence intervals, and correspondingly we make a weak assumption about coverage error: only bounds are known for the worst-case coverage.

Assumption CEB: Coverage Error Bounds.
For each I ∈ I, there exist a non-negative sequence R_I and a positive sequence R̄_I such that

    R_I ≤ sup_{F ∈ F} L( P_F[θ_F ∈ I] − (1 − α) ) ≤ R̄_I.

This assumption requires the existence and characterization of lower and upper bounds on the worst-case coverage error of a confidence interval estimator I ∈ I. Trivial bounds are R_I = 0 and R̄_I = 1. Non-trivial bounds can be established employing Berry-Esseen-type bounds and their reversed versions, Edgeworth expansions and related methods, or other higher-order approximations to coverage error. See, for example, Rothenberg (1984), Hall (1992a), and Chen, Goldstein,
and Shao (2010) for reviews and more references. Concrete illustrations of these methods are given below. Assumption CEB is useful in comparing confidence intervals when R̄_I = o(1) and R_I > 0 for at least some I ∈ I. In this case, an interval I1 ∈ I would never be a preferred choice if there were a competing procedure I2 ∈ I whose upper bound lies below the lower bound of I1: heuristically, R̄_{I2} < R_{I1} should mean that I2 ranks above I1. The following definition formalizes this idea.

Definition 1: Domination. Under Assumption CEB, an interval I1 ∈ I is I/F-dominated if there exists I2 ∈ I such that R̄_{I2} = o(R_{I1}).

This idea parallels the notion of (in)admissibility in classical statistical decision theory, separating those confidence intervals that have the potential of being optimal from those that can never be (i.e., intervals that will always be dominated by some other interval estimator in the class I). This is a weak ranking notion for confidence intervals. Nevertheless, it is often useful. For example, in the context of partially identified parameters, Bugni (2010, 2016) compares inference procedures based on an asymptotic distributional approximation (AA), a bootstrap approximation (B), and subsampling (SS), and shows that subsampling-based inference is dominated under assumptions therein. In the notation of Assumption CEB, it is shown that R_{SS} ≍ n^{−1/3}, whereas R̄_{AA} and R̄_{B} are O(n^{−1/2}). (Bugni establishes these upper bounds pointwise in F, but they can be extended to hold uniformly under regularity conditions.) Therefore, subsampling is dominated in this specific setting. Further, we have only the trivial lower bounds R_{AA} = R_{B} = 0, and thus confidence intervals based on the asymptotic approximation and based on the bootstrap cannot be ranked. See Sections 3, 4, and 5 for more detailed examples.
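The bookkeeping behind Definition 1 can be encoded in a few lines. The following sketch is purely illustrative: it assumes polynomial bounds of the form n^(-a), represents them by their exponents, and hardcodes the Bugni-style rates just quoted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bounds:
    # Assumption CEB with polynomial bounds: lower bound ~ n^(-lower_exp)
    # and upper bound ~ n^(-upper_exp); lower_exp = 0.0 encodes the
    # trivial lower bound R = 0.
    lower_exp: float
    upper_exp: float

def is_dominated(i1: Bounds, i2: Bounds) -> bool:
    # Definition 1: I1 is dominated if some I2 has an upper bound that
    # vanishes strictly faster than I1's non-trivial lower bound.
    return i1.lower_exp > 0.0 and i2.upper_exp > i1.lower_exp

# Subsampling (SS) has a sharp n^(-1/3) worst-case rate; the asymptotic
# approximation (AA) and the bootstrap (B) have n^(-1/2) upper bounds
# but only trivial lower bounds.
SS = Bounds(lower_exp=1 / 3, upper_exp=1 / 3)
AA = Bounds(lower_exp=0.0, upper_exp=1 / 2)
B = Bounds(lower_exp=0.0, upper_exp=1 / 2)
```

Here is_dominated(SS, AA) evaluates to True, while is_dominated(AA, B) and is_dominated(B, AA) are both False: subsampling is dominated, but AA and B cannot be ranked from these bounds alone.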
Although it does not provide optimality directly, domination does hint at the notion of optimality: not being dominated by any other member of the class I is a necessary but not sufficient condition for optimality, and intervals that dominate all others in the class I should be optimal. Our next definition formalizes this idea.

Definition 2: Minimax Rate Optimal Interval. Under Assumption CEB, an interval I ∈ I is I/F-minimax coverage error rate optimal if

    liminf_{n→∞} inf_{I′ ∈ I} R̄_I^{−1} sup_{F ∈ F} L( P_F[θ_F ∈ I′] − (1 − α) ) > 0.
This definition is most interesting when R_I > 0 for all I ∈ I and R̄_I = o(1) for at least some. However, notice that we do not require uniformly, or even pointwise, consistent coverage, because R̄_I need not vanish for all I ∈ I. Such intervals can still be ranked; they are simply suboptimal in our framework. For example, this will occur in nonparametrics (Section 3) when using a mean-square error optimal bandwidth to conduct inference without bias reduction or, more generally, with intervals that are asymptotically conservative or liberal. See also Remark 3.1. Even if Assumption CEB holds with R_I > 0 and R̄_I = o(1) for all I ∈ I, we may not be able to find useful rankings if the bounds are too loose. To see why, consider an artificial example in which I has three members, with R_{I1} ≍ n^{−2}, R̄_{I1} ≍ n^{−1}, R_{I2} ≍ n^{−3/2}, R̄_{I2} ≍ n^{−1}, and R_{I3} ≍ R̄_{I3} ≍ n^{−1/2}. Here, I3 is dominated, but we are unable to further rank I1 and I2. Suppose further that we had a sharp bound for I2: R_{I2} ≍ R̄_{I2} ≍ n^{−1}. In this case, only the bounds for I1 do not agree, yet still we do not have enough information to conclusively rank I1 and I2. (Alternatively, we could restate the definition so that both were optimal.) In interesting applications of our framework, the bounds will not be loose. For many examples, including the three applications below, we can find the exact rate at which the worst-case coverage error vanishes for some I ∈ I. We will thus strengthen Assumption CEB by assuming that the lower and upper bounds exhibit the same rate, and that this rate can be characterized.

Assumption CER: Coverage Error Rate. For each I ∈ I, there exists a positive (bounded) sequence r_I such that

    0 < liminf_{n→∞} r_I^{−1} R_I ≤ limsup_{n→∞} r_I^{−1} R̄_I < ∞.

Heuristically, the idea is that the worst-case coverage error of each I ∈ I is bounded and bounded away from zero after appropriate scaling:

    c_I < r_I^{−1} sup_{F ∈ F} L( P_F[θ_F ∈ I] − (1 − α) ) < C_I,

for constants 0 < c_I ≤ C_I < ∞. This rules out intervals with zero worst-case coverage error.
Confidence intervals with exact coverage typically arise when I is too large (e.g., taking the real line with probability 1 − α) or F is too small (e.g., t-test inference in the Gaussian location model), or pertain to special cases (e.g., rank-based tests of the median under symmetry). Our framework can accommodate
situations with zero worst-case coverage error by putting all such procedures in the same equivalence class, but this is not as useful in the current context. Thus, we leave unranked procedures that do not exhibit coverage error for at least one F ∈ F: the main focus of our paper is scenarios where coverage error is unavoidable, arguably the most common case in practice. We can now formalize the minimax optimal rate in our framework.

Definition 3: Minimax Optimal Rate. Under Assumption CER, a sequence r is the I/F-minimax optimal coverage error rate if

    liminf_{n→∞} inf_{I ∈ I} r_I / r > 0.

This definition requires strictly more information than Definition 2. That is, Assumption CER is sufficient but not necessary for identifying the optimal interval in the sense of Definition 2. However, if Assumption CER holds, then any minimax optimal I will attain this rate, and any interval that attains r is of course a minimax optimal interval. Identifying r is often a crucial step in providing practical guidance. Naturally, not all procedures can attain this rate, and in many examples, even for those that can, certain implementation details must be chosen appropriately to yield an optimal interval estimator. For example, in Section 4, wild bootstrap intervals can be optimal, but even within this family, only certain bootstrap weights yield the optimal rate. Assumption CER is not as restrictive as it may seem. Indeed, often it is verified using higher-order asymptotic expansions, in which case even more is known about the worst-case coverage error. In many applications we can characterize the rate and constant of the leading term of the coverage error, as formalized in our final, and strongest, assumption.

Assumption CEE: Coverage Error Expansion. For each I ∈ I, there exist a (bounded) sequence R_{I,F}, with R_{I,F} ≠ 0 for at least one F ∈ F, and a positive (bounded) sequence r_I with R_{I,F} = O(r_I) uniformly in F ∈ F, such that

    sup_{F ∈ F} L( P_F[θ_F ∈ I] − (1 − α) − R_{I,F} ) = o(r_I).    (2.2)

For many classes of confidence intervals Assumption CEE will follow from an Edgeworth expansion or other higher-order approximation. Sections 3, 4, and 5 discuss concrete and empirically important contexts where such an approximation holds, covering parametric and nonparametric estimands and nuisance parameters, Gaussian and non-Gaussian limiting distributions, and cross-sectional and dependent data. Following the structure above, Assumption CEE is more than what is required for finding the rate-optimal interval (Definition 2) or the optimal rate (Definition 3), but with the stronger assumption more can be learned from the constants. Notice that R_{I,F} subsumes the rate and constant, and is thus not restricted to be positive (cf. Assumptions CEB and CER). Without loss of generality we can set R_{I,F} = r_{I,F} C_{I,F}, where r_{I,F} is a positive (usually vanishing) sequence, the rate, and C_{I,F} is a non-vanishing bounded sequence, forming the constant term of the expansion (when it converges). These constants will often be useful to guide practical implementation, such as tuning parameter selection. Section 3 illustrates this point by constructing data-driven coverage-optimal bandwidth selectors in the context of local polynomial nonparametric regression. Calonico, Cattaneo, and Farrell (2018b) further investigate this in the specialized setting of regression discontinuity designs. Furthermore, if C_{I,F} can be appropriately characterized, uniformly over F, it may be possible to optimize both the rate and the constants, i.e., first find the minimax coverage error rate optimal intervals and then, within this group, find the best constants. This brings us to the final definition in our framework.

Definition 4: Minimax Optimal Interval. Under Assumption CEE, an interval I ∈ I is I/F-minimax coverage error optimal if r_I = r (of Definition 3) and

    liminf_{n→∞} inf_{I′ ∈ I} sup_{F ∈ F} L(C_{I′,F}) / sup_{F ∈ F} L(C_{I,F}) ≥ 1.

This ranking requires the strongest assumption but, accordingly, is the most powerful: it gives essentially a complete and strict notion of optimality within I and F.
This level of information is not always available (see Section 5 for an example). Indeed, we will focus on rate optimality (Definitions 1-3) in the subsequent sections, both in our new results and in unifying the literature, relegating the constants to practical implementation only. In future work, we plan to further investigate the optimality notion given in Definition 4.
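As a schematic for how Definitions 3 and 4 fit together, the following toy snippet (hypothetical intervals and numbers, with worst-case expansions of the assumed form const × n^(-rate_exp)) first fixes the minimax rate and then lets constants break ties:

```python
# Hypothetical worst-case expansions sup_F L(R_{I,F}) ~ const * n^(-rate_exp);
# a larger rate_exp means faster-vanishing coverage error.
intervals = {
    "I1": {"rate_exp": 1.0, "const": 0.8},
    "I2": {"rate_exp": 1.0, "const": 0.3},
    "I3": {"rate_exp": 0.5, "const": 0.1},
}

# Definition 3: the minimax optimal rate is the fastest rate attained in I.
best_exp = max(spec["rate_exp"] for spec in intervals.values())
rate_optimal = sorted(name for name, spec in intervals.items()
                      if spec["rate_exp"] == best_exp)

# Definition 4: among the rate-optimal intervals, minimize the worst-case
# constant of the leading coverage error term.
minimax_optimal = min(rate_optimal, key=lambda name: intervals[name]["const"])
```

Note that "I3" has the smallest constant but the slowest rate, so it is eliminated at the rate stage: under this lexicographic notion, a favorable constant cannot compensate for a suboptimal rate.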
Assumptions CEB, CER, and CEE, and Definitions 1-4, complete the description of our proposed optimality framework. It is fairly general with respect to both I and F, and in many cases one or more of the assumptions is verifiable for interesting and large classes of intervals and DGPs. We now turn to three applications: nonparametric regression (Section 3), linear least squares regression (Section 4), and HAR inference (Section 5). Others are mentioned in Section 6.

Remark 1. An alternative idea when ranking confidence interval estimators is to search for the shortest and/or fastest contracting interval among those with asymptotically and/or uniformly conservative coverage (i.e., uniform size control). This method treats coverage as fixed, and not necessarily correct, and optimizes length (or power). We optimize coverage error under assumptions that will, in general, restrict attention to finite-length intervals. The two rankings need not agree because the uniform quality guarantees are different. Specifically, an interval I is uniformly asymptotically conservative at level α if for any δ > 0 there exists an n₀ = n₀(δ) such that for all n ≥ n₀, inf_{F ∈ F} P_F[θ_F ∈ I] ≥ (1 − α) − δ. In contrast, we are interested in intervals for which (2.1) vanishes, which translates to the guarantee that for all n ≥ n₀, sup_{F ∈ F} L( P_F[θ_F ∈ I] − (1 − α) ) < δ. See Romano (2004) and Romano, Shaikh, and Wolf (2010) for more discussion.

Remark 2. Our framework focuses squarely on inference quality, and not on quality of point estimation. In general, these goals are not the same, and our framework highlights the distinction: it may be possible to find an excellent approximation to the sampling distribution of a poor point estimator. There are many ways to measure the quality of a point estimator, perhaps the two most common being mean square error and, among unbiased estimators (or within a bias tolerance), precision/efficiency.
Coverage error improvements can come at the expense of these measures, and our framework can quantify this tradeoff precisely. For example, in Section 3.4 we show that the MSE optimal point estimator is suboptimal in terms of coverage error, and furthermore, in some cases the coverage error optimal interval implicitly uses a point estimator that is not even consistent in mean square, revealing a striking gap between the two notions of quality. The distinction between precision and coverage error is also evident in Section 5, where fixed-b HAR procedures are not asymptotically efficient (manifesting, in particular, as longer intervals) but offer coverage improvements. Some of the examples mentioned in Section 6 have the same features. In general, our framework reinforces the point that when the researcher seeks better statistical
inference they should choose a method explicitly for that goal.

3 Application to Local Polynomial Nonparametric Regression

The first application of our framework is to local polynomial regression. We will characterize the minimax optimal coverage error rate $r$ for a popular class of confidence interval estimators (restricting $\mathcal{I}$) under precise smoothness restrictions on the regression function (restricting $\mathcal{F}$). An important lesson of this section, recalling the discussion above, is that $r$, and the set of intervals which can attain it, depend crucially on both $\mathcal{F}$ and $\mathcal{I}$, in particular through the smoothness assumed for the population regression function (in $\mathcal{F}$) and the smoothness exploited by the interval estimator (in $\mathcal{I}$). We show that, with appropriate choices of bandwidth, standard errors, and quantiles, the robust bias corrected confidence intervals proposed by Calonico, Cattaneo, and Farrell (2018a) are minimax rate optimal in the sense of Definitions 2 and 3.

For this application, we require new technical results to verify Assumption CEE (and hence CEB and CER). Specifically, we obtain novel uniformly valid Edgeworth expansions for local polynomial estimators of the regression function and its derivatives at both interior and boundary points; these are given in the supplemental appendix and underlie Lemma 3.1 below. These results improve upon the current literature by (i) establishing uniformity over empirically-relevant classes of DGPs, (ii) covering derivative estimation, and (iii) allowing for the uniform kernel.

3.1 The Class of Data Generating Processes

To apply our proposed framework, we must make precise the classes $\mathcal{F}$ and $\mathcal{I}$. We begin with $\mathcal{F}$. For a pair of random variables $(Y, X)$, the object of interest is a derivative of the regression function at a point $x$ in the support of $X$:
$$\theta_F = \mu_F^{(\nu)}(x) := \frac{\partial^\nu}{\partial x^\nu} E_F[Y \mid X = x], \qquad (3.1)$$
with $\nu \in \mathbb{Z}_+$. For the rest of this section, we assume $x = 0$ and omit the point of evaluation (e.g., $\theta_F = \mu_F^{(\nu)}$) whenever possible.
All generic results in this section cover both the interior and boundary cases, and could be naturally extended to vector-valued data.
The class of DGPs is defined by the following set of conditions.

Assumption 3.1 (DGP). $\{(Y_1, X_1), \ldots, (Y_n, X_n)\}$ is a random sample from $(Y, X)$, distributed according to $F$. There exist constants $S \geq \nu$, $s \in (0, 1]$, $0 < c < C < \infty$, and $\delta > 8$, and a neighborhood of $x = 0$, none of which depend on $F$, such that for all $x, x'$ in the neighborhood: (a) the Lebesgue density of $X_i$, $f(\cdot)$, is continuous and $c \leq f(x) \leq C$; $v(x) := V[Y_i \mid X_i = x] \geq c$ and continuous; and $E[|Y_i|^\delta \mid X_i = x] \leq C$; and (b) $\mu(\cdot)$ is $S$-times continuously differentiable and $|\mu^{(S)}(x) - \mu^{(S)}(x')| \leq C |x - x'|^s$.

The conditions here are not materially stronger than usual, other than the requirement that they hold independently of $F$, which is used to prove uniform results. Assumption 3.1(b) highlights the smoothness assumption, which sets a limit on how quickly the worst-case coverage error can decay. The distinction and interplay between the smoothness assumed here and that utilized by the procedure will be important for our results, as made precise below. Procedures that make use of more smoothness will yield faster rates when such smoothness is available, but there is also an important optimality notion among inference procedures that exploit the same level of smoothness. To make these points precise, we must first define the class of confidence interval procedures (and estimators) considered.

3.2 The Class of Confidence Interval Estimators

We restrict $\mathcal{I}$ to contain t-test-based intervals constructed using local polynomial methods (i.e., weighted least squares regression), and discuss optimal procedures within this class. Many other ways of forming confidence intervals exist, of course, as well as other nonparametric regression techniques, and all have strengths and weaknesses. We focus on local polynomial t-test-based intervals because they are tractable and popular in empirical work. We will only briefly review local polynomial estimation (see Fan and Gijbels, 1996, for more).
Define $\hat{\mu}^{(\nu)}$ via the local regression:
$$\hat{\mu}^{(\nu)} = \nu!\, e_\nu' \hat{\beta} = \frac{1}{n h^\nu}\, \nu!\, e_\nu' \Gamma^{-1} \Omega Y, \qquad \hat{\beta} = \arg\min_{b \in \mathbb{R}^{p+1}} \sum_{i=1}^n \big(Y_i - r_p(X_i)' b\big)^2 K\Big(\frac{X_i}{h}\Big), \qquad (3.2)$$
where $p \geq \nu$ is an integer with $p - \nu$ odd, $e_\nu$ is the $(p+1)$-vector with a one in the $(\nu+1)$th position and zeros in the rest, $r_p(u) = (1, u, u^2, \ldots, u^p)'$, $K$ is a kernel or weighting function, $\Gamma = \sum_{i=1}^n (nh)^{-1} K(X_i/h)\, r_p(X_i/h)\, r_p(X_i/h)'$, $\Omega = h^{-1}[K(X_1/h) r_p(X_1/h), \ldots, K(X_n/h) r_p(X_n/h)]$, and $Y = (Y_1, \ldots, Y_n)'$. The two germane quantities here are the bandwidth sequence $h$, assumed to vanish as $n$ diverges, and $p$, the order of the polynomial, set as usual so that $p - \nu$ is odd. These are chosen by the researcher and will impact the coverage error decay rate. The rate depends on the local sample size, $nh$, and the pointwise bias, determined by $h$, $p$, and the assumed smoothness. With these chosen, and a valid standard error choice $\hat{\sigma}_p$ (detailed below), the standard t-statistic is
$$T_p = \frac{\sqrt{n h^{1+2\nu}}\, (\hat{\mu}^{(\nu)} - \theta_F)}{\hat{\sigma}_p}. \qquad (3.3)$$

Valid inference requires a choice of the tuning parameter $h$, which is often regarded as the most difficult in practice and most delicate in theory. Our framework of coverage optimality sheds new light on this problem by motivating inference-optimal (coverage error minimizing) bandwidth choices, an important result from our work for empirical research.

To formalize how the bandwidth choice impacts estimation and inference, let us begin with the most common choice by far, and indeed the default in most software packages: minimizing the mean squared error (MSE) of the point estimator $\hat{\theta}_p := \hat{\mu}^{(\nu)}(x)$. To characterize the MSE-optimal bandwidth, suppose for the moment that $p \leq S - 1$. Then the conditional mean and variance of $\hat{\theta}_p$ are:
$$E\big[\hat{\mu}^{(\nu)} \mid X_1, \ldots, X_n\big] = \mu^{(\nu)} + h^{p+1-\nu}\, \nu!\, e_\nu' \Gamma^{-1} \Lambda\, \frac{\mu^{(p+1)}}{(p+1)!} + o_P(h^{p+1-\nu}), \qquad (3.4)$$
with $\Lambda = \Omega [(X_1/h)^{p+1}, \ldots, (X_n/h)^{p+1}]'/n$, and
$$V\big[\hat{\mu}^{(\nu)} \mid X_1, \ldots, X_n\big] = \frac{1}{n h^{1+2\nu}}\, \nu!^2\, e_\nu' \Gamma^{-1} \big(h \Omega \Sigma \Omega'/n\big) \Gamma^{-1} e_\nu, \qquad (3.5)$$
with $\Sigma$ the $n \times n$ diagonal matrix with elements $v(X_i)$. The MSE-optimal bandwidth will thus obey $h_{\mathtt{mse}} \asymp n^{-1/(2p+3)}$, whenever $\mu^{(p+1)} \neq 0$.
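As a concrete illustration of the estimator in (3.2), the following sketch (hypothetical code, not the authors' implementation) computes $\hat{\mu}^{(\nu)}$ at $x = 0$ with the uniform kernel, for which the weighted least squares problem reduces to ordinary least squares on the window $\{|X_i/h| \leq 1\}$:

```python
import math
import numpy as np

def local_poly(x, y, h, p=1, nu=0):
    """Local polynomial estimate of the nu-th derivative of E[Y|X=x] at x = 0,
    i.e. the weighted least squares problem in (3.2) with the uniform kernel.
    Hypothetical helper for illustration only."""
    keep = np.abs(x / h) <= 1                          # K(X_i/h) = 1{|X_i/h| <= 1}/2
    R = np.vander(x[keep], p + 1, increasing=True)     # rows r_p(X_i)' = (1, X_i, ..., X_i^p)
    beta = np.linalg.lstsq(R, y[keep], rcond=None)[0]  # constant weights: plain LS on window
    return math.factorial(nu) * beta[nu]               # mu_hat^{(nu)} = nu! e_nu' beta_hat

# Simulated example: Y = sin(X) + noise, so mu(0) = 0 and mu'(0) = 1.
rng = np.random.default_rng(0)
n, h = 5000, 0.3
x = rng.uniform(-1, 1, n)
y = np.sin(x) + 0.1 * rng.standard_normal(n)
mu0 = local_poly(x, y, h, p=1, nu=0)  # level estimate (p - nu odd)
mu1 = local_poly(x, y, h, p=2, nu=1)  # derivative estimate (p - nu odd)
```

Both calls respect the convention that $p - \nu$ is odd; the bandwidth $h = 0.3$ here is arbitrary rather than chosen by any optimality criterion.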
(Throughout this paper, asymptotic orders and their in-probability versions hold uniformly in $\mathcal{F}$, as required by our framework; e.g., $A_n = o_P(a_n)$ means $\sup_{F \in \mathcal{F}} P_F[|A_n/a_n| > \epsilon] \to 0$ for every $\epsilon > 0$.) The rate of decay of $h_{\mathtt{mse}}$ does not depend on the
specific derivative being estimated, though the convergence rate of the point estimate $\hat{\mu}^{(\nu)}$ to $\mu_F^{(\nu)}$ will depend on $\nu$. This is a well-known feature of local polynomials, but warrants mention as the coverage error decay rate will also not depend on the derivative, as established below.

The MSE-optimal bandwidth is too large for inference: the bias term of (3.4) remains first-order important after scaling by the standard deviation, rendering standard Gaussian inference invalid. To remove this bias term, we consider two approaches: undersmoothing and robust explicit bias correction. The former simply involves choosing a bandwidth that vanishes more rapidly than $n^{-1/(2p+3)}$, rendering the bias negligible. Explicit bias correction involves subtracting an estimate of the leading term of (3.4), of which only $\mu^{(p+1)}$ is unknown; inference is then made robust by accounting for the variability of this point estimate. The estimate $\hat{\mu}^{(p+1)}$ is defined via (3.2), with $p+1$ in place of both $p$ and $\nu$ throughout, and a bandwidth $b := \rho^{-1} h$ instead of $h$. These implementation choices have a precise theoretical justification (Calonico, Cattaneo, and Farrell, 2018a). The bias corrected point estimate is
$$\hat{\theta}_{\mathtt{rbc}} = \hat{\mu}^{(\nu)} - h^{p+1-\nu}\, \nu!\, e_\nu' \Gamma^{-1} \Lambda\, \frac{\hat{\mu}^{(p+1)}}{(p+1)!} = \frac{1}{n h^\nu}\, \nu!\, e_\nu' \Gamma^{-1} \Omega_{\mathtt{rbc}} Y, \qquad \Omega_{\mathtt{rbc}} = \Omega - \rho^{p+1} \Lambda\, e_{p+1}' \bar{\Gamma}^{-1} \bar{\Omega},$$
where $\bar{\Gamma}$ and $\bar{\Omega}$ are defined as $\Gamma$ and $\Omega$ above, but with $p+1$ and $b$ in place of $p$ and $h$, respectively. Comparing to (3.2), all that has changed is the matrix premultiplying $Y$.

The final implementation choice is that of variance estimator, which is also important for coverage error. This is a crucial aspect that differentiates traditional first-order analyses, where only consistency is required, from higher-order theory, which captures explicitly the uncertainty in variance estimation (among other things). Thus part of finding the optimal procedure in our framework is a careful choice of standard errors.
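To make the bias correction concrete, the sketch below (hypothetical code, uniform kernel, $\nu = 0$) builds $\hat{\theta}_{\mathtt{rbc}}$ from its ingredients and checks a known algebraic equivalence: with $\rho = 1$ (i.e., $b = h$), robust bias correction of the order-$p$ estimate reproduces the order-$(p+1)$ point estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, p = 1000, 0.5, 1
x = rng.uniform(-1, 1, n)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)

w = 0.5 * (np.abs(x / h) <= 1.0)   # uniform kernel weights K(X_i/h)
W = np.diag(w)

def wls(order):
    """Weighted least squares fit as in (3.2), returning the coefficient vector."""
    R = np.vander(x, order + 1, increasing=True)
    return np.linalg.solve(R.T @ W @ R, R.T @ W @ y)

theta_p = wls(p)[0]      # theta_hat_p = mu_hat(0), order-p fit
beta_bar = wls(p + 1)    # order-(p+1) fit with bandwidth b = h, i.e. rho = 1
                         # mu_hat^{(p+1)}/(p+1)! is its top coefficient beta_bar[p+1]

# Bias correction: subtract h^{p+1} (Gamma^{-1} Lambda)_0 * mu_hat^{(p+1)}/(p+1)!.
u = x / h
Ru = np.vander(u, p + 1, increasing=True)                # rows r_p(X_i/h)'
s = u ** (p + 1)
GinvLam = np.linalg.solve(Ru.T @ W @ Ru, Ru.T @ W @ s)   # Gamma^{-1} Lambda (scale factors cancel)
theta_rbc = theta_p - h ** (p + 1) * GinvLam[0] * beta_bar[p + 1]
```

The equivalence `theta_rbc == beta_bar[0]` (up to floating point) follows from the block structure of the order-$(p+1)$ normal equations; for $\rho \neq 1$ the two constructions differ.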
In general, two types of higher-order terms arise due to Studentization. One is the unavoidable estimation error incurred when replacing a population standardization, say $\sigma^2$, with a feasible Studentization, $\hat{\sigma}^2$. However, there is also an error in the difference between the population variability of the point estimate (i.e., of the numerator of the t-statistic) and the population standardization chosen. A fixed-$n$ approach is one where the Studentization $\hat{\sigma}^2$ is chosen directly to estimate $V[\sqrt{n h^{1+2\nu}}\, \hat{\theta} \mid X_1, \ldots, X_n]$, a fixed-$n$ calculation. Fixed-$n$ Studentization completely removes the second type of error. This can be contrasted with the popular practice of Studentizing with a feasible version of the asymptotic
variance, i.e., finding the probability limit of $V[\sqrt{n h^{1+2\nu}}\, \hat{\theta} \mid X_1, \ldots, X_n]$ and estimating any unknown quantities. This is valid to first order, but the difference between $V[\sqrt{n h^{1+2\nu}}\, \hat{\theta} \mid X_1, \ldots, X_n]$ and its limit manifests in the higher-order expansion, exacerbating coverage error. At boundary points these errors are $O(h)$ and thus particularly damaging to coverage. Other possibilities are available and may also be detrimental to coverage.

We can now define the class of confidence intervals we consider, which indexes choices of point estimates, standard errors, bandwidths, and quantiles. Regularity conditions are placed upon the kernel function. All of these represent choices made by the researcher, and each choice impacts the coverage error, as made precise below. Our results give practical guidance for these choices, the most important of which is the choice of bandwidth. In general, we shall write $I$ and $\mathcal{I}$, but when discussing specific choices it will be useful notationally to write the intervals as functions of these choices, such as $I(h)$ for an interval based on a bandwidth $h$ or $I(\hat{\theta}, \hat{\sigma})$ for specific choices of point estimate and standard errors. The other choices will be clear from the context. In particular, let $I_p = I(\hat{\theta}_p, \hat{\sigma}_p)$ and $I_{\mathtt{rbc}} = I(\hat{\theta}_{\mathtt{rbc}}, \hat{\sigma}_{\mathtt{rbc}})$, where $\hat{\sigma}_p$ and $\hat{\sigma}_{\mathtt{rbc}}$ are defined below.

Assumption 3.2 (Confidence Intervals). (a) $I$ is of the form
$$I = \bigg[\hat{\theta} - \frac{z_u \hat{\sigma}}{\sqrt{n h^{1+2\nu}}},\; \hat{\theta} - \frac{z_l \hat{\sigma}}{\sqrt{n h^{1+2\nu}}}\bigg] \qquad (3.6)$$
for a point estimator $\hat{\theta} = \hat{\theta}_p$ or $\hat{\theta}_{\mathtt{rbc}}$, a well-behaved standard error $\hat{\sigma}$ (defined below), fixed quantiles $z_l$ and $z_u$, a nonrandom bandwidth sequence $h = H n^{-\gamma}$, with $H$ bounded and bounded away from zero and $c \leq \gamma \leq 1 - c$ for some $c > 0$, not depending on $I$, and, if required, a fixed, bounded $\rho = h/b$. (b) The kernel $K$ is supported on $[-1, 1]$, positive, bounded, and even. Further, $K(u)$ is either constant (the uniform kernel) or $(1, K(u) r_{3(p+1)}(u)')'$ is linearly independent on $[-1, 1]$. The order $p$ is at least $\nu$ and $p - \nu$ is odd.
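The interval form in Assumption 3.2(a) is mechanical once $\hat{\theta}$ and $\hat{\sigma}$ are in hand; a minimal sketch (hypothetical helper, equal-tailed Normal quantiles so that $z_l = -z_u$):

```python
from statistics import NormalDist

def t_interval(theta_hat, sigma_hat, n, h, nu=0, alpha=0.05):
    """Interval of the form (3.6) for a given point estimate and standard error,
    with the equal-tailed choices z_l = Phi^{-1}(alpha/2) = -z_u. Hypothetical helper."""
    z_u = NormalDist().inv_cdf(1 - alpha / 2)
    scale = (n * h ** (1 + 2 * nu)) ** 0.5  # sqrt(n h^{1+2 nu})
    return (theta_hat - z_u * sigma_hat / scale,
            theta_hat + z_u * sigma_hat / scale)

# Example with arbitrary illustrative inputs:
lo, hi = t_interval(theta_hat=0.1, sigma_hat=1.2, n=1000, h=0.25)
```

With these symmetric quantiles the interval is centered at $\hat{\theta}$; other (asymmetric) choices of $z_l, z_u$ are allowed by the assumption and simply shift the endpoints.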
By well-behaved standard errors here we mean two things. First, we assume that the standard errors are (uniformly) valid, in that the associated t-statistic is asymptotically standard Normal. Inference based on invalid standard errors will be dominated, trivially, and thus while we could in principle account for this, we assume it away for simplicity. Second, and more importantly, is
that the ingredients of the standard errors must obey Cramér's condition. This can be assumed directly, but for kernel-based estimators of $V[\sqrt{n h^{1+2\nu}}\, \hat{\theta} \mid X_1, \ldots, X_n]$ or its limit, we prove in the supplement that Assumption 3.2(b) ensures this.

In particular, for $\hat{\theta}_p$ and $\hat{\theta}_{\mathtt{rbc}}$, we will focus on the following fixed-$n$ standard errors, following the ideas above. Let $\hat{\Sigma}_p$ and $\hat{\Sigma}_{\mathtt{rbc}}$ be the diagonal matrices of estimates of $v(X_i)$, given by $\hat{v}(X_i) = (Y_i - r_p(X_i)' \hat{\beta})^2$ for the former and $\hat{v}(X_i) = (Y_i - r_{p+1}(X_i)' \hat{\beta}_{p+1})^2$ for the latter, where $\hat{\beta}_{p+1}$ is defined as in (3.2), with $p+1$ in place of $p$ and $b$ instead of $h$. Then, we let
$$\hat{\sigma}_p^2 = \nu!^2\, e_\nu' \Gamma^{-1} \big(h \Omega \hat{\Sigma}_p \Omega'/n\big) \Gamma^{-1} e_\nu \quad \text{and} \quad \hat{\sigma}_{\mathtt{rbc}}^2 = \nu!^2\, e_\nu' \Gamma^{-1} \big(h \Omega_{\mathtt{rbc}} \hat{\Sigma}_{\mathtt{rbc}} \Omega_{\mathtt{rbc}}'/n\big) \Gamma^{-1} e_\nu. \qquad (3.7)$$

For the quantiles, the most common choices are $z_l = \Phi^{-1}(\alpha/2) =: z_{\alpha/2}$ and $z_u = \Phi^{-1}(1 - \alpha/2) =: z_{1-\alpha/2}$, where $\Phi$ is the standard Normal distribution function, but our results allow for other options. For coverage error purposes, symmetric choices, i.e., where $\Phi^{(1)}(z_l) = \Phi^{(1)}(z_u)$, yield improvements in coverage error due to cancellations in Edgeworth expansion terms. Asymmetric choices such that $\Phi(z_u) - \Phi(z_l) = 1 - \alpha$ can still yield correct coverage, but at a slower rate.

Under Assumption 3.2(b) we prove (in the supplement) that the appropriate $n$-varying version of Cramér's condition holds for all $I \in \mathcal{I}$. We do not need to make an opaque high-level assumption. Prior work on Edgeworth expansions for nonparametric inference has, explicitly or implicitly, ruled out the uniform kernel (Hall, 1991; Chen and Qin, 2002; Calonico, Cattaneo, and Farrell, 2018a) or treated the regressors as fixed (Hall, 1992b; Neumann, 1997). We are able to include the uniform kernel, which is important to account for popular empirical practice (i.e., local least squares). All popular kernel functions are allowed by Assumption 3.2(b), including uniform, triangular, Epanechnikov, and so forth.
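In the spirit of (3.7), the following sketch (hypothetical code, uniform kernel, $\nu = 0$) computes $\hat{\theta}_p$ together with a fixed-$n$, heteroskedasticity-robust standard error: the exact conditional sandwich evaluated at $\hat{v}(X_i)$ equal to the squared weighted-regression residuals, rather than an estimate of the asymptotic variance.

```python
import numpy as np

def fixed_n_ci(x, y, h, p=1):
    """mu_hat(0) (nu = 0) with a fixed-n, HC-type standard error in the spirit of
    (3.7): v_hat(X_i) are squared weighted-regression residuals and the sandwich
    is the exact conditional variance formula, not an asymptotic approximation.
    Hypothetical illustration with the uniform kernel."""
    w = 0.5 * (np.abs(x / h) <= 1.0)             # K(X_i/h)
    R = np.vander(x, p + 1, increasing=True)
    A = (R * w[:, None]).T @ R                   # R'WR
    beta = np.linalg.solve(A, (R * w[:, None]).T @ y)
    vhat = (y - R @ beta) ** 2                   # v_hat(X_i): squared residuals
    meat = (R * (w ** 2 * vhat)[:, None]).T @ R  # R' W Sigma_hat W R
    V = np.linalg.solve(A, np.linalg.solve(A, meat).T)  # A^{-1} meat A^{-1} (A symmetric)
    return beta[0], np.sqrt(V[0, 0])             # theta_hat and its standard error

rng = np.random.default_rng(2)
n, h = 2000, 0.4
x = rng.uniform(-1, 1, n)
y = np.sin(x) + 0.2 * rng.standard_normal(n)
theta, se = fixed_n_ci(x, y, h)  # estimate of mu(0) = 0 with its standard error
```

Here the returned `se` is the standard error of $\hat{\theta}$ itself, i.e. $\hat{\sigma}/\sqrt{n h^{1+2\nu}}$ in the paper's scaling, so an interval of the form (3.6) is simply `theta ± z * se`.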
3.3 Uniform Coverage Error Expansions

Before we can apply the framework of Section 2 to this class of problems, we must verify one of its assumptions. We now establish uniformly valid coverage error expansions, which verify Assumption CEE and define the $R_{I,F}$. These are used in the next subsection to identify optimal intervals and rates, and further below to select inference-optimal bandwidths.

As mentioned above, the relationship between $S$ and $p$ captures the interplay between $\mathcal{I}$ and
$\mathcal{F}$ in this problem. This is due to the bias of $\hat{\theta}_p$ and $\hat{\theta}_{\mathtt{rbc}}$, which manifests in the expansions. It is convenient to separate the rate and constant portions with specific notation. Let the (fixed-$n$) population bias of $h^\nu \hat{\theta}$ be denoted by $h^\eta \psi_{I,F}$, where both $\eta > 0$ and $\psi_{I,F}$ depend on the specific procedure and $F$. In general, the rate will be known but the constants may be unknown or even (if $p > S$) uncharacterizable without further assumptions (details are in the supplement). For example, in the case of the MSE-optimal bandwidth discussed above, with $p < S$, $\eta = p + 1$ and $\psi_{I_p,F} = \nu!\, e_\nu' E[\Gamma]^{-1} E[\Lambda] \mu^{(p+1)}/(p+1)!$; c.f. equation (3.4). In this notation, explicit bias correction removes an estimate of $h^{\eta-\nu} \psi_{I_p,F}$. For coverage to be uniformly correct in large samples, we will assume that $\sqrt{nh}\, h^\eta$ vanishes asymptotically, making no explicit mention of smoothness. We can then use the generic expansions to study the coverage error of each interval and its dependence on $p$ and $S$. For example, in the case of the standard approach, using $\hat{\theta}_p$ and $\hat{\sigma}_p$, this is the standard undersmoothing requirement for correct coverage.

The coverage error expansions are given next. For an interval $I$, the coverage error is the difference of Edgeworth expansions for the associated t-statistic, evaluated at each quantile. The expansion is given in terms of six functions $\omega_{k,I,F}(z)$, $k = 1, 2, \ldots, 6$. These are cumbersome notationally, and so their exact forms are deferred to the supplement. All that is important for our results is that they are known for all $I \in \mathcal{I}$ and $F \in \mathcal{F}$, bounded, and bounded away from zero for at least some $F \in \mathcal{F}$, and most crucially, that $\omega_1$, $\omega_2$, and $\omega_3$ are even functions of $z$, while $\omega_4$, $\omega_5$, and $\omega_6$ are odd. Also appearing is $\lambda_{I,F}$, a generic placeholder capturing the mismatch between the variance of the numerator of the t-statistic and the population standardization chosen (i.e., the quantity estimated by the $\hat{\sigma}$ of $I$).
We cannot make this error precise for all choices, but we consider two important special cases. First, employing an estimate of the asymptotic variance renders $\lambda_{I,F} = O(h)$ at boundary points. Second, the fixed-$n$ Studentizations (3.7) yield $\lambda_{I,F} \equiv 0$. For other choices, the rates and constants may change, but it is important to point out that the coverage error rate cannot be improved beyond those shown through the choice of Studentization alone (see discussion in the supplement). Let $\lambda_I$ be such that $\sup_{F \in \mathcal{F}} |\lambda_{I,F}| = O(\lambda_I) = o(1)$.

Our main technical result for local polynomials is the following lemma.

Lemma 3.1. Let $\mathcal{F}$ collect all $F$ which obey Assumption 3.1 and $\mathcal{I}$ collect all $I$ that obey Assumption 3.2. Then, uniformly over $\mathcal{I}$, if $\gamma > 1/(1 + 2\eta)$ and $z_l, z_u$ are such that $\Phi(z_u) - \Phi(z_l) = 1 - \alpha$, then
$$\sup_{F \in \mathcal{F}} L\Big(P_F[\theta_F \in I] - (1 - \alpha) - R_{I,F}\Big) = o(r_I),$$
where $r_I = \max\{(nh)^{-1},\; n h^{1+2\eta},\; h^\eta,\; \lambda_I\}$ and
$$\begin{aligned} R_{I,F} = {} & \frac{1}{\sqrt{nh}} \big\{\omega_{1,I,F}(z_u) - \omega_{1,I,F}(z_l)\big\} + \sqrt{nh}\, h^\eta \big\{\psi_{I,F} [\omega_{2,I,F}(z_u) - \omega_{2,I,F}(z_l)]\big\} \\ & + \frac{1}{nh} \big\{\omega_{4,I,F}(z_u) - \omega_{4,I,F}(z_l)\big\} + n h^{1+2\eta} \big\{\psi_{I,F}^2 [\omega_{5,I,F}(z_u) - \omega_{5,I,F}(z_l)]\big\} \\ & + h^\eta \big\{\psi_{I,F} [\omega_{6,I,F}(z_u) - \omega_{6,I,F}(z_l)]\big\} + \lambda_{I,F} \big\{\omega_{3,I,F}(z_u) - \omega_{3,I,F}(z_l)\big\}; \end{aligned}$$
otherwise $\sup_{F \in \mathcal{F}} L\big(P_F[\theta_F \in I] - (1 - \alpha)\big) \gtrsim 1$.

This result is quite general, covering unadjusted, undersmoothed, and robust bias corrected confidence intervals, as well as other methods (Remark 3.1), at interior and boundary points. To fully utilize this result, and identify optimal procedures, we have to specify the relationship of $p$ to $S$. However, even at this level of generality, some important conclusions are available. First, this shows the well-known result that symmetric intervals, with $z_l = -z_u$, have superior coverage: the even functions $\omega_1$, $\omega_2$, and $\omega_3$ cancel, and these are the slowest-decaying. Second, the final conclusion of the lemma simply formalizes the idea that the bandwidth must vanish at the appropriate rate (among other choices) lest worst-case coverage error persist asymptotically. Notice that such intervals can still be ranked in our framework, but are dominated (Definition 1). Third, intervals $I$ such that $\lambda_{I,F} \equiv 0$ yield superior coverage. Taking these three conclusions into account, for the rest of the paper we use the fixed-$n$ standard errors of (3.7) and $z_l = z_{\alpha/2} := \Phi^{-1}(\alpha/2)$ and $z_u = z_{1-\alpha/2} := \Phi^{-1}(1 - \alpha/2)$. The coverage error of such an interval is
$$R_{I,F} = \frac{1}{nh} \big\{2 \omega_{4,I,F}(z_{1-\alpha/2})\big\} + n h^{1+2\eta} \big\{2 \psi_{I,F}^2\, \omega_{5,I,F}(z_{1-\alpha/2})\big\} + h^\eta \big\{2 \psi_{I,F}\, \omega_{6,I,F}(z_{1-\alpha/2})\big\}. \qquad (3.8)$$
Below we use this form to obtain the optimal rates and intervals, as well as to select the bandwidths.
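To illustrate the rate structure of (3.8) numerically (this is a sketch only, not the paper's analytic bandwidth derivation), one can set $h = n^{-\gamma}$, so the three terms are of order $n^{\gamma-1}$, $n^{1-\gamma(1+2\eta)}$, and $n^{-\gamma\eta}$, and search for the $\gamma$ minimizing the slowest-vanishing exponent:

```python
import numpy as np

# Terms of (3.8) are of order (nh)^{-1}, n h^{1+2 eta}, and h^eta. With h = n^{-gamma}
# the exponents of n are gamma - 1, 1 - gamma (1 + 2 eta), and -gamma eta. Grid-search
# the gamma minimizing the largest exponent; numerical illustration only.
eta = 2.0  # e.g. eta = p + 1 = 2 for p = 1
gam = np.linspace(0.01, 0.99, 9801)
worst = np.maximum(np.maximum(gam - 1, 1 - gam * (1 + 2 * eta)), -gam * eta)
g_star = gam[np.argmin(worst)]  # close to 1/(1 + eta) = 1/3
rate = worst.min()              # close to -eta/(1 + eta) = -2/3
```

All three exponents balance at $\gamma = 1/(1+\eta)$, where the worst-case exponent is $-\eta/(1+\eta)$; this balancing ignores the (possibly offsetting) constants $\psi_{I,F}$ and $\omega_{k,I,F}$, which the formal analysis below takes into account.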
Finally, Lemma 3.1 implicitly reveals how fundamentally different point estimation and inference are, recalling Remark 2. First, the two may proceed at different rates. Observe that the rate of $R_{I,F}$ does not depend on the order of the derivative being estimated, $\nu$. As a consequence, neither will the optimal rate $r$ (Definition 3) nor the optimal procedure (Definition 2), encompassing, in
More informationEconomics 583: Econometric Theory I A Primer on Asymptotics
Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:
More informationWhat s New in Econometrics. Lecture 13
What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments
More informationNear-Potential Games: Geometry and Dynamics
Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo September 6, 2011 Abstract Potential games are a special class of games for which many adaptive user dynamics
More informationTime Series and Forecasting Lecture 4 NonLinear Time Series
Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations
More informationNonparametric Econometrics
Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-
More informationRefining the Central Limit Theorem Approximation via Extreme Value Theory
Refining the Central Limit Theorem Approximation via Extreme Value Theory Ulrich K. Müller Economics Department Princeton University February 2018 Abstract We suggest approximating the distribution of
More informationSemi-Nonparametric Inferences for Massive Data
Semi-Nonparametric Inferences for Massive Data Guang Cheng 1 Department of Statistics Purdue University Statistics Seminar at NCSU October, 2015 1 Acknowledge NSF, Simons Foundation and ONR. A Joint Work
More information22 : Hilbert Space Embeddings of Distributions
10-708: Probabilistic Graphical Models 10-708, Spring 2014 22 : Hilbert Space Embeddings of Distributions Lecturer: Eric P. Xing Scribes: Sujay Kumar Jauhar and Zhiguang Huo 1 Introduction and Motivation
More informationSection 7: Local linear regression (loess) and regression discontinuity designs
Section 7: Local linear regression (loess) and regression discontinuity designs Yotam Shem-Tov Fall 2015 Yotam Shem-Tov STAT 239/ PS 236A October 26, 2015 1 / 57 Motivation We will focus on local linear
More informationNonparametric Cointegrating Regression with Endogeneity and Long Memory
Nonparametric Cointegrating Regression with Endogeneity and Long Memory Qiying Wang School of Mathematics and Statistics TheUniversityofSydney Peter C. B. Phillips Yale University, University of Auckland
More informationComputational Tasks and Models
1 Computational Tasks and Models Overview: We assume that the reader is familiar with computing devices but may associate the notion of computation with specific incarnations of it. Our first goal is to
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationLECTURE 10: REVIEW OF POWER SERIES. 1. Motivation
LECTURE 10: REVIEW OF POWER SERIES By definition, a power series centered at x 0 is a series of the form where a 0, a 1,... and x 0 are constants. For convenience, we shall mostly be concerned with the
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More informationAlgorithm Independent Topics Lecture 6
Algorithm Independent Topics Lecture 6 Jason Corso SUNY at Buffalo Feb. 23 2009 J. Corso (SUNY at Buffalo) Algorithm Independent Topics Lecture 6 Feb. 23 2009 1 / 45 Introduction Now that we ve built an
More informationStatistica Sinica Preprint No: SS
Statistica Sinica Preprint No: SS-017-0013 Title A Bootstrap Method for Constructing Pointwise and Uniform Confidence Bands for Conditional Quantile Functions Manuscript ID SS-017-0013 URL http://wwwstatsinicaedutw/statistica/
More informationRobustness to Parametric Assumptions in Missing Data Models
Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationBootstrap Tests: How Many Bootstraps?
Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002
More informationRobust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich
Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 516 Robust Performance Hypothesis Testing with the Variance Olivier Ledoit and Michael
More informationDuration-Based Volatility Estimation
A Dual Approach to RV Torben G. Andersen, Northwestern University Dobrislav Dobrev, Federal Reserve Board of Governors Ernst Schaumburg, Northwestern Univeristy CHICAGO-ARGONNE INSTITUTE ON COMPUTATIONAL
More informationMultiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap
University of Zurich Department of Economics Working Paper Series ISSN 1664-7041 (print) ISSN 1664-705X (online) Working Paper No. 254 Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationLectures on Simple Linear Regression Stat 431, Summer 2012
Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population
More informationMFM Practitioner Module: Risk & Asset Allocation. John Dodson. February 18, 2015
MFM Practitioner Module: Risk & Asset Allocation February 18, 2015 No introduction to portfolio optimization would be complete without acknowledging the significant contribution of the Markowitz mean-variance
More information11. Bootstrap Methods
11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods
More informationUnderstanding Generalization Error: Bounds and Decompositions
CIS 520: Machine Learning Spring 2018: Lecture 11 Understanding Generalization Error: Bounds and Decompositions Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the
More informationAsymptotic Relative Efficiency in Estimation
Asymptotic Relative Efficiency in Estimation Robert Serfling University of Texas at Dallas October 2009 Prepared for forthcoming INTERNATIONAL ENCYCLOPEDIA OF STATISTICAL SCIENCES, to be published by Springer
More informationIEOR 165 Lecture 7 1 Bias-Variance Tradeoff
IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to
More informationStatistical Properties of Numerical Derivatives
Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective
More informationoptimal inference in a class of nonparametric models
optimal inference in a class of nonparametric models Timothy Armstrong (Yale University) Michal Kolesár (Princeton University) September 2015 setup Interested in inference on linear functional Lf in regression
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationRobustness and Distribution Assumptions
Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology
More informationInterior-Point Methods for Linear Optimization
Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function
More informationInverse problems in statistics
Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) Yale, May 2 2011 p. 1/35 Introduction There exist many fields where inverse problems appear Astronomy (Hubble satellite).
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More informationStatistical Data Analysis
DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the
More informationINFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction
INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental
More informationWEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract
Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of
More informationInference in Regression Discontinuity Designs with a Discrete Running Variable
Inference in Regression Discontinuity Designs with a Discrete Running Variable Michal Kolesár Christoph Rothe arxiv:1606.04086v4 [stat.ap] 18 Nov 2017 November 21, 2017 Abstract We consider inference in
More informationMultiscale Adaptive Inference on Conditional Moment Inequalities
Multiscale Adaptive Inference on Conditional Moment Inequalities Timothy B. Armstrong 1 Hock Peng Chan 2 1 Yale University 2 National University of Singapore June 2013 Conditional moment inequality models
More informationOPTIMAL INFERENCE IN A CLASS OF REGRESSION MODELS. Timothy B. Armstrong and Michal Kolesár. May 2016 COWLES FOUNDATION DISCUSSION PAPER NO.
OPTIMAL INFERENCE IN A CLASS OF REGRESSION MODELS By Timothy B. Armstrong and Michal Kolesár May 2016 COWLES FOUNDATION DISCUSSION PAPER NO. 2043 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS YALE UNIVERSITY
More informationFINITE-SAMPLE OPTIMAL ESTIMATION AND INFERENCE ON AVERAGE TREATMENT EFFECTS UNDER UNCONFOUNDEDNESS. Timothy B. Armstrong and Michal Kolesár
FINITE-SAMPLE OPTIMAL ESTIMATION AND INFERENCE ON AVERAGE TREATMENT EFFECTS UNDER UNCONFOUNDEDNESS By Timothy B. Armstrong and Michal Kolesár December 2017 Revised December 2018 COWLES FOUNDATION DISCUSSION
More informationECO Class 6 Nonparametric Econometrics
ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................
More informationPreface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation
Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric
More informationON THE UNIFORM ASYMPTOTIC VALIDITY OF SUBSAMPLING AND THE BOOTSTRAP. Joseph P. Romano Azeem M. Shaikh
ON THE UNIFORM ASYMPTOTIC VALIDITY OF SUBSAMPLING AND THE BOOTSTRAP By Joseph P. Romano Azeem M. Shaikh Technical Report No. 2010-03 April 2010 Department of Statistics STANFORD UNIVERSITY Stanford, California
More information