Coverage Error Optimal Confidence Intervals


Coverage Error Optimal Confidence Intervals

Sebastian Calonico    Matias D. Cattaneo    Max H. Farrell

August 3, 2018

Abstract

We propose a framework for ranking confidence interval estimators in terms of their uniform coverage accuracy. The key ingredient is the (existence and) quantification of the error in coverage of competing confidence intervals, uniformly over some empirically-relevant class of data generating processes. The framework employs the check function to quantify coverage error loss, which allows researchers to incorporate their preferences regarding over- and under-coverage; confidence intervals attaining the best-possible uniform coverage error are minimax optimal. We demonstrate the usefulness of our framework with three distinct applications. First, we establish novel uniformly valid Edgeworth expansions for nonparametric local polynomial regression, offering some technical results that may be of independent interest, and use them to characterize the coverage error of, and rank, confidence interval estimators for the regression function and its derivatives. As a second application, we consider inference in least squares linear regression under potential misspecification, ranking interval estimators utilizing uniformly valid expansions already established in the literature. Third, we study heteroskedasticity-autocorrelation robust inference to showcase how our framework can unify existing conclusions. Several other potential applications are mentioned.

Keywords: minimax bound, Edgeworth expansion, nonparametric regression, robust bias correction, linear models, bootstrap, heteroskedasticity-autocorrelation robust inference, optimal inference.

The second author gratefully acknowledges financial support from the National Science Foundation (SES and SES ). We thank Federico Bugni, Chris Hansen, Michael Jansson, Andres Santos, Azeem Shaikh, Rocio Titiunik, and participants at various seminars and conferences for comments.

Department of Economics, University of Miami. Department of Economics and Department of Statistics, University of Michigan. Booth School of Business, University of Chicago.

1 Introduction

Researchers typically have a range of options for constructing confidence intervals for a parameter of interest in empirical work. These options arise from many sources: how the sampling distribution is approximated, how standard errors or quantiles are constructed, how tuning and smoothing parameters are selected, or what concept of validity/robustness is used. Often many competing interval estimators are valid in some principled sense, but it is difficult to choose which are best among them. When two options are asymptotically equivalent to first order, for a given data generating process, practical guidance does not follow from theory. Further, it is rare that economic or other subject-specific theory dictates a choice of inference procedure. Instead, the model and accompanying assumptions formalize what the researcher believes to be a plausible class of distributions that could have generated the data, and the researcher would like some assurance that the chosen confidence interval is accurate in level regardless of the specific data generating process. It is natural and desirable to know if any confidence intervals are more accurate than others over this class of distributions.

We propose a framework that quantifies inference quality according to the coverage error of competing confidence interval estimators, uniformly over a class of data-generating processes (DGPs), thereby allowing us to rank those estimators. In a nutshell, we look at the worst-case coverage error for a given confidence region over the distributions allowed by the researcher's assumptions and then use this information to characterize an optimal inference procedure: a minimax notion of optimality. We employ the check function loss to quantify coverage error, which allows for asymmetric penalization of under- or over-coverage in the optimality criterion. For example, it is common practice to prefer conservative confidence intervals, perhaps at the expense of interval length, and our proposed framework can incorporate such a preference directly.

We necessarily restrict attention to classes of confidence regions for which the uniform coverage error can be quantified in some way. In the generic framework outlined in Section 2, we give nested levels of knowledge required of such a quantification. At heart, each of these involves some degree of higher-order asymptotic analysis (which is precisely how we can distinguish between first-order equivalent procedures). The weakest assumption is simply bounds on the rate of decay of the worst-case coverage error. When these bounds are nontrivial, we are able to conclusively rank

some procedures. The strongest assumption consists of a full and precise quantification of the leading terms of coverage error, in which case we can rank all procedures, identify minimax optimal rates (of coverage error decay), and single out optimal inference procedures. Optimization over constants is also possible in our framework. We show through examples that the required levels of knowledge are available in many contexts, and even the most stringent requirement is often met. In an application to nonparametric regression, we develop a uniformly valid coverage error expansion and use it to give a novel recipe for optimal inference. In other applications, we rely on existing results, from bounds to expansions, and show how our framework can be used to unify and extend previous rankings among competing inference procedures. In all cases, our framework gives principled guidance to practitioners.

In our framework both the class of confidence intervals and the class of data generating processes play a crucial role: together they determine the lower ("min") and upper ("max") portions of the minimax optimality, and in particular, neither class should be too large nor too small in order to obtain useful and interesting results. If the class of intervals is too small it will not reflect the range of choices available to the practitioner, while if too large, the optimal procedure may be infeasible or not useful. For example, the interval estimator set to the real line with probability (1 − α) and empty otherwise yields perfect coverage over any possible distribution, but is uninformative. In much the same way, if the assumed class of distributions is too small, it is unlikely to be rich enough to be useful in real-world applications. Economic or other field-specific theory often does not make tight restrictions on the distribution of the data (such as Gaussianity), and any inference procedure designed to be optimal or valid under these restrictions may exhibit poor behavior when the restrictions do not hold. The class is too small for the uniformity to be interesting. On the other hand, some restrictions on the distribution are necessary, because if the class is too large then uniformly valid and informative inference procedures cannot be constructed; an idea with a long history that has been studied in a variety of settings (see Bahadur and Savage, 1956; Dufour, 1997; Romano and Wolf, 2000; Romano, 2004; Hirano and Porter, 2012, just to name a few examples).

The interplay between the two classes will be crucial for our results. To illustrate, consider nonparametric regression, which we study in Section 3. In this context, the class of intervals may restrict attention to only low-order approximations, for example all methods removing bias up to a certain order, even if the underlying functions of the DGP are assumed to possess more smoothness.

Alternatively, it could be that the intervals are based on approximations trying to utilize more smoothness than is available. The optimal coverage error rate is affected by these assumptions, as we study in detail below. This interplay impacts the optimal coverage error rate in other contexts as well, and applies to conditions other than smoothness, such as orthogonality of the errors in a linear model, as studied by Kline and Santos (2012). We employ their work to provide a second application of our generic framework in Section 4. Highlighting the importance of such interplay between the class of DGPs and the class of confidence intervals considered when developing coverage error minimax optimality is an additional methodological contribution that emerges naturally from our general framework (and the specific examples we study).

The remainder of this paper proceeds as follows. Section 2 proposes our generic framework for ranking confidence interval estimators. Sections 3, 4, and 5 then consider three distinct applications of our framework to, respectively, nonparametric local polynomial regression, traditional least squares linear regression, and heteroskedasticity-autocorrelation robust (HAR) inference. The breadth of these applications spans parametric and nonparametric estimands and nuisance parameters, standard and nonstandard limiting distributions, and cross-sectional and dependent data. Section 6 briefly mentions several other possible applications, and concludes. The online supplement contains omitted formulas and proofs. Beyond their use as ingredients in our framework, Section 3 contains new technical and methodological results that may be of independent interest. We establish uniformly valid Edgeworth expansions for local polynomial regression, and use them to derive inference-optimal bandwidth choices that minimize coverage error or, alternatively, balance coverage error against interval length, which may lead to a shorter (more powerful) interval that is still valid. Calonico, Cattaneo, and Farrell (2018b) further develop these ideas for the specific case of regression discontinuity designs.

1.1 Related Literature

Romano (2004), Wasserman (2006), and Romano, Shaikh, and Wolf (2010) give introductions to uniform validity and optimality of confidence interval estimators in particular, and statistical inference more generally, thus providing background review and references for this paper. Hall and Jing (1995) is the paper most closely related to our work, wherein minimax bounds are established for one-sided confidence interval estimators based on t-test statistics, relying on Edgeworth expansions

for the Studentized sample mean. Our general framework was inspired by their paper but is quite different from their work and, to the best of our knowledge, new in the literature. To be more specific, while we also consider a minimax optimality criterion to rank confidence intervals, our framework applies more generally to a large class of (one- or two-sided) confidence intervals and under a wide range of conceptually distinct sufficient conditions, which can be verified in many other settings beyond the case of a parametric location model under i.i.d. data.

The idea of ranking inference procedures using coverage error, or the equivalent notion of error in rejection probability, has appeared before in the econometrics literature. Two important examples are Jansson (2004) and Bugni (2010, 2016), who give Berry-Esseen-type bounds for inference procedures as a means of ranking them. Our framework encompasses this type of bound and conclusion as one possible (weak) way of ranking confidence intervals, although it goes beyond their specific approach: neither of these works, nor others that we are aware of, have laid out the minimax framework of Section 2. Nevertheless, these papers are discussed in more detail below in the context of our framework and specific examples.

Ranking inference procedures in general, sometimes uniformly over data-generating processes, has a longer history in econometrics and statistics. We cannot hope to do justice to this literature, and hence only mention a few relevant and recent examples beyond those already cited. Beran (1982) studies the uniform optimality of bootstrap inference procedures. Backus (1989) and Donoho (1994) construct minimax confidence intervals for regression under parametric (e.g. Gaussian) assumptions, with the latter reference also showing some interval length optimality properties. Rothenberg (1984) develops higher-order (size and power) comparisons of classical testing procedures. Horowitz and Spokoiny (2001) discuss minimax optimal rates for hypothesis testing. Schafer and Stark (2009) construct confidence intervals with optimal expected size and some minimax optimality. Elliott, Müller, and Watson (2015) establish upper bounds on power for tests involving a nuisance parameter. Müller and Norets (2016) propose a notion of betting-based uniformity guarantees as a measure of (minimax) inference quality. More references and related approaches are discussed in these works.

2 Framework

Our framework centers around inference on a parameter denoted θ, and we write θ_F for the value of the true parameter when the data is generated according to distribution F. We study confidence interval estimators for θ_F, denoted I, that have nominal 100(1 − α)% coverage. Our ultimate goal is to provide a ranking of all confidence intervals I in a class I, uniformly over a class F of DGPs, where the ranking is determined by the accuracy of coverage with respect to the nominal level. This ranking requires some knowledge of the coverage error, and naturally the more that is known, the more precise the ranking will be. This is formalized in the three assumptions below, Assumptions CEB, CER, and CEE, which are progressively stronger. We first discuss our measure of coverage accuracy as well as the underlying classes F and I.

The class F of plausible DGPs is defined by the modeling assumptions and the empirical regularities of the application of interest, and the researcher would like some assurance that coverage is accurate no matter which F ∈ F generated the data. Thus, it is reasonable to evaluate a confidence interval I ∈ I by studying the worst-case coverage error within the plausible set of DGPs. We thus measure the worst-case coverage error as

$$\sup_{F \in \mathcal{F}} L\Big( P_F[\theta_F \in I] - (1-\alpha) \Big), \qquad (2.1)$$

where L(e) = L_τ(e) = e(τ − 1{e < 0}) is the check function for a given τ ∈ (0, 1). We focus on L_τ(e) for its concreteness and usefulness, but could use other well-behaved loss functions, or augment the loss with a penalty for interval length to rule out interval estimators that are infinite with positive probability. Using the check function loss allows the researcher, through their choice of τ, to evaluate inference procedures according to their preferences against over- and under-coverage. Setting τ = 1/2 recovers the usual, symmetric measure of coverage error. Guarding more against undercoverage requires choosing a τ < 1/2. For example, setting τ = 1/3 encodes the belief that undercoverage is twice as bad as the same amount of overcoverage. Intuitively, a good confidence interval is one for which this maximal coverage error is minimized. More precisely, we seek a minimax optimal confidence interval.
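The role of τ in the check function loss can be made concrete with a short numerical sketch. The snippet below is ours, not the paper's (the function name and the example values are illustrative); it evaluates L_τ(e) = e(τ − 1{e < 0}) for over- and under-coverage errors of equal magnitude.

```python
import numpy as np

def check_loss(e, tau):
    """Check function L_tau(e) = e * (tau - 1{e < 0})."""
    e = np.asarray(e, dtype=float)
    return e * (tau - (e < 0.0))

# Coverage errors of equal magnitude: +2 points (over-coverage) and -2 points (under-coverage).
errors = np.array([0.02, -0.02])

print(check_loss(errors, tau=0.5))   # [0.01, 0.01]: symmetric penalty
print(check_loss(errors, tau=1/3))   # [~0.0067, ~0.0133]: under-coverage penalized twice as much
```

With τ = 1/2 the two errors receive identical loss, while τ = 1/3 doubles the penalty on undercoverage relative to overcoverage, matching the preference described above.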

Since coverage error cannot be written as expected loss, our analysis does not fit neatly within traditional minimax risk analyses, and the corresponding established tools and results do not apply. Lastly, controlling (2.1) is qualitatively different than uniform size control: see Remark 1 below.

The identity and properties of an optimal confidence interval will depend not only on what is assumed about F, but also which confidence intervals are considered: the class of confidence intervals (more generally, inference procedures) I under consideration. We wish to be agnostic about both F and I, so that our results are as useful as possible and provide tight guidance for empirical practice. That is, the larger F, the more likely it is that a given data set is generated by some F ∈ F, and the larger is I the more certain the researcher can be that she is using the best procedure for inference. At the same time, both must be restricted in order to obtain effective bounds on (2.1). The sizes of the two classes, F and I, must be considered together, and interesting problems naturally balance their sizes/complexities.

To illustrate these ideas, and motivate our results, consider the case of forming a confidence interval for the mean of a scalar random variable. Let the data be a random sample {X_i, i = 1, ..., n} from a scalar random variable X ∼ F and θ_F = E_F[X]. First, consider the class I. In order to give useful and tight guidance to a researcher, I can be neither too small nor too large. If I contains very few interval types, we would not be comparing all the procedures available to a researcher. On the other hand, to see that I cannot be too large, consider the interval which is set to the real line with probability (1 − α) and empty otherwise. This interval has uniformly perfect coverage, that is, (2.1) is exactly zero, but is entirely uninformative to the researcher. This applies for any F, no matter how large, but such a seemingly powerful conclusion is only possible because I is not usefully defined.

To illustrate the role of the class F in studying (2.1), suppose that I includes the standard t interval: I = [X̄ ± t_{1−α/2} s/√n], where X̄ is the sample mean, s is the sample standard deviation, and t_{1−α/2} is the 1 − α/2 quantile of a t distribution with n − 1 degrees of freedom. If F is sufficiently restricted, then the t interval is optimal: if only Gaussian DGPs are plausible, that is F = {F : P_F[X ≤ x] = Φ((x − θ)/σ), θ ∈ ℝ, σ² > 0}, then (2.1) is again exactly zero. The optimality of the t interval holds for any class I, no matter how large, but in parallel to the discussion above, this powerful statement is only possible because F is unrealistically small. On the other hand, well-known results, dating back at least to Bahadur and Savage (1956), show that if F is too large it is impossible to construct an effective confidence interval that controls the worst-case coverage.
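To fix ideas, the following sketch approximates the worst-case criterion (2.1) for the standard t interval by Monte Carlo over a small, hypothetical family of skewed DGPs. The family, sample size, and simulation sizes are our own illustrative choices and are not taken from the paper; with enough replications, larger families, or more skewed members, the approximated worst case grows, illustrating why F must be restricted.

```python
import numpy as np
from scipy import stats

def check_loss(e, tau):
    return e * (tau - (e < 0.0))

def t_coverage(draw, theta, n, alpha, reps, rng):
    """Monte Carlo coverage of the t interval [xbar +/- t_{1-alpha/2} s / sqrt(n)] under one DGP."""
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    hits = 0
    for _ in range(reps):
        x = draw(rng, n)
        half = tcrit * x.std(ddof=1) / np.sqrt(n)
        hits += abs(x.mean() - theta) <= half
    return hits / reps

# A toy stand-in for F: exponential DGPs with different means (all right-skewed).
family = [(lambda rng, n, m=m: rng.exponential(m, n), m) for m in (0.5, 1.0, 5.0)]

rng = np.random.default_rng(0)
alpha, n, tau = 0.05, 30, 1 / 3
worst = max(check_loss(t_coverage(draw, theta, n, alpha, 5000, rng) - (1 - alpha), tau)
            for draw, theta in family)
print(worst)  # finite-sample approximation of the sup in (2.1) over this toy family
```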

The restriction that I be effective rules out setting I = ℝ with a certain probability, and other such examples. (We may augment the coverage error loss function so that such intervals are not optimal, but rarely would ranking these be of interest to researchers.) Again, this indicates that F and I must be defined together in order that the problem is interesting and the results useful.

In the artificial examples above it is possible to reduce (2.1) to zero, but in general this is not possible. However, it is often true that (2.1) vanishes asymptotically, as the sample size grows. Such a confidence interval is uniformly consistent. But even this is not guaranteed. For example, the t interval is pointwise consistent, that is, P_F[θ_F ∈ I] → 1 − α, provided that F permits a central limit theorem to hold for √n X̄/s, but (2.1) will not vanish without further restrictions on F. Our central aim will be to quantify the rate at which (2.1) vanishes asymptotically, and to show how this rate depends on F and I. Our rankings formalize the intuition that intervals for which (2.1) vanishes faster are preferred over those for which the rate is slower, and intervals with the fastest possible rate are minimax optimal. This gives a way to rank inference procedures that may otherwise appear equivalent. Depending on what is known about the worst-case coverage, which we will now successively build up, we can provide more informative rankings.

In the remainder of the paper, all quantities may vary with n, including F and I (and their members F and I), and limits are taken as n → ∞ unless explicitly stated otherwise. We focus on scalar θ_F for concreteness, but our framework extends naturally to other types of estimands.

We begin with a weak notion of ranking confidence intervals, and correspondingly we make a weak assumption about coverage error: only bounds are known for the worst-case coverage.

Assumption CEB: Coverage Error Bounds. For each I ∈ I, there exist a non-negative sequence R̲_I and a positive sequence R̄_I such that

$$\underline{R}_I \ \le\ \sup_{F\in\mathcal{F}} L\big(P_F[\theta_F \in I] - (1-\alpha)\big) \ \le\ \bar{R}_I.$$

This assumption requires the existence and characterization of lower and upper bounds on the worst-case coverage error of a confidence interval estimator I ∈ I. Trivial bounds are R̲_I = 0 and R̄_I = 1. Non-trivial bounds can be established employing Berry-Esseen-type bounds and their reversed versions, Edgeworth expansions and related methods, or other higher-order approximations to coverage error. See, for example, Rothenberg (1984), Hall (1992a), and Chen, Goldstein,

and Shao (2010) for reviews and more references. Concrete illustrations of these methods are given below.

Assumption CEB is useful in comparing confidence intervals when R̄_I = o(1) and R̲_I > 0 for at least some I ∈ I. In this case, an interval I_1 ∈ I would never be a preferred choice if there was a competing procedure I_2 ∈ I whose upper bound was below the lower bound of I_1: heuristically, R̄_{I_2} < R̲_{I_1} should mean that I_2 ranks above I_1. The following definition formalizes this idea.

Definition 1: Domination. Under Assumption CEB, an interval I_1 ∈ I is I/F-dominated if there exists I_2 ∈ I such that R̄_{I_2} = o(R̲_{I_1}).

This idea parallels the notion of (in)admissibility in classical statistical decision theory, separating those confidence intervals that have the potential of being optimal from those that can never be (i.e., intervals that will always be dominated by some other interval estimator in the class I). This is a weak ranking notion for confidence intervals. Nevertheless, it is often useful. For example, in the context of partially identified parameters, Bugni (2010, 2016) compares inference procedures based on an asymptotic distributional approximation (AA), a bootstrap approximation (B), and subsampling (SS), and shows that subsampling-based inference is dominated under assumptions therein. In the notation of Assumption CEB, it is shown that R̲_SS ≍ n^{−1/3} whereas R̄_AA and R̄_B are O(n^{−1/2}). (Bugni establishes these bounds pointwise in F, but they can be extended to hold uniformly under regularity conditions.) Therefore, subsampling is dominated in this specific setting. Further, we have only the trivial bounds R̲_AA = R̲_B = 0, and thus confidence intervals based on the asymptotic approximation and based on the bootstrap cannot be ranked. See Sections 3, 4, and 5 for more detailed examples.

Although it does not provide optimality directly, domination does hint at the notion of optimality: not being dominated by any other member of the class I is a necessary but not sufficient condition for optimality, and intervals that dominate all others in the class I should be optimal. Our next definition formalizes this idea.

Definition 2: Minimax Rate Optimal Interval. Under Assumption CEB, an interval I ∈ I is I/F-minimax coverage error rate optimal if

$$\liminf_{n\to\infty}\ \inf_{I' \in \mathcal{I}}\ \sup_{F \in \mathcal{F}}\ \bar{R}_I^{-1}\, L\big(P_F[\theta_F \in I'] - (1-\alpha)\big) > 0.$$

This definition is most interesting when R̲_I > 0 for all I ∈ I and R̄_I = o(1) for at least some. However, notice that we do not require uniformly, or even pointwise, consistent coverage, because R̄_I need not vanish for all I ∈ I. Such intervals can be ranked, they are simply suboptimal in our framework. For example, this will occur in nonparametrics (Section 3) when using a mean-square error optimal bandwidth to conduct inference without bias reduction or, more generally, with intervals that are asymptotically conservative or liberal. See also Remark 3.1.

Even if Assumption CEB holds with R̲_I > 0 and R̄_I = o(1) for all I ∈ I, we may not be able to find useful rankings if the bounds are too loose. To see why, consider an artificial example in which I has three members, with R̲_{I_1} ≍ n^{−2}, R̄_{I_1} ≍ n^{−1}, R̲_{I_2} ≍ n^{−3/2}, R̄_{I_2} ≍ n^{−1}, and R̲_{I_3} ≍ R̄_{I_3} ≍ n^{−1/2}. Here, I_3 is dominated, but we are unable to further rank I_1 and I_2. Suppose further that we had a sharp bound for I_2: R̲_{I_2} ≍ R̄_{I_2} ≍ n^{−1}. In this case, only the bounds for I_1 do not agree, yet still we do not have enough information to conclusively rank I_1 and I_2. (Alternatively, we could restate the definition so that both were optimal.)

In interesting applications of our framework, the bounds will not be loose. For many examples, including the three applications below, we can find the exact rate at which the worst-case coverage error vanishes for some I ∈ I. We will thus strengthen Assumption CEB by assuming that the lower and upper bounds exhibit the same rate, and this rate can be characterized.

Assumption CER: Coverage Error Rate. For each I ∈ I, there exists a positive (bounded) sequence r_I such that

$$0 < \liminf_{n\to\infty}\, r_I^{-1}\,\underline{R}_I \ \le\ \limsup_{n\to\infty}\, r_I^{-1}\,\bar{R}_I < \infty.$$

Heuristically, the idea is that the worst-case coverage error of each I ∈ I is bounded and bounded away from zero after appropriate scaling: c_I < r_I^{−1} sup_{F∈F} L(P_F[θ_F ∈ I] − (1 − α)) < C_I, for constants 0 < c_I ≤ C_I < ∞. This rules out intervals with zero worst-case coverage error. Confidence intervals with exact coverage typically have too large I (e.g., taking the real line with probability 1 − α), too small F (e.g., t-test inference in the Gaussian location model), or pertain to specific cases (e.g., rank-based tests of the median under symmetry). Our framework can accommodate

situations with zero worst-case coverage error by putting all such procedures in the same equivalence class, but this is not as useful in the current context. Thus, we leave unranked procedures that do not exhibit coverage error for at least one F ∈ F: the main focus of our paper is scenarios where coverage error is unavoidable, arguably the most common case in practice. We can now formalize the minimax optimal rate in our framework.

Definition 3: Minimax Optimal Rate. Under Assumption CER, a sequence r is the I/F-minimax optimal coverage error rate if

$$\liminf_{n\to\infty}\ \inf_{I \in \mathcal{I}}\ \frac{r_I}{r} > 0.$$

This definition requires strictly more information than Definition 2. That is, Assumption CER is sufficient but not necessary for identifying the optimal interval in the sense of Definition 2. However, if Assumption CER holds, then any minimax optimal I will attain this rate, and any interval that attains r is of course a minimax optimal interval. Identifying r is often a crucial step in providing practical guidance. Naturally, not all procedures can attain this rate and in many examples, even for those that can, certain implementation details must be chosen appropriately to yield an optimal interval estimator. For example, in Section 4, wild bootstrap intervals can be optimal, but even within this family, only certain bootstrap weights yield the optimal rate.

Assumption CER is not as restrictive as it may seem. Indeed, often it is verified using higher-order asymptotic expansions, in which case even more is known about the worst-case coverage error. In many applications we can characterize the rate and constant of the leading term of the coverage error, as formalized in our final, and strongest, assumption.

Assumption CEE: Coverage Error Expansion. For each I ∈ I, there exists a (bounded) sequence R_{I,F}, with R_{I,F} ≠ 0 for at least one F ∈ F, and a positive (bounded) sequence r_I with R_{I,F} = O(r_I) uniformly in F ∈ F, such that

$$\sup_{F\in\mathcal{F}} L\Big(P_F[\theta_F \in I] - (1-\alpha) - R_{I,F}\Big) = o(r_I). \qquad (2.2)$$

For many classes of confidence intervals Assumption CEE will follow from an Edgeworth ex-

pansion or other higher-order approximation. Sections 3, 4, and 5 discuss concrete and empirically important contexts where such an approximation holds, covering parametric and nonparametric estimands and nuisance parameters, Gaussian and non-Gaussian limiting distributions, and cross-sectional and dependent data.

Following the structure above, Assumption CEE is more than what is required for finding the rate-optimal interval (Definition 2) or the optimal rate (Definition 3), but with the stronger assumption more can be learned from the constants. Notice that R_{I,F} subsumes the rate and constant, and is thus not restricted to be positive (cf. Assumptions CEB and CER). Without loss of generality we can set R_{I,F} = r_{I,F} C_{I,F}, where r_{I,F} is a positive (usually vanishing) sequence, the rate, and C_{I,F} will be a non-vanishing bounded sequence, forming the constant term of the expansion (when it converges). These constants will often be useful to guide practical implementation, such as tuning parameter selection. Section 3 illustrates this point by constructing data-driven coverage-optimal bandwidth selectors in the context of local polynomial nonparametric regression. Calonico, Cattaneo, and Farrell (2018b) further investigate this in the specialized setting of regression discontinuity designs.

Furthermore, if C_{I,F} can be appropriately characterized, uniformly over F, it may be possible to minimize both the rate and constants, i.e., to find the minimax coverage error rate optimal intervals and then, within this group, find the best constants. This brings us to the final definition in our framework.

Definition 4: Minimax Optimal Interval. Under Assumption CEE, an interval I ∈ I is I/F-minimax coverage error optimal if r_I = r (of Definition 3) and

$$\liminf_{n\to\infty}\ \inf_{I' \in \mathcal{I}}\ \frac{\sup_{F\in\mathcal{F}} L(C_{I',F})}{\sup_{F\in\mathcal{F}} L(C_{I,F})} \ \ge\ 1.$$

This ranking requires the strongest assumption, but accordingly, is the most powerful: giving essentially a complete and strict notion of optimality within I and F. This level of information is not always available (see Section 5 for an example). Indeed, we will focus on rate optimality (Definitions 1–3) in the subsequent sections, both in our new results and in unifying the literature, relegating the role of the constants to practical implementation only. In future work, we plan to further investigate the optimality notion given in Definition 4.

Assumptions CEB, CER, and CEE, and Definitions 1–4, complete the description of our proposed optimality framework. It is fairly general with respect to both I and F, and in many cases one or more of the assumptions is verifiable for interesting and large classes of intervals and DGPs. We now turn to three applications: nonparametric regression (Section 3), linear least squares regression (Section 4), and HAR inference (Section 5). Others are mentioned in Section 6.

Remark 1. An alternative idea when ranking confidence interval estimators is to search for the shortest and/or fastest contracting interval among those with asymptotically and/or uniformly conservative coverage (i.e. uniform size control). This method considers coverage as fixed, and not necessarily correct, and optimizes length (or power). We optimize coverage error under assumptions that will, in general, restrict attention to finite-length intervals. The two rankings need not agree because the uniform quality guarantees are different. Specifically, an I is uniformly asymptotically conservative at level α if for any δ > 0, there exists an n_0 = n_0(δ) such that for all n ≥ n_0, inf_{F∈F} P_F[θ_F ∈ I] ≥ (1 − α) − δ. In contrast, we are interested in intervals for which (2.1) vanishes, which translates to the guarantee that for all n ≥ n_0, sup_{F∈F} L(P_F[θ_F ∈ I] − (1 − α)) < δ. See Romano (2004) and Romano, Shaikh, and Wolf (2010) for more discussion.

Remark 2. Our framework focuses squarely on inference quality, and not on quality of point estimation. In general, these goals are not the same, and our framework highlights the distinction: it may be possible to find an excellent approximation to the sampling distribution of a poor point estimator. There are many ways to measure the quality of a point estimator, perhaps the two most common being mean square error and, among unbiased estimators (or within a bias tolerance), precision/efficiency. Coverage error improvements can come at the expense of these measures, and our framework can quantify this tradeoff precisely. For example, in Section 3.4 we show that the MSE-optimal point estimator is suboptimal in terms of coverage error, and furthermore, in some cases the coverage error optimal interval implicitly uses a point estimator that is not even consistent in mean square, revealing a striking gap between the two notions of quality. The distinction between precision and coverage error is also evident in Section 5, where fixed-b HAR procedures are not asymptotically efficient (manifesting, in particular, as longer intervals) but offer coverage improvements. Some of the examples mentioned in Section 6 have the same features. In general, our framework reinforces the point that when the researcher seeks better statistical

inference they should choose a method explicitly for that goal.

3 Application to Local Polynomial Nonparametric Regression

The first application of our framework is to local polynomial regression. We will characterize the minimax optimal coverage error rate r for a popular class of confidence interval estimators (restricting I) under precise smoothness restrictions on the regression function (restricting F). An important lesson of this section, recalling the discussion above, is that r, and the set of intervals which can attain it, depends crucially on both F and I, and in particular through the smoothness assumed for the population regression function (in F) and the smoothness exploited by the interval estimator (in I). We show that, with appropriate choice of bandwidth, standard errors, and quantiles, the robust bias corrected confidence intervals proposed by Calonico, Cattaneo, and Farrell (2018a) are minimax rate optimal in the sense of Definitions 2 and 3.

For this application, we require new technical results to verify Assumption CEE (and hence CEB and CER). Specifically, we obtain novel uniformly valid Edgeworth expansions for local polynomial estimators of the regression function and its derivatives at both interior and boundary points, given in the supplemental appendix and underlying Lemma 3.1 below. These results improve upon the current literature by (i) establishing uniformity over empirically-relevant classes of DGPs, (ii) covering derivative estimation, and (iii) allowing for the uniform kernel.

3.1 The Class of Data Generating Processes

To apply our proposed framework, we must make precise the classes F and I. We begin with F. For a pair of random variables (Y, X), the object of interest is a derivative of the regression function at a point x in the support of X:

$$\theta_F = \mu_F^{(\nu)}(x) := \frac{\partial^\nu}{\partial x^\nu}\, E_F[Y \mid X = x], \qquad (3.1)$$

with ν ∈ Z_+. For the rest of this section, we assume x = 0 and omit the point of evaluation (e.g., θ_F = μ_F^{(ν)}) whenever possible. All generic results in this section cover both interior and boundary cases, and could be naturally extended to vector-valued data.

The class of DGPs is defined by the following set of conditions.

Assumption 3.1 (DGP). {(Y_1, X_1), ..., (Y_n, X_n)} is a random sample from (Y, X) which are distributed according to F. There exist constants S ≥ ν, s ∈ (0, 1], 0 < c < C < ∞, and δ > 8, and a neighborhood of x = 0, none of which depend on F, such that for all x, x′ in the neighborhood: (a) the Lebesgue density of X_i, f(·), is continuous and c ≤ f(x) ≤ C; v(x) := V[Y_i | X_i = x] ≥ c and is continuous; and E[|Y_i|^δ | X_i = x] ≤ C; and (b) μ(·) is S-times continuously differentiable and |μ^{(S)}(x) − μ^{(S)}(x′)| ≤ C|x − x′|^s.

The conditions here are not materially stronger than usual, other than the requirement that they hold independently of F, which is used to prove uniform results. Assumption 3.1(b) highlights the smoothness assumption, which sets a limit on how quickly the worst-case coverage error can decay. The distinction and interplay between the smoothness assumed here and that utilized by the procedure will be important for our results, as made precise below. Procedures that make use of more smoothness will yield faster rates when such smoothness is available, but there is also an important optimality notion among inference procedures that exploit the same level of smoothness. To make these points precise, we must first define the class of confidence interval procedures (and estimators) considered.

3.2 The Class of Confidence Interval Estimators

We restrict I to contain t-test-based intervals constructed using local polynomial methods (i.e., weighted least squares regression), and discuss optimal procedures within this class. Many other ways of forming confidence intervals exist, of course, as well as other nonparametric regression techniques, and all have strengths and weaknesses. We focus on local polynomial t-test-based intervals because they are tractable and popular in empirical work.

We will only briefly review local polynomial estimation (see Fan and Gijbels, 1996, for more). Define μ̂^(ν) via the local regression:

$$\hat\mu^{(\nu)} = \nu!\, e_\nu'\hat\beta = \frac{1}{nh^{\nu}}\,\nu!\, e_\nu'\Gamma^{-1}\Omega Y, \qquad \hat\beta = \operatorname*{arg\,min}_{b\in\mathbb{R}^{p+1}} \sum_{i=1}^{n} \big(Y_i - r_p(X_i)'b\big)^2 K\!\Big(\frac{X_i}{h}\Big), \qquad (3.2)$$

where p ≥ ν is an integer with p − ν odd, e_ν is the (p + 1)-vector with a one in the (ν + 1)th position and zeros in the rest, r_p(u) = (1, u, u², ..., u^p)′, K is a kernel or weighting function, Γ = Σ_{i=1}^{n} (nh)^{−1} K(X_i/h) r_p(X_i/h) r_p(X_i/h)′, Ω = [K(X_1/h) r_p(X_1/h), ..., K(X_n/h) r_p(X_n/h)], and Y = (Y_1, ..., Y_n)′.
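To make the estimator in (3.2) concrete, the sketch below computes μ̂^(ν)(0) = ν! e_ν′β̂ directly from the kernel-weighted least squares problem, avoiding the matrix notation. It is a minimal illustration under our own assumptions, not the authors' code: the function name, the triangular kernel default, and the simulated DGP are all ours.

```python
import numpy as np
from math import factorial

def local_poly_deriv(y, x, nu=0, p=1, h=0.5,
                     kernel=lambda u: np.maximum(1 - np.abs(u), 0.0)):
    """Local polynomial estimate of mu^(nu)(0): nu! times the nu-th WLS coefficient.

    Solves beta_hat = argmin_b sum_i (y_i - r_p(x_i)'b)^2 K(x_i / h), as in (3.2).
    """
    w = kernel(x / h)                                   # kernel weights K(X_i / h)
    keep = w > 0
    R = np.vander(x[keep], p + 1, increasing=True)      # rows r_p(X_i) = (1, X_i, ..., X_i^p)
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(R * sw[:, None], y[keep] * sw, rcond=None)
    return factorial(nu) * beta[nu]

# Toy illustration at an interior point x = 0, with mu(x) = sin(2x).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 500)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=500)
print(local_poly_deriv(y, x, nu=0, p=1, h=0.3))   # estimate of mu(0)  = 0
print(local_poly_deriv(y, x, nu=1, p=2, h=0.4))   # estimate of mu'(0) = 2
```

The choices (ν = 0, p = 1) and (ν = 1, p = 2) respect the p − ν odd convention discussed above.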

16 where p ν is an integer with p ν odd, e ν is the (p + 1)-vector with a one in the (ν + 1) th position and zeros in the rest, r p (u) = (1, u, u 2,..., u p ), K is a kernel or weighting function, Γ = n i=1 (nh) 1 K(X i /h)r p (X i /h)r p (X i /h), Ω = [K(X 1 /h)r p (X 1 /h),..., K(X n /h)r p (X n /h)], and Y = (Y 1,..., Y n ). The two germane quantities here are the bandwidth sequence h, assumed to vanish as n diverges, and p, the order of the polynomial, set as usual so that p ν is odd. These are chosen by the researcher and will impact the coverage error decay rate. The rate depends on the local sample size, nh, and the pointwise bias, determined by h, p, and the assumed smoothness. With these chosen, and a valid standard error choice ˆσ p (detailed below), the standard t-statistic is T p = nh 1+2ν (ˆµ (ν) θ F ) ˆσ p. (3.3) Valid inference requires a choice of the tuning parameter h, which is often regarded as the most difficult in practice and most delicate in theory. Our framework of coverage optimality sheds new light on this problem by motivating inference-optimal (coverage error minimizing) bandwidth choices, an important result from our work for empirical research. To formalize how the bandwidth choice impacts estimation and inference, let us begin with the most common choice by far, and indeed, the default in most software packages: minimizing the mean-squared error (MSE) of the point estimator ˆθ p := ˆµ (ν) (x). To characterize the MSE-optimal bandwidth, suppose for the moment that p S 1. Then the conditional mean and variance of ˆθ p are: E [ˆµ (ν) ] X 1,..., X n = µ (ν) + h p+1 ν ν!e νγ 1 Λ µ(p+1) (p + 1)! + o P(h p+1 ν ), (3.4) with Λ = Ω[(X 1 /h) p+1,, (X n /h) p+1 ] /n, and V [ˆµ (ν) ] X 1,..., X n = 1 nh 1+2ν ν!2 e νγ 1 (hωσω /n)γ 1 e ν, (3.5) with Σ the n n diagonal matrix with elements v(x i ). The MSE-optimal bandwidth will thus obey h mse n 1/(2p+3), whenever µ (p+1) 0. (Throughout this paper, asymptotic orders and their in-probability versions hold uniformly in F, as required by our framework; e.g., A n = o P (a n ) means sup F Fn P F [ A n /a n > ɛ] 0 for every ɛ > 0.) The rate of decay of h mse does not depend on the 15

specific derivative being estimated, though the convergence rate of the point estimate μ̂^(ν) to μ_F^(ν) will depend on ν. This is a well-known feature of local polynomials, but warrants mention as the coverage error decay rate will also not depend on the derivative, as established below.

The MSE-optimal bandwidth is too large for inference: the bias term of (3.4) remains first-order important after scaling by the standard deviation, rendering standard Gaussian inference invalid. To remove this bias term, we consider two approaches: undersmoothing and robust explicit bias correction. The former simply involves choosing a bandwidth that vanishes more rapidly than n^{−1/(2p+3)}, rendering the bias negligible. Explicit bias correction involves subtracting an estimate of the leading term of (3.4), of which only μ^{(p+1)} is unknown, and then inference is made robust by accounting for the variability of this point estimate. The estimate μ̂^{(p+1)} is defined via (3.2), with p + 1 in place of both p and ν throughout, and a bandwidth b := ρ^{−1}h instead of h. These implementation choices have a precise theoretical justification (Calonico, Cattaneo, and Farrell, 2018a). The bias corrected point estimate is

$$\hat\theta_{\mathrm{rbc}} = \hat\mu^{(\nu)} - h^{p+1-\nu}\,\nu!\,e_\nu'\Gamma^{-1}\Lambda\,\frac{\hat\mu^{(p+1)}}{(p+1)!} = \frac{1}{nh^{\nu}}\,\nu!\,e_\nu'\Gamma^{-1}\Omega_{\mathrm{rbc}}Y, \qquad \Omega_{\mathrm{rbc}} = \Omega - \rho^{p+1}\Lambda\, e_{p+1}'\bar\Gamma^{-1}\bar\Omega,$$

where Γ̄ and Ω̄ are defined as above, but with p + 1 and b in place of p and h, respectively. Comparing to (3.2), all that has changed is the matrix premultiplying Y.
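A minimal sketch of the explicit bias-correction step just described, reusing the hypothetical local_poly_deriv helper and the simulated (x, y) data from the earlier snippet. Rather than forming Γ, Λ, and Ω_rbc explicitly, it removes the estimated leading Taylor term from the responses before refitting; by linearity of the weighted fit this matches subtracting the estimated leading bias term in (3.4). The tuning values are illustrative, and the "robust" part of RBC (the adjusted standard error) is handled separately, not here.

```python
from math import factorial

def local_poly_rbc(y, x, nu=0, p=1, h=0.3, rho=1.0):
    """Bias-corrected point estimate of mu^(nu)(0).

    Step 1: estimate mu^(p+1)(0) with an order-(p+1) fit and bandwidth b = rho^{-1} h.
    Step 2: subtract its Taylor contribution x^{p+1} mu_hat^{(p+1)} / (p+1)! from the
            responses and rerun the order-p fit -- equivalent, by linearity of the
            weighted least squares fit, to removing the estimated leading bias in (3.4)."""
    b = h / rho
    mu_p1 = local_poly_deriv(y, x, nu=p + 1, p=p + 1, h=b)       # estimate of mu^(p+1)(0)
    y_adj = y - mu_p1 * x ** (p + 1) / factorial(p + 1)          # remove estimated leading bias
    return local_poly_deriv(y_adj, x, nu=nu, p=p, h=h)

print(local_poly_rbc(y, x, nu=0, p=1, h=0.4, rho=1.0))           # bias-corrected estimate of mu(0)
```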

The final choice for implementation is that of the variance estimator, which is also important for coverage error. This is a crucial aspect that differentiates traditional first-order analyses, where only consistency is required, from higher-order theory, which captures explicitly the uncertainty in variance estimation (among other things). Thus part of finding the optimal procedure in our framework is a careful choice of standard errors. In general, there are two types of higher-order terms that arise due to Studentization. One is the unavoidable estimation error incurred when replacing a population standardization, say σ², with a feasible Studentization, σ̂². However, there is also an error in the difference between the population variability of the point estimate (i.e. of the numerator of the t-statistic) and the population standardization chosen. A fixed-n approach is one where the Studentization σ̂² is chosen directly to estimate V[√(nh^{1+2ν}) θ̂ | X_1, ..., X_n], a fixed-n calculation. Fixed-n Studentization completely removes the second type of error. This can be contrasted with the popular practice of Studentizing with a feasible version of the asymptotic variance, i.e. finding the probability limit of V[√(nh^{1+2ν}) θ̂ | X_1, ..., X_n] and estimating any unknown quantities. This is valid to first order, but the difference between V[√(nh^{1+2ν}) θ̂ | X_1, ..., X_n] and its limit manifests in the higher-order expansion, exacerbating coverage error. At boundary points these errors are O(h) and thus particularly damaging to coverage. Other possibilities are available and may also be detrimental to coverage.

We can now define the class of confidence intervals we consider, which indexes choices of point estimates, standard errors, bandwidths, and quantiles. Regularity conditions are placed upon the kernel function. All of these represent choices made by the researcher, and each choice impacts the coverage error, as made precise below. Our results give practical guidance for these choices, the most important of which is the choice of bandwidth. In general, we shall write I and I, but when discussing specific choices it will be useful notationally to write the intervals as functions of these choices, such as I(h) for an interval based on a bandwidth h or I(θ̂, σ̂) for specific choices of point estimate and standard errors. The other choices will be clear from the context. In particular, let I_p = I(θ̂_p, σ̂_p) and I_rbc = I(θ̂_rbc, σ̂_rbc), where σ̂_p and σ̂_rbc are defined below.

Assumption 3.2 (Confidence Intervals). (a) I is of the form

$$I = \Big[\,\hat\theta - z_u\,\hat\sigma\big/\sqrt{nh^{1+2\nu}},\ \ \hat\theta - z_l\,\hat\sigma\big/\sqrt{nh^{1+2\nu}}\,\Big] \qquad (3.6)$$

for a point estimator θ̂ = θ̂_p or θ̂_rbc, a well-behaved standard error σ̂ (defined below), fixed quantiles z_l and z_u, a nonrandom bandwidth sequence h = Hn^{−γ}, with H bounded and bounded away from zero and c ≤ γ ≤ 1 − c for c > 0, not depending on I, and, if required, a fixed, bounded ρ = h/b. (b) The kernel K is supported on [−1, 1], positive, bounded, and even. Further, K(u) is either constant (the uniform kernel) or (1, K(u)r_{3(p+1)}(u)′) is linearly independent on [−1, 1]. The order p is at least ν and p − ν is odd.

By well-behaved standard errors here we mean two things. First, we assume that the standard errors are (uniformly) valid, in that the associated t-statistic is asymptotically standard Normal. Inference based on invalid standard errors will be dominated, trivially, and thus while we could in principle account for this, we assume it away for simplicity. Second, and more importantly, is

that the ingredients of the standard errors must obey Cramér's condition. This can be assumed directly, but for kernel-based estimators of V[√(nh^{1+2ν}) θ̂ | X_1, ..., X_n] or its limit, we prove in the supplement that Assumption 3.2(b) ensures this. In particular, for θ̂_p and θ̂_rbc, we will focus on the following fixed-n standard errors, following the ideas above. Let Σ̂_p and Σ̂_rbc be the diagonal matrices of estimates of v(X_i), given by v̂(X_i) = (Y_i − r_p(X_i)′β̂)² for the former and v̂(X_i) = (Y_i − r_{p+1}(X_i)′β̂_{p+1})² for the latter, where β̂_{p+1} is defined as in (3.2), with p + 1 in place of p and b instead of h. Then, we let

$$\hat\sigma^2_p = \nu!^2\, e_\nu'\Gamma^{-1}\big(h\,\Omega\hat\Sigma_p\Omega'/n\big)\Gamma^{-1}e_\nu \quad\text{and}\quad \hat\sigma^2_{\mathrm{rbc}} = \nu!^2\, e_\nu'\Gamma^{-1}\big(h\,\Omega_{\mathrm{rbc}}\hat\Sigma_{\mathrm{rbc}}\Omega_{\mathrm{rbc}}'/n\big)\Gamma^{-1}e_\nu. \qquad (3.7)$$

For the quantiles, the most common choices are z_l = Φ^{−1}(α/2) =: z_{α/2} and z_u = Φ^{−1}(1 − α/2) =: z_{1−α/2}, where Φ is the standard Normal distribution function, but our results allow for other options. For coverage error purposes, symmetric choices, i.e. where Φ^{(1)}(z_l) = Φ^{(1)}(z_u), yield improvements in coverage error due to cancellations in Edgeworth expansion terms. Asymmetric choices such that Φ(z_u) − Φ(z_l) = 1 − α can still yield correct coverage, but at a slower rate.

Under Assumption 3.2(b) we prove (in the supplement) that the appropriate n-varying version of Cramér's condition holds for all I ∈ I. We do not need to make an opaque high-level assumption. Prior work on Edgeworth expansions for nonparametric inference has, explicitly or implicitly, ruled out the uniform kernel (Hall, 1991; Chen and Qin, 2002; Calonico, Cattaneo, and Farrell, 2018a) or treated the regressors as fixed (Hall, 1992b; Neumann, 1997). We are able to include the uniform kernel, which is important to account for popular empirical practice (i.e., local least squares). All popular kernel functions are now allowed for by Assumption 3.2(b), including uniform, triangular, Epanechnikov, and so forth.
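The fixed-n Studentization in (3.7) can be illustrated by working with the effective linear-smoother weights of the point estimator. The sketch below is our own simplified construction under stated assumptions (it reuses the hypothetical x, y data from the earlier snippets and covers only the non-bias-corrected interval): it writes μ̂^(ν)(0) = Σ_i w_i Y_i, plugs the squared local regression residuals v̂(X_i) into Var[μ̂^(ν) | X] = Σ_i w_i² v̂(X_i), and forms the symmetric interval of the form (3.6); the nh^{1+2ν} scaling of (3.7) is absorbed directly into the interval.

```python
import numpy as np
from math import factorial
from scipy import stats

def fixed_n_ci(y, x, nu=0, p=1, h=0.3, alpha=0.05,
               kernel=lambda u: np.maximum(1 - np.abs(u), 0.0)):
    """Fixed-n Studentized confidence interval for mu^(nu)(0) from the order-p local fit."""
    w = kernel(x / h)
    keep = w > 0
    R = np.vander(x[keep], p + 1, increasing=True)
    W = np.diag(w[keep])
    A = np.linalg.solve(R.T @ W @ R, R.T @ W)      # beta_hat = A y: rows of A are WLS weights
    beta = A @ y[keep]
    weights = factorial(nu) * A[nu]                 # effective weights: mu_hat^(nu) = weights'y
    vhat = (y[keep] - R @ beta) ** 2                # v_hat(X_i): squared local residuals
    est = weights @ y[keep]
    var = np.sum(weights ** 2 * vhat)               # plug-in fixed-n variance of the estimator
    z = stats.norm.ppf(1 - alpha / 2)
    return est - z * np.sqrt(var), est + z * np.sqrt(var)

print(fixed_n_ci(y, x, nu=0, p=1, h=0.3))           # nominal 95% interval for mu(0)
```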

3.3 Uniform Coverage Error Expansions

Before we can apply the framework of Section 2 to this class of problems we must verify one of the assumptions. We now establish uniformly valid coverage error expansions, which verify Assumption CEE and define the R_{I,F}. These are used in the next subsection to identify optimal intervals and rates, and further below to select inference-optimal bandwidths.

As mentioned above, the relationship between S and p captures the interplay between I and F in this problem. This is due to the bias of θ̂_p and θ̂_rbc, which manifests in the expansions. It is convenient to separate the rate and constant portions with specific notation. Let the (fixed-n) population bias of h^ν θ̂ be denoted by h^η ψ_{I,F}, where both η > 0 and ψ_{I,F} depend on the specific procedure and F. In general, the rate will be known but the constants may be unknown or even (if p > S) uncharacterizable without further assumptions (details are in the supplement). For example, in the case of the MSE-optimal bandwidth discussed above, with p < S, η = p + 1 and ψ_{I_p,F} is ν! e_ν′E[Γ]^{−1}E[Λ] μ^{(p+1)}/(p+1)!; c.f. Equation (3.4). In this notation, explicit bias correction removes an estimate of h^{η−ν} ψ_{I_p,F}. For coverage to be uniformly correct in large samples, we will assume that √(nh) h^η vanishes asymptotically, making no explicit mention of smoothness. We can then use the generic expansions to study the coverage error of each interval and its dependence on p and S. For example, in the case of the standard approach, using θ̂_p and σ̂_p, this is the standard undersmoothing requirement for correct coverage.

The coverage error expansions are given next. For an interval I, the coverage error is the difference of Edgeworth expansions for the associated t-statistic, evaluated at each quantile. The expansion is given in terms of six functions ω_{k,I,F}(z), k = 1, 2, ..., 6. These are cumbersome notationally, and so the exact forms are deferred to the supplement. All that is important for our results is that they are known for all I ∈ I and F ∈ F, bounded, and bounded away from zero for at least some F ∈ F, and most crucially, that ω_1, ω_2, and ω_3 are even functions of z, while ω_4, ω_5, and ω_6 are odd. Also appearing is λ_{I,F}, a generic placeholder capturing the mismatch between the variance of the numerator of the t-statistic and the population standardization chosen (i.e. the quantity estimated by σ̂ of I). We cannot make this error precise for all choices, but we consider two important special cases. First, employing an estimate of the asymptotic variance renders λ_{I,F} = O(h) at boundary points. Second, the fixed-n Studentizations (3.7) yield λ_{I,F} = 0. For other choices, the rates and constants may change, but it is important to point out that the coverage error rate cannot be improved beyond the ones shown through the choice of Studentization alone (see discussion in the supplement). Let λ_I be such that sup_{F∈F} λ_{I,F} = O(λ_I) = o(1).

Our main technical result for local polynomials is the following lemma.

Lemma 3.1. Let F collect all F which obey Assumption 3.1 and I collect all I that obey Assumption 3.2. Then, uniformly over I, if γ > 1/(1 + 2η) and z_l, z_u are such that Φ(z_u) − Φ(z_l) = 1 − α, then

$$\sup_{F\in\mathcal{F}} L\Big(P_F[\theta_F \in I] - (1-\alpha) - R_{I,F}\Big) = o(r_I),$$

where r_I = max{(nh)^{−1}, nh^{1+2η}, h^η, λ_I} and

$$\begin{aligned} R_{I,F} ={}& \frac{1}{\sqrt{nh}}\big\{\omega_{1,I,F}(z_u) - \omega_{1,I,F}(z_l)\big\} + \sqrt{nh}\,h^{\eta}\big\{\psi_{I,F}\,[\omega_{2,I,F}(z_u) - \omega_{2,I,F}(z_l)]\big\} \\ &+ \frac{1}{nh}\big\{\omega_{4,I,F}(z_u) - \omega_{4,I,F}(z_l)\big\} + nh^{1+2\eta}\big\{\psi_{I,F}^2\,[\omega_{5,I,F}(z_u) - \omega_{5,I,F}(z_l)]\big\} \\ &+ h^{\eta}\big\{\psi_{I,F}\,[\omega_{6,I,F}(z_u) - \omega_{6,I,F}(z_l)]\big\} + \lambda_{I,F}\big\{\omega_{3,I,F}(z_u) - \omega_{3,I,F}(z_l)\big\}; \end{aligned}$$

otherwise sup_{F∈F} L(P_F[θ_F ∈ I] − (1 − α)) ≍ 1.

This result is quite general, covering unadjusted, undersmoothed, and robust bias corrected confidence intervals, as well as other methods (Remark 3.1), at interior and boundary points. To fully utilize this result, and identify optimal procedures, we have to specify the relationship of p to S. However, even at this level of generality, some important conclusions are available. First, this shows the well-known result that symmetric intervals, with z_l = −z_u, have superior coverage: the even functions ω_1, ω_2, and ω_3 cancel, and these are the slowest-decaying. Second, the final conclusion of the lemma simply formalizes the idea that the bandwidth must vanish at the appropriate rate (among other choices) lest worst-case coverage error persist asymptotically. Notice that such intervals can be ranked in our framework, but are dominated (Definition 1). Third, I such that λ_{I,F} = 0 yield superior coverage. Taking these three conclusions into account, for the rest of the paper we use the fixed-n standard errors of (3.7) and z_l = z_{α/2} := Φ^{−1}(α/2) and z_u = z_{1−α/2} := Φ^{−1}(1 − α/2). The coverage error of such an interval is

$$R_{I,F} = \frac{1}{nh}\big\{2\,\omega_{4,I,F}(z_{\alpha/2})\big\} + nh^{1+2\eta}\big\{2\,\psi_{I,F}^2\,\omega_{5,I,F}(z_{\alpha/2})\big\} + h^{\eta}\big\{2\,\psi_{I,F}\,\omega_{6,I,F}(z_{\alpha/2})\big\}. \qquad (3.8)$$

Below we use this form to obtain the optimal rates and intervals, as well as to select the bandwidths. Finally, Lemma 3.1 implicitly reveals how fundamentally different are point estimation and inference, recalling Remark 2. First, the two may proceed at different rates. Observe that the rate of R_{I,F} does not depend on the order of the derivative being estimated, ν. As a consequence, neither will the optimal rate r (Definition 3) nor the optimal procedure (Definition 2), encompassing, in


A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Large Sample Properties of Partitioning-Based Series Estimators

Large Sample Properties of Partitioning-Based Series Estimators Large Sample Properties of Partitioning-Based Series Estimators Matias D. Cattaneo Max H. Farrell Yingjie Feng April 13, 2018 Abstract We present large sample results for partitioning-based least squares

More information

Program Evaluation with High-Dimensional Data

Program Evaluation with High-Dimensional Data Program Evaluation with High-Dimensional Data Alexandre Belloni Duke Victor Chernozhukov MIT Iván Fernández-Val BU Christian Hansen Booth ESWC 215 August 17, 215 Introduction Goal is to perform inference

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

Regression Discontinuity Designs in Stata

Regression Discontinuity Designs in Stata Regression Discontinuity Designs in Stata Matias D. Cattaneo University of Michigan July 30, 2015 Overview Main goal: learn about treatment effect of policy or intervention. If treatment randomization

More information

ORIGINS OF STOCHASTIC PROGRAMMING

ORIGINS OF STOCHASTIC PROGRAMMING ORIGINS OF STOCHASTIC PROGRAMMING Early 1950 s: in applications of Linear Programming unknown values of coefficients: demands, technological coefficients, yields, etc. QUOTATION Dantzig, Interfaces 20,1990

More information

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids

More information

Supplement to On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference

Supplement to On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference Supplement to On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference Sebastian Calonico Matias D. Cattaneo Max H. Farrell December 16, 2017 This supplement contains technical

More information

Inference on distributions and quantiles using a finite-sample Dirichlet process

Inference on distributions and quantiles using a finite-sample Dirichlet process Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics

More information

Location Properties of Point Estimators in Linear Instrumental Variables and Related Models

Location Properties of Point Estimators in Linear Instrumental Variables and Related Models Location Properties of Point Estimators in Linear Instrumental Variables and Related Models Keisuke Hirano Department of Economics University of Arizona hirano@u.arizona.edu Jack R. Porter Department of

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Nonparametric Regression

Nonparametric Regression Nonparametric Regression Econ 674 Purdue University April 8, 2009 Justin L. Tobias (Purdue) Nonparametric Regression April 8, 2009 1 / 31 Consider the univariate nonparametric regression model: where y

More information

Reliable Inference in Conditions of Extreme Events. Adriana Cornea

Reliable Inference in Conditions of Extreme Events. Adriana Cornea Reliable Inference in Conditions of Extreme Events by Adriana Cornea University of Exeter Business School Department of Economics ExISta Early Career Event October 17, 2012 Outline of the talk Extreme

More information

Peter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11

Peter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11 Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of

More information

Nonparametric Methods

Nonparametric Methods Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han Victoria University of Wellington New Zealand Robert de Jong Ohio State University U.S.A October, 2003 Abstract This paper considers Closest

More information

A Simple Adjustment for Bandwidth Snooping

A Simple Adjustment for Bandwidth Snooping A Simple Adjustment for Bandwidth Snooping Timothy B. Armstrong Yale University Michal Kolesár Princeton University June 28, 2017 Abstract Kernel-based estimators such as local polynomial estimators in

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

What s New in Econometrics. Lecture 13

What s New in Econometrics. Lecture 13 What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo September 6, 2011 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

Refining the Central Limit Theorem Approximation via Extreme Value Theory

Refining the Central Limit Theorem Approximation via Extreme Value Theory Refining the Central Limit Theorem Approximation via Extreme Value Theory Ulrich K. Müller Economics Department Princeton University February 2018 Abstract We suggest approximating the distribution of

More information

Semi-Nonparametric Inferences for Massive Data

Semi-Nonparametric Inferences for Massive Data Semi-Nonparametric Inferences for Massive Data Guang Cheng 1 Department of Statistics Purdue University Statistics Seminar at NCSU October, 2015 1 Acknowledge NSF, Simons Foundation and ONR. A Joint Work

More information

22 : Hilbert Space Embeddings of Distributions

22 : Hilbert Space Embeddings of Distributions 10-708: Probabilistic Graphical Models 10-708, Spring 2014 22 : Hilbert Space Embeddings of Distributions Lecturer: Eric P. Xing Scribes: Sujay Kumar Jauhar and Zhiguang Huo 1 Introduction and Motivation

More information

Section 7: Local linear regression (loess) and regression discontinuity designs

Section 7: Local linear regression (loess) and regression discontinuity designs Section 7: Local linear regression (loess) and regression discontinuity designs Yotam Shem-Tov Fall 2015 Yotam Shem-Tov STAT 239/ PS 236A October 26, 2015 1 / 57 Motivation We will focus on local linear

More information

Nonparametric Cointegrating Regression with Endogeneity and Long Memory

Nonparametric Cointegrating Regression with Endogeneity and Long Memory Nonparametric Cointegrating Regression with Endogeneity and Long Memory Qiying Wang School of Mathematics and Statistics TheUniversityofSydney Peter C. B. Phillips Yale University, University of Auckland

More information

Computational Tasks and Models

Computational Tasks and Models 1 Computational Tasks and Models Overview: We assume that the reader is familiar with computing devices but may associate the notion of computation with specific incarnations of it. Our first goal is to

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

LECTURE 10: REVIEW OF POWER SERIES. 1. Motivation

LECTURE 10: REVIEW OF POWER SERIES. 1. Motivation LECTURE 10: REVIEW OF POWER SERIES By definition, a power series centered at x 0 is a series of the form where a 0, a 1,... and x 0 are constants. For convenience, we shall mostly be concerned with the

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

Algorithm Independent Topics Lecture 6

Algorithm Independent Topics Lecture 6 Algorithm Independent Topics Lecture 6 Jason Corso SUNY at Buffalo Feb. 23 2009 J. Corso (SUNY at Buffalo) Algorithm Independent Topics Lecture 6 Feb. 23 2009 1 / 45 Introduction Now that we ve built an

More information

Statistica Sinica Preprint No: SS

Statistica Sinica Preprint No: SS Statistica Sinica Preprint No: SS-017-0013 Title A Bootstrap Method for Constructing Pointwise and Uniform Confidence Bands for Conditional Quantile Functions Manuscript ID SS-017-0013 URL http://wwwstatsinicaedutw/statistica/

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Bootstrap Tests: How Many Bootstraps?

Bootstrap Tests: How Many Bootstraps? Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002

More information

Robust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich

Robust Performance Hypothesis Testing with the Variance. Institute for Empirical Research in Economics University of Zurich Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 516 Robust Performance Hypothesis Testing with the Variance Olivier Ledoit and Michael

More information

Duration-Based Volatility Estimation

Duration-Based Volatility Estimation A Dual Approach to RV Torben G. Andersen, Northwestern University Dobrislav Dobrev, Federal Reserve Board of Governors Ernst Schaumburg, Northwestern Univeristy CHICAGO-ARGONNE INSTITUTE ON COMPUTATIONAL

More information

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap University of Zurich Department of Economics Working Paper Series ISSN 1664-7041 (print) ISSN 1664-705X (online) Working Paper No. 254 Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

MFM Practitioner Module: Risk & Asset Allocation. John Dodson. February 18, 2015

MFM Practitioner Module: Risk & Asset Allocation. John Dodson. February 18, 2015 MFM Practitioner Module: Risk & Asset Allocation February 18, 2015 No introduction to portfolio optimization would be complete without acknowledging the significant contribution of the Markowitz mean-variance

More information

11. Bootstrap Methods

11. Bootstrap Methods 11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods

More information

Understanding Generalization Error: Bounds and Decompositions

Understanding Generalization Error: Bounds and Decompositions CIS 520: Machine Learning Spring 2018: Lecture 11 Understanding Generalization Error: Bounds and Decompositions Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the

More information

Asymptotic Relative Efficiency in Estimation

Asymptotic Relative Efficiency in Estimation Asymptotic Relative Efficiency in Estimation Robert Serfling University of Texas at Dallas October 2009 Prepared for forthcoming INTERNATIONAL ENCYCLOPEDIA OF STATISTICAL SCIENCES, to be published by Springer

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

Statistical Properties of Numerical Derivatives

Statistical Properties of Numerical Derivatives Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective

More information

optimal inference in a class of nonparametric models

optimal inference in a class of nonparametric models optimal inference in a class of nonparametric models Timothy Armstrong (Yale University) Michal Kolesár (Princeton University) September 2015 setup Interested in inference on linear functional Lf in regression

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Inverse problems in statistics

Inverse problems in statistics Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) Yale, May 2 2011 p. 1/35 Introduction There exist many fields where inverse problems appear Astronomy (Hubble satellite).

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Inference in Regression Discontinuity Designs with a Discrete Running Variable

Inference in Regression Discontinuity Designs with a Discrete Running Variable Inference in Regression Discontinuity Designs with a Discrete Running Variable Michal Kolesár Christoph Rothe arxiv:1606.04086v4 [stat.ap] 18 Nov 2017 November 21, 2017 Abstract We consider inference in

More information

Multiscale Adaptive Inference on Conditional Moment Inequalities

Multiscale Adaptive Inference on Conditional Moment Inequalities Multiscale Adaptive Inference on Conditional Moment Inequalities Timothy B. Armstrong 1 Hock Peng Chan 2 1 Yale University 2 National University of Singapore June 2013 Conditional moment inequality models

More information

OPTIMAL INFERENCE IN A CLASS OF REGRESSION MODELS. Timothy B. Armstrong and Michal Kolesár. May 2016 COWLES FOUNDATION DISCUSSION PAPER NO.

OPTIMAL INFERENCE IN A CLASS OF REGRESSION MODELS. Timothy B. Armstrong and Michal Kolesár. May 2016 COWLES FOUNDATION DISCUSSION PAPER NO. OPTIMAL INFERENCE IN A CLASS OF REGRESSION MODELS By Timothy B. Armstrong and Michal Kolesár May 2016 COWLES FOUNDATION DISCUSSION PAPER NO. 2043 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS YALE UNIVERSITY

More information

FINITE-SAMPLE OPTIMAL ESTIMATION AND INFERENCE ON AVERAGE TREATMENT EFFECTS UNDER UNCONFOUNDEDNESS. Timothy B. Armstrong and Michal Kolesár

FINITE-SAMPLE OPTIMAL ESTIMATION AND INFERENCE ON AVERAGE TREATMENT EFFECTS UNDER UNCONFOUNDEDNESS. Timothy B. Armstrong and Michal Kolesár FINITE-SAMPLE OPTIMAL ESTIMATION AND INFERENCE ON AVERAGE TREATMENT EFFECTS UNDER UNCONFOUNDEDNESS By Timothy B. Armstrong and Michal Kolesár December 2017 Revised December 2018 COWLES FOUNDATION DISCUSSION

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric

More information

ON THE UNIFORM ASYMPTOTIC VALIDITY OF SUBSAMPLING AND THE BOOTSTRAP. Joseph P. Romano Azeem M. Shaikh

ON THE UNIFORM ASYMPTOTIC VALIDITY OF SUBSAMPLING AND THE BOOTSTRAP. Joseph P. Romano Azeem M. Shaikh ON THE UNIFORM ASYMPTOTIC VALIDITY OF SUBSAMPLING AND THE BOOTSTRAP By Joseph P. Romano Azeem M. Shaikh Technical Report No. 2010-03 April 2010 Department of Statistics STANFORD UNIVERSITY Stanford, California

More information