optimal inference in a class of nonparametric models

1 optimal inference in a class of nonparametric models Timothy Armstrong (Yale University) Michal Kolesár (Princeton University) September 2015

2 setup

Interested in inference on a linear functional $Lf$ in the regression model
$$y_i = f(x_i) + u_i, \qquad u_i \sim N(0, \sigma^2(x_i)),$$
where $x_i$ is fixed and $\sigma^2(x_i)$ is known. Important special cases:

1. Inference at a point: $Lf = f(0)$
2. Regression discontinuity: $Lf = f(0^+) - f(0^-)$
3. ATE under unconfoundedness: $x_i = (w_i, d_i)$, $Lf = \frac{1}{n}\sum_i \bigl(f(w_i, 1) - f(w_i, 0)\bigr)$
4. Partially linear model

3 key assumption

Convexity Assumption: $f \in \mathcal{F}$, a known convex set.

Rules out e.g. sparsity, but not the usual shape/smoothness restrictions:

- Monotonicity: $\mathcal{F} = \{f : f \text{ non-increasing}\}$
- Lipschitz class $\mathcal{F}_{\mathrm{Lip}}(C) = \{f : |f(x_1) - f(x_2)| \le C|x_1 - x_2|\}$ (or Hölder-class generalizations)
- Taylor class $\mathcal{F}_{T,2}(C) = \{f : |f(x) - f(0) - f'(0)x| \le Cx^2\}$ (useful for RD / inference at a point)
- Sign restrictions in linear regression: $\{f(x) = x'\beta : \beta_j \ge 0,\ j \in J\}$

Will take $C$ as known if necessary, and ask later whether this can be relaxed.

4 notions of finite-sample optimality

Normality $\Rightarrow$ can derive finite-sample procedures that minimize the worst-case loss over $\mathcal{G} \subseteq \mathcal{F}$; without Normality, the procedures remain valid and optimal asymptotically, uniformly over $\mathcal{F}$, under regularity conditions.

1. Setting $\mathcal{G} = \mathcal{F}$ yields minimax procedures. The problem is well studied if the loss is MSE: general solution in Donoho (1994), used to derive optimal kernels and rates of convergence (Stone, 1980; Fan, 1993; Cheng, Fan, and Marron, 1997). Donoho (1994) derives fixed-length confidence intervals (CIs) that are almost optimal.
2. $\mathcal{G} \subsetneq \mathcal{F}$ smoother functions: adaptive inference ("directing power"). For two-sided CIs, Cai and Low (2004) give bounds.

5 new finite-sample results: one-sided cis

Derive one-sided CIs $[\hat c, \infty)$ that minimize maximum quantiles of excess length over $\mathcal G$, with
$$\hat c = \hat L - \overline{\mathrm{bias}}_{\mathcal F}(\hat L) - z_{1-\alpha}\,\mathrm{sd}(\hat L),$$
for an optimal estimator $\hat L$.

- For the case $\mathcal F = \mathcal G$ (minimax CIs), $\hat L$ has the same form as the minimax MSE estimators / fixed-length CIs of Donoho (1994).
- We show that if $\mathcal F$ is symmetric, adaptation is severely limited. Adaptation requires non-convexity or shape restrictions: otherwise, one cannot do better at smaller $C$ while maintaining coverage for larger $C$.
- Conversely, any inference method that claims to do better than minimax CIs when $f$ is smooth must be size distorted for some $f \in \mathcal F(C)$.
- Related to Low (1997), who shows that adapting to derivative smoothness classes is limited for two-sided (random-length) CIs.

6 new finite-sample results: two-sided cis

We derive two-sided CIs that minimize expected length over $\mathcal G = \{g\}$, solving the problem of adaptation to a function posed in Cai, Low, and Xia (2013). This can be used to bound the scope for adaptivity.

7 implications for optimal bandwidth choice

Asymptotically, optimal procedures often correspond to kernel estimators with a fixed (optimal) kernel, and a bandwidth that depends on the optimality criterion. We find that for RD and inference at a point:

- Optimal 95% fixed-length CIs use a larger bandwidth than minimax MSE estimators. Undersmoothing cannot be optimal.
- Recentering CIs by estimating the bias cannot be optimal: it's essentially equivalent to using a higher-order kernel and undersmoothing (Calonico, Cattaneo, and Titiunik, 2014).
- The difference is small: the CI around the minimax MSE estimator is only about 1% longer.
- In practice, one can keep the same bandwidth as for estimation, and construct the CI around it using a worst-case bias correction.

8 applications

We apply the general results to:

1. RD with $\mathcal F = \{f_+ + f_- : f_\pm \in \mathcal F_{T,2}(C)\}$ as in Cheng, Fan, and Marron (1997). Optimal bandwidths balance the number of effective observations on each side of the cutoff. Illustrated with the empirical application from Lee (2008).
2. Linear regression with $\beta$ possibly constrained (sign restrictions, sparsity, elliptical constraints).
3. Sample average treatment effect under unconfoundedness under a Hölder class (separate paper).

9 incomplete list of related literature

- Stats literature on minimax estimation/inference/rates of convergence/adaptivity: Ibragimov and Khas'minskii (1985), Donoho and Liu (1991), Donoho and Low (1992), Donoho (1994), Low (1995), Low (1997), Cai and Low (2004), Cai, Low, and Xia (2013), Cheng, Fan, and Marron (1997), Fan (1993), Fan, Gasser, Gijbels, Brockmann, and Engel (1997), Lepski and Tsybakov (2000)
- Non-standard CIs: Imbens and Manski (2004), Müller and Norets (2012), Calonico, Cattaneo, and Titiunik (2014), Calonico, Cattaneo, and Farrell (2015), Rothe (2015)
- Adaptive estimation/inference in econometrics: Sun (2005), Armstrong (2015), Chernozhukov, Chetverikov, and Kato (2014)

10 Finite-Sample results Asymptotic results Applications Conclusion

11 running example

Consider the problem of inference on $f(0)$ when $f$ is restricted to be in the Lipschitz class
$$\mathcal F = \mathcal F_{\mathrm{Lip}}(C) = \{f : |f(x_1) - f(x_2)| \le C|x_1 - x_2|\}.$$
Assume $\sigma(x) = \sigma$, known.

12 performance criteria

To measure the performance of one-sided $1-\alpha$ CIs $[\hat c, \infty)$, we use maximum quantiles of excess length,
$$EL_\beta(\hat c, \mathcal G) = \sup_{g \in \mathcal G} q_{g,\beta}(Lg - \hat c),$$
where $q_{g,\beta}$ is the $\beta$th quantile under $g$.

For two-sided CIs, we focus on fixed-length CIs $\hat L \pm \chi$, where $\hat L$ is an estimator and $\chi$ is chosen to satisfy coverage:
$$\chi_\alpha(\hat L) = \min\left\{\chi : \inf_{f \in \mathcal F} P_f\bigl(|\hat L - Lf| \le \chi\bigr) \ge 1 - \alpha\right\}$$

For estimation, we use maximum MSE, $R_{MSE}(\hat L) = \sup_{f \in \mathcal F} E_f(\hat L - Lf)^2$.

13 minimax testing problem

In the running example ($Lf = f(0)$, $\mathcal F = \mathcal F_{\mathrm{Lip}}(C)$), consider the minimax test of $H_0\colon Lf \le L_0$ against $H_1\colon Lf \ge L_0 + 2b$.

Inverting minimax tests yields the CI that minimizes $EL_\beta(\hat c, \mathcal F)$, where $\beta$ is the minimax power of the test. First need to find the least favorable null and alternative.

The problem is equivalent to $Y \sim N(\mu, \sigma^2 I)$, $\mu = (f(x_1), \ldots, f(x_n)) \in M$ convex. Both $M_0 = M \cap \{f : Lf \le L_0\}$ and $M_1 = M \cap \{g : Lg \ge L_0 + 2b\}$ are convex $\Rightarrow$ least favorable functions minimize the distance between them (Ingster and Suslina, 2003):
$$(g^*, f^*) = \operatorname*{argmin}_{g \in M_1,\, f \in M_0}\ \sum_{i=1}^n (g(x_i) - f(x_i))^2.$$

14

$$g^*(x) = L_0 + b + (b - C|x|)_+, \qquad f^*(x) = L_0 + b - (b - C|x|)_+$$

[Figure: $g^*$ and $f^*$ against $x$: $g^*$ is a triangular bump peaking at $L_0 + 2b$ at $x = 0$, $f^*$ a triangular dip reaching $L_0$ at $x = 0$, and the two coincide at $L_0 + b$ outside $[-b/C, b/C]$.]
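To make the closed form concrete, here is a minimal numerical sketch (our own illustration, not the authors' code; the grid and constants are made up) that evaluates the least favorable pair:

```python
import numpy as np

def lf_pair(x, L0, b, C):
    """Least favorable pair for testing Lf <= L0 against Lf >= L0 + 2b over F_Lip(C)."""
    bump = np.maximum(b - C * np.abs(x), 0.0)  # (b - C|x|)_+, zero outside [-b/C, b/C]
    return L0 + b + bump, L0 + b - bump        # g*(x), f*(x)

x = np.linspace(-1.0, 1.0, 9)
g_star, f_star = lf_pair(x, L0=0.0, b=0.5, C=1.0)
print(g_star)  # peaks at L0 + 2b = 1.0 at x = 0
print(f_star)  # dips to L0 = 0.0 at x = 0
```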

15

$$g^*(x) = L_0 + b + (b - C|x|)_+, \qquad f^*(x) = L_0 + b - (b - C|x|)_+$$

The minimax test is then the LR test of $\mu_0 = (f^*(x_1), \ldots, f^*(x_n))$ against $\mu_1 = (g^*(x_1), \ldots, g^*(x_n))$: reject for large values of $Y'(\mu_1 - \mu_0)$.

The test can be written as rejecting whenever
$$\hat L(h) - L_0 - b\left(1 - \frac{\sum_{i=1}^n k_T(x_i/h)^2}{\sum_{i=1}^n k_T(x_i/h)}\right) \ge \frac{\bigl(\sum_{i=1}^n k_T(x_i/h)^2\bigr)^{1/2}}{\sum_{i=1}^n k_T(x_i/h)}\,\sigma z_{1-\alpha},$$
where $k_T(u) = (1 - |u|)_+$, $h = b/C$, and
$$\hat L(h) = \frac{\sum_{i=1}^n (g^*(x_i) - f^*(x_i))\,Y_i}{\sum_{i=1}^n (g^*(x_i) - f^*(x_i))} = \frac{\sum_{i=1}^n k_T(x_i/h)\,Y_i}{\sum_{i=1}^n k_T(x_i/h)}.$$

Key feature: a non-random bias correction based on the worst-case bias; it does not disappear asymptotically.
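A minimal simulation sketch of this test (our own construction; the design, $\sigma$, and the function $f$ below are made-up values satisfying the Lipschitz constraint):

```python
import numpy as np
from scipy.stats import norm

def k_T(u):
    """Triangular kernel (1 - |u|)_+."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def minimax_test(x, y, L0, b, C, sigma, alpha=0.05):
    """Reject H0: f(0) <= L0 against f(0) >= L0 + 2b, per the display above."""
    w = k_T(x / (b / C))                                   # bandwidth h = b/C
    Lhat = np.sum(w * y) / np.sum(w)                       # triangular-kernel estimate of f(0)
    null_mean = L0 + b * (1.0 - np.sum(w**2) / np.sum(w))  # E[Lhat] at the LF null f*
    sd = sigma * np.sqrt(np.sum(w**2)) / np.sum(w)
    return (Lhat - null_mean) / sd >= norm.ppf(1 - alpha)

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 500)
f = 0.3 - 0.5 * np.abs(x)  # a function in F_Lip(1) with f(0) = 0.3
y = f + 0.1 * rng.standard_normal(x.size)
print(minimax_test(x, y, L0=0.0, b=0.15, C=1.0, sigma=0.1))  # True: rejects here
```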

16 general setup

In general, we observe $Y = Kf + \sigma\epsilon$, where $\epsilon$ is standard Normal and $K$ is a linear operator, with $\langle Kg, Kf\rangle = \sum_i (Kg)(x_i)(Kf)(x_i)$. Heteroscedasticity is handled by setting $Kf = (f(x_1)/\sigma(x_1), \ldots, f(x_n)/\sigma(x_n))$ and $Y = (Y_1/\sigma(x_1), \ldots, Y_n/\sigma(x_n))$.

Define the modulus of continuity (Donoho and Liu, 1991):
$$\omega(\delta; \mathcal F) = \sup\{L(g - f) : \|K(g - f)\| \le \delta,\ g, f \in \mathcal F\}$$

Denote the solutions by $g^*_\delta, f^*_\delta$, and let $f^*_{M,\delta} = (g^*_\delta + f^*_\delta)/2$.

The problem of finding LF functions is equivalent to inverting the modulus, $\omega^{-1}(\cdot; \mathcal F)$; in the running example, $g^* = g^*_{\omega^{-1}(2b)}$ and $f^* = f^*_{\omega^{-1}(2b)}$.
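In the running example the modulus can be traced out numerically by sweeping the bump height $b$ of the least favorable pair from slide 14: $\omega(\delta(b)) = 2b$ where $\delta(b) = \|K(g^*_b - f^*_b)\|$. A minimal sketch of this (our own, with a made-up design; in the homoscedastic running example we keep $\sigma$ outside $K$, so $K$ acts as the identity):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 500)  # made-up design points
C = 1.0

def delta_of_b(b):
    d = 2.0 * np.maximum(b - C * np.abs(x), 0.0)  # (g*_b - f*_b)(x_i)
    return np.sqrt(np.sum(d**2))                  # ||K(g*_b - f*_b)||, K = identity here

bs = np.linspace(1e-3, 0.5, 400)
deltas = np.array([delta_of_b(b) for b in bs])    # increasing in b, so invertible
omega = 2.0 * bs                                  # ω(δ(b)) = L(g*_b - f*_b) = 2b
print(np.interp(5.0, deltas, omega))              # ω(δ) at δ = 5, by interpolation
```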

17 class of optimal estimators

Define
$$\hat L_{\delta,\mathcal F} = Lf^*_{M,\delta} + \frac{\omega'(\delta;\mathcal F)}{\delta}\bigl\langle K(g^*_\delta - f^*_\delta),\; Y - Kf^*_{M,\delta}\bigr\rangle$$

These estimators minimize the maximum bias given a variance bound, and vice versa (Low, 1995). Their maximum and minimum bias over $\mathcal F$ satisfy
$$\overline{\mathrm{bias}}_{\mathcal F}(\hat L_{\delta,\mathcal F}) = -\underline{\mathrm{bias}}_{\mathcal F}(\hat L_{\delta,\mathcal F}) = \tfrac{1}{2}\bigl(\omega(\delta;\mathcal F) - \delta\,\omega'(\delta;\mathcal F)\bigr)$$

In the running example: $\hat L(h) = \hat L_{\omega^{-1}(2hC),\,\mathcal F_{\mathrm{Lip}}(C)}$.

18 centrosymmetry and translation invariance

When $\mathcal F$ has additional structure, $\hat L_{\delta,\mathcal F}$ simplifies:

If $\mathcal F$ is translation invariant (for some $\iota$ with $L\iota = 1$, $f + c\iota \in \mathcal F$ for all $f \in \mathcal F$ and $c \in \mathbb R$), then $\delta/\omega'(\delta;\mathcal F) = \langle K(g^*_\delta - f^*_\delta), K\iota\rangle$, and the estimator has Nadaraya–Watson form:
$$\hat L_{\delta,\mathcal F} = Lf^*_{M,\delta} + \frac{\langle K(g^*_\delta - f^*_\delta),\; Y - Kf^*_{M,\delta}\rangle}{\langle K(g^*_\delta - f^*_\delta),\; K\iota\rangle}.$$

If $\mathcal F$ is centrosymmetric ($f \in \mathcal F \Rightarrow -f \in \mathcal F$), then $f^*_\delta = -g^*_\delta$, and (under both conditions)
$$\hat L_{\delta,\mathcal F} = \frac{2\omega'(\delta;\mathcal F)}{\delta}\langle Kg^*_\delta, Y\rangle = \frac{\langle Kg^*_\delta, Y\rangle}{\langle Kg^*_\delta, K\iota\rangle}.$$

19 Theorem 1 (One-sided minimax CI)

Let
$$\hat c_{\alpha,\delta,\mathcal F} = \hat L_{\delta,\mathcal F} - \overline{\mathrm{bias}}_{\mathcal F}(\hat L_{\delta,\mathcal F}) - z_{1-\alpha}\,\sigma\,\omega'(\delta;\mathcal F).$$
Then $[\hat c_{\alpha,\delta,\mathcal F}, \infty)$ is a $1-\alpha$ CI for $Lf$, with coverage minimized at $f^*_\delta$. For $\beta = \Phi(\delta/\sigma - z_{1-\alpha})$, it minimizes $EL_\beta(\hat c, \mathcal F)$ among all one-sided $1-\alpha$ CIs. All quantiles of excess length are maximized at $g^*_\delta$. The minimax excess length at quantile $\beta$ is $EL_\beta(\hat c_{\alpha,\delta,\mathcal F}; \mathcal F) = \omega(\delta;\mathcal F)$.

- $\beta$ is the minimax power of the underlying tests (under translation invariance)
- The bias correction is based on the worst-case bias under $\mathcal F$ and is non-random
- In the running example, using bandwidth $h$ minimizes the $\beta$ quantile of excess length at $\beta = \Phi\bigl(\omega^{-1}(2hC)/\sigma - z_{1-\alpha}\bigr)$

20

For estimation and two-sided CIs, exact optimality results are hard. Donoho (1994) shows that procedures based on $\hat L_{\delta,\mathcal F}$ are minimax optimal if we restrict attention to affine estimators. The results use the fact that the problem is just as hard if we know that $f$ lies in the one-dimensional subfamily $\{\lambda f^*_\delta + (1-\lambda)g^*_\delta : 0 \le \lambda \le 1\}$.

To state these results, consider $Z \sim N(\theta, 1)$, $\theta \in [-\tau, \tau]$:

- The minimax linear estimator is $c_\rho(\tau)Z$ with $c_\rho(\tau) = \tau^2/(1+\tau^2)$ and minimax risk $\rho(\tau) = \tau^2/(1+\tau^2)$
- The shortest fixed-length CI is $c_\chi(\tau)Z \pm \chi_\alpha(c_\chi(\tau)Z)$; the solution is characterized in Drees (1999), similar in spirit to Imbens and Manski (2004)

21 optimal shrinkage in bounded normal means

[Figure: optimal shrinkage coefficients $c_\chi(\tau)$ and $c_\rho(\tau)$ plotted against $\tau$; legend: 90%, 95%, 95% (Estimation).]

22 Theorem (Donoho (1994))

The minimax MSE affine estimator is $\hat L_{\delta,\mathcal F}$, where $\delta$ solves
$$\max_{\delta > 0}\ \frac{\omega(\delta;\mathcal F)^2}{\delta^2}\,\rho\!\left(\frac{\delta}{2\sigma}\right)\sigma^2,$$
and the optimal $\delta$ satisfies $c_\rho(\delta/(2\sigma)) = \delta\,\omega'(\delta;\mathcal F)/\omega(\delta;\mathcal F)$.

The shortest fixed-length affine CI is $\hat L_{\delta,\mathcal F} \pm \frac{\omega(\delta;\mathcal F)}{\delta}\,\sigma\,\chi_\alpha\!\left(\frac{\delta}{2\sigma}\right)$, where $\delta$ solves
$$\max_{\delta > 0}\ \frac{\omega(\delta;\mathcal F)}{\delta}\,\chi_\alpha\!\left(\frac{\delta}{2\sigma}\right)\sigma,$$
and the optimal $\delta$ satisfies $c_\chi(\delta/(2\sigma)) = \delta\,\omega'(\delta;\mathcal F)/\omega(\delta;\mathcal F)$.
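A minimal numeric sketch (our own) that solves the first-order condition $c_\rho(\delta/(2\sigma)) = \delta\omega'(\delta)/\omega(\delta)$ for the running example, reusing the made-up design from the modulus sketch after slide 16:

```python
import numpy as np
from scipy.optimize import brentq

sigma, C = 0.1, 1.0
x = np.linspace(-1.0, 1.0, 500)  # same made-up design as before

bs = np.linspace(1e-3, 0.5, 400)
deltas = np.array([np.sqrt(np.sum((2 * np.maximum(b - C * np.abs(x), 0.0))**2))
                   for b in bs])
omega = 2.0 * bs
domega = np.gradient(omega, deltas)               # ω'(δ) by finite differences

def foc(d):
    w, wp = np.interp(d, deltas, omega), np.interp(d, deltas, domega)
    tau = d / (2.0 * sigma)
    return tau**2 / (1 + tau**2) - d * wp / w     # c_ρ(τ) - δω'(δ)/ω(δ)

delta_mse = brentq(foc, deltas[1], deltas[-2])          # minimax-MSE δ
h_mse = np.interp(delta_mse, deltas, omega) / (2 * C)   # h = b/C, with 2b = ω(δ)
print(delta_mse, h_mse)                                 # ≈ 0.283 and ≈ 0.049 here
```

For this design (uniform on $[-1,1]$, $n = 500$, so $f_X(0) = 1/2$), the numeric bandwidth matches the asymptotic formula on the next slide.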

23

For example, to find the minimax MSE optimal bandwidth in the running example, solve
$$\frac{\delta^2}{4\sigma^2 + \delta^2} = c_\rho(\delta/(2\sigma)) = \frac{\delta\,\omega'(\delta;\mathcal F)}{\omega(\delta;\mathcal F)},$$
which yields, asymptotically,
$$h_{\mathrm{opt,MSE}} = \left(\frac{3\sigma^2}{C^2\, n\, f_X(0)}\right)^{1/3}(1 + o_p(1)).$$

Can also use these results to derive optimal rates of convergence (e.g. Fan (1993); Cheng, Fan, and Marron (1997)): $n^{-1/3}$ here.
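Plugging in the made-up values used in the sketches above (a sanity check of our own, not from the slides):

```python
# n = 500 points uniform on [-1, 1], so f_X(0) = 0.5; sigma = 0.1, C = 1.
sigma, C, n, fx0 = 0.1, 1.0, 500, 0.5
h_mse = (3 * sigma**2 / (C**2 * n * fx0)) ** (1 / 3)
print(h_mse)  # ≈ 0.049, shrinking at the n^{-1/3} rate; matches the numeric solve above
```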

24 adaptive inference

One-sided CIs focus on good performance under the least favorable $f \in \mathcal F$, which may be too pessimistic. Alternative: optimize excess length over a smaller class $\mathcal G$ of smoother functions,
$$\inf_{\hat c}\ \sup_{g \in \mathcal G} q_{g,\beta}(Lg - \hat c),$$
among $\hat c$ that satisfy $\inf_{f \in \mathcal F} P_f(Lf \ge \hat c) \ge 1 - \alpha$.

Amounts to directing power at smooth alternatives, while maintaining size over all of $\mathcal F$.

25 adaptive inference in running example

Associated testing problem in the running example: $H_0\colon Lf \le L_0,\ f \in \mathcal F$ against $H_1\colon Lf \ge L_0 + 2b,\ f \in \mathcal G$.

Inverting these minimax tests yields the CI that minimizes the $\beta$ quantile of excess length over $\mathcal G$, where $\beta$ is the minimax power of the test. As long as $\mathcal G$ is convex, this is still equivalent to testing a convex null against a convex alternative $\Rightarrow$ LF functions minimize the distance between the sets:
$$(f^*, g^*) = \operatorname*{argmin}_{f \in \mathcal F,\ g \in \mathcal G}\ \sum_{i=1}^n (g(x_i) - f(x_i))^2 \quad \text{s.t.}\quad Lg \ge L_0 + 2b,\ Lf \le L_0.$$

26

To make this concrete, consider $\mathcal G = \{g : g(x) = c,\ c \in \mathbb R\}$ (i.e. $g = c\iota$), and suppose $Lg \ge L_0 + b$ under the alternative.

Solution: $f^*(x) = L_0 + b - (b - C|x|)_+$ (as before), $g^*(x) = L_0 + b$.

[Figure: two panels comparing the least favorable pairs without directing power (triangular bump $g^*$ peaking at $L_0 + 2b$) and with directing power (constant $g^* = L_0 + b$ and the same triangular dip $f^*$).]

27

But $g^* - f^*$ is proportional to the difference from before, so the estimator is as before:
$$\hat L(h) = \frac{\sum_{i=1}^n (g^*(x_i) - f^*(x_i))\,Y_i}{\sum_{i=1}^n (g^*(x_i) - f^*(x_i))} = \frac{\sum_{i=1}^n k_T(x_i/h)\,Y_i}{\sum_{i=1}^n k_T(x_i/h)}$$

The worst-case bias under the null and the variance are the same as before $\Rightarrow$ same CI as before.

Summary: the one-sided CI that minimizes maximum excess length over $\mathcal F$ for $\beta = \Phi(\delta/\sigma - z_{1-\alpha})$, subject to $1-\alpha$ coverage, also minimizes $EL_{\beta'}(\hat c; \mathrm{span}(\iota))$ for $\beta' = \Phi(\delta/(2\sigma) - z_{1-\alpha})$.

28 setup for general adaptivity result

Define the ordered modulus of continuity (Cai and Low, 2004):
$$\omega(\delta; \mathcal F, \mathcal G) = \sup\{Lg - Lf : \|K(g - f)\| \le \delta,\ f \in \mathcal F,\ g \in \mathcal G\},$$
so that $\omega(\delta;\mathcal F) = \omega(\delta;\mathcal F,\mathcal F)$, and define
$$\hat L_{\delta,\mathcal F,\mathcal G} = Lf^*_{M,\delta} + \frac{\omega'(\delta;\mathcal F,\mathcal G)}{\delta}\bigl\langle K(g^*_\delta - f^*_\delta),\; Y - Kf^*_{M,\delta}\bigr\rangle,$$
so that $\hat L_{\delta,\mathcal F,\mathcal F} = \hat L_{\delta,\mathcal F}$.

The bias formulas generalize:
$$\overline{\mathrm{bias}}_{\mathcal F}(\hat L_{\delta,\mathcal F,\mathcal G}) = -\underline{\mathrm{bias}}_{\mathcal G}(\hat L_{\delta,\mathcal F,\mathcal G}) = \tfrac{1}{2}\bigl(\omega(\delta;\mathcal F,\mathcal G) - \delta\,\omega'(\delta;\mathcal F,\mathcal G)\bigr)$$

In the running example, $\hat L(h) = \hat L_{\omega^{-1}(hC;\,\mathcal F,\mathcal G),\,\mathcal F,\,\mathcal G}$.

29 Theorem 2 (One-sided adaptive CIs)

Let $\mathcal F$ and $\mathcal G \subseteq \mathcal F$ be convex, and suppose that $f^*_\delta$ and $g^*_\delta$ achieve the ordered modulus at $\delta$. Let
$$\hat c_{\alpha,\delta,\mathcal F,\mathcal G} = \hat L_{\delta,\mathcal F,\mathcal G} - \overline{\mathrm{bias}}_{\mathcal F}(\hat L_{\delta,\mathcal F,\mathcal G}) - z_{1-\alpha}\,\sigma\,\omega'(\delta;\mathcal F,\mathcal G).$$
Then, for $\beta = \Phi(\delta/\sigma - z_{1-\alpha})$, $\hat c_{\alpha,\delta,\mathcal F,\mathcal G}$ minimizes $EL_\beta(\hat c, \mathcal G)$ among all one-sided $1-\alpha$ CIs, where $\Phi$ denotes the standard normal cdf. Minimum coverage is attained at $f^*_\delta$ and equals $1-\alpha$. All quantiles of excess length are maximized at $g^*_\delta$. The worst-case $\beta$th quantile of excess length is $EL_\beta(\hat c_{\alpha,\delta,\mathcal F,\mathcal G}, \mathcal G) = \omega(\delta;\mathcal F,\mathcal G)$.

30 non-adaptivity under centrosymmetry

Suppose $\mathcal F$ is centrosymmetric and
$$f^*_{\delta,\mathcal F,\mathcal G} - g^*_{\delta,\mathcal F,\mathcal G} \in \mathcal F. \tag{1}$$
This holds for $\mathcal G$ smooth enough, e.g. $\mathcal G = \mathrm{span}(\iota)$ under translation invariance, as in the running example.

Then $0$ and $f^*_{\delta,\mathcal F,\mathcal G} - g^*_{\delta,\mathcal F,\mathcal G}$ also solve the modulus problem, and since
$$\omega(\delta;\mathcal F) = \sup\{2Lf : \|Kf\| \le \delta/2,\ f \in \mathcal F\}$$
under centrosymmetry,
$$\omega(\delta;\mathcal F,\mathcal G) = \omega(\delta;\mathcal F,\{0\}) = \sup_{f \in \mathcal F}\{Lf : \|Kf\| \le \delta\} = \tfrac{1}{2}\,\omega(2\delta;\mathcal F).$$
This implies $\hat c_{\alpha,\delta,\mathcal F,\mathcal G} = \hat c_{\alpha,\delta,\mathcal F,\{0\}} = \hat c_{\alpha,2\delta,\mathcal F}$.

31 Theorem 3 (Non-adaptivity of one-sided CIs under centrosymmetry)

Let $\mathcal F$ be centrosymmetric. Then the one-sided CI that is minimax for the $\beta$th quantile also optimizes $EL_{\beta'}(\hat c;\mathcal G)$ for any $\mathcal G$ such that the solution to the ordered modulus problem exists and satisfies (1), where $\beta' = \Phi((z_\beta - z_{1-\alpha})/2)$. In particular, the minimax CI optimizes $EL_{\beta'}(\hat c;\{0\})$.

The CI that is minimax for median excess length among 95% CIs also optimizes the $\Phi(-1.645/2)$ quantile under the zero function.

32 bound on adaptivity

The CI $[\hat c_{\alpha,\sigma(z_\beta+z_{1-\alpha}),\mathcal F},\infty)$ that is minimax for the $\beta$th quantile of excess length is unbiased at $0$, and its $\beta$th quantile of excess length at the zero function satisfies
$$q_{0,\beta}\bigl(L0 - \hat c_{\alpha,\sigma(z_\beta+z_{1-\alpha}),\mathcal F}\bigr) = \tfrac{1}{2}\bigl(\omega'(\delta;\mathcal F)\delta + \omega(\delta;\mathcal F)\bigr), \qquad \delta = \sigma(z_\beta + z_{1-\alpha}).$$
Hence its efficiency relative to the adaptive bound is
$$\frac{\omega(\delta;\mathcal F,\mathcal G)}{q_{0,\beta}\bigl(L0 - \hat c_{\alpha,\sigma(z_\beta+z_{1-\alpha}),\mathcal F}\bigr)} = \frac{\omega(\delta;\mathcal F,\mathcal G)}{\tfrac{1}{2}(\omega'(\delta)\delta + \omega(\delta))} = \frac{\omega(2\delta)}{\omega'(\delta)\delta + \omega(\delta)}.$$

Typically $\omega(\delta;\mathcal F) = A\delta^r(1 + o(1))$ as $n \to \infty$, where $r$ determines the optimal rate of convergence of the MSE. The ratio above is then $2^r/(1+r)$, so for $1/2 \le r \le 1$ the minimax CI has asymptotic efficiency of at least 94.3% when in fact $f = 0$. Adapting to a $\mathcal G$ that includes $0$ is at least as hard as adapting to the zero function.
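A quick check of the 94.3% figure (our own arithmetic): with $\omega(\delta) = A\delta^r$, the ratio equals $2^r/(1+r)$, which we minimize over $r$.

```python
import numpy as np

r = np.linspace(0.5, 1.0, 501)
eff = 2**r / (1 + r)               # asymptotic efficiency of the minimax CI at f = 0
print(eff.min(), r[eff.argmin()])  # ≈ 0.943, attained at r = 1/2
```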

33 implications of non-adaptivity result

- Need a shape restriction or non-convexity for adaptation. Similar to the impossibility results in Low (1997) and Cai and Low (2004) for two-sided CIs, and in contrast to the positive results for MSE.
- The minimax rate of shrinkage describes the actual rate for all functions in the class.
- It is possible to construct estimators that do better when $f$ is smoother, but impossible to tell how well you did.
- For valid inference in cases where $\mathcal F$ is convex and centrosymmetric, one has to think hard about the appropriate $C$. It is not possible to estimate it from the data and do better than if we assume the worst possible case.

34 adaptivity under monotonicity

Suppose, in the running example, that we know $f$ is non-increasing. Least favorable functions without and with directing power:

[Figure: two panels with the least favorable pairs under the monotonicity restriction, now monotone and supported on $[-2b/C, 2b/C]$.]

35

Without directing power, the optimal estimator is again given by the triangular kernel, but now includes a bias correction (to ensure $\overline{\mathrm{bias}} = -\underline{\mathrm{bias}}$):
$$\hat L(h) = \frac{\sum_i k_i Y_i/\sigma_i^2}{\sum_i k_i/\sigma_i^2} + b\,\frac{\sum_i \mathrm{sign}(x_i)\,k_i(1 - k_i)/\sigma_i^2}{\sum_i k_i/\sigma_i^2},$$
where $k_i = k_T(x_i/h)$ and $\sigma_i^2 = \sigma^2(x_i)$; the optimal bandwidth is bigger than without monotonicity. About a 20% reduction in the quantiles of excess length.

With directing power, the optimal estimator averages all observations with $x_i > 0$, and averages the observations with $x_i < 0$ using the triangular kernel. Excess length shrinks at the parametric rate. When the Lipschitz assumption is dropped and only monotonicity is maintained, the optimal estimator averages all observations with $x_i > 0$, and excess length still shrinks at the parametric rate.

36 two-sided adaptive cis

Fixed-length confidence intervals cannot be adaptive. Cai and Low (2004) construct random-length confidence intervals that are within a constant factor of a lower bound on expected length. Cai, Low, and Xia (2013) construct random-length confidence intervals under shape constraints that have near-minimum expected length for each individual function (again within a constant).

37

Natural best-case scenario for two-sided CIs: optimize expected length at a single function, $\mathcal G = \{g\}$. By Pratt (1961), inverting UMP tests against $g$ achieves exactly this. Again amounts to testing a convex null against a convex alternative; the LF function under the null solves
$$f^*_\theta = \operatorname*{argmin}_{f \in \mathcal F,\, Lf = \theta}\ \sum_{i=1}^n (f(x_i) - g(x_i))^2.$$

Theorem 4 (Adaptation to a function). The CI with minimum expected measure $E_g\lambda(\mathcal C)$ subject to $1-\alpha$ coverage on $\mathcal F$ inverts the family of tests $\phi_\theta$, where $\phi_\theta$ rejects for large values of $\langle K(g - f^*_\theta), Y\rangle$, with critical value given by its $1-\alpha$ quantile under $f^*_\theta$.

38 cis based on suboptimal estimators

What is the efficiency loss of CIs around suboptimal affine estimators? Affine estimators are Normal, with a variance that doesn't depend on $f$ and a bias that does. For each performance criterion, only the worst-case bias and the variance matter: if we can calculate them, then we can also calculate the maximum MSE, and the form of one- and two-sided CIs.

Let $\chi_\alpha(B)$ solve
$$P(|Z + B| \le \chi) = \Phi(\chi - B) - \Phi(-\chi - B) = 1 - \alpha.$$
Then for an estimator $\hat L$ with variance $V$ and maximum absolute bias $B$, the shortest fixed-length CI centered at $\hat L$ is $\hat L \pm V^{1/2}\chi_\alpha(B/V^{1/2})$.
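A minimal sketch (our own; the name `cv` is ours) computing $\chi_\alpha(B)$ by root-finding:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def cv(B, alpha=0.05):
    """χ_α(B): solves Φ(χ - B) - Φ(-χ - B) = 1 - α."""
    f = lambda chi: norm.cdf(chi - B) - norm.cdf(-chi - B) - (1 - alpha)
    return brentq(f, 1e-8, B + 10.0)  # a root always lies in this bracket

print(cv(0.0))  # ≈ 1.96: with no bias, the usual two-sided critical value
print(cv(0.5))  # ≈ 2.18: worst-case bias of half a standard deviation
```

The CI is then formed as $\hat L \pm V^{1/2}\,\mathtt{cv}(B/V^{1/2})$.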

39 Theorem 5 (Suboptimal estimators)

Let $\hat L = a + \langle w, Y\rangle$ be an affine estimator. Then $[\hat L - \overline{\mathrm{bias}}_{\mathcal F}(\hat L) - \|w\| z_{1-\alpha}\sigma,\ \infty)$ is a valid CI, and $\hat L \pm \sigma\|w\|\,\chi_\alpha\bigl(\overline{\mathrm{bias}}_{\mathcal F}(\hat L)/(\sigma\|w\|)\bigr)$ is the shortest fixed-length $1-\alpha$ CI centered at $\hat L$.

Not a deep result, but very useful: it allows us to compute the exact efficiency loss from using suboptimal estimators, or the size distortion of CIs with (pointwise) asymptotic justification. An asymptotic version of this theorem can be used to calculate the asymptotic efficiency loss from using a suboptimal kernel and/or a suboptimal bandwidth.

40 suboptimal estimators in running example

Consider some other kernel $k$ in the running example,
$$\hat L = \frac{\sum_i k(x_i/h)\,Y_i}{\sum_i k(x_i/h)}$$

Variance: $\displaystyle \sigma^2\,\frac{\sum_i k(x_i/h)^2}{\bigl(\sum_i k(x_i/h)\bigr)^2}$

Maximum bias, since $f \in \mathcal F_{\mathrm{Lip}}(C)$:
$$\left|\frac{\sum_i k(x_i/h)\bigl(f(x_i) - f(0)\bigr)}{\sum_i k(x_i/h)}\right| \le C\,\frac{\sum_i k(x_i/h)\,|x_i|}{\sum_i k(x_i/h)}.$$
The bound is attained at $f(x) = -C|x|$ if $k \ge 0$; otherwise it gives an upper bound.
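A minimal sketch (our own, with made-up design values) computing this worst-case bias and the standard deviation for a generic nonnegative kernel:

```python
import numpy as np

def worst_case_bias_and_sd(x, h, C, sigma, k):
    """Worst-case bias over F_Lip(C) and sd of the kernel estimator of f(0)."""
    w = k(x / h)
    bias = C * np.sum(w * np.abs(x)) / np.sum(w)   # attained at f(x) = -C|x| if k >= 0
    sd = sigma * np.sqrt(np.sum(w**2)) / np.sum(w)
    return bias, sd

x = np.linspace(-1.0, 1.0, 500)
epanechnikov = lambda u: np.maximum(1.0 - u**2, 0.0)
print(worst_case_bias_and_sd(x, h=0.3, C=1.0, sigma=0.1, k=epanechnikov))
```

Combined with the critical value $\chi_\alpha$ from slide 38, this is everything needed to form the fixed-length CI around the suboptimal estimator.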

41 Finite-Sample results Asymptotic results Applications Conclusion

42 renormalization

In many cases (depending on $L$ and the smoothness of $\mathcal F$, but including inference at a point and RD), the nonparametric regression problem is asymptotically equivalent to the white noise model
$$Y(dt) = f(t)\,dt + \sigma\,\epsilon(dt);$$
see Brown and Low (1996) and Donoho and Low (1992). In the running example, this holds with $\sigma^2 = \sigma(0)^2/(n f_X(0))$.

Suppose $\mathcal F = \{f : J(f) \le C\}$ for some $J$ (as in the running example), and that for the white noise model the following functionals are homogeneous:
$$J(af(\cdot/h)) = a h^{-s_J} J(f), \qquad \langle K a_1 f(\cdot/h),\, K a_2 g(\cdot/h)\rangle = a_1 a_2 h^{-2s_K}\langle Kf, Kg\rangle, \qquad L(af(\cdot/h)) = a h^{-s_L} Lf$$

In the running example, $s_L = 0$, $s_J = 1$, and $s_K = -1/2$; for the Taylor class $\mathcal F_{T,2}$, $s_J = 2$.

43

The (single-class) modulus problem then renormalizes: if $g^*_{C,\delta}, f^*_{C,\delta}$ solve
$$\max\ L(f_1 - f_0) \quad \text{s.t.}\quad \|K(f_1 - f_0)\| \le \delta,\ J(f_1) \le C,\ J(f_0) \le C,$$
then
$$g^*_{C,\delta} = a\,g^*_{1,1}(\cdot/h), \qquad f^*_{C,\delta} = a\,f^*_{1,1}(\cdot/h), \qquad \omega_C(\delta) = C^{1-r}\delta^r\,\omega_1(1),$$
where
$$a = \delta^{s_J/(s_J - s_K)}\,C^{s_K/(s_K - s_J)}, \qquad h = (C/\delta)^{1/(s_K - s_J)}, \qquad r = \frac{s_L - s_J}{s_K - s_J}.$$

The root of the minimax MSE and the (excess) length of CIs will shrink at the rate $n^{-r/2}$.

44 optimal bandwidths

The class of optimal estimators can be written as
$$\hat L_\delta = \hat L(h) = h^{2(s_K - s_L)}\bigl\langle Kk(\cdot/h),\, Y\bigr\rangle + C h^{s_J - s_L}\bigl(Lf^*_{M,1,1} - \langle Kk,\, Kf^*_{M,1,1}\rangle\bigr),$$
with $h = (C/\delta)^{1/(s_K - s_J)}$ and kernel $k(u) = r\,\omega_1(1)\,(g^*_{1,1} - f^*_{1,1})(u)$.

Recall that the optimal $\delta$ is given by $c_l(\delta/(2\sigma)) = \delta\omega'(\delta)/\omega(\delta)$, where $l$ indexes the performance criterion. Plugging in the definition of $h$ yields the optimal bandwidth
$$h^* = \left(\frac{2\sigma\, c_l^{-1}(r)}{C}\right)^{1/(s_J - s_K)},$$
where, for one-sided CIs, $c_\beta^{-1}(r) = (z_\beta + z_{1-\alpha})/2$.

45 ratios of optimal bandwidths, $s_K = -1/2$, $s_L = 0$

[Figure: ratios of optimal bandwidths for CIs to optimal MSE bandwidths, plotted against $r$; legend: 0.95 (two-sided), 0.99 and 0.95 (one-sided, q = 0.8), 0.99 and 0.95 (one-sided, q = 0.5).]

46 takeaways from picture

Optimal bandwidth ratios depend only on the dilation exponents $s_L$, $s_K$, and $s_J$:
$$\frac{h^*_l}{h^*_{l'}} = \left(\frac{c_l^{-1}(r)}{c_{l'}^{-1}(r)}\right)^{1/(s_J - s_K)}$$

- Bandwidths are of the same order in all cases: no undersmoothing
- For one-sided CIs, the bandwidth gets larger with the quantile that we are minimizing
- For 95+% two-sided CIs, if $s_L = 0$ and $s_K = -1/2$, the optimal fixed-length CI uses a larger bandwidth than the optimal MSE bandwidth
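A small sketch (our own arithmetic from the displayed ratio): the ratio of the one-sided-CI-optimal bandwidth to the minimax-MSE bandwidth, using $c_\beta^{-1}(r) = (z_\beta + z_{1-\alpha})/2$ and $c_\rho^{-1}(r) = \sqrt{r/(1-r)}$ (the latter inverts $c_\rho(\tau) = \tau^2/(1+\tau^2)$ at the first-order-condition value $r$). The particular $r$ and exponent below are the Taylor-class values implied by slide 42.

```python
import numpy as np
from scipy.stats import norm

def bw_ratio_onesided_vs_mse(r, beta, alpha, sJ_minus_sK):
    c_oci = (norm.ppf(beta) + norm.ppf(1 - alpha)) / 2  # c_β^{-1}(r), one-sided CI
    c_rho = np.sqrt(r / (1 - r))                        # c_ρ^{-1}(r), minimax MSE
    return (c_oci / c_rho) ** (1 / sJ_minus_sK)

# Taylor class F_{T,2}: r = 4/5 and s_J - s_K = 5/2
print(bw_ratio_onesided_vs_mse(r=0.8, beta=0.8, alpha=0.05, sJ_minus_sK=2.5))  # ≈ 0.83
```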

47

For any bandwidth $h$, the worst-case bias is
$$\frac{1-r}{2r}\,C\,h^{s_J - s_L}\left(\int k^2\right)^{1/2}.$$
Can use this worst-case bias to construct CIs around $\hat L(h)$.

How much bigger are two-sided CIs around the minimax MSE bandwidth? The ratio of CI lengths is given by
$$\left(\frac{c_{\chi,\alpha}^{-1}(r)}{c_\rho^{-1}(r)}\right)^{r-1}\frac{\chi_\alpha\bigl(c_{\chi,\alpha}^{-1}(r)(1/r - 1)\bigr)}{\chi_\alpha\bigl(c_\rho^{-1}(r)(1/r - 1)\bigr)},$$
where $\chi_\alpha(B)$ solves $P(|N(0,1) + B| \le \chi) = \Phi(\chi - B) - \Phi(-\chi - B) = 1 - \alpha$.

Need to use $\chi_\alpha\bigl(c_\rho^{-1}(r)(1/r - 1)\bigr) = \chi_\alpha\bigl(\sqrt{(1-r)/r}\bigr)$ instead of $z_{1-\alpha/2}$ as the critical value to ensure coverage for the CI around the minimax MSE bandwidth.
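Evaluating the adjusted critical value (a check of our own, reusing the $\chi_\alpha$ solver from slide 38):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def cv(B, alpha=0.05):
    """χ_α(B): solves Φ(χ - B) - Φ(-χ - B) = 1 - α."""
    return brentq(lambda c: norm.cdf(c - B) - norm.cdf(-c - B) - (1 - alpha),
                  1e-8, B + 10.0)

# Worst-case bias/sd at the minimax-MSE bandwidth: c_ρ^{-1}(r)(1/r - 1) = sqrt((1-r)/r)
r = 0.8                          # e.g. the Taylor-class rate exponent
print(cv(np.sqrt((1 - r) / r)))  # ≈ 2.18: the critical value replacing 1.96
```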

48 length of optimal cis relative to cis around mse bandwidth

[Figure: percentage decrease in CI length from using the CI-optimal bandwidth instead of the MSE bandwidth, plotted against $r$.]

49 critical values for ci around mse bandwidth

[Figure: the adjusted critical value for the CI around the MSE bandwidth, plotted against $r$.]

50 undercoverage with usual critical values

[Figure: coverage of the CI around the MSE bandwidth when the usual critical value $z_{1-\alpha/2}$ is used, plotted against $r$.]

51 takeaways from pictures

- To construct two-sided CIs, one can keep the same bandwidth as for estimation; the price is an increase in length of less than 2% for 95% CIs
- Need to use a slightly higher critical value to ensure proper coverage

52 suboptimal kernels

Results so far assumed the optimal kernel. Under renormalization, the maximum bias and the variance renormalize in a similar way for suboptimal kernels.

For any kernel $k$, let $\tilde h_k$ be the bandwidth that equates the maximum bias and the standard deviation, and let
$$w(k) = \mathrm{se}\bigl(\hat L_k(\tilde h_k)\bigr) = \sup_f \mathrm{bias}_f\bigl(\hat L_k(\tilde h_k)\bigr).$$
Suppose the performance criterion scales linearly with the maximum bias and the standard deviation.

53 Theorem 6 (Efficiency loss of suboptimal kernels)

1. The relative efficiency of $k$ and $\tilde k$ (where the optimal bandwidth is used in both cases) does not depend on the performance criterion, and is given by $w(k)/w(\tilde k)$.
2. The results for ratios of optimal bandwidths remain unchanged for suboptimal kernels.
3. The efficiency loss from using the bandwidth optimal for a different criterion, rather than the bandwidth optimal for the criterion of interest, remains unchanged for suboptimal kernels.

54 corollaries

The bounds for the minimax MSE efficiency of different kernels in Cheng, Fan, and Marron (1997):

1. are tight; and
2. hold for other efficiency criteria as well.

Using the minimax MSE bandwidth for two-sided CIs is a good idea no matter what kernel one uses.

55 Finite-Sample results Asymptotic results Applications Conclusion

56 rd

Interested in $Lf = \lim_{x \downarrow 0} f(x) - \lim_{x \uparrow 0} f(x)$. Let $f_+(x) = f(x)\,1(x > 0)$ and $f_-(x) = f(x)\,1(x < 0)$, so that $f = f_+ + f_-$. We consider the class
$$\mathcal F_{RDT,2}(C) = \bigl\{f_+ + f_- : f_+ \in \mathcal F_{T,2}(C;\mathbb R_+),\ f_- \in \mathcal F_{T,2}(C;\mathbb R_-)\bigr\},$$
where $\mathcal F_{T,2}(C;\mathcal X)$ is the class from Sacks and Ylvisaker (1978),
$$\mathcal F_{T,2}(C;\mathcal X) = \bigl\{f : |f(x) - f(0) - f'(0)x| \le Cx^2 \text{ for all } x \in \mathcal X\bigr\}.$$
$\mathcal F_{T,2}$ is also used in Cheng, Fan, and Marron (1997) for estimation at a point, which justifies much of empirical RD practice.

57 least favorable functions

Least favorable functions are symmetric, $g^*_\delta(x) = -f^*_\delta(x)$, and have the form
$$g^*_\delta(x) = \bigl[(b + d_+x - Cx^2)_+ - (b + d_+x + Cx^2)_-\bigr]1(x > 0) - \bigl[(b + d_-x - Cx^2)_+ - (b + d_-x + Cx^2)_-\bigr]1(x < 0),$$
with $b, d_+, d_-$ chosen to solve
$$0 = \sum_{i=1}^n \frac{g^*_{-,b,C}(x_i)\,x_i}{\sigma^2(x_i)}, \qquad 0 = \sum_{i=1}^n \frac{g^*_{+,b,C}(x_i)\,x_i}{\sigma^2(x_i)}, \qquad \text{and} \qquad \sum_{i=1}^n \frac{g^*_{+,b,C}(x_i)}{\sigma^2(x_i)} = \sum_{i=1}^n \frac{g^*_{-,b,C}(x_i)}{\sigma^2(x_i)},$$
where $g^*_{+,b,C}$ and $g^*_{-,b,C}$ denote the two bracketed pieces on either side of the cutoff.

58 optimal kernel

[Figure: the optimal equivalent kernel $k(u)$ plotted against $u$.]

Asymptotically, $g^*_\delta$ corresponds to the difference between two kernel estimators, with bandwidths chosen to equate the number of effective observations on each side of the cutoff. The optimal kernel is the same as for inference at a point, derived in Cheng, Fan, and Marron (1997) using an upper bound on the minimax MSE.

59 application to Lee (2008)

RD design:

- $X_i$ = margin of victory in the previous election for the Democratic party (negative for a Republican victory)
- $Y_i$ = Democratic vote share in a given election
- $D_i = I(X_i \ge 0)$ = indicator for Democratic incumbency
- $n = 6{,}558$ observations on elections between 1946 and 1998

For simplicity, assume homoscedastic errors on each side of the cutoff, with variance estimates $\hat\sigma_-(0)^2$ and $\hat\sigma_+(0)^2$ derived using the Imbens and Kalyanaraman (2012) bandwidth. The LF functions are very close to scaled versions of the optimal kernel. Unless $C$ is very small, the results are in line with Lee (2008) and Imbens and Kalyanaraman (2012).

60 minimax mse estimator as function of c

[Figure: minimax MSE estimate of the electoral advantage (%) as a function of the smoothness constant $C$, with the variance and squared-bias contributions shown; secondary axis: effective number of observations.]

61 optimal fixed-length cis

[Figure: optimal fixed-length CIs (upper and lower endpoints and the estimate) for the electoral advantage (%) as a function of $C$, with the variance and squared-bias contributions shown; secondary axis: effective number of observations.]

62 Finite-Sample results Asymptotic results Applications Conclusion

63 summary

1. We give exact results for (i) minimax optimal and (ii) adaptive one-sided CIs.
   - The CIs use a non-random bias correction based on the worst-case bias
   - Adaptivity without shape restrictions is severely limited, as in the two-sided case: impossible to avoid thinking hard about the appropriate $C$
2. We give an exact solution to the problem of adaptation to a function.
3. We use these finite-sample results to characterize optimal tuning parameters for different performance criteria:
   - building CIs around the minimax MSE bandwidth is nearly optimal
   - undersmoothing cannot be optimal
