optimal inference in a class of nonparametric models
optimal inference in a class of nonparametric models
Timothy Armstrong (Yale University), Michal Kolesár (Princeton University)
September 2015
setup

Interested in inference on a linear functional Lf in the regression model

y_i = f(x_i) + u_i,  u_i ~ N(0, σ²(x_i)),

where x_i is fixed and σ²(x_i) is known. Important special cases:
1. Inference at a point: Lf = f(0)
2. Regression discontinuity: Lf = f(0+) − f(0−)
3. ATE under unconfoundedness: x_i = {w_i, d_i}, Lf = (1/n) Σ_i (f(w_i, 1) − f(w_i, 0))
4. Partially linear model
key assumption

Convexity Assumption: f ∈ F, a known convex set.

Rules out e.g. sparsity, but not the usual shape/smoothness restrictions:
- Monotonicity: F = {f : f non-increasing}
- Lipschitz class: F_Lip(C) = {f : |f(x_1) − f(x_2)| ≤ C|x_1 − x_2|} (or Hölder class generalizations)
- Taylor class: F_T,2(C) = {f : |f(x) − f(0) − f′(0)x| ≤ Cx²} (useful for RD / inference at a point)
- Sign restrictions in linear regression: {f(x) = x′β : β_j ≥ 0, j ∈ J}

Will take C as known if necessary, and ask later whether this can be relaxed.
notions of finite-sample optimality

Normality ⇒ can derive finite-sample procedures that minimize the worst-case loss over G ⊆ F. Without Normality, the procedures remain valid and optimal asymptotically under regularity conditions, uniformly over F.

1. Setting G = F yields minimax procedures.
   - Problem well studied if loss is MSE; general solution in Donoho (1994), used to derive optimal kernels and rates of convergence (Stone, 1980; Fan, 1993; Cheng, Fan, and Marron, 1997)
   - Donoho (1994) derives fixed-length confidence intervals (CIs) that are almost optimal
2. G ⊂ F smoother functions: adaptive inference ("directing power")
   - For two-sided CIs, Cai and Low (2004) give bounds
new finite-sample results: one-sided cis

Derive one-sided CIs [ĉ, ∞) that minimize maximum quantiles of excess length over G, with ĉ = L̂ − bias̄_F(L̂) − z_{1−α} sd(L̂) for an optimal estimator L̂, where bias̄_F denotes the worst-case bias over F.
- For the case F = G (minimax CIs), L̂ has the same form as the minimax MSE estimators / fixed-length CIs of Donoho (1994).
- We show that if F is symmetric, adaptation is severely limited. Adaptation requires non-convexity or shape restrictions: otherwise, one cannot do better at smaller C while maintaining coverage for larger C.
- Conversely, any inference method that claims to do better than minimax CIs when f is smooth must be size-distorted for some f ∈ F(C).
- Related to Low (1997), who shows that adapting to derivative smoothness classes is limited for two-sided (random-length) CIs.
new finite-sample results: two-sided cis

We derive two-sided CIs that minimize expected length over G = {g}, solving the problem of adaptation to a function posed in Cai, Low, and Xia (2013). Can be used to bound the scope for adaptivity.
implications for optimal bandwidth choice

Asymptotically, optimal procedures often correspond to kernel estimators with a fixed (optimal) kernel, and a bandwidth that depends on the optimality criterion. We find that for RD and inference at a point:
- Optimal 95% fixed-length CIs use a larger bandwidth than minimax MSE estimators. Undersmoothing cannot be optimal.
- Recentering CIs by estimating the bias cannot be optimal: it's essentially equivalent to using a higher-order kernel and undersmoothing (Calonico, Cattaneo, and Titiunik, 2014).
- The difference is small: a CI around the minimax MSE estimator is only 1% longer.
- In practice, one can keep the same bandwidth as for estimation, and construct a CI around it using the worst-case bias correction.
applications

We apply the general results to:
1. RD with F = {f_+ + f_− : f_± ∈ F_T,2(C)} as in Cheng, Fan, and Marron (1997)
   - Optimal bandwidths balance the number of effective observations on each side of the cutoff
   - Illustrate with an empirical application from Lee (2008)
2. Linear regression with β possibly constrained (sign restrictions, sparsity, elliptical constraints)
3. Sample average treatment effect under unconfoundedness under a Hölder class (separate paper)
incomplete list of related literature

- Stats literature on minimax estimation/inference/rates of convergence/adaptivity: Ibragimov and Khas'minskii (1985), Donoho and Liu (1991), Donoho and Low (1992), Donoho (1994), Low (1995), Low (1997), Cai and Low (2004), Cai, Low, and Xia (2013), Cheng, Fan, and Marron (1997), Fan (1993), Fan, Gasser, Gijbels, Brockmann, and Engel (1997), Lepski and Tsybakov (2000)
- Non-standard CIs: Imbens and Manski (2004), Müller and Norets (2012), Calonico, Cattaneo, and Titiunik (2014), Calonico, Cattaneo, and Farrell (2015), Rothe (2015)
- Adaptive estimation/inference in econometrics: Sun (2005), Armstrong (2015), Chernozhukov, Chetverikov, and Kato (2014)
Finite-Sample results · Asymptotic results · Applications · Conclusion
running example

Consider the problem of inference on f(0) when f is restricted to be in the Lipschitz class

F = F_Lip(C) = {f : |f(x_1) − f(x_2)| ≤ C|x_1 − x_2|}.

Assume σ(x) = σ, known.
performance criteria

To measure the performance of one-sided 1−α CIs [ĉ, ∞), we use maximum quantiles of excess length,

EL_β(ĉ; G) = sup_{g∈G} q_{g,β}(Lg − ĉ),

where q_{g,β} is the βth quantile under g.

For two-sided CIs, we focus on fixed-length CIs L̂ ± χ, where L̂ is an estimator and χ is chosen to satisfy coverage:

χ_α(L̂) = min{χ : inf_{f∈F} P_f(|L̂ − Lf| ≤ χ) ≥ 1 − α}.

For estimation, we use maximum MSE, R_MSE(L̂) = sup_{f∈F} E_f(L̂ − Lf)².
minimax testing problem

In the running example (Lf = f(0), F = F_Lip(C)), consider the minimax test of H_0: Lf ≤ L_0 against H_1: Lf ≥ L_0 + 2b. Inverting minimax tests yields the CI that minimizes EL_β(ĉ; F), where β is the minimax power of the test.

First need to find the least favorable null and alternative. The problem is equivalent to Y ~ N(μ, σ²I), μ = (f(x_1), ..., f(x_n)) ∈ M convex. Both M_0 = M ∩ {f : Lf ≤ L_0} and M_1 = M ∩ {g : Lg ≥ L_0 + 2b} are convex ⇒ least favorable functions minimize the distance between them (Ingster and Suslina, 2003):

(g*, f*) = argmin_{g∈M_1, f∈M_0} Σ_{i=1}^n (g(x_i) − f(x_i))².
[Figure: least favorable functions g*(x) = L_0 + b + (b − C|x|)_+ and f*(x) = L_0 + b − (b − C|x|)_+, piecewise linear between L_0 and L_0 + 2b, coinciding for |x| ≥ b/C.]
g*(x) = L_0 + b + (b − C|x|)_+,  f*(x) = L_0 + b − (b − C|x|)_+

The minimax test is then given by the LR test of μ_0 = (f*(x_1), ..., f*(x_n)) against μ_1 = (g*(x_1), ..., g*(x_n)): reject for large values of Y′(μ_1 − μ_0). The test can be written as rejecting whenever

L̂(h) − L_0 − b(1 − Σ_{i=1}^n k_T(x_i/h)² / Σ_{i=1}^n k_T(x_i/h)) ≥ σ z_{1−α} (Σ_{i=1}^n k_T(x_i/h)²)^{1/2} / Σ_{i=1}^n k_T(x_i/h),

where k_T(u) = (1 − |u|)_+, h = b/C, and

L̂(h) = Σ_{i=1}^n (g*(x_i) − f*(x_i)) Y_i / Σ_{i=1}^n (g*(x_i) − f*(x_i)) = Σ_{i=1}^n k_T(x_i/h) Y_i / Σ_{i=1}^n k_T(x_i/h).

Key feature: non-random bias correction based on worst-case bias; doesn't disappear asymptotically.
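As a concrete illustration, the triangular-kernel estimator and the one-sided 95% CI with the worst-case bias correction can be sketched in a few lines (a minimal Python sketch, my own illustration rather than the authors' code; function names are mine):

```python
import math

def lhat(h, x, y):
    """Triangular-kernel (Nadaraya-Watson) estimate of f(0): L-hat(h)."""
    k = [max(1 - abs(xi) / h, 0.0) for xi in x]   # k_T(x_i/h)
    return sum(ki * yi for ki, yi in zip(k, y)) / sum(k)

def onesided_ci_lower_95(h, x, y, C, sigma):
    """Lower endpoint c = L-hat(h) - worst-case bias - z_{0.95} * sd(L-hat),
    with worst-case bias b(1 - sum k^2 / sum k) over F_Lip(C), b = C h."""
    k = [max(1 - abs(xi) / h, 0.0) for xi in x]
    s1 = sum(k)
    s2 = sum(ki**2 for ki in k)
    b = C * h                        # height of the least favorable pair
    max_bias = b * (1 - s2 / s1)     # worst-case bias over F_Lip(C)
    sd = sigma * math.sqrt(s2) / s1
    z = 1.6448536269514722           # z_{0.95}
    return lhat(h, x, y) - max_bias - z * sd
```

The bias term is deterministic, matching the slide's point that the correction is non-random and does not vanish asymptotically.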
general setup

In general, we observe Y = Kf + σε, where ε is standard Normal and K a linear operator, with ⟨Kg, Kf⟩ = Σ_i (Kg)(x_i)(Kf)(x_i). Heteroscedasticity is handled by setting Kf = (f(x_1)/σ(x_1), ..., f(x_n)/σ(x_n)), Y = (Y_1/σ(x_1), ..., Y_n/σ(x_n)).

Define the modulus of continuity (Donoho and Liu, 1991):

ω(δ; F) = sup{L(g − f) : ‖K(g − f)‖ ≤ δ, g, f ∈ F}.

Denote the solutions by g*_δ, f*_δ, and let f*_{M,δ} = (g*_δ + f*_δ)/2. The problem of finding the LF functions is equivalent to evaluating ω^{−1}(·; F), so for the running example, g* = g*_{ω^{−1}(2b)}, f* = f*_{ω^{−1}(2b)}.
class of optimal estimators

Define

L̂_{δ,F} = Lf*_{M,δ} + (ω′(δ; F)/δ) ⟨K(g*_δ − f*_δ), Y − Kf*_{M,δ}⟩.

These estimators minimize maximum bias given a variance bound (and vice versa) (Low, 1995). Their maximum and minimum bias over F satisfy

bias̄_F(L̂_{δ,F}) = −bias̲_F(L̂_{δ,F}) = ½(ω(δ; F) − δω′(δ; F)).

In the running example: L̂(h) = L̂_{ω^{−1}(2hC), F_Lip(C)}.
centrosymmetry and translation invariance

When F has additional structure, L̂_δ simplifies:
- If F is translation invariant (for some ι ∈ F with Lι = 1, f + cι ∈ F for all f ∈ F and c ∈ R), then δ/ω′(δ; F) = ⟨K(g*_δ − f*_δ), Kι⟩, and the estimator has Nadaraya-Watson form:
  L̂_{δ,F} = Lf*_{M,δ} + ⟨K(g*_δ − f*_δ), Y − Kf*_{M,δ}⟩ / ⟨K(g*_δ − f*_δ), Kι⟩.
- If F is centrosymmetric (f ∈ F ⇒ −f ∈ F), then f*_δ = −g*_δ, and
  L̂_{δ,F} = (2ω′(δ; F)/δ) ⟨Kg*_δ, Y⟩ = ⟨Kg*_δ, Y⟩ / ⟨Kg*_δ, Kι⟩,
  the second equality under translation invariance.
Theorem 1 (One-sided minimax CI)

Let ĉ_{α,δ,F} = L̂_{δ,F} − bias̄_F(L̂_{δ,F}) − z_{1−α} σ ω′(δ; F). Then [ĉ_{α,δ,F}, ∞) is a 1−α CI for Lf, with coverage minimized at f*_δ. For β = Φ(δ/σ − z_{1−α}), it minimizes EL_β(ĉ; F) among all one-sided 1−α CIs. All quantiles of excess length are maximized at g*_δ. The minimax excess length at quantile β is EL_β(ĉ_{α,δ,F}; F) = ω(δ; F).

- β is the minimax power of the underlying tests (under translation invariance)
- Bias correction based on the worst-case bias under F, non-random
- In the running example, using bandwidth h minimizes the β quantile of excess length at β = Φ(ω^{−1}(2hC)/σ − z_{1−α})
For estimation and two-sided CIs, exact optimality results are hard. Donoho (1994) shows that procedures based on L̂_{δ,F} are minimax optimal if we restrict attention to affine estimators. The results use the fact that the problem is just as hard if we know that f lies in the one-dimensional subfamily {λf*_δ + (1 − λ)g*_δ : 0 ≤ λ ≤ 1}.

To state these results, consider Z ~ N(θ, 1), θ ∈ [−τ, τ]:
- The minimax linear estimator is c_ρ(τ)Z, c_ρ(τ) = τ²/(1 + τ²), with minimax risk ρ(τ) = τ²/(1 + τ²).
- The shortest fixed-length CI has the form c_χ(τ)Z ± χ(τ); the solution is characterized in Drees (1999), similar in spirit to Imbens and Manski (2004).
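A quick numerical sketch of these bounded-normal-means quantities (my own illustration in plain Python): the shrinkage coefficient c_ρ(τ), and the half-length χ_α(B) of the shortest fixed-length interval for a standard normal observation whose bias is bounded by B:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def c_rho(tau):
    """Minimax linear shrinkage for Z ~ N(theta, 1), |theta| <= tau."""
    return tau**2 / (1 + tau**2)

def chi_alpha(B, alpha=0.05):
    """Half-length chi with P(|Z + B| <= chi) = 1 - alpha for Z ~ N(0, 1),
    i.e. Phi(chi - B) - Phi(-chi - B) = 1 - alpha, found by bisection."""
    lo, hi = 0.0, 20.0 + abs(B)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid - B) - norm_cdf(-mid - B) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With B = 0 this recovers the usual z_{1−α/2} ≈ 1.96; the half-length grows with the bias bound B.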
[Figure: optimal shrinkage in bounded normal means: shrinkage coefficients as functions of τ, for 90% and 95% confidence levels and for estimation.]
Theorem (Donoho (1994))

The minimax MSE affine estimator is L̂_{δ,F}, where δ solves

max_{δ>0} (ω(δ; F)/δ)² ρ(δ/(2σ)) σ²,

and the optimal δ satisfies c_ρ(δ/(2σ)) = δω′(δ; F)/ω(δ; F).

The shortest fixed-length affine CI is L̂_{δ,F} ± σ (ω(δ; F)/δ) χ_α(δ/(2σ)), where δ solves

max_{δ>0} (ω(δ; F)/δ) χ_α(δ/(2σ)) σ,

and the optimal δ satisfies c_χ(δ/(2σ)) = δω′(δ; F)/ω(δ; F).
For example, to find the minimax-MSE-optimal bandwidth in the running example, solve

δ²/(4σ² + δ²) = c_ρ(δ/(2σ)) = δω′(δ; F)/ω(δ; F),

evaluating the modulus at the triangular-kernel least favorable functions. Asymptotically this yields

h_opt,MSE = (3σ² / (C² n f_X(0)))^{1/3} (1 + o_p(1)).

Can also use these results to derive optimal rates of convergence (e.g. Fan (1993); Cheng, Fan, and Marron (1997)): n^{−1/3} here.
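The asymptotic formula is a one-liner in code (my own sketch; `fx0` stands for the design density f_X(0)):

```python
def h_opt_mse(C, sigma, n, fx0):
    """Asymptotic minimax-MSE bandwidth for f(0) under F_Lip(C):
    h = (3 sigma^2 / (C^2 n f_X(0)))^(1/3)."""
    return (3 * sigma**2 / (C**2 * n * fx0)) ** (1 / 3)
```

For example, with C = 1, σ = 1, n = 3000 and f_X(0) = 1 this gives h = 0.1; multiplying n by 8 halves the bandwidth, reflecting the n^{−1/3} rate.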
adaptive inference

One-sided CIs focus on good performance under the least favorable f ∈ F, which may be too pessimistic. Alternative: optimize excess length over a smaller class G of smoother functions,

inf_ĉ sup_{g∈G} q_{g,β}(Lg − ĉ),

among ĉ that satisfy inf_{f∈F} P_f(Lf ≥ ĉ) ≥ 1 − α. This amounts to directing power at smooth alternatives, while maintaining size over all of F.
adaptive inference in running example

The associated testing problem in the running example: H_0: Lf ≤ L_0 against H_1: {Lf ≥ L_0 + 2b} ∩ {f ∈ G}. Inverting these minimax tests yields the CI that minimizes the β quantile of excess length over G, where β is the minimax power of the test. As long as G is convex, this is still equivalent to testing a convex null against a convex alternative ⇒ LF functions minimize the distance between the sets:

(f*, g*) = argmin_{f∈F, g∈G, Lg ≥ L_0+2b, Lf ≤ L_0} Σ_{i=1}^n (g(x_i) − f(x_i))².
To make this concrete, consider G = {g : g(x) = c, c ∈ R} (i.e. g = cι), and suppose Lg ≥ L_0 + b under the alternative. Solution: f*(x) = L_0 + b − (b − C|x|)_+ (as before), g*(x) = L_0 + b.

[Figure: the minimax least favorable pair (left panel) and the adaptive pair with constant g* (right panel).]
But g* − f* is the same as before, so the estimator is as before:

L̂(h) = Σ_{i=1}^n (g*(x_i) − f*(x_i)) Y_i / Σ_{i=1}^n (g*(x_i) − f*(x_i)) = Σ_{i=1}^n k_T(x_i/h) Y_i / Σ_{i=1}^n k_T(x_i/h).

The worst-case bias under the null and the variance are the same as before ⇒ same CI as before.

Summary: the one-sided CI that minimizes maximum excess length over F for β = Φ(δ/σ − z_{1−α}) subject to 1−α coverage also minimizes EL_β′(ĉ; span(ι)) for β′ = Φ(δ/(2σ) − z_{1−α}).
setup for general adaptivity result

Define the ordered modulus of continuity (Cai and Low, 2004):

ω(δ; F, G) = sup{Lg − Lf : ‖K(g − f)‖ ≤ δ, f ∈ F, g ∈ G},

so that ω(δ; F) = ω(δ; F, F), and define

L̂_{δ,F,G} = Lf*_{M,δ} + (ω′(δ; F, G)/δ) ⟨K(g*_δ − f*_δ), Y − Kf*_{M,δ}⟩,

so that L̂_{δ,F,F} = L̂_{δ,F}. The bias formulas generalize:

bias̄_F(L̂_{δ,F,G}) = −bias̲_G(L̂_{δ,F,G}) = ½(ω(δ; F, G) − δω′(δ; F, G)).

In the running example, L̂(h) = L̂_{ω^{−1}(hC; F, G), F, G}.
Theorem 2 (One-sided adaptive CIs)

Let F and G ⊆ F be convex, and suppose that f*_δ and g*_δ achieve the ordered modulus at δ. Let

ĉ_{α,δ,F,G} = L̂_{δ,F,G} − bias̄_F(L̂_{δ,F,G}) − z_{1−α} σ ω′(δ; F, G).

Then, for β = Φ(δ/σ − z_{1−α}), ĉ_{α,δ,F,G} minimizes EL_β(ĉ; G) among all one-sided 1−α CIs, where Φ denotes the standard normal cdf. Minimum coverage is attained at f*_δ and equals 1−α. All quantiles of excess length are maximized at g*_δ. The worst-case βth quantile of excess length is EL_β(ĉ_{α,δ,F,G}; G) = ω(δ; F, G).
non-adaptivity under centrosymmetry

Suppose F is centrosymmetric and

f*_{δ,F,G} − g*_{δ,F,G} ∈ F.  (1)

This holds for G smooth enough, e.g. G = span(ι) under translation invariance as in the running example. Then f*_{δ,F,G} − g*_{δ,F,G} and 0 also solve the modulus problem, and since

ω(δ; F) = sup{2Lf : ‖Kf‖ ≤ δ/2, f ∈ F}

under centrosymmetry,

ω(δ; F, G) = ω(δ; F, {0}) = sup_{f∈F} {Lf : ‖Kf‖ ≤ δ} = ½ ω(2δ; F).

Implies ĉ_{α,δ,F,G} = ĉ_{α,δ,F,{0}} = ĉ_{α,2δ,F}.
Theorem 3 (Non-adaptivity of one-sided CIs under centrosymmetry)

Let F be centrosymmetric. Then the one-sided CI that is minimax for the βth quantile also optimizes EL_β′(ĉ; G) for any G such that the solution to the ordered modulus problem exists and satisfies (1), where β′ = Φ((z_β − z_{1−α})/2). In particular, the minimax CI optimizes EL_β′(ĉ; {0}).

The CI that is minimax for median excess length among 95% CIs also optimizes the Φ(−1.645/2) ≈ 0.21 quantile under the zero function.
bound on adaptivity

The CI [ĉ_{α,δ*,F}, ∞) with δ* = σ(z_β + z_{1−α}) that is minimax for the βth quantile of excess length is unbiased at 0, and satisfies

q_{0,β}(L0 − ĉ_{α,δ*,F}) = ½(ω(δ*; F) + δ*ω′(δ*; F)).

Hence its efficiency at the zero function is

ω(δ*; F, {0}) / q_{0,β}(L0 − ĉ_{α,δ*,F}) = ω(2δ*; F) / (δ*ω′(δ*; F) + ω(δ*; F)).

Typically, ω(δ; F) = Aδ^r (1 + o(1)) as n → ∞, where r determines the optimal rate of convergence of the MSE. The efficiency ratio then converges to 2^r/(1 + r), so for 1/2 ≤ r ≤ 1, the minimax CI has asymptotic efficiency of at least 94.3% when in fact f = 0. Adapting to a G that includes 0 is at least as hard as adapting to the zero function.
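The last step is easy to verify numerically (my own sketch): with ω(δ) = Aδ^r, the ratio ω(2δ)/(δω′(δ) + ω(δ)) = 2^r A δ^r / ((r + 1) A δ^r) = 2^r/(1 + r), independent of A and δ:

```python
def minimax_ci_efficiency(r):
    """Asymptotic efficiency 2^r / (1 + r) of the minimax one-sided CI
    at f = 0 when omega(delta) = A * delta^r."""
    return 2**r / (1 + r)
```

Over r ∈ [1/2, 1] the function is increasing, so its minimum is at r = 1/2, giving 2^{1/2}/1.5 ≈ 0.943: the 94.3% figure above.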
implications of non-adaptivity result

- Need a shape restriction or non-convexity for adaptation. Similar to the impossibility results in Low (1997) and Cai and Low (2004) for two-sided CIs, and in contrast to positive results for MSE.
- The minimax rate of shrinkage describes the actual rate for all functions in the class. It is possible to construct estimators that do better when f is smoother, but impossible to tell how well you did.
- For valid inference when F is convex and centrosymmetric, one has to think hard about the appropriate C. It is not possible to estimate it from the data and do better than if we assume the worst possible case.
adaptivity under monotonicity

Suppose, in the running example, that we know f is non-increasing.

[Figure: least favorable functions without (left) and with (right) directing power, on [−2b/C, 2b/C].]
Without directing power, the optimal estimator is again given by the triangular kernel, but now includes a bias correction (to equate the maximum and minimum bias):

L̂(h) = Σ_i k_i Y_i/σ_i² / Σ_i k_i/σ_i² + b Σ_i sign(x_i) k_i(1 − k_i)/σ_i² / Σ_i k_i/σ_i²,

where k_i = k_T(x_i/h), and the optimal bandwidth is bigger than without monotonicity. About a 20% reduction in quantiles of excess length.

With directing power, the optimal estimator averages all positive observations, and averages negative observations using the triangular kernel. Excess length shrinks at the parametric rate. When the Lipschitz assumption is dropped and only monotonicity is maintained, the optimal estimator averages all positive observations, and excess length still shrinks at the parametric rate.
two-sided adaptive cis

- Fixed-length confidence intervals cannot be adaptive.
- Cai and Low (2004) construct random-length confidence intervals that are within a constant factor of the lower bound on expected length.
- Cai, Low, and Xia (2013) construct random-length confidence intervals under shape constraints that have near-minimum expected length for each individual function (again within a constant).
A natural best-case scenario for two-sided CIs: optimize expected length at a single function, G = {g}. By Pratt (1961), inverting UMP tests against g achieves exactly this. This again amounts to testing a convex null against a convex alternative; the LF function under the null solves

f*_θ = argmin_{f∈F, Lf = θ} Σ_{i=1}^n (f(x_i) − g(x_i))².

Theorem 4 (Adaptation to a function)
The CI with minimum expected measure E_g λ(C) (C the confidence set) subject to 1−α coverage on F inverts the family of tests φ_θ, where φ_θ rejects for large values of ⟨K(g − f*_θ), Y⟩, with critical value given by the 1−α quantile under f*_θ.
cis based on suboptimal estimators

What is the efficiency loss of CIs around suboptimal affine estimators? Affine estimators are Normal, with a variance that doesn't depend on f, and a bias that does. For each performance criterion, only the worst-case bias and variance matter: if we can calculate them, we can also calculate the maximum MSE and the form of one- and two-sided CIs.

Let χ_α(B) solve P(|Z + B| ≤ χ) = Φ(χ − B) − Φ(−χ − B) = 1 − α. Then for an estimator L̂ with variance V and maximum bias B, the shortest CI is L̂ ± V^{1/2} χ_α(B/V^{1/2}).
Theorem 5 (Suboptimal estimators)

Let L̂ = a + ⟨w, Y⟩ be an affine estimator. Then [L̂ − bias̄_F(L̂) − ‖w‖ z_{1−α} σ, ∞) is a valid CI, and L̂ ± σ‖w‖ χ_α(bias̄_F(L̂)/(σ‖w‖)) is the shortest fixed-length 1−α CI centered at L̂.

Not a deep result, but very useful: it allows us to compute the exact efficiency loss from using suboptimal estimators, or the size distortion of CIs with (pointwise) asymptotic justification. An asymptotic version of this theorem can be used to calculate the asymptotic efficiency loss from using a suboptimal kernel and/or a suboptimal bandwidth.
suboptimal estimators in running example

Consider some other kernel k in the running example, L̂ = Σ_i k(x_i/h) Y_i / Σ_i k(x_i/h).

Variance: σ² Σ_i k(x_i/h)² / (Σ_i k(x_i/h))².

Maximum bias, since f ∈ F_Lip(C):

|Σ_i k(x_i/h)(f(x_i) − f(0)) / Σ_i k(x_i/h)| ≤ C Σ_i k(x_i/h)|x_i| / Σ_i k(x_i/h).

The bound is attained at f(x) = −C|x| if k ≥ 0; otherwise it gives an upper bound.
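These two quantities are straightforward to compute for any kernel (my own sketch; `k` is the kernel function, assumed nonnegative so that the bias bound is attained):

```python
def kernel_bias_var(k, h, x, C, sigma):
    """Worst-case bias over F_Lip(C) (exact for k >= 0) and variance of
    the kernel estimator  sum_i k(x_i/h) Y_i / sum_i k(x_i/h)."""
    w = [k(xi / h) for xi in x]
    s = sum(w)
    max_bias = C * sum(wi * abs(xi) for wi, xi in zip(w, x)) / s
    var = sigma**2 * sum(wi**2 for wi in w) / s**2
    return max_bias, var
```

Combined with the χ_α(B) critical value of Theorem 5, this yields an honest fixed-length CI for any kernel and bandwidth.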
Finite-Sample results · Asymptotic results · Applications · Conclusion
renormalization

In many cases (depending on L and the smoothness of F, but including inference at a point and RD), the nonparametric regression problem is asymptotically equivalent to the white noise model Y(dt) = f(t)dt + σ dW(t); see Brown and Low (1996) and Donoho and Low (1992). In the running example, this holds with σ² = σ(0)²/(n f_X(0)).

Suppose F = {f : J(f) ≤ C} for some J (as in the running example), and that for the white noise model the following functionals are homogeneous:

J(a f(·/h)) = a h^{s_J} J(f),
⟨K a_1 f(·/h), K a_2 g(·/h)⟩ = a_1 a_2 h^{2s_K} ⟨Kf, Kg⟩,
L(a f(·/h)) = a h^{s_L} Lf.

In the running example, s_L = 0, s_K = 1/2, and s_J = −1 (the Lipschitz seminorm scales as h^{−1}).
The (single-class) modulus problem then renormalizes: if g*_{C,δ}, f*_{C,δ} solve

max L(f_1 − f_0)  s.t.  ‖K(f_1 − f_0)‖ ≤ δ, J(f_1) ≤ C, J(f_0) ≤ C,

then

g*_{C,δ} = a g*_{1,1}(·/h),  f*_{C,δ} = a f*_{1,1}(·/h),  ω_C(δ) = C^{1−r} δ^r ω_1(1),

where a = δ^{−s_J/(s_K−s_J)} C^{s_K/(s_K−s_J)}, h = (δ/C)^{1/(s_K−s_J)}, and r = (s_L − s_J)/(s_K − s_J). The root minimax MSE and the (excess) length of CIs will shrink at rate n^{−r/2}.
optimal bandwidths

The class of optimal estimators can be written as

L̂_δ = L̂(h) = h^{s_L−2s_K} ⟨K k(·/h), Y⟩ + C h^{s_L−s_J} (Lf*_{M,1,1} − ⟨Kk, K f*_{M,1,1}⟩),

with h = (δ/C)^{1/(s_K−s_J)} and kernel k(u) = r ω_1(1)(g*_{1,1}(u) − f*_{1,1}(u)). Recall that the optimal δ is given by c_l(δ/(2σ)) = δω′(δ)/ω(δ) = r (since ω(δ) ∝ δ^r). Plugging into the definition of h yields the optimal bandwidth

h* = (2σ c_l^{−1}(r)/C)^{1/(s_K−s_J)},

where, for one-sided CIs, c_β^{−1}(r) = (z_β + z_{1−α})/2.
[Figure: ratios of optimal bandwidths for CIs to optimal MSE bandwidths, as a function of r, for s_K = 1/2, s_L = 0; curves for 0.95 and 0.99 two-sided CIs and for one-sided CIs at quantiles q = 0.5 and q = 0.8.]
takeaways from picture

The optimal bandwidth ratios depend only on the dilation exponents s_L, s_K, and s_J:

h*_l / h*_{l′} = (c_l^{−1}(r) / c_{l′}^{−1}(r))^{1/(s_K−s_J)}.

- Bandwidths are of the same order in all cases: no undersmoothing.
- For one-sided CIs, the bandwidth gets larger with the quantile that we are minimizing.
- For 95+% two-sided CIs, if s_L = 0 and s_K = 1/2, the optimal fixed-length CI uses a larger bandwidth than the optimal MSE bandwidth.
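A small numerical sketch of the ratio formula (my own code; it uses the one-sided inverse c_β^{−1}(r) = (z_β + z_{1−α})/2 from the previous slide, and inverts c_ρ(τ) = τ²/(1+τ²) in closed form):

```python
import math

def z_quantile(p):
    """Standard normal quantile by bisection on the erf-based CDF."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c_rho_inv(r):
    """Invert c_rho(tau) = tau^2/(1 + tau^2) = r: tau = sqrt(r/(1-r))."""
    return math.sqrt(r / (1 - r))

def c_onesided_inv(beta, alpha=0.05):
    """One-sided criterion: delta/(2 sigma) = (z_beta + z_{1-alpha})/2."""
    return (z_quantile(beta) + z_quantile(1 - alpha)) / 2

def bw_ratio_onesided_vs_mse(r, beta=0.8, alpha=0.05, sK=0.5, sJ=-1.0):
    """Ratio of the one-sided-CI-optimal to the MSE-optimal bandwidth:
    (c_l^{-1}(r) / c_rho^{-1}(r))^(1/(sK - sJ))."""
    return (c_onesided_inv(beta, alpha) / c_rho_inv(r)) ** (1 / (sK - sJ))
```

The ratio is a fixed number for given (r, criterion), confirming that the bandwidths are of the same order: no undersmoothing.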
For any bandwidth h, the worst-case bias is

bias̄ = ((1 − r)/(2r)) C h^{s_L−s_J} (∫k²)^{1/2}.

Can use this worst-case bias to construct CIs around L̂(h). How much bigger are two-sided CIs around the minimax MSE bandwidth? The ratio of CI lengths is given by

(c_{χ,α}^{−1}(r) / c_ρ^{−1}(r))^{r−1} · χ_α(c_{χ,α}^{−1}(r)(1/r − 1)) / χ_α(c_ρ^{−1}(r)(1/r − 1)),

where χ_α(B) solves P(|N(0,1) + B| ≤ χ) = Φ(χ − B) − Φ(−χ − B) = 1 − α. Need to use χ_α(c_ρ^{−1}(r)(1/r − 1)) instead of z_{1−α/2} as the critical value to ensure coverage for the CI around the minimax MSE bandwidth.
[Figure: length of optimal CIs relative to CIs around the minimax MSE bandwidth: percentage decrease in length as a function of r.]
[Figure: critical values for the CI around the minimax MSE bandwidth, as a function of r.]
[Figure: undercoverage of the CI around the minimax MSE bandwidth when the usual critical values are used: coverage as a function of r.]
takeaways from pictures

- To construct two-sided CIs, one can keep the same bandwidth as for estimation; the price is < 2% in length for 95% CIs.
- Need to use a slightly higher critical value to ensure proper coverage.
suboptimal kernels

The results so far assumed the optimal kernel. Under renormalization, maximum bias and variance renormalize in the same way for suboptimal kernels. For any kernel k, let h̃_k be the bandwidth that equates the maximum bias and the root variance, and let

w(k) = se(L̂_k(h̃_k)) = sup_f bias_f(L̂_k(h̃_k)).

Suppose the criterion scales linearly with maximum bias and root variance.
Theorem 6 (Efficiency loss of suboptimal kernels)

1. The relative efficiency of k and k̃ (where the optimal bandwidth is used in both cases) does not depend on the performance criterion, and is given by w(k)/w(k̃).
2. The results for ratios of optimal bandwidths remain unchanged for suboptimal kernels.
3. The efficiency loss from using a bandwidth optimal for a different criterion, rather than the bandwidth optimal for the criterion of interest, remains unchanged for suboptimal kernels.
corollaries

The bounds for minimax MSE efficiency of different kernels in Cheng, Fan, and Marron (1997):
1. are tight; and
2. hold for other efficiency criteria.

Using the minimax MSE bandwidth for two-sided CIs is a good idea no matter what kernel one uses.
Finite-Sample results · Asymptotic results · Applications · Conclusion
rd

Interested in Lf = lim_{x↓0} f(x) − lim_{x↑0} f(x). Let f_+(x) = f(x)1(x > 0) and f_−(x) = f(x)1(x < 0), so that f = f_+ + f_−. We consider the class

F_RDT,2(C) = {f_+ + f_− : f_+ ∈ F_T,2(C; R_+), f_− ∈ F_T,2(C; R_−)},

where F_T,2(C; X) is the class from Sacks and Ylvisaker (1978),

F_T,2(C; X) = {f : |f(x) − f(0) − f′(0)x| ≤ Cx² for all x ∈ X}.

F_T,2 is also used in Cheng, Fan, and Marron (1997) for estimation at a point, which justifies much of empirical RD practice.
least favorable functions

The least favorable functions are symmetric, g*_δ(x) = −f*_δ(x), and have the form

g*_δ(x) = [(b + d_+x − Cx²)_+ + (b + d_+x + Cx²)_−] 1(x > 0) − [(b + d_−x − Cx²)_+ + (b + d_−x + Cx²)_−] 1(x < 0),

with b, d_+, d_− chosen to solve

0 = Σ_{i=1}^n g*_{−,b,C}(x_i) x_i / σ²(x_i),  0 = Σ_{i=1}^n g*_{+,b,C}(x_i) x_i / σ²(x_i),

and

Σ_{i=1}^n g*_{+,b,C}(x_i) / σ²(x_i) = Σ_{i=1}^n g*_{−,b,C}(x_i) / σ²(x_i),

where g*_{+,b,C} and g*_{−,b,C} denote the two bracketed pieces.
optimal kernel

[Figure: the optimal kernel k(u).]

Asymptotically, g*_δ corresponds to the difference between two kernel estimators, with bandwidths chosen to equate the number of effective observations. The optimal kernel is the same as for inference at a point, derived in Cheng, Fan, and Marron (1997) using an upper bound on the minimax MSE.
application to Lee (2008)

RD design:
- X_i = margin of victory in the previous election for the Democratic party (negative for a Republican victory)
- Y_i = Democratic vote share in a given election
- D_i = 1(X_i ≥ 0) = indicator for Democratic incumbency
- n = 6558 observations on elections between 1946 and 1998

For simplicity, assume homoscedastic errors; use estimates σ̂_−(0)² and σ̂_+(0)² derived using the Imbens and Kalyanaraman (2012) bandwidth. The LF functions are very close to scaled versions of the optimal kernel. Unless C is very small, the results are in line with Lee (2008) and Imbens and Kalyanaraman (2012).
[Figure: minimax MSE estimator as a function of C: estimate of the electoral advantage (%), with variance and squared-bias components, plotted against the effective number of observations.]
[Figure: optimal fixed-length CIs: estimate and upper/lower CI limits for the electoral advantage (%), plotted against the effective number of observations.]
Finite-Sample results · Asymptotic results · Applications · Conclusion
summary

1. Give exact results for (i) minimax optimal and (ii) adaptive one-sided CIs.
   - The CIs use a non-random bias correction based on the worst-case bias.
   - Adaptivity without shape restrictions is severely limited, as in the two-sided case. Impossible to avoid thinking hard about the appropriate C.
2. Give an exact solution to the problem of adaptation to a function.
3. Use these finite-sample results to characterize optimal tuning parameters for different performance criteria:
   - building CIs around the minimax MSE bandwidth is nearly optimal
   - undersmoothing cannot be optimal
OPTIMAL INFERENCE IN A CLASS OF REGRESSION MODELS. By Timothy B. Armstrong and Michal Kolesár. May 2016. Cowles Foundation Discussion Paper No. 2043, Cowles Foundation for Research in Economics, Yale University.
Coverage Error Optimal Confidence Intervals Sebastian Calonico Matias D. Cattaneo Max H. Farrell August 3, 2018 Abstract We propose a framework for ranking confidence interval estimators in terms of their
More informationIntegral approximation by kernel smoothing
Integral approximation by kernel smoothing François Portier Université catholique de Louvain - ISBA August, 29 2014 In collaboration with Bernard Delyon Topic of the talk: Given ϕ : R d R, estimation of
More informationRegression Discontinuity Designs
Regression Discontinuity Designs Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Regression Discontinuity Design Stat186/Gov2002 Fall 2018 1 / 1 Observational
More informationA Simple Adjustment for Bandwidth Snooping
Review of Economic Studies (207) 0, 35 0034-6527/7/0000000$02.00 c 207 The Review of Economic Studies Limited A Simple Adjustment for Bandwidth Snooping TIMOTHY B. ARMSTRONG Yale University E-mail: timothy.armstrong@yale.edu
More informationSmooth functions and local extreme values
Smooth functions and local extreme values A. Kovac 1 Department of Mathematics University of Bristol Abstract Given a sample of n observations y 1,..., y n at time points t 1,..., t n we consider the problem
More informationInterval Estimation. Chapter 9
Chapter 9 Interval Estimation 9.1 Introduction Definition 9.1.1 An interval estimate of a real-values parameter θ is any pair of functions, L(x 1,..., x n ) and U(x 1,..., x n ), of a sample that satisfy
More informationLecture 12 November 3
STATS 300A: Theory of Statistics Fall 2015 Lecture 12 November 3 Lecturer: Lester Mackey Scribe: Jae Hyuck Park, Christian Fong Warning: These notes may contain factual and/or typographic errors. 12.1
More informationWEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract
Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationIndependent and conditionally independent counterfactual distributions
Independent and conditionally independent counterfactual distributions Marcin Wolski European Investment Bank M.Wolski@eib.org Society for Nonlinear Dynamics and Econometrics Tokyo March 19, 2018 Views
More informationWhy high-order polynomials should not be used in regression discontinuity designs
Why high-order polynomials should not be used in regression discontinuity designs Andrew Gelman Guido Imbens 6 Jul 217 Abstract It is common in regression discontinuity analysis to control for third, fourth,
More informationAdditive Isotonic Regression
Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive
More informationConfidence intervals for kernel density estimation
Stata User Group - 9th UK meeting - 19/20 May 2003 Confidence intervals for kernel density estimation Carlo Fiorio c.fiorio@lse.ac.uk London School of Economics and STICERD Stata User Group - 9th UK meeting
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationInverse problems in statistics
Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) YES, Eurandom, 10 October 2011 p. 1/32 Part II 2) Adaptation and oracle inequalities YES, Eurandom, 10 October 2011
More informationNonparametric Methods
Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationMinimum Contrast Empirical Likelihood Manipulation. Testing for Regression Discontinuity Design
Minimum Contrast Empirical Likelihood Manipulation Testing for Regression Discontinuity Design Jun Ma School of Economics Renmin University of China Hugo Jales Department of Economics Syracuse University
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationA Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers
6th St.Petersburg Workshop on Simulation (2009) 1-3 A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers Ansgar Steland 1 Abstract Sequential kernel smoothers form a class of procedures
More informationLectures on Simple Linear Regression Stat 431, Summer 2012
Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR
More informationUnderstanding Regressions with Observations Collected at High Frequency over Long Span
Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University
More informationInference in Regression Discontinuity Designs with a Discrete Running Variable
Inference in Regression Discontinuity Designs with a Discrete Running Variable Michal Kolesár Christoph Rothe arxiv:1606.04086v4 [stat.ap] 18 Nov 2017 November 21, 2017 Abstract We consider inference in
More informationUniversity, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA
This article was downloaded by: [University of New Mexico] On: 27 September 2012, At: 22:13 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered
More informationHYPOTHESIS TESTING: FREQUENTIST APPROACH.
HYPOTHESIS TESTING: FREQUENTIST APPROACH. These notes summarize the lectures on (the frequentist approach to) hypothesis testing. You should be familiar with the standard hypothesis testing from previous
More informationModelling Non-linear and Non-stationary Time Series
Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September
More informationModel-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego
Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships
More informationInference on distributions and quantiles using a finite-sample Dirichlet process
Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics
More informationStat 710: Mathematical Statistics Lecture 31
Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:
More informationClassification with Reject Option
Classification with Reject Option Bartlett and Wegkamp (2008) Wegkamp and Yuan (2010) February 17, 2012 Outline. Introduction.. Classification with reject option. Spirit of the papers BW2008.. Infinite
More informationUniform Confidence Sets for Nonparametric Regression with Application to Cosmology
Uniform Confidence Sets for Nonparametric Regression with Application to Cosmology Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry
More informationInference in Regression Discontinuity Designs with a Discrete Running Variable: Supplemental Materials
Inference in Regression Discontinuity Designs with a Discrete Running Variable: Supplemental Materials Michal Kolesár Christoph Rothe March 27, 2018 This supplement is organized as follows. Section S1
More informationA New Method for Varying Adaptive Bandwidth Selection
IEEE TRASACTIOS O SIGAL PROCESSIG, VOL. 47, O. 9, SEPTEMBER 1999 2567 TABLE I SQUARE ROOT MEA SQUARED ERRORS (SRMSE) OF ESTIMATIO USIG THE LPA AD VARIOUS WAVELET METHODS A ew Method for Varying Adaptive
More informationTESTING REGRESSION MONOTONICITY IN ECONOMETRIC MODELS
TESTING REGRESSION MONOTONICITY IN ECONOMETRIC MODELS DENIS CHETVERIKOV Abstract. Monotonicity is a key qualitative prediction of a wide array of economic models derived via robust comparative statics.
More informationLocal Polynomial Order in Regression Discontinuity Designs 1
Local Polynomial Order in Regression Discontinuity Designs 1 Zhuan Pei 2 Cornell University and IZA David Card 4 UC Berkeley, NBER and IZA David S. Lee 3 Princeton University and NBER Andrea Weber 5 Central
More informationMinimax Estimation of a nonlinear functional on a structured high-dimensional model
Minimax Estimation of a nonlinear functional on a structured high-dimensional model Eric Tchetgen Tchetgen Professor of Biostatistics and Epidemiologic Methods, Harvard U. (Minimax ) 1 / 38 Outline Heuristics
More informationWhy High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs
Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs Andrew GELMAN Department of Statistics and Department of Political Science, Columbia University, New York, NY, 10027 (gelman@stat.columbia.edu)
More informationEcon 582 Nonparametric Regression
Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume
More informationConfidence Sets Based on Shrinkage Estimators
Confidence Sets Based on Shrinkage Estimators Mikkel Plagborg-Møller April 12, 2017 Shrinkage estimators in applied work { } ˆβ shrink = argmin β ˆQ(β) + λc(β) Shrinkage/penalized estimators popular in
More informationTest Volume 11, Number 1. June 2002
Sociedad Española de Estadística e Investigación Operativa Test Volume 11, Number 1. June 2002 Optimal confidence sets for testing average bioequivalence Yu-Ling Tseng Department of Applied Math Dong Hwa
More informationOptimal Bandwidth Choice for Robust Bias Corrected Inference in Regression Discontinuity Designs
Optimal Bandwidth Choice for Robust Bias Corrected Inference in Regression Discontinuity Designs Sebastian Calonico Matias D. Cattaneo Max H. Farrell September 14, 2018 Abstract Modern empirical work in
More informationEmpirical Bayes Quantile-Prediction aka E-B Prediction under Check-loss;
BFF4, May 2, 2017 Empirical Bayes Quantile-Prediction aka E-B Prediction under Check-loss; Lawrence D. Brown Wharton School, Univ. of Pennsylvania Joint work with Gourab Mukherjee and Paat Rusmevichientong
More informationNonparametric Density Estimation
Nonparametric Density Estimation Econ 690 Purdue University Justin L. Tobias (Purdue) Nonparametric Density Estimation 1 / 29 Density Estimation Suppose that you had some data, say on wages, and you wanted
More informationThis model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that
Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear
More informationLecture 21. Hypothesis Testing II
Lecture 21. Hypothesis Testing II December 7, 2011 In the previous lecture, we dened a few key concepts of hypothesis testing and introduced the framework for parametric hypothesis testing. In the parametric
More informationUniform Confidence Sets for Nonparametric Regression with Application to Cosmology
Uniform Confidence Sets for Nonparametric Regression with Application to Cosmology Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry
More informationCausal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies
Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed
More informationPartial Identification and Confidence Intervals
Partial Identification and Confidence Intervals Jinyong Hahn Department of Economics, UCLA Geert Ridder Department of Economics, USC September 17, 009 Abstract We consider statistical inference on a single
More informationVariance Function Estimation in Multivariate Nonparametric Regression
Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered
More informationSemi and Nonparametric Models in Econometrics
Semi and Nonparametric Models in Econometrics Part 4: partial identification Xavier d Haultfoeuille CREST-INSEE Outline Introduction First examples: missing data Second example: incomplete models Inference
More informationNonparametric Inference in Cosmology and Astrophysics: Biases and Variants
Nonparametric Inference in Cosmology and Astrophysics: Biases and Variants Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Collaborators:
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationOptimal bandwidth selection for the fuzzy regression discontinuity estimator
Optimal bandwidth selection for the fuzzy regression discontinuity estimator Yoichi Arai Hidehiko Ichimura The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP49/5 Optimal
More informationAdaptivity to Local Smoothness and Dimension in Kernel Regression
Adaptivity to Local Smoothness and Dimension in Kernel Regression Samory Kpotufe Toyota Technological Institute-Chicago samory@tticedu Vikas K Garg Toyota Technological Institute-Chicago vkg@tticedu Abstract
More informationInference for Identifiable Parameters in Partially Identified Econometric Models
Inference for Identifiable Parameters in Partially Identified Econometric Models Joseph P. Romano Department of Statistics Stanford University romano@stat.stanford.edu Azeem M. Shaikh Department of Economics
More informationSTA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources
STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various
More informationInverse Statistical Learning
Inverse Statistical Learning Minimax theory, adaptation and algorithm avec (par ordre d apparition) C. Marteau, M. Chichignoud, C. Brunet and S. Souchet Dijon, le 15 janvier 2014 Inverse Statistical Learning
More informationOptimal Bandwidth Choice for the Regression Discontinuity Estimator
Optimal Bandwidth Choice for the Regression Discontinuity Estimator Guido Imbens and Karthik Kalyanaraman First Draft: June 8 This Draft: September Abstract We investigate the choice of the bandwidth for
More informationSimultaneous selection of optimal bandwidths for the sharp regression discontinuity estimator
Simultaneous selection of optimal bandwidths for the sharp regression discontinuity estimator Yoichi Arai Hidehiko Ichimura The Institute for Fiscal Studies Department of Economics, UCL cemmap working
More informationNonparametric Regression
Adaptive Variance Function Estimation in Heteroscedastic Nonparametric Regression T. Tony Cai and Lie Wang Abstract We consider a wavelet thresholding approach to adaptive variance function estimation
More informationNonparametric regression with martingale increment errors
S. Gaïffas (LSTA - Paris 6) joint work with S. Delattre (LPMA - Paris 7) work in progress Motivations Some facts: Theoretical study of statistical algorithms requires stationary and ergodicity. Concentration
More informationInference in Regression Discontinuity Designs with a Discrete Running Variable
Inference in Regression Discontinuity Designs with a Discrete Running Variable Michal Kolesár Christoph Rothe June 21, 2016 Abstract We consider inference in regression discontinuity designs when the running
More informationRegression Discontinuity
Regression Discontinuity Christopher Taber Department of Economics University of Wisconsin-Madison October 16, 2018 I will describe the basic ideas of RD, but ignore many of the details Good references
More informationAn Alternative Assumption to Identify LATE in Regression Discontinuity Design
An Alternative Assumption to Identify LATE in Regression Discontinuity Design Yingying Dong University of California Irvine May 2014 Abstract One key assumption Imbens and Angrist (1994) use to identify
More informationSIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011
SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER By Donald W. K. Andrews August 2011 COWLES FOUNDATION DISCUSSION PAPER NO. 1815 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS
More informationA nonparametric method of multi-step ahead forecasting in diffusion processes
A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationAN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY
Econometrics Working Paper EWP0401 ISSN 1485-6441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria
More informationRegression Discontinuity
Regression Discontinuity Christopher Taber Department of Economics University of Wisconsin-Madison October 24, 2017 I will describe the basic ideas of RD, but ignore many of the details Good references
More information