The Numerical Delta Method and Bootstrap
Han Hong and Jessie Li
Stanford University and UCSC
1 / 41
Motivation

Recent developments in econometrics have given empirical researchers access to estimators whose population objective functions are nonsmooth. Hypothesis tests and counterfactual analyses are being performed on nondifferentiable functions of structural parameters.

This paper combines numerical differentiation with resampling to offer an asymptotically valid and computationally attractive approach to inference in a class of possibly nonsmooth problems.

Initially motivated by Fang and Santos (2014) and Dümbgen (1993).
Motivating Example

The Tennessee STAR experiment examined the effects of class size reduction on test scores through a randomized experiment. We are interested in testing whether treatment weakly benefits students at all percentiles of the test score distribution.

Linear conditional quantile model:
$Q_Y(\tau \mid W, X) = \alpha(\tau) + \theta(\tau) W + X'\beta(\tau)$

- $\theta(\tau)$ is the Quantile Treatment Effect (QTE) at the $\tau$th quantile.
- $Y$ is test scores. $W$ is an indicator for assignment to a small class. $X$ are student and teacher covariates.

$H_0: \theta(\tau) \ge 0$ for all $\tau \in T$ versus $H_1: \theta(\tau) < 0$ for some $\tau \in T$, where $T = \{0.05, 0.10, \ldots, 0.95\}$.
Motivating Example

A test statistic is the negative of the minimum of the normalized QTEs:
$S_n = \sqrt{n}\,\varphi(\hat\theta_n) = -\min_{\tau \in T} \frac{\sqrt{n}\,\hat\theta_n(\tau)}{\widehat{\mathrm{AsyVar}}(\hat\theta_n(\tau))^{1/2}}$

We need to estimate the test statistic's limiting distribution under $H_0$ and use the percentiles of that distribution to form critical values.

The minimum function makes the test statistic a nondifferentiable function of the QTEs, which invalidates the standard delta method and bootstrap. Subsampling is a viable alternative, but more information from the sample can be used.
Directionally Differentiable Functions

We are still able to obtain the test statistic's limiting distribution because the function is directionally differentiable. A directionally differentiable function has a different derivative depending on how you orient the tangent plane. For ease of visualization, look at $\min(\beta_1, \beta_2)$.
Outline

1. The Directional Delta Method
2. The Numerical Delta Method
   - Pointwise Valid Confidence Intervals
   - Uniformly Valid Inference
3. Second Order Directional Delta Method
   - Application to Partially Identified Models
4. The Numerical Bootstrap
   - General Principle
   - Applications to Nonsmooth Optimization Problems
   - Comparison with Subsampling
5. Empirical Application to Tennessee STAR Experiment
The Directional Delta Method
Fang and Santos (2014), Shapiro (1991), Dümbgen (1993)

Consider a function(al) $\varphi(\cdot)$ which is Hadamard directionally differentiable.

Goal: statistical inference for $\varphi(\theta_0)$ using the limiting distribution of $r_n(\varphi(\hat\theta_n) - \varphi(\theta_0))$, where $r_n(\hat\theta_n - \theta_0)$ converges in distribution to $\mathbb{G}_0$.

Consider rewriting the statistic as a finite difference:
$r_n\left(\varphi(\hat\theta_n) - \varphi(\theta_0)\right) = \frac{\varphi\left(\theta_0 + \frac{1}{r_n}\, r_n(\hat\theta_n - \theta_0)\right) - \varphi(\theta_0)}{1/r_n}$

As $n \to \infty$, the stepsize $1/r_n \to 0$, and the finite difference converges to the Hadamard directional derivative evaluated at $\theta_0$ in the direction $\mathbb{G}_0$. A consistent estimate of $\varphi'_{\theta_0}(\mathbb{G}_0)$ can be used to conduct inference.
Examples of Directionally Differentiable Functions

Regression function in threshold regression (e.g. Hansen 2015). Reinhart and Rogoff (2010) argue that economic growth declines when government debt relative to GDP exceeds a threshold:
$y_t = \beta_1 (x_t - \gamma)_- + \beta_2 (x_t - \gamma)_+ + \beta_3 z_t + e_t = \varphi_t(\theta) + e_t$
where $y_t$ is GDP growth and $x_t$ is the debt-to-GDP percentage. We would like to form confidence bands around the regression function. $\varphi_t(\theta)$ is directionally but not fully differentiable at $x_t = \gamma$.

Other examples:
- Upper and lower bounds on the value distribution in an incomplete auction model (e.g. Haile and Tamer 2003).
- Test statistics for subvector inference in moment inequality models (e.g. Bugni, Canay and Shi 2014).
- Endpoints of the identified set for partially identified parameters (e.g. Lee and Bhattacharya 2016).
Literature Review

Fang and Santos (2014) show that consistent estimates of the directional derivative can be used for inference. They derive analytic expressions for the directional derivative and estimate its components on a case-by-case basis.

We generalize Dümbgen's method to avoid analytic derivations. It achieves pointwise valid inference for all directionally differentiable functions and uniformly valid inference under convexity and Lipschitz continuity of the function(al). We allow for estimators that are not $\sqrt{n}$-consistent or asymptotically normal.

Hirano and Porter (2012) and Song (2014) focus on estimation rather than inference. Woutersen and Ham (2016) propose a method for conducting inference on nonsmooth functions of parameters based on projections.
Examples of Directional Derivatives

For $\varphi(\theta) = a\theta_+ + b\theta_-$, where $\theta_+ = \max\{\theta, 0\}$ and $\theta_- = \min\{\theta, 0\}$:
$\varphi'_\theta(h) = a h\,1(\theta > 0) + b h\,1(\theta < 0) + (a h_+ + b h_-)\,1(\theta = 0)$

For $\varphi(\theta) = \max\{\theta_1, \theta_2, \ldots, \theta_K\}$:
$\varphi'_\theta(h) = \max_{k \in I} h_k$, where $I = \{k : \theta_k = \max\{\theta_1, \theta_2, \ldots, \theta_K\}\}$

For $\varphi(\theta) = \inf_{\tau \in T} \theta(\tau)$:
$\varphi'_\theta(h) = \inf_{\tau \in T_0} h(\tau)$, where $T_0 = \operatorname{argmin}_{\tau \in T} \theta(\tau)$
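These formulas can be sanity-checked numerically. The sketch below (all function names are mine, not from the slides) compares the one-sided finite difference against the analytic directional derivative of the min functional at a point with ties, where the derivative is nonlinear in the direction $h$.

```python
import numpy as np

def phi(theta):
    """phi(theta) = min over coordinates, a directionally differentiable map."""
    return np.min(theta)

def directional_derivative_min(theta, h, tol=1e-12):
    """Analytic directional derivative: min of h over the argmin set of theta."""
    argmin_set = np.abs(theta - theta.min()) < tol
    return np.min(h[argmin_set])

def numerical_directional_derivative(phi, theta, h, eps):
    """One-sided finite difference (phi(theta + eps*h) - phi(theta)) / eps."""
    return (phi(theta + eps * h) - phi(theta)) / eps

theta = np.array([0.0, 0.0, 1.0])   # ties at the minimum: derivative nonlinear in h
h = np.array([2.0, -3.0, 1.0])

analytic = directional_derivative_min(theta, h)   # min over the tied coordinates
numeric = numerical_directional_derivative(phi, theta, h, eps=1e-6)
print(analytic, numeric)
```

Here the argmin set is the first two coordinates, so the analytic derivative is $\min\{2, -3\} = -3$, and the finite difference matches it for small `eps`.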
The Numerical Directional Delta Method

Estimate the distribution of $\varphi'_{\theta_0}(\mathbb{G}_0)$ using the distribution of
$\hat\varphi'_n(Z_n) \equiv \frac{\varphi(\hat\theta_n + \epsilon_n Z_n) - \varphi(\hat\theta_n)}{\epsilon_n}$
where $Z_n \leadsto \mathbb{G}_0$ conditional on the data (convergence in distribution conditional on the data), e.g.:
- Asymptotic normal approximation: $Z_n \sim N(0, \hat\sigma_n^2)$.
- Bootstrap: $Z_n = r_n(\hat\theta_n^* - \hat\theta_n)$, where $\hat\theta_n^*$ are the bootstrapped estimates.
- MCMC: $Z_n = r_n(\hat\theta_n^* - \hat\theta_n)$, where $\hat\theta_n^*$ are draws from the posterior.

Theorem. For $\varphi(\cdot)$ Hadamard directionally differentiable at $\theta_0$, $\epsilon_n \downarrow 0$, $r_n \epsilon_n \to \infty$, and $Z_n \leadsto \mathbb{G}_0$ conditional on the data,
$\hat\varphi'_n(Z_n) \leadsto \varphi'_{\theta_0}(\mathbb{G}_0)$ conditional on the data.
Numerical Delta Method Algorithm: Pointwise Valid Confidence Intervals

Suppose we take the bootstrap approach:
- For $B$ iterations, draw with replacement a resample of size $n$ and reestimate the parameters $\hat\theta_n^*$.
- Form the $B \times \dim(\theta)$ matrix $Z_n = r_n(\hat\theta_n^* - \hat\theta_n)$.
- Form the $B \times 1$ vector $\hat\varphi'_n(Z_n) = \frac{\varphi(\hat\theta_n + \epsilon_n Z_n) - \varphi(\hat\theta_n)}{\epsilon_n}$.

A $1-\alpha$ two-sided equal-tailed confidence interval for $\varphi(\theta_0)$ can be formed by
$\left[\varphi(\hat\theta_n) - \frac{1}{r_n} c_{1-\alpha/2},\; \varphi(\hat\theta_n) - \frac{1}{r_n} c_{\alpha/2}\right]$
where $c_{1-\alpha/2}$ and $c_{\alpha/2}$ are the $(1-\alpha/2)$th and $(\alpha/2)$th percentiles of the empirical distribution of $\hat\varphi'_n(Z_n)$.

A $1-\alpha$ symmetric confidence interval can be formed by
$\left[\varphi(\hat\theta_n) - \frac{1}{r_n} d_{1-\alpha},\; \varphi(\hat\theta_n) + \frac{1}{r_n} d_{1-\alpha}\right]$
where $d_{1-\alpha}$ is the $(1-\alpha)$th percentile of the empirical distribution of $|\hat\varphi'_n(Z_n)|$.
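The algorithm above can be sketched in a few lines. This is an illustrative implementation on simulated data with $\varphi$ taken to be the minimum over components; the data-generating process and the choice $\epsilon_n = n^{-1/4}$ are my own assumptions, not prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(theta):
    """Nonsmooth functional of interest: here the minimum over components."""
    return np.min(theta, axis=-1)

# Simulated data: theta_0 = E[z], estimated by the sample mean (r_n = sqrt(n)).
n, B = 500, 999
z = rng.normal(loc=[0.2, 0.5], scale=1.0, size=(n, 2))
theta_hat = z.mean(axis=0)
r_n = np.sqrt(n)
eps_n = n ** (-1 / 4)       # eps_n -> 0 while r_n * eps_n -> infinity

# Bootstrap draws Z_n = r_n (theta*_n - theta_hat), a B x dim(theta) matrix.
idx = rng.integers(0, n, size=(B, n))
theta_star = z[idx].mean(axis=1)
Z_n = r_n * (theta_star - theta_hat)

# Numerical directional derivative applied to each bootstrap draw.
phi_prime = (phi(theta_hat + eps_n * Z_n) - phi(theta_hat)) / eps_n

# 95% equal-tailed interval: [phi_hat - c_{.975}/r_n, phi_hat - c_{.025}/r_n].
c_lo, c_hi = np.quantile(phi_prime, [0.025, 0.975])
ci_equal_tailed = (phi(theta_hat) - c_hi / r_n, phi(theta_hat) - c_lo / r_n)

# 95% symmetric interval based on percentiles of |phi_prime|.
d = np.quantile(np.abs(phi_prime), 0.95)
ci_symmetric = (phi(theta_hat) - d / r_n, phi(theta_hat) + d / r_n)
print(ci_equal_tailed, ci_symmetric)
```

Only the final step differs from the naive bootstrap: the bootstrap draws enter through the finite difference $\hat\varphi'_n(Z_n)$ rather than through $\varphi$ directly.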
Choice of $\epsilon_n$ for the First Order Numerical Delta Method

The optimal $\epsilon_n$ should minimize the error between $\hat\varphi'_n(Z_n)$ and $\varphi'_{\theta_0}(\mathbb{G}_0)$. Suppose $\varphi'_{\theta_0}(\cdot)$ is Lipschitz.

When the second order directional derivative is nonzero:
- If $\varphi'_{\theta_0}(\cdot)$ is not linear, $\epsilon_n = O(\sqrt{1/r_n})$ leads to an error of $O_P(\sqrt{1/r_n})$.
- If $\varphi'_{\theta_0}(\cdot)$ is linear, $\epsilon_n = O(1/r_n)$ leads to an error of $O_P(1/r_n)$.

When the second order directional derivative is zero, e.g. $\varphi(\theta) = a\theta_+ + b\theta_-$ or $\varphi(\theta) = \min\{\theta_1, \ldots, \theta_K\}$, $\epsilon_n$ should converge to zero very slowly to get an error of $O_P(1/r_n)$.
Uniform Size Control

Consider hypothesis testing of the following form:
$H_0: \varphi(\theta_0) \le 0$ against $H_1: \varphi(\theta_0) > 0$, using the test statistic $r_n \varphi(\hat\theta_n)$.

Examples of such tests include:
- Dominance test: $H_0: \theta(\tau) \ge 0$ for all $\tau \in T$ versus $H_1: \theta(\tau) < 0$ for some $\tau \in T$:
$r_n \varphi(\hat\theta_n) = -\min_{\tau \in T} \frac{\sqrt{n}\,\hat\theta_n(\tau)}{\widehat{\mathrm{AsyVar}}(\hat\theta_n(\tau))^{1/2}}$
- Moment inequalities test: $H_0: \theta_{0k} = E[X_k] \ge 0$ for all $k = 1, \ldots, K$ versus $H_1: \theta_{0k} < 0$ for some $k = 1, \ldots, K$:
$r_n \varphi(\hat\theta_n) = \sqrt{n}\left(\sum_{k=1}^K \left((\bar X_k)_-\right)^2\right)^{1/2}$
Uniform Size Control & Uniformly Valid Confidence Intervals

Reject $H_0$ whenever $r_n \varphi(\hat\theta_n) \ge \hat c_{1-\alpha}$, where $\hat c_{1-\alpha}$ is the $1-\alpha$ quantile of $\hat\varphi'_n(Z_n)$. Whenever $\varphi(\theta)$ is convex and Lipschitz in $\theta$, the size of this test will be less than or equal to $\alpha$ uniformly over a class of data generating distributions.

If $\varphi(\cdot)$ is convex and Lipschitz, the upper one-sided confidence interval $\left[\varphi(\hat\theta_n) - \frac{\hat c_{1-\alpha}}{r_n},\, \infty\right)$ has coverage greater than or equal to $1-\alpha$ uniformly over a class of data generating distributions.

If $\varphi(\cdot)$ is concave and Lipschitz, then the lower one-sided confidence interval $\left(-\infty,\, \varphi(\hat\theta_n) - \frac{\hat c_{\alpha}}{r_n}\right]$ will have uniformly valid coverage asymptotically.
A Generalized Numerical Bootstrap Method

Inference on parameters which can be written as functions of the data generating distribution: write $\theta_0 = \theta(P)$ and $\hat\theta_n = \theta(P_n)$, where $P$ is the data generating distribution and $P_n$ is the empirical distribution.

The goal is to consistently estimate the limiting distribution of $a(n)(\theta(P_n) - \theta(P)) = n^\gamma(\hat\theta_n - \theta_0)$:
$a(n)\left(\theta\left(P + \frac{1}{\sqrt{n}}\,\sqrt{n}(P_n - P)\right) - \theta(P)\right) \leadsto J.$

For $\epsilon_n \to 0$ and $\sqrt{n}\,\epsilon_n \to \infty$, the numerical bootstrap replaces $P$ with $P_n$, $1/\sqrt{n}$ with $\epsilon_n$, and $\sqrt{n}(P_n - P)$ with $\sqrt{n}(P_n^* - P_n)$, where $P_n^*$ is the bootstrapped empirical distribution:
$a\left(\frac{1}{\epsilon_n^2}\right)\left(\theta\left(P_n + \epsilon_n \sqrt{n}(P_n^* - P_n)\right) - \theta(P_n)\right) = \epsilon_n^{-2\gamma}(\hat\theta_n^* - \hat\theta_n) \equiv Z_n^*$
Comparison of the Numerical Bootstrap with Subsampling

Subsampling approximates the limiting distribution of
$a(n)\left(\theta\left(P + \frac{1}{\sqrt{n}}\,\sqrt{n}(P_n - P)\right) - \theta(P)\right)$
using the distribution of
$a(b)\left(\theta\left(P_n + \frac{1}{\sqrt{b}}\,\sqrt{b}(P_b - P_n)\right) - \theta(P_n)\right).$

The numerical bootstrap estimates $\sqrt{n}(P_n - P)$ using the entire sample of size $n$, which is more precise than using a subsample of size $b \ll n$:
$a\left(\frac{1}{\epsilon_n^2}\right)\left(\theta\left(P_n + \epsilon_n \sqrt{n}(P_n^* - P_n)\right) - \theta(P_n)\right)$
Maximum Score Estimator of Manski (1975)

Model: $y_i = 1(x_i'\theta + \nu_i \ge 0)$, where $P(\nu_i < 0 \mid x_i = x) = 0.5$ for all $x$. Maximize the number of correct predictions:
$\hat\theta_n = \operatorname{argmax}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n \left[2 \cdot 1(y_i = 1) - 1\right] 1(x_i'\theta \ge 0)$

Kim and Pollard (1990) show that the maximum score estimator converges to a nonstandard limiting distribution at the cube root rate. Abrevaya and Huang (2005) show that one cannot use the bootstrap to estimate that limiting distribution consistently. Subsampling is a viable alternative.

Alternative bootstrap methods:
- Seijo and Sen (2011): based on smoothing.
- Most recently Cattaneo, Jansson and Nagasawa (2017): combine with numerical Hessian estimation.
- Horowitz (1992), Hong, Mahajan and Nekipelov (2016): smooth the objective function.
M-estimator Consistency

$\hat\theta_n = \operatorname{argmax}_{\theta \in \Theta} P_n \pi(\cdot, \theta) = \frac{1}{n}\sum_{i=1}^n \pi(z_i, \theta).$

We approximate the limiting distribution of $n^\gamma(\hat\theta_n - \theta_0)$ using the finite sample distribution of $\epsilon_n^{-2\gamma}(\hat\theta_n^* - \hat\theta_n)$, where $\hat\theta_n^* = \operatorname{argmax}_{\theta \in \Theta} Z_n^* \pi(\cdot, \theta)$, and $Z_n^* = P_n + \epsilon_n \hat{\mathbb{G}}_n^*$ is a linear combination of the empirical distribution and the bootstrapped empirical process.

For example, when $\hat{\mathbb{G}}_n^*$ is the multinomial bootstrap, for each bootstrap sample $z_i^*$, $i = 1, \ldots, n$:
$\hat\theta_n^* = \operatorname{argmax}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n \pi(z_i, \theta) + \epsilon_n \sqrt{n}\,\frac{1}{n}\sum_{i=1}^n \left(\pi(z_i^*, \theta) - \pi(z_i, \theta)\right).$
M-estimator Consistency

On the other hand, when $\hat{\mathbb{G}}_n^*$ is the wild bootstrap:
$\hat\theta_n^* = \operatorname{argmax}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n \pi(z_i, \theta) + \epsilon_n \sqrt{n}\,\frac{1}{n}\sum_{i=1}^n (\xi_i - \bar\xi)\,\pi(z_i, \theta).$

For $Z_0(h)$ a mean zero Gaussian process with covariance kernel $\Sigma_\rho$ and nondegenerate increments,
$n^\gamma(\hat\theta_n - \theta_0) \leadsto J \equiv \operatorname{argmax}_h\, Z_0(h) - \frac{1}{2} h' H h$
and $\epsilon_n^{-2\gamma}(\hat\theta_n^* - \hat\theta_n) \leadsto J$ conditional on the data.
Maximum Score Estimator of Manski (1975)

For each bootstrap draw $\{y_i^*, x_i^*\}_{i=1}^n$, compute the numerical bootstrap estimate:
$\hat\theta_n^* = \operatorname{argmax}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n \left[2 \cdot 1(y_i = 1) - 1\right] 1(x_i'\theta \ge 0)$
$\quad + \epsilon_n \sqrt{n}\,\frac{1}{n}\sum_{i=1}^n \left\{\left[2 \cdot 1(y_i^* = 1) - 1\right] 1(x_i^{*\prime}\theta \ge 0) - \left[2 \cdot 1(y_i = 1) - 1\right] 1(x_i'\theta \ge 0)\right\}$

Use the simulated distribution (conditional on the data) of $\epsilon_n^{-2/3}(\hat\theta_n^* - \hat\theta_n)$ to approximate the limit distribution of $n^{1/3}(\hat\theta_n - \theta_0)$.
Maximum Score Estimator of Manski (1975)

A $1-\alpha$ two-sided equal-tailed confidence interval for $\theta_0$ can be formed by
$\left[\hat\theta_n - \frac{1}{n^{1/3}} c_{1-\alpha/2},\; \hat\theta_n - \frac{1}{n^{1/3}} c_{\alpha/2}\right]$
where $c_{1-\alpha/2}$ and $c_{\alpha/2}$ are the $(1-\alpha/2)$th and $(\alpha/2)$th percentiles of the distribution of $\epsilon_n^{-2/3}(\hat\theta_n^* - \hat\theta_n)$.

A $1-\alpha$ symmetric confidence interval can be formed by
$\left[\hat\theta_n - \frac{1}{n^{1/3}} d_{1-\alpha},\; \hat\theta_n + \frac{1}{n^{1/3}} d_{1-\alpha}\right]$
where $d_{1-\alpha}$ is the $(1-\alpha)$th percentile of the distribution of $|\epsilon_n^{-2/3}(\hat\theta_n^* - \hat\theta_n)|$.
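A minimal end-to-end sketch of the numerical bootstrap for a maximum score coefficient, using grid search over a single free parameter. The design, the grid, and the choice $\epsilon_n = n^{-1/6}$ are illustrative assumptions of mine; the slides do not prescribe them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated maximum score design: y = 1(x1 + b0 * x2 + nu >= 0), med(nu | x) = 0.
n, B, b0 = 400, 199, 1.0
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
nu = 0.5 * rng.standard_cauchy(size=n)     # median zero; heavy tails are allowed
y = (x1 + b0 * x2 + nu >= 0).astype(float)

grid = np.linspace(-1.0, 3.0, 801)         # grid search over the free coefficient
sgn = 2 * y - 1                            # [2 * 1(y = 1) - 1]

def score(x1s, x2s, sgns):
    """Maximum score objective on the grid: mean of sgn * 1(x1 + b*x2 >= 0)."""
    return (sgns[:, None] * (x1s[:, None] + grid[None, :] * x2s[:, None] >= 0)).mean(axis=0)

S_n = score(x1, x2, sgn)
b_hat = grid[np.argmax(S_n)]

eps_n = n ** (-1 / 6)                      # eps_n -> 0 with sqrt(n) * eps_n -> infinity
draws = np.empty(B)
for b_i in range(B):
    idx = rng.integers(0, n, size=n)
    S_star = score(x1[idx], x2[idx], sgn[idx])
    # Numerical bootstrap criterion: S_n(b) + eps_n * sqrt(n) * (S*_n(b) - S_n(b))
    crit = S_n + eps_n * np.sqrt(n) * (S_star - S_n)
    draws[b_i] = eps_n ** (-2 / 3) * (grid[np.argmax(crit)] - b_hat)

# Equal-tailed 95% interval built from percentiles of eps_n^{-2/3} (b* - b_hat).
c_lo, c_hi = np.quantile(draws, [0.025, 0.975])
ci = (b_hat - c_hi / n ** (1 / 3), b_hat - c_lo / n ** (1 / 3))
print(b_hat, ci)
```

The key design choice is that each bootstrap criterion keeps the full-sample objective and adds only an $\epsilon_n$-damped bootstrap perturbation, rather than reoptimizing the bootstrap objective outright as the (inconsistent) standard bootstrap would.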
Constrained M-estimation

Replace $\Theta$ with a constraint set $C$, such that for $\hat\theta_n \in C$,
$P_n \pi(\cdot, \hat\theta_n) \le \inf_{\theta \in C} P_n \pi(\cdot, \theta) + o_P\left(n^{-2\gamma}\right), \quad (4)$
and for $\hat\theta_n^* \in C$,
$Z_n^* \pi(\cdot, \hat\theta_n^*) \le \inf_{\theta \in C} Z_n^* \pi(\cdot, \theta) + o_P\left(\epsilon_n^{4\gamma}\right).$

Let $T_C(\theta_0)$ be a cone such that when $\alpha \to \infty$, $\alpha(C - \theta_0) \to T_C(\theta_0)$, and let
$J \equiv \operatorname{argmin}_{h \in T_C(\theta_0)} Z_0(h) + \frac{1}{2} h' H h.$
Then $n^\gamma(\hat\theta_n - \theta_0) \leadsto J$ and $\epsilon_n^{-2\gamma}(\hat\theta_n^* - \hat\theta_n) \leadsto J$ conditional on the data.

One can also estimate $T_C(\theta_0)$ directly by $(C - \hat\theta_n)/\epsilon_n$ in some situations.
Sample Size Dependent Statistics

For $\hat{\mathbb{G}}_n = \sqrt{n}(P_n - P)$, the statistic may depend on the sample size directly:
$\hat\theta_n = \theta(P_n, n) = \theta\left(P + \frac{1}{\sqrt{n}}\hat{\mathbb{G}}_n,\; n\right).$

Suppose $\hat J_n = a(n)\left(\theta(P_n, n) - \theta_0\right) \leadsto J$. Define
$\hat\theta_n^* = \theta\left(Z_n^*, \frac{1}{\epsilon_n^2}\right) = \theta\left(P_n + \epsilon_n \hat{\mathbb{G}}_n^*,\; \frac{1}{\epsilon_n^2}\right).$
Then
$\hat J_n^* = a\left(\frac{1}{\epsilon_n^2}\right)\left(\hat\theta_n^* - \hat\theta_n\right) \leadsto J$ conditional on the data.

Examples include Laplace estimators (e.g. Jun, Pinkse and Wan) and the LASSO.
Application to LASSO, Finite Dimensional Case

The LASSO's asymptotic distribution cannot be consistently estimated by the conventional bootstrap when some of the coefficients are zero.
$\hat\beta_n = \operatorname{argmin}_\beta \frac{1}{n}\sum_{i=1}^n \left(y_i - x_i'\beta\right)^2 + \frac{\lambda_n}{\sqrt{n}} \sum_{k=1}^p |\beta_k|. \quad (5)$

The numerical bootstrap consistently estimates the asymptotic distribution of $\sqrt{n}(\hat\beta_n - \beta_0)$ using the distribution of $\epsilon_n^{-1}(\hat\beta_n^* - \hat\beta_n)$, where
$\hat\beta_n^* = \operatorname{argmin}_\beta\, Z_n^*\left(y - x'\beta\right)^2 + \lambda_n \epsilon_n \sum_{k=1}^p |\beta_k| \quad (6)$
$Z_n^*\left(y - x'\beta\right)^2 = \frac{1}{n}\sum_{i=1}^n \left(y_i - x_i'\beta\right)^2 + \epsilon_n \sqrt{n}\,\frac{1}{n}\sum_{i=1}^n \left(\left(y_i^* - x_i^{*\prime}\beta\right)^2 - \left(y_i - x_i'\beta\right)^2\right) \quad (7)$
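A sketch of the LASSO numerical bootstrap in equations (5)-(7), on simulated data with one zero coefficient (the case where the standard bootstrap fails). Using `scipy.optimize.minimize` with Powell's method is my own expedient for this small nonsmooth problem, and $\lambda_n$ and $\epsilon_n$ are illustrative choices, not values from the slides.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# DGP with one active and one zero coefficient.
n, p, B = 400, 2, 199
beta0 = np.array([1.0, 0.0])
X = rng.normal(size=(n, p))
y = X @ beta0 + rng.normal(size=n)

lam_n = 2.0                  # lambda_n with lambda_n / sqrt(n) -> lambda_0 >= 0
eps_n = n ** (-1 / 4)        # eps_n -> 0, sqrt(n) * eps_n -> infinity

def lasso_obj(b, Xs, ys):
    """LASSO criterion (5): empirical squared loss plus scaled L1 penalty."""
    return np.mean((ys - Xs @ b) ** 2) + (lam_n / np.sqrt(n)) * np.abs(b).sum()

beta_hat = minimize(lasso_obj, np.zeros(p), args=(X, y), method="Powell").x

def nb_obj(b, Xs, ys):
    """Numerical bootstrap criterion (6)-(7): full-sample loss plus an
    eps_n * sqrt(n)-scaled bootstrap perturbation and eps_n-scaled penalty."""
    loss = np.mean((y - X @ b) ** 2)
    pert = eps_n * np.sqrt(n) * (np.mean((ys - Xs @ b) ** 2) - loss)
    return loss + pert + lam_n * eps_n * np.abs(b).sum()

draws = np.empty((B, p))
for b_i in range(B):
    idx = rng.integers(0, n, size=n)
    beta_star = minimize(nb_obj, beta_hat, args=(X[idx], y[idx]), method="Powell").x
    draws[b_i] = (beta_star - beta_hat) / eps_n   # eps_n^{-1} (beta* - beta_hat)

# Percentile confidence intervals for each coefficient: beta_hat - c / sqrt(n).
lo = beta_hat - np.quantile(draws, 0.975, axis=0) / np.sqrt(n)
hi = beta_hat - np.quantile(draws, 0.025, axis=0) / np.sqrt(n)
print(beta_hat, np.column_stack([lo, hi]))
```

For larger problems a derivative-free optimizer would be replaced by a proximal or coordinate-descent solver; the numerical bootstrap structure of the criterion is unchanged.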
Application to 1-Norm SVM, Finite Dimensional Case

For $\kappa > 0$, $\lambda_n > 0$, the 1-norm SVM estimator is
$\hat\beta_n = \operatorname{argmin}_\beta \frac{1}{n}\sum_{i=1}^n \left(\rho_\tau\left(y_i - x_i'\beta\right) - \kappa\right)_+ + \frac{\lambda_n}{\sqrt{n}} \sum_{j=1}^k |\beta_j|$
where $\rho_\tau(u)$ is the Koenker and Bassett (1978) check function. If $\tau = \frac{1}{2}$ and $\kappa = 0$, then $\hat\beta$ is the LASSO quantile regression estimator of Belloni et al. (2011).

The numerical bootstrap consistently estimates the asymptotic distribution of $\sqrt{n}(\hat\beta_n - \beta_0)$ using the distribution of $\epsilon_n^{-1}(\hat\beta_n^* - \hat\beta_n)$:
$\hat\beta_n^* = \operatorname{argmin}_\beta \frac{1}{n}\sum_{i=1}^n \left(\rho_\tau\left(y_i - x_i'\beta\right) - \kappa\right)_+$
$\quad + \epsilon_n \sqrt{n}\,\frac{1}{n}\sum_{i=1}^n \left(\left(\rho_\tau\left(y_i^* - x_i^{*\prime}\beta\right) - \kappa\right)_+ - \left(\rho_\tau\left(y_i - x_i'\beta\right) - \kappa\right)_+\right) + \lambda_n \epsilon_n \sum_{j=1}^k |\beta_j| \quad (8)$
Recentering

Test $H_0: \theta(P) = \bar\theta$ vs $H_1: \theta(P) > \bar\theta$. Estimate the distribution of $a(n)(\hat\theta_n - \bar\theta)$ by either
(1) the noncentered numerical bootstrap distribution $a\left(\frac{1}{\epsilon_n^2}\right)(\hat\theta_n^* - \bar\theta)$, or
(2) the centered numerical bootstrap distribution $a\left(\frac{1}{\epsilon_n^2}\right)(\hat\theta_n^* - \hat\theta_n)$.
This is similar to subsampling (Chernozhukov et al.).

One can also estimate unknown polynomial rates of convergence.
Second Order Directional Delta Method

The first order directional delta method may produce a degenerate limiting distribution. E.g. test statistics commonly used for moment inequality models have a zero first order directional derivative under the null.

Theorem. Let $\varphi(\cdot)$ be a twice Hadamard directionally differentiable function at $\theta_0$ and $r_n(\hat\theta_n - \theta_0) \leadsto \mathbb{G}_0$. If $\varphi'_{\theta_0}(h) = 0$ for all $h$, then
$r_n^2\left(\varphi(\hat\theta_n) - \varphi(\theta_0)\right) \leadsto J \equiv \frac{1}{2}\varphi''_{\theta_0}(\mathbb{G}_0). \quad (2)$
Application to Partially Identified Models

Simplified 2x2 entry game in Bresnahan and Reiss (1991): firm $j \in \{1, 2\}$ decides whether to enter a market $i \in \{1, \ldots, n\}$.
- Action: $z_{j,i} = 1$ if firm $j$ enters market $i$.
- Benefit of entry: $\eta_{j,i} \sim U(0, 1)$.
- Profit function: $\pi_{j,i} = (\eta_{j,i} - \beta_j z_{-j,i})\,1\{z_{j,i} = 1\}$, where $\beta \in (0, 1)^2$.

Firms play pure strategy Nash equilibria:
1. $(z_{1i}, z_{2i}) = (1, 1)$ if $\eta_{j,i} > \beta_j$ for both $j$.
2. $(z_{1i}, z_{2i}) = (1, 0)$ if $\eta_{1,i} > \beta_1$ and $\eta_{2,i} < \beta_2$.
3. $(z_{1i}, z_{2i}) = (0, 1)$ if $\eta_{1,i} < \beta_1$ and $\eta_{2,i} > \beta_2$.
4. $(z_{1i}, z_{2i}) \in \{(1, 0), (0, 1)\}$ if $\eta_{j,i} < \beta_j$ for both $j$.

The model implies
$P(z_{1i} = 1, z_{2i} = 1) = (1 - \beta_1)(1 - \beta_2)$
$\beta_2(1 - \beta_1) \le P(z_{1i} = 1, z_{2i} = 0) \le \beta_2$
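The implied probabilities can be verified by simulating the game. The coin-flip selection rule in the multiplicity region below is an arbitrary assumption of mine, used only to generate data; the model itself leaves selection free, which is exactly why $P(z_{1i}=1, z_{2i}=0)$ is only bounded rather than point identified.

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of the choice probabilities implied by the 2x2 entry game.
beta1, beta2 = 0.3, 0.5
n = 200_000
eta1, eta2 = rng.uniform(size=n), rng.uniform(size=n)
in1, in2 = eta1 > beta1, eta2 > beta2
coin = (rng.uniform(size=n) < 0.5).astype(int)   # arbitrary equilibrium selection

# Unique equilibrium outside the multiplicity region {eta_j < beta_j for both j};
# a coin flip between (1, 0) and (0, 1) inside it.
z1 = np.where(in1, 1, np.where(in2, 0, coin))
z2 = np.where(in2, 1, np.where(in1, 0, 1 - coin))

p11 = np.mean((z1 == 1) & (z2 == 1))
p10 = np.mean((z1 == 1) & (z2 == 0))
print(p11, (1 - beta1) * (1 - beta2))    # point identified: equal up to MC error
print(beta2 * (1 - beta1), p10, beta2)   # partially identified: p10 inside bounds
```

Changing the selection probability moves $P(z_{1i}=1, z_{2i}=0)$ anywhere between the two bounds while leaving $P(z_{1i}=1, z_{2i}=1)$ fixed.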
Application to Partially Identified Models

The model leads to $Q = 4$ moment inequalities $Pg(\cdot; \beta) \equiv E\,g(z_i; \beta) \ge 0$:
$Pg(\cdot; \beta) = E\begin{pmatrix} z_{1i} z_{2i} - (1 - \beta_1)(1 - \beta_2) \\ (1 - \beta_1)(1 - \beta_2) - z_{1i} z_{2i} \\ z_{1i}(1 - z_{2i}) - \beta_2(1 - \beta_1) \\ \beta_2 - z_{1i}(1 - z_{2i}) \end{pmatrix}$

Suppose we would like to perform the following test in Bugni, Canay, and Shi (BCS, 2014) for $k = 1, 2$:
$H_0: \beta_k = \gamma_0$ versus $H_1: \beta_k \ne \gamma_0$

For $P_n g(\cdot; \beta) \equiv \frac{1}{n}\sum_{i=1}^n g(z_i; \beta)$, the test statistic is
$n \inf_{\beta \in B_k(\gamma_0)} S\left(P_n g(\cdot; \beta)\right) = n \inf_{\beta \in B_k(\gamma_0)} \sum_{q=1}^Q \left(\left(P_n g_q(\cdot; \beta)\right)_-\right)^2$
where $B_k(\gamma_0) \equiv \{\beta \in B : \beta_k = \gamma_0\}$ is the set of all $\beta = (\beta_1, \beta_2)$ such that $\beta_k = \gamma_0$.
Application to Partially Identified Models

A level $\alpha$ test rejects when $n \inf_{\beta \in B_k(\gamma_0)} S(P_n g(z_i; \beta))$ is greater than the $(1-\alpha)$ percentile of a consistent estimate of the limiting distribution of
$n\left(\inf_{\beta \in B_k(\gamma_0)} S(\hat\theta_n(\beta)) - \inf_{\beta \in B_k(\gamma_0)} S(\theta_0(\beta))\right) = n\left(\varphi(\hat\theta_n) - \varphi(\theta_0)\right)$

Define $\theta_0(\beta) \equiv Pg(\cdot, \beta)$ and $\hat\theta_n(\beta) \equiv P_n g(\cdot, \beta)$. Define
$\varphi(\theta) \equiv \inf_{\beta \in B_k(\gamma_0)} S(\theta(\beta)) = (f \circ S)(\theta)$, where $S(\theta) = \sum_{q=1}^Q (\theta_q)_-^2$ and $f(S) = \inf_{\beta \in B_k(\gamma_0)} S(\beta)$.

Using the chain rule, we can show that $\varphi$ is twice Hadamard directionally differentiable.
Second Order Numerical Delta Method

Theorem. Let $\varphi(\cdot)$ be a twice Hadamard directionally differentiable function at $\theta_0$ and $r_n(\hat\theta_n - \theta_0) \leadsto \mathbb{G}_0$. Let $\epsilon_n \downarrow 0$, $r_n \epsilon_n \to \infty$, and $Z_n \leadsto \mathbb{G}_0$ conditional on the data. Then if $\varphi'_{\theta_0}(h) = 0$ for all $h$,
$\frac{1}{2}\hat\varphi''_n(Z_n) \equiv \frac{\varphi(\hat\theta_n + \epsilon_n Z_n) - \varphi(\hat\theta_n)}{\epsilon_n^2} \leadsto J \equiv \frac{1}{2}\varphi''_{\theta_0}(\mathbb{G}_0) \quad (3)$
conditional on the data.

For $P_n^* g(\cdot; \beta) \equiv \frac{1}{n}\sum_{i=1}^n g(z_i^*; \beta)$ and $Z_n = \sqrt{n}(P_n^* - P_n)g(\cdot; \beta)$,
$\frac{1}{2}\hat\varphi''_n(Z_n) = \frac{\inf_{\beta \in B_k(\gamma_0)} S\left(\left(P_n + \epsilon_n \sqrt{n}(P_n^* - P_n)\right)g(\cdot; \beta)\right) - \inf_{\beta \in B_k(\gamma_0)} S\left(P_n g(\cdot; \beta)\right)}{\epsilon_n^2}$
Second Order Numerical Delta Method

Alternatively, we can use the three-term finite difference
$\hat\varphi''_n(h) \equiv \frac{\varphi(\hat\theta_n + 2\epsilon_n h) - 2\varphi(\hat\theta_n + \epsilon_n h) + \varphi(\hat\theta_n)}{\epsilon_n^2}$

For our moment inequalities example,
$\hat\varphi''_n(Z_n) = \frac{1}{\epsilon_n^2}\Big(\inf_{\beta \in B_k(\gamma_0)} S\left(\left(P_n + 2\epsilon_n \sqrt{n}(P_n^* - P_n)\right)g(\cdot; \beta)\right)$
$\quad - 2\inf_{\beta \in B_k(\gamma_0)} S\left(\left(P_n + \epsilon_n \sqrt{n}(P_n^* - P_n)\right)g(\cdot; \beta)\right) + \inf_{\beta \in B_k(\gamma_0)} S\left(P_n g(\cdot; \beta)\right)\Big)$

Theorem. Under the same conditions as in the previous theorem, except without requiring $\varphi'_{\theta_0}(h) \equiv 0$, $\hat\varphi''_n(Z_n) \leadsto \varphi''_{\theta_0}(\mathbb{G}_0)$ conditional on the data.
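Both finite differences are easy to check on the criterion $S(\theta) = \sum_q (\theta_q)_-^2$ at the boundary point $\theta_0 = 0$, where the first-order directional derivative vanishes and a direct calculation (mine, not from the slides) gives $S''_{\theta_0}(h) = 2\sum_q (h_q)_-^2$, since $S(\theta_0 + t h) = t^2 \sum_q (h_q)_-^2$.

```python
import numpy as np

def S(theta):
    """S(theta) = sum of squared negative parts, the moment-inequality criterion."""
    return np.sum(np.minimum(theta, 0.0) ** 2)

theta0 = np.zeros(3)          # on the boundary: first directional derivative is 0
h = np.array([1.0, -2.0, -0.5])
eps = 1e-4

# Two-term finite difference, valid when the first-order derivative vanishes:
d2_two = 2 * (S(theta0 + eps * h) - S(theta0)) / eps ** 2

# Three-term central difference, valid without that restriction:
d2_three = (S(theta0 + 2 * eps * h) - 2 * S(theta0 + eps * h) + S(theta0)) / eps ** 2

analytic = 2 * np.sum(np.minimum(h, 0.0) ** 2)   # 2 * ((-2)^2 + (-0.5)^2) = 8.5
print(d2_two, d2_three, analytic)
```

Both differences agree with the analytic value here; they separate only when the first-order directional derivative is nonzero, which is exactly when the two-term version is biased.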
Moment Inequalities Simulation

A level 5% test rejects when $n \inf_{\beta \in B_k(\gamma_0)} S(P_n g(\cdot; \beta)) > \hat c_{95}$, where $\hat c_{95}$ is the 95th percentile of one of the following distributions:
1. Numerical second order derivative 1: two-term finite difference.
2. Numerical second order derivative 2: three-term finite difference.
3. Bugni, Canay, and Shi (2014) minimum resampling test.
4. Romano and Shaikh (2008) subsampling test using $b = n^{2/3}$.

$\beta_1 = 0.3$ and $\beta_2 = 0.5$ are the true values. We plot rejection frequencies when testing $H_0: \beta_1 = \gamma_0$ against $H_1: \beta_1 \ne \gamma_0$ for $\gamma_0 \in [0.1, 0.5]$, and when testing $H_0: \beta_2 = \gamma_0$ against $H_1: \beta_2 \ne \gamma_0$ for $\gamma_0 \in [0.3, 0.7]$.
[Figure: Rejection frequency as a function of $\beta_1$ over $[0.1, 0.5]$ (N=1000, B=1000), comparing Numerical Derivative 1, Numerical Derivative 2, BCS, and Subsampling.]
[Figure: Detail of the rejection frequency as a function of $\beta_1$, zoomed to $\beta_1 \in [0.36, 0.38]$.]
[Figure: Rejection frequency as a function of $\beta_2$ over $[0.3, 0.7]$ (N=1000, B=1000), same four tests.]
[Figure: Detail of the rejection frequency as a function of $\beta_2$, zoomed to $\beta_2 \in [0.568, 0.572]$.]
Tennessee STAR Experiment

For 79 schools between 1985 and 1988, the Tennessee government randomly assigned some students to classes with only 13-17 students and others to classes with 20-25 students. There was substantial attrition after the first year: many students either moved away from participating schools or had to repeat a grade, which meant that they no longer received treatment. We therefore run regressions on student-level variables for the year in which they entered the program, as in Chetty et al. (2011).

- $Y_i$ is the average of each student's math and reading percentile ranks, obtained using the transformation in Krueger (1999).
- $X_i$ are the student's gender, race, age, and free lunch status, the teacher's experience, position on the career ladder, and whether she has a higher degree, and whether the school is urban or rural.

We fail to reject the null that the QTEs at quantiles $\{0.05, 0.10, \ldots, 0.95\}$ are all nonnegative.
Empirical Application

Suppose we would like to form confidence intervals around the maximum and minimum QTEs: $\varphi_1(\theta) \equiv \max_{\tau \in T} \theta(\tau)$ and $\varphi_2(\theta) \equiv \min_{\tau \in T} \theta(\tau)$, with $T = \{0.05, 0.10, \ldots, 0.95\}$.

Numerical delta method: for $B$ iterations, draw with replacement a resample of size $n$ and reestimate the QTEs $\hat\theta_n^*$. Form the $B \times 19$ matrix $Z_n = \sqrt{n}(\hat\theta_n^* - \hat\theta_n)$. Compute percentiles of $\frac{\varphi(\hat\theta_n + \epsilon_n Z_n) - \varphi(\hat\theta_n)}{\epsilon_n}$.

Subsampling: for $B$ iterations, draw with replacement a resample of size $b \ll n$ and reestimate the QTEs $\hat\theta_b$. Compute percentiles of $\sqrt{b}\left(\varphi(\hat\theta_b) - \varphi(\hat\theta_n)\right)$.
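The two procedures can be sketched side by side on simulated data, with a vector of means standing in for the 19 QTEs; the data-generating process and tuning constants ($\epsilon_n = n^{-1/4}$, $b = n^{2/3}$) are illustrative assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stylized comparison: theta is a vector of K means and phi(theta) = min over
# components stands in for the minimum QTE.
n, B, K = 2000, 499, 19
theta0 = np.linspace(0.02, 0.06, K)        # all "effects" weakly positive
z = theta0 + rng.normal(size=(n, K))
theta_hat = z.mean(axis=0)
phi = lambda t: np.min(t)

eps_n = n ** (-1 / 4)
m = int(n ** (2 / 3))                      # subsample size b << n
nd_draws, sub_draws = np.empty(B), np.empty(B)
for b_i in range(B):
    idx = rng.integers(0, n, size=n)       # full-size bootstrap resample
    Z_b = np.sqrt(n) * (z[idx].mean(axis=0) - theta_hat)
    nd_draws[b_i] = (phi(theta_hat + eps_n * Z_b) - phi(theta_hat)) / eps_n
    sub = rng.integers(0, n, size=m)       # size-b resample, as on the slide
    sub_draws[b_i] = np.sqrt(m) * (phi(z[sub].mean(axis=0)) - phi(theta_hat))

cis = {}
for label, draws in [("numerical delta", nd_draws), ("subsampling", sub_draws)]:
    c_lo, c_hi = np.quantile(draws, [0.025, 0.975])
    cis[label] = (phi(theta_hat) - c_hi / np.sqrt(n), phi(theta_hat) - c_lo / np.sqrt(n))
    print(label, cis[label])
```

The numerical delta method uses all $n$ observations in every draw, so its percentile estimates are typically less noisy than the subsampling ones, mirroring the comparison reported in the tables below only in spirit, not in numbers.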
Table: 95% Numerical Delta Method Confidence Intervals for the Maximum and Minimum Quantile Treatment Effect

        phi(theta_hat)   SE      Equal-Tailed      Lower            Upper
Max     6.77%            0.73%   (4.07%, 6.97%)    (-inf, 6.77%)    (4.37%, inf)
Min     1.52%            0.75%   (1.35%, 4.26%)    (-inf, 3.99%)    (1.54%, inf)

Table: 95% Subsampling Confidence Intervals for the Maximum and Minimum Quantile Treatment Effect

        SE      Equal-Tailed      Lower            Upper
Max     0.83%   (4.12%, 7.42%)    (-inf, 7.15%)    (4.42%, inf)
Min     0.81%   (0.87%, 4.05%)    (-inf, 3.76%)    (1.10%, inf)
Conclusion

We demonstrated how to conduct inference on directionally differentiable functions of parameters using the numerical delta method:
- Pointwise valid inference for all directionally differentiable functions.
- Uniformly valid inference for convex and Lipschitz functions.
- Consistent estimation of the limiting distribution of test statistics in partially identified models.

We proposed a numerical bootstrap principle that can be used to conduct inference when the regular bootstrap fails:
- Pointwise valid inference for maximum score, LASSO, and 1-norm support vector machine regression.