The Numerical Delta Method and Bootstrap

Similar documents
Statistical Properties of Numerical Derivatives

Quantile Methods. Class Notes, Manuel Arellano, December 1. Let F(r) = Pr(Y ≤ r). For τ ∈ (0, 1), the τth population quantile of Y is defined to be

Inference for Identifiable Parameters in Partially Identified Econometric Models

Program Evaluation with High-Dimensional Data

Quantile Processes for Semi and Nonparametric Regression

A Resampling Method on Pivotal Estimating Functions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

Flexible Estimation of Treatment Effect Parameters

A Course in Applied Econometrics. Lecture 10. Partial Identification. Outline. 1. Introduction. 2. Example I: Missing Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Partial Identification and Inference in Binary Choice and Duration Panel Data Models

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011

Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf

Bayesian Indirect Inference and the ABC of GMM

Inference for identifiable parameters in partially identified econometric models

Inference for Subsets of Parameters in Partially Identified Models

Comparison of inferential methods in partially identified models in terms of error in coverage probability

Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms

STAT 461/561- Assignments, Year 2015

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

Partial Identification and Confidence Intervals

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012

Adaptive test of conditional moment inequalities

Constructing a Confidence Interval for the Fraction Who Benefit from Treatment, Using Randomized Trial Data

Lecture 9: Quantile Methods 2

Instrumental Variables Estimation and Weak-Identification-Robust. Inference Based on a Conditional Quantile Restriction

SOLUTIONS Problem Set 2: Static Entry Games

University of California San Diego and Stanford University and

INVALIDITY OF THE BOOTSTRAP AND THE M OUT OF N BOOTSTRAP FOR INTERVAL ENDPOINTS DEFINED BY MOMENT INEQUALITIES. Donald W. K. Andrews and Sukjin Han

MCMC CONFIDENCE SETS FOR IDENTIFIED SETS. Xiaohong Chen, Timothy M. Christensen, and Elie Tamer. May 2016 COWLES FOUNDATION DISCUSSION PAPER NO.

Optimal Plug-in Estimators of Directionally Differentiable Functionals

Inference on distributions and quantiles using a finite-sample Dirichlet process

arxiv: v1 [econ.em] 26 Sep 2017

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Asymmetric least squares estimation and testing

On the Uniform Asymptotic Validity of Subsampling and the Bootstrap

Approximate Bayesian computation for spatial extremes via open-faced sandwich adjustment

Problem Selected Scores

Monte Carlo Confidence Sets for Identified Sets

Ultra High Dimensional Variable Selection with Endogenous Variables

ON THE UNIFORM ASYMPTOTIC VALIDITY OF SUBSAMPLING AND THE BOOTSTRAP. Joseph P. Romano Azeem M. Shaikh

Quantile Regression for Extraordinarily Large Data

and Level Sets First version: August 16, This version: February 15, Abstract

Asymptotically Efficient Estimation of Models Defined by Convex Moment Inequalities

VALIDITY OF SUBSAMPLING AND PLUG-IN ASYMPTOTIC INFERENCE FOR PARAMETERS DEFINED BY MOMENT INEQUALITIES

Quantile Regression for Dynamic Panel Data

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Lecture 2: Consistency of M-estimators

Inference on Optimal Treatment Assignments

Testing against a linear regression model using ideas from shape-restricted estimation

Econometric Analysis of Games 1

arxiv: v3 [stat.me] 26 Sep 2017

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data

Multiscale Adaptive Inference on Conditional Moment Inequalities

The properties of L p -GMM estimators

TESTING REGRESSION MONOTONICITY IN ECONOMETRIC MODELS

What s New in Econometrics. Lecture 13

Statistical Inference

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

TESTING REGRESSION MONOTONICITY IN ECONOMETRIC MODELS

MCMC Confidence Sets for Identified Sets

Bootstrap Confidence Intervals

large number of i.i.d. observations from P. For concreteness, suppose

Statistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it

Asymptotic Distortions in Locally Misspecified Moment Inequality Models

Expecting the Unexpected: Uniform Quantile Regression Bands with an application to Investor Sentiments

NONPARAMETRIC AND PARTIALLY IDENTIFIED CUBE ROOT ASYMPTOTICS FOR MAXIMUM SCORE AND RELATED METHODS

ON THE CHOICE OF TEST STATISTIC FOR CONDITIONAL MOMENT INEQUALITES. Timothy B. Armstrong. October 2014 Revised July 2017

Inference on Breakdown Frontiers

A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters. By Tiemen Woutersen. Draft, September

Bayesian Regression Linear and Logistic Regression

Common Threshold in Quantile Regressions with an Application to Pricing for Reputation

Can we do statistical inference in a non-asymptotic way? 1

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Testing Many Moment Inequalities

Uniform Inference for Conditional Factor Models with Instrumental and Idiosyncratic Betas

Quantile Regression: Inference

PROGRAM EVALUATION WITH HIGH-DIMENSIONAL DATA

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

The Econometrics of Shape Restrictions

Inference For High Dimensional M-estimates. Fixed Design Results

LIKELIHOOD INFERENCE IN SOME FINITE MIXTURE MODELS. Xiaohong Chen, Maria Ponomareva and Elie Tamer MAY 2013

Inference for High Dimensional Robust Regression

Moment and IV Selection Approaches: A Comparative Simulation Study

What s New in Econometrics? Lecture 14 Quantile Methods

Inference on Estimators defined by Mathematical Programming

Reliable Inference in Conditions of Extreme Events. Adriana Cornea

SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES. Liping Zhu, Mian Huang, & Runze Li. The Pennsylvania State University

Sources of Inequality: Additive Decomposition of the Gini Coefficient.

Bayesian Sparse Linear Regression with Unknown Symmetric Error

CLUSTER-ROBUST BOOTSTRAP INFERENCE IN QUANTILE REGRESSION MODELS. Andreas Hagemann. July 3, 2015

Lecture 8 Inequality Testing and Moment Inequality Models

THE LIMIT OF FINITE-SAMPLE SIZE AND A PROBLEM WITH SUBSAMPLING. Donald W. K. Andrews and Patrik Guggenberger. March 2007

BAYESIAN INFERENCE IN A CLASS OF PARTIALLY IDENTIFIED MODELS

Statistics 135 Fall 2008 Final Exam

Bayesian and Frequentist Inference in Partially Identified Models

A NOTE ON MINIMAX TESTING AND CONFIDENCE INTERVALS IN MOMENT INEQUALITY MODELS. Timothy B. Armstrong. December 2014

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap

Learning discrete graphical models via generalized inverse covariance matrices

Transcription:

The Numerical Delta Method and Bootstrap Han Hong and Jessie Li Stanford University and UCSC 1 / 41

Motivation Recent developments in econometrics have given empirical researchers access to estimators that exhibit nonsmoothness in the population objective function. Hypothesis tests and counterfactual analyses are being performed on nondifferentiable functions of structural parameters. This paper combines numerical differentiation with resampling to offer an asymptotically valid and computationally attractive approach for conducting inference in a class of possibly nonsmooth problems. Initially motivated by Fang and Santos (2014) and Dumbgen (1993). 2 / 41

Motivating Example The Tennessee STAR experiment examined the effects of class size reduction on test scores through a randomized experiment. We are interested in testing whether treatment weakly benefits students at all percentiles of the test score distribution. Linear conditional quantile model:
Q_Y(τ | W, X) = α(τ) + θ(τ)W + X'β(τ)
θ(τ) is the Quantile Treatment Effect (QTE) at the τth quantile. Y is test scores. W is an indicator for assignment to a small class. X are student and teacher covariates.
H0: θ(τ) ≥ 0 for all τ ∈ T versus H1: θ(τ) < 0 for some τ ∈ T, where T = {0.05, 0.10, ..., 0.95}. 3 / 41

Motivating Example A test statistic is the negative of the minimum of the normalized QTEs:
S_n ≡ √n φ(θ̂_n) = −min_{τ ∈ T} √n θ̂_n(τ) / √(ÂsyVar(θ̂_n(τ)))
We need to estimate the test statistic's limiting distribution under H0 and use the percentiles of that distribution to form critical values. The minimum function makes the test statistic a nondifferentiable function of the QTEs, which invalidates the standard delta method and bootstrap. Subsampling is a viable alternative, but more information from the sample can be used. 4 / 41

Directionally Differentiable Function We are still able to obtain the test statistic's limiting distribution because the function is directionally differentiable. A directionally differentiable function has a derivative that depends on the direction from which the point is approached. For ease of visualization, look at min(β_1, β_2). 5 / 42

Outline 1 The Directional Delta Method 2 The Numerical Delta Method Pointwise Valid Confidence Intervals Uniformly Valid Inference 3 Second Order Directional Delta Method Application to Partially Identified Models 4 The Numerical Bootstrap General Principle Applications to Nonsmooth Optimization Problems Comparison with Subsampling 5 Empirical Application to Tennessee STAR Experiment 6 / 41

The Directional Delta Method The Directional Delta Method Fang and Santos (2014), Shapiro (1991), Dumbgen (1993) Consider a function(al) φ(·) which is Hadamard directionally differentiable. Goal: statistical inference for φ(θ_0) using the limiting distribution of r_n(φ(θ̂_n) − φ(θ_0)), where r_n(θ̂_n − θ_0) converges in distribution to G_0. Consider rewriting the statistic as a finite difference:
r_n(φ(θ̂_n) − φ(θ_0)) = [φ(θ_0 + (1/r_n) r_n(θ̂_n − θ_0)) − φ(θ_0)] / (1/r_n) ≈ φ'_{θ_0}(G_0)
As n → ∞, the stepsize 1/r_n → 0, and the finite difference converges to the Hadamard directional derivative evaluated at θ_0 with direction given by G_0. A consistent estimate of φ'_{θ_0}(G_0) can be used to conduct inference. 7 / 41

The Directional Delta Method Examples of Directionally Differentiable Functions Regression function in threshold regression (e.g. Hansen 2015): Reinhart and Rogoff (2010) argue that economic growth declines when government debt relative to GDP exceeds a threshold.
y_t = β_1 (x_t − γ)_− + β_2 (x_t − γ)_+ + β_3 z_t + e_t = φ_t(θ) + e_t
y_t is GDP growth, x_t is the debt to GDP percentage. We would like to form confidence bands around the regression function. φ_t(θ) is directionally but not fully differentiable at x_t = γ. Upper and lower bounds on the value distribution in an incomplete auction model (e.g. Haile and Tamer (2003)). Test statistics for subvector inference in moment inequality models (e.g. Bugni, Canay and Shi (2014)). Endpoints of the identified set for partially identified parameters (e.g. Lee and Bhattacharya (2016)). 8 / 41

The Directional Delta Method Literature Review Fang and Santos (2014) show that consistent estimates of the directional derivative can be used for inference. They derive the analytic expressions for the directional derivative and estimate its components on a case by case basis. We generalize Dumbgen's method to avoid analytic derivations; it achieves pointwise valid inference for all directionally differentiable functions and uniformly valid inference under convexity and Lipschitz continuity of the function(al). We allow for estimators that are not √n-consistent or asymptotically normal. Hirano and Porter (2012) and Song (2014) focus on estimation rather than inference. Woutersen and Ham (2016) propose a method for conducting inference on nonsmooth functions of parameters based on projections. 9 / 41

The Directional Delta Method Examples of Directional Derivatives For φ(θ) = aθ_+ + bθ_−, where θ_+ = max{θ, 0} and θ_− = min{θ, 0}:
φ'_θ(h) = a h 1(θ > 0) + b h 1(θ < 0) + (a h_+ + b h_−) 1(θ = 0)
For φ(θ) = max{θ_1, θ_2, ..., θ_K}:
φ'_θ(h) = max_{k ∈ I} h_k, where I = {k : θ_k = max{θ_1, θ_2, ..., θ_K}}
For φ(θ) = inf_{τ ∈ T} θ(τ):
φ'_θ(h) = inf_{τ ∈ T_0} h(τ), where T_0 = argmin_{τ ∈ T} θ(τ) 10 / 41
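
These closed-form derivatives can be sanity-checked numerically. A minimal sketch for the max case, with illustrative values of θ and h (both are assumptions, not from the slides):

```python
import numpy as np

theta = np.array([1.0, 1.0, 0.5])   # argmax set I = {0, 1}: the tied coordinates
h = np.array([0.3, -0.2, 5.0])      # large h[2] is irrelevant since theta[2] < max
phi = lambda t: float(np.max(t))

step = 1e-6                          # small one-sided step t > 0
fd = (phi(theta + step * h) - phi(theta)) / step
analytic = h[theta == theta.max()].max()   # max_{k in I} h_k = 0.3
```

Here the one-sided finite difference recovers max over the argmax set, not the full max of h, which is exactly the directional (rather than full) differentiability of the max functional.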

The Numerical Delta Method The Numerical Directional Delta Method Estimates the distribution of φ'_{θ_0}(G_0) using the distribution of
φ̂'_n(Z*_n) ≡ [φ(θ̂_n + ε_n Z*_n) − φ(θ̂_n)] / ε_n
where Z*_n →P_W G_0 (convergence in distribution conditional on the data), e.g.:
Asymptotic normal approximation: Z*_n ~ N(0, σ̂²_n).
Bootstrap: Z*_n = r_n(θ̂*_n − θ̂_n), where θ̂*_n are the bootstrapped estimates.
MCMC: Z*_n = r_n(θ̂*_n − θ̂_n), where θ̂*_n are draws from the posterior.
Theorem For φ(·) Hadamard directionally differentiable at θ_0, ε_n → 0, r_n ε_n → ∞, and Z*_n →P_W G_0,
φ̂'_n(Z*_n) →P_W φ'_{θ_0}(G_0). 11 / 41

The Numerical Delta Method Numerical Delta Method Algorithm Pointwise Valid Confidence Intervals Suppose we take the bootstrap approach: For B iterations, draw with replacement a resample of size n and reestimate the parameters θ̂*_n. Form the B × dim(θ) matrix Z*_n = r_n(θ̂*_n − θ̂_n). Form the B × 1 vector φ̂'_n(Z*_n) = [φ(θ̂_n + ε_n Z*_n) − φ(θ̂_n)] / ε_n. A 1 − α two-sided equal-tailed confidence interval for φ(θ_0) can be formed by
[φ(θ̂_n) − (1/r_n) c_{1−α/2}, φ(θ̂_n) − (1/r_n) c_{α/2}]
where c_{1−α/2} and c_{α/2} are the (1−α/2)th and (α/2)th percentiles of the empirical distribution of φ̂'_n(Z*_n). A 1 − α symmetric confidence interval can be formed by
[φ(θ̂_n) − (1/r_n) d_{1−α}, φ(θ̂_n) + (1/r_n) d_{1−α}]
where d_{1−α} is the (1−α)th percentile of the empirical distribution of |φ̂'_n(Z*_n)|. 12 / 41
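
An end-to-end sketch of this algorithm for φ(θ) = min_k θ_k, with a hypothetical DGP (three sample means) standing in for the QTE estimates; the DGP, B, and the step size ε_n = n^(−1/4) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B, K = 500, 200, 3
X = rng.normal(loc=[0.5, 0.2, 0.4], scale=1.0, size=(n, K))  # theta_0 = column means

phi = lambda t: float(t.min())
theta_hat = X.mean(axis=0)
r_n = np.sqrt(n)
eps_n = n ** (-0.25)                  # eps_n -> 0 and r_n * eps_n -> infinity

# Bootstrap Z_n* = r_n (theta_hat* - theta_hat), then the numerical derivative
phi_deriv = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)       # resample of size n with replacement
    Z = r_n * (X[idx].mean(axis=0) - theta_hat)
    phi_deriv[b] = (phi(theta_hat + eps_n * Z) - phi(theta_hat)) / eps_n

# Equal-tailed 95% CI: [phi_hat - c_{.975}/r_n, phi_hat - c_{.025}/r_n]
c_lo, c_hi = np.quantile(phi_deriv, [0.025, 0.975])
ci = (phi(theta_hat) - c_hi / r_n, phi(theta_hat) - c_lo / r_n)
```

The same loop works for any directionally differentiable φ; only the `phi` lambda changes.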

The Numerical Delta Method Pointwise Valid Confidence Intervals Choice of ε_n for First Order Numerical Delta Method The optimal ε_n should minimize the error between φ̂'_n(Z*_n) and φ'_{θ_0}(G_0). Suppose φ'_{θ_0}(·) is Lipschitz. When the second order directional derivative is nonzero: If φ'_{θ_0}(·) is not linear, ε_n = O(r_n^{−1/2}) leads to an error of O_P(r_n^{−1/2}). If φ'_{θ_0}(·) is linear, ε_n = O(1/r_n) leads to an error of O_P(1/r_n). When the second order directional derivative is zero, e.g. φ(θ) = aθ_+ + bθ_−, φ(θ) = min{θ_1, ..., θ_k}, ε_n should converge to zero very slowly to get an error of O_P(1/r_n). 13 / 41

Uniform Size Control The Numerical Delta Method Uniformly Valid Inference Consider hypothesis testing of the following form: H0: φ(θ_0) ≤ 0 against H1: φ(θ_0) > 0, using the test statistic r_n φ(θ̂_n). Examples of such tests include:
Dominance test: H0: θ(τ) ≥ 0 for all τ ∈ T versus H1: θ(τ) < 0 for some τ ∈ T.
r_n φ(θ̂_n) = −min_{τ ∈ T} √n θ̂_n(τ) / √(ÂsyVar(θ̂_n(τ)))
Moment inequalities test: H0: θ_{0k} = E[X_k] ≥ 0 for all k = 1...K versus H1: θ_{0k} < 0 for some k = 1...K.
r_n φ(θ̂_n) = (Σ_{k=1}^K ((√n X̄_k)_−)²)^{1/2} 14 / 41
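
The moment-inequality statistic can be sketched on simulated data (the DGP is an illustrative assumption; under H0 all population means are nonnegative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 500, 3
X = rng.normal(loc=[0.0, 0.1, 0.2], scale=1.0, size=(n, K))  # all E[X_k] >= 0

xbar = X.mean(axis=0)
# S_n = ( sum_k ((sqrt(n) * xbar_k)_-)^2 )^{1/2}, with (a)_- = min(a, 0)
S_n = float(np.sqrt(np.sum(np.minimum(np.sqrt(n) * xbar, 0.0) ** 2)))
```

Only coordinates with negative sample means contribute, which is what makes the statistic a nondifferentiable (but convex) function of the means.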

The Numerical Delta Method Uniformly Valid Inference Uniform Size Control & Uniformly Valid Confidence Intervals Reject H0 whenever r_n φ(θ̂_n) ≥ ĉ_{1−α}, where ĉ_{1−α} is the 1 − α quantile of φ̂'_n(Z*_n). Whenever φ(θ) is convex and Lipschitz in θ, the size of this test will be less than or equal to α uniformly over a class of data generating distributions. If φ(·) is convex and Lipschitz, the upper one-sided confidence interval [φ(θ̂_n) − ĉ_{1−α}/r_n, ∞) has coverage greater than or equal to 1 − α uniformly over a class of data generating distributions. If φ(·) is concave and Lipschitz, then the lower one-sided confidence interval (−∞, φ(θ̂_n) − ĉ_α/r_n] will have uniformly valid coverage asymptotically. 15 / 41

Outline The Numerical Bootstrap General Principle 1 The Directional Delta Method 2 The Numerical Delta Method Pointwise Valid Confidence Intervals Uniformly Valid Inference 3 Second Order Directional Delta Method Application to Partially Identified Models 4 The Numerical Bootstrap General Principle Applications to Nonsmooth Optimization Problems Comparison with Subsampling 5 Empirical Application to Tennessee STAR Experiment 24 / 41

The Numerical Bootstrap General Principle A generalized numerical bootstrap method Inference on parameters which can be written as functions of the data generating distribution. Write θ_0 = θ(P) and θ̂_n = θ(P_n), where P is the data generating distribution and P_n is the empirical distribution. The goal is to consistently estimate the limiting distribution of
a(n)(θ(P_n) − θ(P)) = n^γ (θ̂_n − θ_0): a(n)(θ(P + (1/√n) √n(P_n − P)) − θ(P)) → J.
For ε_n → 0 and √n ε_n → ∞, the numerical bootstrap replaces P with P_n, 1/√n with ε_n, and √n(P_n − P) with √n(P*_n − P_n), where P*_n is the bootstrapped empirical distribution:
a(1/ε_n²)(θ(P_n + ε_n √n(P*_n − P_n)) − θ(P_n)) = ε_n^{−2γ}(θ̂*_n − θ̂_n) ≡ Z*_n. 25 / 41
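
A compact illustration of this substitution rule, assuming θ(P) = |E_P X| at the kink μ = 0, a textbook case where the standard bootstrap is inconsistent; the DGP, B, and ε_n are illustrative assumptions. The draws below approximate the |N(0, σ²)| limit of √n(|X̄_n| − 0):

```python
import numpy as np

rng = np.random.default_rng(2)
n, B = 1000, 500
x = rng.normal(0.0, 1.0, n)     # mu = 0: theta(P) = |E_P X| is only directionally differentiable
xbar = x.mean()
eps_n = n ** (-0.25)            # eps_n -> 0 and sqrt(n) * eps_n -> infinity

draws = np.empty(B)
for b in range(B):
    xbar_star = x[rng.integers(0, n, n)].mean()
    # theta(P_n + eps_n * sqrt(n) * (P_n* - P_n)) - theta(P_n), scaled by 1/eps_n
    draws[b] = (abs(xbar + eps_n * np.sqrt(n) * (xbar_star - xbar)) - abs(xbar)) / eps_n
```

Because ε_n shrinks more slowly than 1/√n, the perturbation ε_n √n (X̄* − X̄) dominates X̄ itself, so the kink at zero is "seen" by the resampling scheme.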

The Numerical Bootstrap Comparison with Subsampling Comparison of Numerical Bootstrap with Subsampling Subsampling approximates the limiting distribution of
a(n)(θ(P + (1/√n) √n(P_n − P)) − θ(P))
using the distribution of
a(b)(θ(P_n + (1/√b) √b(P*_b − P_n)) − θ(P_n)).
The numerical bootstrap
a(1/ε_n²)(θ(P_n + ε_n √n(P*_n − P_n)) − θ(P_n))
estimates √n(P_n − P) using the entire sample of size n, which is more precise than using a subsample of size b << n. 36 / 41

The Numerical Bootstrap Applications to Nonsmooth Optimization Problems Maximum Score Estimator of Manski (1975) Model: y_i = 1(x_i'θ + ν_i ≥ 0), where P(ν_i < 0 | x_i = x) = 0.5 for all x. Maximize the number of correct predictions:
θ̂_n = argmax_{θ ∈ Θ} (1/n) Σ_{i=1}^n [2·1(y_i = 1) − 1] 1(x_i'θ ≥ 0)
Kim and Pollard (1990) show that the maximum score estimator converges to a nonstandard limiting distribution at the cube root rate. Abrevaya and Huang (2005) show that one cannot use the bootstrap to estimate that limiting distribution consistently. Subsampling is a viable alternative. Alternative bootstrap methods: Seijo and Sen (2011): based on smoothing. Most recently Cattaneo, Jansson and Nagasawa (2017): combine with numerical Hessian estimation. Horowitz (1992), Hong, Mahajan and Nekipelov (2016): smooth the objective function. 26 / 41

The Numerical Bootstrap M-estimator consistency Applications to Nonsmooth Optimization Problems
θ̂_n ≡ argmax_{θ ∈ Θ} P_n π(·, θ) = (1/n) Σ_{i=1}^n π(z_i, θ).
We approximate the limiting distribution of n^γ (θ̂_n − θ_0) using the finite sample distribution of ε_n^{−2γ}(θ̂*_n − θ̂_n), where θ̂*_n ≡ argmax_{θ ∈ Θ} Z*_n π(·, θ), and Z*_n = P_n + ε_n Ĝ*_n, with Ĝ*_n = √n(P*_n − P_n), is a linear combination of the empirical distribution and the bootstrapped empirical process. For example, when Ĝ*_n is the multinomial bootstrap, for each bootstrap sample z*_i, i = 1, ..., n,
θ̂*_n = argmax_{θ ∈ Θ} (1/n) Σ_{i=1}^n π(z_i, θ) + ε_n √n [(1/n) Σ_{i=1}^n π(z*_i, θ) − (1/n) Σ_{i=1}^n π(z_i, θ)]. 27 / 41

The Numerical Bootstrap M-estimator consistency Applications to Nonsmooth Optimization Problems On the other hand, when Ĝ*_n is the wild bootstrap,
θ̂*_n = argmax_{θ ∈ Θ} (1/n) Σ_{i=1}^n π(z_i, θ) + (ε_n/√n) Σ_{i=1}^n (ξ_i − ξ̄) π(z_i, θ).
For Z_0(h) a mean zero Gaussian process with covariance kernel Σ_ρ and nondegenerate increments,
n^γ (θ̂_n − θ_0) → J ≡ argmax_h Z_0(h) − (1/2) h'Hh
and Z*_n ≡ ε_n^{−2γ}(θ̂*_n − θ̂_n) →P_W J. 28 / 41

The Numerical Bootstrap Applications to Nonsmooth Optimization Problems Maximum Score Estimator of Manski (1975) For each bootstrap draw {y*_i, x*_i}_{i=1}^n, compute the numerical bootstrap estimate:
θ̂*_n = argmax_{θ ∈ Θ} (1/n) Σ_{i=1}^n [2·1(y_i = 1) − 1] 1(x_i'θ ≥ 0) + (ε_n/√n) Σ_{i=1}^n { [2·1(y*_i = 1) − 1] 1(x*_i'θ ≥ 0) − [2·1(y_i = 1) − 1] 1(x_i'θ ≥ 0) }
Use the simulated distribution (conditional on the data) of ε_n^{−2/3}(θ̂*_n − θ̂_n) to approximate the limit distribution of n^{1/3}(θ̂_n − θ_0). 29 / 41
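
A sketch of this procedure with a scalar slope (the coefficient on x_1 normalized to 1), grid search over θ, and a multinomial bootstrap encoded by resample counts; the DGP, grid, B, and ε_n are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 400, 100
theta0 = 0.5
x1, x2 = rng.normal(size=n), rng.normal(size=n)
nu = rng.logistic(size=n)                       # Med(nu | x) = 0
y = (x1 + theta0 * x2 + nu >= 0).astype(float)
s = 2.0 * y - 1.0                               # 2 * 1(y_i = 1) - 1

grid = np.linspace(-2.0, 2.0, 401)
ind = (x1[None, :] + grid[:, None] * x2[None, :] >= 0).astype(float)  # grid x obs

theta_hat = grid[np.argmax(ind @ (s / n))]      # maximum score estimate
eps_n = n ** (-1.0 / 6.0)                       # eps_n -> 0 and sqrt(n) * eps_n -> infinity

draws = np.empty(B)
for b in range(B):
    c = np.bincount(rng.integers(0, n, n), minlength=n)  # multinomial resample counts
    # per-observation weight of P_n pi + eps_n * sqrt(n) * (P_n* - P_n) pi
    w = (s / n) * (1.0 + eps_n * np.sqrt(n) * (c - 1.0))
    theta_star = grid[np.argmax(ind @ w)]
    draws[b] = (theta_star - theta_hat) / eps_n ** (2.0 / 3.0)
```

The `draws` (conditional on the data) stand in for the nonstandard cube-root limit of n^(1/3)(θ̂_n − θ_0), and their percentiles feed the confidence intervals on the next slide.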

The Numerical Bootstrap Applications to Nonsmooth Optimization Problems Maximum Score Estimator of Manski (1975) A 1 − α two-sided equal-tailed confidence interval for θ_0 can be formed by
[θ̂_n − (1/n^{1/3}) c_{1−α/2}, θ̂_n − (1/n^{1/3}) c_{α/2}]
where c_{1−α/2} and c_{α/2} are the (1−α/2)th and (α/2)th percentiles of the distribution of ε_n^{−2/3}(θ̂*_n − θ̂_n). A 1 − α symmetric confidence interval can be formed by
[θ̂_n − (1/n^{1/3}) d_{1−α}, θ̂_n + (1/n^{1/3}) d_{1−α}]
where d_{1−α} is the (1−α)th percentile of the distribution of |ε_n^{−2/3}(θ̂*_n − θ̂_n)|. 30 / 41

The Numerical Bootstrap Constrained M-estimation Applications to Nonsmooth Optimization Problems Replace Θ with a constraint set C, such that for θ̂_n ∈ C,
P_n π(·, θ̂_n) ≤ inf_{θ ∈ C} P_n π(·, θ) + o_P(n^{−2γ}),   (4)
and for θ̂*_n ∈ C,
Z*_n π(·, θ̂*_n) ≤ inf_{θ ∈ C} Z*_n π(·, θ) + o_P(ε_n^{4γ}).
Let T_C(θ_0) be a cone such that as α → ∞, α(C − θ_0) → T_C(θ_0). Then for
J ≡ argmin_{h ∈ T_C(θ_0)} Z_0(h) + (1/2) h'Hh,
we have n^γ (θ̂_n − θ_0) → J and ε_n^{−2γ}(θ̂*_n − θ̂_n) →P_W J. Can also estimate T_C(θ_0) directly by (C − θ̂_n)/ε_n in some situations. 31 / 41

The Numerical Bootstrap Sample size dependent statistics Applications to Nonsmooth Optimization Problems For Ĝ_n = √n(P_n − P),
θ̂_n = θ(P_n, 1/n) = θ(P + (1/√n) Ĝ_n, 1/n).
Suppose Ĵ_n ≡ a(n)(θ(P_n, 1/n) − θ_0) → J. Define
θ̂*_n = θ(Z*_n, ε_n²) = θ(P_n + ε_n Ĝ*_n, ε_n²).
Then Ĵ*_n ≡ a(1/ε_n²)(θ̂*_n − θ̂_n) →P_W J. Examples include Laplace estimators (e.g. Jun, Pinkse, Wan) and LASSO. 32 / 41

The Numerical Bootstrap Applications to Nonsmooth Optimization Problems Application to LASSO, Finite Dimensional Case LASSO's asymptotic distribution cannot be consistently estimated by the conventional bootstrap when some of the coefficients are zero.
β̂_n = argmin_β (1/n) Σ_{i=1}^n (y_i − x_i'β)² + (λ_n/√n) Σ_{k=1}^p |β_k|.   (5)
The numerical bootstrap consistently estimates the asymptotic distribution of √n(β̂_n − β_0) using the distribution of ε_n^{−1}(β̂*_n − β̂_n), where
β̂*_n = argmin_β Z*_n(y − x'β)² + λ_n ε_n Σ_{k=1}^p |β_k|,   (6)
Z*_n(y − x'β)² = (1/n) Σ_{i=1}^n (y_i − x_i'β)² + ε_n √n [(1/n) Σ_{i=1}^n (y*_i − x*_i'β)² − (1/n) Σ_{i=1}^n (y_i − x_i'β)²].   (7) 33 / 41
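
In one dimension both the LASSO and its numerical-bootstrap analogue have closed-form soft-thresholding solutions, which gives a compact sketch; the DGP with β_0 = 0 (the case where the ordinary bootstrap fails), λ_n, B, and ε_n are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, B = 2000, 500
beta0 = 0.0                                 # zero coefficient
x = rng.normal(size=n)
y = x * beta0 + rng.normal(size=n)

lam = 1.0                                   # lambda_n, fixed for illustration
eps_n = n ** (-1.0 / 3.0)                   # eps_n -> 0 and sqrt(n) * eps_n -> infinity

def soft(z, t):
    # soft threshold: sign(z) * max(|z| - t, 0)
    return np.sign(z) * max(abs(z) - t, 0.0)

Sxx, Sxy = (x * x).mean(), (x * y).mean()
# argmin of Sxx*b^2 - 2*Sxy*b + (lam/sqrt(n))*|b| is a soft threshold
beta_hat = soft(Sxy, lam / (2.0 * np.sqrt(n))) / Sxx

draws = np.empty(B)
for b in range(B):
    i = rng.integers(0, n, n)
    # bootstrap-perturbed quadratic coefficients, as in the Z_n* criterion
    A = Sxx + eps_n * np.sqrt(n) * ((x[i] ** 2).mean() - Sxx)
    C = Sxy + eps_n * np.sqrt(n) * ((x[i] * y[i]).mean() - Sxy)
    beta_star = soft(C, lam * eps_n / 2.0) / A   # A > 0 for these parameter values
    draws[b] = (beta_star - beta_hat) / eps_n
```

With more regressors the same perturbed criterion would be handed to a numerical LASSO solver; the closed form is only available coordinate-wise.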

The Numerical Bootstrap Applications to Nonsmooth Optimization Problems Application to 1-Norm SVM, Finite Dimensional Case For κ > 0, λ_n > 0, the 1-norm SVM estimator is
β̂_n = argmin_β (1/n) Σ_{i=1}^n ρ_τ(y_i − (x_i'β + κ)_+) + (λ_n/√n) Σ_{j=1}^k |β_j|,
where ρ_τ(u) is the Koenker and Bassett (1978) check function. If τ = 1/2 and κ = 0, then β̂ is the LASSO quantile regression estimator of Belloni et al (2011). The numerical bootstrap consistently estimates the asymptotic distribution of √n(β̂_n − β_0) using the distribution of ε_n^{−1}(β̂*_n − β̂_n):
β̂*_n = argmin_β (1/n) Σ_{i=1}^n ρ_τ(y_i − (x_i'β + κ)_+) + (ε_n/√n) Σ_{i=1}^n [ρ_τ(y*_i − (x*_i'β + κ)_+) − ρ_τ(y_i − (x_i'β + κ)_+)] + λ_n ε_n Σ_{j=1}^k |β_j|.   (8) 34 / 41

The Numerical Bootstrap Applications to Nonsmooth Optimization Problems Recentering Test H0: θ(P) = θ_0 vs H1: θ(P) > θ_0. Estimate the distribution of a(n)(θ̂_n − θ_0) by either (1) the noncentered numerical bootstrap distribution a(1/ε_n²)(θ̂*_n − θ_0), or (2) the centered numerical bootstrap distribution a(1/ε_n²)(θ̂*_n − θ̂_n). Similar to subsampling (Chernozhukov et al.). Can also estimate unknown polynomial rates of convergence. 35 / 41

Outline Second Order Directional Delta Method 1 The Directional Delta Method 2 The Numerical Delta Method Pointwise Valid Confidence Intervals Uniformly Valid Inference 3 Second Order Directional Delta Method Application to Partially Identified Models 4 The Numerical Bootstrap General Principle Applications to Nonsmooth Optimization Problems Comparison with Subsampling 5 Empirical Application to Tennessee STAR Experiment 16 / 41

Second Order Directional Delta Method Second Order Directional Delta Method The first order directional delta method may produce a degenerate limiting distribution. e.g. Test statistics commonly used for moment inequality models have zero first order directional derivative under the null.
Theorem Let φ(·) be a twice Hadamard directionally differentiable function at θ_0 and r_n(θ̂_n − θ_0) → G_0. If φ'_{θ_0}(h) = 0 for all h, then
r_n² (φ(θ̂_n) − φ(θ_0)) → J ≡ (1/2) φ''_{θ_0}(G_0).   (2) 17 / 41

Second Order Directional Delta Method Application to Partially Identified Models Application to Partially Identified Models Simplified 2x2 entry game in Bresnahan and Reiss (1991): firm j ∈ {1, 2} decides whether to enter a market i ∈ {1, ..., n}. Action: z_{j,i} = 1 if firm j enters market i. Benefit of entry: η_{j,i} ~ U(0, 1). Profit function: π_{j,i} = (η_{j,i} − β_j z_{−j,i}) 1{z_{j,i} = 1}, where β ∈ (0, 1)². Firms play pure strategy Nash equilibria:
1. (z_{1i}, z_{2i}) = (1, 1) if η_{j,i} > β_j for all j.
2. (z_{1i}, z_{2i}) = (1, 0) if η_{1,i} > β_1 and η_{2,i} < β_2.
3. (z_{1i}, z_{2i}) = (0, 1) if η_{1,i} < β_1 and η_{2,i} > β_2.
4. (z_{1i}, z_{2i}) ∈ {(1, 0), (0, 1)} if η_{j,i} < β_j for all j.
The model implies P(z_{1i} = 1, z_{2i} = 1) = (1 − β_1)(1 − β_2) and β_2(1 − β_1) ≤ P(z_{1i} = 1, z_{2i} = 0) ≤ β_2. 18 / 41

Second Order Directional Delta Method Application to Partially Identified Models Application to Partially Identified Models The model leads to Q = 4 moment inequalities Pg(·; β) ≡ E g(z_i; β) ≥ 0:
Pg(·; β) = E[ z_{1i} z_{2i} − (1 − β_1)(1 − β_2);  (1 − β_1)(1 − β_2) − z_{1i} z_{2i};  z_{1i}(1 − z_{2i}) − β_2(1 − β_1);  β_2 − z_{1i}(1 − z_{2i}) ] ≥ 0.
Suppose we would like to perform the following test in Bugni, Canay, and Shi (BCS) (2014) for k = 1, 2: H0: β_k = γ_0 vs H1: β_k ≠ γ_0. For P_n g(·; β) ≡ (1/n) Σ_{i=1}^n g(z_i; β), the test statistic is
n inf_{β ∈ B_k(γ_0)} S(P_n g(·; β)) = n inf_{β ∈ B_k(γ_0)} Σ_{q=1}^Q ((P_n g_q(·; β))_−)²,
where B_k(γ_0) ≡ {β ∈ B : β_k = γ_0} is the set of all β = (β_1, β_2) such that β_k = γ_0. 19 / 41

Second Order Directional Delta Method Application to Partially Identified Models Application to Partially Identified Models A level α test rejects when n inf_{β ∈ B_k(γ_0)} S(P_n g(·; β)) is greater than the (1 − α) percentile of a consistent estimate of the limiting distribution of
n ( inf_{β ∈ B_k(γ_0)} S(θ̂_n(β)) − inf_{β ∈ B_k(γ_0)} S(θ_0(β)) ) = n (φ(θ̂_n) − φ(θ_0)).
Define θ_0(β) = Pg(·; β) and θ̂_n(β) = P_n g(·; β). Define φ(θ) ≡ inf_{β ∈ B_k(γ_0)} S(θ(β)) = (f ∘ S)(θ), where S(θ) = Σ_{q=1}^Q (θ_q)_−² and f(S) = inf_{β ∈ B_k(γ_0)} S(β). Using the chain rule, we can show that φ is twice Hadamard directionally differentiable. 20 / 41

Second Order Directional Delta Method Second Order Numerical Delta Method Application to Partially Identified Models
Theorem Let φ(·) be a twice Hadamard directionally differentiable function at θ_0 and r_n(θ̂_n − θ_0) → G_0. Let ε_n → 0, r_n ε_n → ∞, and Z*_n →P_W G_0. Then if φ'_{θ_0}(h) = 0 for all h,
(1/2) φ̂''_n(Z*_n) ≡ [φ(θ̂_n + ε_n Z*_n) − φ(θ̂_n)] / ε_n² →P_W J ≡ (1/2) φ''_{θ_0}(G_0).   (3)
For P*_n g(·; β) ≡ (1/n) Σ_{i=1}^n g(z*_i; β) and Z*_n = √n(P*_n − P_n) g(·; β),
(1/2) φ̂''_n(Z*_n) = [ inf_{β ∈ B_k(γ_0)} S((P_n + ε_n √n(P*_n − P_n)) g(·; β)) − inf_{β ∈ B_k(γ_0)} S(P_n g(·; β)) ] / ε_n². 21 / 41

Second Order Directional Delta Method Second Order Numerical Delta Method Application to Partially Identified Models Alternatively, we can use
φ̂''_n(h) ≡ [φ(θ̂_n + 2ε_n h) − 2φ(θ̂_n + ε_n h) + φ(θ̂_n)] / ε_n².
For our moment inequalities example,
φ̂''_n(Z*_n) = (1/ε_n²) [ inf_{β ∈ B_k(γ_0)} S((P_n + 2ε_n √n(P*_n − P_n)) g(·; β)) − 2 inf_{β ∈ B_k(γ_0)} S((P_n + ε_n √n(P*_n − P_n)) g(·; β)) + inf_{β ∈ B_k(γ_0)} S(P_n g(·; β)) ].
Theorem Under the same conditions as in the previous theorem, except without φ'_{θ_0}(h) ≡ 0, φ̂''_n(Z*_n) →P_W φ''_{θ_0}(G_0). 22 / 41
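
The three-term formula can be checked exactly on a toy moment-inequality functional φ(θ) = Σ_q min(θ_q, 0)², whose second directional derivative at θ_0 is 2 Σ_{q: θ_{0q}=0} min(h_q, 0)²; the values of θ_0, h, and ε below are illustrative:

```python
import numpy as np

theta0 = np.array([0.0, 0.0, 0.3])          # two binding (zero) moments, one slack
phi = lambda t: float(np.sum(np.minimum(t, 0.0) ** 2))
h = np.array([-1.2, 0.7, -0.4])             # a fixed direction standing in for Z_n*
eps = 1e-3

num = (phi(theta0 + 2 * eps * h) - 2 * phi(theta0 + eps * h) + phi(theta0)) / eps ** 2
exact = 2.0 * np.sum(np.minimum(h[theta0 == 0.0], 0.0) ** 2)   # = 2 * (-1.2)^2 = 2.88
```

Because φ is exactly quadratic along each direction near θ_0, the central second difference matches the second directional derivative up to floating-point rounding, with no bias term in ε.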

Second Order Directional Delta Method Moment Inequalities Simulation Application to Partially Identified Models A level 5% test rejects when n inf_{β ∈ B_k(γ_0)} S(P_n g(·; β)) > ĉ_95, where ĉ_95 is the 95th percentile of one of the following distributions:
1. Numerical Second Order Derivative 1: two-term finite difference
2. Numerical Second Order Derivative 2: three-term finite difference
3. Bugni, Canay, and Shi (2014) minimum resampling test
4. Romano and Shaikh (2008) subsampling test using b = n^{2/3}
β_1 = 0.3 and β_2 = 0.5 are the true values. Plot rejection frequencies when testing H0: β_1 = γ_0 against H1: β_1 ≠ γ_0 for γ_0 ∈ [0.1, 0.5]. Plot rejection frequencies when testing H0: β_2 = γ_0 against H1: β_2 ≠ γ_0 for γ_0 ∈ [0.3, 0.7]. 23 / 41

Second Order Directional Delta Method Application to Partially Identified Models Figure: Rejection frequency as a function of β_1, γ_0 ∈ [0.1, 0.5] (N=1000, B=1000; curves for Numerical Derivative 1, Numerical Derivative 2, BCS, and Subsampling). 25 / 42

Second Order Directional Delta Method Application to Partially Identified Models Figure: Rejection frequency as a function of β_1, detail for γ_0 ∈ [0.36, 0.38] (N=1000, B=1000; curves for Numerical Derivative 1, Numerical Derivative 2, BCS, and Subsampling). 26 / 42

Second Order Directional Delta Method Application to Partially Identified Models Figure: Rejection frequency as a function of β_2, γ_0 ∈ [0.3, 0.7] (N=1000, B=1000; curves for Numerical Derivative 1, Numerical Derivative 2, BCS, and Subsampling). 27 / 42

Second Order Directional Delta Method Application to Partially Identified Models Figure: Rejection frequency as a function of β_2, detail for γ_0 ∈ [0.568, 0.572] (N=1000, B=1000; curves for Numerical Derivative 1, Numerical Derivative 2, BCS, and Subsampling). 28 / 42

Empirical Application to Tennessee STAR Experiment Outline 1 The Directional Delta Method 2 The Numerical Delta Method Pointwise Valid Confidence Intervals Uniformly Valid Inference 3 Second Order Directional Delta Method Application to Partially Identified Models 4 The Numerical Bootstrap General Principle Applications to Nonsmooth Optimization Problems Comparison with Subsampling 5 Empirical Application to Tennessee STAR Experiment 37 / 41

Empirical Application to Tennessee STAR Experiment Tennessee STAR Experiment For 79 schools between 1985 and 1988, the Tennessee government randomly assigned some students to classes with only 13-17 students and others to classes with 20-25 students. There was substantial attrition after the first year: many students either moved away from participating schools or had to repeat a grade, which meant that they no longer received treatment. We run regressions on student-level variables for the year in which they entered the program (Chetty et al 2011). Y_i is the average of each student's math and reading percentile ranks obtained using the transformation in Krueger (1999). X_i are the student's gender, race, age, free lunch status, the teacher's experience, her position on the career ladder, whether she has a higher degree, and whether the school is urban or rural. We fail to reject the null that the QTEs at quantiles {0.05, 0.10, ..., 0.95} are all nonnegative. 38 / 41

Empirical Application to Tennessee STAR Experiment Empirical Application Suppose we would like to form confidence intervals around the maximum and minimum QTEs: φ_1(θ) ≡ max_{τ ∈ T} θ(τ) and φ_2(θ) ≡ min_{τ ∈ T} θ(τ), with T = {0.05, 0.10, ..., 0.95}. Numerical delta method: For B iterations, draw with replacement a resample of size n and reestimate the QTE θ̂*_n. Form the B × 19 matrix Z*_n = √n(θ̂*_n − θ̂_n). Compute percentiles of [φ(θ̂_n + ε_n Z*_n) − φ(θ̂_n)] / ε_n. Subsampling: For B iterations, draw with replacement a resample of size b << n and reestimate the QTE θ̂_b. Compute percentiles of √b(φ(θ̂_b) − φ(θ̂_n)). 39 / 41
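
The two procedures can be run side by side on a toy min-of-means problem standing in for the minimum QTE; the DGP, b, B, and ε_n are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, b, B, K = 2000, 200, 300, 4
X = rng.normal(loc=[0.3, 0.3, 0.5, 0.8], scale=1.0, size=(n, K))  # stand-in for QTEs
phi = lambda t: float(t.min())
theta_hat = X.mean(axis=0)
eps_n = n ** (-0.25)

nd, sub = np.empty(B), np.empty(B)
for j in range(B):
    # numerical delta method: full-sample bootstrap plus finite difference
    Z = np.sqrt(n) * (X[rng.integers(0, n, n)].mean(axis=0) - theta_hat)
    nd[j] = (phi(theta_hat + eps_n * Z) - phi(theta_hat)) / eps_n
    # subsampling: size-b resample, centered at the full-sample estimate
    theta_b = X[rng.integers(0, n, b)].mean(axis=0)
    sub[j] = np.sqrt(b) * (phi(theta_b) - phi(theta_hat))
```

Percentiles of either `nd` or `sub` can be plugged into the confidence interval formulas above; the numerical delta draws use all n observations to estimate the limiting process, while subsampling relies on the smaller subsample of size b.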

Empirical Application to Tennessee STAR Experiment
Table: 95% Numerical Delta Method Confidence Intervals for the Maximum and Minimum Quantile Treatment Effect
      φ(θ̂_n)   SE      Equal-Tailed      Lower          Upper
Max   6.77%    0.73%   (4.07%, 6.97%)    (−∞, 6.77%)    (4.37%, ∞)
Min   1.52%    0.75%   (1.35%, 4.26%)    (−∞, 3.99%)    (1.54%, ∞)
Table: 95% Subsampling Confidence Intervals for the Maximum and Minimum Quantile Treatment Effect
      SE      Equal-Tailed      Lower          Upper
Max   0.83%   (4.12%, 7.42%)    (−∞, 7.15%)    (4.42%, ∞)
Min   0.81%   (0.87%, 4.05%)    (−∞, 3.76%)    (1.10%, ∞)
40 / 41

Empirical Application to Tennessee STAR Experiment Conclusion Demonstrated how to conduct inference on directionally differentiable functions of parameters using the numerical delta method. Pointwise valid inference for all directionally differentiable functions. Uniformly valid inference for convex and Lipschitz functions. Consistent estimation of the limiting distribution of test statistics in partially identified models. Proposed a numerical bootstrap principle that can be used to conduct inference when regular bootstrap fails. Pointwise valid inference for Maximum Score, LASSO, and 1-norm Support Vector Machine Regression. 41 / 41