Bayesian Indirect Inference and the ABC of GMM


Bayesian Indirect Inference and the ABC of GMM

Michael Creel, Jiti Gao, Han Hong, Dennis Kristensen
Universitat Autònoma de Barcelona, Barcelona Graduate School of Economics, and MOVE; Monash University; Stanford University; University College London, CEMMAP, and CREATES

April 18, 2016

Introduction

- The Generalized Method of Moments is the building block of many econometric models: "Large Sample Properties of Generalized Method of Moments Estimators," Hansen, 1982, Econometrica. Computation may not be easy.
- This paper studies a method of implementing quasi-Bayesian estimators for nonlinear and possibly nonseparable GMM models.
- Motivated very much by Indirect Likelihood Inference (Creel and Kristensen 2011) and MCMC (Chernozhukov and Hong 2003).
- Combines simulation with nonparametric regression in the computation of GMM models.
- New results show that, in this particular setting, local linear or polynomial methods have a theoretical advantage over possibly higher-order kernel methods.
- Finite-sample simulations are consistent with the theoretical results.
- More work is needed to handle large sample sizes and high-dimensional models.

Motivation and Literature

- Estimation of nonlinear models often involves difficult numerical optimization, which may also be used in conjunction with simulation methods. Examples include maximum simulated likelihood, the simulated method of moments, and the efficient method of moments (Gallant and Tauchen 1996; Gourieroux, Monfort, and Renault 1993).
- Despite extensive efforts, see for example Andrews 1997, the problem of extremum computation remains a formidable impediment in these applications.
- A recent insightful contribution by Creel and Kristensen 2011, Indirect Likelihood Inference, combines simulation and nonparametric kernel regression.

Setup and Estimators

- The current paper builds on and is closely related to Creel and Kristensen 2011 (Indirect Likelihood Inference) and Chernozhukov and Hong 2003 (an MCMC alternative to M-estimation).
- Gentzkow and Shapiro 2013 also regress parameters on moments. We use nonparametric instead of linear regression, and focus on the fitted value instead of the regression coefficients.
- Recent work by Forneron and Ng 2015 analyzes the higher-order asymptotic bias of ABC and indirect inference estimators.
- This paper extends in several directions:
  - Shows that local linear or polynomial methods have theoretical advantages over kernel methods.
  - Demonstrates the validity of inference based on simulated posterior quantiles.
  - Generalizes to nonlinear, nonseparable, and possibly nonsmooth moment conditions.
  - Provides precise conditions relating the sampling noise to the sampling variance.

Setup and Background

- Creel and Kristensen 2011: a log-likelihood $\hat Q(\theta)$ based on a vector of summary statistics, $\hat Q(\theta) = \log f(T_n \mid \theta)$.
- Example: the efficient method of moments (Gallant and Tauchen 1996) defines $T_n$ to be the score vector of an auxiliary model.
- A limited-information Bayesian posterior distribution:
  $$f_n(\theta \mid T_n) = \frac{f_n(T_n, \theta)}{f_n(T_n)} = \frac{f_n(T_n \mid \theta)\,\pi(\theta)}{\int_\Theta f_n(T_n \mid \theta)\,\pi(\theta)\,d\theta}. \qquad (1)$$
- Information from the Bayesian posterior can be used to conduct valid frequentist statistical inference. For example, the posterior mean is consistent and asymptotically normal:
  $$\bar\theta = \int_\Theta \theta\, f_n(\theta \mid T_n)\, d\theta \equiv E_n(\theta \mid T_n). \qquad (2)$$

- Posterior quantiles can also be used to form valid confidence intervals under correct model specification.
- Define the posterior $\tau$th quantile of the $j$th parameter, $\bar\theta_j(\tau)$, through
  $$\int^{\bar\theta_j(\tau)} f_{nj}(\theta_j \mid T_n)\, d\theta_j = \tau, \quad \text{where } f_{nj}(\theta_j \mid T_n) = \int f_n(\theta \mid T_n)\, d\theta_{-j}.$$
- A $(1-\tau)$-level confidence interval for $\theta_j$: $\big(\bar\theta_j(\tau/2),\, \bar\theta_j(1-\tau/2)\big)$.
- More generally, let $\eta(\theta)$ be a known scalar function of the parameters. A point estimate of $\eta_0 \equiv \eta(\theta_0)$ can be computed using the posterior mean:
  $$\bar\eta = \int_\Theta \eta(\theta)\, f_n(\theta \mid T_n)\, d\theta \equiv E_n(\eta(\theta) \mid T_n). \qquad (3)$$
- Define $\bar\eta_\tau$, the posterior $\tau$th quantile of $\eta$ given $T_n$, through
  $$\int 1\big(\eta(\theta) \le \bar\eta_\tau\big)\, f_n(\theta \mid T_n)\, d\theta = \tau. \qquad (4)$$
- A level $(1-\tau)$ interval for $\eta_0$: $(\bar\eta_{\tau/2},\, \bar\eta_{1-\tau/2})$.

- Draw $\theta^s$, $s = 1, \dots, S$, independently from $\pi(\theta)$. Compute $\eta^s = \eta(\theta^s)$ for $s = 1, \dots, S$.
- For each draw, generate a sample from the model at the parameter value $\theta^s$, then compute the corresponding statistic $T_n^s = T_n(\theta^s)$, $s = 1, \dots, S$.
- For a kernel function $\kappa$ and a bandwidth sequence $h$, define $\hat\eta = \hat a$, where
  $$(\hat a, \hat b) = \arg\min_{a,b} \sum_{s=1}^S \left(\eta^s - a - b'(T_n^s - T_n)\right)^2 \kappa\!\left(\frac{T_n^s - T_n}{h}\right).$$
- Similarly, define a feasible version of $\bar\eta_\tau$ as $\hat\eta_\tau = \hat a$, the intercept term in a local linear quantile regression, i.e., a weighted quantile regression with weights $\kappa\!\left(\frac{T_n^s - T_n}{h}\right)$:
  $$(\hat a, \hat b) = \arg\min_{a,b} \sum_{s=1}^S \rho_\tau\!\left(\eta^s - a - b'(T_n^s - T_n)\right) \kappa\!\left(\frac{T_n^s - T_n}{h}\right).$$
- In the above, $\rho_\tau(x) = (\tau - 1(x \le 0))\,x$ is the check function of Koenker and Bassett 1978.
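To make the steps above concrete, here is a minimal Python sketch of the local linear (posterior mean) step. The helpers `draw_prior`, `simulate_statistic`, and `eta` are hypothetical placeholders that are not part of the slides; the quantile version would replace the squared loss with the check function $\rho_\tau$.

```python
# Minimal sketch (not the authors' code) of the simulated local-linear estimator.
# Assumptions: simulate_statistic(theta, n) returns the summary statistic T_n computed
# on a sample simulated at theta; draw_prior() draws theta from pi(theta); eta(theta)
# is the scalar function of interest.
import numpy as np

def bil_local_linear(T_n, draw_prior, simulate_statistic, eta, n, S=10_000, h=0.1):
    """Posterior-mean estimate of eta via kernel-weighted local-linear regression."""
    thetas = np.array([draw_prior() for _ in range(S)])          # theta^s ~ pi
    Ts = np.array([simulate_statistic(th, n) for th in thetas])  # T_n^s
    etas = np.array([eta(th) for th in thetas])                  # eta^s

    U = (Ts - T_n) / h                                           # scaled regressors
    w = np.exp(-0.5 * np.sum(U**2, axis=1))                      # Gaussian product kernel
    X = np.column_stack([np.ones(S), Ts - T_n])                  # local-linear design
    Wsq = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(Wsq * X, np.sqrt(w) * etas, rcond=None)
    return coef[0]                                               # intercept = eta_hat
```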

Local polynomial regressions

- Generalize to local polynomial least squares and quantile regressions using the notation of Chaudhuri 1991.
- For $u = (u_1, \dots, u_d)$, a $d$-dimensional vector of nonnegative integers, let $[u] = u_1 + \dots + u_d$. Let $A$ be the set of all $d$-dimensional vectors $u$ such that $[u] \le p$, and set $s(A) = \#A$. Let $\beta = (\beta_u)_{u \in A}$ be a vector of coefficients of dimension $s(A)$.
- Let $y_s = T_n^s - T_n$, $y_s^u = y_{s,1}^{u_1} \cdots y_{s,d}^{u_d}$ for $u \in A$, and let $y_s^A = (y_s^u)_{u \in A}$. Define the $p$th-order polynomial $P_n(\beta, y_s) = \sum_{u \in A} \beta_u y_s^u = \beta' y_s^A$.
- Then replace the last two steps as follows. Define $\hat\eta = \hat\beta_{[0]}$, the 0th element of $\hat\beta$, for
  $$\hat\beta = \left(\sum_{s=1}^S y_s^A y_s^{A\prime} \kappa\!\left(\frac{y_s}{h}\right)\right)^{-1} \sum_{s=1}^S y_s^A \eta^s \kappa\!\left(\frac{y_s}{h}\right). \qquad (5)$$
- Define $\hat\eta_\tau = \hat\beta_{[0]}$, the 0th element of $\hat\beta$, for
  $$\hat\beta = \arg\min_\beta \sum_{s=1}^S \rho_\tau\!\left(\eta^s - \beta' y_s^A\right) \kappa\!\left(\frac{y_s}{h}\right). \qquad (6)$$
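The sketch below (with hypothetical inputs `etas`, `Ts`, `T_n`) illustrates how the polynomial design $y_s^A$ used in (5) can be built for a given order $p$ and fed into a kernel-weighted least squares fit; it is an illustration of the construction, not the authors' implementation.

```python
# Minimal sketch of the pth-order local polynomial design y_s^A and the weighted
# least-squares estimator in (5), under the notation introduced above.
import numpy as np
from itertools import product

def multi_indices(d, p):
    """All d-vectors u of nonnegative integers with [u] = u_1 + ... + u_d <= p."""
    return [u for u in product(range(p + 1), repeat=d) if sum(u) <= p]

def local_polynomial_mean(etas, Ts, T_n, h, p=2):
    y = Ts - T_n                                     # y_s = T_n^s - T_n, shape (S, d)
    S, d = y.shape
    A = multi_indices(d, p)                          # first element is u = 0 (intercept)
    # Column u of the design is prod_j y_{s,j}^{u_j}.
    X = np.column_stack([np.prod(y ** np.array(u), axis=1) for u in A])
    w = np.exp(-0.5 * np.sum((y / h) ** 2, axis=1))  # Gaussian product kernel
    Wsq = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(Wsq * X, np.sqrt(w) * etas, rcond=None)
    return beta[0]                                   # eta_hat = beta_[0]
```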

- Local linear regression is the special case of local polynomial regression with $p = 1$.
- Under suitable regularity conditions, $\hat\eta$ and $\hat\eta_\tau$ are consistent if $h \to 0$ and $S \to \infty$ as $n \to \infty$.
- For $\hat\eta$ to be first-order equivalent to limited-information MLE and for (7) to hold,
  $$\lim_{n\to\infty} P\!\left(\eta_0 \in (\hat\eta_{\tau/2},\ \hat\eta_{1-\tau/2})\right) = 1 - \tau, \qquad (7)$$
  we require $\sqrt{n}\, h^{1+p} \to 0$ and $S h^k \to \infty$, which entails $S / n^{k/(2(1+p))} \to \infty$, namely that $S$ is much larger than $n^{k/(2(1+p))}$.
- The bias in $\hat\theta$ is of order $O(h^{1+p})$. The variance is of order $O\!\left(\frac{1}{n S h^k}\right)$, which is much smaller than in usual nonparametric regression models.
- In a local linear regression with $p = 1$, this requires $S$ to be larger than $n^{k/4}$, where $k = \dim(\theta)$.
- This condition holds regardless of whether $d = k$ or $d > k$.

The ABC of GMM

- M-estimators: $\check\theta = \arg\max_\theta \hat Q_n(\theta)$.
- MCMC alternative (Chernozhukov and Hong 2003):
  $$\bar\theta = \frac{\int \theta\, \pi(\theta) \exp\!\big(m \hat Q_n(\theta)\big)\, d\theta}{\int \pi(\theta) \exp\!\big(m \hat Q_n(\theta)\big)\, d\theta},$$
  where $m$ can possibly differ from $n$; here we take $m = n$.
- GMM objective: $\hat Q_n(\theta) = -\frac{1}{2}\, \hat g(\theta)' \hat W(\theta)\, \hat g(\theta)$, where $\hat W(\theta)$ is a possibly data- and parameter-dependent weighting matrix.
- Redefine (2), (3), and (4) by replacing $f_n(\theta \mid T_n)$ with $f_n(\theta \mid \text{GMM})$:
  $$\bar\theta = \int_\Theta \theta\, f_n(\theta \mid \text{GMM})\, d\theta, \qquad \bar\eta = \int_\Theta \eta(\theta)\, f_n(\theta \mid \text{GMM})\, d\theta, \qquad (8)$$
  $$\int 1\big(\eta(\theta) \le \bar\eta_\tau\big)\, f_n(\theta \mid \text{GMM})\, d\theta = \tau.$$

GMM and simulation-based estimators

- Moment conditions: $\hat g(\theta) = \frac{1}{n} \sum_{i=1}^n g(z_i, \theta)$, with $\hat Q_n(\theta) = -\frac{1}{2}\, \hat g(\theta)' \hat W(\theta)\, \hat g(\theta)$.
- Consider the following statistical experiment. Consider a random vector $Y_m$ which, given $\theta$ and the data, has a normal distribution: $N\!\left(\hat g(\theta),\ \frac{1}{m}\hat W(\theta)^{-1}\right)$.
- Define $\bar\theta(y) = E(\theta \mid Y_m = y)$, for
  $$f(\theta \mid Y_m = y) \propto \pi(\theta)\, \det\!\big(\hat W(\theta)\big)^{1/2} \exp\!\left(-\frac{m}{2}\, \big(\hat g(\theta) - y\big)' \hat W(\theta)\, \big(\hat g(\theta) - y\big)\right).$$
- Then $\bar\theta = \bar\theta(y)\big|_{y=0}$.

GMM and simulation-based estimators

This interpretation suggests the following simulation-based estimator.

1. Draw $\theta^s$, $s = 1, \dots, S$, from $\pi(\theta)$. For each $\theta^s$, compute $\hat g(\theta^s)$.
2. Draw $y_n^s$ from $Y_n \sim N\!\left(\hat g(\theta^s),\ \frac{1}{n}\hat W(\theta^s)^{-1}\right)$. For $\xi \sim N(0, I_d)$:
   $$y_n^s = \hat g(\theta^s) + \frac{1}{\sqrt{n}}\, \hat W(\theta^s)^{-1/2}\, \xi.$$
3. Define $\hat\eta = \hat a$ in the following local-to-zero linear least squares regression:
   $$(\hat a, \hat b) = \arg\min_{a,b} \sum_{s=1}^S \left(\eta^s - a - b' y_n^s\right)^2 \kappa\!\left(\frac{y_n^s}{h}\right).$$
4. Define $\hat\eta_\tau = \hat a$ in the following local-to-zero linear quantile regression:
   $$(\hat a, \hat b) = \arg\min_{a,b} \sum_{s=1}^S \rho_\tau\!\left(\eta^s - a - b' y_n^s\right) \kappa\!\left(\frac{y_n^s}{h}\right).$$
5. Local polynomials are implemented exactly as before.
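A minimal sketch of steps 1-5, assuming hypothetical helpers `draw_prior`, `g_bar`, `W`, and `eta` that are not part of the slides; the local-to-zero quantile and polynomial variants follow the same pattern.

```python
# Minimal sketch (not the authors' code) of the ABC-GMM estimator described above.
# g_bar(theta, data) returns the sample moment vector g_hat(theta); W(theta, data)
# returns the weighting matrix; draw_prior() draws from pi(theta); eta(theta) is the
# scalar function of interest.
import numpy as np

def abc_gmm(data, n, draw_prior, g_bar, W, eta, S=20_000, h=0.05, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    thetas = np.array([draw_prior() for _ in range(S)])
    etas = np.array([eta(th) for th in thetas])

    ys = []
    for th in thetas:
        g = g_bar(th, data)                                  # g_hat(theta^s)
        Winv_sqrt = np.linalg.cholesky(np.linalg.inv(W(th, data)))
        xi = rng.standard_normal(g.shape[0])
        ys.append(g + Winv_sqrt @ xi / np.sqrt(n))           # y_n^s
    ys = np.array(ys)

    # Local-to-zero linear least squares with weights kappa(y_n^s / h).
    w = np.exp(-0.5 * np.sum((ys / h) ** 2, axis=1))
    X = np.column_stack([np.ones(S), ys])
    Wsq = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(Wsq * X, np.sqrt(w) * etas, rcond=None)
    return coef[0]                                           # hat eta = intercept a
```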

Comments

The weighting matrix $\hat W(\theta)$:
- Can be chosen to provide an initial $\sqrt{n}$-consistent, but not necessarily efficient, estimator; for example, $\hat W(\theta) = I$. Posterior quantile inference is generally invalid in this case.
- Can be chosen in two steps (a sketch follows below). In the first step, $\hat W(\theta) = I$, yielding $\hat\theta$. In the second step, $\hat W$ is the inverse of an estimate of the variance of $\hat g(\theta_0)$, evaluated at $\hat\theta$. This gives efficient estimation and valid posterior quantile inference, similar to optimal two-step GMM.
- Or use continuously updating GMM, and choose $\hat W(\theta)$ as the inverse of an estimate of the variance of $\hat g(\theta)$ for each given $\theta$. Indirect inference implicitly uses a continuously updated $\hat W(\theta)$.
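A minimal sketch (an assumption about one possible implementation, not the authors' code) of the two-step weighting choice: run ABC-GMM once with $W = I$, then form $W$ as the inverse of an estimated variance of the moment functions at the first-step estimate. The helpers `g_i` (per-observation moment function) and `abc_gmm` (sketched earlier) are hypothetical.

```python
# Two-step weighting-matrix choice for the ABC-GMM sketch above.
import numpy as np

def two_step_weight(data, theta_first_step, g_i):
    """Inverse of the estimated variance of g(z_i, theta) at the first-step estimate."""
    G = np.array([g_i(z, theta_first_step) for z in data])   # n x d moment contributions
    V = np.cov(G, rowvar=False)                               # estimated Var of g(z_i, theta_0)
    return np.linalg.inv(V)

# Step 1: abc_gmm(..., W=lambda theta, data: np.eye(d_moments))   # identity weighting
# Step 2: abc_gmm(..., W=lambda theta, data: W_hat)               # W_hat from two_step_weight
```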

Relation between BIL (Creel and Kristensen 2011) and ABC-GMM

- BIL corresponds to separable moment conditions, where $\hat g(\theta) = T_n - t(\theta)$ and $t(\theta)$ is the probability limit of $T_n$ under $\theta$.
- When $t(\theta^s)$ is unknown, BIL replaces it with a simulated version $T_n^s$ drawn at $\theta^s$ and uses $Y_n^s = T_n - T_n^s$.
- This is analogous to drawing $Y_n^s$ from
  $$\hat g(\theta^s) + \frac{1}{\sqrt{n}}\, W(\theta^s)^{-1/2}\, \xi = T_n - t(\theta^s) - \big(T_n^s - t(\theta^s)\big),$$
  where $\xi$ is approximately normal.
- Appealing for parametric models with a complex likelihood that are nevertheless feasible to simulate.

- The simulation sample size can also differ from the observed sample size: $T_n^s$ can be replaced by $T_m^s$, where possibly $m \ne n$. In step 2 of ABC-GMM, $y_n^s$ can be replaced by
  $$y_m^s = \hat g(\theta^s) + \frac{1}{\sqrt{m}}\, W(\theta^s)^{-1/2}\, \xi \sim N\!\left(\hat g(\theta^s),\ \frac{1}{m} W(\theta^s)^{-1}\right).$$
- When $m \to \infty$ along with $n$, $\hat\eta - \eta_0 = O_P\!\left(1/\sqrt{n \wedge m}\right)$. The interval $(\hat\eta_{\tau/2}, \hat\eta_{1-\tau/2})$ remains asymptotically valid, although conservatively so, when $m < n$.
- A $(1-\tau)$-level confidence interval for $\eta_0$ rescales the posterior quantile spread:
  $$\left(\hat\eta_{1/2} + \sqrt{\tfrac{m}{n}}\,\big(\hat\eta_{\tau/2} - \hat\eta_{1/2}\big),\ \ \hat\eta_{1/2} + \sqrt{\tfrac{m}{n}}\,\big(\hat\eta_{1-\tau/2} - \hat\eta_{1/2}\big)\right).$$
  Only when $m = n$ does this confidence interval specialize to $(\hat\eta_{\tau/2}, \hat\eta_{1-\tau/2})$.
- Focus on $m = n$: $m < n$ does not seem to bring computational efficiency unless the cost of simulation increases with the simulation sample size, and $m > n$ does not increase first-order asymptotic efficiency.

Heuristics: generalizing grid search with local polynomial extrapolation

- Take $m = \infty$, so $Y^s = \hat g(\theta^s)$. Regress $\theta^s$ on $Y^s$ local to zero, for example,
  $$\hat\theta = \sum_{s=1}^S \theta^s\, \kappa\!\left(\frac{\hat g(\theta^s)}{h}\right) \Big/ \sum_{s=1}^S \kappa\!\left(\frac{\hat g(\theta^s)}{h}\right).$$
- This is analogous to a nonparametric regression in which there is no error term, $\epsilon \equiv 0$: $y = g(x)$. The variance is solely due to the variation in the design, and does not include the conditional variance of $y$ given $x$.
- Fine for exactly identified models. But in overidentified models, while kernel methods are straightforward, local linear or polynomial methods involve possible multicollinearity among the regressors $\hat g(\theta^s)$ that is not present when $m < \infty$.

- Furthermore, when the model is overidentified with $d > k$, conditional on a realization of $\theta^s$, the event $\hat g(\theta^s) = t$ is not possible for almost all values of $t$ (Lebesgue measure 1). In this case, the conditional distribution of $\theta$ given $\hat g(\theta) = t$ is not defined for almost all $t$, including $t = 0$, for almost all realizations of $\hat g(\theta)$.
- On the other hand, for $m < \infty$, regardless of how large, the conditional distribution of $\theta$ given $Y = \hat g(\theta) + \xi/\sqrt{m} = t$ is always well defined for all $t$, as long as $\xi$ has full support.

Asymptotic Distribution Theory

- Provide conditions on the order of magnitude of the number of simulations, in relation to the sample size, for $\sqrt{n}$-consistency and asymptotic normality.
- Assumptions related to the infeasible estimators and intervals $\bar\theta$, $\bar\eta$, $\bar\eta_\tau$, and $(\bar\eta_{\tau/2}, \bar\eta_{1-\tau/2})$ mirror the general results in Chernozhukov and Hong 2003 and Creel and Kristensen 2011.
- Additional assumptions relate to the feasible simulation-based estimators and intervals $\hat\theta$, $\hat\eta$, $\hat\eta_\tau$, and $(\hat\eta_{\tau/2}, \hat\eta_{1-\tau/2})$.

Assumption 1. The true parameter $\theta_0$ belongs to the interior of a compact convex subset $\Theta$ of the Euclidean space $R^k$. The weighting function $\pi : \Theta \to R^+$ is a continuous, uniformly positive density function.

Assumption 2.

1. $g(\theta) = 0$ if and only if $\theta = \theta_0$;
2. $W(\theta)$ is uniformly positive definite and finite on $\theta \in \Theta$;
3. $\sup_{\theta \in \Theta} \|\hat W(\theta) - W(\theta)\| = o_P(1)$;
4. $\sup_{\theta \in \Theta} \|\hat g(\theta) - g(\theta)\| = o_P(1)$;
5. $\{\sqrt{n}\,(\hat g(\theta) - g(\theta));\ \theta \in \Theta\} \Rightarrow G_g$, a mean-zero Gaussian process with marginal variance $\Sigma(\theta)$;
6. $g(\theta)$ and $W(\theta)$ are both $(p+1)$-times boundedly and continuously differentiable;
7. For any $\epsilon > 0$, there is $\delta > 0$ such that
   $$\limsup_{n\to\infty} P\left(\sup_{\|\theta - \theta'\| \le \delta} \frac{\sqrt{n}\,\left\|\hat g(\theta) - \hat g(\theta') - \big(g(\theta) - g(\theta')\big)\right\|}{1 + \sqrt{n}\,\|\theta - \theta'\|} > \epsilon\right) < \epsilon. \qquad (9)$$

Assumption 3 (exactly identified models). The model is exactly identified: $d = k$.

Assumption 4 (smooth overidentified models). There exist random functions $\hat G(\theta_y)$, $\hat H(\theta_y)$ such that, for any $\delta_n \to 0$,
$$\sup_{\|\theta - \theta_y\| \le \delta_n}\ \sup_{y \in Y}\ \frac{\sqrt{n}\,\left\|\hat g(\theta) - \hat g(\theta_y) - \big(g(\theta) - g(\theta_y)\big) - \big(\hat G(\theta_y) - G(\theta_y)\big)(\theta - \theta_y)\right\|}{\|\theta - \theta_y\|} = o_P(1),$$
$$\sup_{\|\theta - \theta_y\| \le \delta_n}\ \sup_{y \in Y}\ \frac{\sqrt{n}\,\left\|\hat W(\theta) - \hat W(\theta_y) - \big(W(\theta) - W(\theta_y)\big) - \big(\hat H(\theta_y) - H(\theta_y)\big)(\theta - \theta_y)\right\|}{\|\theta - \theta_y\|} = o_P(1),$$
and such that
$$\sqrt{n}\left(\hat g(\theta_y) - g(\theta_y),\ \hat G(\theta_y) - G(\theta_y),\ \hat H(\theta_y) - H(\theta_y)\right) \Rightarrow \left(G_g, G_G, G_H\right).$$

Assumption 5 (nonsmooth overidentified models). $\sup_{y \in Y} \|y\| = o(n^{-1/4})$. For any $\delta_n \to 0$,
$$\sup_{\|\theta - \theta_y\| \le \delta_n}\ \sup_{y \in Y}\ \frac{\sqrt{n}\,\left\|\hat g(\theta) - \hat g(\theta_y) - \big(g(\theta) - g(\theta_y)\big)\right\|}{\|\theta - \theta_y\|} = O_P(1).$$
Furthermore, $\hat W(\theta) \equiv \hat W$, $W(\theta) \equiv W$, and $\|\hat W - W\| = O_P\!\left(\frac{1}{\sqrt{n}}\right)$.

Assumption 6. The kernel function satisfies: (1) $\kappa(x) = h(\|x\|)$, where $h(\cdot)$ decreases monotonically on $(0, \infty)$; (2) $\int \kappa(x)\, dx = 1$; (3) $\int x\, \kappa(x)\, dx = 0$; (4) $\int \|x\|^2 \kappa(x)\, dx < \infty$.

Theorem 1. Under Assumptions 1, 2, 6, and one of 3, 4, or 5, in the local linear regression, both $\sqrt{n}\,(\hat\eta - \bar\eta) = o_P(1)$ and $\hat\eta_\tau - \bar\eta_\tau = o_P\!\left(\frac{1}{\sqrt{n}}\right)$ when $S h^k \to \infty$, $nh \to \infty$, and $\sqrt{n}\, h^2 = o(1)$, so that $\hat\eta$ and $\hat\eta_\tau$ are first-order asymptotically equivalent to $\bar\eta$ and $\bar\eta_\tau$, and posterior inference based on $\hat\eta_\tau$ is valid whenever it is valid for the infeasible $\bar\eta_\tau$.

- Since the posterior distribution shrinks at the $1/\sqrt{n}$ rate, whenever $S h^k \to \infty$, aside from the bias term, $\hat\theta$ is automatically $\sqrt{n}$-consistent for $E[\hat\theta]$. The interaction between $n$ and $h$ is limited to the bias term.
- Theorem 1 holds regardless of exact identification ($d = k$) or overidentification ($d > k$).
- The lower bound on $S$ is $S \gg n^{k/4}$, in the sense that $n^{-k/4} S \to \infty$.

Theorem 2. Under Assumptions 1, 2, and 6, and one of 3 or 4, for $\hat\eta$ and $\hat\eta_\tau$ defined in the local polynomial regressions, if $n h^{2(p+1)} \to 0$, $nh \to \infty$, and $S h^k \to \infty$, then $\hat\theta - \bar\theta = o_P(1/\sqrt{n})$, $\hat\eta - \bar\eta = o_P(1/\sqrt{n})$, and $\hat\eta_\tau - \bar\eta_\tau = o_P(1/\sqrt{n})$, so that posterior inference based on $\hat\eta_\tau$ is valid whenever it is valid for the infeasible $\bar\eta_\tau$.

- The lower bound on $S$ implied by Theorem 2 is $S \gg n^{k/(2(p+1))}$, which can be much smaller than $S \gg n^{k/4}$ by using a larger $p$.
- The result also holds regardless of exact identification or overidentification.
- In the proof, we impose the additional assumption that $nh \to \infty$, to allow for a smaller $S$.

Differences from conventional nonparametric regression

- The marginal density of $Y_m$, the conditional variance of $\theta$ given $Y_n$, and (in the quantile regression case) the conditional density are all sample-size dependent.
- Only in the exactly identified model is $f_{Y_n}(0) = O_p(1)$. The behavior of the marginal density of $Y_n$ is more complex in the overidentified model: $f_{Y_n}(0) = O_p\!\left(n^{\frac{d-k}{2}}\right)$.
- For local linear estimators with $\sqrt{n}\, h^2 = o(1)$, whenever $S h^k \to \infty$, $\hat\theta$ is automatically $\sqrt{n}$-consistent for $E[\hat\theta]$.
- Both higher-order polynomials and higher-order kernels reduce bias. However, higher-order polynomials also improve on variance.

Locally constant (and possibly higher-order) kernel mean and quantile estimates of $\eta$ are
$$\hat\eta = \sum_{s=1}^S \eta^s\, \kappa\!\left(\frac{y_n^s}{h}\right) \Big/ \sum_{s=1}^S \kappa\!\left(\frac{y_n^s}{h}\right) \qquad (10)$$
and
$$\hat\eta_\tau = \arg\min_a \sum_{s=1}^S \rho_\tau\!\left(\eta^s - a\right) \kappa\!\left(\frac{y_n^s}{h}\right). \qquad (11)$$
However, the conditions required for $\sqrt{n}$-consistency and asymptotic normality are substantially more stringent for (10) and (11), and imply a larger lower bound on $S$: $S \gg n^{k/4}\sqrt{n}$.

Theorem. Under Assumptions 1, 2, 6, and one of 3, 4, or 5, for $\hat\eta$ and $\hat\eta_\tau$ defined in (10) and (11), $\hat\eta - \bar\eta = o_P\!\left(\frac{1}{\sqrt{n}}\right)$ and $\hat\eta_\tau - \bar\eta_\tau = o_P\!\left(\frac{1}{\sqrt{n}}\right)$ if $S h^k \min\!\left(1, \frac{1}{n h^2}\right) \to \infty$ and $\sqrt{n}\, h^2 \to 0$ when $d = k$. The same conclusion holds when $d > k$ under the additional condition that $nh \to \infty$.
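For comparison with the local linear and polynomial sketches above, here is a minimal sketch of the locally constant estimators (10) and (11); the minimizer of the kernel-weighted check-function loss in (11) is just a weighted quantile of the simulated $\eta^s$. The inputs `etas` and `ys` are assumed to come from a simulation step like the one sketched earlier.

```python
# Minimal sketch of the locally constant estimators (10) and (11).
import numpy as np

def local_constant(etas, ys, h, tau=0.05):
    w = np.exp(-0.5 * np.sum((ys / h) ** 2, axis=1))   # kernel weights kappa(y_n^s / h)
    eta_hat = np.sum(w * etas) / np.sum(w)             # (10): kernel-weighted mean at y = 0

    # (11): weighted tau-quantile of eta^s, computed from cumulative normalized weights.
    order = np.argsort(etas)
    cw = np.cumsum(w[order]) / np.sum(w)
    eta_tau = etas[order][np.searchsorted(cw, tau)]
    return eta_hat, eta_tau
```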

Illustrating example

Consider a vector of sample means: $\hat g(\mu) = \mu - \bar X$. Let $\pi(\mu) = N(\mu_0, \Sigma_0)$ and
$$Y_m = \mu - \bar X + \frac{1}{\sqrt{m}}\, \Sigma^{1/2} \xi, \quad \text{so that, given } \mu,\ Y_m \sim N\!\left(\mu - \bar X,\ \tfrac{1}{m}\Sigma\right).$$
Then the posterior mean and variance are given by
$$E(\mu \mid Y_m = t) = \frac{\Sigma}{m}\left(\Sigma_0 + \frac{\Sigma}{m}\right)^{-1}\mu_0 + \Sigma_0\left(\Sigma_0 + \frac{\Sigma}{m}\right)^{-1}\left(\bar X + t\right)$$
and
$$\mathrm{Var}(\mu \mid Y_m = t) = \Sigma_0\left(\Sigma_0 + \frac{1}{m}\Sigma\right)^{-1}\frac{1}{m}\Sigma = O\!\left(\frac{1}{m}\right).$$
Given $\bar X$, the marginal density of $Y_m$ is that of $N\!\left(\mu_0 - \bar X,\ \Sigma_0 + \frac{1}{m}\Sigma\right)$, which is $O(1)$ whenever $\Sigma_0$ is nonsingular, as in the exactly identified case $d \equiv \dim(Y_m) = k \equiv \dim(\mu)$.

Suppose now $d > k = 1$. For a scalar $u_0$ and $\sigma_0^2$, and for $l$ a $d \times 1$ vector of ones, write $\mu_0 = u_0 l$ and $\Sigma_0 = \sigma_0^2 l l'$. The previous calculation cannot be used directly when $m, n \to \infty$. Instead, in this case,
$$\left(\frac{\Sigma}{m} + \sigma_0^2 l l'\right)^{-1} = m\Sigma^{-1} - \frac{\sigma_0^2 m^2\, \Sigma^{-1} l l' \Sigma^{-1}}{1 + \sigma_0^2 m\, l'\Sigma^{-1} l},$$
$$E(\mu \mid Y_m = t) = \left(I - \frac{\sigma_0^2 m\, l l' \Sigma^{-1}}{1 + \sigma_0^2 m\, l'\Sigma^{-1} l}\right) u_0 l + \frac{\sigma_0^2 m\, l l' \Sigma^{-1}}{1 + \sigma_0^2 m\, l'\Sigma^{-1} l}\,\big(\bar X + t\big).$$
As $m \to \infty$, $E(\mu \mid Y_m = t) \to \frac{l\, l'\Sigma^{-1}}{l'\Sigma^{-1} l}\,(\bar X + t)$, the GLS estimator. Furthermore, now interpreting $\mu$ as a scalar,
$$\mathrm{Var}(\mu \mid Y_m = t) = \sigma_0^2 - \sigma_0^4\, l'\left(\Sigma_0 + \Sigma/m\right)^{-1} l = \sigma_0^2\, \frac{1}{1 + \sigma_0^2 m\, l'\Sigma^{-1} l}.$$

- The marginal density of $Y_m$ at $t = 0$, that of $N\!\left(\bar X - u_0 l,\ \frac{\Sigma}{m} + \sigma_0^2 l l'\right)$ evaluated at zero, becomes singular when $m, n \to \infty$.
- Write $\bar X - \mu_0 = (\bar X_1 - u_0)\, l + (0, \Delta')'/\sqrt{n}$, for $\Delta = \sqrt{n}\,(\bar X_{-1} - \bar X_1)$, so that $\Delta = \Omega^{1/2} Z$ for some $\Omega$ and $Z \sim N(0, I_{d-k})$.
- The exponent of this density becomes
  $$-\frac{1}{2}\,(\bar X - \mu_0)'\left(\frac{\Sigma}{m} + \sigma_0^2 l l'\right)^{-1}(\bar X - \mu_0)
  = -\frac{1}{2}\,(\bar X_1 - u_0)^2\, \frac{m\, l'\Sigma^{-1} l}{1 + \sigma_0^2 m\, l'\Sigma^{-1} l}
  - \frac{1}{2}\, \frac{(0, \Delta')}{\sqrt{n}} \left(m\Sigma^{-1} - \frac{\sigma_0^2 m^2\, \Sigma^{-1} l l' \Sigma^{-1}}{1 + \sigma_0^2 m\, l'\Sigma^{-1} l}\right) \frac{(0, \Delta')'}{\sqrt{n}} + o_p(1),$$
  which is $O_p(1)$ if $m \asymp n$, and diverges if $m/n \to \infty$.
- Also, using the relation $\det(I + uv') = 1 + u'v$,
  $$m^{d-1}\,\det\!\left(\frac{\Sigma}{m} + \sigma_0^2 l l'\right) \to C > 0.$$

Change of variable and the singularity of $f_Y$

- Partition $Y = (Y_1, Y_2)$ for scalar $Y_1$. Let $Y_2 = Y_1 + \frac{\Delta}{\sqrt{n}} + \frac{w_2}{\sqrt{m}}$. Simplify by letting $\Sigma = I$. Then
  $$Y_1 = \mu - \bar X_1 + \frac{\xi_1}{\sqrt{m}}, \qquad Y_2 = \mu - \bar X_2 + \frac{\xi_2}{\sqrt{m}},$$
  $$\Delta = \sqrt{n}\,(\bar X_1 - \bar X_2) = O_p(1), \qquad w_2 = \xi_2 - \xi_1 = O_p(1).$$
- Implication for the kernel function:
  $$\kappa\!\left(\frac{Y_1}{h}, \frac{Y_2}{h}\right) = \kappa\!\left(\frac{Y_1}{h},\ \frac{Y_1}{h} + \frac{\Delta}{\sqrt{n}\,h} + \frac{w_2}{\sqrt{m}\,h}\right).$$
  If $\sqrt{n}\,h \to \infty$ and $\sqrt{m}\,h \to \infty$, then $\frac{\Delta}{\sqrt{n}\,h} = o_p(1)$ and $\frac{w_2}{\sqrt{m}\,h} = o_p(1)$, so this is essentially $\kappa\!\left(\frac{Y_1}{h}, \frac{Y_1}{h}\right)$.

Behavior of $f_{Y_n}$ in the general case

- For each $v$,
  $$f_{\sqrt{n}(\theta - \bar\theta)}(v \mid Y_n = t) = n^{-k/2}\, f_\theta\!\left(\bar\theta + \tfrac{v}{\sqrt{n}} \,\Big|\, Y_n = t\right) \quad \text{and} \quad
  f_\theta\!\left(\bar\theta + \tfrac{v}{\sqrt{n}} \,\Big|\, Y_n = t\right) = \frac{\pi\!\left(\bar\theta + \tfrac{v}{\sqrt{n}}\right) f_{Y_n}\!\left(t \,\big|\, \theta = \bar\theta + \tfrac{v}{\sqrt{n}}\right)}{f_{Y_n}(t)}.$$
- Rewrite this relation as
  $$f_{Y_n}(t) = n^{-k/2}\, \frac{\pi\!\left(\bar\theta + \tfrac{v}{\sqrt{n}}\right) f_{Y_n}\!\left(t \,\big|\, \theta = \bar\theta + \tfrac{v}{\sqrt{n}}\right)}{f_{\sqrt{n}(\theta - \bar\theta)}(v \mid Y_n = t)}.$$
- Since both $f_{\sqrt{n}(\theta - \bar\theta)}(v \mid Y_n = t) = O(1)$ and $\pi\!\left(\bar\theta + \tfrac{v}{\sqrt{n}}\right) = O(1)$,
  $$f_{Y_n}(t) \le C\, n^{-k/2}\, f_{Y_n}\!\left(t \,\big|\, \theta = \bar\theta + \tfrac{v}{\sqrt{n}}\right).$$
- Consider for example $t = v = 0$; then
  $$f_{Y_n}(0 \mid \theta = \bar\theta) \propto n^{d/2}\, \det(W)^{1/2} \exp\!\left(-\frac{n}{2}\, \hat g(\bar\theta)' W\, \hat g(\bar\theta)\right) = O_p\!\left(n^{d/2}\right).$$
  Hence $f_{Y_n}(0) = O_p\!\left(n^{(d-k)/2}\right)$.

Implications for nonparametric regression

- In the local constant kernel method: variance of order $\frac{1}{S h^k}\left(\frac{1}{n} + h^2\right)$.
- In local linear regression: the regressors can be asymptotically collinear along a $k$-dimensional manifold, with up to $1/\sqrt{n}$ local variation surrounding this manifold.
- In this case, the intercept term will converge at the fast $\frac{1}{\sqrt{S h^k}}\frac{1}{\sqrt{n}}$ rate, up to the bias adjustment. However, the local linear coefficients will converge at the slower $\frac{1}{\sqrt{S h^k}}$ rate, and do not benefit from the $1/\sqrt{n}$ term. Certain linear combinations of the coefficients, though, converge at the faster rate of $\frac{1}{\sqrt{S h^k}}\frac{1}{h\sqrt{n}}$.
- We illustrate this point in the normal analytic example, for $k = 1$, $d = 2$, with diffuse prior $\sigma_0 = \infty$.

In the normal example when $m = n$:
$$\mu = \beta_0 + \beta_1 Y_1 + \beta_2 Y_2 + \epsilon, \quad \text{where } \beta_0 = (l'\Sigma^{-1}l)^{-1} l'\Sigma^{-1}\bar X, \quad (\beta_1, \beta_2) = (l'\Sigma^{-1}l)^{-1} l'\Sigma^{-1}, \quad \epsilon \sim N\!\left(0,\ \tfrac{1}{n}\,(l'\Sigma^{-1}l)^{-1}\right).$$
This can be written as
$$\mu = \beta_0 + Y_1\,(\beta_1 + \beta_2) + (Y_2 - Y_1)\,\beta_2 + \epsilon
     = \beta_0 + (\bar X_1 - \bar X_2)\,\beta_2 + Y_1\,(\beta_1 + \beta_2) + \frac{\xi_2 - \xi_1}{\sqrt{n}}\,\beta_2 + \epsilon.$$
Then consider
$$\theta = \left(\beta_0 + (\bar X_1 - \bar X_2)\,\beta_2,\ \ \beta_1 + \beta_2,\ \ \frac{\beta_2}{\sqrt{n}}\right)'$$
and its corresponding least squares estimate $\hat\theta$ based on the dependent variable $\mu^s$, $s = 1, \dots, S$, and the regressors $Y_1$ and $\sqrt{n}\,(Y_2 - Y_1)$. With $S^*$ denoting the effective number of local observations, typically of order $S h^k$, $\sqrt{S^*}\,(\hat\theta - \theta)$ has a nondegenerate distribution.

As $S^* \to \infty$,
$$\sqrt{S^*}\,(\hat\theta - \theta) \Rightarrow N\!\left(0,\ \sigma^2 \Sigma_n^{-1}\right), \quad \text{where } \Sigma_n = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sigma_u^2 + \frac{\sigma_1^2}{n} & \frac{\sigma_{12}}{\sqrt{n}} \\ 0 & \frac{\sigma_{12}}{\sqrt{n}} & \sigma_2^2 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sigma_u^2 & 0 \\ 0 & 0 & \sigma_2^2 \end{pmatrix}.$$
- Asymptotically, $\hat\beta_1 + \hat\beta_2$ and $\hat\beta_2$ are independent. These results will be demonstrated in the general case.
- If $m/n \to \infty$, then $\epsilon \sim N\!\left(0,\ \frac{1}{m}(l'\Sigma^{-1}l)^{-1}\right)$ and $Y_2 - Y_1 = \bar X_1 - \bar X_2 + \frac{\xi_2 - \xi_1}{\sqrt{m}}$. In this case, $\beta_2$ needs to be scaled by $\sqrt{m}$ instead.

- If $m = \infty$, $Y_2$ is a constant shift of $Y_1$ by $\bar X_1 - \bar X_2$: the regressors become collinear. In this case, it appears sufficient to regress $\mu^s$ on either $Y_{1s}$ or $Y_{2s}$.
- However, in this case the intercept term $\beta_0$ is not estimable within scale $O_p(1/\sqrt{n})$, in the sense that $\hat\beta_0 = F\hat\beta$, for $F = (1\ 0\ 0)$ and $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2)'$, is not uniquely determined: $\hat\beta$ can have multiple solutions to the normal equations due to the collinearity of the regressors.
- To see this, note that one can write either of the following two cases:
  $$\mu = \beta_0 + (\bar X_1 - \bar X_2)\,\beta_2 + Y_1\,(\beta_1 + \beta_2) + \epsilon \quad \text{or} \quad \mu = \beta_0 - (\bar X_1 - \bar X_2)\,\beta_1 + Y_2\,(\beta_1 + \beta_2) + \epsilon.$$

- Or use Rao 1973 to argue that there is no $A$ such that $F = X'A$. In fact, let $A = (a_1, \dots, a_S)'$. Then to satisfy
  $$(1,\ 0,\ 0)' = \left(\sum_s a_s,\ \ \sum_s a_s Y_{1s},\ \ (\bar X_1 - \bar X_2)\sum_s a_s + \sum_s a_s Y_{1s}\right)',$$
  it is necessary that $\sum_s a_s = 1$ and $\sum_s a_s = 0$, which is a contradiction.
- The exactly collinear case is perhaps unlikely in nonlinear models. Nevertheless, this suggests that in an overidentified model, the limit of taking $m \to \infty$, even at a very fast rate, is different from setting $m = \infty$.
- It is possible to regress only on $Y_1$ or $Y_2$, but in this case $\bar X_1$ and $\bar X_2$ might not be combined optimally.

Monte Carlo Simulation: DSGE Model

- A single good can be consumed or used for investment, and a single competitive firm maximizes profits.
- The variables: y output; c consumption; k capital; i investment; n labor; w real wages; r return to capital.
- The household maximizes expected discounted utility
  $$E_t \sum_{s=0}^{\infty} \beta^s \left(\frac{c_{t+s}^{1-\gamma}}{1-\gamma} + (1 - n_{t+s})\,\eta_{t+s}\,\psi\right)$$
  subject to the budget constraint $c_t + i_t = r_t k_t + w_t n_t$ and the accumulation of capital $k_{t+1} = i_t + (1 - \delta) k_t$.
- Preference shock evolution:
  $$\ln \eta_t = \rho_\eta \ln \eta_{t-1} + \sigma_\eta \epsilon_t. \qquad (12)$$

Monte Carlo Simulation: DSGE Model

- Firm production technology: $y_t = k_t^{\alpha} n_t^{1-\alpha} z_t$.
- Technology shock, a log AR(1) process: $\ln z_t = \rho_z \ln z_{t-1} + \sigma_z u_t$, with $\epsilon_t$ and $u_t$ mutually independent i.i.d. $N(0, 1)$ variables.
- Output allocation: $y_t = c_t + i_t$. Capital and labor are paid at the rates $r_t$ and $w_t$.
- Estimate steady-state hours, $n$, along with the other parameters, except $\psi$.
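As a small illustration of the exogenous part of this design, the sketch below simulates the two log-AR(1) shock processes, the preference shock (12) and the technology shock, using the true parameter values listed in the table that follows; it covers only this fragment of the data generating process, not the full model solution.

```python
# Minimal sketch: simulate the preference shock (12) and the technology shock of the
# DSGE example, both log-AR(1) with independent standard-normal innovations.
# Default parameters are the "true values" from the prior-support table below.
import numpy as np

def simulate_shocks(T, rho_eta=0.7, sigma_eta=0.005, rho_z=0.9, sigma_z=0.01, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    log_eta = np.zeros(T)
    log_z = np.zeros(T)
    eps, u = rng.standard_normal(T), rng.standard_normal(T)          # i.i.d. N(0,1), independent
    for t in range(1, T):
        log_eta[t] = rho_eta * log_eta[t - 1] + sigma_eta * eps[t]   # ln eta_t, eq. (12)
        log_z[t] = rho_z * log_z[t - 1] + sigma_z * u[t]             # ln z_t
    return np.exp(log_eta), np.exp(log_z)
```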

Table: DSGE model, support of uniform priors.

Parameter   Lower bound   Upper bound   True value   Prior bias   Prior RMSE
α           0.2           0.4           0.330        -0.030       0.065
β           0.95          1             0.990        -0.015       0.021
δ           0.01          0.1           0.025         0.030       0.040
γ           0             5             2.000         0.500       1.527
ρ_z         0             1             0.900        -0.400       0.493
σ_z         0             0.1           0.010         0.030       0.042
ρ_η         0             1             0.700        -0.200       0.351
σ_η         0             0.1           0.005         0.040       0.049
n           6/24          9/24          1/3          -0.021       0.042

Table: Selected statistics, DSGE model. For statistics 11-20, σ_xy indicates the sample covariance of the residuals of the AR(1) models for the respective variables x and y.

 1  log ψ                        12  σ_qq
 2  γ                            13  σ_qn
 3  ρ_η, residuals               14  σ_qr
 4  sample mean c                15  σ_qw
 5  sample mean n                16  σ_cc
 6  sample std. dev. q           17  σ_cn
 7  sample std. dev. c           18  σ_cr
 8  sample std. dev. n           19  σ_cw
 9  sample std. dev. r           20  σ_nn
10  sample std. dev. w           21  σ_ww
11  estimated AR(1) coef., r     22  c/n

Table: DSGE model. Monte Carlo results (1000 replications). Bandwidths tuned using the prior. LC = local constant, LL = local linear, LQ = local quadratic. "90% CI" gives the proportion of times that the true value is in the 90% confidence interval.

                    Bias                          RMSE                   90% CI
Parameter   LC       LL       LQ        LC      LL      LQ       LC
α           0.025    0.002    0.001     0.032   0.013   0.012    0.920
β          -0.008    0.001    0.001     0.010   0.003   0.003    0.993
δ           0.007    0.001   -0.000     0.011   0.004   0.003    0.991
γ           0.037    0.037    0.006     0.158   0.103   0.106    0.986
ρ_z        -0.012   -0.003    0.001     0.040   0.012   0.009    0.877
σ_z        -0.001   -0.001   -0.000     0.003   0.002   0.002    0.893
ρ_η        -0.007   -0.011   -0.009     0.054   0.047   0.049    1.000
σ_η         0.001   -0.000    0.000     0.003   0.002   0.001    0.834
n           0.003    0.001    0.001     0.005   0.004   0.004    0.731

Table: DSGE model. Monte Carlo results (1000 replications). Bandwidths tuned locally. LC = local constant, LL = local linear, LQ = local quadratic. "90% CI" gives the proportion of times that the true value is in the 90% confidence interval.

                    Bias                          RMSE                   90% CI
Parameter   LC       LL       LQ        LC      LL      LQ       LC
α           0.027    0.003    0.001     0.033   0.013   0.012    0.916
β          -0.008    0.001    0.002     0.011   0.003   0.003    1.000
δ           0.008    0.001   -0.000     0.011   0.004   0.003    0.900
γ           0.031    0.036    0.005     0.145   0.103   0.099    0.922
ρ_z        -0.013   -0.002    0.001     0.040   0.010   0.008    0.900
σ_z        -0.001   -0.001   -0.008     0.003   0.002   0.002    0.863
ρ_η        -0.010   -0.012   -0.010     0.054   0.046   0.049    0.794
σ_η         0.001    0.000    0.000     0.003   0.002   0.001    0.835
n          -0.006    0.001    0.002     0.006   0.004   0.004    0.921

Monte Carlo Simulation: Quantile IV Model

- Setup: the quantile instrumental variable model of Chernozhukov and Hansen 2005, $y_i = x_i'\beta + \epsilon_i$, where $Q_\tau(\epsilon_i \mid z_i) = 0$.
- Data generating process: $\epsilon_i = \exp(z_i'\alpha/2)\, v_i$, where $v_i$ is constructed from i.i.d. $N(0,1)$ draws so that $Q_\tau(v_i \mid z_i) = 0$.
- We choose $x_i = (1, \tilde x_i)'$, where $\tilde x_i = \xi_{1i} + \xi_{2i}$, and $z_i = (1, z_{i1}, z_{i2})'$, where $z_{i1} = \xi_{2i} + \xi_{3i}$ and $z_{i2} = \xi_{1i} + \xi_{4i}$. In the above, $\xi_{ij}$, $j = 1, \dots, 4$, are i.i.d. $N(0,1)$. We set $\alpha = (1/5, 1/5, 1/5)$, $\beta = (1, 1)$, and $n = 200$.
- Quantile moment condition: $\hat g(\beta) = \frac{1}{n} \sum_{i=1}^n z_i\left(\tau - 1(y_i \le x_i'\beta)\right)$.
- Weighting matrix: $W = \left(\frac{1}{n} \sum_{i=1}^n z_i z_i'\right)^{-1}$.
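A minimal sketch of this Monte Carlo design and the quantile moment condition; the exact construction of $v_i$ is not fully spelled out on the slide, so recentering a standard normal draw by its $\tau$-quantile (so that $Q_\tau(v_i \mid z_i) = 0$) is an assumption.

```python
# Minimal sketch of the quantile IV design and its moment condition.
import numpy as np
from scipy.stats import norm

def simulate_quantile_iv(n=200, tau=0.5, alpha=(0.2, 0.2, 0.2), beta=(1.0, 1.0), rng=None):
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal((n, 4))                       # xi_{i1}, ..., xi_{i4} i.i.d. N(0,1)
    x = np.column_stack([np.ones(n), xi[:, 0] + xi[:, 1]])                 # (1, x_tilde)
    z = np.column_stack([np.ones(n), xi[:, 1] + xi[:, 2], xi[:, 0] + xi[:, 3]])
    v = rng.standard_normal(n) - norm.ppf(tau)             # assumed: Q_tau(v_i) = 0 by recentering
    eps = np.exp(z @ np.array(alpha) / 2) * v              # heteroscedastic error
    y = x @ np.array(beta) + eps
    return y, x, z

def g_bar(beta, y, x, z, tau=0.5):
    """Quantile moment condition g_hat(beta) = (1/n) sum_i z_i (tau - 1{y_i <= x_i' beta})."""
    u = tau - (y <= x @ beta)
    return z.T @ u / len(y)
```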

Table: Quantile IV model. Monte Carlo results (1000 replications). Bandwidths tuned using the prior. LC = local constant, LL = local linear. "90% CI" gives the proportion of times that the true value is in the 90% confidence interval.

                     β_1      β_2
Bias      Prior      0.5      0.5
          IV         0.104    0.229
          LC         0.005    0.008
          LL         0.003    0.006
RMSE      Prior      1.0      1.0
          IV         0.107    0.232
          LC         0.023    0.045
          LL         0.019    0.038
90% CI    LC         0.858    0.903

Table: Quantile IV model. Monte Carlo results (1000 replications). Bandwidths tuned locally. LC = local constant, LL = local linear. "90% CI" gives the proportion of times that the true value is in the 90% confidence interval.

                     β_1      β_2
Bias      LC         0.009    0.018
          LL         0.005    0.010
RMSE      LC         0.028    0.056
          LL         0.019    0.038
90% CI    LC         0.899    0.912

References

Andrews, D. (1997): "A stopping rule for the computation of generalized method of moments estimators," Econometrica, 65(4), 913-931.
Chaudhuri, P. (1991): "Nonparametric estimates of regression quantiles and their local Bahadur representation," The Annals of Statistics, 19(2), 760-777.
Chernozhukov, V., and C. Hansen (2005): "An IV Model of Quantile Treatment Effects," Econometrica, 73(1), 245-261.
Chernozhukov, V., and H. Hong (2003): "An MCMC Approach to Classical Estimation," Journal of Econometrics, 115(2), 293-346.
Creel, M. D., and D. Kristensen (2011): "Indirect likelihood inference."
Gallant, R., and G. Tauchen (1996): "Which Moments to Match?" Econometric Theory, 12, 363-390.
Gentzkow, M., and J. Shapiro (2013): "Measuring the sensitivity of parameter estimates to sample statistics," unpublished manuscript.
Gourieroux, C., A. Monfort, and E. Renault (1993): "Indirect Inference," Journal of Applied Econometrics, pp. S85-S118.

Koenker, R., and G. S. Bassett (1978): "Regression Quantiles," Econometrica, 46, 33-50.