

Computation Of Asymptotic Distribution For Semiparametric GMM Estimators

Hidehiko Ichimura
Graduate School of Public Policy and Graduate School of Economics
University of Tokyo

A Conference in honor of Professor Dan McFadden

1. Motivation
2. Model
3. Overview and preliminary results
4. Main results
5. Applications
6. Conclusion

1 Motivation

Two reasons:

Many works have clarified the structure of the asymptotic analysis, but, given a particular problem, we still cannot carry out the asymptotic analysis in the same routine way we can for the standard parametric case.

The common structure needs to be clarified further to obtain systematic results on the smoothing-parameter choice problem.

2 Model

$$E\{g(z, \theta, \eta(\cdot, \theta))\} = 0 \quad\text{if and only if}\quad \theta = \theta_0 \ \text{and}\ \eta(\cdot, \theta) = \eta_0(\cdot, \theta_0).$$

$\eta(\cdot, \theta)$ is estimated by $\hat\eta(\cdot, \theta)$.

Notation: $z \in R^k$, $\theta \in R^p$; for each $\theta$, $\eta$ is a function into $R^d$ with norm $\|\cdot\|$. Write $\eta$ instead of $\eta(\cdot, \theta)$ and $g_i(\theta, \eta)$ instead of $g(z_i, \theta, \eta(\cdot, \theta))$. Often

$$g_i(\theta, \eta) = g(z_i, \theta, h_1(h_2(z_i, \theta), \theta))$$

for an unknown function $h_1$ into $R^\ell$ and a known function $h_2$, so that $g$ can be regarded as a function from $Z \times R^\ell$ into $R^m$.

The added generality is useful for handling applications where $\eta$ is a conditional expectation of an unknown variable which itself needs to be estimated, for example. The generality is also useful in applications where individuals' decisions depend on the entire distribution of a variable, which in turn is estimated. This is the case for individual decisions in auction models, for example, or more generally for any decision under an explicitly stated expectation which has to be estimated.

Let

$$G_n(\theta, \eta) = \frac{1}{n}\sum_{i=1}^n g_i(\theta, \eta).$$

We study the generalized method of moments estimator, defined as a solution to the following problem:

$$\inf_{\theta \in \Theta}\, G_n(\theta, \hat\eta)^T \hat A\, G_n(\theta, \hat\eta),$$

where $\hat\eta$ is an estimator of $\eta_0$ and $\hat A$ is an $m \times m$ symmetric matrix which converges in probability to a symmetric positive definite matrix $A$.
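To fix ideas, here is a minimal numerical sketch of the two-step structure (our own illustration, not the slides' procedure; all function names are hypothetical): the first stage computes a kernel density estimate $\hat f$, and the second stage minimizes $G_n(\theta, \hat f)^T \hat A\, G_n(\theta, \hat f)$ over a grid, for the scalar moment $g(z, \theta, f) = f(z) - \theta$ targeting $\theta_0 = E\{f_0(z)\}$ (the density-functional example discussed later in these slides).

```python
import numpy as np

def kde(z_eval, z, h):
    """First stage: Gaussian-kernel density estimate f_hat at the points z_eval."""
    u = (z_eval[:, None] - z[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def two_step_gmm(z, h, grid):
    """Second stage: minimize G_n(theta, f_hat)' A G_n(theta, f_hat) over a
    grid of theta values, with g(z, theta, f) = f(z) - theta and A = 1."""
    f_hat = kde(z, z, h)                       # plug-in nonparametric component
    objective = [np.mean(f_hat - t) ** 2 for t in grid]
    return grid[int(np.argmin(objective))]
```

Because this moment is linear in $\theta$, the exact minimizer is just the sample mean of $\hat f(z_i)$; the grid search merely stands in for the generic $\inf_{\theta \in \Theta}$.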

3 Overview of results

Our approach is a direct application of the standard analysis of two-step GMM estimators. As in the standard analysis, the basic result appeals to a Taylor series expansion with respect to the function $\eta(\cdot, \theta)$ at $\eta_0(\cdot, \theta_0)$.

Let $F$ be a mapping from $B$ into $R^K$, defined over an open subset $O$ of $B$. The Taylor series expansion of $F$ is available when the $r$th Fréchet derivative $F^{(r)}(x)$ exists for every $x \in O$ and is uniformly continuous:

$$F(x + h) = F(x) + F'(x)(h) + \frac{1}{2!}F''(x)(h, h) + \cdots + \frac{1}{r!}F^{(r)}(x)(h, \dots, h) + \omega(x, h),$$

where $\|\omega(x, h)\|_B = o(\|h\|_B^r)$. If the $r$th derivative satisfies a Lipschitz condition with exponent $\alpha > 0$, then $\|\omega(x, h)\|_B = O(\|h\|_B^{r+\alpha})$.
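As a toy check of the remainder rate (our example, with $F(x) = \sum_j x_j^3$ on $R^n$ standing in for a smooth functional on $B$), the second-order remainder is $\omega(x, h) = \sum_j h_j^3$, which scales cubically in $h$: here $r = 2$ with Lipschitz exponent $\alpha = 1$.

```python
import numpy as np

def taylor_remainder(x, h):
    """omega(x, h) = F(x + h) - F(x) - F'(x)(h) - (1/2) F''(x)(h, h)
    for F(x) = sum(x**3), with F'(x)(h) = 3 sum(x^2 h) and
    (1/2) F''(x)(h, h) = 3 sum(x h^2); algebraically omega = sum(h**3)."""
    F = lambda v: np.sum(v**3)
    return F(x + h) - F(x) - 3 * np.sum(x**2 * h) - 3 * np.sum(x * h**2)
```

Halving $h$ divides the remainder by $2^3$, consistent with $\|\omega(x, h)\| = O(\|h\|^{r+\alpha}) = O(\|h\|^3)$.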

Let $n$ denote the sample size. We consider an estimator $\hat\eta(\cdot, \theta_0)$ of an element $\eta_0(\cdot, \theta_0)$ and assume it is asymptotically linear: there exist a stochastic sequence $\{\psi_{ni}\}_{i=1}^n$ with $E(\psi_{ni}) = 0$ (when the function is evaluated at each point) and a deterministic sequence $\{b_n\}$, taking values in the same space as $\eta$, such that

$$\hat\eta(\cdot, \theta_0) - \eta_0(\cdot, \theta_0) - \frac{1}{n}\sum_{i=1}^n \psi_{ni} - b_n = o_p(n^{-1/2}).$$

The norm used for $\eta$ needs to be stronger than the norm used to define the Fréchet differentiability.

Three differences compared to the standard case:
1. Importance of the norm used and the specification of the function space.
2. Presence of bias.
3. A U-statistics CLT rather than the standard CLT.
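For the kernel density estimator (used in the applications below) this decomposition is exact: $\hat f(z_0) - f_0(z_0) = n^{-1}\sum_i \psi_{ni} + b_n$ with zero remainder. The following sketch (our own; the Monte Carlo approximation of the smoothing expectation is an added device) makes the split concrete for $d = 1$, a Gaussian kernel, and $f_0$ standard normal:

```python
import numpy as np

SQRT2PI = np.sqrt(2 * np.pi)

def kde_at(z0, z, h):
    """Kernel density estimate at a single point z0 (Gaussian kernel, d = 1)."""
    return np.mean(np.exp(-0.5 * ((z - z0) / h) ** 2)) / (h * SQRT2PI)

def linear_decomposition(z0, z, h, f0, n_mc=200_000, seed=0):
    """Return (psi_bar, b_n) with
    psi_bar = n^{-1} sum_i psi_ni = f_hat(z0) - E[h^{-1} K((z_i - z0)/h)],
    b_n     = E[h^{-1} K((z_i - z0)/h)] - f0(z0)   (smoothing bias),
    the expectation approximated by Monte Carlo over a fresh N(0,1) sample."""
    rng = np.random.default_rng(seed)
    Ek = kde_at(z0, rng.normal(size=n_mc), h)   # approx E[h^{-1}K((z_i - z0)/h)]
    psi_bar = kde_at(z0, z, h) - Ek
    b_n = Ek - f0(z0)
    return psi_bar, b_n
```

By construction the two pieces add up to $\hat f(z_0) - f_0(z_0)$ exactly; for estimators other than the kernel density estimator the $o_p(n^{-1/2})$ remainder is genuinely nonzero.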

4 Main results

Let $\nabla g = E\{\nabla g(z, \theta_0, \eta_0(\cdot, \theta_0))\}$, where

$$\nabla g(z, \theta, \eta(\cdot, \theta)) = \partial g(z, \theta, \eta(\cdot, \theta))/\partial\theta + \partial g(z, \theta, \eta(\cdot, \theta))/\partial\eta \cdot \partial\eta(\cdot, \theta)/\partial\theta,$$

let $H = (\nabla g)^T A\, (\nabla g)$, and denote the expectation conditional on $z_i$ by $E\{\cdot \mid z_i\}$. Let

$$\Omega_n = Var\left[\, g(z_1, \theta_0, \eta_0) + E\{\partial g(z_1, \theta_0, \eta_0)/\partial\eta \cdot \psi_{n2} \mid z_1\} + E\{\partial g(z_2, \theta_0, \eta_0)/\partial\eta \cdot \psi_{n1} \mid z_1\} \,\right].$$

The main result is the following. Suppose $\hat\theta$ is consistent for $\theta_0$. Under the conditions below, $n^{1/2}(\hat\theta - \theta_0)$ converges in distribution to a normal random vector with mean

$$-H^{-1}(\nabla g)^T A\; \underset{n\to\infty}{\operatorname{plim}}\; n^{-1/2}\sum_{i=1}^n \partial g_i(\theta_0, \eta_0)/\partial\eta \cdot b_n$$

and variance-covariance matrix

$$\lim_{n\to\infty} H^{-1}(\nabla g)^T A\, \Omega_n\, A\, (\nabla g)\, H^{-1}.$$
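The variance has the familiar GMM sandwich structure. A small helper (our own, with hypothetical names) computes it from the ingredients $\nabla g$, $A$, and $\Omega$; with efficient weighting $A = \Omega^{-1}$ it collapses to $((\nabla g)^T \Omega^{-1} \nabla g)^{-1}$, exactly as in parametric GMM:

```python
import numpy as np

def sandwich_variance(grad_g, A, Omega):
    """H^{-1} (grad_g)' A Omega A (grad_g) H^{-1} with H = (grad_g)' A (grad_g).

    grad_g : (m, p) Jacobian of the moment function,
    A      : (m, m) limiting weight matrix,
    Omega  : (m, m) variance of the (corrected) moment."""
    H = grad_g.T @ A @ grad_g
    H_inv = np.linalg.inv(H)
    return H_inv @ grad_g.T @ A @ Omega @ A @ grad_g @ H_inv
```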

Here are the conditions:

1. $\{z_i\}_{i=1}^n$ are independent and identically distributed.
2. $(\theta_0, \eta_0(\cdot, \theta_0))$ is an interior point of $\{(\theta, \eta(\cdot, \theta))\}_{\theta \in \Theta}$.
3. $\operatorname{plim}_{n\to\infty} \hat A = A$, where $A$ is symmetric and positive definite.
4. $g(z, \theta, \eta)$ is Fréchet differentiable with respect to $(\theta, \eta)$ and the Fréchet derivatives satisfy Lipschitz continuity conditions: for some $C_j(z) > 0$ with $E\{C_j(z)\} < \infty$ ($j = 1, 2, 3, 4$),

$$\left\| \partial g(z, \theta, \eta)/\partial\theta - \partial g(z, \theta_0, \eta_0)/\partial\theta \right\|_{R^{m\times p}} \le C_1(z)\, \|\theta - \theta_0\|_{R^p} + C_2(z)\, \|\eta - \eta_0\|,$$
$$\left\| \partial g(z, \theta, \eta)/\partial\eta - \partial g(z, \theta_0, \eta_0)/\partial\eta \right\|_L \le C_3(z)\, \|\theta - \theta_0\|_{R^p} + C_4(z)\, \|\eta - \eta_0\|.$$

5. $\|\partial g(z, \theta, \eta(\cdot, \theta))/\partial\eta\|_L + \|\partial g(z, \theta, \eta(\cdot, \theta))/\partial\theta\|_{R^{m\times p}} \le C_0(z)$ and $E\{C_0(z)\} < \infty$.
6. $E\{\nabla g(z, \theta_0, \eta_0(\cdot, \theta_0))\} \equiv \nabla g$ is finite and has full rank.
7. $\theta \mapsto \eta_0(\cdot, \theta)$, as a mapping from $\Theta$ into the function space, is continuous at $\theta_0$.
8. $\sup_{\theta \in N(\theta_0, \varepsilon)} \|\hat\eta(\cdot, \theta) - \eta_0(\cdot, \theta)\| = o_p(n^{-1/4})$, and $\hat\eta(\cdot, \theta_0)$ is asymptotically linear for $\eta_0(\cdot, \theta_0)$ with rate $n^{-1/2}$.
9. $\operatorname{plim}_{n\to\infty}\, n^{-3/2}\sum_{i=1}^n \partial g_i(\theta_0, \eta_0)/\partial\eta \cdot \psi_{ni} = 0$.

10. $\operatorname{plim}_{n\to\infty}\, n^{-1/2}\sum_{i=1}^n \partial g_i(\theta_0, \eta_0)/\partial\eta \cdot b_n$ exists. Typically we will find conditions under which the limit is 0. Under the condition on $\partial g_i(\theta_0, \eta_0)/\partial\eta$, the term can be bounded:

$$\left\| n^{-1/2}\sum_{i=1}^n \partial g_i(\theta_0, \eta_0)/\partial\eta \cdot b_n \right\|_{R^m} \le \left( \frac{1}{n}\sum_{i=1}^n C_0(z_i) \right) n^{1/2}\, \|b_n\|.$$

Thus if $n^{1/2}\|b_n\| = o(1)$, then the limit is 0.

11. $\hat\eta(\cdot, \theta)$ is continuously differentiable and $\sup_{\theta \in N(\theta_0, \varepsilon)} \|\partial\hat\eta(\cdot, \theta)/\partial\theta - \partial\eta_0(\cdot, \theta)/\partial\theta\| = o_p(1)$.
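Condition 10 is usually verified by showing $n^{1/2}\|b_n\| \to 0$. For the kernel density estimator with a Gaussian (second-order) kernel and $f_0$ standard normal (a toy case of ours, chosen because the smoothing expectation is available in closed form: $f_0 * K_h$ is the $N(0, 1 + h^2)$ density), the pointwise bias is $O(h^2)$, so $n^{1/2} b_n \to 0$ whenever $n h^4 \to 0$:

```python
import numpy as np

def kde_bias(z0, h):
    """Exact pointwise smoothing bias b_n(z0) = E f_hat(z0) - f0(z0) for a
    Gaussian kernel and f0 = N(0,1): f0 * K_h is the N(0, 1 + h^2) density."""
    s = np.sqrt(1.0 + h**2)
    phi = lambda t, sd: np.exp(-0.5 * (t / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return phi(z0, s) - phi(z0, 1.0)
```

Halving $h$ shrinks the bias by roughly a factor of four, the $O(h^2)$ rate of a second-order kernel; an $\ell$th-order kernel from the class $\mathcal K_\ell$ introduced below would improve this to $O(h^\ell)$.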

5 Applications

To carry out these computations, we need to find the relevant Fréchet derivatives and to know the asymptotic linear expressions for the nonparametric estimators used in the estimation.

For the kernel density estimator, the terms in the asymptotic linear expression are

$$\psi_{ni} = h^{-d} K\!\left(\frac{z_i - z}{h}\right) - E\left\{ h^{-d} K\!\left(\frac{z_i - z}{h}\right) \right\}, \qquad b_n = E\left\{ h^{-d} K\!\left(\frac{z_i - z}{h}\right) \right\} - f(z).$$

For the kernel regression estimator of $g(x) = E(Y \mid X = x)$,

denoting $\varepsilon = Y - g(X)$, the asymptotic linear approximation of $(\hat g - g)\, I(\hat f > b)$ takes the following form:

$$\psi_{ni} = \varepsilon_i\, h^{-d} K\!\left(\frac{x_i - x}{h}\right) \frac{I(f(x) > b)}{f(x)}, \qquad b_n = E\left[ I(f(x) > b)\, (g(x_i) - g(x))\, h^{-d} K\!\left(\frac{x_i - x}{h}\right) \Big/ f(x) \right].$$

To control the bias, a certain type of kernel function needs to be used. The following "higher-order kernel" of Bartlett (1963) is a standard device in the literature. Let $\delta_{j0} = 1$ if $j = 0$ and $\delta_{j0} = 0$ for any other integer value $j$. For $\ell \ge 1$, $\mathcal K_\ell$ is the class of functions $k : R \to R$, symmetric around zero, such that

$$\int_{-\infty}^{\infty} t^j k(t)\, dt = \delta_{j0} \quad\text{for } j = 0, 1, \dots, \ell - 1$$

and, for some $\varepsilon > 0$,

$$\lim_{|t| \to \infty} k(t)\left(1 + |t|^{\ell + 1 + \varepsilon}\right) < \infty.$$

A $d$-dimensional kernel function $K$ of order $\ell$ is constructed by $K(t_1, \dots, t_d) = k(t_1) \cdots k(t_d)$ for $k \in \mathcal K_\ell$.

In order to improve the order of the bias by a higher-order kernel, the underlying density is required to be correspondingly smooth. The following notion of smoothness is used by Robinson (1988). Let $[\mu]$ denote the largest integer strictly smaller than $\mu$. $\mathcal G_\mu^\alpha$, $\alpha > 0$, $\mu > 0$, is the class of functions $g : R^d \to R$ satisfying: $g$ is $[\mu]$-times partially differentiable for all $z \in R^d$;

for some $\rho > 0$,

$$\sup_{y:\, \|y - z\|_{R^d} < \rho} \frac{|g(y) - g(z) - Q(y, z)|}{\|y - z\|_{R^d}^{\mu}} \le h(z) \quad\text{for all } z,$$

where $Q = 0$ when $[\mu] = 0$, and $Q$ is a $[\mu]$-th degree homogeneous polynomial in $(y - z)$ with coefficients the partial derivatives of $g$ at $z$ of orders 1 through $[\mu]$ when $[\mu] \ge 1$; and $g(z)$, its partial derivatives of order $[\mu]$ and less, and $h(z)$ have finite $\alpha$th moments. The bounded-function case is denoted by $\mathcal G_\mu^\infty$.

Let $K$ be a higher-order kernel constructed as above. Robinson (1988) has shown the following results:

$$E\left\{ \left[ E\left( h^{-d} K((z_2 - z_1)/h) \mid z_1 \right) - f(z_1) \right]^2 \right\} = O\!\left(h^{2\mu}\right)$$

when $f \in \mathcal G_\mu^\infty$ for some $\mu > 0$ and $k \in \mathcal K_{[\mu]+1}$,

and

$$E\left\{ (g(z_2) - g(z_1))\, h^{-d} K((z_2 - z_1)/h) \right\} = O\!\left(h^{\min(\mu,\ \lambda + 1,\ \lambda + \mu)}\right)$$

when $f \in \mathcal G_\lambda^\infty$, $g \in \mathcal G_\mu^\alpha$, and $k \in \mathcal K_{[\lambda] + [\mu] + 1}$.

The following estimator $\hat\theta$ of $E\{f\}$ is examined by Schweder (1975), Ahmad (1978), and Hasminskii and Ibragimov (1979):

$$0 = \frac{1}{n}\sum_{i=1}^n \left[ \hat f(z_i) - \hat\theta \right].$$

In this application $g(z, \theta, f) = f(z) - \theta$. Note that $\partial g(z, \theta, f)/\partial\theta = -1$ and that $\partial g(z, \theta, f)/\partial f = I_z$, where $I_z$ evaluates a given function at the point $z$. Since neither derivative depends on $\theta$ or $f$, condition 4 holds trivially.

Condition 5 would require the function space to be restricted to a continuous and bounded class of functions. $\nabla g(z, \theta, f) = -1$, so that $\nabla g = -1$. Next,

$$\partial g(z_1, \theta_0, f_0)/\partial f \cdot \psi_{n2} = h^{-d} k\!\left(\frac{z_2 - z_1}{h}\right) - E\left\{ h^{-d} k\!\left(\frac{z_2 - z_1}{h}\right) \Big|\, z_1 \right\},$$
$$\partial g(z_2, \theta_0, f_0)/\partial f \cdot \psi_{n1} = h^{-d} k\!\left(\frac{z_1 - z_2}{h}\right) - E\left\{ h^{-d} k\!\left(\frac{z_1 - z_2}{h}\right) \Big|\, z_2 \right\}.$$

Thus

$$E\left\{ \partial g(z_1, \theta_0, f_0)/\partial f \cdot \psi_{n2} \mid z_1 \right\} = 0$$

and

$$E\left\{ \partial g(z_2, \theta_0, f_0)/\partial f \cdot \psi_{n1} \mid z_1 \right\} = E\left\{ h^{-d} k\!\left(\frac{z_1 - z_2}{h}\right) \Big|\, z_1 \right\} - E\left\{ h^{-d} k\!\left(\frac{z_1 - z_2}{h}\right) \right\}.$$

Therefore

$$\Omega_n = Var\left[ f_0(z_1) - \theta_0 + E\left\{ h^{-d} k\!\left(\frac{z_1 - z_2}{h}\right) \Big|\, z_1 \right\} - E\left\{ h^{-d} k\!\left(\frac{z_1 - z_2}{h}\right) \right\} \right] \to 4\, E\left\{ [f_0(z_1) - \theta_0]^2 \right\}.$$

Also, Robinson's result allows us to find conditions under which the asymptotic bias is 0.

Another example is the partially linear regression model of Cosslett (1984), Shiller (1984), and Wahba (1984). For $x \in R^K$, $y \in R$, $w \in R^d$ the model is

$$y = x^T \theta_0 + \eta(w) + \varepsilon,$$

where $E(\varepsilon \mid w, x) = 0$.

Consider an estimator which solves the following equations:

$$0 = \frac{1}{n}\sum_{i=1}^n \left[ y_i - x_i^T \hat\theta - \hat E(y \mid w_i) + \hat E(x^T \mid w_i)\hat\theta \right] \hat I_i\, x_i,$$

where $\hat I_i = I(\hat f(w_i) > b)$ and $I$ is the indicator function. The following lemma allows us to consider

$$0 = \frac{1}{n}\sum_{i=1}^n \left[ y_i - x_i^T \hat\theta - \hat E(y \mid w_i) + \hat E(x^T \mid w_i)\hat\theta \right] I_i\, x_i$$

instead of the feasible GMM, where $I_i = I(f(w_i) > b)$:

$$\Pr\left( \text{at least one of } \hat I_i - I_i \ne 0 \right) \to 0$$

when $f \in \mathcal G_\mu^\infty$ for some $\mu > 0$, $k \in \mathcal K_{[\mu]+1}$, $|k(0)| < \infty$, $b$ is positive and bounded, $n h^d b^2 / \log n \to \infty$, $b/h \to \infty$, and there is no positive probability that $f(w_i) = b$.
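A minimal sketch of this trimmed estimator for scalar $x$ (our own implementation, with hypothetical names; Gaussian kernels throughout and one common bandwidth for the density and regression estimates): solving the estimating equation for $\hat\theta$ gives an explicit ratio.

```python
import numpy as np

def nw(target, w, h):
    """Nadaraya-Watson estimate of E(target | w) at the sample points w."""
    ker = np.exp(-0.5 * ((w[:, None] - w[None, :]) / h) ** 2)
    return (ker @ target) / ker.sum(axis=1)

def trimmed_plr(y, x, w, h, b):
    """Solve 0 = n^{-1} sum_i [y_i - x_i theta - E_hat(y|w_i)
    + E_hat(x|w_i) theta] I_hat_i x_i for scalar theta, i.e.
    theta_hat = sum_i I_i x_i (y_i - E_hat(y|w_i))
              / sum_i I_i x_i (x_i - E_hat(x|w_i))."""
    ker = np.exp(-0.5 * ((w[:, None] - w[None, :]) / h) ** 2)
    f_hat = ker.mean(axis=1) / (h * np.sqrt(2 * np.pi))   # density of w
    trim = f_hat > b                                      # I_hat_i
    y_til = y - nw(y, w, h)
    x_til = x - nw(x, w, h)
    return np.sum(trim * x * y_til) / np.sum(trim * x * x_til)
```

On simulated data from $y = x\theta_0 + \sin(w) + \varepsilon$ the ratio recovers $\theta_0$ up to smoothing bias and sampling noise.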

Let $z = (w, x, y)$. In this example

$$g(z, \theta, \eta) = \left[ y - x^T \theta - \eta_1(w) + \eta_2(w)^T \theta \right] I\, x,$$

so that

$$\nabla g(z, \theta, \eta) = -I\, x\, [x - \eta_2(w)]^T.$$

Since $\nabla g(z, \theta, \eta)$ is linear in $\eta$ and does not depend on $\theta$, verification of the conditions is easy. One can verify that when $b \to 0$ and $\left\| E\left\{ x [x - E(x \mid w)]^T \right\} \right\|_{R^{K^2}} < \infty$ the following holds:

$$\nabla g = -E\left\{ I\, x\, [x - E(x \mid w)]^T \right\} \to -E\left\{ x\, [x - E(x \mid w)]^T \right\}.$$

To examine the asymptotic distribution, note that the Fréchet derivative of $g$ with respect to $\eta$ is

$$\partial g/\partial\eta\,(h) = -\left[ h_1(w) - h_2(w)^T \theta_0 \right] I\, x,$$

so that, writing $u = y - E(y \mid w)$, $v = x - E(x \mid w)$, and $\varepsilon = y - x^T \theta_0 - \eta_0(w)$,

$$\partial g(z_1, \theta_0, \eta_0)/\partial\eta \cdot \psi_{n2} = -\left( u_2 - v_2^T \theta_0 \right) \frac{h^{-d} K((w_2 - w_1)/h)}{f(w_1)}\, I(f(w_1) > b)\, x_1,$$
$$\partial g(z_2, \theta_0, \eta_0)/\partial\eta \cdot \psi_{n1} = -\left( u_1 - v_1^T \theta_0 \right) \frac{h^{-d} K((w_1 - w_2)/h)}{f(w_2)}\, I(f(w_2) > b)\, x_2.$$

Thus, noting that $u = y - E(x \mid w)^T \theta_0 - \eta_0(w) = \varepsilon + v^T \theta_0$, so that $u - v^T \theta_0 = \varepsilon$,

$$E\left\{ \partial g(z_1, \theta_0, \eta_0)/\partial\eta \cdot \psi_{n2} \mid z_1 \right\} = 0$$

and

$$E\left\{ \partial g(z_2, \theta_0, \eta_0)/\partial\eta \cdot \psi_{n1} \mid z_1 \right\} = -\varepsilon_1\, E\left[ \frac{h^{-d} K((w_1 - w_2)/h)}{f(w_2)}\, I(f(w_2) > b)\, x_2 \,\Big|\, w_1 \right],$$

so that

$$\Omega_n = Var\left[ \varepsilon_1 \left( x_1 I_1 - E\left[ \frac{h^{-d} K((w_1 - w_2)/h)}{f(w_2)}\, I(f(w_2) > b)\, x_2 \,\Big|\, w_1 \right] \right) \right] \to Var\left[ \varepsilon_1\, (x_1 - E(x_1 \mid w_1)) \right] \quad\text{as } b \to 0.$$

6 Conclusion

More examples need to be worked out. Analogous results for series estimators should be established.

The first-stage expansion result can be obtained including parameters, so that the conditions can be simplified. A general M-estimator is considered in joint work with Simon Lee (UCL) without assuming smoothness.