Generalized Method of Moments: I

References
- Chapter 9, R. Davidson and J.G. MacKinnon, Econometric Theory and Methods, 2004, Oxford.
- Chapter 5, B.E. Hansen, Econometrics, 2006. http://www.ssc.wisc.edu/~bhansen/notes/notes.htm
- More complete treatments are provided in: Chapters 3, 4, 6, and 7, F. Hayashi, Econometrics, 2000, Princeton; and A.R. Hall, Generalized Method of Moments, 2005, Oxford.
Generalized Method of Moments, or GMM, is an approach to estimation and inference that includes as special cases: OLS, GLS, FGLS, IV (including 2SLS and 3SLS), and NLS. The appeal of GMM is that it provides a theoretically and computationally tractable unified approach to efficiently estimating linear and nonlinear regressions with endogenous regressors and/or nonspherical disturbances. It was introduced by Lars Hansen in his Ph.D. thesis, published in 1982 (Econometrica), and is being applied in virtually every area of economics, agricultural economics, and finance, and with all types of data (i.e., time series, cross-sectional, and panel data).
I. GMM for the Linear Regression Model

Let's begin with the linear regression model

$y_t = x_t'\beta + \varepsilon_t$,  t = 1, ..., T,

where $x_t$ and $\beta$ are k×1 and T > k. Or,

$y = X\beta + \varepsilon$,

where y and ε are T×1, X is T×k, β is k×1, $E(\varepsilon) = 0$, and $E(\varepsilon\varepsilon') = \Sigma$. [So, $x_t'$ is the t-th row of X.] Some or even all of the elements of $x_t$ can be correlated with $\varepsilon_t$. Assume, however, that there exist m instrumental variables, $w_t$ (m×1), where T > m and m ≥ k. That is, $E(\varepsilon_t w_t) = 0$, t = 1, ..., T. If $E(\varepsilon_t x_{ti}) = 0$ for t = 1, ..., T for any i = 1, ..., k, then $x_{ti} \in w_t$. In addition, we will assume that $E(\varepsilon_t \varepsilon_s \mid w_t, w_s) = E(\varepsilon_t \varepsilon_s) = \Sigma_{ts}$.
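To make the setup concrete, here is a minimal simulation sketch in Python (the DGP, parameter values, and variable names are illustrative assumptions, not taken from the notes): one endogenous regressor, k = 1, with m = 3 valid instruments.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500                                   # sample size (illustrative)

# m = 3 instruments, exogenous by construction
W = rng.normal(size=(T, 3))

# x_t loads on the instruments and on the disturbance itself,
# so E(x_t eps_t) != 0 (endogenous) while E(w_t eps_t) = 0 (valid).
eps = rng.normal(size=T)
x = W @ np.array([1.0, 0.5, 0.5]) + 0.8 * eps + rng.normal(size=T)

beta = 2.0                                # true coefficient (k = 1)
y = x * beta + eps
X = x.reshape(-1, 1)                      # T x k design matrix

print("corr(x, eps):", np.corrcoef(x, eps)[0, 1])   # clearly nonzero
```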
The model's assumption that $E(\varepsilon_t w_t) = 0$, t = 1, ..., T, implies the following population or theoretical set of moment conditions:

$E(w_t(y_t - x_t'\beta)) = 0$,  t = 1, ..., T  {i.e., $E(w_{ti}(y_t - x_t'\beta)) = 0$, t = 1, ..., T; i = 1, ..., m},

which simply says that the m instruments, $w_{t1}, \ldots, w_{tm}$, are orthogonal to the disturbance term, $\varepsilon_t$, for each t. The idea is to estimate β so that the corresponding sample or empirical moments,

$\sum_{t=1}^{T} w_{ti}(y_t - x_t'\hat\beta) = w_i'(y - X\hat\beta)$,  i = 1, ..., m,

are close to zero, where $w_i = [w_{1i} \cdots w_{Ti}]'$.
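A quick sketch of these empirical moments in code (same illustrative DGP as above; the helper name sample_moments is my own): the moment vector is near zero at the true β and far from zero elsewhere.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
W = rng.normal(size=(T, 3))                       # m = 3 instruments
eps = rng.normal(size=T)
x = W @ np.array([1.0, 0.5, 0.5]) + 0.8 * eps + rng.normal(size=T)
y, X = 2.0 * x + eps, x.reshape(-1, 1)            # true beta = 2

def sample_moments(b, y, X, W):
    """Empirical moment vector (1/T) W'(y - Xb), one entry per instrument."""
    return W.T @ (y - X @ b) / len(y)

print(sample_moments(np.array([2.0]), y, X, W))   # near zero at true beta
print(sample_moments(np.array([0.0]), y, X, W))   # far from zero otherwise
```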
Digression: The method of moments (MM) estimator, which is among the oldest estimation procedures in modern statistics, directs us to estimate parameters that are moments of probability distributions by using their sample analogues. For instance, if the parameter of interest is $\mu_y = E(y)$, then the MM estimator of $\mu_y$ is simply the sample mean of y, $\bar{y}$. (What is the MM estimator of $\sigma_y^2$?)
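One possible answer sketch to that question (my own illustrative example): since $\sigma_y^2 = E(y^2) - \mu_y^2$, the MM estimator replaces both population moments with their sample analogues, which yields the 1/T (not 1/(T-1)) sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=1000)     # illustrative sample

mu_hat = y.mean()                                 # MM estimator of E(y)
sigma2_hat = (y**2).mean() - mu_hat**2            # MM estimator of var(y)

print(mu_hat, sigma2_hat, np.var(y, ddof=0))      # last two coincide
```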
The sample moment conditions form a set of m equations in k unknowns (i.e., the k elements of β).

Suppose m = k. Then the model is just identified; there exists a unique β-hat for which the sample moments are all equal to zero. That solution is the (simple) IV estimator for β,

$\hat\beta_{IV} = (W'X)^{-1} W'y$.

Note that if the x's are all predetermined, then W = X and this reduces to the OLS estimator.

Suppose m > k. Then the model is overidentified. In general, there will not exist a β-hat that will make all the sample moments equal to zero. Instead, we select β-hat to make k independent linear combinations of the m restrictions equal to zero. Which k linear combinations should we use? We could, for instance, simply select the first k of the m conditions and set those equal to zero. Or we could select the last k of the m conditions and set those equal to zero. Or ...
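Here is a sketch of the just identified case (m = k = 1, illustrative DGP): the IV estimator recovers the true coefficient, while OLS is pulled away by the endogeneity.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
w = rng.normal(size=T)                         # one instrument (m = k = 1)
eps = rng.normal(size=T)
x = 1.0 * w + 0.8 * eps + rng.normal(size=T)   # endogenous regressor
y = 2.0 * x + eps                              # true beta = 2

W, X = w.reshape(-1, 1), x.reshape(-1, 1)

beta_iv = np.linalg.solve(W.T @ X, W.T @ y)    # (W'X)^{-1} W'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # OLS = IV with W = X

print("IV :", beta_iv)                         # close to 2.0
print("OLS:", beta_ols)                        # biased away from 2.0
```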
However we decide to select k linear combinations of the m restrictions to identify β, we can write the resulting set of sample moment or orthogonality restrictions as

$J'W'(y - X\beta) = 0$,

where J is an m×k matrix with full column rank k. In effect, the columns of WJ form a set of k instruments constructed as linear combinations of the original m instruments. (Another way to think about this: for any given β, $W'(y - X\beta)$ are m deviations from 0, and $J'[W'(y - X\beta)]$ are k linear combinations of these m deviations.) The orthogonality condition directs us to choose β-hat so that the residual vector is orthogonal to each of these k instruments. The solution to this just identified IV estimation problem will be the estimator

$\hat\beta = (J'W'X)^{-1} J'W'y = (X'WPW'X)^{-1} X'WPW'y$.
P is any positive definite m×m matrix, and it can vary with T. Note that any given J will uniquely determine P, and any given P will uniquely determine J, so the estimator can be specified by choosing J or by choosing P. Under suitable regularity conditions (e.g., the x's and ε's are stationary and ergodic with existence of appropriate moments, and the ε's are asymptotically independent), the estimator is consistent and asymptotically normal.

What is the asymptotic variance matrix of the estimator? Since

$\hat\beta = (J'W'X)^{-1} J'W'y$,

we have

$\hat\beta = \beta + (J'W'X)^{-1} J'W'\varepsilon$

and

$T^{1/2}(\hat\beta - \beta) = (J'W'X/T)^{-1} J'W'\varepsilon / T^{1/2}$.
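Both the J-form and the P-form are easy to check numerically. A sketch (illustrative DGP; beta_from_J is my own helper name) showing that different admissible choices of J yield different, but both consistent, estimates in the overidentified case:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
W = rng.normal(size=(T, 3))                    # m = 3 instruments
eps = rng.normal(size=T)
x = W @ np.array([1.0, 0.5, 0.5]) + 0.8 * eps + rng.normal(size=T)
y, X = 2.0 * x + eps, x.reshape(-1, 1)         # k = 1 < m: overidentified

def beta_from_J(J):
    """Solve J'W'(y - Xb) = 0, i.e. b = (J'W'X)^{-1} J'W'y."""
    return np.linalg.solve(J.T @ W.T @ X, J.T @ W.T @ y)

# Choice 1: keep only the first k = 1 moment condition.
J1 = np.array([[1.0], [0.0], [0.0]])
# Choice 2: J = P W'X with P = I_m, weighting all m conditions equally.
J2 = W.T @ X

print(beta_from_J(J1), beta_from_J(J2))        # both near 2.0, not identical
```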
Consequently,

$\mathrm{avar}(\hat\beta) = (\operatorname{plim} \tfrac{1}{T} J'W'X)^{-1} (\operatorname{plim} \tfrac{1}{T} J'W'\Sigma WJ) (\operatorname{plim} \tfrac{1}{T} X'WJ)^{-1}$.

The sandwich form of the avar matrix is a signal of an inefficient estimator. (Recall the sandwich form of the avar matrix for the OLS estimator when the disturbances are nonspherical vs. the form of the avar matrix for the GLS estimator.) The sandwich form of the asymptotic variance of this estimator can be eliminated, and therefore the efficiency of the estimator improved, if the linear combinations that determine J are chosen so that

$J = (W'\Sigma W)^{-1} W'X$.

In this case,

$\mathrm{avar}(\hat\beta) = (\operatorname{plim} \tfrac{1}{T} X'W(W'\Sigma W)^{-1}W'X)^{-1}$,

and the estimator of β implied by this choice of J is the optimal or efficient GMM estimator,

$\hat\beta_{GMM} = (X'W(W'\Sigma W)^{-1}W'X)^{-1} X'W(W'\Sigma W)^{-1}W'y$.
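The efficient estimator can be computed directly when Σ is known, which is essentially only possible inside a simulation; a sketch (illustrative heteroskedastic DGP, helper names my own):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
W = rng.normal(size=(T, 3))
# Heteroskedastic disturbances with variances drawn independently of W,
# so Sigma = diag(sig2); "known" only because we built the simulation.
sig2 = 0.5 + rng.exponential(scale=1.0, size=T)
eps = rng.normal(size=T) * np.sqrt(sig2)
x = W @ np.array([1.0, 0.5, 0.5]) + 0.8 * eps + rng.normal(size=T)
y, X = 2.0 * x + eps, x.reshape(-1, 1)         # true beta = 2

# With diagonal Sigma, W'Sigma W = sum_t sig2_t w_t w_t'.
WSW = (W * sig2[:, None]).T @ W
P = np.linalg.inv(WSW)                         # optimal weighting matrix
A = X.T @ W @ P
beta_eff = np.linalg.solve(A @ W.T @ X, A @ W.T @ y)
print(beta_eff)                                # efficient GMM, near 2.0
```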
Notes

1. The efficient GMM estimator is efficient not just in the class of GMM estimators but in the class of all estimators that estimate β from the condition that $E(W'(y - X\beta)) = 0$. (It is semiparametrically efficient; semiparametric because GMM uses only a partial characterization of the joint distribution function.)

2. This efficient GMM estimator is not feasible, since Σ is generally unknown. However, an asymptotically equivalent estimator can be constructed by replacing W'ΣW with a consistent estimator; a two-step version is sketched after the list of issues below.

3. Efficiency is defined here relative to the given set of instruments.

Issues
- feasible GMM
- selection of appropriate and optimal instruments
- inference
- extension to nonlinear regressions
- extensions to multiple equation linear and nonlinear regressions
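A minimal sketch of the feasible two-step idea from note 2 (illustrative DGP and helper names; the first step uses P = (W'W)^{-1}, i.e., 2SLS, and the second step re-weights with a White-style estimate of W'ΣW built from first-step residuals):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
W = rng.normal(size=(T, 3))
sig2 = 0.5 + rng.exponential(scale=1.0, size=T)   # unknown in practice
eps = rng.normal(size=T) * np.sqrt(sig2)
x = W @ np.array([1.0, 0.5, 0.5]) + 0.8 * eps + rng.normal(size=T)
y, X = 2.0 * x + eps, x.reshape(-1, 1)            # true beta = 2

def gmm(P):
    """beta-hat = (X'WPW'X)^{-1} X'WPW'y for a given weighting matrix P."""
    A = X.T @ W @ P
    return np.linalg.solve(A @ W.T @ X, A @ W.T @ y)

# Step 1: any consistent estimator, e.g. P = (W'W)^{-1} (this is 2SLS).
beta1 = gmm(np.linalg.inv(W.T @ W))
e = y - X @ beta1                                 # first-step residuals

# Step 2: estimate W'Sigma W by sum_t e_t^2 w_t w_t' and re-weight.
WSW_hat = (W * (e ** 2)[:, None]).T @ W
beta2 = gmm(np.linalg.inv(WSW_hat))
print("step 1:", beta1, " step 2:", beta2)
```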