Econometrics 2, Fall 2004
Generalized Method of Moments (GMM) Estimation
Heino Bohn Nielsen

Outline of the Lecture
(1) Introduction.
(2) Moment conditions and method of moments (MM) estimation.
    Ordinary least squares (OLS) estimation.
    Instrumental variables (IV) estimation.
(3) GMM defined in the general case.
(4) Specification test.
(5) Linear GMM. Generalized instrumental variables (GIVE or 2SLS) estimation.
Idea of GMM
Estimation under weak assumptions, based on so-called moment conditions. Moment conditions are statements involving the data and the parameters. They arise naturally in many contexts. For example:

(A) In a regression model, y_t = x_t'β + ε_t, we might think that E[y_t | x_t] = x_t'β. This implies the moment condition
    E[x_t ε_t] = E[x_t (y_t − x_t'β)] = 0.

(B) Consider the economic relation
    y_t = β1 E[x_{t+1} | I_t] + ε_t = β1 x_{t+1} + (β1 (E[x_{t+1} | I_t] − x_{t+1}) + ε_t) = β1 x_{t+1} + u_t.
Under rational expectations the expectation error, E[x_{t+1} | I_t] − x_{t+1}, should be orthogonal to the information set, I_t, and for z_t ∈ I_t we have the moment condition E[z_t u_t] = 0.

Properties of GMM
GMM is a large-sample estimator: desirable properties as T → ∞.
Consistent under weak assumptions. No distributional assumptions as in maximum likelihood (ML) estimation.
Asymptotically efficient in the class of estimators that use the same amount of information.
Many estimators are special cases of GMM; a unifying framework for comparing estimators.
GMM is a nonlinear procedure. We do not need a regression setup E[y_t | x_t] = h(x_t; β); we can have E[f(y_t, x_t; β)] = 0.
Moment Conditions and MM Estimation
Consider a variable y_t with some (possibly unknown) distribution. Assume that the mean µ = E[y_t] exists. We want to estimate µ. We could state the population moment condition
    E[y_t − µ] = 0, or E[f(y_t, µ)] = 0, where f(y_t, µ) = y_t − µ.
The parameter µ is identified by the condition if there is a unique solution, in the sense that E[f(y_t, µ)] = 0 only if µ = µ0.

We cannot calculate E[f(y_t, µ)] from an observed sample, y_1, y_2, ..., y_t, ..., y_T. Define the sample moment condition as
    g_T(µ) = (1/T) Σ_{t=1}^T f(y_t, µ) = (1/T) Σ_{t=1}^T (y_t − µ) = 0.   (*)
By a law of large numbers, sample moments converge to population moments,
    g_T(µ) → E[f(y_t, µ)] for T → ∞.   (**)
The method of moments estimator, µ̂_MM, is the solution to (*), i.e.
    µ̂_MM = (1/T) Σ_{t=1}^T y_t.
The sample average can be seen as an MM estimator. The MM estimator is consistent: under weak regularity conditions (**) implies µ̂_MM → µ0.
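As a minimal numerical sketch (Python/NumPy, simulated data; not part of the lecture), the sample average is exactly the root of the sample moment condition g_T(µ) = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=500)   # sample whose mean we estimate

# Sample moment condition g_T(mu) = (1/T) * sum_t (y_t - mu)
def g_T(mu, y):
    return np.mean(y - mu)

# The MM estimator solves g_T(mu) = 0, which gives the sample average
mu_mm = np.mean(y)
```

By the law of large numbers, µ̂_MM is close to the true mean 2.0 used in the simulation.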
OLS as an MM Estimator
Consider the regression model with K explanatory variables
    y_t = x_t'β + ε_t.
Assume no contemporaneous correlation (the minimum for consistency of OLS):
    E[x_t ε_t] = E[x_t (y_t − x_t'β)] = 0.   (K × 1)
K moment conditions for the K parameters in β. Define the sample moments
    g_T(β) = (1/T) Σ_{t=1}^T x_t (y_t − x_t'β) = (1/T) X'(Y − Xβ) = 0.   (K × 1)
The MM estimator is given by the solution
    β̂_MM = (Σ x_t x_t')^{-1} Σ x_t y_t = (X'X)^{-1} X'Y = β̂_OLS.

Instrumental Variables as an MM Estimator
Consider the regression model
    y_t = x_{1t}'β_1 + x_{2t} β_2 + ε_t,
where the K − 1 variables in x_{1t} are predetermined and x_{2t} is endogenous:
    E[x_{1t} ε_t] = 0   ((K−1) × 1)
    E[x_{2t} ε_t] ≠ 0.   (#)
OLS is inconsistent! Assume there exists a variable, z_{2t}, such that
    corr(x_{2t}, z_{2t}) ≠ 0
    E[z_{2t} ε_t] = 0.   (##)
The new moment condition (##) can replace (#).
Define
    x_t = (x_{1t}', x_{2t})'   (K × 1)   and   z_t = (x_{1t}', z_{2t})'.   (K × 1)
z_t are called instruments. z_{2t} is the new instrument; the predetermined variables, x_{1t}, are instruments for themselves. The K population moment conditions are
    E[z_t ε_t] = E[z_t (y_t − x_t'β)] = 0.   (K × 1)
The K corresponding sample moment conditions are
    g_T(β) = (1/T) Σ_{t=1}^T z_t (y_t − x_t'β) = (1/T) Z'(Y − Xβ) = 0.   (K × 1)
The MM estimator is given by the unique solution
    β̂_MM = (Σ z_t x_t')^{-1} Σ z_t y_t = (Z'X)^{-1} Z'Y = β̂_IV,
where Z'X = Σ z_t x_t' is an invertible K × K matrix.

(Where do Instruments Come From?)
Consider the two simple equations
    c_t = β10 + β11 y_t + β12 w_t + ε_{1t}
    y_t = β20 + β21 c_t + β22 w_t + β23 r_t + β24 τ_t + ε_{2t}.
Say that we are only interested in the first equation. Assume that w_t is predetermined. If β21 ≠ 0, then y_t is endogenous and E[y_t ε_{1t}] ≠ 0. In this setup r_t and τ_t are possible instruments. We need β23 and β24 different from zero and E[(r_t, τ_t)' ε_{1t}] = 0.
In dynamic models we can often use lagged values as instruments.
Note that in this case we have more potential instruments than endogenous variables. This is addressed in GMM.
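The IV estimator β̂_IV = (Z'X)^{-1} Z'Y can be sketched numerically (Python/NumPy; the data-generating process is invented for illustration, with one endogenous regressor and one valid instrument):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
z2 = rng.normal(size=T)                # instrument, uncorrelated with eps
u = rng.normal(size=T)
eps = rng.normal(size=T)
x2 = 0.8 * z2 + 0.5 * eps + u          # endogenous: correlated with eps
x1 = np.ones(T)                        # predetermined regressor (constant)
y = 1.0 * x1 + 2.0 * x2 + eps          # true beta = (1, 2)

X = np.column_stack([x1, x2])
Z = np.column_stack([x1, z2])          # x1 instruments itself, z2 for x2

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # inconsistent here
```

Since cov(x2, ε) > 0 in this design, the OLS slope is biased upward while the IV slope is consistent for 2.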
The GMM Problem Defined
Let w_t = (y_t, x_t')' be a vector of model variables and let z_t be instruments. Consider the R moment conditions
    E[f(w_t, z_t, θ)] = 0.
Here θ is a K × 1 vector and f(·) is an R-dimensional vector function. Consider the corresponding sample moment conditions
    g_T(θ) = (1/T) Σ_{t=1}^T f(w_t, z_t, θ) = 0.
When can the R sample moments be used to estimate the K parameters in θ?

Order Condition
R < K: No unique solution to g_T(θ) = 0. The parameters are not identified.
R = K: Unique solution to g_T(θ) = 0. Exact identification. This is the MM estimator (OLS, IV). Note that g_T(θ) = 0 is potentially a nonlinear problem requiring a numerical solution.
R > K: More equations than parameters. Over-identified case. No solution in general (Z'X is an R × K matrix). It is not optimal to drop moments! Instead, choose θ to make g_T(θ) as close as possible to zero.
GMM Estimation (R > K)
We want to make the R moments g_T(θ) as close to zero as possible. How? Assume we have an R × R symmetric and positive definite weight matrix W_T. Then we can define the quadratic form
    Q_T(θ) = g_T(θ)' W_T g_T(θ).
The GMM estimator is defined as the vector that minimizes Q_T(θ), i.e.
    θ̂_GMM(W_T) = argmin_θ { g_T(θ)' W_T g_T(θ) }.
The matrix W_T tells how much weight to put on each moment condition. Different W_T give different estimators, θ̂_GMM(W_T). GMM is consistent for any positive definite weight matrix, W_T. What is the optimal choice of W_T?

Optimal GMM Estimation
The R sample moments g_T(θ) are estimators of E[f(·)], and are random variables. The law of large numbers implies g_T(θ) → E[f(·)] for T → ∞. A central limit theorem implies
    √T g_T(θ) → N(0, S),
where S is the asymptotic variance of the moments, √T g_T(θ). Intuitively, moments with little variance should have large weights. The optimal weight matrix for GMM is a matrix W_T^opt such that
    plim W_T^opt = W^opt = S^{-1}.
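To make the quadratic form Q_T(θ) concrete, here is a toy over-identified problem (invented for illustration, not from the lecture): for an exponential distribution with rate λ, E[y] = 1/λ and E[y²] = 2/λ², giving R = 2 moment conditions for K = 1 parameter. A Python/NumPy sketch minimizing Q_T(λ) over a grid, with identity weights as a simple first choice:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.exponential(scale=0.5, size=2000)    # exponential with rate lambda = 2

# Two moment conditions for one parameter (R = 2 > K = 1):
# E[y - 1/lambda] = 0 and E[y^2 - 2/lambda^2] = 0
def g_T(lam, y):
    return np.array([np.mean(y) - 1.0 / lam, np.mean(y**2) - 2.0 / lam**2])

def Q_T(lam, y, W):
    g = g_T(lam, y)
    return g @ W @ g

W = np.eye(2)                                # identity weights for illustration
grid = np.linspace(0.5, 5.0, 4501)
lam_hat = grid[np.argmin([Q_T(lam, y, W) for lam in grid])]
```

Neither sample moment is exactly zero at λ̂, but the weighted quadratic form is minimized; different W would give (slightly) different estimates.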
Without autocorrelation, a natural estimator Ŝ of S is
    Ŝ = V[√T g_T(θ)] = T · V[g_T(θ)] = T · V[(1/T) Σ f(w_t, z_t, θ)] = (1/T) Σ f(w_t, z_t, θ) f(w_t, z_t, θ)'.
This implies that
    W_T^opt = Ŝ^{-1} = ( (1/T) Σ f(w_t, z_t, θ) f(w_t, z_t, θ)' )^{-1}.
Note that W_T^opt depends on θ in general.

Estimation in Practice
Two-step (efficient) GMM:
(1) Choose some initial weight matrix W_[1], e.g. W_[1] = I or W_[1] = (Z'Z)^{-1}. Find a (consistent) estimate θ̂_[1] = argmin_θ g_T(θ)' W_[1] g_T(θ). Estimate the optimal weights, W_T^opt.
(2) Find the optimal GMM estimate θ̂_GMM = argmin_θ g_T(θ)' W_T^opt g_T(θ).

Iterated GMM:
Start with some initial weight matrix W_[1]. (1) Find an estimate θ̂_[1]. (2) Find a new weight matrix, W_[2]^opt. Iterate between θ̂_[·] and W_[·]^opt until convergence.
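For a linear model the two steps can be sketched directly (Python/NumPy, simulated heteroscedastic data invented for illustration). Each step uses the closed-form minimizer of the linear quadratic form, β̂(W) = (X'Z W Z'X)^{-1} X'Z W Z'Y:

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    # beta(W) = (X'Z W Z'X)^{-1} X'Z W Z'y
    A = X.T @ Z @ W @ Z.T @ X
    b = X.T @ Z @ W @ Z.T @ y
    return np.linalg.solve(A, b)

rng = np.random.default_rng(2)
T = 2000
z = rng.normal(size=(T, 3))
eps = rng.normal(size=T) * (1 + 0.5 * np.abs(z[:, 0]))   # heteroscedastic errors
x = z @ np.array([1.0, 0.5, 0.5]) + rng.normal(size=T) + 0.3 * eps  # endogenous
X = np.column_stack([np.ones(T), x])         # K = 2 parameters
Z = np.column_stack([np.ones(T), z])         # R = 4 > K instruments
y = 0.5 + 1.5 * x + eps                      # true beta = (0.5, 1.5)

# Step (1): initial weights W_[1] = (Z'Z)^{-1}
W1 = np.linalg.inv(Z.T @ Z)
beta1 = linear_gmm(y, X, Z, W1)

# Step (2): optimal weights from step-1 residuals, W_opt = S_hat^{-1}
e1 = y - X @ beta1
S_hat = (Z * e1[:, None] ** 2).T @ Z / T     # (1/T) sum e_t^2 z_t z_t'
beta2 = linear_gmm(y, X, Z, np.linalg.inv(S_hat))
```

Both steps give consistent estimates; the second step is efficient under heteroscedasticity.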
Properties of Optimal GMM
The GMM estimator, θ̂_GMM(W_T^opt), is asymptotically efficient: lowest variance in a class of estimators that use the same information. The GMM estimator is asymptotically normal, i.e.
    √T (θ̂_GMM − θ) → N(0, V),
where
    V = (D' W^opt D)^{-1} = (D' S^{-1} D)^{-1},   D = plim ∂g_T(θ)/∂θ'.   (R × K)
S measures the variance of the moments: the larger S, the larger V. D measures the sensitivity of the moments with respect to changes in θ: if this is large, the parameter can be estimated precisely. Little is known in finite samples.

Specification Test
If R > K, we have more moments than parameters. All moments have expectation zero. In a sense, K moments are set to zero by estimating the parameters. Test if the additional R − K moments are close to zero; if not, some orthogonality condition is violated.
Remember that √T g_T(θ) → N(0, S). This implies that if the weights are optimal, W_T^opt → S^{-1}, then
    ξ = T · g_T(θ̂_GMM)' Ŝ^{-1} g_T(θ̂_GMM) = T · g_T(θ̂_GMM)' W_T^opt g_T(θ̂_GMM) → χ²(R − K).
This is the Hansen test for overidentifying restrictions (J-test, Sargan test): a test of the R − K overidentifying conditions.
Famous Example: Hansen and Singleton (1982)
Consider an optimizing agent with a power utility function over consumption,
    U(C_t) = C_t^{1−γ} / (1 − γ).
The first-order condition for maximizing the discounted utility of future consumption is given by
    E[ δ (C_{t+1}/C_t)^{−γ} (1 + r_{t+1}) − 1 | I_t ] = 0,
where I_t is the conditioning information set at time t. Assume rational expectations. Now if z_t ∈ I_t, then it must be orthogonal to the expectation error, i.e.
    f(C_{t+1}, C_t, r_{t+1}; z_t; δ, γ) = E[ (δ (C_{t+1}/C_t)^{−γ} (1 + r_{t+1}) − 1) z_t ] = 0.
This is a moment condition. We need at least R = 2 instruments in z_t. Note: the specification is theory driven, nonlinear, and not in regression format.

Linear GMM and GIVE
Consider the linear regression model with K explanatory variables
    y_t = x_{1t}'β_1 + x_{2t}'β_2 + ε_t = x_t'β + ε_t,
where E[x_{1t} ε_t] = 0, but the variables in x_{2t} are endogenous:
    E[x_{2t} ε_t] ≠ 0.
Assume there exist R > K instruments z_t = (x_{1t}', z_{2t}')', such that
    E[z_t ε_t] = E[z_t (y_t − x_t'β)] = 0.   (R × 1)
Identification requires a non-zero correlation between z_{2t} and x_{2t}: the rank condition. The sample moments are
    g_T(β) = (1/T) Σ_{t=1}^T z_t (y_t − x_t'β) = (1/T) Z'(Y − Xβ).
Note that we cannot solve g_T(β) = 0 directly: Z'X is an R × K matrix (of rank K) and cannot be inverted.
Instead we want to minimize the quadratic form
    Q_T(β) = g_T(β)' W_T g_T(β)
        = ( (1/T) Z'(Y − Xβ) )' W_T ( (1/T) Z'(Y − Xβ) )
        = (1/T²) (Y'Z − β'X'Z) W_T (Z'Y − Z'Xβ)
        = (1/T²) [ Y'Z W_T Z'Y − 2 β'X'Z W_T Z'Y + β'X'Z W_T Z'Xβ ].
To minimize Q_T(β) we take the derivative and solve the K equations
    ∂Q_T(β)/∂β = 0.   (K × 1)

The GMM estimator solves the K equations
    ∂Q_T(β)/∂β = (1/T²) ( −2 X'Z W_T Z'Y + 2 X'Z W_T Z'Xβ ) = 0,
i.e.
    β̂_GMM(W_T) = (X'Z W_T Z'X)^{-1} X'Z W_T Z'Y.
The optimal weight matrix is the inverse variance of the moments, W_T^opt = Ŝ^{-1}, where
    S = V[√T g_T(β)] = V[(1/√T) Z'ε] = (1/T) E[Z'εε'Z] = (1/T) Z'ΩZ,
and where we define E[εε'] = Ω.
Case 1: Homoscedastic Errors
If E[εε'] = Ω = σ²I, the natural estimator of S is
    Ŝ = (1/T) Z'Ω̂Z = σ̂² (1/T) Z'Z,
where σ̂² is a consistent estimator of σ². Then the GMM estimator becomes
    β̂_GMM = (X'Z Ŝ^{-1} Z'X)^{-1} X'Z Ŝ^{-1} Z'Y
        = ( X'Z (σ̂² (1/T) Z'Z)^{-1} Z'X )^{-1} X'Z (σ̂² (1/T) Z'Z)^{-1} Z'Y
        = ( X'Z (Z'Z)^{-1} Z'X )^{-1} X'Z (Z'Z)^{-1} Z'Y = β̂_GIVE = β̂_2SLS.
Under homoscedasticity the optimal GMM estimator is GIVE (and 2SLS).

Recall that
    β̂_GMM ≈ N( β, (1/T) (D' W^opt D)^{-1} ).
The derivative is given by
    D = ∂g_T(β)/∂β' = ∂[(1/T) Z'(Y − Xβ)]/∂β' = −(1/T) Z'X = −(1/T) Σ z_t x_t'.   (R × K)
The variance can be estimated by
    V̂[β̂_GMM] = (1/T) (D' W_T^opt D)^{-1}
        = (1/T) ( ((1/T) Z'X)' (σ̂² (1/T) Z'Z)^{-1} ((1/T) Z'X) )^{-1}
        = σ̂² ( X'Z (Z'Z)^{-1} Z'X )^{-1},
as known from 2SLS.
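A quick numerical check of this equivalence (Python/NumPy, simulated data invented for illustration): the GMM estimator with weights proportional to (Z'Z)^{-1} coincides with the usual two-stage least squares estimator computed from first-stage fitted values.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
z = rng.normal(size=(T, 2))
eps = rng.normal(size=T)
x = z @ np.array([1.0, 0.7]) + 0.4 * eps + rng.normal(size=T)  # endogenous
X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z])   # R = 3 > K = 2
y = 1.0 + 2.0 * x + eps                # true beta = (1, 2)

# GMM with W = (Z'Z)^{-1}; the scalar sigma^2 in S_hat cancels in the formula
W = np.linalg.inv(Z.T @ Z)
beta_gmm = np.linalg.solve(X.T @ Z @ W @ Z.T @ X, X.T @ Z @ W @ Z.T @ y)

# 2SLS: regress X on Z, then y on the fitted values X_hat
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
```

The two expressions are algebraically identical, since X̂'X̂ = X'Z(Z'Z)^{-1}Z'X and X̂'Y = X'Z(Z'Z)^{-1}Z'Y.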
The specification test is given by
    ξ = T · g_T(β̂_GMM)' Ŝ^{-1} g_T(β̂_GMM)
      = (Σ ε̂_t z_t)' ( σ̂² Σ z_t z_t' )^{-1} (Σ ε̂_t z_t)  → χ²(R − K).
In the linear case it is denoted the Sargan test. A simple way to calculate ξ is to consider the regression
    ε̂_t = z_t'γ + residual,
and calculate the test statistic as ξ = T · R².

Case 2: Heteroscedasticity, No Autocorrelation
In the case of heteroscedasticity but no autocorrelation,
    E[εε'] = Ω = diag(σ_1², σ_2², ..., σ_T²),
we can use the estimator
    Ŝ = (1/T) Z'Ω̂Z = (1/T) Σ ε̂_t² z_t z_t'.
We only need an estimate of the R × R matrix Z'ΩZ and not the T × T matrix Ω. We get
    β̂_GMM(Ŝ^{-1}) = (X'Z Ŝ^{-1} Z'X)^{-1} X'Z Ŝ^{-1} Z'Y.
Note that a constant scaling of the weight matrix is not important for the estimation.
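Both forms of the Sargan statistic can be computed directly (Python/NumPy, simulated data invented for illustration, with R − K = 1 overidentifying restriction). They coincide because the 2SLS residuals have mean zero when a constant is included among the instruments:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
z = rng.normal(size=(T, 2))
eps = rng.normal(size=T)
x = z @ np.array([1.0, 0.6]) + 0.5 * eps + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z])          # R = 3, K = 2
y = 0.3 + 1.2 * x + eps

# 2SLS residuals
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
e = y - X @ beta

# Direct form: xi = (e'Z)(sigma2 * Z'Z)^{-1}(Z'e), with sigma2 = e'e / T
sigma2 = e @ e / T
xi_direct = (e @ Z) @ np.linalg.solve(sigma2 * (Z.T @ Z), Z.T @ e)

# Auxiliary-regression form: xi = T * R^2 from regressing e on Z
gamma = np.linalg.lstsq(Z, e, rcond=None)[0]
r2 = 1.0 - np.sum((e - Z @ gamma) ** 2) / np.sum((e - e.mean()) ** 2)
xi_aux = T * r2
```

Here ξ would be compared with a χ²(1) critical value; since the instruments are valid by construction, the statistic should usually be small.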
The variance of the estimator becomes
    V̂[β̂_GMM] = (1/T) (D' W_T^opt D)^{-1}
      = ( (Σ x_t z_t') (Σ ε̂_t² z_t z_t')^{-1} (Σ z_t x_t') )^{-1},
which is the heteroscedasticity-consistent variance estimator of White.

Case 3: Autocorrelation
The weight matrix is W_T^opt = Ŝ^{-1}, where
    Ŝ = V[√T g_T(θ)] = V[ (1/√T) Σ z_t ε_t ].
With autocorrelation, we need to take the covariances into account. This is done by the heteroscedasticity and autocorrelation consistent (HAC) estimator. Let
    Γ_j = cov(z_t ε_t, z_{t−j} ε_{t−j}) = E[(z_t ε_t)(z_{t−j} ε_{t−j})']   (R × R)
be the covariance matrix at lag j. Then
    S = V(z_t ε_t) + Σ_{j=1}^∞ [ cov(z_t ε_t, z_{t−j} ε_{t−j}) + cov(z_t ε_t, z_{t+j} ε_{t+j}) ] = Σ_{j=−∞}^∞ Γ_j.
If we can argue that Γ_j = 0 for j larger than some lag, q, we can use the truncated estimator
    Ŝ = Σ_{j=−q}^{q} Γ̂_j,
where we estimate the covariances by
    Γ̂_j = (1/T) Σ_{t=j+1}^T (z_t ε̂_t)(z_{t−j} ε̂_{t−j})'.
The obtained Ŝ is not necessarily positive definite. Instead, the covariances can be given decreasing weights: the Newey-West estimator. Finite-sample properties are unknown. The HAC covariance estimator can also be used for OLS.
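The Newey-West version of Ŝ can be sketched as follows (Python/NumPy; the Bartlett weights w_j = 1 − j/(q+1) guarantee positive semi-definiteness, and the simulated AR(1) residuals are invented for illustration):

```python
import numpy as np

def newey_west_S(Z, e, q):
    """HAC estimate of S with Bartlett weights w_j = 1 - j/(q+1) (Newey-West).

    Z: T x R matrix of instruments, e: residual vector, q: truncation lag.
    """
    T = Z.shape[0]
    u = Z * e[:, None]                      # rows are (z_t * e_t)'
    S = u.T @ u / T                         # Gamma_0 = (1/T) sum u_t u_t'
    for j in range(1, q + 1):
        Gamma_j = u[j:].T @ u[:-j] / T      # (1/T) sum_{t=j+1}^T u_t u_{t-j}'
        S += (1.0 - j / (q + 1.0)) * (Gamma_j + Gamma_j.T)
    return S

# Illustration on simulated autocorrelated residuals
rng = np.random.default_rng(6)
T = 500
Z = np.column_stack([np.ones(T), rng.normal(size=T)])
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.5 * e[t - 1] + rng.normal()    # AR(1) errors
S_hat = newey_west_S(Z, e, q=4)
```

With q = 0 the function reduces to the heteroscedasticity-only estimator Γ̂_0 = (1/T) Σ ε̂_t² z_t z_t' of Case 2.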