Asymptotics for Nonlinear GMM
Eric Zivot
February 13, 2013
Asymptotic Properties of Nonlinear GMM

Under standard regularity conditions (to be discussed later), it can be shown that

$$\hat\theta(\hat{W}) \xrightarrow{p} \theta_0,$$
$$\sqrt{n}\left(\hat\theta(\hat{W}) - \theta_0\right) \xrightarrow{d} N\!\left(0,\ \mathrm{avar}(\hat\theta(\hat{W}))\right),$$
$$\mathrm{avar}(\hat\theta(\hat{W})) = (G'WG)^{-1}\, G'WSWG\,(G'WG)^{-1},$$

where

$$G = E\!\left[\frac{\partial g(w_t,\theta_0)}{\partial\theta'}\right].$$

For efficient GMM, set $W = S^{-1}$, so that

$$\mathrm{avar}(\hat\theta(\hat{S}^{-1})) = (G'S^{-1}G)^{-1}.$$
Remark

Notice that with nonlinear GMM, the expression for $\mathrm{avar}(\hat\theta(\hat{W}))$ is of the same form as in linear GMM except that $\Sigma_{xz} = E[x_t z_t']$ is replaced by the matrix

$$G = E\!\left[\frac{\partial g(w_t,\theta_0)}{\partial\theta'}\right],$$

where $\partial g(w_t,\theta_0)/\partial\theta'$ is the $K\times L$ Jacobian matrix

$$\frac{\partial g(w_t,\theta_0)}{\partial\theta'} =
\begin{bmatrix}
\frac{\partial g_1(w_t,\theta_0)}{\partial\theta_1} & \cdots & \frac{\partial g_1(w_t,\theta_0)}{\partial\theta_L} \\
\vdots & \ddots & \vdots \\
\frac{\partial g_K(w_t,\theta_0)}{\partial\theta_1} & \cdots & \frac{\partial g_K(w_t,\theta_0)}{\partial\theta_L}
\end{bmatrix}$$

and $g(w_t,\theta_0) = (g_1(w_t,\theta_0),\ldots,g_K(w_t,\theta_0))'$.
Consistent estimate of $\mathrm{avar}(\hat\theta(\hat{W}))$

$$\widehat{\mathrm{avar}}(\hat\theta(\hat{W})) = (\hat{G}'\hat{W}\hat{G})^{-1}\,\hat{G}'\hat{W}\hat{S}\hat{W}\hat{G}\,(\hat{G}'\hat{W}\hat{G})^{-1},$$

where

$$\hat{S} \xrightarrow{p} S = \mathrm{avar}(\bar{g}), \qquad
\hat{G} = G_n(\hat\theta(\hat{W})) = \frac{1}{n}\sum_{t=1}^{n}\frac{\partial g(w_t,\hat\theta(\hat{W}))}{\partial\theta'} \xrightarrow{p} G.$$

For the efficient GMM estimator, $\hat{W} = \hat{S}^{-1}$ and

$$\widehat{\mathrm{avar}}(\hat\theta(\hat{S}^{-1})) = (\hat{G}'\hat{S}^{-1}\hat{G})^{-1}.$$
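As a numerical companion, here is a minimal numpy sketch of these formulas; `G_hat` (a $K\times L$ array), `W_hat`, and `S_hat` (both $K\times K$) are hypothetical precomputed inputs, and the function names are illustrative.

```python
import numpy as np

def avar_gmm(G_hat, W_hat, S_hat):
    """Sandwich estimate (G'WG)^{-1} G'WSWG (G'WG)^{-1} of avar(theta_hat)."""
    A_inv = np.linalg.inv(G_hat.T @ W_hat @ G_hat)   # (G'WG)^{-1}, L x L
    B = G_hat.T @ W_hat @ S_hat @ W_hat @ G_hat      # G'WSWG
    return A_inv @ B @ A_inv

def avar_gmm_efficient(G_hat, S_hat):
    """Efficient GMM (W = S^{-1}): the sandwich collapses to (G'S^{-1}G)^{-1}."""
    return np.linalg.inv(G_hat.T @ np.linalg.solve(S_hat, G_hat))
```

Passing `W_hat = np.linalg.inv(S_hat)` to `avar_gmm` reproduces `avar_gmm_efficient` up to rounding error.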
Estimation of S

If $\{g(w_t,\theta_0)\}$ is an ergodic-stationary MDS, then a consistent estimator of $S$ takes the form

$$\hat{S}_{HC} = \frac{1}{n}\sum_{t=1}^{n} g(w_t,\hat\theta)\,g(w_t,\hat\theta)', \qquad \hat\theta \xrightarrow{p} \theta_0.$$

If $\{g(w_t,\theta_0)\}$ is a mean-zero, serially correlated, ergodic-stationary process, then

$$S = \mathrm{LRV} = \Gamma_0 + \sum_{j=1}^{\infty}(\Gamma_j + \Gamma_j'), \qquad
\Gamma_j = E[g(w_t,\theta_0)\,g(w_{t-j},\theta_0)'].$$
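A minimal sketch of the MDS case, assuming the moment contributions $g(w_t,\hat\theta)$ have been stacked into a hypothetical $n\times K$ array `g_t`:

```python
import numpy as np

def S_hc(g_t):
    """HC estimate of S under the MDS assumption:
    (1/n) * sum_t g(w_t, theta_hat) g(w_t, theta_hat)'."""
    n = g_t.shape[0]
    return g_t.T @ g_t / n   # K x K average outer product
```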
Consistent estimators of $\mathrm{LRV} = S$ have the form

$$\hat{S}_{HAC} = \hat\Gamma_0(\hat\theta) + \sum_{j=1}^{n-1} \kappa\!\left(\frac{j}{b_n}\right)\left(\hat\Gamma_j(\hat\theta) + \hat\Gamma_j(\hat\theta)'\right), \qquad \hat\theta \xrightarrow{p} \theta_0,$$

$$\hat\Gamma_j(\hat\theta) = \frac{1}{n}\sum_{t=j+1}^{n} g(w_t,\hat\theta)\,g(w_{t-j},\hat\theta)',$$

where $\kappa(\cdot)$ is a kernel weight function and $b_n$ is a bandwidth (truncation) parameter.
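For illustration, here is a sketch using the Bartlett kernel, the Newey-West choice; the specific kernel and the integer truncation lag `bandwidth` are assumptions, since the notes leave $\kappa(\cdot)$ and $b_n$ generic.

```python
import numpy as np

def S_hac(g_t, bandwidth):
    """Newey-West style HAC estimate of S = LRV from an n x K array g_t of
    moment contributions, with Bartlett weights 1 - j/(bandwidth + 1)."""
    n, K = g_t.shape
    S = g_t.T @ g_t / n                        # Gamma_0 hat
    for j in range(1, bandwidth + 1):
        Gamma_j = g_t[j:].T @ g_t[:-j] / n     # (1/n) sum_{t=j+1}^n g_t g_{t-j}'
        w = 1.0 - j / (bandwidth + 1.0)        # Bartlett kernel weight
        S += w * (Gamma_j + Gamma_j.T)
    return S
```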
Proving Consistency and Asymptotic Normality for Nonlinear GMM

Hayashi's Chapter 7 (see also Hall's Chapter 3 and the handbook chapter by Newey and McFadden) discusses the techniques for proving consistency and asymptotic normality for a general class of nonlinear extremum estimators. Extremum estimators have the form

$$\hat\theta = \arg\max_{\theta\in\Theta} Q_n(\theta),$$

where $Q_n(\theta)$ is the objective function and $\Theta$ is the parameter space.
GMM as an Extremum Estimator

For GMM, define the objective function to be maximized as

$$Q_n(\theta) = -\frac{1}{2n} J(\theta,\hat{W}) = -\frac{1}{2}\, g_n(\theta)'\hat{W} g_n(\theta),$$
$$g_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} g(w_t,\theta).$$

If $\hat\theta(\hat{W})$ minimizes $J(\theta,\hat{W})$ then it maximizes $Q_n(\theta)$.

Remark: the factor $-\frac{1}{2}$ is used in the definition of $Q_n(\theta)$ to simplify derivations (e.g., remove unnecessary constants).
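To make the objective concrete, here is a small self-contained sketch of two-step GMM (first step $\hat{W} = I$, second step $\hat{W} = \hat{S}^{-1}$) for a hypothetical nonlinear moment condition $E[z_t(y_t - \exp(x_t'\theta_0))] = 0$; the data-generating process, moment function, and optimizer choice are all illustrative assumptions, not part of the notes.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated data for the hypothetical moment condition (K = 3, L = 2)
n, theta_0 = 500, np.array([0.5, -0.25])
x = rng.normal(size=(n, 2))
z = np.column_stack([np.ones(n), x])          # instruments z_t
y = np.exp(x @ theta_0) + rng.normal(size=n)  # additive error with E[eps | z] = 0

def g_fun(theta):
    """n x K array of moment contributions g(w_t, theta) = z_t (y_t - exp(x_t'theta))."""
    return z * (y - np.exp(x @ theta))[:, None]

def J(theta, W):
    """GMM objective J(theta, W) = n g_n(theta)' W g_n(theta); minimizing J
    is the same as maximizing Q_n(theta) = -(1/2n) J(theta, W)."""
    g_bar = g_fun(theta).mean(axis=0)
    return n * g_bar @ W @ g_bar

# Step 1: W = I.  Step 2: W = S_hat^{-1} (efficient GMM).
step1 = minimize(J, np.zeros(2), args=(np.eye(3),), method="BFGS")
g_t = g_fun(step1.x)
S_hat = g_t.T @ g_t / n                       # HC estimate of S (MDS case)
step2 = minimize(J, step1.x, args=(np.linalg.inv(S_hat),), method="BFGS")
print(step2.x)                                # efficient GMM estimate, near theta_0
```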
M-Estimators

For the class of M-estimators, the objective function to be maximized has the form

$$Q_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} m(w_t,\theta),$$

where $m(w_t,\theta)$ is a real-valued function. Three common M-estimators are:

(1) maximum likelihood (ML) estimators;
(2) nonlinear least squares (NLS) estimators;
(3) bounded influence function robust estimators.
MLE as an M-Estimator

Let $y_t \sim f(y_t \mid x_t;\theta)$ for $t = 1,\ldots,n$ with log-likelihood function

$$\ln L(\theta \mid Y, X) = \sum_{t=1}^{n} \ln f(y_t \mid x_t;\theta).$$

The MLE solves

$$\hat\theta = \arg\max_{\theta} \sum_{t=1}^{n} \ln f(y_t \mid x_t;\theta).$$

Then the MLE is an M-estimator with

$$m(w_t,\theta) = \ln f(y_t \mid x_t;\theta), \qquad
Q_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} \ln f(y_t \mid x_t;\theta).$$
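A minimal sketch for a hypothetical normal location-scale model, with $\sigma$ parameterized through its log so the optimizer can run unconstrained:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=300)

def neg_Q_n(theta):
    """Negative of Q_n(theta) = (1/n) sum_t ln f(y_t; mu, sigma); maximizing
    the average log-likelihood is the same as minimizing its negative."""
    mu, log_sigma = theta                      # sigma = exp(log_sigma) > 0
    return -norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)).mean()

mle = minimize(neg_Q_n, x0=np.array([0.0, 0.0]), method="BFGS")
print(mle.x[0], np.exp(mle.x[1]))              # estimates of (mu, sigma), near (2.0, 1.5)
```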
NLS Estimator as an M-Estimator

Consider the nonlinear regression

$$y_t = a(x_t,\theta) + \varepsilon_t, \qquad t = 1,\ldots,n.$$

The NLS estimator solves

$$\hat\theta = \arg\min_{\theta} \mathrm{RSS}(\theta) = \sum_{t=1}^{n}(y_t - a(x_t,\theta))^2.$$

Then the NLS estimator is an M-estimator with

$$m(w_t,\theta) = -(y_t - a(x_t,\theta))^2, \qquad
Q_n(\theta) = -\frac{1}{n}\sum_{t=1}^{n}(y_t - a(x_t,\theta))^2.$$
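A minimal sketch, assuming a hypothetical exponential-decay mean function $a(x,\theta) = \theta_1 e^{-\theta_2 x}$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 5.0, size=400)
theta_0 = np.array([1.0, 0.7])
y = theta_0[0] * np.exp(-theta_0[1] * x) + 0.05 * rng.normal(size=400)

def rss(theta):
    """RSS(theta) = sum_t (y_t - a(x_t, theta))^2; NLS minimizes RSS,
    i.e. maximizes Q_n(theta) = -(1/n) RSS(theta)."""
    return np.sum((y - theta[0] * np.exp(-theta[1] * x)) ** 2)

nls = minimize(rss, x0=np.array([0.5, 0.5]), method="BFGS")
print(nls.x)                                   # approximately theta_0
```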
Robust Regression as an M-estimator

Consider the linear regression

$$y_t = x_t'\theta + \varepsilon_t, \qquad t = 1,\ldots,n.$$

To be robust to potential outliers and/or fat-tailed error distributions, Huber defined the robust regression estimator as

$$\hat\theta = \arg\min_{\theta} \sum_{t=1}^{n} \rho(y_t - x_t'\theta),$$

where $\rho(\cdot)$ is a convex function with bounded derivative (so the influence of large residuals is bounded). Then Huber's robust regression estimator is an M-estimator with

$$m(w_t,\theta) = -\rho(y_t - x_t'\theta), \qquad
Q_n(\theta) = -\frac{1}{n}\sum_{t=1}^{n} \rho(y_t - x_t'\theta).$$
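A minimal sketch using Huber's $\rho$ with the conventional tuning constant $c = 1.345$; the design, the fat-tailed $t$ errors, and the starting values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_0 = np.array([1.0, 2.0])
y = X @ theta_0 + rng.standard_t(df=2, size=n)   # fat-tailed errors

def huber_rho(u, c=1.345):
    """Huber's rho: quadratic for small residuals, linear for large ones,
    so the derivative (the influence) is bounded by c."""
    return np.where(np.abs(u) <= c, 0.5 * u ** 2, c * np.abs(u) - 0.5 * c ** 2)

def loss(theta):
    return np.sum(huber_rho(y - X @ theta))

fit = minimize(loss, x0=np.zeros(2), method="BFGS")
print(fit.x)                                     # close to theta_0 despite fat tails
```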
M-estimators as Method of Moments Estimators

The first order conditions (FOCs) for an M-estimator are

$$0 = \frac{\partial Q_n(\hat\theta)}{\partial\theta}
= \frac{1}{n}\sum_{t=1}^{n}\frac{\partial m(w_t,\hat\theta)}{\partial\theta}
= \frac{1}{n}\sum_{t=1}^{n}\psi(w_t,\hat\theta), \qquad
\psi(w_t,\theta) = \frac{\partial m(w_t,\theta)}{\partial\theta}.$$

Therefore, M-estimators can be thought of as method of moments estimators based on the population moment condition

$$E[\psi(w_t,\theta_0)] = 0.$$

Note: this is useful for deriving the asymptotic distribution of M-estimators.
Example: MLE as Method of Moments Estimator

$$\hat\theta = \arg\max_{\theta} Q_n(\theta) = \frac{1}{n}\sum_{t=1}^{n}\ln f(y_t \mid x_t;\theta),$$
$$0 = \frac{\partial Q_n(\hat\theta)}{\partial\theta}
= \frac{1}{n}\sum_{t=1}^{n}\frac{\partial \ln f(y_t \mid x_t;\hat\theta)}{\partial\theta}
= \frac{1}{n}\sum_{t=1}^{n}\hat\psi(w_t,\hat\theta),$$
$$\hat\psi(w_t,\hat\theta) = \frac{\partial \ln f(y_t \mid x_t;\hat\theta)}{\partial\theta} = \text{likelihood score}.$$

The population moment condition is

$$E[\psi(w_t,\theta_0)] = E\!\left[\frac{\partial \ln f(y_t \mid x_t;\theta_0)}{\partial\theta}\right] = 0.$$
GMM as an M-estimator

GMM can be thought of as an M-estimator. The first order conditions for GMM are

$$0 = \frac{\partial Q_n(\hat\theta)}{\partial\theta}
= -G_n(\hat\theta)'\hat{W} g_n(\hat\theta)
= -G_n(\hat\theta)'\hat{W}\,\frac{1}{n}\sum_{t=1}^{n} g(w_t,\hat\theta)
= \frac{1}{n}\sum_{t=1}^{n}\hat\psi(w_t,\hat\theta),$$
$$\hat\psi(w_t,\theta) = -G_n(\hat\theta)'\hat{W} g(w_t,\theta).$$

The population moment condition is

$$E[\psi(w_t,\theta_0)] = -E[G'W g(w_t,\theta_0)] = -G'W\,E[g(w_t,\theta_0)] = 0.$$

Note: defined this way, there are $L = \dim(\theta)$ population moment conditions determined by $G'W\,E[g(w_t,\theta_0)] = 0$.
Consistency of Extremum Estimators

Basic idea: let $\hat\theta = \arg\max_{\theta\in\Theta} Q_n(\theta)$. If

$Q_n(\theta) \xrightarrow{p} Q_0(\theta)$ uniformly in $\theta$, and
$\theta_0$ uniquely maximizes $Q_0(\theta)$,

then $\hat\theta \xrightarrow{p} \theta_0$.
Technical requirements:

1. Continuity of $Q_n(\theta)$ and $Q_0(\theta)$.
2. Uniform convergence of $Q_n(\theta)$ to $Q_0(\theta)$:
$$\sup_{\theta\in\Theta}|Q_n(\theta) - Q_0(\theta)| \xrightarrow{p} 0 \text{ as } n\to\infty.$$
3. Compact parameter space $\Theta$.
4. $\theta_0$ uniquely maximizes $Q_0(\theta)$.

Note: See Hall Chapter 3 or Newey and McFadden (1994) for a formal proof based on $\varepsilon$ and $\delta$ arguments.
Application to GMM

Recall,

$$Q_n(\theta) = -\frac{1}{2n} J(\theta,\hat{W}) = -\frac{1}{2}\, g_n(\theta)'\hat{W} g_n(\theta),$$
$$\hat\theta = \arg\max_{\theta\in\Theta} Q_n(\theta) = \arg\min_{\theta\in\Theta} J(\theta,\hat{W}).$$
Now:

1. $Q_n(\theta)$ is continuous provided $g_n(\theta)$ is continuous.

2. If $\{w_t\}$ is ergodic-stationary and $E[g(w_t,\theta)]$ exists, then

$$g_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} g(w_t,\theta) \xrightarrow{p} E[g(w_t,\theta)] \quad\text{(for fixed }\theta),$$
$$Q_0(\theta) = -\frac{1}{2}\,E[g(w_t,\theta)]'\,W\,E[g(w_t,\theta)].$$

3. If $E[g(w_t,\theta_0)] = 0$ and $E[g(w_t,\theta)] \neq 0$ for any $\theta \neq \theta_0$, then (for positive definite $W$)

$$Q_0(\theta_0) = -\frac{1}{2}\,E[g(w_t,\theta_0)]'\,W\,E[g(w_t,\theta_0)] = 0,$$
$$Q_0(\theta) < 0 \text{ for } \theta \neq \theta_0.$$

Therefore, $Q_0(\theta)$ is uniquely maximized at $\theta_0$.
4. Uniform convergence of $Q_n(\theta)$ to $Q_0(\theta)$ requires uniform convergence of $g_n(\theta)$ to $E[g(w_t,\theta)]$:

$$\sup_{\theta\in\Theta}\left\| g_n(\theta) - E[g(w_t,\theta)] \right\| \xrightarrow{p} 0 \text{ as } n\to\infty.$$

Hayashi states that a sufficient condition for this uniform convergence is

$$E\!\left[\sup_{\theta\in\Theta}\|g(w_t,\theta)\|\right] < \infty,$$

where $\|g(w,\theta)\| = \left(g_1(w,\theta)^2 + \cdots + g_K(w,\theta)^2\right)^{1/2}$ is the Euclidean norm.
Asymptotic Normality of GMM

Recall,

$$Q_n(\theta) = -\frac{1}{2n} J(\theta,\hat{W}) = -\frac{1}{2}\, g_n(\theta)'\hat{W} g_n(\theta), \qquad
\hat\theta = \arg\max_{\theta\in\Theta} Q_n(\theta).$$

Assume:

1. $\theta_0 \in \Theta$, with $\Theta$ compact.
2. $g(w,\theta)$ is $K\times 1$ and continuously differentiable in $\theta$ for any $w$.
3. $E[g(w_t,\theta_0)] = 0$ and $E[g(w_t,\theta)] \neq 0$ for $\theta \neq \theta_0$.
4. $G = E[\partial g(w_t,\theta_0)/\partial\theta']$ has full column rank $L$.
5. $\sqrt{n}\, g_n(\theta_0) \xrightarrow{d} N(0, S)$.
6. $E\!\left[\sup_{\theta\in\Theta}\left\|\partial g(w_t,\theta)/\partial\theta'\right\|\right] < \infty$.
Basic trick: apply an exact first-order Taylor series expansion (i.e., the Mean Value Theorem) to the nonlinear sample moment $g_n(\theta)$ to get a linear function of $\theta$.

Since $\hat\theta = \arg\max_{\theta\in\Theta} Q_n(\theta)$, the first order conditions (FOC) are

$$0 = \frac{\partial Q_n(\hat\theta)}{\partial\theta} = -G_n(\hat\theta)'\hat{W} g_n(\hat\theta),$$
$$G_n(\theta) = \frac{\partial g_n(\theta)}{\partial\theta'} =
\begin{bmatrix}
\frac{\partial g_{n,1}(\theta)}{\partial\theta'} \\ \vdots \\ \frac{\partial g_{n,K}(\theta)}{\partial\theta'}
\end{bmatrix}.$$

Now apply the exact TSE to $g_n(\hat\theta)$ about $\theta_0$:

$$g_n(\hat\theta) = g_n(\theta_0) + G_n(\bar\theta)(\hat\theta - \theta_0),$$
$$\bar\theta = \lambda\hat\theta + (1-\lambda)\theta_0 \text{ for some } \lambda \in (0,1).$$
Substitute the TSE into the FOC:

$$0 = -G_n(\hat\theta)'\hat{W}\left[g_n(\theta_0) + G_n(\bar\theta)(\hat\theta - \theta_0)\right]
= -G_n(\hat\theta)'\hat{W} g_n(\theta_0) - G_n(\hat\theta)'\hat{W} G_n(\bar\theta)(\hat\theta - \theta_0).$$

Solve for $(\hat\theta - \theta_0)$:

$$(\hat\theta - \theta_0) = -\left[G_n(\hat\theta)'\hat{W} G_n(\bar\theta)\right]^{-1} G_n(\hat\theta)'\hat{W} g_n(\theta_0).$$

Multiply both sides by $\sqrt{n}$:

$$\sqrt{n}(\hat\theta - \theta_0) = -\left[G_n(\hat\theta)'\hat{W} G_n(\bar\theta)\right]^{-1} G_n(\hat\theta)'\hat{W}\sqrt{n}\, g_n(\theta_0).$$
Asymptotics

Now,

$$G_n(\theta) = \frac{\partial g_n(\theta)}{\partial\theta'}.$$

By the ergodic theorem,

$$G_n(\theta_0) = \frac{1}{n}\sum_{t=1}^{n}\frac{\partial g(w_t,\theta_0)}{\partial\theta'} \xrightarrow{p} E\!\left[\frac{\partial g(w_t,\theta_0)}{\partial\theta'}\right] = G.$$

Also, we know that

$$\hat\theta \xrightarrow{p} \theta_0, \qquad
\bar\theta = \lambda\hat\theta + (1-\lambda)\theta_0 \xrightarrow{p} \theta_0.$$
If

$$G_n(\theta) \xrightarrow{p} E\!\left[\frac{\partial g(w_t,\theta)}{\partial\theta'}\right] \text{ uniformly in }\theta,$$

then

$$G_n(\hat\theta) \xrightarrow{p} G, \qquad G_n(\bar\theta) \xrightarrow{p} G.$$

A sufficient condition for uniform convergence of $G_n(\theta)$ is

$$E\!\left[\sup_{\theta\in\Theta}\left\|\frac{\partial g(w_t,\theta)}{\partial\theta'}\right\|\right] < \infty.$$
Finally, by assumption,

$$\sqrt{n}\, g_n(\theta_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} g(w_t,\theta_0) \xrightarrow{d} N(0, S).$$

Therefore,

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} -(G'WG)^{-1} G'W \times N(0,S) \sim N(0, V),$$
$$V = (G'WG)^{-1} G'WSWG\,(G'WG)^{-1}.$$

Remark: for efficient GMM, $W = S^{-1}$ and

$$V = \mathrm{avar}(\sqrt{n}(\hat\theta - \theta_0)) = (G'S^{-1}G)^{-1}.$$
Remark

The preceding results imply the following simplifications, provided $\hat\theta$ is close to $\theta_0$ so that $G_n(\bar\theta) = G_n(\theta_0) + o_p(1)$:

$$\sqrt{n}\, g_n(\hat\theta) = \sqrt{n}\, g_n(\theta_0) + G_n(\bar\theta)\sqrt{n}(\hat\theta - \theta_0)
= \sqrt{n}\, g_n(\theta_0) + G_n(\theta_0)\sqrt{n}(\hat\theta - \theta_0) + o_p(1),$$

and

$$\sqrt{n}(\hat\theta - \theta_0)
= -\left[G_n(\hat\theta)'\hat{W} G_n(\bar\theta)\right]^{-1} G_n(\hat\theta)'\hat{W}\sqrt{n}\, g_n(\theta_0)
= -\left[G_n(\theta_0)'\hat{W} G_n(\theta_0)\right]^{-1} G_n(\theta_0)'\hat{W}\sqrt{n}\, g_n(\theta_0) + o_p(1),$$

where $o_p(1)$ represents terms that converge in probability to zero. These simplifications will be useful in the following derivations.
Remark

It is not wise to try to prove consistency by linearizing the FOCs

$$0 = \frac{\partial Q_n(\hat\theta)}{\partial\theta} = -G_n(\hat\theta)'\hat{W} g_n(\hat\theta)$$

using the Mean Value Theorem. This is because a $\hat\theta$ that solves the FOCs may only be a local maximum of $Q_n(\theta)$, and so $\hat\theta$ may not converge to $\theta_0$. Note that the proof of asymptotic normality assumes that $\hat\theta$ is consistent for $\theta_0$, so we avoid complications associated with local maxima.
Hypothesis Testing in Nonlinear GMM Models

The main types of hypothesis tests are:

1. overidentification restrictions;
2. coefficient restrictions (linear and nonlinear);
3. subsets of orthogonality restrictions.

These tests work the same way in nonlinear GMM as they do with linear GMM.
Remarks: 1. One should always first test the overidentifying restrictions before conducting the other tests. If the model specification is rejected, it does not make sense to do the remaining tests. 2. Testing instrument relevance is not straightforward in nonlinear models.
Asymptotic Distribution of Sample Moments and the J-statistic

By assumption, the normalized sample moment evaluated at $\theta_0$ is asymptotically normally distributed:

$$\sqrt{n}\, g_n(\theta_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} g(w_t,\theta_0) \xrightarrow{d} N(0, S), \qquad
S = \mathrm{avar}(\sqrt{n}\, g_n(\theta_0)).$$

As a result, the J-statistic evaluated at $\theta_0$ is asymptotically chi-square distributed with $K$ degrees of freedom, provided $\hat{S} \xrightarrow{p} S$:

$$J(\theta_0,\hat{S}^{-1}) = n\, g_n(\theta_0)'\hat{S}^{-1} g_n(\theta_0) \xrightarrow{d} \chi^2(K).$$
As with linear GMM, the normalized sample moment evaluated at the efficient GMM estimate $\hat\theta = \hat\theta(\hat{S}^{-1})$ is asymptotically normally distributed:

$$\sqrt{n}\,\hat{S}^{-1/2} g_n(\hat\theta) \xrightarrow{d} N(0,\ I_K - P),$$
$$P = F(F'F)^{-1}F', \qquad F = S^{-1/2}G, \qquad
G = E\!\left[\frac{\partial g(w_t,\theta_0)}{\partial\theta'}\right],$$
$$\mathrm{rank}(I_K - P) = K - L.$$
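Before turning to the derivation below, a quick numerical sanity check of the projection algebra; `G` and `S` here are randomly generated stand-ins rather than estimates from data:

```python
import numpy as np

rng = np.random.default_rng(4)
K, L = 5, 2
G = rng.normal(size=(K, L))                          # stand-in Jacobian, full column rank
A = rng.normal(size=(K, K))
S = A @ A.T + np.eye(K)                              # stand-in positive definite S
vals, vecs = np.linalg.eigh(S)
S_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric S^{-1/2}
F = S_inv_half @ G
P = F @ np.linalg.inv(F.T @ F) @ F.T
M = np.eye(K) - P
print(np.allclose(M @ M, M))                         # I - P is idempotent: True
print(round(np.trace(M)))                            # rank(I - P) = trace = K - L = 3
```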
To derive this result, combine the TSE of $g_n(\hat\theta)$ about $\theta_0$ with the FOC for the efficient GMM optimization:

$$\sqrt{n}\, g_n(\hat\theta) = \sqrt{n}\, g_n(\theta_0) + G_n(\theta_0)\sqrt{n}(\hat\theta - \theta_0) + o_p(1),$$
$$\sqrt{n}(\hat\theta - \theta_0) = -\left[G_n(\theta_0)'\hat{S}^{-1} G_n(\theta_0)\right]^{-1} G_n(\theta_0)'\hat{S}^{-1}\sqrt{n}\, g_n(\theta_0) + o_p(1),$$

to give

$$\sqrt{n}\, g_n(\hat\theta) = \sqrt{n}\, g_n(\theta_0) - G_n(\theta_0)\left[G_n(\theta_0)'\hat{S}^{-1} G_n(\theta_0)\right]^{-1} G_n(\theta_0)'\hat{S}^{-1}\sqrt{n}\, g_n(\theta_0) + o_p(1).$$
Let $\hat{S}^{-1} = (\hat{S}^{-1/2})'\hat{S}^{-1/2}$. Then write the normalized sample moment as

$$\sqrt{n}\,\hat{S}^{-1/2} g_n(\hat\theta) = \left[I_K - P_{\hat{F}}\right]\hat{S}^{-1/2}\sqrt{n}\, g_n(\theta_0) + o_p(1),$$

where

$$\hat{F} = \hat{S}^{-1/2} G_n(\theta_0), \qquad P_{\hat{F}} = \hat{F}(\hat{F}'\hat{F})^{-1}\hat{F}'.$$

Then by the asymptotic normality of $\sqrt{n}\, g_n(\theta_0)$ and Slutsky's theorem,

$$\sqrt{n}\,\hat{S}^{-1/2} g_n(\hat\theta) \xrightarrow{d} [I_K - P] \times N(0, I_K) \sim N(0,\ I_K - P),$$

since $I_K - P$ is idempotent.
From the previous result, it follows that the asymptotic distribution of the J-statistic evaluated at the efficient GMM estimate $\hat\theta(\hat{S}^{-1})$ is chi-square with $K - L$ degrees of freedom:

$$J(\hat\theta(\hat{S}^{-1}),\hat{S}^{-1}) = n\, g_n(\hat\theta(\hat{S}^{-1}))'\hat{S}^{-1} g_n(\hat\theta(\hat{S}^{-1}))
= \left(\sqrt{n}\,\hat{S}^{-1/2} g_n(\hat\theta(\hat{S}^{-1}))\right)'\left(\sqrt{n}\,\hat{S}^{-1/2} g_n(\hat\theta(\hat{S}^{-1}))\right) \xrightarrow{d} \chi^2(K - L).$$

Therefore, we reject the overidentifying restrictions at the 5% level if

$$J(\hat\theta(\hat{S}^{-1}),\hat{S}^{-1}) > \chi^2_{0.95}(K - L).$$
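A minimal helper for this decision rule, assuming the J-statistic has already been computed as above (`J_stat`, `K`, and `L` are hypothetical inputs):

```python
from scipy.stats import chi2

def j_test(J_stat, K, L, alpha=0.05):
    """Overidentification test: under H0 (valid moments), J ->d chi2(K - L).
    Returns the reject decision, the critical value, and the p-value."""
    crit = chi2.ppf(1 - alpha, df=K - L)
    p_value = chi2.sf(J_stat, df=K - L)
    return J_stat > crit, crit, p_value
```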
Testing Restrictions on Parameters: The Trilogy of Tests

For ease of exposition, consider testing the simple hypotheses

$$H_0: \theta = \theta_0 \quad\text{vs.}\quad H_1: \theta \neq \theta_0.$$

We consider three types of tests:

1. Wald test
2. GMM-LR test (difference in J-statistics)
3. GMM-LM test (GMM score test)
Let $\hat\theta = \hat\theta(\hat{S}^{-1})$ denote the efficient GMM estimator. The three test statistics have the form

$$\mathrm{Wald}_{GMM} = n(\hat\theta - \theta_0)'\left[G_n(\hat\theta)'\hat{S}^{-1} G_n(\hat\theta)\right](\hat\theta - \theta_0),$$
$$\mathrm{LR}_{GMM} = J(\theta_0,\hat{S}^{-1}) - J(\hat\theta,\hat{S}^{-1}),$$
$$\mathrm{LM}_{GMM} = n\, g_n(\theta_0)'\hat{S}^{-1} G_n(\theta_0)\left[G_n(\theta_0)'\hat{S}^{-1} G_n(\theta_0)\right]^{-1} G_n(\theta_0)'\hat{S}^{-1} g_n(\theta_0),$$

where the same value of $\hat{S}^{-1}$ is used for all statistics.

Result: Under $H_0: \theta = \theta_0$,

$$\mathrm{Wald}_{GMM},\ \mathrm{LR}_{GMM},\ \mathrm{LM}_{GMM} \xrightarrow{d} \chi^2(L).$$
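A sketch assembling the three statistics from hypothetical precomputed pieces ($g_n$ and $G_n$ evaluated at $\theta_0$ and $\hat\theta$, plus the common $\hat{S}$); the argument names are illustrative:

```python
import numpy as np

def gmm_trilogy(theta_hat, theta_0, g_bar_0, g_bar_hat, G_0, G_hat, S_hat, n):
    """Wald, GMM-LR, and GMM-LM statistics for H0: theta = theta_0, all built
    from the same S_hat.  g_bar_* = g_n(.) and G_* = G_n(.) at theta_0 / theta_hat."""
    S_inv = np.linalg.inv(S_hat)
    d = theta_hat - theta_0
    wald = n * d @ (G_hat.T @ S_inv @ G_hat) @ d
    lr = n * (g_bar_0 @ S_inv @ g_bar_0 - g_bar_hat @ S_inv @ g_bar_hat)
    b = G_0.T @ S_inv @ g_bar_0                        # G_n' S^{-1} g_n at theta_0
    lm = n * b @ np.linalg.solve(G_0.T @ S_inv @ G_0, b)
    return wald, lr, lm                                # each ->d chi2(L) under H0
```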
Derivation of the GMM-LM (score) statistic

The score statistic is based on the score of the GMM objective function:

$$0 = \frac{\partial Q_n(\hat\theta)}{\partial\theta} = -G_n(\hat\theta)'\hat{S}^{-1} g_n(\hat\theta).$$

Intuition: if $H_0: \theta = \theta_0$ is true, then

$$\frac{\partial Q_n(\theta_0)}{\partial\theta} = -G_n(\theta_0)'\hat{S}^{-1} g_n(\theta_0) \approx 0,$$

whereas if $H_0: \theta = \theta_0$ is not true, then

$$\frac{\partial Q_n(\theta_0)}{\partial\theta} = -G_n(\theta_0)'\hat{S}^{-1} g_n(\theta_0) \neq 0.$$
Assuming $H_0: \theta = \theta_0$ is true,

$$\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} = -G_n(\theta_0)'\hat{S}^{-1}\sqrt{n}\, g_n(\theta_0)
\xrightarrow{d} -G'S^{-1} \times N(0, S) \sim N(0,\ G'S^{-1}G).$$

Notice that

$$\mathrm{avar}\!\left(\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta}\right) = G'S^{-1}G.$$
Then the GMM score statistic is defined as

$$\mathrm{LM}_{GMM} = n\left(\frac{\partial Q_n(\theta_0)}{\partial\theta}\right)'\left[G_n(\theta_0)'\hat{S}^{-1} G_n(\theta_0)\right]^{-1}\left(\frac{\partial Q_n(\theta_0)}{\partial\theta}\right)$$
$$= n\, g_n(\theta_0)'\hat{S}^{-1} G_n(\theta_0)\left[G_n(\theta_0)'\hat{S}^{-1} G_n(\theta_0)\right]^{-1} G_n(\theta_0)'\hat{S}^{-1} g_n(\theta_0).$$
Then by the asymptotic normality of $\sqrt{n}\, g_n(\theta_0)$ and Slutsky's theorem,

$$\mathrm{LM}_{GMM} \xrightarrow{d} N(0,\ G'S^{-1}G)'\left[G'S^{-1}G\right]^{-1} N(0,\ G'S^{-1}G) \sim \chi^2(L).$$