Nonlinear GMM
Eric Zivot
Winter 2013
Nonlinear GMM estimation arises when the GMM moment conditions $g(w_t, \theta)$ are nonlinear functions of the model parameters $\theta$. The moment conditions may be nonlinear functions satisfying
$$E[g(w_t, \theta_0)] = 0.$$
Alternatively, for a response variable $y_t$, explanatory variables $z_t$, and instruments $x_t$, the model may define a nonlinear error term
$$\varepsilon_t = a(y_t, z_t; \theta_0), \qquad E[\varepsilon_t] = E[a(y_t, z_t; \theta_0)] = 0.$$
Note: if $z_t$ is endogenous, then nonlinear least squares cannot be used to estimate $\theta_0$. Given $x_t$ orthogonal to $\varepsilon_t$, define
$$g(w_t, \theta_0) = x_t \varepsilon_t = x_t\, a(y_t, z_t; \theta_0),$$
so that
$$E[g(w_t, \theta_0)] = E[x_t \varepsilon_t] = E[x_t\, a(y_t, z_t; \theta_0)] = 0$$
defines the GMM orthogonality conditions.
In general, the GMM moment equations produce a system of $K$ nonlinear equations in $L$ unknowns. Global identification of $\theta_0$ requires that
$$E[g(w_t, \theta_0)] = 0 \quad \text{and} \quad E[g(w_t, \theta)] \neq 0 \text{ for } \theta \neq \theta_0.$$
Local identification requires that the $K \times L$ matrix
$$G = E\!\left[\frac{\partial g(w_t, \theta_0)}{\partial \theta'}\right]$$
has full column rank $L$.

Remark: global identification does not require differentiability of $g(w_t, \theta_0)$.
Intuition for local identification: consider a first-order Taylor series expansion of $g(w_t, \theta)$ about $\theta = \theta_0$:
$$g(w_t, \theta) = g(w_t, \theta_0) + \frac{\partial g(w_t, \theta_0)}{\partial \theta'}(\theta - \theta_0) + \cdots$$
Then
$$E[g(w_t, \theta)] \approx E[g(w_t, \theta_0)] + E\!\left[\frac{\partial g(w_t, \theta_0)}{\partial \theta'}\right](\theta - \theta_0) = E\!\left[\frac{\partial g(w_t, \theta_0)}{\partial \theta'}\right](\theta - \theta_0).$$
For $\theta = \theta_0$ to be the unique solution to $E[g(w_t, \theta)] = 0$, it must be the case that
$$\operatorname{rank}\left(E\!\left[\frac{\partial g(w_t, \theta_0)}{\partial \theta'}\right]\right) = L.$$
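The local identification condition can be checked numerically for a given moment function. The sketch below is not from the notes; the function name, step size, and rank test are my choices. It builds the Jacobian $G$ by central differences on a user-supplied moment function and tests whether $G$ has full column rank $L$:

```python
import numpy as np

def has_local_identification(Eg, theta0, h=1e-6):
    """Check rank(G) = L for G = d E[g]/d theta' at theta0, using central
    differences. Illustrative sketch: Eg maps theta to the K moment values.
    """
    theta0 = np.asarray(theta0, dtype=float)
    L = theta0.size
    cols = []
    for i in range(L):
        e = np.zeros(L)
        e[i] = h
        # central difference approximation to the i-th column of G
        cols.append((Eg(theta0 + e) - Eg(theta0 - e)) / (2 * h))
    G = np.column_stack(cols)
    return np.linalg.matrix_rank(G) == L
```

A moment function whose derivative vanishes at $\theta_0$ (for example $E[g] = \theta^2$ at $\theta_0 = 0$) fails this check even though the global condition may hold.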
The sample moment condition for an arbitrary $\theta$ is
$$g_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} g(w_t, \theta).$$
If $K = L$, then $\theta_0$ is (apparently) just identified and the GMM objective function is
$$J(\theta) = n\, g_n(\theta)' g_n(\theta),$$
which does not depend on a weight matrix. The corresponding GMM estimator is
$$\hat\theta = \arg\min_{\theta} J(\theta)$$
and solves $g_n(\hat\theta) = 0$.
If $K > L$, then $\theta_0$ is (apparently) overidentified. Let $\hat W$ denote a symmetric and positive definite weight matrix, possibly dependent on the data, such that $\hat W \xrightarrow{p} W$ as $n \to \infty$, with $W$ symmetric and positive definite. The GMM estimator of $\theta_0$, denoted $\hat\theta(\hat W)$, is defined as
$$\hat\theta(\hat W) = \arg\min_{\theta} J(\theta, \hat W) = n\, g_n(\theta)' \hat W g_n(\theta).$$
The first-order conditions are
$$\frac{\partial J(\hat\theta(\hat W), \hat W)}{\partial \theta} = 2n\, G_n(\hat\theta(\hat W))' \hat W g_n(\hat\theta(\hat W)) = 0,$$
where
$$G_n(\theta) = \frac{\partial g_n(\theta)}{\partial \theta'}.$$
The efficient GMM estimator uses $\hat W = \hat S^{-1}$ such that $\hat S \xrightarrow{p} S = \operatorname{avar}(\sqrt{n}\, g_n(\theta_0))$. If $\{g(w_t, \theta_0)\}$ is an ergodic-stationary martingale difference sequence (MDS), then
$$S = E[g(w_t, \theta_0)\, g(w_t, \theta_0)'].$$
If $\{g(w_t, \theta_0)\}$ is a serially correlated linear process, then
$$S = \mathrm{LRV} = \Gamma_0 + \sum_{j=1}^{\infty}(\Gamma_j + \Gamma_j') = \Psi(1)\Sigma\Psi(1)',$$
$$\Gamma_0 = E[g(w_t, \theta_0)\, g(w_t, \theta_0)'], \qquad \Gamma_j = E[g(w_t, \theta_0)\, g(w_{t-j}, \theta_0)'].$$
As with efficient GMM estimation of linear models, the efficient GMM estimator of nonlinear models may be computed using a two-step, iterated, or continuous-updating estimator.
Computation

Since the GMM objective function is a quadratic form, the Gauss-Newton (GN) algorithm is well suited for finding the minimum. The GN algorithm starts from a first-order Taylor series approximation to $g_n(\theta)$ at a starting value $\hat\theta_1$:
$$g_n(\theta) \approx g_n(\hat\theta_1) + G_n(\hat\theta_1)(\theta - \hat\theta_1) = -\left[v_1 - G_1\theta\right],$$
where
$$v_1 = -g_n(\hat\theta_1) + G_1\hat\theta_1, \qquad G_1 = G_n(\hat\theta_1) = \frac{\partial g_n(\hat\theta_1)}{\partial \theta'}.$$
Note: the linear approximation $-g_n(\theta) = v_1 - G_1\theta$ has the same form as the linear GMM moment condition $g_n(\theta) = S_{xy} - S_{xz}\theta$, with
$$v_1 = -g_n(\hat\theta_1) + G_1\hat\theta_1 \;\leftrightarrow\; S_{xy}, \qquad G_1 = G_n(\hat\theta_1) \;\leftrightarrow\; S_{xz}.$$
Now do linear GMM using the approximate linear moment condition:
$$\min_{\theta}\; J(\theta, \hat W) = n\,[v_1 - G_1\theta]'\hat W[v_1 - G_1\theta].$$
The linear GMM estimator has the closed-form solution
$$\hat\theta_2 = \left(G_1'\hat W G_1\right)^{-1} G_1'\hat W v_1 = \left(G_1'\hat W G_1\right)^{-1} G_1'\hat W\left(-g_n(\hat\theta_1) + G_1\hat\theta_1\right) = \hat\theta_1 - \left(G_1'\hat W G_1\right)^{-1} G_1'\hat W g_n(\hat\theta_1).$$
The GN iterative algorithm is then
$$\hat\theta_{j+1} = \hat\theta_j - \left(G_j'\hat W G_j\right)^{-1} G_j'\hat W g_n(\hat\theta_j), \qquad G_j = G_n(\hat\theta_j).$$
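The GN iteration can be sketched in a few lines of Python. This is an illustrative implementation, not code from the notes; the function names, tolerance, and iteration cap are my choices:

```python
import numpy as np

def gauss_newton_gmm(g, G, theta0, W, tol=1e-6, max_iter=100):
    """Gauss-Newton iteration for min_theta n * g_n(theta)' W g_n(theta).
    g maps theta to the K sample moments g_n(theta); G maps theta to the
    K x L Jacobian G_n(theta) = d g_n / d theta'.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        Gj, gj = G(theta), g(theta)
        # GN step: theta_{j+1} = theta_j - (G_j' W G_j)^{-1} G_j' W g_n(theta_j)
        step = np.linalg.solve(Gj.T @ W @ Gj, Gj.T @ W @ gj)
        theta = theta - step
        if np.linalg.norm(step) <= tol:
            break
    return theta
```

For the efficient estimator, `W` would be $\hat S^{-1}$ computed from a first-step estimate.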
Common convergence criteria

Stop when
$$\|\hat\theta_{j+1} - \hat\theta_j\| < 10^{-6}, \qquad \|x\| = \left(x_1^2 + \cdots + x_L^2\right)^{1/2}.$$
This criterion is sensitive to the units of $\theta$. It is better to replace the fixed tolerance $10^{-6}$ by the relative tolerance $10^{-3}\left(\|\hat\theta_j\| + 10^{-5}\right)$, and stop when
$$\|\hat\theta_{j+1} - \hat\theta_j\| \le 10^{-3}\left(\|\hat\theta_j\| + 10^{-5}\right).$$
Alternatively, stop when the gradient is approximately zero,
$$\frac{\partial J(\hat\theta_j, \hat W)}{\partial \theta} \approx 0,$$
or stop when the change in the objective, $J(\hat\theta_{j+1}, \hat W) - J(\hat\theta_j, \hat W)$, is small. The latter criterion is sensitive to the units of $J(\hat\theta_j, \hat W)$; it is better to use the relative tolerance $10^{-3}\left(J(\hat\theta_j, \hat W) + 10^{-5}\right)$ and stop when
$$\left|J(\hat\theta_{j+1}, \hat W) - J(\hat\theta_j, \hat W)\right| \le 10^{-3}\left(J(\hat\theta_j, \hat W) + 10^{-5}\right).$$
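The scale-free stopping rule can be written as a small helper. This is a hypothetical helper (name and defaults are my choices), illustrating the relative criterion rather than reproducing code from the notes:

```python
import numpy as np

def converged(theta_new, theta_old, rtol=1e-3, floor=1e-5):
    """Scale-free stopping rule: ||new - old|| <= rtol * (||old|| + floor).
    The floor keeps the rule usable when theta is near zero."""
    return np.linalg.norm(theta_new - theta_old) <= rtol * (
        np.linalg.norm(theta_old) + floor)
```

The same pattern applies to the objective-based rule, with $J$ in place of $\|\theta\|$.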
Example: Student's t distribution (Hamilton, 1994)

Consider a random sample $y_1, \ldots, y_n$ from a centered Student's t distribution with $\nu_0$ degrees of freedom, with pdf
$$f(y; \nu_0) = \frac{\Gamma[(\nu_0 + 1)/2]}{(\pi\nu_0)^{1/2}\,\Gamma(\nu_0/2)}\left[1 + \frac{y^2}{\nu_0}\right]^{-(\nu_0+1)/2},$$
where $\Gamma(\cdot)$ denotes the gamma function. The goal is to estimate the degrees-of-freedom parameter $\nu_0$ by GMM using the moment conditions
$$E[y_t^2] = \frac{\nu_0}{\nu_0 - 2}, \qquad E[y_t^4] = \frac{3\nu_0^2}{(\nu_0 - 2)(\nu_0 - 4)}, \qquad \nu_0 > 4.$$
Let $w_t = (y_t^2, y_t^4)'$ and define
$$g(w_t, \nu) = \begin{pmatrix} y_t^2 - \dfrac{\nu}{\nu - 2} \\[1.5ex] y_t^4 - \dfrac{3\nu^2}{(\nu - 2)(\nu - 4)} \end{pmatrix}.$$
Then $E[g(w_t, \nu_0)] = 0$ is the moment condition used for defining the GMM estimator of $\nu_0$. Here $K = 2$ and $L = 1$, so $\nu_0$ is (apparently) overidentified. Since we assume random sampling in this example, $g(w_t, \nu_0)$ is an iid process.
Using the sample moments
$$g_n(\nu) = \frac{1}{n}\sum_{t=1}^{n} g(w_t, \nu) = \begin{pmatrix} \dfrac{1}{n}\displaystyle\sum_{t=1}^{n} y_t^2 - \dfrac{\nu}{\nu - 2} \\[1.5ex] \dfrac{1}{n}\displaystyle\sum_{t=1}^{n} y_t^4 - \dfrac{3\nu^2}{(\nu - 2)(\nu - 4)} \end{pmatrix},$$
the GMM objective function has the form
$$J(\nu) = n\, g_n(\nu)' \hat W g_n(\nu),$$
where $\hat W$ is a $2 \times 2$ positive definite and symmetric weight matrix, possibly dependent on the data, such that $\hat W \xrightarrow{p} W$.
The efficient GMM estimator uses the weight matrix $\hat S^{-1}$ such that
$$\hat S \xrightarrow{p} S = E[g(w_t, \nu_0)\, g(w_t, \nu_0)'].$$
For example, given a consistent first-step estimate $\hat\nu$, use
$$\hat S = \frac{1}{n}\sum_{t=1}^{n} g(w_t, \hat\nu)\, g(w_t, \hat\nu)'.$$
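As a numerical illustration (not from the notes), the two-step efficient GMM estimator of $\nu$ can be computed by a crude grid search over the objective; the simulated sample, the grid bounds, and the true value $\nu_0 = 12$ are my choices:

```python
import numpy as np

# Two-step efficient GMM for the Student-t degrees of freedom.
rng = np.random.default_rng(0)
y = rng.standard_t(12.0, size=50_000)          # hypothetical sample, nu0 = 12

def g_n(nu):
    """Sample moments: E[y^2] = nu/(nu-2), E[y^4] = 3 nu^2/((nu-2)(nu-4))."""
    return np.array([np.mean(y**2) - nu / (nu - 2),
                     np.mean(y**4) - 3 * nu**2 / ((nu - 2) * (nu - 4))])

def gmm_estimate(W, grid=np.linspace(4.5, 30.0, 2000)):
    """Minimize J(nu) = n g_n(nu)' W g_n(nu) by grid search (illustration;
    a derivative-based optimizer would be used in practice)."""
    J = [g_n(nu) @ W @ g_n(nu) for nu in grid]
    return grid[int(np.argmin(J))]

nu1 = gmm_estimate(np.eye(2))                  # step 1: identity weight
g_t = np.column_stack([y**2 - nu1 / (nu1 - 2),
                       y**4 - 3 * nu1**2 / ((nu1 - 2) * (nu1 - 4))])
S_hat = g_t.T @ g_t / len(y)                   # S_hat = (1/n) sum g g'
nu2 = gmm_estimate(np.linalg.inv(S_hat))       # step 2: efficient weight
```

The grid starts above 4 because the fourth-moment condition requires $\nu > 4$.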
Remarks

1. In estimating the model, the restriction $\nu > 4$ should be imposed. This may be done by reparameterization. Define $\nu = h(\gamma) = \exp(\gamma) + 4$ and then estimate $\gamma$ freely. Given a consistent and asymptotically normal estimate $\hat\gamma$, with $\sqrt{n}(\hat\gamma - \gamma_0) \xrightarrow{d} N(0, a_\gamma)$, a consistent and asymptotically normal estimate of $\nu$, by Slutsky's theorem and the delta method, is $\hat\nu = h(\hat\gamma) = \exp(\hat\gamma) + 4$, where
$$\sqrt{n}(\hat\nu - \nu_0) \xrightarrow{d} N\!\left(0,\; a_\gamma\, h'(\gamma_0)^2\right).$$

2. Since the pdf of the data is known, the most efficient estimator is the maximum likelihood estimator.
Example: MA(1) model

The MA(1) model has the form
$$y_t = \mu_0 + \varepsilon_t + \psi_0 \varepsilon_{t-1}, \qquad t = 1, \ldots, n,$$
$$\varepsilon_t \sim (0, \sigma_0^2), \qquad |\psi_0| < 1, \qquad \theta_0 = (\mu_0, \psi_0, \sigma_0^2)',$$
where the MA coefficient is written $\psi_0$ to avoid clashing with the parameter vector $\theta_0$. Some population moment equations that can be used for GMM estimation are
$$E[y_t] = \mu_0,$$
$$E[y_t^2] = \gamma_0 + \mu_0^2 = \sigma_0^2(1 + \psi_0^2) + \mu_0^2,$$
$$E[y_t y_{t-1}] = \gamma_1 + \mu_0^2 = \sigma_0^2 \psi_0 + \mu_0^2,$$
$$E[y_t y_{t-2}] = \gamma_2 + \mu_0^2 = \mu_0^2,$$
$$E[y_t y_{t-j}] = \gamma_j + \mu_0^2 = \mu_0^2, \qquad j \ge 2.$$
Let $w_t = (y_t, y_t^2, y_t y_{t-1}, y_t y_{t-2})'$ and define the $4 \times 1$ moment vector
$$g(w_t, \theta) = \begin{pmatrix} y_t - \mu \\ y_t^2 - \sigma^2(1 + \psi^2) - \mu^2 \\ y_t y_{t-1} - \sigma^2\psi - \mu^2 \\ y_t y_{t-2} - \mu^2 \end{pmatrix}.$$
Then $E[g(w_t, \theta_0)] = 0$ is the population moment condition used for GMM estimation of the model parameters $\theta_0$. Here $L = 3$ and $K = 4$, so the model is (apparently) overidentified. The process $\{g(w_t, \theta_0)\}$ will be autocorrelated (at least at lag 1) since $y_t$ follows an MA(1) process.
The sample moments are
$$g_n(\theta) = \frac{1}{n-2}\sum_{t=3}^{n} g(w_t, \theta) = \begin{pmatrix} \frac{1}{n-2}\sum_{t=3}^{n} y_t - \mu \\ \frac{1}{n-2}\sum_{t=3}^{n} y_t^2 - \sigma^2(1 + \psi^2) - \mu^2 \\ \frac{1}{n-2}\sum_{t=3}^{n} y_t y_{t-1} - \sigma^2\psi - \mu^2 \\ \frac{1}{n-2}\sum_{t=3}^{n} y_t y_{t-2} - \mu^2 \end{pmatrix}.$$
The efficient GMM objective function has the form
$$J(\theta) = (n-2)\, g_n(\theta)' \hat S^{-1} g_n(\theta),$$
where $\hat S$ is a consistent estimate of $S = \operatorname{avar}(\sqrt{n}\,\bar g(\theta_0))$. Note: since $\{g(w_t, \theta_0)\}$ is autocorrelated, $S = \mathrm{LRV}$.
1. The process $\{g(w_t, \theta_0)\}$ will be autocorrelated (at least at lag 1) since $y_t$ follows an MA(1) process. As a result, an HAC-type estimator must be used to estimate $S$, e.g. with Bartlett kernel weights,
$$\hat S_{\mathrm{HAC}} = \hat\Gamma_0(\hat\theta) + \sum_{j=1}^{q}\left(1 - \frac{j}{q+1}\right)\left(\hat\Gamma_j(\hat\theta) + \hat\Gamma_j(\hat\theta)'\right),$$
$$\hat\Gamma_j(\hat\theta) = \frac{1}{n}\sum_{t=j+1}^{n} g(w_t, \hat\theta)\, g(w_{t-j}, \hat\theta)'.$$

2. Suppose it is known that $0 < \psi_0 < 1$ and $\sigma_0^2 > 0$. These restrictions may be imposed using the reparameterization
$$\psi = \frac{\exp(\gamma_1)}{1 + \exp(\gamma_1)}, \qquad \sigma^2 = \exp(\gamma_2).$$
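An HAC estimate with Bartlett (Newey-West) weights $1 - j/(q+1)$ can be sketched as follows. This is an illustrative implementation, not code from the notes; the function name and the choice of truncation lag `q` are assumptions:

```python
import numpy as np

def hac_bartlett(g_t, q):
    """HAC estimate of S = LRV of the moments, with Bartlett kernel weights
    1 - j/(q+1); g_t is the n x K array of contributions g(w_t, theta_hat).
    """
    n = g_t.shape[0]
    S = g_t.T @ g_t / n                            # Gamma_0 hat
    for j in range(1, q + 1):
        Gamma_j = g_t[j:].T @ g_t[:-j] / n         # Gamma_j hat
        S += (1 - j / (q + 1)) * (Gamma_j + Gamma_j.T)
    return S
```

The Bartlett weights guarantee that the resulting $\hat S$ is positive semi-definite.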
3. If $\varepsilon_t \sim \text{iid } N(0, \sigma^2)$, then the MLE is the efficient estimator.
Example: log-normal stochastic volatility model

The simple log-normal stochastic volatility (SV) model, due to Taylor (1986), is given by
$$y_t = \sigma_t \varepsilon_t, \qquad t = 1, \ldots, n,$$
$$\ln \sigma_t^2 = \omega_0 + \beta_0 \ln \sigma_{t-1}^2 + \sigma_{u,0}\, u_t,$$
$$(\varepsilon_t, u_t)' \sim \text{iid } N(0, I_2), \qquad \theta_0 = (\omega_0, \beta_0, \sigma_{u,0})'.$$
For $|\beta_0| < 1$ and $\sigma_{u,0} > 0$, the series $y_t$ is strictly stationary and ergodic, and unconditional moments of all orders exist. In the SV model, the series $y_t$ is serially uncorrelated, but dependence in the higher-order moments is induced by the serially correlated stochastic volatility term $\ln \sigma_t^2$.
GMM estimation of the SV model is surveyed in Andersen and Sorensen (1996). They recommend basing GMM estimation on lower-order moments of $y_t$, since higher-order moments tend to exhibit erratic finite-sample behavior. They consider GMM estimation based on (subsets of) the 24 moments considered by Jacquier, Polson, and Rossi (1994). To describe these moment conditions, first define
$$\mu = \frac{\omega}{1 - \beta}, \qquad \sigma_{\ln\sigma}^2 = \frac{\sigma_u^2}{1 - \beta^2},$$
the mean and variance of the stationary distribution of $\ln \sigma_t^2$.
The moment conditions, which follow from properties of the log-normal distribution and the Gaussian AR(1) model, are
$$E[|y_t|] = (2/\pi)^{1/2}\, E[\sigma_t],$$
$$E[y_t^2] = E[\sigma_t^2],$$
$$E[|y_t^3|] = 2\sqrt{2/\pi}\, E[\sigma_t^3],$$
$$E[y_t^4] = 3\, E[\sigma_t^4],$$
$$E[|y_t y_{t-j}|] = (2/\pi)\, E[\sigma_t \sigma_{t-j}], \qquad j = 1, \ldots, 10,$$
$$E[y_t^2 y_{t-j}^2] = E[\sigma_t^2 \sigma_{t-j}^2], \qquad j = 1, \ldots, 10,$$
where, for any positive integer $j$ and positive constants $r$ and $s$,
$$E[\sigma_t^r] = \exp\!\left(\frac{r\mu}{2} + \frac{r^2 \sigma_{\ln\sigma}^2}{8}\right),$$
$$E[\sigma_t^r \sigma_{t-j}^s] = E[\sigma_t^r]\, E[\sigma_t^s] \exp\!\left(\frac{rs\, \beta^j \sigma_{\ln\sigma}^2}{4}\right).$$
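The log-normal moment formulas can be checked by simulation. The sketch below is an illustration, not from the notes; the SV parameter values, sample size, and seed are hypothetical choices:

```python
import numpy as np

# Check: with ln(sigma_t^2) ~ N(mu, sig2) in the stationary distribution,
# E[sigma_t^r] = exp(r*mu/2 + r^2*sig2/8).
rng = np.random.default_rng(2)
omega, beta, sigma_u = -0.7, 0.9, 0.4   # hypothetical SV parameters
mu = omega / (1 - beta)                 # mean of ln sigma_t^2
sig2 = sigma_u**2 / (1 - beta**2)       # variance of ln sigma_t^2

def E_sigma_r(r):
    """Analytic E[sigma_t^r] from the log-normal stationary distribution."""
    return np.exp(r * mu / 2 + r**2 * sig2 / 8)

# Simulate the stationary AR(1) for ln sigma_t^2 and y_t = sigma_t * eps_t
n = 100_000
h = np.empty(n)
h[0] = mu
u = rng.standard_normal(n)
for t in range(1, n):
    h[t] = omega + beta * h[t - 1] + sigma_u * u[t]
y = np.exp(h / 2) * rng.standard_normal(n)

# E[y^2] = E[sigma^2] and E[y^4] = 3 E[sigma^4] should hold approximately
print(np.mean(y**2), E_sigma_r(2))
print(np.mean(y**4), 3 * E_sigma_r(4))
```

The fourth-moment check converges slowly, which is exactly the erratic higher-order-moment behavior Andersen and Sorensen warn about.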
Let $w_t = (|y_t|, y_t^2, |y_t^3|, y_t^4, |y_t y_{t-1}|, \ldots, |y_t y_{t-10}|, y_t^2 y_{t-1}^2, \ldots, y_t^2 y_{t-10}^2)'$ and define the $24 \times 1$ vector
$$g(w_t, \theta) = \begin{pmatrix} |y_t| - (2/\pi)^{1/2} \exp\!\left(\dfrac{\mu}{2} + \dfrac{\sigma_{\ln\sigma}^2}{8}\right) \\[1.5ex] y_t^2 - \exp\!\left(\mu + \dfrac{\sigma_{\ln\sigma}^2}{2}\right) \\ \vdots \\ y_t^2 y_{t-10}^2 - E[\sigma_t^2]^2 \exp\!\left(\beta^{10} \sigma_{\ln\sigma}^2\right) \end{pmatrix}.$$
Then $E[g(w_t, \theta_0)] = 0$ is the population moment condition used for GMM estimation of the model parameters $\theta_0$. Since the elements of $w_t$ are serially correlated, the efficient weight matrix $S = \operatorname{avar}(\sqrt{n}\,\bar g)$ must be estimated using an HAC estimator.
Example: Euler equation asset pricing model

A representative agent is assumed to choose an optimal consumption path by maximizing the present discounted value of lifetime utility from consumption,
$$\max\; E\!\left[\sum_{j=0}^{\infty} \beta_0^j\, U(c_{t+j}) \,\middle|\, I_t\right],$$
subject to the budget constraint
$$c_t + p_t q_t \le y_t + q_{t-1},$$
where $I_t$ denotes the information available at time $t$, $c_t$ denotes real consumption at $t$, $y_t$ denotes real labor income at $t$, $p_t$ denotes the price of a pure discount bond maturing at time $t+1$ that pays 1 at $t+1$, $q_t$ represents the quantity of bonds held at $t$, and $0 < \beta_0 < 1$ represents a time discount factor.
The first-order condition for the maximization problem may be represented as the conditional moment equation (Euler equation)
$$E\!\left[(1 + r_{t+1})\,\beta_0\,\frac{U'(c_{t+1})}{U'(c_t)} \,\middle|\, I_t\right] - 1 = 0, \qquad 1 + r_{t+1} = \frac{1}{p_t}.$$
Assume a power utility function
$$U(c_t) = \frac{c_t^{1-\gamma_0}}{1 - \gamma_0}, \qquad \gamma_0 > 0,$$
where $\gamma_0$ is the risk aversion parameter.
Then
$$\frac{U'(c_{t+1})}{U'(c_t)} = \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma_0},$$
and the conditional moment equation becomes
$$E\!\left[(1 + r_{t+1})\,\beta_0\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma_0} \,\middle|\, I_t\right] - 1 = 0.$$
Define the nonlinear error term as
$$\varepsilon_{t+1} = a(c_{t+1}/c_t,\, r_{t+1};\, \beta_0, \gamma_0) = (1 + r_{t+1})\,\beta_0\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma_0} - 1 = a(z_{t+1}, \theta_0),$$
$$z_{t+1} = (c_{t+1}/c_t,\, r_{t+1})', \qquad \theta_0 = (\beta_0, \gamma_0)'.$$
Then the conditional moment equation may be represented as
$$E[\varepsilon_{t+1} \mid I_t] = E[a(z_{t+1}, \theta_0) \mid I_t] = 0.$$
Since $\{\varepsilon_{t+1}, I_{t+1}\}$ is an MDS, potential instruments $x_t$ include current and lagged values of the elements of $z_t$, as well as a constant. For example, one could use
$$x_t = (1,\; c_t/c_{t-1},\; c_{t-1}/c_{t-2},\; r_t,\; r_{t-1})'.$$
Since $x_t \in I_t$, the conditional moment implies that
$$E[x_t \varepsilon_{t+1} \mid I_t] = E[x_t\, a(z_{t+1}, \theta_0) \mid I_t] = 0,$$
and by the law of iterated expectations the conditional moment equation implies the unconditional moment equation
$$E[x_t \varepsilon_{t+1}] = 0.$$
For GMM estimation, define the nonlinear residual as
$$\varepsilon_{t+1}(\theta) = (1 + r_{t+1})\,\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} - 1$$
and form the $5 \times 1$ vector of moments
$$g(w_{t+1}, \theta) = x_t\, \varepsilon_{t+1}(\theta) = \begin{pmatrix} (1 + r_{t+1})\,\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} - 1 \\[1ex] \left(\frac{c_t}{c_{t-1}}\right)\left[(1 + r_{t+1})\,\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} - 1\right] \\[1ex] \left(\frac{c_{t-1}}{c_{t-2}}\right)\left[(1 + r_{t+1})\,\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} - 1\right] \\[1ex] r_t\left[(1 + r_{t+1})\,\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} - 1\right] \\[1ex] r_{t-1}\left[(1 + r_{t+1})\,\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} - 1\right] \end{pmatrix}.$$
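Building this moment vector from data can be sketched as follows. The data layout is a hypothetical choice (not from the notes): `c_ratio[t]` holds $c_{t+1}/c_t$ and `r[t]` holds $r_{t+1}$, so row $t$ pairs the residual at $t+1$ with instruments dated $t$ and earlier:

```python
import numpy as np

def euler_moments(c_ratio, r, beta, gamma):
    """Moment contributions x_t * eps_{t+1}(theta) for the Euler equation,
    with instruments x_t = (1, c_t/c_{t-1}, c_{t-1}/c_{t-2}, r_t, r_{t-1})'.
    """
    eps = (1 + r[2:]) * beta * c_ratio[2:] ** (-gamma) - 1.0
    x = np.column_stack([np.ones(len(eps)),
                         c_ratio[1:-1],   # c_t / c_{t-1}
                         c_ratio[:-2],    # c_{t-1} / c_{t-2}
                         r[1:-1],         # r_t
                         r[:-2]])         # r_{t-1}
    return x * eps[:, None]               # (n-2) x 5; column means give g_n
```

Averaging the rows gives the sample moment vector $g_n(\theta)$, and the outer products of the rows give the MDS estimate of $S$.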
There are $K = 5$ moment conditions to identify $L = 2$ model parameters, giving $K - L = 3$ overidentifying restrictions.

Note: $\{g(w_{t+1}, \theta_0)\}$ is an ergodic-stationary MDS by assumption, so that
$$S = E[g(w_{t+1}, \theta_0)\, g(w_{t+1}, \theta_0)'].$$
The GMM objective function is
$$J(\theta, \hat S^{-1}) = (n-2)\, g_n(\theta)' \hat S^{-1} g_n(\theta),$$
where $\hat S$ is a consistent estimate of $S = \operatorname{avar}(\sqrt{n}\,\bar g)$. Since $\{g(w_{t+1}, \theta_0)\}$ is an MDS, $S$ may be estimated using
$$\hat S = \frac{1}{n-2}\sum_{t=3}^{n} g(w_{t+1}, \hat\theta)\, g(w_{t+1}, \hat\theta)',$$
where $\hat\theta$ is a consistent first-step estimate of $\theta$.
An extension of the model allows the individual to invest in $N$ risky assets with returns $r_{j,t+1}$ ($j = 1, \ldots, N$), as well as a risk-free asset with certain return $r_{0,t+1}$. Assuming power utility and restricting attention to unconditional moments (i.e., using $x_t = 1$), the Euler equations may be written as
$$E\!\left[(1 + r_{0,t+1})\,\beta_0\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma_0}\right] - 1 = 0,$$
$$E\!\left[(r_{j,t+1} - r_{0,t+1})\,\beta_0\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma_0}\right] = 0, \qquad j = 1, \ldots, N.$$
For GMM estimation, one may use the $(N+1) \times 1$ vector of moments
$$g(w_{t+1}, \theta) = \begin{pmatrix} (1 + r_{0,t+1})\,\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} - 1 \\[1ex] (r_{1,t+1} - r_{0,t+1})\,\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} \\ \vdots \\ (r_{N,t+1} - r_{0,t+1})\,\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} \end{pmatrix}.$$
Example: interest rate diffusion model

Consider estimating the parameters of the continuous-time interest rate diffusion model
$$dr_t = (\alpha_0 + \beta_0 r_t)\,dt + \sigma_0 r_t^{\gamma_0}\, dW_t,$$
$$\theta_0 = (\alpha_0, \beta_0, \sigma_0, \gamma_0)', \qquad W_t = \text{Wiener process}.$$
Moment conditions for GMM estimation of $\theta_0$ may be derived from the Euler discretization
$$r_{t+\Delta} - r_t = (\alpha_0 + \beta_0 r_t)\Delta + \sigma_0 r_t^{\gamma_0}\sqrt{\Delta}\,\eta_{t+\Delta},$$
$$\eta_{t+\Delta} \sim (0, 1), \qquad E[\eta_{t+\Delta}] = 0, \qquad E[\eta_{t+\Delta}^2] = 1.$$
Define the true model error as
$$\varepsilon_{t+\Delta} = a(r_{t+\Delta}, r_t;\, \alpha_0, \beta_0) = r_{t+\Delta} - r_t - (\alpha_0 + \beta_0 r_t)\Delta = \sigma_0 r_t^{\gamma_0}\sqrt{\Delta}\,\eta_{t+\Delta} = a(z_{t+\Delta}, \theta_0),$$
$$z_{t+\Delta} = (r_{t+\Delta}, r_t)'.$$
Letting $I_t$ represent the information available at time $t$, the true error satisfies $E[\varepsilon_{t+\Delta} \mid I_t] = 0$. Since $\{\varepsilon_{t+\Delta}, I_{t+\Delta}\}$ is an MDS, potential instruments $x_t$ include current and lagged values of the elements of $z_t$, as well as a constant. Using $x_t = (1, r_t)'$ as the instrument vector, the following four conditional moments may be deduced:
$$E[\varepsilon_{t+\Delta} \mid I_t] = 0, \qquad E[\varepsilon_{t+\Delta}^2 \mid I_t] = \sigma_0^2 r_t^{2\gamma_0}\Delta,$$
$$E[r_t \varepsilon_{t+\Delta} \mid I_t] = 0, \qquad E[r_t \varepsilon_{t+\Delta}^2 \mid I_t] = \sigma_0^2 r_t^{2\gamma_0 + 1}\Delta.$$
For given values of $\alpha$ and $\beta$, define the nonlinear residual
$$\varepsilon_{t+\Delta}(\alpha, \beta) = r_{t+\Delta} - r_t - (\alpha + \beta r_t)\Delta,$$
and, for $w_{t+\Delta} = (r_{t+\Delta}, r_t)'$, define the $4 \times 1$ vector of moments
$$g(w_{t+\Delta}, \theta) = \begin{pmatrix} \varepsilon_{t+\Delta} \\ \varepsilon_{t+\Delta}^2 - \sigma^2 r_t^{2\gamma}\Delta \end{pmatrix} \otimes x_t = \begin{pmatrix} \varepsilon_{t+\Delta} \\ r_t \varepsilon_{t+\Delta} \\ \varepsilon_{t+\Delta}^2 - \sigma^2 r_t^{2\gamma}\Delta \\ r_t\left(\varepsilon_{t+\Delta}^2 - \sigma^2 r_t^{2\gamma}\Delta\right) \end{pmatrix}.$$
Then $E[g(w_{t+\Delta}, \theta_0)] = 0$ gives the GMM estimating equations. Even though $\{\varepsilon_{t+\Delta}\}$ is an MDS, the moment vector $g(w_{t+\Delta}, \theta_0)$ is likely to be autocorrelated since it contains $\varepsilon_{t+\Delta}^2$. However, since $K = L = 4$, the model is just identified, and so the GMM objective function does not depend on a weight matrix:
$$J(\theta) = n\, g_n(\theta)' g_n(\theta).$$
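Because the model is just identified, $g_n(\hat\theta) = 0$ can be solved exactly. The sketch below is not from the notes: the first two moments deliver $(\alpha, \beta)$ as least-squares normal equations, and taking the ratio of the last two moments eliminates $\sigma^2$, leaving one equation in $\gamma$ that can be solved by bisection:

```python
import numpy as np

def ckls_gmm(r, dt):
    """Just-identified GMM for the Euler-discretized diffusion
    r_{t+dt} - r_t = (a + b r_t) dt + s r_t^g sqrt(dt) eta.
    """
    r0, r1 = r[:-1], r[1:]
    # E[eps] = 0 and E[r_t eps] = 0 are the OLS normal equations
    X = np.column_stack([np.ones_like(r0), r0]) * dt
    (a, b), *_ = np.linalg.lstsq(X, r1 - r0, rcond=None)
    eps = r1 - r0 - (a + b * r0) * dt
    # E[eps^2] = s^2 r^{2g} dt and E[r eps^2] = s^2 r^{2g+1} dt:
    # their ratio eliminates s^2, leaving one monotone equation in g
    target = np.mean(r0 * eps**2) / np.mean(eps**2)
    f = lambda g: np.mean(r0**(2 * g + 1)) / np.mean(r0**(2 * g)) - target
    lo, hi = -2.0, 5.0                   # assumed bracketing interval
    for _ in range(100):                 # bisection
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    g = 0.5 * (lo + hi)
    s = np.sqrt(np.mean(eps**2) / (dt * np.mean(r0**(2 * g))))
    return a, b, s, g
```

By construction the returned estimates set all four sample moments to (numerically) zero.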