Chapter 1. GMM: Basic Concepts

Contents

1 Motivating Examples
  1.1 Instrumental variable estimator
  1.2 Estimating parameters in monetary policy rules
  1.3 Estimating the parameter of risk aversion
2 Definition
3 Global and local identification
  3.1 Global identification
  3.2 Local identification
  3.3 Identified, but only weakly
4 Estimation and inference in well identified models
  4.1 The asymptotic distribution
  4.2 Efficient GMM
  4.3 Two-step and continuous updating GMM
5 Testing parametric restrictions in well identified models
  5.1 Wald Test
  5.2 Gradient test
  5.3 Distance test
6 Model diagnostics
  6.1 Formulating model diagnostics as testing for parametric restrictions
  6.2 Testing overidentifying restrictions
  6.3 Hausman's Specification Test

1. Motivating Examples

1.1. Instrumental variable estimator

Consider the following linear model with endogeneity:

$$y_t = x_t'\theta_0 + \varepsilon_t, \quad \text{with } E(x_t \varepsilon_t) \neq 0. \tag{1}$$

Suppose $z_t$ is a vector of valid instruments: $E(z_t \varepsilon_t) = 0$. For now, assume $\dim(z_t) = \dim(x_t)$. Multiplying both sides of the regression equation by $z_t$ gives

$$z_t y_t = z_t x_t'\theta_0 + z_t \varepsilon_t.$$

Taking expectations,

$$E(z_t y_t) = E(z_t x_t')\,\theta_0 \quad (\text{because } E(z_t \varepsilon_t) = 0).$$

Assuming $E(z_t x_t')$ is invertible,

$$\theta_0 = \left[E(z_t x_t')\right]^{-1} E(z_t y_t).$$

Replacing the two expectations with their sample analogues yields

$$\hat\theta_{IV} = \left(\sum_{t=1}^{T} z_t x_t'\right)^{-1} \sum_{t=1}^{T} z_t y_t.$$

In matrix notation, $\hat\theta_{IV} = (Z'X)^{-1} Z'y$.

Question: What are examples of $y_t$, $x_t$ and $z_t$ in macroeconomics? Can the above idea be generalized to nonlinear models, allowing $\dim(z_t) \neq \dim(x_t)$? We first illustrate these two issues using examples, then present a formal framework.
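As a minimal sketch of the matrix formula for the IV estimator, the snippet below simulates a just-identified model and compares IV with OLS. The data-generating process, the parameter values and all variable names are assumptions made for illustration only; they do not come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5_000
theta0 = np.array([1.0, 2.0])            # hypothetical true parameters

# Simulated data (assumption): z is a valid instrument, u makes x endogenous.
z = rng.normal(size=T)
u = rng.normal(size=T)
eps = u + rng.normal(size=T)             # error term, correlated with x through u
x = 0.8 * z + u + rng.normal(size=T)     # endogenous regressor

X = np.column_stack([np.ones(T), x])     # regressors: constant and x
Z = np.column_stack([np.ones(T), z])     # instruments: constant and z
y = X @ theta0 + eps

# IV estimator (Z'X)^{-1} Z'y, computed by solving a linear system.
theta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
# OLS for comparison; it is inconsistent here because E(x_t eps_t) != 0.
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print("IV :", theta_iv)
print("OLS:", theta_ols)
```

In this just-identified case the sample moments $Z'(y - X\hat\theta_{IV})$ are exactly zero, which is the sample counterpart of $E(z_t\varepsilon_t) = 0$.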

1.2. Estimating parameters in monetary policy rules

Clarida, Galí and Gertler (2000) estimated a forward-looking monetary policy reaction function for the postwar United States economy. We use their study to illustrate how moment conditions naturally arise in rational expectations models.

Let $r_t^*$ denote the target rate for the nominal Federal Funds rate in period $t$. The target rate in each period is a function of the gaps between expected inflation and output and their respective target levels. Specifically,

$$r_t^* = r^* + \beta\left(E[\pi_{t+1} \mid \Omega_t] - \pi^*\right) + \gamma E[x_{t+1} \mid \Omega_t], \tag{2}$$

where $\pi_{t+1}$ denotes inflation, i.e., the percentage change in the price level between time $t$ and $t+1$; $\pi^*$ is the target rate for inflation; $x_{t+1}$ is the output gap, defined as the percent deviation between actual GDP and the corresponding target; $\Omega_t$ is the information set of the agent at time $t$ when the interest rate is set; and $r^*$ is the desired interest rate when inflation and output are at their target levels.

Define the ex ante real interest rate as $rr_t = r_t - E[\pi_{t+1} \mid \Omega_t]$. Its target rate is $rr^* = r^* - \pi^*$. Then the reaction function (2) can be represented as

$$rr_t^* = rr^* + (\beta - 1)\left(E[\pi_{t+1} \mid \Omega_t] - \pi^*\right) + \gamma E[x_{t+1} \mid \Omega_t],$$

where $rr_t^* = r_t^* - E[\pi_{t+1} \mid \Omega_t]$ is the implied target for the ex ante real rate.
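The step from the nominal rule to its real-rate form is a short rearrangement. Spelled out, with $rr_t^* = r_t^* - E[\pi_{t+1} \mid \Omega_t]$ denoting the implied real-rate target (this notation is an assumption of the sketch) and using $E[\pi_{t+1} \mid \Omega_t] = (E[\pi_{t+1} \mid \Omega_t] - \pi^*) + \pi^*$:

$$
\begin{aligned}
rr_t^* &= r^* + \beta\left(E[\pi_{t+1} \mid \Omega_t] - \pi^*\right) + \gamma E[x_{t+1} \mid \Omega_t] - E[\pi_{t+1} \mid \Omega_t] \\
&= (r^* - \pi^*) + (\beta - 1)\left(E[\pi_{t+1} \mid \Omega_t] - \pi^*\right) + \gamma E[x_{t+1} \mid \Omega_t] \\
&= rr^* + (\beta - 1)\left(E[\pi_{t+1} \mid \Omega_t] - \pi^*\right) + \gamma E[x_{t+1} \mid \Omega_t].
\end{aligned}
$$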

In the above, $\beta < 1$ implies that the ex ante real rate falls with higher expected inflation; $\beta > 1$ implies the opposite. The latter policy rule is often said to be stabilizing. Distinguishing between these two cases is of substantial importance.

In practice, it may take more than one period for the interest rate to adjust toward its target. This is called interest rate smoothing. Clarida, Galí and Gertler (2000) modeled this as

$$r_t = \rho\, r_{t-1} + (1 - \rho)\, r_t^*.$$

This leads to the following policy reaction function:

$$r_t = (1 - \rho)\left[rr^* - (\beta - 1)\pi^* + \beta \pi_{t+1} + \gamma x_{t+1}\right] + \rho\, r_{t-1} + \varepsilon_t,$$

where

$$\varepsilon_t = -(1 - \rho)\left\{\beta\left[\pi_{t+1} - E(\pi_{t+1} \mid \Omega_t)\right] + \gamma\left[x_{t+1} - E(x_{t+1} \mid \Omega_t)\right]\right\}.$$

The term in curly brackets is a linear combination of forecast errors and is thus orthogonal to any variable in $\Omega_t$. This orthogonality is what delivers the desired moment restrictions, as seen below. Let $z_t$ denote a vector of instruments known at time $t$ (i.e., contained in $\Omega_t$). The above two equations then imply the following set of orthogonality conditions:

$$E\left\{\left[r_t - (1 - \rho)\left[rr^* - (\beta - 1)\pi^* + \beta \pi_{t+1} + \gamma x_{t+1}\right] - \rho\, r_{t-1}\right] z_t\right\} = 0. \tag{3}$$

In Clarida, Galí and Gertler (2000), $z_t$ includes the Funds rate, inflation, the output gap, M2 growth, and the spread between the long-term bond rate and the three-month Treasury Bill rate. Clearly, such a choice involves some arbitrariness. This is hard to avoid in practice.

Note that $rr^*$ and $(\beta - 1)\pi^*$ are not separately identifiable. Clarida, Galí and Gertler (2000) assume $rr^*$ is known and set it to the observed sample average. Equation (3) then has four unknown parameters: $\pi^*$, $\beta$, $\gamma$ and $\rho$.

Clarida, Galí and Gertler (2000) find that $\beta$ is greater than one for the Volcker-Greenspan period but less than one for the Pre-Volcker period (see Table II in their paper). They conclude that monetary policy was better managed in the Volcker-Greenspan period. However, their study has been harshly criticized subsequently, most notably by Cochrane (2011). The latter paper argues that the Taylor rule is in general not identifiable if one allows for multiple equilibria, making conventional inferential procedures invalid.

1.3. Estimating the parameter of risk aversion

Suppose a representative agent solves the following problem:

$$\max_{\{C_t\}} E\left[\sum_{t=0}^{\infty} \delta^t U(C_t)\right] \tag{4}$$

under the budget constraint

$$C_t + P_t Q_t \le R_t Q_{t-M} + W_t,$$

where
$C_t$: consumption in period $t$;
$\delta$: discount factor;
$U(\cdot)$: utility function;
$Q_t$: quantity of asset held at the end of period $t$;
$P_t$: price of asset at $t$;
$R_t$: date $t$ payoff from holding a unit of an $M$-period asset purchased at date $t - M$;
$W_t$: (real) labor income at date $t$.

Maximizing (4) leads to

$$P_t U'(C_t) = \delta^M E_t\left[R_{t+M} U'(C_{t+M})\right] \quad \text{for all } t, \tag{5}$$

where $E_t[\cdot]$ is the conditional expectation. Equivalently,

$$E_t\left[\delta^M \frac{R_{t+M} U'(C_{t+M})}{P_t U'(C_t)}\right] - 1 = 0. \tag{6}$$

Suppose we can observe $R_{t+M}$, $P_t$, $C_t$ and are willing to accept that the utility function is given by $U(C_t) = C_t^{\gamma}/\gamma$. Then (6) can be written as

$$E_t\left[\delta^M \frac{R_{t+M} C_{t+M}^{\gamma - 1}}{P_t C_t^{\gamma - 1}}\right] - 1 = 0.$$

For any information variable observable to the agent at time $t$, say $z_t$,

$$E\left[\left(\delta^M \frac{R_{t+M} C_{t+M}^{\gamma - 1}}{P_t C_t^{\gamma - 1}} - 1\right) z_t\right] = 0.$$

2. Definition

We now define GMM in a general framework. Consider the following moment restriction:

$$E[m(X_t; \theta_0)] = 0,$$

where $X_t$ is a random vector. In general, $m(X_t; \theta)$ is a vector-valued function of $X_t$ and $\theta$. If the dimension of $m(X_t; \theta_0)$ is $k$, we say there are $k$ moment restrictions. Suppose we have $q$ parameters to estimate. Then the GMM estimator can be constructed as follows. First, evaluate the function $m(X_t; \theta)$ at the observations:

$$m(X_t; \theta), \quad t = 1, \ldots, T.$$

Next, compute the sample average

$$m_T(\theta) = T^{-1} \sum_{t=1}^{T} m(X_t; \theta).$$

Finally, solve

$$m_T(\hat\theta) = 0. \tag{7}$$

The idea is very simple: the GMM estimator is obtained by matching the sample and population moments. However, if $k > q$, then (7) in general has no solution. The idea is then to combine the $k$ equations through a weighting matrix and make the resulting quadratic form as close to zero as possible, which leads to the general definition of the GMM estimator (Definition 1).
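The first two construction steps (evaluate the moment function at each observation, then average) can be sketched for the risk-aversion example. This is an illustration under stated assumptions: the horizon is set to $M = 1$, the function names `euler_moments` and `sample_moments` are hypothetical, and the data are artificial, constructed so that the pricing error is exactly zero at the chosen parameter values.

```python
import numpy as np

def euler_moments(params, gross_return, cons_growth, z):
    """Per-period moments (delta * R * g**(gamma - 1) - 1) * z_t, one row per t.

    gross_return is R_{t+1} / P_t, cons_growth is C_{t+1} / C_t and z is a
    (T, k) array of instruments. M = 1 is an assumption of this sketch.
    """
    delta, gamma = params
    pricing_error = delta * gross_return * cons_growth ** (gamma - 1.0) - 1.0
    return pricing_error[:, None] * z

def sample_moments(moment_fn, params, *data):
    """Sample average m_T(theta) = T^{-1} sum_t m(X_t; theta)."""
    return moment_fn(params, *data).mean(axis=0)

# Artificial data (assumption): returns are built so that the Euler equation
# holds exactly at (delta0, gamma0); real data would only satisfy it on average.
rng = np.random.default_rng(1)
T = 200
delta0, gamma0 = 0.97, -1.0
cons_growth = np.exp(rng.normal(0.02, 0.01, size=T))
gross_return = 1.0 / (delta0 * cons_growth ** (gamma0 - 1.0))
z = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])   # k = 3 instruments

m_true = sample_moments(euler_moments, (delta0, gamma0), gross_return, cons_growth, z)
m_other = sample_moments(euler_moments, (0.90, 0.5), gross_return, cons_growth, z)
print(m_true)    # numerically zero at the parameters used to build the data
print(m_other)   # away from zero at other parameter values
```

Here $k = 3$ and $q = 2$, so the system $m_T(\theta) = 0$ has more equations than unknowns and in general cannot be solved exactly with real data.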

Definition 1. (GMM estimator) Let $W_T$ be a $k$ by $k$ symmetric positive definite matrix such that $W_T \to_p W_0$ as the sample size $T$ approaches infinity, where $W_0$ is non-random and positive definite. The GMM estimator of $\theta_0$, denoted as $\hat\theta(W_T)$, is given by

$$\hat\theta(W_T) = \arg\min_{\theta}\; m_T(\theta)' W_T\, m_T(\theta). \tag{8}$$

Clearly, the estimator is a function of $W_T$. We return to the choice of $W_T$ later. For now, assume $W_T$ is already specified.

3. Global and local identification

We consider conditions that ensure that $E[m(X_t; \theta)] = 0$ has a unique solution.

3.1. Global identification

Definition 2. The parameter vector $\theta$ is globally identified at $\theta_0$ based on the moment function $m(\cdot)$ if $E[m(X_t; \theta)] = 0$ if and only if $\theta = \theta_0$.

A necessary condition for global identification is the "order condition".

Order Condition: If $k \ge q$, then we say the order condition for identification is satisfied.
1. If $k = q$, the model is just identified;
2. If $k > q$, the model is over identified.

The order condition is necessary but not sufficient for identification. Necessary and sufficient conditions are hard to find for nonlinear models. In practice, a weaker concept, local identification, is often considered.
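To make Definition 1 concrete, the sketch below computes the GMM estimator for linear moments $m(X_t; \theta) = z_t(y_t - x_t'\theta)$ in an over-identified case ($k = 3$, $q = 1$). For linear moments the minimizer of (8) has the closed form $(X'Z W_T Z'X)^{-1} X'Z W_T Z'y$, which is used here in place of a numerical optimizer. The simulated data, the identity weighting matrix and the function names are assumptions for illustration.

```python
import numpy as np

def gmm_objective(theta, y, X, Z, W):
    """Criterion m_T(theta)' W m_T(theta) with m_T(theta) = Z'(y - X theta) / T."""
    m = Z.T @ (y - X @ theta) / len(y)
    return float(m @ W @ m)

def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of the criterion when the moments are linear in theta."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

# Simulated data (assumption): one endogenous regressor, three valid instruments.
rng = np.random.default_rng(2)
T, k = 2_000, 3
Z = rng.normal(size=(T, k))
u = rng.normal(size=T)
x = Z @ np.array([0.6, 0.4, 0.3]) + u
X = x[:, None]                                   # q = 1 < k = 3
y = 1.5 * x + u + rng.normal(size=T)

W = np.eye(k)                                    # an arbitrary admissible weighting matrix
theta_hat = linear_gmm(y, X, Z, W)
q_hat = gmm_objective(theta_hat, y, X, Z, W)
q_up = gmm_objective(theta_hat + 0.05, y, X, Z, W)
q_down = gmm_objective(theta_hat - 0.05, y, X, Z, W)
print(theta_hat, q_hat, q_up, q_down)
```

Because $k > q$, the criterion is generally strictly positive at its minimum: the three sample moments cannot all be set to zero with a single parameter.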

3.2. Local identification

Definition 3. The parameter vector $\theta$ is locally identified at $\theta_0$ by the moment function $m(\cdot)$ if there exists a neighborhood of $\theta_0$, $B(\theta_0)$, such that inside this neighborhood $E[m(X_t; \theta)] = 0$ if and only if $\theta = \theta_0$.

Rank Condition: We say the rank condition for local identification is satisfied if the $k \times q$ matrix of derivatives

$$\frac{\partial E[m(X_t; \theta)]}{\partial \theta'} \tag{9}$$

is continuous and has full column rank $q$ at $\theta_0$.

Lemma 1. Suppose the rank of $\partial E[m(X_t; \theta)]/\partial \theta'$ is constant in a neighborhood of $\theta_0$. Then the parameter vector $\theta$ is locally identified at $\theta_0$ by the moment function $m(\cdot)$ if and only if the rank condition is satisfied.

Remark 1. If the constant rank requirement in Lemma 1 is dropped, then the rank condition is sufficient but not necessary. (That is, there are situations where the rank of $\partial E[m(X_t; \theta)]/\partial \theta'$ at $\theta_0$ is less than $q$, but $\theta$ is still identified.)

Remark 2. In the context of a linear model, for example the IV model in (1), we have

$$\frac{\partial E[m(X_t; \theta_0)]}{\partial \theta'} = \frac{\partial E[(y_t - x_t'\theta_0) z_t]}{\partial \theta'} = -E(z_t x_t').$$

Hence the rank condition is equivalent to the requirement that $E(z_t x_t')$ has rank $q$.

Proof of the Lemma. We use the arguments in Theorem 1 in Rothenberg (1971) to prove the result. Suppose $\theta_0$ is not locally identified. Then there exists an infinite sequence of vectors $\{\theta_s\}_{s=1}^{\infty}$ approaching $\theta_0$ such that, for each $s$, $E[m(X_t; \theta_0)] = E[m(X_t; \theta_s)]$. By the mean value theorem and the differentiability of $E[m(X_t; \theta)]$ in $\theta$,

$$0 = E[m_j(X_t; \theta_s)] - E[m_j(X_t; \theta_0)] = \frac{\partial E[m_j(X_t; \tilde\theta(j))]}{\partial \theta'} (\theta_s - \theta_0),$$

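where the subscript $j$ denotes the $j$-th element of the vector and $\tilde\theta(j)$ lies between $\theta_s$ and $\theta_0$ and in general depends on $j$. Let

$$d_s = \frac{\theta_s - \theta_0}{\lVert \theta_s - \theta_0 \rVert};$$

then

$$\frac{\partial E[m_j(X_t; \tilde\theta(j))]}{\partial \theta'}\, d_s = 0 \quad \text{for every } s.$$

The sequence $\{d_s\}$ is an infinite sequence on the unit sphere and therefore there exists a limit point $d$ (note that $d$ does not depend on $j$). As $\theta_s \to \theta_0$, $d_s$ approaches $d$ and, by the continuity of $\partial E[m(X_t; \theta)]/\partial \theta'$, we have

$$\lim_{s \to \infty} \frac{\partial E[m_j(X_t; \tilde\theta(j))]}{\partial \theta'}\, d_s = \frac{\partial E[m_j(X_t; \theta_0)]}{\partial \theta'}\, d = 0.$$

Because this holds for an arbitrary $j$, it holds for the full vector:

$$\frac{\partial E[m(X_t; \theta_0)]}{\partial \theta'}\, d = 0,$$

which implies

$$\operatorname{rank}\left(\frac{\partial E[m(X_t; \theta_0)]}{\partial \theta'}\right) < q.$$

To show the converse, suppose that $\partial E[m(X_t; \theta)]/\partial \theta'$ has constant rank less than $q$ in a neighborhood of $\theta_0$ denoted by $B(\theta_0)$. Consider the characteristic vector $c(\theta)$ associated with one of its zero roots. We have

$$\frac{\partial E[m(X_t; \theta)]}{\partial \theta'}\, c(\theta) = 0 \tag{10}$$

for all $\theta \in B(\theta_0)$. Because the gradient is continuous and has constant rank in $B(\theta_0)$, the vector $c(\theta)$ is continuous in $B(\theta_0)$. Consider the curve defined by the function $\theta(v)$ which solves, for $0 \le v \le \bar v$, the differential equation

$$\frac{\partial \theta(v)}{\partial v} = c(\theta(v)), \quad \theta(0) = \theta_0.$$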

Then,

$$\frac{\partial E[m(X_t; \theta(v))]}{\partial v} = \frac{\partial E[m(X_t; \theta(v))]}{\partial \theta(v)'} \frac{\partial \theta(v)}{\partial v} = \frac{\partial E[m(X_t; \theta(v))]}{\partial \theta(v)'}\, c(\theta(v)) = 0$$

for all $0 \le v \le \bar v$, where the last equality uses (10). Thus $E[m(X_t; \theta)]$ is constant on the curve. Since it equals zero at $\theta_0$, it equals zero at every point of the curve, which implies that $\theta_0$ is unidentifiable. This completes the proof.

3.3. Identified, but only weakly

Local and global identification are properties of the population. If $\theta$ is unidentified, then it is impossible to pin down its value even with an infinite sample size. In practice, the situation can be worse because we only observe a finite sample. Even if the parameters are globally identified, the GMM criterion function $m_T(\theta)' W_T m_T(\theta)$ can still be flat or nearly flat around $\theta_0$ for such a sample size. This poses a substantial challenge for inference, and has led to a sizeable literature that is often referred to as "inference under weak identification". In the remainder of this chapter, we assume the parameters are strongly identified. We return to weak identification later.

4. Estimation and inference in well identified models

If the parameter is strongly identified, then under some additional regularity conditions the GMM estimator is consistent and asymptotically Normal, converging at rate $\sqrt{T}$.

4.1. The asymptotic distribution

Proposition 1. Suppose $\theta$ is globally identified at $\theta_0$. Also, assume the following conditions are satisfied:

1. (LLN)

$$\frac{\partial m_T(\theta)}{\partial \theta'} \to_p G(\theta) = \frac{\partial E[m(X_t; \theta)]}{\partial \theta'},$$

where the convergence holds uniformly in a compact neighborhood of $\theta_0$. Assume $G(\theta)$ is continuous and write $G_0 \equiv G(\theta_0)$.

2. (CLT)

$$\sqrt{T}\, m_T(\theta_0) \to_d N(0, S_0),$$

where $S_0$ is non-random and positive definite.

Then

$$\sqrt{T}\left(\hat\theta(W_T) - \theta_0\right) \to_d N(0, V(W_0))$$

with

$$V(W_0) = \left(G_0' W_0 G_0\right)^{-1} \left(G_0' W_0 S_0 W_0 G_0\right) \left(G_0' W_0 G_0\right)^{-1}.$$

Proof: Ruud (2000), pp. 546-547.

Remark 3. The LLN and CLT require $\partial m_T(\theta)/\partial \theta'$ and $m_T(\theta)$ to be free of trends (time trends or stochastic trends). This requirement is non-trivial. For example, variables such as GDP or price indices tend to grow over time. In practice, two methods are often used to eliminate such trends. The first is to run the data through some filter. The second is to normalize the variables such that their trends cancel out. We will provide illustrations in the next chapter.

Remark 4. The limiting variance depends on the matrix $W_0$. This implies that more efficient estimators can be obtained by appropriate choices of $W_T$.

4.2. Efficient GMM

The limiting variance of GMM is minimized if $W_T$ is chosen such that (prove it!)

$$W_0 = \operatorname*{plim}_{T \to \infty} W_T = S_0^{-1}.$$

Recall that $S_0$ is the limiting variance of $\sqrt{T}\, m_T(\theta_0)$; hence efficiency is achieved by assigning more weight to moments that have smaller variances. In this case, the limiting distribution is given by

$$\sqrt{T}(\hat\theta - \theta_0) \to_d N(0, V_0) \quad \text{with} \quad V_0 = \left(G_0' S_0^{-1} G_0\right)^{-1}.$$

Remark 5. In subsequent discussions, we let $\hat\theta$ denote the efficient GMM estimator unless stated otherwise.
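The efficiency claim can be checked numerically before proving it. The sketch below evaluates the sandwich formula $V(W_0)$ for an arbitrary weighting matrix and for $W_0 = S_0^{-1}$, and inspects the eigenvalues of the difference. The matrices `G` and `S` are randomly generated stand-ins (assumptions), not estimates from any model, and a numerical check is of course not a substitute for the proof.

```python
import numpy as np

def gmm_avar(G, S, W):
    """Sandwich variance V(W) = (G'WG)^{-1} (G'WSWG) (G'WG)^{-1}."""
    bread = np.linalg.inv(G.T @ W @ G)
    meat = G.T @ W @ S @ W @ G
    return bread @ meat @ bread

rng = np.random.default_rng(3)
k, q = 4, 2
G = rng.normal(size=(k, q))            # stand-in for G_0 (full column rank with probability one)
A = rng.normal(size=(k, k))
S = A @ A.T + k * np.eye(k)            # stand-in for S_0, positive definite by construction

V_identity = gmm_avar(G, S, np.eye(k))             # W_0 = I
V_efficient = gmm_avar(G, S, np.linalg.inv(S))     # W_0 = S_0^{-1}
V0 = np.linalg.inv(G.T @ np.linalg.inv(S) @ G)     # (G_0' S_0^{-1} G_0)^{-1}

# V(I) - V(S^{-1}) should be positive semidefinite.
gap = V_identity - V_efficient
gap_eigs = np.linalg.eigvalsh((gap + gap.T) / 2.0)
print(gap_eigs)
```

With $W_0 = S_0^{-1}$ the sandwich collapses to $V_0$, and the eigenvalues of the difference are non-negative, which is the matrix sense in which $V_0$ is the smallest attainable variance.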

4.3. Two-step and continuous updating GMM

The efficient weighting matrix $S_0^{-1}$ depends on the unknown parameter $\theta_0$. Two ways to address this issue lead to the following two asymptotically equivalent estimators.

1. Obtain some preliminary estimate of $\theta$ using the identity matrix as weighting matrix. Denote the estimate as $\hat\theta_1$. Compute $\hat S_T(\hat\theta_1)$ and solve

$$\hat\theta_{GMM} = \arg\min_{\theta}\; m_T(\theta)'\, \hat S_T(\hat\theta_1)^{-1} m_T(\theta).$$

This is often referred to as the "two-step GMM estimator". It is often the default choice in practice.

2. Obtain the estimate in one step, i.e., treat $\hat S_T(\theta)$ as a function of $\theta$ and solve

$$\hat\theta_{CUGMM} = \arg\min_{\theta}\; m_T(\theta)'\, \hat S_T(\theta)^{-1} m_T(\theta).$$

This is often referred to as the "Continuous Updating GMM estimator", or CUGMM. It is used less often than the two-step estimator.

5. Testing parametric restrictions in well identified models

We consider testing restrictions of the form

$$R(\theta_0) = 0, \tag{11}$$

where $R(\cdot)$ is an $s$-vector of differentiable functions with $s < q$. It is useful to re-state the general model. It is specified by the following moment conditions:

$$E[m(X_t; \theta_0)] = 0.$$

Given a sample of size $T$, the sample moments are given by

$$m_T(\theta) = T^{-1} \sum_{t=1}^{T} m(X_t; \theta).$$

Let

$$Q_T(\theta) = m_T(\theta)'\, \hat S_T^{-1} m_T(\theta), \tag{12}$$

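where $\hat S_T$ is a consistent estimate of the limiting variance of $\sqrt{T}\, m_T(\theta_0)$. The efficient GMM estimator then solves

$$\hat\theta = \arg\min_{\theta} Q_T(\theta).$$

Let
$\hat\theta$: unrestricted GMM estimate;
$\tilde\theta$: restricted GMM estimate, i.e., the minimizer of $Q_T(\theta)$ subject to $R(\theta) = 0$.

We present the GMM counterparts to the Wald, score and likelihood ratio tests.

5.1. Wald Test

The Wald statistic evaluates the restriction at the unrestricted estimate:

$$W = \sqrt{T} R(\hat\theta)' \left[\frac{\partial R(\hat\theta)}{\partial \theta'}\, \hat V_T(\hat\theta)\, \frac{\partial R(\hat\theta)'}{\partial \theta}\right]^{-1} \sqrt{T} R(\hat\theta). \tag{13}$$

Recall that the GMM estimator satisfies

$$\sqrt{T}(\hat\theta - \theta_0) \to_d N(0, V_0).$$

Applying the Delta method,

$$\sqrt{T}\left(R(\hat\theta) - R(\theta_0)\right) \to_d N\left(0,\; \frac{\partial R(\theta_0)}{\partial \theta'}\, V_0\, \frac{\partial R(\theta_0)'}{\partial \theta}\right).$$

This implies

$$\sqrt{T}\left(R(\hat\theta) - R(\theta_0)\right)' \left[\frac{\partial R(\theta_0)}{\partial \theta'}\, V_0\, \frac{\partial R(\theta_0)'}{\partial \theta}\right]^{-1} \sqrt{T}\left(R(\hat\theta) - R(\theta_0)\right) \to_d \chi^2_s.$$

Because $R(\theta_0) = 0$ under the null hypothesis, we have

$$\sqrt{T} R(\hat\theta)' \left[\frac{\partial R(\theta_0)}{\partial \theta'}\, V_0\, \frac{\partial R(\theta_0)'}{\partial \theta}\right]^{-1} \sqrt{T} R(\hat\theta) \to_d \chi^2_s.$$

Finally, because $\hat\theta$ converges in probability to $\theta_0$, we have

$$\frac{\partial R(\hat\theta)}{\partial \theta'}\, \hat V_T(\hat\theta)\, \frac{\partial R(\hat\theta)'}{\partial \theta} \to_p \frac{\partial R(\theta_0)}{\partial \theta'}\, V_0\, \frac{\partial R(\theta_0)'}{\partial \theta}.$$

Therefore, the Wald statistic converges in distribution to $\chi^2_s$ under the null hypothesis.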
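The two-step estimator of Section 4.3 and the Wald statistic (13) can be sketched together for linear moments $m(X_t; \theta) = z_t(y_t - x_t'\theta)$. Everything specific here is an assumption made for illustration: the simulated data, the function names, and the estimate $\hat S_T = T^{-1}\sum_t \hat e_t^2 z_t z_t'$, which presumes serially uncorrelated moments. The restrictions tested are linear, $R\theta - r = 0$, so $\partial R/\partial\theta' = R$.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of m_T' W m_T for m_T(theta) = Z'(y - X theta) / T."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def moment_variance(y, X, Z, theta):
    """S_hat = T^{-1} sum_t e_t^2 z_t z_t' (assumes no serial correlation)."""
    Ze = Z * (y - X @ theta)[:, None]
    return Ze.T @ Ze / len(y)

def wald(theta, V, R, r, T):
    """Wald statistic T (R theta - r)' [R V R']^{-1} (R theta - r)."""
    gap = R @ theta - r
    return float(T * gap @ np.linalg.solve(R @ V @ R.T, gap))

# Simulated data (assumption): k = 4 instruments, q = 3 parameters.
rng = np.random.default_rng(4)
T = 3_000
Z = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])
u = rng.normal(size=T)
x1 = Z[:, 1] + 0.5 * Z[:, 2] + u                  # endogenous regressor
x2 = Z[:, 3] + rng.normal(size=T)
X = np.column_stack([np.ones(T), x1, x2])
y = X @ np.array([0.5, 1.0, 0.0]) + u + rng.normal(size=T)

# Step 1: preliminary estimate with the identity weighting matrix.
theta_1 = linear_gmm(y, X, Z, np.eye(Z.shape[1]))
# Step 2: re-estimate with the inverse of the estimated moment variance.
S_hat = moment_variance(y, X, Z, theta_1)
theta_2 = linear_gmm(y, X, Z, np.linalg.inv(S_hat))

G_hat = -Z.T @ X / T                               # derivative of m_T with respect to theta'
V_hat = np.linalg.inv(G_hat.T @ np.linalg.inv(S_hat) @ G_hat)

# A true restriction (third coefficient is zero) and a false one (second is zero).
wald_true = wald(theta_2, V_hat, np.array([[0.0, 0.0, 1.0]]), np.zeros(1), T)
wald_false = wald(theta_2, V_hat, np.array([[0.0, 1.0, 0.0]]), np.zeros(1), T)
print(theta_2, wald_true, wald_false)
```

Each statistic would be compared with a $\chi^2_1$ critical value; in this design the false restriction produces a far larger value than the true one.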

Example 1. Consider the following model with endogeneity:

$$y = X\theta + \varepsilon.$$

We want to test the linear restrictions $R\theta - r = 0$. Suppose there are $k$ instruments, summarized by the matrix $Z$, and let $P_Z = Z(Z'Z)^{-1}Z'$. Assume the errors are iid. Then

$$\hat\theta = \left(X' P_Z X\right)^{-1} X' P_Z\, y.$$

Its limiting distribution is given by

$$\sqrt{T}(\hat\theta - \theta_0) \to_d N(0, V_0) \quad \text{with} \quad V_0 = \sigma^2 \operatorname*{plim}_{T \to \infty} \left(T^{-1} X' P_Z X\right)^{-1}.$$

The Wald test is

$$W = (R\hat\theta - r)' \left[R \left(X' P_Z X\right)^{-1} R'\right]^{-1} (R\hat\theta - r) / \hat\sigma^2.$$

5.2. Gradient test

The Gradient test is the GMM counterpart to the score test. It only requires estimating the model under the null hypothesis. The test computes the derivative of the GMM criterion function (12) at the restricted estimate (multiplied by $\sqrt{T}$):

$$\sqrt{T}\, \frac{\partial Q_T(\tilde\theta)}{\partial \theta} = 2\sqrt{T}\, \frac{\partial m_T(\tilde\theta)'}{\partial \theta}\, \hat S_T^{-1} m_T(\tilde\theta) = 2\sqrt{T}\, G_T(\tilde\theta)'\, \hat S_T(\tilde\theta)^{-1} m_T(\tilde\theta). \tag{14}$$

The idea is that if the restrictions are true, then the above quantity should be close to zero. As in the score test, we form a quadratic form with a metric to judge the significance of the deviations from 0. The metric is based on the limiting variance of $\sqrt{T}\, G_T(\theta_0)' \hat S_T^{-1} m_T(\theta_0)$, that is, of (14) without the factor 2, which is given by

$$G_0' S_0^{-1} G_0 = V_0^{-1}.$$

The Gradient test is therefore

$$G = \frac{1}{4} \left(\sqrt{T}\, \frac{\partial Q_T(\tilde\theta)}{\partial \theta}\right)' \hat V_T(\tilde\theta) \left(\sqrt{T}\, \frac{\partial Q_T(\tilde\theta)}{\partial \theta}\right), \tag{15}$$

where $\hat V_T(\tilde\theta)$ is a consistent estimate of $V_0$ under the null hypothesis and the factor $1/4$ offsets the factor 2 in the gradient of $Q_T$. The null limiting distribution is $\chi^2_s$, where $s$ is the number of restrictions.

5.3. Distance test

Recall that the LR test examines the difference in the likelihood functions with and without imposing the restrictions. The same idea can be used in the GMM framework, leading to the following test:

$$D = T\left[Q_T(\tilde\theta) - Q_T(\hat\theta)\right].$$

The null limiting distribution is $\chi^2_s$, where $s$ is the number of restrictions.

6. Model diagnostics

We are interested in testing the specification of the model through testing the validity of the moment restrictions.

6.1. Formulating model diagnostics as testing for parametric restrictions

Example 2. Consider the following model with endogeneity:

$$y = X\theta + \varepsilon.$$

Suppose we have $k$ potential instruments, summarized by the matrix $Z$. We conjecture that some of the instruments may be correlated with the errors, hence not valid. Suppose we have partitioned $Z$ into $Z_1$ and $Z_2$, with $Z_2$ containing the questionable instruments. Then the problem reduces to testing the restriction

$$E\left[(y_t - x_t'\theta_0)\, z_{2,t}\right] = 0.$$
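Before moving to the general setup, the gradient and distance statistics of Sections 5.2 and 5.3 can be illustrated on a linear model with the single restriction that the last coefficient is zero. This is a sketch under assumptions: the data are simulated, the function names are hypothetical, the same $\hat S_T$ (built from first-step residuals) is used for both estimates, and the gradient statistic is computed as $(T/4)\,(\partial Q_T/\partial\theta)'\hat V(\partial Q_T/\partial\theta)$, where the $1/4$ offsets the factor 2 in the gradient of $Q_T$.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of m_T' W m_T for m_T(theta) = Z'(y - X theta) / T."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def criterion(theta, y, X, Z, S_inv):
    """Q_T(theta) = m_T(theta)' S^{-1} m_T(theta)."""
    m = Z.T @ (y - X @ theta) / len(y)
    return float(m @ S_inv @ m)

# Simulated data (assumption); the restriction "last coefficient = 0" is true.
rng = np.random.default_rng(5)
T = 2_000
Z = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])
u = rng.normal(size=T)
x1 = Z[:, 1] + Z[:, 2] + u
x2 = Z[:, 3] + rng.normal(size=T)
X = np.column_stack([np.ones(T), x1, x2])
y = X @ np.array([0.5, 1.0, 0.0]) + u + rng.normal(size=T)

# One weighting matrix, held fixed for restricted and unrestricted estimation.
theta_first = linear_gmm(y, X, Z, np.eye(Z.shape[1]))
Ze = Z * (y - X @ theta_first)[:, None]
S_inv = np.linalg.inv(Ze.T @ Ze / T)

theta_hat = linear_gmm(y, X, Z, S_inv)                            # unrestricted
theta_tilde = np.append(linear_gmm(y, X[:, :2], Z, S_inv), 0.0)   # restricted

# Distance statistic D = T [Q_T(restricted) - Q_T(unrestricted)].
D = T * (criterion(theta_tilde, y, X, Z, S_inv) - criterion(theta_hat, y, X, Z, S_inv))

# Gradient statistic from the derivative of Q_T at the restricted estimate.
B = Z.T @ X / T                                    # minus the derivative of m_T
m_tilde = Z.T @ (y - X @ theta_tilde) / T
grad = -2.0 * B.T @ S_inv @ m_tilde                # dQ_T/dtheta at theta_tilde
V_hat = np.linalg.inv(B.T @ S_inv @ B)
LM = 0.25 * T * float(grad @ V_hat @ grad)
print(D, LM)
```

Because the moments are linear and the weighting matrix is held fixed, $Q_T$ is exactly quadratic in $\theta$ and the two statistics coincide numerically; in nonlinear models they are only asymptotically equivalent.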

Consider a more general setup than the above example. Suppose we have partitioned the moments into two subsets, i.e.,

$$\underset{k \times 1}{m(X_t; \theta_0)} = \begin{bmatrix} \underset{k_1 \times 1}{m_1(X_t; \theta_0)} \\ \underset{(k - k_1) \times 1}{m_2(X_t; \theta_0)} \end{bmatrix},$$

where we believe $E(m_1(X_t; \theta_0)) = 0$ but think the second set of conditions $E(m_2(X_t; \theta_0)) = 0$ is questionable. In other words, we want to test

$$H_0: E(m_2(X_t; \theta_0)) = 0 \tag{16}$$

against the hypothesis

$$H_1: E(m_2(X_t; \theta_0)) \neq 0.$$

The problem (16) can be reformulated as testing for parametric restrictions, to which the trinity of test procedures applies. Re-write the hypothesis of interest as

$$E(m_2(X_t; \theta_0) - \lambda_0) = 0,$$

where $\lambda_0 = 0$ under the null hypothesis and nonzero under the alternative hypothesis. This leads to the following augmented moment functions:

$$m^a(X_t; \theta, \lambda) = \begin{bmatrix} m_1(X_t; \theta) \\ m_2(X_t; \theta) - \lambda \end{bmatrix}$$

and the augmented moment restrictions:

$$E\, m^a(X_t; \theta_0, \lambda_0) = 0. \tag{17}$$

By this simple transformation, the problem reduces to testing the parametric restrictions specified by $\lambda = 0$ based on the moment conditions (17).

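The augmented sample moments are given by

$$m_T^a(\theta, \lambda) = T^{-1} \sum_{t=1}^{T} m^a(X_t; \theta, \lambda).$$

The unrestricted estimates $(\hat\theta, \hat\lambda)$ are given by

$$(\hat\theta, \hat\lambda) = \arg\min_{\theta, \lambda}\; m_T^a(\theta, \lambda)'\, \hat S_T^{-1} m_T^a(\theta, \lambda).$$

The restricted estimates are $(\tilde\theta, 0)$, with

$$\tilde\theta = \arg\min_{\theta}\; m_T^a(\theta, 0)'\, \hat S_T^{-1} m_T^a(\theta, 0).$$

Note that the same weighting matrix is used for the restricted and unrestricted estimates.

Wald test. The Wald test can be constructed using the formula (13) but with $\hat\theta$ replaced by $(\hat\theta, \hat\lambda)$. The restrictions are linear and given by

$$R(\theta, \lambda) = \begin{bmatrix} 0_{(k - k_1) \times q} & I_{k - k_1} \end{bmatrix} \begin{bmatrix} \theta \\ \lambda \end{bmatrix} = 0.$$

The details are omitted.

Gradient test. The Gradient test can be constructed using the formula (15) but with $\tilde\theta$ replaced by $(\tilde\theta, 0)$.

Distance test. The Distance test is

$$D = T\left[m_T^a(\tilde\theta, 0)'\, \hat S_T^{-1} m_T^a(\tilde\theta, 0) - m_T^a(\hat\theta, \hat\lambda)'\, \hat S_T^{-1} m_T^a(\hat\theta, \hat\lambda)\right],$$

or equivalently,

$$D = T\left[m_T^a(\tilde\theta, 0)'\, \hat S_T^{-1} m_T^a(\tilde\theta, 0) - m_{1,T}(\hat\theta)'\, \hat S_{11,T}^{-1} m_{1,T}(\hat\theta)\right],$$

where $m_{1,T}(\theta)$ is the sample average of $m_1(X_t; \theta)$ and $\hat S_{11,T}$ consists of the entries in $\hat S_T$ corresponding to $m_1(X_t; \theta)$. The equivalence holds because minimizing the augmented criterion over $\lambda$ for given $\theta$ leaves $m_{1,T}(\theta)'\, \hat S_{11,T}^{-1} m_{1,T}(\theta)$, so $\hat\theta$ also minimizes this reduced criterion.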
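The second form of the Distance test can be sketched for the instrument-validity problem of Example 2: the restricted estimate uses all instruments, while the unrestricted estimate uses only the trusted block $Z_1$ with the corresponding block of $\hat S_T$. The simulated design (one deliberately invalid instrument), the function names and the form of $\hat S_T$ are assumptions for illustration.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of m_T' W m_T for m_T(theta) = Z'(y - X theta) / T."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def scaled_criterion(theta, y, X, Z, S_inv):
    """T * m_T(theta)' S^{-1} m_T(theta)."""
    m = Z.T @ (y - X @ theta) / len(y)
    return len(y) * float(m @ S_inv @ m)

# Simulated data (assumption): Z1 is valid, z2 is correlated with the error.
rng = np.random.default_rng(6)
T = 2_000
u = rng.normal(size=T)
Z1 = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])    # k1 = 3 trusted instruments
z2 = rng.normal(size=T) + 0.5 * u                               # questionable instrument
Z = np.column_stack([Z1, z2])                                   # k = 4
x = Z1[:, 1] + Z1[:, 2] + u + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])                            # q = 2
y = X @ np.array([1.0, 2.0]) + u + rng.normal(size=T)
k1 = Z1.shape[1]

# One estimate of S_T, shared by the restricted and unrestricted problems.
theta_first = linear_gmm(y, X, Z, np.eye(Z.shape[1]))
Ze = Z * (y - X @ theta_first)[:, None]
S_hat = Ze.T @ Ze / T
S_inv = np.linalg.inv(S_hat)
S11_inv = np.linalg.inv(S_hat[:k1, :k1])       # block corresponding to m_1

theta_tilde = linear_gmm(y, X, Z, S_inv)       # restricted: all moments imposed (lambda = 0)
theta_hat = linear_gmm(y, X, Z1, S11_inv)      # unrestricted: lambda concentrated out

D = scaled_criterion(theta_tilde, y, X, Z, S_inv) - scaled_criterion(theta_hat, y, X, Z1, S11_inv)
print(D)    # compared with a chi-square critical value with k - k1 = 1 degree of freedom
```

Using the sub-block of the same $\hat S_T$ for the unrestricted problem is what guarantees $D \ge 0$ in finite samples.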

6.2. Testing overidentifying restrictions

The specification tests discussed above require separating the moment conditions into two subsets. This may have undesirable consequences. In particular, if the first subset in fact contains false moment restrictions, we may end up rejecting moment conditions in the second subset even if they are valid. This suggests testing for moment restrictions without having to divide them up. The idea is then to look at the magnitude of the GMM criterion function when all moments are used: a large value indicates that some moment conditions may be invalid. The resulting procedure is usually referred to as testing for "overidentifying restrictions" because testing is possible only if $k > q$. If $k = q$, the objective function equals zero when evaluated at $\hat\theta$, and hence gives no information about the validity of the moment conditions.

The test statistic, $J$, and its limiting distribution are given by

$$J = T\, m_T(\hat\theta)'\, \hat S_T^{-1} m_T(\hat\theta) \to_d \chi^2_{k - q}.$$

The limiting distribution has $k - q$ degrees of freedom, intuitively because $q$ moment conditions are used to estimate the $q$ parameters (hence they are not "free"). Of course, such a test leaves open which moments are invalid should the test statistic appear statistically significant.

6.3. Hausman's Specification Test

Hausman (1978) proposed a general testing methodology, which in particular can be employed to test for the validity of moment restrictions. The methodology differs from the aforementioned ones because it is based on the sampling behavior of different estimators of the parameters rather than on population moments or parameters. In general, the Hausman test works as follows. To test the null hypothesis, we need two estimators:

$\tilde\theta$: Efficient under the null hypothesis but inconsistent under the alternative hypothesis;

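$\hat\theta$: Consistent under both the null and the alternative hypothesis, but less efficient under the null hypothesis.

Then,

$$H = (\tilde\theta - \hat\theta)' \left[\hat V(\hat\theta - \tilde\theta)\right]^{-1} (\tilde\theta - \hat\theta),$$

where $\hat V(\hat\theta - \tilde\theta)$ is an estimate of the variance of $\hat\theta - \tilde\theta$. It has a chi-square limiting distribution with $q$ degrees of freedom, where $q$ is the dimension of $\theta$. We illustrate the test with the following example.

Example 3. (Testing for endogeneity) Consider the model

$$y = X\theta + \varepsilon,$$

where the errors are iid. We suspect some variables are correlated with the errors. Suppose a set of valid instruments is available. We have:

If there is no endogeneity, then the OLS estimator is consistent and BLUE; 2SLS is also consistent, but less efficient.
If there is endogeneity, then the OLS estimator is not consistent, while 2SLS is consistent.

Let $\hat\theta_{OLS}$ be the OLS estimator and $\hat\theta_{2SLS}$ the 2SLS estimator. Then the Hausman test is defined as

$$H = T\, (\hat\theta_{OLS} - \hat\theta_{2SLS})' \left[\hat V\left(\sqrt{T}(\hat\theta_{OLS} - \hat\theta_{2SLS})\right)\right]^{-} (\hat\theta_{OLS} - \hat\theta_{2SLS}),$$

where $\hat V(\sqrt{T}(\hat\theta_{OLS} - \hat\theta_{2SLS}))$ is an estimate of the limiting variance of $\sqrt{T}(\hat\theta_{OLS} - \hat\theta_{2SLS})$ under the null hypothesis, and $(\cdot)^{-}$ denotes a generalized inverse, because the variance matrix may be singular. Because $\hat\theta_{OLS}$ is BLUE under the null hypothesis, we have

$$V\left(\sqrt{T}(\hat\theta_{OLS} - \hat\theta_{2SLS})\right) = V\left(\sqrt{T}(\hat\theta_{2SLS} - \theta_0)\right) - V\left(\sqrt{T}(\hat\theta_{OLS} - \theta_0)\right),$$

where $V(\sqrt{T}(\hat\theta_{2SLS} - \theta_0))$ and $V(\sqrt{T}(\hat\theta_{OLS} - \theta_0))$ are the limiting variances of $\sqrt{T}(\hat\theta_{2SLS} - \theta_0)$ and $\sqrt{T}(\hat\theta_{OLS} - \theta_0)$.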

Suppose the null hypothesis is true, i.e., there is no endogeneity. Then the test has a chi-square limiting distribution by construction. If the null hypothesis is false, then $\hat\theta_{OLS}$ is biased and inconsistent while $\hat\theta_{2SLS}$ is still consistent, so the difference $\hat\theta_{OLS} - \hat\theta_{2SLS}$ will tend to be large. This forces the test to take on a large value.
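The endogeneity test of Example 3 can be sketched as follows. The simulated design (in which the regressor is in fact endogenous), the use of OLS residuals to estimate $\sigma^2$ for both variance matrices, and the cutoff passed to the pseudo-inverse are all assumptions of this illustration. Since the constant appears among both the regressors and the instruments, the variance of the difference is singular here, which is why a generalized inverse is used.

```python
import numpy as np

# Simulated data (assumption): x is endogenous through the common shock u.
rng = np.random.default_rng(7)
T = 3_000
Z = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
u = rng.normal(size=T)
x = Z[:, 1] + Z[:, 2] + u + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y = X @ np.array([1.0, 2.0]) + u + rng.normal(size=T)

# OLS and 2SLS estimates.
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
X_fit = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)          # P_Z X, the first-stage fitted values
theta_2sls = np.linalg.solve(X_fit.T @ X, X_fit.T @ y)

# Variances under the null, using one sigma^2 estimate (from OLS residuals) for
# both, so that the difference of the two matrices is positive semidefinite.
resid = y - X @ theta_ols
sigma2 = resid @ resid / T
V_ols = sigma2 * np.linalg.inv(X.T @ X / T)
V_2sls = sigma2 * np.linalg.inv(X_fit.T @ X_fit / T)
V_diff = V_2sls - V_ols                                 # variance of sqrt(T)(OLS - 2SLS) under H0

diff = theta_ols - theta_2sls
# Generalized inverse; rcond discards the numerically zero direction of V_diff.
H = T * float(diff @ np.linalg.pinv(V_diff, rcond=1e-8) @ diff)
print(theta_ols, theta_2sls, H)
```

In this design $V$ has rank one, so $H$ would be compared with a $\chi^2_1$ critical value; the large value obtained reflects the inconsistency of OLS under endogeneity.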

References

[1] Clarida, R., Galí, J. and Gertler, M. (2000): "Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory," The Quarterly Journal of Economics, 115, 147-180.
[2] Cochrane, J.H. (2011): "Determinacy and Identification with Taylor Rules," Journal of Political Economy, 119, 565-615.
[3] Rothenberg, T.J. (1971): "Identification in Parametric Models," Econometrica, 39, 577-591.
[4] Ruud, P.A. (2000): An Introduction to Classical Econometric Theory. Oxford University Press.