
University of Illinois, Department of Economics
Spring 2017, Econ 574, Roger Koenker

Lecture 14: GEE-GMM

Throughout the course we have emphasized methods of estimation and inference based on the principle of maximum likelihood, which require a complete specification of the probability model describing the mechanism generating the data. Even when we considered M-estimators and the idea of quasi-maximum likelihood, in which the specified probability model was admittedly not the true data generating mechanism, we nevertheless presumed a complete probabilistic model for the data, and the KLIC was used to implicitly define the relevant population parameters of the model.

Occasionally we have departed from this paradigm. In quantile regression, for example, we need not provide a fully specified probabilistic model; we require only a parametric model describing a single conditional quantile function. Of course, if we proceed to specify a parametric model for all conditional quantile functions we are back, essentially, to the framework underlying the MLE. Another example of a partially specified parametric model is the classical linear regression model, which we may interpret as simply a specification of the parametric form of the conditional mean function. Strengthening the assumptions to specify a form for the conditional density at each design point, $x_i$, would complete the full probability model for the data, but this is certainly not necessary to justify ordinary regression M-estimators.

These examples illustrate the method-of-moments estimation paradigm. A parametric model is specified by asserting the parametric form of certain observable functions of the data. Then, under certain rank conditions on the Jacobian of these functions, the functions can be inverted, or solved, for the parameters of interest.
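The invert-the-moment-map idea can be sketched numerically. A minimal illustration, not from the lecture: for a Gamma$(\alpha, \lambda)$ sample the model asserts $EZ = \alpha/\lambda$ and $\mathrm{Var}\, Z = \alpha/\lambda^2$, and solving these two equations at the sample moments gives closed-form estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Classical method of moments for a Gamma(alpha, lam) sample:
# the model asserts E Z = alpha/lam and Var Z = alpha/lam**2,
# and the two moment equations invert in closed form.
alpha0, lam0 = 3.0, 2.0
z = rng.gamma(shape=alpha0, scale=1.0 / lam0, size=100_000)

m1 = z.mean()
v = z.var()                 # second central moment
alpha_hat = m1 ** 2 / v     # invert the moment map
lam_hat = m1 / v

print(alpha_hat, lam_hat)   # close to (3, 2)
```

With two parameters and two moment equations the model is exactly identified, which is the situation the rank condition on the Jacobian addresses.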
This approach to estimation was advanced by Sims and elaborated by Hansen (1982), and it has proven highly successful, particularly in addressing specification problems in macroeconomics, where fully parametric statistical models are difficult to maintain but certain orthogonality conditions may be used to specify identifying moment conditions. Indeed, the prime examples of such model specifications and their associated methods of estimation are classical regression and instrumental variables estimators in econometrics. Consider the linear model
$$y_i = x_i'\beta + u_i, \quad i = 1, \dots, n,$$
and suppose we have instruments $\{z_i\}$, so we claim that $E z_i u_i = 0$, i.e., that the IVs are orthogonal to the $u_i$'s, and that $E z_i x_i'$ is (asymptotically) invertible, requiring at the very least that there be as many coordinates of $z_i$ as of $\beta$. If the latter condition is exactly satisfied, so that we have $\beta$ exactly identified by the

orthogonality conditions, we can define the IV estimator as
$$\hat\beta = \Big(\sum z_i x_i'\Big)^{-1} \sum z_i y_i = (Z'X)^{-1} Z'y.$$
Under the classical condition that $Euu' = \sigma^2 I$, it is easy to see that
$$\sqrt{n}(\hat\beta - \beta) \leadsto N\big(0,\ \sigma^2 (Z'X/n)^{-1} (Z'Z/n) (X'Z/n)^{-1}\big),$$
provided these matrices are invertible and the Lindeberg condition $\max_i \|z_i\|^2 / \sum_i \|z_i\|^2 \to 0$ holds. (The latter condition is needed for the CLT on $n^{-1/2}\sum z_i u_i$ to apply.)

The condition that the matrix $Z'X/n$ be invertible, which is really an identifiability condition, is obviously quite restrictive. In general, we might hope to have more IVs than elements of $\beta$, i.e., to have a parametric model which is overidentified. In this case we cannot expect to satisfy the empirical counterpart of our population orthogonality conditions $EZ'u = 0$ exactly, since we have a system of $q > p$ equations
$$Z'\hat u = Z'(y - X\beta) \equiv m(\beta)$$
in only $p = \dim(\beta)$ unknowns. It is reasonable to replace the infeasible requirement that $Z'\hat u = 0$ by the requirement that it be small in some reasonable norm, e.g.,
$$\hat u' Z A^{-1} Z' \hat u = \min!$$
The immediate question is then: how should we choose the matrix $A$? A natural response is to set $A$ equal to the covariance matrix of the orthogonality conditions, i.e.,
$$A = \mathrm{Cov}(Z'u) = E Z'uu'Z = \sigma^2 Z'Z.$$
This leads to the estimator
$$\hat\beta = \arg\min_\beta \hat u(\beta)' P_Z \hat u(\beta) = (X'P_Z X)^{-1} X'P_Z y,$$
where $P_Z = Z(Z'Z)^{-1}Z'$, and thus gives us the familiar two stage least squares estimator.

This approach can be extended in various ways. If $Euu' = \Omega$, then it is clear that we should take $A = Z'\Omega Z$, and $\hat\beta$ is adapted accordingly. The response function, i.e., the conditional mean function, may also be nonlinear in parameters, so that, for example, $\hat u_i(\beta) = y_i - g(x_i, \beta)$.
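As a numerical check on the two stage least squares formula $\hat\beta = (X'P_Z X)^{-1} X'P_Z y$, here is a small simulated sketch; the design, the instrument strength, and all numbers are illustrative assumptions, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two stage least squares, beta_hat = (X' P_Z X)^{-1} X' P_Z y with
# P_Z = Z (Z'Z)^{-1} Z'.  Overidentified design: one endogenous
# regressor, two instruments.
n = 50_000
zz = rng.normal(size=(n, 2))                 # instruments
v = rng.normal(size=n)                       # first-stage error
u = 0.8 * v + rng.normal(size=n)             # structural error, corr. with x
x = zz @ np.array([1.0, 0.5]) + v            # endogenous regressor
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), zz])
y = X @ np.array([1.0, 2.0]) + u             # true beta = (1, 2)

PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)  # P_Z X without forming P_Z
beta_2sls = np.linalg.solve(PZX.T @ X, PZX.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_2sls, beta_ols)
```

Because $u$ is correlated with $x$ through $v$, the OLS slope is biased upward here, while the 2SLS estimate recovers the true coefficients.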

In this case we may still define $\hat\beta$ as the minimizer
$$\hat\beta = \arg\min_\beta \hat u(\beta)' P_Z \hat u(\beta),$$
where we now have the nonlinear two stage least squares estimator. It is straightforward to derive the asymptotic behavior of this estimator using our usual strategy. Let $J = \nabla_\beta \hat u(\beta)$ denote the Jacobian matrix of the model, so optimality requires that $J' P_Z \hat u(\beta) = 0$ at $\beta = \hat\beta$. Expanding this condition around the true parameter $\beta_0$ and evaluating at $\beta = \hat\beta$, we obtain
$$0 = J' P_Z \big(\hat u(\beta_0) + J(\hat\beta - \beta_0)\big) + (\hat\beta - \beta_0)' H' P_Z \hat u(\beta_0) + R,$$
where $H = \nabla^2_\beta \hat u(\beta)$ and $R$ is the remainder. Then, writing
$$\sqrt{n}(\hat\beta - \beta_0) = -\big(J' P_Z J/n + H' P_Z \hat u(\beta_0)/n\big)^{-1} n^{-1/2} J' P_Z \hat u(\beta_0) + o_p(1)$$
and noting that the Hessian term tends to zero, we have the linear representation
$$\sqrt{n}(\hat\beta - \beta_0) = -(J' P_Z J/n)^{-1} n^{-1/2} J' P_Z \hat u(\beta_0) + o_p(1),$$
and, assuming that $E\hat u(\beta_0)\hat u(\beta_0)' = \sigma^2 I$, we have
$$\sqrt{n}(\hat\beta - \beta_0) \leadsto N\big(0,\ \sigma^2 (J' P_Z J/n)^{-1}\big).$$
How would this change if $\sigma^2 I$ were $\Omega$? What should be used in place of $P_Z$ in this case? If we continue to use $P_Z$, we obtain
$$V = (J'P_Z J)^{-1} J' P_Z \Omega P_Z J (J'P_Z J)^{-1}$$
for the limiting form of the covariance matrix. However, as above, if we replace $P_Z$ by
$$\tilde P_Z = Z(Z'\Omega Z)^{-1} Z',$$
then we see that $V$ collapses to
$$\tilde V = (J' \tilde P_Z J)^{-1}.$$
It is easy to see that $\tilde V \ll V$ in the sense of positive definite matrices, so $\tilde P_Z$ is clearly preferred to $P_Z$. The difficulty, of course, is that we need to estimate $\Omega$, or at least the lower dimensional matrix $Z'\Omega Z$. This is very much like the familiar Eicker-White problem and can be handled analogously. In the case of dependence the approach of Newey-West can be adopted.

Inference in GMM estimation

Not surprisingly, GMM inference can be conducted using any of the three approaches already considered in the discussion of likelihood based inference. Clearly, Wald tests based on the large-sample theory outlined above are possible. The GMM criterion function can itself be used as a

quasi-likelihood for constructing LR-type tests, and finally we may construct LM tests based on the gradient of the GMM criterion evaluated at the restricted estimate. Of these approaches, the LR approach would appear to offer the most attractive strategy, particularly in situations in which the objective function may be highly non-quadratic.

Information theoretic approaches to GMM inference

Imbens, Johnson and Spady (1998) considered some alternative approaches to GMM inference based on information theoretic ideas. Following their notation, suppose we have iid observations $\{z_i\}_{i=1}^n$ from $F$ and moment conditions
$$E\psi(Z, \theta_0) = 0,$$
where $\theta_0 \in \mathbb{R}^p$ constitutes the unique solution to the $q \gg p$ equations $E\psi(Z, \theta) = 0$. This may be viewed as a GMM problem with criterion function
$$Q(\theta) = \tfrac{n}{2}\, \bar\psi' W^{-1} \bar\psi,$$
where $\bar\psi = n^{-1}\sum_i \psi(z_i, \theta)$ and (optimally) we would try to choose $W = E\psi\psi'$. As we have seen, the minimizer of $Q(\cdot)$, say $\hat\theta$, satisfies
$$\sqrt{n}(\hat\theta - \theta_0) \leadsto N\big(0,\ (H'J^{-1}H)^{-1}\big),$$
where $H = E\nabla_\theta\psi$ and $J = E\psi\psi'$; furthermore,
$$2Q(\hat\theta) \leadsto \chi^2_{q-p},$$
which provides a test of overidentifying restrictions. Usually, as we have seen, this is accomplished in two steps. In the first step we replace $W$ by some preliminary estimate, such as $I_q$, and in the second step we use the first stage estimator, say $\tilde\theta$, to obtain
$$\hat W = n^{-1}\sum_i \psi(z_i, \tilde\theta)\psi(z_i, \tilde\theta)'$$
and minimize again to obtain $\hat\theta$.

An alternative to this two-step procedure is the empirical likelihood (Owen (1988), Qin and Lawless (1994)) estimator, which solves
$$\max_{\pi, \theta} \sum_i n^{-1}\big(\log \pi_i - \log(n^{-1})\big)$$
subject to
$$\sum_i \psi(z_i, \theta)\pi_i = 0, \qquad \sum_i \pi_i = 1.$$
The solution to this problem may be seen to solve the system of equations
$$0 = \begin{pmatrix} \sum_i \nabla_\theta\psi(z_i, \theta)' t \,/\, (1 + t'\psi(z_i, \theta)) \\ \sum_i \psi(z_i, \theta) \,/\, (1 + t'\psi(z_i, \theta)) \end{pmatrix}.$$
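For a fixed $\theta$, the block of this system involving only $\psi$ can be solved for the tilting parameter $t$ by Newton's method, after which the implied probabilities are $\pi_i = n^{-1}/(1 + t'\psi_i)$. A numerical sketch, in which the moment function $\psi(z, \theta) = z - \theta$ and all numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Inner empirical likelihood problem at a fixed theta: solve
#   sum_i psi_i / (1 + t'psi_i) = 0
# for the tilting parameter t by Newton's method, then recover
#   pi_i = n^{-1} / (1 + t'psi_i).
z = rng.normal(loc=0.1, size=500)
theta = 0.0                          # candidate value, not the sample mean
psi = (z - theta).reshape(-1, 1)     # n x q moment contributions, q = 1
n, q = psi.shape

t = np.zeros(q)
for _ in range(50):
    d = 1.0 + psi @ t                                   # 1 + t'psi_i
    grad = (psi / d[:, None]).sum(axis=0)
    hess = -(psi[:, :, None] * psi[:, None, :]
             / (d ** 2)[:, None, None]).sum(axis=0)
    t = t - np.linalg.solve(hess, grad)

pi = 1.0 / (n * (1.0 + psi @ t))
# The weights sum to one and enforce the moment condition exactly.
print(pi.sum(), (pi * z).sum())
```

At the solution the weights automatically satisfy $\sum \pi_i = 1$, since $\sum_i 1/(1+t'\psi_i) = n - t'\sum_i \psi_i/(1+t'\psi_i) = n$.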

In this system, $t \in \mathbb{R}^q$ denotes a vector of $q$ Lagrange multiplier parameters, called the tilting parameter. The form of the solution can be seen by concentrating with respect to the $\pi_i$'s. Differentiating the Lagrangian with respect to $\pi_i$, we obtain
$$n^{-1}/\pi_i - \lambda - t'\psi_i = 0.$$
Multiplying by $\pi_i$ and summing gives
$$n^{-1}\sum_i 1 - \lambda \sum_i \pi_i - t'\sum_i \psi_i \pi_i = 0.$$
Note that the last term is zero and $\sum \pi_i = 1$, so $\lambda = 1$, and we have from the first equation
$$n^{-1}/\pi_i = 1 + t'\psi_i.$$
Substituting back into the constraints and differentiating with respect to $\theta$ and $t$ yields the estimating equations as given.

An alternative suggested by Imbens, Johnson and Spady is the exponential tilting estimator, which replaces the empirical likelihood criterion by
$$\max_{\pi, \theta} \sum_i \pi_i\big(\log(n^{-1}) - \log \pi_i\big) \quad \text{s.t.} \quad \sum_i \psi_i \pi_i = 0, \quad \sum_i \pi_i = 1.$$
Note that this just reverses the prior KL divergence expression. A similar argument to the one just employed yields the modified estimating equations,
$$0 = \begin{pmatrix} \sum_i \nabla_\theta\psi(z_i, \theta)' t \exp(t'\psi(z_i, \theta)) \\ \sum_i \psi(z_i, \theta)\exp(t'\psi(z_i, \theta)) \end{pmatrix}.$$
In this approach
$$\pi_i = e^{t'\psi_i} \Big/ \sum_j e^{t'\psi_j},$$
and our problem becomes
$$\max_{t, \theta} \Big\{ -\sum_i t'\psi_i\, e^{t'\psi_i} \Big/ \sum_j e^{t'\psi_j} + \log\Big(n^{-1}\sum_j e^{t'\psi_j}\Big) \Big\} \quad \text{s.t.} \quad \sum_i \psi_i e^{t'\psi_i} \Big/ \sum_j e^{t'\psi_j} = 0.$$
When the constraints are satisfied the first term is identically zero, so we can focus on the second term, which we may recognize as the (empirical) cumulant generating function of $\psi$. Thus the problem may be written more compactly as
$$\max_{t, \theta} K(t, \theta) \quad \text{s.t.} \quad K_t(t, \theta) = 0,$$
where $K_t = \nabla_t K(t, \theta)$; higher order derivatives will be denoted similarly, $K_{t\theta}$, $K_{\theta\theta}$, etc. Note that at a solution both $K_t(t, \theta) = 0$ and $K_\theta(t, \theta) = 0$. In practice the constrained problem may be replaced by the unconstrained problem
$$\max_{t, \theta}\ K(t, \theta) - \tfrac{1}{2} A\, K_t' W^{-1} K_t,$$
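The inner exponential tilting step can likewise be sketched numerically: at a fixed $\theta$, Newton's method on $K_t(t) = 0$, with $K$ the empirical cumulant generating function, yields the tilted weights. The moment function and all numbers below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Inner exponential tilting step at a fixed theta:
# K(t) = log(n^{-1} sum_j exp(t * psi_j)) is the empirical cumulant
# generating function; solve K_t(t) = 0 by Newton's method and form the
# tilted weights pi_i = exp(t * psi_i) / sum_j exp(t * psi_j).
z = rng.normal(loc=0.2, size=500)
theta = 0.0                 # candidate value; psi_i = z_i - theta, q = 1
psi = z - theta

t = 0.0
for _ in range(50):
    w = np.exp(t * psi)
    Kt = (w * psi).sum() / w.sum()                   # K_t(t), tilted mean
    Ktt = (w * psi ** 2).sum() / w.sum() - Kt ** 2   # K_tt(t), tilted variance
    t -= Kt / Ktt

pi = np.exp(t * psi)
pi /= pi.sum()
# The tilted weights satisfy the moment condition exactly.
print(t, (pi * psi).sum())
```

Unlike the empirical likelihood weights, the exponentially tilted weights are positive for any $t$, which makes this inner Newton iteration numerically well behaved.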

In this penalized problem, $A$ is a large scalar and $W$ is a positive definite matrix of conformable dimension. A sensible choice of $W$ is
$$W(t, \theta) = K_{tt} + K_t K_t',$$
evaluated at some preliminary estimates of $(t, \theta)$. The form of this matrix is not crucial, since at a solution $K_t = 0$ and the contribution of the penalty term vanishes. A connection with our original formulation of GMM may be made here by noting that
$$\hat\theta_{gmm} = \arg\max_\theta \big\{ -\tfrac{1}{2} A\, K_t(0, \theta)' W(0, \tilde\theta)^{-1} K_t(0, \theta) \big\}$$
for some preliminary consistent choice $\tilde\theta$, since
$$W(0, \theta) = K_{tt}(0, \theta) + K_t(0, \theta)K_t(0, \theta)' = n^{-1}\sum_i \psi(z_i, \theta)\psi(z_i, \theta)'.$$

An important recent development in this direction involves the use of EL methods combined with Bayesian MCMC ideas to impose prior information in addition to that provided by the EL framework. An interesting example of this approach is the paper of Yang and He (2012), which treats quantile regression models in this way.

Testing Overidentifying Moment Conditions

We have already discussed what IJS call average moment tests, based on $T_n = 2Q(\hat\theta_{gmm}) \leadsto \chi^2_{q-p}$. A second version of this test is based on iterating the estimation of the weight matrix $W$ in GMM until it is consistent with the one based on the final estimates of $\theta$. Since the Lagrange multiplier parameter $t$ measures the marginal cost of imposing the moment conditions, in terms of the sacrifice in the objective function, it seems natural to consider tests based on $\hat t$ as well. The large sample theory of $(\hat\theta, \hat t)$ is given by
$$\sqrt{n}\begin{pmatrix} \hat\theta - \theta_0 \\ \hat t \end{pmatrix} \leadsto N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} (H'J^{-1}H)^{-1} & 0 \\ 0 & J^{-1}\big(I - H(H'J^{-1}H)^{-1}H'\big)J^{-1} \end{pmatrix} \right),$$
so we may consider the test statistic
$$T_n = n\, \hat t' V_n^{-} \hat t,$$
where $V_n = J^{-1}\big(I - H(H'J^{-1}H)^{-1}H'\big)J^{-1}$ and $V_n^{-}$ denotes any generalized inverse of $V_n$. Such tests perform extremely well in Monte Carlo experiments relative to the classical average moment tests, which are commonly used.

References

Hansen, L.P. (1982). Large sample properties of generalized method of moments estimators, Econometrica, 50, 1029-1054.

Imbens, G.W., Johnson, P. and Spady, R.H. (1998). Information theoretic approaches to inference in moment condition models, Econometrica, 66, 333-59.

Qin, J. and Lawless, J. (1994).
Empirical likelihood and general estimating equations, Annals of Statistics, 22, 300-325.

Owen, A. (2001). Empirical Likelihood, Chapman & Hall/CRC.

Yang, Y. and He, X. (2012). Bayesian empirical likelihood for quantile regression, Annals of Statistics, 40, 1102-1131.