Outline of Linear Systems of Equations: POLS, GLS, FGLS, GMM. Common Coefficients, Panel Data Model.


Preliminaries

The linear panel data model is a static model because all explanatory variables are dated contemporaneously with the dependent variable. It is also considered a common coefficient model because β is the same for all individuals across time:

    y_it = x_it'β + u_it

where x_it is K×1; i = 1,...,N and t = 1,...,T; N is large and T is small. We assume observations i ≠ j are all independent. Want time heterogeneity? Then use time dummies or the Seemingly Unrelated Regression (SUR) model. Want individual heterogeneity? Fixed Effects (FE) and/or Random Effects (RE), or something more general such as Random Coefficients (RC). For right now, there is no individual or time heterogeneity present in the model. We will include unobserved individual heterogeneity in the panel data model later. We will also discuss multivariate linear systems with time heterogeneity, i.e., the SUR model, at another time.

To simplify the notation, we can stack the model over time:

    y_i = x_i β + u_i

where y_i is T×1, u_i is T×1, and x_i = (x_i1, x_i2, ..., x_iT)' is the T×K matrix whose t-th row is x_it'.

POLS

Identification Assumptions

Assumption POLS.1: E(x_it u_it) = 0 for all i, t (within-equation, or contemporaneous, exogeneity).

For most applications, x_it has a sufficient number of elements equal to unity, so that Assumption POLS.1 implies that E(u_it) = 0. This is the weakest assumption we can impose in a regression framework to get consistent estimators of β, and it can hold when some elements of x_i are correlated with some elements of u_i. For example, it allows x_is and u_it to be correlated when s ≠ t.

Under Assumption POLS.1, the vector β satisfies

    E[x_i'(y_i − x_i β)] = 0,  or equivalently  E(x_i'x_i)β = E(x_i'y_i).

For each i, x_i'y_i is a K×1 vector and x_i'x_i is a K×K symmetric, positive semidefinite random matrix. Therefore, E(x_i'x_i) is always a K×K symmetric, positive semidefinite nonrandom matrix (the expectation here is defined over the population distribution of x_i). To be able to estimate β, we need to assume that it is the only K×1 vector that satisfies this equation.

Assumption POLS.2: rank[Σ_{t=1}^T E(x_it x_it')] = K.

Under Assumptions POLS.1 and POLS.2, we can write β = [E(x_i'x_i)]^{−1} E(x_i'y_i), which shows that the two assumptions identify the vector β.

Estimator

Define the Pooled Ordinary Least Squares (POLS) estimator as:

    β̂_POLS = (Σ_{i=1}^N Σ_{t=1}^T x_it x_it')^{−1} (Σ_{i=1}^N Σ_{t=1}^T x_it y_it) = (Σ_{i=1}^N x_i'x_i)^{−1} (Σ_{i=1}^N x_i'y_i)

For computing β̂_POLS using matrix language programming, it is sometimes useful to write β̂ = (X'X)^{−1} X'Y, where X = (x_1', ..., x_N')' is NT×K and Y = (y_1', ..., y_N')' is NT×1. This estimator is called the pooled ordinary least squares (POLS) estimator because it corresponds to running OLS on the observations pooled across i and t.
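As a concrete illustration, here is a minimal numpy sketch of the POLS formula β̂ = (X'X)^{−1}X'Y on simulated data; the data-generating process (N, T, K, the value of β) is invented for illustration only.

```python
import numpy as np

# Hypothetical balanced panel: N individuals, T periods, K regressors,
# with a common coefficient vector beta for all i and t.
rng = np.random.default_rng(0)
N, T, K = 500, 4, 3
beta = np.array([1.0, -0.5, 2.0])

# Rows of X are ordered (i=1,t=1),...,(i=1,t=T),(i=2,t=1),...  (NT x K)
X = rng.normal(size=(N * T, K))
u = rng.normal(size=N * T)
Y = X @ beta + u

# beta_hat = (X'X)^{-1} X'Y, i.e., OLS on the observations pooled across i and t
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # close to beta under POLS.1-POLS.2
```

Because the simulated regressors satisfy the rank condition and are contemporaneously exogenous, the estimate lands near the true β for moderate N.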

Asymptotic Properties

Consistency. Since E(x_it u_it) = 0 by assumption,

    β̂_POLS − β = (N^{−1} Σ_i Σ_t x_it x_it')^{−1} (N^{−1} Σ_i Σ_t x_it u_it) →p [Σ_t E(x_it x_it')]^{−1} · 0 = 0.

Asymptotic Normality. With y_i = x_i β + u_i,

    √N (β̂_POLS − β) = (N^{−1} Σ_i x_i'x_i)^{−1} (N^{−1/2} Σ_i x_i'u_i) →d N(0, V_R)

where

    V_R = [E(x_i'x_i)]^{−1} E(x_i'u_i u_i'x_i) [E(x_i'x_i)]^{−1}

with the consistent (robust) estimator

    V̂_R = (N^{−1} Σ_i x_i'x_i)^{−1} (N^{−1} Σ_i x_i'û_i û_i'x_i) (N^{−1} Σ_i x_i'x_i)^{−1},  where û_i = y_i − x_i β̂_POLS.

For H_0: Rβ = r with q restrictions, the Wald statistic is

    N (Rβ̂ − r)' (R V̂_R R')^{−1} (Rβ̂ − r) →d χ²_q.

System Conditional Homoskedasticity (SCH) Assumption: E(u_i u_i' | x_i) = E(u_i u_i').

By the law of iterated expectations, the SCH assumption implies that E(x_i'u_i u_i'x_i) = E(x_i'Ωx_i), where Ω ≡ E(u_i u_i'). Then

    V_NR = [E(x_i'x_i)]^{−1} E(x_i'Ωx_i) [E(x_i'x_i)]^{−1}

    V̂_NR = (N^{−1} Σ_i x_i'x_i)^{−1} (N^{−1} Σ_i x_i'Ω̂x_i) (N^{−1} Σ_i x_i'x_i)^{−1},  where Ω̂ = N^{−1} Σ_i û_i û_i' →p E(u_i u_i') = Ω.
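The robust variance V̂_R and the Wald statistic above can be sketched in numpy; the estimator clusters the score x_i'û_i at the individual level exactly as in the formula. The data and the tested hypothesis (that the second coefficient is zero, which holds in this simulated DGP) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 800, 4, 2
beta = np.array([1.0, 0.0])
X = rng.normal(size=(N, T, K))          # x_it stored as X[i, t, :]
u = rng.normal(size=(N, T))
Y = X @ beta + u                        # (N, T)

Sxx = np.einsum('itk,itl->kl', X, X) / N        # N^{-1} sum_i x_i' x_i
Sxy = np.einsum('itk,it->k', X, Y) / N
b = np.linalg.solve(Sxx, Sxy)                   # POLS estimate

uhat = Y - X @ b                                 # residuals u_hat_i, (N, T)
g = np.einsum('itk,it->ik', X, uhat)             # scores x_i' u_hat_i, (N, K)
meat = g.T @ g / N                               # N^{-1} sum_i x_i' u_hat_i u_hat_i' x_i
Sxx_inv = np.linalg.inv(Sxx)
V_R = Sxx_inv @ meat @ Sxx_inv                   # robust asymptotic variance

# Wald test of H0: beta_2 = 0 (q = 1 restriction, true in this DGP)
R = np.array([[0.0, 1.0]])
r = np.array([0.0])
wald = N * (R @ b - r) @ np.linalg.solve(R @ V_R @ R.T, R @ b - r)
print(float(wald))   # compare to the chi^2_1 critical value 3.84
```

Since H_0 is true here, the statistic is usually below the 5% critical value; under a false null it would diverge with N.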

Homoskedasticity and No Serial Correlation

To apply the usual OLS statistics from the pooled OLS regression across i and t, and for pooled OLS to be relatively efficient, we require that u_it be homoskedastic across t and serially uncorrelated. The weakest forms of these conditions are the following:

Assumption POLS.3: (a) E(u_it² x_it x_it') = σ² E(x_it x_it'), t = 1,...,T, where σ² = E(u_it²) for all t; (b) E(u_it u_is x_it x_is') = 0, t ≠ s, t,s = 1,...,T.

The first part of Assumption POLS.3 is a fairly strong homoskedasticity assumption; sufficient is E(u_it² | x_it) = E(u_it²) = σ² for all t. This means not only that the conditional variance does not depend on x_it, but also that the unconditional variance is the same in every time period. Assumption POLS.3(b) essentially restricts the conditional covariances of the errors across different time periods to be zero. In fact, since x_it almost always contains a constant, POLS.3(b) requires at a minimum that E(u_it u_is) = 0, t ≠ s. Sufficient for POLS.3(b) is E(u_it u_is | x_it, x_is) = E(u_it u_is) = 0, t ≠ s, t,s = 1,...,T.

It is important to remember that Assumption POLS.3 implies more than just a certain form of the unconditional variance matrix of u_i. Assumption POLS.3 implies E(u_i u_i') = σ² I_T, which means that the unconditional variances are constant and the unconditional covariances are zero, but it also effectively restricts the conditional variances and covariances. If Assumption POLS.3 holds, then Avar(β̂_POLS) = σ² [E(x_i'x_i)]^{−1}/N, so its appropriate estimator is

    σ̂² (X'X)^{−1} = σ̂² (Σ_i Σ_t x_it x_it')^{−1}

where σ̂² is the usual OLS variance estimator from the pooled regression of y_it on x_it.

GLS

Identification Assumptions

Assumption SGLS.1: E(x_it u_is) = 0, t,s = 1,...,T (cross-equation exogeneity, i.e., strict exogeneity).

This assumption is more easily stated using the Kronecker product: E(x_i ⊗ u_i) = 0. Typically, at least one element of x_i is unity, so in practice Assumption SGLS.1 implies that E(u_i) = 0. SGLS.1 is stronger than POLS.1, i.e., SGLS.1 implies POLS.1. This stronger assumption is needed for GLS to be consistent. Note, GLS is less robust than POLS, but it is more efficient than POLS if SGLS.1 holds and we add assumptions on the conditional variance matrix of u_i. A sufficient condition for Assumption SGLS.1 is the zero conditional mean assumption, i.e., E(u_i | x_i) = 0.

The second moment matrix of u_i, which is necessarily constant across i by the random sampling assumption, plays a critical role for GLS estimation of systems of equations. Define the T×T positive semidefinite matrix Ω ≡ E(u_i u_i'). Because E(u_i) = 0 in the vast majority of applications,

we will refer to Ω as the unconditional variance matrix of u_i. Sometimes, an equation must be dropped to ensure that Ω is nonsingular. Here, we assume Ω is nonsingular, so Assumption SGLS.1 implies that

    E(x_i'Ω^{−1}u_i) = 0.

In place of Assumption POLS.2, we assume that a weighted expected outer product of x_i is nonsingular. Here we insert the assumption of a nonsingular variance matrix for completeness.

Assumption SGLS.2: Ω is positive definite and E(x_i'Ω^{−1}x_i) is nonsingular.

Estimator

Write Ω = Ω^{1/2}Ω^{1/2'} using the Cholesky (triangular) decomposition, which we can do for any symmetric positive semidefinite matrix. Since Ω is invertible, Ω^{−1} = Ω^{−1/2'}Ω^{−1/2}. The usual motivation for the GLS estimator is to transform a system of equations where the error has a nonscalar variance-covariance matrix into a system where the error vector has a scalar variance-covariance matrix. We obtain this by premultiplying the stacked equation by Ω^{−1/2}:

    ỹ_i = x̃_i β + ũ_i

where ỹ_i = Ω^{−1/2} y_i, x̃_i = Ω^{−1/2} x_i, and ũ_i = Ω^{−1/2} u_i. Simple algebra shows that E(ũ_i ũ_i') = I_T. The generalized least squares (GLS) estimator of β is obtained by performing POLS of ỹ_i on x̃_i:

    β̂_GLS = (Σ_i x̃_i'x̃_i)^{−1} (Σ_i x̃_i'ỹ_i) = (Σ_i x_i'Ω^{−1}x_i)^{−1} (Σ_i x_i'Ω^{−1}y_i) = [X'(I_N ⊗ Ω^{−1})X]^{−1} [X'(I_N ⊗ Ω^{−1})Y]

Asymptotic Properties

Consistency. Since E(x_i'Ω^{−1}u_i) = 0,

    β̂_GLS − β = (N^{−1} Σ_i x_i'Ω^{−1}x_i)^{−1} (N^{−1} Σ_i x_i'Ω^{−1}u_i) →p A^{−1} E(x_i'Ω^{−1}u_i) = 0

where A ≡ E(x_i'Ω^{−1}x_i). If we are willing to make the zero conditional mean assumption, β̂_GLS can be shown to be unbiased conditional on X. Note, consistency fails if we only make Assumption POLS.1: E(x_it u_it) = 0 does not imply E(x_i'Ω^{−1}u_i) = 0. If Assumption POLS.1 holds but Assumption SGLS.1 fails, the transformed equation ỹ_i = x̃_i β + ũ_i generally induces correlation between x̃_i and ũ_i.

Asymptotic Normality

    √N (β̂_GLS − β) = (N^{−1} Σ_i x_i'Ω^{−1}x_i)^{−1} (N^{−1/2} Σ_i x_i'Ω^{−1}u_i)
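The whitening transformation can be sketched directly: factor a known Ω with Cholesky, premultiply each y_i and x_i by the inverse factor, and run POLS on the transformed data. The AR(1)-style Ω and the DGP below are assumptions made purely for illustration; in practice Ω is rarely known (which is what motivates FGLS).

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 1000, 4, 2
beta = np.array([1.0, -1.0])
rho = 0.6
# Known (assumed) T x T error covariance with AR(1)-style decay; positive definite
Omega = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))

C = np.linalg.cholesky(Omega)        # Omega = C C'
Cinv = np.linalg.inv(C)              # premultiplying by C^{-1} whitens u_i

X = rng.normal(size=(N, T, K))
u = rng.multivariate_normal(np.zeros(T), Omega, size=N)   # E(u_i u_i') = Omega
Y = X @ beta + u

Xt = np.einsum('st,itk->isk', Cinv, X)   # x_tilde_i = C^{-1} x_i
Yt = Y @ Cinv.T                          # y_tilde_i = C^{-1} y_i

# GLS = POLS on the transformed system
Sxx = np.einsum('itk,itl->kl', Xt, Xt)
Sxy = np.einsum('itk,it->k', Xt, Yt)
beta_gls = np.linalg.solve(Sxx, Sxy)
print(beta_gls)
```

Since Var(C^{−1}u_i) = C^{−1}ΩC^{−1'} = I_T, POLS on the transformed data is exactly the GLS formula (Σ x_i'Ω^{−1}x_i)^{−1}(Σ x_i'Ω^{−1}y_i).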

By the CLT,

    N^{−1/2} Σ_i x_i'Ω^{−1}u_i →d N(0, B),  where B ≡ E(x_i'Ω^{−1}u_i u_i'Ω^{−1}x_i).

Since N^{−1/2} Σ_i x_i'Ω^{−1}u_i = O_p(1) and N^{−1} Σ_i x_i'Ω^{−1}x_i − A = o_p(1), we can write

    √N (β̂_GLS − β) = A^{−1} (N^{−1/2} Σ_i x_i'Ω^{−1}u_i) + o_p(1).

It follows from the asymptotic equivalence lemma that √N (β̂_GLS − β) →d N(0, A^{−1}BA^{−1}). In terms of the transformed variables,

    V_R = [E(x̃_i'x̃_i)]^{−1} E(x̃_i'ũ_i ũ_i'x̃_i) [E(x̃_i'x̃_i)]^{−1}

so Avar(β̂_GLS) = A^{−1}BA^{−1}/N. SE: use the robust standard errors from POLS of ỹ_i on x̃_i.

Feasible Generalized Least Squares (FGLS)

Asymptotic Properties

Obtaining the GLS estimator β̂_GLS requires knowing Ω up to scale. That is, we must be able to write Ω = σ²C, where C is a known positive definite matrix and σ² is allowed to be an unknown constant. Sometimes C is known, but more often it is unknown. Therefore, we now turn to the analysis of feasible GLS (FGLS) estimation. In FGLS estimation, we replace the unknown matrix Ω with a consistent estimator. Because the estimator of Ω appears highly nonlinearly in the expression for the FGLS estimator, deriving finite-sample properties of FGLS is generally difficult. The asymptotic properties of the FGLS estimator are easily established because its first-order asymptotic properties are identical to those of the GLS estimator under Assumptions SGLS.1 and SGLS.2.

We initially assume we have a consistent estimator Ω̂ of Ω: plim Ω̂ = Ω. When Ω is allowed to be a general positive definite matrix, the following estimation approach can be used. First, obtain the POLS estimator of β, which we denote β̌. We already showed that β̌ is consistent for β under Assumptions POLS.1 and POLS.2, and therefore under Assumptions SGLS.1 and POLS.2. So a natural estimator of Ω is

    Ω̂ ≡ N^{−1} Σ_i ǔ_i ǔ_i'

where ǔ_i ≡ y_i − x_i β̌ are the POLS residuals. We can show that this estimator is consistent for Ω under Assumptions SGLS.1 and POLS.2 and standard moment conditions. Given Ω̂, the feasible GLS (FGLS) estimator of β is

    β̂_FGLS = (Σ_i x_i'Ω̂^{−1}x_i)^{−1} (Σ_i x_i'Ω̂^{−1}y_i) = [X'(I_N ⊗ Ω̂^{−1})X]^{−1} [X'(I_N ⊗ Ω̂^{−1})Y]

We already know that GLS is consistent and asymptotically normal. Because Ω̂ converges to Ω, it is not surprising that FGLS is consistent, and we can also verify that FGLS has the same limiting distribution as GLS, i.e., they are √N-equivalent. This asymptotic equivalence is important because we do not have to worry that Ω̂ is an estimator when performing asymptotic inference about β using β̂_FGLS.

In the FGLS context, a consistent estimator of A is

    Â ≡ N^{−1} Σ_i x_i'Ω̂^{−1}x_i

A consistent estimator of B is also readily available after FGLS estimation. Define the FGLS residuals by û_i ≡ y_i − x_i β̂_FGLS. Using standard arguments, a consistent estimator of B is

    B̂ ≡ N^{−1} Σ_i x_i'Ω̂^{−1}û_i û_i'Ω̂^{−1}x_i

The estimator of Avar(β̂) can then be written as Â^{−1}B̂Â^{−1}/N. This is the extension of the White heteroskedasticity-robust asymptotic variance estimator, and it is robust under Assumptions SGLS.1 and SGLS.2.

System Conditional Homoskedasticity (SCH) Assumption

Under the assumptions so far, FGLS has nothing to offer over POLS, and it is less robust. However, under an additional assumption, FGLS is asymptotically more efficient than POLS and other estimators.

Assumption SGLS.3: E(u_i u_i' | x_i) = E(u_i u_i') = Ω.

The SCH assumption puts restrictions on the conditional variances and covariances of elements of u_i. If E(u_i | x_i) = 0, then this assumption is the same as assuming Var(u_i | x_i) = Var(u_i) = Ω. Another way to state this assumption is B = A, which simplifies the asymptotic variance. By the law of iterated expectations, the SCH assumption implies that

    E(x_i'Ω^{−1}u_i u_i'Ω^{−1}x_i) = E(x_i'Ω^{−1}x_i)

where Ω ≡ E(u_i u_i'). Note, we only need this weaker condition to determine the usual variance matrix for FGLS. Under this weaker assumption, along with Assumptions SGLS.1 and SGLS.2, the asymptotic variance of the FGLS estimator is Avar(β̂) = A^{−1}/N. We obtain an estimator of this variance matrix by using our consistent estimator of A, so the estimated Avar(β̂) = Â^{−1}/N. This is the usual formula for the asymptotic variance of FGLS. It is nonrobust in the sense that it relies on the homoskedasticity assumption. If heteroskedasticity in u_i is suspected, then the robust estimator, which was derived earlier, should be used.
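The two-step FGLS procedure (POLS residuals, then Ω̂, then weighted estimation) can be sketched end to end; the true Ω and the DGP below are hypothetical, chosen so that Ω̂ has something nontrivial to recover.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 1000, 3, 2
beta = np.array([0.5, 2.0])
Omega = np.array([[1.0, 0.4, 0.2],     # true (unknown to the estimator) error covariance
                  [0.4, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
X = rng.normal(size=(N, T, K))
Y = X @ beta + rng.multivariate_normal(np.zeros(T), Omega, size=N)

# Step 1: POLS to get beta_check
Sxx = np.einsum('itk,itl->kl', X, X)
b_pols = np.linalg.solve(Sxx, np.einsum('itk,it->k', X, Y))

# Step 2: Omega_hat = N^{-1} sum_i u_check_i u_check_i'
res = Y - X @ b_pols                    # POLS residuals, (N, T)
Omega_hat = res.T @ res / N

# Step 3: FGLS = (sum_i x_i' Omega_hat^{-1} x_i)^{-1} (sum_i x_i' Omega_hat^{-1} y_i)
Oinv = np.linalg.inv(Omega_hat)
Axx = np.einsum('itk,ts,isl->kl', X, Oinv, X)
Axy = np.einsum('itk,ts,is->k', X, Oinv, Y)
beta_fgls = np.linalg.solve(Axx, Axy)
print(beta_fgls)
```

With N large, Ω̂ is close to Ω and β̂_FGLS behaves like infeasible GLS, in line with the √N-equivalence discussed above.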

Under Assumptions SGLS.1, POLS.2, SGLS.2, and SGLS.3, the FGLS estimator is more efficient than the POLS estimator. We can actually say much more: FGLS is more efficient than any other estimator that uses the orthogonality conditions E(x_i ⊗ u_i) = 0.

Summary of the Various System GMM Estimators

Preliminaries

    y_it = x_it'β + u_it

For all t, x_it is a K×1 vector. Suppose we have an L_t×1 vector of instruments z_it, so the number of instruments can vary with time. The instruments must satisfy E(z_it u_it) = 0 for all t. Stacking the equations over t, we have y_i = x_i β + u_i, which is the same setup as in (2), and z_i has the structure of (4). Thus, the moment conditions are given by:

    E(z_i'[y_i − x_i β]) = E(z_i'u_i) = E(g_i) = 0  (L×1)

The efficient GMM estimator that uses only the moments E(z_it u_it) = 0 for all t is the GMM estimator with optimal weighting matrix. However, the choice of instrument matrix in (5) means we are only using the moment conditions aggregated across time, Σ_{t=1}^T E(z_it u_it) = 0. Thus, to obtain the efficient GMM estimator, the matrix of instruments should be as in (4), because this expresses the full set of moment conditions. The estimators available to deal with endogeneity are: system GMM, 3SLS, S2SLS, P2SLS, SIV, and PIV.

1. GMM Estimator

    β̂_GMM = argmin_β [Σ_i z_i'(y_i − x_i β)]' Ŵ [Σ_i z_i'(y_i − x_i β)]
           = [(Σ_i x_i'z_i) Ŵ (Σ_i z_i'x_i)]^{−1} (Σ_i x_i'z_i) Ŵ (Σ_i z_i'y_i)

To obtain the optimal GMM estimator, we choose Ŵ such that plim Ŵ = W ≡ [E(z_i'u_i u_i'z_i)]^{−1}. Thus, the weighting matrix for the optimal GMM estimator is

    Ŵ = (N^{−1} Σ_i z_i'û_i û_i'z_i)^{−1}
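A minimal two-step efficient GMM sketch for a scalar endogenous regressor: a first step with identity weighting gives residuals, which yield the optimal Ŵ for the second step. The DGP (endogeneity through a shared error component e, two valid instruments per period) is entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 2000, 2
z = rng.normal(size=(N, T, 2))            # L_t = 2 instruments each period
e = rng.normal(size=(N, T))
v = rng.normal(size=(N, T))
x = z[..., 0] + z[..., 1] + e + v         # scalar regressor (K = 1), endogenous via e
u = e + rng.normal(size=(N, T))           # cov(x_it, u_it) != 0
y = 1.5 * x + u

# Block-diagonal instrument structure: g_i = (z_i1 u_i1, ..., z_iT u_iT), length T*2
zx = (z * x[..., None]).reshape(N, -1)    # rows are z_i' x_i
zy = (z * y[..., None]).reshape(N, -1)    # rows are z_i' y_i
d = zx.mean(axis=0)                       # N^{-1} sum_i z_i' x_i
p = zy.mean(axis=0)                       # N^{-1} sum_i z_i' y_i

b1 = (d @ p) / (d @ d)                    # first step: W = I (consistent, inefficient)
g = zy - b1 * zx                          # moment contributions z_i' u_hat_i
W_opt = np.linalg.inv(g.T @ g / N)        # optimal weighting matrix estimate
b_gmm = (d @ W_opt @ p) / (d @ W_opt @ d) # second step
print(b_gmm)                              # close to 1.5; pooled OLS is biased upward here
```

For comparison, pooled OLS on these data converges to roughly 1.5 + cov(x,u)/var(x), so the consistency of GMM is visible directly.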

2. 3SLS

The weighting matrix used by the 3SLS estimator is

    Ŵ = (N^{−1} Σ_i z_i'Ω̂z_i)^{−1},  where Ω̂ = N^{−1} Σ_i û_i û_i'.

The procedure for obtaining the 3SLS estimator is: First two stages: run P2SLS to get û_i. Third stage: obtain Ŵ and perform system GMM estimation. The 3SLS estimator is efficient under the conditional homoskedasticity assumption: E(u_i u_i' | z_i) = E(u_i u_i') = Ω.

3. S2SLS

The weighting matrix used by the S2SLS estimator is

    Ŵ = (N^{−1} Σ_i z_i'z_i)^{−1}

The S2SLS estimator is efficient under the conditional homoskedasticity assumption and when Ω is spherical, i.e., Ω = σ²I_T.

4. P2SLS

If L_t is the same for all t, i.e., L_t = L for all t, then z_i has the structure of (5). The P2SLS estimator exploits the orthogonality condition

    E(z_i'u_i) = E(z_i1 u_i1 + ... + z_iT u_iT) = 0

and the conditional homoskedasticity assumption. So, when z_i has the structure of (5), the weighting matrix used by the P2SLS estimator is

    Ŵ = (N^{−1} Σ_i Σ_t z_it z_it')^{−1}

and the P2SLS estimator is given by

    β̂ = [(Σ_i Σ_t x_it z_it') (Σ_i Σ_t z_it z_it')^{−1} (Σ_i Σ_t z_it x_it')]^{−1} (Σ_i Σ_t x_it z_it') (Σ_i Σ_t z_it z_it')^{−1} (Σ_i Σ_t z_it y_it)

P2SLS is efficient under the conditional homoskedasticity assumption. Note, when z_it = x_it, this estimator reduces to the POLS estimator.
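The P2SLS formula can be sketched by pooling all (i,t) observations, since the same L instruments enter every period. The one-endogenous-regressor DGP below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 2000, 3
z = rng.normal(size=(N * T, 2))               # pooled z_it, L = 2
e = rng.normal(size=N * T)
x = z @ np.array([1.0, 0.5]) + e + rng.normal(size=N * T)
u = e + rng.normal(size=N * T)                # endogeneity through e
y = 2.0 * x + u
X = x[:, None]                                 # K = 1

# beta_hat = [ (x'z)(z'z)^{-1}(z'x) ]^{-1} (x'z)(z'z)^{-1}(z'y)
Szz = z.T @ z
Szx = z.T @ X
Szy = z.T @ y
A = Szx.T @ np.linalg.solve(Szz, Szx)
b_p2sls = np.linalg.solve(A, Szx.T @ np.linalg.solve(Szz, Szy))
print(b_p2sls)                                 # close to 2.0
```

This is numerically identical to running 2SLS on the pooled sample, which is exactly how the estimator earned its name.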

5. SIV

If z_i has the structure of (4) and L = K, then we have exactly enough IVs for the explanatory variables in the system. Thus, the SIV estimator is given by

    β̂ = (N^{−1} Σ_i z_i'x_i)^{−1} (N^{−1} Σ_i z_i'y_i)

6. PIV

If z_i has the structure of (5) and L = K, then we have exactly enough IVs for the explanatory variables in the system. Thus, the pooled instrumental variables (PIV) estimator is given by

    β̂ = (Σ_i Σ_t z_it x_it')^{−1} (Σ_i Σ_t z_it y_it)

Note, when z_it = x_it, this estimator reduces to the POLS estimator.
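The final remark is easy to verify numerically: with z_it = x_it, the just-identified PIV formula collapses to the POLS formula. A short sketch on simulated (hypothetical) pooled data:

```python
import numpy as np

rng = np.random.default_rng(6)
NT, K = 3000, 2
X = rng.normal(size=(NT, K))                  # pooled x_it, exogenous here
y = X @ np.array([1.0, -2.0]) + rng.normal(size=NT)
Z = X.copy()                                  # take z_it = x_it (L = K)

b_piv = np.linalg.solve(Z.T @ X, Z.T @ y)     # (sum z_it x_it')^{-1} sum z_it y_it
b_pols = np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^{-1} X'y
print(np.allclose(b_piv, b_pols))             # True
```

Both solve the same linear system when Z = X, so the two estimates agree to machine precision.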