Two-Stage Least Squares as Minimum Distance

Two-Stage Least Squares as Minimum Distance Frank Windmeijer Discussion Paper 17 / 683 7 June 2017 Department of Economics University of Bristo Priory Road Compex Bristo BS8 1TU United Kingdom

Two-Stage Least Squares as Minimum Distance Frank Windmeijer Department of Economics and IEU University of Bristo, UK June 2017 Abstract The Two-Stage Least Squares instrumenta variabes IV estimator for the parameters in inear modes with a singe endogenous variabe is shown to be identica to an optima Minimum Distance MD estimator. This MD estimator is a weighted average of the individua instrument-specific IV estimates, with the weights determined by the variances and covariances of the individua estimators under conditiona homoskedasticity. It is further shown that the Sargan test statistic for overidentifying restrictions is the same as the MD criterion test statistic. This provides an intuitive interpretation of the Sargan test. The equivaence resuts aso appy to the effi cient two-step GMM and robust optima MD estimators and criterion functions, aowing for genera forms of heteroskedasticity. It is further shown how these resuts extend to the inear overidentified IV mode with mutipe endogenous variabes. JEL Cassification: C26, C13, C12 Key Words: Instrumenta Variabes, Two-Stage Least Squares, Minimum Distance, Overidentification Test 1 Introduction For a singe endogenous variabe inear mode with mutipe instruments, the standard IV estimator is the Two-Stage Least Squares 2SLS estimator, which is a consistent and asymptoticay effi cient estimator under standard reguarity assumptions and conditiona homoskedasticity, see e.g. Hayashi 2000, page 228. This means that the 2SLS estimator combines the information from the mutipe instruments asymptoticay optimay under f.windmeijer@bristo.ac.uk. This research was party funded by the Medica Research Counci MC_UU_12013/9. I woud ike to thank Steve Pischke, Mark Schaffer, Chris Skees and Jon Tempe for hepfu comments. 1

these conditions. An aternative estimator is a Minimum Distance MD estimator, which is a weighted average of the individua instrument-specific IV estimates of the parameter of interest. It is shown in the next section, Section 2, that the optima MD estimator with the weights determined by the variances and covariances of the instrumentspecific estimators, specified under conditiona homoskedasticity, is identica to the 2SLS estimator. It is further shown that the Sargan test statistic for overidentifying restrictions is the same as the MD criterion test statistic, providing another intuitive interpretation of the Sargan test. Surprisingy, it appears that these equivaence resuts are not avaiabe in the iterature, and are not discussed in standard textbooks. Angrist 1991 derives simiar resuts, but for the specia case of orthogona binary instruments, see aso the discussion in Angrist and Pischke 2009, Section 4.2.2, whereas the resuts here are for genera designs. Recenty, Chen, Jacho-Chávez and Linton 2016 used this setting and the two estimators as an exampe in their much wider-ranging paper, but they did not reaise their equivaence and the resuts obtained in Section 2 modify the statements of Chen et a. 2016, pages 48-49. In Section 2.2, the resut is extended to the equivaence of the two-step GMM estimator and the optima minimum distance estimator based on a robust variance-covariance estimator of the vector of instrument-specific IV estimates, robust to genera forms of heteroskedasticity in the cross-sectiona setting considered here. The two-step Hansen J-test statistic for overidentifying restrictions Hansen, 1982 is aso shown to be the same as the robust MD criterion test statistic. Section 3 derives equivaence resuts for the mutipe endogenous variabes case. The setting considered there can best be characterised by the foowing simpe exampe. Consider a inear mode with two endogenous variabes, and there are four instruments avaiabe. In principe, there are then six distinct sets of two, just identifying instruments. However, a coection of three sets of two instruments that span a instruments is suffi - cient to provide a information needed. For exampe, if the instruments are denoted by z 1, z 2, z 3, and z 4, then the coection of sets {z 1, z 2, z 2, z 3, z 3, z 4 } is suffi cient. This resuts in three just identified IV estimates of the two parameters of interest, and Section 3 shows that the per parameter optima minimum distance estimators are identica to the 2SLS estimators. 2

2 Equivaence Resut for Singe Endogenous Variabe Mode We have a sampe {y i, x i, z i} n i1 and consider the mode y i x i β + u i, where x i is endogenous, such that E x i u i 0. Note that other exogenous variabes in the mode, incuding the constant, have been partiaed out. The k z > 1 instrument vector z i satisfies E z i u i 0 and is reated to x i via the inear projection, or first-stage mode x i z iπ + v i. Let y and x be the n-vectors y 1, y 2,..., y n and x 1, x 2,..., x n, and Z the n k z matrix with i-th row z i and j-th coumn z j, i 1,.., n, j 1,..., k z. Let P Z Z Z Z 1 Z and S β y xβ P Z y xβ. 1 The we-known Two-Stage Least Squares 2SLS instrumenta variabes estimator is then defined as and is given by where x P Z x. β 2SLS arg min S β β β 2SLS x P Z x 1 x P Z y x x 1 x y, 2 Next consider the individua instrument-specific IV estimates for β, β j x j x j 1 x j y; x j P zj x for j 1,.., k z. Let the k z -vector β ind be defined as β ind β1, β 2,..., β kz. 3 3

Under standard reguarity assumptions and conditiona homoskedasticity, E u 2 i z i σ 2 u, the imiting distribution of β ind is given by n βind ιβ d N 0, σ 2 uω, where ι is a k z -vector of ones. The eements of Ω are given by Ω js pim x j x j /n 1 pim x j x s /n pim x s x s /n 1 for j, s 1,..., k z. The asymptotic variances of β j and the covariances of β j and β s are therefore given by Var βj Cov βj, β s σ 2 1 u x j x j σ 2 u x j x j 1 x j x s x s x s 1, for j, s 1,..., k z, j s, resuting in Var βind σ 2 u Ω; Ω D 1 X ind Xind D 1, 4 where X ind [ x 1 x 2... x kz ] ; 5 D diag x j x j, 6 j 1,..., k z, where diag a j is a diagona matrix with j-th diagona eement a j. Under standard conditions, pim n Ω Ω. Let Q β βind ιβ βind Ω 1 ιβ. 7 The optima MD estimator for β is then defined as β md arg min β Q β, resuting in β md ι Ω 1 ι 1 ι Ω 1 βind. 8 4

It is cear that the the MD estimator is a weighted average of the individua instrument specific estimates, k z β md w md,j βj, j1 with k z j1 w md,j 1. The 2SLS estimator is aso a weighted average. Let j 1,..., k z, then As it therefore foows that z 1 diag jz j z j x, X ind Z. β ind D 1 X ind y, β 2ss x P Z x 1 x P Z y x P Z x 1 x Z Z Z 1 1 D D 1 Z y x P Z x 1 x Z Z Z 1 1 D βind k z j1 w 2ss,j βj. The next proposition states the main equivaence resut, S β Q β, hence β md β 2ss and w md,j w 2ss,j for j 1,.., k z. Proposition 1 Let S β, Q β, β 2ss and β md be as defined in 1, 7, 2 and 8 respectivey. Then for β R, S β Q β and hence β md β 2ss. Proof. As x x j x P Z P zj x x P zj x x x j x j x j, it foows that ι D x 1 x 1 x 2 x 2... x k z x kz x Xind x Xind, and hence D 1 X ind x D 1 X ind x ι. 5

As P Xind Z Z Z 1 Z PZ, it foows that, for β R, S β y xβ P Z y xβ Hence β md β 2SLS. y xβ P Xind y xβ y xβ Xind D 1 D X X 1 ind ind D D 1 X ind y xβ D 1 X indy D 1 X ind xβ Ω 1 D 1 X indy D 1 X ind xβ βind ιβ βind Ω 1 ιβ Q β. 2.1 Test for Overidentifying Restrictions The standard test for the nu hypothesis H 0 : E z i u i 0 is the Sargan test statistic given by Sar β2ss σ 2 u S β2ss, where σ 2 u y x β 2ss y x β 2ss /n. Under the nu, standard reguarity assumptions and conditiona homoskedasticity, Sar β2ss converges in distribution to a χ 2 k z 1 distributed random variabe, see e.g. Hayashi 2000, page 228. Next consider the MD criterion MD βmd σ 2 u Q βmd, where we can use σ 2 u because β md β 2ss. Denote β j pim βj for j 1,..., k z. Then under the nu hypothesis H 0 : β 1 β 2... β kz β, and the same maintained assumptions as above, MD βmd converges in distribution to a χ 2 k z 1 distributed random variabe, see e.g. Cameron and Trivedi 2005, page 203. hence It foows directy from the resuts of Proposition 1 that S β2ss Sar β2ss MD βmd. Q βmd and 6

Remark 1 The equivaence of Sar β2ss and MD βmd estabishes an intuitive interpretation of the Sargan test. It tests whether the individua instrument-specific estimators a estimate the same parameter vaue, see aso Parente and Santos Siva 2012. 2.2 Effi cient Two-Step Estimation The resuts above were derived assuming conditiona homoskedasticity. The equivaence resuts can be extended to the effi cient two-step GMM estimator. For the cross-sectiona setup considered here, this woud cover the case of genera conditiona heteroskedasticity, E u 2 i z i g z i. Using β 2ss as the initia consistent one-step GMM estimator, the effi cient two-step GMM estimator is defined as where W β gmm arg min β J β, 9 J β y xβ ZW β2ss 1 Z y xβ ; 10 n 2 y i x i β2ss zi z i. 11 β2ss i1 The Hansen J-test for overidentifying restrictions is given by J βgmm. Under standard assumptions, J βgmm d χ 2 k z 1 under the nu H 0 : E z i u i 0. Next, consider the MD criterion based on a robust variance estimator for β ind, using β md, defined as where the eements of Σ are given by for j, s 1,..., k z. Σ j,s Ω r D 1 Σ D 1, 12 n i1 Define the robust MD estimator as y i x i βmd 2 xji x si, 13 β md,r arg min β MD r β, 14 where MD r β βind ιβ Ω 1 r βind ιβ. 15 7

Under the nu as specified above, H 0 : β 1 β 2... β kz β, and standard assumptions, MD r βmd,r converges in distribution to a χ 2 k z 1 distributed random variabe. The next proposition estabishes the equivaence of J β and MD r β, hence β gmm βgmm β md,r and J MD r βmd,r. Proposition 2 Let J β, MD r β, β gmm and β md,r be as defined in 10, 15, 9 and 14 respectivey. Then, for β R, J β MD r β. Hence β md,r β gmm and MD r βmd,r J βgmm. Proof. Denoting the i-th row of X ind by X ind[i.], then Ω r can be written as 2 y i x i βmd X ind[i.] Xind[i.] D 1. For β R, Ω r D 1 n i1 J β y xβ ZW 1 β2ss Z y xβ y xβ Z D 1 D 1 W β2ss 1 1 D D 1 Z y xβ n 1 2 y xβ Xind D 1 D y i x i β2ss zi z i D D 1 X ind y xβ i1 n 1 2 βind ιβ D y i x i βmd Xind[i.] Xind[i.] βind ιβ Ω 1 r i1 βind ιβ MD r β. Hence β gmm β md,r and J βgmm MD r βmd,r. D βind ιβ Remark 2 An aternative "one-step" robust variance estimator for the MD estimator is given by i1 Ω Φ D 1 Φ D 1, with the eements of Φ given by n Φ j,s y i x i βj y i x i βs x ji x si, for j, s 1,..., k z. The resuting minimum distance estimator, β md,φ, has the same imiting distribution as β md,r, but differs in finite sampes. 8

3 Mutipe Endogenous Variabes Consider next the mutipe endogenous variabes mode y i x iβ + u i, where x i is a k x vector of endogenous variabes. There are k z > k x instruments z i avaiabe. Let X be the n k x matrix of expanatory variabes, with -th coumn x, then the 2SLS estimator is obtained as and is given by β 2ss arg min S β β S β y Xβ P Z y Xβ 16 β 2ss X P Z X 1 X P Z y 1 X X X y where X P Z X. An MD estimator coud of course be obtained here in simiar fashion to the onevariabe case above, from k z k x sets of just-identifying instruments and a generaized inverse for the now rank deficient variance matrix Ω. Cacuating the variance matrix under conditiona homoskedasticity eads again to equivaence of the 2SLS and MD estimators. However, more interesting resuts can be derived for the 2SLS and MD estimators of the individua coeffi cients β, 1,.., k x. Denote by X. the -th coumn of X, and et X be the k x 1 coumns of X, excuding X.. The 2SLS estimator for β is given by β,2ss X.M X X. 1 X. M X y x x 1 x y 17 where for a genera n k matrix A, M A I n A A A 1 A, with I n is the identity matrix of order n, and x M X X.. 18 Let { Z [t]} k z k x+1 t1 be a coection of k z k x + 1 sets of k x instruments Z [t] such that a instruments have been incuded. For exampe, { Z [t] z t,..., z t+kx 1 } k z k x+1 t1 is such 9

a set. From these sets, we get k z k x + 1 just identified IV estimates β [t] of β. Let β[t] β,ind be the k z k x + 1-vector of the individua estimates of β. Let X [t] P Z [t]x, and X [t]. and X [t] defined anaogousy to above. The eements of β,ind are then given by β [t],ind X[t]. M X[t] x [t] x [t] X [t]. 1 [t] x y 1 X[t]. M X[t] y for t 1,..., k z k x + 1, where x [t] M X [t] X[t].. 19 The asymptotic variances and covariances under conditiona homoskedasticity are given by β[t] Var,ind β[t] [s] Cov,ind, β,ind σ 2 u σ 2 u x [t] x [t] The MD estimator for β is then obtained as 1 x [t] x [t] 1 x [t] x [s] x [s] 1 x [s]. β,md arg min Q β ; β Q β β,ind ιβ Ω 1 β,ind ιβ 20 where ι is here a k z k x + 1 vector of ones, and Ω D 1 X X D 1, with and D diag x [t] β,md is therefore given by X x [1] x [t], t 1,..., k z k x + 1. β,md,..., x [kz kx+1] 21 1 1 ι D X X D ι ι D X X 1 D β,ind 1 1 ι D X X D ι ι D X X 1 X y, 22 10

where the atter equaity hods as β,ind D 1 X y. 23 We are now in the same situation as that of the singe endogenous variabe case, with x, X ind and D repaced by x, X and D respectivey. The next proposition estabishes the equivaence of β,2ss and β,md for 1,..., k x. Proposition 3 For 1,..., k x, et β,2ss, β,ind and β,md be as defined in 17, 23 and 22 respectivey, with β,ind based on a coection of k z k x +1 sets of k x instruments { Z [t] } k z k x+1 t1 that contains a instruments. Then β,2ss β,md for 1,..., k x. Proof. From the definitions of x and x [t] in 18 and 19 respectivey, it foows that x x [t] x 1 P Z P Z X X P Z X X P Z x x x [t] for t 1,..., k z k x + 1, and hence P Z [t] P Z [t]x X 1 P Z [t]x X P Z [t] P Z [t] P Z [t]x X 1 P Z [t]x X P Z [t] x [t] x [t], ι D x X. x x Therefore, from 22, β,md x P X x 1 x P X y. As the sets of instruments { Z [t]} k z k x+1 t1 contain a k z instruments, it foows that x is in the coumn space of X, and so P X x x. Therefore, β,md x x 1 x y β,2ss, for 1,..., k x. Next, consider the Sargan test statistic, given by Sar β2ss σ 2 u y X β 2ss PZ y X β 2ss, 24 11

where σ 2 u y X β 2ss y X β 2ss /n. Under the nu H 0 : E z i u i 0, standard d reguarity conditions and conditiona homoskedasticity, Sar β2ss χ 2 k z k x. Consider the MD statistics MD β,md σ 2 u β,ind ιβ Ω 1 j β,ind ιβ, 25 d for 1,..., k x. Let β,ind pim β,ind, then MD β,md χ 2 k z k x under the nu H 0 : β,ind ιβ, but as σ 2 u has to be a consistent estimator of σ 2 u, the maintained assumptions are that β s,ind ιβ s for s 1,..., k x, s. The foowing proposition states the equivaence of Sar β2ss and MD β,md for 1,..., k x. Proposition 4 Let Sar β2ss and MD β,md MD β,md for 1,..., k x. Sar β2ss be defined as in 24 and 25, then Proof. As X y X β 2ss 0, β,2ss β,md, and defining ỹ M X P Z y, it foows that S β2ss y X β 2ss PZ y X β 2ss y x β,2ss X β,2ss PZ y x β,2ss X β,2ss y x β,2ss X β,2ss PZ M X P Z y x β,2ss X β,2ss ỹ x β,2ss ỹ x β,2ss ỹ x β,2ss P X ỹ x β,2ss D 1 X ỹ β,ind ι β,md Q β,md 1 X x β,md D X X D D 1 X ỹ 1 D Ω 1 β,ind ι β,md and hence Sar β2ss MD β,md for 1,..., k x. D 1 X x β,md As for the singe-endogenous variabe case, these resuts can be extended to the twostep GMM and robust MD estimators. 12

References [1] Angrist, J.D., 1991, Grouped-Data Estimation and Testing in Simpe Labor-Suppy Modes, Journa of Econometrics 47, 243-266. [2] Angrist, J.D. and J.-S. Pischke, 2009, Mosty Harmess Econometrics. An Empiricist s Companion, Princeton University Press, Princeton and Oxford. [3] Cameron, A.C. and P.K. Trivedi, 2005, Microeconometrics. Methods and Appications, Cambridge University Press, New York. [4] Chen, X., D.T. Jacho-Chávez and O. Linton, 2016, Averaging of an Increasing Number of Moment Condition Estimators, Econometric Theory 32, 30-70. [5] Hansen, L.P., 1982, Large Sampe Properties of Generaized Method of Moments Estimators, Econometrica 50, 1029-1054. [6] Hayashi, F., 2000, Econometrics, Princeton University Press, Princeton. [7] Parente, P.M.C. and J.M.C. Santos Siva 2012 A Cautionary Note on Tests of Overidentifying Restrictions, Economics Letters 115, 314-317. 13