
UvA-DARE (Digital Academic Repository)

Implementations of tests on the exogeneity of selected variables and their performance in practice
Pleus, M.

Link to publication

Citation for published version (APA):
Pleus, M. (2015). Implementations of tests on the exogeneity of selected variables and their performance in practice. Amsterdam: Tinbergen Institute.

General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).

Disclaimer/Complaints regulations
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library, or send a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.

UvA-DARE is a service provided by the library of the University of Amsterdam.

Download date: 30 Apr 2018

Implementations of tests on the exogeneity of selected variables and their performance in practice

Milan Pleus

Universiteit van Amsterdam


ISBN

Cover design: Crasborn Graphic Designers bno, Valkenburg a.d. Geul

This book is no. 617 of the Tinbergen Institute Research Series, established through cooperation between Thela Thesis and the Tinbergen Institute. A list of books which already appeared in the series can be found in the back.

Implementations of tests on the exogeneity of selected variables and their performance in practice

ACADEMIC DISSERTATION (Academisch Proefschrift)

to obtain the degree of doctor at the Universiteit van Amsterdam, by authority of the Rector Magnificus, prof. dr. D.C. van den Boom, before a committee appointed by the Doctorate Board, to be defended in public in the Agnietenkapel on Friday 29 May 2015, at 14:00, by Milan Pleus, born in Naarden.

Doctoral committee:

Supervisor: Prof. dr. J.F. Kiviet

Other members: Prof. dr. H.P. Boswijk, Dr. M.J.G. Bun, Prof. dr. J.-M. Dufour, Prof. dr. F. Kleibergen, Prof. dr. F. Windmeijer

Faculty of Economics and Business

This research was made possible in part by support from the Netherlands Organisation for Scientific Research (NWO) under the programme "Statistical inference methods regarding effectivity of endogenous policy measures".

Acknowledgements

This thesis marks the end of four years of research done at the University of Amsterdam, a period I mostly reflect on with much pleasure. Although every PhD project has its struggles, I consider myself lucky to have encountered relatively few of them. One of the more important reasons for this is the encouragement and distraction offered by various people. It is here that I would like to take the opportunity to thank them.

First of all I would like to thank my supervisor Jan Kiviet. I cannot begin to imagine these years without his guidance. Apart from Jan being an excellent supervisor and researcher, I can only speak highly of him as a person. Shortly before starting this project it became clear that Jan would accept a position at Nanyang Technological University in Singapore. This entailed that he would only be in Amsterdam for three to six months a year. Consequently, I have made several trips to Singapore, for which I am very grateful. During these stays Jan and his wife Rose never failed to make me feel most welcome.

I want to thank the other members of the committee, Peter Boswijk, Maurice Bun, Jean-Marie Dufour, Frank Kleibergen and Frank Windmeijer, for reading the manuscript and providing me with helpful comments.

Of course I must thank my fellow PhD student and friend Rutger Poldermans, with whom I have shared many experiences over the years. Although we have distracted each other on many occasions, I have found our discussions to be of great help and inspiration. I want to thank my other colleagues at the University of Amsterdam and in particular: Jan Bogers, Simon Broda, Cees Diks, Kees Jan van Garderen, Noud van Giersbergen, Artūras Juodis, Dick Kerver, Herman ten Napel, Kees Nieuwland, Hans van Ophem, Andrew Pua and Roald Ramer. Daan in 't Veld was kind enough to help me with typesetting this thesis, which is greatly appreciated. It has always been my intention to thank Mars Cramer for his conversation and it saddens me that this is no longer possible.

Furthermore I would like to thank the Netherlands Organisation for Scientific Research (NWO) for granting financial support to the project "Statistical inference methods regarding effectivity of endogenous policy measures". Without their support this thesis would not have materialized.

Outside of the academic community I am grateful to my friends, in particular Dick Lorier, Jaap Lorier and Daniëlla Brals. I would like to thank the members of futsal team Oranje, whose company I always enjoy. A special thanks also to Monique van Buul and Paul Kamphuis for their support. I am privileged to have experienced the loving care of my parents, Jan and Mone. They have always supported me and were available in times of need. My brother Timo is one of the most talented persons I know and I am grateful to have him as a friend. Finally, I would like to thank Marte for her love and support. She makes me happy.

Contents

1 Introduction
    1.1 Outline of the thesis

2 The performance of tests on endogeneity of subsets of explanatory variables scanned by simulation
    2.1 Introduction
    2.2 Testing the orthogonality of subsets of explanatory variables
        2.2.1 The model and setting
        2.2.2 The source of any estimator discrepancy
        2.2.3 Testing based on the source of any discrepancy
        2.2.4 Testing based on the discrepancy as such
        2.2.5 Testing based on covariance of structural and reduced form disturbances
        2.2.6 Testing by an incremental Sargan test
        2.2.7 Concerns for practitioners
    2.3 Earlier Monte Carlo designs and results
    2.4 A more comprehensive Monte Carlo design
        2.4.1 The simulated data generating process
        2.4.2 Simulation design parameter space
    2.5 Simulation findings on rejection probabilities
        2.5.1 At least one exogenous regressor
        2.5.2 Both regressors endogenous
    2.6 Results for bootstrapped tests
        2.6.1 A bootstrap routine for subset DWH test statistics
        2.6.2 Simulation results for bootstrapped test statistics
    2.7 Empirical case study
    2.8 Conclusions

3 On overidentifying restrictions tests and their incremental versions
    3.1 Introduction
    3.2 Testing overidentifying restrictions
        3.2.1 Test statistics and distributions
        3.2.2 Power properties
        3.2.3 Neglecting heteroskedasticity
    3.3 A higher order refinement to the Sargan test
    3.4 Simulation design
    3.5 Simulation findings on rejection probabilities
    3.6 Conclusions
    Appendix 3.A Proofs of theorems
    Appendix 3.B Details on the corrected statistic

4 Accuracy and efficiency of various GMM inference techniques in dynamic micro panel data models: theory
    4.1 Introduction
    4.2 Basic GMM results for linear models
        4.2.1 Model and estimators
        4.2.2 Some algebraic peculiarities
        4.2.3 Particular test procedures
    4.3 Implementations for dynamic micro panel models
        4.3.1 Model and assumptions
        4.3.2 Removing individual effects by first differencing
        4.3.3 Respecting the equation in levels as well
        4.3.4 Coefficient restriction tests
        4.3.5 Tests of overidentification restrictions
    4.4 Modified GMM
    4.5 Intermediate conclusions
    Appendix 4.A Corrected variance estimation for 2-step GMM
    Appendix 4.B Partialling out and GMM
    Appendix 4.C Extracting redundant moment conditions

5 Accuracy and efficiency of various GMM inference techniques in dynamic micro panel data models: practice
    5.1 Introduction
    5.2 Simulation design
    5.3 Simulation results
        5.3.1 DGPs under effect stationarity
        5.3.2 Nonstationarity
    5.4 Empirical results
    5.5 Major findings
    Appendix 5.A Derivations for (5.17)

6 Refined exogeneity tests in dynamic panel data models
    6.1 Introduction
    6.2 Exogeneity tests
        6.2.1 Estimators and assumptions
        6.2.2 Incremental Sargan-Hansen test
        6.2.3 Hausman test
    6.3 Some possible refinements
        6.3.1 Diagonal Sargan-Hansen test
        6.3.2 Finite sample corrected variance for the Hausman test
    6.4 Testing exogeneity in dynamic panel data models
        6.4.1 Model and assumptions
        6.4.2 Full comprehensive internal instrument matrices
        6.4.3 Estimators
        6.4.4 Establishing endogeneity
        6.4.5 Establishing weak exogeneity
    6.5 Simulation design
    6.6 Simulation results
        6.6.1 Results under strict exogeneity
        6.6.2 Results under weak exogeneity
        6.6.3 Results under endogeneity
    6.7 Empirical case study
    6.8 Conclusions
    Appendix 6.A Non-negativeness of $J^{(2)}$
    Appendix 6.B Proof of Theorem
    Appendix 6.C Estimating the variance of the vector of contrasts
    Appendix 6.D Correcting $H^{(2)}$

Bibliography

Summary

Samenvatting (Summary in Dutch)

Chapter 1

Introduction

In any econometrics class the estimation method of Ordinary Least Squares (OLS) is usually the first students get acquainted with. Benefits of this method are its nice statistical properties and the fact that it is relatively easy to derive and comprehend. One of the assumptions needed for consistency of the OLS estimator is that the right-hand side variables (explanatory variables) are not endogenous. Contrary to exogenous and predetermined explanatory variables, endogenous explanatory variables are contemporaneously correlated with the disturbance term. Endogeneity may arise for one of various reasons: (i) some of the explanatory variables are subject to measurement error; (ii) some of the explanatory variables and the dependent variable are jointly determined by a system of simultaneous equations; (iii) some parts of the model are overlooked or unobserved and correlated with some explanatory variables. These causes are frequently encountered in practice and call for alternative estimation techniques. An example in which this plays a role is when one investigates the effectiveness of policy measures, as they are often endogenous with respect to their targets.

A popular method that allows for endogeneity of explanatory variables is Instrumental Variables (IV) or Two-Stage Least Squares (2SLS). At least as many external instrumental variables as there are endogenous regressors have to be found. These instrumental variables must satisfy two properties: (i) they should be valid, meaning that they are exogenous or predetermined with respect to the disturbance term in the equation of interest; (ii) they should be sufficiently correlated with the endogenous regressors. As can be seen, the IV estimator trades one orthogonality condition for another to regain consistency. However, as for instance Bazzi and Clemens (2013) state, the most valid instruments could be the weakest, and the strongest could be the least valid. Both requirements have received much attention in the literature. Over the last three decades much progress has been made with respect to inference techniques that are robust to weak instruments.

Influential papers on this subject are for instance Anderson and Rubin (1949), Nelson and Startz (1990a,b), Bound et al. (1995), Staiger and Stock (1997), Kleibergen (2002, 2005) and Moreira (2003), whereas Stock et al. (2002) provide a survey.

Tests on the validity of the instruments are generally called overidentifying restrictions tests for the following reason. As already indicated, consistent estimation of the causal relationship requires at least as many valid external instruments as there are endogenous explanatory variables. An instrument is external if it has coefficient zero in the causal relationship and is therefore rightfully excluded from it. The model parameters are said to be just identified if the number of external instruments equals the number of endogenous explanatory variables. By estimating the parameters of a just identified model, IV imposes uncorrelatedness of all instruments with the disturbances. Hence, in the just identified case the uncorrelatedness cannot be tested. The model parameters are overidentified if the number of external instruments exceeds the number of endogenous explanatory variables. Only whether the instruments yielded by this excess of required exclusion restrictions are correlated with the disturbances can be tested. In fact, testing the exclusion from the model of the overidentifying external instruments is equivalent to testing the validity of the corresponding external instruments. The literature regarding these tests dates back to papers by for instance Sargan (1958), Basmann (1960), Hwang (1980a), Hausman and Taylor (1981), Hansen (1982), Magdalinos (1985, 1994) and Newey (1985). More recently, Bowsher (2002) has investigated the performance of overidentifying restrictions tests in dynamic panel data models. Parente and Silva (2012) warn that an insignificant test outcome may provide little comfort. Hahn et al. (2011) have proposed a test that is robust to certain forms of weak instruments and Chao et al. (2014) have developed a test that is robust to many instruments and heteroskedasticity.

The asymptotic variance of the limiting distribution of the IV estimator depends on the strength of the external instruments. OLS has the smallest asymptotic variance as it instruments the explanatory variables by themselves, yielding the best possible fit. Superfluously treating explanatory variables as endogenous therefore increases the asymptotic variance and makes the study more reliant on the availability of external instruments. Tests on the orthogonality of all possibly endogenous explanatory variables have been suggested by Durbin (1954), Wu (1973), Revankar and Hartley (1973), Revankar (1978) and Hausman (1978) and are often referred to as Durbin-Wu-Hausman (DWH) tests. Tests on the orthogonality of subsets of potentially endogenous regressors have been discussed by Hwang (1980b), Spencer and Berk (1981), Wu (1983), Smith (1984, 1985), Hwang (1985) and Newey (1985). Some of the proposed tests are asymptotically or even algebraically equivalent.

Sharp-eyed readers may have formed a conjecture by now regarding the subject of this thesis. It is the tests regarding the validity of instruments and the exogeneity of explanatory variables that will receive our interest, and in particular their subset versions. These so-called exogeneity tests have in common that they test whether particular variables are endogenous or predetermined regarding their correlation with the disturbance term. More generally stated, they all infer about the validity of moment conditions. It is therefore no surprise that all tests on subsets of possibly endogenous explanatory variables can be applied to subsets of external instruments too. These tests are of great importance to practitioners as they provide guidance regarding consistency and efficiency of estimators. As shown above, tests for the orthogonality of subsets of variables have received substantial attention. However, generally accepted rules for best practice on how to approach this problem do not seem available and are not yet supported by any simulation evidence. In the first part of this thesis properties of various tests are reviewed and their small sample behaviour is investigated in the context of cross-sectional data.

The second part of this thesis deals with inference techniques in dynamic panel data models and in particular with testing for exogeneity in that special context. Panel data offer some meaningful advantages over single-indexed data. The fact that for every individual several time-series observations are available offers the possibility to deal with unobserved individual effects that are correlated with some of the explanatory variables, possibly eliminating the third source of endogeneity mentioned above. In standard applications these unobserved individual effects have to be assumed constant over time. Several transformations exist that get rid of the unobserved individual effects. In case of dynamic relationships, when the number of individuals is relatively large while covering just a few time periods, the analysis is often based on the generalized method of moments (GMM). This class of estimators, proposed by Hansen (1982), includes both OLS and IV as special cases. GMM is often praised for its flexibility, generality, ease of use and efficiency. As the transformations of the data that get rid of the individual effects render the lagged dependent variable endogenous with respect to the transformed disturbance term, instrumental variables are required. The popular first difference transformation also induces serial dependence in the disturbances, in which case GMM offers efficiency gains over IV when taking this into account. Another benefit of panel data is the almost free availability of internal instruments. As first noted by Anderson and Hsiao (1981), the lagged values of the explanatory variables establish instruments, potentially rendering the search for external instruments superfluous. Several studies have proposed additional instruments. These include Arellano and Bond (1991), Ahn and Schmidt (1995), Arellano and Bover (1995) and Blundell and Bond (1998).

Although yielding an asymptotically efficient estimator, its bias may be substantial if the number of instruments grows large relative to the width of the panel, see for instance Newey and Windmeijer (2009). Not only the bias of the estimator is affected. Bowsher (2002) reports simulation findings on overidentifying restrictions tests and finds that their rejection probabilities approach zero when the number of instruments grows large. Findings like these have led practitioners to consider instrument reduction methods, which include omitting long lags and collapsing. This last technique reduces the number of instruments by taking a particular linear transformation of reduced rank. The two reduction methods can also be combined; a code sketch of both is given below.

Whereas it is common practice to test the overidentifying restrictions after IV or GMM estimation, their subset versions are less frequently used. There is a noticeable exception in dynamic panel data models. The popular Blundell-Bond GMM estimator employs an additional set of instruments with respect to the Arellano-Bond GMM estimator. These additional instruments are only valid under effect stationarity. Therefore Blundell-Bond estimates are often supplied with a subset test on these instruments to validate their use. Also in adequately specified dynamic panel data models the subset tests can be used to classify explanatory variables. In this context the methodological difference between testing the orthogonality of explanatory variables and instruments further fades. Classification of the explanatory variables implies which lags constitute valid instruments. These lags have a logical ordering: the most recent ones are naturally assumed to be the strongest and the ones furthest away are assumed to be the weakest. Wrongly classifying explanatory variables may render a subset of instruments either invalid or cause the practitioner to abstain from using some of the strongest instruments available. As only a subset of instruments is rendered invalid by misclassification, one can directly test this subset instead of relying on the standard overidentifying restrictions test often reported by popular software packages.

Many studies on econometric theory are accompanied by results on Monte Carlo simulations. These Monte Carlo simulations allow researchers to infer about finite sample properties of techniques that are often too complicated to derive analytically. As simulation results are much less general than theoretical results, a well designed simulation study is essential. In this thesis a lot of attention has been paid to designing these simulations. For more information on this topic see Kiviet (2012).
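To illustrate the two instrument reduction methods, the following minimal numpy sketch (our own hypothetical function name and layout conventions; the thesis itself supplies no code) constructs the internal instruments for the first-differenced equation of an AR(1) panel, with options for omitting long lags and for collapsing.

```python
import numpy as np

def dif_instruments(y, max_lag=None, collapse=False):
    """Internal instruments for the first-differenced equation of an AR(1) panel.

    y        : (N, T) array of levels, individuals in rows, periods in columns.
    max_lag  : keep only the max_lag most recent lags (omitting long lags).
    collapse : replace the block-diagonal layout by one column per lag
               distance, a linear transformation of reduced rank.
    For the differenced equation at period t the valid instruments are the
    levels y_{i,0}, ..., y_{i,t-2}.
    """
    N, T = y.shape
    out = []
    for i in range(N):
        rows = [y[i, :t - 1][::-1] for t in range(2, T)]   # most recent lag first
        if max_lag is not None:
            rows = [r[:max_lag] for r in rows]             # omit long lags
        if collapse:
            width = max(len(r) for r in rows)              # columns = lag distances
            Zi = np.zeros((len(rows), width))
            for j, r in enumerate(rows):
                Zi[j, :len(r)] = r
        else:
            width = sum(len(r) for r in rows)              # block-diagonal layout
            Zi = np.zeros((len(rows), width))
            pos = 0
            for j, r in enumerate(rows):
                Zi[j, pos:pos + len(r)] = r
                pos += len(r)
        out.append(Zi)
    return np.vstack(out)
```

For instance, with T = 8 the standard layout yields 21 instrument columns, collapsing reduces this to 6, and collapsing combined with max_lag=2 leaves only 2, which illustrates how drastically these methods curb instrument proliferation.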

1.1 Outline of the thesis

Tests for classification as endogenous or predetermined of arbitrary subsets of explanatory variables are investigated in Chapter 2, which is based on Kiviet and Pleus (2014). These tests are formulated as significance tests in auxiliary IV regressions and their relationships with various more classic test procedures are examined and critically compared with statements in the literature. Then simulation experiments are designed by solving the data generating process parameters from salient econometric features, namely: degree of simultaneity and multicollinearity of regressors, and individual and joint strength of external instrumental variables. Next, for various test implementations, a wide class of relevant cases is scanned for flaws in performance regarding type I and type II errors. Substantial size distortions occur, but these can be cured remarkably well through bootstrapping, except when instruments are weak. The power of the subset tests is such that they establish an essential addition to the well-known classic full-set DWH tests in a data based classification of individual explanatory variables.

The performance of overidentifying restrictions tests is examined in Chapter 3. Well-known tests such as the Sargan (1958) test, the test by Basmann (1960) and a Hausman-type test are examined. Power properties of these tests and their incremental versions are derived, and we clarify recent papers stating that these tests may provide little comfort in testing the actual moment conditions, as they may have power equal to size in specific cases. On the other hand it is possible that a significant test outcome is not the result of invalid moment conditions: in the presence of conditional heteroskedasticity, the variance of the moment conditions may either be underestimated or overestimated. A rarely applied Cornish-Fisher correction for the Sargan test proposed by Magdalinos (1985) is re-examined. Additionally, simulation experiments are used to compare the different test statistics and we conclude that in small samples the Cornish-Fisher corrected test outperforms the others in terms of size control. However, with respect to the power properties of the various tests we find that the corrected Sargan test performs less well than the other statistics when instruments are weak.

The following two chapters are based on Kiviet et al. (2014). Although separated due to their joint length, they are inextricably linked. Both chapters are about the popular Arellano-Bond and Blundell-Bond GMM estimation techniques for single linear dynamic panel data models. Studies employing these techniques are growing exponentially in number. However, for researchers it is hard to make a reasoned choice between the many different possible implementations of these estimators and associated tests. Chapter 4, which focuses on theory and techniques, explicates in a systematic way many options regarding: (i) reducing, extending or modifying the set of instruments; (ii) specifying the weighting matrix in relation to the type of heteroskedasticity; (iii) using (robustified) one-step or (corrected) two-step variance estimators; (iv) employing one-step or two-step residuals in Sargan-Hansen overall or incremental overidentification restrictions tests.

This is all done for models in which some regressors may be either strictly exogenous, predetermined or endogenous.

Chapter 5 examines the performance in finite samples of the inference techniques discussed in Chapter 4 by simulation. The root mean squared errors of the coefficient estimators are compared, as well as the size of tests on coefficient values and of different implementations of overidentification restriction tests. Also the size and power of tests on the validity of the additional orthogonality conditions exploited by the Blundell-Bond technique are assessed over a pretty wide grid of relevant cases. Surprisingly, particular asymptotically optimal and relatively robust weighting matrices are found to be superior in finite samples to ostensibly more appropriate versions. Most of the variants of tests for overidentification restrictions show serious deficiencies. A recently developed modification of GMM is found to have potential when the cross-sectional heteroskedasticity is pronounced and the time-series dimension of the sample is not too small. Finally all techniques are employed on actual data and lead to some profound insights.

In Chapter 6 we investigate the performance of subset tests in adequately specified linear dynamic panel data models, estimated by GMM. Because in that context usually just internal instruments are being exploited, misclassification of explanatory variables renders either a specific subset of instruments invalid or yields inefficient estimates. Rather than testing all overidentifying restrictions by the Sargan-Hansen test, the focus is on subsets, using either the incremental Sargan-Hansen test or a Hausman test. Although it is known in the literature that the Sargan-Hansen test suffers when using many instruments, it is yet unclear in what way the incremental test is affected. Therefore, test statistics are considered in which the number of employed instruments is deliberately restricted. Two possible refinements are proposed. Recently Hayakawa (2014) has proposed a method which forces a block diagonal structure on the weighting matrix in order to reduce problems stemming from taking its inverse. This method is generalized to the incremental test. A finite sample corrected variance estimate for the vector of contrasts is derived, from which two new Hausman test statistics are constructed. Simulation is used to investigate finite sample performance. The corrected Hausman statistics outperform the standard Hausman implementation uniformly. The test by Hayakawa (2014) and its incremental version are found to yield little benefit and often perform less well than the standard implementations. Collapsing is found to improve performance regarding size control, as is estimating variances under the null hypothesis. The results are illustrated using a study on the effect of deterrence on crime.

This thesis concludes with a summary in English and in Dutch.

Chapter 2

The performance of tests on endogeneity of subsets of explanatory variables scanned by simulation

2.1 Introduction

In this chapter various test procedures are derived and examined for the classification of arbitrary subsets of explanatory variables as either endogenous or exogenous with respect to a single adequately specified structural equation. Correct classification is highly important because misclassification leads to either inefficient or inconsistent estimation. The derivations, which in essence are based on employing Hausman's principle of examining the discrepancy between two alternative estimators, formulate the various tests as joint significance tests of additional regressors in auxiliary IV regressions. Their relationships are demonstrated with particular forms of classic tests such as Durbin-Wu-Hausman orthogonality tests, Revankar-Hartley covariance tests and Sargan-Hansen overidentification restriction tests. Various different, and some under the null hypothesis asymptotically equivalent, implementations follow. The latter vary only regarding degrees of freedom adjustments and the type of disturbance variance estimator employed. We run simulations over a wide class of relevant cases, to find out which versions have best control over type I error probabilities and to get an idea of the power of these tests. This should help to use these tests effectively in practice when trying to avoid both evils of inconsistency and inefficiency.

To that end a simulation approach is developed by which relevant data generating processes (DGPs) are designed by deriving the values for their parameters from chosen salient features of the system, namely: degree of simultaneity of individual explanatory variables, degree of multicollinearity between explanatory variables, and individual and joint strength of employed external instrumental variables.

This allows scanning the relevant parameter space of wide model classes for flaws in performance regarding type I and type II errors of all implementations of the tests and their bootstrapped versions. We find that testing orthogonality by standard methods is impeded for weakly identified regressors. Just as bootstrapped tests require resampling under the null, we find here that testing for orthogonality by auxiliary regressions benefits from estimating variances under the null, as in Lagrange multiplier tests, rather than under the alternative, as in Wald-type tests. However, after proper size correction we find that the Wald-type tests exhibit the best power properties.

Procedures for testing the orthogonality of all possibly endogenous regressors regarding the error term have been developed by Durbin (1954), Wu (1973), Revankar and Hartley (1973), Revankar (1978) and Hausman (1978). Mutual relationships between these are discussed in Nakamura and Nakamura (1981) and Hausman and Taylor (1981). This test problem has been put into a likelihood framework under normality by Holly (1982) and Smith (1983). Most of the papers just mentioned, and in particular Davidson and MacKinnon (1989, 1990), provide a range of implementations for these tests that can easily be obtained from auxiliary regressions. Although this type of inference problem does address one of the basic fundaments of the econometric analysis of observational data, relatively little evidence on the performance of the available tests in finite samples is available. Monte Carlo studies on the performance of some of the implementations in static linear models can be found in Wu (1974), Meepagala (1992), Chmelarova and Hill (2010), Jeong and Yoon (2010), Hahn et al. (2011) and Doko Tchatoka (2014), whereas such results for linear dynamic models are presented in Kiviet (1985).

The more subtle problem of deriving a test for the orthogonality of subsets of the regressors, not involving all of the possibly endogenous regressors, has also received substantial attention over the last three decades. Nevertheless, generally accepted rules for best practice on how to approach this problem do not seem available yet, or are confusing as we shall see, and are not yet supported by any simulation evidence. Self-evidently, though, the situation where one is convinced of the endogeneity of a few of the regressors, but wants to test some other regressors for orthogonality, is of high practical relevance. If orthogonality is established, this permits using these regressors as instrumental variables, which (if correct) improves the efficiency and the identification situation, because it makes the analysis less dependent on the availability of external instruments. This is important in particular when available external instruments are weak or of doubtful exogeneity status.

Testing the orthogonality of subsets of the possibly endogenous regressors was addressed first by Hwang (1980b) and next by Spencer and Berk (1981, 1982), Wu (1983), Smith (1984, 1985), Hwang (1985) and Newey (1985), who all suggest various test procedures, some of them asymptotically or even algebraically equivalent. So do Pesaran and Smith (1990), who also provide theoretical arguments regarding an ordering of the power of the various tests, although they are asymptotically equivalent under the null and under local alternatives. Various of the possible subset test implementations are paraphrased in Ruud (1984, 2000), Davidson and MacKinnon (1993) and in Baum et al. (2003), and occasionally their relationships with particular forms of Sargan-Hansen (partial-)overidentification test statistics are examined. As we shall show, a few particular situations still call for further analysis and formal proofs, and sometimes results from the studies mentioned above have to be corrected. As far as we know, there are no published simulation results yet on the actual qualities of tests for the exogeneity of arbitrary subsets of the regressors in finite samples.

In this chapter we shall try to elucidate the various forms of available test statistics for the endogeneity of subsets of the regressors, demonstrate their origins and their relationships, and also produce solid Monte Carlo results on their performance in single static linear models with endogenous regressors and IID disturbances. That as yet no simulation results are available on subset tests may be due to the fact that it is not straightforward how one should design a range of appealing and representative experiments. We believe that in this respect the present study, which closely follows the rules set out in Kiviet (2012), may claim originality. Besides exploiting some invariance properties, we choose the remaining parameter values for the DGP indirectly from the inverse relationships between the DGP parameter values and fundamental orthogonal econometric notions. The latter constitute an insightful base for the relevant nuisance parameter space. The present design can easily be extended to cover cases with a more realistic degree of overidentification and number of jointly dependent regressors. Other obvious extensions would be: to include recently developed tests which are specially built to cope with weak instruments, to consider non-Gaussian and non-IID disturbances, to examine dynamic models, to include tests for the validity (orthogonality) of instruments which are not included in the regression, etc. Regarding all these aspects the present study just offers an initial reference point.

The structure of the chapter is as follows. In Section 2.2, we first define the model's maintained properties and the hypothesis to be tested. Next, in a series of subsections, various routes to develop test procedures are followed and their resulting test statistics are discussed and compared analytically. Section 2.3 reviews earlier Monte Carlo designs and results regarding orthogonality tests.

In Section 2.4 we set out our approach to obtain DGP parameter values from chosen basic econometric characteristics. A simulation design is obtained to parametrize a synthetic single linear static regression model including two possibly endogenous regressors with an intercept and involving two external instruments. For this design Section 2.5 presents simulation results for a selection of practically relevant parametrizations. Section 2.6 produces similar results for bootstrapped versions of the tests, Section 2.7 provides an empirical case study and Section 2.8 concludes.

2.2 Testing the orthogonality of subsets of explanatory variables

2.2.1 The model and setting

We consider the single linear equation with endogenous regressors

$y = X\beta + u, \qquad (2.1)$

with IID unobserved disturbances $u \sim (0, \sigma^2 I_n)$, $K$-element unknown coefficient vector $\beta$, an $n \times K$ regressor matrix $X$ and $n \times 1$ regressand $y$. We also have an $n \times L$ matrix $Z$ containing sample observations on identifying instrumental variables, so

$E(Z'u) = 0, \quad \operatorname{rank}(Z) = L, \quad \operatorname{rank}(X) = K \quad \text{and} \quad \operatorname{rank}(Z'X) = K. \qquad (2.2)$

In addition, we make asymptotic regularity assumptions to guarantee asymptotic identification of all elements of $\beta$ too and consistency of its IV (or 2SLS) estimator

$\hat{\beta} = (X'P_Z X)^{-1} X'P_Z y, \qquad (2.3)$

where $P_Z = Z(Z'Z)^{-1}Z'$. Hence, we assume that

$\operatorname{plim}\, n^{-1} Z'Z = \Sigma_{Z'Z} \quad \text{and} \quad \operatorname{plim}\, n^{-1} Z'X = \Sigma_{Z'X} \qquad (2.4)$

are finite and have full column rank, whereas $\hat{\beta}$ has limiting normal distribution

$n^{1/2}(\hat{\beta} - \beta) \stackrel{d}{\rightarrow} N\!\left(0, \sigma^2 [\Sigma_{Z'X}' \Sigma_{Z'Z}^{-1} \Sigma_{Z'X}]^{-1}\right). \qquad (2.5)$

The matrices $X$ and $Z$ may have some (but not all) columns in common and can therefore be partitioned as

$X = (Y \;\; Z_1) \quad \text{and} \quad Z = (Z_1 \;\; Z_2), \qquad (2.6)$

where $Z_j$ has $L_j$ columns for $j = 1, 2$. Because the number of columns in $Y$ is $K - L_1 > 0$, we find from $L = L_1 + L_2 \geq K$ that $L_2 > 0$, but we allow $L_1 \geq 0$, so $Z_1$ may be void. Throughout this chapter the model just defined establishes the maintained unrestrained¹ hypothesis, which allows $Y$ to contain endogenous variables. Below we will examine particular further curbed versions of the maintained hypothesis and develop tests to verify these further limitations. These are not parametric restraints regarding $\beta$ but involve orthogonality conditions in addition to the $L$ maintained orthogonality conditions embedded in $E(Z'u) = 0$. All these extra orthogonality conditions concern regressors and not further external instrumental variables. Therefore, we consider a partitioning of $Y$ in $K_e$ and $K_o$ columns

$Y = (Y_e \;\; Y_o), \qquad (2.7)$

where the variables $Y_e$ are maintained as possibly ēndogenous, whereas for the $K_o$ variables $Y_o$ their possible ōrthogonality will be examined, i.e. whether $E(Y_o'u) = 0$ seems to hold. We define the $n \times (L + K_o)$ matrix

$Z_r = (Z \;\; Y_o), \qquad (2.8)$

which relates to all the orthogonality conditions in the restrained model. Note that (2.2) implies that $Z_r$ has full column rank, provided $n \geq L + K_o$. Now the null and alternative hypotheses that we will examine can be expressed as

$H_0: \; y = X\beta + u, \;\; u \sim (0, \sigma^2 I), \;\; E(Z_r'u) = 0, \quad \text{and} \qquad (2.9)$
$H_1: \; y = X\beta + u, \;\; u \sim (0, \sigma^2 I), \;\; E(Z'u) = 0, \; E(Y_o'u) \neq 0.$

Hence, $H_0$ assumes $E(Y_o'u) = 0$. Under the extended set of orthogonality conditions $E(Z_r'u) = 0$, i.e. under $H_0$, the restrained IV estimator is

$\hat{\beta}_r = (X'P_{Z_r}X)^{-1} X'P_{Z_r}y. \qquad (2.10)$

If $H_0$ is valid this estimator is consistent and, provided $\operatorname{plim}\, n^{-1} Z_r'Z_r = \Sigma_{Z_r'Z_r}$ exists and is invertible, its limiting normal distribution has variance $\sigma^2[\Sigma_{Z_r'X}' \Sigma_{Z_r'Z_r}^{-1} \Sigma_{Z_r'X}]^{-1}$, which involves an asymptotic efficiency gain over (2.5).

¹ The terms restrained and unrestrained are used rather than restricted and unrestricted. This is done in order to avoid confusion, as the latter terms are often used in the context of coefficient restrictions.
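For concreteness, the estimators (2.3) and (2.10) can be computed as in the following numpy sketch; the helper name is ours and purely illustrative, and no claims about numerical refinements are intended.

```python
import numpy as np

def iv(X, Z, y):
    """IV/2SLS: beta = (X'P_Z X)^{-1} X'P_Z y with P_Z = Z(Z'Z)^{-1}Z'."""
    PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)   # P_Z X, the first-stage fit
    beta = np.linalg.solve(PZX.T @ X, PZX.T @ y)  # uses (P_Z X)'X = X'P_Z X
    return beta, y - X @ beta                     # coefficients and residuals

# Unrestrained estimator (2.3) versus restrained estimator (2.10), which
# treats the tested regressors Y_o as valid and uses Z_r = (Z, Y_o):
#   beta_hat, u_hat = iv(X, Z, y)
#   beta_r,   u_r   = iv(X, np.hstack([Z, Yo]), y)
```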

However, under the alternative hypothesis $H_1$ the estimator $\hat{\beta}_r$ is inconsistent. A test for (2.9) should (as always) have good control over its type I error probability² and preferably also have high power, in order to prevent the acceptance of an inconsistent estimator. In practice inference on (2.9) usually establishes just one link in a chain of tests to decide on the adequacy of model specification (2.1) and the maintained instruments $Z$; see for instance Godfrey and Hutton (1994) and Guggenberger (2010). Many of the firm results obtained below require the very strong assumptions embedded in (2.1) and (2.2), and leave it to the practitioner to make a balanced use of them within an actual modelling context.

In the derivations to follow we make use of the following three properties of projection matrices, which for any full column rank matrix $A$ are denoted as $P_A = A(A'A)^{-1}A'$. For a full column rank matrix $C = (A \;\; B)$ one has: (i) $P_A = P_C P_A = P_A P_C$; (ii) $P_C = P_A + P_{M_A B} = P_{(A \;\; M_A B)}$, where $M_A = I - P_A$; (iii) for $C^* = (A^* \;\; B)$, where $A^* = A - BD$ and $D$ an arbitrary matrix of appropriate dimensions, $P_{C^*} = P_B + P_{M_B A^*} = P_B + P_{M_B A} = P_C$.

2.2.2 The source of any estimator discrepancy

A test based on the Hausman principle focusses on the discrepancy vector

$\hat{\beta} - \hat{\beta}_r = (X'P_Z X)^{-1}X'P_Z y - (X'P_{Z_r}X)^{-1}X'P_{Z_r}y$
$\quad = (X'P_Z X)^{-1}X'P_Z[I - X(X'P_{Z_r}X)^{-1}X'P_{Z_r}]y$
$\quad = (X'P_Z X)^{-1}(P_Z X)'\hat{u}_r$
$\quad = (X'P_Z X)^{-1}(P_Z Y_e \;\; P_Z Y_o \;\; Z_1)'\hat{u}_r, \qquad (2.11)$

where $\hat{u}_r = y - X\hat{\beta}_r$ denotes the IV residuals obtained under $H_0$. Although testing whether the discrepancy between these two coefficient estimators is significantly different from zero is not equivalent to testing $H_0$, we will show that in fact all existing test procedures employ the outcome of this discrepancy to infer on the (in)validity of $H_0$. Because $(X'P_Z X)^{-1}$ is non-singular, $\hat{\beta} - \hat{\beta}_r$ is close to zero if and only if the $K \times 1$ vector $(P_Z Y_e \;\; P_Z Y_o \;\; Z_1)'\hat{u}_r$ is. So, we will examine now when its three sub-vectors

$Y_e'P_Z\hat{u}_r, \quad Y_o'P_Z\hat{u}_r \quad \text{and} \quad Z_1'\hat{u}_r \qquad (2.12)$

will jointly be close to zero.

² An actual type I error probability much larger than the chosen nominal value would more often than intended lead to using an inefficient estimator. A much lower actual type I error than the nominal level would deprive the test of its power, hampering the detection of estimator inconsistency.

Note that due to the identification assumptions both $P_Z Y_e$ and $P_Z Y_o$ will have full column rank, so cannot be $O$. For the IV residuals $\hat{u}_r$ we have $X'P_{Z_r}\hat{u}_r = 0$, and since $P_{Z_r}X = (P_{Z_r}Y_e \;\; Y_o \;\; Z_1)$, this yields

$Y_e'P_{Z_r}\hat{u}_r = 0, \quad Y_o'\hat{u}_r = 0 \quad \text{and} \quad Z_1'\hat{u}_r = 0. \qquad (2.13)$

Note that the third vector of (2.12) is always zero, according to the third equality from (2.13). Using projection matrix property (ii) and the first equality of (2.13), we find for the first vector of (2.12) that

$Y_e'P_Z\hat{u}_r = Y_e'(P_{Z_r} - P_{M_Z Y_o})\hat{u}_r = -Y_e'P_{M_Z Y_o}\hat{u}_r,$

so

$Y_e'P_Z\hat{u}_r = -Y_e'M_Z Y_o(Y_o'M_Z Y_o)^{-1}Y_o'M_Z\hat{u}_r. \qquad (2.14)$

This $K_e$-element vector will be close to zero when the $K_o$-element vector $Y_o'M_Z\hat{u}_r$ is. Due to the occurrence of the $K_e \times K_o$ matrix $Y_e'M_Z Y_o$ as a first factor in the right-hand side of (2.14), it seems possible that $Y_e'P_Z\hat{u}_r$ may be close to zero too in cases where $Y_o'M_Z\hat{u}_r \neq 0$; we will return to that possibility below. For the second vector of (2.12) we find, upon using the second equality of (2.13), that

$Y_o'P_Z\hat{u}_r = -Y_o'M_Z\hat{u}_r. \qquad (2.15)$

Hence, the second vector of (2.12) will be close to zero if and only if the vector $Y_o'M_Z\hat{u}_r$ is close to zero. From the above it follows that $Y_o'M_Z\hat{u}_r$ being close to zero is both necessary and sufficient for the full discrepancy vector (2.11) to be small. Checking whether $Y_o'M_Z\hat{u}_r$ is close to zero corresponds to examining to what degree the variables $M_Z Y_o$ obey the orthogonality conditions, while using $\hat{u}_r$ as a proxy for $u$, which is asymptotically valid under the extended set of orthogonality conditions. Note that by focussing on $M_Z Y_o$ the tested variables $Y_o$ have been purged from their components spanned by the columns of $Z$. Since these are maintained to be orthogonal with respect to $u$, they should better be excluded from the test indeed. Since the inverse matrix in the right-hand side of (2.11) is positive definite, the probability limits of $\hat{\beta}$ and $\hat{\beta}_r$ will be similar if and only if $\operatorname{plim}\, n^{-1}Y_o'M_Z\hat{u}_r = 0$. Regarding the power of any discrepancy based test of (2.9) it is now of great interest to examine whether it could happen under $H_1$ that $\operatorname{plim}\, n^{-1}Y_o'M_Z\hat{u}_r = 0$.
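The identities just derived are easily verified numerically. The toy DGP below (arbitrary illustrative parameter values, reusing the hypothetical iv-style computations from above) confirms that $Y_o'\hat{u}_r = 0$, so that indeed $Y_o'P_Z\hat{u}_r = -Y_o'M_Z\hat{u}_r$ as in (2.15).

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 200, 4
Z = rng.standard_normal((n, L))
u = rng.standard_normal((n, 1))
Ye = Z @ rng.standard_normal((L, 1)) + 0.5 * u + rng.standard_normal((n, 1))
Yo = Z @ rng.standard_normal((L, 1)) + 0.3 * u + rng.standard_normal((n, 1))
X = np.hstack([Ye, Yo, Z[:, :1]])                # Z_1 is the first column of Z
y = X @ np.ones((3, 1)) + u

def P(A):                                        # projection onto col(A)
    return A @ np.linalg.solve(A.T @ A, A.T)

Zr = np.hstack([Z, Yo])                          # restrained instrument set
beta_r = np.linalg.solve(X.T @ P(Zr) @ X, X.T @ P(Zr) @ y)
u_r = y - X @ beta_r                             # restrained IV residuals
M_Z = np.eye(n) - P(Z)
print(Yo.T @ u_r)                                # ~0: second equality of (2.13)
print(Yo.T @ P(Z) @ u_r + Yo.T @ M_Z @ u_r)      # ~0: hence (2.15) holds
```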

For that purpose we specify the reduced form equations

$Y_j = Z\Pi_j + (u\gamma_j' + V_j), \quad \text{for } j \in \{e, o\}, \qquad (2.16)$

where $\Pi_j$ is an $L \times K_j$ matrix of reduced form parameters, $\gamma_j$ is a $K_j \times 1$ vector that parametrizes the simultaneity, and $V_j$ establishes the components of the zero mean reduced form disturbances which are uncorrelated with $u$ and of course with $Z$. After this further parametrization the hypotheses (2.9) can now be expressed as $H_0: \gamma_o = 0$ and $H_1: \gamma_o \neq 0$. Let the $(L + K_o) \times (L + K_o)$ matrix $\Psi$ be such that $\Psi\Psi' = (Z_r'Z_r)^{-1}$. From

$Y_o'M_Z\hat{u}_r = -Y_o'P_Z[I_n - X(X'P_{Z_r}X)^{-1}X'P_{Z_r}]u$
$\quad = -Y_o'P_Z[P_{Z_r} - P_{Z_r}X(X'P_{Z_r}X)^{-1}X'P_{Z_r}]u$
$\quad = -Y_o'P_Z Z_r\Psi[I_{L+K_o} - P_{\Psi'Z_r'X}]\Psi'Z_r'u \qquad (2.17)$

it follows that $\operatorname{plim}\, n^{-1}Y_o'M_Z\hat{u}_r = 0$ if the $(L + K_o) \times 1$ vector $\operatorname{plim}\, n^{-1}Z_r'u = \sigma^2(0' \;\; \gamma_o')'$ is in the column space spanned by $\operatorname{plim}\, n^{-1}Z_r'X = \Sigma_{Z_r'X}$. This is obviously the case when $\gamma_o = 0$. However, it cannot occur for $\gamma_o \neq 0$, because the $(L + K_o) \times 1$ vector $\Sigma_{Z_r'X}c$, with $c$ a $K \times 1$ vector, has its first $L \geq K$ elements equal to zero only for $c = 0$, due to the identification assumptions. This excludes the existence of a vector $c \neq 0$ yielding $\Sigma_{Z_r'X}c = \sigma^2(0' \;\; \gamma_o')'$ when $\gamma_o \neq 0$, so under asymptotic identification the discrepancy will be nonzero asymptotically when $Y_o$ contains an endogenous variable. Cases in which the asymptotic identification assumptions are violated are $\Pi_e = C_e/\sqrt{n}$ and/or $\Pi_o = C_o/\sqrt{n}$, where $C_e$ and $C_o$ are matrices of appropriate dimensions with full column rank and all elements fixed and finite.³ Examining $\Sigma_{Z_r'X}c$ closer yields

$\Sigma_{Z_r'X}c = \begin{pmatrix} \Sigma_{Z'Z}\Pi_e \\ \Pi_o'\Sigma_{Z'Z}\Pi_e + \Sigma_{V_o'V_e} + \sigma^2\gamma_o\gamma_e' \end{pmatrix}c_1 + \begin{pmatrix} \Sigma_{Z'Z}\Pi_o \\ \Sigma_{Y_o'Y_o} \end{pmatrix}c_2 + \begin{pmatrix} \Sigma_{Z'Z_1} \\ \Pi_o'\Sigma_{Z'Z_1} \end{pmatrix}c_3, \qquad (2.18)$

where $c' = (c_1' \;\; c_2' \;\; c_3')$ and $\Sigma_{V_o'V_e} = \operatorname{plim}\, n^{-1}V_o'V_e$. If only $\Pi_o = C_o/\sqrt{n}$, so when all the instruments $Z$ are weak and asymptotically irrelevant for the set of regressors $Y_o$ whose orthogonality is tested, we can set $c_1 = 0$ and $c_3 = 0$, and then for $c_2 = \sigma^2\Sigma_{Y_o'Y_o}^{-1}\gamma_o = \sigma^2(\sigma^2\gamma_o\gamma_o' + \Sigma_{V_o'V_o})^{-1}\gamma_o \neq 0$ we have $\Sigma_{Z_r'X}c = \sigma^2(0' \;\; \gamma_o')' \neq 0$, demonstrating that the test will have no asymptotic power. If only $\Pi_e = C_e/\sqrt{n}$, thus all the instruments $Z$ are weak for $Y_e$, a solution $c \neq 0$ can be found upon taking $c_2 = 0$, $c_3 = 0$ and $c_1 \neq 0$, provided $\Sigma_{V_o'V_e} + \sigma^2\gamma_o\gamma_e' \neq O$, i.e. $Y_e$ and $Y_o$ are asymptotically not uncorrelated. Only $c_3$ has to be set at zero to find a solution when $Z$ is weak for both $Y_o$ and $Y_e$.

³ Doko Tchatoka (2014) considers a similar situation for the special case $K_e = 0$ and $K_o = 1$.

From (2.18) it can also be established that when at least $K_e + K_o$ instruments from $Z_2$ are not weak for $Y$, the discrepancy will always be different from zero asymptotically when $\gamma_o \neq 0$. Using (2.16) we also find $\operatorname{plim}\, n^{-1}Y_e'M_Z Y_o = \Sigma_{V_e'V_o} + \sigma^2\gamma_e\gamma_o'$, which demonstrates that the first vector of (2.12) would for $\gamma_o \neq 0$ tend to zero also when $\gamma_e = 0$ while the reduced form disturbances of $Y_e$ and $Y_o$ are uncorrelated. This indicates the plausible result that a discrepancy based test may lose power when $Y_e$ is unnecessarily treated as endogenous and $Y_o$ is establishing a weak instrument for $Y_e$ after partialling out $Z$.

2.2.3 Testing based on the source of any discrepancy

Next we examine the implementation of testing closeness to zero of $Y_o'M_Z\hat{u}_r$ in an auxiliary regression. Consider

$y = X\beta + P_Z Y_o\zeta + u^*, \qquad (2.19)$

where $u^* = u - P_Z Y_o\zeta$. Its estimation by IV employing the instruments $Z_r$ yields coefficients that can be obtained by applying OLS to the second-stage regression of $y$ on $P_{Z_r}X$ and $P_{Z_r}P_Z Y_o = P_Z Y_o$. For $\zeta$ partitioned regression yields

$\hat{\zeta} = (Y_o'P_Z M_{P_{Z_r}X}P_Z Y_o)^{-1}Y_o'P_Z M_{P_{Z_r}X}y, \qquad (2.20)$

where, using rule (i),

$Y_o'P_Z M_{P_{Z_r}X}y = Y_o'P_Z[I - X(X'P_{Z_r}X)^{-1}X'P_{Z_r}]y = Y_o'P_Z\hat{u}_r.$

Thus, by testing $\zeta = 0$ in (2.19) we in fact examine whether $Y_o'P_Z\hat{u}_r = -Y_o'M_Z\hat{u}_r$ differs significantly from a zero vector, which is indeed what we aim for.⁴ Alternatively, consider the auxiliary regression

$y = X\beta + M_Z Y_o\xi + v^*, \qquad (2.21)$

where $v^* = u - M_Z Y_o\xi$. Using the instruments $Z_r$ involves here applying OLS to the second-stage regression of $y$ on $P_{Z_r}X$ and $P_{Z_r}M_Z Y_o = P_{Z_r}Y_o - P_{Z_r}P_Z Y_o = Y_o - P_Z Y_o = M_Z Y_o$. This yields

$\hat{\xi} = (Y_o'M_Z M_{P_{Z_r}X}M_Z Y_o)^{-1}Y_o'M_Z M_{P_{Z_r}X}y, \qquad (2.22)$

⁴ This procedure provides the explicit solution to the exercise posed in Davidson and MacKinnon (1993, p. 242).

where

$Y_o'M_Z M_{P_{Z_r}X}y = Y_o'M_Z[I - P_{Z_r}X(X'P_{Z_r}X)^{-1}X'P_{Z_r}]y$
$\quad = Y_o'[I - X(X'P_{Z_r}X)^{-1}X'P_{Z_r}]y - Y_o'P_Z[I - X(X'P_{Z_r}X)^{-1}X'P_{Z_r}]y$
$\quad = Y_o'M_Z\hat{u}_r. \qquad (2.23)$

Thus, like testing $\zeta = 0$ in (2.19), testing $\xi = 0$ in auxiliary regression (2.21) examines the magnitude of $Y_o'M_Z\hat{u}_r$. The estimator for $\beta$ resulting from (2.21) is $(X'P_{Z_r}M_{M_Z Y_o}P_{Z_r}X)^{-1}X'P_{Z_r}M_{M_Z Y_o}y$. Because

$P_{Z_r}M_{M_Z Y_o} = P_{Z_r} - P_{Z_r}P_{M_Z Y_o} = P_{Z_r} - (P_Z + P_{M_Z Y_o})P_{M_Z Y_o} = P_{Z_r} - P_{M_Z Y_o} = P_Z,$

we find that this estimator equals $\hat{\beta}$. Hence, the IV estimator of $\beta$ exploiting the extended set of instruments in the auxiliary model (2.21) equals the unrestrained IV estimator $\hat{\beta}$. Many text books mention this result for the special case $K_e = 0$.

From the above we find that testing whether included possibly endogenous variables $Y_o$ can actually be used effectively as valid extra instruments can be done as follows: add them to $Z$, so use $Z_r$ as instruments, and add at the same time the regressors $M_Z Y_o$ (the reduced form residuals of the alleged endogenous variables $Y_o$ in the maintained model) to the model, and then test their joint significance.

When testing $\xi = 0$ in (2.21) by a Wald-type statistic, and assuming for the moment that $\sigma^2$ is known, the test statistic is

$\sigma^{-2}y'P_{M_{P_{Z_r}X}M_Z Y_o}y = \sigma^{-2}y'(M_A - M_C)y, \qquad (2.24)$

where $A = P_{Z_r}X$, $B = M_Z Y_o$ and $C = (A \;\; B)$. Hence, $y'P_{M_{P_{Z_r}X}M_Z Y_o}y$ is equal to the difference between the OLS residual sums of squares of the restricted (by $\xi = 0$) and the unrestricted second stage regressions of (2.21). One easily finds that testing $\zeta = 0$ in (2.19) by a Wald-type test yields in the numerator $y'P_{M_{P_{Z_r}X}P_Z Y_o}y = y'(M_A - M_{C^*})y$, with again $A = P_{Z_r}X = (P_{Z_r}Y_e \;\; Y_o \;\; Z_1)$, but $C^* = (A \;\; B^*)$ with $B^* = P_Z Y_o$. Although $C^* \neq C$, both span the same subspace, so $M_{C^*} = M_C$, and thus the two auxiliary regressions lead to numerically equivalent Wald-type test statistics.

Of course, $\sigma^2$ is in fact unknown and should be replaced by an estimator that is consistent under the null. There are various options for this. Two rather obvious choices would be $\hat{\sigma}^2 = \hat{u}'\hat{u}/n$ or $\hat{\sigma}_r^2 = \hat{u}_r'\hat{u}_r/n$, giving rise to two under the null (and also under local alternatives) asymptotically equivalent test statistics, both with $\chi^2(K_o)$ asymptotic null distribution.

Further asymptotically equivalent variants can be obtained by employing a degrees of freedom correction in the estimation of $\sigma^2$ and/or by dividing the test statistic by $K_o$ and then confronting it with critical values from an $F$ distribution with $K_o$ and $n - l$ degrees of freedom, with $l$ some finite number, possibly $K + K_o$.

Testing the orthogonality of $Y_o$ and $u$, while maintaining the endogeneity of $Y_e$, by a simple $\chi^2$-form statistic and using, as in a Wald-type test, the estimate $\hat{\sigma}^2$ (without any degrees of freedom correction) from the unrestrained model, will be indicated by $W_o$. When using the uncorrected restrained estimator $\hat{\sigma}_r^2$, the statistic will be denoted here as $D_o$. So we have the two archetype test statistics

$W_o = y'P_{M_{P_{Z_r}X}M_Z Y_o}y/\hat{\sigma}^2 \quad \text{and} \quad D_o = y'P_{M_{P_{Z_r}X}M_Z Y_o}y/\hat{\sigma}_r^2. \qquad (2.25)$

Using the restrained $\sigma^2$ estimator, as in a Lagrange-multiplier-type test under normality, was already suggested in Durbin (1954, p. 27), where $K_e = L_1 = 0$ and $K_o = L_2 = 1$.

Before we discuss further options for estimating $\sigma^2$ in general subset tests, we shall first focus on the special case $K_e = 0$, where the full set of endogenous regressors is tested. Then $\hat{\sigma}_r^2 = y'M_X y/n = \frac{n-K}{n}s^2$ stems from OLS. Wu (1973) suggested for this case four test statistics, indicated as $T_1, \ldots, T_4$, where

$T_4 = \frac{n - 2K_o - L_1}{n}\,\frac{D_o}{K_o} \quad \text{and} \quad T_3 = \frac{n - 2K_o - L_1}{n - K_o}\,\frac{W_o}{K_o}. \qquad (2.26)$

On the basis of his simulation results Wu recommended to use the monotonic transformation of $T_4$ (or $D_o$)

$T_2 = \frac{T_4}{1 - K_o T_4/(n - 2K_o - L_1)} = \frac{n - 2K_o - L_1}{K_o}\,\frac{D_o/n}{1 - D_o/n}. \qquad (2.27)$

He showed that under normality of both structural and reduced form disturbances the null distribution of $T_2$ is $F(K_o, n - 2K_o - L_1)$ in finite samples.⁵ Because $K_e = 0$ implies $M_{P_{Z_r}X} = M_X$, we find from (2.24) that in this case

$\frac{D_o}{1 - D_o/n} = \frac{n\,y'P_{M_X M_Z Y_o}y}{y'(M_X - P_{M_X M_Z Y_o})y} = \frac{n\,y'P_{M_X M_Z Y_o}y}{y'M_{(X\;M_Z Y_o)}y} = \frac{y'P_{M_X M_Z Y_o}y}{\tilde{\sigma}^2}.$

Hence, from the final expression we see that $T_2$ estimates $\sigma^2$ by $\tilde{\sigma}^2 = y'M_{(X\;M_Z Y_o)}y/n$, which is the OLS residual variance of auxiliary regression (2.21).

⁵ Wu's $T_1$ test for the case $K_e = 0$, which under normality has an $F(K_o, L_2 - K_o)$ distribution, has a poor reputation in terms of power and therefore we leave it aside.
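In code the two archetypes are obtained directly from the difference between the two second-stage residual sums of squares; a minimal sketch, reusing the hypothetical iv and P helpers from above:

```python
import numpy as np

def wo_do(X, Z, Yo, y):
    """Archetype statistics W_o and D_o of (2.25); chi2(K_o) reference under H0."""
    n = len(y)
    Zr = np.hstack([Z, Yo])
    _, u_hat = iv(X, Z,  y)                      # unrestrained IV residuals
    _, u_r   = iv(X, Zr, y)                      # restrained IV residuals
    A = P(Zr) @ X                                # second-stage regressors
    B = Yo - P(Z) @ Yo                           # M_Z Y_o
    C = np.hstack([A, B])
    q = (y.T @ (P(C) - P(A)) @ y).item()         # y'(M_A - M_C)y, cf. (2.24)
    Wo = q / ((u_hat.T @ u_hat).item() / n)      # Wald-type, sigma-hat^2
    Do = q / ((u_r.T @ u_r).item() / n)          # Durbin-type, sigma-hat_r^2
    return Wo, Do
```

The variant based on the auxiliary-regression residual variance $\tilde{\sigma}^2$, discussed next, only requires replacing the denominator accordingly.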

Like $\hat{\sigma}^2$ and $\hat{\sigma}_r^2$, $\tilde{\sigma}^2$ is consistent under the null, because $\operatorname{plim}\, n^{-1}Y_o'M_Z\hat{u}_r = 0$ implies, after substituting (2.23) in (2.22), that $\operatorname{plim}\,\hat{\xi} = 0$. Pesaran and Smith (1990) show that under the alternative $\operatorname{plim}\,\hat{\sigma}^2 \geq \operatorname{plim}\,\hat{\sigma}_r^2 \geq \operatorname{plim}\,\tilde{\sigma}^2$ and then invoke arguments due to Bahadur to expect that $T_2$ (which uses $\tilde{\sigma}^2$) has better power than $T_4$ (which uses $\hat{\sigma}_r^2$), whereas both $T_2$ and $T_4$ are expected to outperform $T_3$ (which uses $\hat{\sigma}^2$). However, they did not verify this experimentally. Moreover, because $T_2$ is a simple monotonic transformation of $T_4$ when $K_e = 0$, we also know that after a fully successful size correction both should have equivalent power.

Following the same lines of thought for cases where $K_e > 0$, we expect (after proper size correction) $D_o$ to do better than $W_o$, but Pesaran and Smith (1990) suggest that an even better result may be expected from formally testing $\xi = 0$ in the auxiliary regression (2.21) while exploiting instruments $Z_r$. This amounts to the $\chi^2(K_o)$ test statistic $T_o$, which (omitting its degrees of freedom correction) generalizes Wu's $T_2$ for cases where $K_e \geq 0$, and is given by

$T_o = y'P_{M_{P_{Z_r}X}M_Z Y_o}y/\tilde{\sigma}^2 = y'(M_A - M_C)y/\tilde{\sigma}^2, \qquad (2.28)$

with

$\tilde{\sigma}^2 = (y - X\hat{\beta} - M_Z Y_o\hat{\xi})'(y - X\hat{\beta} - M_Z Y_o\hat{\xi})/n. \qquad (2.29)$

Actually, it seems that Pesaran and Smith (1990, p. 49) employ a slightly different estimator for $\sigma^2$, namely

$(y - X\hat{\beta} - M_Z Y_o\hat{\xi}^*)'(y - X\hat{\beta} - M_Z Y_o\hat{\xi}^*)/n, \qquad (2.30)$

with

$\hat{\xi}^* = (Y_o'M_Z Y_o)^{-1}Y_o'M_Z(y - X\hat{\beta}). \qquad (2.31)$

However, because OLS residuals are orthogonal to the regressors we have $Y_o'M_Z(y - X\hat{\beta} - M_Z Y_o\hat{\xi}) = 0$, from which it follows that $\hat{\xi}^* = \hat{\xi}$, so their test is equivalent to $T_o$. When $K_e > 0$ the three tests $W_o$, $D_o$ and $T_o$ are not simple monotonic transformations of each other, so they may have genuinely different size and power properties in finite samples. In particular, we find that for

$\frac{D_o}{1 - D_o/n} = \frac{y'P_C y - y'P_A y}{(\hat{u}_r'\hat{u}_r - y'P_C y + y'P_A y)/n},$

the denominator in the right-hand expression differs from $\tilde{\sigma}^2$ (unless $K_e = 0$).⁶ Using that $\hat{\xi}$ is given by (2.31) we find from (2.29) that

$\tilde{\sigma}^2 = \hat{u}'M_{M_Z Y_o}\hat{u}/n \leq \hat{\sigma}^2, \quad \text{so} \quad W_o \leq T_o, \qquad (2.32)$

whereas $D_o$ can be at either side of $W_o$ and $T_o$.

2.2.4 Testing based on the discrepancy as such

Direct application of the Hausman (1978) principle yields the test statistic

$H_o = (\hat{\beta} - \hat{\beta}_r)'[\hat{\sigma}^2(X'P_Z X)^{-1} - \hat{\sigma}_r^2(X'P_{Z_r}X)^{-1}]^{-}(\hat{\beta} - \hat{\beta}_r), \qquad (2.33)$

which uses a generalized inverse for the matrix in square brackets. If $\sigma^2$ were known, the matrix in square brackets would certainly be singular, though semi-positive definite. Using two different $\sigma^2$ estimates might lead to nonsingularity but could yield negative test statistics. As is obvious from the above, (2.33) will not converge to a $\chi^2_K$ distribution under $H_0$, but in our framework to one with $K_o$ degrees of freedom, cf. Hausman and Taylor (1981). Some further analysis leads to the following. Let $\beta$ have separate components as follows from the decompositions

$X\beta = Y_e\beta_e + Y_o\beta_o + Z_1\beta_1 = Y\beta_{eo} + Z_1\beta_1, \qquad (2.34)$

whereas $(X'P_Z X)^{-1}$ has blocks $A^{jk}$, $j,k = 1,2$, where $A^{11}$ is a $K_{eo} \times K_{eo}$ matrix with $K_{eo} = K_e + K_o$. Then we find from (2.11) and (2.13) that

$\hat{\beta} - \hat{\beta}_r = (X'P_Z X)^{-1}\begin{pmatrix} Y'P_Z\hat{u}_r \\ 0 \end{pmatrix} = \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix}Y'P_Z\hat{u}_r, \qquad (2.35)$

$\hat{\beta}_{eo} - \hat{\beta}_{eo,r} = A^{11}Y'P_Z\hat{u}_r.$

Hence, the discrepancy vector of the two coefficient estimates of just the regressors in $Y$, but also that of the full regressor matrix $X$, are both linear transformations of rank $K_{eo}$ of the vector $Y'P_Z\hat{u}_r$.

⁶ Therefore, the test statistic (54) suggested in Baum et al. (2003, p. 26), although asymptotically equivalent to the tests suggested here, is built on an inappropriate analogy with the $K_e = 0$ case. Moreover, in their formulas (53) and (54) $Q$ should be the difference between the residual sums of squares of second-stage regressions, precisely as in (2.25). The line below (54) suggests that $Q$ is a difference between sums of squared IV residuals (which would mean that $Q$ could be negative) of the (un)restricted auxiliary regressions, although their footnote 25 seems to suggest otherwise.

Therefore it is obvious that the Hausman-type test statistic (2.33) can also be obtained from

$H_o = (\hat{\beta}_{eo} - \hat{\beta}_{eo,r})'[\hat{\sigma}^2(Y'P_{M_{Z_1}Z_2}Y)^{-1} - \hat{\sigma}_r^2(Y'P_{M_{Z_1}(Z_2 \;\; Y_o)}Y)^{-1}]^{-}(\hat{\beta}_{eo} - \hat{\beta}_{eo,r}). \qquad (2.36)$

Both test statistics are algebraically equivalent, because of the unique linear relationship

$\hat{\beta} - \hat{\beta}_r = \begin{bmatrix} I_{K_{eo}} \\ A^{21}(A^{11})^{-1} \end{bmatrix}(\hat{\beta}_{eo} - \hat{\beta}_{eo,r}). \qquad (2.37)$

Calculating (2.36) instead of (2.33) just mitigates the numerical problems. One now wonders whether an equivalent Hausman-type test can be calculated on the basis of the discrepancy between the estimated coefficients for just the regressors $Y_o$. This is not the case, because a relationship of the form $(\hat{\beta}_{eo} - \hat{\beta}_{eo,r}) = G(\hat{\beta}_o - \hat{\beta}_{o,r})$, where $G$ is a $K_{eo} \times K_o$ matrix, cannot be found.⁷ However, a matrix $G^*$ can be found such that $(\hat{\beta}_{eo} - \hat{\beta}_{eo,r}) = G^*\hat{\xi}$, indicating that test $H_o$ can be made equivalent to the three distinct tests of the foregoing subsection, provided similar $\sigma^2$ estimates are being used. Using (2.14) and (2.15) in (2.35) we obtain

$\hat{\beta}_{eo} - \hat{\beta}_{eo,r} = A^{11}Y'P_Z\hat{u}_r = -A^{11}\begin{bmatrix} Y_e'M_Z Y_o(Y_o'M_Z Y_o)^{-1} \\ I_{K_o} \end{bmatrix}(Y_o'M_Z M_{P_{Z_r}X}M_Z Y_o)\hat{\xi}, \qquad (2.38)$

because (2.22) and (2.23) yield $Y_o'M_Z\hat{u}_r = (Y_o'M_Z M_{P_{Z_r}X}M_Z Y_o)\hat{\xi}$. So, under the null hypothesis particular implementations of $W_o$, $D_o$, $T_o$ and $H_o$ are equivalent.⁸ When $H_o$ is used with two different $\sigma^2$ estimates it may come close to a hybrid implementation of $W_o$ and $D_o$, where the two residual sums of squares in the numerator are scaled by different $\sigma^2$ estimates, as in

$WD_o = \frac{y'M_{P_{Z_r}X}y}{\hat{\sigma}_r^2} - \frac{y'M_{(P_{Z_r}X \;\; M_Z Y_o)}y}{\hat{\sigma}^2}. \qquad (2.39)$

⁷ Note that Wu (1983) and Hwang (1985) start off by analyzing a test based on the discrepancy $\hat{\beta}_o - \hat{\beta}_{o,r}$. Both Wu (1983) and Ruud (1984, p. 236) wrongly suggest equivalence of such a test with (2.33) and (2.36).

⁸ This generalizes the equivalence result mentioned below (22.27) in Ruud (2000, p. 581), which just treats the case $K_e = 0$. Note, however, that because Ruud starts off from the full discrepancy vector, the transformation he presents is in fact singular and therefore the inverse function mentioned in his footnote 24 is non-unique (the zero matrix may be replaced with any other matrix of the same dimensions). To obtain a unique inverse transformation, one should start off from the coefficient discrepancy for just the regressors $Y$, and this is found to be nonsingular for $K_e = 0$ only.
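A direct implementation of (2.33) is sketched below (same hypothetical helpers as before); the Moore-Penrose pseudo-inverse serves as the generalized inverse, and with two different $\sigma^2$ estimates the caveat about possibly negative outcomes applies.

```python
import numpy as np

def hausman_subset(X, Z, Yo, y):
    """Hausman-type statistic (2.33); chi2(K_o) reference under H0."""
    n = len(y)
    Zr = np.hstack([Z, Yo])
    beta,   u_hat = iv(X, Z,  y)
    beta_r, u_r   = iv(X, Zr, y)
    s2_hat = (u_hat.T @ u_hat).item() / n
    s2_r   = (u_r.T @ u_r).item() / n
    V_diff = (s2_hat * np.linalg.inv(X.T @ P(Z)  @ X)
              - s2_r * np.linalg.inv(X.T @ P(Zr) @ X))
    d = beta - beta_r                            # discrepancy vector (2.11)
    return (d.T @ np.linalg.pinv(V_diff) @ d).item()
```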

2.2.5 Testing based on covariance of structural and reduced form disturbances

Auxiliary regression (2.21) is used to detect correlation of u and V_o (the reduced form disturbances of Y_o) by examining the covariance of the residuals û_r and M_Z Y_o. This might perhaps be done in a more direct way by augmenting regression (2.1) by the actual reduced form disturbances, giving

y = Xβ + (Y_o − ZΠ_o)φ + w, (2.40)

where w = u − (Y_o − ZΠ_o)φ with φ a K_o × 1 vector. Let ZΠ_o = Z_1Π_{o1} + Z_2Π_{o2}; then (2.40) can be written as

y = Y_e β_e + Y_o(β_o + φ) + Z_1(β_1 − Π_{o1}φ) − Z_2Π_{o2}φ + w = Xβ* + Z_2φ* + w, (2.41)

in which we may assume that E(Z′w) = 0, though E(Y_e′w) ≠ 0. However, testing φ* = 0, which corresponds to φ = 0 in (2.40), through estimating (2.41) consistently is not an option, unless K_e = 0. For K_e > 0, which is the case of our primary interest here, (2.41) contains all available instruments as regressors, so we cannot instrument Y_e. For the case K_e = 0 the test of φ* = 0 yields the test of Revankar and Hartley (1973), which is an exact test under normality. When K_o = L_2 (just identification) it specializes to Wu's T_2.⁹ When L_2 > K_o (overidentification) Revankar (1978) argues that testing the K_o restrictions φ = 0 by testing the L_2 restrictions φ* = 0 is inefficient. He then suggests to test φ = 0 by a quadratic form in the difference of the least-squares estimator of β_o + φ in (2.41) and the IV estimator of β_o.¹⁰

From the above we see that the tests on the covariance of disturbances do not have a straightforward generalization for the case K_e > 0. However, a test that comes close to it replaces the L − L_1 columns of Z_2 in (2.41) by a set of L − K regressors Z_2* which span a subspace of Z_2, such that (P_Z Y_e, Z_1, Z_2*) spans the same space as Z. Testing these L − K exclusion restrictions yields the familiar Sargan-Hansen test for testing all the so-called overidentification restrictions of model (2.1). It is obvious that this test will have power for alternatives in which Z_2 and u are correlated, possibly because some of the variables in Z_2 are actually omitted regressors. In practical situations this type of test, and also Hausman type tests for the orthogonality of particular instruments not included as regressors in the specification¹¹, are very useful. However, we do not consider such implementations here, because right from the beginning we have chosen a context in which all instruments Z are assumed to be uncorrelated with u. This allows focus on tests serving only the second part of the two-part testing procedure as exposed by Godfrey and Hutton (1994), who also highlight the asymptotic independence of these two parts.

2.2.6 Testing by an incremental Sargan test

The original test of overidentifying restrictions initiated by Sargan (1958) does not enable to infer directly on the orthogonality of individual instrumental variables, but a so-called incremental Sargan test does. It builds on the maintained hypothesis E(Z′u) = 0 and can test the orthogonality of additional potential instrumental variables. Choosing for these the included regressors Y_o yields a test statistic for the hypotheses (2.9) which is given by

S_o = û_r′P_{Z_r}û_r/σ̂_r² − û′P_Z û/σ̂². (2.42)

When using for both separate Sargan statistics the same σ² estimate, and employing P_Z û = (P_Z − P_{P_Z X})y, the numerator would be

û_r′P_{Z_r}û_r − û′P_Z û = y′(P_{Z_r} − P_{P_{Z_r}X} − P_Z + P_{P_Z X})y = y′(P_{M_Z Y_o} + P_{P_Z X} − P_{P_{Z_r}X})y = y′(P_{(P_Z X, M_Z Y_o)} − P_{P_{Z_r}X})y,

whereas that of W_o and D_o in (2.24) is given by y′(P_C − P_A)y, where C = (A, B) with A = P_{Z_r}X and B = M_Z Y_o. Equivalence¹² is proved by using general result (iii) on projection matrices, upon taking A* = P_Z X. Using P_{Z_r} = P_Z + P_{M_Z Y_o}, we have A = A* + P_B X = A* + B(B′B)⁻¹B′X, so D = (B′B)⁻¹B′X. Thus P_{(A,B)} = P_{(A*,B)} = P_{(P_Z X, M_Z Y_o)}, giving

û_r′P_{Z_r}û_r − û′P_Z û = y′(P_{(P_{Z_r}X, M_Z Y_o)} − P_{P_{Z_r}X})y. (2.43)

Hence, in addition to the H_o statistic, S_o establishes yet another hybrid form combining elements of both W_o and D_o, but different from (2.39).

⁹ This is proved as follows: both tests have regressors X under the null, and under the alternative the full column rank matrices (X, P_Z Y_o) and (X, Z_2) respectively. These matrices span the same space when X = (Y_o, Z_1) and Z = (Z_1, Z_2) have the same number of columns.
¹⁰ Meepagala (1992) produces numerical results indicating that the discrepancy based tests have lower power than the Revankar and Hartley (1973) test when instruments are weak, and than the Revankar (1978) test when the instruments are strong.
¹¹ See Hahn et al. (2011) for a study on its behaviour under weak instruments.
¹² Ruud (2000, p.582) proves this just for the special case K_e = 0. Newey (1985, p.238), Baum et al. (2003, p.23 and formula 55) and Hayashi (2000) mention equivalence for K_e ≠ 0, but do not provide a proof.
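As an illustration, here is a minimal numpy sketch of (2.42), under the same assumed data arrays y, X, Z and Z_r = (Z, Y_o) as before; the names are ours. Because the two Sargan statistics are scaled by different σ² estimates, the difference can in principle be negative.

```python
import numpy as np

def proj(Z, v):
    """P_Z v for a full column rank matrix Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T @ v)

def iv_residuals(y, X, Z):
    """Residuals of the IV (2SLS) regression of y on X using instruments Z."""
    PZX = proj(Z, X)
    return y - X @ np.linalg.solve(X.T @ PZX, PZX.T @ y)

def incremental_sargan(y, X, Z, Zr):
    """S_o of (2.42): restricted minus unrestricted Sargan statistic."""
    n = len(y)
    u, ur = iv_residuals(y, X, Z), iv_residuals(y, X, Zr)
    S = (ur @ proj(Zr, ur)) / (ur @ ur / n) - (u @ proj(Z, u)) / (u @ u / n)
    return S  # asymptotically chi-square(K_o) under the null hypothesis
```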

2.2.7 Concerns for practitioners

The foregoing subsections demonstrate that all available archetypical statistics W_o, D_o, T_o, H_o and S_o for testing the orthogonality of a subset of the potentially endogenous regressors basically just differ regarding the way in which the expressions they are based on are scaled with respect to σ². Both S_o and H_o (and of course WD_o) show a hybrid nature in this respect, because their most natural implementations require two different σ² estimates, which may lead to negative test outcomes. In addition to that, H_o has the drawback that it involves a generalized inverse, whereas calculation of the other four is rather straightforward.¹³ Similar differences and correspondences carry over to more general models, which would require GMM estimation, see Newey (1985) and Ahn (1997). Although of no concern asymptotically, these differences may have major consequences in finite samples, so practitioners are in need of clues as to which implementations should be preferred.¹⁴ Therefore, in the remainder of this study, we will examine the performance in finite samples of all these five archetypical tests. First, we will examine whether any simple degrees of freedom corrections seem to lead to acceptable size control. Next, only for those variants that pass this test will we perform some power calculations.

¹³ It is not obvious why Pesaran and Smith (1990, p.49,55) mention that they find T_o a computationally more attractive statistic than W_o. All three test statistics are very easy to compute. However, T_o is the only one that strictly applies a standard procedure (Wald) to testing zero restrictions in an auxiliary regression, which eases its use by standard software packages. On the other hand Baum et al. (2003, p.26) characterize tests like T_o as computationally expensive and practically cumbersome, which we find far-fetched too.
¹⁴ Under the heading of Regressor Endogeneity Test, EViews 8.1 presents statistic S_o where for both σ² estimates n − K degrees of freedom are used, like it does for the J statistic. In Stata 13 the hausman command calculates H_o by default and offers the possibility to calculate W_o and D_o. The degrees of freedom reported is the rank of the estimated variance of the discrepancy vector. In case of H_o this is not correct. It is possible to overwrite the degrees of freedom by an additional command. The popular package ivreg2 only reports D_o, with the correct degrees of freedom.

2.3 Earlier Monte Carlo designs and results

In the literature the actual rejection frequencies of tests on the independence between regressors and disturbances have been examined by simulation only for situations where all possibly endogenous regressors are tested jointly, hence K_e = 0. To our knowledge, subset orthogonality tests have not been examined yet.

Wu (1974) was the first to design a simulation study in which he examined the four tests suggested in Wu (1973). He made substantial efforts, both analytically and experimentally, to assess the parameters and model characteristics which actually determine the distribution of the test statistics and their power curves. His focus is on the case where there is one possibly endogenous regressor (K_o = 1), an intercept and one other included exogenous regressor (L_1 = 2) and two external instruments (L_2 = 2), giving a degree of overidentification of 1. All disturbances are assumed normal, all exogenous regressors are mutually orthogonal and all consist of elements equal to either 1, 0, or −1, whereas all instruments have coefficient 1 in the reduced form. Wu demonstrates that all considered test statistics are functions of statistics that follow Wishart distributions which are invariant with respect to the values of the structural coefficients of the equation of interest. The effects of changing the degree of simultaneity and of changing the joint strength of the external instruments are examined. Because the design is rather inflexible regarding varying the explanatory part of the reduced form, no separate attention is paid to the effects of multicollinearity of the regressors on the rejection probabilities, nor to the effects of weakness of individual instruments. Although none of the tests examined is found to be superior under all circumstances, test T_2, which is exact under normality and generalized as T_o in (2.28), is found to be the preferred one. Its power increases with the absolute value of the degree of simultaneity, with the joint strength of the instruments and with the sample size.

Nakamura and Nakamura (1985) examine a design where K_e = 0, K_o = 1, L_1 = 2, L_2 = 3 and all instruments are mutually independent standard normal. The structural equation disturbances u and the reduced form disturbances v are IID normal with variances σ_u² and σ_v² respectively and correlation ρ. They focus on the case where all coefficients in the structural equation and in the reduced form equation for the possibly endogenous regressor are unity. Given the fixed parameters the distribution of the test statistic T_2 now depends only on the values of ρ², σ_u² and σ_v². Attention is drawn to the fact that the power of an endogeneity test and its interpretation differ depending on whether the test is used to signal: (a) the degree of simultaneity expressed as ρ, (b) the simultaneity expressed as the covariance δ = ρσ_uσ_v, or (c) the extent of the asymptotic bias of OLS (which in their design is also determined just by ρ, σ_u² and σ_v²). When testing (a) a natural choice of the nuisance parameters (which are kept fixed when ρ is varied to obtain a power curve) are σ_u and σ_v. However, when testing (b) or (c), ρ, σ_u and σ_v cannot all be chosen independently. The study shows that, although the power of test T_2 does increase for increasing values of ρ² while keeping σ_u and σ_v constant, it may decrease for increasing asymptotic OLS bias. Therefore, test T_2 is not very suitable for signaling the magnitude of OLS bias. In this design σ_v² = 5(1 − R²)/R², where R² is the population coefficient of determination of the reduced form equation for the possibly endogenous regressor. The joint strength of the instruments is a simple function of R² and hence of σ_v. Again, due to the fixed values of the reduced form coefficients, the effects of weakness of individual instruments or of multicollinearity cannot be examined from this design.

The study by Kiviet (1985) demonstrates that in models with a lagged dependent explanatory variable the actual type I error probability of test T_2 may deviate substantially from the chosen nominal level. Then high rejection frequencies under the alternative have little or no meaning.¹⁵ In the present study we will stick to static cross-section type models.

¹⁵ Because we could not replicate some of the presented figures for the case of strong instruments, we plan to re-address the analysis of DWH type tests in dynamic models in future work.

Thurman (1986) performs a small scale Monte Carlo simulation of just 100 replications on a specific two equation simultaneous model, using empirical data for the exogenous variables, from which he concludes that Wu-Hausman tests may have substantial power under particular parametrizations and none under others.

Chmelarova and Hill (2010) focus on pre-test estimation based on test T_2 (for K_o = 1, L_1 = 2, L_2 = 1) and two other forms of contrast based tests which use an improper number of degrees of freedom.¹⁶ Their Monte Carlo design is very restricted, because the possibly endogenous regressor and the exogenous regressor (next to the constant) are uncorrelated, so multicollinearity does not occur, which makes the DGP unrealistic. Moreover, all coefficients in the equation of interest are kept fixed and are such that the signal to noise ratio is always 1. Therefore, the inconsistency of OLS is relatively large (and in fact equal to the simultaneity correlation coefficient ρ). Because the sample size is not varied and neither is the instrument strength parameter,¹⁷ the results do not allow forming an opinion on how effective the T_2 test is in diagnosing simultaneity.

¹⁶ This may occur when standard software is employed based on a naive implementation of the Hausman test. Practitioners should be advised never to use these standard options but always to perform tests based on estimator contrasts by running the relevant auxiliary regression.
¹⁷ If the effects of a weaker instrument had been checked, the simulation estimates of the moments of IV (which do not exist, because the model is just identified) would have gone astray.

Jeong and Yoon (2010) present a study in which they examine by simulation what the rejection probability of the Hausman test is when an instrument is employed which is actually correlated with the disturbances. Also for the subset tests to be examined here the situation seems of great practical relevance that they might be implemented while using some variable(s) as instruments which are in fact endogenous. In our Monte Carlo experiments we will cover such situations, but we do not find the design as used by Jeong and Yoon, in which the endogeneity/exogeneity status of variables depends on the sample size, very useful.

2.4 A more comprehensive Monte Carlo design

To examine the differences between the various subset tests regarding their type I and type II error probabilities in finite samples we want to lay out a Monte Carlo design which is less restrictive than those just reviewed. It should allow representing the major characteristics of data series and their relationships as faced in empirical work, while avoiding the imposition of awkward restrictions on the nuisance parameter space. Instead of picking particular values for the coefficients and further parameters in a simple DGP, and checking whether or not this leads to covering empirically relevant cases, we choose to approach this design problem from the opposite direction.

2.4.1 The simulated data generating process

Model (2.1) is specialized in our simulations to

y = β_1 ι + β_2 y^(2) + β_3 y^(3) + u, (2.44)
y^(2) = π_21 ι + π_22 z^(2) + π_23 z^(3) + v^(2), (2.45)
y^(3) = π_31 ι + π_32 z^(2) + π_33 z^(3) + v^(3), (2.46)

where ι is an n × 1 vector consisting of ones. So K = 3, L_1 = 1 and L_2 = 2, with K_o + K_e = 2, Y = (y^(2), y^(3)), Z_1 = ι and Z = (ι, z^(2), z^(3)). Since K = L, at this stage we only investigate the case in which under the unrestrained alternative hypothesis the single equation with endogenous regressors (2.44) is just identified according to the order condition. Because the statistics to be analyzed will be invariant regarding the values of the intercepts, these are all set equal to zero, thus β_1 = π_21 = π_31 = 0. Fulfillment of the rank condition for identification then implies that the inequality

π_22 π_33 ≠ π_23 π_32 (2.47)

has to be satisfied.

The vectors z^(2) and z^(3) will be generated as mutually independent IID(0, 1) series. They have been drawn only once and were then kept fixed over all replications. In fact we drew two arbitrary series and next rescaled them such that their sample means and variances, and also their sample covariance, correspond to the population values 0, 1 and 0 respectively.
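The fixed standardized instruments can be constructed as in the following sketch (the variable names and the seed are ours); the Cholesky-based rescaling enforces the sample moments exactly, not merely in expectation.

```python
import numpy as np

rng = np.random.default_rng(12345)        # arbitrary seed; drawn only once
n = 40
Zraw = rng.standard_normal((n, 2))        # two arbitrary i.i.d. N(0,1) series
Zraw -= Zraw.mean(axis=0)                 # exact zero sample means
# rescale so the sample covariance matrix is exactly the identity:
C = np.linalg.cholesky(Zraw.T @ Zraw / n)
Zstd = Zraw @ np.linalg.inv(C).T          # unit variances, zero covariance
z2, z3 = Zstd[:, 0], Zstd[:, 1]           # kept fixed over all replications
```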

To allow for simultaneity of both y^(2) and y^(3), as well as for any value of the correlation between the reduced form disturbances v^(2) and v^(3), these disturbances have components

v^(2) = η^(2) + γ_2 u and v^(3) = η^(3) + κη^(2) + γ_3 u, (2.48)

where the series u_i, η_i^(2) and η_i^(3) will be generated as mutually independent zero mean IID series (for i = 1, ..., n). Without loss of generality, we may choose σ_u² = 1. Scaling the variances of the potentially endogenous regressors simplifies the model even further, again without loss of generality. This scaling is innocuous, because it can be compensated by the chosen values for β_2 and β_3. We will realize σ²_{y^(2)} = σ²_{y^(3)} = 1 by choosing appropriate values for σ²_{η^(2)} > 0 and σ²_{η^(3)} > 0 as follows. For the variances of the IID series for the reduced form disturbances and for the possibly endogenous explanatory variables we find

σ²_{v^(2)} = σ²_{η^(2)} + γ_2², σ²_{y^(2)} = π_22² + π_23² + σ²_{v^(2)} = 1,
σ²_{v^(3)} = σ²_{η^(3)} + κ²σ²_{η^(2)} + γ_3², σ²_{y^(3)} = π_32² + π_33² + σ²_{v^(3)} = 1. (2.49)

This requires

σ²_{η^(2)} = 1 − π_22² − π_23² − γ_2² > 0 and σ²_{η^(3)} = 1 − π_32² − π_33² − κ²σ²_{η^(2)} − γ_3² > 0. (2.50)

In addition to (2.47), (2.50) implies two further inequality restrictions on the nine parameters of the data generating process, which are

{γ_2, γ_3, κ, π_22, π_23, π_32, π_33, β_2, β_3}. (2.51)

However, more restrictions should be respected, as we will see when we consider further consequences of a choice of particular values for these DGP parameters.

2.4.2 Simulation design parameter space

Assigning a range of reasonable values to the nine DGP parameters is cumbersome as it is not immediately obvious what model characteristics they imply. Therefore, we now first define econometrically meaningful design parameters. These are functions of the DGP parameters, and we will invert these functions in order to find solutions for the parameters of the DGP in terms of the chosen design parameter values. Since the DGP is characterized by nine parameters, we should define nine variation free design parameters as well. However, their relationships will be such that this will not automatically imply the existence nor the uniqueness of solutions.

Two obvious design parameters are the degrees of simultaneity in y^(2) and y^(3), given by

ρ_j = Cov(y_i^(j), u_i)/(σ_{y^(j)}σ_u) = γ_j, j = 2, 3. (2.52)

Hence, by choosing σ²_{y^(2)} = σ²_{y^(3)} = 1, the degree of simultaneity in y^(j) is directly controlled by γ_j for j = 2, 3, and it implies two more inequality restrictions, namely

|γ_j| < 1, j = 2, 3. (2.53)

Another design parameter is a measure of multicollinearity between y^(2) and y^(3) given by the correlation

ρ_23 = π_22π_32 + π_23π_33 + κ(1 − π_22² − π_23² − γ_2²) + γ_2γ_3, (2.54)

implying yet another restriction

|π_22π_32 + π_23π_33 + κ(1 − π_22² − π_23² − γ_2²) + γ_2γ_3| < 1. (2.55)

Further characterizations relevant from an econometric perspective are the marginal strength of instrument z^(2) for y^(j) and the joint strength of z^(2) and z^(3) for y^(j), which are established by the (partial) population coefficients of determination

R²_{j;z2} = π_{j2}² and R²_{j;z23} = π_{j2}² + π_{j3}², j = 2, 3. (2.56)

In the same vein, and completing the set of nine design parameters, are two similar characterizations of the fit of the equation of interest. Because the usual R² gives complications under simultaneity, we focus on its reduced form equation

y = (β_2π_22 + β_3π_32)z^(2) + (β_2π_23 + β_3π_33)z^(3) + (β_2 + β_3κ)η^(2) + β_3η^(3) + (1 + β_2γ_2 + β_3γ_3)u. (2.57)

This yields

σ_y² = (β_2π_22 + β_3π_32)² + (β_2π_23 + β_3π_33)² + (β_2 + β_3κ)²σ²_{η^(2)} + β_3²σ²_{η^(3)} + (1 + β_2γ_2 + β_3γ_3)², (2.58)

and in line with (2.56) we then have

R²_{1;z2} = (β_2π_22 + β_3π_32)²/σ_y² and R²_{1;z23} = [(β_2π_22 + β_3π_32)² + (β_2π_23 + β_3π_33)²]/σ_y². (2.59)

The 9-dimensional design parameter space is given now by

{ρ_2, ρ_3, ρ_23, R²_{2;z2}, R²_{2;z23}, R²_{3;z2}, R²_{3;z23}, R²_{1;z2}, R²_{1;z23}}. (2.60)

The first three of these parameters have domain (−1, +1) and the six R² values have to obey the restrictions

0 ≤ R²_{j;z2} ≤ R²_{j;z23} < 1, j = 1, 2, 3. (2.61)

However, without loss of generality we can further restrict the domain of the nine design parameters, due to symmetry of the DGP with respect to: (a) the two regressors y^(2) and y^(3) in (2.44), (b) the two instrumental variables z^(2) and z^(3), and (c) implications which follow when all random variables are drawn from distributions with a symmetric density function. Due to (a) we may just consider cases where

ρ_2² ≥ ρ_3². (2.62)

So, if one of the two regressors has a more severe simultaneity coefficient, it will always be y^(2). Due to (b) we will limit ourselves to cases where π_22² ≥ π_23². Hence, if one of the instruments for y^(2) is stronger than the other, it will always be z^(2). On top of (2.61) this implies

R²_{2;z2} ≥ ½R²_{2;z23}. (2.63)

If (c) applies, we may restrict ourselves to cases where, next to particular values for (γ_2, γ_3), we do not also have to examine (−γ_2, −γ_3). This is achieved by imposing ρ_2 + ρ_3 ≥ 0. In combination with (2.62) this leads to

1 > ρ_2 ≥ |ρ_3| ≥ 0. (2.64)

Solving the DGP parameters in terms of the design parameters can now be achieved as follows. In a first stage we can easily solve 7 of the 9 parameters, namely

γ_j = ρ_j, π_{j2} = d_{j2}(R²_{j;z2})^{1/2} with d_{j2} ∈ {−1, +1}, π_{j3} = d_{j3}(R²_{j;z23} − R²_{j;z2})^{1/2} with d_{j3} ∈ {−1, +1}, for j = 2, 3. (2.65)

With (2.54) these give

κ = (ρ_23 − π_22π_32 − π_23π_33 − γ_2γ_3)/(1 − π_22² − π_23² − γ_2²). (2.66)

Thus, for a particular case of chosen design parameter values, obeying the inequalities (2.61) through (2.64), we may obtain 2⁴ solutions from (2.65) and (2.66) for the DGP parameters. However, some of these may be inadmissible, if they do not fulfill the requirements (2.47) and (2.50). Moreover, we will show that not all of these 2⁴ solutions necessarily lead to unique results on the distribution of the test statistics W_o, D_o and T_o. Finally, the remaining two parameters β_2 and β_3 can be solved from the pair of nonlinear equations

(1 − R²_{1;z2})(β_2π_22 + β_3π_32)² = R²_{1;z2}[(β_2π_23 + β_3π_33)² + (1 + β_2γ_2 + β_3γ_3)² + β_3²σ²_{η^(3)} + (β_2 + β_3κ)²σ²_{η^(2)}],
(1 − R²_{1;z23})[(β_2π_22 + β_3π_32)² + (β_2π_23 + β_3π_33)²] = R²_{1;z23}[(1 + β_2γ_2 + β_3γ_3)² + β_3²σ²_{η^(3)} + (β_2 + β_3κ)²σ²_{η^(2)}].

Both these equations represent particular conic sections, specializing into either ellipses, parabolas or hyperbolas, implying that there may be zero up to eight solutions. However, it is easy to see that the three subset test statistics are all invariant with respect to β. Note that û = [I − X(X′P_Z X)⁻¹X′P_Z](Xβ + u) = [I − X(X′P_Z X)⁻¹X′P_Z]u and û_r = [I − X(X′P_{Z_r}X)⁻¹X′P_{Z_r}]u are invariant with respect to β, thus so are σ̂² and σ̂_r². And σ̃² is too, because y − Xβ̂ − M_Z Y_o ξ̂ = û − M_Z Y_o ξ̂ is, as follows from (2.22) and (2.23). Moreover, because ξ̂ is invariant with respect to β, so is the numerator of the three test statistics.¹⁸ Therefore, R²_{1;z2} and R²_{1;z23} do not really establish nuisance parameters, reducing the dimensionality of the nuisance parameter space to 7. Without loss of generality we may always set β_2 = β_3 = 0 in the simulated DGPs.

¹⁸ Wu (1974) finds this invariance result too, but his proof suggests that it is a consequence of normality of all the disturbances, whereas it holds more generally.
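In a simulation program this inversion takes only a few lines. The following sketch (our own naming, not part of the study's code) implements (2.65)-(2.66) for a single sign permutation with d_22 = d_23 = 1, returning None when the admissibility restrictions (2.47) and (2.50) are violated; the fit parameters R²_{1;z2} and R²_{1;z23} are left out because, by the invariance result just established, β_2 = β_3 = 0 may be imposed.

```python
import numpy as np

def dgp_from_design(rho2, rho3, rho23, R2_2z2, R2_2z23, R2_3z2, R2_3z23,
                    d32=1, d33=1):
    """Solve (2.65)-(2.66) for one sign permutation (d22 = d23 = 1) and
    check admissibility; returns None for inadmissible design values."""
    g2, g3 = rho2, rho3                                    # (2.65): gamma_j
    p22, p23 = np.sqrt(R2_2z2), np.sqrt(R2_2z23 - R2_2z2)
    p32 = d32 * np.sqrt(R2_3z2)
    p33 = d33 * np.sqrt(R2_3z23 - R2_3z2)
    s2_eta2 = 1 - p22**2 - p23**2 - g2**2                  # first part of (2.50)
    if s2_eta2 <= 0:
        return None
    kappa = (rho23 - p22*p32 - p23*p33 - g2*g3) / s2_eta2  # (2.66)
    s2_eta3 = 1 - p32**2 - p33**2 - kappa**2 * s2_eta2 - g3**2
    if s2_eta3 <= 0 or np.isclose(p22*p33, p23*p32):       # (2.50), (2.47)
        return None
    # beta_2 = beta_3 = 0 without loss of generality (invariance result)
    return dict(gamma2=g2, gamma3=g3, kappa=kappa, pi22=p22, pi23=p23,
                pi32=p32, pi33=p33, s2_eta2=s2_eta2, s2_eta3=s2_eta3)
```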

When (c) applies, not all 16 permutations of the signs of the four reduced form coefficients lead to unique results for the test statistics, because of the following. If the sign of all elements of y^(2) and (or) y^(3) is changed, this means that in the general formulas the matrix X is replaced by XJ, where J is a K × K diagonal matrix with diagonal elements +1 or −1, for which J = J′ = J⁻¹. It is easily verified that such a transformation has no effect on the quadratic forms in y which constitute the test statistics W_o, D_o and T_o, because it does not alter the space spanned by the matrices A and C of (2.24), nor that of the projection matrices used in the three different estimators of σ². So, when changing the sign of all reduced form coefficients, and at the same time the sign of all the elements of the vectors u, η^(2) and η^(3), the same test statistics are found, whereas the simultaneity and multicollinearity are still the same. This reduces the 16 possible permutations to 8, which we achieve by choosing d_22 = 1. From the remaining 8 permutations four different couples yield similar ρ_23 and κ values. We keep the four permutations which genuinely differ by choosing d_23 = 1, and will give explicit attention to the four distinct cases

(d_22, d_23, d_32, d_33) ∈ {(1, 1, 1, 1), (1, 1, −1, 1), (1, 1, 1, −1), (1, 1, −1, −1)}, (2.67)

when we generate the disturbances from a symmetric distribution, which at this stage we will. For the design parameters we shall choose various interesting combinations from

ρ_2 ∈ {0, .2, .5}, ρ_3 ∈ {−.5, −.2, 0, .2, .5}, ρ_23 ∈ {−.5, −.2, 0, .2, .5}, R²_{j;z2} ∈ {.01, .1, .2, .3}, R²_{j;z23} ∈ {.02, .1, .2, .4, .5, .6}, (2.68)

in as far as they satisfy the restrictions (2.61) through (2.64), provided they also obey the admissibility restrictions given by (2.47), (2.50) and (2.55).

2.5 Simulation findings on rejection probabilities

In each of the R replications in the simulation study, new independent realizations are drawn on u, η^(2) and η^(3). The three test statistics W_o, D_o and T_o will be calculated for both y^(2) (then denoted as W_2, D_2, T_2) and for y^(3) (denoted W_3, D_3, T_3), assuming the other regressor to be endogenous. These genuine subset tests will be compared with tests on the endogeneity of the full set. The latter are denoted W_23, D_23, T_23 (these are tests involving 2 degrees of freedom), W_2^3, D_2^3, T_2^3 (when y^(3) is treated as exogenous) and W_3^2, D_3^2, T_3^2 (when y^(2) is treated as exogenous). The behavior under both the null and the alternative hypothesis will be investigated. These full set tests are included to better appreciate the special nature of the more subtle subset tests under investigation here.

Every replication it is checked whether or not the null hypothesis is rejected by test statistic Υ, where Υ is any of the tests indicated above. From this we obtain the Monte Carlo estimate

p̂_Υ = (1/R) Σ_{r=1}^R 1(Υ^(r) > Υ_c(α)). (2.69)

Here 1(·) is the indicator function that takes value one when its argument is true and zero when it is not. We take the standard form of the test statistics, in which Υ_c(α) is the α-level critical value of the χ² distribution (with either 1 or 2 degrees of freedom) and in which σ² estimates have no degrees of freedom correction. The Monte Carlo estimator p̂_Υ estimates the actual rejection probability of asymptotic test procedure Υ. When H_0 is true it estimates the actual type I error probability (at nominal level α), and when H_0 is false 1 − p̂_Υ estimates the type II error probability, whereas p̂_Υ is then a (naive, when there are size distortions) estimator of the power function of the test in one particular argument (defined by the specific case of values of the design and matching DGP parameters). Estimator p̂_Υ follows the binomial distribution and has standard errors given by SE(p̂_Υ) = [p̂_Υ(1 − p̂_Υ)/R]^{1/2}. For R large, a 99.75% confidence interval for the true rejection probability is

C_{99.75%} = [p̂_Υ − 3·SE(p̂_Υ), p̂_Υ + 3·SE(p̂_Υ)]. (2.70)

We choose R = 10000, examine all endogeneity tests at the nominal significance level of 5%, and take the sample size equal to n = 40 (mostly). Note that the boundary values for determining whether the actual type I error probability of these asymptotic tests differs at this particular small sample size significantly (at the very small level of .25%) from the nominal value 5% are .043 and .057 respectively.
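For reference, (2.69)-(2.70) amount to the following few lines (a sketch under our own naming; the χ² quantile comes from scipy):

```python
import numpy as np
from scipy import stats

def rejection_frequency(values, df, alpha=0.05):
    """Monte Carlo estimate (2.69) of the rejection probability, with the
    99.75% binomial confidence interval (2.70) based on three standard errors."""
    R = len(values)
    crit = stats.chi2.ppf(1 - alpha, df)          # chi-square critical value
    p_hat = np.mean(np.asarray(values) > crit)    # fraction of rejections
    se = np.sqrt(p_hat * (1 - p_hat) / R)
    return p_hat, (p_hat - 3 * se, p_hat + 3 * se)
```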

2.5.1 At least one exogenous regressor

In this subsection we examine cases where either both regressors y^(2) and y^(3) are actually exogenous or just y^(3) is exogenous. Hence, for particular implementations of the subset and full set tests on endogeneity the null hypothesis is true, but for some it is false. In fact, it is always true for the subset tests on y^(3) in the cases of this subsection. We present a series of tables containing estimated rejection probabilities and each separate table focuses on a particular setting regarding the strength of the instruments. Every case consists of potentially four subcases: a stands for (d_32, d_33) = (1, 1), b for (−1, 1), c for (1, −1) and d for (−1, −1). When both instruments have similar strength for y^(2) and also (but probably stronger or weaker) for y^(3), the identification condition requires d_32 ≠ d_33. Then only two of the four combinations (2.67) are feasible, so that every case just consists of the two subcases b and c.

In Table 2.1 we consider cases with mildly strong instruments. In the first five cases both y^(2) and y^(3) are exogenous whereas the degree of multicollinearity changes. So in the first ten rows of the table, for all five distinct implementations of the three different test statistics examined, the null hypothesis is true. Because y^(2) and y^(3) are parametrized similarly here, the two subset test implementations are actually equivalent. The minor differences in rejection probabilities stem from random variation, both in the disturbances and in the single realizations of the instruments. The same holds for the two full set implementations with one degree of freedom. For all implementations over the first five cases (both b and c), D_o shows acceptable size control, whereas W_o tends to underreject, whilst T_o overrejects. The subset version of W_o gets worse under multicollinearity (irrespective of the sign of ρ_23), whereas multicollinearity increases the type I error probability of the full set W_o tests. Both D_o and T_o seem unaffected by multicollinearity for these cases.

When y^(2) is made mildly endogenous, as in cases 6-10, the null hypothesis is still true for the subset tests W_3, D_3 and T_3. Their type I error probability seems virtually unaffected by the actual values of ρ_2 and ρ_23. For the subset tests W_2, D_2 and T_2 the null hypothesis is false. Due to their differences in type I error probability we cannot conclude much about power yet, but that they have some, and that it is virtually unaffected by ρ_23, is clear. The next three columns demonstrate that it is essential that a full set test exploits genuinely exogenous regressors, because if it does not it may falsely diagnose endogeneity of an exogenous regressor (but with a reasonably low probability when the regressors are uncorrelated). However, the next tests reported, which exploit the genuine exogeneity of y^(3), demonstrate that in this case they do a much better job in detecting the endogenous nature of y^(2) than the subset tests, provided there is (serious) multicollinearity. Here the full set tests have the advantage of using an extra valid instrument. The effects of multicollinearity can be explained as follows. Using the notation of the more general setup and auxiliary regression (2.19), the subset (full set) tests here test the significance of the regressors P_{Z*}Y_o (P_Z Y_o) in a regression already containing P_{Z*_r}X (P_{Z_r}X = X), where Z* = (Z, Y_e) and Z*_r = (Z_r, Y_e). Regarding the subset test it is obvious that, because the space spanned by P_{Z*_r}X = (P_{Z*_r}Y_e, Y_o, Z_1) does not change when Y_e and Y_o are more or less correlated, the significance test of P_{Z*}Y_o is not affected by ρ_23. However, P_Z Y_o is affected (positively in a matrix sense) when Y_o and Y_e are more (positively or negatively) correlated, which explains the increasing probability of detecting endogeneity by the present full set tests. Finally, the two degrees of freedom full set tests demonstrate power, also when the null hypothesis tested is only partly false. One would expect lower rejection probability here than for the full set test which correctly exploits orthogonality of y^(3), but comparison is hampered again due to the differences between type I error probabilities. Note though that the first five cases show larger type I error probabilities for T_23 than for T_2^3, whereas cases 6-10 show fewer correct rejections, which fully conforms to our expectations.

For a higher degree of simultaneity in y^(2) (cases 11-13) we find for the subset tests that W_3 still underrejects substantially, but an effect of multicollinearity is no longer established, which is probably because DGPs with a similar ρ_2 and ρ_3 but higher ρ_23 are not feasible. Here D_3 no longer outperforms T_3. For the other tests the rejection probabilities that should increase with ρ_2 do indeed, and we find that the probability of misguidance by the full set tests exploiting an invalid instrument is even more troublesome now.

These results already indicate that subset tests are indispensable in a comprehensive sequential strategy to classify regressors as either endogenous or exogenous. Because, after a two degrees of freedom full set test may have indicated that at least one of the two regressors is endogenous, neither of the one degree of freedom full set tests will be capable of indicating which one is endogenous if there is one endogenous and one exogenous regressor, unless these two regressors are mutually orthogonal. However, the two subset tests demonstrate that they can be used to diagnose the endogeneity/exogeneity of the regressors, especially when the endogeneity is serious, irrespective of their degree of multicollinearity. We shall now examine how these capabilities are affected by the strength of the instruments.

The results in Table 2.2 stem from similar DGPs which differ from the previous ones only in the increased strength of both the instruments, which forces further limitations on multicollinearity, due to (2.50). Note that the size properties have not really improved. Due to the limitations on varying multicollinearity its effects can hardly be assessed from this table. The rejection probabilities of false null hypotheses are larger when the maintained hypothesis is valid, whereas the tests which impose an invalid orthogonality condition become even more confusing when the genuine instruments are stronger. Multicollinearity still has an increasing effect on the rejection probability of all the full set tests, which is very uncomfortable for the implementations which impose a false exogeneity assumption. Staiger and Stock (1997) found that full set tests have correct asymptotic size, although being inconsistent under weak instrument asymptotics. The following three tables illustrate cases in which the instruments are weak for one of the two potentially endogenous variables or for both.

In the DGPs used to generate Table 2.3, the instruments are weak for y^(2) but strong for y^(3). So now the two subset tests examine different situations (even when ρ_2 = ρ_3 = 0) and so do the two one degree of freedom full set tests. Especially the subset W_o tests and the two degrees of freedom W_23 test are seriously undersized. When the endogeneity of the weakly instrumented regressor is tested by W_2^3 the type I error probability is seriously affected by (lack of) multicollinearity. All full set T_o tests are oversized. Only the D_o tests would require just a (mostly) moderate size correction. The probability that subset test D_2 will detect the endogeneity is small, which was already predicted immediately below (2.18). D_3^2 will again provide confusing evidence, unless the regressors are orthogonal. Full set tests D_2^3 and D_23 have power only under multicollinearity. The latter result can be understood upon specializing (2.18) for this case, where the contributions with c_1 and c_3 disappear because K_e = 0 and L_1 = 0. Using Σ_{Z′Z} = I and Σ_{Y_o′Y_o} = I we have to find a solution c_2 satisfying c_2 = σ²γ_o ≠ 0 and Π_o c_2 = 0. Since γ_o = (ρ_2, 0)′ and the first column of Π_o vanishes asymptotically, there is such a solution indeed, but not if Σ_{Y_o′Y_o} were nondiagonal.

The situation is reversed in Table 2.4, where the instruments are weak for y^(3) and strong for the possibly endogenous y^(2). Cases 23 and 24 are mirrored in cases 29 and 30. The W_o tests are seriously undersized, except W_2^3 (building on exogeneity of y^(3), it is not affected by its weak external instruments) and W_3^2 (provided the multicollinearity is substantial). The full set T_o tests are again oversized. All D_o implementations show mild size distortions. Because of the findings below (2.18), it is no surprise that the subset tests on y^(2) exhibit diminishing power for more severe multicollinearity. After size correction it seems likely that W_2, or especially T_2, would do better than D_2. Also the tests W_2^3, D_2^3 and T_2^3 show power for detecting endogeneity of y^(2) when the instruments are weak for exogenous regressor y^(3), and their power increases with multicollinearity and of course with ρ_2.

Finally we construct DGPs in which the instruments are weak for both regressors. Given our predictions below (2.18), and because we found mixed results when the instruments are weak for one of the two regressors, not much should be expected when both are affected. The results in Table 2.5 do indeed illustrate this. The W_o tests underreject severely, T_o gives a mixed picture, but D_o would require only a minor size correction, although it will yield very modest power.

In addition to cases in which the two instruments have similar strength for y^(2) and y^(3), we present a couple of cases in which this differs. Note that the inequality (2.47) is now satisfied by all four combinations in (2.67). The reason that not every case in Table 2.6 consists of four subcases is that not every subcase satisfies the second part of (2.50). The results for the subset tests differ greatly between the four subcases. Subcases a and d show lower rejection probabilities for W_o and T_o, whereas D_o seems unaffected under the null hypothesis. This suggests that the estimate β̂_r (and hence σ̂_r²) is probably less affected by (d_32, d_33) in these subcases than σ̂² and σ̃². The subset tests on y^(2) and y^(3) behave similarly, although the (joint) instrument strength is a little higher for the former. Whereas the results between the subcases are quite different for the subset tests and the two degrees of freedom full set tests, the one degree of freedom full set tests seem less dependent on the choice of (d_32, d_33). When y^(2) is endogenous, D_2 has substantially less power in subcases a and d, even though under the null hypothesis it rejects less often in subcases c and d. For the full set tests things are different. These reject far more often in subcases a and d when there is little or no multicollinearity. However, when multicollinearity is more pronounced the tests reject less often in subcases a and d than in b and c. From these results we conclude that the relevant nuisance parameters for these asymptotic tests are not just simultaneity, multicollinearity and instrument strength, but also the actual signs of the reduced form coefficients.

2.5.2 Both regressors endogenous

The rejection probabilities of the subset tests estimated under the alternative hypothesis in the previous subsection are only of secondary interest, because the subset that was treated as endogenous was actually exogenous. In such cases application of the one degree of freedom full set test is more appropriate. Now the not tested subset which is treated as endogenous will actually be endogenous, so we will get crucial information on the practical usefulness of the subset tests, and further evidence on the possible misguidance by the here inappropriate one degree of freedom full set tests. Similar cases in terms of instrument strength have been chosen to keep comparability with the previous subsection.

The DGPs used for Table 2.7 mimic those of Table 2.1 in terms of instrument strength. In most cases the subset tests behave roughly the same as when the maintained regressor was actually exogenous, although multicollinearity is now found to have a small though clear asymmetric impact on the rejection probabilities. When multicollinearity is of the same sign as the simultaneity in y^(3), test statistics W_o and T_o reject less often than when these signs differ. This is not caused by the fixed nature of the instruments, because simulations (not reported) in which the instruments are random show the same effect. On the other hand, the differences between subcases diminish when the instruments are random. Multicollinearity decreases the rejection probabilities, but less so when the endogeneity of the maintained regressor is more severe. The full set tests with one degree of freedom are affected more by multicollinearity than the subset tests. As is to be expected, the two degrees of freedom full set tests reject more often now that both regressors are endogenous. The rejection probabilities of these full set tests, D_o included, decrease dramatically if ρ_23 and ρ_3 are of the same sign, and they do so much more than for the subset tests. Note that the cases in which ρ_3 takes on a negative value are very similar to cases in which ρ_3 is positive and the sign of ρ_23 is changed, or those of (d_32, d_33). More specifically, case 63b corresponds with case 59c and case 63c with case 59b. Therefore, we will exclude cases with negative values for ρ_3 from future tables and stick to their positive counterparts.

In Table 2.8 we examine stronger instruments. Comparing with Table 2.2 we find that the rejection probabilities seem virtually unaffected by choosing ρ_3 ≠ 0. As we found before, the rejection probabilities are affected in a positive manner by the increased strength of the instruments. The subset tests reject almost every time if the corresponding degree of simultaneity is .5. The effect of having ρ_23 and ρ_3 both positive seems less severe. As long as this is not the case, the two one degree of freedom full set tests reject more often than the subset tests. If ρ_23 and ρ_3 do not differ in sign, W_o and D_o reject more often when applied to a subset than in their one degree and two degrees of freedom full set versions.

Because Tables 2.3 and 2.4 are very similar and now both regressors are endogenous, we only need to consider the equivalent table of the latter. In Table 2.9 the instruments are weak for y^(3) but strong for y^(2). Obviously the subset tests for y^(3) lack power now, as was already concluded from Table 2.3. However, subset tests for y^(2) show power also in the presence of a maintained endogenous though weakly instrumented regressor. Note that when ρ_3 is increased all subset tests for y^(2) reject more often. This dependence was not apparent under non-weak instruments. As we found in Table 2.5, the subset tests perform badly when the instruments are weak for both regressors. From the results on the subset test for y^(3) we expect the same for the case in which ρ_3 ≠ 0. This we found to be true in further simulations, though we do not present a table on these as it is not very informative.

These simulations demonstrate that the subset tests are indispensable when there is more than one regressor in a model that might be endogenous. Using only full set tests will not enable to classify the individual variables as either endogenous or exogenous. However, all tests examined here show substantial size distortions in finite samples. Moreover, these size distortions are found to be determined in a complex way by the model characteristics. In fact the various tables illustrate that it is not just the design parameters simultaneity, multicollinearity and instrument strength which determine the size of these tests. The differences between the subcases illustrate that the size also depends on the actual reduced form coefficients, and therefore in fact on the degree by which the multicollinearity stems from correlation between the reduced form disturbances (κ). Trying to mitigate the size problems by simple degrees of freedom adjustments or by transformations to F statistics therefore seems a dead end.

2.6 Results for bootstrapped tests

Because all the test statistics that are under investigation here are based on appropriate first order asymptotics, it should be feasible to mitigate the size problems by bootstrapping.

2.6.1 A bootstrap routine for subset DWH test statistics

Bootstrap routines for testing the orthogonality of all possibly endogenous regressors have previously been discussed by Wong (1996). Implementation of these bootstrap routines is relatively easy due to the fact that no regressors are assumed to be endogenous under the null hypothesis. This is in contrast to the test of subsets, where some regressors are endogenous also under the null hypothesis. Their presence complicates matters, as bootstrap realizations have to be generated on both the dependent variable and the maintained set of endogenous regressors. We discuss two routines: first a parametric and next a semiparametric bootstrap. For the former routine we have to assume a distribution for the disturbances, which we choose to be the normal. Consider the n × (1 + K_e) matrix U = (u, V_e). Its elements can be estimated by û_r = y − Xβ̂_r and V̂_er = Y_e − Z_r Π̂_er, where Π̂_er = (Z_r′Z_r)⁻¹Z_r′Y_e. Under the null hypothesis β̂_r and Π̂_er are consistent estimators and it follows that Û_r = (û_r, V̂_er) is consistent for U, and hence Σ̂ = n⁻¹Û_r′Û_r is a consistent estimator of the variance of its rows. The following illustrates the steps that are required for the bootstrap procedure.

1. Draw pseudo disturbances of sample size n from the N(0, Σ̂) distribution and collect them in U^(b) = (u^(b), V_e^(b)). Obtain bootstrap realizations on the endogenous explanatory variables and the dependent variable through Y_e^(b) = Z_r Π̂_er + V_e^(b) and y^(b) = X^(b)β̂_r + u^(b), where X^(b) = (Y_e^(b), Y_o, Z_1). Calculate the test statistic of choice Υ and store its value Υ̂^(b).

2. Repeat step (1) B times, resulting in the B × 1 vector Υ̂_B = (Υ̂^(1), ..., Υ̂^(B))′, of which the elements should be sorted in increasing order.

3. The null hypothesis should be rejected if for the empirical value Υ̂, calculated on the basis of y, X and Z, one finds Υ̂ > Υ̂_α^bc, the (1 − α)(B + 1)-th value of the sorted vector.

Applying the semiparametric bootstrap is very similar, as it only differs from the parametric one in step (1). Instead of assuming a distribution for the disturbances, we resample by drawing rows with replacement from Û_r.
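A minimal numpy sketch of the parametric routine could look as follows; the interface (a stat_fn computing any of the statistics above, and our variable names) is ours, and in the semiparametric variant the N(0, Σ̂) draw would be swapped for row resampling from Û_r.

```python
import numpy as np

def bootstrap_critical_value(y, Ye, Yo, Z1, Z2, stat_fn, B=199, alpha=0.05,
                             seed=0):
    """Parametric bootstrap (steps 1-3) for a subset DWH statistic.

    Ye: maintained endogenous regressors, Yo: regressors under test,
    Z1/Z2: included/excluded instruments; stat_fn(y, X, Z, Zr) returns
    the statistic of choice."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Ye = np.asarray(Ye).reshape(n, -1)
    X = np.column_stack([Ye, Yo, Z1])
    Z = np.column_stack([Z1, Z2])
    Zr = np.column_stack([Z, Yo])            # null: Y_o orthogonal to u
    # restricted estimates beta_r (IV using Zr) and Pi_er (OLS of Ye on Zr)
    PZrX = Zr @ np.linalg.solve(Zr.T @ Zr, Zr.T @ X)
    beta_r = np.linalg.solve(X.T @ PZrX, PZrX.T @ y)
    Pi_er = np.linalg.solve(Zr.T @ Zr, Zr.T @ Ye)
    Ur = np.column_stack([y - X @ beta_r, Ye - Zr @ Pi_er])
    L = np.linalg.cholesky(Ur.T @ Ur / n)    # Sigma_hat = L L'
    draws = np.empty(B)
    for b in range(B):
        Ub = rng.standard_normal((n, Ur.shape[1])) @ L.T   # rows ~ N(0, Sigma_hat)
        ub, Veb = Ub[:, 0], Ub[:, 1:]
        Yeb = Zr @ Pi_er + Veb               # step (1): regenerate Ye and y
        Xb = np.column_stack([Yeb, Yo, Z1])
        draws[b] = stat_fn(Xb @ beta_r + ub, Xb, Z, Zr)
    crit = np.sort(draws)[int((1 - alpha) * (B + 1)) - 1]  # step (3)
    return stat_fn(y, X, Z, Zr), crit        # reject H0 when stat > crit
```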

2.6.2 Simulation results for bootstrapped test statistics

Wong (1996) concludes that bootstrapping the full set test statistics yields an improvement over using first order asymptotics, especially in the case where the (in his case external) instrument is weak. In this subsection we will discuss simulation results for the bootstrapped counterparts of the various test statistics. Again all results are obtained with R = 10000 and n = 40; additionally we choose the number of bootstrap replications to be B = 199. To mimic as closely as possible the way the bootstrap would be employed in practice, for each case and each test statistic we calculated the bootstrap critical value Υ̂_α^bc again in each separate replication.

Table 2.10 is the bootstrapped equivalent of Table 2.1. Whereas we found that the crude asymptotic version of W_o underrejects while T_o overrejects, bootstrapping these test statistics results in a substantial improvement¹⁹ of their size properties. In fact, in this respect all three tests now perform equally well with mildly strong instruments, because the estimated actual significance level always lies inside the 99.75% confidence interval for the nominal level. Not only the subset tests profit from being bootstrapped, the one degree and two degrees of freedom full set tests do as well. In terms of power we find that the bootstrapped versions of W_o, T_o and D_o perform almost equally well. We do find minor differences in rejection frequencies under the alternative, but often these seem still to be the results of minor differences in size. Nevertheless, on a few occasions test D_o seems to fall behind. Now we establish more convincingly that exploiting correctly the exogeneity of y^(2) in a full set test provides more power, especially when multicollinearity is present, than not exploiting it in a subset test. Of course, the unfavorable substantial rejection probability of the exogeneity of the truly exogenous y^(3), caused by wrongly treating y^(2) as exogenous in a full set test, cannot be healed by bootstrapping. Similar conclusions can be drawn from Table 2.11, which contains results for stronger instruments.

On the other hand, we find in Table 2.12 that bootstrapping does not achieve satisfactory size control for most of the subset tests when the instruments are weak for one regressor. Only D_2 shows reasonable type I error probabilities, but when testing the endogeneity of y^(2), the regressor for which the instruments are weak, there is hardly any power. The full set tests do not show substantial size distortions, and the one degree of freedom full set test on y^(2) and the two degrees of freedom test demonstrate power provided the regressors show multicollinearity. The results in Table 2.13 indicate that the subset test is of more use when weakness of instruments does not concern the variable under test. We can conclude that W_o and T_o have more power than D_o, since they reject less often under the null hypothesis but more often under the alternative. Because we were as yet unable to properly size correct the subset test on the strongly instrumented regressor in Tables 2.12 and 2.13, we know that we will be unable to do so too when all regressors are weakly instrumented. This is supported by the results summarized in Table 2.14. Again the results are slightly better for W_o and T_o, but there is almost no power.

For DGPs in which both regressors are endogenous we again construct three tables. From subsection 2.5.2 we learned that under the alternative hypothesis the tests behave similarly to cases in which only y^(2) is endogenous. This is found here too, as can be seen from Table 2.15. We find further evidence that the subset version of D_o performs less well than W_o and T_o. New in comparison with Table 2.7 is that the two degrees of freedom full set tests generally exhibit more power than the one degree of freedom full set tests when the instruments are mildly strong. However, this was already found for cases with stronger instruments without bootstrapping. Increasing the instrument strength raises the rejection probabilities as before, as can be seen from Table 2.16. That our current implementation of the bootstrap does not offer satisfactory size control for most subset tests when y^(3) is weakly instrumented was already demonstrated in Table 2.12, and we conclude the same for the case when both regressors are endogenous, as is obvious from the results in Table 2.17.

¹⁹ Although the current implementation of the bootstrap already performs quite well, even better results may be obtained by rescaling the reduced form residuals by a loss of degrees of freedom correction.

2.7 Empirical case study

A classic application involving more than one possibly endogenous regressor is Griliches (1976), which studies the effect of education on wage. It is often used to demonstrate instrumental variable techniques. Both education and IQ are presumably endogenous due to omitted regressors. However, testing this assumption is often overlooked. Here we shall examine the exogeneity status of both regressors jointly and individually by means of the full set tests and the subset tests. The same data are used as in Hayashi (2000, p.236). They stem from the Young Men's Cohort of the National Longitudinal Survey and include 1362 observations. We have the wage equation and reduced form equations

log W_i = β_1 S_i + β_2 IQ_i + Z_{1i}′γ_1 + u_i, (2.71)
Y_i = Z_{1i}′Π_1 + Z_{2i}′Π_2 + V_i, (2.72)

where W is the hourly wage rate, S is schooling in years and IQ is a test score. All regressors that are assumed to be exogenous are included in Z_1; these are an intercept (CONS), years of experience (EXPR), tenure in years (TEN), a dummy for southern states (RNS) and a dummy for metropolitan areas (SMSA). Additionally, Z_2 includes the instruments age, age squared, mother's education, the KWW test score and a marital status dummy. In accordance with our previous notation, both potentially endogenous regressors are included in Y.

Table 2.18 presents the results of four regressions. OLS treats both schooling and IQ as exogenous, whereas they are assumed to be endogenous in the IV regression. In IV1 regressor S_i is treated as exogenous and IQ as endogenous, whereas in IV2 regressor IQ is treated as exogenous and schooling as endogenous. Next, in Table 2.19, we test various hypotheses regarding the exogeneity of one or both potentially endogenous regressors. Joint exogeneity of schooling and IQ is rejected. Hence, at least one of these regressors is endogenous and we should use the subset tests to find out whether it is just one or both. However, first we examine the effect of using the full set test on the individual regressors. In both cases the null hypothesis is rejected. From the Monte Carlo simulation results we learned that the full set tests are inappropriate for correctly classifying individual regressors in the presence of other endogenous regressors. Therefore, we better employ the subset tests. Again we reject the null hypothesis that schooling is exogenous, but the null hypothesis that IQ is exogenous is not rejected at usual significance levels. Bootstrapping these two test statistics does not lead to different conclusions. Based on these results one could prefer regression IV2 instead of IV, resulting in reduced standard errors and a less controversial result on the effect of IQ, as can be seen from Table 2.18.
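For concreteness, assuming the series are available as numpy arrays under the hypothetical names below, the two subset tests could be set up as follows, reusing the hausman_Ho sketch of subsection 2.2.4 (any of the other statistics could be plugged in instead):

```python
import numpy as np
# hypothetical array names: logW, S, IQ, EXPR, TEN, RNS, SMSA and the
# excluded instruments AGE, AGE2, MED, KWW, MRT are assumed (n,) arrays
n = len(logW)
Z1 = np.column_stack([np.ones(n), EXPR, TEN, RNS, SMSA])  # included instruments
Z2 = np.column_stack([AGE, AGE2, MED, KWW, MRT])          # excluded instruments
X = np.column_stack([S, IQ, Z1])
Z = np.column_stack([Z1, Z2])

Zr = np.column_stack([Z, IQ])      # H0: IQ exogenous, S maintained endogenous
H_IQ = hausman_Ho(logW, X, Z, Zr)  # compare with the chi-square(1) critical value

Zr = np.column_stack([Z, S])       # H0: S exogenous, IQ maintained endogenous
H_S = hausman_Ho(logW, X, Z, Zr)
```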

2.8 Conclusions

In this chapter various tests on the orthogonality of arbitrary subsets of explanatory variables are motivated and their performance is compared in a series of Monte Carlo experiments. We find that genuine subset tests play an indispensable part in a comprehensive sequential strategy to classify regressors as either endogenous or exogenous. Full set tests have a high probability of classifying an exogenous regressor wrongly as endogenous if it is merely correlated with an endogenous regressor. Regarding type I error performance we find that subset tests benefit from estimating variances under the null hypothesis (D_o), as in Lagrange multiplier tests. Estimating the variances under the alternative (W_o), as in Wald-type tests, leads to underrejection when the instruments are not very strong. However, bootstrapping results in good size control for all test statistics as long as the instruments are not weak for one of the endogenous regressors. When the various tests are compared in terms of power, the bootstrapped Wald-type tests often behave more favorably. This falsifies earlier theoretical presumptions on the better power of the T_o type of test. The outcome is such that we do not expect that a better performance could have been obtained by the computationally more involved implementations that result from strictly employing the Hausman or the Hansen-Sargan principles.

Even when the instruments are weak for the maintained endogenous regressor but strong for the regressor under inspection, we find that the auxiliary regression tests exhibit power, but there is insufficient size control, also when bootstrapped. This is in contrast to situations in which the instruments are not weak. Then, when bootstrapped, the subset and full set tests can jointly be used fruitfully to classify individual explanatory variables and groups of them as either exogenous or endogenous.

It must be noted though that the conclusions obtained from the experiments in this study are limited, as they only deal with static linear models with Gaussian disturbances, which are just identified by genuinely exogenous external instruments. Apart from relaxing some of these limitations in future work, we plan to look further into effects due to weakness of instruments. Furthermore, tests on the orthogonality of subsets of external instruments and joint tests on the orthogonality of included and excluded instruments deserve further examination.

Table 2.1: One endogenous regressor and mildly strong instruments: R²_{2;z2} = .2, R²_{2;z23} = .4, R²_{3;z2} = .2, R²_{3;z23} = .4. (Estimated rejection probabilities per case (ρ_2, ρ_3, ρ_23; subcases b and c) for W_3, D_3, T_3, W_2, D_2, T_2, W_3^2, D_3^2, T_3^2, W_2^3, D_2^3, T_2^3, W_23, D_23, T_23.)

Table 2.2: One endogenous regressor and stronger instruments: R²_{2;z2} = .3, R²_{2;z23} = .6, R²_{3;z2} = .3, R²_{3;z23} = .6. (Same layout as Table 2.1.)

Table 2.3: One endogenous regressor and weak instruments for y^(2): R²_{2;z2} = .01, R²_{2;z23} = .02, R²_{3;z2} = .3, R²_{3;z23} = .6. (Same layout as Table 2.1, with subcases a and b.)

Table 2.4: One endogenous regressor and weak instruments for y^(3): R²_{2;z2} = .3, R²_{2;z23} = .6, R²_{3;z2} = .01, R²_{3;z23} = .02. (Same layout as Table 2.1.)

Table 2.5: One endogenous regressor and weak instruments: R²_{2;z2} = R²_{3;z2} = .01, R²_{2;z23} = R²_{3;z23} = .02. (Same layout as Table 2.1.)

Table 2.6: One endogenous regressor and asymmetric instrument strength: $R^2_{2;z_2}=.3$, $R^2_{2;z_{23}}=.5$, $R^2_{3;z_2}=.1$, $R^2_{3;z_{23}}=.4$ (same columns as Table 2.1)

Table 2.7: Two endogenous regressors and mildly strong instruments: $R^2_{2;z_2}=.2$, $R^2_{2;z_{23}}=.4$, $R^2_{3;z_2}=.2$, $R^2_{3;z_{23}}=.4$ (same columns as Table 2.1)

Table 2.8: Two endogenous regressors and stronger instruments: $R^2_{2;z_2}=.3$, $R^2_{2;z_{23}}=.6$, $R^2_{3;z_2}=.3$, $R^2_{3;z_{23}}=.6$ (same columns as Table 2.1)

Table 2.9: Two endogenous regressors and weak instruments for $y^{(3)}$: $R^2_{2;z_2}=.3$, $R^2_{2;z_{23}}=.6$, $R^2_{3;z_2}=.01$, $R^2_{3;z_{23}}=.02$ (same columns as Table 2.1)

Table 2.10: Bootstrapped: One endogenous regressor and mildly strong instruments: $R^2_{2;z_2}=.2$, $R^2_{2;z_{23}}=.4$, $R^2_{3;z_2}=.2$, $R^2_{3;z_{23}}=.4$ (same columns as Table 2.1)

Table 2.11: Bootstrapped: One endogenous regressor and stronger instruments: $R^2_{2;z_2}=.3$, $R^2_{2;z_{23}}=.6$, $R^2_{3;z_2}=.3$, $R^2_{3;z_{23}}=.6$ (same columns as Table 2.1)

Table 2.12: Bootstrapped: One endogenous regressor and weak instruments for $y^{(2)}$: $R^2_{2;z_2}=.01$, $R^2_{2;z_{23}}=.02$, $R^2_{3;z_2}=.3$, $R^2_{3;z_{23}}=.6$ (same columns as Table 2.1)

Table 2.13: Bootstrapped: One endogenous regressor and weak instruments for $y^{(3)}$: $R^2_{2;z_2}=.3$, $R^2_{2;z_{23}}=.6$, $R^2_{3;z_2}=.01$, $R^2_{3;z_{23}}=.02$ (same columns as Table 2.1)

Table 2.14: Bootstrapped: One endogenous regressor and weak instruments: $R^2_{2;z_2}=R^2_{3;z_2}=.01$, $R^2_{2;z_{23}}=R^2_{3;z_{23}}=.02$ (same columns as Table 2.1)

Table 2.15: Bootstrapped: Two endogenous regressors and mildly strong instruments: $R^2_{2;z_2}=.2$, $R^2_{2;z_{23}}=.4$, $R^2_{3;z_2}=.2$, $R^2_{3;z_{23}}=.4$ (same columns as Table 2.1)

Table 2.16: Bootstrapped: Two endogenous regressors and stronger instruments: $R^2_{2;z_2}=.3$, $R^2_{2;z_{23}}=.6$, $R^2_{3;z_2}=.3$, $R^2_{3;z_{23}}=.6$ (same columns as Table 2.1)

Table 2.17: Bootstrapped: Two endogenous regressors and weak instruments for $y^{(3)}$: $R^2_{2;z_2}=.3$, $R^2_{2;z_{23}}=.6$, $R^2_{3;z_2}=.01$, $R^2_{3;z_{23}}=.02$ (same columns as Table 2.1)

Table 2.18: Regression results for Griliches data, $n = 758$ (columns: OLS, IV, IV1 and IV2 coefficients and standard errors for log W; regressors: S, Q, EXPR, RNS, TEN, SMSA and a constant)

Table 2.19: DWH tests for Griliches data (columns: test type, variables tested, instruments, test statistics $W$, $D$, $T$, and critical values $\chi^2_{.05}$, $\hat W^{bc}_{.05}$, $\hat D^{bc}_{.05}$, $\hat T^{bc}_{.05}$; rows: full set tests of S and Q jointly and separately, and subset tests of S and of Q)

Chapter 3

On overidentifying restrictions tests and their incremental versions

3.1 Introduction

In this chapter tests on the validity of (subsets of) overidentifying restrictions are examined in a single adequately specified structural equation estimated by IV. Contrary to the previous chapter the focus is now on the validity of external instruments. As some of the subset tests are simple functions of standard overidentifying restrictions tests, their properties are reviewed first. Testing the validity of the overidentifying restrictions is of great importance, as they provide information regarding the validity of the identification strategy in IV regressions and GMM generalizations. Invalidity of instruments will lead to inconsistency, whereas employing more valid instruments will increase efficiency. Several closely related test procedures are examined, such as the Sargan test, Basmann's test, the Hausman testing principle and a heteroskedasticity robust test. Additionally, the Cornish-Fisher corrected Sargan test statistic derived by Magdalinos (1985) is reexamined. In finite samples this may yield a test procedure which outperforms those based on standard asymptotics.

In order to find out which test has best control over type I error probabilities we run simulations over a wide class of relevant cases. These cases are partly chosen based on the correction terms of the Cornish-Fisher corrected Sargan test, as these may provide important information regarding nuisance parameters. The data generating processes (DGPs) are designed by deriving the values for their parameters from chosen econometrically relevant design parameters: the degree of simultaneity of a regressor in the equation of interest, the degree of simultaneity of variables used as instruments, and alternative measures regarding the strength of the instruments, such as the concentration parameter and certain measures of fit. This design allows us to investigate type I and type II error probabilities for meaningful cases. We find that the degree of simultaneity of regressors has a profound effect on type I errors, except for the Cornish-Fisher corrected Sargan test, provided the instruments are not weak.

Practitioners often rely on the outcome of overidentifying restrictions tests. It is therefore important to have a good understanding of these tests. Recently, Hall (2005), De Blander (2008) and Parente and Silva (2012) have discussed that an insignificant test outcome may not necessarily be reassuring. The fact that overidentifying restrictions tests are not consistent against certain alternatives has already been noted by Newey (1985). They have power equal to size if the probability limit of the moment vector function lies in the column space of its Jacobian. The analysis is extended to the subset tests and we develop a relative distance measure that is used in the simulation study to investigate to what extent this special case is of importance. On the other hand it is possible that a significant test outcome is not the result of invalid moment conditions. In the presence of conditional heteroskedasticity, the variance of the moment conditions may either be underestimated or overestimated. If the heteroskedasticity and the instruments are positively correlated it is found that non-robust tests severely overreject.

Overidentifying restrictions tests have been developed by Sargan (1958), Basmann (1960) and Hansen (1982), where the latter proposed an extension of the Sargan test for models estimated by GMM.¹ Properties of these tests and their incremental versions have been discussed by Hwang (1980a), Newey (1985) and Magdalinos (1985), and more recently by Davidson and MacKinnon (2014), who discuss bootstrap procedures. Rather than doubting the full set of instruments after a significant test outcome, one can use the incremental test to test the validity of groups of instruments as long as their number is smaller than the degree of overidentification. This way a valid instrument set may be collected if available. Several studies have paraphrased the incremental test implementations, such as Newey (1985), Magdalinos (1994), Arellano (2003) and Hall (2005). Only recently have these tests been implemented in EViews, and not many simulation findings exist on this topic. Magdalinos (1994) performs simulations for Cornish-Fisher corrected incremental Sargan tests and Niemczyk (2009) provides simulation results regarding the incremental test for both the Sargan and Sargan-Hansen test and their bootstrapped versions. Simulation results for the case of many instruments relative to the number of regressors are given by Lee and Okui (2009) and Hausman et al. (2012), whereas Davidson and MacKinnon (2014) report simulation findings on their bootstrapped variants and the standard statistics. The references given above all focus on overidentifying restrictions tests for single-indexed data models. For an overview of studies on overidentifying restrictions tests in dynamic panel data models, see Chapter 6.

¹ Which we will address as the Sargan-Hansen test.

The structure of this chapter is as follows. In Section 3.2 we introduce the notation and discuss (power) properties of the overidentifying restrictions tests and their subset versions. The consequences of neglecting heteroskedasticity are discussed in Section 3.3. The corrected Sargan test statistic by Magdalinos (1985) is re-examined in Section 3.4. Next a simulation design is proposed in Section 3.5 which will allow us to investigate relevant cases. Results of simulation experiments based on this design are provided in Section 3.6, whereas Section 3.7 concludes.

3.2 Testing overidentifying restrictions

We consider the single linear equation with endogenous regressors

$$y = X\beta + u, \qquad (3.1)$$

with $X=(x_1,...,x_n)'$ and $Z=(z_1,...,z_n)'$ an $n \times K$ and an $n \times L$ full column rank matrix respectively, with $L \geq K$ and $\mathrm{rank}(Z'X)=K$. Furthermore, $\beta$ is a $K$-element unknown coefficient vector. For the IID unobserved disturbances we have $u \sim (0,\sigma_u^2 I_n)$ and we assume that in general $E(x_i u_i)$ equals a constant vector, with elements which may be zero or nonzero for all $i$. As some regressors may be correlated with the disturbance term, OLS estimates are unreliable. Therefore we will focus on IV estimation. With respect to the moment conditions stemming from the instrumental variables we will distinguish three situations: (i) the situation in which $E(z_i u_i)=0$, i.e. all instruments in $Z$ are valid, with $Z'u = O_p(n^{1/2})$; (ii) $E[z_i u_i]=\delta/\sqrt{n}$ when we are interested in local power, where $\delta$ is assumed to be nonzero and fixed; (iii) in case of asymptotic power, $E(z_i u_i)=\delta$. We make asymptotic regularity assumptions to guarantee asymptotic identification of all elements of $\beta$ and consistency under (i) of its IV (or 2SLS) estimator

$$\hat\beta = (X'P_Z X)^{-1} X'P_Z y, \qquad (3.2)$$

with $P_Z = Z(Z'Z)^{-1}Z'$. Hence, we assume that

$$\mathrm{plim}\; n^{-1}Z'Z = \Sigma_{Z'Z} \quad \text{and} \quad \mathrm{plim}\; n^{-1}Z'X = \Sigma_{Z'X} \qquad (3.3)$$

are finite and have full column rank. Then we can derive that $\hat\beta$ has limiting normal distribution

$$n^{1/2}(\hat\beta - \beta) \xrightarrow{d} N\big(0,\; \sigma_u^2 [\Sigma_{Z'X}' \Sigma_{Z'Z}^{-1} \Sigma_{Z'X}]^{-1}\big). \qquad (3.4)$$
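As a concrete illustration of (3.2) and of the variance matrix in (3.4), the following minimal sketch computes the IV estimator from data arrays laid out as above. It assumes Python with numpy available; the function name and interface are ours, not part of any package.

```python
# A minimal sketch of the IV/2SLS estimator (3.2) with the conventional
# finite-sample estimate of the asymptotic variance in (3.4).
import numpy as np

def iv_estimator(y, X, Z):
    """Return beta_hat, IV residuals, sigma_u^2 estimate and Var(beta_hat)."""
    PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)     # P_Z X, without forming P_Z
    beta = np.linalg.solve(PZX.T @ X, PZX.T @ y)    # (X'P_Z X)^{-1} X'P_Z y
    u_hat = y - X @ beta
    sigma2 = u_hat @ u_hat / len(y)                 # sigma_hat_u^2 = n^{-1} u'u
    var_beta = sigma2 * np.linalg.inv(X.T @ PZX)    # sigma^2 (X'P_Z X)^{-1}
    return beta, u_hat, sigma2, var_beta
```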

3.2.1 Test statistics and distributions

Sargan test

Note that it is crucial² for consistency of $\hat\beta$ that $E(z_i u_i)=0$ $\forall i$. A well known statistic that is often used to test this assumption is the Sargan test statistic. Formally the null and alternative hypothesis state that

$$H_0: E[z_i u_i]=0 \ \forall i \qquad \text{vs.} \qquad H_1: E[z_i u_i] \neq 0. \qquad (3.5)$$

These hypotheses can alternatively be written in terms of $\delta$, where the null hypothesis corresponds to $\delta = 0$ and the alternative to $\delta \neq 0$. Let us first consider an unfeasible variant which utilizes the true disturbances³

$$S_u = \frac{u'P_Z u}{u'u/n}. \qquad (3.6)$$

From the law of large numbers (LLN) it follows that $u'u/n \xrightarrow{p} \sigma_u^2$ and from the central limit theorem (CLT) one finds

$$S_u \xrightarrow{d} \chi^2(L) \text{ if } E(z_i u_i)=0; \qquad S_u \xrightarrow{d} \chi^2(\lambda_u,L) \text{ if } E(z_i u_i)=\delta/\sqrt{n}; \qquad S_u \xrightarrow{p} \infty \text{ if } E(z_i u_i)=\delta \ \forall i, \qquad (3.7)$$

where $\lambda_u = \delta'\Sigma_{Z'Z}^{-1}\delta/\sigma_u^2$ is the noncentrality parameter of $S_u$ under the local alternative. Due to the fact that $S_u$ is not a feasible test statistic we are interested in the asymptotic distribution of the feasible statistic

$$S = \hat u'P_Z\hat u/\hat\sigma_u^2, \qquad (3.8)$$

where $\hat u = y - X\hat\beta$ are the IV residuals and $\hat\sigma_u^2 = n^{-1}\hat u'\hat u$. This test statistic is straightforward in the sense that it projects the residuals on the space spanned by the instruments. Although the null distribution of $S$ is well-known, we will formally derive it here and at the same time introduce a notation⁴ which will prove to be helpful when analyzing the power of the test.

² Except for a specific case, which is mentioned in the next subsection.
³ Note that this test statistic actually examines $E[Z'u]=0$ instead of $E[z_i u_i]=0$ $\forall i$. The latter implies the former but not vice versa.
⁴ This notation is closely related to the notation of Arellano (2003) and Hall (2005).

Let $(n^{-1}Z'Z)^{-1} = \Psi\Psi'$, where $\Psi$ is lower triangular. Define $\bar\Psi\bar\Psi' = \Sigma_{Z'Z}^{-1}$, also lower triangular and finite, so $\mathrm{plim}(\Psi - \bar\Psi) = O$. First note that $S$ is the inner product of the vector $n^{-1/2}\Psi'Z'\hat u/\hat\sigma_u$. As $\hat\sigma_u^2 \xrightarrow{p} \sigma_u^2$ we can focus on $\Psi'Z'\hat u$, which can be rewritten as

$$\begin{aligned}
\Psi'Z'\hat u &= \Psi'Z'(y - X\hat\beta) \\
&= \Psi'Z'[I_n - X(X'P_Z X)^{-1}X'P_Z]y \\
&= \Psi'Z'[I_n - X(X'P_Z X)^{-1}X'P_Z]u \\
&= [\Psi'Z' - \Psi'Z'X(X'Z\Psi\Psi'Z'X)^{-1}X'Z\Psi\Psi'Z']u \\
&= [I_L - P_{\Psi'Z'X}]\Psi'Z'u. \qquad (3.9)
\end{aligned}$$

Applying the CLT to $n^{-1/2}\Psi'Z'u/\hat\sigma_u$ yields

$$n^{-1/2}\Psi'Z'u/\hat\sigma_u \xrightarrow{d} N(0, I_L), \qquad (3.10)$$

and because the rank of the probability limit of the projection matrix between square brackets in (3.9) is $L-K$, $S \xrightarrow{d} \chi^2(L-K)$ under the null hypothesis. Equation (3.9) illustrates that by the Sargan test we are actually not testing whether all instruments are valid but rather whether the overidentifying restrictions are. This is due to the fact that by estimating $\beta$ the $K$ sample moments $X'P_Z\hat u$ are set to zero and can therefore not be tested. The full set of moment conditions can only be tested with the unfeasible test statistic $S_u$. A slightly modified statistic is that of Basmann (1960),

$$S^B \equiv \frac{\hat u'P_Z\hat u}{\hat u'\hat u/(n-L)} = \frac{n-L}{n}S, \qquad (3.11)$$

and it follows that $S^B < S$ due to the fact that $S^B$ is a monotonic transformation of the Sargan test statistic. All derivations will be done for $S$ and the asymptotic results self-evidently carry over to (3.11).
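The statistics (3.8) and (3.11) then require only one extra projection of the residuals. A sketch under the same assumptions as above (numpy and scipy available, hypothetical function names):

```python
# A sketch of the Sargan statistic (3.8) and Basmann's variant (3.11),
# both referred to chi^2(L-K); u_hat are the IV residuals from iv_estimator.
import numpy as np
from scipy.stats import chi2

def sargan_basmann(u_hat, Z, K):
    n, L = Z.shape
    Pz_u = Z @ np.linalg.solve(Z.T @ Z, Z.T @ u_hat)   # P_Z u_hat
    S = (u_hat @ Pz_u) / (u_hat @ u_hat / n)           # eq. (3.8)
    SB = (n - L) / n * S                               # eq. (3.11)
    return S, SB, chi2.sf(S, L - K), chi2.sf(SB, L - K)
```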

Subset tests

We will now turn to the case of testing a subset of moment conditions. Let us partition the instrument matrix as $Z = (Z_1\ Z_2)$, with $Z_1 = (z_{11},...,z_{1n})'$ and $Z_2 = (z_{21},...,z_{2n})'$ an $n \times L_1$ and an $n \times L_2$ full column rank matrix. Furthermore $L = L_1 + L_2$ and $L_1 > K$. When one is confident regarding the validity of $Z_1$, an incremental test can be employed to test the validity of the remaining subset of $L_2$ instruments. Formally this can be expressed as

$$H_0^{z_2}: E[z_{2i}u_i]=0,\ E[z_{1i}u_i]=0 \ \forall i \qquad \text{vs.} \qquad H_1^{z_2}: E[z_{2i}u_i] \neq 0,\ E[z_{1i}u_i]=0, \qquad (3.12)$$

where $E[z_{1i}u_i]=0$ denotes the maintained hypothesis. Consider an alternative instrument matrix $Z^* = (Z_1\ M_{Z_1}Z_2)$ with $P_{Z^*} = P_{Z_1} + P_{M_{Z_1}Z_2} = P_Z$, where we use the well-known result that for a full column rank matrix $C = (A\ B)$ one has $P_C = P_A + P_{M_A B}$, where $M_A = I - P_A$. Hence, this transformation leaves the IV estimator and Sargan test unaltered and from now on we may assume $Z_1'Z_2 = O$ without loss of generality. It also implies that in fact we are not testing the validity of $Z_2$ but rather of $M_{Z_1}Z_2$. This does not need to be explicitly denoted in (3.12) because, although $Z_2$ may contain a linear combination of the columns of $Z_1$, $Z_1$ must consist only of valid instruments due to the maintained hypothesis. The partitioning of the instruments implies that

$$\mathrm{plim}\; n^{-1}Z_j'X = \Sigma_{Z_j'X}, \quad \text{for } j=1,2. \qquad (3.13)$$

From $Z_1'Z_2 = O$ it follows that $\Psi$ and $\bar\Psi$ are block-diagonal,

$$\Psi = \begin{pmatrix} \Psi_1 & O \\ O & \Psi_2 \end{pmatrix}, \qquad \bar\Psi = \begin{pmatrix} \bar\Psi_1 & O \\ O & \bar\Psi_2 \end{pmatrix}, \qquad (3.14)$$

where $\Psi_j\Psi_j' = (n^{-1}Z_j'Z_j)^{-1}$ and $\bar\Psi_j\bar\Psi_j' = \mathrm{plim}(n^{-1}Z_j'Z_j)^{-1}$ for $j=1,2$. The estimator resulting from only employing $Z_1$ as instruments is

$$\hat\beta_1 = (X'P_{Z_1}X)^{-1}X'P_{Z_1}y, \qquad (3.15)$$

and the corresponding Sargan test based on $Z_1$ is

$$S_1 = \hat u_1'P_{Z_1}\hat u_1/\hat\sigma_{1,u}^2, \qquad (3.16)$$

with $\hat u_1 = y - X\hat\beta_1$ and $\hat\sigma_{1,u}^2 = n^{-1}\hat u_1'\hat u_1$. If $Z_1$ consists only of valid instruments, $\hat\sigma_{1,u}^2 \xrightarrow{p} \sigma_u^2$ and from $S \xrightarrow{d} \chi^2(L-K)$ it follows that $S_1 \xrightarrow{d} \chi^2(L_1-K)$. The incremental Sargan statistic to test (3.12) is

$$S^I = S - S_1. \qquad (3.17)$$

In order to derive the asymptotic null distribution of $S^I$ it is convenient to write the vector of which $S_1$ is the inner product in terms of $\Psi'Z'u$. Starting off from (3.9) we find

$$\Psi_1'Z_1'\hat u_1 = [I_{L_1} - P_{\Psi_1'Z_1'X}]\Psi_1'Z_1'u = [I_{L_1} - P_{\Psi_1'Z_1'X}]A\Psi'Z'u, \qquad (3.18)$$

where $A = (I_{L_1}\ O)$. Note that in (3.18) $A\Psi'Z'u = \Psi_1'Z_1'u$ follows from the fact that $\Psi$ is block diagonal. Now $S^I$ can be written as one quadratic form

$$S^I = n^{-1}u'Z\Psi[(I_L - P_{\Psi'Z'X})/\hat\sigma_u^2 - A'(I_{L_1} - P_{\Psi_1'Z_1'X})A/\hat\sigma_{1,u}^2]\Psi'Z'u. \qquad (3.19)$$

The probability limit of the matrix between square brackets is $[I_L - P_{\bar\Psi'\Sigma_{Z'X}} - B]/\sigma_u^2$, where $B = A'[I_{L_1} - P_{\bar\Psi_1'\Sigma_{Z_1'X}}]A$. The earlier CLT results for $n^{-1/2}\bar\Psi'Z'u/\sigma_u$ and the fact⁵ that $[I_L - P_{\bar\Psi'\Sigma_{Z'X}} - B]$ is idempotent with rank $L_2$ give

$$S^I \xrightarrow{d} \chi^2(L_2) \qquad (3.20)$$

under the null hypothesis.

⁵ See for instance Arellano (2003) and Hall (2005).

Of course an incremental Basmann test can be calculated along the same lines,

$$S^I_B = S^B - S_1^B, \qquad (3.21)$$

where $S_1^B$ is the Basmann test statistic that is obtained from employing only $Z_1$ as instruments.

An alternative way of testing the validity of a subset of instruments is by Hausman's principle of examining the discrepancy between two alternative estimators. Although the test as put forward by Hausman (1978) is often used to test the exogeneity of a (sub)set of regressors, it can also be used to test the validity of a subset of instruments. The Hausman test statistic for testing the validity of $Z_2$ is

$$H = (\hat\beta_1 - \hat\beta)'[\hat\sigma_{1,u}^2(X'P_{Z_1}X)^{-1} - \hat\sigma_u^2(X'P_Z X)^{-1}]^{-}(\hat\beta_1 - \hat\beta), \qquad (3.22)$$

where a generalized inverse is used for the matrix between square brackets. Some alternatives of $H$ and their relation to the Sargan test become apparent by introducing some additional notation. Let us rewrite (3.1) as

$$y = X_1\alpha_1 + Y\alpha_2 + u, \qquad (3.23)$$

where $X = (X_1\ Y)$ and $\beta = (\alpha_1'\ \alpha_2')'$, and all regressors in $X_1$, an $n \times K_1$ matrix, are assumed to be exogenous/predetermined and are included in $Z_1$. The potentially endogenous regressors are collected in the $n \times K_2$ matrix $Y$. Kiviet and Pleus (2014) discuss and provide simulation results on variants of this test statistic when testing the orthogonality of a subset of $Y$. These variants are obtained as follows. Using equation (2.11) of Kiviet and Pleus (2014) we obtain

$$\hat\beta_1 - \hat\beta = (X'P_{Z_1}X)^{-1}(X_1\ P_{Z_1}Y)'\hat u, \qquad (3.24)$$

which is close to zero if and only if $Y'P_{Z_1}\hat u$ is close to zero, as $X_1'\hat u = 0$ and $(X'P_{Z_1}X)^{-1}$ is non-singular. So in order to test the null hypothesis in (3.12) the Hausman test examines $Y'P_{Z_1}\hat u$. Consider now the auxiliary regression equation

$$y = X\beta + P_{Z_1}Y\xi + u, \qquad (3.25)$$

which can be estimated by IV using all instruments to obtain

$$\begin{aligned}
\hat\xi &= (Y'P_{Z_1}M_{P_Z X}P_{Z_1}Y)^{-1}Y'P_{Z_1}M_{P_Z X}y \\
&= (Y'P_{Z_1}M_{P_Z X}P_{Z_1}Y)^{-1}Y'P_{Z_1}[I_n - P_Z X(X'P_Z X)^{-1}X'P_Z]y \\
&= (Y'P_{Z_1}M_{P_Z X}P_{Z_1}Y)^{-1}Y'P_{Z_1}[I_n - X(X'P_Z X)^{-1}X'P_Z]y \\
&= (Y'P_{Z_1}M_{P_Z X}P_{Z_1}Y)^{-1}Y'P_{Z_1}\hat u. \qquad (3.26)
\end{aligned}$$

As $(Y'P_{Z_1}M_{P_Z X}P_{Z_1}Y)^{-1}$ is non-singular, it follows that by testing $\xi = 0$ in (3.25) we are also examining the closeness to zero of $Y'P_{Z_1}\hat u$. Doing so by a Wald type test we obtain

$$W(\tilde\sigma_u^2) = y'(P_{(P_Z X\ P_{Z_1}Y)} - P_{P_Z X})y/\tilde\sigma_u^2, \qquad (3.27)$$

where various options exist for $\tilde\sigma_u^2$. This test statistic replaces the two different estimates of $\sigma_u^2$ in (3.22) by one and the same. Kiviet and Pleus (2014) find that in finite samples the test statistic which takes $\tilde\sigma_u^2 = \hat\sigma_u^2$ performs best under the null hypothesis. Let us now examine the relation between $W(\tilde\sigma_u^2)$ and the incremental Sargan statistic which also uses one and the same estimate of $\sigma_u^2$,

$$S^I_H(\tilde\sigma_u^2) = \hat u'P_Z\hat u/\tilde\sigma_u^2 - \hat u_1'P_{Z_1}\hat u_1/\tilde\sigma_u^2. \qquad (3.28)$$

We start off by noting that the numerator of $S^I_H(\tilde\sigma_u^2)$ can be written as

$$\begin{aligned}
\hat u'P_Z\hat u - \hat u_1'P_{Z_1}\hat u_1 &= y'(P_Z - P_{Z_1} + P_{P_{Z_1}X} - P_{P_Z X})y \\
&= y'(P_{Z_2} + P_{P_{Z_1}X} - P_{P_Z X})y \\
&= y'(P_{(Z_2\ P_{Z_1}X)} - P_{P_Z X})y, \qquad (3.29)
\end{aligned}$$

where the second and third line follow from $P_Z = P_{Z_1} + P_{M_{Z_1}Z_2} = P_{Z_1} + P_{Z_2}$, $Z_1'Z_2 = O$ and $P_{(Z_2\ P_{Z_1}X)} = P_{Z_2} + P_{P_{Z_1}X}$. Comparing the numerator of (3.27) with (3.29) shows that $S^I_H(\tilde\sigma_u^2)$ and $W(\tilde\sigma_u^2)$ will be numerically equivalent if $P_{(Z_2\ P_{Z_1}X)} = P_{(P_{Z_1}Y\ P_Z X)}$. Written differently, we require that $\mathcal{S}(Z_2, P_{Z_1}X) = \mathcal{S}(P_{Z_1}Y, P_Z X)$, where $\mathcal{S}(C)$ denotes the subspace spanned by the columns of a general $a \times b$ matrix $C$. The left side of this equation can be written as

$$\mathcal{S}(Z_2, P_{Z_1}X) = \mathcal{S}(X_1, P_{Z_1}Y, Z_2), \qquad (3.30)$$

and the right as

$$\mathcal{S}(P_{Z_1}Y, P_Z X) = \mathcal{S}(X_1, P_{Z_1}Y, P_{Z_2}Y). \qquad (3.31)$$

It follows that the space denoted in (3.30) will equal that of (3.31) if $L_2 \leq K_2$. Therefore, if $L_2 > K_2$, $S^I_H(\tilde\sigma_u^2) \neq W(\tilde\sigma_u^2)$, whereas $S^I_H(\tilde\sigma_u^2) = W(\tilde\sigma_u^2)$ if $L_2 \leq K_2$. Under the null hypothesis $H$ and $W(\tilde\sigma_u^2)$ are asymptotically chi-squared distributed with $\min\{L_2,K_2\}$ degrees of freedom, because at most $K_2$ parameters can be tested by a Hausman test. If $L_2 \leq K_2$ all tests on subsets considered are asymptotically equivalent under the null hypothesis as long as consistent estimates of $\sigma_u^2$ are used. We will consider both $H$ (not investigated by Kiviet and Pleus, 2014) and $S^I_H(\hat\sigma_u^2)$ (in our simulations equal to $W(\hat\sigma_u^2)$ and then simply denoted by $S^I_H$).
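For the simulations below, the incremental statistics can be assembled from the pieces already sketched. The following schematic version covers $S^I$ and $S^I_H$ (the full Hausman form (3.22) with its generalized inverse is omitted here); as elsewhere, the function names are ours, and the single degrees-of-freedom value shown presumes the chapter's case $L_2 \leq K_2$:

```python
# A sketch of the incremental Sargan test S^I = S - S_1 of (3.17) and of
# S^I_H from (3.28), which scales both quadratic forms by the same
# sigma_hat_u^2; iv_estimator and sargan_basmann are the earlier sketches.
import numpy as np
from scipy.stats import chi2

def subset_tests(y, X, Z, Z1):
    K, L, L1 = X.shape[1], Z.shape[1], Z1.shape[1]
    _, u, s2, _ = iv_estimator(y, X, Z)       # full instrument set
    _, u1, _, _ = iv_estimator(y, X, Z1)      # maintained set only
    S, _, _, _ = sargan_basmann(u, Z, K)
    S1, _, _, _ = sargan_basmann(u1, Z1, K)
    SI = S - S1                                          # eq. (3.17)
    q = u @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ u)        # u'P_Z u
    q1 = u1 @ Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T @ u1) # u1'P_Z1 u1
    SIH = (q - q1) / s2                                  # eq. (3.28)
    df = L - L1                                          # = L2, assuming L2 <= K2
    return SI, chi2.sf(SI, df), SIH, chi2.sf(SIH, df)
```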

3.2.2 Power properties

Sargan test

Now we will discuss the power properties of the various tests. First we will examine the local power properties. In order to do so we employ a so-called Pitman (1949) drift for the violation of the orthogonality conditions. Let

$$E[z_i u_i] = \delta/\sqrt{n} \quad \forall i. \qquad (3.32)$$

Although the orthogonality conditions are violated⁶ for finite $n$, this violation disappears asymptotically. It does not affect the consistency of the estimators for $\beta$ and $\sigma_u^2$, but it does introduce a noncentrality parameter in the distribution of $S$. By using the Pitman drift instead of $E[z_i u_i]=0$ we obtain the following theorem.

⁶ (3.32) means that the data cannot stem from a strictly stationary process. For more details on this we refer to Chapter 5 of Hall (2005).

Theorem 3.1 If $E[z_i u_i]=\delta/\sqrt{n}$, the asymptotic distribution of $S$ is chi-squared with $L-K$ degrees of freedom and noncentrality parameter

$$\lambda = \delta'\bar\Psi[I_L - P_{\bar\Psi'\Sigma_{Z'X}}]\bar\Psi'\delta/\sigma_u^2. \qquad (3.33)$$

See Appendix 3.A for a proof. Let us examine $\lambda$ in more detail. If $\delta = \Sigma_{Z'X}c$ for any nonzero $K \times 1$ vector $c$, so if $\delta$ lies in the column space of $\Sigma_{Z'X}$, the test has no local power because $\lambda = 0$. Furthermore, this problem is not restricted to a lack of local power. That the standard nonlocal power of the test is also affected can be seen from the following. If $E[z_i u_i]=\delta$ $\forall i$, then

$$\mathrm{plim}\; n^{-1}S = \delta'\bar\Psi[I_L - P_{\bar\Psi'\Sigma_{Z'X}}]\bar\Psi'\delta/\sigma_*^2, \qquad (3.34)$$

where $\mathrm{plim}\;\hat\sigma_u^2 = \sigma_*^2$ differs from $\sigma_u^2$ but is finite. It is easily seen from (3.34) that $\mathrm{plim}\; n^{-1}S = 0$ if $\delta = \Sigma_{Z'X}c$, which implies that $\mathrm{plim}\; S$ is finite, so the test will not always have asymptotic power equal to unity with respect to nonlocal alternatives. This illustrates that an insignificant test outcome is not necessarily comforting.

That the Sargan test is not consistent against some (local) alternatives has already been addressed by several studies, such as Newey (1985), Hall (2005) and more recently De Blander (2008) and Parente and Silva (2012). The recent attention this property of overidentifying restrictions tests has received is relatively surprising given the results of Newey (1985). Parente and Silva (2012) acknowledge that their remarks⁷ on this property are not new, but argue that little attention is paid to them in the literature. This is not completely true. Often the case in which $\delta$ lies in the column space of $\Sigma_{Z'X}$ is excluded by assuming at least $K$ elements of $\delta$ to be equal to zero. Suppose the first $K$ elements of $\delta$ are zero. Then, in order to satisfy the first $K$ equations of $\delta = \Sigma_{Z'X}c$, it is required that $c = 0$. However, $c = 0$ cannot be the solution to the last $L-K$ equations corresponding to nonzero elements of $\delta$. Hence, $\delta$ cannot lie in the column space of $\Sigma_{Z'X}$. So, under the assumption that at least $K$ instruments are valid, the test will always have asymptotic power equal to unity with respect to nonlocal alternatives.

⁷ Parente and Silva (2012) restate the problem in terms of the probability limit of the IV estimator, rather than the moment conditions. The Sargan test will have no power as long as it is possible to find a vector $(\beta^* - \beta)$ such that $\delta = \Sigma_{Z'X}(\beta^* - \beta)/\sigma_u$, where $\beta^*$ is the probability limit of the estimator under misspecification. Although this is claimed to be more intuitive, it yields no additional insights.
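The vanishing of (3.33) on the column space of $\Sigma_{Z'X}$ is easy to verify numerically. In the following sketch all population matrices are invented purely for illustration:

```python
# A small numerical check that the noncentrality parameter (3.33) vanishes
# when delta lies in the column space of Sigma_{Z'X}; Sigma_ZZ, Sigma_ZX,
# sigma2_u and c below are made-up illustration values.
import numpy as np

def noncentrality(delta, Sigma_ZZ, Sigma_ZX, sigma2_u):
    """lambda = delta' PsiBar [I_L - P_{PsiBar' Sigma_ZX}] PsiBar' delta / sigma_u^2."""
    L = Sigma_ZZ.shape[0]
    Psi = np.linalg.cholesky(np.linalg.inv(Sigma_ZZ))  # PsiBar PsiBar' = Sigma_ZZ^{-1}
    A = Psi.T @ Sigma_ZX                               # PsiBar' Sigma_ZX
    P = A @ np.linalg.solve(A.T @ A, A.T)              # projection onto col(A)
    v = Psi.T @ delta
    return v @ (np.eye(L) - P) @ v / sigma2_u

rng = np.random.default_rng(0)
Sigma_ZX = rng.standard_normal((3, 1))                 # L = 3, K = 1
print(noncentrality(Sigma_ZX @ np.array([0.5]),        # delta = Sigma_ZX c
                    np.eye(3), Sigma_ZX, 1.0))         # ~0 up to rounding
```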

Another special case exists in which the instruments are invalid while the resulting estimator is nevertheless consistent, although $S \xrightarrow{p} \infty$. So, even though the estimator is consistent, the overidentifying restrictions tests will warn the practitioner. This may happen if $\delta = \Sigma_{Z'Z}\kappa$ and $\kappa$ lies in the null space of $\Sigma_{Z'X}'$. To see this we examine the probability limit of $\hat\beta - \beta$:

$$\begin{aligned}
\mathrm{plim}(\hat\beta - \beta) &= \mathrm{plim}\,(X'P_Z X)^{-1}X'P_Z u \\
&= (\Sigma_{Z'X}'\Sigma_{Z'Z}^{-1}\Sigma_{Z'X})^{-1}\Sigma_{Z'X}'\Sigma_{Z'Z}^{-1}\delta \\
&= (\Sigma_{Z'X}'\Sigma_{Z'Z}^{-1}\Sigma_{Z'X})^{-1}\Sigma_{Z'X}'\Sigma_{Z'Z}^{-1}\Sigma_{Z'Z}\kappa \\
&= (\Sigma_{Z'X}'\Sigma_{Z'Z}^{-1}\Sigma_{Z'X})^{-1}\Sigma_{Z'X}'\kappa \\
&= 0. \qquad (3.35)
\end{aligned}$$

That $S \xrightarrow{p} \infty$ follows from the fact that $[I_L - P_{\bar\Psi'\Sigma_{Z'X}}]\bar\Psi'\delta = [I_L - P_{\bar\Psi'\Sigma_{Z'X}}]\bar\Psi'\Sigma_{Z'Z}\kappa \neq 0$. For additional details on the power of the Sargan test in nonlinear GMM see for instance Newey (1985) and Hall (2005).

The literature has mostly focused on the very specific case in which the Sargan test has no (local) power. In finite samples the power of these tests may very well depend on the extent to which $\delta$ lies in the column space of $\Sigma_{Z'X}$, even if formally the equality $\delta = \Sigma_{Z'X}c$ does not hold. We will measure to what extent $\delta$ lies in the column space of $\Sigma_{Z'X}$ in our simulations, as the problem is generally interpreted this way. Consider

$$\phi_\delta = \frac{\delta'\Sigma_{Z'X}(\Sigma_{Z'X}'\Sigma_{Z'X})^{-1}\Sigma_{Z'X}'\delta}{\delta'\delta}. \qquad (3.36)$$

This relative distance measure is the population $R^2$ when regressing $\delta$ on $\Sigma_{Z'X}$. Figure 3.1 illustrates the issues for $L=3$ and $K=1$. Then $\Sigma_{Z'X}$ and $\delta$ are both $3 \times 1$ vectors and the space spanned by $\Sigma_{Z'X}$ is just a straight line in $\mathbb{R}^3$. Hence, the vectors in (3.36) can be depicted in $\mathbb{R}^2$. Let the horizontal line be the space spanned by $\Sigma_{Z'X}$. The idea behind (3.36) is simple. If $\delta$ is orthogonal to the horizontal line, the angle $\alpha$ is 90 degrees. As $\delta$ approaches the column space of $\Sigma_{Z'X}$, this angle decreases. Hence, $\phi_\delta$ can be interpreted as the $R^2$ equivalent of the cosine of $\alpha$. The complexity of expressing nonorthogonality of $\delta$ and $\Sigma_{Z'X}$ increases for larger $K$. Estimating $\phi_\delta$ is hardly possible in practice. One can estimate $\Sigma_{Z'X}$ by $n^{-1}Z'X$ and compare it with a prior guess of $\delta$, although this obviously depends completely on the accuracy of the guess. However, $\phi_\delta$ may very well serve as an indicator in our simulations.
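In the simulations, $\phi_\delta$ can be evaluated directly from the design quantities. A minimal sketch (function name ours):

```python
# A sketch of the relative distance measure phi_delta of (3.36): the
# population R^2 of regressing delta on the columns of Sigma_{Z'X}.
import numpy as np

def phi_delta(delta, Sigma_ZX):
    fitted = Sigma_ZX @ np.linalg.solve(Sigma_ZX.T @ Sigma_ZX,
                                        Sigma_ZX.T @ delta)
    return (delta @ fitted) / (delta @ delta)
```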

As Parente and Silva (2012) argue, it seems that one should at least be wary if the motivation for all instruments is the same, as their correlations with the regressors and the disturbance term will most probably be similar too. In our simulations we will examine how similar these instruments need to be in order to hamper the analysis.

Figure 3.1: Illustration of $\phi_\delta$ (the vector $\delta$, its components $[I_L - P_{\Sigma_{Z'X}}]\delta$ and $P_{\Sigma_{Z'X}}\delta$, and the angle $\alpha$ between $\delta$ and the line $\mathcal{S}(\Sigma_{Z'X})$)

Subset tests

The power of the incremental test has been discussed less often than that of the regular Sargan test. Again, let us first consider local power.

Theorem 3.2 If $E[z_{1i}u_i]=0$ and $E[z_{2i}u_i]=\delta_2/\sqrt{n}$ $\forall i$, the asymptotic distribution of $S^I$ is chi-squared with $L_2$ degrees of freedom and noncentrality parameter

$$\lambda^I = \delta_2'\bar\Psi_2[I_{L_2} - \bar\Psi_2'\Sigma_{Z_2'X}(\Sigma_{Z'X}'\bar\Psi\bar\Psi'\Sigma_{Z'X})^{-1}\Sigma_{Z_2'X}'\bar\Psi_2]\bar\Psi_2'\delta_2/\sigma_u^2. \qquad (3.37)$$

Proof of this theorem is given in Appendix 3.A. The maintained hypothesis $E[z_{1i}u_i]=0$ $\forall i$ plays a crucial role in ensuring that $\lambda^I > 0$. In general $\lambda^I$ can be written as the difference of two noncentrality parameters, $\lambda^I = \lambda - \lambda_1$, where $\lambda_1$ is the noncentrality parameter of $S_1$. Under the maintained hypothesis $\lambda_1$ is zero, as $\delta_1 = 0$, and $\lambda^I = \lambda$. As $L_1 > K$ elements of $\delta$ are zero, it is guaranteed that $\lambda^I = \lambda$ is positive. Under the maintained hypothesis the incremental Sargan test has more local power than the regular Sargan test, as its asymptotic distribution has fewer degrees of freedom while the noncentrality parameters are the same. If the maintained hypothesis is also violated locally (so $E[z_i u_i]=\delta/\sqrt{n}$ $\forall i$), the noncentrality parameter can be shown, using (3.19), to be

$$\lambda^I = \lambda - \lambda_1 = \delta'\bar\Psi[I_L - P_{\bar\Psi'\Sigma_{Z'X}} - B]\bar\Psi'\delta/\sigma_u^2. \qquad (3.38)$$

From the fact that both $\lambda$ and $\lambda_1$ cannot be negative, it follows that $\lambda^I \leq \lambda$ if the maintained hypothesis is violated. The noncentrality parameter $\lambda^I$ will be zero if $\delta = \Sigma_{Z'X}c$, so exactly for the case in which the regular Sargan test has no local power. To see this, note that $\Sigma_{Z'X}c = \delta$ implies $\Sigma_{Z_1'X}c = \delta_1$, because $(c'\Sigma_{Z_1'X}',\ c'\Sigma_{Z_2'X}') = (\delta_1',\ \delta_2')$, and both $\lambda$ and $\lambda_1$ will be zero (and consequently $\lambda^I$ as well).

For the Hausman type test statistics the next theorem gives the noncentrality parameter (proof given in Appendix 3.A).

Theorem 3.3 If $E[z_{1i}u_i]=0$ and $E[z_{2i}u_i]=\delta_2/\sqrt{n}$ $\forall i$, the asymptotic distribution of $H$ is chi-squared with $\min\{L_2,K_2\}$ degrees of freedom and noncentrality parameter

$$\zeta = \delta_2'\bar\Psi_2\bar\Psi_2'\Sigma_{Z_2'X}(\Sigma_{Z'X}'\bar\Psi\bar\Psi'\Sigma_{Z'X})^{-1}\bar Q^{-}(\Sigma_{Z'X}'\bar\Psi\bar\Psi'\Sigma_{Z'X})^{-1}\Sigma_{Z_2'X}'\bar\Psi_2\bar\Psi_2'\delta_2/\sigma_u^2, \qquad (3.39)$$

where

$$\bar Q = (\Sigma_{Z_1'X}'\bar\Psi_1\bar\Psi_1'\Sigma_{Z_1'X})^{-1} - (\Sigma_{Z'X}'\bar\Psi\bar\Psi'\Sigma_{Z'X})^{-1}. \qquad (3.40)$$

Newey (1985) provides a proof that $\lambda^I = \zeta$ if $L_2 \leq K_2$. If $L_2 > K_2$, either test may have more local power, depending on the DGP. With respect to the nonlocal power of $S^I$ we obtain the nonzero right hand side of (3.34) for $\mathrm{plim}\; n^{-1}S^I$ under the maintained hypothesis of $\delta_1 = 0$, and the test statistic tends to infinity as $n$ grows. The same is true for the Hausman type tests as long as the two estimators do not have the same probability limit.⁸ So under the maintained hypothesis both statistics tend to infinity and are consistent. However, as shown by the local power analysis, the degree to which $\delta$ lies in the column space of $\Sigma_{Z'X}$ may substantially affect finite sample performance.

⁸ The two estimators have the same probability limit if $\delta = \Sigma_{Z'X}c$; see the proof of Theorem 3.3.

3.3 Neglecting heteroskedasticity

The previous section showed that overidentifying restrictions tests and their subset versions may have power equal to size under certain alternatives. Although the consistency of the IV estimator is unaffected, we will investigate to what degree heteroskedasticity may cause a significant test outcome and may thus lead practitioners to reject an efficient estimator. For convenience let us assume for the remainder of this section that $Z$ is random, that its rows are independent, and that the CLT yields

$$n^{-1/2}Z'u \xrightarrow{d} N(0,V), \qquad (3.41)$$

where $V$ is given by

$$V = \lim n^{-1}\sum_{i=1}^n E[u_i^2 z_i z_i']. \qquad (3.42)$$

Note that if the errors are homoskedastic, $V = \sigma_u^2\Sigma_{Z'Z}$. Following for instance Kiviet and Feng (2014), (3.42) can be written as

$$\begin{aligned}
V &= \lim n^{-1}\sum_{i=1}^n E[u_i^2 z_i z_i'] \\
&= \mathrm{plim}\; n^{-1}\sum_{i=1}^n u_i^2 z_i z_i' \qquad (3.43) \\
&= \mathrm{plim}\; n^{-1}\sum_{i=1}^n \sigma_0^2 z_i z_i' + \mathrm{plim}\; n^{-1}\sum_{i=1}^n (u_i^2 - \sigma_0^2)z_i z_i' \\
&= \sigma_0^2\Sigma_{Z'Z} + \mathrm{plim}\; n^{-1}\sum_{i=1}^n (u_i^2 - \sigma_0^2)z_i z_i', \qquad (3.44)
\end{aligned}$$

where $\sigma_0^2 = \mathrm{plim}\; n^{-1}\sum_{i=1}^n u_i^2$ is finite. Because $\hat\sigma_u^2$ no longer converges to $\sigma_u^2$ but rather to $\sigma_0^2$, $\sigma_0^2\Sigma_{Z'Z}$ is the probability limit of the estimate of $V$ used by IV. Hence, this estimate is consistent only if the second term of (3.44) equals zero. We find

$$\mathrm{plim}\; n^{-1}\sum_{i=1}^n (u_i^2 - \sigma_0^2)z_i z_i' = \lim n^{-1}\sum_{i=1}^n E\{E[u_i^2 - \sigma_0^2 \mid z_i]\,z_i z_i'\}. \qquad (3.45)$$

This term will only equal zero if $E[u_i^2 \mid z_i] = \sigma_0^2$, i.e. if the disturbances are conditionally homoskedastic. If however the disturbances are subject to conditional heteroskedasticity, $V \neq \sigma_u^2\Sigma_{Z'Z}$.

In terms of rejection probabilities of the Sargan test we will examine two extreme cases here. The result above implies that if $u_i^2$ and $z_i$ are positively correlated for $i=1,...,n$, large realizations of $z_i z_i'$ are under-weighted, whereas small realizations are over-weighted. Hence, in a matrix sense, $\sigma_0^2\Sigma_{Z'Z}$ is too small and $S$ will tend to overreject. By the same reasoning we would expect that a negative correlation between $u_i^2$ and $z_i$ for $i=1,...,n$ results in underrejection. It is therefore important to make sure that the errors are homoskedastic before relying on (3.8), or to use a heteroskedasticity robust test. Such a test statistic is

$$J = \hat u'Z\hat W^{-1}Z'\hat u, \qquad (3.46)$$

with $\hat W = \sum_{i=1}^n \hat u_i^2 z_i z_i'$.
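Computed from the one-step residuals, (3.46) takes only a few lines; again a sketch under the earlier assumptions (numpy/scipy, function name ours):

```python
# A sketch of the heteroskedasticity robust statistic J of (3.46), using
# one-step IV residuals u_hat and referred to chi^2(L-K).
import numpy as np
from scipy.stats import chi2

def robust_J(u_hat, Z, K):
    n, L = Z.shape
    Zu = Z.T @ u_hat                             # Z'u_hat
    W_hat = (Z * (u_hat**2)[:, None]).T @ Z      # sum_i u_hat_i^2 z_i z_i'
    J = Zu @ np.linalg.solve(W_hat, Zu)
    return J, chi2.sf(J, L - K)
```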

This test statistic is very similar to the Sargan-Hansen test and only differs in the residuals that are tested: the Sargan-Hansen test uses two-step GMM residuals, whereas $J$ uses one-step residuals. It is easy to obtain that $n^{-1}\hat W \xrightarrow{p} V$ and $J \xrightarrow{d} \chi^2(L-K)$; see for instance Arellano (2003) for a proof.

3.4 A higher order refinement to the Sargan test

As an alternative to standard asymptotic results we discuss some higher order results here. The Cornish-Fisher expansion and corrected test statistic derived by Magdalinos (1985) are re-examined. The corrected statistic is easily computed although rarely reported, whereas it may provide a substantial improvement over the usual statistics. Magdalinos (1985) lacks simulation results on this corrected statistic. We will only consider⁹ the correction based on IV estimates, a specific case of the derivation of Magdalinos (1985), who derives a Cornish-Fisher correction for the Sargan test based on a general k-class estimator.

⁹ As Magdalinos (1985) is not very clear, we present detailed derivations in Appendix 3.B.

Consider the reduced form equations

$$X = Z\Pi + V, \qquad (3.47)$$

where $\Pi$ is an $L \times K$ matrix of unknown coefficients and $V$ is an $n \times K$ matrix consisting of the reduced form disturbances. In this section we will assume that $(u_i\ V_i')' \sim N(0,\Lambda)$, with

$$\Lambda = \begin{pmatrix} \sigma_u^2 & \sigma_u\rho' \\ \sigma_u\rho & \Omega_V \end{pmatrix}. \qquad (3.48)$$

An Edgeworth expansion for $S$ of the form

$$S = s_0 + n^{-1/2}s_1 + n^{-1}s_2 + O_p(n^{-3/2}) \qquad (3.49)$$

is obtained by finding the expansions for $n^{1/2}(\hat\beta - \beta)/\hat\sigma_u$ and $\hat\sigma_u^2/\sigma_u^2$ and plugging them into the formula for $S$. Collecting terms of the same order then yields (3.49). Through the characteristic function, a higher order cumulative distribution function (CDF) is obtained:

$$\Pr(S \leq x) = F_{L-K}(x - n^{-1}q_1 x - n^{-1}q_2 x^2) + O(n^{-3/2}), \qquad (3.50)$$

with $F_{L-K}(x)$ the CDF of the chi-squared distribution with $L-K$ degrees of freedom,

$$q_1 = \mathrm{tr}(G^{-1}Q) - \mathrm{tr}(G^{-1}\Omega_X) + \tfrac{1}{2}[(L-K) + 2K + 1], \qquad (3.51)$$

$$q_2 = \mathrm{tr}(G^{-1}Q) - \tfrac{1}{2}, \qquad (3.52)$$

and

$$\hat\Pi = (Z'Z)^{-1}Z'X, \quad \hat\rho = X'\hat u/(n\hat\sigma_u), \quad G = n^{-1}\Pi'Z'Z\Pi, \quad \hat G = n^{-1}\hat\Pi'Z'Z\hat\Pi,$$
$$\Omega_X = G + \Omega_V, \quad \hat\Omega_X = X'X/n, \quad Q = \rho\rho', \quad \hat Q = \hat\rho\hat\rho'.$$

An unfeasible corrected Sargan statistic is then calculated from

$$S_c = S - n^{-1}q_1 S - n^{-1}q_2 S^2, \qquad (3.53)$$

with feasible variant

$$\hat S_c = S - n^{-1}\hat q_1 S - n^{-1}\hat q_2 S^2. \qquad (3.54)$$

Estimates $\hat q_1$ and $\hat q_2$ are easily obtained by substituting $\hat G$, $\hat Q$ and $\hat\Omega_X$ in (3.51) and (3.52). The corrected test statistic in (3.54) should in finite samples behave more like a $\chi^2(L-K)$ distributed variable than $S$ (under normality of the disturbances). However, in order for this test to be useful in practice, not only should it behave well under the null hypothesis, it should also have acceptable power. It is easily shown that $S = O_p(1)$ under the null hypothesis. Under nonlocal alternatives $Z'u = O_p(n)$, which means that $S = O_p(n)$. The first correction term in (3.54) is then $O_p(1)$, whereas the second is $O_p(n)$. As a result $S$ is corrected by an $O_p(n)$ term, which might give problems under the alternative hypothesis, especially when $q_1$ and $q_2$ are hard to estimate (for instance when the instruments are weak).

Even if the corrected test statistic is not found to be useful in practice, the correction terms offer information about nuisance parameters. The first part of interest is the occurrence of $\mathrm{tr}(G^{-1}\rho\rho') = \rho'G^{-1}\rho$ in both $q_1$ and $q_2$. This means that the degree of simultaneity of the endogenous regressors is of importance. Furthermore, the trace of $G^{-1}\Omega_X$ is present in $q_1$. This is a measure of the variance of the errors in the reduced form of $X$ relative to the signal. Furthermore, the degree of overidentification and the number of regressors occur in $q_1$. These factors should provide guidelines for examining the finite sample performance of the various test statistics.
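Given (3.51)-(3.54), the feasible correction is cheap to compute. A sketch plugging in the sample counterparts defined above (function name and interface ours; the exact constant in our rendering of (3.51) should be checked against Magdalinos, 1985, before serious use):

```python
# A sketch of the feasible Cornish-Fisher corrected Sargan statistic (3.54),
# with q1_hat and q2_hat obtained from (3.51)-(3.52).
import numpy as np

def corrected_sargan(S, X, Z, u_hat, sigma2):
    n, K = X.shape
    L = Z.shape[1]
    Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)        # (Z'Z)^{-1} Z'X
    G_hat = Pi_hat.T @ (Z.T @ Z) @ Pi_hat / n         # n^{-1} Pi' Z'Z Pi
    Omega_hat = X.T @ X / n
    rho_hat = X.T @ u_hat / (n * np.sqrt(sigma2))
    Q_hat = np.outer(rho_hat, rho_hat)
    Ginv = np.linalg.inv(G_hat)
    q1 = (np.trace(Ginv @ Q_hat) - np.trace(Ginv @ Omega_hat)
          + 0.5 * ((L - K) + 2 * K + 1))
    q2 = np.trace(Ginv @ Q_hat) - 0.5
    return S - q1 * S / n - q2 * S**2 / n             # eq. (3.54)
```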

3.5 Simulation design

In order to compare the performance in finite samples of the various techniques discussed, we design a Monte Carlo simulation. Consider the model

$$y_i = \beta_0 + x_i\beta_1 + \sigma_i u_i, \qquad (3.55)$$
$$x_i = \pi_0 + z_{1i}\pi_1 + z_{2i}\pi_2 + z_{3i}\pi_3 + \sigma_\eta\eta_i + \rho u_i, \qquad (3.56)$$

for $i=1,...,n$. So $K=2$, $L=4$ and the degree of overidentification is $L-K=2$. Because the statistics to be analyzed are invariant with respect to the values of the intercepts, these will both be set to zero, thus $\beta_0 = \pi_0 = 0$. Additionally we may choose $\beta_1 = 0$, because the residuals, and thus the test statistics, are not determined by $\beta_1$. The disturbance series $u_i$ and $\eta_i$ will be generated as mutually independent NID(0,1) series. The instruments are generated according to

$$z_{ji} = (1-\gamma_j^2)^{1/2}z_{ji}^* + \gamma_j u_i, \quad j=1,2,3, \qquad (3.57)$$

with $z_{ji}^*$ also drawn from mutually independent NID(0,1) series. We require $|\gamma_j| < 1$ and it follows that $\mathrm{Var}(z_{ji})=1$ $\forall j$. Without loss of generality we may choose $\sigma_u = 1$. Another simplification comes from setting the variance of $x_i$ to unity. This is achieved by choosing $\sigma_\eta$, if feasible, appropriately in

$$\mathrm{Var}(x_i) = \pi_1^2 + \pi_2^2 + \pi_3^2 + \sigma_\eta^2 + \rho^2 + 2\pi_1\gamma_1\rho + 2\pi_2\gamma_2\rho + 2\pi_3\gamma_3\rho + 2\pi_1\pi_2\gamma_1\gamma_2 + 2\pi_1\pi_3\gamma_1\gamma_3 + 2\pi_2\pi_3\gamma_2\gamma_3 = 1. \qquad (3.58)$$

The heteroskedasticity (obtained by setting $\theta = 1$ instead of $\theta = 0$) is generated as follows. Define

$$h_i(\theta) = -\theta^2/2 + \frac{\theta}{\sqrt{3}}[z_{1i} + z_{2i} + z_{3i}]. \qquad (3.59)$$

As we are interested in the effect of either a positive or a negative correlation between $\sigma_i^2$ and $z_i$, we set $\sigma_i^2 = e^{h_i(\theta)}$ to establish the former and $\sigma_i^2 = e^{-h_i(\theta)}$ to generate the latter. Afterwards $\sigma_i^2$ is rescaled by dividing it by its sample standard deviation.

We are left with seven free parameters for the DGP: $\{\rho, \pi_1, \pi_2, \pi_3, \gamma_1, \gamma_2, \gamma_3\}$. Instead of directly choosing values for these DGP parameters, we define seven econometrically relevant design parameters that are functions of the DGP parameters. One obvious design parameter is the degree of simultaneity in $x$,

$$\rho_{xu} = \mathrm{Cov}(x_i,u_i)/\sqrt{\mathrm{Var}(x_i)} = \pi_1\gamma_1 + \pi_2\gamma_2 + \pi_3\gamma_3 + \rho, \qquad (3.60)$$

which under the null hypothesis of valid instruments ($\gamma_1 = \gamma_2 = \gamma_3 = 0$) simplifies to $\rho$, and it follows that $|\rho| < 1$. Three other design parameters are the degrees of simultaneity in $z_1$, $z_2$ and $z_3$,

$$\rho_{z_ju} = \mathrm{Cov}(z_{ji},u_i)/\sqrt{\mathrm{Var}(z_{ji})} = \gamma_j, \quad j=1,2,3. \qquad (3.61)$$

Next we define the marginal instrument strength of $z_1$, $z_2$ and $z_3$ for $x$. Note that this definition is only meaningful if all instruments are valid. We define

$$R^2_{z_j} = \pi_j^2, \quad j=1,2,3, \qquad (3.62)$$

with the natural restriction that $R^2_{z_1} + R^2_{z_2} + R^2_{z_3} < 1$. If this restriction is not obeyed, one cannot find a positive $\sigma_\eta$ such that (3.58) holds. By choosing values for the marginal instrument strengths we may solve

$$\pi_j = \pm\sqrt{R^2_{z_j}}. \qquad (3.63)$$

Alternatively, under the null hypothesis we will consider situations in which all instruments have equal strength, so $\pi_1 = \pi_2 = \pi_3 = \pi$. Consider the concentration parameter

$$\mu^2 = \frac{n(\pi_1^2 + \pi_2^2 + \pi_3^2)}{1 - \pi_1^2 - \pi_2^2 - \pi_3^2} = \frac{3n\pi^2}{1 - 3\pi^2}, \qquad (3.64)$$

from which we may solve

$$\pi = \pm\sqrt{\frac{\mu^2}{3(n + \mu^2)}}. \qquad (3.65)$$

A popular measure for weakness of instruments is the first stage $F$-statistic. Stock et al. (2002) provide an approximate relation between $\mu^2$ and $F$, which in our specific case states that $E[F] \approx \mu^2/3 + 1$.
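A sketch of one draw from this design follows, restricted for brevity to valid instruments of equal strength ($\gamma_j = 0$, $\pi_j = \pi$ solved from (3.65)); names and interface are ours:

```python
# A sketch of one sample from the DGP (3.55)-(3.59) under the null
# (gamma_j = 0), with sigma_eta solving (3.58) and pi solving (3.65).
import numpy as np

def draw_sample(n, rho, mu2, theta=0.0, positive=True, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    pi = np.sqrt(mu2 / (3 * (n + mu2)))            # equal strength, eq. (3.65)
    sigma_eta = np.sqrt(1 - 3 * pi**2 - rho**2)    # requires 3*pi^2 + rho^2 < 1
    u = rng.standard_normal(n)
    eta = rng.standard_normal(n)
    Z3 = rng.standard_normal((n, 3))               # z_1, z_2, z_3 (all valid)
    x = pi * Z3.sum(axis=1) + sigma_eta * eta + rho * u
    if theta == 0.0:
        sig2 = np.ones(n)                          # homoskedastic case
    else:
        h = -theta**2 / 2 + theta * Z3.sum(axis=1) / np.sqrt(3)  # eq. (3.59)
        sig2 = np.exp(h if positive else -h)
        sig2 = sig2 / sig2.std()                   # rescale as described above
    y = np.sqrt(sig2) * u                          # beta_0 = beta_1 = 0
    ones = np.ones((n, 1))
    return y, np.hstack([ones, x[:, None]]), np.hstack([ones, Z3])
```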

With respect to the design parameters we will investigate combinations from

$$\rho_{xu} \in [0, 0.6], \quad \rho_{z_1u} \in [0, 0.2], \quad \rho_{z_2u} \in [0, 0.2], \quad \rho_{z_3u} \in [0, 0.6], \quad R^2_1 \in \{0.01, 0.2\}, \quad R^2_2 \in \{0.01, 0.2\}, \quad R^2_3 \in \{0.01, 0.2\}, \qquad (3.66)$$

where, since $z_1$ and $z_2$ are interchangeable, we restrict attention to ordered combinations of $R^2_{z_1}$ and $R^2_{z_2}$. Alternatively, when all instruments are assumed to have equal strength in the reduced form equation, we will consider combinations involving $\mu^2 \in \{10, 30, 80\}$.

3.6 Simulation findings on rejection probabilities

In each of the $R$ simulation replications, new independent observations are drawn on $u$, $\eta$, $z_1$, $z_2$ and $z_3$. The four test statistics $S$, $\hat S_c$, $S^B$ and $J$ are calculated when employing either the full set of instruments $Z = (\iota\ z_1\ z_2\ z_3)$ or a reduced set of instruments $Z_1 = (\iota\ z_1\ z_2)$. Furthermore, the subset tests $S^I$, $S^I_B$, $H$, $S^I_H$ and $J^I$ will be calculated for testing the subset $Z_2 = z_3$. In every replication it is checked whether or not the appropriate null hypothesis is rejected at the 5% significance level. From this we construct Monte Carlo estimates $\hat p$ of the rejection probabilities, with standard error

$$SE(\hat p) = \sqrt{\hat p(1-\hat p)/R}. \qquad (3.67)$$

First we will investigate the performance of these test statistics under their respective null hypotheses, so $\rho_{z_ju} = 0$ for $j=1,2,3$. Furthermore, we choose $R = $ … and $n = 50$ in order to be able to observe small sample behaviour. Figures 3.2 and 3.3 graph the estimated rejection probabilities as a function of $\rho_{xu}$ when $\mu^2 = 80$ (so all instruments are relatively strong, $E[F] \approx 28$). When testing all overidentifying restrictions we have to compare the test statistics with the critical value from a chi-squared distribution with two degrees of freedom. When using only $Z_1$ as instruments, the corresponding number of degrees of freedom is one. Although the degree of simultaneity in $x$ should asymptotically not affect the performance of the test statistics, the Cornish-Fisher correction terms suggest otherwise. Indeed, as $\rho_{xu}$ approaches unity, the regular Sargan and Basmann statistics are affected in the sense that they tend to reject more often than for $\rho_{xu} = 0$. The Basmann statistic underrejects if $\rho_{xu} = 0$, in which case the regular Sargan statistic behaves very well. The heteroskedasticity robust test statistic $J$ tends to overreject for every $\rho_{xu}$, which also gets worse as $\rho_{xu}$ grows. It seems that robustness comes at a price. The corrected Sargan test statistic, on the other hand, behaves very well over the whole range of $\rho_{xu}$. When testing fewer overidentifying restrictions, the rejection probabilities are closer to each other and the curves are flatter.

Figure 3.4 depicts the same type of graph but now for the subset tests. These all have one degree of freedom as they test the validity of $z_3$. The performance of the standard Hausman test statistic $H$ is precarious, whereas using the same estimate of $\sigma_u^2$ twice ($S^I_H$) actually yields the best performance.
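Schematically, the rejection-frequency estimates and their standard errors (3.67) come from a loop of the following form, combining the earlier sketches (shown here for the Sargan statistic with the full instrument set; $R$ and the tested statistic vary per experiment):

```python
# A sketch of the replication loop behind the reported rejection frequencies,
# for the Sargan statistic S with the full instrument set (L - K = 2).
import numpy as np
from scipy.stats import chi2

def rejection_frequency(R, n, rho, mu2, alpha=0.05, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    crit = chi2.ppf(1 - alpha, 2)                 # chi^2(2) critical value
    hits = 0
    for _ in range(R):
        y, X, Z = draw_sample(n, rho, mu2, rng=rng)
        _, u_hat, _, _ = iv_estimator(y, X, Z)
        S, _, _, _ = sargan_basmann(u_hat, Z, X.shape[1])
        hits += int(S > crit)
    p_hat = hits / R
    return p_hat, np.sqrt(p_hat * (1 - p_hat) / R)   # eq. (3.67)
```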

After decreasing the strength of the instruments by taking $\mu^2 = 30$ (mildly strong instruments, $E[F] \approx 11$), we observe the same patterns in Figures 3.5, 3.6 and 3.7. However, the test statistics are a bit more sensitive to changes in $\rho_{xu}$. Also the subset tests now show increasing rejection probabilities as $\rho_{xu}$ grows. Under weak instruments, by taking $\mu^2 = 10$ ($E[F] \approx 4$), we obtain Figures 3.8, 3.9 and 3.10. Again the patterns have become more extreme, and even the corrected Sargan test overrejects as $\rho_{xu}$ becomes large. Testing fewer overidentifying restrictions again improves the results. Also the subset tests reject more often when the instruments are weak.

When the errors are heteroskedastic and $\sigma_i^2$ and $z_i$ are positively correlated (denoted by "+"), we obtain Figures 3.11, 3.12 and 3.13 for $\mu^2 = 80$. Although $J$ remains oversized, it outperforms the other test statistics by a considerable margin. As predicted, every non-robust statistic overrejects. No difference between testing $Z$ and $Z_1$ becomes apparent. The effect of heteroskedasticity is slightly less serious for the subset tests, although still very much present. When $\sigma_i^2$ and $z_i$ are negatively correlated (denoted by "-"), we obtain Figures 3.14 and 3.15. Indeed, all non-robust test statistics underreject severely. Hence, when testing overidentifying restrictions or testing subsets of instruments, the homoskedasticity assumption should always be justified when using non-robust statistics.

Findings under the alternative hypothesis will not be size corrected. However, given the dependence of the test statistics on $\rho_{xu}$, we choose $\rho_{xu} = 0$, as for that case only $J$ and $J^I$ tended to overreject. We set $\rho_{z_1u} = \rho_{z_2u} = 0$ and consider various values of $\rho_{z_3u}$ ranging from $-0.7$ to $0.7$. For $\mu^2 = 80$ the results are given in Figures 3.16 and 3.17. As the Basmann statistic will have exactly the same power properties as the Sargan statistic after proper size correction, we will not discuss it any longer. With respect to testing all overidentifying restrictions, we find that the heteroskedasticity robust statistic has less power than the corrected and regular Sargan test, as it rejected more often under the null but less frequently under the alternative hypothesis. Regarding the subset tests, $H$ and $S^I_H$ perform best. Even though $H$ was found to underreject when $\rho_{xu} = 0$, its power curve is actually the steepest. However, it must not be forgotten that this test statistic is highly sensitive to large values of $\rho_{xu}$. In general the subset tests have more power than the regular tests, as should be expected. Reducing the strength of the instruments results in a loss of power, as can be seen from Figures 3.18, 3.19, 3.20 and 3.21. Especially when $\mu^2 = 10$, the corrected Sargan statistic and the subset Hausman statistic $H$ suffer.

To investigate the effect of $\phi_\delta$ in (3.36) we must choose nonzero values for $\rho_{z_1u}$ and/or $\rho_{z_2u}$. The same plots will be used, although two additional lines will represent $\phi_\delta$ (dashed) and the noncentrality parameter (solid). First we examine a case in which $\phi_\delta$ equals one for a specific value of $\rho_{z_3u}$. In Figures 3.22 and 3.23, $\rho_{z_1u} = \rho_{z_2u} = 0.2$ and $R^2_{z_j} = 0.2$ for $j = 1,2,3$. Hence, $\rho_{z_3u} = 0.2$ corresponds to $\phi_\delta = 1$. The power plots are shifted in the direction of the point at which $\phi_\delta = 1$. Even though the orthogonality conditions are violated more, the standard overidentifying restrictions tests reject less often when $\rho_{z_3u} = 0.2$ than when $\rho_{z_3u} = 0$. The same is found for the subset tests, of which the maintained hypothesis is now violated. These figures also offer the possibility to see the difference between $\phi_\delta$ and the noncentrality parameter, which is interesting as $\phi_\delta$ should give information about the noncentrality parameter. Although they reach their respective maximum and minimum at the same value of $\rho_{z_3u}$, their behaviour in the neighborhood of this point is very different. The reason for this is that the noncentrality parameter actually examines the fit of $\bar\Psi'\delta$ on the space spanned by the columns of $\bar\Psi'\Sigma_{Z'X}$. So although the special case in which the tests have power equal to size can be written in terms of $\phi_\delta$, we observe that the behaviour in the neighborhood of this special case is more complicated and even harder to anticipate in practice.

Next we consider a similar case in which $\rho_{z_2u} = 0$. So in Figures 3.24 and 3.25, $\phi_\delta$ is never equal to one. Again the power plots are shifted. Testing all overidentifying restrictions, we find that the tests have power over the whole range of $\rho_{z_3u}$. This is not true for the subset tests, which are found to suffer more from the fact that the maintained hypothesis is violated. Only when $\rho_{z_3u}$ is large do the subset tests reject more often than the tests on all overidentifying restrictions. Summarizing our findings with respect to violations of the maintained hypothesis, we note that the rejection function is shifted in the direction of the maximum of $\phi_\delta$.

3.7 Conclusions

In this study we discuss various implementations of overidentifying restrictions tests, which are all variations on the Sargan test. A Cornish-Fisher corrected test statistic proposed by Magdalinos (1985) is reviewed and compared with the standard statistics. Its correction terms show that the standard implementations are sensitive to the degree of simultaneity in the explanatory variables and the degree of overidentification. In addition to the standard test statistics we consider their incremental versions and two Hausman-type test statistics. Recent work that discusses the lack of power of these tests is clarified and extended to the incremental tests. The potential effects of (un)conditional heteroskedasticity on robust and non-robust statistics are derived. In order to investigate small sample behaviour, a series of Monte Carlo experiments is conducted.

With respect to behaviour under the null hypothesis, we find that all regular test statistics tend to overreject as the degree of simultaneity in the regressor becomes large.

The corrected Sargan test does not suffer from this and performs satisfactorily for most cases considered. Only when the instruments are very weak does the corrected Sargan test exhibit the same dependence, although still less extremely. The same dependence is found for the incremental tests. The standard Hausman test performs worse than the Sargan and Basmann tests, whereas a version that uses the same estimate of $\sigma_u^2$ (estimated under the null hypothesis) twice performs best. The heteroskedasticity robust test statistics are found to overreject substantially under homoskedasticity but also noticeably under heteroskedasticity, and it seems that robustness comes at a price.

In order to investigate rejection probabilities under the alternative, we consider a case in which all tests offered the best performance under the null, which is when the possibly endogenous regressor is actually exogenous. We find that even when the instruments are relatively weak, the tests have substantial power in small samples. The corrected Sargan test performs worse than the other tests when we simulate DGPs far from the null hypothesis and when instruments are too weak. If only a subset of the overidentifying restrictions is invalid, the incremental tests that test the validity of this particular subset have more power than the regular tests, unless the maintained hypothesis is violated. Also under the alternative, the test which uses the same estimate of $\sigma_u^2$ twice performs best. If the maintained hypothesis is violated, the rejection functions of all test statistics shift in the direction in which the expectation of the moment vector function ($n^{-1}Z'u$) lies closest to the column space of its Jacobian.

All results are obtained from simulation experiments with Gaussian disturbances. Because the corrected Sargan test is derived under the assumption of Gaussian disturbances, it is interesting to examine its performance when the disturbances follow alternative distributions. Another limitation is that we concentrate on cases in which the degree of overidentification is relatively small and do not consider cases in which $L-K$ becomes large. These subjects are left for future research.

Appendix 3.A Proofs of theorems

Proof of Theorem 3.1
First we must show that under the local alternative $\hat\sigma_u^2 \xrightarrow{p} \sigma_u^2$. If $E[z_i u_i]=\delta/\sqrt{n}$,

$$\begin{aligned}
\hat u'\hat u/n &= (y - X\hat\beta)'(y - X\hat\beta)/n \\
&= u'[I_n - P_Z X(X'P_Z X)^{-1}X'][I_n - X(X'P_Z X)^{-1}X'P_Z]u/n \\
&= u'u/n - u'P_Z X(X'P_Z X)^{-1}X'u/n - u'X(X'P_Z X)^{-1}X'P_Z u/n \\
&\quad + u'P_Z X(X'P_Z X)^{-1}X'X(X'P_Z X)^{-1}X'P_Z u/n.
\end{aligned}$$

For the first term we find $u'u/n \xrightarrow{p} \sigma_u^2$. Furthermore, $\mathrm{plim}\; u'P_Z X(X'P_Z X)^{-1}X'u/n = 0$ and $\mathrm{plim}\; u'P_Z X(X'P_Z X)^{-1}X'X(X'P_Z X)^{-1}X'P_Z u/n = 0$, where we use that $n^{-1/2}Z'u = O_p(1)$ and $Z'u/n \xrightarrow{p} 0$. Using (3.9) we may write $S$ as

$$S = n^{-1}u'Z\Psi[I_L - P_{\Psi'Z'X}]\Psi'Z'u/\hat\sigma_u^2. \qquad (3.68)$$

As $n^{-1/2}Z'u \xrightarrow{d} N(\delta, \sigma_u^2\Sigma_{Z'Z})$, $n^{-1/2}\Psi'Z'u/\hat\sigma_u \xrightarrow{d} N(\bar\Psi'\delta/\sigma_u, I_L)$ and $\mathrm{plim}[I_L - P_{\Psi'Z'X}] = I_L - P_{\bar\Psi'\Sigma_{Z'X}}$ with rank $L-K$, we find $S \xrightarrow{d} \chi^2(\lambda, L-K)$, where

$$\lambda = \delta'\bar\Psi[I_L - P_{\bar\Psi'\Sigma_{Z'X}}]\bar\Psi'\delta/\sigma_u^2. \qquad (3.69)$$

Proof of Theorem 3.2
We start from the expression for $S^I$ in (3.19),

$$S^I = n^{-1}u'Z\Psi[(I_L - P_{\Psi'Z'X})/\hat\sigma_u^2 - A'(I_{L_1} - P_{\Psi_1'Z_1'X})A/\hat\sigma_{1,u}^2]\Psi'Z'u. \qquad (3.70)$$

Under the local alternative $\hat\sigma_u^2 \xrightarrow{p} \sigma_u^2$, $\hat\sigma_{1,u}^2 \xrightarrow{p} \sigma_u^2$ and $n^{-1/2}\Psi'Z'u/\sigma_u \xrightarrow{d} N(\bar\Psi'\delta/\sigma_u, I_L)$. Using that $B = \mathrm{plim}\; A'(I_{L_1} - P_{\Psi_1'Z_1'X})A$ and that $[I_L - P_{\bar\Psi'\Sigma_{Z'X}} - B]$ is idempotent with rank $L_2$, we find

$$S^I \xrightarrow{d} \chi^2(\lambda^I, L_2), \qquad (3.71)$$

with noncentrality parameter

$$\lambda^I = \delta'\bar\Psi[I_L - P_{\bar\Psi'\Sigma_{Z'X}} - B]\bar\Psi'\delta/\sigma_u^2. \qquad (3.72)$$

This noncentrality parameter can be written as

$$\begin{aligned}
\lambda^I &= \delta'\bar\Psi[I_L - P_{\bar\Psi'\Sigma_{Z'X}} - B]\bar\Psi'\delta/\sigma_u^2 \\
&= \delta'\bar\Psi[I_L - P_{\bar\Psi'\Sigma_{Z'X}}]\bar\Psi'\delta/\sigma_u^2 - \delta'\bar\Psi B\bar\Psi'\delta/\sigma_u^2 \\
&= \lambda - \delta'\bar\Psi A'[I_{L_1} - P_{\bar\Psi_1'\Sigma_{Z_1'X}}]A\bar\Psi'\delta/\sigma_u^2 \\
&= \lambda - \delta_1'\bar\Psi_1[I_{L_1} - P_{\bar\Psi_1'\Sigma_{Z_1'X}}]\bar\Psi_1'\delta_1/\sigma_u^2 \\
&= \lambda - \lambda_1,
\end{aligned}$$

where $A\bar\Psi'\delta = \bar\Psi_1'\delta_1$ is used once more. Under the maintained hypothesis of $\delta = (0',\delta_2')'$ we obtain $\lambda_1 = 0$ and $\lambda^I = \lambda$, which reduces to

$$\lambda^I = \delta_2'\bar\Psi_2[I_{L_2} - \bar\Psi_2'\Sigma_{Z_2'X}(\Sigma_{Z'X}'\bar\Psi\bar\Psi'\Sigma_{Z'X})^{-1}\Sigma_{Z_2'X}'\bar\Psi_2]\bar\Psi_2'\delta_2/\sigma_u^2. \qquad (3.73)$$

Proof of Theorem 3.3
Let us first restate the Hausman statistic $H$ as

$$H = \sqrt{n}(\hat\beta_1 - \hat\beta)'\{n[\hat\sigma_{1,u}^2(X'P_{Z_1}X)^{-1} - \hat\sigma_u^2(X'P_Z X)^{-1}]\}^{-}\sqrt{n}(\hat\beta_1 - \hat\beta). \qquad (3.74)$$

Using equation (2.11) from the previous chapter we can write

$$\begin{aligned}
\sqrt{n}(\hat\beta_1 - \hat\beta) &= \sqrt{n}(X'P_{Z_1}X)^{-1}X'P_{Z_1}\hat u \\
&= \sqrt{n}(X'P_{Z_1}X)^{-1}X'P_{Z_1}[I_n - X(X'P_Z X)^{-1}X'P_Z]u \\
&= n^{-1/2}(X'P_{Z_1}X)^{-1}X'Z_1\Psi_1\Psi_1'Z_1'[I_n - X(X'P_Z X)^{-1}X'P_Z]u \\
&= n^{-1/2}(X'P_{Z_1}X)^{-1}X'Z_1\Psi_1 A\Psi'Z'[I_n - X(X'P_Z X)^{-1}X'P_Z]u \\
&= n(X'Z_1\Psi_1\Psi_1'Z_1'X)^{-1}X'Z_1\Psi_1 A[I_L - P_{\Psi'Z'X}]\,n^{-1/2}\Psi'Z'u, \qquad (3.75)
\end{aligned}$$

from which we can observe that $\mathrm{plim}(\hat\beta - \hat\beta_1) = 0$ if $\delta = \Sigma_{Z'X}c$. Furthermore, we denote $\bar Q = \mathrm{plim}\; n[(X'P_{Z_1}X)^{-1} - (X'P_Z X)^{-1}]$. Using again that $n^{-1/2}\Psi'Z'u/\sigma_u \xrightarrow{d} N(\bar\Psi'\delta/\sigma_u, I_L)$, the noncentrality parameter is found to be

$$\delta'\bar\Psi[I_L - P_{\bar\Psi'\Sigma_{Z'X}}]A'\bar\Psi_1'\Sigma_{Z_1'X}(\Sigma_{Z_1'X}'\bar\Psi_1\bar\Psi_1'\Sigma_{Z_1'X})^{-1}\bar Q^{-}(\Sigma_{Z_1'X}'\bar\Psi_1\bar\Psi_1'\Sigma_{Z_1'X})^{-1}\Sigma_{Z_1'X}'\bar\Psi_1 A[I_L - P_{\bar\Psi'\Sigma_{Z'X}}]\bar\Psi'\delta/\sigma_u^2.$$

Under the maintained hypothesis $\delta = (0',\delta_2')'$. Applying this to the second line of the noncentrality parameter we find

$$\begin{aligned}
(\Sigma_{Z_1'X}'\bar\Psi_1\bar\Psi_1'\Sigma_{Z_1'X})^{-1}&\Sigma_{Z_1'X}'\bar\Psi_1 A[I_L - P_{\bar\Psi'\Sigma_{Z'X}}]\bar\Psi'\delta \\
&= (\Sigma_{Z_1'X}'\bar\Psi_1\bar\Psi_1'\Sigma_{Z_1'X})^{-1}\Sigma_{Z_1'X}'\bar\Psi_1 A\bar\Psi'\delta - (\Sigma_{Z_1'X}'\bar\Psi_1\bar\Psi_1'\Sigma_{Z_1'X})^{-1}\Sigma_{Z_1'X}'\bar\Psi_1 AP_{\bar\Psi'\Sigma_{Z'X}}\bar\Psi'\delta \\
&= (\Sigma_{Z_1'X}'\bar\Psi_1\bar\Psi_1'\Sigma_{Z_1'X})^{-1}\Sigma_{Z_1'X}'\bar\Psi_1(I_{L_1}\ O)\begin{pmatrix}0\\ \bar\Psi_2'\delta_2\end{pmatrix} - (\Sigma_{Z_1'X}'\bar\Psi_1\bar\Psi_1'\Sigma_{Z_1'X})^{-1}\Sigma_{Z_1'X}'\bar\Psi_1\bar\Psi_1'\Sigma_{Z_1'X}(\Sigma_{Z'X}'\bar\Psi\bar\Psi'\Sigma_{Z'X})^{-1}\Sigma_{Z_2'X}'\bar\Psi_2\bar\Psi_2'\delta_2 \\
&= -(\Sigma_{Z'X}'\bar\Psi\bar\Psi'\Sigma_{Z'X})^{-1}\Sigma_{Z_2'X}'\bar\Psi_2\bar\Psi_2'\delta_2,
\end{aligned}$$

yielding $\zeta$ in (3.39).

Appendix 3.B Details on the corrected statistic

In order to derive the expansion in (3.49), we first employ an Edgeworth approximation for $d \equiv n^{1/2}(\hat\beta - \beta)/\sigma_u$ and write, with $u^* = u/\sigma_u$,

$$d = d_0 + n^{-1/2}d_1 + n^{-1}d_2 + O_p(n^{-3/2}),$$

with

$$\begin{aligned}
d_0 &= n^{-1/2}G^{-1}\Pi'Z'u^* \\
d_1 &= G^{-1}V'Hu^* - Dd_0 \\
d_2 &= D^2 d_0 - [DG^{-1} + G^{-1}D']V'Hu^* - G^{-1}V'HVd_0 \\
H &= P_Z - P_{Z\Pi} \\
D &= n^{-1/2}G^{-1}\Pi'Z'V.
\end{aligned}$$

This is found by writing

$$\begin{aligned}
d &= (n^{-1}X'P_Z X)^{-1}\,n^{-1/2}X'P_Z u^* \\
&= (n^{-1}\Pi'Z'Z\Pi + n^{-1}\Pi'Z'V + n^{-1}V'Z\Pi + n^{-1}V'P_Z V)^{-1}(n^{-1/2}\Pi'Z'u^* + n^{-1/2}V'P_Z u^*) \\
&= G^{-1}[I + (n^{-1}\Pi'Z'V + n^{-1}V'Z\Pi + n^{-1}V'P_Z V)G^{-1}]^{-1}(n^{-1/2}\Pi'Z'u^* + n^{-1/2}V'P_Z u^*).
\end{aligned}$$

Employing the expansion series $\frac{1}{1+x} = 1 - x + x^2 - \cdots$ and writing the matrix between square brackets as $I + n^{-1}TG^{-1}$, we obtain for its inverse the approximation $I - n^{-1}TG^{-1} + (n^{-1}TG^{-1})^2 - \cdots$. Collecting terms of the same order yields the approximation above. Additionally, by the same methodology we find an approximation for $\hat\sigma_u^2/\sigma_u^2$ of the form

$$\hat\sigma_u^2/\sigma_u^2 = 1 + n^{-1/2}\sigma_1 + n^{-1}\sigma_2 + O_p(n^{-3/2}),$$

with

$$\begin{aligned}
\sigma_1 &= g - 2\delta'd_0 \\
\sigma_2 &= d_0'\Omega_X d_0 - 2d_0'q - 2\delta'd_1 \\
g &= n^{1/2}[n^{-1}u^{*\prime}u^* - 1] \\
q &= n^{-1/2}\Pi'Z'u^* + n^{1/2}[n^{-1}V'u^* - \delta],
\end{aligned}$$

where in this appendix $\delta$ denotes $\rho$ from (3.48). Now rewrite $S$ in such a way that it can be expressed in $d$:

$$S = [u^{*\prime}P_Z u^* - 2n^{-1/2}u^{*\prime}Z(Z'Z)^{-1}Z'Xd + n^{-1}d'X'P_Z Xd]/(\hat\sigma_u^2/\sigma_u^2).$$

By filling in $d$ and noting that $1/(\hat\sigma_u^2/\sigma_u^2) = 1 - n^{-1/2}\sigma_1 - n^{-1}(\sigma_2 - \sigma_1^2) + O_p(n^{-3/2})$, we find

$$S = s_0 + n^{-1/2}s_1 + n^{-1}s_2 + O_p(n^{-3/2}),$$

with

$$\begin{aligned}
s_0 &= u^{*\prime}Hu^* \\
s_1 &= -u^{*\prime}Hu^*\,g - 2u^{*\prime}H[V - u^*\delta']d_0 \\
s_2 &= d_0'V'HVd_0 - u^{*\prime}HVG^{-1}V'Hu^* + u^{*\prime}Hu^*(g^2 - 2\delta'd_0\,g + 2\delta'q + 2d_0'Qd_0 - d_0'\Omega_X d_0 + 2\delta'G^{-1}V'Hu^*) \\
&\quad + 2u^{*\prime}H(n^{-1/2}WG^{-1}\Pi'Z'Wd_0 + Wd_0\,g - Vd_0\,d_0'\delta) \\
W &= V - u^*\delta'.
\end{aligned}$$

Next, we derive $\phi_S(t)$, the characteristic function of $S$:

$$\begin{aligned}
\phi_S(t) &= E(\exp(itS)) \\
&= E\big(\exp(its_0)\exp(itn^{-1/2}s_1)\exp(itn^{-1}s_2 + O_p(n^{-3/2}))\big) \\
&= E\big(\exp(its_0)(1 + n^{-1/2}its_1 + n^{-1}it(s_2 + \tfrac{1}{2}its_1^2))\big) + O(n^{-3/2}) \\
&= E\big[\exp(its_0)E\big(1 + n^{-1/2}its_1 + n^{-1}it(s_2 + \tfrac{1}{2}its_1^2) \mid Hu^*\big)\big] + O(n^{-3/2}),
\end{aligned}$$

where we have used that $\exp(x) = 1 + x + \tfrac{1}{2}x^2 + \cdots$ in the second step. Using the appendix of Kiviet and Phillips (2012) we can easily calculate the required conditional expectations. Define $\phi(t) = (1-2it)^{-(L-K)/2}$ as the characteristic function of $s_0$, a $\chi^2$ variable with $L-K$ degrees of freedom, and $\bar t = 1/(1-2it)$. Using the fact that

$$\frac{d}{d(it)}E(\exp(its_0)) = E(\exp(its_0)s_0),$$

we may write

$$E(\exp(its_0)s_0^j) = \frac{d^j}{(d(it))^j}E(\exp(its_0)) = \phi(t)\bar t^j(L-K)(L-K+2)\cdots(L-K+2(j-1)).$$

Using this we eventually derive the characteristic function of $S$,

$$\phi_S(t) = \phi(t) + n^{-1}t\bar t[h_1 + \bar t h_2]\phi(t) + O(n^{-3/2}),$$

with

$$h_1 = (c_1 - c_2 + \tfrac{1}{2}[(L-K) + 2K + 1])(L-K), \qquad h_2 = (c_1 - \tfrac{1}{2})(L-K)(L-K+2),$$

where $c_1 = \mathrm{tr}(G^{-1}Q)$ and $c_2 = \mathrm{tr}(G^{-1}\Omega_X)$.

Now we have to derive the corresponding distribution. Note that

$$\int \exp(itx)f_{L-K+2j}(x)\,dx = \frac{1}{(1-2it)^{(L-K+2j)/2}},$$

where $f_j(x)$ is the probability density function of a chi-squared variable with $j$ degrees of freedom. The corresponding cumulative distribution function is denoted $F_j(x)$. Hence,

$$\Pr(S \leq x) = F_{L-K}(x) - n^{-1}f_{L-K+2}(x)h_1 - n^{-1}f_{L-K+4}(x)h_2 + O(n^{-3/2})$$

follows. This can be simplified by noting that $xf_j(x) = jf_{j+2}(x)$, and we obtain

$$\Pr(S \leq x) = F_{L-K}(x) - n^{-1}f_{L-K}(x)q_1 x - n^{-1}f_{L-K}(x)q_2 x^2 + O(n^{-3/2}),$$

with

$$q_1 = c_1 - c_2 + \tfrac{1}{2}[(L-K) + 2K + 1], \qquad q_2 = c_1 - \tfrac{1}{2}.$$

Define a Taylor series for the cumulative distribution function $F_{L-K}(z)$,

$$F_{L-K}(z) \approx F_{L-K}(a) + f_{L-K}(a)(z-a) + \tfrac{1}{2}f_{L-K}'(a)(z-a)^2 + \cdots,$$

and choose $a = x$, $z = x - n^{-1}d_1 x - n^{-1}d_2 x^2$. Hence, we get

$$F_{L-K}(x - n^{-1}d_1 x - n^{-1}d_2 x^2) = F_{L-K}(x) - f_{L-K}(x)(n^{-1}d_1 x + n^{-1}d_2 x^2) + O(n^{-2}),$$

and with $d_1 = q_1$ and $d_2 = q_2$,

$$\Pr(S \leq x) = F_{L-K}(x - n^{-1}q_1 x - n^{-1}q_2 x^2) + O(n^{-3/2}).$$

Figure 3.2: Testing $Z$, $\mu^2 = 80$
Figure 3.3: Testing $Z_1$, $\mu^2 = 80$
Figure 3.4: Testing $Z_2$, $\mu^2 = 80$
Figure 3.5: Testing $Z$, $\mu^2 = 30$
Figure 3.6: Testing $Z_1$, $\mu^2 = 30$
Figure 3.7: Testing $Z_2$, $\mu^2 = 30$
(Each panel plots rejection probability against $\rho_{xu}$; legends: $S$, $\hat S_c$, $S^B$, $J$ for the full and reduced set tests, and $S^I$, $S^I_B$, $J^I$, $H$, $S^I_H$ for the subset tests.)

Figure 3.8: Testing $Z$, $\mu^2 = 10$
Figure 3.9: Testing $Z_1$, $\mu^2 = 10$
Figure 3.10: Testing $Z_2$, $\mu^2 = 10$
Figure 3.11: Testing $Z$, $\mu^2 = 80$, heteroskedasticity (+)
Figure 3.12: Testing $Z_1$, $\mu^2 = 80$, heteroskedasticity (+)
Figure 3.13: Testing $Z_2$, $\mu^2 = 80$, heteroskedasticity (+)
(Rejection probability against $\rho_{xu}$; legends as in Figures 3.2 to 3.7.)

Figure 3.14: Testing $Z$, $\mu^2 = 80$, heteroskedasticity (-)
Figure 3.15: Testing $Z_2$, $\mu^2 = 80$, heteroskedasticity (-)
Figure 3.16: Testing $Z$, $\mu^2 = 80$
Figure 3.17: Testing $Z_2$, $\mu^2 = 80$
Figure 3.18: Testing $Z$, $\mu^2 = 30$
Figure 3.19: Testing $Z_2$, $\mu^2 = 30$
(Figures 3.14 and 3.15 plot rejection probability against $\rho_{xu}$; in Figures 3.16 to 3.19 the horizontal axis is $\rho_{z_3u}$.)

[Figures 3.20–3.25, each plotting rejection probability against ρ_{z₃ε}; the curves refer to S, S_c, S_B and J, supplemented by H and S_H in the panels testing Z_2. Figure 3.20: Testing Z, μ² = 10. Figure 3.21: Testing Z_2, μ² = 10. Figure 3.22: Testing Z, ρ_{z₁u} = ρ_{z₂u} = 0.2, R²_{z_j} = 0.2 ∀j. Figure 3.23: Testing Z_2, ρ_{z₁u} = ρ_{z₂u} = 0.2, R²_{z_j} = 0.2 ∀j. Figure 3.24: Testing Z, ρ_{z₁u} = 0.2, ρ_{z₂u} = 0, R²_{z_j} = 0.2 ∀j. Figure 3.25: Testing Z_2, ρ_{z₁u} = 0.2, ρ_{z₂u} = 0, R²_{z_j} = 0.2 ∀j.]

Chapter 4

Accuracy and efficiency of various GMM inference techniques in dynamic micro panel data models: theory

4.1 Introduction

One of the major attractions of analyzing panel data rather than single indexed variables is that they allow one to cope with the empirically very relevant situation of unobserved heterogeneity correlated with included regressors. Econometric analysis of dynamic relationships on the basis of panel data, where the number of surveyed individuals (N) is relatively large while covering just a few time periods (T), is very often based on GMM (generalized method of moments). Its reputation is built on its claimed flexibility, generality, ease of use, robustness and efficiency. Widely available standard software enables one to estimate models including exogenous, predetermined and endogenous regressors consistently, while allowing for semiparametric approaches regarding the presence of heteroskedasticity and the type of distribution of the disturbances. This software also provides specification checks regarding the adequacy of the internal and external instrumental variables employed and the specific assumptions made regarding (absence of) serial correlation. Especially popular are the GMM implementations put forward by Arellano and Bond (1991). However, practical problems have often been reported, such as vulnerability due to the abundance of internal instruments, discouraging improvements of 2-step over 1-step GMM findings, poor size control of test statistics, and weakness of instruments especially when the dynamic adjustment process is slow (a root is close to unity). As

remedies it has been suggested to reduce the number of instruments by renouncing some valid orthogonality conditions, but also to extend the number of instruments by adopting more orthogonality conditions. Extra orthogonality conditions can be based on certain homoskedasticity or stationarity assumptions or initial value conditions, see Blundell and Bond (1998). By abandoning weak instruments finite sample bias may be reduced, whereas by extending the instrument set with a few strong ones the bias may be further reduced and the efficiency enhanced. Presently, it is not yet clear how practitioners can best make use of these suggestions, because no set of preferred testing tools is available yet that allows one to select instruments by assessing both their validity and their strength, and to classify individual regressors accurately as either endogenous, predetermined or strictly exogenous. Therefore it often happens that in applied research models and techniques are selected simply on the basis of the perceived significance and plausibility of their coefficient estimates, whereas it is well known that imposing invalid coefficient restrictions and employing regressors wrongly as instruments will often lead to relatively small estimated standard errors, which then contain misleading information on the actual precision of often seriously biased estimators.

The available studies on the performance of alternative inference techniques for dynamic panel data models have obvious limitations when it comes to advising practitioners on the most effective implementations of estimators and tests under general circumstances. As a rule, they do not consider various empirically relevant issues in conjunction, such as: (i) occurrence and the possible endogeneity of regressors additional to the lagged dependent variable, (ii) occurrence of individual effect (non-)stationarity of both the lagged dependent variable and other regressors, (iii) cross-section and/or time-series heteroskedasticity of the idiosyncratic disturbances, and (iv) variation in signal-to-noise ratios and in the relative prominence of individual effects. For example: the simulation results in Arellano and Bover (1995), Alonso-Borrego and Arellano (1999), Hahn and Kuersteiner (2002), Alvarez and Arellano (2003), Hahn et al. (2007), Kiviet (2007), Kruiniger (2008), Okui (2009), Roodman (2009), Hayakawa (2009) and Han and Phillips (2013) just concern the panel AR(1) model under homoskedasticity. Although an extra regressor is included in the simulation studies in Arellano and Bond (1991), Kiviet (1995), Bowsher (2002), Hsiao et al. (2002), Bond and Windmeijer (2005), Bun and Carree (2005a,b), Bun and Kiviet (2006), Gouriéroux et al. (2010), Hayakawa (2010), Dhaene and Jochmans (2012), Flannery and Hankins (2013), Everaert (2013) and Kripfganz and Schwarz (2013), this regressor is (weakly-)exogenous and most experiments just concern homoskedastic disturbances and stationarity regarding the impact of individual effects. Blundell et al. (2001) and Bun and Sarafidis (2015) include an endogenous regressor, but their design does not

allow one to control the degree of simultaneity; moreover, they stick to homoskedasticity. Harris et al. (2009) only examine the effects of neglected endogeneity. Heteroskedasticity is considered in a few simulation experiments in Arellano and Bond (1991) in the model with an exogenous regressor, and just for the panel AR(1) case in Blundell and Bond (1998). Windmeijer (2005) analyzes panel GMM with heteroskedasticity, but without including a lagged dependent variable in the model. Bun and Carree (2006) and Juodis (2013) examine effects of heteroskedasticity in the model with a lagged dependent and a strictly exogenous regressor under stationarity regarding the effects. Moral-Benito (2013) examines stationary and nonstationary regressors in a dynamic model with heteroskedasticity, but the extra regressor is predetermined or strictly exogenous. Moreover, his study is restricted to time-series heteroskedasticity, while assuming cross-sectional homoskedasticity. In a micro context the latter seems more realistic to us, whereas it is also trickier when N is large and T small. So, knowledge is still scarce with respect to the performance of GMM when it is not only needed to cope with genuine simultaneity (which we consider to be the core of econometrics), but also with the occurrence of heteroskedasticity of unknown form. Moreover, many of the simulation studies mentioned above did not systematically explore the effects of relevant nuisance parameter values on the finite sample distortions to asymptotic approximations. Regarding the performance of tests on the validity of instruments, worrying results have been obtained in Bowsher (2002) and Roodman (2009) for homoskedastic models. On the other hand, Bun and Sarafidis (2015) report reassuring results, but these just concern models where T = 3. Hence, it would be useful to examine more cases over an extended grid covering more dimensions. This we performed in the next chapter, which builds on the theoretical developments provided here.

The present study aims to systematize and justify many of the different options available to implement and to test the GMM estimators put forward by Arellano and Bond (1991) and Blundell and Bond (1998). This does, for instance, involve a demonstration that, irrespective of the initial conditions, the intercept is a valid instrument for the equation in levels, but may nevertheless be neglected in Arellano-Bond estimation, due to a partialling-out result for GMM estimators (given in Appendix 4.B). Also a rigorous derivation (in Appendix 4.C) is provided of the redundancy of particular instruments for the equation in levels in the presence of some endogenous, predetermined or exogenous regressors whose relationship with the individual effects is time-invariant. A categorization is given of the great many possibilities there are to neglect valid orthogonality conditions in order to reduce the number of instruments, in the hope that this will mitigate bias of the coefficient estimators. The same is done for overidentification restriction tests regarding

the use they make of 1-step or 2-step residuals. Additionally, more and less robust implementations of weighting matrices are discussed, and also the possibility to use heteroskedasticity-robust 1-step variance estimators in coefficient tests, or to calculate these on the basis of 2-step results, possibly corrected according to the often practiced approach by Windmeijer (2005). The latter's derivation is simplified here (in Appendix 4.A) by obtaining it directly through application of the delta-method to a linear IV model with heteroskedasticity. Moreover, following Kiviet and Feng (2014), we develop for the dynamic panel model a novel modification of the traditional GMM implementation which aims to improve the strength of the exploited instruments in the presence of cross-sectional heteroskedasticity.

The structure of this chapter is as follows. In Section 4.2 we first present the major issues regarding IV and GMM coefficient and variance estimation in linear models, and inference techniques for establishing instrument validity and for testing coefficient values by standard and by corrected test statistics. Next, in Section 4.3, the generic results of Section 4.2 are used to discuss in more detail than provided elsewhere the various options for their implementation in linear models for single dynamic simultaneous micro econometric panel data relationships with both individual and time effects and some form of cross-sectional heteroskedasticity. Section 4.4 concludes and refers to the next chapter.

4.2 Basic GMM results for linear models

Here we present concisely the major generic results on IV and GMM inference for single indexed data that could represent either time-series or a cross-section. First we define the model and estimators, discuss some of their special properties and consider specific test situations. From these general findings for linear regressions the examined implementations for specific linear dynamic panel data models follow easily in Section 4.3.

4.2.1 Model and estimators

Let the scalar dependent variable y_i depend linearly on K regressors x_i and an unobserved disturbance term u_i, and let there be L ≥ K variables z_i (the instruments) that establish orthogonality conditions such that

y_i = x_i'β₀ + u_i,   E[z_i(y_i − x_i'β₀)] = 0,   i = 1, …, n.  (4.1)

Here x_i and β₀ are K×1 vectors, β₀ containing the true values of the unknown coefficients, and z_i is an L×1 vector. Applying the analogy principle, the method of moments (MM) aims to find an estimator for model parameter β by solving the L sample moment equations

n^{-1} Σ_{i=1}^n z_i(y_i − x_i'β̂) = 0.  (4.2)

Generally, these have a unique solution only when L = K, and then yield

β̂ = (Σ_{i=1}^n z_ix_i')^{-1} Σ_{i=1}^n z_iy_i,  (4.3)

provided the inverse exists. For L > K the MM recipe to find a unique estimator is: minimize with respect to β the criterion function

[Σ_{j=1}^n (y_j − x_j'β)z_j'] G [Σ_{i=1}^n z_i(y_i − x_i'β)]

for some weighting matrix G. It can be shown that the asymptotically optimal choice for G is an expression which has a probability limit proportional to the inverse of the asymptotic variance V of

n^{-1/2} Σ_{i=1}^n z_iu_i →d N(0, V).

When u_i ∼ iid(0, σ²_u) an optimal choice for G is proportional to the inverse of Σ_{i=1}^n z_iz_i' and the MM estimator is

β̂_IV = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'y,  (4.4)

where y = (y₁ … y_n)', X = (x₁ … x_n)' and Z = (z₁ … z_n)'. But, when E(z_iu_i) = 0 while u = (u₁ … u_n)' ∼ (0, σ²_uΩ), where Ω has full rank and without loss of generality tr(Ω) = n, the optimal choice for G is a matrix proportional to (Z'ΩZ)^{-1}, yielding MM estimator

β̂_GMM = [X'Z(Z'ΩZ)^{-1}Z'X]^{-1}X'Z(Z'ΩZ)^{-1}Z'y.  (4.5)

Note that for Ω = I the latter formula simplifies to β̂_IV. When L = K both β̂_GMM and β̂_IV simplify to (4.3), or (Z'X)^{-1}Z'y. When Ω is unknown, and therefore (4.5) is unfeasible, one should use an informed guess Ω^{(0)} to obtain the 1-step estimator β̂^{(1)}_GMM, which is sub-optimal when Ω^{(0)} ≠ Ω, though consistent under the assumptions made.
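As a minimal numerical illustration of (4.3)–(4.5) and of the feasible 2-step estimator defined next in (4.6)–(4.7), consider the sketch below (Python/numpy; all function names are ours, not from any package). It treats Ω as known for β̂_GMM, and uses one possible informed guess, Ω̂^{(1)} equal to the diagonal of squared 1-step residuals, for the feasible variant:

```python
# Sketch of the IV estimator (4.4), the GMM estimator (4.5) with Omega
# treated as known, and a feasible 2-step variant; illustrative only.
import numpy as np

def beta_iv(y, X, Z):
    ZX, Zy, W = Z.T @ X, Z.T @ y, np.linalg.inv(Z.T @ Z)
    return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)

def beta_gmm(y, X, Z, Omega):
    ZX, Zy = Z.T @ X, Z.T @ y
    W = np.linalg.inv(Z.T @ Omega @ Z)       # optimal weighting (Z'ΩZ)^{-1}
    return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)

def beta_gmm_2step(y, X, Z):
    u1 = y - X @ beta_iv(y, X, Z)            # 1-step residuals, cf. (4.6)
    return beta_gmm(y, X, Z, np.diag(u1**2)) # one possible Omega-hat choice
```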

Then the residuals

û^{(1)} = y − Xβ̂^{(1)}_GMM  (4.6)

are consistent for u; thus from them it should be possible to obtain an expression Ω̂^{(1)} such that plim n^{-1}(Z'Ω̂^{(1)}Z − Z'ΩZ) = O. Substituting Ω̂^{(1)} in (4.5) yields the 2-step estimator

β̂^{(2)}_GMM = [X'Z(Z'Ω̂^{(1)}Z)^{-1}Z'X]^{-1}X'Z(Z'Ω̂^{(1)}Z)^{-1}Z'y,  (4.7)

which is asymptotically equivalent to β̂_GMM, and thus asymptotically optimal, given the L instruments used.

4.2.2 Some algebraic peculiarities

Defining P_Z = Z(Z'Z)^{-1}Z' and X̂ = P_Z X, one finds β̂_IV = (X̂'X̂)^{-1}X̂'y, which highlights its two-stage least-squares character. Now suppose that X = (X₁ X₂) and Z = (Z₁ Z₂) with Z₂ = X₂, whereas Xβ = X₁β₁ + X₂β₂, where β₁ and β₂ have K₁ and K₂ elements respectively. Standard results on partitioned regression yield

β̂_{1,IV} = (X̂₁'M_{X̂₂}X̂₁)^{-1}X̂₁'M_{X̂₂}y = (X₁'P_{M_{X₂}Z₁}X₁)^{-1}X₁'P_{M_{X₂}Z₁}y,  (4.8)

which is the IV estimator in the regression of y on just X₁ using the L − K₂ instruments M_{X₂}Z₁. This result is known as partialling out the predetermined regressors X₂. It follows from X̂₂ = P_Z X₂ = X₂, which yields M_{X̂₂}X̂₁ = M_{X₂}P_Z X₁ = M_{X₂}(P_{X₂} + P_{M_{X₂}Z₁})X₁ = P_{M_{X₂}Z₁}X₁.

A similar result is not straightforwardly available for GMM because of the following. Let the positive definite matrix Ω be factorized as

Ω^{-1} = Ψ'Ψ, so Ω = Ψ^{-1}(Ψ')^{-1}.  (4.9)

Now define

y* = Ψy, X* = ΨX, Z* = (Ψ')^{-1}Z,  (4.10)

then

β̂_GMM = [X*'Z*(Z*'Z*)^{-1}Z*'X*]^{-1}X*'Z*(Z*'Z*)^{-1}Z*'y* = (X*'P_{Z*}X*)^{-1}X*'P_{Z*}y*,

so GMM is equivalent to IV using transformed variables, but where Z has been transformed differently. Therefore, if X₂ is such that X₂* establishes valid instruments in the transformed model y* = X*β + u*, where u* ∼ (0, σ²_uI), the regressors X₂* are not used

as instruments in GMM in its IV interpretation. They would, though, if one would deliberately choose Z₂ = Ω^{-1}X₂.

As is well known and easily verified, linear transformations of the matrix of instruments of the form Z* = ZC, where C is a full rank L×L matrix, have no effect on β̂_IV nor on β̂_GMM. However, there is no such invariance when the matrix Z is premultiplied by some transformation matrix, so that not the columns but the rows of Z are directly affected. It has been shown in Kiviet and Feng (2014) that such transformations, chosen in correspondence with the required transformation of the model when Ω ≠ I, may lead to modified GMM estimation achieving higher efficiency levels and better results in finite samples than standard GMM, provided the validity of the transformed instruments is maintained. We will examine here the effects of employing the transformation Z* = ΨZ, which provides the modified GMM estimator

β̂_MGMM = [X'Ω^{-1}Z(Z'Ω^{-1}Z)^{-1}Z'Ω^{-1}X]^{-1}X'Ω^{-1}Z(Z'Ω^{-1}Z)^{-1}Z'Ω^{-1}y.  (4.11)

When this can be made feasible, it yields β̂^{(2)}_MGMM.

4.2.3 Particular test procedures

Inference on elements of β₀ based on β̂^{(2)}_GMM of (4.7) requires an asymptotic approximation to its distribution. Under correct specification the standard first-order approximation is

β̂^{(2)}_GMM ∼a N(β₀, σ̂²_u[X'Z(Z'Ω̂^{(1)}Z)^{-1}Z'X]^{-1}).  (4.12)

It allows testing general restrictions by Wald-type tests. For an individual coefficient, say β_{0k} = e'_{K,k}β₀, where e_{K,k} is a K×1 vector with all elements zero except its k-th element (1 ≤ k ≤ K) which is unity, testing H₀: β_{0k} = β⁰_k and allowing for one-sided alternatives amounts to comparing the test statistic

W_{βk} = (e'_{K,k}β̂^{(2)}_GMM − β⁰_k)/{σ̂²_u e'_{K,k}[X'Z(Z'Ω̂^{(1)}Z)^{-1}Z'X]^{-1}e_{K,k}}^{1/2}  (4.13)

with the appropriate quantile of the standard normal distribution. Note that this test statistic is actually an asymptotic t-test; in finite samples the type I error probability may deviate from the chosen nominal level, also depending on whether σ̂²_u has been obtained from 1-step or from 2-step residuals and on any employed loss-of-degrees-of-freedom corrections. In fact it has been observed that the consistent estimator of the variance of two-step GMM estimators Var̂(β̂^{(2)}_GMM) given in (4.12) often underestimates the finite sample variance, because in its derivation the randomness of Ω̂^{(1)} is not taken into ac-

count. Windmeijer (2005) provides a corrected formula Var̂_c(β̂^{(2)}_GMM), see Appendix 4.A, which can be used in the corrected t-test

W^c_{βk} = (e'_{K,k}β̂^{(2)}_GMM − β⁰_k)/{e'_{K,k}Var̂_c(β̂^{(2)}_GMM)e_{K,k}}^{1/2}.  (4.14)

When L > K the overidentification restrictions can be tested by the Sargan-Hansen statistic

J^{(1)}_Z = n û^{(1)'}Z(Z'Ω̂^{(1)}Z)^{-1}Z'û^{(1)}/(û^{(1)'}û^{(1)}),  (4.15)

which under correct specification and valid instruments Z is asymptotically distributed as χ²_{L−K}. Because Ω̂^{(1)} is based on û^{(1)}, which relates to a consistent but not asymptotically optimal estimator, it seems better to perform at least one further iteration and use

J^{(2)}_Z = n û^{(2)'}Z(Z'Ω̂^{(2)}Z)^{-1}Z'û^{(2)}/(û^{(2)'}û^{(2)}),  (4.16)

where û^{(2)} = y − Xβ̂^{(2)}_GMM is used to construct Ω̂^{(2)}. When Z = (Z_m Z_a), where Z_m is an n×L_m matrix with L_m ≥ K containing the instruments whose validity seems very likely, then, under the maintained hypothesis E(Z_m'u) = 0, one can test the validity of the L − L_m additional instruments Z_a by the incremental test statistic

J^{(2)}_{Za} = n[û^{(2)'}Z(Z'Ω̂^{(2)}Z)^{-1}Z'û^{(2)}/(û^{(2)'}û^{(2)}) − û_m^{(2)'}Z_m(Z_m'Ω̂_m^{(2)}Z_m)^{-1}Z_m'û_m^{(2)}/(û_m^{(2)'}û_m^{(2)})],  (4.17)

which under correct specification of the model with valid instruments Z is asymptotically distributed as χ²_{L−L_m}. Of course, û^{(2)}_m and Ω̂^{(2)}_m are obtained by just using the instruments Z_m. Note that for L_m = K we have J^{(2)}_{Za} = J^{(2)}_Z, because in that case Z_m'û^{(2)}_m = 0. Hence, when L_m = K, explicit specification of the component Z_m is meaningless.

In simulations it is interesting to examine as well unfeasible versions of the above test statistics, which exploit information that is usually not available in practice. This will produce evidence on which elements of the feasible asymptotic tests may cause any inaccuracies in finite samples. So, next to (4.13), (4.16) and (4.17) we will also examine

W^{(u)}_{βk} = (e'_{K,k}β̂_GMM − β⁰_k)/{σ²_u e'_{K,k}[X'Z(Z'ΩZ)^{-1}Z'X]^{-1}e_{K,k}}^{1/2},  (4.18)
J^{(u)}_Z = û'Z(Z'ΩZ)^{-1}Z'û/σ²_u, where û = y − Xβ̂_GMM,  (4.19)
J^{(u)}_{Za} = [û'Z(Z'ΩZ)^{-1}Z'û − û_m'Z_m(Z_m'ΩZ_m)^{-1}Z_m'û_m]/σ²_u.  (4.20)

Similar feasible and unfeasible implementations of t-tests and Sargan-Hansen tests for MGMM-based estimators follow straightforwardly.
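A compact sketch of how (4.16) and the incremental statistic (4.17) can be computed is given below (Python; u2 and u2m are assumed to be 2-step residuals from estimation with the full Z and with Z_m only, Om and Om_m the corresponding Ω̂^{(2)} matrices, and Z is assumed to contain Z_m as its leading columns; illustrative only):

```python
# Sketch of the Sargan-Hansen statistic (4.16) and the incremental test
# (4.17); the inputs are assumed to come from prior 2-step estimations.
import numpy as np
from scipy.stats import chi2

def j_stat(u, Z, Omega):
    Zu = Z.T @ u
    return len(u) * (Zu @ np.linalg.solve(Z.T @ Omega @ Z, Zu)) / (u @ u)

def incremental_j(u2, Z, Om, u2m, Zm, Om_m):
    j_incr = j_stat(u2, Z, Om) - j_stat(u2m, Zm, Om_m)
    df = Z.shape[1] - Zm.shape[1]        # L - L_m degrees of freedom
    return j_incr, chi2.sf(j_incr, df)   # statistic and asymptotic p-value
```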

4.3 Implementations for dynamic micro panel models

4.3.1 Model and assumptions

We consider the balanced linear dynamic panel data model (i = 1, …, N; t = 1, …, T)

y_it = x_it'β + w_it'γ + v_it'δ + τ_t + η_i + ε_it,  (4.21)

where x_it contains K_x ≥ 0 strictly exogenous regressors (excluding fixed time effects), w_it are K_w ≥ 0 predetermined regressors (probably including lags of the dependent variable and other variables affected by lagged feedback from y_it or just from ε_it), v_it are K_v ≥ 0 endogenous regressors (affected by instantaneous feedback from y_it and therefore jointly dependent with y_it), the τ_t are random or fixed time effects, and the η_i are random individual specific effects (most likely correlated with many of the regressors) such that¹

η_i ∼ iid(0, σ²_η),  (4.22)

whereas

E(ε_it) = 0,  E(ε²_it) = σ²_it,  E(ε_itε_js) = 0 for (i, t) ≠ (j, s),  E(η_iε_jt) = 0,  ∀ i, j, t, s.  (4.23)

The classification of the regressors implies

E(x_itε_is) = 0,  E(w_itε_{i,t+l}) = 0,  E(v_itε_{i,t+1+l}) = 0,  ∀ i, t, s and l ≥ 0.  (4.24)

For the sake of simplicity we assume that all regressors are time varying and that the vectors x_it, w_it and v_it are defined for t = 1, …, T. However, their elements may contain observations prior to t = 1 for regressors that are actually the l-th order lag of a current variable. Only these lagged regressors are observed from t = 1 − l onwards. This means that all regressors, be it current variables or lags of them, have exactly T observations. So, any unbalancedness problems have been defined away; moreover, no internal instrumental

¹ Note that this assumption can be weakened. However, as this is the assumption made for the simulations in the next chapter, we will use it here as well.

variables can be constructed involving observations prior to those included in x_i1, w_i1 or v_i1.

Stacking the T time-series observations of (4.21), the equation in levels can be written as

y_i = X_iβ + W_iγ + V_iδ + τ + η_iι_T + ε_i,  (4.25)

where y_i = (y_i1 … y_iT)', X_i = (x_i1 … x_iT)', W_i = (w_i1 … w_iT)', V_i = (v_i1 … v_iT)', τ = (τ₁ … τ_T)' and ι_T is the T×1 vector with all its elements equal to unity. Defining the i-th T×K block of the regressor matrix R_i = (X_i W_i V_i I_T), the K×1 coefficient vector α = (β' γ' δ' τ')' and the compound disturbances u_i = η_iι_T + ε_i, the model can compactly be expressed as

y_i = R_iα + u_i.  (4.26)

We do allow K_x = 0, K_w = 0 or K_v = 0, but not all three at the same time, so

K = K_x + K_w + K_v + T > T.  (4.27)

The parameter vector τ could be void too, or have all its elements equal though non-zero. However, it seems better to allow for a general τ, for the same reasons as an ordinary cross-section regression normally should include an intercept, namely to help underpin the assumption that both the idiosyncratic disturbances and the individual effects have expectation zero.

We focus on micro panels, where the number of time-series observations T is usually very small, possibly a one-digit number, and the number of cross-section units N is large, usually at least several hundreds. Therefore asymptotic approximations will be for N → ∞ and T finite.

4.3.2 Removing individual effects by first differencing

First we consider estimating the model by GMM following the approach propounded by Arellano and Bond (1991), see also Holtz-Eakin et al. (1988). This estimates the equations (i = 1, …, N; t = 2, …, T)

Δy_it = Δx_it'β + Δw_it'γ + Δv_it'δ + Δτ_t + Δε_it = Δr_it'α̃ + Δε_it,  (4.28)

where α̃ = (β' γ' δ' τ̃')' and Δr_it = (Δx_it' Δw_it' Δv_it' e'_{T−1,t})' are (K−1)×1 vectors. Here τ̃ is the (T−1)×1 vector (Δτ₂ … Δτ_T)'. Stacking the T−1 time-series observations of

(4.28) yields

ỹ_i = R̃_iα̃ + ε̃_i,  (4.29)

where ỹ_i = (Δy_i2 … Δy_iT)', R̃_i = (Δr_i2 … Δr_iT)' and ε̃_i = (Δε_i2 … Δε_iT)'. To construct a full column rank matrix of instrumental variables Z = (Z₁' … Z_N')', which expresses as many linearly independent orthogonality conditions as possible for (4.29), while restricting ourselves to internal variables, i.e. variables occurring in (4.21), we define the following vectors:

x_i^T = (x_i1' … x_iT')',  w_i^t = (w_i1' … w_it')',  v_i^t = (v_i1' … v_it')'.  (4.30)

Without making this explicit in the notation, it should be understood that these three vectors only contain unique elements. Hence, if vector x_i^s (or w_i^s) contains for 1 < s ≤ T a particular variable and also its lag (which is not possible for v_it), then this lag should be taken out, since it already appears in x_i^{s−1}. Matrix Z_i is of order (T−1)×L and consists of four blocks (though some of these may be void):

Z_i = (I_{T−1}  Z_i^x  Z_i^w  Z_i^v).  (4.31)

The first block is associated with the fundamental moment conditions E(ε̃_i) = 0 and could therefore form part of Z_i even if one imposes τ̃ = 0. For the other blocks we have

Z_i^x = I_{T−1} ⊗ x_i^{T'},
Z_i^w = diag(w_i^{1'}, w_i^{2'}, …, w_i^{T−1'}),
Z_i^v = a zero row stacked on diag(v_i^{1'}, v_i^{2'}, …, v_i^{T−2'}).  (4.32)

The maximum possible order of Z_i^x is (T−1) × K_xT(T−1), of Z_i^w it is (T−1) × K_wT(T−1)/2, and of Z_i^v it is (T−1) × K_v(T−1)(T−2)/2, thus

L ≤ (T−1){T[K_x + (K_w + K_v)/2] − K_v + 1},  (4.33)

whereas MM estimation requires L ≥ K − 1. It follows from (4.23) and (4.24) that indeed E(Z_i'ε̃_i) = 0. In actual estimation one may use a subset of these instruments by taking the linear transformation Z_i* = Z_iC, where C is an L×L* matrix (with all its elements often being either zero, one or minus one) of rank L* < L, provided L* ≥ K − 1. In the above we have implicitly assumed that the variables are such that Z = (Z₁' … Z_N')' will have full column rank, so another necessary condition is N(T−1) ≥ L. Of course, it

is not required that the individual blocks Z_i have full column rank.

Despite its undesirable effect on the asymptotic variance of method of moments estimators, reducing the number of instruments may improve estimation precision, because it may at the same time mitigate estimation bias in finite samples, especially when weak instruments are being removed. So, instead of including the block I_{T−1} in Z_i, one could (when the model has no time-effects) replace it by I_{T−1}ι_{T−1} = ι_{T−1}. Regarding Z_i^w and Z_i^v two alternative instrument reduction methods have been suggested, namely omitting long lags (see Bowsher 2002, Windmeijer 2005 and Bun and Kiviet 2006) and collapsing (see Roodman 2009, but also suggested in Anderson and Hsiao 1981). Both are employed in Ziliak (1997); these two methods can also be combined. Omitting long lags could be achieved by reducing Z_i^w to, for instance, the block-diagonal matrix

diag( w_i1' , (w_i1', w_i2') , (w_i2', w_i3') , … , (w_{i,T−2}', w_{i,T−1}') ),  (4.34)

and similarly for Z_i^v. The collapsed versions of Z_i^w and of Z_i^v can be denoted as

Z̄_i^w = [ w_i1'        0'           ⋯  0'
          w_i2'        w_i1'        ⋯  0'
          ⋮                         ⋱
          w_{i,T−1}'   w_{i,T−2}'   ⋯  w_i1' ],

Z̄_i^v = [ 0'           0'           ⋯  0'
          v_i1'        0'           ⋯  0'
          v_i2'        v_i1'        ⋯  0'
          ⋮                         ⋱
          v_{i,T−2}'   v_{i,T−3}'   ⋯  v_i1' ].  (4.35)

Collapsing can be combined with omitting long lags, if one removes all the columns of Z̄_i^w and Z̄_i^v which have at least a certain number of zero elements (say 1 or 2 or more) in their top rows. In corresponding ways, the column space of Z_i^x can be reduced by including in Z_i either a limited number of lags and leads, or the collapsed matrix

Z̄_i^x = [ x_i2'   x_i1'        0'           ⋯  0'
          x_i3'   x_i2'        x_i1'        ⋯  0'
          ⋮                                 ⋱
          x_iT'   x_{i,T−1}'   x_{i,T−2}'   ⋯  x_i1' ],  (4.36)

or just its first two or three columns, or (what is often done in practice) simply the difference between the first two columns, i.e. the K_x regressors (Δx_i2, …, Δx_iT)'.
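For one individual and a single predetermined regressor (K_w = 1), the full block Z_i^w and its collapsed version of (4.35) can be generated as in the following sketch (Python; the data vector and function names are ours and purely illustrative):

```python
# Building Z_i^w for a single predetermined regressor (K_w = 1), full and
# collapsed as in (4.35); a sketch with hypothetical data.
import numpy as np

def ab_block_w(w):
    T = len(w)
    Z = np.zeros((T - 1, T * (T - 1) // 2))
    col = 0
    for t in range(2, T + 1):                  # equation for Delta-eps_it
        Z[t - 2, col:col + t - 1] = w[:t - 1]  # uses w_i1, ..., w_{i,t-1}
        col += t - 1
    return Z

def collapsed_block_w(w):
    T = len(w)
    Z = np.zeros((T - 1, T - 1))
    for t in range(2, T + 1):                  # row t: w_{i,t-1}, ..., w_i1
        Z[t - 2, :t - 1] = w[:t - 1][::-1]
    return Z

w = np.array([0.4, 1.1, 0.7, 1.6, 0.9, 1.3])   # T = 6, hypothetical data
print(ab_block_w(w).shape, collapsed_block_w(w).shape)   # (5, 15) (5, 5)
```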

It seems useful to distinguish the following specific forms of reduction of the instrument matrix, relative to the case where all instruments associated with valid linear moment restrictions are being used. The latter case we label as A (all); the reductions are labelled C (standard collapsing), L0, L1, L2, L3 (which primarily restrict the lag length), and C0, C1, C2, C3 (which combine the two reduction principles). In all the reductions we replace I_{T−1} by ι_{T−1} when the model does not include time-effects. Regarding Z_i^x, Z_i^w and Z_i^v different types of reductions can be taken, which we will distinguish by using, for example, the characterization A^v, L2^w, C1^x, etc. This leads to the particular reductions as indicated and defined in Table 4.1.

Table 4.1: Definition of labels for particular instrument matrix reductions

A^x:  Z_i^x                                  A^w:  Z_i^w                                        A^v:  Z_i^v
L0^x: diag(x_i2', …, x_iT')                  L0^w: diag(w_i1', …, w_{i,T−1}')                   L0^v: [0, diag(v_i1', …, v_{i,T−2}')]
L1^x: diag(Δx_i2', …, Δx_iT')                L1^w: diag(w_i1', Δw_i2', …, Δw_{i,T−1}')          L1^v: [0, diag(v_i1', Δv_i2', …, Δv_{i,T−2}')]
L2^x: [diag(x_i1', …, x_{i,T−1}'), L0^x]     L2^w: [[0, diag(w_i1', …, w_{i,T−2}')], L0^w]      L2^v: [[0, 0, diag(v_i1', …, v_{i,T−3}')], L0^v]
L3^x: [[0, diag(x_i1', …, x_{i,T−2}')], L2^x] L3^w: [[0, 0, diag(w_i1', …, w_{i,T−3}')], L2^w]  L3^v: [[0, 0, 0, diag(v_i1', …, v_{i,T−4}')], L2^v]
C^x:  Z̄_i^x                                  C^w:  Z̄_i^w                                        C^v:  Z̄_i^v
C0^x: (x_i2, …, x_iT)'                       C0^w: (w_i1, …, w_{i,T−1})'                        C0^v: (0, v_i1, …, v_{i,T−2})'
C1^x: (Δx_i2, …, Δx_iT)'                     C1^w: (0, Δw_i2, …, Δw_{i,T−1})'                   C1^v: (0, 0, Δv_i2, …, Δv_{i,T−2})'
C2^x: [C0^x, (x_i1, …, x_{i,T−1})']          C2^w: [C0^w, (0, w_i1, …, w_{i,T−2})']             C2^v: [C0^v, (0, 0, v_i1, …, v_{i,T−3})']
C3^x: [C2^x, (0, x_i1, …, x_{i,T−2})']       C3^w: [C2^w, (0, 0, w_i1, …, w_{i,T−3})']          C3^v: [C2^v, (0, 0, 0, v_i1, …, v_{i,T−4})']

Note that for all three types of regressors L2, like L1, uses one extra lag compared to L0, but does not impose the first-difference restrictions characterizing L1. We skipped a similar intermediary case between L2 and L3. Self-evidently L2^x can also be represented by combining diag(x_i1', …, x_{i,T−1}') with L1^x, and similarly for L2^w and L2^v. The reductions C0 and C1, which yield just one instrument per regressor, constitute generalizations of the classic instruments suggested by Anderson and Hsiao (1981). These may lead to just identified models, where the number of instruments equals the number of regressors, and to the occurrence of the non-existence of moments problem. To avoid that, and also because we suppose that in general some degree of overidentification will have advantages regarding both estimation precision and the opportunity to test model adequacy, as far as collapsing is concerned one may choose to restrict oneself to the popular C1^x and the reductions C and C3. In C3 just the first three columns of the matrices in (4.35) and (4.36) are being used as instruments.

Alternative weighting matrices

We assumed in (4.23) that the ε_it are serially and cross-sectionally uncorrelated but may be heteroskedastic. Let us define the matrices

Ω_i = diag(σ²_i1, …, σ²_iT) and D = [ −1  1  0  ⋯  0  0
                                      0 −1  1  ⋯  0  0
                                      ⋮            ⋱
                                      0  0  0  ⋯ −1  1 ],  (4.37)

where ε_i ∼ (0, Ω_i) and D is (T−1)×T, so ε̃_i = Dε_i ∼ (0, DΩ_iD'). Under standard regularity we have

N^{-1/2} Σ_{i=1}^N Z_i'ε̃_i →d N(0, plim N^{-1} Σ_{i=1}^N Z_i'DΩ_iD'Z_i).  (4.38)

Hence, the optimal GMM estimator of α̃ of (4.29) should use a weighting matrix whose inverse has probability limit proportional to plim N^{-1} Σ_i Z_i'DΩ_iD'Z_i. This can be achieved by starting with a consistent 1-step GMM estimator α̃^{(1)}, which, incorrectly assuming Ω_i = σ²_εI_T, uses the weighting matrix

G^{(0)} = (Σ_{i=1}^N Z_i'DD'Z_i)^{-1},  (4.39)

and then yields

α̃^{(1)} = [(Σ_i R̃_i'Z_i)G^{(0)}(Σ_i Z_i'R̃_i)]^{-1}(Σ_i R̃_i'Z_i)G^{(0)}(Σ_i Z_i'ỹ_i).  (4.40)

Next, in a second step, the consistent 1-step residuals ε̃^{(1)}_i = ỹ_i − R̃_iα̃^{(1)} can be used to construct the asymptotically optimal weighting matrix

Ĝ^{(1)}_a = (Σ_{i=1}^N Z_i'ε̃^{(1)}_iε̃^{(1)'}_iZ_i)^{-1}.  (4.41)

However, an alternative is using

Ĝ^{(1)}_b = (Σ_{i=1}^N Z_i'Ĥ^{(1)b}_iZ_i)^{-1},  (4.42)

where Ĥ^{(1)b}_i is the band matrix

Ĥ^{(1)b}_i = [ ε̃^{(1)2}_{i2}            ε̃^{(1)}_{i2}ε̃^{(1)}_{i3}   0                        ⋯  0
               ε̃^{(1)}_{i2}ε̃^{(1)}_{i3}  ε̃^{(1)2}_{i3}             ε̃^{(1)}_{i3}ε̃^{(1)}_{i4}  ⋯  0
               0                        ε̃^{(1)}_{i3}ε̃^{(1)}_{i4}   ⋱                           ⋮
               ⋮                                                   ⋱                        ε̃^{(1)}_{i,T−1}ε̃^{(1)}_{iT}
               0                        0                          ⋯  ε̃^{(1)}_{i,T−1}ε̃^{(1)}_{iT}  ε̃^{(1)2}_{iT} ].  (4.43)

Note that both (NĜ^{(1)}_a)^{-1} and (NĜ^{(1)}_b)^{-1} have a probability limit equal to the limiting variance in (4.38). The latter is less robust, but may converge faster when Ω_i is diagonal indeed. On the other hand (4.42) may not be positive definite, whereas (4.41) is. For the special case Ω_i = σ²_{ε,i}I_T of cross-section heteroskedasticity but time-series homoskedasticity, one could use

Ĝ^{(1)}_c = (Σ_{i=1}^N Z_i'Ĥ^{(1)c}_iZ_i)^{-1},  (4.44)

with

Ĥ^{(1)c}_i = σ̂^{2(1)}_{ε,i}H = σ̂^{2(1)}_{ε,i} [ 2 −1  0  ⋯
                                               −1  2 −1  ⋯
                                                ⋮       ⋱
                                                ⋯     −1  2 ],  (4.45)

where H = DD' and

σ̂^{2(1)}_{ε,i} = ε̃^{(1)'}_iH^{-1}ε̃^{(1)}_i/(T−1),  (4.46)

although this estimator is not consistent for T finite. All these different weighting matrices can be used to calculate alternative estimators α̃^{(2)}_{(j)} for j ∈ {a, b, c} according to

α̃^{(2)}_{(j)} = [(Σ_i R̃_i'Z_i)Ĝ^{(1)}_j(Σ_i Z_i'R̃_i)]^{-1}(Σ_i R̃_i'Z_i)Ĝ^{(1)}_j(Σ_i Z_i'ỹ_i).  (4.47)

When the employed weighting matrix is indeed asymptotically optimal, the first-order asymptotic approximation to the variance of α̃^{(2)}_{(j)} is given by the inverse of the matrix in square brackets. From this (corrected) t-tests are easily obtained, see Section 4.3.4.
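A minimal sketch assembling (4.37)–(4.43) is given below (Python; the lists are assumed to hold the per-individual blocks ỹ_i, R̃_i and Z_i, and all names are ours): it builds D, computes the 1-step estimator (4.40), and constructs the band matrix (4.43) from 1-step residuals:

```python
# Sketch of the first-difference operator D of (4.37), the 1-step
# Arellano-Bond estimator (4.40), and the band matrix (4.43); illustrative.
import numpy as np

def diff_matrix(T):
    D = np.zeros((T - 1, T))
    for t in range(T - 1):
        D[t, t], D[t, t + 1] = -1.0, 1.0
    return D

def ab1(y_list, R_list, Z_list, T):
    DDt = diff_matrix(T) @ diff_matrix(T).T
    G0 = np.linalg.inv(sum(Z.T @ DDt @ Z for Z in Z_list))   # (4.39)
    RZ = sum(R.T @ Z for R, Z in zip(R_list, Z_list))
    Zy = sum(Z.T @ y for Z, y in zip(Z_list, y_list))
    return np.linalg.solve(RZ @ G0 @ RZ.T, RZ @ G0 @ Zy)     # (4.40)

def band_H(e):
    # (4.43): keep only the main and first sub/super diagonals of e e'
    idx = np.arange(len(e))
    keep = np.abs(np.subtract.outer(idx, idx)) <= 1
    return np.where(keep, np.outer(e, e), 0.0)
```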

Matching implementations of Sargan-Hansen statistics follow easily too, see Section 4.3.5.

4.3.3 Respecting the equation in levels as well

Since τ_t = τ₁ + Σ_{s=2}^t Δτ_s for t ≥ 2, defining the T×(T−1) matrix

Q = [ 0  0  ⋯  0
      1  0  ⋯  0
      1  1  ⋯  0
      ⋮      ⋱
      1  1  ⋯  1 ],  (4.48)

we may write τ = τ₁ι_T + Qτ̃. Therefore, we can rewrite model (4.26) as

y_i = X_iβ + W_iγ + V_iδ + Qτ̃ + τ₁ι_T + u_i = R̊_iα̃ + τ₁ι_T + u_i = R_iα + u_i,  (4.49)

where R̊_i = (X_i W_i V_i Q), R_i = (R̊_i ι_T) and the K×1 vector α = (α̃' τ₁)', with coefficient τ₁ now being an overall intercept.

A valid instrument for model (4.49) is the (as a regressor possibly redundant) intercept, because this embodies the single orthogonality condition E[Σ_{t=1}^T(η_i + ε_it)] = E(Σ_{t=1}^T u_it) = 0 (∀i), which is implied by the T+1 assumptions E(η_i) = 0 and E(ε_it) = 0 (for t = 1, …, T) made in (4.22) and (4.23). These T+1 assumptions can also be represented (through linear transformation) by (i) E(η_i) = 0, (ii) E(Δε_it) = 0 (for t = 2, …, T) and (iii) E(Σ_{t=1}^T u_it) = 0. Because we cannot express η_i exclusively in observed variables and unknown parameters, it is impossible to convert (i) into a separate sample orthogonality condition. The T−1 orthogonality conditions (ii) are already employed by Arellano-Bond estimation, through including I_{T−1} in Z_i of (4.31) for the equation in first differences. Orthogonality condition (iii), which is in terms of the level disturbance, can be exploited by including the column ι_T in the i-th block of an instrument matrix for level equation (4.49).

Combining the T−1 difference equations and the T level equations in a system yields

ÿ_i = R̈_iα + ü_i,  (4.50)

for each individual i, where ÿ_i = (ỹ_i' y_i')', R̈_i = (R̆_i' R_i')', with R̆_i = (R̃_i 0), so it

is extended by an extra column of zeros (to annihilate coefficient τ₁ in the equation in first differences), and ü_i = (ε̃_i' u_i')'. We find that E(ε̃_iu_i') = E(Dε_iε_i') = DΩ_i and E(u_iu_i') = E[(η_iι_T + ε_i)(η_iι_T + ε_i)'] = σ²_ηι_Tι_T' + Ω_i, so

E(ü_iü_i') = ( DΩ_iD'   DΩ_i ;  Ω_iD'   Ω_i + σ²_ηι_Tι_T' ).  (4.51)

Model (4.50) can be estimated by MM using the N(2T−1) × (L+1) matrix of instruments with blocks

Z̈_i = ( Z_i  0 ;  O  ι_T ),  (4.52)

provided L+1 ≥ K and N(2T−1) ≥ L+1. Since both R̈_i and Z̈_i contain a column (0', ι_T')', and due to the occurrence of the O-block in Z̈_i, by a minor generalization of result (4.8) the IV estimator of α obtained by using instrument blocks Z̈_i in (4.50) will be equivalent regarding α̃ with the IV estimator of equation (4.29) using instruments with blocks Z_i. That the same holds here for GMM under cross-sectional heteroskedasticity when using optimal instruments is due to the very special shape of Z̈_i, and is proved in Appendix 4.B. Hence, there is no reason to estimate the system just in order to exploit the extra valid instrument (0', ι_T')'.

Effect stationarity

More internal instruments can be found for the equation in levels when some of the regressors are known to be uncorrelated with the individual effects. We will assume that after first differencing some of the explanatory variables may be uncorrelated with η_i. Let r°_it = (x°_it' w°_it' v°_it')' contain the K° = K°_x + K°_w + K°_v unique elements of r_it which are effect stationary, by which we mean that E(r°_itη_i) is time-invariant, so that E(Δr°_itη_i) = 0, ∀i, t = 2, …, T. This implies that for the equation in levels the orthogonality conditions

E[Δx°_it(η_i + ε_is)] = 0,  E[Δw°_it(η_i + ε_{i,t+l})] = 0,  E[Δv°_it(η_i + ε_{i,t+1+l})] = 0,  ∀i, t > 1, s ≥ 1, l ≥ 0  (4.53)

hold. When w_it includes y_{i,t−1}, then apparently y_it is effect stationary, so that the adopted model (4.21) suggests that all regressors in r_it must be effect stationary, resulting in K° = K_x + K_w + K_v. Like for the T−1 conditions E(Δε_it) = 0 discussed below (4.26), many of the conditions (4.53) are already implied by the orthogonality conditions E(Z_i'ε̃_i) = 0 for the equation in first-differences. In Appendix 4.C we demonstrate that a matrix Ż^s_i of instruments can be designed for the equation in levels (4.26), just containing instruments additional to those already exploited by E(Z_i'ε̃_i) = 0, whilst E[Ż^{s'}_i(η_iι_T + ε_i)] = 0. This is the T × L° matrix

Ż^s_i = (ι_T  Ż_i^x  Ż_i^w  Ż_i^v),  (4.54)

where L° = 1 + K°(T−1) − K°_v, with

Ż_i^x = [ 0'        0'        ⋯  0'
          Δx°_i2'   0'        ⋯  0'
          0'        Δx°_i3'   ⋯  0'
          ⋮                   ⋱
          0'        0'        ⋯  Δx°_iT' ],

Ż_i^w defined analogously with Δw°_i2', …, Δw°_iT' on the diagonal below a zero first row, and

Ż_i^v = [ 0'        0'            ⋯  0'
          0'        0'            ⋯  0'
          Δv°_i2'   0'            ⋯  0'
          ⋮                       ⋱
          0'        0'            ⋯  Δv°_{i,T−1}' ].

Under effect stationarity of the K° variables in (4.53), the system (4.50) can be estimated while exploiting the matrix of instruments

Z̈^s_i = ( Z_i  O ;  O  Ż^s_i ).  (4.55)
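For a single effect-stationary strictly exogenous regressor (K°_x = 1), the block Ż_i^x of (4.54) has a zero first row and Δx_it (t = 2, …, T) on the diagonal; a small sketch (Python, hypothetical data):

```python
# The block of level-equation instruments of (4.54) for one effect-stationary
# strictly exogenous regressor; a sketch.
import numpy as np

def level_block_x(x):
    T = len(x)
    Z = np.zeros((T, T - 1))
    Z[1:, :] = np.diag(np.diff(x))   # row t holds Delta-x_it, first row zero
    return Z

print(level_block_x(np.array([1.0, 1.5, 1.2, 2.0])))
```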

If one decides to collapse the instruments included in Z_i, it seems reasonable to collapse Ż^s_i as well and replace it by

Z̄̇^s_i = [ 1  0'        0'        0'
           1  Δx°_i2'   Δw°_i2'   0'
           1  Δx°_i3'   Δw°_i3'   Δv°_i2'
           ⋮
           1  Δx°_iT'   Δw°_iT'   Δv°_{i,T−1}' ].  (4.56)

Note that Z̄̇^s_i has L̄° = 1 + K° columns.

Alternative weighting matrices under effect stationarity

For the above system we have

N^{-1/2} Σ_{i=1}^N Z̈^{s'}_i ü_i →d N(0, plim N^{-1} Σ_{i=1}^N Φ_i),  (4.57)

with

Φ_i = ( Z_i'DΩ_iD'Z_i   Z_i'DΩ_iŻ^s_i ;  Ż^{s'}_iΩ_iD'Z_i   Ż^{s'}_i(Ω_i + σ²_ηι_Tι_T')Ż^s_i ).  (4.58)

Hence a feasible initial weighting matrix is given by

S^{(0)}(q) = (Σ_{i=1}^N Φ^{(0)}_i(q))^{-1},  (4.59)

where

Φ^{(0)}_i(q) = ( Z_i'DD'Z_i   Z_i'DŻ^s_i ;  Ż^{s'}_iD'Z_i   Ż^{s'}_i(I_T + qι_Tι_T')Ż^s_i ),

with q some nonnegative real value. Weighting matrix S^{(0)}(q) would be optimal if Ω_i = σ²_εI_T with q = σ²_η/σ²_ε. The 1-step GMM system estimator, consistent for any q ≥ 0, is given by

α̂^{(1)}(q) = [(Σ_i R̈_i'Z̈^s_i)S^{(0)}(q)(Σ_i Z̈^{s'}_iR̈_i)]^{-1}(Σ_i R̈_i'Z̈^s_i)S^{(0)}(q)(Σ_i Z̈^{s'}_iÿ_i).  (4.60)

Next, in a second step, the consistent 1-step residuals ü^{(1)}_i = ÿ_i − R̈_iα̂^{(1)}(q) can be used to construct the asymptotically optimal weighting matrix

Ŝ^{(1)}_a = (Σ_{i=1}^N Z̈^{s'}_iü^{(1)}_iü^{(1)'}_iZ̈^s_i)^{-1},  (4.61)

where ü^{(1)}_i = (ε̃^{s(1)'}_i û^{s(1)'}_i)', with ε̃^{s(1)}_i = ỹ_i − R̆_iα̂^{(1)}(q) and û^{s(1)}_i = y_i − R_iα̂^{(1)}(q). However,

several alternatives are possible. Consider the weighting matrix

Ŝ^{(1)}_b = [Σ_i ( Z_i'Ĥ^{s(1)}_iZ_i   Z_i'D̂^{s(1)}_iŻ^s_i ;  Ż^{s'}_iD̂^{s(1)'}_iZ_i   Ż^{s'}_iû^{s(1)}_iû^{s(1)'}_iŻ^s_i )]^{-1},  (4.62)

where Ĥ^{s(1)}_i is self-evidently like Ĥ^{(1)b}_i but on the basis of the residuals ε̃^{s(1)}_i, and D̂^{s(1)}_i is the (T−1)×T bidiagonal matrix

D̂^{s(1)}_i = [ ε̃^{s(1)}_{i2}û^{s(1)}_{i1}   ε̃^{s(1)}_{i2}û^{s(1)}_{i2}   0    ⋯  0
               0   ε̃^{s(1)}_{i3}û^{s(1)}_{i2}   ε̃^{s(1)}_{i3}û^{s(1)}_{i3}   ⋯  0
               ⋮                                ⋱
               0   ⋯   ε̃^{s(1)}_{iT}û^{s(1)}_{i,T−1}   ε̃^{s(1)}_{iT}û^{s(1)}_{iT} ].  (4.63)

For the special case σ²_εΩ_i = σ²_{ε,i}I_T of cross-section heteroskedasticity and time-series homoskedasticity one can use the weighting matrix

Ŝ^{(1)}_c = [Σ_i σ̂^{2,s(1)}_{ε,i} ( Z_i'HZ_i   Z_i'DŻ^s_i ;  Ż^{s'}_iD'Z_i   Ż^{s'}_i[I_T + (σ̂^{2,s(1)}_η/σ̂^{2,s(1)}_{ε,i})ι_Tι_T']Ż^s_i )]^{-1},  (4.64)

where

σ̂^{2,s(1)}_{ε,i} = ε̃^{s(1)'}_iH^{-1}ε̃^{s(1)}_i/(T−1),  (4.65)
σ̂^{2,s(1)}_η = Σ_{i=1}^N [(ι_T'û^{s(1)}_i)² − û^{s(1)'}_iû^{s(1)}_i]/[NT(T−1)].  (4.66)

Note that, unlike the former, the latter is consistent, because for t ≠ s

plim N^{-1} Σ_i û^{s(1)}_itû^{s(1)}_is = lim N^{-1} Σ_i E(u_itu_is) = lim N^{-1} Σ_i E[(η_i + ε_it)(η_i + ε_is)] = σ²_η.

The three alternatives yield 2-step system estimators α̂^{(2)}_{(j)} for j ∈ {a, b, c}, where

α̂^{(2)}_{(j)} = [(Σ_i R̈_i'Z̈^s_i)Ŝ^{(1)}_j(Σ_i Z̈^{s'}_iR̈_i)]^{-1}(Σ_i R̈_i'Z̈^s_i)Ŝ^{(1)}_j(Σ_i Z̈^{s'}_iÿ_i),  (4.67)

and the inverse matrix expression can again be used to estimate the variance of α̂^{(2)}_{(j)} if all employed moment conditions are valid.
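The estimator (4.66) only uses sums and cross-products of the level residuals; a direct implementation is sketched below (Python, with u_hat assumed to be an N×T array holding the û_i^{s(1)}; illustrative only):

```python
# Sketch of the variance-component estimator (4.66).
import numpy as np

def sigma2_eta(u_hat):
    N, T = u_hat.shape
    num = ((u_hat.sum(axis=1) ** 2) - (u_hat ** 2).sum(axis=1)).sum()
    return num / (N * T * (T - 1))   # averages the off-diagonal products
```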

4.3.4 Coefficient restriction tests

Simple Student-type coefficient test statistics can be obtained from 1-step and 2-step AB and BB estimation for the different weighting matrices considered. The 1-step estimators can be used in combination with a robust variance estimate (which takes possible heteroskedasticity into account). The 2-step estimators can be used in combination with the standard or a corrected variance estimate.²

When testing particular coefficient values, the relevant element of estimator α̃^{(1)} given in (4.40) should under homoskedasticity be scaled by the corresponding diagonal element of the standard expression for its estimated variance, given by

Var̂(α̃^{(1)}) = (N^{-1} Σ_{i=1}^N σ̂^{2(1)}_{ε,i}) Ψ, with Ψ = [(Σ_i R̃_i'Z_i)G^{(0)}(Σ_i Z_i'R̃_i)]^{-1},  (4.68)

where σ̂^{2(1)}_{ε,i} is given in (4.46). Its robust version under cross-sectional heteroskedasticity uses, for j ∈ {a, b, c},

Var̂_{(j)}(α̃^{(1)}) = Ψ(Σ_i R̃_i'Z_i)G^{(0)}[Ĝ^{(1)}_j]^{-1}G^{(0)}(Σ_i Z_i'R̃_i)Ψ.  (4.69)

However, under heteroskedasticity the estimators α̃^{(2)}_{(j)} given in (4.47) are more efficient. The standard estimator for their variance is

Var̂(α̃^{(2)}_{(j)}) = [(Σ_i R̃_i'Z_i)Ĝ^{(1)}_j(Σ_i Z_i'R̃_i)]^{-1}.  (4.70)

The corrected version Var̂_c(α̃^{(2)}_{(j)}) requires, for k = 1, …, K−1, derivation of the actual implementation of the matrix ∂Ω(β)/∂β_k of Appendix 4.A, which is here N(T−1) × N(T−1). We denote its i-th block as ∂Ω_{(j)i}(α̃)/∂α̃_k. For the a-type weighting matrix³ the relevant (T−1)×(T−1) matrix ∂ε̃_iε̃_i'/∂α̃_k, with ε̃_i = ỹ_i − R̃_iα̃, is −(ε̃_iR̃_{ik}' + R̃_{ik}ε̃_i'), where R̃_{ik} denotes the k-th column of R̃_i. For weighting matrix b it simplifies to the matrix consisting of the main diagonal and the two first sub-diagonals of −(ε̃_iR̃_{ik}' + R̃_{ik}ε̃_i'), with all other elements zero. And ∂Ω_{(c)i}(α̃)/∂α̃_k = −2[ε̃_i'H^{-1}R̃_{ik}/(T−1)]H. So, we find

Var̂_c(α̃^{(2)}_{(j)}) = Var̂(α̃^{(2)}_{(j)}) + F̂_{(j)}Var̂(α̃^{(2)}_{(j)}) + Var̂(α̃^{(2)}_{(j)})F̂_{(j)}' + F̂_{(j)}Var̂_{(j)}(α̃^{(1)})F̂_{(j)}',  (4.71)

² Many authors and the Stata program confusingly address the corrected 2-step variance as robust.
³ This is the only variant considered in Windmeijer (2005).

with the k-th column of F̂_{(j)} given by

F̂_{(j)k} = −Var̂(α̃^{(2)}_{(j)})(Σ_i R̃_i'Z_i)Ĝ^{(1)}_j (Σ_i Z_i' [∂Ω_{(j)i}(α̃)/∂α̃_k]|_{α̃^{(1)}} Z_i) Ĝ^{(1)}_j (Σ_i Z_i'ε̃^{(2)}_i).  (4.72)

All the above expressions become a bit more complex when considering Blundell-Bond estimation of the K coefficients α. The suboptimal 1-step estimator (4.60) of α should not be used for testing, unless in combination with

Var̂(α̂^{(1)}) = σ̂^{2,s(1)}_ε Φ̈(Σ_i R̈_i'Z̈^s_i)S^{(0)}(q)[S^{(0)}(σ̂^{2,s(1)}_η/σ̂^{2,s(1)}_ε)]^{-1}S^{(0)}(q)(Σ_i Z̈^{s'}_iR̈_i)Φ̈  (4.73)

under homoskedasticity, or a robust variance estimator, which is

Var̂_{(j)}(α̂^{(1)}) = Φ̈(Σ_i R̈_i'Z̈^s_i)S^{(0)}(q)[Ŝ^{(1)}_j]^{-1}S^{(0)}(q)(Σ_i Z̈^{s'}_iR̈_i)Φ̈,  (4.74)

where Φ̈ = [(Σ_i R̈_i'Z̈^s_i)S^{(0)}(q)(Σ_i Z̈^{s'}_iR̈_i)]^{-1}. It seems better of course to use the efficient estimator α̂^{(2)}_{(j)} of (4.67). The standard expression for its estimated variance is

Var̂(α̂^{(2)}_{(j)}) = [(Σ_i R̈_i'Z̈^s_i)Ŝ^{(1)}_j(Σ_i Z̈^{s'}_iR̈_i)]^{-1}.  (4.75)

The corrected versions can be obtained by a formula similar to (4.71), upon changing α̃ into α and F̂_{(j)k} of (4.72) into

F̈̂_{(j)k} = −Var̂(α̂^{(2)}_{(j)})(Σ_i R̈_i'Z̈^s_i)Ŝ^{(1)}_j (Σ_i Z̈^{s'}_i [∂Ω̈_{(j)i}(α)/∂α_k]|_{α̂^{(1)}} Z̈^s_i) Ŝ^{(1)}_j (Σ_i Z̈^{s'}_iü^{(2)}_i),  (4.76)

where the block of ∂Ω̈_{(j)i}(α)/∂α_k corresponding to the equation in first differences is similar to before, but with an extra column and row of zeros for the intercept. The block corresponding to the equation in levels we took for weighting matrices a and b equal to ∂u_iu_i'/∂α_k = −(u_iR_{ik}' + R_{ik}u_i'), and for type c equal to

∂{ε̃_i'H^{-1}ε̃_i/(T−1) + Σ_{i=1}^N [(ι_T'u_i)² − u_i'u_i]/[NT(T−1)]}I_T/∂α_k,

for which the first term yields −2[(ε̃_i'H^{-1}R̆_{ik})/(T−1)]I_T, and the second gives

−2{Σ_{i=1}^N [(ι_T'u_i)(ι_T'R_{ik}) − R_{ik}'u_i]/[NT(T−1)]}I_T.

For the nondiagonal upper block of ∂Ω̈_{(j)i}(α)/∂α_k we took in cases a and b ∂ε̃_iu_i'/∂α_k = −(ε̃_iR_{ik}' + R̆_{ik}u_i'), and for the derivative with respect to the intercept −ε̃_iι_T'. In case c it is −2[(ε̃_i'H^{-1}R̆_{ik})/(T−1)]D, and a zero matrix for the derivative with respect to the intercept.

4.3.5 Tests of overidentification restrictions

Using Arellano-Bond and Blundell-Bond type estimation, many options exist with respect to testing the overidentification restrictions. These options differ in the residuals and weighting matrices being employed. After 1-step Arellano-Bond estimation, see (4.40) and (4.46), we have the test statistic

JAB^{(1,0)} = (Σ_i ε̃^{(1)'}_iZ_i)(Σ_i Z_i'HZ_i)^{-1}(Σ_i Z_i'ε̃^{(1)}_i) / (N^{-1}Σ_i σ̂^{2(1)}_i),  (4.77)

which is only valid in case of homoskedasticity. Alternatively, after 1-step estimation the heteroskedasticity-robust test statistics

JAB^{(1,1)}_j = (Σ_i ε̃^{(1)'}_iZ_i)Ĝ^{(1)}_j(Σ_i Z_i'ε̃^{(1)}_i), j ∈ {a, b, c},  (4.78)

may be used, where Ĝ^{(1)}_j is given in (4.41), (4.42) and (4.44). Given the type of weighting matrix being used, one obtains 2-step residuals ε̃^{(2)}_{i(j)} = ỹ_i − R̃_iα̃^{(2)}_{(j)}, and from these overidentification restrictions test statistics can be obtained, which may differ depending on whether the weighting matrix is still obtained from 1-step or already from 2-step residuals.⁴ This leads to

JAB^{(2,h)}_j = (Σ_i ε̃^{(2)'}_{i(j)}Z_i)Ĝ^{(h)}_j(Σ_i Z_i'ε̃^{(2)}_{i(j)}), for h ∈ {1, 2},  (4.79)

where the 2-step weighting matrices are either Ĝ^{(2)}_a = (Σ_i Z_i'ε̃^{(2)}_{i(a)}ε̃^{(2)'}_{i(a)}Z_i)^{-1}, or Ĝ^{(2)}_b = (Σ_i Z_i'Ĥ^{(2)b}_iZ_i)^{-1}, where Ĥ^{(2)b}_i is like Ĥ^{(1)b}_i of (4.43) though using ε̃^{(2)}_{i(b)} instead of ε̃^{(1)}_i, or Ĝ^{(2)}_c = (Σ_i σ̂^{2(2)}_{i(c)}Z_i'HZ_i)^{-1}; furthermore σ̂^{2(2)}_{i(c)} = ε̃^{(2)'}_{i(c)}H^{-1}ε̃^{(2)}_{i(c)}/(T−1).

⁴ Package xtabond2 for Stata always reports JAB^{(1,0)} after Arellano-Bond estimation, which is inappropriate when there is heteroskedasticity. After requesting robust standard errors in 1-step estimation it also presents JAB^{(2,1)}_a instead of JAB^{(1,1)}_a. Requesting 2-step estimation also presents both JAB^{(1,0)} and JAB^{(2,1)}_a. Blundell-Bond estimation yields JBB^{(1,0)} and JBB^{(2,1)}_a, although a version of JBB^{(1,0)} is reported that does not use weighting matrix S^{(0)}(σ̂^{2,s(1)}_η/σ̂^{2,s(1)}_ε), but S^{(0)}(0), which is only valid under homoskedasticity and σ²_η = 0.

Exploiting effect stationarity of a subset of the regressors by estimating the Blundell-Bond system leads to the 1-step test statistics

JBB^{(1,0)} = (Σ_i ü^{(1)'}_iZ̈^s_i)S^{(0)}(σ̂^{2,s(1)}_η/σ̂^{2,s(1)}_ε)(Σ_i Z̈^{s'}_iü^{(1)}_i)/σ̂^{2,s(1)}_ε,  (4.80)
JBB^{(1,1)}_j = (Σ_i ü^{(1)'}_iZ̈^s_i)Ŝ^{(1)}_j(Σ_i Z̈^{s'}_iü^{(1)}_i), j ∈ {a, b, c},  (4.81)

where σ̂^{2,s(1)}_ε = Σ_i σ̂^{2,s(1)}_{ε,i}/N, and S^{(0)}(·) and Ŝ^{(1)}_j can be found in (4.59), (4.61), (4.62) and (4.64). Defining the various 2-step residuals and variance estimators as ü^{(2)}_{i(j)} = ÿ_i − R̈_iα̂^{(2)}_{(j)} = (ε̃^{s(2)'}_{i(j)} û^{s(2)'}_{i(j)})', and σ̂^{2,s(2)}_{ε,i(j)} and σ̂^{2(2)}_{η(j)} similar to (4.65) and (4.66) though obtained from the appropriate two-step residuals ε̃^{s(2)}_{i(j)} = ỹ_i − R̆_iα̂^{(2)}_{(j)} and û^{s(2)}_{i(j)} = y_i − R_iα̂^{(2)}_{(j)}, the statistics to be used after 2-step estimation are

JBB^{(2,h)}_j = (Σ_i ü^{(2)'}_{i(j)}Z̈^s_i)Ŝ^{(h)}_j(Σ_i Z̈^{s'}_iü^{(2)}_{i(j)}),  (4.82)

where Ŝ^{(2)}_a and Ŝ^{(2)}_b are like Ŝ^{(1)}_a and Ŝ^{(1)}_b, except that they use ü^{(2)}_{i(a)} and ü^{(2)}_{i(b)} instead of ü^{(1)}_i. With respect to Ŝ^{(2)}_c one can use

Ŝ^{(2)}_c = [Σ_i σ̂^{2,s(2)}_{ε,i(c)} ( Z_i'HZ_i   Z_i'DŻ^s_i ;  Ż^{s'}_iD'Z_i   Ż^{s'}_i[I_T + (σ̂^{2(2)}_{η(c)}/σ̂^{2,s(2)}_{ε,i(c)})ι_Tι_T']Ż^s_i )]^{-1}.

Under their respective null hypotheses the tests based on Arellano-Bond estimation asymptotically follow χ² distributions with L − K + 1 degrees of freedom, whereas the tests based on Blundell-Bond estimates have L + L° − K degrees of freedom. Self-evidently, tests on the effect stationarity related orthogonality conditions are given by

JES^{(1,0)} = JBB^{(1,0)} − JAB^{(1,0)},  (4.83)
JES^{(l,h)}_j = JBB^{(l,h)}_j − JAB^{(l,h)}_j, 0 < l ≤ h ∈ {1, 2}, j ∈ {a, b, c},  (4.84)

and should be compared with a χ² critical value for L° − 1 degrees of freedom.⁵

⁵ By specifying instruments in separate groups, xtabond2 presents for each separate group the corresponding incremental J test. However, not the version as defined in (4.17), but an asymptotically equivalent one, as suggested in Hayashi (2000, p. 220), which will never be negative.

4.3.6 Modified GMM

In the special case that panel model (4.25) has cross-sectional heteroskedasticity and no time-series heteroskedasticity, hence

σ²_εΩ_i = σ²_{ε,i}I_T, with Σ_{i=1}^N σ²_{ε,i} = σ²_εN,  (4.85)

we can easily employ MGMM estimator (4.11). However, because H^{-1} is not a lower-triangular matrix, not all instruments σ^{-2}_{ε,i}H^{-1}Z_i would be valid for the equation in first-differences. This problem can be avoided by using, instead of first-differencing, the forward orthogonal deviation (FOD) transformation for removing the individual effects. Let

B = diag( ((T−1)/T)^{1/2}, ((T−2)/(T−1))^{1/2}, …, (1/2)^{1/2} ) ×
    [ 1  −1/(T−1)  −1/(T−1)  ⋯  −1/(T−1)  −1/(T−1)
      0   1        −1/(T−2)  ⋯  −1/(T−2)  −1/(T−2)
      ⋮                      ⋱
      0   0         0        ⋯   1        −1       ],  (4.86)

and ε̌_i = Bε_i. Then Bu_i = ε̌_i ∼ (0, σ²_{ε,i}I_{T−1}) provided (4.85) holds, whereas E(Z_i'ε̌_i) = 0 for Z_i given by (4.31). Hence, estimating the model

y̌_i = Ř_iα̃ + ε̌_i,  (4.87)

where y̌_i = By_i and Ř_i = BR̊_i, by GMM, but using an instrument matrix with components σ^{-2}_{ε,i}Z_i, yields the unfeasible MABu estimator for the model with cross-sectional heteroskedasticity, which is

α̃_MABu = [(Σ_i σ^{-2}_{ε,i}Ř_i'Z_i)(Σ_i σ^{-2}_{ε,i}Z_i'Z_i)^{-1}(Σ_i σ^{-2}_{ε,i}Z_i'Ř_i)]^{-1}
          (Σ_i σ^{-2}_{ε,i}Ř_i'Z_i)(Σ_i σ^{-2}_{ε,i}Z_i'Z_i)^{-1}(Σ_i σ^{-2}_{ε,i}Z_i'y̌_i).  (4.88)

Note that the exploited moment conditions are here E(σ^{-2}_{ε,i}Z_i'ε̌_i) = σ^{-2}_{ε,i}E(Z_i'ε̌_i) = 0. For σ²_{ε,i} > 0 these are intrinsically equivalent with E(Z_i'ε̌_i) = 0, but they induce the use of a different set of instruments, yielding a different estimator. That the unfeasible standard AB estimator ABu, which uses instruments σ_{ε,i}Z_i for the regressors σ^{-1}_{ε,i}Ř_i, will generally exploit weaker instruments than MABu, which uses σ^{-1}_{ε,i}Z_i for the regressors σ^{-1}_{ε,i}Ř_i, should be intuitively obvious.
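The matrix B of (4.86) satisfies BB' = I_{T−1} and Bι_T = 0, which is exactly why the FOD transformation removes the individual effects while keeping a scalar covariance structure under (4.85). A sketch constructing B and verifying both properties numerically (Python; illustrative only):

```python
# Sketch of the forward orthogonal deviations matrix B of (4.86), with a
# numerical check that B B' = I_{T-1} and B iota_T = 0.
import numpy as np

def fod_matrix(T):
    B = np.zeros((T - 1, T))
    for t in range(T - 1):
        c = np.sqrt((T - t - 1) / (T - t))
        B[t, t] = c                       # current observation
        B[t, t + 1:] = -c / (T - t - 1)   # mean of all future observations
    return B

B = fod_matrix(5)
print(np.allclose(B @ B.T, np.eye(4)), np.allclose(B @ np.ones(5), 0.0))
```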

To convert this into a feasible procedure, one could initially assume that all σ²_{ε,i} are equal. Then the first-step MGMM estimator α̃^{(1)}_MAB is numerically equivalent to AB1 of (4.40), provided all instruments are being used.⁶ Next, exploiting (4.46), the feasible 2-step MAB2 estimator can be obtained by

α̃^{(2)}_MAB2 = [(Σ_i Ř_i'Z_i/σ̂^{2(1)}_{ε,i})(Σ_i Z_i'Z_i/σ̂^{2(1)}_{ε,i})^{-1}(Σ_i Z_i'Ř_i/σ̂^{2(1)}_{ε,i})]^{-1}
               (Σ_i Ř_i'Z_i/σ̂^{2(1)}_{ε,i})(Σ_i Z_i'Z_i/σ̂^{2(1)}_{ε,i})^{-1}(Σ_i Z_i'y̌_i/σ̂^{2(1)}_{ε,i}).  (4.89)

Modifying the system estimator is more problematic, primarily because the inverse of the matrix Var(u_i) = Σ_i = σ²_{ε,i}I_T + σ²_ηι_Tι_T', which is

Σ^{-1}_i = σ^{-2}_{ε,i}[I_T − (T + σ²_{ε,i}/σ²_η)^{-1}ι_Tι_T'],

is nondiagonal. So, although E(Ż^{s'}_iu_i) = 0, surely E(Ż^{s'}_iΣ^{-1}_iu_i) ≠ 0. However, as an unfeasible modified system estimator we can combine estimation of the model for y̌_i using instruments σ^{-2}_{ε,i}Z_i with estimation of the model for y_i using instruments (σ²_{ε,i} + σ²_η)^{-1}Ż^s_i. So, the system is then given by the model

y⃛_i = R⃛_iα + u⃛_i,  (4.90)

where y⃛_i = (y̌_i' y_i')', R⃛_i = (Ř̆_i' R_i')' with Ř̆_i = (Ř_i 0), and u⃛_i = (ε̌_i' u_i')'. For the 1-step estimator we could again choose some nonnegative value q and calculate the 1-step estimator BB1 given in (4.60), in order to find residuals and obtain the estimators σ̂^{2,s(1)}_{ε,i} and σ̂^{2,s(1)}_η of (4.65) and (4.66). Building on

E(u⃛_iu⃛_i') = σ²_{ε,i} ( I_{T−1}   B ;  B'   I_T + (σ²_η/σ²_{ε,i})ι_Tι_T' )

and the instrument matrix block Z⃛_i, given by

Z⃛_i = ( σ̂^{-2,s(1)}_{ε,i}Z_i   O ;  O   (σ̂^{2,s(1)}_{ε,i} + σ̂^{2,s(1)}_η)^{-1}Ż^s_i ),  (4.91)

one obtains the weighting matrix

Ŝ^{B(1)}_c = [Σ_i ( σ̂^{-2,s(1)}_{ε,i}Z_i'Z_i   (σ̂^{2,s(1)}_{ε,i} + σ̂^{2,s(1)}_η)^{-1}Z_i'BŻ^s_i ;
                   (σ̂^{2,s(1)}_{ε,i} + σ̂^{2,s(1)}_η)^{-1}Ż^{s'}_iB'Z_i   (σ̂^{2,s(1)}_{ε,i} + σ̂^{2,s(1)}_η)^{-2}Ż^{s'}_i[σ̂^{2,s(1)}_{ε,i}I_T + σ̂^{2,s(1)}_ηι_Tι_T']Ż^s_i )]^{-1},

which can be exploited in the feasible 2-step MGMM system estimator MBB2:

α̂^{(2)}_MBB2 = [(Σ_i R⃛_i'Z⃛_i)Ŝ^{B(1)}_c(Σ_i Z⃛_i'R⃛_i)]^{-1}(Σ_i R⃛_i'Z⃛_i)Ŝ^{B(1)}_c(Σ_i Z⃛_i'y⃛_i).  (4.92)

⁶ Proved in Arellano and Bover (1995).

For both α̃^{(2)}_MAB2 and α̂^{(2)}_MBB2 relevant t-test and Sargan-Hansen test statistics can be constructed. Regarding the latter we will just examine

JMAB = (Σ_i ε̌^{(2)'}_iZ_i/σ̂^{2,s(1)}_{ε,i})(Σ_i Z_i'Z_i/σ̂^{2,s(1)}_{ε,i})^{-1}(Σ_i Z_i'ε̌^{(2)}_i/σ̂^{2,s(1)}_{ε,i}),  (4.93)

where ε̌^{(2)}_i = y̌_i − Ř_iα̃^{(2)}_MAB2, and

JMBB = (Σ_i u⃛^{(2)'}_iZ⃛_i)Ŝ^{B(1)}_c(Σ_i Z⃛_i'u⃛^{(2)}_i),  (4.94)

with u⃛^{(2)}_i = y⃛_i − R⃛_iα̂^{(2)}_MBB2 = (ε̌^{s(2)'}_i û^{s(2)'}_i)'. Under their respective null hypotheses these asymptotically follow χ² distributions with L − K + 1 and L + L° − K degrees of freedom. Self-evidently, the test on the effect stationarity related orthogonality conditions is given by

JESM = JMBB − JMAB.  (4.95)

4.4 Intermediate conclusions

In social science the quantitative analysis of many highly relevant problems requires structural dynamic panel data methods. These allow the observed data to have at best a quasi-experimental nature, whereas the causal structure and the dynamic interactions in the presence of unobserved heterogeneity have yet to be unraveled. When the cross-section dimension of the sample is not very small, employing GMM techniques seems most appropriate in such circumstances. This is also practical since corresponding software packages are widely available. However, not too much is known yet about the actual accuracy in practical situations of the abundance of different, not always asymptotically equivalent, implementations of estimators and test procedures.

This study serves as the technical prelude to the next chapter, which aims to demarcate the areas in the parameter space where the asymptotic approximations to the properties of the relevant inference techniques in this context have shown to be either reliable beacons or actually often misguiding marsh fires. For that purpose we first had to provide, for this specific panel data context, a rather rigorous treatment of many major variants of GMM implementations, as well as of the inference techniques for testing the validity of particular orthogonality assumptions and restrictions on individual coefficient values. Special attention is given to the consequences of the joint presence in the model of time-constant and individual-constant unobserved effects, covariates that may be strictly exogenous, predetermined or endogenous, and dis-

turbances that may show particular forms of heteroskedasticity. Also the implications regarding initial conditions for separate regressors with respect to individual effect stationarity are analyzed in great detail. And various popular options that aim to mitigate bias in coefficient estimates by reducing the number of exploited internal instruments, and to mitigate bias in variance estimates by a small-sample correction, are elucidated. In addition, as alternatives to those used in current standard software, less robust weighting matrices and additional variants of Sargan-Hansen test implementations are considered, as well as the effects of particular modifications of the instruments under heteroskedasticity. For the actual simulation findings, the resulting recommendations to practitioners, and for an empirical illustration, see the next chapter.

Appendix 4.A Corrected variance estimation for 2-step GMM

Windmeijer (2005) provides a correction to the standard expression for the estimated variance of the 2-step GMM estimator in general nonlinear models, and next specializes his results for models with linear moment conditions and finally for linear (panel data) models. Here we apply his approach directly to the standard linear model of Section 4.2.1, where β̂^{(2)} is based on the weighting matrix (Z'Ω̂^{(1)}Z)^{-1}, in which Ω̂^{(1)} depends on û^{(1)} and thus on β̂^{(1)}. The nonlinear dependence of β̂^{(2)} on β̂^{(1)} can be made explicit by a linear approximation obtained by applying the well-known delta-method to the vector function

f(β) = {X'Z[Z'Ω(β)Z]^{-1}Z'X}^{-1}X'Z[Z'Ω(β)Z]^{-1}Z'u.

Note that β̂^{(2)} = β₀ + f(β̂^{(1)}). Expanding the second term around β₀ yields

β̂^{(2)} − β₀ ≈ f(β₀) + [∂f(β)/∂β']|_{β=β₀}(β̂^{(1)} − β₀),  (4.96)

where under sufficient regularity the omitted terms will be of small order. For k = 1, …, K we find

∂f(β)/∂β_k = [∂{X'Z[Z'Ω(β)Z]^{-1}Z'X}^{-1}/∂β_k] X'Z[Z'Ω(β)Z]^{-1}Z'u
             + {X'Z[Z'Ω(β)Z]^{-1}Z'X}^{-1} ∂{X'Z[Z'Ω(β)Z]^{-1}Z'u}/∂β_k,

where

∂{X'Z[Z'Ω(β)Z]^{-1}Z'X}^{-1}/∂β_k = −{X'Z[Z'Ω(β)Z]^{-1}Z'X}^{-1} X'Z [∂[Z'Ω(β)Z]^{-1}/∂β_k] Z'X {X'Z[Z'Ω(β)Z]^{-1}Z'X}^{-1},

with

∂[Z'Ω(β)Z]^{-1}/∂β_k = −[Z'Ω(β)Z]^{-1} Z'[∂Ω(β)/∂β_k]Z [Z'Ω(β)Z]^{-1},

and

∂{X'Z[Z'Ω(β)Z]^{-1}Z'u}/∂β_k = X'Z [∂[Z'Ω(β)Z]^{-1}/∂β_k] Z'u.

In the latter we omit an extra term in ∂u/∂β_k = ∂(y − Xβ)/∂β_k, simply because we just want to extract the dependence of β̂^{(2)} on the operational weighting matrix. Using the short-hand notation A(β) = Z[Z'Ω(β)Z]^{-1}Z' and Ω_k(β) = ∂Ω(β)/∂β_k, we can establish from the above that

∂f(β)/∂β_k = −[X'A(β)X]^{-1}X'A(β)Ω_k(β){I − A(β)X[X'A(β)X]^{-1}X'}A(β)u.

This is the k-th column of the matrix F(β) = ∂f(β)/∂β' in (4.96). The latter can now be expressed as

β̂^{(2)} − β₀ ≈ {[X'A(β₀)X]^{-1}X'A(β₀) + F(β₀)(X'P_ZX)^{-1}X'P_Z}u.  (4.97)

Because F(β₀) = O_p(n^{-1/2}), the second term is of smaller order. This approximation to the estimation errors of β̂^{(2)} can be used to obtain a finite sample corrected variance estimate of β̂^{(2)}. This is relatively easy if one conditions on some value for F(β₀), say F̂. Windmeijer chooses for the k-th column of F̂ the vector

−[X'A(β̂^{(1)})X]^{-1}X'A(β̂^{(1)})Ω_k(β̂^{(1)}){I − A(β̂^{(1)})X[X'A(β̂^{(1)})X]^{-1}X'}A(β̂^{(1)})û^{(2)}.

Taking û^{(2)} instead of the asymptotically equivalent û^{(1)} leads to substantial simplification, because X'A(β̂^{(1)})û^{(2)} = X'Z[Z'Ω(β̂^{(1)})Z]^{-1}Z'(y − Xβ̂^{(2)}) = 0, giving F̂ = (F̂₁, …, F̂_K), with

F̂_k = −[X'A(β̂^{(1)})X]^{-1}X'A(β̂^{(1)})Ω_k(β̂^{(1)})A(β̂^{(1)})û^{(2)}.  (4.98)

Note that when L = K we have F̂ = O, because Z'û^{(1)} = Z'û^{(2)} = 0. This all then yields for L > K the corrected variance estimator

Var̂_c(β̂^{(2)}) = Var̂(β̂^{(2)}) + F̂Var̂(β̂^{(2)}) + Var̂(β̂^{(2)})F̂' + F̂Var̂(β̂^{(1)})F̂',  (4.99)

where Var̂(β̂^{(1)}) = σ̂²_u(X'P_ZX)^{-1}X'P_ZΩ̂P_ZX(X'P_ZX)^{-1}.
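For the case Ω(β) = diag(u₁², …, u_n²) discussed in the note below, (4.98)–(4.99) can be coded in a few lines. The following Python sketch absorbs scale factors such as σ̂²_u into Ω̂ and is illustrative only, not a reference implementation:

```python
# Sketch of the corrected 2-step variance (4.99) for the linear model with
# Omega(beta) = diag(u_i^2); b1 and b2 are 1-step and 2-step estimates.
import numpy as np

def windmeijer_variance(y, X, Z, b1, b2):
    n, K = X.shape
    u1, u2 = y - X @ b1, y - X @ b2
    # A(beta) = Z [Z' Omega(beta) Z]^{-1} Z', evaluated at 1-step residuals
    A1 = Z @ np.linalg.inv(Z.T @ (u1[:, None]**2 * Z)) @ Z.T
    V2 = np.linalg.inv(X.T @ A1 @ X)                 # standard 2-step variance
    F = np.empty((K, K))
    for k in range(K):
        dOm = np.diag(-2.0 * u1 * X[:, k])           # Omega_k at 1-step residuals
        F[:, k] = -V2 @ (X.T @ A1 @ dOm @ A1 @ u2)   # (4.98)
    PZ = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
    Bm = np.linalg.inv(X.T @ PZ @ X)
    V1 = Bm @ X.T @ PZ @ np.diag(u1**2) @ PZ @ X @ Bm  # robust 1-step variance
    return V2 + F @ V2 + V2 @ F.T + F @ V1 @ F.T       # (4.99)
```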

Note that in case Ω(β) = diag(u²₁, …, u²_n) one has Ω_k(β̂^{(1)}) = −2 diag(û^{(1)}₁x_{1k}, …, û^{(1)}_nx_{nk}).

Appendix 4.B Partialling out and GMM

The IV/2SLS result on partialling out directly generalizes to the MGMM estimator, provided this uses all the (transformed) predetermined regressors as instruments. In standard GMM the equivalence of predetermined regressors and a block of the instruments gets lost. Using the notation of (4.10) and considering the partitioned model leading to (4.8), we easily find its counterpart

β̂_{1,GMM} = (X̂₁*'M_{X̂₂*}X̂₁*)^{-1}X̂₁*'M_{X̂₂*}y*,  (4.100)

where X̂* = (X̂₁* X̂₂*) = P_{Z*}(X₁* X₂*) = P_{(Ψ')^{-1}Z}(ΨX₁ ΨX₂).

In the special case of system (4.50) with instruments (4.52) we have X₂ = Z₂ = (0', ι_T')' and Z₁'Z₂ = 0, whereas under cross-sectional heteroskedasticity, due to Dι_T = 0, the optimal weighting matrix is block-diagonal, hence Z₁'ΩZ₂ = 0. Therefore Z₁*'Z₂* = 0 too, giving P_{Z*} = P_{Z₁*} + P_{Z₂*}. Now we find X̂₁* = P_{Z*}X₁* = (P_{Z₁*} + P_{Z₂*})X₁* and X̂₂* = (P_{Z₁*} + P_{Z₂*})ΨZ₂ = P_{Z₂*}ΨZ₂ = (Ψ')^{-1}Z₂(Z₂'ΩZ₂)^{-1}Z₂'Z₂ = cZ₂*, with c some scalar, because Z₂ has just one column. Therefore M_{X̂₂*} = M_{Z₂*}, and M_{Z₂*}(P_{Z₁*} + P_{Z₂*})X₁* = P_{Z₁*}X₁*. Thus, in this particular case (when using an appropriate weighting matrix), we find

β̂_{1,GMM} = (X̂₁*'M_{X̂₂*}X̂₁*)^{-1}X̂₁*'M_{X̂₂*}y* = (X₁*'P_{Z₁*}X₁*)^{-1}X₁*'P_{Z₁*}y*.

Due to the block of zeros in Z₁* this is just the GMM estimator of the model in first differences.

Appendix 4.C Extracting redundant moment conditions

Through linear transformation⁷ we demonstrate that the sets of moment conditions for the equation in levels and for the equation in first differences have a non-empty intersection. First we consider the moment conditions associated with the strictly exogenous regressors. For the equation in first differences these are $E(x_i^T\Delta\varepsilon_{it})=E[\Delta\varepsilon_{it}(x_{i1}'\ldots x_{iT}')']=0$, for $t=2,\ldots,T$. They can also be represented⁸ by the combination $E[\Delta\varepsilon_{it}(\Delta x_{i2}'\ldots\Delta x_{iT}')']=0$ and $E(x_{it}'\Delta\varepsilon_{it})=0$. However, by a similar transformation⁹ (here of the disturbances instead of the instruments), the conditions for the equation in levels $E[\Delta x_{ith}(\eta_i\iota_T+\varepsilon_i)]=0$, where $h=1,\ldots,K_x$ (and again $t=2,\ldots,T$), can be represented by $E(\Delta x_{ith}\Delta\varepsilon_i')=0$ and $E[\Delta x_{ith}(\eta_i+\varepsilon_{it})]=0$. So, just the $K_x(T-1)$ orthogonality conditions $E[\Delta x_{it}'(\eta_i+\varepsilon_{it})]=0$ for $t=2,\ldots,T$ are additional due to effect stationarity of the $K_x$ strictly exogenous regressors.

Similarly, the orthogonality conditions $E(w_i^{t-1}\Delta\varepsilon_{it})=0$, or $E(w_{is}'\Delta\varepsilon_{it})=0$ for $s=1,\ldots,t-1$ with $t=2,\ldots,T$, can be represented by $E(w_{i,t-1}'\Delta\varepsilon_{it})=0$ for $t=2,\ldots,T$ and $E(\Delta w_{is}'\Delta\varepsilon_{it})=0$ for $t=3,\ldots,T$ and $s=2,\ldots,t-1$. On the other hand, the conditions $E[\Delta w_{it}'(\eta_i+\varepsilon_{i,t+l})]=0$ for $t>1$ and $l\geq 0$ are actually $E[\Delta w_{is}'(\eta_i+\varepsilon_{it})]=0$ for $t=2,\ldots,T$ and $s=2,\ldots,t$, whereas these can be represented by $E(\Delta w_{is}'\Delta\varepsilon_{it})=0$ for $t=3,\ldots,T$ and $s=2,\ldots,t-1$ and $E[\Delta w_{it}'(\eta_i+\varepsilon_{it})]=0$ for $t=2,\ldots,T$. Thus, only the $K_w(T-1)$ conditions $E[\Delta w_{it}'(\eta_i+\varepsilon_{it})]=0$ for $t=2,\ldots,T$ are additional.

Using the same logic, the orthogonality conditions $E(v_i^{t-2}\Delta\varepsilon_{it})=0$ for $t=3,\ldots,T$, which are actually $E(v_{is}'\Delta\varepsilon_{it})=0$ for $t=3,\ldots,T$ and $s=1,\ldots,t-2$, can also be represented by $E(v_{i,t-2}'\Delta\varepsilon_{it})=0$ for $t=3,\ldots,T$ and $E(\Delta v_{is}'\Delta\varepsilon_{it})=0$ for $t=4,\ldots,T$ and $s=2,\ldots,t-2$. However, the conditions $E[\Delta v_{it}'(\eta_i+\varepsilon_{i,t+1+l})]=0$ for $t>1$ and $l\geq 0$ are in fact $E[\Delta v_{is}'(\eta_i+\varepsilon_{it})]=0$ for $t=3,\ldots,T$ and $s=2,\ldots,t-1$, which can also be represented as $E(\Delta v_{is}'\Delta\varepsilon_{it})=0$ for $t=4,\ldots,T$ and $s=2,\ldots,t-2$ and $E[\Delta v_{i,t-1}'(\eta_i+\varepsilon_{it})]=0$ for $t=3,\ldots,T$. Thus, we find that only the $K_v(T-2)$ conditions $E[\Delta v_{i,t-1}'(\eta_i+\varepsilon_{it})]=0$ for $t=3,\ldots,T$ are additional.

⁷ In this Appendix we repeatedly use the result that the $p$ conditions $E(aCb)=0$, where $a$ is a random scalar, $b$ a $p\times 1$ random vector and $C$ a deterministic nonsingular $p\times p$ matrix, are equivalent with the $p$ conditions $E(ab)=0$, because $E(ab)=0\Leftrightarrow CE(ab)=0\Leftrightarrow E(aCb)=0$.
⁸ Here $C=(D'\;e_{T,t})'\otimes I_{K_x}$.
⁹ Now $C=(D'\;e_{T,t})'$.
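To make the transformation in footnotes 8 and 9 concrete, consider a worked example of our own for $T=3$, $K_x=1$ and $t=3$, with $D$ the $(T-1)\times T$ first-difference matrix:
$$C=\begin{pmatrix}D\\ e_{3,3}'\end{pmatrix}=\begin{pmatrix}-1&1&0\\0&-1&1\\0&0&1\end{pmatrix},\qquad C(\eta_i\iota_3+\varepsilon_i)=\begin{pmatrix}\Delta\varepsilon_{i2}\\ \Delta\varepsilon_{i3}\\ \eta_i+\varepsilon_{i3}\end{pmatrix}.$$
Since $C$ is nonsingular, the three level conditions $E[\Delta x_{i3h}(\eta_i\iota_3+\varepsilon_i)]=0$ are equivalent to $E(\Delta x_{i3h}\Delta\varepsilon_{i2})=0$, $E(\Delta x_{i3h}\Delta\varepsilon_{i3})=0$ and $E[\Delta x_{i3h}(\eta_i+\varepsilon_{i3})]=0$, of which only the last one is not already implied by the conditions for the equation in first differences.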


Chapter 5

Accuracy and efficiency of various GMM inference techniques in dynamic micro panel data models: practice

5.1 Introduction

In the previous chapter it has been argued that the available studies on the performance of alternative inference techniques for micro dynamic panel data models have obvious limitations when it comes to advising practitioners on the most effective implementations of estimators and relevant tests under reasonably general circumstances. As a rule, the available simulation studies do not consider various empirically highly relevant issues in conjunction, such as: (i) occurrence and the possible endogeneity of regressors additional to the lagged dependent variable, (ii) occurrence of individual effect (non-)stationarity of both the lagged dependent variable and other regressors, (iii) cross-section and/or time-series heteroskedasticity of the idiosyncratic disturbances, and (iv) variation in signal-to-noise ratios and in the relative prominence of individual effects. Here, many of the techniques explicated in the previous chapter are examined in a much more comprehensive Monte Carlo study than executed before. Next, for illustrative purposes, those simulation results are used as yardsticks when interpreting findings for an empirical panel data set, already examined extensively in earlier literature. The chapter concludes with a long list of suggestions which should support practitioners when choosing between the many available implementations of estimators and test statistics, and next interpreting the resulting GMM-based inference for this kind of model and data, where the time-series

sample size is very small and the cross-section sample size should be much larger.

For the simulation experiments to be reasonably realistic it is argued that a data generating mechanism has to be designed which involves at least 10 design parameters. For these we select some of the grid values by a nonstandard approach, in which the model parameter values are solved from a set of salient general econometric characteristics of the data generating process. To respect reasonable space limitations this chapter presents just a selection of all the obtained simulation results.¹ It is found that many of the asymptotic techniques articulated in the previous chapter which are currently actively employed by many researchers may demonstrate serious inaccuracies in samples of a size not uncommon in practice. This is also the case for more specialized techniques which either limit robustness by adopting extra assumptions, or employ extra iterations, or which involve available finite sample corrections. Therefore, we also deliberately explore in the simulations unfeasible versions of estimators and test statistics. In these, particular nuisance parameter values that are unknown in practice are nevertheless assumed to be known. From their performance often more useful conclusions can be drawn on the particular aspects of the data generating process which have major effects on any inference inaccuracies in finite samples.

The structure of this chapter is as follows. In Section 5.2 the design of the simulation study is presented and the selection of grid values for its parameter values is discussed. Section 5.3 contains a selection of the simulation results and provides a detailed discussion of their consequences. The empirical illustration, which involves data on labor supply earlier examined by Ziliak (1997), can be found in Section 5.4, and in Section 5.5 the major findings are summarized.

5.2 Simulation design

We will examine the stable dynamic simultaneous heteroskedastic DGP ($i=1,\ldots,N$; $t=1,\ldots,T$)
$$y_{it}=\alpha_y+\gamma y_{i,t-1}+\beta x_{it}+\sigma_\eta\eta_i^*+\sigma_\varepsilon\omega_i^{1/2}\varepsilon_{it}^*\qquad(|\gamma|<1).\qquad(5.1)$$

¹ A full set of Monte Carlo results can be obtained from the authors upon request. When in Section 5.3 findings are reported which are not fully supported by one of its many tables, we guarantee (and the reader can verify) that they follow from the supplementary material.

Here $\beta$ has just one element, relating to the (for each $i$) stable autoregressive regressor
$$x_{it}=\alpha_x+\xi x_{i,t-1}+\pi_\eta\eta_i^*+\pi_\lambda\lambda_i^*+\sigma_v\omega_i^{1/2}v_{it}^*,\qquad(5.2)$$
where
$$v_{it}^*=\rho_{v\varepsilon}\varepsilon_{it}^*+(1-\rho_{v\varepsilon}^2)^{1/2}\zeta_{it}^*,\qquad(5.3)$$
with $|\xi|<1$ and $|\rho_{v\varepsilon}|<1$. All random drawings $\eta_i^*,\varepsilon_{it}^*,\lambda_i^*,\zeta_{it}^*$ are IID(0,1) and mutually independent. Parameter $\rho_{v\varepsilon}$ indicates the correlation between the cross-sectionally heteroskedastic disturbances $\varepsilon_{it}=\sigma_\varepsilon\omega_i^{1/2}\varepsilon_{it}^*$ and $v_{it}=\sigma_v\omega_i^{1/2}v_{it}^*$, which are both homoskedastic over time. How we generated the values $\omega_1,\ldots,\omega_N$ and the start-up values $x_{i0}$ and $y_{i0}$, and how we chose relevant numerical values for the other eleven parameters, will be discussed extensively below.

Note that in this DGP $x_{it}$ is either strictly exogenous ($\rho_{v\varepsilon}=0$) or otherwise endogenous²; the only weakly exogenous regressor is $y_{i,t-1}$. Regressor $x_{it}$ may be affected contemporaneously by two independent individual specific effects when $\pi_\eta\neq 0$ and $\pi_\lambda\neq 0$, but also with delays if $\xi\neq 0$. The dependent variable $y_{it}$ may be affected contemporaneously by the (standardized) individual effect $\eta_i^*$, both directly and indirectly; directly if $\sigma_\eta\neq 0$, and indirectly via $x_{it}$ when $\beta\pi_\eta\neq 0$. However, $\eta_i^*$ will also have delayed effects on $y_{it}$ when $\gamma\neq 0$ or $\xi\beta\pi_\eta\neq 0$, and so has $\lambda_i^*$ when $\xi\beta\pi_\lambda\neq 0$.

For the cross-sectional heteroskedasticity we follow an approach similar to Kiviet and Feng (2014). It is determined by both $\eta_i^*$ and $\lambda_i^*$, the two standardized individual effects, and is thus associated with the regressors $x_{it}$ and $y_{i,t-1}$. It follows a lognormal pattern when both $\eta_i^*$ and $\lambda_i^*$ are standard normal, because we take $\omega_i=e^{h_i(\theta)}$, with
$$h_i(\theta)=-\theta^2/2+\theta[\kappa^{1/2}\eta_i^*+(1-\kappa)^{1/2}\lambda_i^*]\sim NID(-\theta^2/2,\theta^2),\qquad(5.4)$$
where $0\leq\kappa\leq 1$. This establishes a lognormal distribution with $E(\omega_i)=1$ and $Var(\omega_i)=e^{\theta^2}-1$. So, for $\theta=0$ the $\varepsilon_{it}$ and $v_{it}$ are homoskedastic. The seriousness of the heteroskedasticity increases with the absolute value of $\theta$. Obviously, for $\kappa=0$ the error components $\eta_i$ and $\varepsilon_{it}$ are independent (like $\eta_i^*$ and $\varepsilon_{it}^*$), but not for $0<\kappa\leq 1$. From $h_i(\theta)/2\sim NID(-\theta^2/4,\theta^2/4)$ it follows that $\omega_i^{1/2}=e^{h_i(\theta)/2}$ is lognormally distributed too, with $E(\omega_i^{1/2})=e^{-\theta^2/8}$. Table 5.1 presents some quantiles of the distributions of $\omega_i$ and $\omega_i^{1/2}$ (taken as the positive square root of $\omega_i$) and the expectation of the latter, in order to disclose the effects of parameter $\theta$. It shows that $\theta\geq 1$ implies pretty serious heteroskedasticity, whereas it may be qualified as mild when $\theta\leq 0.3$, say. Without loss of generality we may choose $\sigma_\varepsilon=1$ and $\alpha_y=\alpha_x=0$.

² Strictly following the notation of the previous chapter, the coefficient $\beta$ should actually be called $\delta$ when $\rho_{v\varepsilon}\neq 0$.
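For concreteness, a minimal sketch of how one replication of this DGP could be generated (our own illustrative Python code; the function name and the choice of numpy's default generator are assumptions, and $\alpha_y=\alpha_x=0$ is imposed as in the text):

import numpy as np

def simulate_dgp(N, T, gamma, beta, xi, pi_eta, pi_lam, sigma_eta, sigma_v,
                 rho_ve, theta, kappa, sigma_eps=1.0, s=-50, rng=None):
    """One replication of (5.1)-(5.4), started at pre-sample period t = s
    with zero start-ups; returns the retained observations t = 0, ..., T."""
    rng = np.random.default_rng() if rng is None else rng
    eta, lam = rng.standard_normal(N), rng.standard_normal(N)
    # cross-sectional heteroskedasticity (5.4): E(omega_i) = 1
    h = -theta**2 / 2 + theta * (np.sqrt(kappa) * eta + np.sqrt(1 - kappa) * lam)
    omega_sqrt = np.exp(h / 2)
    n_per = T - s                        # periods s+1, ..., T
    eps = rng.standard_normal((N, n_per))
    zeta = rng.standard_normal((N, n_per))
    v = rho_ve * eps + np.sqrt(1 - rho_ve**2) * zeta            # (5.3)
    x = np.zeros((N, n_per + 1))         # column 0 holds the zero start-up at t = s
    y = np.zeros((N, n_per + 1))
    for j in range(1, n_per + 1):
        x[:, j] = (xi * x[:, j - 1] + pi_eta * eta + pi_lam * lam
                   + sigma_v * omega_sqrt * v[:, j - 1])        # (5.2)
        y[:, j] = (gamma * y[:, j - 1] + beta * x[:, j] + sigma_eta * eta
                   + sigma_eps * omega_sqrt * eps[:, j - 1])    # (5.1)
    return y[:, -(T + 1):], x[:, -(T + 1):]

This sketch draws fresh disturbances each call; the variance-reduction devices actually used in the study (reusing and orthogonalizing the same draws across designs) are described below.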

Table 5.1: Heteroskedasticity quantiles and $E(\omega_i^{1/2})$. [The table reports, for various values of $\theta$, the quantiles $q_{0.01}$, $q_{0.5}$ and $q_{0.99}$ of $\omega_i$ and of $\omega_i^{1/2}$, and $E(\omega_i^{1/2})$; the numeric entries were lost in transcription.]

Note that (5.1) implicitly specifies $\tau=0$. All simulation results refer to estimators where these $T-1$ restrictions have been imposed (there are no time effects), but $\alpha_y=\alpha_x=0$ have not been imposed. Hence, when estimating the model in levels $\iota_T$ is one of the regressors. Moreover, we may always include $I_{T-1}$ in $Z_i$ and $\iota_T$ in $Z_i^s$ [see Chapter 4, (4.31) and (4.54)] in order to exploit the fundamental moment conditions $E(\Delta\varepsilon_{it})=0$ (for $t=2,\ldots,T$) and $E[\sum_{t=1}^T(\eta_i+\varepsilon_{it})]=0$ for $i=1,\ldots,N$.

Apart from values for $\theta$ and $\kappa$, we have to make choices on relevant values for eight more parameters. We could choose $\gamma\in\{0.2,0.5,0.8\}$, which covers a broad range of adjustment processes for dynamic behavioral relationships, and $\xi\in\{0.5,0.8,0.95\}$ to include less and more smooth $x_{it}$ processes. Next, interesting values should be given to the remaining six parameters, namely $\beta,\sigma_\eta,\pi_\eta,\pi_\lambda,\sigma_v$ and $\rho_{v\varepsilon}$. We will do this by choosing relevant values for six alternative, more meaningful notions, which are all functions of some of the eight DGP parameters and allow us to establish relevant numerical values for them, as suggested in Kiviet (2012).

The first three notions will be based on (ratios of) particular variance components of the long-run stationary path of the process for $x_{it}$. Using lag-operator notation and assuming that $v_{it}^*$ (and $\varepsilon_{it}^*$) exist for $t=-\infty,\ldots,0,1,\ldots,T$, we find that the long-run path for $x_{it}$ consists of three mutually independent components, namely
$$x_{it}=(1-\xi)^{-1}\pi_\eta\eta_i^*+(1-\xi)^{-1}\pi_\lambda\lambda_i^*+\sigma_v\omega_i^{1/2}(1-\xi L)^{-1}v_{it}^*.\qquad(5.5)$$
The third component, the accumulated contributions of $v_{it}^*$, is a stationary AR(1) process with variance $\sigma_v^2\omega_i/(1-\xi^2)$. Approximating $N^{-1}\sum_{i=1}^N\omega_i$ by 1, the average variance is $\sigma_v^2/(1-\xi^2)$. The other two components have variances $\pi_\eta^2/(1-\xi)^2$ and $\pi_\lambda^2/(1-\xi)^2$, respectively.

Hence the average long-run variance of $x_{it}$ equals
$$V_x=(1-\xi)^{-2}(\pi_\eta^2+\pi_\lambda^2)+(1-\xi^2)^{-1}\sigma_v^2.\qquad(5.6)$$
A first characterization of the $x_{it}$ series can be obtained by setting $V_x=1$. This is an innocuous normalization, because $\beta$ is still a free parameter. As a second characterization of the $x_{it}$ series, we can choose what we call the (average) effects variance fraction of $x_{it}$, given by
$$EVF_x=(1-\xi)^{-2}(\pi_\eta^2+\pi_\lambda^2)/V_x,\qquad(5.7)$$
with $0\leq EVF_x\leq 1$, for which we could take, say, $EVF_x\in\{0,0.3,0.6\}$. To balance the two individual effect variances we define, for the case $EVF_x>0$, what we call the individual effect fraction of $\eta_i^*$ in $x_{it}$, given by
$$EF_x^\eta=\pi_\eta^2/(\pi_\eta^2+\pi_\lambda^2).\qquad(5.8)$$
So $EF_x^\eta$, with $0\leq EF_x^\eta\leq 1$, expresses the fraction due to $\pi_\eta\eta_i^*$ of the (long-run) variance of $x_{it}$ stemming from the two individual effects. We could take, say, $EF_x^\eta\in\{0,0.3,0.6\}$. From these three characterizations we obtain
$$\pi_\lambda=(1-\xi)[(1-EF_x^\eta)EVF_x]^{1/2},\qquad(5.9)$$
$$\pi_\eta=(1-\xi)[EF_x^\eta\,EVF_x]^{1/2},\qquad(5.10)$$
$$\sigma_v=[(1-\xi^2)(1-EVF_x)]^{1/2}.\qquad(5.11)$$
For all three we will only consider the nonnegative root, because changing the sign would have no effects on the characteristics of $x_{it}$, as we will generate the series $\eta_i^*,\varepsilon_{it}^*,\lambda_i^*$ and $\zeta_{it}^*$ from symmetric distributions. The above choices regarding the $x_{it}$ process have the following implications for the average correlations between $x_{it}$ and its two constituting effects:
$$\rho_{x\eta}=\pi_\eta/(1-\xi)=[EF_x^\eta\,EVF_x]^{1/2},\qquad(5.12)$$
$$\rho_{x\lambda}=\pi_\lambda/(1-\xi)=[(1-EF_x^\eta)EVF_x]^{1/2}.\qquad(5.13)$$
Now the $x_{it}$ series can be generated upon choosing a value for $\rho_{v\varepsilon}$. This we obtain from $E(x_{it}\varepsilon_{it})=\sigma_\varepsilon\sigma_v\rho_{v\varepsilon}\omega_i$, which on average is $\sigma_v\rho_{v\varepsilon}$. Hence, fixing the average simultaneity³ at $\rho_{x\varepsilon}$,

³ Such control is not exercised in the simulation designs of Blundell et al. (2001) and Bun and Sarafidis (2015). They do consider simultaneity, but its magnitude has not been mentioned and it is not kept constant over different designs.

we should choose
$$\rho_{v\varepsilon}=\rho_{x\varepsilon}/\sigma_v.\qquad(5.14)$$
In order that both correlations are smaller than 1 in absolute value an admissibility restriction has to be satisfied, namely $\rho_{x\varepsilon}^2\leq\sigma_v^2$, giving
$$\rho_{x\varepsilon}^2\leq(1-\xi^2)(1-EVF_x).\qquad(5.15)$$
When choosing $EVF_x=0.6$ and $\xi=0.8$ we should have $|\rho_{x\varepsilon}|\leq 0.38$. That we should not exclude negative values of $\rho_{x\varepsilon}$ will become obvious in due course. For the moment it seems interesting to examine, say, $\rho_{x\varepsilon}\in\{-0.3,0,0.3\}$.

The remaining choices concern $\beta$ and $\sigma_\eta$, which both directly affect the DGP for $y_{it}$. Substituting (5.5) and (5.3) in (5.1) we find that the long-run stationary path for $y_{it}$ entails four mutually independent components, since
$$\begin{aligned}y_{it}&=\beta(1-\gamma L)^{-1}x_{it}+(1-\gamma)^{-1}\sigma_\eta\eta_i^*+\sigma_\varepsilon\omega_i^{1/2}(1-\gamma L)^{-1}\varepsilon_{it}^*\\
&=(1-\gamma)^{-1}(1-\xi)^{-1}\{[\beta\pi_\eta+(1-\xi)\sigma_\eta]\eta_i^*+\beta\pi_\lambda\lambda_i^*\}+\beta\sigma_v\omega_i^{1/2}(1-\gamma L)^{-1}(1-\xi L)^{-1}v_{it}^*+\sigma_\varepsilon\omega_i^{1/2}(1-\gamma L)^{-1}\varepsilon_{it}^*\\
&=(1-\gamma)^{-1}(1-\xi)^{-1}\{[\beta\pi_\eta+(1-\xi)\sigma_\eta]\eta_i^*+\beta\pi_\lambda\lambda_i^*\}+\beta\sigma_v(1-\rho_{v\varepsilon}^2)^{1/2}\omega_i^{1/2}\frac{\zeta_{it}^*}{(1-\gamma L)(1-\xi L)}+[\beta\rho_{v\varepsilon}\sigma_v+(1-\xi L)\sigma_\varepsilon]\,\omega_i^{1/2}\frac{\varepsilon_{it}^*}{(1-\gamma L)(1-\xi L)}.\end{aligned}\qquad(5.16)$$
The second term of the final expression constitutes for each $i$ an AR(2) process and the third one an ARMA(2,1) process. The variance of $y_{it}$ has four components, given by (derivations in the Appendix)
$$V_\eta=(1-\gamma)^{-2}(1-\xi)^{-2}[\beta\pi_\eta+(1-\xi)\sigma_\eta]^2,$$
$$V_\lambda=(1-\gamma)^{-2}(1-\xi)^{-2}\beta^2\pi_\lambda^2,$$
$$V_\zeta(i)=\frac{\beta^2\sigma_v^2(1-\rho_{v\varepsilon}^2)(1+\gamma\xi)}{(1-\gamma^2)(1-\xi^2)(1-\gamma\xi)}\,\omega_i,$$
$$V_\varepsilon(i)=\frac{[(1+\beta\rho_{v\varepsilon}\sigma_v)^2+\xi^2](1+\gamma\xi)-2\xi(1+\beta\rho_{v\varepsilon}\sigma_v)(\gamma+\xi)}{(1-\gamma^2)(1-\xi^2)(1-\gamma\xi)}\,\omega_i.$$
Averaging the last two over all $i$ yields $V_\zeta$ and $V_\varepsilon$. For the average long-run variance of $y_{it}$ we then can evaluate
$$V_y=V_\eta+V_\lambda+V_\zeta+V_\varepsilon.\qquad(5.17)$$
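To indicate the type of derivation involved (a sketch of our own; the full derivations are in the Appendix referred to), consider the AR(2) component. With $u_{it}=[(1-\gamma L)(1-\xi L)]^{-1}\zeta_{it}^*$ the MA coefficients are $\psi_j=\sum_{k=0}^j\gamma^k\xi^{j-k}=(\gamma^{j+1}-\xi^{j+1})/(\gamma-\xi)$, so that
$$\mathrm{Var}(u_{it})=\sum_{j=0}^\infty\psi_j^2=\frac{1}{(\gamma-\xi)^2}\left[\frac{\gamma^2}{1-\gamma^2}+\frac{\xi^2}{1-\xi^2}-\frac{2\gamma\xi}{1-\gamma\xi}\right]=\frac{1+\gamma\xi}{(1-\gamma\xi)(1-\gamma^2)(1-\xi^2)},$$
and multiplying by $\beta^2\sigma_v^2(1-\rho_{v\varepsilon}^2)\omega_i$ reproduces $V_\zeta(i)$. $V_\varepsilon(i)$ follows analogously from the ARMA(2,1) filter $[\beta\rho_{v\varepsilon}\sigma_v+(1-\xi L)\sigma_\varepsilon]/[(1-\gamma L)(1-\xi L)]$.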

When choosing fixed values for ratios involving these components to obtain values for $\beta$ and $\sigma_\eta$ we will run into the problem of multiple solutions, since the four components of (5.17) have particular invariance properties regarding the signs of $\beta$, $\sigma_\eta$ and $\rho_{v\varepsilon}$: changing the sign of all three yields exactly the same value of $V_y$. We coped with this as follows. Although we note that $V_\eta$ does depend on $\beta\pi_\eta$, we set $\sigma_\eta$ simply by fixing the direct cumulated effect impact of $\eta_i^*$ on $y_{it}$ relative to the current noise $\sigma_\varepsilon=1$. This is
$$DEN_y^\eta=\sigma_\eta/(1-\gamma).\qquad(5.18)$$
Because the direct and indirect (via $x_{it}$) effects from $\eta_i^*$ may have opposite signs, $DEN_y^\eta$ could be given negative values too, but we restricted ourselves to $DEN_y^\eta\in\{1,4\}$, yielding
$$\sigma_\eta=(1-\gamma)DEN_y^\eta.\qquad(5.19)$$
Finally we fix a signal-noise ratio, which gives a value for $\beta$. Because under simultaneity the noise and current signal conflate, we focus on the case where $\rho_{x\varepsilon}=0$. Then we have $V_\zeta=[(1-\gamma^2)(1-\xi^2)(1-\gamma\xi)]^{-1}\beta^2\sigma_v^2(1+\gamma\xi)$ and $V_\varepsilon=(1-\gamma^2)^{-1}$. Leaving the variance due to the effects aside, the average signal variance is $V_\zeta+V_\varepsilon-1$, because the current average noise variance is unity. Hence, we may define a signal-noise ratio as
$$SNR=V_\zeta+V_\varepsilon-1=\frac{\beta^2(1-EVF_x)(1+\gamma\xi)}{(1-\gamma^2)(1-\gamma\xi)}+\frac{\gamma^2}{1-\gamma^2},\qquad(5.20)$$
where we have substituted (5.11). For this we may choose, say, $SNR\in\{2,3,5\}$, in order to find
$$\beta=\left[\frac{1-\gamma\xi}{1+\gamma\xi}\cdot\frac{SNR-\gamma^2(SNR+1)}{1-EVF_x}\right]^{1/2}.\qquad(5.21)$$
Note that here another admissibility restriction crops up, namely
$$\gamma^2\leq SNR/(SNR+1).\qquad(5.22)$$
However, for $\gamma\leq 0.8$ this is satisfied for $SNR\geq 1.78$. From (5.21) we only examined the positive root.
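The mapping from the chosen characteristics to the DGP parameters, i.e. (5.9)-(5.11), (5.14), (5.19) and (5.21) with the admissibility checks (5.15) and (5.22), is easily automated. A sketch (our own hypothetical helper, with argument names matching the simulate_dgp sketch above):

import math

def solve_design(gamma, xi, EVF_x, EF_x_eta, rho_xeps, DEN_y_eta, SNR):
    """Solve the DGP parameters from the design characteristics (V_x = 1)."""
    pi_lam = (1 - xi) * math.sqrt((1 - EF_x_eta) * EVF_x)          # (5.9)
    pi_eta = (1 - xi) * math.sqrt(EF_x_eta * EVF_x)                # (5.10)
    sigma_v = math.sqrt((1 - xi**2) * (1 - EVF_x))                 # (5.11)
    if rho_xeps**2 > sigma_v**2:                                   # (5.15)
        raise ValueError("inadmissible: rho_xeps^2 > (1-xi^2)(1-EVF_x)")
    rho_ve = rho_xeps / sigma_v                                    # (5.14)
    sigma_eta = (1 - gamma) * DEN_y_eta                            # (5.19)
    if gamma**2 > SNR / (SNR + 1):                                 # (5.22)
        raise ValueError("inadmissible: gamma^2 > SNR/(SNR+1)")
    beta = math.sqrt((1 - gamma * xi) / (1 + gamma * xi)
                     * (SNR - gamma**2 * (SNR + 1)) / (1 - EVF_x))  # (5.21)
    return dict(pi_eta=pi_eta, pi_lam=pi_lam, sigma_v=sigma_v,
                rho_ve=rho_ve, sigma_eta=sigma_eta, beta=beta)

# e.g. the reference parametrization P0 discussed below,
# solve_design(0.2, 0.8, 0.0, 0.0, 0.0, 1, 3),
# gives sigma_v = 0.6, sigma_eta = 0.8, pi_eta = pi_lam = rho_ve = 0,
# in agreement with the DGP parameter values reported in the table notes.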

Instead of fixing $SNR$, another approach would be to fix the total multiplier
$$TM=\beta/(1-\gamma),\qquad(5.23)$$
which would directly lead to a value for $\beta$, given $\gamma$. However, different $TM$ values will then lead to different $SNR$ values, because
$$SNR=TM^2\frac{(1-EVF_x)(1-\gamma)(1+\gamma\xi)}{(1+\gamma)(1-\gamma\xi)}+\frac{\gamma^2}{1-\gamma^2}.\qquad(5.24)$$
At this stage it is hard to say what would yield more useful information from the Monte Carlo, fixing $TM$ or $SNR$. Keeping both constant for different $\gamma$ and some other characteristics of this DGP is out of the question. We chose to fix $SNR=3$, which yields $TM$ values in the range of roughly 1.5 to 3 over the parametrizations examined. When comparing with results for $TM=1$ we did not note substantial differences.

For all different design parameter combinations considered, which involve sample size $N=200$ and $T\in\{3,6,9\}$, we used the very same realizations of the underlying standardized random components $\eta_i^*,\lambda_i^*,\varepsilon_{it}^*$ and $\zeta_{it}^*$ over the respective replications that we performed. At this stage, all these components have been drawn from the standard normal distribution. To speed up the convergence of our simulation results, in each replication we have modified the $N$ drawings $\eta_i^*$ and $\lambda_i^*$ such that they have sample mean zero, sample variance 1 and sample correlation zero. This rescaling is achieved by replacing the $N$ draws $\eta_i^*$ first by $\eta_i^*-N^{-1}\sum_{i=1}^N\eta_i^*$ and next by $\eta_i^*/[N^{-1}\sum_{i=1}^N(\eta_i^*)^2]^{1/2}$, and by replacing the $\lambda_i^*$ by the residuals obtained after regressing $\lambda_i^*$ on $\eta_i^*$ and an intercept, and next scaling them by taking $\lambda_i^*/[N^{-1}\sum_{i=1}^N(\lambda_i^*)^2]^{1/2}$. In addition, we have rescaled in each replication the $\omega_i$ by dividing them by $N^{-1}\sum_{i=1}^N\omega_i$, so that the resulting $\omega_i$ have sum $N$, as they should, in order to avoid that the presence of heteroskedasticity is conflated with a larger or smaller average disturbance variance.

In the simulation experiments we will start up the processes for $x_{it}$ and $y_{it}$ at a pre-sample period $s<0$ by taking $x_{is}=0$ and $y_{is}=0$, and next generate $x_{it}$ and $y_{it}$ for the indices $t=s+1,\ldots,T$. The data with time-indices $s,\ldots,-1$ will be discarded when estimating the model. We suppose that for $s=-50$ both series will be on their stationary track from $t=0$ onwards. When taking $s=-1$ or $-2$ the initial values $y_{i0}$ and $x_{i1}$ will be such that effect stationarity has not yet been achieved. Due to the fixed zero start-ups (which are equal to the unconditional expectations) the (cross-)autocorrelations of the $x_{it}$ and $y_{it}$ series have a very peculiar start then too, so our results regarding effect nonstationarity will certainly not be fully general, but in a way for $s$ close to zero we mimic the situation that a process only started very recently.

Another simple way to mimic a situation in which lagged first-differenced variables are invalid instruments for the model in levels can be designed as follows. Equations (5.5) and (5.16) highlight that in the long run $\Delta x_{it}$ and $\Delta y_{it}$ are uncorrelated with the effects $\eta_i$ and $\lambda_i$. This can be undermined by perturbing $x_{i0}$ and $y_{i0}$ as obtained from $s=-50$ in such a way that we add to them the values
$$(\phi-1)\left[\frac{\pi_\eta}{1-\xi}\eta_i^*+\frac{\pi_\lambda}{1-\xi}\lambda_i^*\right]\quad\text{and}\quad(\phi-1)\left[\frac{\beta\pi_\eta+(1-\xi)\sigma_\eta}{(1-\gamma)(1-\xi)}\eta_i^*+\frac{\beta\pi_\lambda}{(1-\gamma)(1-\xi)}\lambda_i^*\right]\qquad(5.25)$$
respectively. Note that for $\phi=1$ effect stationarity is maintained, whereas for $0\leq\phi<1$ the dependence of $x_{i0}$ and $y_{i0}$ on the effects is mitigated in comparison to the stationary track (upon maintaining stationarity regarding $\varepsilon_{it}^*$ and $\zeta_{it}^*$), whereas for $\phi>1$ this dependence is inflated. Note that this is a straightforward generalization of the approach followed in Kiviet (2007) for the panel AR(1) model.
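A sketch of these variance-reduction and perturbation devices (again our own illustrative Python, with hypothetical function names):

import numpy as np

def standardize_effects(eta, lam):
    """Give eta and lam sample mean 0, sample variance 1, sample correlation 0."""
    eta = eta - eta.mean()
    eta = eta / np.sqrt((eta**2).mean())
    # residuals of lam regressed on eta and an intercept, then rescaled
    lam = lam - lam.mean() - (lam @ eta) / (eta @ eta) * eta
    lam = lam / np.sqrt((lam**2).mean())
    return eta, lam

def normalize_omega(omega):
    """Rescale so that the omega_i sum to N (unit average disturbance variance)."""
    return omega / omega.mean()

def perturb_startups(x0, y0, eta, lam, phi, beta, gamma, xi,
                     pi_eta, pi_lam, sigma_eta):
    """Apply (5.25): move x_i0 and y_i0 away from effect stationarity when phi != 1."""
    x0 = x0 + (phi - 1) * (pi_eta * eta + pi_lam * lam) / (1 - xi)
    y0 = y0 + (phi - 1) * ((beta * pi_eta + (1 - xi) * sigma_eta) * eta
                           + beta * pi_lam * lam) / ((1 - gamma) * (1 - xi))
    return x0, y0

For phi = 1 the perturbation vanishes, and for phi = 0.5 the dependence of the start-ups on the effects is halved relative to the stationary track, which is the setting examined in subsection 5.3.2 below.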

5.3 Simulation results

To limit the number of tables we proceed as follows. Often we will first produce results on unfeasible implementations of the various inference techniques in relatively simple DGPs. These exploit the true values of $\omega_1,\ldots,\omega_N$, $\sigma_\varepsilon^2$ and $\sigma_\eta^2$ instead of their estimates. Although this information is generally not available in practice, only when such unfeasible techniques behave reasonably well in finite samples does it seem useful to examine in more detail the performance of feasible implementations. Results for the unfeasible Arellano and Bond (1991) and Blundell and Bond (1998) GMM estimators are denoted as ABu and BBu respectively. Their feasible counterparts are denoted as AB1 and BB1 for the 1-step estimators (which under homoskedasticity are equivalent to their unfeasible counterparts) and AB2 and BB2 for the 2-step estimators. For 2-step estimators the lower case letters a, b or c are used (as in, for instance, AB2c) to indicate which type of weighting matrix has been exploited, as discussed in the previous chapter. For corresponding MGMM implementations these acronyms are preceded by the letter M. Under homoskedasticity their unfeasible implementation has been omitted when it is equivalent to GMM. In BB estimation we have always used $q=1$.

First, in subsection 5.3.1, we will discuss the results for DGPs in which the initial conditions are such that BB estimation will be consistent and more efficient than AB; subsequently, in subsection 5.3.2, the situation where BB is inconsistent is examined. Within these subsections we will examine different parameter value combinations for the DGP. We will start by presenting results for a reference parametrization (indicated P0), which has been chosen such that the model has in fact four parameters less, by choosing $\rho_{x\varepsilon}=0$ ($x_{it}$ is strictly exogenous), $EVF_x=0$ (hence $\pi_\lambda=\pi_\eta=0$, so $x_{it}$ is neither correlated with $\lambda_i$ nor with $\eta_i$) and $\kappa=0$ (any cross-sectional heteroskedasticity is just related with $\lambda_i$).

These choices (implying that any heteroskedasticity will be unrelated to the regressor $x_{it}$) may (hopefully) lead to results where little difference between unfeasible and feasible estimation will be found and where test sizes are relatively close to the nominal level of 5%. Next we will discuss the effects of settings (to be labelled P1, P2, etc.) which deviate from this reference parametrization P0 in one or more aspects regarding the various correlations and variance fractions and ratios. In P0 the relationship for $y_{it}$ will be characterized by $DEN_y^\eta=1$ (the impacts on $y_{it}$ of the individual effect $\eta_i$ and of the idiosyncratic disturbance $\varepsilon_{it}$ have equal variance). The two remaining parameters are fixed over all cases examined (including P0). The $x_{it}$ series has autoregressive coefficient $\xi=0.8$, and regarding $y_{it}$ we take $SNR=3$ (excluding the impacts of the individual effects, the variance of the explanatory part of $y_{it}$ is three times as large as $\sigma_\varepsilon^2$).

In the previous chapter (section 4.3.2) we already indicated that we will examine implementations of GMM where all internal instruments associated with linear moment conditions are employed (A), but also particular reductions based either on collapsing (C) or omitting long lags (L3, etc.), or a combination (C3, etc.). On top of this we will also distinguish situations that may lead to reductions of the instruments that are being used, because the regressor $x_{it}$ in model (5.1), which will either be strictly exogenous or endogenous with respect to $\varepsilon_{it}$, might be rightly or wrongly treated as either strictly exogenous, or as predetermined (weakly exogenous), or as endogenous. These three distinct situations will be indicated by the letters X, W and E respectively. So, in parametrization P0, where $x_{it}$ is strictly exogenous, the instruments used by either A, C or, say, L2, are not the same under the situations X, W and E. This is hopefully clarified in the next paragraph.

Since we assume that for estimation just the observations $y_{i0},\ldots,y_{iT}$ and $x_{i1},\ldots,x_{iT}$ are available, the number of internal instruments that is used under XA (all instruments, $x_{it}$ treated as strictly exogenous) for estimation of the equation in first differences is: $T-1$ (time-dummies) $+\,T(T-1)/2$ (lags of $y_{it}$) $+\,(T-1)T$ (lags and leads of $x_{it}$). This yields $\{11,50,116\}$ instruments for $T=\{3,6,9\}$. Under WA this is $\{8,35,80\}$ and under EA $\{6,30,72\}$.
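These counts are easily reproduced; in the small check below (our own code), the per-period counting rules for the W and E treatments of $x_{it}$ are our reading of the totals just reported:

def count_ab_instruments(T, treatment):
    """Instruments for the first-differenced equation when all lags are used (A)."""
    n = (T - 1) + T * (T - 1) // 2          # time dummies + lags of y
    if treatment == "X":                    # all lags and leads of x
        n += (T - 1) * T
    elif treatment == "W":                  # x_{i1}, ..., x_{i,t-1} per period
        n += T * (T - 1) // 2
    elif treatment == "E":                  # x_{i1}, ..., x_{i,t-2} per period
        n += (T - 1) * (T - 2) // 2
    return n

assert [count_ab_instruments(T, "X") for T in (3, 6, 9)] == [11, 50, 116]
assert [count_ab_instruments(T, "W") for T in (3, 6, 9)] == [8, 35, 80]
assert [count_ab_instruments(T, "E") for T in (3, 6, 9)] == [6, 30, 72]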

139 both θ = 0 (homoskedasticity) and θ = 1 (substantial cross-sectional heteroskedasticity). Tables have a code which starts by the design parametrization, followed by the character u or f, indicating whether the table contains unfeasible or feasible results. Because of the many feasible variants not all results can be combined in just one table. Therefore, the f is followed by c, t or J, where c indicates that the table just contains results on coefficient estimates, which are estimated bias, standard deviation (Stdv) and RMSE (root mean squared error; below often loosely addressed as precision); t refers to estimates of the actual rejection probabilities of tests on true coefficient values; and J indicates that the table only contains results on Sargan-Hansen tests. Next, after a bar (-), the earlier discussed code for how regressor x it is actually treated when selecting the instruments is given, followed by the type of instrument reduction. The acronyms in the column headings of the tables refer to estimators or test statistics all introduced in the previous chapter. Table 5.2 provides references to their formulas, where j {a, b, c} refers to the type of weighting matrix used and in the super-index (l, h) forthej tests both l and h refer to an iteration step and the corresponding residuals. ndex h refers to the residuals used in establishing the weighting matrix and index l to the residual vector for calculating the quadratic form in the numerator of J. Regarding the J tests of modified GMM the estimates of σε,i 2 and ση 2 used for weighting of the moment conditions and estimating the weighting matrix are always based on 1-step residuals, whereas the residual vector tested is based on two-step. Remember that unfeasible statistics always use the true values of σε,i 2 and (if relevant) ση 2 and thus require no iteration and use the optimal weighting matrix. Table 5.2: Acronyms of feasible statistics with reference to relevant formulas Chapter 4 estimator ref. t-test ref. J-test ref. AB1 (4.40) AB1 scaled by (4.68) JAB (1,0) (4.77) AB1jR scaled by (4.69) JAB (1,1) j (4.78) AB2j (4.47) AB2j scaled by (4.70) JAB (2,h) j (4.79) AB2jW scaled by (4.71), (4.72) MAB2 (4.89) MAB2 scaled by 1st factor (4.89) JMAB (4.93) BB1 (4.60), q = 1 BB1 scaled by (4.73) JBB (1,0) (4.80) JES (1,0) (4.83) BB1jR scaled by (4.74) JBB (1,1) j (4.81) BB2j (4.67) BB2j scaled by (4.75) JBB (2,h) j (4.82) BB2jW scaled by (4.71*), (4.76) JES (l,h) j (4.84) MBB2 (4.92) MBB2 scaled by 1st factor (4.92) JMBB (4.94) JESM (4.95) 127

5.3.1 DGPs under effect stationarity

Here we focus on the case where BB is consistent and more efficient than AB, since $s=-50$ and $\phi=1$.

Results for the reference parametrization P0

Table 5.3, with code P0u-XA, gives results for unfeasible GMM coefficient estimators, unfeasible single coefficient tests, and for unfeasible Sargan-Hansen tests for the reference parametrization P0 when $x_{it}$ is (correctly) treated as strictly exogenous and all available instruments are being used. Table 5.4 (P0fc-XA) presents a selection of feasible counterparts regarding the coefficient estimators. Under homoskedasticity we see that for $\hat\gamma_{ABu}=\hat\gamma_{AB1}$ its bias (which is negative), Stdv and thus its RMSE increase with $\gamma$ and decrease with $T$, whereas the bias of $\hat\beta_{ABu}=\hat\beta_{AB1}$ is moderate and its RMSE, although decreasing in $T$, is almost invariant with respect to $\beta$. The BBu coefficient estimates are superior indeed, the more so for larger $\gamma$ values (as is already well-known), but less so for $\beta$. As already conjectured in the previous chapter (Section 4.3.6), under cross-sectional heteroskedasticity both ABu and BBu are substantially less precise than under homoskedasticity. However, modifying the instruments under cross-sectional heteroskedasticity, as is done by MABu and MBBu, yields considerable improvements in performance, both in terms of bias and RMSE. In fact, the precision of the unfeasible modified estimators under heteroskedasticity comes very close to that of their counterparts under homoskedasticity.

The simulation results for feasible estimation do not contain the b variant of the weighting matrix⁴ because it is so bad, whereas both the a and c variants yield RMSE values very close to their unfeasible counterparts, under homoskedasticity as well as heteroskedasticity. Although the best unfeasible results under heteroskedasticity are obtained by MBBu, this does not fully carry over to MBB, because for $T$ small, and also for moderate $T$ and large $\gamma$, BB2c performs much better. The performance of MAB2 and AB2c is rather similar, whereas we established that their unfeasible variants differ a lot when $\gamma$ is large. Apparently, the modified estimators can be much more vulnerable when the variances of the error components, $\sigma_{\varepsilon,i}^2$ and $\sigma_\eta^2$, are unknown, probably because their estimates have to be inverted in (4.89) and (4.91) of the previous chapter.

From the type I error estimates for unfeasible single coefficient tests in Table 5.3 we see that the standard test procedures work pretty well for all techniques regarding $\beta$, but with respect to $\gamma$ ABu fails for larger $\gamma$. This gets even worse under heteroskedasticity, but less so for MABu. For BBu and MBBu the results are reasonable.

⁴ The b variant differs from a only for $T>3$ and may then be not positive definite. For $T=6,9$ it proved to be so bad for both AB and BB that we discarded it completely from the presented tables.

Table 5.3: P0u-XA. Unfeasible coefficient estimators (bias, Stdv, RMSE of $\gamma$ and $\beta$ for ABu, BBu and, under $\theta=1$, MABu and MBBu); unfeasible t-test actual significance levels; unfeasible Sargan-Hansen test rejection probabilities (JABu, JBBu, JESu, JMABu, JMBBu, JESMu); all for $T=3,6,9$ and $\theta\in\{0,1\}$. [Numeric entries lost in transcription.] Design parameter values: $N=200$, $SNR=3$, $DEN_y^\eta=1$, $EVF_x=0$, $\rho_{x\varepsilon}=0$, $\xi=0.8$, $\kappa=0$, $\sigma_\varepsilon=1$, $q=1$, $\phi=1$; these yield the DGP parameter values $\pi_\lambda=0$, $\pi_\eta=0$, $\sigma_v=0.60$, $\sigma_\eta=1-\gamma$, $\rho_{v\varepsilon}=0$ (and $\rho_{x\eta}=0$, $\rho_{x\lambda}=0$).

Table 5.4: P0fc-XA. Feasible coefficient estimators for Arellano-Bond (AB1, AB2a, AB2c and, under $\theta=1$, MAB2) and Blundell-Bond (BB1, BB2a, BB2c and, under $\theta=1$, MBB2): bias, Stdv and RMSE of $\gamma$ and $\beta$ for $T=3,6,9$ and $\theta\in\{0,1\}$. [Numeric entries lost in transcription.] Design parameter values as in Table 5.3.

Here the test seems to benefit from the smaller bias of BBu. For the feasible variants we find in Table 5.5 (P0ft-XA) that under homoskedasticity AB1 has a reasonable actual significance level for $\beta$, but for $\gamma$ only when it is small. The same holds for AB2c. Under heteroskedasticity $\sigma_{\varepsilon,i}^2$ has to be estimated, and then AB2c overrejects, especially for $\gamma$ or $T$ large, but only mildly so for tests on $\beta$. Both AB2a and MAB2 overreject enormously. Employing the Windmeijer (2005) correction mitigates the overrejection probabilities in many cases, but not in all. AB2cW has appropriate size for tests on $\beta$, but for tests on $\gamma$ the size increases both with $\gamma$ and with $T$, from 7% to 37% over the grid examined. Since the test based on ABu shows a similar pattern, it is self-evident that a correction which takes into account the randomness of AB1 cannot be fully effective.

Table 5.5: P0ft-XA. Feasible t-tests for Arellano-Bond (AB1, AB1aR, AB1cR, AB2a, AB2aW, AB2c, AB2cW, MAB2) and Blundell-Bond (BB1, BB1aR, BB1cR, BB2a, BB2aW, BB2c, BB2cW, MBB2): actual significance levels of tests on $\gamma$ and $\beta$ for $T=3,6,9$ and $\theta\in\{0,1\}$. [Numeric entries lost in transcription.] Design parameter values as in Table 5.3.

Oddly enough, the Windmeijer correction is occasionally more effective for the heavily oversized AB2a than for the less oversized AB2c. Under homoskedasticity both BB2c and BB2cW behave very reasonably, both for tests on $\beta$ and on $\gamma$. Under heteroskedasticity BB2cW is still very reasonable, but all other implementations fail in some instances, especially for tests on $\gamma$ when $\gamma$ or $T$ are large. The failure of BB1 under heteroskedasticity is self-evident; see the previous chapter, (4.73).

Regarding the unfeasible J tests, Table 5.3 shows reasonable size properties under homoskedasticity, especially for JBBu, but less so for the incremental test on effect stationarity when $\gamma$ is large. Under heteroskedasticity this problem is more serious, though less so for the unfeasible modified procedure. Heteroskedasticity and $\gamma$ large lead to underrejection of the JABu test, especially when $T$ is large too. Turning now to the many variants of feasible J tests, of which only some are presented in Table 5.6 (P0fJ-XA), we first focus on JAB. Under homoskedasticity JAB$^{(1,0)}$ behaves reasonably, though when applied when $\theta=1$ it rejects with high probability (thus detecting heteroskedasticity instead of instrument invalidity, probably due to underestimation of the variance of the still valid moment conditions). Of the JAB$^{(1,1)}$ tests, both for $\theta=0$ and $\theta=1$, the c variant severely underrejects when $T=9$ (when there is an abundance of instruments), though less so than the a version. Such severe underrejection had already been noted by Bowsher (2002). An almost similar pattern we note for JAB$^{(2,1)}$ and JAB$^{(2,2)}$. Test JMAB overrejects severely for $T=3$ and underrejects otherwise.

Table 5.6: P0fJ-XA. Feasible Sargan-Hansen tests (JAB$_a^{(2,1)}$, JBB$_a^{(2,1)}$, JES$_a^{(2,1)}$, JAB$_c^{(2,1)}$, JBB$_c^{(2,1)}$, JES$_c^{(2,1)}$ and, under $\theta=1$, JMAB, JMBB, JESM): rejection probabilities for $T=3,6,9$ and $\theta\in\{0,1\}$. [Numeric entries lost in transcription.] Design parameter values as in Table 5.3.

Turning now to feasible JBB tests, we find that JBB$^{(1,0)}$ underrejects when $\theta=0$ and, like JAB$^{(1,0)}$, rejects with high probability when $\theta=1$. Both the a and c variants of test JBB$^{(1,1)}$, like JAB$^{(1,1)}$, have rejection probabilities that are not invariant with respect to $T$, $\gamma$ and $\theta$. The c variants seem the least vulnerable, and therefore also yield an almost reasonable incremental test JES$^{(1,1)}$, although it underrejects when $\theta=0$ and overrejects when $\theta=1$ for $\gamma=0.8$. For JBB$^{(2,1)}$ and JBB$^{(2,2)}$ too the c variant has rejection probabilities which vary the least with $T$, $\gamma$ and $\theta$, but they are systematically below the nominal significance level, which is also the case for the resulting incremental tests. Oddly enough, the incremental tests resulting from the a variants have type I error probabilities reasonably close to 5%, despite the serious underrejection of both the JAB and JBB tests from which they result.

When treating regressor $x_{it}$ as predetermined (P0-WA, not presented here), whereas it is strictly exogenous, fewer instruments are being used. Since the ones that are now abstained from are most probably the strongest ones regarding $\Delta x_{it}$, it is no surprise that in the simulation results we note that especially the standard deviation of the $\beta$ coefficient suffers. Also the rejection probabilities of the various tests differ slightly between implementations WA and XA, but not in a very systematic way, as it seems. When treating $x_{it}$ as endogenous (P0-EA) the precision of the estimators gets worse, with again no striking effects on the performance of test procedures under their respective null hypotheses. Upon comparing for P0 the instrument set A (and set C) with the one where A$^x$ (C$^x$) is replaced by C1$^x$, it has been found that the in practice quite popular choice C1$^x$ often yields slightly less efficient estimates for $\beta$, but much less efficient estimates for $\gamma$.

When $x_{it}$ is again treated as strictly exogenous, but the number of instruments is reduced by collapsing the instruments stemming from both $y_{it}$ and $x_{it}$, then we note from Table 5.7 (P0fc-XC, just covering $\theta=1$) a mixed picture regarding the coefficient estimates. Although any substantial bias always reduces by collapsing, standard errors always increase at the same time, leading either to an increase or a decrease in RMSE. Decreases occur for the AB estimators of $\gamma$, especially when $\gamma$ is large; for $\beta$ just increases occur. A noteworthy reduction in RMSE does show up for BB2a when $\gamma$ is large, $T=9$ and $\theta=1$, but then the RMSE of BB2c using all instruments is in fact smaller. However, Table 5.8 (P0ft-XC) shows that collapsing is certainly found to be very beneficial for the type I error probability of coefficient tests, especially in cases where collapsing yields substantially reduced coefficient bias. The AB tests benefit a lot from collapsing, especially the c variant, leaving only little room for further improvement by employing the Windmeijer correction. After collapsing, AB1 works well under homoskedasticity, and also under heteroskedasticity provided robust standard errors are being used, where the c version is clearly superior to the a version.

Table 5.7: P0fc-XC. Feasible coefficient estimators for Arellano-Bond (AB1, AB2a, AB2c, MAB2) and Blundell-Bond (BB1, BB2a, BB2c, MBB2) after collapsing: bias, Stdv and RMSE of $\gamma$ and $\beta$ for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values as in Table 5.3.

AB2c has appropriate type I error probabilities, except for testing $\gamma$ when it is 0.8 at $T=3$ and $\theta=1$ (which is not repaired by a Windmeijer correction either), and is for most cases superior to AB2aW. After collapsing, BB2a shows overrejection which is not completely repaired by BB2aW when $\theta=1$. BB2c and BB2cW generally show lower rejection probabilities, with occasionally some underrejection. Tests based on MAB2 and MBB2 still heavily overreject. Table 5.9 (P0fJ-XC) shows that by collapsing the JAB and JBB tests suffer much less from underrejection when $T$ is larger than 3. However, both the a and c versions of the J$^{(2,1)}$ and J$^{(2,2)}$ tests usually still underreject, mostly by about 1 or 2 percentage points. Good performance is shown by JES$_a^{(2,1)}$ and JES$_c^{(2,1)}$.

Table 5.8: P0ft-XC. Feasible t-tests for Arellano-Bond and Blundell-Bond after collapsing: actual significance levels of tests on $\gamma$ and $\beta$ for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values as in Table 5.3.

Table 5.9: P0fJ-XC. Feasible Sargan-Hansen tests after collapsing (JAB$_a^{(2,1)}$, JBB$_a^{(2,1)}$, JES$_a^{(2,1)}$, JAB$_c^{(2,1)}$, JBB$_c^{(2,1)}$, JES$_c^{(2,1)}$): rejection probabilities for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values as in Table 5.3.

When $x_{it}$ is still correctly treated as strictly exogenous but for the level instruments just a few lags or first differences are being used (XL0, ..., XL3) for both $y_{it}$ and $x_{it}$, then we find the following. Regarding feasible AB and BB estimation, collapsing (XC) always gives smaller RMSE values than XL0 and XL1 (which is much worse than XL0), but this is not the case for XL2 and XL3. Whereas XC yields smaller bias, XL2 and XL3 often reach smaller Stdv and RMSE. Especially regarding $\beta$, XL3 performs better than XL2. Probably due to the smaller bias of XC, it is more successful in mitigating size problems of coefficient tests than XL0 through XL3. The effect on J tests is less clear-cut. Combining collapsing with restricting the lag length, we find that XC2 and XC3 are in some aspects slightly worse, but in others occasionally better, than XC for P0. We also examined the hybrid instrumentation which seems popular amongst practitioners, where C$^w$ is combined

with L1$^x$ (see Table 4.1 of the previous chapter). Especially for $\gamma$ this leads to loss of estimator precision without any other clear advantages, so it does not outperform the XC results for P0. From examining P0-WC (and P0-EC) we find that in comparison to P0-WA (P0-EA) there is often some increase in RMSE, but the size control of especially the t-tests is much better.

Summarizing the results for P0 on feasible estimators and tests, we note that when choosing between different possible instrument sets a trade-off has to be made between estimator precision and test size control. For both, some form of reduction of the instrument set is often, but not always, beneficial. No single method seems superior irrespective of the actual values of $\gamma$, $\beta$ and $T$. Using all instruments is not necessarily a bad choice; also XC, XL3 and XC3 often work well. To mitigate estimator bias and foster test size control while not sacrificing too much estimator precision, using collapsing (C) for all regressors seems a reasonable compromise, as far as P0 is concerned. Coefficient and J tests based on the modified estimator using its simple feasible implementation examined here behave so poorly that in the remainder we no longer mention its results.

Results for alternative parametrizations

Next we examine a series of alternative parametrizations, where each time we just change one of the parameter values of one of the already examined cases. In P1 we increase $DEN_y^\eta$ from 1 to 4 (hence, substantially increasing the relative variance of the individual effects). We note that for P1-XA (not tabulated here) all estimators regarding $\gamma$ are more biased and dispersed than for P0-XA, but there is little or no effect on the $\beta$ estimates. For both $T$ and $\gamma$ large this leads to serious overrejection for the unfeasible coefficient tests regarding $\gamma$, in particular for ABu. Self-evidently, this carries over to the feasible tests and, although a Windmeijer correction has a mitigating effect, the overrejection often remains serious for both AB and BB based tests. Tests on $\beta$ based on AB behave reasonably, apart from non-robustified AB1 and AB2a. For the latter a Windmeijer correction proves reasonably effective. When exploiting the effect stationarity, the BB2c implementation seems preferable. The unfeasible J tests show a similar though slightly more extreme pattern as for P0-XA. Among the feasible tests both serious underrejection and overrejection occur. As far as the incremental tests are concerned, JES$_c^{(2,2)}$ behaves remarkably well.

In Tables 5.10, 5.11 and 5.12 (P1fj-XC, j = c, t, J) we find that collapsing leads again to reduced bias and slightly deteriorated precision, though improved size control (here all unfeasible tests behave reasonably well). All feasible AB1R and AB2W tests have reasonable size control, apart from tests on $\gamma$ when $T$ is small and $\gamma$ large.

Table 5.10: P1fc-XC. Feasible coefficient estimators for Arellano-Bond and Blundell-Bond after collapsing: bias, Stdv and RMSE of $\gamma$ and $\beta$ for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values: as in Table 5.3, except $DEN_y^\eta=4$, implying $\sigma_\eta=4(1-\gamma)$.

These give actual significance levels close to 10%. BB2cW seems slightly better than BB2aW. The 1-step J tests show some serious overrejection, whereas the 2-step J tests behave quite satisfactorily. For C3 reasonably similar results are obtained, but those for L3 are generally slightly less attractive.

In P2 we increase $EVF_x$ from 0 to 0.6, upon having again $EF_x^\eta=0$ (hence, $x_{it}$ is still uncorrelated with effect $\eta_i$ though correlated with effect $\lambda_i$, which determines any heteroskedasticity). This leads to increased $\beta$ values. Results for P2-XA show larger absolute values for the standard deviations of the $\beta$ estimates than for P0-XA, but they are almost similar in relative terms.

Table 5.11: P1ft-XC. Feasible t-tests for Arellano-Bond and Blundell-Bond after collapsing: actual significance levels of tests on $\gamma$ and $\beta$ for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values as in Table 5.10.

Table 5.12: P1fJ-XC. Feasible Sargan-Hansen tests after collapsing: rejection probabilities for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values as in Table 5.10.

The patterns in the rejection probabilities under the respective null hypotheses are hardly affected, and P2-XC shows again improved behavior of the test statistics due to reduced estimator bias, whereas the RMSE values have slightly increased. In P3 we change $EF_x^\eta$ from 0 to 0.3, while keeping $EVF_x=0.6$ (hence, realizing now dependence between regressor $x_{it}$ and the individual effect $\eta_i$). Comparing the results for P3-XA with those for P2-XA (which have the same $\beta$ values) we find that all patterns are pretty similar. Also P3-XC follows the P2-XC picture closely. P4 differs from P3 because $\kappa=0.25$, thus now the heteroskedasticity is determined by $\eta_i$ too. This has a noteworthy effect on MBB2 estimation and a minor effect on JBB (and thus on JES) testing only.

P5 differs from P0 just in having $\rho_{x\varepsilon}=0.3$, so $x_{it}$ is now endogenous with respect to $\varepsilon_{it}$.

P5-EA uses all instruments available when correctly taking the endogeneity into account. This leads to very unsatisfactory results. The coefficient estimates of $\gamma$ have serious negative bias, and those for $\beta$ positive bias, whereas the standard deviation is slightly larger than for P0-EA, which is substantially larger than for P0-XA. All coefficient tests are very seriously oversized, also after a Windmeijer correction, both for AB and BB. Tests JABu and JBBu show underrejection, whereas the matching JES tests show serious overrejection when $T$ is large, but the feasible 2-step variants are not all that bad. From Tables 5.13, 5.14 and 5.15 (P5fj-EC, j = c, t, J) we see that most results which correctly handle the simultaneity of $x_{it}$ are still bad after collapsing, especially for $T$ small (where collapsing can only lead to a minor reduction of the instrument set), although not as bad as those for P5-EA and larger values of $T$.

Table 5.13: P5fc-EC. Feasible coefficient estimators for Arellano-Bond and Blundell-Bond after collapsing, with $x_{it}$ treated as endogenous: bias, Stdv and RMSE of $\gamma$ and $\beta$ for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values: as in Table 5.3, except $\rho_{x\varepsilon}=0.3$, implying $\rho_{v\varepsilon}=0.5$.

Table 5.14: P5ft-EC. Feasible t-tests for Arellano-Bond and Blundell-Bond after collapsing, with $x_{it}$ treated as endogenous: actual significance levels of tests on $\gamma$ and $\beta$ for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values as in Table 5.13.

Table 5.15: P5fJ-EC. Feasible Sargan-Hansen tests after collapsing, with $x_{it}$ treated as endogenous: rejection probabilities for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values as in Table 5.13.

For P5-EC the rejection probabilities of the corrected coefficient tests are usually in the 10-20% range, but those of the 2-step J tests are often close to 5%.

Both AB and BB are inconsistent when treating $x_{it}$ either as predetermined or as exogenous. For P5-WA and P5-XA the coefficient bias is almost similar but much more serious than for P5-EA. For the inconsistent estimators the bias does not reduce when collapsing the instruments. Because the inconsistent estimators have a much smaller standard deviation than the consistent estimators, practitioners should be warned never to select an estimator simply because of its attractive estimated standard error. The consistency of AB and BB should be tested with the Sargan-Hansen test. In this study we did not examine the particular incremental test which focusses on

the validity of the extra instruments when comparing E with W or E with X. Here we just examine the rejection probabilities of the overall overidentification J tests for case P5 using all instruments, and can compare the rejection frequencies when treating $x_{it}$ correctly as endogenous, or incorrectly as either predetermined or exogenous. From Table 5.16 (P5fJ-jA, j = E, W, X) we find that the detection of inconsistency in this way often has a higher probability when the null hypothesis is W than when it is X. The probability generally increases with $T$ and with $\gamma$, and is often better for the c variant than for the a variant and slightly better for BB implementations than for AB implementations, whereas in general heteroskedasticity mitigates the rejection probability. In the situation where all instruments have been collapsed, where we already established that the J tests do have reasonable size control, we find the following. For $T=3$ and $\gamma=0.2$ the rejection probability of the JAB and JBB tests does not rise very much when $\rho_{x\varepsilon}$ moves from 0 to 0.3, whereas for $T=9$, $\rho_{x\varepsilon}=0.3$ this rejection probability is often larger than 0.7 when $\gamma\leq 0.5$ and often larger than 0.9 for $\gamma=0.8$. Hence, only for particular $T$, $\gamma$ and $\theta$ parametrizations does the probability to detect inconsistency seem reasonable, whereas the major consequence of inconsistency, which is serious estimator bias, is relatively invariant regarding $T$, $\gamma$ and $\theta$.

Summarizing our results for effect stationary models, we note the following. We established that finite sample inaccuracies of the asymptotic techniques seriously aggravate when either $\sigma_\eta/(1-\gamma)>\sigma_\varepsilon$ or under simultaneity.

Table 5.16: P5fJ-jA, j = E, W, X. Feasible Sargan-Hansen tests using all instruments, with $x_{it}$ treated as endogenous (EA), predetermined (WA) or strictly exogenous (XA): rejection probabilities of JAB$_a^{(2,1)}$, JBB$_a^{(2,1)}$, JES$_a^{(2,1)}$, JAB$_c^{(2,1)}$, JBB$_c^{(2,1)}$, JES$_c^{(2,1)}$ for $T=3,6,9$ ($\theta=1$). [Numeric entries lost in transcription.] Design parameter values as in Table 5.13.

For both problems it helps to collapse instruments, and the first problem is mitigated, and the second problem detected with higher probability, by instrumenting according to W rather than X. Neglected simultaneity leads to seemingly accurate but seriously biased coefficient estimators, whereas asymptotically valid inference on simultaneous dynamic relationships is often not very accurate either. Even when the more efficient BB estimator is used with Windmeijer corrected standard errors, the bias in both $\gamma$ and $\beta$ is very substantial and test sizes are seriously distorted. Some further pilot simulations disclosed that $N$ should be very much larger than 200 in order to find much more reasonable asymptotic approximation errors.

5.3.2 Nonstationarity

Next we examine the effects of a value of $\phi$ different from unity while $s=-50$. We will just consider setting $\phi=0.5$ and perturbing $x_{i0}$ and $y_{i0}$ according to (5.25), so that their dependence on the effects is initially 50% away from stationarity, so that BB estimation is inconsistent. That this occurred we will indicate in the parametrization code by changing P into P$^\phi$. Comparing the results for P$^\phi$0-XA with those for P0-XA, where $\phi=1$ (effect stationarity), we note from Table 5.17 (P$^\phi$0fc-XA) a rather moderate positive bias in the BB estimators for both $\gamma$ and $\beta$ when both $T$ and $\gamma$ are small. Despite the inconsistency of BB the bias is very mild for larger $T$, and especially for larger $\gamma$ it is much smaller than for consistent AB. The pattern regarding $T$ can be explained, because convergence towards effect stationarity does occur when time proceeds. Since this convergence is faster for smaller $\gamma$, the good results for large $\gamma$ seem due to the great strength of the first-differenced lagged instruments regarding the level equation. Since $\pi_\eta=0$ here, $\Delta x_{i,t-1}$ is in fact a valid instrument too. Note that the RMSE of inconsistent BB1, BB2a and BB2c is always smaller than that of consistent AB1, AB2a and AB2c, except when $T$ and $\gamma$ are both small. With respect to the AB estimators we find little to no difference compared to the results under stationarity. Table 5.18 (P$^\phi$0ft-XA) shows that when $\gamma=0.8$ the BB2cW coefficient test on $\gamma$ yields very mild overrejection, while AB2aW and AB2cW seriously overreject. For smaller values of $\gamma$ it is the other way around. After collapsing (not tabulated here) similar but more moderate patterns are found, due to the mitigated bias, which goes again with slightly increased standard errors. Hence, for this case we find that one should perhaps not worry too much when applying BB, even if effect stationarity does not strictly hold for the initial observations. As it happens, we note from Table 5.19 (P$^\phi$0fJ-XA) that the rejection probabilities of the JES tests are such that they are relatively low when BB inference is more precise than AB inference, and relatively high when either $T$ or $\gamma$ are low for $\phi=0.5$. This pattern is much more pronounced for the JES tests than for the JBB tests.

Table 5.17: P^φ0fc-XA. Feasible coefficient estimators for Arellano-Bond and Blundell-Bond (θ = 1, ρ_xε = 0): Bias, Stdv and RMSE of AB1, AB2a, AB2c and MAB2, and of BB1, BB2a, BB2c and MBB2, for γ and for β, in row blocks for T = 3, 6 and 9. [Numerical entries not recovered.] Based on R simulation replications. Design parameter values: N = 200, SNR = 3, DENy = 1, EVFx = 0, ρ_xε = 0, ξ = 0.8, κ = 0, σ_ε = 1, q = 1, φ = 0.5. These yield the DGP parameter values: π_λ = 0, π_η = 0, σ_v = 0.60, σ_η = 1 − γ, ρ_vε = 0 (and ρ_xη = 0, ρ_xλ = 0).

However, it is also the case in P^φ0 that collapsing mitigates this welcome quality of the JES tests to warn against unfavorable consequences of effect nonstationarity on BB inference.

From P^φ1-XA, in which the individual effects are much more prominent, we find that φ = 0.5 has curious effects on AB and BB results. For effect stationarity (φ = 1) we already noted more bias for AB than under P0. For γ large, this bias is even more serious when φ = 0.5, despite the consistency of AB. For BB estimation the reduction of φ leads to much larger bias and much smaller stdv, with the effect that RMSE values for inconsistent BB are usually much worse than for AB, but are often slightly better (except for BB2c) when γ = 0.8.

Table 5.18: P^φ0ft-XA. Feasible t-tests: actual significance levels (θ = 1, ρ_xε = 0) of the Arellano-Bond variants AB1, AB1aR, AB1cR, AB2a, AB2aW, AB2c and AB2cW, and of the Blundell-Bond variants BB1, BB1aR, BB1cR, BB2a, BB2aW, BB2c and BB2cW, for tests on γ and on β, in row blocks for T = 3, 6 and 9. [Numerical entries not recovered.] Based on R simulation replications. Design parameter values: N = 200, SNR = 3, DENy = 1, EVFx = 0, ρ_xε = 0, ξ = 0.8, κ = 0, σ_ε = 1, q = 1, φ = 0.5. These yield the DGP parameter values: π_λ = 0, π_η = 0, σ_v = 0.60, σ_η = 1 − γ, ρ_vε = 0 (and ρ_xη = 0, ρ_xλ = 0).

Table 5.19: P^φ0fJ-XA. Feasible Sargan-Hansen tests: rejection probabilities (df, θ = 1, ρ_xε = 0) of JAB_a(2,1), JBB_a(2,1), JES_a(2,1), JAB_c(2,1), JBB_c(2,1) and JES_c(2,1), in row blocks for T = 3, 6 and 9. [Numerical entries not recovered.] Design and DGP parameter values as in Table 5.18.

All BB coefficient tests for γ have size close or equal to 1 under P^φ1-XA, and the AB tests for γ = 0.8 overreject very seriously as well. Under P^φ1-XC the bias of AB is reasonable except for γ = 0.8. The bias of BB has decreased but is still enormous, although its RMSE remains preferable when γ = 0.8. Especially regarding tests on γ, BB fails. For both the a and c versions the JES test has high rejection probability to detect φ ≠ 1, except when γ is large. The relatively low rejection probability of JES tests obtained after collapsing when γ = 0.8 and φ = 0.5 again indicates that despite its inconsistency BB has similar or smaller RMSE than AB for that specific case.

Next we consider the simultaneous model again. In case P^φ5-EA estimator AB is consistent and BB again inconsistent. Nevertheless, for all γ and T values examined in Table 5.20 (P^φ5fc-EA), AB has a more severe bias than BB, whereas BB has smaller stdv values at the same time, and thus smaller RMSE for all γ and T values examined.

Table 5.20: P^φ5fc-EA. Feasible coefficient estimators for Arellano-Bond and Blundell-Bond (θ = 1): Bias, Stdv and RMSE of AB1, AB2a, AB2c and MAB2, and of BB1, BB2a, BB2c and MBB2, for γ and for β, in row blocks for T = 3, 6 and 9. [Numerical entries not recovered.] Based on R simulation replications. Design parameter values: N = 200, SNR = 3, DENy = 1, EVFx = 0, ρ_xε = 0.3, ξ = 0.8, κ = 0, σ_ε = 1, q = 1, φ = 0.5. These yield the DGP parameter values: π_λ = 0, π_η = 0, σ_v = 0.60, σ_η = 1 − γ, ρ_vε = 0.5 (and ρ_xη = 0, ρ_xλ = 0).

The size control of coefficient tests is worse for AB, but for BB it is appalling too, where BB2aW, with estimated type I error probabilities ranging from 5% to 70%, is often preferable to BB2cW. The 2-step JAB tests behave reasonably (they wrongly indicate inconsistency with a probability rather close to 5%), whereas the JBB tests reject with probabilities in the 3-38% range, and JES in the 3-69% range. By collapsing, the RMSE of AB generally reduces when T ≥ 6, and for BB especially when γ = 0.8. BB has again smaller RMSE than AB. The rejection rates of the JBB and JES tests are substantially lower now, which seems bad because the invalid (first-differenced) instruments are less often detected, but this may nevertheless be appreciated because it induces one to prefer the less inaccurate BB inference over AB inference.

After collapsing the size distortions of BB2aW and BB2cW are less extreme too, now ranging from 5-33%, but the RMSE values for BB may suffer due to collapsing, especially when γ and T are small. The RMSE values for BB under P^φ5-WA and P^φ5-XA are usually much worse than those for AB under P^φ5-EA. Hence, although the invalid instruments for the level equation are not necessarily a curse when the endogeneity of x_it is respected, they should not be used when they are invalid for two reasons (φ ≠ 1, ρ_xε ≠ 0). That neither AB nor BB should be used in P5 under W and X will be indicated with highest probability under WC, and then this probability is larger than 0.8 for JBB_a(2,1) only when T is high and for JAB_a(2,1) only when both T and γ are high.

Summarizing our findings regarding effect nonstationarity, we have established that although φ ≠ 1 renders BB estimators inconsistent, especially when T is not small BB inference nevertheless often beats consistent AB, provided possible endogeneity of x_it is respected. The JES test seems to have the remarkable property of being able to guide towards the technique with smallest RMSE instead of the technique exploiting the valid instruments.

5.4 Empirical results

The above findings will now be employed in a re-analysis of the data and some of the techniques studied in Ziliak (1997). The main purpose of that article was to expose the downward bias in GMM as the number of moment conditions expands. This is done by estimating a static life-cycle labor-supply model for a ten year balanced panel of males, and comparing for various implementations of 2SLS and GMM the coefficient estimates and their estimated standard errors when exploiting expanding sets of instruments. We find this approach rather naive for various reasons: (a) the difference between empirical coefficient estimates will at best provide a very poor proxy to any underlying difference in bias; (b) standard asymptotic variance estimates of IV estimators are known to be very poor representations of true estimator uncertainty;^5 (c) the whole analysis is based on just one sample, and possibly the model is seriously misspecified.^6 The latter issue also undermines conclusions drawn in Ziliak (1997) on overrejection by the J test, because it is of course unknown in which, if any, of his empirical models the null hypothesis is true. To avoid such criticism we designed the controlled experiments in the two foregoing sections on the effects of different sets of instruments on various relevant inference techniques.

5 See findings in Kiviet (2013) and in many of its references.
6 Baltagi et al. (2005) study a similar life-cycle labor-supply model for physicians in Norway. They consider a dynamic model, and this rejects the static specification used by Ziliak (1997).

And now we will examine how these simulation results can be exploited to underpin actual inference from the data set used by Ziliak. This data set originates from waves XI-XX of the PSID. The subjects are N = 532 continuously married working men. Ziliak (1997) employs the static model^7

ln h_it = β ln w_it + z_it′γ + η_i + ε_it,   (5.26)

where h_it is the observed annual hours of work, w_it the hourly real wage rate, z_it a vector of four characteristics (kids, disabled, age, age-squared), η_i an individual effect and ε_it the idiosyncratic error term. It is assumed that ln w_it may be an endogenous regressor and that all variables included in z_it are predetermined. The parameter of interest is β, and its GMM estimates range from approximately 0.07 to 0.52, depending on the number of instruments employed.

After some experimentation we inferred that lagged reactions play a significant role in this relationship and that in fact a general second-order linear dynamic specification is required in order to pass the diagnostic tests which are provided by default in the Stata package xtabond2, see Roodman (2009). This model, also allowing for time-effects, is given by

ln h_it = Σ_{l=1}^{2} γ_l ln h_{i,t−l} + Σ_{l=0}^{2} (β_l^w ln w_{i,t−l} + β_l^k kids_{i,t−l} + β_l^d disab_{i,t−l}) + β^a age_it + β^{aa} age²_it + τ_t + η_i + ε_it.   (5.27)

Note that lags of age and its square would lead to multicollinearity problems. The inclusion of second-order lags yields T = 7 in estimating the first-differenced model.

Table 5.21 presents estimation and test results for model (5.27) obtained by employing various of the techniques examined in the earlier sections. All results have been obtained by Stata/SE 11.2 and xtabond2, supplemented with code for calculating σ̂²_η and σ̂²_ε. In column (1) 2-step Arellano-Bond GMM estimators are presented (omitting the time-effect results) in which all instruments are being used that follow from assuming that all regressors are predetermined, which gives 170 instruments. For the AR and J tests given in the bottom lines the p-values are presented. Hence, in column (1) the (first-differenced) residuals do exhibit 1st order serial correlation (as they should) but no significant 2nd order problems emerge. The presented J test is JAB_a(2,1), which according to the simulations may have a tendency to underreject, and to have moderate rejection probability when a regressor has incorrectly not been categorized as endogenous, especially under heteroskedasticity.

7 This static model is also used extensively for illustrative purposes in Cameron and Trivedi (2005).

Table 5.21: Empirical findings for (restricted versions of) model (5.27) by AB estimation. Columns: (1) AB2aW-WA, (2) AB2a-WA, (3) AB2aW-EA, (4) AB1aR-WA, (5) AB1aR-EA, (6) AB1aR-WC, (7) AB1aR-EC, (8) AB2aW-EC. Rows: coefficient estimates (estimated standard errors in parentheses; ** for |t| > 2, * for 1 < |t| < 2) of γ_1, γ_2, β_0^w, β_1^w, β_2^w, β_0^k, β_1^k, β_2^k, β_0^d, β_1^d, β_2^d, β^a and β^{aa}, followed by K, L, the p-values of the AR(1), AR(2) and J tests, σ̂_η, σ̂_ε and TM_w. [Numerical entries not recovered; the values discussed in the text are quotations from this table.]

Therefore its high p-value is not necessarily reassuring regarding validity of the instruments. When abstaining from 7 instruments such that wage is treated as endogenous, while still using as weighting matrix a submatrix of the weights under predeterminedness, the p-value of JAB_a(2,1) is still high and the resulting incremental J test for wage being predetermined is 7.13, which is clearly insignificant. Hence, from the pure asymptotic point of view these coefficient estimates and their Windmeijer-corrected standard errors could be taken seriously, because as yet no alarming signs of misspecification have been detected.

All coefficient estimates with a t-ratio above 2 are indicated with a double asterisk, and with a single asterisk when it is between 1 and 2; estimated standard errors are given between parentheses. Half of the time-dummies have a t-ratio around 1 and the others much smaller. Different though asymptotically equivalent tests for their joint significance (not presented here) yield χ²(7) statistics between 8 and 9. So, although not significantly different from zero, simply removing them from the model seems a bit rash.

Next, we will examine the effects of the Windmeijer correction in column (2) and of treating wage not as predetermined but as endogenous in column (3). For the squared (not first-differenced) residuals of results (1) and (3) we also ran informal auxiliary regressions (with and without individual effects) on all the predetermined regressors of model (5.27). Due to several highly significant regressors these make obviously clear that ε_it is heteroskedastic and not of the simple cross-sectional type. Hence, the 1-step estimates which follow in columns (4) through (7) should be judged on the basis of robust standard errors only. Column (2) presents crude asymptotic standard errors, which are as expected much smaller for all coefficients. From the simulations we know that especially under heteroskedasticity we should distrust the uncorrected estimated standard errors of column (2). Column (3) shows that although some estimated standard errors substantially increase, many others remain almost the same or even decrease when we treat log wage as endogenous.

Especially the findings in column (3) on the coefficients γ_1 and γ_2 strongly reject the static specifications maintained in Ziliak (1997) and in Cameron and Trivedi (2005) for the same data. On the other hand, the estimated values are such that the simulation results on large γ values do not seem relevant here. Testing the endogeneity of wage by the difference between the two obtained JAB_a(2,1) statistics yields a statistic with p-value 0.161, leaving us still in doubt whether we should simply accept predeterminedness of wage, or err on the side of caution and treat it as endogenous. The rather indeterminate test results on endogeneity are in agreement with the finding that the coefficient estimates in columns (1) and (3) are only slightly different. They imply an immediate wage elasticity of hours worked (β̂_0^w) of 0.40 and 0.46 respectively, and an estimated long-run elasticity TM_w = (β̂_0^w + β̂_1^w + β̂_2^w)/(1 − γ̂_1 − γ̂_2) of 0.68 and (assuming endogeneity of wage) a comparable value. A static model self-evidently would produce equivalent point-estimates for the immediate and the long-run wage elasticity of hours worked. The estimates in columns (1) and (3) suggest a value of DENyη of about 1.22 or 0.96 respectively, which we interpret such that the relatively unfavorable simulation results for case P1, where it is 4, do not apply here.
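As an aside, the mapping from the tabulated coefficients to the long-run elasticity is just the arithmetic of the lag polynomial in (5.27); the following minimal Python sketch makes it explicit (function name and interface are ours, purely for illustration):

def long_run_elasticity(beta_w, gamma):
    """Long-run wage elasticity TM_w = (b0 + b1 + b2)/(1 - g1 - g2) implied
    by model (5.27); beta_w = (b0, b1, b2) holds the current and lagged
    wage coefficients, gamma = (g1, g2) the autoregressive coefficients."""
    return sum(beta_w) / (1.0 - sum(gamma))

For a static model, i.e. gamma = (0, 0) and beta_w = (b0, 0, 0), this returns b0, illustrating the equality of immediate and long-run elasticities noted above.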

Next we estimate the same model again, but now by robust 1-step Arellano-Bond, assuming all variables to be predetermined in column (4) and wage to be endogenous in column (5). Most AB1aR standard errors are a bit larger than those for the corresponding AB2aW. This is in agreement with the differences between true standard deviations that we noted within the simulations for P0 and P5. Although these simulations did not produce a serious difference between the coefficient bias of AB2 and AB1, the results in column (4) yield a much larger estimated TM_w. The test on overidentification provided by xtabond2^8 (after estimating by AB1aR) is again JAB_a(2,1) and not JAB_a(1,1).

Next we re-estimate the models of columns (4) and (5) by AB1aR, though employing substantially fewer instruments by collapsing. The results in columns (6) and (7) show some remarkable changes, both regarding coefficient estimates (yielding substantially lower TM_w) and regarding standard errors, which are often larger (as expected) but sometimes much lower. According to the simulation results both (6) and (7) will produce seriously biased coefficient estimates in the presence of a genuinely endogenous regressor, and both virtually unbiased estimators if all regressors are actually predetermined. The two versions of the incremental test (with 1 degree of freedom) on endogeneity give here p-values of … and … respectively. The collapsed AB2aW results in column (8) are less extreme and more in line with those of column (3). An unpleasant finding in columns (7) and (8) is that for the time dummies the incremental J test leads to rejection of their validity as instruments. Since these are exogenous by nature, this raises doubts on the adequacy of the model specification. Such doubts may also be fed by the fact that all columns, except (8), suggest that a disability leads to increased labor supply, although this might perhaps be realistic in the US.

We also employed BB2aW estimation, for E and W, and for A and C. When using all instruments we found p-values for the incremental test on effect stationarity of … and …. However, after collapsing these are … and …. This is a pattern that we noted in our simulations, and it gives reason to believe that we should reject effect stationarity and take this test outcome as a warning that BB inference may here be less accurate than AB inference. The BB coefficient estimates imply an insignificant but positive immediate wage elasticity, whereas the long-run elasticity for all four is close to zero or even negative. This could well be due to the huge coefficient bias that we noted in our experiments with φ = 0.5, especially those with an endogenous regressor. Hence, supposing that for these data wage is indeed endogenous, that the effects are nonstationary and that heteroskedasticity is present, our cases P^φ5-EA and P^φ5-EC with moderate γ, T = 6 and θ = 1 seem most relevant.

8 From our simulations we found that JAB(1,0), which is the test addressed as Sargan by xtabond2, has a rejection probability close to 1 under heteroskedasticity. So, it is useless after AB2 estimation and even misleading due to the comment "not robust, but not weakened by many instruments".

For these we learned that collapsing has some advantages, but otherwise there is little to choose between AB1a and AB2a, because both yield substantial negative bias (about −50%) for γ and huge positive bias (about +30%) for β, with actual test size of corrected t-tests close to 25%, although in this data set N is actually larger than 200. Nevertheless it seems a truism to conclude that there is an urgent need for developing more refined inference procedures for structural dynamic panel data models.

5.5 Major findings

To parametrize and categorize all the variants put forward in the previous chapter for inference in linear dynamic micro panel data models, a simulation study has been designed here, which leads to a data generating process involving 13 parameters, for which, under 6 different settings regarding sample size and initial conditions, many different grid points have been examined. For each setting and various of the grid points, 14 different choices regarding the set of instruments have been used to examine 10 different feasible implementations of GMM coefficient estimators (and 4 unfeasible ones), giving rise to 16 different feasible implementations of t-tests and 24 different feasible implementations of Sargan-Hansen tests. From all this only a pragmatically selected subset of results is actually presented in this chapter.

The major conclusion from the simulations is that, even when the cross-section sample size is several hundreds, the quality of this type of inference depends heavily on a great number of aspects, of which many are usually beyond the control of the investigator, such as: magnitude of the time-dimension sample size, speed of dynamic adjustment, presence of any endogenous regressors, type and severity of heteroskedasticity, relative prominence of the individual effects and (non)stationarity of the effect impact of any of the explanatory variables. The quality of inference also depends seriously on choices made by the investigator, such as: type and severity of any reductions applied regarding the set of instruments, choice between (robust) 1-step or (corrected) 2-step estimation, employing a modified GMM estimator, the chosen degree of robustness of the adopted weighting matrix, and the employed variant of coefficient tests and of (incremental) Sargan-Hansen tests in deciding on the endogeneity of regressors, the validity of instruments and the (dynamic) specification of the relationship in general.

Our findings regarding the alternative approaches of modifying instruments and exploiting different weighting matrices are as follows for the examined case of cross-sectional heteroskedasticity. Although the unfeasible form of modification does yield very substantial reductions in both bias and variance, for the straightforward feasible implementation examined here the potential efficiency gains do not materialize.

The robust weighting matrix, which also allows for possible time-series heteroskedasticity, often performs as well as (and sometimes even better than) a specially designed less robust version, although the latter occasionally demonstrates some benefits for incremental Sargan-Hansen tests.

Furthermore we can report to practitioners: (a) when the effect-noise-ratio is large, the performance of all GMM inference deteriorates; (b) the same occurs in the presence of a genuine (or a supervacaneously treated as) endogenous regressor; (c) in many settings the coefficient restrictions tests show serious size problems, which usually can be mitigated by a Windmeijer correction, although for γ large or under simultaneity serious overrejection remains unless N is very much larger than 200; (d) the limited effectiveness of the Windmeijer correction is due to the fact that the positive or negative bias in coefficient estimates is often more serious than the negative bias in the variance estimate; (e) limiting to some degree the number of instruments usually reduces bias and therefore improves size properties of coefficient tests, though at the potential cost of power loss because efficiency usually suffers; (f) for the case of an autoregressive strictly exogenous regressor we noted that it is better not just to instrument it by itself, but also by some of its lags, because this improves inference, especially regarding the lagged dependent variable coefficient; (g) to mitigate size problems of the overall Sargan-Hansen overidentification tests, the set of instruments should be reduced, possibly by collapsing, and then, after 2-step estimation, none of the examined alternative variants did systematically beat the one using the standard robust weighting matrix based on 1-step residuals; (h) collapsing also reduces size problems of the incremental Sargan-Hansen effect stationarity test; (i) except under simultaneity, the GMM estimator which exploits instruments that are invalid under effect nonstationarity may nevertheless perform better than the estimator abstaining from these instruments; (j) the rejection probability of the incremental Sargan-Hansen test for effect stationarity is such that it tends to direct the researcher towards applying the most accurate estimator, even if this is inconsistent.

When re-analyzing a popular empirical data set in the light of the above simulation findings, we note in particular that actual dynamic feedbacks may be much more subtle than those that can be captured by just including a lagged dependent variable regressor, which at present seems the most common approach to model dynamics in panels. In theory the omission of further lagged regressor variables should result in rejections by Sargan-Hansen test statistics, but their power suffers when many valid and some invalid orthogonality conditions are tested jointly, instead of by deliberately chosen sequences of incremental tests or direct variable addition tests. Hopefully tests for serial correlation, which we intentionally left out of this already overloaded study, provide extra help to practitioners in guiding them towards well-specified models.

Appendix

5.A Derivations for (5.17)

The results for V_η and V_λ are obvious. Those for V_ζ(i) and V_ε(i) are obtained as follows. We use the standard result that the variance of the general stationary ARMA(2,1) process

z_t = [ψ(1 − φL) / ((1 − γL)(1 − ξL))] u_t,  where u_t ~ D(0,1),   (5.28)

is given by

Var(z_t) = ψ² [(1 + γξ)(1 + φ²) − 2φ(γ + ξ)] / [(1 − γξ)(1 − γ²)(1 − ξ²)].   (5.29)

Because we can rewrite (using σ_ε = 1)

[βρ_vε σ_v + (1 − ξL)] ω_i^{1/2} = (1 + βρ_vε σ_v) ω_i^{1/2} (1 − [ξ/(1 + βρ_vε σ_v)] L),

the result for V_ε(i) follows upon substituting ψ = (1 + βρ_vε σ_v) ω_i^{1/2} and φ = ξ/(1 + βρ_vε σ_v). For V_ζ(i) simply take φ = 0 and ψ = βσ_v (1 − ρ²_vε)^{1/2} ω_i^{1/2}.
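The variance expression (5.29) is easily verified numerically; the following Python sketch (our own check with arbitrary admissible parameter values, not part of the original derivation) simulates the recursion implied by (5.28) and compares the sample variance with the formula:

import numpy as np

# Check of (5.29): (1 - gamma L)(1 - xi L) z_t = psi (1 - phi L) u_t.
rng = np.random.default_rng(1)
psi, phi, gamma, xi = 1.3, 0.4, 0.6, 0.8   # arbitrary stationary values
n, burn = 400_000, 1_000
u = rng.standard_normal(n + burn)
z = np.zeros(n + burn)
for t in range(2, n + burn):
    z[t] = ((gamma + xi) * z[t - 1] - gamma * xi * z[t - 2]
            + psi * (u[t] - phi * u[t - 1]))
var_formula = (psi**2 * ((1 + gamma * xi) * (1 + phi**2)
                         - 2 * phi * (gamma + xi))
               / ((1 - gamma * xi) * (1 - gamma**2) * (1 - xi**2)))
print(np.var(z[burn:]), var_formula)       # the two should nearly coincide

Setting xi = phi = 0 reduces the formula to the familiar AR(1) variance psi²/(1 − gamma²), a quick sanity check on the reconstruction.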


Chapter 6

Refined exogeneity tests in dynamic panel data models

6.1 Introduction

One of the main advantages of panel data is the possibility to deal with certain forms of unobserved heterogeneity. In linear single equation models for wide panels, which may involve lags of the dependent variable and possibly endogenous regressors, the most popular estimation methods are based on GMM (Hansen, 1982), more precisely the implementations proposed by Arellano and Bond (1991) and Blundell and Bond (1998). These methods exploit the fact that a sufficient number of instruments are available internally, which potentially makes the search for external instruments superfluous. Violation of the resulting moment conditions can have two sources: (i) misspecification of the structural relationship; (ii) misclassification of explanatory variables with respect to their correlation with the idiosyncratic error term when the structural equation is correctly specified. After GMM estimation it is common practice to calculate the Sargan-Hansen^1 test statistic in order to check the validity of the overidentifying restrictions, a particular linear transformation (of reduced rank) of the moment conditions. The standard Sargan-Hansen test cannot be used directly to distinguish between (i) and (ii).

This overall number of available internal instruments grows quadratically with the number of time-series observations. Several studies, such as Newey and Smith (2004) and Newey and Windmeijer (2009), have discussed the potential bias of the two-step GMM estimator as the number of instruments grows large relative to the width of the panel.^2

1 This test goes under various names, of which the most popular variants seem to be the J-test and the Sargan test.
2 Okui (2009) discusses the optimal number of instruments in dynamic panel data models based on an approximation of the mean squared error and concludes that the number of instruments should be taken proportional to both dimensions.

Not only does the number of instruments influence the bias of the estimator: as discussed by Bowsher (2002), Windmeijer (2005), Kiviet et al. (2014) and others, the Sargan-Hansen test may also suffer from serious underrejection when the number of moment conditions grows large. Several cures have been proposed, such as reducing the set of instruments, see for instance Roodman (2009), or modifying the test statistic, as in Hayakawa (2014).

When explanatory variables are wrongly classified as strictly or weakly exogenous whereas the structural equation is correctly specified, only a subset of moment conditions is violated. Rather than testing all overidentifying restrictions, one can use tests on subsets of moment conditions to detect misclassification, which in theory should have more power. Here the focus is on these subset tests. We will consider two closely related testing principles. The first is the incremental Sargan-Hansen test, obtained as the difference of two overall Sargan-Hansen test statistics. Although the standard Sargan-Hansen test suffers in the presence of many moment conditions, it is as yet unclear in what way the incremental test is affected. The second test is a Hausman test, a specific implementation of the testing principle proposed by Hausman (1978), also referred to as the Durbin-Wu-Hausman test. The relationship between these two tests is discussed and several variants are examined. Due to its infrequent use in dynamic panel data models, not much is known about its finite sample performance. Although simulation results for a Hausman test by Arellano and Bond (1991) on second-order serial correlation were not very encouraging in that context, there is reason to reconsider this test. Using the delta method as in Windmeijer (2005), we find a finite sample corrected variance estimate for the vector of contrasts. Employing this leads to a corrected Hausman test statistic.

Additionally, the diagonal Sargan-Hansen test by Hayakawa (2014) is generalized to the incremental case. This test uses a weighting matrix that is block-diagonal by construction. As Hayakawa (2014) demonstrates, the dimension of the weighting matrix is an important factor in estimating its inverse in finite samples. Hence, forcing a block-diagonal structure could mitigate problems stemming from its dimension. Because the resulting statistic does not follow a standard distribution, critical values have to be obtained either by simulation or by the procedure of Imhof (1961). By means of Monte Carlo simulation it is examined how these two refined tests behave relative to their classic standard counterparts.

Studies with simulation findings on overidentifying restrictions tests in dynamic panel data models include Bowsher (2002), Hayakawa (2014) and Kiviet et al. (2014). Bowsher (2002) focusses mainly on panel AR(1) models, although it does include some simulations based on an empirical example. Hayakawa (2014) and Kiviet et al. (2014) include an additional regressor that can be made endogenous. Because we are interested in classifying regressors rather than testing all overidentifying restrictions, we simulate from a linear first-order autoregressive model with an additional regressor which can be made either strictly exogenous, weakly exogenous or endogenous.

This is done by extending the simulation design of Kiviet et al. (2014) by adding a lagged error term to the first-order autoregressive process for the additional regressor. We will restrict ourselves to tests based on the Arellano-Bond estimator (AB), although all test statistics can be applied to Blundell-Bond estimators (BB) as well.

In Section 6.2 the standard exogeneity tests used for testing a subset of moment conditions are discussed for general models estimated by GMM. Section 6.3 introduces the refined statistics and their properties. Application to dynamic panel data models is discussed in Section 6.4. Section 6.5 presents the simulation design. Results for these simulations are discussed in Section 6.6, an empirical case is studied in Section 6.7 and Section 6.8 concludes.

6.2 Exogeneity tests

6.2.1 Estimators and assumptions

Consider the L × 1 vector of linear moment conditions E[g(w_i, θ_0)] = E[g_i(θ_0)] = 0 ∀i, where w_i is a vector of observed variables for i = 1,...,N, independent over i, and θ_0 is the true value of the K × 1 unknown parameter vector θ, with K < L. Define

ḡ(θ) = (1/N) Σ_{i=1}^N g_i(θ),  C = ∂ḡ(θ)/∂θ′,  C_0 = plim_{N→∞} ∂ḡ(θ)/∂θ′.   (6.1)

Due to the assumed linearity these last two L × K matrices do not depend on θ. The one-step GMM estimator θ̂^(1) is found by minimizing

Q^(1)(θ) = ḡ(θ)′ W_N^{-1} ḡ(θ)   (6.2)

with respect to θ. Here W_N is an initial L × L weighting matrix which satisfies plim_{N→∞} W_N = W, with W positive definite. Under standard regularity conditions,^3 which include

√N ḡ(θ_0) →_d N(0, V_0), with V_0 = lim_{N→∞} (1/N) Σ_{i=1}^N E[g_i(θ_0) g_i(θ_0)′],   (6.3)

3 See Arellano (2003) sections A.4 and A.

the one-step GMM estimator is consistent with limiting distribution

√N(θ̂^(1) − θ_0) →_d N(0, (C_0′W^{-1}C_0)^{-1} C_0′W^{-1}V_0W^{-1}C_0 (C_0′W^{-1}C_0)^{-1}).   (6.4)

An optimal two-step estimator θ̂^(2) is obtained by minimizing

Q^(2)(θ) = ḡ(θ)′ W_N^{-1}(θ̂^(1)) ḡ(θ),   (6.5)

with

W_N(θ) = (1/N) Σ_{i=1}^N g_i(θ) g_i(θ)′.   (6.6)

From the independence of the observations it follows that plim_{N→∞} W_N(θ̂^(1)) = V_0, even in the presence of heteroskedasticity. The limiting distribution of θ̂^(2) is

√N(θ̂^(2) − θ_0) →_d N(0, (C_0′V_0^{-1}C_0)^{-1}).   (6.7)

Suppose that the set of moment conditions can be decomposed as E[g_i(θ)] = (E[g_{1i}(θ)]′, E[g_{2i}(θ)]′)′ = 0 ∀i, where g_{1i}(θ) is L_1 × 1 with K < L_1, g_{2i}(θ) is L_2 × 1 and L = L_1 + L_2 > L_1. Also decompose C = (C_1′, C_2′)′ and C_0 = (C_{01}′, C_{02}′)′ conformably with the decomposition of the moment conditions. This means that alternative one- and two-step estimators θ̂_1^(1) and θ̂_1^(2) can be found from exploiting just E[g_{1i}(θ_0)] = 0 ∀i by subsequently minimizing

Q_1^(1)(θ) = ḡ_1(θ)′ W_{11,N}^{-1} ḡ_1(θ),  Q_1^(2)(θ) = ḡ_1(θ)′ W_{11,N}^{-1}(θ̂_1^(1)) ḡ_1(θ),

where ḡ_1(θ) = (1/N) Σ_{i=1}^N g_{1i}(θ), W_{11,N} is the L_1 × L_1 upper left block of W_N and

W_{11,N}(θ) = (1/N) Σ_{i=1}^N g_{1i}(θ) g_{1i}(θ)′.

The limiting distributions of these estimators can easily be deduced from (6.4) and (6.7).
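For concreteness, the estimators in (6.2)-(6.7) take the following compact form in the linear case g_i(θ) = Z_i′(y_i − X_iθ); this Python sketch uses our own naming conventions and an initial weighting matrix Σ_i Z_i′Z_i, which is merely one common choice satisfying the conditions on W_N:

import numpy as np

def gmm_linear(y_list, X_list, Z_list):
    """One- and two-step GMM for linear moments g_i = Z_i'(y_i - X_i theta).
    Returns the one-step and two-step estimators and the usual estimate
    of the two-step variance based on (6.7)."""
    SZX = sum(Z.T @ X for Z, X in zip(Z_list, X_list))
    SZy = sum(Z.T @ y for Z, y in zip(Z_list, y_list))
    W1 = sum(Z.T @ Z for Z in Z_list)                  # an initial W_N
    th1 = np.linalg.solve(SZX.T @ np.linalg.solve(W1, SZX),
                          SZX.T @ np.linalg.solve(W1, SZy))
    # robust weighting matrix (6.6), up to a factor 1/N that cancels here
    m = [Z.T @ (y - X @ th1) for y, X, Z in zip(y_list, X_list, Z_list)]
    W2 = sum(np.outer(mi, mi) for mi in m)
    th2 = np.linalg.solve(SZX.T @ np.linalg.solve(W2, SZX),
                          SZX.T @ np.linalg.solve(W2, SZy))
    var2 = np.linalg.inv(SZX.T @ np.linalg.solve(W2, SZX))
    return th1, th2, var2

Here y_list, X_list and Z_list hold the unit-level data (y_i, X_i, Z_i); the subset estimators θ̂_1^(1) and θ̂_1^(2) follow by passing only the first L_1 columns of each Z_i.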

6.2.2 Incremental Sargan-Hansen test

As the incremental Sargan-Hansen test is based on the standard Sargan-Hansen test, we will first discuss the latter. The standard Sargan-Hansen test is a misspecification test that is usually calculated to validate GMM results. Its purpose is to detect violations of the moment conditions, which could be due either to misspecification of the structural relationship, misclassification of explanatory variables with respect to the idiosyncratic error term, or invalidity of external instruments. However, the last source is excluded here, as the validity of internal instruments is determined by the model specification and the classification of the regressors. As extensively discussed by Newey (1985), it can only be used to test the overidentifying restrictions and not all moment conditions. These overidentifying restrictions are a particular linear combination of the moment conditions. One can think of quite a few implementations of the Sargan-Hansen test, as many options exist regarding the weighting matrix, see for instance Kiviet et al. (2014). Here we will focus on the robust weighting matrix used in (6.6), in order to be able to deal with both cross-sectional and time-series heteroskedasticity. The standard Sargan-Hansen test statistic based on two-step GMM is

J^(2) = N ḡ(θ̂^(2))′ W_N^{-1}(θ̂^(1)) ḡ(θ̂^(2)).   (6.8)

Under the null hypothesis it can easily be proven that J^(2) →_d χ²(L − K). The precise statement of the null hypothesis requires some attention. Although many authors state that the null is E[g_i(θ_0)] = 0 ∀i, in practice it is impossible to distinguish^4 between this and E[ḡ(θ_0)] = 0. However, as mentioned before, it is only possible to test the overidentifying restrictions, which means that the actual hypothesis tested can be weakened to E[M_0 ḡ(θ_0)] = 0, where M_0 denotes an L × L transformation matrix of rank L − K. For convenience we write the null hypothesis as E[ḡ(θ_0)] = 0, acknowledging the important subtlety above.

If estimation is only based on the subset of moment conditions E[ḡ_1(θ_0)] = 0, the test statistic is

J_1^(2) = N ḡ_1(θ̂_1^(2))′ W_{11,N}^{-1}(θ̂_1^(1)) ḡ_1(θ̂_1^(2)),   (6.9)

with J_1^(2) →_d χ²(L_1 − K) if E[ḡ_1(θ_0)] = 0.

Incremental tests are concerned with testing the particular subset of L_2 moment conditions E[ḡ_2(θ_0)] = 0. This subset is tested whilst taking E[ḡ_1(θ_0)] = 0 as a maintained hypothesis. Although under the null hypothesis E[ḡ_2(θ_0)] = 0 the estimator θ̂^(2) is consistent and efficient, under the alternative θ̂_1^(2) is consistent and efficient, whereas θ̂^(2) is inconsistent.

4 Of course under independence of observations these statements are the same.

An incremental test statistic,

IJ^(2) = J^(2) − J_1^(2),   (6.10)

is for instance the difference of the two standard Sargan-Hansen test statistics in (6.8) and (6.9). It turns out^5 that under the null hypothesis IJ^(2) →_d χ²(L_2). It has been noted that in finite samples this test statistic can yield negative values.^6

An alternative test statistic that is guaranteed to be non-negative is provided^7 by Hayashi (2000). Define a hybrid two-step estimator θ̃_1^(2), which is found by minimizing

Q̃_1^(2)(θ) = ḡ_1(θ)′ W_{11,N}^{-1}(θ̂^(1)) ḡ_1(θ).   (6.11)

The weighting matrix used in the second step is based on the one-step estimator calculated under the null hypothesis. The incremental statistic based on the method of Hayashi (2000) is

ĨJ^(2) = J^(2) − J̃_1^(2),   (6.12)

where J̃_1^(2) = N ḡ_1(θ̃_1^(2))′ W_{11,N}^{-1}(θ̂^(1)) ḡ_1(θ̃_1^(2)). Because θ̂^(1) and θ̃_1^(2) are consistent estimators under the null hypothesis, we obtain ĨJ^(2) →_d χ²(L_2). Similarly, a test statistic can be constructed by using θ̂_1^(1) in the weighting matrix for the two-step estimator based on the full set of moment conditions. All incremental Sargan-Hansen test statistics discussed up to now are asymptotically equivalent and have the same local power properties. This stems from the fact that all weighting matrices used in the various test statistics are consistent under local misspecification.

As mentioned before, the Sargan-Hansen test is regarded as a general misspecification test instead of just an instrument validity test. In addition to a test on the validity of instruments, it can also be interpreted as an exclusion restriction test. Suppose we would like to test the significance of L − K additional homogeneous parameters and assume identification under this alternative hypothesis. The restricted two-step estimator that assumes these additional coefficients to be zero is θ̂^(2). Denote the unrestricted two-step estimator by θ̂^(U). A well known test that can be used to test the zero coefficient restriction is the Wald test. As shown in Newey and West (1987), the Wald test statistic is numerically equivalent to

LR = N ( Q^(2)(θ̂^(2)) − Q^(U)(θ̂^(U)) )   (6.13)

when ḡ(θ) is linear in θ. Here Q^(U)(θ) represents the unrestricted criterion function in the second step, which uses the same weighting matrix as Q^(2)(θ). This alternative form is particularly convenient. Because Q^(U)(θ̂^(U)) = 0, it follows that LR = J^(2). This stems from the fact that the unrestricted model is just identified. Hence, the Sargan-Hansen test can also be interpreted as an exclusion restriction test.

5 See Arellano (2003) section A.8 for proofs of the asymptotic null distribution of the regular and incremental Sargan-Hansen tests.
6 It is customary to interpret these as insignificant test outcomes.
7 Hayashi (2000) provides a proof in the solution manual. A slightly more elaborate version is given in Appendix 6.A, also because it forms an introduction to the following appendices.
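In the same linear set-up as before, the statistics (6.8)-(6.12) can be computed as follows; the helper below is a sketch (names are ours), and the comments indicate how the incremental variants are assembled:

import numpy as np

def two_step_and_j(y_list, X_list, Z_list, Wn):
    """Two-step GMM and its Sargan-Hansen statistic for a given robust
    weighting matrix Wn = sum_i Z_i' e_i e_i' Z_i built from first-step
    residuals; the factors N in N gbar' W_N^{-1} gbar cancel out."""
    SZX = sum(Z.T @ X for Z, X in zip(Z_list, X_list))
    SZy = sum(Z.T @ y for Z, y in zip(Z_list, y_list))
    th = np.linalg.solve(SZX.T @ np.linalg.solve(Wn, SZX),
                         SZX.T @ np.linalg.solve(Wn, SZy))
    g = sum(Z.T @ (y - X @ th) for y, X, Z in zip(y_list, X_list, Z_list))
    return th, g @ np.linalg.solve(Wn, g)

# With Z1_list the first L1 columns of each Z_i, and Wn, Wn11 the robust
# weighting matrices from the respective one-step fits:
#   _, J_full = two_step_and_j(ys, Xs, Z_list, Wn)     # J^(2) of (6.8)
#   _, J_sub  = two_step_and_j(ys, Xs, Z1_list, Wn11)  # J_1^(2) of (6.9)
#   IJ = J_full - J_sub                                # (6.10), chi2(L2)
# Passing instead the upper-left block Wn[:L1, :L1] for the subset fit
# gives the Hayashi variant (6.11)-(6.12), which cannot become negative.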

6.2.3 Hausman test

Whereas the subset tests discussed up to this point are all criterion based, there is an alternative in the form of the Hausman test. This test investigates the difference between two vector estimates (the vector of contrasts) in order to test E[ḡ_2(θ_0)] = 0. It has a rather simple form as long as one of the two estimators is asymptotically efficient, see Newey (1985). In order to explore the test statistic let us denote

G(θ) = [C′W_N^{-1}(θ)C]^{-1},  G_1(θ) = [C_1′W_{11,N}^{-1}(θ)C_1]^{-1},   (6.14)

and

V̂ar(θ̂^(2)) = (1/N) G(θ̂^(1)),  V̂ar(θ̂_1^(2)) = (1/N) G_1(θ̂_1^(1)),

as the usual variance estimates of θ̂^(2) and θ̂_1^(2). The Hausman test statistic is then

H^(2) = (θ̂_1^(2) − θ̂^(2))′ [V̂ar(θ̂_1^(2)) − V̂ar(θ̂^(2))]^− (θ̂_1^(2) − θ̂^(2)),   (6.15)

where A^− denotes a generalized inverse of a general q × q matrix A. Similar to ĨJ^(2), a Hausman test can be constructed where both two-step estimators are based on the same weighting matrix. Denote V̂ar(θ̃_1^(2)) = (1/N) G_1(θ̂^(1)) and

H̃^(2) = (θ̃_1^(2) − θ̂^(2))′ [V̂ar(θ̃_1^(2)) − V̂ar(θ̂^(2))]^− (θ̃_1^(2) − θ̂^(2)).   (6.16)
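Computationally the Hausman statistics are straightforward once the estimates and their variance estimates are available; a minimal sketch (interface ours), with numpy's pseudo-inverse serving as the generalized inverse:

import numpy as np

def hausman(theta_sub, theta_full, var_sub, var_full):
    """Contrast test (6.15)/(6.16): theta_sub exploits only the
    maintained moment conditions, theta_full the full set (efficient
    under the null); compare with a chi2(L2) critical value when L2 <= K."""
    d = theta_sub - theta_full
    V = var_sub - var_full        # may be singular, or even indefinite
    return d @ np.linalg.pinv(V) @ d

In finite samples the difference of the two variance estimates need not be positive semi-definite, so the tolerance used in the pseudo-inverse can matter for the numerical value of the statistic.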

Whereas the incremental Sargan-Hansen test is frequently used in a GMM analysis of dynamic panel data to test the validity of a particular subset of moment conditions, the Hausman test is not. Hall (2005) notes that the asymptotic null distribution of the Hausman test is chi-squared with degrees of freedom equal to the rank of plim_{N→∞}(V̂ar(θ̂_1^(2)) − V̂ar(θ̂^(2))), which is seldom known in practice. Baum et al. (2003) state that there is an exception: if L_2 ≤ K, they claim, the Hausman test has L_2 degrees of freedom, although no clear reasoning is provided by them. The claim by Baum et al. can be supported as follows. Newey (1985) proves numerical equivalence between a specific implementation of his so-called GMM test and H̃^(2) if L_2 ≤ K in the linear case. Ahn (1997) shows that the test by Newey is actually equal to the incremental Sargan-Hansen test presented here. Hence, as long as L_2 ≤ K and g(θ) is linear,^8 it follows that H̃^(2) = ĨJ^(2). As the asymptotic null distribution of ĨJ^(2) is known, the asymptotic null distribution of H̃^(2) is also known in this case. Due to the asymptotic equivalence of H^(2) and H̃^(2), it can be concluded that H^(2) is distributed as χ²(L_2) under the null hypothesis if L_2 ≤ K as well.

6.3 Some possible refinements

In addition to these standard subset tests we examine two possible refinements. For the Sargan-Hansen test the procedure of Hayakawa (2014) is generalized to the incremental test. With respect to the Hausman test, a finite sample corrected variance for the vector of contrasts is derived, similar to the correction by Windmeijer (2005).

6.3.1 Diagonal Sargan-Hansen test

In order to deal with the problem of many instruments leading to a high dimensional weighting matrix, Hayakawa (2014) proposes a so-called diagonal Sargan-Hansen test^9 for dynamic panel data models. This test replaces the usual weighting matrix by a block-diagonal version. Simulations show promising size properties compared to the standard Sargan-Hansen test. It must be noted that the rejection probabilities reported by Hayakawa (2014) for the standard test differ substantially from those reported by Bowsher (2002). The former finds rejection frequencies that increase with the number of instruments used, whereas the latter reports diminishing rejection frequencies. The reason for this seems to be the use of a centered weighting matrix. Although in theory this should benefit the power of the test, it seems to deteriorate performance under the null hypothesis. In the simulation study we will shed some light on these findings. Here we will extend the principle of Hayakawa (2014) to the incremental Sargan-Hansen test, using the uncentered weighting matrix of (6.6). The diagonal Sargan-Hansen test statistic for testing all overidentifying restrictions is

D^(2) = N ḡ(θ̂^(2))′ W_{N,diag}^{-1}(θ̂^(1)) ḡ(θ̂^(2)),   (6.17)

8 If g(θ) is nonlinear the test statistics are only asymptotically equivalent.
9 Hayakawa (2014) calls it the diagonal J-test.

where W_{N,diag}(θ̂^(1)) is block diagonal and based on W_N(θ̂^(1)) by setting certain elements to zero. It turns out that

D^(2) →_d Σ_{j=1}^{L−K} λ_j z_j²,   (6.18)

where z_j ~ NID(0,1) and λ = (λ_1,...,λ_{L−K})′ is the vector of non-zero eigenvalues of

Λ = V_0^{1/2} M_0′ W_{diag}^{-1} M_0 V_0^{1/2},   (6.19)

with M_0 = I_L − C_0(C_0′V_0^{-1}C_0)^{-1}C_0′V_0^{-1} and W_{diag} = plim_{N→∞} W_{N,diag}(θ̂^(1)). In practice all elements of Λ are replaced by their sample estimates and the appropriate critical values are found by the procedure of Imhof (1961), as outlined by Hayakawa (2014).

Now we will show how this procedure can be generalized to the incremental Sargan-Hansen tests. The incremental version of this diagonal test is

ID^(2) = D^(2) − D_1^(2),   (6.20)

with

D_1^(2) = N ḡ_1(θ̂_1^(2))′ W_{11,N,diag}^{-1}(θ̂^(1)) ḡ_1(θ̂_1^(2)),   (6.21)

where W_{11,N,diag}(θ) is the L_1 × L_1 upper left block of W_{N,diag}(θ). The following theorem (proof in Appendix 6.B) gives the asymptotic distribution of ID^(2).

Theorem 6.1 Under the null hypothesis that E[ḡ(θ_0)] = 0 we find for ID^(2) of (6.20)

ID^(2) →_d Σ_{j=1}^{p} λ*_j z_j²,   (6.22)

with λ* = (λ*_1,...,λ*_p)′ the vector of non-zero eigenvalues of

Λ* = Q_0 − Q_{01},   (6.23)

where

Q_0 = V_0^{1/2} M_0′ W_{diag}^{-1} M_0 V_0^{1/2},  Q_{01} = V_0^{1/2} A′ M_{01}′ W_{11,diag}^{-1} M_{01} A V_0^{1/2},   (6.24)

and A is the L_1 × L matrix (I  O), M_{01} = I − C_{01}(C_{01}′V_{01}^{-1}C_{01})^{-1}C_{01}′V_{01}^{-1} and V_{01} = A V_0 A′.

Replacing each element of Λ* by its sample counterpart yields a feasible implementation of this principle. Critical values are found in the same manner as those for the full-set diagonal test.
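Given a sample estimate of Λ or Λ*, critical values or p-values for the weighted chi-squared limits in (6.18) and (6.22) can be obtained by the Imhof (1961) routine or, as sketched below, simply by simulation (the simulation shortcut and the names are ours, not part of the thesis' procedure):

import numpy as np

def weighted_chi2_pvalue(stat, lam, reps=200_000, seed=0):
    """p-value of sum_j lam_j z_j^2 with z_j ~ NID(0,1), the null
    distribution of the (incremental) diagonal Sargan-Hansen test;
    lam holds the non-zero eigenvalues of the estimated Lambda."""
    rng = np.random.default_rng(seed)
    draws = (rng.standard_normal((reps, len(lam))) ** 2) @ np.asarray(lam)
    return float(np.mean(draws >= stat))

# e.g. lam = np.linalg.eigvals(Lambda_hat).real, after discarding the
# eigenvalues that are numerically zero.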

6.3.2 Finite sample corrected variance for the Hausman test

Two-step GMM estimators are based on one-step estimates. The usual variance estimator is known to underestimate the variance in finite samples. This is the reason why Windmeijer (2005) developed a finite sample correction for the variance. Due to the explicit occurrence of variance estimates in (6.15), one might replace these by their corrected versions. However, as will be shown, an even more refined approach is possible here. Instead of applying the correction to both variance estimates in (6.15) separately, we will derive a finite sample corrected variance for the vector of contrasts.

Define the vector function f(θ) = −[C′W_N^{-1}(θ)C]^{-1} C′W_N^{-1}(θ) ḡ(θ_0). Note that f(θ̂^(1)) = θ̂^(2) − θ_0. Expanding this expression around θ_0 yields

f(θ̂^(1)) = f(θ_0) + F(θ_0)(θ̂^(1) − θ_0) + O_p(N^{-1}),   (6.25)

where F(θ_0) = ∂f(θ)/∂θ′ |_{θ_0}. The j-th column of F(θ) is given by

F_j(θ) = −[C′W_N^{-1}(θ)C]^{-1} C′W_N^{-1}(θ) (∂W_N(θ)/∂θ_j) W_N^{-1}(θ) C [C′W_N^{-1}(θ)C]^{-1} C′W_N^{-1}(θ) ḡ(θ_0)
  + [C′W_N^{-1}(θ)C]^{-1} C′W_N^{-1}(θ) (∂W_N(θ)/∂θ_j) W_N^{-1}(θ) ḡ(θ_0).   (6.26)

By noting that

θ̂^(1) − θ_0 = −(C′W_N^{-1}C)^{-1} C′W_N^{-1} ḡ(θ_0),   (6.27)

a finite sample corrected variance of θ̂^(2) is given by

V̂ar_c(θ̂^(2)) = V̂ar(θ̂^(2)) + (1/N) F̂(θ̂^(1)) G(θ̂^(1)) + (1/N) G(θ̂^(1)) F̂(θ̂^(1))′ + F̂(θ̂^(1)) V̂ar_r(θ̂^(1)) F̂(θ̂^(1))′,   (6.28)

where the j-th column of F̂(θ) is characterized by

F̂_j(θ) = [C′W_N^{-1}(θ)C]^{-1} C′W_N^{-1}(θ) (∂W_N(θ)/∂θ_j) W_N^{-1}(θ) ḡ(θ̂^(2)),   (6.29)

and where V̂ar_r(θ̂^(1)) is the robust one-step variance estimate of θ̂^(1). In order to make F(θ̂^(1)) feasible, ḡ(θ_0) is replaced by ḡ(θ̂^(2)) in F(θ̂^(1)). As C′W_N^{-1}(θ̂^(1))ḡ(θ̂^(2)) = 0, the first term of (6.26) then evaluates to zero, yielding F̂(θ̂^(1)).

With respect to θ̂_1^(2), let us denote the expressions above as f_1(θ), F_1(θ) and F̂_1(θ) (which uses ḡ_1(θ̂_1^(2)) for ḡ(θ_0)).
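In the linear case the pieces of (6.26)-(6.29) are explicit, since for g_i = Z_i′(y_i − X_iθ) one has ∂W_N(θ)/∂θ_j = −N^{-1} Σ_i [Z_i′x_i^{(j)} e_i′Z_i + Z_i′e_i x_i^{(j)′}Z_i], with x_i^{(j)} the j-th column of X_i. The following sketch (our own translation, in the spirit of Windmeijer (2005); not the thesis' code) assembles the corrected variance (6.28):

import numpy as np

def windmeijer_corrected_var(y_list, X_list, Z_list, th1, th2, var1_robust):
    """Corrected two-step variance (6.28)-(6.29) for linear moments
    g_i = Z_i'(y_i - X_i theta); var1_robust is the robust one-step
    variance estimate of theta^(1)."""
    N, K = len(y_list), th1.size
    C = -sum(Z.T @ X for Z, X in zip(Z_list, X_list)) / N   # dgbar/dtheta'
    e1 = [y - X @ th1 for y, X in zip(y_list, X_list)]
    e2 = [y - X @ th2 for y, X in zip(y_list, X_list)]
    Wn = sum(np.outer(Z.T @ e, Z.T @ e) for Z, e in zip(Z_list, e1)) / N
    Wi = np.linalg.inv(Wn)
    G = np.linalg.inv(C.T @ Wi @ C)                         # G(theta^(1))
    gbar2 = sum(Z.T @ e for Z, e in zip(Z_list, e2)) / N
    F = np.empty((K, K))
    for j in range(K):
        # dW_N/dtheta_j at theta^(1), from e_i(theta) = y_i - X_i theta
        dW = -sum(np.outer(Z.T @ X[:, j], Z.T @ e) +
                  np.outer(Z.T @ e, Z.T @ X[:, j])
                  for X, Z, e in zip(X_list, Z_list, e1)) / N
        F[:, j] = G @ C.T @ Wi @ dW @ Wi @ gbar2            # (6.29)
    var2 = G / N                                            # usual Var(theta^(2))
    return var2 + (F @ G + G @ F.T) / N + F @ var1_robust @ F.T

The same ingredients, evaluated also for the subset estimator, deliver the cross terms of the contrast variance derived next.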

Note that the vector of contrasts can now be written as

θ̂_1^(2) − θ̂^(2) = f_1(θ̂_1^(1)) − f(θ̂^(1))
  = f_1(θ_0) − f(θ_0) + F_1(θ_0)(θ̂_1^(1) − θ_0) − F(θ_0)(θ̂^(1) − θ_0) + O_p(N^{-1}).   (6.30)

Using the same reasoning as above, a finite sample corrected estimate of its variance can be obtained. Define G = (C′W_N^{-1}C)^{-1} and G_1 = (C_1′W_{11,N}^{-1}C_1)^{-1}. It follows that (see Appendix 6.C)

V̂ar_c(θ̂_1^(2) − θ̂^(2)) = V̂ar_c(θ̂_1^(2)) + V̂ar_c(θ̂^(2)) − 2 V̂ar(θ̂^(2))
  − (1/N) G_1(θ̂_1^(1)) C_1′ W_{11,N}^{-1}(θ̂_1^(1)) A W_N(θ̂^(1)) W_N^{-1} C G F̂(θ̂^(1))′
  − (1/N) F̂(θ̂^(1)) G C′ W_N^{-1} W_N(θ̂^(1)) A′ W_{11,N}^{-1}(θ̂_1^(1)) C_1 G_1(θ̂_1^(1))
  − (1/N) F̂_1(θ̂_1^(1)) G(θ̂^(1)) − (1/N) G(θ̂^(1)) F̂_1(θ̂_1^(1))′
  − (1/N) F̂_1(θ̂_1^(1)) G_1 C_1′ W_{11,N}^{-1} A W_N(θ̂^(1)) W_N^{-1} C G F̂(θ̂^(1))′
  − (1/N) F̂(θ̂^(1)) G C′ W_N^{-1} W_N(θ̂^(1)) A′ W_{11,N}^{-1} C_1 G_1 F̂_1(θ̂_1^(1))′.   (6.31)

Whereas the corrected variance by Windmeijer (2005) consists of four terms, the variance estimate in (6.31) involves sixteen terms. Although it may seem cumbersome, calculating the proposed variance estimate does not require much effort, as most components are already required for the individual corrected variances, which are readily available in most popular software packages. A Hausman type test that uses this variance estimate is given by

H_c^(2) = (θ̂_1^(2) − θ̂^(2))′ [V̂ar_c(θ̂_1^(2) − θ̂^(2))]^− (θ̂_1^(2) − θ̂^(2)),   (6.32)

which follows the same asymptotic distribution under the null hypothesis as H^(2). This correction principle can easily be modified to accommodate different vectors of contrasts, such as the one examined by H̃^(2). Correcting this test statistic involves some simple modifications, detailed in Appendix 6.D. The corrected version of H̃^(2) is

H̃_c^(2) = (θ̃_1^(2) − θ̂^(2))′ [V̂ar_c(θ̃_1^(2) − θ̂^(2))]^− (θ̃_1^(2) − θ̂^(2)).   (6.33)

Self-evidently, like H_c^(2), this test statistic also has the same asymptotic null distribution as H^(2). Although it is very similar to H_c^(2), it involves different correction terms and even a different variance correction for θ̃_1^(2). However, due to the fact that both estimators are based on the same weighting matrix, this test statistic is slightly less complicated. By simulation we will examine which implementation provides the best finite sample performance.

6.4 Testing exogeneity in dynamic panel data models

In dynamic panel data models there is often an abundance of moment conditions available. However, misclassification of one or more explanatory variables only renders a subset of moment conditions invalid if the structural equation is correctly specified. Although the Sargan-Hansen test should be able to detect these violations, it has diminishing power as the number of moment conditions grows, as noted by Bowsher (2002). If one only suspects misclassification and no other form of misspecification, the discussed subset tests are expected to have better power properties, as fewer restrictions are tested. Although some of these tests are frequently used for various purposes,^10 they are less frequently used for classifying the weakly exogenous or endogenous nature of explanatory variables in the context of linear dynamic panel data models. In this section the tests will be made operational for some frequently arising hypotheses.

6.4.1 Model and assumptions

Consider the balanced linear first-order autoregressive dynamic panel data model

y_it = γ y_{i,t−1} + x_it′β + η_i + ε_it  (i = 1,...,N; t = 1,...,T),   (6.34)

where x_it is a (K−1) × 1 vector containing observations on K−1 regressors, η_i is a random individual specific effect and ε_it is the idiosyncratic disturbance term. The individual effects and the explanatory variables are allowed to be correlated, as they most often are supposed to be in practice. Regarding the two error components we make the following assumptions:

η_i ~ iid(0, σ_η²),
E[ε_it] = 0, E[ε_it²] = σ_it²  ∀ i, t,
E[ε_it ε_js] = 0 for i ≠ j or t ≠ s (t, s = 1,...,T),
E[η_i ε_jt] = 0, E[y_i0 ε_jt] = 0  ∀ i, j, t.   (6.35)

10 The incremental Sargan-Hansen test is often used for testing effect stationarity in dynamic panel data models and is always reported by the popular Stata package xtabond2. Note that this package reports ĨJ^(2) instead of IJ^(2). The Hausman test is popular in the context of IV in order to establish endogeneity of a regressor (see Chapter 2), and also for choosing between random and fixed effects panel data models.

Note that we allow for cross-sectional and time-series heteroskedasticity, but not for serial correlation, because we assume that all time-dependence in y_it is adequately modelled by the dynamics explicit in γy_{i,t−1} and implicit in x_it′β.

When discussing the classification of explanatory variables we do so with respect to ε_it in (6.34). We distinguish between strictly exogenous, weakly exogenous and endogenous explanatory variables. Suppose that the set of regressors x_it can be decomposed as

x_it = (m_it′, w_it′, v_it′)′,   (6.36)

where m_it is the K_m × 1 vector of strictly exogenous regressors, w_it is the K_w × 1 vector of weakly exogenous regressors and v_it is the K_v × 1 vector of endogenous regressors. The first two sets of explanatory variables may include lagged components. If a variable is for instance endogenous and included in v_it, its lagged value is weakly exogenous and belongs to w_it. From the above it follows that K_m + K_w + K_v = K − 1, and we require that K − 1 ≥ 1 as we are interested in tests regarding x_it or subsets of it. Given the classification of the various components of x_it, the following orthogonality conditions hold:

E[m_it ε_is] = 0,  E[w_it ε_{i,t+l}] = 0,  E[v_it ε_{i,t+1+l}] = 0  ∀ i, t, s and l ≥ 0.   (6.37)

Note that with respect to y_{i,t−1} we have that E[y_{i,t−1} ε_{i,t+l}] = 0 ∀ i, t, l ≥ 0, from which it follows that y_{i,t−1} is a weakly exogenous regressor. Although y_{i,t−1} can be added to w_it, we will refrain from that in order to be able to focus on classifying elements of x_it.

Throughout the chapter, availability of observations y_{i,1−l} and x_{i,1−l} is assumed, where l is the maximum number of lags in the model. In case of dynamic models, l ≥ 1 by definition. The above means that all variables are available for the same time periods and at most T + l observations are available. Estimation is always based on T time-series observations. For the sake of simplicity we will just consider l = 1. That this is no restriction follows from the fact that the number of time-series observations T can be replaced by T − l + 1 for a general l, and all dimensions change accordingly.

Furthermore we require that x_it includes the current observations of all lagged explanatory variables. Again, this limitation allows for a much cleaner notation. Extension to a more general model, in which one allows for lags without including current observations, is straightforward.

By stacking the T time-series observations, (6.34) can be written as

y_i = γ y_{i,−1} + X_i β + η_i ι_T + ε_i,   (6.38)

where y_i = (y_i1,...,y_iT)′, y_{i,−1} = (y_i0,...,y_{i,T−1})′, X_i = (x_i1,...,x_iT)′, ε_i = (ε_i1,...,ε_iT)′ and ι_T is a T × 1 vector of ones. With respect to the decomposition of the regressors we now have X_i = (M_i, W_i, V_i), where M_i = (m_i1,...,m_iT)′, W_i = (w_i1,...,w_iT)′ and V_i = (v_i1,...,v_iT)′.

We explore two popular methods to deal with the occurrence of individual effects in (6.38). As is done in Anderson and Hsiao (1981) and Arellano and Bond (1991), one can take first differences. An alternative transformation, suggested by Arellano and Bover (1995), that does not introduce serial correlation in the error term under time-series homoskedasticity is the forward orthogonal deviations (FOD) transformation. By taking either transformation one time-series observation is lost for every individual, and we denote

y̌_i = γ y̌_{i,−1} + X̌_i β + ε̌_i   (6.39)

for a transformed model. The transformed regressors have a different classification with respect to their respective transformed error terms. Whereas y_{i,t−1} is weakly exogenous^11 in the levels equation with respect to ε_it, its transformed version is endogenous in (6.39) with respect to ε̌_it. This means that least squares estimators are inconsistent, which calls for instrumental variables or GMM techniques. As clarified by Anderson and Hsiao (1981), instrumental variables can be found internally.

11 The transformed version of w_it may be endogenous in (6.39) if w_it depends on ε_{i,t−1}.
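Both transformations amount to premultiplying the T stacked observations of each unit by a fixed (T−1) × T matrix; the following sketch gives the standard constructions for concreteness (coded by us, with our own function names):

import numpy as np

def fd_matrix(T):
    """(T-1) x T first-difference transformation matrix D."""
    D = np.zeros((T - 1, T))
    for t in range(T - 1):
        D[t, t], D[t, t + 1] = -1.0, 1.0
    return D

def fod_matrix(T):
    """(T-1) x T forward orthogonal deviations matrix A (Arellano and
    Bover 1995): A @ A.T is the identity, so FOD keeps homoskedastic
    serially uncorrelated errors spherical."""
    A = np.zeros((T - 1, T))
    for t in range(T - 1):
        c = np.sqrt((T - t - 1) / (T - t))
        A[t, t] = c
        A[t, t + 1:] = -c / (T - t - 1)
    return A

# fd_matrix(T) @ fd_matrix(T).T reproduces the tridiagonal matrix H that
# appears in (6.45) below, whereas fod_matrix(T) @ fod_matrix(T).T is
# the identity matrix.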

6.4.2 Full comprehensive internal instrument matrices

Let us define

y_i^t = (y_i0,...,y_it)′,  m_i^{*t} = (m*_i0′,...,m*_it′)′,  w_i^{*t} = (w*_i0′,...,w*_it′)′,  v_i^{*t} = (v*_i0′,...,v*_it′)′,   (6.40)

where m*_it, w*_it and v*_it have K*_m, K*_w and K*_v elements and consist only of the current observations, whilst all lags that occur as explanatory variables are left out. We will consider three types of instrument matrix blocks: one associated with strictly exogenous variables, one with weakly exogenous variables and one with endogenous variables, all with respect to ε_it in (6.34). Define the block-diagonal matrices (with zero matrices O off the diagonal of blocks)

Z^{(m,1)}_{lev,i} = blockdiag(m_i^{*T}′, m_i^{*T}′, ..., m_i^{*T}′)  [T−1 identical blocks],
Z^{(w,2)}_{lev,i} = blockdiag(w_i^{*1}′, w_i^{*2}′, ..., w_i^{*,T−1}′),
Z^{(y,3)}_{lev,i} = blockdiag(y_i^{0}′, y_i^{1}′, ..., y_i^{T−2}′),
Z^{(v,3)}_{lev,i} = blockdiag(v_i^{*0}′, v_i^{*1}′, ..., v_i^{*,T−2}′).   (6.41)

Note that these matrices have the following numbers of columns:

Z^{(m,1)}_{lev,i}: (T+1)(T−1)K*_m,  Z^{(w,2)}_{lev,i}: (T+2)(T−1)K*_w/2,
Z^{(y,3)}_{lev,i}: T(T−1)/2,  Z^{(v,3)}_{lev,i}: T(T−1)K*_v/2.

Exploiting all valid linear moment conditions resulting from y_{i,t−1} and x_it yields the instrument blocks

Z_i = ( Z^{(y,3)}_{lev,i}  Z^{(m,1)}_{lev,i}  Z^{(w,2)}_{lev,i}  Z^{(v,3)}_{lev,i} )  for i = 1,...,N.   (6.42)

The resulting instrument matrix houses L = (T+1)(T−1)K*_m + ½(T+2)(T−1)K*_w + ½T(T−1)(K*_v + 1) instruments, and from (6.35) and (6.37) it follows that E[Z_i′ε̌_i] = 0. It is important to realize that the lagged explanatory variables do not yield additional instruments, as they are already contained in the histories built from x*_it = (m*_it′, w*_it′, v*_it′)′.
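The block structure of (6.41) is easy to generate mechanically; the sketch below does so for a single scalar variable per classification (helper names are ours, the thesis itself contains no such code):

import numpy as np

def block_diag(blocks):
    """Place the given row blocks on a block diagonal, zeros elsewhere."""
    r = sum(b.shape[0] for b in blocks)
    c = sum(b.shape[1] for b in blocks)
    out = np.zeros((r, c))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i += b.shape[0]
        j += b.shape[1]
    return out

def level_instrument_block(x, kind):
    """Instrument block of (6.41) for one unit and one scalar variable,
    x = (x_{i0},...,x_{iT}). 'strict': the full history in each of the
    T-1 transformed equations; 'weak': x_i^t in equation t+1; 'endog':
    x_i^{t-1} (this last case also covers y_{i,t-1})."""
    T = x.size - 1
    if kind == 'strict':
        rows = [x] * (T - 1)
    elif kind == 'weak':
        rows = [x[:t + 2] for t in range(T - 1)]   # lengths 2,...,T
    else:
        rows = [x[:t + 1] for t in range(T - 1)]   # lengths 1,...,T-1
    return block_diag([r.reshape(1, -1) for r in rows])

The column counts of the three cases reproduce (T+1)(T−1), (T+2)(T−1)/2 and T(T−1)/2 respectively, matching the dimensions listed above for K*_m = K*_w = K*_v = 1.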

6.4.3 Estimators

We will focus on testing after Arellano-Bond (AB) estimation. This estimator is a specific implementation of GMM, which means that all theory discussed up to now can be easily applied. For notational convenience let us rewrite (6.39) as

y̌_i = Ř_i θ + ε̌_i,   (6.43)

where Ř_i = (y̌_{i,−1}, X̌_i). A one-step AB estimator can be calculated from

θ̂^(1) = [ (Σ_{i=1}^N Ř_i′Z_i)(Σ_{i=1}^N Z_i′HZ_i)^{-1}(Σ_{i=1}^N Z_i′Ř_i) ]^{-1} (Σ_{i=1}^N Ř_i′Z_i)(Σ_{i=1}^N Z_i′HZ_i)^{-1}(Σ_{i=1}^N Z_i′y̌_i),   (6.44)

where H is the identity matrix in case of FOD and

H =
[  2  −1   0  ⋯   0
  −1   2  −1  ⋱   ⋮
   0  −1   2  ⋱   0
   ⋮   ⋱   ⋱  ⋱  −1
   0   ⋯   0 −1   2 ]   (6.45)

when taking first differences. A heteroskedasticity robust variance estimator can be constructed by exploiting the consistency of the one-step residuals ε̌̂_i^(1) = y̌_i − Ř_i θ̂^(1). Define Γ̂ such that θ̂^(1) = Γ̂ Σ_{i=1}^N Z_i′y̌_i; then

V̂ar_r(θ̂^(1)) = Γ̂ ( Σ_{i=1}^N Z_i′ ε̌̂_i^(1) ε̌̂_i^(1)′ Z_i ) Γ̂′.   (6.46)

The two-step AB estimator is

θ̂^(2) = [ (Σ_{i=1}^N Ř_i′Z_i)(Σ_{i=1}^N Z_i′ε̌̂_i^(1)ε̌̂_i^(1)′Z_i)^{-1}(Σ_{i=1}^N Z_i′Ř_i) ]^{-1} (Σ_{i=1}^N Ř_i′Z_i)(Σ_{i=1}^N Z_i′ε̌̂_i^(1)ε̌̂_i^(1)′Z_i)^{-1}(Σ_{i=1}^N Z_i′y̌_i),   (6.47)

of which the variance can be estimated by

V̂ar(θ̂^(2)) = [ (Σ_{i=1}^N Ř_i′Z_i)(Σ_{i=1}^N Z_i′ε̌̂_i^(1)ε̌̂_i^(1)′Z_i)^{-1}(Σ_{i=1}^N Z_i′Ř_i) ]^{-1}.   (6.48)
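A direct transcription of (6.44)-(6.48), under the same unit-level data layout as in the earlier sketches (again a sketch, not the thesis' own implementation):

import numpy as np

def ab_one_step(dy_list, dR_list, Z_list, H):
    """One-step AB estimator (6.44); H is the identity under FOD and
    the tridiagonal matrix of (6.45) under first differencing."""
    SZR = sum(Z.T @ R for Z, R in zip(Z_list, dR_list))
    SZy = sum(Z.T @ y for Z, y in zip(Z_list, dy_list))
    SZHZ = sum(Z.T @ H @ Z for Z in Z_list)
    rhs = SZR.T @ np.linalg.solve(SZHZ, SZy)
    return np.linalg.solve(SZR.T @ np.linalg.solve(SZHZ, SZR), rhs)

def ab_two_step(dy_list, dR_list, Z_list, th1):
    """Two-step AB estimator (6.47) and variance estimate (6.48), using
    the one-step residuals in the robust weighting matrix."""
    e1 = [y - R @ th1 for y, R in zip(dy_list, dR_list)]
    S = sum(np.outer(Z.T @ e, Z.T @ e) for Z, e in zip(Z_list, e1))
    SZR = sum(Z.T @ R for Z, R in zip(Z_list, dR_list))
    SZy = sum(Z.T @ y for Z, y in zip(Z_list, dy_list))
    M = SZR.T @ np.linalg.solve(S, SZR)
    th2 = np.linalg.solve(M, SZR.T @ np.linalg.solve(S, SZy))
    return th2, np.linalg.inv(M)

The robust one-step variance (6.46) follows analogously by sandwiching S between Γ̂ and Γ̂′.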

Alternatively, the number of instruments may be reduced by, for instance, collapsing the instrument matrix. Define for instance

Z^{(m,1)}_{c,i} = \begin{pmatrix} m^{*\prime}_{i2} & m^{*\prime}_{i1} & m^{*\prime}_{i0} & 0 & \cdots & 0 \\ m^{*\prime}_{i3} & m^{*\prime}_{i2} & m^{*\prime}_{i1} & m^{*\prime}_{i0} & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ m^{*\prime}_{iT} & m^{*\prime}_{i,T-1} & m^{*\prime}_{i,T-2} & m^{*\prime}_{i,T-3} & \cdots & m^{*\prime}_{i0} \end{pmatrix}.   (6.49)

Furthermore $Z^{(w,2)}_{c,i}$, $Z^{(y,3)}_{c,i}$ and $Z^{(v,3)}_{c,i}$ are defined in similar fashion. Analogous to the extended instrument matrix blocks we construct the collapsed version by

Z_{c,i} = \big( Z^{(y,3)}_{c,i} \;\; Z^{(m,1)}_{c,i} \;\; Z^{(w,2)}_{c,i} \;\; Z^{(v,3)}_{c,i} \big) \quad \text{for } i = 1, \ldots, N,   (6.50)

which includes $L = (T+1)K_m + TK_w + (T-1)(K_v+1)$ instruments. Although one can think of different methods to reduce the number of instruments (for example selecting only the two most recent lags as instruments) and of other estimators (like the Blundell-Bond estimator), we limit our analysis to the estimator and instrument matrices discussed up to now.
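The collapsed blocks are easily generated as well; the sketch below (hypothetical helpers, scalar variables) reproduces the column counts in (6.50).

```python
import numpy as np

def collapse_strict(m, T):
    """Collapsed block (6.49) for a strictly exogenous variable: (T-1) x (T+1)."""
    Z = np.zeros((T - 1, T + 1))
    for t in range(2, T + 1):
        Z[t - 2, :t + 1] = m[t::-1]              # row holds m_t, m_{t-1}, ..., m_0
    return Z

def collapse_lagged(x, T, lag):
    """Collapsed block starting `lag` periods back: lag=1 for weakly exogenous
    variables ((T-1) x T), lag=2 for y and endogenous variables ((T-1) x (T-1))."""
    Z = np.zeros((T - 1, T + 1 - lag))
    for t in range(2, T + 1):
        Z[t - 2, :t - lag + 1] = x[t - lag::-1]  # most recent admissible lag first
    return Z

rng = np.random.default_rng(2)
T = 5
m, w, y, v = (rng.standard_normal(T + 1) for _ in range(4))
Zc = np.hstack([collapse_lagged(y, T, 2), collapse_strict(m, T),
                collapse_lagged(w, T, 1), collapse_lagged(v, T, 2)])
print(Zc.shape)  # (4, 19): L = (T+1) + T + (T-1)*2 = 19 for T = 5, K_m = K_w = K_v = 1
```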

6.4.3 Establishing endogeneity

Although GMM allows for endogeneity of explanatory variables, simulation results by Kiviet et al. (2014) illustrate that needlessly assuming endogeneity increases the bias and standard deviation of AB estimators substantially and deteriorates coefficient test performance. Suppose one suspects that $w^*_{it}$ is endogenous instead of weakly exogenous. Rather than simply assuming endogeneity, one can test whether it is actually present. The hypotheses(12) are denoted only in terms of $w^*_{it}$, as the assumptions regarding the other explanatory variables are presumed to hold:

H_0: E[w^*_{it}\varepsilon_{it}] = 0 \qquad H_1: E[w^*_{it}\varepsilon_{it}] \neq 0 \qquad \forall i, t,   (6.51)

under the maintained hypothesis that $E[w^{t-1}_i \varepsilon_{it}] = 0$ for all $i, t$, while with respect to the other moment conditions the assumptions of the previous subsections apply. Even though $w^*_{it}$ is weakly exogenous under the null hypothesis, the null allows for either weak or strict exogeneity of $w^*_{it}$, meaning that this is not a test of weak exogeneity, but rather of endogeneity. Although $w^*_{it}$ is treated as weakly exogenous in what follows, we could also test endogeneity against strict exogeneity.

All subset tests examined here require estimation of the model under both hypotheses. The two-step estimator using the smaller set of moment conditions is $\hat\theta^{(2)}_1$ and uses

Z^1_i = \big( Z^{(y,3)}_i \;\; Z^{(m,1)}_i \;\; Z^{(w,3)}_i \;\; Z^{(v,3)}_i \big)   (6.52)

instead of $Z_i$ in (6.42) as instrument blocks. When using all available instruments, misclassification would render $(T-1)K_w$ moment conditions invalid.

Although the instruments are constructed from the orthogonality conditions in (6.37), these are not the moment conditions used for estimation. Due to the transformation of the model, the estimator under the null hypothesis exploits $E[Z_i'\check\varepsilon_i] = 0$, whereas that under the alternative uses $E[Z^{1\prime}_i\check\varepsilon_i] = 0$. This also has consequences for the subset tests. Take for instance the FD transformation. If $w^*_{it}$ is actually endogenous then $E[w^*_{it}\varepsilon_{it}] \neq 0$. However, the subset of moment conditions that is unique to $E[Z_i'\check\varepsilon_i] = 0$ is actually

\begin{pmatrix} E[w^*_{i1}(\varepsilon_{i2}-\varepsilon_{i1})] \\ \vdots \\ E[w^*_{i,T-1}(\varepsilon_{iT}-\varepsilon_{i,T-1})] \end{pmatrix} = \begin{pmatrix} E[w^*_{i1}\varepsilon_{i2}] \\ \vdots \\ E[w^*_{i,T-1}\varepsilon_{iT}] \end{pmatrix} - \begin{pmatrix} E[w^*_{i1}\varepsilon_{i1}] \\ \vdots \\ E[w^*_{i,T-1}\varepsilon_{i,T-1}] \end{pmatrix} = 0.   (6.53)

This vector consists of two parts due to the first differenced idiosyncratic errors. The first part is assumed to be zero under the maintained hypothesis and the second part represents the moment conditions of interest. The fact that we cannot access the second part directly results in a loss of power, as the first part contributes to the variance but does not yield additional information. The occurrence of the individual effect makes this inevitable. When the tests are based on collapsed instruments the null hypothesis specific vector of moment conditions becomes

E\Big[\sum_{t=2}^T w^*_{i,t-1}(\varepsilon_{it}-\varepsilon_{i,t-1})\Big] = E\Big[\sum_{t=2}^T w^*_{i,t-1}\varepsilon_{it}\Big] - E\Big[\sum_{t=2}^T w^*_{i,t-1}\varepsilon_{i,t-1}\Big] = 0,   (6.54)

which naturally also consists of two parts. The moment conditions in (6.54) are linear combinations of those in (6.53), which means that this test might not have power against some alternatives if $E[w^*_{it}\varepsilon_{it}]$ is not constant over time.

(12) Note that we can easily generalize all test statistics to a subset of $w^*_{it}$.
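A sketch of the resulting incremental Sargan-Hansen procedure follows (hypothetical helpers in the spirit of the AB routine above): compute the overidentification statistic once with the full instrument set $Z_i$ and once with the reduced set $Z^1_i$, and refer the difference to a chi-squared distribution with $L - L_1$ degrees of freedom. The variant $\bar J^{(2)}$ would instead evaluate both statistics with weighting matrices based on the same first-step estimator, which guarantees a non-negative outcome (see Appendix 6.A).

```python
import numpy as np

def two_step_J(dy, dR, Z):
    """Two-step AB estimation followed by the Sargan-Hansen statistic; returns
    (J, L).  dy, dR, Z are per-individual lists of transformed data (FD case)."""
    N, T1 = len(dy), dy[0].size
    H = 2*np.eye(T1) - np.eye(T1, k=1) - np.eye(T1, k=-1)
    RZ = sum(r.T @ z for r, z in zip(dR, Z))
    Zy = sum(z.T @ y for z, y in zip(Z, dy))

    def gmm(W):
        A = RZ @ np.linalg.pinv(W)
        return np.linalg.solve(A @ RZ.T, A @ Zy)

    th1 = gmm(sum(z.T @ H @ z for z in Z))                       # one-step estimate
    W2 = sum(z.T @ np.outer(y - r @ th1, y - r @ th1) @ z
             for z, y, r in zip(Z, dy, dR))                      # robust weighting
    th2 = gmm(W2)                                                # two-step estimate
    gbar = sum(z.T @ (y - r @ th2) for z, y, r in zip(Z, dy, dR)) / N
    return N * gbar @ np.linalg.pinv(W2 / N) @ gbar, Z[0].shape[1]

def incremental_J(dy, dR, Z_full, Z_reduced):
    """J^(2) minus J1^(2) and its chi-squared degrees of freedom L - L1."""
    J, L = two_step_J(dy, dR, Z_full)
    J1, L1 = two_step_J(dy, dR, Z_reduced)
    return J - J1, L - L1
```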

All test statistics discussed can now be calculated, although several require additional discussion. For instance, in Section 6.2 it was noted that when the number of moment conditions tested exceeds the number of explanatory variables, the asymptotic distribution of the Hausman tests is unknown. It is not hard to imagine that $(T-1)K_w$ might exceed $K$ for large values of $T$. Using collapsed instruments ensures that the number of moment conditions tested equals $K_w$, which by definition is always smaller than $K$.

With respect to the diagonal Sargan-Hansen test of Section 6.3 one has to choose a block-diagonal structure for the weighting matrix. As Hayakawa (2014) notes, a natural choice is available when using the FOD transformation and $Z_i$ under time-series homoskedasticity. The block-diagonal structure of $Z_i'Z_i$ can then be used, because $E(\check\varepsilon_i\check\varepsilon_i') = \sigma^2_i I$ and $\mathrm{plim}_{N\to\infty} \frac{1}{N}\sum_{i=1}^N Z_i'\hat{\check\varepsilon}^{(1)}_i\hat{\check\varepsilon}^{(1)\prime}_i Z_i$ is block-diagonal. Under time-series heteroskedasticity this structure is no longer optimal, although the asymptotic distribution of the diagonal test statistics is still valid. We will only examine diagonal Sargan-Hansen tests after taking FOD and using this particular block-diagonal structure following from using all instruments $Z_i$.
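Operationally, the diagonal test only requires replacing the full weighting matrix by its block-diagonal part; a sketch of that step is given below (a hypothetical helper, with one block per instrument sub-matrix in (6.42)).

```python
import numpy as np

def block_diagonal_part(W, block_sizes):
    """Keep the diagonal blocks of W (sizes given in block_sizes, summing to
    W.shape[0]) and zero out all off-diagonal blocks."""
    D = np.zeros_like(W)
    c = 0
    for s in block_sizes:
        D[c:c + s, c:c + s] = W[c:c + s, c:c + s]
        c += s
    return D

W = np.arange(36.0).reshape(6, 6)      # an arbitrary 6 x 6 matrix
print(block_diagonal_part(W, [2, 4]))  # keeps the 2x2 and 4x4 diagonal blocks
```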

6.4.4 Establishing weak exogeneity

Along the same lines as testing the potentially endogenous variables $w^*_{it}$ against weak exogeneity, one may test whether $m^*_{it}$ is actually strictly exogenous. The null and alternative hypotheses are formulated as

H_0: E[(m^{*\prime}_{i,t+1}, \ldots, m^{*\prime}_{iT})'\varepsilon_{it}] = 0 \qquad H_1: E[(m^{*\prime}_{i,t+1}, \ldots, m^{*\prime}_{iT})'\varepsilon_{it}] \neq 0 \qquad \forall i, t,   (6.55)

with as the maintained hypothesis $E[m^t_i\varepsilon_{it}] = 0$ for all $i, t$. Violation of the null hypothesis potentially renders $\tfrac12 T(T-1)K_m$ moment conditions invalid when using all available instruments. Under the alternative hypothesis the instrument blocks

Z^1_i = \big( Z^{(y,3)}_i \;\; Z^{(m,2)}_i \;\; Z^{(w,2)}_i \;\; Z^{(v,3)}_i \big)   (6.56)

are used. When using all instruments the subset of moment conditions actually under investigation is

E[(m^{*\prime}_{it}, m^{*\prime}_{i,t+1}, \ldots, m^{*\prime}_{iT})'(\varepsilon_{it}-\varepsilon_{i,t-1})] = 0, \qquad t = 2, \ldots, T,   (6.57)

which in turn can be written as

\begin{pmatrix} E[m^*_{it}\varepsilon_{it}] - E[m^*_{it}\varepsilon_{i,t-1}] \\ E[m^*_{i,t+1}\varepsilon_{it}] - E[m^*_{i,t+1}\varepsilon_{i,t-1}] \\ \vdots \\ E[m^*_{iT}\varepsilon_{it}] - E[m^*_{iT}\varepsilon_{i,t-1}] \end{pmatrix} = 0, \qquad t = 2, \ldots, T.   (6.58)

The interesting thing about (6.58) is that for every $t$ the first $K_m$ elements are of a different nature than the following $(T-t)K_m$ elements. If $m^*_{it}$ is actually a set of weakly exogenous explanatory variables then $E[m^*_{it}\varepsilon_{it}] = 0$, whereas $E[m^*_{it}\varepsilon_{i,t-1}] \neq 0$. All other elements of (6.58) are non-zero if $m^*_{it}$ is assumed to be, for instance, autoregressive. However, if $m^*_{it}$ is actually endogenous and the maintained hypothesis is also violated, all elements are non-zero. If (6.55) is tested using the collapsed set of instruments one actually examines

E\Big[\sum_{t=2}^T m^*_{it}(\varepsilon_{it}-\varepsilon_{i,t-1})\Big] = E\Big[\sum_{t=2}^T m^*_{it}\varepsilon_{it}\Big] - E\Big[\sum_{t=2}^T m^*_{it}\varepsilon_{i,t-1}\Big] = 0,   (6.59)

of which the first part is zero if $m^*_{it}$ is weakly exogenous and the second part is non-zero. Under endogeneity of $m^*_{it}$ both parts are non-zero.

From the discussion above we can conclude that the subset tests may have very different power properties, depending on the type of instruments used, the hypothesis tested, the actual violation of the null and the DGP. Hence, these power properties will be investigated by means of Monte Carlo simulation, after first examining the actual significance levels.

6.5 Simulation design

The simulation design of Kiviet et al. (2014) is extended by adding a lagged disturbance term to the regressor $x_{it}$, similar to Kiviet (1999). We consider the stable dynamic simultaneous equation model with cross-sectional heteroskedasticity

y_{it} = \gamma y_{i,t-1} + \beta x_{it} + \sigma_\eta \eta_i + \sigma_\varepsilon \omega_i^{1/2}\varepsilon_{it}, \qquad (i = 1, \ldots, N; \; t = 1, \ldots, T)   (6.60)

where $|\gamma| < 1$. With respect to regressor $x_{it}$ we define the process

x_{it} = \xi x_{i,t-1} + \pi_\eta \eta_i + \pi_\lambda \lambda_i + \sigma_v \omega_i^{1/2} v_{it},   (6.61)

where $|\xi| < 1$ and

v_{it} = \rho_0 \varepsilon_{it} + \rho_1 \varepsilon_{i,t-1} + (1 - \rho_0^2 - \rho_1^2)^{1/2}\zeta_{it}.   (6.62)

Cross-sectional heteroskedasticity is introduced by $\omega_i$. It is generated as $\omega_i = e^{h_i(\delta)}$, with

h_i(\delta) = -\delta^2/2 + \delta[\kappa^{1/2}\eta_i + (1-\kappa)^{1/2}\lambda_i] \sim NID(-\delta^2/2, \delta^2),   (6.63)

so $\omega_i$ has a lognormal distribution that depends on $\delta$ and $\kappa$. As $\delta$ is made larger, the degree of heteroskedasticity increases. For $\delta = 0$ the error term is homoskedastic, as $\omega_i = 1$. With $\kappa$ the relative share of $\eta_i$ and $\lambda_i$ can be determined. With respect to $\delta$ we will consider $\delta \in \{0, 1\}$, and $\kappa$ will initially be set to zero.

For the autoregressive parameters we will choose $\gamma \in \{0.2, 0.5, 0.8\}$ and $\xi \in \{0.5, 0.8\}$, and without loss of generality we can set $\sigma_\varepsilon = 1$. This leaves us with the seven free parameters $\{\beta, \sigma_\eta, \pi_\eta, \pi_\lambda, \sigma_v, \rho_0, \rho_1\}$. Rather than choosing arbitrary values for these, we will consider seven econometrically meaningful notions derived under stationarity of $x_{it}$ and $y_{it}$. What follows is very similar to the approach of Kiviet et al. (2014). Therefore only the relevant design parameters are mentioned here, whereas some additional details can be found in the previous chapter and the paper mentioned.

The average long-run variance of $x_{it}$, $\bar V_x$, is set to unity, which can be done without loss of generality as $\beta$ is a free parameter. In the long-run variance of $x_{it}$ we can fix the relative contributions of the two individual effects by

EF^\eta_x = \pi^2_\eta/(\pi^2_\eta + \pi^2_\lambda),   (6.64)

for which we will investigate the values $EF^\eta_x \in \{0, 0.3, 0.6\}$. Additionally, the contribution of the accumulated individual effects relative to the long-run variance is fixed by introducing

EVF_x = \frac{\pi^2_\eta + \pi^2_\lambda}{(1-\xi)^2} \Big/ \bar V_x,   (6.65)

for which we will examine $EVF_x \in \{0.1, 0.3, 0.5\}$. The three preceding restrictions yield values for $\pi_\eta$, $\pi_\lambda$ and $\sigma_v$:

\pi_\lambda = (1-\xi)[(1-EF^\eta_x)EVF_x]^{1/2},   (6.66)
\pi_\eta = (1-\xi)[EF^\eta_x \, EVF_x]^{1/2},   (6.67)
\sigma_v = [(1-\xi^2)(1-EVF_x)]^{1/2}.   (6.68)

The classification of regressor $x_{it}$ depends on the values of $\rho_0$ and $\rho_1$. Although it is obvious that $x_{it}$ is strictly exogenous when $\rho_0 = \rho_1 = 0$, weakly exogenous when only $\rho_0 = 0$ and endogenous when $\rho_0 \neq 0$, we would like to control the degree to which the strict and weak exogeneity conditions are violated.

With respect to the degree of simultaneity we find that $E(x_{it} \cdot \sigma_\varepsilon\omega_i^{1/2}\varepsilon_{it}) = \sigma_\varepsilon\sigma_v\omega_i\rho_0$, which on average equals $\sigma_v\rho_0$. Using that $\bar V_x = 1$ and picking a value for the average degree of simultaneity $\bar\rho_{x\varepsilon}$, $\rho_0$ can be calculated by

\rho_0 = \bar\rho_{x\varepsilon}/\sigma_v.   (6.69)

Similar to the degree of simultaneity, we can define a correlation associated with the assumption of weak exogeneity to measure the degree to which it is violated. This correlation can be controlled in the same way as the degree of simultaneity. The average of $E(x_{it} \cdot \sigma_\varepsilon\omega_i^{1/2}\varepsilon_{i,t-1})$ is $\sigma_v\rho_1$, which means that $\rho_1$ can be chosen from

\rho_1 = \bar\rho_{x\varepsilon_{-1}}/\sigma_v,   (6.70)

where $\bar\rho_{x\varepsilon_{-1}}$ is the value chosen in the design for the average correlation between $x_{it}$ and $\sigma_\varepsilon\omega_i^{1/2}\varepsilon_{i,t-1}$. Of course a more realistic DGP for $x_{it}$ could have been chosen, for instance based on scheme 2 of Bun and Kiviet (2006); the present choice however offers the advantage that $\rho_0$ and $\rho_1$ can easily be given values.

Given that all parameters with respect to $x_{it}$ have now been chosen, let us consider the remaining free parameters for generating $y_{it}$. We set $\sigma_\eta$ by fixing the long-run impact of $\eta_i$ on $y_{it}$ purely through its occurrence in (6.60) and relative to the current noise $\sigma_\varepsilon = 1$. Define

DEN^\eta_y = \sigma_\eta/(1-\gamma),   (6.71)

from which $\sigma_\eta$ is easily obtained. Whereas Kiviet et al. (2014) use a signal-to-noise ratio to determine values for $\beta$, this strategy does not yield a unique solution here. Therefore we use the total multiplier, through which $\beta$ can be determined from a direct relation with $\gamma$:

TM = \frac{\beta}{1-\gamma}.   (6.72)

As all notions have been derived under stationarity of $x_{it}$ and $y_{it}$, we consider simulations in which $x_{it}$ and $y_{it}$ are (close to) stationary. Before generating the sample observations, 50 pre-heating observations are generated with start values $x_{i,-50} = y_{i,-50} = 0$, after which the pre-heating observations are discarded.
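The full design is easily reproduced; the sketch below (hypothetical code, under the assumptions $\bar V_x = 1$ and $\sigma_\varepsilon = 1$ stated above) backs the seven free parameters out of the design notions (6.64)-(6.72) and then generates one panel according to (6.60)-(6.63), including the 50 discarded pre-heating observations.

```python
import numpy as np

def generate_panel(N, T, gamma=0.2, xi=0.8, EF_eta_x=0.0, EVF_x=0.3,
                   DEN_eta_y=1.0, TM=1.0, rho_xeps=0.0, rho_xeps1=0.0,
                   delta=0.0, kappa=0.0, seed=0):
    rng = np.random.default_rng(seed)
    # free parameters implied by the design notions (V_x = 1, sigma_eps = 1)
    pi_lam = (1 - xi) * np.sqrt((1 - EF_eta_x) * EVF_x)             # (6.66)
    pi_eta = (1 - xi) * np.sqrt(EF_eta_x * EVF_x)                   # (6.67)
    sigma_v = np.sqrt((1 - xi**2) * (1 - EVF_x))                    # (6.68)
    rho0, rho1 = rho_xeps / sigma_v, rho_xeps1 / sigma_v            # (6.69)-(6.70)
    sigma_eta = DEN_eta_y * (1 - gamma)                             # (6.71)
    beta = TM * (1 - gamma)                                         # (6.72)
    eta, lam = rng.standard_normal(N), rng.standard_normal(N)
    omega = np.exp(-delta**2 / 2
                   + delta * (np.sqrt(kappa) * eta
                              + np.sqrt(1 - kappa) * lam))          # (6.63)
    S = T + 51                                                      # 50 pre-heating periods
    eps = rng.standard_normal((N, S))
    zeta = rng.standard_normal((N, S))
    eps_lag = np.hstack([np.zeros((N, 1)), eps[:, :-1]])
    v = rho0 * eps + rho1 * eps_lag \
        + np.sqrt(1 - rho0**2 - rho1**2) * zeta                     # (6.62)
    x, y, w = np.zeros((N, S)), np.zeros((N, S)), np.sqrt(omega)
    for t in range(1, S):
        x[:, t] = xi * x[:, t-1] + pi_eta * eta + pi_lam * lam \
                  + sigma_v * w * v[:, t]                           # (6.61)
        y[:, t] = gamma * y[:, t-1] + beta * x[:, t] \
                  + sigma_eta * eta + w * eps[:, t]                 # (6.60)
    return y[:, -(T + 1):], x[:, -(T + 1):]  # column 0 serves as start-up observation

y, x = generate_panel(N=200, T=4, gamma=0.5, rho_xeps=0.2)  # an endogenous-x design
```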

6.6 Simulation results

Although the simulation design is characterized by a high-dimensional parameter space, we limit our analysis to a few relevant cases. One reference case is chosen, from which we deviate in a few interesting directions. The reference case is characterized by the following parameters. The $x_{it}$ series has autoregressive coefficient $\xi = 0.8$, $EF^\eta_x = 0$ and $EVF_x = 0.3$. With respect to the relationship for $y_{it}$ we take $DEN^\eta_y = 1$, $TM = 1$ and $\bar\rho_{x\varepsilon_{-1}} = \bar\rho_{x\varepsilon} = 0$. Initially we take $\kappa = 0$, so any heteroskedasticity is uncorrelated with the individual effect in $y_{it}$. To limit problems due to small samples the number of cross-sectional observations is first chosen to be $N = 200$. By choosing $\bar\rho_{x\varepsilon_{-1}} = \bar\rho_{x\varepsilon} = 0$, all rejection probabilities are estimated under the various null hypotheses. After investigating actual significance levels, $\bar\rho_{x\varepsilon_{-1}}$ and $\bar\rho_{x\varepsilon}$ will be given non-zero values to investigate size corrected power. All results are obtained from simulations with $R = \ldots$ replications. Rejection probabilities are estimated for a nominal significance level of 5%.

Note that the number of moment conditions changes with the assumptions made regarding the classification of $x_{it}$ and with the value of $T$. The same notation as in Section 6.2 and Section 6.3 is used, although a subscript col is added to $J^{(2)}$ if it is based on collapsed instrument matrices. Only results for tests based on first differences are provided (with the exception of the diagonal test statistics), although those based on forward orthogonal deviations are available upon request. In general the results differed very little between these two methods.(13)

The upper part of each table is used to provide results for the Sargan-Hansen test on all overidentifying restrictions, which we will refer to as the full set tests. These are included in order to examine to what extent they are able to detect misclassification. We examine the standard Sargan-Hansen test based on all instruments or collapsed instruments and a diagonal test statistic based on all instruments only. A collapsed version of the latter is excluded, as no natural block-diagonal structure for the weighting matrix is then available. The subset tests are presented in the lower half of the tables. A header clarifies whether the hypothesis of subsection 6.4.3 or that of 6.4.4 is tested. The subset tests are accompanied by another header, indicating what type of instrument matrix is used. The Hausman type statistics are only examined when using collapsed instruments, as only then the asymptotic null distribution is known.

The various degrees of freedom are given here, first for all instruments, followed by the degrees of freedom after collapsing. The full set tests calculated under the assumption of endogeneity have $(T+1)(T-2)$ and $2(T-2)$ degrees of freedom. Under the assumption of weak exogeneity the degrees of freedom are $(T+1)(T-1)-2$ and $2T-3$. When treating $x_{it}$ as strictly exogenous the degrees of freedom are $(3T^2-T-6)/2$ and $2(T-1)$. When using all instruments, the subset test for endogeneity has $T-1$ degrees of freedom and the subset test for weak exogeneity has $T(T-1)/2$. The degrees of freedom of the subset tests are always 1 after collapsing.

(13) Additionally, results based on CUE (Hansen, Heaton and Yaron, 1996) were produced where possible. These tests showed no clear improvement over the Arellano-Bond based tests and were therefore excluded.

6.6.1 Results under strict exogeneity

The results for the reference case are given in Table 6.1. Regarding the Sargan-Hansen test on all overidentifying restrictions it is found that the widely used statistic $J^{(2)}$ underrejects as $T$ grows. As $\gamma$ gets close to unity $J^{(2)}$ rejects slightly more often. In the presence of heteroskedasticity these patterns are magnified, resulting in very low rejection probabilities for $T = 8$. Although in theory the diagonal test statistic $D^{(2)}$ should help to mitigate these size problems as $T$ grows, it seems to perform worse. Limiting the number of moment conditions by collapsing the instrument matrix appears preferable, as $J^{(2)}_{col}$ outperforms the tests based on all instruments. The degree to which it is undersized if $T$ is large is much less serious.

Now let us turn to the subset tests, focusing on the classification of individual variables. In the reference case all statistics behave reasonably regarding size when testing the null hypothesis of weak exogeneity against the alternative of endogeneity. Especially $\bar J^{(2)}$ and $D^{(2)}$ perform well when using all instruments, whereas $J^{(2)}$ overrejects if $T = 4$ and $\gamma = 0.8$. Using collapsed instruments $\bar J^{(2)}$ still outperforms $J^{(2)}$, especially when $\gamma = 0.8$. The uncorrected Hausman test $H^{(2)}$ overrejects, displaying even higher rejection probabilities when $T$ increases and in the presence of heteroskedasticity. Its corrected version $H^{(2)}_c$ performs well under homoskedasticity, although size problems under heteroskedasticity remain serious. Given the disappointing performance of these two Hausman tests we will refrain from discussing them in the remainder of this section. Note that $\bar H^{(2)}$ is numerically equivalent to $\bar J^{(2)}$ and is therefore omitted. Its corrected version $\bar H^{(2)}_c$ behaves very similarly to $\bar J^{(2)}$, which is good, as the latter shows no need for correction.

Testing the null hypothesis of strict exogeneity against the alternative of weak exogeneity generally results in slightly worse size properties when using all instruments. Given that the tests based on collapsed instruments only test one restriction, little difference is found with respect to testing for endogeneity. This all suggests that the results in Hayakawa (2014) are indeed due to the use of a centered weighting matrix. In order to check this, the reference case is also simulated for test statistics based on centered weighting matrices. Indeed the rejection probabilities are found to approach unity when $T$ grows, both for the full set tests and for the subset tests when using all instruments. Collapsing instruments also yields test statistics that overreject, although less severely. Hence, the use of centered weighting matrices for testing moment conditions has to be discouraged.

Decreasing the number of cross-sectional observations to $N = 100$, as in Table 6.2, affects all full set test statistics. All rejection probabilities have decreased and in almost every instance there is underrejection. When $T = 8$ and $\delta = 1$ the rejection probabilities are often close to zero. The Sargan-Hansen test based on collapsed instruments performs relatively well, with rejection probabilities never less than $\ldots$. The patterns regarding $\gamma$, $T$ and $\delta$ remain unchanged. The impact on the subset tests when testing for endogeneity is less profound. Although the results seem slightly less satisfactory than those for $N = 200$, no clear pattern emerges. However, when testing for weak exogeneity, the tests based on all instruments show very low rejection frequencies under heteroskedasticity when $T = 8$. Collapsing resolves this issue. The corrected Hausman test $\bar H^{(2)}_c$ slightly outperforms its uncorrected version $\bar J^{(2)}$, which performs pretty well already.

The first way in which we deviate from the reference case is with respect to $DEN^\eta_y$. Setting $DEN^\eta_y = 4$ yields Table 6.3, again for $N = 200$. The full set tests exhibit higher rejection probabilities if $\gamma = 0.8$. Regarding the size of the subset tests, only $J^{(2)}$ rejects more often if $\gamma = 0.8$, in some instances. No obvious difference between the tests on endogeneity and the tests on weak exogeneity is visible. Results for intermediate values of $DEN^\eta_y$ are omitted, as they do not seem to differ much from those for the reference case.

The results in Table 6.4 are obtained by setting $\xi = 0.5$. The autoregressive process for $x_{it}$ is now less persistent. Although we predicted that the power of the subset tests would depend on $\xi$, we first have to examine the behaviour under the null. We find virtually no impact of $\xi$ on the full set tests. However, the sizes of the various subset tests are affected. Testing for endogeneity using all instruments by $J^{(2)}$ results in higher rejection frequencies, whereas $\bar J^{(2)}$ and $D^{(2)}$ seem almost invariant to the change in $\xi$. Reducing the number of instruments by collapsing does not mitigate the effect on $J^{(2)}$. Surprisingly, when testing for weak exogeneity, this same test statistic displays lower rejection probabilities than for $\xi = 0.8$.

Next we report on changes to the reference case that did not influence the results much and for which the tables are omitted. First we set $EF^\eta_x = 0.3$ (introducing correlation between $y_{it}$ and $x_{it}$ through $\eta_i$). No effect becomes apparent from introducing this correlation. Setting $\kappa = 0.5$ whilst keeping $EF^\eta_x = 0.3$ makes the heteroskedasticity depend on $\eta_i$ as much as it depends on $\lambda_i$. Under heteroskedasticity these results do not differ much (and under homoskedasticity they are the same as in the previous case). Increasing the total multiplier to $TM = 3$ only improves the size of the incremental diagonal Sargan-Hansen test in the presence of heteroskedasticity and when $T = 8$.

6.6.2 Results under weak exogeneity

By making $x_{it}$ weakly exogenous several test statistics are calculated under their alternative hypothesis. These are the subset tests that test for weak exogeneity and the full set test calculated under the assumption of strict exogeneity. We examine the situation in which the reference case is altered by setting $\bar\rho_{x\varepsilon_{-1}} = 0.2$ in Table 6.5. The full set tests calculated under the null hypothesis reject slightly more often now. The rejection probabilities calculated under the alternative are size corrected, which allows for mutual comparisons. Especially for $T = 4$ the tests based on all instruments exhibit more power than those based on collapsed instruments. Among these test statistics it is the subset version of the diagonal Sargan-Hansen test that has the highest rejection probability. The subset tests are able to detect the invalidity of some of the instruments with a slightly higher probability than the full set tests, unless $T = 8$ and $\delta = 1$. This difference is more pronounced for tests based on collapsed instruments, especially for $T$ small. The presence of heteroskedasticity has a diminishing effect on the power of all tests. Whereas no difference is found between $\bar J^{(2)}$ and $\bar H^{(2)}_c$ in terms of size corrected power, the latter is found to reject slightly more often when the test statistics are not size corrected (not tabulated here), which is reasonable as both require no size correction.

6.6.3 Results under endogeneity

By setting $\bar\rho_{x\varepsilon} = 0.2$ we obtain results for $x_{it}$ endogenous. First we consider the case in which $\bar\rho_{x\varepsilon_{-1}} = 0$ and $x_{it}$ is only correlated with $\varepsilon_{i,t-1}$ through $\xi x_{i,t-1}$. Note that the null hypothesis is violated for each subset test and that only the full set tests calculated under the assumption of endogeneity should follow their null distribution. Table 6.6 presents the (size corrected) rejection probabilities for this case. The full set tests calculated under the assumption of endogeneity reject more often now that $x_{it}$ is indeed endogenous, resulting in overrejection if $T = 4$ or $\gamma$ large. The diagonal Sargan-Hansen test seems unaffected and still underrejects for $T = 8$. The full set tests presented under strict exogeneity have slightly more trouble detecting the invalidity of some of the instruments than the full set tests calculated under weak exogeneity when using all instruments. This probably has to do with the fact that relatively more instruments are valid in the former case. Surprisingly, the tests based on collapsed instruments have more power now. This result carries over to the subset tests, even when testing for strict exogeneity. This is opposite to what we found when $x_{it}$ was weakly exogenous. Whether using all instruments or collapsing them results in more power seems to be determined by the type of violation of the null hypothesis and the hypothesis tested. When testing for weak exogeneity (under the null hypothesis that $x_{it}$ is strictly exogenous) it is observed that the diagonal Sargan-Hansen test has problems for $\gamma$ larger than 0.2, although it performs rather well when testing for strict exogeneity.

Among the subset tests based on collapsed instruments we find that $J^{(2)}$ consistently outperforms all other test statistics, sometimes by a considerable margin. It must be noted, however, that in contrast to $\bar J^{(2)}$ and $\bar H^{(2)}_c$ it requires size correction.

In Table 6.7 we examine the case in which both $\bar\rho_{x\varepsilon}$ and $\bar\rho_{x\varepsilon_{-1}}$ equal 0.2. Not surprisingly the full set tests calculated under endogeneity overreject even more now. This is again most evident for $T = 4$ or $\gamma$ large. As we found the effect of the choice of instruments on power to depend on the type of violation of the null hypothesis, this is an interesting case. Both violations are combined and we observe that the tests for endogeneity again benefit from using collapsed instruments. However, when testing for weak exogeneity it helps to use all instruments if $T = 4$, whereas collapsing yields more power if $T = 8$.

Examining the same case, with the exception of $\xi$ which is now chosen to be 0.5, we find some remarkable differences in Table 6.8. Under the null the full set tests are better behaved, whereas they have less power. Especially the tests calculated under the assumption of weak exogeneity suffer. All subset tests exhibit lower rejection probabilities as well, especially when testing for endogeneity. The patterns found from examining Table 6.7 remain the same.

6.7 Empirical case study

Now we will further illustrate some of our findings regarding the full and subset tests in the model of crime estimated by Cornwell and Trumbull (1994). This study investigated to what extent the criminal justice system is able to deter crime. The data are aggregated at the county level for North Carolina, spanning the years 1981-1987. Recently Bun (2015) used these data to estimate a dynamic panel data model, using internal instruments. In the original paper only a static model has been considered, using external instruments. Bun (2015) demonstrates that these external instruments are weak and that lags of the explanatory variables yield more reliable estimates. Kelaher and Sarafidis (2011) have estimated a very similar dynamic model using more recent data for New South Wales. Let us first consider the original model estimated by Cornwell and Trumbull (1994):

CR_{it} = \beta_a P^a_{it} + \beta_{pol} Pol_{it} + x_{it}'\beta + \eta_i + \tau_t + \varepsilon_{it},   (6.73)

where all variables are in logarithms.

The dependent variable is the crime rate $CR_{it}$, $P^a_{it}$ is the probability of arrest, $Pol_{it}$ is the police capacity, and $x_{it}$ consists of the probability of conviction ($P^c_{it}$), the probability of imprisonment ($P^p_{it}$), average sentence length ($Sen_{it}$), eight wage variables, population density, a minority index and the percentage of young males. Furthermore, $\tau_t$ is the time-specific effect and $\eta_i$ is the individual specific effect. Cornwell and Trumbull (1994) estimate different versions of the model in (6.73), treating different sets of regressors as endogenous.

Similar to Kelaher and Sarafidis (2011) and Bun (2015), the model in (6.73) is made dynamic by adding a lagged dependent variable and including the first order lags of the probability of arrest and police capacity:

CR_{it} = \gamma CR_{i,t-1} + \beta_a P^a_{it} + \beta^1_a P^a_{i,t-1} + \beta_{pol} Pol_{it} + \beta^1_{pol} Pol_{i,t-1} + x_{it}'\beta + \eta_i + \tau_t + \varepsilon_{it}.   (6.74)

Following Bun (2015) we will initially treat police capacity and the probability of arrest as endogenous. Given the construction of the probability of arrest it should always be treated as endogenous. Police capacity is often also treated as endogenous. Here we will apply the subset test $\bar J^{(2)}$ to examine the difference between using an extended set of instruments or using collapsed instruments for this particular dataset. The endogeneity of both variables will be tested. When examining the effect of the type of instrument matrix, only the instruments for the lag of the crime rate, the probability of arrest and police capacity will be changed. However, as $N$ is only 90 the extended set of instruments should not include all lags. Hence, when opting for the extended set of instruments we will only include the second lag. When using a collapsed instrument matrix all lags form part of the instrument matrix. Every regression will be carried out twice, using either the extended or the collapsed set of instruments.

Although Bun (2015) treats the probability of conviction, the probability of imprisonment and average sentence length as strictly exogenous, we allow these regressors to be weakly exogenous. Therefore we will include the first, second and third lag of these variables as instruments and collapse them in every regression, unless reported otherwise. This set of regressors is tested for weak exogeneity. All other non-endogenous regressors are instrumented by themselves, as their lags are expected to yield very little information regarding the other regressors.

The first regression results are given in Table 6.9. All standard errors are corrected using the method of Windmeijer (2005) and are given in parentheses. If the estimates and standard errors are given in bold that variable is treated as endogenous, italics represent weakly exogenous regressors and normal font stands for strictly exogenous variables. Lags of regressors are given the same font as their current counterparts. The first column replicates the findings of Bun (2015). Although the same number of instruments is used, the results differ somewhat due to the different choice of instruments.

Furthermore, by collapsing almost all standard errors increase and the p-value of the full set Sargan-Hansen test slightly decreases to $\ldots$. The second set of regressions differs from the first in that the probability of conviction, the probability of imprisonment and average sentence length are now treated as weakly exogenous. The standard errors of their estimated coefficients naturally increase, although the standard errors for the lagged probability of arrest and lagged police capacity somewhat decline. As the coefficient of the first lag of the probability of arrest is insignificant in all four regressions, it is henceforth left out of the model specification. Set (3) shows that this results in a significant coefficient for the lagged dependent variable when using the extended set of instruments. Whereas the coefficient of lagged police capacity is found to be insignificant there, it is significant when collapsing the instruments. However, the autoregressive coefficient is then found to be insignificant. These regressions are treated as the baseline regressions when testing the various subsets.

Table 6.10 presents the results for the subset tests and the regressions under their null hypothesis. Estimation results (4) are found when police capacity is treated as weakly exogenous. Most estimates are closer to zero now, although exactly the same ones are significant as in (3) when using the extended set of instruments. The p-value of the standard Sargan-Hansen test has dropped from $\ldots$ in (3) to $\ldots$ and the incremental test $\bar J^{(2)}$ has a p-value of $\ldots$. When using collapsed instruments only the coefficients of the probability of conviction and the probability of imprisonment are significant, and it seems that the results are much more sensitive to the classification of police capacity. The p-value of the corresponding Sargan-Hansen test has dropped from $\ldots$ to $\ldots$ and the incremental test has a p-value of $\ldots$. So, although the p-value of the standard Sargan-Hansen test statistic is lower, it cannot significantly detect the endogeneity of police capacity. Only the subset test based on collapsed instruments rejects the null hypothesis that police capacity is weakly exogenous, and we conclude that the results in (4) should not be trusted.

Next the endogeneity of the probability of arrest is investigated. The null hypothesis is not rejected using either type of instrument matrix, even though we know that it is in fact endogenous. Either the degree of simultaneity is rather low or the test has very little power in this case. To conclude, the probability of conviction, the probability of imprisonment and average sentence length are tested for weak exogeneity. When using the extended set of instruments all standard errors decrease, except that of the autoregressive coefficient. The p-value of the standard Sargan-Hansen test is $\ldots$ and its incremental version has a p-value of $\ldots$. When using collapsed instruments the p-value of the standard Sargan-Hansen test actually increases to 0.685, whereas it was $\ldots$ in (3). It is therefore no surprise that the subset test does not reject and in fact has a p-value close to one.

These findings imply that the researcher can choose either (1), (2), (3) or (6). Although the results in (6) are not rejected by the incremental Sargan-Hansen test, strict exogeneity of the probability of conviction, the probability of imprisonment and average sentence length may be a rather strong assumption. That the outcome of the subset test depends on the way the endogenous regressors are instrumented fits the findings from the simulation study rather well. As a robustness check it is therefore advised that practitioners calculate the test statistics using both the extended set and the collapsed set of instruments and see if the outcomes differ.

6.8 Conclusions

Due to the nature of internal instruments in dynamic panel data models it frequently happens that only a subset of moment conditions is violated, for instance due to misclassification by the researcher of an explanatory variable with respect to its correlation with the idiosyncratic error term. As this renders the GMM estimator inconsistent, detection is of great importance. The Sargan-Hansen test is automatically calculated by almost every popular software package that allows GMM estimation, and researchers often rely on its outcome. The present chapter illustrates that using so-called subset tests may yield crucial information to the practitioner when the standard Sargan-Hansen test outcome is not significant.

Two types of subset tests are considered, the incremental Sargan-Hansen test and the Hausman test, both based on the Arellano-Bond estimator. In case of the former, three test statistics are examined: one based on the usual difference of two standard Sargan-Hansen test statistics ($J^{(2)}$), one based on using the same weighting matrix (estimated under the null hypothesis) in both statistics ($\bar J^{(2)}$), and additionally an incremental Sargan-Hansen test ($\bar D^{(2)}$) derived from the principle in Hayakawa (2014) for the regular test. This so-called diagonal Sargan-Hansen test uses a diagonal weighting matrix and its critical values are found using the routine of Imhof (1961). With respect to the Hausman test a correction is proposed based on the variance correction in Windmeijer (2005).

Transforming the data in order to get rid of the individual effect is shown to impact the power of all considered tests. The moment conditions under investigation cannot be tested directly and are partly clouded by the occurrence of additional moment conditions that may either be valid or not. A simulation study is conducted in order to investigate the degree to which the subset tests can be used to complement the standard Sargan-Hansen test and which implementation performs best in terms of size and power.

We find that the refined test statistic $H^{(2)}_c$ provides a much required correction to the Hausman test statistic $H^{(2)}$ under homoskedasticity. However, even though it still outperforms $H^{(2)}$ under heteroskedasticity, it then suffers from overrejection. The incremental Sargan-Hansen test $\bar J^{(2)}$, which is shown to be numerically equivalent to the Hausman test statistic $\bar H^{(2)}$ when instruments are collapsed, behaves very well under the null hypothesis when using collapsed instruments, whereas the corrected version of $\bar H^{(2)}$, given by $\bar H^{(2)}_c$, yields no further improvement. The not size-corrected power shows that the latter rejects slightly more often than the uncorrected test statistic. The performance of the diagonal Sargan-Hansen test deteriorates when $T$ is large and in the presence of heteroskedasticity. Although its subset variant behaves very well under the null hypothesis of weak exogeneity and shows promising size corrected power when testing for strict exogeneity, it has much less power than $\bar J^{(2)}$ and the standard incremental Sargan-Hansen test statistic $J^{(2)}$ when testing the null hypothesis of strict exogeneity against weak exogeneity.

Summarizing our findings with respect to some of the nuisance parameters, we note the following. For $N$ as small as 100 the tests for weak exogeneity underreject. However, when the set of instruments is collapsed this effect is mitigated. When the variance of the individual effect stemming from $y_{it}$, or $\gamma$, is increased, then only $J^{(2)}$ is found to reject more often. This same statistic is also found to be sensitive to the value of the autoregressive parameter in the process of the additional regressor $x_{it}$. Under the null hypothesis $\bar J^{(2)}$ and $\bar H^{(2)}_c$ are found to demonstrate the best size control when using collapsed instruments. In terms of size corrected power a mixed picture is observed. Whether collapsing increases power depends on the DGP of $x_{it}$. If it is weakly exogenous with respect to the idiosyncratic error term, the tests on weak exogeneity generally have more power when using all instruments. However, when $x_{it}$ is endogenous and only correlated with $\varepsilon_{i,t-1}$ through $\xi x_{i,t-1}$, the tests based on collapsed instruments have more power, although the tests on weak exogeneity have rather low size corrected power in general. Decreasing the autoregressive parameter of $x_{it}$ results in much less power for the tests on endogeneity, whereas the tests on strict exogeneity are hardly affected.

The data of Cornwell and Trumbull (1994) are used to demonstrate how the choice of instruments may affect the test results in practice. Furthermore, this application shows that inference based on the standard Sargan-Hansen test can be misleading, as an insignificant test outcome is not necessarily reassuring. The present study only discusses the performance of the subset tests in isolation, rather than in a sequence, as is much more common in practice. Future research will have to show how these tests behave as part of a selection procedure, as for instance in Andrews and Lu (2001) or Caner et al. (2013).

Appendix 6.A Non-negativeness of $\bar J^{(2)}$

Due to the linearity of $\bar g(\theta)$ we may write

\bar g(\hat\theta^{(2)}) = \bar g(\theta_0) + C(\hat\theta^{(2)} - \theta_0),   (6.75)

from which it easily follows that $\bar g(\hat\theta^{(2)}) - \bar g(\theta_0) = O_p(N^{-1/2})$. Premultiplying by $C'W_N^{-1}(\hat\theta^{(1)})$ yields

C'W_N^{-1}(\hat\theta^{(1)})\bar g(\hat\theta^{(2)}) = C'W_N^{-1}(\hat\theta^{(1)})\bar g(\theta_0) + C'W_N^{-1}(\hat\theta^{(1)})C(\hat\theta^{(2)} - \theta_0).   (6.76)

Noting that $C'W_N^{-1}(\hat\theta^{(1)})\bar g(\hat\theta^{(2)}) = 0$ from the first order conditions, we obtain

\sqrt N(\hat\theta^{(2)} - \theta_0) = -\big(C'W_N^{-1}(\hat\theta^{(1)})C\big)^{-1}C'W_N^{-1}(\hat\theta^{(1)})\sqrt N\,\bar g(\theta_0).   (6.77)

Substituting (6.77) into (6.75) yields

\bar g(\hat\theta^{(2)}) = \big[I_L - C\big(C'W_N^{-1}(\hat\theta^{(1)})C\big)^{-1}C'W_N^{-1}(\hat\theta^{(1)})\big]\bar g(\theta_0) = M(\hat\theta^{(1)})\bar g(\theta_0).   (6.78)

Note that $J^{(2)}$ given in (6.8) can now be written as

J^{(2)} = N\,\bar g(\theta_0)'M(\hat\theta^{(1)})'W_N^{-1}(\hat\theta^{(1)})M(\hat\theta^{(1)})\bar g(\theta_0).   (6.79)

Using $\hat\Psi\hat\Psi' = W_N^{-1}(\hat\theta^{(1)})$ we can write (6.79) as

J^{(2)} = N\,\bar g(\theta_0)'\hat\Psi[I_L - P_{\hat\Psi'C}]\hat\Psi'\bar g(\theta_0).   (6.80)

Naturally a similar result holds for $\bar g_1(\tilde\theta^{(2)}_1)$ and

J^{(2)}_1 = N\,\bar g_1(\theta_0)'\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'\bar g_1(\theta_0),   (6.81)

where $\hat\Psi_1\hat\Psi_1' = W_{11,N}^{-1}(\hat\theta^{(1)})$. Using these formulations of $J^{(2)}$ and $J^{(2)}_1$ we obtain

\bar J^{(2)} = N\,\bar g(\theta_0)'\hat\Psi[B - B_1]\hat\Psi'\bar g(\theta_0)   (6.82)

with $B = [I_L - P_{\hat\Psi'C}]$ and $B_1 = \hat\Psi^{-1}A'\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'A(\hat\Psi^{-1})'$, where $A = (I_{L_1}\;\; O)$.

In order to prove that $\bar J^{(2)} \ge 0$ we must show that the matrix between brackets is positive semidefinite. This will be done by showing that $B - B_1$ is symmetric and idempotent. Due to their nature $B$ and $B_1$ are both symmetric, as is their difference. As $B$ is a projection matrix it is idempotent. In what follows we will use that $A(\hat\Psi^{-1})'\hat\Psi^{-1}A' = AW_N(\hat\theta^{(1)})A' = W_{11,N} = (\hat\Psi_1\hat\Psi_1')^{-1}$. We find

B_1B_1 = \hat\Psi^{-1}A'\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'A(\hat\Psi^{-1})'\hat\Psi^{-1}A'\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'A(\hat\Psi^{-1})'
      = \hat\Psi^{-1}A'\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'(\hat\Psi_1\hat\Psi_1')^{-1}\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'A(\hat\Psi^{-1})'
      = \hat\Psi^{-1}A'\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'A(\hat\Psi^{-1})' = B_1.

With respect to the cross product of $B$ and $B_1$ we find

BB_1 = [I_L - P_{\hat\Psi'C}]\hat\Psi^{-1}A'\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'A(\hat\Psi^{-1})'
     = [\hat\Psi^{-1}A'\hat\Psi_1 - \hat\Psi'C(C'\hat\Psi\hat\Psi'C)^{-1}C_1'\hat\Psi_1][I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'A(\hat\Psi^{-1})'
     = \hat\Psi^{-1}A'\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}]\hat\Psi_1'A(\hat\Psi^{-1})' = B_1,

due to $C_1'\hat\Psi_1[I_{L_1} - P_{\hat\Psi_1'C_1}] = O$. Using the above we find

(B - B_1)(B - B_1) = BB - B_1B - BB_1 + B_1B_1 = B - B_1.

Hence, $B - B_1$ is idempotent, thus positive semidefinite, and $\bar J^{(2)} \ge 0$. It is crucial that $\hat\Psi_1$ and $\hat\Psi$ are based on the same weighting matrix. Any incremental Sargan-Hansen test statistic of this type is therefore non-negative.
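The algebra above is easily verified numerically. The following sketch (random matrices, not thesis code) constructs $B$ and $B_1$ for an arbitrary positive definite weighting matrix and confirms that their difference is symmetric, idempotent and positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(3)
L, L1, K = 8, 5, 2
C = rng.standard_normal((L, K))                        # full Jacobian C
A = np.hstack([np.eye(L1), np.zeros((L1, L - L1))])    # A = (I_{L1}  O)
C1 = A @ C
W = rng.standard_normal((L, L))
W = W @ W.T + L * np.eye(L)                            # common positive definite weighting
Psi = np.linalg.cholesky(np.linalg.inv(W))             # Psi Psi' = W^{-1}
Psi1 = np.linalg.cholesky(np.linalg.inv(A @ W @ A.T))  # Psi1 Psi1' = W_{11}^{-1}

proj = lambda M: M @ np.linalg.solve(M.T @ M, M.T)     # orthogonal projector on col(M)
Pinv = np.linalg.inv(Psi)
B = np.eye(L) - proj(Psi.T @ C)
B1 = Pinv @ A.T @ Psi1 @ (np.eye(L1) - proj(Psi1.T @ C1)) @ Psi1.T @ A @ Pinv.T
D = B - B1

print(np.allclose(D, D.T), np.allclose(D @ D, D))      # symmetric and idempotent
print(np.linalg.eigvalsh(D).min() > -1e-10)            # hence positive semidefinite
```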

Appendix 6.B Proof of Theorem 6.1

The proof will be provided for the two-step GMM estimator $\hat\theta^{(2)}$. We start from the expression for $\bar g(\hat\theta^{(2)})$ in (6.78), from which it follows that

M(\hat\theta^{(1)})\bar g(\theta_0) = M_0\bar g(\theta_0) + o_p(1),   (6.83)

where

M_0 = \big[I_L - C_0(C_0'V_0^{-1}C_0)^{-1}C_0'V_0^{-1}\big],   (6.84)

as $\mathrm{plim}_{N\to\infty}W_N(\theta) = V_0$ and $\mathrm{plim}_{N\to\infty}C = C_0$. Now let $W_{N,diag}(\hat\theta^{(1)})$ be the block-diagonal weighting matrix with $\mathrm{plim}_{N\to\infty}W_{N,diag}(\theta_0) = W_{diag}$, where $W_{diag}$ is a positive definite matrix. The diagonal Sargan-Hansen test is

D^{(2)} = N\,\bar g(\hat\theta^{(2)})'W_{N,diag}^{-1}(\hat\theta^{(1)})\bar g(\hat\theta^{(2)}) = N\,\bar g(\theta_0)'M_0'W_{diag}^{-1}M_0\bar g(\theta_0) + o_p(1).   (6.85)

Using his Lemma A2, Hayakawa (2014) finds that $D^{(2)} \xrightarrow{d} \sum_{j=1}^{L-K}\lambda_j z_j^2$, where $z_j \sim NID(0,1)$ and $\lambda = (\lambda_1, \ldots, \lambda_{L-K})'$ is the vector of non-zero eigenvalues of

\Lambda = V_0^{1/2}M_0'W_{diag}^{-1}M_0 V_0^{1/2}.   (6.86)

Conditional on the assumption that $E[\bar g_1(\theta_0)] = 0$, the validity of the complementary set of moment conditions $E[\bar g_2(\theta_0)] = 0$ can be tested using the incremental Sargan-Hansen test. The diagonal version of this test is

\bar D^{(2)} = N\,\bar g(\hat\theta^{(2)})'W_{N,diag}^{-1}(\hat\theta^{(1)})\bar g(\hat\theta^{(2)}) - N\,\bar g_1(\hat\theta_1^{(2)})'W_{11,N,diag}^{-1}(\hat\theta_1^{(1)})\bar g_1(\hat\theta_1^{(2)}).   (6.87)

Using the above we find

\bar D^{(2)} = N\,\bar g(\theta_0)'M_0'W_{diag}^{-1}M_0\bar g(\theta_0) - N\,\bar g_1(\theta_0)'M_{01}'W_{11,diag}^{-1}M_{01}\bar g_1(\theta_0) + o_p(1)
            = N\,\bar g(\theta_0)'M_0'W_{diag}^{-1}M_0\bar g(\theta_0) - N\,\bar g(\theta_0)'A'M_{01}'W_{11,diag}^{-1}M_{01}A\bar g(\theta_0) + o_p(1)
            = N\,\bar g(\theta_0)'V_0^{-1/2}[Q_0 - Q_{01}]V_0^{-1/2}\bar g(\theta_0) + o_p(1),   (6.88)

where $M_{01} = I_{L_1} - C_{01}(C_{01}'V_{01}^{-1}C_{01})^{-1}C_{01}'V_{01}^{-1}$ and

Q_0 = V_0^{1/2}M_0'W_{diag}^{-1}M_0V_0^{1/2}, \qquad Q_{01} = V_0^{1/2}A'M_{01}'W_{11,diag}^{-1}M_{01}AV_0^{1/2}.   (6.89)

Again applying Lemma A2 of Hayakawa (2014) yields

\bar D^{(2)} \xrightarrow{d} \sum_{j=1}^{p}\lambda^*_j z_j^2,   (6.90)

where $\lambda^* = (\lambda^*_1, \ldots, \lambda^*_p)'$ is the vector of non-zero eigenvalues of $Q_0 - Q_{01}$. Let us first determine the rank of $Q_0$. Note that $M_0$ is idempotent and that $V_0^{1/2}$ and $W_{diag}$ are non-singular. This means that $rk(Q_0) = L - K$. With respect to $Q_{01}$ we must note that $A$ has full row rank, so $rk(M_{01}A) = rk(M_{01})$, and we find that $rk(Q_{01}) = L_1 - K$. Hence, $rk(Q_0 - Q_{01}) \ge L_2$ and $Q_0 - Q_{01}$ has at least $L_2$ non-zero eigenvalues.
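The thesis evaluates tail probabilities of such weighted chi-squared limits with the numerical routine of Imhof (1961); a crude but serviceable Monte Carlo substitute is sketched below (a hypothetical helper).

```python
import numpy as np

def mixture_pvalue(stat, eigenvalues, reps=200_000, seed=4):
    """P( sum_j lambda_j z_j^2 > stat ) with z_j iid N(0,1), by simulation."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(eigenvalues, dtype=float)
    draws = rng.standard_normal((reps, lam.size))**2 @ lam  # draws from the mixture
    return float((draws > stat).mean())

print(mixture_pvalue(7.8, [1.0, 0.7, 0.3]))  # the eigenvalues here are illustrative only
```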

Appendix 6.C Estimating the variance of the vector of contrasts

In order to see how (6.31) is found, let us rewrite (6.30) as

\hat\theta^{(2)}_1 - \hat\theta^{(2)} = \{f_1(\theta_0) + F_1(\theta_0)(\hat\theta^{(1)}_1 - \theta_0)\} - \{f(\theta_0) + F(\theta_0)(\hat\theta^{(1)} - \theta_0)\} + o_p(N^{-1}),

for which a variance estimate would involve $\widehat{Var}_c(\hat\theta^{(2)}_1)$ and $\widehat{Var}_c(\hat\theta^{(2)})$ for the variances of the two terms between brackets. Furthermore, the covariance between the two terms needs to be estimated. For this we consider

\{f_1(\theta_0) + F_1(\theta_0)(\hat\theta^{(1)}_1 - \theta_0)\}\{f(\theta_0) + F(\theta_0)(\hat\theta^{(1)} - \theta_0)\}'
 = f_1(\theta_0)f(\theta_0)' + f_1(\theta_0)(\hat\theta^{(1)} - \theta_0)'F(\theta_0)' + F_1(\theta_0)(\hat\theta^{(1)}_1 - \theta_0)f(\theta_0)' + F_1(\theta_0)(\hat\theta^{(1)}_1 - \theta_0)(\hat\theta^{(1)} - \theta_0)'F(\theta_0)'.

Finding expectations for these four terms is relatively easy. The expectation of the first term can be estimated by $\widehat{Var}(\hat\theta^{(2)}_1)$. In order to estimate the expectation of the second term, note that

\hat\theta^{(1)} - \theta_0 = -(C'W_N^{-1}C)^{-1}C'W_N^{-1}\bar g(\theta_0)

and

f_1(\theta_0)(\hat\theta^{(1)} - \theta_0)' = (C_1'W_{11,N}^{-1}(\theta_0)C_1)^{-1}C_1'W_{11,N}^{-1}(\theta_0)\bar g_1(\theta_0)\bar g(\theta_0)'W_N^{-1}C(C'W_N^{-1}C)^{-1}
 = (C_1'W_{11,N}^{-1}(\theta_0)C_1)^{-1}C_1'W_{11,N}^{-1}(\theta_0)A\bar g(\theta_0)\bar g(\theta_0)'W_N^{-1}C(C'W_N^{-1}C)^{-1}.

The expectation can thus be estimated by

\frac{1}{N}(C_1'W_{11,N}^{-1}(\hat\theta^{(1)}_1)C_1)^{-1}C_1'W_{11,N}^{-1}(\hat\theta^{(1)}_1)AW_N(\hat\theta^{(1)})W_N^{-1}C(C'W_N^{-1}C)^{-1}\hat F(\hat\theta^{(1)})',

because $E[\bar g(\theta_0)\bar g(\theta_0)'] = \frac{1}{N}W_N(\theta_0)$. For estimating the third expectation we examine

(\hat\theta^{(1)}_1 - \theta_0)f(\theta_0)' = (C_1'W_{11,N}^{-1}C_1)^{-1}C_1'W_{11,N}^{-1}\bar g_1(\theta_0)\bar g(\theta_0)'W_N^{-1}(\theta_0)C(C'W_N^{-1}(\theta_0)C)^{-1}
 = (C_1'W_{11,N}^{-1}C_1)^{-1}C_1'W_{11,N}^{-1}A\bar g(\theta_0)\bar g(\theta_0)'W_N^{-1}(\theta_0)C(C'W_N^{-1}(\theta_0)C)^{-1}.

Again using that $E[\bar g(\theta_0)\bar g(\theta_0)'] = \frac{1}{N}W_N(\theta_0)$, the third expectation can be estimated by

\frac{1}{N}\hat F_1(\hat\theta^{(1)}_1)(C'W_N^{-1}(\hat\theta^{(1)})C)^{-1}.

The expectation of the last term involves the covariance between the two one-step estimators,

(\hat\theta^{(1)}_1 - \theta_0)(\hat\theta^{(1)} - \theta_0)' = (C_1'W_{11,N}^{-1}C_1)^{-1}C_1'W_{11,N}^{-1}\bar g_1(\theta_0)\bar g(\theta_0)'W_N^{-1}C(C'W_N^{-1}C)^{-1}
 = (C_1'W_{11,N}^{-1}C_1)^{-1}C_1'W_{11,N}^{-1}A\bar g(\theta_0)\bar g(\theta_0)'W_N^{-1}C(C'W_N^{-1}C)^{-1},

and can be estimated by

\frac{1}{N}\hat F_1(\hat\theta^{(1)}_1)(C_1'W_{11,N}^{-1}C_1)^{-1}C_1'W_{11,N}^{-1}AW_N(\hat\theta^{(1)})W_N^{-1}C(C'W_N^{-1}C)^{-1}\hat F(\hat\theta^{(1)})'.

Adding the transposes of the covariances estimated by the four expressions above as well yields (6.31).

203 The vector of contrast examined by H (2) is ˆθ (2) (2) θ 1 = f 1 (ˆθ (1) ) f(ˆθ (1) ) = f 1 (θ 0 ) f(θ 0 )+(F 1 (θ 0 ) F (θ 0 ))(ˆθ (1) θ 0 )+O p (N 1 ), (6.92) of which the variance could be estimated by Var c ( θ (2) 1 ˆθ (2) ) = Var c ( θ (2) 1 )+ Var c (ˆθ (2) ) 2 Var(ˆθ (2) ) 1 N G 1(ˆθ (1) )C 1W 1 11,N (ˆθ (1) )AW N (ˆθ (1) )W 1 N CG ˆF (ˆθ (1) ) where the j th column of F 1 (θ) is and 1 N ˆF (ˆθ (1) )GC W 1 N W N(ˆθ (1) )A W 1 11,N (ˆθ (1) )C 1 G 1 (ˆθ (1) ) 1 N F 1 (ˆθ (1) )G(ˆθ (1) ) 1 N G(ˆθ (1) ) F 1 (ˆθ (1) ) F 1 (ˆθ (1) ) Var r (ˆθ (1) )F (ˆθ (1) ) F (ˆθ (1) ) Var r (ˆθ (1) ) F 1 (ˆθ (1) ), (6.93) F j 1 (θ) =(C 1W 1 11,N (θ)c 1) 1 C 1W 1 N (θ) W N(θ) W 1 (2) N (θ)ḡ( θ 1 ), (6.94) θ j Var c ( θ (2) 1 ) = (2) Var( θ 1 ) + 1 N G 1(ˆθ (1) )C 1 C 1W 1 11,N (ˆθ (1) )AW N (ˆθ (1) )W 1 N CG F 1 (ˆθ (1) ) + 1 N F 1 (ˆθ (1) )GC W 1 N W N(ˆθ (1) )A W 1 11,N (ˆθ (1) )C 1 G 1 (ˆθ (1) ) + F 1 (ˆθ (1) ) Var r (ˆθ (1) ) F 1 (ˆθ (1) ). A corrected version of H (2) then follows from (6.33) 191

Table 6.1: Results for reference case and $x$ strictly exogenous. [Upper panel: full set tests $J^{(2)}$, $D^{(2)}$ and $J^{(2)}_{col}$, computed under the classifications endogeneity, weak exogeneity and strict exogeneity, for $\delta \in \{0,1\}$, $T \in \{4,8\}$ and $\gamma \in \{0.2, 0.5, 0.8\}$. Lower panel: subset tests $J^{(2)}$, $\bar J^{(2)}$, $D^{(2)}$ (all instruments) and $J^{(2)}$, $\bar J^{(2)}$, $H^{(2)}$, $H^{(2)}_c$, $\bar H^{(2)}_c$ (collapsed instruments), testing for endogeneity ($\bar\rho_{x\varepsilon} = 0$) and for weak exogeneity ($\bar\rho_{x\varepsilon_{-1}} = 0$). The tabulated rejection probabilities are not reproduced here.] Design parameter values: $N = 200$, $TM = 1$, $DEN^\eta_y = 1$, $EF^\eta_x = 0$, $EVF_x = 0.3$, $\xi = 0.8$, $\kappa = 0$, $\sigma_\varepsilon = 1$.

Table 6.2: Results for reference case, $x$ strictly exogenous and $N = 100$. [Same layout as Table 6.1; values not reproduced.] Design parameter values: $N = 100$, $TM = 1$, $DEN^\eta_y = 1$, $EF^\eta_x = 0$, $EVF_x = 0.3$, $\xi = 0.8$, $\kappa = 0$, $\sigma_\varepsilon = 1$.

Table 6.3: Results for $DEN^\eta_y = 4$ and $x$ strictly exogenous. [Same layout as Table 6.1; values not reproduced.] Design parameter values: $N = 200$, $TM = 1$, $DEN^\eta_y = 4$, $EF^\eta_x = 0$, $EVF_x = 0.3$, $\xi = 0.8$, $\kappa = 0$, $\sigma_\varepsilon = 1$.

Table 6.4: Results for $\xi = 0.5$ and $x$ strictly exogenous. [Same layout as Table 6.1; values not reproduced.] Design parameter values: $N = 200$, $TM = 1$, $DEN^\eta_y = 1$, $EF^\eta_x = 0$, $EVF_x = 0.3$, $\xi = 0.5$, $\kappa = 0$, $\sigma_\varepsilon = 1$.

Table 6.5: Results for reference case and $x$ weakly exogenous (size corrected). [Same layout as Table 6.1, with subset tests for endogeneity at $\bar\rho_{x\varepsilon} = 0$ and for weak exogeneity at $\bar\rho_{x\varepsilon_{-1}} = 0.2$; values not reproduced.] Design parameter values: $N = 200$, $TM = 1$, $DEN^\eta_y = 1$, $EF^\eta_x = 0$, $EVF_x = 0.3$, $\xi = 0.8$, $\kappa = 0$, $\sigma_\varepsilon = 1$.

Table 6.6: Results for reference case and $x$ endogenous ($\bar\rho_{x\varepsilon} = 0.2$, size corrected). [Same layout, with subset tests at $\bar\rho_{x\varepsilon} = 0.2$ and $\bar\rho_{x\varepsilon_{-1}} = 0$; values not reproduced.] Design parameter values: $N = 200$, $TM = 1$, $DEN^\eta_y = 1$, $EF^\eta_x = 0$, $EVF_x = 0.3$, $\xi = 0.8$, $\kappa = 0$, $\sigma_\varepsilon = 1$.

Table 6.7: Results for reference case and $x$ endogenous ($\bar\rho_{x\varepsilon} = 0.2$, $\bar\rho_{x\varepsilon_{-1}} = 0.2$, size corrected). [Same layout as Table 6.1; values not reproduced.] Design parameter values: $N = 200$, $TM = 1$, $DEN^\eta_y = 1$, $EF^\eta_x = 0$, $EVF_x = 0.3$, $\xi = 0.8$, $\kappa = 0$, $\sigma_\varepsilon = 1$.

Table 6.8: Results for $\xi = 0.5$ and $x$ endogenous (size corrected). [Same layout, with subset tests at $\bar\rho_{x\varepsilon} = 0.2$ and $\bar\rho_{x\varepsilon_{-1}} = 0.2$; values not reproduced.] Design parameter values: $N = 200$, $TM = 1$, $DEN^\eta_y = 1$, $EF^\eta_x = 0$, $EVF_x = 0.3$, $\xi = 0.5$, $\kappa = 0$, $\sigma_\varepsilon = 1$.

Table 6.9: Empirical findings: a model of crime. [Columns: specifications (1)-(3), each estimated with the extended (ext) and collapsed (col) instrument sets. Rows: $CR_{-1}$, $P^a$, $P^a_{-1}$, $Pol$, $Pol_{-1}$, $P^c$, $P^p$, $Sen$, followed by $L$, the $J^{(2)}$ p-value and the AR(1) and AR(2) test outcomes; coefficient estimates and standard errors not reproduced.] Bold: treated as endogenous; italics: weakly exogenous; normal font: strictly exogenous.

Table 6.10: Empirical findings: a model of crime, testing subsets. [Columns: specifications (4)-(6), each with extended (ext) and collapsed (col) instrument sets. Rows: $CR_{-1}$, $P^a$, $Pol$, $Pol_{-1}$, $P^c$, $P^p$, $Sen$, followed by $L$, the $J^{(2)}$ and $\bar J^{(2)}$ p-values and the AR(1) and AR(2) test outcomes; values not reproduced.] Bold: treated as endogenous; italics: weakly exogenous; normal font: strictly exogenous.

Bibliography

Ahn, S. C. (1997): Orthogonality Tests in Linear Models, Oxford Bulletin of Economics and Statistics, 59.

Ahn, S. C. and P. Schmidt (1995): Efficient Estimation of Models for Dynamic Panel Data, Journal of Econometrics, 68.

Alonso-Borrego, C. and M. Arellano (1999): Symmetrically Normalized Instrumental-Variable Estimation Using Panel Data, Journal of Business & Economic Statistics, 17.

Alvarez, J. and M. Arellano (2003): The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators, Econometrica, 71.

Anderson, T. W. and C. Hsiao (1981): Estimation of Dynamic Models With Error Components, Journal of the American Statistical Association, 76.

Anderson, T. W. and H. Rubin (1949): Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations, The Annals of Mathematical Statistics, 20.

Andrews, D. W. K. and B. Lu (2001): Consistent Model and Moment Selection Procedures for GMM Estimation With Application to Dynamic Panel Data Models, Journal of Econometrics, 101.

Arellano, M. (2003): Panel Data Econometrics, Oxford University Press.

Arellano, M. and S. Bond (1991): Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations, The Review of Economic Studies, 58.

Arellano, M. and O. Bover (1995): Another Look at the Instrumental Variable Estimation of Error-Components Models, Journal of Econometrics, 68.

Baltagi, B. H., E. Bratberg, and T. H. Holmås (2005): A Panel Data Study of Physicians' Labor Supply: The Case of Norway, Health Economics, 14.

Basmann, R. L. (1960): On Finite Sample Distributions of Generalized Classical Linear Identifiability Test Statistics, Journal of the American Statistical Association, 55.

Baum, C. F., M. E. Schaffer, and S. Stillman (2003): Instrumental Variables and GMM: Estimation and Testing, Stata Journal, 3.

Bazzi, S. and M. A. Clemens (2013): Blunt Instruments: Avoiding Common Pitfalls in Identifying the Causes of Economic Growth, American Economic Journal: Macroeconomics, 5.

Blundell, R. and S. Bond (1998): Initial Conditions and Moment Restrictions in Dynamic Panel Data Models, Journal of Econometrics, 87.

Blundell, R., S. Bond, and F. Windmeijer (2001): Estimation in Dynamic Panel Data Models: Improving on the Performance of the Standard GMM Estimator, Advances in Econometrics, 15.

Bond, S. and F. Windmeijer (2005): Reliable Inference for GMM Estimators? Finite Sample Properties of Alternative Test Procedures in Linear Panel Data Models, Econometric Reviews, 24.

Bound, J., D. A. Jaeger, and R. M. Baker (1995): Problems With Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable Is Weak, Journal of the American Statistical Association, 90.

Bowsher, C. G. (2002): On Testing Overidentifying Restrictions in Dynamic Panel Data Models, Economics Letters, 77.

Bun, M. J. G. (2015): Identifying the Impact of Deterrence on Crime: Internal Versus External Instruments, Applied Economics Letters, 22.

Bun, M. J. G. and M. A. Carree (2005a): Bias-Corrected Estimation in Dynamic Panel Data Models, Journal of Business & Economic Statistics, 23.

Bun, M. J. G. and M. A. Carree (2005b): Correction: Bias-Corrected Estimation in Dynamic Panel Data Models, Journal of Business & Economic Statistics, 23.

Bun, M. J. G. and M. A. Carree (2006): Bias-Corrected Estimation in Dynamic Panel Data Models With Heteroscedasticity, Economics Letters, 92.

Bun, M. J. G. and J. F. Kiviet (2006): The Effects of Dynamic Feedbacks on LS and MM Estimator Accuracy in Panel Data Models, Journal of Econometrics, 132.

Bun, M. J. G. and V. Sarafidis (2015): Dynamic Panel Data Models, in The Oxford Handbook of Panel Data, ed. by B. H. Baltagi, Oxford University Press.

Cameron, A. C. and P. K. Trivedi (2005): Microeconometrics: Methods and Applications, Cambridge University Press.

211 Caner, M., X. Han, and Y. Lee (2013): Adaptive Elastic Net GMM Estimator with Many nvalid Moment Conditions: A Simultaneous Model and Moment Selection, Mimeo. Chao, J. C., J. A. Hausman, W. K. Newey, N. R. Swanson, and T. Woutersen (2014): Testing Overidentifying Restrictions With Many nstruments and Heteroskedasticity, Journal of Econometrics, 178, Chmelarova, V. and R. C. Hill (2010): The Hausman Pretest Estimator, Economics Letters, 108, Cornwell, C. and W. N. Trumbull (1994): Estimating the Economic Model of Crime With Panel Data, The Review of Economics and Statistics, 76, Davidson, R. and J. G. MacKinnon (1989): Testing for Consistency Using Artificial Regressions, Econometric Theory, 5, (1990): Specification Tests Based on Artificial Regressions, Journal of the American Statistical Association, 85, (1993): Estimation and nference in Econometrics, Oxford University Press. (2014): Bootstrap Tests for Overidentification in Linear Regression Models, Working Papers 1318, Queen s University, Department of Economics. De Blander, R. (2008): Which Null Hypothesis do Overidentification Restrictions Actually Test? Economics Bulletin, 3, 1 9. Dhaene, G. and K. Jochmans (2012): An Adjusted Profile Likelihood for Non- Stationary Panel Data Models With Fixed Effects, Mimeo. Doko Tchatoka, F. (2014): On bootstrap validity for specification tests with weak instruments, Forthcoming in The Econometrics Journal. Durbin, J. (1954): Errors in Variables, Review of the nternational Statistical nstitute, 22, Everaert, G. (2013): Orthogonal to Backward Mean Transformation for Dynamic Panel Data Models, The Econometrics Journal, 16, Flannery, M. J. and K. W. Hankins (2013): Estimating Dynamic Panel Models in Corporate Finance, Journal of Corporate Finance, 19, Godfrey, L. G. and J. P. Hutton (1994): Discriminating Between Errors-in- Variables/Simultaneity and Misspecification in Linear Regression Models, Economics Letters, 44, Gouriroux, C., P. C. B. Phillips, and J. Yu (2010): ndirect nference for Dynamic Panel Models, Journal of Econometrics, 157,

212 Griliches, Z. (1976): Wages of Very Young Men, Journal of Political Economy, 84, S69 S86. Guggenberger, P. (2010): The mpact of a Hausman Pretest on the Asymptotic Size of a Hypothesis Test, Econometric Theory, 26, Hahn, J., J. C. Ham, and H. R. Moon (2011): The Hausman Test and Weak nstruments, Journal of Econometrics, 160, Hahn, J., J. Hausman, and G. Kuersteiner (2007): Long Difference nstrumental Variables Estimation for Dynamic Panel Models With Fixed Effects, Journal of Econometrics, 140, Hahn, J. and G. Kuersteiner (2002): Asymptotically Unbiased nference for a Dynamic Panel Model with Fixed Effects When Both n and T Are Large, Econometrica, 70, Hall, A. R. (2005): Generalized Method of Moments, Oxford University Press Oxford. Han, C. and P. C. B. Phillips (2013): First Difference Maximum Likelihood and Dynamic Panel Estimation, Journal of Econometrics, 175, Hansen, L. P. (1982): Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, Harris, M. N., W. Kostenko, L. Mátyás, and. Timol (2009): The Robustness of Estimators for Dynamic Panel Data Models to Misspecification, The Singapore Economic Review, 54, Hausman, J. A. (1978): Specification Tests in Econometrics, Econometrica, 46, Hausman, J. A., W. K. Newey, T. Woutersen, J. C. Chao, and N. R. Swanson (2012): nstrumental Variable Estimation With Heteroskedasticity and Many nstruments, Quantitative Economics, 3, Hausman, J. A. and W. E. Taylor (1981): A Generalized Specification Test, Economics Letters, 8, Hayakawa, K. (2009): On the Effect of Mean-Nonstationarity in Dynamic Panel Data Models, Journal of Econometrics, 153, (2010): The Effects of Dynamic Feedbacks on LS and MM Estimator Accuracy in Panel Data Models: Some Additional Results, Journal of Econometrics, 159, (2014): Alternative Over-dentifying Restriction Test in GMM Estimation of Panel Data Models, Mimeo. Hayashi, F. (2000): Econometrics, Princeton University Press. 200

213 Holly, A. (1982): A Remark on Hausman s Specification Test, Econometrica, 50, Holtz-Eakin, D., W. Newey, and H. S. Rosen (1988): Estimating Vector Autoregressions With Panel Data, Econometrica, 56, Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu (2002): Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods, Journal of Econometrics, 109, Hwang, H.-S. (1980a): A Comparison of Tests of Overidentifying Restrictions, Econometrica, 48, (1980b): Test of ndependence Between a Subset of Stochastic Regressors and Disturbances, nternational Economic Review, 21, (1985): The Equivalence of Hausman and Lagrange Multiplier Tests of ndependence Between Disturbance and a Subset of Stochastic Regressors, Economics Letters, 17, mhof, J. P. (1961): Computing the Distribution of Quadratic Forms in Normal Variables, Biometrika, 48, Jeong, J. and B. H. Yoon (2010): The Effect of Pseudo-Exogenous nstrumental Variables on Hausman Test, Communications in Statistics - Simulation and Computation, 39, Juodis, A. (2013): A Note on Bias-Corrected Estimation in Dynamic Panel Data Models, Economics Letters, 118, Kelaher, R. and V. Sarafidis (2011): Crime and Punishment Revisited, MPRA Paper 28213, University Library of Munich. Kiviet, J. F. (1985): Model Selection Test Procedures in a Single Linear Equation of a Dynamic Simultaneous System and Their Defects in Small Samples, Journal of Econometrics, 28, (1995): On Bias, nconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models, Journal of Econometrics, 68, (1999): Expectations of Expansions for Estimators in a Dynamic Panel Data Model: Some Results for Weakly-Exogenous Regressors, in Analysis of Panels and Limited Dependent Variables, ed. by C. Hsiao, K. Lahiri, L. F. Lee, and M. H. Pesaran, Cambridge University Press, (2007): Judging Contending Estimators by Simulation: Tournaments in Dynamic Panel Data Models, in The Refinement of Econometric Estimation and Test Procedures; Finite Sample and Asymptotic Analysis, ed. by G. D. A. Phillips and E. Tzavalis, Cambridge University Press,

214 (2012): Monte Carlo Simulation for Econometricians, Now Publishers. (2013): dentification and nference in a Simultaneous Equation Under Alternative nformation Sets and Sampling Schemes, The Econometrics Journal, 16, S24 S59. Kiviet, J. F. and Q. Feng (2014): Efficiency Gains by Modifying GMM Estimation in Linear Models under Heteroskedasticity, Economic Growth Centre Working Paper Series 1413, Nanyang Technological University, School of Humanities and Social Sciences, Economic Growth Centre. Kiviet, J. F. and G. D. A. Phillips (2012): Higher-Order Asymptotic Expansions of the Least-Squares Estimation Bias in First-Order Dynamic Regression Models, Computational Statistics & Data Analysis, 56, Kiviet, J. F. and M. Pleus (2014): The Performance of Tests on Endogeneity of Subsets of Explanatory Variables Scanned by Simulation, Economic Growth Centre Working Paper Series 1208, Nanyang Technological University, School of Humanities and Social Sciences, Economic Growth Centre. Kiviet, J. F., M. Pleus, and R. W. Poldermans (2014): Accuracy and Efficiency of Various GMM nference Techniques in Dynamic Micro Panel Data Models, UvA- Econometrics Working Papers 14-09, Universiteit van Amsterdam, Dept. of Econometrics. Kleibergen, F. (2002): Pivotal Statistics for Testing Structural Parameters in nstrumental Variables Regression, Econometrica, 70, (2005): Testing Parameters in GMM Without Assuming that They Are dentified, Econometrica, 73, Kripfganz, S. and C. Schwarz (2013): Estimation of Linear Dynamic Panel Data Models With Time-nvariant Regressors, Mimeo. Kruiniger, H. (2008): Maximum Likelihood Estimation and nference Methods for the Covariance Stationary Panel AR(1)/Unit Root Model, Journal of Econometrics, 144, Lee, Y. and R. Okui (2009): A Specification Test for nstrumental Variables Regression with Many nstruments, Cowles Foundation Discussion Papers 1741, Cowles Foundation for Research in Economics, Yale University. Magdalinos, M. A. (1985): mproving Some nstrumental Variables Test Procedures, Econometric Theory, 1, (1994): Testing nstrument Admissibility: Some Refined Asymptotic Results, Econometrica, 62, Meepagala, G. (1992): On the Finite Sample Performance of Exogeneity Tests of Revankar, Revankar and Hartley and Wu-Hausman, Econometric Reviews, 11,

215 Moral-Benito, E. (2013): Likelihood-Based Estimation of Dynamic Panels With Predetermined Regressors, Journal of Business & Economic Statistics, 31, Moreira, M. J. (2003): A Conditional Likelihood Ratio Test for Structural Models, Econometrica, 71, Nakamura, A. and M. Nakamura (1981): On the Relationships Among Several Specification Error Tests Presented by Durbin, Wu, and Hausman, Econometrica, 49, (1985): On the Performance of Tests by Wu and by Hausman for Detecting the Ordinary Least Squares Bias Problem, Journal of Econometrics, 29, Nelson, C. R. and R. Startz (1990a): The Distribution of the nstrumental Variables Estimator and ts t-ratio When the nstrument is a Poor One, The Journal of Business, 63, S125 S140. (1990b): Some Further Results on the Exact Small Sample Properties of the nstrumental Variable Estimator, Econometrica, 58, Newey, W. K. (1985): Generalized Method of Moments Specification Testing, Journal of Econometrics, 29, Newey, W. K. and R. J. Smith (2004): Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators, Econometrica, 72, Newey, W. K. and K. D. West (1987): Hypothesis Testing With Efficient Method of Moments Estimation, nternational Economic Review, 28, Newey, W. K. and F. Windmeijer (2009): Generalized Method of Moments With Many Weak Moment Conditions, Econometrica, 77, Niemczyk, J. (2009): Consequences and Detection of nvalid Exogeneity Conditions, Ph.D. thesis. Okui, R. (2009): The Optimal Choice of Moments in Dynamic Panel Data Models, Journal of Econometrics, 151, Parente, P. M. D. C. and J. M. C. S. Silva (2012): A Cautionary Note on Tests of Overidentifying Restrictions, Economics Letters, 115, Pesaran, M. H. and R. J. Smith (1990): A Unified Approach to Estimation and Orthogonality Tests in Linear Single-Equation Econometric Models, Journal of Econometrics, 44, Pitman, E. J. G. (1949): Notes on Non-Parametric Statistical nference, Columbia University, New York, NY. Revankar, N. S. (1978): Asymptotic Relative Efficiency Analysis of Certain Test of ndependence in Structural Systems, nternational Economic Review, 19,

216 Revankar, N. S. and M. J. Hartley (1973): An ndependence Test and Conditional Unbiased Predictions in the Context of Simultaneous Equation Systems, nternational Economic Review, 14, Roodman, D. (2009): A Note on the Theme of Too Many nstruments, Oxford Bulletin of Economics and Statistics, 71, Ruud,P.A.(1984): Tests of Specification in Econometrics, Econometric Reviews, 3, (2000): An ntroduction to Classical Econometric Theory, Oxford University Press. Sargan, J. D. (1958): The Estimation of Economic Relationships Using nstrumental Variables, Econometrica, 26, Smith, R. J. (1983): On the Classical Nature of the Wu-Hausman Statistics for the ndependence of Stochastic Regressors and Disturbance, Economics Letters, 11, (1984): A Note on Likelihood Ratio Tests for the ndependence Between a Subset of Stochastic Regressors and Disturbances, nternational Economic Review, 25, (1985): Wald Tests for the ndependence of Stochastic Variables and Disturbance of a Single Linear Stochastic Simultaneous Equation, Economics Letters, 17, Spencer, D. E. and K. N. Berk (1981): A Limited nformation Specification Test, Econometrica, 49, (1982): A Limited nformation Specification Test, Econometrica, 50, Staiger, D. and J. H. Stock (1997): nstrumental Variables Regression With Weak nstruments, Econometrica, 65, Stock, J. H., J. H. Wright, and M. Yogo (2002): A Survey of Weak nstruments and Weak dentification in Generalized Method of Moments, Journal of Business & Economic Statistics, 20, Thurman, W. N. (1986): Endogeneity Testing in a Supply and Demand Framework, The Review of Economics and Statistics, 68, Windmeijer, F. (2005): A Finite Sample Correction for the Variance of Linear Efficient Two-Step GMM Estimators, Journal of Econometrics, 126, Wong, K.-f. (1996): Bootstrapping Hausman s Exogeneity Test, Economics Letters, 53, Wu, D.-M. (1973): Alternative Tests of ndependence Between Stochastic Regressors and Disturbances, Econometrica, 41,

217 (1974): Alternative Tests of ndependence Between Stochastic Regressors and Disturbances: Finite Sample Results, Econometrica, 42, (1983): A Remark on a Generalized Specification Test, Economics Letters, 11, Ziliak, J. P. (1997): Efficient Estimation With Panel Data When nstruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators, Journal of Business & Economic Statistics, 15,

218

Summary

Economics is, in essence, not an experimental science, which entails challenges that make econometrics an indispensable tool. Econometrics offers solutions to many problems faced by practitioners who wish to provide policy advice based on empirical evidence. One of these problems lies at the heart of the subject of this thesis. Before that subject can be discussed, some concepts need to be established.

It is often the goal of an economist to investigate a causal relationship. The variables by which one aims to explain another variable in an economic relationship are called explanatory variables. As the explanatory variables can never perfectly explain the so-called dependent variable, the model also includes an unexplained part, referred to as the disturbance term. Variables that are not correlated with the disturbance term are labeled exogenous. If all explanatory variables are exogenous, the causal relationship can be estimated by Ordinary Least Squares (OLS). However, especially when the relationship involves policy measures and their target, the assumption of exogeneity is often violated. An example occurs when investigating the impact of foreign aid on the growth of the gross national product (GNP) of the receiving country. In determining the size of the foreign aid, the current growth of the GNP is often taken into account. Hence, the dependent and the explanatory variable are partly jointly determined, and consequently the explanatory variable and the disturbance term will be correlated. An explanatory variable that is correlated with the disturbance term is called endogenous. Endogeneity of one or more explanatory variables renders the OLS estimator inconsistent, i.e. the estimation error does not disappear when the sample grows large.

Fortunately, alternative estimators exist, one of which is the instrumental variables (IV) estimator. This estimation technique allows for endogeneity of explanatory variables as long as at least as many non-explanatory instrumental variables are available. As their name suggests, these instrumental variables are not allowed to have direct explanatory power in the causal relationship of interest. If they do have explanatory power and are therefore wrongly left out of the relationship, the instrumental variables are invalid; they are then also endogenous with respect to the disturbance term. The non-explanatory instrumental variables must satisfy another requirement: they must be sufficiently correlated with the endogenous explanatory variables. When they are highly correlated the instrumental variables are called strong, whereas they are called weak when the correlation is low. Hence, the instrumental variables should not be weak. The full set of instrumental variables consists of the non-explanatory instrumental variables and the exogenous explanatory variables.

The variance of the IV estimator is determined by the correlation between the (non-explanatory) instruments and the endogenous explanatory variables: the higher the correlation, the lower the variance. If all explanatory variables are assumed to be exogenous, each explanatory variable can be used as an instrumental variable for itself, yielding a perfect correlation; in this specific case the OLS and IV estimators coincide. Treating explanatory variables as endogenous while they are actually exogenous therefore results in a loss of efficiency, whereas wrongfully treating an endogenous explanatory variable as exogenous yields an inconsistent estimator.
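This contrast can be made concrete with a small Monte Carlo experiment. The following sketch (an illustration of my own in Python/NumPy, not code from the thesis; all design choices such as sample size and coefficients are assumptions) generates a single endogenous regressor with one valid external instrument and compares the OLS and IV slope estimates.

```python
# Minimal simulation sketch: OLS is inconsistent under endogeneity, IV is not.
import numpy as np

rng = np.random.default_rng(0)
n, reps, beta = 200, 2000, 1.0
ols_est, iv_est = [], []
for _ in range(reps):
    z = rng.standard_normal(n)                       # exogenous instrument
    u = rng.standard_normal(n)                       # disturbance term
    x = 0.5 * z + 0.8 * u + rng.standard_normal(n)   # endogenous: corr(x, u) > 0
    y = beta * x + u
    ols_est.append((x @ y) / (x @ x))                # OLS slope (no intercept)
    iv_est.append((z @ y) / (z @ x))                 # simple IV slope
print(f"true beta: {beta}")
print(f"mean OLS : {np.mean(ols_est):.3f}")          # clearly biased upwards
print(f"mean IV  : {np.mean(iv_est):.3f}")           # close to the true value
```

Shrinking the coefficient on z in the equation for x makes the instrument weak, and the IV estimates then become erratic; this is the weak-instrument problem that recurs throughout the thesis.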
The focus of this thesis is on the performance of techniques used to classify variables with respect to their correlation with the disturbance term. These techniques can be applied to test the exogeneity of potentially endogenous explanatory variables and, possibly jointly, the validity of non-explanatory instrumental variables. They are of great importance to practitioners as they provide guidance regarding important properties of estimators of causal effects. This family of tests can be divided into two types. The first type can be used to test a full set of variables, for example all potentially endogenous explanatory variables or all non-explanatory instrumental variables. The second type can be applied to subsets of variables. It is in particular the subset tests that are of interest here. The first part of this thesis examines the behaviour of these tests for data with no time dimension (cross-section data), whereas the second part examines their use in linear dynamic panel data models, which also include a time dimension.

In Chapter 2 various tests on the exogeneity of arbitrary subsets of explanatory variables are motivated and their performance is compared in a series of simulation experiments. It is found that genuine subset tests play an indispensable part in a comprehensive sequential strategy to classify explanatory variables as either endogenous or exogenous. Tests on the full set of potentially endogenous explanatory variables have a high probability of wrongly classifying an exogenous explanatory variable as endogenous if it is merely correlated with an endogenous explanatory variable. The performance of the various tests can be improved substantially by applying the so-called bootstrap, although its advantages disappear as soon as the non-explanatory instruments are weak. The bootstrap version of the Wald-type test, which estimates the variances under the assumption of endogeneity of the variables under investigation, often behaves most favorably. When bootstrapped, the subset and full-set tests can jointly be used fruitfully to classify individual explanatory variables and groups of them as either exogenous or endogenous. Using these findings, a popular study on the effect of education on wage is re-examined.
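One standard way to implement such an exogeneity test is the regression-based Durbin-Wu-Hausman (control-function) variant. The sketch below is a textbook-style illustration of my own for a single potentially endogenous regressor under homoskedasticity; it is not the specific implementations compared in Chapter 2.

```python
import numpy as np

def dwh_tstat(y, x, Z):
    """Control-function exogeneity test for one potentially endogenous x.

    First stage: regress x on a constant and the instruments Z, keep the
    residual vhat. Then run y on (1, x, vhat); under H0 (x exogenous) the
    t-statistic on vhat is asymptotically standard normal.
    """
    n = len(y)
    Zc = np.column_stack([np.ones(n), Z])
    vhat = x - Zc @ np.linalg.lstsq(Zc, x, rcond=None)[0]
    X = np.column_stack([np.ones(n), x, vhat])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (n - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
    return b[2] / se
```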
Various implementations of tests on the validity of non-explanatory instrumental variables are examined in Chapter 3. In this context the full-set tests are often called overidentifying restrictions tests. These and their subset versions are examined, as well as two corresponding Hausman-type test statistics. Recently several studies have highlighted that overidentifying restrictions tests are not always able to detect invalid instruments. The reasons for this are further clarified and extended to tests on a subset of non-explanatory instrumental variables. To derive the distribution of a test statistic it is often necessary to assume that the sample size grows to infinity. Taking the actual sample size into account is nevertheless possible, for instance by deriving a higher-order Cornish-Fisher correction. The effectiveness of this correction for the most popular overidentifying restrictions test is re-examined, as it is hardly ever applied in practice. The correction terms show that the uncorrected overidentifying restrictions tests depend on several aspects when the sample size is small: their reliability decreases as the correlation between the explanatory variables and the disturbance term increases, and as the degree of overidentification increases. The degree of overidentification is the difference between the number of non-explanatory instrumental variables and the number of endogenous explanatory variables. The role that these factors play is confirmed by the simulation results, which also illustrate that the corrected test statistic performs better than its uncorrected counterpart, unless the instruments are weak. The same dependence on the correlation between the explanatory variables and the disturbance term is found for the subset tests. Using these subset tests implies an additional assumption, namely that all instrumental variables which are not tested are valid. It is shown that a violation of this maintained hypothesis has severe consequences for the reliability of these tests.
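For concreteness, here is a hedged sketch of the most popular overidentifying restrictions test (the Sargan statistic) together with an incremental version for a suspect subset of instruments; a textbook-style implementation of my own under homoskedasticity, not the corrected statistics developed in Chapter 3.

```python
import numpy as np
from scipy import stats

def sargan(y, X, Z):
    """Sargan statistic n * e'P_Z e / e'e based on 2SLS residuals e."""
    n = len(y)
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)             # projection on instruments
    b = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)    # 2SLS coefficients
    e = y - X @ b
    J = n * (e @ Pz @ e) / (e @ e)
    df = Z.shape[1] - X.shape[1]                       # degree of overidentification
    return J, df, stats.chi2.sf(J, df)

def incremental_sargan(y, X, Z0, Z1):
    """Test the validity of subset Z1 while maintaining validity of Z0.

    Z0 alone must identify the model. Under H0 the difference of the two
    Sargan statistics is asymptotically chi-squared with Z1.shape[1] df.
    """
    J_full = sargan(y, X, np.column_stack([Z0, Z1]))[0]
    J_maint = sargan(y, X, Z0)[0]
    df = Z1.shape[1]
    return J_full - J_maint, df, stats.chi2.sf(J_full - J_maint, df)
```

The maintained hypothesis discussed above enters through Z0: if any of its columns is itself invalid, the incremental statistic no longer has its stated null distribution.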
The next two chapters study the performance of various generalized method of moments (GMM) inference techniques in linear dynamic panel data models when just a few time-series observations are available. GMM is a general class of estimators, including both OLS and IV as special cases, that is more flexible regarding properties of the disturbance term. It is for instance possible to take heteroskedasticity into account, i.e. the possibility that the variance of the disturbance term varies over time and per individual. A major advantage of panel data over single-indexed data is the possibility to deal with unobserved explanatory variables that are constant over time. These unobserved explanatory variables are often called individual effects. As they are unobserved, the individual effects are part of the disturbance term; using a transformation they can be removed from the data. Whereas finding suitable instruments is often a burdensome task when working with cross-section data, panel data offers another advantage here: non-explanatory instrumental variables may be directly available as lags of the explanatory variables. However, these instrumental variables are not necessarily strong.
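How lags of the variables act as instruments is easiest to see in the first-differenced AR(1) panel model, where the equation for period t can be instrumented by the levels dated t-2 and earlier. The sketch below (a stylized construction of my own, not the thesis code) builds this familiar block-diagonal instrument matrix for one individual.

```python
import numpy as np

def ab_instruments(y_levels):
    """Instrument block for one individual; y_levels holds y_i0, ..., y_iT.

    Equation t (in first differences) is instrumented by y_i0, ..., y_i,t-2,
    giving T(T-1)/2 moment conditions in total.
    """
    T = len(y_levels) - 1
    q = T * (T - 1) // 2
    rows = []
    for t in range(2, T + 1):                        # differenced equations t = 2..T
        z = np.zeros(q)
        start = (t - 2) * (t - 1) // 2
        z[start:start + t - 1] = y_levels[:t - 1]    # instruments y_0, ..., y_{t-2}
        rows.append(z)
    return np.vstack(rows)

print(ab_instruments(np.arange(5.0)))                # T = 4: 3 equations, 6 columns
```

The column count grows quadratically with the number of time periods, which is what makes instrument reduction attractive in short panels.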
Chapter 4 provides the theoretical framework for the simulation study of Chapter 5. It is found that when the cross-section sample size is only moderate, the reliability of this type of inference depends on several aspects, including the time-dimension sample size, the speed of dynamic adjustment, the type and severity of heteroskedasticity, the relative magnitude of the variance of the unobserved individual effects, the presence of any endogenous explanatory variables, and the (non)stationarity of any explanatory variables. A specific implementation of a subset test is investigated for testing the validity of a particular group of non-explanatory instruments. This subset of instruments is only valid under the assumption that the correlation between the explanatory variables and the individual effects is constant over time, a phenomenon called effect stationarity. Remarkably, the subset test for effect stationarity tends to direct the researcher towards applying the most accurate estimator, even when that estimator is inconsistent. Although this may seem strange, an inconsistent estimator may well have a much smaller variance than the consistent one, making it more accurate overall. Using the conclusions drawn from the simulations, a study on labor supply is revisited.

An important conclusion of Chapter 5 is that the performance of standard GMM inference techniques deteriorates in the presence of a genuinely endogenous explanatory variable, or one superfluously treated as endogenous. Therefore Chapter 6 investigates to what extent subset tests can be used to classify explanatory variables in dynamic panel data models when all time-varying explanatory variables are correctly included in the model for the causal relationship of interest. As the time-series dimension of the panel allows for a further distinction, the explanatory variables are classified as strictly exogenous, weakly exogenous or endogenous. Strict exogeneity means that an explanatory variable is uncorrelated with all disturbance terms over time; weakly exogenous explanatory variables may be correlated with past disturbance terms; and, as before, endogeneity means that an explanatory variable is correlated with the current disturbance term. In theory, subset tests allow the researcher to abstain both from using invalid instruments and from discarding valid strong instruments, as happens when a strictly or weakly exogenous explanatory variable is treated as endogenous. Only the validity of certain lagged variables, when used as instruments, has to be examined in order to classify an explanatory variable. Several new test statistics are proposed. Two of these are Hausman-type test statistics, which exploit a finite-sample corrected variance estimate. Regarding control over the probability of rejecting a true hypothesis, it is found to be beneficial to estimate the variances under the tested hypothesis. Collapsing, a particular way to reduce the number of instruments, is also found to be advantageous when many instruments are available, although reducing the number of instruments this way does not always benefit the ability of the tests to reject a false hypothesis. The corrected Hausman statistics uniformly outperform the standard Hausman implementation. However, all Hausman-type tests are only applicable if the number of instruments is reduced by collapsing. A subset version of the most popular overidentifying restrictions test is found to behave almost as well as the best-performing Hausman test, while allowing for more general instrument structures. These findings are used to revisit a classic study on the effect of deterrence on crime.
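To make the collapsing idea concrete, the sketch below (again a stylized illustration of my own under the AR(1) setup used earlier) keeps one instrument column per lag depth instead of one per period-lag pair, so the number of columns grows linearly rather than quadratically in T.

```python
import numpy as np

def ab_instruments_collapsed(y_levels):
    """Collapsed instrument block: one column per lag depth."""
    T = len(y_levels) - 1
    rows = []
    for t in range(2, T + 1):
        z = np.zeros(T - 1)
        z[:t - 1] = y_levels[t - 2::-1]   # y_{t-2}, y_{t-3}, ..., y_0
        rows.append(z)
    return np.vstack(rows)

print(ab_instruments_collapsed(np.arange(5.0)))   # T = 4: 3 columns instead of 6
```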

Samenvatting (Summary in Dutch)

Dat economie in essentie geen experimentele wetenschap is, maakt de econometrie tot een onmisbaar instrument. Econometrie biedt oplossingen voor veel problemen die zich voordoen wanneer een onderzoeker beleidsadvies wil geven op basis van empirisch bewijs. Een van deze problemen ligt ten grondslag aan het onderwerp van dit proefschrift. Alvorens dit te kunnen bespreken moet eerst een aantal zaken worden neergezet. Vaak is het de bedoeling van een econoom om een causaal verband te onderzoeken. De variabelen waarmee men in een economisch model een andere variabele tracht te verklaren, noemen we verklarende variabelen. Omdat de verklarende variabelen nooit de zogenoemde afhankelijke variabele volledig kunnen verklaren, bevat het model ook een onverklaard stuk, de storingsterm. Variabelen die niet gecorreleerd zijn met de storingsterm noemen we exogeen. Wanneer alle verklarende variabelen exogeen zijn, kan het causale verband onderzocht worden aan de hand van de kleinste kwadraten (OLS) schatter. Zeker wanneer het een onderzoek naar de effectiviteit van beleid betreft, zal de aanname van exogeniteit niet altijd standhouden. Een voorbeeld hiervan is wanneer men de impact van ontwikkelingshulp op de groei van het bruto binnenlands product (BBP) van het ontvangende land wil onderzoeken. Bij het vaststellen van de hoogte van de ontwikkelingshulp wordt onder meer gekeken naar de huidige groei van het BBP. Ofwel, de afhankelijke en de verklarende variabele worden deels simultaan bepaald. Als gevolg zal de verklarende variabele gecorreleerd zijn met de storingsterm. Wanneer een variabele gecorreleerd is met de storingsterm noemen we deze endogeen. Endogeniteit van een of meer verklarende variabelen zorgt ervoor dat de OLS schatter niet meer consistent is, i.e. de schattingsfout verdwijnt niet wanneer de steekproef oneindig groot wordt. Gelukkig bestaan er alternatieve schattingstechnieken zoals de instrumentele variabelen (IV) schatter. Deze techniek staat endogeniteit van verklarende variabelen toe mits er minstens evenveel niet-verklarende instrumentele variabelen beschikbaar zijn. Zoals de naam al suggereert mogen deze instrumentele variabelen de afhankelijke variabele niet direct verklaren. Indien dit wel het geval is en de instrumentele variabelen ten onrechte niet als verklarende variabelen worden gebruikt, zullen de instrumentele variabelen ongeldig zijn. Ze zijn dan ook endogeen met betrekking tot de storingsterm. De niet-verklarende instrumentele variabelen moeten aan nog een voorwaarde voldoen: ze moeten voldoende gecorreleerd zijn met de verklarende variabelen die endogeen zijn. Wanneer de correlatie hoog is noemen we de instrumentele variabelen sterk, terwijl ze bij een lage correlatie zwak worden genoemd. De instrumentele variabelen mogen dus niet zwak zijn. De volledige set van instrumentele variabelen bestaat uit de niet-verklarende instrumentele variabelen en de exogene verklarende variabelen. De nauwkeurigheid van de IV schatter wordt bepaald door de correlatie tussen de (niet-verklarende) instrumentele variabelen en de endogene verklarende variabelen. Hoe sterker de correlatie, hoe nauwkeuriger de schatter. Wanneer alle verklarende variabelen verondersteld worden exogeen te zijn, dan kunnen alle verklarende variabelen als instrument voor zichzelf worden gebruikt, hetgeen resulteert in een perfecte correlatie. In dit specifieke geval zal de IV schatter gelijk zijn aan de OLS schatter. Verklarende variabelen als endogeen behandelen terwijl ze exogeen zijn gaat dus ten koste van de nauwkeurigheid van de schatter. Ze ten onrechte als exogeen classificeren resulteert in een onbetrouwbare schatter.

Dit proefschrift onderzoekt de implementatie en het gedrag van technieken die gebruikt kunnen worden om variabelen te classificeren met betrekking tot hun correlatie met de storingsterm. Deze technieken kunnen worden toegepast om de exogeniteit van verklarende variabelen te toetsen, maar ook om, eventueel tegelijk, de exogeniteit van niet-verklarende instrumentele variabelen te onderzoeken. Ze zijn van cruciaal belang voor onderzoekers aangezien ze informatie verschaffen over belangrijke eigenschappen van de schatters van causale verbanden. Deze familie van toetsen kent twee typen. Het eerste type wordt gebruikt om een volledige set van variabelen te toetsen, bijvoorbeeld alle potentiële endogene verklarende variabelen of alle instrumentele variabelen. Met het tweede type is het mogelijk om een subset van variabelen te classificeren. Het zijn met name de subset-varianten die in dit proefschrift aandacht krijgen. In het eerste deel wordt de betrouwbaarheid van deze toetsen onderzocht voor data zonder tijdsdimensie (cross-sectie data) en in het tweede gedeelte wordt gekeken naar implementaties in lineaire dynamische panel data modellen.

In hoofdstuk 2 worden verschillende toetsen op de exogeniteit van arbitraire subsets van verklarende variabelen gemotiveerd en worden hun prestaties vergeleken door middel van verschillende simulatie-experimenten. Gevonden wordt dat subset-toetsen een onbetwistbare rol spelen in een sequentiële strategie om verklarende variabelen als endogeen of exogeen te herkennen. Toetsen op alle potentiële endogene verklarende variabelen hebben een grote kans om een exogene verklarende variabele als endogeen aan te merken als deze slechts gecorreleerd is met een endogene variabele. Het gedrag van de toetsgrootheden kan substantieel worden verbeterd wanneer de zogeheten bootstrap wordt toegepast. Hierbij moet echter wel worden opgemerkt dat de instrumenten niet zwak mogen zijn. De bootstrap-versie van de toets volgens het Wald-principe blijkt vaak het beste in staat om de endogeniteit van een verklarende variabele te constateren. De Wald-toets schat de variantie onder de aanname dat de getoetste verklarende variabelen endogeen zijn. In combinatie met het gebruik van de bootstrap kunnen de volledige-set- en de subset-toetsen samen succesvol gebruikt worden om verklarende variabelen of groepen daarvan als exogeen of endogeen te classificeren. Met behulp van de resultaten wordt een populaire studie naar het effect van scholing op het loon opnieuw bekeken.

Verschillende implementaties van toetsen op de validiteit van niet-verklarende instrumentele variabelen worden onderzocht in hoofdstuk 3. In deze context worden de toetsen op de volledige set van variabelen vaak overidentificatietoetsen genoemd. Deze en hun subset-versies worden bekeken, evenals twee subset-toetsen volgens het Hausman-principe. Recentelijk hebben verschillende artikelen besproken dat overidentificatietoetsen niet altijd in staat zijn om invalide instrumenten te herkennen. Deze bevinding wordt verhelderd en uitgebreid naar het toetsen van een subset van niet-verklarende instrumentele variabelen. De verdeling van de meeste toetsgrootheden wordt afgeleid onder de aanname dat de steekproefomvang oneindig groot wordt. Het is echter mogelijk om rekening te houden met het feit dat de steekproefomvang vaak klein is. Een hogere-orde Cornish-Fisher-correctie bepalen is een manier om dit te doen. Het nut van deze correctie voor de meest gebruikte overidentificatietoets wordt opnieuw bekeken omdat deze zelden wordt gebruikt in de praktijk. De correctietermen laten zien dat het gedrag van de standaard overidentificatietoetsen in een kleine steekproef afhankelijk is van een aantal factoren. Zowel de correlatie tussen verklarende variabelen en de storingsterm, alsmede de mate van overidentificatie, beïnvloeden de betrouwbaarheid van de toetsen. De mate van overidentificatie is het verschil tussen het aantal niet-verklarende instrumentele variabelen en het aantal endogene verklarende variabelen. De rol van deze twee factoren wordt bevestigd door de simulatieresultaten. Uit dezelfde simulatieresultaten blijkt dat de gecorrigeerde toetsgrootheid beter is dan de ongecorrigeerde varianten, tenzij de instrumenten zwak zijn. Met betrekking tot de subset-toetsen wordt gevonden dat hun kwaliteit ook afhankelijk is van de correlatie tussen de verklarende variabelen en de storingsterm. Het gebruik van de subset-toetsen vereist een additionele aanname, namelijk dat de groep van niet-verklarende instrumentele variabelen die niet getoetst worden valide is. Uit de resultaten blijkt dat een schending van deze aanname ernstige consequenties heeft voor de toepasbaarheid van de subset-toetsen.

De twee volgende hoofdstukken gaan over de betrouwbaarheid van verschillende toetsen en schattingstechnieken op basis van GMM in lineaire dynamische panel data modellen wanneer slechts enkele tijdreeksobservaties beschikbaar zijn. GMM is een algemene schattingstechniek die zowel OLS als IV als speciale gevallen kent, maar flexibeler is met betrekking tot eigenschappen van de storingsterm. Het is bijvoorbeeld mogelijk om rekening te houden met de aanwezigheid van heteroskedasticiteit. Dat wil zeggen dat de variantie van de storingsterm mag verschillen per tijdstip en individu. Een groot voordeel van panel data is de mogelijkheid om rekening te houden met verklarende variabelen die constant zijn in de tijd, maar niet geobserveerd zijn. Deze ontbrekende verklarende variabelen noemen we ook wel individuele effecten. De individuele effecten maken dus deel uit van de storingsterm en kunnen door middel van een transformatie worden verwijderd. Waar het vinden van instrumentele variabelen vaak lastig is voor cross-sectionele data, biedt panel data nog een voordeel. Niet-verklarende instrumentele variabelen zijn direct beschikbaar in de vorm van vertragingen van de verklarende variabelen. Deze instrumenten zijn echter niet noodzakelijkerwijs voldoende gecorreleerd met de verklarende variabelen.

Hoofdstuk 4 bevat het theoretische kader dat de basis vormt voor de simulatiestudie van hoofdstuk 5. Gevonden wordt dat bij een gelimiteerd aantal cross-sectionele waarnemingen de kwaliteit van de analyse afhangt van een aantal factoren. Deze factoren zijn onder andere het aantal tijdswaarnemingen, de snelheid waarmee het dynamische proces zich aanpast, de eigenschappen van de heteroskedasticiteit, de relatieve omvang van de variantie van de individuele effecten, de aanwezigheid van endogene verklarende variabelen en de (niet-)stationariteit van verklarende variabelen. Een specifieke toepassing van subset-toetsen op de validiteit van niet-verklarende instrumentele variabelen wordt onderzocht. Deze subset van instrumenten is alleen valide onder de aanname dat de correlatie tussen verklarende variabelen en het individuele effect constant is in de tijd. Dit wordt ook wel effect-stationariteit genoemd. Opmerkelijk genoeg wordt gevonden dat deze toets op effect-stationariteit de neiging heeft om de schatter te selecteren die het meest accuraat is en niet per definitie degene die consistent is. Het is namelijk mogelijk dat een schatter wel consistent is, maar een grote variantie heeft, zodat een inconsistente schatter met een kleinere variantie over het algemeen accurater is. Met behulp van de conclusies die volgen uit de simulatieresultaten wordt een studie naar arbeidsaanbod opnieuw tegen het licht gehouden.

Een belangrijke conclusie van hoofdstuk 5 is dat de betrouwbaarheid van standaard GMM-inferentietechnieken erop achteruitgaat in de aanwezigheid van een echte (of onterecht als zodanig behandelde) endogene verklarende variabele. Vandaar dat in hoofdstuk 6 wordt onderzocht in hoeverre subset-toetsen in staat zijn verklarende variabelen te classificeren in dynamische panel data modellen wanneer alle verklarende variabelen die variëren in de tijd zijn meegenomen in het model voor de causale relatie. De tijdsdimensie staat een verder onderscheid toe: verklarende variabelen kunnen nu worden geclassificeerd als strikt exogeen, zwak exogeen of endogeen. Strikte exogeniteit betekent dat een verklarende variabele ongecorreleerd is met alle storingstermen in de tijd. Zwak exogene variabelen mogen gecorreleerd zijn met storingstermen uit het verleden. Zoals voorheen zijn variabelen die gecorreleerd zijn met de huidige storingsterm endogeen. De verklarende variabelen kunnen dus worden geclassificeerd aan de hand van de validiteit van een subset van instrumenten. Deze subset bestaat uit bepaalde recente vertragingen van de verklarende variabelen. In hoofdstuk 6 worden verschillende nieuwe toetsgrootheden voorgesteld. Twee hiervan zijn toetsen volgens het Hausman-principe, die een eindigesteekproef-correctie bevatten voor de geschatte variantie. Wat betreft controle over de kans om een valide hypothese te verwerpen wordt gevonden dat het zinvol is om de varianties onder deze hypothese te schatten. Het collapsen van de instrumenten, een manier om het aantal instrumenten terug te dringen, heeft een positieve invloed op het gedrag van de subset-toetsen wanneer veel instrumenten beschikbaar zijn. Deze manier van instrumentreductie komt het vermogen van de subset-toetsen om een invalide hypothese te verwerpen echter niet altijd ten goede. De gecorrigeerde Hausman-toetsen presteren altijd beter dan de standaard implementaties van het Hausman-principe. Echter, de toetsen volgens het Hausman-principe kunnen alleen worden toegepast wanneer er gebruik wordt gemaakt van collapsen. Een subset-versie van de meest populaire overidentificatietoets presteert bijna even goed als de best presterende Hausman-toets, maar behoeft geen instrumentreductie. De simulatieresultaten worden gebruikt om een klassieke studie naar het effect van criminaliteitsbestrijding te herzien.

The Tinbergen Institute is the Institute for Economic Research, which was founded in 1987 by the Faculties of Economics and Econometrics of the Erasmus University Rotterdam, University of Amsterdam and VU University Amsterdam. The Institute is named after the late Professor Jan Tinbergen, Dutch Nobel Prize laureate in economics in 1969. The Tinbergen Institute is located in Amsterdam and Rotterdam.

The following books recently appeared in the Tinbergen Institute Research Series:

567. B.N. KRAMER, Why don't they take a card? Essays on the demand for micro health insurance
568. M. KILIÇ, Fundamental Insights in Power Futures Prices
569. A.G.B. DE VRIES, Venture Capital: Relations with the Economy and Intellectual Property
570. E.M.F. VAN DEN BROEK, Keeping up Appearances
571. K.T. MOORE, A Tale of Risk: Essays on Financial Extremes
572. F.T. ZOUTMAN, A Symphony of Redistributive Instruments
573. M.J. GERRITSE, Policy Competition and the Spatial Economy
574. A. OPSCHOOR, Understanding Financial Market Volatility
575. R.R. VAN LOON, Tourism and the Economic Valuation of Cultural Heritage
576. I.L. LYUBIMOV, Essays on Political Economy and Economic Development
577. A.A.F. GERRITSEN, Essays in Optimal
578. M.L. SCHOLTUS, The Impact of High-Frequency Trading on Financial Markets
579. E. RAVIV, Forecasting Financial and Macroeconomic Variables: Shrinkage, Dimension Reduction, and Aggregation
580. J. TICHEM, Altruism, Conformism, and Incentives in the Workplace
581. E.S. HENDRIKS, Essays in Law and Economics
582. X. SHEN, Essays on Empirical Asset Pricing
583. L.T. GATAREK, Econometric Contributions to Financial Trading, Hedging and Risk Measurement
584. X. LI, Temporary Price Deviation, Limited Attention and Information Acquisition in the Stock Market
585. Y. DAI, Efficiency in Corporate Takeovers
586. S.L. VAN DER STER, Approximate feasibility in real-time scheduling: Speeding up in order to meet deadlines
587. A. SELIM, An Examination of Uncertainty from a Psychological and Economic Viewpoint
588. B.Z. YUESHEN, Frictions in Modern Financial Markets and the Implications for Market Quality
589. D. VAN DOLDER, Game Shows, Gambles, and Economic Behavior
590. S.P. CEYHAN, Essays on Bayesian Analysis of Time Varying Economic Patterns
591. S. RENES, Never the Single Measure
592. D.L. IN 'T VELD, Complex Systems in Financial Economics: Applications to Interbank and Stock Markets
593. Y. YANG, Laboratory Tests of Theories of Strategic Interaction
594. M.P. WOJTOWICZ, Pricing Credit Derivatives and Credit Securitization
595. R.S. SAYAG, Communication and Learning in Decision Making
596. S.L. BLAUW, Well-to-do or doing well? Empirical studies of wellbeing and development
597. T.A. MAKAREWICZ, Learning to Forecast: Genetic Algorithms and Experiments
598. P. ROBALO, Understanding Political Behavior: Essays in Experimental Political Economy
599. R. ZOUTENBIER, Work Motivation and Incentives in the Public Sector
600. M.B.W. KOBUS, Economic Studies on Public Facility Use
601. R.J.D. POTTER VAN LOON, Modeling non-standard financial decision making
602. G. MESTERS, Essays on Nonlinear Panel Time Series Models
603. S. GUBINS, Information Technologies and Travel
604. D. KOPÁNYI, Bounded Rationality and Learning in Market Competition
605. N. MARTYNOVA, Incentives and Regulation in Banking
606. D. KARSTANJE, Unraveling Dimensions: Commodity Futures Curves and Equity Liquidity
607. T.C.A.P. GOSENS, The Value of Recreational Areas in Urban Regions
608. L.M. MARIĆ, The Impact of Aid on Total Government Expenditures
609. C. LI, Hitchhiking on the Road of Decision Making under Uncertainty
610. L. ROSENDAHL HUBER, Entrepreneurship, Teams and Sustainability: a Series of Field Experiments
611. X. YANG, Essays on High Frequency Financial Econometrics
612. A.H. VAN DER WEIJDE, The Industrial Organization of Transport Markets: Modeling pricing, Investment and Regulation in Rail and Road Networks
613. H.E. SILVA MONTALVA, Airport Pricing Policies: Airline Conduct, Price Discrimination, Dynamic Congestion and Network Effects
614. C. DIETZ, Hierarchies, Communication and Restricted Cooperation in Cooperative Games
615. M.A. ZOICAN, Financial System Architecture and Intermediation Quality
616. G. ZHU, Three Essays in Empirical Corporate Finance

In order to consistently estimate a causal economic relationship, at least as many exogenous non-explanatory instrumental variables are required as there are endogenous explanatory variables. This thesis studies various techniques that can be used to classify selected variables as either exogenous or endogenous. These techniques are of great importance as they provide guidance regarding avoiding inconsistency and enhancing efficiency of estimators of the causal effects. Various implementations are first examined in the context of models for cross-section data and subsequently for dynamic panel data models. In addition to standard techniques some more refined alternatives are proposed. The results are used to re-examine various popular studies.

Milan Pleus (1987) obtained both his bachelor's and master's degree in econometrics at the University of Amsterdam. It is at this same university that he started as a PhD student in 2011. His research interests include testing procedures, simultaneity and panel data models.


More information

Published in: Tenth Tbilisi Symposium on Language, Logic and Computation: Gudauri, Georgia, September 2013

Published in: Tenth Tbilisi Symposium on Language, Logic and Computation: Gudauri, Georgia, September 2013 UvA-DARE (Digital Academic Repository) Estimating the Impact of Variables in Bayesian Belief Networks van Gosliga, S.P.; Groen, F.C.A. Published in: Tenth Tbilisi Symposium on Language, Logic and Computation:

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation

Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation Seung C. Ahn Arizona State University, Tempe, AZ 85187, USA Peter Schmidt * Michigan State University,

More information

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE Chapter 6. Panel Data Joan Llull Quantitative Statistical Methods II Barcelona GSE Introduction Chapter 6. Panel Data 2 Panel data The term panel data refers to data sets with repeated observations over

More information

Discriminating between (in)valid external instruments and (in)valid exclusion restrictions

Discriminating between (in)valid external instruments and (in)valid exclusion restrictions Discussion Paper: 05/04 Discriminating between (in)valid external instruments and (in)valid exclusion restrictions Jan F. Kiviet www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Roetersstraat

More information

Dealing With Endogeneity

Dealing With Endogeneity Dealing With Endogeneity Junhui Qian December 22, 2014 Outline Introduction Instrumental Variable Instrumental Variable Estimation Two-Stage Least Square Estimation Panel Data Endogeneity in Econometrics

More information

Superfluid helium and cryogenic noble gases as stopping media for ion catchers Purushothaman, Sivaji

Superfluid helium and cryogenic noble gases as stopping media for ion catchers Purushothaman, Sivaji University of Groningen Superfluid helium and cryogenic noble gases as stopping media for ion catchers Purushothaman, Sivaji IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails GMM-based inference in the AR() panel data model for parameter values where local identi cation fails Edith Madsen entre for Applied Microeconometrics (AM) Department of Economics, University of openhagen,

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

A Robust Test for Weak Instruments in Stata

A Robust Test for Weak Instruments in Stata A Robust Test for Weak Instruments in Stata José Luis Montiel Olea, Carolin Pflueger, and Su Wang 1 First draft: July 2013 This draft: November 2013 Abstract We introduce and describe a Stata routine ivrobust

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

Citation for published version (APA): Hoekstra, S. (2005). Atom Trap Trace Analysis of Calcium Isotopes s.n.

Citation for published version (APA): Hoekstra, S. (2005). Atom Trap Trace Analysis of Calcium Isotopes s.n. University of Groningen Atom Trap Trace Analysis of Calcium Isotopes Hoekstra, Steven IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please

More information

Christopher Dougherty London School of Economics and Political Science

Christopher Dougherty London School of Economics and Political Science Introduction to Econometrics FIFTH EDITION Christopher Dougherty London School of Economics and Political Science OXFORD UNIVERSITY PRESS Contents INTRODU CTION 1 Why study econometrics? 1 Aim of this

More information

4.8 Instrumental Variables

4.8 Instrumental Variables 4.8. INSTRUMENTAL VARIABLES 35 4.8 Instrumental Variables A major complication that is emphasized in microeconometrics is the possibility of inconsistent parameter estimation due to endogenous regressors.

More information

Citation for published version (APA): Harinck, S. (2001). Conflict issues matter : how conflict issues influence negotiation

Citation for published version (APA): Harinck, S. (2001). Conflict issues matter : how conflict issues influence negotiation UvA-DARE (Digital Academic Repository) Conflict issues matter : how conflict issues influence negotiation Harinck, S. Link to publication Citation for published version (APA): Harinck, S. (2001). Conflict

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction

More information

Instrumental Variables and the Problem of Endogeneity

Instrumental Variables and the Problem of Endogeneity Instrumental Variables and the Problem of Endogeneity September 15, 2015 1 / 38 Exogeneity: Important Assumption of OLS In a standard OLS framework, y = xβ + ɛ (1) and for unbiasedness we need E[x ɛ] =

More information

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral IAE, Barcelona GSE and University of Gothenburg Gothenburg, May 2015 Roadmap Deviations from the standard

More information

LECTURE 10: MORE ON RANDOM PROCESSES

LECTURE 10: MORE ON RANDOM PROCESSES LECTURE 10: MORE ON RANDOM PROCESSES AND SERIAL CORRELATION 2 Classification of random processes (cont d) stationary vs. non-stationary processes stationary = distribution does not change over time more

More information

Instrumental variables and GMM: Estimation and testing

Instrumental variables and GMM: Estimation and testing Boston College Economics Working Paper 545, 02 November 2002 Instrumental variables and GMM: Estimation and testing Christopher F. Baum Boston College Mark E. Schaffer Heriot Watt University Steven Stillman

More information

Citation for published version (APA): Jak, S. (2013). Cluster bias: Testing measurement invariance in multilevel data

Citation for published version (APA): Jak, S. (2013). Cluster bias: Testing measurement invariance in multilevel data UvA-DARE (Digital Academic Repository) Cluster bias: Testing measurement invariance in multilevel data Jak, S. Link to publication Citation for published version (APA): Jak, S. (2013). Cluster bias: Testing

More information

Reliability of inference (1 of 2 lectures)

Reliability of inference (1 of 2 lectures) Reliability of inference (1 of 2 lectures) Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Specification testing in panel data models estimated by fixed effects with instrumental variables

Specification testing in panel data models estimated by fixed effects with instrumental variables Specification testing in panel data models estimated by fixed effects wh instrumental variables Carrie Falls Department of Economics Michigan State Universy Abstract I show that a handful of the regressions

More information

IV and IV-GMM. Christopher F Baum. EC 823: Applied Econometrics. Boston College, Spring 2014

IV and IV-GMM. Christopher F Baum. EC 823: Applied Econometrics. Boston College, Spring 2014 IV and IV-GMM Christopher F Baum EC 823: Applied Econometrics Boston College, Spring 2014 Christopher F Baum (BC / DIW) IV and IV-GMM Boston College, Spring 2014 1 / 1 Instrumental variables estimators

More information

What s New in Econometrics. Lecture 13

What s New in Econometrics. Lecture 13 What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

More information

Data-driven methods in application to flood defence systems monitoring and analysis Pyayt, A.

Data-driven methods in application to flood defence systems monitoring and analysis Pyayt, A. UvA-DARE (Digital Academic Repository) Data-driven methods in application to flood defence systems monitoring and analysis Pyayt, A. Link to publication Citation for published version (APA): Pyayt, A.

More information

Specification Test for Instrumental Variables Regression with Many Instruments

Specification Test for Instrumental Variables Regression with Many Instruments Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing

More information

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models

On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models On GMM Estimation and Inference with Bootstrap Bias-Correction in Linear Panel Data Models Takashi Yamagata y Department of Economics and Related Studies, University of York, Heslington, York, UK January

More information

Citation for published version (APA): Kooistra, F. B. (2007). Fullerenes for organic electronics [Groningen]: s.n.

Citation for published version (APA): Kooistra, F. B. (2007). Fullerenes for organic electronics [Groningen]: s.n. University of Groningen Fullerenes for organic electronics Kooistra, Floris Berend IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please

More information

Introduction to Eco n o m et rics

Introduction to Eco n o m et rics 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. Introduction to Eco n o m et rics Third Edition G.S. Maddala Formerly

More information

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and

More information

Instrumental variables estimation using heteroskedasticity-based instruments

Instrumental variables estimation using heteroskedasticity-based instruments Instrumental variables estimation using heteroskedasticity-based instruments Christopher F Baum, Arthur Lewbel, Mark E Schaffer, Oleksandr Talavera Boston College/DIW Berlin, Boston College, Heriot Watt

More information

1. GENERAL DESCRIPTION

1. GENERAL DESCRIPTION Econometrics II SYLLABUS Dr. Seung Chan Ahn Sogang University Spring 2003 1. GENERAL DESCRIPTION This course presumes that students have completed Econometrics I or equivalent. This course is designed

More information

Economics 308: Econometrics Professor Moody

Economics 308: Econometrics Professor Moody Economics 308: Econometrics Professor Moody References on reserve: Text Moody, Basic Econometrics with Stata (BES) Pindyck and Rubinfeld, Econometric Models and Economic Forecasts (PR) Wooldridge, Jeffrey

More information

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II Jeff Wooldridge IRP Lectures, UW Madison, August 2008 5. Estimating Production Functions Using Proxy Variables 6. Pseudo Panels

More information

Comments on Weak instrument robust tests in GMM and the new Keynesian Phillips curve by F. Kleibergen and S. Mavroeidis

Comments on Weak instrument robust tests in GMM and the new Keynesian Phillips curve by F. Kleibergen and S. Mavroeidis Comments on Weak instrument robust tests in GMM and the new Keynesian Phillips curve by F. Kleibergen and S. Mavroeidis Jean-Marie Dufour First version: August 2008 Revised: September 2008 This version:

More information

1 Procedures robust to weak instruments

1 Procedures robust to weak instruments Comment on Weak instrument robust tests in GMM and the new Keynesian Phillips curve By Anna Mikusheva We are witnessing a growing awareness among applied researchers about the possibility of having weak

More information

Instrumental Variables

Instrumental Variables Università di Pavia 2010 Instrumental Variables Eduardo Rossi Exogeneity Exogeneity Assumption: the explanatory variables which form the columns of X are exogenous. It implies that any randomness in the

More information

Instrumental Variables, Simultaneous and Systems of Equations

Instrumental Variables, Simultaneous and Systems of Equations Chapter 6 Instrumental Variables, Simultaneous and Systems of Equations 61 Instrumental variables In the linear regression model y i = x iβ + ε i (61) we have been assuming that bf x i and ε i are uncorrelated

More information

EFFICIENT ESTIMATION USING PANEL DATA 1. INTRODUCTION

EFFICIENT ESTIMATION USING PANEL DATA 1. INTRODUCTION Econornetrica, Vol. 57, No. 3 (May, 1989), 695-700 EFFICIENT ESTIMATION USING PANEL DATA BY TREVOR S. BREUSCH, GRAYHAM E. MIZON, AND PETER SCHMIDT' 1. INTRODUCTION IN AN IMPORTANT RECENT PAPER, Hausman

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

Lecture 11 Weak IV. Econ 715

Lecture 11 Weak IV. Econ 715 Lecture 11 Weak IV Instrument exogeneity and instrument relevance are two crucial requirements in empirical analysis using GMM. It now appears that in many applications of GMM and IV regressions, instruments

More information

University of Groningen. Morphological design of Discrete-Time Cellular Neural Networks Brugge, Mark Harm ter

University of Groningen. Morphological design of Discrete-Time Cellular Neural Networks Brugge, Mark Harm ter University of Groningen Morphological design of Discrete-Time Cellular Neural Networks Brugge, Mark Harm ter IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you

More information

Linear Panel Data Models

Linear Panel Data Models Linear Panel Data Models Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania October 5, 2009 Michael R. Roberts Linear Panel Data Models 1/56 Example First Difference

More information

When is it really justifiable to ignore explanatory variable endogeneity in a regression model?

When is it really justifiable to ignore explanatory variable endogeneity in a regression model? Discussion Paper: 2015/05 When is it really justifiable to ignore explanatory variable endogeneity in a regression model? Jan F. Kiviet www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Roetersstraat

More information

Inference about the Indirect Effect: a Likelihood Approach

Inference about the Indirect Effect: a Likelihood Approach Discussion Paper: 2014/10 Inference about the Indirect Effect: a Likelihood Approach Noud P.A. van Giersbergen www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Department of Economics & Econometrics

More information

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

4 Instrumental Variables Single endogenous variable One continuous instrument. 2 Econ 495 - Econometric Review 1 Contents 4 Instrumental Variables 2 4.1 Single endogenous variable One continuous instrument. 2 4.2 Single endogenous variable more than one continuous instrument..........................

More information

xtseqreg: Sequential (two-stage) estimation of linear panel data models

xtseqreg: Sequential (two-stage) estimation of linear panel data models xtseqreg: Sequential (two-stage) estimation of linear panel data models and some pitfalls in the estimation of dynamic panel models Sebastian Kripfganz University of Exeter Business School, Department

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

UvA-DARE (Digital Academic Repository) Fluorogenic organocatalytic reactions Raeisolsadati Oskouei, M. Link to publication

UvA-DARE (Digital Academic Repository) Fluorogenic organocatalytic reactions Raeisolsadati Oskouei, M. Link to publication UvA-DARE (Digital Academic Repository) Fluorogenic organocatalytic reactions Raeisolsadati Oskouei, M. Link to publication Citation for published version (APA): Raeisolsadati Oskouei, M. (2017). Fluorogenic

More information

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral IAE, Barcelona GSE and University of Gothenburg Gothenburg, May 2015 Roadmap of the course Introduction.

More information

A CONDITIONAL LIKELIHOOD RATIO TEST FOR STRUCTURAL MODELS. By Marcelo J. Moreira 1

A CONDITIONAL LIKELIHOOD RATIO TEST FOR STRUCTURAL MODELS. By Marcelo J. Moreira 1 Econometrica, Vol. 71, No. 4 (July, 2003), 1027 1048 A CONDITIONAL LIKELIHOOD RATIO TEST FOR STRUCTURAL MODELS By Marcelo J. Moreira 1 This paper develops a general method for constructing exactly similar

More information