Simple Tests of Random Missing for Unbalanced Panel Data Models. September 10, 2014

Size: px
Start display at page:

Download "Simple Tests of Random Missing for Unbalanced Panel Data Models. September 10, 2014"

Transcription

1 Simple Tests of Random Missing for Unbalanced Panel Data Models Do Won Kwak and Suyong Song September 10, 2014 Abstract This paper proposes simple tests of missing completely at random (MCAR) and missing at random (MAR) assumptions for unbalanced panel data by extending Hausman [1978] s specification test. The proposed Hausman-Type (HT) tests can be applied flexibly to estimations with complete-case, multiple imputations, sample selection correction, and inverse probability weighted methods. Monte Carlo simulations show substantial power and no size bias of the tests. We illustrate usefulness of the tests of the MCAR and MAR assumptions in the study of the effects of trade liberalization policies on trade flows. The HT tests results show that conventional estimators substantially underestimate the effects on trade flows. Keywords: Unbalanced panel data; Hausman-type specification test; Missing data; Missing completely at random; Missing at random JEL Classfication: C12, C33, F15 We would like to thank Jeffrey M. Wooldridge for his perspective and helpful comments. We are grateful to Xuepeng Liu for kindly providing his dataset and Timothy Vogelsang, Todd Elder, Chris O Donnell, Adrian Pagan, Alicia Rambaldi, Prasada Rao, Donggyu Sul for suggestions and comments. We also thank seminar participants at Korea University, Michigan State University, Monash University, University of Sydney, and University of Queensland, for useful comments. All remaining errors are our own. School of Economics, The University of Queensland, St Lucia, QLD, Australia, d.kwak@uq.edu.au Department of Economics, University of Wisconsin-Milwaukee, PO Box 413, Milwaukee, WI 53211, US, songs@uwm.edu 1

2 1 Introduction As panel data are becoming more readily available, analysis using the methods for balanced data are becoming more attractive to empirical researchers. However, balanced panel data is rarely available because of missing data. For instance, Abrevaya and Donald [2010] surveyed that missing data occurs in 40% of all published articles in four empirical economics journals (AER, JHR, JOLE, and QJE) over If the missing completely at random (MCAR) assumption is satisfied (Rubin [1976] and Little and Rubin [2002]), we can still justify the results from the balanced data method but, unfortunately, the MCAR assumption is not generally testable. In 70% of these publications with missing data, units with incomplete observations are all discarded without providing any justification for the validity of the MCAR assumption. 1 Other remaining studies typically rely on the assumption that the missing and nonmissing data have the same distribution conditional on observed covariates. The literature generally assumes the MAR assumption and focuses on efficient estimation (Robins et al. [1994], Hahn [1998], Cattaneo [2010], Abrevaya and Donald [2010], Muris [2011], Graham et al. [2012], Chaudhuri [2014], Chaudhuri and Guilkey [2014], among others). Again, justification of the results for the estimates from using the MAR assumption relies on faith rather than formal statistical analysis because these MAR assumptions have been regarded as untestable in most empirical applications. For instance, Hirsch and Schumacher [2004], Bollinger and Hirsch [2006] and Bollinger and Hirsch [2013] examine the quality of imputations for the studies that use Current Population Survey data, but the MAR assumption used for imputation with the data can sometimes exacerbate bias which arises from missing data. Although the MCAR and MAR assumptions are generally considered untestable (Horowitz and Manski [2006]), this paper shows that, for unbalanced linear panel data models which have at least two consistent estimators under the null hypothesis, the validity of the assumptions can be tested. The proposed test statistics extend the Hausman [1978] s specification test to examine the validity of the assumptions in unbalanced panel data. Construction of test statistics is based on the idea that, under the null hypothesis of the MCAR assumption (or the MAR assumption), the difference in the estimates of two consistent estimators should be entirely due to sampling errors. In testing the MCAR assumption, a test statistic is constructed using the difference between two scaled consistent estimators and it is shown to asymptotically follow χ 2 -distribution. The alternative 1 Missing completely at random (MCAR) and missing at random (MAR) assumptions used in this paper are introduced by Rubin [1976] and we use in the following sense. The MCAR assumption implies that the probability of missingness is independent of observed and unobserved variables. The MAR assumption implies that, conditional on observed variables, the probability of missingness does not depend on the unobserved variables. 2

3 hypothesis is chosen according to the types of nonrandom missing data. There are several attractive characteristics of the proposed tests of the MCAR assumption. First, no restriction is imposed on missing pattern for the justification of these test statistics. For instance, missing data can be of the dependent and explanatory variablesand of any types of missingness (e.g. attrition type and/or intermediate type). Second, the tests allow the researcher to focus upon the inconsistency of a single covariate of interest (e.g., critical core covariate in Lu and White [2014] or White and Lu [2011]) caused by nonrandom missing data. This could be practically important because this flexibility can enhance the power of the test. Third, the proposed tests of the MCAR assumption can be easily implemented with standard statistical software such as Stata. The tests can be implemented using any combinations of two estimations from pooled least square (LS), random effects (RE), fixed effects (FE) and first-differencing (FD) estimators where two estimators are chosen according to the alternative hypothesis in consideration. Furthermore, we show that the proposed method can be applied to test the MAR assumption, provided that two consistent estimators from imputation (or inverse probability weighting) are available under the null hypothesis. In practice, researchers often have information on missingness process and use this information with the MAR assumption to eliminate or mitigate biases from nonrandom missing data. This information on the missingness process with the MAR assumption was typically implemented with popular methods such as multiple imputation (MI) or inverse probability weighting (IPW). 2 The primary idea of the tests is based on the difference between two scaled consistent MI estimators (or IPW estimators). The null hypothesis implies the validity of the MAR assumption which indicates consistency of MI (or IPW) estimators. In our empirical illustrations, we choose either two imputed estimators among LS with imputation, FD with imputation and FE with imputation estimators where a MAR assumption is imposed to calculate missing observations; or two IPW estimators among weighted pooled LS, weighted FD and weighted FE estimators where a MAR assumption is used to calculate probability of observability. We select two estimators in the construction of a test statistic depending on which alternative hypothesis to use. We assume these estimators are consistent under the validity of the MAR assumption. The alternative hypotheses are the violation of exogeneity caused by correlation between unobserved factors and covariate of interest conditional on observability. Like a test of the MCAR assumption, a test statistic for testing the MAR assumption uses the idea that the difference between two 2 The MI method with the MAR assumption is routinely applied and the details of the MI method is well documented in Shafer [1997], Buuren et al. [1999], Schafer and Graham [2002], Little and Rubin [2002], Hahn [1998], Chen et al. [2008], and Imbens et al. [2007]. The IPW method with MAR was initially introduced by Horvitz and Thompson [1952] but had not been used much until recently because of inefficiency. This, however, has changed drastically since Robins et al. [1994] introduced efficient version of the IPW (it is called Augmented IPW or AIPW) method. The IPW method with the MAR assumption recently has been studied in Tsiatis [2006], Hirano et al. [2003], Wooldridge [2007], and Graham et al. [2012]. 3

4 scaled consistent estimators under the null hypothesis (i.e. missingness is random after conditioning on observed variables) should be entirely because of sampling error and it is shown to converge to χ 2 - distribution. All three attractive characteristics of the tests of the MCAR assumption are retained for the test of the MAR assumption. In particular, focusing only upon a single covariate is important in practice because devising the MAR assumption so that all included covariates are conditionally random is difficult and probably unrealistic, while the MAR assumption that one variable of interestis conditionally random is relatively easier to work with. However, in contrast to a test of the MCAR assumption, the MAR assumption requires that the missingness process should be explained by observed variables which have to be excluded in the outcome model. In most applications, all explanatory variables in the outcome model and other excluded observed variables (i.e. exclusion restriction) are used as predictors for doing imputation and calculating the probability of observability. Similar to the proposed tests of the MCAR, the test of the MAR assumption is also easily implementable with popular statistical software such as Stata. The tests can be implemented after combining two estimations from either imputed or weighted pooled LS, RE, FE and FD estimators. Stata ado codes to implement the HT test of the MCAR assumption and the MAR assumption with IPW and MI estimators can be downloaded at Stata do file with illustrative examples and Stata help file for the codes are also provided at the website. Finite-sample properties of the proposed tests are examined with Monte Carlo (MC) experiments and the MC results show substantial power and no size distortion of the tests. For sensitivity analyses, various types of alternative hypotheses that induce bias for pooled LS, RE, FE and FD estimators are examined in the MC experiments. It is evident that the proposed tests of MCAR and MAR assumptions perform well in a variety of models. Practical usefulness of the proposed tests is illustrated by the study of the effects of trade liberalization policies World Trade Organization (WTO) membership, Regional Trade Agreement (RTA) membership, Currency Union (CU) membership on trade flows. A gravity equation in unbalanced panel data is often estimated with error-component models as in Wansbeek and Kapteyn [1989], Baltagi and Chang [1994], Werner [2001], and Davis [2002]. Justification of the results from these methods requires the validity of the MCAR assumption but formal testing of the assumption is rare in previous studies. In our sample, the dependent variable (trade flows) suffers severely from missing data, with a missing proportion of about 50%. Assuming the other sources of biases in the gravity model are properly controlled, 4

5 we test the validity of the MCAR assumption by examining whether there is a statistically significant difference between the FE and FD estimates beyond sampling error. Furthermore, following previous studies that use the MAR assumption with imputation in the gravity model as in Linders and de Groot [2006] and Felbermayr and Kohler [2006], we show usefulness of the HT test of the MAR assumption by imputing missing observations for trade flows. We also conduct the HT tests of the MAR assumption with the IPW and two-step estimators where inverse probability weights and the inverse mills ratio (IMR) are obtained by modeling a missing/observability process with observed variables including exclusion restrictions. Our illustration of the HT tests provides several important practical implications. First, the HT test rejects the null hypothesis that the MCAR assumption is valid. 3 Second, the difference between conventional FE and FD estimates and the difference between corresponding FE and FD estimates with imputation suggest that severe bias exists because of missing data and violation of the MAR assumption respectively. Rejection of the null hypothesis from the HT tests of the MAR assumption with imputation confirms a violation of the MAR assumption used for imputation in the literature. Furthermore, test results imply that the effects of trade liberalization policies such as WTO and RTA were severely underestimated by ignoring missing data. The trade creation effect of WTO membership is estimated to be 0% without imputation but 55% with imputation while RTA membership increases trade flows by 13% without imputation and 105% with imputation. Therefore, the WTO and RTA membership effects are underestimated by 55 and 92 percentage points respectively. Moreover, the HT tests of the MAR assumption with imputation also implies that the standard and most popular methods of controlling for omitted variable bias are not justifiable unless the MAR assumption used for the imputation model is valid. Assuming that our imputation model of the MAR assumption from economic argument is reasonable, we can attribute the difference between the FE and FD estimates with imputation to omitted variable bias that is not completely removed by conventional methods of controlling omitted variables suggested in Magee [2003], Baldwin and Taglioni [2006], Baier and Bergstrand [2007, 2009]. Finally, twostep and IPW estimates are similar to conventional estimates that ignore missing data. Furthermore, the qualitative conclusions we draw from the HT tests for the coefficients on WTO, RTA and CU membership variables are the same for conventional, IPW and two-step estimators. This implies that the MAR assumption used to obtain IPW and IMR correction term do little practically to mitigate/eliminate bias 3 We implement one of the most popular methods used in the gravity model that accounts for three-way error component models as in Anderson and van Wincoop [2003], Feenstra [2004], Baldwin and Taglioni [2006], Baier and Bergstrand [2007], Magee [2008], Eicher et al. [2012], Matyas and Balazsi [2012] among others. 5

6 caused from nonrandom missingness. This is not surprising in our application because we are limited in the availability of exclusion restrictions that predict whether trade flows are missing or not. Specifically, we use religion and lagged dependent variables as exclusion restrictions, following Helpman et al. [2008] in the first stage estimation of the probability of observability. These variables may not have enough variation to correctly predict the probability of observability and to have a major effect on the estimates for variables of trade liberalization policies. The rest of paper is organized as follows. In section 2, we introduce the HT test statistics to test the MCAR assumption in an unbalanced panel data. Section 3 extends the HT test statistics to test the MAR assumption. Section 4 shows the size and the power of the test statistics through Monte Carlo experiments. Section 5 provides application of HT tests to the effect of trade liberalization on trade flows. Section 6 contains concluding remarks. Appendix contains technical details, additional Monte Carlo experiments, and a recipe for our tests routine in Stata. 2 Linear panel data models with missing data A formal test of the MCAR assumption is based on a test statistic that uses a pairwise difference between two estimates. Two estimators used in the test are chosen among four popular linear panel data estimators - pooled LS, RE, FE and FD estimators. 2.1 Model We consider the following linear panel data model with missing data where observability is denoted by observability indicator s i t : y i t = d i t β + x 1i t δ + f t + c i + u i t, for i = 1,2,..., N and t = 1,2,...,T (1) 1 if all of y i t, x i t are observed s i t = 0 otherwise (2) where y i t is an outcome, d i t is a vector of regressors of primary interest, x 1i t is a vector of remaining observed covariates, f t is time effect, c i is unobserved heterogeneity and u i t is idiosyncratic error. Using x i t = (d i t, x 1i t, f t ), θ = (β,δ,1) and v i t = c i + u i t, we can rewrite (1) with selection indicator more 6

7 compactly as follows s i t y i t = s i t x i t θ + s i t v i t, for i = 1,2,...,n and t = 1,2,...,T. (3) Pooled LS, RE, FE and FD estimators in unbalanced panel data model are: N θ pl s = ( s i t x i t x i t ) 1 N s i t x i t y i t, (4) N θ RE = ( s i t x i t x i t ) 1 N s i t x i t ỹi t, (5) where.ẋ i t = x i t x i, x i = 1 T i N θ F E = (.. s i t x.. i t x i t ) 1 N s i t x i t, x i t = x i t λx i, λ = 1 N θ F D = ( i=1 t=2 s F D i t x i t x i t ) 1 N.. s i t x.. i t i=1 t=2 σ 2 u σ 2 u+t i σ 2 c y i t, (6) and T i = T s i t, and s F D i t x i t y i t (7) where x i t = x i t x i t 1 for t = 2,3,...,T and s F D i t = min{s i t, s i t 1 }. 2.2 Conditions for consistent estimators in unbalanced linear panel data models We consider that violation of the MCAR assumption is the only source of inconsistency in the estimation of θ. Therefore, we assume E(v i t x i ) = 0 t so, as long as E(v i t x i ) = E(v i t x i, s i ), the pooled LS, RE, FE and FD estimators are all consistent. We can relax the assumption even further as E(u i t x i t,c i ) = 0 and E(u i t x i t,c i ) = E(u i t x i t,c i, s i t ) so that FE and FD estimators are consistent under the MCAR assumption conditional on fixed effects c i. For the cases where we are only interested in single parameter in θ, we can relax the assumption in a way that E(u i t d 1i t x i t,c i ) = 0 t where x i = (d 1i t, x i t ) so, if E(u i t d 1i t x i t,c i ) = E(u i t d 1i t x i t,c i, s i t ), then FE and FD estimators for θ are consistent. The violation of the MCAR assumption occurs because of the following two instances of endogeneity: (1) E(c i x i, s i ) E(c i x i ) = 0 or (2) E(u i t x i t,c i, s i t ) E(u i t x i t,c i ) = 0, for at least one t. For the first case, time-constant unobserved factors are correlated with the covariate of interest conditional on s i t = 1 while, for the second case, time-varying unobserved factors are correlated with the covariate of interest 7

8 conditional on observability (s i t = 1). For each alternative, the consistent estimators we can use for constructing HT tests are different but, for both cases, we can construct an HT test statistic using the difference between FE and FD estimates. Besides standard regularity conditions, rank and exogeneity conditions are essential for consistency. They are assumed in equations (8) and (9) for the pooled LS, in (10) and (11) for the FE, and in (12) and (13) for the FD estimators: r anke(s i t x i t x i t ) = di m(θ), t = 1,2,...,T (8) r anke(s i t.. E(s i t x i t v i t ) = 0, t = 1,2,...,T (9) x.. i t x i t ) = di m(θ), t = 1,2,...,T (10).. E(s i t x.. i t u i t ) = 0, t = 1,2,...,T (11) r anke(s F D i t x i t x i t ) = di m(θ), t = 2,...,T (12) E(s F D i t x i t u i t ) = 0, t = 2,...,T. (13) For the RE estimator, r anke(s i t x i t x i t ) = di m(θ) and conditional strict exogeneity, E(u i t x i, s i ) = 0, are sufficient for consistency of ˆθ RE where s i = (s i 1, s i 2,..., s i T ) and x i = (x i 1, x i 2,..., x i T ). 2.3 Tests of the MCAR assumption For simplicity of illustration, let s consider a random experiment on d 1i t so the pooled LS, RE, FE and FD estimators of β are all consistent under the MCAR assumption (i.e. E(v i t d 1i t ) = 0 and under the MCAR assumption E(v i t d 1i t ) = E(v i t d 1i t, s i t = 1), t). Using this fact, we construct a HT test statistic comparing two consistent estimators. A pairwise difference between two estimators should be entirely because of sampling errors if the MCAR assumption is satisfied. For instance, we can select two consistent estimators under the null hypothesis, say the FD and FE estimators, and construct a test statistic by using difference between the FD and FE estimators. Depending upon the source of violation of the MCAR assumption, we can consider the following two alternative hypotheses, (1) E(c i d 1i t, x i t, s i t = 1) E(c i d 1i t, x i t ) = 0 for some t and (2) E(u i t d 1i t, x i t,c i, s i t = 1) E(u i t d i t x i t,c i ) = 0, for some t and we choose two estimators accordingly. Against these alternative hypotheses, the null hypothesis is E(v i t d i t, x i t ) = E(v i t d i t, x i t, s i t = 1) = 0 t or E(u i t d i t,c i, x i t ) = E(v i t d i t, x i t,c i, s i t = 1) = 0 t, respectively. 8

9 For the first alternative, if we assume E(s i t d 1i t u i t ) = 0, t and E(s i t d 1i t c i ) 0, t, then FE and FD are consistent but pooled OLS and RE are inconsistent. Because we assume E(d 1i t c i ) = 0, E(s i t d i t c i ) 0 implies that non-zero correlation between d 1i t and a time-invariant unobserved factor is caused by the violation of the MCAR assumption. Under the alternative, the FE and FD estimators are consistent while the pooled LS and RE estimators are inconsistent. Thus, against this alternative, we can construct four HT test statistics using following four pairs: LS-FE, LS-FD, RE-FE, and RE-FD. The second alternative is violation of strict conditional exogeneity E(s i t d i t u i ) E(d i t u i ) = 0 for some t. Against this alternative, all four estimators are inconsistent so we rely on the different magnitude of the inconsistency of each estimator to detect the violation of the MCAR assumption. If we do not have any prior information on the sources of nonrandom missingness, it is safe to assume that the MCAR assumption is violated because of the correlation between d i t and time-varying and time-invariant unobserved factors conditional on observability. Therefore, in practice, we should consider both alternatives (1) and (2) and construct the HT test statistics accordingly. 2.4 Hausman-type tests of the MCAR assumption Test of multiple parameters We consider a HT test statistic that uses a difference between the pooled LS and FE estimators. Under the null hypothesis, the pooled LS and FE estimators are consistent. Depending on which of two alternatives we consider, either only the pooled LS is inconsistent or both the pooled LS and FE estimators are inconsistent under the alternative hypothesis. HT test statistic is constructed by stacking two k- dimensional consistent estimators θ pl s and θ F E as θ 1 = ( θ pl s, θ F E ) and multiplying a restriction matrix, R k 2k = [ I p k p I p k p ] where k is the dimension of θ and p is the dimension of covariates of interest, d i t, to obtain linear transform of θ 1. The transformation provides a restriction of θ pl s θ F E converging to zero under the null hypothesis. We note that N θ1 = 1 N N s i t x i t x i t N N.. s i t x.. i t x i t 1 2k 2k 1 N 1 N N s i t x i t y i t N.. s i t x.. i t y i t 2k 1. (14) We can write the null hypothesis as below (15) where pooled LS and FE estimators are consistent. H 0 : Rθ 1 = 0, 9

10 Rθ 1 = θ 1,pl s θ 1,F E θ 2,pl s θ 2,F E. = θ p,pl s θ p,f E 0. 0 k 1 k 2k θ 1,pl s. θ k,pl s θ 1,F E. θ k,f E 2k 1 = 0. (15) The null hypothesis states H 0 : Rθ 1 = 0, which equals H 0 : θ j,pls θ j,f E = 0 j = 1,2,..., p. We construct a test statistic W 1 which converges asymptotically to a chi-squared distribution with degree of freedom equal to the number of restrictions, p, in (15) under the following assumptions. Assumption Conditional strict exogeneity: E(v i t x i, s i ) = 0 t. 2. Rank conditions: E( T s i t.. x.. i t x i t ) and E( T s i t x i t x i t ) have full rank. 3. For fixed T, regularity conditions 4 to invoke CLT to the following objects, 1 N N.. s i t x.. i t u i t so both objects are N -asymptotically normal. 1 N N s i t x i t v i t and The next theorem summarizes the result. Theorem 1 Under the Assumption 2.1, a Wald statistic W 1 converges to χ 2 p distribution under the null hypothesis of H 0 : Rθ 1 = 0, where W 1 = [R N ( θ 1 )] [R var ( N θ 1 )R ] 1 R N ( θ 1 ) χ 2 p 4 Both sufficient and necessary conditions are provided at section 5.3 in Newey and McFadden [1994]. We omit them for brevity. 10

11 with R θ 1 = k 2k θ 1,pl s. θ k,pl s θ 1,F E. θ k,f E 2k 1 = θ 1,pl s θ 1,F E θ 2,pl s θ 2,F E. θ p,pl s θ p,f E k 1 We call W 1 as robust HT test statistic. 5 The rejection of the null hypothesis implies that either (i) the pooled LS estimator is inconsistent and the FE estimator is consistent or (ii) the pooled LS and FE estimators are inconsistent. The pooled LS estimator is inconsistent and the FE estimator is consistent if the sole cause of nonrandom missingness is the correlation between time-invariant unobserved factors and covariates conditional on observability (i.e. E(c i x i t s i t ) 0 for some t). The pooled LS and FE estimators are inconsistent if E(u i t x i, s i ) 0 for some t. A rejection of null hypothesis under any one of these alternatives necessarily implies the violation of the MCAR assumption. Similarly, we can construct another HT test statistic that uses the difference between other pairs of consistent estimators such as θ F D and θ F E under the null hypothesis of the MCAR assumption. We need to adjust strict exogeneity assumption, rank condition, and regularity conditions for N -asymptotically normal to invoke CLT. Assumption 2.1A 1. Strict exogeneity condition: E(u i t x i, s i ) = 0 t. 2. Proper rank conditions: E( T s i t.. x.. i t x i t ) and E( T t=2 s F D i t x i t x i t ) have full rank 3. For fixed T, regularity conditions to invoke CLT to the following objects: and 1 N N.. s i t x.. i t u i t. 1 N N i=1 t=2 s F D x i t i t u i t We use the following stacked estimator θ 2 = ( θ F D, θ F E ) that uses difference between the FD and FE estimators: 5 We call our test statistic as robust HT test in the sense that we do not make any imposition on second moment of error term to obtain efficiency unlike in the original Hausman test. We instead estimate sandwich form of error covariance matrix. 11

12 N θ2 = 1 N N i=1 t=2 s F D i t x i t x i t N N.. s i t x.. i t x i t 1 2k 2k 1 N N i=1 t=2 1 N N s F D x i t i t y i t.... s i t x i t y i t 2k 1. (16) We write the null hypothesis as below and the FD and FE estimators are consistent under the null hypothesis of the MCAR assumption. Rθ 2 = H 0 : Rθ 2 = 0, (17) θ 1,F D θ 1,F E θ 2,F D θ 2,F E. = θ p,f D θ p,f E 0. 0 k 1 = 0. k 2k θ 1,F D. θ k,f D θ 1,F E. (18) θ k,f E 2k 1 Proposition 2 Under Assumption 2.1A, a Wald statistic W 2 converges to χ 2 p distribution under the null hypothesis of H 0 : Rθ 2 = 0, where W 2 = [R N ( θ 2 )] [R var ( N θ 2 )R ] 1 R N ( θ 2 ) χ 2 p. (19) The rejection of the null hypothesis with the HT test statistic, W 2, implies that the FE and FD estimators are inconsistent so conditional strict exogeneity is violated. We can similarly construct a statistic W 3 that uses difference between the pooled LS and FD estimators with adjusted exogeneity and rank conditions using stacked estimators θ 3 = ( θ pl s, θ F D ) and the null hypothesis of H 0 : Rθ 3 = 0. We have that 12

13 W 3 = [R N ( θ 3 )] [R var ( N θ 3 )R ] 1 R N ( θ 3 ) χ 2 p. (20) The possible number of HT test statistics constructed from pairwise differences between two consistent estimators under the null hypothesis are six. We can additionally construct two HT test statistics by replacing the pooled LS estimator with the RE estimator for W 1 and W 3. This increases the power of the test with additional assumption on the second moment because these assumptions guarantee that the RE estimator is ideal (e.g. Hausman [1978]). Test statistics can also be constructed using the differences that replace the pooled LS with the RE estimators. Using the RE estimator enhances the power of the tests assuming that additional assumptions for the second moment are satisfied Test of one parameter Sometimes we are interested in the causal relationship of small set of core parameters or often only one parameter. The practical importance of focusing only on small number of core parameters is also examined in Lu and White [2014]. This is the case for our application in this paper. The adjustment of a multiple parameter HT test statistic to a one parameter HT test statistic is very important in practice because, in general, it provides a higher power and enables us to use the conditional MCAR assumption (which is much easier to satisfy) rather than MCAR assumption. We denote the one parameter version of test statistics that correspond to the multiple parameter version test statistics W q for q = 1,2,...,5, by T q for q = 1,2,...,5, respectively. A HT statistic T q using the null hypothesis H 0 : R j θ q = 0 where R j is j th row vector of k 2k matrix R is given as follows: R j n( θq ) T q = R j var ( n θ t n 1 z. (21) q )R as y j 2.5 Variable addition tests Simple tests of violation of the MCAR assumption for unbalanced panel data, called variable addition tests, have been introduced in the literature (Verbeerk and Nijman [1996] and more recently in Wooldridge [2010a]). The idea of the tests is that under the MCAR assumption any function of s i t x i t should not appear in the expression for E(s i t y i t ). So a test on the coefficients for the lag and lead values of {s i t,s i t x i t } has been proposed accordingly. Rejection implies the violation of the MCAR assumption. 13

14 For instance, a simple variable addition test for the MCAR assumption is a robust t-test for the coefficient on s i t+1 in pooled linear regression model as follows: y i t = γ 1 s i t+1 + x i t θ + u i t, for s i t = 1. (22) We can similarly implement variable addition tests for the FE and FD estimators as follows: ÿ i t = γ 2 s i t+1 + ẍ i t θ + ü i t, for s i t = 1 (23) where ÿ i t = s i t (y i t Σ T k=1 s i k y i k ) and ẍ i t and ü i t are similarly defined. We also consider the relative performance of HT tests and variable addition tests by comparing the power and size bias of the tests under various alternative hypotheses using Monte Carlo simulation. We consider four data generating processes in section 5. Although the variable addition test is quite useful, it is not easy to extend to the test of the MAR assumption so in the test of the MAR assumption, we focus on the HT test. 3 HT test of the MAR assumption using MI, IPW, and Two-Step estimators In this section, we extend our method to test the MAR assumption used to generate probability of observability, to impute missing observations, or to generate a correction term for sample selection. The proposed HT tests are extended to test the MAR assumption by constructing HT statistics from the MI, Two-step and IPW estimators. 3.1 MI estimators First, we show an extension of the HT test based on the MI estimators. We use the following notations for observability s i t : for any random draw w i t = (y i t, d i t, x 1i t ) and s i t from the population, s i t is equal to 1 if observations for w i t are available and 0 otherwise. We consider the pooled LS, FE and FD estimators from MI method. For simplicity of exposition, we illustrate the case where missing data only occurs in the dependent variable of the following model: y i t = d i t β + x 1i t δ + f t + c i + u i t for,i = 1,2,...,n and t = 1,2,...,T. (24) 14

15 We assume nonrandom missingness to be the only source of inconsistency and 0 = E(u i t d i t, x 1i t,c i ) E(u i t d i t, x 1i t,c i, s i t = 1), (25) so the MI-FE and MI-FD estimators are consistent if the model for imputation is correctly specified. Besides the assumptions 2.1 (or 2.1A), we additionally assume the MAR assumption a correct model specification for the distribution of missing observations for the dependent variable conditional on the availability of predictors. Let z 1i t be the exclusion restriction. For the sake of simplicity, we suppose that z i t = (d i t, x 1i t, z 1i t ) are available for s i t = 1 and s i t = 0 so only the dependent variable and the core variable d i t have some missing observations. This is an important case because missing observations on either the dependent variable or only on some core covariates are common as in our application. Assumption 3.1 Conditional distribution of y m,i t given z i t is correctly specified as D(y m,i t z i t,c i ), (26) where y m,i t denotes a missing dependent variable for individual i at time t. This assumption is also known as selection on observables. Under the validity of the MAR assumption (i.e. the model for missing observations, y m,i t, conditional on predictors, z i t is correct), we obtain a consistent estimator for β. Our proposed HT test of the MAR assumption is constructed by using the difference between two consistent estimators under the null, say MI-LS and MI-FD estimators, and the test is based on the idea that the difference should be entirely because of sampling error as long as the model for imputation in the equation (26) is correctly specified. The MI estimators are obtained from three steps. First, the missing values, y m,i t, are filled with predicted values of y m,i t from (26) in M times so we obtain M complete data. Second, standard balanced panel data methods are applied to the M complete data sets. Therefore, we can obtain the MI-FE and MI-FD estimators using the original and imputed observations. For instance, for the MI-FE estimator, we get coefficients estimates on d i t, ˆβj,f e, and its covariance matrix ˆ V j where j = 1,2,..., M. Finally, the results from the M analyses are combined into a single analysis by applying the Rubin s formula for MI-FE estimator as follows: ˆβ M I f e = 1 M M j =1 ˆβ j,f e ; ˆV M I f e = 1 M M j =1 ˆV j,f e + M + 1 M B f e (27) 15

16 where B f e = 1 M 1 M j =1 ( ˆβ j,f e ˆβ f e )( ˆβ j,f e ˆβ f e ). ˆβ M I f d and ˆV M I f d can be similarly obtained. We can extend the formula in (27) to a HT test statistic, W M I 1. We stack two k-dimensional estimators ˆβ j,m I f e, ˆβ j,m I f d and define this as ˆθ j = ( ˆβ j,m I f e, ˆβ j,m I f d ) and its variance estimate as var ( θ j ). Using ˆθ j and var ( θ j ) and applying (27) we obtain ˆθ M I 1 = 1 M M ˆθ j ; var ( θ) M I 1 = 1 M j =1 M j =1 var ( θ j ) + M + 1 B, (28) M where B = 1 M 1 M j =1 ( ˆθ j ˆθ M I )( ˆθ j ˆθ M I ). Then W M I 1 is obtained as follows: W M I 1 = [R N ( θ M I 1 )] [RN var ( θ) M I 1 R ] 1 R N ( θ M I 1 ). Under conditional strict exogeneity, with the correct model for observability, and rank condition and regularity conditions, it could be shown that W M I 1 converges to χ 2 p distribution under the null hypothesis of Rθ M I 1 = 0. The result is summarized in the following proposition. Proposition 3 Under the Assumptions 2.1 and 3.1, a test statistic W M I 1 converges to χ 2 p distribution under the null hypothesis of H 0 : Rθ M I 1 = 0, where W M I 1 = [R N ( θ M I 1 )] [RN var ( θ) M I 1 R ] 1 R N ( θ M I 1 ) χ 2 p. A test statistic for the test of one parameter can be similarly obtained using pairwise difference between two MI estimators. A statistict M I 1 is given below: R j N ( θm I q ) T M I q = R j var ( N θ t n 1 z (29) M I q )R as y j where q = 1,2,...,5 with the null hypothesis H 0 : R j θ q = 0 and where R j is j th row of R matrix, which is a 1 2k row vector. The complete observability of z i t, the predictors in imputation model, is a strong assumption in practice. A weaker assumption has been studied in Wooldridge [2007] where the assumption of complete observability of the predictors is relaxed to sequential observability of the predictors as Assumption 3.1A. Assumption 3.1A A random vector z i t is observed whenever s i t 1 = 1. Accordingly, we adjust the model for imputation from (26) to the following: 16

17 E(y m,i t z i t,c i, y i t 1, s i t 1 = 1) (30) and apply the MI method using the above three-steps. This relaxation is especially important for the attrition type of missing data because of its presence in many economic examples including our application. 3.2 IPW estimators We extend the HT test to the IPW estimators where the MAR assumption is used to obtain the probability of observability (ŝ i t ). Besides the assumption 2.1 (or 2.1A), we assume the following for the probability of observability model as in Rubin [1976], Little and Rubin [2002] and Wooldridge [2007]. Assumption A random vector z i t is available for a model of observability: E(s i t d i t, x 1i t, z i t ) = P(s i t = 1 d i t, x 1i t, z i t ) = P(s i t = 1 z i t ) = f (z i t ) and f (z i t ) > 0 z i t Z. 2. A random vector z i t is always observed. 3. A parametric model G(z i t,γ) for f (z i t ) is available so E(s i t w i t, z i t ) can be consistently estimated. We consider linear panel data model where G(z i t,γ) is probability of observability: s i t y i t G(z i t,γ) = s i t x i t θ G(z i t,γ) + s i t v i t for i = 1,2,..., N and t = 1,2,...,T. (31) G(z i t,γ) The pooled LS estimator for the model with IPW (IPW-LS estimator) is provided as follows: N s i t x i t θ I PW pl s = ( x i t N s i t x G(z i t, γ) ) 1 i t y i t G(z i t, γ) (32) where G(z i t, γ) is estimated consistently at the first stage by using parametric models such as Probit (or Logit) model with s i t = 1(z i t γ + e i t > 0) (33) where e i t follows either a normal or logistic distribution. 17

18 Similarly, we can apply the IPW-FE and IPW-FD estimators to the equation (31) by applying the relevant transformation. Then, the IPW-FE estimator is: where x w i t = s i t x i t G(z i t, γ) 1 T i T N θ I PW F E = ( x w i t x w i t ) 1 N x w i t ỹ w i t (34) s i t x i t G(z i t, γ) and ŷ w i t = s i t y i t G(z i t, γ) 1 T i T s i t y i t G(z i t, γ), and the IPW-FD estimator is: N θ I PW F D = ( w x i t w x i t ) 1 N w x i t w y i t (35) i=1 t=2 where w x i t = s i t x i t G(z i t, γ) s i t 1x i t 1 G(z i t 1, γ) for s i t = s i t 1 = 1. i=1 t=2 The IPW-LS estimator is consistent as long as E(s i t w i t, z i t ) is consistently estimated, as summarized in the following theorem. Lemma 4 Under Assumptions 2.1 and 3.2, pl i m θ I PW pl s = θ. Consistency can be shown with the law of iterated expectation. See the appendix A for a proof. The consistency for the IPW-FE and IPW-FD estimators under the MAR assumption can be similarly obtained under assumptions 2.1 and 3.2. Following Wooldridge [2007], we can relax the assumption of complete observability of the predictors to partial observability of the predictors for the consistency of the IPW-LS and the IPW-FD estimators as follows: Assumption 3.2A A random vector z i t is observed whenever s i t = 1. The results we use for the proof of consistency of the IPW-LS and IPW-FD estimators remain the same except the first-stage estimation of G(z i t, γ). We instead use the following equation to estimate f (z i t ) because we do not have observation z i t for unit i with s i t = 0: p(s i t = 1 z i t ) = p(s i t = 1 s i t 1 = 1, z i t 1 ) p(s i t 1 = 1 s i t 2 = 1, z i t 2 ) p(s i 1 = 1) G i t (36) where we assume p(s i 1 = 1) = 1(i.e. random sample is assumed at the beginning of period). We estimate π i t by using parametric models such as a Probit model as in (37): 18

19 π i t = Φ(c p + z i t 1 γ > 0), for s i t 1 = 1 (37) where z i t 1 is observed and Φ( ) is the standard normal CDF. A HT test statistic could be constructed by using the difference between two inverse probability weighted estimators from the IPW-LS, IPW-FE and IPW-FD estimators under complete observability of predictors z i t or from the IPW-LS and IPW-FD estimators under partial observability of predictors z i t 1. As an illustration, the IPW-LS and IPW-FE estimators are considered. We stack two k-dimensional consistent estimators θ I PW l s and θ I PW F E and use the stacked estimator θ I PW 1 = ( θ I PW LS, θ I PW F E ) to construct a HT test statistic as in the previous section: N θi PW 1 = 1 N N 0 s i t x i t x i t G(z i t, γ) 0 1 N N x w i t x w i t 1 2k 2k 1 N 1 N N N s i t x i t y i t G(z i t, γ) x w i t ỹ w i t 2k 1. (38) The null hypothesis of the test is provided with a linear transformation of θ I PW 1 by multiplying a restriction matrix R. This transformation provides a restriction of θ j,i PW l s = θ j,i PW F E where j = 1,..., p. The null hypothesis H 0 : Rθ I PW 1 = 0 equals H 0 : θ j,i PW pl s = θ j,i PW F E j = 1,2,..., p. Under conditional strict exogeneity, with the correct model specification for observability, and rank condition and regularity conditions, it could be shown that W I PW 1 converges to χ 2 p distribution under the null hypothesis of Rθ I PW 1 = 0. Proposition 5 Under the Assumptions 3.1 and 3.2, a test statistic W I PW 1 converges to χ 2 p distribution under the null hypothesis of H 0 : Rθ I PW 1 = 0, where W I PW 1 = [R N ( θ I PW 1 )] [R var ( N θ I PW 1 )R ] 1 R N ( θ I PW 1 ) χ 2 p. The other HT test statistics that use the IPW estimators can be obtained as follows: W I PW 2 = [R N ( θ I PW 2 )] [R var ( N θ I PW 2 )R ] 1 R N ( θ I PW 2 ) χ 2 p, W I PW 3 = [R N ( θ I PW 3 )] [R var ( N θ I PW 3 )R ] 1 R N ( θ I PW 3 ) χ 2 p 19

20 where θ I PW 2 = ( θ I PW F D, θ I PW F E ) and θ I PW 3 = ( θ I PW l s, θ I PW F D ). With partially available predictors, a valid test statistic cannot be constructed using θ I PW 1 or θ I PW 2 because of the sequential nature of obtaining probability weight estimates. 6 Thus, W I PW 3 can be only used for partially observed predictors. A test statistic for the test of one parameter can be similarly obtained using pairwise difference between two IPW estimators. A statistict I PW q is given below: R j N ( θi PW q ) T I PW q = R j var ( N θ t n 1 z (39) I PW q )R as y j where q = 1,2,3 with the null hypothesis H 0 : R j θ q = 0 and where R j is j th row of R matrix, which is a 1 2k row vector. 3.3 Two-step estimator We can also extend the HT test applicable with sample-selection correction using the MAR assumption (i.e. selection on observables, z i t ) to model selection/observability with a joint normality assumption on outcome and observability errors. Consider the following outcome and observability model: y i t = x i t β + c i + u i t, and 0 = E(u i t x i t ) E(u i t x i t, s i t = 1) and impose the following condition. Assumption 3.3 A random vector z i t is observed and selection model is provided as follows: s i t = 1(z i t γ + v i t > 0) and E(u i t v i t ) = ρv i t. Then, using the assumption that E(u i t v i t ) = ρv i t, we obtain: E(y i t z i t, s i t = 1) = x i t β + c i + ρe(v i t z i t, v i t > z i t γ) = x i t β + c i + ρ φ( z i t γ σ v ) Φ( z i t γ σ v ). (40) Missingness is not random because correlation between u i t and v i t (i.e. ρ 0) and correlation between x i t and v i t cause inconsistency for an estimator of β. However, the pooled LS, FE and FD es- 6 For the IPW-FE estimator, probability of observability at time t is a function of all time periods because demeaning transformation involves all time periods. 20

21 timators from the regressions of (40) are all consistent because using the selection correction term φ( z i t γ σv ) as an additional regressor removes the inconsistency because of nonrandom missingness (i.e. ) Φ( z i t γ σv E(u i t z i t, s i t = 1) = 0). We can construct the HT test statistics by comparing these two-step estimators of β. Given that nonrandom missingness is the sole source of inconsistency of β, the assumptions of correct specification of selection model s i t = 1(z i t γ + v i t > 0) and E(u i t v i t ) = ρv i t provide consistent estimators for β from two-step estimations. As an illustration, the two-step pooled LS and two-step pooled FE estimators are considered. We stack two k-dimensional consistent estimators θ T S l s and θ T S F E and use the stacked estimator θ T S 1 = ( θ T S l s, θ T S F E ) to construct a HT test statistic as in the previous section: N θt S 1 = 1 N N s i t x i t x i t N N.. s i t x.. i t x i t 1 2k 2k 1 N 1 N N s i t x i t y i t N.. s i t x.. i t y i t 2k 1 (41) where x i t = (x i t, φ( z i t γ σv ) Φ( z i t γ σv )). The null hypothesis of the test is provided with a linear transformation of θ T S 1 by multiplying R, which is the same as the test of MCAR in the previous section. The next proposition provides asymptotic distribution of a test statistic. Proposition 6 Under the Assumptions 3.1 and 3.3, a test statistic W T S 1 converges to χ 2 p distribution under the null hypothesis of H 0 : Rθ T S 1 = 0, where W T S 1 = [R N ( θ T S 1 )] [R var ( N θ T S 1 )R ] 1 R N ( θ T S 1 ) χ 2 p. 4 Monte Carlo Experiment Finite-sample performance of the HT tests for the MCAR assumption and the MAR assumption with the IPW and two-step estimators are provided. The results show that no size distortion and high power of the HT tests in various models. 4.1 Tests of the MCAR assumption We consider the following linear panel data model where the response variable is continuous and explanatory variables can be either binary or continuous: 21

22 y i t = x i t β + c 1i + u i t, for i = 1,2,...,n and t = 1,2,...,T (42) where x i t includes binary (treatment dummy) variables and continuous variables. We are interested in performing statistical inference on β. Selection processes reduce the sample size from 40 to 70 percent of full samples. We consider observations of cross-section dimension from n=100 to n=1,000 and of time-dimension from T =4 to T =8. Note that s i 1 = 1; s i t = 1(c p + z i t γ + c 2i + v i t > 0) (43) where c p is chosen to make the missing fraction be between 40 and 70 percent. The violation of the MCAR assumption is induced by the correlation between u i t (or c i ) and x i t conditional on observability (s i t = 1) Binary explanatory variable d i t and nonrandom missingness We consider following two data generating processes (DGPs) as benchmark where we are interested in estimating β in equation (42). The following DGPs are chosen to mimic our empirical example in a simple way. 7 DGP I 1. c 1i = i i d N (0, 1 16 ) + w 1i ; c 2i = q i + w 2i ; q i i i d N (0,1). 2. (w 1i, w 2i ) B N 0, 1 ρ where B N is bivariate normal. 0 ρ 1 3. d i t = q i + i i d N (0, 1 16 ); x 1i t = 1 if d i t > 0; β 1 = x 2i t i i d N (0,1); β 2 = z i t i i d N (0,1); γ = u i t i i d N (0,1); v i t i i d N (0,1). 7 To save space, we only report the HT tests of the MCAR assumption on the coefficient on a binary covariate, d i t while we report the HT tests for the MAR assumption in estimating continuous covariate. We provide DGPs for the HT tests of the MCAR assumption on a continuous covariate in the Appendix B. The qualitative implications from simulations remain exactly the same. 22

23 First, in the DGP I, the violation of the MCAR assumption is induced by the correlation between the time-invariant unobserved factor (w 1 ) and x 1i t conditional on observability. The correlation is indirect as it appears between error in selection model and x 1i t, and between error in selection model and timeinvariant unobserved factor in c 1i. Therefore, although x 1i t and c 1i are not directly correlated, they are correlated conditional on s i t = 1. The pooled LS estimator is inconsistent but the FE and FD estimators are consistent conditional on observability because the unobserved factor in c 1i causes inconsistency on the coefficient of x 1i t. In DGP I, the null hypothesis of the MCAR assumption is satisfied only if ρ = 0. In addition, ideal conditions for the RE model are satisfied under the null hypothesis. Thus, we can replace the LS estimator with the RE estimator in the construction of HT test. However, in the DGP I under alternative of ρ 0, the LS and RE estimators are inconsistent while the FE and FD estimators are consistent. Thus, we perform tests of the MCAR assumption using the HT statistics of W 1 (based on difference between the LS and FE estimators), W 2 (based on difference between the FD and FE estimators), W 3 (based on difference between the LS and FD estimators). DGP II 1. c 1i = i i d N (0,1); c 2i = i i d N (0,1). 2. z i t i i d N (0, 1 4 ), γ = d i t = z i t α + i i d N (0, 1 16 ); x 1i t = 1 if d i t > 0; β 1 = x i t i i d N (0,1); β 2 = (u i t, v i t ) B N 0, 1 ρ 0 ρ 1. Second, in the DGP II, the violation of the MCAR assumption is induced by the correlation between timevarying unobserved factor (u i t ) and x 1i t conditional on observability. In this case, all three estimators the pooled LS, FE, and FD estimators are inconsistent. The null hypothesis is satisfied if ρ = 0. Under the null, ideal conditions for these estimators are satisfied. All three estimators are inconsistent, under the alternative of ρ 0, but the magnitude of inconsistency differs among estimators Simulation results MC simulations are performed with 2,000 replications for T =4 and T =8 and n ranging from 100 to 1,000 and cluster robust standard errors are used in all simulations. The magnitude of correlation because of 23

24 nonrandom missingness and its effect on the inconsistency of β 1 in (42) are determined by ρ. When ρ = 0, no differential missingness across x 1i t occurs so no inconsistency is caused by nonrandom missingness. Finite-sample performance of the HT tests when covariate of interest is binary is examined in tables 1 and 2 for DGP I and tables 3 and 4 for DGP II. The true value of β 1 is 1 and the rejection rate of the null hypothesis should be nominal size, 0.05, under the true null hypothesis. When ρ is zero, no inconsistency on β 1 is caused by nonrandom missingness. Five HT tests and three variable addition tests using four panel data estimators such as the LS, RE, FE, and FD estimators are examined. The mean of the MC point estimates for β 1 is close to true value 1 for all four estimators under ρ = 0 regardless of missing proportion. In the DGP I with ρ = 0, the null hypothesis is true. 8 Table 1 shows that, for DGP I with missing proportion of 40%, coverages of rejection probability contain 0.05 for all five tests. As missing proportion increases from 40% to 60%, slight over-rejections of null hypothesis appear for some of HT tests when N = 100 and N = 500, but coverages of rejection probability of all five HT tests contain 0.05 when N = 1,000. Proposed five HT tests show no size bias. Finite-sample behavior of the HT tests improves as N increases. The last three columns report rejection probability for variable addition tests using the pooled LS, FE and FD estimators. Over-rejection appears for the pooled LS estimator and size bias is larger when missing proportion is higher. Coverages of rejection probability contain 0.05 in cases for the FE and FD estimators regardless of missing proportion. Variable addition tests with the FE and FD estimators show no size bias while the pooled LS estimator shows slight over-rejection. Under DGP I where inconsistency on an estimator of β 1 is caused by conditional correlation between covariate x 1i t and unobserved time-invariant factors, the pooled LS is inconsistent when ρ is not equal to zero. In simulation, we choose ρ = 0.9 to induce inconsistency of 50% for a missing proportion of 40% and inconsistency of 100% for a missing proportion of 60%. 9 No inconsistency arises for the FE and FD estimators because FE and FD transformations eliminate unobserved time-invariant factors that cause inconsistency because of nonrandom missing data. In Table 1, the last six rows report results when ρ is 0.9. The sample means of the MC point estimates for β are about 0.47 and 0.07 for the pooled LS estimator with missing proportions of 40% and 60%, respectively. Finite-sample mean estimates for β 1 do not depend on cross-sectional dimension, n, for the pooled LS estimator. On the other hand, the mean values of the MC point estimates of β 1 for the FE and FD estimators are 1 for all n regardless of the missing proportion. Therefore, the Pooled LS estimator is inconsistent while the FE and FD estimators 8 In this paper, all simulations of tests are based on 95% confidence interval. 9 Inconsistency increases for the Pooled LS as ρ increases. 24

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Specification Tests in Unbalanced Panels with Endogeneity.

Specification Tests in Unbalanced Panels with Endogeneity. Specification Tests in Unbalanced Panels with Endogeneity. Riju Joshi Jeffrey M. Wooldridge June 22, 2017 Abstract This paper develops specification tests for unbalanced panels with endogenous explanatory

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University

More information

Specification testing in panel data models estimated by fixed effects with instrumental variables

Specification testing in panel data models estimated by fixed effects with instrumental variables Specification testing in panel data models estimated by fixed effects wh instrumental variables Carrie Falls Department of Economics Michigan State Universy Abstract I show that a handful of the regressions

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Princeton University Asian Political Methodology Conference University of Sydney Joint

More information

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

Non-linear panel data modeling

Non-linear panel data modeling Non-linear panel data modeling Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini May 2010 Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Jeffrey M. Wooldridge Michigan State University

Jeffrey M. Wooldridge Michigan State University Fractional Response Models with Endogenous Explanatory Variables and Heterogeneity Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. Fractional Probit with Heteroskedasticity 3. Fractional

More information

Estimating Panel Data Models in the Presence of Endogeneity and Selection

Estimating Panel Data Models in the Presence of Endogeneity and Selection ================ Estimating Panel Data Models in the Presence of Endogeneity and Selection Anastasia Semykina Department of Economics Florida State University Tallahassee, FL 32306-2180 asemykina@fsu.edu

More information

Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook)

Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook) Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook) 1 2 Panel Data Panel data is obtained by observing the same person, firm, county, etc over several periods. Unlike the pooled cross sections,

More information

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 6 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 21 Recommended Reading For the today Advanced Panel Data Methods. Chapter 14 (pp.

More information

New Developments in Econometrics Lecture 16: Quantile Estimation

New Developments in Econometrics Lecture 16: Quantile Estimation New Developments in Econometrics Lecture 16: Quantile Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. Review of Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile

More information

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63 1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS17/18 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:

More information

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics? Lecture 14 Quantile Methods What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression

More information

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M.

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. Linear

More information

Econ 582 Fixed Effects Estimation of Panel Data

Econ 582 Fixed Effects Estimation of Panel Data Econ 582 Fixed Effects Estimation of Panel Data Eric Zivot May 28, 2012 Panel Data Framework = x 0 β + = 1 (individuals); =1 (time periods) y 1 = X β ( ) ( 1) + ε Main question: Is x uncorrelated with?

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix

More information

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Matthew Harding and Carlos Lamarche January 12, 2011 Abstract We propose a method for estimating

More information

EC327: Advanced Econometrics, Spring 2007

EC327: Advanced Econometrics, Spring 2007 EC327: Advanced Econometrics, Spring 2007 Wooldridge, Introductory Econometrics (3rd ed, 2006) Chapter 14: Advanced panel data methods Fixed effects estimators We discussed the first difference (FD) model

More information

Casuality and Programme Evaluation

Casuality and Programme Evaluation Casuality and Programme Evaluation Lecture V: Difference-in-Differences II Dr Martin Karlsson University of Duisburg-Essen Summer Semester 2017 M Karlsson (University of Duisburg-Essen) Casuality and Programme

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Increasing the Power of Specification Tests. November 18, 2018

Increasing the Power of Specification Tests. November 18, 2018 Increasing the Power of Specification Tests T W J A. H U A MIT November 18, 2018 A. This paper shows how to increase the power of Hausman s (1978) specification test as well as the difference test in a

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Ninth ARTNeT Capacity Building Workshop for Trade Research Trade Flows and Trade Policy Analysis Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

Fixed Effects Models for Panel Data. December 1, 2014

Fixed Effects Models for Panel Data. December 1, 2014 Fixed Effects Models for Panel Data December 1, 2014 Notation Use the same setup as before, with the linear model Y it = X it β + c i + ɛ it (1) where X it is a 1 K + 1 vector of independent variables.

More information

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. The Basic Methodology 2. How Should We View Uncertainty in DD Settings?

More information

Multiple Equation GMM with Common Coefficients: Panel Data

Multiple Equation GMM with Common Coefficients: Panel Data Multiple Equation GMM with Common Coefficients: Panel Data Eric Zivot Winter 2013 Multi-equation GMM with common coefficients Example (panel wage equation) 69 = + 69 + + 69 + 1 80 = + 80 + + 80 + 2 Note:

More information

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid Applied Economics Panel Data Department of Economics Universidad Carlos III de Madrid See also Wooldridge (chapter 13), and Stock and Watson (chapter 10) 1 / 38 Panel Data vs Repeated Cross-sections In

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

10 Panel Data. Andrius Buteikis,

10 Panel Data. Andrius Buteikis, 10 Panel Data Andrius Buteikis, andrius.buteikis@mif.vu.lt http://web.vu.lt/mif/a.buteikis/ Introduction Panel data combines cross-sectional and time series data: the same individuals (persons, firms,

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

ECON 4551 Econometrics II Memorial University of Newfoundland. Panel Data Models. Adapted from Vera Tabakova s notes

ECON 4551 Econometrics II Memorial University of Newfoundland. Panel Data Models. Adapted from Vera Tabakova s notes ECON 4551 Econometrics II Memorial University of Newfoundland Panel Data Models Adapted from Vera Tabakova s notes 15.1 Grunfeld s Investment Data 15.2 Sets of Regression Equations 15.3 Seemingly Unrelated

More information

Final Exam. Economics 835: Econometrics. Fall 2010

Final Exam. Economics 835: Econometrics. Fall 2010 Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each

More information

Estimation of Dynamic Panel Data Models with Sample Selection

Estimation of Dynamic Panel Data Models with Sample Selection === Estimation of Dynamic Panel Data Models with Sample Selection Anastasia Semykina* Department of Economics Florida State University Tallahassee, FL 32306-2180 asemykina@fsu.edu Jeffrey M. Wooldridge

More information

Calibration Estimation for Semiparametric Copula Models under Missing Data

Calibration Estimation for Semiparametric Copula Models under Missing Data Calibration Estimation for Semiparametric Copula Models under Missing Data Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Economics and Economic Growth Centre

More information

Panel data methods for policy analysis

Panel data methods for policy analysis IAPRI Quantitative Analysis Capacity Building Series Panel data methods for policy analysis Part I: Linear panel data models Outline 1. Independently pooled cross sectional data vs. panel/longitudinal

More information

Econometric Methods for Panel Data

Econometric Methods for Panel Data Based on the books by Baltagi: Econometric Analysis of Panel Data and by Hsiao: Analysis of Panel Data Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies

More information

What s New in Econometrics. Lecture 1

What s New in Econometrics. Lecture 1 What s New in Econometrics Lecture 1 Estimation of Average Treatment Effects Under Unconfoundedness Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Potential Outcomes 3. Estimands and

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 3 Jakub Mućk Econometrics of Panel Data Meeting # 3 1 / 21 Outline 1 Fixed or Random Hausman Test 2 Between Estimator 3 Coefficient of determination (R 2

More information

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible

More information

ECO375 Tutorial 8 Instrumental Variables

ECO375 Tutorial 8 Instrumental Variables ECO375 Tutorial 8 Instrumental Variables Matt Tudball University of Toronto Mississauga November 16, 2017 Matt Tudball (University of Toronto) ECO375H5 November 16, 2017 1 / 22 Review: Endogeneity Instrumental

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 1 Jakub Mućk Econometrics of Panel Data Meeting # 1 1 / 31 Outline 1 Course outline 2 Panel data Advantages of Panel Data Limitations of Panel Data 3 Pooled

More information

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE Chapter 6. Panel Data Joan Llull Quantitative Statistical Methods II Barcelona GSE Introduction Chapter 6. Panel Data 2 Panel data The term panel data refers to data sets with repeated observations over

More information

Linear dynamic panel data models

Linear dynamic panel data models Linear dynamic panel data models Laura Magazzini University of Verona L. Magazzini (UniVR) Dynamic PD 1 / 67 Linear dynamic panel data models Dynamic panel data models Notation & Assumptions One of the

More information

Applied Health Economics (for B.Sc.)

Applied Health Economics (for B.Sc.) Applied Health Economics (for B.Sc.) Helmut Farbmacher Department of Economics University of Mannheim Autumn Semester 2017 Outlook 1 Linear models (OLS, Omitted variables, 2SLS) 2 Limited and qualitative

More information

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Tak Wai Chau February 20, 2014 Abstract This paper investigates the nite sample performance of a minimum distance estimator

More information

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning Økonomisk Kandidateksamen 2004 (I) Econometrics 2 Rettevejledning This is a closed-book exam (uden hjælpemidler). Answer all questions! The group of questions 1 to 4 have equal weight. Within each group,

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Advanced Econometrics

Advanced Econometrics Based on the textbook by Verbeek: A Guide to Modern Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna May 16, 2013 Outline Univariate

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

Christopher Dougherty London School of Economics and Political Science

Christopher Dougherty London School of Economics and Political Science Introduction to Econometrics FIFTH EDITION Christopher Dougherty London School of Economics and Political Science OXFORD UNIVERSITY PRESS Contents INTRODU CTION 1 Why study econometrics? 1 Aim of this

More information

Gravity Models: Theoretical Foundations and related estimation issues

Gravity Models: Theoretical Foundations and related estimation issues Gravity Models: Theoretical Foundations and related estimation issues ARTNet Capacity Building Workshop for Trade Research Phnom Penh, Cambodia 2-6 June 2008 Outline 1. Theoretical foundations From Tinbergen

More information

On the Use of Linear Fixed Effects Regression Models for Causal Inference

On the Use of Linear Fixed Effects Regression Models for Causal Inference On the Use of Linear Fixed Effects Regression Models for ausal Inference Kosuke Imai Department of Politics Princeton University Joint work with In Song Kim Atlantic ausal Inference onference Johns Hopkins

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Short T Panels - Review

Short T Panels - Review Short T Panels - Review We have looked at methods for estimating parameters on time-varying explanatory variables consistently in panels with many cross-section observation units but a small number of

More information

Reliability of inference (1 of 2 lectures)

Reliability of inference (1 of 2 lectures) Reliability of inference (1 of 2 lectures) Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics

More information

Econometrics for PhDs

Econometrics for PhDs Econometrics for PhDs Amine Ouazad April 2012, Final Assessment - Answer Key 1 Questions with a require some Stata in the answer. Other questions do not. 1 Ordinary Least Squares: Equality of Estimates

More information

A Practical Test for Strict Exogeneity in Linear Panel Data Models with Fixed Effects

A Practical Test for Strict Exogeneity in Linear Panel Data Models with Fixed Effects A Practical Test for Strict Exogeneity in Linear Panel Data Models with Fixed Effects Liangjun Su, Yonghui Zhang,JieWei School of Economics, Singapore Management University School of Economics, Renmin

More information

Applied Quantitative Methods II

Applied Quantitative Methods II Applied Quantitative Methods II Lecture 10: Panel Data Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 10 VŠE, SS 2016/17 1 / 38 Outline 1 Introduction 2 Pooled OLS 3 First differences 4 Fixed effects

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.

More information

Multiple Regression Analysis: Heteroskedasticity

Multiple Regression Analysis: Heteroskedasticity Multiple Regression Analysis: Heteroskedasticity y = β 0 + β 1 x 1 + β x +... β k x k + u Read chapter 8. EE45 -Chaiyuth Punyasavatsut 1 topics 8.1 Heteroskedasticity and OLS 8. Robust estimation 8.3 Testing

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

CORRELATED RANDOM EFFECTS MODELS WITH UNBALANCED PANELS

CORRELATED RANDOM EFFECTS MODELS WITH UNBALANCED PANELS CORRELATED RANDOM EFFECTS MODELS WITH UNBALANCED PANELS Jeffrey M. Wooldridge Department of Economics Michigan State University East Lansing, MI 48824-1038 wooldri1@msu.edu July 2009 I presented an earlier

More information

Don t be Fancy. Impute Your Dependent Variables!

Don t be Fancy. Impute Your Dependent Variables! Don t be Fancy. Impute Your Dependent Variables! Kyle M. Lang, Todd D. Little Institute for Measurement, Methodology, Analysis & Policy Texas Tech University Lubbock, TX May 24, 2016 Presented at the 6th

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 4 Jakub Mućk Econometrics of Panel Data Meeting # 4 1 / 30 Outline 1 Two-way Error Component Model Fixed effects model Random effects model 2 Non-spherical

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Linear Panel Data Models

Linear Panel Data Models Linear Panel Data Models Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania October 5, 2009 Michael R. Roberts Linear Panel Data Models 1/56 Example First Difference

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II Jeff Wooldridge IRP Lectures, UW Madison, August 2008 5. Estimating Production Functions Using Proxy Variables 6. Pseudo Panels

More information

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han Econometrics Honor s Exam Review Session Spring 2012 Eunice Han Topics 1. OLS The Assumptions Omitted Variable Bias Conditional Mean Independence Hypothesis Testing and Confidence Intervals Homoskedasticity

More information

8. Instrumental variables regression

8. Instrumental variables regression 8. Instrumental variables regression Recall: In Section 5 we analyzed five sources of estimation bias arising because the regressor is correlated with the error term Violation of the first OLS assumption

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC

More information

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors

The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors The Exact Distribution of the t-ratio with Robust and Clustered Standard Errors by Bruce E. Hansen Department of Economics University of Wisconsin October 2018 Bruce Hansen (University of Wisconsin) Exact

More information

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 217, Chicago, Illinois Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information

Weak Instruments and the First-Stage Robust F-statistic in an IV regression with heteroskedastic errors

Weak Instruments and the First-Stage Robust F-statistic in an IV regression with heteroskedastic errors Weak Instruments and the First-Stage Robust F-statistic in an IV regression with heteroskedastic errors Uijin Kim u.kim@bristol.ac.uk Department of Economics, University of Bristol January 2016. Preliminary,

More information

Comment on Tests of Certain Types of Ignorable Nonresponse in Surveys Subject to Item Nonresponse or Attrition

Comment on Tests of Certain Types of Ignorable Nonresponse in Surveys Subject to Item Nonresponse or Attrition Institute for Policy Research Northwestern University Working Paper Series WP-09-10 Comment on Tests of Certain Types of Ignorable Nonresponse in Surveys Subject to Item Nonresponse or Attrition Christopher

More information

Small-sample cluster-robust variance estimators for two-stage least squares models

Small-sample cluster-robust variance estimators for two-stage least squares models Small-sample cluster-robust variance estimators for two-stage least squares models ames E. Pustejovsky The University of Texas at Austin Context In randomized field trials of educational interventions,

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

Economics 308: Econometrics Professor Moody

Economics 308: Econometrics Professor Moody Economics 308: Econometrics Professor Moody References on reserve: Text Moody, Basic Econometrics with Stata (BES) Pindyck and Rubinfeld, Econometric Models and Economic Forecasts (PR) Wooldridge, Jeffrey

More information

INVERSE PROBABILITY WEIGHTED ESTIMATION FOR GENERAL MISSING DATA PROBLEMS

INVERSE PROBABILITY WEIGHTED ESTIMATION FOR GENERAL MISSING DATA PROBLEMS IVERSE PROBABILITY WEIGHTED ESTIMATIO FOR GEERAL MISSIG DATA PROBLEMS Jeffrey M. Wooldridge Department of Economics Michigan State University East Lansing, MI 48824-1038 (517) 353-5972 wooldri1@msu.edu

More information

1 Estimation of Persistent Dynamic Panel Data. Motivation

1 Estimation of Persistent Dynamic Panel Data. Motivation 1 Estimation of Persistent Dynamic Panel Data. Motivation Consider the following Dynamic Panel Data (DPD) model y it = y it 1 ρ + x it β + µ i + v it (1.1) with i = {1, 2,..., N} denoting the individual

More information