Estimation of Dynamic Panel Data Models with Sample Selection

Anastasia Semykina*
Department of Economics, Florida State University, Tallahassee, FL

Jeffrey M. Wooldridge
Department of Economics, Michigan State University, East Lansing, MI

March 2, 2011

* Correspondence to: Anastasia Semykina, Department of Economics, Florida State University, Tallahassee, FL, USA. E-mail: asemykina@fsu.edu. We thank the editor M. Hashem Pesaran and three anonymous referees for their useful comments.

Summary

We propose a new method for estimating dynamic panel data models with selection. The method uses backward substitution for the lagged dependent variable, which leads to an estimating equation that requires correcting for contemporaneous selection only. The estimator is valid under relatively weak assumptions about errors and permits avoiding the weak instruments problem associated with differencing. We also propose a simple test for selection bias that is based on the addition of a selection term to the first-difference equation and subsequent testing for significance of this term. The methods are applied to estimating dynamic earnings equations for women.

Key words: Sample selection, Panel data, Dynamic models, Two-step estimation.

1 Introduction

Recently developed methods for estimating dynamic unobserved effects panel data models have become widely used in applied economics research. In the present paper, we contribute to the literature by developing a new estimation method for models in which the panel is unbalanced because of nonrandom selection.

In the absence of selection, the traditional approach to estimating dynamic panel data models is to remove the unobserved effect by first-differencing and then use instrumental variables methods for estimating the differenced equation. This approach was initially proposed by Anderson and Hsiao (1981) and was later considered within a more efficient generalized method of moments (GMM) framework by Holtz-Eakin, Newey and Rosen (1988), Arellano and Bond (1991), Ahn and Schmidt (1995), and others. Blundell and Bond (1998) raised the problem of weak instruments in the context of first-differenced GMM estimation. This problem arises when the series are highly persistent, which happens in a simple AR(1) model with the autoregressive coefficient close to unity.¹ Blundell and Bond show that imposing restrictions on the initial condition results in additional linear moments that can help to improve the performance of the GMM estimator. As an alternative solution, they model the relationship between the unobserved effect and the initial condition through a linear function and suggest using the generalized least squares estimator on the extended model, where the initial value is included in the conditioning set.

Several previous studies considered estimation of dynamic panel data models with selectivity; most of them use differencing to remove the unobserved effect.² Ziliak and Kniesner (1998) and Wooldridge (2002) propose a solution to the selection problem that arises because of nonrandom attrition. Given the nature of attrition as an absorbing state (if the unit is observed in the current period, it is also observed in the previous period), Ziliak and Kniesner, and Wooldridge show that accounting for current-period selection in the differenced equation results in consistent estimation. Under the assumption that errors in the selection equation are normally distributed, the selection correction term is the inverse Mills ratio.

¹ Binder, Hsiao and Pesaran (2005) show that the same problem arises in panel vector autoregressive models.
² Dynamic panel data models with censoring are considered, for example, by Honore and Hu (2004), Hu (2002) and Labeaga (1999). See also Bover and Arellano (1997).

Arellano, Bover and Labeaga (1999) consider autoregressive panel data models with sample selection. They model the conditional expectation of the unobserved effect as a linear function of the past values of the dependent variable and consider the distribution of the dependent variable conditional on its past. For each $t$, the resulting reduced-form equation is estimated on a sub-sample of data, which includes cross-section units without missing past values. Arellano, Bover and Labeaga assume normality of the error terms in both the primary and selection equations and use the inverse Mills ratio to account for the fact that only the sub-samples with observed past values are used. The structural autoregressive coefficient is then recovered from the reduced-form coefficients using the restrictions imposed on parameters.

Another solution to the incidental truncation problem in dynamic panel data models was proposed by Kyriazidou (2001), who suggested taking differences between any two periods in which the selection index for the given unit is the same or similar. Under the assumption that the vector of errors is independent and identically distributed over time conditional on the exogenous variables, differencing eliminates both the unobserved effect and the selection effect. For consistency, it is crucial that the assumptions of strict stationarity and conditional serial independence of the errors hold.
Moreover, the estimator converges at a rate that is slower than the usual square root of the cross-section sample size.

Another semiparametric estimator was proposed by Gayle and Viauroux (2007), who consider a three-step sieve estimator. In the first step the selection probabilities in each period are estimated nonparametrically by a kernel estimator. In the second step the inverse probability function is linearized, the unobserved effect is removed by differencing, and the parameters in the linearized specification of the inverse probability function are estimated using a sieve minimum distance estimator (a GMM estimator with series used to approximate unknown functions). In the third step the GMM estimator is used to estimate the differenced primary equation augmented by the correction term, where the differenced correction term is again approximated by series estimators.

As mentioned above, most earlier studies use differencing. A benefit of differencing for unbalanced panels is that it removes additive heterogeneity, and therefore any selection is allowed to be arbitrarily correlated with the heterogeneity in the levels equation. Unfortunately, if selection depends on the idiosyncratic shocks, consistent estimation requires either imposing relatively strong assumptions on the properties of error distributions or deriving a complicated selection correction term that accounts for selection in several consecutive periods. As noted by Blundell and Bond (1998), differencing may also lead to a weak instruments problem. Furthermore, in the case of incidental truncation, such as labor force participation, units may drop out and appear again in any period; therefore, the use of first-differencing or otherwise conditioning on observability of the dependent variable in multiple consecutive periods in dynamic panel data models with arbitrary selection patterns implies that much of the data is lost.

In this paper we consider an alternative method for estimating dynamic panel data models with selection, which does not rely on differencing. One of the key assumptions is that the initial condition is observed for all cross-section units. To account for unobserved heterogeneity, rather than using differencing we follow Blundell and Bond (1998) and Chamberlain (1980, 1982, 1984), and model the conditional expectation of the unobserved effect as a linear function of the exogenous variables and the initial condition. Then, backward substitution for the lagged dependent variable is used to obtain the equation
that contains the lags of the exogenous explanatory variables (which are assumed to be always observed) and the initial condition, but no lags of the dependent variable. As a result, selection correction reduces to a contemporaneous selection problem of the type studied in Wooldridge (1995) with strictly exogenous variables. The ability to focus on selection period-by-period greatly simplifies the derivation of the correction term while allowing general serial correlation in the error of the selection equation. The simplest approach relies on the assumption that the error terms in the selection equation are normally distributed, but we also briefly discuss the possibility of semiparametric estimation. Once the correction term is obtained, the augmented equation can be consistently estimated by nonlinear least squares (NLS) or GMM.

The new estimation methods have several important advantages. Modeling the unobserved effects allows us to estimate the equation of interest in levels, thereby avoiding the weak instruments problem often associated with the estimators that use differencing. In the discussed context the error terms in both the primary and selection equations may be heterogeneously distributed over time, and the error in the selection equation may be arbitrarily serially dependent. We also discuss how estimation can be modified, so that the observability of the initial condition is not required, and serial dependence in the error terms is permitted in both equations. Additionally, the approach proposed here makes use of all cross-section units observed at least once after the initial period, which helps to avoid losing data.

2 The Model

Consider a dynamic panel data model with unobserved heterogeneity:

$$y_{it} = \rho y_{i,t-1} + x_{it}\beta + c_{i1} + u_{it1}, \quad t = 1,\ldots,T, \tag{1}$$
where $x_{it}$ is a $1 \times K$ vector of time-varying variables, $\beta$ is a $K \times 1$ vector of parameters, $\rho$ is a scalar parameter, $c_{i1}$ is a time-constant unobserved effect, and $u_{it1}$ is an idiosyncratic error. Variables in $x_{it}$ are assumed to be strictly exogenous conditional on the unobserved effect, but may be correlated with $c_{i1}$.

Selection occurs because of the partial observability of the dependent variable, $y_{it}$. This is modeled by specifying a selection rule

$$s_{it} = 1[z_{it}\delta_{2t} + c_{i2} + u_{it2} > 0], \quad t = 1,\ldots,T, \tag{2}$$

where $s_{it}$ is a selection indicator that equals one if $y_{it}$ is observed and is zero otherwise, $c_{i2}$ is a time-constant unobserved effect, $u_{it2}$ is an idiosyncratic error, $z_{it}$ is a $1 \times L$ ($L > K$) vector of variables that are strictly exogenous conditional on the unobserved effect, and $\delta_{2t}$ is an $L \times 1$ vector of parameters. In what follows, it is assumed that $z_{it}$ contains all of the regressors from the primary equation, but must also contain at least one additional time-varying variable. Additional variables may be the factors that affect selection but not the dependent variable in the primary equation. Alternatively, if selection is partly determined by the lagged values of $y_{it}$ (as in some labor supply models, for example), vector $z_{it}$ would include lagged values of $x_{it}$.

Given the selection problem, estimation of equation (1) by differencing is complicated for several reasons. First, we need to observe the dependent variable and explanatory variables in the current and previous periods. Because of the lagged dependent variable, we would only be able to use observations where $y_{it}$ is observed in three consecutive periods. Moreover, any selection correction term would involve conditioning on observability in three different periods, making its derivation and estimation difficult.

We can avoid these problems by substituting back for $y_{i,t-1}$ and expressing $y_{it}$ through
the current and lagged values of the explanatory variables and the initial condition, $y_{i0}$:

$$y_{it} = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1} \rho^j x_{i,t-j}\right)\beta + c_{i1}\sum_{j=0}^{t-1}\rho^j + \sum_{j=0}^{t-1}\rho^j u_{i,t-j,1}, \quad t = 1,\ldots,T. \tag{3}$$

Denote $z_i \equiv (z_{i1}, z_{i2}, \ldots, z_{iT})$. Given (3), the estimating equation can be derived under the following assumption:

ASSUMPTION 2.1
(i) $y_{i0}$ and $z_i$ are always observed, while $y_{it}$, $t = 1,\ldots,T$, are observed only for $s_{it} = 1$.
(ii) $E(u_{it1} \mid x_{it}, y_{i,t-1}, x_{i,t-1}, \ldots, y_{i0}, c_{i1}) = 0$, so that $\mathrm{Cov}(u_{it1}, u_{is1}) = 0$ for all $s \neq t$.
(iii) $E(u_{it1} \mid z_i, y_{i0}, c_{i1}) = 0$, $t = 1,\ldots,T$.
(iv) $c_{i1} = \eta_1 + \sum_{s=1}^{T} z_{is}\xi_s + \gamma_1 y_{i0} + a_{i1}$, $E(a_{i1} \mid z_i, y_{i0}) = 0$.
(v) $c_{i2} = \eta_2 + \sum_{s=1}^{T} z_{is}\psi_s + \gamma_2 y_{i0} + a_{i2}$.
(vi) For $v_{it2} = a_{i2} + u_{it2}$, $v_{it2} \mid z_i, y_{i0} \sim \mathrm{Normal}(0, \sigma_{2t}^2)$, $t = 1,\ldots,T$.
(vii) For $v_{it1} = \sum_{j=0}^{t-1}\rho^j (u_{i,t-j,1} + a_{i1})$, $E(v_{it1} \mid z_i, y_{i0}, v_{it2}) = \phi_{2t} v_{it2}$, $t = 1,\ldots,T$.

According to part (ii) of Assumption 2.1, the conditional mean in equation (1) is assumed to be dynamically complete, which is a rather standard assumption in the literature. This part of the assumption ensures that $y_{i0}$ is exogenous with respect to the final error in (3). At the end of this section we discuss an alternative set of assumptions and the corresponding estimating equation, where the dynamic completeness assumption is dropped, so that $\{u_{it1}\}$ may be serially correlated.

Part (iv) of Assumption 2.1 uses Chamberlain's (1980, 1982, 1984) device to model the conditional mean of the unobserved effect, $c_{i1}$, as a linear function of exogenous variables (see also Blundell and Bond, 1998). This approach was used by Wooldridge (2005) in
the context of nonlinear dynamic panel data models with balanced panels. In general, $z_{it}$ may contain time-constant variables; of course, the leads and lags of such variables would not be included in the conditional mean of $c_{i1}$. A non-zero correlation between the time-constant variables and $c_{i1}$ implies that the effect of these variables cannot be distinguished from that of the unobserved heterogeneity. However, it may still be useful to include the time-invariant characteristics in $z_{it}$ because controlling for more variables can help to improve the precision of the estimator.

Under Assumption 2.1, parts (i)-(iv), the primary equation can be written as

$$y_{it} = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1} \rho^j x_{i,t-j}\right)\beta + \left(\frac{1-\rho^t}{1-\rho}\right)\left(\eta_1 + \sum_{s=1}^{T} z_{is}\xi_s + \gamma_1 y_{i0}\right) + v_{it1}, \quad E(v_{it1} \mid z_i, y_{i0}) = 0, \quad t = 1,\ldots,T, \tag{4}$$

where $v_{it1} = \sum_{j=0}^{t-1}\rho^j (u_{i,t-j,1} + a_{i1})$, $t = 1,\ldots,T$, are the new error terms, which will be serially correlated even though the initial idiosyncratic errors were not.

Equation (4) can be used to estimate the parameters when the panel is balanced.³ Estimating equation (4) by NLS or GMM can serve as an alternative to traditional estimators that combine first differencing with instrumental variables methods. As mentioned in the introduction, a GMM estimator that uses first-differenced data suffers from the weak instruments problem when the series are highly persistent. Specifically, for a sequentially exogenous variable $\omega_{it}$, such as a lagged dependent variable, we can write the data generating process as $\omega_{it} = \rho\,\omega_{i,t-1} + \epsilon_{it}$, where $\mathrm{Cov}(\epsilon_{is}, \epsilon_{it}) = 0$ for $s \neq t$. In the extreme case, where $\rho = 1$, $\Delta\omega_{it} = \epsilon_{it}$, so that past values $(\omega_{i,t-1}, \ldots, \omega_{i1})$ are not correlated with $\Delta\omega_{it}$ and hence cannot be used as instruments. When $\rho$ is close to one, the lagged values are correlated with $\Delta\omega_{it}$, but the correlation is weak, which results in the weak instruments problem. It is important to note, however, that this problem arises only when the estimation method is GMM. Binder, Hsiao and Pesaran (2005) proposed a quasi maximum likelihood estimator that uses differencing to remove unobserved heterogeneity, but does not suffer from the weak instruments problem. Similarly, Hsiao, Pesaran and Tahmiscioglu (2002) propose a transformed likelihood approach and show that their maximum likelihood estimator that uses differenced data performs better than the GMM estimator.

³ We thank the anonymous referee for bringing this fact to our attention. The referee also noted that an interesting question is whether our approach is less efficient than the Blundell and Bond (1998) approach. This is difficult to say, as the two approaches make different assumptions about the initial condition.

In equation (4), the weak instruments problem does not arise. Because all variables in (4) are in levels, all of them are exogenous under Assumption 2.1, parts (ii)-(iv), and hence are used as their own instruments. Although the estimator relies on time variation in the variables, the source of this variation does not matter. Even if $\rho = 1$, the parameters in (4) can be consistently estimated by NLS or GMM, as long as $\mathrm{Var}(\epsilon_{it}) \neq 0$. As is true for all panel data models with large $N$ and fixed $T$, the autoregressive coefficient can be identified from the cross-sectional variation in the data.

In the context of an unbalanced panel, under Assumption 2.1, parts (v) and (vi), the selection equation can be written as

$$s_{it} = 1\left[\eta_2 + z_{it}\delta_{2t} + \sum_{s=1}^{T} z_{is}\psi_s + \gamma_2 y_{i0} + v_{it2} > 0\right], \quad t = 1,\ldots,T, \tag{5}$$

$$v_{it2} \mid z_i, y_{i0} \sim \mathrm{Normal}(0, \sigma_{2t}^2), \quad t = 1,\ldots,T, \tag{6}$$

where Chamberlain's modeling device is used to model the distribution of the time-constant unobserved effect, $c_{i2}$. Note that due to the presence of the unobserved effect, the composite errors, $v_{it2} = u_{it2} + a_{i2}$, $t = 1,\ldots,T$, are necessarily serially correlated. Also, error variances are allowed to vary over time. The normality assumption is not crucial for estimating the selection equation. As long as $v_{it2}$ is independent of $(z_i, y_{i0})$ and the appropriate regularity conditions hold, parameters in (5) can be consistently estimated using a semiparametric estimator (see, for example, Ichimura 1993, Klein and
Spady 1991). However, as discussed below, the derivation of the selection correction term is substantially simplified if Assumption 2.1(vi) holds.

To correct for the selection bias, we consider a two-step estimator and use assumptions similar to those in the standard selection literature in a cross-sectional context; see, for example, Wooldridge (2002, Chapter 17). Specifically, from Assumption 2.1(vii) it follows that

$$E[v_{it1} \mid z_i, y_{i0}, s_{it} = 1] = E[E(v_{it1} \mid v_{it2}) \mid z_i, y_{i0}, s_{it} = 1] = E[\phi_{2t} v_{it2} \mid z_i, y_{i0}, s_{it} = 1] = h_t(z_i, y_{i0}) \equiv h_{it}, \quad t = 1,\ldots,T, \tag{7}$$

where $h_{it} \equiv h_t(\eta_2 + z_{it}\delta_{2t} + \sum_{s=1}^{T} z_{is}\psi_s + \gamma_2 y_{i0})$, and $h_t(\cdot)$ is an unknown function. From (7), it follows that for $s_{it} = 1$, equation (4) can be written as

$$y_{it} = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1} \rho^j x_{i,t-j}\right)\beta + \left(\frac{1-\rho^t}{1-\rho}\right)\left(\eta_1 + \sum_{s=1}^{T} z_{is}\xi_s + \gamma_1 y_{i0}\right) + h_{it} + e_{it1}, \quad E(e_{it1} \mid z_i, y_{i0}, s_{it} = 1) = 0, \quad t = 1,\ldots,T. \tag{8}$$

It is possible to estimate equation (8) semiparametrically. A semiparametric estimator would be appropriate if either the error distribution in the selection equation is not normal, or $E(v_{it1} \mid z_i, y_{i0}, v_{it2})$ is a nonlinear function of $v_{it2}$, or both. However, it is also useful to consider a fully parametric approach that would lead to a simple estimation routine and would help to avoid computational difficulties typically associated with semiparametric methods. Therefore, in what follows we focus on the parametric case.

Under Assumption 2.1, parts (vi) and (vii), function $h_t$ is given by

$$h_t(\cdot) = \phi_{2t}\,\frac{\varphi(\cdot)}{\Phi(\cdot)} \equiv \phi_{2t}\,\lambda(\cdot), \tag{9}$$

where $\varphi(\cdot)$ and $\Phi(\cdot)$ are the standard normal pdf and cdf, respectively, and $\lambda(\cdot)$ is the inverse
Mills ratio. Thus, with some abuse of notation we can write the primary equation for the selected sample as

$$y_{it} = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1} \rho^j x_{i,t-j}\right)\beta + \left(\frac{1-\rho^t}{1-\rho}\right)\left(\eta_1 + \sum_{s=1}^{T} z_{is}\xi_s + \gamma_1 y_{i0}\right) + \phi_{2t}\lambda_{it2} + e_{it1}, \quad E(e_{it1} \mid z_i, y_{i0}, s_{it} = 1) = 0, \quad t = 1,\ldots,T, \tag{10}$$

where $\lambda_{it2} \equiv \lambda(\eta_2 + z_{it}\delta_{2t} + \sum_{s=1}^{T} z_{is}\psi_s + \gamma_2 y_{i0})$. Under Assumption 2.1, equation (10) is the final estimating equation, which can be consistently estimated by NLS or GMM.

As an alternative approach, one could treat the initial condition as an unobserved effect and model its conditional expectation as a linear function of exogenous variables, as suggested by Chamberlain (1984).⁴ In this case, the dynamic completeness of the conditional mean in equation (1) is not needed (and most likely will not hold), so that the idiosyncratic errors in (1) may be serially correlated. Formally, the set of assumptions can be summarized as follows:

ASSUMPTION 2.2
(i) $y_{i0}$ is not observed, $z_i$ is always observed, and $y_{it}$, $t = 1,\ldots,T$, are observed only for $s_{it} = 1$.
(ii) $E(u_{it1} \mid z_i) = 0$, $t = 1,\ldots,T$.
(iii) $y_{i0} = \sum_{s=1}^{T} z_{is}\kappa_s + b_i$, $E(b_i \mid z_i) = 0$.
(iv) $c_{i1} = \eta_1 + \sum_{s=1}^{T} z_{is}\xi_s + \gamma_1 y_{i0} + a_{i1} = \eta_1 + \sum_{s=1}^{T} z_{is}(\xi_s + \gamma_1\kappa_s) + a_{i1} + \gamma_1 b_i$, $E(a_{i1} \mid z_i) = 0$.
(v) $c_{i2} = \eta_2 + \sum_{s=1}^{T} z_{is}\psi_s + \gamma_2 y_{i0} + a_{i2} = \eta_2 + \sum_{s=1}^{T} z_{is}(\psi_s + \gamma_2\kappa_s) + a_{i2} + \gamma_2 b_i$.
(vi) For $v_{it2} = a_{i2} + \gamma_2 b_i + u_{it2}$, $v_{it2} \mid z_i \sim \mathrm{Normal}(0, \sigma_{2t}^2)$, $t = 1,\ldots,T$.
(vii) For $v_{it1} = \rho^t b_i + \sum_{j=0}^{t-1}\rho^j (u_{i,t-j,1} + a_{i1} + \gamma_1 b_i)$, $E(v_{it1} \mid z_i, v_{it2}) = \phi_{2t} v_{it2}$, $t = 1,\ldots,T$.

Under Assumption 2.2, for $s_{it} = 1$, the primary equation can be written as

$$y_{it} = \rho^t \sum_{s=1}^{T} z_{is}\kappa_s + \left(\sum_{j=0}^{t-1} \rho^j x_{i,t-j}\right)\beta + \left(\frac{1-\rho^t}{1-\rho}\right)\left[\eta_1 + \sum_{s=1}^{T} z_{is}\tilde\xi_s\right] + \phi_{2t}\tilde\lambda_{it2} + e_{it1}, \quad E(e_{it1} \mid z_i, s_{it} = 1) = 0, \quad t = 1,\ldots,T, \tag{11}$$

where $\tilde\lambda_{it2} \equiv \lambda(\eta_2 + z_{it}\delta_{2t} + \sum_{s=1}^{T} z_{is}\tilde\psi_s)$, $\tilde\psi_s = \psi_s + \gamma_2\kappa_s$ and $\tilde\xi_s = \xi_s + \gamma_1\kappa_s$. Similarly to (10), parameters in equation (11) can be consistently estimated by NLS or GMM, as discussed in the following two sections. Alternatively, one can estimate the reduced-form equation and then obtain the structural coefficients, $\rho$ and $\beta$, using nonlinear restrictions on parameters. In (11), it is possible to test for the presence of the observed dynamics. If only the unobserved dynamics is present, the lags of the exogenous variables would not appear in equation (11), i.e. $\rho$ would be zero.

Specifying the estimating equation as in (11) has the advantage of allowing serial correlation in the idiosyncratic errors in equation (1). However, it also requires that the model necessarily contains exogenous time-varying explanatory variables, and it ignores the dynamics that is due to unobserved factors that are not included in the model. In what follows, we focus on the approach where the initial condition appears in the conditioning set and the conditional mean in (1) is assumed to be dynamically complete, so that $u_{it1}$ are serially uncorrelated. We emphasize, however, that equation (11) can also be estimated using the proposed methods.

⁴ We thank the anonymous referee for suggesting that we consider this approach.

3 NLS Estimation

A simple way to obtain a consistent estimator of the parameters in equation (10) is to replace $\lambda_{it2}$ with its consistent estimator and estimate the parameters in two steps. Under
Assumption 2.1(v) and (vi), equation (5) can be consistently estimated by probit after the error variance is normalized to equal unity. Since error variances may differ across time periods, it is most appropriate to estimate the selection equation separately for each time period. Denote the first-step estimators $\hat\pi_t = (\hat\eta_{t2}, \hat\psi_{1t}', \ldots, (\widehat{\delta_{2t} + \psi_{tt}})', \ldots, \hat\psi_{Tt}', \hat\gamma_{t2})'$, $\hat\pi = (\hat\pi_1', \ldots, \hat\pi_T')'$, and the first-step vector of regressors $q_{it} = (1, z_{i1}, \ldots, z_{iT}, y_{i0})$. These can be used to obtain $\hat\lambda_{it2} \equiv \lambda(q_{it}\hat\pi_t)$, and then $\hat\lambda_{it2}$ can be used instead of $\lambda_{it2}$ in equation (10).

Denote the $1 \times [K + LT + T + 3]$ vector of parameters $\theta \equiv (\rho, \beta', \eta_1, \xi_1', \ldots, \xi_T', \gamma_1, \phi_{21}, \ldots, \phi_{2T})$. Parameters in $\theta$ can be consistently estimated by pooled nonlinear least squares (NLS) on the selected sample. Define the conditional expectation of $y_{it}$:

$$m_{it}(\theta) \equiv m(z_i, y_{i0}, s_{it} = 1; \theta) = E(y_{it} \mid z_i, y_{i0}, s_{it} = 1), \tag{12}$$

where

$$m(z_i, y_{i0}, s_{it} = 1; \theta) = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1} \rho^j x_{i,t-j}\right)\beta + \left(\frac{1-\rho^t}{1-\rho}\right)\left(\eta_1 + \sum_{s=1}^{T} z_{is}\xi_s + \gamma_1 y_{i0}\right) + \phi_{2t}\lambda_{it2}. \tag{13}$$

The correction term, $\lambda_{it2}$, is not available, but it can be replaced by the consistent estimator mentioned above. In general, let $m_{it}(\theta, \hat\pi)$ be the conditional expectation obtained using the estimators of the parameters in the selection equation. Then, the pooled NLS estimator of $\theta$ is the solution to the minimization problem

$$\min_{\theta}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,[y_{it} - m_{it}(\theta, \hat\pi)]^2, \tag{14}$$
where one half is used as a multiplier for convenience. The first-order condition for this problem is

$$\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,\nabla_\theta m_{it}(\hat\theta, \hat\pi)'\,[y_{it} - m_{it}(\hat\theta, \hat\pi)] = 0, \tag{15}$$

which can be solved for $\hat\theta$ using iterative procedures. As is standard in panel data models, for identification it is necessary that $T \geq 2$.

In summary, if Assumption 2.1 holds, a consistent estimator of $\theta$ can be obtained from the following two-step procedure:

PROCEDURE 3.1
1. For each $t = 1,\ldots,T$, estimate separate probit models of $s_{it}$ on $1, z_{i1}, \ldots, z_{iT}, y_{i0}$, $i = 1,\ldots,N$, and compute the inverse Mills ratios, $\hat\lambda_{it2}$.
2. For $s_{it} = 1$, estimate equation (10) with $\lambda_{it2}$ replaced by $\hat\lambda_{it2}$ by pooled NLS. Estimate the asymptotic variance as described in Appendix A.

From Procedure 3.1 it is apparent that one needs at least one additional exogenous variable in the selection equation ($L > K$). Although the inverse Mills ratio, $\hat\lambda_{it2}$, is a nonlinear function of its argument, it is approximately linear over most of its range, which may lead to multicollinearity. Thus, it is necessary to have at least one exclusion restriction in order to make the estimation convincing.

Even though the resulting estimator is consistent, it is not efficient. From equations (3) and (4) it is seen that the error terms in (10) are serially correlated. Besides, the errors are going to be heteroskedastic because of selection. A nonlinear analog of the seemingly unrelated regressions estimator (see Wooldridge 2002, Problem 12.7) cannot be
used in this context because selection is not strictly exogenous in the selection equation. However, one can improve efficiency by using a GMM estimator, as discussed in the next section.

4 GMM Estimation

The efficiency of the two-step estimator can be improved by using GMM at the second step. Equation (10) is linear in regressors, but nonlinear in parameters, which results in overidentification and permits obtaining a more efficient estimator than pooled NLS.

To specify a GMM estimator, define a $1 \times (LT + 3)$ vector of instruments $\hat\omega_{it} \equiv \omega_{it}(\hat\pi_t) \equiv (1, y_{i0}, z_{i1}, \ldots, z_{iT}, \hat\lambda_{it2})$, $t = 1,\ldots,T$, and a $T \times T(LT + 3)$ matrix of instruments $\hat W_i$,

$$\hat W_i \equiv W_i(\hat\pi) \equiv \begin{pmatrix} \hat\omega_{i1} & 0 & \cdots & 0 \\ 0 & \hat\omega_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat\omega_{iT} \end{pmatrix}. \tag{16}$$

Here $0$ denotes a $1 \times (LT + 3)$ vector of zeros. Define a $T \times 1$ vector $\hat g_i \equiv g_i(\theta, \hat\pi) \equiv (\hat g_{i1}, \ldots, \hat g_{iT})'$, where

$$\hat g_{it} \equiv g_{it}(\theta, \hat\pi_t) \equiv s_{it}\,[y_{it} - m_{it}(\theta, \hat\pi)], \quad t = 1,\ldots,T. \tag{17}$$

From equation (10) it follows that the following moment conditions are available:

$$E[W_i(\pi)'\,g_i(\theta, \pi)] = 0. \tag{18}$$

Since the conditional expectation of $y_{it}$ is different in each time period, equation (18) implies $T(LT + 3)$ moment conditions. Moreover, because $m_{it}(\theta, \hat\pi)$ is nonlinear in $\theta$,
these conditions are not redundant and can be used to enhance efficiency. The GMM estimator of $\theta$ is the solution to the minimization problem

$$\min_{\theta}\ \left(\sum_{i=1}^{N} W_i(\hat\pi)'\,g_i(\theta, \hat\pi)\right)' \hat\Omega^{-1} \left(\sum_{i=1}^{N} W_i(\hat\pi)'\,g_i(\theta, \hat\pi)\right), \tag{19}$$

where $\hat\Omega^{-1}$ is a consistent estimator of a $T(LT + 3) \times T(LT + 3)$ positive semidefinite weighting matrix $\Omega^{-1}$. The first-order condition for this problem is given by

$$\left[\sum_{i=1}^{N} W_i(\hat\pi)'\,\nabla_\theta g_i(\hat\theta, \hat\pi)\right]' \hat\Omega^{-1} \left[\sum_{i=1}^{N} W_i(\hat\pi)'\,g_i(\hat\theta, \hat\pi)\right] = 0. \tag{20}$$

Then, $\theta$ can be consistently estimated using a procedure similar to Procedure 3.1, where the GMM estimator is used instead of the pooled NLS estimator.

Notice that the pooled NLS estimator is identical to a GMM estimator that exploits the moment conditions

$$\sum_{t=1}^{T} E[\nabla_\theta g_{it}(\theta, \pi)'\,g_{it}(\theta, \pi)] = 0 \tag{21}$$

and uses the weighting matrix

$$\left\{\sum_{t=1}^{T} E[\nabla_\theta g_{it}(\theta, \pi)'\,\nabla_\theta g_{it}(\theta, \pi)]\right\}^{-1}. \tag{22}$$

Thus, in the NLS estimation, the instruments are stacked on top of each other, and each time period receives an equal weight. In contrast, a general GMM estimator that uses a block-diagonal matrix of instruments, as in equation (16), assigns different weights to each time period, which can be used to improve efficiency. In the discussion below, the GMM estimator refers to the solution to the minimization problem (19).

The proposed GMM estimator will be consistent for any positive definite matrix $\Omega$; however, a particular form is preferred. Specifically, we formulate an additional assumption:

ASSUMPTION 4.1
(i) $\Lambda$ is the asymptotic variance of $W_i(\hat\pi)'\,g_i(\theta, \hat\pi)$.
(ii) $\Omega = \Lambda$.
(iii) $\hat\Omega \overset{p}{\to} \Omega$.

Appendix A provides a formula for $\hat\Omega$ that satisfies Assumption 4.1. Following a standard argument for the relative efficiency of the GMM estimator, the GMM estimator that employs weighting matrix $\hat\Omega$ as specified in Assumption 4.1 is asymptotically more efficient than pooled NLS and results in a relatively simple expression for the asymptotic variance of $\hat\theta$. Specifically, denote $G \equiv E[W_i(\pi)'\,\nabla_\theta g_i(\theta, \pi)]$. If $\Omega$ satisfies Assumption 4.1, then the asymptotic variance of the described GMM estimator is

$$\mathrm{Avar}(\hat\theta) = (G'\Omega^{-1}G)^{-1}/N, \tag{23}$$

which can be estimated as $(\hat G'\hat\Omega^{-1}\hat G)^{-1}/N$, using the formulae provided in Appendix A.

We can now summarize a two-step estimation procedure. Let Assumptions 2.1 and 4.1 hold. Then, an estimator of $\theta$ that is asymptotically more efficient than the estimator discussed in Section 3 can be obtained using the procedure:

PROCEDURE
1. For each $t = 1,\ldots,T$, estimate separate probit models of $s_{it}$ on $1, z_{i1}, \ldots, z_{iT}, y_{i0}$, $i = 1,\ldots,N$, and compute the inverse Mills ratios, $\hat\lambda_{it2}$.
2. In equation (10), replace $\lambda_{it2}$ with $\hat\lambda_{it2}$. For $s_{it} = 1$, estimate the equation by GMM that uses moment conditions (18) and the weighting matrix that satisfies Assumption 4.1. Estimate the asymptotic variance as described in Appendix A.

It is important to note that there are more moment conditions available in addition to those specified in equation (18). Equation (10) implies that $e_{it1}$ is uncorrelated with any function of $z_i$ and $y_{i0}$. Therefore, any nonlinear functions of the exogenous variables and the initial condition should be valid instruments and can be used to obtain additional moment conditions.

The proposed two-step estimator can also be formulated as a joint GMM estimator of $(\theta, \pi)$. As suggested by Newey and McFadden (1994, Section 6.1), such an estimator can be obtained by stacking the moment conditions from the two steps. The moment conditions from the second step are given in (18), while the first-order conditions from the first-step estimation generate the additional moment conditions:

$$E\left[\{\Phi(q_{it}\pi_t)[1 - \Phi(q_{it}\pi_t)]\}^{-1}\,\varphi(q_{it}\pi_t)\,q_{it}'\,[s_{it} - \Phi(q_{it}\pi_t)]\right] = 0, \quad t = 1,\ldots,T. \tag{24}$$

The conditions in (18) and (24) can be used to form a vector of moment conditions for the joint GMM estimation. In that way the additional conditions can be used for estimating $\theta$, which can help to improve efficiency. However, since the first-step equations are exactly identified, the efficiency gain may be modest or even absent. Moreover, the two-step GMM estimator appears to be computationally more tractable than the joint GMM estimator in applications where the number of first-step moment conditions is large, for example, because $T$ is relatively large.

To study the properties of the proposed estimators in finite samples we performed Monte Carlo experiments.⁵
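As a rough illustration of the kind of data-generating process on which such experiments could be based, the sketch below simulates model (1) together with selection rule (2) for a scalar regressor. The function name `simulate_panel`, all parameter values, the error correlations and the single excluded variable `w` are assumptions made purely for illustration; they are not the design used in the experiments reported in the supplement.

```python
import random


def simulate_panel(n, T, rho=0.5, beta=1.0, seed=0):
    """Simulate model (1) with selection rule (2); every value is illustrative."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        c1 = rng.gauss(0.0, 1.0)               # unobserved effect in (1)
        c2 = 0.5 * c1 + rng.gauss(0.0, 1.0)    # unobserved effect in (2)
        y_prev = c1 + rng.gauss(0.0, 1.0)      # initial condition y_i0, always observed
        unit = {"y0": y_prev, "x": [], "w": [], "s": [], "y": []}
        for _t in range(T):
            x = rng.gauss(0.0, 1.0)            # regressor in both equations
            w = rng.gauss(0.0, 1.0)            # excluded variable, selection only
            u2 = rng.gauss(0.0, 1.0)
            u1 = 0.5 * u2 + rng.gauss(0.0, 1.0)  # correlated errors create selection bias
            y = rho * y_prev + beta * x + c1 + u1
            s = 1 if x + w + c2 + u2 > 0 else 0
            unit["x"].append(x)
            unit["w"].append(w)
            unit["s"].append(s)
            unit["y"].append(y if s == 1 else None)  # y_it is missing when s_it = 0
            y_prev = y                         # the latent outcome keeps evolving
        panel.append(unit)
    return panel


panel = simulate_panel(n=200, T=4)
share_observed = sum(sum(u["s"]) for u in panel) / (200 * 4)
```

In this sketch units can leave and re-enter the sample in any period, which is the incidental truncation pattern discussed above, while `y0`, `x` and `w` remain observed for every unit.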
⁵ A detailed description of the experiments and all results are provided in the supplement to the paper, which is available from the authors upon request.
In the experiments, among the three estimators that account
for the selection bias (two-step NLS, two-step GMM and joint GMM that uses the moment conditions for both equations) the two-step NLS estimator has the smallest standard deviations and root mean square errors (RMSEs) in small samples ($N = 200$), which is likely due to the fact that the GMM estimators use estimated weighting matrices, $\hat\Omega$, that cannot be precisely estimated in small samples. However, in large samples ($N = 4000$) both GMM estimators are more efficient than the two-step NLS estimator. The joint GMM estimator tends to have slightly smaller standard deviations and RMSEs than the two-step GMM estimator, but the differences are minor and virtually disappear when $N$ is large ($N = 4000$).

The two-step NLS, two-step GMM and joint GMM estimators also perform reasonably well when testing simple hypotheses about parameters. Although for all three estimators the true null is rejected too often in small samples (with the over-rejection being most severe for the two-step GMM estimator), the computed size gets closer to the nominal size as $N$ grows. Both the two-step GMM and joint GMM estimators outperform the two-step NLS estimator in terms of the power of the tests.

5 Testing for Selection Bias

It is possible to test for selection bias by testing the hypothesis $H_0: \phi_{2t} = 0$ in equation (10). A variety of tests for GMM estimators described in Newey and McFadden (1994, Section 9) can be used for this purpose. However, such tests require estimation of either the restricted or the unrestricted model, or both, prior to testing. Since estimation of equation (10) may be computationally costly due to nonlinearity in the parameters, it is useful to have a simple alternative.

A simple test can be developed based on the initial linear model (1). To construct a test, introduce a new selection indicator which identifies observability of $y_{it}$ in three
21 consecutive periods, and nominally assume that this new indicator follows an index model with unobserved heterogeneity: d it = 1[s it s i,t 1 s i,t 2 = 1] = 1[z it δ 30t + z i,t 1 δ 31t + z i,t 2 δ 32t + c i3 + u it3 > 0], t = 3,..., T, (25) where c i3 is the unobserved effect and u it3 is the idiosyncratic error. Moreover, (nominally) assume that u it3 is normally distributed and independent of the explanatory variables and unobserved effect, u it3 z i, c i3 Normal(0, 1). (26) Using Chamberlain s approach and assuming normality, write the unobserved effect as T c i3 = η 3 + z is ζ s + a i3, s=1 a i3 z i Normal(0, σ 3t ), t = 3,..., T. (27) Combining (25), (26), and (27) together gives T d it = 1[η 3 + z it δ 30t + z i,t 1 δ 31t + z i,t 2 δ 32t + z is ζ s + v it3 > 0], v it3 z i Normal(0, 1 + σ 3t ), t = 3,..., T, (28) s=1 where v it3 a i3 + u it3 is a new composite error term. With regard to the error terms in the primary equation, assume E( u it1 z i, v it3 ) = E( u it1 v it3 ) = ϕ 3t v it3, t = 3,..., T, (29) 21

which, when combined with the normality assumption, gives

$$E(\Delta u_{it1} \mid z_i, d_{it} = 1) = \phi_{3t} E(v_{it3} \mid z_i, d_{it} = 1) = \phi_{3t}\,\lambda\Big(\eta_3 + z_{it}\delta_{30t} + z_{i,t-1}\delta_{31t} + z_{i,t-2}\delta_{32t} + \sum_{s=1}^{T} z_{is}\zeta_s\Big) \equiv \phi_{3t}\lambda_{it3}, \quad t = 3, \ldots, T. \tag{30}$$

After applying first differencing to equation (1), with some abuse of notation we can write the differenced primary equation for $d_{it} = 1$ as

$$\Delta y_{it} = \rho\,\Delta y_{i,t-1} + \Delta x_{it}\beta + \phi_{3t}\lambda_{it3} + \varepsilon_{it1}, \quad E(\varepsilon_{it1} \mid z_i, d_{it} = 1) = 0, \quad t = 1, \ldots, T. \tag{31}$$

Thus, the unobserved effect is removed by first differencing and $\phi_{3t}\lambda_{it3}$ captures the selection effect. Naturally, time-constant variables drop out from the equation. The test is then performed using the following procedure:

PROCEDURE 5.1

1. For each of $t = 3, \ldots, T$, run a probit regression of $d_{it}$ on $1, z_{i1}, \ldots, z_{iT}$, $i = 1, \ldots, N$, and compute the inverse Mills ratios, $\hat{\lambda}_{it3}$.

2. For $d_{it} = 1$, augment the first-differenced primary equation by $\hat{\lambda}_{it3}$ and its interactions with time dummies, and estimate the augmented equation by pooled two stage least squares or GMM using $y_{i,t-2}$ and leads and lags of $z_{it}$ as instruments for $\Delta y_{i,t-1}$ ($\Delta x_{it}$, $\hat{\lambda}_{it3}$ and the interaction terms should be used as their own instruments). Use the Wald test to test the hypothesis $\phi_{31} = \cdots = \phi_{3T} = 0$.
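Step 1 of the procedure turns a fitted probit index into an inverse Mills ratio, $\lambda(a) = \varphi(a)/\Phi(a)$, where $\varphi$ and $\Phi$ are the standard normal density and distribution function. The Python sketch below shows only that transformation. It assumes the probit coefficients have already been estimated with any standard probit routine; the names `inverse_mills`, `fitted_index`, `pi_hat` and `q_it`, as well as the numerical values, are hypothetical illustrations and not part of the paper.

```python
import math

def norm_pdf(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def norm_cdf(a):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

def inverse_mills(a):
    """Inverse Mills ratio lambda(a) = pdf(a) / cdf(a)."""
    if a < -30.0:
        # Far in the lower tail both terms underflow; use the asymptotic
        # approximation lambda(a) ~ |a| + 1/|a| instead.
        return -a - 1.0 / a
    return norm_pdf(a) / norm_cdf(a)

def fitted_index(q_it, pi_hat):
    """Linear probit index q_it * pi_hat for one observation."""
    return sum(q * p for q, p in zip(q_it, pi_hat))

# Hypothetical first-step output for one period t: an intercept plus
# two covariates, with made-up coefficient values.
pi_hat = [0.2, 0.5, -0.3]
q_it = [1.0, 1.0, 2.0]  # (1, z_i1, z_i2) for one unit with d_it = 1
lambda_hat = inverse_mills(fitted_index(q_it, pi_hat))
```

In an application this calculation would be repeated for every unit with $d_{it} = 1$ in each period, using that period's probit coefficients.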

As an extension to the proposed procedure, it is possible to impose a restriction of equal variances in the selection equation and estimate equation (28) by pooled probit. Similarly, one may assume that the effect of selection is the same in all time periods and omit the interaction terms in the second-step estimation. A test for selection bias in that case is the usual t-test of the significance of the coefficient on $\hat{\lambda}_{it3}$. Note that the usual variance-covariance matrix should be used for testing; there is no need to adjust for the first-step estimation.

If in some period $t-j$ (for $j = 3, \ldots, t-1$), $y_{i,t-j}$ is observed for all cross-section units, then $y_{i,t-j}$ can be used as an additional instrument in the second-step estimation. Otherwise, if there are missing values for at least some $i$, then the observable variable is $(s_{i,t-j}\, y_{i,t-j})$, and this is not a valid instrument, since we did not account for selection in period $t-j$ when constructing $\hat{\lambda}_{it3}$.

Importantly, the proposed test is valid regardless of whether or not the model in (25) is correct and whether or not the normality assumption holds. All we need for testing is a reasonable proxy for the selection effect, and the correct specification of the selection term is not essential. If a selection problem is present, hopefully it will still be captured by a non-zero coefficient on the inverse Mills ratio in the differenced equation. Similar to the estimators discussed above, having additional variables in $z_{it}$ that are not also in $x_{it}$ helps to make the test more reliable.

When the hypothesis of no selection bias is not rejected, the pooled two stage least squares or GMM estimation of the first-differenced equation with $\Delta x_{it}$, $y_{i,t-2}$, and leads and lags of strictly exogenous variables used as instruments will produce consistent estimators. More distant lags can be used as additional instruments if observed for all cross-section units.
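The second step of the test can be sketched in a few lines of NumPy. The code below is a minimal illustration of pooled two stage least squares followed by a Wald statistic on a subset of coefficients. It is not the authors' implementation: the function names are hypothetical, and the covariance matrix uses the simple homoskedastic formula as a simplifying assumption, whereas an applied panel analysis would more likely use a version robust to heteroskedasticity and serial correlation.

```python
import numpy as np

def pooled_2sls(y, X, Z):
    """Pooled two stage least squares.

    y: (n,) outcome, X: (n, k) regressors, Z: (n, m) instruments with m >= k.
    Exogenous regressors must appear in both X and Z.
    Returns the coefficient vector and a homoskedastic covariance matrix.
    """
    # First stage: fitted values of X from a regression on Z.
    first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage
    # Second stage: regress y on the fitted values.
    beta = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    # Residuals are computed with the original regressors, not the fitted ones.
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    return beta, cov

def wald_statistic(beta, cov, idx):
    """Wald statistic for H0: beta[idx] = 0 (chi-square with len(idx) df under H0)."""
    b = beta[idx]
    V = cov[np.ix_(idx, idx)]
    return float(b @ np.linalg.solve(V, b))
```

With the columns of `X` ordered as the lagged difference of the dependent variable, the differenced covariates, and the inverse Mills ratio terms, passing the positions of the inverse Mills ratio terms as `idx` gives the joint test of no selection bias. With a single such term, the statistic equals the squared t-statistic.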
However, if the null is rejected, Procedure 5.1 will be a valid correction procedure only if all the assumptions specified in this section are correct. Given that the model for $d_{it}$ in equation (25) is quite restrictive, Procedure 5.1 is unlikely to perform well as a correction method. Therefore, the methodology described in the previous two sections should be used instead.

6 Empirical Application

This section illustrates the proposed methodology by applying the new methods to the estimation of dynamic earnings equations for females. This example is appropriate because earnings are largely determined by different historical factors and tend to be correlated over time. The data come from the Panel Study of Income Dynamics (PSID), years 1980 to The sample consists of white females who were followed over the considered period. 6 Because estimating equation (10) requires that the initial condition be observed, we keep only those females for whom 1980 earnings are available. The final sample consists of 579 women, or 6,948 observations over the 12-year period ( ). For this period, the earnings sample comprises 5,891 observations. Thus, about 15% of earnings data are missing due to non-participation. Because we define the population as women working in 1980, this exercise should be viewed as an evaluation of the effects of movement in and out of the labor force on estimated earnings equations. Such a question is of considerable interest in labor economics.

6 We consider working-age women (ages 18-65) who were either household heads or wives, have completed their education and are neither self-employed nor agricultural workers. A woman was excluded from the analysis if her self-reported age exceeded the age constructed using information on the year of birth by more than two years, if her self-reported age was smaller than the constructed age by more than one year, or if she reported positive work hours and zero earnings.

The dependent variable in the primary equation is the natural logarithm of the average annual hourly earnings, while the independent variables include age, age squared and time dummies. We assume that age is strictly exogenous and is not correlated with the unobserved effect. This assumption implies that the mean ability of women born in different years is about the same. Our sample is restricted to women who have completed their education (i.e. years of schooling do not vary over time); hence, the effect of education is not separable from unobserved heterogeneity. Therefore, we only include education as part of the unobserved effect. Additionally, to control for unobserved heterogeneity, we include the number of children in all time periods (i.e. the number of children is assumed to belong to $z_{it}$, but not $x_{it}$).

The selection rule is for labor force participation. A woman is considered to be a participant if she reports positive work hours in a given year. When estimating selection equations, in the probit regressions in each time period we include education, age, age squared, and the number of children in all time periods, where the number of children may have a direct effect on labor force participation. The log of hourly earnings in 1980 is included when the methodology of Sections 2-4 is used, but not when the methodology of Section 5 is used.

Before applying the more advanced methods developed in Sections 2 through 4, we first estimate equation (1) using the simple approach of Section 5. From the total sample we keep observations for which earnings data are available in three consecutive periods and use first differencing to remove the unobserved effect. As a result, the sample size reduces to 5,033 observations; age and education drop out from the equation. Then, we estimate the first-differenced equation by pooled instrumental variables using the log of hourly earnings in $t-2$ as an instrument for $\Delta y_{i,t-1}$. We call this estimator the first difference instrumental variables (FD-IV) estimator.

The estimates for the log earnings equations are reported in Table 1. The first column of the table contains the estimates from FD-IV regressions without inverse Mills ratios.
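The sample construction behind the FD-IV estimator can be sketched as follows. The Python code below is a minimal illustration for a single unit, not the authors' code: it builds the three-consecutive-period indicator $d_{it} = s_{it} s_{i,t-1} s_{i,t-2}$ and, where it equals one, forms the differenced dependent variable, the differenced lag and the level instrument. The function names and the earnings values are made up for the example.

```python
def three_period_indicator(s):
    """d_t = s_t * s_{t-1} * s_{t-2}: 1 when the outcome is observed in
    period t and in the two preceding periods (0-based positions)."""
    return [
        1 if t >= 2 and s[t] and s[t - 1] and s[t - 2] else 0
        for t in range(len(s))
    ]

def fd_iv_rows(y, s):
    """Return (t, dy_t, dy_lag, instrument) for every usable period.

    dy_t = y_t - y_{t-1} is the dependent variable, dy_lag = y_{t-1} - y_{t-2}
    is the endogenous regressor, and the level y_{t-2} is its instrument.
    """
    rows = []
    for t, keep in enumerate(three_period_indicator(s)):
        if keep:
            rows.append((t, y[t] - y[t - 1], y[t - 1] - y[t - 2], y[t - 2]))
    return rows

# Made-up log earnings for one woman; None marks a year without earnings.
s = [1, 1, 1, 0, 1, 1, 1]
y = [1.0, 1.5, 2.5, None, 3.0, 3.5, 4.5]
rows = fd_iv_rows(y, s)
```

In this example a single missing year removes four of the seven periods from the estimation sample, which illustrates why requiring three consecutive observations reduces the sample size noticeably.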
The second column contains the test of selection bias in the first-differenced equation using the results in Section 5. The estimate of ρ is rather similar in the two columns; it is about and is statistically significant at the 1% level. However, the test suggests that selection bias may be present. The null of no selection is rejected at the 8% significance level. Thus, one might conclude from the test using the FD equation that selection into the work force may be systematically related to idiosyncratic shocks to earnings.

The estimates obtained using the methods discussed in Sections 2-4 are reported in the remaining three columns of Table 1. Columns (3) and (4) show estimation results from regressions where the NLS estimator is used at the second step. Column (5) contains the estimates obtained using Procedure 4.1, which employs GMM at the second step. The estimates for the augmented log earnings equation are reported in columns (4) and (5). Based on the Wald tests of the joint significance of the selection terms, the hypothesis of no selection bias is rejected at the 5% level in both cases. Thus, we again find evidence of selection bias.

The NLS and GMM estimates of ρ are very similar in all three regressions. The estimate is about 0.6 and is significant at the 1% level, which provides evidence of state dependence in earnings offers. This estimate is rather different from the one obtained using first-differencing. Interestingly, similar results were obtained in Monte Carlo simulations, where the FD-IV estimator had substantially larger biases than the NLS estimator that did not account for selection. For all coefficient estimates, standard errors are smaller when the GMM estimator is used at the second step.

Columns (3)-(5) show an estimated effect of another year of schooling of about 3%, which is statistically significant at the 1% level. We emphasize, however, that this effect is not distinguishable from unobserved heterogeneity.
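To give a sense of what an autoregressive coefficient of about 0.6 implies, the following sketch computes how much of a one-time shock to log earnings remains after k years in a simple AR(1) process. It uses only the rounded value 0.6 reported in the text and ignores covariates, selection and the unobserved effect, so it illustrates orders of magnitude rather than reproducing a result from the paper. The function names are hypothetical.

```python
import math

rho = 0.6  # rounded estimate of the autoregressive coefficient reported in the text

def remaining_share(rho, k):
    """Share of a one-time shock still present after k periods in an AR(1) process."""
    return rho ** k

def half_life(rho):
    """Number of periods until half of a shock has died out (requires 0 < rho < 1)."""
    return math.log(0.5) / math.log(rho)

# Shares remaining after one, two and three years.
shares = [round(remaining_share(rho, k), 3) for k in range(1, 4)]
```

Under these simplifying assumptions, about a third of a shock is still present after two years, and the half-life is a little under a year and a half.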
Moreover, the coefficient on years of schooling in these regressions is not a true return to education because education has an additional effect on earnings through the autoregressive earnings term. The coefficients on age and age squared reveal a usual U-shape profile, although the corresponding estimates are less precise, particularly in the NLS regressions.

As a robustness check, we re-estimated the earnings equation using the data from years The sample was restricted to only include women who reported earnings in 1981 (583 women). 7 The resulting coefficient estimates and standard errors were very similar to the ones reported in Table 1. The only noticeable change was observed for the two-step GMM estimates of the coefficients on age and age squared, which became somewhat smaller and statistically insignificant. Based on the results of the joint Wald tests, the null of no selection bias could not be rejected; however, several selection correction terms were individually significant. Specifically, in the FD-IV regression the inverse Mills ratios for years 1984, 1985 and 1991 were significant at the 5% significance level. The correction term for year 1991 was also significant at the 5% level in the two-step NLS and two-step GMM regressions. The table with detailed estimation results is available from the authors upon request.

Returning to the discussion of the estimating equation in Section 2, we note that one could also estimate the parameters using equation (11). In such a case, identification would rely on time variation in the strictly exogenous variables, age and age squared. Moreover, the autoregressive coefficient, ρ, would only capture the observed dynamics. In applications where there are no time-varying strictly exogenous variables in the model (i.e. $x_{it}$ is empty), the data would not provide a distinction between the observed and unobserved dynamics. 8

7 The cross-section sample size increased because more women were working in 1981 than in

8 We thank the anonymous referee for suggesting that we include the discussion of this issue.

7 Conclusions

In this paper, new methods for estimating dynamic panel data models with selectivity were proposed. A distinctive feature of the new estimators is that they do not rely on differencing when treating the unobserved heterogeneity. This feature allows one to avoid the weak instruments problem, which arises in the context of differencing if series are highly persistent or close to unit root. The proposed correction is relatively simple because the method requires correcting for selection in the current period only. The errors in both selection and primary equations may be heterogeneously distributed. The errors in the selection equation may also be serially dependent, and a general form of heteroskedasticity is allowed in the primary equation. Additionally, this paper develops a simple test for sample selection bias.

The proposed methods are applied to the estimation of dynamic earnings equations for females using the Panel Study of Income Dynamics data. Evidence of selection bias is found in both the first-differenced equation and the equation obtained after backsubstitution. The NLS and GMM estimation based on the new methodology produces an estimate of the stability parameter of about 0.6, which is rather different from the estimate obtained from the instrumental variables estimation of the first-differenced equation.

The proposed correction procedure is parametric and assumes normality of the errors in the selection equation. An important topic for future research is developing a semiparametric estimator, which would not require parametric assumptions regarding the error distributions. Such an estimator can be implemented within the framework of this paper using methods similar to those considered in Semykina and Wooldridge (2010).

Appendix A

This section starts with a derivation of the variance of the GMM estimator. The derivation of the variance of the pooled NLS estimator follows by analogy. Using the notation from Section 3, let $\hat{\pi}_t = (\eta_{t2}, \hat{\psi}_{1t}, \ldots, \delta_{2t} + \psi_{tt}, \ldots, \hat{\psi}_{Tt}, \hat{\gamma}_{t2})$, $\hat{\pi} = (\hat{\pi}_1, \ldots, \hat{\pi}_T)$, be the first-step estimators, and let $q_{it} = (1, z_{i1}, \ldots, z_{iT}, y_{i0})$ be the first-step vector of regressors.
Also, denote the vector of the parameters $\theta \equiv (\rho, \beta, \eta_1, \xi_1, \ldots, \xi_T, \gamma_1, \phi_{21}, \ldots, \phi_{2T})$. Under the standard regularity conditions given, for example, in Wooldridge (2002, Theorem 14.1), the GMM estimator, $\hat{\theta}$, is consistent when $\pi$ is known. If $\hat{\pi}$ is a consistent estimator of $\pi$, the first stage estimation will not affect consistency of $\hat{\theta}$.

By definition, if $\hat{\Omega}$ is a consistent estimator of a positive definite matrix $\Omega$, then $\hat{\Omega} \xrightarrow{p} \Omega$. Also, by consistency of $\hat{\theta}$ and $\hat{\pi}$, and the weak law of large numbers,

$$N^{-1}\sum_{i=1}^{N} W_i(\hat{\pi})' \nabla_\theta g_i(\hat{\theta}, \hat{\pi}) \xrightarrow{p} G,$$

where $G \equiv E[W_i(\pi)' \nabla_\theta g_i(\theta, \pi)]$ was also defined earlier. Recall that the first-order condition for the GMM estimator is as in equation (20). After normalizing by the number of observations, taking the appropriate probability limits, and expanding $N^{-1}\sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\hat{\theta}, \hat{\pi})$ around $\theta$, we obtain

$$G'\Omega^{-1} N^{-1/2}\sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta, \hat{\pi}) + G'\Omega^{-1} N^{-1/2}\sum_{i=1}^{N} W_i(\hat{\pi})' \nabla_\theta g_i(\theta, \hat{\pi})(\hat{\theta} - \theta) + o_p(1) = 0,$$

$$\sqrt{N}(\hat{\theta} - \theta) = -C^{-1} G'\Omega^{-1} N^{-1/2}\sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta, \hat{\pi}) + o_p(1), \tag{32}$$

where $C \equiv G'\Omega^{-1}G$.

Next, we need to account for the first-stage estimation of $\pi$. In equation (32), both the matrix of instruments and the function $g_i$ depend on $\hat{\pi}$. However, as is known, the use of generated instruments does not affect the asymptotic variance of the GMM estimator. This result follows from the conditional moment restrictions in equation (10), which imply that $E[g_i(\theta, \pi) \mid x_i, y_{i0}, s_{it} = 1] = 0$, so that $g_i(\theta, \pi)$ is uncorrelated with any function of $(x_i, y_{i0})$ conditional on $s_{it} = 1$. Therefore, the mean-value expansion of $N^{-1/2}\sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta, \hat{\pi})$ around $\pi$ gives

$$N^{-1/2}\sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta, \hat{\pi}) = N^{-1/2}\sum_{i=1}^{N} W_i(\pi)' g_i(\theta, \pi) + F\sqrt{N}(\hat{\pi} - \pi) + o_p(1), \tag{33}$$

where $F \equiv E[W_i(\pi)' \nabla_\pi g_i(\theta, \pi)]$, and $\nabla_\pi g_i(\theta, \pi)$ is a block-diagonal matrix,

$$\nabla_\pi g_i(\theta, \pi) = \begin{pmatrix} s_{i1} q_{i1} \phi_{21} \lambda_{i12}(q_{i1}\pi_1 + \lambda_{i12}) & & \\ & \ddots & \\ & & s_{iT} q_{iT} \phi_{2T} \lambda_{iT2}(q_{iT}\pi_T + \lambda_{iT2}) \end{pmatrix}. \tag{34}$$

Here we used the fact that the derivative of the inverse Mills ratio is equal to $-q_{it}\lambda_{it2}(q_{it}\pi_t + \lambda_{it2})$ [see, for example, Wooldridge 2002, p. 522]. Since $\hat{\pi}_t$, $t = 1, \ldots, T$, are maximum likelihood estimators, $\hat{\pi}$ satisfies

$$\sqrt{N}(\hat{\pi} - \pi) = N^{-1/2}\sum_{i=1}^{N} d_i(\pi) + o_p(1), \tag{35}$$

where $d_i(\pi) \equiv (d_{i1}(\pi_1)', \ldots, d_{iT}(\pi_T)')'$,

$$d_{it}(\pi_t) = A_t^{-1}\{\Phi(q_{it}\pi_t)[1 - \Phi(q_{it}\pi_t)]\}^{-1}\varphi(q_{it}\pi_t)\, q_{it}'[s_{it} - \Phi(q_{it}\pi_t)],$$

$$A_t \equiv E[-H_{it}(\pi_t)], \quad H_{it}(\pi_t) = -\{\Phi(q_{it}\pi_t)[1 - \Phi(q_{it}\pi_t)]\}^{-1}[\varphi(q_{it}\pi_t)]^2 q_{it}' q_{it}. \tag{36}$$

Combining equations (32), (33), and (35), we can write

$$\sqrt{N}(\hat{\theta} - \theta) = -C^{-1} G'\Omega^{-1} N^{-1/2}\sum_{i=1}^{N} [W_i(\pi)' g_i(\theta, \pi) + F d_i(\pi)] + o_p(1), \tag{37}$$
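The inverse Mills ratio derivative used in (34), $d\lambda(a)/da = -\lambda(a)[a + \lambda(a)]$, can be checked numerically. The short Python sketch below compares the closed form with a central finite difference at a few points. It is only a sanity check of the formula, with hypothetical function names, and plays no role in the estimation itself.

```python
import math

def inverse_mills(a):
    """Inverse Mills ratio lambda(a) = pdf(a) / cdf(a) of the standard normal."""
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-a / math.sqrt(2.0))
    return pdf / cdf

def analytic_derivative(a):
    """Closed form: d lambda / d a = -lambda(a) * (a + lambda(a))."""
    lam = inverse_mills(a)
    return -lam * (a + lam)

def numeric_derivative(a, h=1e-5):
    """Central finite difference approximation to d lambda / d a."""
    return (inverse_mills(a + h) - inverse_mills(a - h)) / (2.0 * h)

# Largest absolute gap between the two derivatives over a small grid.
max_gap = max(
    abs(analytic_derivative(a) - numeric_derivative(a))
    for a in (-2.0, -0.5, 0.0, 0.5, 2.0)
)
```

The derivative is negative everywhere, so the selection term shrinks as the probit index grows.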


Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Dynamic Panels. Chapter Introduction Autoregressive Model

Dynamic Panels. Chapter Introduction Autoregressive Model Chapter 11 Dynamic Panels This chapter covers the econometrics methods to estimate dynamic panel data models, and presents examples in Stata to illustrate the use of these procedures. The topics in this

More information

A Note on Demand Estimation with Supply Information. in Non-Linear Models

A Note on Demand Estimation with Supply Information. in Non-Linear Models A Note on Demand Estimation with Supply Information in Non-Linear Models Tongil TI Kim Emory University J. Miguel Villas-Boas University of California, Berkeley May, 2018 Keywords: demand estimation, limited

More information

xtseqreg: Sequential (two-stage) estimation of linear panel data models

xtseqreg: Sequential (two-stage) estimation of linear panel data models xtseqreg: Sequential (two-stage) estimation of linear panel data models and some pitfalls in the estimation of dynamic panel models Sebastian Kripfganz University of Exeter Business School, Department

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information
