Panel Data III. Stefan Dahlberg

2 Overview of Part #3
VII. Spatial Panel Data & Unit Heterogeneity
VIII. Panel Corrected Standard Errors (PCSE)
IX. Fixed and Random Effects
X. Hausman Test
XI. The Hybrid Model
XII. Working with TSCS data in Stata (cont.)
   c. Commands tailored for panel data

3 Potential Problems with Panel Data
1. Autocorrelation in the time dimension within units. Expected effects: biased standard errors
2. Spatial autocorrelation (correlated errors between units that are geographically proximate). Expected effects: inefficient β:s, biased standard errors
3. Heteroscedasticity (i.e. unequal variance between units). Expected effects: inefficient β:s, biased standard errors
4. Contemporaneous correlation of errors across units (something affects all units at the same time). Expected effects: inefficient β:s, biased standard errors
5. Structural issues (it is likely that each disparate unit has a unique constant, as each unit likely has a distinct history). Expected effects: inefficient β:s, biased standard errors

4 VII. Spatial Panel Data and Unit Heterogeneity

5 Additional estimation problems Spatial autocorrelation ($r[e_{i,t}, e_{j,t}] \neq 0$) [diagram relating $X_{i,t}$, $Y_{i,t}$ and $Y_{j,t}$], e.g. a diffusion process; usually a problem when geography is involved.

6 Spatial autocorrelation ($r[e_{i,t}, e_{j,t}] \neq 0$):

7 Panel heteroskedasticity Panel heteroscedasticity or cluster heterogeneity ($V[e_i] = \sigma_i^2$ rather than a common $\sigma^2$ for all i; e.g. the quality of the data varies with GDP per capita) implies that the cluster means of the DV vary across clusters due to unmeasured cluster-level factors. Unobserved heterogeneity should always be addressed in clustered data (such as panel or multilevel data). For some models, one can include observed variables that explain part of the between-cluster differences (instead of country dummies, you control for GDP, political system, population, etc.).

8 Panel heteroskedasticity (Figure from Bartels 2009)

9 Correlated Residuals Larger Residuals as Y increases - Heteroscedasticity of Errors

10 Additional estimation problems How do we detect heteroscedasticity? Use the same tools as for simple OLS (the Breusch-Pagan or White's test). Breusch-Pagan test: regress the full model (no xtset is needed), then run estat hettest followed by the independent variables (hettest in older Stata versions). This gives a chi-squared test statistic (H0 = homoscedasticity in the error terms; if Prob > chi2 <= .05, H0 is rejected and heteroscedasticity is at hand).
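As a rough illustration of what such a test computes, the Python sketch below implements Koenker's studentized version of the Breusch-Pagan test for a single regressor: LM = n·R² from regressing the squared OLS residuals on X, compared with 3.841, the 5% critical value of chi-squared(1). This is an assumption-laden sketch, not Stata's code: by default estat hettest uses the original normal-errors version (its iid option gives the n·R² form). The function names and toy data are hypothetical.

```python
"""Koenker's studentized Breusch-Pagan LM test, one regressor (sketch)."""

CHI2_1_CRIT_05 = 3.841  # 5% critical value of chi-squared with 1 df


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x with a constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = simple_ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """LM = n * R^2 from regressing the squared OLS residuals on x.

    Under H0 (homoscedastic errors) LM is approximately chi-squared(1).
    """
    a, b = simple_ols(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, e2)


# Toy data: errors alternate in sign; their size grows with x only in y_het.
x = list(range(1, 21))
signs = [(-1) ** xi for xi in x]
y_het = [2 + 3 * xi + s * xi for xi, s in zip(x, signs)]
y_hom = [2 + 3 * xi + s for xi, s in zip(x, signs)]

for label, y in (("heteroscedastic", y_het), ("homoscedastic", y_hom)):
    lm = breusch_pagan_lm(x, y)
    print(f"{label}: LM = {lm:.2f}, reject H0 at 5%: {lm > CHI2_1_CRIT_05}")
```

The squared residuals of y_het climb with X, so the auxiliary regression has a high R²; for y_hom they show no linear trend in X and the statistic stays near zero.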

11 Three strategies for Panel Data
1. OLS, completely pooled* (TSCS data: N > T): use an autoregressive model with PCSE:s, or go for:
2. Fixed Effects (FE): unit-specific effects, no pooling (data: N <= T)
3. Random Effects (RE): partial pooling (data: N <= T)
The last two approaches account for unobserved heterogeneity, although in different ways, while the complete pooling strategy ignores the unobserved heterogeneity (when PCSE:s are not used).
* All units are characterized by the same regression equation at all points in time.

12 Pooled OLS

13 OLS with PCSE:s Panel-corrected standard errors (PCSE:s; Beck & Katz, APSR 1995) are obtained by assuming that:
Correlations between residuals vary across pairs of units, but remain constant over time within each pair and are only contemporaneous: $r[e_{i,t}, e_{j,t}] = r[e_{i,s}, e_{j,s}] \neq 0$, but $r[e_{i,t}, e_{j,s}] = 0$ for $t \neq s$.
The variance of the residuals varies by country but remains constant over time*: $V[e_{i,t}] \neq V[e_{j,t}]$, but $V[e_{i,t}] = V[e_{i,s}]$.
Stata estimates these correlations and variances from the data and corrects for them (xtpcse), using $\hat{\Sigma}_{i,j} = \frac{1}{T}\sum_{t=1}^{T} \hat{e}_{i,t}\hat{e}_{j,t}$.
* Correlation in the residuals over time is corrected by including a lagged dependent variable (LDV).

14 Panel Corrected Standard Errors Beck and Katz propose an estimator that pools information across clusters to estimate the error variances. The Beck & Katz panel-corrected standard error is calculated in the following way. Organize the residuals from the fitted model (OLS) by cluster, so that the residuals from the clusters are $\hat{e}_1, \hat{e}_2, \dots, \hat{e}_N$. These are vectors with T elements each, and they can be grouped together as a $T \times N$ matrix $E$ (the $\hat{e}_i$ are its columns), which is used to weight residuals across units: $\hat{\Sigma} = E'E/T$ and $\widehat{V}(\hat{\beta}) = (X'X)^{-1}X'(\hat{\Sigma} \otimes I_T)X(X'X)^{-1}$, with the observations stacked by unit. In Stata: xtpcse instead of xtreg.
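A minimal sketch of the Beck-Katz calculation for pooled OLS with one regressor, assuming a balanced panel stored as x[i][t] and y[i][t]. It builds Σ̂ = E'E/T from the residuals and plugs it into the sandwich (X'X)⁻¹X'(Σ̂⊗I_T)X(X'X)⁻¹. Python is used only to show the arithmetic; xtpcse also offers unbalanced-panel and AR(1) options, none of which is attempted here. Function names and data are hypothetical.

```python
"""Beck-Katz panel-corrected standard errors for pooled OLS (sketch).

Balanced panel: x[i][t], y[i][t] for unit i = 0..N-1, period t = 0..T-1.
Model: y_it = a + b * x_it + e_it (one regressor plus a constant).
"""
import math


def pooled_ols(x, y):
    """Pooled OLS (intercept, slope) over all unit-periods."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((v - mx) ** 2 for v in xs)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def pcse(x, y):
    """Return ((a, b), (se_a, se_b)) with panel-corrected standard errors."""
    n_units, n_periods = len(x), len(x[0])
    a, b = pooled_ols(x, y)
    e = [[y[i][t] - a - b * x[i][t] for t in range(n_periods)]
         for i in range(n_units)]
    # Sigma_hat[i][j] = (1/T) * sum_t e_it * e_jt, i.e. E'E / T
    sigma = [[sum(e[i][t] * e[j][t] for t in range(n_periods)) / n_periods
              for j in range(n_units)] for i in range(n_units)]
    xtx = [[0.0, 0.0], [0.0, 0.0]]   # X'X with z_it = (1, x_it)
    meat = [[0.0, 0.0], [0.0, 0.0]]  # X'(Sigma_hat kron I_T)X
    for t in range(n_periods):
        for i in range(n_units):
            zi = (1.0, x[i][t])
            for r in range(2):
                for c in range(2):
                    xtx[r][c] += zi[r] * zi[c]
            # only same-period pairs enter: contemporaneous correlation
            for j in range(n_units):
                zj = (1.0, x[j][t])
                for r in range(2):
                    for c in range(2):
                        meat[r][c] += sigma[i][j] * zi[r] * zj[c]
    bread = inv2(xtx)
    cov = matmul2(matmul2(bread, meat), bread)
    return (a, b), (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))


# Toy panel: 3 units, 5 periods (made-up numbers for illustration only)
x = [[1, 2, 3, 4, 5], [2, 3, 3, 5, 6], [0, 1, 2, 2, 4]]
y = [[2.0, 4.1, 6.3, 7.9, 10.2],
     [5.2, 7.1, 6.8, 11.0, 12.9],
     [0.5, 2.4, 4.6, 4.1, 8.8]]
(a_hat, b_hat), (se_a, se_b) = pcse(x, y)
print(f"b = {b_hat:.3f}, PCSE(b) = {se_b:.3f}")
```

With a single unit (N = 1) Σ̂ collapses to SSR/T and the formula reduces to the usual OLS variance σ̂²(X'X)⁻¹, which is a convenient sanity check.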

15 What are Panel-Corrected Standard Errors? PCSE:s are aimed at estimating correlations across units, such as contemporaneous correlation (shared error effects across units), and the additional problem of panel heteroscedasticity. Why not always use PCSE:s? When the homoskedasticity assumption holds and the errors are normally distributed, the t-statistics have an exact t-distribution even when sample sizes are small, whereas with small samples the t-statistics based on corrected or robust standard errors are not very close to the t-distribution (Wooldridge 2003:261).

16 Panel Corrected Standard Errors The PCSE standard error estimate is robust not only to unit heteroskedasticity but also to the possible contemporaneous correlation across units that is common in TSCS data. Note that using panel-corrected standard errors (PCSE:s; Beck & Katz 1995) is a complete pooling approach that does not account for unobserved heterogeneity. The PCSE:s do, of course, correct the standard errors, but the OLS coefficients are completely pooled estimates. The major payoff of this approach is its simplicity, while a disadvantage of complete pooling is that it ignores unobserved heterogeneity, which can induce omitted variable bias (Skrondal and Rabe-Hesketh 2004).

17 IX. Fixed and Random Effects

18 Fixed Effects (FE) Fixed effects (FE), also known as unit-specific effects, are used when we are only interested in analyzing the impact of variables that vary over time. Since fixed-effects estimators depend on deviations from group means, they are sometimes referred to as within-groups estimators (Davidson and MacKinnon, 1993). The equation for the FE model is:
$Y_{it} = \alpha_i + \beta x_{it} + u_{it}$
where $\alpha_i$ ($i = 1, \dots, n$) is the unknown intercept for each unit (n unit-specific intercepts), $Y_{it}$ is the dependent variable (DV), with i = unit and t = time, $x_{it}$ represents one independent variable (IV), β is the coefficient for that IV, and $u_{it}$ is the error term.
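To see that the within (demeaning) estimator and the dummy-variable (LSDV) regression give the same β, here is a small Python sketch under stated assumptions: one regressor, made-up data for three units, and a hand-written linear solver standing in for a proper numerical library. All names are hypothetical.

```python
"""Within (FE) estimator vs. LSDV with unit dummies (sketch)."""


def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)


def unit_means(unit, v):
    """Mean of v for each unit."""
    sums, counts = {}, {}
    for u, vi in zip(unit, v):
        sums[u] = sums.get(u, 0.0) + vi
        counts[u] = counts.get(u, 0) + 1
    return {u: sums[u] / counts[u] for u in sums}


def within_beta(unit, x, y):
    """FE slope: OLS on deviations from unit means (the within estimator)."""
    mx, my = unit_means(unit, x), unit_means(unit, y)
    xd = [xi - mx[u] for u, xi in zip(unit, x)]
    yd = [yi - my[u] for u, yi in zip(unit, y)]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


def lsdv_beta(unit, x, y):
    """FE slope from OLS with one dummy per unit (no common constant)."""
    units = sorted(set(unit))
    rows = [[1.0 if u == g else 0.0 for g in units] + [xi]
            for u, xi in zip(unit, x)]
    return ols(rows, y)[-1]


# Toy panel: three units with very different intercepts alpha_i, slope 2
unit = [0] * 4 + [1] * 4 + [2] * 4
x = [1, 2, 3, 4, 2, 3, 5, 6, 0, 1, 1, 3]
alpha = {0: 1.0, 1: 5.0, 2: 10.0}
noise = [0.1, -0.2, 0.05, 0.0, 0.3, -0.1, 0.0, -0.2, 0.2, 0.0, -0.1, 0.1]
y = [alpha[u] + 2 * xi + e for u, xi, e in zip(unit, x, noise)]

print(within_beta(unit, x, y), lsdv_beta(unit, x, y))
```

Both numbers agree: demeaning removes each $\alpha_i$ exactly as the unit dummies do, which is why xtreg, fe and reg with i.id report the same slope.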

19 Fixed Effects (FE) (Figure from Bartels 2009)

20 Fixed Effects (FE) FE explores the relationship between independent and dependent variables within a unit (country, person, organization, etc.). Each unit has its own characteristics that may or may not influence the independent variables (gender could, for example, influence opinion on a certain issue, the electoral system of a specific country may affect party competition, or a company's business practices may influence its stock price). When using FE, we assume that something within the individual/unit may impact or bias the independent or dependent variables and that we need to control for this. This is the logic behind the assumption of correlation between a unit's error term and the independent variables. FE removes the effect of time-invariant characteristics from the independent variables.

21 Fixed Effects (FE) Fixed effects models are not without their drawbacks. A fixed effects model may have so many cross-sectional units that it requires too many dummy variables. Too many variables reduce the degrees of freedom available for statistical tests. A model with too many variables may also be plagued by multicollinearity, which increases the standard errors. If the model contains variables that vary little within the groups, parameter estimation may be inefficient. Autocorrelation over time (serial correlation) is not solved by FE: use an LDV if T > 15, or use Prais-Winsten regression.

22 Fixed Effects (FE) FE models cannot be used to investigate time-invariant causes of the dependent variable, since time-invariant characteristics of the units are perfectly collinear with the dummies. The FE approach thus eliminates the ability to test between-cluster hypotheses. Fixed effects will not work well with data for which within-cluster variation is minimal, or for variables that change slowly over time. Since all between-cluster variation in the data is absorbed by the cluster-specific dummies, the effects of independent variables are solely within-cluster effects, which has implications for how one interprets coefficients. For TSCS data, such effects are interpreted as: for a given country, as X varies across time by one unit, Y increases or decreases by β units (Bartels 2009).

23 For TSCS data, a now-standard modeling practice is to use an FE model with panel-corrected standard errors and a lagged dependent variable to account for dynamics (Beck and Katz 1996; Beck 2001; Wilson and Butler 2007), though there is no ironclad consensus about this strategy among practitioners (Bartels 2009). However, fixed effects with a lagged DV performs well only when T > 15 (approximately).

24 AN EXAMPLE OF WHEN IT DOES MATTER

25 There can be little doubt, when viewed as a single-level model, that the relation is a negative one, albeit one with quite a lot of scatter around the line. Turning now to a random intercept multilevel model (Goldstein 2003), we can recognise that the observations belong to 10 groups.

26 With single level regression, it is assumed that the observations are independently and identically distributed, and this gives an overall negative relationship. There is no recognition that within each group the underlying relationship is positive. In contrast the multilevel model allows the intercept of each group to take on a different value from an overall distribution. As the following table shows, the multilevel model is a much better fit to these data, with a considerably smaller deviance. The multilevel model gives a substantially better interpretation of the data. There is no reason why such relationships should not be found in reality. The effect of taking account of groups in the multilevel model is marked here because the group-specific intercept is negatively related to the mean of X for each group. This behaviour can be elucidated by including the group mean of X in the multilevel model alongside the deviations of X from that mean (Paccagnella, 2006).

27 It is now clear that the between-group relation between Y and $\bar{X}$ is markedly negative, while the within-group relation with $X - \bar{X}$ is positive. The true relation between Y and X is only revealed when the within- and between-group relations are considered jointly in a multilevel model.
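The pattern described above can be reproduced with a few lines of Python. The hypothetical data below are built so that Y rises with X inside every group while groups with a higher mean of X sit lower; the sketch then compares the pooled, within-group and between-group slopes. Names and numbers are illustrative only.

```python
"""Pooled vs. within vs. between slopes for grouped data (sketch)."""


def slope(x, y):
    """OLS slope of y on x (with a constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def group_means(group, v):
    """Mean of v within each group."""
    sums, counts = {}, {}
    for g, vi in zip(group, v):
        sums[g] = sums.get(g, 0.0) + vi
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


def decompose(group, x, y):
    """Return (pooled, within, between) slopes."""
    mx, my = group_means(group, x), group_means(group, y)
    pooled = slope(x, y)
    within = slope([xi - mx[g] for g, xi in zip(group, x)],
                   [yi - my[g] for g, yi in zip(group, y)])
    gs = sorted(mx)
    between = slope([mx[g] for g in gs], [my[g] for g in gs])
    return pooled, within, between


# Hypothetical data: within each group Y rises with X (slope +1),
# but groups with a higher mean of X have lower intercepts.
group = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1, 2, 3, 3, 4, 5, 5, 6, 7]
y = [-6 * g + xi for g, xi in zip(group, x)]

pooled, within, between = decompose(group, x, y)
print(f"pooled {pooled:.2f}, within {within:.2f}, between {between:.2f}")
```

With these numbers the pooled slope is -1.4, the within slope +1 and the between slope -2. In this balanced example the pooled slope is the average of the other two, weighted by the within and between sums of squares of X: (6·1 + 24·(-2))/30 = -1.4, so the single-level regression hides the positive within-group relation.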

28 FE in Stata
With OLS and dummies (LSDV):
reg Y Xk i.id
(in older Stata versions: xi: reg Y Xk i.id)
With xtreg and the fe option:
xtset id time
xtreg Y Xk, fe
Note: xtset needs a numeric panel identifier, so if id is a string variable you need to convert it first: encode id, gen(id2), and use id2 instead of id in the xtset command.

29 Testing for Heteroskedasticity A test for heteroskedasticity available for fixed-effects models is the modified Wald test for groupwise heteroskedasticity. In Stata: xttest3 (a user-written command, run after xtreg, fe). The null hypothesis is homoskedasticity (constant variance). If we reject the null hypothesis, heteroskedasticity is a problem. Solution: use the option robust to obtain heteroskedasticity-robust standard errors (Huber/White or sandwich estimators); after xtreg, robust is equivalent to clustering on the panel variable.

30 Random Effects (RE) The rationale behind the random effects model is that, unlike in the fixed effects model, the variation across entities is assumed to be random and uncorrelated with the predictor or independent variables included in the model (Torres-Reyna 200x).
$Y_{it} = \alpha + \beta x_{it} + U_i + \varepsilon_{it}$
where $U_i$ is the between-cluster error and $\varepsilon_{it}$ is the within-cluster error. If you have reason to believe that differences across entities have some influence on your dependent variable, then you should use random effects. An advantage of random effects is that you can include time-invariant variables (e.g. gender). In the fixed effects model these variables are absorbed by the intercept variation.

31 Assumptions of RE
1. The two components of the composite error, $U_i$ and $\varepsilon_{it}$, are independent, i.e. $E(U\varepsilon) = 0$.
2. The variances of both $U_i$ ($\sigma_u^2$) and $\varepsilon_{it}$ ($\sigma_e^2$) are constant for all X (no heteroskedasticity).
3. The idiosyncratic residuals $\varepsilon_{it}$ at one point in time are not related to their values at another point in time (no autocorrelation in $\varepsilon_{it}$).
These three are relatively unproblematic, but:
4. Both $U_i$ and $\varepsilon_{it}$ are unrelated to the $X_{ik}$, i.e. $E(XU) = E(X\varepsilon) = 0$.
In order to use an RE model to identify and estimate β with two separate error terms, both need to be treated as unrelated to the observed independent variables. This is the common OLS assumption for ε, but it is now extended to U as well. Otherwise we can't estimate the separate effects of X and the composite error.

32 Assumptions of RE A random effects, or random intercept, approach treats $u_{0j}$ as normally distributed with mean zero and an estimable variance. This approach decomposes the total error into a level-1 component ($e_{ij}$) and a level-2 component ($u_{0j}$). The RE model is a partial pooling approach, with the effects of $X_{1ij}$ and $X_{2ij}$ a weighted average of the within- and between-cluster variation in the data (Gelman and Hill 2007). A major complaint lodged against the RE model relates to the restrictive assumption that level-1 independent variables be uncorrelated with the random effects term at level 2: $\mathrm{Cov}(X_{ij}, u_{0j}) = 0$. Since a level-1 variable varies both within and between clusters, many argue that this is an unrealistic assumption to satisfy, since unobserved heterogeneity will almost always be correlated with the independent variables (Bartels 2009).

33 If assumption (4) is true, then RE is the best (most efficient) estimator available. How does the RE estimator work, and what are its advantages?

34 How to estimate RE How do we estimate a model such as the equation below?
$Y_{it} = \alpha + \beta_1 x_{1it} + \beta_2 x_{2it} + \beta_3 x_{3it} + \dots + \beta_k x_{kit} + U_i + \varepsilon_{it}$

35 RE in Stata
With xtreg and the re option:
xtset id time
xtreg Y Xk, re
As a random-intercept multilevel model (mixed in Stata 13 and later):
xtmixed Y Xk || id:

36 Annotated output of xtreg fh_polity2, re: the constant is the mean of the country-level intercepts; sigma_u is the std. dev. of the country-level intercepts; sigma_e is the std. dev. at the time level.

37 But what is it really? How to estimate RE There are several ways, but the simplest is to estimate by Generalized Least Squares (GLS), which involves weighting the equation by a factor that transforms the problematic error term so that OLS can be used on the weighted or transformed model. This used to be the most common correction for heteroskedasticity: weight the data by the inverse of X or of the square of X (because the unequal variance was an increasing or decreasing function of X), so that the weighted equation has an error variance that satisfies the OLS assumptions. WLS (Weighted Least Squares) is thus a type of GLS estimation. Another example of GLS is in time-series analysis, where one might quasi-difference the data using ρ, the autocorrelation parameter for $\varepsilon_{it}$, and then use OLS on the transformed data to estimate structural effects when the error term is autocorrelated. Generally, GLS proceeds by weighting the data by the inverse square root of the error variance-covariance matrix, so that the weighted equation has a spherical error structure, with a common variance on the diagonal and zero covariances off the diagonal. Then OLS is used on the weighted equation.
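A minimal WLS sketch of the "weight by the inverse of X" idea, under the assumption (for illustration only) that the error variance is proportional to X². Dividing y = a + bX + e by X gives y/X = b + a(1/X) + e/X, whose error has constant variance, so plain OLS on the transformed variables recovers a and b. Names and data are hypothetical.

```python
"""Weighted least squares as OLS on transformed data (sketch).

Illustrative assumption: Var(e_i) is proportional to x_i^2, so dividing
y_i = a + b*x_i + e_i by x_i yields a homoscedastic error term.
"""


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x with a constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b


def wls_by_inverse_x(x, y):
    """Return (a, b) for y = a + b*x, estimated on y/x = b + a*(1/x) + e/x."""
    inv_x = [1.0 / xi for xi in x]
    y_over_x = [yi / xi for xi, yi in zip(x, y)]
    intercept_t, slope_t = simple_ols(inv_x, y_over_x)
    # In the transformed equation the roles swap: the slope on 1/x is the
    # original intercept a, and the transformed intercept is the original b.
    return slope_t, intercept_t


x = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]
y = [2.0 + 3.0 * xi for xi in x]  # exact line, no error, to check the algebra
print(wls_by_inverse_x(x, y))
```

The RE estimator follows the same recipe, only the transformation is the quasi-demeaning by θ rather than division by X.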

38 ICC Intraclass Correlation The proportion of the total variability in the outcome that is attributed to the higher level. Calculating the ICC shows how much of the variability in the dependent variable is attributed to the higher-level clusters (e.g. how much is attributed to variation between countries/regions/schools and how much to variation between individuals within these units). High ICC: high variability at the highest level. Low ICC: low variability at the highest level.

39 ICC Intraclass Correlation The ICC is the amount of variance in the DV explained by the unit-specific (level-2) variables:
$\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} = \frac{\text{variance in random intercepts}}{\text{variance in random intercepts} + \text{residual variance}} = \frac{\text{between-group variance}}{\text{between-group variance} + \text{within-group variance}}$
(The variance in random intercepts is labelled "Intercept [Subject = Variable Name] Variance" in the output.) We need to model unit effects if this number is higher than 0.05.

40 ICC Intraclass Correlation If the intraclass correlation approaches 1, there is no variance left to explain at the individual level: the units within a cluster are all similar, and results are driven by between-cluster effects. If the ICC is 0, there is no clustering structure, and simple OLS can be used.
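A sketch of how the ICC can be estimated without fitting a multilevel model: the one-way ANOVA estimator, assuming balanced clusters, with σ_u² = (MSB - MSW)/T truncated at zero and σ_e² = MSW. This is only one of several estimators (mixed models estimate the variances by ML or REML instead), and the function names and data are hypothetical.

```python
"""One-way ANOVA estimate of the intraclass correlation (sketch).

Assumes a balanced design: every cluster has the same number of observations.
"""


def icc_anova(groups):
    """groups: list of equal-length lists of outcome values, one per cluster."""
    n_groups, size = len(groups), len(groups[0])
    means = [sum(g) / size for g in groups]
    grand = sum(means) / n_groups
    # mean squares between and within clusters
    msb = size * sum((m - grand) ** 2 for m in means) / (n_groups - 1)
    msw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (
        n_groups * (size - 1))
    var_between = max(0.0, (msb - msw) / size)  # sigma_u^2, truncated at 0
    var_within = msw                            # sigma_e^2
    return var_between / (var_between + var_within)


def needs_unit_effects(icc, threshold=0.05):
    """Rule of thumb from the slides: model unit effects if ICC > 0.05."""
    return icc > threshold


clustered = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unclustered = [[1, 2, 3], [1, 2, 3]]
for data in (clustered, unclustered):
    icc = icc_anova(data)
    print(f"ICC = {icc:.3f}, model unit effects: {needs_unit_effects(icc)}")
```

In the first data set the clusters differ strongly in level, so most of the variance is between clusters; in the second the clusters are identical and the ICC is zero.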

41 Needed: estimates of the two variance terms $\sigma_u^2$ and $\sigma_e^2$. If we could obtain those estimates, we could weight or transform the equation in the following way and then use OLS to estimate the effects:
$$Y_{it} - \theta\bar{Y}_i = (\alpha - \theta\alpha) + \beta_1(x_{1it} - \theta\bar{x}_{1i}) + \beta_2(x_{2it} - \theta\bar{x}_{2i}) + \dots + \beta_k(x_{kit} - \theta\bar{x}_{ki}) + (U_i - \theta U_i + \varepsilon_{it} - \theta\bar{\varepsilon}_i)$$
If the observed Y and X in the model are transformed/weighted by θ (theta), the resulting error term will be OLS-ready. If we knew $\sigma_u^2$ and $\sigma_e^2$, we could simply use them directly, but we don't know the population values of these two error variances. We need to estimate them from our data, which is why this application is called Feasible Generalized Least Squares (FGLS).

42 How do we get estimates of $\sigma_u^2$ and $\sigma_e^2$? From the within (FE) regression we get an estimate of $\sigma_e^2$. Why? FE eliminates $U_i$ altogether, and the error term is pure ε. From the between regression (the means of $Y_i$ over time against the means of all the $X_i$ over time), we get an error term whose variance is $\sigma_u^2 + \sigma_e^2/T$, or the variance of U plus the time-averaged variance of ε. With this information we can calculate an estimate of each error component, calculate θ (theta), transform the equation by this estimate and re-estimate the model with OLS. Computational formula for θ (balanced panel with T periods):
$\theta = 1 - \sqrt{\frac{\sigma_e^2}{T\sigma_u^2 + \sigma_e^2}}$
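The two formulas above can be turned into a short Python sketch, assuming a balanced panel with T periods: recover σ_u² from the between-regression error variance, then compute θ. The input numbers and function names are hypothetical; Stata's xtreg, re carries out a comparable calculation internally.

```python
"""Variance components and the RE quasi-demeaning weight theta (sketch).

Assumes a balanced panel with T periods per unit.
"""
import math


def sigma_u2_from_between(var_between_error, sigma_e2, n_periods):
    """Between-regression error variance = sigma_u^2 + sigma_e^2 / T."""
    return max(0.0, var_between_error - sigma_e2 / n_periods)


def theta(sigma_u2, sigma_e2, n_periods):
    """theta = 1 - sqrt(sigma_e^2 / (T * sigma_u^2 + sigma_e^2))."""
    return 1.0 - math.sqrt(sigma_e2 / (n_periods * sigma_u2 + sigma_e2))


# Hypothetical inputs: sigma_e^2 from the within (FE) regression and the
# error variance of the between regression, with T = 4
s_e2 = 2.0
s_u2 = sigma_u2_from_between(1.5, s_e2, 4)  # 1.5 - 2.0/4 = 1.0
print(theta(s_u2, s_e2, 4))

# theta moves from 0 towards 1 as the unit-level variance grows
for var_u in (0.0, 1.0, 100.0):
    print(var_u, round(theta(var_u, 1.0, 10), 3))
```

Note that θ also rises with T for a given variance ratio: with more periods per unit, the unit means are more informative and RE leans further towards FE.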

43 So what does RE actually do? We can examine θ more closely to get a better idea of what RE is doing. As θ (theta) gets closer to 1, more and more of the composite error variance is made up of $U_i$, the unit-level or between variance. So what happens then? The weighted (quasi-demeaned) RE equation then reduces to the FE equation! This is as it should be: if all of the error variation is from U, let's difference out U completely, as the FE model does. As θ gets closer to 0, more and more of the composite error variance is made up of random idiosyncratic variance ε, with no unit variance at all. So the RE equation reduces to POOLED OLS in this instance! This is also as it should be, because we should only take unit effects into account when they exist!

44 So what does RE actually do? We can look at the RE estimator as a weighted average of FE and pooled OLS, with the weight (θ) depending on how much of the estimated composite error variance comes from the units. This is a middle ground, then, between the full unit-level differencing model of FE and the assumption of no unit effects in pooled OLS. If there is a lot of unit-level variation, RE is closer to FE. If there is not so much unit-level variation, RE is closer to pooled OLS. This seems reasonable, IF the RE assumption of zero correlation between X and U is tenable: a big if!
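The weighted-average interpretation can be checked numerically. The Python sketch below, with hypothetical data in which the within-group slope is +1 and the between-group slope is -2, regresses the quasi-demeaned Y on the quasi-demeaned X for several values of θ: θ = 0 reproduces pooled OLS and θ = 1 reproduces FE. Names and data are illustrative assumptions.

```python
"""RE as OLS on quasi-demeaned data: theta=0 -> pooled OLS, theta=1 -> FE."""


def slope(x, y):
    """OLS slope of y on x (with a constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def group_means(group, v):
    """Mean of v within each group."""
    sums, counts = {}, {}
    for g, vi in zip(group, v):
        sums[g] = sums.get(g, 0.0) + vi
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


def quasi_demeaned_slope(group, x, y, theta):
    """Slope from regressing y_it - theta*ybar_i on x_it - theta*xbar_i.

    The transformed constant (1 - theta) * alpha is absorbed by the
    intercept of the simple regression (it is zero when theta = 1).
    """
    mx, my = group_means(group, x), group_means(group, y)
    xt = [xi - theta * mx[g] for g, xi in zip(group, x)]
    yt = [yi - theta * my[g] for g, yi in zip(group, y)]
    return slope(xt, yt)


# Hypothetical data: within-group slope +1, between-group slope -2
group = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1, 2, 3, 3, 4, 5, 5, 6, 7]
y = [-6 * g + xi for g, xi in zip(group, x)]

for th in (0.0, 0.5, 0.9, 1.0):
    print(th, round(quasi_demeaned_slope(group, x, y, th), 3))
```

For a balanced panel the slope traced out is (W_xy + (1-θ)²B_xy)/(W_xx + (1-θ)²B_xx), where W and B are the within and between sums of squares and cross-products; here θ = 0.5 gives -0.5, between the pooled value -1.4 and the FE value 1.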

45 X. Hausman Test

46 Fixed or Random Effects? To decide between fixed and random effects, you can run a Hausman (1978) test, where the null hypothesis is that the preferred model is random effects, against the alternative of fixed effects (see Greene 2008). The Hausman test is thus used to detect violations of the random-effects modeling assumption that the explanatory variables are orthogonal to the unit effects. It basically tests whether the unique errors ($u_i$) are correlated with the regressors; the null hypothesis is that they are not. A significant test result is taken as evidence of a correlation between x and $u_i$, implying that the random-effects model should be rejected in favor of the fixed-effects model.

47 Fixed or Random Effects? How to conduct a Hausman test to detect violations of the random-effects modeling assumption: run a fixed effects model and save the estimates, then run a random effects model, save the estimates and perform the test:
xtreg y x1, fe
estimates store fixed
xtreg y x1, re
estimates store random
hausman fixed random
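For intuition about what hausman reports, here is a Python sketch for the one-coefficient case, H = (b_FE - b_RE)² / (Var(b_FE) - Var(b_RE)), compared with 3.841, the 5% critical value of chi-squared(1). The estimates are hypothetical, and Stata's hausman works with the full coefficient vectors and covariance matrices rather than a single coefficient.

```python
"""Hausman statistic for a single coefficient (sketch).

H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), approx. chi-squared(1) under H0.
"""

CHI2_1_CRIT_05 = 3.841  # 5% critical value, 1 degree of freedom


def hausman_1(b_fe, se_fe, b_re, se_re):
    """Return the Hausman statistic for one coefficient."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        # Under H0 RE is efficient, so Var(b_FE) should exceed Var(b_RE).
        raise ValueError("Var(b_FE) - Var(b_RE) is not positive")
    return (b_fe - b_re) ** 2 / var_diff


# Hypothetical estimates, e.g. read off xtreg ..., fe and xtreg ..., re
h = hausman_1(b_fe=0.50, se_fe=0.10, b_re=0.30, se_re=0.06)
print(f"H = {h:.2f}; prefer FE: {h > CHI2_1_CRIT_05}")
```

Here H = 0.04/0.0064 = 6.25 exceeds 3.841, so H0 would be rejected in favor of FE.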

48 Syntax by command (reg, xtreg, areg)
Entity fixed effects
reg: reg y x1 x2 x3 x4 x5 i.country
xtreg: xtreg y x1 x2 x3 x4 x5, fe
areg: areg y x1 x2 x3 x4 x5, absorb(country)
Entity and time fixed effects
reg: reg y x1 x2 x3 x4 x5 i.country i.year
xtreg: xtreg y x1 x2 x3 x4 x5 i.year, fe
areg: areg y x1 x2 x3 x4 x5 i.year, absorb(country)
Random effects
xtreg: xtreg y x1 x2 x3 x4 x5, re
xtreg: xtreg y x1 x2 x3 x4 x5, re robust

49 The Hybrid Model What can we do if we want to include time-invariant covariates but the RE assumption doesn't hold? The aim is to keep the FE setup, while trying to say something about the effects of time-invariant Xs, and to keep the RE setup, while at the same time allowing possible correlation between X and $U_i$.

50 Rabe-Hesketh & Skrondal / Bell & Jones: Hybrid Model The idea is that the possible covariation of time-varying Xs and $U_i$ is what messes up RE. But this possible covariation is the result of model misspecification: something in the $U_i$ term is related to X and needs to be accounted for, and RE cannot account for it because of its assumption that E(XU) = 0. But we can bring the covariation between X and $U_i$ into the model indirectly, by including the mean of X as an additional independent variable in the RE model. Whatever covariation between X and U may exist is then accounted for: if units that are generally high (low) on X also have high (low) U terms, the mean of X in the model will pick this up. The effect of regular X can now be estimated while controlling for this possible confounding problem.
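A sketch of why the hybrid specification separates the two effects, assuming a balanced panel and hypothetical data where the within-unit slope is +1 and the between-unit slope is -2. The deviation and mean columns play the roles of gdpmeandev and gdpmean in the Stata example. In a balanced panel the deviation column is orthogonal to both the mean column and the constant, so the OLS point estimates shown here should coincide with the RE point estimates; only the standard errors would differ. Names are illustrative.

```python
"""Hybrid (within-between) model: regress Y on X - mean(X) and mean(X) (sketch)."""


def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)


def hybrid_columns(unit, x):
    """Return (deviation, mean) columns, like egen ... mean(), by(unit)."""
    sums, counts = {}, {}
    for u, xi in zip(unit, x):
        sums[u] = sums.get(u, 0.0) + xi
        counts[u] = counts.get(u, 0) + 1
    mean = [sums[u] / counts[u] for u in unit]
    dev = [xi - m for xi, m in zip(x, mean)]
    return dev, mean


unit = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1, 2, 3, 3, 4, 5, 5, 6, 7]
y = [-6 * u + xi for u, xi in zip(unit, x)]  # within +1, between -2
dev, mean = hybrid_columns(unit, x)
const, b_within, b_between = ols([[1.0, d, m] for d, m in zip(dev, mean)], y)
print(f"within (dev) = {b_within:.2f}, between (mean) = {b_between:.2f}")
```

The coefficient on the deviation is the FE-type within effect, while the coefficient on the unit mean captures the between effect that FE would discard.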

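51 Rabe-Hesketh & Skrondal / Bell & Jones: Hybrid Model in Stata
use "C:\Steve\exercise2.dta", clear
xtset ccode year
* log of GDP per capita
gen mad_gdppcl = ln(mad_gdppc)
* country mean of logged GDP per capita (between component)
egen gdpmean = mean(mad_gdppcl), by(ccode)
* deviation from the country mean (within component)
gen gdpmeandev = mad_gdppcl - gdpmean
* hybrid model estimated as RE
xtreg fh_ipolity2 gdpmeandev gdpmean al_ethnic al_religion, re
* same model with standard errors clustered by country
xtreg fh_ipolity2 gdpmeandev gdpmean al_ethnic al_religion, re vce(cluster ccode)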

52 Potential Problems and Solutions with Panel Data Analysis
1. Autocorrelation in the time dimension within units: lagged DV, Prais-Winsten or Cochrane-Orcutt
2. Spatial autocorrelation (correlated errors between units that are geographically proximate): PCSE:s
3. Heteroscedasticity (i.e. unequal variance between units): FE or RE (PCSE)
4. Contemporaneous correlation of errors across units (something affects all units at the same time): PCSE
5. Structural issues (it is likely that each disparate unit has a unique constant, as each unit likely has a distinct history): FE or RE

53 Panel data. Heterogeneity in levels and effects So far we have talked about the inherent problems with the data. Other (and equally serious) sources of heterogeneity inflate the previously noted problems:
Intercept heterogeneity: Assuming a common intercept is as problematic as assuming constant error variance. Is it reasonable to assume that welfare spending levels are the same in Sweden and in Spain, absent all other relevant factors?
Slope heterogeneity: Assuming common slopes can lead to erroneous inferences about the relationship between dependent and independent variables. The effect of X on Y might be accelerated by context-specific effects that are not accounted for in the model.
Simultaneous intercept and slope heterogeneity

54 [Figure: four model types, with labels for intercept, slope, error term within clusters, variability in intercepts between groups and variability in slopes within clusters. 1. OLS; 2. Random intercept; 3. Random coefficient; 4. Random coefficient and random intercept]

55 Panel data. Heterogeneity in levels and effects Heterogeneity (example from Wilson and Butler, 2007)

56 EXERCISE 5
1. Open the abridged Russian panel data set in WIDE format: exercise1
2. Run the following regression: $\text{supdem}_{i,t} = \alpha + \gamma\,\text{supdem}_{i,t-1} + \beta\,\text{lifesat}_{i,t-1} + e_{i,t}$
3. Open the same data set in LONG format: exercise3
4. Run the same regression and compare the results
5. Experiment with the lag length of lifesat and various controls (gender, age, education)
6. Run the regression from your final model but with PCSE:s, FE and RE (3 models) and compare the results

57 SOLUTION
* WIDE format: one column per wave
use "exercise1", clear
regr supdem2 supdem1 v39y1
regr supdem3 supdem2 v39y1
* LONG format: declare the panel, then use the lag operator l.
use "exercise3", clear
xtset v1 wave
regr supdem l.supdem l.v39y
regr supdem l.supdem v39y l.v39y
regr supdem l.supdem v39y l.v39y v35x v42x v293x
* final model with PCSE:s, FE and RE
set matsize 1500
xtpcse supdem l.supdem v39y l.v39y v35x v42x v293x, p
xtreg supdem l.supdem v39y l.v39y v35x v42x v293x, fe
xtreg supdem l.supdem v39y l.v39y v35x v42x v293x, re
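When moving from the WIDE to the LONG file, the lag operator does the work that separate columns (supdem1, supdem2, ...) did before. The hypothetical Python sketch below mimics what L.supdem does after xtset v1 wave: each row gets the same respondent's value from the previous wave, and the lag is missing when that wave is absent. The data values and the function name are made up for illustration.

```python
"""What a panel lag such as L.supdem does in LONG data (sketch).

Each observation takes the value of the same unit at time t-1; if that
unit-period is absent, the lag is missing (None), as with Stata's L. operator.
"""


def panel_lag(unit, time, value):
    """Return the one-period lag of value within each unit."""
    lookup = {(u, t): v for u, t, v in zip(unit, time, value)}
    return [lookup.get((u, t - 1)) for u, t in zip(unit, time)]


# Hypothetical long data: respondent id (v1), wave, supdem
v1 = [101, 101, 101, 102, 102]
wave = [1, 2, 3, 1, 3]  # respondent 102 skipped wave 2
supdem = [3, 4, 4, 2, 5]
print(panel_lag(v1, wave, supdem))
```

The lag is looked up by (unit, wave) rather than taken from the previous row, so a skipped wave yields a missing value instead of silently borrowing an older observation.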

58 EXERCISE #6
1. Open the abridged QoG panel data set from file exercise2
2. Run an autoregressive equation with fh_ipolity2 (i,t) as the dependent variable and choose two independent variables
3. Run the same regression as in 2 but with PCSE:s
4. Run the same regression as in 2 but with fixed effects, using xtreg and the fe option
5. Run the same regression as in 2 but with fixed effects, using country dummies, and compare the results from 3 and 4
6. Run the same regression as in 2 but with fixed effects and PCSE:s
7. Conduct a Hausman test
8. Run the same regression but now using random effects

59 Solution
use "exercise2", clear
xtset ccode year
* autoregressive model (pooled OLS)
reg fh_ipolity2 l.fh_ipolity2 l.mad_gdppc l.mad_pop
* with PCSE:s
set matsize 1500
xtpcse fh_ipolity2 l.fh_ipolity2 l.mad_gdppc l.mad_pop, p
* fixed effects with xtreg
xtreg fh_ipolity2 l.fh_ipolity2 l.mad_gdppc l.mad_pop, fe
* fixed effects with country dummies
reg fh_ipolity2 l.fh_ipolity2 l.mad_gdppc l.mad_pop i.ccode
* fixed effects with PCSE:s
xtpcse fh_ipolity2 l.fh_ipolity2 l.mad_gdppc l.mad_pop i.ccode, p
* Hausman test
xtreg fh_ipolity2 l.fh_ipolity2 l.mad_gdppc l.mad_pop, fe
estimates store fixed
xtreg fh_ipolity2 l.fh_ipolity2 l.mad_gdppc l.mad_pop, re
estimates store random
hausman fixed random
* random effects
xtreg fh_ipolity2 l.fh_ipolity2 l.mad_gdppc l.mad_pop, re

60 Cross-sectional dependence According to Baltagi, cross-sectional dependence is a problem in macro panels with long time series (over many years). It is not much of a problem in micro panels (few years and a large number of cases). The null hypothesis in the Breusch-Pagan LM test of independence is that residuals across units are not correlated. In Stata, use xttest2 (a user-written command; run it after xtreg, fe):
xtreg y x1, fe
xttest2
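A sketch of the Breusch-Pagan LM statistic, assuming a balanced panel of FE residuals (these have mean zero within each unit, so no further centering is done): LM = T Σ_{i<j} ρ̂²_ij, referred to a chi-squared distribution with N(N-1)/2 degrees of freedom. The residuals and function names are hypothetical; xttest2 remains the tool for real analyses.

```python
"""Breusch-Pagan LM test of cross-sectional independence (sketch).

LM = T * sum_{i<j} rho_ij^2, approx. chi-squared with N(N-1)/2 df under H0,
where rho_ij is the correlation of the residuals of units i and j.
"""
import math


def residual_corr(e_i, e_j):
    """Correlation of two residual series (residuals assumed mean zero)."""
    num = sum(a * b for a, b in zip(e_i, e_j))
    return num / math.sqrt(sum(a * a for a in e_i) * sum(b * b for b in e_j))


def bp_lm(residuals):
    """residuals: list of N equal-length residual series, one per unit.

    Returns (LM statistic, degrees of freedom).
    """
    n_units, n_periods = len(residuals), len(residuals[0])
    total = sum(residual_corr(residuals[i], residuals[j]) ** 2
                for i in range(n_units) for j in range(i + 1, n_units))
    return n_periods * total, n_units * (n_units - 1) // 2


# Hypothetical residuals for 3 units over 4 periods
e = [[1.0, -1.0, 1.0, -1.0],
     [1.0, -1.0, 1.0, -1.0],  # moves exactly with unit 1
     [1.0, 1.0, -1.0, -1.0]]  # uncorrelated with both
stat, df = bp_lm(e)
print(stat, df)
```

Only the pair of units 1 and 2 contributes (ρ̂ = 1), so LM = 4 · 1 = 4 on 3 degrees of freedom; with real data a large LM relative to the chi-squared critical value signals cross-sectional dependence.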
