Panel Data

Applied Econometrics: Topic 6. March 2, 2012

Overview
- Many economic applications involve panel data.
- Panel data has both cross-sectional and time series aspects.
- Regression methods are typically used with panel data, relating a dependent variable to explanatory variables.
- Most issues with panel data come down to how we model the intercept and/or the error.
- We will cover the pooled, fixed effects and random effects models (and briefly mention the random coefficients model).
- Reading: Koop (2008) chapter 8 and Gujarati chapter 1

Panel Data Notation
- Cross-section: relates to individual $i$ for $i = 1, \ldots, N$. Depending on the empirical application, an individual could be a person, company, country, etc.
- Time series: $t = 1, \ldots, T$.
- $Y_{it}$ is the observation of the dependent variable for individual $i$ at time $t$.
- $X_{it}$ is the observation of the explanatory variable for individual $i$ at time $t$.
- We will do derivations and discuss many issues in panel data modelling using only a single explanatory variable, but all ideas also hold with many explanatory variables.

The Pooled Model
- Treats all observations as though they came from the same regression model:
$$Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}$$
- If this model satisfies the classical assumptions (see next slide for a reminder), then standard regression results hold. E.g. OLS will be the best linear unbiased estimator, and confidence intervals, hypothesis tests, etc. are the same as with cross-sectional regression.
- With the pooled model, nothing is new. Why then do we need to study specialized methods for panel data?
- With most panel data sets, the pooled model is inappropriate. To explain why, we introduce the idea of an individual effect.
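
As a minimal sketch (plain Python, no econometrics package assumed), pooled OLS with one regressor stacks all $TN$ observations and ignores which individual and period each one came from. The `(i, t, x, y)` tuple layout, the function name `pooled_ols` and the data are hypothetical illustrations.

```python
# Pooled OLS for a panel with one explanatory variable.
# The panel is a list of (i, t, x, y) tuples; i and t are ignored,
# because the pooled model treats every observation alike.

def pooled_ols(panel):
    """Return (alpha_hat, beta_hat) from OLS of y on a constant and x."""
    xs = [x for _, _, x, _ in panel]
    ys = [y for _, _, _, y in panel]
    n = len(panel)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical example: N = 2 individuals, T = 3 periods, all on the line y = 1 + 2x.
panel = [
    (1, 1, 0.0, 1.0), (1, 2, 1.0, 3.0), (1, 3, 2.0, 5.0),
    (2, 1, 3.0, 7.0), (2, 2, 4.0, 9.0), (2, 3, 5.0, 11.0),
]
alpha_hat, beta_hat = pooled_ols(panel)
print(alpha_hat, beta_hat)
```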

A Reminder of What the Classical Assumptions Are
1. $E(\varepsilon_i) = 0$: mean zero errors.
2. $\text{var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2$: constant variance errors (homoskedasticity).
3. $\text{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
4. $\varepsilon_i$ is Normally distributed.
5. $X_i$ is fixed. It is not a random variable.

Individual Effects Models
- The two main individual effects models are the fixed effects model and the random effects model.
- Before discussing estimation and hypothesis testing, we explain the basic ideas behind individual effects models using an example.
- Returns to Schooling Example: regression of $Y$ = log income on $X$ = years of education.
- Note: we would want to include other explanatory variables and maybe use instrumental variables, but we ignore these issues to keep the example simple.
- Suppose we have panel data for many people for several years.

- The pooled model is: $Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}$
- This regression assumes all individuals have the same relationship between education and income.
- Remember what a regression does:
- The slope ($\beta$) is the marginal effect of $X$ on $Y$. Since $Y$ is log income, if education increases by 1 year, income tends to increase by approximately $100 \times \beta$ per cent.
- The intercept ($\alpha$) is the predicted or fitted (log) income when $X = 0$, i.e. it predicts income for an individual with no schooling.
- The intercept is the starting point, the benchmark that an individual's earnings develop from.

- The pooled model assumes $\alpha$ and $\beta$ are the same for everyone.
- In practice, $\alpha$ (and sometimes $\beta$) are different for different individuals.
- E.g. individuals from advantaged backgrounds probably do better than those without advantages. Someone with an advantaged background might earn a higher wage than a disadvantaged person with the same education level.
- If this is the case, intercepts should vary across individuals. In empirical practice, this often does seem to be the case.

- Assume the advantaged individual and the disadvantaged individual do have different intercepts.
- Figure 8.1 is an XY plot of education and income for these two individuals.
- 2 years of data points for the advantaged individual are labelled with *; 2 years of data points for the disadvantaged individual are labelled with a different marker.
- The true regression line for the advantaged individual lies above the regression line for the disadvantaged individual, so the intercepts differ for the two individuals.
- Returns to schooling are positive, since the lines are upward sloping.
- In Figure 8.1, OLS on the pooled model would estimate a line which has the right slope but the wrong intercept.
- Since the slope is the measure of returns to schooling, maybe the pooled model is okay? No, as the next example will show.


- Figure 8.2 has the same setup as Figure 8.1. However, we do not have a wide spread of values of the explanatory variable.
- For the advantaged individual, years of education are always roughly 12: the advantaged individual stopped schooling after secondary school and then took a job.
- For the disadvantaged individual, years of education are always roughly 18: the disadvantaged individual stopped schooling after extensive university study and then took a job.
- Figure 8.2 plots the two true regression lines along with the fitted regression line, which is produced using OLS on the pooled model.


- Note that the fitted line does pass near the data points, suggesting it is fitting the data well.
- However, it is completely wrong! It is downward sloping, indicating that the returns to schooling are negative.
- The pooled model leads to exactly the wrong conclusion.
- Key point: if different individuals have different regression lines and you ignore this fact, you can be seriously misled.
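
The Figure 8.2 problem can be reproduced with a few made-up numbers. In this hypothetical sketch both individuals share the same positive slope (0.1) but have different intercepts (3 and 1), and the individual with the higher intercept has less education (about 12 versus about 18 years). Pooled OLS then produces a negative slope, even though each individual's own slope is positive. The data are noise-free so that only this effect is visible.

```python
# Hypothetical numbers mimicking Figure 8.2.
TRUE_BETA = 0.1
intercepts = {"advantaged": 3.0, "disadvantaged": 1.0}
education = {"advantaged": [11.5, 12.0, 12.5], "disadvantaged": [17.5, 18.0, 18.5]}

xs, ys, ids = [], [], []
for person, a in intercepts.items():
    for x in education[person]:
        xs.append(x)
        ys.append(a + TRUE_BETA * x)  # each individual lies exactly on its own line
        ids.append(person)


def slope(x, y):
    """OLS slope of y on x (regression with an intercept)."""
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)


# Pooled model: one common intercept for everyone.
pooled_beta = slope(xs, ys)

# Slope fitted separately on each individual's own data.
within_betas = {
    p: slope([x for x, i in zip(xs, ids) if i == p], [y for y, i in zip(ys, ids) if i == p])
    for p in intercepts
}
print(f"pooled slope: {pooled_beta:.3f}")
print("individual slopes:", within_betas)
```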

- What should we do instead of the pooled model?
- To handle cases like Figures 8.1 and 8.2, we need a different intercept for each individual.
- Terminology: $\alpha_i$ is the intercept for individual $i$ = the individual effect.
- Regression model with an individual effect:
$$Y_{it} = \alpha_i + \beta X_{it} + \varepsilon_{it}$$
- Now we turn to the question of how to estimate individual effects models.

The Fixed Effects Model
- The fixed effects model uses dummy variables to model the individual effect.
- You will know about dummy variables from previous study (for a reminder, see chapter 2 of my textbook). A dummy variable is either 0 or 1.
- Fixed effects model: create $N$ different dummy variables $D^{(j)}$ for $j = 1, \ldots, N$, where $D^{(j)}_{it} = 1$ for the $j$th individual (i.e. when $i = j$) and equals zero for all other individuals.

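Example of fixed effects dummy variables when $N = 4$ and $T = 2$ (observations stacked by individual, then by time):
$$D^{(1)} = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad D^{(2)} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad D^{(3)} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad D^{(4)} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$$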

- The fixed effects estimator runs the regression:
$$Y_{it} = \alpha_1 D^{(1)}_{it} + \alpha_2 D^{(2)}_{it} + \cdots + \alpha_N D^{(N)}_{it} + \beta X_{it} + \varepsilon_{it}$$
- Note: each individual has a different intercept.
- Individual $j$ has $D^{(j)} = 1$ for the observations on this individual (with all other dummy variables equalling zero). Plugging $D^{(j)}_{jt} = 1$ into the regression gives
$$Y_{jt} = \alpha_j + \beta X_{jt} + \varepsilon_{jt}$$
which is the same form as the individual effects specification.

- Estimation: this is just a regression model, so standard regression methods can be used.
- E.g. if the errors satisfy the classical assumptions, then OLS is BLUE, and confidence intervals and hypothesis tests can be done in the standard way, etc.
- Note: there is a large number of explanatory variables ($N$ dummy variables plus the regular explanatory variable). Panel data sets often have a very large $N$, and it can be hard to obtain precise estimates of so many regression coefficients.
- Gujarati calls this the fixed effects least squares dummy variable (LSDV) estimator.
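
To make the LSDV regression concrete, here is a minimal sketch that builds the $N$ dummy columns plus the regressor and solves the OLS normal equations $(Z'Z)b = Z'y$ by Gaussian elimination. No constant is included, since the $N$ dummies already play that role. The function names `solve` and `lsdv` and the noise-free data are hypothetical. Real software would use a numerically more careful solver.

```python
# Least squares dummy variable (LSDV) estimator for
#   Y_it = alpha_1 D(1)_it + ... + alpha_N D(N)_it + beta X_it + eps_it

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def lsdv(panel, n_individuals):
    """panel: list of (i, t, x, y) with i in 1..N. Returns ([alpha_1..alpha_N], beta)."""
    rows, ys = [], []
    for i, _, x, y in panel:
        dummies = [1.0 if j == i else 0.0 for j in range(1, n_individuals + 1)]
        rows.append(dummies + [x])  # N dummy columns, then the regressor
        ys.append(y)
    k = n_individuals + 1
    ztz = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    zty = [sum(r[p] * y for r, y in zip(rows, ys)) for p in range(k)]
    coef = solve(ztz, zty)
    return coef[:-1], coef[-1]


# Hypothetical noise-free panel: N = 3, T = 3, intercepts 1, 2, 4 and beta = 0.5.
true_alpha = {1: 1.0, 2: 2.0, 3: 4.0}
panel = [(i, t, float(t + i), true_alpha[i] + 0.5 * (t + i))
         for i in true_alpha for t in range(1, 4)]
alphas, beta = lsdv(panel, n_individuals=3)
print(alphas, beta)
```

With $N$ in the thousands this builds an $(N+1) \times (N+1)$ system, which is one practical reason for the transformations discussed next.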

- One trick used in some econometrics software packages: difference the individual effects model.
- Note: differencing is a time series concept you should know from previous study (see page 18 of my textbook). With panel data, $\Delta Y_{it} = Y_{it} - Y_{i,t-1}$.
- If we subtract the individual effects equation for $Y_{i,t-1}$ from the equation for $Y_{it}$, we get
$$\Delta Y_{it} = \beta \Delta X_{it} + \Delta \varepsilon_{it}$$
- Now we need only estimate $\beta$.
- Possible problems with using this trick:
  - It does not produce estimates of $\alpha_i$ (which you might want in some applications).
  - It can't be used with explanatory variables which are constant over time (e.g. gender, race, years of schooling of parent, etc.), since $\Delta X_{it} = 0$ for all such variables and they simply drop out of the model.
  - If the original errors $\varepsilon_{it}$ satisfy the classical assumptions, then $\Delta \varepsilon_{it}$ will not. In particular, they violate Assumption 3, since $\text{cov}(\Delta \varepsilon_{it}, \Delta \varepsilon_{i,t-1}) = -\sigma^2 \neq 0$.
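
A minimal sketch of the differencing trick, assuming a balanced panel stored as `{i: (x_list, y_list)}` ordered by time (a hypothetical layout). It regresses $\Delta Y_{it}$ on $\Delta X_{it}$ without an intercept, because differencing removed $\alpha_i$. It also shows that a time-invariant variable (a hypothetical `female` dummy with coefficient 2) differences to zero, so its coefficient cannot be estimated.

```python
def first_difference(series):
    """Differences within one individual's time series: [s2 - s1, s3 - s2, ...]."""
    return [b - a for a, b in zip(series, series[1:])]


def fd_estimator(data):
    """data: {i: (x_list, y_list)} ordered by t. OLS of dY on dX through the origin."""
    num = den = 0.0
    for x, y in data.values():
        dx, dy = first_difference(x), first_difference(y)
        num += sum(a * b for a, b in zip(dx, dy))
        den += sum(a * a for a in dx)
    return num / den


# Hypothetical noise-free data: Y_it = alpha_i + 0.5 X_it + 2 * female_i.
alpha = {1: 1.0, 2: -3.0, 3: 6.0}
female = {1: 1, 2: 0, 3: 1}
x = {1: [1.0, 2.0, 4.0], 2: [0.0, 3.0, 3.5], 3: [2.0, 2.5, 5.0]}
data = {i: (x[i], [alpha[i] + 0.5 * xi + 2 * female[i] for xi in x[i]]) for i in alpha}

beta_fd = fd_estimator(data)
print(beta_fd)
print(first_difference([female[1]] * 3))  # time-invariant variable: all differences are 0
```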

- Another trick: take deviations from individual means.
- Notation: bars over variables are averages over time, e.g.
$$\bar{X}_i = \frac{1}{T}\sum_{t=1}^{T} X_{it}$$
and the deviation from individual means is $X_{it} - \bar{X}_i$.
- If we average the fixed effects model over time and subtract this off, we can write it in deviations-from-means form:
$$Y_{it} - \bar{Y}_i = \beta\left(X_{it} - \bar{X}_i\right) + \left(\varepsilon_{it} - \bar{\varepsilon}_i\right)$$
- This transformation also gets rid of all the dummy variables, but has the same possible problems as differencing.
- Gujarati calls this the fixed effects within-group estimator.
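
The deviations-from-means transformation can be sketched in a few lines, assuming the same hypothetical `{i: (x_list, y_list)}` layout as before. Each individual's series is demeaned, and OLS through the origin is run on the stacked deviations. Because the $\alpha_i$ disappear, very different intercepts do not affect the slope estimate.

```python
# Within (deviations from individual means) estimator for
#   Y_it - Ybar_i = beta (X_it - Xbar_i) + (eps_it - epsbar_i)

def demean(values):
    """Deviations of one individual's series from its own time average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def within_estimator(data):
    """data: {i: (x_list, y_list)}; OLS through the origin on demeaned data."""
    num = den = 0.0
    for x, y in data.values():
        dx, dy = demean(x), demean(y)
        num += sum(a * b for a, b in zip(dx, dy))
        den += sum(a * a for a in dx)
    return num / den


# Hypothetical noise-free data: Y_it = alpha_i + 0.5 X_it with very different alpha_i.
alpha = {1: 1.0, 2: -3.0, 3: 6.0}
x = {1: [1.0, 2.0, 4.0], 2: [0.0, 3.0, 3.5], 3: [2.0, 2.5, 5.0]}
data = {i: (x[i], [alpha[i] + 0.5 * xi for xi in x[i]]) for i in alpha}

beta_within = within_estimator(data)
print(beta_within)  # the alpha_i were removed by demeaning
```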

Hypothesis Testing in the Fixed Effects Model
- Since the fixed effects model is simply a regression model, standard regression methods for hypothesis testing can be used.
- E.g. if the errors obey the classical assumptions, t-statistics can be used to test the hypothesis that an individual coefficient equals zero, and F-statistics can be used to test joint hypotheses involving several coefficients.
- A popular test is whether the pooled model is acceptable (are all intercepts the same?). This tests the null hypothesis
$$H_0: \alpha_1 = \cdots = \alpha_N$$
This is an F-test.

- The general formula for the F-test (e.g. page 14 of my textbook) can be adapted as follows:
- With panel data, the sample size is $TN$.
- The fixed effects model has $N + k$ explanatory variables (in the multiple regression case with $k$ explanatory variables).
- The pooled model has $k + 1$ explanatory variables.
- $R^2_{FE}$ is the $R^2$ for the fixed effects model and $R^2_P$ is the $R^2$ for the pooled model.
- The F-statistic is:
$$F = \frac{\left(R^2_{FE} - R^2_P\right)/(N-1)}{\left(1 - R^2_{FE}\right)/(TN - N - k)}$$
- The critical value is taken from the $F_{N-1,\,TN-N-k}$ distribution.
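
The formula above translates directly into code. This sketch takes the two $R^2$ values as inputs. The function name `poolability_f` and the numbers in the example are hypothetical and do not come from any data set. The standard library has no F distribution, so the statistic is compared with a critical value from tables or from econometrics software.

```python
def poolability_f(r2_fe, r2_pooled, n, t, k):
    """F-statistic for H0: alpha_1 = ... = alpha_N (pooled model adequate).

    n: number of individuals N, t: number of periods T,
    k: number of explanatory variables (excluding intercept/dummies).
    Returns (F, df1, df2); compare F with the F(df1, df2) critical value.
    """
    df1 = n - 1          # number of restrictions
    df2 = t * n - n - k  # residual degrees of freedom of the fixed effects model
    f = ((r2_fe - r2_pooled) / df1) / ((1 - r2_fe) / df2)
    return f, df1, df2


# Hypothetical R-squared values, for illustration only.
f, df1, df2 = poolability_f(r2_fe=0.8, r2_pooled=0.5, n=10, t=5, k=1)
print(f"F = {f:.2f} with ({df1}, {df2}) degrees of freedom")
```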

The Random Effects Model
- Does not use dummy variables, but assumes that the individual effect is a random variable.
- The random effects model is an individual effects model:
$$Y_{it} = \alpha_i + \beta X_{it} + u_{it}$$
where $\alpha_i = \alpha + v_i$ and $v_i$ is a random variable.
- Since $\alpha_i$ is a random variable, we call this the random effects model.
- Note that we have labelled the error in the regression $u_{it}$.

- An alternative way of writing the random effects model is:
$$Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}$$
where $\varepsilon_{it} = v_i + u_{it}$.
- The random effects model can be written as a regression model (with a constant intercept), but the error in the regression has a new form.
- Regression with errors which do not satisfy the classical assumptions is something you will know about from previous study (or see my textbook, Chapter 5).
- The random effects estimator is a Generalized Least Squares (GLS) estimator.
- Note that the random effects model does not need to estimate as many coefficients as the fixed effects model.

- Error properties are important (as with any econometric model).
- Traditionally we assume $u_{it}$ and $v_i$ satisfy the classical assumptions and are uncorrelated with one another:
  - $u_{it}$ are independent $N(0, \sigma^2_u)$ random variables
  - $v_i$ are independent $N(0, \sigma^2_v)$ random variables

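- We can use the assumptions about $u_{it}$ and $v_i$ to figure out the properties of the regression errors $\varepsilon_{it}$. One can show:
  - $E(\varepsilon_{it}) = 0$
  - $\text{var}(\varepsilon_{it}) = \sigma^2_u + \sigma^2_v$
  - $\text{cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0$ for $i \neq j$
  - $\text{cov}(\varepsilon_{it}, \varepsilon_{js}) = 0$ for $i \neq j$ and $s \neq t$
  - $\text{cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma^2_v$ for $s \neq t$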
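
These results can be made concrete by writing out the $T \times T$ covariance matrix of one individual's errors. The full $NT \times NT$ matrix is block-diagonal with $N$ copies of this block, because errors of different individuals are uncorrelated. The sketch also runs a quick Monte Carlo check that $\text{cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma^2_v$. The variance values ($\sigma^2_u = 2$, $\sigma^2_v = 1$) and the function name are hypothetical.

```python
import random


def re_error_covariance(sigma2_u, sigma2_v, t):
    """T x T covariance matrix of (eps_i1, ..., eps_iT) for one individual
    when eps_it = v_i + u_it under the random effects assumptions."""
    return [[sigma2_u + sigma2_v if r == c else sigma2_v for c in range(t)]
            for r in range(t)]


SIGMA2_U, SIGMA2_V = 2.0, 1.0  # hypothetical variance components
omega = re_error_covariance(SIGMA2_U, SIGMA2_V, t=3)
for row in omega:
    print(row)

# Correlation between two errors of the same individual: sigma_v^2 / (sigma_u^2 + sigma_v^2)
rho = omega[0][1] / omega[0][0]

# Monte Carlo check of cov(eps_it, eps_is) = sigma_v^2 for s != t.
random.seed(0)
n = 200_000
e1, e2 = [], []
for _ in range(n):
    v = random.gauss(0.0, SIGMA2_V ** 0.5)  # v_i is shared by both periods
    e1.append(v + random.gauss(0.0, SIGMA2_U ** 0.5))
    e2.append(v + random.gauss(0.0, SIGMA2_U ** 0.5))
m1, m2 = sum(e1) / n, sum(e2) / n
sample_cov = sum((a - m1) * (b - m2) for a, b in zip(e1, e2)) / (n - 1)
print(f"rho = {rho:.3f}, simulated cov = {sample_cov:.3f}")
```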

- What do these results imply in terms of the regression errors and the classical assumptions?
- The regression errors have mean zero (they satisfy classical assumption 1).
- Every regression error has the same variance (homoskedasticity: satisfies classical assumption 2).
- However, $\text{cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma^2_v$ for $s \neq t$ means some of the errors are correlated with one another (this violates assumption 3).
- Therefore OLS estimation of the random effects model should be avoided: OLS is no longer BLUE and its usual standard errors are not valid.

- GLS estimation (or maximum likelihood) are popular estimators.
- We will not give the formula for the GLS estimator here; computer packages such as Gretl will calculate it for you.
- As with virtually any econometric model, Gretl also has options for robust covariance estimation, i.e. to produce estimates of $\text{var}(\hat{\beta})$ which are valid even if heteroskedasticity or autocorrelated errors are present.
- Remember: $\text{var}(\hat{\beta})$ appears in test statistics and confidence intervals, so it is important to get a good estimate.
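
For readers curious about what GLS does here, one known result is useful. In a balanced panel with the error structure above, GLS is equivalent to OLS on quasi-demeaned data $Y_{it} - \theta \bar{Y}_i$ and $X_{it} - \theta \bar{X}_i$, where $\theta = 1 - \sqrt{\sigma^2_u / (\sigma^2_u + T\sigma^2_v)}$. This sketch treats $\sigma^2_u$ and $\sigma^2_v$ as known, which is an assumption: feasible GLS, as computed by packages, first estimates them. Setting $\theta = 0$ gives pooled OLS, and as $\theta \to 1$ the estimator approaches the within estimator. The function name `re_gls` and the data are hypothetical.

```python
import math


def re_gls(data, sigma2_u, sigma2_v):
    """Random effects GLS for Y_it = alpha + beta X_it + v_i + u_it (balanced panel),
    treating the variance components as known.

    data: {i: (x_list, y_list)} with every list of length T.
    Returns (alpha_hat, beta_hat, theta).
    """
    t = len(next(iter(data.values()))[0])
    theta = 1.0 - math.sqrt(sigma2_u / (sigma2_u + t * sigma2_v))
    xs, ys = [], []
    for x, y in data.values():
        x_bar, y_bar = sum(x) / t, sum(y) / t
        xs += [a - theta * x_bar for a in x]  # quasi-demeaned regressor
        ys += [b - theta * y_bar for b in y]  # quasi-demeaned dependent variable
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    beta_hat = sum((a - xb) * (b - yb) for a, b in zip(xs, ys)) / sum((a - xb) ** 2 for a in xs)
    # The intercept of the transformed regression equals (1 - theta) * alpha.
    alpha_hat = (yb - beta_hat * xb) / (1.0 - theta)
    return alpha_hat, beta_hat, theta


# Hypothetical noise-free data on the line Y = 2 + 0.5 X.
data = {1: ([1.0, 2.0, 4.0], [2.5, 3.0, 4.0]), 2: ([0.0, 3.0, 5.0], [2.0, 3.5, 4.5])}
alpha_hat, beta_hat, theta = re_gls(data, sigma2_u=1.0, sigma2_v=1.0)
print(alpha_hat, beta_hat, theta)
```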

Hypothesis Testing in the Random Effects Model
- Testing of hypotheses involving regression coefficients can be done as for any multiple regression model (e.g. t- and F-tests).
- Another hypothesis of interest: are random effects necessary, or is the pooled model adequate? This is equivalent to
$$H_0: \sigma^2_v = 0$$
- If $H_0$ is true, the individual effects exhibit no dispersion (i.e. they are all the same).
- There are several tests of $H_0$, and many econometrics software packages will do at least one of them for you. Gretl calls it a Breusch-Pagan test and produces it automatically when random effects estimation is done.
- If this test rejects the pooled model, then we must choose between fixed and random effects.
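
As a rough sketch, here is a common textbook form of the Breusch-Pagan LM statistic for $H_0: \sigma^2_v = 0$ in a balanced panel, computed from pooled OLS residuals $e_{it}$:
$LM = \frac{NT}{2(T-1)}\left[\frac{\sum_i (\sum_t e_{it})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2$, which is approximately $\chi^2(1)$ under $H_0$. Whether Gretl reports exactly this variant is an assumption. The function name and the residuals are hypothetical. The $\chi^2(1)$ upper tail probability is computed as $\text{erfc}(\sqrt{LM/2})$.

```python
import math


def breusch_pagan_lm(residuals):
    """Breusch-Pagan LM statistic for H0: sigma_v^2 = 0 (balanced panel).

    residuals: {i: [e_i1, ..., e_iT]} from the pooled OLS regression.
    Returns (LM, p_value); under H0, LM is approximately chi-squared(1).
    """
    n = len(residuals)
    t = len(next(iter(residuals.values())))
    sum_sq_of_sums = sum(sum(e) ** 2 for e in residuals.values())
    sum_of_sq = sum(x * x for e in residuals.values() for x in e)
    lm = n * t / (2.0 * (t - 1)) * (sum_sq_of_sums / sum_of_sq - 1.0) ** 2
    p_value = math.erfc(math.sqrt(lm / 2.0))  # upper tail of chi-squared(1)
    return lm, p_value


# Hypothetical pooled residuals that are persistent within each individual.
residuals = {
    1: [0.5, 0.7, 0.6],
    2: [-0.4, -0.6, -0.5],
    3: [-0.1, -0.1, -0.1],
}
lm, p = breusch_pagan_lm(residuals)
print(f"LM = {lm:.2f}, p-value = {p:.4f}")
```

Residuals that keep the same sign within an individual make $\sum_t e_{it}$ large relative to $\sum_t e_{it}^2$, which is exactly what pushes the statistic up.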

Choosing Between Random and Fixed Effects: The Hausman Test
- The random effects specification has the advantage that fewer parameters have to be estimated.
- Random effects have one potential drawback: it is possible that the regression error, $\varepsilon_{it}$, is correlated with the explanatory variable.
- Remember from previous study (or see Chapter 5 of my textbook): if the regression error is correlated with the explanatory variable, then OLS (and GLS) estimators are biased and instrumental variables (IV) estimation is needed.
- In our context, this means random effects will be biased and an IV estimator is needed.

- Why might the explanatory variable and the error be correlated in the random effects model?
- To illustrate, return to the returns to schooling example with panel data: $Y$ = income, $X$ = years of schooling.
- In such a regression, $X$ might be correlated with the error. Why? (Discussed in the 3rd year course, or see pages 12-13 of my textbook.)
- Basic idea: there may be some unobserved quality with an effect on both income and schooling. This unobserved quality could reflect innate talent, intelligence, ambition or other personal advantages; let me just call it talent.
- This could cause the regression error and $X$ to be correlated. With cross-section data, this means IV is necessary.

- Advantage of panel data: the individual effect, $\alpha_i$, can pick up the effect of talent on the income of individual $i$. But this means the individual effect will be associated with talent.
- With the random effects model, the individual effect enters the regression error. Since the individual effect reflects talent, and talent may also affect schooling (as it may in many applications), this means the regression error is correlated with $X$.
- The problem does not arise with fixed effects, since there the individual effect does not appear in the regression error.
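
A small hypothetical simulation shows the mechanism. In it, talent is the individual effect and also raises schooling. Pooled OLS (and likewise random effects GLS, which also leaves $v_i$ in the error) attributes part of the talent effect to schooling. The within (fixed effects) estimator removes talent and recovers the true slope. The sample sizes, the true slope of 0.5 and the unit variances are all made up.

```python
import random

random.seed(42)
N, T, BETA = 2000, 4, 0.5  # hypothetical panel size and true return to schooling

x_all, y_all, x_dm, y_dm = [], [], [], []
for _ in range(N):
    talent = random.gauss(0.0, 1.0)                          # individual effect v_i
    x = [talent + random.gauss(0.0, 1.0) for _ in range(T)]  # schooling rises with talent
    y = [1.0 + BETA * xt + talent + random.gauss(0.0, 1.0) for xt in x]
    x_all += x
    y_all += y
    x_bar, y_bar = sum(x) / T, sum(y) / T
    x_dm += [xt - x_bar for xt in x]  # within transformation removes talent
    y_dm += [yt - y_bar for yt in y]


def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)


beta_pooled = ols_slope(x_all, y_all)  # talent left in the error: biased upwards
beta_within = ols_slope(x_dm, y_dm)    # fixed effects (within) estimator
print(f"pooled: {beta_pooled:.3f}, within: {beta_within:.3f}, true: {BETA}")
```

In this design the pooled slope settles near 1.0 rather than 0.5, because $\text{cov}(X_{it}, v_i) = 1$ and $\text{var}(X_{it}) = 2$.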

Hausman Test
- We will not explain it in detail, just give the basic ideas. The Hausman test is done by Gretl automatically when you do random effects.
- $H_0$ is that the individual effect is uncorrelated with the explanatory variables (and, thus, random effects can be used). If $H_0$ is rejected, you should use fixed effects.
- Let $\hat{\beta}_{RE}$ and $\hat{\beta}_{FE}$ be the random and fixed effects estimators. The Hausman test statistic has a term in it which relates to $\left(\hat{\beta}_{RE} - \hat{\beta}_{FE}\right)^2$.
- Idea: if $\hat{\beta}_{RE}$ and $\hat{\beta}_{FE}$ are both unbiased, then they should give similar results and $\left(\hat{\beta}_{RE} - \hat{\beta}_{FE}\right)^2$ is small. But if random effects is biased and fixed effects is not, then $\left(\hat{\beta}_{RE} - \hat{\beta}_{FE}\right)^2$ is large.
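
For a single slope coefficient, a common textbook form of the Hausman statistic is $H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^2 / [\widehat{\text{var}}(\hat{\beta}_{FE}) - \widehat{\text{var}}(\hat{\beta}_{RE})]$, compared with a $\chi^2(1)$ distribution. Gretl works with the matrix version for several coefficients. This scalar sketch only illustrates the idea, and the estimates fed into it are hypothetical.

```python
import math


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one slope coefficient.

    H = (b_FE - b_RE)^2 / (var(b_FE) - var(b_RE)), approximately chi-squared(1)
    under H0 (individual effect uncorrelated with the regressor).
    Returns (H, p_value).
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var(b_FE) - var(b_RE) must be positive for this form of the test")
    h = (b_fe - b_re) ** 2 / diff_var
    return h, math.erfc(math.sqrt(h / 2.0))  # upper tail of chi-squared(1)


# Hypothetical estimates, not from real data.
h, p = hausman_scalar(b_fe=0.10, b_re=0.06, var_fe=0.0004, var_re=0.0003)
print(f"H = {h:.2f}, p-value = {p:.5f}")
```

The denominator uses the fact that, under $H_0$, random effects is the more efficient estimator, so its variance is the smaller one.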

Instrumental Variable Estimation in the Random Effects Model
- Given the advantages of random effects (many fewer coefficients to estimate), many researchers prefer to use it even if the explanatory variable is correlated with the error.
- IV methods with panel data can be complicated. Here we explain the ideas behind two of the most popular: the Hausman-Taylor estimator and the Arellano-Bond estimator.
- Gretl calculates the Arellano-Bond estimator.

- There can be four types of explanatory variables in a random effects model:
  i) those which are time-varying and not correlated with the individual effect (call these $X^{(1)}_{it}$)
  ii) those which are time-varying and correlated with the individual effect ($X^{(2)}_{it}$)
  iii) those which are constant over time and not correlated with the individual effect ($Z^{(1)}_i$)
  iv) those which are constant over time and correlated with the individual effect ($Z^{(2)}_i$)
- Let $K_1$, $K_2$, $L_1$ and $L_2$ be the numbers of each type of explanatory variable.
- We need at least one instrumental variable for each explanatory variable that is correlated with the error term, so in general we need $K_2 + L_2$ instrumental variables.

- Let us assume one explanatory variable of each type (i.e. $K_1 = K_2 = L_1 = L_2 = 1$). Thus, the random effects model is:
$$Y_{it} = \alpha + \beta_1 X^{(1)}_{it} + \beta_2 X^{(2)}_{it} + \beta_3 Z^{(1)}_i + \beta_4 Z^{(2)}_i + \varepsilon_{it}$$
where $\varepsilon_{it} = v_i + u_{it}$.
- As before, assume $u_{it}$ and $v_i$ satisfy the classical assumptions and are uncorrelated with each other.
- Basic idea: $v_i$ was part of the individual effect and it is this which is causing the problem, so transform the model to get rid of it.

- Remember our previous notation: bars over variables are averages over time, e.g.
$$\bar{X}^{(1)}_i = \frac{1}{T}\sum_{t=1}^{T} X^{(1)}_{it}$$
and we took deviations from individual means: $X^{(1)}_{it} - \bar{X}^{(1)}_i$.
- If we average the random effects model over time and subtract this off, we can write it in deviations-from-means form:
$$Y_{it} - \bar{Y}_i = \beta_1\left(X^{(1)}_{it} - \bar{X}^{(1)}_i\right) + \beta_2\left(X^{(2)}_{it} - \bar{X}^{(2)}_i\right) + \left(\varepsilon_{it} - \bar{\varepsilon}_i\right)$$
- Since $\varepsilon_{it} - \bar{\varepsilon}_i = u_{it} - \bar{u}_i$ (the $v_i$ term cancels), one can show the error in this regression is NOT correlated with the explanatory variables, and OLS is fine.
- This is not exactly what Hausman and Taylor recommend, but it is meant to illustrate an important point: deviations from individual means can be used as instruments. So $X^{(2)}_{it} - \bar{X}^{(2)}_i$ can be used as an instrument for $X^{(2)}_{it}$, the variable which is correlated with the error.
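
A simplified, hypothetical illustration of the last point uses one regressor $X^{(2)}_{it}$ (correlated with $v_i$) and no $Z$ variables, so it is not the full Hausman-Taylor estimator. Running simple IV on the levels equation with the instrument $X^{(2)}_{it} - \bar{X}^{(2)}_i$ gives exactly the within estimate. The reason is that the instrument sums to zero within each individual, so it only "sees" the demeaned data. Both estimates are close to the true slope even though OLS in levels would be biased.

```python
import random

random.seed(1)
N, T, BETA2 = 2000, 4, 0.5  # hypothetical; one regressor X^(2), no Z variables

x_lvl, y_lvl, z_inst, y_dm = [], [], [], []
for _ in range(N):
    v = random.gauss(0.0, 1.0)                          # v_i
    x = [v + random.gauss(0.0, 1.0) for _ in range(T)]  # X^(2)_it, correlated with v_i
    y = [1.0 + BETA2 * xt + v + random.gauss(0.0, 1.0) for xt in x]
    x_bar, y_bar = sum(x) / T, sum(y) / T
    x_lvl += x
    y_lvl += y
    z_inst += [xt - x_bar for xt in x]  # instrument: deviation from individual mean
    y_dm += [yt - y_bar for yt in y]


def iv_slope(z, x, y):
    """Simple IV slope with an intercept: sample cov(z, y) / cov(z, x)."""
    zb, xb, yb = sum(z) / len(z), sum(x) / len(x), sum(y) / len(y)
    num = sum((a - zb) * (c - yb) for a, c in zip(z, y))
    den = sum((a - zb) * (b - xb) for a, b in zip(z, x))
    return num / den


beta_iv = iv_slope(z_inst, x_lvl, y_lvl)  # levels regression, deviation instrument
beta_within = sum(a * b for a, b in zip(z_inst, y_dm)) / sum(a * a for a in z_inst)
print(f"IV: {beta_iv:.6f}, within: {beta_within:.6f}")
```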

- Taking deviations from individual means removes the time-invariant explanatory variables $Z^{(1)}_i$ and $Z^{(2)}_i$, so we cannot use deviations from means as an instrument for $Z^{(2)}_i$.
- How can we obtain an instrument for $Z^{(2)}_i$? I will not explain why, but it turns out that $\bar{X}^{(1)}_i$ is a valid instrument for $Z^{(2)}_i$. Hausman-Taylor uses this instrument.
- Arellano-Bond: in addition to the Hausman-Taylor ideas, uses values of variables at different points in time.
- Hopefully this is enough information for you to sensibly use the Arellano-Bond estimator in Gretl.

Summary of Instrumental Variable Estimation in the Random Effects Model
- In some cases we will want to use IV with the random effects model. But where do the instruments come from?
- Hausman-Taylor and Arellano-Bond can be thought of as ways of constructing instruments (using deviations from means, means, and values of variables in different time periods).
- Note: with the random effects model we can estimate coefficients on variables like $Z^{(1)}_i$ and $Z^{(2)}_i$ which are constant over time (unlike fixed effects).

Extensions to Individual Effects Models
- So far we have worked with individual effects models:
$$Y_{it} = \alpha_i + \beta X_{it} + \varepsilon_{it}$$
- In some cases we might want time effects, too:
$$Y_{it} = \alpha_i + \gamma_t + \beta X_{it} + u_{it}$$
This is a simple extension in either fixed or random effects models.
- Dynamic panel data model:
$$Y_{it} = \alpha + \rho Y_{i,t-1} + \beta_1 X^{(1)}_{it} + \beta_2 X^{(2)}_{it} + \beta_3 Z^{(1)}_i + \beta_4 Z^{(2)}_i + \varepsilon_{it}$$
- Note: $Y_{i,t-1}$ is very likely to be correlated with the error, since both of them include the individual effect. Thus, Arellano-Bond is commonly used to estimate this model.
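
The idea behind lag-based instruments can be sketched with the Anderson-Hsiao estimator, a simpler precursor of Arellano-Bond (Arellano-Bond instead uses all available lags in a GMM framework). In a pure dynamic model $Y_{it} = \alpha_i + \rho Y_{i,t-1} + u_{it}$ with no other regressors (a simplifying assumption), first-differencing removes $\alpha_i$. The lagged level $Y_{i,t-2}$ is then used as an instrument for $\Delta Y_{i,t-1}$, because it is uncorrelated with $\Delta u_{it}$. All parameter values below are hypothetical.

```python
import random

random.seed(7)
N, T, RHO = 5000, 6, 0.5  # hypothetical panel size and true autoregressive coefficient
BURN_IN = 50              # discard early periods so each series is near its stationary path

panel = []
for _ in range(N):
    alpha_i = random.gauss(0.0, 0.5)  # individual effect
    y = alpha_i / (1 - RHO)           # start at the individual's long-run mean
    path = []
    for s in range(BURN_IN + T):
        y = alpha_i + RHO * y + random.gauss(0.0, 1.0)
        if s >= BURN_IN:
            path.append(y)
    panel.append(path)


def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)


# Pooled OLS of Y_it on Y_i,t-1: alpha_i sits in both the error and Y_i,t-1, so this is biased.
y_lag = [p[t - 1] for p in panel for t in range(1, T)]
y_now = [p[t] for p in panel for t in range(1, T)]
rho_pooled = ols_slope(y_lag, y_now)

# Anderson-Hsiao: difference out alpha_i, then instrument
# Delta Y_i,t-1 with the level Y_i,t-2 (uncorrelated with Delta u_it).
num = den = 0.0
for p in panel:
    for t in range(2, T):
        z = p[t - 2]
        num += z * (p[t] - p[t - 1])
        den += z * (p[t - 1] - p[t - 2])
rho_ah = num / den
print(f"pooled OLS: {rho_pooled:.3f}, Anderson-Hsiao IV: {rho_ah:.3f}, true: {RHO}")
```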

The Random Coefficients Model
- Individual effects allow intercepts to differ over individuals. What if slope coefficients do, too? Replace $\beta$ by $\beta_i$.
- Random coefficients model:
$$Y_{it} = \alpha_i + \beta_i X_{it} + \varepsilon_{it}$$
- Implication: the marginal effect of $X$ on $Y$ is different for different people. This may be sensible in some applications.
- Estimation of such models is a bit more complicated and is not done in many econometrics packages (like Gretl). I will not cover it here, but if you do future study in econometrics you may come across such models.
- There are many ways of modelling how $\beta_i$ differs across $i$. Names: random coefficients model, finite mixture model, multi-level models, etc.
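
One simple approach, shown here only as a hypothetical sketch, works when $T$ is large enough to run a regression for each individual. You fit OLS separately for each $i$ and average the slopes to estimate the mean of the $\beta_i$ (often called a mean group estimator). This is just one of the many approaches mentioned above, not the method of any particular package. The distribution of the $\beta_i$ and the noise-free data are made up so that the check is exact.

```python
import random

random.seed(3)
N, T = 200, 5
BETA_BAR, TAU = 0.08, 0.02  # hypothetical mean and spread of the individual slopes


def ols(x, y):
    """Intercept and slope from OLS of y on x."""
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return yb - b * xb, b


true_slopes, fitted_slopes = [], []
for _ in range(N):
    alpha_i = random.gauss(1.0, 0.3)
    beta_i = random.gauss(BETA_BAR, TAU)        # each individual's own marginal effect
    x = [random.uniform(8.0, 20.0) for _ in range(T)]
    y = [alpha_i + beta_i * xt for xt in x]     # noise-free, to keep the check exact
    true_slopes.append(beta_i)
    fitted_slopes.append(ols(x, y)[1])          # separate regression per individual

mean_group = sum(fitted_slopes) / N             # average of the individual slopes
print(f"mean-group estimate of the average slope: {mean_group:.4f}")
```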

Summary
- The pooled model assumes all observations come from the same regression model (and regression methods for cross-sectional data are used).
- The pooled model is often not appropriate, since different individuals often have different regression lines.
- Individual effects models allow different individuals to have regression lines with different intercepts.
- The fixed effects model uses a dummy explanatory variable for each individual.
- The random effects model assumes that the individual effect is a random variable. Estimation of the random effects model is often done using GLS.

- A Hausman test can be used to decide whether a fixed or random effects specification is appropriate.
- The random effects model can sometimes lead to regression errors being correlated with explanatory variables. If this occurs, then random effects is biased unless instrumental variables methods are used.
- Popular instrumental variables estimators used with the random effects model are the Hausman-Taylor and Arellano-Bond estimators.