Chapter 15 Panel Data Models. Pooling Time-Series and Cross-Section Data

Sets of Regression Equations

The topic can be introduced with an example. A data set has 20 years of time series data (from 1935 to 1954) for the two firms General Electric and Westinghouse. A regression model that describes investment behaviour is:

Firm 1: $y_{1,t} = \beta_{11} + \beta_{12} x_{1,t2} + \beta_{13} x_{1,t3} + e_{1t}$ for $t = 1, \ldots, T$

Firm 2: $y_{2,t} = \beta_{21} + \beta_{22} x_{2,t2} + \beta_{23} x_{2,t3} + e_{2t}$ for $t = 1, \ldots, T$

where
$y_{i,t}$ is investment in plant and equipment by firm $i$ ($i = 1, 2$) in year $t$,
$x_{i,t2}$ is the value of firm $i$ ($i = 1, 2$) in year $t$, and
$x_{i,t3}$ is capital stock of firm $i$ ($i = 1, 2$) in year $t$.

The original data set for this study had T = 20 time series observations for 10 firms. The data set is called a panel data set. In this example, the firm is the cross-section unit. The data set combines time-series and cross-section data.

Econ 36 - Chapter 15

Panel data offer a variety of interesting applications:

For a study in international trade the cross-section unit may be a country. The data set contains time series data for a sample of countries.

For a study with Canadian data the cross-section unit may be a province. The data set contains time series data for each province in Canada.

For an American study the cross-section unit may be a state. The data set contains time series data for each state in the United States.

To start, it is useful to review the estimation and testing methods that are available from the work presented so far. Application to the investment data set for General Electric and Westinghouse is demonstrated.

Separate equations for each firm can be estimated by least squares (OLS, Ordinary Least Squares). A first question to ask is: Do the equations have common coefficients?

The hypothesis to test is:

$H_0$: the two firms General Electric and Westinghouse have the same intercept and slope coefficients. That is, the investment functions for the two firms are identical.

$H_1$: the two firms have different investment functions.

The test method is the Chow test introduced in Chapter 7. For the investment data set the Chow test statistic calculated from least squares estimation results had a p-value of 0.38. This gives support to the null hypothesis that the two firms have the same investment function.

With the same coefficients for each cross-section (or firm) the regression model can be stated:

$$y_{i,t} = \beta_1 + \beta_2 x_{i,t2} + \beta_3 x_{i,t3} + e_{it} \quad \text{for } i = 1, 2 \text{ and } t = 1, \ldots, 20$$

This is called a pooled regression. Estimation by least squares is called pooled OLS.

Least squares estimation of the pooled regression assumes:

$$\mathrm{var}(e_{1t}) = \mathrm{var}(e_{2t}) = \sigma^2 \quad \text{for } t = 1, 2, \ldots, T$$

This is cross-section homoskedasticity.

Cross-section heteroskedasticity recognizes a different error variance for each cross-section. This can be stated as:

$$\mathrm{var}(e_{1t}) = \sigma_1^2, \qquad \mathrm{var}(e_{2t}) = \sigma_2^2 \quad \text{for } t = 1, 2, \ldots, T$$

A test for cross-section heteroskedasticity is the Goldfeld-Quandt test introduced in Chapter 8. The hypothesis of interest is:

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{against} \quad H_1: \sigma_1^2 \neq \sigma_2^2$$

For the investment function example, the Goldfeld-Quandt test statistic is the ratio of the two estimated error variances, with the larger estimate in the numerator:

$$GQ = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} = 7.453$$

The calculated p-value for a two-tail test is found to be less than any standard significance level (such as 0.05 or 0.01). This gives strong evidence for a different error variance for each firm.
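The variance ratio above can be sketched in a few lines of Python. This is a minimal illustration, assuming each error variance is estimated from a separate least squares regression as SSE/(T - K). The function name `gq_statistic` and the residual values are made up for the example and are not the investment data.

```python
def gq_statistic(resid_a, resid_b, k):
    """Goldfeld-Quandt ratio of two estimated error variances.

    resid_a, resid_b: least squares residuals from two separate regressions.
    k: number of coefficients in each regression (assumed equal here).
    """
    var_a = sum(e * e for e in resid_a) / (len(resid_a) - k)
    var_b = sum(e * e for e in resid_b) / (len(resid_b) - k)
    # Put the larger variance estimate in the numerator.
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical residuals for two cross-sections (illustration only).
resid_1 = [3.0, -3.0, 6.0, -6.0, 0.0]
resid_2 = [1.0, -1.0, 2.0, -2.0, 0.0]
gq = gq_statistic(resid_1, resid_2, k=3)
print(gq)
```

The statistic is compared with an F-distribution whose degrees of freedom are the two residual degrees of freedom, T - K for each equation.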

With heteroskedasticity the least squares principle gives an unbiased estimation rule. But the least squares variances and covariances of the parameter estimators are incorrect. Therefore, the standard errors reported as a routine part of the least squares estimation output will be incorrect. This will affect hypothesis testing conclusions.

The White standard errors, described in Chapter 8, give standard errors adjusted for heteroskedasticity of a general form. The White standard errors do not make use of information about the specific form of the heteroskedasticity.

As a special case, standard errors adjusted for cross-section heteroskedasticity can be obtained. The result will be shown for the simple 2-variable linear regression model:

$$y_{it} = \beta_1 + \beta_2 x_{it} + e_{it} \quad \text{for } i = 1, 2 \text{ and } t = 1, \ldots, T$$

General heteroskedasticity states:

$$\mathrm{var}(e_{it}) = \sigma_{it}^2$$

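The formula for the variance of the slope estimator is:

$$\mathrm{var}(b_2) = \frac{\sum_{t=1}^{T}\sum_{i=1}^{2} (x_{it} - \bar{x})^2 \, \sigma_{it}^2}{\left[\sum_{t=1}^{T}\sum_{i=1}^{2} (x_{it} - \bar{x})^2\right]^2}$$

where $\bar{x}$ is the pooled sample mean calculated as:

$$\bar{x} = \frac{1}{2T} \sum_{t=1}^{T}\sum_{i=1}^{2} x_{it}$$

Using the estimated residuals $\hat{e}_{it}$ from the pooled least squares estimation, the White estimate of the variance of the slope estimator is:

$$\widehat{\mathrm{var}}(b_2) = \frac{\sum_{t=1}^{T}\sum_{i=1}^{2} (x_{it} - \bar{x})^2 \, \hat{e}_{it}^2}{\left[\sum_{t=1}^{T}\sum_{i=1}^{2} (x_{it} - \bar{x})^2\right]^2}$$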

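Now use the information about the cross-section heteroskedasticity. That is,

$$\sigma_{it}^2 = \sigma_i^2 \quad \text{for } i = 1, 2 \text{ and } t = 1, \ldots, T$$

The variance of the slope estimator simplifies to:

$$\mathrm{var}(b_2) = \frac{\sigma_1^2 \sum_{t=1}^{T} (x_{1t} - \bar{x})^2 + \sigma_2^2 \sum_{t=1}^{T} (x_{2t} - \bar{x})^2}{\left[\sum_{t=1}^{T}\sum_{i=1}^{2} (x_{it} - \bar{x})^2\right]^2}$$

For calculation, replace the unknown error variances $\sigma_1^2$ and $\sigma_2^2$ with the least squares estimates of the cross-section error variances $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$. This gives the variance estimate:

$$\widehat{\mathrm{var}}(b_2) = \frac{\hat{\sigma}_1^2 \sum_{t=1}^{T} (x_{1t} - \bar{x})^2 + \hat{\sigma}_2^2 \sum_{t=1}^{T} (x_{2t} - \bar{x})^2}{\left[\sum_{t=1}^{T}\sum_{i=1}^{2} (x_{it} - \bar{x})^2\right]^2}$$

The results generalize to the variances and covariances of the parameter estimators obtained from the estimation of a model with several explanatory variables.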
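The two variance estimates for the slope can be compared with a short Python sketch. It is an illustration under stated assumptions: a 2-variable pooled regression, cross-section error variances estimated from the pooled residuals with a divisor of T, and made-up data. The function name `pooled_slope_variances` is hypothetical.

```python
def pooled_slope_variances(x_by_unit, y_by_unit):
    """Pooled OLS slope with White and cross-section adjusted variance estimates.

    x_by_unit, y_by_unit: one list of observations per cross-section unit.
    Returns (slope, white_variance, panel_variance).
    """
    xs = [x for unit in x_by_unit for x in unit]
    ys = [y for unit in y_by_unit for y in unit]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b1 = y_bar - b2 * x_bar

    white_num = 0.0
    panel_num = 0.0
    for x_unit, y_unit in zip(x_by_unit, y_by_unit):
        resid = [y - b1 - b2 * x for x, y in zip(x_unit, y_unit)]
        dev2 = [(x - x_bar) ** 2 for x in x_unit]
        # White: weight each squared deviation by its own squared residual.
        white_num += sum(d * e * e for d, e in zip(dev2, resid))
        # Cross-section form: one variance estimate per unit (divisor T, an assumption).
        sigma2_i = sum(e * e for e in resid) / len(resid)
        panel_num += sigma2_i * sum(dev2)
    return b2, white_num / sxx ** 2, panel_num / sxx ** 2


# Made-up data: y = 1 + 2x + e, with larger errors for the second unit.
x_data = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
e_data = [[1.0, -2.0, 1.0], [2.0, -4.0, 2.0]]
y_data = [[1 + 2 * x + e for x, e in zip(xu, eu)] for xu, eu in zip(x_data, e_data)]
slope, white_var, panel_var = pooled_slope_variances(x_data, y_data)
print(slope, white_var ** 0.5, panel_var ** 0.5)
```

The square roots printed at the end are the White standard error and the standard error adjusted for cross-section heteroskedasticity for the slope.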

The standard errors obtained as the square roots of the estimated variances adjusted for cross-section heteroskedasticity are called panel corrected standard errors.

Least squares estimation results for an investment function for the combined data of the two firms General Electric and Westinghouse are below. This gives the pooled least squares estimates. A comparison of the t-statistics for testing the significance of the individual coefficients based on the different calculations of standard errors is shown.

$$\hat{y}_{i,t} = 17.87 + 0.0152\, x_{i,t2} + 0.144\, x_{i,t3}$$

The values below are listed in the order intercept, $x_{i,t2}$, $x_{i,t3}$.

OLS: t-statistics (2.54) (2.45) (7.7); p-values (0.015) (0.019) (<0.0005)
White: t-statistics (4.0) (2.35) (7.34); p-values (<0.0005) (0.024) (<0.0005)
Panel: t-statistics (3.96) (2.28) (6.06); p-values (<0.0005) (0.023) (<0.0005)

p-values are calculated using the standard normal distribution and should be viewed as approximate with small samples.

Seemingly Unrelated Regressions

The previous section discussed a regression model for the investment behaviour of two firms:

Firm 1: $y_{1,t} = \beta_{11} + \beta_{12} x_{1,t2} + \beta_{13} x_{1,t3} + e_{1t}$ for $t = 1, \ldots, T$

Firm 2: $y_{2,t} = \beta_{21} + \beta_{22} x_{2,t2} + \beta_{23} x_{2,t3} + e_{2t}$ for $t = 1, \ldots, T$

The cross-section equations can be estimated separately by least squares. The error assumptions are:

$E(e_{1t}) = 0$, $E(e_{2t}) = 0$ for all $t$

$\mathrm{var}(e_{1t}) = \sigma_1^2$, $\mathrm{var}(e_{2t}) = \sigma_2^2$ for all $t$ (cross-section heteroskedasticity)

$\mathrm{cov}(e_{1t}, e_{1s}) = 0$, $\mathrm{cov}(e_{2t}, e_{2s}) = 0$ for all $t \neq s$ (no autocorrelation)

In addition, separate OLS estimation assumes:

$\mathrm{cov}(e_{1t}, e_{2t}) = 0$ for all $t$

This says that, in any time period $t$, the cross-equation errors are uncorrelated.

However, contemporaneous (in the same time period) cross-equation error correlation may be realistic. That is, the general state of the economy, whose influence is reflected in $e_{1t}$ and $e_{2t}$, is likely to have similar effects in each equation. This suggests the error assumption:

$\mathrm{cov}(e_{1t}, e_{2t}) = \sigma_{12}$ for all $t$

A set of equations that has contemporaneous cross-equation error correlation is called a Seemingly Unrelated Regression (SUR) system. At first look the equations seem unrelated. But the equations are related through the correlation in the errors.

Separate least squares (OLS) estimation of each cross-section equation will ignore the information about the contemporaneous cross-equation error correlation. Using the information about the error correlation in the estimation procedure is expected to increase the precision of the parameter estimator.

Generalized Least Squares (GLS) estimates or Seemingly Unrelated Regression (SURE) estimates can be obtained by a two-step estimation method.

STEP 1 Estimate each cross-section equation separately by least squares and get the residuals $\hat{e}_{1t}$ and $\hat{e}_{2t}$ for $t = 1, 2, \ldots, T$.

Estimate the error variances and covariance as:

$$\hat{\sigma}_1^2 = \frac{1}{T}\sum_{t=1}^{T} \hat{e}_{1t}^2, \qquad \hat{\sigma}_2^2 = \frac{1}{T}\sum_{t=1}^{T} \hat{e}_{2t}^2 \qquad \text{and} \qquad \hat{\sigma}_{12} = \frac{1}{T}\sum_{t=1}^{T} \hat{e}_{1t}\hat{e}_{2t}$$

Note: A divisor of T was used in the above calculations. This is because there may be different numbers of explanatory variables in different cross-section equations. This is suitable for use with large T.

STEP 2 Obtain the GLS estimates. (Derivation of the estimation formula requires matrix manipulation and will not be presented here.) The estimation makes use of the information about the contemporaneous cross-equation error correlation. These estimates are called the Seemingly Unrelated Regression (SURE) estimates.

If $\sigma_{12} = 0$ the errors are not correlated and the two equations can be estimated separately by least squares (OLS). That is, there is no advantage to SURE estimation. This suggests that a useful test is:

$H_0: \sigma_{12} = 0$ (no correlation between $e_{1t}$ and $e_{2t}$)

$H_1: \sigma_{12} \neq 0$

From the least squares estimation results of the two equations, a test statistic is constructed from the squared correlation as:

$$T r_{12}^2 = T \, \frac{\hat{\sigma}_{12}^2}{\hat{\sigma}_1^2 \hat{\sigma}_2^2}$$

This test statistic can be compared with a $\chi^2$ (chi-square) distribution with 1 degree of freedom.

For SURE estimation with Stata, the sureg command with the corr option reports the above test statistic as the Breusch-Pagan test of independence.

If the null hypothesis is not rejected then, for the given data set, SURE will not improve on least squares (OLS) estimation.
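The calculation of the test statistic from two sets of least squares residuals can be sketched as follows. This is an illustration with made-up residuals, not the investment data, and the function name `breusch_pagan_independence` is hypothetical. The p-value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math


def breusch_pagan_independence(resid_1, resid_2):
    """Return (LM statistic, p-value) for H0: sigma_12 = 0 in a two-equation system.

    resid_1, resid_2: least squares residuals from the two equations,
    observed over the same T time periods.
    """
    t = len(resid_1)
    # Variances and covariance with a divisor of T.
    s11 = sum(e * e for e in resid_1) / t
    s22 = sum(e * e for e in resid_2) / t
    s12 = sum(a * b for a, b in zip(resid_1, resid_2)) / t
    r_squared = s12 ** 2 / (s11 * s22)
    lm = t * r_squared
    # Upper-tail probability of chi-square(1): P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical residuals (illustration only).
e1 = [1.0, -1.0, 2.0, -2.0]
e2 = [1.0, -1.0, 1.0, -1.0]
lm_stat, lm_p = breusch_pagan_independence(e1, e2)
print(lm_stat, lm_p)
```

A small p-value is evidence of contemporaneous cross-equation error correlation, which is the case where SURE estimation can improve on separate least squares.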

There is another special situation where separate least squares (OLS) estimation is just as good as SURE estimation. It can be shown that least squares and SURE give numerically identical parameter estimates when the explanatory variables that appear in each equation have the same numerical values.

An example is the capital asset pricing model (CAPM) for stock market returns. A regression model that compares the performance of two companies is:

Company A: $y_{1t} = \beta_{11} + \beta_{12} x_t + e_{1t}$ for $t = 1, \ldots, T$

Company B: $y_{2t} = \beta_{21} + \beta_{22} x_t + e_{2t}$ for $t = 1, \ldots, T$

where
$y_{1t}$ is the risk premium for Company A in time period $t$,
$y_{2t}$ is the risk premium for Company B in time period $t$, and
$x_t$ is the risk premium on the market portfolio in time period $t$.

For each equation the explanatory variable $x_t$ has the identical observations. In this case, even if the errors in the equations are correlated, the use of SURE will give identical results to separate least squares (OLS) estimation of each equation.

Now return to the investment functions for General Electric and Westinghouse. From the SURE estimation results produced by the Stata sureg command, the Breusch-Pagan test of independence for testing for contemporaneous cross-equation error correlation gave a test statistic of 10.628. The p-value for the test based on the $\chi^2(1)$ distribution is 0.001. Therefore, the null hypothesis of no correlation between $e_{1t}$ and $e_{2t}$ is rejected.

The Stata results from the sureg command calculate p-values for t-statistics using the standard normal distribution. That is, the statistical properties of SURE estimators are established for large samples. Therefore, test statistics calculated for hypothesis testing should be viewed as approximate with small samples.

Testing Cross-Equation Hypotheses

Following SURE estimation a test for identical coefficients for the investment functions for the two firms can be constructed. That is,

$$H_0: \beta_{11} = \beta_{21}, \quad \beta_{12} = \beta_{22} \quad \text{and} \quad \beta_{13} = \beta_{23}$$

The Stata test command computed a p-value for the test of 0.036. This suggests that at a 5% significance level the null hypothesis of equal coefficients is rejected, but at a 1% significance level the null hypothesis is not rejected.

It may be interesting to work with the pooled regression:

$$y_{i,t} = \beta_1 + \beta_2 x_{i,t2} + \beta_3 x_{i,t3} + e_{it} \quad \text{for } i = 1, 2 \text{ and } t = 1, \ldots, 20$$

This imposes the same coefficients for each firm.

The error assumptions for pooled least squares are:

$E(e_{1t}) = E(e_{2t}) = 0$ for all $t$

$\mathrm{var}(e_{1t}) = \mathrm{var}(e_{2t}) = \sigma^2$ for all $t$ (cross-section homoskedasticity)

$\mathrm{cov}(e_{1t}, e_{1s}) = 0$, $\mathrm{cov}(e_{2t}, e_{2s}) = 0$ for all $t \neq s$ (no autocorrelation)

$\mathrm{cov}(e_{1t}, e_{2t}) = 0$ for all $t$ (no contemporaneous cross-equation error correlation)

The method of pooled Generalized Least Squares (GLS) allows for both cross-section heteroskedasticity and contemporaneous cross-equation error correlation. The error assumptions of pooled GLS are:

$E(e_{1t}) = E(e_{2t}) = 0$ for all $t$

$\mathrm{var}(e_{1t}) = \sigma_1^2$, $\mathrm{var}(e_{2t}) = \sigma_2^2$ for all $t$ (cross-section heteroskedasticity)

$\mathrm{cov}(e_{1t}, e_{1s}) = 0$, $\mathrm{cov}(e_{2t}, e_{2s}) = 0$ for all $t \neq s$ (no autocorrelation)

$\mathrm{cov}(e_{1t}, e_{2t}) = \sigma_{12}$ for all $t$ (contemporaneous cross-equation error correlation)

With Stata, pooled GLS is available with the xtgls command. The pooled GLS method is designed to make better use of the information and so give a better estimator.

For the General Electric and Westinghouse data set the pooled least squares estimation results presented earlier can be compared with the estimation results from pooled GLS. The values in parentheses are listed in the order intercept, $x_{i,t2}$, $x_{i,t3}$.

Pooled OLS: $\hat{y}_{i,t} = 17.87 + 0.0152\, x_{i,t2} + 0.144\, x_{i,t3}$
t-statistics (Panel): (3.96) (2.28) (6.06)
p-values: (<0.0005) (0.023) (<0.0005)

Pooled GLS: $\hat{y}_{i,t} = 0.9 + 0.0192\, x_{i,t2} + 0.0\, x_{i,t3}$
t-statistics: (6.36) (3.53) (5.97)
p-values: (<0.0005) (<0.0005) (<0.0005)

p-values are calculated using the standard normal distribution and should be viewed as approximate with small samples.

A 95% confidence interval estimate for the coefficient on the stock market value of the firm $x_{i,t2}$ can be calculated from the estimation results.

$$b_2^{OLS} \pm z_c \, \mathrm{se}_{Panel}(b_2^{OLS}) = 0.0152 \pm 1.96\,(0.00667) = [0.00213,\ 0.02826]$$

$$b_2^{GLS} \pm z_c \, \mathrm{se}(b_2^{GLS}) = 0.0192 \pm 1.96\,(0.00545) = [0.00856,\ 0.02992]$$
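The interval calculation can be sketched in Python. The coefficient and standard error values passed in below are illustrative rounded inputs, so the last digit of an endpoint can differ from a calculation that uses unrounded estimation output. The function name `normal_interval` is hypothetical.

```python
def normal_interval(estimate, std_error, z_critical=1.96):
    """Interval estimate: estimate plus or minus z_critical times std_error.

    z_critical = 1.96 gives a 95% interval under the standard normal
    approximation, which is a large-sample result.
    """
    half_width = z_critical * std_error
    return estimate - half_width, estimate + half_width


# Illustrative inputs (rounded): pooled OLS with a panel corrected
# standard error, and pooled GLS.
ols_interval = normal_interval(0.0152, 0.00667)
gls_interval = normal_interval(0.0192, 0.00545)
print(ols_interval)
print(gls_interval)
```

With these inputs the GLS interval is narrower than the OLS interval because its standard error is smaller.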

The Fixed Effects Model

For panel data sets that combine time-series and cross-section data, a method for allowing cross-section differences is to introduce cross-section dummy variables. The model can allow differential intercepts but impose identical slope coefficients for each cross-section. This set-up is called a fixed effects model.

This can be applied to the investment data for General Electric and Westinghouse used in previous examples. Define the dummy variable:

$W_{it} = 0$ for $i = 1$ (General Electric)
$W_{it} = 1$ for $i = 2$ (Westinghouse)

The investment model is:

$$y_{i,t} = \beta_1 + \gamma W_{it} + \beta_2 x_{i,t2} + \beta_3 x_{i,t3} + e_{it} \quad \text{for } i = 1, 2 \text{ and } t = 1, 2, \ldots, 20$$

Both General Electric and Westinghouse have identical slope coefficients. The differential intercept is $\beta_1$ for General Electric and $\beta_1 + \gamma$ for Westinghouse.

A test for cross-section differences is:

$$H_0: \gamma = 0 \quad \text{against} \quad H_1: \gamma \neq 0$$

This is the usual t-test of significance reported on the standard estimation output from computer programs.

The fixed effects model that recognizes differential cross-section intercepts has useful application to data sets that contain more than two cross-sections. The General Electric and Westinghouse data can be supplemented with time series data on an additional 8 firms to give a total of ten firms. A set of cross-section dummy variables is defined as:

$D_{1,it} = 1$ for firm $i = 1$ and 0 otherwise,
$D_{2,it} = 1$ for firm $i = 2$ and 0 otherwise,
...
$D_{10,it} = 1$ for firm $i = 10$ and 0 otherwise.

The investment model for the ten firms is:

$$y_{i,t} = \sum_{k=1}^{10} \gamma_k D_{k,it} + \beta_2 x_{i,t2} + \beta_3 x_{i,t3} + e_{it} \quad \text{for } i = 1, 2, \ldots, 10 \text{ and } t = 1, 2, \ldots, 20$$

Note that the model includes all ten firm dummy variables. To avoid the dummy variable trap there is no overall intercept.
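The construction of the firm dummy variables can be sketched in Python. This is an illustration that assumes the observations are stacked firm by firm, with all time periods for firm 1 first. The function name `firm_dummies` is hypothetical.

```python
def firm_dummies(n_firms, n_periods):
    """Return one row of dummy values per (firm, period) observation.

    Rows are stacked firm by firm; column k is 1 for observations
    on firm k and 0 otherwise.
    """
    rows = []
    for firm in range(n_firms):
        for _ in range(n_periods):
            rows.append([1 if k == firm else 0 for k in range(n_firms)])
    return rows


dummies = firm_dummies(n_firms=10, n_periods=20)
# Every row sums to 1, so the ten dummy columns add up to a column of ones.
row_sums = {sum(row) for row in dummies}
print(len(dummies), row_sums)
```

Because the ten columns add up to a column of ones, including an overall intercept as well would give exact collinearity. That is the dummy variable trap the model avoids.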

An equivalent scheme is to set one firm as the base group. The cross-section dummy variable for this firm is dropped from the equation and an intercept coefficient is included. For example, with firm 1 as the base group, a regression equation is:

$$y_{i,t} = \beta_1 + \sum_{k=2}^{10} \alpha_k D_{k,it} + \beta_2 x_{i,t2} + \beta_3 x_{i,t3} + e_{it} \quad \text{for } i = 1, 2, \ldots, 10 \text{ and } t = 1, 2, \ldots, 20$$

Either equation will lead to identical conclusions. The equation can be estimated by least squares (OLS).

Following model estimation it is interesting to test the hypothesis that all firm intercepts are the same. The test method is as follows.

STEP 1 Estimate the model with cross-section dummy variables and get the sum of squared residuals $SSE_U$.

STEP 2 Estimate the restricted model that imposes identical firm intercepts:

$$y_{i,t} = \beta_1 + \beta_2 x_{i,t2} + \beta_3 x_{i,t3} + e_{it}$$

Calculate the sum of squared residuals $SSE_R$.

STEP 3 Construct the F-statistic:

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(NT - K)}$$

where N is the number of cross-sections, T is the number of time periods and K is the number of coefficients in the unrestricted model. In this example, N = 10, T = 20, and K = 12. J is the number of restrictions and for this test J = N - 1 = 9.

The F-statistic is compared with an F-distribution with (J, NT - K) degrees of freedom.

For the 10-firm data set, the estimation results reported an F-statistic test value of 48.99. The calculated p-value for the test was less than 0.00005. The small p-value gives strong evidence to reject the null hypothesis that the intercept parameters for all firms are equal.
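The F-statistic calculation can be sketched in Python. The sums of squared residuals used below are made-up numbers for illustration and are not the values from the 10-firm estimation. The function name `fixed_effects_f_test` is hypothetical.

```python
def fixed_effects_f_test(sse_r, sse_u, n_units, n_periods, k_unrestricted):
    """F-statistic for the null hypothesis of equal intercepts across units.

    sse_r: sum of squared residuals from the restricted (common intercept) model.
    sse_u: sum of squared residuals from the model with cross-section dummies.
    k_unrestricted: number of coefficients in the unrestricted model.
    Returns (F, (numerator df, denominator df)).
    """
    j = n_units - 1  # number of restrictions
    df_denominator = n_units * n_periods - k_unrestricted
    f_stat = ((sse_r - sse_u) / j) / (sse_u / df_denominator)
    return f_stat, (j, df_denominator)


# Hypothetical sums of squared residuals (illustration only).
f_value, f_df = fixed_effects_f_test(
    sse_r=300.0, sse_u=100.0, n_units=10, n_periods=20, k_unrestricted=12
)
print(f_value, f_df)
```

With N = 10, T = 20 and K = 12 the degrees of freedom are (9, 188), so the statistic is compared with critical values from the F(9, 188) distribution.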

The error assumptions for least squares (OLS) estimation are cross-section homoskedasticity (each firm has identical error variance) and no contemporaneous error correlation (in the same time period the errors between firms are not correlated).

The Generalized Least Squares (GLS) estimation method recognizes both cross-section heteroskedasticity and contemporaneous error correlation. The error assumptions are:

$E(e_{it}) = 0$ and $\mathrm{var}(e_{it}) = \sigma_i^2$ for $i = 1, \ldots, N$; $t = 1, \ldots, T$

$\mathrm{cov}(e_{it}, e_{is}) = 0$ for $i = 1, \ldots, N$ and all $t \neq s$ (no autocorrelation)

$\mathrm{cov}(e_{it}, e_{jt}) = \sigma_{ij}$ for all $i \neq j$ and $t = 1, \ldots, T$ (contemporaneous cross-equation error correlation)

The Stata xtgls command gives GLS estimation for panel data sets that combine time-series and cross-section data.

Ten Firm Investment Data Set
Estimation results for the fixed effects model

Variable   OLS estimate   OLS t-statistic   OLS p-value   GLS estimate   GLS t-statistic   GLS p-value
x_2        0.             9.6               0.000         0.0            .88               0.000
x_3        0.3            7.88              0.000         0.9            3.46              0.000
D_1        -69.4          -.39              0.66          -7.8           -0.57             0.569
D_2        00.86          4.05              0.000         4.55           5.48              0.000
D_3        -35.           -9.63             0.000         -09.73         -.30              0.000
D_4        -7.63          -.96              0.05          -9.00          -4.08             0.000
D_5        -5.3           -8.4              0.000         -03.65         -7.44             0.000
D_6        -3.07          -.8               0.070         -7.3           -4.59             0.000
D_7        -66.68         -5.9              0.000         -59.4          -8.63             0.000
D_8        -57.36         -4.0              0.000         -49.6          -7.50             0.000
D_9        -87.8          -6.77             0.000         -78.4          -.5               0.000
D_10       -6.55          -0.55             0.580         -5.79          -.39              0.000

p-values are calculated using the standard normal distribution and should be viewed as approximate with small samples.

Note: A p-value reported as 0.000 means the p-value is less than 0.0005. This suggests that the null hypothesis of a zero coefficient on the associated variable is rejected at any reasonable significance level.