
Panel Data Model

Panel data are data collected on a cross section of units that are then observed repeatedly over time. Examples include the economic growth of each province in Indonesia from 1971 to 2009, or the profits of companies listed on the ISX observed from 1991 to 2009.

Panel data are useful for researchers whose questions cannot be answered with time-series data or cross-section data alone. For example, suppose we would like to develop a model that explains the variation in regional economic performance across provinces in Indonesia through their natural resources and the productivity of their human resources. If we estimate the model using cross-section data observed in only one particular year, we cannot say anything about the variation in their growth over the last ten years.

With panel data, researchers can analyze the fluctuations of economic performance over the years as well as the variation in economic performance across provinces in any particular year. Since panel data combine cross-section and time-series data, the number of observations is large. In addition, panel data inherit the characteristics of both time-series and cross-section data. This can be an advantage or a disadvantage for researchers, which is why panel data models need special treatment in estimation.

Model Representation

Model with cross-section data:
$Y_i = \alpha + \beta X_i + \varepsilon_i$; $i = 1, 2, \ldots, N$
N: number of cross-section observations

Model with time-series data:
$Y_t = \alpha + \beta X_t + \varepsilon_t$; $t = 1, 2, \ldots, T$
T: number of time-series observations

Model with panel data:
$Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}$; $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$
N·T: number of panel data observations

Estimation of Panel Data Model

There are several techniques available.

1. OLS (Pooled Data)
This technique is used when the cross-section and time-series data are simply combined, and the combined (pooled) data are treated as a new data set without taking any cross-section or time-series behavior into account.
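As a minimal sketch of the pooled approach, the snippet below stacks a small synthetic panel into N·T rows and runs a single OLS regression on them. The panel dimensions, the parameter values, and the noise level are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 8                  # hypothetical panel dimensions
alpha, beta = 2.0, 0.5       # hypothetical true parameters

# Synthetic panel: rows are individuals, columns are periods.
X = rng.normal(size=(N, T))
Y = alpha + beta * X + rng.normal(scale=0.1, size=(N, T))

# Pooling: flatten to N*T observations and forget which unit or
# period each observation came from.
x = X.ravel()
y = Y.ravel()
design = np.column_stack([np.ones(N * T), x])   # intercept column + regressor

coef, *_ = np.linalg.lstsq(design, y, rcond=None)
alpha_hat, beta_hat = coef
print(f"alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
```

The single intercept and single slope are the whole model here, which is exactly what makes the pooled approach restrictive.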

2. Fixed Effect
This approach assumes that all individual-specific and period-specific characteristics are captured in the intercepts. Therefore, in this approach, the intercept can change across individuals, over time, or in both directions.

3. Random Effect
This approach assumes that all individual-specific and period-specific characteristics are captured in the residuals. Therefore, in this approach, the residual has an individual component, a time component, and a component that varies over both.

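Observe the following representation:
$Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}$; $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$

If $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0$ for $i \neq j$; $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{i,t-1}) = 0$; $E(\varepsilon_{it}) = 0$; and $\mathrm{Var}(\varepsilon_{it}) = \sigma^2$, then we can estimate the model by separating its time component, so that we have T regressions, each with N observations:

$Y_{i1} = \alpha + \beta X_{i1} + \varepsilon_{i1}$; $i = 1, 2, \ldots, N$
$Y_{i2} = \alpha + \beta X_{i2} + \varepsilon_{i2}$
...
$Y_{iT} = \alpha + \beta X_{iT} + \varepsilon_{iT}$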

Analogously, we can estimate the model by separating its cross-section component, so that we have N regressions, each with T observations:

i = 1: $Y_{1t} = \alpha + \beta X_{1t} + \varepsilon_{1t}$; $t = 1, 2, \ldots, T$
i = 2: $Y_{2t} = \alpha + \beta X_{2t} + \varepsilon_{2t}$
...
i = N: $Y_{Nt} = \alpha + \beta X_{Nt} + \varepsilon_{Nt}$

In the pooled data approach, we assume that the intercept $\alpha$ is constant across individuals and over time, and that the error term has no individual-specific or time-specific component. Sometimes this assumption is not realistic. Therefore, we will consider models that let the intercepts or the error terms vary over time and across individuals.

Fixed Effect Model (FEM)

In this model, variation across individuals and over time is captured in the intercepts. It is formulated as follows:

$Y_{it} = \alpha + \gamma_2 W_{2t} + \gamma_3 W_{3t} + \cdots + \gamma_N W_{Nt} + \delta_2 Z_{i2} + \delta_3 Z_{i3} + \cdots + \delta_T Z_{iT} + \beta X_{it} + \varepsilon_{it}$

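$W_{it}$ and $Z_{it}$ are dummy variables defined as:
$W_{it}$ = 1 for individual i ($i = 1, 2, \ldots, N$); = 0 otherwise.
$Z_{it}$ = 1 for period t ($t = 1, 2, \ldots, T$); = 0 otherwise.
Individual 1 and period 1 serve as the baseline, so only the dummies for $i = 2, \ldots, N$ and $t = 2, \ldots, T$ enter the model.

If the model is estimated using OLS, we obtain an unbiased and consistent estimator.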

Remarks:
1. The model has N+T parameters, consisting of:
   (N-1) parameters $\gamma$
   (T-1) parameters $\delta$
   1 parameter $\alpha$
   1 parameter $\beta$
2. The degrees of freedom are: N·T - N - T
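The parameter count in the remarks can be checked by building the dummy-variable design matrix explicitly. The sketch below is only an illustration on synthetic data: the helper name `lsdv_design`, the panel size, and all coefficient values are hypothetical, and units and periods are indexed from 0, so index 0 plays the role of the baseline individual 1 and period 1.

```python
import numpy as np

def lsdv_design(x, unit, period, N, T):
    """Columns: intercept, N-1 individual dummies, T-1 period dummies, regressor."""
    W = (unit[:, None] == np.arange(1, N)[None, :]).astype(float)
    Z = (period[:, None] == np.arange(1, T)[None, :]).astype(float)
    return np.column_stack([np.ones(len(x)), W, Z, x])

rng = np.random.default_rng(1)
N, T = 4, 6                               # hypothetical panel dimensions
unit = np.repeat(np.arange(N), T)         # 0,0,...,0,1,1,...
period = np.tile(np.arange(T), N)         # 0,1,...,T-1,0,1,...
x = rng.normal(size=N * T)

# Hypothetical intercept shifts; the first entry of each is the baseline.
gamma = np.array([0.0, 1.0, -0.5, 2.0])
delta = np.array([0.0, 0.3, 0.6, 0.9, 1.2, 1.5])
y = 1.0 + gamma[unit] + delta[period] + 0.8 * x + rng.normal(scale=0.05, size=N * T)

D = lsdv_design(x, unit, period, N, T)
coef, *_ = np.linalg.lstsq(D, y, rcond=None)

n_params = D.shape[1]          # should equal N + T
dof = N * T - n_params         # should equal N*T - N - T
beta_hat = coef[-1]
print(n_params, dof, round(float(beta_hat), 3))
```

Dropping one individual dummy and one period dummy keeps the design matrix full rank; including all of them alongside the intercept would make the columns linearly dependent.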

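Regression Equations on FEM

i = 1:
t = 1: $Y_{11} = \alpha + \beta X_{11} + \varepsilon_{11}$
t = 2: $Y_{12} = (\alpha + \delta_2) + \beta X_{12} + \varepsilon_{12}$
...
t = T: $Y_{1T} = (\alpha + \delta_T) + \beta X_{1T} + \varepsilon_{1T}$

i = 2:
t = 1: $Y_{21} = (\alpha + \gamma_2) + \beta X_{21} + \varepsilon_{21}$
t = 2: $Y_{22} = (\alpha + \gamma_2 + \delta_2) + \beta X_{22} + \varepsilon_{22}$
...
t = T: $Y_{2T} = (\alpha + \gamma_2 + \delta_T) + \beta X_{2T} + \varepsilon_{2T}$

...

i = N:
t = 1: $Y_{N1} = (\alpha + \gamma_N) + \beta X_{N1} + \varepsilon_{N1}$
t = 2: $Y_{N2} = (\alpha + \gamma_N + \delta_2) + \beta X_{N2} + \varepsilon_{N2}$
...
t = T: $Y_{NT} = (\alpha + \gamma_N + \delta_T) + \beta X_{NT} + \varepsilon_{NT}$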

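To investigate whether $\alpha$ is constant for all i and t, carry out the following test, where $H_0$ states that the intercept is the same for all individuals and periods:

$F = \frac{RSS_{OLS} - RSS_{FEM}}{RSS_{FEM}} \cdot \frac{NT - N - T}{N + T - 2}$

Here $RSS_{OLS}$ is the residual sum of squares of the pooled OLS model and $RSS_{FEM}$ is that of the fixed effect model. If the calculated F exceeds the F value from the table with $(N + T - 2,\ NT - N - T)$ degrees of freedom, then $H_0$ is rejected, which means that FEM is better. The next question is: how do we interpret all the parameters?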
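The test statistic can be computed directly from the two residual sums of squares. The following is a sketch on synthetic data, not a reproduction of any result in these notes; the function names `rss` and `fem_f_stat` and all data-generating values are hypothetical.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares from an OLS fit of y on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def fem_f_stat(rss_ols, rss_fem, N, T):
    """F statistic for H0: one common intercept (pooled OLS) vs. FEM."""
    return ((rss_ols - rss_fem) / rss_fem) * ((N * T - N - T) / (N + T - 2))

rng = np.random.default_rng(5)
N, T = 4, 6                                   # hypothetical panel dimensions
unit = np.repeat(np.arange(N), T)
period = np.tile(np.arange(T), N)
x = rng.normal(size=N * T)
# Hypothetical data with clear individual and period shifts in the intercept.
y = 1.0 + 2.0 * unit + 0.5 * period + 0.8 * x + rng.normal(scale=0.1, size=N * T)

ones = np.ones(N * T)
W = (unit[:, None] == np.arange(1, N)).astype(float)     # N-1 individual dummies
Z = (period[:, None] == np.arange(1, T)).astype(float)   # T-1 period dummies

rss_ols = rss(np.column_stack([ones, x]), y)             # restricted model
rss_fem = rss(np.column_stack([ones, W, Z, x]), y)       # unrestricted model
F = fem_f_stat(rss_ols, rss_fem, N, T)
print(f"F = {F:.1f}")
```

The computed F would then be compared with the tabulated critical value for (N+T-2, NT-N-T) degrees of freedom at the chosen significance level.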

Random Effect Model (REM)

In FEM, variation across individuals and over time is accommodated in the intercepts, so that the intercepts change over time and across individuals. In REM, by contrast, this variation is accommodated in the residuals. The random error is decomposed into an individual component, a time component, and a component that varies over both. REM can be represented as:

$Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it}$; $\varepsilon_{it} = u_i + v_t + w_{it}$

$u_i$: error for cross-section
$v_t$: error for time-series
$w_{it}$: error for both

With the assumptions: $u_i \sim N(0, \sigma_u^2)$; $v_t \sim N(0, \sigma_v^2)$; $w_{it} \sim N(0, \sigma_w^2)$

Therefore, on average, the deviation effect for time series is randomly represented by $v_t$, while the deviation effect for cross-section is randomly represented by $u_i$.

For REM, $\mathrm{Var}(\varepsilon_{it}) = \sigma_u^2 + \sigma_v^2 + \sigma_w^2$
For OLS (Pooled Data), $\mathrm{Var}(\varepsilon_{it}) = \sigma_w^2$
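The REM variance follows in one step if we additionally assume that the three components are mutually independent (an assumption these notes do not state explicitly). The same decomposition also shows why the error terms are correlated within an individual and within a period:

```latex
\begin{align*}
\mathrm{Var}(\varepsilon_{it})
  &= \mathrm{Var}(u_i) + \mathrm{Var}(v_t) + \mathrm{Var}(w_{it})
   = \sigma_u^2 + \sigma_v^2 + \sigma_w^2, \\
\mathrm{cov}(\varepsilon_{it}, \varepsilon_{is})
  &= \mathrm{Var}(u_i) = \sigma_u^2 \qquad (t \neq s), \\
\mathrm{cov}(\varepsilon_{it}, \varepsilon_{jt})
  &= \mathrm{Var}(v_t) = \sigma_v^2 \qquad (i \neq j).
\end{align*}
```

These covariances vanish only when $\sigma_u^2 = \sigma_v^2 = 0$, which is the case in which the pooled OLS error assumptions hold.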

So, REM can be estimated using OLS if $\sigma_u^2 = \sigma_v^2 = 0$. Otherwise, REM is estimated using the Generalized Least Squares (GLS) method, which consists of 2 stages:

Stage I
(i) Estimate REM using OLS.
(ii) Calculate the RSS to estimate the sample variances.

Stage II
Using the sample variances estimated in the first stage, use GLS to estimate the parameters of the model.
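The two stages can be sketched as follows. To keep the example short, it assumes a simplified one-way error $\varepsilon_{it} = u_i + w_{it}$ with no time component $v_t$, in which case GLS is equivalent to OLS on quasi-demeaned data. The data, the variance estimators, and all names are illustrative assumptions rather than the procedure used in these notes.

```python
import numpy as np

def ols(design, y):
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
N, T = 200, 5                         # hypothetical panel dimensions
alpha, beta = 1.0, 0.7                # hypothetical true parameters
sigma_u, sigma_w = 1.0, 0.5           # hypothetical error std. deviations

X = rng.normal(size=(N, T))
u = rng.normal(scale=sigma_u, size=(N, 1))            # individual component
Y = alpha + beta * X + u + rng.normal(scale=sigma_w, size=(N, T))

# Stage I: pooled OLS, then use the residuals to estimate the variances.
a0, b0 = ols(np.column_stack([np.ones(N * T), X.ravel()]), Y.ravel())
E = Y - a0 - b0 * X                                   # residuals, N x T
e_bar = E.mean(axis=1, keepdims=True)                 # mean residual per individual
s2_w = ((E - e_bar) ** 2).sum() / (N * (T - 1))       # within-individual variance
s2_u = max(e_bar.var(ddof=1) - s2_w / T, 0.0)         # Var(e_bar) = s2_u + s2_w / T

# Stage II: GLS via quasi-demeaning with weight theta.
theta = 1.0 - np.sqrt(s2_w / (s2_w + T * s2_u))
Ys = Y - theta * Y.mean(axis=1, keepdims=True)
Xs = X - theta * X.mean(axis=1, keepdims=True)
design = np.column_stack([np.full(N * T, 1.0 - theta), Xs.ravel()])
alpha_gls, beta_gls = ols(design, Ys.ravel())
print(f"theta={theta:.3f}, beta_gls={beta_gls:.3f}")
```

When the estimated $\sigma_u^2$ is zero, theta is zero and the second stage collapses to pooled OLS, matching the condition stated above.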

Remark: If we can assume that the error is normally distributed, then MLE can be used.

FEM vs REM

Which one should we choose?
(i) REM has fewer parameters, so it has more degrees of freedom. But FEM is able to differentiate individual effects and time effects.
(ii) There is a suggestion:
   If T > N, use FEM
   If N > T, use REM
(iii) Use a statistical test instead.

Example 1

To analyze the cost function of an industry, the costs and outputs of 4 companies were observed over a ten-year period. The cost function is estimated using the FEM approach:

$C_{it} = \alpha + \gamma_2 W_{2t} + \gamma_3 W_{3t} + \gamma_4 W_{4t} + \beta Q_{it} + \varepsilon_{it}$

$C_{it}$: total cost of company i at time t
$Q_{it}$: total output of company i at time t
$W_{it}$ = 1 for company i ($i = 2, 3, 4$); = 0 otherwise

The estimated model:
$C_{it} = 2.315 + 10.110 W_{2t} + 2.385 W_{3t} + 16.171 W_{4t} + 1.119 Q_{it}$

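Comment: How to interpret the intercept?
For company 1, if $Q_1 = 1000$, then $C_1 = 2.315 + 1.119 \times 1000 = 1121.315$
For company 2, if $Q_2 = 1000$, then $C_2 = C_1 + 10.110$
For company 3, if $Q_3 = 1000$, then $C_3 = C_1 + 2.385$
For company 4, if $Q_4 = 1000$, then $C_4 = C_1 + 16.171$
The dummy coefficients therefore measure how much each company's cost differs from that of company 1 at the same output level.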
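The arithmetic in this comment can be reproduced with a few lines of code. The coefficients are the estimates reported above; the function and variable names are hypothetical.

```python
# Estimated coefficients from Example 1.
intercept = 2.315
shift = {1: 0.0, 2: 10.110, 3: 2.385, 4: 16.171}   # company 1 is the baseline
slope = 1.119

def predicted_cost(company, output):
    """Fitted cost: common intercept + company-specific shift + slope * output."""
    return intercept + shift[company] + slope * output

costs = {company: predicted_cost(company, 1000) for company in shift}
for company, cost in costs.items():
    print(f"company {company}: {cost:.3f}")
```

Because the slope is common to all four companies, the gap between any two companies is the same at every output level.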

How to interpret the slope? If output is increased by 1 unit, then the cost will increase by 1.119 units for each of companies 1, 2, 3, and 4. Which company is the most cost-efficient?

Example 2

Relationship between R&D Budget and Number of Products Patented

Several companies spend a lot of money on Research and Development (R&D), expecting that new, more efficient innovations or techniques will be invented. To investigate whether there is a positive relationship between the R&D budget and the patents obtained, 45 companies in the US were observed over 7 years.

P: number of inventions patented (in logs)
RND: R&D budget 5 years earlier (in logs)

The model offered:
$P_{it} = \beta_0 + \beta_1 RND_{i,t-5} + \varepsilon_{it}$; i: company; t: time

Using 315 observations (45 companies over 7 years):
$P_{it} = 1.438 + 0.845\, RND_{i,t-5}$
t: (14.01) (24.17); $R^2 = 0.65$

Observations:
1. The estimated model indicates that there is a positive relationship between the R&D budget and the number of inventions patented.
2. Both variables are in logs, so the slope is an elasticity: on average, a 1% increase in the R&D budget is associated with a 0.845% increase in the number of inventions patented.

To analyze this relationship further, the following equation is estimated by regressing each company's average number of inventions patented on its average R&D budget (averages over the seven-year period):

The estimated equation:
$P_i = 1.370 + 0.871\, RND_i$
t: (5.53) (10.28); $R^2 = 0.71$
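The averaging step can be sketched as below. The data are synthetic, generated only to show the mechanics; they are not the patent data behind the estimates above, and every numeric value in the snippet is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 45, 7                                   # 45 companies, 7 years, as in the example

# Hypothetical log R&D budgets: a company-level mean plus yearly variation.
rnd = rng.normal(loc=3.0, size=(N, 1)) + rng.normal(scale=0.3, size=(N, T))
# Hypothetical log patent counts generated with slope 0.8.
patents = 1.0 + 0.8 * rnd + rng.normal(scale=0.2, size=(N, T))

# Average each company's series over time: one observation per company.
p_bar = patents.mean(axis=1)
r_bar = rnd.mean(axis=1)

# OLS on the N company averages.
design = np.column_stack([np.ones(N), r_bar])
coef, *_ = np.linalg.lstsq(design, p_bar, rcond=None)
print(f"intercept={coef[0]:.3f}, slope={coef[1]:.3f}")
```

Averaging leaves only 45 observations, so all variation over time within a company is discarded and only differences between companies identify the slope.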

Observations:
1. There is a positive relationship between R&D spending and inventions patented.
2. For every 1% increase in R&D spending, the number of inventions patented increases by 0.87%.
3. However, this model cannot distinguish the variation in the number of inventions patented across companies that is not caused by R&D spending.
4. We need a model that can be used to analyze differences in the number of inventions across individual companies that are not caused by R&D spending.

Estimation based on the FEM approach:
$P_{it} = \beta_0 + \beta_1 RND_{i,t-5} + W_{it} \gamma_i + \varepsilon_{it}$

Estimated equation:
$P_{it} = 0.195\, RND_{i,t-5}$
t: (2.35); $R^2 = 0.937$

Since there are 45 different intercepts, they are not written out explicitly. However, based on both the F and t tests, all parameters are both jointly and individually significant.
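With 45 intercepts, it can be convenient to avoid the dummies altogether. The sketch below illustrates, on synthetic data, that subtracting each company's own mean from both variables (the "within" transformation, which is not named in these notes) gives the same slope as the dummy-variable regression. All values are hypothetical, and the dummy regression here uses one dummy per company with no common intercept, which is an equivalent parameterization.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 45, 7
a = rng.normal(scale=2.0, size=(N, 1))           # hypothetical company intercepts
X = 0.5 * a + rng.normal(size=(N, T))            # regressor correlated with the intercepts
Y = a + 0.2 * X + rng.normal(scale=0.3, size=(N, T))

# Dummy-variable regression: one dummy per company plus the regressor.
dummies = np.kron(np.eye(N), np.ones((T, 1)))    # (N*T) x N, matches row-major ravel()
D = np.column_stack([dummies, X.ravel()])
coef_lsdv, *_ = np.linalg.lstsq(D, Y.ravel(), rcond=None)
beta_lsdv = coef_lsdv[-1]

# Within transformation: remove each company's mean, then regress without intercept.
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
beta_within = (Xd * Yd).sum() / (Xd ** 2).sum()

print(f"LSDV slope={beta_lsdv:.6f}, within slope={beta_within:.6f}")
```

The two slopes agree up to floating-point error, while the demeaned regression needs only one parameter instead of 46.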

For comparison, REM is also estimated, and the estimated model is:
$P_{it} = 2.299 + 0.519\, RND_{i,t-5}$
t: (12.13) (8.78); $R^2 = 0.91$

Each of the 4 approaches we have tried gives a different result and thus a different interpretation. Since the data we used are panel data, we should use either FEM or REM. The choice can be guided by the objective of the analysis. If we really want to know the impact of factors other than R&D spending on the number of inventions patented across companies, FEM could be used.

However, the Hausman specification test can be used to investigate whether the residuals are uncorrelated with the regressor, as REM requires.

Remark: For this example, the Hausman test indicates that the requirement that the residuals be uncorrelated with the regressor is not fulfilled. So, for this example, FEM is more appropriate, and the analysis and model interpretation should be based on FEM.
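These notes do not report the Hausman statistic itself. As a rough back-of-envelope illustration only, the single-regressor form of the statistic can be evaluated from the FEM and REM estimates above, under the assumption that each reported t value equals the coefficient divided by its standard error. The variable names are hypothetical, and the result should not be read as the figure the original analysis obtained.

```python
# Reported slope estimates and t values for RND (FEM and REM).
b_fe, t_fe = 0.195, 2.35
b_re, t_re = 0.519, 8.78

# Assumption: t = coefficient / standard error, so se = coefficient / t.
var_fe = (b_fe / t_fe) ** 2
var_re = (b_re / t_re) ** 2

# Hausman statistic for one coefficient:
# H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), chi-square with 1 d.o.f. under H0.
hausman = (b_fe - b_re) ** 2 / (var_fe - var_re)

CHI2_1_CRIT_5PCT = 3.841     # 5% critical value of the chi-square(1) distribution
reject_rem = hausman > CHI2_1_CRIT_5PCT
print(f"H = {hausman:.2f}, reject H0 at 5%: {reject_rem}")
```

Under these assumptions the statistic is far above the 5% critical value, which points in the same direction as the remark above.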

Example 3

To analyze the cost function of the automobile industry, the costs and outputs of 4 companies (say, Toyota, Honda, Suzuki, and Kia) were observed over a ten-year period. The cost function is represented (using the FEM approach) by:

$C_{it} = \alpha + \gamma_2 W_{2t} + \gamma_3 W_{3t} + \gamma_4 W_{4t} + \beta Q_{it} + \varepsilon_{it}$

$C_{it}$: total cost of company i at time t
$Q_{it}$: total output of company i at time t
$W_{it}$ = 1 for company i; i = 2 (Honda), 3 (Suzuki), 4 (Kia); = 0 otherwise (Toyota)

The estimated cost function (all parameters significant at the 5% level):
$C_{it} = 16,171 - 2,385 W_{2t} - 2,315 W_{3t} + 10,110 W_{4t} + 1,119 Q_{it}$

From the estimated cost function, answer the following questions:
(i) Which company is the most cost-efficient? Why?
(ii) For Suzuki, for example, what is the cost of producing 1000 units? Explain.
(iii) For Toyota, for example, what is the cost of producing 1000 units? Explain.
(iv) Which company is the least cost-efficient? Why?

Which one is a proper estimator?

I. Estimation using OLS
$P_{it} = 1.438 + 0.845\, RND_{i,t-5}$
t: (14.01) (24.17); $R^2 = 0.65$

II. Averaging over t, and using OLS for estimation
$P_i = 1.370 + 0.871\, RND_i$
t: (5.54) (10.28); $R^2 = 0.71$

III. Estimation with FEM
$P_{it} = \beta_0 + \beta_1 RND_{i,t-5} + W_{it} \gamma_i + \varepsilon_{it}$
Estimate: $P_{it} = 0.195\, RND_{i,t-5}$
t: (2.35); $R^2 = 0.937$

IV. Estimation with REM
$P_{it} = 2.299 + 0.519\, RND_{i,t-5}$
t: (12.13) (8.78); $R^2 = 0.91$