HOW IS GENERALIZED LEAST SQUARES RELATED TO WITHIN AND BETWEEN ESTIMATORS IN UNBALANCED PANEL DATA?

Similar documents
Simple Linear Regression: The Model

Regression systems for unbalanced panel data: a stepwise maximum likelihood procedure

Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017

Time Invariant Variables and Panel Data Models : A Generalised Frisch- Waugh Theorem and its Implications

the error term could vary over the observations, in ways that are related

Heteroskedasticity. in the Error Component Model

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Intermediate Econometrics

Introductory Econometrics

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Topic 10: Panel Data Analysis

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics of Panel Data

The exact bias of S 2 in linear panel regressions with spatial autocorrelation SFB 823. Discussion Paper. Christoph Hanck, Walter Krämer

Econometrics of Panel Data

Econometrics of Panel Data

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Nonstationary Panels

INTRODUCTORY ECONOMETRICS

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

1 Estimation of Persistent Dynamic Panel Data. Motivation

ECON The Simple Regression Model

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Ordinary Least Squares Regression

Introductory Econometrics

Panel Data Model (January 9, 2018)

Advanced Econometrics

Heteroskedasticity. We now consider the implications of relaxing the assumption that the conditional

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Econometrics I Lecture 3: The Simple Linear Regression Model

The Simple Regression Model. Simple Regression Model 1

Some Monte Carlo Evidence for Adaptive Estimation of Unit-Time Varying Heteroscedastic Panel Data Models

Applied Microeconometrics (L5): Panel Data-Basics

In the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2)

Review of Econometrics

Economics 582 Random Effects Estimation

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley

Introduction to Econometrics Final Examination Fall 2006 Answer Sheet

CLUSTER EFFECTS AND SIMULTANEITY IN MULTILEVEL MODELS

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC

Reference: Davidson and MacKinnon Ch 2. In particular page

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

No 19/2016 Panel data estimators and aggregation Erik Biørn

Topic 6: Non-Spherical Disturbances

Introduction to Estimation Methods for Time Series models. Lecture 1

Seemingly Unrelated Regressions in Panel Models. Presented by Catherine Keppel, Michael Anreiter and Michael Greinecker

Sensitivity of GLS estimators in random effects models

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Linear Regression. Junhui Qian. October 27, 2014

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction

1 Introduction to Generalized Least Squares

Regression and Statistical Inference

Applied Econometrics (QEM)

ANALYSIS OF PANEL DATA MODELS WITH GROUPED OBSERVATIONS. 1. Introduction

ECON3150/4150 Spring 2015

The Two-way Error Component Regression Model

Least Squares Estimation-Finite-Sample Properties

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Modelling Multi-dimensional Panel Data: A Random Effects Approach

10 Panel Data. Andrius Buteikis,

Multivariate Regression Analysis

Title. Description. var intro Introduction to vector autoregressive models

Heteroskedasticity. Part VII. Heteroskedasticity

Simple Linear Regression Model & Introduction to. OLS Estimation

Test of hypotheses with panel data

Large Sample Properties of Estimators in the Classical Linear Regression Model

The Simple Regression Model. Part II. The Simple Regression Model

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

The Multiple Regression Model Estimation

Econometrics Multiple Regression Analysis: Heteroskedasticity

Next is material on matrix rank. Please see the handout

Weighted Least Squares

Homoskedasticity. Var (u X) = σ 2. (23)

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

GENERALIZED ESTIMATORS OF STATIONARY RANDOM-COEFFICIENTS PANEL DATA MODELS: ASYMPTOTIC AND SMALL SAMPLE PROPERTIES

1. The OLS Estimator. 1.1 Population model and notation

Econometrics of Panel Data

Econometrics Master in Business and Quantitative Methods

Lecture 3: Multiple Regression

Lecture 9 SLR in Matrix Form

Intermediate Econometrics

Introduction to Econometrics Midterm Examination Fall 2005 Answer Key

Econometrics Summary Algebraic and Statistical Preliminaries

Lecture 7: Dynamic panel models 2

Specification Test for Instrumental Variables Regression with Many Instruments

Testing Random Effects in Two-Way Spatial Panel Data Models

Econometric Methods for Panel Data Part I

Panel data can be defined as data that are collected as a cross section but then they are observed periodically.

2. Linear regression with multiple regressors

Motivation for multiple regression

Econometrics of Panel Data

The multiple regression model; Indicator variables as regressors

Lecture 10: Panel Data

Variable Selection and Model Building

Econ 510 B. Brown Spring 2014 Final Exam Answers

Transcription:

HOW IS GENERALIZED LEAST SQUARES RELATED TO WITHIN AND BETWEEN ESTIMATORS IN UNBALANCED PANEL DATA? ERIK BIØRN Department of Economics University of Oslo P.O. Box 1095 Blindern 0317 Oslo Norway E-mail: erik.biorn@econ.uio.no March 16 010 Abstract: For a random effects panel data regression model in the unbalanced case we demonstrate that the Generalized Least Squares (GLS) estimator can be expressed as a (matrix) weighted average of estimators which utilize the within and the between individual variation. Two between estimators which generalizes the familiar estimator for the balanced case one of which depending on the variance components are involved. The regression s intercept needs specific treatment. We finally define an estimator which contains the GLS and the within and between estimators for balanced and unbalanced panel data as special cases. Keywords: Panel Data. Unbalanced panels. Missing data. Generalized Least Squares. Within estimation. Between estimation JEL classification: C13 C3

1 Introduction It is well known that for balanced panel data with random individual specific effects the Ordinary (OLS) and the Generalized Least Squares (GLS) estimators of the slope coefficient vector can be interpreted as (matrix) weighted averages of the estimators which utilize only the within individual and only the between individual variation in the data set often denoted as within and between estimators [see Hsiao (003 section 3.3.). Since unbalanced situations are more common than balanced in particular when using micro data due to entry or exit of respondents non-response rotation designs etc. the interest of this relationship is somewhat limited. There exists a growing literature on the GLS estimation of random individual effects models in unbalanced situations [see e.g. Biørn (1981) and Baltagi (1985). The question of whether and possibly how this estimator can be related to estimators which can be interpreted as within and between estimators has not been addressed in this literature. In this note the latter question is addressed. We demonstrate that a weighting relationship for the GLS with unbalanced panel data and random individual effects similar to that in the balanced case exists provided that the within and the between variation in the data are redefined in a suitable way. In deriving this relationship we demonstrate that specific attention must be given to the intercept term of the equation. Finally we present a general estimator which contains the GLS the OLS the within individual and the between individual estimators for balanced and unbalanced situations as special cases and which can be and easily implemented for computer programming. Model OLS and GLS estimators Consider a one-way error components regression model for unbalanced panel data in which individual i (i 1... N) is observed in T i periods and let t denote the observation number (which differs from the calendar period if the starting period of the individuals differ or if gaps occur in the time series of some of them): (1) y it x it β + k + ϵ it ϵ it α i + u it i 1... N α i IID(0 σα) u it IID(0 σ ) α i u it x it t 1... T i where x it is a (row) vector of regressors β its (column) vector of coefficients α i a random effect u it a disturbance and denotes orthogonal to. Let y i (y i1... y i ) y (y 1... y N ) X i (x i1... x it i ) X (X 1... X N) etc. and let I p be the p dimensional identity matrix and e p the (p 1) vector of ones. The number of observations i.e. the number of rows in y and X is n N i1 T i. The model can then be rewritten in the more compact form () y Xβ + e n k + ϵ ϵ [e T 1 α 1... e T N α N + u E(ϵ) 0 n1 E(ϵϵ ) Ω diag(ω 1... Ω N ) 1

where diag denotes block-diagonalization (3) Ω i E(ϵ i ϵ i) σ α e e T i + σ I σ K + (σ + T i σ α)j i 1... N and J p 1 p e pe p K p I p J p (p 1... ). Since J p and K p are idempotent and have orthogonal columns we simply have (4) Ω 1 i K T i σ + J σ + T i σ α σ (K + θ i J ) where (5) θ i var(ū i ) var( ε i ) σ σ +T i σ α 1 i 1... N. We use the following notation for the within the between and the total covariation in arbitrary matrices Z and Q constructed in the same way as X: W ZQ N i1 t1 (z it z i ) (q it q i ) Z diag(k T1... K TN )Q B ZQ N i1 T i( z i z) ( q i q) Z diag(j T1... J TN )Q Z J n Q G ZQ N i1 t1 (z it z) (q it q) W ZQ + B ZQ Z (I n J n )Q where z i 1 T i t1 z it and z 1 n N i1 t1 z it 1 n N i1 T i z i. Also we let X i (X i. e ) and X ( X 1... X N). As a consequence of not including the intercept in β and a column of ones in X we will [unlike e.g. Baltagi (008 section 9.) specify the intercept explicitly in the formulae. This will be essential for the purpose of defining the generalized between covariation and the two versions of the between estimators and interpreting the GLS estimator in terms of estimators which exploit the within and between variation. The OLS and GLS estimators of (β k) are respectively [ βols ( X X) [ 1 ( X X 1 [ y) ix i X ie X (6) iy i e X i e e e y i kols [ W XX + T i x i x i x i x i n 1 [ W XY + T i x i ȳ i ȳ i [ βgls (7) kgls ( X Ω 1 i X) 1 ( X Ω 1 i y) [ X iω 1 1 [ i X i X iω 1 i e X e Ω 1 i X i e Ω 1 i e [ W XX + 1 [ Φ i x i x i Φi x i Φi x i Φi iω 1 i y i y i e Ω 1 i W XY + Φ i x i ȳ i Φi ȳ i

where (8) Φ i var(u it) var( ε i ) T iσ σ +T i σ α θ i T i > < 1 (1 1 T i )σ > < σ α. The last equality in (7) follows from (4). Since the formula for the partitioned inverse [see Lütkepohl (1996 section 3.5.3) implies [ 1 [ A b Q Q b ( (9) c b c b c Q b c Qb c + 1 Q A b ) 1 b c c where A is a symmetric matrix b a row vector and c a scalar the estimators (6) and (7) can be rewritten as (10) β OLS [W XX + T i ( x i x) ( x i x) 1 [W XY + T i ( x i x) (ȳ i ȳ) kols ȳ x β OLS (11) β GLS [W XX + Φ i ( x i x) ( x i x) 1 [W XY + Φ i ( x i x) (ȳ i ȳ) kgls ȳ x β GLS where in general z z i z Φi z i Φi. In the unbalanced case we have: [1 The global means entering the OLS and the GLS formulae differ ( z z). [ The GLS estimators depend on (θ 1... θ N ). 3 Within and Between estimator first version The within estimator corresponding to β OLS and the between estimators corresponding to β OLS and k OLS obtained by running OLS on respectively (1) with the (k+α i )s considered as N unknown constants and on (1) ȳ i T i x i β + T i k + T i ϵ i i 1... N are (13) (14) β W W 1 XX W XY β B B 1 XX B XY kb ȳ x β B The disturbances in (1) T i ϵ i T i (α i + ū i ) are homoskedastic in the absence of individual effect (α i 0 θ i 1 Φ i T i ) and heteroskedastic with var( T i ϵ i ) σ /θ i otherwise. 3

From (10) (13) and (14) it follows for any panel design that (15) βols G 1 XX G XY (W XX +B XX ) 1 (W XX βw +B XX βb ). For a balanced design (T i T ) we have θ i θ σ Φ σ +T σα i Φ T σ σ +T σα (11) (13) and (14) yield (16) βgls (W XX +θb XX ) 1 (W XX βw +θb XX βb ). i so that We seek a relationship for the unbalanced case which can be interpreted as a generalization of the latter. This will involve modified between estimators. 4 Between estimator second version. A general class Let v i be an arbitrary weight and considered the following generalization of (1): (17) vi ȳ i v i x i β + v i k + v i ϵ i i 1... N with var( v i ϵ i ) v i Φ i σ. Running OLS gives a set of generalized between estimators: [ [ β 1 [ (18) B(v) vi x kb (v) i x i vi x i vi x i ȳ i vi x i vi vi ȳ i where v (v 1... v N ). Let further (19) z(v) vi z i vi BZQ (v) v i [ z i z(v) [ q i q(v) allowing (18) to be written in a form similar to (14) when we again use (9) (0) β B(v) B XX (v) 1 BXY (v) k B (v) ỹ(v) x(v)β B(v). Let now T (T 1... T N ) and Φ (Φ 1... Φ N ) so that we can write z z(t ) z z(φ) B ZQ B ZQ (T ). As a special case let in (0) v Φ which gives (1) β B β B(Φ) B XX (Φ) 1 BXY (Φ) kb k B (Φ) ỹ(φ) x(φ)β B(Φ) ȳ x β B. Proposition: Let for arbitrary vectors (z it q it ) W ZQ N i1 t1 (z it z i ) (q it q i ) B ZQ T i ( z i z) ( q i q) B ZQ (v) v i [ z i z(v) [ q i q(v) 4

where z(v)[ v i z i /[ v i z z(t ). Define corresponding estimators of β as β W W 1 XX W XY β B B 1 XX B XY β B(v) B XX (v) 1 BXY (v). Let (λ W λ B ) be non-negative scalars and consider the class of estimators β (λ W λ B v) [λ W W XX +λ B BXX (v) 1 [λ W W XX βw +λ B BXX (v)β B(v) being homogeneous in (λ W λ B ) and v of degree zero. Specifically β OLS β (1 1 T ) β B β (0 1 T ) β B(T ) β W β (1 0 T ) β (1 0 v) irrespective of v β B β (0 1 Φ) β B(Φ) β GLS β (1 1 Φ). [A β OLS β W β B β B β GLS are for suitable choices of v matrix weighted averages of β W β B(v). [B β OLS β GLS are in general not matrix weighted averages of β W β B. In practical applications the θ i s (Φ i s) have to be estimated which requires estimation of σ and σ α. We find that if the disturbances ϵ it were known the following estimators would be unbiased for the variance components: () σ W ϵϵ n N σ α n[(n N)B ϵϵ (N 1)W ϵϵ (n N)(n m) where n N i1 T i m N i1 T i. [For a balanced panel n NT m NT. The proof follows from the fact that and hence which implies W uu N i1 t1 u it N i1 T iū i B uu N i1 T iū i nū B αα N i1 T iα i nᾱ E(ū i ) 1 T i σ E(ū ) 1 n σ E(ᾱ ) m n σ α E(W ϵϵ ) E(W uu ) (n N)σ E(B uu ) (N 1)σ E(B αα ) E(B ϵϵ ) E(B uu ) + E(B αα ) ( n m ) σ n α (3) σ E(W ϵϵ) n N σ α n[(n N)E(B ϵϵ) (N 1)E(W ϵϵ ). (n N)(n m) 5

This we can verify that () give unbiased estimators: E( σ ) σ E( σ α) σ α. [See also Searle Casella and McCulloch (199 Section 3.6) and Biørn (004 Section 3). In practice the ϵ it disturbances are replaced by consistent residuals. The derived parameters θ i and Φ i can therefore be estimated by (4) (5) σ θ i σ +T i σ α Φ i T i σ σ +T i σ α (n m)w ϵϵ (n m)w ϵϵ + T i [(n N)B ϵϵ (N 1)W ϵϵ T i (n m)w ϵϵ (n m)w ϵϵ + T i [(n N)B ϵϵ (N 1)W ϵϵ to obtain feasible Between and GLS estimators by replacing Φ by Φ. 5 Conclusion Our conclusions then are the following: 1. If we define a modified between estimator of β given by (18) by choosing the weight vector v such that the weighted equation in individual means (16) gets disturbances with variance σ we obtain the Between estimator for the unbalanced panel data set β B β B(Φ). The prescription ensuring this is v Φ. [A Feasible Between estimator is obtained for Φ Φ as obtained from (5). The GLS estimator for the unbalanced case can be interpreted as a matrix weighted mean of β W and β B β B(Φ) with weights depending on X. [Replacing Φ by Φ we get the Feasible GLS estimator. Unlike the OLS estimator for the unbalanced case it cannot however in general be interpreted as a matrix weighted mean of β W and β B β B(T ). 3. Since in the balanced case (Φ θ T ) we have B XX (Φ) θb XX β B(Φ) β B(T ) β B β B the familiar decomposition given by (16) holds. References Baltagi B.H. (1985): Pooling Cross-Sections with Unequal me-series Lengths. Economics Letters 18 133 136. Baltagi B.H. (008): Econometric Analysis of Panel Data. Chichester: Wiley. Biørn E. (1981): Estimating Economic Relations from Incomplete Cross-Section/me-Series Data. Journal of Econometrics 16 1 36. Biørn E. (004): Regression Systems for Unbalanced Panel Data: A Stepwise Maximum Likelihood Procedure. Journal of Econometrics 1 81 91. Hsiao C. (003): Analysis of Panel Data. Cambridge: Cambridge University Press 1986. Lütkepohl H. (1996): Handbook of Matrices. Chichester: Wiley. Searle S.R. Casella G. and McCulloch C.E. (199): Variance Components. New York: Wiley. 6