Linear Model Under General Variance

We have a sample of T random variables y_1, y_2, ..., y_T satisfying the linear model Y = X β + e, where Y = (y_1, ..., y_T)' is a (T × 1) vector of random variables, X is a (T × K) matrix of explanatory variables, β is a (K × 1) vector of parameters, and e = (e_1, ..., e_T)' is a (T × 1) vector of error terms. Under the classical linear model we have assumed:

- Assumption A3: the e_t's are independently distributed.
- Assumption A4: V(e) = σ² I_T, implying that V(e_t) = σ², t = 1, ..., T (homoscedasticity).

Let's relax assumptions A3 and A4 and consider the more general case where V(e) = σ² ψ, where σ² is a positive scalar and ψ is a (T × T) symmetric, positive-definite matrix. Under this structure:

- the variance of e is proportional to the matrix ψ;
- the variance of e_t is allowed to vary across observations;
- non-zero covariance between e_t and e_t*, t ≠ t*, is allowed.

A Reformulation of the Standard Model

Consider the model Y = X β + e with V(e) = σ² ψ (model M). The (T × T) matrix ψ being symmetric and positive-definite, it is non-singular and can be written as ψ⁻¹ = P' P, where P is a (T × T) non-singular matrix. This is called the Cholesky decomposition of ψ⁻¹. It follows that ψ = (P' P)⁻¹ = P⁻¹ P'⁻¹, or P ψ P' = I_T. Let X* = P X, Y* = P Y, and e* = P e. Premultiplying model M by P gives P Y = P X β + P e, or Y* = X* β + e* (model M*). With P being non-singular, models M and M* are informationally equivalent. While V(e) = σ² ψ does not satisfy A3-A4 when ψ ≠ I_T, the transformed error does:

V(e*) = V(P e) = P V(e) P' = σ² P ψ P' = σ² P P⁻¹ P'⁻¹ P' = σ² I_T, since ψ = P⁻¹ P'⁻¹,

which satisfies assumptions A3-A4. While model M does not satisfy A3-A4, model M* does. This implies that all the results obtained for the classical linear model apply to model M*.

Estimation of β Under the General Variance Structure

Consider the error sum of squares for model M*: S = e*' e* = (P e)' (P e) = e' P' P e = e' ψ⁻¹ e, since P' P = ψ⁻¹. The term S = e' ψ⁻¹ e is called the weighted error sum of squares, where the weights involve the inverse of the ψ matrix (which is proportional to the variance of e). The value of β that minimizes S is β_g, the weighted least squares or generalized least squares (GLS) estimator of β. β_g is simply the least squares estimator under model M*:

β_g = (X*' X*)⁻¹ X*' Y*, since β_g is the least squares estimator of β in M*,
    = [(P X)' (P X)]⁻¹ (P X)' (P Y)
    = [X' P' P X]⁻¹ X' P' P Y
    = [X' ψ⁻¹ X]⁻¹ X' ψ⁻¹ Y, since P' P = ψ⁻¹.

The generalized least squares estimator of β is β_g = [X' ψ⁻¹ X]⁻¹ X' ψ⁻¹ Y.

Properties of β_g

Since β_g is simply the least squares estimator of β in model M* and model M* satisfies all the assumptions of the classical linear model, all of the classical results apply to model M*:

- β_g is also the maximum likelihood estimator of β in model M* when e is normally distributed.
- β_g is an unbiased estimator of β: E(β_g) = β.
- V(β_g) = σ² [X*' X*]⁻¹ = σ² [(P X)' (P X)]⁻¹ = σ² [X' P' P X]⁻¹, or, given ψ⁻¹ = P' P, V(β_g) = σ² [X' ψ⁻¹ X]⁻¹.
- β_g is the best linear unbiased estimator (BLUE) of β, implying that it is efficient in finite samples (its variance is smallest among all linear unbiased estimators).
- In large samples, β_g is a consistent estimator of β, an asymptotically efficient estimator of β, and asymptotically normal, with T^(1/2) (β_g - β) →d N(0, σ² [(X' ψ⁻¹ X)/T]⁻¹), or approximately β_g ~ N(β, σ² (X' ψ⁻¹ X)⁻¹) as T → ∞.

A Comparison of β_s and β_g

We have β_s = (X' X)⁻¹ X' Y as the least squares estimator of β, and β_g = [X' ψ⁻¹ X]⁻¹ X' ψ⁻¹ Y as the generalized least squares estimator of β. In general, β_s ≠ β_g whenever the matrix ψ is not proportional to the identity matrix I_T. Then, if ψ ≠ I_T, on what basis can we choose between these two estimators in the estimation of β in model M?
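
Before turning to that comparison, the algebra above can be checked numerically. The following is a minimal numpy sketch (simulated data; the design matrix, ψ, and parameter values are hypothetical) showing that ordinary least squares applied to the transformed model M* reproduces the direct GLS formula:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 200, 3

# Hypothetical design matrix and true parameters
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta = np.array([1.0, 2.0, -0.5])

# A known heteroskedastic psi: diagonal, with variances varying across observations
psi = np.diag(np.linspace(0.5, 3.0, T))
e = rng.multivariate_normal(np.zeros(T), psi)     # sigma^2 = 1 in this example
Y = X @ beta + e

# Direct GLS: beta_g = (X' psi^-1 X)^-1 X' psi^-1 Y
psi_inv = np.linalg.inv(psi)
b_g = np.linalg.solve(X.T @ psi_inv @ X, X.T @ psi_inv @ Y)

# Transformed model M*: psi^-1 = P'P, with P obtained from a Cholesky factorization
P = np.linalg.cholesky(psi_inv).T                 # P' P = psi^-1
X_star, Y_star = P @ X, P @ Y
b_g_star = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)

print(np.allclose(b_g, b_g_star))                 # True: the two routes coincide
```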

Bias of β_s Under the General Variance Structure

We have E(β_s) = E[(X' X)⁻¹ X' Y] = (X' X)⁻¹ X' E(Y) = (X' X)⁻¹ X' E[X β + e] = (X' X)⁻¹ X' X β + (X' X)⁻¹ X' E(e) = β, since E(e) = 0. Thus, the least squares estimator β_s is an unbiased estimator of β under model M.

Variance of β_s Under the General Variance Structure

Given E(β_s) = β under the general variance model M, V(β_s) = E[(β_s - β)(β_s - β)']. Now β_s - β = (X' X)⁻¹ X' Y - β = (X' X)⁻¹ X' (X β + e) - β = (X' X)⁻¹ X' e. Hence

V(β_s) = E[(X' X)⁻¹ X' e e' X (X' X)⁻¹] = (X' X)⁻¹ X' E(e e') X (X' X)⁻¹.

Given E(e e') = V(e) = σ² ψ, it follows that V(β_s) = σ² (X' X)⁻¹ X' ψ X (X' X)⁻¹.

In summary:

- When ψ = I_T, then β_g = β_s = (X' X)⁻¹ X' Y, and V(β_g) = V(β_s) = σ² (X' ψ⁻¹ X)⁻¹ = σ² (X' X)⁻¹ X' ψ X (X' X)⁻¹ = σ² (X' X)⁻¹.
- When ψ ≠ I_T under model M, in general β_g = (X' ψ⁻¹ X)⁻¹ X' ψ⁻¹ Y ≠ β_s, and V(β_g) = σ² (X' ψ⁻¹ X)⁻¹ ≠ V(β_s).

Efficiency of β_s Under the General Variance Structure

Applying the Gauss-Markov theorem to model M* implies:

- β_g = (X' ψ⁻¹ X)⁻¹ X' ψ⁻¹ Y is the best linear unbiased estimator (BLUE) of β in M.
- β_s = (X' X)⁻¹ X' Y is another linear unbiased estimator, with V(β_g) = σ² (X' ψ⁻¹ X)⁻¹ ≤ V(β_s) in the matrix sense (V(β_s) - V(β_g) is positive semi-definite).

In general, β_s is an inefficient estimator of β in model M (its variance is large compared to the variance of β_g).
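
To make the efficiency comparison concrete, a small sketch (same hypothetical setup as above, with σ² = 1) evaluates both variance formulas and checks that V(β_s) - V(β_g) is positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
psi = np.diag(np.linspace(0.5, 3.0, T))      # hypothetical heteroskedastic psi
sigma2 = 1.0

XtX_inv = np.linalg.inv(X.T @ X)
psi_inv = np.linalg.inv(psi)

# V(b_s) = sigma^2 (X'X)^-1 X' psi X (X'X)^-1   (OLS under general variance)
V_bs = sigma2 * XtX_inv @ X.T @ psi @ X @ XtX_inv
# V(b_g) = sigma^2 (X' psi^-1 X)^-1             (GLS)
V_bg = sigma2 * np.linalg.inv(X.T @ psi_inv @ X)

# Gauss-Markov applied to M*: V(b_s) - V(b_g) should be positive semi-definite
eigvals = np.linalg.eigvalsh(V_bs - V_bg)
print(eigvals.min() >= -1e-12)                # True: b_g is at least as efficient
```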

Consistency of β_s Under the General Variance Structure

Since β_s is an unbiased estimator of β, it is also asymptotically unbiased. Assume that, as T → ∞, (X' X/T) and (X' ψ X/T) each converge to a finite, non-singular matrix. This implies that

V(β_s) = σ² (X' X)⁻¹ X' ψ X (X' X)⁻¹ = (1/T) σ² (X' X/T)⁻¹ (X' ψ X/T) (X' X/T)⁻¹ → 0 as T → ∞.

Together with asymptotic unbiasedness, this implies that β_s is a consistent estimator of β in model M.

Estimation of σ² Under the General Variance Structure

When ψ ≠ I_T, model M does not satisfy the conditions of the classical linear model. This implies that σ²_l = (Y - X β_s)' (Y - X β_s)/T and σ²_u = (Y - X β_s)' (Y - X β_s)/(T - K) are in general biased and inconsistent estimators of σ². With models M and M* being informationally equivalent and model M* satisfying the conditions of the classical linear model, we can apply the classical results to M*: σ²_gl = (Y* - X* β_g)' (Y* - X* β_g)/T is a biased but consistent estimator of σ², where β_g = (X*' X*)⁻¹ X*' Y* = (X' ψ⁻¹ X)⁻¹ X' ψ⁻¹ Y. This implies σ²_gl = (P Y - P X β_g)' (P Y - P X β_g)/T = (Y - X β_g)' P' P (Y - X β_g)/T. Since P' P = ψ⁻¹, σ²_gl = (Y - X β_g)' ψ⁻¹ (Y - X β_g)/T, a biased but consistent estimator of σ². The classical results applied to M* also give σ²_gu = (Y* - X* β_g)' (Y* - X* β_g)/(T - K) as an unbiased and consistent estimator of σ², with σ²_gu = (P Y - P X β_g)' (P Y - P X β_g)/(T - K) = (Y - X β_g)' P' P (Y - X β_g)/(T - K).

Since P' P = ψ⁻¹, this gives σ²_gu = (Y - X β_g)' ψ⁻¹ (Y - X β_g)/(T - K), an unbiased and consistent estimator of σ². When ψ ≠ I_T, in general σ²_gl ≠ σ²_l and σ²_gu ≠ σ²_u, with only the generalized least squares versions being consistent estimators of σ².

Prediction Under the General Variance Structure

Let the sample information based on T observations be Y = X β + e, where Y is (T × 1), X is (T × K), e is (T × 1), and e ~ (0, σ² ψ). Consider a prediction scenario where the intent is to anticipate a new and unknown Y_0 given known explanatory variables X_0, where Y_0 is generated by Y_0 = X_0 β + e_0, with Y_0 (T_0 × 1), X_0 (T_0 × K), and e_0 = (Y_0 - X_0 β) a (T_0 × 1) vector satisfying e_0 ~ (0, σ² ψ_0) and Cov(e, e_0) = σ² C, where C is a (T × T_0) matrix. Note that when C ≠ 0, this allows for non-zero covariance between the error term of the sample and the error term of the prediction. The variance of (e, e_0) is, written in block form row by row,

V(e, e_0) = σ² [ψ  C; C'  ψ_0],

where [ψ  C; C'  ψ_0] is a symmetric, positive-definite (and thus non-singular) matrix.

An Alternative Formulation

Consider the Cholesky decomposition P of [ψ  C; C'  ψ_0]⁻¹: that is, [ψ  C; C'  ψ_0]⁻¹ = P' P, where P = [P_1  0; P_2  P_3] is a (T + T_0) × (T + T_0) non-singular matrix. This implies [ψ  C; C'  ψ_0] = P⁻¹ P'⁻¹, or P [ψ  C; C'  ψ_0] P' = I, i.e.

[P_1  0; P_2  P_3] [ψ  C; C'  ψ_0] [P_1'  P_2'; 0  P_3'] = [I_T  0; 0  I_{T_0}].

It follows that (P_2 ψ + P_3 C') P_1' = 0, or P_2 ψ = -P_3 C', or -P_3⁻¹ P_2 = C' ψ⁻¹. Consider model Q, obtained by stacking the sample and prediction observations:

[Y; Y_0] = [X; X_0] β + [e; e_0].

Premultiplying by P gives model Q*:

[Y*; Y_0*] = [X*; X_0*] β + [e*; e_0*],

where [Y*; Y_0*] = P [Y; Y_0], so Y* = P_1 Y and Y_0* = P_2 Y + P_3 Y_0; [X*; X_0*] = P [X; X_0], so X* = P_1 X and X_0* = P_2 X + P_3 X_0; and [e*; e_0*] = P [e; e_0], so e* = P_1 e and e_0* = P_2 e + P_3 e_0. Note that, since the matrix P is non-singular, models Q and Q* are informationally equivalent.

Predicting Y_0 Under the General Variance Structure

V(e*, e_0*) = P V(e, e_0) P' = σ² P [ψ  C; C'  ψ_0] P' = σ² times the (T + T_0)-dimensional identity matrix. Thus Q* satisfies all the assumptions of the traditional linear regression model. It follows that X_0* β_g is the best linear unbiased predictor of Y_0*: E(Y_0* - X_0* β_g) = 0, and X_0* β_g has the smallest variance among linear unbiased predictors of Y_0*. Since Y_0* = P_2 Y + P_3 Y_0 and X_0* = P_2 X + P_3 X_0, we have Y_0 = P_3⁻¹ Y_0* - P_3⁻¹ P_2 Y, and the corresponding predictor of Y_0 is

P_3⁻¹ [X_0* β_g] - P_3⁻¹ P_2 Y = P_3⁻¹ [(P_2 X + P_3 X_0) β_g] - P_3⁻¹ P_2 Y = X_0 β_g - P_3⁻¹ P_2 [Y - X β_g].

Since -P_3⁻¹ P_2 = C' ψ⁻¹, this implies that the best linear unbiased predictor of Y_0 is X_0 β_g + C' ψ⁻¹ [Y - X β_g]. This predictor is unbiased in the sense that E[Y_0 - (X_0 β_g + C' ψ⁻¹ [Y - X β_g])] = 0, and it is best in the sense that it has the smallest possible variance among all linear unbiased predictors of Y_0. The predictor of Y_0 reduces to X_0 β_g if C = 0, but differs from X_0 β_g if C ≠ 0.
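
A minimal sketch of this predictor follows, under a hypothetical joint covariance structure for (e, e_0) chosen only to guarantee positive-definiteness (an AR(1)-type correlation pattern; all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T, T0, K = 100, 5, 3
rho = 0.6

# Joint covariance of (e, e_0): AR(1)-type structure over T + T0 points
idx = np.arange(T + T0)
Omega = rho ** np.abs(idx[:, None] - idx[None, :])
psi, psi0 = Omega[:T, :T], Omega[T:, T:]
C = Omega[:T, T:]                       # Cov(e, e_0)/sigma^2, a (T x T0) block

beta = np.array([1.0, 2.0, -0.5])
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
X0 = np.column_stack([np.ones(T0), rng.normal(size=(T0, K - 1))])

# One draw of the sample and of the future observations
e_all = rng.multivariate_normal(np.zeros(T + T0), Omega)
Y = X @ beta + e_all[:T]
Y0 = X0 @ beta + e_all[T:]

# GLS estimate from the sample, then the BLUP of Y0
psi_inv = np.linalg.inv(psi)
b_g = np.linalg.solve(X.T @ psi_inv @ X, X.T @ psi_inv @ Y)
Y0_hat = X0 @ b_g + C.T @ psi_inv @ (Y - X @ b_g)

print(np.column_stack([Y0, Y0_hat]))    # realized future values vs. their BLUP
```

Because C ≠ 0 here, the predictor includes the correction term C' ψ⁻¹ (Y - X β_g) and differs from the naive X_0 β_g.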

The prediction error is ε = Y_0 - (X_0 β_g + C' ψ⁻¹ [Y - X β_g]). The variance of the prediction error is

V(ε) = V(Y_0 - X_0 β_g - C' ψ⁻¹ [Y - X β_g])
     = V[P_3⁻¹ (Y_0* - X_0* β_g)], since -P_3⁻¹ P_2 = C' ψ⁻¹,
     = P_3⁻¹ V(Y_0* - X_0* β_g) P_3'⁻¹
     = σ² P_3⁻¹ [I_{T_0} + X_0* (X*' X*)⁻¹ X_0*'] P_3'⁻¹, using results from the classical linear model applied to model Q*,
     = σ² [P_3⁻¹ P_3'⁻¹ + P_3⁻¹ X_0* (X*' X*)⁻¹ X_0*' P_3'⁻¹]
     = σ² [ψ_0 - C' ψ⁻¹ C + (P_3⁻¹ P_2 X + X_0)(X*' X*)⁻¹ (X' P_2' P_3'⁻¹ + X_0')], (proving this step is a little tedious)
     = σ² [ψ_0 - C' ψ⁻¹ C + (X_0 - C' ψ⁻¹ X)(X' ψ⁻¹ X)⁻¹ (X_0' - X' ψ⁻¹ C)].

Note that the variance of the prediction error satisfies V(ε) = σ² [ψ_0 + X_0 (X' ψ⁻¹ X)⁻¹ X_0'] if C = 0, but V(ε) ≠ σ² [ψ_0 + X_0 (X' ψ⁻¹ X)⁻¹ X_0'] if C ≠ 0.

Hypothesis Testing Under the General Variance Structure

Assume we have the model Y = X β + e, where e ~ (0, σ² ψ). Consider a hypothesis consisting of J linear restrictions on β:

Null hypothesis H_0: R β = r.
Alternative hypothesis H_1: R β ≠ r,

where R is a known (J × K) matrix of rank J, and r is a known (J × 1) vector. With the (T × T) matrix ψ known, the unrestricted generalized least squares estimator of β is β_g = [X' ψ⁻¹ X]⁻¹ X' ψ⁻¹ Y. We have shown that β_g is an unbiased, consistent, and efficient estimator of β. An unbiased estimator of σ² is σ²_gu = (Y - X β_g)' ψ⁻¹ (Y - X β_g)/(T - K), and an unbiased estimator of the variance of β_g is σ²_gu [X' ψ⁻¹ X]⁻¹.

Under the null hypothesis H_0, the restricted generalized least squares estimator of β is β_gr = β_g + C R' [R C R']⁻¹ [r - R β_g], where C = [X' ψ⁻¹ X]⁻¹ (this C is unrelated to the covariance matrix of the prediction section). Applying the results of the classical linear model to model M* gives the test statistic

λ = (WSSE_R - WSSE_U)/(J σ²_gu)
  = (β_g - β_gr)' X' ψ⁻¹ X (β_g - β_gr)/(J σ²_gu)
  = (R β_g - r)' [R (X' ψ⁻¹ X)⁻¹ R']⁻¹ (R β_g - r)/(J σ²_gu),

where WSSE_R = (Y - X β_gr)' ψ⁻¹ (Y - X β_gr) is the weighted restricted error sum of squares and WSSE_U = (Y - X β_g)' ψ⁻¹ (Y - X β_g) is the weighted unrestricted error sum of squares. Under H_0 and assuming that e ~ N(0, σ² ψ), the test statistic λ is distributed as F(J, T - K). With normality, the following test procedure can be used:

- Choose the significance level α = P(type-I error).
- Find λ_c satisfying α = P(F(J, T - K) ≥ λ_c).
- Reject H_0 if λ > λ_c; accept H_0 if λ ≤ λ_c.

If J = 1, consider using the square root of λ: t = (R β_g - r)/[σ_gu (R (X' ψ⁻¹ X)⁻¹ R')^(1/2)], with t ~ t(T - K) under H_0, which implies the following equivalent test procedure:

- Choose the significance level α = P(type-I error).
- Find t_c satisfying α/2 = P(t(T - K) ≥ t_c).
- Reject H_0 if |t| > t_c; accept H_0 if |t| ≤ t_c.

Estimation of β, σ², and ψ When ψ Is Not Known

We have discussed the estimation of β and σ² assuming the (T × T) matrix ψ is known. We proposed the unbiased estimators β_g = [X' ψ⁻¹ X]⁻¹ X' ψ⁻¹ Y and σ²_gu = (Y - X β_g)' ψ⁻¹ (Y - X β_g)/(T - K). Note that both estimators depend on ψ. This is fine if ψ is known. However, it creates a problem if ψ is not known to the investigator: in that case, our proposed estimators are not empirically tractable (since they depend on the unknown ψ).
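
Before turning to the unknown-ψ case, the test procedure above can be sketched as follows (ψ treated as known; the data, R, and r are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T, K, J = 200, 3, 1
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
psi = np.diag(np.linspace(0.5, 3.0, T))           # hypothetical known psi
Y = X @ beta + rng.multivariate_normal(np.zeros(T), psi)

# Hypothetical restriction H0: beta_3 = 0  (R beta = r with J = 1)
R = np.array([[0.0, 0.0, 1.0]])
r = np.array([0.0])

psi_inv = np.linalg.inv(psi)
XpsiX_inv = np.linalg.inv(X.T @ psi_inv @ X)
b_g = XpsiX_inv @ X.T @ psi_inv @ Y
resid = Y - X @ b_g
sigma2_gu = resid @ psi_inv @ resid / (T - K)     # unbiased estimator of sigma^2

# lambda = (R b_g - r)' [R (X' psi^-1 X)^-1 R']^-1 (R b_g - r) / (J sigma2_gu)
diff = R @ b_g - r
lam = diff @ np.linalg.solve(R @ XpsiX_inv @ R.T, diff) / (J * sigma2_gu)

alpha = 0.05
lam_c = stats.f.ppf(1 - alpha, J, T - K)          # critical value of F(J, T - K)
print(lam, lam_c, lam > lam_c)                    # reject H0 if lam > lam_c
```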

Let's consider the case where the (T × T) matrix ψ is unknown and needs to be estimated. Thus, given a sample Y, we look for estimators β^e of β, (σ²)^e of σ², and ψ^e of ψ. A simple and intuitive way to proceed is first to choose some estimator ψ^e of ψ, and then to substitute it into our proposed estimators to obtain β^e = [X' (ψ^e)⁻¹ X]⁻¹ X' (ψ^e)⁻¹ Y as an estimator of β, and (σ²)^e = (Y - X β^e)' (ψ^e)⁻¹ (Y - X β^e)/(T - K) as an estimator of σ². This is the essence of the estimation method discussed below.

This simple approach raises some difficult questions in evaluating the statistical properties of the estimators. The reason is that, since β^e now depends explicitly on ψ^e, the estimators β^e and ψ^e are necessarily correlated random variables. Note that this differs significantly from the classical linear model, where β_s and σ²_u conveniently happened to be uncorrelated. This means that the small-sample properties of the estimator can be complex and difficult to establish. However, large-sample properties of the estimator remain available. Being easier to evaluate, we will rely extensively on such asymptotic properties.

Some Key Asymptotic Results Under the General Variance Structure

Assume that plim[(X' ψ⁻¹ X)/T] exists and is a (K × K) finite, non-singular matrix. Let ψ^e be a consistent estimator of ψ. Then plim[(X' (ψ^e)⁻¹ X)/T] = plim[(X' ψ⁻¹ X)/T] and plim[(X' (ψ^e)⁻¹ e)/T^(1/2)] = plim[(X' ψ⁻¹ e)/T^(1/2)]. The estimator β_fg = [X' (ψ^e)⁻¹ X]⁻¹ X' (ψ^e)⁻¹ Y is called the feasible generalized least squares estimator of β. When ψ^e is a consistent estimator of ψ, it can be shown that β_fg has the same asymptotic distribution as β_g = [X' ψ⁻¹ X]⁻¹ X' ψ⁻¹ Y.

This is an important result, since we already know the asymptotic properties of β_g. It implies the following asymptotic properties for the feasible generalized least squares estimator β_fg. When ψ^e is a consistent estimator of ψ, the estimator β_fg of β is:

- asymptotically unbiased
- consistent
- asymptotically efficient
- asymptotically normal, with T^(1/2) (β_fg - β) →d N(0, σ² [(X' ψ⁻¹ X)/T]⁻¹), or approximately β_fg ~ N(β, σ² [X' ψ⁻¹ X]⁻¹) as T → ∞.

A Proposed Estimation Procedure With a General Variance Structure

We propose the following three-step estimation procedure (a code sketch follows at the end of this section):

1. Obtain the least squares estimator β_s = (X' X)⁻¹ X' Y, a consistent estimator of β, and from these estimates generate the residuals e_s = Y - X β_s as a consistent estimator of e.
2. Use e_s to obtain consistent estimators ψ^e of ψ and (σ²)^e of σ².
3. Obtain the feasible generalized least squares estimator β_fg = [X' (ψ^e)⁻¹ X]⁻¹ X' (ψ^e)⁻¹ Y.

This estimator β_fg of β is consistent, asymptotically efficient, and satisfies β_fg ~ N(β, σ² [X' ψ⁻¹ X]⁻¹) approximately as T → ∞. It follows that (σ²)^e [X' (ψ^e)⁻¹ X]⁻¹ is a consistent estimator of V(β_fg), which can be used to conduct asymptotic tests about β (e.g., using a Wald test).

The above procedure is written in a very general form. How it gets implemented typically depends on the model specification for ψ. For that reason, we proceed with an analysis of more specific models.
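
As an illustration before moving to those specific models, here is a minimal sketch of the three-step procedure under one hypothetical specification for ψ: a diagonal ψ with Var(e_t) depending on an observed variable z_t through exp(γ_0 + γ_1 z_t). The variance model, data, and names are assumptions made for the example, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 400, 3

# Hypothetical data: error variance depends on an observed variable z
z = rng.uniform(0.0, 2.0, size=T)
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
e = rng.normal(scale=np.exp(0.5 * (0.3 + 0.8 * z)), size=T)  # Var(e_t) = exp(0.3 + 0.8 z_t)
Y = X @ beta + e

# Step 1: OLS estimates and residuals
b_s = np.linalg.solve(X.T @ X, X.T @ Y)
e_s = Y - X @ b_s

# Step 2: consistent estimator of psi under the assumed variance model;
#         regressing log(e_t^2) on (1, z_t) recovers the variance pattern up to scale
Z = np.column_stack([np.ones(T), z])
gamma = np.linalg.solve(Z.T @ Z, Z.T @ np.log(e_s ** 2))
psi_e = np.diag(np.exp(Z @ gamma))

# Step 3: feasible GLS and the estimated variance of b_fg
psi_e_inv = np.linalg.inv(psi_e)
XpsiX_inv = np.linalg.inv(X.T @ psi_e_inv @ X)
b_fg = XpsiX_inv @ X.T @ psi_e_inv @ Y
resid = Y - X @ b_fg
sigma2_e = resid @ psi_e_inv @ resid / (T - K)
V_bfg = sigma2_e * XpsiX_inv

# Asymptotic (Wald-type) t-ratios for each coefficient
print(b_fg / np.sqrt(np.diag(V_bfg)))
```

The log-squared-residual regression identifies the variance pattern only up to a multiplicative constant, which is harmless here since ψ itself is defined only up to the scalar σ².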