Econometria. Estimation and hypotheses testing in the uni-equational linear regression model: cross-section data. Luca Fanelli, University of Bologna, luca.fanelli@unibo.it

Estimation and hypotheses testing in the uni-equational regression model:
- Model
- Compact representation
- OLS
- GLS
- ML
- Constrained estimation
- Testing linear hypotheses

Estimation: cross-section data. Uni-equational linear regression model:

$$y_i = x_i'\beta + u_i, \quad i = 1, 2, \dots, n$$

classical: $Cov(u_i, u_j) := 0$ $\forall i \neq j$, $\quad E(u_i^2) := \sigma_u^2$ $\forall i$

generalized: $Cov(u_i, u_j) := \sigma_{i,j}$ (zero or not), $\quad E(u_i^2) := \sigma_i^2$

We know that the generalized model has to be intended as `virtual' unless further information is used ($\sigma_{i,j}$ and $\sigma_i^2$ are generally unknown).

Compact matrix representation:

$$y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}, \qquad E(u) = 0_{n\times 1}, \quad E(uu') := \Sigma_{n\times n}$$

Why do we use this representation? Because it is useful in some cases! It is particularly useful in order to derive the estimators of interest.

Write $y_i = x_i'\beta + u_i$, $i = 1, 2, \dots, n$ as:

$$y_1 = x_1'\beta + u_1 \;(i:=1), \quad y_2 = x_2'\beta + u_2 \;(i:=2), \quad \dots, \quad y_n = x_n'\beta + u_n \;(i:=n).$$

Compactly:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}\beta + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \qquad y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}.$$

What about the matrix $E(uu') := \Sigma_{n\times n}$?

$$E(uu') := E\left(\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}\,[u_1, u_2, \dots, u_n]\right) = E\begin{bmatrix} u_1^2 & u_1u_2 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & & \vdots \\ \vdots & & \ddots & \\ u_nu_1 & u_nu_2 & \cdots & u_n^2 \end{bmatrix}$$

$$= \begin{bmatrix} E(u_1^2) & E(u_1u_2) & \cdots & E(u_1u_n) \\ E(u_2u_1) & E(u_2^2) & & \vdots \\ \vdots & & \ddots & \\ E(u_nu_1) & E(u_nu_2) & \cdots & E(u_n^2) \end{bmatrix} = \begin{bmatrix} E(u_1^2) & Cov(u_1,u_2) & \cdots & Cov(u_1,u_n) \\ Cov(u_2,u_1) & E(u_2^2) & & \vdots \\ \vdots & & \ddots & \\ Cov(u_n,u_1) & Cov(u_n,u_2) & \cdots & E(u_n^2) \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \sigma_{1,2} & \cdots & \sigma_{1,n} \\ \sigma_{2,1} & \sigma_2^2 & & \vdots \\ \vdots & & \ddots & \\ \sigma_{n,1} & \sigma_{n,2} & \cdots & \sigma_n^2 \end{bmatrix}.$$

It is clear that

$$\Sigma := \begin{cases} \sigma_u^2 I_n & \text{classical model} \\ \neq \sigma_u^2 I_n & \text{generalized model.} \end{cases}$$

To sum up, the compact representation of the uni-equational linear regression model based on cross-section data is:

$$y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}$$

where $E(u) = 0_{n\times 1}$, $E(uu') := \Sigma_{n\times n}$ and

$$\Sigma := \begin{cases} \sigma_u^2 I_n & \text{classical model} \\ \neq \sigma_u^2 I_n & \text{generalized model.} \end{cases}$$

The representations $y_i = x_i'\beta + u_i$, $i = 1, 2, \dots, n$, and $y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}$ are interchangeable. The link between the two is immediately obtained by recalling that

$$X'X \equiv \sum_{i=1}^{n} x_i x_i', \qquad X'y \equiv \sum_{i=1}^{n} x_i y_i.$$
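As a quick numerical check of this equivalence, the following sketch (my own toy illustration with simulated data, not part of the slides) builds $X'X$ and $X'y$ both ways and verifies that they coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3
X = rng.normal(size=(n, k))   # n observations, k regressors
y = rng.normal(size=n)

# Compact form: matrix products
XtX_compact = X.T @ X
Xty_compact = X.T @ y

# Observation-by-observation form: sums of x_i x_i' and x_i y_i
XtX_sum = sum(np.outer(X[i], X[i]) for i in range(n))
Xty_sum = sum(X[i] * y[i] for i in range(n))

print(np.allclose(XtX_compact, XtX_sum))  # True
print(np.allclose(Xty_compact, Xty_sum))  # True
```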

We have introduced the classical uni-equational regression model in the following way:

$$y_i = x_i'\beta + u_i, \quad i = 1, 2, \dots, n$$
$$E(u_i \mid x_i) = 0 \;\Rightarrow\; E(u_i) = 0$$

Hp: $E(u_i^2 \mid x_i) = \sigma_u^2$; Hp: $Cov(u_i, u_j) = 0$ $\forall i \neq j$.

Using the compact representation, we can say even more about the features of this model.

In particular, given the conditional nature of the model (we are conditioning with respect to the regressors), we will be writing

$$y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}$$
$$E(u \mid X) = 0_{n\times 1} \;\Rightarrow\; E(u) = 0_{n\times 1}$$
$$E(uu' \mid X) := \Sigma_{n\times n} \;\Rightarrow\; E(uu') = \Sigma_{n\times n}$$

so that $u$ and $X$ can be thought of as being stochastically independent in the sense that

$$E(u \mid X) = E(u), \qquad E(uu' \mid X) = E(uu').$$

It is useful to focus on the meaning of the $n\times 1$ vector $E(u \mid X)$. From the previous set of slides we know:

$$E(u \mid X) := E\left(\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} \Big|\; X\right) := \begin{bmatrix} E(u_1 \mid X) \\ \vdots \\ E(u_n \mid X) \end{bmatrix}.$$

Now consider, e.g., $E(u_1 \mid X)$:

$E(u_1 \mid X) \equiv E(u_1 \mid \text{all stochastic variables within } X)$,
$E(u_2 \mid X) \equiv E(u_2 \mid \text{all stochastic variables within } X)$,
$\dots$
$E(u_n \mid X) \equiv E(u_n \mid \text{all stochastic variables within } X)$.

Thus $E(u_n \mid X) = 0$ means that, conditional on the knowledge of all the regressors $x_1$ ($i:=1$), $x_2$ ($i:=2$), ..., $x_n$ ($i:=n$) that enter the matrix $X$, the expected value of $u_n$ is zero, as is the unconditional expectation:

$$E(u_n \mid X) = E(u_n) = 0.$$

The same holds for $E(u_{n-1} \mid X)$, $E(u_{n-2} \mid X)$, ..., $E(u_1 \mid X)$, and allows us to write

$$E(u \mid X) = 0_{n\times 1} \;\Rightarrow\; E_X\big(E(u \mid X)\big) = 0_{n\times 1}.$$

The condition $E(u \mid X) = 0_{n\times 1}$ is usually assumed with cross-section data and is stronger than it apparently seems. Why? Example on the blackboard.

$E(u \mid X) = 0_{n\times 1} \;\Rightarrow\; E(u) = 0_{n\times 1}$ (law of iterated expectations). The reverse is not true: $E(u) = 0_{n\times 1}$ does not imply $E(u \mid X) = 0_{n\times 1}$!

For regression models based on time series data, where lags of $y$ appear among the regressors, the condition $E(u \mid X) = 0_{n\times 1}$ does not hold (although $E(u) = 0_{n\times 1}$ still holds). Example on the blackboard based on:

$$c_i = \beta_0 + \beta_1 c_{i-1} + \beta_2 z_i + u_i, \quad i = 1, \dots, n.$$

Classical model with cross-section data: OLS estimation.

$$y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}, \qquad E(u \mid X) = 0_{n\times 1}, \quad E(uu' \mid X) := \sigma_u^2 I_n.$$

Vector of $k+1$ unknown parameters: $\theta = (\beta', \sigma_u^2)'$.

Objective function:

$$Q(\beta) = \frac{1}{\sigma_u^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2 \equiv \frac{1}{\sigma_u^2}(y - X\beta)'(y - X\beta).$$

Given

$$Q(\beta) = \frac{1}{\sigma_u^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2 \equiv \frac{1}{\sigma_u^2}(y - X\beta)'(y - X\beta),$$

the OLS estimator of $\beta$ is obtained by solving the problem $\min_{\beta} Q(\beta)$, i.e. the OLS estimator of $\beta$ is the vector that solves:

$$\hat\beta_{OLS} = \arg\min_{\beta} Q(\beta).$$

First-order conditions:

$$\frac{\partial Q(\beta; \sigma_u^2)}{\partial\beta} = \begin{pmatrix} \partial Q(\beta; \sigma_u^2)/\partial\beta_1 \\ \partial Q(\beta; \sigma_u^2)/\partial\beta_2 \\ \vdots \\ \partial Q(\beta; \sigma_u^2)/\partial\beta_k \end{pmatrix} = 0_{k\times 1} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

The needed derivative rules are in Appendix 3.A.3 (Ch. 3) of the textbook. Exercise at home.

The solution of this problem leads us to

$$\hat\beta_{OLS} := \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\sum_{i=1}^{n} x_i y_i\right) \equiv \left(X'X\right)^{-1}X'y.$$

The condition $rank(X) = k \;\Rightarrow\; X'X$ non-singular is crucial to obtain a valid (unique) OLS estimator.

The estimator of $\sigma_u^2$ is obtained indirectly:

$$\hat\sigma_u^2 = \frac{1}{n-k}\left(\sum_{i=1}^{n}\hat u_i^2\right) \equiv \frac{1}{n-k}\,\hat u'\hat u,$$

where $\hat u_i = y_i - x_i'\hat\beta_{OLS}$, $i = 1, \dots, n$, or, alternatively, $\hat u = (y - X\hat\beta_{OLS})$.
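A minimal sketch of these two formulas on simulated data (the data-generating values below are my own illustration, not from the slides):

```python
import numpy as np

def ols(y, X):
    # beta_hat = (X'X)^{-1} X'y ; sigma2_hat = u'u / (n - k)
    n, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - k)
    return beta_hat, sigma2_hat, resid

# Simulated example (illustrative values only)
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

beta_hat, sigma2_hat, _ = ols(y, X)
print(beta_hat, sigma2_hat)
```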

Are the estimators of $\beta$ and $\sigma_u^2$ correct (unbiased)? That is,

$$E(\hat\beta_{OLS}) = \beta, \qquad E(\hat\sigma_u^2) = \sigma_u^2\;?$$

(if yes, under which conditions?)

Consider that

$$\hat\beta_{OLS} := (X'X)^{-1}X'[X\beta + u] = \beta + (X'X)^{-1}X'u$$

and

$$E(\hat\beta_{OLS}) := E_X\left[E\left(\hat\beta_{OLS} \mid X\right)\right].$$

Likewise, $E(\hat\sigma_u^2) := E_X\left[E\left(\hat\sigma_u^2 \mid X\right)\right]$.

Estimator of $\beta$:

$$E\left(\hat\beta_{OLS} \mid X\right) = \beta + (X'X)^{-1}X'E(u \mid X).$$

Hence, if $E(u \mid X) = 0_{n\times 1}$, one has

$$E\left(\hat\beta_{OLS} \mid X\right) = \beta \;\Rightarrow\; E(\hat\beta_{OLS}) := E_X\left[E\left(\hat\beta_{OLS} \mid X\right)\right] = E_X(\beta) = \beta;$$

the OLS estimator is correct. Note that if $E(u \mid X) \neq 0_{n\times 1}$, the estimator is no longer correct (this happens in the regression model with time series data in which the regressors include lags of $y$).

Estimator of $\sigma_u^2$: To check whether the estimator $\hat\sigma_u^2$ is correct, we start from some computations and considerations:

$$\hat u = (y - X\hat\beta_{OLS}) = (y - X(X'X)^{-1}X'y) = (X\beta + u) - X(X'X)^{-1}X'(X\beta + u) = X\beta + u - X\beta - X(X'X)^{-1}X'u = (I_n - X(X'X)^{-1}X')u.$$

The matrix $(I_n - X(X'X)^{-1}X')$ will be indicated with the symbol $M_{XX}$.

$M_{XX} := (I_n - X(X'X)^{-1}X')$ is a `special' matrix (idempotent matrix):

- it is symmetric: $M_{XX}' = M_{XX}$;
- it is such that $M_{XX}M_{XX} = M_{XX}$;
- $rank(M_{XX}) = tr(M_{XX}) = n - k$.

(Properties of the trace operator on the blackboard.)
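These three properties are easy to verify numerically; the following sketch (my own simulated design matrix, not from the slides) checks symmetry, idempotency and the trace:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 4
X = rng.normal(size=(n, k))

# M_XX = I_n - X (X'X)^{-1} X'
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(M, M.T))              # symmetric
print(np.allclose(M @ M, M))            # idempotent
print(np.isclose(np.trace(M), n - k))   # trace = rank = n - k
```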

We have proved that $\hat u := (y - X\hat\beta_{OLS}) = M_{XX}u$. Then

$$E\left(\hat\sigma_u^2 \mid X\right) = E\left(\frac{1}{n-k}\hat u'\hat u \,\Big|\, X\right) = \frac{1}{n-k}E\left(u'M_{XX}u \mid X\right) = \frac{1}{n-k}E\left(tr\left(u'M_{XX}u\right) \mid X\right)$$
$$= \frac{1}{n-k}E\left(tr\left(M_{XX}uu'\right) \mid X\right) = \frac{1}{n-k}tr\left(E\left(M_{XX}uu' \mid X\right)\right) = \frac{1}{n-k}tr\left(M_{XX}E\left(uu' \mid X\right)\right) = \frac{1}{n-k}tr\left(M_{XX}\,\sigma_u^2 I_n\right) = \frac{\sigma_u^2}{n-k}tr(M_{XX}) = \frac{\sigma_u^2}{n-k}(n-k) = \sigma_u^2.$$

We have proved that

$$E(\hat\sigma_u^2 \mid X) = \sigma_u^2 \;\Rightarrow\; E(\hat\sigma_u^2) := E_X\left[E\left(\hat\sigma_u^2 \mid X\right)\right] = E_X(\sigma_u^2) = \sigma_u^2;$$

the OLS estimator of $\sigma_u^2$ is correct.

Covariance matrix of $\hat\beta_{OLS}$:

$$Var\left(\hat\beta_{OLS} \mid X\right) = Var\left(\beta + (X'X)^{-1}X'u \mid X\right) = Var\left((X'X)^{-1}X'u \mid X\right) := (X'X)^{-1}X'\,\sigma_u^2 I_n\,X(X'X)^{-1} = \sigma_u^2(X'X)^{-1}.$$

Recall section. Let $v$ be a $p\times 1$ stochastic vector with $Var(v) := V$ (symmetric positive definite) and $A$ an $m\times p$ non-stochastic matrix. Then (sandwich rule)

$$Var(Av) = A_{m\times p}\,V_{p\times p}\,A'_{p\times m}.$$

Assume now that $A$ is also stochastic, i.e. its elements are random variables. Then the quantity $Var(Av \mid A)$ means that we can treat $A$ as if it were a non-stochastic matrix! Hence

$$Var(Av \mid A) := A\,Var(v \mid A)\,A'.$$

End of recall section.

Note that

$$Var\left(\hat\beta_{OLS}\right) := E_X\left[Var\left(\hat\beta_{OLS} \mid X\right)\right] = E_X\left[\sigma_u^2(X'X)^{-1}\right] = \sigma_u^2\,E\left[(X'X)^{-1}\right] = \sigma_u^2\,E\left[\Big(\textstyle\sum_{i=1}^{n} x_i x_i'\Big)^{-1}\right].$$

To sum up: The OLS estimator of $\beta$ in the linear classical model based on cross-section data is correct, $E(\hat\beta_{OLS}) = \beta$, and has (conditional) covariance matrix

$$Var\left(\hat\beta_{OLS} \mid X\right) := \sigma_u^2(X'X)^{-1}.$$

The OLS estimator of $\sigma_u^2$ in the linear classical model based on cross-section data is correct, $E(\hat\sigma_u^2) = \sigma_u^2$. Usually we are not interested in the variance of the estimator $\hat\sigma_u^2$, but in principle we can also derive it.

A famous theorem (the Gauss-Markov Theorem) applies to the case in which $X$ does not contain stochastic variables. It says that, given the linear model

$$y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}, \qquad E(u) = 0_{n\times 1}, \quad E(uu') := \sigma_u^2 I_n,$$

the OLS estimator is, in the class of linear correct estimators, the one whose covariance matrix $Var(\hat\beta_{OLS}) := \sigma_u^2(X'X)^{-1}$ is the `most efficient' (minimum covariance matrix).

In our case, $X$ is stochastic (we condition the model with respect to the regressors). A version of the Gauss-Markov theorem where $X$ is stochastic and all variables are conditioned with respect to the elements of $X$ still applies. This means that any linear correct estimator different from the OLS estimator has a (conditional) covariance matrix `larger' than

$$Var\left(\hat\beta_{OLS} \mid X\right) := \sigma_u^2(X'X)^{-1}.$$

This is why the OLS estimator applied to cross-section data (under homoskedasticity) is often called BLUE (Best Linear Unbiased Estimator).

Recall section. Let $\Sigma_1$ and $\Sigma_2$ be two symmetric positive definite matrices. We say that $\Sigma_1$ `is larger' than $\Sigma_2$ if the matrix $\Sigma_1 - \Sigma_2$ is positive semidefinite (this means that the eigenvalues of $\Sigma_1 - \Sigma_2$ are either positive or zero).

What are eigenvalues? Given a $q\times q$ matrix $A$, the eigenvalues $\lambda$ are the solutions (which can be complex numbers) to the problem

$$\det(\lambda I_q - A) = 0.$$

If the matrix $A$ is symmetric, the eigenvalues are real numbers. If $A$ is symmetric and positive definite, the eigenvalues are real and positive. End of recall section.
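A quick numerical illustration of this ordering (the two matrices below are my own toy example): compute the eigenvalues of $\Sigma_1 - \Sigma_2$ and check that they are all non-negative.

```python
import numpy as np

Sigma1 = np.array([[4.0, 1.0],
                   [1.0, 3.0]])
Sigma2 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])

# Sigma1 "is larger" than Sigma2 if Sigma1 - Sigma2 is positive semidefinite,
# i.e. all eigenvalues of the symmetric difference are >= 0.
eigvals = np.linalg.eigvalsh(Sigma1 - Sigma2)
print(eigvals, np.all(eigvals >= -1e-12))
```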

Why do we care about the covariance matrix of the estimator $\hat\beta_{OLS}$? Because it is fundamental to make inference!

Imagine that

$$\hat\beta_{OLS} \mid X \sim N\left(\beta,\; \sigma_u^2(X'X)^{-1}\right), \qquad \hat G \mid X \sim \chi^2(n-k), \quad \hat G := \frac{\hat\sigma_u^2}{\sigma_u^2}(n-k).$$

These distributions hold if $u \mid X \sim N(0, \sigma_u^2 I_n)$ (irrespective of whether $n$ is small or large). It is possible to show that the random vector $\hat\beta_{OLS} \mid X$ is independent of the random variable $\hat G \mid X$.

Observe that, given

$$\hat\beta_{OLS} \mid X \sim N\left(\beta,\; \sigma_u^2(X'X)^{-1}\right),$$

then

$$\frac{\hat\beta_j - \beta_j}{\sigma_u\,(c_{jj})^{1/2}} \;\Big|\; X \sim N(0, 1), \qquad j = 1, \dots, k,$$

where $\hat\beta_j$ is the $j$-th element of $\hat\beta$, $\beta_j$ is the $j$-th element of $\beta$ and $c_{jj}$ is the $j$-th element on the main diagonal of the matrix $(X'X)^{-1}$.

It is then easy to construct a statistical test for

$$H_0: \beta_j := 0 \quad \text{vs} \quad H_1: \beta_j \neq 0.$$

Recall that, according to the theory of distributions, if $Z \sim N(0, 1)$, $G \sim \chi^2(q)$ and $Z$ and $G$ are independent, then

$$\frac{Z}{\left(G/q\right)^{1/2}} \sim t(q).$$

Consider $H_0: \beta_j := 0$ vs $H_1: \beta_j \neq 0$ and the test statistic (t-ratio):

$$t := \frac{\hat\beta_j - 0}{\hat\sigma_u\,(c_{jj})^{1/2}} = \frac{\hat\beta_j}{s.e.(\hat\beta_j)}.$$

Then

$$t := \frac{\hat\beta_j - 0}{s.e.(\hat\beta_j)} = \frac{\hat\beta_j - 0}{\hat\sigma_u\,(c_{jj})^{1/2}} = \frac{\hat\beta_j - 0}{\sigma_u\,(c_{jj})^{1/2}}\cdot\frac{\sigma_u}{\hat\sigma_u} = \frac{\dfrac{\hat\beta_j - 0}{\sigma_u\,(c_{jj})^{1/2}}}{\left(\dfrac{\hat\sigma_u^2}{\sigma_u^2}\right)^{1/2}} = \frac{\dfrac{\hat\beta_j - 0}{\sigma_u\,(c_{jj})^{1/2}}}{\left(\dfrac{\hat G}{n-k}\right)^{1/2}}.$$

Accordingly,

$$t \mid X = \frac{\dfrac{\hat\beta_j - 0}{\sigma_u\,(c_{jj})^{1/2}}\,\Big|\,X}{\left(\dfrac{\hat G}{n-k}\right)^{1/2}\Big|\,X} \sim \frac{N(0, 1)}{\left(\dfrac{\chi^2(n-k)}{n-k}\right)^{1/2}} \sim t(n-k).$$

Thus, if the disturbances are Gaussian, we can test the significance of each regression coefficient by running t-tests. These tests are $t(n-k)$ distributed. Observe that if $n$ is large, $t(n-k) \approx N(0, 1)$.
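A minimal sketch of these t-ratios on simulated data (variable names and data-generating values are mine; scipy is used only for the $t(n-k)$ p-values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.3, 0.0, 1.2])   # the second slope is truly zero
y = X @ beta_true + rng.normal(size=n)

k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)

se = np.sqrt(sigma2_hat * np.diag(XtX_inv))           # s.e.(beta_j) = sigma_hat * c_jj^{1/2}
t_stats = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - k)  # two-sided t(n-k) p-values
print(t_stats, p_values)
```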

OLS with `robust' standard errors. The standard error

$$s.e.(\hat\beta_j) := \hat\sigma_u\,(c_{jj})^{1/2}$$

provides a measure of the variability of the estimator $\hat\beta_j$. This measure, however, has been obtained under the implicit assumption of homoskedasticity (indeed, to be fully efficient OLS requires homoskedasticity!).

Imagine that the `true model' is

$$y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}, \qquad E(u \mid X) = 0_{n\times 1}, \quad E(uu' \mid X) := \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix},$$

which means that we have heteroskedasticity! Assume that an econometrician erroneously believes that there is homoskedasticity and applies the OLS estimator. In this case, the OLS estimator is still correct (proof as exercise!) but it will no longer be efficient! Is it possible to improve its efficiency?

Then

$$Var\left(\hat\beta_{OLS} \mid X\right) = Var\left(\beta + (X'X)^{-1}X'u \mid X\right) = Var\left((X'X)^{-1}X'u \mid X\right) := (X'X)^{-1}X'\Sigma X(X'X)^{-1},$$

therefore

$$(X'X)^{-1}X'\Sigma X(X'X)^{-1} \neq \sigma_u^2(X'X)^{-1}.$$

The matrix $(X'X)^{-1}X'\Sigma X(X'X)^{-1}$ is the `correct' covariance matrix of the OLS estimator in the presence of heteroskedasticity.

Let us focus in detail on the structure of the matrix $X'\Sigma X$. Since $\Sigma$ is diagonal, it is seen that

$$X'\Sigma X := \sum_{i=1}^{n}\sigma_i^2\,x_i x_i'.$$

The US econometrician White has shown that, given the residuals $\hat u_i := y_i - x_i'\hat\beta_{OLS}$, $i = 1, 2, \dots, n$, the quantity

$$\frac{1}{n}\sum_{i=1}^{n}\hat u_i^2\,x_i x_i'$$

is a consistent estimator of $\frac{1}{n}X'\Sigma X$. In other words, for large $n$,

$$\frac{1}{n}\sum_{i=1}^{n}\hat u_i^2\,x_i x_i' \;\to_p\; \frac{1}{n}\sum_{i=1}^{n}\sigma_i^2\,x_i x_i'.$$

The consequence of this result is the following: when the econometrician suspects that homoskedasticity might not hold in his/her sample, he/she might take the standard errors not from the covariance matrix $\hat\sigma_u^2(X'X)^{-1}$ but rather from the covariance matrix

$$(X'X)^{-1}\left(\sum_{i=1}^{n}\hat u_i^2\,x_i x_i'\right)(X'X)^{-1} = \left(\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\sum_{i=1}^{n}\hat u_i^2\,x_i x_i'\right)\left(\sum_{i=1}^{n}x_i x_i'\right)^{-1}.$$

These standard errors are `robust' in the sense that they take the heteroskedasticity into account. The robust standard errors are also known as `White standard errors'. Every econometric package complements OLS estimation with these standard errors.
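A minimal sketch of this White (robust) covariance matrix computed by hand on simulated heteroskedastic data (the design and names are my own illustration; in practice packages report these standard errors automatically):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x1 = rng.uniform(1.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x1])
u = rng.normal(scale=x1)                 # Var(u_i | x_i) grows with x1_i: heteroskedasticity
y = X @ np.array([1.0, 2.0]) + u

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Conventional (homoskedastic) covariance: sigma2_hat * (X'X)^{-1}
sigma2_hat = resid @ resid / (n - X.shape[1])
V_classic = sigma2_hat * XtX_inv

# White robust covariance: (X'X)^{-1} (sum_i u_i^2 x_i x_i') (X'X)^{-1}
meat = (X * resid[:, None] ** 2).T @ X
V_white = XtX_inv @ meat @ XtX_inv

print(np.sqrt(np.diag(V_classic)))   # conventional standard errors
print(np.sqrt(np.diag(V_white)))     # White standard errors
```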

Generalized model: GLS estimation.

$$y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}, \qquad E(u \mid X) = 0_{n\times 1}, \quad E(uu' \mid X) = \Sigma \neq \sigma_u^2 I_n.$$

Vector of unknown parameters: $\theta = (\beta', vech(\Sigma)')'$. It is impossible to estimate $\Sigma$ on the basis of $n$ observations. We need some assumptions.

Hp: $\Sigma$ is known.

Objective function:

$$Q(\beta) = (y - X\beta)'\Sigma^{-1}(y - X\beta) \equiv \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma^{ij}(y_i - x_i'\beta)(y_j - x_j'\beta), \qquad \sigma^{ij}\ \text{elements of}\ \Sigma^{-1}.$$

The GLS estimator of $\beta$ (given $\Sigma$) is obtained by solving the problem $\min_{\beta} Q(\beta)$, i.e. the GLS estimator of $\beta$ is the vector that solves:

$$\hat\beta_{GLS} = \arg\min_{\beta} Q(\beta).$$

The solution of this problem leads us to

$$\hat\beta_{GLS} := \left(X'\Sigma^{-1}X\right)^{-1}X'\Sigma^{-1}y.$$

The condition $rank(X) = k \;\Rightarrow\; X'\Sigma^{-1}X$ non-singular is crucial to obtain a valid (unique) GLS estimator. It can be noticed that the GLS estimator of $\beta$ is `virtual' in the sense that it requires the knowledge of $\Sigma$; otherwise it cannot be computed.

$$\hat\beta_{GLS} = \beta + \left(X'\Sigma^{-1}X\right)^{-1}X'\Sigma^{-1}u$$
$$E(\hat\beta_{GLS} \mid X) = \beta + \left(X'\Sigma^{-1}X\right)^{-1}X'\Sigma^{-1}E(u \mid X) = \beta$$
$$Var(\hat\beta_{GLS} \mid X) = Var\left(\beta + \left(X'\Sigma^{-1}X\right)^{-1}X'\Sigma^{-1}u \mid X\right) = Var\left(\left(X'\Sigma^{-1}X\right)^{-1}X'\Sigma^{-1}u \mid X\right) = \left(X'\Sigma^{-1}X\right)^{-1}.$$

The GLS estimator is correct under the assumption $E(u \mid X) = 0_{n\times 1}$ and has conditional covariance matrix $Var(\hat\beta_{GLS} \mid X) = \left(X'\Sigma^{-1}X\right)^{-1}$.
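A minimal sketch of the GLS formula under a known diagonal $\Sigma$ (the variance pattern and names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x1 = rng.uniform(0.5, 2.0, size=n)
X = np.column_stack([np.ones(n), x1])
sigma_i2 = 0.8 * x1 ** 2                 # known heteroskedastic variances (assumption)
u = rng.normal(scale=np.sqrt(sigma_i2))
y = X @ np.array([1.0, -0.7]) + u

Sigma_inv = np.diag(1.0 / sigma_i2)      # Sigma^{-1}, treated as known

# beta_GLS = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
A = X.T @ Sigma_inv @ X
b = X.T @ Sigma_inv @ y
beta_gls = np.linalg.solve(A, b)
V_gls = np.linalg.inv(A)                 # conditional covariance (X' Sigma^{-1} X)^{-1}
print(beta_gls, np.sqrt(np.diag(V_gls)))
```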

A famous theorem (the Aitken Theorem) applies to the case in which $X$ does not contain stochastic variables. It says that, given the generalized linear model

$$y_{n\times 1} = X_{n\times k}\,\beta_{k\times 1} + u_{n\times 1}, \qquad E(u) = 0_{n\times 1}, \quad E(uu') := \Sigma \neq \sigma_u^2 I_n,$$

the GLS estimator is, in the class of linear correct estimators, the one whose covariance matrix $Var(\hat\beta_{GLS}) := (X'\Sigma^{-1}X)^{-1}$ is the `most efficient' (minimum covariance matrix).

In our case, $X$ is stochastic (we condition the model with respect to the regressors). A version of the Aitken theorem where $X$ is stochastic and all variables are conditioned with respect to the elements of $X$ still applies. This means that any linear correct estimator different from the GLS estimator has a (conditional) covariance matrix `larger' than

$$Var\left(\hat\beta_{GLS} \mid X\right) := (X'\Sigma^{-1}X)^{-1}.$$

Feasible GLS estimators. There are cases in which the GLS is `feasible', i.e. it can be calculated by using the information in the data. Imagine the model is:

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + u_i, \quad i = 1, \dots, n$$
$$E(u_i \mid x_i) = 0 \;\Rightarrow\; E(u_i) = 0$$

Hp: $E(u_i^2 \mid x_i) = \sigma_i^2 = h\,(x_{1,i})^2$, $h > 0$; Hp: $Cov(u_i, u_j) = 0$, $\forall i \neq j$.

We can transform the model by dividing both sides by $\sigma_i$, obtaining

$$\frac{y_i}{\sigma_i} = \frac{1}{\sigma_i}\beta_0 + \frac{1}{\sigma_i}\beta_1 x_{1,i} + \frac{1}{\sigma_i}\beta_2 x_{2,i} + \frac{1}{\sigma_i}u_i,$$

which is equivalent to

$$\frac{y_i}{h^{1/2}x_{1,i}} = \frac{1}{h^{1/2}x_{1,i}}\beta_0 + \frac{1}{h^{1/2}}\beta_1 + \frac{1}{h^{1/2}x_{1,i}}\beta_2 x_{2,i} + \frac{u_i}{h^{1/2}x_{1,i}},$$

which (dropping the common factor $h^{-1/2}$) is equivalent to

$$y_i^* = \beta_1 + \beta_0\,x_{1,i}^* + \beta_2\,x_{2,i}^* + u_i^*,$$

where

$$y_i^* := \frac{y_i}{x_{1,i}}, \quad x_{1,i}^* := \frac{1}{x_{1,i}}, \quad x_{2,i}^* := \frac{x_{2,i}}{x_{1,i}}, \quad u_i^* := \frac{u_i}{x_{1,i}}.$$

Now,

$$E\left((u_i^*)^2 \mid x_i\right) = E\left(\left(\frac{u_i}{x_{1,i}}\right)^2 \Big|\, x_i\right) = \frac{1}{x_{1,i}^2}E(u_i^2 \mid x_i) = \frac{1}{x_{1,i}^2}\,h\,(x_{1,i})^2 = h \quad \forall i.$$

The transformed model $y_i^* = \beta_1 + \beta_0\,x_{1,i}^* + \beta_2\,x_{2,i}^* + u_i^*$ is homoskedastic because $E((u_i^*)^2 \mid x_i) := h = const$ $\forall i$; OLS can be applied to estimate $\beta_1$, $\beta_0$ and $\beta_2$ efficiently.

This transformation is equivalent to the following idea:

Hp: $\sigma_i^2 := h\,(x_{1,i})^2$, $h > 0$

$$\Rightarrow\; \Sigma := h\begin{bmatrix} x_{1,1}^2 & 0 & \cdots & 0 \\ 0 & x_{1,2}^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & x_{1,n}^2 \end{bmatrix} := hV$$

$$\hat\beta_{GLS} := \left(X'(hV)^{-1}X\right)^{-1}X'(hV)^{-1}y = \left(X'V^{-1}X\right)^{-1}X'V^{-1}y;$$

it is feasible because the matrix $V$ is known!
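A minimal sketch of this weighted transformation (the data-generating numbers are my own illustration): dividing every variable by $x_{1,i}$ and running OLS on the transformed model gives the same estimates as the GLS formula with $V = diag(x_{1,i}^2)$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x1 = rng.uniform(1.0, 4.0, size=n)
x2 = rng.normal(size=n)
h = 0.5
u = rng.normal(scale=np.sqrt(h) * x1)        # Var(u_i | x_i) = h * x1_i^2
y = 2.0 + 1.5 * x1 - 0.8 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])

# GLS with V = diag(x1_i^2): (X' V^{-1} X)^{-1} X' V^{-1} y
V_inv = np.diag(1.0 / x1 ** 2)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

# Equivalent transformed regression: divide y and each column of X by x1_i, run OLS
Xs = X / x1[:, None]
ys = y / x1
beta_wls = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

print(np.allclose(beta_gls, beta_wls))       # True: the two routes coincide
```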

Maximum Likelihood (ML) estimation. Before discussing the ML estimation of the linear regression model, we overview this crucial estimation method.

Statistical model = (stochastic distribution, sampling scheme)

$$f(\text{data}; \theta) = L(\theta) \quad \text{likelihood function.}$$

The $g\times 1$ vector of unknown parameters $\theta$ belongs to the (open) parameter space $\Theta$, and the `true' value of $\theta$, $\theta_0$, is an interior point of $\Theta$.

The ML method is parametric because it requires the knowledge of the stochastic distribution of the variables, in addition to the sampling scheme.

Model (classical, to simplify): $y_i = x_i'\beta + u_i$. Our parameters are: $\theta := (\beta', \sigma_u^2)'$.

Key hypothesis (normality):

$$y_i \mid x_i \sim N(x_i'\beta, \sigma_u^2), \quad i = 1, \dots, n.$$

Recall that $E(y_i \mid x_i) = x_i'\beta$; this implies

$$u_i \mid x_i \sim N(0, \sigma_u^2), \quad i = 1, \dots, n.$$

Thus, for each fixed $j \neq i$,

$$\begin{pmatrix} u_i \\ u_j \end{pmatrix} \Big|\; x_i, x_j \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{bmatrix} \sigma_u^2 & 0 \\ 0 & \sigma_u^2 \end{bmatrix}\right).$$

More generally, we can write

$$u \mid X := \begin{pmatrix} u_1 \mid X \\ \vdots \\ u_n \mid X \end{pmatrix} \sim N\left(0_{n\times 1}, \sigma_u^2 I_n\right),$$

so that

$$f(u \mid X; \theta) := \prod_{i=1}^{n} f(u_i \mid X; \theta),$$

where

$$f(u_i \mid X; \theta) := \frac{1}{\sigma_u(2\pi)^{1/2}}\,e^{-\frac{1}{2\sigma_u^2}u_i^2} = \frac{1}{\sigma_u(2\pi)^{1/2}}\,e^{-\frac{1}{2\sigma_u^2}(y_i - x_i'\beta)^2}.$$

Then

$$L(\theta) = f(\text{data}; \theta) = \prod_{i=1}^{n} f(u_i \mid X; \theta).$$

It will be convenient to focus on the log-likelihood function:

$$\log L(\theta) = \sum_{i=1}^{n}\log f(u_i \mid X; \theta) = C - \frac{n}{2}\log\sigma_u^2 - \frac{1}{2\sigma_u^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2 \equiv C - \frac{n}{2}\log\sigma_u^2 - \frac{1}{2\sigma_u^2}(y - X\beta)'(y - X\beta).$$

The ML estimator of $\theta$ is obtained by solving the problem $\max_{\theta}\log L(\theta)$, i.e.

$$\hat\theta_{ML} = \arg\max_{\theta}\log L(\theta).$$

As is known, in order to obtain $\hat\theta_{ML}$ it is necessary to solve the first-order conditions:

$$s_n(\theta) = \frac{\partial\log L(\theta)}{\partial\theta} = \begin{pmatrix} \partial\log L(\theta)/\partial\theta_1 \\ \partial\log L(\theta)/\partial\theta_2 \\ \vdots \\ \partial\log L(\theta)/\partial\theta_g \end{pmatrix} = 0_{g\times 1},$$

which means that $\hat\theta_{ML}$ is such that $s_n(\hat\theta_{ML}) = 0_{g\times 1}$. The vector $s_n(\theta)$ is known as the score (gradient) of the likelihood function. It is not always possible to solve the first-order conditions analytically; in many circumstances (e.g. nonlinear restrictions) numerical optimization procedures are required.
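As an illustration of the numerical route mentioned here, the sketch below maximizes the Gaussian log-likelihood of the linear model with scipy.optimize.minimize (my own parameterization: the variance enters through $\log\sigma_u^2$ to keep it positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 250
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0]) + rng.normal(scale=2.0, size=n)

def neg_loglik(params):
    # params = (beta_0, beta_1, log sigma_u^2); minimize minus the log-likelihood
    beta, log_s2 = params[:-1], params[-1]
    s2 = np.exp(log_s2)
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * s2) + 0.5 * resid @ resid / s2

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_ml, sigma2_ml = res.x[:-1], np.exp(res.x[-1])
print(beta_ml, sigma2_ml)   # beta matches OLS; sigma2 matches u'u / n
```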

Note that

$$s_n(\theta) = \frac{\partial\log L(\theta)}{\partial\theta} = \frac{\partial\sum_{i=1}^{n}\log f(u_i \mid x_i; \theta)}{\partial\theta} = \sum_{i=1}^{n}\frac{\partial\log f(u_i \mid x_i; \theta)}{\partial\theta} = \sum_{i=1}^{n}s_i(\theta).$$

Each component $s_i(\theta)$ of the score depends on the data, hence it is a random variable!

To be sure that $\hat\theta_{ML}$ is a maximum (and not a minimum), it is further necessary that the Hessian matrix

$$H_n(\theta) = \frac{\partial}{\partial\theta'}\left(\frac{\partial\log L(\theta)}{\partial\theta}\right) = \frac{\partial}{\partial\theta'}s_n(\theta) = \frac{\partial^2\log L(\theta)}{\partial\theta\,\partial\theta'} = \begin{bmatrix} \frac{\partial^2\log L(\theta)}{\partial\theta_1^2} & \frac{\partial^2\log L(\theta)}{\partial\theta_1\partial\theta_2} & \cdots & \frac{\partial^2\log L(\theta)}{\partial\theta_1\partial\theta_g} \\ \frac{\partial^2\log L(\theta)}{\partial\theta_2\partial\theta_1} & \frac{\partial^2\log L(\theta)}{\partial\theta_2^2} & & \vdots \\ \vdots & & \ddots & \\ \frac{\partial^2\log L(\theta)}{\partial\theta_g\partial\theta_1} & \frac{\partial^2\log L(\theta)}{\partial\theta_g\partial\theta_2} & \cdots & \frac{\partial^2\log L(\theta)}{\partial\theta_g^2} \end{bmatrix}_{g\times g}$$

be (semi)negative definite at the point $\hat\theta_{ML}$, i.e. $H_n(\hat\theta_{ML}) \leq 0$.

We define the (Fisher) Information Matrix as the quantity

$$I_n(\theta) = E\left(-\frac{\partial^2\log L(\theta)}{\partial\theta\,\partial\theta'}\right) = -E\left(H_n(\theta)\right),$$

and it can be shown that, under a set of regularity conditions, including the hypothesis of correct specification of the model, one has the equivalence:

$$I_n(\theta) = E\left(-\frac{\partial^2\log L(\theta)}{\partial\theta\,\partial\theta'}\right) = E\left(s_n(\theta)\,s_n(\theta)'\right) = E\left[\left(\frac{\partial\log L(\theta)}{\partial\theta}\right)\left(\frac{\partial\log L(\theta)}{\partial\theta}\right)'\right].$$

The Asymptotic Information Matrix is given by the quantity

$$I_\infty(\theta) = \lim_{n\to\infty}\frac{1}{n}I_n(\theta) = \lim_{n\to\infty}\frac{1}{n}E\left(-H_n(\theta)\right).$$

The inverse of the matrix $I_\infty(\theta)$ represents a lower bound for any estimator of $\theta$: this means that any estimator of $\theta$ will have covariance matrix `greater' than or at most equal to $I_\infty^{-1}(\theta)$.

A crucial requirement for ML estimation is that

$$\log L(\theta_0) > \log L(\theta) \quad \text{for each } \theta \in \Theta\setminus\{\theta_0\},$$

a condition that ensures the existence of a global maximum. However, situations of the type

$$\log L(\theta_0) > \log L(\theta) \quad \text{for } \theta \in N_0,$$

where $N_0$ is a neighborhood of $\theta_0$, are also potentially fine (local maximum).

Properties of the ML estimator. Crucial assumption: the model is correctly specified (the underlying statistical model is correct).

1. In general, $E(\hat\theta_{ML}) \neq \theta_0$, namely the ML estimator is not correct (unbiased)!

2. Under general conditions the ML estimator is consistent: $\hat\theta_{ML} \to_p \theta_0$.

3. Under general conditions the ML estimator is asymptotically Gaussian:

$$n^{1/2}\left(\hat\theta_{ML} - \theta_0\right) \to_D N\left(0_{g\times 1}, V_{\hat\theta}\right).$$

This property suggests that one can do standard inference in large samples!

4. Under general conditions the ML estimator is asymptotically efficient, in particular

$$V_{\hat\theta} = \left[I_\infty(\theta)\right]^{-1}.$$

Properties 2, 3 and 4 are asymptotic properties and make the ML estimator `optimal'.

$$\max_{\theta}\log L(\theta) \;\Rightarrow\; \max_{\beta,\,\sigma_u^2}\left(C - \frac{n}{2}\log\sigma_u^2 - \frac{1}{2\sigma_u^2}(y - X\beta)'(y - X\beta)\right)$$

$$\begin{pmatrix} \dfrac{\partial\log L(\theta)}{\partial\beta} \\[2mm] \dfrac{\partial\log L(\theta)}{\partial\sigma_u^2} \end{pmatrix} = \begin{pmatrix} 0_{k\times 1} \\ 0 \end{pmatrix}_{(k+1)\times 1}.$$

The needed derivative rules are in Appendix 3.A.3 (Ch. 3) of the textbook.

The solution of this problem leads us to

$$\hat\beta_{ML} := \left(\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\sum_{i=1}^{n}x_i y_i\right) \equiv \left(X'X\right)^{-1}X'y, \qquad \hat\sigma_u^2 = \frac{1}{n}\left(\sum_{i=1}^{n}\hat u_i^2\right) \equiv \frac{1}{n}\,\hat u'\hat u,$$

where $\hat u_i = y_i - x_i'\hat\beta_{ML}$, $i = 1, \dots, n$, or, alternatively, $\hat u = (y - X\hat\beta_{ML})$.

Recall that, by construction, $s_n(\hat\beta_{ML}, \hat\sigma_u^2) = 0_{(k+1)\times 1}$ and $H_n(\hat\beta_{ML}, \hat\sigma_u^2)$ is negative semidefinite.

Constrained estimation. Given $y = X\beta + u$ with $u \mid X \sim N(0_{n\times 1}, \sigma_u^2 I_n)$, general (linear) restrictions on $\beta$ can be represented either in implicit form or in explicit form.

Implicit form:

$$R_{q\times k}\,\beta_{k\times 1} := r_{q\times 1}$$

or, alternatively,

$$R_{q\times k}\,\beta_{k\times 1} - r_{q\times 1} := 0_{q\times 1},$$

where $q$ is the number of restrictions.

In

$$R_{q\times k}\,\beta_{k\times 1} - r_{q\times 1} := 0_{q\times 1},$$

$R$ is a known $q\times k$ matrix whose rows select the elements of $\beta$ that must be restricted; $r$ is a known $q\times 1$ vector.

EXAMPLES: BLACKBOARD!

Explicit form:

$$\beta_{k\times 1} := H_{k\times a}\,\varphi_{a\times 1} + h_{k\times 1}, \qquad a = (k - q),$$

where $H$ is a known $k\times a$ selection matrix; $\varphi$ is the $a\times 1$ vector that contains the elements of $\beta$ which are not subject to restrictions (free parameters); $h$ is a known $k\times 1$ selection vector.

EXAMPLES: BLACKBOARD!

One can write the linear constraints on $\beta$ either in implicit or in explicit form. The two methods are alternative but equivalent. For certain problems the implicit form representation is more convenient; for others, the explicit form is. It is therefore clear that there must exist a connection between the matrices $R$ and $H$ and the vectors $r$ and $h$.

In particular, take the explicit form $\beta := H\varphi + h$ and multiply both sides by $R$, obtaining

$$R\beta := RH\varphi + Rh.$$

Then, since the right-hand side of the expression above must be equal to $r$, it must hold that

$$RH := 0_{q\times a}, \qquad Rh := r.$$

This means that the (rows of the) matrix $R$ lie in the null column space of the matrix $H$ ($sp(R') \subseteq sp(H_{\perp})$).

Recall: Let $A$ be an $n\times m$ matrix, $n > m$, with (column) rank $m$; $A = [a_1 : a_2 : \dots : a_m]$.

$$sp(A) = \{\text{all linear combinations of } a_1, a_2, \dots, a_m\} \subseteq R^n$$
$$sp(A_{\perp}) = \{v : v'a_i = 0 \text{ for each } i = 1, \dots, m\} \subseteq R^n;$$

all the vectors in $sp(A_{\perp})$ are said to lie in the null column space of the matrix $A$. Let $A_{\perp}$ be an $n\times(n-m)$ full column rank matrix that satisfies

$$A_{\perp}'A = 0_{(n-m)\times m}, \qquad A'A_{\perp} = 0_{m\times(n-m)}.$$

$A_{\perp}$ is called the orthogonal complement of $A$. The $n\times n$ matrix $[A : A_{\perp}]$ has rank $n$ and forms a basis of $R^n$.

Constrained estimation means that one estimates the model under $H_0$, i.e. imposing the restrictions implied by the null hypothesis. Suppose that $H_0: R\beta := r$ (equivalently, $H_0: \beta := H\varphi + h$) has been accepted by the data. Then we wish to re-estimate the null model, i.e. the econometric model that embodies the linear restrictions. In this case, using the restrictions in explicit form is convenient.

Indeed, the null model can be written as

$$y = X[H\varphi + h] + u$$

and re-arranged as

$$y^*_{n\times 1} = X^*_{n\times a}\,\varphi_{a\times 1} + u_{n\times 1},$$

where

$$y^* := y - Xh, \qquad X^* := XH.$$

We obtain a transformed linear regression model whose parameters are no longer $\theta = (\beta', \sigma_u^2)'$ but $\theta^* = (\varphi', \sigma_u^2)'$. Recall that $\varphi$ is the $a\times 1$ vector, $a = k - q$, that contains the unrestricted parameters, i.e. those parameters which are not affected by $H_0$.

We know very well how to estimate $\varphi$ and $\sigma_u^2$ and we know their properties:

$$\hat\varphi = (X^{*\prime}X^{*})^{-1}X^{*\prime}y^{*} = (H'X'XH)^{-1}H'X'(y - Xh),$$

where $\hat\varphi = \hat\varphi_{CML}$ or $\hat\varphi = \hat\varphi_{CLS}$;

$$\tilde\sigma_u^2 = \frac{1}{n-a}\,(y^{*} - X^{*}\hat\varphi)'(y^{*} - X^{*}\hat\varphi).$$
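A minimal sketch of this explicit-form device for a simple restriction (my own example: $k = 3$ and $H_0: \beta_2 = \beta_3$, written as $\beta = H\varphi + h$ with $h = 0$):

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 0.5, 0.5])      # satisfies beta_2 = beta_3
y = X @ beta_true + rng.normal(size=n)

# H_0: beta_2 = beta_3 in explicit form beta = H phi + h
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
h = np.zeros(k)

# Transformed model: y* = y - X h, X* = X H, then plain OLS on (y*, X*)
y_star = y - X @ h
X_star = X @ H
phi_hat = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

beta_restricted = H @ phi_hat + h          # constrained estimator of beta
print(phi_hat, beta_restricted)
```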

The constrained ML or LS estimator of $\beta$ is obtained indirectly as

$$\tilde\beta = \hat\beta_{CML} = H\hat\varphi_{CML} + h, \qquad \tilde\beta = \hat\beta_{CLS} = H\hat\varphi_{CLS} + h,$$

and it inherits the same properties as $\hat\varphi_{CML}$ ($\hat\varphi_{CLS}$).

Obviously, the covariance matrix of the restricted estimator $\tilde\beta$ will be

$$Var(\tilde\beta \mid X) := Var(H\hat\varphi_{CLS} + h \mid X) = H\,Var(\hat\varphi_{CLS} \mid X)\,H' = \sigma_u^2\,H(X^{*\prime}X^{*})^{-1}H' = \sigma_u^2\,H(H'(X'X)H)^{-1}H'.$$

It can be proved that

$$\sigma_u^2\,H(H'(X'X)H)^{-1}H' \;\leq\; \sigma_u^2\,(X'X)^{-1}$$

regardless of whether $H_0$ is true or not! This means that imposing restrictions on $\beta$ has the effect of reducing the variability of the estimator (and thus the standard errors of the single coefficients). Of course, this operation is `correct' only when $H_0$ is supported by the data.

In `traditional' textbook econometrics, the derivation of the constrained estimator is usually obtained by exploiting the restrictions in implicit form, solving either

$$\min_{\beta}\; Q(\beta) \quad \text{subject to}\quad R\beta - r = 0$$

or

$$\max_{\theta}\; \log L(\theta) \quad \text{subject to}\quad R\beta - r := 0.$$

In order to solve these problems it is necessary to apply a technique based on Lagrange multipliers. For instance, in the case of ML estimation, the problem above amounts to solving

$$\max_{\theta,\,\lambda}\;\ell(\theta, \lambda) = \log L(\theta) + \lambda'_{1\times q}\,(R\beta - r)_{q\times 1},$$

where $\ell(\theta, \lambda)$ is the Lagrangean function and $\lambda$ is the vector of Lagrange multipliers. Each element of $\lambda$ gives a weight to the corresponding linear restriction.

By using the restrictions in explicit form we stick to the standard framework, without the need of using constrained estimation techniques. More specifically, we have turned a restricted estimation problem into an unrestricted estimation problem ($\varphi$ is estimated unrestrictedly!).

Testing problem. Given

$$y_i = x_i'\beta + u_i, \quad i = 1, \dots, n,$$

our objective is testing

$$H_0: R\beta := r \;\;(H_0: \beta := H\varphi + h) \quad \text{vs} \quad H_1: R\beta \neq r \;\;(H_1: \beta \neq H\varphi + h).$$

A testing problem requires a probabilistic decision rule that allows the researcher to establish whether the evidence provided by the data is closer to $H_0$ or to $H_1$.

This decision is based on the following ingredients:

(a) A pre-fixed nominal type I error (or significance level)

$$\alpha = \Pr(\text{rejecting } H_0 \mid H_0).$$

The cases we typically address are such that $\alpha$ is actually an asymptotic error, in the sense that $\Pr(\text{reject } H_0 \mid H_0)$ can be evaluated asymptotically (for large $n$);

(b) A test statistic $\hat G_n = G_n(\hat\theta)$, which is a function of $\hat\theta$ and whose distribution under $H_0$ is known, at least asymptotically (later on we will see that $\hat G_n = G_n(\hat\theta)$ usually belongs to one of three families);

(c) A decision rule of the type: reject $H_0$ if $G_n(\hat\theta) > cv_{(1-\alpha)}$, where $cv_{(1-\alpha)}$ is the $100(1-\alpha)$ percentile taken from the (asymptotic) distribution of $\hat G_n = G_n(\hat\theta)$ under $H_0$;

(d) To understand whether the test does a good job, one should be able to evaluate the function

$$\pi_\infty(\alpha) = \lim_{n\to\infty}\Pr(\text{reject } H_0 \mid H_1) = \Pr\left(G_\infty(\hat\theta) \geq cv_{(1-\alpha)} \mid H_1\right),$$

known as the asymptotic power function. A `desired' test should be such that, once one fixes the type I error $\alpha$ (usually 5% or 10%), the test is consistent, meaning that

$$\pi_\infty(\alpha) = \lim_{n\to\infty}\Pr(\text{reject } H_0 \mid H_1) = \Pr\left(G_\infty(\hat\theta) \geq cv_{(1-\alpha)} \mid H_1\right) \to 1.$$

When estimation is performed with ML we can classify the test statistic $\hat G_n = G_n(\hat\theta)$ into three families, known as Wald, Lagrange Multiplier (LM) and Likelihood Ratio (LR) tests. However, the existence of these three families is not solely confined to ML estimation but can be extended to other classes of estimators as well.

General philosophy:

Wald: These tests are based on the idea of checking whether the unrestricted (ML) estimator of $\beta$ (i.e. the estimator obtained without imposing any constraint) is `close in a statistical sense' to $H_0$, namely whether

$$R\hat\beta - r \approx 0_{q\times 1},$$

which is what one would expect if $H_0$ is true. Computing a Wald-type test requires estimating the regression model one time, without any restriction.

LM: These tests are based on the estimation of the null model (i.e. the regression model under $H_0$), hence they are based on the constrained estimator of $\beta$. Let $\hat\theta$ be the unconstrained estimator (obtained under $H_1$) and $\tilde\theta$ its constrained counterpart (obtained under $H_0$). The idea is that if $H_0$ is true, the score of the unrestricted model, evaluated at the constrained estimator (recall that $s_n(\hat\theta) = 0_{k\times 1}$), should be `close to zero in a statistical sense', i.e.

$$s_n(\tilde\theta) \approx 0_{k\times 1}.$$

Computing an LM test requires estimating the regression model one time, under the restrictions implied by $H_0$ (however, it also requires that we know the structure of the score of the unrestricted model!).

LR: These tests are based on the idea of estimating the linear regression model both under $H_0$ ($\tilde\theta$) and without restrictions ($\hat\theta$); estimation is carried out two times. If $H_0$ is true, the distance between the likelihood functions of the null and unrestricted models should not be too large (note that one always has $\log L(\tilde\theta) \leq \log L(\hat\theta)$).

Recall: Let $v$ be a $p\times 1$ stochastic vector such that $v \sim N(0_{p\times 1}, V)$, where $V$ is a $p\times p$ covariance matrix (hence symmetric positive definite). Then

$$v'V^{-1}v \sim \chi^2(p).$$

The same result holds if `$\sim$' is replaced with `$\to_D$'. Note also that $\|v\| = (v'v)^{1/2}$ is the Euclidean norm of the vector $v$ (a measure of its length in the space $R^p$, i.e. a measure of the distance of the vector $v$ from the vector $0_{p\times 1}$). One can generalize this measure by defining the norm $\|v\|_A = (v'Av)^{1/2}$, where $A$ is a symmetric positive definite matrix; this norm measures the distance of $v$ from $0_{p\times 1}$ `weighted' by the elements of the matrix $A$. This means that the random variable (quadratic form) $v'V^{-1}v$ measures a weighted distance of $v$ from $0_{p\times 1}$.

Wald test. Let $\hat\beta$ be either the OLS or the ML estimator of $\beta$ in the classical regression model with cross-section data, under the standard assumptions. We start from the (Gaussian) assumption

$$u \mid X \sim N(0_{n\times 1}, \sigma_u^2 I_n),$$

which implies that

$$\hat\beta \mid X \sim N\left(\beta, \sigma_u^2(X'X)^{-1}\right).$$

From the properties of the multivariate Gaussian distribution we obtain

$$R(\hat\beta - \beta)_{q\times 1} \mid X \sim N\left(0_{q\times 1}, \sigma_u^2 R(X'X)^{-1}R'\right).$$

The result above implies that, under $H_0: R\beta := r$,

$$\left(R\hat\beta - r\right)_{q\times 1} \mid X \sim N\left(0_{q\times 1}, \sigma_u^2 R(X'X)^{-1}R'\right).$$

Interpretation: the left-hand side can be regarded as a measure of the distance of the $q\times 1$ (stochastic) vector $R\hat\beta$ from the known vector $r$ (or the distance of $R\hat\beta - r$ from the vector $0_{q\times 1}$); moreover, we have a symmetric positive definite matrix ($\sigma_u^2 R(X'X)^{-1}R'$) with respect to which we can weight this distance.

Imagine first that $\sigma_u^2$ is known (unrealistic). We define the test statistic (a quadratic form in a Gaussian vector)

$$W_n = \left(R\hat\beta - r\right)'\left\{\sigma_u^2\,R(X'X)^{-1}R'\right\}^{-1}\left(R\hat\beta - r\right).$$

From the properties of the multivariate Gaussian distribution:

$$W_n \mid X \sim \chi^2(q).$$

Thus, having fixed $\alpha$, one rejects $H_0$ if $W_n \geq cv_{(1-\alpha)}$, where $cv_{(1-\alpha)}$ is the $100(1-\alpha)$ percentile taken from the $\chi^2(q)$ distribution ($\Pr[\chi^2(q) > cv_{(1-\alpha)}] = \alpha$). One accepts $H_0$ otherwise.

Since $\sigma_u^2$ is generally unknown, we replace it with the estimator $\hat\sigma_u^2$. Recall that, under our assumptions,

$$\hat G \mid X \sim \chi^2(n-k), \qquad \hat G := \frac{\hat\sigma_u^2}{\sigma_u^2}(n-k).$$

Recall also that, from the theory of distributions, if $G_1 \sim \chi^2(q_1)$, $G_2 \sim \chi^2(q_2)$ and $G_1$ and $G_2$ are independent, then

$$\frac{G_1/q_1}{G_2/q_2} \sim F(q_1, q_2).$$

Now define the test statistic

$$W_n = \frac{1}{q}\left(R\hat\beta - r\right)'\left\{\hat\sigma_u^2\,R(X'X)^{-1}R'\right\}^{-1}\left(R\hat\beta - r\right) = \frac{\left(R\hat\beta - r\right)'\left\{R(X'X)^{-1}R'\right\}^{-1}\left(R\hat\beta - r\right)}{q\,\hat\sigma_u^2}$$

$$= \frac{\dfrac{\left(R\hat\beta - r\right)'\left\{R(X'X)^{-1}R'\right\}^{-1}\left(R\hat\beta - r\right)}{q\,\sigma_u^2}}{\dfrac{\hat\sigma_u^2}{\sigma_u^2}} = \frac{\dfrac{\left(R\hat\beta - r\right)'\left\{R(X'X)^{-1}R'\right\}^{-1}\left(R\hat\beta - r\right)}{q\,\sigma_u^2}}{\dfrac{\hat G}{n-k}}.$$

It can be proved that the two quadratic forms in the numerator and in the denominator of the last expression are independent.

Accordingly,

$$W_n \mid X = \frac{\dfrac{\left(R\hat\beta - r\right)'\left\{R(X'X)^{-1}R'\right\}^{-1}\left(R\hat\beta - r\right)}{q\,\sigma_u^2}}{\dfrac{\hat G}{n-k}}\;\Bigg|\;X \;\sim\; \frac{\chi^2(q)/q}{\chi^2(n-k)/(n-k)} \;\sim\; F(q, n-k).$$

Thus, having fixed $\alpha$, one rejects $H_0$ if $W_n \geq cv_{(1-\alpha)}$, where $cv_{(1-\alpha)}$ is the $100(1-\alpha)$ percentile taken from the $F(q, n-k)$ distribution ($\Pr[F(q, n-k) > cv_{(1-\alpha)}] = \alpha$). One accepts $H_0$ otherwise.

General linear hypotheses on $\beta$ can be tested by using the F-distribution. This is true if $u \mid X \sim N(0_{n\times 1}, \sigma_u^2 I_n)$, irrespective of whether $n$ is small or large. Note that $F(q, n-k) \to \chi^2(q)/q$ for $n \to \infty$.
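A minimal sketch of this F statistic for a joint hypothesis on simulated data ($R$, $r$ and the data-generating values are my own illustration; scipy provides the $F(q, n-k)$ critical value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, k = 150, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.0, 0.0]) + rng.normal(size=n)   # H_0 below is true here

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)

# H_0: beta_2 = 0 and beta_3 = 0, i.e. R beta = r with q = 2 restrictions
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(2)
q = R.shape[0]

diff = R @ beta_hat - r
W = diff @ np.linalg.solve(sigma2_hat * R @ XtX_inv @ R.T, diff) / q
cv = stats.f.ppf(0.95, q, n - k)      # 95th percentile of F(q, n-k)
print(W, cv, W >= cv)                 # reject H_0 if W >= cv
```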

LM and LR tests will be reviewed when we deal with the regression model based on time series data.