Quick Review on Linear Multiple Regression


Mei-Yuan Chen, Department of Finance, National Chung Hsing University. March 6, 2007.

Introduction to Conditional Mean Modeling

Suppose the random variables $Y, X_1, X_2, \ldots, X_k$ are considered and the conditional mean of $Y$ given $X_1, X_2, \ldots, X_k$, namely $E(Y \mid X_1, X_2, \ldots, X_k)$, is of interest. Knowing $E(Y \mid X_1, X_2, \ldots, X_k)$ tells us the average behavior of $Y$ conditional on specific realizations of $X_1, X_2, \ldots, X_k$. Moreover, the response of the average behavior of $Y$ when one set of realizations of $X_1, X_2, \ldots, X_k$ changes to another set can be analyzed. For example, the change from $E(Y \mid X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k)$ to $E(Y \mid X_1 = x_1, X_2 = x_2 + \Delta, \ldots, X_k = x_k)$ can be treated as the pure effect on the average value of $Y$ of changing $X_2$ from $x_2$ to $x_2 + \Delta$.

Denote $m(x_1, x_2, \ldots, x_k) = E(Y \mid X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k)$. For simplification, some functional form is assumed for $m(x_1, x_2, \ldots, x_k)$, say a linear or nonlinear parametric functional form. Of course, $m(x_1, x_2, \ldots, x_k)$ can also be treated nonparametrically. The goal of econometric analysis is to estimate and make inferences about $m(x_1, x_2, \ldots, x_k)$ using a collection of sample observations $\{y_t, x_{t1}, x_{t2}, \ldots, x_{tk}\}$, $t = 1, \ldots, T$, where $T$ is the total number of sample observations.

Linear Multiple Regression

Suppose $m(x_1, x_2, \ldots, x_k) = \beta_{10} x_1 + \beta_{20} x_2 + \cdots + \beta_{k0} x_k$ is assumed. Any realization $(y, x_1, x_2, \ldots, x_k)$ can then be represented as
$$y = E(Y \mid X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) + e = \beta_{10} x_1 + \beta_{20} x_2 + \cdots + \beta_{k0} x_k + e,$$
where $e$ is the difference between $y$ and the conditional mean. Therefore, given a collection of sample observations $\{y_t, x_{t1}, x_{t2}, \ldots, x_{tk}\}$, $t = 1, \ldots, T$, the linear regression model is formulated as
$$y_t = \beta_{10} x_{t1} + \beta_{20} x_{t2} + \cdots + \beta_{k0} x_{tk} + e_t, \qquad t = 1, \ldots, T. \tag{1}$$
The term $e_t$ is called the regression error.

OLS Estimator

The linear regression model in (1) can be written in matrix notation as
$$\underset{T \times 1}{y} = \underset{T \times k}{X}\,\underset{k \times 1}{\beta_0} + \underset{T \times 1}{e},$$
where $k \le T$. We want to find a $k$-dimensional regression hyperplane that best fits the data $(y, X)$. Different estimators are obtained according to different definitions of "best." The least squares estimator defines the best fit as the one minimizing the squared deviations of the observed $y_t$ from the fitted values $\hat{y}_t$; the maximum likelihood estimator defines the best fit as the one maximizing the likelihood value.

Least Squares Estimator

Denote the average of the squared deviations of the observed $y_t$ from candidate fitted values $\hat{y}_t$ as $Q(\beta) = (y - X\beta)'(y - X\beta)/T$. The least squares estimator of $\beta_0$ solves
$$\min_{\beta \in \mathbb{R}^k} Q(\beta) := \frac{1}{T}(y - X\beta)'(y - X\beta).$$
The first-order conditions, also called the normal equations, are
$$\nabla_\beta Q(\beta) = \nabla_\beta (y'y - 2y'X\beta + \beta'X'X\beta)/T = -2X'(y - X\beta)/T \overset{\text{set}}{=} 0,$$
and the resulting OLS estimator is $\hat{\beta}_T = (X'X)^{-1}X'y$.

The second-order condition is satisfied because $X'X$ is positive definite. The vector of fitted values is $\hat{y} = X\hat{\beta}_T = Py$, where $P = X(X'X)^{-1}X'$, and the vector of regression residuals is $\hat{e} = y - \hat{y} = (I_T - P)y$. By the normal equations, $X'\hat{e} = 0$, so that $\hat{y}'\hat{e} = 0$. When $X$ contains a constant term, we also have $\sum_{t=1}^T \hat{e}_t = 0$ and $\sum_{t=1}^T y_t = \sum_{t=1}^T \hat{y}_t$.
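As an illustration, the OLS algebra above can be carried out directly with matrix operations. The following minimal sketch (Python with NumPy, using simulated data; the variable names are mine, not from the original notes) computes $\hat{\beta}_T$, the projection matrix $P$, the fitted values, and the residuals, and checks the orthogonality conditions $X'\hat{e} = 0$ and $\hat{y}'\hat{e} = 0$.

import numpy as np

rng = np.random.default_rng(0)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # includes a constant term
beta0 = np.array([1.0, 0.5, -0.3])
e = rng.normal(scale=0.8, size=T)
y = X @ beta0 + e

# OLS estimator from the normal equations: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

P = X @ XtX_inv @ X.T          # projection matrix onto span(X)
y_hat = P @ y                  # fitted values
e_hat = y - y_hat              # residuals, (I - P)y

print("beta_hat:", beta_hat)
print("X'e_hat ~ 0:", X.T @ e_hat)            # normal equations
print("y_hat'e_hat ~ 0:", y_hat @ e_hat)      # orthogonality of fitted values and residuals
print("sum of residuals ~ 0:", e_hat.sum())   # holds because X contains a constant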

Geometrically, $\hat{y} = Py$ is the orthogonal projection of $y$ onto the $k$-dimensional space $\mathrm{span}(X)$, the space spanned by the column vectors of $X$, and $\hat{e} = (I_T - P)y$ is the orthogonal projection of $y$ onto $\mathrm{span}(X)^{\perp}$, the orthogonal complement of $\mathrm{span}(X)$. Consequently, $\hat{y}$ is the best approximation of $y$ in $\mathrm{span}(X)$ in the sense that $\|y - \hat{y}\| \le \|y - z\|$ for all $z \in \mathrm{span}(X)$.

Properties of the OLS Estimator under Classical Assumptions

We first make the following classical assumptions on $(y, X)$ and $e$:
[A1] $y = X\beta_0 + e$, with $\|\beta_0\| < \infty$, is the correct model.
[A2] $X$ is a $T \times k$ nonstochastic and finite matrix.
[A3] $X'X$ is nonsingular for all $T \ge k$ (i.e., $X$ is of full column rank).
[A4] $e$ is a random vector such that $E(e) = 0$.
[A4'] $e$ is a random vector such that $E(e) = 0$ and $E(ee') = \sigma_0^2 I_T$, where $\sigma_0^2 < \infty$.
[A5] $e \sim N(0, \sigma_0^2 I_T)$, where $\sigma_0^2 < \infty$.

(1) Given assumptions [A1]-[A3], $\hat{\beta}_T$ and $\hat{\sigma}_T^2$ exist and are unique.
(2) Given assumptions [A1]-[A4], $\hat{\beta}_T$ is unbiased.
(3) Given assumptions [A1]-[A3] and [A4'],
$$\mathrm{var}(\hat{\beta}_T) = E[(\hat{\beta}_T - E\hat{\beta}_T)(\hat{\beta}_T - E\hat{\beta}_T)'] = E[(X'X)^{-1}X'ee'X(X'X)^{-1}] = \sigma_0^2 (X'X)^{-1}.$$

(4) Gauss-Markov result: given assumptions [A1]-[A3] and [A4'], $\hat{\beta}_T$ is the best linear unbiased estimator (BLUE) of $\beta_0$.
(5) Given assumptions [A1]-[A3] and [A4'], $\hat{\sigma}_T^2 = \hat{e}'\hat{e}/(T-k)$ is an unbiased estimator of $\sigma_0^2$.
(6) If we assume [A5] instead of [A4'], $\hat{\beta}_T$ is also the maximum likelihood estimator (MLE). However, the MLE of $\sigma_0^2$ is $\tilde{\sigma}_T^2 = \hat{e}'\hat{e}/T$, which is a biased estimator.
(7) Given assumptions [A1]-[A3] and [A5], $\hat{\beta}_T$ and $\hat{\sigma}_T^2$ are the minimum variance unbiased estimators (MVUE).

Goodness of Fit

A natural measure is the regression variance $\hat{\sigma}_T^2 = \hat{e}'\hat{e}/(T-k)$. Some relative measures are:
(1) The coefficient of determination: non-centered $R^2$.
(2) The coefficient of determination: centered $R^2$.
(3) Adjusted $R^2$, denoted $\bar{R}^2$:
$$\bar{R}^2 = 1 - \frac{\hat{e}'\hat{e}/(T-k)}{(y'y - T\bar{y}_T^2)/(T-1)} = 1 - \frac{T-1}{T-k}(1 - R^2) = R^2 - \frac{k-1}{T-k}(1 - R^2).$$

Three alternatives that have been proposed for comparing models are:
1. $\tilde{R}^2 = 1 - \frac{T+k}{T-k}(1 - R^2)$, maximizing which is equivalent to minimizing Amemiya's prediction criterion,
$$\mathrm{PC} = \frac{\hat{e}'\hat{e}}{T-k}\left(1 + \frac{k}{T}\right) = \hat{\sigma}_T^2\left(1 + \frac{k}{T}\right).$$
2. Akaike's information criterion (AIC):
$$\mathrm{AIC} = \ln\!\left(\frac{\hat{e}'\hat{e}}{T}\right) + \frac{2k}{T} = \ln\tilde{\sigma}_T^2 + \frac{2k}{T}.$$
3. Schwarz's information criterion (SIC):
$$\mathrm{SIC} = \ln\tilde{\sigma}_T^2 + \frac{k\ln T}{T}.$$
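To make these formulas concrete, here is a small sketch (Python with NumPy; the data-generating choices are mine and purely illustrative) that computes the centered $R^2$, the adjusted $\bar{R}^2$, and the AIC and SIC for an OLS fit.

import numpy as np

def fit_stats(y, X):
    """Centered R^2, adjusted R^2, AIC and SIC for an OLS fit of y on X."""
    T, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e_hat = y - X @ beta_hat
    ess = e_hat @ e_hat                       # residual (error) sum of squares
    tss = ((y - y.mean()) ** 2).sum()         # centered total sum of squares
    r2 = 1.0 - ess / tss                      # centered R^2
    r2_adj = 1.0 - (T - 1) / (T - k) * (1.0 - r2)
    sigma2_tilde = ess / T                    # MLE of the error variance
    aic = np.log(sigma2_tilde) + 2 * k / T
    sic = np.log(sigma2_tilde) + k * np.log(T) / T
    return r2, r2_adj, aic, sic

# Example usage with simulated data
rng = np.random.default_rng(1)
T = 150
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=T)
print(fit_stats(y, X))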

Sampling Distribution of the OLS Estimator under Classical Assumptions

Given [A5], $e \sim N(0, \sigma_0^2 I_T)$, the following distributions are immediate:
$$y \mid X \sim N(X\beta_0, \sigma_0^2 I_T); \qquad \hat{\beta}_T \mid X \sim N(\beta_0, \sigma_0^2 (X'X)^{-1}); \qquad \hat{e} \mid X = (I_T - P)e \sim N(0, \sigma_0^2 (I_T - P)).$$
As $(T-k)\hat{\sigma}_T^2/\sigma_0^2 = \hat{e}'\hat{e}/\sigma_0^2$,
$$\frac{(T-k)\hat{\sigma}_T^2}{\sigma_0^2} \sim \chi^2(T-k),$$
with mean $(T-k)$ and variance $2(T-k)$. Hence, $\hat{\sigma}_T^2$ has mean $\sigma_0^2$ and variance $2\sigma_0^4/(T-k)$.

Testing Linear Hypotheses

Consider $H_0: R\beta_0 = r$, where $R$ is a $q \times k$ nonstochastic matrix with rank $q$, and $r$ is a vector of pre-specified real values. Under the null hypothesis,
$$[R(X'X)^{-1}R']^{-1/2}(R\hat{\beta}_T - r)/\sigma_0 \sim N(0, I_q),$$
$$(R\hat{\beta}_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta}_T - r)/\sigma_0^2 \sim \chi^2(q).$$
Recall that $(T-k)\hat{\sigma}_T^2/\sigma_0^2 \sim \chi^2(T-k)$. Hence
$$\varphi = \frac{\{(R\hat{\beta}_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta}_T - r)/\sigma_0^2\}/q}{\{(T-k)\hat{\sigma}_T^2/\sigma_0^2\}/(T-k)} = \frac{\chi^2(q)/q}{\chi^2(T-k)/(T-k)} = \frac{(R\hat{\beta}_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta}_T - r)}{q\,\hat{\sigma}_T^2} \sim F(q, T-k).$$
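The $F$ statistic $\varphi$ can be computed directly from this formula. The following sketch (Python/NumPy; the hypothesis tested and the simulated data are illustrative choices of mine) tests $H_0: R\beta_0 = r$ for a single exclusion-type restriction.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T, k = 120, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
sigma2_hat = e_hat @ e_hat / (T - k)          # unbiased variance estimate

# H0: R beta = r; here, that the last coefficient equals zero
R = np.array([[0.0, 0.0, 1.0]])
r = np.array([0.0])
q = R.shape[0]

diff = R @ beta_hat - r
middle = np.linalg.inv(R @ np.linalg.inv(X.T @ X) @ R.T)
phi = (diff @ middle @ diff) / (q * sigma2_hat)   # ~ F(q, T - k) under H0
p_value = stats.f.sf(phi, q, T - k)
print(phi, p_value)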

An Alternative Approach

Given the constraint $R\beta_0 = r$, the constrained OLS estimator can be obtained by minimizing the Lagrangian:
$$\min_{\beta}\ (y - X\beta)'(y - X\beta)/T + (R\beta - r)'\lambda,$$
where $\lambda$ is the Lagrange multiplier. The first-order conditions are
$$-2X'(y - X\beta)/T + R'\lambda \overset{\text{set}}{=} 0, \qquad R\beta - r \overset{\text{set}}{=} 0.$$
These can be written as
$$\begin{bmatrix} 2X'X/T & R' \\ R & 0 \end{bmatrix}\begin{bmatrix} \beta \\ \lambda \end{bmatrix} \overset{\text{set}}{=} \begin{bmatrix} 2X'y/T \\ r \end{bmatrix}.$$

Solving this system yields
$$\ddot{\lambda}_T = 2[R(X'X/T)^{-1}R']^{-1}(R\hat{\beta}_T - r), \qquad \ddot{\beta}_T = \hat{\beta}_T - (2X'X/T)^{-1}R'\ddot{\lambda}_T.$$
$\ddot{\beta}_T$ is called the constrained OLS estimator of $\beta_0$. Note that the vector of constrained OLS residuals is
$$\ddot{e} = y - X\ddot{\beta}_T = y - X\hat{\beta}_T + X(\hat{\beta}_T - \ddot{\beta}_T) = \hat{e} + X(\hat{\beta}_T - \ddot{\beta}_T);$$

hence
$$\ddot{e}'\ddot{e} = \hat{e}'\hat{e} + (\hat{\beta}_T - \ddot{\beta}_T)'X'X(\hat{\beta}_T - \ddot{\beta}_T) = \hat{e}'\hat{e} + (R\hat{\beta}_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta}_T - r),$$
since $\hat{\beta}_T - \ddot{\beta}_T = (X'X/T)^{-1}R'[R(X'X/T)^{-1}R']^{-1}(R\hat{\beta}_T - r)$. Thus
$$\ddot{e}'\ddot{e} - \hat{e}'\hat{e} = (R\hat{\beta}_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta}_T - r)$$
is the numerator term of the F-test statistic $\varphi$.

$$\varphi = \frac{\ddot{e}'\ddot{e} - \hat{e}'\hat{e}}{q\,\hat{\sigma}_T^2} = \frac{\mathrm{ESS}_c - \mathrm{ESS}_u}{q\,\hat{\sigma}_T^2} = \frac{(\mathrm{ESS}_c - \mathrm{ESS}_u)/q}{\mathrm{ESS}_u/(T-k)} = \frac{(R_u^2 - R_c^2)/q}{(1 - R_u^2)/(T-k)},$$

where the subscripts $c$ and $u$ signify the constrained and unconstrained models, respectively. In other words, the F-test can be interpreted as a test of the loss of fit, because it compares the performance of the constrained and unconstrained models. In particular, if we want to test whether all the coefficients (except the constant term) equal zero, then $R_c^2 = 0$, so that
$$\varphi = \frac{R_u^2/(k-1)}{(1 - R_u^2)/(T-k)} \sim F(k-1, T-k).$$
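Equivalently, $\varphi$ can be obtained from the constrained and unconstrained fits themselves. A minimal sketch (Python/NumPy, with an illustrative restriction that the last coefficient is zero; algebraically this coincides with the direct formula above):

import numpy as np
from scipy import stats

def ols_rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e_hat = y - X @ beta_hat
    return e_hat @ e_hat

rng = np.random.default_rng(3)
T, k, q = 150, 4, 1
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(size=T)

ess_u = ols_rss(y, X)             # unconstrained model
ess_c = ols_rss(y, X[:, :k - q])  # constrained model: last q coefficients set to zero
phi = ((ess_c - ess_u) / q) / (ess_u / (T - k))
print(phi, stats.f.sf(phi, q, T - k))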

Asymptotic Properties of the OLS Estimator

$$\hat{\beta}_T = \left(\frac{1}{T}\sum_{t=1}^T x_t x_t'\right)^{-1}\frac{1}{T}\sum_{t=1}^T x_t y_t = \left(\frac{1}{T}\sum_{t=1}^T x_t x_t'\right)^{-1}\left(\frac{1}{T}\sum_{t=1}^T x_t x_t'\right)\beta_0 + \left(\frac{1}{T}\sum_{t=1}^T x_t x_t'\right)^{-1}\frac{1}{T}\sum_{t=1}^T x_t e_t = \beta_0 + \left(\frac{1}{T}\sum_{t=1}^T x_t x_t'\right)^{-1}\frac{1}{T}\sum_{t=1}^T x_t e_t,$$
so consistency and asymptotic normality hinge on the behavior of the sample averages $T^{-1}\sum_t x_t x_t'$ and $T^{-1}\sum_t x_t e_t$, studied via the limit theorems below.

Asymptotic Normality of OLS Estimator: IID Observations

Kolmogorov's Theorem

Let $\{Z_t\}$ be a sequence of i.i.d. random variables and $\bar{Z}_T \equiv T^{-1}\sum_{t=1}^T Z_t$. Then $\bar{Z}_T \overset{a.s.}{\longrightarrow} \mu$ if and only if $E|Z_t| < \infty$ and $E(Z_t) = \mu$.

Lindeberg-Lévy Central Limit Theorem

Let $\{Z_t\}$ be a sequence of i.i.d. random scalars. If $\mathrm{var}(Z_t) \equiv \sigma^2 < \infty$, $\sigma^2 \neq 0$, then
$$\sqrt{T}(\bar{Z}_T - \bar{\mu}_T)/\bar{\sigma}_T = \sqrt{T}(\bar{Z}_T - \mu)/\sigma = T^{-1/2}\sum_{t=1}^T (Z_t - \mu)/\sigma \overset{A}{\sim} N(0, 1).$$

Asymptotic Normality under the IID Case

Given
[B1] $y = X\beta_0 + e$;
[B2] $\{(x_t', e_t)'\}$ is an i.i.d. sequence;
[B3] (1) $E(x_t e_t) = 0$; (2) $E|x_{ti} e_t|^2 < \infty$, $i = 1, \ldots, k$; (3) $V_T \equiv \mathrm{var}(T^{-1/2}X'e) = V$ is positive definite;
[B4] (1) $E|x_{ti}|^2 < \infty$, $i = 1, \ldots, k$; (2) $M \equiv E(x_t x_t')$ is positive definite;
then $D^{-1/2}\sqrt{T}(\hat{\beta}_T - \beta_0) \overset{A}{\sim} N(0, I)$, where $D \equiv M^{-1} V M^{-1}$.

Suppose in addition that
[B5] there exists $\hat{V}_T$, symmetric and positive semidefinite, such that $\hat{V}_T - V \overset{p}{\longrightarrow} 0$.
Then $\hat{D}_T - D \overset{p}{\longrightarrow} 0$, where $\hat{D}_T = (X'X/T)^{-1}\hat{V}_T (X'X/T)^{-1}$.

Asymptotic Normality of OLS: Independent Heterogeneous Observations

Markov's SLLN

Let $\{Z_t\}$ be a sequence of independent random variables with $E(Z_t) = \mu_t < \infty$. If for some $\delta > 0$, $\sum_{t=1}^{\infty} E|Z_t - \mu_t|^{1+\delta}/t^{1+\delta} < \infty$, then $\bar{Z}_T - \bar{\mu}_T \overset{a.s.}{\longrightarrow} 0$.

Lindeberg-Feller CLT

Let $\{Z_t\}$ be a sequence of independent random scalars with $E(Z_t) = \mu_t$, $\mathrm{var}(Z_t) \equiv \sigma_t^2 < \infty$, $\sigma_t^2 \neq 0$, and distribution functions $F_t(z)$. Then $\sqrt{T}(\bar{Z}_T - \bar{\mu}_T)/\bar{\sigma}_T \overset{A}{\sim} N(0, 1)$ provided that, for every $\epsilon > 0$,
$$\lim_{T \to \infty} \bar{\sigma}_T^{-2}\, T^{-1} \sum_{t=1}^T \int_{(z - \mu_t)^2 > \epsilon T \bar{\sigma}_T^2} (z - \mu_t)^2 \, dF_t(z) = 0.$$
The last condition of this result is called the Lindeberg condition.

Liapounov's CLT

Let $\{Z_t\}$ be a sequence of independent random scalars with $E(Z_t) = \mu_t$, $\mathrm{var}(Z_t) = \sigma_t^2$, $\sigma_t^2 \neq 0$, and $E|Z_t - \mu_t|^{2+\delta} < \Delta < \infty$ for some $\delta > 0$ and all $t$. If $\bar{\sigma}_T^2 > \delta' > 0$ for all $T$ sufficiently large, then $\sqrt{T}(\bar{Z}_T - \bar{\mu}_T)/\bar{\sigma}_T \overset{A}{\sim} N(0, 1)$.

Asymptotic Normality: Independent Heterogeneous Observations

Suppose that the following conditions hold:
[B1] $y_t = x_t'\beta_0 + e_t$, $t = 1, \ldots, T$;
[B2] $\{(x_t', e_t)'\}$ is an independent sequence;
[B3] (1) $E(x_t e_t) = 0$ for all $t$; (2) $E|x_{ti} e_t|^{2+\delta} < \infty$ for some $\delta > 0$ and all $i = 1, \ldots, k$ and $t$; (3) $V_T \equiv \mathrm{var}(X'e/T^{1/2})$ is uniformly positive definite;
[B4] (1) $E|x_{ti}^2|^{1+\delta} < \infty$ for some $\delta > 0$ and all $i = 1, \ldots, k$ and $t$; (2) $M_T \equiv E(X'X/T)$ is uniformly positive definite.
Then $D_T^{-1/2}\sqrt{T}(\hat{\beta}_T - \beta_0) \overset{A}{\sim} N(0, I)$, where $D_T = M_T^{-1} V_T M_T^{-1}$.

Further suppose:
[B5] there exists $\hat{V}_T$, positive semidefinite and symmetric, such that $\hat{V}_T - V_T \overset{p}{\longrightarrow} 0$.
Then, with $\hat{D}_T = (X'X/T)^{-1}\hat{V}_T(X'X/T)^{-1}$, we have $\hat{D}_T - D_T \overset{p}{\longrightarrow} 0$.

Large Sample Tests

We consider various large sample tests for the linear hypothesis $R\beta_0 = r$, where $R$ is a $q \times k$ nonstochastic matrix with rank $q \le k$.

Wald Test

Let $\Gamma_T = R D_T R' = R M_T^{-1} V_T M_T^{-1} R'$. Then under the null hypothesis,
$$\Gamma_T^{-1/2}\sqrt{T}(R\hat{\beta}_T - r) \overset{A}{\sim} N(0, I),$$
and the Wald statistic is
$$W_T = T(R\hat{\beta}_T - r)'\hat{\Gamma}_T^{-1}(R\hat{\beta}_T - r) \overset{A}{\sim} \chi^2(q),$$
where $\hat{\Gamma}_T = R\hat{D}_T R' = R(X'X/T)^{-1}\hat{V}_T(X'X/T)^{-1}R'$.
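A minimal sketch of this robust Wald test (Python/NumPy; the heteroskedasticity-consistent choice of $\hat{V}_T$ anticipates the estimator discussed later in these notes, and the simulated data and hypothesis are illustrative choices of mine):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
e = rng.normal(size=T) * (0.5 + np.abs(X[:, 1]))   # heteroskedastic errors
y = X @ np.array([1.0, 0.5, 0.0]) + e

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

M_hat_inv = np.linalg.inv(X.T @ X / T)
V_hat = (X * e_hat[:, None] ** 2).T @ X / T        # (1/T) sum of e_t^2 x_t x_t'
D_hat = M_hat_inv @ V_hat @ M_hat_inv              # sandwich estimate of D_T

R = np.array([[0.0, 0.0, 1.0]])                    # H0: last coefficient is zero
r = np.array([0.0])
q = R.shape[0]

Gamma_hat = R @ D_hat @ R.T
diff = R @ beta_hat - r
W = T * diff @ np.linalg.inv(Gamma_hat) @ diff     # Wald statistic, ~ chi2(q) under H0
print(W, stats.chi2.sf(W, q))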

Lagrange Multiplier Test

Given the constraint $R\beta = r$, the constrained OLS estimator is obtained by minimizing the Lagrangian $(y - X\beta)'(y - X\beta)/T + (R\beta - r)'\lambda$, where $\lambda$ is the Lagrange multiplier. Intuitively, when the null hypothesis is true (i.e., the constraint is valid), the shadow price $\lambda$ of this constraint should be low. Hence, whether the shadow price is close to zero provides evidence for or against the hypothesis. The Lagrange multiplier (LM) test can be interpreted as a test of $\lambda = 0$.

The solutions are
$$\ddot{\lambda}_T = 2[R(X'X/T)^{-1}R']^{-1}(R\hat{\beta}_T - r), \qquad \ddot{\beta}_T = \hat{\beta}_T - (X'X/T)^{-1}R'\ddot{\lambda}_T/2,$$
where $\ddot{\beta}_T$ is the constrained OLS estimator and $\ddot{\lambda}_T$ is the basis of the LM test.

Suppose that the asymptotic normality of $\hat{\beta}_T$ holds. Then
$$\Lambda_T^{-1/2}\sqrt{T}\,\ddot{\lambda}_T \overset{A}{\sim} N(0, I), \qquad \text{where } \Lambda_T = 4(RM_T^{-1}R')^{-1}\Gamma_T(RM_T^{-1}R')^{-1}.$$
The LM statistic is
$$LM_T = T\ddot{\lambda}_T'\hat{\Lambda}_T^{-1}\ddot{\lambda}_T \overset{A}{\sim} \chi^2(q),$$
where
$$\hat{\Lambda}_T = 4[R(X'X/T)^{-1}R']^{-1}[R(X'X/T)^{-1}\ddot{V}_T(X'X/T)^{-1}R'][R(X'X/T)^{-1}R']^{-1},$$
and $\ddot{V}_T$ is an estimator of $V_T$ obtained from the constrained regression such that $\ddot{V}_T - V_T \overset{p}{\longrightarrow} 0$ under the null.

If $\hat{V}_T$ replaces $\ddot{V}_T$ in $\hat{\Lambda}_T$, then
$$LM_T = 4T(R\hat{\beta}_T - r)'[R(X'X/T)^{-1}R']^{-1}\hat{\Lambda}_T^{-1}[R(X'X/T)^{-1}R']^{-1}(R\hat{\beta}_T - r) = T(R\hat{\beta}_T - r)'\hat{\Gamma}_T^{-1}(R\hat{\beta}_T - r) = W_T.$$
This suggests that the two tests are asymptotically equivalent under the null hypothesis, i.e., $W_T - LM_T \overset{p}{\longrightarrow} 0$.

Test of $s$ coefficients being zero: $[0 \;\; I_s]\beta_0 = 0$. Accordingly, the original model can be written as $y = X_1 b_{10} + X_2 b_{20} + e$, where $X_1$ and $X_2$ are $T \times (k-s)$ and $T \times s$ matrices, respectively. Clearly, the constrained model is $y = X_1 b_{10} + e$, so that the constrained OLS estimator is $\ddot{\beta}_T = (\ddot{b}_{1T}', 0')'$, where $\ddot{b}_{1T} = (X_1'X_1)^{-1}X_1'y$, and the constrained OLS residual vector is $\ddot{e} = y - X_1\ddot{b}_{1T}$.

Writing $P_1 = X_1(X_1'X_1)^{-1}X_1'$, it is easily verified by the partitioned matrix inversion formula that
$$R(X'X)^{-1} = \big[-[X_2'(I - P_1)X_2]^{-1}X_2'X_1(X_1'X_1)^{-1} \quad [X_2'(I - P_1)X_2]^{-1}\big],$$
$$R(X'X)^{-1}R' = [X_2'(I - P_1)X_2]^{-1}, \qquad R(X'X)^{-1}X' = [X_2'(I - P_1)X_2]^{-1}X_2'(I - P_1).$$
Hence $\ddot{\lambda}_T = 2X_2'(I - P_1)\ddot{e}/T = 2X_2'\ddot{e}/T$, and
$$\hat{\Lambda}_T = 4[R(X'X/T)^{-1}R']^{-1}[R(X'X/T)^{-1}\ddot{V}_T(X'X/T)^{-1}R'][R(X'X/T)^{-1}R']^{-1} = 4\big[-X_2'X_1(X_1'X_1)^{-1} \quad I_s\big]\,\ddot{V}_T\,\big[-X_2'X_1(X_1'X_1)^{-1} \quad I_s\big]'.$$

The LM statistic is thus
$$LM_T = \frac{T}{4}\,\ddot{\lambda}_T'\Big[\big[-X_2'X_1(X_1'X_1)^{-1} \quad I_s\big]\,\ddot{V}_T\,\big[-X_2'X_1(X_1'X_1)^{-1} \quad I_s\big]'\Big]^{-1}\ddot{\lambda}_T.$$
When $\ddot{V}_T = \ddot{\sigma}_T^2(X'X/T)$ is consistent for $V_T$, where $\ddot{\sigma}_T^2 = \sum_{t=1}^T \ddot{e}_t^2/T$, the LM statistic can be further simplified as
$$LM_T = \frac{\ddot{e}'X(X'X)^{-1}X'\ddot{e}}{\ddot{e}'\ddot{e}/T} = TR^2,$$
where $R^2$ is the (non-centered) $R^2$ from regressing $\ddot{e}$ on $X$.
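Under conditional homoskedasticity, this $LM_T = TR^2$ form is easy to compute: fit the constrained model, then regress its residuals on the full regressor matrix. A minimal sketch (Python/NumPy; the data and the choice of testing that the last $s$ coefficients are zero are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T, s = 200, 1
X1 = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])   # included regressors
X2 = rng.normal(size=(T, s))                                  # regressors excluded under H0
X = np.hstack([X1, X2])
y = X1 @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=T)      # H0 true in this simulation

# Constrained OLS: regress y on X1 only
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
e_con = y - X1 @ b1

# Auxiliary regression of the constrained residuals on the full X
g = np.linalg.solve(X.T @ X, X.T @ e_con)
fitted = X @ g
r2_uncentered = (fitted @ fitted) / (e_con @ e_con)   # non-centered R^2
LM = T * r2_uncentered                                # ~ chi2(s) under H0
print(LM, stats.chi2.sf(LM, s))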

Likelihood Ratio Test

When the $e_t$ are i.i.d. $N(0, \sigma_0^2)$, we have learned that the OLS estimator is also the MLE maximizing
$$L_T(\beta, \sigma^2) = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^T (y_t - x_t'\beta)^2.$$
Let $\hat{\beta}_T$ and $\ddot{\beta}_T$ be the unconstrained and constrained MLEs of $\beta_0$, and let
$$\ddot{\sigma}_T^2 = \frac{1}{T}\sum_{t=1}^T \ddot{e}_t^2, \qquad \tilde{\sigma}_T^2 = \frac{1}{T}\sum_{t=1}^T \hat{e}_t^2.$$

The likelihood ratio (LR) test is based on the log-likelihood ratio:
$$LR_T = -2\big[L_T(\ddot{\beta}_T, \ddot{\sigma}_T^2) - L_T(\hat{\beta}_T, \tilde{\sigma}_T^2)\big] = T\log\!\left(\frac{\ddot{\sigma}_T^2}{\tilde{\sigma}_T^2}\right).$$
If the null hypothesis is true, the likelihood ratio is close to one, so that $LR_T$ is close to zero; otherwise, $LR_T$ is positive.

As
$$\ddot{\sigma}_T^2 = \tilde{\sigma}_T^2 + (\hat{\beta}_T - \ddot{\beta}_T)'(X'X/T)(\hat{\beta}_T - \ddot{\beta}_T) = \tilde{\sigma}_T^2 + (R\hat{\beta}_T - r)'[R(X'X/T)^{-1}R']^{-1}(R\hat{\beta}_T - r),$$
we have
$$LR_T = T\log\Big(1 + \underbrace{(R\hat{\beta}_T - r)'[R(X'X/T)^{-1}R']^{-1}(R\hat{\beta}_T - r)/\tilde{\sigma}_T^2}_{=:\,z_T}\Big).$$

By noting that the mean-value expansion of $\log(1+z)$ about $z = 0$ is $(1 + z^*)^{-1}z$, where $z^*$ lies between $z$ and $0$, we can write
$$LR_T = T(1 + z_T^*)^{-1}z_T = T(R\hat{\beta}_T - r)'[R(X'X/T)^{-1}R']^{-1}(R\hat{\beta}_T - r)/\tilde{\sigma}_T^2 + o_P(1),$$
where the leading term is nothing but the Wald statistic with $\hat{V}_T = \tilde{\sigma}_T^2(X'X/T)$. We immediately have the following result.

Suppose that $\tilde{\sigma}_T^2(X'X/T)$ is consistent for $V_T$. Then under the null hypothesis, $LR_T \overset{A}{\sim} \chi^2(q)$. Therefore, the Wald, LR, and LM tests are asymptotically equivalent provided that $\tilde{\sigma}_T^2(X'X/T)$ is consistent for $V_T$. If $\tilde{\sigma}_T^2(X'X/T)$ is not consistent for $V_T$, $LR_T$ need not have a limiting $\chi^2$ distribution. Thus, the LR test is not robust to heteroskedasticity and serial correlation, whereas the Wald and LM tests are robust if $V_T$ is estimated properly.
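The LR statistic needs only the two residual variances. A small sketch (Python/NumPy, continuing the exclusion-restriction example with illustrative simulated data):

import numpy as np
from scipy import stats

def residual_variance(y, X):
    """MLE of the error variance (RSS/T) from an OLS fit."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e_hat = y - X @ beta_hat
    return e_hat @ e_hat / len(y)

rng = np.random.default_rng(6)
T, q = 200, 1
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=T)

sigma2_u = residual_variance(y, X)          # unconstrained
sigma2_c = residual_variance(y, X[:, :-q])  # constrained: last q coefficients set to zero
LR = T * np.log(sigma2_c / sigma2_u)        # ~ chi2(q) under H0 (and homoskedasticity)
print(LR, stats.chi2.sf(LR, q))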

Conflict Among Tests

If $\sigma_0^2$ is known, it can be seen that
$$LR_T = \sum_{t=1}^T (\ddot{e}_t^2 - \hat{e}_t^2)/\sigma_0^2 = W_T.$$
We have also learned that the Wald and LM tests differ only in the asymptotic covariance matrix estimator used in the statistics. It follows that when $\sigma_0^2$ is known, $LM_T = W_T = LR_T$. When $\sigma_0^2$ is unknown, write $LR_T(\sigma^2) := \sum_{t=1}^T (\ddot{e}_t^2 - \hat{e}_t^2)/\sigma^2$; then the statistics differ only in which variance estimate is plugged in:
$$W_T = LR_T(\tilde{\sigma}_T^2), \qquad LM_T = LR_T(\ddot{\sigma}_T^2).$$

Observe that
$$LR_T - LM_T = LR_T - LR_T(\ddot{\sigma}_T^2) = 2\big[L_T(\hat{\beta}_T, \tilde{\sigma}_T^2) - L_T(\beta_T^u, \ddot{\sigma}_T^2)\big] \ge 0,$$
where $\beta_T^u$ maximizes $L_T(\beta, \ddot{\sigma}_T^2)$, and the inequality holds because $(\hat{\beta}_T, \tilde{\sigma}_T^2)$ is the unconstrained maximizer of $L_T$; and that
$$W_T - LR_T = LR_T(\tilde{\sigma}_T^2) - LR_T = 2\big[L_T(\ddot{\beta}_T, \ddot{\sigma}_T^2) - L_T(\beta_T^r, \tilde{\sigma}_T^2)\big] \ge 0,$$
where $\beta_T^r$ maximizes $L_T(\beta, \tilde{\sigma}_T^2)$ subject to the constraint $R\beta = r$, and the inequality holds because $(\ddot{\beta}_T, \ddot{\sigma}_T^2)$ is the constrained maximizer of $L_T$. We have thus established an inequality that holds in finite samples:
$$W_T \ge LR_T \ge LM_T.$$

Estimation of the Asymptotic Covariance Matrix

In its most general form, $V_T$ can be written as
$$\mathrm{var}\!\left(\frac{1}{\sqrt{T}}\sum_{t=1}^T x_t e_t\right) = \frac{1}{T}\sum_{t=1}^T \mathrm{var}(x_t e_t) + \frac{1}{T}\sum_{\tau=1}^{T-1}\sum_{t=\tau+1}^T \big[E(x_{t-\tau}e_{t-\tau}e_t x_t') + E(x_t e_t e_{t-\tau}x_{t-\tau}')\big].$$
We have learned that the limiting distributions of the large sample tests discussed in the preceding subsections depend crucially on consistent estimation of $V_T$.

The Case of No Serial Correlation

We have learned that when $\{(x_t', e_t)'\}$ is an independent sequence,
$$\mathrm{var}\!\left(\frac{1}{\sqrt{T}}\sum_{t=1}^T x_t e_t\right) = \frac{1}{T}\sum_{t=1}^T \mathrm{var}(x_t e_t).$$
Let $\hat{V}_T = \sum_{t=1}^T \hat{e}_t^2 x_t x_t'/T$. Since $\hat{e}_t = e_t - x_t'(\hat{\beta}_T - \beta_0)$, it can be seen that when $\hat{\beta}_T$ is consistent for $\beta_0$,
$$\frac{1}{T}\sum_{t=1}^T \hat{e}_t^2 x_t x_t' - \frac{1}{T}\sum_{t=1}^T E(e_t^2 x_t x_t') = \frac{1}{T}\sum_{t=1}^T \big[e_t^2 x_t x_t' - E(e_t^2 x_t x_t')\big] - \frac{2}{T}\sum_{t=1}^T \big[e_t x_t'(\hat{\beta}_T - \beta_0)\big] x_t x_t' + \frac{1}{T}\sum_{t=1}^T \big[(\hat{\beta}_T - \beta_0)'x_t x_t'(\hat{\beta}_T - \beta_0)\big] x_t x_t' \overset{p}{\longrightarrow} 0.$$

Thus, $\hat{V}_T$ is consistent for $V_T$, and
$$\hat{D}_T = \left(\frac{1}{T}\sum_{t=1}^T x_t x_t'\right)^{-1}\left(\frac{1}{T}\sum_{t=1}^T \hat{e}_t^2 x_t x_t'\right)\left(\frac{1}{T}\sum_{t=1}^T x_t x_t'\right)^{-1}$$
is consistent for $D_T$, the asymptotic covariance matrix of $\sqrt{T}(\hat{\beta}_T - \beta_0)$.
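A sketch of this heteroskedasticity-consistent (White-type) estimator as a reusable function (Python/NumPy; the function name and the example data are mine). The square roots of the diagonal of $\hat{D}_T/T$ give the robust standard errors of $\hat{\beta}_T$.

import numpy as np

def white_cov(X, e_hat):
    """Heteroskedasticity-consistent estimate of D_T = M^{-1} V M^{-1}."""
    T = X.shape[0]
    M_hat_inv = np.linalg.inv(X.T @ X / T)
    V_hat = (X * e_hat[:, None] ** 2).T @ X / T   # (1/T) sum of e_t^2 x_t x_t'
    return M_hat_inv @ V_hat @ M_hat_inv

# Example: robust standard errors for an OLS fit
rng = np.random.default_rng(7)
T = 400
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
e = rng.normal(size=T) * (0.5 + np.abs(X[:, 1]))   # heteroskedastic errors
y = X @ np.array([1.0, 0.5, -0.3]) + e
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
D_hat = white_cov(X, e_hat)
robust_se = np.sqrt(np.diag(D_hat) / T)            # since var(beta_hat) is approx. D_hat / T
print(beta_hat, robust_se)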

More generally, if $E(e_t \mid \mathcal{F}_{t-1}) = 0$, where $\mathcal{F}_{t-1} = \sigma((e_{i-1}, x_i')'; i \le t)$ contains the information up to time $t-1$, then for $\tau < t$,
$$E(x_t e_t e_\tau x_\tau') = E\big(x_t E(e_t \mid \mathcal{F}_{t-1}) e_\tau x_\tau'\big) = 0,$$
so that $V_T = \sum_{t=1}^T \mathrm{var}(x_t e_t)/T$. Consequently, $\hat{D}_T$ above is still consistent for $D_T$.

General Case

In the time series context, it is possible that the $x_t e_t$ exhibit serial correlation. If the $x_t e_t$ are asymptotically uncorrelated in the sense that $E(x_t e_t e_{t-\tau} x_{t-\tau}') \to 0$ at a suitable rate as $\tau \to \infty$, then for large $\tau$, $\sum_{t=\tau+1}^T E(x_t e_t e_{t-\tau}x_{t-\tau}')/T$ should be very small. This suggests that $V_T$ may be well approximated by
$$\frac{1}{T}\sum_{t=1}^T \mathrm{var}(x_t e_t) + \frac{1}{T}\sum_{\tau=1}^{m(T)}\sum_{t=\tau+1}^T \big[E(x_{t-\tau}e_{t-\tau}e_t x_t') + E(x_t e_t e_{t-\tau}x_{t-\tau}')\big]$$
for some $m(T)$, where $m(T)$ should grow with $T$ to maintain the approximation property.

In particular, $m(T)$ is required to be $O(T^{1/4})$; i.e., $m(T)$ tends to infinity, but at a rate much slower than $T$. The following estimator is a heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimator:
$$\hat{V}_T = \frac{1}{T}\sum_{t=1}^T \hat{e}_t^2 x_t x_t' + \frac{1}{T}\sum_{\tau=1}^{m(T)}\sum_{t=\tau+1}^T \big(x_{t-\tau}\hat{e}_{t-\tau}\hat{e}_t x_t' + x_t\hat{e}_t\hat{e}_{t-\tau}x_{t-\tau}'\big).$$

The major problem is that this $\hat{V}_T$ need not be positive semidefinite. Newey and West (1987) propose a simple estimator:
$$\check{V}_T = \frac{1}{T}\sum_{t=1}^T \hat{e}_t^2 x_t x_t' + \frac{1}{T}\sum_{\tau=1}^{m(T)} w_m(\tau)\sum_{t=\tau+1}^T \big(x_{t-\tau}\hat{e}_{t-\tau}\hat{e}_t x_t' + x_t\hat{e}_t\hat{e}_{t-\tau}x_{t-\tau}'\big),$$
where $w_m(\tau) = 1 - \tau/(m+1)$ is a weight function. Note that $w_m(\tau)$ is decreasing in $\tau$; hence the larger the $\tau$, the smaller the associated weight. Also note that for fixed $\tau$, $w_m(\tau) \to 1$ as $m \to \infty$.
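A minimal sketch of the Newey-West estimator with Bartlett weights $w_m(\tau) = 1 - \tau/(m+1)$ (Python/NumPy; the AR(1) error simulation and the rule-of-thumb bandwidth are illustrative choices of mine, not prescriptions from these notes):

import numpy as np

def newey_west_V(X, e_hat, m):
    """Newey-West HAC estimate of V_T with Bartlett weights and truncation lag m."""
    T = X.shape[0]
    u = X * e_hat[:, None]                 # rows are x_t' * e_t
    V = u.T @ u / T                        # tau = 0 term: (1/T) sum of e_t^2 x_t x_t'
    for tau in range(1, m + 1):
        w = 1.0 - tau / (m + 1.0)          # Bartlett weight
        Gamma = u[tau:].T @ u[:-tau] / T   # (1/T) sum_{t>tau} x_t e_t e_{t-tau} x_{t-tau}'
        V += w * (Gamma + Gamma.T)
    return V

# Example usage: HAC covariance of sqrt(T)(beta_hat - beta_0)
rng = np.random.default_rng(8)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=(T, 1))])
e = np.zeros(T)
for t in range(1, T):                      # AR(1) errors to induce serial correlation
    e[t] = 0.5 * e[t - 1] + rng.normal()
y = X @ np.array([1.0, 0.5]) + e
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
m = int(np.floor(4 * (T / 100) ** (2 / 9)))          # a common rule-of-thumb bandwidth
M_inv = np.linalg.inv(X.T @ X / T)
D_check = M_inv @ newey_west_V(X, e_hat, m) @ M_inv
print(np.sqrt(np.diag(D_check) / T))                 # HAC standard errors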

Testing the Efficient Market Hypothesis (EMH)

EMH: $E(p_t \mid \Omega_{t-1}) = p_{t-1}$, where $\Omega_{t-1}$ is the information set at time $t-1$. Under the EMH, the relevant information in $\Omega_{t-1}$ is $p_{t-1}$; that is, $E(p_t \mid \Omega_{t-1}) = E(p_t \mid p_{t-1}) = p_{t-1}$. Given a linear model for the conditional mean, $E(p_t \mid p_{t-1}) = \alpha_0 + \beta_0 p_{t-1}$, a linear regression model for observations $t = 1, \ldots, T$ is set up as
$$p_t = \alpha_0 + \beta_0 p_{t-1} + e_t, \qquad t = 1, \ldots, T.$$
Testing the EMH is then equivalent to testing the null hypothesis $H_0: \beta_0 = 1$.

Assumptions to be checked:
[A1]: True model? Yes.
[A2]: Is $p_{t-1}$ nonstochastic? No. This is non-classical regression analysis.
[B2]: Does $\{p_{t-1}^2\}$ obey a WLLN? No, because $p_{t-1}$ is not stationary, so a spurious regression may arise. This can be checked by a data plot or by unit root tests.

What are stationarity and nonstationarity?
1. Strong stationarity: a time series $\{y_t\}$ is strongly stationary if its marginal and joint distributions are time invariant.
2. Weak stationarity: a time series $\{y_t\}$ is weakly stationary if it has a constant mean, a constant variance, and a covariance between $y_t$ and $y_{t+s}$ that depends only on $s$, not on $t$.

It is clear that $\ln p_t$ is nonstationary when $p_t$ is nonstationary. However, the first-order difference of $\ln p_t$, $\Delta \ln p_t = \ln p_t - \ln p_{t-1} = r_t$, which is defined as the return, becomes stationary. Observe that, writing the conditional mean model in logs,
$$\ln p_t = \alpha_0 + \beta_0 \ln p_{t-1}, \qquad \ln p_{t-1} = \alpha_0 + \beta_0 \ln p_{t-2},$$
subtracting gives
$$\ln p_t - \ln p_{t-1} = \beta_0(\ln p_{t-1} - \ln p_{t-2}), \qquad \text{i.e.,} \quad r_t = \beta_0 r_{t-1}.$$
Therefore, the linear regression model we consider becomes
$$r_t = \alpha_0 + \beta_0 r_{t-1} + e_t, \qquad t = 1, \ldots, T.$$

Question again: how do we make reliable statistical inference for the null hypothesis?
[A1]: True model? Yes.
[A2]: Is $r_{t-1}$ nonstochastic? No. This is non-classical regression analysis.
[B2]: Does $\{r_{t-1}^2\}$ obey a WLLN? Yes, because $\{r_{t-1}\}$ is stationary.
[A5]: Is $r_t$ normally distributed? No (this can be checked, for example, in EViews).
[B3]: Does $\{r_{t-1} e_t\}$ obey a CLT? Yes.
Therefore, regression analysis is implementable and the large sample tests are applicable.
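To close the loop, here is a minimal sketch of the large-sample inference described above for the return autoregression $r_t = \alpha_0 + \beta_0 r_{t-1} + e_t$ (Python/NumPy; the returns are simulated rather than taken from any data used in the lecture, and testing whether the lagged-return coefficient is zero is my illustrative way of casting a return-based predictability check, using the heteroskedasticity-robust covariance from the earlier sections):

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
T = 1000
r = rng.standard_t(df=5, size=T + 1) * 0.01    # fat-tailed, serially uncorrelated returns

y = r[1:]                                      # r_t
X = np.column_stack([np.ones(T), r[:-1]])      # constant and r_{t-1}

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

# Heteroskedasticity-robust covariance of sqrt(T)(beta_hat - beta_0)
M_inv = np.linalg.inv(X.T @ X / T)
V_hat = (X * e_hat[:, None] ** 2).T @ X / T
D_hat = M_inv @ V_hat @ M_inv

# Large-sample test on the lagged-return coefficient
se = np.sqrt(D_hat[1, 1] / T)
t_stat = beta_hat[1] / se                      # H0: coefficient on r_{t-1} is zero
W = t_stat ** 2                                # equivalent Wald statistic, ~ chi2(1) under H0
print(beta_hat, t_stat, stats.chi2.sf(W, 1))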