Linear Regression. Junhui Qian. October 27, 2014


Outline: The Model; Estimation (Ordinary Least Squares, Method of Moments, Maximum Likelihood Estimation); Properties of the OLS Estimator (Unbiasedness, Consistency, Efficiency).

What is linear regression? We want to explain an economic variable y using x, which is usually a vector. For example, y may be the wage of an individual, and x includes factors such as experience, education, gender, and so on. Let x = (1, x_1, ..., x_k)', and let its realization for the i-th individual be x_i = (1, x_{i1}, ..., x_{ik})'. We may write: y = β_0 + β_1 x_1 + ... + β_k x_k + u. (1)

Some Terminologies. Now we have y = β_0 + β_1 x_1 + ... + β_k x_k + u. (2) y is called the dependent variable, the explained variable, or the regressand. The elements of x are called the independent variables, explanatory variables, covariates, or regressors. The β's are coefficients; in particular, β_0 is usually called the intercept parameter or simply the constant term, and (β_j, 1 ≤ j ≤ k) are usually called slope parameters. u is called the error term or the disturbance (its sample counterpart is the residual) and represents factors other than x that affect y.

An Example. We may have an econometric model of wages: wage_i = β_0 + β_1 edu_i + β_2 expr_i + u_i, where edu_i denotes the education level of individual i in years of schooling and expr_i denotes the working experience of individual i. β_0 is the constant term or the intercept; it measures what a worker would expect to earn with zero education and zero experience. β_1 is the slope parameter for the explanatory variable edu; it measures the marginal increase in wage if a worker gains an additional year of schooling, holding other factors fixed, or controlling for other factors. u_i may include gender, luck, the family background of the individual, etc.

Partial Effects. (β_j, j = 1, ..., k) can be interpreted as partial effects. For example, for the wage equation we have Δwage = β_1 Δedu + β_2 Δexpr + Δu. If we hold all factors other than edu fixed, then Δwage = β_1 Δedu. So β_1 is the partial effect of education on wage. We say: with a one-unit increase in edu, an individual's wage increases by β_1, holding other factors fixed, or controlling for other factors.

Econometric Clear Thinking. Whenever we make comparisons or inferences, we should hold relevant factors fixed. This is achieved in econometrics by multiple linear regression. The partial-effects interpretation is not without problems: it is partial-equilibrium analysis. We may have the so-called general-equilibrium problem, which arises when a change in one variable changes the structure of the regression equation itself. In most cases, however, partial-effects analysis is a good approximation, or at least the best available alternative.

Classical Linear Regression Assumptions.
(1) Linearity.
(2) Random sampling: (x_i, y_i) are iid across i.
(3) No perfect collinearity: no element of x can be written as a linear combination of the other elements.
(4) Zero conditional mean: E(u|x) = E(u|x_1, x_2, ..., x_k) = 0.
(5) Homoscedasticity: var(u|x) = σ².
(6) Normality: u|x ~ N(0, σ²).

More on Linearity. Linearity can be achieved by transformation. For example, we may have log(wage_i) = β_0 + β_1 log(exper_i) + β_2 log(educ_i) + β_3 female_i + u_i. Now the parameter β_1 represents the elasticity of wage with respect to experience: β_1 = ∂log(wage_i)/∂log(exper_i) = (Δwage_i/wage_i)/(Δexper_i/exper_i).

More on No Perfect Collinearity. True or False?
(1) No perfect collinearity does not allow correlation. For example, the following is perfect collinearity: testscore = β_0 + β_1 eduexpend + β_2 familyincome + u.
(2) The following model suffers from perfect collinearity: cons = β_0 + β_1 income + β_2 income² + u.
(3) The following model suffers from perfect collinearity: log(cons) = β_0 + β_1 log(income) + β_2 log(income²) + u.
(4) The following model suffers from perfect collinearity: cons = β_0 + β_1 husbincome + β_2 wifeincome + β_3 familyincome + u.

More on Zero Conditional Mean. If E(u|x) = 0, we call x exogenous. If E(u|x) ≠ 0, we call x endogenous. The notion of being exogenous or endogenous can be understood in the following model: L = αW + γX + u, where both the employment level (L) and the average wage (W) are endogenous variables, while the foreign exchange rate (X) can be considered exogenous. The error u contains shocks from both the supply side and the demand side.

Endogenous Wage. Suppose the employment level and the average wage are determined by labor supply L^s = bW + v_s, labor demand L^d = aW + cX + v_d, and market clearing L^d = L^s. Then we can solve for the equilibrium employment and wage rate: W = (c/(b - a))X - (v_s - v_d)/(b - a) and L = (bc/(b - a))X - (a v_s - b v_d)/(b - a). It is obvious that cov(W, v_d) ≠ 0 and cov(W, v_s) ≠ 0. Thus W is correlated with u, hence the endogeneity in the econometric sense.

More on Zero Conditional Mean. In econometrics, we call an explanatory variable x endogenous as long as E(u|x) ≠ 0. Usually, a nonzero conditional mean is due to: simultaneity (as in the labor-market example above); omitted variables (e.g., ability in the wage equation); or a wrong functional form (e.g., a missing quadratic term).

More on Homoscedasticity. If var(u_i|x_i) = σ², we call the model homoscedastic. If not, we call it heteroscedastic. Note that var(u_i|x_i) = var(y_i|x_i). If var(y_i|x_i) is a function of some regressor, then there is heteroscedasticity. Examples of heteroscedasticity: income vs. expenditure on meals; gender vs. wage.

Ordinary Least Squares. We have y_i = β_0 + β_1 x_{i1} + ... + β_k x_{ik} + u_i. The OLS method is to find the β's such that the sum of squared residuals (SSR) is minimized: SSR(β_0, ..., β_k) = Σ_{i=1}^n [y_i - (β_0 + β_1 x_{i1} + ... + β_k x_{ik})]². OLS minimizes a measure of fitting error.

First Order Conditions of OLS. To minimize SSR, we write down the first-order conditions of the minimization problem: ∂SSR/∂β_0 = 0, ∂SSR/∂β_1 = 0, ..., ∂SSR/∂β_k = 0.

First Order Conditions of OLS. We obtain:
-2 Σ_{i=1}^n (y_i - (β̂_0 + β̂_1 x_{i1} + ... + β̂_k x_{ik})) = 0,
-2 Σ_{i=1}^n (y_i - (β̂_0 + β̂_1 x_{i1} + ... + β̂_k x_{ik})) x_{i1} = 0,
...,
-2 Σ_{i=1}^n (y_i - (β̂_0 + β̂_1 x_{i1} + ... + β̂_k x_{ik})) x_{ik} = 0.
We have (1 + k) equations for (1 + k) unknowns. If there is no perfect collinearity, we can solve these equations.

OLS for Naive Regression. We may have the following model: y_i = β x_i + u_i. Then the first-order condition is Σ_{i=1}^n (y_i - β̂ x_i) x_i = 0, and we obtain β̂ = Σ_{i=1}^n y_i x_i / Σ_{i=1}^n x_i².

OLS for Simple Regression. The following is called a simple regression: y_i = β_0 + β_1 x_i + u_i. Then the first-order conditions are Σ_{i=1}^n (y_i - (β̂_0 + β̂_1 x_i)) = 0 and Σ_{i=1}^n (y_i - (β̂_0 + β̂_1 x_i)) x_i = 0.

OLS for Simple Regression, Continued. From the first-order conditions, we obtain β̂_1 = Σ_{i=1}^n (y_i - ȳ)(x_i - x̄) / Σ_{i=1}^n (x_i - x̄)² and β̂_0 = ȳ - x̄ β̂_1, where x̄ = (1/n) Σ_{i=1}^n x_i and ȳ = (1/n) Σ_{i=1}^n y_i.
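As an illustration, the two formulas above can be applied directly to data. Below is a minimal sketch in Python/NumPy with simulated data; the variable names, seed, and the "true" coefficients (1 and 2) are assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u          # assumed true beta0 = 1, beta1 = 2

# OLS formulas for the simple regression
beta1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - x.mean() * beta1_hat
print(beta0_hat, beta1_hat)    # close to (1, 2) in a typical draw
```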

More on OLS for Simple Regression. From y = β_0 + β_1 x + u, we have β_1 = cov(x, y)/var(x). And we have obtained β̂_1 = Σ_{i=1}^n (y_i - ȳ)(x_i - x̄) / Σ_{i=1}^n (x_i - x̄)² = [(1/n) Σ_{i=1}^n (y_i - ȳ)(x_i - x̄)] / [(1/n) Σ_{i=1}^n (x_i - x̄)²], which is the sample analogue of cov(x, y)/var(x). Hence β̂_1 reflects the covariance between y and x (scaled by the variance of x); in particular, its sign is the sign of their correlation.

True or False? In the simple regression model y = β_0 + β_1 x + u: (a) β_0 is the mean of y; (b) β_1 is the correlation coefficient between x and y.

Estimated Residual. Let û_i = y_i - (β̂_0 + β̂_1 x_i). From the first-order conditions we have that the sample mean of the residuals is zero, (1/n) Σ_{i=1}^n û_i = 0, and that (1/n) Σ_{i=1}^n x_i û_i = 0.

Connection between Simple and Naive. We have y_i = β̂_0 + β̂_1 x_i + û_i and ȳ = β̂_0 + β̂_1 x̄ + 0. Hence y_i - ȳ = β̂_1 (x_i - x̄) + û_i. Applying the formula for the naive regression to the demeaned variables, we obtain β̂_1 = Σ_{i=1}^n (y_i - ȳ)(x_i - x̄) / Σ_{i=1}^n (x_i - x̄)².

Regression Line. For the simple regression model y = β_0 + β_1 x + u, we can define the regression line y = β̂_0 + β̂_1 x. It is easy to show that ȳ = β̂_0 + β̂_1 x̄.

In Matrix Language. For models with two or more regressors, the expressions for the individual β̂_j are very complicated. However, we can use matrix language to obtain more beautiful and more memorable expressions. Let Y = (y_1, y_2, ..., y_n)', β = (β_0, β_1, ..., β_k)', u = (u_1, u_2, ..., u_n)', and let X be the n × (1 + k) matrix whose i-th row is (1, x_{i1}, ..., x_{ik}). Then we may write the multiple regression as Y = Xβ + u.

Some Special Vectors and Matrices. Vector of ones, ι = (1, 1, ..., 1)': for a vector x of the same length, we have Σ_{i=1}^n x_i = x'ι = ι'x and (1/n) Σ_{i=1}^n x_i = (ι'ι)^{-1} ι'x. Standard basis vectors, e_1 = (1, 0, 0, ..., 0)', e_2 = (0, 1, 0, ..., 0)', etc. Identity matrix, I. Projection matrices: square matrices that satisfy P² = P. If P is also symmetric, it is called an orthogonal projection (e.g., P = X(X'X)^{-1}X'); otherwise it is an oblique projection, e.g., P = X(W'X)^{-1}W', or P = [0 0; α 1]. If P is an orthogonal projection, so is I - P.

Range of a Matrix. The span of a set of vectors is the set of all linear combinations of the vectors. The range of a matrix X, R(X), is the span of the columns of X. R(X)^⊥ is the orthogonal complement of R(X), which contains all vectors that are orthogonal to R(X). Two vectors x and y are orthogonal if x'y = y'x = 0. A vector y is orthogonal to a subspace U if x'y = 0 for all x ∈ U. For example, R((1, 0)') is the x-axis.

Orthogonal Projection on R(X). By definition, the orthogonal projection of y on R(X) can be represented by Xβ, where β is a vector. We denote proj(y|X) = P_X y = Xβ. y - Xβ should be orthogonal to every element of R(X), including every column of X. Then we may solve X'(y - Xβ) = 0 and obtain β = (X'X)^{-1}X'y. Hence P_X = X(X'X)^{-1}X' is the orthogonal projection on R(X). I - P_X is the orthogonal projection on R(X)^⊥, or equivalently, N(X').
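A small numerical check of these properties (idempotence, symmetry, and orthogonality of the residual vector to R(X)); the data below are simulated and purely illustrative, and the explicit inverse is used only to mirror the formula on the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # include a constant column
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T        # orthogonal projection onto R(X)
print(np.allclose(P @ P, P))                # idempotent: P^2 = P
print(np.allclose(P, P.T))                  # symmetric
resid = (np.eye(n) - P) @ y                 # projection of y onto R(X)^perp
print(np.allclose(X.T @ resid, 0))          # residual orthogonal to every column of X
```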

Vector Differentiation. Let z = (z_1, ..., z_k)' be a vector of variables and f(z) a scalar function of z. Then ∂f/∂z = (∂f/∂z_1, ∂f/∂z_2, ..., ∂f/∂z_k)'.

Vector Differentiation. In particular, if f(z) = a'z, where a is a vector of constants, then ∂(a'z)/∂z = a = ∂(z'a)/∂z. If f(z) = Az is a vector-valued function, where A is a matrix, then ∂(Az)/∂z = A.

Vector Differentiation of Quadratic Form. If f(z) = z'Az, where A is a matrix, then ∂(z'Az)/∂z = (A + A')z. If A is symmetric, i.e., A = A', then ∂(z'Az)/∂z = 2Az. In particular, when A = I, the identity matrix, ∂(z'z)/∂z = 2z.

OLS in Matrix Form. The least squares problem can be written as min_β (Y - Xβ)'(Y - Xβ). The first-order condition in matrix form is -2X'(Y - Xβ̂) = 0. Solving for β̂, we get β̂ = (X'X)^{-1}X'Y. The matrix X'X is invertible since we rule out perfect collinearity. Xβ̂ is nothing but the orthogonal projection of Y on R(X). If there is only one regressor and there is no constant term, X is a vector, and the above expression reduces to the naive linear regression estimator.
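The closed-form expression translates directly into code. A sketch with simulated data follows; in practice one would prefer a numerically stabler solver such as np.linalg.lstsq over an explicit inverse, and the comparison below shows the two agree.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])               # assumed true coefficients
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)   # (X'X)^{-1} X'Y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, beta_lstsq)                     # the two agree up to rounding
```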

An Equivalent Derivation. The least squares problem can be written as min_β Σ_{i=1}^n (y_i - x_i'β)², where x_i = (1, x_{i1}, ..., x_{ik})' and β = (β_0, β_1, ..., β_k)'. The first-order condition in matrix form is -2 Σ_{i=1}^n x_i (y_i - x_i'β̂) = 0. Solving for β̂, we get β̂ = (Σ_{i=1}^n x_i x_i')^{-1} (Σ_{i=1}^n x_i y_i).

Equivalence. We can check that X'X = Σ_{i=1}^n x_i x_i' and X'Y = Σ_{i=1}^n x_i y_i. If there is only one regressor and there is no constant term, x_i is a scalar, and the above expression reduces to the naive linear regression estimator.

The Population Moments. From the assumption E(u|x) = 0, we have E(u) = 0 and E(u x_j) = 0, j = 1, ..., k. That is,
E[y - (β_0 + β_1 x_1 + ... + β_k x_k)] = 0,
E[(y - (β_0 + β_1 x_1 + ... + β_k x_k)) x_1] = 0,
...,
E[(y - (β_0 + β_1 x_1 + ... + β_k x_k)) x_k] = 0.
The above equations are called moment conditions.

The Sample Moments. We can estimate population moments by sample moments. For example, the sample moment corresponding to E(u) is (1/n) Σ_{i=1}^n u_i = (1/n) Σ_{i=1}^n (y_i - (β_0 + β_1 x_{i1} + ... + β_k x_{ik})). Similarly, the sample counterpart of E(u x_j) = 0 is (1/n) Σ_{i=1}^n u_i x_{ij} = (1/n) Σ_{i=1}^n (y_i - (β_0 + β_1 x_{i1} + ... + β_k x_{ik})) x_{ij} = 0.

Method of Moments (MM). Plugging the sample moments into the moment conditions, we obtain (1/n) Σ_{i=1}^n (y_i - (β̂_0 + β̂_1 x_{i1} + ... + β̂_k x_{ik})) = 0 and (1/n) Σ_{i=1}^n (y_i - (β̂_0 + β̂_1 x_{i1} + ... + β̂_k x_{ik})) x_{ij} = 0, j = 1, ..., k. These equations are the same as the first-order conditions of OLS.

The Distribution Assumption. Under CLR Assumption (6), u is normally distributed with mean 0 and variance σ². The density function of u is p(u; σ) = (1/(√(2π) σ)) exp(-u²/(2σ²)). Then we can estimate the linear regression model using MLE. More generally, we can assume other distributions for u, the t-distribution for example.

Likelihood Function. By Assumption (2), random sampling, the joint density of (u_1, ..., u_n) is p(u_1, ..., u_n; θ) = p(u_1; θ) p(u_2; θ) ... p(u_n; θ). Given observations (Y, X), the likelihood function is defined as p(β, θ|y, X) = p(y_1 - x_1'β, ..., y_n - x_n'β; θ) = p(y_1 - x_1'β; θ) p(y_2 - x_2'β; θ) ... p(y_n - x_n'β; θ).

Maximum Likelihood Estimation. MLE implicitly assumes that what is observed is what is most likely to happen. MLE solves for β̂ and θ̂ such that the likelihood function is maximized: max_{β,θ} p(β, θ|y, X). In practice, we usually maximize the log-likelihood function: log p(β, θ|y, X) = Σ_{i=1}^n log p(y_i - x_i'β; θ).

MLE of Classical Linear Regression. We assume u_i ~ iid N(0, σ²). The log-likelihood function is log p(β, σ|y, X) = -(n/2) log(2π) - n log(σ) - (1/(2σ²)) Σ_{i=1}^n (y_i - x_i'β)². The first-order condition for β is Σ_{i=1}^n x_i (y_i - x_i'β̂) = 0. This yields the same β̂ as in OLS.
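A quick numerical confirmation that maximizing the Gaussian log-likelihood reproduces the OLS coefficients. This is only a sketch with simulated data; it drops the constant -(n/2)log(2π), parametrizes log(σ) to keep σ positive, and relies on scipy.optimize.minimize with Nelder-Mead, all of which are implementation choices rather than part of the slide.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

def neg_loglik(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)                 # work with log(sigma) so sigma > 0
    resid = y - X @ beta
    return n * np.log(sigma) + np.sum(resid ** 2) / (2 * sigma ** 2)

res = minimize(neg_loglik, x0=np.zeros(3), method="Nelder-Mead")
beta_ols = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(res.x[:2], beta_ols)                    # MLE of beta matches OLS up to optimizer tolerance
```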

Definition. We call an estimator β̂ unbiased if E(β̂) = β. β̂ is a random variable; for example, the OLS estimator β̂ = (X'X)^{-1}X'Y is random since both X and Y are sampled from a population. Given a sample, however, β̂ is determined. So unbiasedness is NOT a measure of how good a particular estimate is, but a property of a good procedure.

The Unbiasedness of the OLS Estimator. Theorem: Under Assumptions (1) through (4), we have E(β̂_j) = β_j, j = 0, 1, ..., k. Proof: E(β̂) = E[(X'X)^{-1}X'Y] = E[(X'X)^{-1}X'(Xβ + U)] = β + E[(X'X)^{-1}X'U] = β, where the last equality follows because E(U|X) = 0 under Assumptions (2) and (4), so E[(X'X)^{-1}X'U] = E[(X'X)^{-1}X' E(U|X)] = 0.
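Unbiasedness is a statement about the average of β̂ over repeated samples, which a small Monte Carlo experiment can illustrate. The design below (sample size, number of replications, true coefficients) is an assumption for the demonstration only.

```python
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, 2.0])                    # assumed true parameters
n, reps = 50, 5000
draws = np.empty((reps, 2))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    draws[r] = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(draws.mean(axis=0))                      # close to (1, 2): E(beta_hat) = beta
```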

Omitted Variable Bias. When we exclude one or more relevant variables, whether mistakenly or due to lack of data, OLS yields biased estimates. This bias is called omitted variable bias. For example, suppose the wage of a worker is determined by both his education and his innate ability: wage = β_0 + β_1 education + β_2 ability + u. Ability, however, is not observable. We may have to estimate the following model instead: wage = β_0 + β_1 education + v, where v = β_2 ability + u.

The General Case. Suppose the true model, in matrix form, is Y = X_1 β_1 + X_2 β_2 + U, (3) where β_1 is the parameter of interest. However, we omit X_2 and estimate Y = X_1 β_1 + V. (4) Denote the OLS estimator of β_1 in (3) as β̂_1 and the OLS estimator of β_1 in (4) as β̃_1. Then E(β̃_1|X_1, X_2) = β_1 + (X_1'X_1)^{-1}X_1'X_2 β_2.

Formula of Omitted-Variable Bias. Suppose we omit only one relevant variable, i.e., X_2 is a vector. Then (X_1'X_1)^{-1}X_1'X_2 is the OLS estimator of the following regression: X_2 = X_1 δ + W. So we have E(β̃_1|X_1, X_2) = β_1 + δ̂ β_2.

A Special Case. Suppose the true model is y = β_0 + β_1 x_1 + β_2 x_2 + u, but we estimate y = β̃_0 + β̃_1 x_1 + ṽ. From the formula of omitted-variable bias, E(β̃_1|x_1, x_2) = β_1 + δ̂_1 β_2, where δ̂_1 is the OLS estimate of δ_1 in x_2 = δ_0 + δ_1 x_1 + w.
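The bias formula can be verified by simulation: generate correlated x_1 and x_2, omit x_2, and compare the short-regression slope with β_1 + δ̂_1 β_2. All numbers below (β_1 = 2, β_2 = 1.5, δ_1 ≈ 0.8, the seed and sample size) are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                    # x2 correlated with x1 (delta1 ~ 0.8)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)    # beta1 = 2, beta2 = 1.5

def slope(x, y):
    # simple-regression slope with a constant term
    return np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)

beta1_short = slope(x1, y)                            # short regression omitting x2
delta1_hat = slope(x1, x2)                            # auxiliary regression of x2 on x1
print(beta1_short, 2.0 + delta1_hat * 1.5)            # both close to 2 + 0.8 * 1.5 = 3.2
```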

Bias Up or Down? We have E(β̃_1|x_1, x_2) = β_1 + δ̂_1 β_2. Recall that δ̂_1 has the sign of the correlation between x_1 and x_2. Hence the direction of the OLS bias is:

OLS bias     corr(x_1, x_2) > 0     corr(x_1, x_2) < 0
β_2 > 0      upward (positive)      downward (negative)
β_2 < 0      downward (negative)    upward (positive)

Return to Education. Back to our example: suppose the wage of a worker is determined by both his education and his innate ability, wage = β_0 + β_1 education + β_2 ability + u. Ability, however, is not observable, so we may have to estimate wage = β_0 + β_1 education + v, where v = β_2 ability + u. Are we going to overestimate or underestimate the return to education?

Definition. We say β̂ is consistent if β̂ →_p β as n → ∞. This basically says that if we observe more and more data, we can estimate our model more and more accurately, approaching the truth in the limit.

Law of Large Numbers. Let x_1, x_2, ..., x_n be iid random variables with mean µ. Then (1/n) Σ_{i=1}^n x_i →_p µ.

LLN for Vectors and Matrices. The x_i in the LLN can be vectors. If E(x_i) = E(x_{i,1}, x_{i,2}, ..., x_{i,k})' = (µ_1, µ_2, ..., µ_k)' = µ, then (1/n) Σ_{i=1}^n x_i →_p µ. The same is also true for matrices.

Consistency of the OLS Estimator. We have β̂ = β + ((1/n) Σ_{i=1}^n x_i x_i')^{-1} ((1/n) Σ_{i=1}^n x_i u_i). If E(x_i x_i') = Q, a nonsingular matrix, and E(x_i u_i) = 0, then by the LLN we have β̂ →_p β.
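Consistency can also be seen numerically: the OLS estimate settles down to the true value as the sample size grows. A sketch with simulated data and assumed true coefficients (1, 2):

```python
import numpy as np

rng = np.random.default_rng(6)
beta = np.array([1.0, 2.0])
for n in [100, 1_000, 10_000, 100_000]:
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)
    print(n, beta_hat)          # estimates approach (1, 2) as n grows
```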

Inconsistency of the OLS Estimator. When E(x_i u_i) ≠ 0, we have β̂ →_p β + Q^{-1} E(x_i u_i). Inconsistency occurs when x_i is correlated with u_i, that is, when x is endogenous. The term Q^{-1} E(x_i u_i) is called the asymptotic bias.

Relative Efficiency. If θ̂ and θ̃ are two unbiased estimators of θ, θ̂ is efficient relative to θ̃ if var(θ̂) ≤ var(θ̃) for all θ, with strict inequality for at least one θ. Relative efficiency compares the precision of estimators. Example: suppose we want to estimate the population mean µ of an iid sample {x_i, i = 1, ..., n}. Both x̄ and x_1 are unbiased estimators; however, x̄ is more efficient since var(x̄) = var(x_1)/n ≤ var(x_1). If θ is a vector, we compare the covariance matrices of θ̂ and θ̃ in the sense of positive definiteness.
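A short Monte Carlo comparison of the two unbiased estimators of µ mentioned above (the sample mean x̄ versus the single observation x_1); the normal distribution, µ = 3, σ = 2 and n = 25 are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 25, 20_000
sample = rng.normal(loc=3.0, scale=2.0, size=(reps, n))   # assumed true mu = 3, sigma = 2
xbar, x1 = sample.mean(axis=1), sample[:, 0]
print(xbar.var(), x1.var())     # roughly 4/25 = 0.16 versus 4: xbar is far more precise
```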

Covariance Matrix of a Random Vector. The variance of a scalar random variable x is var(x) = E(x - Ex)². If x is a vector with two elements, x = (x_1, x_2)', then the variance of x is a 2-by-2 matrix, which we call the covariance matrix: Σ_x = [var(x_1), cov(x_1, x_2); cov(x_2, x_1), var(x_2)], where cov(x_1, x_2) is the covariance between x_1 and x_2: cov(x_1, x_2) = E[(x_1 - Ex_1)(x_2 - Ex_2)].

Covariance Matrix of a Random Vector. More generally, if x = (x_1, x_2, ..., x_n)' is a vector with n elements, then the covariance matrix of x is the n-by-n matrix Σ_x = [var(x_1), cov(x_1, x_2), ..., cov(x_1, x_n); cov(x_2, x_1), var(x_2), ..., cov(x_2, x_n); ...; cov(x_n, x_1), cov(x_n, x_2), ..., var(x_n)].

Covariance Matrix of a Random Vector. The covariance matrix is the second central moment of a random vector: Σ_x = E[(x - Ex)(x - Ex)']. It is obvious that Σ_x is a symmetric matrix.

The Formula of the Covariance Matrix. Given random vectors x and y, if y = Ax, where A is a constant matrix, then Σ_y = A Σ_x A'.

Covariance Matrix of the Error Term. Write the original linear regression model as y_i = x_i'β + u_i, where x_i = (1, x_{i1}, ..., x_{ik})' and β = (β_0, β_1, ..., β_k)'. We have E(u_i) = 0 (Assumption (4), zero conditional mean), E(u_i²) = σ² (Assumption (5), homoscedasticity), and E(u_i u_j) = 0 for i ≠ j (Assumption (2), random sampling). What is the covariance matrix of U = (u_1, u_2, ..., u_n)'?

Covariance Matrix of the OLS Estimator. The covariance matrix of U = (u_1, u_2, ..., u_n)' is Σ_u = σ²I, where I is the identity matrix. We have β̂ = β + (X'X)^{-1}X'U. The covariance matrix of β̂ is then Σ_β̂ = σ²(X'X)^{-1}. The square roots of the diagonal elements of Σ_β̂ give the standard errors of the β̂_j. If β̃ is another unbiased estimator of β with covariance matrix Σ_β̃, we say β̂ is more efficient than β̃ if Σ_β̃ - Σ_β̂ is positive semi-definite for all β, with strict positive definiteness for at least one β.
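In practice σ² is unknown, so Σ_β̂ is estimated by replacing it with an estimate of σ². The sketch below uses the usual degrees-of-freedom-adjusted estimator σ̂² = SSR/(n - k - 1), which is a standard choice not derived in these slides; the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])     # SSR / (n - k - 1), with k + 1 columns in X
cov_beta = sigma2_hat * XtX_inv                   # estimate of sigma^2 (X'X)^{-1}
se = np.sqrt(np.diag(cov_beta))                   # standard errors of beta_hat
print(beta_hat, se)
```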

Simple Regression. For a simple regression y = β_0 + β_1 x + u, we can obtain Σ_β̂ = (σ² / (n Σ_i (x_i - x̄)²)) [Σ_i x_i², -Σ_i x_i; -Σ_i x_i, n]. The variance of β̂_1 is var(β̂_1) = σ² / Σ_i (x_i - x̄)². The smaller σ² is, the more accurate β̂_1 is; the more variation in x, the more accurate β̂_1 is; and the larger the sample size, the more accurate β̂_1 is.

Is OLS a Good Estimator? Define what "good" means: Is it unbiased? Is it consistent? Does it have a small variance?

Gauss-Markov Theorem. Theorem: Under Assumptions (1)-(5), OLS is BLUE (the Best Linear Unbiased Estimator). Define "best": smallest variance. Define "linear": β̃_j = Σ_{i=1}^n w_{ij} y_i. (5) And unbiasedness: E(β̃) = β. The message: we need not look for alternatives that are unbiased and of the form (5).

Time Series Regression Assumptions.
(1) Linearity: y_t = β_0 + β_1 x_{1t} + ... + β_k x_{kt} + u_t.
(2) (x_t, y_t) are jointly stationary and ergodic.
(3) No perfect collinearity.
(4) Past and contemporary exogeneity: E(u_t|x_t, x_{t-1}, ...) = 0.

Stationarity. Weak stationarity: E(x_t) = µ and cov(x_t, x_{t-τ}) = γ_τ for τ = ..., -2, -1, 0, 1, 2, .... Strict stationarity: F(X_t, ..., X_T) = F(X_{t+τ}, ..., X_{T+τ}), where F is the joint distribution function.

Ergodicity. An ergodic time series (x_t) has the property that x_t and x_{t-k} are approximately independent when k is large. If (x_t) is stationary and ergodic, then a law of large numbers holds: (1/n) Σ_{t=1}^n x_t → E(x_t) a.s.

Exogeneity in the Time Series Context. Strict exogeneity: E(u_t|X) = E(u_t|..., x_{t+1}, x_t, x_{t-1}, ...) = 0. Past and contemporary exogeneity: E(u_t|x_t, x_{t-1}, ...) = 0.

Consistency of OLS Under the Time Series Regression Assumptions (1)-(4), the OLS estimator of the time series regression is consistent.

Special Cases. Autoregressive models (AR): y_t = β_0 + β_1 y_{t-1} + ... + β_p y_{t-p} + u_t. Autoregressive distributed lag models (ARDL): y_t = β_0 + β_1 y_{t-1} + ... + β_p y_{t-p} + γ_1 x_{t-1} + ... + γ_q x_{t-q} + u_t. Autoregressive models with exogenous variables (ARX): y_t = β_0 + β_1 y_{t-1} + ... + β_p y_{t-p} + γ_1 x_t + ... + γ_q x_{t-q+1} + u_t, where (x_t) is past and contemporary exogenous.

Beat OLS in Efficiency. OLS is consistent, but it is not efficient in general: u_t may be serially correlated and/or heteroscedastic. In such cases, GLS would be a better alternative. A simple way to account for serial correlation is to model u_t explicitly as an ARMA process: y_t = x_t'β + u_t, where u_t ~ ARMA(p, q). But OLS is no longer able to estimate this model; instead, nonlinear least squares or MLE should be used.

Granger Causality. Granger causality means that if x "causes" y, then past x is a useful predictor of y_t. Granger causality test: in the model y_t = β_0 + β_1 y_{t-1} + ... + β_p y_{t-p} + γ_1 x_{t-1} + ... + γ_q x_{t-q} + u_t, we test H_0: γ_1 = ... = γ_q = 0. The above test would more appropriately be called a non-causality test, or, even more precisely, a non-predictability test. Example: the monetary cause of inflation, π_t = β_0 + β_1 π_{t-1} + ... + β_p π_{t-p} + γ_1 M1_{t-1} + ... + γ_q M1_{t-q} + u_t.
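A minimal sketch of the Granger (non-)causality test with one lag of each variable (p = q = 1), using simulated series in which y depends on lagged x by construction. The F statistic compares the restricted and unrestricted sums of squared residuals; the data-generating coefficients and sample length are assumptions for the demo.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(9)
T = 500
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):                                 # y depends on lagged x by construction
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.3 * y[t - 1] + 0.4 * x[t - 1] + rng.normal()

def ssr(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

yt, ylag, xlag = y[1:], y[:-1], x[:-1]
X_u = np.column_stack([np.ones(T - 1), ylag, xlag])   # unrestricted: lags of y and x
X_r = np.column_stack([np.ones(T - 1), ylag])         # restricted: gamma_1 = 0
q, k = 1, X_u.shape[1]
F = ((ssr(X_r, yt) - ssr(X_u, yt)) / q) / (ssr(X_u, yt) / (T - 1 - k))
print(F, f.sf(F, q, T - 1 - k))                       # large F / tiny p-value: reject non-causality
```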