LECTURE 2 LINEAR REGRESSION MODEL AND OLS

SEPTEMBER 29, 2014

Definitions

A common question in econometrics is to study the effect of one group of variables X_i, usually called the regressors, on another variable Y_i, the dependent variable. An econometrician observes the random data:

    (Y_1, X_1), (Y_2, X_2), ..., (Y_n, X_n),                                           (1)

where, for i = 1, ..., n, Y_i is a random variable and X_i is a random k-vector:

    X_i = (X_{i1}, X_{i2}, ..., X_{ik})'.

A pair (Y_i, X_i) is called an observation, and the collection of observations in (1) is called the sample. The vector X_i collects the values of k variables for observation i. The joint distribution of (1) is called the population. The population does not correspond to any physical population, but to a probability space.

In a cross-sectional framework, where each observation is a different individual, firm, etc., it is often natural to assume that all observations are independently drawn from the same distribution. In this case, the population is described by the distribution of a single observation (Y_1, X_1), which can be stated as: (Y_i, X_i) are iid for i = 1, ..., n. Note that the iid assumption does not imply that Y_i and X_i are independent, but rather that the random vector (Y_i, X_i) is independent of (Y_j, X_j) for i ≠ j. At the same time, Y_i and X_i can still be related.

In cross-sections, the relationship between the regressors and the dependent variable is modelled through the conditional expectation E(Y_i | X_i). The deviation of Y_i from its conditional expectation is called the error or residual:

    U_i = Y_i − E(Y_i | X_i).                                                          (2)

Contrary to X_i and Y_i, the residual U_i is not observable, since the conditional expectation function is unknown to the econometrician. In the parametric framework, it is assumed that the conditional expectation function depends on a number of unknown constants or parameters, and that the functional form of E(Y_i | X_i) is known. In the linear regression model, it is assumed that E(Y_i | X_i) is linear in the parameters:

    E(Y_i | X_i) = β_1 X_{i1} + β_2 X_{i2} + ... + β_k X_{ik} = X_i'β,                 (3)

where β = (β_1, β_2, ..., β_k)' is a k-vector of unknown constants. The linearity of E(Y_i | X_i) can be justified, for example, by saying that (Y_i, X_i) jointly has a multivariate normal distribution. Since β_j = ∂E(Y_i | X_i)/∂X_{ij}, the vector β is a vector of marginal effects of X_i, i.e. β_j gives the change in the conditional mean of Y_i per unit change in X_{ij}, while holding the values of the other variables X_{il} for l ≠ j fixed. One of the objectives is estimation of the unknown β from the sample (1). Note that, combining equations (2) and (3), one can write:

    Y_i = X_i'β + U_i.                                                                 (4)
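To make the model concrete, here is a minimal simulation sketch of the data generating process (4) in Python/NumPy. All numerical choices (n = 500, k = 3, β = (1, 2, −0.5)', σ² = 1) are hypothetical values picked only for illustration; they are not part of the lecture. Later sketches in these notes reuse the objects created here.

```python
# A minimal sketch of the data generating process (4): Y_i = X_i'β + U_i.
# All parameter values below are hypothetical and chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3
beta = np.array([1.0, 2.0, -0.5])              # unknown in practice; fixed here to simulate
X = np.column_stack([np.ones(n),               # first regressor: a column of ones (intercept)
                     rng.normal(size=(n, k - 1))])
U = rng.normal(scale=1.0, size=n)              # errors with E(U_i | X_i) = 0 and variance 1
Y = X @ beta + U                               # the econometrician observes only (Y_i, X_i)
```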

By definition (2), E(U_i | X_i) = 0. This implies that the regressors contain no information on the deviation of Y_i from its conditional expectation. Further, the Law of Iterated Expectations (LIE) implies that the residuals have zero mean: E(U_i) = 0. If (Y_i, X_i) are iid, then the residuals {U_i : i = 1, ..., n} are iid as well.

In the classical regression model, it is assumed that the variance of the errors U_i is independent of the regressors and the same for all observations:

    Var(U_i | X_i) = σ², for some constant σ² > 0.

This property is called homoskedasticity.

Assumptions

In this section, we formally define the linear regression model. Let's define X as the n × k matrix whose i-th row is X_i':

    X = (X_1, X_2, ..., X_n)' =
        [ X_{11}  X_{12}  ...  X_{1k} ]
        [ X_{21}  X_{22}  ...  X_{2k} ]
        [   ...     ...   ...    ...  ]
        [ X_{n1}  X_{n2}  ...  X_{nk} ],

and Y = (Y_1, Y_2, ..., Y_n)', U = (U_1, U_2, ..., U_n)'.

The following are the four classical regression assumptions:

    A1: Y = Xβ + U.
    A2: E(U | X) = 0 a.s.
    A3: Var(U | X) = σ² I_n a.s.
    A4: rank(X) = k a.s.¹

¹ The column (row) rank of a matrix is the maximum number of linearly independent columns (rows). One can show that, for any matrix, the column and row ranks are equal. If A is an n × k matrix, then rank(A) ≤ min(n, k). If rank(A) = n or rank(A) = k, we say that A has full row (column) rank. Properties: rank(A) = rank(A') = rank(A'A) = rank(AA'), rank(AB) ≤ min(rank(A), rank(B)), and rank(AB) = rank(A) if B is square and of full rank.
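Continuing the hypothetical simulation sketch above, Assumption A4 can be checked numerically: the simulated design matrix has full column rank, while appending a regressor that is a linear combination of existing columns violates the condition.

```python
# A sketch checking Assumption A4 on the simulated design matrix from above.
print(np.linalg.matrix_rank(X))                      # 3 = k: full column rank, A4 holds
X_bad = np.column_stack([X, X[:, 1] + X[:, 2]])      # add a column that duplicates information
print(np.linalg.matrix_rank(X_bad))                  # still 3 < 4: A4 is violated for X_bad
```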

Instead of conditioning on the observed values of the regressors, in the classical regression model one can assume that X is not random, i.e. the values of X are fixed in repeated samples. In this case, A2 is replaced by E(U) = 0 and A3 by Var(U) = σ² I_n. Since conditioning on X is essentially equivalent to treating the regressors as fixed, the two sets of assumptions lead to the same results.

For inference purposes, it is sometimes assumed that:

    A5: U | X ~ N(0, σ² I_n).

In the case of fixed regressors, it is assumed instead that the unconditional distribution of the errors is normal. Assumptions A1-A5 define the classical normal regression model. In this case, Y | X ~ N(Xβ, σ² I_n). Note that since all off-diagonal covariance elements in A5 are zeros, A5 implies independence of the residuals. Assumptions A1-A4 alone do not imply independence across observations. Actually, for many important results, independence across observations is not required. Nevertheless, sometimes we would like to assume independence without normality:

    A6: {(Y_i, X_i) : i = 1, ..., n} are iid.

In the case of fixed regressors, A6 can be replaced with the assumption that {U_i : i = 1, ..., n} are iid.

Assumption A2 says that U is mean independent of X. This is actually a very strong assumption. As we have mentioned above, it is essentially equivalent to assuming that the regressors are not random, which is hard to justify in many economic applications. However, many important results may be obtained with the weaker assumption of uncorrelatedness:

    A2*: For i = 1, ..., n, E(U_i X_i) = 0 and E(U_i) = 0.

Note that under A2*, the expression X_i'β does not have the interpretation of a conditional expectation. In this case, one can treat (4) as a data generating process.

Assumption A3 implies that the residuals U_i have the same variance for all i and are uncorrelated with each other: E(U_i U_j | X) = 0 for i ≠ j. If desired, independence of the residuals may be obtained through A5, or through Assumptions A1 and A6.

Assumption A4 says that the columns of X are not linearly dependent. If it is violated, then one or more regressors simply duplicate the information contained in the other regressors, and thus should be omitted.

Often, the first column of the matrix X is a column of ones. In this case, its coefficient β_1 is called the intercept. The intercept gives the average value of the dependent variable when all regressors are equal to zero.

Estimation by the method of moments

One of the objectives of econometric analysis is estimation of the unknown parameters β and σ². An estimator is any function of the sample {(Y_i, X_i) : i = 1, ..., n}. An estimator can depend on the unknown residuals U_i or on unknown parameters like β only through the observed variables Y and X. An estimator is usually not unique, i.e. there exist a number of alternative estimators for the same parameter.

One of the oldest methods of finding estimators is the method of moments (MM). The MM says to replace population moments (expectations) with the corresponding sample moments (averages). Assumptions A2 or A2* imply that, at the true value of β,

    0 = E(U_i X_i) = E[(Y_i − X_i'β) X_i].                                             (5)
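As a sanity check within the running simulation sketch, the sample analogue of the moment condition (5) evaluated at the true β should be close to zero; it equals zero exactly only in the population.

```python
# Sample analogue of (5) at the true β of the simulation: n^{-1} Σ_i (Y_i − X_i'β) X_i.
moment = X.T @ (Y - X @ beta) / n
print(moment)        # approximately zero, up to sampling noise of order 1/sqrt(n)
```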

Let β̂ be an estimator of β. According to the MM, we replace the expectation in (5) with the sample average:

    0 = n^{-1} Σ_{i=1}^n (Y_i − X_i'β̂) X_i = n^{-1} Σ_{i=1}^n X_i Y_i − (n^{-1} Σ_{i=1}^n X_i X_i') β̂.

Note that Y_i − X_i'β̂ is a scalar. Solving for β̂, one obtains:

    β̂ = (n^{-1} Σ_{i=1}^n X_i X_i')^{-1} (n^{-1} Σ_{i=1}^n X_i Y_i)
      = (Σ_{i=1}^n X_i X_i')^{-1} Σ_{i=1}^n X_i Y_i
      = (X'X)^{-1} X'Y.                                                                (6)

The matrix Σ_{i=1}^n X_i X_i' = X'X is invertible due to Assumption A4. To show that X'X = Σ_{i=1}^n X_i X_i', note that

    X'X = (X_1, X_2, ..., X_n)(X_1, X_2, ..., X_n)' = X_1 X_1' + X_2 X_2' + ... + X_n X_n' = Σ_{i=1}^n X_i X_i'.

The expression X_i'β̂ gives the estimated regression line, with Ŷ_i = X_i'β̂ being the predicted value of Y_i, and Û_i = Y_i − X_i'β̂ being the sample residual; in matrix form, Û = Y − Xβ̂. The vector Û is a function of the estimator of β. In the case of the MM estimator, the sample residuals have to satisfy the sample normal equations:

    0 = X'Û = Σ_{i=1}^n Û_i X_i = ( Σ_{i=1}^n Û_i X_{i1},  Σ_{i=1}^n Û_i X_{i2},  ...,  Σ_{i=1}^n Û_i X_{ik} )'.      (7)

If the model contains an intercept, i.e. X_{i1} = 1 for all i, then the normal equations imply that Σ_{i=1}^n Û_i = 0.
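Within the same simulation sketch, the estimator (6) and the normal equations (7) can be verified directly; np.linalg.solve is used rather than an explicit matrix inverse, a standard numerical choice.

```python
# Computing β̂ = (X'X)^{-1} X'Y on the simulated data and checking the normal equations (7).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # avoids forming (X'X)^{-1} explicitly
U_hat = Y - X @ beta_hat                       # sample residuals Û
print(beta_hat)                                # close to the β used in the simulation
print(X.T @ U_hat)                             # ≈ 0: X'Û = 0, the normal equations
print(U_hat.sum())                             # ≈ 0, since the model contains an intercept
```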

In order to estimate σ², write:

    σ² = E(U_i²) = E(Y_i − X_i'β)².

Since β is unknown, we must replace it by its MM estimator:

    σ̂² = n^{-1} Σ_{i=1}^n (Y_i − X_i'β̂)².

Least Squares

Let b be an estimator of β. The ordinary least squares (OLS) estimator is an estimator of β that minimizes the sum-of-squared-errors function:

    S(b) = Σ_{i=1}^n (Y_i − X_i'b)² = (Y − Xb)'(Y − Xb).

It turns out that β̂, the MM estimator presented in the previous section, is the OLS estimator as well. In order to show this, write

    S(b) = (Y − Xb)'(Y − Xb)
         = (Y − Xβ̂ + Xβ̂ − Xb)'(Y − Xβ̂ + Xβ̂ − Xb)
         = (Y − Xβ̂)'(Y − Xβ̂) + (Xβ̂ − Xb)'(Xβ̂ − Xb) + 2(Y − Xβ̂)'(Xβ̂ − Xb)
         = (Y − Xβ̂)'(Y − Xβ̂) + (β̂ − b)'X'X(β̂ − b) + 2Û'X(β̂ − b)
         = (Y − Xβ̂)'(Y − Xβ̂) + (β̂ − b)'X'X(β̂ − b),

where the cross-product term 2Û'X(β̂ − b) equals zero because of the normal equations. Minimization of S(b) is therefore equivalent to minimization of (β̂ − b)'X'X(β̂ − b), because (Y − Xβ̂)'(Y − Xβ̂) is not a function of b. If X is of full column rank, as assumed in A4, then X'X is a positive definite matrix, and therefore (β̂ − b)'X'X(β̂ − b) ≥ 0, with equality if and only if β̂ − b = 0.

Alternatively, one can show that β̂ as defined in (6) is the OLS estimator by taking the derivative of S(b) with respect to b and solving the first-order condition dS(b)/db = 0. Write

    S(b) = Y'Y − 2b'X'Y + b'X'Xb.
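In the running simulation, the MM estimate of σ² and the minimizing property of β̂ can be checked numerically; the perturbation used below is an arbitrary illustrative choice.

```python
# σ̂² = n^{-1} Σ (Y_i − X_i'β̂)² and a numerical check that β̂ minimizes S(b).
def S(b):
    e = Y - X @ b
    return e @ e                                   # (Y − Xb)'(Y − Xb)

sigma2_hat = S(beta_hat) / n
print(sigma2_hat)                                  # close to the σ² = 1 used in the simulation
b_alt = beta_hat + rng.normal(scale=0.1, size=k)   # an arbitrary alternative candidate value
print(S(beta_hat) <= S(b_alt))                     # True: β̂ gives the smallest sum of squares
```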

Using the facts that ∂(a'x)/∂x = a and that, for a symmetric matrix A, ∂(x'Ax)/∂x = 2Ax, the first-order condition is

    ∂S(b)/∂b evaluated at b = β̂:   −2X'Y + 2X'Xβ̂ = 0.                                 (8)

Solving for β̂, one obtains:

    β̂ = (X'X)^{-1} X'Y.                                                                (9)

Note also that the first-order condition (8) can be written as X'(Y − Xβ̂) = 0, which gives us the normal equations (7).

Properties of β̂

1. β̂ is a linear estimator. An estimator b is linear if it can be written as b = AY, where A is some matrix which depends on X alone and does not depend on Y. In the case of OLS, A = (X'X)^{-1}X'.

2. Under Assumptions A1, A2 and A4, β̂ is an unbiased estimator, i.e. E(β̂) = β. In order to show unbiasedness, first plug the expression for Y in Assumption A1 into equation (9):

    β̂ = (X'X)^{-1}X'(Xβ + U) = β + (X'X)^{-1}X'U.                                      (10)

Next, consider the conditional expectation of β̂ given X:

    E(β̂ | X) = E(β + (X'X)^{-1}X'U | X) = β + E((X'X)^{-1}X'U | X).

Now, E((X'X)^{-1}X'U | X) = (X'X)^{-1}X'E(U | X) = 0, since, by Assumption A2, E(U | X) = 0. Therefore,

    E(β̂ | X) = β.                                                                      (11)

Finally, the LIE implies that E(β̂) = E(E(β̂ | X)) = β. Equation (11) shows that β̂ is conditionally unbiased given X. Note that unbiasedness cannot be obtained under Assumption A2*.
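A short Monte Carlo sketch of conditional unbiasedness in the running simulation: the design matrix X is held fixed, the errors are redrawn many times, and the average of β̂ across replications is compared with β. The number of replications (2000) is an arbitrary choice.

```python
# Monte Carlo illustration of E(β̂ | X) = β: redraw U with X fixed, recompute β̂, average.
reps = 2000
draws = np.empty((reps, k))
for r in range(reps):
    U_r = rng.normal(size=n)                   # fresh errors, same design matrix X
    Y_r = X @ beta + U_r
    draws[r] = np.linalg.solve(X.T @ X, X.T @ Y_r)
print(draws.mean(axis=0))                      # approximately equal to β
```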

3. Under Assumptions A1, A2 and A4,

    Var(β̂ | X) = (X'X)^{-1} X' E(UU' | X) X (X'X)^{-1}.

Under homoskedastic errors (Assumption A3), the expression for the conditional variance simplifies to

    Var(β̂ | X) = σ² (X'X)^{-1}.

To show this, first, by the definition of the variance, we have:

    Var(β̂ | X) = E[ (β̂ − E(β̂ | X)) (β̂ − E(β̂ | X))' | X ]
               = E[ (β̂ − β)(β̂ − β)' | X ]
               = E[ (X'X)^{-1} X' UU' X (X'X)^{-1} | X ]          (follows from (10))
               = (X'X)^{-1} X' E(UU' | X) X (X'X)^{-1}.

Under homoskedasticity, E(UU' | X) = σ² I_n, and

    (X'X)^{-1} X' E(UU' | X) X (X'X)^{-1} = (X'X)^{-1} X' (σ² I_n) X (X'X)^{-1} = σ² (X'X)^{-1} X'X (X'X)^{-1} = σ² (X'X)^{-1}.

In the case of fixed regressors, Var(β̂) = σ² (X'X)^{-1}.

4. If we add normality of the errors, i.e. under Assumptions A1-A5, we obtain the following result:

    β̂ | X ~ N(β, σ² (X'X)^{-1}).

It is sufficient to show that, conditional on X, the distribution of β̂ is normal; then β̂ | X ~ N(E(β̂ | X), Var(β̂ | X)), with the conditional mean and variance given above. Normality of β̂ | X follows from the facts that β̂ is a linear function of Y, and that Y | X is normal by Assumption A5 (see the Normal distribution section in Lecture 1). Again, in the case of fixed regressors, we simply omit the conditioning on X: β̂ ~ N(β, σ² (X'X)^{-1}).

5. Efficiency, or the Gauss-Markov Theorem: Under Assumptions A1-A4, the OLS estimator is the Best Linear Unbiased Estimator (BLUE) of β, where "best" means having the smallest variance, i.e. for any linear and unbiased estimator b, we have that Var(b | X) − Var(β̂ | X) is a positive semi-definite matrix:

    Var(b | X) − Var(β̂ | X) ≥ 0.

Furthermore, if β̃ is a linear unbiased estimator and Var(β̃ | X) = Var(β̂ | X), then β̃ = β̂ with probability one. Note that since the theorem discusses the conditional variance of the OLS estimator, unbiasedness in the statement of the theorem actually refers to unbiasedness conditional on X, i.e. E(b | X) = β.

Proof: Let b be a linear and unbiased estimator of β:

    b = AY,   E(b | X) = β.
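The Monte Carlo draws from the previous sketch also let us check the variance formula: with σ² = 1 in the simulation, the sample covariance of β̂ across replications should be close to (X'X)^{-1}.

```python
# Comparing the simulated covariance of β̂ with the theoretical Var(β̂ | X) = σ²(X'X)^{-1}.
V_theory = np.linalg.inv(X.T @ X)              # σ²(X'X)^{-1} with σ² = 1 in this simulation
V_mc = np.cov(draws, rowvar=False)             # covariance of β̂ across the Monte Carlo draws
print(np.abs(V_mc - V_theory).max())           # small: the two matrices agree closely
```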

These conditions imply that AX = I_k with probability 1. Indeed,

    E(b | X) = E( A(Xβ + U) | X ) = AXβ + A E(U | X).

By Assumption A2, E(U | X) = 0, and thus, in order to obtain unbiasedness, we need AX = I_k.

Next, we show that Cov(β̂, b | X) = Var(β̂ | X):

    Cov(β̂, b | X) = E[ (β̂ − β)(b − β)' | X ]
                  = E[ (X'X)^{-1} X' UU' A' | X ]
                  = (X'X)^{-1} X' E(UU' | X) A'
                  = σ² (X'X)^{-1} X' A'          (since, by Assumption A3, E(UU' | X) = σ² I_n)
                  = σ² (X'X)^{-1}                (since X'A' = (AX)' = I_k)
                  = Var(β̂ | X).

Finally,

    Var(β̂ − b | X) = Var(β̂ | X) − Cov(β̂, b | X) − Cov(b, β̂ | X) + Var(b | X)
                   = Var(b | X) − Var(β̂ | X).                                          (12)

Now, since any variance-covariance matrix is positive semi-definite, we have that Var(b | X) − Var(β̂ | X) ≥ 0.

To show uniqueness, suppose that there is a linear unbiased estimator β̃ such that Var(β̃ | X) = Var(β̂ | X). Then, by (12), Var(β̂ − β̃ | X) = 0, and therefore β̃ = β̂ + c(X) for some k-vector-valued function c(X) that can depend only on X. However, since both β̃ and β̂ are conditionally unbiased given X, it follows that c(X) = 0 with probability one.

Note that Assumption A3, E(UU' | X) = σ² I_n, plays a crucial role in the proof of this result. We could not draw conclusions about the efficiency of OLS without it.
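As an illustration of the theorem within the running simulation, consider an alternative linear estimator b = (X'WX)^{-1}X'WY with arbitrary positive weights W; this weighted estimator is a hypothetical choice made here for illustration, not part of the lecture. It satisfies AX = I_k, hence is conditionally unbiased, but under homoskedasticity its conditional variance exceeds that of OLS by a positive semi-definite matrix.

```python
# Gauss-Markov illustration: an alternative linear unbiased estimator has larger variance.
W = np.diag(rng.uniform(0.5, 2.0, size=n))     # arbitrary positive weights (hypothetical)
A = np.linalg.solve(X.T @ W @ X, X.T @ W)      # b = A Y with A X = I_k, so b is unbiased
V_alt = A @ A.T                                # Var(b | X) = σ² A A' with σ² = 1
V_ols = np.linalg.inv(X.T @ X)                 # Var(β̂ | X) = σ² (X'X)^{-1}
print(np.linalg.eigvalsh(V_alt - V_ols).min() >= -1e-10)   # True: the difference is PSD
```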