Advanced Quantitative Methods: Ordinary Least Squares


Advanced Quantitative Methods: Ordinary Least Squares
University College Dublin
31 January 2012


Terminology

y is the dependent variable, referred to also (by Greene) as the regressand. X are the independent variables, also known as explanatory variables, regressors, or predictors (or factors, carriers). X is sometimes called the design matrix (or factor space). y is regressed on X.

The error term $\varepsilon$ is sometimes called a disturbance: $\varepsilon = y - X\beta$. The difference between the observed and predicted dependent variable is called the residual: $e = y - \hat{y} = y - X\hat{\beta}$.

$$y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \varepsilon_i$$
$$y = X\beta + \varepsilon \qquad \varepsilon \sim N(0, \sigma^2) \qquad y \sim N(X\beta, \sigma^2)$$

Components

Two components of the model:
$$y \sim N(\mu, \sigma^2) \quad \text{Stochastic} \qquad \mu = X\beta \quad \text{Systematic}$$
Generalised version (not necessarily linear):
$$y \sim f(\mu, \alpha) \quad \text{Stochastic} \qquad \mu = g(X, \beta) \quad \text{Systematic}$$
(King 1998, 8)

Components

$$y \sim f(\mu, \alpha) \quad \text{Stochastic} \qquad \mu = g(X, \beta) \quad \text{Systematic}$$
Stochastic component: varies over repeated (hypothetical) observations on the same unit. Systematic component: varies across units, but is constant given X. (King 1998, 8)

Uncertainty

$$y \sim f(\mu, \alpha) \quad \text{Stochastic} \qquad \mu = g(X, \beta) \quad \text{Systematic}$$
Two types of uncertainty:
Estimation uncertainty: lack of knowledge about α and β; can be reduced by increasing n.
Fundamental uncertainty: represented by the stochastic component and exists independent of the researcher.

Ordinary Least Squares (OLS)

For the linear model, the most popular method of estimation is ordinary least squares (OLS). $\hat{\beta}_{OLS}$ are those estimates of β that minimize the sum of squared residuals, $e'e$. Under the assumptions that follow, OLS is the best linear unbiased estimator (BLUE).
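
A small numerical illustration (simulated data, not from the slides): the coefficients returned by lm() give a smaller sum of squared residuals than any perturbed coefficient vector.

# simulate a simple data set
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)
ssr <- function(b) sum((y - cbind(1, x) %*% b)^2)   # e'e for candidate b

ssr(coef(fit))                   # minimised sum of squared residuals
ssr(coef(fit) + c(0.1, -0.1))    # any other candidate does worse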

Assumptions: specification

Linear in parameters (i.e. $f(X\beta) = X\beta$ and $E(y) = X\beta$); no extraneous variables in X; no omitted independent variables; parameters to be estimated are constant; the number of parameters is less than the number of cases, $k < n$.

Note that this does not imply that you cannot include non-linearly transformed variables: e.g. $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$ can be estimated with OLS.

Assumptions: errors

Errors have an expected value of zero, $E(\varepsilon \mid X) = 0$. Errors are normally distributed, $\varepsilon \sim N(0, \sigma^2)$. Errors have a constant variance, $\text{var}(\varepsilon \mid X) = \sigma^2 < \infty$. Errors are not autocorrelated, $\text{cov}(\varepsilon_i, \varepsilon_j \mid X) = 0 \;\forall\, i \neq j$. Errors and X are uncorrelated, $\text{cov}(X, \varepsilon) = 0$.

Assumptions: regressors

X varies. X is of full column rank (note: this requires $k < n$). No measurement error in X. No endogenous variables in X.

$$y = X\beta + \varepsilon$$

$$\underset{n \times 1}{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}} = \underset{n \times k}{\begin{bmatrix} 1 & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{22} & x_{23} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n2} & x_{n3} & \cdots & x_{nk} \end{bmatrix}} \underset{k \times 1}{\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}} + \underset{n \times 1}{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}}$$

$$\underset{n \times 1}{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}} = \underset{n \times 1}{\begin{bmatrix} \beta_1 + \beta_2 x_{12} + \beta_3 x_{13} + \cdots + \beta_k x_{1k} \\ \beta_1 + \beta_2 x_{22} + \beta_3 x_{23} + \cdots + \beta_k x_{2k} \\ \vdots \\ \beta_1 + \beta_2 x_{n2} + \beta_3 x_{n3} + \cdots + \beta_k x_{nk} \end{bmatrix}} + \underset{n \times 1}{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}}$$
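
A minimal sketch of this matrix representation in R, with made-up coefficients and simulated regressors; the column of ones in X carries the intercept.

set.seed(2)
n <- 50
x2 <- rnorm(n)
x3 <- rnorm(n)

X <- cbind(1, x2, x3)      # n x k design matrix (k = 3 here)
beta <- c(1, 2, -0.5)      # made-up true coefficients
eps <- rnorm(n)            # disturbances
y <- X %*% beta + eps      # y = X beta + epsilon

# model.matrix() builds the same design matrix from a formula
all.equal(unname(X), unname(model.matrix(~ x2 + x3)), check.attributes = FALSE)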

Deriving $\hat{\beta}_{OLS}$

$$y = X\beta + \varepsilon \qquad y = X\hat{\beta} + e$$
$$\hat{\beta}_{OLS} = \underset{\hat{\beta}}{\arg\min}\; e'e = \underset{\hat{\beta}}{\arg\min}\; (y - X\hat{\beta})'(y - X\hat{\beta})$$

Deriving $\hat{\beta}_{OLS}$

$$\begin{aligned} e'e &= (y - X\hat{\beta})'(y - X\hat{\beta}) \\ &= (y' - (X\hat{\beta})')(y - X\hat{\beta}) \\ &= y'y - (X\hat{\beta})'y - y'X\hat{\beta} + (X\hat{\beta})'X\hat{\beta} \\ &= y'y - \hat{\beta}'X'y - y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta} \\ &= y'y - 2y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta} \end{aligned}$$
(The last step uses the fact that $\hat{\beta}'X'y$ is a scalar and therefore equal to its transpose $y'X\hat{\beta}$.)

Deriving $\hat{\beta}_{OLS}$

$$\hat{\beta}_{OLS} = \underset{\hat{\beta}}{\arg\min}\; (e'e) \quad \Rightarrow \quad \frac{\partial\, e'e}{\partial \hat{\beta}_{OLS}} = 0$$
$$\frac{\partial (y'y - 2y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}_{OLS}} = 0$$

Deriving $\hat{\beta}_{OLS}$

$$\begin{aligned} \frac{\partial (y'y - 2y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta})}{\partial \hat{\beta}_{OLS}} &= 0 \\ 2X'X\hat{\beta}_{OLS} - 2X'y &= 0 \\ 2X'X\hat{\beta}_{OLS} &= 2X'y \\ X'X\hat{\beta}_{OLS} &= X'y \\ (X'X)^{-1}X'X\hat{\beta}_{OLS} &= (X'X)^{-1}X'y \\ \hat{\beta}_{OLS} &= (X'X)^{-1}X'y \end{aligned}$$

$$X'X\hat{\beta}_{OLS} = X'y$$

$$\underset{n \times k}{\begin{bmatrix} 1 & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{22} & x_{23} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n2} & x_{n3} & \cdots & x_{nk} \end{bmatrix}}' \;\underset{n \times k}{\begin{bmatrix} 1 & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{22} & x_{23} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n2} & x_{n3} & \cdots & x_{nk} \end{bmatrix}} \;\underset{k \times 1}{\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}} = \underset{n \times k}{\begin{bmatrix} 1 & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{22} & x_{23} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n2} & x_{n3} & \cdots & x_{nk} \end{bmatrix}}' \;\underset{n \times 1}{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}$$

($\hat{\beta}$ here refers to the OLS estimates.)

$$X'X\hat{\beta}_{OLS} = X'y$$

$$\underset{k \times n}{\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{12} & x_{22} & \cdots & x_{n2} \\ x_{13} & x_{23} & \cdots & x_{n3} \\ \vdots & \vdots & & \vdots \\ x_{1k} & x_{2k} & \cdots & x_{nk} \end{bmatrix}} \;\underset{n \times k}{\begin{bmatrix} 1 & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{22} & x_{23} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n2} & x_{n3} & \cdots & x_{nk} \end{bmatrix}} \;\underset{k \times 1}{\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}} = \underset{k \times n}{\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{12} & x_{22} & \cdots & x_{n2} \\ x_{13} & x_{23} & \cdots & x_{n3} \\ \vdots & \vdots & & \vdots \\ x_{1k} & x_{2k} & \cdots & x_{nk} \end{bmatrix}} \;\underset{n \times 1}{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}$$

$$X'X\hat{\beta}_{OLS} = X'y$$

$$\underset{k \times k}{\begin{bmatrix} n & \sum x_{i2} & \cdots & \sum x_{ik} \\ \sum x_{i2} & \sum (x_{i2})^2 & \cdots & \sum x_{i2}x_{ik} \\ \sum x_{i3} & \sum x_{i3}x_{i2} & \cdots & \sum x_{i3}x_{ik} \\ \vdots & \vdots & & \vdots \\ \sum x_{ik} & \sum x_{ik}x_{i2} & \cdots & \sum (x_{ik})^2 \end{bmatrix}} \;\underset{k \times 1}{\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}} = \underset{k \times 1}{\begin{bmatrix} \sum y_i \\ \sum x_{i2} y_i \\ \sum x_{i3} y_i \\ \vdots \\ \sum x_{ik} y_i \end{bmatrix}}$$

($\sum$ refers to $\sum_i^n$.)

$$X'X\hat{\beta}_{OLS} = X'y$$

So this can be seen as a set of linear equations to solve:

$$\begin{aligned} \hat{\beta}_1 n + \hat{\beta}_2 \textstyle\sum x_{i2} + \cdots + \hat{\beta}_k \sum x_{ik} &= \textstyle\sum y_i \\ \hat{\beta}_1 \textstyle\sum x_{i2} + \hat{\beta}_2 \sum (x_{i2})^2 + \cdots + \hat{\beta}_k \sum x_{i2}x_{ik} &= \textstyle\sum x_{i2} y_i \\ \hat{\beta}_1 \textstyle\sum x_{i3} + \hat{\beta}_2 \sum x_{i3}x_{i2} + \cdots + \hat{\beta}_k \sum x_{i3}x_{ik} &= \textstyle\sum x_{i3} y_i \\ &\;\;\vdots \\ \hat{\beta}_1 \textstyle\sum x_{ik} + \hat{\beta}_2 \sum x_{ik}x_{i2} + \cdots + \hat{\beta}_k \sum (x_{ik})^2 &= \textstyle\sum x_{ik} y_i \end{aligned}$$
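
A quick check (simulated data) that the normal equations $X'X\hat{\beta} = X'y$ hold exactly at the OLS solution; crossprod(X) computes X'X and crossprod(X, y) computes X'y.

set.seed(3)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))
y <- X %*% c(1, 2, -1) + rnorm(n)

b.hat <- solve(crossprod(X), crossprod(X, y))    # solves X'X b = X'y
all.equal(as.numeric(crossprod(X) %*% b.hat),
          as.numeric(crossprod(X, y)))           # TRUE up to rounding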

$$X'X\hat{\beta}_{OLS} = X'y$$

When there is only one independent variable, this reduces to:

$$\begin{aligned} \hat{\beta}_1 n + \hat{\beta}_2 \textstyle\sum x_i &= \textstyle\sum y_i \\ \hat{\beta}_1 \textstyle\sum x_i + \hat{\beta}_2 \textstyle\sum x_i^2 &= \textstyle\sum x_i y_i \end{aligned}$$

From the first equation:
$$\hat{\beta}_1 = \frac{\sum y_i - \hat{\beta}_2 \sum x_i}{n} = \frac{n\bar{y} - \hat{\beta}_2 n\bar{x}}{n} = \bar{y} - \hat{\beta}_2 \bar{x}$$

Substituting into the second:
$$\begin{aligned} (\bar{y} - \hat{\beta}_2 \bar{x}) \textstyle\sum x_i + \hat{\beta}_2 \textstyle\sum x_i^2 &= \textstyle\sum x_i y_i \\ n\bar{y}\bar{x} - \hat{\beta}_2 n\bar{x}^2 + \hat{\beta}_2 \textstyle\sum x_i^2 &= \textstyle\sum x_i y_i \\ \hat{\beta}_2 &= \frac{\sum x_i y_i - n\bar{y}\bar{x}}{\sum x_i^2 - n\bar{x}^2} \end{aligned}$$
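
A sketch (simulated data) confirming that this closed-form slope and intercept match lm().

set.seed(4)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

b2 <- (sum(x * y) - n * mean(y) * mean(x)) / (sum(x^2) - n * mean(x)^2)
b1 <- mean(y) - b2 * mean(x)

c(b1, b2)
coef(lm(y ~ x))   # same values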

Simple regression

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

If $\beta_1 = 0$ and the only regressor is the intercept, then $\hat{\beta}_0 = \bar{y}$.

If $\beta_0 = 0$, so that there is no intercept and one explanatory variable x, then
$$\hat{\beta}_1 = \frac{\sum_i^n x_i y_i}{\sum_i^n x_i^2}$$

If there is an intercept and one explanatory variable, then
$$\hat{\beta}_1 = \frac{\sum_i^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_i^n (x_i - \bar{x})^2}$$

Demeaning

If the observations are expressed as deviations from their means, i.e.
$$y_i^* = y_i - \bar{y} \qquad x_i^* = x_i - \bar{x},$$
then
$$\hat{\beta}_1 = \frac{\sum_i^n x_i^* y_i^*}{\sum_i^n x_i^{*2}}$$
The intercept can be estimated as $\bar{y} - \hat{\beta}_1 \bar{x}$.
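
A short check (simulated data): regressing demeaned y on demeaned x without an intercept reproduces the slope from the regression with an intercept.

set.seed(5)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

x.star <- x - mean(x)
y.star <- y - mean(y)

coef(lm(y ~ x))[2]                        # slope with intercept
coef(lm(y.star ~ x.star - 1))             # same slope, demeaned, no intercept
sum(x.star * y.star) / sum(x.star^2)      # formula from the slide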

Standardized variables

The coefficient on a variable is interpreted as the effect on y of a one-unit increase in x, and thus the interpretation depends on the scale of measurement. An option is to standardize the variables:
$$y^* = \frac{y - \bar{y}}{\sigma_y} \qquad x^* = \frac{x - \bar{x}}{\sigma_x} \qquad y^* = X^*\beta^* + \varepsilon$$
and interpret the coefficients as the effect on y, expressed in standard deviations, as x increases by one standard deviation.

Standardized variables

library(arm)
summary(standardize(lm(y ~ x), binary.inputs = "leave.alone"))

See also Andrew Gelman (2007), Scaling regression inputs by dividing by two standard deviations.

Exercise: teenage gambling

library(faraway)
data(teengamb)
summary(teengamb)

Using matrix formulas,
1. regress gamble on a constant
2. regress gamble on sex
3. regress gamble on sex, status, income, verbal

Exercise: teenage gambling

1. Regress gamble on sex, status, income, verbal
2. Which observation has the largest residual?
3. Compute mean and median of residuals
4. Compute correlation between residuals and fitted values
5. Compute correlation between residuals and income
6. All other predictors held constant, what would be the difference in predicted expenditure on gambling between males and females?

Deriving $\text{var}(\hat{\beta}_{OLS})$

$$\begin{aligned} \text{var}(\hat{\beta}_{OLS}) &= E[(\hat{\beta}_{OLS} - E(\hat{\beta}_{OLS}))(\hat{\beta}_{OLS} - E(\hat{\beta}_{OLS}))'] \\ &= E[(\hat{\beta}_{OLS} - \beta)(\hat{\beta}_{OLS} - \beta)'] \end{aligned}$$

Deriving $\text{var}(\hat{\beta}_{OLS})$

$$\begin{aligned} \hat{\beta}_{OLS} - \beta &= (X'X)^{-1}X'y - \beta \\ &= (X'X)^{-1}X'(X\beta + \varepsilon) - \beta \\ &= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon - \beta \\ &= \beta + (X'X)^{-1}X'\varepsilon - \beta \\ &= (X'X)^{-1}X'\varepsilon \end{aligned}$$

Deriving $\text{var}(\hat{\beta}_{OLS})$

$$\begin{aligned} \text{var}(\hat{\beta}_{OLS}) &= E[(\hat{\beta}_{OLS} - \beta)(\hat{\beta}_{OLS} - \beta)'] \\ &= E[((X'X)^{-1}X'\varepsilon)((X'X)^{-1}X'\varepsilon)'] \\ &= E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] \\ &= (X'X)^{-1}X'E[\varepsilon\varepsilon']X(X'X)^{-1} \\ &= (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1} \end{aligned}$$

Standard errors

$$\text{var}(\hat{\beta}_{OLS}) = \sigma^2 (X'X)^{-1}$$

The standard errors of $\hat{\beta}_{OLS}$ are then the square roots of the diagonal of this matrix. In the simple case where $y = \beta_0 + \beta_1 x$, this gives
$$\text{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_i^n (x_i - \bar{x})^2}$$
Note how an increase in the variance of x leads to a decrease in the standard error of $\hat{\beta}_1$.
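
A sketch (simulated data) computing the standard errors as the square roots of the diagonal of $\hat{\sigma}^2 (X'X)^{-1}$ and comparing them with summary(lm).

set.seed(6)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
X <- cbind(1, x)
k <- ncol(X)

b.hat <- solve(t(X) %*% X) %*% t(X) %*% y
e <- y - X %*% b.hat
s2.hat <- as.numeric(t(e) %*% e / (n - k))       # estimate of sigma^2
se <- sqrt(diag(s2.hat * solve(t(X) %*% X)))

se
summary(lm(y ~ x))$coefficients[, "Std. Error"]  # same numbers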

$$\begin{aligned} e &= y - X\hat{\beta}_{OLS} \\ &= y - X(X'X)^{-1}X'y \\ &= (I - X(X'X)^{-1}X')y \\ &= My \\ &= M(X\beta + \varepsilon) \\ &= (I - X(X'X)^{-1}X')X\beta + M\varepsilon \\ &= X\beta - X(X'X)^{-1}X'X\beta + M\varepsilon \\ &= X\beta - X\beta + M\varepsilon \\ &= M\varepsilon \end{aligned}$$

$$e'e = (M\varepsilon)'(M\varepsilon) = \varepsilon'M'M\varepsilon = \varepsilon'M\varepsilon$$
(M is symmetric and idempotent, so $M'M = M$.)
$$E[e'e \mid X] = E[\varepsilon'M\varepsilon \mid X]$$
$\varepsilon'M\varepsilon$ is a scalar, so if you see it as a $1 \times 1$ matrix, it is equal to its trace, therefore
$$E[e'e \mid X] = E[\text{tr}(\varepsilon'M\varepsilon) \mid X] = E[\text{tr}(M\varepsilon\varepsilon') \mid X] = \text{tr}(M\, E[\varepsilon\varepsilon' \mid X]) = \text{tr}(M\sigma^2 I) = \sigma^2\, \text{tr}(M)$$

tr(m) = tr(i X(X X) 1 X ) = tr(i) tr(x(x X) 1 X ) = n tr(x X(X X) 1 ) = n k E(e e X) = (n k)σ 2 ˆσ 2 = e e n k

OLS in R

n <- dim(X)[1]                                   # number of observations
k <- dim(X)[2]                                   # number of parameters
b.hat <- solve(t(X) %*% X) %*% t(X) %*% y        # (X'X)^{-1} X'y
e <- y - X %*% b.hat                             # residuals
s2.hat <- as.numeric(t(e) %*% e / (n - k))       # e'e / (n - k)
v.hat <- s2.hat * solve(t(X) %*% X)              # estimated var(beta.hat)

Or: summary(lm(y ~ X)) (if X already includes the column of ones, use lm(y ~ X - 1) so that R does not add a second intercept).

Exercise: US wages

library(faraway)
data(uswages)
summary(uswages)

1. Using matrix formulas, regress wage on educ, exper and race.
2. Interpret the results
3. Plot residuals against fitted values and against educ
4. Repeat with log(wage) as dependent variable

Unbiasedness of $\hat{\beta}_{OLS}$

From deriving $\text{var}(\hat{\beta}_{OLS})$ we have:
$$\begin{aligned} \hat{\beta}_{OLS} - \beta &= (X'X)^{-1}X'y - \beta \\ &= (X'X)^{-1}X'(X\beta + \varepsilon) - \beta \\ &= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon - \beta \\ &= \beta + (X'X)^{-1}X'\varepsilon - \beta \\ &= (X'X)^{-1}X'\varepsilon \end{aligned}$$
Since we assume $E(\varepsilon) = 0$:
$$E(\hat{\beta}_{OLS} - \beta) = (X'X)^{-1}X'E(\varepsilon) = 0$$
Therefore, $\hat{\beta}_{OLS}$ is an unbiased estimator of β.

Efficiency of $\hat{\beta}_{OLS}$

The Gauss-Markov theorem states that there is no linear unbiased estimator of β that has a smaller sampling variance than $\hat{\beta}_{OLS}$, i.e. $\hat{\beta}_{OLS}$ is BLUE.

An estimator is linear iff it can be expressed as a linear function of the data on the dependent variable:
$$\hat{\beta}_j^{\text{linear}} = \sum_i^n f(x_{ij})\, y_i,$$
which is the case for OLS:
$$\hat{\beta}_{OLS} = (X'X)^{-1}X'y = Cy$$
(Wooldridge, pp. 101-102, 111-112)

Gauss-Markov Theorem

$$\hat{\beta}_{OLS} = (X'X)^{-1}X'y = Cy$$
Imagine we have another linear estimator: $\tilde{\beta} = Wy$, where $W = C + D$.
$$\begin{aligned} \tilde{\beta} &= (C + D)y \\ &= Cy + Dy \\ &= \hat{\beta}_{OLS} + D(X\beta + \varepsilon) \\ &= \hat{\beta}_{OLS} + DX\beta + D\varepsilon \end{aligned}$$

Gauss-Markov Theorem

$$\begin{aligned} E(\tilde{\beta}) &= E(\hat{\beta}_{OLS} + DX\beta + D\varepsilon) \\ &= E(\hat{\beta}_{OLS}) + DX\beta + E(D\varepsilon) \\ &= \beta + DX\beta + 0 \end{aligned}$$
$DX\beta = 0 \Rightarrow DX = 0$, because both $\hat{\beta}_{OLS}$ and $\tilde{\beta}$ are unbiased, so $E(\hat{\beta}_{OLS}) = E(\tilde{\beta}) = \beta$, and this must hold for any β.
(Hayashi, pp. 29-30)

Gauss-Markov Theorem

$$\tilde{\beta} = \hat{\beta}_{OLS} + D\varepsilon \qquad \tilde{\beta} - \beta = \hat{\beta}_{OLS} + D\varepsilon - \beta = (C + D)\varepsilon$$
$$\begin{aligned} \text{var}(\tilde{\beta}) &= E[(\tilde{\beta} - \beta)(\tilde{\beta} - \beta)'] \\ &= E[((C + D)\varepsilon)((C + D)\varepsilon)'] \\ &= (C + D)E(\varepsilon\varepsilon')(C + D)' \\ &= \sigma^2 (C + D)(C + D)' \\ &= \sigma^2 (CC' + DC' + CD' + DD') \\ &= \sigma^2 ((X'X)^{-1} + DD') \\ &\geq \sigma^2 (X'X)^{-1} \end{aligned}$$
(Since $DX = 0$, $DC' = DX(X'X)^{-1} = 0$ and $CD' = 0$, while $CC' = (X'X)^{-1}X'X(X'X)^{-1} = (X'X)^{-1}$.)

Consistency of $\hat{\beta}_{OLS}$

$$\begin{aligned} \hat{\beta}_{OLS} &= (X'X)^{-1}X'y \\ &= (X'X)^{-1}X'(X\beta + \varepsilon) \\ &= \beta + (X'X)^{-1}X'\varepsilon \\ &= \beta + \left(\tfrac{1}{n} X'X\right)^{-1} \left(\tfrac{1}{n} X'\varepsilon\right) \end{aligned}$$
$$\lim_{n \to \infty} \tfrac{1}{n} X'X = Q \;\Rightarrow\; \lim_{n \to \infty} \left(\tfrac{1}{n} X'X\right)^{-1} = Q^{-1}$$
$$E\left(\tfrac{1}{n} X'\varepsilon\right) = E\left(\tfrac{1}{n} \textstyle\sum_i^n x_i \varepsilon_i\right) = 0$$

Consistency of $\hat{\beta}_{OLS}$

$$\text{var}\left(\tfrac{1}{n} X'\varepsilon\right) = E\left[\tfrac{1}{n} X'\varepsilon \left(\tfrac{1}{n} X'\varepsilon\right)'\right] = \tfrac{1}{n} X' E(\varepsilon\varepsilon') X \tfrac{1}{n} = \left(\tfrac{\sigma^2}{n}\right)\left(\tfrac{X'X}{n}\right)$$
$$\lim_{n \to \infty} \text{var}\left(\tfrac{1}{n} X'\varepsilon\right) = 0 \cdot Q = 0$$

Consistency of $\hat{\beta}_{OLS}$

$$\hat{\beta}_{OLS} = \beta + \left(\tfrac{1}{n} X'X\right)^{-1}\left(\tfrac{1}{n} X'\varepsilon\right)$$
$$\lim_{n \to \infty} \left(\tfrac{1}{n} X'X\right)^{-1} = Q^{-1} \qquad E\left(\tfrac{1}{n} X'\varepsilon\right) = 0 \qquad \lim_{n \to \infty} \text{var}\left(\tfrac{1}{n} X'\varepsilon\right) = 0$$
imply that the sampling distribution of $\hat{\beta}_{OLS}$ collapses to β as n becomes very large, i.e.:
$$\underset{n \to \infty}{\text{plim}}\; \hat{\beta}_{OLS} = \beta$$
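
A small simulation sketch of consistency: as n grows, the OLS slope estimates concentrate around the true value (2 here, chosen arbitrarily).

set.seed(8)
sim <- sapply(c(50, 500, 5000), function(n) {
  est <- replicate(500, {
    x <- rnorm(n)
    y <- 1 + 2 * x + rnorm(n)
    coef(lm(y ~ x))[2]
  })
  c(mean = mean(est), sd = sd(est))
})
colnames(sim) <- c("n=50", "n=500", "n=5000")
sim   # means stay near 2, the spread shrinks as n increases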

Sums of squares

SST: Total sum of squares, $\sum (y_i - \bar{y})^2$
SSE: Explained sum of squares, $\sum (\hat{y}_i - \bar{y})^2$
SSR: Residual sum of squares, $\sum e_i^2 = \sum (\hat{y}_i - y_i)^2 = e'e$

The key to remember is that SST = SSE + SSR. Sometimes instead of explained and residual, regression and error are used, respectively, so that the abbreviations are swapped (!).
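
A quick check (simulated data) that SST = SSE + SSR for a model with an intercept.

set.seed(9)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

SST <- sum((y - mean(y))^2)
SSE <- sum((fitted(fit) - mean(y))^2)
SSR <- sum(residuals(fit)^2)

all.equal(SST, SSE + SSR)   # TRUE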

R²

Defined in terms of sums of squares:
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$
Interpretation: the proportion of the variation in y that is explained linearly by the independent variables. A much over-used statistic: it may not be what we are interested in at all.

R² in matrix algebra

$$\begin{aligned} y'y &= (\hat{y} + e)'(\hat{y} + e) \\ &= \hat{y}'\hat{y} + \hat{y}'e + e'\hat{y} + e'e \\ &= \hat{y}'\hat{y} + 2\hat{\beta}'X'e + e'e \\ &= \hat{y}'\hat{y} + e'e \end{aligned}$$
($X'e = 0$, so the cross-product term drops out.)
$$R^2 = 1 - \frac{e'e}{y'y} = \frac{\hat{y}'\hat{y}}{y'y}$$
(This is the uncentered version, with $y'y$ rather than the centered SST in the denominator.) (Hayashi, p. 20)

When a model has no intercept, it is possible for R² to lie outside the interval (0, 1).

R² rises with the addition of more explanatory variables. For this reason we often report the adjusted R²:
$$\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k}$$
The R² values from different y samples cannot be compared.
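
A sketch (simulated data) computing R² and the adjusted R² by hand and comparing them with summary(lm); k counts the intercept as a parameter, and x2 is irrelevant by construction.

set.seed(10)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 + rnorm(n)
fit <- lm(y ~ x1 + x2)
k <- length(coef(fit))

R2 <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
adjR2 <- 1 - (1 - R2) * (n - 1) / (n - k)

c(R2, adjR2)
c(summary(fit)$r.squared, summary(fit)$adj.r.squared)   # same values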

Exercise: US wages

library(faraway)
data(uswages)
summary(uswages)

1. Using matrix formulas, regress wage on educ, exper and race.
2. What proportion of the variance in wage is explained by these three variables?