Simple Linear Regression Model & Introduction to OLS Estimation

Inside ECONOMICS
Introduction to Econometrics

Simple Linear Regression Model & Introduction to OLS Estimation

Introduction

We are interested in a model that explains a variable y in terms of other variables x, and in finding how much y changes as a result of a change in x. The simple linear regression model is used to study the relationship between a dependent variable and an explanatory variable. For instance, suppose we have one explanatory variable x and one dependent variable y, as shown below. It is common to include a constant β_0, which indicates the point of intersection with the y axis. The error term, denoted by u, represents the factors other than x that have an effect on the dependent variable y. Please note that in this document we deal only with cross-sectional data.

y = \beta_0 + \beta_1 x_{11} + u

The β_k are unknown coefficients and the x_{ik} are the regressors. For the regressors x_{ik}, the index i denotes the observation or individual; observations are indexed from 1 to N, where N is called the sample size. So, for instance, x_{13} is the value of the third regressor of the model for individual (observation) 1. In the equation above, x_{11} is the first regressor for individual (observation) 1.

y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \beta_3 x_{13} + \beta_4 x_{14} + u_1

The equation above is the linear regression for the first observation or individual.

y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \beta_3 x_{23} + \beta_4 x_{24} + u_2

We now have an equation for the second observation or individual. Please note that the regressors are the same variables for both individuals, but they may take different values; the beta coefficients are the same for every observation. For example, x_{i1} and x_{i2} could be variables such as education and age. Therefore x_{11} and x_{12} are education and age for the first individual, and x_{21} and x_{22} are the education and age regressors for the second individual. Suppose that one of the regressors, x_{ik}, denotes the age of individual i; then if individual i were one year older, the value of the dependent variable y_i would increase by β_k, holding the other regressors fixed.

Matrix Notation

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_k x_{ik} + u_i    (1)

In matrix notation we can write the model as

Y = X\beta + u    (2)

where Y = (y_1, y_2, \dots, y_N)' is the vector of dependent variables, X is the matrix of independent variables with dimensions N × (k + 1), whose column of ones is there for the intercept term, and the error term is also a vector, u = (u_1, u_2, \dots, u_N)'. Written out,

Y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N1} & \cdots & x_{Nk} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
u = \begin{pmatrix} u_1 \\ \vdots \\ u_N \end{pmatrix}
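
To make the dimensions concrete, here is a minimal sketch in Python with NumPy; the variable names and the numbers are purely illustrative and not taken from the text. It assembles Y and the matrix X, including the leading column of ones for the intercept, from data on two regressors.

```python
import numpy as np

# Hypothetical data for N = 4 observations and k = 2 regressors.
x1 = np.array([12.0, 16.0, 11.0, 14.0])   # e.g. years of education
x2 = np.array([25.0, 40.0, 31.0, 52.0])   # e.g. age
y  = np.array([2.1, 3.4, 2.0, 2.9])       # dependent variable

N = y.shape[0]
X = np.column_stack([np.ones(N), x1, x2])  # first column of ones for the intercept
print(X.shape)                             # (4, 3): N rows and k + 1 columns
```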

Ordinary Least Squares Estimation

There are various methods to estimate the coefficients; Ordinary Least Squares (OLS) is just one of these methods. OLS is relatively simple and has some attractive properties that make it a popular estimation method. The OLS estimator minimises the sum of squared residuals: graphically, the OLS regression line is the line through the scatter of data points for which the sum of the squared vertical distances between the observations and the line is smallest. If the line were changed, the sum of squared residuals would become larger and would no longer be at its minimum (a short numerical sketch of this appears after the assumptions below). In general we prefer estimators with small variance.

OLS Assumptions

Assumption 1: Independent and identically distributed (i.i.d.) observations

(x_i, y_i) is independent from, and has the same distribution as, (x_j, y_j) for all i ≠ j.

We do not observe the population but only a sample, so we assume that an i.i.d. sample can be drawn from the population. The i.i.d. assumption makes it easier to interpret some of the other assumptions. It also allows us to use asymptotic results (as the sample size N → ∞).

Assumption 2: Linearity

The regression model is linear in the parameters; this is evident in the structure of equation (1). Essentially, the response variable is a linear function of the regressors. In cases where the model is not linear in the parameters, a linear regression model is an approximation; however, this approximation often involves only a minimal loss of accuracy.
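
As a numerical illustration of the minimisation idea in the Ordinary Least Squares Estimation section above, the sketch below (Python with NumPy, on simulated data) evaluates the sum of squared residuals at the fitted OLS coefficients and at slightly perturbed coefficients; any perturbation makes the sum larger.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=100)   # true beta_0 = 1, beta_1 = 0.5

def ssr(b0, b1):
    """Sum of squared residuals for a candidate line y = b0 + b1 * x."""
    return np.sum((y - b0 - b1 * x) ** 2)

# OLS line from NumPy's least-squares polynomial fit (degree 1: slope, intercept).
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

print(ssr(b0_hat, b1_hat))        # the smallest attainable sum of squared residuals
print(ssr(b0_hat + 0.2, b1_hat))  # shifting the intercept increases the SSR
print(ssr(b0_hat, b1_hat + 0.1))  # changing the slope increases the SSR
```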

Assumption 3: Uncorrelatedness

E[x_i u_i] = 0

It is also assumed that E[u_i] = 0, i.e. the errors in the regression have mean zero. Assumption 3 therefore says that the errors are uncorrelated with the regressors. If this assumption holds, we can call the regressors exogenous variables. If it does not hold, the regressors that are correlated with the error term are called endogenous variables. If the regression contains endogenous variables, the OLS estimates will be biased and inconsistent, and instrumental variables will be required.

Assumption 4: Full Rank

E[x_i x_i'] has full rank.

This assumption rules out perfect collinearity among the regressors. In practice collinearity is not a large problem, especially if the sample size is large.

Assumption 5: Homoskedasticity

This can be written as

E[u_i^2 \mid X] = \sigma^2

which implies

E[u_i^2 x_i x_i'] = E[u_i^2] \, E[x_i x_i'] = \sigma^2 A, \qquad \text{where } \sigma^2 \equiv E[u_i^2] \text{ and } A \equiv E[x_i x_i']

The u_i are known as error terms and capture all of the differences in y_i that are not captured by the x variables. Homoskedasticity means that the errors have the same variance σ² for each observation: the variance of the error is a constant, so the first observation and the last observation in the sample have the same error variance. As a result, the probability distribution of the dependent variable has the same variance regardless of the values of the explanatory variables. If this assumption is violated we have heteroskedasticity, which means that the variance of the error term is not constant and differs across observations.

Aside: if heteroskedasticity is present, the weighted least squares estimator is more efficient and can be used. If the errors have infinite variance, robust estimation techniques are preferred.

Assumption 6: Exogeneity

E[u_i \mid x_i] = 0

This implies that

E[\hat{\beta} - \beta \mid x_1, \dots, x_N] = 0

which means that β̂ is an unbiased estimator of β conditional on the regressors x_1, ..., x_N, so that E[β̂] = β irrespective of the value of β. This assumption is related to Assumption 3; however, Assumption 6 is the stronger assumption of exogeneity (a zero conditional mean), which implies the uncorrelatedness in Assumption 3.
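
The unbiasedness property E[β̂] = β can be illustrated with a small Monte Carlo experiment. The sketch below is written in Python with NumPy under the assumptions above; the true coefficients, the sample size and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 0.5
x = rng.uniform(0, 10, size=200)        # regressors held fixed across replications

slopes = []
for _ in range(2000):
    u = rng.normal(0, 1, size=200)      # errors satisfy E[u | x] = 0
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    slopes.append(b1)

# The average of the estimates is close to the true beta_1 = 0.5.
print(np.mean(slopes))
```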

Deriving the OLS Estimator (Summation Notation)

We will now minimise the sum of squared residuals to derive the OLS estimator; as stated below in the Gauss-Markov theorem, the OLS estimator is BLUE (Best Linear Unbiased Estimator). We first derive the OLS estimator in summation (sigma) notation, assuming we have an intercept and one regressor, so that K = 2. The equation is

y_i = \beta_0 + \beta_1 x_i + u_i

The data collected on the x's and y's are used to construct estimates of β_0 and β_1. OLS is one technique of estimation and requires the minimisation of the sum of squared residuals:

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{N} (y_i - \beta_0 - \beta_1 x_i)^2

The first order conditions are as follows:

\sum_{i=1}^{N} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

\sum_{i=1}^{N} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

Please note that this implies that

\sum_{i=1}^{N} \hat{u}_i = 0 \qquad \text{and} \qquad \sum_{i=1}^{N} x_i \hat{u}_i = 0

The sample average of the independent variable is \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, the sum of all the x_i divided by the number of observations in the sample (remember there are N observations). Since \sum \hat{u}_i = 0, it also holds that \bar{x} \sum \hat{u}_i = 0. Similarly, the sample average of the dependent variable is \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i.

Let us turn our attention to the first FOC and solve for β̂_0. Dividing through by N,

\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

\frac{1}{N} \sum_{i=1}^{N} y_i - \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}_0 - \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}_1 x_i = 0

\bar{y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{x} = 0

so that

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
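
Both first order conditions are easy to verify numerically once a line has been fitted. The sketch below uses Python with NumPy on simulated data; np.polyfit is used only as a convenient way to obtain the fitted line, and both sums come out as zero up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=50)

# OLS coefficients for the simple regression (np.polyfit returns the slope first).
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
u_hat = y - b0_hat - b1_hat * x            # fitted residuals

print(np.sum(u_hat))      # approximately 0: first order condition for the intercept
print(np.sum(x * u_hat))  # approximately 0: first order condition for the slope
```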

Now our task is to solve for β̂_1, starting from the second first order condition:

\sum_{i=1}^{N} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

(The factor of -2 that arises when differentiating the sum of squared residuals has already been dropped, since it is just a constant and does not affect the solution.) Multiplying out the equation gives the following expression:

\sum x_i y_i - \hat{\beta}_0 \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0

Substitute the expression for β̂_0 into the above equation:

\sum x_i y_i - \sum x_i (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 \sum x_i^2 = 0

The summation applies to everything in the equation, so it is best to write each term out (remember that a constant can always be taken out in front of a summation):

\sum x_i y_i - \bar{y} \sum x_i + \hat{\beta}_1 \bar{x} \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0

Using the properties \bar{x} = \frac{1}{N} \sum x_i and \bar{y} = \frac{1}{N} \sum y_i, so that \sum x_i = N \bar{x},

\sum x_i y_i - N \bar{x} \bar{y} + \hat{\beta}_1 N \bar{x}^2 - \hat{\beta}_1 \sum x_i^2 = 0

\sum x_i y_i - N \bar{x} \bar{y} - \hat{\beta}_1 \left( \sum x_i^2 - N \bar{x}^2 \right) = 0

which can equivalently be written in deviations from means as

\sum (x_i - \bar{x})(y_i - \bar{y}) - \hat{\beta}_1 \sum (x_i - \bar{x})^2 = 0

Rearranging for β̂_1,

\hat{\beta}_1 = \frac{\sum x_i y_i - N \bar{x} \bar{y}}{\sum x_i^2 - N \bar{x}^2} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}

Finally, we have solved for both β̂_0 and β̂_1:

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}
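
The two closed-form expressions translate directly into code. The sketch below, again Python with NumPy on simulated data, computes β̂_1 and β̂_0 from the formulas just derived and compares them with NumPy's own least-squares fit as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=50)

xbar, ybar = x.mean(), y.mean()
b1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0_hat = ybar - b1_hat * xbar

print(b0_hat, b1_hat)           # estimates from the derived formulas
print(np.polyfit(x, y, deg=1))  # [slope, intercept] from NumPy, for comparison
```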

OLS for an Arbitrary k > 2

So far we have had only two parameters: the intercept β_0 and the coefficient β_1 on the single explanatory variable. When k > 2 the previous equations no longer apply. If we have an arbitrary number k of variables, we need to minimise the sum of squared residuals with respect to all of the coefficients. Writing x_i for the vector of regressors of observation i (including the constant),

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} (y_i - x_i'\beta)^2

The first order condition is

-2 \sum_{i=1}^{N} x_i (y_i - x_i'\hat{\beta}) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{N} x_i (y_i - x_i'\hat{\beta}) = 0

which is the same as \sum x_i \hat{u}_i = 0. Therefore

\sum_{i=1}^{N} x_i x_i' \, \hat{\beta} = \sum_{i=1}^{N} x_i y_i

\hat{\beta} = \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1} \sum_{i=1}^{N} x_i y_i

This equation is equivalent to the matrix notation OLS estimator equation

\hat{\beta} = (X'X)^{-1} X'Y
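
The equivalence between the summation form and the matrix form can be checked directly. In the sketch below (Python with NumPy, simulated data), the sums of x_i x_i' and x_i y_i are accumulated observation by observation and the resulting estimate is compared with the matrix expression; np.linalg.solve is used rather than forming an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200
X = np.column_stack([np.ones(N), rng.uniform(0, 10, N), rng.normal(0, 1, N)])
beta = np.array([1.0, 0.5, -2.0])
y = X @ beta + rng.normal(0, 1, N)

# Accumulate sum_i x_i x_i' and sum_i x_i y_i observation by observation.
Sxx = sum(np.outer(xi, xi) for xi in X)
Sxy = sum(xi * yi for xi, yi in zip(X, y))
beta_hat_sums = np.linalg.solve(Sxx, Sxy)

# Matrix-notation version of the same estimator.
beta_hat_matrix = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat_sums)
print(beta_hat_matrix)  # identical to the summation version up to floating point
```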

Deriving the OLS Estimator (Matrix Notation)

We will now minimise the sum of squared residuals to derive the OLS estimator using matrix algebra. With

Y = X\beta + u, \qquad u = Y - X\beta

we solve

\min_{\beta} \; u'u = (Y - X\beta)'(Y - X\beta)

Using the matrix calculus rules \frac{d(A'A)}{dA} = 2A and \frac{d(C\beta)}{d\beta} = C, together with the chain rule, differentiating with respect to β gives

\frac{d}{d\beta} \, (Y - X\beta)'(Y - X\beta) = 2(-X)'(Y - X\beta)

Setting this derivative to zero at β̂,

(-X)'(Y - X\hat{\beta}) = 0

X'(Y - X\hat{\beta}) = 0

X'Y - X'X\hat{\beta} = 0

X'X\hat{\beta} = X'Y

Assuming that (X'X)^{-1} exists (Assumption 4),

\hat{\beta} = (X'X)^{-1} X'Y

where this β̂ is the OLS estimator of the true population parameter vector β.

Key Equations

1. Linear model:
   y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_k x_{ik} + u_i

2. Linear model, matrix notation:
   Y = X\beta + u

3. OLS estimator (intercept):
   \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

4. OLS estimator (slope):
   \hat{\beta}_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}

5. OLS estimator for arbitrary k > 2:
   \hat{\beta} = \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1} \sum_{i=1}^{N} x_i y_i

6. OLS estimator, matrix notation:
   \hat{\beta} = (X'X)^{-1} X'Y

Gauss-Markov Theorem

In the classical linear regression model, under Assumptions 1, 2, 4, 5 and 6, the OLS estimator of equation (2) is the minimum variance linear unbiased estimator of β: the OLS estimator is BLUE (Best Linear Unbiased Estimator).

For the proof of the OLS properties, please refer to the document labelled Properties of OLS (Proofs). For a brief reference on the OLS derivation, refer to the document labelled Derivation of the OLS Estimator.
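
One way to see what "best" means in the Gauss-Markov theorem is to compare the OLS slope with another linear unbiased estimator. In the Monte Carlo sketch below (Python with NumPy; the alternative estimator, the slope of the line through the first and last data points, is chosen only because it is clearly linear in y and unbiased under the assumptions above), both estimators are centred on the true slope, but the OLS slope has a much smaller variance.

```python
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1 = 1.0, 0.5
x = np.linspace(0, 10, 100)          # fixed regressor values

ols, crude = [], []
for _ in range(5000):
    u = rng.normal(0, 1, size=x.size)
    y = beta0 + beta1 * x + u
    # OLS slope estimate.
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    ols.append(b1)
    # Another linear unbiased estimator: the slope through the first and last points.
    crude.append((y[-1] - y[0]) / (x[-1] - x[0]))

print(np.mean(ols), np.var(ols))      # unbiased, small variance
print(np.mean(crude), np.var(crude))  # also unbiased, but much larger variance
```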