
ECON 351 - The Simple Regression Model
Maggie Jones

The Simple Regression Model
Our starting point is the simple regression model, in which we look at the relationship between two variables. More complicated econometric models are generally used for empirical analysis, but this provides a good starting point. Suppose we have two variables, x and y, and we are interested in the relationship between them; specifically, we care about the question: how does x affect y? Typically we do not observe the full population of y or the full population of x, so we can think of y and x as random samples.

The Simple Regression Model
In determining the relationship between x and y, we should keep three questions in mind:
1. How do we allow for factors other than x that might affect y?
2. What is the functional relationship between x and y?
3. How can we be certain we are capturing the ceteris paribus relationship between x and y?
We resolve these questions by writing down an equation relating y to x.

The Simple Regression Model

y = β0 + β1 x + u   (1)

We call equation (1) the simple linear regression model:
- y is called the dependent variable
- x is called the independent variable
- u is called the error term; it represents everything else that helps to explain y but is not contained in x

The Simple Regression Model
Equation (1) assumes a linear functional form, i.e. it assumes that the relationship between x and y is linear:

y = β0 + β1 x + u

- β0 is the intercept term/parameter
- β1 is the slope parameter; it measures the effect of x on y, holding all other factors constant
Note: in what instances would a linear functional form be a poor choice?


More on the Error Term
As long as β0 is included in the equation, we can assume that the average value of u in the population is zero:

E(u) = 0   (2)

A crucial assumption is that the average value of u does not depend on x; this is known as mean independence:

E(u | x) = E(u)   (3)

Combining equations (2) and (3) yields one of the most important assumptions in regression analysis, the zero conditional mean assumption:

E(u | x) = 0   (4)


The Simple Regression Model
The zero conditional mean assumption gives β1 another interpretation. Taking conditional expectations of equation (1) yields

E(y | x) = β0 + β1 x   (5)

which is known as the population regression function. We interpret β1 as follows: a one-unit increase in x increases the expected value of y by β1 units.

The Simple Regression Model
We can now reconsider equation (1). y can be decomposed into two parts:

y = (β0 + β1 x) + u

- the explained part, β0 + β1 x: the part of y explained by x
- the unexplained part, u: the part of y that cannot be explained by x

Ordinary Least Squares
Now we can begin to discuss how to estimate β0 and β1 given a random sample of y and x. Let {(x_i, y_i) : i = 1, ..., n} be a random sample of size n drawn from the population (x, y), so that

y_i = β0 + β1 x_i + u_i   (6)

How do we use the data to obtain parameter estimates of the population intercept and slope?

Ordinary Least Squares
We begin with the zero conditional mean assumption of equation (4), which implies

Cov(x, u) = E(ux) = 0   (7)

and the zero mean assumption of equation (2),

E(u) = 0   (8)

These two equations are known as moment conditions.

Ordinary Least Squares
We then define u in terms of the simple regression equation, so the first moment condition becomes

E(ux) = E[(y − β0 − β1 x) x] = 0   (9)

and the zero mean condition of equation (2) becomes

E(u) = E(y − β0 − β1 x) = 0   (10)

Ordinary Least Squares
Given our sample of x and y, using the method of moments, we choose the parameter estimates β̂0 and β̂1 to solve the sample analogues of the moment conditions:

(1/n) Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i) x_i = 0   (11)

(1/n) Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i) = 0   (12)

Ordinary Least Squares
Solving yields the parameter estimate for β0,

β̂0 = ȳ − β̂1 x̄   (13)

and the estimate for β1,

β̂1 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)²   (14)

Equation (14) is just the sample covariance between x and y divided by the sample variance of x.
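To make equations (13) and (14) concrete, here is a minimal Python sketch; the data are simulated with arbitrary true values β0 = 2 and β1 = 3, and the variable names are my own:

import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)  # true beta0 = 2, beta1 = 3

# Equation (14): sample covariance of x and y over the sample variance of x
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Equation (13): intercept recovered from the sample means
beta0_hat = y.mean() - beta1_hat * x.mean()

print(beta0_hat, beta1_hat)  # close to 2 and 3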

Ordinary Least Squares
The method of moments is not the only way to arrive at these formulas for the parameter estimates of β0 and β1. The focus of Econ 351 will be on the method of ordinary least squares; our estimates β̂0 and β̂1 are also called the ordinary least squares (OLS) estimates.

Ordinary Least Squares
To see why, define the fitted value as the value of y_i that we obtain from combining the sample x_i with our parameter estimates β̂0 and β̂1:

ŷ_i = β̂0 + β̂1 x_i

Define the residual as the difference between the actual value y_i and the fitted value ŷ_i:

û_i = y_i − ŷ_i = y_i − β̂0 − β̂1 x_i

[Figure 2.4 from Wooldridge, Chapter 2: fitted values and residuals. The figure plots the OLS regression line ŷ = β̂0 + β̂1 x; each residual û_i is the vertical distance between the observed y_i and the fitted value ŷ_i.]

Ordinary Least Squares
It seems reasonable to want parameter values that minimize the difference between the true y_i and the fitted value ŷ_i. Sometimes û_i will be positive and sometimes negative, so in theory the sum of the residuals could equal zero even when the individual errors are large. If we instead square the residuals before summing, we obtain a more accurate summary of the total error in the regression.

Ordinary Least Squares
Choosing the values of β0 and β1 that minimize the sum of squared residuals is the basic principle behind ordinary least squares:

Σ_{i=1}^n û_i² = Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i)²   (15)

To minimize equation (15), we set the first-order conditions with respect to each of the β̂s equal to zero.

Ordinary Least Squares
The fitted values and parameter estimates form the OLS regression line

ŷ = β̂0 + β̂1 x   (16)

The slope estimate tells us the amount by which ŷ changes when x changes by one unit:

β̂1 = Δŷ / Δx

Useful Properties of OLS Estimates
1. The sum of the OLS residuals is zero: Σ_{i=1}^n û_i = 0
2. The sample covariance between x and û is zero: Σ_{i=1}^n x_i û_i = 0
3. The point (x̄, ȳ) is always on the OLS regression line
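As a quick numerical check, the following sketch (continuing the simulated example above, so the data-generating values are again assumptions) computes fitted values and residuals and verifies all three properties:

import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x  # fitted values
u_hat = y - y_hat                  # residuals

print(np.isclose(u_hat.sum(), 0))        # property 1: residuals sum to zero
print(np.isclose((x * u_hat).sum(), 0))  # property 2: zero sample covariance with x
print(np.isclose(beta0_hat + beta1_hat * x.mean(), y.mean()))  # property 3: (x̄, ȳ) is on the line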

Useful Properties of OLS Estimates
Rewriting y_i in terms of its fitted value and its residual is useful:

y_i = ŷ_i + û_i

From here we see that:
- since (1/n) Σ_{i=1}^n û_i = 0, the sample average of the fitted values equals ȳ
- the sample covariance of ŷ_i and û_i is zero
OLS thus decomposes each y_i into two parts, a fitted value and a residual, which are uncorrelated with each other.

Sum of Squares
1. Total Sum of Squares: SST = Σ_{i=1}^n (y_i − ȳ)²
2. Explained Sum of Squares: SSE = Σ_{i=1}^n (ŷ_i − ȳ)²
3. Residual Sum of Squares: SSR = Σ_{i=1}^n (y_i − ŷ_i)²

Sum of Squares
1. Total Sum of Squares: measures the total sample variation in the y_i (how spread out the y_i are in the sample)
2. Explained Sum of Squares: measures the sample variation in the fitted values ŷ_i
3. Residual Sum of Squares: measures the sample variation in the residuals û_i
Note that the total variation can be expressed as the sum of the explained and unexplained variation: SST = SSE + SSR

Goodness of Fit
One of the most common ways to measure how well a regression fits the data is the R-squared:

R² = SSE/SST = 1 − SSR/SST   (17)

It tells us the ratio of the explained variation to the total variation, so if the majority of the variation in y is due to unobserved factors, the R² tends to be very low. R² is always between 0 and 1.
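The decomposition and equation (17) are easy to verify numerically; here is a short sketch under the same simulated setup as before:

import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares

print(np.isclose(sst, sse + ssr))      # SST = SSE + SSR holds
print(sse / sst, 1 - ssr / sst)        # the two expressions for R² in (17) agree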

Notes on the R²
A low R² does not necessarily mean that the regression is bad and should not be used. It simply means that the variable x does not explain much of the variation in the variable y, i.e. there are other variables that might help to explain y. The regression may still provide an accurate summary of the relationship between x and y.

Functional Form
- Level-Level: dependent and independent variables are in levels and related linearly: y = β0 + β1 x + u
- Log-Level: dependent variable is in log form, independent variable in levels: log(y) = β0 + β1 x + u
- Log-Log: dependent and independent variables are both in log form; β1 can be interpreted as an elasticity: log(y) = β0 + β1 log(x) + u
- Level-Log: dependent variable is in levels, independent variable in log form: y = β0 + β1 log(x) + u

Functional Form

Model       | Equation                     | Y      | X      | Interpretation of β1
Level-Level | y = β0 + β1 x + u            | y      | x      | Δy = β1 Δx
Log-Level   | log(y) = β0 + β1 x + u       | log(y) | x      | %Δy = (100 β1) Δx
Log-Log     | log(y) = β0 + β1 log(x) + u  | log(y) | log(x) | %Δy = β1 %Δx
Level-Log   | y = β0 + β1 log(x) + u       | y      | log(x) | Δy = (β1/100) %Δx
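To illustrate the log-level interpretation, here is a small simulated sketch; the wage-education setup and the true semi-elasticity of 8% per year of education are purely illustrative assumptions:

import numpy as np

rng = np.random.default_rng(3)
n = 1000
educ = rng.integers(8, 21, size=n).astype(float)              # years of education
log_wage = 1.5 + 0.08 * educ + rng.normal(scale=0.3, size=n)  # log-level model

b1 = np.sum((educ - educ.mean()) * (log_wage - log_wage.mean())) / np.sum((educ - educ.mean()) ** 2)
print(100 * b1)  # ≈ 8: one more year of education is associated with roughly 8% higher wages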

Unbiasedness of OLS
Unbiasedness is a statistical property that we will examine in the context of our simple linear regression model. We require four assumptions to establish the unbiasedness of the OLS parameter estimates:
- SLR.1 - Linear in Parameters: the model can be written as y = β0 + β1 x + u
- SLR.2 - Random Sampling: {(x_i, y_i) : i = 1, ..., n} is drawn as a random sample
- SLR.3 - Variation in x: the sample outcomes on x are not all the same value
- SLR.4 - Zero Conditional Mean: our previous assumption E(u | x) = 0 holds

Unbiasedness of OLS
Now consider rewriting β̂1 as

β̂1 = Σ_{i=1}^n (x_i − x̄) y_i / Σ_{i=1}^n (x_i − x̄)²

Recall from the review that an estimator is unbiased if its expectation equals the true parameter value. Substituting the regression equation for y_i yields

β̂1 = Σ_{i=1}^n (x_i − x̄)(β0 + β1 x_i + u_i) / Σ_{i=1}^n (x_i − x̄)²

Unbiasedness of OLS
Cancelling the terms that equal zero, this is

β̂1 = β1 + Σ_{i=1}^n (x_i − x̄) u_i / Σ_{i=1}^n (x_i − x̄)²

Checking unbiasedness, we take expectations (treating the x_i as fixed):

E(β̂1) = β1 + [1 / Σ_{i=1}^n (x_i − x̄)²] Σ_{i=1}^n (x_i − x̄) E(u_i)

and since E(u_i) = 0, we have

E(β̂1) = β1

Unbiasedness of OLS
Now to verify the unbiasedness of β̂0, write

β̂0 = ȳ − β̂1 x̄ = β0 + β1 x̄ + ū − β̂1 x̄

Taking expectations,

E(β̂0) = β0 + β1 x̄ + E(ū) − E(β̂1) x̄

and since E(ū) = 0 and E(β̂1) = β1, the β1 x̄ terms cancel, leaving E(β̂0) = β0. So β̂0 is also unbiased under SLR.1 - SLR.4.
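A Monte Carlo simulation is a natural way to see unbiasedness in action. The sketch below (my own illustration, again with assumed true values β0 = 2 and β1 = 3) draws many samples and averages the OLS estimates across replications:

import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 10_000
beta0, beta1 = 2.0, 3.0

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)  # E(u|x) = 0 holds by construction
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    estimates[r] = (b0, b1)

print(estimates.mean(axis=0))  # ≈ (2, 3): on average, OLS recovers the true parameters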

Variance of the OLS Estimate
We also wish to know how far we can expect β̂1 to be from β1 on average. We can compute the variance of the OLS estimators under assumptions SLR.1 - SLR.4 plus one additional assumption:
- SLR.5 - Homoskedasticity: the error term has the same variance given any value of the explanatory variable:

Var(u | x) = σ²  for all values of x

Variance of the OLS Estimate
Under SLR.1 - SLR.5, the variances of the OLS estimators are

Var(β̂1) = σ² / Σ_{i=1}^n (x_i − x̄)²

and

Var(β̂0) = σ² (n⁻¹ Σ_{i=1}^n x_i²) / Σ_{i=1}^n (x_i − x̄)²
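These are variances conditional on the sample values of x. A sketch that holds x fixed across replications (an assumption of the illustration, with σ² = 1) shows the simulated variance of β̂1 matching the formula:

import numpy as np

rng = np.random.default_rng(5)
n, reps, sigma2 = 100, 10_000, 1.0

x = rng.normal(size=n)  # keep x fixed across replications
theoretical = sigma2 / np.sum((x - x.mean()) ** 2)  # Var(β̂1) under SLR.1 - SLR.5

b1_draws = np.empty(reps)
for r in range(reps):
    y = 2 + 3 * x + rng.normal(scale=np.sqrt(sigma2), size=n)
    b1_draws[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(theoretical, b1_draws.var())  # the two variances agree closely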

Estimating the Error Variance
Typically we do not know the true value of σ², so we need to obtain an estimate of it. The errors are never observed, but the regression residuals are. Note that E(u²) = σ², so an unbiased estimator of σ² would be (1/n) Σ_{i=1}^n u_i². However, we do not observe the u_i; we only observe the û_i.

Estimating the Error Variance
Replacing u_i with û_i yields the estimator

(1/n) Σ_{i=1}^n û_i²

However, this estimator is biased. Recall the two restrictions from the first-order conditions: Σ_{i=1}^n û_i = 0 and Σ_{i=1}^n x_i û_i = 0. If we observed any n − 2 of the residuals, we could always use these conditions to back out the remaining two, so the residuals have only n − 2 degrees of freedom.

Estimating the Error Variance
Our estimate of the error variance therefore makes an adjustment for the degrees of freedom:

σ̂² = [1/(n − 2)] Σ_{i=1}^n û_i²   (18)

Is σ̂² unbiased? Yes: E(σ̂²) = σ² under SLR.1 - SLR.5.
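To see the degrees-of-freedom adjustment at work, this sketch (simulated, with an assumed σ² = 1 and a deliberately small n so the bias is visible) compares the naive 1/n estimator with equation (18) across replications:

import numpy as np

rng = np.random.default_rng(6)
n, reps, sigma2 = 30, 20_000, 1.0

naive, adjusted = np.empty(reps), np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = 2 + 3 * x + rng.normal(scale=np.sqrt(sigma2), size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    u_hat = y - b0 - b1 * x
    naive[r] = np.sum(u_hat ** 2) / n           # biased: ignores the two lost degrees of freedom
    adjusted[r] = np.sum(u_hat ** 2) / (n - 2)  # equation (18)

print(naive.mean(), adjusted.mean())  # ≈ 0.93 vs ≈ 1.00: only (18) centers on the true σ²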

Estimators of the OLS Parameter Variances
We can substitute equation (18) into the formulas for Var(β̂0) and Var(β̂1) to obtain estimates of the variances of β̂0 and β̂1:

V̂ar(β̂1) = σ̂² / Σ_{i=1}^n (x_i − x̄)²

V̂ar(β̂0) = σ̂² (n⁻¹ Σ_{i=1}^n x_i²) / Σ_{i=1}^n (x_i − x̄)²

Additional Notes on Variance Estimates
We call the square root of the estimate of the error variance the standard error of the regression:

σ̂ = √σ̂²

σ̂ is used to compute the standard error of β̂1:

se(β̂1) = σ̂ / √(Σ_{i=1}^n (x_i − x̄)²)
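Putting the pieces together, here is a minimal standard-error computation in pure numpy on simulated data; the cross-check against scipy.stats.linregress is just a sanity test of the formula:

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)                  # equation (18)
se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))  # se(β̂1)

print(se_b1, linregress(x, y).stderr)  # the two standard errors agree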

Regression Through the Origin
In some instances it makes sense to exclude the constant term from the model:

y = β1 x + u   (19)

This regression equation is called a regression through the origin, since we are forcing the intercept to equal 0. Minimizing the sum of squared residuals for this regression yields the following estimate for β1:

β̃1 = Σ_{i=1}^n x_i y_i / Σ_{i=1}^n x_i²

(the tilde distinguishes this estimator from the usual OLS slope β̂1)
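A final sketch (simulated data whose true intercept really is zero, by assumption) compares the through-origin estimator with the usual OLS slope:

import numpy as np

rng = np.random.default_rng(8)
n = 500
x = rng.uniform(0, 5, size=n)
y = 3 * x + rng.normal(size=n)  # true model has no intercept

beta1_origin = np.sum(x * y) / np.sum(x ** 2)  # regression through the origin
beta1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(beta1_origin, beta1_ols)  # both ≈ 3 when the intercept is truly zero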