Chapter 2: The Simple Regression Model

Goal: understand how to estimate and, more importantly, interpret the simple regression.

Reading: chapter 2 of the textbook.

Advice: this chapter is the foundation of econometrics; you should have a solid understanding of it.

Discuss: how can we show that the class size x and the rating of ECO201 instructors y are related? Comment on the limits of the following ideas.

1. Idea 1: estimate the covariance of x and y.

2. Idea 2: estimate the correlation (coefficient) of x and y.

3. Idea 3: let sample 1 include the classes with fewer than 40 students, and sample 2 the classes with 40 or more students. Compare the average rating in the two samples (using a two-sample t test).

Simple regression

1. Simple regression is used when we try to use only one independent variable (explanatory variable, regressor) x to explain the dependent variable y. We have two variables in mind, and we want to know their relationship.

2. We start with simple regression not because it is good, but because it is simple. In fact, simple regression has a serious drawback: y can depend on many factors, but the simple regression uses only one of them. Ignoring other factors may induce bias; that is, the estimated simple regression may capture the effect of other factors instead of the true effect of x. We will revisit this bias issue repeatedly.

3. For instance, let y be the rating of an ECO201 instructor and x the class size (the number of students in the class). We want to know how class size affects the rating. Other factors, such as whether the class meets before 9 am, matter, but the simple regression ignores them. Discuss: can you think of other factors that may affect the rating?

4. Mathematically, the simple regression model is

y = β_0 + β_1 x + u    (1)
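The three ideas above, and the model in (1), can be made concrete with a short simulation. The notes use Stata; the following is a minimal Python sketch instead (all numbers are hypothetical, and the use of numpy and scipy is an assumption, not part of the notes).

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
beta0, beta1 = 4.5, -0.01          # hypothetical intercept and slope
x = rng.uniform(10, 80, size=n)    # hypothetical class sizes between 10 and 80
u = rng.normal(0, 0.3, size=n)     # error term: all other factors
y = beta0 + beta1 * x + u          # model (1): y = beta0 + beta1*x + u

# Idea 1: sample covariance of x and y (off-diagonal of the 2x2 matrix)
print("cov(x, y)  =", np.cov(x, y)[0, 1])
# Idea 2: sample correlation coefficient
print("corr(x, y) =", np.corrcoef(x, y)[0, 1])
# Idea 3: two-sample t test, small classes vs. large classes
small, large = y[x < 40], y[x >= 40]
print(stats.ttest_ind(small, large, equal_var=False))

Each of the three measures reveals only the sign and strength of the association; none of them delivers the marginal effect β_1 that the regression below is designed to estimate.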

where

(a) β_0 is the (unknown) intercept coefficient.

(b) β_1 is the (unknown) slope coefficient.

(c) Graphically, the function y = β_0 + β_1 x is a straight line: β_0 is the intercept and β_1 is the slope. The line is upward sloping if β_1 > 0 and downward sloping if β_1 < 0.

(d) The error term (innovation, disturbance term, shock) u represents the factors other than x that affect y. The letter u is the first letter of "unobserved." The simple regression effectively treats all other factors as unobserved (as if their data were unavailable).

(e) y, x, and u are all random variables; y and x are observed (data are available for them), while u is unobserved.

5. The key parameter is β_1, which measures the so-called marginal effect:

β_1 = dy/dx ≈ Δy/Δx    (2)

(a) Model (1) assumes the marginal effect of x on y is constant. That means y changes by a constant amount (or at a constant rate) when x changes by one unit. Later you will learn how to modify the regression to allow for a nonconstant marginal effect. Discuss: do you think the rating of an instructor will change at a constant rate as more and more students are added to a class? Hint: think about a graph with y on the vertical axis and x on the horizontal axis.

(b) The marginal effect is a mathematical concept. It assumes u is held constant; this is reminiscent of ceteris paribus. That is why the simple regression can be used for proving causality, but it must be used with caution: you need to verify certain assumptions. See below.

6. We call (1) a model because it is based on a key assumption:

E(u|x) = E(u) = 0    (3)

(a) The E(u) = 0 part of (3) is not restrictive. Provided that the intercept term is included, we can always redefine a new error term with zero mean if the original error term has a nonzero mean.

(b) The E(u|x) = E(u) part is crucial for the purpose of proving causality. For forecasting purposes, E(u|x) = E(u) is not needed.

(c) E(u|x) = E(u) implies that as x varies, the mean of u conditional on x remains constant (and the constant is E(u)). Put differently, it is as if u and x are independent, so that E(u|x) = E(u). This independence is again reminiscent of ceteris paribus.

(d) Assumption (3) is equivalent to the assumption that

E(y|x) = β_0 + β_1 x    (4)

Proof: under (3), E(y|x) = E(β_0 + β_1 x + u|x) = β_0 + β_1 x. Assumption (4) says the conditional mean E(y|x) is linear and equal to β_0 + β_1 x. Formally, E(y|x) = β_0 + β_1 x is called the population regression function (PRF).

(e) Assumptions (3) and (4) both fail when E(u|x) depends on x, i.e., when

E(u|x) = f(x) ≠ constant.    (5)

In that case the true conditional mean is E(y|x) = β_0 + β_1 x + f(x), but f(x) is missing from the right-hand side of (4). So β_0 + β_1 x mis-specifies E(y|x).

(f) For instance, Assumptions (3) and (4) hold if the class size x is independent of all other factors u. They fail when, say, a small class is more likely to meet before 9 am than a big class.

(g) Assumption (3) is difficult to check because it involves the conditional mean. However, Assumption (3) implies that the regressor x is uncorrelated with u, and this is easier to verify:

E(u|x) = E(u) = 0 ⇒ cov(x, u) = 0    (6)

Put differently, Assumption (3) implies the regressor x is exogenous.
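The role of Assumption (3) can be illustrated by simulation. Below is a minimal, hypothetical sketch (Python with numpy; not from the notes): when u is independent of x the fitted slope recovers β_1, but when E(u|x) varies with x, as in (5), the fitted slope is biased.

import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1 = 100_000, 4.5, -0.01   # hypothetical true coefficients
x = rng.uniform(10, 80, size=n)

# Exogenous case: u independent of x, so E(u|x) = E(u) = 0 holds.
u = rng.normal(0, 0.3, size=n)
y = beta0 + beta1 * x + u
print(np.polyfit(x, y, 1)[0])       # close to the true beta1 = -0.01

# Endogenous case: E(u|x) = -0.005(x - mean(x)) depends on x, violating (3).
u_bad = rng.normal(0, 0.3, size=n) - 0.005 * (x - x.mean())
y_bad = beta0 + beta1 * x + u_bad
print(np.polyfit(x, y_bad, 1)[0])   # close to -0.015: biased away from beta1

The second slope mixes the true β_1 with the effect of the omitted factor, which is exactly the bias discussed in (e) and (f).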

(h) Assumption (3) must be violated when x is correlated with any other factor, i.e., when cov(x, u) ≠ 0. In that case the regressor x is endogenous.

A regressor is exogenous when it is uncorrelated with the error term.
A regressor is endogenous when it is correlated with the error term.
Assumption (3) is violated when the regressor is endogenous.

(i) For example, the class size is endogenous if it is correlated with the meeting time of the class; Assumption (3) is violated in that case. The class size is exogenous if it is uncorrelated with all other factors. This is a very restrictive requirement. Most likely, the regressor in a simple regression is endogenous, and endogeneity means difficulty in proving causality.

(j) Because the simple regression is very likely to suffer from the endogeneity issue, it is not the best choice for proving causality. If a simple regression has to be used, it must be used with caution. We will revisit this issue later.

7. We can show that the statistical interpretation of β_1 is

β_1 = cov(y, x) / var(x)    (7)

Therefore the method of moments (MM) estimator for β_1 is given as

β̂_1 = s_{y,x} / s_x^2    (8)

s_{y,x} = (1/(n−1)) Σ_{i=1}^n (y_i − ȳ)(x_i − x̄)    (9)

s_x^2 = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)^2    (10)

where s_{y,x} is the sample covariance of y and x, and s_x^2 is the sample variance of x. Basically, the MM estimator replaces the population variance and covariance with the sample variance and covariance.

Remarks

(a) Formula (8) is used by the Stata command reg.
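As a check on formulas (8)-(10), here is a minimal sketch (hypothetical data; numpy assumed, not part of the notes) that computes β̂_1 from the sample covariance and variance by hand and compares it with the slope of the least-squares line fitted by numpy.

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(10, 80, size=50)              # hypothetical class sizes
y = 4.5 - 0.01 * x + rng.normal(0, 0.3, 50)   # hypothetical ratings

xbar, ybar = x.mean(), y.mean()
s_yx = ((y - ybar) * (x - xbar)).sum() / (len(x) - 1)  # formula (9)
s_x2 = ((x - xbar) ** 2).sum() / (len(x) - 1)          # formula (10)
beta1_hat = s_yx / s_x2                                # formula (8)

# Cross-check against the least-squares line fitted by numpy.
slope, intercept = np.polyfit(x, y, 1)
print(beta1_hat, slope)   # the two slope estimates coincide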

(b) Formula (8) shows that the estimated slope coefficient β̂_1 has the same sign as the sample covariance. If y and x move in the same direction, they have a positive sample covariance and a positive β̂_1.

(c) For instance, we may believe class size is negatively correlated with the rating: the instructor may perform better and get a higher rating in a small class than in a big class. Then we guess β̂_1 would be negative. If the sign of the actual β̂_1 contradicts your guess, you may want to figure out why.

(d) Formula (8) indicates the magnitude of β̂_1 can be manipulated, simply because the covariance can be manipulated. Find out what happens to β̂_1 if we multiply x by a constant c (see the sketch after this list).

(e) β̂_1 cannot be computed if x does not vary: s_x^2 = 0 in that case, and we cannot put zero in the denominator of (8). This result implies that big variance (variation) in x is desirable. Intuitively, big variation in x means there is a lot of information about x that can be used for estimation. If x does not change, or changes only a little, the estimation can be imprecise.

(f) For instance, we hope our sample contains both big classes and small classes. We cannot estimate the effect of class size on rating if all classes in our sample have the same size.

(g) Formula (7) indicates how to interpret β_1 and its estimate β̂_1. In general, the slope coefficient β_1 just measures association (correlation). A common mistake is to over-interpret β_1. Whether or not β_1 measures causality depends on Assumption (3). In short, β_1 measures causality only when the regressor is exogenous! Intuitively, ceteris paribus is satisfied by an exogenous regressor because x and u are (as if) independent, so we can change x while holding all other factors u constant. If x and u are dependent (correlated), then β_1 only shows association, not causality.
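The rescaling question in (d) can be explored numerically. A minimal sketch (hypothetical data; numpy assumed): since cov(cx, y) = c·cov(x, y) and var(cx) = c²·var(x), formula (8) implies β̂_1 is divided by c when x is multiplied by c.

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(10, 80, size=50)
y = 4.5 - 0.01 * x + rng.normal(0, 0.3, 50)

c = 10.0                                 # e.g., measure class size in tens
slope_x = np.polyfit(x, y, 1)[0]
slope_cx = np.polyfit(c * x, y, 1)[0]
print(slope_x, slope_cx, slope_x / c)    # slope_cx equals slope_x / c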

8. Critical thinking: how do we interpret and estimate the intercept coefficient β_0?

(a) Use Assumption (3) to show that

β_0 = μ_y − β_1 μ_x    (11)

(b) The MM estimator for β_0 is given by

β̂_0 = ȳ − β̂_1 x̄    (12)

where ȳ and x̄ are the sample means of y and x. Why does (12) make sense?

(c) The Stata command reg y x automatically reports β̂_0; it is the coefficient on the variable _cons.

(d) It can be meaningless to try to interpret β_0, because the intercept coefficient β_0 measures the mean value of y when x equals zero. In our data, all classes have at least one student, so the class size x cannot equal zero. Then β_0 becomes uninteresting.

(e) How can we modify the regression so that the new β_0 measures the mean value of y when x equals its mean value?

9. One advantage of the MM estimator is that no calculus is used. There is an alternative estimator, called the ordinary least squares (OLS) estimator. The OLS estimator is intuitive but requires calculus. Many old textbooks only discuss the OLS estimator, but modern econometrics emphasizes the MM estimator more and more. The good news is that the OLS estimators for β_0 and β_1 are the same as the MM estimators.

(a) Denote the OLS estimators by β̂_0 and β̂_1; the hat emphasizes that they are estimates. For each observation we can compute the fitted (predicted) value

ŷ_i ≡ β̂_0 + β̂_1 x_i, (i = 1, ..., n)    (13)

where the subscript i indexes the observations and x_i is the i-th observation of x. In total we have n fitted values: (ŷ_1, ŷ_2, ..., ŷ_n).

(b) Because we use a sample instead of the population, β̂_0 and β̂_1 most likely differ from the true values β_0 and β_1. That means in general y_i ≠ ŷ_i. Their difference, called the residual, captures the sampling error:

û_i ≡ y_i − ŷ_i    (14)

In total we have n residuals.

(c) Now we have a very important decomposition:

y_i ≡ ŷ_i + û_i    (15)

The fitted value ŷ represents the part of y explained by x; the residual û represents the unexplained part. Put differently, the residual is the part of y remaining after the effect of x has been netted out.

(d) Because the residual represents error, we want to minimize it. More explicitly, the OLS estimator is obtained by minimizing the sum of squared residuals:

(β̂_0, β̂_1) = argmin Σ_{i=1}^n û_i^2    (16)

The objective function Σ_{i=1}^n û_i^2 is called the residual sum of squares (RSS). Squaring is intended to penalize big errors and to avoid cancellation of positive and negative errors. A squared (quadratic) function also has some nice mathematical properties.

(e) From calculus, a (differentiable, or smooth) function is minimized where its derivative is zero. So Appendix 2A of the textbook (on page 66) obtains the first order conditions (FOC) (or necessary conditions) by setting the partial derivatives of the RSS with respect to β̂_0 and β̂_1 to zero. The FOC are

Σ_{i=1}^n û_i = 0    (17)

Σ_{i=1}^n û_i x_i = 0    (18)

The OLS estimator is the solution of the system of equations (17) and (18). The OLS estimators are the same as (8) and (12).
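The first order conditions (17) and (18) can be verified numerically. A minimal sketch (hypothetical data; numpy assumed): the OLS residuals sum to zero and are orthogonal to the regressor, up to floating-point error.

import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(10, 80, size=50)
y = 4.5 - 0.01 * x + rng.normal(0, 0.3, 50)

slope, intercept = np.polyfit(x, y, 1)  # OLS = MM: same beta1_hat, beta0_hat
y_hat = intercept + slope * x           # fitted values, equation (13)
u_hat = y - y_hat                       # residuals, equation (14)

print(u_hat.sum())        # FOC (17): approximately zero
print((u_hat * x).sum())  # FOC (18): approximately zero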

(f) Another unknown parameter is σ = sd(u) = √var(u). Its OLS estimator is

σ̂ = √(RSS / (n − k − 1))    (19)

where n is the sample size and k is the number of regressors; for simple regression k = 1. σ̂ is called the standard error of the regression (SER). The SER measures the other factors contained in the error term. To put the SER into perspective, you may compare it to μ_y: the regression has a good fit if the SER is small relative to μ_y.

(g) There is a formal measure of goodness of fit, called R squared (the coefficient of determination). The formula is

R² = 1 − RSS/TSS = 1 − Σ_{i=1}^n û_i² / Σ_{i=1}^n (y_i − ȳ)²    (20)

where TSS is shorthand for the total sum of squares, TSS ≡ Σ_{i=1}^n (y_i − ȳ)². R² measures the fraction of the variation in y that can be explained by the regressor x. A model fits the data well if R² is close to one. For example, R² = 0.4 means 40% of the variation in the dependent variable can be explained by the model. Because R² measures the degree of linear association, not causality, we have: causality is the focus of econometrics; R² is not.
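Finally, a minimal sketch (hypothetical data; numpy assumed) that computes the SER in (19) and the R² in (20) directly from the residuals.

import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(10, 80, size=50)
y = 4.5 - 0.01 * x + rng.normal(0, 0.3, 50)

slope, intercept = np.polyfit(x, y, 1)
u_hat = y - (intercept + slope * x)

n, k = len(y), 1                     # k = 1 regressor in simple regression
rss = (u_hat ** 2).sum()             # residual sum of squares
tss = ((y - y.mean()) ** 2).sum()    # total sum of squares

ser = np.sqrt(rss / (n - k - 1))     # equation (19)
r2 = 1 - rss / tss                   # equation (20)
print("SER:", ser, "R^2:", r2)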