Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects


Economics 113. Simple Regression Models: Simple Regression Assumptions; Simple Regression Derivation; Changing Units of Measurement; Nonlinear Effects; OLS and Unbiased Estimates; Variance of the OLS Estimates. Book chapters: Chapters 1, 2, and 3 are relevant for the next few lectures. Books have been placed on reserve at the Science and Engineering Library.

Econometrics Intro. Statistics applied to non-experimental data: estimate relationships and describe behavior. Usually comes from an economic model: utility maximization, profit maximization, government objectives, political objectives.

Econometrics Example. Model of criminal activity: y = hours spent breaking into cars, with y = f(x_1, x_2, ..., x_{m−1}, x_m). What should be included in the function? What do we do with our model and estimates? Form individual or joint hypotheses about the variables and their effects; generate predictions.

Econometrics Causality. Causality: the most important thing you will learn in this class! It is very hard to determine in a non-experimental setting. Examples: Maternal smoking and infant birth weight: smoking vs. non-smoking mothers; are they the same? Wages and labor supply: you only receive a wage if you have a job, and people in the labor force have a job for a reason. Temperatures and CO2: more CO2 leads to higher temperatures, and higher temperatures lead to more CO2.

Econometrics Data. Cross-sectional data: a sample of agents taken at one point in time. Ideally, the data is a random sample and observations are independent. Are cross-sectional observations independent? Time-series data: repeat observations on specific agents. Are time-series observations independent? Panel data: repeat observations for the same agents in different time periods. Ideal data, but difficult to get; panel data can be used to analyze individual-specific differences. Are panel observations independent?

Simple Regression Model Intro. How does y change with x? y can be called the dependent variable, explained variable, or LHS variable; x can be called the independent variable, explanatory variable, or RHS variable. The model is y = β0 + β1 x + u. u is the error term, or "disturbance" term; u contains everything that we don't control for, both observed and unobserved. β1 is the slope parameter; β0 is the intercept parameter.

Simple Regression Model An Example. Example: class attendance and grades: grade_i = β0 + β1 Attend_i + u_i. How do we interpret β0 and β1? Suppose we estimate: grade-hat_i = 22.769 + 0.121 Attend_i. Each additional class attended is associated with a grade higher by 0.121. Is this causal? When does β1 summarize a causal relationship between Attend and grade?
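To make the interpretation concrete, the fitted line from the slide can be evaluated directly (the attendance values below are hypothetical, chosen only for illustration):

```python
# Fitted line from the slide: grade-hat = 22.769 + 0.121 * Attend
b0_hat, b1_hat = 22.769, 0.121

# Predicted grade for a (hypothetical) student attending 30 classes
pred_30 = b0_hat + b1_hat * 30
print(round(pred_30, 3))

# Attending 10 more classes is associated with 10 * 0.121 more grade points
print(round(10 * b1_hat, 3))
```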

Simple Regression Model The Assumptions. General framework: y_i = β0 + β1 x_i + u_i. Assumption 1: E(u) = 0. This is innocuous as long as we have an intercept in the model. Assumption 2: E(u|x) = E(u). Combined with Assumption 1, this gives us E(u|x) = 0: given any x, the value of u we expect is 0. This is not necessarily realistic; it is the hard assumption to satisfy.

Simple Regression Model The Example. Example: class attendance and grades: grade_i = β0 + β1 Attend_i + u_i. The key: u contains all the variables, other than Attend, that help determine your grade! Can you list some of these variables? For example, for A2 to hold, we would need E(u | Attend = 32) = E(u | Attend = 10). What does this mean? Is this likely?

Least Squares Regression The Derivation. How do we estimate β0 and β1?
Predicted value: ŷ_i = β̂0 + β̂1 x_i. Residual: û_i = y_i − ŷ_i.
Suppose that we choose to minimize the sum of squared residuals:
min over (β̂0, β̂1) of Σ_{i=1}^n û_i² = Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i)²   (1)
Thus: take derivatives!

Least Squares Regression The Derivation. Differentiate Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i)² with respect to β̂0:
−2 Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i) = 0
Divide by −2 and by n:
(1/n) Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i) = 0
To which assumption does this equation correspond? E(u) = 0.

Least Squares Regression The Derivation. Differentiate Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i)² with respect to β̂1:
−2 Σ_{i=1}^n x_i (y_i − β̂0 − β̂1 x_i) = 0
Divide by −2 and by n:
(1/n) Σ_{i=1}^n x_i (y_i − β̂0 − β̂1 x_i) = 0
To which assumption does this equation correspond? E(u|x) = 0.

Least Squares Regression The Derivation. Combining the two equations, and after lots of algebra, we get:
β̂1 = Σ_{i=1}^n (x_i − µ_x)(y_i − µ_y) / Σ_{i=1}^n (x_i − µ_x)²
Another way to write this: β̂1 = σ̂_xy / σ̂_x².
To solve for β̂0, take means of y_i = β̂0 + β̂1 x_i + û_i and rearrange: β̂0 = µ_y − β̂1 µ_x.
We can also solve for the residuals: û_i = y_i − ŷ_i = y_i − (β̂0 + β̂1 x_i).
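The closed-form estimators above are easy to compute directly. A minimal sketch in Python (the data are made up for illustration; NumPy is assumed available):

```python
import numpy as np

# Hypothetical sample data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope: sample covariance of (x, y) over sample variance of x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: take means of the fitted equation and rearrange
b0 = y.mean() - b1 * x.mean()
# Residuals: actual minus predicted values
u_hat = y - (b0 + b1 * x)

print(round(b1, 4), round(b0, 4))
print(abs(u_hat.sum()) < 1e-9)  # residuals sum to zero by construction
```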

Simple Regression Model Diagnostic Measures.
SST: the total sum of squares measures the total amount of variability in the dependent variable: SST = Σ_{i=1}^n (y_i − µ_y)².
SSR: the sum of squared residuals measures the amount of variability that the model does not explain: SSR = Σ_{i=1}^n û_i².
R-squared: R² = 1 − SSR/SST measures the variation "explained" by the model. Often misinterpreted as "goodness of fit".
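The variance decomposition is equally mechanical. A minimal sketch in the same style (data again made up for illustration):

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# OLS fit via the closed-form formulas
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)

sst = np.sum((y - y.mean()) ** 2)  # total variability in y
ssr = np.sum(u_hat ** 2)           # variability the model does not explain
r2 = 1 - ssr / sst
print(round(r2, 4))
```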

OLS Changing Units of Measurement. Data scaling: predictions in different units, different interpretations. Example: wage = β0 + β1 educ + β2 tenure + u, where educ is in years, tenure is years on the job, and wage is in dollars. Estimates: wage-hat = β̂0 + β̂1 educ + β̂2 tenure. Again, the u vanishes since E[u | educ, tenure] = 0.

OLS Changing Units of Measurement. Wage in cents rather than dollars? wage_dollars = (1/100) wage_cents.
Original equation: wage_dollars = β̂0 + β̂1 educ + β̂2 tenure.
Substitute: (1/100) wage_cents = β̂0 + β̂1 educ + β̂2 tenure, so wage_cents = 100β̂0 + 100β̂1 educ + 100β̂2 tenure.
What if we want to measure tenure in months? tenure_years = (1/12) tenure_months.
Substitute: wage = β̂0 + β̂1 educ + β̂2 (1/12) tenure_months = β̂0 + β̂1 educ + (β̂2/12) tenure_months.
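The rescaling result can be verified numerically: multiplying the dependent variable by 100 multiplies every OLS coefficient by exactly 100. A small sketch under assumed, simulated data (the data-generating numbers here are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: wage in dollars as a function of education (years)
educ = rng.uniform(10, 18, size=200)
wage_dollars = 5.0 + 1.5 * educ + rng.normal(0, 2, size=200)
wage_cents = 100 * wage_dollars  # same variable, different units

def ols(x, y):
    """Return (intercept, slope) from the simple-regression formulas."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0_d, b1_d = ols(educ, wage_dollars)
b0_c, b1_c = ols(educ, wage_cents)

# Rescaling y by 100 rescales every coefficient by 100
print(np.isclose(b0_c, 100 * b0_d), np.isclose(b1_c, 100 * b1_d))
```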

OLS Handling non-linearity. Not everything is linear in real life. Is the relationship between education and wage linear? No. Which has the higher benefit: 3 more years after 6th grade, or 3 more years after undergrad? Common ways to easily handle non-linearity: 1. take logs of the dependent variable; 2. take logs of the independent variable; 3. take logs of both.

OLS Wage in Levels. [Figure: scatter plot of wage in levels (roughly 0 to 3000) against educ (10 to 18).]

OLS Wage in logs. [Figure: scatter plot of log(wage) (roughly 5.0 to 8.0) against educ (10 to 18).]

OLS Handling non-linearity. If data are in levels: wage-hat = β̂0 + β̂1 educ. How do we interpret β̂1? Totally differentiate and simplify: Δwage = β̂1 Δeduc, so Δwage/Δeduc = β̂1. Interpret β̂1 in the following results: wage-hat = 15,432 + 1,324 educ. For each additional year of education, you earn $1,324 more.

OLS Handling non-linearity. If wage is in logs: log(wage)-hat = β̂0 + β̂1 educ. How do we interpret β̂1? Totally differentiate: Δwage/wage = β̂1 Δeduc. Simplify: 100 · (Δwage/wage) [% change] = 100β̂1 · Δeduc [unit change]. Interpret β̂1 in the following results: log(wage)-hat = 9.64 + 0.08 educ. A one-year increase in education yields an 8% increase in wage.
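One caveat worth making explicit: reading 100·β̂1 as a percent change is an approximation that works well for small coefficients. The exact implied percent change from a one-unit increase in educ is 100·(e^β̂1 − 1). A quick check using the coefficients from the slide:

```python
import math

# Log-level fit from the slide: log(wage)-hat = 9.64 + 0.08 * educ
b0_hat, b1_hat = 9.64, 0.08

# Predicted wages (in levels) at 12 and 13 years of education
w12 = math.exp(b0_hat + b1_hat * 12)
w13 = math.exp(b0_hat + b1_hat * 13)

# Exact percent change vs. the 100*beta1 = 8% approximation
exact_pct = 100 * (w13 / w12 - 1)
print(round(exact_pct, 2))  # slightly above 8, since e^0.08 - 1 > 0.08
```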

OLS Handling non-linearity. If wage and educ are both in logs: log(wage)-hat = β̂0 + β̂1 log(educ). How do we interpret β̂1? Totally differentiate and simplify: Δwage/wage = β̂1 Δeduc/educ, i.e. 100 · (Δwage/wage) [% change] = β̂1 · 100 · (Δeduc/educ) [% change]. Interpret β̂1 in the following results: log(wage)-hat = 9.64 + 0.5 log(educ). A 1% increase in education yields a 0.5% increase in wage.

Simple Regression Model Biased or unbiased. When is β̂1 a good estimate, where "good" is defined as unbiased? By unbiased, E(β̂1 | x) = β1: the β̂1's are centered around β1. β̂1 is unbiased if the following assumptions hold:
1. Linear in parameters: y_i = β0 + β1 x_i + u_i.
2. Random sample of size n: {(x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_n, y_n)}.
3. Zero conditional mean: E(u|x) = 0.
4. Sample variation in x: σ_x² > 0.

Simple Regression Model Biased or unbiased. Simple example. Suppose that the population is characterized by y = 3 − 2x_1 + u, so β0 = 3 and β1 = −2; u is distributed normal with mean 0 and sd 3; the x's are between 0.01 and 10, spaced evenly; 1000 people. Estimate using y = β0 + β1 x_1 + u. Plot y on x.

[Figure: scatter plot of y on x for the simulated population; y falls from roughly 10 to −25 as x runs from 0 to 10.]

Simple Regression Model Biased or unbiased. Suppose that we sample 30 people from the population and estimate β1 via OLS. First sample: β̂1 = −1.951. Second sample: β̂1 = −1.890. Third sample: β̂1 = −1.559. They're all wrong. Is this a problem? Keep sampling! Sample 1000 times and plot a histogram of the estimates of β̂1. How does the distribution of estimates compare to −2?
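The sampling experiment on this slide can be sketched as follows. The population follows the slide's recipe; the seed and the sampling-without-replacement details are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Population from the slide: y = 3 - 2*x + u, u ~ Normal(0, sd=3),
# x evenly spaced between 0.01 and 10, 1000 people
x_pop = np.linspace(0.01, 10, 1000)
y_pop = 3 - 2 * x_pop + rng.normal(0, 3, size=1000)

def ols_slope(x, y):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Draw 1000 samples of 30 people each and re-estimate the slope every time
slopes = [
    ols_slope(x_pop[idx], y_pop[idx])
    for idx in (rng.choice(1000, size=30, replace=False) for _ in range(1000))
]

# Individual estimates are "wrong", but their distribution centers near -2
print(round(float(np.mean(slopes)), 2))
```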

[Figure: histogram of the 1000 estimates of β̂1, centered near −2; horizontal axis roughly −2.5 to −1.5.]

OLS - Variance Basics. If assumptions 1-4 hold, β̂1 is centered around β1. But central tendency says nothing about dispersion: we are also interested in estimating Var(β̂1). From the previous histogram, there is variance in the estimate β̂1. Is the estimate of β1 precise/reliable? Assumption 5 - Homoskedastic errors: Var[u|x] = σ². The variance of the errors is common across x. Assumptions 1-5 are called the "Gauss-Markov assumptions". If Var[u|x] ≠ Var[u], the errors are heteroskedastic.

[Figure: scatter plot of y on x with a constant spread of y across x (homoskedastic errors); y ranges from roughly −25 to 10.]

[Figure: scatter plot of y on x with a spread of y that changes across x (heteroskedastic errors); y ranges from roughly −80 to 40.]

OLS - Variance Estimate Variance. Variance of the slope parameter:
Var(β̂1) = σ² / Σ_{i=1}^n (x_i − µ_x)²
What do I need for these variance estimates? An estimate of σ²:
σ̂² = (1/(n − 2)) Σ_{i=1}^n û_i²
Why n − 2? Estimating σ² requires first estimating β0 and β1.

OLS Estimate Variance. Standard error of β̂1:
se(β̂1) = sqrt( σ̂² / Σ_{i=1}^n (x_i − µ_x)² )
This is the dispersion of β̂1 around β1, on the same scale as β̂1. How does σ affect the precision of our estimates? Why? Higher σ yields higher standard errors (lower precision): with higher σ there is more noise, and thus it is harder to get a precise estimate of β1. Using the original example, compare the following two situations: u distributed normal with mean 0 and sd 10, versus u distributed normal with mean 0 and sd 3.
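Putting the variance pieces together, a minimal sketch that computes σ̂² and se(β̂1) and confirms that noisier errors yield larger standard errors (simulated data; the seed is my own assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

def ols_se(x, y):
    """Slope estimate and its standard error under homoskedasticity."""
    n = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    u_hat = y - (b0 + b1 * x)
    sigma2_hat = np.sum(u_hat ** 2) / (n - 2)  # n-2: two betas were estimated
    se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))
    return b1, se_b1

# Same design as the slide's example, two error scales: sd(u)=3 vs sd(u)=10
x = np.linspace(0.01, 10, 1000)
_, se_low = ols_se(x, 3 - 2 * x + rng.normal(0, 3, size=1000))
_, se_high = ols_se(x, 3 - 2 * x + rng.normal(0, 10, size=1000))

print(se_low < se_high)  # noisier errors -> larger standard error
```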

[Figure: histogram of β̂1 estimates with SD(u) = 10; the estimates are widely dispersed, spanning roughly −5 to 0.]

[Figure: the same histogram with the SD(u) = 3 estimates added for comparison; these are far more tightly concentrated around −2.]