Simple Regression Model (Assumptions)

Similar documents
Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Making sense of Econometrics: Basics

Multiple Linear Regression CIVL 7012/8012

Business Statistics. Lecture 9: Simple Regression

Inference with Simple Regression

The Simple Linear Regression Model

SIMULTANEOUS EQUATION MODEL

The cover page of the Encyclopedia of Health Economics (2014) Introduction to Econometric Application in Health Economics

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation

Econometrics. 7) Endogeneity

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton.

Intermediate Econometrics

LECTURE 2: SIMPLE REGRESSION I

Business Statistics. Lecture 10: Course Review

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

Econometrics I Lecture 3: The Simple Linear Regression Model

WISE International Masters

LECTURE 11. Introduction to Econometrics. Autocorrelation

Lecture-1: Introduction to Econometrics

Applied Microeconometrics (L5): Panel Data-Basics

Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

Business Statistics. Lecture 10: Correlation and Linear Regression


ECON 4230 Intermediate Econometric Theory Exam

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler

EMERGING MARKETS - Lecture 2: Methodology refresher

ECON3150/4150 Spring 2015

Relationships between variables. Association Examples: Smoking is associated with heart disease. Weight is associated with height.

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations

27. SIMPLE LINEAR REGRESSION II

REED TUTORIALS (Pty) LTD ECS3706 EXAM PACK

Regression line. Regression. Regression line. Slope intercept form review 9/16/09

Econometrics. 9) Heteroscedasticity and autocorrelation

Chapter 5 Friday, May 21st

Regression Analysis Tutorial 34 LECTURE / DISCUSSION. Statistical Properties of OLS

Dealing With Endogeneity

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model

Research Methods in Political Science I

Gov 2000: 9. Regression with Two Independent Variables

1 Correlation and Inference from Regression

Rockefeller College University at Albany

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM

Applied Quantitative Methods II

Simultaneous Equation Models

PBAF 528 Week 8. B. Regression Residuals These properties have implications for the residuals of the regression.

Handout 12. Endogeneity & Simultaneous Equation Models

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like.

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

Job Training Partnership Act (JTPA)

Financial Econometrics

Greene, Econometric Analysis (7th ed, 2012)

ECNS 561 Multiple Regression Analysis

2 Prediction and Analysis of Variance

Econometrics - 30C00200

Motivation for multiple regression

13. Valuing Impacts from Observed Behavior: Slopes, Elasticities, and Data

Chapter 2: simple regression model

Making sense of Econometrics: Basics

CHAPTER 4: Forecasting by Regression

The regression model with one fixed regressor cont d

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

AMS 7 Correlation and Regression Lecture 8

Introduction to Regression

ECON Introductory Econometrics. Lecture 13: Internal and external validity

An explanation of Two Stage Least Squares

UNIVERSITY OF WARWICK. Summer Examinations 2015/16. Econometrics 1

ECONOMETRICS Introduction & First Principles

Heteroscedasticity 1

PhD/MA Econometrics Examination. January, 2015 PART A. (Answer any TWO from Part A)

6. Assessing studies based on multiple regression

Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur

08 Endogenous Right-Hand-Side Variables. Andrius Buteikis,

Midterm 2 - Solutions

Linear Models in Econometrics

The regression model with one stochastic regressor (part II)

Introduction to Econometrics. Heteroskedasticity

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

Handout 11: Measurement Error

Lectures 5 & 6: Hypothesis Testing

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Lecture 12. Functional form

Applied Regression Analysis

Correlation & Simple Regression

Analyzing Lines of Fit

Chapter 12 - Part I: Correlation Analysis

EC4051 Project and Introductory Econometrics

Topics. Estimation. Regression Through the Origin. Basic Econometrics in Transportation. Bivariate Regression Discussion

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Diagnostics of Linear Regression

[ ESS ESS ] / 2 [ ] / ,019.6 / Lab 10 Key. Regression Analysis: wage versus yrsed, ex

AUTOCORRELATION. Phung Thanh Binh

Types of economic data

Mgmt 469. Causality and Identification

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

Econometrics Summary Algebraic and Statistical Preliminaries

15.0 Linear Regression

Transcription:

Simple Regression Model (Assumptions) Lecture 18 Reading: Sections 18.1, 18., Logarithms in Regression Analysis with Asiaphoria, 19.6 19.8 (Optional: Normal probability plot pp. 607-8) 1 Height son, inches 80 75 70 65 60 Remember Regression? son_hat = 33.887 + 0.514*father n = 1078, R = 0.51, s_e =.437 60 65 70 75 Height father, inches (s.d. of residuals).437 inches: measures scatter about OLS line 0.51: 5.1% of variation in sons heights explained by variation in their fathers heights OLS intercept 33.887: No interpretation b/c father cannot be 0 inches tall OLS slope 0.514: For every extra 1 inch of father s height, son is on average about ½ inch taller (y-hat): Predicted y, given x; E.g. son of a 7 inch tall father predicted to be 70.895 inches (= 33.887 + 0.514*7) (residual): ; E.g. if is 70.895 but is 68.531, then residual is -.364 inches Descriptive & Inferential Statistics Chap. 6: Scatterplots, Association, and Correlation Chap. 7: Introduction to Linear Regression / Simple Reg.: Chaps. 18 & 19 (Inference for Regression & Understanding Regression Residuals) Multiple Reg.: Chaps. 0 & 1 (Multiple Regression & Building Multiple Regression Models) BUT, multiple regression is also a new way to describe data: descriptive statistics 3 Lecture 18, Page 1 of 9

Questions and Data: Still Important Which kind of question? Research question: What is causal effect of a change in X (e.g. match) on Y (e.g. amount given) Descriptive question: what are patterns in data (e.g. how does household spending on food vary with income?) Which kind of data? Observational or experimental Correlation causation is a cliché Instead, apply understanding of data and specific context to fully interpret quantitative results Cross-sectional, time series, or panel 4 Which kind of data are these? On this slide, what is review of Fall-term material and what is new material? Unsurprisingly, we find that higher university density is associated with higher GDP per capita levels. p. 9 Why associated instead of correlated? Also, does quote imply causality? Source: The Economic Impact of Universities: Evidence from Across the Globe a 016 NBER Working Paper http://www.nber.org/papers/w501 5 It is interesting that countries with more universities in 1960 generally had higher growth rates over 1960-000. p. 9 But is it a strong association? 6 Lecture 18, Page of 9

X-variable is defined as Log(1 + universities per million people) Logs can straighten curved scatter plot Plus one addresses countries with 0 universities Example 1: x-value of 1 is a country with 1.7 universities per million: ln(1 + 1.7) 1 E.g. 10 universities w/ pop. 5.8 million: 1.710/5.8 Example : x-value of 3 is a country with 19.09 universities per million: ln(1 + 19.09) 3 E.g. 5 universities w/ pop. 1.31 million: 19.095/1.31 University density is over 11 times bigger in Example, but x-value only 3 times as big (diminishing returns) 7 How to interpret b=.64? On average, countries with a university density (universities per million people) that is 10% higher have average years of schooling that is 0.64 years higher. 8 Frozen Pizza (p. 67) How does the volume of sales depend on the price of frozen pizza? What is the economic name of this relationship? Weekly data on price and quantity for each of four cities (1994 1996); 156 weeks Raw data: ch18_mcsp_frozen_pizza.csv Are these data observational? Cross-sectional, time series, or panel? 9 Lecture 18, Page 3 of 9

Observational Data Price Quantity Demanded Demand Shifters What are some demand shifters for frozen pizza? Why do these demand shifters affect the price the firm chooses to set? 10 Frozen Pizza: OLS 0.7697 0.594 18.1 5.8 Interpret the line? Is the OLS line an estimate of the demand equation? Quantity, 10,000s 10 8 6 4 Denver, 1994-96 n = 156 weeks..4.6.8 3 Price, $1s 11 Simple Linear Regression: One x-variable Model: : dependent var., regressand, y-var., LHS-var. : independent var., regressor, explanatory var., x- var., RHS-var. (i.e. right-hand side variable) : observation index (often or cross-sectional data; time series data; or panel data) : intercept (constant) parameter : slope parameter : error term, residual, disturbance 1 Lecture 18, Page 4 of 9

Error term in Line is expected value: E Error explains deviations from expectations includes all other factors that affect aside from Impossible to collect data on everything: some variables unobserved to the researcher It reflects reality: model cannot control for everything In the above graph is positive or negative? 13 Assumptions Tame Elusive Epsilon We cannot observe but we can observe Notice how many of the six assumptions are about the unobservable Some assumptions can be checked by analyzing (the statistic tied to the parameter, but some cannot In general, models make assumptions about unknowns For example, a model could assume the outcome of the role of a die follows a discrete Uniform distribution: i.e. it s fair with a 1/6 probability of each outcome {1,, 3, 4, 5, 6} 14 Six Assumptions Six assumptions of the linear regression model Your book gives only four assumptions: I suspect it leaves one out because it is fairly obvious I am sure the other one is left out because it is only required if you wish to make a causal interpretation To minimize confusion, number the extra two as 5 & 6 Econometrics addresses violations in assumptions ECO374H Applied Econometrics (Commerce) ECO375H & ECO475H Applied Econometrics I & II 15 Lecture 18, Page 5 of 9

Assumption #1 Regression equation is linear in the error and parameters; the variables (in boxes) are linearly related to each other Not assuming that what is in boxes is linear (so long as no nonlinear functions of parameters or nonlinear functions of the error) Example of a linear regression: Example of a linear regression: ln 16 Frozen Pizza (Chapter 18) Weekly Q, 10,000s 10 8 6 4 Frozen Pizza, Denver, 1994-96 Q-hat = 18.1 + -5.80*P n = 156, R = 0.59, s.e.(b) = 0.353..4.6.8 3 Weekly price, $1s e (residuals) 3 1 0-1 - 3 4 5 6 7 Q_hat Which violations can we see? 17 Natural Log Transformations ln(weekly Q, 10,000s) Frozen Pizza, Denver, 1994-96 ln(q)-hat = 4.095 + -.773*ln(P) n = 156, R = 0.631, s.e.(b) = 0.171.5 1.5 1.7.8.9 1 1.1 ln(weekly price, $1s) e (residuals).4. 0 -. -.4 3 4 5 6 7 ln(q)_hat 18 Lecture 18, Page 6 of 9

Assumption # No autocorrelation / no serial correlation:, 0if Common problem in time-series data E.g. higher than expected inflation today, likely high tomorrow Errors assumed not systematically related across observations residual (e) -10-5 0 5 10 residual (e) -3--1 0 1 Assumption # Holds No Autocorrelation 0 10 0 30 t Assumption # Violated Positive Autocorrelation 0 10 0 30 t 19 Assumption #3 Homoscedasticity:, 1,, Equal variance assumption Error is just as noisy for all values of x Violation is called heteroscedasticity Common problem in cross-sectional data Y1-1 0 1 3 Y 0 1 3 Homoscedasticity 0 4 6 8 10 X1 Heteroscedasticity 0 4 6 8 10 X 0 ln(weekly Q, 10,000s) Fix Assumption #1 issues before checking Assumption #3 Frozen Pizza, Denver, 1994-96 ln(q)-hat = 4.095 + -.773*ln(P) n = 156, R = 0.631, s.e.(b) = 0.171.5 1.5 1.7.8.9 1 1.1 ln(weekly price, $1s) e (residuals).4. 0 -. -.4 3 4 5 6 7 ln(q)_hat Heteroscedasticity unequal variance of the residuals is often a byproduct of a violation of the linearity assumption Is Denver pizza regression an example? Remember that Chapter 18 advises you to check the assumptions in order: start with the linearity assumption 1 Lecture 18, Page 7 of 9

Assumptions #4 & #5 Galton s data (Lec. 5) Assumptions 1-3 hold? Normality: is Normal is unobserved so check Error has mean zero: 0,1,, Constant term (i.e. or ) picks up any constant effects, not the error Density 75 70 65 60 Son, inches80 Galton's Heights, n = 1078..15.1.05 60 65 70 75 Father, inches n = 1078 0-10 -5 0 5 10 residuals (e) Graphical Summary Would the elements in the population (not shown) lie on the line?,,, Is a reflection of sampling error? Assumptions #3, #4, and #5 combined: ~ 0, 3 017 ON Public Sector Disclosure of 016 salaries for University of Waterloo employees Sex n Mean S.d. F 416 $139.74K $33.74K M 941 $155.36K $36.96K OLS Results: Salary-hat = 139.74 + 15.6*Male R = 0.0385, n = 1,357, = 36.006 Do Assumptions #1 - #5 hold? Salary (1,0000's CAN$) Salary and Salary-hat 400 300 00 100 400 300 00 100 Waterloo, 016 Salaries Male=0 Male=1 Waterloo, 016 Salaries 0..4.6.8 1 Male 4 Lecture 18, Page 8 of 9

Assumption #6 x uncorrelated w/ error:, 0 Exogeneity: x variable(s) unrelated with error Dosage is exogenous: Experimental data can est. causal effect: Endogeneity: x variable(s) related with error With observational data, lurking/unobserved/omitted/ confounding variables mean x and error are related Price of pizza is endogenous: Endogeneity bias means: 5 Short-Hand Assumptions 1) Linear relationship between variables (possibly non-linearly transformed) ) No correlation amongst errors (no autocorrelation for time-series data) 3) Homoscedasticity (single variance) of errors 4) Normally distributed errors 5) Constant included (error has mean 0) 6) No relationship between x and error 6 Lecture 18, Page 9 of 9