CIVL 7012/8012. Simple Linear Regression. Lecture 3

Similar documents
Review of the General Linear Model

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Multiple Regression and Model Building Lecture 20 1 May 2006 R. Ryznar

LECTURE 5. Introduction to Econometrics. Hypothesis testing

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

Multiple Linear Regression CIVL 7012/8012

Econometrics -- Final Exam (Sample)

CHAPTER 4 & 5 Linear Regression with One Regressor. Kazu Matsuda IBEC PHBU 430 Econometrics

Making sense of Econometrics: Basics

Statistics and Quantitative Analysis U4320

Econometrics I Lecture 7: Dummy Variables

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Models - Introduction

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

Chapter 3 Multiple Regression Complete Example

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Econometrics Review questions for exam

Statistics and Quantitative Analysis U4320. Segment 10 Prof. Sharyn O Halloran

Lab 10 - Binary Variables

Binary Logistic Regression

ECON3150/4150 Spring 2015

ECONOMETRIC MODEL WITH QUALITATIVE VARIABLES

Chapter 7 Student Lecture Notes 7-1

Unless provided with information to the contrary, assume for each question below that the Classical Linear Model assumptions hold.

Inferences for Regression

ECON 482 / WH Hong Binary or Dummy Variables 1. Qualitative Information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Multiple Regression Analysis

Econometrics - 30C00200

Lectures 5 & 6: Hypothesis Testing

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

4. Nonlinear regression functions

Chapter 9: The Regression Model with Qualitative Information: Binary Variables (Dummies)

Multiple Regression: Chapter 13. July 24, 2015

Review of Multiple Regression

Applied Econometrics Lecture 1

Ch 7: Dummy (binary, indicator) variables

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Problem Set 10: Panel Data

Categorical Predictor Variables

ECON Interactions and Dummies

Multiple Regression. Peerapat Wongchaiwat, Ph.D.

Linear Regression. Junhui Qian. October 27, 2014

ECON 497: Lecture Notes 10 Page 1 of 1

Answer Key: Problem Set 6

The regression model with one fixed regressor cont d

Chapter 8 Heteroskedasticity

df=degrees of freedom = n - 1

Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur

Introduction to Econometrics. Heteroskedasticity

Contest Quiz 3. Question Sheet. In this quiz we will review concepts of linear regression covered in lecture 2.

Linear Regression Analysis for Survey Data. Professor Ron Fricker Naval Postgraduate School Monterey, California

Economics Introduction to Econometrics - Fall 2007 Final Exam - Answers

Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

ECON 4230 Intermediate Econometric Theory Exam

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

Non-parametric methods

Statistical methods for Education Economics

Chapter 14 Student Lecture Notes 14-1

Final Exam - Solutions

General Linear Model (Chapter 4)

ECNS 561 Multiple Regression Analysis

ECON 497: Lecture 4 Page 1 of 1

Lecture 16 - Correlation and Regression

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58

Reliability of inference (1 of 2 lectures)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

Lecture 6: Linear Regression

MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES. Business Statistics

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences

Rockefeller College University at Albany

Answer Key: Problem Set 5

Lab 07 Introduction to Econometrics

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

Applied Statistics and Econometrics

Linear Regression Model. Badr Missaoui

Chapter 13. Multiple Regression and Model Building

Sample Problems. Note: If you find the following statements true, you should briefly prove them. If you find them false, you should correct them.

Data Analysis 1 LINEAR REGRESSION. Chapter 03

Outline. Nature of the Problem. Nature of the Problem. Basic Econometrics in Transportation. Autocorrelation

Lecture (chapter 13): Association between variables measured at the interval-ratio level

Introduction To Logistic Regression

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Heteroscedasticity 1

Ref.: Spring SOS3003 Applied data analysis for social science Lecture note

Panel data can be defined as data that are collected as a cross section but then they are observed periodically.

Graduate Econometrics Lecture 4: Heteroskedasticity

REVIEW 8/2/2017 陈芳华东师大英语系

In order to carry out a study on employees wages, a company collects information from its 500 employees 1 as follows:

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

The Simple Regression Model. Simple Regression Model 1

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

Chi-square tests. Unit 6: Simple Linear Regression Lecture 1: Introduction to SLR. Statistics 101. Poverty vs. HS graduate rate

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler

Applied Statistics and Econometrics

ISQS 5349 Final Exam, Spring 2017.

Gov 2000: 9. Regression with Two Independent Variables

Unit 7: Multiple linear regression 1. Introduction to multiple linear regression

Recitation 1: Regression Review. Christina Patterson

Transcription:

CIVL 7012/8012 Simple Linear Regression Lecture 3

OLS assumptions - 1 Model of population Sample estimation (best-fit line) y = β 0 + β 1 x + ε y = b 0 + b 1 x We want E b 1 = β 1 ---> (1) Meaning we want b 1 to be unbiased 5 assumptions of OLS to ensure unbiasedness of our slope parameter.

OLS assumptions - 2 1- Linear in parameters: in OLS, we can not have y = β 0 + β 2 1 x + ε (not linear) we could have y = β 0 + β 1 x 2 + ε (linear-in-parameters) 2- Random sampling 3- Zero conditional mean E(ε/x) = 0 ---> (2) Error is random with an expected average value of 0 given the IV (x).

OLS assumptions - 3 3- Zero conditional mean (cont.) Before we state how ε and x are related, we can make one assumption about ε As long as intercept β 0 is included in the equation, nothing is lost by assuming that the average value of ε in the population is zero. i.e. E(ε) = 0 ---> (3) (remember from SLR lec.1 errors above and below the best-fit line averaged 0). Eq. (3) suggests that the distribution of unobserved factors in the population is zero. Combining equations 2, and 3, we get that: E(ε/x) = E(ε) = 0 ---> (4)

OLS assumptions - 4 3- Zero conditional mean (cont.) E(ε/x) = E(ε) = 0 ---> (4) Equation (4) suggests that average value of ε does not depend on the value of x. If equation (4) holds true, then we can say that ε is mean independent of x When equations (3) and (4) are met, we can state the zero conditional mean assumption is met.

OLS assumptions - 5 3- Zero conditional mean (cont.) E(ε/x) = E(ε) = 0 ---> (4) The error is random with an expected average value of 0 given the IV Example (1): predict wage based on individual s height. So, for a certain height h i = 70 inches, we will have different predicted values of wages (individuals with different wages but same height).. Individuals 1,2,3 ---> wages = 40k, 42k, 38k. Why do we have these differences in wages? Because of other factors that are not known to the analyst, but combined in the ε term (factors such as: more/less talented, productive, etc.)..

OLS assumptions - 6 3- Zero conditional mean (cont.) E(ε/x) = E(ε) = 0 ---> (4) Example (2): In an effort to determine income as a function of education, we can state that Income = β 0 + β 1 (education) + ε Let us say ε is same as innate ability Let E(ability/8) represents average ability for the group of the population with 8 years of education. Similarly, let E(ability/16) represents average ability for the group of the population with 16 years of education. As per equation (4): E(ability/8) = E(ability/16) = 0

OLS assumptions - 7 3- Zero conditional mean (cont.) E(ε/x) = E(ε) = 0 ---> (4) As per equation (4): E(ability/8) = E(ability/16) = 0 As we can not observe innate ability, we have no way of knowing whether or not average ability is same for all education levels. So for all unobserved factors we consider that E(ε/x) = 0

OLS assumptions - 8 4- Sample variation We need to have different y s and different x s. We can not just have 1 observation and say multiply it by 100 and claim 100 observation observations need to be unique and have variations in both y and x. 5- Homoscedasticity: homo=same, scedasticity=variance var ε x = σ2 ---> constant variance. So, this means that the variance in the y variable is constant across the range of the x variable.

OLS assumptions 9 Homoscedasticity The variability of the unobserved influences (error) does not dependent on the value of the explanatory variable.

OLS assumptions 10 Heteroscedasticity The variance of the unobserved determinants of y increases with the change in x.

OLS assumptions - 11

OLS assumptions - 12 This is R generated plot for the ozone/temp example from SLR lecture 2. Simply type in the R console: plot(m1) for a model named m1 in R

Variable types in regression We first discussed SLR and did an example with 2 continuous variables x and y. What happens if your IV is not continuous? If your IV is categorical? Binary variable (takes only 2 values). 1 / 0. Yes / No. Male / Female. Nominal categorical (2 or more categories). Typically with no numeric values. The blood type of a person: A, B, AB or O. The state that a person lives in. So there is flexibility in the choices of the types of variables.

Example binary variable Example: Is height associated with gender? Two variable: x = IV with two levels (male, female) y = DV- continuous.(height). This IV has no values. Does our model change? y = b 0 + b 1 x NO If so, then all of the 5 OLS assumptions still hold and we can use SLR. Let s plot our variables

Example binary variable At what value does the line cross for males? And for females? The average What does the slope now represents? The difference in average y moving from males to females (between genders). Note: before the slope represented a change of 1 unit in x, now it represents the average from males to females.

Slope interpretation of binary SLR How does R handle categorical variables? When x = 0 (males), the model becomes y = β 0, so the intercept is the average of the males. When x = 1 (females), the model becomes y = β 0 + β 1, so the sum of the parameters is the average of the females. So, my β 1 now is the difference between ave. of females avg. of males = β 0 + β 1 β 0 = β 1 If betas were similar, there is no difference in height between male and female.

R - SLR - binary variable How does R handle categorical variables? Male=0 is the reference case β 0 = 1750.561 is the y-intercept at male = 0 β 1 = 150.952 is the coefficient for female = 1 β 1 is representing the difference of females relative to males (the difference). As we are moving from males to females, the average is decreasing by -150.952 We can generally conclude that female are less in height compared to males. But more preciously:

R - SLR - binary variable How does R handle categorical variables? Order matters! β 1 = always nonreference level minus reference level. There is a significate difference between females and males. There is a significate difference in height between those who ate females and those who are males (p value = 2e 16 )

Hypothesis test with β 1 to conduct a t-test on slope β 1 : Specify hypothesis: H 0 : β 1 = 0 vs. H 1 : β 1 0 at α = 0.05 (similar to previous class). If we can reject our null, this mean our slope is different than 0 and that there is a difference between both categories of gender. Luckily we have software that can do all that for us. Conclusion: We can conclude that there is a significate linear association between gender and height Interpretation (proper): There is a significant difference in the average height for females compared to males. Females were 150.952 mm shorter in height than males.

What happens when we have more than two categories to a variable? R will split the variable into smaller binary variables. For example, if a categorical variable X has 3 levels, some software will create 3 binary variables x 1, x 2, x 3 representing the categories of X. This technique of mimicking Xrepresents level 1 vs level 3 and level 2 vs level 3. Variables created with this procedure are called Dummy Variables. Note that both levels 1, 2 are being compared to level 3, thus level 3 is our reference level. Other types of software are smart enough to recognize the categorical levels of a variable. And other times, you have to code the variable so that the software can understand the 3 levels.