Econ 1123: Section 2. Review. Binary Regressors. Bivariate. Regression. Omitted Variable Bias

Contact Information
Elena Llaudet
September 16, 2010
Sections are voluntary.
My office hours are Thursdays 5pm-7pm in Littauer Mezzanine 34-36 (note room change).
You can email administrative questions to me at ellaudet@gmail.com. If you have substantive questions, please come to my office hours.
Any substantial STATA-related questions should be sent to the STATA TF: Shai (shaiber@gmail.com).

Grade
Problem sets: 30%
Midterm (October 14): 25%
Final (TBD): 45%

Problem Sets
Problem sets are graded from 0-10.
-5 points if handed in before solutions are posted.
-10 points if handed in after solutions are posted.
The lowest grade will be dropped.
Problem set 8 cannot be dropped and counts double.
In both the midterm and the final, you are allowed a cheat-sheet.
To complete the problem sets, we encourage you to work in small groups (maximum of 3). Each person, however, must submit their own write-up.
You must submit a hard copy of each problem set in class (usually on Tuesdays).
Make sure to indicate the name of your TF and the names of your collaborators at the beginning of the problem set.
Also make sure to attach your .do file at the end of the problem set.

General Information
Problem set 1 has been graded (on a 0-10 scale). Problem sets that are not picked up in section will be left in a box in Littauer 200A.
The solutions should already be posted on the course website. I highly encourage you to take a look at them.
I post the handouts that I use in section on the website (under Section Pages/Elena's Sections). The handouts on the web contain the answers to the questions raised in section.

Review of Last Week Based on Common Mistakes in Problem Set 1
A few general comments that apply to problem sets as well as exams:
Make sure that you are answering the question. Try to say no more and no less.
In your answers, round numbers to two or three decimals.
Pay attention to the signs. An effect of -3 is not the same as an effect of +3.
If at all possible, make it easy for us to grade your answers (e.g., good handwriting if not typing the answers, don't cram your answers, ...).
Do not forget to mention the units in the interpretation of coefficients.

The real-world magnitude of effects: $SE(T\hat{\beta}) = T \cdot SE(\hat{\beta})$

Meaning of Heteroskedasticity/Homoskedasticity
What do we learn about the assumption of homoskedasticity if the standard errors of the coefficients change significantly depending on whether we run the regression with robust standard errors or not? Nothing.

Meaning of RMSE

Remember that with one binary/dummy regressor the $\hat{\beta}$ has a special interpretation.

$\hat{\beta}$s of Binary Regressors
Example from last week's section slides:
$\text{Test Scores}_i = \beta_0 + \beta_1 \text{Private School}_i + u_i$
$\hat{\beta}_0 = \hat{E}[\text{Test Scores}_i \mid \text{Private School}_i = 0]$
$\hat{\beta}_1 = \hat{E}[\text{Test Scores}_i \mid \text{Private School}_i = 1] - \hat{E}[\text{Test Scores}_i \mid \text{Private School}_i = 0]$
Here $\hat{E}[\cdot \mid \cdot]$ denotes the sample average within the indicated group.

Generally: $Y_i = \beta_0 + \beta_1 D_i + u_i$, where $D_i$ is a dummy variable.
$\hat{\beta}_0 = \hat{E}[Y \mid D = 0]$
$\hat{\beta}_1 = \hat{E}[Y \mid D = 1] - \hat{E}[Y \mid D = 0]$

What if you try to run $TS_i = \beta_0 + \beta_1 \text{Public School}_i + \beta_2 \text{Private School}_i + u_i$?
You will face a problem of perfect multicollinearity (also known as the dummy variable trap). For all observations:
$\text{Public School}_i + \text{Private School}_i = 1 = \text{constant term}$
So, one of two things will happen. Either you won't be able to run the regression, or the software will automatically drop one of the variables from the regression.

An Example
What would be the interpretation of the coefficients if you instead run $TS_i = \beta_0 \text{Public School}_i + \beta_1 \text{Private School}_i + u_i$ (no constant term)?
$\hat{\beta}_0 = \hat{E}[\text{Test Scores}_i \mid \text{Public School}_i = 1]$
$\hat{\beta}_1 = \hat{E}[\text{Test Scores}_i \mid \text{Private School}_i = 1]$

Generally: $Y_i = \beta_0 D1_i + \beta_1 D2_i + u_i$, where $D1_i$ and $D2_i$ are dummy variables such that $D1_i + D2_i = 1$.
$\hat{\beta}_0 = \hat{E}[Y \mid D1 = 1]$
$\hat{\beta}_1 = \hat{E}[Y \mid D2 = 1]$

Estimated regression with a constant (standard errors in parentheses):
$\text{Test Scores}_i = \underset{(.4)}{4.9} + \underset{(.8)}{2}\,\text{Private School}_i + \hat{u}_i$
[Figure: scores plotted against private (0 to 1), with the linear prediction.]
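The group-mean interpretation of the coefficients on a dummy regressor can be checked numerically. The following is a minimal Python sketch, not part of the original handout: the data are made up for illustration, and `ols_simple` is a hypothetical helper that applies the textbook one-regressor OLS formulas.

```python
from statistics import mean


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS fit y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data (assumption): 1 = private school, 0 = public school.
private = [0, 0, 0, 0, 1, 1, 1]
scores = [4.0, 5.0, 5.5, 5.5, 6.0, 7.0, 8.0]

b0, b1 = ols_simple(private, scores)

# Group averages computed directly from the data.
mean_public = mean(s for d, s in zip(private, scores) if d == 0)
mean_private = mean(s for d, s in zip(private, scores) if d == 1)

print("intercept:", b0, "public mean:", mean_public)
print("slope:", b1, "difference in means:", mean_private - mean_public)
```

With any data set of this shape, the intercept should match the average of the group with the dummy equal to 0, and the slope should match the difference between the two group averages, up to floating-point rounding.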

How can we interpret the distance between the data and the line?
There are factors that affect students' performance other than the type of school attended by the students. The errors of the regression represent all of the omitted factors that affect test scores.
Does the existence of these error terms mean that the regression suffers from omitted variable bias? No.

Omitted variable bias (a violation of the first OLS assumption) occurs when one (or more) of the omitted variables in the regression:
(a) is correlated with the independent variable of interest, AND
(b) is a determinant of our dependent variable.

Consequences: The OLS estimator is biased ($E(\hat{\beta}) \neq \beta$) and inconsistent ($\hat{\beta}$ does not converge to $\beta$):
$\hat{\beta} \xrightarrow{p} \beta + \rho_{X,u} \frac{\sigma_u}{\sigma_X}$, where $\rho_{X,u}$ is the correlation between $X$ and $u$.

How can we identify the sign or direction of the bias?
Suppose $Z$ is the omitted variable, a determinant of $Y$ that is correlated with $X$. Then:
Sign of the OVB = Sign of corr($Z$, $X$) × Sign of corr($Z$, $Y$)

Omitted Variable Bias in Our Example
Going back to our example, we can think of many variables that are both related to the probability of attending a private school and determinants of students' performance.
The socio-economic status of a student, for example, will affect the likelihood that he or she attends a private school, and it will also directly affect the performance of the student in that school.
[Diagram: $\text{SES}_i$ affects both $\text{Private School}_i$ and $\text{Test Scores}_i$.]
If this is true, then we say that $\text{SES}_i$ is an omitted variable and the coefficient on private school suffers from omitted variable bias.

What would be the expected sign of the bias in this case?
We would expect $\text{SES}_i$ and $\text{Private School}_i$ to be positively correlated. We would also expect $\text{SES}_i$ and $\text{Test Scores}_i$ to be positively correlated. So (+) × (+) = (+).
Does this mean that omitting $\text{SES}_i$ from our regression is making $\hat{\beta}_1$ too small or too large? Too large.

What can we do to solve the problem? If the omitted variable is measurable, we can eliminate the bias by including it in the regression.
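To see the direction of the bias at work, here is a small Python sketch under stated assumptions: the eight observations are invented, and `s` is a hypothetical helper for sums of cross-products. It uses the standard in-sample OLS identity that the slope from the regression omitting Z equals the slope from the regression including Z, plus the coefficient on Z times the slope from regressing Z on X.

```python
from statistics import mean


def s(a, b):
    """Sum of cross-products of deviations from the mean."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


# Hypothetical data (assumption): SES is higher among private-school students.
private = [0, 0, 0, 0, 1, 1, 1, 1]
ses = [1, 2, 2, 3, 3, 4, 4, 5]
noise = [0.2, -0.1, 0.1, -0.2, 0.1, -0.2, 0.2, -0.1]
scores = [3 + 0.5 * p + 1.0 * z + e for p, z, e in zip(private, ses, noise)]

# Short regression: scores on private only (SES omitted).
short_slope = s(private, scores) / s(private, private)

# Long regression: scores on private and SES (two-regressor OLS formulas).
det = s(private, private) * s(ses, ses) - s(private, ses) ** 2
b_private = (s(ses, ses) * s(private, scores)
             - s(private, ses) * s(ses, scores)) / det
b_ses = (s(private, private) * s(ses, scores)
         - s(private, ses) * s(private, scores)) / det

# Slope from regressing the omitted variable (SES) on private.
delta = s(private, ses) / s(private, private)

print("short:", short_slope, "long:", b_private, "bias:", b_ses * delta)
```

In this made-up sample both SES effects are positive, so the short-regression coefficient on private comes out larger than the one that controls for SES, which is the "(+) × (+) = (+)" case described above.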

An Example: Results
Estimated model: $\text{Test Scores}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Private School}_i + \hat{\beta}_2 \text{SES}_i + \hat{u}_i$
Population linear model: $\text{Test Scores}_i = \beta_0 + \beta_1 \text{Private School}_i + \beta_2 \text{SES}_i + u_i$
We have a data set of 61 observations.

What is $\hat{\beta}_0$ and what is its interpretation?
$\hat{\beta}_0 = 2.8$. It is the intercept of the regression. In this case it does not have a substantive interpretation (SES never takes a value of 0).

What is $\hat{\beta}_1$ and what is its interpretation?
$\hat{\beta}_1 = .7$. It indicates the change in test scores associated with attending a private school (instead of a public school), while holding SES constant (or, ceteris paribus).
Is this big? This is about 1/3 of a standard deviation of test scores. (Test scores have a standard deviation of 2.09 points.)
Is it statistically significant at the 5% significance level? No. $\hat{\beta}_1 / SE(\hat{\beta}_1) = .7/.49 < 1.96$, so we cannot reject the null hypothesis that the coefficient on $\text{Private School}_i$ is 0 at the 5% significance level.

How does this $\hat{\beta}_1$ compare to the $\hat{\beta}_1$ that we got with the regression that did not control for $\text{SES}_i$ (e.g., is it smaller or larger; is it more or less statistically significant)?
The $\hat{\beta}_1$ estimated in the model that controls for $\text{SES}_i$ is smaller, which is what we would expect, since we expected that omitting $\text{SES}_i$ from the regression was positively biasing the coefficient on $\text{Private School}_i$.
While in the regression without SES $\text{Private School}_i$ was statistically significant at the 5% level, here it is not.
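The arithmetic behind these answers can be reproduced in a few lines. This is a minimal Python sketch rather than STATA output; it assumes that the one-regressor estimate from the earlier slide is a coefficient of 2 with a standard error of .8, and `t_stat` is an illustrative name, not something from the handout.

```python
def t_stat(beta_hat, se, null_value=0.0):
    """t-statistic for testing H0: beta = null_value."""
    return (beta_hat - null_value) / se


CRITICAL_5PCT = 1.96   # two-sided 5% critical value, large-sample normal
SD_TEST_SCORES = 2.09  # standard deviation of test scores, from the handout

# Coefficient on private school without controlling for SES
# (assumption: coefficient 2 and standard error .8 as read from the earlier slide).
t_without_ses = t_stat(2.0, 0.8)   # about 2.5

# Coefficient on private school when SES is included.
t_with_ses = t_stat(0.7, 0.49)     # about 1.43

# Size of the controlled effect in standard deviations of test scores.
effect_in_sd = 0.7 / SD_TEST_SCORES  # about 0.33

print("significant without SES:", abs(t_without_ses) > CRITICAL_5PCT)
print("significant with SES:", abs(t_with_ses) > CRITICAL_5PCT)
print("effect in SD units:", round(effect_in_sd, 2))
```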

What is $\hat{\beta}_2$ and what is its interpretation?
$\hat{\beta}_2 = 1.3$. It indicates the change in test scores associated with a one-unit change in the socio-economic status of the student, while holding the type of school attended constant (or, ceteris paribus).
Is this big? This is more than 1/2 of a standard deviation of test scores. (Test scores have a standard deviation of 2.09 points.)
Is it statistically significant at the 5% significance level? Yes. $\hat{\beta}_2 / SE(\hat{\beta}_2) = 1.3/.3 > 1.96$, so we reject the null hypothesis that the coefficient on $\text{SES}_i$ is 0 at the 5% significance level.

Suppose we want to test:
$H_0: \beta_1 = \beta_2 = 0$
$H_A: \beta_1 \neq 0 \text{ or } \beta_2 \neq 0$
How shall we interpret this test? The null hypothesis is that neither of our two independent variables matters (i.e., that neither of them is associated with changes in test scores).
To test joint hypotheses, we use the F-test. In large samples the F-statistic is distributed as $F_{q,\infty}$, which can be approximated with $\chi^2_q / q$, where $q$ is the number of restrictions under the null.
What is $q$ in this case? $q = 2$.

Using STATA to Test Joint Hypotheses
[Figure: F-test in the STATA output]
Shall we reject the null? Yes. The p-value (the probability, under the null, of obtaining an F-statistic greater than the one actually obtained) is approximately 0.0000, so we reject the null hypothesis at all of the conventional levels of significance.
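For q = 2 the large-sample approximation has a convenient closed form: 2F is approximately chi-squared with 2 degrees of freedom, whose upper-tail probability at x is exp(-x/2), so the p-value is exp(-F). The Python sketch below illustrates this; the F-statistic of 12 is a hypothetical value (the handout's STATA output is not reproduced here), and `f_pvalue_q2` is an illustrative name.

```python
import math


def f_pvalue_q2(f_stat):
    """Large-sample p-value of an F-statistic with q = 2 restrictions.

    Under the null, 2 * F is approximately chi-squared with 2 degrees of
    freedom, and P(chi2_2 > x) = exp(-x / 2), so the p-value is exp(-F).
    """
    return math.exp(-f_stat)


# 5% critical value of F(2, infinity): solve exp(-c) = 0.05.
critical_5pct = -math.log(0.05)  # roughly 3.00

# Hypothetical F-statistic (assumption, not taken from the handout).
hypothetical_f = 12.0
p_value = f_pvalue_q2(hypothetical_f)

print("5% critical value:", round(critical_5pct, 2))
print("p-value to four decimals:", f"{p_value:.4f}")
```

A p-value this small is what a four-decimal display would show as 0.0000, which is why the null is rejected at every conventional significance level.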

Joint test vs. Individual test
Is it ever possible for two variables to be jointly significant but individually insignificant? Yes, if the variables are correlated (if there is imperfect multicollinearity between the two variables).
Is it ever possible for two variables to be individually significant but jointly insignificant? It is very unusual, but not strictly impossible: the joint F-test and the individual t-tests have different rejection regions, so their conclusions can occasionally disagree.

To sort the data, use the command sort [NAME OF VARIABLE TO USE FOR THE SORTING]
To list the data, use the command list [NAME OF VARIABLES TO BE LISTED]
To drop an observation, use the command drop if [CONDITION TO BE USED FOR THE DROPPING: EXAMPLE: X 3]
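The first case (jointly significant but individually insignificant) can be illustrated with the textbook formula for the F-statistic with q = 2 restrictions, written in terms of the two t-statistics and the estimated correlation between them. This is a Python sketch with hypothetical inputs: the t-statistics of 1.5 and the correlation of -0.9 are invented for illustration, and `f_stat_two_restrictions` is not a name from the handout.

```python
def f_stat_two_restrictions(t1, t2, rho):
    """F-statistic for H0: beta1 = beta2 = 0 from the two t-statistics.

    rho is the estimated correlation between the two t-statistics.
    """
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)


T_CRITICAL_5PCT = 1.96  # two-sided 5% critical value for a single t-test
F_CRITICAL_5PCT = 3.00  # 5% critical value of F(2, infinity)

# Hypothetical values (assumption): each coefficient is individually
# insignificant, and the t-statistics are strongly negatively correlated,
# as tends to happen when the two regressors are positively correlated.
t1, t2, rho = 1.5, 1.5, -0.9

f_stat = f_stat_two_restrictions(t1, t2, rho)

print("t1 significant:", abs(t1) > T_CRITICAL_5PCT)
print("t2 significant:", abs(t2) > T_CRITICAL_5PCT)
print("jointly significant:", f_stat > F_CRITICAL_5PCT)
```

The data can pin down the combined contribution of two correlated regressors much more precisely than it pins down either coefficient alone, which is why the joint test can reject while both individual tests do not.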