Applied Statistics and Econometrics


Applied Statistics and Econometrics, Lecture 6
Saul Lach, September 2017

Outline of Lecture 6
1 Omitted variable bias (SW 6.1)
2 Multiple regression model (SW 6.2, 6.3)
3 Measures of fit (SW 6.4)
4 The Least Squares Assumptions (SW 6.5)
5 Sampling distribution of the OLS estimator (SW 6.6)
6 Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)

Omitted variable bias

The model with a single regressor is Y = β0 + β1 X + u. The error u arises because of factors that influence Y but are not included in the regression. These excluded factors are omitted variables from the regression. There are always omitted variables, and sometimes this leads to a bias in the OLS estimator. We will study when such a bias arises and the likely direction of the bias.

Example: test scores and STR

The estimated regression line of the test score-class size relationship is

testscore = 698.9329 − 2.2798 STR

A likely omitted variable here is family income. Suppose that in high-income districts classes are smaller and test scores are higher. Is −2.28 a credible estimate of the causal effect on test scores of a change in the student-teacher ratio? Probably not, because the estimated effect of STR likely also reflects the impact on test scores of variations in income across districts. Districts with smaller STR have higher test scores partly because of their higher income. Thus −2.28 is a larger effect (in absolute value) than the true causal effect of class size.
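As a quick arithmetic check on how to read this estimated line (an added sketch, not part of the original slides), the predicted change in test scores is simply the slope times the change in STR:

# Reading the estimated line testscore = 698.93 - 2.28*STR:
# predicted change in average test score when STR falls by 2 students per teacher.
slope = -2.2798
delta_str = -2.0
print(slope * delta_str)   # about +4.56 points; a predicted association, not necessarily a causal effect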

Omitted variable bias

The bias in the OLS estimator that occurs as a result of an omitted factor is called the omitted variable bias (OVB). Given that there are always omitted variables, it is important to understand when OVB occurs. For OVB to occur, the omitted factor, which we call Z, must satisfy two conditions:
1 Z is a determinant of Y (i.e., Z is part of u), and
2 Z is correlated with the regressor X.
Both conditions must hold for the omission of Z to result in omitted variable bias.

Omitted variable bias: test scores and class size

Another omitted variable could be English language ability.
1 English language ability (whether the student has English as a second language) plausibly affects standardized test scores: Z is a determinant of Y.
2 Immigrant communities tend to be less affluent and thus have smaller school budgets and higher STR: Z is correlated with X.
Accordingly, β̂1 is biased. What is the direction of the bias? That is, what is the sign of the bias? If intuition fails you, there is a formula... soon.

Conditions for OVB in the CASchools data

Sometimes we can actually check these conditions (at least in a given sample). The California School dataset has data on the percentage of students learning English; the variable is el_pct.

 Variable |  Obs |     Mean | Std. Dev. | Min |      Max
   el_pct |  420 | 15.76816 |  18.28593 |   0 | 85.53972

Is this variable correlated with STR and testscore (at least in this sample)?

. correlate el_pct str testscr
          |  el_pct      str  testscr
   el_pct |  1.0000
      str |  0.1876   1.0000
  testscr | -0.6441  -0.2264   1.0000

Conditions for OVB in the CASchools data

[Scatterplots of testscore against el_pct and of str against el_pct.] Districts with a lower percentage of English learners have higher test scores. Districts with a lower percentage of English learners have smaller classes.

OVB formula

Recall from Lecture 4 (Preliminary algebra 3 slide) that we can write

$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X)\,u_i}{\sum_{i=1}^n (X_i - \bar X)^2} = \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)\,u_i}{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2} = \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)\,u_i}{\frac{n-1}{n}\,s_X^2}$$

Under assumptions LS2 and LS3 we have

$$\hat\beta_1 \;\xrightarrow{\;p\;}\; \beta_1 + \underbrace{\frac{\mathrm{Cov}(X,u)}{\mathrm{Var}(X)}}_{\text{OVB}}$$

If LS1 holds then Cov(X, u) = 0 and β̂1 →p β1 (and also E(β̂1) = β1). If LS1 does not hold then Cov(X, u) ≠ 0 and β̂1 →p β1 + Cov(X, u)/Var(X) ≠ β1 (and also E(β̂1) ≠ β1).

OVB formula in terms of the omitted variable

The previous formula is in terms of the error term u and, although it is called the OVB, it is more general: the formula is correct irrespective of the reason for the correlation (or covariance) between u and X. Suppose now that we assert that a variable Z is omitted from the regression. We are then saying that Z is part of u, and without loss of generality we can write u = β2 Z + ε, where β2 is a coefficient. Then Cov(X, u) = Cov(X, β2 Z + ε) = β2 Cov(X, Z), assuming ε is uncorrelated with X.

OVB formula in terms of the omitted variable

The OVB formula in this case becomes

$$\hat\beta_1 \;\xrightarrow{\;p\;}\; \beta_1 + \underbrace{\beta_2\,\frac{\mathrm{Cov}(X,Z)}{\mathrm{Var}(X)}}_{\text{OVB}}$$

The math makes clear the two conditions for OVB:
1 Z is a determinant of Y ⟹ β2 ≠ 0.
2 Z is correlated with the regressor X ⟹ Cov(X, Z) ≠ 0.

OVB formula: correlation version

An alternative formulation of the OVB formula is in terms of the correlation rather than the covariance:

$$\hat\beta_1 \;\xrightarrow{\;p\;}\; \beta_1 + \underbrace{\frac{\mathrm{Cov}(X,u)}{\mathrm{Var}(X)}}_{\text{OVB}} = \beta_1 + \underbrace{\rho_{Xu}\,\frac{\sigma_u}{\sigma_X}}_{\text{OVB}}$$

$$\hat\beta_1 \;\xrightarrow{\;p\;}\; \beta_1 + \underbrace{\beta_2\,\frac{\mathrm{Cov}(X,Z)}{\mathrm{Var}(X)}}_{\text{OVB}} = \beta_1 + \underbrace{\beta_2\,\rho_{XZ}\,\frac{\sigma_Z}{\sigma_X}}_{\text{OVB}}$$
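To make the formula concrete, here is a minimal numpy simulation (an added sketch with made-up coefficients and simulated data, not the CASchools numbers): when Z is omitted, the slope from the short regression of Y on X converges to β1 + β2 Cov(X, Z)/Var(X) rather than to β1.

# Minimal simulation of omitted variable bias (illustrative coefficients only).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                               # omitted variable Z
x = 0.5 * z + rng.normal(size=n)                     # X correlated with Z:  Cov(X, Z) > 0
y = 10.0 - 2.0 * x - 0.7 * z + rng.normal(size=n)    # true beta_1 = -2, beta_2 = -0.7

# OLS slope from the short regression of Y on X only (so -0.7*Z + noise sits in the error)
beta1_short = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# What the OVB formula predicts for its probability limit
beta1_ovb = -2.0 + (-0.7) * np.cov(x, z)[0, 1] / np.var(x, ddof=1)

print(beta1_short, beta1_ovb)                        # both close to -2.28, i.e. biased away from -2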

OVB formula in the test score-class size example

We usually use the OVB formula to try to sign the direction of the bias. For example, when Z is the % of English learners it is likely that β2 < 0 (the sample correlation also suggests this). And ρXZ is likely to be positive, ρXZ > 0 (also suggested by the sample correlation). Thus β2 ρXZ σZ/σX < 0, the product of a negative and a positive term, so that plim β̂1 is smaller than the true parameter β1. Ignoring English learners overstates (in absolute value) the class size effect. What is the likely sign of the bias when Z is family income?

Three ways to overcome omitted variable bias

1. Run a randomized controlled experiment in which treatment (STR) is randomly assigned: then el_pct is still a determinant of testscore, but el_pct is uncorrelated with STR. Such random experiments are unrealistic in practice.

Three ways to overcome omitted variable bias

2. Adopt the cross-tabulation approach: divide the sample into groups having approximately the same value of el_pct and analyze the relationship within groups. Problems: 1) we soon run out of data, and 2) there are other determinants (e.g., family income, parental education) that remain omitted.

Three ways to overcome omitted variable bias

3. Use a regression in which the omitted variable (el_pct) is no longer omitted: include el_pct as an additional regressor in a multiple regression. This is the approach we will focus on.

Where are we? Next up: the multiple regression model (SW 6.2, 6.3).

The multiple regression model

The population regression model (or function) is

Y = β0 + β1 X1 + β2 X2 + ... + βk Xk + u

Y is the dependent variable. X1, X2, ..., Xk are the k independent variables (regressors). β0 is the (unknown) intercept and β1, ..., βk are the (unknown) slopes. u is the regression error, reflecting other omitted factors affecting Y. We assume right away that E(u | X1, X2, ..., Xk) = 0, so that the population regression line is the conditional expectation of Y given the k X's and the slope parameters can be interpreted as causal effects.

Interpretation of coefficients (slopes) in multiple regression

Consider changing X1 from x1 to x1 + Δx1, while holding all the other X's fixed.

Before the change we have E(Y | X1 = x1, ..., Xk = xk) = β0 + β1 x1 + β2 x2 + ... + βk xk.

After the change we have E(Y | X1 = x1 + Δx1, ..., Xk = xk) = β0 + β1 (x1 + Δx1) + β2 x2 + ... + βk xk.

The difference is E(Y | X1 = x1 + Δx1, ..., Xk = xk) − E(Y | X1 = x1, ..., Xk = xk) = β1 Δx1.

Interpretation of coefficients (slopes) in multiple regression

When Δx1 = 1 we have E(Y | X1 = x1 + 1, ..., Xk = xk) − E(Y | X1 = x1, ..., Xk = xk) = β1.

β1 measures the effect on (expected) Y of a unit change in X1, holding the other regressors X2, ..., Xk fixed (we also say controlling for X2, ..., Xk). Whether this partial effect can be given a causal interpretation depends on what we assume about E(u | X1, X2, ..., Xk). If E(u | X1, X2, ..., Xk) is constant, as assumed here, then β1 is the causal effect of X1 on Y. Otherwise, it is not a causal effect. Why? The same interpretation applies to βj, j = 2, ..., k.

The multiple regression model in the sample

The regression model (or function) in the sample is

Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + ui,  i = 1, ..., n

The i-th observation in the sample is (Yi, X1i, X2i, ..., Xki).

Estimation

To simplify the presentation we assume that we have two regressors only, k = 2:

Yi = β0 + β1 X1i + β2 X2i + ui

With two regressors, the OLS estimator solves

$$\min_{b_0, b_1, b_2}\; \sum_{i=1}^n \bigl(Y_i - (b_0 + b_1 X_{1i} + b_2 X_{2i})\bigr)^2$$

The OLS estimator minimizes the average squared difference between the actual values of Yi and the prediction (predicted value) b0 + b1 X1i + b2 X2i based on such b's. This minimization problem is solved using calculus. The result is the OLS estimators of β0, β1, β2, denoted, respectively, by β̂0, β̂1, β̂2. This generalizes the case with one regressor (k = 1).
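A minimal numpy sketch of this minimization on simulated data (the coefficients 1, 2, −3 and the variable names are made up for illustration); np.linalg.lstsq solves exactly this least-squares problem.

# OLS with two regressors as a least-squares minimization (simulated data).
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])         # columns: constant, X1, X2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the sum of squared residuals
print(beta_hat)                                   # approximately [1, 2, -3]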

Graphic intuition

$$\min_{b_0, b_1} \sum_{i=1}^n (Y_i - b_0 - b_1 X_{1i})^2 \quad\text{fits a line through the points in } \mathbb{R}^2$$

$$\min_{b_0, b_1, b_2} \sum_{i=1}^n (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 \quad\text{fits a plane through the points in } \mathbb{R}^3$$

[Figures: scatterplot of testscore against str with the fitted line, and three-dimensional scatterplot of testscore against str and english with the fitted plane.]

Matrix notation

The multiple regression model

Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + ui,  i = 1, ..., n

can be written in matrix form as Y = Xβ + u, where

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix},\qquad \mathbf{X} = \begin{pmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{kn} \end{pmatrix},\qquad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix},\qquad \mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$

OLS in matrix form

Using matrix notation, the minimization of the sum of squared residuals can be written compactly as

$$\min_{\boldsymbol\beta}\; (\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$$

and the first order conditions are

$$\mathbf{X}'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta) = \mathbf{0} \;\;\Longrightarrow\;\; \underbrace{\mathbf{X}'\mathbf{X}}_{(k+1)\times(k+1)}\;\underbrace{\boldsymbol\beta}_{(k+1)\times 1} = \underbrace{\mathbf{X}'\mathbf{Y}}_{(k+1)\times 1}$$

which is a system of linear equations that can be solved for β (recall Ax = b, where A = X'X, x = β and b = X'Y). The solution is the OLS estimator

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y},$$

provided X'X is invertible.

Example: the CASchools test score data

What happens to the coefficient on STR?

. reg testscr str

  Number of obs = 420   F(1, 418) = 22.58   Prob > F = 0.0000
  R-squared = 0.0512   Adj R-squared = 0.0490   Root MSE = 18.581
  Model SS = 7794.11004 (df 1)   Residual SS = 144315.484 (df 418)   Total SS = 152109.594 (df 419)

  testscr |      Coef.   Std. Err.       t   P>|t|   [95% Conf. Interval]
      str |  -2.279808    .4798256   -4.75   0.000    -3.22298   -1.336637
    _cons |    698.933    9.467491   73.82   0.000    680.3231    717.5428

. reg testscr str el_pct

  Number of obs = 420   F(2, 417) = 155.01   Prob > F = 0.0000
  R-squared = 0.4264   Adj R-squared = 0.4237   Root MSE = 14.464
  Model SS = 64864.3011 (df 2)   Residual SS = 87245.2925 (df 417)   Total SS = 152109.594 (df 419)

  testscr |      Coef.   Std. Err.       t   P>|t|   [95% Conf. Interval]
      str |  -1.101296    .3802783   -2.90   0.004   -1.848797   -.3537945
   el_pct |  -.6497768    .0393425  -16.52   0.000   -.7271112   -.5724423
    _cons |   686.0322    7.411312   92.57   0.000    671.4641    700.6004
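A minimal numpy sketch of the matrix formula (added for illustration): it solves the normal equations (X'X)b = X'Y rather than inverting X'X explicitly, which is numerically preferable. The coefficients in the usage example are simulated and merely echo the estimates above.

# Sketch of the matrix OLS formula beta_hat = (X'X)^{-1} X'Y.
import numpy as np

def ols_matrix(X, y):
    """OLS coefficients, assuming X has full column rank so that X'X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Tiny usage example with simulated data (made-up coefficients).
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
y = X @ np.array([686.0, -1.1, -0.65]) + rng.normal(size=300)
print(ols_matrix(X, y))   # approximately [686.0, -1.1, -0.65]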

OLS predicted values and residuals

Just as in the single-regressor model, the predicted value is

Ŷi = β̂0 + β̂1 X1i + β̂2 X2i + ... + β̂k Xki

and the residual is

ûi = Yi − Ŷi = Yi − (β̂0 + β̂1 X1i + β̂2 X2i + ... + β̂k Xki)

so that we can write

Yi = Ŷi + ûi = β̂0 + β̂1 X1i + β̂2 X2i + ... + β̂k Xki + ûi

Where are we? Next up: measures of fit (SW 6.4).

Measures of fit for multiple regression

Same measures as before:
SER (RMSE) = standard deviation of the residual ûi.
R² = fraction of the variance of Y explained or accounted for by X1, ..., Xk.
(New!) R̄² is the adjusted R²: R² adjusted for the number of regressors.

Measures of fit: SER and RMSE

As in the regression with a single regressor, the SER/RMSE measures the spread of the Y's around the estimated regression line:

$$SER\ (RMSE) = \sqrt{\frac{1}{n-k-1}\sum_{i=1}^n \hat u_i^2}$$

Measures of fit: R squared

As in the regression with a single regressor, the R² is the fraction of the variance of Y accounted for by the model (i.e., by X1, ..., Xk):

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

where

$$ESS = \sum_{i=1}^n (\hat Y_i - \bar Y)^2,\qquad TSS = \sum_{i=1}^n (Y_i - \bar Y)^2,\qquad SSR = \sum_{i=1}^n \hat u_i^2$$

The R² never decreases when another regressor is added (i.e., when k increases). (Why?) Not a good feature for a measure of fit.

Measures of fit: adjusted R squared

The adjusted R², R̄², addresses this issue by penalizing you for including another regressor:

$$\bar R^2 = 1 - \left(\frac{n-1}{n-k-1}\right)\frac{SSR}{TSS} = R^2 - \frac{k}{n-k-1}\cdot\frac{SSR}{TSS}$$

Note that R̄² < R², but their difference tends to vanish for large n. R̄² does not necessarily increase with k (although SSR decreases, (n−1)/(n−k−1) increases). R̄² can be negative!
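A short numpy sketch (added illustration on simulated data with made-up coefficients) computing SER, R² and the adjusted R² directly from the formulas above:

# SER/RMSE, R-squared, and adjusted R-squared from OLS residuals (simulated data).
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

SSR = np.sum(u_hat ** 2)
TSS = np.sum((y - y.mean()) ** 2)
SER = np.sqrt(SSR / (n - k - 1))                    # spread of Y around the fitted line
R2 = 1.0 - SSR / TSS
adjR2 = 1.0 - (n - 1) / (n - k - 1) * SSR / TSS     # penalizes adding regressors
print(SER, R2, adjR2)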

How to interpret the simple and adjusted R squared?

A high R² (or R̄²) means that the regressors account for much of the variation in Y.
A high R² (or R̄²) does not mean that you have eliminated omitted variable bias.
A high R² (or R̄²) does not mean that you have an unbiased estimator of a causal effect.
A high R² (or R̄²) does not mean that the included variables are statistically significant; this must be determined using hypothesis tests.
Maximizing R² (or R̄²) is not a criterion we use to select regressors.

CASchools data example

Regression of testscore against STR:

testscore = 698.93 − 2.28 str,  R² = 0.0512

Regression of testscore against STR and el_pct:

testscore = 686.032 − 1.101 str − 0.65 el_pct,  R² = 0.426

Adding the % of English learners substantially improves the fit of the regression. Together the two regressors account for almost 43% of the variation of test scores across districts.

Where are we? Next up: the Least Squares Assumptions (SW 6.5).

The Least Squares Assumptions for multiple regression

The multiple regression model is

Y = β0 + β1 X1 + β2 X2 + ... + βk Xk + u

The four least squares assumptions are:

Assumption #1: The conditional distribution of u given all the X's has mean zero, that is, E(u | X1 = x1, ..., Xk = xk) = 0 for all (x1, ..., xk).
Assumption #2: (Yi, X1i, ..., Xki), i = 1, ..., n, are i.i.d.
Assumption #3: Large outliers in Y and the X's are unlikely: X1, ..., Xk and Y have finite fourth moments, E(Y⁴) < ∞, E(X1⁴) < ∞, ..., E(Xk⁴) < ∞.
Assumption #4: There is no perfect multicollinearity.

Assumption #1: mean independence

E(u | X1 = x1, ..., Xk = xk) = 0

Same interpretation as in the regression with a single regressor. This assumption gives a causal interpretation to the parameters (the β's). If an omitted variable (a) belongs in the equation (so it is in u) and (b) is correlated with an included X, then this condition fails and there is OVB (omitted variable bias). The solution, if possible, is to include the omitted variable in the regression. Usually, this assumption is more likely to hold when one controls for more factors by including them in the regression.

Assumption #2: i.i.d. sample

Same assumption as in the single-regressor model. This is satisfied automatically if the data are collected by simple random sampling.

Assumption #3: large outliers are unlikely

Same assumption as in the single-regressor model. OLS can be sensitive to large outliers. It is recommended to check the data (via scatterplots, etc.) to make sure there are no large outliers (due to typos, coding errors, etc.). This is a technical assumption, satisfied automatically by variables with a bounded domain.

Assumption #4: no perfect multicollinearity

This is a new assumption that applies when there is more than a single regressor. Perfect multicollinearity occurs when one of the regressors is an exact linear function of the other regressors. Assumption #4 rules this out. We cannot estimate the effect of, say, X1 holding all other variables constant if one of those variables is a perfect linear function of X1. When there is perfect multicollinearity, the statistical software will let you know, either by crashing, by giving an error message, or by dropping one of the regressors arbitrarily.

Including a perfectly collinear regressor in Stata

Example: generate str_new = 5 + .2*str and add it to the regression. What happens? Stata drops one of the collinear variables.

. g str_new = 5 + .2*str
. reg testscr str str_new
note: str_new omitted because of collinearity

  Number of obs = 420   F(1, 418) = 22.58   Prob > F = 0.0000
  R-squared = 0.0512   Adj R-squared = 0.0490   Root MSE = 18.581
  Model SS = 7794.11004 (df 1)   Residual SS = 144315.484 (df 418)   Total SS = 152109.594 (df 419)

  testscr |      Coef.   Std. Err.       t   P>|t|   [95% Conf. Interval]
      str |  -2.279808    .4798256   -4.75   0.000    -3.22298   -1.336637
  str_new |          0   (omitted)
    _cons |    698.933    9.467491   73.82   0.000    680.3231    717.5428

The dummy variable trap

Suppose you have a set of multiple binary (dummy) variables which are mutually exclusive and exhaustive; that is, there are multiple categories and every observation falls in one and only one category (think of region of residence: Sicily, Lazio, Tuscany, etc.). If you include all these dummy variables and a constant in the regression, you will have perfect multicollinearity; this is sometimes called the dummy variable trap. Why is there perfect multicollinearity here? Solutions to the dummy variable trap:
1 Omit one of the groups (e.g., Lazio), or
2 Omit the intercept.
What are the implications of (1) or (2) for the interpretation of the coefficients? We will analyze this later in an example.
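A small numpy sketch (added illustration with artificial categories) of why the dummy variable trap creates perfect multicollinearity: the dummies sum to the constant column, so the regressor matrix loses full column rank and X'X is singular.

# The dummy variable trap: constant + a full set of mutually exclusive, exhaustive dummies.
import numpy as np

region = np.arange(12) % 3                              # three categories, each appearing four times
dummies = np.eye(3)[region]                             # one dummy per category; each row sums to 1
X_trap = np.column_stack([np.ones(12), dummies])        # constant + all three dummies: 4 columns
X_fix = np.column_stack([np.ones(12), dummies[:, 1:]])  # drop one group (the base category)

print(np.linalg.matrix_rank(X_trap))   # 3 < 4 columns: X'X is singular, coefficients not identified
print(np.linalg.matrix_rank(X_fix))    # 3 = number of columns: no perfect multicollinearity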

Assumption #4: no perfect multicollinearity

Perfect multicollinearity usually reflects a mistake in the definitions of the regressors, or an oddity in the data. The solution to perfect multicollinearity is to modify your list of regressors so that you no longer have perfect multicollinearity.

Imperfect multicollinearity

Imperfect and perfect multicollinearity are quite different despite the similarity of their names. Imperfect multicollinearity occurs when two or more regressors are very highly (but not perfectly) correlated. Why the term multicollinearity? If two regressors are very highly correlated, then their scatterplot will pretty much look like a straight line: they are "co-linear". But unless the correlation is exactly ±1, that collinearity is imperfect.
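A numpy simulation sketch (added illustration, made-up coefficients) of the practical consequence discussed on the next slide: when two regressors are highly correlated, the OLS slope estimates vary much more from sample to sample.

# Imperfect multicollinearity inflates the sampling variance of the OLS slopes:
# compare the spread of beta1_hat across samples for low vs. high corr(X1, X2).
import numpy as np

rng = np.random.default_rng(5)

def sd_of_beta1(rho, n=200, reps=2000):
    draws = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)   # corr(X1, X2) is about rho
        y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        draws.append(np.linalg.solve(X.T @ X, X.T @ y)[1])             # beta1_hat in this sample
    return np.std(draws)

print(sd_of_beta1(0.1), sd_of_beta1(0.95))   # the second is about three times larger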

Imperfect multicollinearity

Imperfect multicollinearity implies that one or more of the regression coefficients will be imprecisely estimated. Intuition: the coefficient on X1 is the effect of X1 holding X2 constant; but if X1 and X2 are highly correlated, there is very little variation in X1 once X2 is held constant, so the data are pretty much uninformative about what happens when X1 changes but X2 doesn't. This means that the variance of the OLS estimator of the coefficient on X1 will be large. Thus, imperfect multicollinearity (correctly) results in large standard errors for one or more of the OLS coefficients. Importantly, imperfect multicollinearity does not violate Assumption #4. The OLS regression will run.

Where are we? Next up: sampling distribution of the OLS estimator (SW 6.6).

The sampling distribution of OLS

Under the four LS assumptions:
1 β̂0, β̂1, ..., β̂k are unbiased and consistent estimators of β0, β1, ..., βk.
2 The joint sampling distribution of β̂0, β̂1, ..., β̂k is well approximated by a multivariate normal distribution.
3 This implies that, in large samples, for j = 0, 1, ..., k,

$$\hat\beta_j \sim N\bigl(\beta_j,\ \sigma^2_{\hat\beta_j}\bigr) \quad\text{or, equivalently,}\quad \frac{\hat\beta_j - \beta_j}{\sqrt{\sigma^2_{\hat\beta_j}}} \sim N(0,1)$$

The variance of the OLS estimator

There is a more complicated formula for the estimator of the variance of β̂j... but the software does it for us! As in the single-regressor case, there is a formula that holds only when there is homoskedasticity, i.e., when Var(u | X1, ..., Xk) is a constant that does not vary with the values of (X1, ..., Xk), and another formula that holds when there is heteroskedasticity. As in the single-regressor case, we prefer to use the formula that is robust to heteroskedasticity because it is also correct when there is homoskedasticity. Intuitively, we expect our estimator to be less precise (to have higher sampling variance) when using the same data to estimate more parameters. This is indeed correct, and the formula for the variance of β̂j (not shown) reflects this intuition, as it usually increases with the number of variables (k) included in the regression. This result prevents us from adding regressors without limit.
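The slides leave the variance formula to the software. As an added sketch, here is one common heteroskedasticity-robust estimator (the HC1 "sandwich" variance, the kind of calculation behind Stata's robust option) on simulated single-regressor data; the data-generating process and names are made up for illustration.

# Heteroskedasticity-robust (HC1) variance and standard errors for the OLS coefficients.
import numpy as np

rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
y = 1.0 - 2.0 * x + np.abs(x) * rng.normal(size=n)     # error variance depends on x

X = np.column_stack([np.ones(n), x])
k = X.shape[1] - 1                                     # number of regressors (excluding the constant)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * (u_hat ** 2)[:, None])               # sum over i of u_i^2 * x_i x_i'
V_hc1 = n / (n - k - 1) * XtX_inv @ meat @ XtX_inv     # sandwich with the HC1 degrees-of-freedom fix
print(beta_hat, np.sqrt(np.diag(V_hc1)))               # coefficients and robust standard errors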

Where are we? Next up: hypothesis tests and confidence intervals for a single coefficient (SW 7.1).

Hypothesis tests and confidence intervals for a single coefficient

This follows the same logic and recipe as for the slope coefficient in a single-regressor model. Because

$$\frac{\hat\beta_j - \beta_j}{\sqrt{\sigma^2_{\hat\beta_j}}}$$

is approximately distributed N(0, 1) in large samples (under the four LS assumptions), hypotheses on β1 can be tested using the usual t-statistic

$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)}$$

and 95% confidence intervals are constructed as

$$\bigl\{\hat\beta_1 \pm 1.96\, SE(\hat\beta_1)\bigr\}$$

Similarly for β2, ..., βk. β̂1 and β̂2 are generally not independently distributed, so neither are their t-statistics (more on this later).

The California school dataset

Single-regressor estimates:

. reg testscr str, robust

Linear regression
  Number of obs = 420   F(1, 418) = 19.26   Prob > F = 0.0000
  R-squared = 0.0512   Root MSE = 18.581

  testscr |      Coef.   Robust Std. Err.       t   P>|t|   [95% Conf. Interval]
      str |  -2.279808           .5194892   -4.39   0.000   -3.300945   -1.258671
    _cons |    698.933           10.36436   67.44   0.000    678.5602    719.3057

The California school dataset

Multiple regression estimates:

. reg testscr str el_pct, robust

Linear regression
  Number of obs = 420   F(2, 417) = 223.82   Prob > F = 0.0000
  R-squared = 0.4264   Root MSE = 14.464

  testscr |      Coef.   Robust Std. Err.       t   P>|t|   [95% Conf. Interval]
      str |  -1.101296           .4328472   -2.54   0.011    -1.95213   -.2504616
   el_pct |  -.6497768           .0310318  -20.94   0.000    -.710775   -.5887786
    _cons |   686.0322           8.728224   78.60   0.000    668.8754     703.189

Testing hypotheses and CIs in the California school dataset

The coefficient on STR in the multiple regression is the effect on testscore of a unit change in STR, holding constant the percentage of English learners in the district. The coefficient on STR falls by about one-half (in absolute value) when el_pct is added to the regression (does this make sense?).

The 95% confidence interval for the coefficient on STR is {−1.10 ± 1.96 × 0.43} = (−1.95, −0.25).

The t-statistic testing H0: βSTR = 0 is

$$t = \frac{\hat\beta_{STR} - 0}{SE(\hat\beta_{STR})} = \frac{-1.10}{0.43} = -2.54$$

so we reject the null hypothesis at the 5% significance level. We use heteroskedasticity-robust standard errors for exactly the same reasons as in the case of a single regressor.
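A quick numeric check of the numbers on this slide (an added sketch), using the STR coefficient and robust standard error reported in the multiple regression output above:

# t-statistic and 95% CI for the STR coefficient: -1.101296 with robust SE 0.4328472.
beta_str, se_str = -1.101296, 0.4328472
t_stat = beta_str / se_str                                     # about -2.54
ci_95 = (beta_str - 1.96 * se_str, beta_str + 1.96 * se_str)   # about (-1.95, -0.25)
print(t_stat, ci_95)    # |t| > 1.96, so reject H0: beta_STR = 0 at the 5% level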