Introduction to Econometrics. Assessing Studies Based on Multiple Regression

Similar documents
Assessing Studies Based on Multiple Regression

Chapter 9: Assessing Studies Based on Multiple Regression. Copyright 2011 Pearson Addison-Wesley. All rights reserved.

6. Assessing studies based on multiple regression

Linear Regression with Multiple Regressors

Introduction to Econometrics. Regression with Panel Data

Introduction to Econometrics. Multiple Regression

Linear Regression with one Regressor

Introduction to Econometrics

Nonlinear Regression Functions

Chapter 6: Linear Regression With Multiple Regressors

Linear Regression with Multiple Regressors

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Introduction to Econometrics

Applied Health Economics (for B.Sc.)

Review of Econometrics

Introduction to Econometrics. Multiple Regression (2016/2017)

Applied Statistics and Econometrics

ECONOMETRICS HONOR S EXAM REVIEW SESSION

Introduction to Econometrics

ECO321: Economic Statistics II

THE AUSTRALIAN NATIONAL UNIVERSITY. Second Semester Final Examination November, Econometrics II: Econometric Modelling (EMET 2008/6008)

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

2. Linear regression with multiple regressors

Hypothesis Tests and Confidence Intervals in Multiple Regression

Applied Statistics and Econometrics

4. Nonlinear regression functions

Chapter 7. Hypothesis Tests and Confidence Intervals in Multiple Regression

8. Instrumental variables regression

ECON Introductory Econometrics. Lecture 6: OLS with Multiple Regressors

The F distribution. If: 1. u 1,,u n are normally distributed; and 2. X i is distributed independently of u i (so in particular u i is homoskedastic)

Multiple Linear Regression CIVL 7012/8012


ECON Introductory Econometrics. Lecture 17: Experiments

Econ 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5.

Applied Statistics and Econometrics

Chapter 11. Regression with a Binary Dependent Variable

Hypothesis Tests and Confidence Intervals. in Multiple Regression

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems

WISE International Masters

Experiments and Quasi-Experiments

Empirical approaches in public economics

Lab 07 Introduction to Econometrics

Introduction to Econometrics

The Simple Linear Regression Model

EMERGING MARKETS - Lecture 2: Methodology refresher

1 Motivation for Instrumental Variable (IV) Regression

Economics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects

Econometrics Problem Set 11

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =


Introduction to Econometrics Third Edition James H. Stock Mark W. Watson The statistical analysis of economic (and related) data

Introduction to Econometrics. Review of Probability & Statistics

Chapter 8 Conclusion

ECON Introductory Econometrics. Lecture 11: Binary dependent variables

Economics 300. Econometrics Multiple Regression: Extensions and Issues

Topic 4: Model Specifications

Economics 300. Econometrics Multiple Regression: Extensions and Issues

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

Final Exam. Economics 835: Econometrics. Fall 2010

ECON Introductory Econometrics. Lecture 13: Internal and external validity

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Honors General Exam Part 3: Econometrics Solutions. Harvard University

Econ Spring 2016 Section 9

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Multivariate Regression: Part I

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

I 1. 1 Introduction 2. 2 Examples Example: growth, GDP, and schooling California test scores Chandra et al. (2008)...

Gov 2002: 4. Observational Studies and Confounding

ECON Introductory Econometrics. Lecture 4: Linear Regression with One Regressor

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

ECON2228 Notes 8. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 35

Linear Models in Econometrics

Sociology 593 Exam 2 Answer Key March 28, 2002

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler


PhD/MA Econometrics Examination January 2012 PART A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

2 Prediction and Analysis of Variance

Econ Spring 2016 Section 9

Introduction to Econometrics

Binary Dependent Variable. Regression with a

ECO220Y Simple Regression: Testing the Slope

Notes 11: OLS Theorems ECO 231W - Undergraduate Econometrics

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

Quantitative Economics for the Evaluation of the European Policy

Selection on Observables: Propensity Score Matching.

Statistical Inference with Regression Analysis

ECON Introductory Econometrics. Lecture 16: Instrumental variables

Linear Regression. Junhui Qian. October 27, 2014

Chapter 6 Stochastic Regressors

Session IV Instrumental Variables

Treatment Effects. Christopher Taber. September 6, Department of Economics University of Wisconsin-Madison

Metrics Honors Review

Job Training Partnership Act (JTPA)

Addressing Analysis Issues REGRESSION-DISCONTINUITY (RD) DESIGN

PSC 504: Instrumental Variables

MGEC11H3Y L01 Introduction to Regression Analysis Term Test Friday July 5, PM Instructor: Victor Yu

Transcription:

Introduction to Econometrics The statistical analysis of economic (and related) data STATS301 Assessing Studies Based on Multiple Regression Titulaire: Christopher Bruffaerts Assistant: Lorenzo Ricci 1

Assessing Studies Based on Multiple Regression Multiple regression has some key virtues: It provides an estimate of the effect on Y of arbitrary changes ΔX. It resolves the problem of omitted variable bias, if an omitted variable can be measured and included. It can handle nonlinear relations (effects that vary with the X s) Still, OLS might yield a biased estimator of the true causal effect. 2

A Framework for Assessing Statistical Studies Internal and External Validity Internal validity: the statistical inferences about causal effects are valid for the population being studied. External validity: the statistical inferences can be generalized from the population and setting studied to other populations and settings, where the setting refers to the legal, policy, and physical environment and related salient features. 3

Threats to External Validity How far can we generalize class size results from California school districts? Differences in populations o California in 2005? o Massachusetts in 2005? o Mexico in 2005? Differences in settings o different legal requirements concerning special education o different treatment of bilingual education o differences in teacher characteristics 4

Assessing Studies Based on Multiple Regression Threats to Internal Validity of Multiple Regression Analysis 5

Threats to Internal Validity Internal validity: the statistical inferences about causal effects are valid for the population being studied. Five threats to the internal validity of regression studies: 1. Omitted variable bias 2. Wrong functional form 3. Errors-in-variables bias 4. Sample selection bias 5. Simultaneous causality bias All of these imply that E(u i X 1i,,X ki ) 0 6

1 - Omitted variable bias Arises if an omitted variable both (i) is a determinant of Y and (ii) is correlated with at least one included regressor. Potential solutions to omitted variable bias If the variable can be measured, include it as a regressor in multiple regression; Possibly, use panel data in which each entity (individual) is observed more than once; If the variable cannot be measured, use instrumental variables regression; Run a randomized controlled experiment. 7

2 - Wrong functional form Arises if the functional form is incorrect for example, an interaction term is incorrectly omitted; then inferences on causal effects will be biased. Potential solutions to functional form misspecification Continuous dependent variable: use the appropriate nonlinear specifications in X (logarithms, interactions, etc.) Discrete (example: binary) dependent variable: need an extension of multiple regression methods ( probit or logit analysis for binary dependent variables). 8

3 - Errors-in-variables bias So far we have assumed that X is measured without error. In reality, economic data often have measurement error Data entry errors in administrative data Recollection errors in surveys o when did you start your current job? Ambiguous questions problems o what was your income last year? Intentionally false response problems with surveys o What is the current value of your financial assets? o How often do you drink and drive? In general, measurement error in a regressor results in errors-in-variables bias. 9

Errors-in-variables bias: Illustration Suppose that Y i = β 0 + β 1 X i + u i is correct in the sense that the three least squares assumptions hold (in particular E(u i X i ) = 0). Let X i = unmeasured true value of X X i = imprecisely measured version of X Then Y i = β 0 + β 1 X i + u i = β 0 + β 1 X i + [β 1 (X i X i ) + u i ] or Y i = β 0 + β 1 X i + u i, where u i = β 1 (X i X i ) + u i If X i is correlated with u i then ˆ β 1 will be biased 10

Errors-in-variables bias: Illustration cov( X i, u i ) = cov( X i,β 1 (X i X i ) + u i ) = β 1 cov( X i, X i X i ) + cov( X i,u i ) = β 1 cov( X i, X i ) β 1 var( X i ) + 0 0 because in general cov( X i,x i ) 0 and var( X i ) 0. If X i is measured with error, X i is in general correlated with u i, so ˆ β 1 is biased and inconsistent. It is possible to derive formulas for this bias, but they require making specific mathematical assumptions about the measurement error process. Those formulas are special and particular, but the observation that measurement error in X results in bias is general. 11

Potential solutions to errors-in-variables bias Obtain better data. Develop a specific model of the measurement error process. This is only possible if a lot is known about the nature of the measurement error for example a subsample of the data are cross-checked using administrative records and the discrepancies are analyzed and modeled. Instrumental variables regression. Very specialized; we won t pursue this here. 12

4 - Sample selection bias So far we have assumed simple random sampling of the population. In some cases, simple random sampling is thwarted because the sample, in effect, selects itself. Sample selection bias arises when a selection process (i) influences the availability of data and (ii) that process is related to the dependent variable. 13

Example #1: Mutual funds Do actively managed mutual funds outperform hold-the-market funds? Empirical strategy: o Sampling scheme: simple random sampling of mutual funds available to the public on a given date. o Data: returns for the preceding 10 years. o Estimator: average ten-year return of the sample mutual funds, minus ten-year return on S&P500 return i = β 0 + β 1 managed_fund i + u i Is there sample selection bias? Being a managed fund in the sample (managed_fund i = 1) means that your return was better than failed managed funds, which are not in the sample so corr(managed_fund i,u i ) 0. 14

Potential solutions to sample selection bias Collect the sample in a way that avoids sample selection. o Mutual funds example: change the sample population from those available at the end of the tenyear period, to those available at the beginning of the period (include failed funds) Randomized controlled experiment. Construct a model of the sample selection problem and estimate that model (we will not do this). 15

5 - Simultaneous causality bias So far we have assumed that X causes Y. What if Y causes X, too? Example: Class size effect Low STR results in better test scores But suppose districts with low test scores are given extra resources: as a result of a political process they also have low STR What does this mean for a regression of TestScore on STR? 16

Simultaneous causality bias in equations (a) Causal effect of X on Y: Y i = β 0 + β 1 X i + u i (b) Causal effect of Y on X: X i = γ 0 + γ 1 Y i + v i Large u i means large Y i, which implies large X i (if γ 1 >0) Thus corr(x i,u i ) 0 Thus ˆ β 1 is biased and inconsistent. Ex: A district with particularly bad test scores given the STR (negative u i ) receives extra resources, thereby lowering its STR; so STR i and u i are correlated 17

Potential solutions to simultaneous causality bias Randomized controlled experiment. Because X i is chosen at random by the experimenter, there is no feedback from the outcome variable to Y i (assuming perfect compliance). Develop and estimate a complete model of both directions of causality. This is the idea behind many large macro models (e.g. Federal Reserve Bank-US). This is extremely difficult in practice. Use instrumental variables regression to estimate the causal effect of interest (effect of X on Y, ignoring effect of Y on X). 18

Applying this Framework: Test Scores and Class Size Objective: Assess the threats to the internal and external validity of the empirical analysis of the California test score data. External validity o Compare results for California and Massachusetts o Think hard Internal validity o Go through the list of five potential threats to internal validity and think hard 19

Check of external validity compare the California study to one using Massachusetts data The Massachusetts data set 220 elementary school districts Test: 1998 MCAS test fourth grade total (Math + English + Science) Variables: STR, TestScore, PctEL, LunchPct, Income 20

The Massachusetts data: summary statistics 21

22

23

Logarithmic versus cubic function for STR? Evidence of nonlinearity in TestScore-STR relation? Is there a significant HiEL STR interaction? 24

Predicted effects for a class size reduction of 2 Linear specification TestScore = 744.0 0.64STR 0.437PctEL 0.582LunchPct (21.3) (0.27) (0.303) (0.097) 3.07Income + 0.164Income 2 0.0022Income 3 (2.35) (0.085) (0.0010) Estimated effect = -0.64 (-2) = 1.28 Standard error = 2 0.27 = 0.54 NOTE: var(ay) = a 2 var(y); SE(aβ 1 ) = a SE( β 1) ˆ ˆ CI 95% = [ 1.28 ± 1.96 0.54 ]= [0.22, 2.34] 25

Predicted effects for a class size reduction of 2 nonlinear specification Use the before and after method: TestScore = 655.5 + 12.4STR 0.680STR 2 + 0.0115STR 3 0.434PctEL 0.587LunchPct 3.48Income + 0.174Income 2 0.0023Income 3 Estimated reduction from 20 students to 18: ΔTestScore = [12.4 20 0.680 20 2 + 0.0115 20 3 ] [12.4 18 0.680 18 2 + 0.0115 18 3 ] = 1.98 compare with estimate from linear model of 1.28 SE of this estimated effect: use the rearrange the regression ( transform the regressors ) method 26

Summary of Findings for Massachusetts 1. Coefficient on STR falls from 1.72 to 0.69 when control variables for student and district characteristics are included an indication that the original estimate contained omitted variable bias. 2. The class size effect is statistically significant at the 1% significance level, after controlling for student and district characteristics 3. No statistical evidence on nonlinearities in the TestScore STR relation 4. No statistical evidence of STR PctEL interaction 27

Comparison of estimated class size effects: CA vs. MA 28

Summary: Comparison of CA and MA Analyses Class size effect falls in both CA, MA data when student and district control variables are added. Class size effect is statistically significant in both CA, MA data. Estimated effect of a 2-student reduction in STR is quantitatively similar for CA, MA. Neither data set shows evidence of STR PctEL interaction. Some evidence of STR nonlinearities in CA data, but not in MA data. 29

Remaining threats to internal validity What the CA v. MA comparison does and does not show 30

1. Omitted variable bias This analysis controls for: district demographics (income) some student characteristics (English speaking) What is missing? Additional student characteristics, Access to outside learning opportunities Teacher quality for example native ability (but is this correlated with STR?) perhaps better teachers are attracted to schools with lower STR We have controlled for many relevant omitted factors; The nature of this omitted variable bias would need to be similar in California and Massachusetts to be consistent with these results; In this application we will be able to compare these estimates based on observational data with estimates based on experimental data a check of this multiple regression methodology. 31

2. Wrong functional form We have tried quite a few different functional forms, in both the California and Massachussetts data Nonlinear effects are modest Plausibly, this is not a major threat at this point. 3. Errors-in-variables bias STR is a district-wide measure Presumably there is some measurement error students who take the test might not have experienced the measured STR for the district Ideally we would like data on individual students, by grade level. 32

4. Selection Sample is all elementary public school districts (in California; in Massachussetts) no reason that selection should be a problem. 5. Simultaneous Causality School funding equalization based on test scores could cause simultaneous causality. This was not in place in California or Massachussetts during these samples, so simultaneous causality bias is arguably not important. 33

Summary Framework for evaluating regression studies: o Internal validity o External validity Five threats to internal validity: 1. Omitted variable bias 2. Wrong functional form 3. Errors-in-variables bias 4. Sample selection bias 5. Simultaneous causality bias Rest of course focuses on econometric methods for addressing these threats. 34