Assessing Studies Based on Multiple Regression

Assessing Studies Based on Multiple Regression (SW Chapter 9)
Outline
1. Internal and External Validity
2. Threats to Internal Validity
   a. Omitted variable bias
   b. Functional form misspecification
   c. Errors-in-variables bias
   d. Missing data and sample selection bias
   e. Simultaneous causality bias
3. Application to Test Scores
SW Ch. 9 1/48

Internal and External Validity
Let's step back and take a broader look at regression.
- Is there a systematic way to assess (critique) regression studies? We know the strengths of multiple regression, but what are the pitfalls?
- We will list the most common reasons that multiple regression estimates, based on observational data, can result in biased estimates of the causal effect of interest.
- In the test score application, let's try to address these threats as best we can and assess what threats remain. After all this work, what have we learned about the effect on test scores of class size reduction?

A Framework for Assessing Statistical Studies: Internal and External Validity (SW Section 9.1)
- Internal validity: the statistical inferences about causal effects are valid for the population being studied.
- External validity: the statistical inferences can be generalized from the population and setting studied to other populations and settings, where the "setting" refers to the legal, policy, and physical environment and related salient features.

Threats to External Validity of Multiple Regression Studies
Assessing threats to external validity requires detailed substantive knowledge and judgment on a case-by-case basis.
How far can we generalize class size results from California?
- Differences in populations
  o California in 2011?
  o Massachusetts in 2011?
  o Mexico in 2011?
- Differences in settings
  o different legal requirements (e.g. special education)
  o different treatment of bilingual education
  o differences in teacher characteristics

Threats to Internal Validity of Multiple Regression Analysis (SW Section 9.2)
Internal validity: the statistical inferences about causal effects are valid for the population being studied.
Five threats to the internal validity of regression studies:
1. Omitted variable bias
2. Wrong functional form
3. Errors-in-variables bias
4. Sample selection bias
5. Simultaneous causality bias
All of these imply that $E(u_i \mid X_{1i}, \dots, X_{ki}) \neq 0$ (or that conditional mean independence fails), in which case OLS is biased and inconsistent.

1. Omitted variable bias
Omitted variable bias arises if an omitted variable is both:
(i) a determinant of Y and
(ii) correlated with at least one included regressor.
We first discussed omitted variable bias in regression with a single X. OV bias arises in multiple regression if the omitted variable satisfies conditions (i) and (ii) above.
If the multiple regression includes control variables, then we need to ask whether there are omitted factors that are not adequately controlled for, that is, whether the error term is correlated with the variable of interest even after we have included the control variables.
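Conditions (i) and (ii) can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the data-generating process, the coefficient values, and the helper `ols_slope` are all assumptions chosen for illustration. The omitted variable W determines Y and is correlated with X, so the regression of Y on X alone converges to $\beta_1 + \beta_2\,\mathrm{cov}(X,W)/\mathrm{var}(X)$ rather than $\beta_1$.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)          # fixed seed so the run is reproducible
n = 50_000
beta1, beta2 = 1.0, 2.0         # hypothetical true coefficients on X and W

w = [rng.gauss(0, 1) for _ in range(n)]                    # omitted variable
x = [0.8 * wi + rng.gauss(0, 1) for wi in w]               # X is correlated with W
y = [beta1 * xi + beta2 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Short regression: Y on X only, W left in the error term.
slope_short = ols_slope(x, y)

# Long regression via partialling out: remove W from X, then regress Y on the residual.
b_xw = ols_slope(w, x)
x_resid = [xi - b_xw * wi for xi, wi in zip(x, w)]
slope_long = ols_slope(x_resid, y)

# Large-sample bias implied by the formula: beta2 * cov(X, W) / var(X).
predicted_short = beta1 + beta2 * 0.8 / (0.8 ** 2 + 1.0)

print(f"short regression slope: {slope_short:.3f} (formula: {predicted_short:.3f})")
print(f"slope controlling for W: {slope_long:.3f} (true beta1: {beta1})")
```

Under these assumed values the short regression lands near 1.98, roughly double the true effect, while controlling for W recovers a slope near 1.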

Solutions to omitted variable bias
1. If the omitted causal variable can be measured, include it as an additional regressor in multiple regression;
2. If you have data on one or more controls and they are adequate (in the sense of conditional mean independence plausibly holding), then include the control variables;
3. Possibly, use panel data in which each entity (individual) is observed more than once;
4. If the omitted variable(s) cannot be measured, use instrumental variables regression;
5. Run a randomized controlled experiment.
Why does this work? Remember, if X is randomly assigned, then X necessarily will be distributed independently of u; thus $E(u \mid X = x) = 0$.
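The argument for solution 5 can be checked numerically. The sketch below is illustrative only and assumes a hypothetical data-generating process: treatment X is assigned by a coin flip, an unobserved factor W affects Y but is left out of the regression, and the helper `ols_slope` is defined here rather than taken from any library. Because X is independent of W by construction, the slope of Y on X alone should still be close to the true effect.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(6)
n = 50_000
true_effect = 2.0               # hypothetical causal effect of X on Y

x = [rng.randint(0, 1) for _ in range(n)]     # random assignment (coin flip)
w = [rng.gauss(0, 1) for _ in range(n)]       # unobserved determinant of Y
y = [1.0 + true_effect * xi + 1.0 * wi + rng.gauss(0, 1)
     for xi, wi in zip(x, w)]

# W is omitted, but it is independent of X, so condition (ii) for OV bias fails.
slope_rct = ols_slope(x, y)
print(f"estimated effect with W omitted: {slope_rct:.3f} (true: {true_effect})")
```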

2. Wrong functional form (functional form misspecification)
Arises if the functional form is incorrect, for example if an interaction term is incorrectly omitted; then inferences on causal effects will be biased.
Solutions to functional form misspecification
1. Continuous dependent variable: use the "appropriate" nonlinear specifications in X (logarithms, interactions, etc.)
2. Discrete (example: binary) dependent variable: need an extension of multiple regression methods ("probit" or "logit" analysis for binary dependent variables).
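A noise-free toy example makes the omitted-interaction case concrete. Everything in this sketch is assumed for illustration (the coefficients, the binary indicator D, and the helper `ols_slope`): the true slope on X is 3 when D = 0 and 4.5 when D = 1, and a regression of Y on X alone reports a single pooled slope that is correct for neither group.

```python
def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical true model with an interaction: Y = 2 + 3 X + 1.5 (X * D), no noise.
x, d, y = [], [], []
for di in (0, 1):
    for xi in range(10):
        x.append(float(xi))
        d.append(di)
        y.append(2.0 + 3.0 * xi + 1.5 * xi * di)

# Misspecified regression: Y on X only, interaction omitted.
slope_pooled = ols_slope(x, y)

# Correct group-specific slopes, estimated separately for D = 0 and D = 1.
slope_d0 = ols_slope([a for a, g in zip(x, d) if g == 0],
                     [b for b, g in zip(y, d) if g == 0])
slope_d1 = ols_slope([a for a, g in zip(x, d) if g == 1],
                     [b for b, g in zip(y, d) if g == 1])

print(slope_pooled, slope_d0, slope_d1)
```

With this balanced design the pooled slope is the average of the two group slopes, 3.75, so the estimated effect of X is too large for one group and too small for the other.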

3. Errors-in-variables bias
So far we have assumed that X is measured without error. In reality, economic data often have measurement error:
- Data entry errors in administrative data
- Recollection errors in surveys (when did you start your current job?)
- Ambiguous questions (what was your income last year?)
- Intentionally false response problems with surveys (What is the current value of your financial assets? How often do you drink and drive?)

Errors-in-variables bias, ctd.
In general, measurement error in a regressor results in "errors-in-variables" bias.
A bit of math shows that errors-in-variables typically leads to correlation between the measured variable and the regression error. Consider the single-regressor model:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
and suppose $E(u_i \mid X_i) = 0$. Let
$X_i$ = unmeasured true value of X
$\tilde{X}_i$ = mis-measured version of X (the observed data)

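Then
$Y_i = \beta_0 + \beta_1 X_i + u_i = \beta_0 + \beta_1 \tilde{X}_i + [\beta_1 (X_i - \tilde{X}_i) + u_i]$
So the regression you run is
$Y_i = \beta_0 + \beta_1 \tilde{X}_i + \tilde{u}_i$, where $\tilde{u}_i = \beta_1 (X_i - \tilde{X}_i) + u_i$
With measurement error, typically $\tilde{X}_i$ is correlated with $\tilde{u}_i$, so $\hat{\beta}_1$ is biased:
$\mathrm{cov}(\tilde{X}_i, \tilde{u}_i) = \mathrm{cov}(\tilde{X}_i, \beta_1 (X_i - \tilde{X}_i) + u_i) = \beta_1 \mathrm{cov}(\tilde{X}_i, X_i - \tilde{X}_i) + \mathrm{cov}(\tilde{X}_i, u_i)$
It is often plausible that $\mathrm{cov}(\tilde{X}_i, u_i) = 0$ (if $E(u_i \mid X_i) = 0$, then $\mathrm{cov}(\tilde{X}_i, u_i) = 0$ if the measurement error in $\tilde{X}_i$ is uncorrelated with $u_i$). But typically $\mathrm{cov}(\tilde{X}_i, X_i - \tilde{X}_i) \neq 0$.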

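Errors-in-variables bias, ctd.
$Y_i = \beta_0 + \beta_1 \tilde{X}_i + \tilde{u}_i$, where $\tilde{u}_i = \beta_1 (X_i - \tilde{X}_i) + u_i$
$\mathrm{cov}(\tilde{X}_i, \tilde{u}_i) = \mathrm{cov}(\tilde{X}_i, \beta_1 (X_i - \tilde{X}_i) + u_i)$
$= \beta_1 \mathrm{cov}(\tilde{X}_i, X_i - \tilde{X}_i) + \mathrm{cov}(\tilde{X}_i, u_i)$
$= \beta_1 \mathrm{cov}(\tilde{X}_i, X_i - \tilde{X}_i)$ if $\mathrm{cov}(\tilde{X}_i, u_i) = 0$
To get some intuition for the problem, consider two special cases:
A. Classical measurement error
B. "Best guess" measurement error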

A. Classical measurement error
The classical measurement error model assumes that
$\tilde{X}_i = X_i + v_i$,
where $v_i$ is mean-zero random noise with $\mathrm{corr}(X_i, v_i) = 0$ and $\mathrm{corr}(u_i, v_i) = 0$.
Under the classical measurement error model, $\hat{\beta}_1$ is biased towards zero. Here's the idea: suppose you take the true variable and then add a huge amount of random noise (random numbers generated by the computer). In the limit of "all noise," $\tilde{X}_i$ will be unrelated to $Y_i$ (and to everything else), so the regression coefficient will have expectation zero. If $\tilde{X}_i$ has some noise but isn't "all noise," then the relation between $\tilde{X}_i$ and $Y_i$ will be attenuated, so $\hat{\beta}_1$ is biased towards zero.

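Classical measurement error: the math
$\tilde{X}_i = X_i + v_i$, where $\mathrm{corr}(X_i, v_i) = 0$ and $\mathrm{corr}(u_i, v_i) = 0$.
Then
$\mathrm{var}(\tilde{X}_i) = \sigma_X^2 + \sigma_v^2$
$\mathrm{cov}(\tilde{X}_i, X_i - \tilde{X}_i) = \mathrm{cov}(X_i + v_i, -v_i) = -\sigma_v^2$
so
$\mathrm{cov}(\tilde{X}_i, \tilde{u}_i) = -\beta_1 \sigma_v^2$
so
$\hat{\beta}_1 \xrightarrow{p} \beta_1 - \beta_1 \frac{\sigma_v^2}{\sigma_{\tilde{X}}^2} = \left(1 - \frac{\sigma_v^2}{\sigma_{\tilde{X}}^2}\right) \beta_1 = \left(\frac{\sigma_{\tilde{X}}^2 - \sigma_v^2}{\sigma_{\tilde{X}}^2}\right) \beta_1 = \left(\frac{\sigma_X^2}{\sigma_X^2 + \sigma_v^2}\right) \beta_1$
so $\hat{\beta}_1$ is biased towards zero.
The classical measurement error model is special because it assumes $\mathrm{corr}(X_i, v_i) = 0$.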
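The attenuation factor $\sigma_X^2 / (\sigma_X^2 + \sigma_v^2)$ can be checked with a short simulation. This is a sketch under assumed values (standard normal X, noise with the same variance, $\beta_1 = 2$); the helper `ols_slope` and all variable names are illustrative, not from the slides. With equal variances the factor is 1/2, so the slope on the noisy regressor should be about 1.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
n = 50_000
beta0, beta1 = 1.0, 2.0          # hypothetical true coefficients
sigma_x, sigma_v = 1.0, 1.0      # std. dev. of true X and of the measurement noise

x_true = [rng.gauss(0, sigma_x) for _ in range(n)]
y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x_true]
x_obs = [xi + rng.gauss(0, sigma_v) for xi in x_true]   # classical error: X + v

slope_true = ols_slope(x_true, y)    # regression on the true regressor
slope_obs = ols_slope(x_obs, y)      # regression on the mis-measured regressor

attenuation = sigma_x ** 2 / (sigma_x ** 2 + sigma_v ** 2)
predicted = attenuation * beta1      # probability limit from the formula above

print(f"true X: {slope_true:.3f}, noisy X: {slope_obs:.3f}, formula: {predicted:.3f}")
```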

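B. "Best Guess" measurement error
Suppose the respondent doesn't remember $X_i$, but makes a best guess of the form $\tilde{X}_i = E(X_i \mid W_i)$, where $E(u_i \mid W_i) = 0$. Then,
$\mathrm{cov}(\tilde{X}_i, \tilde{u}_i) = \mathrm{cov}(\tilde{X}_i, \beta_1 (X_i - \tilde{X}_i) + u_i) = \beta_1 \mathrm{cov}(\tilde{X}_i, X_i - \tilde{X}_i) + \mathrm{cov}(\tilde{X}_i, u_i)$
- $\mathrm{cov}(\tilde{X}_i, X_i - \tilde{X}_i) = 0$ because $\tilde{X}_i = E(X_i \mid W_i)$ (because $\tilde{X}_i$ is the best guess, the error $X_i - \tilde{X}_i$ is uncorrelated with $\tilde{X}_i$).
- $\mathrm{cov}(\tilde{X}_i, u_i) = 0$ because $E(u_i \mid W_i) = 0$ ($\tilde{X}_i$ is a function of $W_i$ and $E(u_i \mid W_i) = 0$).
Thus $\mathrm{cov}(\tilde{X}_i, \tilde{u}_i) = 0$, so $\hat{\beta}_1$ is unbiased.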

"Best guess" measurement error model, ctd.
Under the "Best Guess" model, you still have measurement error (you don't observe the true value of $X_i$), but this measurement error doesn't introduce bias into $\hat{\beta}_1$!
The "best guess" model is extreme: it isn't enough to make a good guess, you need the "best" guess $\tilde{X}_i = E(X_i \mid W_i)$, that is, the conditional expectation of X given W, where $E(u_i \mid W_i) = 0$.
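A simulation sketch of the best-guess case, under an assumed setup that is not in the slides: the true regressor is $X_i = W_i + e_i$ with $e_i$ independent of $W_i$, so the conditional expectation $E(X_i \mid W_i)$ is simply $W_i$. Regressing Y on that best guess should give a slope close to the true $\beta_1$, even though the guess differs from the true X for every observation. The helper `ols_slope` is defined locally.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)
n = 50_000
beta0, beta1 = 1.0, 2.0                       # hypothetical true coefficients

w = [rng.gauss(0, 1) for _ in range(n)]       # information the respondent has
x_true = [wi + rng.gauss(0, 1) for wi in w]   # true X = W + e, so E(X | W) = W
y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x_true]

x_guess = w                                   # best guess: E(X | W)
slope_guess = ols_slope(x_guess, y)

print(f"slope using the best guess: {slope_guess:.3f} (true beta1: {beta1})")
```

The cost of the measurement error here is precision, not bias: the regression error now also contains $\beta_1 e_i$, so the estimate is noisier than it would be with the true X.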

Lessons from the classical and best-guess models:
- The amount of bias in $\hat{\beta}_1$ depends on the nature of the measurement error; these models are two special cases.
- If there is pure noise added to $X_i$, then $\hat{\beta}_1$ is biased towards zero.
- The "best guess" model is extreme. In general, if you think there is measurement error, you should worry about measurement error bias.
- The potential importance of measurement error bias depends on how the data are collected.
  o Some administrative data (e.g. number of teachers in a school district) are often quite accurate.
  o Survey data on sensitive questions (how much do you earn?) often have considerable measurement error.

Solutions to errors-in-variables bias
1. Obtain better data (often easier said than done).
2. Develop a specific model of the measurement error process. This is only possible if a lot is known about the nature of the measurement error, for example when a subsample of the data are cross-checked using administrative records and the discrepancies are analyzed and modeled. (Very specialized; we won't pursue this here.)
3. Instrumental variables regression.

4. Missing data and sample selection bias
Data are often missing. Sometimes missing data introduces bias, sometimes it doesn't. It is useful to consider three cases:
1. Data are missing at random.
2. Data are missing based on the value of one or more X's.
3. Data are missing based in part on the value of Y or u.
Cases 1 and 2 don't introduce bias: the standard errors are larger than they would be if the data weren't missing, but $\hat{\beta}_1$ is unbiased.
Case 3 introduces "sample selection" bias.

Missing data: Case 1
1. Data are missing at random
Suppose you took a simple random sample of 100 workers and recorded the answers on paper, but your dog ate 20 of the response sheets (selected at random) before you could enter them into the computer. This is equivalent to your having taken a simple random sample of 80 workers (think about it), so your dog didn't introduce any bias.

Missing data: Case 2
2. Data are missing based on a value of one of the X's
In the test score/class size application, suppose you restrict your analysis to the subset of school districts with STR < 20. By only considering districts with small class sizes you won't be able to say anything about districts with large class sizes, but focusing on just the small-class districts doesn't introduce bias. This is equivalent to having missing data, where the data are missing if STR ≥ 20. More generally, if data are missing based only on values of X's, the fact that data are missing doesn't bias the OLS estimator.

Missing data: Case 3
3. Data are missing based in part on the value of Y or u
In general this type of missing data does introduce bias into the OLS estimator. This type of bias is also called sample selection bias.
Sample selection bias arises when a selection process:
(i) influences the availability of data and
(ii) is related to the dependent variable.
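The three missing-data cases can be compared side by side in a simulation. The following is a sketch under assumed values: the model $Y = 1 + 2X + u$, the 20% random loss, and the cutoffs 0.5 (on X) and 1.0 (on Y) are arbitrary illustrative choices, and `ols_slope` and `slope_of` are local helpers. Only selection on Y should move the slope away from 2.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope_of(sample):
    """OLS slope for a list of (x, y) pairs."""
    xs, ys = zip(*sample)
    return ols_slope(xs, ys)

rng = random.Random(3)
n = 50_000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]   # hypothetical true slope = 2
pairs = list(zip(x, y))

case1 = [p for p in pairs if rng.random() >= 0.2]    # 20% lost at random
case2 = [p for p in pairs if p[0] < 0.5]             # kept only if X is small
case3 = [p for p in pairs if p[1] > 1.0]             # kept only if Y is large

slope_case1 = slope_of(case1)
slope_case2 = slope_of(case2)
slope_case3 = slope_of(case3)

print(f"missing at random: {slope_case1:.3f}")
print(f"missing based on X: {slope_case2:.3f}")
print(f"missing based on Y: {slope_case3:.3f}")
```

In the third case, observations with low X survive only when u is unusually large, which induces a negative correlation between X and u within the selected sample and pulls the slope below 2.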

Example #1: Height of undergraduates
Your stats prof asks you to estimate the mean height of undergraduate males. You collect your data (obtain your sample) by standing outside the basketball team's locker room and recording the height of the undergraduates who enter.
- Is this a good design? Will it yield an unbiased estimate of undergraduate height?
- Formally, you have sampled individuals in a way that is related to the outcome Y (height), which results in bias.

Example #2: Mutual funds
Do actively managed mutual funds outperform "hold-the-market" funds?
Empirical strategy:
  o Sampling scheme: simple random sampling of mutual funds available to the public on a given date.
  o Data: returns for the preceding 10 years.
  o Estimator: average ten-year return of the sample mutual funds, minus ten-year return on S&P500
  o Is there sample selection bias? (Equivalently, are data missing based in part on the value of Y or u?)
  o How is this example like the basketball player example?

Sample selection bias induces correlation between a regressor and the error term.
Mutual fund example:
$return_i = \beta_0 + \beta_1\, managed\_fund_i + u_i$
Being a managed fund in the sample ($managed\_fund_i = 1$) means that your return was better than that of failed managed funds, which are not in the sample, so $\mathrm{corr}(managed\_fund_i, u_i) \neq 0$.
The surviving mutual funds are the "basketball players" of mutual funds.
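The survivorship mechanism can be sketched in a few lines. The setup is entirely hypothetical: managed funds are assumed to have zero true mean excess return over the market, and any fund whose excess return falls below an arbitrary threshold is assumed to close and drop out of the sample. Averaging over survivors alone then suggests that managed funds beat the market.

```python
import random

rng = random.Random(4)
n_funds = 50_000

# Hypothetical excess return of each managed fund over the market; true mean is 0.
excess = [rng.gauss(0.0, 1.0) for _ in range(n_funds)]

# Assumed failure rule: funds with excess return below -0.5 close and disappear.
survivors = [r for r in excess if r > -0.5]

mean_all = sum(excess) / len(excess)               # what a sample at the start would show
mean_survivors = sum(survivors) / len(survivors)   # what a sample at the end shows

print(f"all funds:      {mean_all:+.3f}")
print(f"survivors only: {mean_survivors:+.3f}")
```

Sampling funds that exist at the end of the period is selection on Y, so the survivor average is positive even though no fund in this simulation has any true advantage.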

Example #3: returns to education
What is the return to an additional year of education?
Empirical strategy:
  o Sampling scheme: simple random sample of employed college grads (employed, so we have wage data)
  o Data: earnings and years of education
  o Estimator: regress ln(earnings) on years_education
  o Ignore issues of omitted variable bias and measurement error. Is there sample selection bias?
  o How does this relate to the basketball player example?

Solutions to sample selection bias
- Collect the sample in a way that avoids sample selection.
  o Basketball player example: obtain a true random sample of undergraduates, e.g. select students at random from the enrollment administrative list.
  o Mutual funds example: change the sample population from those available at the end of the ten-year period to those available at the beginning of the period (include failed funds)
  o Returns to education example: sample college graduates, not workers (include the unemployed)
- Randomized controlled experiment.
- Construct a model of the sample selection problem and estimate that model (we won't do this).

5. Simultaneous causality bias
So far we have assumed that X causes Y. What if Y causes X, too?
Example: Class size effect
- Low STR results in better test scores
- But suppose districts with low test scores are given extra resources: as a result of a political process they also have low STR
- What does this mean for a regression of TestScore on STR?

Simultaneous causality bias in equations
(a) Causal effect on Y of X: $Y_i = \beta_0 + \beta_1 X_i + u_i$
(b) Causal effect on X of Y: $X_i = \gamma_0 + \gamma_1 Y_i + v_i$
- Large $u_i$ means large $Y_i$, which implies large $X_i$ (if $\gamma_1 > 0$)
- Thus $\mathrm{corr}(X_i, u_i) \neq 0$
- Thus $\hat{\beta}_1$ is biased and inconsistent.
- Example: A district with particularly bad test scores given the STR (negative $u_i$) receives extra resources, thereby lowering its STR; so $STR_i$ and $u_i$ are correlated
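Equations (a) and (b) can be simulated by solving the two-equation system for X and Y given the shocks. The sketch below assumes hypothetical parameter values ($\beta_1 = -1$, $\gamma_1 = 0.5$, unit-variance independent shocks) and a local helper `ols_slope`. Substituting (a) into (b) gives $X_i = (\gamma_0 + \gamma_1 \beta_0 + \gamma_1 u_i + v_i)/(1 - \gamma_1 \beta_1)$, so $X_i$ depends on $u_i$ and OLS converges to $\beta_1 + \mathrm{cov}(X,u)/\mathrm{var}(X)$ instead of $\beta_1$.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(5)
n = 50_000
beta0, beta1 = 0.0, -1.0     # hypothetical causal effect of X on Y
gamma0, gamma1 = 0.0, 0.5    # hypothetical feedback from Y to X

x, y = [], []
for _ in range(n):
    u = rng.gauss(0, 1)
    v = rng.gauss(0, 1)
    # Reduced form for X, obtained by substituting (a) into (b).
    xi = (gamma0 + gamma1 * beta0 + gamma1 * u + v) / (1 - gamma1 * beta1)
    yi = beta0 + beta1 * xi + u
    x.append(xi)
    y.append(yi)

slope_ols = ols_slope(x, y)

# Large-sample OLS limit with unit-variance shocks: beta1 + cov(X, u) / var(X).
d = 1 - gamma1 * beta1
plim_ols = beta1 + (gamma1 / d) / ((gamma1 ** 2 + 1) / d ** 2)

print(f"OLS slope: {slope_ols:.3f}, formula: {plim_ols:.3f}, true beta1: {beta1}")
```

With these values OLS settles near -0.4 rather than -1: the feedback makes the estimated class size effect look much weaker than the causal effect assumed in the simulation.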

Solutions to simultaneous causality bias
1. Run a randomized controlled experiment. Because $X_i$ is chosen at random by the experimenter, there is no feedback from the outcome variable to $X_i$ (assuming perfect compliance).
2. Develop and estimate a complete model of both directions of causality. This is the idea behind many large macro models (e.g. Federal Reserve Bank-US). This is extremely difficult in practice.
3. Use instrumental variables regression to estimate the causal effect of interest (effect of X on Y, ignoring effect of Y on X).

Internal and External Validity When the Regression is Used for Forecasting (SW Section 9.3)
Forecasting and estimation of causal effects are quite different objectives.
For forecasting,
  o $R^2$ matters (a lot!)
  o Omitted variable bias isn't a problem!
  o Interpreting coefficients in forecasting models is not important; the important thing is a good fit and a model you can "trust" to work in your application
  o External validity is paramount: the model estimated using historical data must hold into the (near) future
  o More on forecasting when we take up time series data

Applying External and Internal Validity: Test Scores and Class Size (SW Section 9.4)
Objective: Assess the threats to the internal and external validity of the empirical analysis of the California test score data.
- External validity
  o Compare results for California and Massachusetts
  o Think hard
- Internal validity
  o Go through the list of five potential threats to internal validity and think hard

Check of external validity
We will compare the California study to one using Massachusetts data.
The Massachusetts data set
- 220 elementary school districts
- Test: 1998 MCAS test, fourth grade total (Math + English + Science)
- Variables: STR, TestScore, PctEL, LunchPct, Income

The Massachusetts data: summary statistics (table not reproduced)

Test scores vs. Income & regression lines: Massachusetts data (figure not reproduced)


How do the Mass and California results compare?
- Logarithmic v. cubic function for STR?
- Evidence of nonlinearity in TestScore-STR relation?
- Is there a significant HiEL × STR interaction?

Predicted effects for a class size reduction of 2
Linear specification for Mass:
$\widehat{TestScore} = 744.0 - 0.64\,STR - 0.437\,PctEL - 0.582\,LunchPct - 3.07\,Income + 0.164\,Income^2 - 0.0022\,Income^3$
Standard errors: intercept (21.3), STR (0.27), PctEL (0.303), LunchPct (0.097), Income (2.35), Income^2 (0.085), Income^3 (0.0010)
Estimated effect = $-0.64 \times (-2) = 1.28$
Standard error = $2 \times 0.27 = 0.54$
NOTE: $\mathrm{var}(aY) = a^2 \mathrm{var}(Y)$; $SE(a\hat{\beta}_1) = |a|\,SE(\hat{\beta}_1)$
95% CI = $1.28 \pm 1.96 \times 0.54 = (0.22, 2.34)$
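The arithmetic on this slide can be reproduced directly. The snippet below is a small sketch that uses only the STR coefficient and its standard error as reported for the linear Massachusetts specification; the variable names are illustrative.

```python
coef_str = -0.64      # estimated coefficient on STR (linear specification)
se_str = 0.27         # its standard error
delta_str = -2        # class size reduction of 2 students per teacher

# Predicted change in test scores: coefficient times the change in STR.
effect = coef_str * delta_str

# SE(a * beta_hat) = |a| * SE(beta_hat)
se_effect = abs(delta_str) * se_str

# 95% confidence interval using the normal critical value 1.96.
ci_low = effect - 1.96 * se_effect
ci_high = effect + 1.96 * se_effect

print(f"effect = {effect:.2f}, SE = {se_effect:.2f}, "
      f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```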

Computing predicted effects in nonlinear models
Use the "before" and "after" method:
$\widehat{TestScore} = 655.5 + 12.4\,STR - 0.680\,STR^2 + 0.0115\,STR^3 - 0.434\,PctEL - 0.587\,LunchPct - 3.48\,Income + 0.174\,Income^2 - 0.0023\,Income^3$
Estimated effect of a reduction from 20 students to 18 (predicted score "after" minus predicted score "before"):
$\Delta\widehat{TestScore} = [12.4 \times 18 - 0.680 \times 18^2 + 0.0115 \times 18^3] - [12.4 \times 20 - 0.680 \times 20^2 + 0.0115 \times 20^3] = 1.98$
- compare with the estimate from the linear model of 1.28
- SE of this estimated effect: use the "rearrange the regression" ("transform the regressors") method
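The "before and after" calculation can be written as a short function that evaluates only the STR terms, since the other regressors are held fixed and cancel in the difference. This sketch uses the rounded coefficients printed on the slide; the function name `str_part` is illustrative. With these rounded values the difference (predicted score at STR = 18 minus predicted score at STR = 20) comes out at about 1.95, slightly below the 1.98 reported on the slide, which presumably reflects unrounded coefficients.

```python
def str_part(s):
    """Contribution of the cubic STR terms to the predicted test score."""
    return 12.4 * s - 0.680 * s ** 2 + 0.0115 * s ** 3

before = str_part(20)   # class size before the reduction
after = str_part(18)    # class size after the reduction

# Controls (PctEL, LunchPct, Income terms) are unchanged, so they drop out.
effect_cubic = after - before

print(f"before = {before:.3f}, after = {after:.3f}, effect = {effect_cubic:.3f}")
```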

Summary of Findings for Massachusetts
- The coefficient on STR falls in magnitude from -1.72 to -0.69 when control variables for student and district characteristics are included, an indication that the original estimate contained omitted variable bias.
- The class size effect is statistically significant at the 1% significance level, after controlling for student and district characteristics
- No statistical evidence of nonlinearities in the TestScore-STR relation
- No statistical evidence of STR × PctEL interaction

Comparison of estimated class size effects: CA vs. MA (table not reproduced)

Summary: Comparison of California and Massachusetts Regression Analyses
- Class size effect falls in both CA, MA data when student and district control variables are added.
- Class size effect is statistically significant in both CA, MA data.
- Estimated effect of a 2-student reduction in STR is quantitatively similar for CA, MA.
- Neither data set shows evidence of STR × PctEL interaction.
- Some evidence of STR nonlinearities in CA data, but not in MA data.

Step back: what are the remaining threats to internal validity in the test score/class size example?
1. Omitted variable bias?
What causal factors are missing?
- Student characteristics such as native ability
- Access to outside learning opportunities
- Other district quality measures such as teacher quality
The regressions attempt to control for these omitted factors using control variables that are not necessarily causal but are correlated with the omitted causal variables:
- district demographics (income, % free lunch eligible)
- Fraction of English learners

Omitted variable bias, ctd.
Are the control variables effective? That is, after including the control variables, is the error term uncorrelated with STR?
- Answering this requires using judgment.
- There is some evidence that the control variables might be doing their job:
  o The STR coefficient doesn't change much when the control variables specifications change
  o The results for California and Massachusetts are similar, so if there is OV bias remaining, that OV bias would need to be similar in the two data sets
- What additional control variables might you want to use, and what would they be controlling for?

2. Wrong functional form?
- We have tried quite a few different functional forms, in both the California and Mass. data
- Nonlinear effects are modest
- Plausibly, this is not a major threat at this point.
3. Errors-in-variables bias?
- The data are administrative, so it is unlikely that there are substantial reporting/typo type errors.
- STR is a district-wide measure, so students who take the test might not have experienced the measured STR for the district: a complicated type of measurement error
- Ideally we would like data on individual students, by grade level.

4. Sample selection bias?
- Sample is all elementary public school districts (in California and in Mass.); there are no missing data
- No reason to think that selection is a problem.
5. Simultaneous causality bias?
- School funding equalization based on test scores could cause simultaneous causality.
- This was not in place in California or Mass. during these samples, so simultaneous causality bias is arguably not important.

Additional example for class discussion
Does appearing on America's Most Wanted TV show increase your chance of being caught?
Reference: Thomas Miles (2005), "Estimating the Effect of America's Most Wanted: A Duration Analysis of Wanted Fugitives," Journal of Law and Economics, 281-306.
- Observational unit: Fugitive criminals
- Sampling scheme: 1200 male fugitives, from FBI, NYCPD, LAPD, PhilaPD, USPS, Federal Marshals Web sites (all data were downloaded from the Web!)
- Dependent variable: length of spell (years until capture)
- Regressors:
  o Appearance on America's Most Wanted (175 of the 1200) (then airing on Fox, Saturdays, 9pm)
  o type of offence, personal characteristics

America's Most Wanted: Threats to Internal and External Validity
External validity: what would you want to extrapolate the results to? Having the show air longer? Putting on a second show of the same type? Be precise.
Internal validity: how important are these threats?
1. Omitted variable bias
2. Wrong functional form
3. Errors-in-variables bias
4. Sample selection bias
5. Simultaneous causality bias
Anything else?