AN OVERVIEW OF INSTRUMENTAL VARIABLES*

KENNETH A. BOLLEN
CAROLINA POPULATION CENTER, UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL

*Based on Bollen, K.A. (2012). Instrumental Variables in Sociology and the Social Sciences. Annual Review of Sociology 38:37-72.

OUTLINE
I. INTRODUCTION
II. WHAT ARE INSTRUMENTAL VARIABLES (IVs)?
III. ORIGINS OF INSTRUMENTAL VARIABLE METHODS
IV. APPLICATION AREAS
V. FINDING INSTRUMENTAL VARIABLES
VI. EVALUATING INSTRUMENTAL VARIABLES
VII. HETEROGENEOUS CAUSAL EFFECTS
VIII. CONCLUSIONS

INTRODUCTION
- Many reasons for equation error to correlate with covariate
  - Can create biased/inconsistent estimator
- Instrumental Variables (IVs) can help
- IV methods appear in many social sciences
  - Spreading through even more disciplines
- Purpose of presentation
  - Broad overview of IVs
  - Give advantages & disadvantages
  - Present methods to assess quality of IVs

WHAT ARE INSTRUMENTAL VARIABLES (IVS)?
The Problem: $\mathrm{COV}(X, \varepsilon_Y) \neq 0$ in
$$Y_i = \alpha + \beta X_i + \varepsilon_{Yi}$$

WHAT ARE INSTRUMENTAL VARIABLES (IVS)?
Ordinary Least Squares applied (simple regression)
$$\beta_{OLS} = \frac{\mathrm{COV}(X, Y)}{\mathrm{VAR}(X)} \neq \beta$$
- Biased estimator if the correlation of the error with X is ignored

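WHAT ARE INSTRUMENTAL VARIABLES (IVS)?
The Instrumental Variable Solution (simple regression)
For variable Z to be an IV:
$$\mathrm{COV}(Z, \varepsilon_Y) = 0, \qquad \mathrm{COV}(Z, X) \neq 0$$
$$\mathrm{COV}(Y, Z) = \mathrm{COV}(\alpha + \beta X + \varepsilon_Y, Z) = \beta\,\mathrm{COV}(X, Z)$$
$$\beta_{IV} = \frac{\mathrm{COV}(Y, Z)}{\mathrm{COV}(X, Z)} = \beta$$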
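The covariance-ratio argument on this slide can be checked numerically. The following is a minimal sketch on simulated data, not part of the original presentation: the data-generating values (true slope 2.0, the 0.8 loading of the error on X, the sample size, the seed) and the helper names `cov` and `simulate` are assumptions chosen only for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

def simulate(n=20000, alpha=1.0, beta=2.0, seed=0):
    """Hypothetical data: X shares the equation error with Y, Z does not."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                  # instrument: COV(Z, e) = 0
        e = rng.gauss(0, 1)                  # equation error
        x = z + 0.8 * e + rng.gauss(0, 1)    # COV(X, e) = 0.8 and COV(Z, X) = 1
        xs.append(x)
        ys.append(alpha + beta * x + e)
        zs.append(z)
    return xs, ys, zs

xs, ys, zs = simulate()
beta_ols = cov(xs, ys) / cov(xs, xs)   # COV(X, Y) / VAR(X)
beta_iv = cov(ys, zs) / cov(xs, zs)    # COV(Y, Z) / COV(X, Z)
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}  (true beta = 2.0)")
```

Under these assumed values VAR(X) = 2.64 and COV(X, ε) = 0.8, so the OLS ratio converges to about β + 0.30, while the IV ratio converges to β.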

WHAT ARE INSTRUMENTAL VARIABLES (IVS)?
Ordinary Least Squares applied (multiple regression)
$$Y = X\beta + \varepsilon_Y$$
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon_Y$$
- Separates X into 2 parts, $X_1$ and $X_2$, where
  - $X_1$ correlates with $\varepsilon_Y$: problem!
  - $X_2$ does not correlate with $\varepsilon_Y$
$$\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$$
- $\hat{\beta}_{OLS}$ is a biased (& inconsistent) estimator of $\beta$

WHAT ARE INSTRUMENTAL VARIABLES (IVS)?
Ordinary Least Squares applied (multiple regression)
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon_Y$$
- Y = birth weight
- $X_1$ = smoking, drinking
- $X_2$ = age, race, first child
- $\hat{\beta}_{OLS}$ is a biased (& inconsistent) estimator of $\beta$ (estimates of the effects of all variables are biased to an unknown degree)

WHAT ARE INSTRUMENTAL VARIABLES (IVS)?
The Instrumental Variable Solution (multiple regression)
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon_Y$$
[recall $C(X_1, \varepsilon_Y) \neq 0$; $C(X_2, \varepsilon_Y) = 0$]
$$Z = [X_2 \; X_3] \quad \text{where } C(X_3, \varepsilon_Y) = 0$$
$$\hat{\beta}_{IV} = (X'P_Z X)^{-1} X'P_Z Y \quad \text{where } P_Z = Z(Z'Z)^{-1}Z'$$
- $\hat{\beta}_{IV}$ is a consistent (asymptotically unbiased) estimator of $\beta$

WHAT ARE INSTRUMENTAL VARIABLES (IVS)?
Instrumental Variable Solution (multiple regression)
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon_Y$$
- Y = birth weight
- $X_1$ = smoking, drinking
- $X_2$ = age, race, 1st child
- IVs are in $Z = [X_2 \; X_3]$
- Need $X_3$
  - $X_3$ = tobacco & alcohol receipts, smoke & drink history, spouse reports, DUI tickets
$$\hat{\beta}_{IV} = (X'P_Z X)^{-1} X'P_Z Y \quad \text{where } P_Z = Z(Z'Z)^{-1}Z'$$
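The matrix formula for the IV estimator can be computed in two least-squares stages, because $X'P_Z X = \hat{X}'\hat{X}$ and $X'P_Z Y = \hat{X}'Y$ with $\hat{X} = P_Z X$. The sketch below is an illustration only: it uses simulated data with made-up coefficients (not the birth weight data), and the helper names `solve`, `ols`, `project`, and `two_sls` are hypothetical, written with the Python standard library so the steps stay visible.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ols(cols, y):
    """Coefficients from regressing y on a list of columns (normal equations)."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)

def project(cols, target):
    """P_Z applied to target: fitted values from regressing target on cols."""
    b = ols(cols, target)
    return [sum(bj * col[i] for bj, col in zip(b, cols)) for i in range(len(target))]

def two_sls(y, x1, x2, x3):
    """beta_IV = (X'P_Z X)^-1 X'P_Z Y with X = [X1 X2] and Z = [X2 X3]."""
    z = x2 + x3
    x1_hat = [project(z, col) for col in x1]   # P_Z X1; P_Z X2 = X2 because X2 is in Z
    return ols(x1_hat + x2, y)

# Simulated (assumed) data: one endogenous X1, a constant and one exogenous X2, two IVs in X3
rng = random.Random(1)
n = 5000
const = [1.0] * n
x2 = [rng.gauss(0, 1) for _ in range(n)]
iv_a = [rng.gauss(0, 1) for _ in range(n)]
iv_b = [rng.gauss(0, 1) for _ in range(n)]
err = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * a + b + 0.5 * c + 0.8 * e + rng.gauss(0, 1)
      for a, b, c, e in zip(x2, iv_a, iv_b, err)]
y = [1.0 + 2.0 * v + -1.0 * a + e for v, a, e in zip(x1, x2, err)]

beta_iv = two_sls(y, [x1], [const, x2], [iv_a, iv_b])   # order: x1, const, x2
beta_ols = ols([x1, const, x2], y)
print("2SLS:", [round(b, 3) for b in beta_iv])
print("OLS: ", [round(b, 3) for b in beta_ols])
```

The true coefficients in this assumed design are 2.0 (X1), 1.0 (constant), and -1.0 (X2). In applied work the packaged routines shown on the software slides would be used instead, since they also report appropriate standard errors.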

WHAT ARE INSTRUMENTAL VARIABLES (IVS)?
IV procedures in Stata, SAS, etc.
Stata:
    ivregress 2sls brthwght age race frstchld (smoke drink = hissmok hisdrnk spsmoke spdrink DUI)
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon_Y$$
- Y = birth weight
- $X_1$ = smoking, drinking
- $X_2$ = age, race, 1st child
- $X_3$ = smoke & drink history, spouse reports, DUI tickets

WHAT ARE INSTRUMENTAL VARIABLES (IVS)?
IV procedures in Stata, SAS, etc.
SAS:
    proc syslin 2sls;
      endogenous brthwght smoke drink;
      instruments age race frstchld hissmok hisdrnk spsmoke spdrink DUI;
      model brthwght = age race frstchld smoke drink;
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon_Y$$
- Y = birth weight
- $X_1$ = smoking, drinking
- $X_2$ = age, race, 1st child
- $X_3$ = smoke & drink history, spouse reports, DUI tickets

ORIGINS OF INSTRUMENTAL VARIABLE METHODS
- Sewall Wright (1925, Corn and Hog Correlations, US Dept Agric Bull.)
  - Goldberger (1972) credits Sewall
- Philip Wright (1928, The Tariff on Animal and Vegetable Oils)
  - Appendix B has IVs in supply & demand problem
- Which Wright is right? Controversy
- Prior to Goldberger (1972) many gave credit to Reiersøl (1941, 1945)

APPLICATION AREAS
Simultaneous Equation Models
- Early ones in path analysis
- Highly developed in econometrics
- Two or more dependent variables
- Assume no measurement error
$$Y = \alpha + BY + \Gamma X + \varepsilon$$
where Y = endogenous vars., X = exogenous vars., $\alpha$ = intercepts, B = coefficients, $\Gamma$ = coefficients, $\varepsilon$ = errors

APPLICATION AREAS
Simultaneous Equation Models
Felson & Bohrnstedt (1979)
[Path diagram: observed variables GPA, height, weight, rating, academic, attract; disturbances $\zeta_1$ and $\zeta_2$]

APPLICATION AREAS
Simultaneous Equation Models
Consider academic equation:
$$\text{academic} = \alpha_1 + \beta_1\,\text{attract} + \beta_2\,\text{GPA} + \varepsilon_1$$
- $X = [X_1 \; X_2]$, $X_1$ = attract, $X_2$ = GPA
- $Z = [X_2 \; X_3]$, $X_3$ = height, weight, rating
$$\hat{\beta}_{IV} = (X'P_Z X)^{-1} X'P_Z Y \quad \text{where } P_Z = Z(Z'Z)^{-1}Z'$$

APPLICATION AREAS
Simultaneous Equation Models
Consider academic equation:
$$\text{academic} = \alpha_1 + \beta_1\,\text{attract} + \beta_2\,\text{GPA} + \varepsilon_1$$
- Feedback: $\mathrm{COV}(\text{attract}, \varepsilon_1) \neq 0$
- Need at least 1 IV in $X_3$
- 2 or more IVs: overidentified
- $X_3$ = height, weight, rating: overidentified
$$\hat{\beta}_{IV} = (X'P_Z X)^{-1} X'P_Z Y$$

APPLICATION AREAS
Factor Analysis (less common application)
- Madansky (1964, Psychometrika) 1st to suggest
  - Exploratory factor analysis
  - No correlated errors
- Approach here based on Bollen (1996, Psychometrika)
  - Confirmatory (or exploratory) factor analysis
  - Allows correlated errors
$$Z = \alpha + \Lambda L + \varepsilon$$
Z = indicators, L = latent vars (factors), $\alpha$ = intercepts, $\Lambda$ = coefficients (loadings), $\varepsilon$ = errors

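APPLICATION AREAS
Factor Analysis (less common application)
1 factor, 4 indicators
[Path diagram: latent variable $L_1$ (Subjective Air Quality) with indicators $Z_1$ (Overall), $Z_2$ (Clarity), $Z_3$ (Color), $Z_4$ (Odor) and errors $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$, $\varepsilon_4$]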

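APPLICATION AREAS
Factor Analysis (less common application)
1 factor, 4 indicators
$$Z_1 = L_1 + \varepsilon_1 \quad (\text{set scale of } L_1)$$
$$Z_2 = \alpha_2 + \Lambda_{21} L_1 + \varepsilon_2$$
$$Z_3 = \alpha_3 + \Lambda_{31} L_1 + \varepsilon_3$$
$$Z_4 = \alpha_4 + \Lambda_{41} L_1 + \varepsilon_4$$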

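APPLICATION AREAS
Factor Analysis (less common application)
1 factor, 4 indicators
$$Z_1 = L_1 + \varepsilon_1 \;\Rightarrow\; L_1 = Z_1 - \varepsilon_1$$
Consider 2nd indicator equation:
$$Z_2 = \alpha_2 + \Lambda_{21} L_1 + \varepsilon_2$$
$$\phantom{Z_2} = \alpha_2 + \Lambda_{21}(Z_1 - \varepsilon_1) + \varepsilon_2$$
$$\phantom{Z_2} = \alpha_2 + \Lambda_{21} Z_1 - \Lambda_{21}\varepsilon_1 + \varepsilon_2$$
$\mathrm{COV}(Z_1, \varepsilon_1) \neq 0$: need IVs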

APPLICATION AREAS
Factor Analysis (less common application)
1 factor, 4 indicators
$$Z_2 = \alpha_2 + \Lambda_{21} Z_1 - \Lambda_{21}\varepsilon_1 + \varepsilon_2$$
$\mathrm{COV}(Z_1, \varepsilon_1) \neq 0$: need IVs
IVs must:
(1) correlate with $Z_1$
(2) be uncorrelated with $\varepsilon_1$, $\varepsilon_2$
$Z_3$ & $Z_4$ meet these conditions

APPLICATION AREAS
Factor Analysis (less common application)
1 factor, 4 indicators
$$Z_2 = \alpha_2 + \Lambda_{21} Z_1 - \Lambda_{21}\varepsilon_1 + \varepsilon_2$$
IV formula:
$$\hat{\beta}_{IV} = (X'P_Z X)^{-1} X'P_Z Y$$
For the $Z_2$ equation: $\hat{\Lambda}_{21}$ is $\hat{\beta}_{IV}$, $Z_1$ is X, $Z_2$ is Y; $Z_3$ and $Z_4$ form Z, and $P_Z = Z(Z'Z)^{-1}Z'$
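The use of $Z_3$ and $Z_4$ as instruments for $Z_1$ in the $Z_2$ equation can be illustrated on simulated data. This is a sketch under assumed values, not the air quality data: the loadings, intercepts, error variances, sample size, and seed are hypothetical, and uncorrelated errors are assumed. With one regressor and two instruments, the two-stage estimate reduces to covariances, so no matrix library is needed.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

def simulate(n=20000, seed=2):
    """One-factor model with four indicators; Z1 is the scaling indicator."""
    rng = random.Random(seed)
    loadings = [1.0, 0.8, 0.9, 0.7]       # assumed; the first is fixed to 1 to set the scale
    intercepts = [0.0, 0.5, -0.3, 1.0]    # assumed
    error_sd = [1.0, 0.7, 0.7, 0.7]       # assumed; errors drawn independently
    z = [[], [], [], []]
    for _ in range(n):
        latent = rng.gauss(0, 1)
        for j in range(4):
            z[j].append(intercepts[j] + loadings[j] * latent + rng.gauss(0, error_sd[j]))
    return z

z1, z2, z3, z4 = simulate()

# First stage: slopes from regressing Z1 on (1, Z3, Z4), via the 2x2 covariance system
v33, v44, c34 = cov(z3, z3), cov(z4, z4), cov(z3, z4)
c13, c14 = cov(z1, z3), cov(z1, z4)
det = v33 * v44 - c34 ** 2
b3 = (v44 * c13 - c34 * c14) / det
b4 = (v33 * c14 - c34 * c13) / det

# Second stage: COV(fitted Z1, Z2) / COV(fitted Z1, Z1)
lambda21_iv = (b3 * cov(z2, z3) + b4 * cov(z2, z4)) / (b3 * c13 + b4 * c14)
# Naive regression of Z2 on Z1 ignores the error in Z1
lambda21_ols = cov(z1, z2) / cov(z1, z1)
print(f"IV: {lambda21_iv:.3f}  OLS: {lambda21_ols:.3f}  (true loading = 0.8)")
```

In this assumed design the naive regression converges to 0.8 × 1/(1 + 1) = 0.4, because the error in $Z_1$ has the same variance as the factor, while the IV estimate targets the loading itself.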

APPLICATION AREAS
Factor Analysis (less common application)
General Procedure for IV estimation:
- Replace each latent variable with its scaling indicator minus its error
  - Transforms latent variable model into observed variable model
- For each equation find those indicators from other equations that are uncorrelated with error
- Apply usual IV formula
- Because suitable IVs are dictated by model, I refer to these as Model Implied Instrumental Variables (MIIVs)
- Tests for overidentified equations are tests of model

APPLICATION AREAS
Latent Variable SEM (less common application)
Bollen (1996, Psychometrika)
$$L = \alpha_L + BL + \varepsilon_L$$
$$Y = \alpha_Y + \Lambda L + \varepsilon_Y$$
- Y = indicators, L = latent vars (factors)
- $\varepsilon_L$, $\varepsilon_Y$ = errors for L & Y eqs., respectively
- $\alpha_L$, $\alpha_Y$ = intercepts for L & Y eqs., respectively
- B, $\Lambda$ = coefficients for L & Y eqs., respectively

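APPLICATION AREAS
Latent Variable SEM (less common application)
Robins & West (1977, JASA)
[Path diagram: latent variable L with disturbance $\varepsilon_L$; indicators $Y_1$, $Y_2$, $Y_3$, ..., $Y_{N_Y-3}$; indicators $Y_{N_Y-2}$, $Y_{N_Y-1}$, $Y_{N_Y}$ with errors $\varepsilon_{Y_{N_Y-2}}$, $\varepsilon_{Y_{N_Y-1}}$, $\varepsilon_{Y_{N_Y}}$]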

APPLICATION AREAS
Latent Variable SEM (less common application)
Robins & West (1977, JASA)
- $L_1$ = value of home
- $Y_1$ = lot size
- $Y_2$ = square footage
- $Y_3$ = number of rooms
- $Y_4$ to $Y_{N_Y-3}$ = other causal indicators
- $Y_{N_Y-2}$ = appraised value
- $Y_{N_Y-1}$ = owner estimate
- $Y_{N_Y}$ = assessed value
- $\varepsilon_L$, $\varepsilon_{Y_{N_Y-2}}$, $\varepsilon_{Y_{N_Y-1}}$, $\varepsilon_{Y_{N_Y}}$ = disturbances (errors)
- $N_Y$ = # of observed variables

APPLICATION AREAS
Latent Variable SEM (less common application)
General Procedure for IV estimation:
- Replace each latent variable with its scaling indicator minus its error
  - Transforms latent variable model into observed variable model
- For each equation find those indicators from other equations that are uncorrelated with error
- Apply usual IV formula
- Model Implied Instrumental Variables (MIIVs)
- Tests for overidentified equations are tests of model

APPLICATION AREAS
Dichotomous/ordinal dependent variable
$$Y^* = X\beta + \varepsilon$$
Dichotomous outcome:
$$Y = \begin{cases} 1 & \text{if } Y^* > 0 \\ 0 & \text{if } Y^* \le 0 \end{cases}$$
- e.g., Y = 1 HIV positive, 0 not
Ordinal outcome: $Y = c$ if $\tau_c \le Y^* < \tau_{c+1}$
- $\tau$s are thresholds crossed by $Y^*$
- e.g., abortion attitude, Y = 0 to 5
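The threshold rules that turn the latent $Y^*$ into an observed dichotomous or ordinal Y can be written out directly. This is a small sketch for illustration: the function names and the threshold values are hypothetical, and the lowest and highest thresholds are taken to be minus and plus infinity.

```python
from bisect import bisect_right

def dichotomize(y_star):
    """Y = 1 if Y* > 0, otherwise Y = 0."""
    return 1 if y_star > 0 else 0

def ordinalize(y_star, thresholds):
    """Y = c when tau_c <= Y* < tau_(c+1).

    `thresholds` holds the finite cut points tau_1 < tau_2 < ... in order;
    tau_0 = -infinity and the last threshold = +infinity are implicit.
    The category is the number of thresholds that Y* has crossed.
    """
    return bisect_right(thresholds, y_star)

# Hypothetical cut points giving six categories, Y = 0 to 5
taus = [-2.0, -1.0, 0.0, 1.0, 2.0]
latent_values = [-3.0, -1.0, 0.5, 5.0]
observed = [ordinalize(v, taus) for v in latent_values]
print(observed)
```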

APPLICATION AREAS
Dichotomous/ordinal dependent variable (probit/logistic)
$$Y^* = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
- Separates X into 2 parts, $X_1$ and $X_2$, where
  - $X_1$ correlates with $\varepsilon$: problem!
  - $X_2$ does not correlate with $\varepsilon$
- Find $Z = [X_2 \; X_3]$ where $C(X_3, \varepsilon) = 0$
- Z are IVs

APPLICATION AREAS
Dichotomous/ordinal dependent variable (probit/logistic)
Approaches
- Treat Y as if continuous
$$Y^* = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
  - Same procedure as illustrated for usual regression
  - Need heteroscedasticity-consistent standard errors
  - For 7 or more ordinal categories or for exploratory research
  - Some are highly critical of this approach

APPLICATION AREAS
Dichotomous/ordinal dependent variable (probit/logistic)
Approaches
$$Y^* = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
- Instrumental variable probit/logit method (Lee, 1981; Rivers & Vuong, 1988)
  - Use $\hat{X}_1$ in place of $X_1$ in the equation above
  - Do probit/logistic
- Problems:
  1. Standard errors might not be good
  2. Scaling differs from original equation

APPLICATION AREAS
Dichotomous/ordinal dependent variable (probit/logistic)
Other Approaches
$$Y^* = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
- Two-stage conditional probit (Vuong, 1984; Rivers & Vuong, 1988; Smith & Blundell, 1986)
- Polychoric instrumental variables (Bollen & Maydeu-Olivares, 2007)
- Limited evidence on which approach is best

FINDING INSTRUMENTAL VARIABLES
Three main strategies:
(1) Auxiliary Instrumental Variables (AIVs)
(2) Model Implied Instrumental Variables (MIIVs)
(3) Randomization Instrumental Variables (RIVs)
(My classification. Usually distinctions not made.)

FINDING INSTRUMENTAL VARIABLES
AUXILIARY INSTRUMENTAL VARIABLES (AIVs)
$$Y^* = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
- $X_1$ correlates with $\varepsilon$: problem!
- $X_2$ does not correlate with $\varepsilon$
- Find $X_3$
- Get into trouble and look for a way out

FINDING INSTRUMENTAL VARIABLES
AUXILIARY INSTRUMENTAL VARIABLES (AIVs)
$$Y^* = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
- Find $X_3$ as IVs
- Get into trouble and look for a way out
  - You have an endogeneity problem
  - Look for IVs not part of original model
- Earlier example on birth weight: need IVs for smoking, drinking during pregnancy
  - Suggested pre-pregnancy smoking and drinking, spousal reports on mother's drinking and smoking, and cigarette & alcohol receipts as possible IVs

FINDING INSTRUMENTAL VARIABLES
AUXILIARY INSTRUMENTAL VARIABLES (AIVs)
Advantages
- Helps permit asymptotically unbiased estimation of effects
- Exact relation of AIV to endogenous variable need not be specified
- If more than the minimum number of IVs, then overidentification tests are possible
Disadvantages
- Ad hoc selection raises doubts about whether IV conditions are met
- Less systematic thought about the role of the IV in the model
- Tendency to seek just enough IVs to permit estimation
  - Overidentification test of IVs is then not possible

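FINDING INSTRUMENTAL VARIABLES
MODEL IMPLIED INSTRUMENTAL VARIABLES (MIIVs)
- Approach in Bollen (1996)
- Build identified model, which implies sufficient instruments
[Path diagram: latent variable $L_1$ (Subjective Air Quality) with indicators $Z_1$ (Overall), $Z_2$ (Clarity), $Z_3$ (Color), $Z_4$ (Odor) and errors $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$, $\varepsilon_4$]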

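FINDING INSTRUMENTAL VARIABLES
$$Z_1 = L_1 + \varepsilon_1 \;\Rightarrow\; L_1 = Z_1 - \varepsilon_1$$
Consider 2nd indicator equation:
$$Z_2 = \alpha_2 + \Lambda_{21} L_1 + \varepsilon_2$$
$$\phantom{Z_2} = \alpha_2 + \Lambda_{21}(Z_1 - \varepsilon_1) + \varepsilon_2$$
$$\phantom{Z_2} = \alpha_2 + \Lambda_{21} Z_1 - \Lambda_{21}\varepsilon_1 + \varepsilon_2$$
$\mathrm{COV}(Z_1, \varepsilon_1) \neq 0$: need IVs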

FINDING INSTRUMENTAL VARIABLES
Model Implied Instrumental Variables: $Z_2$ equation
$$Z_2 = \alpha_2 + \Lambda_{21} Z_1 - \Lambda_{21}\varepsilon_1 + \varepsilon_2$$
$\mathrm{COV}(Z_1, \varepsilon_1) \neq 0$: need IVs
IVs must:
(1) correlate with $Z_1$
(2) be uncorrelated with $\varepsilon_1$, $\varepsilon_2$
$Z_3$ & $Z_4$ meet these conditions

FINDING INSTRUMENTAL VARIABLES
MODEL IMPLIED INSTRUMENTAL VARIABLES (MIIVs)
- MIIVs found for each equation
  - SAS macro: Bollen & Bauer (2004)
  - Stata macro: Bauldry (2013)
  - R package: Fisher (in progress)
- Sources of MIIVs
  - Exogenous observed variables
  - Multiple indicators
  - Sometimes endogenous observed variables

FINDING INSTRUMENTAL VARIABLES
MODEL IMPLIED INSTRUMENTAL VARIABLES (MIIVs)
Advantages
- More sustained effort & thought in building the model rather than a post hoc search for IVs
- Assumptions about variables are explicit in the model
- Overidentification tests check the assumption that all MIIVs are uncorrelated with the equation error
Disadvantages
- Approximate nature of models implies that MIIVs are not exactly uncorrelated with error
  - Excess power could reject reasonable approximate IVs
- Exactly identified equations have no test for MIIVs
  - Problem shared with AIVs or any method that creates an exactly identified equation

FINDING INSTRUMENTAL VARIABLES
(Quasi) Randomization Instrumental Variables (RIVs)
- Intervention or treatment randomized
  - One group randomly assigned to job training program, others form control group
- Natural experiments (quasi-experiments)
  - Twin births, weather events, random assignments of roommates at college
- "Intention to treat" variable is IV for treatment variable
  - Acknowledges difference between assignment and actual treatment

FINDING INSTRUMENTAL VARIABLES
(Quasi) Randomization Instrumental Variables (RIVs)
Advantages
- Randomization or natural-experiment nature makes correlation with omitted variables less likely
- Intention-to-treat variable highly correlated with those taking treatment
- Models can be simpler
Disadvantages
- Assumes that all effects of the intention-to-treat variable go through the treatment variable
  - Job training selection gives hope & confidence to those selected, the opposite for controls
  - Hope & confidence might affect job search outcome rather than job training per se
  - False confidence & decreased motivation to search as confounders
- Experimental context might not generalize to real world conditions
- Exact identification, no overidentification tests

EVALUATING INSTRUMENTAL VARIABLES
Three main criteria for IVs:
(1) IVs are uncorrelated with equation error
(2) IVs associated with $X_1$ (vars that correlate with error)
(3) No perfect collinearity among Zs

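EVALUATING INSTRUMENTAL VARIABLES
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
[recall $C(X_1, \varepsilon) \neq 0$; $C(X_2, \varepsilon) = 0$]
$$Z = [X_2 \; X_3] \quad \text{where } C(X_3, \varepsilon) = 0$$
Z contains IVs
$$\hat{\beta}_{IV} = (X'P_Z X)^{-1} X'P_Z Y \quad \text{where } P_Z = Z(Z'Z)^{-1}Z'$$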

EVALUATING INSTRUMENTAL VARIABLES
(1) IVs are uncorrelated with equation error
Are the IVs uncorrelated with the error [$C(Z, \varepsilon) = 0$]?
Sargan (1958) test:
$$T_S = \frac{\hat{\varepsilon}'Z(Z'Z)^{-1}Z'\hat{\varepsilon}}{\hat{\varepsilon}'\hat{\varepsilon}/N} \sim \chi^2$$
Simple way to calculate:
1) Regress $\hat{\varepsilon}$ on Z
2) Get $R^2$
3) Form $T_S = NR^2$
Degrees of freedom = # of IVs above minimum, e.g., $X_1$ has 3 vars., $X_3$ has 5, df = 2.
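The three-step $NR^2$ calculation can be sketched for the smallest overidentified case: one endogenous regressor and two instruments, so df = 1. Everything below is an assumption made for illustration (simulated data, the `gamma` parameter that lets the second instrument leak into the error, the helper names); because a constant is included, the regressions are expressed through sample covariances.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

def sargan(y, x, z1, z2):
    """Sargan statistic N * R^2 for one endogenous x and two instruments (df = 1)."""
    n = len(y)
    v11, v22, c12 = cov(z1, z1), cov(z2, z2), cov(z1, z2)
    det = v11 * v22 - c12 ** 2

    def slopes(target):
        """Slopes from regressing target on (1, z1, z2), plus COV(target, z)."""
        c1, c2 = cov(target, z1), cov(target, z2)
        return (v22 * c1 - c12 * c2) / det, (v11 * c2 - c12 * c1) / det, c1, c2

    # 2SLS slope: COV(x_hat, y) / COV(x_hat, x), with x_hat from the first stage
    b1, b2, cx1, cx2 = slopes(x)
    beta = (b1 * cov(y, z1) + b2 * cov(y, z2)) / (b1 * cx1 + b2 * cx2)
    # Residuals use the observed x (not x_hat); the intercept makes their mean zero
    mean_y, mean_x = sum(y) / n, sum(x) / n
    resid = [(yi - mean_y) - beta * (xi - mean_x) for yi, xi in zip(y, x)]
    # Steps 1-3: regress residuals on Z, get R^2, form N * R^2
    g1, g2, cr1, cr2 = slopes(resid)
    r2 = (g1 * cr1 + g2 * cr2) / cov(resid, resid)
    return n * r2

def simulate(gamma, n=5000, seed=3):
    """gamma = 0 gives two valid IVs; gamma != 0 makes z2 correlate with the error."""
    rng = random.Random(seed)
    y, x, z1, z2 = [], [], [], []
    for _ in range(n):
        a, b, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        e = gamma * b + u
        xi = a + b + 0.8 * u + rng.gauss(0, 1)
        y.append(1.0 + 2.0 * xi + e)
        x.append(xi)
        z1.append(a)
        z2.append(b)
    return y, x, z1, z2

t_valid = sargan(*simulate(gamma=0.0))
t_invalid = sargan(*simulate(gamma=0.5))
print(f"valid IVs: T_S = {t_valid:.2f}   invalid IV: T_S = {t_invalid:.2f}")
```

Each statistic would be compared with a chi-square critical value with 1 degree of freedom (3.84 at the 5% level). A large value signals that at least one instrument correlates with the error, without saying which one.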

EVALUATING INSTRUMENTAL VARIABLES
Are the IVs uncorrelated with the error [$C(Z, \varepsilon) = 0$]?
Sargan (1958) test:
- $H_0$: All IVs uncorrelated with error [$C(Z, \varepsilon) = 0$]
- $H_a$: 1 or more IVs correlate with error [$C(Z, \varepsilon) \neq 0$]
- Rejection means problem with IVs
- Does not say which IV is the problem
- Substantive vs. statistical significance: this is a statistical significance test
- Test not applicable if equation is exactly identified

EVALUATING INSTRUMENTAL VARIABLES
Are the IVs uncorrelated with the error [$C(Z, \varepsilon) = 0$]?
Sargan (1958) test
- Other IV tests available
- Kirby & Bollen (2009, SM) show that Sargan has best performance
- Sargan or Basmann tests widely available
  - Stata, SAS, etc.

EVALUATING INSTRUMENTAL VARIABLES
Three main criteria for IVs:
(1) IVs are uncorrelated with equation error
(2) IVs associated with $X_1$ (vars that correlate with error)
(3) No perfect collinearity among Zs
- Check for nonsingular covariance (or correlation) matrix

EVALUATING INSTRUMENTAL VARIABLES
IVs associated with $X_1$ (vars that correlate with error)
- Check for WEAK IVs
  - Insufficient association increases standard errors
  - Problem made worse if there is even a small association of the error with the IV
- Simple regression example (Bound et al., 1995)
  - Shows that the IV estimator can be worse than OLS if the IV is weakly correlated with $X_1$ and there is a small correlation of IV and error
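A short derivation may make the weak-IV point concrete. It assumes the simple regression $Y_i = \alpha + \beta X_i + \varepsilon_{Yi}$ with a single instrument Z; the correlation symbols $\rho_{Z\varepsilon}$, $\rho_{X\varepsilon}$, and $\rho_{XZ}$ are notation introduced here rather than taken from the slides.

```latex
\begin{align*}
\operatorname{plim}\hat{\beta}_{OLS} &= \beta + \frac{\mathrm{COV}(X,\varepsilon_Y)}{\mathrm{VAR}(X)}, \\
\operatorname{plim}\hat{\beta}_{IV} &= \beta + \frac{\mathrm{COV}(Z,\varepsilon_Y)}{\mathrm{COV}(Z,X)}, \\
\frac{\operatorname{plim}\hat{\beta}_{IV}-\beta}{\operatorname{plim}\hat{\beta}_{OLS}-\beta}
  &= \frac{\rho_{Z\varepsilon}/\rho_{X\varepsilon}}{\rho_{XZ}}.
\end{align*}
```

For example, with hypothetical values $\rho_{XZ} = 0.05$, $\rho_{Z\varepsilon} = 0.02$, and $\rho_{X\varepsilon} = 0.2$, the ratio is $(0.02/0.2)/0.05 = 2$: the IV estimator is then twice as inconsistent as OLS, even though the instrument is only slightly correlated with the error.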

EVALUATING INSTRUMENTAL VARIABLES
IVs associated with $X_1$ (vars that correlate with error)
- Check for WEAK IVs
- Simple regression diagnostic
  - Check correlation of Z and $X_1$
- Multiple regression diagnostics are more complicated
  - Shea (1997) proposes a partial $R^2$ measure
  - See Bollen (2012) for review and references
- Growing interest in weak IV diagnostics over the last 20 years
  - Tests are available, though there is not yet a consensus on the best method

EVALUATING INSTRUMENTAL VARIABLES
How many IVs should we use?
- Sometimes we have many more IVs than the minimum needed
- Should we use all available IVs?
- Based on analytic results for special cases & simulation results (e.g., Bollen et al., 2007), my recommendations:
  - Small N: use 1 or 2 more than the required minimum # of IVs
    - E.g., N = 50, $X_1$ has 2 vars, use 3 or 4 IVs from $X_3$
  - Big N: matters less

HETEROGENEOUS CAUSAL EFFECTS
- So far, assumed same causal effect for each case
$$Y_i = \alpha + \beta X_i + \varepsilon_{Yi}$$
  - $\beta$ same for all i
- Suppose effect of $X_i$ on $Y_i$ differs by i
$$Y_i = \alpha + \beta_i X_i + \varepsilon_{Yi}$$
  - $\beta_i$ allows effects to differ

HETEROGENEOUS CAUSAL EFFECTS
$$Y_i = \alpha + \beta_i X_i + \varepsilon_{Yi}$$
IVs for heterogeneous causal effects
- Merging Neyman (1923)-Rubin (1974) potential outcome with IV literature
- Much of literature assumes dichotomous $X_i$
  - Catholic school or not on academic achievement
  - Job training attendance on wages
- IV ($Z_i$) often dichotomous
  - E.g., Angrist (1990): military service ($X_i$) impact on wages, IV is draft-eligible lottery number ($Z_i$)

HETEROGENEOUS CAUSAL EFFECTS
$$Y_i = \alpha + \beta_i X_i + \varepsilon_{Yi}$$
IVs for heterogeneous causal effects
- Intention-to-treat mean effect of $Z_i$ on $Y_i$:
$$E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)$$
- IV causal effect of $X_i$ on $Y_i$:
$$\frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[X_i \mid Z_i = 1] - E[X_i \mid Z_i = 0]}$$
- Local Average Treatment Effect (LATE)
  - Treatment effect of X for those whose treatment can be changed by Z
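The ratio of the intention-to-treat effect to the difference in treatment rates can be illustrated with a small simulation. The sketch rests on assumptions not spelled out in the slides: a randomized binary $Z_i$, three hypothetical types of cases (those who take treatment only if assigned, those who always take it, those who never take it), no cases who do the opposite of their assignment, and no direct effect of $Z_i$ on $Y_i$. All numeric values are made up.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate(n=40000, seed=4):
    """Binary Z and X with case-specific effects beta_i (hypothetical design)."""
    rng = random.Random(seed)
    rows, complier_effects = [], []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0          # randomized assignment
        u = rng.random()
        if u < 0.5:                                  # treatment follows assignment
            kind, effect = "complier", rng.uniform(1.0, 3.0)
        elif u < 0.7:                                # treated regardless of Z
            kind, effect = "always", 3.0
        else:                                        # never treated
            kind, effect = "never", 0.0
        x = 1 if kind == "always" or (kind == "complier" and z == 1) else 0
        y = 1.0 + effect * x + rng.gauss(0, 1)
        rows.append((z, x, y))
        if kind == "complier":
            complier_effects.append(effect)
    return rows, mean(complier_effects)

rows, complier_mean = simulate()
y1 = mean([y for z, x, y in rows if z == 1])
y0 = mean([y for z, x, y in rows if z == 0])
x1 = mean([x for z, x, y in rows if z == 1])
x0 = mean([x for z, x, y in rows if z == 0])

itt = y1 - y0                 # E(Y | Z = 1) - E(Y | Z = 0)
first_stage = x1 - x0         # E(X | Z = 1) - E(X | Z = 0)
late = itt / first_stage      # IV (Wald) ratio
print(f"ITT: {itt:.3f}  first stage: {first_stage:.3f}  "
      f"LATE: {late:.3f}  mean effect among compliers: {complier_mean:.3f}")
```

In this design the ratio recovers the average effect among cases whose treatment is changed by Z (about 2.0), not the effect for the always-treated group (3.0).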

HETEROGENEOUS CAUSAL EFFECTS
$$Y_i = \alpha + \beta_i X_i + \varepsilon_{Yi}$$
IVs for heterogeneous causal effects
- More assumptions than I have time to go over
- More complicated than models where homogeneous effects are assumed
- Vast developing literature on this approach

INSTRUMENTAL VARIABLES IN PRACTICE
- Varies by discipline and field
- Correlation of error and Xs typically ignored
  - Many sources of correlation present but not treated
  - Say nothing about it and hope others do the same
- When IVs are used, it is common not to apply diagnostics for correlation of IVs with error or for weak IVs

CONCLUSIONS
- Measurement error, omitted variables, feedback loops, spatial correlation, etc. are common in social and health sciences
  - Creates correlation of error and covariates
  - Biases usual estimates
  - Problems largely ignored
- Instrumental variables help provide corrected estimates
  - Diagnostic checks available on IVs
  - Widely available in statistical software
- Right to be concerned with current use of IVs, but the bigger problem is that IVs are not used when they could help