Econ 673: Microeconometrics

Econ 673: Microeconometrics
Chapter 12: Estimating Treatment Effects
Fall 2010
Herriges (ISU) Ch. 12: Estimating Treatment Effects Fall 2010

Outline
1 Introduction
2 Matching
3 Difference-in-Difference
4 Regression Discontinuity
5 Partial Identification

Introduction: The Problem
Analysts are frequently interested in measuring the impact of a treatment on individual behavior; e.g., the impact of
- job training programs on income
- 401(k)s on household savings
- teenage pregnancy on high school drop-out or college graduation rates
- environmental regulations on pollution levels
Randomized experiments are typically not an option for cost and/or ethical reasons.
Comparisons of treatment and nontreatment outcomes in a nonexperimental setting are contaminated by the treatment selection process.

Introduction: Notation
The choice of the treatment is assumed to be determined in the fashion of a standard random utility maximization (RUM) model, with
\[ V = \mu_V(Z, U_V) \tag{1} \]
denoting the latent variable determining the treatment choice and
\[ D = 1(V > 0) \tag{2} \]
denoting the choice outcome, where
- $Z$ denotes factors observed by the analyst, and
- $U_V$ denotes factors not observed by the analyst, but known to the decisionmaker.

Introduction: Potential Outcomes
Let $Y_1$ and $Y_0$ denote the outcome with and without the treatment, where
\[ Y_1 = \mu_1(X, U_1), \quad D = 1 \tag{3} \]
\[ Y_0 = \mu_0(X, U_0), \quad D = 0 \tag{4} \]
The individual treatment effect is given by
\[ \Delta = Y_1 - Y_0 \tag{5} \]
Additively separable specifications are often considered, with
\[ V = \mu_V(Z) + U_V, \quad E(U_V) = 0 \tag{6} \]
\[ Y_1 = \mu_1(X) + U_1, \quad E(U_1) = 0 \tag{7} \]
\[ Y_0 = \mu_0(X) + U_0, \quad E(U_0) = 0 \tag{8} \]

Introduction: Parameters of Interest
Three different treatment effects are of interest:
1 The average treatment effect
\[ ATE = E(Y_1 - Y_0 \mid X) \tag{9} \]
2 The treatment on the treated
\[ TT = E(Y_1 - Y_0 \mid X, D = 1) \tag{10} \]
3 The marginal treatment effect
\[ MTE = E(Y_1 - Y_0 \mid X, Z, U_V = u_V) \tag{11} \]

Introduction: The Selection Problem in a Regression Context
The fundamental problem is that each individual is only observed in one state of the world; i.e., we only observe
\[ Y = D Y_1 + (1 - D) Y_0 \tag{12} \]
\[ \phantom{Y} = D[\mu_1(X) + U_1] + (1 - D)[\mu_0(X) + U_0] \tag{13} \]
\[ \phantom{Y} = \mu_0(X) + D[\mu_1(X) - \mu_0(X)] + \epsilon \tag{14} \]
\[ \phantom{Y} = \mu_0(X) + D \cdot ATE(X) + \epsilon \tag{15} \]
where $\epsilon = D U_1 + (1 - D) U_0$.
Unfortunately, unless the treatment assignment is randomized,
\[ E(\epsilon \mid X, D) \neq 0. \tag{16} \]

Introduction: The Biases
From the available samples, we can compute
\[ E(Y \mid X, Z, D = 1) = E(Y_1 \mid X, Z, D = 1) \tag{17} \]
\[ E(Y \mid X, Z, D = 0) = E(Y_0 \mid X, Z, D = 0) \tag{18} \]
Integrating out $Z$ yields
\[ E(Y \mid X, D = 1) = E(Y_1 \mid X, D = 1) \tag{19} \]
\[ E(Y \mid X, D = 0) = E(Y_0 \mid X, D = 0) \tag{20} \]
The resulting bias from comparing the $(D = 1)$ and $(D = 0)$ means is
\[ Bias(TT) = [E(Y_1 \mid X, D = 1) - E(Y_0 \mid X, D = 0)] - E(Y_1 - Y_0 \mid X, D = 1) \]
\[ \phantom{Bias(TT)} = E(Y_0 \mid X, D = 1) - E(Y_0 \mid X, D = 0) \]
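The bias expression for TT can be checked numerically. The sketch below is only an illustration under assumed values that are not from the slides: no covariates, a constant treatment effect of 1, and a latent index that loads on the same unobservable as the outcome. Because both potential outcomes are simulated, the counterfactual mean E(Y_0 | D = 1), which is never observed in real data, can be computed directly.

```python
import random


def simulate_selection_bias(n=20000, seed=0):
    """Return (naive difference in means, sample TT, selection bias term)."""
    rng = random.Random(seed)
    y1_treated, y0_treated, y0_untreated = [], [], []
    for _ in range(n):
        u0 = rng.gauss(0.0, 1.0)        # unobservable driving the outcome
        y0 = u0                         # untreated potential outcome
        y1 = y0 + 1.0                   # treated potential outcome (assumed effect of 1)
        v = u0 + rng.gauss(0.0, 1.0)    # latent index V: selection depends on u0
        if v > 0:                       # D = 1(V > 0)
            y1_treated.append(y1)
            y0_treated.append(y0)       # counterfactual, unobserved in real data
        else:
            y0_untreated.append(y0)

    def mean(values):
        return sum(values) / len(values)

    naive = mean(y1_treated) - mean(y0_untreated)               # observed comparison
    tt = mean([a - b for a, b in zip(y1_treated, y0_treated)])  # E(Y1 - Y0 | D = 1)
    bias = mean(y0_treated) - mean(y0_untreated)                # E(Y0|D=1) - E(Y0|D=0)
    return naive, tt, bias


naive, tt, bias = simulate_selection_bias()
print(f"naive = {naive:.3f}, TT = {tt:.3f}, bias = {bias:.3f}")
```

In this design the naive comparison equals TT plus the bias term as an exact sample identity, and the bias is positive because individuals with high u0 are more likely to select into treatment.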

Introduction: The Biases (cont'd)
For the ATE we have
\[ Bias(ATE) = [E(Y_1 \mid X, D = 1) - E(Y_0 \mid X, D = 0)] - E(Y_1 - Y_0 \mid X) \]
\[ \phantom{Bias(ATE)} = [E(Y_1 \mid X, D = 1) - E(Y_1 \mid X)] - [E(Y_0 \mid X, D = 0) - E(Y_0 \mid X)] \]
A similar bias emerges for the marginal treatment effect.

Introduction: The Problem (cont'd)
LaLonde (1986, AER) used data from an actual experiment (the National Supported Work Demonstration Experiment) to study the performance of non-experimental estimators, including
- simple regression adjustments
- difference-in-differences
- two-step Heckman adjustment
He found that the alternative estimators produced very different estimates.
Most deviated substantially from the experimental benchmarks.
In recent years there has been a boom in the development of alternative non-experimental estimators.

Introduction: A Number of Alternative Solutions Have Emerged
- Matching
- Instrumental Variables
- Difference-in-Difference
- Regression Discontinuity
- Partial Identification

Matching: The Literature - Theory
*Wooldridge, J. M. (2002), Econometric Analysis of Cross Section and Panel Data, Cambridge: The MIT Press, Ch. 18.
Heckman, J., and S. Navarro-Lozano (2004), Using Matching, Instrumental Variables, and Control Functions to Estimate Economic Choice Models, The Review of Economics and Statistics, 86(1): 30-57.
Rosenbaum, P., and D. Rubin (1983), The Central Role of the Propensity Score in Observational Studies for Causal Effects, Biometrika 70(1): 41-55.
Dehejia, R. H., and S. Wahba (2002), Propensity Score-Matching Methods for Nonexperimental Causal Studies, The Review of Economics and Statistics, 84(1): 151-161.
Heckman, J., H. Ichimura, J. Smith, and P. Todd (1998), Characterizing Selection Bias Using Experimental Data, Econometrica 66(5): 1017-1098.
Heckman, J., H. Ichimura, and P. Todd (1997), Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme, Review of Economic Studies 64: 605-654.
Heckman, J., H. Ichimura, and P. Todd (1998), Matching as an Econometric Evaluation Estimator, Review of Economic Studies 65: 261-294.
*Smith, J., and P. Todd (2005), Does Matching Overcome LaLonde's Critique of Nonexperimental Estimators? Journal of Econometrics, 125(1-2): 305-353.
Abadie, A., and G. Imbens (2006), Large Sample Properties of Matching Estimators for Average Treatment Effects, Econometrica 74(1): 235-267.

Matching: The Literature - Applications
Benjamin, D. (2003), Does 401(k) Eligibility Increase Saving? Evidence from Propensity Score Subclassification, Journal of Public Economics 87: 1259-1290.
Jalan, J., and M. Ravallion (2003), Does Piped Water Reduce Diarrhea for Children in Rural India? Journal of Econometrics 112: 153-173.
Jalan, J., and M. Ravallion (2003), Estimating the Benefit Incidence of an Antipoverty Program by Propensity-Score Matching, Journal of Business and Economic Statistics 21(1): 19-30.
Levine, D., and G. Painter (2003), The Schooling Costs of Teenage Out-of-Wedlock Childbearing: Analysis with a Within-School Propensity-Score-Matching Estimator, The Review of Economics and Statistics 85(4): 884-900.
*List, J., D. Millimet, P. Fredriksson, and W. McHone (2003), Effects of Environmental Regulations on Manufacturing Plant Births: Evidence from a Propensity Score Matching Estimator, The Review of Economics and Statistics 85(4): 944-52.
Park, A., S. Wang, and G. Wu (2002), Regional Poverty Targeting in China, Journal of Public Economics, 86: 123-153.

Matching: Making Use of Ignorability
Matching methods are based on the ignorability of treatment assumption introduced by Rosenbaum and Rubin (1983).
Assumption ATE.1: Conditional on $W = (X, Z)$, $D$ and $(Y_0, Y_1)$ are independent; i.e.,
\[ (Y_0, Y_1) \perp D \mid W. \tag{21} \]
This is known as selection on observables.
A less restrictive version that sometimes suffices is
Assumption ATE.1': 
\[ E(Y_0 \mid W, D) = E(Y_0 \mid W) \quad \text{and} \quad E(Y_1 \mid W, D) = E(Y_1 \mid W) \tag{22} \]

Matching: Ignorability of Treatment (cont'd)
The key benefit of ignorability is that, even though $(Y_0, Y_1)$ and $D$ might be correlated, once we condition on $W$ the potential outcomes are mean independent of $D$:
\[ E(Y_1 \mid W, D = 0) = E(Y_1 \mid W, D = 1) = E(Y_1 \mid W) \tag{23} \]
\[ E(Y_0 \mid W, D = 1) = E(Y_0 \mid W, D = 0) = E(Y_0 \mid W) \tag{24} \]
By conditioning on $W$, we can construct the missing counterfactuals.
Note: If we are only interested in TT, then we only need the weaker assumption that
\[ Y_0 \perp D \mid W \tag{25} \]

Matching: Making Use of Ignorability
There are several ways in which we can use the ignorability assumption.
1. Since we have a random sample on $(Y, D, W)$, we can estimate (even nonparametrically):
\[ r_1(W) = E(Y_1 \mid W, D = 1) \tag{26} \]
\[ r_0(W) = E(Y_0 \mid W, D = 0) \tag{27} \]
Given consistent estimators of these functions, a consistent estimator of ATE is
\[ \widehat{ATE} = \frac{1}{N} \sum_{i=1}^{N} [\hat{r}_1(W_i) - \hat{r}_0(W_i)] \tag{28} \]
Similarly,
\[ \widehat{TT} = \left( \sum_{i=1}^{N} D_i \right)^{-1} \sum_{i=1}^{N} D_i [\hat{r}_1(W_i) - \hat{r}_0(W_i)] \tag{29} \]
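The regression-adjustment estimators in (28) and (29) can be sketched as follows. This is a minimal illustration and not the slides' own implementation: it assumes a single scalar W, fits r_1 and r_0 by separate linear regressions (the slides allow nonparametric fits), and uses a simulated design whose coefficients are invented for the example. In that design ATE = 1 by construction, while TT is larger because treated units tend to have higher W.

```python
import math
import random


def fit_line(x, y):
    """OLS intercept and slope for a single regressor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_adjustment(y, d, w):
    """ATE and TT as in (28)-(29), with linear fits for r_1 and r_0."""
    a1, b1 = fit_line([wi for wi, di in zip(w, d) if di == 1],
                      [yi for yi, di in zip(y, d) if di == 1])
    a0, b0 = fit_line([wi for wi, di in zip(w, d) if di == 0],
                      [yi for yi, di in zip(y, d) if di == 0])
    # Predicted gap r1_hat(W_i) - r0_hat(W_i) for every unit in the sample.
    gaps = [(a1 + b1 * wi) - (a0 + b0 * wi) for wi in w]
    ate = sum(gaps) / len(gaps)
    tt = sum(g * di for g, di in zip(gaps, d)) / sum(d)
    return ate, tt


def simulate(n=20000, seed=0):
    """Hypothetical design: selection on the observable W only."""
    rng = random.Random(seed)
    y, d, w = [], [], []
    for _ in range(n):
        wi = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-wi))            # Pr(D = 1 | W)
        di = 1 if rng.random() < p else 0
        y0 = 1.0 + wi + rng.gauss(0.0, 1.0)
        y1 = 2.0 + 2.0 * wi + rng.gauss(0.0, 1.0)  # Y1 - Y0 has mean 1 + W
        y.append(y1 if di == 1 else y0)
        d.append(di)
        w.append(wi)
    return y, d, w


ate_hat, tt_hat = regression_adjustment(*simulate())
print(f"ATE_hat = {ate_hat:.3f}, TT_hat = {tt_hat:.3f}")
```

The only difference between the two estimators is the set of units over which the predicted gap is averaged: the full sample for ATE, the treated units for TT.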

Matching: Making Use of Ignorability
Alternatively, if $W$ can take on a finite number of values; i.e., $W \in \{w_1, \ldots, w_M\}$, then we can compute
\[ \tau_{jm} = E[Y_j \mid W = w_m, D = j] \tag{30} \]
We can then compute
\[ \widetilde{ATE} = \sum_{m=1}^{M} s_m [\hat{\tau}_{1m} - \hat{\tau}_{0m}] \tag{31} \]
where $s_m$ denotes the population proportion of type $m$.
This approach becomes difficult, however, if $M$ is large.

Matching: Using the Propensity Score
The ignorability assumption is less useful if $W$ is of high dimensionality.
Rosenbaum and Rubin (1983) reduce the dimensionality problem using the propensity score:
\[ p(W) = \Pr(D = 1 \mid W) \tag{32} \]
In their Theorem 3, they show that
\[ (Y_0, Y_1) \perp D \mid W \ \text{and} \ 0 < p(W) < 1 \]
\[ \Longrightarrow \ (Y_0, Y_1) \perp D \mid p(W) \ \text{and} \ 0 < \Pr[D = 1 \mid p(W)] < 1 \]
This is known as strong ignorability of the treatment.

Matching: Using the Propensity Score (cont'd)
Again, we can now construct the counterfactuals of interest:
\[ E[Y_1 \mid p(W), D = 0] = E[Y_1 \mid p(W), D = 1] = E[Y_1 \mid p(W)] \tag{33} \]
\[ E[Y_0 \mid p(W), D = 1] = E[Y_0 \mid p(W), D = 0] = E[Y_0 \mid p(W)] \tag{34} \]
Note, however, that we are ruling out the cases $p(W) = 1$ and $p(W) = 0$. There has to be a chance that each type of person (defined by $W$) has a counterpart in the other treatment group.

Matching: Using the Propensity Score (cont'd)
Strong ignorability implies that
\[ ATE = E\left\{ \frac{[D - p(W)]\, Y}{p(W)[1 - p(W)]} \right\} \tag{35} \]
\[ TT = E\left\{ \frac{[D - p(W)]\, Y}{[1 - p(W)] \Pr(D = 1)} \right\} \tag{36} \]
Given a consistent estimator of $p(W)$, we then have
\[ \widehat{ATE} = \frac{1}{N} \sum_{i=1}^{N} \frac{[D_i - \hat{p}(W_i)]\, Y_i}{\hat{p}(W_i)[1 - \hat{p}(W_i)]} \tag{37} \]
\[ \widehat{TT} = \left[ \frac{1}{N} \sum_{i=1}^{N} D_i \right]^{-1} \left\{ \frac{1}{N} \sum_{i=1}^{N} \frac{[D_i - \hat{p}(W_i)]\, Y_i}{1 - \hat{p}(W_i)} \right\} \tag{38} \]
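A small numerical sketch of the weighting estimators (37) and (38) follows. It is an illustration under assumptions that are not in the slides: W takes three values, the propensity score is estimated by the treated share within each W cell (in place of a logit or probit fit), and the outcome design is invented so that ATE = 1.5 and TT = 1.7 by construction.

```python
import random


def cell_propensities(d, w):
    """Estimate p(w) = Pr(D = 1 | W = w) by the treated share in each W cell."""
    counts, treated = {}, {}
    for di, wi in zip(d, w):
        counts[wi] = counts.get(wi, 0) + 1
        treated[wi] = treated.get(wi, 0) + di
    return {wi: treated[wi] / counts[wi] for wi in counts}


def ipw_estimates(y, d, w):
    """Propensity-score weighting estimators of ATE (37) and TT (38)."""
    p_hat = cell_propensities(d, w)
    n = len(y)
    ate = sum((di - p_hat[wi]) * yi / (p_hat[wi] * (1.0 - p_hat[wi]))
              for yi, di, wi in zip(y, d, w)) / n
    tt_numerator = sum((di - p_hat[wi]) * yi / (1.0 - p_hat[wi])
                       for yi, di, wi in zip(y, d, w)) / n
    tt = tt_numerator / (sum(d) / n)
    return ate, tt


def simulate(n=30000, seed=0):
    """Hypothetical design: W in {0, 1, 2}, treatment more likely at high W."""
    rng = random.Random(seed)
    true_p = {0: 0.2, 1: 0.5, 2: 0.8}          # overlap holds: 0 < p(W) < 1
    y, d, w = [], [], []
    for _ in range(n):
        wi = rng.choice([0, 1, 2])
        di = 1 if rng.random() < true_p[wi] else 0
        y0 = wi + rng.gauss(0.0, 1.0)
        y1 = y0 + 1.0 + 0.5 * wi               # effect rises with W
        y.append(y1 if di == 1 else y0)
        d.append(di)
        w.append(wi)
    return y, d, w


ate_hat, tt_hat = ipw_estimates(*simulate())
print(f"ATE_hat = {ate_hat:.3f}, TT_hat = {tt_hat:.3f}")
```

With cell-frequency propensity scores, (37) reduces algebraically to the cell-by-cell estimator in (31) with sample shares in place of s_m, which is one way to see why the overlap condition 0 < p(W) < 1 is needed: a cell with no treated or no untreated units has no within-cell comparison.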

Matching: Propensity Score Matching Estimators
PSM estimators take the form:
\[ \hat{\tau} = \frac{1}{n_1} \sum_{i \in I_1 \cap S_P} [Y_{1i} - \hat{Y}_{0i}] \tag{39} \]
with
\[ \hat{Y}_{0i} = \sum_{j \in I_0} \hat{W}(i, j)\, Y_{0j} \tag{40} \]
where
- $I_1$ denotes the set of treatment observations
- $I_0$ denotes the set of comparison observations
- $n_1$ denotes the number of treatment observations
- $S_P$ denotes the region of common support
- $\hat{W}(i, j)$ are weights that depend upon the distance between the propensity scores for $i$ and $j$

Matching: The Choice of Weights
Nearest neighbor matching:
\[ \hat{W}(i, j) = \begin{cases} 1 & j = \arg\min_{k \in I_0} |\hat{P}_i - \hat{P}_k| \\ 0 & \text{otherwise} \end{cases} \tag{41} \]
- frequently used because of ease of implementation
- a single alternative individual serves as the counterfactual for the treated individual
Nearest $k$ neighbors matching trades off reduced variance (more information is used to construct the counterfactual) against increased bias (on average, poorer matches).

Matching: The Choice of Weights (cont'd)
Caliper matching:
\[ \hat{W}(i, j) = \begin{cases} \frac{1}{n_i} & |\hat{P}_i - \hat{P}_j| < c \\ 0 & \text{otherwise} \end{cases} \tag{42} \]
where $n_i$ denotes the number of caliper matches for observation $i$.
Note: Treated individuals for whom no matches can be found are excluded from the analysis.
Stratification:
\[ \hat{W}(i, j) = \begin{cases} \frac{1}{n_i} & \hat{P}_j \in T_i \\ 0 & \text{otherwise} \end{cases} \tag{43} \]
where
- $T_i$ denotes the propensity score stratum for observation $i$
- $n_i$ denotes the number of stratum matches for observation $i$

Matching: The Choice of Weights (cont'd)
Kernel (e.g., Heckman, Ichimura, and Todd; 1997, 1998):
\[ \hat{W}(i, j) = \frac{G\left( \frac{\hat{P}_j - \hat{P}_i}{a_n} \right)}{\sum_{k \in I_0} G\left( \frac{\hat{P}_k - \hat{P}_i}{a_n} \right)} \tag{44} \]
where
- $G(s)$ is a kernel function; e.g., the biweight kernel $G(s) = \frac{15}{16}(s^2 - 1)^2$ for $|s| \le 1$ (zero otherwise)
- $a_n$ is a bandwidth parameter
Local linear matching: see Fan (1992).
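The weighting schemes in (41), (42), and (44) can be sketched in a few lines. This is an illustrative implementation only, with hypothetical function names; it takes the estimated propensity scores as given, uses the biweight kernel with support |s| <= 1, and drops treated observations that receive no match, as noted for caliper matching above.

```python
def nearest_neighbor_weights(p_i, p_controls):
    """Weights (41): all weight on the closest comparison observation."""
    j_star = min(range(len(p_controls)), key=lambda j: abs(p_i - p_controls[j]))
    return [1.0 if j == j_star else 0.0 for j in range(len(p_controls))]


def caliper_weights(p_i, p_controls, c):
    """Weights (42): equal weight on comparisons within distance c, else None."""
    inside = [abs(p_i - p_j) < c for p_j in p_controls]
    n_i = sum(inside)
    if n_i == 0:
        return None                      # no match: observation i is excluded
    return [1.0 / n_i if hit else 0.0 for hit in inside]


def biweight(s):
    """Biweight kernel G(s) = (15/16)(s^2 - 1)^2 on |s| <= 1, zero elsewhere."""
    return 15.0 / 16.0 * (s * s - 1.0) ** 2 if abs(s) <= 1.0 else 0.0


def kernel_weights(p_i, p_controls, a_n):
    """Weights (44): kernel weights normalized to sum to one, else None."""
    g = [biweight((p_j - p_i) / a_n) for p_j in p_controls]
    total = sum(g)
    if total == 0.0:
        return None                      # no comparison inside the bandwidth
    return [g_j / total for g_j in g]


def matched_tt(y_treated, p_treated, y_controls, p_controls, weight_fn):
    """Estimator (39)-(40): average of Y_1i minus the weighted counterfactual."""
    effects = []
    for y1, p_i in zip(y_treated, p_treated):
        weights = weight_fn(p_i, p_controls)
        if weights is None:
            continue
        y0_hat = sum(w_j * y_j for w_j, y_j in zip(weights, y_controls))
        effects.append(y1 - y0_hat)
    if not effects:
        raise ValueError("no treated observation could be matched")
    return sum(effects) / len(effects)


# Toy usage with made-up scores and outcomes.
p_controls = [0.10, 0.40, 0.45, 0.90]
y_controls = [0.0, 1.0, 2.0, 5.0]
print(matched_tt([3.0], [0.42], y_controls, p_controls, nearest_neighbor_weights))
print(matched_tt([3.0], [0.42], y_controls, p_controls,
                 lambda p, pc: caliper_weights(p, pc, c=0.05)))
```

Passing the weight function as an argument keeps the estimator in (39) separate from the choice of weights, which mirrors how the slides present the alternatives. Note that this sketch averages over the matched treated observations rather than dividing by n_1.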

Matching: Other Decisions
- matching with or without replacement
  - again, the tradeoff here is between bias and variance
- trimming the support region
  - focus analysis on that region such that
\[ \Pr[\hat{p}(W) > 0] > 0 \tag{45} \]
\[ \Pr[1 - \hat{p}(W) > 0] > 0 \tag{46} \]
  - nonparametric density estimators can be used for $p(W)$
  - typically, stricter requirements are placed on the support, with
\[ \Pr[\hat{p}(W) > 0] > c \tag{48} \]
\[ \Pr[1 - \hat{p}(W) > 0] > c \tag{49} \]

Matching: Other Decisions (cont'd)
- difference-in-difference matching
  - uses time series differencing to eliminate unobserved temporally invariant effects
  - requires before and after treatment observations for both treated and untreated individuals
- conditional matching (e.g., common region, school, etc.)
- the choice of the comparison sample. Heckman et al. (1997, 1998) argue for the following criteria:
  - same data source
  - individuals reside in the same market
  - data contain a rich set of variables affecting outcomes and treatment group

Matching: Example #1, Heckman, Ichimura, and Todd (1997) (HIT7)
Use data from the National Job Training Partnership Act (JTPA) Experiment, including
- randomized-out controls
- an eligible nonparticipants (ENP) comparison group
In this paper, the authors
- decompose the bias in earnings differences
- test the assumptions underlying matching, rejecting most of them
- evaluate the performance of different matching routines
- emphasize the importance of a good comparison group

Matching: Decomposing Evaluation Bias in TT
The bias in propensity score matching estimators (PSME) can be decomposed as follows:
\[ B = \int_{S_1} E(Y_0 \mid X, D = 1) f(X \mid D = 1)\, dX - \int_{S_0} E(Y_0 \mid X, D = 0) f(X \mid D = 0)\, dX \tag{51-52} \]
\[ \phantom{B} = B_1 + B_2 + B_3 \tag{53} \]
where
\[ B_1 = \int_{S_1 \setminus S_{10}} E(Y_0 \mid X, D = 1) f(X \mid D = 1)\, dX - \int_{S_0 \setminus S_{10}} E(Y_0 \mid X, D = 0) f(X \mid D = 0)\, dX \tag{54-55} \]
which is the bias due to non-overlapping support.

Matching: Decomposing Evaluation Bias in TT (cont'd)
\[ B_2 = \int_{S_{10}} E(Y_0 \mid X, D = 0) [f(X \mid D = 1) - f(X \mid D = 0)]\, dX \tag{56} \]
which is the bias due to differing distributions in $X$.
\[ B_3 = \int_{S_{10}} [E(Y_0 \mid X, D = 1) - E(Y_0 \mid X, D = 0)] f(X \mid D = 1)\, dX \tag{57} \]
which is the bias due to selection on unobservables.
PSME attempts to address $B_1$ and $B_2$, but assumes away $B_3$.

[Exhibit: Overlap - Adult Males (HIT7)]

[Exhibit: Overlap - Male Youths (HIT7)]

[Exhibit: Decomposition of Bias - ENPs (HIT7)]

[Exhibit: Decomposition of Bias - SIPPs (HIT7)]

[Exhibit: Testing Key Assumptions (HIT7), three slides]

[Exhibit: Impact of Weights (HIT7)]

[Exhibit: Impact of Conditioning Variables (HIT7)]

Matching: Example #2, Dehejia and Wahba (2002) REStat
Use data on the National Supported Work (NSW) demonstration
- this is a randomized experiment
- DW compare experimental treatment effect estimates to those obtained using two comparison samples:
  - Panel Study of Income Dynamics (PSID)
  - Current Population Survey (CPS)
A variety of matching algorithms are considered.

[Exhibit: Without Replacement]

[Exhibit: Without Replacement (cont'd)]

[Exhibit: Sample Characteristics]

[Exhibit: Bias Estimates]

Matching: Example #3, Smith and Todd (2005)
Repeat the exercise in DW, but
- investigate alternative sample definitions
- estimate bias by using PSMEs on NSW randomized controls
- add difference-in-difference matching
General conclusions:
- PSME are not a silver bullet for nonexperimental situations
- The performance of PSME in DW is not generalizable, varying by sample definition
- Difference-in-difference matching performed substantially better than cross-sectional matching alone
- Details of the matching procedure generally had little impact, including
  - type of matching (nearest neighbor, local linear, etc.)
  - propensity score estimation procedure

Matching: Bias Associated with Alternative Estimators - ST

Table 5. Bias associated with alternative cross-sectional matching estimators. Comparison groups: (A) CPS male sample and (B) PSID male sample. Dependent variable: real earnings in 1978 (bootstrap standard errors in parentheses; trimming level for common support is 2 percent).
Source: J.A. Smith, P.E. Todd / Journal of Econometrics 125 (2005) 305-353, p. 336.

Columns:
(1) Mean diff.
(2) 1 nearest neighbor without common support
(3) 10 nearest neighbors without common support
(4) 1 nearest neighbor with common support
(5) 10 nearest neighbors with common support
(6) Local linear matching (bw = 1.0)
(7) Local linear matching (bw = 4.0)
(8) Local linear regression adjusted matching (bw = 1.0)
(9) Local linear regression adjusted matching (bw = 4.0)

(A) Comparison group: CPS male sample

                     (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
LaLonde sample with DW prop. score model
  Bias              9757     555     270     838    1299    1380    1431    1406    1329
  (s.e.)           (255)   (596)   (493)   (628)   (529)   (437)   (441)   (490)   (441)
  As % of $886     1101%     63%     30%     95%    147%    156%    162%    159%    150%
  (s.e.)            (29)    (67)    (56)    (71)    (60)    (49)    (50)    (55)    (50)
DW sample with DW prop. score model
  Bias             10291     407       5      27     261      88      67     127      96
  (s.e.)           (306)   (698)   (672)   (723)   (593)   (630)   (611)   (709)   (643)
  As % of $1794     574%     23%    0.3%    1.5%     15%      5%      4%      5%      7%
  (s.e.)            (17)    (39)    (37)    (40)    (33)    (35)    (34)    (40)    (36)
Early RA sample with DW prop. score model
  Bias             11101    7781    3632    5417    2396    3427    2191    3065    3391
  (s.e.)           (461)  (1245)  (1354)  (1407)  (1152)  (1927)  (1069)  (3890)  (1124)
  As % of $2748     404%    283%    132%    197%     87%    125%     80%    112%    123%
  (s.e.)            (17)    (45)    (49)    (51)    (42)    (70)    (39)   (142)    (41)
LaLonde sample with LaLonde prop. score model
  Bias             10227    3602    2122    3586    2342    3562    2708    3435    2362
  (s.e.)           (296)  (1459)  (1299)  (1407)  (1165)  (3969)  (1174)  (4207)  (1178)
  As % of $886     1154%    406%    240%    405%    264%    402%    306%    388%    266%
  (s.e.)            (33)   (165)   (147)   (159)   (131)   (448)   (133)   (474)   (133)

Matching: Example #4, List, Millimet, Fredriksson, and McHone (2003) REStat
- Treatment: Nonattainment designation
- Outcome of interest: County-level dirty plant births in New York
- 176 treatment observations
- Caliper matching
  - conditional matching considered for
    - within region and year
    - within year
  - matches are obtained for 8 to 81 of the treatment observations (depending on the use of conditional matches)
- Difference-in-difference estimates using clean plant births as control

[Exhibit: Propensity Score Estimates of Attainment Effects]

Matching: A Simple Experiment
Let
\[ Y_{1i} = 2 + 2X_{1i} + X_{2i} + 2X_{3i} + \epsilon_{1i} \tag{58} \]
\[ Y_{0i} = 1 + X_{1i} + 2X_{2i} + X_{3i} + \epsilon_{0i} \tag{59} \]
\[ Y_{Di} = 4 + X_{1i} + X_{2i} + X_{3i} + X_{4i} + \epsilon_{Di} \tag{60} \]
with
\[ (\epsilon_{1i}, \epsilon_{0i}, \epsilon_{Di}) \overset{iid}{\sim} N(0, I_3) \tag{61} \]
\[ X_i \overset{iid}{\sim} N(1, \Sigma) \tag{62} \]
\[ \Sigma = \sigma_D^2 \begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix} \tag{63} \]

[Exhibit: RMSE Using Full Set of Conditioning Variables]

[Exhibit: RMSE Omitting X_1]

[Exhibit: RMSE Omitting X_2]

Difference-in-Difference (DID)
It is tempting to evaluate a policy intervention (e.g., a job training program, an experimental rate structure, etc.) by examining the outcome of interest before and after the policy is in place.
The problem with this approach is that the observed changes are potentially confounded with other temporal changes.
The Difference-in-Difference (DID) approach attempts to control for these changes through the use of an untreated comparison group.
Applications of DID approaches are commonplace in the treatment effects literature, including evaluations of:
1 labor market programs (Ashenfelter and Card, 1985);
2 minimum wage (Card and Krueger, 1993);
3 workers' compensation (Meyer, Viscusi, and Durbin, 1995);
4 the inflow of immigrants (Card, 1990);
5 retirement plans (Poterba, Venti, and Wise, 1995);
6 universal pre-kindergarten (Fitzpatrick, 2008);
7 air pollution regulation (Becker and Henderson, 2000);
8 speed limits (Ashenfelter and Greenstone, 2004)

Difference-in-Difference: The DID Literature
*Meyer, B. (1995), Natural and Quasi-Experiments in Economics, Journal of Business and Economic Statistics, 13(2): 151-61.
*Fitzpatrick, M. D. (2008), Starting School at Four: The Effects of Universal Pre-Kindergarten on Children's Academic Achievement, B.E. Journal of Economic Analysis & Policy, 8(1), Article 46.
Athey, S., and G. W. Imbens (2006), Identification and Inference in Nonlinear Difference-in-Differences Models, Econometrica, 74: 431-497.
Meyer, B. D., W. K. Viscusi, and D. L. Durbin (1995), Workers' Compensation and Injury Duration: Evidence from a Natural Experiment, American Economic Review, 85: 322-340.
Ashenfelter, O., and M. Greenstone (2004), Using Mandated Speed Limits to Measure the Value of a Statistical Life, Journal of Political Economy, 112(1): S226-S266.

Difference-in-Difference: First Differencing
Suppose that we observe our outcome of interest for the treatment group before and after the policy intervention; i.e.,
\[ y_{it} = \alpha + \beta d_t + \epsilon_{it}, \quad i = 1, \ldots, N; \ t = 0, 1, \tag{64} \]
where
- $d_t$ is a dummy variable that equals 1 after the policy intervention and 0 otherwise;
- $y_{it}$ denotes the outcome variable of interest;
- $\epsilon_{it}$ denotes the residual term.
Running OLS for this model yields an estimate of the treatment effect:
\[ \hat{\beta}_d = \frac{1}{N} \sum_i (y_{i1} - y_{i0}) = \bar{y}_1 - \bar{y}_0 \tag{65} \]
The key identifying assumption is that, absent the treatment, there would have been no systematic change; i.e., $E(\epsilon_{it} \mid d_t) = 0$.

Difference-in-Difference: Difference-in-Differences
If, however, there were changes in other factors over time, then the treatment effect is no longer identified.
The Difference-in-Difference (DID) approach addresses this problem by introducing a second group that never faces the treatment.
Specifically, using Meyer's (1995) notation, we have
\[ y_{it}^{j} = \alpha + \alpha_1 d_t + \alpha^1 d^j + \beta d_t^j + \epsilon_{it}^{j}, \quad i = 1, \ldots, N; \ t = 0, 1; \ j = 0, 1 \tag{66} \]
where
- $j$ denotes the groups, with $j = 1$ denoting the treatment group and $j = 0$ denoting the comparison group;
- $d^j = 1$ for $j = 1$, and $= 0$ otherwise;
- $d_t = 1$ for the post-treatment time period, and $= 0$ otherwise;
- $d_t^j = d_t d^j$.

Difference-in-Difference: The Differences for Group 1
Note that we then have:
\[ y_{i0}^{1} = \alpha + \alpha^1 + \epsilon_{i0}^{1} \tag{67} \]
and
\[ y_{i1}^{1} = \alpha + \alpha_1 + \alpha^1 + \beta + \epsilon_{i1}^{1} \tag{68} \]
so that:
\[ y_{i1}^{1} - y_{i0}^{1} = \alpha_1 + \beta + (\epsilon_{i1}^{1} - \epsilon_{i0}^{1}) \tag{69} \]
This illustrates the confounding problem in identifying $\beta$ with only group 1: the time effect $\alpha_1$ cannot be separated from $\beta$.

Difference-in-Difference: Using the Differences for Group 0
We also have:
\[ y_{i0}^{0} = \alpha + \epsilon_{i0}^{0} \tag{70} \]
and
\[ y_{i1}^{0} = \alpha + \alpha_1 + \epsilon_{i1}^{0} \tag{71} \]
so that:
\[ y_{i1}^{0} - y_{i0}^{0} = \alpha_1 + (\epsilon_{i1}^{0} - \epsilon_{i0}^{0}) \tag{72} \]
Differencing these differences yields
\[ (y_{i1}^{1} - y_{i0}^{1}) - (y_{i1}^{0} - y_{i0}^{0}) = [\alpha_1 + \beta + (\epsilon_{i1}^{1} - \epsilon_{i0}^{1})] - [\alpha_1 + (\epsilon_{i1}^{0} - \epsilon_{i0}^{0})] \tag{73-74} \]
\[ \phantom{(y_{i1}^{1} - y_{i0}^{1}) - (y_{i1}^{0} - y_{i0}^{0})} = \beta + [(\epsilon_{i1}^{1} - \epsilon_{i0}^{1}) - (\epsilon_{i1}^{0} - \epsilon_{i0}^{0})] \tag{75} \]
The DID estimate of $\beta$ results from applying OLS to (66), yielding
\[ \hat{\beta}_{dd} = (\bar{y}_1^1 - \bar{y}_0^1) - (\bar{y}_1^0 - \bar{y}_0^0) \tag{76} \]

Difference-in-Difference: Using DID
The key assumption here is that $E(\epsilon_{it}^{j} \mid d_t, d^j) = 0$.
This will require that the intertemporal changes are not group specific.
The DID approach can be generalized to control for differences in the distributional characteristics of the treatment and comparison groups by including control variables in the regression:
\[ y_{it}^{j} = \alpha + \alpha_1 d_t + \alpha^1 d^j + \beta d_t^j + z_{it}^{j} \delta + \epsilon_{it}^{j}, \quad i = 1, \ldots, N; \ t = 0, 1; \ j = 0, 1 \tag{77} \]
We'll look briefly at two applications:
- Fitzpatrick, M. D. (2008), Starting School at Four: The Effects of Universal Pre-Kindergarten on Children's Academic Achievement, B.E. Journal of Economic Analysis & Policy, 8(1), Article 46.
- Meyer, B. D., W. K. Viscusi, and D. L. Durbin (1995), Workers' Compensation and Injury Duration: Evidence from a Natural Experiment, American Economic Review, 85: 322-340.
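The DID estimator in (76) is simple enough to sketch directly from the four group-by-period means. The example below is illustrative only: the parameter values (alpha = 1, a time effect of 0.5, a group effect of 2, and beta = 1.5) are invented, the errors are i.i.d. so that the key assumption holds by construction, and the data are treated as repeated cross sections.

```python
import random


def did_from_cell_means(y, post, group):
    """Return (DID estimate as in (76), before/after change for group 1)."""
    sums, counts = {}, {}
    for yi, t, j in zip(y, post, group):
        sums[(j, t)] = sums.get((j, t), 0.0) + yi
        counts[(j, t)] = counts.get((j, t), 0) + 1
    ybar = {cell: sums[cell] / counts[cell] for cell in sums}
    change_treated = ybar[(1, 1)] - ybar[(1, 0)]       # alpha_1 + beta, as in (69)
    change_comparison = ybar[(0, 1)] - ybar[(0, 0)]    # alpha_1 only, as in (72)
    return change_treated - change_comparison, change_treated


def simulate(n=5000, seed=0, alpha=1.0, alpha_time=0.5, alpha_group=2.0, beta=1.5):
    """Hypothetical data from model (66) with n observations per cell."""
    rng = random.Random(seed)
    y, post, group = [], [], []
    for j in (0, 1):
        for t in (0, 1):
            for _ in range(n):
                y.append(alpha + alpha_time * t + alpha_group * j
                         + beta * t * j + rng.gauss(0.0, 1.0))
                post.append(t)
                group.append(j)
    return y, post, group


did_hat, before_after = did_from_cell_means(*simulate())
print(f"before/after for group 1 = {before_after:.3f}, DID = {did_hat:.3f}")
```

In this design the before/after comparison for the treatment group alone recovers roughly 2.0 (the time effect plus beta), while differencing out the comparison group's change recovers roughly 1.5. Because (66) is saturated in the two dummies and their interaction, OLS on that equation reproduces the same number as the cell-mean calculation.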

Fitzpatrick (2008)

In this article, the author looks at the impact of Universal Pre-Kindergarten (UPK) in Georgia. The article uses data from the National Assessment of Educational Progress (NAEP). The treatment group is students in Georgia, and the comparison group is students in other states. The estimation equation becomes:

$$Y_{ijt} = \alpha + \beta\, \mathit{UPK}_{it} + \sigma X_{ijt} + \gamma Z_{jt} + \mathit{State}_i + \theta_t + \epsilon_{ijt} \tag{78}$$

where
- $X_{ijt}$ denotes a vector of child characteristics (e.g., gender);
- $Z_{jt}$ denotes a vector of school characteristics (e.g., rural, racial make-up);
- $\mathit{UPK}_{it}$ denotes a dummy variable for UPK treatment;
- $\mathit{State}_i$ denotes a state dummy variable; and
- $\theta_t$ denotes a time dummy variable.

Additional control variables were included in the analysis.

Math Test Score Comparison

[Figure 4. Standardized 4th Grade NAEP Scores, Georgia vs. Rest of the U.S. (line indicates last pre-program cohort). Panel A. Mathematics Scores. Series: Georgia, Other States. Horizontal axis: Year (1996, 2000, 2003, 2005). Vertical axis: Standardized Score. Source: The B.E. Journal of Economic Analysis & Policy, Vol. 8 [2008], Iss. 1 (Advances), Art. 46.]

Reading Test Score Comparison

[Figure 4, Panel B. Reading Scores. Series: Georgia, Other States. Horizontal axis: Year (1994, 1998, 2002, 2003, 2005). Vertical axis: Standardized Score.]

Note: Based on the author's calculations from the State NAEP Restricted Use files. Test scores have been standardized to have mean zero and standard deviation of one in 1996 for math and 1994 for reading. Survey population weights were used.

Basic Results

Source: The B.E. Journal of Economic Analysis & Policy, Vol. 8 [2008], Iss. 1 (Advances), Art. 46, http://www.bepress.com/bejeap/vol8/iss1/art46

Table 4: Difference-in-Differences Estimates of the Effect of Universal Pre-K in Georgia on Test Scores and Probability of Being On-Grade

                        (I)        (II)       (III)      (IV)              (V)        (VI)       (VII)
Math Score              0.027      0.025      0.017      (-0.007, 0.092)   0.013      0.011      0.008
                        (0.006)    (0.007)    (0.006)                      {0.111}    (0.008)    (0.006)
Reading Score           0.008      0.025      0.024      (-0.005, 0.077)   0.009      0.017      0.013
                        (0.007)    (0.002)    (0.020)                      {0.350}    (0.016)    (0.012)
On-grade                0.015      -0.012     -0.005     (-0.035, 0.036)   0.008      0.006      0.007
                        (0.006)    (0.007)    (0.005)                      {0.026}    (0.007)    (0.007)

Specification Details
Observation Level       Student    Student    Student                      State      Student    Student
Grades Included         4          4          4 & 8                        4          4          4 & 8
Controls Included       N          Y          Y                            Y          Y          Y
Clustering              State      State      State                        n/a        State      State
Weighting               Survey     Survey     Survey                       Synthetic  Synthetic  Synthetic

Number of Observations
Math Score              537,112    537,112    1,013,847  537,112           27         406,914    773,734
Reading Score           714,894    714,894    1,397,312  714,894           20         156,941    269,860
On-grade                1,241,994  1,241,994  2,468,988  1,241,994         29         111,422    218,836

Note: Based on the author's calculations using the NAEP. All regressions include state and year fixed effects as well as controls for student and school characteristics. Survey weights were used. See Rogers and Stoeckel (2004) for more information. The dependent variables in the first two sets of rows are an individual child's plausible test score on the Mathematics and Reading Assessments, respectively. The scores have been standardized by the mean and standard deviation of the first year of data for that subject. The dependent variable in the third set of rows is a dummy variable for whether the child was at or above the median age for his/her state, grade and cohort. The estimates in the third row are from linear probability models using all years of Mathematics and Reading data. Standard errors are in parentheses. Estimates allow for arbitrary correlation of the error terms at the state level. The fourth column gives the 90% confidence interval range using the methods detailed in Conley and Taber (2006). The last three columns report results using the synthetic control methods from Abadie et al. (2007) as detailed in the text. In the fifth column, the {} contain probability values of the estimate being within the 95 percent

Basic Results

Source: The B.E. Journal of Economic Analysis & Policy, Vol. 8 [2008], Iss. 1 (Advances), Art. 46, http://www.bepress.com/bejeap/vol8/iss1/art46

Table 5. Difference-in-Differences Estimates of the Effect of Universal Pre-K on Students' Test Scores and Probability of Being On-Grade of Students by Race and School Lunch Eligibility Status

                        (I)        (II)       (III)      (IV)
Race                    White      Black      White      Black
School Lunch Eligible   No         No         Yes        Yes
Math Score              0.036      -0.009     0.082      0.000
                        (0.007)    (0.015)    (0.008)    (0.011)
                        96,148     17,670     7,738      47,916
Reading Score           -0.009     0.018      -0.024     -0.013
                        (0.007)    (0.015)    (0.025)    (0.019)
                        204,767    26,979     89,092     67,314
On-grade                -0.001     0.060      0.020      0.025
                        (0.005)    (0.022)    (0.004)    (0.010)
                        370,227    50,342     22,462     107,371

Note: Based on the author's calculations using the National Assessment of Educational Progress. Column headers indicate the subgroups of the population included in the sample. The first row of each set represents the coefficient estimates, the second row (in parentheses) reports the standard error of the estimate above it and the third row reports the number of observations used in estimation. Test scores have been normalized by the average standard deviation for all plausible values in the first year of data for that test. All regressions include year and state fixed effects. Controls for student and school characteristics included are described in the text. To correctly account for the design of the survey, weights were used (Rogers and Stoeckel 2004). Estimates allow for arbitrary correlation of the error terms at the state level. Synthetic control groups were created using the Abadie, Diamond and Hainmueller (2007) method as detailed in the text. Estimates in bold are significant at the five percent level or lower.

The results in Table 5 show that the math scores of some children improved because of the introduction of Universal Pre-K in Georgia. The math scores of Caucasian children ineligible for NSLP increased by 3.6 percent of a standard deviation. Similarly, the math scores of NSLP-eligible Caucasian children increased by 8.2 percentage points. However, the estimates of the program's introduction on the math scores of African-American children and on the reading scores of any of these groups are not statistically different from zero. With the exception of Caucasian NSLP-ineligible children, the introduction of Universal Pre-K produced increases in the probability of fourth graders in Georgia being on-grade for their age. African-Americans who were

Meyer, Viscusi, and Durbin (1995)

In this article, the authors look at the impact of workers' compensation on the duration of claims. The idea is that higher benefits may cause workers to stay out longer (either to get better or simply to enjoy the additional leisure). The problem is that benefits are typically tied to previous earnings, which also strongly influence the payoff from returning to work. MVD use changes in the maximum weekly benefits cap in Michigan and Kentucky as their natural experiment. The treatment group is then composed of individuals in the affected earnings bracket, whereas the comparison group is individuals for whom the original cap was not binding.

[Slide: The Quasi-Experiment]

[Slide: Simple DID Estimates]

[Slide: Distributional Shift]

[Slide: DID Estimates with Covariates]

Regression Discontinuity

Regression Discontinuity (RD) design takes advantage of the fact that, for some treatments, access to the treatment is a discontinuous function of one or more variables. For example:
- Thistlethwaite and Campbell (1960) studied the effect of student scholarships on career aspirations, where scholarships were awarded only above a specific test score threshold;
- Angrist and Lavy (1999) studied the effect of class size on student test scores, using the Maimonides Rule requiring classes to be split when they reached a given threshold;
- Van der Klaauw (2003) studied the effect of financial aid offers on college attendance, using a rule that relates aid to student SAT scores and GPA;
- Hahn et al. (1999) studied the impact of an anti-discrimination law, using the fact that it applied only to firms with at least 15 employees;
- Matsudaira (2007) studied the effect of a remedial summer school program, mandatory for students with a test score below a given level;
- Card et al. (2004) studied the effect of medical services, where availability is restricted by age.

[Figure: Regression Discontinuity, Illustration, van der Klaauw (2003)]
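To make the idea of a treatment that switches on at a threshold concrete, here is a minimal sketch of a sharp RD estimate on simulated data. It fits a separate straight line on each side of the cutoff within a fixed window and takes the gap between the two fitted values at the cutoff. The data-generating process, the cutoff, the bandwidth `h`, and the uniform (rectangular) weighting are all assumptions chosen for illustration; none of them comes from the studies listed above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: treatment is assigned when the running variable x >= cutoff.
n, cutoff, h, jump = 4000, 0.0, 0.5, 2.0
x = rng.uniform(-1.0, 1.0, size=n)       # running variable (e.g., a test score)
d = (x >= cutoff).astype(float)          # sharp assignment rule
y = 1.0 + 0.8 * x + jump * d + rng.normal(0.0, 0.5, size=n)

def intercept_at_cutoff(xs, ys):
    """Fit y = a + b (x - cutoff) by OLS and return a, the fitted value at the cutoff."""
    X = np.column_stack([np.ones_like(xs), xs - cutoff])
    coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return coef[0]

# Use only observations within the bandwidth h on each side of the cutoff.
right = (x >= cutoff) & (x < cutoff + h)
left = (x < cutoff) & (x >= cutoff - h)

rd_estimate = intercept_at_cutoff(x[right], y[right]) - intercept_at_cutoff(x[left], y[left])
print(f"Estimated jump at the cutoff: {rd_estimate:.3f}")
```

In applied work the bandwidth and the weighting scheme are choices that need justification; Imbens and Lemieux (2008) in the readings discuss them in detail.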

Regression Discontinuity - Readings

- Angrist and Lavy (1999), Using Maimonides' rule to estimate the effect of class size on scholastic achievement, Quarterly Journal of Economics 114: 533-575.
- Hahn, J., P. Todd, and W. van der Klaauw (2000), Identification and Estimation of Treatment Effects with a Regression Discontinuity Design, Econometrica 69(1): 201-209.
- Van der Klaauw (2003), Estimating the effect of financial aid offers on college enrollment: a regression-discontinuity approach, International Economic Review 43: 1249-1287.
- Imbens and Lemieux (2008), Regression Discontinuity Designs: A Guide to Practice, Journal of Econometrics, 142(2): 615-635.

Partial Identification

Much of the treatment effects literature that we have considered relies on relatively strong assumptions in order to identify the treatment impact.
- Propensity Score requires strong ignorability.
- Instrumental Variables requires mean independence of the outcome variable with respect to the instrument.
- Difference-in-Differences requires inter-temporal factors to be independent of the treatment.
- Two-Step Heckman corrections require distributional assumptions.

There is a growing strand of literature that attempts to avoid these strong assumptions and determine what can be said regarding a treatment effect using relatively weak assumptions. In most instances, the approach yields bounds on the treatment effect, partially identifying rather than point identifying the treatment effect. Manski has been pivotal in this line of research.

Partial Identification - Readings

- Manski, C. (1990), Nonparametric Bounds on Treatment Effects, American Economic Review, Papers and Proceedings, 80: 319-323.
- Manski, C. (1997), Monotone Treatment Response, Econometrica, 65: 1311-1334.
- Manski, C. (2007), Identification for Prediction and Decision, Cambridge, MA: Harvard University Press.
- Manski, C. and J. Pepper (2000), Monotone Instrumental Variables: With an Application to the Returns to Schooling, Econometrica, 68: 997-1010.
- Manski, C. and J. Pepper (2009), More on Monotone Instrumental Variables, The Econometrics Journal, 12: S200-S216.
- Lechner, M., and M. Blaise (2010), Partial Identification of Wage Effects of Training Programs, Working paper 2010-8, Brown University.
- Kreider, B., and S. Hill (2009), Partially Identifying Treatment Effects with an Application to Covering the Uninsured, Journal of Human Resources, 44(2): 409-449.
- Kreider, B. and J. Pepper (2007), Disability and Employment: Reevaluating the Evidence in Light of Reporting Errors, Journal of the American Statistical Association, 102: 432-441.

Basic Notation

- Let $y_j(t)$ denote the outcome of interest for individual $j = 1, \dots, J$ given treatment $t \in T$.
- Let $z_j \in T$ denote the realized treatment for individual $j$.
- For $T = \{0, 1\}$, the ATE of interest can be written as

$$ATE = E[y(1) \mid x] - E[y(0) \mid x] \tag{79}$$

- Absent any assumptions or restrictions, the ATE is unbounded; i.e., $ATE \in (-\infty, \infty)$.
- In the case of a binary outcome variable, $ATE \in [-1, 1]$.

Rewriting the Components of ATE

We can rewrite the ATE using the fact that:

$$E[y(1) \mid x] = E[y(1) \mid x, z = 1]\, P(z = 1 \mid x) + E[y(1) \mid x, z = 0]\, P(z = 0 \mid x)$$
$$\qquad = E[y(1) \mid x, z = 1] + P(z = 0 \mid x) \left\{ E[y(1) \mid x, z = 0] - E[y(1) \mid x, z = 1] \right\}$$
$$\qquad = E[y(1) \mid x, z = 1] + \Psi_1$$

Similarly:

$$E[y(0) \mid x] = E[y(0) \mid x, z = 0] + P(z = 1 \mid x) \left\{ E[y(0) \mid x, z = 1] - E[y(0) \mid x, z = 0] \right\}$$
$$\qquad = E[y(0) \mid x, z = 0] + \Psi_0$$

The trick is to bound the unknown components of $\Psi_0$ and $\Psi_1$, namely the counterfactual means $E[y(0) \mid x, z = 1]$ and $E[y(1) \mid x, z = 0]$, which the data never reveal.

The Worst Case Bounds

Suppose that we can bound the outcome space, with $y(t) \in [K_l, K_u]$. Then

$$LB_t \leq E[y(t) \mid x] \leq UB_t \tag{80}$$

where

$$LB_t = E[y(t) \mid x, z = t] + P(z \neq t \mid x) \left\{ K_l - E[y(t) \mid x, z = t] \right\} \tag{81}$$

and

$$UB_t = E[y(t) \mid x, z = t] + P(z \neq t \mid x) \left\{ K_u - E[y(t) \mid x, z = t] \right\} \tag{82}$$

so that

$$LB_1 - UB_0 \leq ATE \leq UB_1 - LB_0 \tag{83}$$

The width of these bounds is $K_u - K_l$, half the width of the interval $[K_l - K_u,\, K_u - K_l]$ implied by the outcome range alone. We have therefore shrunk the bounds considerably, but the bounds necessarily include zero.
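The worst-case bounds in (81) to (83) are simple enough to compute by hand, and a small numerical sketch may help fix ideas. The function below drops the conditioning on $x$ for brevity and works with a binary treatment; the function name, the argument names, and the toy data are assumptions made for illustration.

```python
import numpy as np

def worst_case_ate_bounds(y, z, k_l, k_u):
    """Worst-case bounds on ATE = E[y(1)] - E[y(0)] for a binary treatment.

    y : observed outcomes, assumed to lie in [k_l, k_u]
    z : realized treatment (0 or 1) for each individual
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    lb, ub = {}, {}
    for t in (0, 1):
        mean_obs = y[z == t].mean()   # E[y(t) | z = t], identified by the data
        p_miss = np.mean(z != t)      # P(z != t): share whose y(t) is never observed
        lb[t] = mean_obs + p_miss * (k_l - mean_obs)   # equation (81)
        ub[t] = mean_obs + p_miss * (k_u - mean_obs)   # equation (82)
    return lb[1] - ub[0], ub[1] - lb[0]                # equation (83)

# Hypothetical binary-outcome data: four treated and four untreated individuals.
y = np.array([1, 1, 0, 1, 0, 0, 1, 0])
z = np.array([1, 1, 1, 1, 0, 0, 0, 0])
ate_lo, ate_hi = worst_case_ate_bounds(y, z, k_l=0.0, k_u=1.0)
print(ate_lo, ate_hi)
```

For these toy data the interval runs from -0.25 to 0.75: its width equals $K_u - K_l = 1$ and it contains zero, as the slide notes it must.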

Instrumental Variable Assumption

Let $x = (w, v)$. We are used to the standard instrumental variables assumption; i.e.,

IV Assumption: The covariate $v$ is an IV if, for each $t \in T$, each value of $w$, and all $(u, u') \in (V \times V)$,

$$E[y(t) \mid w, v = u'] = E[y(t) \mid w, v = u]. \tag{84}$$

This is a strong assumption.

Monotone Instrumental Variable Assumption

Manski and Pepper (2000) suggest a weaker assumption: Monotone Instrumental Variables.

MIV Assumption: Let $V$ be an ordered set. Covariate $v$ is an MIV if, for each $t \in T$, each value of $w$, and all $(u_1, u_2) \in (V \times V)$ such that $u_2 \geq u_1$,

$$E[y(t) \mid w, v = u_2] \geq E[y(t) \mid w, v = u_1]. \tag{85}$$

MP (2000, p. 998) motivate this in the context of an analysis of wages $y(t)$ as a function of schooling ($t$), with ability ($v$) as a candidate instrument.
- They suggest that $v$ does not make sense as an instrument,
- but it is reasonable to think of $v$ as an MIV.

Using an MIV to Narrow the ATE Bounds

Proposition 1 in Manski and Pepper (2000, p. 1000) establishes a bound on $E[y(t) \mid v = u]$ using the fact that the MIV assumption implies that:

$$u_1 \leq u \leq u_2 \;\Rightarrow\; E[y(t) \mid v = u_1] \leq E[y(t) \mid v = u] \leq E[y(t) \mid v = u_2] \tag{86}$$

A subsequent corollary establishes bounds on $E[y(t)]$ using

$$E[y(t)] = \sum_{u \in V} P(v = u)\, E[y(t) \mid v = u] \tag{87}$$

Similar bounds can be established conditional on $x$.

The Monotone Treatment Response (MTR)

A second assumption considered by MP (2000) specifies a relationship between $y(t_1)$ and $y(t_0)$; i.e.,

MTR Assumption: Let $T$ be ordered. For each $j \in J$:

$$t_1 \geq t_0 \;\Rightarrow\; y_j(t_1) \geq y_j(t_0). \tag{88}$$

Combining this assumption with that of MIV, they are able to further narrow the bounds on the ATE.
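The way (86) and (87) tighten the bounds on $E[y(t)]$ can be sketched numerically. The idea is to compute worst-case bounds within each value of $v$, then replace each lower bound by the largest lower bound at or below that value of $v$, and each upper bound by the smallest upper bound at or above it, before averaging with weights $P(v = u)$. The sketch below assumes a discrete ordered $v$ and ignores the other covariates $w$; the function name, its arguments, and the toy data are hypothetical.

```python
import numpy as np

def mean_bounds(y, z, v, t, k_l, k_u, use_miv=True):
    """Bounds on E[y(t)] given outcomes y in [k_l, k_u], realized treatments z,
    and a discrete ordered covariate v (treated as an MIV when use_miv is True)."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    v = np.asarray(v)
    levels = np.unique(v)                  # sorted distinct values of v
    lb = np.empty(len(levels))
    ub = np.empty(len(levels))
    p = np.empty(len(levels))
    for k, u in enumerate(levels):
        cell = v == u
        seen = cell & (z == t)             # y(t) is observed only when z = t
        p_seen = seen.sum() / cell.sum()   # P(z = t | v = u)
        m = y[seen].mean() if seen.any() else 0.0
        # Worst-case bounds within the cell: unobserved y(t) set to k_l or k_u.
        lb[k] = p_seen * m + (1.0 - p_seen) * k_l
        ub[k] = p_seen * m + (1.0 - p_seen) * k_u
        p[k] = cell.mean()                 # P(v = u)
    if use_miv:
        # (86): the mean at u is at least every lower bound at u1 <= u
        # and at most every upper bound at u2 >= u.
        lb = np.maximum.accumulate(lb)
        ub = np.minimum.accumulate(ub[::-1])[::-1]
    # (87): average the cell-level bounds with weights P(v = u).
    return float(p @ lb), float(p @ ub)

# Hypothetical data: three levels of v, binary outcome, bounds on E[y(1)].
v = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
z = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1]
y = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0]
plain = mean_bounds(y, z, v, t=1, k_l=0.0, k_u=1.0, use_miv=False)
miv = mean_bounds(y, z, v, t=1, k_l=0.0, k_u=1.0, use_miv=True)
print("worst case:", plain)
print("with MIV:  ", miv)
```

On these toy data the worst-case interval for $E[y(1)]$ is roughly [0.50, 0.92], and imposing the MIV restriction narrows it to roughly [0.58, 0.75]. Repeating the calculation for $t = 0$ and differencing as in (83) would give the corresponding ATE bounds.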