Lecture 11/12. Roy Model, MTE, Structural Estimation


Economics 2123, George Washington University. Instructor: Prof. Ben Williams

Roy model
The Roy model is a model of comparative advantage. Potential earnings in sectors 0 and 1 are $Y_0$ and $Y_1$. Individuals choose sector 1 if and only if $Y_1 - Y_0 \ge c$, where $c$ is a nonrandom cost. Heckman and Honoré (1990) studied the empirical implications and identification of this model.

Roy model
The extended and generalized Roy models. The extended model allows for an observable cost component, $D = 1(Y_1 - Y_0 \ge c(W))$, where $W$ is a vector of covariates and $c$ is a possibly unknown function. The generalized model allows for an unobservable cost component, $D = 1(Y_1 - Y_0 \ge c(W, V))$, where $V$ is unobservable.

Roy model
This lecture has two parts: estimation of treatment effects, and estimation of structural models.

Roy model
In the Roy model, $Y_d = \mu_d + U_d$ for $d = 0, 1$. Suppose we observe a vector of covariates $X$ so that $\mu_d = \beta_d' X$. Then
$$E(Y \mid D = 1, X = x) = \beta_1' x + E(U_1 \mid U_1 - U_0 \ge -\bar z, X = x)$$
$$E(Y \mid D = 0, X = x) = \beta_0' x + E(U_0 \mid U_1 - U_0 < -\bar z, X = x)$$
where $\bar z = (\beta_1 - \beta_0)' x - c$.

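Roy model
Assumption: $(U_1, U_0) \mid X = x \sim N(0, \Sigma)$, where
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{10} \\ \sigma_{10} & \sigma_0^2 \end{pmatrix}.$$
Let $V = U_1 - U_0$ and $\sigma_V^2 = \mathrm{Var}(V) = \sigma_1^2 + \sigma_0^2 - 2\sigma_{10}$.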

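Roy model
Assumption: $(U_1, U_0) \mid X = x \sim N(0, \Sigma)$. Let $\tilde z = \bar z / \sigma_V$, and let $\lambda(t) = \phi(t)/\Phi(t)$ denote the inverse Mills ratio. Then under this assumption
$$E(Y \mid D = 1, X = x) = \beta_1' x + \frac{\sigma_1^2 - \sigma_{10}}{\sigma_V}\,\lambda(\tilde z)$$
$$E(Y \mid D = 0, X = x) = \beta_0' x + \frac{\sigma_0^2 - \sigma_{10}}{\sigma_V}\,\lambda(-\tilde z)$$
and $\Pr(D = 1 \mid X = x) = \Phi(\tilde z)$.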
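A quick Monte Carlo check of these formulas may help. The sketch fixes $x$, so the index $\bar z$ is a single number, draws $(U_1, U_0)$ from $N(0, \Sigma)$, and compares the simulated selection terms with the closed forms. The values of $\Sigma$ and $\bar z$ are illustrative assumptions, not taken from the lecture.

```python
import math
import numpy as np
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def mills(t):
    """Inverse Mills ratio lambda(t) = phi(t) / Phi(t)."""
    return STD.pdf(t) / STD.cdf(t)

# Illustrative parameter values (assumptions, not from the lecture)
s1sq, s0sq, s10 = 1.44, 1.0, 0.3          # sigma_1^2, sigma_0^2, sigma_10
zbar = 0.4                                 # index (beta_1 - beta_0)'x - c at a fixed x
sv = math.sqrt(s1sq + s0sq - 2.0 * s10)    # sigma_V for V = U_1 - U_0
zt = zbar / sv                             # z-tilde

rng = np.random.default_rng(1)
cov = [[s1sq, s10], [s10, s0sq]]
u1, u0 = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T
d = (u1 - u0) >= -zbar                     # D = 1(Y_1 - Y_0 >= c) at this x

# Simulated selection terms next to the closed-form expressions
mc1, th1 = u1[d].mean(), (s1sq - s10) / sv * mills(zt)
mc0, th0 = u0[~d].mean(), (s0sq - s10) / sv * mills(-zt)
pr_mc, pr_th = d.mean(), STD.cdf(zt)
print(f"E(U1 | D=1): simulated {mc1:.4f}, formula {th1:.4f}")
print(f"E(U0 | D=0): simulated {mc0:.4f}, formula {th0:.4f}")
print(f"Pr(D=1):     simulated {pr_mc:.4f}, formula {pr_th:.4f}")
```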

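Roy model
Estimation with only sector one observed. If we only observe $Y$ for those with $D = 1$ (for example, in a wage/labor-force-participation model), then we can:
(1) estimate the probit $\Pr(D = 1 \mid X = x) = \Phi(\gamma_0 + \gamma_1' x) = \Phi(\tilde z)$;
(2) compute the fitted index from (1), $\hat Z = \hat\gamma_0 + \hat\gamma_1' X$, and plug it into $\lambda$ to get $\lambda(\hat Z)$;
(3) regress $Y$ on $X$ and $\lambda(\hat Z)$ for those with $D = 1$.
This enables us to estimate $\beta_1$ (the coefficient on $\lambda(\hat Z)$ estimates $(\sigma_1^2 - \sigma_{10})/\sigma_V$), but not $\beta_0$, $\sigma_1$, $\sigma_{10}$, or $\sigma_0$.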

Roy model
Estimation with both sectors observed. Suppose we observe $Y$, $D$, and $X$ for everyone. Then:
(1) estimate the probit $\Pr(D = 1 \mid X = x) = \Phi(\gamma_0 + \gamma_1' x) = \Phi(\tilde z)$;
(2) compute the fitted index from (1), $\hat Z = \hat\gamma_0 + \hat\gamma_1' X$, and plug it into $\lambda$ to get $\lambda(\hat Z)$;
(3) regress $Y$ on $X$ and $\lambda(\hat Z)$ for those with $D = 1$;
(4) regress $Y$ on $X$ and $\lambda(-\hat Z)$ for those with $D = 0$.
This enables us to estimate $\beta_1$, $\beta_0$, $(\beta_1 - \beta_0)/\sigma_V$ (the probit slopes), $(\sigma_0^2 - \sigma_{10})/\sigma_V$, and $(\sigma_1^2 - \sigma_{10})/\sigma_V$. Since $\beta_1 - \beta_0$ is known from (3) and (4), we get $\sigma_V$ too. From the variances of the residuals in the two regressions we can also identify $\sigma_1^2$, $\sigma_0^2$, and $\sigma_{10}$.

Roy model
Concerns. If $\lambda(\tilde z)$ is approximately linear over the observed range of $\hat Z$, then we will have a serious collinearity problem, because $\hat Z$ is itself a linear function of $X$. If $(U_1, U_0)$ is not normal, then the model is misspecified and identification is not transparent. If there are variable costs, the model is misspecified.

Roy model
In the extended Roy model, $Y_d = \mu_d + U_d$ for $d = 0, 1$, but $D = 1(Y_1 - Y_0 \ge \gamma_1' X + \gamma_2' Z)$: the cost of participation varies with $X$ and also with other variables $Z$. Then
$$E(Y \mid D = 1, X = x, Z = z) = \beta_1' x + E(U_1 \mid U_1 - U_0 \ge -\bar z, X = x, Z = z)$$
$$E(Y \mid D = 0, X = x, Z = z) = \beta_0' x + E(U_0 \mid U_1 - U_0 < -\bar z, X = x, Z = z)$$
where $\bar z = (\beta_1 - \beta_0)' x - \gamma_1' x - \gamma_2' z$.

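Roy model
Assumption: $(U_1, U_0) \mid X = x, Z = z \sim N(0, \Sigma)$. Under this assumption, with $\tilde z = \bar z / \sigma_V$,
$$E(Y \mid D = 1, X = x, Z = z) = \beta_1' x + \frac{\sigma_1^2 - \sigma_{10}}{\sigma_V}\,\lambda(\tilde z)$$
$$E(Y \mid D = 0, X = x, Z = z) = \beta_0' x + \frac{\sigma_0^2 - \sigma_{10}}{\sigma_V}\,\lambda(-\tilde z)$$
and $\Pr(D = 1 \mid X = x, Z = z) = \Phi(\tilde z)$. $\beta_1$ and $\beta_0$ are still identified. $\Sigma$ is only identified if there is an exclusion: a component of $X$ that does not affect costs.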
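A sketch of the two-step procedure (probit, then $\lambda$-augmented regressions in each sector) on data simulated from the extended Roy model. The design is an assumption made for illustration: one control $x$, one cost shifter $z$ excluded from the outcomes, and costs $0.2 + 0.8z$ that do not depend on $x$, so $x$ plays the role of the excluded component of $X$. Because $x$ is excluded from costs, its probit coefficient is $(\beta_{1x} - \beta_{0x})/\sigma_V$, and that is how $\sigma_V$ is recovered. The parameter values and the hand-written Newton-Raphson probit are illustrative; no particular package's routine is implied.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(t):
    return 0.5 * (1.0 + _erf(np.asarray(t, dtype=float) / math.sqrt(2.0)))

def norm_pdf(t):
    t = np.asarray(t, dtype=float)
    return np.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def probit(d, W, iters=20):
    """Probit MLE by Newton-Raphson (the log-likelihood is globally concave)."""
    b = np.zeros(W.shape[1])
    q = 2.0 * d - 1.0
    for _ in range(iters):
        xb = W @ b
        lam = q * norm_pdf(q * xb) / np.maximum(norm_cdf(q * xb), 1e-300)
        hess = -(W * (lam * (lam + xb))[:, None]).T @ W
        b = b - np.linalg.solve(hess, W.T @ lam)
    return b

# Simulated extended Roy model (illustrative values)
rng = np.random.default_rng(11)
n = 100_000
x, z = rng.normal(size=n), rng.normal(size=n)
cov = [[1.44, 0.3], [0.3, 1.0]]                      # [[sigma_1^2, sigma_10], [sigma_10, sigma_0^2]]
u1, u0 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
y1 = 1.0 + 1.0 * x + u1                              # beta_1 = (1.0, 1.0)
y0 = 0.5 + 0.2 * x + u0                              # beta_0 = (0.5, 0.2)
d = (y1 - y0 >= 0.2 + 0.8 * z).astype(float)         # cost depends on z, not on x
y = np.where(d == 1, y1, y0)

# Steps (1)-(2): probit of D on (1, x, z) and the fitted index Z-hat
W = np.column_stack([np.ones(n), x, z])
g = probit(d, W)
zhat = W @ g

# Steps (3)-(4): sector regressions with lambda(Z-hat) and lambda(-Z-hat)
s1, s0 = d == 1, d == 0
lam_pos = norm_pdf(zhat) / norm_cdf(zhat)            # lambda(Z-hat)
lam_neg = norm_pdf(zhat) / (1.0 - norm_cdf(zhat))    # lambda(-Z-hat)
A1 = np.column_stack([np.ones(s1.sum()), x[s1], lam_pos[s1]])
A0 = np.column_stack([np.ones(s0.sum()), x[s0], lam_neg[s0]])
b1 = np.linalg.lstsq(A1, y[s1], rcond=None)[0]       # (beta_1 intercept, slope, lambda coef)
b0 = np.linalg.lstsq(A0, y[s0], rcond=None)[0]

sigma_v_hat = (b1[1] - b0[1]) / g[1]                 # relies on x being excluded from costs
print("beta_1:", b1[:2], "beta_0:", b0[:2])
print("sigma_V estimate:", sigma_v_hat, "true:", math.sqrt(1.84))
```

With this design the coefficient on $\lambda(\hat Z)$ in the sector-1 regression should be close to $(\sigma_1^2 - \sigma_{10})/\sigma_V = 1.14/\sqrt{1.84}$.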


Roy model
What do we need/want to identify? The ATE is $E(Y_1 - Y_0) = (\beta_1 - \beta_0)' E(X)$. The distribution of gains is $Y_1 - Y_0 \mid X = x \sim N((\beta_1 - \beta_0)' x, \sigma_V^2)$. Various other counterfactuals need $\Sigma$ to go beyond mean treatment effects.

Roy model
In the generalized Roy model, $Y_d = \mu_d + U_d$ for $d = 0, 1$, but $D = 1(\gamma_1' X + \gamma_2' Z \ge V)$, where $V$ includes an unobservable component of cost. Then
$$E(Y \mid D = 1, X = x, Z = z) = \beta_1' x + E(U_1 \mid V \le \bar z, X = x, Z = z)$$
$$E(Y \mid D = 0, X = x, Z = z) = \beta_0' x + E(U_0 \mid V > \bar z, X = x, Z = z)$$
where $\bar z = \gamma_1' x + \gamma_2' z$.

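Roy model
Assumption: $(U_1, U_0, V) \mid X = x, Z = z \sim N(0, \Sigma)$. Under this assumption, $\beta_1$ and $\beta_0$ are identified, and $\sigma_V$ is identified under the exclusion restriction. However, $\mathrm{Var}(U_1 - U_0) \ne \sigma_V^2$ in this model, and $\mathrm{Var}(U_1 - U_0)$, the key ingredient needed for the distribution of $Y_1 - Y_0$, is not identified: $Y_1$ and $Y_0$ are never observed for the same person, so $\sigma_{10}$ is not pinned down.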

Roy model without normality
Assumptions: $(U_1, U_0, V)$ is independent of $(X, Z)$, and $\lim_{z \to \infty} \Pr(D = 1 \mid X = x, Z = z) = 1$. The first assumption is essentially the same one used by Imbens and Angrist (1994). The second assumption is called identification at infinity.

Roy model without normality
Under these assumptions, let $P(x, z) = \Pr(D = 1 \mid X = x, Z = z)$. Then
$$E(Y \mid D = 1, X = x, Z = z) = \beta_1' x + K_1(P(x, z))$$
and $\lim_{z \to \infty} E(Y \mid D = 1, X = x, Z = z) = \beta_1' x$: selection on unobservables goes away in the limit, since $K_1(p) \to E(U_1) = 0$ as $p \to 1$. Using the same argument for $D = 0$ (now with $\Pr(D = 1 \mid X = x, Z = z) \to 0$), we can identify $ATE(x) = (\beta_1 - \beta_0)' x$. We've traded normality for identification at infinity.

Roy model
Next: the marginal treatment effect; structural estimation of the Roy model; more on structural estimation.

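Some preliminaries
Recall that $D = 1(\gamma_1' X + \gamma_2' Z \ge V)$. Let $U_D = F_V(V)$, where $F_V(\cdot)$ is the distribution function of $V$. Then $D = 1(P(X, Z) \ge U_D)$ with $P(X, Z) = F_V(\gamma_1' X + \gamma_2' Z)$, and $U_D \sim \mathrm{Uniform}(0, 1)$ (for continuously distributed $V$ independent of $(X, Z)$).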
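A small simulation shows both facts: $U_D = F_V(V)$ is uniform, and the latent-index and propensity-score forms of the choice rule pick the same people. Taking $V$ to be standard logistic, so that $F_V(v) = 1/(1 + e^{-v})$, and drawing the index from a normal distribution are illustrative assumptions. Any continuous, strictly increasing $F_V$ would work.

```python
import numpy as np

def logistic_cdf(v):
    """F_V for a standard logistic V (illustrative choice of distribution)."""
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(2)
n = 500_000
index = rng.normal(0.3, 1.0, size=n)    # stands in for gamma_1'X + gamma_2'Z
V = rng.logistic(size=n)                # unobserved cost component, independent of the index

u_d = logistic_cdf(V)                   # U_D = F_V(V)
p = logistic_cdf(index)                 # P(X, Z) = F_V(gamma_1'X + gamma_2'Z)

d_latent = index >= V                   # D = 1(gamma_1'X + gamma_2'Z >= V)
d_prop = p >= u_d                       # D = 1(P(X, Z) >= U_D)

print("share of identical choices:", (d_latent == d_prop).mean())
print("mean and variance of U_D:", u_d.mean(), u_d.var(), "(uniform: 0.5 and 1/12)")
```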

Definition of the MTE
$$MTE(x, u) = E(Y_1 - Y_0 \mid X = x, U_D = u)$$
This captures both observable and unobservable heterogeneity in $Y_1 - Y_0$. $MTE(x, u)$ can be interpreted as the effect of participation for those individuals who would be indifferent if we assigned them a new value of $P = P(x, z)$ equal to $u$. Someone with a large value of $u$ (close to 1) will participate only if $P$ is quite large; this person is indifferent when $P = u$. These are the high-unobservable-cost individuals. Someone with a small value of $u$ (close to 0) will participate even if $P$ is quite small; this person is also indifferent when $P = u$. These are the low-unobservable-cost individuals.

Identification of the MTE
The identifying equation is
$$\frac{\partial E(Y \mid X = x, P(X, Z) = p)}{\partial p} = MTE(x, p).$$
This only works if $Z$ is continuous, so that $P(X, Z)$ varies continuously. The effect for the high-unobserved-cost individuals is identified by the effect of a marginal increase in the participation probability on $Y$ at a high participation rate.
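In the normal generalized Roy model the identifying equation can be checked by hand. Hold $x$ fixed, so $\beta_1' x$ and $\beta_0' x$ are scalars $b_1, b_0$. Standardize $V \sim N(0, 1)$, so that $U_D = \Phi(V)$. Write $\rho_d = \mathrm{Cov}(U_d, V)$, so that $E(U_d \mid V) = \rho_d V$. Then
$$E(Y \mid P = p) = b_0 + p(b_1 - b_0) + (\rho_0 - \rho_1)\,\phi(\Phi^{-1}(p)),$$
and differentiating gives $MTE(p) = b_1 - b_0 + (\rho_1 - \rho_0)\Phi^{-1}(p)$. The sketch below confirms this numerically with central differences. The parameter values are illustrative assumptions.

```python
from statistics import NormalDist

STD = NormalDist()
b1, b0 = 1.0, 0.4          # beta_1'x and beta_0'x at a fixed x (illustrative)
rho1, rho0 = -0.3, 0.2     # Cov(U_1, V) and Cov(U_0, V), with Var(V) = 1

def cond_mean_y(p):
    """E(Y | P = p) in the normal generalized Roy model."""
    return b0 + p * (b1 - b0) + (rho0 - rho1) * STD.pdf(STD.inv_cdf(p))

def mte(u):
    """MTE(u) = E(Y_1 - Y_0 | U_D = u) = b1 - b0 + (rho1 - rho0) * Phi^{-1}(u)."""
    return (b1 - b0) + (rho1 - rho0) * STD.inv_cdf(u)

h = 1e-5
grid = [k / 20 for k in range(1, 20)]                     # u = 0.05, 0.10, ..., 0.95
liv = [(cond_mean_y(u + h) - cond_mean_y(u - h)) / (2 * h) for u in grid]
max_gap = max(abs(a - mte(u)) for a, u in zip(liv, grid))
print(f"max |dE(Y|P=p)/dp - MTE(p)| on the grid: {max_gap:.2e}")
```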


Other treatment parameters and methods
$$ATE(x) = \int_0^1 MTE(x, u)\, du$$
$$TT(x) = \int_0^1 MTE(x, u)\, \omega_{TT}(x, u)\, du$$
where $\omega_{TT}(x, u)$ disproportionately weights smaller values of $u$. OLS and IV can also be written as weighted averages of $MTE(x, u)$.
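A numerical sketch of these weighted averages. It uses the standard treatment-on-the-treated weight $\omega_{TT}(x, u) = \Pr(P > u \mid X = x)/E(P \mid X = x)$, which is decreasing in $u$, and a normal-model MTE $\delta + (\rho_1 - \rho_0)\Phi^{-1}(u)$ at a fixed $x$. The five-point distribution of $P$ and the parameter values are assumptions for illustration. The integral for TT is checked against a direct simulation of $E(Y_1 - Y_0 \mid D = 1)$.

```python
import numpy as np
from statistics import NormalDist

STD = NormalDist()
delta, drho = 0.3, -0.4                           # (beta_1-beta_0)'x and rho_1-rho_0 (illustrative)
p_vals = np.array([0.2, 0.35, 0.5, 0.65, 0.8])    # support of P | X = x, equally likely

# Midpoint grid on (0, 1) for the integrals over u
m = 20_000
u = (np.arange(m) + 0.5) / m
mte = delta + drho * np.array([STD.inv_cdf(t) for t in u])

w_tt = (p_vals[None, :] > u[:, None]).mean(axis=1) / p_vals.mean()   # Pr(P > u) / E(P)
ate = mte.mean()                                  # integral of MTE with weight 1
tt = (mte * w_tt).mean()                          # integral of MTE * omega_TT

# Direct simulation of E(Y_1 - Y_0 | D = 1) with U_D = Phi(V), V ~ N(0, 1)
rng = np.random.default_rng(3)
n = 1_000_000
k = rng.integers(0, len(p_vals), size=n)          # which value of P each person has
V = rng.normal(size=n)
cutoff = np.array([STD.inv_cdf(p) for p in p_vals])[k]
treated = V <= cutoff                             # D = 1(U_D <= P)
tt_sim = np.mean(delta + drho * V[treated])       # MTE(U_D) = delta + drho * V
print(f"ATE {ate:.4f}   TT {tt:.4f}   TT simulated {tt_sim:.4f}")
```

Because `drho` is negative here, people with small $u$ gain the most, so TT exceeds ATE.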

Other treatment parameters and methods
Consider the IV estimand
$$IV(x) = \frac{\mathrm{Cov}(J(Z), Y \mid X = x)}{\mathrm{Cov}(J(Z), D \mid X = x)}.$$
The weight here is
$$\omega_{IV}(x, u) = \frac{E(J - E(J \mid X = x) \mid X = x, P \ge u)\, \Pr(P \ge u \mid X = x)}{\mathrm{Cov}(J, P \mid X = x)}.$$
This and many other interesting implications are discussed in Heckman, Vytlacil, and Urzua (2006).
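The IV weight can be computed directly. The sketch takes $J(Z) = P(Z)$, a five-point distribution for $P$, and a normal-model MTE at a fixed $x$. All three are illustrative assumptions. It builds $\omega_{IV}(u) = E[(J - EJ)\,1(P \ge u)]/\mathrm{Cov}(J, P)$ on a grid and compares $\int_0^1 MTE(u)\,\omega_{IV}(u)\,du$ with the IV estimand computed from simulated data.

```python
import numpy as np
from statistics import NormalDist

STD = NormalDist()
delta, rho1, rho0 = 0.3, -0.2, 0.2                # illustrative parameters
p_vals = np.array([0.2, 0.35, 0.5, 0.65, 0.8])    # support of P, equally likely
j_vals = p_vals.copy()                             # instrument J(Z) = P(Z)

m = 20_000
u = (np.arange(m) + 0.5) / m
mte = delta + (rho1 - rho0) * np.array([STD.inv_cdf(t) for t in u])

# omega_IV(u) = E[(J - E J) 1(P >= u)] / Cov(J, P)
jc = j_vals - j_vals.mean()
cov_jp = np.mean(jc * (p_vals - p_vals.mean()))
w_iv = (jc[None, :] * (p_vals[None, :] >= u[:, None])).mean(axis=1) / cov_jp
iv_weighted = (mte * w_iv).mean()

# Simulated IV estimand Cov(J, Y) / Cov(J, D)
rng = np.random.default_rng(4)
n = 1_000_000
k = rng.integers(0, len(p_vals), size=n)
V = rng.normal(size=n)                                        # U_D = Phi(V)
D = (V <= np.array([STD.inv_cdf(p) for p in p_vals])[k]).astype(float)
Y0 = rho0 * V + rng.normal(scale=0.5, size=n)
Y1 = delta + rho1 * V + rng.normal(scale=0.5, size=n)
Y = D * Y1 + (1.0 - D) * Y0
J = j_vals[k]
iv_sim = np.cov(J, Y)[0, 1] / np.cov(J, D)[0, 1]
print(f"integral of omega_IV: {w_iv.mean():.4f}")
print(f"IV via MTE weights {iv_weighted:.4f}   simulated IV {iv_sim:.4f}")
```

With $J = P$ the weights are nonnegative and integrate to one. Other choices of $J$ can produce negative weights.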

An example
Carneiro, Heckman, and Vytlacil (2011) use data from the NLSY. $Y$ is the log wage in 1991 (ages 28-34), $D$ represents college attendance, $X$ is a vector of controls, and the vector $Z$ contains (i) distance to college, (ii) the local wage, (iii) local unemployment, and (iv) average local public tuition.

An example
[Figure: estimated MTE plotted against $U_S$ over [0, 1]. Source: Carneiro et al., "Estimating Marginal Returns to Education," vol. 101, no. 6, p. 2771.]

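Estimating the MTE: normal model
Option 1. Estimate using MLE.
Option 2. Two-stage estimation (to simplify, define $Z_i$ to include $X_i$ and the instrument(s)):
(1) estimate $P(Z_i)$ by probit;
(2) for $D_i = 1$, regress $Y$ on $X_i$ and $\hat\lambda_{1i} = -\phi(\Phi^{-1}(\hat P(Z_i)))/\hat P(Z_i)$;
(3) for $D_i = 0$, regress $Y$ on $X_i$ and $\hat\lambda_{0i} = \phi(\Phi^{-1}(\hat P(Z_i)))/(1 - \hat P(Z_i))$.
Then
$$\widehat{MTE}(x, u) = x'(\hat\beta_1 - \hat\beta_0) + (\hat\rho_1 - \hat\rho_0)\,\Phi^{-1}(u),$$
where $\hat\rho_1$ and $\hat\rho_0$ are the coefficients on $\hat\lambda_{1i}$ and $\hat\lambda_{0i}$, which estimate $\mathrm{Cov}(U_1, V)$ and $\mathrm{Cov}(U_0, V)$ with $V \sim N(0, 1)$.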
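A sketch of Option 2 on simulated data. It assumes $V \sim N(0, 1)$, so the probit index equals $\Phi^{-1}(\hat P(Z_i))$. It also assumes a scalar control $x$, a single excluded instrument $z$, and $U_d = \rho_d V + e_d$. Here $\hat\lambda_{1i} = -\phi(\Phi^{-1}(\hat P_i))/\hat P_i$ and $\hat\lambda_{0i} = \phi(\Phi^{-1}(\hat P_i))/(1 - \hat P_i)$, so the coefficients on them are $\rho_1$ and $\rho_0$. The parameter values and the hand-written Newton-Raphson probit are illustrative, not a specific package's routine.

```python
import math
import numpy as np
from statistics import NormalDist

_erf = np.vectorize(math.erf)

def norm_cdf(t):
    return 0.5 * (1.0 + _erf(np.asarray(t, dtype=float) / math.sqrt(2.0)))

def norm_pdf(t):
    t = np.asarray(t, dtype=float)
    return np.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def probit(d, W, iters=20):
    """Probit MLE by Newton-Raphson (the log-likelihood is globally concave)."""
    b = np.zeros(W.shape[1])
    q = 2.0 * d - 1.0
    for _ in range(iters):
        xb = W @ b
        lam = q * norm_pdf(q * xb) / np.maximum(norm_cdf(q * xb), 1e-300)
        hess = -(W * (lam * (lam + xb))[:, None]).T @ W
        b = b - np.linalg.solve(hess, W.T @ lam)
    return b

def ols(y, *cols):
    """OLS of y on a constant and the given columns."""
    A = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(A, y, rcond=None)[0]

# Simulated generalized Roy model (illustrative values), V ~ N(0, 1)
rng = np.random.default_rng(5)
n = 50_000
x, z, V = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
D = (0.2 + 0.3 * x + 0.8 * z >= V).astype(float)
Y1 = 1.0 + 0.5 * x - 0.3 * V + rng.normal(scale=0.5, size=n)   # rho_1 = -0.3
Y0 = 0.4 + 0.2 * x + 0.2 * V + rng.normal(scale=0.5, size=n)   # rho_0 = 0.2
Y = np.where(D == 1, Y1, Y0)

# Stage 1: probit; its fitted index equals Phi^{-1}(P-hat)
W = np.column_stack([np.ones(n), x, z])
index = W @ probit(D, W)
P_hat = norm_cdf(index)
lam1 = -norm_pdf(index) / P_hat
lam0 = norm_pdf(index) / (1.0 - P_hat)

# Stage 2: separate regressions for D = 1 and D = 0
t, c = D == 1, D == 0
a1, b1, r1 = ols(Y[t], x[t], lam1[t])
a0, b0, r0 = ols(Y[c], x[c], lam0[c])

def mte(x_val, u):
    """Estimated MTE(x, u) = x'(beta1 - beta0) + (rho1 - rho0) * Phi^{-1}(u)."""
    return (a1 - a0) + (b1 - b0) * x_val + (r1 - r0) * NormalDist().inv_cdf(u)

print(f"rho1 {r1:.3f}, rho0 {r0:.3f}, MTE(0, 0.5) {mte(0.0, 0.5):.3f} (true value 0.6)")
```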

Estimating the MTE: semiparametric model
The outcome equation can be written as
$$E(Y \mid X = x, P(Z) = p) = x'\delta_0 + p\, x'(\delta_1 - \delta_0) + K(p),$$
so that $MTE(x, u) = x'(\delta_1 - \delta_0) + K'(u)$. There are several ways to estimate this; perhaps the simplest is a series/spline/sieve estimator. Estimate $P(Z_i)$ (e.g., by logit). Choose a set of basis functions (e.g., polynomials) and an order $K$. Run the regression
$$Y_i = X_i'\delta_0 + \hat P(Z_i)\, X_i'(\delta_1 - \delta_0) + \gamma_1 \hat P(Z_i) + \dots + \gamma_K \hat P(Z_i)^K + \eta_i.$$
An important sacrifice here is that $MTE(x, u)$ is only identified for $u$ in the support of $P$ (recall identification at infinity).
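A sketch of the polynomial version of this estimator. To keep it short, it skips the first stage and uses the true propensity score $P$, drawn uniformly and independently of everything else. That is an assumption; in practice $\hat P(Z_i)$ from the logit would be plugged in. With $X = (1, x)$, the intercept part of $p\,X'(\delta_1 - \delta_0)$ is absorbed by the linear term of the polynomial. The MTE estimate is therefore the $x$-coefficient gap times $x$ plus $\sum_k k\hat\gamma_k u^{k-1}$. The data come from an assumed normal generalized Roy model with true $MTE(x, u) = 0.6 + 0.3x - 0.5\,\Phi^{-1}(u)$. The order $K = 4$ is an illustrative choice.

```python
import math
import numpy as np
from statistics import NormalDist

_erf = np.vectorize(math.erf)
rng = np.random.default_rng(6)
n, K = 200_000, 4                                # sample size, polynomial order (illustrative)

# Simulated normal generalized Roy model; P is drawn directly (treated as known)
x = rng.normal(size=n)
P = rng.uniform(0.02, 0.98, size=n)
V = rng.normal(size=n)
U_D = 0.5 * (1.0 + _erf(V / math.sqrt(2.0)))     # U_D = Phi(V) ~ Uniform(0, 1)
D = U_D <= P
Y1 = 1.0 + 0.5 * x - 0.3 * V + rng.normal(scale=0.5, size=n)
Y0 = 0.4 + 0.2 * x + 0.2 * V + rng.normal(scale=0.5, size=n)
Y = np.where(D, Y1, Y0)

# Series regression: Y on 1, x, P*x, and P, P^2, ..., P^K
A = np.column_stack([np.ones(n), x, P * x] + [P ** k for k in range(1, K + 1)])
coef = np.linalg.lstsq(A, Y, rcond=None)[0]
slope_gap, gam = coef[2], coef[3:]               # x-part of delta_1 - delta_0; gamma_1..gamma_K

def mte_hat(x_val, u):
    """Derivative in p of the fitted E(Y | X = x, P = p), evaluated at p = u."""
    return slope_gap * x_val + sum(k * g * u ** (k - 1) for k, g in enumerate(gam, start=1))

def mte_true(x_val, u):
    return 0.6 + 0.3 * x_val - 0.5 * NormalDist().inv_cdf(u)

for u in (0.2, 0.5, 0.8):
    print(f"u = {u}: estimated {mte_hat(0.0, u):.3f}, true {mte_true(0.0, u):.3f}")
```

Near the edges of the support of $P$ a low-order polynomial derivative becomes unreliable, which is the practical face of the support restriction noted above.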

Consider policies that affect $P(Z)$ but not $Y_1$, $Y_0$, or $V$. Let $P^*$ denote the propensity score under the new policy. It can be shown that the effect of shifting to this new policy, per net person induced to participate, is
$$\int_0^1 MTE(x, u)\, \frac{F_{P \mid X = x}(u) - F_{P^* \mid X = x}(u)}{E(P^* \mid X = x) - E(P \mid X = x)}\, du.$$
This will still require large support for $P(Z)$. An alternative is to define a continuum of policies and consider a marginal change from the baseline.
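A numerical check of the policy-effect formula for one illustrative policy that raises every propensity score by 0.1. That policy, the five-point baseline distribution of $P$, and the normal-model MTE at a fixed $x$ are all assumptions. The weighted integral is compared with $[E(Y^*) - E(Y)]/[E(P^*) - E(P)]$ computed from the same simulated people under both policies.

```python
import numpy as np
from statistics import NormalDist

STD = NormalDist()
delta, drho = 0.3, -0.4                            # illustrative MTE parameters
p_base = np.array([0.2, 0.35, 0.5, 0.65, 0.8])     # baseline P | X = x, equally likely
p_new = p_base + 0.1                               # hypothetical policy: P* = P + 0.1

m = 20_000
u = (np.arange(m) + 0.5) / m
mte = delta + drho * np.array([STD.inv_cdf(t) for t in u])
F_base = (p_base[None, :] <= u[:, None]).mean(axis=1)   # F_P(u)
F_new = (p_new[None, :] <= u[:, None]).mean(axis=1)     # F_{P*}(u)
w = (F_base - F_new) / (p_new.mean() - p_base.mean())
prte = (mte * w).mean()

# Simulation: the same people (same V, with U_D = Phi(V)) under both policies
rng = np.random.default_rng(7)
n = 1_000_000
k = rng.integers(0, len(p_base), size=n)
V = rng.normal(size=n)
gain = delta + drho * V                            # E(Y_1 - Y_0 | V)
cut_base = np.array([STD.inv_cdf(p) for p in p_base])[k]
cut_new = np.array([STD.inv_cdf(p) for p in p_new])[k]
switchers = (V > cut_base) & (V <= cut_new)        # induced into treatment by the policy
prte_sim = gain[switchers].sum() / n / (p_new.mean() - p_base.mean())
print(f"policy effect via weights {prte:.4f}   simulated {prte_sim:.4f}")
```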

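MPRTE
Consider increasing tuition (a component of $Z$) by an amount $\alpha$: $\text{tuition}_\alpha = \text{tuition} + \alpha$, with corresponding propensity score $P_\alpha$. Define the MPRTE as
$$\lim_{\alpha \to 0} \int_0^1 MTE(x, u)\, \frac{F_{P_0 \mid X = x}(u) - F_{P_\alpha \mid X = x}(u)}{E(P_\alpha \mid X = x) - E(P_0 \mid X = x)}\, du.$$
This is also equal to $\lim_{e \to 0} E(Y_1 - Y_0 \mid |\mu_D(X, Z) - V| < e)$, where $\mu_D(X, Z)$ is the index in the choice equation. And it can be written as $\int_0^1 MTE(x, u)\, \omega(x, u)\, du$, where
$$\omega(x, u) = \frac{f_{P \mid X}(u)\, f_{V \mid X}(F_{V \mid X}^{-1}(u))}{E(f_{V \mid X}(\mu_D(X, Z)) \mid X)}.$$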

Maximum likelihood
You've seen theoretical conditions for maximum likelihood estimation before; see Cameron and Trivedi for a review. Here I want to practice constructing the likelihood for different models. Suppose we observe a vector of outcomes $Y_i$ and covariates $X_i$. With iid data, the likelihood function is
$$L(\beta) = \prod_{i=1}^n f_{Y \mid X}(Y_i \mid X_i; \beta).$$
Let $\mathcal{L}(\beta) = \log L(\beta) = \sum_{i=1}^n \log f_{Y \mid X}(Y_i \mid X_i; \beta)$. Then $\hat\beta_{MLE} = \arg\max_\beta \mathcal{L}(\beta)$.
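As a first practice case, here is the log-likelihood for the normal linear regression $Y_i = \beta' X_i + \varepsilon_i$ with $\varepsilon_i \sim N(0, \sigma^2)$, written directly from the sum of log densities. The simulated data are illustrative. The maximizer has a closed form: OLS for $\beta$ and $SSR/n$ for $\sigma^2$. The sketch checks that moving away from it lowers the log-likelihood.

```python
import math
import numpy as np

def loglik(beta, sigma2, y, X):
    """sum_i log f(y_i | X_i; beta, sigma2) for y_i = X_i'beta + eps_i, eps_i ~ N(0, sigma2)."""
    resid = y - X @ beta
    n = len(y)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * float(resid @ resid) / sigma2

rng = np.random.default_rng(8)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.5, size=n)     # illustrative data

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]          # MLE of beta coincides with OLS
sigma2_hat = float(np.mean((y - X @ beta_hat) ** 2))     # MLE of sigma^2 is SSR / n
best = loglik(beta_hat, sigma2_hat, y, X)

# Moving away from the maximizer lowers the log-likelihood
perturbed = [
    loglik(beta_hat + np.array([0.05, 0.0]), sigma2_hat, y, X),
    loglik(beta_hat + np.array([0.0, -0.05]), sigma2_hat, y, X),
    loglik(beta_hat, sigma2_hat + 0.1, y, X),
]
print(f"log-lik at the MLE {best:.2f}; perturbed {[round(v, 2) for v in perturbed]}")
```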

Censoring and truncation
Censoring and truncation occur when we only observe the value of a variable above or below some threshold. Censoring occurs when we observe $y$ if $y > L$ but we still observe the individuals with $y < L$, just not their value of $y$. Truncation occurs when we observe $y$ if $y > L$ and we don't observe the individuals with $y < L$ at all.

Censoring
Suppose the conditional density is given by $f(y_i \mid X_i)$. Then the log-likelihood is
$$\mathcal{L} = \sum_{i=1}^n \left[ 1(y_i > L) \ln f(y_i \mid X_i) + 1(y_i < L) \ln \int_{-\infty}^{L} f(y \mid X_i)\, dy \right].$$

Truncation
Suppose again that the conditional density is given by $f(y_i \mid X_i)$. Then the density conditional on being observed is $f(y_i \mid X_i) / \int_L^\infty f(y \mid X_i)\, dy$. Plug this into the likelihood to get
$$\mathcal{L} = \sum_{i=1}^n \left[ \ln f(y_i \mid X_i) - \ln \int_L^\infty f(y \mid X_i)\, dy \right].$$
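The censored and truncated likelihoods above can be specialized to a normal linear model $y_i = \beta' X_i + \varepsilon_i$ with $\varepsilon_i \sim N(0, \sigma^2)$ and threshold $L$. The integrals then have closed forms: $\int_{-\infty}^{L} f(y \mid X_i)\,dy = \Phi((L - \beta' X_i)/\sigma)$ and $\int_L^\infty f(y \mid X_i)\,dy = \Phi((\beta' X_i - L)/\sigma)$. The normality, the threshold $L = 0$, and the simulated data are illustrative assumptions. As a sanity check, each log-likelihood is larger at the true parameters than at a naive OLS fit on the observed subsample, which is biased by the selection.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_logcdf(t):
    """log Phi(t), floored to avoid log(0) far in the tail."""
    cdf = 0.5 * (1.0 + _erf(np.asarray(t, dtype=float) / math.sqrt(2.0)))
    return np.log(np.maximum(cdf, 1e-300))

def norm_logpdf(t):
    return -0.5 * np.asarray(t, dtype=float) ** 2 - 0.5 * math.log(2.0 * math.pi)

def censored_loglik(beta, sigma, y, X, L, observed):
    """Observed y contribute log f(y | X); censored ones contribute log Pr(y < L | X)."""
    mu = X @ beta
    ll_obs = norm_logpdf((y - mu) / sigma) - math.log(sigma)
    ll_cen = norm_logcdf((L - mu) / sigma)
    return float(np.sum(np.where(observed, ll_obs, ll_cen)))

def truncated_loglik(beta, sigma, y, X, L):
    """Sample contains only y > L: log f(y | X) - log Pr(y > L | X)."""
    mu = X @ beta
    return float(np.sum(norm_logpdf((y - mu) / sigma) - math.log(sigma)
                        - norm_logcdf((mu - L) / sigma)))

rng = np.random.default_rng(9)
n, L = 20_000, 0.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, sigma = np.array([0.5, 1.0]), 1.0                  # illustrative true values
y_star = X @ beta + rng.normal(scale=sigma, size=n)
observed = y_star > L
y_cens = np.where(observed, y_star, L)                   # censored: value replaced by L

# Naive OLS on the observed subsample (biased under selection)
Xo, yo = X[observed], y_star[observed]
b_ols = np.linalg.lstsq(Xo, yo, rcond=None)[0]
s_ols = float(np.sqrt(np.mean((yo - Xo @ b_ols) ** 2)))

ll_c_true = censored_loglik(beta, sigma, y_cens, X, L, observed)
ll_c_ols = censored_loglik(b_ols, s_ols, y_cens, X, L, observed)
ll_t_true = truncated_loglik(beta, sigma, yo, Xo, L)
ll_t_ols = truncated_loglik(b_ols, s_ols, yo, Xo, L)
print(f"censored:  true {ll_c_true:.1f}   naive OLS {ll_c_ols:.1f}")
print(f"truncated: true {ll_t_true:.1f}   naive OLS {ll_t_ols:.1f}")
```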

Tobit
In the Tobit model, $y_i^* = \beta' X_i + \varepsilon_i$, where $\varepsilon_i$ is normally distributed, and we observe $y_i = \max(0, y_i^*)$. This can be implemented quite easily in Stata using the tobit command. As in the probit and logit models, consistency requires correct specification of the distribution of $\varepsilon_i$, which is a strong assumption (it excludes heteroskedasticity, for example)!

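Tobit
Note that in this model
$$E(y_i \mid X_i, y_i > 0) = \beta' X_i + E(\varepsilon_i \mid \varepsilon_i > -\beta' X_i).$$
If $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$, the second term equals $\sigma_\varepsilon\, \phi(-\beta' X_i/\sigma_\varepsilon)/(1 - \Phi(-\beta' X_i/\sigma_\varepsilon))$. The ratio $\phi(\cdot)/(1 - \Phi(\cdot))$ is called the inverse Mills ratio.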
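A quick simulation check of this truncated-mean formula. The values of $\beta' X_i$ and $\sigma_\varepsilon$ are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

STD = NormalDist()
xb, sigma = 0.3, 1.5                    # beta'X_i and sigma_eps (illustrative)

rng = np.random.default_rng(10)
y_star = xb + rng.normal(scale=sigma, size=2_000_000)
sim = y_star[y_star > 0].mean()         # E(y | X, y > 0) by simulation

a = -xb / sigma
formula = xb + sigma * STD.pdf(a) / (1.0 - STD.cdf(a))   # beta'X + sigma * inverse Mills ratio
print(f"simulated {sim:.4f}   formula {formula:.4f}")
```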

Tobit
An alternative is what Cameron and Trivedi call a two-part estimator, which models censoring as a different function of $X_i$: $d_i = 1$ if the observation is censored, and $\Pr(d_i = 0 \mid X_i)$ is not restricted to equal $\Pr(\beta' X_i + \varepsilon_i > 0 \mid X_i)$. This can be interpreted as a participation decision plus a separate choice/determination of $y_i$ conditional on participation (COP). See Angrist and Pischke for the reduced-form approach to this.

Sample selection models
The sample selection model generalizes this by allowing different components of $X_i$ to enter the participation/selection equation and the outcome equation:
$$y_{i1} = \beta_1' X_{i1} + \varepsilon_{i1}$$
$$y_{i2} = \beta_2' X_{i2} + \varepsilon_{i2}$$
Then $y_{i2}$ is observed if and only if $y_{i1} > 0$. This is sometimes called a type 2 Tobit model.

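Sample selection models
This is easy to estimate via MLE if the distribution of $(\varepsilon_{i1}, \varepsilon_{i2})$ is known. With $d_i = 1(y_{1i} > 0)$, the log-likelihood is
$$\mathcal{L} = \sum_{i=1}^n \left\{ (1 - d_i) \log \Pr(y_{1i} \le 0) + d_i \log\left[ f(y_{2i} \mid y_{1i} > 0)\, \Pr(y_{1i} > 0) \right] \right\}.$$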
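A sketch of this log-likelihood under bivariate normality, using the usual normalization $\mathrm{Var}(\varepsilon_{1}) = 1$, with $\mathrm{sd}(\varepsilon_2) = \sigma$ and correlation $\rho$. It uses the identity $f(y_2 \mid y_1 > 0)\Pr(y_1 > 0) = f(y_2)\Pr(y_1 > 0 \mid y_2)$, where $\Pr(y_1 > 0 \mid y_2) = \Phi\big((\beta_1' X_1 + \rho r)/\sqrt{1 - \rho^2}\big)$ and $r = (y_2 - \beta_2' X_2)/\sigma$. The simulated design, including an excluded variable $z$ in the selection equation, is an illustrative assumption. The check compares the log-likelihood at the truth with a misspecified parameter vector.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def log_Phi(t):
    """log of the standard normal cdf, floored to avoid log(0)."""
    cdf = 0.5 * (1.0 + _erf(np.asarray(t, dtype=float) / math.sqrt(2.0)))
    return np.log(np.maximum(cdf, 1e-300))

def selection_loglik(b1, b2, sigma, rho, d, y2, X1, X2):
    """Type 2 Tobit log-likelihood; Var(eps1) = 1, sd(eps2) = sigma, corr = rho.

    d = 1(y1 > 0); y2 only matters where d == 1.
    """
    mu1, mu2 = X1 @ b1, X2 @ b2
    r = (y2 - mu2) / sigma
    # log f(y2 | y1 > 0) + log Pr(y1 > 0) = log f(y2) + log Pr(y1 > 0 | y2)
    ll_sel = (-0.5 * r ** 2 - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
              + log_Phi((mu1 + rho * r) / math.sqrt(1.0 - rho ** 2)))
    ll_not = log_Phi(-mu1)                                    # log Pr(y1 <= 0)
    return float(np.sum(np.where(d == 1, ll_sel, ll_not)))

# Simulated data (illustrative); z enters selection but not the outcome
rng = np.random.default_rng(12)
n = 20_000
x, z = rng.normal(size=n), rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x, z])
X2 = np.column_stack([np.ones(n), x])
b1, b2, sigma, rho = np.array([0.2, 0.5, 1.0]), np.array([1.0, 0.8]), 1.2, 0.5
cov = [[1.0, rho * sigma], [rho * sigma, sigma ** 2]]
e1, e2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
d = (X1 @ b1 + e1 > 0).astype(int)
y2 = np.where(d == 1, X2 @ b2 + e2, 0.0)                     # placeholder where unobserved

ll_true = selection_loglik(b1, b2, sigma, rho, d, y2, X1, X2)
ll_alt = selection_loglik(b1, b2 + np.array([0.0, 0.3]), sigma, 0.0, d, y2, X1, X2)
print(f"log-lik at true values {ll_true:.1f}; outcome slope +0.3 and rho = 0: {ll_alt:.1f}")
```

In practice this function would be handed to a numerical optimizer; the sketch only evaluates it.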

Sample selection models
The sample selection model can also be estimated using the two-step estimator of Heckman (1979) (heckit in Stata). The Roy model is a version of the sample selection model: if $y_{i1} < 0$ then we observe a different outcome, $y_{i3}$, and $y_{i1} = y_{i2} - y_{i3}$ or $y_{i1} = y_{i2} - y_{i3} + \gamma' Z_i$. We'll see an example of heckit briefly next class.

Example
Another example (due to Petra Todd). Consider the following model of consumption ($c_i$), wages ($w_i$), and labor force participation ($d_i$):
$$U_i = c_i + \alpha_i (1 - d_i), \quad \alpha_i = \kappa_i'\beta + \varepsilon_i$$
$$c_i = y_i + w_i d_i - \pi n_i d_i$$
$$w_i = z_i'\gamma + \eta_i$$
$$d_i = 1(z_i'\gamma - \pi n_i - \kappa_i'\beta + \eta_i - \varepsilon_i \ge 0)$$
where $\varepsilon_i$ and $\eta_i$ are jointly normal. Is it possible to identify the effect of a child care subsidy?

Example (continued)
If we have sufficient independent variation to identify $\Pr(d_i = 1 \mid z_i, n_i, \kappa_i)$, the estimated coefficients correspond to $\gamma/\sigma_\xi$, $\beta/\sigma_\xi$, and $(\beta_n + \pi)/\sigma_\xi$, where $\sigma_\xi^2 = \mathrm{Var}(\eta_i - \varepsilon_i)$. This is enough to identify the effect on LFP of the birth of a child, for example. It is not enough to identify the effect of a child care subsidy of a given amount $\tau$, because we don't observe variation in $\tau$. Under the new policy, the participation probability becomes
$$\Pr(d_i = 1 \mid z_i, n_i, \kappa_i, \tau) = \Phi\left( \frac{z_i'\gamma - \kappa_i'\beta - (\beta_n + \pi - \tau) n_i}{\sigma_\xi} \right).$$
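A small sketch of why the probit alone is not enough. The function below evaluates the counterfactual participation probability. Two candidate values of $\sigma_\xi$ that share the same scaled coefficients fit the baseline data identically but imply different responses to the subsidy. The scaled coefficients, the subsidy $\tau$, and the number of children are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

def lfp_prob(z_gamma, kappa_beta, n_kids, bn_plus_pi, tau, sigma_xi):
    """Pr(d = 1 | z, n, kappa, tau) = Phi((z'gamma - kappa'beta - (beta_n + pi - tau) n) / sigma_xi).

    Arguments are in levels (not divided by sigma_xi); tau is the per-child subsidy.
    """
    index = (z_gamma - kappa_beta - (bn_plus_pi - tau) * n_kids) / sigma_xi
    return NormalDist().cdf(index)

# Scaled coefficients a probit could deliver (hypothetical numbers)
zg_s, kb_s, bnpi_s = 0.8, 0.3, 0.4      # z'gamma/s, kappa'beta/s, (beta_n + pi)/s
tau, n_kids = 0.5, 2                     # hypothetical subsidy and number of children

results = {}
for s in (1.0, 2.0):                     # two candidate values of sigma_xi
    base = lfp_prob(zg_s * s, kb_s * s, n_kids, bnpi_s * s, 0.0, s)
    new = lfp_prob(zg_s * s, kb_s * s, n_kids, bnpi_s * s, tau, s)
    results[s] = (base, new)
    print(f"sigma_xi = {s}: baseline {base:.4f}, with subsidy {new:.4f}")
```

The baseline probabilities coincide, but the counterfactuals do not. Pinning down $\sigma_\xi$ is what separates them.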

Example (continued)
We can estimate a sample selection model where $y_{2i} = w_i$ and $y_{1i} = d_i$. We can generally identify $(\beta_n + \pi)/\sigma_\xi$. If there is a variable in the wage equation that is not in $\kappa_i$, then we can identify $\sigma_\xi$. This is what we need to identify the counterfactual!