A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Similar documents
Control Function and Related Methods: Nonlinear Models

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M.

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Econometric Analysis of Cross Section and Panel Data

New Developments in Econometrics Lecture 16: Quantile Estimation

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Jeffrey M. Wooldridge Michigan State University

CORRELATED RANDOM EFFECTS MODELS WITH UNBALANCED PANELS

What s New in Econometrics? Lecture 14 Quantile Methods

Thoughts on Heterogeneity in Econometric Models

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

Non-linear panel data modeling

1 Motivation for Instrumental Variable (IV) Regression

CENSORED DATA AND CENSORED NORMAL REGRESSION

New Developments in Econometrics Lecture 11: Difference-in-Differences Estimation

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

A Guide to Modern Econometric:

A simple alternative to the linear probability model for binary choice models with endogenous regressors

Panel Data Seminar. Discrete Response Models. Crest-Insee. 11 April 2008

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

1. Overview of the Basic Model

Panel Data Exercises Manuel Arellano. Using panel data, a researcher considers the estimation of the following system:

Missing dependent variables in panel data models

What if we want to estimate the mean of w from an SS sample? Let non-overlapping, exhaustive groups, W g : g 1,...G. Random

Flexible Estimation of Treatment Effect Parameters

4.8 Instrumental Variables

New Developments in Econometrics Lecture 9: Stratified Sampling

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35

WISE International Masters

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen

A Course on Advanced Econometrics

Difference-in-Differences Estimation

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Intermediate Econometrics

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

APEC 8212: Econometric Analysis II

Binary Models with Endogenous Explanatory Variables

A Note on Demand Estimation with Supply Information. in Non-Linear Models

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

SIMPLE SOLUTIONS TO THE INITIAL CONDITIONS PROBLEM IN DYNAMIC, NONLINEAR PANEL DATA MODELS WITH UNOBSERVED HETEROGENEITY

ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Introduction: structural econometrics. Jean-Marc Robin

Maximum Likelihood (ML) Estimation

Christopher Dougherty London School of Economics and Political Science

Applied Econometrics Lecture 1

Estimating Panel Data Models in the Presence of Endogeneity and Selection

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Discrete Dependent Variable Models

Final Exam. Economics 835: Econometrics. Fall 2010

Linear Models in Econometrics

Introduction to Econometrics

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

13 Endogeneity and Nonparametric IV

Econometrics II - EXAM Answer each question in separate sheets in three hours

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Specification testing in panel data models estimated by fixed effects with instrumental variables

Estimation of Dynamic Panel Data Models with Sample Selection

Lecture 8: Instrumental Variables Estimation

Applied Health Economics (for B.Sc.)

Review of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University.

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

A Two-Regime CRC Model with Nonnegative Dependent Variable

AGEC 661 Note Fourteen

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

On IV estimation of the dynamic binary panel data model with fixed effects

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model

Econometric Analysis of Games 1

Stat 5101 Lecture Notes

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

INVERSE PROBABILITY WEIGHTED ESTIMATION FOR GENERAL MISSING DATA PROBLEMS

Introduction to Eco n o m et rics

Control Function Instrumental Variable Estimation of Nonlinear Causal Effect Models

Linear Panel Data Models

Identification in Discrete Choice Models with Fixed Effects

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

ECON 4551 Econometrics II Memorial University of Newfoundland. Panel Data Models. Adapted from Vera Tabakova s notes

Lecture 6: Discrete Choice: Qualitative Response

Least Squares Estimation-Finite-Sample Properties

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation

Next, we discuss econometric methods that can be used to estimate panel data models.

11. Further Issues in Using OLS with TS Data

Estimating the Fractional Response Model with an Endogenous Count Variable

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

Truncation and Censoring

Syllabus. By Joan Llull. Microeconometrics. IDEA PhD Program. Fall Chapter 1: Introduction and a Brief Review of Relevant Tools

13. Time Series Analysis: Asymptotics Weakly Dependent and Random Walk Process. Strict Exogeneity

Lecture #8 & #9 Multiple regression

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

We begin by thinking about population relationships.

ECNS 561 Multiple Regression Analysis

THE BEHAVIOR OF THE MAXIMUM LIKELIHOOD ESTIMATOR OF DYNAMIC PANEL DATA SAMPLE SELECTION MODELS

11. Bootstrap Methods

Transcription:

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated Random Coefficient Models 3. Common Nonlinear Models and CF Limitations 4. Semiparametric and Nonparametric Approaches 5. Methods for Panel Data 1 1. Linear-in-Parameters Models: IV versus Control Functions Most models that are linear in parameters are estimated using standard IV methods two stage least squares (2SLS) or generalized method of moments (GMM). An alternative, the control function (CF) approach, relies on the same kinds of identification conditions. Let y 1 be the response variable, y 2 the endogenous explanatory variable (EEV), and z the 1 L vector of exogenous variables (with z 1 1 : y 1 z 1 1 1 y 2 u 1, where z 1 is a 1 L 1 strict subvector of z. 2 (1) First consider the exogeneity assumption Plug (4) into (1): E z u 1 0. (2) y 1 z 1 1 1 y 2 1 v 2 e 1, (5) Reduced form for y 2 : y 2 z 2 v 2,E z v 2 0 (3) where 2 is L 1. Write the linear projection of u 1 on v 2, in error form, as u 1 1 v 2 e 1, (4) where 1 E v 2 u 1 /E v 2 2 is the population regression coefficient. By construction, E v 2 e 1 0 and E z e 1 0. where v 2 is an explanatory variable in the equation. The new error, e 1, is uncorrelated with y 2 as well as with v 2 and z. Two-step procedure: (i) Regress y 2 on z and obtain the reduced form residuals, v 2 ; (ii) Regress y 1 on z 1, y 2, and v 2. The implicit error in (6) is e i1 1 z i 2 2, which depends on the sampling error in 2 unless 1 0 (exogeneity test). OLS estimators from (6) will be consistent for 1, 1, and 1. (6) 3 4

The OLS estimates from (6) are control function estimates. The OLS estimates of 1 and 1 from (6) are identical to the 2SLS estimates starting from (1). Now extend the model: y 1 z 1 1 1 y 2 1 y 2 2 u 1 E u 1 z 0. Let z 2 be a scalar not also in z 1. Under the (8) which is stronger than (2), and is essential for nonlinear models we can use, say, z 2 2 as an instrument for y 2 2. So the IVs would be z 1, z 2, z 2 2 for z 1, y 2, y 2 2. (7) (8) What does CF approach entail? Now assume E u 1 z, y 2 E u 1 v 2 1 v 2, where independence of u 1, v 2 and z is sufficient for the first equality; linearity is a substantive restriction. Then, E y 1 z, y 2 z 1 1 1 y 2 1 y 2 2 1 v 2, and a CF approach is immediate: replace v 2 with v 2 and use OLS on (10). Not equivalent to a 2SLS estimate. CF likely more efficient but less robust. (9) (10) 5 6 CF approaches can impose extra assumptions when we base it on E y 1 z, y 2. The estimating equation is Consistency of the CF estimators hinges on the model for D y 2 z being correctly specified, along with linearity in E u 1 v 2. If we just apply 2SLS directly to (1), it makes no distinction among discrete, continuous, or some mixture for y 2. How might we robustly use the binary nature of y 2 in IV estimation? Obtain the fitted probabilities, z i 2, from the first stage probit, and then use these as IVs (not regressors!) for y i2. Fully robust to misspecification of the probit model, usual standard errors from IV asymptotically valid. Efficient IV estimator if P y 2 1 z z 2 and Var u 1 z 2 1. Similar suggestions work for y 2 a count variable or a corner solution. E y 1 z, y 2 z 1 1 1 y 2 E u 1 z, y 2. If y 2 1 z 2 e 2 0, u 1, e 2 is independent of z, E u 1 e 2 1 e 2, and e 2 ~Normal 0, 1, then E u 1 z, y 2 1 y 2 z 2 1 y 2 z 2, where is the inverse Mills ratio (IMR). Heckman two-step approach (for endogeneity, not sample selection): (i) Probit to get 2 (11) (12) and compute gr i2 y i2 z i 2 1 y i2 z i 2 (generalized residual). (ii) Regress y i1 on z i1, y i2, gr i2, i 1,...,N. 7 8

2. Correlated Random Coefficient Models Modify the original equation as y 1 1 z 1 1 a 1 y 2 u 1, where a 1, the random coefficient on y 2. Heckman and Vytlacil (1998) call (13) a correlated random coefficient (CRC) model. Write a 1 1 v 1 where 1 E a 1 is the object of interest. We can rewrite the equation as y 1 1 z 1 1 1 y 2 v 1 y 2 u 1 1 z 1 1 1 y 2 e 1. (13) (14) (15) The potential problem with applying instrumental variables is that the error term v 1 y 2 u 1 is not necessarily uncorrelated with the instruments z, even under E u 1 z E v 1 z 0. We want to allow y 2 and v 1 to be correlated, Cov v 1, y 2 1 0. A suffcient condition that allows for any unconditional correlation is Cov v 1, y 2 z Cov v 1, y 2, and this is sufficient for IV to consistently estimate 1, 1. (16) (17) 9 10 The usual IV estimator that ignores the randomness in a 1 is more robust than Garen s (1984) CF estimator, which adds v 2 and v 2 y 2 to the original model, or the Heckman/Vytlacil (1998) plug-in estimator, which replaces y 2 with 2 z 2. The condition Cov v 1, y 2 z Cov v 1, y 2 cannot really hold for discrete y 2. Further, Card (2001) shows how it can be violated even if y 2 is continuous. Wooldridge (2005) shows how to allow parametric heteroskedasticity. In the case of binary y 2, we have what is often called the switching regression model. If y 2 1 z 2 v 2 0 and v 2 z is Normal 0, 1, then where E y 1 z, y 2 1 z 1 1 1 y 2 1 h 2 y 2, z 2 1 h 2 y 2, z 2 y 2, h 2 y 2, z 2 y 2 z 2 1 y 2 z 2 is the generalized residual function. Can interact exogenous variables with h 2 y i2, z i 2. Or, allow E v 1 v 2 to be more flexible [Heckman and MaCurdy (1986)]. 11 12

3. Nonlinear Models and Limitations of the CF Approach Typically three approaches to nonlinear models with EEVs. (1) Plug in fitted values from a first step regression (in an attempt to mimic 2SLS in linear model). More generally, try to find E y 1 z or D y 1 z and then impose identifying restrictions. (2) CF approach: plug in residuals in an attempt to obtain E y 1 y 2, z or D y 1 y 2, z. (3) Maximum Likelihood (often limited information): Use models for D y 1 y 2, z and D y 2 z jointly. All strategies are more difficult with nonlinear models when y 2 is discrete. Some poor practices have lingered. Binary and Fractional Responses Probit model: y 1 1 z 1 1 1 y 2 u 1 0, where u 1 z ~Normal 0, 1. Analysis goes through if we replace z 1, y 2 with any known function x 1 g 1 z 1, y 2. The Rivers-Vuong (1988) approach is to make a homoskedastic-normal assumption on the reduced form for y 2, y 2 z 2 v 2, v 2 z ~Normal 0, 2 2. (18) (19) 13 14 RV approach comes close to requiring u 1, v 2 independent of z. (20) If we also assume u 1, v 2 ~Bivariate Normal (21) with 1 Corr u 1, v 2, then we can proceed with MLE based on f y 1, y 2 z. A CF approach is available, too, based on P y 1 1 z, y 2 z 1 1 1 y 2 1 v 2 (22) where each coefficient is multiplied by 1 2 1 1/2. The RV two-step approach is (i) OLS of y 2 on z, to obtain the residuals, v 2. (ii) Probit of y 1 on z 1, y 2, v 2 to estimate the scaled coefficients. A simple t test on v 2 is valid to test H 0 : 1 0. Can recover the original coefficients, which appear in the partial effects. Or, N ASF z 1, y 2 N 1 x 1 1 1v i2, i 1 that is, we average out the reduced form residuals, v i2. This formulation is useful for more complicated models. (23) 15 16

The two-step CF approach easily extends to fractional responses: E y 1 z, y 2, q 1 x 1 1 q 1, where x 1 is a function of z 1, y 2 and q 1 contains unobservables. Can use the the same two-step because the Bernoulli log likelihood is in the linear exponential family. Still estimate scaled coefficients. APEs must be obtained from (23). In inference, we should only assume the mean is correctly specified.method can be used in the binary and fractional cases. To account for first-stage estimation, the bootstrap is convenient. Wooldridge (2005) describes some simple ways to make the analysis starting from (24) more flexible, including allowing Var q 1 v 2 to be heteroskedastic. 17 (24) The control function approach has some decided advantages over another two-step approach one that appears to mimic the 2SLS estimation of the linear model. Rather than conditioning on v 2 along with z (and therefore y 2 ) to obtain P y 1 1 z, v 2 P y 1 1 z, y 2, v 2, we can obtain P y 1 1 z. To find the latter probability, we plug in the reduced form for y 2 to get y 1 1 z 1 1 1 z 2 1 v 2 u 1 0. Because 1 v 2 u 1 is independent of z and normally distributed, P y 1 1 z z 1 1 1 z 2 / 1. So first do OLS on the reduced form, and get fitted values, i2 z i 2. Then, probit of y i1 on z i1, i2. Harder to estimate APEs and test for endogeneity. 18 Danger with plugging in fitted values for y 2 is that one might be tempted to plug 2 into nonlinear functions, say y 2 2 or y 2 z 1. This does not result in consistent estimation of the scaled parameters or the partial effects. If we believe y 2 has a linear RF with additive normal error independent of z, the addition of v 2 solves the endogeneity problem regardless of how y 2 appears. Plugging in fitted values for y 2 only works in the case where the model is linear in y 2. Plus, the CF approach makes it much easier to test the null that for endogeneity of y 2 as well as compute APEs. Can understand the limits of CF approach by returning to E y 1 z, y 2, q 1 z 1 1 1 y 2 q 1, where y 2 is discrete. Rivers-Vuong approach does not generally work. Some poor strategies still linger. Suppose y 1 and y 2 are both binary and y 2 1 z 2 v 2 0 (25) and we maintain joint normality of u 1, v 2.We should not try to mimic 2SLS as follows: (i) Do probit of y 2 on z and get the fitted probabilities, 2 z 2. (ii) Do probit of y 1 on z 1, 2, that is, just replace y 2 with 2. 19 20

Currently, the only strategy we have is maximum likelihood estimation based on f y 1 y 2, z f y 2 z. [Perhaps this is why some, such as Angrist (2001), promote the notion of just using linear probability models estimated by 2SLS.] Bivariate probit software be used to estimate the probit model with a binary endogenous variable. Parallel discussions hold for ordered probit, Tobit. Multinomial Responses Recent push by Petrin and Train (2006), among others, to use control function methods where the second step estimation is something simple such as multinomial logit, or nested logit rather than being derived from a structural model. So, if we have reduced forms y 2 z 2 v 2, then we jump directly to convenient models for P y 1 j z 1, y 2, v 2.The average structural functions are obtained by averaging the response probabilities across v i2. No convincing way to handle discrete y 2, though. (26) 21 22 Exponential Models IV and CF approaches available for exponential models. Write E y 1 z, y 2, r 1 exp z 1 1 1 y 2 r 1, where r 1 is the omitted variable. As usual, CF methods based on E y 1 z, y 2 exp z 1 1 1 y 2 E exp r 1 z, y 2. For continuous y 2, can find E y 1 z, y 2 when D y 2 z is homoskedastic normal (Wooldridge, 1997) and when D y 2 z follows a probit (Terza, 1998). In the probit case, E y 1 z, y 2 exp z 1 1 1 y 2 h y 2, z 2, 1 (27) h y 2, z 2, 1 exp 2 1 /2 y 2 1 z 2 / z 2 1 y 2 1 1 z 2 / 1 z 2. IV methods that work for any y 2 are available [Mullahy (1997)]. If E y 1 z, y 2, r 1 exp x 1 1 r 1 (28) and r 1 is independent of z then E exp x 1 1 y 1 z E exp r 1 z 1, (29) where E exp r 1 1 is a normalization. The moment conditions are E exp x 1 1 y 1 1 z 0. (30) 23 24

4. Semiparametric and Nonparametric Approaches Blundell and Powell (2004) show how to relax distributional assumptions on u 1, v 2 in the model y 1 1 x 1 1 u 1 0, where x 1 can be any function of z 1, y 2. Their key assumption is that y 2 can be written as y 2 g 2 z v 2, where u 1, v 2 is independent of z, which rules out discreteness in y 2. Then P y 1 1 z, v 2 E y 1 z, v 2 H x 1 1, v 2 (31) for some (generally unknown) function H,. The average structural function is just ASF z 1, y 2 E vi2 H x 1 1, v i2. Two-step estimation: Estimate the function g 2 and then obtain residuals v i2 y i2 2 z i. BP (2004) show how to estimate H and 1 (up to scaled) and G, the distribution of u 1. The ASF is obtained from G x 1 1 or N ASF z 1, y 2 N 1 x 1 1, v i2 ; i 1 Blundell and Powell (2003) allow P y 1 1 z, y 2 to have general form H z 1, y 2, v 2, and the second-step estimation is entirely nonparametric. Further, 2 can be fully nonparametric. Parametric approximations might produce good estimates of the APEs. (32) 25 26 BP (2003) consider a very general setup: y 1 g 1 z 1, y 2, u 1, with ASF 1 z 1, y 2 g 1 z 1, y 2, u 1 df 1 u 1, where F 1 is the distribution of u 1. Key restrictions are that y 2 can be written as y 2 g 2 z v 2, where u 1, v 2 is independent of z. Key: ASF can be obtained from E y 1 z 1, y 2, v 2 h 1 z 1, y 2, v 2 by averaging out v 2, and fully nonparametric two-step estimates are available. Can also justify flexible parametric approaches and just skip modeling g 1. (33) (34) 5. Methods for Panel Data Combine methods for correlated random effects models with CF methods for nonlinear panel data models with unobserved heterogeneity and EEVs. Illustrate a parametric approach used by Papke and Wooldridge (2008), which applies to binary and fractional responses. Nothing appears to be known about applying fixed effects probit to estimate the fixed effects while also dealing with endogeneity. Likely to be poor for small T. 27 28

Model with time-constant unobserved heterogeneity, c i1, and time-varying unobservables, v it1,as Write z it z it1, z it2, so that the time-varying IVs z it2 are excluded from the structural. Chamberlain approach: E y it1 y it2, z i, c i1, v it1 1 y it2 z it1 1 c i1 v it1. (35) c i1 1 z i 1 a i1, a i1 z i ~ Normal 0, 2 a1. (36) Allow the heterogeneity, c i1, to be correlated with y it2 and z i, where z i z i1,...,z it is the vector of strictly exogenous variables (conditional on c i1 ). The time-varying omitted variable, v it1,is uncorrelated with z i strict exogeneity but may be correlated with y it2. As an example, y it1 is a female labor force participation indicator and y it2 is other sources of income. Next step: E y it1 y it2, z i, r it1 1 y it2 z it1 1 1 z i 1 r it1 where r it1 a i1 v it1. Next, assume a linear reduced form for y it2 : y it2 2 z it 2 z i 2 v it2, t 1,...,T. (37) 29 30 Rules out discrete y it2 because Then r it1 1 v it2 e it1, e it1 z i, v it2 ~ Normal 0, 2 e1, t 1,...,T. E y it1 z i, y it2, v it2 e1 y it2 z it1 e1 (38) (39) e1 z i e1 e1 v it2 (40) where the e subscript denotes division by 1 2 e1 1/2. This equation is the basis for CF estimation. Simple two-step procedure: (i) Estimate the reduced form for y it2 (pooled across t, or maybe for each t separately; at a minimum, different time period intercepts should be allowed). Obtain the residuals, v it2 for all i, t pairs. The estimate of 2 is the fixed effects estimate. (ii) Use the pooled probit (quasi)-mle of y it1 on y it2, z it1, z i, v it2 to estimate e1, e1, e1, e1 and e1. Delta method or bootstrapping (resampling cross section units) for standard errors. Can ignore first-stage estimation to test e1 0 (but test should be fully robust to variance misspecification and serial independence). 31 32

Estimates of average partial effects are based on the average structural function, which is consistently estimated as E ci1,v it1 1 y t2 z t1 1 c i1 v it1, N N 1 e1 y t2 z t1 e1 e1 z i e1 e1v it2. i 1 These APEs, typically with further averaging out across t and perhaps over y t2 and z t1, can be compared directly with fixed effects IV estimates. (41) (42) Model: Linear Fractional Probit Estimation Method: Instrumental Variables Pooled QMLE Coefficient Coefficient APE log(arexppp).555 1.731.583 (.221) (.759) (.255) lunch. 062. 298. 100. 074 (.202) (.068) log(enroll).046.286.096 (.070) (.209) (.070) v 2. 424 1. 378 (.232) (.811) Scale Factor.337 33 34