A Course in Applied Econometrics. Lecture 18: Missing Data. Jeff Wooldridge, IRP Lectures, UW Madison, August 2008


Outline:
1. When Can Missing Data be Ignored?
2. Inverse Probability Weighting
3. Imputation
4. Heckman-Type Selection Corrections

1. When Can Missing Data be Ignored?

Linear model with IVs:

$$y_i = x_i\beta + u_i, \tag{1}$$

where $x_i$ is $1 \times K$ and the instruments $z_i$ are $1 \times L$, $L \geq K$. Let $s_i$ be the selection indicator, with $s_i = 1$ if we can use observation $i$. With $L = K$, the complete case estimator is

$$\hat\beta_{IV} = \left(N^{-1}\sum_{i=1}^N s_i z_i' x_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N s_i z_i' y_i\right) \tag{2}$$

$$\phantom{\hat\beta_{IV}} = \beta + \left(N^{-1}\sum_{i=1}^N s_i z_i' x_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N s_i z_i' u_i\right). \tag{3}$$

For consistency, we need $\operatorname{rank} E(z_i' x_i \mid s_i = 1) = K$ and

$$E(s_i z_i' u_i) = 0, \tag{4}$$

which is implied by

$$E(u_i \mid z_i, s_i) = 0. \tag{5}$$

Sufficient for (5) is

$$E(u_i \mid z_i) = 0, \quad s_i = h(z_i) \tag{6}$$

for some function $h(\cdot)$.

The zero covariance assumption in the population, $E(z_i' u_i) = 0$, is not sufficient for consistency when $s_i = h(z_i)$. A special case is when $E(y_i \mid x_i) = x_i\beta$ and selection $s_i$ is a function of $x_i$.

Nonlinear models/estimation methods:

Nonlinear Least Squares: $E(y \mid x, s) = E(y \mid x)$.
Least Absolute Deviations: $\operatorname{Med}(y \mid x, s) = \operatorname{Med}(y \mid x)$.
Maximum Likelihood: $D(y \mid x, s) = D(y \mid x)$ or $D(s \mid y, x) = D(s \mid x)$.

All of these allow selection on $x$ but not generally on $y$. For estimating $\mu = E(y_i)$, unbiasedness and consistency of the sample mean on the selected sample requires $E(y \mid s) = E(y)$.
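The claim that complete-case estimation tolerates selection on $x$ but not on $y$ can be checked in a small simulation. The sketch below is illustrative only: the data-generating process ($y = x + u$ with standard normal $x$ and $u$), the two selection rules, and the function names are assumptions made for this example, not part of the lecture.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on a constant and a scalar x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

def complete_case_slopes(n=20000, beta=1.0, seed=0):
    """Return complete-case OLS slopes under selection on x and on y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Selection on x: s = 1[x > 0], a function of the regressor only.
    sel_x = [(xi, yi) for xi, yi in zip(x, y) if xi > 0]
    # Selection on y: s = 1[y > 0], which depends on the error u.
    sel_y = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]
    b_x = ols_slope([p[0] for p in sel_x], [p[1] for p in sel_x])
    b_y = ols_slope([p[0] for p in sel_y], [p[1] for p in sel_y])
    return b_x, b_y

b_sel_x, b_sel_y = complete_case_slopes()
print(f"selection on x: {b_sel_x:.3f}, selection on y: {b_sel_y:.3f}")
```

Under these assumptions the slope from the sample selected on $x$ should sit near the true value of 1, while the slope from the sample selected on $y$ is attenuated.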

Panel data: if we model $D(y_t \mid x_t)$, and $s_t$ is the selection indicator, the sufficient condition to ignore selection is

$$D(s_t \mid x_t, y_t) = D(s_t \mid x_t), \quad t = 1, \ldots, T. \tag{7}$$

Let the true conditional density be $f_t(y_{it} \mid x_{it}, \gamma)$. Then the partial log-likelihood function for a random draw $i$ from the cross section can be written as

$$\sum_{t=1}^T s_{it} \log f_t(y_{it} \mid x_{it}, g) \equiv \sum_{t=1}^T s_{it}\, l_{it}(g). \tag{8}$$

Can show under (7) that

$$E[s_{it}\, l_{it}(g) \mid x_{it}] = E(s_{it} \mid x_{it})\, E[l_{it}(g) \mid x_{it}]. \tag{9}$$

If $x_{it}$ includes $y_{i,t-1}$, (7) allows selection on $y_{i,t-1}$, but not on "shocks" from $t-1$ to $t$. Similar findings hold for NLS, quasi-MLE, and quantile regression.

Methods to remove time-constant, unobserved heterogeneity: for a random draw $i$,

$$y_{it} = \eta_t + x_{it}\beta + c_i + u_{it}, \tag{10}$$

with IVs $z_{it}$ for $x_{it}$. Random effects IV methods (unbalanced panel) require

$$E(u_{it} \mid z_{i1}, \ldots, z_{iT}, s_{i1}, \ldots, s_{iT}, c_i) = 0, \quad t = 1, \ldots, T \tag{11}$$

$$E(c_i \mid z_{i1}, \ldots, z_{iT}, s_{i1}, \ldots, s_{iT}) = E(c_i) = 0. \tag{12}$$

Selection in any time period cannot depend on $u_{it}$ or $c_i$.

FE on unbalanced panel: can get by with just (11). Let

$$\ddot{y}_{it} = y_{it} - T_i^{-1}\sum_{r=1}^T s_{ir} y_{ir},$$

and similarly for $\ddot{x}_{it}$ and $\ddot{z}_{it}$, where $T_i = \sum_{r=1}^T s_{ir}$ is the number of time periods observed for unit $i$. The FEIV estimator is

$$\hat\beta_{FEIV} = \left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T s_{it}\ddot{z}_{it}'\ddot{x}_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T s_{it}\ddot{z}_{it}'\ddot{y}_{it}\right).$$

The weakest condition for consistency is

$$\sum_{t=1}^T E(s_{it}\ddot{z}_{it}' u_{it}) = 0. \tag{13}$$

One important violation of (11) is when units drop out of the sample in period $t+1$ because of shocks ($u_{it}$) realized in time $t$. This generally induces correlation between $s_{i,t+1}$ and $u_{it}$. A simple variable addition test is available: add $s_{i,t+1}$ as a regressor and test its significance.

Consistency of FE (and FEIV) on the unbalanced panel under (11) breaks down if the slope coefficients are random and one ignores this in estimation. The error term contains the term $x_i d_i$, where $d_i = b_i - \beta$. A simple test is based on the alternative

$$E(b_i \mid z_{i1}, \ldots, z_{iT}, s_{i1}, \ldots, s_{iT}) = E(b_i \mid T_i). \tag{14}$$

Then, add interaction terms of dummies for each possible sample size (with $T_i = T$ as the base group):

$$1[T_i = 2]\,x_{it},\; 1[T_i = 3]\,x_{it},\; \ldots,\; 1[T_i = T-1]\,x_{it}.$$

Estimate the equation by FE or FEIV.
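The time-demeaning over observed periods and the sample-size interactions described above can be sketched for a single unit as follows. The function names and the list-based data layout are hypothetical choices for illustration; a real application would build these variables for every unit and pass them to an FE or FEIV routine.

```python
def demean_unbalanced(values, s):
    """Time-demean one unit's series using only its observed periods.

    values: list of y_it (entries where s_it = 0 are ignored)
    s:      list of 0/1 selection indicators s_it
    Returns (T_i, demeaned), with None in periods where s_it = 0.
    """
    t_i = sum(s)
    if t_i == 0:
        raise ValueError("unit has no observed periods")
    mean_i = sum(v for v, si in zip(values, s) if si == 1) / t_i
    demeaned = [v - mean_i if si == 1 else None for v, si in zip(values, s)]
    return t_i, demeaned

def size_interactions(x_it, t_i, t_max):
    """Interactions 1[T_i = k] * x_it for k = 2, ..., T-1 (T_i = T is the base group)."""
    return [x_it if t_i == k else 0.0 for k in range(2, t_max)]

# A unit observed in periods 1, 2 and 4 of a T = 4 panel.
t_i, y_dd = demean_unbalanced([1.0, 2.0, 9.0, 3.0], [1, 1, 0, 1])
print(t_i, y_dd)
print(size_interactions(5.0, t_i, 4))
```

The unobserved third period contributes nothing to the unit mean, so the average is taken over $T_i = 3$ periods only.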

Can use FD in the basic model, too, which is very useful for attrition problems (later). Generally, if

$$\Delta y_{it} = \xi_t + \Delta x_{it}\beta + \Delta u_{it}, \quad t = 2, \ldots, T \tag{15}$$

and if $z_{it}$ is the set of IVs at time $t$, we can use

$$E(\Delta u_{it} \mid z_{it}, s_{it}) = 0 \tag{16}$$

as being sufficient to ignore the missingness. Again, can add $s_{i,t+1}$ to test for attrition.

Nonlinear models with unobserved effects are more difficult to handle. Certain conditional MLEs (logit, Poisson) can accommodate selection that is arbitrarily correlated with the unobserved effect.

2. Inverse Probability Weighting

Weighting with Cross-Sectional Data

When selection is not on conditioning variables, can try to use probability weights to reweight the selected sample to make it representative of the population. Suppose $y$ is a random variable whose population mean $\mu_y = E(y)$ we would like to estimate, but some observations are missing on $y$. Let $\{(y_i, s_i, z_i): i = 1, \ldots, N\}$ indicate independent, identically distributed draws from the population, where $z_i$ is always observed (for now).

"Selection on observables" assumption:

$$P(s = 1 \mid y, z) = P(s = 1 \mid z) \equiv p(z), \tag{17}$$

where $p(z) > 0$ for all possible values of $z$. Consider

$$\tilde\mu_{IPW} = N^{-1}\sum_{i=1}^N \frac{s_i}{p(z_i)}\, y_i, \tag{18}$$

where $s_i$ selects out the observed data points. Using (17) and iterated expectations, can show $\tilde\mu_{IPW}$ is consistent (and unbiased) for $\mu_y$. (The same kind of estimate is used for treatment effects.)

Sometimes $p(z_i)$ is known, but mostly it needs to be estimated. Let $\hat p(z_i)$ denote the estimated selection probability:

$$\hat\mu_{IPW} = N^{-1}\sum_{i=1}^N \frac{s_i}{\hat p(z_i)}\, y_i. \tag{19}$$

Can also write this as

$$\hat\mu_{IPW} = N_1^{-1}\sum_{i=1}^N s_i\,\frac{\hat\rho}{\hat p(z_i)}\, y_i, \tag{20}$$

where $N_1 = \sum_{i=1}^N s_i$ is the number of selected observations and $\hat\rho = N_1/N$ is a consistent estimate of $P(s_i = 1)$.
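The IPW mean in (18) and (19) can be sketched with a simulated population in which selection depends on a binary $z$. Everything specific here is an assumption for illustration: the data-generating process, the selection probabilities 0.9 and 0.3, and the use of within-cell selection frequencies as $\hat p(z)$ (a saturated model, which is only available when $z$ is discrete).

```python
import random

def ipw_mean(y, s, p):
    """N^{-1} * sum_i s_i * y_i / p_i; y_i is used only when s_i = 1."""
    n = len(s)
    return sum(yi / pi for yi, si, pi in zip(y, s, p) if si == 1) / n

def cell_probabilities(s, z):
    """Estimate p(z) by the selection frequency within each value of a discrete z."""
    counts, kept = {}, {}
    for si, zi in zip(s, z):
        counts[zi] = counts.get(zi, 0) + 1
        kept[zi] = kept.get(zi, 0) + si
    return [kept[zi] / counts[zi] for zi in z]

def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    y = [1.0 + 2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]  # E(y) = 2
    p = [0.3 if zi == 1 else 0.9 for zi in z]                # P(s = 1 | z)
    s = [1 if rng.random() < pi else 0 for pi in p]
    complete_case = sum(yi for yi, si in zip(y, s) if si) / sum(s)
    known = ipw_mean(y, s, p)                                # weights from true p(z)
    estimated = ipw_mean(y, s, cell_probabilities(s, z))     # weights from estimated p(z)
    return complete_case, known, estimated

print(simulate())
```

In this design the complete-case average is pulled toward the low-$z$ group, which is over-represented among the selected observations, while both weighted averages should land near the population mean of 2.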

A different estimate is obtained by solving the least squares problem

$$\min_m \sum_{i=1}^N \frac{s_i}{\hat p(z_i)}\,(y_i - m)^2.$$

Horowitz and Manski (1998) study estimating population means using IPW. HM focus on bounds in estimating $E[g(y) \mid x \in A]$ for conditioning variables $x$. The problem with certain IPW estimators based on weights that estimate $P(s = 1)/P(s = 1 \mid z)$ is that the resulting estimate of the mean can lie outside the natural bounds. One should use $P(s = 1 \mid x \in A)/P(s = 1 \mid x \in A, z)$ if possible. Unfortunately, one cannot generally estimate the proper weights if $x$ is sometimes missing.

The HM problem is related to another issue. Suppose

$$E(y \mid x) = \alpha + x\beta. \tag{21}$$

Let $z$ be a set of variables that are always observed and let $p(z)$ be the selection probability, as before. Suppose at least part of $x$ is not always observed, so that $x$ is not a subset of $z$. Consider the IPW estimator of $(\alpha, \beta)$, which solves

$$\min_{a,b} \sum_{i=1}^N \frac{s_i}{\hat p(z_i)}\,(y_i - a - x_i b)^2. \tag{22}$$

The problem is that if

$$P(s = 1 \mid x, y) = P(s = 1 \mid x), \tag{23}$$

the IPW estimator is generally inconsistent because the condition

$$P(s = 1 \mid x, y, z) = P(s = 1 \mid z) \tag{24}$$

is unlikely to hold. On the other hand, if (23) holds, we can consistently estimate the parameters using OLS on the selected sample.

If $x$ is always observed, the case for weighting is much stronger because then $x \subset z$. If selection is on $x$, this should be picked up in large samples in the estimation of $P(s = 1 \mid z)$.

If selection is exogenous and $x$ is always observed, is there a reason to use IPW? Not if we believe (21) along with the homoskedasticity assumption $\operatorname{Var}(y \mid x) = \sigma^2$. Then OLS is efficient and IPW is less efficient. IPW can be more efficient with heteroskedasticity (but WLS with the correct heteroskedasticity function would be best).

Still, one can argue for weighting under (23) as a way to consistently estimate the linear projection. Write

$$L(y \mid 1, x) = \alpha^* + x\beta^*, \tag{25}$$

where $L(\cdot \mid \cdot)$ denotes the linear projection. Under $P(s = 1 \mid x, y) = P(s = 1 \mid x)$, the IPW estimator is consistent for the projection parameters $(\alpha^*, \beta^*)$. The unweighted estimator has a probability limit that depends on $p(x)$.
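The linear projection argument can be illustrated with a deliberately nonlinear conditional mean. In the sketch below, $x$ is uniform on $(0, 1)$ and $E(y \mid x) = x^2$, so the population linear projection slope is $\operatorname{Cov}(x, x^2)/\operatorname{Var}(x) = (1/12)/(1/12) = 1$. The design, the selection rule, and the function names are assumptions for illustration; $x$ is always observed here, so weighting by $1/p(x)$ is feasible.

```python
import random

def weighted_slope(x, y, w):
    """Slope from weighted least squares of y on a constant and a scalar x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return s_xy / s_xx

def projection_slopes(n=20000, seed=2):
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]              # x ~ Uniform(0, 1)
    y = [xi ** 2 + rng.gauss(0.0, 0.1) for xi in x]   # E(y | x) = x^2, not linear
    p = [0.9 if xi > 0.5 else 0.1 for xi in x]        # P(s = 1 | x), selection on x only
    keep = [i for i in range(n) if rng.random() < p[i]]
    xs = [x[i] for i in keep]
    ys = [y[i] for i in keep]
    unweighted = weighted_slope(xs, ys, [1.0] * len(keep))
    ipw = weighted_slope(xs, ys, [1.0 / p[i] for i in keep])
    return unweighted, ipw

print(projection_slopes())
```

Because the selected sample over-represents large $x$, where $x^2$ is steeper, the unweighted slope should exceed 1, while the weighted slope should be close to the population projection slope.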

Parameters in the LP show up in certain treatment effect estimators, and are the basis for the "double robustness" result of Robins and Ritov (1997) in the case of linear regression.

The double robustness result holds for certain nonlinear models, but one must choose the model for $E(y \mid x)$ and the objective function appropriately; see Wooldridge (2007). (For a binary or fractional response, use the logistic function and the Bernoulli quasi-log likelihood (QLL). For a nonnegative response, use the exponential function with the Poisson QLL.)

Return to the IPW regression estimator under

$$P(s = 1 \mid y, z) = P(s = 1 \mid z) = G(z, \gamma), \tag{26}$$

with $E(u) = 0$, $E(x'u) = 0$, for a parametric function $G(\cdot)$ (such as a flexible logit), where $\hat\gamma$ is the binary response MLE. The asymptotic variance of $\hat\theta_{IPW}$, using the estimated probability weights, is

$$\operatorname{Avar}\sqrt{N}(\hat\theta_{IPW} - \theta) = [E(x_i'x_i)]^{-1}\,E(r_i r_i')\,[E(x_i'x_i)]^{-1}, \tag{27}$$

where $r_i$ is the $P \times 1$ vector of population residuals from the regression of $[s_i/p(z_i)]\,x_i'u_i$ on $d_i'$, and $d_i$ is the $M \times 1$ score for the MLE used to obtain $\hat\gamma$.

The variance in (27) is never larger than (and is typically smaller than) the variance obtained if we knew $p(z_i)$. This leads to a simple estimate of $\operatorname{Avar}(\hat\theta_{IPW})$:

$$\left[\sum_{i=1}^N (s_i/\hat G_i)\,x_i'x_i\right]^{-1}\left[\sum_{i=1}^N \hat r_i\hat r_i'\right]\left[\sum_{i=1}^N (s_i/\hat G_i)\,x_i'x_i\right]^{-1}. \tag{28}$$

If selection is estimated by logit with regressors $h_i = h(z_i)$,

$$\hat d_i = h_i'\,[s_i - \Lambda(h_i\hat\gamma)], \tag{29}$$

where $\Lambda(a) = \exp(a)/[1 + \exp(a)]$.

This illustrates an interesting finding of Robins, Rotnitzky, and Zhao (RRZ, 1995): one can never do worse for estimating the parameters of interest, $\theta$, and usually does better, when adding irrelevant functions to a logit selection model in the first stage. The Hirano, Imbens, and Ridder (2003) estimator keeps expanding $h_i$.

The adjustment in (27) carries over to general nonlinear models and estimation methods. Ignoring the estimation in $\hat p(z)$, as is standard, is asymptotically conservative. When selection is exogenous in the sense of $P(s = 1 \mid x, y, z) = P(s = 1 \mid x)$, the adjustment makes no difference.

Nevo (2003) studies the case where the population moments are $E[r(w_i, \theta)] = 0$ and selection depends on elements of $w_i$ that are not always observed. Use information on population means $E[h(w_i)]$ such that $P(s = 1 \mid w) = P(s = 1 \mid h(w))$ and use method of moments. For a logit selection model,

$$E\left[\frac{s_i}{\Lambda(h(w_i)\lambda)}\,r(w_i, \theta)\right] = 0 \tag{30}$$

$$E\left[\frac{s_i}{\Lambda(h(w_i)\lambda)}\,h(w_i)\right] = \mu_h, \tag{31}$$

where $\mu_h$ is known. Equation (31) generally identifies $\lambda$, and $\hat\lambda$ can be used in a second step to choose $\hat\theta$ in a weighted GMM procedure.

Attrition in Panel Data

Inverse probability weighting can be applied to the attrition problem in panel data. Many estimation methods can be used, but consider MLE. We have a parametric density, $f_t(y_t \mid x_t, \theta)$, and let $s_{it}$ be the selection indicator. Pooled MLE on the observed data:

$$\max_\theta \sum_{i=1}^N\sum_{t=1}^T s_{it}\log f_t(y_{it} \mid x_{it}, \theta), \tag{32}$$

which is consistent if $P(s_{it} = 1 \mid y_{it}, x_{it}) = P(s_{it} = 1 \mid x_{it})$. If not, maybe we can find variables $r_{it}$ such that

$$P(s_{it} = 1 \mid y_{it}, x_{it}, r_{it}) = P(s_{it} = 1 \mid r_{it}) \equiv p_{it} > 0. \tag{33}$$

The weighted MLE is

$$\max_\theta \sum_{i=1}^N\sum_{t=1}^T (s_{it}/p_{it})\log f_t(y_{it} \mid x_{it}, \theta). \tag{34}$$

Under (33), IPW is generally consistent because

$$E[(s_{it}/p_{it})\,q_t(w_{it}, \theta)] = E[q_t(w_{it}, \theta)], \tag{35}$$

where $q_t(w_{it}, \theta) = \log f_t(y_{it} \mid x_{it}, \theta)$.

How do we choose $r_{it}$ to make (33) hold (if possible)? RRZ (1995) propose a sequential strategy,

$$\pi_{it} = P(s_{it} = 1 \mid z_{it}, s_{i,t-1} = 1), \quad t = 1, \ldots, T. \tag{36}$$

Typically, $z_{it}$ contains elements from $(w_{i,t-1}, \ldots, w_{i1})$.

How do we obtain $p_{it}$ from the $\pi_{it}$? Not without some strong assumptions. Let $v_{it} = (w_{it}, z_{it})$, $t = 1, \ldots, T$. An ignorability assumption that works is

$$P(s_{it} = 1 \mid v_i, s_{i,t-1} = 1) = P(s_{it} = 1 \mid z_{it}, s_{i,t-1} = 1). \tag{37}$$

That is, given the entire history $v_i = (v_{i1}, \ldots, v_{iT})$, selection at time $t$ depends only on variables observed at $t-1$. RRZ (1995) show how to relax it somewhat in a regression framework with time-constant covariates. Using (37), can show that

$$p_{it} \equiv P(s_{it} = 1 \mid v_i) = \pi_{it}\,\pi_{i,t-1}\cdots\pi_{i1}. \tag{38}$$
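The step from the conditional retention probabilities $\pi_{it}$ to the weights $s_{it}/p_{it}$ via the product in (38) can be sketched for one unit as follows. The function names are hypothetical, and the fitted probabilities in the example are made-up numbers; in practice they would come from period-by-period binary response models.

```python
def cumulative_retention(pi_hat):
    """Map (pi_i1, ..., pi_iT) to p_it = pi_it * pi_i,t-1 * ... * pi_i1."""
    p, running = [], 1.0
    for pi in pi_hat:
        running *= pi
        p.append(running)
    return p

def attrition_weights(pi_hat, s):
    """Weights s_it / p_it; these are zero once the unit has left the sample."""
    return [si / pit for si, pit in zip(s, cumulative_retention(pi_hat))]

# One unit: always present in period 1, retained in period 2, gone in period 3.
pi_hat = [1.0, 0.8, 0.5]
s = [1, 1, 0]
print(cumulative_retention(pi_hat))
print(attrition_weights(pi_hat, s))
```

Units that were unlikely to survive to period $t$ but did so receive large weights, which is how the weighted objective in (34) restores representativeness.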

So, a consistent two-step method is:

(i) In each time period, estimate a binary response model for $P(s_{it} = 1 \mid z_{it}, s_{i,t-1} = 1)$, which means on the group still in the sample at $t-1$. The fitted probabilities are the $\hat\pi_{it}$. Form $\hat p_{it} = \hat\pi_{it}\,\hat\pi_{i,t-1}\cdots\hat\pi_{i1}$.
(ii) Replace $p_{it}$ with $\hat p_{it}$ in (34), and obtain the weighted pooled MLE.

As shown by RRZ (1995) in the regression case, it is more efficient to estimate the $p_{it}$ than to use known weights, if we could. See RRZ (1995) and Wooldridge (2002) for a simple regression method for adjusting the score.

IPW for attrition suffers from a similar drawback as in the cross section case. Namely, if $P(s_{it} = 1 \mid w_{it}) = P(s_{it} = 1 \mid x_{it})$, then the unweighted estimator is consistent. If we use weights that are not a function of $x_{it}$ in this case, the IPW estimator is generally inconsistent.

Related to the previous point: we would rarely apply IPW in the case of a model with completely specified dynamics. Why? If we have a model for $D(y_{it} \mid x_{it}, y_{i,t-1}, \ldots, x_{i1}, y_{i0})$ or $E(y_{it} \mid x_{it}, y_{i,t-1}, \ldots, x_{i1}, y_{i0})$, then our variables affecting attrition, $z_{it}$, are likely to be functions of $(y_{i,t-1}, \ldots, x_{i1}, y_{i0})$. If they are, the unweighted estimator is consistent. For misspecified models, we might still want to weight.

3. Imputation

So far, we have discussed when we can just drop missing observations (Section 1) or when the complete cases can be used in a weighting method (Section 2). A different approach to missing data is to try to fill in the missing values, and then analyze the resulting data set as a complete data set. Little and Rubin (2002) provide an accessible treatment of imputation and multiple imputation methods, with many references to work by Rubin and coauthors.

Imputing missing values is not always valid. Most methods depend on a missing at random (MAR) assumption. When data are missing on the response variable, $y$, MAR is essentially the same as $P(s = 1 \mid y, x) = P(s = 1 \mid x)$. Missing completely at random (MCAR) is when $s$ is independent of $w = (x, y)$.

MAR for general missing data patterns. Let $w_i = (w_{i1}, w_{i2})$ be a random draw from the population. Let $r_i = (r_{i1}, r_{i2})$ be the "retention" indicators for $w_{i1}$ and $w_{i2}$, so $r_{ig} = 1$ implies $w_{ig}$ is observed. MCAR is that $r_i$ is independent of $w_i$. The MAR assumption is that $P(r_{i1} = 0, r_{i2} = 0 \mid w_i) = P(r_{i1} = 0, r_{i2} = 0) \equiv \pi_{00}$, and so on.

MAR is more natural with monotone missing data problems; we just saw the case of attrition. Order the variables so that if $w_{ih}$ is observed then so is $w_{ig}$, $g < h$. Write

$$f(w_1, \ldots, w_G) = f(w_G \mid w_{G-1}, \ldots, w_1)\cdot f(w_{G-1} \mid w_{G-2}, \ldots, w_1)\cdots f(w_2 \mid w_1)\,f(w_1).$$

Partial log likelihood:

$$\sum_{g=1}^G r_{ig}\log f(w_{ig} \mid w_{i,g-1}, \ldots, w_{i1}, \theta), \tag{39}$$

where we use $r_{ig} = r_{ig}\,r_{i,g-1}\cdots r_{i2}$. Under MAR,

$$E(r_{ig} \mid w_{ig}, \ldots, w_{i1}) = E(r_{ig} \mid w_{i,g-1}, \ldots, w_{i1}). \tag{40}$$

(39) is the basis for filling in data in monotonic MAR schemes.

Simple example of imputation. Let $\mu_y = E(y)$, but data are missing on some $y_i$. Unless $P(s_i = 1 \mid y_i) = P(s_i = 1)$, the complete-case average is not consistent for $\mu_y$. Suppose that the selection is ignorable conditional on $x$:

$$E(y \mid x, s) = E(y \mid x) = m(x, \beta). \tag{41}$$

NLS using the selected sample is consistent for $\beta$. Obtain a fitted value, $m(x_i, \hat\beta)$, for any unit in the sample. Let $\hat y_i = s_i y_i + (1 - s_i)\,m(x_i, \hat\beta)$ be the imputed data. Imputation estimator:

$$\hat\mu_y = N^{-1}\sum_{i=1}^N\left[s_i y_i + (1 - s_i)\,m(x_i, \hat\beta)\right]. \tag{42}$$

From

$$\operatorname{plim}(\hat\mu_y) = E[s_i y_i + (1 - s_i)\,m(x_i, \beta)] \tag{43}$$

we can show consistency of $\hat\mu_y$ because, under (41),

$$E[s_i y_i + (1 - s_i)\,m(x_i, \beta)] = E[m(x_i, \beta)] = \mu_y.$$

Danger in using imputation methods: we might be tempted to treat the imputed data as real random draws. This generally leads to incorrect inference because of inconsistent variance estimation. (In linear regression, it is easy to see that the estimated variance is too small.)

Little and Rubin (2002) call (43) the method of "conditional means." In their Table 4.1 they document the downward bias in variance estimates.

LR propose adding a random draw to $m(x_i, \hat\beta)$, assuming that we can estimate $D(y \mid x)$. If we assume $D(u_i \mid x_i) = \text{Normal}(0, \sigma_u^2)$, draw $\dot u_i$ from a $\text{Normal}(0, \hat\sigma_u^2)$ distribution, where $\hat\sigma_u^2$ is estimated using the complete case nonlinear regression residuals, and then use $m(x_i, \hat\beta) + \dot u_i$ for the missing data. This is called the "conditional draw" method of imputation (a special case of stochastic imputation).

It is generally difficult to quantify the uncertainty from single-imputation methods, where one imputed value is obtained for each missing variable. Can bootstrap the entire estimation/imputation steps, but this is computationally intensive.
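The conditional-mean and conditional-draw imputations described above can be compared in a simulation. This is a sketch under assumed conditions: a linear $m(x, \beta) = \beta_0 + \beta_1 x$ (so OLS plays the role of NLS), normal errors, selection that depends only on $x$, and hypothetical function names.

```python
import random

def fit_line(x, y):
    """OLS intercept and slope for y on a constant and a scalar x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def variance(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def impute(n=20000, seed=3):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # E(y) = 1, Var(y) = 5
    # Selection depends on x only, so (41) holds in this design.
    s = [1 if rng.random() < (0.9 if xi < 0 else 0.3) else 0 for xi in x]
    xc = [xi for xi, si in zip(x, s) if si]
    yc = [yi for yi, si in zip(y, s) if si]
    a, b = fit_line(xc, yc)                                 # complete-case regression
    resid = [yi - a - b * xi for xi, yi in zip(xc, yc)]
    sigma = (sum(r * r for r in resid) / (len(resid) - 2)) ** 0.5
    # Conditional means: fill in the fitted value when y is missing.
    cond_mean = [yi if si else a + b * xi for xi, yi, si in zip(x, y, s)]
    # Conditional draws: fitted value plus a Normal(0, sigma^2) draw.
    cond_draw = [yi if si else a + b * xi + rng.gauss(0.0, sigma)
                 for xi, yi, si in zip(x, y, s)]
    return {
        "complete_case_mean": sum(yc) / len(yc),
        "imputed_mean": sum(cond_mean) / n,
        "var_cond_mean": variance(cond_mean),
        "var_cond_draw": variance(cond_draw),
    }

print(impute())
```

In this design the imputed mean should be close to 1 while the complete-case mean is well below it, and the sample variance of the conditional-mean data set should understate $\operatorname{Var}(y)$ relative to the conditional-draw data set, which is the variance problem noted in the text.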

Multiple imputation is an alternative. Its theoretical justification is Bayesian, based on obtaining the posterior distribution (in particular, the mean and variance) of the parameters conditional on the observed data. For general missing data patterns, the computation required to impute missing values is quite complicated, and involves simulation methods of estimation. See also Cameron and Trivedi (2005).

General idea: rather than impute just one set of missing values to create one complete data set, create several imputed data sets. (Often the number is fairly small, such as five or so.) Estimate the parameters of interest using each imputed data set, and average to obtain a final parameter estimate and sampling error.

Let $W_{mis}$ denote the matrix of missing data and $W_{obs}$ the matrix of observations. Assume that MAR holds. MAR is used to estimate $E(\theta \mid W_{obs})$, the posterior mean of $\theta$ given $W_{obs}$. But by iterated expectations,

$$E(\theta \mid W_{obs}) = E[E(\theta \mid W_{obs}, W_{mis}) \mid W_{obs}]. \tag{44}$$

If $\hat\theta_d = E(\theta \mid W_{obs}, W_{mis}^{(d)})$ for imputed data set $d$, then approximate $E(\theta \mid W_{obs})$ as

$$\bar\theta = D^{-1}\sum_{d=1}^D \hat\theta_d. \tag{45}$$

Further, we can obtain a "sampling" variance by estimating $\operatorname{Var}(\theta \mid W_{obs})$ using

$$\operatorname{Var}(\theta \mid W_{obs}) = E[\operatorname{Var}(\theta \mid W_{obs}, W_{mis}) \mid W_{obs}] + \operatorname{Var}[E(\theta \mid W_{obs}, W_{mis}) \mid W_{obs}], \tag{46}$$

which suggests

$$\widehat{\operatorname{Var}}(\theta \mid W_{obs}) = D^{-1}\sum_{d=1}^D \hat V_d + (D-1)^{-1}\sum_{d=1}^D (\hat\theta_d - \bar\theta)(\hat\theta_d - \bar\theta)' \equiv \bar V + B. \tag{47}$$

For a small number of imputations, a correction is usually made, namely, $\bar V + (1 + D^{-1})B$.

Assuming that one trusts the MAR assumption and the underlying distributions used to draw the imputed values, inference with multiple imputations is fairly straightforward. $D$ need not be very large, so estimation using nonlinear models is relatively easy, given the imputed data.

Use caution when applying to models with missing conditioning variables. Suppose $x = (x_1, x_2)$, we are interested in $D(y \mid x)$, data are missing on $y$ and $x_2$, and selection is a function of $x_2$. Using the complete cases will be consistent. Imputation methods would not be, as they require $D(s \mid y, x_1, x_2) = D(s \mid x_1)$.
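The combining formulas (45) and (47), together with the small-$D$ correction, amount to a few lines of arithmetic for a scalar parameter. The function name and the example numbers below are hypothetical; the per-data-set estimates and variances would come from whatever estimator is applied to each imputed data set.

```python
def combine_imputations(estimates, variances):
    """Combine D scalar estimates and their estimated variances.

    Returns (theta_bar, V_bar + B, V_bar + (1 + 1/D) * B).
    """
    d = len(estimates)
    if d < 2 or len(variances) != d:
        raise ValueError("need at least two imputations with matching variances")
    theta_bar = sum(estimates) / d                                # average estimate
    v_bar = sum(variances) / d                                    # within-imputation variance
    b = sum((t - theta_bar) ** 2 for t in estimates) / (d - 1)    # between-imputation variance
    return theta_bar, v_bar + b, v_bar + (1.0 + 1.0 / d) * b

# Three imputed data sets with made-up estimates and variances.
print(combine_imputations([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

For a vector parameter the squared deviations become outer products, as in (47), but the structure is the same.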

4. Heckman-Type Selection Corrections

There are advantages to applying IV methods when data are missing on explanatory variables in addition to the response variable. Briefly, a variable that is exogenous in the population model need not be exogenous in the selected subpopulation. (Example: wage-benefits tradeoff.)

$$y_1 = z_1\delta_1 + \alpha_1 y_2 + u_1 \tag{48}$$

$$y_2 = z\delta_2 + v_2 \tag{49}$$

$$y_3 = 1[z\delta_3 + v_3 > 0]. \tag{50}$$

Assume
(a) $(z, y_3)$ is always observed, $(y_1, y_2)$ observed when $y_3 = 1$;
(b) $E(u_1 \mid z, v_3) = \gamma_1 v_3$;
(c) $v_3 \mid z \sim \text{Normal}(0, 1)$;
(d) $E(z'v_2) = 0$ and, writing $z\delta_2 = z_1\delta_{21} + z_2\delta_{22}$, $\delta_{22} \neq 0$.

Then we can write

$$y_1 = z_1\delta_1 + \alpha_1 y_2 + g(z, y_3) + e_1, \tag{51}$$

where $e_1 = u_1 - g(z, y_3) = u_1 - E(u_1 \mid z, y_3)$.

Selection is exogenous in (51) because $E(e_1 \mid z, y_3) = 0$. Because $y_2$ is not exogenous, we estimate (51) by IV, using the selected sample, with IVs $(z, \lambda(z\delta_3))$ because $g(z, 1) = \gamma_1\lambda(z\delta_3)$, where $\lambda(\cdot)$ is the inverse Mills ratio. The two-step estimator is

(i) Probit of $y_3$ on $z$ (using all observations) to get $\hat\lambda_{i3} \equiv \lambda(z_i\hat\delta_3)$;
(ii) IV (2SLS if there are overidentifying restrictions) of $y_{i1}$ on $z_{i1}$, $y_{i2}$, $\hat\lambda_{i3}$ using IVs $(z_i, \hat\lambda_{i3})$.

If $y_2$ is always observed, it is tempting to obtain the fitted values $\hat y_{i2}$ from the reduced form of $y_{i2}$ on $z_i$, and then use OLS of $y_{i1}$ on $z_{i1}$, $\hat y_{i2}$, $\hat\lambda_{i3}$ in the second stage. But this effectively puts $\alpha_1 v_2$ in the error term, so we would need $u_1 + \alpha_1 v_2$ to be normally distributed (or something similar). This rules out discrete $y_2$. The procedure just outlined uses the linear projection

$$y_2 = z\pi_2 + \gamma_2\lambda(z\delta_3) + r_3$$

in the selected population, and does not care whether this is a conditional expectation.

Should have at least two elements in $z$ not in $z_1$: one to exogenously vary $y_2$, one to exogenously vary selection, $y_3$.

If an explanatory variable is not always observed, ideally we can find an IV for it and treat it as endogenous even if it is exogenous in the population. Generally, the usual Heckman approach (like IPW and imputation) is hard to justify in the model $E(y \mid x) = E(y \mid x_1)$ if $x_1$ is not always observed. The first step would be estimation of $P(s = 1 \mid x_2)$, where $x_2$ is always observed. But then we would be assuming $P(s = 1 \mid x) = P(s = 1 \mid x_2)$, effectively an exclusion restriction on a reduced form.

The lecture notes discuss linear unobserved effects models with endogenous explanatory variables and attrition: modify the cross-sectional IV procedure.
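The correction term $g(z, 1) = \gamma_1\lambda(z\delta_3)$ rests on the fact that, for $v_3 \sim \text{Normal}(0, 1)$, $E(v_3 \mid v_3 > -a) = \lambda(a) = \phi(a)/\Phi(a)$. The sketch below computes the inverse Mills ratio with the standard library and checks that identity by simulation. It covers only this building block, not the probit or IV steps, and the function names and the index value 0.5 are assumptions for illustration.

```python
import math
import random

def normal_pdf(a):
    """Standard normal density phi(a)."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def normal_cdf(a):
    """Standard normal distribution function Phi(a), via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def inverse_mills(a):
    """lambda(a) = phi(a) / Phi(a)."""
    return normal_pdf(a) / normal_cdf(a)

def truncated_mean(index, n=20000, seed=4):
    """Simulated E(v3 | index + v3 > 0) for v3 ~ Normal(0, 1)."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    kept = [v for v in draws if index + v > 0]   # the selected subsample, y3 = 1
    return sum(kept) / len(kept)

a = 0.5  # plays the role of the probit index z * delta_3 for one unit
print(inverse_mills(a), truncated_mean(a))
```

In the two-step procedure, `inverse_mills` would be evaluated at the fitted probit index for each selected observation to form $\hat\lambda_{i3}$.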