Economics 536 Lecture 21 Counts, Tobit, Sample Selection, and Truncation

Similar documents
Lecture 12: Application of Maximum Likelihood Estimation:Truncation, Censoring, and Corner Solutions

Non-linear panel data modeling

Lecture 11/12. Roy Model, MTE, Structural Estimation

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Truncation and Censoring

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit

Econometrics II. Seppo Pynnönen. Spring Department of Mathematics and Statistics, University of Vaasa, Finland

Syllabus. By Joan Llull. Microeconometrics. IDEA PhD Program. Fall Chapter 1: Introduction and a Brief Review of Relevant Tools

Econometric Analysis of Cross Section and Panel Data

Poisson Regression. Ryan Godwin. ECON University of Manitoba

What s New in Econometrics? Lecture 14 Quantile Methods

Tobit and Selection Models

For right censored data with Y i = T i C i and censoring indicator, δ i = I(T i < C i ), arising from such a parametric model we have the likelihood,

Treatment Effects with Normal Disturbances in sampleselection Package

AGEC 661 Note Fourteen

Microeconometrics. C. Hsiao (2014), Analysis of Panel Data, 3rd edition. Cambridge, University Press.

Applied Econometrics Lecture 1

Panel Data Seminar. Discrete Response Models. Crest-Insee. 11 April 2008

Informational Content in Static and Dynamic Discrete Response Panel Data Models

Applied Health Economics (for B.Sc.)

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

New Developments in Econometrics Lecture 16: Quantile Estimation

Quantile Regression for Panel/Longitudinal Data

ECON 594: Lecture #6

Parametric Identification of Multiplicative Exponential Heteroskedasticity

Maximum Likelihood Estimation

Lecture 22 Survival Analysis: An Introduction

Generalized Linear Models

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Gibbs Sampling in Latent Variable Models #1

Working Paper No Maximum score type estimators

UNIVERSITY OF CALIFORNIA Spring Economics 241A Econometrics

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

Quantile regression and heteroskedasticity

covariance between any two observations

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Estimation of Censored Quantile Regression for Panel Data with Fixed Effects

Potential Outcomes Model (POM)

Discrete Dependent Variable Models

Economics 508 Lecture 22 Duration Models

Parametric identification of multiplicative exponential heteroskedasticity ALYSSA CARLSON

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Economics 671: Applied Econometrics Department of Economics, Finance and Legal Studies University of Alabama

Econometrics Master in Business and Quantitative Methods

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

11. Bootstrap Methods

WISE International Masters

Binary choice 3.3 Maximum likelihood estimation

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

Lecture 9: Quantile Methods 2

General Regression Model

Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods. MIT , Fall Due: Wednesday, 07 November 2007, 5:00 PM

A Guide to Modern Econometric:

Modeling Binary Outcomes: Logit and Probit Models

Quasi-Maximum Likelihood: Applications

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

Fall, 2007 Nonlinear Econometrics. Theory: Consistency for Extremum Estimators. Modeling: Probit, Logit, and Other Links.

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

CALCULATION METHOD FOR NONLINEAR DYNAMIC LEAST-ABSOLUTE DEVIATIONS ESTIMATOR

Lecture 11 Roy model, MTE, PRTE

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

SOME NOTES ON HOTELLING TUBES

Outline. Overview of Issues. Spatial Regression. Luc Anselin

Limited Dependent Variables and Panel Data

the error term could vary over the observations, in ways that are related

Selection on Observables: Propensity Score Matching.

Lecture 8. Roy Model, IV with essential heterogeneity, MTE

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Lecture 14 More on structural estimation

Binary Models with Endogenous Explanatory Variables

Robustness of Logit Analysis: Unobserved Heterogeneity and Misspecified Disturbances

Estimation of Censored Quantile Regression for Panel Data with Fixed Effects

MLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22

Dynamic Models Part 1

leebounds: Lee s (2009) treatment effects bounds for non-random sample selection for Stata

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015

Gibbs Sampling in Endogenous Variables Models

AFT Models and Empirical Likelihood

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

The regression model with one stochastic regressor (part II)

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

Functional Form. Econometrics. ADEi.

Generalized Linear Models for Non-Normal Data

Linear Regression. Junhui Qian. October 27, 2014

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Lecture 6: Discrete Choice: Qualitative Response

Part 6: Multivariate Normal and Linear Models

Lecture Notes on Measurement Error

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria

A dynamic model for binary panel data with unobserved heterogeneity admitting a n-consistent conditional estimator

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

MC3: Econometric Theory and Methods. Course Notes 4

Robust covariance estimation for quantile regression

Exam D0M61A Advanced econometrics

Testing and Model Selection

Maximum Likelihood and. Limited Dependent Variable Models

What s New in Econometrics. Lecture 1

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Transcription:

University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 21 Counts, Tobit, Sample Selection, and Truncation The simplest of this general class of models is Tobin s (1958) model for durable demand y i = x i β + u i y i = max{y i, 0} u i iid F That is, we have a propensity, a latent variable, which describes demand for something when y i > 0 we act on it otherwise we do nothing. This model is the simplest form of the Censored regression model. The first question we should address is Why not estimate by OLS? First, we must clarify: OLS on what? Let s consider OLS on just the y i > 0 observations. Recall that OLS tries to estimate the conditional mean function for y so let s try to compute this in our case: so by the Appendix A when u i iid N (0, σ 2 ). Thus y i = x i β + u i E(y i y i > 0) = x i β + E(u i y i > 0) = x i β + E(u i > x i β) = x i β + σφ(x i β/σ) Φ(x i β/σ) E ˆβ = (X X) 1 X Ey = β + σ(x X) 1 X λ where λ = (φ i /Φ i ). Note that all the mass corresponding to y < 0 piles up at y = 0. So we get a nonlinear conditional expectation function. 1. The Heckman 2-step Estimator This suggests that if we could somehow estimate β/σ = γ we might be able to correct for the bias introduced by omitting the zero observations. How to estimate γ? The tobit model as expressed above is just the probit model we have already considered except that in the previous case σ 1, but note here we can divide through by σ in the first equation without changing anything. Then it is clear that we are estimating γ = β/σ by the usual probit estimator. So Heckman(1979) proposes: (1.) Estimate binary choice model by probit. (2.) Construct ˆλ i = φ(x i γ)/φ(x i ˆγ). (3.) Reestimate original model using only y i > 0 observations but including ˆλ i as additional explanatory variable the coefficient estimated on λ is σ. This approach is helpful because it clarifies what is going wrong in OLS estimation and how to correct it, but it is problematic in several other respects. In particular, it is difficult to construct s.e. s 1

2 Figure 1. Bias of OLS estimator in the Censored Regression Model: The figure illustrates the conditional expectation of the latent variable yi given x as the solid straight line in the figure. The conditional expectation of the observed response y i is given by the curved dotted line. And the least squares linear approximation of the conditional expectation of the observed response is given by the dashed line. Note that in this model the conditional median function of y i given x is the piecewise linear function max{a + bx, 0}, where E(yi x) = a + bx. for the estimates since the effect of the preliminary estimate of γ is non-negligible. It is also instructive to consider the mle in this problem. The likelihood is straightforward to write down: L(β, σ) = ( F x i β ) σ 1 f((y i x i β)/σ) σ i:y i >0 i:y i =0 for F = Φ we have = i:y i =0 ( ) x (1 Φ i β ) σ i:y i >0 σ 1 φ((y i x i β)/σ) It is useful to contrast this censored regression estimator with the truncated regression estimator with likelihood, n L(β, σ) = (Φ(x i β/σ)) 1 φ((y i x i β)/σ) i=1

2. POWELL S ESTIMATOR 3 2. Powell s estimator A critical failing of the Gaussian mle is that it can perform poorly in non-gaussian and/or heteroscedastic circumstances. If we go back to our picture we can see that the primary source of the difficulty we have been discussing is due to the wish to estimate conditional expectations. If, instead, we tried to estimate the condition median then we have ( ) med(y i x i ) = max{x i β, 0} so we can following Powell (1984) consistently estimate β by solving min y i max{x i β, 0} This works for any F as long as (*) holds, even if there is heteroscedasticity. This can be easily extended (Powell (1985) to quantile regression in general. An interesting question is what quantiles offer optimal efficiency in estimating β. The computational difficulty of this approach is substantially greater than conventional quantile regression due to the fact that the objective function is no longer convex. This problem has been addressed by several authors, notably Fitzenberger (1997). Chernozhukov and Hong (2000) have suggested an alternative simpler approach that employs a multi-step procedure that has the advantage that it only requires solution to linear-in-parameters quantile regression problems. Portnoy (2003) has introduced a new method for fitting randomly censored quantile regression models. Models for Count Data Often we have data that is integer valued and nonnegative: number of doctors visits per year, or number of patents awarded per year, etc. A natural first choice for such data is the Poisson model. P (Y i = k) = f(k) = e λ λ k /k! k = 0, 1, 2,... Given covariates, we would like to formulate a regression type model, so we may take λ = e x i β The log likelihood for the parameter β given a sample (x i, y i : i = 1,..., n). is, l(β) = y i x i β exp(x i β) log(y i!) and maximizing leads us to the normal equations, (yi exp(x i β))x i = 0 Again, we have a system of nonlinear equations to solve in β, but the equations are rather benign, and can be again solved by iteratively reweighted least squares methods as with probit and logit. In a now classic (i.e. forgotten) study of meta-econometrics Koenker (1988) estimated a model of this type purporting to explain the number of parameters k i, as a function of the sample size n i of the study, estimated in published studies of wages in the labor economics literature. Two of the estimated models were: (1) (2) log λ i = 1.34 (.14) log λ i = 0.78 (.31) +.235 (.017) + 1.947 (.148) log n i log(log n i )

4 y 1.0 0.5 0.0 0.5 1.0 1.5 2.0 2.5 Censored QR Naive QR Omniscient QR 0.0 0.5 1.0 1.5 2.0 x Figure 2. Powell estimator: The figure illustrates three families of estimated quintile lines to the points appearing in the plot. In accordance with the classical tobit model, the points with y less than zero are censored so they are recorded as zero. The dashed lines illustrate the naive QR fit that ignores the fact of the censoring, the grey lines represent the omniscient version of the fitting obtained if we know the latent values of the censored observations, and the solid black lines are the estimated Powell quintiles. Note that the Powell estimates are quite close to omniscient estimates as we might hope, except when the estimated line bends to reflect the censoring at zero. This plot is essentially what is obtained by running example(rq.fit.cen) in the quantreg package of R. The algorithm of Fitzenberger is employed for the Powell fitting. Mean sample size for the sample of wage equations was about n = 1000. If one were mainly interested in the effect, π = log Ek log n.

2. POWELL S ESTIMATOR 5 which we might call the parsity, or elasticity of parsimony for lack of better terms, how different are the two models? Recall that expectation of a Poisson random variable with rate λ is simply λ. Contrast the implications of the two models with respect to the behavior of E(k i n i ) as n i. The Poisson model has the feature that it assumes that the variance of a Poisson variable is the same as its mean. This may be sensible in some applications, but uncritical acceptance of this hypothesis sometimes leads to very unreliable inference. Classical MLE theory suggests that the variance covariance of ˆβ n solving the normal equations is V MLE = ( ˆλi x i x i ) 1 adhering to this assumption, but it is safer to admit that the the variance may depend on the mean in some more general way. When the Poisson assumption fails a general form for the covaraince matrix of ˆβ n is, V MLE = ( ˆλi x i x i ) 1 ( ˆν i x i x i )( ˆλi x i x i ) 1 where ˆν i is an estimate of V (y i x i ). There is a good discussion of various strategies for estimating this conditional variance in the monograph of Cameron and Trivedi (1998). Simple Heckman Sample Selection Model Now, we will extend the tobit model to a somewhat more general setup which is usually associated with a labor supply model of Gronau. Gonsider two latent variable equations, and assume that we observe y 1 = y 1 = x 1 β 1 + u 1 y 2 = x 2 β 2 + u 2 1 if y1 > 0 y 2 = 0 y1 0 y 2 y 1 = 1 if 0 y 1 = 0 where in the labor supply model y 1 may be interpreted as the decision to enter the labor force and y 2 is the number of hours worked. Then, E(y 2 x 2, y 1 = 1) = x 2 β 2 + E(u 2 u 1 > x 1 β 1 ) but by Appendix B u 2 u 1 N ( σ 12 u σ1 2 1, σ2 2 σ2 12 σ 2 1 ) so Recall from Tobit case E(y 2 x 2, y 1 = 1) = x 2 β 2 + E( σ 12 σ1 2 u 1 u 1 > x 1β 1 ) E(u 1 u 1 > x 1 β 1 ) = σ 1φ(x 1 β 1/σ 1 ) Φ(x 1 β = σ 1 λ 1/σ 1 ) so E(y 2 x 2, y 2 = 1) = x 2 β 2 + σ 12 λ(x 1 β 1 /σ 1 ) σ 1 which may now be estimated by Heckman 2-step as follows. (1.) Probit of y 1 on x 1 to get ˆγ if β 1 /σ 1. (2.) Construct ˆλ and regress y 2 on [X 2.ˆλ].

6 (3.) Test for Sample Selection bias using σ 12 /σ 1 estimate. Or, this could be estimated via mle methods. Increasingly, researchers have grown dissatisfied with the Heckman latent variable model recognizing that under misspecification of either the normality assumption or due to various forms of heterogeneity large biases may ensue. Manski (1989) offers a radical reappraisal of the problem. He begins with the observation that we can write, (1) P (y x) = P (y x, z = 1)P (z = 1 x) + P (y x, z = 0)P (z = 0 x) when z denotes the binary selection variable. We would like to know P (y x), but since P (y x, z = 0) is unobserved we don t know for example what wages are like for the unemployed there is a fundamental identification problem. This can be addressed in various parametric ways. The simplest of these is to assume selection away. It turns out to be particularly difficult to identify mean response given general assumptions for (1). In contrast quantiles of y are somewhat more tractable. Let and define Q y (τ x) = Q y (τ x) = Then, Manski shows that ˆQ y (τ x) = inf {ξ P (y ξ x) τ} { Qy (1 (1 τ)/p (z = 1 x) x, z = 1) if P (z = 1 x) >= 1 τ otherwise { Qy (τ/p (τ/p (z = 1 x) x, z = 1) if P (z = 1 x) >= τ otherwise Q y = (τ x) Q y (τ x) Q y (τ x) The upper and lower bounds are increasing in τ. As along as P (z = 1 x) < min{tau, 1 τ} the bounds are informative about the τ th quantile. The implication of this, of course, is that we can t bound quantiles in the tails and therefore we can t bound the mean effect. A truly semi-parametric approach to sample selection has been an elusive quest in econometrics for quite some time. Recent unpublished work by Arellano and Bonhomme offers hope of resolving this in a positive way. Stay tuned. References Fitzenberger, B. (1997) A Guide to Censored Quantile Regressions, C.R. Rao and G.S. Maddala (eds.),handbook of Statistics, North-Holland: New York, Cameron, C. and P. Trivedi (1998) Regression Analysis for Count Data, Cambridge U. Press. Chernozhukov, V. and H. Hong, (2000) 3-step Censored Quantile Regression, JASA, 97,872 897. Manski, C.F. (1989) Anatomy of the selection problem, J. of Human Resources, 21, 343-360. Manski, C.F. (1993) The selection problem in econometrics and Statistics, Handbook of Statistics, 11, 73-83. Tobin, J. (1958) Estimation of Relationships for limited dependent variables, Econometrica, 26, 24-36. Heckman, J. (1979) Sample Selection as a Specification Error, Econometrica, 47, 153-62. Portnoy, S. (2003). Censored Regression Quantiles, J. Am. Stat. Assoc, 98, 1001-1012.

2. POWELL S ESTIMATOR 7 Powell, J.L. (1984). Least absolute deviations estimation for the censored regression model, J. Econometrics, 25, 303-325. Powell, J.L. (1986). Censored regression quantiles, J. Econometrics, 32, 143-155. APPENDIX A: Some Notes on Conditional Expectations for the Tobit Model If Z has dff with density f, then the conditional density of Z given Z > c is Note c f c (z) = f(z)/(1 F (c)) f c (z)dz = (1 F (c)) 1 f(z)dz = 1 as expected. The condition expectation of Z given Z > c is E(Z Z > c) = zf c (z)dz = (1 F (c)) 1 zf(z)dz. For F standard Gaussian we have zf(z) = zφ(z) = φ (z) so, E(Z Z > c) = (1 Φ(c)) 1 ( = φ(c)/(1 Φ(c)). c c c φ (z)dz) Finally, consider Y = σz so Y N (0, σ 2 ). E(Y Y > c) = E(σZ σz > c) = σe(z Z > c/σ) = σφ(c/σ)/(1 Φ(c/σ)). APPENDIX B Conditional Normality Theorem: Let Y be p-variate normal N (µ, Ω) with sub-vectors Y 1 and Y 2 having EY i = µ i, and Cov(Y i, Y j ) = Ω ij. Assume Ω 11 and Ω 22 are nonsingular. Then the conditional distribution of Y 2 given Y 1 is N (µ 2 + Ω 21 Ω 1 11 (Y 1 µ 1 ), Ω 22 Ω 21 Ω 1 11 Ω 12). Proof: (This is a simplified version of Rao, Linear Stat Inference, 1973, p. 523. Rao relaxes the nonsingularity condition.) Consider Cov[Y 2 µ 2 Ω 21 Ω 1 11 (Y 1 µ 1 ), Y 1 µ 1 ] = Ω 21 Ω 21 Ω 1 11 Ω 11 = 0 ( ) Similarly, let U = Y 2 µ 2 Ω 21 Ω 1 11 (Y 1 µ 1 ) clearly EU = 0 and V (U) = V [Y 2 Ω 21 Ω 1 11 Y 1] = Ω 22 + Ω 21 Ω 1 11 Ω 12 Cov(Y 2, Ω 21 Ω 1 11 Y 1) Cov(Ω 21 Ω 1 11 Y 1, Y 2 )

8 = Ω 22 Ω 21 Ω 1 11 Ω 12 Since U is a linear function of normal r.v. s it is normal, and therefore, U N (0, Ω 22 Ω 21 Ω 1 11 Ω 12) (+) Further, ( ) establishes that U and Y 1 µ 1 are independent, hence (+) may be interpreted as the conditional distribution of U given Y 1, which is equivalent to what we wished to prove.