Munich Lecture Series 2 Non-linear panel data models: Binary response and ordered choice models and bias-corrected fixed effects models


Stefanie Schurer (stefanie.schurer@rmit.edu.au), RMIT University, School of Economics, Finance, and Marketing. January 29, 2014.

Overview
1. A brief review of binary response models and the maximum likelihood principle.
2. Binary response models with unobserved heterogeneity:
   2.a. Random effects approaches;
   2.b. Chamberlain's (1980) conditional fixed effects logit model.
3. Extension to ordered choice data (Ferrer-i-Carbonell and Frijters, 2004; Jones and Schurer, 2011).
4. Bias-corrected fixed effects models: very hard! (For a good review, see Arellano and Hahn, 2007.) Application: Carro and Traferri (2013).

References for Lecture 2
1. Greene, W.H. (2011). Econometric Analysis. Pearson Education Limited. pp. 756-771.
2. Hsiao, C. (2003). Analysis of Panel Data. Econometric Society Monographs. Cambridge University Press: New York. pp. 188-202.
3. Verbeek, M. (2000). A Guide to Modern Econometrics. Wiley. pp. 151-160; 177-182; 336-340.
4. Jones, A.M., Schurer, S. (2011). How does heterogeneity shape the socioeconomic gradient in health satisfaction? Journal of Applied Econometrics 26(4): 549-579.
5. Carro, J., Traferri, A. (2012). State dependence and heterogeneity in health using a bias-corrected fixed effects estimator. Journal of Applied Econometrics. In print.
6. Arellano, M., Hahn, J. (2007). Understanding bias in nonlinear panel models: some recent developments. In: Blundell, R., Newey, W., Persson, T. (eds.), Advances in Economics and Econometrics, Theory and Applications, Ninth World Congress, Vol. 3. CUP, UK: 381-409.

1. A brief review of binary response models and the maximum likelihood principle

Review of binary response models
Many applications in health economics focus on an outcome variable $Y$ that takes a discrete range of values representing different states. For example:
- HEALTH PRODUCTION: being in good or excellent health. $Y_i = 1$ if individual $i$ (in a given time period) reports a health status of, e.g., 4 or above (if the Likert scale is increasing in good health and bounded between 1 and 5), $Y_i = 0$ otherwise;
- HEALTH BEHAVIOR: smoking or exercising. $Y_i = 1$ if individual $i$ is a smoker/is exercising, $Y_i = 0$ otherwise;
- HEALTH CARE DEMAND: visiting a family physician or specialist. $Y_i = 1$ if individual $i$ consulted a doctor, $Y_i = 0$ otherwise.
A natural extension of binary response models is the class of ordered choice models, i.e. $Y_i \in \{0, 1, 2, \ldots, J\}$, where the outcomes have a natural ordering (outcome 0 ranks below outcome 1, and so on). These are more difficult to model.

Assume we observe a random sample of $N$ observations $(Y_i, X_i)$ from the population, where $Y_i \in \{0, 1\}$ is a binary response variable and $X_i$ is a vector of covariates. We want to model the relationship between $Y_i$ and $X_i$; in particular, we believe that $P(Y_i = 1)$ is some function of $X_i$ that we wish to parameterise. The usual way economists approach this is in terms of a latent variable $Y_i^*$ that is a linear function of the observed covariates $X_i$ plus an unobserved error term. This latent (continuous) variable is not observed; instead we observe a binary variable that takes the value 1 if $Y_i^* > 0$ and 0 otherwise.

Model set up
More formally:
$$Y_i^* = X_i'\beta + \varepsilon_i, \tag{1}$$
and
$$Y_i = 1(Y_i^* > 0), \tag{2}$$
where the threshold is implicitly built into the intercept term, and $1(\cdot)$ is an indicator function that takes the value 1 if the statement in parentheses is true and 0 if it is false. Combining Eqs. 1 and 2:
$$Y_i = 1(X_i'\beta + \varepsilon_i > 0). \tag{3}$$

Model set up
The form of Eq. 1 looks similar to the linear modelling approach. In particular, we still assume a linear relationship between the latent variable $Y_i^*$ and the regressors $X_i$. The only difference is that we do not observe $Y_i^*$ itself, only whether or not it is positive. This means that we can only meaningfully discuss the probability that $Y_i^*$ is positive conditional on the vector of covariates, i.e. $P(Y_i = 1 \mid X_i)$. From Eq. 3 we have:
$$P(Y_i = 1 \mid X_i) = P(\varepsilon_i > -X_i'\beta). \tag{4}$$
The probability associated with an observation is:
$$P(Y_i \mid X_i) = \int_{L_i}^{U_i} f(\varepsilon_i)\, d\varepsilon_i, \tag{5}$$
where $(L_i, U_i) = (-\infty, -X_i'\beta)$ if $Y_i = 0$ and $(L_i, U_i) = (-X_i'\beta, +\infty)$ if $Y_i = 1$.

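Maximum likelihood principle
Evaluating this probability, and hence making the model operational, requires distributional assumptions about the error term $\varepsilon_i$. Formally, suppose $\theta$ is a vector of parameters that specifies the model; $\theta$ includes $\beta$ and any other parameters that characterise the distribution of $\varepsilon_i$. The contribution of observation $i$ to the likelihood is
$$f_i(Y_i \mid X_i, \theta) = \left[P(Y_i = 1 \mid X_i, \theta)\right]^{Y_i} \left[1 - P(Y_i = 1 \mid X_i, \theta)\right]^{1 - Y_i}, \tag{6}$$
and we estimate $\theta$ by maximising the log-likelihood function:
$$L(\theta \mid Y, X) = \prod_{i=1}^{N} f_i(Y_i \mid X_i, \theta),$$
$$\log L(\theta \mid Y, X) = \sum_{i=1}^{N} \log f_i(Y_i \mid X_i, \theta) = \sum_{i=1}^{N} \left\{ Y_i \log P(Y_i = 1 \mid X_i, \theta) + (1 - Y_i) \log\left[1 - P(Y_i = 1 \mid X_i, \theta)\right] \right\}.$$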

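The maximum likelihood principle
Let $P(Y_i = 1 \mid X_i) = F(X_i'\beta)$ for some cumulative distribution function $F$, and let $\theta = \beta$. The first-order condition is:
$$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^{N} \frac{Y_i - F(X_i'\beta)}{F(X_i'\beta)\left[1 - F(X_i'\beta)\right]} F'(X_i'\beta)\, X_i = 0. \tag{7}$$
The second-order derivative is:
$$\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{N} \left[ \frac{Y_i}{F^2(X_i'\beta)} + \frac{1 - Y_i}{\left[1 - F(X_i'\beta)\right]^2} \right] \left[F'(X_i'\beta)\right]^2 X_i X_i' + \sum_{i=1}^{N} \left[ \frac{Y_i - F(X_i'\beta)}{F(X_i'\beta)\left[1 - F(X_i'\beta)\right]} \right] F''(X_i'\beta)\, X_i X_i'.$$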

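The maximum likelihood principle
If the log-likelihood function is concave, the Newton-Raphson method can be used to find the MLE of $\beta$. One issue is how to choose the initial values $\hat\beta^{(0)}$. The iteration is
$$\hat\beta^{(j)} = \hat\beta^{(j-1)} - \left( \frac{\partial^2 \log L}{\partial \beta\, \partial \beta'} \right)^{-1}_{\beta = \hat\beta^{(j-1)}} \left( \frac{\partial \log L}{\partial \beta} \right)_{\beta = \hat\beta^{(j-1)}}, \tag{8}$$
where $\hat\beta^{(j-1)}$ denotes the $(j-1)$th iterative solution.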
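The iteration in Eq. 8 can be sketched for a logit with an intercept and one regressor. This is a minimal illustration rather than the lecture's own code: the toy data, the zero starting values and the function names (`logistic`, `score`, `newton_logit`) are assumptions made for the example. For the logit, $F' = F(1-F)$, so the score reduces to $\sum_i (Y_i - F_i) X_i$ and the Hessian to $-\sum_i F_i (1 - F_i) X_i X_i'$.

```python
import math

def logistic(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def score(b0, b1, x, y):
    """Gradient of the logit log-likelihood w.r.t. (intercept, slope)."""
    g0 = sum(yi - logistic(b0 + b1 * xi) for xi, yi in zip(x, y))
    g1 = sum((yi - logistic(b0 + b1 * xi)) * xi for xi, yi in zip(x, y))
    return g0, g1

def newton_logit(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations for a two-parameter logit."""
    b0, b1 = 0.0, 0.0  # assumed starting values
    for _ in range(max_iter):
        g0, g1 = score(b0, b1, x, y)
        # Hessian entries: minus the sum of p(1-p) times outer products of (1, x)
        h00 = h01 = h11 = 0.0
        for xi in x:
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            h00 -= w
            h01 -= w * xi
            h11 -= w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = H^{-1} g, using the explicit 2x2 inverse
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 - s0, b1 - s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Hypothetical data; the outcomes overlap in x, so the MLE is finite.
x = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = newton_logit(x, y)
print(b0_hat, b1_hat)
```

At the returned values the score is numerically zero, which is the first-order condition in Eq. 7.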

The logit model
This model assumes that the probability function in Eq. 6 has the following form:
$$P(Y_i = 1 \mid X_i) = \frac{\exp(X_i'\beta)}{1 + \exp(X_i'\beta)}. \tag{9}$$
Alternatively, this expression can be derived by assuming either that:
- the errors are logistically distributed, $\varepsilon_i \sim \Lambda(0, \pi^2/3)$ (a bell-shaped distribution); or
- the log-odds are linear:
$$\log\left[ \frac{P(Y_i = 1 \mid X_i)}{1 - P(Y_i = 1 \mid X_i)} \right] = X_i'\beta. \tag{10}$$

The logit model: pros and cons
The response probability in Eq. 9 has a closed-form expression, so the log-likelihood and its derivatives are cheap to evaluate. The first-order condition itself still has to be solved numerically, e.g. by Newton-Raphson. The logit specification also has some useful properties that enable unobserved heterogeneity to be controlled for using panel data.

The probit model
Assumes that the errors are normally distributed, i.e. $\varepsilon_i \sim N(0, \sigma^2)$, giving:
$$P(Y_i = 1 \mid X_i) = \Phi(X_i'\beta / \sigma), \tag{11}$$
where $\Phi$ is the standard normal cumulative distribution function. As parameterised, $\beta$ and $\sigma$ are not separately identified because the outcome variable has no scale. For this reason we normalise $\sigma = 1$. One of the practical problems with the probit model is that $\Phi$ in Eq. 11 has no closed-form expression, so the log-likelihood and its first-order condition have to be evaluated numerically. Hence, estimating the probit model is computationally more demanding.
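The lack of separate identification of $\beta$ and $\sigma$ can be checked numerically. The sketch below is an illustration under assumed values: the covariate vector, the coefficients and the helper names (`std_normal_cdf`, `probit_prob`) are hypothetical. It evaluates Eq. 11 with $\Phi$ written in terms of the error function, and shows that scaling $\beta$ and $\sigma$ by the same constant leaves the response probability unchanged.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_prob(x, beta, sigma=1.0):
    """P(Y = 1 | x) = Phi(x'beta / sigma) as in Eq. 11."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return std_normal_cdf(index / sigma)

x_i = [1.0, 0.5]                                   # hypothetical covariates (constant, regressor)
p_a = probit_prob(x_i, [0.2, 0.8], sigma=1.0)      # (beta, sigma)
p_b = probit_prob(x_i, [0.6, 2.4], sigma=3.0)      # (3 * beta, 3 * sigma)
print(p_a, p_b)  # the two probabilities coincide
```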

2. Binary response models with unobserved heterogeneity

Binary choice models with unobserved heterogeneity
We begin with a simple panel data extension of the cross-sectional model in Eq. 3 to allow for fixed unobserved heterogeneity:
$$Y_{it} = 1(X_{it}'\beta + \varepsilon_{it} > 0), \tag{12}$$
for $i = 1, \ldots, N$ and $t = 1, \ldots, T$. We assume that $\varepsilon_{it}$ is iid.

Binary choice models with unobserved heterogeneity
There are two generic approaches:
1. Random effects estimation: possible only under very strong assumptions about the unobserved heterogeneity, unless one imposes some relationship between the unobserved heterogeneity and the regressors of the model (remember the Mundlak/Chamberlain solution);
2. Fixed effects estimation: suffers from a phenomenon called the incidental parameter problem, so only a restricted model can be estimated.

2.a. Random effects approaches

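Random effects models
This model specifies the error term as:
$$\varepsilon_{it} = \alpha_i + u_{it}, \tag{13}$$
where $\alpha_i$ and $u_{it}$ are random variables with:
- $E[u_{it} \mid X] = 0$; $\mathrm{Cov}[u_{it}, u_{js} \mid X] = \mathrm{Var}[u_{it} \mid X] = \sigma_u^2$ if $i = j$ and $t = s$, and 0 otherwise;
- $E[\alpha_i \mid X] = 0$; $\mathrm{Cov}[\alpha_i, \alpha_j \mid X] = \mathrm{Var}[\alpha_i \mid X] = \sigma_\alpha^2$ if $i = j$, and 0 otherwise;
- $\mathrm{Cov}[u_{it}, \alpha_j \mid X] = 0$ for all $i, t, j$.
Here $X$ captures all exogenous variables.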

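Random effects models
From these assumptions, and with the normalisation $\sigma_u^2 = 1$, it follows that:
$$E[\varepsilon_{it} \mid X] = 0,$$
$$\mathrm{Var}[\varepsilon_{it} \mid X] = \sigma_u^2 + \sigma_\alpha^2 = 1 + \sigma_\alpha^2,$$
$$\mathrm{Corr}[\varepsilon_{it}, \varepsilon_{is} \mid X] = \rho = \frac{\sigma_\alpha^2}{1 + \sigma_\alpha^2} \quad \text{for } t \neq s.$$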

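Random effects models
The contribution of each individual $i$ to the likelihood is the joint probability for all $T$ observations (read Greene 2012, p. 759):
$$L_i = P(Y_{i1}, \ldots, Y_{iT} \mid X) = \int_{L_{iT}}^{U_{iT}} \cdots \int_{L_{i1}}^{U_{i1}} f(\varepsilon_{i1}, \ldots, \varepsilon_{iT})\, d\varepsilon_{i1} \cdots d\varepsilon_{iT}. \tag{14}$$
We can obtain the joint density of the $\varepsilon_{it}$'s by integrating $\alpha_i$ out of the joint density of $(\varepsilon_{i1}, \ldots, \varepsilon_{iT}, \alpha_i)$, i.e. from
$$f(\varepsilon_{i1}, \ldots, \varepsilon_{iT}, \alpha_i) = f(\varepsilon_{i1}, \ldots, \varepsilon_{iT} \mid \alpha_i)\, f(\alpha_i) \tag{15}$$
we get
$$f(\varepsilon_{i1}, \ldots, \varepsilon_{iT}) = \int_{-\infty}^{+\infty} f(\varepsilon_{i1}, \ldots, \varepsilon_{iT} \mid \alpha_i)\, f(\alpha_i)\, d\alpha_i. \tag{16}$$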

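Random effects models
Using Eq. 16, changing the order of integration, and noting that the $\varepsilon_{it}$'s are independent conditional on $\alpha_i$, we get:
$$L_i = P[Y_{i1}, \ldots, Y_{iT} \mid X] = \int_{-\infty}^{+\infty} \left[ \prod_{t=1}^{T} \int_{L_{it}}^{U_{it}} f(\varepsilon_{it} \mid \alpha_i)\, d\varepsilon_{it} \right] f(\alpha_i)\, d\alpha_i. \tag{17}$$
More generally:
$$L_i = P[Y_{i1}, \ldots, Y_{iT} \mid X] = \int_{-\infty}^{+\infty} \left[ \prod_{t=1}^{T} \mathrm{Prob}(Y_{it} = y_{it} \mid X_{it}'\beta + \alpha_i) \right] f(\alpha_i)\, d\alpha_i. \tag{18}$$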

Random effects models
The inner probability can be probit or logit (or any other you can think of). We can do the outer integration with Butler and Moffitt's method, assuming that the random effect $\alpha_i$ is normally distributed. Their method uses Gauss-Hermite quadrature to approximate the integral (please read p. 622 in Greene (2012) for more detail on this method). This approach is often criticised for the assumption of equal correlation across time periods, but it can be estimated efficiently even with large $T$.

Random effects models
Alternatively, one can use simulated maximum likelihood, which is based on an expectation and is more flexible:
$$L_i = E_{\alpha_i}\left[ \prod_{t=1}^{T} \mathrm{Prob}(Y_{it} = y_{it} \mid X_{it}'\beta + \alpha_i) \right]. \tag{19}$$
This expectation can be approximated by simulation. We won't get into the details, but a sample of person-specific draws from the distribution of $\alpha_i$ can be generated with a random number generator.
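The expectation in Eq. 19 can be approximated by averaging the product of period probabilities over random draws of $\alpha_i$. The following is a minimal sketch under stated assumptions, not a production estimator: it assumes a logit inner probability and $\alpha_i \sim N(0, \sigma_\alpha^2)$, and the function name `simulated_likelihood_i`, the number of draws and the example values are hypothetical.

```python
import math
import random

def logistic(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def simulated_likelihood_i(y, xb, sigma_alpha, n_draws=2000, seed=0):
    """Monte Carlo approximation of L_i in Eq. 19 for one individual.

    y  : list of 0/1 outcomes for t = 1..T
    xb : list of index values X_it'beta for t = 1..T
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        alpha = rng.gauss(0.0, sigma_alpha)  # assumed normal random effect
        prod = 1.0
        for yt, xbt in zip(y, xb):
            p = logistic(xbt + alpha)
            prod *= p if yt == 1 else 1.0 - p
        total += prod
    return total / n_draws

y_i = [1, 0, 1]           # hypothetical outcomes
xb_i = [0.3, -0.2, 1.0]   # hypothetical index values X_it'beta
print(simulated_likelihood_i(y_i, xb_i, sigma_alpha=1.0))
```

With `sigma_alpha = 0` every draw equals zero and the function returns the pooled logit likelihood contribution, which is a convenient sanity check.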

Random effects models
One advantage of random effects models is that one can construct average partial effects in the presence of unobserved heterogeneity; see Wooldridge (2009) for an extensive discussion. The drawback is that the assumption of independence between the unobserved heterogeneity and the regressors of the model is difficult to justify. One can use the Mundlak approach to impose some structure on the relationship between $\alpha_i$ and $X_{it}$.

2.b. Chamberlain's (1980) conditional fixed effects logit model

Fixed effects models
Assume the following model:
$$Y_{it}^* = \alpha_i d_{it} + X_{it}'\beta + u_{it}, \tag{20}$$
where $Y_{it} = 1$ if $Y_{it}^* > 0$, and 0 otherwise. Here $d_{it}$ is a dummy variable that takes the value 1 for individual $i$ and 0 otherwise. $X_{it}$ does not contain a constant. Hence, there are $K$ regressors and $N$ individual constant terms. The log-likelihood function is:
$$\ln L = \sum_{i=1}^{N} \sum_{t=1}^{T} \ln P(Y_{it} \mid \alpha_i + X_{it}'\beta), \tag{21}$$
where $P(\cdot)$ is the probability of the observed outcome, e.g. $\Phi(q_{it}(\alpha_i + X_{it}'\beta))$ for the probit model or $\Lambda(q_{it}(\alpha_i + X_{it}'\beta))$ for the logit model, with $q_{it} = 2Y_{it} - 1$.

Fixed effects models
In the linear regression model, we could use deviations from the mean to get rid of the individual-specific heterogeneity. This is no longer possible in the nonlinear case (except for some special cases). Instead, one would need to estimate the sometimes large number of constant terms. The problem with estimating them is that the estimator relies on $T_i$ increasing for the constant terms to be consistent. Usually the $T_i$ are small, and thus the estimates of $\alpha_i$ are not consistent (they don't converge at all if $T_i$ is fixed).

Fixed effects models
Since the estimator of $\beta$ is a function of the estimates of $\alpha_i$, the MLE of $\beta$ is not consistent either. This is the famous incidental parameter problem (read Lancaster (2000), which will be on the Blackboard). There is also a small-sample bias in $\hat\beta$:
- Hsiao (2000) found that the bias for $T_i = 2$ can be 100% (check Hsiao 2003, pp. 194-195, for the case of $T = 2$ and one regressor with values $X_{i1} = 0$ and $X_{i2} = 1$);
- Heckman and MaCurdy (1980) estimate that the bias is of the order of 10% for $N = 100$ and $T = 8$.

The conditional fixed effects estimator
Chamberlain's (1980) conditional fixed effects estimator relies on the notion of a sufficient statistic $\bar{Y}_i$ for $\alpha_i$. Sufficiency means that, once we condition on $\bar{Y}_i$, the distribution of the data no longer depends on $\alpha_i$:
$$f(Y_{it} \mid X_{it}, \bar{Y}_i, \alpha_i) = f(Y_{it} \mid X_{it}, \bar{Y}_i). \tag{22}$$
Such a sufficient statistic has been shown to be available for some distributions (e.g. logit), but not for the probit. The fixed effects binary logit model is:
$$\mathrm{Prob}(Y_{it} = 1 \mid X_{it}) = \frac{\exp(\alpha_i + X_{it}'\beta)}{1 + \exp(\alpha_i + X_{it}'\beta)}. \tag{23}$$

The conditional fixed effects estimator
The unconditional likelihood for the $NT$ observations is:
$$L = \prod_{i=1}^{N} \prod_{t=1}^{T} F_{it}^{Y_{it}} (1 - F_{it})^{1 - Y_{it}}. \tag{24}$$
Chamberlain (1980) used a result by Anderson (1970) to show that the conditional likelihood is independent of the incidental parameters $\alpha_i$:
$$L^c = \prod_{i=1}^{N} \mathrm{Prob}\left( Y_{i1} = y_{i1}, Y_{i2} = y_{i2}, \ldots, Y_{iT} = y_{iT} \,\Big|\, \sum_{t=1}^{T} y_{it} \right). \tag{25}$$

The conditional fixed effects estimator
The joint likelihood for each set of $T_i$ observations, conditioned on the number of ones in the set, is:
$$\mathrm{Prob}\left( Y_{i1} = y_{i1}, \ldots, Y_{iT_i} = y_{iT_i} \,\Big|\, \sum_{t=1}^{T_i} y_{it}, X_i \right) = \frac{\exp\left( \sum_{t=1}^{T_i} y_{it} X_{it}'\beta \right)}{\sum_{\{d:\, \sum_t d_{it} = S_i\}} \exp\left( \sum_{t=1}^{T_i} d_{it} X_{it}'\beta \right)}.$$
The function in the denominator is summed over the set of all $\binom{T_i}{S_i}$ different sequences of $T_i$ zeros and ones that have the same sum as the observed sequence, $S_i = \sum_{t=1}^{T_i} Y_{it}$.
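The conditional probability above can be computed by enumerating all sequences with the same number of ones. The sketch below is illustrative only: the function names and the example index values are assumptions. It also includes a brute-force version that keeps $\alpha_i$ in the logit probabilities, to show numerically that $\alpha_i$ cancels once we condition on the sum.

```python
import math
from itertools import combinations

def logistic(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def conditional_prob(y, xb):
    """P(y_i1..y_iT | sum_t y_it) from the conditional logit formula (no alpha_i)."""
    s = sum(y)
    numerator = math.exp(sum(yt * v for yt, v in zip(y, xb)))
    denominator = sum(
        math.exp(sum(xb[t] for t in ones))           # one term per sequence d
        for ones in combinations(range(len(y)), s)   # positions of the ones in d
    )
    return numerator / denominator

def sequence_prob(d, xb, alpha):
    """Unconditional logit probability of the sequence d given alpha_i."""
    p = 1.0
    for dt, v in zip(d, xb):
        lam = logistic(alpha + v)
        p *= lam if dt == 1 else 1.0 - lam
    return p

def brute_force_conditional(y, xb, alpha):
    """Same conditional probability, computed from probabilities that include alpha_i."""
    T, s = len(y), sum(y)
    total = 0.0
    for ones in combinations(range(T), s):
        d = [1 if t in ones else 0 for t in range(T)]
        total += sequence_prob(d, xb, alpha)
    return sequence_prob(y, xb, alpha) / total

y_i = [1, 0, 1, 0]             # hypothetical outcome sequence
xb_i = [0.4, -0.3, 1.1, 0.2]   # hypothetical index values X_it'beta
for alpha in (-1.5, 0.0, 2.0):
    print(alpha, brute_force_conditional(y_i, xb_i, alpha), conditional_prob(y_i, xb_i))
```

Individuals with all zeros or all ones get a conditional probability of 1, which is why they contribute nothing to the estimation of $\beta$.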

The conditional fixed effects estimator
Only individuals for whom the dependent variable changes at least once between 0 and 1 can be used (in more technical terms, $0 < \sum_{t=1}^{T_i} Y_{it} < T_i$).
Consider the case of $T = 2$, so that $\sum_{s=1}^{2} Y_{is} \in \{0, 1, 2\}$.
- What if $\sum_s Y_{is} = 0$ or 2? Then $Y_{i1}$ and $Y_{i2}$ are both determined (they are either both 0 or both 1), and both observations are uninformative about $\beta$: they drop out of the likelihood. The conditional probability is 1, because then $\hat\alpha_i = \pm\infty$ (see Hsiao (2003), p. 194). Note that, in Hsiao's example with $X_{i1} = 0$ and $X_{i2} = 1$, $\hat\alpha_i = -\beta/2$ if $Y_{i1} + Y_{i2} = 1$.
- What if $\sum_s Y_{is} = 1$? Then either $(Y_{i1}, Y_{i2}) = (1, 0)$ or $(Y_{i1}, Y_{i2}) = (0, 1)$. See the next slide for the derivations.

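The conditional fixed effects estimator
Let $Y_{i1} = 1$ and $Y_{i2} = 0$. Then, conditional on $Y_{i1} + Y_{i2} = 1$:
$$P(Y_{i1} = 1, Y_{i2} = 0 \mid Y_{i1} + Y_{i2} = 1, \alpha_i, X_i) = \frac{\dfrac{\exp(\alpha_i + X_{i1}'\beta)}{1 + \exp(\alpha_i + X_{i1}'\beta)} \left( 1 - \dfrac{\exp(\alpha_i + X_{i2}'\beta)}{1 + \exp(\alpha_i + X_{i2}'\beta)} \right)}{\dfrac{\exp(\alpha_i + X_{i1}'\beta)}{1 + \exp(\alpha_i + X_{i1}'\beta)} \left( 1 - \dfrac{\exp(\alpha_i + X_{i2}'\beta)}{1 + \exp(\alpha_i + X_{i2}'\beta)} \right) + \dfrac{\exp(\alpha_i + X_{i2}'\beta)}{1 + \exp(\alpha_i + X_{i2}'\beta)} \left( 1 - \dfrac{\exp(\alpha_i + X_{i1}'\beta)}{1 + \exp(\alpha_i + X_{i1}'\beta)} \right)}$$
$$= \frac{\exp(\alpha_i + X_{i1}'\beta)}{\exp(\alpha_i + X_{i1}'\beta) + \exp(\alpha_i + X_{i2}'\beta)}.$$
The factor $\exp(\alpha_i)$ cancels from numerator and denominator, so the result does not depend on $\alpha_i$.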

The conditional fixed effects estimator
For simplicity, let us use $\Lambda$ as a shortcut for the logistic function:
$$P(Y_{i1} = 0, Y_{i2} = 1 \mid Y_{i1} + Y_{i2} = 1, \alpha_i, X_i) = \frac{\Lambda(\alpha_i + X_{i2}'\beta)\left(1 - \Lambda(\alpha_i + X_{i1}'\beta)\right)}{\Lambda(\alpha_i + X_{i1}'\beta)\left(1 - \Lambda(\alpha_i + X_{i2}'\beta)\right) + \Lambda(\alpha_i + X_{i2}'\beta)\left(1 - \Lambda(\alpha_i + X_{i1}'\beta)\right)} = \Lambda\left((X_{i2} - X_{i1})'\beta\right).$$
Thus, Eq. 26 is in the form of a binary logit function in which the two outcomes are $(0, 1)$ and $(1, 0)$, with explanatory variables $(X_{i2} - X_{i1})$.

The conditional fixed effects estimator
The conditional log-likelihood function is:
$$\log L = \sum_{i \in B_1} \left\{ \omega_i \log \Lambda\left[(X_{i2} - X_{i1})'\beta\right] + (1 - \omega_i) \log\left(1 - \Lambda\left[(X_{i2} - X_{i1})'\beta\right]\right) \right\}, \tag{26}$$
where $B_1 = \{i \mid Y_{i1} + Y_{i2} = 1\}$, $\omega_i = 1$ if $(Y_{i1}, Y_{i2}) = (0, 1)$ and $\omega_i = 0$ if $(Y_{i1}, Y_{i2}) = (1, 0)$.
What does this mean in practice? We can use only individuals who change status at least once within the time periods observed. Only time-varying variables can be included in the set of regressors, as the data are de facto first-differenced.
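Eq. 26 can be evaluated directly for a two-period panel with a single regressor. The sketch below is a hedged illustration: the tuple layout `(y1, y2, x1, x2)`, the function names and the data are assumptions made for the example. It skips individuals who do not change status, as the text describes.

```python
import math

def log_logistic(z):
    """log Lambda(z), computed in a numerically stable way."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def conditional_loglik(beta, panel):
    """Conditional log-likelihood of Eq. 26 for T = 2 and a scalar regressor.

    panel: iterable of (y1, y2, x1, x2) tuples, one per individual.
    """
    total = 0.0
    for y1, y2, x1, x2 in panel:
        if y1 + y2 != 1:
            continue  # stayers are not in B_1 and carry no information on beta
        dz = (x2 - x1) * beta
        if (y1, y2) == (0, 1):        # omega_i = 1
            total += log_logistic(dz)
        else:                          # omega_i = 0; log(1 - Lambda(z)) = log Lambda(-z)
            total += log_logistic(-dz)
    return total

# Hypothetical panel: two changers and two stayers.
panel = [(0, 1, 0.0, 1.0), (1, 0, 0.5, 0.2), (1, 1, 0.3, 0.9), (0, 0, 1.0, 2.0)]
print(conditional_loglik(1.0, panel))
```

Because only the difference `x2 - x1` enters, a regressor that is constant over time contributes nothing, which matches the remark that only time-varying variables can be included.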

The conditional fixed effects estimator
You can get the asymptotic covariance matrix for the conditional MLE of $\beta$ as $N$ tends to infinity. Chamberlain (1980) has shown that the inverse of the information matrix is equivalent to the asymptotic covariance matrix. Let $d_i = 1$ if $Y_{i1} + Y_{i2} = 1$ and 0 otherwise. Then
$$\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{N} d_i\, F\left((X_{i2} - X_{i1})'\beta\right) \left[1 - F\left((X_{i2} - X_{i1})'\beta\right)\right] (X_{i2} - X_{i1})(X_{i2} - X_{i1})'.$$
The information matrix is $I = -E\left( \dfrac{\partial^2 \log L}{\partial \beta\, \partial \beta'} \right)$.

3. Extensions to Ordered Choice Data

Extensions
What can we do if we want to model ordered choice data, such as health satisfaction or general health status?
- Ferrer-i-Carbonell and Frijters (2004): find an individual-specific, efficient threshold according to which the ordered choice variable is dichotomised, then apply the Chamberlain conditional fixed effects (CFE) estimator. (In practice: use the individual-specific mean of the dependent variable as the threshold.)
- Jones and Schurer (2011): model two individual-specific effects, one in the health outcome equation and one in each threshold. (In practice: dichotomise the ordered choice variable at every possible cut-off $k$ and estimate $k$ equations using the Chamberlain conditional fixed effects approach. Heterogeneous parameter estimates are interpreted as non-linearities in the effect of, e.g., income on health.)

Jones and Schurer (2011)
We want to model the socioeconomic gradient of health satisfaction $HS_{it}^*$ and allow for non-linearities in the effect of $X$ on health:
$$HS_{it}^* = \alpha_i + \mu(\beta' X_{it}) + u_{it}. \tag{27}$$
We observe reported health as:
$$HS_{it} = j \quad \text{if} \quad \tau_{i,j-1} < HS_{it}^* \le \tau_{i,j}, \tag{28}$$
where each threshold builds on the previous one,
$$\tau_{i,j} = \tau_{i,j-1} + (\text{individual-specific component of threshold } j), \tag{29}$$
which means that each threshold depends on an individual-specific effect (which could reflect, e.g., personality traits).

Jones and Schurer (2011)
This means that the true health status (when reporting a particular value $j$ for health) is bounded by:
$$\tau_{i,j-1} < \alpha_i + \mu(\beta' X_{it}) + u_{it} \le \tau_{i,j}, \tag{30}$$
which can be rearranged as:
$$-(\alpha_i - \tau_{i,j-1}) - \mu(\beta' X_{it}) < u_{it} \le -(\alpha_i - \tau_{i,j}) - \mu(\beta' X_{it}), \tag{31}$$
where $\alpha_i$ and $\tau_{i,j-k}$ cannot be separately identified (let $\alpha_i - \tau_{i,j-k} = \alpha_{i,j-k}$). We then get:
$$P_{itj} = P(HS_{it} = j \mid \alpha_{ij}, X_{it}) = F\left(-\alpha_{ij} - \mu(\beta' X_{it})\right) - F\left(-\alpha_{i,j-1} - \mu(\beta' X_{it})\right), \tag{32}$$
where $F(\cdot)$ is the logistic function.

In practice
1. Dichotomise $HS_{it}$ $K - 1$ times, where $K$ is the number of categories of the health satisfaction variable. For $K = 5$:
$$HS_{it}^{B1} = 1 \text{ if } HS_{it} > 1, \quad 0 \text{ otherwise} \tag{33}$$
$$HS_{it}^{B2} = 1 \text{ if } HS_{it} > 2, \quad 0 \text{ otherwise} \tag{34}$$
$$HS_{it}^{B3} = 1 \text{ if } HS_{it} > 3, \quad 0 \text{ otherwise} \tag{35}$$
$$HS_{it}^{B4} = 1 \text{ if } HS_{it} > 4, \quad 0 \text{ otherwise} \tag{36}$$
2. Estimate a Chamberlain conditional fixed effects model $K - 1$ times, once for each binary variable;
3. Interpret heterogeneous coefficient estimates as non-linearities in the effect of, e.g., income on true health.
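Step 1 above can be sketched as a small helper that builds the $K - 1$ binary indicators. The function name `dichotomise` and the example responses are hypothetical, and the estimation in step 2 is not shown.

```python
def dichotomise(hs, n_categories):
    """Return {k: indicator list} with indicator = 1 if HS > k, for k = 1..K-1."""
    return {
        k: [1 if value > k else 0 for value in hs]
        for k in range(1, n_categories)
    }

# Hypothetical health satisfaction responses on a 1-5 scale.
hs_responses = [1, 3, 5, 2, 4]
binaries = dichotomise(hs_responses, n_categories=5)
for k, indicator in binaries.items():
    print(f"HS_B{k}:", indicator)
```

Each of the resulting indicator series would then serve as the dependent variable in a separate conditional fixed effects logit.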

4. Bias-corrected fixed effects models: VERY HARD!

Bias-corrected fixed effects estimators
A new literature is evolving that tries to solve the incidental parameter problem by calculating and correcting for the bias. Most applications exist for static and dynamic binary/ordered choice estimators:
1. Analytical or numerical bias correction of the fixed effects estimator: Fernández-Val (2009), dynamic binary choice model;
2. Correction of the bias in a moment equation, i.e. the expected score function: Carro (2007), dynamic binary choice model, modified MLE;
3. Correction of the objective function, i.e. the concentrated log-likelihood: Arellano and Hahn (2006), Bester and Hansen (2009), dynamic ordered probit.
All three approaches may yield different results, and their finite-sample properties need to be assessed, but they all reduce the asymptotic bias from order $T^{-1}$ to order $T^{-2}$ for a general class of models (when the $T$ dimension is not too small).

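Correct the fixed effect estimator
Let $y_{it} \sim N(\alpha_{0i}, \sigma_0^2)$. To obtain an estimate of $\sigma^2 = \theta$, start from the log density:
$$\log f(y_{it}; \sigma^2, \alpha_i) = C - \frac{1}{2} \log \sigma^2 - \frac{(y_{it} - \alpha_i)^2}{2\sigma^2}. \tag{37}$$
Then:
$$\hat\alpha_i = \frac{1}{T} \sum_{t=1}^{T} y_{it} \equiv \bar{y}_i, \tag{38}$$
$$\hat\theta = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (y_{it} - \bar{y}_i)^2. \tag{39}$$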

Correct the fixed effect estimator
It can be shown that, as $N \to \infty$ with fixed $T$:
$$\hat\theta = \theta_0 - \frac{1}{T}\theta_0 + o_p(1). \tag{40}$$
We are concerned about the term $-\frac{1}{T}\theta_0$. Approach 1 corrects this bias directly. In this example, using the correct degrees of freedom (setting the denominator to $N(T-1)$) removes the bias entirely, while the generic plug-in correction $\hat\theta + \hat\theta / T$ turns the bias from order $1/T$ into order $1/T^2$. In general, one needs to find the formula for the bias (in the limit) and then estimate it with its sample analog. This approach does not depend on the log-likelihood function. It requires transformations/derivations and possibly the calculation of expectations, and usually does not produce the exact bias correction.
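The size of the bias in Eq. 40 can be illustrated with a small simulation of the normal-means example. This is a sketch under assumed settings: the sample sizes, the uniform distribution used for the individual effects, the seed and the function name `fe_variance_estimates` are all hypothetical choices. It returns the uncorrected estimator of Eq. 39, the degrees-of-freedom corrected version, and the plug-in correction $\hat\theta(1 + 1/T)$.

```python
import random

def fe_variance_estimates(n=2000, t=4, theta0=1.0, seed=1):
    """Simulate y_it ~ N(alpha_i, theta0) and estimate theta three ways."""
    rng = random.Random(seed)
    sum_sq = 0.0
    for _ in range(n):
        alpha_i = rng.uniform(-2.0, 2.0)  # arbitrary individual effect
        y = [rng.gauss(alpha_i, theta0 ** 0.5) for _ in range(t)]
        y_bar = sum(y) / t                # alpha_i-hat as in Eq. 38
        sum_sq += sum((v - y_bar) ** 2 for v in y)
    theta_hat = sum_sq / (n * t)          # Eq. 39, biased for fixed T
    theta_dof = sum_sq / (n * (t - 1))    # degrees-of-freedom correction
    theta_plug = theta_hat * (1.0 + 1.0 / t)  # plug-in first-order correction
    return theta_hat, theta_dof, theta_plug

theta_hat, theta_dof, theta_plug = fe_variance_estimates()
print(theta_hat, theta_dof, theta_plug)
```

With $T = 4$ and $\theta_0 = 1$, the three estimators should be close to $1 - 1/T = 0.75$, $1$, and $1 - 1/T^2 = 0.9375$ respectively, up to simulation noise.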

Correct moment conditions (use expected score)
In this case, one uses the expected fixed effects score function, evaluated at $\theta_0$, i.e.
$$E\left[ \frac{1}{T} \sum_{t=1}^{T} \frac{\partial}{\partial \theta} \log f(y_{it} \mid \theta_0, \hat\alpha_i(\theta_0)) \right] = \frac{1}{T} b_i(\theta_0) + o\left(\frac{1}{T}\right). \tag{41}$$
This approach also requires the calculation of expectations. The expression for the expected score is then used to construct a moment condition, which is then adjusted (note: the score is used to obtain the MLE when equated to 0). This approach can produce an exact bias correction (not only a correction to order $1/T^2$).

Correct the concentrated log likelihood
Alternatively, one can take the expectation of the difference in the concentrated log-likelihood:
$$E\left[ \frac{1}{T} \sum_{t=1}^{T} \log f(y_{it} \mid \theta_0, \hat\alpha_i(\theta_0)) - \frac{1}{T} \sum_{t=1}^{T} \log f(y_{it} \mid \theta_0, \bar\alpha_i(\theta_0)) \right] = \frac{1}{T} \beta_i(\theta_0) + o\left(\frac{1}{T}\right). \tag{42}$$
This approach is easier to compute and usually does not require the calculation of expectations (e.g. Bester and Hansen's (2009) version of a dynamic ordered probit model), but it does not perform well in removing the bias for $T < 13$.