BINARY CHOICE MODELS

Y = 1 with Pr(Y = 1) = P
Y = 0 with Pr(Y = 0) = 1 − P

Examples: decision-making, purchase of durable consumer products, unemployment

Estimation with OLS? Y_i = X_i β + ε_i

Problems:
o nonsense predictions (P̂ < 0, P̂ > 1)
o functional form: linear, so the parameters always have the same effect
o ε is heteroskedastic
Binary Variables

E(y) = μ
Var(y) = μ(1 − μ)

because:
Var(y) = E(y − μ)² = E(y² − 2yμ + μ²) = E(y²) − 2μE(y) + μ²
since y = 1 or 0: E(y²) = E(y) = μ
Var(y) = μ − 2μ² + μ² = μ(1 − μ)

Heteroskedasticity: the variance changes with Xβ (through μ).
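The moment results above can be checked numerically. The sketch below (illustrative values, μ = 0.3) simulates Bernoulli draws and compares the sample mean and variance with μ and μ(1 − μ):

```python
import random

def bernoulli_moments(mu, n=200_000, seed=0):
    """Simulate n Bernoulli(mu) draws and return (sample mean, sample variance)."""
    rng = random.Random(seed)
    draws = [1 if rng.random() < mu else 0 for _ in range(n)]
    mean = sum(draws) / n
    # since y is 0 or 1, y**2 == y, so E(y^2) = E(y) = mu
    var = sum((y - mean) ** 2 for y in draws) / n
    return mean, var

mean, var = bernoulli_moments(0.3)
# theoretical values: E(y) = 0.3, Var(y) = 0.3 * 0.7 = 0.21
```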
LATENT VARIABLES

e.g. employment decision (0 or 1): must be based on a latent propensity to work Y_i* (based on labor supply):

Y_i* = X_i β + ε_i

Y_i = 1 if Y_i* > 0
Y_i = 0 if Y_i* ≤ 0

Other threshold values are possible as well.
P = Pr(Y = 1) = F(Xβ), with F a symmetric cumulative distribution function:

lim_{Xβ → +∞} F(Xβ) = 1,  lim_{Xβ → −∞} F(Xβ) = 0

Therefore:
Y_i = 1 if Y_i* > 0  ⇔  X_i β + ε_i > 0  ⇔  ε_i > −X_i β

Pr(Y = 1) = Pr(ε > −Xβ) = F(Xβ)   (by symmetry of F)
Pr(Y = 0) = 1 − F(Xβ)
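The equivalence Pr(Y = 1) = Pr(ε > −Xβ) = F(Xβ) can be illustrated by simulation. A minimal sketch, assuming logistic errors and an illustrative index value Xβ = 0.5:

```python
import math, random

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def share_y_equals_one(xb, n=200_000, seed=1):
    """Simulate Y = 1{xb + eps > 0} with standard logistic errors; return share of ones."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n):
        u = rng.random()
        eps = math.log(u / (1.0 - u))  # inverse CDF of the standard logistic
        ones += 1 if xb + eps > 0 else 0
    return ones / n

share = share_y_equals_one(0.5)
# by symmetry of F, the share of ones should approach F(0.5) = logistic_cdf(0.5)
```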
MAXIMUM LIKELIHOOD ESTIMATION

The likelihood of observing a given sample outcome Y_i can be maximised via the choice of an adequate parameter β:

L = Pr(Y_1 = y_1, Y_2 = y_2, …, Y_N = y_N)
  = ∏_{Y_i=0} [1 − F(X_i β)] · ∏_{Y_i=1} F(X_i β)
  = ∏_{i=1}^{N} [1 − F(X_i β)]^{1−y_i} F(X_i β)^{y_i}

ln L = Σ_{i=1}^{N} { (1 − y_i) ln[1 − F(X_i β)] + y_i ln F(X_i β) }  → MAX over β

The log-likelihood function is globally concave ==> unique global maximum.
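Because ln L is globally concave, Newton-Raphson converges quickly. A minimal sketch for a single-slope logit on simulated data (true β = 1 is an illustrative choice, not from the slides):

```python
import math, random

def logit_loglik(beta, xs, ys):
    """ln L = sum of y*ln F + (1-y)*ln(1-F) with F the logistic CDF."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

def fit_logit(xs, ys, beta0=0.0, iters=25):
    """Newton-Raphson on the globally concave logit log-likelihood (single slope)."""
    beta = beta0
    for _ in range(iters):
        grad = hess = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-beta * x))
            grad += (y - p) * x           # score contribution
            hess -= p * (1 - p) * x * x   # second derivative (always negative)
        beta -= grad / hess
    return beta

random.seed(2)
true_beta = 1.0
xs = [random.gauss(0, 1) for _ in range(5000)]
ys = [1 if random.random() < 1 / (1 + math.exp(-true_beta * x)) else 0 for x in xs]
beta_hat = fit_logit(xs, ys)
```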
The form of F(·)

Probit model: standard normal distribution
F(Xβ) = Φ(Xβ) = ∫_{−∞}^{Xβ} (1/√(2π)) exp(−t²/2) dt

Logit model: standard logistic distribution
F(Xβ) = exp(Xβ) / [1 + exp(Xβ)]

Both yield very similar results.
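The similarity of the two link functions can be checked directly. A sketch using the standard rescaling (logit coefficients are roughly 1.6 times probit coefficients, since the standard logistic has variance π²/3):

```python
import math

def probit_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

# compare Phi(z) with the logistic CDF evaluated at a rescaled argument
for z in (-1.0, 0.0, 0.5, 1.0):
    print(z, round(probit_cdf(z), 3), round(logit_cdf(1.6 * z), 3))
```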
GOODNESS OF FIT

Pseudo R²: 1 − ln L / ln L₀
ln L₀ = value of the log-likelihood in a model without covariates
+ variations (McFadden, etc.)
Problem: pseudo R² only reaches 1 if Xβ → ±∞

Observed/predicted table (estat classification in STATA after logit):
share of right predictions: Ŷ_i = 1 if P̂_i > P* (usually 0.5), separately for each subgroup; comparison of Ŷ_i and Y_i

Hosmer-Lemeshow test for misspecification: tests, for a number of subgroups, whether or not the mean predicted probability for Y = 1 is the same as the observed share in the sample (mostly conducted for deciles; has to be the same across the whole sample).
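Both fit measures are easy to compute from fitted probabilities. A sketch with made-up illustrative data (y and P̂ are not from the slides):

```python
import math

def mcfadden_r2(ys, ps):
    """McFadden pseudo R^2 = 1 - lnL / lnL0, with lnL0 from the intercept-only model."""
    ll = sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, ps))
    ybar = sum(ys) / len(ys)  # intercept-only model predicts the sample share
    ll0 = sum(y * math.log(ybar) + (1 - y) * math.log(1 - ybar) for y in ys)
    return 1.0 - ll / ll0

def hit_rate(ys, ps, cutoff=0.5):
    """Share of right predictions with Y_hat = 1 if p > cutoff."""
    hits = sum(1 for y, p in zip(ys, ps) if (p > cutoff) == (y == 1))
    return hits / len(ys)

ys = [1, 1, 0, 0, 1, 0]
ps = [0.9, 0.7, 0.2, 0.4, 0.6, 0.3]
```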
ESTIMATION PROBLEMS

Maximum likelihood is
o consistent,
o asymptotically efficient and asymptotically normally distributed
→ problems with small samples (should be > 100)

ln L is globally concave; nevertheless, estimation problems can occur (no convergence of the iteration procedure, singularity of the Hessian matrix in the 2nd derivative):
o multicollinearity of variables
o a dummy variable explains the outcome entirely (→ huge coefficients) + linear combinations
d_i: if d_i = 1, then y_i = 1

ln L = Σ_{i=1}^{N} { y_i ln F(X_i β + δ d_i) + (1 − y_i) ln[1 − F(X_i β + δ d_i)] }  → MAX over β, δ
     = Σ_{d_i=1} ln F(X_i β + δ) + Σ_{d_i=0} { y_i ln F(X_i β) + (1 − y_i) ln[1 − F(X_i β)] }

δ appears only in the first term, and that term increases in δ, so maximizing ln L drives δ towards +∞ ==> no finite estimate
+ scaling of variables → check standard errors
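The divergence of δ under perfect prediction can be seen numerically. A stylized sketch (β fixed at 0 for illustration, data invented so that d = 1 always implies y = 1):

```python
import math

def loglik_with_dummy(delta, data):
    """lnL for P = logistic(delta * d), i.e. the other coefficients held at 0."""
    ll = 0.0
    for d, y in data:
        p = 1.0 / (1.0 + math.exp(-(delta * d)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# d perfectly predicts y: whenever d = 1, y = 1
data = [(1, 1), (1, 1), (0, 1), (0, 0), (0, 0)]
lls = [loglik_with_dummy(delta, data) for delta in (0.0, 2.0, 5.0, 10.0)]
# lnL keeps rising in delta: there is no interior maximum, delta diverges
```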
PROPERTIES OF THE ESTIMATOR

OLS: only small troubles if some assumptions are violated (e.g. still consistent estimates if the error is heteroskedastic or suffers from autocorrelation).

With probit/logit: only β/σ is identified instead of β. General assumption: σ = 1.

Heteroskedasticity:
o if independent of the RHS variables: no problem
o but: if σ_i² = exp(γ₁ + γ₂ x_{1i}), β cannot be consistently estimated
o hetprob (STATA) as a simple method to test for heteroskedasticity; the general assumption is that the error variance is a function of one of the variables
INTERPRETATION OF THE COEFFICIENTS

Non-linear procedure → the impact of a variable depends on the position at which it is evaluated → marginal effects are necessary:

dP/dx₁ = β₁ f(Xβ)

At which position should the marginal effect be evaluated?
o all i, then averaged (average marginal effect)
o the sample mean
o relevant combinations of variables
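The two main evaluation choices generally give different answers, because f(Xβ) varies across observations. A sketch for the logit, where f(Xβ) = p(1 − p), with illustrative values of β and x:

```python
import math

def logit_p(xb):
    return 1.0 / (1.0 + math.exp(-xb))

def marginal_effect(beta, xb):
    """dP/dx = beta * f(Xb); for the logit, f(Xb) = p * (1 - p)."""
    p = logit_p(xb)
    return beta * p * (1.0 - p)

beta = 0.8
xs = [-2.0, -0.5, 0.0, 1.0, 3.0]

# average marginal effect: evaluate at every observation, then average
ame = sum(marginal_effect(beta, beta * x) for x in xs) / len(xs)

# marginal effect at the sample mean
xbar = sum(xs) / len(xs)
mem = marginal_effect(beta, beta * xbar)
```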
DUMMY VARIABLES

Discrete change in X_k:
ΔPr(Y = 1 | X)/ΔX_k = Pr(Y = 1 | X, X_k = 1) − Pr(Y = 1 | X, X_k = 0)

STATA provides automated procedures for marginal effects; tables of marginal effects are always better.
Scott Long (Indiana University) developed his own procedures for interpretation (see website).
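Like the continuous marginal effect, the discrete change for a dummy depends on where the other regressors place you on the curve. A sketch for the logit with illustrative coefficients:

```python
import math

def logit_p(xb):
    return 1.0 / (1.0 + math.exp(-xb))

def discrete_change(beta_x, beta_d, x):
    """Pr(Y=1 | x, d=1) - Pr(Y=1 | x, d=0) for a dummy regressor d."""
    return logit_p(beta_x * x + beta_d) - logit_p(beta_x * x)

# the same dummy coefficient has a large effect near the middle of the
# curve and a small one in the flat tail
dc_at_0 = discrete_change(1.0, 0.5, 0.0)
dc_at_3 = discrete_change(1.0, 0.5, 3.0)
```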
ODDS RATIO FOR LOGIT

Odds:
Pr(Y = 1 | X) / Pr(Y = 0 | X) = Pr(Y = 1 | X) / [1 − Pr(Y = 1 | X)]

how often 1 occurs relative to 0; varies between 0 and +∞

ln(odds): varies between −∞ and +∞:
ln[ Pr(Y = 1 | X) / (1 − Pr(Y = 1 | X)) ] = Xβ

equivalent to the logit:
Pr(Y = 1 | X) = exp(Xβ) / [1 + exp(Xβ)]

→ yields an interesting interpretation.
ODDS

Ω(X) = P / (1 − P) = e^{Xβ}

Two realisations of X are given: X₁ and X₀

Odds ratio: Ω(X₁) / Ω(X₀) = e^{(X₁ − X₀)β}

If e^{β_j} > 1 (β_j > 0), X_j increases the odds of observing Y = 1.
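The key property is that the odds ratio for a one-unit change in x_j equals e^{β_j}, regardless of the values of the other regressors. A sketch with illustrative coefficients (β_j = 0.4, other index terms lumped into the constant 0.7):

```python
import math

def odds(xb):
    """Omega(X) = P / (1 - P) = exp(Xb) under the logit."""
    p = 1.0 / (1.0 + math.exp(-xb))
    return p / (1.0 - p)

beta_j = 0.4
rest = 0.7  # contribution of all other regressors, held fixed

# odds ratio for x_j = 1 vs x_j = 0: the 'rest' term cancels
ratio = odds(1.0 * beta_j + rest) / odds(0.0 * beta_j + rest)
```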
PANEL MODELS

For fixed T and N → ∞, the estimator is inconsistent: incidental parameter problem, the fixed effect α_i is not estimable. In the linear model (OLS) it was possible to eliminate α_i.

Conditional maximum likelihood: look for a sufficient statistic t_i for α_i such that, conditional on t_i, an individual's likelihood contribution no longer depends on α_i, but only on β.

In linear models, a sufficient statistic for α_i is ȳ_i. This is not the case for the probit model: the fixed effects probit is inconsistent. Fixed effects logit and random effects probit are possible.
Ordered Probit/Logit

Ordinal variable, e.g. school grades.

The latent variable Y* ∈ (−∞, ∞) is measured only in N values:
Y_i = m if τ_{m−1} ≤ Y_i* < τ_m,  m = 1, …, N

τ: thresholds, cutpoints. The thresholds are unknown → must be estimated as well!
For the ordered probit:

Pr(Y = 0) = Φ(τ₀ − Xβ)
Pr(Y = 1) = Φ(τ₁ − Xβ) − Φ(τ₀ − Xβ)
Pr(Y = 2) = Φ(τ₂ − Xβ) − Φ(τ₁ − Xβ)
…
Pr(Y = N) = 1 − Φ(τ_{N−1} − Xβ)

A change in X shifts the entire probability density to the left/right.
Attention: interpretation is difficult and only unambiguous at the borders (the lowest and highest categories)!
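The category probabilities above sum to one by construction, and shifting Xβ moves mass from low to high categories. A sketch with illustrative cutpoints τ = (0, 1, 2):

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, cutpoints):
    """Category probabilities Pr(Y = m) given cutpoints tau_0 < tau_1 < ... < tau_{N-1}."""
    probs = [norm_cdf(cutpoints[0] - xb)]                 # lowest category
    for lo, hi in zip(cutpoints, cutpoints[1:]):
        probs.append(norm_cdf(hi - xb) - norm_cdf(lo - xb))  # interior categories
    probs.append(1.0 - norm_cdf(cutpoints[-1] - xb))      # highest category
    return probs

p_low = ordered_probit_probs(-1.0, [0.0, 1.0, 2.0])
p_high = ordered_probit_probs(1.5, [0.0, 1.0, 2.0])
# raising Xb shifts probability mass from the lowest to the highest category
```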