Economics 472. Lecture 16. Binary Dependent Variable Models

University of Illinois, Department of Economics
Roger Koenker
Fall 1998

Let's begin with a model for an observed proportion, or frequency. We would like to explain variation in the proportion $p_i$ as a function of covariates $x_i$. We could simply specify that
$$p_i = x_i'\beta + \text{error}$$
and run it as an OLS regression. But this has certain problems; for example, we might find that $\hat p_i \notin [0,1]$. So we typically consider transformations
$$g(p_i) = x_i'\beta + \text{error},$$
where $g$ is usually called the "link" function. A typical example of $g$ is the logit function
$$g(p) = \mathrm{logit}(p) = \log(p/(1-p)),$$
which corresponds to the logistic df.

The transformation may be seen to induce a certain degree of heteroscedasticity into the model. Suppose each observation $\hat p_i$ is based on a moderately large sample of $n_i$ observations with $\hat p_i \rightarrow p_i$. We may then use the $\delta$-method to compute the variability of $\mathrm{logit}(\hat p_i)$:
$$V(g(\hat p_i)) = (g'(p_i))^2 \, V(\hat p_i),$$
and since
$$g(p) = \log(p/(1-p)) \quad\Rightarrow\quad g'(p) = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}, \qquad V(\hat p_i) = \frac{p_i(1-p_i)}{n_i},$$
we have
$$V(\mathrm{logit}(\hat p_i)) = \frac{1}{n_i \, p_i(1-p_i)}.$$
Thus GLS would suggest running the weighted regression of $\mathrm{logit}(\hat p_i)$ on $x_i$ with weights $n_i \, p_i(1-p_i)$. Of course, based on the considerations so far, we could replace $\mathrm{logit}(\hat p_i)$ with any other quantile-type transformation from $[0,1]$ to $\mathbb{R}$. For example, we might use $\Phi^{-1}(\hat p_i)$, in which case the same logic suggests regressing $\Phi^{-1}(\hat p_i)$ on $x_i$ with weights
$$\frac{n_i \, \varphi^2(\Phi^{-1}(p_i))}{p_i(1-p_i)}.$$

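As a concrete illustration of this weighted-regression (minimum chi-square) idea, here is a minimal R sketch on simulated grouped data. The variable names and the design are purely illustrative assumptions, not part of the original notes.

# Weighted least squares on the logit scale for grouped binary data.
# Sketch with simulated data; names (x, n, phat) are illustrative.
set.seed(1)
m    <- 50                          # number of cells/groups
x    <- runif(m, -2, 2)             # a single covariate
n    <- sample(30:100, m, TRUE)     # group sizes n_i
p    <- plogis(0.5 + 1.2 * x)       # true cell probabilities
y    <- rbinom(m, n, p)             # successes per cell
phat <- y / n

# keep phat away from exactly 0 or 1 (the problem noted in the text)
phat <- pmin(pmax(phat, 0.5 / n), 1 - 0.5 / n)

# GLS: regress logit(phat) on x with weights n_i * phat_i * (1 - phat_i)
w   <- n * phat * (1 - phat)
fit <- lm(qlogis(phat) ~ x, weights = w)
coef(fit)   # should be reasonably close to (0.5, 1.2)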
An immediate problem presents itself, however, if we would like to apply the foregoing to data in which some of the observed $p_i$ are either 0 or 1. And since the foregoing approach seems rather ad hoc anyway, based as it is on approximate normality of the $\hat p_i$, we might as well leap into the briar patch of maximum likelihood estimation. But to keep things quite close to the regression setting, we will posit the following latent variable model. We posit a model for the latent (unobserved) variable $y_i^*$ and assume that the observed binary response variable $y_i$ is generated as
$$y_i^* = x_i'\beta + u_i, \qquad y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise,} \end{cases}$$
so, letting the df of $u_i$ be denoted by $F$,
$$P(y_i = 1) = P(u_i > -x_i'\beta) = 1 - F(-x_i'\beta), \qquad P(y_i = 0) = F(-x_i'\beta).$$
For $F$ symmetric about zero, $F(z) + F(-z) = 1$, so $1 - F(-x_i'\beta) = F(x_i'\beta)$ and we have
$$P(y_i = 1) = F(x_i'\beta), \qquad P(y_i = 0) = 1 - F(x_i'\beta),$$
and we may write the likelihood of seeing the sample $\{(y_i, x_i) : i = 1, \dots, n\}$ as
$$L(\beta) = \prod_{i : y_i = 0} (1 - F(x_i'\beta)) \prod_{i : y_i = 1} F(x_i'\beta) = \prod_{i=1}^n F_i^{y_i} (1 - F_i)^{1 - y_i},$$
where $F_i \equiv F(x_i'\beta)$.

Now we need to make some choice of $F$. There are several popular choices:

(i) Logit: $p = F(z) = e^z/(1 + e^z)$, so $\log(p/(1-p)) = z$ and $Ey_i = p_i = F(x_i'\beta) \Rightarrow \mathrm{logit}(p_i) = x_i'\beta$.
(ii) Probit: $F(z) = \Phi(z) = \int_{-\infty}^{z} \varphi(x)\,dx$, so $\Phi^{-1}(p) = x_i'\beta$.
(iii) Cauchy: $F(z) = \tfrac{1}{2} + \tfrac{1}{\pi}\tan^{-1}(z)$, so $F^{-1}(p) = \tan(\pi(p - \tfrac{1}{2})) = x_i'\beta$.
(iv) Complementary log-log: $F^{-1}(p) = \log(-\log(1-p)) = x_i'\beta$.
(v) Log-log: $F^{-1}(p) = -\log(-\log(p)) = x_i'\beta$.

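In R, most of these link choices can be fit by maximum likelihood with glm(); the sketch below, on hypothetical individual-level data (y and x are placeholder names), simply shows the mechanics. There is no built-in log-log link, so it is omitted here.

# ML fits of the binary response model under several links.
# Sketch on simulated data; y and x are illustrative names.
set.seed(2)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.2 * x))

fit.logit   <- glm(y ~ x, family = binomial(link = "logit"))
fit.probit  <- glm(y ~ x, family = binomial(link = "probit"))
fit.cloglog <- glm(y ~ x, family = binomial(link = "cloglog"))
fit.cauchit <- glm(y ~ x, family = binomial(link = "cauchit"))

sapply(list(logit = fit.logit, probit = fit.probit,
            cloglog = fit.cloglog, cauchit = fit.cauchit), coef)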
[Figure 1 appears here: curves for the log-log, probit, logit, c-loglog, and Cauchy links, with a probability scale on one axis and a logit scale on the other.]

Figure 1. Comparison of five link functions: The horizontal axis is on the logistic scale, so the logit link appears as the 45 degree line. Symmetry around the y-axis indicates symmetry of the distribution corresponding with the link, as in the logit, probit and Cauchy cases. The log-log forms are asymmetric in this respect. Note that while the probit and logit are quite similar, the Cauchy link is much more long tailed.

# plots of link functions for binary dependent variable models
postscript("fig.ps", horizontal = F, width = 6.5, height = 6.5,
    font = 7, pointsize = 12)
eps <- .2
u <- (1:999)/1000
# logit link plotted against itself: the 45 degree reference line
plot(log(u/(1-u)), log(u/(1-u)), type = "l", axes = F, xlab = "", ylab = "")
# tick marks and labels on the probability scale (upper axis)
tics <- c(.001, .01, .1)
tics <- c(tics, 1 - tics)
ytics <- 0 * tics
segments(log(tics/(1-tics)), ytics, log(tics/(1-tics)), ytics + eps)
text(log(tics/(1-tics)), ytics + 3*eps, paste(format(round(tics, 3))))
text(log(tics[2]/(1-tics[2])), 1.3, "probability scale")
# tick marks and labels on the logit scale (lower axis)
tics <- c(2, 4, 6)
tics <- c(tics, -tics)
ytics <- 0 * tics
segments(tics, ytics, tics, ytics - eps)
text(tics, ytics - 3*eps, paste(format(round(tics))))
text(tics[2], -1.3, "logit scale")
segments(-eps, ytics, eps, ytics)
text(-3*eps, tics, paste(format(round(tics))))
abline(h = 0)
abline(v = 0)
# remaining links plotted against the logit scale
lines(log(u/(1-u)), qnorm(u), lty = 2)              # probit
lines(log(u/(1-u)), log(-log(1-u)), lty = 3)        # c-loglog
lines(log(u/(1-u)), -log(-log(u)), lty = 4)         # log-log
lines(log(u/(1-u)), tan(pi*(u - .5)), lty = 5)      # cauchy
text(-c(6, 6, 5, 5, 2), -c(1, 3, 4, 6, 4),
    c("log-log", "probit", "logit", "c-loglog", "cauchy"))
frame()

Interpretation of the Coefficients

In regression we are used to the idea that
$$\frac{\partial E(y|x)}{\partial x_j} = \beta_j,$$
provided we really have a linear model in $x$, but under our symmetry assumption here the situation is slightly more complicated. Now,
$$E(y_i | x_i) = 1 \cdot P(y_i = 1) + 0 \cdot P(y_i = 0) = F(x_i'\beta),$$
so now
$$\frac{\partial E(y|x)}{\partial x_j} = f(x'\beta)\,\beta_j.$$
For the logit we have
$$F(z) = \frac{e^z}{1 + e^z}, \qquad f(z) = F(z)(1 - F(z)),$$
while for the probit we have
$$f(z) = \varphi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2},$$
and for the Cauchy
$$f(z) = \frac{1}{\pi(1 + z^2)}.$$
We can compare these, for example, at $z = 0$, where we get the following factors from $f(0)$:

  logit    $1/4$
  probit   $1/\sqrt{2\pi}$
  Cauchy   $1/\pi$

Roughly speaking, the whole $\hat\beta$-vector should scale by these factors, so e.g.
$$\tfrac{1}{4}\,\hat\beta_j^{\,\mathrm{logit}} \approx \tfrac{1}{\sqrt{2\pi}}\,\hat\beta_j^{\,\mathrm{probit}},$$
so $\hat\beta_j^{\,\mathrm{logit}} \approx 1.60\,\hat\beta_j^{\,\mathrm{probit}}$.

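A quick numerical check of this scaling in R, again on simulated data with illustrative names: fit the same hypothetical model by logit and probit and compare the slope coefficients and the average derivatives $f(x'\beta)\beta$.

# Compare logit and probit coefficients and average marginal effects.
# Sketch on simulated data; the design is purely illustrative.
set.seed(3)
n <- 5000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.8 * x))

fit.logit  <- glm(y ~ x, family = binomial("logit"))
fit.probit <- glm(y ~ x, family = binomial("probit"))

# coefficient ratio: roughly 4 / sqrt(2 * pi), about 1.60
coef(fit.logit)["x"] / coef(fit.probit)["x"]

# average marginal effects dE(y|x)/dx = f(x'beta) * beta_x
xb.l <- predict(fit.logit,  type = "link")
xb.p <- predict(fit.probit, type = "link")
mean(dlogis(xb.l)) * coef(fit.logit)["x"]
mean(dnorm(xb.p))  * coef(fit.probit)["x"]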
Diagnostic for the Logistic Link Function

Let $g(p) = \mathrm{logit}(p)$ in the usual one-observation-per-cell logit model, and suppose we've fitted the model
$$\mathrm{logit}(p_i) = x_i'\beta,$$
but we'd like to know if there is some more general form for the density which works better. Pregibon suggests, following Box-Cox,
$$g(p) = \frac{p^{\alpha - \delta} - 1}{\alpha - \delta} - \frac{(1-p)^{\alpha + \delta} - 1}{\alpha + \delta}.$$
Note that as $\alpha, \delta \rightarrow 0$ we get
$$g(p) = \log p - \log(1-p) = \log(p/(1-p)).$$
Here $\delta = 0$ corresponds to symmetry, and $\alpha$ governs the fatness of the tails. Expanding $g$ in $\alpha, \delta$ we get (with diligence)
$$g(p) = g_0(p) + \alpha\, g_0^{\alpha}(p) + \delta\, g_0^{\delta}(p),$$
$$g_0^{\alpha}(p) = \tfrac{1}{2}\left[\log^2(p) - \log^2(1-p)\right], \qquad g_0^{\delta}(p) = -\tfrac{1}{2}\left[\log^2(p) + \log^2(1-p)\right].$$
LM tests of the significance of $g_0^{\alpha}, g_0^{\delta}$ in an expanded model, in which we include $g_0^{\alpha}(\hat p)$ and $g_0^{\delta}(\hat p)$ constructed from a preliminary logistic regression, can be used to evaluate the reasonableness of the logit specification.

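A sketch of how such a diagnostic might be carried out in R. The data and names are hypothetical, and for simplicity the constructed variables are tested with a likelihood-ratio comparison rather than a strict LM (score) test as in Pregibon's paper.

# Goodness-of-link diagnostic: add the constructed variables
# g_alpha(phat), g_delta(phat) from a preliminary logit fit and test them.
# Illustrative data; y is binary, x a covariate.
set.seed(4)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.2 + x))

fit0 <- glm(y ~ x, family = binomial)
phat <- fitted(fit0)

g.alpha <-  0.5 * (log(phat)^2 - log(1 - phat)^2)
g.delta <- -0.5 * (log(phat)^2 + log(1 - phat)^2)

fit1 <- glm(y ~ x + g.alpha + g.delta, family = binomial)
anova(fit0, fit1, test = "Chisq")   # a small p-value casts doubt on the logit link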
Semiparametric Methods for Binary Choice Models

It is worthwhile to explore what happens when we relax the assumptions of the prior analysis, in particular the assumptions that (a) we know the form of the df $F$, and (b) the $u_i$'s in the latent variable formulation are iid. Recall that in ordinary linear regression we can justify OLS methods with the minimal assumption that $u$ is mean independent of the covariates $x$, i.e., that $E(u|x) = 0$. We will see that this condition is not sufficient to identify the parameter $\beta$ in the latent variable form of the binary choice model.

The following example is taken from Horowitz (1998). Suppose we have the simple logistic model
$$y_i^* = x_i\beta + u_i,$$
where $u_i$ is iid logistic, i.e., has df
$$F(u) = 1/(1 + e^{-u}).$$
It is clear that multiplying the latent variable equation through by $\sigma$ leaves the observable choices unchanged, so the first observation about identification in this model is that we can only identify $\beta$ "up to scale". This is essentially the reason we are entitled to impose the assumption that $u$ has a df with known scale. Now let $\gamma$ be another parameter vector such that $\gamma \neq \sigma\beta$ for any choice of the scalar $\sigma$. It is easy to construct new random variables, say $v$, whose dfs will now depend upon $x$, and for which
$$(*) \qquad F_{v|x}(x'\gamma) = 1/(1 + \exp(-x\beta))$$
and
$$E(v|x) = 0.$$
Thus, $\gamma$ and the $v$'s would generate the same observable probabilities as $\beta$ and the $u$'s, and both would have mean independent errors with respect to $x$. The argument is most easily seen by drawing a picture. Suppose we have the original $(\beta, u)$ model with nice logistic densities at each $x$ and a line representing $x\beta$; we could imagine recentering the logistic densities so that they were centered with respect to the $x\gamma$ line. Now on the left side of the picture imagine stretching the right tail of the density until the mean matches $x\gamma$; similarly we can stretch the left tail on the right side of the picture. As long as the stretching doesn't move mass across the $x\beta$ line, $(*)$ is satisfied.

What this shows is that mean independence is the wrong idea for thinking about binary choice models. What is appropriate? The example illustrates that the right concept is median independence. As long as
$$\mathrm{median}(u|x) = 0$$
we do get identification, under two rather mild conditions. A simple way to see how to exploit this is to recall that under the general quantile regression model,
$$Q_y(\tau|x) = x\beta,$$
equivariance to monotone transformations implies that for the rather drastic transformation $I(\cdot > 0)$ we have
$$Q_{I(y > 0)}(\tau|x) = I(x\beta > 0),$$
but $I(y > 0)$ is just the observable binary variable, so this suggests the following estimation strategy:
$$\min_{\|\beta\| = 1} \sum \rho_\tau\left(y_i - I(x_i\beta > 0)\right),$$
where $y_i$ is the binary variable. This problem is rather tricky computationally, but it has a natural interpretation: we want to choose $\beta$ so that, as often as possible, $I(x_i\beta > 0)$ predicts correctly. Manski (1975) introduced this idea under the rather unfortunate name "maximum score" estimator, writing it as
$$\max_{\|\beta\| = 1} \sum (2y_i - 1)\left(2 I(x_i'\beta \geq 0) - 1\right).$$
In this form we try to maximize the number of matches rather than minimizing the number of mismatches, but the two problems are equivalent. The large sample theory of this estimator is rather complicated, but an interesting aspect of the quantile regression formulation is that it enables us, by estimating the model for various values of $\tau$, to explore the problem of "heteroscedasticity" in the binary choice model.

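To make the maximum score idea concrete, here is a minimal R sketch that, on simulated data with a two-dimensional $x$, searches over directions $\beta$ with $\|\beta\| = 1$ and scores the sign agreement between $y_i$ and $I(x_i'\beta \geq 0)$. The brute-force grid search over the unit circle is purely illustrative, not a serious algorithm for this problem.

# Maximum score estimation by brute-force search over the unit circle.
# Simulated data; normalized to unit length, the true direction is (1,1)/sqrt(2).
set.seed(5)
n <- 2000
x <- cbind(rnorm(n), rnorm(n))
u <- rlogis(n) * (1 + 0.5 * abs(x[, 1]))   # heteroscedastic, median zero
y <- as.numeric(x %*% c(1, 1) + u > 0)

score <- function(theta) {
  b <- c(cos(theta), sin(theta))           # a point on the unit circle
  mean((2 * y - 1) * (2 * (x %*% b >= 0) - 1))
}
grid <- seq(0, 2 * pi, length.out = 2000)
best <- grid[which.max(sapply(grid, score))]
c(cos(best), sin(best))                    # should be near (0.71, 0.71)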
Discrete Choice Models: Some Theory

The theory of discrete choice has a long history in both psychology and economics. McFadden's version of the Thurstone (1927) model may be very concisely expressed as follows. There are $m$ choices, with
$$y_i^* = \text{utility of the } i\text{th choice}, \qquad y_i = \begin{cases} 1 & \text{if } y_i^* = \max\{y_1^*, \dots, y_m^*\} \\ 0 & \text{otherwise.} \end{cases}$$
Suppose we can express the "utility of the $i$th choice" as
$$y_i^* = v(x_i) + u_i,$$
where $x_i$ is a vector of attributes of the $i$th choice, and the $\{u_i\}$ are iid draws from some df $F$. Note that in contrast to classical economic models of choice, here utility has a random component. This randomness has an important role to play, because it allows us to develop simple models with "common tastes" in which not everyone makes exactly the same choices.

Thm. If the $u_i$ are iid with $F(u) = P(u_i < u) = e^{-e^{-u}}$, then
$$P(y_i = 1 | x_i) = \frac{e^{v_i}}{\sum_j e^{v_j}},$$
where $v_i \equiv v(x_i)$.

Remark. $F(\cdot)$ is often called the Type I extreme value distribution.

Proof. $y_i^* = \max\{\cdot\}$ iff $u_i + v_i > v_j + u_j$ for all $j \neq i$, or $u_j < u_i + v_i - v_j$. So, conditioning on $u_i$ and then integrating with respect to the marginal density of $u_i$,
$$P(y_i = 1 | x_i) = \int \prod_{j \neq i} F(u_i + v_i - v_j)\, f(u_i)\, du_i.$$
Note that if $F(u)$ takes the hypothesized form, then $f(u) = e^{-e^{-u}} \cdot e^{-u} = e^{-u - e^{-u}}$, so
$$\prod_{j \neq i} F(u_i + v_i - v_j)\, f(u_i) = \prod_{j \neq i} \exp(-\exp(-u_i - v_i + v_j)) \cdot \exp(-u_i - \exp(-u_i)) = \exp\Big(-u_i - e^{-u_i}\big(1 + \sum_{j \neq i} e^{v_j}/e^{v_i}\big)\Big).$$
Let
$$\eta_i = \log\big(1 + \sum_{j \neq i} e^{v_j}/e^{v_i}\big) = \log\big(\sum_{j=1}^m e^{v_j}/e^{v_i}\big),$$
so
$$P(y_i = 1 | x_i) = \int \exp\big(-u_i - e^{-(u_i - \eta_i)}\big)\, du_i = e^{-\eta_i} \int \exp(-\tilde u_i - e^{-\tilde u_i})\, d\tilde u_i, \qquad \tilde u_i = u_i - \eta_i,$$
$$= e^{-\eta_i} = \frac{e^{v_i}}{\sum_{j=1}^m e^{v_j}}.$$

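The theorem lends itself to a quick simulation check in R: draw Type I extreme value (Gumbel) errors, record which alternative attains the maximum utility, and compare the empirical choice frequencies with $e^{v_i}/\sum_j e^{v_j}$. The particular $v$ values below are arbitrary.

# Simulation check of the multinomial logit choice probabilities.
# v holds arbitrary systematic utilities for m = 3 alternatives.
set.seed(6)
v <- c(0.5, 0.0, -1.0)
m <- length(v)
R <- 100000

# Type I extreme value draws via the inverse cdf of F(u) = exp(-exp(-u))
u <- matrix(-log(-log(runif(R * m))), R, m)
choice <- max.col(matrix(v, R, m, byrow = TRUE) + u)

rbind(simulated   = tabulate(choice, m) / R,
      closed.form = exp(v) / sum(exp(v)))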
Extensions. Let $y_{ij}^*$ denote the utility of the $i$th person for the $j$th choice, with
$$y_{ij}^* = x_{ij}'\beta + z_i'\gamma + u_{ij}.$$
Then assuming the $u_{ij}$ are iid $F$ yields
$$p_{ij} = P(y_{ij} = 1 | x_{ij}, z_i) = \frac{e^{x_{ij}'\beta + z_i'\gamma}}{\sum_k e^{x_{ik}'\beta + z_i'\gamma}}.$$
E.g., here $x_{ij}$ is a vector of choice-specific individual characteristics, like travel time to work by the $j$th mode of transport, and $z_i$ is a vector of individual characteristics, like income, age, etc.

Critique of IIA: Independence of Irrelevant Alternatives

Luce derived a version of the above model from the assumption that the odds of choosing alternatives $i$ and $j$ shouldn't depend on the characteristics of a third alternative $k$. Clearly, here
$$\frac{P_i}{P_j} = \frac{P(y_i = 1)}{P(y_j = 1)} = \frac{e^{v_i}}{e^{v_j}},$$
which is independent of $v_k$. (One should resist the temptation to relate this to similarly named concepts in the theory of voting.) For some purposes this is a desirable feature of a choice model, but in other circumstances it might be considered a "bug." Debreu, in a famous critique of the IIA property, suggested that it might be unreasonable to think that the choice between car and bus transportation would be invariant to the introduction of a new form of bus which differed from the original one only in terms of color. In this red-bus-blue-bus example we would expect the draws of the $u_i$'s for the two bus modes to be highly correlated, not independent. Recently, there has been considerable interest in multinomial probit models of this type, in which such correlation can be easily incorporated.

References

Pregibon, D. (1980). Goodness of link tests for generalized linear models, Applied Statistics, 29, 15-24.