Multivariate Response Models

The response variable is unordered and takes more than two values. The term unordered refers to the fact that response 3 is not more favored than response 2: one choice is selected from a group, and the labeling of the choices is arbitrary.

Examples:
Choice of Health Plan
Choice of Occupation
Transportation Mode for Commuting to Work

$Y$ takes the values $\{0, 1, \dots, J\}$, $J$ a positive integer
$X$ a set of conditioning variables

Example: $Y$ occupational choice; $X$ contains education, age, gender, race, marital status

As in the binary case, we wish to know how ceteris paribus changes in the elements of $X$ affect the response probabilities for $j = 0, 1, \dots, J$. Because the probabilities must sum to unity, $P(Y = 0 \mid X)$ is determined once we know the probabilities for $j = 1, \dots, J$.

Multinomial Logit

$X$ a $1 \times K$ vector with first element unity

$$P(Y = j \mid X) = \frac{\exp(X\beta_j)}{1 + \sum_{h=1}^{J} \exp(X\beta_h)}, \qquad j = 1, \dots, J$$

Because the probabilities sum to unity,

$$P(Y = 0 \mid X) = \frac{1}{1 + \sum_{h=1}^{J} \exp(X\beta_h)}$$

If $J = 1$, let $\beta_1 = \beta$ and we have the binary logit model.

Note, the model is not derived from an assumption that the errors in a latent model are logistic. Rather, the response probabilities are assumed to be a logistic function.

Partial effects are complicated. For continuous $X_k$,

$$\frac{\partial P(Y = j \mid X)}{\partial X_k} = P(Y = j \mid X)\left[\beta_{jk} - \frac{\sum_{h=1}^{J} \beta_{hk} \exp(X\beta_h)}{1 + \sum_{h=1}^{J} \exp(X\beta_h)}\right].$$

Even the direction of the effect is not determined by $\beta_{jk}$.

A simpler interpretation of $\beta_j$ is given by the odds ratio

$$\frac{P(Y = j \mid X)}{P(Y = 0 \mid X)} = \exp(X\beta_j),$$

whose change from a change $\Delta X_k$ is approximately $\beta_{jk} \exp(X\beta_j) \Delta X_k$.
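As a quick check on these formulas, here is a minimal numpy sketch (not from the notes; the function names and test values are invented) that computes the response probabilities and compares the analytic partial effect with a finite difference:

```python
import numpy as np

def mnl_probs(x, B):
    """Multinomial logit response probabilities.
    x : (K,) covariate vector with first element 1
    B : (J, K) coefficients beta_1, ..., beta_J (outcome 0 is the base)
    """
    expxb = np.exp(B @ x)                  # exp(X beta_h), h = 1, ..., J
    return np.concatenate(([1.0], expxb)) / (1.0 + expxb.sum())

def mnl_partial(x, B, j, k):
    """Analytic partial effect dP(Y = j | X)/dX_k for j >= 1."""
    p = mnl_probs(x, B)
    return p[j] * (B[j - 1, k] - B[:, k] @ p[1:])   # beta_jk minus weighted avg

rng = np.random.default_rng(0)
B = rng.normal(size=(2, 3))                # J = 2 alternatives beyond the base
x = np.array([1.0, 0.5, -1.2])

eps = 1e-6
x2 = x.copy(); x2[1] += eps
numeric = (mnl_probs(x2, B)[1] - mnl_probs(x, B)[1]) / eps
print(mnl_partial(x, B, j=1, k=1), numeric)        # the two should agree
```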
The sign of $\beta_{jk}$ determines the direction of the effect on the odds ratio. To simplify the analysis even further,

$$\ln\frac{P(Y = j \mid X)}{P(Y = 0 \mid X)} = X\beta_j,$$

so that both the sign and the magnitude are determined by $\beta_{jk}$. In general,

$$\ln\frac{P(Y = j \mid X)}{P(Y = h \mid X)} = X(\beta_j - \beta_h).$$

Another useful fact. Because

$$P(Y = j \text{ or } Y = h \mid X) = P(Y = j \mid X) + P(Y = h \mid X),$$

it follows that

$$P(Y = j \mid Y = j \text{ or } Y = h, X) = \frac{P(Y = j \mid X)}{P(Y = j \mid X) + P(Y = h \mid X)} = \frac{\exp(X\beta_j)}{\exp(X\beta_j) + \exp(X\beta_h)},$$

which simplifies as

$$\frac{\exp[X(\beta_j - \beta_h)]}{1 + \exp[X(\beta_j - \beta_h)]},$$

which is a logistic function. Conditional on the choice being either j or h, the probability that the outcome is j follows a standard logit model with parameter vector $\beta_j - \beta_h$.

The density of $Y$ given $X$ is fully specified, so we use ML estimation. For each observation the conditional log-likelihood is

$$l_t(\beta) = \sum_{j=0}^{J} 1(y_t = j) \ln[p_j(x_t, \beta)]$$

with $p_j(x_t, \beta) = P(Y_t = j \mid x_t)$.

McFadden (1974) establishes that the log-likelihood is globally concave, so the MLE is CAN (consistent and asymptotically normal).
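Because the log-likelihood is globally concave, direct numerical maximization is reliable from any starting value. A sketch on simulated data (the sample size, seed, and parameter values below are illustrative, not from the notes):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, K, J = 2000, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
B_true = rng.normal(size=(J, K))

# generate outcomes from the multinomial logit response probabilities
P = np.column_stack([np.ones(n), np.exp(X @ B_true.T)])
P /= P.sum(axis=1, keepdims=True)
y = np.array([rng.choice(J + 1, p=p) for p in P])

def neg_loglik(b_flat):
    B = b_flat.reshape(J, K)
    U = np.column_stack([np.zeros(n), X @ B.T])      # outcome 0 normalized
    logp = U - np.logaddexp.reduce(U, axis=1, keepdims=True)
    return -logp[np.arange(n), y].sum()              # -sum_t l_t(beta)

fit = minimize(neg_loglik, np.zeros(J * K), method="BFGS")
print(fit.x.reshape(J, K).round(2))                  # close to B_true
```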
Example: School and Employment Decisions - Young Men

Sample of men in 1987
Y = 0 enrolled in school
Y = 1 home (not in school, not working)
Y = 2 working
X: education, quadratic in work experience, black binary indicator
1,717 observations: 99 in school, 332 at home, 1,286 working

Coefficient Estimates

             Y = 1 (home)    Y = 2 (working)
educ         -.674 (.070)    -.315 (.065)
exper        -.106 (.173)     .849 (.157)
exper^2      -.013 (.025)    -.077 (.023)
black         .813 (.303)     .311 (.282)
constant     10.28 (1.13)     5.54 (1.09)

For the home column, each experience term is insignificant. A Wald test for joint significance yields a p-value of .047, so they are significant at the 5 percent level; we would probably leave the coefficients in $\beta_1$ unrestricted rather than setting them to zero.

To interpret, the log-odds between at home and enrolled in school is

$$\ln\frac{p_1}{p_0} = X\beta_1$$

Effect of one more year of school: reduces the log-odds by .674
Impact of race: log-odds .813 higher for black men

These magnitudes are hard to interpret. Compute the partial effect of one more year of education for: black, 5 years of experience, 12 years of education:

$$\frac{\partial P(Y = 1 \mid X)}{\partial X_{educ}} = .095(-.674 + .345) \approx -.031$$

As education degrees are discrete, compare 12 years (HS) with 16 years (college) from above:

$P(Y = 1 \mid \cdot, \text{12 years schooling}) = .095$
$P(Y = 1 \mid \cdot, \text{16 years schooling}) = .024$

Graduating from college, rather than HS, reduces the at-home probability by .071. Similar calculations reveal it raises the employment probability by .041. As the total change across all three responses must be zero, the in-school probability must be raised by .030.

Prediction

For each observation t, the outcome with the highest estimated probability is the predicted outcome

Overall: 80 percent correctly predicted
Employed: 95.2 percent correct
In school: 12.1 percent correct
At home: 39.2 percent correct
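The numbers in this example can be reproduced from the reported coefficients. A short sketch (the helper `probs` is mine), evaluated for a black man with 5 years of experience:

```python
import numpy as np

b1 = np.array([10.28, -.674, -.106, -.013, .813])   # Y = 1 (home)
b2 = np.array([5.54, -.315, .849, -.077, .311])     # Y = 2 (working)

def probs(educ, exper=5, black=1):
    x = np.array([1.0, educ, exper, exper**2, black])
    e = np.array([1.0, np.exp(x @ b1), np.exp(x @ b2)])
    return e / e.sum()                    # P(Y=0), P(Y=1), P(Y=2)

p12, p16 = probs(12), probs(16)
print(p12[1].round(3), p16[1].round(3))   # .095 and .024 at home
print((p16 - p12).round(3))               # changes sum to zero across outcomes

# analytic partial effect of education on P(Y = 1 | X) at 12 years
dp1 = p12[1] * (b1[1] - (b1[1] * p12[1] + b2[1] * p12[2]))
print(dp1.round(3))                       # about -.031
```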
Probabilistic Choice Models

McFadden (1974) showed that a model closely related to multinomial logit can be obtained from an underlying utility comparison. The utility from choosing alternative j is

$$Y_{tj}^{*} = X_{tj}\beta + a_{tj}$$

$a_{tj}$ latent taste variables, independent of $X_{tj}$
every element of $X_{tj}$ must vary across alternatives (e.g. no constant)

$$Y_t = \arg\max\left(Y_{t0}^{*}, \dots, Y_{tJ}^{*}\right) \text{ takes values in } \{0, 1, \dots, J\}$$

Examples:
Y choice of health plan; X co-payment (differs over plans, maybe over individuals)
Y commute transport choice; X commute time (differs over individuals and choices)

Assume for each j that the $a_{tj}$ are independently distributed with cdf

$$F(a) = \exp[-\exp(-a)]$$

(type I extreme value distribution); then

$$P(Y_t = j \mid X_t) = \frac{\exp(X_{tj}\beta)}{\sum_{h=0}^{J} \exp(X_{th}\beta)}, \qquad j = 0, \dots, J.$$

These response probabilities are often termed the conditional logit model.

Marginal effects (on the response probabilities) for $k = 1, \dots, K$:

$$\frac{\partial p_j(X)}{\partial X_{jk}} = p_j(X)[1 - p_j(X)]\beta_k, \qquad j = 0, \dots, J$$

$$\frac{\partial p_j(X)}{\partial X_{hk}} = -p_j(X) p_h(X) \beta_k, \qquad j \neq h$$

multinomial logit - X specific to individuals, not alternatives (occupational choice: we do not know how much someone could make in each occupation)
conditional logit - X specific to alternatives, not individuals; the choice is made on the basis of the observable attributes of each choice

The conditional logit model has an important restriction:

$$\frac{p_j(X_j)}{p_h(X_h)} = \frac{\exp(X_j\beta)}{\exp(X_h\beta)}$$

Relative probabilities do not depend on the attributes of the other alternatives, termed independence from irrelevant alternatives (IIA).

Empirical applications often include individual-specific variables $W_t$:

$$Y_{tj}^{*} = X_{tj}\beta + W_t\gamma_j + a_{tj}$$

If $\gamma_j$ is constant across j, then $W_t$ drops out of the response probabilities:

$$\frac{p_j(X_j)}{p_h(X_h)} = \frac{\exp(X_{tj}\beta)\exp(W_t\gamma)}{\exp(X_{th}\beta)\exp(W_t\gamma)} = \frac{\exp(X_{tj}\beta)}{\exp(X_{th}\beta)}$$

Hence, no quantities that vary only across individuals are allowed in X (which has a constant coefficient across alternatives).
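Returning to the derivation of the conditional logit probabilities: a small simulation check (the attribute matrix and β are made up) that utility maximization with type I extreme value shocks generates choices at the conditional logit frequencies:

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([-1.5, 0.8])
X = rng.normal(size=(4, 2))              # 4 alternatives, 2 attributes each

n = 200_000
U = X @ beta + rng.gumbel(size=(n, 4))   # utility plus extreme value errors
choices = U.argmax(axis=1)               # Y_t = argmax of the utilities

empirical = np.bincount(choices, minlength=4) / n
logit = np.exp(X @ beta) / np.exp(X @ beta).sum()
print(empirical.round(3))
print(logit.round(3))                    # the two should be very close
```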
Classic example of the restriction imposed by IIA:
two commute types, red bus and car, each selected with prob .5
add a blue bus; commuters choose between the two bus types with prob .5
yet IIA implies $p_{\text{red bus}}/p_{\text{car}}$ remains equal to 1
so the probability of each mode of transit is 1/3!
Unlikely that the probability one drives falls from 1/2 to 1/3 just because a different color bus is added (admittedly extreme; in practice we would combine the two bus types, but it does illustrate the unwanted restrictions)

To relax the IIA assumption:

Assume $a_t$ independent multivariate normal with arbitrary correlations among the choices. This yields the conditional probit model (often termed multivariate probit). The response probabilities are complicated and involve a $J + 1$ dimensional integral; MLE is infeasible with $J > 4$, and recent work focuses on simulation.

Assume a hierarchical model; the most popular is the nested logit. Group the alternatives into S groups, with $G_s$ the set of alternatives in group s. First, choose the group; second, choose the option within the group.

Group selection:

$$P(Y \in G_s \mid X) = \frac{\alpha_s\left[\sum_{j \in G_s} \exp\left(\rho_s^{-1} X_j \beta\right)\right]^{\rho_s}}{\sum_{r=1}^{S} \alpha_r\left[\sum_{j \in G_r} \exp\left(\rho_r^{-1} X_j \beta\right)\right]^{\rho_r}}$$

Choice selection:

$$P(Y = j \mid Y \in G_s, X) = \frac{\exp\left(\rho_s^{-1} X_j \beta\right)}{\sum_{h \in G_s} \exp\left(\rho_s^{-1} X_h \beta\right)}$$

Requires a normalization, usually $\alpha_s = 1$. The response probability is the product of the two displayed probabilities.

Estimation: conditional on choosing group s, the response probabilities are conditional logit with parameter $\lambda_s \equiv \rho_s^{-1}\beta$:
first estimate $\lambda_s$ by applying conditional logit to each group
then plug $\hat{\lambda}_s$ into $P(Y \in G_s \mid X)$ and estimate $\rho_s$ by maximizing the log-likelihood

$$\sum_{t=1}^{n} \sum_{s=1}^{S} 1(Y_t \in G_s) \log\left[q_s\left(x_t; \hat{\lambda}, \rho\right)\right]$$

with $q_s$ the group selection probability with $\hat{\lambda}_s$ plugged in for $\rho_s^{-1}\beta$.
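A numpy sketch of the two displayed nested logit formulas under the $\alpha_s = 1$ normalization (the grouping, β, and ρ values below are illustrative, not from the notes):

```python
import numpy as np

def nested_logit_probs(X, beta, groups, rho):
    """X : (J, K) alternative attributes; groups : list of index arrays;
    rho : (S,) dissimilarity parameters. Returns the J choice probabilities."""
    # inclusive values [sum_{j in G_s} exp(rho_s^-1 X_j beta)]^rho_s
    iv = np.array([np.exp(X[g] @ beta / r).sum() ** r
                   for g, r in zip(groups, rho)])
    p_group = iv / iv.sum()                  # P(Y in G_s | X)
    p = np.zeros(X.shape[0])
    for s, g in enumerate(groups):
        e = np.exp(X[g] @ beta / rho[s])
        p[g] = p_group[s] * e / e.sum()      # product of the two probabilities
    return p

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 2))                  # 4 alternatives in 2 groups
p = nested_logit_probs(X, beta=np.array([1.0, -0.5]),
                       groups=[np.array([0, 1]), np.array([2, 3])],
                       rho=np.array([0.6, 0.9]))
print(p, p.sum())   # with rho_s = 1 this collapses to conditional logit
```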
Ordered Response

The values assigned to Y are not arbitrary.

Example: Credit rating
Y = 0 lowest rating, Y = 6 highest rating
Y remains ordinal: we cannot say the difference between 4 and 2 is twice as important as the difference between 1 and 0.

Ordered Probit

$$Y^{*} = X\beta + U, \qquad U \mid X \sim N(0, 1)$$

$X$ a $1 \times K$ vector that does not contain a constant. Define $\alpha_1 < \dots < \alpha_J$ to be threshold parameters (cut points):

Y = 0 if $Y^{*} \le \alpha_1$
Y = 1 if $\alpha_1 < Y^{*} \le \alpha_2$
$\vdots$
Y = J if $\alpha_J < Y^{*}$

Distribution of $Y \mid X$:

$$P(Y = 0 \mid X) = P(Y^{*} \le \alpha_1 \mid X) = \Phi(\alpha_1 - X\beta)$$
$$P(Y = 1 \mid X) = P(\alpha_1 < Y^{*} \le \alpha_2 \mid X) = \Phi(\alpha_2 - X\beta) - \Phi(\alpha_1 - X\beta)$$
$$\vdots$$
$$P(Y = J \mid X) = P(\alpha_J < Y^{*} \mid X) = 1 - \Phi(\alpha_J - X\beta)$$

If J = 1,

$$P(Y = 1 \mid X) = 1 - P(Y = 0 \mid X) = 1 - \Phi(\alpha_1 - X\beta) = \Phi(X\beta - \alpha_1)$$

Thus $-\alpha_1$ is the intercept, which is why X does not contain a constant. (Of course, with only two outcomes, we set $\alpha_1 = 0$ and estimate an intercept.)

For each observation, the conditional log-likelihood is

$$1(Y_t = 0)\ln[\Phi(\alpha_1 - X_t\beta)] + \dots + 1(Y_t = J)\ln[1 - \Phi(\alpha_J - X_t\beta)]$$

Replacing $\Phi$ with the logit function delivers ordered logit.

Marginal effects:

$$\frac{\partial p_0(X)}{\partial X_k} = -\beta_k \phi(\alpha_1 - X\beta)$$

$$\frac{\partial p_1(X)}{\partial X_k} = -\beta_k\left[\phi(\alpha_2 - X\beta) - \phi(\alpha_1 - X\beta)\right]$$

For most outcomes, $\beta_k$ alone does not determine the sign of the effect.
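A compact sketch of the ordered probit cell probabilities and log-likelihood above (the cutpoints and index values are made up):

```python
import numpy as np
from scipy.stats import norm

def cell_probs(xb, alpha):
    """xb : (n,) index values X*beta; alpha : (J,) increasing cutpoints.
    Returns an (n, J+1) matrix of P(Y = j | X)."""
    cdf = norm.cdf(alpha[None, :] - xb[:, None])     # Phi(alpha_j - X beta)
    cdf = np.column_stack([np.zeros_like(xb), cdf, np.ones_like(xb)])
    return np.diff(cdf, axis=1)                      # adjacent differences

def loglik(y, xb, alpha):
    p = cell_probs(xb, alpha)
    return np.log(p[np.arange(len(y)), y]).sum()

alpha = np.array([-0.5, 0.4, 1.3])                   # J = 3 cutpoints
xb = np.array([0.0, 1.0, -2.0])
print(cell_probs(xb, alpha).sum(axis=1))             # each row sums to one
print(loglik(np.array([0, 2, 3]), xb, alpha))
```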
Interval-Coded Data

Unlike ordered probit, we do not need to estimate the interval points: the unknown $\{\alpha_j\}$ are replaced with known interval endpoints $\{a_j\}$.

Example: Y family wealth, with the $a_j$ known bracket endpoints

Interest centers on $E(Y \mid X)$, but only the interval in which Y falls is observed. Assume

$$Y \mid X \sim N(X\beta, \sigma^2)$$

Use the same log-likelihood function as for ordered probit, but let $(a_j - X\beta)/\sigma$ replace $\alpha_j - X\beta$. Interpret $\beta_j$ as if we had observed Y and run OLS. This follows from the strong assumption that $Y \mid X$ follows the classical linear model.
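For concreteness, a sketch of the interval regression likelihood just described, with hypothetical wealth brackets $a_j$; σ is estimated on the log scale to keep it positive (names and numbers are illustrative):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

a = np.array([0.0, 50.0, 100.0, 250.0])     # known bracket endpoints (made up)

def neg_loglik(theta, X, y):
    """theta = (beta, log sigma); y = observed bracket index 0, ..., J."""
    beta, sigma = theta[:-1], np.exp(theta[-1])
    z = (a[None, :] - (X @ beta)[:, None]) / sigma   # (a_j - X beta) / sigma
    cdf = np.column_stack([np.zeros(len(y)), norm.cdf(z), np.ones(len(y))])
    return -np.log(np.diff(cdf, axis=1)[np.arange(len(y)), y]).sum()

# usage, given data X (n x K) and bracket indicators y:
#   fit = minimize(neg_loglik, x0=np.zeros(K + 1), args=(X, y))
```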