Lecture 1. Behavioral Models Multinomial Logit: Power and limitations. Cinzia Cirillo


Overview
1. Choice Probabilities
2. Power and Limitations of Logit
   1. Taste variation
   2. Substitution patterns
   3. Repeated choices over time (i.e., panel data)
3. Nonlinear Representative Utility
4. Consumer Surplus
5. Derivative and Elasticity
6. Exogenous Estimation
7. Goodness of Fit
8. Hypothesis Testing

1. Choice Probabilities Logit is the easiest and most widely used discrete choice model. Its popularity is due to the fact that its choice probabilities take a closed form and are readily interpretable. Suppose a decision maker, labeled $n$, faces $J$ alternatives. The utility that person $n$ obtains from alternative $j$ is written as $U_{nj} = V_{nj} + \varepsilon_{nj}$, where $V_{nj}$ (representative utility) is observed by the researcher and $\varepsilon_{nj}$ is an unobserved random variable. The logit model is obtained by assuming that each $\varepsilon_{nj}$ is independently, identically distributed (iid) extreme value. This distribution is also called the Gumbel or type I extreme value distribution, with density and cumulative distribution:

$$f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}}\, e^{-e^{-\varepsilon_{nj}}}, \qquad F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}}$$

1. Choice Probabilities (cont.) The variance of this distribution is $\pi^2/6$. By assuming this variance, we are implicitly normalizing the scale of utility.
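A quick numerical check of the variance claim: the sketch below draws standard Gumbel variates by inverting the CDF $F(\varepsilon) = e^{-e^{-\varepsilon}}$ and compares the sample variance with $\pi^2/6 \approx 1.645$. The function names, seed, and sample size are illustrative choices, not part of the lecture.

```python
import math
import random


def draw_gumbel(rng: random.Random) -> float:
    """One standard type I extreme value draw via the inverse CDF:
    solving u = exp(-exp(-e)) gives e = -ln(-ln(u)), with u uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, and ln(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def sample_variance(draws: list[float]) -> float:
    """Unbiased sample variance."""
    mean = sum(draws) / len(draws)
    return sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)


rng = random.Random(42)  # fixed seed so the run is reproducible
draws = [draw_gumbel(rng) for _ in range(200_000)]
print(f"simulated variance: {sample_variance(draws):.3f}")
print(f"pi^2 / 6:           {math.pi ** 2 / 6:.3f}")
```

The mean of these draws is not zero (it is Euler's constant, about 0.577), but adding the same constant to every alternative's error leaves choice probabilities unchanged.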

1. Choice Probabilities (cont.) The difference between two extreme value variables is distributed logistic. That is, if $\varepsilon_{nj}$ and $\varepsilon_{ni}$ are iid extreme value, then $\varepsilon^*_{nji} = \varepsilon_{nj} - \varepsilon_{ni}$ follows the logistic distribution:

$$F(\varepsilon^*_{nji}) = \frac{e^{\varepsilon^*_{nji}}}{1 + e^{\varepsilon^*_{nji}}}$$

The key assumption is that the errors are independent of each other. This independence means that the unobserved portion of utility for one alternative is unrelated to the unobserved portion of utility for another alternative. This assumption is not as restrictive as it might first seem: independence amounts to saying that the researcher has specified $V_{nj}$ well enough that the remaining, unobserved portion of utility is essentially white noise.
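The logistic result can be checked by simulation. This sketch (illustrative names, arbitrary seed and sample size) forms $\varepsilon_{nj} - \varepsilon_{ni}$ from pairs of iid standard Gumbel draws and compares the empirical CDF with the logistic CDF at a few points.

```python
import math
import random


def draw_gumbel(rng: random.Random) -> float:
    """Standard Gumbel draw by inverting F(e) = exp(-exp(-e))."""
    u = rng.random()
    while u == 0.0:  # guard against ln(0)
        u = rng.random()
    return -math.log(-math.log(u))


def logistic_cdf(x: float) -> float:
    """F(x) = e^x / (1 + e^x)."""
    return math.exp(x) / (1.0 + math.exp(x))


rng = random.Random(7)
n_draws = 100_000
# eps*_nji = eps_nj - eps_ni, using two independent draws each time
diffs = [draw_gumbel(rng) - draw_gumbel(rng) for _ in range(n_draws)]

for x in (-2.0, 0.0, 1.0):
    empirical = sum(d <= x for d in diffs) / n_draws
    print(f"x = {x:+.1f}: empirical {empirical:.3f}, logistic {logistic_cdf(x):.3f}")
```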

1. Choice Probabilities (cont.) If the researcher thinks that the unobserved portion of utility is correlated over alternatives, then he/she can:
1. Use a different model that allows for correlated errors,
2. Re-specify representative utility so that the source of the correlation is captured explicitly and thus the remaining errors are independent, or
3. Use the logit model under the current specification of representative utility, considering the model to be an approximation.

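1. Choice Probabilities (cont.) The probability that decision maker $n$ chooses alternative $i$ is:

$$P_{ni} = \Pr(\varepsilon_{nj} < \varepsilon_{ni} + V_{ni} - V_{nj} \;\; \forall\, j \neq i)$$

If $\varepsilon_{ni}$ is taken as given, this expression is the cumulative distribution for each $\varepsilon_{nj}$ evaluated at $\varepsilon_{ni} + V_{ni} - V_{nj}$, which is $\exp(-\exp(-(\varepsilon_{ni} + V_{ni} - V_{nj})))$. Since the $\varepsilon$'s are independent, this cumulative distribution over all $j \neq i$ is the product of the individual cumulative distributions:

$$P_{ni} \mid \varepsilon_{ni} = \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}}$$

Of course, $\varepsilon_{ni}$ is not given, and so the choice probability is the integral of $P_{ni} \mid \varepsilon_{ni}$ over all values of $\varepsilon_{ni}$, weighted by its density:

$$P_{ni} = \int \Big( \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}} \Big)\, e^{-\varepsilon_{ni}}\, e^{-e^{-\varepsilon_{ni}}}\, d\varepsilon_{ni}$$

Some algebraic manipulation of this integral yields a succinct, closed-form expression:

$$P_{ni} = \frac{e^{V_{ni}}}{\sum_j e^{V_{nj}}}$$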
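The closed form can be checked against its definition: simulate $U_{nj} = V_{nj} + \varepsilon_{nj}$ with iid Gumbel errors, count how often each alternative has the highest utility, and compare with $e^{V_{ni}}/\sum_j e^{V_{nj}}$. A minimal sketch with hypothetical representative utilities; subtracting $\max_j V_{nj}$ before exponentiating is a standard numerical safeguard that leaves the ratios unchanged.

```python
import math
import random


def logit_probabilities(v: list[float]) -> list[float]:
    """Closed form P_i = exp(V_i) / sum_j exp(V_j).
    Subtracting max(v) avoids overflow without changing the result."""
    m = max(v)
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]


def simulated_probabilities(v: list[float], n_draws: int, rng: random.Random) -> list[float]:
    """Share of draws in which each alternative has the highest
    U_j = V_j + eps_j, with eps_j iid standard Gumbel."""
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = []
        for vj in v:
            u = rng.random()
            while u == 0.0:  # guard against ln(0)
                u = rng.random()
            utilities.append(vj - math.log(-math.log(u)))
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]


v = [1.0, 0.5, -0.3]  # hypothetical representative utilities V_n1, V_n2, V_n3
closed_form = logit_probabilities(v)
simulated = simulated_probabilities(v, 100_000, random.Random(3))
for j, (p, s) in enumerate(zip(closed_form, simulated)):
    print(f"alternative {j}: closed form {p:.3f}, simulated {s:.3f}")
```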

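1. Choice Probabilities (cont.) Logit exhibits desirable properties:
1. The log-likelihood (LL) is globally concave (for utility that is linear in parameters).
2. $0 < P_{ni} < 1$: $P_{ni}$ is never exactly 0 or 1.
3. As $V_{ni}$ rises (holding the other utilities fixed), $P_{ni}$ approaches 1, reaching it only as $V_{ni} \to \infty$; as $V_{ni}$ decreases, $P_{ni}$ approaches 0, reaching it only as $V_{ni} \to -\infty$.
If the initial probability is very low or very high, the effect of a change in $V_{ni}$ is small. A change in representative utility has its greatest effect when the probability is close to 0.5, since $\partial P_{ni}/\partial V_{ni} = P_{ni}(1 - P_{ni})$ peaks at $P_{ni} = 0.5$.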
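To see why the effect peaks at 0.5, this sketch evaluates a binary logit (with the other alternative's utility fixed at 0, a hypothetical normalization) at several values of $V_{ni}$ and prints the slope $P_{ni}(1 - P_{ni})$.

```python
import math


def binary_logit(v_i: float, v_other: float = 0.0) -> float:
    """P_i = exp(V_i) / (exp(V_i) + exp(V_other)) for two alternatives."""
    return 1.0 / (1.0 + math.exp(v_other - v_i))


def slope(v_i: float, v_other: float = 0.0) -> float:
    """dP_i/dV_i = P_i * (1 - P_i)."""
    p = binary_logit(v_i, v_other)
    return p * (1.0 - p)


for v_i in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"V_i = {v_i:+.0f}: P_i = {binary_logit(v_i):.3f}, dP_i/dV_i = {slope(v_i):.3f}")
```

At $V_{ni} = 0$ the probability is 0.5 and the slope reaches its maximum of 0.25; at $V_{ni} = \pm 4$ the same unit change in utility barely moves the probability.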

1. Choice Probabilities (cont.) Consider a binary choice situation first: a household's choice between a gas and an electric heating system.

$$U_g = \beta_1 PP_g + \beta_2 OC_g + \varepsilon_g, \qquad U_e = \beta_1 PP_e + \beta_2 OC_e + \varepsilon_e$$

where PP = purchase price ($\beta_1 < 0$) and OC = operating cost ($\beta_2 < 0$). If $\varepsilon_g$ and $\varepsilon_e$ are iid extreme value, then:

$$P_g = \frac{e^{\beta_1 PP_g + \beta_2 OC_g}}{e^{\beta_1 PP_g + \beta_2 OC_g} + e^{\beta_1 PP_e + \beta_2 OC_e}}$$

The ratio $\beta_2/\beta_1$ represents the household's willingness to pay (WTP) for operating-cost reductions. If $\beta_1 = -0.20$ and $\beta_2 = -1.14$, then WTP $= (-1.14)/(-0.20) = 5.70$: the household is willing to pay 5.70 dollars more for a system whose annual operating cost is one dollar less.
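The WTP interpretation can be verified directly: raising the gas system's purchase price by WTP while lowering its operating cost by one unit leaves $V_g$, and hence $P_g$, unchanged. The coefficients are the slide's; the attribute levels are hypothetical.

```python
import math

BETA_1 = -0.20  # coefficient on purchase price PP (value from the slide)
BETA_2 = -1.14  # coefficient on annual operating cost OC (value from the slide)


def prob_gas(pp_g: float, oc_g: float, pp_e: float, oc_e: float) -> float:
    """Binary logit probability of choosing the gas system."""
    v_g = BETA_1 * pp_g + BETA_2 * oc_g
    v_e = BETA_1 * pp_e + BETA_2 * oc_e
    return math.exp(v_g) / (math.exp(v_g) + math.exp(v_e))


wtp = BETA_2 / BETA_1
print(f"WTP = {wtp:.2f}")  # 5.70

# Hypothetical attribute levels, in the same units as the slide's coefficients.
base = prob_gas(pp_g=8.0, oc_g=2.0, pp_e=6.0, oc_e=3.0)
# Raise the gas purchase price by WTP while cutting its operating cost by 1:
# V_g changes by BETA_1 * wtp - BETA_2 = 0, so P_g is unchanged.
traded = prob_gas(pp_g=8.0 + wtp, oc_g=1.0, pp_e=6.0, oc_e=3.0)
print(f"P_g before: {base:.4f}, after the trade-off: {traded:.4f}")
```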

2. Power and Limitations of Logit Three topics elucidate the power of logit models to represent choice behavior and delineate the limits to that power: taste variation, substitution patterns, and repeated choices over time. The applicability of logit models can be summarized as follows:
1. Logit can represent systematic taste variation (linked to observed characteristics of the decision maker), but not random taste variation.
2. Logit implies proportional substitution across alternatives.
3. Logit can handle repeated choices over time if the unobserved factors are independent over time, but it cannot capture dynamics in the unobserved factors.

2.1 Taste Variation Logit can capture taste variation that varies systematically with observed variables, but it cannot handle random taste variation (variation that is not linked to observed characteristics). Consider households' choice among makes and models of cars to buy:

$$U_{nj} = \alpha_n SR_j + \beta_n PP_j + \varepsilon_{nj}$$

where PP = purchase price and SR = shoulder room. Suppose $\alpha_n$ varies with the number of members $m_n$ in the household, $\alpha_n = \rho m_n$, and $\beta_n$ is inversely related to income $I_n$, $\beta_n = \theta / I_n$. Then:

$$U_{nj} = \rho (m_n SR_j) + \theta (PP_j / I_n) + \varepsilon_{nj}$$
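Because $\rho$ and $\theta$ are fixed parameters, the taste variation enters only through the constructed variables $m_n SR_j$ and $PP_j / I_n$, which is why the model stays within logit. A minimal sketch; the parameter values and car attributes are hypothetical.

```python
import math

# Hypothetical parameter values and car attributes, for illustration only.
RHO = 0.05     # shoulder-room weight per household member
THETA = -2.0   # price coefficient, scaled by 1 / income
CARS = {
    "compact": {"SR": 50.0, "PP": 18.0},
    "sedan": {"SR": 56.0, "PP": 25.0},
}


def representative_utilities(m_n: int, income: float) -> dict[str, float]:
    """V_nj = rho * (m_n * SR_j) + theta * (PP_j / I_n)."""
    return {
        name: RHO * (m_n * car["SR"]) + THETA * (car["PP"] / income)
        for name, car in CARS.items()
    }


def logit_shares(v: dict[str, float]) -> dict[str, float]:
    """Logit probabilities over the alternatives in v."""
    denom = sum(math.exp(x) for x in v.values())
    return {name: math.exp(x) / denom for name, x in v.items()}


for m_n, income in [(2, 50.0), (5, 50.0), (2, 100.0)]:
    p = logit_shares(representative_utilities(m_n, income))
    print(f"members={m_n}, income={income:.0f}: P(sedan) = {p['sedan']:.3f}")
```

Larger households weight the sedan's extra shoulder room more, and higher-income households are less deterred by its higher price, so $P(\text{sedan})$ rises in both cases.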

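2.1 Taste Variation (cont.) Suppose instead that the value of shoulder room and purchase price varied with an observed component plus some other factors that are unobserved: $\alpha_n = \rho m_n + \mu_n$ and $\beta_n = \theta / I_n + \eta_n$. Then:

$$U_{nj} = \rho (m_n SR_j) + \mu_n SR_j + \theta (PP_j / I_n) + \eta_n PP_j + \varepsilon_{nj}$$

Since $\mu_n$ and $\eta_n$ are unobserved, the terms $\mu_n SR_j$ and $\eta_n PP_j$ become part of the unobserved component of utility:

$$\tilde\varepsilon_{nj} = \mu_n SR_j + \eta_n PP_j + \varepsilon_{nj}$$

The new error term $\tilde\varepsilon_{nj}$ is not independently and identically distributed, as the logit formulation requires: $\mu_n$ and $\eta_n$ enter the errors of all alternatives for person $n$, so the errors are correlated across alternatives, and their variance varies with $SR_j$ and $PP_j$.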
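The failure of the iid assumption can be made explicit for the composite error $\tilde\varepsilon_{nj} = \mu_n SR_j + \eta_n PP_j + \varepsilon_{nj}$. Assuming, for illustration only, that $\mu_n$ and $\eta_n$ have mean zero and variances $\sigma_\mu^2$ and $\sigma_\eta^2$, and are independent of each other and of the $\varepsilon_{nj}$ (iid standard extreme value, variance $\pi^2/6$):

```latex
\begin{aligned}
\operatorname{Var}(\tilde\varepsilon_{nj}) &= \sigma_\mu^2\, SR_j^2 + \sigma_\eta^2\, PP_j^2 + \frac{\pi^2}{6}, \\
\operatorname{Cov}(\tilde\varepsilon_{nj}, \tilde\varepsilon_{nk}) &= \sigma_\mu^2\, SR_j\, SR_k + \sigma_\eta^2\, PP_j\, PP_k, \qquad j \neq k.
\end{aligned}
```

Unless $\sigma_\mu^2 = \sigma_\eta^2 = 0$, the errors are correlated across alternatives and their variances differ with each alternative's shoulder room and price.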

2.2 Substitution Pattern An increase in the probability of one alternative necessarily means a decrease in probability for other alternatives. For any two alternatives $i$ and $k$, the ratio of the logit probabilities is:

$$\frac{P_{ni}}{P_{nk}} = \frac{e^{V_{ni}} / \sum_j e^{V_{nj}}}{e^{V_{nk}} / \sum_j e^{V_{nj}}} = \frac{e^{V_{ni}}}{e^{V_{nk}}} = e^{V_{ni} - V_{nk}}$$

This ratio does not depend on any alternatives other than $i$ and $k$. The logit model therefore exhibits independence from irrelevant alternatives, or IIA.
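IIA in code: adding a third alternative changes every probability, but not the ratio between the two original ones. The alternative names and utilities are hypothetical.

```python
import math


def logit_probabilities(v: dict[str, float]) -> dict[str, float]:
    """P_i = exp(V_i) / sum_j exp(V_j) over the alternatives in v."""
    denom = sum(math.exp(x) for x in v.values())
    return {name: math.exp(x) / denom for name, x in v.items()}


# Hypothetical representative utilities for two alternatives.
v_two = {"car": 1.0, "bus": 0.2}
# Add a third alternative; car and bus utilities are unchanged.
v_three = {**v_two, "train": 0.6}

p_two = logit_probabilities(v_two)
p_three = logit_probabilities(v_three)
ratio_two = p_two["car"] / p_two["bus"]
ratio_three = p_three["car"] / p_three["bus"]
print(f"P(car)/P(bus) with 2 alternatives: {ratio_two:.4f}")
print(f"P(car)/P(bus) with 3 alternatives: {ratio_three:.4f}")
print(f"exp(V_car - V_bus):                {math.exp(1.0 - 0.2):.4f}")
```

Both original probabilities fall by the same proportion when the new alternative is added, which is the proportional substitution pattern of logit.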

2.2 Substitution Pattern (cont.) Because of IIA, it is possible to estimate model parameters consistently on a subset of alternatives for each sampled decision maker. For example, in a situation with 100 alternatives, the researcher might, to reduce computer time, estimate on a subset of 10 alternatives for each sampled person: the person's chosen alternative plus 9 alternatives randomly selected from the remaining 99. Since relative probabilities within a subset of alternatives are unaffected by the attributes or existence of alternatives not in the subset, excluding alternatives in estimation does not affect the consistency of the estimator.
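The sampling step described above can be sketched as follows; the function name and the observed choices are hypothetical, and the 100/10 sizes follow the slide's example.

```python
import random
from typing import Optional


def sample_choice_set(chosen: int, n_alternatives: int = 100, subset_size: int = 10,
                      rng: Optional[random.Random] = None) -> list[int]:
    """Chosen alternative plus (subset_size - 1) alternatives drawn uniformly,
    without replacement, from the remaining n_alternatives - 1."""
    rng = rng or random.Random()
    others = [j for j in range(n_alternatives) if j != chosen]
    return [chosen] + rng.sample(others, subset_size - 1)


rng = random.Random(0)
observed_choices = [17, 3, 88]  # hypothetical chosen alternatives for three people
for chosen in observed_choices:
    print(chosen, sample_choice_set(chosen, rng=rng))
```

In estimation, each person's logit probabilities would then be computed over that person's own 10 alternatives only, with the chosen one at position 0.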

2.2 Substitution Pattern (cont.) To test for IIA:
1. Estimate the model.
2. Re-estimate the model using a subset of the alternatives.
3. If IIA holds, the parameters obtained with the subset of alternatives will not be significantly different from those of the full model.
IIA can also be tested with mixed logit by testing whether the variance of the mixing distribution is in fact zero. However, when IIA fails, these tests do not provide much guidance on which specification to use instead of logit.
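Step 3 is commonly made precise with the Hausman-McFadden test, which the slide does not name. With $\hat\beta_s$ and $\hat\Omega_s$ the estimates and covariance matrix from the subset of alternatives, and $\hat\beta_f$ and $\hat\Omega_f$ those from the full model, restricted to the $K$ parameters that can be estimated in both:

```latex
H = (\hat\beta_s - \hat\beta_f)^{\top} \big[\hat\Omega_s - \hat\Omega_f\big]^{-1} (\hat\beta_s - \hat\beta_f)
\;\overset{a}{\sim}\; \chi^2_K \quad \text{under IIA}
```

Under IIA both estimators are consistent and the full-model one is efficient, so large values of $H$ are evidence against IIA. In finite samples $\hat\Omega_s - \hat\Omega_f$ need not be positive definite, in which case the statistic is not usable as written.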

2.3 Panel Data If the unobserved factors that affect decision makers are independent over the repeated choices, then logit can be used to examine panel data in the same way as purely cross-sectional data. Any dynamics related to observed factors that enter the decision process can be accommodated. However, dynamics associated with unobserved factors cannot be handled.

2.3 Panel Data (cont.) The utility that decision maker $n$ obtains from alternative $j$ in period or choice situation $t$ is $U_{njt} = V_{njt} + \varepsilon_{njt}$. Dynamic aspects of behavior can be captured by specifying representative utility in each period to depend on observed variables from other periods. For example, a lagged price response is represented by entering the price in period $t-1$ as an explanatory variable in the utility for period $t$. Similarly, inertia or habit formation can be captured by entering the previous choice as an explanatory variable:

$$V_{njt} = \alpha\, y_{nj(t-1)} + \beta x_{njt}$$

where $y_{njt} = 1$ if $n$ chose $j$ in period $t$ and 0 otherwise. A lagged dependent variable can be added without inducing bias as long as the errors are independent over time (i.e., $y_{nj(t-1)}$ is not correlated with $\varepsilon_{njt}$).
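Building the lagged-choice term is mechanical: for each period after the first, set $y_{nj(t-1)} = 1$ for the alternative chosen in the previous period. A minimal sketch for one person; the function name, panel data, and parameter values are hypothetical.

```python
def lagged_choice_utilities(choices: list[int], x: list[list[float]],
                            alpha: float, beta: float) -> list[list[float]]:
    """V_njt = alpha * y_nj(t-1) + beta * x_njt for t = 1, ..., T-1.

    choices[t] is the index of the alternative person n chose in period t;
    x[t][j] is the observed attribute of alternative j in period t.
    Period 0 is skipped because it has no lagged choice."""
    v = []
    for t in range(1, len(choices)):
        previous = choices[t - 1]
        v.append([
            alpha * (1.0 if j == previous else 0.0) + beta * x_tj
            for j, x_tj in enumerate(x[t])
        ])
    return v


# Hypothetical panel: one person, three periods, two alternatives.
choices = [0, 1, 1]
x = [[1.0, 2.0], [1.0, 2.0], [3.0, 1.0]]
for t, v_t in enumerate(lagged_choice_utilities(choices, x, alpha=0.5, beta=-0.1), start=1):
    print(f"period {t}: V = {[round(val, 3) for val in v_t]}")
```

Under the independence condition above, these utilities go straight into the standard logit formula for each period; the lagged choice is treated as just another observed variable.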

3. Nonlinear Utility Models have been developed and widely used for travelers' choice of destination for various types of trips, such as shopping trips, within a metropolitan area. The utility depends on time and cost plus some attraction variables (e.g., residential population and retail employment), collected in the vector $a_j$ for zone $j$. How large an area to include in each zone is fairly arbitrary, so we want a model that is not sensitive to the level of aggregation in the zonal definitions. Consider zones $j$ and $k$, which, when combined, are labeled zone $c$. The population and employment in the combined zone are necessarily the sums of those in the two original zones: $a_j + a_k = a_c$.

3. Nonlinear Utility (cont.) For the combined zone $c$, the model must satisfy:

$$P_{nj} + P_{nk} = P_{nc}$$

which for logit models takes the form:

$$\frac{e^{V_{nj}} + e^{V_{nk}}}{e^{V_{nj}} + e^{V_{nk}} + \sum_{l \neq j,k} e^{V_{nl}}} = \frac{e^{V_{nc}}}{e^{V_{nc}} + \sum_{l \neq j,k} e^{V_{nl}}}$$

This holds if $e^{V_{nj}} + e^{V_{nk}} = e^{V_{nc}}$. To specify a destination choice model that is not sensitive to the level of zonal aggregation, representative utility needs to be specified with parameters inside a log operation.
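One specification that satisfies the condition $e^{V_{nj}} + e^{V_{nk}} = e^{V_{nc}}$, assumed here for illustration, is $V_{nj} = \beta\, t_{nj} + \ln(\gamma' a_j)$, where $t_{nj}$ is travel time and the attraction weights $\gamma$ sit inside the log. If zones $j$, $k$ and the combined zone $c$ share the same travel time $t$, then $e^{V_{nj}} + e^{V_{nk}} = e^{\beta t}(\gamma' a_j + \gamma' a_k) = e^{\beta t}\,\gamma' a_c = e^{V_{nc}}$. The sketch checks this numerically with hypothetical values.

```python
import math

BETA = -0.1          # hypothetical travel-time coefficient
GAMMA = (1.0, 2.5)   # hypothetical weights on (population, retail employment)


def v(time: float, attractions: tuple[float, float]) -> float:
    """V = beta * time + ln(gamma' a): attraction parameters inside the log."""
    return BETA * time + math.log(sum(g * a for g, a in zip(GAMMA, attractions)))


def logit_probabilities(vs: list[float]) -> list[float]:
    denom = sum(math.exp(x) for x in vs)
    return [math.exp(x) / denom for x in vs]


a_j, a_k, a_l = (1000.0, 200.0), (500.0, 300.0), (2000.0, 100.0)
a_c = tuple(x + y for x, y in zip(a_j, a_k))  # a_c = a_j + a_k

# Zones j and k share travel time 20; the other zone l has travel time 30.
p_split = logit_probabilities([v(20.0, a_j), v(20.0, a_k), v(30.0, a_l)])
p_merged = logit_probabilities([v(20.0, a_c), v(30.0, a_l)])
print(f"P_j + P_k = {p_split[0] + p_split[1]:.6f}")
print(f"P_c       = {p_merged[0]:.6f}")
```

With attraction variables entering linearly (no log), the same check generally fails, so the predicted share of the combined area would depend on how the zones were drawn.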

4. Consumer Surplus A person's consumer surplus is the utility, in dollar terms, that the person receives in the choice situation. For policy analysis, the researcher is often interested in measuring the change in consumer surplus associated with a particular policy. For example, if a new alternative is being considered, such as building a light rail system in a city, then it is important to measure the benefits of the project to see whether they warrant the costs. Similarly, a change in the attributes of an alternative can have an impact on consumer surplus that is important to assess.

4. Consumer Surplus (cont.) Consumer surplus is

$$CS_n = \frac{1}{\alpha_n} \max_j (U_{nj})$$

where $\alpha_n$ is the marginal utility of income, $dU_n/dY_n = \alpha_n$, with $Y_n$ the income of person $n$. The division by $\alpha_n$ translates utility into dollars, since $1/\alpha_n = dY_n/dU_n$. The researcher observes $V_{nj}$ but not $U_{nj}$, and so can calculate only the expected consumer surplus:

$$E(CS_n) = \frac{1}{\alpha_n} E\big[\max_j (V_{nj} + \varepsilon_{nj})\big]$$

If each $\varepsilon_{nj}$ is iid extreme value and utility is linear in income (so that $\alpha_n$ is constant with respect to income), then:

$$E(CS_n) = \frac{1}{\alpha_n} \ln\Big(\sum_{j=1}^{J} e^{V_{nj}}\Big) + C$$

where $C$ is an unknown constant that represents the fact that the absolute level of utility cannot be measured.

4. Consumer Surplus (cont.) Note that the argument of the log in the previous expression is the denominator of the logit choice probability. Aside from the division by $\alpha_n$ and the constant $C$, the expected consumer surplus in a logit model is simply the log of the denominator of the choice probability. It is often called the log-sum term. This expression gives the CS for a person, or for a population whose members all have the same representative utility. If the population contains segments with different representative utilities, the CS must be calculated for each segment and then averaged, weighting by segment size.
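Because the constant $C$ is the same before and after a policy change, it drops out of the change in expected consumer surplus, which is the quantity policy analysis uses. A minimal sketch, assuming utility linear in income (constant $\alpha_n$) and hypothetical utilities for one person before and after a light rail alternative is added.

```python
import math


def logsum(v: list[float]) -> float:
    """ln(sum_j exp(V_j)), computed stably by factoring out max(v)."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))


def change_in_consumer_surplus(v_before: list[float], v_after: list[float],
                               alpha: float) -> float:
    """Delta E(CS_n) = [logsum(after) - logsum(before)] / alpha_n.
    The unknown constant C cancels in the difference."""
    return (logsum(v_after) - logsum(v_before)) / alpha


ALPHA = 0.02                 # hypothetical marginal utility of income, per dollar
v_before = [0.5, -0.2]       # hypothetical V for car and bus
v_after = v_before + [0.1]   # same, plus a new light rail alternative
print(f"Change in expected CS: {change_in_consumer_surplus(v_before, v_after, ALPHA):.2f} dollars")
```

For a segmented population, this per-person change would be computed for each segment and then averaged with segment-size weights.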

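5. Derivative and Elasticity It is often useful to know the extent to which the choice probabilities change in response to a change in some observed factor. The change in the probability that decision maker $n$ chooses alternative $i$ given a change in an observed factor $z_{ni}$ entering the utility of alternative $i$, holding everything else constant, is:

$$\frac{\partial P_{ni}}{\partial z_{ni}} = \frac{\partial \big( e^{V_{ni}} / \sum_j e^{V_{nj}} \big)}{\partial z_{ni}} = \frac{\partial V_{ni}}{\partial z_{ni}}\, P_{ni} (1 - P_{ni})$$

Let $z_{nj}$ denote an attribute of alternative $j$. How does the probability of choosing alternative $i$ change as $z_{nj}$ increases?

$$\frac{\partial P_{ni}}{\partial z_{nj}} = \frac{\partial \big( e^{V_{ni}} / \sum_k e^{V_{nk}} \big)}{\partial z_{nj}} = -\frac{\partial V_{nj}}{\partial z_{nj}}\, P_{ni} P_{nj}$$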
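Both derivative formulas can be checked against central finite differences. A minimal sketch, assuming utility linear in a single attribute, $V_{nj} = \beta z_{nj}$ (so $\partial V_{nj}/\partial z_{nj} = \beta$), with hypothetical values of $\beta$ and $z$.

```python
import math


def logit_probabilities(v: list[float]) -> list[float]:
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


BETA = -0.5            # hypothetical coefficient, so dV_nj/dz_nj = BETA
z = [2.0, 1.0, 3.0]    # hypothetical attribute values z_n0, z_n1, z_n2


def prob_i(z_values: list[float], i: int = 0) -> float:
    """P_ni with V_nj = BETA * z_nj."""
    return logit_probabilities([BETA * zj for zj in z_values])[i]


p = logit_probabilities([BETA * zj for zj in z])
own = BETA * p[0] * (1.0 - p[0])   # dP_n0 / dz_n0
cross = -BETA * p[0] * p[1]        # dP_n0 / dz_n1


def central_difference(j: int, h: float = 1e-6) -> float:
    """Numerical dP_n0 / dz_nj by perturbing z_nj up and down."""
    up, down = list(z), list(z)
    up[j] += h
    down[j] -= h
    return (prob_i(up) - prob_i(down)) / (2.0 * h)


print(f"own:   formula {own:.6f}, numerical {central_difference(0):.6f}")
print(f"cross: formula {cross:.6f}, numerical {central_difference(1):.6f}")
```

With $\beta < 0$ (for example a cost attribute), raising $z_{n0}$ lowers $P_{n0}$ while raising $z_{n1}$ increases it.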

5. Derivative and Elasticity (cont.) Economists often measure response by elasticities rather than derivatives, since elasticities are normalized for the variables' units. An elasticity is the percentage change in one variable that is associated with a one-percent change in another variable. The elasticity of $P_{ni}$ with respect to $z_{ni}$, a variable entering the utility of alternative $i$, is:

$$E_{i z_{ni}} = \frac{\partial P_{ni}}{\partial z_{ni}} \frac{z_{ni}}{P_{ni}} = \frac{\partial V_{ni}}{\partial z_{ni}}\, z_{ni} (1 - P_{ni})$$

The cross-elasticity of $P_{ni}$ with respect to a variable entering alternative $j$ is:

$$E_{i z_{nj}} = \frac{\partial P_{ni}}{\partial z_{nj}} \frac{z_{nj}}{P_{ni}} = -\frac{\partial V_{nj}}{\partial z_{nj}}\, z_{nj} P_{nj}$$
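The elasticity formulas in code, again assuming the hypothetical linear specification $V_{nj} = \beta z_{nj}$ with illustrative values. The numerical check applies a small percentage change to $z_{nj}$ and measures the resulting percentage change in $P_{ni}$.

```python
import math


def logit_probabilities(v: list[float]) -> list[float]:
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


BETA = -0.5            # hypothetical: V_nj = BETA * z_nj, so dV_nj/dz_nj = BETA
z = [2.0, 1.0, 3.0]    # hypothetical attribute values
p = logit_probabilities([BETA * zj for zj in z])


def own_elasticity(i: int) -> float:
    """E_{i z_ni} = (dV_ni/dz_ni) * z_ni * (1 - P_ni)."""
    return BETA * z[i] * (1.0 - p[i])


def cross_elasticity(j: int) -> float:
    """E_{i z_nj} = -(dV_nj/dz_nj) * z_nj * P_nj; it does not depend on i."""
    return -BETA * z[j] * p[j]


def numerical_elasticity(i: int, j: int, h: float = 1e-6) -> float:
    """Central-difference elasticity: relative change in P_ni per relative change in z_nj."""
    up, down = list(z), list(z)
    up[j] *= 1.0 + h
    down[j] *= 1.0 - h
    p_up = logit_probabilities([BETA * x for x in up])[i]
    p_down = logit_probabilities([BETA * x for x in down])[i]
    return (p_up - p_down) / (2.0 * h * p[i])


for i in range(len(z)):
    print(f"own elasticity of P_n{i}: {own_elasticity(i):.4f}")
print(f"cross elasticity w.r.t. z_n0 (same for P_n1 and P_n2): {cross_elasticity(0):.4f}")
```

That the cross-elasticity is identical for every $i \neq j$ is the proportional substitution pattern implied by IIA.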

6. Exogenous Estimation Consider first the situation in which the sample is exogenously drawn (i.e., random or stratified random). We also assume that the explanatory variables are exogenous to the choice situation. Since the logit probabilities take a closed form, the traditional maximum-likelihood procedures can be applied. The probability of person $n$ choosing the alternative that he was actually observed to choose is

$$\prod_i (P_{ni})^{y_{ni}}$$

where $y_{ni} = 1$ if person $n$ chose $i$ and zero otherwise.

6. Exogenous Estimation (cont.) Assuming that each decision maker's choice is independent of that of other decision makers, the probability of each person in the sample choosing the alternative that he was actually observed to choose is

$$L(\beta) = \prod_{n=1}^{N} \prod_i (P_{ni})^{y_{ni}}$$

where $\beta$ is a vector containing the parameters of the model. The log-likelihood function is then

$$LL(\beta) = \sum_{n=1}^{N} \sum_i y_{ni} \ln(P_{ni})$$

$LL(\beta)$ is globally concave for linear-in-parameters utility.
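A minimal sketch of maximum likelihood for a one-parameter logit, $V_{nj} = \beta x_{nj}$, on simulated data. The data-generating values, sample size, and the use of Newton-Raphson are illustrative choices, not the lecture's. The gradient is $\sum_n \sum_i (y_{ni} - P_{ni}) x_{ni}$; because $LL(\beta)$ is globally concave there is a single maximum, and a plain Newton iteration from $\beta = 0$ finds it in this example.

```python
import math
import random


def logit_probabilities(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def log_likelihood(beta, X, choices):
    """LL(beta) = sum_n sum_i y_ni ln P_ni; only the chosen i has y_ni = 1."""
    return sum(math.log(logit_probabilities([beta * x for x in xs])[c])
               for xs, c in zip(X, choices))


def estimate_beta(X, choices, tol=1e-10, max_iter=50):
    """Newton-Raphson on the globally concave LL for V_nj = beta * x_nj."""
    beta = 0.0
    for _ in range(max_iter):
        grad, hess = 0.0, 0.0
        for xs, c in zip(X, choices):
            p = logit_probabilities([beta * x for x in xs])
            x_bar = sum(pj * xj for pj, xj in zip(p, xs))
            grad += xs[c] - x_bar  # sum_i (y_ni - P_ni) x_ni
            hess -= sum(pj * (xj - x_bar) ** 2 for pj, xj in zip(p, xs))
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta


def draw_gumbel(rng):
    u = rng.random()
    while u == 0.0:  # guard against ln(0)
        u = rng.random()
    return -math.log(-math.log(u))


# Hypothetical simulated data: 2000 people, 3 alternatives, true beta = 1.
rng = random.Random(11)
TRUE_BETA = 1.0
X = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(2000)]
# Each person picks the alternative with the highest U_nj = beta * x_nj + eps_nj.
choices = [max(range(3), key=lambda j: TRUE_BETA * xs[j] + draw_gumbel(rng)) for xs in X]

beta_hat = estimate_beta(X, choices)
print(f"beta_hat = {beta_hat:.3f}")
print(f"LL(0) = {log_likelihood(0.0, X, choices):.1f}, "
      f"LL(beta_hat) = {log_likelihood(beta_hat, X, choices):.1f}")
```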

7. Goodness of Fit A statistic called the likelihood ratio index is often used to measure how well a model fits the data (i.e., how well the model performs compared with a model in which all the parameters are zero). The likelihood ratio index is defined as

$$\rho = 1 - \frac{LL(\hat\beta)}{LL(0)}$$

where $\hat\beta$ is the vector of estimated parameters. When two models are estimated on the same data with the same set of alternatives (so that $LL(0)$ is the same for both), the model with the higher $\rho$ fits the data better.
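Computing $\rho$ requires $LL(0)$, which is simple when every person faces the same $J$ alternatives: with all parameters zero, each $P_{ni} = 1/J$. That equal-choice-set assumption, the sample size, and the value of $LL(\hat\beta)$ below are hypothetical.

```python
import math


def ll_at_zero(n_obs: int, n_alternatives: int) -> float:
    """LL(0) when every person faces the same J alternatives: each P_ni = 1/J."""
    return n_obs * math.log(1.0 / n_alternatives)


def likelihood_ratio_index(ll_beta_hat: float, ll_zero: float) -> float:
    """rho = 1 - LL(beta_hat) / LL(0)."""
    return 1.0 - ll_beta_hat / ll_zero


ll_0 = ll_at_zero(n_obs=1000, n_alternatives=3)  # about -1098.6
rho = likelihood_ratio_index(-900.0, ll_0)       # hypothetical LL(beta_hat) = -900
print(f"LL(0) = {ll_0:.1f}, rho = {rho:.3f}")
```

$\rho$ is 0 when the estimated parameters do no better than zero parameters and approaches 1 as the model predicts each observed choice with probability near 1.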

7. Goodness of Fit (cont.) Another goodness-of-fit statistic that is sometimes used, but should be avoided, is the percent correctly predicted. This statistic is calculated by identifying for each sampled decision maker the alternative with the highest probability, based on the estimated model, and determining whether or not this was the alternative that the decision maker actually chose. The statistic implicitly assumes that the alternative with the highest probability will be chosen each time, which is not what the probabilities say. Suppose an estimated model predicts choice probabilities of 0.75 and 0.25 in a two-alternative situation. Those probabilities mean that if 100 people faced the representative utilities that gave these probabilities, the researcher's best prediction of how many people would choose each alternative is 75 and 25. However, the percent correctly predicted statistic would predict that one alternative would be chosen by all 100 people.
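The slide's numerical example in code, contrasting the counts the probabilities imply with the counts the highest-probability rule implies; only the variable names are new.

```python
probs = [0.75, 0.25]   # estimated choice probabilities from the example
n_people = 100

# Best prediction of how many people choose each alternative.
expected_counts = [p * n_people for p in probs]

# The percent-correctly-predicted logic: everyone picks the highest-probability alternative.
best = max(range(len(probs)), key=lambda j: probs[j])
implied_counts = [n_people if j == best else 0 for j in range(len(probs))]

print("expected counts:", expected_counts)                      # [75.0, 25.0]
print("implied by highest-probability rule:", implied_counts)   # [100, 0]
```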

8. Hypothesis Testing Standard t-statistics are used to test hypotheses about individual parameters in discrete choice models. For hypotheses involving several parameters, the likelihood ratio test is used. Two of the most common such hypotheses are (1) several parameters are zero, and (2) two or more parameters are equal. The test statistic

$$-2\,\big(LL(\hat\beta^{R}) - LL(\hat\beta^{U})\big)$$

is used to evaluate these hypotheses, where $\hat\beta^{R}$ are the estimates under the restrictions implied by the null hypothesis and $\hat\beta^{U}$ are the unrestricted estimates. Under the null hypothesis, this statistic is asymptotically chi-squared distributed with degrees of freedom equal to the number of restrictions implied by the null hypothesis (i.e., the difference in the number of coefficients). If this value exceeds the critical value of chi-squared with the appropriate degrees of freedom, then the null hypothesis is rejected.
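A sketch of the test decision at the 5% level. The critical values are the standard tabulated chi-squared values for 1 to 5 degrees of freedom; the log-likelihood values are hypothetical. In practice a statistics library would supply p-values for any number of degrees of freedom.

```python
# Standard 5% critical values of the chi-squared distribution, by degrees of freedom.
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def lr_statistic(ll_restricted: float, ll_unrestricted: float) -> float:
    """-2 (LL_restricted - LL_unrestricted); non-negative, since restricting
    parameters cannot raise the maximized log-likelihood."""
    return -2.0 * (ll_restricted - ll_unrestricted)


def reject_null(ll_restricted: float, ll_unrestricted: float, n_restrictions: int) -> bool:
    """Reject H0 at the 5% level if the statistic exceeds the chi-squared critical value."""
    return lr_statistic(ll_restricted, ll_unrestricted) > CHI2_CRITICAL_5PCT[n_restrictions]


# Hypothetical example: H0 sets two coefficients to zero.
stat = lr_statistic(ll_restricted=-905.2, ll_unrestricted=-900.0)
print(f"LR statistic = {stat:.1f}, reject H0: {reject_null(-905.2, -900.0, 2)}")
```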