CALIBRATION OF A LOGIT DISCRETE CHOICE MODEL WITH ELIMINATION OF ALTERNATIVES


Marisol Castro macastro@ing.uchile.cl, Francisco Martínez fmartine@ing.uchile.cl, Marcela Munizaga mamuniza@ing.uchile.cl*

University of Chile, Faculty of Physical and Mathematical Sciences, Civil Engineering Department, Transportation Division, Postal Address: P.O. Box 228-3, Santiago, Chile. Tel: Fax:

* Corresponding author

ABSTRACT

Within the context of discrete choice models, defining the set of alternatives and the bounds or thresholds that attributes cannot exceed are issues so complex that in practice they are commonly simplified by assuming that the available alternatives can be established arbitrarily and exogenously, and that individuals make their choices compensatorily through a relative valuation of attributes, without limiting the range of possible values. These assumptions overlook the fact that thresholds on attributes may be part of the mechanisms for choosing or rejecting alternatives and, therefore, can act as explicit criteria in defining the choice set. Several methods have been proposed to handle this problem, and a distinction can be made between those that model the set of available alternatives and those that modify the utility function so that individuals themselves implicitly eliminate the alternatives they do not consider feasible. The latter group includes the Constrained Multinomial Logit model (CMNL) (Martinez et al., 2008). This work studies the calibration of the CMNL model, a Multinomial Logit (MNL) that, through penalization functions, includes the effect of an attribute value coming close to, or exceeding, a threshold, thus reducing the probability that the alternative is chosen. This defines the availability of alternatives implicitly within the choice process. In this paper the maximum likelihood function is studied analytically and the model is calibrated using simulated and real databases.

The simulation experiments showed that, in the cases where behavior is constrained, the CMNL model fits the data better than the classic Multinomial Logit (MNL) and produces more accurate predictions. We also analyze some identification problems present in certain specifications of the model.

KEY WORDS: discrete choice, thresholds, non-compensatory behavior.

1. INTRODUCTION

Within the framework of discrete choice, one of the main tools used to predict the demand for transportation is the family of random utility models (RUM) (McFadden, 1968), usually applied assuming that individuals exhibit compensatory behavior regarding attributes and that the set of alternatives is known and fixed. One of the problems that arises when using these models is that the value of an attribute may be excessively high or low, lying outside the range where choices are considered feasible for the consumer, thus invalidating the predictions made by the model. This problem is not present in microeconomic theory, because consumers are assumed to behave according to the indirect utility, which is constrained by the domain of choices that are feasible for them. The difficulties appear in the application of the theory, when the number of constraints is large and when one or more constraints are difficult for the modeler to identify. In these cases the modeler will face severe difficulties in identifying the correct form of the indirect utility. Moreover, in practice, data are highly limited, so modelers normally define simple utility functions, most often linear, and concentrate their effort on obtaining the best estimates of the function parameters. Therefore, in the applied field, the theoretical elegance and rigor of the indirect utility function is not usually applicable and, unless corrections are introduced, the choice model will predict consumption outside the consumers' domain.

In the search for a practical solution that gives a realistic representation of the choice process, models have been proposed that include the definition of the set of available alternatives of each individual in the modeling, so that when an alternative has an attribute that exceeds a bound or threshold, this alternative is assumed to be no longer feasible for the user. This happens, for example, when the travel time is so long that the activity at the destination cannot be performed (e.g. walking a long distance) or when the travel cost is so high that the user cannot finance the trip.

There are two formulations within discrete choice modeling that until a few years ago seemed to be incompatible: the models of elimination by aspects (EBA) (Tversky, 1972) and the RUM. The EBA models propose that the choice of an alternative may be seen as a stochastic process in which the alternatives are successively eliminated until there is just one left, describing the behavior of an individual faced with a variable set of alternatives. The RUM propose that the utility associated with each alternative depends on the attributes of the alternatives and the preferences of the individual, assuming that individuals follow a compensatory behavior, meaning that they use a strategy in which the attributes of the available alternatives can be traded off at substitution rates that maintain a level of utility. Hence, the prediction of a RUM depends on the functional form of the utility, on how the set of available alternatives is defined, and on the quality and quantity of the information gathered. So, if the choice set is poorly specified, there may be serious problems with the predictions (Williams and Ortúzar, 1982).

Based on the approaches of the EBA and RUM models, the need arises to adequately model the available alternatives and the bounds of the attributes. In practice, these issues are commonly ignored, assuming that the available alternatives can be established exogenously. In the most common approach, availability is treated as a binary variable established using arbitrary criteria, based on the fact that the attributes of the alternatives defined as available are within the feasible domain. Although in some cases it is plausible to define the sets of feasible alternatives exogenously, e.g. when modeling travel mode choices, in other cases it is very complex or plainly arbitrary, as when modeling the choice of spatial alternatives, e.g. residential location or trip destination.

Two approaches are found in the literature to create the set of available alternatives:

1. The two-stage approach initially creates the set of feasible alternatives based on an external model (Manski, 1977) and then, in the second stage, calculates the choices for the set of feasible alternatives. The main problem of this approach is the high computational cost implied by a two-stage method, even though the model is usually seen as theoretically consistent (Swait and Ben-Akiva, 1987). An interesting application of this methodology can be seen in Cantillo and Ortúzar (2005), where the individual compares each attribute to its corresponding threshold and, if the threshold is exceeded, the alternative is rejected from the choice set.

2. The single-stage (or implicit) method includes the availability of alternatives implicitly in the choice model, so that the indirect utility reflects that certain alternatives are not desirable and, therefore, not chosen. These models are more efficient because they are one-stage, but they can be criticized theoretically when it is impossible to identify the variables that affect the utility of the individual separately from those that define the set of alternatives; the attributes thus play a dual role in the choice process. Among the authors who have developed this approach are Cascetta and Papola (2001), who add a term to the utility function that represents the probability of the alternative belonging to the choice set. Another example is Swait (2001), who includes deterministic thresholds in the choice process and models the possibility that certain restrictions may be violated by the individual, which translates into a linear penalization of the utility function.

Martinez et al. (2008) criticize the Cascetta and Papola model because its parameters are difficult to calibrate and because it is not applicable to more complex situations with multiple constraints. The deficiencies of the Swait model are that it does not allow randomness in the penalizations and that the utility function is not differentiable at the threshold because of the linear penalizations. Hence, it may be unstable in the neighborhood of that point, potentially creating computational and analytical complications in the calculation of equilibrium problems. Therefore, Martinez et al. (2008) propose the Constrained Multinomial Logit model (CMNL), which extends this implicit approach to a model that is also one-stage but imposes multiple thresholds on individuals directly in the utility function, overcoming the difficulties of the previous models by specifying the penalizations in a binomial logit form.

This paper studies the feasibility of calibrating the semi-compensatory CMNL model through analytical and empirical analysis, focusing on parameter estimation and prediction capabilities. In section 2 we describe the CMNL model in detail; in section 3 we discuss the calibration method and analyze identification; in section 4 we present calibration results using simulated data and in section 5 results obtained with real data; finally, in section 6 we state some conclusions and final remarks.

2. THE CMNL MODEL

The RUM proposes that the indirect utility $U_{ni}$ provided by alternative i to individual n is known by the decision-maker but not by the modeler. It is therefore represented in the choice model by the sum of two components: a deterministic component $V_{ni}$ known by the modeler, which is a function of the vector of attributes that define the alternative, and a random component $\varepsilon_{ni}$ that represents the analyst's inability to observe all the attributes and variations in tastes that govern the behavior of individuals, as well as errors in measurement and imperfections in the information available. Therefore,

$$U_{ni} = V_{ni} + \varepsilon_{ni} \qquad (1)$$

Within the RUM, one of the most widely used models is the Multinomial Logit (MNL) (McFadden, 1974) which, assuming that the random component $\varepsilon_{ni}$ is independently and identically distributed (IID) Gumbel, offers a closed form for the choice probabilities. The probability that individual n chooses alternative i within the set of available alternatives $C_n$ is given by equation (2), where $\mu$ is the scale parameter of the Gumbel distribution:

$$P_{ni} = \frac{e^{\mu V_{ni}}}{\sum_{j \in C_n} e^{\mu V_{nj}}} \qquad (2)$$

The Constrained Multinomial Logit (CMNL) model proposed by Martinez et al. (2008) is a generalization of the above formulation that combines, in a one-stage model, a traditional compensatory deterministic component and a non-compensatory component that is activated when the attributes approach the threshold values. The utility function is defined as:

$$U_{ni}(x_i) = V_{ni}(x_i) + \frac{1}{\mu} \ln\left(\phi_{ni}(x_i)\right) + \varepsilon_{ni} \qquad (3)$$

where $x_i$ is the vector of attributes of alternative i and $\phi_{ni}$ is the penalization function or cut-off for alternative i.

In order to include the diversity of upper and lower constraints, the cut-off is defined as the product of the set of upper restrictions $\phi^U_{nik}$ and lower restrictions $\phi^L_{nik}$ on the corresponding attributes, as follows:

$$\phi_{ni} = \prod_{k=1}^{K} \phi^L_{nik}\, \phi^U_{nik} \qquad (4)$$

These thresholds are defined on the basis of a binomial logit for each attribute k and each alternative i, thus representing soft restrictions, meaning that they may be trespassed by the user, creating a penalization in the utility function when the attribute value exceeds the bound. Hence, the definition for each individual n that restricts alternative i based on attribute k is:

$$\phi^U_{nik} = \frac{1}{1 + \exp\left[\omega_k \left(x_{ik} - b_{nk} + \rho_k\right)\right]} \qquad (5)$$

$$\phi^L_{nik} = \frac{1}{1 + \exp\left[\omega_k \left(a_{nk} - x_{ik} + \rho_k\right)\right]} \qquad (6)$$

where $b_{nk}$ and $a_{nk}$ are the upper and lower bounds, respectively, that restrict the choice; $\omega_k$ is the scale parameter of the binomial logit ($\omega_k > 0$); $x_{ik}$ is the value of the restricted attribute for the corresponding alternative; and $\rho_k$ is an adjustment parameter defined by:

$$\rho_k = \frac{1}{\omega_k} \ln\left(\frac{1 - \eta_k}{\eta_k}\right) \qquad (7)$$

where $\eta_k$ corresponds to the proportion of the population violating the restriction, $\eta_k \in (0, 1)$. Note that when $\eta_k$ takes the value 0 or 1, the expression for $\rho_k$ is undefined. This comes from the fact that the binomial logit model can only predict deterministic choices (probabilities of 0 or 1) at (plus or minus) infinite values of the variables. The practical effect is that the cut-off factor necessarily allows some violation of the constraints, limited by the parameter $\eta_k$. A second implication is that the sample used to calibrate these parameters must contain both individuals who violate the restriction (who choose alternatives whose attributes exceed the imposed thresholds) and individuals who respect it (whose chosen alternative has attributes within the threshold).

Figure 1 shows how the penalization varies for different values of $\omega$ and $\eta$. It can be seen that it is approximated by a bilinear function, similar to what Swait (2001) proposed, but this function is smooth at the threshold, taking a non-linear form without forming a kink. The functional form defined for the cut-off reaches its maximum, $\phi \approx 1$, when the attribute moves away from the bound towards the interior of the feasible domain. Its value falls as the attribute approaches the bound.

When the value of the attribute exceeds the threshold, the cut-off value tends toward zero, penalizing the utility function in a non-linear way. Then $\phi_{ni}$ may be understood as the probability that individuals include alternative i within their choice set, or as the probability of respecting the restriction.
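As an illustration of equations (5) and (7), the following minimal Python sketch evaluates the upper cut-off for a single attribute. The function names `rho` and `upper_cutoff` and the numerical values are illustrative assumptions, not part of the original paper. The sketch relies only on the definitions above, under which the cut-off equals η exactly at the bound.

```python
import math

def rho(omega, eta):
    """Adjustment parameter of equation (7); eta must lie strictly in (0, 1)."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must be strictly between 0 and 1")
    return math.log((1.0 - eta) / eta) / omega

def upper_cutoff(x, b, omega, eta):
    """Upper-bound cut-off of equation (5) for attribute value x and bound b."""
    z = omega * (x - b + rho(omega, eta))
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

# Hypothetical values: bound of 30 minutes, omega = 0.5, eta = 0.1.
print(upper_cutoff(0.0, 30.0, 0.5, 0.1))   # far inside the domain: close to 1
print(upper_cutoff(30.0, 30.0, 0.5, 0.1))  # at the bound: approximately eta = 0.1
print(upper_cutoff(60.0, 30.0, 0.5, 0.1))  # far beyond the bound: close to 0
```

The lower cut-off of equation (6) follows from the same function by swapping the roles of the attribute and the bound, that is, by calling `upper_cutoff(a, x, omega, eta)` with lower bound `a`.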

[Figure 1. Penalization of the Utility Function. Two panels plot $\ln \phi$ against $b_{nk} - x_{ik}$: one for variations of $\eta$ (including $\eta = 0.2$ and $\eta = 0.5$) and one for variations of $\omega$ (including $\omega = 0.05$ and $\omega = 0.2$).]

By assuming that the random component in the utility function follows an IID Gumbel distribution, the choice probability of alternative i is as expressed in equation (8), which is a closed functional form similar to that of the MNL.

$$P_{ni} = \frac{\exp(\mu U_{ni})}{\sum_{j} \exp(\mu U_{nj})} = \frac{\phi_{ni} \exp(\mu V_{ni})}{\sum_{j} \phi_{nj} \exp(\mu V_{nj})} \qquad (8)$$

One property of equation (8) highlighted by Martinez et al. (2008) is that when the utility function contains endogenous attributes, either in the compensatory part of the utility function or in the cut-off term, a fixed point problem is generated. This situation occurs when there are externalities in the individual choices, such as congestion or mutual attraction between consumers. The authors prove that this fixed point problem has a unique solution and that the solution algorithm is contractive. No equivalent property is proven by Cascetta and Papola (2001) or Swait (2001) for the other one-stage models.
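The closed form of equation (8) can be sketched in a few lines of Python. The function name `cmnl_probabilities` and the example values are assumptions made for illustration. The sketch takes the deterministic utilities and the (strictly positive) cut-off values as given and works with log-weights to avoid overflow.

```python
import math

def cmnl_probabilities(v, phi, mu=1.0):
    """Equation (8): P_i is proportional to phi_i * exp(mu * V_i).

    v   -- deterministic utilities V_i of the alternatives
    phi -- cut-off values phi_i, each assumed to be strictly positive
    """
    # Log-weights mu * V_i + ln(phi_i); subtracting the maximum avoids overflow.
    logw = [mu * vi + math.log(pi) for vi, pi in zip(v, phi)]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    return [wi / total for wi in w]

# With all cut-offs equal to 1 the MNL of equation (2) is recovered.
print(cmnl_probabilities([0.0, 0.0], [1.0, 1.0]))
# A cut-off below 1 shifts probability away from the penalized alternative.
print(cmnl_probabilities([0.0, 0.0], [1.0, 0.5]))
```

In the second call the two alternatives have the same compensatory utility, so the probabilities are simply proportional to the cut-offs.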

In order to simplify the analysis in this document, only upper bounds $b_{nk}$ will be imposed on the attributes, since they are the most commonly found in transportation studies (upper bounds on time and cost). Nonetheless, the analysis can be extended to the case of lower bounds.

3. METHOD OF CALIBRATION

The method proposed for the calibration of the CMNL model is maximum likelihood. The likelihood of each observation is built including the compensatory and non-compensatory model components simultaneously:

$$\max L(\theta_n, \omega, \rho, b_n) = \prod_{n} \prod_{i} P_{ni}^{\delta_{ni}} \qquad (9)$$

where $\delta_{ni} = 1$ if individual n chooses alternative i and 0 otherwise, and $\theta_n$ is the vector of parameters of the compensatory utility function, which is assumed linear in the attributes x: $V_{ni} = \theta_n x_i$. We work with the logarithm of the above expression, $l = \ln L$, to simplify the calculus.

It is worth noting that, as shown in equation (9), the bounds b are treated as variables of the optimization problem. Of course, they are known parameters when they represent exogenous bounds known to the modeler. However, they become variables when the bounds reflect individual thresholds, self-imposed by the consumer and unknown to the modeler. We discuss below how the solution can be sought in this latter case. Thus, the model calibration can be defined for two different cases: with exogenous or with endogenous bounds. In the more general case, both types of bounds could be mixed in the same choice problem.

The first-order conditions (FOC) for the parameters $\theta_n$, $\omega_k$, $b_{nk}$ and $\rho_k$ are the following:

$$\frac{\partial l}{\partial \theta_n} = \sum_{n} \sum_{i} \delta_{ni} \Big( x_i - \sum_{j} P_{nj}\, x_j \Big) = 0 \qquad (10)$$

$$\frac{\partial l}{\partial b_{nk}} = \sum_{i} \delta_{ni} \Big( \bar{\phi}_{nik} - \sum_{j} \bar{\phi}_{njk}\, P_{nj} \Big) = 0 \quad \text{(only for endogenous } b_{nk}\text{)} \qquad (11)$$

$$\frac{\partial l}{\partial \rho_k} = \sum_{n} \sum_{i} \delta_{ni} \Big( \bar{\phi}_{nik} - \sum_{j} \bar{\phi}_{njk}\, P_{nj} \Big) = 0 \qquad (12)$$

$$\frac{\partial l}{\partial \omega_k} = \sum_{n} \sum_{i} \delta_{ni} \Big( \Phi_{nik} - \sum_{j} \Phi_{njk}\, P_{nj} \Big) = 0 \qquad (13)$$

with $\bar{\phi}_{nik} = 1 - \phi_{nik}$ and $\Phi_{nik} = (1 - \phi_{nik})(x_{ik} - b_{nk})$.
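Before interpreting these conditions, the objective behind equation (9) can be sketched numerically. The Python function below is a minimal illustration under stated assumptions: a single upper bound on attribute `k`, one homogeneous parameter vector, the scale μ normalized to one, and a hypothetical data layout (a list of `(attributes, chosen)` pairs). None of these names come from the paper. In practice the function would be passed to a numerical optimizer.

```python
import math

def log_likelihood(theta, omega, rho, bound, k, data):
    """Log of equation (9) for a CMNL with one upper bound on attribute k.

    data -- list of (attributes, chosen) pairs, where attributes[i] is the
            attribute list of alternative i and chosen is the index of the
            alternative selected by the individual (hypothetical layout).
    """
    ll = 0.0
    for attributes, chosen in data:
        logw = []
        for x in attributes:
            v = sum(t * xk for t, xk in zip(theta, x))   # linear utility V
            z = omega * (x[k] - bound + rho)
            # ln(phi) = -ln(1 + exp(z)), evaluated in a numerically stable way.
            ln_phi = -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))
            logw.append(v + ln_phi)
        m = max(logw)
        log_denominator = m + math.log(sum(math.exp(lw - m) for lw in logw))
        ll += logw[chosen] - log_denominator             # ln P of the chosen alternative
    return ll

# One observation with two identical alternatives: ln P = ln(1/2).
example = [([[10.0, 2.0], [10.0, 2.0]], 0)]
print(log_likelihood([-0.1, -0.2], 0.5, 2.0, 30.0, 0, example))
```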

These FOC have interpretations that agree with the theory on which the CMNL model is founded:

1. Equation (10) reproduces the FOC of the MNL. For the alternative-specific constants, this condition implies that the parameters of the compensatory function must be capable of reproducing the market shares observed in the calibration sample. For the rest of the parameters, it establishes that the expected (predicted) average value of an attribute must equal the average observed in the sample. It can also be seen from this equation that the scale parameter $\mu$ cannot be estimated from the data (it is not identifiable) and that this does not affect the cut-off parameters (a result that can also be deduced from equation 8).

2. Equation (11) reveals that the calibration of the parameters $b_{nk}$ guarantees that the proportion of sub-population type n in the sample violating the constraint is equal to what the model predicts; this equation should be ignored for exogenous bounds. Equation (12) is similar to (11) but aggregated across the whole population, hence it is redundant if equation (11) holds for all n; nevertheless, it is significant when there are exogenous bounds. The interpretation made here is based on the assumption that $\phi_{nik}$ is the probability of respecting restriction k; $\bar{\phi}_{nik}$ is its complement, the probability of violating restriction k. It is important to underline the identification problem that arises between the parameters $b_{nk}$ and $\rho_k$ when the former is endogenous. It is caused by the fact that $b_{nk}$ and $\rho_k$ are added in the expression of the cut-off, so the information used to obtain both parameters is the same.

3. $\Phi_{nik}$ in equation (13) represents the magnitude of the violation, that is, the expected value of the excess (or deficit) of the attribute with respect to the upper (lower) bound. The condition imposes that the magnitude of the violations observed in the data be equal to the one predicted by the model.

From these conditions it can be seen that the CMNL is a generalization of the MNL, because if individuals exhibit purely compensatory behavior, the cut-off parameters are irrelevant and the FOC of the MNL model are recovered.

Note that, since the scale parameter $\mu$ cannot be identified, we calibrate $\mu V_{ni}(x_i) + \ln(\phi_{ni}(x_i))$, where $\ln \phi_{ni}(x_{ik}) = -\ln[1 + \exp(\omega_k (x_{ik} - b_{nk} + \rho_k))]$, which has the bilinear form depicted in Figure 1. This form arises because, as the attribute gets close to the bound, the constant term one becomes dominated by the exponential, and the function can be approximated by a linear form, since $\ln \phi_{ni}(x_{ik}) \approx -\omega_k (x_{ik} - b_{nk} + \rho_k)$. This quasi-linearity induces a potential correlation problem between the parameters associated with the restricted attribute $x_{ik}$, which are $(\theta_{nk}, \omega_k)$. Therefore, an unbiased estimation of these parameters requires a very rich sample, with a large number of observations affected by the restriction and enough diversity in the compensatory parameters $\theta_n$ across consumer clusters. If that is not the case, the modeler may use standard techniques to control for correlated parameters, such as instrumental variables.
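The correlation between $\theta_{nk}$ and $\omega_k$ can be made explicit with a short worked approximation. It is written here with $\mu$ normalized to one and for a single upper bound on attribute k, and it is a sketch of the argument above rather than a result stated in the paper.

```latex
\begin{align*}
V_{ni} + \ln\phi_{ni}
  &= \sum_{l \neq k} \theta_{nl} x_{il} + \theta_{nk} x_{ik}
     - \ln\!\left[1 + e^{\omega_k (x_{ik} - b_{nk} + \rho_k)}\right] \\
  &\approx \sum_{l \neq k} \theta_{nl} x_{il} + \theta_{nk} x_{ik}
     && \text{if } \omega_k (x_{ik} - b_{nk} + \rho_k) \ll 0 \\
  &\approx \sum_{l \neq k} \theta_{nl} x_{il} + (\theta_{nk} - \omega_k)\, x_{ik}
     + \omega_k (b_{nk} - \rho_k)
     && \text{if } \omega_k (x_{ik} - b_{nk} + \rho_k) \gg 0
\end{align*}
```

Observations in the second regime alone only carry information about the difference $\theta_{nk} - \omega_k$, so separating the two parameters requires observations on both sides of the bound.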

When the bounds are endogenous variables, such as, for example, the walking time or travel time in a mode choice, $\omega_k$ and $\tilde{\rho}_{nk} = \rho_k - b_{nk}$ can be identified. This observation provides a means to estimate the model with endogenous bounds, thus eliminating the need to provide exogenous parameters to the model. The endogenous bounds are obtained by maximizing the likelihood function based on the behavior of the individuals observed in the sample.

When the bounds are exogenous, for example income, both $b_{nk}$ and $a_{nk}$ are known and only $\rho_k$ and $\omega_k$ can be calibrated, in addition to the $\theta$ parameters of the compensatory part. In this case all parameters are identifiable from a rich sample. Notice, however, that when the bound is known, the proportion of people violating the restriction, $\eta_k$, can be obtained from the database, so the product $\rho_k \omega_k$ is obtained directly from (7); it follows that only the parameter $\omega_k$ needs to be calibrated. This way of calculating the CMNL has one degree of freedom less than the above, which helps to limit the feasible space of solutions.

The data sample required to calibrate the additional parameters of the cut-off terms is very specific, as it should reflect decisions on the edge of the choice domain. We believe that the adequate databases are those obtained from revealed preferences that are rich in variance and have a large number of observations, because they represent the real behavior of individuals. Alternatively, stated preference data may be particularly helpful to capture the behavior of individuals at the edge of their feasible domain for the attributes, provided the experiments are properly designed.

4. CALIBRATION USING SIMULATED DATA

To evaluate the calibration process, a database was created simulating transport mode choice in a context where individuals penalize some attributes, that is, where the CMNL applies. The advantage of using simulated data is that the calibration process can be studied in an ideal case where all the model components are known, allowing a controlled analysis of the characteristics and properties of the model and of the potential problems that arise in the calibration process.

We use the simulation method suggested by Williams and Ortúzar (1982), also applied by Munizaga et al. (2000). The data were generated for four hypothetical alternatives (Car, Bus, Shared Taxi and Subway), with a utility function that depends on four attributes (Travel Time, Walking Time, Waiting Time and Cost), simulating the choices of 10,000 individuals. The method used to generate the data is the following. First, the attributes of each mode are generated as normal variables with fixed means and variances. Then we generate the IID Gumbel terms with a variance such that the scale factor $\mu$ equals one, thus controlling the scale effect so that the calibrated model parameters can be compared directly with the values used to generate the experiments. The utility of each alternative in each observation is calculated as the sum of the deterministic components and the stochastic term, according to:

$$U_{ni} = \theta_n x_i + \ln(\phi_{ni}) + \varepsilon_{ni} \qquad (14)$$
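The data-generation steps just described can be sketched as follows. This is a minimal standard-library illustration, not the authors' code. The function names, the two-mode example and all numerical values are hypothetical, and a single upper bound on attribute `k` is assumed. The last step (choosing the alternative of maximum utility) follows the usual random utility simulation.

```python
import math
import random

def gumbel_draw(rng):
    """Standard Gumbel draw (scale mu = 1) obtained by inverting the CDF."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, which would lead to log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_choices(n_obs, means, sds, theta, omega, rho, bound, k, seed=0):
    """Return a list of (attributes, chosen) pairs generated with equation (14)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_obs):
        attributes, utilities = [], []
        for alt_means, alt_sds in zip(means, sds):
            # Attributes drawn as normal variables with fixed means and deviations.
            x = [rng.gauss(m, s) for m, s in zip(alt_means, alt_sds)]
            v = sum(t * xk for t, xk in zip(theta, x))
            z = omega * (x[k] - bound + rho)
            ln_phi = -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))
            attributes.append(x)
            utilities.append(v + ln_phi + gumbel_draw(rng))
        # The simulated choice is the alternative of maximum utility.
        chosen = max(range(len(utilities)), key=utilities.__getitem__)
        data.append((attributes, chosen))
    return data

# Two hypothetical modes with attributes (time, cost); time (k = 0) is penalized.
sample = simulate_choices(
    n_obs=1000,
    means=[[20.0, 5.0], [35.0, 2.0]],
    sds=[[5.0, 1.0], [5.0, 0.5]],
    theta=[-0.1, -0.2], omega=0.5, rho=4.0, bound=40.0, k=0,
)
print(len(sample), sum(1 for _, c in sample if c == 0))
```

Fixing the seed makes an experiment reproducible, which is convenient when the same synthetic database is calibrated under several specifications.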

Finally, the choices of the individuals are computed as the alternative of maximum utility, completing the information of the simulated database. For these experiments a homogeneous population of individuals is assumed; therefore a single parameter $\omega$ was used for the entire sample.

We performed several simulation-calibration experiments, varying all the relevant aspects of the problem. The results confirm that the CMNL model can be calibrated using a database that behaves according to that paradigm, with a sufficient number of observations and enough variance. In econometric terms, we can say that CMNL calibration by maximum likelihood can recover the utility function parameters, both for exogenous and for endogenous bounds, and for one or more attributes penalized simultaneously. In Table 1 we present some results for a case where travel time was penalized. It can be seen that, whether the bounds are assumed endogenous or imposed exogenously, the model is correctly calibrated, and the values used to simulate the data are within the confidence intervals of the estimates. However, some factors are decisive in obtaining a good calibration of the parameters, and we discuss them below.

Table 1: Calibration of CMNL model for synthetic data with penalization in travel time

| Variable | Design values | Exogenous bound: Estimate (t-stat) | Exogenous bound: Confidence Interval 95% | Endogenous bound: Estimate (t-stat) | Endogenous bound: Confidence Interval 95% |
|---|---|---|---|---|---|
| Mode constants | | | | | |
| Car | 0.30 | (3.9) | (0.21 ; 0.62) | (4.0) | (0.21 ; 0.61) |
| Shared taxi | 1.00 | (13.5) | (0.86 ; 1.15) | (14.0) | (0.87 ; 1.15) |
| Metro | | (-7.0) | (-0.65 ; -0.36) | (-7.2) | (-0.66 ; -0.38) |
| Utility function variables | | | | | |
| Travel time | | (-12.5) | (-0.1 ; -0.07) | (-6.4) | (-0.1 ; -0.05) |
| Walk time | | (-34.0) | (-0.17 ; -0.15) | (-32.6) | (-0.17 ; -0.15) |
| Wait time | | (-4.5) | (-0.3 ; -0.12) | (-4.8) | (-0.3 ; -0.12) |
| Cost | | (-38.6) | ( ; ) | (-36.8) | ( ; ) |
| Cut-off parameters | | | | | |
| ω (Tr. time) | | (45.4) | (0.67 ; 0.73) | (41.2) | (0.68 ; 0.74) |
| ω (ρ - b_n) | | | | (-29.0) | (-9.8 ; -8.55) |
| Log-likelihood | | | | | |

One factor that was empirically found necessary to obtain good quality calibrations is that, despite the condition that some individuals should violate the bounds, the proportion of the sample violating the restriction ($\eta$) must be very low, i.e. most of the population must respect the restriction and, therefore, choose compensatorily. This is clear if we think of a sample where all the observed choices are affected by the bounds: no one behaves in a truly compensatory way, and the $\theta$ parameters can hardly be distinguished from the parameters of the cut-off factors.

Another aspect analyzed is what happens when the model is calibrated with exogenous bounds but the modeler mistakenly assigns the bounds values that differ from the true ones. The results show two effects: the fit of the model diminishes and the values of the parameters associated with the penalization, $\theta_{nk}$ and $\omega_k$, lie outside the confidence interval. This empirical evidence supports the following criterion: choose the threshold that best reproduces the behavior of individuals, given by the value of $b_{nk}$ or $a_{nk}$ that maximizes the model's log-likelihood, since it best approximates the choices made by the sample. Similarly, models were calibrated varying $\eta$, with the result that the value of $\eta$ that maximizes the model's log-likelihood coincides with the one that reproduces the proportion of the sample violating the restriction.

As mentioned in section 3, there might be a correlation problem between $\omega_k$ and $\theta_{nk}$ in the linear utility function, which leads to an identification problem because the calibration method cannot differentiate between these parameters. This problem was found in some experiments, so we propose a method to deal with it. Under the assumption that behavior in the interior of the choice domain is compensatory and behavior at the edge of the domain is non-compensatory, we split the sample into two groups, differentiated by the distance between the attribute of the chosen alternative and the edge of the domain, and write a different utility function for each group. Therefore, the cut-off is activated in the utility only in the latter case (see Figure 2).

[Figure 2. Split of the Sample. Utility (in utils) and $\ln \phi$ plotted against the attribute $x_{ik}$, with the sample divided at a critical value into a share with compensatory behavior and a share with non-compensatory behavior.]

This sample split can be made in two different ways. In equation (15) a dummy variable $\nu_i$ is introduced in such a way that, within the compensatory domain, only the linear part of the utility function is present for alternative i, while for the group of observations closer to the edge of attribute k, only the penalization term is present. We call this specification a partitioned CMNL. Conversely, in the model presented in equation (16), the deterministic (linear) behavior is present in all cases, but the penalization term is activated only for the observations close to the edge. This last specification is called an overlapped CMNL.

$$U_{ni} = \sum_{l \neq k} \theta_{nl} x_{il} + \nu_i\, \theta_{nk} x_{ik} + (1 - \nu_i) \ln \phi_{ni}(x_{ik}) \qquad (15)$$

$$U_{ni} = \sum_{l} \theta_{nl} x_{il} + (1 - \nu_i) \ln \phi_{ni}(x_{ik}) \qquad (16)$$

This method requires the modeler to identify the critical value of the attribute, $x_k^*$, that defines $\nu_i$, such that:

$$\nu_i = \begin{cases} 1 & \text{if } x_{ik} \le x_k^* \\ 0 & \text{if } x_{ik} > x_k^* \end{cases} \qquad (17)$$

Models with the specifications of equations (15) and (16) can only be calibrated for the case of exogenous bounds, since the sample segmentation depends on these bounds.

Given the complexity of the model and the bilinear form of the cut-off function, we also studied the possibility of estimating a model that penalizes the attributes linearly, according to the following formulation, where $\beta_{nk}$ is the parameter that accompanies the penalized attribute, for each version (partitioned and overlapped, respectively):

$$U_{ni} = \sum_{l \neq k} \theta_{nl} x_{il} + \nu_i\, \theta_{nk} x_{ik} + (1 - \nu_i)\, \beta_{nk} x_{ik} \qquad (18)$$

$$U_{ni} = \sum_{l} \theta_{nl} x_{il} + (1 - \nu_i)\, \beta_{nk} x_{ik} \qquad (19)$$

In order to compare the models of equations (15)-(16) and (18)-(19), simulated choices were generated according to the CMNL model and the parameters were calibrated with both types of model, each in its partitioned and overlapped specification. The results presented in Table 2 show that, for the CMNL, all the model parameters except the alternative-specific constants were recovered. This is probably because, when the calibration sample is divided into two sub-samples, two sub-markets with differentiated constants are created, so the constants cannot adjust to both markets simultaneously. It has been seen empirically that the goodness of fit of the model varies with the split made of the sample, so correctly determining the point at which the cut-off is triggered is important to the calibration process.

This is a limitation, because the modeler does not know a priori the best split of the sample. For the linear model, some of the attribute parameters are not recovered, and the log-likelihood is significantly worse than the value obtained for the CMNL.
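The difference between the partitioned and the overlapped specifications can be illustrated with a small Python sketch of the deterministic utilities in equations (15) and (16). It assumes a single upper bound on attribute `k` and that the compensatory domain lies at or below the critical value `x_star`. The function names and the numbers are hypothetical.

```python
import math

def ln_phi(x_k, bound, omega, rho):
    """Logarithm of the upper cut-off, -ln(1 + exp(omega * (x_k - bound + rho)))."""
    z = omega * (x_k - bound + rho)
    return -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))

def split_utility(x, theta, k, x_star, bound, omega, rho, overlapped):
    """Deterministic utility under the sample split of equations (15) and (16)."""
    nu = 1 if x[k] <= x_star else 0  # 1 inside the compensatory domain
    others = sum(t * xl for l, (t, xl) in enumerate(zip(theta, x)) if l != k)
    linear_k = theta[k] * x[k]
    penalty = (1 - nu) * ln_phi(x[k], bound, omega, rho)
    if overlapped:
        # Equation (16): the linear term of attribute k is always present.
        return others + linear_k + penalty
    # Equation (15): the linear term of attribute k is dropped near the edge.
    return others + nu * linear_k + penalty

theta = [-0.1, -0.2]
inside, edge = [10.0, 3.0], [35.0, 3.0]
for x in (inside, edge):
    print(split_utility(x, theta, 0, 20.0, 30.0, 0.5, 2.0, overlapped=False),
          split_utility(x, theta, 0, 20.0, 30.0, 0.5, 2.0, overlapped=True))
```

Inside the compensatory domain both specifications return the same linear utility. Beyond the critical value they differ exactly by the linear term of the penalized attribute.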

13 Table 2: Calibration of CMNL and bilinear models using sample split Partitioned Overlapped CMNL model Linear model CMNL model Linear model Variable Design values Estimate (t-stat) Estimate (t-stat) Estimate (t-stat) Estimate (t-stat) Mode constants Car * * * * (0.2) (-6.9) (0.1) (-6.2) Shared Taxi * * * * Metro Utility function variables Travel time Wal time Wait time Cost Cut off parameters ω (Cost) d=90% αn (Cost) d=90% 0.35 (-0.3) * (-0.3) (-25.7) (-35.4) (-7.7) (-11.1) (34.3) (-10.9) 0.058* (1.0) (-24.1) (-33.2) (-7.5) * (-22.5) (-43.6) (-0.2) * (-0.3) (-25.2) (-36.3) (-7.6) (-11.0) (27.2) (-5.9) * (5.7) (-19.8) * (-27.2) (-7.6) * (-22.7) (-17.5) Log-lielihood Finally, we studied what would happen if we use an incorrect model, and what would be the difference in the choices predicted by the MNL model and the CMNL model. Two simulation experiments were thus performed. In the first simulation, the database was generated using a MNL compensatory utility function, and we tried to calibrate the CMNL parameters. As a result, all parameters in the compensatory function were satisfactorily recovered; while, as expected, the parameters associated with the cut-off were not statistically sigficant. This leads to the conclusion that if there is no certainty as to the actual behavior of the decision-maers, it is conveent to calibrate the CMNL model. On the other hand, if the individuals choose compensatively, the sigficance of the parameters in the restriction will confirm the real behavior. Then we proceed inversely, generating a database where the choices are CMNL, penalizing the cost attribute, and calibrate using a MNL specification. We found that only the parameters of the non penalized variables were recovered, and also that the MNL adjustment is sigficantly worse than what would be obtained by calibrating a CMNL. The comparison of the MNL and CMNL models in terms of their predictive capabilities showed some other interesting results. 
We tested the models for several policy scenarios, designed for this purpose, simulating future situations where the penalized attribute changes. Then the predictions of both models were compared for the same policy change. In Table 3, we present the results for a case where the travel time variable is penalized, for a different mode in each column. It can be seen that the error in the MNL model is always greater than that of the CMNL model, regardless of whether there is an increase or decrease in the value of the penalized attribute. It can also be seen that there are some differences depending on which mode is affected by the policy change. Differences over 7.81 can be considered significant, according to the χ² index reported in the table.

Table 3: Prediction comparisons under different policy scenarios
Columns: Car, Shared taxi, Bus, Metro, each with the χ² index of the MNL and CMNL models. Rows: Travel time variation [%]; Cost variation [%].

When the cost variations are applied, it can be seen that when the car cost is modified, the predictions are not significantly different, but when the bus cost varies, the MNL errors are significant. One explanation for these results can be found in the first-order conditions of the CMNL, which show that the cut-off parameters are calibrated from those observations that are affected by the restriction. Thus, when the constraint is the income budget, the car choices in the simulated data are closer to the constraint and embed the effect of the income bound; conversely, the bus choices are less affected. Because the MNL model has only one parameter to represent the individuals' reaction to cost changes, in the case of car choices the effect of the constraint is embedded in the calibrated cost parameter; conversely, in the case of bus choices, where the constraint is not active because of the lower cost, the cost parameter of the MNL model does not include the effect of the budget constraint at all.
Therefore, any increase in costs will imply that the MNL predicts with less error in the case of cars than in the case of bus choices. This means that the MNL reproduces behavior better in the vicinity of the observed data but predicts with error beyond the range of the data. In summary, the calibration exercises using simulated data indicate that in all cases the CMNL model is better than or the same as the MNL model, while the predictions of the CMNL may be superior to those of the MNL model when there are active restrictions.
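The prediction comparison described above can be illustrated with a small sketch. It assumes the binomial-logit cut-off of the CMNL (a penalty equal to a small value at the bound, close to one well below it) and assumes that the χ² index is the sum over modes of the squared difference between predicted and reference choices divided by the reference choices. All numbers (utilities, costs, bound, ω, the penalty value at the bound, sample size) are hypothetical, and both models share the same coefficients here, whereas in the experiments the MNL was calibrated on CMNL-generated data.

```python
import math

def log_cutoff_penalty(x, bound, omega, eta):
    """Log of an upper cut-off penalty: about 0 far below `bound`, log(eta) at the bound."""
    z = omega * (x - bound) + math.log((1.0 - eta) / eta)
    # Numerically stable evaluation of -log(1 + exp(z)).
    if z > 0:
        return -(z + math.log1p(math.exp(-z)))
    return -math.log1p(math.exp(z))

def logit_shares(utilities):
    """Multinomial logit shares, shifted by the maximum utility for stability."""
    top = max(utilities)
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def chi2_index(predicted, reference, n_obs):
    """Assumed index: sum of (predicted - reference)^2 / reference, in numbers of choices."""
    return sum(n_obs * (p - r) ** 2 / r for p, r in zip(predicted, reference))

# Hypothetical scenario: four modes and a cost cut-off (for example, a budget).
MODES = ["car", "shared taxi", "bus", "metro"]
BASE_UTILITY = [0.5, 0.0, -0.3, 0.2]   # constants plus non-cost attributes (made up)
COSTS = [8.0, 5.0, 3.0, 4.0]           # made-up costs per mode
BETA_COST = -0.2                        # made-up cost coefficient
BOUND, OMEGA, ETA = 10.0, 1.0, 0.1      # made-up cut-off parameters
N_OBS = 1000

def shares(costs, with_cutoff):
    """Shares with (CMNL) or without (MNL) the cut-off penalty in the utility."""
    utilities = []
    for base, cost in zip(BASE_UTILITY, costs):
        v = base + BETA_COST * cost
        if with_cutoff:
            v += log_cutoff_penalty(cost, BOUND, OMEGA, ETA)
        utilities.append(v)
    return logit_shares(utilities)

# Policy: car cost rises by 30 %, which pushes it beyond the bound.
new_costs = [COSTS[0] * 1.3] + COSTS[1:]
cmnl_after = shares(new_costs, with_cutoff=True)    # treated as the reference behavior
mnl_after = shares(new_costs, with_cutoff=False)    # compensatory prediction
index = chi2_index(mnl_after, cmnl_after, N_OBS)

for mode, p_mnl, p_cmnl in zip(MODES, mnl_after, cmnl_after):
    print(f"{mode:12s} MNL {p_mnl:.3f}  CMNL {p_cmnl:.3f}")
print("chi2 index:", round(index, 2), "significant at 95 %:", index > 7.81)
```

With four modes the index has three degrees of freedom, which is where the 7.81 limit comes from.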

5. CALIBRATION USING REAL DATA

The CMNL model was applied to real mode choice databases of Las Condes - Downtown (Donoso, 1982), San Miguel - Downtown (Parra, 1988) and Cagliari (Cherchi and Ortúzar, 2006). The information available for all databases was handled similarly. First, a statistical analysis of the data allowed us to identify which attributes could be subject to non-compensatory behavior, thus determining likely ranges for the bounds. Then an attempt was made to calibrate thresholds endogenous to the model, but the results were unsatisfactory, so the bounds were treated as exogenous variables. One of the main difficulties found in the calibration was the correlation between the cut-off parameter ω and the utility parameter θ of the penalized attribute. In general, the variables that had the best calibration results were those that were statistically more significant in the MNL model, which resembles a result found with the simulated databases.

We found reasonable results for Las Condes - Centro, where the CMNL model was able to identify the cut-off parameters when splitting the sample. These results are reported in Table 5, where it can be seen that the ω parameters are significant for walking time and for travel time, and also for both simultaneously. All the CMNL models reported in Table 5 fit the data significantly better than the MNL. In the case of Cagliari, we also obtained some significant ω parameters, but the gain in likelihood is modest and the cut-off is very similar to a linear function. For San Miguel, however, none of the specifications used gave significant cut-off parameters. We believe this is because San Miguel is a poorer neighborhood, where car availability is very restricted, and therefore most users are captive to public transport.
It is worth pointing out that an important limitation of all three databases is that the availability of alternatives is exogenously determined, and that this constraint cannot be relaxed, as the level-of-service variables are not available for the alternatives that are not part of the choice set. We also conclude, from the experiments conducted, that a database rich in variance is required to calibrate the CMNL model, containing observations that are affected by the restrictions. When this is not the case, the results are contrary to what is expected in terms of signs and statistical significance of the cut-off estimates, and on occasion the method does not even converge. In those cases, however, the MNL performed correctly.

Table 5: Calibration results for the Las Condes-Centro database. Morning trip to work. CMNL overlapped. Estimates (t-stat)

Variable                                  MNL       CMNL        CMNL          CMNL Walk time
                                                    Walk time   Travel time   & Travel time
Mode constants
  Car-driver                              (-4.4)    (-5.0)      (-3.6)        (-4.0)
  Car-companion                           (-6.2)    (-6.7)      (-5.1)        (-5.5)
  Shared taxi                             (-4.7)    (-5.0)      (-3.7)        (-4.0)
  Metro                                   (7.2)     (7.1)       (8.5)         (8.5)
  Car driver-metro                        (-4.6)    (-4.7)      (-4.2)        (-4.3)
  Car companion-metro                     (-5.1)    (-5.0)      (-4.9)        (-4.9)
  Shared taxi-metro                       (-5.1)    (-5.4)      (-4.7)        (-5.1)
  Bus-Metro                               (-1.6)    (-1.8)      (-1.5)        (-1.7)
Utility function variables
  # cars / # driving licences             (5.3)     (5.4)       (5.3)         (5.4)
  Gender (1: female, 0: male)             (-1.4)    (-1.5)      (-1.6)        (-1.6)
  Travel time [min]                       (-4.8)    (-4.9)      (4.8)         (5.2)
  Walking time [min]                      (-8.4)    (-1.6)      (-8.3)        (-1.6)
  Waiting time [min]                      (-2.1)    (-2.1)      (-1.6)        (-1.6)
  Cost / wage rate                        (-4.1)    (-3.2)      (-4.3)        (-3.6)
Cut-off parameters
  ω (Walk time), bn = 10 min, d = 10%     -         (3.3)       -             (3.1)
  ω (Travel time), bn = 35 min, d = 40%   -         -           (14.4)        (14.1)
Log-likelihood
Likelihood ratio test (MNL)
χ² lim
Sample size: 697
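Table 5 reports a likelihood ratio test of each CMNL specification against the MNL. As a minimal sketch of how such a test is computed, the snippet below uses hypothetical log-likelihood values (not the values of the table) and the standard 95% critical values of the χ² distribution, with degrees of freedom equal to the number of cut-off parameters added to the MNL.

```python
# 95 % critical values of the chi-square distribution for 1 to 3 degrees of freedom.
CHI2_CRITICAL_95 = {1: 3.84, 2: 5.99, 3: 7.81}

def likelihood_ratio_test(ll_restricted, ll_unrestricted, extra_params):
    """Return the LR statistic, the 95 % critical value and the test outcome.

    ll_restricted   : log-likelihood of the simpler model (here the MNL)
    ll_unrestricted : log-likelihood of the model with extra parameters (here the CMNL)
    extra_params    : number of additional parameters (here the cut-off parameters)
    """
    statistic = -2.0 * (ll_restricted - ll_unrestricted)
    critical = CHI2_CRITICAL_95[extra_params]
    return statistic, critical, statistic > critical

# Hypothetical log-likelihoods, for illustration only.
ll_mnl = -500.0
ll_cmnl = -495.0
statistic, critical, rejects_mnl = likelihood_ratio_test(ll_mnl, ll_cmnl, extra_params=1)
print("LR statistic:", statistic, "critical value:", critical, "CMNL fits better:", rejects_mnl)
```

A specification with both cut-offs would use two degrees of freedom, since it adds two ω parameters to the MNL.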

6. CONCLUSIONS AND FINAL COMMENTS

The CMNL model is a generalization of the MNL model with a more complex probability formulation: it uses a non-linear function to penalize the choice of alternatives that are not available, or that are discarded by the decision maker, because the value of one or more attributes exceeds or approaches a threshold. This paper has confirmed the feasibility of implementing the CMNL and solving the calibration problem using the standard maximum likelihood method. The first-order conditions are consistent with the theory of behavior on which the model is based: they reproduce the MNL conditions within the domain of the variables, while on the edge (near the threshold) they define new conditions that allow the parameters associated with the penalization to be calibrated.

It has been demonstrated that the CMNL model is capable of generating consistent predictions in terms of the variables used in calibration, and that its parameters can be identified for the simulated databases. In the case of real data, we found some cases where there is statistically significant evidence of threshold effects and the CMNL model fits the data better than the MNL. In terms of prediction capabilities, we found that the CMNL demand predictions differ from those of the MNL as the attributes approach the edge of the domain. We also conclude that a sample rich in choice variability is required to calibrate the CMNL model, including situations where the thresholds are active constraints. When the data are not rich in variance, the parameter estimates are unstable, and it is better to use the simpler MNL model.

The parameters of the restricted variables in the model may, under certain circumstances, be correlated with their counterparts in the compensatory component of the utility function. In this case, identification problems might occur.
This problem does not always appear, as it depends on the characteristics of the database being used. The problem is lessened, and even disappears, when the cut-off is associated with an attribute that represents a high percentage of the compensatory utility function (obtained from the MNL) compared to other parameters, and when the relationship between ω and θ is such that both effects are sufficiently strong. Should this problem arise, a method is proposed that divides the sample into observations inside the feasible space and observations closer to the edge of that space, so that each set of parameters is estimated with the corresponding sub-sample. To create this division, the result of the calibration is explored for different values of the bounds until the one with maximum likelihood is identified. Since this division of the sample affects the fit and the significance of the calibrated parameters, subsequent studies could automate the method with an optimization procedure that finds the division of the sample with the best fit.

Since this study sought to make a realistic comparison of the advantages that the CMNL might present, it was compared only with the MNL model. Another matter left for future investigation is to apply the cut-off to more complex discrete choice models such as the Hierarchical Logit (Williams, 1977) or the Mixed Logit (McFadden and Train, 2000).
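The exploration of bound values mentioned above can be sketched as a simple grid search. This is only an illustration under strong assumptions: the observations, the candidate bounds and the parameter values are hypothetical, the cut-off is the binomial-logit penalty on cost, and the remaining parameters are held fixed instead of being re-calibrated (with the sample split) at each candidate bound, as the proposed method would require.

```python
import math

def log_penalty(x, bound, omega, eta):
    """Log of an upper cut-off penalty that equals log(eta) when x reaches the bound."""
    z = omega * (x - bound) + math.log((1.0 - eta) / eta)
    # Stable evaluation of -log(1 + exp(z)).
    if z > 0:
        return -(z + math.log1p(math.exp(-z)))
    return -math.log1p(math.exp(z))

def cmnl_log_likelihood(observations, bound, beta_cost, omega, eta):
    """Log-likelihood of the observed choices for a given bound on cost."""
    total = 0.0
    for costs, chosen in observations:
        utilities = [beta_cost * c + log_penalty(c, bound, omega, eta) for c in costs]
        top = max(utilities)
        log_denominator = top + math.log(sum(math.exp(u - top) for u in utilities))
        total += utilities[chosen] - log_denominator
    return total

def best_bound(observations, candidates, beta_cost, omega, eta):
    """Return (log-likelihood, bound) for the candidate bound with the highest likelihood."""
    scored = [
        (cmnl_log_likelihood(observations, b, beta_cost, omega, eta), b)
        for b in candidates
    ]
    return max(scored)

# Hypothetical data: (costs of the two alternatives, index of the chosen one).
OBSERVATIONS = [
    ([9.0, 4.0], 1),
    ([3.0, 5.0], 0),
    ([12.0, 6.0], 1),
    ([7.0, 6.5], 1),
    ([2.0, 8.0], 0),
    ([6.0, 3.0], 1),
]
CANDIDATE_BOUNDS = [4.0, 6.0, 8.0, 10.0, 12.0]
BETA_COST, OMEGA, ETA = -0.2, 1.0, 0.1   # fixed, made-up parameter values

best_ll, best_b = best_bound(OBSERVATIONS, CANDIDATE_BOUNDS, BETA_COST, OMEGA, ETA)
print("best bound:", best_b, "log-likelihood:", round(best_ll, 3))
```

In a full implementation, the call to `cmnl_log_likelihood` would be replaced by a complete calibration for each candidate bound, and the grid could be replaced by a one-dimensional optimizer.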

ACKNOWLEDGEMENTS

The authors thank Fondecyt (No ), the PBCT - Redes Urbanas - R19 project and the Millennium Institute Complex Engineering Systems.

REFERENCES

Cantillo, V. and J. de D. Ortúzar (2005). A semi-compensatory discrete choice model with explicit attribute thresholds of perception. Transportation Research 39B.
Cascetta, E. and A. Papola (2001). Random Utility Models with Implicit Availability/Perception of Choice Alternatives for the Simulation of Travel Demand. Transportation Research 9C.
Cherchi, E. and J. de D. Ortúzar (2006). Income, Time Effects and Direct Preferences in a Multimodal Choice Context: Application of Mixed RP/SP Models with Non-Linear Utilities. Networks and Spatial Economics.
Donoso, P. (1982). Calibración de Modelos Logit Simple para el Corredor Las Condes - Centro. Tesis para optar al Título de Ingeniero Civil Mención Transporte. Pontificia Universidad Católica de Chile.
Manski, C. (1977). The Structure of Random Utility Models. Theory and Decision.
Martínez, F., L. Águila and R. Hurtubia (2008). The Constrained Multinomial Logit: a semi-compensatory choice model. Transportation Research Part B (forthcoming).
McFadden, D. (1968). The Revealed Preferences of a Government Bureaucracy. Bell Journal of Economics and Management Science.
McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. Frontiers in Econometrics. Editor P. Zarembka. Academic Press, New York.
McFadden, D. and K. Train (2000). Mixed MNL Models for Discrete Response. Journal of Applied Econometrics 15 (5).
Munizaga, M.A., B. Heydecker and J. de D. Ortúzar (2000). Representation of Heteroskedasticity in discrete choice models. Transportation Research 34B.
Swait, J. (2001). A Non-compensatory Choice Model Incorporating Attribute Cut-offs. Transportation Research 35B.
Swait, J. and M. Ben-Akiva (1987). Incorporating Random Constraints in Discrete Models of Choice Set Generation. Transportation Research 21B.
Tversky, A. (1972). Elimination by Aspects: a Theory of Choice. Psychological Review.
Williams, H. (1977). On the formation of travel demand models and economic evaluation measures of user benefit. Environment and Planning 9A.
Williams, H. and J. de D. Ortúzar (1982). Behavioural Theories of Dispersion and the Misspecification of Travel Demand Models. Transportation Research 16B.

Estimation of a constrained multinomial logit model

Transportation (2013) 40:563-581. DOI 10.1007/s11116-012-9435-4. Marisol Castro, Francisco Martínez, Marcela A. Munizaga. Published online: 30 August 2012