Probabilistic Choice Models


James J. Heckman, University of Chicago. Econ 312. This draft, March 29, 2006.

This chapter examines different models commonly used for probabilistic choice, for example the choice of one type of transportation from among the many available to the consumer. Section 1 discusses the derivation and limitations of conditional logit models. Section 2 discusses probit models and Section 3 discusses the nested logit (generalized extreme value) model; these address some of the limitations of the conditional logit model.

1 The Conditional Logit Model

In this section we investigate conditional logit models. We discuss their derivation from a random utility model (Section 1.2) with Extreme Value Type I distributed shocks. The relevant properties of the Extreme Value Type I distribution are discussed in Section 1.1. We also derive the conditional logit model from the Luce axioms (Section 1.3). In Section 1.4, we discuss some of the limitations of conditional logit models.

1.1 The Extreme Value Type I Distribution

Suppose $\varepsilon_i$, $i = 1, \dots, n$, are independent (not necessarily identically distributed) Extreme Value Type I random variables. Then the CDF of $\varepsilon_i$ is:

$$\Pr(\varepsilon_i \le c) = F(c) = \exp(-\exp(-(c + \alpha_i)))$$

where $\alpha_i$ is a parameter of the Extreme Value Type I CDF. (Footnote 1: McFadden (1974) mistakenly referred to this as the Weibull distribution, and this misclassification shows up in the literature.)

Also, by the assumption of independence, we can write:

$$F(\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n) = \prod_{i=1}^{n} F(\varepsilon_i) = \prod_{i=1}^{n} \exp(-\exp(-(\varepsilon_i + \alpha_i)))$$

The Extreme Value Type I distribution has two useful features. First, the difference between two independent Extreme Value Type I random variables has a logistic distribution. Second, Extreme Value Type I random variables are closed under maximization, since (assuming independence):

$$\begin{aligned}
\Pr\left(\max_i \{\varepsilon_i\} \le c\right) &= \prod_{i=1}^{n} \Pr(\varepsilon_i \le c) \\
&= \prod_{i=1}^{n} \exp(-\exp(-(c + \alpha_i))) \\
&= \exp\left(-\sum_{i=1}^{n} \exp(-(c + \alpha_i))\right) \\
&= \exp\left(-\exp(-c) \sum_{i=1}^{n} \exp(-\alpha_i)\right) \qquad (1)
\end{aligned}$$

Consider $\sum_{i=1}^{n} \exp(-\alpha_i)$. We can solve for $\alpha$ in the following equation:

$$\sum_{i=1}^{n} \exp(-\alpha_i) = \exp(-\alpha)$$

which implies:

$$\alpha = -\log\left(\sum_{i=1}^{n} \exp(-\alpha_i)\right)$$

We can then substitute this value of $\alpha$ into equation (1) to get:

$$\Pr\left(\max_i \{\varepsilon_i\} \le c\right) = \exp(-\exp(-c)\exp(-\alpha)) = \exp(-\exp(-(c + \alpha)))$$

which is indeed the CDF of an Extreme Value Type I random variable.
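The closure property can be checked numerically. The following is a minimal sketch, assuming three hypothetical parameter values $\alpha_i$; the function names are ours and are not part of the notes. It compares the product of the individual CDFs with the closed form that uses $\alpha = -\log \sum_i \exp(-\alpha_i)$.

```python
import math

def ev1_cdf(c, alpha):
    """CDF of an Extreme Value Type I variable: exp(-exp(-(c + alpha)))."""
    return math.exp(-math.exp(-(c + alpha)))

def max_location(alphas):
    """Parameter of the maximum: alpha = -log(sum_i exp(-alpha_i))."""
    return -math.log(sum(math.exp(-a) for a in alphas))

def max_cdf_direct(c, alphas):
    """Pr(max_i eps_i <= c) as the product of the individual CDFs."""
    prob = 1.0
    for a in alphas:
        prob *= ev1_cdf(c, a)
    return prob

alphas = [0.3, -1.2, 2.0]          # hypothetical parameters, for illustration
alpha_max = max_location(alphas)
for c in (-2.0, 0.0, 1.5, 4.0):
    direct = max_cdf_direct(c, alphas)
    closed = ev1_cdf(c, alpha_max)
    print(f"c={c:5.1f}  product={direct:.6f}  closed form={closed:.6f}")
```

The two columns should agree up to floating-point rounding at every evaluation point.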

1.2 Random Utility Model

An individual with characteristics $s$ has a feasible choice set $B$, with elements $x \in B$. We write:

$$\Pr(x \mid s, B)$$

as the probability that a person of characteristics $s$ chooses $x$ from the feasible set $B$. We also suppose that:

$$u(s, x) = v(s, x) + \varepsilon(s, x)$$

where $\varepsilon$ is independent Extreme Value Type I. From the properties of the Extreme Value Type I distribution in Section 1.1, we know that $v + \varepsilon$ (and thus $u$) has an Extreme Value Type I distribution with parameter $\alpha - v$, as shown below:

$$F_u(c) = \Pr(v + \varepsilon \le c) = \Pr(\varepsilon \le c - v) = \exp(-\exp(-(c - v + \alpha)))$$

Let us now suppose that there are two goods and two corresponding utilities. Consumers govern their choices by the obvious decision rule: choose good 1 if $u_1 \ge u_2$. More generally, if there are $m$ goods, then good $j$ will be selected if $j \in \arg\max_i \{u_i\}_{i=1}^{m}$. Specifically, in our two-good case:

$$\Pr(1 \text{ is chosen}) = \Pr(u_1 \ge u_2) = \Pr(v_1 + \varepsilon_1 \ge v_2 + \varepsilon_2)$$
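The decision rule can be illustrated by simulation. This is a sketch under stated assumptions: the mean utilities and the parameters $\alpha_1 = \alpha_2 = 0$ are hypothetical, and the shocks are drawn by inverting the CDF $F(c) = \exp(-\exp(-(c + \alpha)))$, which gives $c = -\log(-\log U) - \alpha$ for a uniform draw $U$.

```python
import math
import random

def draw_ev1(rng, alpha):
    """Draw from F(c) = exp(-exp(-(c + alpha))) by inverting the CDF."""
    u = rng.random()
    while u == 0.0:            # avoid log(0); random() lies in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u)) - alpha

def share_choosing_good_1(v1, v2, alpha1, alpha2, n_draws=100_000, seed=0):
    """Fraction of simulated consumers for whom u_1 >= u_2."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        u1 = v1 + draw_ev1(rng, alpha1)
        u2 = v2 + draw_ev1(rng, alpha2)
        if u1 >= u2:
            wins += 1
    return wins / n_draws

# Hypothetical mean utilities for the two goods.
share = share_choosing_good_1(v1=1.0, v2=0.5, alpha1=0.0, alpha2=0.0)
print(f"simulated Pr(1 is chosen) = {share:.3f}")
```

The simulated share is only a Monte Carlo estimate of $\Pr(v_1 + \varepsilon_1 \ge v_2 + \varepsilon_2)$; it varies with the seed and the number of draws.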

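Imposing that the $\varepsilon_i$ are independent Extreme Value Type I, we can be much more precise about this probability:

$$\begin{aligned}
\Pr(v_1 + \varepsilon_1 \ge v_2 + \varepsilon_2) &= \Pr(v_1 + \varepsilon_1 - v_2 \ge \varepsilon_2) \\
&= \int_{-\infty}^{\infty} f(\varepsilon_1) \left( \int_{-\infty}^{v_1 + \varepsilon_1 - v_2} f(\varepsilon_2)\, d\varepsilon_2 \right) d\varepsilon_1 \\
&= \int_{-\infty}^{\infty} f(\varepsilon_1) \exp(-\exp(-(v_1 + \varepsilon_1 - v_2 + \alpha_2)))\, d\varepsilon_1 \qquad (2)
\end{aligned}$$

Observe that $F(\varepsilon_1) = \exp(-\exp(-(\varepsilon_1 + \alpha_1)))$, which implies:

$$\begin{aligned}
f(\varepsilon_1) &= \frac{\partial F(\varepsilon_1)}{\partial \varepsilon_1} \\
&= \exp(-\exp(-(\varepsilon_1 + \alpha_1))) \cdot \exp(-(\varepsilon_1 + \alpha_1)) \\
&= \exp(-(\varepsilon_1 + \alpha_1)) \exp(-\exp(-(\varepsilon_1 + \alpha_1)))
\end{aligned}$$

Substituting this into equation (2) gives us:

$$\begin{aligned}
\Pr(1 \text{ is chosen}) &= \int_{-\infty}^{\infty} \exp(-(\varepsilon_1 + \alpha_1)) \exp(-\exp(-(\varepsilon_1 + \alpha_1))) \exp(-\exp(-(v_1 + \varepsilon_1 - v_2 + \alpha_2)))\, d\varepsilon_1 \\
&= \int_{-\infty}^{\infty} \exp(-\alpha_1) \exp(-\varepsilon_1) \exp\left(-\exp(-\varepsilon_1)\left[\exp(-\alpha_1) + \exp(-(v_1 - v_2 + \alpha_2))\right]\right) d\varepsilon_1 \\
&= \frac{\exp(-\alpha_1)}{\exp(-\alpha_1) + \exp(-(v_1 - v_2 + \alpha_2))} \left[ \exp\left(-\exp(-\varepsilon_1)\left[\exp(-\alpha_1) + \exp(-(v_1 - v_2 + \alpha_2))\right]\right) \right]_{-\infty}^{\infty} \\
&= \frac{\exp(-\alpha_1)}{\exp(-\alpha_1) + \exp(-(v_1 - v_2 + \alpha_2))} \\
&= \frac{\exp(v_1 - \alpha_1)}{\exp(v_1 - \alpha_1) + \exp(v_2 - \alpha_2)}
\end{aligned}$$

This result generalizes, because the max over $(m - 1)$ choices is still an Extreme Value Type I, so we can make a two-stage maximization argument, as follows:

$$\begin{aligned}
\Pr(v_1 + \varepsilon_1 \ge v_i + \varepsilon_i,\ \forall i = 1, 2, \dots, m) &= \Pr\left(v_1 + \varepsilon_1 \ge \max_{i = 2, \dots, m} (v_i + \varepsilon_i)\right) \\
&= \frac{\exp(v_1 - \alpha_1)}{\exp(v_1 - \alpha_1) + \exp(v_2 - \alpha_2) + \dots + \exp(v_m - \alpha_m)} \\
&= \frac{\exp(\tilde{v}_1)}{\sum_{i=1}^{m} \exp(\tilde{v}_i)}
\end{aligned}$$

where $\tilde{v}_i = v_i - \alpha_i$.

This type of model of probabilistic choice is called a conditional or multinomial logit model. The difference between conditional and multinomial is simply that in the conditional logit case, the values of the variables (usually choice characteristics) vary across the choices, while the parameters are common across the choices. In the multinomial logit case, the values of the variables are common across choices for the same person (usually individual characteristics) but the parameters vary across choices. For example, in the linear case, the probability of individual $i$ making choice $j$ from among $m$ choices is:

Conditional Logit case:

$$P_{ij} = \frac{\exp(\beta' Z_{ij})}{\sum_{k=1}^{m} \exp(\beta' Z_{ik})}$$

where $Z_{ij}$ is the vector of values of characteristics of choice $j$ as perceived by individual $i$.

Multinomial Logit case:

$$P_{ij} = \frac{\exp(\beta_j' X_i)}{\sum_{k=1}^{m} \exp(\beta_k' X_i)}$$

where $X_i$ is a vector of individual characteristics for individual $i$.

Note that we can easily combine the two cases under one model, as described below.

Generalized case: We can combine the conditional and multinomial logit models by generalizing either one of the two types of models. For example, we could permit the coefficients in the multinomial logit case to depend on choice characteristics, i.e. have:

$$\beta_j = \gamma + \Delta' Z_{ij}$$

Then we get the generalized case, where the probability of choice $j$ by individual $i$ depends on both individual and choice characteristics (as well as interaction terms) (see Footnote 2):

$$P_{ij} = \frac{\exp(\beta_j' X_i)}{\sum_{k=1}^{m} \exp(\beta_k' X_i)} = \frac{\exp(\gamma' X_i + Z_{ij}' \Delta X_i)}{\sum_{k=1}^{m} \exp(\gamma' X_i + Z_{ik}' \Delta X_i)}$$

We could similarly modify the coefficients in the conditional logit case to obtain the generalized version.

Footnote 2: Note that by allowing $X_i$ (the vector of individual characteristics) to have an intercept column of 1s, the model would have pure choice characteristic effects, in addition to the interaction terms.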
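The two linear specifications differ only in which object carries the choice index. A minimal sketch, with hypothetical coefficient and characteristic values and helper names of our own choosing, is:

```python
import math

def softmax(indices):
    """exp(v_j) / sum_k exp(v_k), shifted by max(v) for numerical stability."""
    top = max(indices)
    weights = [math.exp(v - top) for v in indices]
    total = sum(weights)
    return [w / total for w in weights]

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def conditional_logit(beta, Z_i):
    """Common beta; Z_i[j] holds the characteristics of choice j for person i."""
    return softmax([dot(beta, z_ij) for z_ij in Z_i])

def multinomial_logit(betas, X_i):
    """Choice-specific beta_j; X_i holds the characteristics of person i."""
    return softmax([dot(beta_j, X_i) for beta_j in betas])

# Hypothetical numbers for three choices.
beta = [-0.5, 1.0]                                # common coefficients
Z_i = [[2.0, 1.0], [1.0, 0.5], [3.0, 2.0]]        # one row per choice
cl_probs = conditional_logit(beta, Z_i)
print(cl_probs)

betas = [[0.0, 0.0], [0.2, -0.1], [-0.3, 0.4]]    # one coefficient vector per choice
X_i = [1.0, 3.0]                                  # intercept and one characteristic
ml_probs = multinomial_logit(betas, X_i)
print(ml_probs)
```

Subtracting the largest index before exponentiating leaves the probabilities unchanged, since a common shift cancels between numerator and denominator.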

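1.3 Derivation of Logit from the Luce Axioms

We will now show how the conditional logit can be derived from the random utility model discussed in Section 1.2 and the Luce axioms presented below.

1.3.1 Luce Axioms

Axiom 1: Independence of Irrelevant Alternatives (IIA). Suppose that $x, y \in B$. Then:

$$\Pr(x \mid s, \{x, y\}) \Pr(y \mid s, B) = \Pr(y \mid s, \{x, y\}) \Pr(x \mid s, B)$$

or, we have:

$$\frac{\Pr(x \mid s, \{x, y\})}{\Pr(y \mid s, \{x, y\})} = \frac{\Pr(x \mid s, B)}{\Pr(y \mid s, B)}$$

The term on the left is the odds ratio: the ratio of the probabilities of choosing $x$ to $y$ given characteristics $s$ and choice set $\{x, y\}$. This axiom has been named Independence of Irrelevant Alternatives for an obvious reason: the odds of our choice are not affected by adding additional alternatives. Note that this assumes that the additional choices entering in $B$ affect the probability of choosing $x$ in the same manner as they affect the probability of choosing $y$; implicitly we are assuming that the additional choices have an equivalent relationship with choice $x$ and choice $y$. We will see how this assumption is a limitation in Section 1.4 below.

Axiom 2: Positivity. This axiom states that the probability of choosing any one of the choices is strictly greater than zero:

$$\Pr(x \mid s, B) > 0 \quad \forall x \in B$$

1.3.2 Derivation of logit

With the Luce assumptions set out in the preceding section, we can now proceed to our derivation of the logit. Define $P_{xy} = \Pr(x \mid s, \{x, y\})$. Then by Axiom 1 above, we know:

$$\Pr(y \mid s, B) = \left(\frac{P_{yx}}{P_{xy}}\right) \Pr(x \mid s, B) \qquad (3)$$

Summing over $y \in B$, we get:

$$\Pr(x \mid s, B) \sum_{y \in B} \left(\frac{P_{yx}}{P_{xy}}\right) = 1 \implies \Pr(x \mid s, B) = \frac{1}{\sum_{y \in B} \left(\frac{P_{yx}}{P_{xy}}\right)} \qquad (4)$$

Again using Axiom 1, for $z \in B$:

$$\Pr(y \mid s, B) = \left(\frac{P_{yz}}{P_{zy}}\right) \Pr(z \mid s, B) \qquad (5a)$$

$$\Pr(x \mid s, B) = \left(\frac{P_{xz}}{P_{zx}}\right) \Pr(z \mid s, B) \qquad (5b)$$

Substituting these in equation (3), we get:

$$\frac{P_{yx}}{P_{xy}} = \frac{\Pr(y \mid s, B)}{\Pr(x \mid s, B)} = \frac{\left(\frac{P_{yz}}{P_{zy}}\right) \Pr(z \mid s, B)}{\left(\frac{P_{xz}}{P_{zx}}\right) \Pr(z \mid s, B)} = \frac{P_{yz} / P_{zy}}{P_{xz} / P_{zx}} \qquad (6)$$

Now, in terms of the random utility model discussed in Section 1.2, define the mean utility of a person with characteristics $s$ choosing $x$ from set $\{x, z\}$ as:

$$\tilde{v}(s, x, z) \equiv \ln\left(\frac{P_{xz}}{P_{zx}}\right) \implies \frac{P_{xz}}{P_{zx}} = \exp(\tilde{v}(s, x, z))$$

Define a comparable expression for $P_{yz} / P_{zy}$. Replacing this into equation (6) produces:

$$\frac{P_{yx}}{P_{xy}} = \frac{\exp(\tilde{v}(s, y, z))}{\exp(\tilde{v}(s, x, z))}$$

Then from equation (4), we get:

$$\begin{aligned}
\Pr(x \mid s, B) &= \frac{1}{\sum_{y \in B} \left(\frac{\exp(\tilde{v}(s, y, z))}{\exp(\tilde{v}(s, x, z))}\right)} \\
&= \frac{1}{\frac{1}{\exp(\tilde{v}(s, x, z))} \sum_{y \in B} \exp(\tilde{v}(s, y, z))} \\
&= \frac{\exp(\tilde{v}(s, x, z))}{\sum_{y \in B} \exp(\tilde{v}(s, y, z))}
\end{aligned}$$

Assume additionally additive separability of $\tilde{v}(s, x, z)$ as follows:

$$\tilde{v}(s, x, z) = v(s, x) - v(s, z)$$

Note that this is equivalent to assuming irrelevance of the benchmark $z$. From this assumption, we get:

$$\begin{aligned}
\Pr(x \mid s, B) &= \frac{\exp(v(s, x) - v(s, z))}{\sum_{y \in B} \exp(v(s, y) - v(s, z))} \\
&= \frac{\exp(-v(s, z)) \exp(v(s, x))}{\exp(-v(s, z)) \sum_{y \in B} \exp(v(s, y))} \\
&= \frac{\exp(v(s, x))}{\sum_{y \in B} \exp(v(s, y))} \qquad (7)
\end{aligned}$$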

which gives the multinomial logit. McFadden (1974) shows that the Luce Axioms and a condition on $v$ ("Translation Completeness") produce the Extreme Value Type I (which he mistakenly referred to as the Weibull).

1.4 Consequences of Independence: Limitations of Logit Models

We just showed that:

$$P_i = \frac{\exp(v_i)}{\sum_{j} \exp(v_j)}$$

so that:

$$\frac{P_i}{P_k} = \frac{\exp(v_i) / \sum_{j} \exp(v_j)}{\exp(v_k) / \sum_{j} \exp(v_j)} = \frac{\exp(v_i)}{\exp(v_k)} = \exp(v_i - v_k) \implies \ln\left(\frac{P_i}{P_k}\right) = v_i - v_k$$

A common specification for $v_i$ is $v_i = x_i \beta$. Thus:

$$\ln\left(\frac{P_i}{P_k}\right) = (x_i - x_k) \beta \implies \frac{\partial \ln(P_i / P_k)}{\partial (x_i - x_k)} = \beta$$

or, changes in characteristics have a common effect on the log of the ratio of probabilities (the log odds). This allows for estimation of the probabilities of purchasing a new good. (One could obtain an estimate of $\beta$ from the existing goods. This estimate can then be combined with the characteristics, $x$, of the new good to estimate the probability of selection, as in equation (7).) Further, from equation (7):

$$\Pr(2 \mid \{1, 2\}) = \frac{\exp(v_2)}{\exp(v_1) + \exp(v_2)}$$

and:

$$\Pr(2 \mid \{1, 2, 3\}) = \frac{\exp(v_2)}{\exp(v_1) + \exp(v_2) + \exp(v_3)} < \Pr(2 \mid \{1, 2\})$$

This leads us to a restrictive property of the conditional logit model: we have assumed independence of the $\varepsilon_i$ when in fact they may be correlated. This is illustrated by McFadden's famous "red bus, blue bus" problem. Suppose we are modelling transportation choice and our alternatives consist of {car, bus, train}. If the alternatives are replaced by {car, red bus, blue bus}, then we have violated our assumption of dissimilar alternatives; if $u_2 > u_1$ then the event $u_3 > u_1$ is more likely. One can see from the preceding equation that adding more bus colors continually decreases the probability that car travel is chosen. We can deal with the problem of similar alternatives by using the nested logit model (Section 3) or the random coefficient probit model (Section 2).

2 Probit: Random Coefficients

In this section (as in Section 1.4 above), we make $v$ a simple linear function of the choice characteristics alone (as discussed in Section 1.2, we can easily generalize this to include individual characteristics as well as interactions). Then the utility from choice $i$ is:

$$u_i = x_i \beta + \varepsilon_i$$

where $\varepsilon_i \sim N(0, \sigma_i^2)$. Moreover, $\beta$ is a random variable, with $\beta \sim N(\bar{\beta}, \Sigma_\beta)$, so that:

$$u_i = x_i \bar{\beta} + x_i (\beta - \bar{\beta}) + \varepsilon_i$$

It follows that:

$$u_1 - u_2 \ge 0 \iff (x_1 - x_2) \bar{\beta} + (x_1 - x_2)(\beta - \bar{\beta}) + (\varepsilon_1 - \varepsilon_2) \ge 0$$

$$u_1 - u_3 \ge 0 \iff (x_1 - x_3) \bar{\beta} + (x_1 - x_3)(\beta - \bar{\beta}) + (\varepsilon_1 - \varepsilon_3) \ge 0$$

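Further, with $\beta$ independent of the $\varepsilon_i$ so that the cross terms vanish:

$$\begin{aligned}
\mathrm{Var}(u_1 - u_2) &= E\left\{ \left[(u_1 - u_2) - E(u_1 - u_2)\right] \left[(u_1 - u_2) - E(u_1 - u_2)\right]' \right\} \\
&= E\left\{ \left[(x_1 - x_2)(\beta - \bar{\beta}) + (\varepsilon_1 - \varepsilon_2)\right] \left[(x_1 - x_2)(\beta - \bar{\beta}) + (\varepsilon_1 - \varepsilon_2)\right]' \right\} \\
&= E\left\{ (x_1 - x_2)(\beta - \bar{\beta})(\beta - \bar{\beta})'(x_1 - x_2)' + (\varepsilon_1 - \varepsilon_2)(\varepsilon_1 - \varepsilon_2)' \right\} \\
&= (x_1 - x_2) \Sigma_\beta (x_1 - x_2)' + \sigma_1^2 + \sigma_2^2
\end{aligned}$$

(since $\sigma_{12} = 0$). Similarly:

$$\mathrm{Var}(u_1 - u_3) = (x_1 - x_3) \Sigma_\beta (x_1 - x_3)' + \sigma_1^2 + \sigma_3^2$$

Thus:

$$\mathrm{Cov}(u_1 - u_2, u_1 - u_3) = (x_1 - x_2) \Sigma_\beta (x_1 - x_3)' + \sigma_1^2$$

so:

$$\rho = \mathrm{Corr}(u_1 - u_2, u_1 - u_3) = \frac{(x_1 - x_2) \Sigma_\beta (x_1 - x_3)' + \sigma_1^2}{\sqrt{\mathrm{Var}(u_1 - u_2)\, \mathrm{Var}(u_1 - u_3)}}$$

We now seek to derive the probability of choosing good 1 in a three-good case:

$$\Pr(1 \mid \{1, 2, 3\}) = \Pr(u_1 - u_2 \ge 0 \text{ and } u_1 - u_3 \ge 0)$$

From before, we know that:

$$u_1 - u_2 \sim N\left((x_1 - x_2) \bar{\beta},\ \mathrm{Var}(u_1 - u_2)\right)$$

$$u_1 - u_3 \sim N\left((x_1 - x_3) \bar{\beta},\ \mathrm{Var}(u_1 - u_3)\right)$$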
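The variance, covariance and correlation formulas above can be evaluated directly. The sketch below assumes two characteristics per good and hypothetical values for $x_i$, $\Sigma_\beta$ and $\sigma_i^2$; the helper names are ours.

```python
import math

def quad_form(a, S, b):
    """a S b' for row vectors a, b and a square matrix S (lists of floats)."""
    return sum(a[r] * S[r][c] * b[c]
               for r in range(len(a)) for c in range(len(b)))

def diff(x, y):
    """Elementwise difference x - y."""
    return [xi - yi for xi, yi in zip(x, y)]

def utility_difference_moments(x1, x2, x3, Sigma_beta, s1, s2, s3):
    """Var(u1-u2), Var(u1-u3), Cov and Corr; s_i is the variance of eps_i."""
    d12, d13 = diff(x1, x2), diff(x1, x3)
    var12 = quad_form(d12, Sigma_beta, d12) + s1 + s2
    var13 = quad_form(d13, Sigma_beta, d13) + s1 + s3
    cov = quad_form(d12, Sigma_beta, d13) + s1
    corr = cov / math.sqrt(var12 * var13)
    return var12, var13, cov, corr

# Hypothetical inputs: goods 2 and 3 share the same characteristics.
x1, x2, x3 = [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]
Sigma_beta = [[1.0, 0.2], [0.2, 0.5]]

with_shocks = utility_difference_moments(x1, x2, x3, Sigma_beta, 1.0, 1.0, 1.0)
no_shocks = utility_difference_moments(x1, x2, x3, Sigma_beta, 0.0, 0.0, 0.0)
print("with shocks:   ", with_shocks)
print("without shocks:", no_shocks)
```

With $x_2 = x_3$ and all shock variances set to zero, the two utility differences coincide and the correlation equals one; with positive shock variances it lies strictly below one.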

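Thus:

$$\Pr(u_1 - u_2 \ge 0 \text{ and } u_1 - u_3 \ge 0) = \Pr\left(\sqrt{\mathrm{Var}(u_1 - u_2)}\, z_1 + (x_1 - x_2) \bar{\beta} \ge 0 \text{ and } \sqrt{\mathrm{Var}(u_1 - u_3)}\, z_2 + (x_1 - x_3) \bar{\beta} \ge 0\right)$$

where $z_1$ and $z_2$ are standard normal. Thus, using the symmetry of the normal distribution, the above equation reduces to:

$$\begin{aligned}
&\Pr\left(z_1 \ge \frac{-(x_1 - x_2) \bar{\beta}}{\sqrt{\mathrm{Var}(u_1 - u_2)}} \text{ and } z_2 \ge \frac{-(x_1 - x_3) \bar{\beta}}{\sqrt{\mathrm{Var}(u_1 - u_3)}}\right) \\
&\quad = \Pr\left(z_1 \le \frac{(x_1 - x_2) \bar{\beta}}{\sqrt{\mathrm{Var}(u_1 - u_2)}} \text{ and } z_2 \le \frac{(x_1 - x_3) \bar{\beta}}{\sqrt{\mathrm{Var}(u_1 - u_3)}}\right)
\end{aligned}$$

As $z_1$ and $z_2$ may be correlated, we integrate over the joint density to get the probability:

$$\Pr(\text{choosing } 1) = \int_{-\infty}^{a} \int_{-\infty}^{b} \frac{1}{2\pi \sqrt{1 - \rho^2}} \exp\left(-\frac{z_1^2 - 2\rho z_1 z_2 + z_2^2}{2(1 - \rho^2)}\right) dz_2\, dz_1$$

where:

$$a = \frac{(x_1 - x_2) \bar{\beta}}{\sqrt{\mathrm{Var}(u_1 - u_2)}} \quad \text{and} \quad b = \frac{(x_1 - x_3) \bar{\beta}}{\sqrt{\mathrm{Var}(u_1 - u_3)}}$$

Now consider adding a third good to the two-good case, under two alternative scenarios.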

Case 1: Non-random utility, random coefficients. If the third good has characteristics identical to those of the second, then $x_2 = x_3$. If there is no stochastic component (no utility innovation), then $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = 0$. Therefore, in this case:

$$\Pr(1 \text{ chosen}) = \Pr(u_1 - u_2 \ge 0 \text{ and } u_1 - u_3 \ge 0) = \Pr(u_1 - u_2 \ge 0)$$

Thus, there is no change in the probability of choosing good 1 despite the addition of a third good. Again focusing on the two-good case, we observe:

$$\begin{aligned}
\Pr(1 \mid \{1, 2\}) &= \Pr(u_1 - u_2 \ge 0) \\
&= \Pr\left(z_1 \ge \frac{-(x_1 - x_2) \bar{\beta}}{\sqrt{\mathrm{Var}(u_1 - u_2)}}\right) \\
&= \int_{-\infty}^{\frac{(x_1 - x_2) \bar{\beta}}{\left[(x_1 - x_2) \Sigma_\beta (x_1 - x_2)' + \sigma_1^2 + \sigma_2^2\right]^{1/2}}} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt
\end{aligned}$$

which can be evaluated to derive the desired probability.

Case 2: Random utility, non-random coefficients. Here we consider a McFadden-Luce type of setup, where one imposes $\Sigma_\beta = 0$. Defining $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$, we observe that the probability of choosing good 1 in the two-good case is:

$$\int_{-\infty}^{\frac{(x_1 - x_2) \bar{\beta}}{\sigma}} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$$

Adding a third good with identical characteristics ($x_2 = x_3$) yields the probability of good 1 being purchased as:

$$\int_{-\infty}^{a} \int_{-\infty}^{b} \frac{1}{2\pi \sqrt{1 - \rho^2}} \exp\left(-\frac{t_1^2 - 2\rho t_1 t_2 + t_2^2}{2(1 - \rho^2)}\right) dt_2\, dt_1$$

where, with $\Sigma_\beta = 0$ and $x_2 = x_3$:

$$a = \frac{(x_1 - x_2) \bar{\beta}}{\sqrt{\sigma_1^2 + \sigma_2^2}}, \quad b = \frac{(x_1 - x_2) \bar{\beta}}{\sqrt{\sigma_1^2 + \sigma_3^2}}, \quad \rho = \frac{\sigma_1^2}{\sqrt{(\sigma_1^2 + \sigma_2^2)(\sigma_1^2 + \sigma_3^2)}}$$

One can show that, upon evaluation of these integrals, the probability derived from addition of the third good is less than the probability in the two-good case. This leads us to a problem similar to that of the multinomial/conditional logit: adding alternatives decreases the probability of choice, despite the fact that the alternatives are quite similar.

Thus, in the probit case we are able to avoid the limitation of the logit models with regard to the addition of an identical good, through the covariance structure of the random coefficients. As illustrated in Case 2, probit models without random coefficients suffer from the same limitation. Note that while the richer covariance structure is able to capture the relationship between choices in the probit model, applications involving many choices are practically limited, as evaluation of higher-order multivariate normal integrals is difficult (refer to the discussion in Greene, Section 19.6.2.a).
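The contrast between the logit of Section 1.4 and the two probit cases can be sketched numerically. The code below is an illustration under stated assumptions, not the evaluation of the integrals above: it uses a single scalar characteristic per good, hypothetical parameter values, and Monte Carlo draws in place of the normal integrals. Good 3 is a duplicate of good 2.

```python
import math
import random

def logit_prob_1(v):
    """Pr(good 1) under the logit formula for a list of mean utilities v."""
    return math.exp(v[0]) / sum(math.exp(vi) for vi in v)

def probit_prob_1(x, beta_bar, sd_beta, sd_eps, n_draws=50_000, seed=0):
    """Monte Carlo Pr(good 1) with scalar x_i, beta ~ N(beta_bar, sd_beta^2)
    and independent eps_i ~ N(0, sd_eps^2). Returns the share for the set
    {1, 2} and for the full set, computed from the same draws."""
    rng = random.Random(seed)
    wins_two = wins_all = 0
    for _ in range(n_draws):
        beta = rng.gauss(beta_bar, sd_beta)
        u = [xi * beta + rng.gauss(0.0, sd_eps) for xi in x]
        if u[0] >= u[1]:
            wins_two += 1
            if u[0] >= max(u[1:]):
                wins_all += 1
    return wins_two / n_draws, wins_all / n_draws

x = [1.0, 0.5, 0.5]            # hypothetical: goods 2 and 3 are identical
beta_bar = 0.5
v = [xi * beta_bar for xi in x]

logit_two = logit_prob_1(v[:2])
logit_three = logit_prob_1(v)
case1 = probit_prob_1(x, beta_bar, sd_beta=1.0, sd_eps=0.0)   # random coefficients only
case2 = probit_prob_1(x, beta_bar, sd_beta=0.0, sd_eps=1.0)   # utility shocks only

print(f"logit:         two goods {logit_two:.3f}, three goods {logit_three:.3f}")
print(f"probit Case 1: two goods {case1[0]:.3f}, three goods {case1[1]:.3f}")
print(f"probit Case 2: two goods {case2[0]:.3f}, three goods {case2[1]:.3f}")
```

In Case 1 the duplicate has exactly the same utility as good 2 in every draw, so the two shares coincide. In Case 2, as in the logit, the duplicate receives its own independent shock and the share of good 1 falls.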

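3 Nested Logit: Generalized Extreme Value (GEV) Model

Consider a function $G(y_1, y_2, \dots, y_m)$ where $G$ satisfies:

i. Non-negativity: $G(y_1, y_2, \dots, y_m) \ge 0 \quad \forall (y_1, y_2, \dots, y_m) \ge 0$

ii. Homogeneous of degree 1: $G(\alpha y_1, \alpha y_2, \dots, \alpha y_m) = \alpha\, G(y_1, y_2, \dots, y_m)$

iii. Derivative property: $\dfrac{\partial^k G}{\partial y_1\, \partial y_2 \cdots \partial y_k} \le 0$ if $k$ is even, and $\ge 0$ if $k$ is odd.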

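If $G$ satisfies these conditions, then we get the following probability:

$$P(i \mid \{y_1, y_2, \dots, y_m\}) = \frac{y_i\, G_i(y_1, y_2, \dots, y_m)}{G(y_1, y_2, \dots, y_m)}$$

where $G_i = \partial G / \partial y_i$ and $P$ is a probability that can be derived from utility maximization. We can use the theorem above to derive a special case of the nested logit model. Define:

$$\begin{aligned}
G(\exp(v_1), \exp(v_2), \dots, \exp(v_m)) &= \exp(v_1) + \left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \dots + \exp\left(\frac{v_m}{1 - \sigma}\right)\right]^{1 - \sigma} \\
&= \exp(v_1) + \left[(\exp(v_2))^{\frac{1}{1 - \sigma}} + \dots + (\exp(v_m))^{\frac{1}{1 - \sigma}}\right]^{1 - \sigma}
\end{aligned}$$

Observe that $\sigma = 0$ is the ordinary logit model. (With $G$ defined in this way, we are assuming that $\varepsilon_1$ is uncorrelated with all of the other $\varepsilon_i$, while the remaining $\varepsilon_i$ may be correlated. The parameter $\sigma$ is a kind of measure of correlation between the remaining $\varepsilon_i$. It is this correlation structure that allows the GEV model to tackle the limitation of the ordinary conditional/multinomial logit models.) This function meets the conditions for the GEV model:

i. Non-negativity: obvious, as $0 \le \sigma < 1$.

ii. Homogeneity:

$$\begin{aligned}
G(\alpha \exp(v_1), \alpha \exp(v_2), \dots, \alpha \exp(v_m)) &= \alpha \exp(v_1) + \left[(\alpha \exp(v_2))^{\frac{1}{1 - \sigma}} + \dots + (\alpha \exp(v_m))^{\frac{1}{1 - \sigma}}\right]^{1 - \sigma} \\
&= \alpha \exp(v_1) + \left[\alpha^{\frac{1}{1 - \sigma}} (\exp(v_2))^{\frac{1}{1 - \sigma}} + \dots + \alpha^{\frac{1}{1 - \sigma}} (\exp(v_m))^{\frac{1}{1 - \sigma}}\right]^{1 - \sigma} \\
&= \alpha \exp(v_1) + \alpha \left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \dots + \exp\left(\frac{v_m}{1 - \sigma}\right)\right]^{1 - \sigma} \\
&= \alpha \left( \exp(v_1) + \left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \dots + \exp\left(\frac{v_m}{1 - \sigma}\right)\right]^{1 - \sigma} \right) \\
&= \alpha\, G(\exp(v_1), \exp(v_2), \dots, \exp(v_m))
\end{aligned}$$

iii. By inspection, one can see that the derivative property will hold. (It is obvious when differentiating with respect to $\exp(v_1)$. For other derivatives, the fact that $0 \le \sigma < 1$ gives the needed alternation in sign. Note that $y_i$ in the definition of the property is analogous to $\exp(v_i)$ here.)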
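The homogeneity property, and the collapse to the ordinary logit denominator at $\sigma = 0$, can be checked numerically. This is a minimal sketch with hypothetical mean utilities and scale factor; the function name `G` mirrors the notation above.

```python
import math

def G(y, sigma):
    """G(y_1,...,y_m) = y_1 + (sum_{j>=2} y_j^(1/(1-sigma)))^(1-sigma),
    for 0 <= sigma < 1."""
    nest = sum(yj ** (1.0 / (1.0 - sigma)) for yj in y[1:])
    return y[0] + nest ** (1.0 - sigma)

v = [0.2, 1.0, 0.4]                      # hypothetical mean utilities
y = [math.exp(vi) for vi in v]
sigma, scale = 0.6, 3.0                  # hypothetical sigma and scale factor

lhs = G([scale * yj for yj in y], sigma)   # G(a*y_1, ..., a*y_m)
rhs = scale * G(y, sigma)                  # a * G(y_1, ..., y_m)
print(lhs, rhs)                            # equal up to rounding

print(G(y, 0.0), sum(y))                   # sigma = 0: G is the plain sum
```

At $\sigma = 0$ the function reduces to $\sum_i \exp(v_i)$, so $y_i G_i / G$ is the ordinary logit probability.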

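Thus, we can now proceed to derive our probabilities. First, consider:

$$\Pr(1 \mid \{1, 2\}) = \frac{\exp(v_1)}{\exp(v_1) + \left[(\exp(v_2))^{\frac{1}{1 - \sigma}}\right]^{1 - \sigma}} = \frac{\exp(v_1)}{\exp(v_1) + \exp(v_2)}$$

which is simply our binomial logit model. Also note that in the three-good case:

$$\begin{aligned}
G_2 &= (1 - \sigma) \left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right]^{-\sigma} \frac{1}{1 - \sigma} \exp\left(\frac{\sigma v_2}{1 - \sigma}\right) \\
&= \exp\left(\frac{\sigma v_2}{1 - \sigma}\right) \left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right]^{-\sigma}
\end{aligned}$$

Now suppose that we eliminate choice 1 (by letting $v_1 \to -\infty$). Then:

$$\begin{aligned}
\Pr(2 \mid \{2, 3\}) &= \frac{\exp(v_2) \exp\left(\frac{\sigma v_2}{1 - \sigma}\right) \left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right]^{-\sigma}}{\left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right]^{1 - \sigma}} \\
&= \frac{\exp\left(\frac{v_2}{1 - \sigma}\right)}{\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)}
\end{aligned}$$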

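Observe that:

$$\begin{aligned}
\Pr(1 \mid \{1, 2, 3\}) &= \frac{\exp(v_1)}{\exp(v_1) + \left[(\exp(v_2))^{\frac{1}{1 - \sigma}} + (\exp(v_3))^{\frac{1}{1 - \sigma}}\right]^{1 - \sigma}} \\
&= \frac{\exp(v_1)}{\exp(v_1) + \exp(v_2) \left[1 + \left(\frac{\exp(v_3)}{\exp(v_2)}\right)^{\frac{1}{1 - \sigma}}\right]^{1 - \sigma}} \qquad (8)
\end{aligned}$$

Letting $\sigma \to 1$ and supposing $v_2 > v_3$, we get:

$$\left(\frac{\exp(v_3)}{\exp(v_2)}\right)^{\frac{1}{1 - \sigma}} = \left(\exp(v_3 - v_2)\right)^{\frac{1}{1 - \sigma}} \to 0 \quad \text{as } \sigma \to 1$$

and thus from equation (8), we have:

$$\Pr(1 \mid \{1, 2, 3\}) \to \frac{\exp(v_1)}{\exp(v_1) + \exp(v_2)} \qquad (9)$$

Conversely, if $v_3 > v_2$, just reverse the roles of 2 and 3, so:

$$\Pr(1 \mid \{1, 2, 3\}) \to \frac{\exp(v_1)}{\exp(v_1) + \exp(v_2) \left(\frac{\exp(v_3)}{\exp(v_2)}\right)} = \frac{\exp(v_1)}{\exp(v_1) + \exp(v_3)} \qquad (10)$$

Combining equations (9) and (10), we get, as $\sigma \to 1$:

$$\Pr(1 \mid \{1, 2, 3\}) \to \frac{\exp(v_1)}{\exp(v_1) + \max\{\exp(v_2), \exp(v_3)\}} \qquad (11)$$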
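The limit in equation (11) can be seen by evaluating equation (8) for values of $\sigma$ close to 1. The following sketch assumes hypothetical mean utilities; to avoid overflow in $\exp(v / (1 - \sigma))$, it works with the logarithm of the nest term, factoring out the larger of $v_2$ and $v_3$ as in equation (8).

```python
import math

def log_nest_term(v2, v3, sigma):
    """log of [exp(v2/(1-sigma)) + exp(v3/(1-sigma))]^(1-sigma), computed
    by factoring out the larger of v2, v3 to avoid overflow."""
    hi, lo = max(v2, v3), min(v2, v3)
    return hi + (1.0 - sigma) * math.log1p(math.exp((lo - hi) / (1.0 - sigma)))

def prob_1(v1, v2, v3, sigma):
    """Pr(1 | {1,2,3}) = exp(v1) / (exp(v1) + nest term), as in equation (8)."""
    return 1.0 / (1.0 + math.exp(log_nest_term(v2, v3, sigma) - v1))

v1, v2, v3 = 0.5, 1.0, 0.8               # hypothetical mean utilities

logit_three = math.exp(v1) / (math.exp(v1) + math.exp(v2) + math.exp(v3))
limit = math.exp(v1) / (math.exp(v1) + max(math.exp(v2), math.exp(v3)))

for sigma in (0.0, 0.5, 0.9, 0.999):
    print(f"sigma={sigma:5.3f}  Pr(1)={prob_1(v1, v2, v3, sigma):.6f}")
print(f"ordinary three-good logit: {logit_three:.6f}")
print(f"limit from equation (11):  {limit:.6f}")
```

At $\sigma = 0$ the function returns the ordinary three-good logit probability, and as $\sigma$ approaches 1 it approaches the two-good value in equation (11).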

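Equations (9), (10) and (11) imply that in this GEV model, the probability of choice 1 on addition of a choice 3 identical to choice 2 does not necessarily fall, as was the case in the ordinary conditional/multinomial logit case. Equation (9) shows that if the added choice 3 is highly correlated with choice 2 ($\sigma \to 1$) but yields less utility, then the probability in the three-choice case reduces to the binomial logit (the probability in the two-choice case), with choice 3 dropping out, as one would intuitively expect.

What about the probability of choice 2: how does this change when we add an identical choice 3 in this GEV model? To answer this, consider:

$$\begin{aligned}
\Pr(2 \mid \{1, 2, 3\}) &= \frac{\left[\left\{\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right\}^{-\sigma} (1 - \sigma)\, \frac{1}{1 - \sigma} \exp\left(\frac{v_2}{1 - \sigma}\right)\right]}{\exp(v_1) + \left\{\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right\}^{1 - \sigma}} \\
&= \frac{\exp\left(\frac{v_2}{1 - \sigma}\right) \left\{\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right\}^{-\sigma}}{\exp(v_1) + \left\{\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right\}^{1 - \sigma}} \\
&= \frac{\exp\left(\frac{v_2}{1 - \sigma}\right)}{\left(\exp(v_1) + \left\{\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right\}^{1 - \sigma}\right) \left(\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right)^{\sigma}}
\end{aligned}$$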

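When $\sigma = 0$, i.e. when there is no correlation between choice 2 and choice 3, we have the ordinary conditional/multinomial logit. Suppose $v_2 > v_3$ and $\sigma \to 1$. By appealing to the result derived in equation (11), we get:

$$\Pr(2 \mid \{1, 2, 3\}) = \frac{\exp\left(\frac{v_2}{1 - \sigma}\right)}{\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)} \cdot \frac{\left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right]^{1 - \sigma}}{\exp(v_1) + \left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_3}{1 - \sigma}\right)\right]^{1 - \sigma}} \qquad (12)$$

We know that for $v_2 > v_3$:

$$\left(\frac{\exp(v_3)}{\exp(v_2)}\right)^{\frac{1}{1 - \sigma}} = \left(\exp(v_3 - v_2)\right)^{\frac{1}{1 - \sigma}} \to 0 \quad \text{as } \sigma \to 1$$

and thus, from equation (12), we get:

$$\Pr(2 \mid \{1, 2, 3\}) \to \frac{\exp(v_2)}{\exp(v_1) + \exp(v_2)} \quad \text{as } \sigma \to 1$$

(One could derive a similar result by assuming that $v_3 > v_2$.) This equation tells us that in the GEV model, if choices 2 and 3 are very similar and the utility from 2 is greater than that from 3, then choice 3 gets disregarded (the same as in equation (9) earlier), which agrees with our intuition.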
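Equation (12) writes $\Pr(2 \mid \{1, 2, 3\})$ as a within-nest share times the probability of the nest. A minimal sketch of that decomposition, with hypothetical mean utilities and a function name of our own, is given below; the exponents are shifted by the larger of $v_2$ and $v_3$ so that values of $\sigma$ near 1 do not overflow.

```python
import math

def nested_logit_probs(v1, v2, v3, sigma):
    """(P1, P2, P3) for G = e^{v1} + [e^{v2/(1-s)} + e^{v3/(1-s)}]^(1-s)."""
    hi = max(v2, v3)
    # Within-nest weights, scaled by exp(-hi/(1-s)); the scaling cancels
    # in the shares below.
    w2 = math.exp((v2 - hi) / (1.0 - sigma))
    w3 = math.exp((v3 - hi) / (1.0 - sigma))
    share2, share3 = w2 / (w2 + w3), w3 / (w2 + w3)
    # log of the nest term [ ... ]^(1-s), undoing the scaling by hi.
    log_nest = hi + (1.0 - sigma) * math.log(w2 + w3)
    p_nest = 1.0 / (1.0 + math.exp(v1 - log_nest))
    return 1.0 - p_nest, share2 * p_nest, share3 * p_nest

v1, v2, v3 = 0.5, 1.0, 0.8               # hypothetical, with v2 > v3

for sigma in (0.0, 0.5, 0.9, 0.999):
    p1, p2, p3 = nested_logit_probs(v1, v2, v3, sigma)
    print(f"sigma={sigma:5.3f}  P1={p1:.4f}  P2={p2:.4f}  P3={p3:.4f}")

two_good = math.exp(v2) / (math.exp(v1) + math.exp(v2))
print(f"two-good logit Pr(2): {two_good:.4f}")
```

As $\sigma$ approaches 1 with $v_2 > v_3$, the within-nest share of choice 2 approaches one, $P_3$ approaches zero, and $P_2$ approaches the two-good logit probability.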

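Finally, supposing that $v_2 = v_3$, we get:

$$\begin{aligned}
G &= \exp(v_1) + \left[\exp\left(\frac{v_2}{1 - \sigma}\right) + \exp\left(\frac{v_2}{1 - \sigma}\right)\right]^{1 - \sigma} \\
&= \exp(v_1) + \left[2 \exp\left(\frac{v_2}{1 - \sigma}\right)\right]^{1 - \sigma} \\
&= \exp(v_1) + 2^{1 - \sigma} \exp(v_2)
\end{aligned}$$

and:

$$y_2 G_2 = \exp\left(\frac{v_2}{1 - \sigma}\right) \left[2 \exp\left(\frac{v_2}{1 - \sigma}\right)\right]^{-\sigma} = \exp(v_2)\, 2^{-\sigma}$$

Thus:

$$\Pr(2 \mid \{1, 2, 3\}) = \frac{\exp(v_2)\, 2^{-\sigma}}{\exp(v_1) + 2^{1 - \sigma} \exp(v_2)} = \frac{\exp(v_2)}{2^{\sigma} \exp(v_1) + 2 \exp(v_2)}$$

$$\lim_{\sigma \to 1} \Pr(2 \mid \{1, 2, 3\}) = \frac{1}{2} \cdot \frac{\exp(v_2)}{\exp(v_1) + \exp(v_2)}$$

This final equation tells us that if the characteristics are identical in the nested logit model, then in the limit $\sigma \to 1$ the probability, in the three-choice case, of choosing one of the two identical choices is equal to half the probability of choosing it in the two-choice case, which is again what is intuitively expected. Thus the nested logit (GEV) model is able to avoid the key limitation of the conditional/multinomial logit imposed by the IIA assumption.

References

[1] Maddala, Limited-Dependent and Qualitative Variables in Econometrics, 1983, chapters 2 & 3.
[2] Greene, Econometric Analysis, 2000, chapter 19.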