Binary choice. Michel Bierlaire

Size: px

Start display at page:

Download "Binary choice. Michel Bierlaire"

Primrose Evans
6 years ago
Views:

1 Binary choice Michel Bierlaire Transport and Mobility Laboratory School of Architecture, Civil and Environmental Engineering Ecole Polytechnique Fédérale de Lausanne M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 1 / 64

2 Outline Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 2 / 64

3 Outline Simple example Choice between Auto and Transit Data Time Time Time Time # auto transit Choice # auto transit Choice T T T C C C T T T C C C C T T C T C T T C M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 3 / 64

4 Model specification Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 4 / 64

5 Model specification Binary choice model Specification of the utilities U C = β 1 T C +ε C U T = β 1 T T +ε T where T C is the travel time with car (min) and T T the travel time with transit (min). Choice model where ε = ε T ε C. P(C {C,T}) = Pr(U C U T ) = Pr(β 1 T C +ε C β 1 T T +ε T ) = Pr(β 1 (T C T T ) ε T ε C ) = Pr(ε β 1 (T C T T )) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 5 / 64

6 Model specification Error term Three assumptions about the random variables ε T and ε C 1 What s their mean? 2 What s their variance? 3 What s their distribution? Note For binary choice, it would be sufficient to make assumptions about ε = ε T ε C. But we want to generalize later on. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 6 / 64

7 Model specification Error term The mean Change of variables Define E[ε C ] = β C and E[ε T ] = β T. Define ε C = ε C β C and ε T = ε T β T, so that E[ε C ] = E[ε T ] = 0. Choice model P(C {C,T}) = Pr(β 1 (T C T T ) ε T ε C ) = Pr(β 1 (T C T T ) ε T +β T ε C β C) = Pr(β 1 (T C T T )+(β C β T ) ε T ε C ) = Pr(β 1 (T C T T )+β 0 ε ) where β 0 = β C β T and ε = ε T ε C. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 7 / 64

8 Model specification Error term The mean Specification The means of the error terms can be included as parameters of the deterministic part. Only the mean of the difference of the error terms is identified. Alternative Specific Constant Equivalent specifications: U C = β 1 T C +ε C or U C = β 1 T C +β C +ε C U T = β 1 T T +β T +ε T U T = β 1 T T +ε T In practice: associate an alternative specific constant with all alternatives but one. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 8 / 64

9 Model specification Error term The mean Note Adding the same constant to all utility functions does not affect the choice model Pr(U C U T ) = Pr(U C +K U T +K) K R n. The bottom line... If the deterministic part of the utility functions contains an Alternative Specific Constant (ASC) for all alternatives but one, the mean of the error terms can be assumed to be zero without loss of generality. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 9 / 64

10 Model specification Error term The variance Utility is ordinal Utilities can be scaled up or down without changing the choice probability Pr(U C U T ) = Pr(αU C αu T ) α > 0 Link with the variance Var(αU C ) = α 2 Var(U C ) Var(αU T ) = α 2 Var(U T ) Variance is not identified As any α can be selected arbitrarily, any variance can be assumed. No way to identify the variance of the error terms from data. The scale has to be arbitrarily decided. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 10 / 64

11 Model specification Error term The distribution Assumption 1 ε T and ε C are the sum of many r.v. capturing unobservable attributes (e.g. mood, experience), measurement and specification errors. Central-limit theorem The sum of many i.i.d. random variables approximately follows a normal distribution: N(µ,σ 2 ). Assumed distribution ε C N(0,1), ε T N(0,1) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 11 / 64

12 Model specification Error term The Normal distribution N(µ,σ 2 ) Probability density function (pdf) f(t) = 1 σ (t µ) 2 2π e 2σ 2 Cumulative distribution function (CDF) P(c ε) = F(c) = No closed form. c f(t)dt M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 12 / 64

13 Model specification Error term The distribution ε = ε T ε C From the properties of the normal distribution, we have ε C N(0,1) ε T N(0,1) ε = ε T ε C N(0,2) As the variance is arbitrary, we may also assume ε C N(0,0.5) ε T N(0,0.5) ε = ε T ε C N(0,1) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 13 / 64

14 Model specification Error term The binary probit model Choice model P(C {C,T}) = Pr(β 1 (T C T T )+β 0 ε) = F ε (β 1 (T C T T )+β 0 ) The binary probit model Not a closed form expression P(C {C,T}) = 1 2π β1 (T C T T ) β 0 e 1 2 t2 dt M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 14 / 64

15 Model specification Error term The distribution Assumption 2 ε T and ε C are the maximum of many r.v. capturing unobservable attributes (e.g. mood, experience), measurement and specification errors. Gumbel theorem The maximum of many i.i.d. random variables approximately follows an Extreme Value distribution: EV(η, µ). Assumed distribution ε C EV(0,1), ε T EV(0,1). M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 15 / 64

16 Model specification Error term The Extreme Value distribution EV(η, µ) Probability density function (pdf) f(t) = µe µ(t η) e e µ(t η) Cumulative distribution function (CDF) P(c ε) = F(c) = c f(t)dt = e e µ(c η) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 16 / 64

17 Model specification Error term The Extreme Value distribution pdf EV(0,1) CDF EV(0,1) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 17 / 64

18 Model specification Error term The Extreme Value distribution Properties If ε EV(η,µ) then E[ε] = η + γ µ where γ is Euler s constant. and Var[ε] = π2 6µ 2 Euler s constant γ = lim k k i=1 1 i lnk = 0 e x lnxdx M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 18 / 64

19 Model specification Error term The distribution ε = ε T ε C From the properties of the extreme value distribution, we have ε C EV(0,1) ε T EV(0,1) ε Logistic(0,1) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 19 / 64

20 Model specification Error term The Logistic distribution: Logistic(η,µ) Probability density function (pdf) f(t) = µe µ(t η) (1+e µ(t η) ) 2 Cumulative distribution function (CDF) with µ > 0. P(c ε) = F(c) = c f(t)dt = 1 1+e µ(c η) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 20 / 64

21 Model specification Error term The binary logit model Choice model P(C {C,T}) = Pr(β 1 (T C T T )+β 0 ε) = F ε (β 1 (T C T T )+β 0 ) The binary logit model P(C {C,T}) = The binary logit model 1 1+e (β 1(T C T T )+β 0 ) = e β1tc+β0 e β 1T C +β 0 +e β 1 T T P(C {C,T}) = e V C e V C +e V T M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 21 / 64

22 Applying the model Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 22 / 64

23 Applying the model Back to the example Choice between Auto and Transit Data Time Time Time Time # auto transit Choice # auto transit Choice T T T C C C T T T C C C C T T C T C T T C M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 23 / 64

24 Applying the model First individual Parameters Let s assume that β 0 = 0.5 and β 1 = 0.1 Variables Let s consider the first observation: T C1 = 52.9 T T1 = 4.4 Choice = transit: y auto,1 = 0, y transit,1 = 1 Choice What s the probability given by the model that this individual indeed chooses transit? M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 24 / 64

25 Applying the model First individual Utility functions V C1 = β 1 T C1 = 5.29 V T1 = β 1 T T1 +β 0 = 0.06 Choice model P 1 (transit) = e V T1 e V T1 +e V C1 = Comments The model fits the observation very well. e 0.06 e e 5.29 = 1 Consistent with the assumption that travel time is the only explanatory variable. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 25 / 64

26 Applying the model Second individual Parameters Let s assume that β 0 = 0.5 and β 1 = 0.1 Variables T C2 = 4.1 T T2 = 28.5 Choice = transit: y auto,2 = 0, y transit,2 = 1 Choice What s the probability given by the model that this individual indeed chooses transit? M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 26 / 64

27 Applying the model Second individual Utility functions V C2 = β 1 T C2 = 0.41 V T2 = β 1 T T2 +β 0 = 2.35 Choice model P 2 (transit) = e V T2 e V T2 +e V C2 = Comment The model poorly fits the observation. e 2.35 e e 0.41 = 0.13 But the assumption is that travel time is the only explanatory variable. Still, the probability is not small. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 27 / 64

28 Applying the model Likelihood Two observations The probability that the model reproduces both observations is P 1 (transit)p 2 (transit) = 0.13 All observations The probability that the model reproduces all observations is Likelihood of the sample P 1 (transit)p 2 (transit)...p 21 (auto) = L = n (P n (auto) yauto,n P n (transit) y transit,n ) where y j,n is 1 if individual n has chosen alternative j, 0 otherwise M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 28 / 64

29 Applying the model Likelihood Likelihood Probability that the model fits all observations. It is a function of the parameters. Examples β 0 β 1 L M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 29 / 64

30 Applying the model Likelihood function β 0 β M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 30 / 64

31 Applying the model Likelihood function (zoom) β β M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 31 / 64

32 Maximum likelihood estimation Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 32 / 64

33 Maximum likelihood estimation Maximum likelihood estimation Estimators for the parameters Parameters that achieve the maximum likelihood max (P n (auto;β) yauto,n P n (transit;β) y transit,n ) β Log likelihood n Alternatively, we prefer to maximize the log likelihood max β ln n (P n (auto) yauto,n P n (transit) y transit,n ) = max β ln(y auto,n P n (auto)+y transit,n P n (transit)) n M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 33 / 64

34 Maximum likelihood estimation Maximum likelihood estimation β 0 β M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 34 / 64

35 Maximum likelihood estimation Maximum likelihood estimation β β M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 35 / 64

36 Maximum likelihood estimation Solving the optimization problem Unconstrained nonlinear optimization Iterative methods Designed to identify a local maximum When the function is concave, a local maximum is also a global maximum For binary logit, the log likelihood is concave Use the derivatives of the objective function M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 36 / 64

37 Maximum likelihood estimation Algorithm CFSQP in Biogeme β β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 37 / 64

38 Maximum likelihood estimation Algorithm CFSQP in Biogeme β β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 38 / 64

39 Maximum likelihood estimation Algorithm CFSQP in Biogeme β β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 39 / 64

40 Maximum likelihood estimation Algorithm CFSQP in Biogeme β β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 40 / 64

41 Maximum likelihood estimation Algorithm CFSQP in Biogeme β β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 41 / 64

42 Maximum likelihood estimation Algorithm CFSQP in Biogeme β β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 42 / 64

43 Maximum likelihood estimation Algorithm CFSQP in Biogeme β β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 43 / 64

44 Maximum likelihood estimation Algorithm CFSQP in Biogeme β β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 44 / 64

45 Maximum likelihood estimation Nonlinear optimization Things to be aware of... Iterative methods terminate when a given stopping criterion is verified, based on the fact that, if β is the optimum, L(β ) = 0 Stopping criteria usually vary across optimization packages, which may produce slightly different solutions They are usually using a parameter defining the required precision Most methods are sensitive to the conditioning of the problem. A well-conditioned problem is a problem for which all parameters have almost the same magnitude M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 45 / 64

46 Maximum likelihood estimation Nonlinear optimization Time in min. Time in sec. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 46 / 64

47 Maximum likelihood estimation Nonlinear optimization Time in min. Time in sec. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 47 / 64

48 Maximum likelihood estimation Nonlinear optimization Things to be aware of... Convergence may be very slow or even fail if the likelihood function is flat It happens when the model is not identifiable Structural flaw in the model (e.g. full set of alternative specific constants) Lack of variability in the data (all prices are the same across the sample) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 48 / 64

49 Maximum likelihood estimation Nonlinear optimization M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 49 / 64

50 Output of the estimation Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 50 / 64

51 Output of the estimation Output of the estimation Solution of max β R K L(β) β L(β ) Case study β0 = β1 = L(β0,β 1 ) = M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 51 / 64

52 Output of the estimation Second derivatives Information about the quality of the estimators 2 L 2 L β 2 β 1 1 β 2 2 L β 1 β K 2 L 2 L β 2 β 1 2 L β 2 L(β 2 β 2 2 β K ) = L 2 L β K β 1 β K β 2 2 L βk 2 2 L(β ) 1 is a consistent estimator of the variance-covariance matrix of the estimates M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 52 / 64

53 Output of the estimation Statistics Statistics on the parameters Summary statistics L(β ) = L(0) = Parameter Value Std Err. t-test β β (L(0) L(β )) = ρ 2 = 0.576, ρ 2 = M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 53 / 64

54 Output of the estimation Summary statistics Null log likelihood L(0) sample log likelihood with a trivial model where all parameters are zero, that is a model always predicting P(1 {1,2}) = P(2 {1,2}) = 1 2 Purely a function of sample size L(0) = log( 1 2N) = Nlog(2) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 54 / 64

55 Output of the estimation Summary statistics Likelihood ratio 2(L(0) L(β )) ( L ) (0) log L (β = log(l (0)) log(l (β )) = L(0) L(β ) ) Likelihood ratio test H 0 : the two models are equivalent Under H 0, 2(L(0) L(β )) is asymptotically distributed as χ 2 with K degrees of freedom. Similar to the F test in regression models M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 55 / 64

56 Output of the estimation Summary statistics Rho (bar) squared ρ 2 ρ 2 = 1 L(β ) L(0) Similar to the R 2 in regression models ρ 2 ρ 2 = 1 L(β ) K L(0) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 56 / 64

57 Back to the scale Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 57 / 64

58 Back to the scale Back to the scale Comparing models Arbitrary scale may be problematic when comparing models Binary probit: σ 2 = Var(ε i ε j ) = 1 Binary logit: Var(ε i ε j ) = π 2 /(3µ) = π 2 /3 Var(αU) = α 2 Var (U). Scaled logit coeff. are π/ 3 larger than scaled probit coeff. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 58 / 64

59 Back to the scale Comparing models Estimation results Note: π/ Probit Logit Probit * π/ 3 L β β M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 59 / 64

60 Appendix Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 60 / 64

61 Appendix Maximum likelihood for binary logit Let C n = {i,j} Let y in = 1 if i is chosen by n, 0 otherwise Let y jn = 1 if j is chosen by n, 0 otherwise Obviously, y in = 1 y jn Log-likelihood of the sample N ( e V in e V ) jn y in ln e V +y in +e V jn ln jn e V in +e V jn n=1 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 61 / 64

62 Appendix Maximum likelihood for binary logit P n (i) = e V in e V in +e V jn lnp n (i) = V in ln(e V in +e V jn ) lnp n (i) V in = 1 e V in e V in +e V jn = 1 P n (i) = P n (j) lnp n (i) V jn = evjn e V in +e V jn = P n (j) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 62 / 64

63 Appendix Maximum likelihood for binary logit L V in L θ = L V in V in θ + L V jn V jn θ = N n=1 ( ) lnp y n(i) lnp in V in +y n(j) jn V in = N n=1 (y inp n (j) y jn P n (i)) = N n=1 (y in(1 P n (i)) (1 y in )P n (i)) = N n=1 (y in P n (i)) = N n=1 (y jn P n (j)) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 63 / 64

64 Appendix Maximum likelihood for binary logit L θ = N (y in P n (i)) V in θ +(y jn P n (j)) V jn θ n=1 = N (y in P n (i))( V in θ V jn θ ) n=1 If V in = k θ kx ink, then L θ k = N (y in P n (i))(x ink x jnk ) n=1 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 64 / 64

Optimization and Simulation

Optimization and Simulation Variance reduction Michel Bierlaire Transport and Mobility Laboratory School of Architecture, Civil and Environmental Engineering Ecole Polytechnique Fédérale de Lausanne M.