Estimation of mixed generalized extreme value models


1 Estimation of mixed generalized extreme value models
Michel Bierlaire, Operations Research Group ROSO, Institute of Mathematics, EPFL
Katholieke Universiteit Leuven, November 2004

2 Introduction
"It is our choices that show what we truly are, far more than our abilities." (Albus Dumbledore)

3 Introduction

4 Introduction
Nobel Prize 2000 to D. McFadden "for his development of theory and methods for analyzing discrete choice"

5 Introduction
Discrete choice models: $P(i \mid C_n)$ where $C_n = \{1, \ldots, J\}$
Random utility models: $U_{in} = V_{in} + \varepsilon_{in}$ and $P(i \mid C_n) = P(U_{in} \geq U_{jn},\ j = 1, \ldots, J)$
Utility is a latent concept

6 Multinomial Logit Model
Assumption: the $\varepsilon_{in}$ are the maximum of many random variables capturing unobservable attributes (e.g. mood, experience), measurement errors and specification errors.
Gumbel theorem: the maximum of many i.i.d. random variables (with a tail) approximately follows a Gumbel distribution.
$\varepsilon_{in} \sim \text{Gumbel}(0, \mu)$

7 Multinomial Logit Model
Gumbel$(\eta, \mu)$, with $\mu > 0$:
$f(t) = \mu e^{-\mu(t - \eta)} e^{-e^{-\mu(t - \eta)}}$
If $\varepsilon \sim \text{Gumbel}(\eta, \mu)$, then
$P(c \geq \varepsilon) = F(c) = \int_{-\infty}^{c} f(t)\,dt = e^{-e^{-\mu(c - \eta)}}$

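8 Multinomial Logit Model
If $\varepsilon \sim \text{Gumbel}(\eta, \mu)$, then
$E[\varepsilon] = \eta + \frac{\gamma}{\mu}$ and $\text{Var}[\varepsilon] = \frac{\pi^2}{6\mu^2}$
where $\gamma = \lim_{k \to \infty} \left( \sum_{i=1}^{k} \frac{1}{i} - \ln k \right) \approx 0.5772$ is Euler's constant.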
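The two moments above can be checked by simulation. The following is a minimal sketch, not part of the original slides: it draws Gumbel variates by inverting the CDF $F(c) = e^{-e^{-\mu(c-\eta)}}$ and compares the sample mean and variance with the formulas. The function names, the seed and the parameter values are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def gumbel_cdf(c, eta=0.0, mu=1.0):
    """F(c) = exp(-exp(-mu * (c - eta)))."""
    return math.exp(-math.exp(-mu * (c - eta)))


def gumbel_draw(rng, eta=0.0, mu=1.0):
    """Inverse-CDF draw: solve F(c) = u for c, with u uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return eta - math.log(-math.log(u)) / mu


rng = random.Random(0)
eta, mu = 1.0, 2.0  # illustrative location and scale
draws = [gumbel_draw(rng, eta, mu) for _ in range(200_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

theoretical_mean = eta + EULER_GAMMA / mu
theoretical_var = math.pi ** 2 / (6 * mu ** 2)
```

With a few hundred thousand draws, `sample_mean` and `sample_var` should agree with `theoretical_mean` and `theoretical_var` to about two decimal places.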

9 Multinomial Logit Model
The difference of two independent Gumbel random variables with the same scale follows a logistic distribution.
We have $P(i \mid \{i, j\}) = P(V_i + \varepsilon_i \geq V_j + \varepsilon_j) = P(V_i - V_j \geq \varepsilon_j - \varepsilon_i)$
We obtain the multinomial logit model
$P(i \mid C_n) = \frac{e^{V_{in}}}{\sum_{j \in C_n} e^{V_{jn}}}$
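As an illustration of the logit formula, here is a small sketch (function names and utility values are assumptions made for this example). It evaluates the multinomial logit probabilities and checks that, with two alternatives, they reduce to the logistic CDF evaluated at $V_i - V_j$.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    v_max = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(v - v_max) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def binary_logit(v_i, v_j):
    """Logistic CDF evaluated at V_i - V_j."""
    return 1.0 / (1.0 + math.exp(-(v_i - v_j)))


probs = logit_probabilities([1.0, 0.5, -0.2])  # illustrative utilities
two_alt = logit_probabilities([1.0, 0.5])
```

Subtracting the largest utility before exponentiating leaves the probabilities unchanged and avoids overflow for large utilities.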

10 Multinomial Logit Model
Multinomial logit model: $\varepsilon_{in}$ i.i.d. Gumbel
Gumbel is an Extreme Value distribution
$\varepsilon_{in}$ is the maximum of many random variables capturing unobservable attributes, measurement errors and specification errors.
Key assumption: independence

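11 Relaxing the independence assumption
$\begin{pmatrix} U_{1n} \\ \vdots \\ U_{Jn} \end{pmatrix} = \begin{pmatrix} V_{1n} \\ \vdots \\ V_{Jn} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1n} \\ \vdots \\ \varepsilon_{Jn} \end{pmatrix}$
that is, $U_n = V_n + \varepsilon_n$, and $\varepsilon_n$ is a vector of random variables.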

12 Relaxing the independence assumption
$\varepsilon_n \sim N(0, \Sigma)$: multinomial probit model
No closed form for the multifold integral
Numerical integration is computationally infeasible
Extensions of the multinomial logit model:
Nested logit model
Generalized Extreme Value (GEV) models

13 GEV models
Family of models proposed by McFadden (1978)
Idea: a model is generated by a function $G : \mathbb{R}^J \to \mathbb{R}$
From $G$, we can build:
The cumulative distribution function (CDF) of $\varepsilon_n$
The probability model
The expected maximum utility
Not equivalent to GEV in statistics

14 GEV models
1. $G$ is homogeneous of degree $\mu > 0$, that is, $G(\alpha x) = \alpha^{\mu} G(x)$,
2. $\lim_{x_i \to +\infty} G(x_1, \ldots, x_i, \ldots, x_J) = +\infty$, for each $i = 1, \ldots, J$,
3. the $k$th partial derivative with respect to $k$ distinct $x_i$ is non-negative if $k$ is odd and non-positive if $k$ is even, i.e., for all distinct indices $i_1, \ldots, i_k \in \{1, \ldots, J\}$, we have
$(-1)^k \frac{\partial^k G}{\partial x_{i_1} \cdots \partial x_{i_k}}(x) \leq 0, \quad \forall x \in \mathbb{R}^J_+.$

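15 GEV models
Cumulative distribution function: $F(\varepsilon_1, \ldots, \varepsilon_J) = e^{-G(e^{-\varepsilon_1}, \ldots, e^{-\varepsilon_J})}$
Probability: $P(i \mid C) = \frac{e^{V_i + \ln G_i(e^{V_1}, \ldots, e^{V_J})}}{\sum_{j \in C} e^{V_j + \ln G_j(e^{V_1}, \ldots, e^{V_J})}}$ with $G_i = \frac{\partial G}{\partial x_i}$. This is a closed form.
Expected maximum utility: $V_C = \frac{\ln G(e^{V_1}, \ldots, e^{V_J}) + \gamma}{\mu}$, where $\gamma$ is Euler's constant.
Note: $P(i \mid C) = \frac{\partial V_C}{\partial V_i}$.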
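The closing note of this slide can be derived from the homogeneity of $G$ alone. The following short derivation is a sketch added for clarity, writing $x_j = e^{V_j}$ and using Euler's theorem for homogeneous functions:

```latex
\begin{align*}
\sum_{j \in C} x_j G_j(x) &= \mu\, G(x)
  && \text{(Euler's theorem, $G$ homogeneous of degree $\mu$)} \\
P(i \mid C) &= \frac{x_i G_i(x)}{\sum_{j \in C} x_j G_j(x)}
  = \frac{x_i G_i(x)}{\mu\, G(x)} \\
\frac{\partial V_C}{\partial V_i}
  &= \frac{1}{\mu} \frac{\partial \ln G(x)}{\partial V_i}
  = \frac{1}{\mu} \frac{G_i(x)}{G(x)} \frac{\partial x_i}{\partial V_i}
  = \frac{x_i G_i(x)}{\mu\, G(x)} = P(i \mid C)
\end{align*}
```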

16 GEV models
Example: $G(e^{V_1}, \ldots, e^{V_J}) = \sum_{i=1}^{J} e^{\mu V_i}$
$P(i) = \frac{e^{V_i + \ln G_i(e^{V_1}, \ldots, e^{V_J})}}{\sum_{j \in C} e^{V_j + \ln G_j(e^{V_1}, \ldots, e^{V_J})}}$ with $G_i(x) = \mu x_i^{\mu - 1}$
$e^{V_i + \ln G_i(e^{V_1}, \ldots, e^{V_J})} = e^{V_i + \ln \mu + (\mu - 1) \ln e^{V_i}} = e^{\ln \mu + \mu V_i}$
$P(i) = \frac{e^{\ln \mu + \mu V_i}}{\sum_{j \in C} e^{\ln \mu + \mu V_j}} = \frac{e^{\mu V_i}}{\sum_{j \in C} e^{\mu V_j}}$
This is the multinomial logit model.
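The example can also be verified numerically. The sketch below (helper names and numerical values are assumptions for illustration) evaluates the general GEV probability formula from a user-supplied gradient of $G$ and checks that $G(x) = \sum_i x_i^{\mu}$ reproduces the logit probabilities with scale $\mu$.

```python
import math


def gev_probabilities(utilities, grad_G):
    """P(i) = exp(V_i + ln G_i(x)) / sum_j exp(V_j + ln G_j(x)), with x_j = exp(V_j).

    grad_G(x) must return the list of partial derivatives G_i(x), all positive.
    """
    x = [math.exp(v) for v in utilities]
    g = grad_G(x)
    terms = [math.exp(v + math.log(g_i)) for v, g_i in zip(utilities, g)]
    total = sum(terms)
    return [t / total for t in terms]


def make_mnl_gradient(mu):
    """Gradient of G(x) = sum_i x_i**mu, i.e. G_i(x) = mu * x_i**(mu - 1)."""
    return lambda x: [mu * x_i ** (mu - 1.0) for x_i in x]


V = [0.3, -1.0, 0.8]  # illustrative utilities
mu = 1.7              # illustrative homogeneity parameter

p_gev = gev_probabilities(V, make_mnl_gradient(mu))

# Direct evaluation of the logit formula with scale mu, for comparison.
scaled = [math.exp(mu * v) for v in V]
p_mnl = [s / sum(scaled) for s in scaled]
```

Other GEV models can be plugged in by passing a different gradient function, provided it satisfies the GEV conditions.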

17 GEV models
Multinomial logit model
Nested logit model
Cross-nested logit model
and more...

18 GEV models
Closed-form probability model
Provides a great deal of flexibility
Formulation not in terms of correlations
Requires heavy proofs

19 Properties of GEV
Let $\mathbb{R}^{J_i}$, $i = 1, \ldots, p$, be $p$ subspaces spanning $\mathbb{R}^J$. For any vector $y \in \mathbb{R}^J$, $[y]_i$ denotes the projection of $y$ on $\mathbb{R}^{J_i}$. It is assumed that the projection is such that all entries of $[y]_i$ are strictly positive.
Let $G_i : \mathbb{R}^{J_i}_+ \to \mathbb{R}$, $i = 1, \ldots, p$, be $\mu$-GEV functions. Then, the function
$G : \mathbb{R}^J_+ \to \mathbb{R} : y \mapsto G(y) = \sum_{i=1}^{p} \alpha_i G_i([y]_i)$
is also a $\mu$-GEV function if $\alpha_i > 0$, $i = 1, \ldots, p$.

20 Properties of GEV
Let $G : \mathbb{R}^J_+ \to \mathbb{R}$ be a $\mu$-GEV function. Then $G^{\beta}$ is a $(\mu\beta)$-GEV function if $0 < \beta \leq 1$.

21 Properties of GEV
Let $\mathbb{R}^{J_i}$, $i = 1, \ldots, p$, be $p$ subspaces spanning $\mathbb{R}^J$. For any vector $y \in \mathbb{R}^J$, denote by $[y]_i$ the projection of $y$ on $\mathbb{R}^{J_i}$.
Let $G_i : \mathbb{R}^{J_i}_+ \to \mathbb{R}$, $i = 1, \ldots, p$, be $\mu_i$-GEV functions. Then, the function
$G : \mathbb{R}^J_+ \to \mathbb{R} : y \mapsto G(y) = \sum_{i=1}^{p} \alpha_i G_i([y]_i)^{\mu / \mu_i}$
is a $\mu$-GEV function if $\alpha_i > 0$ and $0 < \mu \leq \mu_i$, $i = 1, \ldots, p$.

22 Properties of GEV
Moreover,
$P(i \mid C) = \frac{e^{V_i + \ln G_i(e^{V_1}, \ldots, e^{V_J})}}{\sum_{j \in C} e^{V_j + \ln G_j(e^{V_1}, \ldots, e^{V_J})}}$
So,
$\alpha_i e^{V_i + \ln G_i(e^{V_1}, \ldots, e^{V_J})} = e^{V_i + \ln G_i(e^{V_1 + \ln \alpha_i}, \ldots, e^{V_J + \ln \alpha_i})}$
$\frac{\alpha_i P(i \mid C)}{\sum_{j \in B} \alpha_j P(j \mid C)} = \frac{e^{V_i + \ln G_i(e^{V_1 + \ln \alpha_i}, \ldots, e^{V_J + \ln \alpha_i})}}{\sum_{j \in B} e^{V_j + \ln G_j(e^{V_1 + \ln \alpha_j}, \ldots, e^{V_J + \ln \alpha_j})}}$

23 Applications
These properties have practical consequences:
Network GEV
Sampling strategy

24 Network GEV
Extension of the tree representation for Nested Logit
Investigate new GEV models
Provide the proof once and for all

25 Network GEV
Let $(V, E)$ be a network with link parameters $\alpha_{(i,j)} \geq 0$.
Assumptions:
1. No circuit.
2. One node without predecessor: root.
3. $J$ nodes without successor: alternatives.
4. For each node $v_i$, there exists at least one path $(i_0, \ldots, i_P)$ from the root to $v_i$ such that $\prod_{k=1}^{P} \alpha_{(i_{k-1}, i_k)} > 0$.

26 Network GEV
For each node $v_i$, we define
a set of indices $I_i \subseteq \{1, \ldots, J\}$ of $J_i$ relevant alternatives,
a homogeneous function $G_i : \mathbb{R}^{J_i} \to \mathbb{R}$, and
a parameter $\mu_i$.
Recursive definition of $I_i$:
$I_i = \{i\}$ for alternatives,
$I_i = \bigcup_{j \in \text{succ}(i)} I_j$ for all other nodes.

27 Network GEV
Recursive definition of $G_i$:
For alternatives: $G_i : \mathbb{R} \to \mathbb{R} : G_i(x_i) = x_i^{\mu_i}$, $i = 1, \ldots, J$
For all others: $G_i : \mathbb{R}^{J_i} \to \mathbb{R} : G_i(x) = \sum_{j \in \text{succ}(i)} \alpha_{(i,j)} G_j(x)^{\mu_i / \mu_j}$

28 Network GEV
Example: Cross-Nested Logit
[Figure: network with root 0, nest nodes 4 and 5, and alternatives 1, 2, 3]
$G = \sum_{i=4,5} \alpha_{0i} \left( \alpha_{i1} y_1^{\mu_i} + \alpha_{i2} y_2^{\mu_i} + \alpha_{i3} y_3^{\mu_i} \right)^{\mu_0 / \mu_i}$
In general form:
$G = \sum_{m} \left( \sum_{j \in C} \alpha_{jm} y_j^{\mu_m} \right)^{\mu / \mu_m}$
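The recursive definition of $G_i$ translates directly into a recursive function. The sketch below is an illustration under stated assumptions, not the BIOGEME implementation: the dictionary-based network encoding, the function name and all numerical values are hypothetical. It builds the cross-nested example (root 0, nests 4 and 5, alternatives 1 to 3) and compares the recursion with the closed-form expression.

```python
def network_gev_G(node, y, succ, alpha, mu):
    """Evaluate G_node(y) recursively on an acyclic network.

    succ[node]   : list of successor nodes (missing or empty for an alternative)
    alpha[(i, j)]: link parameter of arc (i, j)
    mu[node]     : homogeneity parameter of the node
    y[node]      : value y_j for each alternative node j
    """
    if not succ.get(node):  # alternative: G_i(y_i) = y_i ** mu_i
        return y[node] ** mu[node]
    return sum(
        alpha[(node, j)] * network_gev_G(j, y, succ, alpha, mu) ** (mu[node] / mu[j])
        for j in succ[node]
    )


# Cross-nested example: root 0 -> nests 4, 5 -> alternatives 1, 2, 3.
succ = {0: [4, 5], 4: [1, 2, 3], 5: [1, 2, 3]}
alpha = {(0, 4): 1.0, (0, 5): 0.8,
         (4, 1): 0.5, (4, 2): 0.3, (4, 3): 0.2,
         (5, 1): 0.1, (5, 2): 0.4, (5, 3): 0.5}
mu = {0: 1.0, 4: 1.5, 5: 2.0, 1: 3.0, 2: 3.0, 3: 3.0}
y = {1: 1.2, 2: 0.7, 3: 2.0}

G_recursive = network_gev_G(0, y, succ, alpha, mu)

# Closed form: sum_i alpha_0i * (sum_j alpha_ij * y_j ** mu_i) ** (mu_0 / mu_i)
G_closed = sum(
    alpha[(0, m)] * sum(alpha[(m, j)] * y[j] ** mu[m] for j in (1, 2, 3)) ** (mu[0] / mu[m])
    for m in (4, 5)
)

# Homogeneity of degree mu_0: G(2y) = 2 ** mu_0 * G(y)
G_doubled = network_gev_G(0, {j: 2.0 * v for j, v in y.items()}, succ, alpha, mu)
```

The parameters $\mu_j$ attached to the alternatives cancel in the recursion, since $(y_j^{\mu_j})^{\mu_i/\mu_j} = y_j^{\mu_i}$, which is why they do not appear in the closed form.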

29 Network GEV
Daly & Bierlaire (2003)
GEV calculus
Possibility to define new GEV models
No more proof needed for Network GEV

30 Sampling
Population probability of choice $i \in C$ and socio-economic characteristics $z$: $P_i(z, \beta)\,p(z)$.
Probability of being sampled: $R(i, z)$
exogenous sample: $R(i, z) = R(z)$
choice-based sample: $R(i, z) = R(i)$
Sampling of alternatives: $A(z) = \{j \in C \mid R(j, z) > 0\}$
Let's draw $B$, a subset of $A(z)$, with probability $S(B \mid i, z)$.
Analyze the choice as if it were limited to $B$.

31 Sampling
Contribution to the likelihood:
$P(i \mid z, B, \beta) = \frac{P_i(z, \beta) R(i, z) S(B \mid i, z)}{\sum_{j \in B} P_j(z, \beta) R(j, z) S(B \mid j, z)}$
If $P_i(z, \beta)$ is given by a GEV model, we obtain
$P(i \mid z, B, \beta) = \frac{R(i, z) S(B \mid i, z)\, e^{V_i + \ln G_i(e^{V_1}, \ldots, e^{V_J})}}{\sum_{j \in B} R(j, z) S(B \mid j, z)\, e^{V_j + \ln G_j(e^{V_1}, \ldots, e^{V_J})}}$
Because of the property shown earlier, let $\alpha(i, z) = \ln R(i, z) + \ln S(B \mid i, z)$. Then
$P(i \mid z, B, \beta) = \frac{e^{V_i + \ln G_i(e^{V_1 + \alpha(i, z)}, \ldots, e^{V_J + \alpha(i, z)})}}{\sum_{j \in B} e^{V_j + \ln G_j(e^{V_1 + \alpha(j, z)}, \ldots, e^{V_J + \alpha(j, z)})}}$
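For the special case of the multinomial logit model with $\mu = 1$ (where every $G_i$ equals 1), the effect of the sampling terms is easy to check numerically: the conditional probability equals a logit restricted to $B$ whose utilities are shifted by $\ln R(i, z) + \ln S(B \mid i, z)$. The sketch below uses hypothetical values for $R$ and $S$ and covers only this logit case, not the general GEV result.

```python
import math


def logit(V):
    """Multinomial logit probabilities for a list of utilities."""
    v_max = max(V)
    exps = [math.exp(v - v_max) for v in V]
    total = sum(exps)
    return [e / total for e in exps]


V_full = [0.5, -0.3, 1.1, 0.0]   # utilities over the full choice set C
B = [0, 2, 3]                    # sampled subset of alternatives
R = {0: 0.2, 2: 0.5, 3: 0.1}     # hypothetical sampling probabilities R(i, z)
S = {0: 0.6, 2: 0.6, 3: 0.3}     # hypothetical probabilities S(B | i, z)

# Direct evaluation: P_i R_i S_i / sum_{j in B} P_j R_j S_j
P_full = logit(V_full)
weights = {i: P_full[i] * R[i] * S[i] for i in B}
total_weight = sum(weights.values())
direct = [weights[i] / total_weight for i in B]

# Same quantity as a logit over B with shifted utilities.
shifted = logit([V_full[i] + math.log(R[i]) + math.log(S[i]) for i in B])
```

In this logit case the correction acts as an additive constant in each utility, which is consistent with the statement that only the constants are affected.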

32 Sampling
The model can be estimated as if a pure random sampling strategy were used.
Only the constants are affected.
Restrictions apply to the sampling of alternatives.

33 Mixed GEV
GEV models cannot handle all possible correlation structures
Cannot capture heteroscedasticity and heterogeneity
Necessity of mixing the model

34 Mixed GEV
$U_n = V_n + \varepsilon_n$
$\varepsilon_n$ is compliant with GEV theory
$V_n$ contains random parameters: $V_n = \beta^T X_n$ where $\beta \sim N(\bar{\beta}, \Sigma)$
Using the Cholesky factorization, we have $\beta = \bar{\beta} + P\zeta$, where $\Sigma = P P^T$ and the $\zeta$ are i.i.d. standard normal variates.
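The Cholesky construction can be sketched in a few lines. This is an illustrative sketch using only the standard library; the function names, the covariance matrix and the mean vector are assumptions made for the example.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular P with P P^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(P[i][k] * P[j][k] for k in range(j))
            if i == j:
                P[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                P[i][j] = (sigma[i][j] - s) / P[j][j]
    return P


def draw_beta(rng, beta_bar, P):
    """One draw of beta = beta_bar + P zeta, with zeta i.i.d. standard normal."""
    zeta = [rng.gauss(0.0, 1.0) for _ in beta_bar]
    return [b + sum(P[i][k] * zeta[k] for k in range(i + 1))
            for i, b in enumerate(beta_bar)]


sigma = [[1.0, 0.6], [0.6, 2.0]]  # illustrative covariance matrix
beta_bar = [0.5, -1.0]            # illustrative mean vector
P = cholesky(sigma)

rng = random.Random(1)
draws = [draw_beta(rng, beta_bar, P) for _ in range(50_000)]
```

Estimating the entries of $P$ rather than those of $\Sigma$ keeps the implied covariance matrix positive semidefinite for any parameter values.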

35 Mixed GEV
McFadden & Train (2000): "Under mild regularity conditions, any discrete choice model derived from random utility maximization has choice probabilities that can be approximated as closely as one pleases by a Mixed MNL model."
Why bother with Mixed GEV?

36 Mixed GEV
GEV has a closed-form formulation
Mixed models require simulated maximum likelihood estimation
Capture as much as possible of the correlation using GEV
Use the mixing distribution for the rest
Issue: estimation
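To illustrate why mixed models call for simulation, the sketch below approximates a mixed choice probability by averaging kernel probabilities over random draws of a coefficient. For brevity the kernel is a plain logit rather than a general GEV model, and the single normally distributed coefficient, the attribute values and the number of draws are all assumptions made for this example.

```python
import math
import random


def logit(V):
    """Multinomial logit probabilities for a list of utilities."""
    v_max = max(V)
    exps = [math.exp(v - v_max) for v in V]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_probabilities(x, beta_bar, beta_std, n_draws, rng):
    """Average the kernel probabilities over draws of beta ~ N(beta_bar, beta_std**2)."""
    sums = [0.0] * len(x)
    for _ in range(n_draws):
        beta = beta_bar + beta_std * rng.gauss(0.0, 1.0)
        for i, p in enumerate(logit([beta * x_i for x_i in x])):
            sums[i] += p
    return [s / n_draws for s in sums]


x = [1.0, 2.5, 4.0]  # hypothetical attribute (e.g. cost) of three alternatives

p_hat = simulated_probabilities(x, beta_bar=-0.8, beta_std=0.5,
                                n_draws=2000, rng=random.Random(42))

# With zero standard deviation the simulation collapses to the kernel itself.
p_fixed = simulated_probabilities(x, beta_bar=-0.8, beta_std=0.0,
                                  n_draws=10, rng=random.Random(42))
```

The simulated log-likelihood is then the sum over observations of the logarithm of the simulated probability of the chosen alternative.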

37 BIOGEME
Motivations
The GEV family must be explored
Complicated implementation
No appropriate software package
Most researchers use commercial packages: LIMDEP, ALOGIT, HieLoW, or Gauss, Matlab, SAS
Freeware: Kenneth Train (but based on Gauss)

38 BIOGEME
Objectives
Maximum likelihood estimation of a wide variety of GEV models
Use various nonlinear optimization algorithms
Open source
Designed for researchers
Flexible and easily extensible

39 BIOGEME
BIerlaire's Optimization toolbox for GEV Models Estimation
Development:
Version 0.0: July 2,
Version 0.7: December 15, 2003
Version 0.8: March 19, 2004
Version 1.0: September 17, 2004

40 BIOGEME
Input files
mymodel.mod: model specification
sample.dat: sample data
mymodel.par: general control of the package
Output files
mymodel.html: estimated parameters + statistics
mymodel.sta: sample statistics
technical and debugging reports

41 GEV models
Available in BIOGEME:
Multinomial logit model
Nested logit model
Cross-nested logit model
Network GEV model

42 Heterogeneity
GEV models are homoscedastic
Assume there are two different groups such that
$U_{in_1} = V_{in_1} + \varepsilon_{in_1}$
$U_{in_2} = V_{in_2} + \varepsilon_{in_2}$
and $\text{Var}(\varepsilon_{in_2}) = \alpha^2\, \text{Var}(\varepsilon_{in_1})$
Then we prefer the model
$\alpha U_{in_1} = \alpha V_{in_1} + \alpha \varepsilon_{in_1}$
$U_{in_2} = V_{in_2} + \varepsilon_{in_2}$

43 Heterogeneity
If $V_{in_1}$ is linear-in-parameters, that is,
$V_{in_1} = \sum_j \beta_j x_{jin_1}$
then
$\alpha V_{in_1} = \sum_j \alpha \beta_j x_{jin_1}$
is nonlinear.

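44 Nonlinear utility functions
Other types of nonlinearities:
Box-Cox and Box-Tukey transforms: $\beta \frac{(x + \alpha)^{\lambda} - 1}{\lambda}$, where $\beta$, $\alpha$ and $\lambda$ must be estimated
Continuous market segmentation. Example: the cost parameter varies with income,
$\beta_{\text{cost}} = \hat{\beta}_{\text{cost}} \left( \frac{\text{inc}}{\text{inc}_{\text{ref}}} \right)^{\lambda}$ with $\lambda = \frac{\partial \beta_{\text{cost}}}{\partial\, \text{inc}} \frac{\text{inc}}{\beta_{\text{cost}}}$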
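Both nonlinear specifications are straightforward to evaluate. The sketch below is illustrative only: the function names and numerical values are assumptions, and the logarithmic branch is the usual limit of the transform as $\lambda \to 0$. It also checks numerically that $\lambda$ is the elasticity of the cost coefficient with respect to income.

```python
import math


def box_tukey(x, beta, alpha, lam):
    """beta * ((x + alpha)**lam - 1) / lam, with the log limit as lam -> 0."""
    if abs(lam) < 1e-12:
        return beta * math.log(x + alpha)
    return beta * ((x + alpha) ** lam - 1.0) / lam


def cost_coefficient(beta_hat, income, income_ref, lam):
    """beta_cost = beta_hat * (income / income_ref) ** lam."""
    return beta_hat * (income / income_ref) ** lam


# Elasticity check by central finite differences (illustrative values).
beta_hat, income_ref, lam = -1.2, 40.0, 0.3
income, h = 60.0, 1e-4
derivative = (cost_coefficient(beta_hat, income + h, income_ref, lam)
              - cost_coefficient(beta_hat, income - h, income_ref, lam)) / (2 * h)
elasticity = derivative * income / cost_coefficient(beta_hat, income, income_ref, lam)
```

Setting `alpha = 0` in `box_tukey` gives the Box-Cox transform, and `lam = 1` gives a specification that is linear in `x` up to a constant.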

45 Mixed GEV models
$V_n$ contains random parameters: $V_n = f(\beta_f, \beta_N, \beta_U, X_n)$, where
$\beta_f$ are deterministic
$\beta_N \sim N(\bar{\beta}_N, \Sigma)$
$(\beta_U)_i \sim U(a_i, b_i)$
Because $f$ is nonlinear, distributions other than normal and uniform are possible.

46 Mixed GEV models
Lognormal: if $\beta$ is normal, then $e^{\beta}$ is lognormal.
Triangular: if $\beta_1$ and $\beta_2$ are independent uniform $[0, 1]$, then $\frac{1}{2}(\beta_1 + \beta_2)$ is triangular.
$S_B$ distribution: if $\beta$ is normal, then $\frac{e^{\beta}}{1 + e^{\beta}}$ follows an $S_B$ distribution between 0 and 1.
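Each of these distributions is obtained by transforming normal or uniform draws, which can be sketched as follows. Function names, seed and parameter values are assumptions chosen for illustration.

```python
import math
import random


def lognormal_draw(rng, mean, std):
    """exp(beta) with beta ~ N(mean, std**2)."""
    return math.exp(rng.gauss(mean, std))


def triangular_draw(rng):
    """Average of two independent uniform [0, 1] draws: triangular on [0, 1]."""
    return 0.5 * (rng.random() + rng.random())


def sb_draw(rng, mean, std):
    """exp(beta) / (1 + exp(beta)) with beta ~ N(mean, std**2): S_B on (0, 1)."""
    beta = rng.gauss(mean, std)
    return math.exp(beta) / (1.0 + math.exp(beta))


rng = random.Random(7)
n = 20_000
lognormal_sample = [lognormal_draw(rng, 0.0, 0.5) for _ in range(n)]
triangular_sample = [triangular_draw(rng) for _ in range(n)]
sb_sample = [sb_draw(rng, 0.0, 1.5) for _ in range(n)]

triangular_mean = sum(triangular_sample) / n
```

Because the transformation is applied inside the utility function, the estimation only ever needs normal and uniform draws.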

47 Mixed GEV models
[Figure: $S_B$ distribution]

48 Mixed GEV models
Example: specification with correlated normally and lognormally distributed random coefficients
$V_{in} = \beta_1 X_{i1n} + \beta_2 X_{i2n}$
where $\beta_1$ and $\beta_2$ are generated from
$\begin{pmatrix} \beta_1 \\ \ln \beta_2 \end{pmatrix} = \begin{pmatrix} \bar{\beta}_1 \\ \bar{\beta}_2 \end{pmatrix} + \begin{pmatrix} p_{11} & 0 \\ p_{21} & p_{22} \end{pmatrix} \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}$
and $\zeta_1$ and $\zeta_2$ are independent $N(0, 1)$.

49 Panel data
Several observations are available for each individual
Need to capture the individual-specific effects
At each instance $t$, we have $V_{nt} = V(\beta_{nt}, \beta_n, X_{nt})$, where $\beta_n$ are random parameters constant across $t$ for a given individual $n$.

50 Miscellaneous features
Functionalities can be combined
Model specification language:
BETA1 [ BETA1_S ] * x11 + exp( BETA2 [ BETA2_S ] ) * x12
Simulation with Halton draws
Several optimization packages
Constrained likelihood estimation
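Halton draws replace pseudo-random numbers with a deterministic low-discrepancy sequence. The sketch below shows the standard radical-inverse construction and a conversion to standard normal variates. It is a generic illustration, not BIOGEME's own implementation, and the function names are assumptions.

```python
from statistics import NormalDist


def halton(index, base):
    """index-th element (index >= 1) of the Halton sequence in the given prime base."""
    result, fraction = 0.0, 1.0 / base
    i = index
    while i > 0:
        result += fraction * (i % base)  # next digit of index in the given base
        i //= base
        fraction /= base
    return result


def halton_normal_draws(n_draws, base):
    """Standard normal draws obtained by inverting the normal CDF at Halton points."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(halton(k, base)) for k in range(1, n_draws + 1)]


first_base2 = [halton(k, 2) for k in range(1, 4)]  # 0.5, 0.25, 0.75
first_base3 = [halton(k, 3) for k in range(1, 4)]  # 1/3, 2/3, 1/9
zeta = halton_normal_draws(100, 2)
```

A different prime base is typically used for each random dimension, so that the resulting points cover the unit hypercube evenly.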

51 Miscellaneous features
Robust variance-covariance ("sandwich") estimator
Output in HTML format

52 And there is more than BIOGEME...
BIOSIM for forecasting by sample enumeration
BIOROUTE for route choice models
BIOLOOP to generate large-scale models
chi2.xls to perform $\chi^2$ tests

53 Short course
Lausanne, March 20-24, 2005
M. Ben-Akiva, D. McFadden, M. Bierlaire, D. Bolduc

The Network GEV model

The Network GEV model Michel Bierlaire January, 2002 Discrete choice models play a major role in transportation demand analysis. Their nice and strong theoretical properties, and their flexibility to capture