Binary choice. Michel Bierlaire

Similar documents
Optimization and Simulation

Estimation of mixed generalized extreme value models

Discrete panel data. Michel Bierlaire

Statistical Tests. Matthieu de Lapparent

Optimization and Simulation

Binary choice 3.3 Maximum likelihood estimation

Capturing Correlation in Route Choice Models using Subnetworks

Calculating indicators with PythonBiogeme

Choice Theory. Matthieu de Lapparent

Lecture-20: Discrete Choice Modeling-I

Lecture 1. Behavioral Models Multinomial Logit: Power and limitations. Cinzia Cirillo

Modeling Binary Outcomes: Logit and Probit Models

Are You The Beatles Or The Rolling Stones?

Data-analysis and Retrieval Ordinal Classification

Calculating indicators with Biogeme

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model

Normalization and correlation of cross-nested logit models

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2016 Instructor: Victor Aguirregabiria

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit

A Lagrangian relaxation method for solving choice-based mixed linear optimization models that integrate supply and demand interactions

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Applied Econometrics (QEM)

The 17 th Behavior Modeling Summer School

Exploratory analysis of endogeneity in discrete choice models. Anna Fernández Antolín Amanda Stathopoulos Michel Bierlaire

Introduction to mixtures in discrete choice models

Integrating advanced discrete choice models in mixed integer linear optimization

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1.

Integrated schedule planning with supply-demand interactions for a new generation of aircrafts

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Week 7: Binary Outcomes (Scott Long Chapter 3 Part 2)

Destination Choice Model including panel data using WiFi localization in a pedestrian facility

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Binary Choice Models Probit & Logit. = 0 with Pr = 0 = 1. decision-making purchase of durable consumer products unemployment

STRC. A Lagrangian relaxation technique for the demandbased benefit maximization problem

Linear Regression With Special Variables

Introduction to disaggregate demand models

INTRODUCTION TO TRANSPORTATION SYSTEMS

Estimating hybrid choice models with the new version of Biogeme

A sampling of alternatives approach for route choice models

STAT 100C: Linear models

TMA 4275 Lifetime Analysis June 2004 Solution

A New Generalized Gumbel Copula for Multivariate Distributions

Non-linear panel data modeling

Probabilistic Choice Models

Route choice models: Introduction and recent developments

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

Variance reduction. Michel Bierlaire. Transport and Mobility Laboratory. Variance reduction p. 1/18

Lecture 6: Discrete Choice: Qualitative Response

Normalization and correlation of cross-nested logit models

ECON Introductory Econometrics. Lecture 11: Binary dependent variables

Appendix A: The time series behavior of employment growth

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

Introduction to Maximum Likelihood Estimation

Discrete Dependent Variable Models

Appendix A. Math Reviews 03Jan2007. A.1 From Simple to Complex. Objectives. 1. Review tools that are needed for studying models for CLDVs.

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Generalized Linear Models Introduction

Asymptotic Statistics-III. Changliang Zou

The Logit Model: Estimation, Testing and Interpretation

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models.

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Econ 583 Final Exam Fall 2008

Fall, 2007 Nonlinear Econometrics. Theory: Consistency for Extremum Estimators. Modeling: Probit, Logit, and Other Links.

Panel Data Exercises Manuel Arellano. Using panel data, a researcher considers the estimation of the following system:

covariance between any two observations

Masking Identification of Discrete Choice Models under Simulation Methods *

Models for Heterogeneous Choices

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Classification. Chapter Introduction. 6.2 The Bayes classifier

The Network GEV model

Heteroskedasticity. Part VII. Heteroskedasticity

Multivariate Regression

Introduction to Signal Detection and Classification. Phani Chavali

LOGISTIC REGRESSION Joseph M. Hilbe

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid

12 Modelling Binomial Response Data

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

Kernel Logistic Regression and the Import Vector Machine

Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

MSH3 Generalized linear model

Maximum Likelihood and. Limited Dependent Variable Models

MLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22

Exercises Chapter 4 Statistical Hypothesis Testing

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Generalized Linear Models

A Discrete choice framework for acceleration and direction change behaviors in walking pedestrians

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

Binary Logistic Regression

GARCH Models Estimation and Inference

Part I Behavioral Models

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

I. Multinomial Logit Suppose we only have individual specific covariates. Then we can model the response probability as

Normalization and correlation of cross-nested logit models

Testing Restrictions and Comparing Models

Transcription:

Binary choice Michel Bierlaire Transport and Mobility Laboratory School of Architecture, Civil and Environmental Engineering Ecole Polytechnique Fédérale de Lausanne M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 1 / 64

Outline Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 2 / 64

Outline Simple example Choice between Auto and Transit Data Time Time Time Time # auto transit Choice # auto transit Choice 1 52.9 4.4 T 11 99.1 8.4 T 2 4.1 28.5 T 12 18.5 84.0 C 3 4.1 86.9 C 13 82.0 38.0 C 4 56.2 31.6 T 14 8.6 1.6 T 5 51.8 20.2 T 15 22.5 74.1 C 6 0.2 91.2 C 16 51.4 83.8 C 7 27.6 79.7 C 17 81.0 19.2 T 8 89.9 2.2 T 18 51.0 85.0 C 9 41.5 24.5 T 19 62.2 90.1 C 10 95.0 43.5 T 20 95.1 22.2 T 21 41.6 91.5 C M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 3 / 64

Model specification Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 4 / 64

Model specification Binary choice model Specification of the utilities U C = β 1 T C +ε C U T = β 1 T T +ε T where T C is the travel time with car (min) and T T the travel time with transit (min). Choice model where ε = ε T ε C. P(C {C,T}) = Pr(U C U T ) = Pr(β 1 T C +ε C β 1 T T +ε T ) = Pr(β 1 (T C T T ) ε T ε C ) = Pr(ε β 1 (T C T T )) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 5 / 64

Model specification Error term Three assumptions about the random variables ε T and ε C 1 What s their mean? 2 What s their variance? 3 What s their distribution? Note For binary choice, it would be sufficient to make assumptions about ε = ε T ε C. But we want to generalize later on. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 6 / 64

Model specification Error term The mean Change of variables Define E[ε C ] = β C and E[ε T ] = β T. Define ε C = ε C β C and ε T = ε T β T, so that E[ε C ] = E[ε T ] = 0. Choice model P(C {C,T}) = Pr(β 1 (T C T T ) ε T ε C ) = Pr(β 1 (T C T T ) ε T +β T ε C β C) = Pr(β 1 (T C T T )+(β C β T ) ε T ε C ) = Pr(β 1 (T C T T )+β 0 ε ) where β 0 = β C β T and ε = ε T ε C. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 7 / 64

Model specification Error term The mean Specification The means of the error terms can be included as parameters of the deterministic part. Only the mean of the difference of the error terms is identified. Alternative Specific Constant Equivalent specifications: U C = β 1 T C +ε C or U C = β 1 T C +β C +ε C U T = β 1 T T +β T +ε T U T = β 1 T T +ε T In practice: associate an alternative specific constant with all alternatives but one. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 8 / 64

Model specification Error term The mean Note Adding the same constant to all utility functions does not affect the choice model Pr(U C U T ) = Pr(U C +K U T +K) K R n. The bottom line... If the deterministic part of the utility functions contains an Alternative Specific Constant (ASC) for all alternatives but one, the mean of the error terms can be assumed to be zero without loss of generality. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 9 / 64

Model specification Error term The variance Utility is ordinal Utilities can be scaled up or down without changing the choice probability Pr(U C U T ) = Pr(αU C αu T ) α > 0 Link with the variance Var(αU C ) = α 2 Var(U C ) Var(αU T ) = α 2 Var(U T ) Variance is not identified As any α can be selected arbitrarily, any variance can be assumed. No way to identify the variance of the error terms from data. The scale has to be arbitrarily decided. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 10 / 64

Model specification Error term The distribution Assumption 1 ε T and ε C are the sum of many r.v. capturing unobservable attributes (e.g. mood, experience), measurement and specification errors. Central-limit theorem The sum of many i.i.d. random variables approximately follows a normal distribution: N(µ,σ 2 ). Assumed distribution ε C N(0,1), ε T N(0,1) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 11 / 64

Model specification Error term The Normal distribution N(µ,σ 2 ) Probability density function (pdf) f(t) = 1 σ (t µ) 2 2π e 2σ 2 Cumulative distribution function (CDF) P(c ε) = F(c) = No closed form. c f(t)dt 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 3 2 1 0 1 2 3 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 12 / 64

Model specification Error term The distribution ε = ε T ε C From the properties of the normal distribution, we have ε C N(0,1) ε T N(0,1) ε = ε T ε C N(0,2) As the variance is arbitrary, we may also assume ε C N(0,0.5) ε T N(0,0.5) ε = ε T ε C N(0,1) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 13 / 64

Model specification Error term The binary probit model Choice model P(C {C,T}) = Pr(β 1 (T C T T )+β 0 ε) = F ε (β 1 (T C T T )+β 0 ) The binary probit model Not a closed form expression P(C {C,T}) = 1 2π β1 (T C T T ) β 0 e 1 2 t2 dt M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 14 / 64

Model specification Error term The distribution Assumption 2 ε T and ε C are the maximum of many r.v. capturing unobservable attributes (e.g. mood, experience), measurement and specification errors. Gumbel theorem The maximum of many i.i.d. random variables approximately follows an Extreme Value distribution: EV(η, µ). Assumed distribution ε C EV(0,1), ε T EV(0,1). M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 15 / 64

Model specification Error term The Extreme Value distribution EV(η, µ) Probability density function (pdf) f(t) = µe µ(t η) e e µ(t η) Cumulative distribution function (CDF) P(c ε) = F(c) = c f(t)dt = e e µ(c η) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 16 / 64

Model specification Error term The Extreme Value distribution pdf EV(0,1) 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 3 2 1 0 1 2 3 CDF EV(0,1) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 3 2 1 0 1 2 3 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 17 / 64

Model specification Error term The Extreme Value distribution Properties If ε EV(η,µ) then E[ε] = η + γ µ where γ is Euler s constant. and Var[ε] = π2 6µ 2 Euler s constant γ = lim k k i=1 1 i lnk = 0 e x lnxdx 0.5772 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 18 / 64

Model specification Error term The distribution ε = ε T ε C From the properties of the extreme value distribution, we have ε C EV(0,1) ε T EV(0,1) ε Logistic(0,1) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 19 / 64

Model specification Error term The Logistic distribution: Logistic(η,µ) Probability density function (pdf) f(t) = µe µ(t η) (1+e µ(t η) ) 2 Cumulative distribution function (CDF) with µ > 0. P(c ε) = F(c) = c f(t)dt = 1 1+e µ(c η) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 20 / 64

Model specification Error term The binary logit model Choice model P(C {C,T}) = Pr(β 1 (T C T T )+β 0 ε) = F ε (β 1 (T C T T )+β 0 ) The binary logit model P(C {C,T}) = The binary logit model 1 1+e (β 1(T C T T )+β 0 ) = e β1tc+β0 e β 1T C +β 0 +e β 1 T T P(C {C,T}) = e V C e V C +e V T M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 21 / 64

Applying the model Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 22 / 64

Applying the model Back to the example Choice between Auto and Transit Data Time Time Time Time # auto transit Choice # auto transit Choice 1 52.9 4.4 T 11 99.1 8.4 T 2 4.1 28.5 T 12 18.5 84.0 C 3 4.1 86.9 C 13 82.0 38.0 C 4 56.2 31.6 T 14 8.6 1.6 T 5 51.8 20.2 T 15 22.5 74.1 C 6 0.2 91.2 C 16 51.4 83.8 C 7 27.6 79.7 C 17 81.0 19.2 T 8 89.9 2.2 T 18 51.0 85.0 C 9 41.5 24.5 T 19 62.2 90.1 C 10 95.0 43.5 T 20 95.1 22.2 T 21 41.6 91.5 C M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 23 / 64

Applying the model First individual Parameters Let s assume that β 0 = 0.5 and β 1 = 0.1 Variables Let s consider the first observation: T C1 = 52.9 T T1 = 4.4 Choice = transit: y auto,1 = 0, y transit,1 = 1 Choice What s the probability given by the model that this individual indeed chooses transit? M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 24 / 64

Applying the model First individual Utility functions V C1 = β 1 T C1 = 5.29 V T1 = β 1 T T1 +β 0 = 0.06 Choice model P 1 (transit) = e V T1 e V T1 +e V C1 = Comments The model fits the observation very well. e 0.06 e 0.06 +e 5.29 = 1 Consistent with the assumption that travel time is the only explanatory variable. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 25 / 64

Applying the model Second individual Parameters Let s assume that β 0 = 0.5 and β 1 = 0.1 Variables T C2 = 4.1 T T2 = 28.5 Choice = transit: y auto,2 = 0, y transit,2 = 1 Choice What s the probability given by the model that this individual indeed chooses transit? M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 26 / 64

Applying the model Second individual Utility functions V C2 = β 1 T C2 = 0.41 V T2 = β 1 T T2 +β 0 = 2.35 Choice model P 2 (transit) = e V T2 e V T2 +e V C2 = Comment The model poorly fits the observation. e 2.35 e 2.35 +e 0.41 = 0.13 But the assumption is that travel time is the only explanatory variable. Still, the probability is not small. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 27 / 64

Applying the model Likelihood Two observations The probability that the model reproduces both observations is P 1 (transit)p 2 (transit) = 0.13 All observations The probability that the model reproduces all observations is Likelihood of the sample P 1 (transit)p 2 (transit)...p 21 (auto) = 4.62 10 4 L = n (P n (auto) yauto,n P n (transit) y transit,n ) where y j,n is 1 if individual n has chosen alternative j, 0 otherwise M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 28 / 64

Applying the model Likelihood Likelihood Probability that the model fits all observations. It is a function of the parameters. Examples β 0 β 1 L 0 0 4.57 10 07 0-1 1.97 10 30 0-0.1 4.1 10 04 0.5-0.1 4.62 10 04 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 29 / 64

Applying the model Likelihood function 0.003 0.002 0.001 0 2.5 0.3 0.25 0.2 0.15 0.1 0.05 β 0 β 1 1.5 1 0.500.511.52 0 2 0.0025 0.002 0.0015 0.001 0.0005 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 30 / 64

Applying the model Likelihood function (zoom) 0.003 0.002 0.001 0.0022 0.002 0.0018 0.0016 0.0014 0.0012 0.001 0.0008 0.0006 0.5 0.4 0.08 0.3 0.07 0.06 0.05 0.04 0.2 β 0 0.1 β 1 0.030 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 31 / 64

Maximum likelihood estimation Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 32 / 64

Maximum likelihood estimation Maximum likelihood estimation Estimators for the parameters Parameters that achieve the maximum likelihood max (P n (auto;β) yauto,n P n (transit;β) y transit,n ) β Log likelihood n Alternatively, we prefer to maximize the log likelihood max β ln n (P n (auto) yauto,n P n (transit) y transit,n ) = max β ln(y auto,n P n (auto)+y transit,n P n (transit)) n M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 33 / 64

Maximum likelihood estimation Maximum likelihood estimation 0 10 20 5 10 15 20 25 30 2.5 0.3 0.25 0.2 0.15 0.1 0.05 β 0 β 1 1.5 1 0.500.511.52 0 2 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 34 / 64

Maximum likelihood estimation Maximum likelihood estimation 6 6.5 7 6 6.2 6.4 6.6 6.8 7 7.2 7.4 0.5 0.4 0.08 0.3 0.07 0.06 0.05 0.04 0.2 β 0 0.1 β 1 0.030 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 35 / 64

Maximum likelihood estimation Solving the optimization problem Unconstrained nonlinear optimization Iterative methods Designed to identify a local maximum When the function is concave, a local maximum is also a global maximum For binary logit, the log likelihood is concave Use the derivatives of the objective function M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 36 / 64

Maximum likelihood estimation Algorithm CFSQP in Biogeme 0.3 0.25 0.2 0.15 0.1 0.05 0 β 1 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 37 / 64

Maximum likelihood estimation Algorithm CFSQP in Biogeme 0.3 0.25 0.2 0.15 0.1 0.05 0 β 1 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 38 / 64

Maximum likelihood estimation Algorithm CFSQP in Biogeme 0.3 0.25 0.2 0.15 0.1 0.05 0 β 1 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 39 / 64

Maximum likelihood estimation Algorithm CFSQP in Biogeme 0.3 0.25 0.2 0.15 0.1 0.05 0 β 1 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 40 / 64

Maximum likelihood estimation Algorithm CFSQP in Biogeme 0.3 0.25 0.2 0.15 0.1 0.05 0 β 1 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 41 / 64

Maximum likelihood estimation Algorithm CFSQP in Biogeme 0.3 0.25 0.2 0.15 0.1 0.05 0 β 1 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 42 / 64

Maximum likelihood estimation Algorithm CFSQP in Biogeme 0.3 0.25 0.2 0.15 0.1 0.05 0 β 1 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 43 / 64

Maximum likelihood estimation Algorithm CFSQP in Biogeme 0.3 0.25 0.2 0.15 0.1 0.05 0 β 1 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 β 0 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 44 / 64

Maximum likelihood estimation Nonlinear optimization Things to be aware of... Iterative methods terminate when a given stopping criterion is verified, based on the fact that, if β is the optimum, L(β ) = 0 Stopping criteria usually vary across optimization packages, which may produce slightly different solutions They are usually using a parameter defining the required precision Most methods are sensitive to the conditioning of the problem. A well-conditioned problem is a problem for which all parameters have almost the same magnitude M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 45 / 64

Maximum likelihood estimation Nonlinear optimization 5 10 15 20 25 30 0.3 0.2 0.1 0 2 10 1 2 5 10 15 20 25 30 0 200 0.3 0.2 0.1 0 2 10 1 2 0 50 100 150 200 250 300 350 400 Time in min. Time in sec. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 46 / 64

Maximum likelihood estimation Nonlinear optimization 6 6.2 6.4 6.6 6.8 7 7.2 7.4 0.08 0.06 0.04 0 0.10.20.30.40.5 6 6.2 6.4 100 6.6 150 6.8 7 200 7.2 250 7.4 300 350 0.08 0.06 0.04 0 0.10.20.30.40.5 100 150 200 250 300 350 Time in min. Time in sec. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 47 / 64

Maximum likelihood estimation Nonlinear optimization Things to be aware of... Convergence may be very slow or even fail if the likelihood function is flat It happens when the model is not identifiable Structural flaw in the model (e.g. full set of alternative specific constants) Lack of variability in the data (all prices are the same across the sample) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 48 / 64

Maximum likelihood estimation Nonlinear optimization 14 14.5 14.5 14.6 14.6 14.7 14.7 14.8 14.8 14.9 14.9 15 0.08 0.06 0.04 0 0.1 0.2 0.3 0.4 0.5 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 49 / 64

Output of the estimation Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 50 / 64

Output of the estimation Output of the estimation Solution of max β R K L(β) β L(β ) Case study β0 = 0.2376 β1 = 0.0531 L(β0,β 1 ) = 6.166 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 51 / 64

Output of the estimation Second derivatives Information about the quality of the estimators 2 L 2 L β 2 β 1 1 β 2 2 L β 1 β K 2 L 2 L β 2 β 1 2 L β 2 L(β 2 β 2 2 β K ) =.......... 2 L 2 L β K β 1 β K β 2 2 L βk 2 2 L(β ) 1 is a consistent estimator of the variance-covariance matrix of the estimates M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 52 / 64

Output of the estimation Statistics Statistics on the parameters Summary statistics L(β ) = -6.166 L(0) = -14.556 Parameter Value Std Err. t-test β 0 0.2376 0.7505 0.32 β 1-0.0531 0.0206-2.57 2(L(0) L(β )) = 16.780 ρ 2 = 0.576, ρ 2 = 0.439 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 53 / 64

Output of the estimation Summary statistics Null log likelihood L(0) sample log likelihood with a trivial model where all parameters are zero, that is a model always predicting P(1 {1,2}) = P(2 {1,2}) = 1 2 Purely a function of sample size L(0) = log( 1 2N) = Nlog(2) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 54 / 64

Output of the estimation Summary statistics Likelihood ratio 2(L(0) L(β )) ( L ) (0) log L (β = log(l (0)) log(l (β )) = L(0) L(β ) ) Likelihood ratio test H 0 : the two models are equivalent Under H 0, 2(L(0) L(β )) is asymptotically distributed as χ 2 with K degrees of freedom. Similar to the F test in regression models M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 55 / 64

Output of the estimation Summary statistics Rho (bar) squared ρ 2 ρ 2 = 1 L(β ) L(0) Similar to the R 2 in regression models ρ 2 ρ 2 = 1 L(β ) K L(0) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 56 / 64

Back to the scale Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 57 / 64

Back to the scale Back to the scale Comparing models Arbitrary scale may be problematic when comparing models Binary probit: σ 2 = Var(ε i ε j ) = 1 Binary logit: Var(ε i ε j ) = π 2 /(3µ) = π 2 /3 Var(αU) = α 2 Var (U). Scaled logit coeff. are π/ 3 larger than scaled probit coeff. M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 58 / 64

Back to the scale Comparing models Estimation results Note: π/ 3 1.814 Probit Logit Probit * π/ 3 L -6.165-6.166 β 0 0.064 0.238 0.117 β 1-0.030-0.053-0.054 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 59 / 64

Appendix Outline 1 Model specification Error term 2 Applying the model 3 Maximum likelihood estimation 4 Output of the estimation Summary statistics 5 Back to the scale 6 Appendix M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 60 / 64

Appendix Maximum likelihood for binary logit Let C n = {i,j} Let y in = 1 if i is chosen by n, 0 otherwise Let y jn = 1 if j is chosen by n, 0 otherwise Obviously, y in = 1 y jn Log-likelihood of the sample N ( e V in e V ) jn y in ln e V +y in +e V jn ln jn e V in +e V jn n=1 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 61 / 64

Appendix Maximum likelihood for binary logit P n (i) = e V in e V in +e V jn lnp n (i) = V in ln(e V in +e V jn ) lnp n (i) V in = 1 e V in e V in +e V jn = 1 P n (i) = P n (j) lnp n (i) V jn = evjn e V in +e V jn = P n (j) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 62 / 64

Appendix Maximum likelihood for binary logit L V in L θ = L V in V in θ + L V jn V jn θ = N n=1 ( ) lnp y n(i) lnp in V in +y n(j) jn V in = N n=1 (y inp n (j) y jn P n (i)) = N n=1 (y in(1 P n (i)) (1 y in )P n (i)) = N n=1 (y in P n (i)) = N n=1 (y jn P n (j)) M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 63 / 64

Appendix Maximum likelihood for binary logit L θ = N (y in P n (i)) V in θ +(y jn P n (j)) V jn θ n=1 = N (y in P n (i))( V in θ V jn θ ) n=1 If V in = k θ kx ink, then L θ k = N (y in P n (i))(x ink x jnk ) n=1 M. Bierlaire (TRANSP-OR ENAC EPFL) Binary choice 64 / 64