Lecture 1. Behavioral Models Multinomial Logit: Power and limitations. Cinzia Cirillo

Overview
1. Choice Probabilities
2. Power and Limitations of Logit
   2.1 Taste variation
   2.2 Substitution patterns
   2.3 Repeated choices over time (i.e., panel data)
3. Nonlinear Representative Utility
4. Consumer Surplus
5. Derivative and Elasticity
6. Exogenous Estimation
7. Goodness of Fit
8. Hypothesis Testing

1. Choice Probabilities

Logit is the easiest and most widely used discrete choice model. Its popularity is due to the fact that the formula for the choice probabilities takes a closed form and is readily interpretable.

Suppose a decision maker, labeled n, faces J alternatives. The utility that person n obtains from alternative j is

$$U_{nj} = V_{nj} + \varepsilon_{nj},$$

where $V_{nj}$ is observed by the researcher and $\varepsilon_{nj}$ is an unobserved random variable. The logit model is obtained by assuming that each $\varepsilon_{nj}$ is independently and identically distributed (iid) extreme value. The distribution is also called Gumbel or type I extreme value, with density and cumulative distribution

$$f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}} \, e^{-e^{-\varepsilon_{nj}}}, \qquad F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}}.$$

The variance of this distribution is $\pi^2/6$. By assuming this variance, we are implicitly normalizing the scale of utility.

1. Choice Probabilities (cont.)

The difference between two extreme value variables is distributed logistic. That is, if $\varepsilon_{nj}$ and $\varepsilon_{ni}$ are iid extreme value, then $\varepsilon^*_{nji} = \varepsilon_{nj} - \varepsilon_{ni}$ follows the logistic distribution:

$$F(\varepsilon^*_{nji}) = \frac{e^{\varepsilon^*_{nji}}}{1 + e^{\varepsilon^*_{nji}}}.$$

The key assumption is that the errors are independent of each other. This independence means that the unobserved portion of utility for one alternative is unrelated to the unobserved portion of utility for another alternative. This assumption is not as restrictive as it might at first seem: it holds when the researcher has specified $V_{nj}$ sufficiently well that the remaining, unobserved portion of utility is essentially white noise.
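The two distributional facts used so far (an extreme value variance of π²/6 and a logistic distribution for the difference of two extreme value draws) can be checked by simulation. The following is a minimal sketch using only the Python standard library; the helper names, the seed, the sample size, and the evaluation point `x` are illustrative choices, not part of the lecture.

```python
import math
import random

def draw_gumbel(rng):
    """Draw one type I extreme value variate by inverting F(e) = exp(-exp(-e))."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, which would give log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(0)          # fixed seed so the run is reproducible
n = 100_000                     # illustrative sample size
eps_j = [draw_gumbel(rng) for _ in range(n)]
eps_i = [draw_gumbel(rng) for _ in range(n)]

# Sample variance should be close to pi^2 / 6.
var_hat = sample_variance(eps_j)

# Empirical CDF of the difference at x should be close to e^x / (1 + e^x).
x = 1.0
diff = [a - b for a, b in zip(eps_j, eps_i)]
empirical_cdf = sum(d <= x for d in diff) / n
logistic_cdf = math.exp(x) / (1.0 + math.exp(x))

print(f"sample variance = {var_hat:.4f}, pi^2/6 = {math.pi ** 2 / 6:.4f}")
print(f"empirical CDF   = {empirical_cdf:.4f}, logistic CDF = {logistic_cdf:.4f}")
```

Both pairs of numbers should agree up to simulation noise.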

1. Choice Probabilities (cont.)

If the researcher thinks that the unobserved portion of utility is correlated over alternatives, then he or she can:
1. Use a different model that allows for correlated errors,
2. Re-specify representative utility so that the source of the correlation is captured explicitly and thus the remaining errors are independent, or
3. Use the logit model under the current specification of representative utility, considering the model to be an approximation.

1. Choice Probabilities (cont.)

The probability that decision maker n chooses alternative i is

$$P_{ni} = \mathrm{Prob}(\varepsilon_{nj} < \varepsilon_{ni} + V_{ni} - V_{nj} \;\; \forall j \neq i).$$

If $\varepsilon_{ni}$ is considered given, this expression is the cumulative distribution for each $\varepsilon_{nj}$ evaluated at $\varepsilon_{ni} + V_{ni} - V_{nj}$, which is $\exp(-\exp(-(\varepsilon_{ni} + V_{ni} - V_{nj})))$. Since the $\varepsilon$'s are independent, this cumulative distribution over all $j \neq i$ is the product of the individual cumulative distributions:

$$P_{ni} \mid \varepsilon_{ni} = \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}}.$$

Of course, $\varepsilon_{ni}$ is not given, and so the choice probability is the integral of $P_{ni} \mid \varepsilon_{ni}$ over all values of $\varepsilon_{ni}$ weighted by its density. Some algebraic manipulation of this integral results in a succinct, closed-form expression:

$$P_{ni} = \frac{e^{V_{ni}}}{\sum_j e^{V_{nj}}}.$$
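As a sanity check on the closed form, the sketch below computes the logit probabilities directly and compares them with simulated choice shares obtained by adding iid extreme value draws to each representative utility and picking the maximum. It is only an illustration: the function names and the utility values in `V` are hypothetical.

```python
import math
import random

def logit_probabilities(v):
    """Closed-form logit probabilities exp(V_i) / sum_j exp(V_j)."""
    v_max = max(v)  # subtracting the max leaves the ratios unchanged and avoids overflow
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

def draw_gumbel(rng):
    """Type I extreme value draw via the inverse CDF."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_shares(v, n_draws, seed=0):
    """Share of draws in which each alternative has the highest utility V_j + eps_j."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [vj + draw_gumbel(rng) for vj in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]

V = [1.0, 0.5, -0.2]  # hypothetical representative utilities for three alternatives
closed_form = logit_probabilities(V)
simulated = simulate_shares(V, n_draws=100_000)

for i, (p, s) in enumerate(zip(closed_form, simulated)):
    print(f"alternative {i}: closed form {p:.4f}, simulated {s:.4f}")
```

The simulated shares should match the closed-form probabilities up to sampling error.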

1. Choice Probabilities (cont.)

Logit probabilities exhibit several desirable properties:
- The log-likelihood (LL) is globally concave when utility is linear in parameters.
- $0 < P_{ni} < 1$: $P_{ni}$ is never exactly 0 or 1.
- As $V_{ni}$ rises, with the other utilities held constant, $P_{ni} \to 1$.
- As $V_{ni}$ decreases, $P_{ni} \to 0$; in the limit $V_{ni} \to -\infty$, $P_{ni} \to 0$.
- If the initial probability is very low or very high, the effect of a change in V is small. The point at which an increase in representative utility has the greatest effect is when the probability is close to 0.5.

1. Choice Probabilities (cont.)

Consider a binary choice situation first: a household's choice between a gas and an electric heating system.

$$U_g = \beta_1 PP_g + \beta_2 OC_g + \varepsilon_g$$
$$U_e = \beta_1 PP_e + \beta_2 OC_e + \varepsilon_e$$

where PP = purchase price ($\beta_1 < 0$) and OC = operating cost ($\beta_2 < 0$). If $\varepsilon_g$ and $\varepsilon_e$ are iid extreme value, then

$$P_g = \frac{e^{\beta_1 PP_g + \beta_2 OC_g}}{e^{\beta_1 PP_g + \beta_2 OC_g} + e^{\beta_1 PP_e + \beta_2 OC_e}}.$$

The ratio $\beta_2/\beta_1$ represents the household's willingness to pay (WTP) for operating-cost reductions. If $\beta_1 = -0.20$ and $\beta_2 = -1.14$, then WTP $= (-1.14)/(-0.20) = 5.70$: the household is willing to pay up to 5.70 dollars more for a system whose annual operating cost is one dollar less.
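The heating example can be reproduced in a few lines. The coefficients are the ones given above; the purchase prices and operating costs passed to `prob_gas` are hypothetical numbers in arbitrary units, chosen so that the gas system costs exactly the WTP amount more to buy and one unit less to operate.

```python
import math

BETA_PP = -0.20  # coefficient on purchase price (from the example above)
BETA_OC = -1.14  # coefficient on operating cost (from the example above)

# Willingness to pay for a one-unit reduction in annual operating cost.
wtp = BETA_OC / BETA_PP

def prob_gas(pp_g, oc_g, pp_e, oc_e):
    """Binary logit probability of choosing the gas system."""
    v_g = BETA_PP * pp_g + BETA_OC * oc_g
    v_e = BETA_PP * pp_e + BETA_OC * oc_e
    # Equivalent to exp(v_g) / (exp(v_g) + exp(v_e)), written in a stable form.
    return 1.0 / (1.0 + math.exp(v_e - v_g))

# Hypothetical attributes: gas costs `wtp` more to buy but 1 unit less to run,
# so the two representative utilities coincide and the household is indifferent.
p_indifferent = prob_gas(pp_g=10.0 + wtp, oc_g=4.0, pp_e=10.0, oc_e=5.0)

# If the gas premium is smaller than the WTP, gas becomes the more likely choice.
p_cheaper_gas = prob_gas(pp_g=10.0 + 3.0, oc_g=4.0, pp_e=10.0, oc_e=5.0)

print(f"WTP = {wtp:.2f}")
print(f"P(gas) at a premium equal to WTP:     {p_indifferent:.3f}")
print(f"P(gas) at a premium smaller than WTP: {p_cheaper_gas:.3f}")
```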

2. Power and Limitations of Logit

Three topics elucidate the power of logit models to represent choice behavior, as well as delineating the limits to that power. These topics are taste variation, substitution patterns, and repeated choices over time. The applicability of logit models can be summarized as follows:
1. Logit can represent systematic taste variation (linked to observed characteristics of the decision maker), but not random taste variation.
2. Logit implies proportional substitution across alternatives.
3. Logit can capture the dynamics of repeated choices only when the unobserved factors are independent over time; it cannot handle unobserved factors that are correlated over time.

2.1 Taste Variation

Logit can capture tastes that vary systematically with observed variables, but it cannot handle tastes that vary randomly. Consider households' choice among makes and models of cars to buy:

$$U_{nj} = \alpha_n SR_j + \beta_n PP_j + \varepsilon_{nj},$$

where PP = purchase price and SR = shoulder room. Suppose that $\alpha_n$ varies with the number of members in the household, $\alpha_n = \rho M_n$, and that $\beta_n$ is inversely related to income, $\beta_n = \theta / I_n$. Then

$$U_{nj} = \rho (M_n SR_j) + \theta (PP_j / I_n) + \varepsilon_{nj}.$$

2.1 Taste Variation (cont.)

Suppose instead that the values of shoulder room and purchase price vary with an observed component plus some other factors that are unobserved: $\alpha_n = \rho M_n + \mu_n$ and $\beta_n = (\theta / I_n) + \eta_n$. Then

$$U_{nj} = \rho (M_n SR_j) + \mu_n SR_j + \theta (PP_j / I_n) + \eta_n PP_j + \varepsilon_{nj}.$$

Since $\mu_n$ and $\eta_n$ are unobserved, the terms $\mu_n SR_j$ and $\eta_n PP_j$ become part of the unobserved component of utility:

$$\tilde{\varepsilon}_{nj} = \mu_n SR_j + \eta_n PP_j + \varepsilon_{nj}.$$

The new error term $\tilde{\varepsilon}_{nj}$ is not independently and identically distributed, as required for the logit formulation.
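To see why the composite error violates the logit assumptions, it helps to write out its second moments. The short derivation below is a sketch under an added assumption that is not stated in the slides: $\mu_n$, $\eta_n$, and the $\varepsilon_{nj}$ are mutually independent, with variances $\sigma_\mu^2$, $\sigma_\eta^2$, and $\pi^2/6$.

$$\mathrm{Cov}(\tilde{\varepsilon}_{nj}, \tilde{\varepsilon}_{nk}) = \sigma_\mu^2 \, SR_j SR_k + \sigma_\eta^2 \, PP_j PP_k \quad (j \neq k),$$

$$\mathrm{Var}(\tilde{\varepsilon}_{nj}) = \sigma_\mu^2 \, SR_j^2 + \sigma_\eta^2 \, PP_j^2 + \frac{\pi^2}{6}.$$

The covariance is generally nonzero, so the errors are not independent across alternatives, and the variance depends on the attributes of alternative j, so the errors are not identically distributed either.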

2.2 Substitution Pattern

An increase in the probability of one alternative necessarily means a decrease in the probability of other alternatives. For any two alternatives i and k, the ratio of the logit probabilities is

$$\frac{P_{ni}}{P_{nk}} = \frac{e^{V_{ni}} / \sum_j e^{V_{nj}}}{e^{V_{nk}} / \sum_j e^{V_{nj}}} = \frac{e^{V_{ni}}}{e^{V_{nk}}} = e^{V_{ni} - V_{nk}}.$$

This ratio does not depend on any alternatives other than i and k. The logit model therefore exhibits independence from irrelevant alternatives, or IIA.
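The IIA property is easy to verify numerically. In the sketch below, a third alternative is added to a two-alternative choice set; the ratio of the two original probabilities stays at exp(V_i - V_k), and both original alternatives lose the same proportion of their probability. The alternative names and utility values are hypothetical.

```python
import math

def logit_probabilities(v):
    """Logit probabilities for a dict mapping alternative name -> representative utility."""
    v_max = max(v.values())
    expv = {name: math.exp(x - v_max) for name, x in v.items()}
    total = sum(expv.values())
    return {name: e / total for name, e in expv.items()}

v_two = {"car": 1.0, "bus": 0.4}                 # hypothetical utilities
v_three = {"car": 1.0, "bus": 0.4, "rail": 0.8}  # same, plus a new alternative

p_two = logit_probabilities(v_two)
p_three = logit_probabilities(v_three)

# The ratio P_car / P_bus is the same with and without rail.
ratio_two = p_two["car"] / p_two["bus"]
ratio_three = p_three["car"] / p_three["bus"]
expected_ratio = math.exp(v_two["car"] - v_two["bus"])

# Proportional substitution: car and bus shrink by the same factor.
car_factor = p_three["car"] / p_two["car"]
bus_factor = p_three["bus"] / p_two["bus"]

print(f"ratio without rail: {ratio_two:.6f}, with rail: {ratio_three:.6f}")
print(f"share retained by car: {car_factor:.6f}, by bus: {bus_factor:.6f}")
```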

2.2 Substitution Pattern (cont.)

Because of IIA, it is possible to estimate model parameters consistently on a subset of alternatives for each sampled decision maker. For example, in a situation with 100 alternatives, the researcher might, so as to reduce computer time, estimate on a subset of 10 alternatives for each sampled person: the person's chosen alternative plus 9 alternatives randomly selected from the remaining 99. Since relative probabilities within a subset of alternatives are unaffected by the attributes or existence of alternatives not in the subset, exclusion of alternatives in estimation does not affect the consistency of the estimator.

2.2 Substitution Pattern (cont.)

To test for IIA:
1. Estimate the model.
2. Re-estimate the model using a subset of the alternatives.
3. If IIA holds, then the parameters obtained with the subset of alternatives will not be significantly different from those of the full model.

IIA can also be tested with mixed logit, by testing whether the variance of the mixing distribution is in fact zero. However, when IIA fails, the tests do not provide as much guidance on the correct specification to use instead of logit.

2.3 Panel Data

If the unobserved factors that affect decision makers are independent over the repeated choices, then logit can be used to examine panel data in the same way as purely cross-sectional data. Any dynamics related to observed factors that enter the decision process can be accommodated. However, dynamics associated with unobserved factors cannot be handled.

2.3 Panel Data (cont.)

The utility that decision maker n obtains from alternative j in period or choice situation t is

$$U_{njt} = V_{njt} + \varepsilon_{njt}.$$

Dynamic aspects of behavior can be captured by specifying representative utility in each period to depend on observed variables from other periods. For example, a lagged price response is represented by entering the price in period t-1 as an explanatory variable in the utility for period t. Similarly, dependence of the current choice on the previous choice (state dependence) is captured as

$$V_{njt} = \alpha y_{nj(t-1)} + \beta x_{njt},$$

where $y_{njt} = 1$ if n chose j in period t and 0 otherwise. A lagged dependent variable can be added without inducing bias as long as the errors are independent over time (i.e., $y_{nj(t-1)}$ is uncorrelated with $\varepsilon_{njt}$).

3. Nonlinear Utility

Models have been developed and widely used for travelers' choice of destination for various types of trips, such as shopping trips, within a metropolitan area. The utility depends on time and cost plus some attraction variables (e.g., residential population and retail employment), collected in the vector $a_j$ for zone j. How large an area to include in each zone is fairly arbitrary, so we want a model that is not sensitive to the level of aggregation in the zonal definitions.

Consider zones j and k, which, when combined, are labeled zone c. The population and employment in the combined zone are necessarily the sums of those in the two original zones:

$$a_j + a_k = a_c.$$

The model must then satisfy

$$P_{nj} + P_{nk} = P_{nc},$$

which for logit models takes the form

$$\frac{e^{V_{nj}} + e^{V_{nk}}}{e^{V_{nj}} + e^{V_{nk}} + \sum_{l \neq j,k} e^{V_{nl}}} = \frac{e^{V_{nc}}}{e^{V_{nc}} + \sum_{l \neq j,k} e^{V_{nl}}}.$$

This holds if $\exp(V_{nj}) + \exp(V_{nk}) = \exp(V_{nc})$. To specify a destination choice model that is not sensitive to the level of zonal aggregation, representative utility therefore needs to be specified with the parameters inside a log operation, for example $V_{nj} = \ln(\beta' a_j)$, so that $\exp(V_{nj}) + \exp(V_{nk}) = \beta'(a_j + a_k) = \exp(V_{nc})$.
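The aggregation condition can be illustrated numerically. The sketch below compares a specification with the parameters inside a log, V = ln(β'a), against a linear specification, V = β'a, when two zones are merged. Everything in it is hypothetical: the weights in `BETA`, the zone attributes, and the simplification that utility depends only on the attraction variables (time and cost are left out).

```python
import math

def logit_probabilities(v):
    """Logit probabilities for a dict mapping zone name -> representative utility."""
    v_max = max(v.values())
    expv = {name: math.exp(x - v_max) for name, x in v.items()}
    total = sum(expv.values())
    return {name: e / total for name, e in expv.items()}

BETA = (0.03, 0.07)  # hypothetical weights on (population, employment)

def attraction(a):
    """Linear index beta'a for an attribute vector a = (population, employment)."""
    return sum(b * x for b, x in zip(BETA, a))

# Three zones, and the same area with zones j and k merged into zone c.
zones = {"j": (10.0, 5.0), "k": (20.0, 8.0), "l": (15.0, 12.0)}
merged = {"c": (30.0, 13.0), "l": (15.0, 12.0)}  # a_c = a_j + a_k

def aggregation_gap(utility):
    """|P_j + P_k - P_c| for a given mapping from attributes to utility."""
    p_split = logit_probabilities({z: utility(a) for z, a in zones.items()})
    p_merged = logit_probabilities({z: utility(a) for z, a in merged.items()})
    return abs(p_split["j"] + p_split["k"] - p_merged["c"])

gap_log = aggregation_gap(lambda a: math.log(attraction(a)))  # V = ln(beta'a)
gap_linear = aggregation_gap(attraction)                      # V = beta'a

print(f"gap with V = ln(beta'a): {gap_log:.2e}")
print(f"gap with V = beta'a:     {gap_linear:.2e}")
```

With the log specification the gap is zero up to rounding, while the linear specification gives a different answer after the zones are merged.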

4. Consumer Surplus

A person's consumer surplus is the utility, in dollar terms, that the person receives in the choice situation. For policy analysis, the researcher is often interested in measuring the change in consumer surplus that is associated with a particular policy. For example, if a new alternative is being considered, such as building a light rail system in a city, then it is important to measure the benefits of the project to see if they warrant the costs. Similarly, a change in the attributes of an alternative can have an impact on consumer surplus that is important to assess.

4. Consumer Surplus (cont.)

Consumer surplus is

$$CS_n = \frac{1}{\alpha_n} \max_j (U_{nj}),$$

where $\alpha_n$ is the marginal utility of income: $dU_n / dY_n = \alpha_n$, with $Y_n$ the income of person n. The division by $\alpha_n$ translates utility into dollars, since $1/\alpha_n = dY_n / dU_n$.

The researcher observes only $V_{nj}$, not $U_{nj}$, and can therefore calculate only the expected consumer surplus:

$$E(CS_n) = \frac{1}{\alpha_n} E\left[ \max_j (V_{nj} + \varepsilon_{nj}) \right].$$

If each $\varepsilon_{nj}$ is iid extreme value and utility is linear in income (so that $\alpha_n$ is constant with respect to income), then

$$E(CS_n) = \frac{1}{\alpha_n} \ln\left( \sum_{j=1}^{J} e^{V_{nj}} \right) + C,$$

where C is an unknown constant that represents the fact that the absolute level of utility cannot be measured.

4. Consumer Surplus (cont.)

Note that the argument in parentheses in the previous expression is the denominator of the logit choice probability. Aside from the division by $\alpha_n$ and the additive constant, the expected consumer surplus in a logit model is simply the log of the denominator of the choice probability. It is often called the log-sum term.

This is the expected consumer surplus of a population whose members all have the same representative utility. If different segments are present, then it is necessary to calculate the weighted average over all the segments.
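Because the constant C is unknown, the log-sum is used in practice to compute changes in expected consumer surplus, where C cancels: the change is (1/α) times the difference in log-sums after and before the policy. The sketch below applies this to the light rail example, treating the new system as an added alternative. The marginal utility of income `ALPHA` and the utility values are hypothetical.

```python
import math

def logsum(v):
    """ln(sum_j exp(V_j)), computed in a numerically stable way."""
    v_max = max(v)
    return v_max + math.log(sum(math.exp(x - v_max) for x in v))

ALPHA = 0.5  # hypothetical marginal utility of income (utility units per dollar)

v_before = [1.0, 0.4]        # hypothetical utilities: car, bus
v_after = [1.0, 0.4, 0.8]    # same alternatives plus light rail

# Change in expected consumer surplus in dollars; the unknown constant C drops out.
delta_cs = (logsum(v_after) - logsum(v_before)) / ALPHA

# Logit probability of choosing rail once it is available.
p_rail = math.exp(v_after[2] - logsum(v_after))

print(f"change in expected consumer surplus: {delta_cs:.3f} dollars per person")
print(f"predicted rail share: {p_rail:.3f}")
```

For an added alternative, the gain equals -ln(1 - P_rail)/α, so the benefit grows with the share the new alternative is predicted to capture.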

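5. Derivative and Elasticity

It is often useful to know the extent to which the choice probabilities change in response to a change in some observed factor. The change in the probability that decision maker n chooses alternative i given a change in an observed factor $z_{ni}$ entering the representative utility of that alternative, holding everything else constant, is

$$\frac{\partial P_{ni}}{\partial z_{ni}} = \frac{\partial \left( e^{V_{ni}} / \sum_j e^{V_{nj}} \right)}{\partial z_{ni}} = \frac{\partial V_{ni}}{\partial z_{ni}} P_{ni} (1 - P_{ni}).$$

Let $z_{nj}$ denote an attribute of another alternative $j \neq i$. How does the probability of choosing alternative i change as $z_{nj}$ increases?

$$\frac{\partial P_{ni}}{\partial z_{nj}} = \frac{\partial \left( e^{V_{ni}} / \sum_k e^{V_{nk}} \right)}{\partial z_{nj}} = -\frac{\partial V_{nj}}{\partial z_{nj}} P_{ni} P_{nj}.$$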

5. Derivative and Elasticity (cont.)

Economists often measure response by elasticities rather than derivatives, since elasticities are normalized for the variables' units. An elasticity is the percentage change in one variable that is associated with a one-percent change in another variable. The elasticity of $P_{ni}$ with respect to $z_{ni}$, a variable entering the utility of alternative i, is

$$E_{i z_{ni}} = \frac{\partial P_{ni}}{\partial z_{ni}} \frac{z_{ni}}{P_{ni}} = \frac{\partial V_{ni}}{\partial z_{ni}} z_{ni} (1 - P_{ni}).$$

The cross-elasticity of $P_{ni}$ with respect to a variable entering the utility of alternative j is

$$E_{i z_{nj}} = \frac{\partial P_{ni}}{\partial z_{nj}} \frac{z_{nj}}{P_{ni}} = -\frac{\partial V_{nj}}{\partial z_{nj}} z_{nj} P_{nj}.$$
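The derivative and elasticity formulas can be checked against finite differences. The sketch below assumes a utility that is linear in a single attribute, V_nj = β z_nj, so that ∂V_nj/∂z_nj = β; the coefficient `BETA` and the attribute values in `z` are hypothetical.

```python
import math

BETA = -0.1  # hypothetical coefficient on the attribute z (e.g., a cost)

def probs(z):
    """Logit probabilities when V_j = BETA * z_j."""
    v = [BETA * zj for zj in z]
    v_max = max(v)
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

z = [10.0, 12.0, 8.0]  # hypothetical attribute levels for three alternatives
p = probs(z)
i, j = 0, 1            # alternative of interest and a competing alternative

# Analytical derivatives and elasticities of P_i.
own_derivative = BETA * p[i] * (1.0 - p[i])
cross_derivative = -BETA * p[i] * p[j]
own_elasticity = BETA * z[i] * (1.0 - p[i])
cross_elasticity = -BETA * z[j] * p[j]

def finite_difference(k, h=1e-6):
    """Central-difference estimate of dP_i / dz_k."""
    z_up = list(z)
    z_dn = list(z)
    z_up[k] += h
    z_dn[k] -= h
    return (probs(z_up)[i] - probs(z_dn)[i]) / (2.0 * h)

print(f"own derivative:   analytical {own_derivative:.6f}, numerical {finite_difference(i):.6f}")
print(f"cross derivative: analytical {cross_derivative:.6f}, numerical {finite_difference(j):.6f}")
print(f"own elasticity {own_elasticity:.4f}, cross elasticity {cross_elasticity:.4f}")
```

Note that the cross-elasticity does not depend on i: a change in an attribute of alternative j changes the probabilities of all other alternatives by the same percentage, which is another expression of IIA.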

6. Exogenous Estimation

Consider first the situation in which the sample is exogenously drawn (i.e., random or stratified random). We also assume that the explanatory variables are exogenous to the choice situation. Since the logit probabilities take a closed form, the traditional maximum-likelihood procedures can be applied. The probability of person n choosing the alternative that he or she was actually observed to choose is

$$\prod_i (P_{ni})^{y_{ni}},$$

where $y_{ni} = 1$ if person n chose i and zero otherwise.

6. Exogenous Estimation (cont.)

Assuming that each decision maker's choice is independent of that of other decision makers, the probability of each person in the sample choosing the alternative that he or she was actually observed to choose is

$$L(\beta) = \prod_{n=1}^{N} \prod_i (P_{ni})^{y_{ni}},$$

where $\beta$ is a vector containing the parameters of the model. The log-likelihood function is then

$$LL(\beta) = \sum_{n=1}^{N} \sum_i y_{ni} \ln(P_{ni}).$$

$LL(\beta)$ is globally concave for linear-in-parameters utility.
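To make the estimation step concrete, the sketch below simulates choices from a logit model with a single coefficient, V_nj = β x_nj, and recovers β by maximizing LL(β) with Newton-Raphson. This is only an illustration under stated assumptions: the data are synthetic, the true coefficient `TRUE_BETA`, sample size, and attribute range are hypothetical, and a real application would normally use an estimation package rather than this hand-written loop.

```python
import math
import random

def logit_probabilities(v):
    v_max = max(v)
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

def simulate_data(beta, n_people, n_alts, seed=0):
    """Synthetic sample: each person faces n_alts alternatives with one attribute x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        x = [rng.uniform(0.0, 5.0) for _ in range(n_alts)]
        p = logit_probabilities([beta * xj for xj in x])
        chosen = rng.choices(range(n_alts), weights=p)[0]
        data.append((x, chosen))
    return data

def log_likelihood(beta, data):
    """LL(beta) = sum_n ln P_n(chosen alternative)."""
    return sum(math.log(logit_probabilities([beta * xj for xj in x])[chosen])
               for x, chosen in data)

def newton_mle(data, beta=0.0, tol=1e-10, max_iter=50):
    """Maximize LL(beta) for a scalar beta; returns (estimate, standard error)."""
    hess = -1.0
    for _ in range(max_iter):
        grad, hess = 0.0, 0.0
        for x, chosen in data:
            p = logit_probabilities([beta * xj for xj in x])
            mean_x = sum(pj * xj for pj, xj in zip(p, x))
            grad += x[chosen] - mean_x                      # sum_j (y_nj - P_nj) x_nj
            hess -= sum(pj * xj * xj for pj, xj in zip(p, x)) - mean_x ** 2
        step = grad / hess
        beta -= step                                        # Newton update
        if abs(step) < tol:
            break
    return beta, math.sqrt(-1.0 / hess)

TRUE_BETA = -0.5  # hypothetical value used to generate the synthetic choices
data = simulate_data(TRUE_BETA, n_people=2000, n_alts=3)
beta_hat, std_err = newton_mle(data)

print(f"true beta {TRUE_BETA}, estimate {beta_hat:.4f} (s.e. {std_err:.4f})")
print(f"LL(0) = {log_likelihood(0.0, data):.2f}, LL(beta_hat) = {log_likelihood(beta_hat, data):.2f}")
```

Because LL(β) is globally concave in this linear-in-parameters case, the Newton iterations started from β = 0 are expected to reach the unique maximum.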

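7. Goodness of Fit

A statistic called the likelihood ratio index is often used to measure how well the models fit the data (i.e., how well the model, with its estimated parameters, performs compared with a model in which all the parameters are zero). The likelihood ratio index is defined as

$$\rho = 1 - \frac{LL(\hat{\beta})}{LL(0)},$$

where $LL(\hat{\beta})$ is the log-likelihood at the estimated parameters and $LL(0)$ is its value when all the parameters are set to zero. For models estimated on the same data with the same set of alternatives, it is usually valid to say that the model with the higher $\rho$ fits the data better.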
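A small worked example of the index follows. It assumes that every person faces the same number of alternatives J, so that a model with all parameters equal to zero assigns probability 1/J to each alternative and LL(0) = N ln(1/J). The sample size, the number of alternatives, and the fitted log-likelihood are hypothetical values.

```python
import math

def likelihood_ratio_index(ll_beta_hat, ll_zero):
    """rho = 1 - LL(beta_hat) / LL(0)."""
    return 1.0 - ll_beta_hat / ll_zero

n_people, n_alts = 500, 3  # hypothetical sample size and choice-set size

# With all parameters at zero, each alternative has probability 1/J.
ll_zero = n_people * math.log(1.0 / n_alts)

ll_beta_hat = -420.0       # hypothetical log-likelihood at the estimates

rho = likelihood_ratio_index(ll_beta_hat, ll_zero)
print(f"LL(0) = {ll_zero:.2f}, LL(beta_hat) = {ll_beta_hat:.2f}, rho = {rho:.4f}")
```

The index is 0 when the estimated model does no better than the zero-parameter model and approaches 1 as the log-likelihood at the estimates approaches zero.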

7. Goodness of Fit (cont.)

Another goodness-of-fit statistic that is sometimes used, but should be avoided, is the percent correctly predicted. This statistic is calculated by identifying for each sampled decision maker the alternative with the highest probability, based on the estimated model, and determining whether or not this was the alternative that the decision maker actually chose.

This statistic assumes that the alternative with the highest probability will be chosen each time, which is not true. Suppose an estimated model predicts choice probabilities of 0.75 and 0.25 in a two-alternative situation. Those probabilities mean that if 100 people faced the representative utilities that gave these probabilities, the researcher's best prediction of how many people would choose each alternative is 75 and 25. However, the percent correctly predicted statistic would predict that one alternative would be chosen by all 100 people.

8. Hypothesis Testing

Standard t-statistics are used to test hypotheses about individual parameters in discrete choice models. Hypotheses that involve several parameters at once are tested with a likelihood ratio test. Two of the most common such hypotheses are (1) several parameters are zero, and (2) two or more parameters are equal. The test statistic

$$2 \left( LL(\hat{\beta}_2) - LL(\hat{\beta}_1) \right),$$

where $\hat{\beta}_1$ is the estimate with the restrictions of the null hypothesis imposed and $\hat{\beta}_2$ is the unrestricted estimate, is used to evaluate these hypotheses. Under the null hypothesis, this statistic is chi-squared distributed with degrees of freedom equal to the number of restrictions implied by the null hypothesis (i.e., the difference in the number of coefficients). If this value exceeds the critical value of chi-squared with the appropriate degrees of freedom, then the null hypothesis is rejected.
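The likelihood ratio test can be sketched with the standard library alone. The function `chi2_sf` below computes the upper-tail probability of a chi-squared variable for integer degrees of freedom using closed forms for 1 and 2 degrees of freedom and a standard recurrence for higher values; it is a helper written for this illustration (a statistics library would normally be used instead). The two log-likelihood values and the number of restrictions are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(chi-squared with integer df > x)."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2.0), 2               # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1    # closed form for df = 1
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def likelihood_ratio_test(ll_restricted, ll_unrestricted, n_restrictions):
    """Return the LR statistic 2 (LL_unrestricted - LL_restricted) and its p-value."""
    statistic = 2.0 * (ll_unrestricted - ll_restricted)
    return statistic, chi2_sf(statistic, n_restrictions)

# Hypothetical example: the null hypothesis sets two coefficients to zero.
ll_restricted = -1250.0     # log-likelihood with the restrictions imposed
ll_unrestricted = -1245.2   # log-likelihood of the unrestricted model
statistic, p_value = likelihood_ratio_test(ll_restricted, ll_unrestricted, n_restrictions=2)

reject_at_5_percent = p_value < 0.05
print(f"LR statistic = {statistic:.2f}, p-value = {p_value:.4f}, reject: {reject_at_5_percent}")
```

Comparing the p-value with 0.05 is equivalent to comparing the statistic with the 5 percent critical value of chi-squared with the same degrees of freedom.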