Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model


PSCI6000 Maximum Likelihood Estimation
Multiple Response Model 1
Tetsuya Matsubayashi, University of North Texas
November 2, 2010

Goals

- Random utility model
- Multinomial logit model
- Conditional logit model
- Independence of Irrelevant Alternatives
- Nested logit model (next week)
- Mixed logit model (next week)
- Multinomial probit model (next week)

Multinomial Dependent Variable

- Vote: Bush, Clinton, or Perot in the 1992 presidential election
- Travel: car, bus, or train
- Occupation: blue-collar job, white-collar job, professional job, etc.

Random Utility Model

A decision maker chooses one alternative from a choice set. The choice set is characterized as follows:

- Alternatives must be mutually exclusive.
- The choice set must be exhaustive.
- The number of alternatives must be finite.

Random Utility Model

The random utility model assumes that a decision maker $i$ attaches a utility $U_{im}$ to each alternative $m = 1, \dots, M$.

The random utility model assumes that the utility consists of two components:

- Systematic component, which we can observe.
- Random component, which we cannot observe.

Thus, the model is expressed as:

$$U_{im} = V_{im} + \epsilon_{im}$$

where $U_{im}$ is decision maker $i$'s utility for alternative $m$, $V_{im}$ is the systematic component for decision maker $i$ associated with choice $m$, and $\epsilon_{im}$ denotes the random component of utility for decision maker $i$ associated with choice $m$.

For example, the utility of voting for Obama increases as ideological proximity increases.

We assume that the systematic component of the utility is a linear function of some exogenous variables:

$$V_{im} = x_{im}\beta$$

if variables are choice-specific, and

$$V_{im} = x_i\beta_m$$

if variables are individual-specific.

In the case of presidential vote choice, candidates' traits are choice-specific variables and individuals' demographic characteristics are individual-specific variables.

The utility of individual $i$ for choice $m$ is rewritten as:

$$U_{im} = x_{im}\beta + \epsilon_{im}$$
$$U_{im} = x_i\beta_m + \epsilon_{im}$$

The model assumes that the decision maker chooses choice $m$ if and only if:

$$U_{im} > U_{ij} \quad \forall j \neq m$$

Multinomial Logit Model: Set Up

We begin with the random utility model using individual-specific variables:

$$U_{im} = x_i\beta_m + \epsilon_{im}$$

where $x_i$ denotes a vector of individual-specific characteristics and $\beta_m$ is a vector of choice-specific parameters. Thus, the effect of $x_i$ varies across the choices.

Suppose a vote choice model in the 1992 presidential election. Income is a key exogenous variable. The utilities are written as

Bush: $U_{i1} = \beta_{10} + \beta_{11}\,\text{Income}_i + \epsilon_{i1}$
Clinton: $U_{i2} = \beta_{20} + \beta_{21}\,\text{Income}_i + \epsilon_{i2}$
Perot: $U_{i3} = \beta_{30} + \beta_{31}\,\text{Income}_i + \epsilon_{i3}$

Suppose that individual $i$ chooses one of two alternatives. The probability of choosing alternative 1 is the probability that the utility from alternative 1 exceeds the utility from alternative 2:

$$\Pr(y_i = 1) = \Pr(U_{i1} > U_{i2})$$
$$= \Pr(V_{i1} + \epsilon_{i1} > V_{i2} + \epsilon_{i2})$$
$$= \Pr(\epsilon_{i2} - \epsilon_{i1} < V_{i1} - V_{i2})$$

This is a binary choice model.

Suppose that individual $i$ chooses one of three alternatives. The probability of choosing alternative 1 is the probability that the utility from alternative 1 exceeds both the utility from alternative 2 and the utility from alternative 3:

$$P(y_i = 1) = \Pr[(U_{i1} > U_{i2}) \text{ and } (U_{i1} > U_{i3})]$$
$$= \Pr[(V_{i1} + \epsilon_{i1} > V_{i2} + \epsilon_{i2}) \text{ and } (V_{i1} + \epsilon_{i1} > V_{i3} + \epsilon_{i3})]$$
$$= \Pr[(\epsilon_{i2} - \epsilon_{i1} < V_{i1} - V_{i2}) \text{ and } (\epsilon_{i3} - \epsilon_{i1} < V_{i1} - V_{i3})]$$

Set Up

When there are $J$ choices, the probability of choice $m$ is

$$P(y_i = m) = \Pr(U_{im} > U_{ij}) \quad \forall j \neq m$$

For example, the probability of voting for Bush equals the probability that the utility gained from voting for Bush exceeds the utilities from voting for Clinton and Perot.

Distributional Assumption

First, random components are independently and identically distributed (IID). In other words, the random component of each alternative's utility is uncorrelated with the random components of all other alternatives, and each random component has the same distribution.

Second, random components are distributed according to the type I extreme value distribution.

The CDF of the type I extreme value distribution is

$$F(\epsilon_{im}) = e^{-e^{-\epsilon_{im}}}$$

The PDF of the type I extreme value distribution is

$$f(\epsilon_{im}) = e^{-\epsilon_{im}} e^{-e^{-\epsilon_{im}}}$$

The choice of the distribution is motivated by the simplicity, tractability, and usefulness of the resulting model. This distribution has mode 0, mean 0.58, and standard deviation 1.28.

The Probability Density Function

[Figure: PDF of the type I extreme value distribution, plotted for x from -4 to 4.]
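The stated mode, mean, and standard deviation of the type I extreme value distribution can be checked numerically. The following is a minimal sketch, assuming a simple Riemann sum over a wide grid is accurate enough; the grid bounds and the helper names `gumbel_cdf` and `gumbel_pdf` are illustrative choices, not part of the lecture.

```python
import math
import numpy as np

def gumbel_cdf(e):
    """CDF from the slides: F(e) = exp(-exp(-e))."""
    return np.exp(-np.exp(-e))

def gumbel_pdf(e):
    """PDF from the slides: f(e) = exp(-e) * exp(-exp(-e))."""
    return np.exp(-e) * np.exp(-np.exp(-e))

# Grid wide enough that the probability mass outside it is negligible.
grid = np.linspace(-10.0, 40.0, 500_001)
step = grid[1] - grid[0]
pdf = gumbel_pdf(grid)

total = float(np.sum(pdf) * step)                     # integrates to about 1
mean = float(np.sum(grid * pdf) * step)               # about 0.577
sd = math.sqrt(float(np.sum((grid - mean) ** 2 * pdf) * step))  # about 1.283
mode = float(grid[np.argmax(pdf)])                    # about 0

print(round(total, 4), round(mean, 3), round(sd, 3), round(mode, 3))
```

The mean is the Euler-Mascheroni constant and the standard deviation is $\pi/\sqrt{6}$, which round to the values quoted above.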

The Cumulative Distribution Function

[Figure: CDF of the type I extreme value distribution, plotted for x from -4 to 4.]

Distributional Assumption

The difference between two extreme value variables is distributed logistic. That is, if $\epsilon_{im}$ and $\epsilon_{in}$ are iid extreme value, then $\epsilon_{imn} = \epsilon_{im} - \epsilon_{in}$ follows the logistic distribution:

$$F(\epsilon_{imn}) = \frac{e^{\epsilon_{imn}}}{1 + e^{\epsilon_{imn}}}$$

The choice probability is:

$$P_{im} = \int \Pr(\epsilon_{in} - \epsilon_{im} < V_{im} - V_{in} \;\; \forall n \neq m \mid \epsilon_{im})\, f(\epsilon_{im})\, d\epsilon_{im}$$

Some algebraic manipulation of this integral results in a succinct, closed-form expression:

$$P_{im} = \Pr(y_i = m) = \frac{e^{V_{im}}}{\sum_{j=1}^{J} e^{V_{ij}}} = \frac{e^{x_i\beta_m}}{\sum_{j=1}^{J} e^{x_i\beta_j}}$$

which is the logit choice probability. (See Train, 2003, 78-79 for this derivation.)

Identification

It is convenient to code the outcomes as $j = 0, 1, \dots, J$, so there are $J + 1$ alternatives in this notation.

In the current set up, the $\hat{\beta}_m$ are unidentified. For any vector of constants $q$, we find the same probabilities whether we use $\beta_m$ or $\beta_m^*$, where $\beta_m^* = \beta_m + q$. We could add an arbitrary constant to all the coefficients in the model, yet get the same probabilities.

Identification

Consider the following example with 3 choices:

$$P(y_i = 1) = \frac{e^{x(\beta_1 + q)}}{\sum_k e^{x(\beta_k + q)}}$$
$$= \frac{e^{x(\beta_1 + q)}}{e^{x(\beta_1 + q)} + e^{x(\beta_2 + q)} + e^{x(\beta_3 + q)}}$$
$$= \frac{e^{x\beta_1} e^{xq}}{e^{x\beta_1} e^{xq} + e^{x\beta_2} e^{xq} + e^{x\beta_3} e^{xq}}$$
$$= \frac{e^{x\beta_1} e^{xq}}{\left(\sum_k e^{x\beta_k}\right) e^{xq}}$$
$$= \frac{e^{x\beta_1}}{\sum_k e^{x\beta_k}}$$

Therefore, the model cannot distinguish the true parameters from the parameters plus an arbitrary constant.

As with ordered response models, we need an assumption or normalization which will identify the parameters. A convenient normalization that solves the identification problem is to assume that the coefficients for one of the choices are all zero.

Specifically, assume that $\beta_0 = 0$ for category zero:

$$P(y_i = 0) = \frac{e^{x \cdot 0}}{e^{x \cdot 0} + \sum_{k=1}^{J} e^{x\beta_k}} = \frac{1}{1 + \sum_{k=1}^{J} e^{x\beta_k}}$$

More generally, for $j = 0, 1, \dots, J$:

$$P(y_i = j) = \frac{e^{x\beta_j}}{1 + \sum_{k=1}^{J} e^{x\beta_k}}$$

The first alternative becomes the reference category to which all of the results are compared.

In this form, it is clear that when $J = 1$, we have the binary logit as a special case of the multinomial logit:

$$P(y_i = 1) = \frac{e^{x\beta_1}}{1 + e^{x\beta_1}}$$
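The identification argument can be illustrated numerically: adding the same vector $q$ to every $\beta_j$ leaves the choice probabilities unchanged, and so does subtracting the baseline coefficients so that the baseline row is zero. This is a minimal sketch with made-up numbers; the function name `mnl_probs` and all values of `x`, `betas`, and `q` are hypothetical.

```python
import numpy as np

def mnl_probs(x, betas):
    """Logit choice probabilities; betas holds one row of coefficients per alternative."""
    v = betas @ x                 # systematic utilities V_j = x beta_j
    expv = np.exp(v - v.max())    # subtracting a constant does not change the ratios
    return expv / expv.sum()

x = np.array([1.0, 0.5])          # constant plus one hypothetical covariate
betas = np.array([[0.2, -1.0],
                  [0.7, 0.3],
                  [-0.4, 0.9]])
q = np.array([5.0, -2.0])         # arbitrary vector of constants

p_original = mnl_probs(x, betas)
p_shifted = mnl_probs(x, betas + q)              # beta_j + q for every j
p_normalized = mnl_probs(x, betas - betas[0])    # baseline coefficients set to zero

# Binary special case: two alternatives with the baseline row equal to zero.
b1 = np.array([0.3, -0.6])
p_binary = mnl_probs(x, np.vstack([np.zeros(2), b1]))[1]
p_logistic = np.exp(x @ b1) / (1.0 + np.exp(x @ b1))

print(p_original, p_shifted, p_normalized)
```

All three probability vectors coincide, which is why one set of coefficients has to be fixed before the rest can be estimated.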

Estimation

Estimation of this model is relatively easy since the log-likelihood is globally concave.

To specify the likelihood, first define $d_{ij} = 1$ if individual $i$ chooses alternative $j$, and $d_{ij} = 0$ otherwise. This means there are $J + 1$ indicators $d_{ij}$ for each individual, each indicating a choice. Use these to select the appropriate terms in the likelihood function.

As with ordered response models, there is a different probability expression for each selected outcome. The likelihood function for individual $i$ is

$$L_i = P_0^{d_{i0}} P_1^{d_{i1}} P_2^{d_{i2}} \cdots P_J^{d_{iJ}}$$

Since we assume the observations are independent, the joint likelihood is the product of the individual likelihoods:

$$L = \prod_{i=1}^{N} P_0^{d_{i0}} P_1^{d_{i1}} P_2^{d_{i2}} \cdots P_J^{d_{iJ}}$$

The log-likelihood is

$$\ln L = \sum_{i=1}^{N} \sum_{m=0}^{J} d_{im} \ln P_m = \sum_{i=1}^{N} \sum_{m=0}^{J} d_{im} \ln\left(\frac{e^{x_i\beta_m}}{1 + \sum_{j=1}^{J} e^{x_i\beta_j}}\right)$$

where $\beta_0 = 0$.

Estimate the vote choice model in the 1992 presidential election:

$$\text{vote}_i = \beta_{m0} + \beta_{m1}\,\text{Economy}_i + \beta_{m2}\,\text{Democrat}_i + \beta_{m3}\,\text{Republican}_i + \beta_{m4}\,\text{Income}_i + \epsilon_{im}$$

where $\text{vote}_i$ has three categories (Bush, Clinton, Perot).

Use multinom in the nnet library in R.
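The log-likelihood above can be written in a few lines. The sketch below is not the `multinom` routine; it is a hand-rolled illustration in Python that evaluates $\ln L$ with $\beta_0 = 0$ and takes small fixed gradient steps. The data, the step size, the number of iterations, and the function names are all assumptions made for illustration.

```python
import numpy as np

def mnl_probs(beta_free, X):
    """N x (J+1) choice probabilities with the baseline row beta_0 fixed at zero."""
    betas = np.vstack([np.zeros(X.shape[1]), beta_free])
    v = X @ betas.T
    v = v - v.max(axis=1, keepdims=True)    # stabilise exp(); probabilities unchanged
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)

def mnl_loglik(beta_free, X, d):
    """ln L = sum_i sum_m d_im ln P_im."""
    return float((d * np.log(mnl_probs(beta_free, X))).sum())

def mnl_gradient(beta_free, X, d):
    """Gradient of ln L with respect to the free (non-baseline) coefficients."""
    resid = d - mnl_probs(beta_free, X)     # d_im - P_im
    return resid[:, 1:].T @ X

# Hypothetical data: a constant plus one covariate, outcomes coded 0, 1, 2.
X = np.array([[1, 0.2], [1, 1.5], [1, -0.7],
              [1, 0.9], [1, -1.2], [1, 0.4]], dtype=float)
y = np.array([0, 2, 1, 1, 0, 2])
d = np.eye(3)[y]                            # d_im = 1 if individual i chose m

beta = np.zeros((2, 2))                     # start from beta_1 = beta_2 = 0
ll_start = mnl_loglik(beta, X, d)           # every P_im is 1/3 at the start
for _ in range(500):
    beta = beta + 0.05 * mnl_gradient(beta, X, d)   # small fixed ascent step
ll_end = mnl_loglik(beta, X, d)

print(ll_start, ll_end)
```

Because the log-likelihood is globally concave, a sufficiently small ascent step raises it at every iteration; packaged routines typically use faster Newton-type updates instead.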

Interpreting Coefficients

With three outcomes and one baseline category, there are two sets of coefficients for each independent variable.

The signs of coefficients can be interpreted in a direct manner. For example, a negative coefficient indicates that the independent variable lowers the probability of voting for a candidate relative to the baseline candidate.

Statistical inference is done as usual.

Marginal Effects

We can calculate the marginal effect of one continuous independent variable on the probabilities of the outcome categories:

$$\frac{\partial P_m}{\partial x_k} = P_m\left(\beta_{km} - \sum_{j=0}^{J} P_j \beta_{kj}\right) = P_m(\beta_{km} - \bar{\beta}_k)$$

Here $\bar{\beta}_k = \sum_{j=0}^{J} P_j \beta_{kj}$ is the weighted sum of the $\beta_{kj}$, where the weights are the outcome probabilities. The marginal effect tells us the effect on the probability of choosing $m$ if a variable increases by a small amount.

Predicted Probabilities

Predicted probabilities can be computed with the following equation:

$$\hat{P}_m = \frac{e^{x\hat{\beta}_m}}{1 + \sum_{j=1}^{J} e^{x\hat{\beta}_j}}$$

The values of the key independent variable change, while the other variables are held constant.

Odds Ratio

Odds ratios are useful when you want to know the odds of choosing one alternative relative to another.

We first write:

$$\Omega_{mn}(x_i) = \frac{P_{im}}{P_{in}}$$

where $\Omega_{mn}(x_i)$ is the odds of outcome $m$ versus outcome $n$ given $x_i$, and $x_i$ includes all independent variables.
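The marginal effect formula from this part of the lecture can be checked against a numerical derivative. This is a minimal sketch with hypothetical coefficients (baseline row set to zero); `marginal_effects` and the values of `x` and `betas` are illustrative, not estimates from the 1992 data.

```python
import numpy as np

def mnl_probs(x, betas):
    """Choice probabilities for one individual; one row of betas per outcome."""
    v = betas @ x
    expv = np.exp(v - v.max())
    return expv / expv.sum()

def marginal_effects(x, betas, k):
    """dP_m/dx_k = P_m * (beta_km - sum_j P_j beta_kj), returned for every outcome m."""
    p = mnl_probs(x, betas)
    beta_bar = p @ betas[:, k]          # probability-weighted sum of the beta_kj
    return p * (betas[:, k] - beta_bar)

x = np.array([1.0, 0.3])                # constant plus one continuous covariate
betas = np.array([[0.0, 0.0],           # baseline outcome, normalised to zero
                  [0.5, -0.8],
                  [-0.2, 1.1]])

me = marginal_effects(x, betas, k=1)

# Central finite difference in the covariate as an independent check.
h = 1e-6
bump = np.array([0.0, h])
fd = (mnl_probs(x + bump, betas) - mnl_probs(x - bump, betas)) / (2 * h)

print(me, fd)
```

The marginal effects sum to zero across outcomes, since the probabilities must continue to sum to one.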

Odds Ratio

We continue:

$$\Omega_{mn}(x_i) = \frac{P_{im}}{P_{in}} = \frac{e^{x_i\beta_m} \big/ \sum_{j=1}^{J} e^{x_i\beta_j}}{e^{x_i\beta_n} \big/ \sum_{j=1}^{J} e^{x_i\beta_j}} = \frac{e^{x_i\beta_m}}{e^{x_i\beta_n}} = e^{x_i[\beta_m - \beta_n]}$$

For an individual with characteristics $x_i$, the odds of choosing $m$ over $n$ are $e^{x_i[\beta_m - \beta_n]}$.

If you want the odds relative to the baseline category (indexed 1 here, with its coefficients normalized to zero), the equation simplifies to:

$$\Omega_{m1}(x_i) = e^{x_i\beta_m}$$

For an individual with characteristics $x_i$, the odds of choosing $m$ over the baseline category are $e^{x_i\beta_m}$.

You can assess how a change in a particular independent variable affects the odds of $m$ versus the baseline category. The effect is computed by:

$$\frac{\Omega_{m1}(x_i, x_{ik} + \delta)}{\Omega_{m1}(x_i, x_{ik})} = e^{\beta_{km}\delta}$$

where $x_{ik}$ is the $k$th independent variable for individual $i$ and $\beta_{km}$ is the coefficient associated with the $k$th independent variable for alternative $m$.

For a change of $\delta$ in $x_{ik}$, the odds of outcome $m$ versus the baseline category are expected to change by a factor of $e^{\beta_{km}\delta}$, holding all other variables constant. The factor change in the odds for a change in $x_{ik}$ does not depend on the level of $x_{ik}$ or on the level of any other variable.

Conditional Logit Model

In the MNL model, each explanatory variable denotes an individual-specific characteristic and has a different effect on each outcome. The utility for the MNL model is expressed as

$$U_{im} = x_i\beta_m + \epsilon_{im}$$

The conditional logit (CL) model is slightly different from MNL since it considers the impact of choice-specific attributes instead of individual-specific attributes. The utility for the CL model is written as

$$U_{im} = z_{im}\gamma + \epsilon_{im}$$

where $z_{im}$ denotes a vector of choice-specific attributes. In the case of the vote choice model in 1992, $z_{im}$ would be a perceived candidate trait, for example. Importantly, the parameters are not choice-specific; there is only one parameter for each attribute.

In the three-candidate race, the utilities are expressed as

Bush: $U_{i1} = \beta_1\,\text{honesty}_{i1} + \epsilon_{i1}$
Clinton: $U_{i2} = \beta_1\,\text{honesty}_{i2} + \epsilon_{i2}$
Perot: $U_{i3} = \beta_1\,\text{honesty}_{i3} + \epsilon_{i3}$

The utility gets larger when perceived honesty increases.

Data for Conditional Logit Model

i   outcome   chosen   honesty   age
1   1         0        1         50
1   2         1        7         50
1   3         0        3         50
2   1         1        5         30
2   2         0        1         30
2   3         0        2         30
3   1         0        2         70
3   2         0        3         70
3   3         1        4         70

Conditional Logit Model

The probability that individual $i$ chooses alternative $m$ in the CL model is

$$\Pr(y_i = m) = \frac{e^{z_{im}\gamma}}{\sum_{j=1}^{J} e^{z_{ij}\gamma}}$$

which should be compared to the MNL model:

$$\Pr(y_i = m) = \frac{e^{x_i\beta_m}}{\sum_{j=1}^{J} e^{x_i\beta_j}}$$

where $\beta_1 = 0$.

(Mixed) Conditional Logit Model

It is possible to include both individual-specific and choice-specific attributes in the model. The utility is given by

$$U_{im} = x_i\beta_m + z_{im}\gamma + \epsilon_{im}$$

where $x_i$ contains individual-specific attributes for individual $i$ and $z_{im}$ contains choice-specific attributes for outcome $m$.

The probability that individual $i$ chooses alternative $m$ is:

$$P(y_i = m) = \frac{e^{x_i\beta_m + z_{im}\gamma}}{\sum_{j=1}^{J} e^{x_i\beta_j + z_{ij}\gamma}}$$

where $\beta_1 = 0$.
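The conditional logit probabilities can be computed directly from the stacked data shown in the table. The sketch below uses the honesty and age values from the table; the coefficient `gamma = 0.5` and the common age coefficient `0.1` are hypothetical values chosen for illustration, not estimates.

```python
import numpy as np

# Rows are individuals 1-3, columns are outcomes 1-3 (values from the data table).
honesty = np.array([[1, 7, 3],
                    [5, 1, 2],
                    [2, 3, 4]], dtype=float)
chosen = np.array([1, 0, 2])            # zero-based index of the chosen outcome
age = np.array([50.0, 30.0, 70.0])      # individual-specific, constant across outcomes

gamma = 0.5                             # hypothetical coefficient on honesty

def cl_probs(v):
    """Row-wise logit probabilities from an N x J matrix of systematic utilities."""
    v = v - v.max(axis=1, keepdims=True)
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)

p = cl_probs(gamma * honesty)
loglik = float(np.log(p[np.arange(3), chosen]).sum())

# Adding an individual-specific variable with one common coefficient changes nothing,
# because the same amount is added to every alternative's utility.
p_with_age = cl_probs(gamma * honesty + 0.1 * age[:, None])

print(p.round(3), loglik)
```

This is why, in the mixed model, individual-specific variables such as age need choice-specific coefficients $\beta_m$ with one set normalized to zero: with a single common coefficient they drop out of the probabilities.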

Interpretation

You can interpret the coefficients in exactly the same way as you do in the MNL model for the individual-specific variables.

For choice-specific variables, the signs of the coefficients indicate how an increase in $z$ affects the likelihood that the individual chooses one alternative.

You can also use the same techniques (e.g., predicted probabilities) to make an interpretation.

Independence of Irrelevant Alternatives

In the multinomial logit model, the equation for the odds of $m$ versus $n$ is

$$\frac{P(y_i = m)}{P(y_i = n)} = \frac{e^{x_i\beta_m}}{e^{x_i\beta_n}} = \frac{e^{V_{im}}}{e^{V_{in}}}$$

This equation indicates that the odds are determined without reference to the other outcomes that might be available.

This property is called the independence of irrelevant alternatives, or IIA. It is a consequence of assuming independence of the $\epsilon_{ij}$ in the random utility model.

Think about McFadden's famous example.

A person has two choices for commuting to work: a private car that is chosen with $P(\text{car}) = 1/2$ and a red bus with $P(\text{red bus}) = 1/2$. The implied odds of taking the car versus the red bus are 1.

Suppose a new bus company is started that is identical to the current service except that the buses are blue. IIA requires that the new probabilities are $P(\text{car}) = 1/3$, $P(\text{red bus}) = 1/3$, and $P(\text{blue bus}) = 1/3$. This is necessary so that the odds of a car versus a red bus remain 1.

However, if the only thing to distinguish the new bus service from the old is the color of the bus, we would not expect car travelers to start taking the bus (i.e., the utility does not change).

Instead, the share of red bus riders would be split, resulting in $P(\text{car}) = 1/2$, $P(\text{red bus}) = 1/4$, and $P(\text{blue bus}) = 1/4$. The new implied odds for car versus red bus are $(1/2)/(1/4) = 2$, which violates the IIA assumption.

The IIA assumption requires that if a new alternative becomes available, then all probabilities for the prior choices must adjust by precisely the amount necessary to retain the original odds among all pairs of outcomes.
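The red bus / blue bus example can be reproduced with the logit formula itself. In this minimal sketch every alternative is given systematic utility 0 (an assumption that matches the equal 1/2 shares in the example); `logit_probs` is an illustrative helper.

```python
import numpy as np

def logit_probs(v):
    """Logit choice probabilities from a vector of systematic utilities."""
    expv = np.exp(np.asarray(v, dtype=float))
    return expv / expv.sum()

before = logit_probs([0.0, 0.0])         # car, red bus
after = logit_probs([0.0, 0.0, 0.0])     # car, red bus, blue bus (identical to red)

odds_before = before[0] / before[1]      # car versus red bus
odds_after = after[0] / after[1]

print(before, after, odds_before, odds_after)
```

The logit model moves every share to 1/3 and keeps the car versus red bus odds at 1, rather than producing the more plausible split of 1/2, 1/4, 1/4 in which the odds would be 2.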

Independence of Irrelevant Alternatives

We assumed that the disturbances are distributed identically and independently according to the type I extreme value distribution.

A violation of IIA indicates that the errors $\epsilon_{ij}$ are not independent across alternatives $j$.

The non-independence causes us to overestimate the probability of choosing alternatives that are similar to each other.

Testing IIA

A Hausman-type test is available to assess the IIA property.

If the IIA property holds, then the parameter estimates obtained on a subset of alternatives will not be significantly different from those obtained on the full set of alternatives.

The Hausman test proceeds as follows:

1. Estimate coefficients $\hat{\beta}_F$ and covariance matrix $\hat{V}_F$ with all $J$ alternatives.
2. Estimate coefficients $\hat{\beta}_R$ and covariance matrix $\hat{V}_R$ with reduced alternatives.
3. Compare both estimates based on the Hausman statistic:
   $$(\hat{\beta}_R - \hat{\beta}_F)' [\hat{V}_R - \hat{V}_F]^{-1} (\hat{\beta}_R - \hat{\beta}_F)$$
   which follows a $\chi^2$ distribution with $k$ degrees of freedom, where $k$ is the number of elements in the $\beta$ vector.
4. If the test statistic is larger than a critical value, we reject the null hypothesis that the IIA property holds.

See Fry and Harris (1998) for alternative tests.
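Step 3 of the procedure above amounts to a single quadratic form. The sketch below computes it for made-up estimates; all numbers are hypothetical, and it assumes $\hat{\beta}_F$ and $\hat{V}_F$ have already been restricted to the coefficients that also appear in the reduced model. The 5% critical value 5.991 is the standard $\chi^2$ value for 2 degrees of freedom.

```python
import numpy as np

def hausman_statistic(b_r, v_r, b_f, v_f):
    """(b_R - b_F)' [V_R - V_F]^{-1} (b_R - b_F) and its degrees of freedom."""
    diff = np.asarray(b_r, dtype=float) - np.asarray(b_f, dtype=float)
    v_diff = np.asarray(v_r, dtype=float) - np.asarray(v_f, dtype=float)
    stat = float(diff @ np.linalg.solve(v_diff, diff))
    return stat, diff.size

# Hypothetical estimates for the k = 2 coefficients shared by both models.
b_r = [0.50, -1.20]                       # reduced set of alternatives
v_r = [[0.04, 0.00], [0.00, 0.09]]
b_f = [0.45, -1.10]                       # full set of alternatives
v_f = [[0.03, 0.00], [0.00, 0.07]]

stat, df = hausman_statistic(b_r, v_r, b_f, v_f)
critical_5pct = 5.991                     # chi-square critical value, 2 df, 5% level
reject_iia = stat > critical_5pct

print(stat, df, reject_iia)
```

With these numbers the statistic is 0.75, well below the critical value, so IIA would not be rejected. In finite samples $\hat{V}_R - \hat{V}_F$ need not be positive definite, in which case the statistic can come out negative and is hard to interpret.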