An Overview of Choice Models

Similar documents
Probabilistic Choice Models

Introduction to Discrete Choice Models

Probabilistic Choice Models

Part I Behavioral Models

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 2. Recap: MNL. Recap: MNL

Lecture 1. Behavioral Models Multinomial Logit: Power and limitations. Cinzia Cirillo

P1: GEM/IKJ P2: GEM/IKJ QC: GEM/ABE T1: GEM CB495-05Drv CB495/Train KEY BOARDED August 20, :28 Char Count= 0

Choice Theory. Matthieu de Lapparent

Econ 673: Microeconometrics

Hiroshima University 2) The University of Tokyo

Lecture 6: Discrete Choice: Qualitative Response

Discrete Choice Methods with Simulation

I. Multinomial Logit Suppose we only have individual specific covariates. Then we can model the response probability as

How Indecisiveness in Choice Behaviour affects the Magnitude of Parameter Estimates obtained in Discrete Choice Models. Abstract

Limited Dependent Variable Models II

Chapter 1 - Preference and choice

Pairwise Choice Markov Chains

McGill University Department of Electrical and Computer Engineering

Entry for Daniel McFadden in the New Palgrave Dictionary of Economics

Hiroshima University 2) Tokyo University of Science

Chapter 4 Discrete Choice Analysis: Method and Case

disc choice5.tex; April 11, ffl See: King - Unifying Political Methodology ffl See: King/Tomz/Wittenberg (1998, APSA Meeting). ffl See: Alvarez

Pairwise Choice Markov Chains

Individual decision-making under certainty

Limited Dependent Variables and Panel Data

INTRODUCTION TO TRANSPORTATION SYSTEMS

Practical note on specification of discrete choice model. Toshiyuki Yamamoto Nagoya University, Japan

Bayesian Hierarchical Classification. Seminar on Predicting Structured Data Jukka Kohonen

h=1 exp (X : J h=1 Even the direction of the e ect is not determined by jk. A simpler interpretation of j is given by the odds-ratio

Lecture 14. Clustering, K-means, and EM

The perceived unreliability of rank-ordered data: an econometric origin and implications

Introductory notes on stochastic rationality. 1 Stochastic choice and stochastic rationality

6 Mixed Logit. 6.1 Choice Probabilities

Probability. 25 th September lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.)

SHEBA. Experiential Learning For Non Compensatory Choice. Khaled Boughanmi Asim Ansari Rajeev Kohli. Columbia Business School

Decision-making, inference, and learning theory. ECE 830 & CS 761, Spring 2016

Lecture 10 Demand for Autos (BLP) Bronwyn H. Hall Economics 220C, UC Berkeley Spring 2005

Probabilistic Graphical Models for Image Analysis - Lecture 1

(Behavioral Model and Duality Theory)

Outline. The binary choice model. The multinomial choice model. Extensions of the basic choice model

Introduction to mixtures in discrete choice models

Data-analysis and Retrieval Ordinal Classification

A PSYCHOPHYSICAL INTERPRETATION OF RASCH S PSYCHOMETRIC PRINCIPLE OF SPECIFIC OBJECTIVITY

EFFICIENT PREFERENCE LEARNING WITH PAIRWISE CONTINUOUS OBSERVATIONS AND GAUSSIAN PROCESSES. Department of Informatics and Nymøllevej 6

Lecture 6: Gaussian Mixture Models (GMM)

Multinomial Discrete Choice Models

Probability Theory for Machine Learning. Chris Cremer September 2015

Learning from Comparisons and Choices

Choice Set Effects. Mark Dean. Behavioral Economics G6943 Autumn 2018

What can we learn about correlations from multinomial probit estimates?

Estimation of mixed generalized extreme value models

Intro to Economic analysis

Decoupled Collaborative Ranking

COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit

The 17 th Behavior Modeling Summer School

Preference, Choice and Utility

A short introduc-on to discrete choice models

Statistical learning. Chapter 20, Sections 1 3 1

Performance Evaluation and Comparison

Chapter 3 Choice Models

Discrete Choice Models I

Discrete panel data. Michel Bierlaire

Probabilistic Graphical Models

Lecture-20: Discrete Choice Modeling-I

SP Experimental Designs - Theoretical Background and Case Study

Distributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College

POISSON RACE MODELS: THEORY AND APPLICATION IN CONJOINT CHOICE ANALYSIS

Bayesian nonparametric latent feature models

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2016 Instructor: Victor Aguirregabiria

Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan

Using Discrete Choice Models to Estimate Non-market Values: Effects of Choice Set Formation and Social Networks. Minjie Chen

Syllabus. By Joan Llull. Microeconometrics. IDEA PhD Program. Fall Chapter 1: Introduction and a Brief Review of Relevant Tools

Linear Classification: Probabilistic Generative Models

14.13 Lecture 8. Xavier Gabaix. March 2, 2004

Large-scale Ordinal Collaborative Filtering

Random Choice and Learning

PMR Learning as Inference

Assortment Optimization Under the Mallows model

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

16/018. Efficiency Gains in Rank-ordered Multinomial Logit Models. June 13, 2016

ESTIMATION AND OPTIMIZATION PROBLEMS IN REVENUE MANAGEMENT WITH CUSTOMER CHOICE BEHAVIOR

Recent Advances in Bayesian Inference Techniques

ECE521 Lecture7. Logistic Regression

Dominance and Admissibility without Priors

6.867 Machine Learning

Minimax-optimal Inference from Partial Rankings

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Moderate Expected Utility

Lecture 14 More on structural estimation

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Transitive Regret. Sushil Bikhchandani and Uzi Segal. October 24, Abstract

Recognizing single-peaked preferences on aggregated choice data

Bayesian Learning in Undirected Graphical Models

Learning Combinatorial Functions from Pairwise Comparisons

Restricted Boltzmann machines modeling human choice

Exploratory analysis of endogeneity in discrete choice models. Anna Fernández Antolín Amanda Stathopoulos Michel Bierlaire

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Transcription:

An Overview of Choice Models Dilan Görür Gatsby Computational Neuroscience Unit University College London May 08, 2009 Machine Learning II 1 / 31

Outline 1 Overview Terminology and Notation Economic vs psychological views of decision making Brief History 2 Models of Choice from Psychology Bradley Terry Luce (BTL) Elimination by Aspects 3 Models of Choice from Economics Logit Nested Logit Probit Mixed Logit 2 / 31

Outline 1 Overview Terminology and Notation Economic vs psychological views of decision making Brief History 2 Models of Choice from Psychology Bradley Terry Luce (BTL) Elimination by Aspects 3 Models of Choice from Economics Logit Nested Logit Probit Mixed Logit 3 / 31

Discrete Choice Models Terminology Goal of choice models understand and model the behavioral process that leads to the subject s choice Data: Comparison of alternatives from a choice set Psychophysics experiments Marketing: which cell phone to buy Alternatives: Items or courses of action Features (i.e. aspects) may or may not be known 4 / 31

Discrete Choice Models Terminology Choice set discrete finitely many alternatives exhaustive all possible alternatives are included mutually exclusive choosing one implies not choosing any other Data repeated choice from subsets of the choice set number of times each alternative is chosen over some others Task: Learn the true choice probabilities 5 / 31

Discrete Choice Models Different approaches Algebraic or absolute theories in economics vs probabilistic or stochastic theories in psychology Random utility models randomness in the determination of subjective value Constant utility models randomness in the decision rule 6 / 31

Discrete Choice Models Economic view of decision making Desirability precedes availability preferences are predetermined in any choice situation, and do not depend on what alternatives are available. A vaguely biological flavor Preferences are determined from a genetically-coded taste template. The expressed preferences are functions of the consumer s taste template, experience and personal characteristics. Unobserved characteristics vary continuously with the observed characteristics of a consumer. 7 / 31

Discrete Choice Models Psychological views of decision making Alternatives are viewed as a set of aspects that are known. The randomness in choice comes from the decision rule. 8 / 31

Discrete Choice Models Different approaches Psychology: Process Models Bradley-Terry-Luce (BTL) Elimination by Aspects (EBA) Diffusion Models Race Models Economics: Random Utility Maximization Models logit, nested logit probit mixed logit 9 / 31

Discrete Choice Models Basic assumptions Simple scalability The alternatives can be scaled such that the choice probability is a monotone function of the scale values of the respective alternatives. Independence from irrelevant alternatives (IIA) The probability of choosing an alternative over another does not depend on the choice set. Simple scalability implies IIA There are cases where both assumptions are violated, e.g. when alternatives share aspects. 10 / 31

Brief History Thurstone (1927) introduced a Law of Comperative Judgement: alternative i with true stimulus level V i is perceived with a normal error as V i + ε i, and the choice probability for a paired comparison satisfied P [1,2] (1) = Φ(V 1 V 2 ), a form now called binomial probit model. Random Utility Maximization (RUM) model is Thurstone s work introduced by Marschak (1960) into economics, exploring the theoretical implications for choice probabilities of maximization of utilities that contained some random elements. Independence from irrelevant alternatives (IIA) axiom introduced by Luce (1959) allowed multinomial choice probabilities to be inferred from binomial choice experiments. Luce showed for positive probabilities that IIA implies strict utilities w i s.t. P C (i) = w i / k C w k. Marschak proved for a finite universe of objects that IIA implies RUM. 11 / 31

Brief History Multinomial Logit (MNL) was introduced by McFadden (1968) in which the strict utilities were specified as functions of observed attributes of the alternatives, P C (i) = exp(v i )/ k C exp(v k). McFadden showed that the Luce model was consistent with a RUM model with iid additive disturbances iff these disturbances had Extreme Value Type I distribution. 12 / 31

Outline 1 Overview Terminology and Notation Economic vs psychological views of decision making Brief History 2 Models of Choice from Psychology Bradley Terry Luce (BTL) Elimination by Aspects 3 Models of Choice from Economics Logit Nested Logit Probit Mixed Logit 13 / 31

Bradley Terry Luce (BTL) Starts with the IIA assumption and arrives at the choice probabilities: P(x; A) = U(x) y A U(y) A: choice set U(α): utility of alternative x 14 / 31

Elimination by aspects Takes into account similarities between alternatives. Represents characteristics of the alternatives in terms of their aspects: x = α, β,... Choice probabilities P(x; A) = α x A o U(α)P(x; A α) β A A o U(β) A : set of aspects that belongs to at least one alternative in set A A o : set of aspects that belong to all alternatives in A U(α): utility of aspect α 15 / 31

Elimination by aspects b x y a c d e f g z Can cope with choice scenarios where IIA and simple scalability assumptions do not hold. 16 / 31

Elimination by aspects Independence from irrelevant alternatives Bus Car a b a a+b IIA suggests the probability of choosing the car to drop from 1/2 to 1/3 when a second bus is introduced as the third alternative. Bus 17 / 31

Elimination by aspects Simple scalability b b a a Simple scalability assumes that the choice probabilities change smoothly with the scale (utility) of an alternative. However, in some situations introducing a small but unique aspect may lead to an immense change in the probabilities. 18 / 31

Outline 1 Overview Terminology and Notation Economic vs psychological views of decision making Brief History 2 Models of Choice from Psychology Bradley Terry Luce (BTL) Elimination by Aspects 3 Models of Choice from Economics Logit Nested Logit Probit Mixed Logit 19 / 31

Random Utility Maximization True utility Choice probability U nj = V nj + ε nj P ni = Prob(U ni > U nj j i) V nj = V (x nj, s n ) j x nj : observed attributes of alternative j s n : attributes of decision maker n 20 / 31

Random Utility Maximization True utility Choice probability U nj = V nj + ε nj where ε n = [ε 1,..., ε J ] P ni = Prob(U ni > U nj j i) = Prob(V ni + ε ni > V nj + ε nj j i) = Prob(ε nj ε nj < V ni V nj j i) = I(ε nj ε nj < V ni V nj j i)f (ε n ) dε n, ε 21 / 31

Logit Extreme Value Error density and distribution f (ε nj ) = e ε nj e e ε nj, F (ε nj ) = e e ε nj Choice probability ( ) P ni = e e ε ni +V ni V nj e ε nj e e ε nj j i = ev ni j ev nj = eβt x ni j eβt x nj dε ni 22 / 31

Nested Logit Generalized Extreme Value The set of alternatives are partitioned into subsets called nests such that: IIA holds within each nest. IA does not hold in general for alternatives in different nests. Error distribution ( F (ε n ) = exp K ( ) e ε nj /λ k ) λ k j B k k=1 λ k is a measure of the degree of independence among the alternatives in nest k. 23 / 31

Nested Logit Generalized Extreme Value The set of alternatives are partitioned into subsets called nests such that: IIA holds within each nest. IA does not hold in general for alternatives in different nests. Choice probability P ni = ev ni /λ k ( j B k e V nj /λ k ) λ k 1 K l=1 ( j B l e V nj /λ l ) λ l λ k is a measure of the degree of independence among the alternatives in nest k. 23 / 31

Nested Logit The model is RUM consistent if 0 λ k 1 For λ k > 1, the model is RUM consistent for some range of explanatory variables. λ k < 0 is RUM inconsistent as it implies that improving attributes of an alternative can decrease its probability of being selected. if λ k = 1 the model reduces to Logit as λ k 0 the model approaches the Elimination by Aspects model k 24 / 31

Generalizations of Nested Logit Higher-level nests Overlapping nests Cross Nested Logit multiple overlapping nests Ordered GEV correlation depends on the ordering of alternatives Paired Combinatorial Logit each pair of alternatives constitutes a nest with its own correlation Generalized Nested Logit multiple overlapping nests with a different membership weight to each net for each alternative. 25 / 31

Simple procedure for defining any GEV Define Y j = exp(v j ), and consider a function G = G(Y 1,..., Y J ) with partial derivatives G i = G Y i P i = Y ig i G if 1 G 0 for all positive values of Y j j. 2 G is homogeneous of degree one, i.e. G(ρY 1,..., ρy J ) = ρg(y 1,..., Y J ) 3 G as Y j for any j 4 The cross partial derivatives of G have alternating signs. 26 / 31

Simple procedure for defining any GEV Examples Logit: G = J j=1 Y j Nested Logit: G = ( K ) l=1 j B l Y 1/λ λl l, 0 λk 1 k Paired Combinatorial Logit: G = J 1 ( ) J k=1 l=k+1 Y 1/λ kl k + Y 1/λ λkl kl l ) Generalized Nested Logit: G = ( K ) k=1 (α j Bk jky j ) 1/λ λk k 27 / 31

Probit Normal Error distribution: Normal φ(ε n ) = 1 (2π) J/2 Ω 1/2 e 1 2 εt n Ω 1 ε n Choice Probabilities P ni = I(V ni + ε ni > V nj + ε nj j i)φ(ε n ) dε n, This integral does not have a closed form. 28 / 31

Mixed Logit can approximate any RUM model A hierarchical model where the logit parameters β are given prior distributions so that the choice probabilities are given as: P ni = L ni (β)f (β) dβ, where L ni (β) is the logit probability evaluated at parameters β: L ni (β) = e V ni β J i=1 ev nj β f (β) can be discrete or continuous. e.g. if β takes M possible values, we have a latent class model 29 / 31

Mixed Logit can approximate any RUM model A hierarchical model where the logit parameters β are given prior distributions so that the choice probabilities are given as: P ni = L ni (β)f (β) dβ, where L ni (β) is the logit probability evaluated at parameters β: L ni (β) = e V ni β J i=1 ev nj β f (β) can be discrete or continuous. e.g. if β takes M possible values, we have a latent class model 29 / 31

Extensions of Choice Models Model extensions Nonparametric distributions over noise distribution Nonparametric distributions over explanatory variable parameters Nonparametric prior over the functions on explanatory variables Application areas Conjoint analysis Ranking, information retrieval 30 / 31

References McFadden, "Economic Choices", Nobel Prize Lecture, 2000 Train, "Discrete Choice Methods with Simulation", Cambridge University Press, 2003 Tversky, "Elimination by Aspects: A Theory of Choice", Psychological Review, 1972 Greene, "Discrete Choice Modeling", 2008 Cao et al., "Learning to Rank: From Pairwise to Listwise Approach", ICML 2007 Chu, Ghahramani, "Preference Learning with Gaussian Processes", ICML 2005 31 / 31