Random Utility Models, Attention Sets and Status Quo Bias

Arie Beresteanu* and Roee Teper†

February 2012

Abstract

We develop a set of practical methods to understand the behavior of individuals when attention sets may affect their choices. We focus on a semi-parametric model of attention sets and a parametric model capturing status quo bias, providing identification and estimation results for both models. This paper aims to join modeling techniques from behavioral decision theory and econometric analysis in an innovative way, bridging the two disciplines.

Keywords: Random Utility Models, Status Quo Bias, Logit Models

Incomplete. Do not cite. Do not distribute.

* Department of Economics, University of Pittsburgh, arie@pitt.edu. This research is sponsored in part by NSF grant SES-0922373.
† Department of Economics, University of Pittsburgh, rteper@pitt.edu.

1 Introduction

Discrete choice models are important modeling tools in economics and are used in a broad set of empirical applications, such as consumer behavior in markets. Such models assume that the feasible set of alternatives considered by decision makers is given and universal, and that decision makers maximize utility over that set. In other words, all individuals consider the same bundle of alternatives when deciding which option maximizes their utility. In reality, different individuals may decide to pay close attention to only some of the alternatives in the universal set. In such a case, the decision maker reduces the number of alternatives she contemplates to form a potentially smaller set of alternatives, the attention set, from which she eventually chooses the element that maximizes her utility.

By developing a set of practical methods to understand the behavior of individuals when attention sets may affect their choices, this paper aims to join modeling techniques from behavioral decision theory and econometric analysis in an innovative way, bridging the two disciplines. We focus on two models: a semi-parametric model of attention sets as in Masatlioglu, Nakajima, and Ozbay (2012), and a parametric model capturing status quo bias as in Riella and Teper (2012). We develop statistical tests for the existence of attention sets and status quo bias in individual choice. Identification and estimation properties of the semi-parametric and parametric discrete choice models are provided, when agents are either homogeneous or heterogeneous with respect to how the attention sets are constructed. In future work we plan to implement the methods developed here using both a Monte Carlo experiment and actual data. In the Monte Carlo experiment, the properties of the estimators will be evaluated and compared. The empirical analysis will use data from the Consumer Expenditure Survey (CES) from 2003 and 2004. In this version of the paper we focus only on a Monte Carlo experiment with data drawn from a random number generator.

Overview and Results. The construction of attention sets has long been recognized as a key feature of decision making. Such models were studied in the literature of marketing (e.g., Roberts and Lattin (1991) and Roberts and Lattin (1997)), industrial organization (e.g., Piccione and Spiegler (2012) and Eliaz and Spiegler (2011)), decision theory (e.g., Masatlioglu and Ok (2005), Masatlioglu, Nakajima, and Ozbay (2012), Riella and Teper (2012)) and econometrics (e.g., Ben-Akiva and Boccara (1995) and Horowitz and Louviere (1995)).

The observed phenomenon that individuals exhibit affinity to the current state of affairs (status quo bias, as termed by Samuelson and Zeckhauser (1988)) is supported by ample evidence. Empirical findings cover significant decisions such as choosing a pension plan (Madrian and Shea (2001), Agnew, Balduzzi, and Sunden (2003) and Choi, Laibson, Madrian, and Metrick (2004)), electric services (Hartman, Doane, and Woo (1991)) and car insurance (Johnson, Hershey, Meszaros, and Kunreuther (1993)). Within the literature of decision theory, attention sets are by now a known and accepted modeling tool of status quo bias (see, for example, Masatlioglu and Ok (2005), Eliaz and Spiegler (2011) and Riella and Teper (2012)): the decision maker first restricts the set of alternatives she considers to a subset of alternatives that dominate the status quo option in some fashion, after which she picks her optimal choice from that restricted set. We here integrate attention sets and status quo bias into the classic analysis of discrete choice in econometrics.

We investigate a semi-parametric model in which utility is described by the standard conditional Logit. However, instead of maximizing utility over the universal set, we assume that agents pay attention to a subset of alternatives, the attention set (see Masatlioglu, Nakajima, and Ozbay (2012)), and maximize utility across elements in that set. With this model in mind, we provide conditions under which the distribution of the attention sets and utility functions can be identified. We show that in some cases only partial (as opposed to point) identification is possible. The identification results rely on the classic conditional Logit identification of McFadden (1974a) and on Kakutani's fixed-point theorem.

We then proceed to study a parametric and more structured model along the lines of Riella and Teper (2012). This model considers agents that are standard utility maximizers but also agents that are status quo biased, where status quo bias is modeled via attention sets. An attention set includes alternatives that dominate the status quo in a subset of attributes; the number of attributes in which an alternative needs to dominate the status quo option, in order to be included in the attention set, determines, and is determined by, the extent to which the decision maker is status quo biased. We use the results from the semi-parametric model to provide identification results for the parametric status quo bias model.

By incorporating behavioral decision theory into econometric analysis, we allow researchers and policy makers to evaluate the performance of existing policies and compare them with alternative policies. One example of a pertinent public policy debate that can benefit from the methodology pursued here is the design of public health insurance systems. Accounting for status quo bias and distinguishing between individuals with current health coverage and those without will allow for an optimal design of health insurance systems.

Organization. The manuscript is organized as follows. Section 2 presents a semi-parametric model with general attention sets. We give (partial) identification results for the parameters of this model.

In Section 3 a behavioral model of status quo bias is introduced where decision makers form their attention sets based on an observed status quo option. We give (partial) identification results for this model as well. We conclude in Section 4 with a short Monte Carlo experiment that implements the methods proposed in previous sections.

2 Identification with General Attention Sets

Let $(I, \mathcal{F}, P)$ be a probability space. $I$ is the collection of possible agents, where a typical agent is denoted by $i$. All random variables, vectors and sets are defined on this probability space. In what follows we denote all random elements associated with observation $i$ by a superscript and vector or matrix coordinates by subscripts. A sampling process draws decision makers from $I$ according to the probability $P$. Each individual is characterized by the random variables, matrices, functions and sets described below.

Each agent $i \in I$ [1] faces a universal set of alternatives. Each option is denoted by an index from the set $K = \{1, \dots, \bar{k}^i\}$. To simplify the exposition we assume that $\bar{k}^i = \bar{k}$, i.e. that all individuals face the same number of alternatives, although this is unnecessary. Each alternative $k \in K$ that the decision maker faces can be summarized by the $(\bar{m} + 1) \times 1$ vector $\tilde{x}^i_k = (x^i_{k,1}, \dots, x^i_{k,\bar{m}}, \varepsilon^i_k)$. The collection of these characteristics is denoted by $\tilde{X}^i = \{\tilde{x}^i_1, \dots, \tilde{x}^i_{\bar{k}}\}$, which is a finite subset of $\mathcal{X} \times \mathbb{R}$, where $\mathcal{X} \subset \mathbb{R}^{\bar{m}}$ is the observed characteristics space. Thus $\tilde{X}^i$ can be equivalently described as a $\bar{k} \times (\bar{m} + 1)$ matrix whose $\bar{k}$ rows are vectors in $\mathcal{X} \times \mathbb{R}$. With some abuse of notation we treat $\tilde{X}^i$ both as a set and as a matrix. The first $\bar{m}$ coordinates of each $\tilde{x}^i_k$ are observable and the last coordinate is unobservable. Let $\varepsilon^i = (\varepsilon^i_1, \dots, \varepsilon^i_{\bar{k}})$ be the $\bar{k} \times 1$ vector of unobserved characteristics, one for each alternative in $K$. The universal choice set $\tilde{X}^i$ is indexed by $i$ since the observed characteristics of a certain option may vary across individuals. [2]

[1] All statements made for all $i \in I$ are meant $P$-a.s.
[2] We can introduce random coefficients that depend on a vector of observed and unobserved individual characteristics. We leave random coefficient models for future work.

In Section 3 we introduce the notion of status quo bias. In the model we develop there we assume that an additional variable may be observed, $\tilde{s}^i$, which we assume is an element of the set of alternatives that decision maker $i$ faces, $\tilde{X}^i$. The discussion of this additional observable is left for Section 3.

The utility function of agent $i$ is $u^i$, which depends on the observed and unobserved characteristics of the choice, the agent's observed demographics and later also on her status quo option. We make the way in which the agent's utility depends on these elements specific in the next section.

In what follows we assume that the decision maker can restrict the set of alternatives she considers. Let $D^i \subseteq K$ be agent $i$'s attention set, a subset of alternatives to which agent $i$ pays attention. Finally, the choice made by agent $i$ is denoted by $y^i \in D^i$, indicating which of the members of $D^i$ is chosen by individual $i$. The utility function, the attention set and the choice are discussed in greater detail in the following section.

The goal in this section is to determine under which assumptions we can identify the distribution of $(u, D)$ from the joint distribution of the observables $(y, X)$. Either the utility function, $u$, or the attention set, $D$, can be left unspecified (nonparametric) or can be specified by a finite parameter. Moreover, we can either assume that the agents are homogeneous, i.e. all having the same $u$ and $D$ conditional on their covariates, or we can assume heterogeneity across agents and seek to identify the distribution of $(u, D)$ or their parameters. We start from a minimal set of assumptions on $(u, D)$ and gradually add assumptions as necessary.

2.1 Assumptions

In this section we make explicit the assumptions on the observed and unobserved elements of the model. We start from a general model of attention sets; the formation of these attention sets is left unspecified. In other words, no behavioral model is assumed at this stage. In Section 3 we introduce a behavioral model of status quo bias in which attention sets are formed by the decision maker according to an observed status quo option.

2.1.1 Utility

Our starting point is the conditional Logit model (McFadden 1974). Let $u(x, \varepsilon)$ be the utility function depending on observed and unobserved product characteristics. We assume that the utility function is additively separable in its observed and unobserved product characteristics. We further assume that the utility is linear in the observed product characteristics.
Assumption 2.1 The utility function $u(\cdot, \cdot)$ and the random variables $(X, \varepsilon)$ satisfy the following:
(i) The agent's utility function is a linear function of the observed product characteristics and is additively separable in the unobservable choice characteristics. In other words, for each choice $(x, \varepsilon) \in \mathcal{X} \times \mathbb{R}$, $u(x, \varepsilon) = v(x) + \varepsilon$, where $v : \mathcal{X} \to \mathbb{R}$.
(ii) $v(x)$ is linear in $x$. In other words, $v(x) = x\beta$ for $\beta \in B \subset \mathbb{R}^{\bar{m}}$, and $B$ is a compact set.
(iii) $\varepsilon_1, \dots, \varepsilon_{\bar{k}}$ are independent of each other and of $X$, with the Type-I extreme value distribution.

The function $v(\cdot)$ is often called the average utility function since $\varepsilon$ has mean zero. In the next section we introduce an additional characteristic of the consumer, namely the status quo option they currently hold. Assumption 2.1 postulates that the utility is not a function of the status quo option but only of the choices' characteristics and the individual's demographics. We deviate from this assumption in Section 3.1 where we introduce switching costs into the average utility function. A switching cost creates dependence of the utility function on the status quo option.

2.1.2 Attention Sets

The decision maker forms a potentially smaller set of alternatives from which she will choose the alternative that maximizes her utility. This reduced set is called the attention set and is denoted by $D$. Let $\mathcal{D} \subseteq 2^K \setminus \{\emptyset\}$ be a certain collection of subsets of $K$, where $K = \{1, \dots, \bar{k}\}$. We call $\mathcal{D}$ the collection of attention sets, representing all the possible subsets of alternatives to which a decision maker can restrict her attention. With no assumptions on the behavior of the decision maker, the set $D$ can be any member of the collection of attention sets $\mathcal{D}$. Furthermore, the collection $\mathcal{D}$ itself is arbitrary at this point. In the most general case we can have $\mathcal{D} = 2^K \setminus \{\emptyset\}$, which means that any non-empty subset of $K$ can potentially be an attention set for some agents. In the next section we introduce a behavioral model of status quo bias that determines the collection $\mathcal{D}$. The discussion on identification in this section, however, assumes an arbitrary collection of attention sets and is agnostic about how each decision maker chooses a set $D$.

We assume that each decision maker has an attention set $D^i \in \mathcal{D}$ which represents her type. Therefore, there are $|\mathcal{D}|$ types in the population and the data we observe are a finite mixture of the choices of $|\mathcal{D}|$ types of decision makers. For convenience we index the elements in the collection $\mathcal{D}$ and let $\mathcal{D} = (D_1, \dots, D_{\bar{d}})$. We assume that the collection $\mathcal{D}$ covers the set of alternatives, $K$, in the following way.

Assumption 2.2 The collection $\mathcal{D}$ satisfies:
(i) $\bigcup_{D \in \mathcal{D}} D = K$,
(ii) $|\mathcal{D}| \ge 2$,
(iii) for each $i$, $D^i \in \mathcal{D}$ and $D^i$ is statistically independent of $(X^i, \varepsilon^i)$.
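The covering conditions in Assumption 2.2 can be checked mechanically for any candidate collection. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, alternatives are labeled 1 to k, and part (ii) is read as requiring at least two attention sets. Part (iii) is a statistical independence condition and is not something this check can verify.

```python
from itertools import combinations

def all_attention_sets(k):
    """Return every non-empty subset of K = {1, ..., k} as a frozenset."""
    alternatives = range(1, k + 1)
    return [frozenset(c)
            for size in range(1, k + 1)
            for c in combinations(alternatives, size)]

def satisfies_cover_conditions(collection, k):
    """Check parts (i) and (ii) of Assumption 2.2 for a collection of sets."""
    if not collection:
        return False
    covers_k = set().union(*collection) == set(range(1, k + 1))  # part (i)
    enough_types = len(collection) >= 2                           # part (ii)
    return covers_k and enough_types

# The most general collection for three alternatives has 2^3 - 1 members.
full_collection = all_attention_sets(3)
```

For example, the collection {1, 2}, {2, 3} covers K = {1, 2, 3}, while {1, 2}, {2} does not because alternative 3 is never considered.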
The important part of Assumption 2.2-(iii) is that the decision maker's type, $D^i$, is independent of the unobservables $\varepsilon^i$. Therefore, given the observed $X^i$, $D^i$ is distributed multinomially over $\mathcal{D}$. For simplicity we assume that $D^i$ is also independent of $X^i$. For every $D_d \in \mathcal{D}$, let $\lambda_d = \Pr(D^i = D_d)$ and let $\lambda = (\lambda_1, \dots, \lambda_{\bar{d}})$.

At this point we do not specify how the collection $\mathcal{D}$ is created. We also remain agnostic about how each individual picks an element of the collection $\mathcal{D}$ to be her attention set. Models that speak to these issues are discussed in Section 3.

2.1.3 Choice

After the attention set is formed in the first stage, the following choice is made.

Assumption 2.3 The choice $y$ is defined by
(2.1) $y = \arg\max \{ u(\tilde{X}_j) : j \in D \}$,
where $D \in \mathcal{D}$. $y$ is called the choice correspondence.

Once the attention set is formed the decision maker behaves as a classic utility maximizer. In cases where $D = \tilde{X}$ we obtain the regular discrete choice model.

2.1.4 Discussion

The assumptions on the utility function and the distribution of the unobservables are similar to those stated in McFadden (1974a). The model with attention sets we discuss here can be seen as a mixture model of conditional Logit with mixture weights $\lambda$. The following sections show that both the utility parameter, $\beta$, and the mixture vector, $\lambda$, are partially jointly identified.

2.2 Identification of $\lambda$

Assume that the average utility function $v$ and the distribution of $\varepsilon$, $F$, are known. In other words, for each $x \in \mathcal{X}$ and each attention set $D \in \mathcal{D}$ we can compute the probability that a choice $k \in D$ is chosen. Our focus in this sub-section is the weights $\{\lambda_D\}_{D \in \mathcal{D}}$. As before, we index the elements in the collection $\mathcal{D}$ and let $\mathcal{D} = (D_1, \dots, D_{\bar{d}})$ and $\lambda = (\lambda_1, \dots, \lambda_{\bar{d}})$. Define the following matrix,
$A = [a_{kd}]_{k = 1, \dots, \bar{k};\ d = 1, \dots, \bar{d}}$, where $a_{kd} = \Pr(y = k \mid X, D = D_d)$.
Note that, for example, if $k \notin D_d$, then $a_{kd} = 0$. The data generating process allows us to observe $\Pr(y = k \mid X)$ for each $k \in K$. Denote by $p = (\Pr(y = 1 \mid X), \dots, \Pr(y = \bar{k} \mid X))'$ the $\bar{k} \times 1$ vector of observed conditional choice probabilities. Then $A\lambda' = p$. Define the following identification region
(2.2) $\Lambda(u) = \{ \lambda \in \Delta^{\bar{d}-1} : A\lambda' = p \}$,
where $\Delta^{\bar{d}-1} \subset \mathbb{R}^{\bar{d}}$ is the $(\bar{d} - 1)$-dimensional simplex. For a matrix $A$, let $\operatorname{rank}(A)$ denote the number of linearly independent rows of the matrix. Let $[A : p]$ denote the $\bar{k} \times (\bar{d} + 1)$ matrix called the augmented matrix. Theorem 2.1 summarizes the properties of the set $\Lambda(u)$.
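To make the objects of Section 2.2 concrete, here is a small Python sketch of the matrix of within-set choice probabilities and the implied vector of observed choice probabilities. It assumes the conditional Logit form of Assumption 2.1. The function names, the zero average utilities, the two attention sets and the equal weights are illustrative choices of this sketch, not values from the paper; alternatives are indexed from 0 here.

```python
import math

def logit_probs_within(v, attention_set):
    """Conditional logit probabilities restricted to one attention set.

    v is a list of average utilities (one per alternative); alternatives
    outside the attention set receive probability zero.
    """
    denom = sum(math.exp(v[l]) for l in attention_set)
    return [math.exp(v[k]) / denom if k in attention_set else 0.0
            for k in range(len(v))]

def choice_matrix(v, collection):
    """Build the k-by-d matrix with entries Pr(y = k | X, D = D_d)."""
    columns = [logit_probs_within(v, D) for D in collection]
    return [[col[k] for col in columns] for k in range(len(v))]

def mixture_probs(A, weights):
    """Observed choice probabilities: each row of A times the type weights."""
    return [sum(a * w for a, w in zip(row, weights)) for row in A]

v = [0.0, 0.0, 0.0]              # illustrative average utilities
collection = [{0, 1}, {1, 2}]    # two illustrative attention sets
weights = [0.5, 0.5]             # illustrative mixture weights
A = choice_matrix(v, collection)
p = mixture_probs(A, weights)
```

In this toy case the middle alternative belongs to both attention sets and is chosen half the time, while each outer alternative is chosen a quarter of the time. The identification question runs in the other direction: given p and A, which weight vectors in the simplex reproduce p.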

Theorem 2.1 The set $\Lambda(u)$ in (2.2) satisfies the following properties.
(i) $\Lambda(u)$ is non-empty if and only if $\operatorname{rank}([A : p]) = \operatorname{rank}(A)$.
(ii) If $\bar{d} > \operatorname{rank}(A) = \operatorname{rank}([A : p])$ then $\lambda$ is partially identified and $\Lambda(u)$ is a non-empty convex subset of $\Delta^{\bar{d}-1}$.

First, note that $\bar{k} \ge \operatorname{rank}(A)$ by construction.

2.3 Identification of $\beta$

Given Assumptions 2.1, 2.2 and 2.3, the probability of choosing option $k \in K$ given the characteristics set $X$ has the following closed form,
(2.3) $\Pr(y = k \mid X) = \sum_{d=1}^{\bar{d}} \lambda_d \, 1[k \in D_d] \, \frac{e^{x_k \beta}}{\sum_{l \in D_d} e^{x_l \beta}}$.
The likelihood function based on $n$ independent repetitions of the above experiment is
(2.4) $L_n(\{y^i, X^i\}; \beta, \lambda) = \prod_{i=1}^{n} \Pr(y = y^i \mid X^i)$.

Theorem 2.2 Given Assumptions 2.1, 2.2 and 2.3, for each $\lambda \in \Delta^{\bar{d}-1}$ the likelihood function in (2.4) has a unique maximizer $\beta(\lambda)$.

Proof sketch. This result is based on results in McFadden (1974b), who showed that the likelihood function is globally concave with respect to $\beta$ for every attention set $D$. Therefore, the weighted average in (2.3) is globally concave as well. From this point the result follows using the arguments in McFadden (1974b).

The maximum likelihood estimator in the above theorem gives, for each $\lambda$, the parameter vector $\beta(\lambda)$ that maximizes the likelihood function. The following theorem is a result of the two theorems above.

Theorem 2.3 Given Assumptions 2.1, 2.2 and 2.3, $(\beta, \lambda)$ are partially jointly identified.

Proof sketch. Define a correspondence $I_B : B \rightrightarrows \Delta^{\bar{d}-1}$ as follows: if for $\beta \in B$ we have that $\Lambda(\beta) \ne \emptyset$ then let $I_B(\beta) = \Lambda(\beta)$; otherwise let $I_B(\beta) = \Delta^{\bar{d}-1}$. It is clear that $I_B(\beta)$ is compact and convex for every $\beta \in B$. Also, the graph of $I_B$ is closed. Indeed, since the $\varepsilon$'s are independent and absolutely continuous, the function $A : B \to L(\mathbb{R}^{\bar{k} \times \bar{d}})$ is continuous. [3] Thus, the rank operator is lower-semicontinuous as a function of $\beta$. Define, in addition, the function $I_\Delta : \Delta^{\bar{d}-1} \to B$ as follows: for $\lambda \in \Delta^{\bar{d}-1}$ let $I_\Delta(\lambda)$ be the $\beta \in B$ that maximizes the likelihood function, $I_\Delta(\lambda) = \beta(\lambda)$, as defined in Theorem 2.2. From Theorem 2.2, $I_\Delta$ is continuous in $\lambda$. Now, let $I : B \times \Delta^{\bar{d}-1} \rightrightarrows B \times \Delta^{\bar{d}-1}$ be defined by $I(\beta, \lambda) = (I_\Delta(\lambda), I_B(\beta))$. Since both $I_B$ and $I_\Delta$ are convex valued and upper-semicontinuous, so is $I$. From Kakutani's fixed-point theorem we have that $I$ has a fixed point $(\beta, \lambda) \in I(\beta, \lambda)$. The collection of all points $\mathcal{I} = \{ (\beta, \lambda) \in I(\beta, \lambda) : \Lambda(\beta) \ne \emptyset \}$ is the identification set of $(\beta, \lambda)$. $\mathcal{I}$ gives all parameter vectors $(\beta, \lambda)$ consistent with the data $p$. Note that this collection may be empty, in which case the model is misspecified. $\mathcal{I}$ is sharp by definition. It remains to show that this set can be computed.

[3] $L(\mathbb{R}^{\bar{k} \times \bar{d}})$ denotes the collection of all matrices of dimension $\bar{k} \times \bar{d}$.

Note that Theorem 2.3 is not the strongest possible identification result one might obtain. We intend to find a condition such that:
1. under such a condition the collection of $\beta$'s for which $\Lambda(\beta)$ is non-empty is convex; and
2. the MLEs, given the different $\lambda \in \Delta^{\bar{d}-1}$, satisfy this condition.
This will provide a direct condition on data and will allow us to use Kakutani's theorem directly on the collection of admissible parameters, without resorting to an artificial correspondence between the parameters as in the proof sketch above.

3 Status Quo Bias

In this section we discuss a decision model with status quo bias. We denote the observed status quo alternative (which can also be void) as $s^i \in \tilde{X}^i \cup \{\emptyset\}$. In other words, the decision maker either holds one of the alternatives in the universal choice set $\tilde{X}^i$ or no alternative at all. We denote $\bar{X} = \tilde{X} \cup \{\emptyset\}$. The decision maker then forms her attention set with respect to the observed status quo option in a way made explicit in equation (3.1) below. We make the same assumptions on the choice model as in the previous section with one difference. Assumption 2.1-(ii) in Section 2 states that the utility is linear with respect to observed product characteristics. In this section we modify this assumption and assume that it is also monotonically increasing in each characteristic. The reason for this difference is the way attention sets are constructed in the model with status quo bias.

3.1 Assumptions

Utility. Assumption 2.1 holds with the following addition to part (ii).

Assumption 3.1 (ii') $B \subset \mathbb{R}^{\bar{m}}_{+}$.

Attention Sets. Here we adopt the approach developed in Riella and Teper (2012) and introduce potential status quo bias into the way the decision maker forms her attention set. In this version of the paper we assume the agent forms the attention set based on observable characteristics only. Specifically, the attention set is
(3.1) $D(\tilde{X}, s, \alpha, \tau) = \{ x \in \tilde{X} : \sum_{j=1}^{\bar{m}} \alpha_j \, 1[x_j \ge s_j] \ge \tau \}$,
where $\alpha = (\alpha_1, \dots, \alpha_{\bar{m}})$ is a vector of unknown non-negative coefficients and $\tau \ge 0$ is an unknown parameter. Given (3.1), fixing $(\tilde{X}, s)$, knowledge of $(\alpha, \tau)$ is enough to construct $D(\tilde{X}, s, \alpha, \tau)$.

Two points should be noted following (3.1). First, if $s = \emptyset$, then $D(\tilde{X}, s, \alpha, \tau) = \tilde{X}$ and the agent is acting as a classic utility maximizer over the set $\tilde{X}$. Second, if $\tau = 0$ then $D(\tilde{X}, s, \alpha, \tau) = \tilde{X}$ as well. The last point implies that the model of utility maximization over the universal set $\tilde{X}$ is nested in our model of discrete choice under status quo bias. From (3.1) the parameter $\tau$ can be interpreted as the extent to which the agent is biased toward the status quo option $s$. In addition, $\alpha_j$ is the relative importance the agent assigns to characteristic $j$. The following normalization is required.

Assumption 3.2 $\alpha$ and $\tau$ satisfy
(3.2) $\alpha_j \ge 0, \ \forall j$; $\|\alpha\| = 1$ and $\tau \in [0, 1]$.

An extreme example is the case where $\alpha_k = 1$ for some $k \in \{1, \dots, \bar{m}\}$ and $\alpha_j = 0$ for all $j \ne k$. In this case only options $x$ for which $x_k \ge s_k$ are included in the attention set. In other words, characteristic $k$ is a must-have characteristic and no alternative which is inferior to $s$ on this dimension is considered. More importantly, the attention set as defined in (3.1) with the normalization in (3.2) may not be enough to point identify $(\alpha, \tau)$ due to the discrete nature of the $1[\cdot]$ function in (3.1). A complete discussion of this argument is left for future versions of this paper.
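The rule in (3.1) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the function name, the names alpha and tau for the coefficient vector and the threshold, the use of None for a void status quo, and the weights summing to one (one possible reading of the normalization in (3.2)) are all choices made for this sketch.

```python
def attention_set(X, s, alpha, tau):
    """Indices of the alternatives kept by the weighted-dominance rule (3.1).

    X is a list of characteristic vectors, s is the status quo vector or
    None when the agent holds no alternative, alpha holds non-negative
    weights and tau is the threshold.
    """
    if s is None:
        # No status quo: the agent considers the whole universal set.
        return list(range(len(X)))
    kept = []
    for idx, x in enumerate(X):
        # Add up the weights of the characteristics on which x weakly beats s.
        score = sum(a for a, xj, sj in zip(alpha, x, s) if xj >= sj)
        if score >= tau:
            kept.append(idx)
    return kept

X = [[1, 2], [3, 0], [0, 0]]   # three illustrative alternatives
s = [1, 1]                     # illustrative status quo
alpha = [0.5, 0.5]             # illustrative weights
```

With these values a threshold of 0.5 keeps the first two alternatives, a threshold of 1 keeps only the first, and a threshold of 0 keeps all three. Moving alpha slightly away from (0.5, 0.5) while holding the threshold at, say, 0.4 leaves the set unchanged, which is the discreteness that stands in the way of point identification.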

The following assumption on the joint distribution of $(X, s)$ makes sure the decision makers we can draw in our sample are sufficiently diverse. The reason for this assumption will become clear in the following sub-sections.

Assumption 3.3 The joint distribution of $(X, s)$ is such that for any binary vector $b \in \{0, 1\}^{\bar{m}}$, $\Pr(\exists x \in X \text{ such that } 1[x \ge s] = b) > 0$.

3.1.1 Partial Identification with Common $\beta$, $\alpha$ and $\tau$

Here we assume that $\beta^i = \beta$ for some $\beta \in B$, $\alpha^i = \alpha$ for some $\alpha \in \mathbb{R}^{\bar{m}}$ and $\tau^i = \tau$, such that Assumption 3.2 is satisfied $P$-a.s. The identification question here is: can we infer $(\beta, \alpha, \tau)$ from $P(y, s, X)$?

Consider first the likelihood of observing a certain agent facing choices $X^i$ and owning a status quo option $s^i$ making a decision $y^i$. First, we look at the attention set. If we assume (3.1), then given the observables $(X, s)$ and the parameters $(\alpha, \tau)$ the set $D(X, s, \alpha, \tau)$ is known and non-stochastic. Assume that the distribution of $\varepsilon$ is known to be Type-I extreme value. Then this distribution induces a distribution over $2^{\tilde{X}}$ for $D(X, s, \alpha, \tau)$. For simplicity, we assume that the decision maker forms her attention set based on observables only. For $j \in \{1, \dots, \bar{k}\}$ let
$P_{j \mid X, s} := \Pr(y = j \mid X, s, \beta, \alpha, \tau) = \Pr(X_j \beta + \varepsilon_j \ge X_r \beta + \varepsilon_r \ \ \forall r \in D(X, s, \alpha, \tau))$.
Note that if $j \notin D(X, s, \alpha, \tau)$, then $P_{j \mid X, s} = 0$. If, however, $j \in D(X, s, \alpha, \tau)$, then assuming that $\varepsilon$ has the Type-I extreme value distribution,
$P_{j \mid X, s} = \frac{\exp(X_j \beta)}{\sum_{r \in D(X, s, \alpha, \tau)} \exp(X_r \beta)}$.
Therefore, the log-likelihood of observing $\{y^i, X^i, s^i\}_{i=1}^{N}$ is
(3.3) $LL_N(\beta, \alpha, \tau) = \sum_{i=1}^{N} \log P_{y^i \mid X^i, s^i}$.

Consider now the formation of the attention set. As (3.1) suggests, the attention set is built using ordinal preferences. This is manifested by the presence of the indicator function in the definition. The discrete nature of the attention set implies that a small change in the vector $\alpha$ may not cause any change in the set $D(X, s, \alpha, \tau)$.

Theorem 3.1 Given the assumptions above and equation (3.1), $(\beta, \alpha, \tau)$ are partially jointly identified.

Proof sketch. First, notice that if $x \in D(X, s, \alpha, \tau)$, then $\Pr(y = x \mid X, s, \beta, \alpha, \tau) > 0$. This is satisfied with the conditional Logit model as well as other common discrete choice models since we assume infinite support for $\varepsilon$. Let $(\beta, \alpha, \tau)$ and $(\tilde{\beta}, \tilde{\alpha}, \tilde{\tau})$ be two alternative vectors that satisfy Assumption 3.2. Consider the following two cases. First assume that
$\Pr\{ (X, s) : D(X, s, \alpha, \tau) \ne D(X, s, \tilde{\alpha}, \tilde{\tau}) \} > 0$.
In other words, the set of individuals for which the attention set is different under $(\alpha, \tau)$ than under $(\tilde{\alpha}, \tilde{\tau})$ has non-zero measure. For individuals in this set, either there exists $a \in D(X, s, \alpha, \tau)$ such that $a \notin D(X, s, \tilde{\alpha}, \tilde{\tau})$, or the same holds with the roles of the two parameter vectors reversed. Following our first observation, in the former case $\Pr(y = a \mid X, s, \alpha, \tau) > 0$ while $\Pr(y = a \mid X, s, \tilde{\alpha}, \tilde{\tau}) = 0$, so the two parameter vectors imply different choice probabilities and can be told apart.

Now consider the case where
$\Pr\{ (X, s) : D(X, s, \alpha, \tau) \ne D(X, s, \tilde{\alpha}, \tilde{\tau}) \} = 0$.
In this case, $(\alpha, \tau)$ and $(\tilde{\alpha}, \tilde{\tau})$ are observationally equivalent. Define
$A(\alpha, \tau) = \{ (\tilde{\alpha}, \tilde{\tau}) : \Pr\{ (X, s) : D(X, s, \alpha, \tau) \ne D(X, s, \tilde{\alpha}, \tilde{\tau}) \} = 0 \}$.
We call $A(\alpha, \tau)$ the equivalence class of $(\alpha, \tau)$. Since all the members of a given equivalence class are observationally equivalent, the parameters $(\alpha, \tau)$ are partially identified.

To jointly identify $(\beta, \alpha, \tau)$ we can consider two situations. First, if $\Pr(s = \emptyset \mid X) > 0$, then we can point identify $\beta$ from this sub-population; $(\alpha, \tau)$ are still partially identified. If, however, $\Pr(s = \emptyset \mid X) = 0$, we need to invoke arguments similar to those used in Theorem 2.3.

4 Monte Carlo Experiments

In this section we implement the methods suggested in previous sections on a simplified model using Monte Carlo experiments. We add the following normalization to the definition in (3.1). Let $\alpha_j = 1, \ \forall j$ and $\tau \in \{0, 1, \dots, \bar{m}\}$. This assumption states that in order to consider an alternative, the decision maker counts the number of coordinates on which an option $x$ is (weakly) superior to the status quo option $s$. If this number is greater than or equal to $\tau$, then $x$ is included in the consideration set. The above is summarized in the following assumption.

Assumption 1 The agent forms the consideration set based on observed characteristics and using the following ordinal rule,

$$D(\tilde{X}, s; \tau) = \left\{ x \in \tilde{X} : \sum_{j=1}^{m} 1[x_j \geq s_j] \geq \tau \right\}. \tag{4.1}$$

In all Monte Carlo experiments the number of observations is $N = 1000$. The distribution of $\varepsilon_{ij}$ is Type-I Extreme Value. There are 4 characteristics for each choice ($m = 4$) and there are 7 choices ($k = 7$). Therefore, $\tilde{X}_i$ is a $7 \times 4$ matrix. For each individual $i$, the matrix of individual/choice characteristics, $X_i$, was drawn such that the first two characteristics came from the uniform distribution $U(0, 3.3)$ and the last two from the uniform distribution $U(-3.3, 0)$. The 7th choice was used as the numeraire choice.

4.1 Common $\beta$ and $\tau$

We set $\beta = (0.4, 0.3, 0.5, 0.6)$ and $\tau = 2$ for all agents. We estimated the model described in Section 3 with the assumption above in three different ways. First, we ignore the presence of status quo bias and estimate the parameter $\beta$ using the conditional Logit model. Second, we estimate $\beta$ using only the sub-population of decision makers that do not have a status quo option. Given the estimator $\hat{\beta}$, we estimate $\tau$ using the likelihood function in (3.3). Third, we estimate both $\beta$ and $\tau$ together using the likelihood function in (3.3). We expect the first estimator to be inconsistent and the second and third estimators to be consistent. The third estimator is expected to be more efficient than the second.

For the Monte Carlo simulations, for each individual $i$ we choose a characteristic matrix $X_i$ of size $7 \times 4$, which represents seven choices each characterized by a 4-dimensional vector. The parameter vector is set to $\beta = (0.4, 0.3, 0.5, 0.6)$ and the status quo bias parameter is set to $\tau = 2$. We drew the unobserved product characteristic $\varepsilon_{ij}$ from the type-I extreme value distribution.
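The counting rule in (4.1) can be illustrated with a small Python sketch. The function name `attention_set` and the toy status quo and alternatives below are hypothetical, chosen only so that the counts are easy to verify by hand.

```python
import numpy as np

def attention_set(X, s, tau):
    """Counting rule: keep alternative x if it is weakly better than the
    status quo s in at least tau coordinates.

    X   : (k, m) array, one row per alternative
    s   : (m,) status quo characteristics
    tau : integer threshold between 0 and m
    """
    counts = (X >= s).sum(axis=1)   # number of coordinates with x_j >= s_j
    return counts >= tau

s = np.array([1.0, 2.0, -1.0, -2.0])
X = np.array([
    [1.5, 2.5, -0.5, -1.0],   # weakly better in all 4 coordinates
    [1.5, 1.0, -0.5, -3.0],   # weakly better in coordinates 1 and 3 only
    [0.5, 1.0, -2.0, -1.0],   # weakly better in coordinate 4 only
])

print(attention_set(X, s, tau=2))   # first two alternatives are considered
```

Raising the threshold shrinks the set (with `tau=4` only the first alternative survives), while `tau=0` admits every alternative, so the standard conditional Logit is nested as a special case of this rule.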
Method              $\hat\beta_1$    $\hat\beta_2$    $\hat\beta_3$    $\hat\beta_4$    $\hat\tau$
Conditional Logit   0.533 (0.044)    0.435 (0.040)    0.630 (0.042)    0.7309 (0.044)   N.A.
Two step            0.406 (0.116)    0.304 (0.113)    0.503 (0.114)    0.613 (0.121)    2
One step            0.402 (0.048)    0.303 (0.044)    0.499 (0.049)    0.603 (0.048)    2
True values         0.4              0.3              0.5              0.6              2
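One possible way to implement a joint ("one step") estimator of the utility coefficients and the threshold is a profile likelihood: for each candidate threshold, maximise the restricted Logit likelihood over the coefficients, then keep the threshold with the highest value. The sketch below is not the authors' code. It assumes that every agent has a status quo drawn from her own seven alternatives, it omits the numeraire normalization, and it uses a plain Newton-Raphson iteration; all function names are hypothetical.

```python
import numpy as np

def attention_masks(X, s, tau):
    """Counting rule for all agents: X is (n, k, m), s is (n, m)."""
    return (X >= s[:, None, :]).sum(axis=2) >= tau

def logit_probs(X, beta, mask):
    """Logit probabilities restricted to each agent's attention set."""
    v = np.where(mask, X @ beta, -np.inf)
    v = v - v.max(axis=1, keepdims=True)
    w = np.exp(v)
    return w / w.sum(axis=1, keepdims=True)

def simulate(n, k, m, beta, tau, rng):
    """Draw characteristics, status quo options and choices."""
    X = np.concatenate([rng.uniform(0.0, 3.3, (n, k, m // 2)),
                        rng.uniform(-3.3, 0.0, (n, k, m - m // 2))], axis=2)
    # Assumption: the status quo is one of the agent's own alternatives,
    # so the attention set is never empty.
    s = X[np.arange(n), rng.integers(0, k, n)]
    mask = attention_masks(X, s, tau)
    u = np.where(mask, X @ beta + rng.gumbel(size=(n, k)), -np.inf)
    return X, s, u.argmax(axis=1)

def fit_beta(X, y, mask, iters=30):
    """Newton-Raphson for the restricted Logit; returns (beta, log-likelihood)."""
    n, k, m = X.shape
    rows = np.arange(n)
    beta = np.zeros(m)
    for _ in range(iters):
        p = logit_probs(X, beta, mask)
        xbar = np.einsum("nk,nkm->nm", p, X)          # expected characteristics
        grad = (X[rows, y] - xbar).sum(axis=0)        # score of the log-likelihood
        hess = np.einsum("nk,nkm,nkl->ml", p, X, X) - xbar.T @ xbar
        beta = beta + np.linalg.solve(hess, grad)
    ll = np.log(logit_probs(X, beta, mask)[rows, y]).sum()
    return beta, ll

def one_step(X, s, y):
    """Profile the likelihood over tau in {0, ..., m}."""
    n, k, m = X.shape
    best = (-np.inf, None, None)
    for tau in range(m + 1):
        mask = attention_masks(X, s, tau)
        if not mask[np.arange(n), y].all():           # some choice has probability 0
            continue
        beta, ll = fit_beta(X, y, mask)
        if ll > best[0]:
            best = (ll, beta, tau)
    return best

beta_true = np.array([0.4, 0.3, 0.5, 0.6])
X, s, y = simulate(1000, 7, 4, beta_true, 2, np.random.default_rng(0))
ll_hat, beta_hat, tau_hat = one_step(X, s, y)
print(tau_hat, np.round(beta_hat, 3))
```

In this sketch a threshold above the true one is ruled out as soon as one observed choice falls outside the implied attention set, and for a fixed coefficient vector a smaller threshold can only enlarge the denominators of the choice probabilities. Under these assumptions the threshold estimate is therefore the largest feasible value, which is one way to see why it can be recovered without sampling error in a design like this.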

References

Agnew, J., P. Balduzzi, and A. Sunden (2003): Portfolio Choice and Trading in a Large 401(k) Plan, American Economic Review, 93, 193-215.

Ben-Akiva, M., and B. Boccara (1995): Discrete Choice Models with Latent Choice Sets, International Journal of Research in Marketing, 12, 9-24.

Choi, J. J., D. Laibson, B. C. Madrian, and A. Metrick (2004): Perspectives on the Economics of Aging. University of Chicago Press.

Eliaz, K., and R. Spiegler (2011): Consideration Sets and Competitive Marketing, Review of Economic Studies, 78, 235-262.

Hartman, R. S., M. J. Doane, and C.-K. Woo (1991): Consumer Rationality and the Status Quo, Quarterly Journal of Economics, 106, 141-162.

Horowitz, J. L., and J. J. Louviere (1995): What Is the Role of Consideration Sets in Choice Modeling?, International Journal of Research in Marketing, 12, 39-54.

Johnson, E., J. Hershey, J. Meszaros, and H. Kunreuther (1993): Framing, Probability Distortions, and Insurance Decisions, Journal of Risk and Uncertainty, 7, 35-51.

Madrian, B. C., and D. F. Shea (2001): The Power of Suggestion: Inertia in 401(k) Participation and Savings Behavior, Quarterly Journal of Economics, 116, 1149-1187.

Masatlioglu, Y., D. Nakajima, and E. Ozbay (2012): Revealed Attention, American Economic Review, 102, 2183-2205.

Masatlioglu, Y., and E. Ok (2005): Rational Choice with Status Quo Bias, Journal of Economic Theory, 121, 1-29.

McFadden, D. (1974a): Conditional Logit Analysis of Qualitative Choice Behavior, in Frontiers in Econometrics, ed. by P. Zarembka, pp. 105-142. Academic Press, New York.

McFadden, D. (1974b): Conditional Logit Analysis of Qualitative Choice Behavior, in Frontiers in Econometrics, ed. by P. Zarembka, chap. 4, pp. 105-142. Academic Press, New York.

Piccione, M., and R. Spiegler (2012): Price Competition under Limited Comparability, Quarterly Journal of Economics, 127, 97-135.

Riella, G., and R. Teper (2012): Probabilistic Dominance and Status Quo Bias, Mimeo.

Roberts, J. H., and J. M. Lattin (1991): Development and Testing of a Model of Consideration Set Composition, Journal of Marketing Research, 28, 429-440.

Roberts, J. H., and J. M. Lattin (1997): Consideration: Review of Research and Prospects for Future Insight, Journal of Marketing Research, 34, 406-410.

Samuelson, W., and R. Zeckhauser (1988): Status-quo bias in decision making, Journal of Risk and Uncertainty, 1, 7-59.