ECONOMETRICS II (ECO 2401) Victor Aguirregabiria. Winter 2018 TOPIC 3: MULTINOMIAL CHOICE MODELS


1. Introduction
2. Nonparametric model
3. Random Utility Models
- Definition;
- Common Specification and Normalizations;
- Choice Probabilities;
- Some Theorems

4. Logit Model
5. Nested Logit Model
6. Random Coefficients Logit Model
7. Monte Carlo Simulation
8. Simulation-Based Estimation

1. INTRODUCTION
Economics deals with agents' choices. Many important economic decisions can be described as discrete choices within a finite number of choice alternatives.
- Consumer choice of store, brand, or product variety;
- Occupational choice; migration decisions; school / university choice;
- Firms' decisions of where to locate plants / stores and which products to sell;
- Commuters' choice of transportation mode: car, bus, subway, bicycle, walk, mixed, ...

INTRODUCTION [2]
Let $\mathcal{J} = \{0, 1, \ldots, J\}$ be the set of choice alternatives that the agent faces. We index choice alternatives by $j$. Let $Y \in \mathcal{J}$ be the variable that represents the actual choice of an individual. Let $X$ be a vector of exogenous variables such as individual characteristics and attributes of each choice alternative.
Using a sample of $\{y, X\}$ we are interested in learning how $X$ affects $Y$.
- How prices or other product attributes affect consumer demand;
- How neighborhood amenities and housing prices affect people's decisions of where to live; ...

Stylized description of model and estimation
The model can be described as:
$$Y = h(X, \varepsilon, \theta)$$
- $\varepsilon$ is a vector of unobservables;
- $\theta$ is a vector of parameters;
- $h(\cdot)$ is a function that maps $(X, \varepsilon, \theta)$ into the choice set $\mathcal{J}$.
Define the Conditional Choice Probability (CCP) function as the probability distribution of $Y$ conditional on $X$. For any pair $(j, x)$:
$$P(j \mid x) \equiv \Pr(Y = j \mid X = x)$$
Note that $P(j \mid x) = E(1\{Y = j\} \mid X = x)$.

Stylized description of model and estimation [2]
Suppose that $\varepsilon$ is independent of $X$ with CDF $F(\varepsilon; \sigma)$, where $\sigma$ represents the unknown parameters in this distribution function.
Model $Y = h(X, \varepsilon, \theta)$ and distribution $F(\varepsilon; \sigma)$ imply a CCP function:
$$P(j \mid x; \theta, \sigma) = \int 1\{h(x, \varepsilon, \theta) = j\} \, dF(\varepsilon; \sigma)$$
where $1\{\cdot\}$ is the indicator function.

Stylized description of model and estimation [3]
The researcher observes a random sample of $N$ agents, indexed by $n$, with information on $\{y_n, x_n : n = 1, 2, \ldots, N\}$. She is interested in the estimation of the parameters $(\theta, \sigma)$, collected below in a single vector denoted $\theta$.
The (conditional) log-likelihood function for this model and data is:
$$\ell_N(\theta) = \sum_{n=1}^{N} \ln \Pr(Y = y_n \mid X = x_n; \theta) = \sum_{n=1}^{N} \ln P(y_n \mid x_n; \theta)$$
The MLE is:
$$\hat{\theta} = \arg\max_{\theta} \ \ell_N(\theta)$$

Predictions / Counterfactual analysis
Given the estimated model, we can make predictions and carry out counterfactual analysis.
Let $(x^*, \theta^*)$ be a value of $(X, \theta)$ that differs in some of its components from the observed/estimated value $(x_n, \hat{\theta})$; e.g., a change in an attribute of a choice alternative; a change in agents' characteristics; shutting down the effect of a variable ($\theta_k = 0$); removing some choice alternatives; etc.
We can compare estimated and counterfactual CCPs, $P(j \mid x_n; \hat{\theta})$ and $P(j \mid x^*; \theta^*)$.
This is a helpful exercise for policy analysis or managerial decisions.

2. NONPARAMETRIC MODEL
Suppose that $X$ also has a discrete and finite support, $X \in \mathcal{X} \equiv \{1, 2, \ldots, M\}$. Consider a fully nonparametric specification of the CCPs.
Nonparametric model: The vector of parameters is the vector of $M(J+1)$ CCPs, $\mathbf{P} = \{P(j \mid x) : (j, x) \in \mathcal{J} \times \mathcal{X}\}$, with the only restriction that $\sum_{j \in \mathcal{J}} P(j \mid x) = 1$ for any value of $x$.
Let $N_{jx} \equiv \sum_{n=1}^{N} 1\{y_n = j, \ x_n = x\}$ be the number of observations in the sample where we observe $(x_n, y_n) = (x, j)$.

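NONPARAMETRIC MODEL [2]
The log-likelihood function is:
$$\ell_N(\mathbf{P}) = \sum_{n=1}^{N} \ln P(y_n \mid x_n) = \sum_{n=1}^{N} \left[ \sum_{(j,x) \in \mathcal{J} \times \mathcal{X}} 1\{y_n = j, \ x_n = x\} \ln P(j \mid x) \right] = \sum_{(j,x) \in \mathcal{J} \times \mathcal{X}} N_{jx} \ln P(j \mid x)$$
Taking into account the restrictions $\sum_{j=0}^{J} P(j \mid x) = 1$ for any value of $x$, and letting $\lambda_x$ be the Lagrange multiplier of the restriction for $x$, the f.o.c. for the Lagrange problem are:
$$\frac{N_{jx}}{P(j \mid x)} - \lambda_x = 0 \quad \text{or} \quad N_{jx} = \lambda_x \, P(j \mid x)$$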

NONPARAMETRIC MODEL [3]
Then, $\sum_{i \in \mathcal{J}} N_{ix} = \lambda_x \sum_{i \in \mathcal{J}} P(i \mid x) = \lambda_x$. Therefore, the MLE of the CCP is:
$$\hat{p}(j \mid x) = \frac{N_{jx}}{\sum_{i \in \mathcal{J}} N_{ix}}$$
The MLE of this nonparametric model is just the frequency estimator of the CCPs.
As usual, this MLE is consistent, asymptotically normal, and efficient given the minimal restrictions in this nonparametric model.
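
The frequency estimator can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_ccp` and the toy sample are hypothetical, and the support of $x$ is coded as $0, \ldots, M-1$ (rather than $1, \ldots, M$) to match array indexing.

```python
import numpy as np

def frequency_ccp(y, x, n_alternatives, n_x_values):
    """Frequency (nonparametric ML) estimator of the CCPs.

    Returns an (M, J+1) array whose [x, j] entry is N_jx / sum_i N_ix.
    Cells of x with no observations are left as NaN.
    """
    counts = np.zeros((n_x_values, n_alternatives))
    np.add.at(counts, (x, y), 1.0)            # counts[x, j] = N_jx
    totals = counts.sum(axis=1, keepdims=True)  # sum_i N_ix
    p_hat = np.full_like(counts, np.nan)
    np.divide(counts, totals, out=p_hat, where=totals > 0)
    return p_hat

# Hypothetical toy sample: 8 agents, 3 alternatives, 2 values of x.
y = np.array([0, 1, 1, 2, 0, 2, 2, 1])
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p_hat = frequency_ccp(y, x, n_alternatives=3, n_x_values=2)
print(p_hat)
```

Each row of `p_hat` sums to one by construction, which is the adding-up restriction imposed in the Lagrange problem.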

Some Limitations of this Nonparametric model
[1] When $x$ is continuous: curse of dimensionality in the speed of asymptotic convergence.
[2] Suppose that some of the $X$ variables are characteristics of the choice alternative (not of the individuals): $\{x_j : j = 1, 2, \ldots, J\}$. For instance, the price of product $j$.
Suppose that all the individuals face the same choice set (with the same values of $X$), such that the variables $X$ do not vary over $n$.
Then the nonparametric CCPs $P(j \mid x)$ do not provide any information about how $P(j \mid x)$ changes when $X_j$ changes (keeping $X_i : i \neq j$ constant) or when $X_i : i \neq j$ changes; e.g., they provide no information about demand price elasticities.

3. RANDOM UTILITY MODELS
An agent should choose one alternative from a choice set with $J + 1$ mutually exclusive alternatives, $\mathcal{J} \equiv \{0, 1, \ldots, J\}$. We use $n$ to index agents, $i$ or $j$ to index alternatives, and $k$ to index explanatory variables.
Let $Y_n \in \mathcal{J}$ be the random variable that represents the choice of agent $n$.
Assumption of Utility Maximization: The agent makes this choice to maximize her payoff or utility.
$$Y_n = \arg\max_{j \in \mathcal{J}} \ U_n(j)$$
where $U_n(j)$ is the utility or payoff for agent $n$ of choosing alternative $j$.

Principle of Revealed Preference
Suppose that we observe an agent $n$ making choices under different choice sets $\mathcal{J}$: i.e., $y_n(\mathcal{J}_1), y_n(\mathcal{J}_2), \ldots, y_n(\mathcal{J}_T)$.
Under the Assumption of Utility Maximization, the agent's choices reveal information on her preferences.
It turns out that, under some assumptions on the structure of $U_n(j)$, we can also identify how $U_n(j)$ varies with characteristics $X_j$ (e.g., price) without the need for multiple choice sets $\mathcal{J}_t$.
This is a powerful principle in Econometrics and it is behind the estimation of demand or supply functions.

RANDOM UTILITY MODELS (2)
In a Random Utility Model (RUM) the specification of $U_n(j)$ is:
$$U_n(j) = u_n(j, X_{jn}) + \varepsilon_{jn}$$
where:
- $X_{jn}$ is a $K \times 1$ vector of characteristics of agent $n$ and/or choice alternative $j$ that are observable to the researcher;
- $u_n(\cdot)$ is a real-valued function;
- $\varepsilon_n = \{\varepsilon_{0n}, \varepsilon_{1n}, \ldots, \varepsilon_{Jn}\}$ represents variables that are unobservable to the researcher but observable to the agent, and that therefore affect her choice.

RANDOM UTILITY MODELS (3)
A common specification of a RUM is:
$$U_n(j) = \tilde{X}_j \beta_n + W_{jn} \gamma_n + Z_n \alpha_j + \varepsilon_{jn}$$
- $\tilde{X}_j$ is a $1 \times K_x$ vector of characteristics of alternative $j$ (e.g., price);
- $Z_n$ is a $1 \times K_z$ vector of observable attributes of the agent (e.g., income);
- $W_{jn}$ is a $1 \times K_w$ vector of characteristics of alternative $j$ that vary across individuals (e.g., commuting time to work using transportation mode $j$);
- $\alpha_j$ is a $K_z \times 1$ vector of parameters;
- $\beta_n$ and $\gamma_n$ are $K_x \times 1$ and $K_w \times 1$ vectors, respectively, that represent the marginal utility of each product attribute.

RANDOM COEFFICIENTS RUMs
We can distinguish two types of models according to the specification of the coefficients $\beta_n$ and $\gamma_n$.
Models without random coefficients. Either $(\beta_n, \gamma_n)$ are constant parameters (i.e., $\beta_n = \beta$ and $\gamma_n = \gamma$ for any $n$) or they are deterministic functions of the observable agent attributes $Z_n$. In the latter case, the terms $\tilde{X}_j \beta_n + W_{jn} \gamma_n$ are equivalent to $\tilde{W}_{jn} \tilde{\gamma}$, where $\tilde{W}_{jn}$ includes products of characteristics $\tilde{X}_j$ and attributes $Z_n$.
Models with random coefficients. $\beta_n$ and/or $\gamma_n$ depend on random variables that are unobservable to the researcher.

EXAMPLE 1: Choice of Transportation Mode to Work
$Y \in \{$Walking, Bike, Bus, Subway, Car, Bundles of the previous$\}$
$\tilde{X}_j$ = (Price per mile)
$W_{jn}$ = (Commuting time using mode $j$)
$Z_n$ = (Income, Age, Gender, etc.)

EXAMPLE 2: Choice (Demand) of Differentiated Product (Laptops)
$Y \in \{$every laptop product available in the market$\}$
$\tilde{X}_j$ = (Price, Brand, CPU speed, Screen size, Weight, Color, RAM, HD size, etc.)
$\beta_n$ contains the marginal utilities of each product attribute (for individual $n$);
$Z_n$ = (Income, Age, Gender, etc.)
$W_{jn}$ = Indicator that $n$ has bought brand $j$ before.

SOME NORMALIZATIONS
For constants $a_n$ and $b_n > 0$, the function $a_n + b_n U_n(j)$ is a positive affine transformation of utility $U_n(j)$.
Any positive affine transformation of the utility function generates the same (utility maximizing) behavior for agent $n$.
Therefore, we need to make some normalization assumptions on the parameters of the utility function $U_n(j) = \tilde{X}_j \beta_n + W_{jn} \gamma_n + Z_n \alpha_j + \varepsilon_{jn}$ in order to identify them.

SOME NORMALIZATIONS [2]
A necessary condition to identify a parameter is that a marginal change in the parameter implies a change in the optimal choice of some agents in the population (such that some CCPs change).
Another necessary condition is that it is not possible to completely offset the effect on all CCPs of a marginal change in the parameter by making a marginal change in another parameter.
Consider a model with 3 choice alternatives. Then $\{Y_n = 2\}$ iff:
$$\varepsilon_{0n} - \varepsilon_{2n} \leq (\tilde{X}_2 - \tilde{X}_0) \beta_n + (W_{2n} - W_{0n}) \gamma_n + Z_n (\alpha_2 - \alpha_0)$$
$$\varepsilon_{1n} - \varepsilon_{2n} \leq (\tilde{X}_2 - \tilde{X}_1) \beta_n + (W_{2n} - W_{1n}) \gamma_n + Z_n (\alpha_2 - \alpha_1)$$

SOME STANDARD NORMALIZATIONS [3]
Some standard normalization assumptions are:
(1) No constant terms (i.e., no $a_n$ that does not depend on $j$): no constant term in $\tilde{X}_j \beta_n$ or $W_{jn} \gamma_n$;
(2) If the model includes $\tilde{X}_j$, then $Z_n$ does not include a constant term.
(3) $\alpha_0 = 0$. [If all the $\alpha_j$'s are additively transformed by the same constant, the optimal choice does not change].
(4) $Var(\varepsilon_{1n} - \varepsilon_{0n}) = 1$. [If we multiply all the differences $\varepsilon_{jn} - \varepsilon_{0n}$ by the same positive constant, the optimal choice does not change].

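Choice Probabilities in RUM
The CCP for alternative $j$ is (omitting the agent subindex $n$):
$$P(j \mid x) = \Pr\left( u(j,x) + \varepsilon_j \geq u(i,x) + \varepsilon_i \ \text{for any } i \neq j \right)$$
$$= \Pr\left( \varepsilon_i \leq \varepsilon_j + u(j,x) - u(i,x) \ \text{for any } i \neq j \right)$$
$$= \int_{-\infty}^{+\infty} F_{\varepsilon_{-j} \mid \varepsilon_j}\left( \varepsilon_j + u(j,x) - u(i,x) \ \text{for any } i \neq j \right) f_{\varepsilon_j}(\varepsilon_j) \, d\varepsilon_j$$
where $f_{\varepsilon_j}$ is the marginal density of $\varepsilon_j$, and $F_{\varepsilon_{-j} \mid \varepsilon_j}$ is the CDF of $\varepsilon_{-j} \equiv \{\varepsilon_i : i \neq j\}$ conditional on $\varepsilon_j$.
This is an integral of dimension $J$. It has a closed-form expression only for some specifications of the CDF $F(\varepsilon)$.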

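Maximum Likelihood estimation
Let $\theta$ be the vector of parameters of the model, and $P(j \mid x; \theta)$ the CCPs according to the model. The log-likelihood function is:
$$\ell_N(\theta) = \sum_{n=1}^{N} \ln \Pr(Y = y_n \mid X = x_n; \theta) = \sum_{n=1}^{N} \sum_{j \in \mathcal{J}} 1\{y_n = j\} \ln P(j \mid x_n; \theta)$$
The f.o.c. or likelihood equations are:
$$\frac{\partial \ell_N(\theta)}{\partial \theta} = \sum_{n=1}^{N} \sum_{j \in \mathcal{J}} \frac{\partial P(j \mid x_n; \theta)}{\partial \theta} \, \frac{1\{y_n = j\}}{P(j \mid x_n; \theta)} = 0$$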

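Maximum Likelihood estimation [2]
Now, for any $x_n$, $\sum_{j \in \mathcal{J}} P(j \mid x_n; \theta) = 1$, and therefore $\sum_{j \in \mathcal{J}} \partial P(j \mid x_n; \theta) / \partial \theta = 0$. This implies that we can write the likelihood equations as:
$$\sum_{n=1}^{N} \sum_{j \in \mathcal{J}} \frac{\partial P(j \mid x_n; \theta)}{\partial \theta} \left[ \frac{1\{y_n = j\}}{P(j \mid x_n; \theta)} - 1 \right] = 0$$
Or:
$$\sum_{n=1}^{N} \sum_{j \in \mathcal{J}} \frac{\partial P(j \mid x_n; \theta)}{\partial \theta} \, \frac{1}{P(j \mid x_n; \theta)} \left[ 1\{y_n = j\} - P(j \mid x_n; \theta) \right] = 0$$
This expression of the MLE has a Method of Moments interpretation.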

MLE as Method of Moments Estimator of Regression-like model
By definition of CCPs, we have that:
$$P(j \mid x_n) = \Pr(y_n = j \mid x_n) = E(1\{y_n = j\} \mid x_n)$$
Therefore,
$$1\{y_n = j\} = P(j \mid x_n) + v_{jn}$$
where, by construction, the error term $v_{jn}$ is such that $E(v_{jn} \mid x_n) = 0$.
This system of equations can be seen as a regression-like representation of a multinomial choice model.

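MLE as MME of Regression-like model [2]
At the true parameters, the following moment conditions should hold for any $j$ and any function $h_j(x_n)$:
$$E\left( h_j(x_n) \left[ 1\{y_n = j\} - P(j \mid x_n; \theta) \right] \right) = 0$$
The MME is based on the sample counterpart of the population moment conditions:
$$\frac{1}{N} \sum_{n=1}^{N} h_j(x_n) \left[ 1\{y_n = j\} - P(j \mid x_n; \theta) \right] = 0$$
The MLE provides the optimal $K$ moment conditions to estimate the vector of $K$ parameters $\theta$:
$$\frac{1}{N} \sum_{n=1}^{N} \sum_{j \in \mathcal{J}} \frac{\partial P(j \mid x_n; \theta)}{\partial \theta} \, \frac{1}{P(j \mid x_n; \theta)} \left[ 1\{y_n = j\} - P(j \mid x_n; \theta) \right] = 0$$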

4. MULTINOMIAL LOGIT MODEL [without random coefficients]
We have $U_n(j) = X_{jn} \beta + \varepsilon_{jn}$, where the $\varepsilon_{jn}$ are i.i.d. over $(n, j)$ with a Type 1 Extreme Value distribution.
The Type 1 Extreme Value distribution is also called the Gumbel distribution.
For any $j$, the CDF is $F(\varepsilon_j) = \exp\{-\exp\{-\varepsilon_j\}\}$ and the PDF is $f(\varepsilon_j) = \exp\{-\varepsilon_j - \exp\{-\varepsilon_j\}\}$. The PDF is asymmetric.
The difference of two independent Type 1 Extreme Value variables has a Logistic distribution with CDF:
$$F(\varepsilon_j - \varepsilon_i) = \frac{\exp\{\varepsilon_j - \varepsilon_i\}}{1 + \exp\{\varepsilon_j - \varepsilon_i\}}$$

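Under this assumption on the distribution of $\varepsilon$, we have the following form for the CCPs:
$$P(j) = \int_{-\infty}^{+\infty} F_{\varepsilon_{-j} \mid \varepsilon_j}\left( \varepsilon_j + u_j - u_i \ \text{for any } i \neq j \right) f_{\varepsilon_j}(\varepsilon_j) \, d\varepsilon_j$$
$$= \int_{-\infty}^{+\infty} f(\varepsilon_j) \left[ \prod_{i \neq j} F(\varepsilon_j + u_j - u_i) \right] d\varepsilon_j$$
$$= \int_{-\infty}^{+\infty} \exp\{-\varepsilon_j\} \prod_{i \in \mathcal{J}} \exp\left\{ -\exp\{-\varepsilon_j - u_j + u_i\} \right\} d\varepsilon_j$$
$$= \int_{-\infty}^{+\infty} \exp\{-\varepsilon_j\} \exp\left\{ -\exp\{-\varepsilon_j - u_j\} \sum_{i \in \mathcal{J}} \exp\{u_i\} \right\} d\varepsilon_j$$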

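Define $S \equiv \sum_{i \in \mathcal{J}} \exp\{u_i\}$, and make the change of variable $v = \varepsilon_j + u_j - \ln S$:
$$P(j) = \int_{-\infty}^{+\infty} \exp\{-v + u_j - \ln S\} \exp\{-\exp\{-v\}\} \, dv$$
$$= \exp\{u_j - \ln S\} \int_{-\infty}^{+\infty} \exp\{-v\} \exp\{-\exp\{-v\}\} \, dv$$
$$= \frac{\exp\{u_j\}}{\exp\{\ln S\}} = \frac{\exp\{u_j\}}{\sum_{k=0}^{J} \exp\{u_k\}}$$
The remaining integral is equal to 1 because its integrand is the Type 1 Extreme Value density.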
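
As a numerical check of this closed form, the sketch below compares the logit formula with choice frequencies simulated from the random utility model with Gumbel errors. The utility values `u`, the seed, and the number of draws are arbitrary illustrative choices.

```python
import numpy as np

def logit_ccp(u):
    """Closed-form logit CCPs, exp(u_j) / sum_k exp(u_k)."""
    u = np.asarray(u, dtype=float)
    z = np.exp(u - u.max())  # subtracting the max avoids overflow
    return z / z.sum()

rng = np.random.default_rng(0)
u = np.array([0.0, 1.0, -0.5])  # hypothetical deterministic utilities

# Simulate the RUM directly: i.i.d. Type 1 Extreme Value (Gumbel) errors.
eps = rng.gumbel(size=(200_000, u.size))
choices = np.argmax(u + eps, axis=1)
simulated = np.bincount(choices, minlength=u.size) / len(choices)

closed_form = logit_ccp(u)
print("closed form:", closed_form)
print("simulated:  ", simulated)
```

The two vectors should agree up to simulation error, which shrinks at rate $1/\sqrt{R}$ in the number of draws.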

ML ESTIMATION OF LOGIT MODEL
The closed-form expression for the CCPs is very convenient for the estimation of the model.
Consider the logit model $U_n(j) = X_{jn} \beta + \varepsilon_{jn}$, and the random sample $\{y_n, x_n : n = 1, 2, \ldots, N\}$. The log-likelihood function is:
$$\ell_N(\beta) = \sum_{n=1}^{N} \ln \Pr(Y = y_n \mid X = x_n; \beta) = \sum_{n=1}^{N} \sum_{j \in \mathcal{J}} 1\{y_n = j\} \ln \left[ \frac{\exp\{x_{jn} \beta\}}{\sum_{i \in \mathcal{J}} \exp\{x_{in} \beta\}} \right]$$
This log-likelihood function is globally concave in $\beta$. Furthermore, the gradient and Hessian of this function have simple closed-form expressions. Therefore, the numerical computation of the MLE can be implemented in a simple way using Newton's method.

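ML ESTIMATION OF LOGIT MODEL (2)
You can verify that in a Logit:
$$\frac{\partial P(j)}{\partial u_j} = P(j) \left[ 1 - P(j) \right]$$
Taking this into account, we can show that:
$$\frac{\partial \ln P(j \mid x_n; \beta)}{\partial \beta} = \frac{\partial P(j) / \partial \beta}{P(j)} = x_{jn} - m(x_n; \beta)$$
where $m(x_n; \beta) \equiv \sum_{i \in \mathcal{J}} x_{in} P(i \mid x_n; \beta)$. And the likelihood equations are:
$$\frac{1}{N} \sum_{n=1}^{N} \left( \sum_{j=0}^{J} \left[ x_{jn} - m(x_n; \beta) \right] \left[ 1\{y_n = j\} - P(j \mid x_n; \beta) \right] \right) = 0$$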

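But it is clear that $\sum_{j \in \mathcal{J}} m(x_n; \beta) \left[ 1\{y_n = j\} - P(j \mid x_n; \beta) \right] = 0$ because $\sum_{j \in \mathcal{J}} 1\{y_n = j\} = \sum_{j \in \mathcal{J}} P(j \mid x_n; \beta) = 1$.
Therefore, the likelihood equations simplify to:
$$\frac{1}{N} \sum_{n=1}^{N} \left( \sum_{j=0}^{J} x_{jn} \left[ 1\{y_n = j\} - P(j \mid x_n; \beta) \right] \right) = 0$$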
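
A minimal sketch of Newton's method for this logit MLE follows, using the score $\sum_n \sum_j x_{jn} [1\{y_n = j\} - P(j \mid x_n; \beta)]$ and the Hessian $-\sum_n \sum_j P(j \mid x_n; \beta) [x_{jn} - m(x_n; \beta)]' [x_{jn} - m(x_n; \beta)]$. The function names, the array layout (`X` has shape `(N, J+1, K)`), and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def logit_probs(X, beta):
    """CCPs for every agent; X has shape (N, J+1, K)."""
    u = X @ beta
    u = u - u.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)

def logit_score(X, y, beta):
    """Gradient: sum_n sum_j x_jn [1{y_n = j} - P(j | x_n; beta)]."""
    N, J1, _ = X.shape
    D = np.zeros((N, J1))
    D[np.arange(N), y] = 1.0
    return np.einsum("njk,nj->k", X, D - logit_probs(X, beta))

def logit_hessian(X, beta):
    """Hessian: -sum_n sum_j P_jn (x_jn - m_n)'(x_jn - m_n)."""
    P = logit_probs(X, beta)
    m = np.einsum("njk,nj->nk", X, P)  # m(x_n; beta)
    Xc = X - m[:, None, :]
    return -np.einsum("njk,nj,njl->kl", Xc, P, Xc)

def logit_mle_newton(X, y, max_iter=50, tol=1e-10):
    beta = np.zeros(X.shape[2])
    for _ in range(max_iter):
        step = np.linalg.solve(logit_hessian(X, beta), logit_score(X, y, beta))
        beta = beta - step  # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data under a hypothetical true beta.
rng = np.random.default_rng(0)
N, J1 = 5000, 3
beta_true = np.array([1.0, -0.5])
X = rng.normal(size=(N, J1, beta_true.size))
y = np.argmax(X @ beta_true + rng.gumbel(size=(N, J1)), axis=1)
beta_hat = logit_mle_newton(X, y)
print(beta_hat)
```

Because the log-likelihood is globally concave, starting the iterations at zero is usually sufficient; at the solution the score, which is the set of moment conditions above, is numerically zero.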

Luce Theorem (1959)
Consider a RUM with utilities $u_j + \varepsilon_j$. Consider the following two axioms.
Axiom 1. Let $i, j$ be two choice alternatives, and let $\mathcal{J}$ and $\mathcal{J}'$ be two different choice sets that include $i$ and $j$. Then,
$$\frac{P(j \mid \mathcal{J})}{P(i \mid \mathcal{J})} = \frac{P(j \mid \mathcal{J}')}{P(i \mid \mathcal{J}')}$$
Axiom 2. For any $\mathcal{J}$ and any $j \in \mathcal{J}$, $P(j \mid \mathcal{J}) > 0$.
Under Axioms 1 and 2, the form of the CCPs is:
$$P(j \mid \mathcal{J}) = \frac{w(u_j)}{\sum_{i \in \mathcal{J}} w(u_i)}$$
with $w(\cdot) > 0$, strictly increasing, and $w(u_j) / w(u_i) = w(u_j - u_i)$.

Independence of Irrelevant Alternatives (IIA)
The logit model imposes the restriction that the ratio between the probabilities of two alternatives, say $j$ and $i$, depends ONLY on the utilities of these alternatives, and not on the utilities of other alternatives:
$$\frac{P(j)}{P(i)} = \frac{\exp\{u_j\}}{\exp\{u_i\}}$$
Therefore, if we change the choice set $\mathcal{J}$ by adding and/or removing alternatives, the ratios between probabilities should not change. For any two choice sets $\mathcal{J}$ and $\mathcal{J}'$ (that include $j$ and $i$ as alternatives), we have that:
$$\frac{P(j \mid \mathcal{J})}{P(i \mid \mathcal{J})} = \frac{P(j \mid \mathcal{J}')}{P(i \mid \mathcal{J}')}$$
This property, though reasonable in a deterministic environment, generates unrealistic predictions in a RUM.

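IIA in Deterministic and in Stochastic Decision Environments
Let $a$, $b$, $c$ be three choice alternatives.
Deterministic choice.
- Suppose $\mathcal{J} = \{a, b\}$. $P(a \mid \mathcal{J}) / P(b \mid \mathcal{J})$ is either $\infty$ (if $a \succ b$), or 0 (if $b \succ a$), or 1 (if $a \sim b$).
- Suppose $\mathcal{J}' = \{a, b, c\}$ with $a \sim c$. It seems reasonable that $P(a) / P(c) = 1$. This is consistent with IIA: if $a \succ b$, $P(a \mid \mathcal{J}') / P(b \mid \mathcal{J}') = 0.5 / 0 = \infty$; if $b \succ a$, $P(a \mid \mathcal{J}') / P(b \mid \mathcal{J}') = 0$; and if $a \sim b$, $P(a \mid \mathcal{J}') / P(b \mid \mathcal{J}') = (1/3) / (1/3) = 1$.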

Stochastic choice.
- Suppose $\mathcal{J} = \{a, b\}$ with $P(a \mid \mathcal{J}) = 0.9999$ and $P(b \mid \mathcal{J}) = 0.0001$, such that $P(a \mid \mathcal{J}) / P(b \mid \mathcal{J}) = 9999$ [very close to the deterministic case].
- Suppose $\mathcal{J}' = \{a, b, c\}$ with $a \sim c$ [identical choice alternatives]. It seems reasonable that $P(a \mid \mathcal{J}') = P(c \mid \mathcal{J}') = 0.9999 / 2$, and $P(b \mid \mathcal{J}') = 0.0001$. This is NOT consistent with IIA because now $P(a \mid \mathcal{J}') / P(b \mid \mathcal{J}') = 9999 / 2 \neq 9999 = P(a \mid \mathcal{J}) / P(b \mid \mathcal{J})$.

IIA: Example
Consider consumers deciding which car model to purchase. The set of available models in year 2014 (i.e., choice set $\mathcal{J}_{2014}$) includes model "Lux", a luxury car, and model "Econ", a very modest and inexpensive car.
In year 2014, their market shares are:
$$P(Lux \mid \mathcal{J}_{2014}) = 0.10; \quad P(Econ \mid \mathcal{J}_{2014}) = 0.40; \quad \frac{P(Lux \mid \mathcal{J}_{2014})}{P(Econ \mid \mathcal{J}_{2014})} = \frac{1}{4}$$
In 2015, the new luxury model "NewLux" (very similar to Lux) appears in the market. The logit model predicts that:
$$P(Lux \mid \mathcal{J}_{2015}) = 0.10 \, (1 - P_{NewLux})$$
$$P(Econ \mid \mathcal{J}_{2015}) = 0.40 \, (1 - P_{NewLux})$$

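IIA: Example (2)
For instance, if $P_{NewLux} = 10\%$, then $P(Lux \mid \mathcal{J}_{2015}) = 9\%$ and $P(Econ \mid \mathcal{J}_{2015}) = 36\%$, which seems unrealistic: since NewLux is very similar to Lux, we would expect it to take market share mainly from Lux rather than proportionally from both models.
The IIA is an implication of the Logit property that the differences $\varepsilon_j - \varepsilon_i$ are i.i.d. across any pair of choices.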
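
The arithmetic of this example can be reproduced with the logit formula. In the sketch below, the alternative labelled "Other" (the remaining 50% of the 2014 market) and the utility assigned to NewLux are assumptions added only so that the probabilities sum to one and $P_{NewLux} = 10\%$.

```python
import numpy as np

def logit_ccp(u):
    """Logit CCPs, exp(u_j) / sum_k exp(u_k)."""
    z = np.exp(np.asarray(u, dtype=float))
    return z / z.sum()

# 2014: utilities chosen so that the logit shares match the example.
names_2014 = ["Lux", "Econ", "Other"]
u_2014 = np.log([0.10, 0.40, 0.50])
p_2014 = dict(zip(names_2014, logit_ccp(u_2014)))

# 2015: add NewLux with a utility that gives it a 10% share.
u_newlux = np.log(0.10 / 0.90)
p_2015 = dict(zip(names_2014 + ["NewLux"],
                  logit_ccp(np.append(u_2014, u_newlux))))

print(p_2015)
print("Lux/Econ ratio 2014:", p_2014["Lux"] / p_2014["Econ"])
print("Lux/Econ ratio 2015:", p_2015["Lux"] / p_2015["Econ"])
```

Every pre-existing alternative loses the same proportion of its share (10%), so the Lux/Econ ratio stays at 1/4 regardless of how similar NewLux is to Lux.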

IIA and Average Partial Effects
In most applications we are interested in the estimation of Average Partial Effects (APE).
In a discrete choice model, define $APE_{k,j}(x)$ as the APE of variable $k$ on the probability of choice alternative $j$ when the explanatory variables are $x$:
$$APE_{k,j}(x) \equiv \frac{\partial P(j \mid x)}{\partial x_k}$$
Does the MNL impose a restrictive / unrealistic structure on these APEs?
Yes, especially when we are interested in the APE of changes in product characteristics, $X_j$.

IIA and Average Partial Effects [2]
Remember that in the MNL:
$$P_{jn} = \frac{\exp\{X_j \beta + Z_n \alpha_j\}}{\sum_{i=0}^{J} \exp\{X_i \beta + Z_n \alpha_i\}}$$
Consider the APE on $P_j$ of a change in $X_i$ for $i \neq j$ (i.e., the effect on the demand of product $j$ of a change in the price of product $i$):
$$\frac{\partial P_{jn}}{\partial X_i} = -\beta \, P_{jn} \, P_{in}$$
The effect is proportional to $P_{jn}$ and $P_{in}$. Two products $j$ with the same $P_j$ are affected in exactly the same way by an increase in the price of product $i$. This is very restrictive.

IIA and Average Partial Effects [3]
Consider now the partial effect of a change in a characteristic of individual $n$:
$$\frac{\partial P_{jn}}{\partial Z_n} = P_{jn} \left( \alpha_j - \sum_{i=0}^{J} \alpha_i P_{in} \right)$$
This is not a particularly restrictive APE. As in a binary choice model, this APE goes to zero when $P_{jn} \to 0$ and when $P_{jn} \to 1$. It depends on the value of $\alpha_j$ relative to the other $\alpha$'s, and these parameters are unrestricted.

Solutions to Independence of Irrelevant Alternatives
Different models have been proposed to deal with this limitation of the Logit model:
(1) Multinomial probit;
(2) Nested Logit;
(3) Random coefficients logit.

5. NESTED LOGIT MODEL
Suppose that the set $\mathcal{J}$ of choice alternatives can be partitioned into $G$ (mutually exclusive) groups of alternatives, which we index by $g$. Let $\mathcal{J}_g$ be the set of alternatives in group $g$, such that:
$$\mathcal{J} = \bigcup_{g=1}^{G} \mathcal{J}_g$$
The idea is that alternatives within a group share some unobserved features that make them closer substitutes than alternatives in different groups.
Suppose that the random variables $\varepsilon_j$ of the RUM have the following structure:
$$\varepsilon_j = \varepsilon^{(1)}_g + \lambda_g \, \varepsilon^{(2)}_{j \mid g}$$
- $\varepsilon^{(1)}_1, \varepsilon^{(1)}_2, \ldots, \varepsilon^{(1)}_G$ are i.i.d. EV type 1.

- For any $g$, $\{\varepsilon^{(2)}_{j \mid g} : j \in \mathcal{J}_g\}$ are i.i.d. EV type 1.
- The $\varepsilon^{(1)}$'s and $\varepsilon^{(2)}$'s are independent.

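NESTED LOGIT MODEL [2]
This model has the following CCPs:
$$P_j = P^{(1)}_g \, P^{(2)}_{j \mid g}$$
with
$$P^{(2)}_{j \mid g} = \frac{\exp\{u_j / \lambda_g\}}{\sum_{i \in \mathcal{J}_g} \exp\{u_i / \lambda_g\}}; \qquad P^{(1)}_g = \frac{\exp\{v_g\}}{\sum_{g'=1}^{G} \exp\{v_{g'}\}}$$
and $v_g$ is $E\left( \max_{j \in \mathcal{J}_g} \{u_j + \lambda_g \varepsilon^{(2)}_{j \mid g}\} \right)$, which has the following form:
$$v_g = \lambda_g \ln \left( \sum_{j \in \mathcal{J}_g} \exp\left\{ \frac{u_j}{\lambda_g} \right\} \right)$$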

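NESTED LOGIT MODEL [3]
A different way to represent the Nested Logit is the following. Consider the RUM $Y = \arg\max_{j \in \mathcal{J}} \{u_j + \varepsilon_j\}$ with the $G$ groups and where $\varepsilon = (\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_J)$ has a Generalized Extreme Value (GEV) distribution:
$$F(\varepsilon) = \exp\left\{ -\sum_{g=1}^{G} \left[ \sum_{j \in \mathcal{J}_g} \exp\left( -\frac{\varepsilon_j}{\lambda_g} \right) \right]^{\lambda_g} \right\}$$
where $\lambda_1, \lambda_2, \ldots, \lambda_G$ are positive parameters.
Then, the CCPs are:
$$P_j = P^{(1)}_g \, P^{(2)}_{j \mid g}$$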

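$$P^{(2)}_{j \mid g} = \frac{\exp\{u_j / \lambda_g\}}{\sum_{i \in \mathcal{J}_g} \exp\{u_i / \lambda_g\}}; \qquad P^{(1)}_g = \frac{\exp\{v_g\}}{\sum_{g'=1}^{G} \exp\{v_{g'}\}}$$
with $v_g = \lambda_g \ln \left( \sum_{j \in \mathcal{J}_g} \exp\{u_j / \lambda_g\} \right)$.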
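
The nested logit CCPs can be computed directly from these formulas. The sketch below is illustrative only: the function name, the coding of groups as integers $0, \ldots, G-1$, the symbol `lam` for the group scale parameters, and the numerical values are all assumptions.

```python
import numpy as np

def nested_logit_ccp(u, groups, lam):
    """Nested logit CCPs P_j = P_g * P_{j|g}.

    u      : (J+1,) deterministic utilities
    groups : (J+1,) integer group label (0..G-1) of each alternative
    lam    : (G,) positive group scale parameters
    """
    u = np.asarray(u, dtype=float)
    groups = np.asarray(groups)
    lam = np.asarray(lam, dtype=float)
    within = np.zeros_like(u)
    v = np.zeros(lam.size)  # inclusive values v_g
    for g in range(lam.size):
        idx = groups == g
        e = np.exp(u[idx] / lam[g])
        within[idx] = e / e.sum()        # P_{j|g}
        v[g] = lam[g] * np.log(e.sum())  # v_g
    p_group = np.exp(v) / np.exp(v).sum()  # P_g
    return p_group[groups] * within

u = np.array([0.5, 0.4, 0.0])
groups = np.array([0, 0, 1])  # alternatives 0 and 1 share a group
p_nested = nested_logit_ccp(u, groups, lam=[0.5, 1.0])
p_logit = nested_logit_ccp(u, groups, lam=[1.0, 1.0])
print(p_nested, p_logit)
```

When every scale parameter equals 1, the inclusive values reduce to $\ln \sum_{j \in \mathcal{J}_g} \exp\{u_j\}$ and the CCPs coincide with those of the standard logit.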

NESTED LOGIT MODEL [4]
The NL has an interpretation as a sequential decision model. Let $Y^{(1)}_n \in \{1, 2, \ldots, G\}$ represent agent $n$'s choice of group, and let $Y^{(2)}_n$ represent the choice of a specific alternative. The model implies that:
$$\Pr(Y^{(1)}_n = g \mid X_n) = P^{(1)}_g(X_n)$$
and
$$\Pr(Y^{(2)}_n = j \mid X_n, Y^{(1)}_n = g) = P^{(2)}_{j \mid g}(X_n)$$
Therefore, the log-likelihood function of the model,
$$l(\theta) = \sum_{n=1}^{N} \ln \Pr(Y_n \mid X_n; \theta)$$

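can be written as the sum of two log-likelihoods, $l(\theta) = l^{(1)}(\theta) + l^{(2)}(\theta)$:
$$l(\theta) = \sum_{n=1}^{N} \sum_{g=1}^{G} 1\{y^{(1)}_n = g\} \ln P^{(1)}_g(X_n; \theta) + \sum_{n=1}^{N} \sum_{j \in \mathcal{J}_{y^{(1)}_n}} 1\{y^{(2)}_n = j\} \ln P^{(2)}_{j \mid y^{(1)}_n}(X_n; \theta)$$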

NESTED LOGIT MODEL [5]
The Nested Logit maintains the property of IIA for alternatives within the same group but not for alternatives in different groups.
In the example of the demand for cars: the new car will have a stronger substitution effect within its own group, e.g., luxury cars.

TWO-STEP ESTIMATION OF NL
Note that $l(\theta) = l^{(1)}(\theta) + l^{(2)}(\theta)$, where:
- $l^{(1)}(\theta)$ is the between-group likelihood function for the choice variable $Y^{(1)}_n$ conditional on $X_n$;
- $l^{(2)}(\theta)$ is the within-group likelihood function for the choice variable $Y^{(2)}_n$ conditional on $X_n$ and $Y^{(1)}_n$.
We can estimate one combination of the parameters in $\theta$ by maximizing $l^{(2)}(\theta)$, and another combination of parameters by maximizing $l^{(1)}(\theta)$.
This two-step procedure is not statistically efficient, but it is computationally very convenient because each step consists of a standard MNL estimation (i.e., a globally concave likelihood function).

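TWO-STEP ESTIMATION OF NL [2]
Step 1: Maximization of the within-group likelihood function $l^{(2)}(\theta)$ with probabilities:
$$P_{j \mid g, n} = \frac{\exp\{X_j \beta_g + Z_n \alpha_{j,g}\}}{\sum_{i \in \mathcal{J}_g} \exp\{X_i \beta_g + Z_n \alpha_{i,g}\}}$$
where the estimated parameters are $\beta_g \equiv \beta / \lambda_g$ and $\alpha_{j,g} \equiv \alpha_j / \lambda_g$ (where one of the $\alpha$'s within each group is normalized to zero).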

TWO-STEP ESTIMATION OF NL [3]
Step 2: Construct the estimated inclusive values:
$$\hat{I}_{gn} = \ln \left( \sum_{j \in \mathcal{J}_g} \exp\{X_j \hat{\beta}_g + Z_n \hat{\alpha}_{j,g}\} \right)$$
and maximize the between-group likelihood function $l^{(1)}(\theta)$ with probabilities:
$$P_{g,n} = \frac{\exp\{\lambda_g \hat{I}_{gn}\}}{\sum_{g'=1}^{G} \exp\{\lambda_{g'} \hat{I}_{g'n}\}}$$
- The estimated parameters are the $\lambda_g$, with one of these parameters normalized to 1.

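EFFICIENT ESTIMATION OF NL
Given this consistent two-step estimator, we can construct an efficient estimator, and a valid variance-covariance matrix, by doing one Newton or BHHH iteration in the maximization of the full likelihood function:
$$\hat{\theta}_{eff} = \hat{\theta}_{2step} - \left[ \frac{\partial^2 l(\hat{\theta}_{2step})}{\partial \theta \, \partial \theta'} \right]^{-1} \frac{\partial l(\hat{\theta}_{2step})}{\partial \theta}$$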

6. RANDOM COEFFICIENTS LOGIT (MIXED LOGIT)
In the standard RC Logit we have that:
$$U_{jn} = X_{jn} \beta_n + \varepsilon_{jn}$$
where:
- $\varepsilon_{jn}$ are i.i.d. over $(n, j)$ Type 1 Extreme Value;
- $\beta_n$ is i.i.d. over $n$ with distribution $N(b, \Sigma)$;
- $\varepsilon_n$ and $\beta_n$ are independent.
We can also represent $\beta_n$ as $\beta_n = b + W v_n$, where $W$ is a $K \times K$ lower triangular matrix that is the Cholesky decomposition of $\Sigma$ (i.e., $W W' = \Sigma$) and $v_n = (v_{1n}, v_{2n}, \ldots, v_{Kn})'$ is a vector of independent standard normals.

RC LOGIT
We can write:
$$U_{jn} = X_{jn} b + [X_{jn} W] v_n + \varepsilon_{jn} = X_{jn} b + \left[ \sum_{k=1}^{K} \left( \sum_{k'=k}^{K} X_{k'jn} \, w_{k'k} \right) v_{kn} \right] + \varepsilon_{jn}$$
The parameters of the model are $b$ and $W$.
The RC Logit can be generalized to allow for a nonparametric specification of the distribution of $v_n$.
Fox, Kim, Ryan, and Bajari (JoE, 2012) show that the nonparametric RC Logit is identified.

RC LOGIT - CCPs
To obtain the CCPs, we should integrate over $\varepsilon_n$ and $v_n$ the optimal decision $\{y_n = j\}$, i.e., $\{X_{jn} b + [X_{jn} W] v_n + \varepsilon_{jn} \geq X_{in} b + [X_{in} W] v_n + \varepsilon_{in} \ \text{for any } i \neq j\}$:
$$P(j \mid x_n) = \int \frac{\exp\{X_{jn} b + [X_{jn} W] v\}}{\sum_{i=0}^{J} \exp\{X_{in} b + [X_{in} W] v\}} \prod_{k=1}^{K} \phi(v_k) \, dv_k$$
where $\phi$ is the standard normal density.
It requires numerical integration over the distribution of the $K$ random variables $\{v_k\}$.

RC LOGIT and IIA
Consider the effect on $P_j$ of a marginal change in the attributes of product $i \neq j$.
In the Logit model, this effect has the same form for every choice alternative:
$$\frac{\partial P_j}{\partial X_i} = -b \, P_j \, P_i$$
In the RC Logit, this effect is:
$$\frac{\partial P_j}{\partial X_i} = -\int [b + W v] \, \pi_j(v) \, \pi_i(v) \, f(v) \, dv$$
where
$$\pi_j(v) = \frac{\exp\{X_j b + [X_j W] v\}}{\sum_{i=0}^{J} \exp\{X_i b + [X_i W] v\}}$$

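RC LOGIT and IIA [2]
The effect $\frac{\partial P_j}{\partial X_i} = -\int [b + W v] \, \pi_j(v) \, \pi_i(v) \, f(v) \, dv$ depends on $E_v\left( \pi_j(v) \, \pi_i(v) \right)$, which is equal to $Cov\left( \pi_j(v), \pi_i(v) \right) + P_j P_i$. The component of the effect associated with $b$ is therefore:
$$-b \, E_v\left( \pi_j(v) \, \pi_i(v) \right) = \left. \frac{\partial P_j}{\partial X_i} \right|_{Logit} - b \, Cov\left( \pi_j(v), \pi_i(v) \right)$$
This covariance depends on the distance between the vectors $X_j$ and $X_i$.
- When $\| X_j - X_i \|$ is small, low values of $\pi_j(v)$ are associated with low values of $\pi_i(v)$, and $Cov\left( \pi_j(v), \pi_i(v) \right) > 0$.
- When $\| X_j - X_i \|$ is large, $Cov\left( \pi_j(v), \pi_i(v) \right)$ can be zero or even negative.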

RC LOGIT - MLE
Given a random sample, the log-likelihood function $\ell_N(b, W)$ is:
$$\ell_N(b, W) = \sum_{n=1}^{N} \sum_{j=0}^{J} 1\{y_n = j\} \ln \left[ \int \frac{\exp\{X_{jn} b + [X_{jn} W] v\}}{\sum_{i=0}^{J} \exp\{X_{in} b + [X_{in} W] v\}} \prod_{k=1}^{K} \phi(v_k) \, dv_k \right]$$
The MLE is the value of $(b, W)$ that maximizes $\ell_N(b, W)$. This MLE has the standard good properties:
- The MLE is consistent and asymptotically normal (CAN) and asymptotically efficient (AE).
- $\ell_N(b, W)$ is twice continuously differentiable in $(b, W)$: we can use gradient methods (e.g., Newton, BHHH) to search for the MLE.
- $\ell_N(b, W)$ is not globally concave but is concave in $b$ given $W$.

RC LOGIT - MLE [2]
The main issue in the implementation of the MLE of the RC Logit is the computation of the CCPs, which requires solving a multiple integration problem.
We can use Monte Carlo simulation methods to approximate the CCPs.
However, we need to take into account how the approximation error affects the properties of our estimators.

7. MONTE CARLO SIMULATION
Monte Carlo simulation is a general method to approximate multiple-dimensional integrals. It is used in any scientific application, empirical or theoretical, that requires the computation of multiple-dimensional integrals.
Let $v = (v_1, v_2, \ldots, v_K)$ be a vector of continuous random variables with joint density $\phi(v)$ that is continuous over the compact support $\mathcal{V}$.
Let $P$ be a parameter that is defined as:
$$P = \int h(v) \, \phi(v) \, dv$$
where $h(v)$ is a known function. Suppose there is NOT a closed-form expression for this integral.

Fundamental Theorem of Sampling
Let $v$ be a scalar random variable with CDF $F(v)$ that is continuous and strictly increasing on the support $\mathcal{V}$. Then:
(1) There exists an inverse function $F^{-1}(\cdot)$ (the Quantile function) such that $v = F^{-1}(u)$.
(2) The random variable $u = F(v)$ has a distribution $U[0, 1]$.
This implies that if $\{u_1, u_2, \ldots, u_R\}$ are $R$ i.i.d. random draws from a $U[0, 1]$, then $\{F^{-1}(u_1), F^{-1}(u_2), \ldots, F^{-1}(u_R)\}$ are $R$ i.i.d. random draws from the distribution $F$.

Proof:
(1) $F$ is strictly increasing on $\mathcal{V}$. Therefore, by the inverse function theorem, there exists an inverse function $F^{-1}(\cdot)$ such that for any $(v, u) \in \mathcal{V} \times [0, 1]$, $u = F(v) \Leftrightarrow v = F^{-1}(u)$.
(2) Define the random variable $u \equiv F(v)$. The CDF of $u$ evaluated at an arbitrary $u_0 \in [0, 1]$ is:
$$\Pr(u \leq u_0) = \Pr\left( F^{-1}(u) \leq F^{-1}(u_0) \right) = \Pr\left( v \leq F^{-1}(u_0) \right) = F(F^{-1}(u_0)) = u_0$$
So, $u$ has a $U[0, 1]$ distribution.

Examples of Quantile functions: Logistic distribution
$v \sim$ Logistic. The CDF is $F(v) = \dfrac{\exp(v)}{1 + \exp(v)}$.
The inverse CDF (i.e., Quantile function) of the Logistic is:
$$F^{-1}(u) = \ln\left( \frac{u}{1 - u} \right)$$
Then, we can get a random draw from the Logistic by getting a draw $u$ from $U[0, 1]$ and then applying the transformation:
$$v = \ln\left( \frac{u}{1 - u} \right)$$
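
As a quick illustration of this inverse-CDF method, the sketch below generates Logistic draws from uniform draws and compares sample moments with the known Logistic mean (0) and variance ($\pi^2/3$). The seed and the number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 100_000

u = rng.uniform(size=R)    # draws from U[0, 1]
v = np.log(u / (1.0 - u))  # quantile function of the Logistic

# Compare with the theoretical moments and with the CDF at one point.
print("mean:", v.mean(), "(theory 0)")
print("variance:", v.var(), "(theory", np.pi**2 / 3, ")")
print("share of draws <= 1:", np.mean(v <= 1.0),
      "(theory", np.exp(1.0) / (1.0 + np.exp(1.0)), ")")
```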

Drawing from Multivariate Normal
If $v$ is Standard Normal with CDF $\Phi$ and $u \sim U[0, 1]$, then $v = \Phi^{-1}(u)$ is a draw from the Standard Normal.
There are very efficient procedures to calculate the inverse function $\Phi^{-1}$.
Let $v = (v_1, v_2, \ldots, v_K)$ be a vector of Normal random variables $N(m, \Sigma)$. Then, we can write:
$$v = m + W v^*$$
where:
- $v^* = (v^*_1, v^*_2, \ldots, v^*_K)$ is a vector of i.i.d. standard normals;
- $W$ is a lower triangular matrix obtained as the Cholesky decomposition of $\Sigma$, i.e., $W W' = \Sigma$.

We can get a random draw of $v \sim N(m, \Sigma)$ by taking $K$ independent random draws from $U[0, 1]$, $(u_1, u_2, \ldots, u_K)$, and then applying the transformation:
$$v = m + W \begin{pmatrix} \Phi^{-1}(u_1) \\ \vdots \\ \Phi^{-1}(u_K) \end{pmatrix}$$
where $\Phi^{-1}$ is the inverse of the CDF of the standard normal.
As we will see below, in the context of simulation-based estimation, if we are interested in the estimation of the parameters $m$ and/or $W$ of our econometric model, we need to keep our simulations $\Phi^{-1}(u_j)$ constant over the estimation procedure, while the value of $m$ and/or $W$ varies during the gradient search for the estimator.
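
The transformation can be sketched as follows. The function name `draw_mvn` and the values of `m` and `Sigma` are hypothetical; the inverse normal CDF comes from the Python standard library (`statistics.NormalDist().inv_cdf`), applied element by element for simplicity rather than speed.

```python
import numpy as np
from statistics import NormalDist

def draw_mvn(u, m, Sigma):
    """Map uniform draws u of shape (R, K) into N(m, Sigma) draws."""
    W = np.linalg.cholesky(Sigma)                 # lower triangular, W W' = Sigma
    z = np.vectorize(NormalDist().inv_cdf)(u)     # i.i.d. standard normals
    return m + z @ W.T                            # row r is m + W z_r

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

u = rng.uniform(size=(20_000, 2))  # drawn once and then held fixed
draws = draw_mvn(u, m, Sigma)
print(draws.mean(axis=0))
print(np.cov(draws, rowvar=False))

# Re-using the same u with other parameter values changes the draws
# smoothly, which is what a gradient search over (m, W) requires.
shifted = draw_mvn(u, m + 1.0, Sigma)
```

Holding `u` fixed across parameter values means that the simulated objective function changes only because the parameters change, not because of fresh simulation noise.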

Frequency Simulator
Remember that $P$ is a parameter defined as:
$$P = E(h(v)) = \int h(v) \, \phi(v) \, dv$$
For instance, $h(v) = 1\{v_1 \leq \theta_1, v_2 \leq \theta_2, \ldots, v_K \leq \theta_K\}$.
Let $\{v_1, v_2, \ldots, v_R\}$ be $R$ independent random draws from the distribution with density $\phi$. Then, the Frequency Simulator of $P$ is defined as:
$$\tilde{P}_R = \frac{1}{R} \sum_{r=1}^{R} h(v_r)$$
The simulation error (approximation error) is $e_R = \tilde{P}_R - P$.

Properties of the Frequency Simulator [Same as a "sample mean"]
(1) Unbiased: $E(\tilde{P}_R) = P$.
(2) Variance: $Var(\tilde{P}_R) = \dfrac{Var(h(v))}{R}$.
(3) Consistent: As $R$ goes to infinity, $\tilde{P}_R \to P$.
(4) $\sqrt{R}$ asymptotically normal: $\sqrt{R} \, \dfrac{\tilde{P}_R - P}{\sqrt{Var(h(v))}} \to N(0, 1)$.
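
A minimal sketch of the frequency simulator for the indicator example follows, taking $v$ to be two independent standard normals so that the exact value $\Phi(\theta_1) \Phi(\theta_2)$ is available for comparison. The thresholds in `theta`, the seed, and the number of draws are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

def frequency_simulator(h, draws):
    """Return the frequency simulator and its estimated standard error."""
    values = h(draws)                                  # h(v_r), r = 1..R
    estimate = values.mean()                           # (1/R) sum_r h(v_r)
    std_error = np.sqrt(values.var(ddof=1) / values.size)  # sqrt(Var(h)/R)
    return estimate, std_error

theta = np.array([0.5, -0.2])  # hypothetical thresholds

def h(v):
    # 1{v_1 <= theta_1, v_2 <= theta_2} for each row of v
    return np.all(v <= theta, axis=1).astype(float)

rng = np.random.default_rng(0)
draws = rng.standard_normal(size=(50_000, 2))

p_sim, se_sim = frequency_simulator(h, draws)
p_exact = NormalDist().cdf(theta[0]) * NormalDist().cdf(theta[1])
print(p_sim, se_sim, p_exact)
```

Because $h$ is an indicator, `p_sim` is a step function of `theta`: small changes in the thresholds either leave it unchanged or make it jump by multiples of $1/R$.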

Properties of the Frequency Simulator (2)
In general, it is possible to obtain simulators that are more precise (with lower variance) than the frequency simulator [e.g., Importance Sampling].
Another limitation of the FS is that if $h(\cdot)$ is discontinuous or non-differentiable with respect to some parameters [e.g., $h(v) = 1\{v_1 \leq \theta_1, \ldots, v_K \leq \theta_K\}$], then the simulator is also discontinuous and non-differentiable.
This has important implications for the estimation of discrete choice models. Some simulation-based estimators that use the frequency simulator of CCPs are such that:
- The criterion function is a step function of the parameters: numerical optimization problems;
- The estimator may not be root-n asymptotically normal.

Simulation-Based Estimation using Frequency Simulator
Consider the RUM with utilities $U_{jn} = X_{jn} [b + W v_n] + \varepsilon_{jn}$, such that:
$$P_{jn}(\theta) = \int 1\left\{ \varepsilon_{in} \leq \varepsilon_{jn} + (X_{jn} - X_{in}) [b + W v_n] \ \text{for any } i \neq j \right\} f(\varepsilon_n, v_n) \, d\varepsilon_n \, dv_n$$
A Simulated-MLE of $\theta = (b, W)$ based on the Frequency Simulator of the CCPs is the value of $\theta$ that maximizes the simulated log-likelihood:
$$\ell^{(R)}(\theta) = \sum_{n=1}^{N} \sum_{j=0}^{J} 1\{y_n = j\} \ln \tilde{P}^{(R)}_{jn}(\theta)$$
where $\tilde{P}^{(R)}_{jn}(\theta)$ is the frequency simulator of $P_{jn}(\theta)$.

Simulation-Based Estimation using Frequency Simulator [2]
The frequency simulator $\tilde{P}^{(R)}_{jn}(\theta)$ is:
$$\tilde{P}^{(R)}_{jn}(\theta) = \frac{1}{R} \sum_{r=1}^{R} 1\left\{ \varepsilon^{(r)}_{in} \leq \varepsilon^{(r)}_{jn} + (X_{jn} - X_{in}) [b + W v^{(r)}_n] \ \text{for any } i \neq j \right\}$$
where $\{\varepsilon^{(r)}_n, v^{(r)}_n : r = 1, 2, \ldots, R\}$ are $R$ i.i.d. draws from the distribution $f(\varepsilon_n, v_n)$.
This was the SMLE proposed in a seminal paper by Lerman and Manski (1981) for the Multinomial Probit, i.e., $\varepsilon_n \sim N(0, \Omega)$ and $W = 0$.

Sim-Based Estimation using Frequency Simulator [3]
This estimator has very poor statistical and computational properties. There are several related issues.
[1] For choice alternatives with low $P_{jn}(\theta)$ we have that $\tilde{P}^{(R)}_{jn}(\theta) = 0$, unless $R$ is very large. The log-likelihood becomes minus infinity, even at the true $\theta$.
[2] $R$ should be very large to have an estimator with decent properties.
[3] $\ell^{(R)}(\theta)$ is a step function. Standard gradient methods do not work.
[4] For fixed $R$, the estimator is not consistent and is not asymptotically normal. It has poor small sample properties.

Solutions to the Problems of SMLE with Frequency Simulator
There are several approaches / methods that overcome the limitations of the SMLE with the Frequency simulator.
[1] RC Logit model: Simulating the $v$'s but using the analytical formula for the integration over the $\varepsilon$'s, such that the simulator is a smooth function of the parameters and always $> 0$.
[2] Importance-sampling simulators: Like GHK for the Probit model. They improve precision, are always $> 0$, and are smooth functions of the parameters.
[3] Simulated Method of Moments and Simulated Scores [+ smooth and $> 0$ simulator], which are root-n consistent and asymptotically normal estimators even when $R$ is fixed and small (e.g., $R = 1$).

Solution using RC Logit
In the RC Logit we take into account that $\varepsilon_n$ is independent of $v_n$ with an Extreme Value distribution, such that we have closed-form expressions for the probabilities conditional on $v_n$, and these probabilities are smooth functions of the parameters:
$$P_{jn}(\theta) = \int \frac{\exp\{X_{jn} b + [X_{jn} W] v_n\}}{\sum_{i=0}^{J} \exp\{X_{in} b + [X_{in} W] v_n\}} \, \phi(v_n) \, dv_n$$
Therefore, we can use the simulator:
$$\tilde{P}^{*,R}_{jn}(\theta) = \frac{1}{R} \sum_{r=1}^{R} \frac{\exp\{X_{jn} b + [X_{jn} W] v^{(r)}_n\}}{\sum_{i=0}^{J} \exp\{X_{in} b + [X_{in} W] v^{(r)}_n\}}$$
where $\{v^{(r)}_n : r = 1, 2, \ldots, R\}$ are $R$ i.i.d. draws from the distribution $f(v_n)$.
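
This smooth simulator can be sketched as follows for a single agent. The function name, the array shapes (`X` is `(J+1, K)`, `v_draws` is `(R, K)`), and the parameter values are assumptions made for illustration.

```python
import numpy as np

def rc_logit_simulator(X, b, W, v_draws):
    """Average of logit CCPs over R draws of v (one agent).

    X: (J+1, K) attributes, b: (K,), W: (K, K), v_draws: (R, K).
    """
    beta_r = b + v_draws @ W.T            # (R, K): b + W v^(r)
    u = beta_r @ X.T                      # (R, J+1): X_j [b + W v^(r)]
    u = u - u.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(u)
    probs = e / e.sum(axis=1, keepdims=True)  # logit CCPs given v^(r)
    return probs.mean(axis=0)

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0],
              [1.0, 0.5],
              [2.0, -1.0]])  # hypothetical attributes, 3 alternatives
b = np.array([0.5, 1.0])
W = np.array([[1.0, 0.0],
              [0.3, 0.8]])   # lower triangular

p_many = rc_logit_simulator(X, b, W, rng.standard_normal((5_000, 2)))
p_one = rc_logit_simulator(X, b, W, rng.standard_normal((1, 2)))  # R = 1
p_no_rc = rc_logit_simulator(X, b, np.zeros((2, 2)), rng.standard_normal((10, 2)))
print(p_many, p_one, p_no_rc)
```

Even with a single draw the simulated probabilities are strictly between 0 and 1, and with $W = 0$ the simulator collapses to the standard logit CCPs.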

Solution using RC Logit [2]
The simulator $\tilde{P}^{*,R}_{jn}(\theta)$ has several important advantages over the frequency simulator $\tilde{P}^{R}_{jn}(\theta)$.
- $\tilde{P}^{*,R}_{jn}(\theta)$ is continuously differentiable in $\theta$.
- It is always $> 0$ and $< 1$ for any value of $R$, even for $R = 1$.
- The variance of the simulation error is substantially smaller than for the frequency simulator [and the ratio of their variances increases exponentially with $J$].

78 Importance sampling simulation (IS) Let $\psi$ be a density function different from $\phi$. Density $\psi$ is called the Importance Sampling density. By definition of P, we have that:
$$P = E_{\phi}(h(v)) = \int h(v) \, \phi(v) \, dv = \int h(v) \, \frac{\phi(v)}{\psi(v)} \, \psi(v) \, dv = E_{\psi}\left( h(v) \frac{\phi(v)}{\psi(v)} \right)$$
Let $\{v^1, v^2, ..., v^R\}$ be R independent random draws from $\psi$. Then, the Importance Sampling Simulator (based on $\psi$) of P is:
$$\tilde{P}^{R} = \frac{1}{R} \sum_{r=1}^{R} h(v^r) \, \frac{\phi(v^r)}{\psi(v^r)}$$

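79 Properties of IS (1) Unbiased: $E_{\psi}(\tilde{P}^{R}) = P$. (2) Variance: $Var_{\psi}(\tilde{P}^{R}) = \frac{1}{R} Var_{\psi}\left( h(v) \frac{\phi(v)}{\psi(v)} \right)$. (3) Consistent: As R goes to infinity, $\tilde{P}^{R} \to_p P$. (4) Asymptotically normal:
$$\frac{\sqrt{R} \, (\tilde{P}^{R} - P)}{\sqrt{Var_{\psi}\left( h(v) \frac{\phi(v)}{\psi(v)} \right)}} \to_d N(0, 1)$$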

80 Relative variances of FS and IS
$$Var(\tilde{P}_{FS}) = \frac{Var_{\phi}(h(v))}{R} \quad \text{and} \quad Var(\tilde{P}_{IS}) = \frac{1}{R} Var_{\psi}\left( h(v) \frac{\phi(v)}{\psi(v)} \right)$$
Therefore, if the ratios $\phi(v)/\psi(v)$ are smaller than 1 for values of v with large $(h(v) - P)^2$, then the ISS will have lower variance than the FS. For the ISS to have a lower variance than the FS, the IS density $\psi$ should "over sample" (relative to $\phi$) those regions in the support of v where $(h(v) - P)^2$ is large.
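A small sketch comparing the two simulators on a tail probability, $P = \Pr(v > 3)$ with $v \sim N(0,1)$, so that $h(v) = 1\{v > 3\}$. The IS density $\psi = N(3, 1)$ is an assumption chosen for this illustration because it over-samples the region where $h(v)$ is nonzero. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def frequency_estimate(threshold, R, rng):
    """FS: share of N(0,1) draws that exceed the threshold."""
    return sum(rng.gauss(0.0, 1.0) > threshold for _ in range(R)) / R

def importance_sampling_estimate(threshold, R, rng):
    """ISS with IS density psi = N(threshold, 1) (an assumption of this sketch)."""
    total = 0.0
    for _ in range(R):
        v = rng.gauss(threshold, 1.0)  # draw from psi
        # phi(v) / psi(v) for phi = N(0,1) and psi = N(threshold, 1)
        weight = math.exp(-threshold * v + 0.5 * threshold ** 2)
        total += (v > threshold) * weight  # h(v) * phi(v) / psi(v)
    return total / R

threshold, R = 3.0, 5000
p_true = 1.0 - NormalDist().cdf(threshold)
p_fs = frequency_estimate(threshold, R, random.Random(0))
p_is = importance_sampling_estimate(threshold, R, random.Random(0))
print(p_true, p_fs, p_is)
```

Under $\phi$ only about 0.13% of the draws fall above 3, so the FS relies on a handful of successes. Under $\psi$ roughly half of the draws land in the relevant region, and the weights $\phi/\psi$ there are well below 1.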

81 Simulation of Multinomial Probit probabilities: GHK Simulator Let $\varepsilon = (\varepsilon_1, \varepsilon_2, ..., \varepsilon_J)$ be a vector of Normal random variables with vector of means 0 and variance-covariance matrix $\Omega$. Let $c = (c_1, c_2, ..., c_J)$ be a vector of constants. Consider the following probability:
$$P = \Pr(\varepsilon_1 \geq c_1, \varepsilon_2 \geq c_2, ..., \varepsilon_J \geq c_J) = \int 1\{\varepsilon_1 \geq c_1, \varepsilon_2 \geq c_2, ..., \varepsilon_J \geq c_J\} \, \phi(\varepsilon; \Omega) \, d\varepsilon$$
These probabilities appear in a Multinomial Probit model. The Geweke-Hajivassiliou-Keane (GHK) simulator is a very efficient simulator of these probabilities. It is also continuously differentiable in the argument (vector of parameters) c.

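82 GHK Simulator (2) Let $W = \{w_{ij}\}$ be the lower triangular matrix that comes from the Cholesky decomposition of $\Omega$ (i.e., $\Omega = W W'$). Then, $\varepsilon = W z$, where z is a vector of J independent standard normals, such that:
$$\varepsilon_1 = w_{11} z_1$$
$$\varepsilon_2 = w_{21} z_1 + w_{22} z_2$$
$$\vdots$$
$$\varepsilon_J = w_{J1} z_1 + w_{J2} z_2 + ... + w_{JJ} z_J$$
and
$$\{\varepsilon_j \geq c_j\} = \{w_{j1} z_1 + w_{j2} z_2 + ... + w_{jj} z_j \geq c_j\} = \{z_j \geq \tilde{c}_j - \tilde{w}_{j1} z_1 - ... - \tilde{w}_{j,j-1} z_{j-1}\}$$
with $\tilde{c}_j = c_j / w_{jj}$ and $\tilde{w}_{ji} = w_{ji} / w_{jj}$ for $i < j$.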

83 GHK Simulator (3) Therefore,
$$P = \int 1\{z_j \geq \tilde{c}_j - \tilde{w}_{j1} z_1 - ... - \tilde{w}_{j,j-1} z_{j-1} \text{ for every } j\} \, \phi(z) \, dz$$

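84 GHK Simulator (4) Consider the following IS density, $f^*(z^*)$. [1] $z_1^* \sim \{z_1 \mid z_1 \geq \tilde{c}_1\}$ is a random draw from the standard normal truncated from below at $\tilde{c}_1$. [2] Given $z_1^*$, then $z_2^* \sim \{z_2 \mid z_2 \geq \tilde{c}_2 - \tilde{w}_{21} z_1^*\}$ is a random draw from the standard normal truncated from below at $\tilde{c}_2 - \tilde{w}_{21} z_1^*$. ... [j] Given $(z_1^*, ..., z_{j-1}^*)$, then $z_j^* \sim \{z_j \mid z_j \geq \tilde{c}_j - \tilde{w}_{j1} z_1^* - ... - \tilde{w}_{j,j-1} z_{j-1}^*\}$ is a random draw from the standard normal truncated from below at $\tilde{c}_j - \tilde{w}_{j1} z_1^* - ... - \tilde{w}_{j,j-1} z_{j-1}^*$.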

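85 GHK Simulator (5) What is the form of the IS density $f^*(z^*)$? Note that the density of a random variable $z^*$ that is a standard normal truncated from below at c is $\frac{\phi(z^*)}{1 - \Phi(c)}$, where $\phi$ and $\Phi$ represent the pdf and cdf of the standard normal, respectively. Then, by definition:
$$f^*(z^*) = \frac{\phi(z_1^*)}{1 - \Phi(\tilde{c}_1)} \cdot \frac{\phi(z_2^*)}{1 - \Phi(\tilde{c}_2 - \tilde{w}_{21} z_1^*)} \cdots \frac{\phi(z_J^*)}{1 - \Phi(\tilde{c}_J - \tilde{w}_{J1} z_1^* - ... - \tilde{w}_{J,J-1} z_{J-1}^*)}$$
$$= \frac{\prod_{j=1}^{J} \phi(z_j^*)}{\prod_{j=1}^{J} \left[ 1 - \Phi(\tilde{c}_j - \tilde{w}_{j1} z_1^* - ... - \tilde{w}_{j,j-1} z_{j-1}^*) \right]}$$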

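86 GHK Simulator (6) The GHK simulator of P is the ISS that uses the IS density $f^*(z^*)$. Let $\{z^{*r} : r = 1, 2, ..., R\}$ be R independent random draws from the IS density $f^*(z^*)$. Then,
$$\tilde{P}^{(R)}_{GHK} = \frac{1}{R} \sum_{r=1}^{R} 1\{z_j^{*r} \geq \tilde{c}_j - \tilde{w}_{j1} z_1^{*r} - ... - \tilde{w}_{j,j-1} z_{j-1}^{*r} \text{ for every } j\} \, \frac{\phi(z^{*r})}{f^*(z^{*r})}$$
Note that:
- $\frac{\phi(z^{*r})}{f^*(z^{*r})} = \prod_{j=1}^{J} \left[ 1 - \Phi(\tilde{c}_j - \tilde{w}_{j1} z_1^{*r} - ... - \tilde{w}_{j,j-1} z_{j-1}^{*r}) \right]$
- By construction of the $z^{*r}$ simulations, the indicator of $\{z_j^{*r} \geq \tilde{c}_j - \tilde{w}_{j1} z_1^{*r} - ... - \tilde{w}_{j,j-1} z_{j-1}^{*r} \text{ for every } j\}$ is always 1.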

87 Therefore,
$$\tilde{P}^{R}_{GHK} = \frac{1}{R} \sum_{r=1}^{R} \prod_{j=1}^{J} \left[ 1 - \Phi(\tilde{c}_j - \tilde{w}_{j1} z_1^{*r} - ... - \tilde{w}_{j,j-1} z_{j-1}^{*r}) \right]$$

88 Properties of GHK Simulator [1] It is unbiased, consistent, asymptotically normal. [2] It has substantially lower variance than the FS. In some standard settings the ratios of variances can be of the order of 100 or even [...]. [3] $\tilde{P}^{R}_{GHK}(c, \Omega)$ is continuously differentiable in the parameters $(c, \Omega)$. [4] $\tilde{P}^{R}_{GHK}$ is always strictly greater than 0 and lower than 1. [5] It is simple to get random draws from a truncated standard normal. If $z^*$ is a standard normal truncated from below at c, then its CDF is
$$F(z^*) = \frac{\Phi(z^*) - \Phi(c)}{1 - \Phi(c)}$$
such that, given $u \sim U[0, 1]$,
$$z^* = F^{-1}(u) = \Phi^{-1}\left( \Phi(c) + [1 - \Phi(c)] \, u \right)$$
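The GHK recursion can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the course's implementation: it assumes the event is $\varepsilon_j \geq c_j$ for every j (consistent with the $1 - \Phi(\cdot)$ weights and the inverse-CDF formula above), the helper names `cholesky` and `ghk` are hypothetical, and the small clamp before `inv_cdf` is a numerical safeguard added here.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def cholesky(omega):
    """Lower triangular W with W W' = omega (omega must be positive definite)."""
    n = len(omega)
    W = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            s = sum(W[j][k] * W[i][k] for k in range(i))
            if i == j:
                W[j][j] = math.sqrt(omega[j][j] - s)
            else:
                W[j][i] = (omega[j][i] - s) / W[i][i]
    return W

def ghk(c, omega, R, rng):
    """GHK simulator of Pr(eps_1 >= c_1, ..., eps_J >= c_J), eps ~ N(0, omega)."""
    W = cholesky(omega)
    total = 0.0
    for _ in range(R):
        z, weight = [], 1.0
        for j in range(len(c)):
            # Lower bound for z_j given the earlier draws z_1, ..., z_{j-1}.
            bound = (c[j] - sum(W[j][i] * z[i] for i in range(j))) / W[j][j]
            p_below = STD_NORMAL.cdf(bound)
            weight *= 1.0 - p_below
            # Inverse-CDF draw from N(0,1) truncated to z_j >= bound.
            p = p_below + (1.0 - p_below) * rng.random()
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
            z.append(STD_NORMAL.inv_cdf(p))
        total += weight
    return total / R

p_indep = ghk([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], R=500, rng=random.Random(1))
p_corr = ghk([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], R=2000, rng=random.Random(1))
print(p_indep, p_corr)
```

Two checks with known answers: with independent components and c = 0 the probability is 1/4, and with correlation 0.5 it is 1/4 + arcsin(0.5)/(2π) = 1/3. In the independent case every draw has the same weight, so the simulator has no simulation error at all.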

89 8. SIMULATION-BASED ESTIMATION (SBE) 8.1. Refreshing Estimation and Asymptotic Theory 8.2. SBE: Conditions on the Simulators 8.3. Simulated Based Estimators: SMM and SMLE 8.4. Asymptotic Properties

90 8.1 REFRESHING ESTIMATION & ASYMPTOTIC THEORY Consider a discrete choice model with CCPs $P(j|x_n, \theta)$, where $\theta$ is a $K \times 1$ vector of parameters. Let $\theta_0$ be the true value of $\theta$ in the population under study. Let $\{y_n, x_n : n = 1, 2, ..., N\}$ be a random sample from the population. The model implies the following moment conditions:
$$E\left( \sum_{j=1}^{J} z_{jn} \left[ 1\{y_n = j\} - P(j|x_n, \theta_0) \right] \right) = 0$$
where $z_{jn}$ is a $K \times 1$ vector of functions of $x_n$, e.g., $z_{jn} = (x_{jn}', \sum_{i \neq j} x_{in}', [x_{jn} \ast x_{jn}]')'$.

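91 The population likelihood equations are a particular example of these moment conditions. In this case, $z_{jn} = \frac{\partial \ln P(j|x_n, \theta_0)}{\partial \theta}$ and:
$$E\left( \sum_{j=1}^{J} \frac{\partial \ln P(j|x_n, \theta_0)}{\partial \theta} \left[ 1\{y_n = j\} - P(j|x_n, \theta_0) \right] \right) = 0$$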

92 We can represent these moment conditions in a compact form as:
$$E\left( z_n [y_n - P_n(\theta_0)] \right) = 0$$
where: $z_n$ is the $K \times J$ matrix $(z_{1n}, z_{2n}, ..., z_{Jn})$; $y_n$ is the $J \times 1$ vector $(1\{y_n = 1\}, 1\{y_n = 2\}, ..., 1\{y_n = J\})'$; $P_n(\theta)$ is the $J \times 1$ vector $(P(1|x_n, \theta), P(2|x_n, \theta), ..., P(J|x_n, \theta))'$. Identification Assumption. $\theta_0$ is the unique value of $\theta$ in the parameter space that solves the system of equations $E(z_n [y_n - P_n(\theta)]) = 0$.

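93 ESTIMATION. The estimator $\hat{\theta}_N$ is the value that solves the system of sample moment conditions:
$$\frac{1}{N} \sum_{n=1}^{N} z_n [y_n - P_n(\theta)] = 0$$
Example: MLE. When $z_{jn} = \frac{\partial \ln P(j|x_n, \theta)}{\partial \theta}$, and $z_n = \frac{\partial \ln P_n(\theta)'}{\partial \theta}$, we have that the sample moment conditions above define the MLE:
$$\frac{1}{N} \sum_{n=1}^{N} \frac{\partial \ln P_n(\theta)'}{\partial \theta} [y_n - P_n(\theta)] = 0$$
Example: MM
$$\frac{1}{N} \sum_{n=1}^{N} z_n [y_n - P_n(\theta)] = 0$$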

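94 CONSISTENCY. Suppose that: (a) $P_n(\theta)$ is continuously differentiable in $\theta$; (b) for any $\theta \in \Theta$, we have that $Var(z_n [y_n - P_n(\theta)])$ is finite; (c) $\Theta$ is a compact set; and (d) $\theta_0$ is the unique value in the parameter space that solves the system of equations $E(z_n [y_n - P_n(\theta)]) = 0$. Then, as $N \to \infty$, (i) $\frac{1}{N} \sum_{n=1}^{N} z_n [y_n - P_n(\theta)]$ converges in probability, uniformly in $\theta \in \Theta$, to $E(z_n [y_n - P_n(\theta)])$; (ii) $\hat{\theta}_N \to_p \theta_0$. ASYMPTOTIC NORMALITY. Let $g_n(\theta) \equiv z_n [y_n - P_n(\theta)]$. Using a Taylor expansion of $\frac{1}{N} \sum_{n=1}^{N} g_n(\hat{\theta}_N) = 0$ around $\theta = \theta_0$:
$$\sqrt{N} \left( \hat{\theta}_N - \theta_0 \right) = - \left[ \frac{1}{N} \sum_{n=1}^{N} \frac{\partial g_n(\theta_0)}{\partial \theta'} \right]^{-1} \left[ \frac{1}{\sqrt{N}} \sum_{n=1}^{N} g_n(\theta_0) \right] + o_p(1)$$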

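95 ASYMPTOTIC NORMALITY [Cont.] Then, under standard regularity conditions, we have that $\frac{1}{N} \sum_{n=1}^{N} \frac{\partial g_n(\theta_0)}{\partial \theta'} \to_p E\left( \frac{\partial g_n(\theta_0)}{\partial \theta'} \right) \equiv G_0$, and $\frac{1}{\sqrt{N}} \sum_{n=1}^{N} g_n(\theta_0) \to_d N(0, \Omega_0)$ with $\Omega_0 = E\left( g_n(\theta_0) \, g_n(\theta_0)' \right)$. By Slutsky's Theorem:
$$\sqrt{N} \left( \hat{\theta}_N - \theta_0 \right) \to_d N\left( 0, \; G_0^{-1} \, \Omega_0 \, G_0^{-1 \prime} \right)$$
In our Discrete Choice Model we have that:
$$G_0 = -E\left( z_n \frac{\partial P_n(\theta_0)}{\partial \theta'} \right) \quad \text{and} \quad \Omega_0 = E\left( z_n \, \Sigma_{P_n} \, z_n' \right)$$
where $\Sigma_{P_n}$ is the $J \times J$ matrix whose element (j, j) on the main diagonal is $P(j|x_n, \theta_0) [1 - P(j|x_n, \theta_0)]$, and whose element (j, i) off the main diagonal is $-P(j|x_n, \theta_0) \, P(i|x_n, \theta_0)$.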

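96 8.2. SIMULATION-BASED ESTIMATION Given a model defined by the moment conditions $E(g_n(\theta_0)) = 0$, a Simulation-Based Estimator is the value $\hat{\theta}_{N,R}$ that solves the moment conditions:
$$\frac{1}{N} \sum_{n=1}^{N} \tilde{g}^{R}_{n}(\theta) = \frac{1}{N} \sum_{n=1}^{N} \tilde{z}^{R}_{n}(\theta) \left[ y_n - \tilde{P}^{R}_{n}(\theta) \right] = 0$$
$\tilde{P}^{R}_{n}(\theta)$ is the $J \times 1$ vector of simulators $(\tilde{P}^{R}(1|x_n, \theta), \tilde{P}^{R}(2|x_n, \theta), ..., \tilde{P}^{R}(J|x_n, \theta))'$. Note that for the SMLE, $z_n(\theta) = \frac{\partial \ln P_n(\theta)'}{\partial \theta}$. Therefore, we also need a simulator for $z_n(\theta)$, say $\tilde{z}^{R}_{n}(\theta)$.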

97 Note also that if we use the same simulator, $\tilde{P}^{R}_{n}(\theta)$, for $P_n(\theta)$ and $\frac{\partial \ln P_n(\theta)'}{\partial \theta}$: (1) one of the two simulators is biased for finite R; (2) the simulation errors in $\tilde{P}^{R}_{n}(\theta)$ and $\tilde{z}^{R}_{n}(\theta)$ will be correlated.

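98 SIMULATION-BASED ESTIMATORS: SMM, SMLE, S-Scores
Simulated Method of Moments (SMM). $\hat{\theta}_{N,R}$ solves:
$$\frac{1}{N} \sum_{n=1}^{N} z_n \left[ y_n - \tilde{P}^{R}_{n}(\theta) \right] = 0$$
with $\tilde{P}^{R}_{n}(\theta)$ an unbiased simulator of $P_n(\theta)$.
Simulated Maximum Likelihood (SML). $\hat{\theta}_{N,R}$ solves:
$$\frac{1}{N} \sum_{n=1}^{N} \frac{\partial \ln \tilde{P}^{R}_{n}(\theta)'}{\partial \theta} \left[ y_n - \tilde{P}^{R}_{n}(\theta) \right] = 0$$
with $\tilde{P}^{R}_{n}(\theta)$ an unbiased simulator of $P_n(\theta)$.
Simulated Scores (SScores). $\hat{\theta}_{N,R}$ solves:
$$\frac{1}{N} \sum_{n=1}^{N} \tilde{s}^{R}_{n}(\theta) \left[ y_n - \tilde{P}^{R}_{n}(\theta) \right] = 0$$
with $\tilde{s}^{R}_{n}(\theta)$ and $\tilde{P}^{R}_{n}(\theta)$ unbiased and independent simulators of $\frac{\partial \ln P_n(\theta)'}{\partial \theta}$ and $P_n(\theta)$, respectively.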

99 Conditions on Simulator [McFadden, ECMA 1989] particularized to the RC Logit [1] The random draws $v_n^{(r)} = \{v_{nk}^{(r)} : k = 1, 2, ..., K\}$ from the standard normal are independently distributed over k, n, and r. Each observation n (and each random coefficient $v_{nk}$) has its own R independent random draws. [2] These random draws are made at the beginning of the estimation procedure and they are kept fixed during the implementation of the algorithm that searches for $\hat{\theta}_{N,R}$. That is, the same set of random draws is used to construct simulators for different values of $\theta$.

100 If new draws were made at each [Newton] iteration of the gradient algorithm, they would introduce new randomness at each step, it would not be possible to obtain numerical convergence of the algorithm, and the asymptotic properties of the estimator would not hold: McFadden (Econometrica, 1989). Note that some components in $\theta$ are parameters in the distribution of the random coefficients $\beta_n$, i.e., parameters b and W. The values of these parameters change during our search for the estimator $\hat{\theta}_{N,R}$. Therefore, we cannot keep constant the random draws from the distribution of the random coefficients $\beta_n$. However, we can always keep constant the random draws from the distribution of the standard normals in $v_n$. [Or, more generally, the random draws from a U[0, 1] that we can use to construct draws from any distribution.]
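A minimal sketch of condition [2], assuming a binary RC logit with a single attribute and a scalar random coefficient $\beta_n = b + w v_n$ (the function names, data and parameter values are hypothetical). The standard normal draws are generated once and stored. The random coefficients implied by them change with $(b, w)$, but the underlying draws do not, so the simulated objective is a deterministic function of the parameters.

```python
import math
import random

def draw_standard_normals(N, R, seed):
    """R standard normal draws per observation, generated once before estimation."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(R)] for _ in range(N)]

def simulated_prob(x, b, w, v_draws):
    """Simulated probability of alternative 1 in a binary RC logit.

    beta = b + w * v changes with (b, w); the stored v draws stay fixed.
    """
    probs = [1.0 / (1.0 + math.exp(-x * (b + w * v))) for v in v_draws]
    return sum(probs) / len(probs)

x = [0.5, -1.0, 2.0]                            # hypothetical attribute values
V = draw_standard_normals(N=3, R=5, seed=123)   # drawn once, then kept fixed

def objective(b, w, draws=V):
    """Vector of simulated probabilities, one per observation."""
    return [simulated_prob(x[n], b, w, draws[n]) for n in range(len(x))]

same_theta_twice = objective(1.0, 0.5) == objective(1.0, 0.5)
fresh = draw_standard_normals(N=3, R=5, seed=456)
redrawn_differs = objective(1.0, 0.5, fresh) != objective(1.0, 0.5)
print(same_theta_twice, redrawn_differs)
```

With fixed draws, evaluating the objective twice at the same parameter value returns identical numbers, which is what a Newton-type search needs. Redrawing would change the objective even when the parameters do not move.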

101 Conditions on Simulator [Cont.] [3] The simulator $\tilde{P}^{R}(j|x_n, \theta)$ is continuously differentiable in $\theta$, and it is always within (0, 1). [4] For any value $(j, x_n, \theta)$, the simulator $\tilde{P}^{R}(j|x_n, \theta)$ is unbiased and, as R goes to infinity, it is consistent and asymptotically normal:
$$E_v\left( \tilde{P}^{R}(j|x_n, \theta) \right) = P(j|x_n, \theta)$$
$$\text{As } R \to \infty, \quad \tilde{P}^{R}(j|x_n, \theta) \to_p P(j|x_n, \theta)$$
$$\text{As } R \to \infty, \quad \sqrt{R} \left[ \tilde{P}^{R}_{n}(\theta) - P_n(\theta) \right] \to_d N(0, \tilde{V}(x_n, \theta))$$
where $\tilde{V}(x_n, \theta)$ is the variance matrix of the J simulation errors.

102 8.3. ASYMPTOTIC PROPERTIES There are two types of asymptotics we can consider for SB Estimators. - As $N \to \infty$ and R is fixed. - As $N \to \infty$ and $R \to \infty$. Asymptotics as $N \to \infty$ with R fixed are particularly interesting because they fully take into account how simulation error affects the asymptotic bias and variance of SB Estimators. We start by presenting asymptotic results as $N \to \infty$ with R fixed.

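103 A useful decomposition For the derivation of the asymptotic results of SBEs, it is helpful to consider the following decomposition of the conditions that define the estimator:
$$\frac{1}{N} \sum_{n=1}^{N} \tilde{g}^{R}_{n}(\theta) = \underbrace{\frac{1}{N} \sum_{n=1}^{N} g_n(\theta)}_{\text{[A] Without Sim. error}} + \underbrace{\frac{1}{N} \sum_{n=1}^{N} \left[ E_v \tilde{g}^{R}_{n}(\theta) - g_n(\theta) \right]}_{\text{[B] Simulation Bias}} + \underbrace{\frac{1}{N} \sum_{n=1}^{N} \left[ \tilde{g}^{R}_{n}(\theta) - E_v \tilde{g}^{R}_{n}(\theta) \right]}_{\text{[C] Simulation Noise}}$$
where $E_v \tilde{g}^{R}_{n}(\theta)$ represents the expectation over the simulated random draws $\{v_n^{(r)}\}$ but conditional on the observed data $(y_n, x_n)$.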

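104 Term [A]: Standard Moment Conditions [Without Sim. Error]
$$\underbrace{\frac{1}{N} \sum_{n=1}^{N} g_n(\theta)}_{\text{[A] Without Sim. error}}$$
Under standard regularity conditions, we have that $N^{-1} \sum_{n=1}^{N} g_n(\hat{\theta}) = 0$ implies:
$$0 = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} g_n(\theta_0) + \left[ \frac{1}{N} \sum_{n=1}^{N} \frac{\partial g_n(\theta_0)}{\partial \theta'} \right] \sqrt{N} \left( \hat{\theta} - \theta_0 \right) + o_p(1)$$
$\frac{1}{\sqrt{N}} \sum_{n=1}^{N} g_n(\theta_0)$ converges in distribution to $N(0, \Omega_0)$. $\frac{1}{N} \sum_{n=1}^{N} \frac{\partial g_n(\theta_0)}{\partial \theta'}$ converges in probability to $E\left( \frac{\partial g_n(\theta_0)}{\partial \theta'} \right)$.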

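105 In the discrete choice model:
$$G_0 \equiv E\left( \frac{\partial g_n(\theta_0)}{\partial \theta'} \right) = -E\left( z_n \frac{\partial P_n(\theta_0)}{\partial \theta'} \right) \quad \text{and} \quad \Omega_0 = E\left( z_n \, \Sigma_{P_n} \, z_n' \right)$$
with $\Sigma_{P_n}$ the $J \times J$ variance matrix of $y_n$ conditional on $x_n$.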

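106 Term [B]: Simulation Bias
$$\underbrace{\frac{1}{N} \sum_{n=1}^{N} \left[ E_v \tilde{g}^{R}_{n}(\theta) - g_n(\theta) \right]}_{\text{[B] Simulation Bias}}$$
In our model, for the Simulated Method of Moments:
$$E_v \tilde{g}^{R}_{n}(\theta) = E_v\left( z_n \left[ y_n - \tilde{P}^{R}_{n}(\theta) \right] \right) = z_n \left[ y_n - E_v \tilde{P}^{R}_{n}(\theta) \right] = z_n \left[ y_n - P_n(\theta) \right] = g_n(\theta)$$
Therefore, for the SMM with an unbiased simulator of CCPs, the Simulation Bias term is exactly zero for any value of $\theta$, N, and R.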

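107 Term [B]: Simulation Bias For Simulated Maximum Likelihood:
$$E_v \tilde{g}^{R}_{n}(\theta) = E_v\left( \frac{\partial \ln \tilde{P}^{R}_{n}(\theta)'}{\partial \theta} \left[ y_n - \tilde{P}^{R}_{n}(\theta) \right] \right) = E_v\left( \left[ \frac{\partial \ln P_n(\theta)'}{\partial \theta} + \tilde{\xi}^{R}_{n}(\theta) \right] \left[ y_n - P_n(\theta) - \tilde{\eta}^{R}_{n}(\theta) \right] \right)$$
$$= g_n(\theta) + E_v\left( \tilde{\xi}^{R}_{n}(\theta) \right) [y_n - P_n(\theta)] - E_v\left( \tilde{\xi}^{R}_{n}(\theta) \, \tilde{\eta}^{R}_{n}(\theta) \right) \neq g_n(\theta)$$
$\tilde{\xi}^{R}_{n}(\theta)$ is the $K \times J$ matrix of simulation errors in $\frac{\partial \ln \tilde{P}^{R}_{n}(\theta)'}{\partial \theta}$; $\tilde{\eta}^{R}_{n}(\theta) \equiv \tilde{P}^{R}_{n}(\theta) - P_n(\theta)$ is the $J \times 1$ vector of simulation errors in $\tilde{P}^{R}_{n}(\theta)$, with $E_v \tilde{\eta}^{R}_{n}(\theta) = 0$.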

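108 Term [B]: Simulation Bias [SML] For SML, $\frac{1}{N} \sum_{n=1}^{N} \left[ E_v \tilde{g}^{R}_{n}(\theta) - g_n(\theta) \right] \neq 0$ because both $E_v \tilde{\xi}^{R}_{n}(\theta) \neq 0$ and $E_v\left( \tilde{\xi}^{R}_{n}(\theta) \, \tilde{\eta}^{R}_{n}(\theta) \right) \neq 0$, where $\tilde{\xi}^{R}_{n}(\theta)$ and $\tilde{\eta}^{R}_{n}(\theta)$ are the simulation errors in the derivatives of the log-CCPs and in the CCPs, respectively.
$E_v \tilde{\xi}^{R}_{n}(\theta) \neq 0$ because an unbiased simulator of the CCP typically implies a biased simulator of the derivative of the log-CCP:
$$\frac{\partial \ln P(j|x_n, \theta)}{\partial \theta} = \frac{\partial P(j|x_n, \theta)}{\partial \theta} \, \frac{1}{P(j|x_n, \theta)}$$
Note that the simulation error enters additively in the simulator of $P(j|x_n, \theta)$; however, it enters in the denominator, $\frac{1}{P(j|x_n, \theta)}$, in the simulator of $\frac{\partial \ln P(j|x_n, \theta)}{\partial \theta}$.
$E_v\left( \tilde{\xi}^{R}_{n}(\theta) \, \tilde{\eta}^{R}_{n}(\theta) \right) \neq 0$ because the simulation error in CCPs is correlated with the simulation error in the derivatives of the log-CCPs.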

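109 Term [B]: Simulation Bias [SMLE] Importantly, this Simulation Bias does not go to zero as the sample size goes to infinity:
$$\text{plim}_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \left[ E_v \tilde{g}^{R}_{n}(\theta) - g_n(\theta) \right] = E\left( E_v\left( \tilde{\xi}^{R}_{n}(\theta) \right) [y_n - P_n(\theta)] \right) - E\left( E_v\left( \tilde{\xi}^{R}_{n}(\theta) \, \tilde{\eta}^{R}_{n}(\theta) \right) \right)$$
where $\tilde{\xi}^{R}_{n}(\theta)$ and $\tilde{\eta}^{R}_{n}(\theta)$ are the simulation errors in the derivatives of the log-CCPs and in the CCPs, respectively. The first term is zero at $\theta = \theta_0$, but the second term is not zero:
$$- E\left( E_v\left( \tilde{\xi}^{R}_{n}(\theta_0) \, \tilde{\eta}^{R}_{n}(\theta_0) \right) \right) \neq 0$$
Therefore, SML is inconsistent as $N \to \infty$ with R fixed. Consistency of the SML requires that, as $N \to \infty$, the number of simulations R also goes to infinity.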
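The nonlinearity behind this bias can be illustrated exactly in a toy case. The sketch below is a simplification, not the course's derivation: it looks at the log-probability rather than the score, and it assumes that each simulation draw yields one of two equally likely conditional choice probabilities (hypothetical values 0.05 and 0.45), so that $E_v \ln \tilde{P}^{R}$ can be computed by enumeration instead of by Monte Carlo. The simulator is unbiased for P = 0.25, yet its log is biased downward for every finite R.

```python
import itertools
import math

def expected_log_simulator(p_values, R):
    """Exact E_v[ln P~^R] when each of the R draws picks one of p_values
    with equal probability and P~^R is the average over the R draws."""
    total, count = 0.0, 0
    for combo in itertools.product(p_values, repeat=R):
        total += math.log(sum(combo) / R)
        count += 1
    return total / count

# Hypothetical conditional choice probabilities given the two values of v.
p_values = (0.05, 0.45)
log_true = math.log(sum(p_values) / len(p_values))  # ln P with P = 0.25

bias = {R: expected_log_simulator(p_values, R) - log_true for R in (1, 2, 4, 8)}
for R, value in bias.items():
    print(R, round(value, 4))
```

The bias is negative for every R (Jensen's inequality) and shrinks only as R grows. Increasing N does nothing to remove it, which is the sense in which SML needs R to go to infinity.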

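110 Term [B]: Simulation Bias [Method of Simulated Scores] For Simulated Scores, with $s_n(\theta) \equiv \frac{\partial \ln P_n(\theta)'}{\partial \theta}$:
$$E_v \tilde{g}^{R}_{n}(\theta) = E_v\left( \tilde{s}^{R}_{n}(\theta) \left[ y_n - \tilde{P}^{R}_{n}(\theta) \right] \right) = E_v\left( \left[ s_n(\theta) + \tilde{\xi}^{R}_{n}(\theta) \right] \left[ y_n - P_n(\theta) - \tilde{\eta}^{R}_{n}(\theta) \right] \right) = g_n(\theta)$$
because now the simulator $\tilde{s}^{R}_{n}(\theta)$ is such that its simulation error $\tilde{\xi}^{R}_{n}(\theta)$ satisfies $E_v \tilde{\xi}^{R}_{n}(\theta) = 0$ and $E_v\left( \tilde{\xi}^{R}_{n}(\theta) \, \tilde{\eta}^{R}_{n}(\theta) \right) = 0$, where $\tilde{\eta}^{R}_{n}(\theta)$ is the simulation error in $\tilde{P}^{R}_{n}(\theta)$.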

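111 Term [C]: Simulation Noise
$$\underbrace{\frac{1}{N} \sum_{n=1}^{N} \left[ \tilde{g}^{R}_{n}(\theta) - E_v \tilde{g}^{R}_{n}(\theta) \right]}_{\text{[C] Simulation Noise}}$$
For the Simulated Method of Moments we have shown that $E_v \tilde{g}^{R}_{n}(\theta) = g_n(\theta)$. Therefore,
$$\tilde{g}^{R}_{n}(\theta) - E_v \tilde{g}^{R}_{n}(\theta) = z_n \left[ y_n - \tilde{P}^{R}_{n}(\theta) \right] - z_n \left[ y_n - P_n(\theta) \right] = - z_n \, \tilde{\eta}^{R}_{n}(\theta)$$
where $\tilde{\eta}^{R}_{n}(\theta) \equiv \tilde{P}^{R}_{n}(\theta) - P_n(\theta)$ is the vector of simulation errors. Given the properties of our simulator, the $K \times 1$ vector of random variables $z_n \tilde{\eta}^{R}_{n}(\theta)$ is such that, for any n and $\theta$: $E\left( z_n \tilde{\eta}^{R}_{n}(\theta) \right) = 0$ and, conditional on $x_n$, $E(\tilde{\eta}^{R}_{n}(\theta) | x_n) = 0$.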

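112 Asymptotic Distribution of the SMM Using a Taylor approximation around $\theta = \theta_0$ of the moment conditions of the SMM, and taking into account that $\hat{\theta}_{N,R} \to_p \theta_0$, we have that:
$$\sqrt{N} \left( \hat{\theta}_{N,R} - \theta_0 \right) = \left[ \frac{1}{N} \sum_{n=1}^{N} z_n \frac{\partial \tilde{P}^{R}_{n}(\theta_0)}{\partial \theta'} \right]^{-1} \frac{1}{\sqrt{N}} \sum_{n=1}^{N} z_n \left[ y_n - P_n(\theta_0) - \tilde{\eta}^{R}_{n}(\theta_0) \right] + o_p(1)$$
where $\tilde{\eta}^{R}_{n}(\theta_0) \equiv \tilde{P}^{R}_{n}(\theta_0) - P_n(\theta_0)$ is the vector of simulation errors. As $N \to \infty$ with R fixed, we have that:
$$\frac{1}{N} \sum_{n=1}^{N} z_n \frac{\partial \tilde{P}^{R}_{n}(\theta_0)}{\partial \theta'} \to_p - \tilde{G}_R, \quad \text{with} \quad \tilde{G}_R \equiv - E\left( z_n \frac{\partial \tilde{P}^{R}_{n}(\theta_0)}{\partial \theta'} \right)$$
so that $\tilde{G}_R$ is the simulated counterpart of $G_0 = E\left( \frac{\partial g_n(\theta_0)}{\partial \theta'} \right)$.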

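113 Also:
$$\frac{1}{\sqrt{N}} \sum_{n=1}^{N} z_n [y_n - P_n(\theta_0)] \to_d N(0, \Omega_0), \quad \text{where} \quad \Omega_0 \equiv E\left( z_n \, \Sigma_{P_n} \, z_n' \right)$$
with $\Sigma_{P_n}$ the $J \times J$ variance matrix of $y_n$ conditional on $x_n$. And:
$$\frac{1}{\sqrt{N}} \sum_{n=1}^{N} z_n \, \tilde{\eta}^{R}_{n}(\theta_0) \to_d N(0, \tilde{\Omega}_R), \quad \text{where} \quad \tilde{\Omega}_R \equiv E\left( z_n \, \frac{\tilde{V}(x_n, \theta_0)}{R} \, z_n' \right)$$
where remember that $\frac{\tilde{V}(x_n, \theta_0)}{R}$ is the variance matrix of the simulation errors $\tilde{\eta}^{R}_{n}(\theta_0)$ in $\tilde{P}^{R}_{n}(\theta_0)$.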

114 And the terms $\frac{1}{\sqrt{N}} \sum_{n=1}^{N} z_n [y_n - P_n(\theta_0)]$ and $\frac{1}{\sqrt{N}} \sum_{n=1}^{N} z_n \, \tilde{\eta}^{R}_{n}(\theta_0)$ are independent due to the conditional mean independence of the simulation error $\tilde{\eta}^{R}_{n}$, i.e., $E(\tilde{\eta}^{R}_{n}(\theta) | x_n) = 0$. Therefore, applying Slutsky's Theorem, we have that the asymptotic distribution of the SMM estimator as $N \to \infty$ with R fixed is:
$$\sqrt{N} \left( \hat{\theta}_{N,R} - \theta_0 \right) \to_d N\left( 0, \; \tilde{G}_R^{-1} \left[ \Omega_0 + \tilde{\Omega}_R \right] \tilde{G}_R^{-1 \prime} \right)$$
As R goes to infinity, $\tilde{G}_R \to G_0$ and $\tilde{\Omega}_R \to 0$, such that the SMM estimator becomes equivalent to the MM estimator without simulation. But in any empirical application, with a finite number of simulations R, the simulation error introduces additional noise that increases the variance of the estimator. This is fully taken into account by the expression for the asymptotic variance above.

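115 WILLIAMS-DALY-ZACHARY (WDZ) THEOREM Consider the RUM $U(j) = u_j + \varepsilon_j$ where $\varepsilon = \{\varepsilon_j : j = 0, 1, ..., J\}$ has a CDF $F(\varepsilon)$ that is continuously differentiable over the whole Euclidean space $R^{J+1}$. Define the Social Surplus function (McFadden, 1981)
$$S(u) = \int \max_{j \in A} \{u_j + \varepsilon_j\} \, dF(\varepsilon)$$
where $A = \{0, 1, ..., J\}$ is the choice set. Then,
$$\frac{\partial S(u)}{\partial u_j} = P(j)$$
Proof: Exercise. Hint: Note that $\partial \max_{j \in A} \{u_j + \varepsilon_j\} / \partial u_j$ is equal to $1\{u_j + \varepsilon_j \geq u_i + \varepsilon_i \text{ for any } i \neq j\}$.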

116 Note that this result is like the discrete choice version of Roy's Identity in Consumer Demand: the derivative of the indirect utility function with respect to price is, up to sign and a normalization by the marginal utility of income, equal to the demand.
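A numerical check of the WDZ result in the one case where the Social Surplus has a well-known closed form: i.i.d. Type-1 Extreme Value shocks, for which $S(u) = \ln \sum_j \exp(u_j)$ plus Euler's constant and the CCPs are the logit probabilities. The extreme value assumption, the function names and the utility values are choices made for this sketch.

```python
import math

EULER_GAMMA = 0.5772156649015329

def social_surplus(u):
    """S(u) for i.i.d. Type-1 Extreme Value shocks: log-sum-exp plus Euler's constant."""
    m = max(u)
    return m + math.log(sum(math.exp(x - m) for x in u)) + EULER_GAMMA

def logit_ccp(u):
    """Logit choice probabilities implied by the mean utilities u."""
    m = max(u)
    expu = [math.exp(x - m) for x in u]
    denom = sum(expu)
    return [e / denom for e in expu]

def numerical_gradient(f, u, h=1e-6):
    """Central finite-difference gradient of f at u."""
    grad = []
    for j in range(len(u)):
        up, down = list(u), list(u)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

u = [0.0, 1.0, -0.5, 2.0]  # hypothetical mean utilities
grad_S = numerical_gradient(social_surplus, u)
P = logit_ccp(u)
max_gap = max(abs(g - p) for g, p in zip(grad_S, P))
print(max_gap)
```

The finite-difference gradient of the surplus matches the choice probabilities up to numerical error, and the gradient entries sum to one, as probabilities must.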

117 HOTZ-MILLER PROPOSITION Consider the RUM $U(j) = u_j + \varepsilon_j$ where $\varepsilon = \{\varepsilon_j : j = 0, 1, ..., J\}$ has a CDF $F(\varepsilon)$ that is continuously differentiable over the whole Euclidean space $R^{J+1}$. Define the vector of utility differences $\tilde{u} = \{u_j - u_0 : j > 0\}$ and the vector of CCPs $P = \{P(j) : j > 0\}$. Given the CDF $F(\varepsilon)$, the definition of CCPs provides a mapping from the vector of utility differences $\tilde{u}$ into the vector of CCPs, P:
$$P = G(\tilde{u})$$
Hotz and Miller show that this mapping is invertible:
$$\tilde{u} = G^{-1}(P)$$
There is a unique vector of utility differences $\tilde{u}$ that can rationalize (generate as optimal choices) a vector of CCPs P. Revealed Preference.
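For the logit case the inverse mapping has a closed form, $u_j - u_0 = \ln P(j) - \ln P(0)$, which gives a quick way to see the invertibility at work. The sketch below assumes i.i.d. Type-1 Extreme Value shocks. The function names and the numbers are hypothetical.

```python
import math

def ccp_from_differences(u_tilde):
    """G: utility differences relative to alternative 0 -> CCPs for j > 0 (logit)."""
    denom = 1.0 + sum(math.exp(x) for x in u_tilde)
    return [math.exp(x) / denom for x in u_tilde]

def differences_from_ccp(P):
    """G^{-1}: CCPs for j > 0 -> utility differences, using P(0) = 1 - sum_j P(j)."""
    p0 = 1.0 - sum(P)
    return [math.log(p) - math.log(p0) for p in P]

u_tilde = [0.7, -1.2, 2.0]              # hypothetical utility differences
P = ccp_from_differences(u_tilde)       # CCPs these differences generate
recovered = differences_from_ccp(P)     # inversion recovers the differences
print(P, recovered)
```

For other distributions of the shocks there is generally no closed form, and the inverse has to be computed numerically, but it exists and is unique under the conditions stated above.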
