Estimating Semi-parametric Panel Multinomial Choice Models

Xiaoxia Shi, Matthew Shum, Wei Song
UW-Madison, Caltech, UW-Madison

September 15, 2016
Introduction

We consider the panel multinomial choice model:

  Y^k_it = 1{β′X^k_it + A^k_i + ε^k_it ≥ β′X^k′_it + A^k′_i + ε^k′_it for all k′ = 0, ..., K}.

- A^k_i: individual/choice-specific fixed effect with arbitrary dependence on {X^k_it}_{t=1}^T.
- ε^k_it: idiosyncratic error with arbitrary dependence and heterogeneity across k.
- Short panel: T = 2.
- Normalization: X^0_it = 0, A^0_i = 0; ||β|| = 1; no constant in X^k_it.
- Examples: transportation mode choices; demand for differentiated products.
Contributions

- Derive identifying moment inequalities where the fixed effects are eliminated.
- Give easy-to-verify point identification conditions for β.
- Propose a consistent estimator.
- Not yet done: the asymptotic distribution of the estimator.
Literature

- Panel data multinomial choice: Chamberlain (1989, logit), Arellano and Bonhomme (2009, long panel), Pakes and Porter (2015), Khan, Tamer, and Ouyang (2016).
- Cross-sectional data multinomial choice: McFadden (1973, 1978, 1989, logit), Manski (1975, iid error), Matzkin (1993, 2012), Lewbel (2000), Fox (2007, exchangeable error), Yan (2012), Powell and Ruud (2008), Ichimura, Powell, and Ruud (2015, rank correlation).
- Panel data binary choice: Chamberlain (1984, 2001, logit), Manski (1987, maximum score), Honoré and Kyriazidou (2000), Honoré and Lewbel (2002, special regressor), Abrevaya (2000), Hoderlein and White (2012).
- Cross-sectional data binary choice: logit, probit, maximum score, rank correlation (Han, 1987), maximum likelihood, least squares, etc.
- Cyclic monotonicity: Rochet (1987, implementability), Browning (1989, SREH), Ashlagi, Braverman, Hassidim, Monderer (2010).
Roadmap

- Cyclic monotonicity and our model
- Identification
- Estimation and Monte Carlo
- Conclusion
Cyclic Monotonicity

Cyclic monotonicity is a generalization of monotonicity from R → R to R^K → R^K.

Define a cycle of length M in B ⊂ R^K as a sequence x_1, x_2, ..., x_M, x_{M+1} = x_1.

f: B → R^K is cyclic monotone with respect to the above cycle iff

  Σ_{m=1}^M (x_m − x_{m+1})′ f(x_m) ≥ 0.

Special case M = 2:

  (x_2 − x_1)′ (f(x_2) − f(x_1)) ≥ 0.

Let F: B → R be a continuously differentiable function. Then F is convex if and only if its gradient ∇F: B → R^K is cyclic monotone; see Rockafellar (1970, Ch. 24).
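The definition can be checked numerically. A minimal sketch (our own illustration, not from the paper): the softmax map is the gradient of the convex log-sum-exp function, so by the Rockafellar result it must be cyclic monotone with respect to any cycle.

```python
import numpy as np

def cm_sum(f, cycle):
    """Return sum_{m=1}^M (x_m - x_{m+1})' f(x_m) over a closed cycle
    x_1, ..., x_M, x_{M+1} = x_1. Cyclic monotonicity of f requires this
    sum to be nonnegative for every such cycle."""
    M = len(cycle)
    return sum((cycle[m] - cycle[(m + 1) % M]) @ f(cycle[m]) for m in range(M))

def softmax(x):
    """Gradient of the convex function F(x) = log(sum_k exp(x_k))."""
    e = np.exp(x - x.max())
    return e / e.sum()

# The gradient of a convex function is cyclic monotone, so the sum is
# nonnegative for arbitrary random cycles (tolerance for float error):
rng = np.random.default_rng(0)
cycles = [rng.normal(size=(5, 3)) for _ in range(200)]
assert all(cm_sum(softmax, c) >= -1e-12 for c in cycles)
```

Replacing `softmax` with a non-gradient map (e.g. a rotation) would produce negative sums for some cycles.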
Cyclic Monotonicity in a Multinomial Choice Model

Consider the model

  Y^k = 1{β′X^k + u^k ≥ β′X^k′ + u^k′ for all k′ = 0, ..., K}.

Normalization: X^0 = 0.

Let u = (u^0, ..., u^K), X = (X^1, ..., X^K), and Y = (Y^1, ..., Y^K).

Define the social surplus function:

  G(β′X) = E[ max_{k=0,...,K} (β′X^k + u^k) | β′X ].
Cyclic Monotonicity in a Multinomial Choice Model

  G(β′X) = E[ max_{k=0,...,K} (β′X^k + u^k) | β′X ].

Proposition. Suppose that u is independent of X, and the cdf of u is continuous. Then
(a) G(·) is convex and differentiable,
(b) ∇G(·) = p(·), where p(·) ≡ E(Y | β′X = ·), and
(c) p(·) is cyclic monotone.

The proof is simple and parallels McFadden (1981).
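In the logit special case (u^k i.i.d. type-I extreme value), part (b) can be verified directly: there G(v) = log Σ_k exp(v_k) plus Euler's constant, so ∇G is the vector of logit choice probabilities. A short simulation check of this special case (our own sketch; the index values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
v = np.array([0.0, 0.5, -0.3])          # arbitrary utility indices beta'X^k

# Gradient of G(v) = log(sum_k exp(v_k)) + Euler's constant is the softmax,
# which the proposition says equals p(v) = E[Y | beta'X = v]:
grad_G = np.exp(v) / np.exp(v).sum()

# Simulate choices with iid Gumbel utility shocks and compare:
u = rng.gumbel(size=(200_000, 3))
choice = (v + u).argmax(axis=1)
p_hat = np.bincount(choice, minlength=3) / len(choice)

assert np.allclose(p_hat, grad_G, atol=0.01)
```

With 200,000 draws the simulated choice frequencies match the analytic gradient to well within 0.01.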
Panel Multinomial Choice Model

Consider the panel model

  Y^k_it = 1{β′X^k_it + A^k_i + ε^k_it ≥ β′X^k′_it + A^k′_i + ε^k′_it for all k′ = 0, ..., K}.

Define the conditional choice probability p(v, a) = E[Y_it | X′_it β = v, A_i = a].

Assumption 3.1. (a) ε_it ⊥ (X_i1, X_i2) | A_i, (b) ε_i1 and ε_i2 are identically distributed conditional on A_i, and (c) the conditional cdf of ε_it given A_i is continuous.

Lemma. Under Assumption 3.1, the function p(·, a) is cyclic monotone for every a.
From CM to Identification

p(v, a) = E[Y_it | X′_it β = v, A_i = a] is cyclic monotone in v. Thus, for any x_1, x_2 ∈ supp(X_it) and any a ∈ supp(A_i),

  (p(x′_1 β, a) − p(x′_2 β, a))′ (x′_1 β − x′_2 β) ≥ 0.

With panel data, p(x′β, a) = E[Y_it | X_it = x, A_i = a] for each t. Thus,

  (E[Y_i1 | X_i1 = x_1, A_i = a] − E[Y_i2 | X_i2 = x_2, A_i = a])′ (x′_1 β − x′_2 β) ≥ 0

for any x_1, x_2 ∈ supp(X_it) and any a ∈ supp(A_i).
From CM to Identification

Then,

  (E[Y_i1 | X_i1, X_i2, A_i] − E[Y_i2 | X_i1, X_i2, A_i])′ (X′_i1 β − X′_i2 β) ≥ 0.

Taking the conditional expectation given (X_i1, X_i2), we have

  E[ΔY_i | X_i1, X_i2]′ ΔX_i β ≥ 0,

where ΔZ_i = Z_i2 − Z_i1.

This reduces to the rank-based identification conditions in Manski (1987) and Abrevaya (2000) when K = 1 (binary choice).
Point Identification of β

Our identifying inequality

  E[ΔY_i | X_i1, X_i2]′ ΔX_i β ≥ 0

holds pointwise. The set G ≡ supp(ΔX′_i E[ΔY_i | X_i1, X_i2]) is important for identification. It matters only up to its convex conic hull:

  cc(G) = {λ_1 g_1 + λ_2 g_2 ∈ R^{d_x} : g_1, g_2 ∈ G, λ_1, λ_2 ≥ 0}.

The identified set B_0 of β is

  B_0 = {b ∈ R^{d_x} : ||b|| = 1, b′g ≥ 0 ∀g ∈ G}
      = {b ∈ R^{d_x} : ||b|| = 1, b′g ≥ 0 ∀g ∈ cc(G)}.
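The set B_0 can be traced out on a grid of unit vectors. A sketch for d_x = 2 (ours, with a hypothetical finite stand-in for G): when cc(G) is a halfspace, only its normal direction survives the constraints.

```python
import numpy as np

def identified_set(G, n_grid=720, tol=1e-9):
    """Approximate B_0 = {b : ||b|| = 1, b'g >= 0 for all g in G} by
    checking each inequality on a grid of unit vectors in R^2."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    B = np.column_stack([np.cos(angles), np.sin(angles)])
    keep = (B @ np.asarray(G, dtype=float).T >= -tol).all(axis=1)
    return B[keep]

# With G containing (0, 1), (0, -1), and (1, 0), cc(G) is the right
# half-plane (a halfspace), so B_0 is the singleton {(1, 0)}:
B0 = identified_set([(0.0, 1.0), (0.0, -1.0), (1.0, 0.0)])
assert len(B0) == 1 and np.allclose(B0[0], [1.0, 0.0])

# With a single direction in G, cc(G) is only a ray and B_0 is a half-circle:
assert len(identified_set([(1.0, 0.0)])) > 300
```

This mirrors the geometric intuition of the figure on the next slide: the larger cc(G), the smaller the identified set.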
cc(g) and the Identification of β A A A D E O C O C O C F G B B B Panel (i) Panel (ii) Panel (iii) Figure : Point identification of β: geometric intuition OC, OECF, OACB stand for cc(g) in panels (i), (ii), (iii), respectively. OACB, ODCG, OC stand for ID set of β in panels (i), (ii), (iii), rspctvly. 13 / 31
Point Identification: Goal

Recall that G ≡ supp(ΔX′_i E[ΔY_i | X_i1, X_i2]). We would like to find conditions on the model primitives (X_it, A_i, and ε_it) that guarantee that G is rich enough to ensure point identification, i.e., that guarantee the following high-level condition.

Assumption 3.2. The set cc(G) is a halfspace of R^{d_x}.
Point ID: Conditions on Unobservables

First, we impose a condition on the unobservables.

Assumption 4.1. (a) Pr(supp(ε_it | A_i) = R^K) > 0. (b) The conditional CDF of A_i given (X_i1, X_i2) = (x_1, x_2) is continuous in (x_1, x_2).

Lastly, we need a condition on the observable X_i.
Point ID: Intuition from the Binary Case

To get intuition, consider the binary case (K = 1), where E[ΔY_i | X_i1, X_i2] is a scalar. Then

  cc(G) = cc(supp(ΔX_i E[ΔY_i | X_i1, X_i2]))
        = cc(supp(ΔX_i sign(E[ΔY_i | X_i1, X_i2])))
        = cc(supp(ΔX_i sign(ΔX′_i β))),

since the magnitude of a positive scalar multiplier does not change the conic hull, and the sign of E[ΔY_i | X_i1, X_i2] equals that of ΔX′_i β.

This form makes the task easier.
A Simple Sufficient Condition in the Binary Case

For example, a sufficient condition when d_x = 2 is that

  supp(ΔX_i) ∪ (−supp(ΔX_i)) occupies all polar angles.

For general d_x, a sufficient condition is

  cone(supp(ΔX_i) ∪ (−supp(ΔX_i))) = R^{d_x},

where cone(G) = {λg : λ ≥ 0, g ∈ G}.
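The polar-angle condition for d_x = 2 can be checked on a grid. A sketch (ours): the generated cone is all of R^2 exactly when no unit direction has every element of the set in its closed opposite half-plane.

```python
import numpy as np

def cone_is_full(G, n_grid=720):
    """Grid check of the d_x = 2 sufficient condition: cone(G) = R^2 iff
    every unit direction b has some g in G with b'g > 0, i.e., the polar
    angles of G leave no closed half-plane uncovered."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    B = np.column_stack([np.cos(angles), np.sin(angles)])
    return bool(((B @ np.asarray(G, dtype=float).T) > 1e-12).any(axis=1).all())

# The corners of [-1, 1]^2 occupy all polar angles, so the cone is R^2:
assert cone_is_full([[1, 1], [1, -1], [-1, 1], [-1, -1]])
# Two directions generating only the first quadrant fail the condition:
assert not cone_is_full([[1, 0], [0, 1]])
```

In practice one would apply this to a sample of ΔX_i draws together with their reflections −ΔX_i.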
Restoring the Simple Form when K ≥ 2

  ΔX′_i E[ΔY_i | X_i1, X_i2] = Σ_{k=1}^K ΔX^k_i E[ΔY^k_i | X_i1, X_i2].

Two ways to reduce the summation to the simple form:

- Condition on the event ΔX^k_i = ΔX^1_i for all k. Then (using Σ_{k=0}^K ΔY^k_i = 0)

    ΔX′_i E[ΔY_i | X_i1, X_i2] = −ΔX^1_i E[ΔY^0_i | X_i1, X_i2].

- For a given k, condition on the event ΔX^{−k}_i = 0. Then

    ΔX′_i E[ΔY_i | X_i1, X_i2] = ΔX^k_i E[ΔY^k_i | X_i1, X_i2].
Let G^I be the union over k of

  supp(ΔX^k_i | ΔX^{−k}_i = 0) ∪ (−supp(ΔX^k_i | ΔX^{−k}_i = 0)),

where ΔX^{−k}_i = (ΔX^1_i, ..., ΔX^{k−1}_i, ΔX^{k+1}_i, ..., ΔX^K_i).

Let G^II be

  supp(ΔX^1_i | ΔX^k_i = ΔX^1_i ∀k) ∪ (−supp(ΔX^1_i | ΔX^k_i = ΔX^1_i ∀k)).

Let G = G^I ∪ G^II.
Point ID

Our identification conditions will be imposed on the set G. Two sets of such conditions are given; each is sufficient by itself.

- The first set allows all covariates to be bounded, but essentially requires all covariates to be continuous.
- The second set allows discrete regressors most generally, but requires one regressor with large support.
Point Identification: Bounded Covariates

The first sufficient condition for uniform point identification allows (but does not require) all covariates to be bounded.

Assumption 4.2. The set cone(G) = R^{d_x}.

Theorem 4.1. Under Assumptions 3.1, 4.1, and 4.2, Assumption 3.2 holds, and thus
(a) B_0 = {β};
(b) Q(β) = 0 and Q(b) > 0 for all b ∈ {b ∈ R^{d_x} : ||b|| = 1} with b ≠ β, where

  Q(b) = E|min{0, E[ΔY_i | X_i1, X_i2]′ ΔX_i b}|.

Verifying Assumption 4.2 is fairly easy. Let's see a few examples.
Verifying Assumption 4.2: Examples

For simplicity, let K = 2 (3 choices) and d_x = 2.

Example 1. supp((X^k_it)_{t=1,2; k=1,2}) = [0,1]^8. Then

  supp((ΔX^k_i)_{k=1,2}) = [−1,1]^4.

Then, given ΔX^1_i = 0,

  supp(ΔX^2_i | ΔX^1_i = 0) = [−1,1]^2.

The cone generated by [−1,1]^2 is R^2, and [−1,1]^2 ⊂ G^I ⊂ G. Thus, Assumption 4.2 is satisfied.
Verifying Assumption 4.2: Examples

For simplicity, let K = 2 (3 choices) and d_x = 2.

Example 2. Suppose that the covariates do not vary across k: X^k_it = X_it for k = 1, 2, and supp((X_it)_{t=1,2}) = [0,1]^4. Then,

  G^II = supp(ΔX_i) ∪ (−supp(ΔX_i)) = [−1,1]^2.

Assumption 4.2 is verified by noting that cone([−1,1]^2) = R^2 and that G^II ⊂ G.
Verifying Assumption 4.2: Examples

For simplicity, let K = 2 (3 choices) and d_x = 2.

Example 3. supp((X^1_it)_{t=1,2}) = [0,1]^4, supp((X^2_it)_{t=1,2}) = {0,1}^4, and the joint support is the Cartesian product. Then,

  supp(ΔX^1_i | ΔX^2_i = 0) = [−1,1]^2.

Note that cone([−1,1]^2) = R^2, and [−1,1]^2 ⊂ G^I ⊂ G. Thus, Assumption 4.2 holds.
Point Identification: Discrete Covariates

Assumption 4.2 allows all covariates to be bounded, but accommodates discrete covariates only to a limited extent. Allowing for discrete covariates more generally requires the presence of an unbounded covariate (Chamberlain (2001); our Theorem B.1).

Those are impossibility results for uniform identification. The CM inequalities may still point identify β, depending on its true value.

We next present an alternative to Assumption 4.2 that allows discrete covariates generally.
Point Identification: Discrete Covariates

Let g_{−j} = (g_1, ..., g_{j−1}, g_{j+1}, ..., g_{d_x}). Let G_{−j} be the projection of G onto the space of g_{−j}. Let G_j(g_{−j}) = {g_j ∈ R : (g_1, ..., g_{d_x}) ∈ G}.

Assumption 4.3. There exists a j ∈ {1, ..., d_x} such that (a) G_j(g_{−j}) = R for all g_{−j} in a subset G^0_{−j} of G_{−j}; (b) G^0_{−j} is symmetric about the origin, and is not contained in a proper linear subspace of R^{d_x−1}; and (c) the jth element of β, denoted β_j, is nonzero.

Theorem 4.2. Under Assumptions 3.1, 4.1, and 4.3, Assumption 3.2 holds, and thus (a) B_0 = {β}; (b) Q(β) = 0 and Q(b) > 0 for all b ∈ {b ∈ R^{d_x} : ||b|| = 1} with b ≠ β.
Verifying Assumption 4.3: An Example

Let K = 2 (3 choices), d_x = 2, and suppose that β_2 ≠ 0.

Example 1. Suppose that the first covariate is a time dummy, X^k_{1,it} = t for all k, t, and the second covariate has unbounded support: supp((X^k_{2,it})_{t=1,2; k=1,2}) = (c, ∞)^4 for some c ∈ R.

Then supp((ΔX^1_{1,i}, ΔX^2_{1,i}, ΔX^1_{2,i}, ΔX^2_{2,i})) = {1}^2 × R^2. Then, supp(ΔX^1_i | ΔX^1_i = ΔX^2_i) = {1} × R. Then, G ⊇ G^II = {−1, 1} × R.

Let the special j in Assumption 4.3 be 2, and let G^0_{−2} = {−1, 1}. Assumption 4.3(b) obviously holds. Assumption 4.3(a) also holds because G_2(−1) = G_2(1) = R. Assumption 4.3(c) holds since β_2 ≠ 0.
A Consistent Estimator

Let Δp̂(X_i1, X_i2) be an estimator of Δp(X_i1, X_i2) ≡ E[ΔY_i | X_i1, X_i2]. Let β̂ be the argmin of the following criterion function, subject to the normalization ||b|| = 1:

  Q_n(b) = n^{−1} Σ_{i=1}^n |min{0, Δp̂(X_i1, X_i2)′ ΔX_i b}|.

Assumption 5.1. (a) sup_{x_1, x_2 ∈ X} ||Δp̂(x_1, x_2) − Δp(x_1, x_2)|| →_p 0 as n → ∞; (b) E||X_it|| < ∞.

Theorem 5.1. Under Assumptions 3.1, 3.2, and 5.1, β̂ →_p β as n → ∞.
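A stylized sketch of the estimator (ours, not the paper's implementation): a binary-choice design (K = 1, T = 2) with logistic errors in which the fixed effect is a function of the covariates, so that Δp is available in closed form and can be plugged in, trivially satisfying Assumption 5.1; in practice Δp̂ would be a nonparametric estimator. The criterion is then minimized over a grid of unit vectors in R^2.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
beta = np.array([2.0, -1.0]) / np.sqrt(5.0)   # true index coefficients, ||beta|| = 1

# Covariates for t = 1, 2 and a "fixed effect" that depends on them:
X1 = rng.uniform(size=(n, 2))
X2 = rng.uniform(size=(n, 2))
A = X1[:, 0] + X2[:, 1]

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Closed-form dp(X_i1, X_i2) = E[dY_i | X_i1, X_i2] for this design,
# used in place of a nonparametric estimate to keep the sketch short:
dp = logistic(X2 @ beta + A) - logistic(X1 @ beta + A)
dX = X2 - X1

def Q_n(b):
    """Sample criterion: mean of |min{0, dp_i * (dX_i' b)}|."""
    return np.abs(np.minimum(0.0, dp * (dX @ b))).mean()

# Minimize over unit vectors b(theta) on a fine angular grid:
angles = np.linspace(-np.pi, np.pi, 1440, endpoint=False)
B = np.column_stack([np.cos(angles), np.sin(angles)])
beta_hat = B[np.argmin([Q_n(b) for b in B])]

assert np.arccos(np.clip(beta_hat @ beta, -1, 1)) < 0.05   # close to the truth
```

The criterion is exactly zero at the truth (the sign of dp matches that of ΔX′β observation by observation) and positive at directions away from it, so the grid argmin lands next to β.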
Monte Carlo Exercise

- K = 2 (trinary choice), T = 3.
- X^k_it ∈ R^3, with X^k_{j,it} ~ Uniform[0,1], independent across k, j, and t.
- A^k_i = (ω^k_i + Σ_{j=1}^3 X^k_{j,i1})/4, where ω^k_i is Uniform[0,1], independent across k and of the other model primitives.
- ε^k_it = A^k_i (u^k_it − u^0_it) for k = 1, 2, where

    (u^0_it, u^1_it, u^2_it)′ ~ N(0, Σ),  Σ = [[1, 0, 0], [0, 1, 0.5], [0, 0.5, 1]].

- The u's are independent across t.
- β = (1, 0.5, 0).
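This design can be simulated directly. A sketch of one dataset draw (function and variable names are ours):

```python
import numpy as np

def simulate(n, T=3, K=2, seed=0):
    """One dataset from the Monte Carlo design described above."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 0.5, 0.0])
    X = rng.uniform(size=(n, T, K, 3))             # X^k_{j,it} ~ U[0,1], independent
    omega = rng.uniform(size=(n, K))
    A = (omega + X[:, 0].sum(axis=2)) / 4.0        # A^k_i = (w^k_i + sum_j X^k_{j,i1}) / 4
    cov = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.5],
                    [0.0, 0.5, 1.0]])
    u = rng.multivariate_normal(np.zeros(3), cov, size=(n, T))  # (u^0, u^1, u^2), iid over t
    eps = A[:, None, :] * (u[:, :, 1:] - u[:, :, :1])           # eps^k_it = A^k_i (u^k - u^0)
    utility = np.concatenate([np.zeros((n, T, 1)),              # outside option k = 0
                              X @ beta + A[:, None, :] + eps], axis=2)
    Y = (utility.argmax(axis=2)[..., None] == np.arange(K + 1)).astype(int)
    return X, Y

X, Y = simulate(500)
assert (Y.sum(axis=2) == 1).all()   # exactly one choice per (i, t)
```

Feeding such draws into the estimator of the previous slide and repeating over many seeds would reproduce a table of the form reported next.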
Monte Carlo Results

Table: Monte Carlo Results (5000 repetitions), β_1 normalized to 1

                         β_2                        β_3
  n          BIAS      SD      RMSE      BIAS      SD      RMSE
  Length-2 cycles only
  250      -.1052   .1077    .1506    -.0096   .0996    .1000
  500      -.0800   .0781    .1118    -.0110   .0738    .0746
  1000     -.0624   .0580    .0852    -.0119   .0545    .0558
  2000     -.0509   .0418    .0659    -.0107   .0387    .0402
  All cycles
  250      -.1080   .1058    .1512    -.0095   .0982    .0986
  500      -.0817   .0778    .1128    -.0108   .0735    .0742
  1000     -.0637   .0580    .0861    -.0117   .0543    .0556
  2000     -.0516   .0418    .0664    -.0110   .0390    .0405
Conclusion and Beyond

- We derived conditional moment inequalities for a semi-parametric panel multinomial choice model using the cyclic monotonicity of the choice-probability function.
- We showed point identification of the coefficient parameter in the linear utility index.
- We proposed a consistent estimator based on the conditional moment inequalities.
- We are still developing an asymptotic distribution theory for the estimator (or an alternative consistent estimator).
- Further future work: marginal effects.