BOOTSTRAP INFERENCE IN NORMAL FORM GAMES


MARC HENRY, ROMUALD MEANGO AND MAURICE QUEYRANNE

Abstract. We propose a computationally feasible way of constructing valid asymptotic confidence regions for partially identified parameters of discrete games of complete information. The confidence regions are based on a sharp characterization of the identified set, hence delivering more informative inference than achieved to date in the literature. Validity of the confidence regions rests on the characterization of the identified set through the Max-Flow Min-Cut Theorem, and their efficient construction in empirically relevant models relies on a novel combinatorial bootstrap procedure, the small sample performance of which is assessed in a simulation experiment. The method is applied to estimating the determinants of long term elderly care choices of American families.

Keywords: complete information games, multiple equilibria, partial identification, confidence regions, Max-Flow Min-Cut, bootstrap, elderly care.

JEL subject classification: C13, C72

Date: February 7,

The authors thank Alfred Galichon, Silvia Gonçalves and Hidehiko Ichimura for helpful discussions and David Straley for providing the data. Parts of this paper were written while Henry was visiting the Graduate School of Economics at the University of Tokyo; he gratefully acknowledges the CIRJE for its support. Comments from seminar audiences in Cambridge, the Bank of Japan, Hitotsubashi, UBC Sauder School, UCL, U-Kyoto, U-Tokyo, Rochester, Vanderbilt, and participants at the Inference in Incomplete Models conference in Montreal, the First Workshop on New Challenges in Distributed Systems in Valparaiso and the Second Workshop on Optimal Transportation and Applications to Economics in Vancouver are also gratefully acknowledged.

Correspondence address: Département de sciences économiques, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal QC H3C 3J7, Canada, marc.henry@umontreal.ca and romuald.meango@umontreal.ca. Operations and Logistics Division, Sauder School of Business, University of British Columbia, 2053 Main Mall, Vancouver, BC V6T 1Z2, Canada, maurice.queyranne@sauder.ubc.ca.

Introduction

With the conjoined advent of powerful computing capabilities and rich data sets, the empirical evaluation of non-cooperative game theoretic models with equilibrium data is becoming prevalent, particularly in the analysis of social networks and industrial organization. However, in such models, multiple equilibria are the norm rather than the exception. Though multiplicity of equilibria and identifiability of the model's structural parameters are conceptually distinct, the former often leads to a failure of the latter, thereby invalidating traditional inference methods. This is generally remedied by imposing additional assumptions to achieve identification, such as imposing an equilibrium selection mechanism or a refinement of the equilibrium concept. Manski (1993) and Jovanovic (1989) were among the first to advocate a new inference approach that dispenses with identification assumptions and delivers confidence regions for partially identified structural parameters of interaction models.

A large literature has developed on the general problem of inference on partially identified parameters defined as minimizers of objective functions or more specifically as solutions to moment inequality restrictions, following the seminal work of Chernozhukov, Hong, and Tamer (2007). The partial identification approach was applied to inference in games in Tamer (2003), Andrews, Berry, and Jia (2003), Pakes, Porter, Ho, and Ishii (2004), Ciliberto and Tamer (2009), Bajari, Fox, and Ryan (2008) among others. In these works, however, the inference is based on sets of conditions that are necessary, but generally not sufficient for equilibrium. This yields inference in the form of admissible regions of the parameter space that are larger, hence less informative, than could be achieved with the model and the data at hand.
A characterization of the identified set based on necessary and sufficient conditions for equilibrium is proposed in Galichon and Henry (2009a) and Beresteanu, Molchanov, and Molinari (2009), and a method for constructing asymptotic confidence regions for each element of the identified set based on that characterization is proposed in Galichon and Henry (2009b). That approach, however, is computationally too demanding for satisfactory applicability to empirical data sets, as are other existing methods.

In the present work, we present a combinatorial formulation of the problem of inference in models of interactions, and we use it to derive a new proof of the characterization of the identified set in Galichon and Henry (2009a) and to derive an efficient inference method in models of discrete choice with interactions. These include models of social interaction, as in Brock and Durlauf (2001), Bramoullé, Djebbari, and Fortin (2009), as well as models of economic interactions, such as innovation games in Lee and Wilde (1980); information games in Hendricks and Kovenock (1989); entry and location games as in Bresnahan and Reiss (1990) and Berry (1992); matching games, as in Coate and Loury (1993); and others.

Ekeland, Galichon, and Henry (2010) have shown that generic partial identification problems can be formulated as optimal transportation problems. Developing ideas in Galichon and Henry (2009a), we exploit the special structure of discrete choice problems and show that the correct specification of a partially identified model can be formulated as a problem of maximizing flow through a network, and that the identified set in discrete choice models with interactions can be obtained from the Max-Flow Min-Cut Theorem. This approach has interesting structural and computational consequences. On the structural side we obtain a characterization of the identified region using an explicit set of inequalities with a very simple structure in terms of probabilities derived from the equilibrium model. On the computational side, the simplicity of this characterization leads to an efficient inference method for realistic problems of discrete choice with interactions.

The dual problems of maximizing flow through a network and finding a minimum capacity cut are classics in combinatorial optimization and operations research, with applications in many areas such as traffic, communications, routing and scheduling; see, for example, Schrijver (2004) for the theory and history, and Ahuja, Magnanti, and Orlin (1993) for numerous applications. To our knowledge this is the first application of the Max-Flow Min-Cut Theorem to statistical inference for equilibrium models.

We propose a method for constructing a region covering the identified set with prescribed probability based on this combinatorial characterization. This requires the computation of upper probabilities, which we derive with a novel but simple bootstrap procedure that yields an asymptotically valid confidence region. Since the procedure involves bootstrapping the empirical process only, it does not suffer from the problems of bootstrap validity in partially identified models described in Chernozhukov, Hong, and Tamer (2007) and Bugni (2010). We illustrate the procedure on a simple social interactions model and conduct a simulation experiment to assess the performance of the bootstrap procedure in small and medium-sized samples.

The paper is organized as follows. The next section introduces the general framework and illustrates it with some empirically relevant examples. Section 2 derives the identified set and section 3 the confidence region. Section 3.3 presents simulation evidence.
Section 4 presents the empirical results of an application to the determinants of long term elderly care choices by American families and the last section concludes.

1. Analytical framework

We consider a generalized model with discrete dependent variables. $Y$ is an outcome variable that takes values in a finite set $\mathcal{Y}$ with $K$ elements $\{y_1, \dots, y_K\}$; $X$ is a vector of observed explanatory variables with domain $\mathcal{X}$; $\varepsilon$ is a vector of unobserved variables with domain $\Xi \subseteq \mathbb{R}^l$; and $\theta \in \Theta \subseteq \mathbb{R}^d$ is a vector of unknown parameters. The random vectors $Y$, $X$ and $\varepsilon$ are defined on a common probability space $(\Omega, \mathcal{F}, P)$; $G : \varepsilon \mapsto G(\varepsilon \mid x; \theta) \subseteq \mathcal{Y}$ is a many-to-many mapping parameterized by $x$ and $\theta$; and the model relating outcome variables $Y$ and shocks $\varepsilon$ is given by

$$P(Y \in G(\varepsilon \mid X; \theta_0) \mid X) = 1, \quad X\text{-a.s., for some } \theta_0 \in \Theta. \tag{1.1}$$

Such a $\theta_0$ is generally non-unique, which prompts the following definition.

Definition 1 (Identified Set). The identified set is the set $\Theta_I$ of values $\theta_0$ of the parameter satisfying (1.1).

$\Theta_I$ is the object of interest, and we shall propose a methodology for constructing a confidence region for the identified set, i.e. a region $\Theta_n$ that covers $\Theta_I$ with a prescribed probability.

For each $y \in \mathcal{Y}$, we shall denote by $P(y \mid X)$ the conditional probability $P(Y = y \mid X)$. If $u$ is a subset of $\mathcal{Y}$, $P(u \mid X)$ will denote $\sum_{y \in u} P(y \mid X)$. To emphasize the dependence of $\Theta_I$ on $P$, we shall denote it $\Theta_I(P)$. We make the following assumption on the distribution of unobservable shocks.

Assumption 1 (Parametric). The conditional cumulative distribution function of $\varepsilon$ is $F(\varepsilon \mid X; \theta)$ for some known function $F$ parameterized by $\theta$.

Remark 1. The same notation is used for the parameters of the model correspondence and for the parameters of the error distribution to indicate that they may have common components. At this stage, we are not distinguishing nuisance parameters from parameters of interest. The distinction can eventually be recovered by projecting the confidence region onto the subspace of parameters of interest.

The main motivating example of this framework is the case of inference on parameters of discrete games. Consider a game described by its players' payoff functions and strategy sets. Suppose the game is parameterized by the triplet $(\varepsilon, X, \theta) \in \Xi \times \mathcal{X} \times \Theta$, where $\varepsilon$ is a random unobserved heterogeneity parameter vector which may represent, among others, the player types, random shocks, or measurement errors, and whose distribution is parameterized; $X$ is an observed random heterogeneity parameter (vector of explanatory variables), whose realizations will be observed along with the outcomes of the games in the data; and $\theta$ is an unknown deterministic parameter (structural parameter vector), which is the object of inference. Call $G(\varepsilon \mid X; \theta)$ the equilibrium correspondence, i.e., the set of equilibria of the game, for any given equilibrium concept. For any value of $\varepsilon$, $X$ and $\theta$, we observe a single equilibrium value $Y \in G(\varepsilon \mid X; \theta)$, not the whole set $G(\varepsilon \mid X; \theta)$.
As in Jovanovic (1989), we allow random selection, i.e., we assume that there exist probability distributions $\pi(y \mid \varepsilon, X, \theta) = P(Y = y \mid \varepsilon, X; \theta)$ with support contained in $G(\varepsilon \mid X; \theta)$. It could be uniform on $G(\varepsilon \mid X; \theta)$, or uniform over all Pareto-efficient equilibria in $G(\varepsilon \mid X; \theta)$, but whatever it is, the random selection mechanism is not included in the model specification. Following Jovanovic (1989), we say that an equilibrium selection device is compatible with the data if the true probabilities of observed equilibrium outcomes are equal to the probabilities predicted by the model augmented with the equilibrium selection device, i.e., $P(Y = y \mid X; \theta) = \int \pi(y \mid \varepsilon, X; \theta) \, dF(\varepsilon \mid X; \theta)$. Finally, Proposition 1 of Galichon and Henry (2009a) shows that $\Theta_I$ of Definition 1 is exactly the set of values of $\theta$ such that there exists an equilibrium selection device compatible with the data.

Examples of this framework are numerous, and we shall focus here on models of interactions in social and economic networks.

Example 1 (Discrete choice with social interactions). Brock and Durlauf (2001) propose a model to analyze the strength of peer-effects in individual decision making. An individual $i$ in a population of size $N$ is faced with a behavioral choice between action $Y_i = 1$ (which can be called smoking to fix ideas) and $Y_i = -1$ (no smoking). This choice is affected by a vector of exogenous variables $X_i$; an unobservable idiosyncratic preference type $\varepsilon_i$; the components $\beta_i$ and $\gamma_i$ of a vector of unknown parameters $\theta = (\beta', \gamma')'$ (where $x'$ denotes the transpose of vector $x$); and the choice $Y_j$ for each peer $j$ in an exogenously determined set $F_i$ with cardinality $n_i$, in the following way: $Y_i = 1$ if $\beta_i' X_i + \frac{\gamma_i}{n_i} \sum_{j \in F_i} Y_j + \varepsilon_i > 0$, and $Y_i = -1$ otherwise.

In Brock and Durlauf (2001) and Soetevent and Kooreman (2007), the population is partitioned into reference groups, and $j \in F_i$ if and only if $i$ and $j$ belong to the same group. However, $F_i$ can also be determined by a directed influence network, of which the individuals in the population are the nodes, and $j \in F_i$ if and only if the network has an edge directed from $j$ to $i$, as in Bramoullé, Djebbari, and Fortin (2009). The model correspondence $G(\varepsilon \mid X; \theta)$ is then the set of equilibrium outcomes for given values of $X = (X_1, \dots, X_N)$, $\varepsilon = (\varepsilon_1, \dots, \varepsilon_N)$ and $\theta$. Soetevent and Kooreman (2007) assume normally distributed unobservable shocks $\varepsilon$, which is a special case of Assumption 1. Another simple choice, consistent with common assumptions in game theory, is uniformly distributed player types $\varepsilon$.

For illustration purposes, consider two simple instances of the social interactions model. In the first case, two individuals interact by influencing each other. The set of possible outcomes is $\mathcal{Y} = \{(-1, -1), (1, -1), (-1, 1), (1, 1)\}$, where $(i, j)$ denotes the equilibrium where individual 1 chooses $Y_1 = i$ and individual 2 chooses $Y_2 = j$. The equilibrium correspondence is given in Figure 1 for the case where the social influence parameter $\gamma = \gamma_1 = \gamma_2$ is positive.

In the second illustrative case, three individuals $A$, $B$ and $C$ interact, and the influence set $F_i$ for individual $i$ is determined by her neighbors in the undirected network $A - B - C$, where individuals $A$ and $B$ influence each other's choice, individuals $B$ and $C$ influence each other's choice, but individuals $A$ and $C$ do not. Hence, $F(A) = F(C) = \{B\}$ and $F(B) = \{A, C\}$. We also assume $\gamma_i = \gamma$, $i = A, B, C$.
For simplicity we assume no exogenous variables $X_i$; negative idiosyncratic preferences for smoking; and a positive, identical influence factor for all three individuals; i.e., $\varepsilon_i < 0$ and $\gamma_i = \gamma > 0$ for $i = A, B, C$. The equilibrium correspondence is as follows:

If $|\varepsilon_B| > 2\gamma$, or $\{|\varepsilon_B| > \gamma$ and one of the others also $> \gamma\}$, or $\{|\varepsilon_A| > \gamma$ and $|\varepsilon_C| > \gamma\}$, then "nobody smokes" is the unique equilibrium.

If $\{|\varepsilon_A| < \gamma$ and $|\varepsilon_B| < \gamma$ and $|\varepsilon_C| > \gamma\}$, then there are two equilibria, either nobody smokes or $A$ and $B$ smoke (and symmetrically if the roles of $A$ and $C$ are reversed).

If $\{|\varepsilon_A| < \gamma$ and $|\varepsilon_B| < 2\gamma$ and $|\varepsilon_C| < \gamma\}$, then "everybody smokes" and "nobody smokes" are both equilibria.

Hence, the set of observable outcomes is $\mathcal{Y} = \{(1, 1, 1), (1, 1, -1), (-1, 1, 1), (-1, -1, -1)\}$, and the regions described above characterize the equilibrium correspondence $G(\varepsilon \mid X; \theta)$.

[Figure 1: the type space is divided by the thresholds $-\beta_1' X_1 \pm \gamma_1$ on the horizontal axis and $-\beta_2' X_2 \pm \gamma_2$ on the vertical axis into regions labeled with the value of $G(\varepsilon \mid X; \theta)$, namely $\{(-1, 1)\}$, $\{(1, 1)\}$, $\{(1, 1), (-1, -1)\}$, $\{(-1, -1)\}$ and $\{(1, -1)\}$.]

Figure 1. Representation of the equilibrium correspondence $G(\varepsilon \mid X; \theta)$ in the type space for the case of two individuals influencing each other. The social influence parameter $\gamma$ is assumed positive. $(i, j)$ corresponds to individual 1 choosing $Y_1 = i$ and individual 2 choosing $Y_2 = j$. The types $\varepsilon_1$ and $\varepsilon_2$ of the two individuals are plotted along the horizontal and vertical axis, respectively.

We now detail several among the many other examples that fit into this framework.

Example 2 (Coordination and wage discrimination). Coate and Loury (1993) propose a coordination model to explain how ex-ante identical populations can be ex-post different through self-fulfilling discriminating priors. Individual workers are characterized by explanatory variables $X$ and can belong to either one of two observable categories $W$ or $B$. Firms hire workers to perform either an unskilled job producing net return for the firm and wage both normalized to zero, or a skilled job producing net return $r_q > 0$ if the hired worker has previously invested in acquiring the skill, and $r_u < 0$ otherwise. Denote $r = -r_q / r_u$. Wage for the skilled job is normalized to 1 irrespective of the worker's skill level. Workers can acquire qualification at cost $\zeta$ with cumulative distribution function $F(\zeta \mid X)$. Qualification is not observed by the firms or the analyst, but firms observe a signal $\eta$ with conditional distribution function $F_q(\eta \mid X)$ and continuous conditional density function $f_q(\eta \mid X)$, if the worker is qualified; and $F_u(\eta \mid X)$ and $f_u(\eta \mid X)$, respectively, if the worker is not qualified. Let $\vartheta$ be the firm's prior probability that a worker is qualified. That prior may depend on the category of the worker, so that $\vartheta_B$ may be lower than $\vartheta_W$. Under the monotone likelihood ratio property, the firm holding prior $\vartheta$ on a worker's qualification will hire the worker if and only if the realized signal $\eta$ is larger than or equal to a cutoff value $s(\vartheta) = \min\{\eta : r \ge (1 - \vartheta) f_u(\eta) / (\vartheta f_q(\eta))\}$. Facing a standard $s$, a worker will choose to invest in the skill acquisition if and only if its cost is smaller than or equal to $F_u(s \mid X) - F_q(s \mid X)$.

An equilibrium is characterized by a pair $\vartheta = (\vartheta_B, \vartheta_W)$ with $\vartheta_B \le \vartheta_W$ and both $\vartheta_B$ and $\vartheta_W$ belong to the set of fixed points of $\vartheta \mapsto \big(F(F_u(s(\vartheta)) - F_q(s(\vartheta)) \mid B),\; F(F_u(s(\vartheta)) - F_q(s(\vartheta)) \mid W)\big)$. Here, the vector of variables unobserved by the analyst is $\varepsilon = (\zeta, \eta)$. Assumption 1 holds. The parameter vector $\theta$ contains $r$ and the parameters of the distributions $F$, $F_u$ and $F_q$. The analyst observes a cross section of workers' job status (skilled job/high salary or unskilled job/low salary); their category ($B$ or $W$), which only affects the prior in equilibrium; and exogenous characteristics $X$, which affect the distributions $F$, $F_u$ and $F_q$. The model correspondence is obtained in the following way. For a given value of $X$ and $\theta$, the image of $\varepsilon$ is the set of wages/job statuses that are compatible with an equilibrium prior $\vartheta$ given the worker's category. Moro (2003) conducts inference in this model with the assumption that all outcomes emerge from a single equilibrium. The approach we propose can dispense with this restriction.

Example 3 (Information spill-overs). Hendricks and Kovenock (1989) propose a model to evaluate the effect of information spill-overs on investment decisions. Two firms have obtained the lease to exploit neighboring tracts, each containing the same random amount $\zeta$ of oil, with cumulative distribution function $F(\zeta)$, which depends on observable characteristics of the location and on a vector of unknown parameters $\theta$. The lease is for a maximum of two time periods. Firms $a$ and $b$ receive a signal $\eta_a$ and $\eta_b$ respectively, and $\eta = (\eta_a, \eta_b)$ is distributed according to $F(\eta \mid \zeta)$ conditionally on the amount of oil $\zeta$. $F(\eta \mid \zeta)$ depends on the prospection technology, hence on observable characteristics $X$ of the firms and location, and on $\theta$. At the beginning of period 1, each firm individually decides whether to drill or to wait until period two. The outcome of drilling is observed by both firms.
Profits of the firms can be computed in the following way. Let $L(\zeta)$ be the present value of production profits minus drilling cost. $L(\zeta)$ depends on $X$ and $\theta$. If a firm drills in period 1, its profits are $L(\zeta)$. If it waits and the other firm drills in period 1, then it can condition its drilling decision in period 2 on the outcome of drilling, and make a profit of $\max(0, \delta L(\zeta))$, where $\delta$ is the known common discount rate. Hendricks and Kovenock (1989) show that under standard monotone likelihood ratio assumptions, there are three pure strategy Nash equilibria. In one equilibrium, each firm drills in period 1 if and only if its signal is larger than a cutoff value $s$, which may depend on $X$ and $\theta$. In the other equilibria, one of the firms drills in period 1 if and only if its signal is larger than $r < s$ and the other firm never drills in period 1. Here, the vector of variables unobserved by the analyst is $\varepsilon = (\zeta, \eta)$. Assumption 1 holds. The observed outcomes are the strategies of the two firms. Our approach allows inference on the parameters of this model.

Example 4 (Entry games). Another example of the framework above is that of empirical models of oligopoly entry, proposed in Bresnahan and Reiss (1990) and Berry (1992). In this setup, economic agents are firms who decide whether or not to enter a market. Markets are indexed by $m$, $m = 1, \dots, M$, and firms that could potentially enter the market are indexed by $i$, $i = 1, \dots, J$. $Y_{im}$ is firm $i$'s strategy in market $m$, and it is equal to 1 if firm $i$ enters market $m$, and zero otherwise. $Y_m$ denotes the vector $(Y_{1m}, \dots, Y_{Jm})^t$ of strategies of all the firms. In standard notation, $Y_{-im}$ denotes the vector of strategies of all firms except firm $i$. In models of oligopoly entry, the profit $\Pi_{im}$ of firm $i$ in market $m$ is allowed to depend on the strategies $Y_{-im}$ of other firms, as well as on a set of profit shifters $X_{im}$ that are observed by all firms and the econometrician; a profit shifter $\varepsilon_{im}$ that is observed by all the firms but not by the econometrician; and a vector $\theta$ of unknown structural parameters, so that it can be written $\Pi_{im}(Y_m, X_{im}, \varepsilon_{im}; \theta) = \Pi_{im}(Y_{im}, Y_{-im}, X_{im}, \varepsilon_{im}; \theta)$. If, for instance, firms are assumed to play Nash equilibria in pure strategies in each market $m$ (independently of decisions in other markets), their strategies $Y_{im}$ are such that they yield higher profits than $1 - Y_{im}$ given other firms' strategies $Y_{-im}$. So the restrictions induced on the strategies and latent profit shifters are

$$\Pi_{im}(Y_{im}, Y_{-im}, X_{im}, \varepsilon_{im}; \theta) \ge \Pi_{im}(1 - Y_{im}, Y_{-im}, X_{im}, \varepsilon_{im}; \theta) \quad \text{for all } i = 1, \dots, J.$$

Hence the model can be written $Y_m \in G(\varepsilon_m \mid X_m; \theta)$, where $X_m$ denotes the matrix of observed profit shifters for firms $i = 1, \dots, J$; $\varepsilon_m$ denotes the vector of latent profit shifters for firms $i = 1, \dots, J$; and the correspondence $G$ is defined by

$$G(\varepsilon \mid X; \theta) = \{Y : \Pi_i(Y_i, Y_{-i}, X_i, \varepsilon_i; \theta) \ge \Pi_i(1 - Y_i, Y_{-i}, X_i, \varepsilon_i; \theta), \text{ all } i = 1, \dots, J\},$$

where the index $m$ is dropped when considering a generic market.

2. Identified set

Since $\mathcal{Y}$ is finite, the model correspondence $G(\varepsilon \mid X; \theta)$ can only take a finite number of values, which are predicted combinations of equilibria or subsets of $\mathcal{Y}$. Call $\mathcal{U}$ the set of values that $G$ can take, and $u$ a generic element of $\mathcal{U}$ (so that combination $u$ is a subset of $\mathcal{Y}$). With this notation, and the definition below, we can give a very simple reformulation of the model.

Definition 2 (Predicted outcome probabilities). Under Assumption 1, the model delivers the probability of each predicted combination of equilibria $u \in \mathcal{U}$. We denote $Q$ this probability, so that for each $u \in \mathcal{U}$, we define $Q(u \mid X; \theta) := P(G(\varepsilon \mid X; \theta) = u \mid X, F; \theta)$. If $V$ is a subset of $\mathcal{U}$, we write $Q(V \mid X; \theta) = \sum_{u \in V} Q(u \mid X; \theta)$.

Remark 2.
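In most applications, it will be difficult to obtain closed forms for $Q(u \mid X; \theta)$. However, under Assumption 1, $\varepsilon$ can be randomly generated. Given a sample $(\varepsilon^r)_{r = 1, \dots, R}$ of simulated values, $Q(u \mid X; \theta)$ can be approximated by $\sum_{r=1}^{R} 1\{u = G(\varepsilon^r \mid X; \theta)\} / R$.

Example 1 continued. In the case of two individuals influencing each other, the set of observable outcomes is $\mathcal{Y} = \{(-1, -1), (1, -1), (-1, 1), (1, 1)\}$ and the set of predicted equilibrium combinations is $\mathcal{U} = \{u_1, \dots, u_5\}$ with $u_1 = \{(1, -1)\}$, $u_2 = \{(-1, 1)\}$, $u_3 = \{(-1, -1)\}$, $u_4 = \{(1, 1)\}$ and $u_5 = \{(1, 1), (-1, -1)\}$. The probabilities $Q(u \mid X)$ of each combination of equilibria $u$ are determined as follows:

$$\begin{aligned}
Q(\{(1, -1)\} \mid X; \beta, \gamma) &= P(\varepsilon_1 > -\beta_1' X_1 + \gamma_1 \text{ and } \varepsilon_2 \le -\beta_2' X_2 - \gamma_2 \mid X) \\
Q(\{(-1, 1)\} \mid X; \beta, \gamma) &= P(\varepsilon_1 \le -\beta_1' X_1 - \gamma_1 \text{ and } \varepsilon_2 > -\beta_2' X_2 + \gamma_2 \mid X) \\
Q(\{(-1, -1)\} \mid X; \beta, \gamma) &= P(\varepsilon_1 \le -\beta_1' X_1 + \gamma_1 \text{ and } \varepsilon_2 \le -\beta_2' X_2 - \gamma_2 \mid X) \\
&\quad + P(\varepsilon_1 \le -\beta_1' X_1 - \gamma_1 \text{ and } -\beta_2' X_2 - \gamma_2 < \varepsilon_2 \le -\beta_2' X_2 + \gamma_2 \mid X) \\
Q(\{(1, 1)\} \mid X; \beta, \gamma) &= P(\varepsilon_1 > -\beta_1' X_1 - \gamma_1 \text{ and } \varepsilon_2 > -\beta_2' X_2 + \gamma_2 \mid X) \\
&\quad + P(\varepsilon_1 > -\beta_1' X_1 + \gamma_1 \text{ and } -\beta_2' X_2 - \gamma_2 < \varepsilon_2 \le -\beta_2' X_2 + \gamma_2 \mid X) \\
Q(\{(-1, -1), (1, 1)\} \mid X; \beta, \gamma) &= P(-\beta_1' X_1 - \gamma_1 < \varepsilon_1 \le -\beta_1' X_1 + \gamma_1 \text{ and } -\beta_2' X_2 - \gamma_2 < \varepsilon_2 \le -\beta_2' X_2 + \gamma_2 \mid X).
\end{aligned}$$

They can be calculated analytically or by simulation depending on the distribution of idiosyncratic types.

Under Assumption 1, it will become apparent from the characterization of the identified set $\Theta_I$ in Theorem 1 below that $Q$ summarizes all the information contained in the model. To motivate that characterization, consider a combination $u$ of equilibria that may arise for some given values of $X$ and $\theta$. We claim that the model cannot fit the data if the predicted probability of $u$ is larger than the sum of the probabilities of all outcomes $y \in u$. Indeed, $P(y \mid X)$ and $Q(u \mid X; \theta)$ are the marginals of the conditional joint distribution of $(Y, U)$ given $X$ and $\theta$, and thus

$$Q(u \mid X; \theta) = \sum_{y \in u} P(Y = y \text{ and } U = u \mid X; \theta) \le \sum_{y \in u} P(Y = y \mid X; \theta) = \sum_{y \in u} P(y \mid X).$$

Note that $Q(u \mid X; \theta)$ may be strictly smaller than $\sum_{y \in u} P(y \mid X)$ when some outcome $y \in u$ also belongs to other combinations $u'$ that may arise under different values of $\varepsilon$, as its (marginal) probability $P(y \mid X)$ must then be split between $Q(u \mid X; \theta)$ and the probabilities $Q(u' \mid X; \theta)$ of such other combinations $u' \in \mathcal{U}$ containing $y$. Extending this observation, consider a subset $V \subseteq \mathcal{U}$ and define

$$V^{\cup} := \{y \in \mathcal{Y} : y \in u \text{ for some } u \in V\} = \bigcup_{u \in V} u.$$

Then we must have

$$\begin{aligned}
Q(V \mid X; \theta) &= \sum_{u \in V} \sum_{y \in u} P(Y = y \text{ and } U = u \mid X; \theta) \\
&= \sum_{y \in V^{\cup}} \sum_{u \in V : y \in u} P(Y = y \text{ and } U = u \mid X; \theta) \\
&\le \sum_{y \in V^{\cup}} \sum_{u \in \mathcal{U}} P(Y = y \text{ and } U = u \mid X; \theta) \\
&= \sum_{y \in V^{\cup}} P(y \mid X),
\end{aligned}$$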

where the inequality is again due to the fact that some $y \in V^{\cup}$ may also belong to some $u \notin V$. Since this inequality holds for every $V \subseteq \mathcal{U}$, we must have

$$\max_{V \subseteq \mathcal{U}} \left( \sum_{u \in V} Q(u \mid X; \theta) - \sum_{y \in V^{\cup}} P(y \mid X) \right) \le 0.$$

This inequality must also hold for every realization $x$ of $X$ in the domain $\mathcal{X}$ of the explanatory variables, implying that every $\theta$ in the identified set $\Theta_I(P)$ must satisfy

$$\sup_{x \in \mathcal{X}} \max_{V \subseteq \mathcal{U}} \left( \sum_{u \in V} Q(u \mid x; \theta) - \sum_{y \in V^{\cup}} P(y \mid x) \right) \le 0.$$

In Theorem 1 below, we will show, with an appeal to the classical Max-Flow Min-Cut Theorem of combinatorial optimization and the theory of optimal transportation, that this necessary condition for $\theta \in \Theta_I(P)$ is also sufficient, providing our first characterization (2.1) of the identified set. We thereby provide, for the case of a finite set of possible outcomes, a new and simpler proof of the characterization of the identified set in Galichon and Henry (2009a) and Beresteanu, Molchanov, and Molinari (2009). This allows us to emphasize the combinatorial optimization formulation of our inference problem, which is key to its tractable solution in empirically relevant instances.

Theorem 1 below also provides an alternative characterization (2.2) of the identified set from the dual perspective of outcome subsets $Z \subseteq \mathcal{Y}$, in addition to the preceding characterization (2.1) based on combination subsets $V \subseteq \mathcal{U}$. This alternative characterization may be useful in situations where the number of possible outcomes is much smaller than the number of possible combinations. The alternative characterization may also be motivated along similar lines as above. For this, consider a subset $Z \subseteq \mathcal{Y}$ of equilibria and given values of $X$ and $\theta$, and let $Z^{\cap} := \{u \in \mathcal{U} : u \subseteq Z\}$. We also claim that the model cannot fit the data if the probability $P(Z \mid X)$ of $Z$ is smaller than the sum of the predicted probabilities of all combinations $u \in Z^{\cap}$.
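Indeed,

$$\begin{aligned}
P(Z \mid X) &= \sum_{y \in Z} \sum_{u \in \mathcal{U} : y \in u} P(Y = y \text{ and } U = u \mid X; \theta) \\
&\ge \sum_{y \in Z} \sum_{u \in Z^{\cap} : y \in u} P(Y = y \text{ and } U = u \mid X; \theta) \\
&= \sum_{u \in Z^{\cap}} \sum_{y \in u} P(Y = y \text{ and } U = u \mid X; \theta) \\
&= \sum_{u \in Z^{\cap}} Q(u \mid X; \theta).
\end{aligned}$$

Since this inequality holds for every $Z \subseteq \mathcal{Y}$ and every realization $x$ of $X$ in the domain $\mathcal{X}$ of the explanatory variables, every $\theta$ in the identified set $\Theta_I(P)$ must satisfy

$$\sup_{x \in \mathcal{X}} \max_{Z \subseteq \mathcal{Y}} \left( \sum_{u \in Z^{\cap}} Q(u \mid x; \theta) - \sum_{y \in Z} P(y \mid x) \right) \le 0.$$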
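The two families of necessary inequalities derived above can be evaluated by brute force in the two-individual case of Example 1. The following is a minimal sketch under stated assumptions, not the authors' implementation: shocks are taken to be independent standard normal, there are no covariates (the index $\beta_i' X_i$ is set to 0), $\gamma = 0.5$, the predicted probabilities $Q(u \mid x; \theta)$ are approximated by simulation frequencies in the spirit of Remark 2, and the "compatible" distribution `P_ok` is built from an even split over multiple equilibria. All function and variable names are hypothetical.

```python
import itertools
import random
from collections import Counter

OUTCOMES = list(itertools.product((-1, 1), repeat=2))  # outcome set Y


def equilibria(eps, gamma, index=(0.0, 0.0)):
    """Pure-strategy equilibria of the two-individual game, as a frozenset.

    index[i] stands for beta_i' X_i (assumed 0 here: no covariates).
    """
    found = []
    for y in OUTCOMES:
        # y is an equilibrium if each Y_i is a best response to the other's choice
        if all(
            y[i] == (1 if index[i] + gamma * y[1 - i] + eps[i] > 0 else -1)
            for i in (0, 1)
        ):
            found.append(y)
    return frozenset(found)


def simulate_Q(gamma, draws=20000, seed=0):
    """Frequency approximation of Q(u | x; theta) from simulated shocks."""
    rng = random.Random(seed)
    counts = Counter(
        equilibria((rng.gauss(0, 1), rng.gauss(0, 1)), gamma) for _ in range(draws)
    )
    return {u: c / draws for u, c in counts.items()}


def nonempty_subsets(items):
    items = list(items)
    for r in range(1, len(items) + 1):
        yield from itertools.combinations(items, r)


def max_gap_V(P, Q):
    """Max over V of Q(V) - P(V^cup), where V^cup is the union of the u in V."""
    return max(
        sum(Q[u] for u in V) - sum(P[y] for y in frozenset().union(*V))
        for V in nonempty_subsets(Q)
    )


def max_gap_Z(P, Q):
    """Max over Z of Q(Z^cap) - P(Z), where Z^cap = {u : u is a subset of Z}."""
    return max(
        sum(q for u, q in Q.items() if u <= set(Z)) - sum(P[y] for y in Z)
        for Z in nonempty_subsets(OUTCOMES)
    )


Q = simulate_Q(gamma=0.5)
# A compatible P: each combination u spreads its mass evenly over its equilibria.
P_ok = {y: sum(q / len(u) for u, q in Q.items() if y in u) for y in OUTCOMES}
# An incompatible P: all observed mass on the outcome (1, -1).
P_bad = {y: 1.0 if y == (1, -1) else 0.0 for y in OUTCOMES}

TOL = 1e-9  # tolerance for floating-point rounding
for name, P in (("P_ok", P_ok), ("P_bad", P_bad)):
    print(name, max_gap_V(P, Q) <= TOL, max_gap_Z(P, Q) <= TOL)
```

Both maxima are non-negative by construction (taking all combinations, or all outcomes, gives a gap of zero), so each inequality holds exactly when the corresponding maximum is zero up to rounding. Enumerating all subsets is only practical for small outcome sets such as this one.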

In Theorem 1 below, we will see that this alternative necessary condition for $\theta \in \Theta_I(P)$ is also sufficient.

Theorem 1. The identified set is

$$\Theta_I(P) = \left\{ \theta \in \Theta : \sup_{x \in \mathcal{X}} \max_{V \subseteq \mathcal{U}} \left( Q(V \mid x; \theta) - P(V^{\cup} \mid x) \right) \le 0 \right\} \tag{2.1}$$

$$\phantom{\Theta_I(P)} = \left\{ \theta \in \Theta : \sup_{x \in \mathcal{X}} \max_{Z \subseteq \mathcal{Y}} \left( Q(Z^{\cap} \mid x; \theta) - P(Z \mid x) \right) \le 0 \right\}. \tag{2.2}$$

Proof of Theorem 1. A value $\theta$ of the parameter vector belongs to $\Theta_I(P)$ if and only if $P(Y \in G(\varepsilon \mid X; \theta)) = 1$, $X$-a.s. (which we drop from the notation from this point on), hence if there exists a pair $(Y, U)$ of random vectors on $\mathcal{Y} \times \mathcal{U}$ such that $Y$ has probability mass $P(y \mid X)$, $y \in \mathcal{Y}$, $U$ has probability mass $Q(u \mid X; \theta)$, $u \in \mathcal{U}$, and $P(Y \in U \mid X) = 1$. This is equivalent to the existence of non-negative weights $\pi^u_y$, $(y, u) \in \mathcal{Y} \times \mathcal{U}$, such that $\sum_{u \in \mathcal{U}} \pi^u_y = P(y \mid X)$, $\sum_{y \in \mathcal{Y}} \pi^u_y = Q(u \mid X)$, and $\pi^u_y = 0$ when $y \notin u$. The latter is equivalent to the following programming problem, with auxiliary variables $a_y$, $y \in \mathcal{Y}$, and $a_u$, $u \in \mathcal{U}$, having zero as a solution:

$$\min \left( \sum_{y \in \mathcal{Y}} a_y + \sum_{u \in \mathcal{U}} a_u \right)$$

subject to the constraints $\sum_{u \in \mathcal{U}} \pi^u_y + a_y = P(y \mid X)$, $\sum_{y \in \mathcal{Y}} \pi^u_y + a_u = Q(u \mid X; \theta)$, $a_y, a_u, \pi^u_y \ge 0$, and $\pi^u_y = 0$ when $y \notin u$. Since

$$\sum_{y \in \mathcal{Y}} a_y + \sum_{u \in \mathcal{U}} a_u = \sum_{y \in \mathcal{Y}} P(y \mid X) + \sum_{u \in \mathcal{U}} Q(u \mid X; \theta) - 2 \sum_{y \in \mathcal{Y}} \sum_{u \in \mathcal{U}} \pi^u_y = 2 - 2 \sum_{y \in \mathcal{Y}} \sum_{u \in \mathcal{U}} \pi^u_y,$$

the latter is also equivalent to

$$\max \sum_{y \in \mathcal{Y}} \sum_{u \in \mathcal{U}} \pi^u_y = 1$$

subject to the constraints $\sum_{u \in \mathcal{U}} \pi^u_y \le P(y \mid X)$, $\sum_{y \in \mathcal{Y}} \pi^u_y \le Q(u \mid X)$, $\pi^u_y \ge 0$ and $\pi^u_y = 0$ when $y \notin u$. This is called a maximum flow problem, i.e. the problem of maximizing quantity flowing through a network under capacity constraints. A network is a collection of nodes, including a source $S$ and a sink $T$, and directed edges between the nodes. For instance, $(N_1, N_2)$ is an edge leading from node $N_1$ to node $N_2$. Here the network involved in the maximum flow problem is comprised of a source $S$, $K$ nodes corresponding to the $K$ elements of $\mathcal{Y}$, $J$ nodes corresponding to the $J$ elements of $\mathcal{U}$ and a sink $T$.

[Figure 2: the source is linked to the four outcome nodes by edges of capacity $P(y \mid X)$; each outcome node is linked to the combination nodes that contain it; the five combination nodes are linked to the sink by edges of capacity $Q(u \mid X; \theta)$.]

Figure 2. Representation of the network in the case of example 1 with two individuals interacting.

The source $S$ is connected to each of the nodes $y_1, \dots, y_K$ in $\mathcal{Y}$. A node $y \in \mathcal{Y}$ is connected to a node $u \in \mathcal{U}$ if and only if $y \in u$. All nodes $u_1, \dots, u_J$ in $\mathcal{U}$ are connected to the sink $T$. To each edge is attached a capacity, which is the maximum amount that can flow through it. Capacity is constrained to $P(y \mid X)$ between $S$ and node $y$. Capacity is unconstrained (i.e. infinite) between node $y$ and node $u$ such that $y \in u$. The capacity of edges between a node $u$ and the sink $T$ is constrained to $Q(u \mid X; \theta)$. Figure 2 represents the network in a special case of example 1. We have shown that $\theta \in \Theta_I$ if and only if the maximum flow in the network described above is equal to 1.

We now appeal to a classical result in combinatorial optimization called the Max-Flow Min-Cut theorem, see for instance theorem 10.3 page 150 of Schrijver (2004). A cut through a network is a partition of the nodes into two sets separating the source from the sink. The capacity of a cut is defined as the sum of the capacities of edges in the network that cross the cut from the source side to the sink side. Let a cut be defined by the set $V$ of elements of $\mathcal{U}$ and the set $Z$ of elements of $\mathcal{Y}$ on the sink side of the cut, as shown in figure 3.

[Figure 3: source $S$, outcome nodes in $\mathcal{Y}$ with edge capacities $P(y \mid X)$, combination nodes in $\mathcal{U}$ with edge capacities $Q(u \mid X; \theta)$, sink $T$, and a cut placing $Z \subseteq \mathcal{Y}$ and $V \subseteq \mathcal{U}$ on the sink side.]

Figure 3. Representation of a cut in the network.

Since the capacity of an edge from $y$ to $u$ such that $y \in u$ is infinite, the cut defined by $V$ and $Z$ has finite capacity if and only if $y \in u$ and $u \in V$ jointly imply $y \in Z$. Such a cut has capacity

$$C(Z, V) = \sum_{y \in Z} P(y \mid X) + \sum_{u \in \mathcal{U} \setminus V} Q(u \mid X; \theta) = \sum_{y \in Z} P(y \mid X) + 1 - \sum_{u \in V} Q(u \mid X; \theta).$$

A cut has minimum capacity if no node can be moved between the source side of the cut and the sink side of the cut without increasing capacity, hence if $y \notin u$ for all $u \in V$ implies $y \notin Z$, hence if $Z = V^{\cup} = \bigcup \{u : u \in V\}$. Therefore, the capacity of a minimum cut is

$$C(V^{\cup}, V) = \sum_{y \in V^{\cup}} P(y \mid X) + 1 - \sum_{u \in V} Q(u \mid X; \theta) = P(V^{\cup} \mid X) + 1 - Q(V \mid X; \theta).$$
By the Max-Flow Min-Cut theorem, the capacity of any minimum cut is equal to the maximum flow through the network, hence $\theta \in \Theta_I(P)$ if and only if for every subset $V$ of $\mathcal{U}$, $P(V|X) + 1 - Q(V|X;\theta) \geq 1$, i.e. $Q(V|X;\theta) \leq P(V|X)$, and the result follows.
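As an illustration of the inequality characterization just derived, the following minimal sketch checks $Q(V|X;\theta) \leq P(V|X)$ by brute force over all non-empty subsets $V$ of $\mathcal{U}$, for one fixed value of $X$. The function name `in_identified_set`, the dictionary encoding of $P$ and $Q$, and the toy numbers are assumptions made for this example; they are not taken from the paper, and plain enumeration is exponential in the cardinality of $\mathcal{U}$, so it is only meant for very small cases.

```python
from itertools import combinations

def in_identified_set(P, Q, tol=1e-12):
    """Brute-force check of Q(V) <= P(V) for every non-empty V subset of U.

    P: dict mapping each outcome y to P(y | x)            (hypothetical encoding)
    Q: dict mapping each predicted set u (a frozenset of
       outcomes) to Q(u | x; theta)                        (hypothetical encoding)
    """
    sets = list(Q)
    for r in range(1, len(sets) + 1):
        for V in combinations(sets, r):
            union = frozenset().union(*V)        # outcomes covered by V
            q_V = sum(Q[u] for u in V)           # Q(V | x; theta)
            p_V = sum(P[y] for y in union)       # P(V | x)
            if q_V > p_V + tol:
                return False
    return True

# Toy data (illustrative only): two outcomes, uniform P.
P = {"a": 0.5, "b": 0.5}
Q_ok = {frozenset({"a"}): 0.3, frozenset({"b"}): 0.3, frozenset({"a", "b"}): 0.4}
Q_bad = {frozenset({"a"}): 0.7, frozenset({"a", "b"}): 0.3}

print(in_identified_set(P, Q_ok))
print(in_identified_set(P, Q_bad))
```

In the second toy structure the singleton prediction $\{a\}$ receives more mass under $Q$ than the outcome $a$ has under $P$, so the inequality fails for $V = \{\{a\}\}$.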

Remark 3. Theorem 1 gives two characterizations of the identified set $\Theta_I(P)$, sometimes called the sharp identified region in the literature. $\Theta_I(P)$ contains all the values of the parameter such that (1.1) holds, and only such values. Moreover, all elements of $\Theta_I(P)$ are observationally equivalent. Hence no value of the parameter vector $\theta$ contained in $\Theta_I(P)$ can be rejected on the basis of the information available to the analyst. Thus, $\Theta_I(P)$ completely characterizes the empirical content of the model.

Remark 4. As shown in the following lemma, the function to be optimized in characterization (2.1) of the identified set is supermodular (see Schrijver (2004) for a definition), hence the inner optimization can be carried out in polynomial time.

Lemma 1. The function $V \mapsto P(V|x)$ is submodular for all $x \in \mathcal{X}$.

Proof of lemma 1. Take an $x \in \mathcal{X}$. Take any $u \in \mathcal{U}$ and $V \subseteq \mathcal{U} \setminus \{u\}$. We have

$$P([V \cup \{u\}]\,|\,x) - P(V|x) = \sum_{y \in \bigcup_{v \in V \cup \{u\}} v} P(y|x) - \sum_{y \in \bigcup_{v \in V} v} P(y|x) = \sum_{y \in u \setminus V} P(y|x) = P(u \setminus V\,|\,x),$$

where $u \setminus V$ denotes the elements of $u$ that belong to no element of $V$. This quantity is non-increasing in $V$, hence the result.

Remark 5. Note that as a result of Theorem 1, the identified set can be computed as the solution to a standard linear programming problem with the following additional features: since $P(y|x)$ is independent of $\theta$, $\Theta_I$ is quasi-convex in $\theta$ for all $V \subseteq \mathcal{U}$ when the function $\theta \mapsto Q(V|x;\theta)$ is convex, and is a polyhedron when the function $\theta \mapsto Q(V|x;\theta)$ is linear.

Remark 6. Theorem 1 makes it apparent that the identified set shrinks as the domain of the explanatory variables expands, thereby operationalizing one of the strengths of the partial identification approach, which is to show the effect of additional information on the object of inference.

3. Confidence region

In Theorem 1, we proposed a method to compute the object of interest $\Theta_I(P)$ in the limit case, where the true distribution of the dependent variables $P(y|x)$, $y \in \mathcal{Y}$, is known.
We now turn to the problem of inference on $\Theta_I(P)$ based on a sample of observations $((Y_1, X_1), \ldots, (Y_n, X_n))$. We seek coverage of the identified set with prescribed probability $1 - \alpha$, for some $\alpha \in [0, 1]$.

Definition 3 (Confidence region). A confidence region of asymptotic level $1 - \alpha$ for the identified set $\Theta_I$ is defined as a sequence of regions $\Theta_n$, $n \in \mathbb{N}$, satisfying $\liminf_n \mathbb{P}(\Theta_I \subseteq \Theta_n) \geq 1 - \alpha$.

By Theorem 1, we are interested in coverage of the set of values of the parameter $\theta$ such that $Q(V|x, \theta) \leq P(V|x)$ for all values of $x$ and all subsets $V$ of $\mathcal{U}$. $Q$ is determined from the model, but $P$ is unknown. However, if we can construct random functions $\overline{P}_n(A|x)$ that dominate the probabilities $P(A|x)$ for all values of $x$ and all subsets $A$ of $\mathcal{Y}$ with high probability, then in particular, $\overline{P}_n(V|x) \geq P(V|x)$ for each $x$ and each subset $V$ of $\mathcal{U}$. Hence any $\theta$ satisfying $Q(V|x, \theta) \leq P(V|x)$ for all values of $x$ and all subsets

$V$ of $\mathcal{U}$ also satisfies $Q(V|x, \theta) \leq \overline{P}_n(V|x)$ for all values of $x$ and all subsets $V$ of $\mathcal{U}$. It remains to control the level of confidence of the covering region, which is achieved by requiring that $\overline{P}_n$ dominate $P$ with probability asymptotically no less than the desired confidence level. Hence the following assumption.

Assumption 2. Let the random functions $A \mapsto \overline{P}_n(A|x)$, $A \subseteq \mathcal{Y}$, satisfy

$$\liminf_n \mathbb{P}\left( \sup_{x \in \mathcal{X}} \max_{A \subseteq \mathcal{Y}} \left[ P(A|x) - \overline{P}_n(A|x) \right] \leq 0 \right) \geq 1 - \alpha.$$

Suppose now that a value $\theta_0$ of the parameter vector belongs to the identified set $\Theta_I$. Then, by theorem 1, for all $x$ and $V \subseteq \mathcal{U}$, $Q(V|x;\theta_0) \leq P(V|x)$, so that with probability tending to no less than $1 - \alpha$, $Q(V|x;\theta_0) \leq \overline{P}_n(V|x)$, hence the following theorem.

Theorem 2 (Confidence region). Under Assumption 2, the sets

$$\Theta_I(\overline{P}_n) = \left\{ \theta \in \Theta : \sup_{x \in \mathcal{X}} \max_{V \subseteq \mathcal{U}} \left( Q(V|x;\theta) - \overline{P}_n(V|x) \right) \leq 0 \right\} \qquad (3.1)$$

define a confidence region of asymptotic level $1 - \alpha$ for $\Theta_I$.

Proof of theorem 2. Given a value $\theta \in \Theta_I$, by Theorem 1, we have $\sup_{x \in \mathcal{X}} \max_{V \subseteq \mathcal{U}} (Q(V|x;\theta) - P(V|x)) \leq 0$. Under Assumption 2, with limiting probability no less than $1 - \alpha$, $\sup_{x \in \mathcal{X}} \max_{V \subseteq \mathcal{U}} (P(V|x) - \overline{P}_n(V|x)) \leq 0$. Hence, with limiting probability at least $1 - \alpha$, $\sup_{x \in \mathcal{X}} \max_{V \subseteq \mathcal{U}} (Q(V|x;\theta) - \overline{P}_n(V|x)) \leq 0$, and thus $\theta \in \Theta_I(\overline{P}_n)$.

For feasibility reasons, the domain of the explanatory variables can be replaced by the domain of its realizations in the sample as in Andrews (1997). We define the corresponding confidence region as

$$\Theta_I(\overline{P}_n) = \left\{ \theta \in \Theta : \max_{i=1,\ldots,n} \max_{V \subseteq \mathcal{U}} \left( Q(V|X_i;\theta) - \overline{P}_n(V|X_i) \right) \leq 0 \right\}.$$

Theorem 2 has the fundamental feature that it dissociates search in the parameter space (or even possibly search over a class of models) from the statistical procedure necessary to control the confidence level. The upper probabilities $\overline{P}_n$ can be determined independently of $\theta$ in a procedure that is performed once and for all using only sample information, i.e. fully nonparametrically.
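The sample version of the confidence region can be illustrated with a small membership test. The sketch below assumes that the upper probabilities have already been computed and stored, for each observation, as a set function on subsets of the outcome space; the names `max_violation` and `in_confidence_region`, the dictionary encoding and the toy numbers are hypothetical and serve only to show how the statistical step and the parameter search are separated. The enumeration over subsets is brute force and only suitable for small $\mathcal{U}$.

```python
from itertools import combinations

def max_violation(Q, P_upper):
    """Largest value of Q(V) - P_upper(V) over non-empty subsets V of U.

    Q: dict mapping each predicted set u (frozenset of outcomes) to
       Q(u | X_i; theta)                                  (hypothetical encoding)
    P_upper: dict mapping each frozenset A of outcomes to the upper
       probability of A at X_i                            (hypothetical encoding)
    """
    sets = list(Q)
    worst = float("-inf")
    for r in range(1, len(sets) + 1):
        for V in combinations(sets, r):
            A = frozenset().union(*V)             # union of the sets in V
            worst = max(worst, sum(Q[u] for u in V) - P_upper[A])
    return worst

def in_confidence_region(Q_by_obs, P_upper_by_obs, tol=1e-12):
    """True if the inequality holds at every observed X_i."""
    return all(max_violation(Q, Pu) <= tol
               for Q, Pu in zip(Q_by_obs, P_upper_by_obs))

# Toy upper probabilities for a single observation (illustrative only).
a, b, ab = frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})
P_upper = {a: 0.6, b: 0.6, ab: 1.05}

Q_accept = {a: 0.55, ab: 0.45}
Q_reject = {a: 0.70, ab: 0.30}

print(in_confidence_region([Q_accept], [P_upper]))
print(in_confidence_region([Q_reject], [P_upper]))
```

Only the dictionaries `Q_accept` and `Q_reject` depend on the parameter value; `P_upper` is computed once from the sample and reused for every candidate value.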
Once the upper probabilities are determined, probabilities $Q$ over predicted sets of outcomes are computed for particular chosen specifications of the structure and values of the parameter, and such specifications and values are tested with the inequalities defining $\Theta_n(\overline{P}_n)$. This dissociation of the statistical procedure to control confidence level from the search in the parameter space is crucial to the computational feasibility of the proposed inference procedure in realistic examples (i.e. sample sizes in the thousands, two-digit dimension of the parameter space and two-digit cardinality of the set of observed outcomes, as in the application to teen behavior in Soetevent and Kooreman (2007), or to entry in the airline market in Ciliberto and Tamer (2009)). The latter authors consider only equilibria in pure strategies, as we have until now. If equilibria in mixed strategies are also considered, as in Bajari, Hong, and Ryan (2010) and in the family bargaining application

below, we can appeal to results in Beresteanu, Molchanov, and Molinari (2009) and Galichon and Henry (2009a). In particular, Galichon and Henry (2009a) show that if the game has a Shapley regular core (which is the case in the family bargaining application, by lemma 2 of Galichon and Henry (2009a)), then the identified set is characterized by (2.2) of theorem 1 with the caveat that the set function $Z \mapsto Q(Z|x)$ is replaced by

$$Z \mapsto L(Z|x) = \int \min_{\sigma \in G(\varepsilon|X;\theta)} \sigma(Z)\, d\nu(\varepsilon), \qquad (3.2)$$

where $G(\varepsilon|X;\theta)$ is now a set of mixed strategies, i.e. a set of probabilities on the set of outcomes, as opposed to a subset of the set of outcomes. Hence the methodology can be easily adapted.

Control of confidence level.

We now turn to the determination of random functions satisfying Assumption 2. We are looking for functions of the form $\overline{P}_n(A|X_j) = \hat{P}_n(A|X_j) + \beta_n(A|X_j)$, for all $A \subseteq \mathcal{Y}$, $1 \leq j \leq n$, where for $y \in \mathcal{Y}$, $\hat{P}_n(y|x)$ is a nonparametric estimator of $P(y|x)$ (see for instance Li and Racine (2008) for its computation), $\hat{P}_n(A|x) = \sum_{y \in A} \hat{P}_n(y|x)$ and $\beta_n(\cdot|x)$ are non-negative functions on $2^{\mathcal{Y}}$ such that

$$\liminf_n \mathbb{P}\left( \beta_n(A|X_j) \geq P(A|X_j) - \hat{P}_n(A|X_j), \text{ all } A \subseteq \mathcal{Y},\ 1 \leq j \leq n \right) \geq 1 - \alpha,$$

i.e., functions that dominate the fluctuations of the empirical process:

$$\liminf_n \mathbb{P}\left( \max_{1 \leq j \leq n} \max_{A \subseteq \mathcal{Y}} \left[ P(A|X_j) - \hat{P}_n(A|X_j) - \beta_n(A|X_j) \right] \leq 0 \right) \geq 1 - \alpha.$$

We can approximate this objective with the bootstrap version of the empirical process.

Definition 4 (Bootstrap). Let $\mathbb{P}^*_n$ denote probability statements relative to the bootstrap distribution and conditional on the original sample $((Y_1, X_1), \ldots, (Y_n, X_n))$. Let $((Y^*_1, X^*_1), \ldots, (Y^*_n, X^*_n))$ be a generic sample drawn from the bootstrap distribution, and $((Y^b_1, X^b_1), \ldots, (Y^b_n, X^b_n))$, $b = 1, \ldots, B$, be a sequence of $B$ bootstrapped samples. Denote by $\hat{P}^*_n(\cdot|\cdot)$ the bootstrap version (i.e., constructed identically from a bootstrap sample) of $\hat{P}_n(\cdot|\cdot)$ and by $\hat{P}^b_n$, $b = 1, \ldots, B$, its values taken on the $B$ realized bootstrapped samples.
Finally, for each $A \subseteq \mathcal{Y}$ and $1 \leq j \leq n$, denote $\zeta^*_n(A|X_j) = \sum_{y \in A} [\hat{P}_n(y|X_j) - \hat{P}^*_n(y|X_j)]$ and define $\zeta^b_n(A|X_j)$ analogously. In the bootstrap version of the problem, we are seeking functions $\beta_n$ satisfying

$$\mathbb{P}^*_n\left( \max_{1 \leq j \leq n} \max_{A \subseteq \mathcal{Y}} \left[ \hat{P}_n(A|X_j) - \hat{P}^*_n(A|X_j) - \beta_n(A|X_j) \right] \leq 0 \right) \geq 1 - \alpha \quad \text{a.s.}$$

If there were a total order on the space of realizations of $\zeta^*_n$, we could choose $\beta_n$ as the quantile of level $1 - \alpha$ of the distribution of $\zeta^*_n$. However, the $\zeta^*_n(\cdot|X_j)$'s are random functions defined on $2^{\mathcal{Y}} \times \{X_1, \ldots, X_n\}$, hence there is no such total order. We propose to determine $\beta_n$ from a subset of $B(1 - \alpha)$ bootstrap realizations determined as follows.

Step 1: Draw bootstrap samples $((Y^b_1, X^b_1), \ldots, (Y^b_n, X^b_n))$, for $b = 1, \ldots, B$.
Step 2: For each $b \leq B$, $j \leq n$ and $A \subseteq \mathcal{Y}$, compute $\zeta^b_n(A|X_j) = \hat{P}_n(A|X_j) - \hat{P}^b_n(A|X_j)$.
Step 3: Discard a proportion $\alpha$ of the bootstrap indices, and compute $\beta_n(A|X_j)$ as the maximum over the remaining bootstrap realizations $\zeta^b_n(A|X_j)$.

Discarding $B\alpha$ among the bootstrap realizations guarantees the control of the level of confidence, and we wish to choose the set $D \subseteq \{1, \ldots, B\}$ of discarded indices so as to make $\beta_n$ as small as possible, to maximize informativeness of the resulting confidence region. Again, if there were a total order, we would simply discard the $B\alpha$ largest realizations of $\zeta^b_n$, effectively choosing $\beta_n$ as the quantile of the distribution of $\zeta^b_n$, $b = 1, \ldots, B$. Instead, we choose $D$ so as to minimize the maximum of the quantity $w_b = \max_{1 \leq j \leq n} \max_{A \subseteq \mathcal{Y}} \zeta^b_n(A|X_j)$ over $b \notin D$, solving the optimization problem

$$\min \left\{ \max_{b \notin D} w_b : D \subseteq \{1, \ldots, B\},\ |D| \leq B\alpha \right\}. \qquad (3.3)$$

The procedure is explained graphically in figure 4.

Figure 4. Stylized representation of the determination of the pseudo-quantile $\beta_n$ in a case without explanatory variables. The subsets $A$ of $\mathcal{Y}$ are represented on the horizontal axis, ranging from $\emptyset$ to $\mathcal{Y}$. $\zeta^d_n$ is one of two discarded realizations of the empirical process (dotted lines), whereas $\zeta^k_n$ is one of three realizations that are not discarded (solid lines). $\beta_n$ is the pointwise maximum over the realizations that were not discarded (thick line).

Problem (3.3) can be solved by the following Bootstrap Realization Selection (BRS) algorithm:

BRS Step 1: For each $b \leq B$, compute $w_b = \max_{1 \leq j \leq n} \sum_{y \in \mathcal{Y}} \max\{0, \hat{P}_n(y|X_j) - \hat{P}^b_n(y|X_j)\}$.
BRS Step 2: Let $D$ be the set of indices $b$ of the $B\alpha$ largest $w_b$ values.

Proposition 1. The BRS algorithm determines an optimal solution to problem (3.3) in $O(nB|\mathcal{Y}|)$ time.
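The BRS steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function names `brs_select` and `beta_n`, the array layout (observations by outcomes for the estimate, bootstrap replications by observations by outcomes for the bootstrap estimates), the rounding of $B\alpha$ down to an integer, and the use of a full sort instead of linear-time selection are all choices made for this example.

```python
import numpy as np

def brs_select(p_hat, p_boot, alpha):
    """Select the bootstrap realizations kept by BRS.

    p_hat:  array of shape (n, K), estimate of P(y | X_j)          (assumed layout)
    p_boot: array of shape (B, n, K), bootstrap estimates           (assumed layout)
    Returns (kept differences, w, kept indices).
    """
    B = p_boot.shape[0]
    diff = p_hat[None, :, :] - p_boot                 # zeta^b at singletons
    # BRS Step 1: sum of positive parts over outcomes, then max over j.
    w = np.clip(diff, 0.0, None).sum(axis=2).max(axis=1)
    # BRS Step 2: discard the floor(B * alpha) largest w_b (sorting is
    # O(B log B); a selection algorithm would give O(B)).
    n_discard = int(np.floor(B * alpha))
    keep = np.argsort(w)[: B - n_discard]
    return diff[keep], w, keep

def beta_n(kept_diff, j, A):
    """Pointwise maximum over kept realizations of zeta^b(A | X_j)."""
    return kept_diff[:, j, list(A)].sum(axis=1).max()

# Toy example: one observation, two outcomes, four bootstrap draws.
p_hat = np.array([[0.5, 0.5]])
p_boot = np.array([[[0.40, 0.60]],
                   [[0.50, 0.50]],
                   [[0.70, 0.30]],
                   [[0.45, 0.55]]])
kept, w, keep = brs_select(p_hat, p_boot, alpha=0.25)
print(sorted(keep.tolist()))          # indices of the retained realizations
print(beta_n(kept, 0, [0]), beta_n(kept, 0, [1]))
```

In the toy data the third draw has the largest $w_b$ and is the one discarded; the resulting bound is then the pointwise maximum over the three remaining draws.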

Proof of proposition 1. We first justify BRS Step 1 by showing that the quantity computed there coincides with $w_b$ as defined before (3.3), for all $b$. Indeed observe that for any $j \in \{1, \ldots, n\}$ and $A \subseteq \mathcal{Y}$, we have

$$\zeta^b_n(A|X_j) = \sum_{y \in A} \hat{P}_n(y|X_j) - \sum_{y \in A} \hat{P}^b_n(y|X_j) = \sum_{y \in A} \left[ \hat{P}_n(y|X_j) - \hat{P}^b_n(y|X_j) \right]$$

and thus $\max_{A \subseteq \mathcal{Y}} \zeta^b_n(A|X_j)$ is attained by selecting all the elements $y \in \mathcal{Y}$ with $\hat{P}_n(y|X_j) - \hat{P}^b_n(y|X_j) > 0$. It follows that $\max_{1 \leq j \leq n} \max_{A \subseteq \mathcal{Y}} \left[ \sum_{y \in A} \hat{P}_n(y|X_j) - \sum_{y \in A} \hat{P}^b_n(y|X_j) \right] = \max_{1 \leq j \leq n} \sum_{y \in \mathcal{Y}} \max\{0, \hat{P}_n(y|X_j) - \hat{P}^b_n(y|X_j)\}$, and therefore the two expressions for $w_b$ agree.

To justify BRS Step 2, let $w_{opt}$ denote the optimum objective value of problem (3.3). If $D$ fails to include some $b$ such that $w_b > w_{opt}$ then $\max_{b \notin D} w_b > w_{opt}$, therefore an optimal $D$ must include all $b$ such that $w_b > w_{opt}$. On the other hand, if $D$ is any optimal subset and some $b' \in D$ satisfies $w_{b'} \leq w_{opt}$, then discarding $b'$ from $D$ yields a feasible subset $D \setminus \{b'\}$ (since $|D \setminus \{b'\}| < |D| \leq d$, where $d$ denotes the bound $B\alpha$ on the number of discarded indices) such that $\max_{b \notin D \setminus \{b'\}} w_b \leq \max_{b \notin D} w_b$, hence $D \setminus \{b'\}$ is an alternate optimal solution. Therefore an optimal $D$ consists of all indices $b$ such that $w_b > w_{opt}$.

Concerning the running time, BRS Step 1 requires $O(nB|\mathcal{Y}|)$ time, and BRS Step 2 requires $O(B)$ time using a linear time selection (or median-finding) algorithm (see Blum, Floyd, Pratt, Rivest, and Tarjan (1973)).

Remark 7. Running time linear in the number $B$ of bootstrap samples is achieved in the proof of proposition 1 by invoking a linear-time selection algorithm in BRS Step 2. In practice, this step may effectively be implemented in $O(B \log B)$ time using sorting, balanced binary trees, or randomized methods, see e.g. Knuth (1997) and Cormen, Leiserson, Rivest, and Stein (2001).

Remark 8. In problem (3.3), we chose to minimize the maximum, over all $j \in \{1, \ldots, n\}$ and $A \subseteq \mathcal{Y}$, of the non-discarded bootstrap realizations $\zeta^b_n(A|X_j)$. Other objectives are possible, for example the $L_1$ objective $\sum_{b \notin D} w_b$. The main justification for the $L_\infty$ norm objective $\max_{b \notin D} w_b$ in (3.3) is that it leads to a problem solvable in linear time.
In contrast, the problem with an $L_1$ objective is computationally difficult, namely NP-hard in the strong sense, as shown in the next result.

Proposition 2. Minimization of $\sum_{b \notin D} w_b$ subject to $|D| \leq d$, $D \subseteq \{1, \ldots, B\}$ is NP-hard in the strong sense.

Proof of proposition 2. The problem corresponds to the following decision problem: given an $n \times m$ matrix $H$, an integer $k$ and a target value $t$, can one find a subset $S \subseteq \{1, \ldots, n\}$ such that $|S| \geq k$ and $\sum_{i=1}^{m} \max_{j \in S} H_{ij} \leq t$? Denote by $(H, k, t)$ an instance of the latter problem. Consider the well-known NP-hard decision problem CLIQUE (see for instance section 4.8, page 43 of Schrijver (2004)): given a graph $G = (V, E)$ and an integer $q$ satisfying $2 \leq q \leq |V|$, does there exist a subset $Q \subseteq V$ such that $|Q| = q$ and for all distinct $i, j \in Q$, $ij \in E$ (i.e. $Q$ is a clique)? To any instance $(G, q)$ of the problem CLIQUE, we associate an instance $(H, k, t)$ of our decision problem, where the rows of $H$ correspond to vertices of $G$ (elements of $V$), the columns of $H$ correspond to edges of $G$ (elements of $E$) and $H_{ij} = 1$ if vertex $i$ belongs to edge $j$, and 0 otherwise. For any subset $S \subseteq E$ of edges in

$G$, we have for every vertex $i \in V$, $\max_{j \in S} H_{ij} = 1$ if $i$ belongs to at least one element of $S$, and 0 otherwise. Hence, $\sum_{i \in V} \max_{j \in S} H_{ij}$ is the number of vertices that belong to at least one edge in $S$. Define $k = q(q-1)/2$ and $t = q$. Then, a set $S$ of $k$ edges involves at least (hence exactly) $q$ vertices if and only if $S$ is the set of edges of a clique. Hence the answer to the decision problem $(H, k, t)$ thus defined is YES if and only if $G$ contains a clique with $q$ vertices. Since CLIQUE is NP-complete, it follows that our decision problem is NP-hard. Since $k = O(|V|^2)$ and $t = O(|V|)$, the input size (in unary notation) of such instances of our problem is polynomially bounded by the input size (in unary or binary notation) $\Omega(|V|)$ of the corresponding instance of CLIQUE. Hence our decision problem is NP-hard in the strong sense.

This result implies that unless P=NP, there exists no algorithm for this problem that runs in polynomial time. In the computation of $\Theta_n(\overline{P}_n)$, it may be desirable to require $\overline{P}_n(V|x)$ to be submodular as a function of $V \subseteq \mathcal{U}$, so that the function to be maximized in (3.1) is supermodular, and the optimization problem can be solved in polynomial time. This can be achieved by adding the following linear constraints (see Schrijver (2004)): for all $u \neq v \in \mathcal{U}$, $V \subseteq \mathcal{U} \setminus \{u, v\}$, $j = 1, \ldots, n$,

$$\overline{P}_n([V \cup \{u\} \cup \{v\}]\,|\,X_j) - \overline{P}_n([V \cup \{u\}]\,|\,X_j) - \overline{P}_n([V \cup \{v\}]\,|\,X_j) + \overline{P}_n(V|X_j) \leq 0. \qquad (3.4)$$

This involves a trade-off between tightness of the bounds $\beta_n$, hence informativeness of $\Theta_n(\overline{P}_n)$, and speed of computation of the latter.

Search over the parameter space.

As we noted, the functions $\overline{P}_n$ are determined once and for all from the sample, and can be used as inputs in the computation of the confidence region in (3.1). The search over models and parameters can be conducted with online queries of the form:

Search Query. For a given structure characterized by $Q(\cdot|x;\theta)$, and a given value of the parameter vector $\theta$, does $\theta$ belong to $\Theta_n(\overline{P}_n)$? In other words, is $\max_{1 \leq j \leq n} \max_{V \subseteq \mathcal{U}} [Q(V|X_j;\theta) - \overline{P}_n(V|X_j)] \leq 0$?

If $\overline{P}_n$ is constructed as a submodular function (see (3.4)), then $Q - \overline{P}_n$ is supermodular, and the optimization problem (3.1) can be solved in polynomial time, even if the cardinality of $\mathcal{U}$ is high. Otherwise, it may be useful to use formulation (2.2) instead of (2.1) in case $\mathcal{U}$ has much larger cardinality than $\mathcal{Y}$. The procedure is the same, except that the upper probabilities $\overline{P}_n$ will be replaced by lower probabilities $\underline{P}_n$ obtained as a pointwise minimum of $\zeta^b_n$ over $B(1 - \alpha)$ bootstrap realizations.

Simulation evidence.

We now evaluate the performance of the bootstrap procedure in a simulation experiment. Consider a simplified version of the social interaction model, where two individuals influence each other equally, so that $\gamma_1 = \gamma_2 = \gamma$, and no exogenous variables are allowed to influence the outcome,

so that $\beta_1 = \beta_2 = 0$. Suppose the distribution of types $\varepsilon$ is uniform on $[-1, 1]^2$, so that

$$Q(\{(1,1), (-1,-1)\}\,|\,\gamma) = \gamma_1 \gamma_2 = \gamma^2,$$
$$Q(\{(-1,1)\}\,|\,\gamma) = (1 - \beta_1 X_1 - \gamma_1)(1 + \beta_2 X_2 - \gamma_2)/4 = (1 - \gamma)^2/4,$$
$$Q(\{(1,-1)\}\,|\,\gamma) = (1 + \beta_1 X_1 - \gamma_1)(1 - \beta_2 X_2 - \gamma_2)/4 = (1 - \gamma)^2/4,$$
$$Q(\{(-1,-1)\}\,|\,\gamma) = [(1 - \beta_2 X_2 + \gamma_2)(1 + \beta_1 X_1 - \gamma_1) + 2\gamma_1 (1 - \beta_2 X_2 - \gamma_2)]/4 = [1 + 2\gamma - 3\gamma^2]/4,$$
$$Q(\{(1,1)\}\,|\,\gamma) = [(1 + \beta_2 X_2 - \gamma_2)(1 + \beta_1 X_1 + \gamma_1) + 2\gamma_2 (1 + \beta_1 X_1 - \gamma_1)]/4 = [1 + 2\gamma - 3\gamma^2]/4.$$

These five probabilities sum to one: $\gamma^2 + 2(1-\gamma)^2/4 + 2(1 + 2\gamma - 3\gamma^2)/4 = 1$. Suppose the true distribution of observable equilibria is also uniform, so that $P((1,1)) = P((-1,1)) = P((1,-1)) = P((-1,-1))$. Since $(1 - \gamma)^2/4 = Q(\{(-1,1)\}\,|\,\gamma) = P((-1,1)) = 1/4$, the identified set is reduced to the singleton $\Theta_I(P) = \{0\}$. Hence we can assess the performance of the procedure as a test of the hypothesis $0 \in \Theta_I(P)$.

We set up the simulation experiment in the following way. We simulate a number $R = 10{,}000$ of initial samples $(Y_1, \ldots, Y_n)$ of size $n = 50, 100, 500, 1000$ drawn with a uniform distribution. For each sample $r = 1, \ldots, R$, we compute $\overline{P}_n$, which we index as $\overline{P}^{(r,\alpha)}_n$, on $2^{\mathcal{Y}}$ following Step 1, Step 2, BRS Step 1, BRS Step 2 and Step 3 for levels $\alpha$ equal to 0.01, 0.05, 0.10 and for a number $B = 10{,}000$ of bootstrap replications. The procedure is implemented in Matlab with a fixed random seed value$^1$. The resulting fractions of these $R$ replications for which $0 \notin \Theta_n(\overline{P}^{(r,\alpha)}_n)$ are reported in table 1 (top entry in each cell). Since in this simple exactly identified case, our procedure is equivalent to a specification test, we compare simulated rejection levels to those of the bootstrapped Kolmogorov-Smirnov specification test (bottom entry in each cell). Execution time was similar for the Matlab implementation of the classical Kolmogorov-Smirnov test and of our procedure. The rejection levels of the test corresponding to our procedure are closer to theoretical levels than are the rejection levels of the classical Kolmogorov-Smirnov test in half of the cases.
Moreover, our procedure only over-rejects in four out of 15 cases, whereas the bootstrap Kolmogorov-Smirnov testing procedure over-rejects in all but one case. The results show a very good performance of our procedure for small and moderate sized samples in this simple example and are therefore very encouraging.

4. Application to long term elderly care decisions

We estimate the determinants of long term care option choices for elderly parents in American families. The model we use closely follows the one proposed by Engers and Stern (2002), who present these choices as the result of a family bargaining game. The family members decide simultaneously whether to participate in a family reunion where the care option maximizing the participants' utility is chosen. Profits are then split among these participants according to some benefit-sharing rule.

1 Other random seeds were tried and no significant effect of the choice of seed was detected.

Table 1. Rejection levels from the bootstrap procedure (resp. bootstrapped Kolmogorov-Smirnov test) from the

Sample Size n
α =    (0.0102)  (0.0124)  (0.0105)  (0.0102)  (0.0111)
α =    (0.0615)  (0.0575)  (0.0591)  (0.0538)  (0.0511)
α =    (0.1161)  (0.0943)  (0.1152)  (0.1089)  (0.1014)

The data consist of a sample of 1,541 elderly Americans with two children drawn from the National Long Term Care Survey, sponsored by the National Institute on Aging and conducted by the Duke University Center for Demographic Studies under Grant number U01-AG007198, Duke (1999). Elderly people were interviewed in 1984 about their living and care arrangements. The survey questions include gender and age of the children, the distance between the homes of the elderly parent and each of the children, the disability status of the elderly parent (where disability is referred to as problems with Activities of Daily Living or Instrumental Activities of Daily Living (ADL)) and the number of days per week each of the children devotes to the care of the elderly parent.

The dependent variable is the care provision for the parent. The parent is asked to list children (either at home or away from home) and how much help each provides. If only one child is listed as providing significant help, that child is designated the primary care giver. If more than one child is listed, the one providing the most time is designated the primary care giver. If the elderly parent lives in a nursing home, then the nursing home is the primary care giver. If no child is listed and the parent does not live in a nursing home, then the parent is designated as living alone.

Table 2 presents the list of variables used in the analysis. They include parent characteristics, characteristics of the children and the care option chosen. A more detailed discussion and summary statistics can be found in appendix A.
In the following, we first present the explanatory variables we use and some summary statistics, before giving a detailed presentation of the bargaining game. Then, we explain the application of our estimation methodology and present the results of the estimation.

The game.

The observable choice of care option is modeled as in Engers and Stern (2002) as the outcome of a family bargaining game. Before going further, we introduce some notation. We will index family members as follows. Parent: 0, Firstborn child: 1 and Second born child: 2. The payoff to family member $i$, $i = 0, 1, 2$, is the sum of three terms.

Table 2

Variables               Equal to 1 if:                            Percentage of sample
Care Option
                        Living with child
                        Living with child
                        Living in nursing home
                        Living home alone
Parent Variables
  D ADL                 Highly disabled
  D MARRIED             Living with the spouse
Children Variables
  D DIST 0              Living with parent
  D DIST 1              Distance from parent: 31 min and more
  D SEX                 Female

The first term $V_{ij}$ is the value to parent 0 and to child $i$ of care option $j$, where $j \in \{1, 2\}$ means child $j$ becomes the primary care giver, $j = 0$ means the parent remains alone and $j = 3$ means the parent is moved to a nursing home. The matrix $V = (V_{ij})_{ij}$ is known to both children and the parent. We suppose it takes the form $V_{ij} = \gamma_{ij} + W \beta_{ij} + Z_j \psi_{ij}$, where $W$ indicates the characteristics of the parents (D ADL and D MARRIED), $Z_j$ indicates the characteristics of care option $j$ (D SEX, D DIST1 and D DIST2) and $X = (W, Z)$. $\theta = (\gamma_{ij}, \beta_{ij}, \psi_{ij})$ is unknown to the analyst and is the object of inference.

Example 5. Consider a family for which the given values of $X$ and $\theta$ result in a matrix $V$ that takes the form: V = 

Rows indicate family member $i = 0, 1, 2$ and columns represent care giving options $j = 0, 1, 2, 3$ in that order. In this example, the parent is indifferent between all the care options, except the one where she has to move to the nursing home. Each child prefers to be the primary care giver to any other care option, followed

by the parent living at home, living in a nursing home, and being taken care of by the other child, in that order.

The second term in the payoff results from the family bargaining process as follows. We assume that it is always in the interest of the parent to attend the family reunion. However, child $i$ ($i = 1, 2$) can refrain from participating in the meeting. By choosing not to participate, a member of the family agrees on whatever is decided but can neither take on the role of primary care giver, nor be involved in any side payment. Both children simultaneously decide whether or not to participate in the long term care decision. Suppose $M$ is the set of children who participate. The option chosen is the option $j \in M \cup \{0, 3\}$ which maximizes the participants' total utility $\sum_{i \in M} V_{ij}$. It is assumed that participants abide by the decision and that benefits are then shared equally$^2$ among parent and children participating in the decision through a monetary transfer $s_i$, which is the second term in the children's payoff.

The third term $\varepsilon_i$ in the payoff is a random benefit from participation, which is 0 for children who decide not to participate and is distributed according to an absolutely continuous distribution $\nu(\cdot|\theta)$ for each child who participates. All children observe the realizations of $\varepsilon$, whereas the analyst only knows its distribution. Therefore, the payoff matrix of the participation game can be determined in the following way. If both children decide to participate, denoted $PP$, child $i$'s payoff (for $i = 1, 2$) is

$$\Pi_i = \varepsilon_i + w^{PP} = \varepsilon_i + \frac{1}{3} \left( \max_{j \in \{0,1,2,3\}} \sum_{k \in \{0,1,2\}} V_{kj} \right),$$

where $w^{PP}$ is the share of the overall benefit that each child gets when both children participate. Note that we have 3 participants and an equal sharing rule, which explains the term $\frac{1}{3}$.
2 Other benefit-sharing rules can be explored. Engers and Stern (2002) study different possible rules: two Pareto optimal rules and one based on the Shapley value.

If neither child participates, denoted $NN$, child $i$'s payoff is $\Pi_i = V_{ij}$ with $j$ such that $j = \arg\max \{V_{00}; V_{03}\}$. Indeed, since none of the children participate, the parent (0) picks the best option among the two available (0 and 3). We will denote this quantity $w^{NN}_i$. If only child 1 participates (denoted $PN$), child 1's payoff is

$$\Pi_1 = \varepsilon_1 + w^{PN}_1 = \varepsilon_1 + \frac{1}{2} \left( \max_{j \in \{0,1,3\}} \sum_{i \in \{0,1\}} V_{ij} \right)$$

where $w^{PN}_1$ is the share of the overall benefit that player 1 gets when she is the only one to participate. Child 2's payoff is then

$$\Pi_2 = V_{2j}, \quad j = \arg\max_{j \in \{0,1,3\}} \sum_{i \in \{0,1\}} V_{ij}.$$

Following the previous notation, we will call this quantity $w^{PN}_2$. For option $NP$, the quantities $w^{NP}_i$, $i = 1, 2$, are defined similarly. The payoff matrix can then be written as in 4.1.

                         Child 2: N                                Child 2: P
Child 1: N    $w^{NN}_1,\ w^{NN}_2$                     $w^{NP}_1,\ \varepsilon_2 + w^{NP}_2$
Child 1: P    $\varepsilon_1 + w^{PN}_1,\ w^{PN}_2$     $\varepsilon_1 + w^{PP},\ \varepsilon_2 + w^{PP}$

We derive best responses (in pure strategies) $br_i(s_{3-i})$ for child $i$ to a strategy $s_{3-i}$ played by her sibling. Then we have:

$br_1(P) = P$ if $\varepsilon_1 \geq w^{NP}_1 - w^{PP}$, and $N$ if $\varepsilon_1 \leq w^{NP}_1 - w^{PP}$;
$br_1(N) = P$ if $\varepsilon_1 \geq w^{NN}_1 - w^{PN}_1$, and $N$ if $\varepsilon_1 \leq w^{NN}_1 - w^{PN}_1$;
$br_2(P) = P$ if $\varepsilon_2 \geq w^{PN}_2 - w^{PP}$, and $N$ if $\varepsilon_2 \leq w^{PN}_2 - w^{PP}$;
$br_2(N) = P$ if $\varepsilon_2 \geq w^{NN}_2 - w^{NP}_2$, and $N$ if $\varepsilon_2 \leq w^{NN}_2 - w^{NP}_2$.

These lead to the following Nash equilibria in pure strategies:
- $(N, N) \iff \varepsilon_1 \leq w^{NN}_1 - w^{PN}_1$ and $\varepsilon_2 \leq w^{NN}_2 - w^{NP}_2$
- $(P, P) \iff \varepsilon_1 \geq w^{NP}_1 - w^{PP}$ and $\varepsilon_2 \geq w^{PN}_2 - w^{PP}$
- $(N, P) \iff \varepsilon_1 \leq w^{NP}_1 - w^{PP}$ and $\varepsilon_2 \geq w^{NN}_2 - w^{NP}_2$
- $(P, N) \iff \varepsilon_1 \geq w^{NN}_1 - w^{PN}_1$ and $\varepsilon_2 \leq w^{PN}_2 - w^{PP}$
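The pure-strategy equilibrium conditions listed above can be cross-checked numerically by enumerating the four participation profiles and testing unilateral deviations. The sketch below is only an illustration: the function name `pure_nash`, the dictionary `w` keyed by profile labels (first letter for child 1, second for child 2) and the toy payoff values are assumptions of this example, not quantities from the application.

```python
from itertools import product

def pure_nash(eps, w):
    """Enumerate pure-strategy Nash equilibria of the 2x2 participation game.

    eps: (eps1, eps2), participation shocks of child 1 and child 2
    w:   dict with keys 'NN', 'NP', 'PN', 'PP' mapping to (w1, w2), the
         deterministic payoff components of each child   (hypothetical encoding)
    """
    def payoff(s1, s2):
        w1, w2 = w[s1 + s2]
        # A child receives her shock only if she participates.
        return (w1 + (eps[0] if s1 == "P" else 0.0),
                w2 + (eps[1] if s2 == "P" else 0.0))

    flip = {"N": "P", "P": "N"}
    equilibria = []
    for s1, s2 in product("NP", repeat=2):
        u1, u2 = payoff(s1, s2)
        no_dev_1 = u1 >= payoff(flip[s1], s2)[0]   # child 1 cannot gain
        no_dev_2 = u2 >= payoff(s1, flip[s2])[1]   # child 2 cannot gain
        if no_dev_1 and no_dev_2:
            equilibria.append(s1 + s2)
    return equilibria

# Toy payoffs (illustrative only): joint participation is rewarded.
w = {"NN": (0.0, 0.0), "NP": (0.0, 0.0), "PN": (0.0, 0.0), "PP": (2.0, 2.0)}

print(pure_nash((1.0, 1.0), w))     # large shocks: participation dominates
print(pure_nash((-1.0, -1.0), w))   # moderate negative shocks: two equilibria
print(pure_nash((-3.0, -3.0), w))   # very negative shocks: nobody participates
```

With the toy payoffs, the middle case falls in the region where both $(N, N)$ and $(P, P)$ satisfy the inequalities above, which is the kind of multiplicity that makes the equilibrium correspondence set-valued.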

Therefore, the equilibrium correspondence $\varepsilon \mapsto G(\varepsilon|x;\theta)$ depends on the rankings of the terms $w^r_i - w^s_i$, $i \in \{1, 2\}$ and $r, s \in \{NN, NP, PN, PP\}$. It can also be shown that there exists a unique Nash equilibrium in mixed strategies as follows. A mixed profile $(\alpha P + (1 - \alpha)N;\ \beta P + (1 - \beta)N)$ is a Nash equilibrium if and only if

$$\alpha = \frac{\varepsilon_2 - (w^{NN}_2 - w^{NP}_2)}{(w^{PN}_2 - w^{PP}) - (w^{NN}_2 - w^{NP}_2)} \quad \text{and} \quad \beta = \frac{\varepsilon_1 - (w^{NN}_1 - w^{PN}_1)}{(w^{NP}_1 - w^{PP}) - (w^{NN}_1 - w^{PN}_1)},$$

the denominators are non zero and

$$\min\{w^{NN}_1 - w^{PN}_1;\ w^{NP}_1 - w^{PP}\} < \varepsilon_1 < \max\{w^{NN}_1 - w^{PN}_1;\ w^{NP}_1 - w^{PP}\},$$
$$\min\{w^{NN}_2 - w^{NP}_2;\ w^{PN}_2 - w^{PP}\} < \varepsilon_2 < \max\{w^{NN}_2 - w^{NP}_2;\ w^{PN}_2 - w^{PP}\}.$$

Each action profile results in an (almost surely) unique care option choice, hence for each participation shock $\varepsilon$, we can derive $G(\varepsilon|X;\theta)$ as the set of probability measures on the set of care options $\{0, 1, 2, 3\}$ induced by mixed strategy profiles, which are probabilities on the set of participation profiles $\{NN, NP, PN, PP\}$.

Specification.

We provide estimates for two specifications of the utility matrix presented in this paragraph.

Specification 1.

The first matrix of interest is the following$^3$:

$$V(X;\theta) = \begin{pmatrix} \beta_m DM + \beta_{ah} DA & \alpha + \psi_s DS_1 & \psi_s DS_2 & 0 \\ \beta_m DM + \beta_{ah} DA & \beta_{11} + \psi_d DD_1 + \beta_{ac} DA & 0 & 0 \\ \beta_m DM + \beta_{ah} DA & 0 & \beta_{11} + \psi_d DD_2 + \beta_{ac} DA & 0 \end{pmatrix}$$

Recall that the columns indicate the options, in the order $\{0, 1, 2, 3\}$, and the rows represent each member of the family, in the order Parent, Child 1, Child 2. For example, the value that the first born child (family member 1) living less than 30 minutes away from the parent's home attaches to taking care of a non-disabled, non-married parent is measured by $\beta_{11}$, whereas for a disabled parent, it is $\beta_{11} + \beta_{ac}$. Note some implications of the model:

(1) When the parent is unmarried and has no serious disability (D ADL = 0), the children are indifferent between the options where they are not in charge of the parent.
3 We use abbreviations for the dummies to save space: DM = D MARRIED, DA = D ADL, DS = D SEX, DDi = D DIST i, for i {0, 1}.

(2) The presence of problems with ADL affects the utility of the family in two ways. First, the utility of each member is affected by the term $\beta_{ah}$, if the option "live alone" is preferred. Second, if a child is chosen as primary care giver, she will bear the cost (under the hypothesis that $\beta_{ac} < 0$) of taking care of the parent with a disability. This cost is transferred to the nursing home, if this option is the one preferred by the family.

(3) The parameter $\beta_m$ associated with the variable D MARRIED measures the additional utility for all family members of choosing the option "live alone" when a spouse is present in the same household.

(4) The effect of distance on children's utility is introduced in the valuation of the primary care giver. $\psi_d$ measures a cost for child $i$ to travel (30 minutes or more) to provide care for the parent on a regular basis.

(5) As mentioned earlier, we introduce the parameter $\alpha$ which allows for a preference of the parent for the oldest child. $\alpha$ measures the incremental utility for the parent of being taken care of by the firstborn child, as compared with the second born child. This specification allows for the presence of favoritism, as defined by Li, Rosenzweig, and Zhang (2010). In their words: "favoritism exists if the parent derives more utility by spending the same time with [...] one child versus another. Such favoritism could be based on the child's endowment", the endowment here being simply birth order. The gender effect is introduced in the same manner.

Specification 2.

We summarize the second specification of the utilities that we study in the following matrix:

$$V(X;\theta) = \begin{pmatrix} \beta_m DM + \beta_{ah} DA & \alpha + \psi_s DS_1 & \psi_s DS_2 & 0 \\ \delta V_{00} & \beta_{11} + \psi_d DD_1 + \beta_a DA + \delta V_{01} & \delta V_{02} & \delta V_{03} \\ \delta V_{00} & \delta V_{01} & \beta_{11} + \psi_d DD_2 + \beta_a DA + \delta V_{02} & \delta V_{03} \end{pmatrix}$$

The main difference with the first model is the introduction of the term $\delta V_{0j}$ in the evaluation of option $j$ by child $i$. This term measures the degree of altruism of the children.
An individual shows altruism when their utility depends directly on someone else's. The children are altruistic if $\delta > 0$. We introduce the idiosyncratic part of the utility through two terms. First, random participation benefits for each child $\varepsilon_i$, $i \in \{1, 2\}$. Second, error terms $u_j$, $j \in \{0, 1, 2, 3\}$, which measure unobserved components

of the family evaluation of the alternatives. We assume: $\varepsilon_i \sim \text{iid } N(\mu, \sigma_\varepsilon)$; $u_j \sim \text{iid } N(0, \sigma_u)$.

As mentioned above, we introduce in the specification a random error term to account for misclassification of the gender of children. Recall that we include a parameter $\delta$, measuring the preference of the parent for a female caretaker, so that we have: $V_{0,i} = \overline{V}_{0,i} + \psi_s\, \text{D SEX}_i$ for $i \in \{1, 2\}$, where $\overline{V}_{0,i}$ sums up the remaining terms entering the parent's utility. Suppose that $(\text{D SEX}_i)^*$ is the reported value of child $i$'s gender, when $\text{D SEX}_i$ is the true value of gender. It follows that:

$$V_{0,i} = \overline{V}_{0,i} + \psi_s (\text{D SEX}_i)^* + \xi_i \text{ for } i \in \{1, 2\}, \quad \text{with } \xi_i = \psi_s \left( \text{D SEX}_i - (\text{D SEX}_i)^* \right).$$

Supposing that true gender is misreported with probability $p_\xi$ and that measurement error is independent of true gender, we have $\xi_i = 0$ with probability $1 - p_\xi$ and $\xi_i = \psi_s (1 - 2 (\text{D SEX}_i)^*)$ with probability $p_\xi$. Failing to take into account this measurement error may lead to biased estimates of the parameter $\psi_s$ and may worsen the identification issue. Figure 5 shows the estimation of a confidence region for a simpler model where no random term is introduced to correct for this type of error.

Figure 5. Representation of the confidence region for parameter $\psi_s$ in a model without an error term for correction of the misclassification of gender.

The confidence region presents the shape of a bow tie, symmetric around the node formed at the point $\delta = 0$. As the norm of $\delta$ increases, so does the magnitude of the measurement error, which increases the size of the identification region for parameter $\alpha$.

Estimation methodology.

The methodology proposed in the paper allows the construction of the identified set based on the hypothetical knowledge of the true distribution of the data. As described in section 3, we account for sampling uncertainty and control the level of confidence by constructing set functions

$A \mapsto \bar P(A \mid X)$, which dominate $P(A \mid X)$ (uniformly over $A \subseteq \{0, 1, 2, 3\}$ and $X$) with probability $1 - \alpha$ (the chosen level of confidence, here 0.95). We implement the method detailed in section 3 with $B$ bootstrap replications.

Second, we obtain the model likelihood by simulating the valuation matrix and computing the equilibrium correspondence from the payoff matrix, for given values of $X$ and $\theta$. The procedure is as follows, for a given $X$ and $\theta$:

(1) We generate and store $R$ draws of $\varepsilon$ from the distribution $\nu_\theta$. Here, $R = 5000$ and $\nu_\theta$ is the normal distribution with mean $\mu$ and variance $\sigma^2_\varepsilon$, where $(\mu, \sigma^2_\varepsilon)$ are components of the parameter $\theta$.

(2) For each value $\varepsilon_r$, we compute the valuation matrix $V(X, \varepsilon_r, \theta)$ and the corresponding payoff matrix.

(3) We determine the equilibrium correspondence $G(\varepsilon \mid X, \theta)$ from the analytical results derived in the preceding section. The Gambit software provides an alternative for computing the set of Nash equilibria numerically in more complex games.

(4) We compute an estimator of the model likelihood $L$ defined in RP (3.2) as follows:

$\hat L(A \mid X; \theta) = \frac{1}{R} \sum_{r=1}^{R} \min_{\sigma \in G(\varepsilon_r \mid X; \theta)} \sigma(A)$.

Having constructed those two elements, the identified set comprises all values of $\theta$ such that, for all observed values of the explanatory variables, the minimum over $A \subseteq \{0, 1, 2, 3\}$ of the function $\bar P(A \mid X) - \hat L(A \mid X; \theta)$ is non-negative, as explained in section 3. We construct an $n$-dimensional grid to conduct the search over the parameter space. Each value of the parameter can be tested in a fraction of a second on a standard laptop, and a region of small dimensionality (1 to 4) can be constructed in a few hours, again on a standard laptop without parallel processing. However, estimation time grows exponentially with the number of parameters in the model. In our case, each specification involves a 12-dimensional parameter space. Parallel processing therefore becomes necessary.
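The simulation step and the grid test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_equilibria` is a hypothetical one-parameter equilibrium correspondence standing in for the analytical results of the paper, `p_bar` is a hypothetical dominating set function built from made-up frequencies with no bootstrap correction, and the draws are fixed by hand rather than sampled from $\nu_\theta$.

```python
from itertools import combinations

OUTCOMES = (0, 1, 2, 3)  # outcome labels, as in the text


def toy_equilibria(eps, x, theta):
    """Hypothetical stand-in for G(eps | x, theta).

    Returns a list of equilibrium outcome distributions (dicts mapping an
    outcome to its probability). The middle regime has two equilibria.
    """
    cut = theta + x
    if eps < cut:
        return [{0: 1.0}]
    if eps < cut + 1.0:
        return [{1: 1.0}, {2: 1.0}]
    return [{3: 1.0}]


def l_hat(event, x, theta, draws, equilibria=toy_equilibria):
    """Simulated likelihood: average over draws of min over equilibria of sigma(A)."""
    total = 0.0
    for eps in draws:
        total += min(
            sum(sigma.get(a, 0.0) for a in event)
            for sigma in equilibria(eps, x, theta)
        )
    return total / len(draws)


def all_events():
    """All non-empty subsets A of the outcome set."""
    return [
        frozenset(c)
        for k in range(1, len(OUTCOMES) + 1)
        for c in combinations(OUTCOMES, k)
    ]


def in_region(theta, x_values, p_bar, draws, equilibria=toy_equilibria):
    """True if p_bar(A, x) - l_hat(A, x, theta) >= 0 for every event A and every x."""
    return all(
        p_bar(event, x) - l_hat(event, x, theta, draws, equilibria) >= 0.0
        for x in x_values
        for event in all_events()
    )


# Hand-picked draws and frequencies, for illustration only.
draws = [-1.0, 0.5, 2.0, 0.2]
freq = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}


def p_bar(event, x):
    # Hypothetical dominating set function: plain frequencies, no sampling slack.
    return sum(freq[a] for a in event)


grid = [-1.0, 0.0, 1.0]
region = [theta for theta in grid if in_region(theta, [0.0], p_bar, draws)]
```

In this toy setup the singleton event `{1}` receives simulated likelihood zero whenever two equilibria are possible, while the event `{1, 2}` receives the full mass of that regime; checking all subsets rather than singletons only is what makes the test informative. Since each grid point is tested independently of the others, the loop over `grid` is the natural unit to distribute across processors.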
We use an OpenMP procedure for parallel processing, which is perfectly suited to the method we propose. The computation resources were provided by the Réseau Québécois de Calcul de Haute Performance (RQCHP). All computations were made under the system Cottos, which provides up to 128 computation nodes (1024 CPU cores), each equipped with two Intel Xeon E5462 quad-core processors at 3 GHz. On one node, approximately $10^7$ parameter points can be tested in 24 hours.

Results.

We perform the estimation of the two specifications introduced earlier, under different values of the mean and variance of the error term. To alleviate the computational burden, we first test the significance of some of the individual parameters by checking whether the hyperplanes defined by $\theta_i = 0$, where $\theta_i$ is a component of $\theta$, intersect the 95% confidence region. In practice, this amounts to building a constrained confidence region under the null hypothesis. We fail to reject the null hypothesis if the estimation procedure returns a non-empty set. We then obtain a constrained confidence region for the remaining parameters.
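The hyperplane check just described can be sketched as a constrained grid search. The sketch below is illustrative only: `constrained_region` and `toy_in_region` are hypothetical names, and the toy membership test (a disc in two dimensions) merely stands in for the actual test of whether a parameter value belongs to the confidence region.

```python
from itertools import product


def constrained_region(axes, fixed_index, in_region, fixed_value=0.0):
    """Grid points of the region lying on the hyperplane theta[fixed_index] = fixed_value.

    axes: one list of grid values per parameter component.
    in_region: callable taking a parameter tuple and returning True or False.
    """
    constrained_axes = [
        [fixed_value] if i == fixed_index else axis
        for i, axis in enumerate(axes)
    ]
    return [theta for theta in product(*constrained_axes) if in_region(theta)]


def toy_in_region(theta):
    # Hypothetical membership test: a unit disc centered at (0, 2).
    return theta[0] ** 2 + (theta[1] - 2.0) ** 2 <= 1.0


axes = [[-1.0, 0.0, 1.0], [0.0, 1.0, 2.0, 3.0]]

# H0: theta_0 = 0. A non-empty constrained region means we fail to reject.
region_0 = constrained_region(axes, 0, toy_in_region)

# H0: theta_1 = 0. An empty constrained region means the hypothesis is rejected.
region_1 = constrained_region(axes, 1, toy_in_region)
```

Fixing one component removes one dimension from the grid, which is why testing individual restrictions first reduces the cost of the subsequent search over the remaining parameters.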

[Table 3. Parameter ranges (minimum and maximum of each parameter) for estimation of Specification 1 at $\beta_{11} = 0$, $\beta_{ac} = \beta_m$ and for different values of the error terms and of $\beta_{00}$.]

Specification 1.

For each value of the mean and variance of the error term, we find a non-empty intersection between the confidence region and the hyperplane defined by $\beta_{11} = 0$. This means we fail to reject (at the 5% level) the null hypothesis that there is no additional constant disutility for a child to take care of an elderly parent. Since this hypothesis is not rejected, we obtain a constrained confidence region for the remaining parameters. We then obtain confidence regions for different values of $\beta_{11}$ and discuss the latter's effect on the regions. We note that the null hypothesis $H_0: \beta_{00} = 0$ is always rejected. Hence, when we control for all other effects, parents are not indifferent between the first two options. They show a clear preference in favor of living in their own home (the option called "living alone") instead of living in a nursing home ($\beta_{00}$ is always positive). The results we present are therefore for given values of $\beta_{00}$. We provide some insight into how different values of this parameter change the results. We report the range for each parameter in table 3. Note that the identified set is not a compact set. In particular, $\beta_{ac}$, $\beta_{ah}$, $\beta_m$ and $\psi$ are allowed to diverge to infinity. Results are generally consistent with expectations and previous results on the subject. Namely:

(1) The existence of several problems with the parent's functional ability is a key determinant of the decision to enter a nursing home. $\beta_{ah}$ and $\beta_{ac}$ are both negative and can both be (very) large. The negative sign of $\beta_{ah}$ captures the fact that a parent's disability increases the value of care provided by the family or a specialized institution.
In addition, $\beta_{ac} < 0$ means that the disability entails a utility cost for the child if he is chosen as primary care giver. In other words, holding all else equal,

parental disability decreases the value for family members of being primary care giver by at least 2.86 utils. Placing the elderly parent in a nursing home transfers this cost to the institution. If we think that the expansion of life expectancy observed in past years is correlated with the appearance of more functional ability problems, this may partly explain the trend away from care provision by children toward other alternatives.

(2) Parameter $\beta_m$, associated with the parent living with a spouse, is positive and large. This implies that married parents are more likely to stay at home. In families where the parent is disabled, the effect of living with the spouse compensates the disutility created by the disability factor and, all other variables controlled, preserves the incentive for parents to live at home.

(3) While we cannot rule out parents being indifferent to the gender or birth order of their primary care giver, estimation shows a tilt of the confidence interval toward positive values for both parameters, with a possibly large positive magnitude of the parameter $\alpha$. In the case $\mu = 1$ and $\sigma_u^2 = 0.25$, the data reveal that parents exhibit a preference for an older and for a female care giver.

(4) Children living more than 30 minutes from the parents are less likely to provide care than those living closer to the parents. Distance has a (possibly strong) disutility effect on children's incentives to participate in the care decision. All else being equal, moving 30 minutes away from the parents reduces the utility of a child when providing care by at least 1.43 utils.

(5) Note that $H_0: \sigma_u^2 = 0$ is rejected. The magnitude of the unobserved idiosyncratic term $u$ is linked to the value of the parameter $\beta_{00}$. Values of $\beta_{00}$ higher than 1 are rejected for $\sigma_u^2 = 0.25$, and are the only values admissible if $\sigma_u = 1$. Recall that $\beta_{00}$ measures the preference of the parent for living in their own home.
Higher values of this parameter then induce higher probabilities for the choice of the option "Living alone", i.e. $P(\text{Option "Living alone"}) \to 1$. A higher magnitude of the unobserved heterogeneity is therefore needed to make sense of the difference in choices that we observe in the data. On the other hand, the unobserved heterogeneity need not have a large variance when $\beta_{00}$ is close to 0, which suggests that the model explains the variance in the data well.

The shape of the confidence region also conveys a considerable amount of information. We plot in Figures 6 and 7 two dimensional and three dimensional projections and cuts of the confidence region for column 2 of table 3, i.e. $\mu_\varepsilon = 0$, $\sigma_\varepsilon^2 = 1$, $\sigma_u^2 = 1$. Of great interest is the projection of the identified set in the plane $(\beta_{ah}, \beta_m)$. Figure 7(a) reveals a linear relation between the two parameters of the type $\beta_{ah} = -\beta_m$. The estimation rejects models for which the absolute values of the two parameters are significantly different. The data therefore suggest that the disutility induced by the disability of the parent can be entirely compensated by the presence of a spouse in the same household. Notice the triangular shape of the region plotted in figure 7(b), which entails that simultaneous large values of $\psi_s$ and $\alpha$ are rejected. This finding means that only one of the effects (gender or birth order) can

be large, not both. In other words, firstborn daughters are not the only possible care givers. Note also that both effects can be very small, though not jointly insignificant. We observe similar types of constraints for the pairs $(\alpha, \beta_{ah})$, $(\alpha, \beta_{ac})$, $(\alpha, \psi_d)$, $(\psi_s, \beta_{ac})$, $(\psi_s, \psi_d)$, as large values of parameters $\alpha$ or $\psi_s$ are only permitted when the other parameters are jointly large (see figures 7(c) to 7(f)). For example, we obtain a constrained confidence region at $\beta_{ac} = -3.5$. The ranges for the two parameters, $\alpha$ and $\psi_s$, are tighter, as $\alpha \in [1, 2]$ and $\psi_s \in [0, 1]$. Figure 8 shows the effect of the variation of parameter $\beta_{11}$ on $\psi_s$ and $\alpha$. Recall that $\beta_{11}$ represents a fixed cost or benefit for the child chosen as care giver. We observe negative relations between $\beta_{11}$ and $\psi_s$, and between $\beta_{11}$ and $\alpha$. Negative values of $\psi_s$ and $\alpha$ are only admissible for positive values of $\beta_{11}$. Hence a model where parents exhibit no favoritism for a daughter and/or a firstborn, or favoritism for a son and/or a second born, will be consistent with our data if and only if there exists a strictly positive constant benefit for a child to be caregiver. In the case where $\beta_{11} \ll 0$, the null hypothesis of indifference to gender and birth order of the primary care giver is rejected and the minimum value of both effects (gender and birth order) increases with more negative values of the fixed cost (see figure 8(c)).

Specification 2.

As discussed earlier, the second specification analyzes primarily altruistic behavior on the part of the children. The parent's utility enters the children's evaluation through the parameter $\delta$. The greater $\delta$, the more the parent's preferences influence the family's decision. Notice that in the case where $\delta = 0$, both children value identically all options where they are not in charge of the parent, irrespective of the parent's characteristics, in particular as pertains to disability.
Not surprisingly, as in the previous specification, we reject $H_0: \beta_{00} = 0$ (at the 5% level) as $\beta_{00}$ is always positive, and we fail to reject (at the 5% level) the null hypothesis that there is no additional constant disutility for a child to take care of the elderly parent ($H_0: \beta_{11} = 0$). The effects of $\beta_{11}$ on the remaining parameters are the same as discussed above. We obtain a constrained confidence region for the remaining parameters. We report the range for each parameter in table 4. Note again that the identified set is not a compact set, as $\beta_{ac}$, $\beta_{ah}$, $\beta_m$ and $\psi$ are allowed to diverge to infinity. The data seem to comply with both types of models: altruistic and non-altruistic. However, the hypothesis of non-altruism is rejected for some values of the error term, namely $\sigma_u^2 = 1$. That is, in the case where the variance of the unobserved heterogeneity is higher ($\sigma_u^2 = 1$ compared to $\sigma_u^2 = 0.25$), the behavior of children can only be rationalized in a framework where they act according to altruistic motives. On the other hand, high degrees of altruism ($\delta > 0.78$) are not permitted for a low variance of $u$. This feature can be understood by relating it to the parameter $\beta_{00}$. We discussed in the previous section the fact that $\sigma_u^2$ is related to $\beta_{00}$, as large values of $\sigma_u^2$ imply large values of $\beta_{00}$ (see first line of table 4). Figure 10(a) shows a projection of the confidence region in the space $(\beta_{00}, \delta)$. We can observe a negative dependence between the two parameters. The two parameters cannot both be very large or both be very small. This suggests that a

model compatible with the data is either one where children are predominantly selfish and parents have a strong preference for living alone, or one where children are fairly altruistic and parents have only a weak preference for living alone, or a middle point between the two.

[Table 4. Parameter ranges (minimum and maximum of each parameter) for Specification 2 at $\beta_{11} = 0$, $\beta_a = \beta_m$ and for different values of the error terms.]

In this specification, we reject the joint null hypothesis of indifference of the parent to gender and birth order of the primary care giver when there exists no constant cost or benefit associated with providing care ($\beta_{11} = 0$). The estimation indeed suggests a preference of parents for an older child and for a daughter ($\alpha > 0$ and $\psi_s > 0$). Negative values of $\beta_{11}$ (i.e. existence of a constant disutility for caregivers) are associated with larger minimum values of $\alpha$ and $\psi_s$, hence a stronger preference for first-born and female caregivers. Confidence intervals for the other parameters do not differ significantly from the previous specification. We again reject the hypotheses that distance (between parent and caretaker) and parent's disability are insignificant, as the range for $\psi_d$ is $\psi_d \in (-\infty, -1.43]$ and the range for $\beta_a$ is $\beta_a \in (-\infty, -2.86]$. We also reject the hypothesis that living with a spouse does not increase the value for a parent of remaining autonomous. We observe very similar shapes of the confidence region as for the previous specification. Most of the analysis above still applies to the second specification. Notice that in the non-altruistic case, i.e. $\delta = 0$, only very large effects of the parent's disability are consistent with the data (see Figure 10(b)). In other words, parental disability reduces the utility of providing care for the non-altruistic type of children more than for the altruistic type.

Conclusion

We have considered the problem of statistical inference in models of discrete choice with social and economic interactions. A characterization of the identified set for parameters of models of discrete choice with interactions was given, with an appeal to a classical theorem in combinatorial optimization, the Max-Flow Min-Cut theorem, thereby emphasizing the optimization formulation of the problem of inference in partially identified models. Finally, we have shown how to apply combinatorial optimization methods within a bootstrap procedure in order to compute informative confidence regions very efficiently, hence feasibly in empirically relevant applications. An application of the methodology was carried out on a family bargaining example and it was shown that most findings in the literature on the determinants of long term elderly care by American families were supported in this more robust framework, where the effects of interaction are accounted for.

References

Ahuja, R. K., T. L. Magnanti, and J. B. Orlin (1993): Network Flows: Theory, Algorithms, and Applications. Prentice Hall.
Andrews, D. (1997): A conditional Kolmogorov test, Econometrica, 65.
Andrews, D., S. Berry, and P. Jia (2003): Placing bounds on parameters of entry games in the presence of multiple equilibria, unpublished manuscript.
Bajari, P., J. Fox, and S. Ryan (2008): Evaluating wireless carrier consolidation using semiparametric demand estimation, Quantitative Marketing and Economics, 6.
Bajari, P., H. Hong, and S. Ryan (2010): Identification and estimation of discrete games of complete information, forthcoming in Econometrica.
Beresteanu, A., I. Molchanov, and F. Molinari (2009): Sharp identification regions in models with convex predictions: games, individual choice, and incomplete data, cemmap working paper CWP27/09.
Berry, S. (1992): Estimation of a model of entry in the airline industry, Econometrica, 60.
Blum, M., R. Floyd, V. Pratt, R. Rivest, and R. Tarjan (1973): Time bounds for selection, Journal of Computer and System Sciences, 7.
Bramoullé, Y., H. Djebbari, and B. Fortin (2009): Identification of peer effects through social networks, Journal of Econometrics, 150.
Bresnahan, T., and P. Reiss (1990): Entry in monopoly markets, Review of Economic Studies, 57.
Brock, B., and S. Durlauf (2001): Discrete choice with social interactions, Review of Economic Studies, 68.
Bugni, F. (2010): Bootstrap inference in partially identified models defined by moment inequalities: coverage of the identified set, Econometrica, 78.

Chernozhukov, V., H. Hong, and E. Tamer (2007): Estimation and Confidence Regions for Parameter Sets in Econometric Models, Econometrica, 75.
Ciliberto, F., and E. Tamer (2009): Market structure and multiple equilibria in airline markets, Econometrica, 77.
Coate, S., and G. Loury (1993): Will affirmative action policies eliminate negative stereotypes, American Economic Review, 83.
Cormen, T., C. Leiserson, R. Rivest, and C. Stein (2001): Introduction to Algorithms. MIT Press.
Duke (1999): National Long Term Care Survey, Public use data set produced and distributed by the Duke University Center for Demographic Studies with funding from the National Institute on Aging under Grant No. U01-AG
Ekeland, I., A. Galichon, and M. Henry (2010): Optimal transportation and the falsifiability of incompletely specified economic models, Economic Theory, 42.
Engers, M., and S. Stern (2002): Family bargaining and long term care, International Economic Review, 43.
Galichon, A., and M. Henry (2009a): Set identification in models with multiple equilibria, forthcoming in the Review of Economic Studies.
Galichon, A., and M. Henry (2009b): A test of non-identifying restrictions and confidence regions for partially identified parameters, Journal of Econometrics, 152.
Hendricks, K., and D. Kovenock (1989): Asymmetric information, information externalities and efficiency: the case of oil exploration, RAND Journal of Economics, 20.
Horowitz, A. (1982): The role of families in providing care to the frail and chronically ill elderly in the community, Final report submitted to the Health Care Financing Administration, New York.
Jovanovic, B. (1989): Observable implications of models with multiple equilibria, Econometrica, 57.
Knuth, D. (1997): The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley.
Lee, T., and L. Wilde (1980): Market structure and innovation: a reformulation, Quarterly Journal of Economics, 94.
Li, H., M. Rosenzweig, and J. Zhang (2010): Altruism, Favoritism, and Guilt in the Allocation of Family Resources: Sophie's Choice in Mao's Mass Send-Down Movement, Journal of Political Economy, 118.
Li, Q., and J. Racine (2008): Nonparametric Estimation of Conditional CDF and Quantile Functions With Mixed Categorical and Continuous Data, Journal of Business and Economic Statistics, 26.
Manski, C. (1993): Identification of endogenous social effects: the reflection problem, Review of Economic Studies, 60.

Moro, A. (2003): The effect of statistical discrimination on Black-White wage inequality: estimating a model with multiple equilibria, International Economic Review, 44.
Pakes, A., J. Porter, K. Ho, and J. Ishii (2004): Moment inequalities and their application, unpublished manuscript.
Schrijver, A. (2004): Combinatorial optimization: polyhedra and efficiency. Springer: Berlin.
Soetevent, A., and P. Kooreman (2007): A discrete choice model with social interactions: with an application to high school teen behaviour, Journal of Applied Econometrics, 22.
Stern, S. (1995): Estimating family long term care decisions in the presence of endogenous child characteristics, Journal of Human Resources, 30.
Tamer, E. (2003): Incomplete simultaneous discrete response model with multiple equilibria, Review of Economic Studies, 70.
Treas, J., R. Gronvold, and V. Bergston (1980): Filial destiny? The effect of birth order on relations with aging parents, Annual Scientific Meeting of the Gerontological Society of America.

Appendix A. Data summary statistics

The dependent variable is the care option chosen by the family. We classify it in 4 categories. "Child 1" (resp. "Child 2") means that the firstborn child (resp. the second born child) provides the most care in terms of days spent helping the parents. Our data exhibit a strong dominance of the choice of child 1 over the choice of child 2. Child 1 is listed as primary care giver in 26.81% of the families, while child 2 is only listed in 6.75% of our observations. On average, the older child provides 3.50 (3.12) days of care while the second child provides 0.48 (1.34) days. When "Child 1" is the chosen option, the average number of days spent by the primary care giver climbs to 4.32 (2.92), while the other child provides on average only 0.17 (0.77) days of care.
When "Child 2" is the preferred option, the firstborn spends on average only 0.29 (1.29) days of care while the second child provides 1.74 (2.15) days of care. The third care option, for the parent to enter a nursing home, represents 19.92% of the families in our sample. The remaining option, for parents to live alone, includes all the cases where no child is listed and the parent does not live in a nursing home (it could well be the case that an individual other than the children provides care for the parent). The parent is living alone in the remaining 46.74% of the sample, which makes it the most frequent option in our sample. We use two types of explanatory variables, those relative to the elderly parent, and those relative to children. The first variable relative to the elderly parent is their disability status based on six types of daily living activities: bed, bathing, dressing, eating, toilet, walking inside. As many as 65% of the parents in the sample population suffer from at least one disability, of which 51.5% have 4 or more. In the following, we introduce a categorical variable $D^{ADL}$ with value 1 when the parent shows at least 4 disabilities, 0 otherwise. The second variable relative to the parent characteristics is a categorical variable for the presence of the parent's spouse in the household. We denote it $D^{MARRIED}$. Previous studies show

a positive effect on the incentive to remain home when the parent lives with his or her spouse. In our sample, 40.36% live with a spouse. We consider three characteristics of each child: their distance to the parent, their birth order and their gender. The distance can be viewed as a cost for a child to provide days of care to the parent. The survey measures the time required to travel from each child's residence to the parent's residence. About one half of the children in the sample live at a travel distance of 30 minutes or more, and 12% of the children live in the same household as their parent. Among care givers, the distribution of distance is quite different. Among the firstborn children in charge of the parent, 25% live in the same household, and the percentage of those living more than 30 minutes away drops to 31%. Similarly, among primary care givers who are second born children, the percentage of those living more than 30 minutes away is only 25%. However, the percentage of those living in the same household as the parents is 7%. As noted by Engers and Stern (2002), the observations where the parent lives in the same household as the child induce some statistical problems. It is unclear whether the child is the actual care giver for the parent or the parent is actually the care giver for the child. To capture the effect of distance on children's incentives, we therefore conduct estimation conditional on the children living in a different household than the parent's household. We include in our analysis a dummy variable for each one of the children: $D^{DIST1}_i = 1$ if child $i$ lives 30 minutes away or more, 0 otherwise. The child's gender is an important variable in most of the analysis. Previous work (see, for example, Horowitz (1982), Treas, Gronvold, and Bergston (1980)) suggests that being female makes a child more likely to provide care.
However, Stern (1995) and Engers and Stern (2002) indicate that the data set suffers from significant misclassification of gender (see Stern (1995) for a detailed discussion of the problems with the data set). We introduce in the specification a random error term to account for misclassification of gender.
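As a rough illustration of this error term, the sketch below simulates gender misreporting and the induced term $\xi_i$, taken here as $\psi_s$ times the difference between true and reported gender, with misreporting occurring with probability $p_\xi$ independently of true gender. The function names `reported_gender` and `xi` and the numerical values are hypothetical and chosen only for the example.

```python
import random


def reported_gender(d_true, p_xi, rng):
    """Reported gender dummy: flipped with probability p_xi, independently of d_true."""
    return 1 - d_true if rng.random() < p_xi else d_true


def xi(psi_s, d_true, d_reported):
    """Error term induced by misclassification: psi_s * (true - reported)."""
    return psi_s * (d_true - d_reported)


rng = random.Random(0)  # seeded generator, for reproducibility of the illustration
psi_s, p_xi = 2.0, 0.1  # illustrative values only

sample = []
for _ in range(1000):
    d_true = rng.randint(0, 1)
    d_rep = reported_gender(d_true, p_xi, rng)
    sample.append(xi(psi_s, d_true, d_rep))

# Each simulated error is 0 (correct report) or +/- psi_s (misreport).
support = set(sample)
```

When the report is wrong, the reported dummy equals one minus the true one, so the error can be written in terms of the reported value alone as $\psi_s$ times one minus twice the reported dummy; the magnitude of the error therefore grows with the absolute value of $\psi_s$.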

Figure 6. Three dimensional representations of the confidence region for Specification 1 at $\beta_{00} = 3$, $\beta_{11} = 0$, $\mu = 0$, $\sigma_\varepsilon = 1$, $\sigma_u = 1$, $p_\xi = 0.1$. Panels: (a) $(\beta_{ac}, \beta_{ah}, \alpha)$ region; (b) $(\psi_s, \psi_d, \beta_{ac})$ region; (c) $(\beta_{ac}, \alpha, \psi_s)$ region; (d) $(\beta_{ah}, \beta_m, \alpha)$ region.

Figure 7. Two dimensional representations of the confidence region for Specification 1 at $\beta_{00} = 3$, $\beta_{11} = 0$, $\mu = 0$, $\sigma_\varepsilon = 1$, $\sigma_u = 1$, $p_\xi = 0.1$. Panels: (a) $(\beta_{ah}, \beta_m)$ region; (b) $(\alpha, \psi_s)$ region; (c) $(\alpha, \beta_{ac})$ region; (d) $(\alpha, \psi_d)$ region; (e) $(\psi_s, \beta_{ac})$ region; (f) $(\psi_s, \psi_d)$ region.

Figure 8. Parameter $\beta_{11}$ in relation with other parameters: Specification 1 at $\beta_{00} = 3$, $\mu = 0$, $\sigma_\varepsilon = 1$, $\sigma_u = 1$, $p_\xi = 0.1$. Panels: (a) $(\alpha, \beta_{11})$ region; (b) $(\psi_s, \beta_{11})$ region; (c) $(\alpha, \psi_s)$ regions for different values of $\beta_{11}$.

Figure 9. Three dimensional representations of the confidence region for Specification 2 at $\beta_{11} = 0$, $\mu = 0$, $\sigma_\varepsilon = 1$, $\sigma_u = 1$, $p_\xi = 0.1$. Panels: (a) $(\delta, \psi_s, \alpha)$ region; (b) $(\beta_{00}, \delta, \beta_a)$ region; (c) $(\psi_s, \beta_a, \psi_d)$ region; (d) $(\beta_a, \beta_m, \alpha)$ region.
