COORDINATION AND EQUILIBRIUM SELECTION IN GAMES WITH POSITIVE NETWORK EFFECTS


Alexander M. Jakobsen, B. Curtis Eaton, David Krause

August 31, 2009

Abstract

When agents make their choices simultaneously, network effects often give rise to a selection problem involving perfectly coordinated, Pareto-optimal equilibria. We characterize this selection problem, and introduce a generalized sequential choice model to address it. In this model, we show how expectation formation under imperfect information combines with network effects to form coordination cascades: ordered partitions of the agent space wherein coordination on one alternative is eventually optimal for all agents. Several theorems are proven regarding both the likelihood and extent of coordination under various parameter changes; in particular, we show that the degree to which agents can observe the choices of others is an important consideration. We also present numerical calculations which shed additional light on the coordination problem, and which suggest that sequential choice resolves, with high probability, the equilibrium selection problem efficiently.

Corresponding author: Department of Economics, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada, T2N 1N4; eaton@ucalgary.ca. Jakobsen: amjakobs@ucalgary.ca. Krause: d.krause@bell.ca

1 INTRODUCTION

In the presence of positive network effects, the realized network benefit that an individual receives depends upon the number of other individuals who make the same choice. [1] Consequently, when making adoption decisions, agents must form expectations concerning the number of individuals who will eventually choose the same good. Surprisingly, relatively little work has focused on the expectations formation problem that is at the core of the individual's choice problem in the presence of network effects. Typically, the problem is addressed with a simplifying assumption. For example, Katz and Shapiro (1985) resolve the problem by assuming that consumer expectations are fulfilled in equilibrium. Models with network effects typically have multiple perfectly-coordinated equilibria, which raises yet another expectations formation question: even if expectations are fulfilled, how is it that agents manage to coordinate their expectations on one of the possible equilibria? Clearly, to achieve a perfectly-coordinated equilibrium, the expectations of all individuals must somehow be coordinated. But if choices are made independently, there is nothing to assure that coordination will, in fact, be achieved. These problems are well known, and various authors have suggested different approaches to them (see, for example, Crawford (1995) and Farrell and Katz (1998), or Jeitschko and Taylor (2001) and Walden and Browne (2009) for other types of coordination theory); yet they remain unresolved, perhaps because, as Farrell and Klemperer (2007) point out, coordination is hard, especially when different adopters would prefer different coordinated outcomes. Notice also that when the process by which expectations are formed is swept away, so too is the ability to address a variety of interesting questions. If the value of the network benefit is small, is coordination more difficult to achieve? Is it easier to coordinate in an environment where tastes are similar?
How does observability between agents affect coordination? More generally, what facilitates and what inhibits coordination? Our purpose in this paper is to explore these issues in a sequential choice framework. In section 2 we characterize the equilibrium selection problem for simultaneous moves in a simple 2-good, 2-type model. In section 3 we introduce the dynamic choice model in a generalized framework suggested by the literature on information cascades. [2] In the generalized model, agents' preferences are (potentially) unique, choice is sequential, and agents use the information available to them at the time they make their own choices to form expectations regarding the choices that agents who follow them will make. These expectations feed into the agents' utility-maximizing choices in the usual way. We prove a number of new results concerning network effects and coordination in the generalized framework. Finally, in section 4 we provide numerical examples (using the 2-good, 2-type model from section 2) to reinforce formal results and to explore additional issues. The numerical examples contrast two different algorithms agents may use to form expectations, and also provide some insight into how likely, and how complete, coordination may be under a variety of circumstances. [3]

[1] See, in particular, Rohlfs (1974), Katz and Shapiro (1985), Farrell and Saloner (1985), Church and Gandal (1992), Church and King (1993), Katz and Shapiro (1994), Economides (1996), Kauffman, McAndrews, and Wang (2000), and Farrell and Klemperer (2007).
[2] See Bikhchandani, Hirshleifer, and Welch (1998).
[3] Choi (1997) explores a technology adoption problem in a sequential choice setting. In our model, individuals use the choices of those who precede them in the choice sequence to infer something about the number of agents adopting each good, while in Choi's model the choices of predecessors resolve uncertainty about the technologies.

2 EQUILIBRIUM SELECTION IN NETWORK EFFECT MODELS

In this section, we present a simple two-good, two-type model of network effects, and show how equilibrium selection issues arise when choice is simultaneous. We also investigate some efficiency issues surrounding these equilibrium results. The general model in this section also serves as a skeleton model for the numerical examples studied in section 4.

2.1 AGENTS AND THEIR UTILITY FUNCTIONS

Consider a set X of N agents who must choose one of two alternatives: A or B. These could represent goods, technologies, patterns of behaviour, or any other pair of choices exhibiting positive network effects; we simply refer to them as goods. Each agent is one of two types (A or B) and has a utility function U : {A, B}^2 × N^2 → R_+ of the form

U(t, c, n_A, n_B) = v(t, c) + e(c, n_A, n_B),

where t ∈ {A, B} is the agent's type, c ∈ {A, B} is the agent's decision, and n_j is the total (final) number of agents who adopt good j. v(t, c) is the direct (private) utility a Type t agent derives from adopting good c, net of any costs from doing so, and the function e : {A, B} × N^2 → R_+ represents the positive network effect from choosing good c. Letting v_{t,c} represent v(t, c), one of the defining traits of Type A agents is that v_{A,A} > v_{A,B}; similarly, a characteristic of Type B agents is that v_{B,B} > v_{B,A}. One way of interpreting these conditions is to say that a Type i agent prefers a small network of Type i over a small network of Type j. The other defining characteristic also involves network size; namely, a Type i agent should prefer a large Type i network over a large Type j network. These, and several other properties of the network function, are made precise next.

2.2 THE NETWORK EFFECT FUNCTION

The network function e is given by the split function

e(c, n_A, n_B) = e_A(n_A, n_B) if c = A, and e(c, n_A, n_B) = e_B(n_B, n_A) if c = B.

This is primarily a notational convenience; since n_A and n_B are reversed as inputs in e_A and e_B, only the general function e_i(n_i, n_j), where j ≠ i, must be

characterized. Of course, e_A and e_B could be distinct functions, but to fully capture a wide spectrum of possible network effects, certain conditions must be satisfied. Five key properties are identified below. (In all cases, assume that i ≠ j.)

(P1) e_i(n_i, n_j) is weakly increasing in n_i and non-constant in n_i.

(P2) e_i(n_i, n_j) is weakly decreasing in n_j.

(P3) ∃ M_i ∈ R such that ∀ n_j ∈ N, lim_{n_i → ∞} e_i(n_i, n_j) = M_i. By (P1) and (P2), this means that ∀ n_i, n_j ∈ N, e_i(n_i, n_j) ≤ M_i. Furthermore, we assume v_{i,i} + M_i > v_{i,j} + M_j.

(P4) ∀ n_j ∈ N, ∃ N' ∈ N such that ∀ n_i ≥ N', v_{i,i} + e_i(n_i, n_j) > v_{i,j} + e_j(n_j, n_i). (This is actually a consequence of (P3), but it is handy to state it as a separate property.)

(P5) ∃ S ⊆ N with {0, 1} ⊆ S such that ∀ n_i ∈ S, ∃ N' ∈ N for which n_j ≥ N' implies v_{i,j} + e_j(n_j, n_i) > v_{i,i} + e_i(n_i, n_j).

(P1) asserts that, given n_j, the value of network i increases as n_i grows, but that it is not necessarily strictly increasing in n_i; there could be constant regions. But it is not constant everywhere, which makes network size a nontrivial consideration. Similarly, (P2) states that network i (weakly) diminishes in value as network j increases in size. Since the relation is weak, however, this means that e_i could actually be constant in n_j, or at least have constant regions. (P3) says that network i approaches some maximum value, M_i, as n_i increases, and that this maximum value is the same regardless of the value of n_j. The intuition is that even if network j is large, network i can eventually grow large enough to approach its maximum value anyway. The second assumption in (P3), v_{i,i} + M_i > v_{i,j} + M_j, is the other defining characteristic of Type i agents: they prefer high-value good i networks over high-value good j networks. (P4) states that for any given n_j ∈ N, Type i agents will eventually prefer network i as n_i increases. (It is not difficult to show that (P4) is a direct result of (P3).)

(P5) states that for some values of n_i and n_j, a Type i agent will find it optimal to adopt good j. But depending on the network functions e_A and e_B, the set S of n_i values where such an n_j exists may be restricted. To rule out trivial models, attention is restricted to models where 1 ∈ S. (P1) implies that if k ∈ S, then l ∈ S for every 0 ≤ l ≤ k. Therefore, the union of all sets S satisfying (P5) is again a set satisfying (P5), and is the ⊆-maximal set satisfying (P5). This set, denoted S_i, can be thought of as the switching set for Type i agents; it contains all values of n_i for which it is possible for Type i agents to end up selecting good j. Depending on how the network functions behave, S_i could be all of N, or it could be some proper subset of N, in which case S_i = {0, 1, 2, ..., n} for some n.

With these properties in place, we may introduce some useful notation. Let n_B ∈ N. Then α_A(n_B) denotes the smallest value of n_A for which v_{A,A} + e_A(n_A, n_B) ≥ v_{A,B} + e_B(n_B, n_A); that is, α_A(n_B) is the smallest value of n_A for which Type A agents are better off adopting good A. By (P4), there is at least one possible n_A for which this inequality is true; as N is well-ordered, this means a minimum such n_A exists, and so α_A(n_B) is well-defined. Next, given n_A ∈ S_A, β_A(n_A) denotes the minimum value of n_B for which v_{A,B} + e_B(n_B, n_A) > v_{A,A} + e_A(n_A, n_B); that is, β_A(n_A) is the smallest value of n_B for which Type A agents are better off adopting good B. Again, (P5) and the fact that N is well-ordered guarantee the existence of β_A(n_A). Similarly, for n_A ∈ N, β_B(n_A) is the smallest n_B for which Type B agents are better off adopting good B, and α_B(n_B) denotes the smallest n_A (given n_B ∈ S_B) for which Type B agents are better off adopting good A. It is straightforward to prove that α_A(n_B) is weakly increasing in n_B and that β_A(n_A) is weakly increasing in n_A (provided n_A ∈ S_A). By symmetry, of course, this means that α_B(n_B) (for n_B ∈ S_B) and β_B(n_A) are weakly increasing functions. It is also routine to verify that for all n_B ∈ S_B, α_B(n_B) ≥ α_A(n_B); similarly, for all n_A ∈ S_A, β_A(n_A) ≥ β_B(n_A).
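To make the threshold functions concrete, they can be computed by direct search once v and e are specified. In the sketch below, the numbers in V and M and the saturating form of e are hypothetical choices of ours that satisfy (P1)-(P5); they are not taken from the paper.

```python
# Hypothetical illustration (functional forms are our own, chosen to
# satisfy (P1)-(P5); they are not the paper's).  With v and e fixed,
# alpha_A(n_B) and beta_A(n_A) are found by direct search.

V = {('A', 'A'): 2.0, ('A', 'B'): 1.0,   # direct utilities v_{t,c}
     ('B', 'A'): 1.0, ('B', 'B'): 2.0}
M = {'A': 5.0, 'B': 5.0}                 # asymptotic network values M_i

def e(c, n_own, n_other):
    """e_i(n_i, n_j) = M_i n_i / (n_i + n_j + 1): weakly increasing in
    n_i, decreasing in n_j, and -> M_i as n_i -> infinity."""
    return M[c] * n_own / (n_own + n_other + 1)

def alpha_A(n_B, limit=10_000):
    """Smallest n_A with v_{A,A} + e_A(n_A, n_B) >= v_{A,B} + e_B(n_B, n_A)."""
    for n_A in range(limit):
        if V['A', 'A'] + e('A', n_A, n_B) >= V['A', 'B'] + e('B', n_B, n_A):
            return n_A

def beta_A(n_A, limit=10_000):
    """Smallest n_B with v_{A,B} + e_B(n_B, n_A) > v_{A,A} + e_A(n_A, n_B);
    returns None if n_A lies outside the switching set S_A."""
    for n_B in range(limit):
        if V['A', 'B'] + e('B', n_B, n_A) > V['A', 'A'] + e('A', n_A, n_B):
            return n_B
    return None

print([alpha_A(n) for n in range(6)])   # weakly increasing in n_B
print([beta_A(n) for n in range(6)])    # weakly increasing in n_A
```

With these particular numbers both printed sequences are weakly increasing, matching the monotonicity claims above; since the networks here strictly diminish one another and M_A = M_B, the switching sets are all of N and beta_A never returns None.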

2.2.1 COMPETITION BETWEEN NETWORKS

It is possible that growth in the size of one network diminishes the quality of the other. In modeling fads, for instance, agents may only be concerned about the total fraction of individuals who choose the same network they do. In such cases, the networks are said to diminish one another.

Definition 2.1. Suppose i, j ∈ {A, B} and that i ≠ j. Then network j diminishes network i if there exist some n_i ∈ N and n_j, n_j' ∈ N with n_j' > n_j such that e_i(n_i, n_j') < e_i(n_i, n_j). Network j strictly diminishes network i if for every n_i ∈ N_+ and all n_j, n_j' ∈ N with n_j' > n_j, it follows that e_i(n_i, n_j') < e_i(n_i, n_j).

So, network j diminishes network i only if (P2) acts non-trivially (that is, if e_i is non-constant in n_j). Network j strictly diminishes network i if this is true for all values of n_i ∈ N_+ and all values of n_j; the restriction to n_i ∈ N_+ is taken to ensure that e_i(·) ∈ R_+, since one would expect e_i(0, 0) = 0. If both networks (strictly) diminish each other, then the networks are said to be (strictly) competitive; if they are not competitive (that is, if each function e_i is constant in n_j), then the networks are independent. Independent networks have a useful mathematical property, as illustrated in the following theorem.

Theorem 2.1. If networks A and B are independent, then S_A and S_B are bounded.

Proof. When the networks are independent, each e_i is constant in n_j, j ≠ i. Since lim_{n_i → ∞} e_i(n_i, n_j) = M_i regardless of n_j, and since v_{i,i} + M_i > v_{i,j} + M_j, there is some n_i* for which v_{i,i} + e_i(n_i*, n_j) > v_{i,j} + M_j ≥ v_{i,j} + e_j(n_j, n_i*) for all n_j ∈ N. Since e_i(·) is increasing in n_i, this inequality actually holds for all n_i ≥ n_i*. But this means a Type i player is always better off with good i than good j whenever n_i ≥ n_i*, making n_i* an upper bound for S_i.

An immediate consequence of Theorem 2.1 is that if each S_i = N, then the networks must be competitive, for otherwise at least one network i would not be diminished by the other, resulting in a bounded S_i for that network. Note that if each S_i = N, then the networks are competitive, but not necessarily

strictly competitive. 4 Intuitively, one might suspect that a bounded S i puts limits on how likely coordination is, because restricting S i restricts the number of cases in which it is possible for a Type i agent to adopt good j. On the other hand, coordination might also seem less likely when the networks are competitive, because Type i agents can strengthen their own networks (and diminish the j network) simply by adopting good i. 2.3 NASH EQUILIBRIA IN SIMULTANEOUS MOVES We employ standard notation in the simultaneous move game, so that a strategy profile is an N-tuple s = (s 1,..., s N ) S = {A, B} N. Given an agent x i X and a profile s S, the profile s i = (s 1,..., s i 1, s i+1,..., s N ) refers to the strategies of s for all players other than agent i; the notation (s i, s i ) refers to the entire profile s. Since s S consists of the choices of all players, the values of n A and n B may be inferred from s; let n A (s) and n B (s) represent the number of A and B choices in s, respectively. Then agent i s utility function U i (t i, c i, n A, n B ) may be written as U i (t i, n A (s), n B (s)), or simply U i (t i, s) when the context is clear. We are now prepared to prove a number of equilibrium results. Theorem 2.2. If v B,A + e A (N A + N B, 0) > v B,B + e B (1, N A + N B 1), then the strategy profile where all players choose good A is a Nash equilibrium in simultaneous moves (which we call a pure A equilibrium). A symmetric statement holds for a pure B equilibrium. Theorem 2.2 is obvious and, trivially, its converse is also true. Closer examination of the required inequalities, however, reveals that a pure equilibrium of either type exists if N A and N B are sufficiently large. Corollary 2.3. If N A + N B 1 α B (1), then a pure A equilibrium exists. Similarly, if N A + N B 1 β A (1), then a pure B equilibrium exists. 4 Note also that the converse of Theorem 2.1 does not hold; that is, a bounded S i does not guarantee that the networks are independent. 
One may also conjecture that for strictly competitive networks, the S i sets are unbounded; but this is false as well. 7
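The multiplicity promised by Theorem 2.2 and Corollary 2.3 is easy to verify numerically in small instances. The sketch below uses hypothetical functional forms of ours (not the paper's) and enumerates all pure-strategy Nash equilibria by checking every unilateral deviation; it finds a pure A equilibrium, a pure B equilibrium, and a profile in which every agent adopts the good matching its own type, all coexisting.

```python
from itertools import product

# Brute-force equilibrium enumeration in a small instance.  The
# functional forms are hypothetical choices satisfying (P1)-(P5);
# this is an illustration, not the paper's computation.

V = {('A', 'A'): 2.0, ('A', 'B'): 1.0,
     ('B', 'A'): 1.0, ('B', 'B'): 2.0}
M = {'A': 5.0, 'B': 5.0}

def U(t, c, n_A, n_B):
    n_own, n_other = (n_A, n_B) if c == 'A' else (n_B, n_A)
    return V[t, c] + M[c] * n_own / (n_own + n_other + 1)

def nash_equilibria(types):
    """All pure-strategy Nash profiles, found by checking every
    unilateral deviation (feasible only for small N)."""
    eqs = []
    for s in product('AB', repeat=len(types)):
        n_A, n_B = s.count('A'), s.count('B')
        def gains(i):
            t, c = types[i], s[i]
            d = 'B' if c == 'A' else 'A'              # the deviation
            dA = n_A - 1 if c == 'A' else n_A + 1     # counts after it
            dB = n_B - 1 if c == 'B' else n_B + 1
            return U(t, d, dA, dB) > U(t, c, n_A, n_B)
        if not any(gains(i) for i in range(len(types))):
            eqs.append(s)
    return eqs

types = ('A', 'A', 'A', 'B', 'B', 'B')     # N_A = N_B = 3
eqs = nash_equilibria(types)
assert ('A',) * 6 in eqs and ('B',) * 6 in eqs and types in eqs
```

The coexistence of these outcomes in one parameterization is exactly the selection problem this section characterizes.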

Corollary 2.3 shows that in all but the most trivial cases, there is both a pure A equilibrium and a pure B equilibrium in simultaneous moves. Specifically, the corollary reveals an important condition under which the network effect is actually interesting, because without it no Type i agent would ever adopt good j. This condition is the hypothesis of Corollary 2.3:

(NE) N_A + N_B − 1 ≥ max{α_B(1), β_A(1)}.

There is another interesting type of equilibrium which can occur in simultaneous moves: a split equilibrium, where all agents simply adopt the good corresponding to their own type. Intuition might suggest that a split only occurs under weak network effects (for example, in models where (NE) is not satisfied), but this is not the case. The following theorem provides some general conditions under which a split equilibrium exists:

Theorem 2.4. If N_A − 1 ≥ α_A(N_B + 1) and N_B − 1 ≥ β_B(N_A + 1), then there exists a split Nash equilibrium in simultaneous moves.

Again, Theorem 2.4 is quite obvious; however, the hypothesis of Theorem 2.4 may not always be satisfied, since α_A(·) and β_B(·) are weakly increasing functions. So, for competitive network effects, the hypothesis can fail. Independent networks, however, will have a split equilibrium if there are enough agents of each type.

Corollary 2.5. If the networks are independent and N_A and N_B are sufficiently large, then there is a split equilibrium in simultaneous moves.

Proof. Since the networks are independent, Theorem 2.1 asserts that S_A and S_B are bounded sets; in particular, there are constants s_A, s_B ∈ N such that S_A = {0, ..., s_A} and S_B = {0, ..., s_B}. So, if N_A ≥ s_A + 1 and N_B ≥ s_B + 1 then, by definition of S_i, all agents will choose the good corresponding to their own type, and so a split equilibrium exists.

Having established the existence of multiple equilibria, it is worthwhile to ask which, if any, are Pareto efficient. A surprising fact is that under (P1)-(P5)

and (NE), none of the equilibria identified thus far are guaranteed to be Pareto efficient. If, however, each e_i is strictly increasing in n_i, then at least one pure equilibrium is efficient. Before stating and proving this fact, a lemma is required.

Lemma 2.6. If e_A is strictly increasing in n_A and e_B is strictly increasing in n_B, then any profile s ∈ S with n_A(s) > 0 and n_B(s) > 0 does not Pareto dominate any pure profile (i) = (i, i, ..., i) for i ∈ {A, B}.

Proof. In the profile (i), the payoff to agents of Type i is v_{i,i} + e_i(N, 0), and the payoff for Type j ≠ i agents is v_{j,i} + e_i(N, 0). Now, suppose s ∈ S with n_A(s) > 0 and n_B(s) > 0. There are two cases. First, if at least one Type i agent has chosen i in the profile s, then this agent receives utility equal to v_{i,i} + e_i(n_i(s), n_j(s)) < v_{i,i} + e_i(N, n_j(s)) ≤ v_{i,i} + e_i(N, 0), so this agent is worse off under s than under (i), and so s does not Pareto dominate (i) in this case. On the other hand, if all Type i agents choose j in s, then at least one Type j agent chooses i in s, because n_i(s) > 0. Then this Type j agent receives utility equal to v_{j,i} + e_i(n_i(s), n_j(s)) < v_{j,i} + e_i(N, n_j(s)) ≤ v_{j,i} + e_i(N, 0), so he is also worse off under s. Hence s does not Pareto dominate (i).

Theorem 2.7. Suppose e_A is strictly increasing in n_A and e_B is strictly increasing in n_B. Then at least one pure equilibrium is Pareto efficient.

Proof. Suppose (A) is not efficient. It suffices to show that (B) is efficient. Since (A) is not efficient, (A) is dominated by some other profile s ∈ S; in particular, s ≠ (A) implies that either s = (B) or n_A(s) > 0 and n_B(s) > 0. If n_A(s) > 0 and n_B(s) > 0, then (by Lemma 2.6) s does not dominate (A), and therefore it must be the case that s = (B) dominates (A). But this means (B) is efficient, for any s' ∈ S is either (A) (which (B) dominates) or has n_A(s') > 0 and n_B(s') > 0, which (by Lemma 2.6) means s' does not dominate (B). Thus no profile s' ∈ S dominates (B), so (B) is efficient.

Theorem 2.7 guarantees that at least one of the pure equilibria is efficient in any suitably restricted model, but are they both efficient? Indeed, it is possible

that they are; but even with strictly increasing network functions, one of the equilibria may be dominated by the other. Similar issues arise for split equilibria: sometimes the split outcome is efficient, but one or both of the pure equilibria may dominate it. This analysis suggests that, in many cases, there is a difficult equilibrium selection (coordination) problem to be solved. There are many possible equilibria, and even under (NE), the criteria for the existence of the different types of outcomes are similar (namely, large N_i). Pareto efficiency considerations cannot help solve the selection problem when both pure outcomes are Pareto efficient; and, even if one pure equilibrium Pareto dominates the other, the perfect-information assumptions (and static nature) of the Nash solution concept may not be a good representation of the choice problem. What if agents have less information about other agents? What if they are able to observe the actions of some other agents prior to making their own decisions? In a sequential setting, how might agents use their observations to form expectations about outcomes, and how do these expectations affect final outcomes? These, and related issues, are the central focus of sections 3 and 4.

3 DYNAMIC COORDINATION: THEORY

3.1 PRELIMINARY DEFINITIONS

AGENTS AND THEIR (EXPECTED) UTILITY

We assume there is a nonempty set X of agents, and that each agent must adopt one of two goods, A or B. Adoption decisions are made in a sequential manner, but agents do not know the preferences of all other agents, nor the sequence in which decisions are to be made. Agents are able to observe some (not necessarily all) decisions made prior to their own, and use these observations to form expectations about future decisions. In this way, then, agents seek to maximize their expected utility.

Formally, let N^2_{N-1} = {(p, q) ∈ N^2 : p + q = N − 1} and N^2_{<N} = {(p, q) ∈ N^2 : p + q < N}. Then, for each agent x ∈ X, there are functions u_x : {A, B} × N^2_{N-1} → R and e_x : {A, B} × N^2_{<N} → {P : N^2_{N-1} → [0, 1] | P is a probability distribution}. Here, u_x(c, A_x, B_x) is the utility agent x receives from adopting good c, given that A_x other agents have adopted good A and B_x other agents have adopted good B. Since (A_x, B_x) ∈ N^2_{N-1}, this means agent x's utility depends not only on his own decision, but also on the decisions of all other agents. e_x(c, A_x, B_x) is the agent's expectations function, and gives a probability distribution over N^2_{N-1}, the set of all possible decisions of other agents. (A_x, B_x) ∈ N^2_{<N} represents decisions already made by other agents which agent x has observed, so that e_x maps these observations, together with x's choice, into a probability distribution P over the decisions of all agents other than x. [5] Combining u_x and e_x gives an expected utility function eu_x : {A, B} × N^2_{<N} → R of the form

eu_x(c, A_x, B_x) = Σ_{(n_A, n_B) ∈ N^2_{N-1}} e_x(c, A_x, B_x)(n_A, n_B) · u_x(c, n_A, n_B).

The expected utility functions give rise to optimal choice functions c_x : N^2_{<N} → {A, B}, with the understanding that c_x(A_x, B_x) denotes the optimal choice of agent x, given eu_x and his observations (A_x, B_x).

[5] Obviously, we require that e_x(c, A_x, B_x)(n_A, n_B) = 0 whenever n_A < A_x or n_B < B_x.

SEQUENTIAL CHOICE AND COORDINATION CASCADES

The order in which agents make decisions must also be considered. Intuitively, we wish to split X into a sequence of disjoint groups, with the interpretation

that agents observe choices made in prior groups, but not those in their own group. This allows for the possibility that agents are not perfect observers; for instance, agent x may adopt a good before y does, but if y cannot observe x's decision, we simply place x and y in the same group. In section 3.2, we use this interpretation to examine the relationship between observability and coordination. But first, some more definitions are needed.

Given X, a permutation (or ordered partition) of X is a partition P of X together with a linear order on P. This means X has been split into, say, n pairwise disjoint subsets (whose union is X), and that these subsets have been arranged in a particular order. The notation ω = (g_1, ..., g_n) represents such a permutation; here, each g_i is a nonempty subset of X, and the vector notation indicates the order in which these subsets have been arranged. When the context is clear, A_i and B_i denote the total number of A and B adoption decisions made by agents in groups prior to g_i. Ω̄ denotes the set of all (combinatorially) possible ordered permutations; [6] in general, however, Ω ⊆ Ω̄ denotes a nonempty set of permutations which are deemed possible in a given model. This means that for each ω ∈ Ω, there is a nonzero probability that ω will be realized. Accordingly, Γ : Ω → (0, 1] represents a probability distribution over Ω. We are interested in particular types of permutations called coordination cascades.

Definition 3.1 (Coordination Cascade). Let ω = (g_1, ..., g_n) ∈ Ω and T ∈ {A, B}. Then ω is a Type T Coordination Cascade if there exists some i* such that for every i ≥ i* and every x ∈ X, c_x(A_i, B_i) = T.

Thus, a permutation ω is a Type T coordination cascade if, eventually, T becomes the optimal choice for all agents in X, including agents in groups prior to g_{i*}. This does not require every agent to select good T; rather, it requires that good T would be chosen by any agent in X if the agent were in group i* or any later group.

[6] For a set S containing n elements, the total number of ordered partitions of S is denoted ζ(n), and is given by ζ(n) = Σ_{i=0}^{n−1} C(n, i) ζ(i), with ζ(0) = 1, where C(n, i) is the binomial coefficient.
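Footnote 6's recursion is straightforward to evaluate; the ζ(n) are the ordered Bell (Fubini) numbers, and their rapid growth indicates why attention is typically restricted to a subset Ω of permutations. A minimal sketch:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def zeta(n):
    """Number of ordered partitions of an n-element set (footnote 6):
    zeta(n) = sum_{i=0}^{n-1} C(n, i) * zeta(i), with zeta(0) = 1.
    (Choose which i elements fall outside the first block.)"""
    if n == 0:
        return 1
    return sum(comb(n, i) * zeta(i) for i in range(n))

print([zeta(n) for n in range(6)])   # 1, 1, 3, 13, 75, 541
```

Even ten agents already admit zeta(10) = 102,247,563 ordered partitions, so the restricted set Ω and the distribution Γ carry real modeling content.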

To get an intuitive grasp of a coordination cascade, think of different agents as being more or less inclined to choose good T. For example, in the model outlined in the previous section, agents of Type A are more inclined to choose good A than are agents of Type B, and vice versa. Definition 3.1 says that we have a coordination cascade if the agents who are least inclined to choose good T nevertheless would do so if they were in group i* or any later group. In practice, the existence of a Type T coordination cascade requires that choices made by agents in groups 1 through i* − 1 be so imbalanced in favour of good T that subsequent agents find it optimal to choose T, even if they are not naturally inclined to prefer good T.

The notation Ā (or Ā_Gen) denotes the set of all permutations in Ω̄ which are Type A coordination cascades, while Ā_Pure ⊆ Ā_Gen contains only those A cascades which are pure, meaning that every agent chooses good A. If a model permits only a subset Ω ⊆ Ω̄, define A_Gen = Ā_Gen ∩ Ω and A_Pure = Ā_Pure ∩ Ω to indicate the general and pure cascades in the restricted set Ω, respectively. Similar notation is used for Type B cascades, with B in place of A.

DYNAMIC CHOICE MODELS

The above discussion shows that only a set X, functions u_x and e_x for each x ∈ X, a set Ω ⊆ Ω̄, and a probability distribution Γ : Ω → (0, 1] are needed to construct a dynamic choice model; this suggests the following definition:

Definition 3.2 (Dynamic Network Effect Model). A Dynamic Network Effect Model is a collection M = ⟨X, (u_x), (e_x), Ω, Γ⟩; M determines functions eu_x and c_x for each x ∈ X. For a given model M, let N = |X|.

A simple and intuitive property for a model to have is the Expectations Property:

Definition 3.3 (Expectations Property). A model M satisfies the Expectations Property if, for every agent x ∈ X, every (a, b) ∈ N^2_{<N}, and every (a', b') ∈ N^2_{<N} for which

$a \le a'$, $b' \le b$, and $a + b \le a' + b'$, we have that $eu_x(A, a', b') \ge eu_x(A, a, b)$ and $eu_x(B, a', b') \le eu_x(B, a, b)$. A symmetric condition must also hold for $a' \le a$, $b \le b'$, and $a + b \le a' + b'$.

To understand the Expectations Property, suppose an agent, x, has observed a agents adopt good A and b agents adopt good B, for a total of $a + b$ observations. Now suppose instead that he had observed $a' > a$ A decisions and $b' < b$ B decisions, with $a' + b' = a + b$. This means he has observed the same total number of decisions as in the original scenario, but with more A's and fewer B's. The Expectations Property says that in such a scenario, his expected utility from choosing A is at least as high as (and his expected utility from choosing B is no greater than) it was in the original scenario. Now suppose that he had observed $a' > a$ A decisions and $b' = b$ B decisions, so that $a' + b' > a + b$. Then the Expectations Property says, once again, that his expected utility from choosing A is at least as high as (and his expected utility from choosing B is no greater than) it was in the original scenario. Combining these two examples gives the general statement of the Expectations Property. Intuitively, the Expectations Property must be satisfied in models with positive network effects.

Theorem 3.1 states that in such models, impure coordination cascades cannot be Pareto efficient, and are not Nash equilibria of the simultaneous-move game (see the appendix for a proof):

Theorem 3.1. Let M be a model satisfying the Expectations Property. Furthermore, suppose that for each agent x, $u_x(A, a, b)$ is strictly increasing in a and $u_x(B, a, b)$ is strictly increasing in b. Then all impure coordination cascades (that is, those in $A \setminus A^{Pure}$ or $B \setminus B^{Pure}$) are not Pareto efficient, and are not Nash equilibria of the perfect-information simultaneous-move game.

The Expectations Property is also sufficient for proving a number of comparative static results. Occasionally, however, more structure must be imposed upon a model.
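For small N, Definition 3.3 can be checked for a single agent by brute force. A sketch (the function name and the linear payoff at the end are our own illustrative assumptions, not part of the model):

```python
from itertools import product

def satisfies_expectations_property(eu, N):
    """Brute-force check of the Expectations Property for one agent.

    eu(good, a, b) is the agent's expected utility from choosing `good`
    after observing a A-adoptions and b B-adoptions, with a + b < N."""
    pairs = [(a, b) for a, b in product(range(N), repeat=2) if a + b < N]
    for (a, b), (a2, b2) in product(pairs, repeat=2):
        # More A's observed, at least as many total observations:
        if a <= a2 and b2 <= b and a + b <= a2 + b2:
            if eu("A", a2, b2) < eu("A", a, b) or eu("B", a2, b2) > eu("B", a, b):
                return False
        # Symmetric condition favouring B:
        if a2 <= a and b <= b2 and a + b <= a2 + b2:
            if eu("B", a2, b2) < eu("B", a, b) or eu("A", a2, b2) > eu("A", a, b):
                return False
    return True

# A simple linear network-effect payoff satisfies the property:
eu_linear = lambda good, a, b: (a if good == "A" else b)
```

Any payoff that is monotone in the observed count of the chosen good, as in `eu_linear`, passes this check.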

Definition 3.4 (2-Type Model). A model M is a 2-Type Model if there exists a partition $\{X_A, X_B\}$ of X such that for every $x_A \in X_A$, every $x_B \in X_B$, and every $(a, b) \in \mathbb{N}^2_{<N}$, we have that

$c_{x_A}(a, b) = B \implies c_{x_B}(a, b) = B$ and $c_{x_B}(a, b) = A \implies c_{x_A}(a, b) = A.$

Agents in $X_A$ and $X_B$ are called Type A and Type B agents, respectively.

So, a 2-Type model is one in which the agents can be split into two groups: those inclined toward A, and those inclined toward B. Specifically, the definition says that if any Type A agent optimally selects good B, then any Type B agent would also choose good B if faced with the same set of observations, and symmetrically with the roles of A and B reversed. A useful characteristic of some 2-Type models is the Cascade Property:

Definition 3.5 (Cascade Property). A 2-Type model M satisfies the Cascade Property if for every $\omega = (g_1, \ldots, g_n) \in \Omega'$, every group $g_i$, every $t \in \{A, B\}$, and every $x \in g_i$ of Type t,

$c_x(A_i, B_i) = t' \ne t \implies \big[\, c_y(A_j, B_j) = t' \text{ for every } j \ge i \text{ and every } y \in g_j \,\big].$

The Cascade Property asserts that if any Type t agent, x, optimally selects $t' \ne t$, then $t'$ is optimal for every agent in the same group as x, as well as for all agents in subsequent groups, regardless of type. Naturally, this means that if any agent of Type t adopts good $t' \ne t$, then a Type $t'$ cascade will come about. Not every 2-Type model satisfies the Cascade Property; for instance, a set of observations $(A_x, B_x)$ may be sufficient for a particular Type B agent x to adopt good A, but it may not be sufficient for all Type B agents to adopt A. 2-Type models and the Cascade Property are mainly used to explore the effects of population composition and, especially, observability on coordination. These, and other comparative static properties of coordination, are developed next.
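As a concrete (hypothetical) instance, threshold choice rules of the following kind always generate a 2-Type model: each agent adopts A exactly when the observed A-lead $a - b$ reaches a type-specific threshold, with Type A agents using the lower threshold. The names and threshold values below are our own illustrative assumptions:

```python
def make_choice_rule(theta):
    """Threshold rule: adopt A iff the observed A-lead (a - b) is at least theta."""
    return lambda a, b: "A" if a - b >= theta else "B"

THETA_A, THETA_B = -2, 2          # Type A agents are inclined toward good A
c_A = make_choice_rule(THETA_A)   # c_x for Type A agents
c_B = make_choice_rule(THETA_B)   # c_x for Type B agents

# Verify the condition of Definition 3.4 on a grid of observations (a, b):
N = 8
for a in range(N):
    for b in range(N - a):
        if c_A(a, b) == "B":
            assert c_B(a, b) == "B"   # a Type A switch to B drags Type B along
        if c_B(a, b) == "A":
            assert c_A(a, b) == "A"   # and symmetrically for Type B
```

The verification succeeds because $\theta_A \le \theta_B$: whenever the A-lead is too small for a Type A agent, it is certainly too small for a Type B agent.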

3.2 COMPARATIVE STATICS

NETWORK VALUE, POPULATION COMPOSITION, AND EXPECTATIONS

In this section, we examine how changes to various model parameters influence both the likelihood of a coordination cascade occurring and the extent of coordination.

Definition 3.6 (Relative Increase). Let $t, t' \in \{A, B\}$ with $t \ne t'$, and let $M = \langle X, (u_x), (e_x), \Omega', \Gamma \rangle$ and $M' = \langle X, (u'_x), (e'_x), \Omega', \Gamma \rangle$ be models where $eu'_x(t, A_x, B_x) \ge eu_x(t, A_x, B_x)$ and $eu'_x(t', A_x, B_x) \le eu_x(t', A_x, B_x)$ for each $x \in X$ and $(A_x, B_x) \in \mathbb{N}^2_{<N}$. Then the (expected) value of network t in model M' is a relative increase to the (expected) value of network t' in model M.

So, the expected value of a network in model M' is a relative increase to the expected value of the other network in model M if, in every possible circumstance, all agents derive at least as much (expected) utility from selecting that network in M' as in M, and derive no more (expected) utility from the other network in M' than they did in M. When the context is clear, we shall simply say that the value of network t increases relative to the value of network t'.

Given a model M and a permutation $\omega \in \Omega'$, let $A(\omega, M) \subseteq X$ and $B(\omega, M) \subseteq X$ denote the agents who adopt good A and good B, respectively, in ω. The following theorem is central to the analysis in this section:

Theorem 3.2. Let $M = \langle X, (u_x), (e_x), \Omega', \Gamma \rangle$ and $M' = \langle X, (u'_x), (e'_x), \Omega', \Gamma \rangle$ be models satisfying the Expectations Property. Then

1. If network A increases in value relative to network B, then $A^{Pure} \subseteq A'^{Pure}$, $A^{Gen} \subseteq A'^{Gen}$ and, for each $\omega \in \Omega'$, $A(\omega, M) \subseteq A(\omega, M')$.

2. If network B increases in value relative to network A, then $B^{Pure} \subseteq B'^{Pure}$, $B^{Gen} \subseteq B'^{Gen}$ and, for each $\omega \in \Omega'$, $B(\omega, M) \subseteq B(\omega, M')$.
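The group-by-group logic behind Theorem 3.2 can be seen in a small simulation. The sketch below uses threshold agents as a stand-in for full expected-utility maximization; the agent names and threshold values are our own illustrative assumptions. Raising the relative value of network A is modeled as lowering every agent's adoption threshold, and the set of A-adopters weakly expands:

```python
def run(omega, thresholds):
    """Process groups sequentially; each agent adopts A iff a - b >= its
    threshold, where (a, b) counts decisions observed from earlier groups."""
    a = b = 0
    adopters = set()
    for group in omega:
        decisions = {x: ("A" if a - b >= thresholds[x] else "B") for x in group}
        adopters |= {x for x, d in decisions.items() if d == "A"}
        a += sum(1 for d in decisions.values() if d == "A")
        b += sum(1 for d in decisions.values() if d == "B")
    return adopters

omega = (("x1", "x2"), ("x3",), ("x4", "x5"))
base = {"x1": 0, "x2": 1, "x3": 1, "x4": 2, "x5": 0}
boosted = {x: t - 1 for x, t in base.items()}   # network A gains relative value

assert run(omega, base) <= run(omega, boosted)  # A(omega, M) is a subset of A(omega, M')
```

In this particular run the boost tips the first group into unanimity, after which every later group follows, which is exactly the cascade mechanism the theorem formalizes.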

Proof. We prove statement (1); the proof of (2) is similar. Let $\omega = (g_1, \ldots, g_n) \in \Omega'$. In group $g_1$, any agents who chose A in model M will still choose A in M', due to the relative increase in the value of network A. Consequently, agents in group $g_2$ will observe at least as many A's in model M' as they would in M, so that (by the Expectations Property, together with the relative increase) any agents in $g_2$ who select A in model M will also select A in M'. Continuing in this fashion, it is clear that $A(\omega, M) \subseteq A(\omega, M')$, and that if ω is an A cascade in M, then ω is an A cascade in M'.

An important result follows immediately from this theorem. For notational convenience, let $E[A \mid M] = \sum_{\omega \in \Omega'} \Gamma(\omega)\,|A(\omega, M)|$ and $E[B \mid M] = \sum_{\omega \in \Omega'} \Gamma(\omega)\,|B(\omega, M)|$ denote the expected number of agents who adopt good A and B, respectively, in a model M. Also, let $P(A \mid M) = \sum_{\omega \in A} \Gamma(\omega)$ and $P(B \mid M) = \sum_{\omega \in B} \Gamma(\omega)$ denote the probabilities of A and B cascades, respectively.

Corollary 3.3. Let $M = \langle X, (u_x), (e_x), \Omega', \Gamma \rangle$ and $M' = \langle X, (u'_x), (e'_x), \Omega', \Gamma \rangle$ be models satisfying the Expectations Property. Then

1. If network A increases in value relative to network B, then $E[A \mid M'] \ge E[A \mid M]$, $P(A^{Pure} \mid M') \ge P(A^{Pure} \mid M)$, and $P(A^{Gen} \mid M') \ge P(A^{Gen} \mid M)$.

2. If network B increases in value relative to network A, then $E[B \mid M'] \ge E[B \mid M]$, $P(B^{Pure} \mid M') \ge P(B^{Pure} \mid M)$, and $P(B^{Gen} \mid M') \ge P(B^{Gen} \mid M)$.

Corollary 3.3 captures many different comparative static results. If, for example, agent preferences change so that every agent places greater value on good A, then, other things equal, this causes an increase in the value of network A relative to network B, and Corollary 3.3 applies. Or, it could be that network effects become more pronounced or valuable for network A, which also qualifies as a relative increase. Agents could also change how expectations are formed; if, for instance, agents become biased in that they always predict a

greater number of other agents to adopt good A, then this also would cause a relative increase in the value of network A. 7

In 2-Type models, there is also an intuitive relationship between player types and coordination: if, for instance, the set of Type A agents expands, then there is a greater likelihood of Type A cascades. This result is given by Theorem 3.4 (see the appendix for a proof):

Theorem 3.4. Let $M = \langle X, (u_x), (e_x), \Omega', \Gamma \rangle$ and $M' = \langle X, (u'_x), (e'_x), \Omega', \Gamma \rangle$ be 2-Type models satisfying the Expectations Property and the Cascade Property. Then

1. If $X'_A = X_A \cup S$ for some nonempty $S \subseteq X_B$, and if $u'_x = u_x$ and $e'_x = e_x$ for each $x \in X \setminus S$, then $E[A \mid M'] \ge E[A \mid M]$, $P(A^{Pure} \mid M') \ge P(A^{Pure} \mid M)$, and $P(A^{Gen} \mid M') \ge P(A^{Gen} \mid M)$.

2. If $X'_B = X_B \cup S$ for some nonempty $S \subseteq X_A$, and if $u'_x = u_x$ and $e'_x = e_x$ for each $x \in X \setminus S$, then $E[B \mid M'] \ge E[B \mid M]$, $P(B^{Pure} \mid M') \ge P(B^{Pure} \mid M)$, and $P(B^{Gen} \mid M') \ge P(B^{Gen} \mid M)$.

OBSERVABILITY

Another variable to consider is the degree to which agents are able to observe one another. Suppose $\omega = (g_1, g_2, g_3, g_4)$ is a permutation of the agents. In this permutation, agents in group $g_2$ can observe the decisions made by group $g_1$, and agents in group $g_3$ can observe the decisions of all agents in groups $g_1$ and $g_2$. Now consider the modified permutation $\omega' = (g_1, g_2 \cup g_3, g_4)$. The only difference is that agents in group $g_3$ are no longer able to observe the decisions made by group $g_2$. However, the order in which the agents make decisions has not been drastically modified; if agent x makes a decision prior to (or simultaneously with) agent y in ω, then x still makes a decision prior to (or simultaneously with) y in ω'. We can therefore think of ω' as being the same as ω, except that in ω', some agents (those in $g_3$) are worse observers than they were in ω.

7 This last example would require each function $u_x(c, n_A, n_B)$ to be increasing in $n_A$ (and non-increasing in $n_B$) whenever c = A; this, of course, is a common assumption in network effect models.

By analyzing the structure of such modified permutations, along with some properties of their associated probability distributions, we shall prove that less observability implies less coordination. First, we make the above discussion precise:

Definition 3.7. Given a permutation $\omega = (g_1, \ldots, g_n)$ and an agent $x \in X$, define the rank $r(x, \omega)$ of agent x in ω by $r(x, \omega) = i \iff x \in g_i$. Then the merge set of ω is given by

$\mathrm{Merge}(\omega) = \{\, \omega' \in \Omega \mid \forall x, y \in X:\ r(x, \omega) \le r(y, \omega) \Rightarrow r(x, \omega') \le r(y, \omega') \,\}.$

Finally, if $\Omega' \subseteq \Omega$, then $\mathrm{Merge}(\Omega') = \bigcup_{\omega \in \Omega'} \mathrm{Merge}(\omega)$. As a notational convenience, let $\mathrm{Merge}_S(\Omega') = \mathrm{Merge}(\Omega') \cap S$ for any given $S \subseteq \Omega$.

The rank function $r(x, \omega)$ specifies which group number agent x belongs to in a permutation ω, and the merge set $\mathrm{Merge}(\omega)$ contains all possible permutations ω' which weakly preserve the rank orderings of ω. If $\omega' \in \mathrm{Merge}(\omega)$, then ω' is called a merged permutation of ω. The set $\mathrm{Merge}(\Omega')$ contains all possible merged permutations which can come about from a set Ω' of permutations, and since $\omega \in \mathrm{Merge}(\omega)$, it is clear that $\Omega' \subseteq \mathrm{Merge}(\Omega')$. 8

Previous comparative static theorems were derived by fixing both a set Ω' of permutations and a probability distribution Γ over Ω'. In our current endeavor, however, this cannot be done, since we are modifying Ω' to allow different permutations, which means we must also modify Γ. Specifically, given a set Ω', we wish to consider a nonempty set $\Omega'' \subseteq \mathrm{Merge}(\Omega')$, so that permutations in Ω'' will, in general, represent worse observability than those in Ω'.

8 Note that the merge set also preserves equal ranks, because if $r(x, \omega) = r(y, \omega)$, then $r(x, \omega) \le r(y, \omega)$ and $r(y, \omega) \le r(x, \omega)$; hence, for every $\omega' \in \mathrm{Merge}(\omega)$, we have $r(x, \omega') \le r(y, \omega')$ and $r(y, \omega') \le r(x, \omega')$, so that $r(x, \omega') = r(y, \omega')$. This means $\mathrm{Merge}(\omega)$ will contain permutations where sequences of adjacent groups have been merged via the union operator; that is, it will never be the case that, say, only one agent from a particular group (containing more than one agent) is placed into an earlier group.
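For small agent sets, Definition 3.7 can be made computational by enumerating ordered partitions and filtering on rank preservation. A sketch (all function names are ours):

```python
def ordered_partitions(items):
    """Yield every ordered partition of `items` as a tuple of groups (tuples)."""
    if not items:
        yield ()
        return
    first, rest = items[0], items[1:]
    for part in ordered_partitions(rest):
        for i in range(len(part) + 1):          # `first` forms a new group
            yield part[:i] + ((first,),) + part[i:]
        for i, g in enumerate(part):            # `first` joins an existing group
            yield part[:i] + (g + (first,),) + part[i + 1:]

def rank(x, omega):
    """r(x, omega): the index of the group containing agent x."""
    return next(i for i, g in enumerate(omega) if x in g)

def merge_set(omega, agents):
    """Merge(omega): permutations weakly preserving omega's rank orderings."""
    return [cand for cand in ordered_partitions(agents)
            if all(rank(x, cand) <= rank(y, cand)
                   for x in agents for y in agents
                   if rank(x, omega) <= rank(y, omega))]

# For omega = ({x, y}, {z}), the merge set contains omega itself and the
# single merged group {x, y, z}, in line with footnote 8.
omega = (("x", "y"), ("z",))
```

Note that the filter uses the one-directional implication from Definition 3.7; equal ranks are preserved automatically, exactly as footnote 8 argues.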

Just as merged permutations respect the original ordering of the agents, we also require that the new probability distribution respect the original probability distribution in a similar fashion. In particular, the probability weight Γ(ω) originally assigned to a permutation $\omega \in \Omega'$ may only be distributed among permutations in the set $\mathrm{Merge}_{\Omega''}(\omega)$; this is because any merged permutations originating from ω are of a similar order type, so that the probability weight associated with that order type is not drastically modified. This is made precise in the following definition:

Definition 3.8. Given nonempty sets $\Omega' \subseteq \Omega$ and $\Omega'' \subseteq \mathrm{Merge}(\Omega')$, a probability distribution Γ over Ω', and a permutation $\omega \in \Omega'$, a redistribution rule $R_\omega$ is a function $R_\omega : \mathrm{Merge}_{\Omega''}(\omega) \to [0, 1]$ such that

$\sum_{\rho \in \mathrm{Merge}_{\Omega''}(\omega)} R_\omega(\rho) = \Gamma(\omega).$

If $(R_\omega)_{\omega \in \Omega'}$ is a sequence of redistribution rules (one for each permutation in Ω'), then the probability distribution $\Gamma' : \Omega'' \to [0, 1]$ induced by $(R_\omega)_{\omega \in \Omega'}$ is given by

$\Gamma'(\omega') = \sum_{\omega \in \Omega'} R_\omega(\omega')$

for each $\omega' \in \Omega''$, with the understanding that $R_\omega(\omega') = 0$ if $\omega' \notin \mathrm{Merge}_{\Omega''}(\omega)$. Finally, a probability distribution Γ' over Ω'' is comparable to Γ if and only if there exists a sequence $(R_\omega)_{\omega \in \Omega'}$ of redistribution rules inducing Γ'.

Definition 3.8 describes the process by which comparable distributions on Ω'' are created. Namely, for each $\omega \in \Omega'$, the probability weight Γ(ω) assigned to ω may only be shifted to permutations in $\mathrm{Merge}_{\Omega''}(\omega)$, as the above intuition suggests. We are now prepared to prove the relationship between observability and coordination. The argument is divided into two steps, starting with the following lemma:

Lemma 3.5. Let M be a 2-Type model satisfying the Cascade Property, and suppose $\omega \in \Omega'$ is not a coordination cascade. Then each $\omega' \in \mathrm{Merge}(\omega)$ is likewise not a coordination cascade.

Proof. First, observe that in a 2-Type model satisfying the Cascade Property, a permutation is either an A cascade, a B cascade, or one in which each agent adopts the good corresponding to his own type. Thus, in ω, all agents adopt the good according to their type. Let $\omega = (g_1, \ldots, g_n)$ and $\omega' = (h_1, \ldots, h_m) \in \mathrm{Merge}(\omega)$; if $\omega' = \omega$, the statement is trivial. So, suppose $\omega' \ne \omega$. This means there is some smallest i for which $g_i \ne h_i$. By definition of the merge operation, $h_i = g_i \cup g_{i+1} \cup \cdots \cup g_{i+k}$ for some k. Now, agents in $h_i$ face the same observations $(A_i, B_i)$ as agents in $g_i$. So, if it were optimal for any of them to choose the good opposite his type, this would mean (by the Cascade Property) that the observations $(A_i, B_i)$ are sufficient to cause a coordination cascade, contradicting the fact that ω is not a cascade of either type. Thus, agents in $h_i$ each adopt the good according to their type, so that agents in $h_{i+1}$ (if there is such a group) face the same observations as agents in $g_{i+k+1}$. Repetition of this argument shows that in ω', all agents optimally adopt the good according to their type, so that ω' is not a coordination cascade.

Since coordination failure is preserved by the merge operation, a straightforward application of the comparability concept proves the following theorem:

Theorem 3.6. Let $M = \langle X, (u_x), (e_x), \Omega', \Gamma \rangle$ and $M' = \langle X, (u_x), (e_x), \Omega'', \Gamma' \rangle$ be 2-Type models satisfying the Cascade Property. If $\Omega'' \subseteq \mathrm{Merge}(\Omega')$ and Γ' is comparable to Γ, then the probability of coordination failure is at least as great in M' as it is in M.

Proof. Let $F = \Omega' \setminus (A \cup B)$ and $F' = \Omega'' \setminus (A' \cup B')$ denote the sets of permutations in models M and M' which are not coordination cascades. By Lemma 3.5, we have that

$\bigcup_{\omega \in F} \mathrm{Merge}_{\Omega''}(\omega) \subseteq F'.$
Since Γ' is induced by a sequence of redistribution rules, we also have that for each $\omega \in F$, the probability weight Γ(ω) is divided among members of $\mathrm{Merge}_{\Omega''}(\omega) \subseteq F'$ by Γ'; hence the total probability assigned to F by Γ is also assigned to F' by Γ', so that $P(F \mid M) \le P(F' \mid M')$.
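The bookkeeping in Definition 3.8 can be illustrated numerically. The permutation labels, merge sets, and weights below are hypothetical, chosen only to show that redistribution rules preserve total probability:

```python
# Hypothetical example: Omega' has two permutations; each redistributes its
# weight only over its own merge set inside Omega''.
gamma = {"w1": 0.6, "w2": 0.4}                         # Gamma on Omega'
merge_sets = {"w1": ["m1", "m2"], "w2": ["m2", "m3"]}  # Merge_{Omega''}(w)

# One redistribution rule R_w per permutation, summing to Gamma(w):
rules = {"w1": {"m1": 0.5, "m2": 0.1},
         "w2": {"m2": 0.2, "m3": 0.2}}

gamma_prime = {}
for w, rule in rules.items():
    assert set(rule) <= set(merge_sets[w])             # support restriction
    assert abs(sum(rule.values()) - gamma[w]) < 1e-12  # R_w sums to Gamma(w)
    for m, p in rule.items():
        gamma_prime[m] = gamma_prime.get(m, 0.0) + p   # Gamma'(m) = sum_w R_w(m)

# Gamma' is a probability distribution on Omega'':
assert abs(sum(gamma_prime.values()) - 1.0) < 1e-12
```

Because each rule's weights stay inside the corresponding merge set, any probability that Γ places on non-cascade permutations ends up on their merged (also non-cascade, by Lemma 3.5) counterparts, which is the mechanism behind Theorem 3.6.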

Theorem 3.6 establishes an intuitive link between observability and coordination: if, other things equal, agents are worse observers of other agents, then coordinated outcomes are less likely to occur. A special case of this is given in Theorem 3.7 (see the appendix for a proof):

Theorem 3.7. Let $\Omega_{1\text{-}1}$ denote the set of all one-by-one permutations (that is, those permutations where each group contains exactly one agent), and let $U(\Omega)$ and $U(\Omega_{1\text{-}1})$ denote uniform distributions on Ω and $\Omega_{1\text{-}1}$, respectively. Then $U(\Omega)$ is comparable to $U(\Omega_{1\text{-}1})$. Consequently, a 2-Type model satisfying the Cascade Property has a higher probability of coordination failure if it employs $U(\Omega)$ than if it employs $U(\Omega_{1\text{-}1})$.

Theorem 3.7 allows interesting comparisons to be made between models with one-by-one permutations (representing perfect observability) and models with arbitrarily grouped permutations (representing imperfect observability); these are examined in the numerical examples. Some combinatorial properties of coordination cascades, however, must first be developed in order to aid the computation of those examples.

3.3 COUNTING CASCADES

If a 2-Type model over Ω satisfies the Cascade Property, it is possible to characterize the sets $A^{Pure}$, $A^{Gen}$, $B^{Pure}$, and $B^{Gen}$ using combinatorial methods. Then, if a uniform distribution is put on Ω, these characterizations give easy formulas for finding the probability of cascades in arbitrary 2-Type models. These are employed in section 4 to illustrate a number of results about dynamic coordination. Central to this analysis is the concept of a form, which is a vector $F = \langle (a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n) \rangle$ for some $n \ge 1$, and for which $\sum_{i=1}^{n} a_i = N_A$ and $\sum_{i=1}^{n} b_i = N_B$. The interpretation is that each pair $(a_i, b_i)$ represents a group with $a_i$ Type A agents and $b_i$ Type B agents. Every form gives rise to many permutations which satisfy that form; so, our objective is to characterize all

forms which result in cascades, and to count how many permutations satisfy those forms. In general, an A cascade must be of the form $\langle (a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n), * \rangle$, where ∗ denotes an arbitrary form for the remaining $N - \sum_{i=1}^{n}(a_i + b_i)$ agents. Here, the interpretation is that the group $(a_n, b_n)$ causes the A cascade: all prior groups fail to cause a cascade of either type, and all agents after the n-th group choose A regardless of type, and regardless of how they are arranged (the Cascade Property assures that all cascades must behave this way, and that agents in groups $1 \le i \le n$ all adopt the good according to their type).

Suppose an agent has observed k adoption decisions, and let $A_k$ and $B_k$ represent the number of good A and good B adoption decisions observed, respectively. In 2-Type models satisfying the Cascade Property, there is an associated value $\bar{A}_k$ representing the minimum value of $A_k$ such that any Type B agent would adopt good A; similarly, $\bar{B}_k$ is the smallest value of $B_k$ for which a Type A agent would adopt good B based on these observations. (Note that $\bar{A}_k$ and $\bar{B}_k$ may not exist for small k; in that case, set them to ∞, with the interpretation that no observations could cause these early agents to adopt the good opposite their type.)

Now, each group $(a_i, b_i)$ in F for $1 \le i < n$ must not cause a cascade of either type; this means that for $1 \le i < n$ we must have

$\sum_{j=1}^{i} a_j < \bar{A}_{\sum_{j=1}^{i}(a_j + b_j)}$ and $\sum_{j=1}^{i} b_j < \bar{B}_{\sum_{j=1}^{i}(a_j + b_j)}.$

We also require that the group $(a_n, b_n)$ cause a cascade; that is, that agents in this group all adopt the good according to their type, but that the new total numbers of A and B adoptions cause all subsequent agents to adopt, say, good A (for Type A cascades). So, if an A cascade satisfies the form F, we must have that

$\sum_{i=1}^{n} a_i \ge \bar{A}_{\sum_{i=1}^{n}(a_i + b_i)}.$

Naturally, a similar condition exists for Type B cascades. This fully characterizes the forms for A and B cascades; so, let $F_A$ and $F_B$ denote the set of all A-

and B-cascade forms, respectively. For each form F in $F_A$ or $F_B$, let $\mathrm{Perm}(F) \subseteq \Omega$ denote the set of cascades satisfying the form F. It is then clear that the sets A and B of all Type A and Type B coordination cascades, respectively, are given by

$A = \bigcup_{F \in F_A} \mathrm{Perm}(F)$ and $B = \bigcup_{F \in F_B} \mathrm{Perm}(F).$

Moreover, it is obvious by construction that $\mathrm{Perm}(F) \cap \mathrm{Perm}(F') = \emptyset$ whenever $F \ne F'$, so that

$|A| = \sum_{F \in F_A} |\mathrm{Perm}(F)|$ and $|B| = \sum_{F \in F_B} |\mathrm{Perm}(F)|.$

All that remains is to determine the value of $|\mathrm{Perm}(F)|$ for an arbitrary form F. This is given in the following theorem; see the appendix for a proof.

Theorem 3.8. Let $F = \langle (a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n), * \rangle$ be a form. Then

$|\mathrm{Perm}(F)| = \dfrac{N_A!\,N_B!\;\zeta\!\left(N - \sum_{i=1}^{n}(a_i + b_i)\right)}{\left(N_A - \sum_{i=1}^{n} a_i\right)!\,\left(N_B - \sum_{i=1}^{n} b_i\right)!\,\prod_{i=1}^{n} a_i!\,b_i!},$

where $\zeta(n) = \sum_{k=0}^{n-1} \binom{n}{k}\zeta(k)$ (with $\zeta(0) = 1$) is the number of ordered partitions of a set containing n elements.

A special case of this counting procedure is to consider only those permutations in $\Omega_{1\text{-}1}$ where decisions are made one-by-one. A similar formula exists for $|\mathrm{Perm}(F)|$ in this scenario. Again, see the appendix for a proof.

Theorem 3.9. Let $F = \langle (a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n), * \rangle$ be a form where decisions are made one-by-one (that is, where every group, including those in ∗, contains only one agent). Then

$|\mathrm{Perm}(F)| = \dfrac{N_A!\,N_B!\,\left(N - \sum_{i=1}^{n}(a_i + b_i)\right)!}{\left(N_A - \sum_{i=1}^{n} a_i\right)!\,\left(N_B - \sum_{i=1}^{n} b_i\right)!}.$

These constructions allow the total number of A and B cascades to be computed relatively easily, given that the $\bar{A}_k$ and

B k determine the sets F A and F B, and Theorem 3.8 tells how many permutations satisfy each of those forms. Each A k and B k, of course, depends on all model specifications, including the manner in which agents form expectations. So, in the next section, we give some possible expectation formation rules, and provide several numerical examples of coordination issues in fully-specified models. 4 DYNAMIC COORDINATION: EXAMPLES In this section we develop some examples of sequential choice models. Naturally, they illustrate the results of section 3.2. Of more interest, we use these examples to explore the possibility that sequential choice may resolve the equilibrium selection problem that is at the heart of the literature on network effects. We restrict attention to network effect models of the form described in section 2, including properties (P1)-(P5) and (NE). We also present two possible algorithms agents may use to form expectations: the naïve algorithm and the sophisticated algorithm. 4.1 NAÏVE AGENTS Naïve agents use their observations and knowledge of their own type to form subjective probabilities of an agent being Type A or Type B. Then, by assuming that agents adopt the good according to their type, naïve agents use these probabilities in a binomial distribution to estimate the final number of A and B decisions. This is fitting when agents are only aware of their own preferences, the network effect functions, and the population size. Specifically, each agent x X (of Type t), having observed A x agents adopt good A and B x agents adopt B (with A x + B x = k), forms subjective probabili- 25

ties P x (A) = A x +1 k+1 if t = A A x k+1 if t = B and P x (B) = B x +1 k+1 if t = B B x k+1 if t = A. So, in a binomial distribution, the subjective probability for agent x that exactly i of the remaining N k 1 agents adopt good t is ( ) N k 1 ρ t x(i) = P x (t) i P x (t ) N k 1 i, i where t = t. Thus, the expected utility for agent x is eu x (A, A x, B x ) = if he adopts good A, and if he adopts B. eu x (B, A x, B x ) = N k 1 ρx A (i)u x (A, A x + i, N A x i 1) i=0 N k 1 ρx B (i)u x (B, N B x i 1, B x + i) i=0 4.2 SOPHISTICATED AGENTS Sophisticated agents have access to more information: they are aware of the preferences of both types of agents, they know the network effect functions, and they also know the population size, but not the values of N A and N B. Moreover, they use this information to predict decisions made by subsequent agents. Like naïve agents, sophisticated agents use observed decisions, together with knowledge of their own type, to form subjective probabilities of agents being a particular type. These probabilities are used to assign subjective probabilities to the different forms that the remaining N k 1 agents may satisfy. Since agents do not know the values of N A and N B, they consider all possible forms (a 1, b 1 ),..., (a n, b n ) with i=1 n (a i + b i ) = N k 1 (the value n, of 26