Hierarchical Bayesian Persuasion


Weijie Zhong

May 12, 2015

1 Introduction

Information transmission between an informed sender and an uninformed decision maker has been studied and applied extensively in economics. When neither side has commitment power, [Crawford and Sobel, 1982] showed the possibility of information transmission in the form of coarse intervals. When the sender has commitment power over how to garble his information conditional on the realization of the signal he receives, [Kamenica and Gentzkow, 2009] characterized the optimal signal structure the sender will choose using a simple geometric argument.

In real life, information transmission is more complicated than these stylized models. One key difference is that the interaction is likely to involve multiple agents. For example, people rely on multiple news reports of a single event to form an unbiased understanding of it, while at the same time the person who reports directly to the public, such as a journalist or an anchor, is very unlikely to be the person who discovered the first-hand information. Within this complex network of interactions, everyone has his own interest and his own knowledge. The possibly conflicting pursuits of unbiasedness, political correctness, freshness, or profitability might lead to all kinds of patterns in information transmission. A natural question to ask is then how information is transmitted when there are multiple agents, multiple layers of agents, or even a complex network of agents in the game.

The multiple-receiver case is not too different from the single-receiver case: no matter how we aggregate the receivers' actions, it is equivalent to modifying the sender's preference in a standard way. The more interesting problem is therefore the multiple-sender case. When multiple senders share the same initial information and communicate with the receiver symmetrically, the key properties of information transmission are well characterized. In the cheap-talk case, [Ambrus and Takahashi, 2008] showed that a rational receiver can assign undesirable beliefs to off-path signals and force the senders to agree on telling the truth. When senders have commitment power, [Gentzkow and Kamenica, 2011] characterized the game as if there were a pool of information to which each sender can individually add information; the equilibrium outcome then becomes weakly more informative as competing senders are added.

However, we are still interested in the case where senders are not symmetric. Consider an organization with agents searching for relevant information and a decision maker: it is most likely organized in a hierarchical form. Figure 1 shows a typical hierarchical information transmission structure. Each sender in the network has his own private source of information and can also receive information from downstream senders. Information is transmitted from downstream senders to upstream senders and eventually to the decision

maker. Each sender in this network has his own preference over the decision maker's action and can distort the information he sends to the following senders.

[Figure 1: Hierarchical information transmission]

If senders have no commitment power, then we are in the hierarchical cheap talk case. Research has been done on generally characterizing equilibrium and on how adding an intermediary sender can possibly improve information transmission. [Ambrus et al., 2013] characterized hierarchical cheap talk equilibria in general: with pure strategies and state-independent biases, there exists a bottleneck intermediary, the order of senders does not matter, and intermediation hurts communication; with mixed strategies these results are not generally true. [Ivanov, 2010] showed that a first-best non-strategic mediator can be implemented using a strategic mediator. [Li, 2009] showed strategic complementarity between a biased sender's and a biased intermediary's truthfulness and studied the biased sender's choice of communication channel.

This paper studies hierarchical information transmission in a Bayesian persuasion framework with commitment power on the senders' side. We start from a simplified case in which only one sender is informed and all other senders are as uninformed as the decision maker. Thus, as in Figure 2, the network is simplified to a chain. Each sender in this chain can design and commit to a signal structure garbling the signal he receives, and passes the result to the following senders.

[Figure 2: Hierarchical information transmission along a chain of senders]

The paper is organized as follows. In Section 2, we set up the model of hierarchical Bayesian persuasion and characterize the equilibrium by introducing the notion of a capacity set. In Section 3, we characterize some properties of capacity sets and show some direct corollaries reminiscent of the main results of [Ambrus et al., 2013]. In Section 4, we perform comparative statics with respect to initial information and capacity. In Section 5, we conclude and summarize some potential directions for future work.

2 Model of Persuasion

2.1 Setup

State of the world: There are finitely many possible states of the world $\omega \in \Omega$. All players share a common prior belief over states, $\mu_0 \in \operatorname{int} \Delta(\Omega)$.

Receiver: There is one receiver with continuous utility function $v(a, \omega)$. The action set $A$ is assumed to be compact. Given her belief $\mu \in \Delta(\Omega)$ about the true state, the receiver chooses an optimal action to maximize her expected utility $E_\mu[v(a, \omega)]$:

$$a(\mu) \in \arg\max_{a \in A} E_\mu[v(a, \omega)]$$

The tie-breaking rule is important in the setup of [Kamenica and Gentzkow, 2009]. Note that a sender-preferred action does not exist here because we have potentially multiple senders. Let us be vague about this for now and assume that the receiver sticks to some rule that determines a unique $a(\mu)$ as her response.

Senders: There are $N$ senders with continuous utility functions $u_n(a, \omega)$. Given that the receiver and the senders hold a common belief $\mu$, sender $n$'s expected utility is $E_\mu[u_n(a(\mu), \omega)]$. Without loss of generality, we can write a sender's utility in terms of beliefs:

$$U_n(\mu) = E_\mu[u_n(a(\mu), \omega)]$$

We assume for now that $\{U_n\} \subset C(\Delta(\Omega))$, where $\Delta(\Omega)$ is a subset of a finite-dimensional Euclidean space.

Game setup and strategies: All senders act sequentially in the order $1, \ldots, N$. Each sender's action set is a sufficiently large signal set $S$. At the start of the game, a piece of information is made available to the first sender. By information, we refer to a distribution $q \in \Delta^2(\Omega) := \Delta(\Delta(\Omega))$ which induces beliefs $\pi \in \operatorname{Supp}(q)$ with density $q(\pi)$. A history $h_n$ collects the initial information $q$ and all senders' actions prior to the $(n+1)$-th sender; the history $h_N$ also contains the realization of signals. A sender's strategy is a mapping $\sigma_n : h_{n-1} \to S$, and the receiver's strategy is a mapping $\sigma_R : h_N \to A$. We make a first subgame-perfection assumption: conditional on the history and the other players' strategies, every agent must be rational. Note that the senders face no uncertainty, while the receiver must act with respect to her belief about the true state.
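To fix ideas, the reduction from primitive utilities to utilities over beliefs is easy to compute in the finite case. The following sketch is my own illustration, not part of the paper: the payoff matrices and the lowest-index tie-breaking rule are made-up assumptions. It computes the receiver's best response $a(\mu)$ and a sender's induced utility $U(\mu) = E_\mu[u(a(\mu), \omega)]$ for a binary state.

```python
import numpy as np

# Binary state, three actions. Rows: actions, columns: states. (toy numbers)
v = np.array([[1.0, 0.0],    # receiver payoff v(a, omega)
              [0.0, 1.0],
              [0.6, 0.6]])
u = np.array([[0.0, 0.0],    # sender payoff u(a, omega)
              [1.0, 1.0],
              [0.4, 0.4]])

def a_of_mu(mu):
    """Receiver's best response to belief mu = Pr(omega = 1);
    ties broken by lowest action index (an arbitrary fixed rule)."""
    beliefs = np.array([1 - mu, mu])
    return int(np.argmax(v @ beliefs))

def U(mu):
    """Sender's utility as a function of the belief alone:
    U(mu) = E_mu[u(a(mu), omega)]."""
    beliefs = np.array([1 - mu, mu])
    return float(u[a_of_mu(mu)] @ beliefs)

print([U(m) for m in (0.2, 0.5, 0.8)])  # a step function in mu, in general
```

Because $a(\mu)$ jumps between actions, $U$ is typically discontinuous and non-concave in $\mu$, which is what makes the garbling problems below non-trivial.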

The timing of the game can be understood as follows:

1. The first sender commits to a signal structure conditional on every possible piece of initial information.
2. Each sender sequentially commits to a signal structure conditional on all previous senders' signal structures.
3. An initial information structure is chosen by nature and announced to all agents. The signal is released to the first sender.
4. Every sender chooses a signal according to the signal he receives from the previous sender, following his strategy.
5. The receiver acts when she observes sender $N$'s signal.

2.2 Simplification

Receiver: Given the perfection assumption on the receiver, she must be rational given all senders' chosen signal structures and the realized signal. The realized signal thus induces a receiver belief over the true state, $\mu \in \Delta(\Omega)$. As discussed before, the receiver is passive in this game, choosing $a(\mu) \in \arg\max_{a \in A} E_\mu[v(a, \omega)]$.

The Last Sender: Let us first consider the last sender's optimization problem, holding all other players' strategies fixed. By our perfection assumption, the last sender must choose his signal structure rationally given any possible initial information and any signal structures chosen before him. Suppose that in some history the initial information is given and players $1, \ldots, N-1$ send a variety of signals according to their action choices. What player $N$ observes is player $N-1$'s possible signals, and by Bayes' rule these signals induce a spectrum of beliefs for player $N$. This is exactly the model of [Kamenica and Gentzkow, 2009] with imperfect information. The optimization problem reduces to:

$$\max_{p \in \Delta^3(\Omega)} E_{p(\lambda)}\left[ U\left( E_{\lambda(\pi)}[\pi] \right) \right] \quad \text{s.t.} \quad E_{p(\lambda)}[\lambda] = q \tag{1}$$

where $\Delta^3(\Omega) := \Delta(\Delta^2(\Omega))$. This problem is solved by the usual convex-hull argument:

Proposition 1. Given $q \in \Delta^2(\Omega)$, sender $N$ chooses $p(\lambda) \in \Delta^3(\Omega)$ such that it induces a distribution over beliefs $l^*(q) \in \Delta^2(\Omega)$ with $l^* \preceq_B q$, and it attains the utility level $\operatorname{cov} \hat{U}(q)$, where $\hat{U}(\lambda) = U(E_\lambda[\pi])$ and $\operatorname{cov} \hat{U}$ denotes its concave closure.

Proposition 1 says that, effectively, we are searching over all distributions over beliefs that are Blackwell dominated by the initial information $q$ for one that attains the concave closure of the utility function $\hat{U}$. Here $\hat{U}$ expands the original utility function $U$ to the space of all possible distributions over beliefs, whose dimensionality is the size of the support of the signal, $\operatorname{card}(\operatorname{Supp}(q))$.
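Proposition 1 reduces the last sender's problem to a concavification. For a binary state, the concave closure can be computed numerically as the upper hull of the graph of $U$. The sketch below is my own illustration with an arbitrary made-up utility function: it evaluates $\operatorname{cov} U$ at the prior, i.e., the value of optimal persuasion when the initial information is fully informative.

```python
import numpy as np

def concave_closure(u_vals, grid):
    """Upper concave envelope of U on a 1-d belief grid (binary state),
    computed as the upper hull of the graph {(p, U(p))} via a monotone
    chain. Its value at the prior is the sender's optimal payoff when
    the initial information is fully informative."""
    hull = []
    for p, u in zip(grid, u_vals):
        while len(hull) >= 2:
            (p1, u1), (p2, u2) = hull[-2], hull[-1]
            # pop (p2, u2) if it lies weakly below the chord (p1,u1)-(p,u)
            if (u2 - u1) * (p - p1) <= (u - u1) * (p2 - p1):
                hull.pop()
            else:
                break
        hull.append((p, u))
    hp, hu = zip(*hull)
    return np.interp(grid, hp, hu)   # envelope evaluated back on the grid

grid = np.linspace(0.0, 1.0, 1001)
U = grid * np.sin(3 * np.pi * grid) ** 2   # an arbitrary non-concave utility
cavU = concave_closure(U, grid)
print(U[500], cavU[500])                   # U(0.5) vs. persuasion value at mu0 = 0.5
```

For a binary-support $q$ the same routine applies on the interval between its two induced beliefs, since any garbling of a binary signal is supported there.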

Given the set of optimal solutions of (1), the informativeness of a solution $p$ is defined by the informativeness of the distribution over beliefs it induces, i.e., the law of $E_\lambda[\pi]$ under $p$. Lemma 1 says that within the set of optimal signal structures there exists a most informative one.

Lemma 1. There exists $l^*$ in the set of distributions over beliefs induced by solutions of (1) such that no other element of this set is strictly Blackwell more informative than $l^*$.

The proof of Lemma 1 is an application of Zorn's lemma. The existence of an optimal $l$ is guaranteed by the theorem in [Kamenica and Gentzkow, 2009]; we can then pick a maximal element from the set of optimal experiments. Let us make a further assumption on equilibrium perfection:

Assumption 1. Whenever most informative signal structures exist in the set of optimal actions, agents choose a most informative signal structure in equilibrium.

Thus, although we can say nothing yet about the other senders' action choices, we know that the last sender chooses a most informative optimal action given any prior information. We also know that for any most informative optimal experiment, if the distribution over beliefs it induces is itself taken as the prior information, the persuader can do at most as well as the experiment itself; being Blackwell maximal among optimal actions, it must then be chosen under that prior information. Let us summarize all these actions by:

$$L^0_N = \{ l \mid l \in \Delta^2(\Omega) \text{ solves } (1) \text{ for some } q \text{ and utility } U_N \}$$

$L^0_N$ is the set of all distributions over beliefs that sender $N$ may possibly induce given some prior information. This set summarizes all the information about sender $N$ relevant to the other senders in the game. Whenever sender $N$ receives a signal structure that induces a distribution over beliefs outside $L^0_N$, he chooses an optimal signal structure that induces some $l \in L^0_N$. Whenever sender $N$ receives a signal structure that induces a distribution over beliefs in $L^0_N$, he chooses a signal structure that induces exactly the same distribution, by our tie-breaking assumption. To use $L^0_N$ to solve for equilibrium, we also need a lemma on the properties of the set $L^0_N$.

Lemma 2. If $U_N \in C(\Delta(\Omega))$, then $L^0_N$ is bounded and closed in $\Delta^2(\Omega)$, and thus compact.

2.3 Equilibrium

The Second-to-Last Sender: Consider sender $N-1$'s problem. He can choose any Bayes-plausible information structure $p \in \Delta^3(\Omega)$, taking sender $N$'s reaction into account. Whenever player $N-1$ chooses an information structure that induces a distribution over beliefs $l \notin L^0_N$, player $N$ chooses some information structure that induces a distribution over beliefs in $L^0_N$. Thus, by outcome equivalence, it is without loss of generality to assume that player $N-1$'s optimal strategy induces a distribution over beliefs in $L^0_N$.

Lemma 3. Given sender $N$'s capacity $L^0_N$ and initial information $q_{N-1}$, it is optimal for sender $N-1$ to use a direct mechanism as his strategy, inducing some $l \in L^0_N$.

With Lemma 3, we can rewrite sender $N-1$'s optimization problem:

$$\max_{l \in L^0_N,\ p \in \Delta^3(\Omega)} E_l[U_{N-1}(\pi)] \quad \text{s.t.} \quad E_p[\lambda] = q_{N-1}, \quad E_\lambda[\pi] \sim l \tag{2}$$

Thus, given $q_{N-1}$ and the compact set $L^0_N$, we can solve for $p^*_{N-1} \in \Delta^3(\Omega)$ and the induced $l^*_{N-1} \in L^0_N$. As before, we can use Zorn's lemma to set the tie-breaking rule: a sender never chooses a strictly Blackwell dominated $l$. Thus we can recursively define the capacity sets:

$$L^0_{N-1} = \{ l \mid l \in \Delta^2(\Omega) \text{ solves } (1) \text{ for some } q \text{ and utility } U_{N-1} \}$$
$$L_{N-1} = \{ l \mid l \in L^0_N \text{ solves } (2) \text{ for some } q \text{ and utility } U_{N-1} \}$$

The definition of $L^0_{N-1}$ is exactly the same as the definition of $L^0_N$, except that we use sender $N-1$'s utility in the optimization. Thus $L^0_{N-1}$ is the set of distributions over beliefs that sender $N-1$ is willing to induce under some initial information, without any further constraint. $L_{N-1}$ is defined recursively using $L^0_N$: it is the set of distributions over beliefs that sender $N-1$ is willing to induce under some initial information, knowing that sender $N$ stands between him and the decision maker.

The Equilibrium: Now we can solve for the whole equilibrium of a chain of senders, recursively defining the capacity sets of the chain as follows:

Definition 1. The individual capacity set $L^0_n$ is defined as:

$$L^0_n = \{ l \mid l \in \Delta^2(\Omega) \text{ solves } (1) \text{ for some } q \text{ and utility } U_n \}$$

The chain capacity set $L_n$ is defined as (with the convention $L_{N+1} := \Delta^2(\Omega)$, so that $L_N = L^0_N$):

$$L_n = \{ l \mid l \in L_{n+1} \text{ solves } (2) \text{ for some } q \text{ and utility } U_n \}$$

Thus, given initial information $q$, the first sender's problem can be written as:

$$\max_{l \in L_2,\ p \in \Delta^3(\Omega)} E_l[U_1(\pi)] \quad \text{s.t.} \quad E_p[\lambda] = q, \quad E_\lambda[\pi] \sim l \tag{3}$$

Let the first sender's optimal action be $p^*_1(q, L_2)$ with induced distribution over beliefs $l^*_1(q, L_2)$. We then have a full characterization of equilibrium:

Proposition 2. The SPNE is given by a set of strategies $\{p^*_n \in \Delta^3(\Omega)\}$ defined according to $\{p^*_n(q, L_{n+1})\}$. On the equilibrium path, the first persuader chooses according to $p^*_1$ and all following persuaders truthfully pass on the information they receive.
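The recursive structure of Definition 1 is easy to compute in the binary-state, binary-signal setting used in the paper's examples. The following brute-force sketch is my own construction: the belief grid, tolerance, and the two toy utility functions are illustrative assumptions, not taken from the paper. It uses the fact that, for a binary state, a garbling of a binary signal inducing the posterior pair $(p_1, p_2)$ can only reach pairs nested inside $[p_1, p_2]$, and that a pair belongs to a capacity set exactly when the sender keeps it upon being handed that pair itself.

```python
import numpy as np
from itertools import product

# Binary state: a belief is Pr(omega = 1). A binary signal is a posterior
# pair (p1, p2) with p1 <= mu0 <= p2; Bayes plausibility pins down weights.
mu0 = 0.5
grid = np.linspace(0.0, 1.0, 51)
pairs = [(p1, p2) for p1, p2 in product(grid, grid)
         if p1 <= mu0 <= p2 and p1 < p2] + [(mu0, mu0)]

def value(U, p1, p2):
    """Sender's expected utility from inducing the pair (p1, p2)."""
    if p1 == p2:
        return U(mu0)                    # the uninformative signal
    w2 = (mu0 - p1) / (p2 - p1)          # Bayes-plausible weight on p2
    return (1.0 - w2) * U(p1) + w2 * U(p2)

def capacity(U, feasible):
    """Pairs in `feasible` that the sender keeps when handed exactly that
    pair: a garbling of (p1, p2) only reaches pairs nested in [p1, p2]."""
    kept = set()
    for p1, p2 in feasible:
        nested = [(a, b) for a, b in feasible if p1 <= a and b <= p2]
        if value(U, p1, p2) >= max(value(U, a, b) for a, b in nested) - 1e-12:
            kept.add((p1, p2))
    return kept

# Toy preferences (mine). Last sender B: L_B = L^0_B. Then the chain
# capacity of (A, B): pairs A keeps when constrained to B's capacity.
U_A = lambda p: np.sin(2 * np.pi * p) ** 2
U_B = lambda p: (p - 0.5) ** 2
L_B = capacity(U_B, pairs)
L_A_chain = capacity(U_A, L_B)
print(len(pairs), len(L_B), len(L_A_chain))
```

With the convex $U_B$ above, $L_B$ contains every pair, so the chain constraint does not bind; replacing $U_B$ with a non-convex function shrinks $L_B$ and can change the chain capacity, a computation in the spirit of Figures 4 and 7 below.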

3 Properties of Capacity Set

In this section, we derive some important properties of the capacity sets defined in the previous section. Let us start with two direct corollaries.

Corollary 1. $\bigcap_{n=1}^N L^0_n \subseteq L_1$.

Corollary 1 follows directly from the construction of the capacity sets. By Lemma 3, we know that each $L_n \subseteq L_{n+1}$. Moreover, if $q \in L^0_n \cap L_{n+1}$, then $q \in L_n$: $q$ can be implemented as a direct mechanism by sender $n$, and it is optimal for him. By induction, the intersection of the individual capacity sets is a subset of the chain capacity set.

Although Corollary 1 is straightforward, its interpretation is non-trivial. Since $\bigcap_{n=1}^N L^0_n$ might be a strict subset of $L_1$, there might be some initial information $q \in \Delta^2(\Omega)$ that some senders in the chain are unwilling to pass through when acting alone, yet which can be passed through by the chain. In particular, a weaker implication is that, by adding senders after a first sender, the first sender might become willing to pass through information that he would not pass through as the only sender. Thus, locally (for some initial information), adding more senders might be strictly desirable.

Let us consider a simple example with two senders. Assume sender A is willing to pass through all information as long as the signal is not very informative; for any signal Blackwell more informative than some threshold, he passes through only the threshold information. Thus, with A as the only sender, no signal more informative than the threshold is passed through. Now consider a second sender B who is only willing to pass through very informative signals or the uninformative prior. If the utility from the prior is relatively low for sender A, he would rather pass a very informative signal through to B than make it less informative and hence undesirable for B. In fact, we can make B's preferences so extreme that the chain passes through almost the perfectly informative signal.

[Figure 3: An example with two senders]

Figure 3 illustrates a binary-state example. The horizontal axis is the space of beliefs, parametrized by the probability of the true state being 1; the vertical axis is the utility level. The red curve represents the utility of sender A and the blue curve the utility of sender B. Assume the initial information is perfect (it induces the two beliefs $p = 0$ and $p = 1$) and the prior is $p = 0.5$. For sender A, as we can see from the graph, utility is convex on the interval $[0.3, 0.7]$, so he is willing to pass through any signal inducing beliefs within this interval; given a very informative signal, however, he induces the beliefs $p = 0.2$ and $p = 0.8$. For sender B, utility is convex for signals inducing $p_1 < 0.2$ or $p_2 > 0.9$, so he is willing to pass those through. For any other

signals, he would rather induce the prior. Their capacity sets are plotted explicitly in Figure 4.

[Figure 4: Capacity sets of Example 1]

Each point in Figure 4 represents the pair of beliefs induced by a binary signal structure. The horizontal axis is the lower belief $p$ induced by the signal, and the vertical axis is the higher belief induced by the signal minus the prior belief. Thus the upper-left corner represents the perfectly informative signal, while the regions $y = 0$ and $x = 0.5$ represent the completely uninformative signals. The red area is the set of belief pairs that sender A is willing to induce, $L^0_A$; as we can see, it covers relatively uninformative pairs of beliefs. The blue area is the set of belief pairs that sender B is willing to induce, $L^0_B$; it covers relatively informative pairs of beliefs (and, of course, the prior). It is easy to see that $L^0_A \cap L^0_B$ contains only the prior. However, the black area is the set of belief pairs that sender A is willing to induce knowing that sender B follows him, $L_A$. We can see that $L_A$ contains some belief pairs strictly more informative than any pair in $L^0_A$. Thus, whenever the initial signal is more informative than these signals, including a second sender is strictly better in terms of information transmission.

Naturally, we want to ask whether this local result extends to a global one: is it possible that a chain performs weakly better than a single persuader under any initial information, and strictly better under some initial information? The following corollary answers this question.

Corollary 2. If there exists $n \in \{1, \ldots, N\}$ such that $L^0_n \subseteq L_1$, then $L^0_n = L_1$.

Corollary 2 says that a chain can never perform strictly better than a single sender (from this chain) across all possible initial information. If a chain performs weakly better than a single sender $n$, then given any initial information, the outcome signal from the chain must be Blackwell no less informative than the outcome signal from the single sender. Restricting the initial information to $L^0_n$, this must still hold; and if $L^0_n \not\subseteq L_1$, there exists some $l \in L^0_n$ that is not passed through by the chain. This makes the interpretation of the condition $L^0_n \subseteq L_1$ clear. The proof of Corollary 2 is also straightforward: for sender $n$, when making his decision given any initial information,

he knows that his unconstrained optimal signal can be passed through by the other players, so he chooses that optimal signal. Thus $L_1 \subseteq L^0_n$, and the two sets must coincide. The intuition behind this corollary is also simple: if a chain is to improve information transmission locally, it must be able to credibly threaten a sender by garbling his signal in an undesirable way. But this ability means precisely that the chain refuses to pass through some initial information that a single sender would pass through.

So far we have derived some relationships between a chain's capacity and the individual senders' capacities, but the concept of a capacity set is still abstract. Let us characterize some important properties of individual capacity sets.

Corollary 3. If the initial information satisfies $q \in L_1$, then in equilibrium the first sender induces the distribution over beliefs $l^* = q$, and all following persuaders truthfully pass through the information they receive.

Corollary 3 says that the capacity set characterizes exactly the initial information that can be truthfully passed through a chain. The corollary follows directly from the definition of the capacity set and our tie-breaking rule, Assumption 1. Given initial information $q \in L_1$, suppose the optimal signal structure chosen by the first sender is not $q$; then $l^*$ must be strictly less informative than $q$ and generate strictly higher utility. But then, under any initial information for which $q$ is optimal, $l^*$ is also feasible and strictly better than $q$ for the first sender, contradicting $q \in L_1$.

Proposition 3 (capacity is defined by support). For any $l \in L^0_n$, if $l' \in \Delta^2(\Omega)$ satisfies:

1. $\operatorname{Supp}(l') = \operatorname{Supp}(l)$,
2. $E_{l'}[\pi] = \mu_0$,

then $l' \in L^0_n$.

Proposition 3 says that the individual capacity set is determined entirely by the supports of the distributions over beliefs it contains. Information structures in $L^0_n$ fall into two cases:

1. The number of beliefs induced by the signal is less than or equal to the number of states. Then, given a set of beliefs, the distribution over these beliefs is uniquely pinned down by Bayes plausibility.
2. The number of beliefs induced by the signal is larger than the number of states. Then there are remaining degrees of freedom under Bayes plausibility. However, by Corollary 3, every such Bayes-plausible distribution over beliefs is a best response to some initial information, i.e., they are all contained in the individual capacity set.

The characterization of the individual capacity set is therefore greatly simplified: for a finite number of signals, the information structures can be characterized in a finite-dimensional space.
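Case 1 above is a small linear-algebra fact: with as many induced beliefs as states, Bayes plausibility $E_l[\pi] = \mu_0$ together with the sum-to-one constraint pins down the weights uniquely. A minimal sketch with made-up numbers (three states, three induced beliefs; all values are my own illustration):

```python
import numpy as np

# Rows are the induced beliefs (each a distribution over 3 states).
beliefs = np.array([[0.1, 0.6, 0.3],
                    [0.5, 0.3, 0.2],
                    [0.9, 0.05, 0.05]])
mu0 = np.array([0.5, 0.3125, 0.1875])   # prior, chosen inside the hull

# Bayes plausibility: sum_i w_i * beliefs[i] = mu0, with sum_i w_i = 1.
A = np.vstack([beliefs.T, np.ones(3)])  # 4 equations, 3 unknowns (consistent)
b = np.append(mu0, 1.0)
w, *_ = np.linalg.lstsq(A, b, rcond=None)
print(w)                                # unique weights, here (0.25, 0.5, 0.25)
print(w @ beliefs)                      # recovers mu0
```

With more induced beliefs than states, the same system becomes underdetermined, which is exactly the extra degree of freedom that case 2 refers to.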

Now we know that the individual capacity sets have this convenient property, and hence so does any intersection of individual capacity sets. It would therefore be useful to carry out further analysis directly on the intersection of individual capacity sets. Let us derive a sufficient condition under which the capacity of a chain equals the intersection of the individual capacities.

Corollary 4. If there exists $n^*$ such that $L^0_{n^*} \subseteq L^0_n$ for all $n \in \{1, \ldots, N\}$, then $L_1 = L^0_{n^*}$.

Corollary 4 simply says that whenever there is a bottleneck sender in a chain, the chain capacity coincides with the intersection of the individual capacities. The proof is straightforward by combining Corollary 1 and Corollary 2.

4 Comparative Statics

In this section, we study two factors that potentially affect information transmission through a chain:

1. Fixing a chain, if the initial information $q$ becomes more informative, what happens to the outcome $l^*$?
2. Fixing the initial information $q$, if the chain capacity $L_1$ becomes larger, what happens to the outcome $l^*$?

These questions can also be asked of a chain with a single persuader; yet even with a single sender, such comparative statics have not been well studied. In these two questions, "more informative" refers to Blackwell informativeness, "larger" refers to set inclusion, and we are interested in the Blackwell informativeness of the outcome. As one might expect, the answers are quite ambiguous, because the Blackwell order is a very incomplete partial order, especially in a high-dimensional signal space. In our model, even though the state space is small, the initial signal can be high-dimensional, which makes studying Blackwell informativeness hard. Let us first see an example of why comparative statics is ambiguous even in the simplest setup.

Single sender and ternary states: It is easy to construct ternary-state examples in which more informative initial information does not lead to a more informative outcome, because Blackwell sufficiency in a ternary-state belief space is equivalent to inclusion of triangles, which is a very demanding requirement. Assume there are three states $\{S_1, S_2, S_3\}$, so all beliefs over the three states can be represented in a two-dimensional simplex with vertices $S_1$, $S_2$, $S_3$. In this simplex, independently of how we parametrize the space, each point can be uniquely represented as a linear combination of $S_1$, $S_2$, and $S_3$, so we can use the three coefficients to represent the probabilities of the corresponding states. The largest blue triangle in Figure 5 represents the space of all possible beliefs over the three states. A full mix of the three states, $\mu_0$, is assumed to be the prior belief. The sender's utility is represented by the arrows in Figure 5: the utility function is peaked at these arrows and decays very fast to zero. Since we assumed utility to be continuous, we construct it as a continuous approximation of a sum of Dirac delta functions. The utility level at the three red arrows is 0.5 and at the three black arrows is 0.8. The prior belief is the point $\mu_0$ in the left panel and the black line in the right panel.

[Figure 5: Ternary-state example. Left panel: the belief simplex with prior µ0, red peaks A1, A2, A3 and black peaks B1, B2, B3. Right panel: the fully revealing information structure.]

The initial information $q$ contains three signals inducing beliefs $A_1, A_2, A_3$, located exactly at the three red peaks of the utility function. Thus, given initial information $q$, the sender chooses to induce $l^* = q$ and obtains expected utility 0.5. Now consider a Blackwell more informative information structure $q'$ that fully reveals the true state (inducing beliefs $S_1, S_2, S_3$). From the right panel, we can easily see that the sender's optimal signal induces the three black peaks $B_1, B_2, B_3$ and obtains expected utility 0.8 (which is also the highest utility he can possibly get). The information structures inducing $A_1, A_2, A_3$ and $B_1, B_2, B_3$ are not Blackwell comparable. Thus, increasing the information contained in the initial signal does not necessarily increase the information passed through by the sender.

Single sender and binary states: Although with more than two states it is quite obvious that comparative statics is ambiguous, the binary-state case is less trivial. Consider the following example. Assume there are two states $\{S_1, S_2\}$, so all beliefs over the two states can be represented on a one-dimensional simplex (a line segment). The only parameter of a belief is the probability $p \in [0, 1]$ of state $S_1$; each point in $[0, 1]$ represents an induced belief, as in the left panel of Figure 6. The sender's utility is represented by the arrows in Figure 6: the utility function is peaked at these arrows and decays very fast to zero, again a continuous approximation of a sum of Dirac delta functions, with $U(0.2) = U(0.6) = 0.4$, $U(0.4) = 0.3$, and $U(0.8) = 0.9$. The prior belief is $\mu_0 = 0.5$, represented by the black dashed lines.

The initial information $q$ contains two signals inducing beliefs $0.2$ and $0.6$, located exactly at the two blue arrows. It is easy to see that given initial information $q$, the optimal induced distribution over beliefs is $l^* = q$. Now suppose the sender can discover more information, and he then induces the two beliefs $0.4$ and $0.8$, located exactly at the two red arrows. Since the expected utility (the height of the dashed line at the prior) is higher with $0.4$ and $0.8$, the sender is better off given more information.
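The payoff comparison in this example is a one-line Bayes-plausibility computation. The following snippet is a check I added, using the utility values stated above; it reproduces the two expected utilities at the prior $\mu_0 = 0.5$.

```python
# Expected utility at the prior from inducing a Bayes-plausible pair (p1, p2):
# the weights solve w * p1 + (1 - w) * p2 = mu0.
U = {0.2: 0.4, 0.4: 0.3, 0.6: 0.4, 0.8: 0.9}
mu0 = 0.5

def pair_value(p1, p2):
    w = (p2 - mu0) / (p2 - p1)        # weight on p1
    return w * U[p1] + (1 - w) * U[p2]

print(pair_value(0.2, 0.6))  # 0.40 from the original information q
print(pair_value(0.4, 0.8))  # 0.45 from the more informative alternative
```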

However, note that since, given $q$, the sender can always garble some probability mass toward $p = 0.4$, the utility level at $0.4$ is dominated by a combination of $0.2$ and $0.6$, as shown in the left panel. It therefore seems that, given more information, if the sender induces $0.4$ and $0.8$, he could split some of the weight on $0.4$ onto $0.2$ and $0.6$ and obtain a higher expected utility. We want to show in this example that this can be impossible, so that inducing $(0.4, 0.8)$ is indeed optimal.

[Figure 6: Binary-state example. Left panel: utilities over beliefs with peaks at 0.2, 0.4, 0.6, 0.8 and prior µ0 = 0.5. Right panel: the signal triangle with vertices projecting onto the belief line, points A1, A2, B1, B2.]

Suppose the more informative initial information $q'$ contains three signals inducing the beliefs $(0, 0.5, 1)$. Note that here the distribution over the three signals is not uniquely determined. Consider the right panel of Figure 6. We use the three vertices of the blue triangle to represent the three signals; if we use the edge between $0$ and $1$ to represent the space of beliefs, then the belief induced by a point in this triangle is its projection onto the edge $0$-$1$. For example, the third signal's projection onto the edge $0$-$1$ is $0.5$. Bayes plausibility of the three signals requires that the expectation of the induced beliefs equal $\mu_0 = 0.5$; hence the prior is some point in the triangle projecting onto $\mu = 0.5$, its exact location depending on the probabilities of the three signals. Assume the three signals are constructed so that the prior is located on the dashed line.

Now define the utility function on the triangle: the utility of a point is the utility of its induced belief. Indifference curves of this utility function are therefore all perpendicular to the $0$-$1$ edge, and the utility function can be represented by the four rectangles. The optimal signal structure is given by the highest plane over the prior (the dashed line) supported by points on the graph of the utility function; we can see that it is in fact the edge $B_1 B_2$. The graph also makes it easy to see why it is not possible to split the signal inducing $0.4$ onto $0.2$ and $0.6$: given our prior, the only possible representation of the information structure $q$ is the points $A_1$

and $A_2$. Although it seems that $0.4$ is Blackwell dominated by $(0.2, 0.6)$, a signal constructed from $(0.2, 0.6)$ that induces $0.4$ cannot be the signal $B_2$. The key intuition is that with more signals than states, signals inducing the same belief might be very different.

The two examples above show that, in general, comparative statics on information in a persuasion problem can be very ambiguous. However, we can make some progress by utilizing the capacity set of a chain.

4.1 Initial information within the capacity set

By Corollary 3, we know that when the initial information is in the capacity set, it is truthfully passed through the chain. Comparing initial information is then equivalent to comparing equilibrium outcomes, and comparative statics come almost for free:

Corollary 5.

1. If $q, q' \in L_1$ and $q' \succeq_B q$, then $l^*(q') = q' \succeq_B q = l^*(q)$.
2. If $L_1 \subseteq L'_1$ and $q \in L'_1$, then $l^*(q, L'_1) = q \succeq_B l^*(q, L_1)$.

(Here $\succeq_B$ means Blackwell weakly more informative.)

4.2 Binary initial information

Although Corollary 5 opens some possibilities for comparative statics, the capacity set of a chain might be small, and in most more general cases comparative statics remain ambiguous. Moreover, the undiscussed case is actually the more relevant one, because the chain capacity set is small precisely when the senders' preferences are misaligned. We need more restrictive assumptions to proceed.

Assumption 2.

1. Binary initial information: $\operatorname{Card}(\operatorname{Supp}(q)) = 2$.
2. Minimal capacity: $L_1 = \bigcap_n L^0_n$.

The first assumption is necessary because we have already seen examples with higher-dimensional initial information in which comparative statics are ambiguous. The second assumption is useful because we already know some good properties of the individual capacity sets, and we will exploit them.

Lemma 4. If $\operatorname{Card}(\operatorname{Supp}(q)) = 2$, then in the unique equilibrium the first sender chooses $p^*$ inducing $l^* \in \Delta^2(\Omega)$ with $\operatorname{Card}(\operatorname{Supp}(l^*)) = 2$, and the following senders truthfully pass through their signals.

The proof of Lemma 4 uses Proposition 3: fixing the support of the induced distribution over beliefs, when the first sender modifies the weights on different beliefs in a Bayes-plausible way, the incentive constraints of all following senders are unaffected. Thus we can freely let the first sender reduce the probability of dominated beliefs. Note that, starting from a binary initial signal, any signal structure with more than two signals has a dominated signal, so the only induced distribution over beliefs has binary support.
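Propositions 4 and 5 below compare equilibrium outcomes in the Blackwell order. For binary-support experiments over a binary state with a common prior, that order reduces to interval nesting: the pair $(p_1, p_2)$ Blackwell dominates $(p'_1, p'_2)$ exactly when $[p'_1, p'_2] \subseteq [p_1, p_2]$, since a mean-preserving contraction of a two-point posterior distribution keeps the posteriors inside the original interval. A one-function sketch of this check (my own helper, not notation from the paper):

```python
def blackwell_geq(pair_a, pair_b):
    """True if the binary experiment inducing posteriors pair_a Blackwell
    dominates the one inducing pair_b (same binary state, same prior):
    for two-point posterior distributions with a common mean, the convex
    (mean-preserving-spread) order is equivalent to interval nesting."""
    (a1, a2), (b1, b2) = sorted(pair_a), sorted(pair_b)
    return a1 <= b1 and b2 <= a2

assert blackwell_geq((0.0, 1.0), (0.2, 0.6))      # full revelation dominates all
assert not blackwell_geq((0.2, 0.6), (0.4, 0.8))  # overlapping intervals:
assert not blackwell_geq((0.4, 0.8), (0.2, 0.6))  # not Blackwell comparable
```

The last two checks reproduce the non-comparability of $(0.2, 0.6)$ and $(0.4, 0.8)$ from the binary-state example above.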

With Lemma 4, we know that given binary initial information, only binary signal structures are chosen in equilibrium along the chain. The effect of increasing the informativeness of the initial information is then monotonic:

Proposition 4. Assume $L_1 = \bigcap_n L^0_n$. If binary initial information $q \succeq_B q'$, then $l^*(q) \succeq_B l^*(q')$.

Note that in Proposition 4 we do not require the less informative initial information $q'$ to be binary. Proposition 4 says that, when the capacity of a chain attains the minimal capacity, increasing the informativeness of the initial information to a more informative binary initial information increases the informativeness of the equilibrium outcome.

Proposition 5. Assume the initial information $q$ has binary support. If $L_1 \subseteq L'_1 = \bigcap_n L'^0_n$, then $l^*(q, L'_1) \succeq_B l^*(q, L_1)$.

Note that in Proposition 5 we do not require $L_1$ to equal $\bigcap_n L^0_n$, and we allow the preferences of all senders to differ. Proposition 5 says that, when the initial signal is binary, replacing a chain with one that has a larger capacity set and attains its minimal capacity increases the informativeness of the equilibrium outcome.

We have now characterized the comparative statics with respect to information under the restrictive Assumption 2. While binary initial information might not be that unrealistic, the minimal-capacity assumption is very restrictive, chiefly because $L_1 = \bigcap_n L^0_n$ is very hard to obtain. Figure 7 shows an example with three senders in a chain. There are two states, so the space of beliefs is one-dimensional. The utility functions of the three senders are randomly generated, as shown in the left panel. We consider all binary-signal equilibria and plot them in the right panel; the axes have exactly the same meaning as in the first example. The blue area in the right panel is $\bigcap_n L^0_n$, and the union of the red and blue areas is $L_1$.

[Figure 7: Example with three senders. Left panel: three randomly generated utility functions over beliefs. Right panel: the intersection of individual capacities (blue) and the chain capacity (red and blue).]

We can see that, except when a bottleneck sender is placed in the chain, $L_1$ generally does not coincide with $\bigcap_n L^0_n$. So far, then, we can only do comparative statics on a very restricted range of problems; the next step is to relax the assumption that $L_1 = \bigcap_n L^0_n$.

Conjecture 1. Assume the initial information $q$ has binary support. Then $l^*(q, \bigcap_n L^0_n) \preceq_B l^*(q, L_1)$.

Conjecture 1 says that letting the first sender optimize under the constraint $\bigcap_n L^0_n$ provides an informational lower bound under binary initial information. We can thus perform comparative statics with the extra constraint $\bigcap_n L^0_n$ on the first sender and use the result as a lower bound on the actual change in the equilibrium outcome. Our comparative statics results can be summarized in the following diagrams (question marks denote relations that remain ambiguous):

Comparative statics w.r.t. initial information (binary $q \succeq_B q'$):

    l*(q, L_1)          ?        l*(q', L_1)
       |⪰_B                         |⪰_B
    l*(q, ∩ L^0_n)     ⪰_B       l*(q', ∩ L^0_n)

Comparative statics w.r.t. the capacity set (∩ L'^0_n ⊇ ∩ L^0_n):

    l*(q, L'_1)         ?        l*(q, L_1)
       |⪰_B                         |⪰_B
    l*(q, ∩ L'^0_n)    ⪰_B       l*(q, ∩ L^0_n)

5 Conclusion

In this paper, we characterized the equilibrium of a hierarchical Bayesian persuasion problem, studied its properties, and derived some comparative statics results. The main results can be summarized as follows:

1. The equilibrium can be characterized as the first sender persuading the decision maker directly, subject to an additional constraint on the distributions over beliefs he can induce.
2. For some initial information, adding an intermediary can improve the information transmission of a chain.
3. For all possible initial information, adding an intermediary cannot always improve the information transmission of a chain.
4. Under binary initial information, there is an informational lower bound on the equilibrium outcome, and this lower bound becomes more informative when (i) the initial information becomes more informative, and (ii) the chain capacity becomes larger.

Work plan:

1. Prove Conjecture 1.
2. Clarify the robustness of the model by studying different commitment schemes. One direction is to try to establish the equivalence between garbling conditional on the information structure and garbling conditional on the information received.

3. A full characterization of the capacity set $L$. Establish an equivalence between a capacity set and a strategic sender.
4. Extend the model to allow for multiple sources of information.

References

[Ambrus et al., 2013] Ambrus, A., Azevedo, E. M., and Kamada, Y. (2013). Hierarchical cheap talk. Theoretical Economics, 8(1):233-261.

[Ambrus and Takahashi, 2008] Ambrus, A. and Takahashi, S. (2008). Multi-sender cheap talk with restricted state spaces. Theoretical Economics, 3(1):1-27.

[Crawford and Sobel, 1982] Crawford, V. P. and Sobel, J. (1982). Strategic information transmission. Econometrica, 50(6):1431-1451.

[Gentzkow and Kamenica, 2011] Gentzkow, M. and Kamenica, E. (2011). Competition in persuasion. Technical report, National Bureau of Economic Research.

[Ivanov, 2010] Ivanov, M. (2010). Communication via a strategic mediator. Journal of Economic Theory, 145(2):869-884.

[Kamenica and Gentzkow, 2009] Kamenica, E. and Gentzkow, M. (2009). Bayesian persuasion. Technical report, National Bureau of Economic Research.

[Li, 2009] Li, W. (2009). Peddling influence through intermediaries. American Economic Review, forthcoming.