Models of Reputation with Bayesian Updating

Jia Chen
333 UCB, Boulder CO 80309. Department of Political Science, University of Colorado, Boulder. Email: jiac@colorado.edu.

1 The Tariff Game (Downs and Rocke 1996)

1.1 Basic Setting

Two states, A and B, set tariffs on trade. The basic setting resembles a coordination game in which eliminating the tariff is good only when the other party does the same. State A has two types, Tough and Weak. The Tough type is able to resist domestic protectionist pressure, while the Weak type gives in to opposition from protectionist forces. A's type cannot be observed by B, but B knows the parameter of the Bernoulli distribution characterizing A's type. For both types of A there is a possibility that opposition occurs. State B leads the Stackelberg game. The game tree is drawn in Figure 2.2 on page 32 of Downs and Rocke (1996).

1.2 Repeated Play and Bayesian Updating

If the game is played only once, a number of equilibria exist as p, q, and r vary. B may choose to eliminate the tariff if the probability of a Tough type A is high enough and/or the probability of opposition is low enough. Formally, B's expected payoff for eliminating the tariff is given by

$$E[u_B(\neg t)] = p[2(q + r) - 4(1 - q - r)] + (1 - p)[2q - 4(1 - q)] \tag{1.1}$$

Here p is the probability that A is the Tough type, q + r is the probability that a Tough A eliminates the tariff, and q is the probability that a Weak A does. Given that the payoff of keeping the tariff is always $-2$, B will eliminate the tariff iff $E[u_B(\neg t)] \geq -2$, which simplifies to

$$q + pr \geq \frac{1}{3} \quad \text{or} \quad p \geq \frac{1 - 3q}{3r} \tag{1.2}$$

A behaves in the way that is best for its immediate payoff. The Tough type A will eliminate the tariff unless the opposition is strong, and the Weak type A will eliminate the tariff only if there is no opposition.
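The one-shot decision rule for B can be checked with a short sketch. It assumes the payoffs read off Equation 1.1 (2 when A eliminates the tariff, $-4$ when A keeps it, and $-2$ when B keeps its own tariff); the function names are illustrative and not taken from Downs and Rocke (1996).

```python
from fractions import Fraction

# Payoff to B from keeping its own tariff (assumed from the text: always -2).
KEEP_TARIFF_PAYOFF = -2


def expected_payoff_eliminate(p, q, r):
    """B's expected payoff from eliminating the tariff, as in Equation 1.1.

    p: probability that A is Tough
    q + r: probability that a Tough A eliminates the tariff
    q: probability that a Weak A eliminates the tariff
    """
    tough_branch = 2 * (q + r) - 4 * (1 - q - r)
    weak_branch = 2 * q - 4 * (1 - q)
    return p * tough_branch + (1 - p) * weak_branch


def b_eliminates(p, q, r):
    """Direct comparison of the two payoffs."""
    return expected_payoff_eliminate(p, q, r) >= KEEP_TARIFF_PAYOFF


def b_eliminates_reduced(p, q, r):
    """Reduced form of the same condition, as in Equation 1.2."""
    return q + p * r >= Fraction(1, 3)


if __name__ == "__main__":
    p, q, r = Fraction(1, 2), Fraction(1, 5), Fraction(2, 5)
    print(expected_payoff_eliminate(p, q, r), b_eliminates(p, q, r))
```

Exact fractions are used so that the boundary case $q + pr = 1/3$ is not blurred by floating-point rounding.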

The dynamics of the game change if the game is played repeatedly over an infinite horizon. In particular, the Weak type A starts to care about the long-run benefits of building an image as a Tough type. This image is created through B's Bayesian updating upon observing A's behavior at the end of each period. If A eliminated the tariff, the posterior probability that A is the Tough type is given by

$$\Pr(\text{Tough} \mid \neg t) = \frac{p(q + r)}{pr + q}$$

Since $\frac{q + r}{pr + q} > 1$, we have $\Pr(\text{Tough} \mid \neg t) > p$. Thus eliminating the tariff immediately strengthens A's image as a Tough type. On the other hand, keeping the tariff makes a Weak type A more likely in the eyes of B, in that

$$\Pr(\text{Tough} \mid t) = \frac{p(1 - q - r)}{1 - q - pr},$$

which is smaller than p.

After one period of play, having observed the behavior of A, state B will evaluate the expected payoff of cooperating using the updated belief, $\Pr(\text{Tough} \mid \neg t)$ or $\Pr(\text{Tough} \mid t)$. For both types of A, convincing B that her type is Tough, and hence inducing B to eliminate the tariff, is always the best strategy. To see this, note that if B's belief is such that $p < \frac{1 - 3q}{3r}$, B will keep the tariff, resulting in a payoff of $-2$ for both actors. But if B's belief is such that $p \geq \frac{1 - 3q}{3r}$, B will eliminate the tariff, believing A is likely enough to be the Tough type, from which the Weak type A gains the expected utility of $2(q + r) + 4(1 - q - r) = 4 - 2(q + r)$. This is always greater than $-2$.

A Weak type A might be willing to eliminate the tariff even in the presence of opposition, or even strong opposition, as long as doing so raises the posterior probability of the Tough type above p and the reputation gains in the future outweigh the instantaneous loss. That is, with $\delta$ denoting the discount factor, if

$$2 + \frac{2\delta}{1 - \delta} \geq 4 - \frac{2\delta}{1 - \delta} \quad \text{or} \quad \delta \geq \frac{1}{3},$$

a Weak type A is willing to eliminate the tariff in the presence of opposition to establish a reputation for being the Tough type. As can be verified in the same way, $\delta$ has to be greater than 0.6 to make eliminating the tariff by a Weak type rational.
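The two posterior formulas and the discounting condition can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, $\delta$ is assumed to be a per-period discount factor, and the payoff streams (2 now and 2 in every later period from eliminating, versus 4 now and $-2$ in every later period from keeping the tariff) are the reading of the inequality that yields $\delta \geq 1/3$.

```python
from fractions import Fraction


def posterior_tough_after_eliminate(p, q, r):
    """Pr(Tough | A eliminated the tariff) by Bayes' rule.

    A Tough A eliminates with probability q + r, a Weak A with probability q.
    """
    return p * (q + r) / (p * r + q)


def posterior_tough_after_keep(p, q, r):
    """Pr(Tough | A kept the tariff) by Bayes' rule."""
    return p * (1 - q - r) / (1 - q - p * r)


def mimicking_pays(delta):
    """Assumed stream comparison for a Weak A facing opposition.

    Eliminating: 2 today, then 2 per period discounted by delta.
    Keeping:     4 today, then -2 per period discounted by delta.
    """
    continuation = delta / (1 - delta)  # sum of delta**t for t = 1, 2, ...
    return 2 + 2 * continuation >= 4 - 2 * continuation


if __name__ == "__main__":
    p, q, r = Fraction(1, 2), Fraction(1, 5), Fraction(2, 5)
    print(posterior_tough_after_eliminate(p, q, r))  # above the prior p
    print(posterior_tough_after_keep(p, q, r))       # below the prior p
    print(mimicking_pays(Fraction(1, 3)))
```

With these inputs the belief moves up after cooperation and down after defection, which is the mechanism a Weak A exploits when it is patient enough.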
Question 1.1 Given the Weak type A's incentive to imitate, why can't B tell, from the fact that a player switches back and forth between cooperating and defecting, that this player is a Weak type who is mimicking and taking advantage of B?

Hint. The answer hinges on the setting of the game, where even the Tough type A could defect if the opposition is strong. Because the game assumes B cannot observe the level of opposition, B is unable to tell whether a defection comes from a mimicking Weak type or from a Tough type facing strong domestic opposition.

Extension 1. Alter the game setting by allowing B to observe the level of opposition before deciding whether to eliminate the tariff:

- The level of opposition is drawn by Nature at the beginning of the game and persists forever. Find the conditions for reputation building.
- The level of opposition is drawn by Nature at the beginning of each period of the game. Find the conditions for reputation building.

Question 1.2 The outcome of the Bayesian updating of B's belief will differ across observed levels of opposition. How can the game be modified such that reputation still matters when the domestic level of opposition is observable?

2 Hegemonic Stability Game (Alt et al. 1988)

The game is described in Figure 1 on page 7 of Alt et al. (1988). The core of the game is the ally's (a) uncertainty about the cost of punishment to the hegemon (h). The cost of punishment, x, is either 1 or 0, following a Bernoulli distribution Bern(w). The ally can observe neither the realized cost nor the true value of the parameter w of the distribution of x. The ally only knows that w has a Beta prior distribution, $\text{Beta}(\alpha, \beta)$. The ally decides whether to challenge the hegemon by evaluating the probability of the hegemon being a strong type (i.e. the cost of punishment, x, is 0). If the game is played only once, given the prior distribution of w, the ally will challenge if

$$b > \frac{\beta}{\alpha + \beta}. \tag{2.1}$$

The strong hegemon always punishes and the weak hegemon always acquiesces. But if the game is played repeatedly with a different ally in each period, the weak type hegemon may benefit from carrying out costly punishment in return for the reputational gains.
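As a brief check on condition (2.1), the following sketch assumes that the ally's payoff from a challenge is b when the hegemon acquiesces and b − 1 when it punishes (the payoffs that appear in the expected-utility expression of Section 2.2), that not challenging yields 0, and that in the one-shot game only the strong hegemon (x = 0, probability 1 − w) punishes:

```latex
\begin{align*}
E[u_A(C)] &= E(w)\, b + [1 - E(w)]\,(b - 1) \\
          &= E(w) + b - 1 \\
          &= \frac{\alpha}{\alpha + \beta} + b - 1 > 0
  \iff b > \frac{\beta}{\alpha + \beta}.
\end{align*}
```

Under these assumptions the threshold is one minus the prior mean of w, so a prior that puts more weight on weak hegemons (larger $\alpha$) lowers the benefit b needed to trigger a challenge.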
If the weak hegemon could imitate the behavior of the strong hegemon, the ally may form a posterior belief that is favorable to the weak type hegemon, in that it deters challenges from the ally. Assume x is drawn by Nature at the beginning of each period of the repeated game. Reputation building arises when the prior belief prompts the first ally to challenge (i.e. $b > \frac{\beta}{\alpha + \beta}$) and, depending on whether the challenge is punished by the hegemon, the second ally could potentially be

deterred if the posterior probability of a strong hegemon is raised. How can the weak hegemon imitate the behavior of the strong one such that the updated belief is favorable? First note that if the weak hegemon always punishes the challenge, the second ally will not be deterred, because the pooling outcome (both weak and strong punish) leaves the prior distribution unchanged and ally 2 will still challenge. Not punishing is no better, as it separates the weak hegemon from the strong one in the eyes of the ally and induces a challenge.

2.1 Updating with Bayesian Inference

The way the uninformed player updates his belief is a little different from what was seen before. Here the uninformed player, the ally, does not even know w, the parameter of the Bernoulli distribution of x. He only knows the prior distribution of w, a Beta distribution. In other words, instead of knowing the distribution of x, allies only know the distribution of the parameter of that distribution. Given $w \sim \text{Beta}(\alpha, \beta)$, it follows that $E(w) = \frac{\alpha}{\alpha + \beta}$. The second ally updates his belief about the distribution of w upon observing whether punishment took place in the last period. By Bayes' theorem, the updating equation can be written as

$$p_t(w \mid r_{t-1}) \propto p(r_{t-1} \mid w)\, p(w) \tag{2.2}$$

where $r_{t-1} \in \{P, \neg P\}$ is the response from the hegemon (punishment or not) once a challenge is observed. Clearly the ally is updating his belief regarding w, not $x_{t-1}$ itself. This is because x is redrawn at the beginning of every period. Knowing whether $x_{t-1}$ equals 0 or 1 is not as helpful as knowing the range of the key parameter of the distribution of x. Special attention should be paid to the likelihood term in Equation 2.2, i.e. $p(r_{t-1} \mid w)$. This is the probability of observing a given response from the hegemon given the distribution of the types of hegemon. The important thing to keep in mind is that the construction of $p(r_{t-1} \mid w)$ has to be consistent with, and must not break, the sequential rationality of the hegemon's strategy.
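The updating rule in Equation 2.2 can be illustrated numerically. The sketch below approximates the posterior mean of w on a grid, multiplying the Beta prior kernel by a likelihood supplied by the caller. The function names are hypothetical, and the likelihood helper assumes only that x = 1 occurs with probability w and that each type punishes with a given probability. Shape parameters of at least 1 are assumed so the prior kernel stays bounded on the grid.

```python
def likelihood_from_strategy(punish_if_weak, punish_if_strong):
    """Probability of observing punishment given w.

    The weak type (x = 1) occurs with probability w and the strong type
    (x = 0) with probability 1 - w; each punishes with the given probability.
    """
    return lambda w: w * punish_if_weak + (1 - w) * punish_if_strong


def posterior_mean(alpha, beta, likelihood, n=20000):
    """Posterior mean of w under a Beta(alpha, beta) prior (midpoint rule).

    The posterior is proportional to likelihood(w) times the prior kernel
    w**(alpha - 1) * (1 - w)**(beta - 1); the normalizing constant cancels.
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        w = (i + 0.5) / n  # midpoint of the i-th grid cell
        weight = likelihood(w) * w ** (alpha - 1) * (1 - w) ** (beta - 1)
        numerator += w * weight
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    separating = likelihood_from_strategy(0.0, 1.0)  # only the strong type punishes
    pooling = likelihood_from_strategy(1.0, 1.0)     # both types punish
    print(posterior_mean(2, 3, separating))
    print(posterior_mean(2, 3, pooling))
```

With a separating strategy, an observed punishment lowers the mean of w. With a pooling strategy the likelihood is constant and the posterior mean stays at the prior mean, which is why always punishing cannot build a reputation.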
Formally,

$$p(P \mid w) = w\, p(P \mid x = 1) + (1 - w)\, p(P \mid x = 0), \tag{2.3}$$

where $p(P \mid x = 1)$ and $p(P \mid x = 0)$ have to be sequentially rational for both types of hegemon. For example, if the previous challenge is punished and in equilibrium only the strong type hegemon punishes, the likelihood is

$$p(P \mid w) = w \cdot 0 + (1 - w) \cdot 1 = 1 - w. \tag{2.4}$$

Plugging this into the Bayes formula, the posterior distribution of w is found to be $\text{Beta}(\alpha, \beta + 1)$. E(w) of the updated distribution is thus

$$E(w) = \frac{\alpha}{\alpha + \beta + 1} \tag{2.5}$$

Because $\frac{\alpha}{\alpha + \beta + 1} < \frac{\alpha}{\alpha + \beta}$, this could be good news for the hegemon: if he carries out costly punishment, the second ally will update toward the belief that a weak hegemon is less likely and give up challenging in the next round. This benefits the hegemon, particularly when x = 1 is drawn, by deterring all allies from challenging. Good as it sounds, this scenario cannot be an equilibrium: if the hegemon can profit from costly punishment when x = 1, the likelihood $p(P \mid w)$ is no longer consistent, i.e. $p(P \mid x = 1) \neq 0$ and $p(P \mid w) \neq 1 - w$, and the whole construction collapses.

Nor can the hegemon benefit from always carrying out costly punishment when x = 1. Given $p(P \mid x = 1) = 1$, the likelihood in the Bayes formula, $p(P \mid w)$, equals one. The posterior distribution is thus identical to the prior distribution. If $b > \frac{\beta}{\alpha + \beta}$, the second ally will challenge despite observing the previous challenge being punished. Since the ally will challenge regardless, the hegemon has no reason to punish when x = 1 is drawn.

2.2 Deterrence through Randomized Costly Punishment

As seen in the above example, the hegemon cannot deter a challenge by always carrying out costly punishment, because doing so renders the posterior distribution identical to the prior. Yet it is possible for the hegemon to deter a challenge by carrying out costly punishment probabilistically in equilibrium. This is generalized in an equilibrium where the hegemon always punishes if x = 0, punishes with probability k if x = 1, and the ally never challenges. We first examine the posterior from Bayesian updating if punishment occurs. The likelihood function is now given by

$$p(P \mid w) = kw + (1 - w) \tag{2.6}$$

Plugging this into Equation 2.2 provides the posterior distribution of w. Since only the updated E(w) is needed for the ally, it can be obtained by taking the posterior distribution as a linear combination of two Beta distributions. That is

$$E(w_t) = kE(w_1) + E(w_0) = \frac{k(\alpha + 1)}{\alpha + \beta + 1} + \frac{\alpha}{\alpha + \beta + 1} = \frac{k(\alpha + 1) + \alpha}{\alpha + \beta + 1} \tag{2.7}$$

Here $w_0$ denotes w under the posterior distribution given punishment when the ally believes the hegemon punishes only if x = 0; $E(w_0)$ is obtained from Equation 2.5. Likewise, $w_1$ denotes w under the posterior distribution given punishment when the ally believes the hegemon punishes only if x = 1; $E(w_1)$ is obtained in the same way as Equation 2.5.

The next step is to find whether there exists a range of k such that the second ally prefers not to challenge given the updated belief $E(w_t)$. The expected utility of a challenge is

$$E[u_A(C)] = E(w_t)[k(b - 1) + (1 - k)b] + [1 - E(w_t)](b - 1), \tag{2.8}$$

which reduces to

$$E[u_A(C)] = E(w_t)(1 - k) + b - 1. \tag{2.9}$$

Thus the ally will not challenge if $E(w_t)(1 - k) + b - 1 < 0$. Together with (2.7) we can then find the critical value of k that deters the ally from challenging:

$$k = \frac{\beta\{1 - [b(\alpha + \beta + 1) - \beta]\}}{\alpha[b(\alpha + \beta + 1) - \beta]} \tag{2.10}$$

Thus the hegemon will punish randomly with probability $k - \epsilon$ upon drawing x = 1, and the second ally is deterred. But note that such deterrence will only happen if the first ally challenges given the randomized strategy of the hegemon, that is,

$$E(w)(1 - k) + b - 1 > 0 \tag{2.11}$$

$$b > \frac{k\alpha + \beta}{\alpha + \beta} \tag{2.12}$$

If this is not met, the first ally would not challenge and the second ally would do the same, since the belief cannot be revised if there is no challenge as the game proceeds. In the model of Alt et al. (1988), the second ally would still challenge given that the game is played only twice and the second ally knows the hegemon will not punish if x = 1 is drawn in the last period; thus the threshold $\frac{\beta}{\alpha + \beta}$, instead of $\frac{k\alpha + \beta}{\alpha + \beta}$, applies. The deterrence equilibrium is thus established. As the last case, it is straightforward to check that if $b > \frac{\beta + 1}{\alpha + \beta + 1}$, there is nothing the hegemon could do to deter the allies from challenging.
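The algebra behind Equations 2.8, 2.9, and 2.12 can be checked with a small sketch. The ally's belief about w enters only as a number passed by the caller, so nothing here depends on how that belief was computed. The function names are illustrative, and a payoff of 0 from not challenging is assumed.

```python
from fractions import Fraction


def challenge_utility(b, k, mean_w):
    """Ally's expected utility from a challenge, as in Equation 2.8.

    With probability mean_w the hegemon is weak and punishes with probability k;
    otherwise the hegemon is strong and always punishes. A punished challenge
    pays b - 1, an unpunished one pays b.
    """
    weak_branch = k * (b - 1) + (1 - k) * b
    strong_branch = b - 1
    return mean_w * weak_branch + (1 - mean_w) * strong_branch


def challenge_utility_reduced(b, k, mean_w):
    """Simplified form of the same expression, as in Equation 2.9."""
    return mean_w * (1 - k) + b - 1


def first_ally_threshold(alpha, beta, k):
    """Smallest b above which the first ally challenges, as in Equation 2.12."""
    return (k * alpha + beta) / (alpha + beta)


def first_ally_challenges(alpha, beta, b, k):
    """Equation 2.11 evaluated at the prior mean alpha / (alpha + beta)."""
    prior_mean = alpha / (alpha + beta)
    return challenge_utility_reduced(b, k, prior_mean) > 0


if __name__ == "__main__":
    alpha, beta, k = Fraction(2), Fraction(3), Fraction(1, 2)
    print(first_ally_threshold(alpha, beta, k))
    print(first_ally_challenges(alpha, beta, Fraction(9, 10), k))
```

Setting k = 0 recovers the one-shot threshold $\beta / (\alpha + \beta)$, and raising k raises the benefit b that the first ally needs before challenging.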