Information obfuscation in a game of strategic experimentation


MANAGEMENT SCIENCE, Vol. 00, No. 0, Xxxxx 0000, INFORMS

Authors are encouraged to submit new papers to INFORMS journals by means of a style file template, which includes the journal title. However, use of a template does not certify that the paper has been accepted for publication in the named journal. INFORMS journal templates are for the exclusive purpose of submitting to an INFORMS journal and should not be used to distribute the papers in print or online or to submit the papers to another publication.

Daron Acemoglu, Economics, MIT, daron@mit.edu
Kimon Drakopoulos, Data Sciences and Operations, Marshall School of Business, USC, drakopou@marshall.usc.edu
Asuman Ozdaglar, LIDS, EECS, MIT, asuman@mit.edu

1. Introduction

2. Outline of the paper

3. Model

We consider a dynamic game in discrete time $t \in \{1, 2, \ldots\}$ between two players, E and I. The reader may think of Player E as an entrant to a new market who is uncertain about the type of the market and tries to learn it by experimenting, that is, by observing the realized rewards. Player I can be thought of as the incumbent in the same market, who knows the type of the market and tries to manipulate (or obfuscate) the reward process in order to affect the learning behavior and trajectory of Player E. For the problem to be non-trivial, both exploration and obfuscation are costly.

At each time instant, players receive rewards and pay costs that depend on:

(i) Their actions: Player E can decide to explore ($e_t = 1$) or exit ($e_t = 0$). Player I can decide to obfuscate ($i_t = 1$) or not obfuscate ($i_t = 0$). In this paper we assume that exit decisions are final. The results would be exactly the same if we assumed that re-entry is possible but is immediately observable by the incumbent. Adding such a feature to our model only complicates the analysis, without adding to the understanding of the problem. For notational convenience we use $a_t = e_{t-1} \in A = \{0, 1\}$ to denote the state of Player E. In particular, $a_t = 1$ when the entrant is still in the game, and $a_t = 0$ when the entrant is out of the game.

(ii) The type $\theta \in \Theta = \{0, 1\}$ of the reward process: if the reward process is bad ($\theta = 0$), it never produces payoffs for any of the players. If the reward process is good ($\theta = 1$), the probability of an arrival at a given time is equal to

$$\lambda a_t (1 - i_t) + \lambda (1 - a_t).$$

In other words, when the reward process is good and the entrant is exploring, positive payoffs may be realized if and only if Player I is not obfuscating. If the entrant exits, positive payoffs may only be realized for Player I, independently of her action (but are no longer observable by Player E). The realized reward after an arrival from the reward process depends on the action of each player and the history of the game, as summarized in Table 1.

Information is asymmetric. Specifically, Player I knows the type of the reward process, while Player E does not. Furthermore, Player I knows whether Player E has exited so far, i.e., she knows $a_t$, but does not observe exit decisions at the time that they occur. In contrast to Player I, Player E does not know (a priori) the type of the reward process and does not observe the obfuscation action of Player I. We denote by $p_0$ the probability that Player E assigns to the reward process being good at time $t = 0$. Player E can only learn the type of the reward process through observing rewards. In order to observe rewards the entrant needs to experiment ($e_t = 1$) but, as we will shortly see, experimentation is costly. The first time that a positive reward is realized we say that a revelation occurs, since after such an arrival the type of the reward process is perfectly revealed to Player E. We denote by $T$ the first time that a revelation occurs.
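The arrival rule in item (ii) can be sketched in a few lines of Python. This is only an illustration of the formula $\lambda a_t (1 - i_t) + \lambda (1 - a_t)$; the function name `arrival_probability` and its argument names are hypothetical and do not come from the paper.

```python
def arrival_probability(theta, a_t, i_t, lam):
    """Per-period probability of an arrival from the reward process.

    theta: type of the reward process (0 = bad, 1 = good)
    a_t:   state of Player E (1 = still in the game, 0 = exited)
    i_t:   action of Player I (1 = obfuscate, 0 = do not obfuscate)
    lam:   arrival probability lambda of a good reward process
    """
    if theta == 0:
        # A bad reward process never produces payoffs.
        return 0.0
    # Good process: an arrival needs "E in and I not obfuscating",
    # or "E out" (in which case only Player I can collect it).
    return lam * a_t * (1 - i_t) + lam * (1 - a_t)
```

For instance, with a good process and the entrant still exploring, obfuscation brings the arrival probability to zero, while after exit it equals $\lambda$ whatever Player I does.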
In the case that a revelation never occurs, we write $T = \infty$. Note that if $e_t = 0$, that is, if Player E has exited, she is not able to observe arrivals from the reward process. We denote by $u_i(t)$, $i \in \{E, I\}$, the stochastic process corresponding to each player's rewards and by $\mathcal{R}_i(t)$ the corresponding filtration. Player I pays cost $c_I$ per unit time of obfuscation. Similarly, Player E pays cost $c_E$ per unit time of exploration, i.e.,

$$c_E(t) = c_E e_t, \qquad c_I(t) = c_I i_t.$$

Finally, after a revelation has occurred, if Player E does not exit (which we show is optimal in Section 4) we say that she exploits, in which case she pays cost $c_E$.

Table 1. Rewards and costs if the reward process is good.

| | E explores ($e_t = 1$) | E exits ($e_t = 0$) |
|---|---|---|
| I obfuscates ($i_t = 1$) | E: reward $0$; cost $c_E$. I: reward $0$; cost $c_I$. | E: reward $0$; cost $0$. I: reward $g_I^t$ w.p. $\lambda$, $0$ w.p. $(1 - \lambda)$; cost $c_I$. |
| I doesn't obfuscate ($i_t = 0$) | E: reward $\tilde g_E^d$ ($g_E^d$ after revelation) w.p. $\lambda$, $0$ w.p. $(1 - \lambda)$; cost $c_E$ ($c_E$ after revelation). I: reward $\tilde g_I^d$ ($g_I^d$ after revelation) w.p. $\lambda$, $0$ w.p. $(1 - \lambda)$; cost $0$. | E: reward $0$; cost $0$. I: reward $\tilde g_I^m a_t + g_I^m (1 - a_t)$ w.p. $\lambda$, $0$ w.p. $(1 - \lambda)$; cost $0$. |

Assumptions: $0 < c_E < \lambda g_E^d$; $0 < g_I^d, \tilde g_I^d < g_I^m, \tilde g_I^m$; $g_I^t < g_I^m$.

A strategy profile (or, from now on, a strategy) for Player E is a discrete-time stochastic process $e_t$, progressively measurable with respect to her information set $\mathcal{I}_E(t) = \mathcal{R}_E(t)$. In other words, the strategy of Player E at any time instant may only depend on the observed rewards. We denote by $\mathcal{E}_t$ the filtration generated by the process $e_t$. A strategy profile for Player I is a stochastic process $i_t$, progressively measurable with respect to her information set $\mathcal{I}_I(t)$, which combines $\mathcal{R}_E(t)$, $\mathcal{E}_{t-1}$ and $\Theta$; i.e., the strategy of Player I at any time instant may only depend on the observed rewards, the state $a_t$ of Player E and the type of the reward process. We denote by $\mathcal{I}_t$ the filtration generated by the process $i_t$.

Given a strategy for Player E and Player I, the expected discounted payoff of each player $i \in \{E, I\}$ is given by

$$\mathbb{E}\left[\sum_{t=0}^{\infty} \delta^t \big(u_i(t) - c_i(t)\big)\right],$$

where the expectation is taken with respect to the other player's actions and with respect to reward arrivals. For simplicity we assume that the discount factor is common and equal to $\delta$ for both players, although this is not a necessary assumption for our results. Figure 1 provides an illustration of the timing of events for a given period $t$.

Footnote 1. At this point we should note that we consider the general case where the rewards realized at the time of the first arrival from the reward process ($\tilde g_I^d$, $\tilde g_E^d$, $\tilde g_I^m$, $g_I^t$) may be different than the rewards that are realized after the first arrival. The reason that we allow for this will be evident in Section 8. Until then, the reader may safely focus on the case where $\tilde g_I^d = g_I^d$, $\tilde g_E^d = g_E^d$ and $\tilde g_I^m = g_I^t = g_I^m$.
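As a small illustration of the payoff criterion, the discounted sum $\sum_t \delta^t (u_i(t) - c_i(t))$ can be evaluated on one realized (finite) path of rewards and costs. The sketch below is not part of the paper; the name `discounted_payoff` and the finite-horizon truncation are assumptions made for the example, and the expectation over arrivals and actions is not computed.

```python
def discounted_payoff(rewards, costs, delta):
    """Discounted payoff of one player along a single realized path.

    rewards: list of realized rewards u_i(t), t = 0, 1, ...
    costs:   list of costs c_i(t) paid in the same periods
    delta:   common discount factor in (0, 1)
    """
    return sum(delta ** t * (u - c)
               for t, (u, c) in enumerate(zip(rewards, costs)))


# Example path: the entrant pays 0.5 per period and sees arrivals at t = 0 and t = 2.
example = discounted_payoff([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], 0.5)
```

Averaging this quantity over many simulated paths would approximate the expected discounted payoff for a given pair of strategies.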

Figure 1. Timing of events in our game of strategic experimentation. Within period $t$ (between times $t$ and $t + 1$): I observes the state of E, $a_t$; I and E decide on their actions, $i_t$ and $e_t$ respectively; rewards are realized according to Table 1.

We next list a set of assumptions that make the game between E and I non-trivial.

Assumption 1: $0 < g_I^d < g_I^m$. This assumption ensures that Player I strictly prefers being alone in the game to sharing rewards with Player E. If this were not the case, Player I would trivially never obfuscate information.

Assumption 2: $0 < \lambda g_E^d - c_E$. This assumption ensures that it is optimal for Player E to be in the game conditioned on the reward process being good. If this were not the case, Player E would never explore.

We denote by $N_t \in \mathcal{R}_E(t)$ the event that a revelation has not occurred by time $t$. A very important observation is that, given the players' strategy profiles, $N_t$ summarizes all the relevant public information by time $t$. Given this observation, the information sets of the two players can be significantly simplified. In particular, the information set for Player I is equal to $\mathcal{I}_I = \{0, 1, \ldots\} \times \{0, 1\} \times A \times \Theta$ (time, occurrence or not of a revelation, state of Player E, type of the reward process). Similarly, the information set of Player E is equal to $\mathcal{I}_E = \{0, 1, \ldots\} \times \{0, 1\} \times A$ (time, occurrence or not of a revelation, her own state).

Since Player E is uninformed about (i) the type of the reward process and (ii) the action $i_t$ of Player I at time $t$, she needs to form beliefs about both. A belief process is a progressively measurable process with respect to $\mathcal{R}_t$, denoted by $p_t$, taking values in $[0, 1]$, where $p_t$ denotes the probability that the reward process is good, given the public information up to time $t$, i.e.,

$$p_t = P(\theta = 1 \mid N_t). \tag{1}$$

Moreover, throughout the rest of the paper, we denote by $q_t$ the belief that Player I is not obfuscating and the reward process is good, given the public information up to time $t$, i.e.,

$$q_t = P(\theta = 1, i_t = 0 \mid N_t). \tag{2}$$

We will soon show, in Section 5, that the belief $q_t$ can be calculated using $p_t$ and the strategy profile $i_t$ in a fairly straightforward manner.

Definition 1. A Perfect Bayesian Equilibrium consists of strategy profiles $(e_t)_{t \in \{0,1,\ldots\}}$ and $(i_t)_{t \in \{0,1,\ldots\}}$ for players E and I respectively, and a belief process $(p_t)_{t \in \{0,1,\ldots\}}$ such that at all times $t \in \{0, 1, \ldots\}$ and after all histories,

1. the action of each player (weakly) maximizes her expected payoff given her information so far,
$$\mathbb{E}_t\left[\sum_{k=0}^{\infty} \delta^k u_i(t + k)\right];$$
2. the beliefs $p_t$ are determined by Bayes' rule, given the prior $p_0$.

Note that in the definition above we used the notation $\mathbb{E}_t(\cdot)$ to denote expectation with respect to the probability measure induced by the information set $\mathcal{I}_i(t)$ of each player $i \in \{I, E\}$. In this paper we focus on the interesting subset of Perfect Bayesian Equilibria where there is exploration for at least one time period with positive probability, as defined below.

Definition 2. A set of strategy profiles $(e_t)_{t \in \{0,1,\ldots\}}$ and $(i_t)_{t \in \{0,1,\ldots\}}$ for players E and I respectively, and a belief process $(p_t)_{t \in \{0,1,\ldots\}}$, is an entry equilibrium if it is a Perfect Bayesian Equilibrium, $P(a_0 = 1) = 1$ and $P(e_0 = 1) > 0$.

For a given entry equilibrium, we will use $V_I(t, N_t, a_t, \theta)$ to denote the value function of Player I at time $t$ given the state $a_t$ of Player E, the absence (or not) of a revelation and the state $\theta$. Similarly, we will use $V_E(t, N_t, a_t)$ to denote the value function of Player E at time $t$ given her state $a_t$ and the absence (or not) of a revelation.

4. Analysis of simple subgames

We initiate our analysis by studying several simple subgames of the dynamic game under consideration. In particular, we first consider the case of a bad reward process and observe that, if this is the case, Player I never obfuscates in any entry equilibrium of the game. This result is fairly straightforward, as in the case of a bad reward process Player I has no incentive to obfuscate information: rewards are zero anyway.

Lemma 1. Assume that $\theta = 0$. Then for any entry equilibrium, $i_t = 0$ for all $t \in \{0, 1, \ldots\}$.

A direct consequence of the above lemma is that for any entry equilibrium

$$V_I(t, n_t, a_t, 0) = 0, \quad \text{for any } t \in \{0, 1, \ldots\},\ n_t \in \{0, 1\},\ a_t \in \{0, 1\}.$$

For notational convenience we define

$$M \doteq \sum_{k=0}^{\infty} \delta^k (\lambda g_I^m) = \frac{\lambda g_I^m}{1 - \delta},$$

$$R_I \doteq \sum_{k=0}^{\infty} \delta^k (\lambda g_I^d) = \frac{1}{1 - \delta} \lambda g_I^d,$$

$$R_E \doteq \sum_{k=0}^{\infty} \delta^k (\lambda g_E^d - c_E) = \frac{1}{1 - \delta} (\lambda g_E^d - c_E).$$

We next consider a subgame after the exit of Player E. Since exit decisions are final, whenever $a_t = 0$, Player I has no incentive to obfuscate information any longer and her expected discounted payoff is calculated in a straightforward manner.

Lemma 2. Assume that $a_t = 0$ and that the reward process is good ($\theta = 1$). Then for all PBE, $i_t = 0$ and

$$V_I(t, n_t, 0, 1) = M \quad \text{and} \quad V_E(t, n_t, 0) = 0,$$

for any $n_t \in \{0, 1\}$ and $t \in \{0, 1, \ldots\}$.

Proof: Since Player E exited and exit decisions are final, Player I will be receiving at each time period a reward equal to $g_I^m$ with probability $\lambda$. Hence, her value function is equal to

$$M = \sum_{k=t}^{\infty} \delta^{k - t} \lambda g_I^m = \frac{\lambda g_I^m}{1 - \delta},$$

where we used the fact that positive rewards arrive at each time instant with probability $\lambda$.

We next consider the subgame after a revelation has occurred. In particular, we show that after a revelation occurs, i.e., at any time $t \ge T$, Player I never obfuscates and Player E always exploits, i.e., $i_t = 0$ and $e_t = 1$. This result is fairly intuitive: after a revelation, Player I can no longer affect the information of Player E, and hence her subsequent actions. Therefore, Player I is better off not incurring the obfuscation cost.

Lemma 3. Consider a subgame, at period $t$, after a revelation has occurred. For all entry equilibria,
(i) $e_t = 1$ and $i_t = 0$;
(ii) $V_E(t, 0, 1) = R_E$, $V_I(t, 0, 1, 1) = R_I$.

Proof: For the purposes of a contradiction, assume that Player E either is indifferent between exploring and exiting or strictly prefers to exit. If this is the case, we have

$$\lambda q_t g_E^d - c_E + \delta V_E(t + 1, 0, 1) \le 0 = V_E(t, 0, 0).$$

On the other hand, $V_E(t + 1, 0, 1) \ge 0$ (since exiting is always an option), hence

$$q_t \le \frac{c_E}{\lambda g_E^d}.$$

Note that for all $t$ after a revelation $p_t = 1$ and therefore it has to be that

$$P(i_t = 0 \mid \theta = 1, N_t^c) \le \frac{c_E}{\lambda g_E^d} < 1,$$

by Assumption 2. Therefore, the incumbent either strictly prefers to obfuscate, or is indifferent between obfuscation and no obfuscation. In other words, her payoff from obfuscating,

$$-c_I + P(e_t = 0 \mid N_t^c)\, \lambda \left( g_I^t + \frac{\delta}{1 - \delta} g_I^m \right) + P(e_t = 1 \mid N_t^c)\, \delta V_I(t + 1),$$

has to be weakly larger than her payoff from not obfuscating,

$$P(e_t = 0 \mid N_t^c)\, \lambda \left( \tilde g_I^m + \frac{\delta}{1 - \delta} g_I^m \right) + P(e_t = 1 \mid N_t^c)\, \delta V_I(t + 1) + P(e_t = 1 \mid N_t^c)\, \lambda g_I^d,$$

which is a contradiction since $\tilde g_I^m > g_I^t$ and $g_I^d > 0$. The second part of the lemma follows from a simple calculation.

In order to handle the more complicated subgames for which a revelation has not yet occurred and Player E is still in the game learning the type of the reward process, we need to study the evolution of the beliefs $q_t$ and $p_t$ for different strategy profiles. This is the content of the next section.

5. Belief Trajectories

In this section we analyze the learning behavior of Player E for a given PBE of the game. In particular, we characterize the evolution of the beliefs $p_t$ and $q_t$ given a strategy profile $(i_t)_{t \in \{0,1,\ldots\}}$ of Player I. The first lemma of this section is concerned with the evolution of the belief $p_t$. In particular, we expect the belief that the reward process is good to (weakly) decrease over time in the absence of a revelation, regardless of the strategy of the incumbent. On the other hand, the rate at which the belief $p_t$ decreases depends drastically on the strategy profile of Player I, as the next lemma illustrates.

Lemma 4. Let $(i_t)_{t \in \{0,1,\ldots\}}$ be the stochastic process that defines the strategy of Player I. Furthermore, consider the beliefs $p_t$ and $q_t$ as defined in Equations (1) and (2). Then,

$$p_{t+1} - p_t = -\lambda p_t (1 - p_t)\, \frac{P(i_t = 0 \mid N_t, a_t = 1, \theta = 1)}{1 - \lambda p_t\, P(i_t = 0 \mid N_t, a_t = 1, \theta = 1)}.$$

Proof: By definition,

$$p_{t+1} = P(\theta = 1 \mid O_t, N_t),$$

where we use $O_t$ to denote the event that there was no revelation (exactly) at time $t$. We can then write

$$p_{t+1} = \frac{P(O_t \mid \theta = 1, N_t)\, P(\theta = 1 \mid N_t)}{P(O_t \mid \theta = 1, N_t)\, P(\theta = 1 \mid N_t) + P(O_t \mid \theta = 0, N_t)\, (1 - P(\theta = 1 \mid N_t))}.$$

Next we evaluate the different terms for a given strategy profile of Player I. In particular, we observe that

$$P(O_t \mid \theta = 0, N_t) = 1,$$

since the probability of a revelation if the reward process is bad is equal to zero. Furthermore, we observe that

$$P(\theta = 1 \mid N_t) = p_t,$$

by definition. Finally,

$$P(O_t \mid \theta = 1, N_t) = 1 - \lambda P(i_t = 0 \mid N_t, a_t = 1, \theta = 1),$$

since, if the reward process is good, there is an arrival at time $t$ with probability $\lambda$ if and only if Player I is not obfuscating. Combining the above we obtain the desired result.

As we can see from the lemma above, the rate at which the belief $p_t$ drops is monotonically increasing in $P(i_t = 0 \mid N_t, a_t = 1, \theta = 1)$. In the extreme case where the incumbent always obfuscates, the belief $p_t$ does not change, as the absence of a revelation is explained by the obfuscation action and not by the type of the reward process. In the other extreme (that we will study shortly), learning happens at the maximum rate, as the absence of a revelation can only be attributed to the reward process being bad or to bad luck, never to obfuscation. This observation, together with the fact that

$$q_t = P(\theta = 1, i_t = 0 \mid N_t) = P(i_t = 0 \mid \theta = 1, N_t)\, p_t, \tag{3}$$

describes the fundamental tradeoff that governs the strategy of Player I. If she obfuscates intensively ($P(i_t = 0 \mid \theta = 1, N_t)$ is small), then the rate at which the entrant learns is small and Player E tends to stay longer. On the other hand, if she does not obfuscate intensively ($P(i_t = 0 \mid \theta = 1, N_t)$ is large), the entrant learns fast but there is a higher chance that a revelation occurs (since the probability of a revelation is equal to $\lambda q_t$), in which case she would have to share payoffs indefinitely. Optimally balancing these two effects is the key to solving the competitive exploration game, as we shall soon show.

6. Case I: Full experimentation is an equilibrium

In order to develop some intuition on the learning aspect of our problem, we study a special case where Player I never obfuscates, i.e., $i_t = 0$ for all $t \in \{0, 1, \ldots\}$, and we provide conditions under which this can happen as an outcome of an entry equilibrium. For this special case, we first analyze the evolution of beliefs in the following lemma.
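The one-period belief update of Lemma 4 and the relation in Equation (3) can be sketched as follows. This is a minimal illustration rather than the authors' code: the names `update_belief` and `revelation_belief` are hypothetical, and `x` stands for the no-obfuscation probability $P(i_t = 0 \mid N_t, a_t = 1, \theta = 1)$.

```python
def update_belief(p, x, lam):
    """Belief p_{t+1} after one more period without a revelation (Lemma 4).

    p:   current belief p_t that the reward process is good
    x:   probability that Player I does NOT obfuscate, given a good process
    lam: arrival probability lambda
    """
    return p - lam * p * (1 - p) * x / (1 - lam * p * x)


def revelation_belief(p, x):
    """Belief q_t that the process is good and I is not obfuscating (Eq. (3))."""
    return x * p


# Two extremes discussed in the text, starting from p_t = 0.5 with lambda = 0.5:
always_obfuscate = update_belief(0.5, 0.0, 0.5)  # belief does not move
never_obfuscate = update_belief(0.5, 1.0, 0.5)   # fastest learning
```

With `x = 0` the belief stays at its current value, and with `x = 1` the update coincides with the plain Bayes formula $(1 - \lambda) p_t / (1 - \lambda p_t)$.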

Lemma 5. Assume that $i_t = 0$ for all $t \in \{0, 1, \ldots\}$. Then

$$p_t = \frac{p_0 (1 - \lambda)^t}{1 - p_0 + p_0 (1 - \lambda)^t}.$$

Proof: One can prove the lemma using two different methods. The most straightforward way is to use the fact that for all $t \in \{0, 1, \ldots\}$

$$P(i_t = 0 \mid \theta = 1, N_t) = 1$$

to obtain that

$$p_{t+1} - p_t = -\frac{\lambda p_t (1 - p_t)}{1 - \lambda p_t},$$

from Lemma 4, and verify that indeed the solution is as given in the statement of the lemma. Alternatively, by definition $p_t = P(\theta = 1 \mid N_t)$. Using Bayes' rule and assuming that $i_t = 0$ for all $t \in \{0, 1, \ldots\}$, we obtain

$$p_t = \frac{P(N_t \mid \theta = 1)\, p_0}{P(N_t \mid \theta = 1)\, p_0 + P(N_t \mid \theta = 0)\, (1 - p_0)}.$$

Note that

$$P(N_t \mid \theta = 0) = 1,$$

while

$$P(N_t \mid \theta = 1) = (1 - \lambda)^t,$$

which concludes the proof.

The next lemma characterizes Player E's optimal strategy when $i_t = 0$ for all $t \in \{0, 1, \ldots\}$. We will refer to this strategy for Player E as the full exploration strategy.

Lemma 6. Assume that $i_t = 0$ for all $t \in \{0, 1, \ldots\}$. Then, as long as no revelation occurs, Player E explores until her belief drops below

$$p^* = \frac{c_E}{\lambda (\tilde g_E^d + \delta R_E)},$$

in which case she exits. If a revelation occurs she exploits indefinitely.

Proof: We start by writing the Bellman equation associated with the decision problem of Player E. We remind the reader that $V_E(t, 1, 1)$ denotes the value function of Player E as long as no revelation has occurred and she is still in the game. The corresponding Bellman equation can be written as

$$V_E(t, 1, 1) = \max\{V_E(t, 1, 0),\ -c_E + \lambda p_t \tilde g_E^d + \lambda p_t \delta V_E(t + 1, 0, 1) + \delta (1 - \lambda p_t) V_E(t + 1, 1, 1)\}$$
$$= \max\{0,\ -c_E + \lambda p_t \tilde g_E^d + \lambda p_t \delta R_E + \delta (1 - \lambda p_t) V_E(t + 1, 1, 1)\},$$

where the first term in the maximization corresponds to the decision of leaving the game (exit), which yields payoff equal to $0$ by Lemma 2, while the second term corresponds to the decision of staying in the game. In particular, if Player E decides to stay in the game, she pays the exploration cost ($c_E$) and, if a revelation occurs (which happens with probability $\lambda p_t$), she obtains an immediate reward equal to $\tilde g_E^d$ and a continuation payoff equal to $R_E$, as obtained in Lemma 3. If a revelation does not occur, she obtains $V_E(t + 1, 1, 1)$. Let $\bar t$ denote the last time for which it is optimal for Player E to stay in the game. Then $V_E(\bar t + 1, 1, 1) = 0$, in which case

$$V_E(\bar t, 1, 1) = -c_E + \lambda p_{\bar t}\, \tilde g_E^d + \lambda p_{\bar t}\, \delta R_E \ge 0.$$

Hence,

$$p_{\bar t} \ge \frac{c_E}{\lambda (\tilde g_E^d + \delta R_E)},$$

concluding the proof.

As it turns out, this belief $p^*$, which we refer to as the indifference belief, is very important for the analysis that follows. Note that if, for some Perfect Bayesian Equilibrium of the game under consideration, there exists some time $t$ where the belief $p_t$ is less than $p^*$, then it is (strictly) optimal for I to stop obfuscating. Essentially, even if I did not obfuscate, E would exit anyway. Therefore, Player I can save the cost of obfuscation.

Corollary 1. Let $(e_t, i_t, p_t)$ be an entry equilibrium and assume that at some $t$, $p_t < p^*$. Then $i_t = 0$ and $e_t = 0$.

As a last remark, note that if $p_0 < p^*$ then there does not exist an entry equilibrium.

Corollary 2. If $p_0 < p^*$, there does not exist an entry equilibrium.

In the rest of this section, we show that if

$$c_I + \lambda \tilde g_I^d \ge \lambda \delta (M - R_I) \tag{4}$$

then it is optimal for the incumbent to not obfuscate for all periods $t \in \{0, 1, \ldots\}$.
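The closed form of Lemma 5 and the indifference belief of Lemma 6 lend themselves to a short numerical sketch. The code below is illustrative only: the function and parameter names are hypothetical, `g_E_d_first` stands for the first-arrival reward $\tilde g_E^d$, and $R_E$ is computed as $(\lambda g_E^d - c_E)/(1 - \delta)$, which is an assumption about how that constant should be read.

```python
def belief_no_obfuscation(p0, lam, t):
    """Belief p_t after t periods without a revelation when i_t = 0 (Lemma 5)."""
    survive = (1 - lam) ** t  # P(no arrival in t periods | good process)
    return p0 * survive / (1 - p0 + p0 * survive)


def indifference_belief(c_E, lam, delta, g_E_d, g_E_d_first):
    """Threshold p* below which Player E exits (Lemma 6)."""
    R_E = (lam * g_E_d - c_E) / (1 - delta)  # exploitation value (assumed form)
    return c_E / (lam * (g_E_d_first + delta * R_E))


def exit_time(p0, lam, p_star, max_t=10_000):
    """First period at which the no-obfuscation belief falls below p_star.

    Returns None if this does not happen within max_t periods.
    """
    for t in range(max_t):
        if belief_no_obfuscation(p0, lam, t) < p_star:
            return t
    return None


p_star = indifference_belief(c_E=0.25, lam=0.5, delta=0.5,
                             g_E_d=1.0, g_E_d_first=1.0)
t_exit = exit_time(p0=0.5, lam=0.5, p_star=p_star)
```

With these example numbers the threshold is $0.4$ and the belief moves from $0.5$ to $1/3$ after one silent period, so the entrant following the full exploration strategy would explore once and then exit.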
In order to obtain some intuition on this result, consider some period $t$ for which it is guaranteed that, absent a revelation, Player E will exit at time $t + 1$ with probability one. Consider the decision problem of the incumbent. If she obfuscates, she pays the obfuscation cost $c_I$ and the opportunity cost $\lambda \tilde g_I^d$. On the other hand, by obfuscating she ensures that even if an arrival would occur (with probability $\lambda$), it would be hidden from the entrant, who in turn would exit at the next period. Therefore, obfuscation leads to a benefit of $\lambda \delta (M - R_I)$ (monopoly minus shared payoffs). If $c_I + \lambda \tilde g_I^d > \lambda \delta (M - R_I)$ she strictly prefers to not obfuscate and let Player E explore. The following lemma formalizes this intuition.

Lemma 7. Assume that Player E follows the full exploration strategy. If (4) holds, then it is optimal for Player I to not obfuscate, i.e., $i_t = 0$ for all $t \in \{0, 1, \ldots\}$.

Proof: Let us denote by $\bar t$ the first time that $p_t < p^*$. According to the full exploration strategy, Player E explores until time $\bar t - 1$ and exits at time $\bar t$ as long as no revelation occurs. If a revelation occurs by time $\bar t$, she exploits indefinitely. We first use a simple Dynamic Programming argument to argue that for all $t \in \{0, 1, \ldots\}$ and given the full exploration strategy for Player E,

$$V_I(t, 1, 1, 1) \le V_I(t + 1, 1, 1, 1). \tag{5}$$

Indeed, since Player E will exit at time $\bar t$ and will be exploring with probability one until then, Player I can use at time $t + 1$ a copy of the strategy at time $t$ and get weakly better utility (since, looking from time $t + 1$, Player E will exit sooner).

We now prove that under the conditions of the lemma, it is optimal for Player I to not obfuscate. According to the full exploration strategy, $e_t = 0$ for all $t \ge \bar t$ and hence, by Lemma 2, the value function of Player I is equal to $V_I(\bar t, 1, 0, 1) = M$. We will now use induction to prove that it is optimal for Player I to not obfuscate for all times $\bar t - k$, where $1 \le k \le \bar t$. We first show that this is the case for $k = 1$. Indeed, at time $\bar t - 1$, Player I can obfuscate, in which case she receives

$$-c_I + \delta V_I(\bar t, 1, 0, 1) = -c_I + \delta M,$$

or not obfuscate, in which case she receives

$$\lambda \big(\tilde g_I^d + \delta V_I(\bar t, 0, 1, 1)\big) + (1 - \lambda) \delta V_I(\bar t, 1, 0, 1) = \lambda (\tilde g_I^d + \delta R_I) + (1 - \lambda) \delta M.$$

The first term in the above corresponds to the event of a revelation, while the second term corresponds to non-occurrence of a revelation. The equality follows from Lemma 3 and the definition of $\bar t$ respectively. Under Assumption (4),

$$-c_I + \delta M \le \lambda (\tilde g_I^d + \delta R_I) + (1 - \lambda) \delta M,$$

and hence it is (weakly) optimal for Player I to not obfuscate at time $\bar t - 1$.

For the induction step, assume that for some $k \le \bar t - 1$ it is optimal for the incumbent to not obfuscate at time $\bar t - k$, i.e.,

$$-c_I + \delta V_I(\bar t - k + 1, 1, 0, 1) \le \lambda (\tilde g_I^d + \delta R_I) + (1 - \lambda) \delta V_I(\bar t - k + 1, 1, 0, 1),$$

or equivalently,

$$-c_I + \lambda \delta V_I(\bar t - k + 1, 1, 0, 1) \le \lambda (\tilde g_I^d + \delta R_I).$$

Using Equation (5), we obtain

$$-c_I + \lambda \delta V_I(\bar t - k, 1, 0, 1) \le -c_I + \lambda \delta V_I(\bar t - k + 1, 1, 0, 1) \le \lambda (\tilde g_I^d + \delta R_I),$$

which yields

$$-c_I + \delta V_I(\bar t - k, 1, 0, 1) \le \lambda (\tilde g_I^d + \delta R_I) + (1 - \lambda) \delta V_I(\bar t - k, 1, 0, 1),$$

and hence it is optimal to not obfuscate at time $\bar t - (k + 1)$ too, concluding the proof.

Combining Lemmata 6 and 7 we obtain the first result of this paper.

Theorem 1. If Assumption (4) holds and $p_0 > p^*$, then the strategies $(e_t, i_t)$ defined in Table 2 and the belief $p_t$ defined in Lemma 5 constitute the unique entry equilibrium of the competitive exploration game.

Proof: The fact that the proposed strategy profiles and beliefs constitute an entry equilibrium follows from Lemmata 6 and 7. The proof of uniqueness can be found in Appendix 8.1.

Figure 2 is an illustration of the typical outcome of the game under Assumption (4) and $p_0 > p^*$. Player E explores until her belief drops below $p^*$ and the incumbent never obfuscates. The value functions of Player E and Player I are decreasing and increasing respectively (until the indifference belief).
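Condition (4) is a simple comparison between the cost of one period of obfuscation and the discounted gain from monopoly over shared payoffs, so it can be checked numerically. The sketch below is only an illustration under that reading; the function name `no_obfuscation_is_optimal` and the parameter `g_I_d_first` (standing for $\tilde g_I^d$) are hypothetical.

```python
def no_obfuscation_is_optimal(c_I, lam, delta, g_I_m, g_I_d, g_I_d_first):
    """Check condition (4): c_I + lam * g~_I^d >= lam * delta * (M - R_I)."""
    M = lam * g_I_m / (1 - delta)    # monopoly value of Player I
    R_I = lam * g_I_d / (1 - delta)  # shared (duopoly) value of Player I
    cost_of_obfuscating = c_I + lam * g_I_d_first
    gain_from_exit = lam * delta * (M - R_I)
    return cost_of_obfuscating >= gain_from_exit


# A small monopoly premium: obfuscation does not pay, condition (4) holds.
case_full = no_obfuscation_is_optimal(0.25, 0.5, 0.5, g_I_m=2.0,
                                      g_I_d=1.0, g_I_d_first=1.0)
# A large monopoly premium: condition (4) fails.
case_partial = no_obfuscation_is_optimal(0.25, 0.5, 0.5, g_I_m=8.0,
                                         g_I_d=1.0, g_I_d_first=1.0)
```

In the first case the left-hand side is $0.75$ against a gain of $0.25$; in the second the gain rises to $1.75$, so the condition no longer holds.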

Figure 2. Unique outcome of the competitive exploration game under Assumption (4) and $p_0 > p^*$. (Plot against time of the value function of Player I, the value function of Player E, the belief of Player E, and the indifference threshold belief.)

Table 2. The full exploration equilibrium ($p_0 > p^*$).

| | before revelation | after revelation |
|---|---|---|
| Player E | $e_t = 1$ if $p_t \ge p^*$; $e_t = 0$ if $p_t < p^*$ | $e_t = 1$ |
| Player I | $i_t = 0$ | $i_t = 0$ |

7. Case II: Partial experimentation is an equilibrium

In the previous section we considered the case where

$$c_I + \lambda \tilde g_I^d \ge \lambda \delta (M - R_I),$$

and showed that full experimentation and no obfuscation is the unique entry equilibrium of the competitive exploration game. In this section we consider the complementary case where

$$c_I + \lambda \tilde g_I^d < \lambda \delta (M - R_I), \tag{6}$$

and characterize the entry equilibrium. Under this assumption, Player I can now afford to pay the obfuscation cost (direct cost $c_I$ and opportunity cost $\lambda \tilde g_I^d$) in order to push the entrant out of the game, obtaining the benefit of $\lambda \delta (M - R_I)$. Therefore, the full exploration, no obfuscation set of strategies of Table 2 cannot constitute an equilibrium any longer, since Player I would obfuscate with probability one right before the indifference belief $p^*$, leading to a premature exit of the entrant. This intuition is crystallized in the next lemma.

Lemma 8. Assume that Assumption (6) holds. Then for any entry equilibrium and for all subgames such that $p_t > p^*$ and $P(e_{t+1} = 0 \mid N_t) = 1$, we have $P(e_t = 1) < 1$.

Proof: Consider a period $t$ such that $p_t > p^*$ and $P(e_{t+1} = 0 \mid N_t) = 1$, but $P(e_t = 1 \mid N_t) = 1$. Then, if the incumbent obfuscates, she obtains

$$-c_I + \delta M,$$

while if she does not she obtains

$$\lambda (\tilde g_I^d + \delta R_I) + (1 - \lambda) \delta M.$$

By Assumption (6) her payoff from obfuscating is strictly larger and therefore $P(i_t = 1 \mid N_t) = 1$. If this is the case, Player E obtains a payoff of $-c_E$ from exploring (since both her continuation and her immediate payoffs are zero), while her payoff from exiting is $0$. Therefore she strictly prefers to exit, which contradicts $P(e_t = 1 \mid N_t) = 1$.

In order to study this richer and insightful case of the competitive exploration game, we start with some preliminary notation. First, we denote by $F(k)$ the (unique) solution to the system of difference equations

$$r(k) = \frac{\lambda \delta F(k - 1) - \lambda \tilde g_I^d - \lambda \delta R_I - c_I}{\lambda \delta F(k - 1) - \lambda \tilde g_I^d - \lambda \delta R_I + \lambda (\tilde g_I^m - g_I^t)},$$
$$F(k) = -c_I + r(k) (\lambda g_I^t + \delta M) + (1 - r(k)) \delta F(k - 1), \tag{7}$$
$$F(0) = M.$$

Moreover, we write

$$K_1 = \inf\{k \in \{0, 1, \ldots\} : r(k) < 0\}, \tag{8}$$

and

$$\bar p = 1 - (1 - p^*)(1 - \lambda p^*)^{K_1 + 1}. \tag{9}$$

Finally, we recursively define, for any belief $p_t \ge p^*$, the number of periods until the belief drops below $p^*$ as

$$k(p_t) = \begin{cases} \dfrac{\log\left(\frac{1 - p_t}{1 - p^*}\right)}{\log(1 - \lambda p^*)}, & \text{if } p_t \le \bar p, \\[2ex] k(p_{t+1}) + 1, & \text{if } p_t > \bar p. \end{cases}$$

The next lemma follows from the definition of $\bar p$ as given above.

Lemma 9. $\bar p > p^*$ if and only if Assumption (6) holds.

Proof: If $\bar p > p^*$ then $K_1 > 1$ and hence $r(1) > 0$. Therefore,

$$r(1) = \frac{\lambda \delta M - \lambda \tilde g_I^d - \lambda \delta R_I - c_I}{\lambda \delta M - \lambda \tilde g_I^d - \lambda \delta R_I + \lambda (\tilde g_I^m - g_I^t)} > 0,$$

which in turn yields

$$\lambda \delta (M - R_I) > c_I + \lambda \tilde g_I^d,$$

concluding the proof. Following the same steps backwards, we can prove the other direction.
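The difference equations in (7) can be iterated forward from $F(0) = M$. The sketch below is a hedged reading of that system rather than the authors' implementation: the denominator of $r(k)$ includes the term $\lambda(\tilde g_I^m - g_I^t)$ that appears in the expression for $r(1)$ in the proof of Lemma 9, the names `solve_recursion`, `g_I_d_first` and `g_I_m_first` (for $\tilde g_I^d$ and $\tilde g_I^m$) are hypothetical, and no attempt is made to locate $K_1$.

```python
def solve_recursion(n_steps, lam, delta, c_I, g_I_m, g_I_d, g_I_t,
                    g_I_d_first, g_I_m_first):
    """Iterate r(k) and F(k) from system (7) for k = 1, ..., n_steps.

    Returns two lists indexed by k; r[0] is None because r(0) is not
    defined by the recursion, and F[0] = M.
    """
    M = lam * g_I_m / (1 - delta)    # monopoly value
    R_I = lam * g_I_d / (1 - delta)  # shared value
    r, F = [None], [M]
    for k in range(1, n_steps + 1):
        # Common part of numerator and denominator of r(k).
        A = lam * delta * F[k - 1] - lam * g_I_d_first - lam * delta * R_I
        # Assumed denominator; this raises ZeroDivisionError if it vanishes.
        r_k = (A - c_I) / (A + lam * (g_I_m_first - g_I_t))
        F_k = (-c_I + r_k * (lam * g_I_t + delta * M)
               + (1 - r_k) * delta * F[k - 1])
        r.append(r_k)
        F.append(F_k)
    return r, F


r, F = solve_recursion(1, lam=0.5, delta=0.5, c_I=0.25, g_I_m=8.0,
                       g_I_d=1.0, g_I_t=8.0,
                       g_I_d_first=1.0, g_I_m_first=8.0)
```

With these example parameters, which satisfy (6), the first step gives $r(1) = 0.8$ and $F(1) = 6.95$, below the monopoly value $M = 8$.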

15 Management Science 00(0), pp , c 0000 INFORMS 15 Using these definitions, we now consider the strategies of the two players, as defined in Table 3 and we refer to this set of strategies as the partial experimentation strategies. In the rest of this section we analyze this strategy and prove that they capture the unique outcome of the competitive exploration game. We first start by analyzing the evolution of beliefs under the partial experimentation strategies. Lemma 10. Assume that Player I uses the strategy described in Table 3. Then for all t t and for all t > t, where is the last time where p t p. p t = p t = 1 p 0. p p 0 (1 λ) t { ( p(1 p0 ) t = max log (1 p)p 0 (1 p t) (1 λp ). t t ) } 1, 0, log(1 λ) Proof: By the definition of the strategy of Player I, as long as p t p, i t = 0 and hence by Lemma 4 we obtain the first part of the result. When p t p, P(i t = 1 θ = 1, N t ) = p p 0, where p is the indifference belief defined in Lemma 6. Using Lemma 4 we get and therefore λp p t+1 p t = (1 p t ) 1 λp, p t+1 = p t λp 1 λp. It is straightforward to verify that the solution to the above difference equation is given by the expression in the statement of the lemma. Figure 3 illustrates the evolution of beliefs on under this set of strategies. Observe that, due to the obfuscation action the belief of the entrant drops at a lower rate than the optimal but still at a high enough rate so that he stays in the game with positive probability. Next, we prove that the partial experimentation strategies and the beliefs p t as defined in Lemma 10 constitute a PBE of the dynamic exploration game. Before diving into the technical description consider some time t where the belief p t is close to the indifference level p (but still above). By Assumption 6, Player I is willing to pay the obfuscation cost in order to drive Player E out of

16 16 Management Science 00(0), pp , c 0000 INFORMS p t p 0 p full exploration belief trajectory partial exploration belief trajectory Figure 3 p partial obfuscation Belief trajectory under partial experimentation strategies. t Table 3 The partial experimentation strategies before revelation after revelation Belief p t p t p p p t < p p t p p t = 1 Player E e t = 1 P(e t = 0 N t ) = r(k(p t )) e t = 0 e t = 1 Player I i t = 0 P(i t = 0 N t, θ = 1) = p p t i t = 0 i t = 0 the game and hence is willing to obfuscate with positive probability. On the other hand Player E, at equilibrium is aware that this is a possibility and hence takes this into account in the update of her belief. Therefore the belief of the entrant close to p moves slower than for higher beliefs and the entrant tends to stay a little longer in order to learn. On the other hand her as the obfuscation probability increases her immediate payoffs and learning payoffs tend to decrease which leads to an effect in the opposite direction. Therefore Player I should obfuscate at an appropriate rate (probability) so that she makes Player E indifferent between staying and exiting, essentially cancelling out the two effects. This observation drives the development of the equilibrium strategies. Lemma 11. The strategies (i t, e t ) defined in Table 3 and the belief system p t as defined in Lemma 10 are a PBE of the dynamic exploration game. Proof: We start by proving optimality for every subgame, for Player E. For all subgames after a revelation Lemma 3 proves that e t = 1, i t = 0 are optimal. Moreover, for all subgames for which p t p, Corollary 1 proves that e t = 0, i t = 0 are optimal. Next, we consider all subgames for which p t p. We claim that for all such subgames V E (t, 1, 1) = 0. We already argued that this is the case for p t < p. Consider now subgames for which p p t p. 
In all such cases, according to her strategy, as long as no revelation has occurred and Player E is still in the game, Player I refrains from obfuscating with probability $p^*/p_t$ and hence $q_t = p_t \cdot p^*/p_t = p^*$. Therefore,

\[ V_E(t, 1, 1) = \max\{0, -c_E + \lambda q_t (g^d_E + \delta D) + (1 - \lambda q_t)\delta V_E(t+1, 1, 1)\} = \max\{0, (1 - \lambda p^*)\delta V_E(t+1, 1, 1)\}, \]
where the second equality follows from the definition of $p^*$. From Lemma 10, there exists some time $T$ for which $p_T < p^*$, and hence $V_E(T, 1, 1) = 0$. The latter is a boundary condition for the Bellman equation. Hence $V_E(t, 1, 1) = 0$ is a solution to the Bellman equation for which the proposed actions are (weakly) optimal. Hence Player E is indifferent between exiting and exploring, making the proposed strategy optimal.

Finally, for all subgames for which $p_t > \bar p$, if Player E exits she obtains 0, while if she explores she obtains
\[ -c_E + \lambda p_t (g^d_E + \delta R_E) + (1 - \lambda q_t)\delta V_E(t+1, 1, 1) \ge \lambda (p_t - p^*)(g^d_E + \delta D) > 0, \]
where the first inequality follows from the definition of $p^*$ and the fact that $V_E(t+1, 1, 1) \ge 0$ (there is always the option to exit), and the second from $p_t > \bar p \ge p^*$. Hence the exploration action is optimal.

We now focus on Player I and we start by considering subgames for which $p_t < p^*$. In such subgames $V_I(t, 1, 1, 1) = M$. Next, we consider subgames for which $p^* \le p_t \le \bar p$. The Bellman equation can be written as
\[ V_I(t, 1, 1, 1) = \max\{ -c_I + r(k(p_t))(\lambda g^t_I + \delta M) + (1 - r(k(p_t)))\delta V_I(t+1, 1, 1, 1), \]
\[ \qquad r(k(p_t))(\lambda g^m_I + \delta M) + (1 - r(k(p_t)))((1-\lambda)\delta V_I(t+1, 1, 1, 1) + \lambda g^d_I + \lambda\delta R_I) \}, \]
where the first term corresponds to the obfuscation action and the second term corresponds to non-obfuscation. We claim that $V_I(t, 1, 1, 1) = F(k(p_t))$ and that the proposed strategy solves the Bellman equation with equality. Since $k(p_t)$ is the number of periods until the belief drops below $p^*$, we have $k(p_t) \ge 1$ for all $p_t \ge p^*$. We prove the result using induction on $k(p_t)$. First consider all beliefs for which $k(p_t) = 1$. In this case, by Lemma 10 and since $k(p_t) = 1$,
\[ p_{t+1} = 1 - \frac{1 - p_t}{1 - \lambda p^*} \le 1 - (1 - p^*) = p^*, \]
and therefore $V_I(t+1, 1, 1, 1) = M$.
Therefore the payoff from obfuscating equals
\[ -c_I + r(1)(\lambda g^t_I + \delta M) + (1 - r(1))\delta M \]
and the payoff from not obfuscating equals
\[ r(1)(\lambda g^m_I + \delta M) + (1 - r(1))((1-\lambda)\delta M + \lambda g^d_I + \lambda\delta R_I). \]

18 18 Management Science 00(0), pp , c 0000 INFORMS Note that by definition r(1) = c I λ( g m I g t i) λδm λ g d I λδr I λ( g m I gt I ), and therefore the payoffs from obfuscating and not obfuscating are equal, making Player I indifferent between obfuscating and not obfuscating. Hence, V I (t, 1, 1, 1) = c I + r(1)(λg t I + δm) + (1 r(1))δm = F (1), for all subgames such that k(p t ) = 1, where the last equality follows from the definition of F (1), concluding the first step of the induction. Assume now that for all subgames for which k(p t ) = k it holds that V I (t, 1, 1) = F (k). We prove that this is also the case for all beliefs for which k(p t ) = k + 1. We first prove that if k(p t ) = k + 1, then k(p t+1 ) = k. Indeed, ( 1 pt+1 k(p t+1 ) = log 1 p where we used the fact that ) ( 1 1 pt = log log(1 λp ) 1 p p t+1 = 1 1 p t 1 λp, from Lemma 10. Therefore the payoff from obfuscating equals ) 1 log(1 λp ) + = k. log(1 λp ) log(1 λp ) c I + r(k + 1)(λg t I + δm) + (1 r(k + 1))δF (k) and the payoff from not obfuscating equals r(k + 1)(λ g m I + δm) + (1 r(k + 1))((1 λ)δf (k) + λ g d I + λδr I )}. By the definition of r(k) the payoffs from obfuscating and not obfuscating are equal, making Player I indifferent between obfuscating and not obfuscating. Furthermore, V I (t, 1, 1, 1) = c I + r(k + 1)(λg t I + δm) + (1 r(k + 1))δF (k) = F (k + 1) which in turn equals F (k + 1) by Equation (16). We finally consider all subgames for which p t p. By the definition of p, p t > p implies k(p t ) > K 1. We first prove that for all beliefs p t for which k(p t ) > K 1, λδv I (t, 1, 1, 1) λ g d i + λδr I + c I, V I (t, 1, 1) V I (t + 1, 1, 1, 1),

and that it is optimal for Player I to not obfuscate. We proceed using induction. If $k(p_t) = k_1 = K_1 + 1$, then $r(k_1) < 0$, which in turn implies that
\[ \lambda\delta V_I(t+1, 1, 1, 1) < \lambda g^d_I + \lambda\delta R_I + c_I. \qquad (10) \]
If Player I obfuscates she obtains
\[ -c_I + \delta V_I(t+1, 1, 1, 1), \]
while if she does not obfuscate she obtains
\[ \lambda g^d_I + \lambda\delta R_I + (1-\lambda)\delta V_I(t+1, 1, 1, 1). \]
Using Equation (10) we conclude that the incumbent strictly prefers to not obfuscate. Furthermore,
\[ V_I(t, 1, 1, 1) = \lambda g^d_I + \lambda\delta R_I + (1-\lambda)\delta V_I(t+1, 1, 1, 1) \]
\[ = \lambda g^d_I + \delta(\lambda R_I + (1-\lambda)V_I(t+1, 1, 1, 1)) \]
\[ \le (1-\delta)V_I(t+1, 1, 1, 1) + \delta V_I(t+1, 1, 1, 1) = V_I(t+1, 1, 1, 1), \]
where we used the fact that
\[ V_I(t+1, 1, 1, 1) \ge R_I \ge \frac{\lambda g^d_I}{1-\delta}, \]
hence proving the base case. The inductive step follows exactly the same arguments and we omit it here for brevity.

8. Application: Market entry with uncertain demand

In this section we consider an application of our framework in a market entry situation. In particular, we consider two players: an entrant (E) and an incumbent (I). In any period $t$ the inverse demand function can be described by $P_0(Q) = 0$ or $P_1(Q) = \max\{\alpha(Q_o - Q), 0\}$. The market can either be good or bad. If the market is good, the inverse demand function $P_1(Q)$ is realized with probability $\lambda$ and the demand function $P_0(Q)$ is realized with probability $(1-\lambda)$, independently over time. If the market is bad, the demand function $P_0(Q)$ is always realized. The

two types of the demand function are depicted in Figure 5.

Figure 4. Unique outcome of the competitive exploration game under Assumption (6) and $p_0 > \bar p$. (Axes: time, payoff. Curves: value function of Player I, value function of Player E, belief of Player E. Marked levels: indifference threshold, partial exploration threshold belief.)

Figure 5. Demand functions for different types of the market. (Axes: $Q$, $P$. The demand $P_1$ is realized with probability $\lambda$ if the market is good and with probability 0 if it is bad; the demand $P_0$ is realized with probability $(1-\lambda)$ if the market is good and with probability 1 if it is bad. Marked points: $Q_{\min}$, $Q_o$.)

We denote by $P_t(Q)$ the (random) inverse demand function that is realized each period. Marginal production cost is $c$ for the entrant and 0 for the incumbent. Moreover, the entrant needs to produce at least $Q_{\min}$ in order to participate in the market. This assumption is imposed to prevent "free" monitoring of the market by the entrant (costly experimentation). The type of the market is (initially) unknown to the entrant and she assigns probability $p_0$ to the event that the market is good. We denote by $p_t$ the probability that she assigns to the event that the market is good given all her information by time $t$. The incumbent knows the type of the market. Each of the two players, at any time instant, decides on a production quantity $Q_i(t)$, in

order to maximize their expected discounted payoff
\[ \mathbb{E}\left[ \sum_{k=0}^{\infty} \delta^k \left( Q_i(t+k) P_{t+k}(Q(t+k)) - c\,Q_i(t+k) \right) \right], \]
where $Q(t) = Q_E(t) + Q_I(t)$ denotes the total production quantity at period $t$. Note that the entrant at any time $t$ can decide to produce zero quantity, which corresponds to the decision of exiting the market and is irreversible. We denote by $a_t = 1_{Q_E(t-1) > 0}$ the state of the entrant at time $t$. In other words, $a_t = 1$ if and only if the entrant has not exited the market so far. If the entrant observes a positive price while she is producing a positive quantity, we say that a revelation occurred, since immediately after that event she knows that the market is good with probability one. For the remainder of this section we denote by $p_t$ the probability that the entrant assigns to the market being good, given that no revelation has occurred until time $t$.

Next, we enumerate a set of assumptions that we make for the remainder of this section. First, we would like to ensure that it is myopically suboptimal for the entrant to explore, or equivalently that her immediate payoff is always smaller than her immediate cost:
\[ \max_{Q_E, Q_I} \{ \mathbb{E}_t[P_t(Q_E + Q_I)] - c \} < 0. \]
It is fairly straightforward to verify that this assumption is equivalent to the following.

Assumption 1. The initial belief of the entrant is such that
\[ p_0 < \frac{c}{\lambda\alpha(Q_o - Q_{\min})}. \]

This assumption does not change the analysis and the results significantly; it just makes the calculations significantly simpler. Finally, we ensure that after a revelation the minimal production constraint is not binding for the entrant. As we shall soon verify (Lemma 12), this is ensured when the following natural assumption holds.

Assumption 2.
\[ Q_o > 3Q_{\min} + \frac{c}{\lambda\alpha}. \]

Note that in this case, if the market turns out to be good, it will produce strictly positive payoffs for the entrant.
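As a quick numerical sanity check, the two assumptions can be evaluated for a given parameter set. The sketch below reads Assumption 1 as $p_0 < c/(\lambda\alpha(Q_o - Q_{\min}))$ and Assumption 2 as $Q_o > 3Q_{\min} + c/(\lambda\alpha)$. The function names and the numerical values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def myopic_margin(p0, c, lam, alpha, Q_o, Q_min):
    """Largest expected price minus marginal cost for the entrant.

    The expected price lam * p0 * alpha * (Q_o - Q) is largest at the smallest
    feasible total quantity, i.e. Q_E = Q_min and Q_I = 0.
    """
    return lam * p0 * alpha * (Q_o - Q_min) - c


def assumption_1(p0, c, lam, alpha, Q_o, Q_min):
    """Assumption 1: exploring is myopically suboptimal for the entrant."""
    return p0 < c / (lam * alpha * (Q_o - Q_min))


def assumption_2(c, lam, alpha, Q_o, Q_min):
    """Assumption 2: lower bound on Q_o relative to Q_min and c / (lam * alpha)."""
    return Q_o > 3 * Q_min + c / (lam * alpha)


# Hypothetical parameter values, for illustration only.
params = dict(c=2.0, lam=0.5, alpha=1.0, Q_o=10.0, Q_min=1.0)
p0 = 0.4

print(assumption_1(p0, **params), myopic_margin(p0, **params) < 0)
print(assumption_2(**params))
```

With these values the threshold in Assumption 1 is $2/(0.5 \cdot 9) \approx 0.444$, so a prior of 0.4 satisfies it while a prior of 0.5 would not.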
Similar to the previous sections, we want to characterize the entry equilibria of this game and understand the properties of the corresponding outcomes. Observe that in this case the action space (and therefore the strategy space) is much richer than our original competitive experimentation

game. We will shortly prove that essentially the two games are equivalent and hence we can provide a strong prediction on the outcome of this entry game. Note that all the production quantities above $Q_o - Q_{\min}$ for the incumbent are equivalent, so without loss of generality we can assume that $Q_I(t) \in [0, Q_o - Q_{\min}]$. Similarly, all the production quantities above $Q_o$ are strictly dominated for the entrant and therefore we can assume that $Q_E(t) \in \{0\} \cup [Q_{\min}, Q_o)$.

We begin our study of this game by analyzing the subgames after a revelation. After a revelation, the learning aspect of the game is no longer relevant and the two players engage in a standard Cournot competition.

Lemma 12. For any subgame after a revelation has occurred the players receive payoffs equal to
\[ R_I = \frac{\lambda\alpha}{1-\delta}\,\frac{Q_o}{3}\left(\frac{Q_o}{3} + \frac{c}{3\lambda\alpha}\right), \qquad R_E = \frac{\lambda\alpha}{1-\delta}\,\frac{Q_o}{3}\left(\frac{Q_o}{3} - \frac{c}{3\lambda\alpha}\right), \]
and set quantities
\[ Q_I(t) = \frac{Q_o}{3} + \frac{c}{3\lambda\alpha}, \qquad Q_E(t) = \frac{Q_o}{3} - \frac{c}{3\lambda\alpha}. \]

Similarly, after the entrant exits, which we denote by $a_t = 0$, the incumbent sets her production quantity to her monopoly level, as described in the following lemma.

Lemma 13. Assume that the entrant has exited, i.e. $a_t = 0$. Then the incumbent obtains
\[ M = \frac{\lambda\alpha}{1-\delta}\,\frac{Q_o^2}{4}, \]
and produces $Q_o/2$.

We next focus on all the subgames for which a revelation has not yet occurred and the entrant is still in the game ($a_t = 1$). Our next lemma argues that even though the entrant can choose any production quantity in $\{0\} \cup [Q_{\min}, Q_o]$, all the actions except 0 and $Q_{\min}$ are strictly dominated. This is fairly intuitive: before the revelation, while the entrant is uncertain about the type of the market, she would not invest in producing more than the minimum required quantity, since such a decision would only raise her production cost and lower her immediate payoff without affecting the informational outcome of the game.

Lemma 14.
For any entry equilibrium, if $a_t = 1$ and no revelation has occurred until time $t$, then $Q_E(t) \in \{0, Q_{\min}\}$.

Proof: Consider the decision problem of the entrant at time $t$. If it is strictly optimal to exit, she decides $Q_E(t) = 0$ and obtains payoff equal to 0. If she decides to stay, she produces an optimal production quantity $Q$ that maximizes her payoff, which is in turn equal to
\[ V(Q) = -c_E Q + Q\lambda\,\mathbb{E}_t[P(Q(t)) \mid Q_E(t) = Q] + (1 - \lambda P_t(Q_I(t) + Q < Q_o, \theta = 1))\,\delta V_E(t+1, 1, 1) + \lambda P_t(Q_I(t) + Q < Q_o, \theta = 1)\,\delta R_E. \]
We now prove that $V(Q)$ is strictly decreasing in $Q$. Indeed, for any $\epsilon > 0$, note that
\[ P_t(Q_I(t) + Q + \epsilon < Q_o, \theta = 1) \le P_t(Q_I(t) + Q < Q_o, \theta = 1), \]
and $V_E(t+1, 1, 1) \le R_E$, since $R_E$ is the maximum payoff that the entrant can achieve (after revelation). Moreover, since in all cases the inverse demand function is decreasing in the total production quantity,
\[ \mathbb{E}_t[P(Q(t)) \mid Q_E(t) = Q + \epsilon] \le \mathbb{E}_t[P(Q(t)) \mid Q_E(t) = Q]. \]
Therefore, we get
\[ V(Q + \epsilon) - V(Q) \le (-c_E + \lambda\,\mathbb{E}_t[P(Q(t)) \mid Q_E(t) = Q])\,\epsilon. \]
Observe that
\[ -c_E + \lambda\,\mathbb{E}_t[P(Q(t)) \mid Q_E(t) = Q] \le -c_E + \lambda p_t \alpha(Q_o - Q_{\min}) \le -c_E + \lambda p_0 \alpha(Q_o - Q_{\min}) < 0, \]
where the first inequality follows from the fact that the probability of an arrival is always less than $p_t$ and from the fact that the price is always less than $\alpha(Q_o - Q_{\min})$, and the second inequality follows from the fact that $p_t \le p_0$ as long as no revelation has occurred. Hence, $V(Q + \epsilon) - V(Q) < 0$, from Assumption 1. The latter proves that $V(Q)$ is indeed strictly decreasing in $Q$, which in turn yields that $Q_E(t) \in \{0, Q_{\min}\}$.

The previous lemma implies a very specific structure in the possible strategies of the entrant. In particular, at any period $t$ she either explores ($Q_E(t) = Q_{\min}$), which we denote by $e_t = 1$, or she exits ($Q_E(t) = 0$), which we denote by $e_t = 0$. Given this structure on the strategies of the entrant, the decision problem of the incumbent greatly simplifies, as the following lemma illustrates.

Lemma 15. For each time $t$, if a revelation has not occurred and the entrant is still in the market ($a_t = 1$),
\[ Q_I(t) \in \left\{ Q_o - Q_{\min},\; \frac{Q_o + P(Q_E(t) = Q_{\min})\,Q_{\min}}{2} \right\}. \]

Proof: At any period $t$, the continuation payoff of the incumbent depends only on whether or not a revelation occurs. In particular, she may either produce $Q_I(t) = Q_o - Q_{\min}$, in which case the probability of a revelation is zero, or she may produce $Q_I(t) < Q_o - Q_{\min}$, in which case the probability of a revelation, if the entrant does not exit, is equal to $\lambda$. Therefore, we can write her Bellman equation as
\[ V_I(t, 1, 1, 1) = P(Q_E(t) = 0)\,\delta M + \max\Big\{ P(Q_E(t) = 0)\,\lambda (Q_o - Q_{\min})\,\alpha Q_{\min} + \delta P(Q_E(t) = Q_{\min})\,V_I(t+1, 1, 1, 1), \]
\[ \max_{Q_I < Q_o - Q_{\min}} \{ P(Q_E(t) = 0)\,\lambda Q_I \alpha(Q_o - Q_I) + P(Q_E(t) = Q_{\min})\,\lambda Q_I \alpha(Q_o - Q_{\min} - Q_I) \} \]
\[ + (1-\lambda) P(Q_E(t) = Q_{\min})\,\delta V_I(t+1, 1, 1, 1) + \lambda P(Q_E(t) = Q_{\min})\,R_I \Big\}. \]
In the above expression the first term corresponds to the continuation payoff if the entrant exits, which does not depend on the quantity produced by the incumbent. The second term corresponds to her payoff if she decides to produce $Q_o - Q_{\min}$. In the latter case, no revelation will occur with probability one, and the only case in which she obtains a positive immediate payoff is when the entrant exits. The third term corresponds to the case where she decides to produce less than $Q_o - Q_{\min}$, allowing the possibility of a revelation. In that case, if a revelation does not occur she obtains continuation payoff equal to $\delta V_I(t+1, 1, 1, 1)$. Otherwise, she obtains continuation payoff equal to $R_I$ (independent of $Q_I$) and immediate payoff equal to
\[ \max_{Q_I < Q_o - Q_{\min}} \{ P(Q_E(t) = 0)\,\lambda Q_I \alpha(Q_o - Q_I) + P(Q_E(t) = Q_{\min})\,\lambda Q_I \alpha(Q_o - Q_{\min} - Q_I) \}. \]
The above is maximized when
\[ Q_I = \frac{Q_o + P(Q_E(t) = Q_{\min})\,Q_{\min}}{2}. \]
Note that the above is strictly smaller than $Q_o - Q_{\min}$ by Assumption 2.
We proved that the action of the entrant is either to explore (denoted from here onwards by $e_t = 1$), in which case the production quantity is $Q_{\min}$, or to exit (denoted by $e_t = 0$), in which case the production quantity is 0. Similarly, the incumbent is either obfuscating (denoted from here onwards by $i_t = 1$), in which case she produces $Q_I(t) = Q_o - Q_{\min}$, or not obfuscating (denoted from here onwards by $i_t = 0$), in which case she produces
\[ Q_I = \frac{Q_o + P(e_t = 1)\,Q_{\min}}{2} < Q_o - Q_{\min}. \]

25 Management Science 00(0), pp , c 0000 INFORMS 25 If the incumbent stops obfuscating and a revelation occurs the immediate payoff for the entrant is equal to g d E (. = αq min Q o Q ) o + P(e t = 1)Q min Q min, 2 while the immediate payoff for the incumbent is equal to ( ) ( g d. Qo + P(e t = 1)Q min I = α Q o Q ) o + P(e t = 1)Q min Q min. 2 2 If the incumbent stops obfuscating and the entrant exits, the immediate payoff for the incumbent is equal to g m I. = α ( Qo + P(e t = 1)Q min 2 ) ( Q o Q ) o + P(e t = 1)Q min. 2 Finally, if the incumbent obfuscates and the entrant exits, the immediate payoff for the incumbent is equal to g t I. = α (Q o Q min ) Q min Unique Equilibrium of the market entry game In the previous subsection, we argued that the market entry game can be simplified to a competitive exploration game between the incumbent and the entrant. The main difference between the two games is that the rewards at the time of the first arrival from the reward process depend on the strategy profile of the entrant. Note on the other hand that we did not use the fact that these rewards are constant anywhere in the analysis of Section 7. Therefore, if the set of equations F (k + 1) = c I + λ g t I(k) + δm (δm + λ gt I(k) δf (k))(c I λ( g m i (k) g t I(k))) δf (k) λ g d I (k) λr I λ( g m I (k) gt I (k)) (11) with initial condition, F (0) = M. g d I. = α g m I Moreover, we write g d E r(k) = λδf (k 1) λ gd I (k) λδr I c I λδf (k 1) λ g I d(k) λδr. I (. = αq min Q o Q ) o + (1 r(k))q min Q min, ( Qo + (1 r(k))q min 2 (. Qo + (1 r(k))q min = α 2 g t I 2 ) ( Q o Q o + (1 r(k))q min 2 ) ( Q o Q o + (1 r(k))q min 2. = α (Q o Q min ) Q min. Q min ). K 1 = inf{k {0, 1,...} : r(k) < 0} 1, (12) ).

and
\[ \bar p = 1 - (1 - p^*)(1 - \lambda p^*)^{K_1 + 1} \qquad (13) \]
have a unique solution, then they uniquely determine the outcome of the market entry game. In the next lemma we prove that this is indeed the case.

Lemma 16. The system of equations has a unique solution.

Proof: Consider the equation
\[ r = \frac{\lambda\delta F(k-1) - \lambda g^d_I(r) - \lambda\delta R_I - c_I}{\lambda\delta F(k-1) - \lambda g^d_I(r) - \lambda\delta R_I}. \]
We will prove that there is at most one solution to the above in $[0, 1]$. Indeed, consider the function
\[ g(r) = r(\lambda\delta F(k-1) - \lambda g^d_I(r) - \lambda\delta R_I) - \lambda\delta F(k-1) + \lambda g^d_I(r) + \lambda\delta R_I + c_I. \]
We will prove that $g(r)$ is strictly increasing in $[0, 1]$:
\[ g'(r) = -\lambda \frac{d}{dr} g^d_I(r)\,(r - 1) + (\lambda\delta F(k-1) - \lambda g^d_I(r) - \lambda\delta R_I). \]
To finish the proof note that $g^d_I(r)$ is strictly decreasing in $r$ for any $r > 0$.

Appendix. Uniqueness of equilibrium

In this section we prove uniqueness of the outcome at equilibrium. For the remainder of this section we assume that $(i_t, e_t, p_t)$ is an entry equilibrium of the game and establish a sequence of properties that essentially pin down the outcome as predicted by the strategies of Table 3.

Our first lemma ensures that the belief of the entrant will drop below the indifference belief $p^*$; therefore, there is a (finite) time when she exits with probability one. If the entrant stayed in the game indefinitely with positive probability, then at every time she would have to be sufficiently confident that her continuation payoff is at least as large as her exploration cost. But for her continuation payoff to be large enough, her belief that an arrival may happen in the future must also be large enough. On the other hand, if her belief is large enough but the revelation does not occur, the drop in her belief at the next time period is also large, hence the beliefs will eventually drop below $p^*$.

Lemma 17. There exists a (finite) time $\tau \ge 0$ such that $a_\tau = 0$ and $p_{\tau - 1} \ge p^*$ as long as no revelation has occurred.

Proof: For the purposes of contradiction assume that for all $t > 0$, $P(e_t = 0 \mid N_t) < 1$. This is the case when Player E is either always indifferent between exploring and exiting, or strictly prefers exploring. Therefore for all $t \in \{0, 1, \ldots\}$, $V_E(t, 1, 1) \ge 0$ and the payoff from exploring is weakly larger than the payoff of exiting, and therefore
\[ -c_E + \lambda q_t (g^d_E + \delta R_E) + (1-\lambda)V_E(t+1, 1, 1) \ge 0, \]
and
\[ q_t \ge \frac{c_E + (1-\lambda)V_E(t+1)}{g^d_E + \delta R_E} \ge p^*, \qquad (14) \]
where the last inequality follows from the fact that $V_E(t+1, 1, 1) \ge 0$. On the other hand, from Lemma 4, for all $t \in \{0, 1, \ldots\}$,
\[ p_{t+1} - p_t = -\lambda(1 - p_t)\frac{p_t P(i_t = 0 \mid \theta = 1, N_t)}{1 - \lambda p_t P(i_t = 0 \mid \theta = 1, N_t)} = -\lambda(1 - p_t)\frac{q_t}{1 - \lambda q_t}. \]
Note that $q_t/(1 - \lambda q_t)$ is increasing with respect to $q_t$ and therefore, using (14),
\[ p_{t+1} - p_t \le -(1 - p_t)\frac{\lambda p^*}{1 - \lambda p^*}. \]
Therefore,
\[ p_t \le 1 - \frac{1 - p_0}{(1 - \lambda p^*)^t}, \]
and hence at any time
\[ t > \log\left(\frac{1 - p_0}{1 - p^*}\right)(\log(1 - \lambda p^*))^{-1}, \]
the belief $p_t$ is smaller than $p^*$. Therefore $q_t \le p_t < p^*$, which contradicts (14), concluding the first part of the proof. Consider now the first time that it is strictly optimal to exit. If $p_{\tau - 1} < p^*$ then $\tau - 1$ would have been the first time to exit by Corollary 1.
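The belief bound used in this argument can be checked numerically. The sketch below iterates the no-revelation update of Lemma 4, written as $p_{t+1} = p_t - \lambda(1-p_t)\,q_t/(1-\lambda q_t)$, in the boundary case $q_t = p^*$, where the bound $p_t \le 1 - (1-p_0)/(1-\lambda p^*)^t$ holds with equality. The function names and parameter values are hypothetical and serve only as an illustration.

```python
import math


def belief_update(p, lam, q):
    """One no-revelation update: p_{t+1} = p_t - lam*(1 - p_t)*q_t / (1 - lam*q_t)."""
    return p - lam * (1.0 - p) * q / (1.0 - lam * q)


def trajectory(p0, p_star, lam, steps):
    """Belief path when q_t is held at p_star (the boundary case of the bound).

    Only the part of the path up to the first crossing of p_star is meaningful,
    since q_t <= p_t must hold along an actual equilibrium path.
    """
    ps = [p0]
    for _ in range(steps):
        ps.append(belief_update(ps[-1], lam, p_star))
    return ps


def closed_form(p0, p_star, lam, t):
    """Closed form 1 - (1 - p0) / (1 - lam*p_star)**t of the same recursion."""
    return 1.0 - (1.0 - p0) / (1.0 - lam * p_star) ** t


def crossing_bound(p0, p_star, lam):
    """Time after which the belief is below p_star."""
    return math.log((1.0 - p0) / (1.0 - p_star)) / math.log(1.0 - lam * p_star)


# Hypothetical values, for illustration only.
p0, p_star, lam = 0.6, 0.3, 0.2
ps = trajectory(p0, p_star, lam, 20)
bound = crossing_bound(p0, p_star, lam)
first_below = next(t for t, p in enumerate(ps) if p < p_star)
print(bound, first_below)
```

In this example the bound is roughly 9.04, and the simulated belief first falls below $p^*$ at $t = 10$, the first integer time exceeding the bound.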
