On a Markov Game with Incomplete Information

Johannes Hörner*, Dinah Rosenberg†, Eilon Solan‡ and Nicolas Vieille§¶

January 24, 2006

Abstract

We consider an example of a Markov game with lack of information on one side, that was first introduced by Renault (2002). We compute both the value and optimal strategies for a range of parameter values.

* MEDS Department, Kellogg School of Management, Northwestern University, and Département Finance et Economie, HEC, 1, rue de la Libération, 78351 Jouy-en-Josas, France. e-mail: j-horner@kellogg.northwestern.edu
† Laboratoire d'Analyse Géométrie et Applications, Institut Galilée, Université Paris Nord, avenue Jean-Baptiste Clément, 93430 Villetaneuse, France; and Laboratoire d'Econométrie de l'Ecole Polytechnique, 1, rue Descartes, 75005 Paris, France. e-mail: dinah@zeus.math.univ-paris13.fr
‡ MEDS Department, Kellogg School of Management, Northwestern University, and the School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel. e-mail: eilons@post.tau.ac.il
§ Département Finance et Economie, HEC, 1, rue de la Libération, 78351 Jouy-en-Josas, France. e-mail: vieille@hec.fr
¶ The research of the third author was supported by the Israel Science Foundation (grant No. 69/01-1).

1 Introduction

Zero-sum repeated games with incomplete information on one side were introduced by Aumann and Maschler (1968, 1995) in a seminal work. In such games, two players repeatedly play a zero-sum matrix game whose payoff matrix depends on a state of nature that is drawn prior to the beginning of the game. One of the players is informed about the outcome of the state of nature, while the other is not. During the game, the players' moves are observed, but not the corresponding payoffs. Such games model long-term interactions with asymmetric information. The main issue is to what extent the informed player should use his private information, and how. Aumann and Maschler prove the existence of the (uniform) value and characterize both the value and the optimal strategies of the players.

Recently, Renault (2002) extended this model to include situations in which the state of nature follows a Markov chain, with exogenous transition function, thereby allowing for situations where the private information of the informed player gets renewed at random times. For such games, Renault proved the existence of the uniform value. However, no explicit formula for the value, nor a characterization of optimal strategies, is available. Neyman (2004) provided an alternative proof of Renault's results, using a reduction to the case of repeated games with incomplete information on one side à la Aumann and Maschler. We here provide a number of results on a specific simple game due to Renault (2002).

2 Model and Main Results

2.1 The game

We first define the game. There are two possible states of nature, $s$ and $\bar s$. The payoff matrices in the two states are given by:

  state $s$:            state $\bar s$:
        L   R                 L   R
    T   1   0             T   0   0
    B   0   0             B   0   1

We denote by $g^s(i,j)$ the payoff in state $s$ induced by the action pair $(i,j)$. The actual state $s_n$ in stage $n$ follows a Markov chain with transition function $\mathbf{P}(s_{n+1} = s_n \mid s_n) = p$, where $p \in [0,1]$ is a parameter of the game. That is, irrespective of the current state, there is a probability $1-p$ that the game will move to the other state at the next stage. We denote by $\pi_1$ the probability that $s_1$ is $s$.

The game proceeds in infinitely many stages. At each stage $n$, player 1 and player 2 choose simultaneously a row $i_n \in \{T, B\}$ and a column $j_n \in \{L, R\}$. When doing so, both players are fully informed of the actions $i_1, j_1, \ldots, i_{n-1}, j_{n-1}$ that were chosen in previous stages. In addition, player 1 is fully informed of past and current states $s_1, \ldots, s_n$. We denote this game by $\Gamma_p$.

Observe that, for $p = 1/2$, the states $(s_n)$ are independent variables. In effect, the two players then face a sequence of independent, identical, one-shot games with incomplete information. At the other extreme, for $p = 1$, the current state is fixed throughout the game, and the game coincides with the leading example of Chapter 1 in Aumann and Maschler (1995).

2.2 Main results

2.2.1 Background definitions

A strategy of the (uninformed) player 2 is a function $\tau : \cup_{n\in\mathbf{N}} (I \times J)^{n-1} \to [0,1]$, where $\tau(h)$ is the probability assigned to $L$ after the sequence $h$ of actions. We describe a strategy of player 1 as a function $\sigma : \cup_{n\in\mathbf{N}} (I \times J \times S)^{n-1} \to [0,1] \times [0,1]$, with the interpretation that $\sigma$ assigns to each sequence of past actions and past states a pair of mixed moves, to be used in states $s$ and $\bar s$ respectively: that is, denoting $(x_n, y_n) = \sigma(s_1, i_1, j_1, \ldots, s_{n-1}, i_{n-1}, j_{n-1})$, $x_n$ (resp. $y_n$) is the probability assigned to $T$ if $s_n = s$ (resp. $s_n = \bar s$).

Given a strategy pair $(\sigma, \tau)$, we denote by $\mathbf{P}_{\sigma,\tau}$ the induced probability measure over the set of infinite plays, and by $\mathbf{E}_{\sigma,\tau}$ the corresponding expectation operator. For a given stage $N \in \mathbf{N}$,

$\gamma_N(\sigma,\tau) = \mathbf{E}_{\sigma,\tau}\Bigl[\frac{1}{N}\sum_{n=1}^{N} g^{s_n}(i_n, j_n)\Bigr]$

is the average payoff in the first $N$ stages. We denote by $v_N$ the value of the $N$-stage game, i.e., the value of the game with payoff function $\gamma_N$. A real number $v$ is the (uniform) value if there are strategies that (approximately) guarantee $v$ in all long games. To be specific:

Definition 1 The real number $v$ is the value of the game if, for every $\varepsilon > 0$ there is $N_0 \in \mathbf{N}$ and a pair of strategies $(\sigma^*, \tau^*)$ such that, for every $N \geq N_0$,

$\gamma_N(\sigma^*, \tau) + \varepsilon \;\geq\; v \;\geq\; \gamma_N(\sigma, \tau^*) - \varepsilon, \qquad \forall \sigma, \tau.$  (1)

Renault (2002) proves that the game has a value $v_p$ for each $p \in [0,1]$ and that, moreover, $v_p$ is independent of the initial distribution. It is easy to check that, in the present game, $\sigma^*$ and $\tau^*$ can be chosen independently of $\varepsilon$. For various values of $p$ we will exhibit strategies $\sigma^*$ and $\tau^*$ and a real number $v$ such that $\liminf_{N\to\infty} \gamma_N(\sigma^*, \tau) \geq v$ and $\limsup_{N\to\infty} \gamma_N(\sigma, \tau^*) \leq v$, for each $\sigma$ and $\tau$. Such strategies will be called optimal strategies. It can be checked that the strategies $\sigma^*$ and $\tau^*$ satisfy the uniformity condition (1), but we will leave this issue out of the present note.

2.2.2 Main results

We first introduce strategies for both players, that will turn out to be optimal strategies for a range of parameter values. We start with player 1, and let $\pi \geq 1/2$ denote the probability assigned to state $s$. It is readily checked that the mixed action of player 1 that assigns probability $(1-\pi)/\pi$ to $T$ in state $s$, and probability 1 to $B$ in state $\bar s$, is an optimal strategy of player 1 in the one-shot game with incomplete information in which player 1 is informed of the state of nature, while player 2 is not. (Indeed, against this mixed action, $L$ yields $\pi \cdot (1-\pi)/\pi = 1-\pi$ and $R$ yields $(1-\pi) \cdot 1 = 1-\pi$, so player 2 is indifferent.) For $\pi \leq 1/2$, by symmetry, we define the mixed action to assign probability 1 to $T$ in state $s$, and probability $\pi/(1-\pi)$ to $B$ in state $\bar s$. If $\pi < 1/2$, it is not the unique optimal strategy. However, it is the only one that renders player 2 indifferent between playing $L$ or $R$.

We define $\sigma^*$ as the strategy that plays this mixed move at $\pi_n$ in stage $n$, where $\pi_n$ is the conditional probability that $s_n = s$, given past actions of player 1. In other words, $\pi_n$ is the belief held by player 2 in stage $n$, when facing the strategy $\sigma^*$.² To be specific, letting $\sigma_{s,n}(i)$ (resp. $\sigma_{\bar s,n}(i)$) be the probability assigned by $\sigma^*$ to move $i$ in state $s$ (resp. $\bar s$) at stage $n$, one has, by Bayes' rule,

$\pi_{n+1} = p\,\frac{\sigma_{s,n}(i)\,\pi_n}{\sigma_{s,n}(i)\,\pi_n + \sigma_{\bar s,n}(i)\,(1-\pi_n)} + (1-p)\,\frac{\sigma_{\bar s,n}(i)\,(1-\pi_n)}{\sigma_{s,n}(i)\,\pi_n + \sigma_{\bar s,n}(i)\,(1-\pi_n)}.$

The first fraction is the conditional probability that $s_n = s$, given that player 1 just played $i$. In that event, there is probability $p$ that $s_{n+1} = s$. The second fraction is the conditional probability that $s_n = \bar s$, given that player 1 just played $i$. In that event, there is probability $1-p$ that $s_{n+1} = s$.

In effect, $\sigma^*$ plays optimally in a myopic way. It maximizes the current payoff under the constraint that player 2 be indifferent between $L$ and $R$, not taking into account the information on the current state that is transmitted to player 2.

We now describe a strategy $\tau^*$ of player 2. It is an automaton with two states, labelled $\bar 0$ and $\bar 1$. Transitions are given by:

$(\bar 0, T) \to \bar 1;\quad (\bar 1, T) \to \bar 1;\quad (\bar 0, B) \to \bar 0;\quad (\bar 1, B) \to \bar 0.$

That is, the automaton moves to $\bar 0$ (resp. to $\bar 1$) whenever $B$ (resp. $T$) is played. In state $\bar 0$ (resp. in state $\bar 1$), the automaton plays $L$ (resp. $R$) with probability $\frac{2p}{4p-1}$. Graphically, $\tau^*$ looks as follows.

[Figure 1: The strategy $\tau^*$ of player 2. Below each state appears its name; above each state appears the probability of playing $L$ at that state, namely $\frac{2p}{4p-1}$ at $\bar 0$ and $\frac{2p-1}{4p-1}$ at $\bar 1$. Transitions are indicated by arrows.]

² There is no circularity here, since the computation of $\pi_n$ involves only the strategy of player 1 in the first $n-1$ stages.
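As a quick illustration of this updating rule, here is a small numerical sketch (ours, not part of the original paper; the function names are our own, and we use the convention $\pi_n = \mathbf{P}(s_n = s)$):

```python
# Sketch (ours): Bayes update of player 2's belief under sigma*, as in the text.

def sigma_star(pi):
    """Mixed move (x, y) = (P(T | state s), P(T | state s-bar)) at belief pi."""
    if pi >= 0.5:
        return (1 - pi) / pi, 0.0       # T w.p. (1-pi)/pi in s, B surely in s-bar
    return 1.0, 1.0 - pi / (1 - pi)     # T surely in s, B w.p. pi/(1-pi) in s-bar

def update_belief(pi, action, p):
    """Posterior P(s_{n+1} = s) after observing player 1's action, then one
    chain transition. The observed action is assumed to have positive probability."""
    x, y = sigma_star(pi)
    if action == "T":
        pos = x * pi / (x * pi + y * (1 - pi))                     # P(s_n = s | T)
    else:
        pos = (1 - x) * pi / ((1 - x) * pi + (1 - y) * (1 - pi))   # P(s_n = s | B)
    return p * pos + (1 - p) * (1 - pos)    # the chain switches with prob. 1-p

# Example: after T the belief jumps to p; after B it drops (cf. Section 5.1,
# where it is shown that beliefs then live in [1-p, p]).
p = 0.6
print(update_belief(0.55, "T", p))   # -> 0.6
print(update_belief(0.55, "B", p))   # -> p - (2p-1)(1-0.55)/0.55 = 0.436...
```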

We now summarize our main results.

Theorem 2 The following results hold:

1. One has $v_p = v_{1-p}$ for each $p \in [0,1]$;

2. $v_p = \frac{p}{4p-1}$ for $p \in [1/2, 2/3]$, and $v_p \leq \frac{p}{4p-1}$ for $p \geq 2/3$;

3. Let $\bar p$ be the unique real solution of $9p^3 - 13p^2 + 6p - 1 = 0$ ($\bar p \approx 0.7589$). One has $v_{\bar p} = \frac{\bar p}{1 - 3\bar p + 6\bar p^2} \approx 0.348291466$;

4. The strategy $\sigma^*$ is optimal for all $p \in [1/2, 2/3]$;

5. The strategy $\tau^*$ is optimal for all $p \in [1/2, 2/3]$.

Actually, the second statement provides an upper bound on $v_p$. In Section 6, we will provide a lower bound as well. Independently of us, and using different tools, Marino (2005) proved that $v_p = \frac{p}{4p-1}$ for $p \in [1/2, 2/3]$.

The proof is organized as follows. We start below by establishing the first claim in Theorem 2. In Section 3, we recall the so-called Average Cost Optimality Equation from the theory of MDPs, and we derive a number of implications for the game under study. In Section 4, we derive the properties of $\tau^*$. In Section 5, we prove the claims relative to player 1. Section 6 provides a lower bound on $v_p$.

2.3 $v_p = v_{1-p}$

Here we argue that the symmetries in the game imply $v_p = v_{1-p}$ for every $p \in [0,1]$. Thus, it is enough to study $v_p$ for $p \in [1/2, 1]$. Given a strategy $\sigma$ of player 1, we define a mirrored strategy $\tilde\sigma$, obtained by mirroring actions and states at even stages. That is, given a finite history $h$, we first construct $\tilde h$ by changing, at all even stages, every appearance of $T$ (resp. $B$, $L$, $R$, $s$, $\bar s$) into $B$ (resp. $T$, $R$, $L$, $\bar s$, $s$), and we define $\tilde\sigma(h)$ to be $\sigma(\tilde h)$.

Given a strategy $\tau$ of player 2, we define its mirrored version $\tilde\tau$ in a similar way. It is immediate to check that the average payoff induced by $(\tilde\sigma, \tilde\tau)$ in the game $\Gamma_{1-p}$ is equal to the average payoff induced by $(\sigma, \tau)$ in $\Gamma_p$. This readily implies our claim.

3 The Average Cost Optimality Equation

The following proposition is a version of the well-known Average Cost Optimality Equation (ACOE) for general Markov Decision Problems (MDPs). For a proof, see e.g. Feinberg and Shwartz (2002).

Proposition 3 Let $(S, A, r, q)$ be an MDP with a compact metric space $S$, compact action set $A$, continuous payoff function $r : S \times A \to \mathbf{R}$, and transition rule $q : S \times A \to \Delta(S)$ such that $q(\cdot \mid s, a)$ has finite support for every $(s,a) \in S \times A$. If there is a bounded function $V : S \to \mathbf{R}$ such that

$v + V(s) = \max_{a \in A}\Bigl(r(s,a) + \sum_{s' \in S} q(s' \mid s, a)\,V(s')\Bigr)$ for each $s \in S$,  (2)

then $v$ is the value of the MDP, for each initial state $s \in S$. Moreover, a stationary strategy $\sigma = (\sigma(s))$ is optimal as soon as, for every $s \in S$, $\sigma(s)$ attains the maximum in (2). On the other hand, if, for some $s_0 \in S$, the sequence $(v_n(\cdot) - v_n(s_0))_n$ has a point-wise limit $V$, and if the value $v$ is independent of the initial state, then

$v + V(s) = \sup_{a \in A}\Bigl(r(s,a) + \sum_{s' \in S} q(s' \mid s, a)\,V(s')\Bigr)$ for each $s \in S$.  (3)

In this proposition, $\Delta(S)$ stands for the set of probability distributions over $S$. The value $v$ is the supremum, over all policies $\sigma$, of the payoff function $\gamma(\sigma) := \liminf_{N} \mathbf{E}_\sigma\bigl[\frac{1}{N}\sum_{n=1}^{N} r(s_n, a_n)\bigr]$. Observe that the function $V$ is determined up to an additive constant.

We will apply this result in two ways. Let first a strategy $\tau$ of player 2 be given, that can be implemented by a finite automaton with state space $\Omega$ and deterministic transitions. The best-reply problem of player 1 when facing $\tau$ reduces to a Markov decision problem with state space $\{s, \bar s\} \times \Omega$.

Indeed, in any stage, player 1 is able to infer from the past history the current state of player 2's automaton, and he knows in addition the current state of nature. We will use this remark to prove that $\tau^*$ guarantees $\frac{p}{4p-1}$ in the game $\Gamma_p$.

Proposition 3 can also be used to analyze player 1's optimal behavior in the game $\Gamma_p$. By Renault (2002), player 1 has an optimal strategy $\sigma$ that does not depend on player 2's actions. Facing such a strategy, at any stage player 2 will simply compute her posterior belief, and play the action that yields the minimal current payoff, given her posterior belief and the mixed moves $x$ and $y$ used by player 1 in states $s$ and $\bar s$. Hence, the sup inf of $\Gamma_p$ can be computed under the assumption that player 2 always plays a best reply to player 1's current mixed move, given player 2's posterior belief over the current state. In other words, the value of the game coincides with the value of the following auxiliary Markov Decision Problem, both for the finite and the infinite horizon versions: the current state $\pi_n \in [0,1]$ corresponds to player 2's current posterior belief; the decision maker chooses a pair $(x,y)$ of mixed moves; the current payoff is $\min\{x\pi_n, (1-y)(1-\pi_n)\}$; with probability $x\pi_n + y(1-\pi_n)$ (resp. $(1-x)\pi_n + (1-y)(1-\pi_n)$) the decision maker plays $T$ (resp. $B$), and the next state is the conditional distribution $\pi_{n+1}$ of $s_{n+1}$, given the action played by the decision maker.

Moreover, the value $v_n$ of the $n$-stage version of the above MDP coincides with the value of the $n$-stage version of $\Gamma_p$, and it can be checked that the sequence $(v_n(\cdot) - v_n(\pi_0))_n$ has a point-wise limit $V$, for each choice of the initial state $\pi_0$. It is known that the value $v_n$ of the $n$-stage version of $\Gamma_p$ is concave w.r.t. the initial distribution. Thus, $V$ is also concave, hence continuous, on $(0,1)$. Therefore, for each $\pi \in (0,1)$, one has

$v_p + V(\pi) = \max_{x,y\in[0,1]} \Bigl\{ \min\bigl(x\pi,\,(1-y)(1-\pi)\bigr) + \bigl(x\pi + (1-\pi)y\bigr)\,V\Bigl(p\,\tfrac{x\pi}{x\pi + (1-\pi)y} + (1-p)\,\tfrac{(1-\pi)y}{x\pi + (1-\pi)y}\Bigr) + \bigl((1-x)\pi + (1-\pi)(1-y)\bigr)\,V\Bigl(p\,\tfrac{(1-x)\pi}{(1-x)\pi + (1-\pi)(1-y)} + (1-p)\,\tfrac{(1-\pi)(1-y)}{(1-x)\pi + (1-\pi)(1-y)}\Bigr) \Bigr\}.$

The first term is the current payoff. In the second term, $x\pi + (1-\pi)y$ is the probability that $T$ will be played, while the argument of $V$ is the induced posterior belief. We will denote by $W_{x,y}$ the term between braces in the right-hand side.

Lemma 4 There is an optimal strategy such that player 2 is always indifferent between $L$ and $R$.

Proof. We will prove that, for each $\pi$, the maximum of $W_{x,y}$ is achieved for some $x^*, y^* \in [0,1]$ such that $x^*\pi = (1-y^*)(1-\pi)$. W.l.o.g., we assume that $\pi \geq 1/2$. Let $x, y \in [0,1]$ be a mixed action pair such that $x\pi \neq (1-y)(1-\pi)$. We will exhibit a pair $x^*, y^*$, with $x^*\pi = (1-y^*)(1-\pi)$ and $W_{x^*,y^*} \geq W_{x,y}$. We will treat the case in which $x\pi > (1-y)(1-\pi)$. The discussion of the other case is similar and therefore omitted.

Assume first that $y \geq x$. Set $x^* = y^* = 1-\pi$, so that $x^*\pi = (1-y^*)(1-\pi)$. Observe that

$W_{x^*,y^*} = \pi(1-\pi) + V\bigl(p\pi + (1-p)(1-\pi)\bigr),$

since with $x^* = y^*$ the action played is uninformative, and the posterior equals $p\pi + (1-p)(1-\pi)$ after either action. Note that $\min\{x^*\pi, (1-y^*)(1-\pi)\} > \min\{x\pi, (1-y)(1-\pi)\}$. By concavity of $V$, it therefore follows that $W_{x^*,y^*} > W_{x,y}$.

Assume now that $y < x$. Set $y^* = y$ and decrease $x$ to the $x^*$ that satisfies $x^*\pi = (1-y^*)(1-\pi)$. The current payoff is not affected: $\min\{x^*\pi, (1-y^*)(1-\pi)\} = (1-y)(1-\pi)$. On the other hand, less information about the current state is transmitted when using $(x^*, y^*)$ than when using $(x,y)$. By concavity of $V$, this will imply that $W_{x^*,y^*} \geq W_{x,y}$. To be specific, the sum of the last two terms in $W_{x,y}$ is a convex combination $a_{x,y}V(\pi^T_{x,y}) + (1-a_{x,y})V(\pi^B_{x,y})$, with $a_{x,y}\pi^T_{x,y} + (1-a_{x,y})\pi^B_{x,y} = p\pi + (1-p)(1-\pi)$, and a similar observation holds for $(x^*, y^*)$. It is readily verified that $\pi^B_{x,y} < \pi^B_{x^*,y^*} \leq \pi^T_{x^*,y^*} < \pi^T_{x,y}$. The claim follows. ∎

By the previous lemma, it follows that the function $V : [0,1] \to \mathbf{R}$ satisfies $V(\pi) = V(1-\pi)$ for each $\pi$, together with the functional equation below, for each $\pi \leq 1/2$:

$v_p + V(\pi) = \max_{x\in[0,1]} \Bigl[\, x\pi + (1-\pi)\,V\Bigl(1 - p + (2p-1)\frac{x\pi}{1-\pi}\Bigr) + \pi\,V\bigl(p - (2p-1)x\bigr) \Bigr].$  (4)

Besides, any function $x(\pi)$ that achieves the maximum in (4) for each $\pi$ yields an optimal stationary strategy for player 1. As an illustration, we provide below the graphs of $V$ for three different values of $p$, obtained by numerical simulation.

[Figure 2: graphs of $V$ for three different values of $p$, obtained numerically.]

4 Player 2

We here prove the results relative to player 2.

Lemma 5 The strategy $\tau^*$ of player 2 guarantees $\frac{p}{4p-1}$ for all $p \geq 1/2$.

Proof. As player 1 knows the state of the automaton of player 2 and the true state of the game, in effect he faces an MDP with four states: $S = \{(s,\bar 0),\, (s,\bar 1),\, (\bar s,\bar 0),\, (\bar s,\bar 1)\}$. In each one of these four states he has two available actions, $T$ and $B$, and the payoff is:

$r((s,\bar 0), T) = \frac{2p}{4p-1}$,  $r((s,\bar 0), B) = 0$,
$r((\bar s,\bar 0), T) = 0$,  $r((\bar s,\bar 0), B) = \frac{2p-1}{4p-1}$,
$r((s,\bar 1), T) = \frac{2p-1}{4p-1}$,  $r((s,\bar 1), B) = 0$,
$r((\bar s,\bar 1), T) = 0$,  $r((\bar s,\bar 1), B) = \frac{2p}{4p-1}$.

By the ACOE, $v = \frac{p}{4p-1}$ is the value if and only if there is a function $V : S \to \mathbf{R}$ that satisfies:

$v + V(s,\bar 0) = \max\bigl\{\tfrac{2p}{4p-1} + pV(s,\bar 1) + (1-p)V(\bar s,\bar 1),\; pV(s,\bar 0) + (1-p)V(\bar s,\bar 0)\bigr\}$,  (5)
$v + V(\bar s,\bar 0) = \max\bigl\{pV(\bar s,\bar 1) + (1-p)V(s,\bar 1),\; \tfrac{2p-1}{4p-1} + pV(\bar s,\bar 0) + (1-p)V(s,\bar 0)\bigr\}$,  (6)
$v + V(s,\bar 1) = \max\bigl\{\tfrac{2p-1}{4p-1} + pV(s,\bar 1) + (1-p)V(\bar s,\bar 1),\; pV(s,\bar 0) + (1-p)V(\bar s,\bar 0)\bigr\}$,  (7)
$v + V(\bar s,\bar 1) = \max\bigl\{pV(\bar s,\bar 1) + (1-p)V(s,\bar 1),\; \tfrac{2p}{4p-1} + pV(\bar s,\bar 0) + (1-p)V(s,\bar 0)\bigr\}$.  (8)

These equations are symmetric. One can verify that they imply $V(s,\bar 0) = V(\bar s,\bar 1)$ and $V(s,\bar 1) = V(\bar s,\bar 0)$, so that Eqs. (7) and (8) can be omitted.

One can now verify that the following is a solution to these equations:

$v = \frac{p}{4p-1};\qquad V(s,\bar 0) = V(\bar s,\bar 1) = 0;\qquad V(s,\bar 1) = V(\bar s,\bar 0) = -\frac{1}{4p-1}.$ ∎
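This solution can be double-checked mechanically. The following sketch (ours, not from the paper; all names are our own) evaluates both sides of (5)-(8) at the candidate solution:

```python
# Sketch (ours): check that v = p/(4p-1), V(s,0)=V(sbar,1)=0, V(s,1)=V(sbar,0)=-1/(4p-1)
# satisfies the ACOE (5)-(8) for the 4-state MDP faced by player 1 against tau*.

def check(p, tol=1e-12):
    v = p / (4*p - 1)
    a = -1 / (4*p - 1)                          # V(s,1) = V(sbar,0)
    V = {("s", 0): 0.0, ("s", 1): a, ("sb", 0): a, ("sb", 1): 0.0}
    q = {0: 2*p/(4*p - 1), 1: (2*p - 1)/(4*p - 1)}   # P(L) in automaton states 0, 1

    def r(state, w, act):                       # stage payoff against the automaton
        if state == "s" and act == "T":  return q[w]
        if state == "sb" and act == "B": return 1 - q[w]
        return 0.0

    def cont(state, act):                       # T sends the automaton to 1, B to 0
        w2 = 1 if act == "T" else 0
        other = "sb" if state == "s" else "s"
        return p * V[(state, w2)] + (1 - p) * V[(other, w2)]

    for st in ("s", "sb"):
        for w in (0, 1):
            best = max(r(st, w, act) + cont(st, act) for act in ("T", "B"))
            assert abs(v + V[(st, w)] - best) < tol, (st, w)
    return v

for p in (0.5, 0.55, 0.6, 2/3, 0.75):
    print(p, check(p))                          # prints p/(4p-1) for each p
```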

5 Player 1

We here prove the results relative to player 1. We will first prove that $\sigma^*$ guarantees $\frac{p}{4p-1}$ whenever $p \in [1/2, 2/3]$. Together with the results of the previous section, this yields $v_p = \frac{p}{4p-1}$, and implies that $\sigma^*$ is indeed optimal for that range of values of $p$. Next, we analyze the case where $p = \bar p \approx 0.7589$. We will prove that $\sigma^*$ is optimal, and we also exhibit an optimal strategy for player 2. Finally, we prove that the range of values of $p$ for which $\sigma^*$ is an optimal strategy of $\Gamma_p$ is an interval, thereby completing the proof of Theorem 2.

5.1 The case $p \in [1/2, 2/3]$

We here prove that $\sigma^*$ is optimal in $\Gamma_p$, for all $p \in [1/2, 2/3]$. Under $\sigma^*$, when the posterior probability of $s$ is $p$, player 1 plays as follows: in state $s$ he plays $T$ with probability $\frac{1-p}{p}$ and $B$ with probability $\frac{2p-1}{p}$, while in state $\bar s$ he plays $B$ with probability 1. Observe that the probability that the action $B$ is played is then $p$, so that the probability that the action $T$ is played is $1-p$.

Lemma 6 The strategy $\sigma^*$ guarantees $\frac{p}{4p-1}$, for each $p \in [1/2, 2/3]$.

Proof. Recall the optimality equation (4). In order to prove that the strategy $\sigma^*$ yields $\frac{p}{4p-1}$, we need to find a function $V$, symmetric around $\pi = 1/2$, such that the equality

$\frac{p}{4p-1} + V(\pi) = \pi + (1-\pi)\,V\Bigl(1-p+(2p-1)\frac{\pi}{1-\pi}\Bigr) + \pi\,V(1-p)$

holds for each $\pi \leq 1/2$. Observe that one has $1-p+(2p-1)\frac{\pi}{1-\pi} \geq 1/2$ whenever $p \in [1/2, 2/3]$: under $\sigma^*$, the posterior probability is either $p$, $1-p$, or jumps from one half of the posterior space $[1-p, p]$ to the other. Therefore, we need only find a function $V : [1-p, 1/2] \to \mathbf{R}$ such that

$\frac{p}{4p-1} + V(\pi) = \pi + (1-\pi)\,V\Bigl(p-(2p-1)\frac{\pi}{1-\pi}\Bigr) + \pi\,V(1-p).$

One can verify that the function $V(\pi) = \frac{\pi-1}{4p-1}$ is a solution. The result follows. ∎
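As a sanity check (ours, not part of the paper), the displayed functional equation can be verified numerically on a grid of beliefs:

```python
# Sketch (ours): check that V(pi) = (pi - 1)/(4p - 1) solves the equation of Lemma 6
# on [1-p, 1/2], for p in [1/2, 2/3]; posteriors above 1/2 are folded by symmetry.

def V(pi, p):                        # symmetric extension: V(pi) = V(1 - pi)
    pi = min(pi, 1 - pi)
    return (pi - 1) / (4*p - 1)

def residual(pi, p):
    v = p / (4*p - 1)
    lhs = v + V(pi, p)
    rhs = pi + (1 - pi) * V(1 - p + (2*p - 1) * pi / (1 - pi), p) + pi * V(1 - p, p)
    return lhs - rhs

p = 0.62
for pi in [1 - p + k * (0.5 - (1 - p)) / 10 for k in range(11)]:
    assert abs(residual(pi, p)) < 1e-12
print("functional equation holds on [1-p, 1/2] for p =", p)
```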

=2, such that the equality + V () = + ( )V 4 + (2 ) + V ( ) holds for each =2. Observe that one has + (2 ) =2, whenever 2 [=2; 2=3]: under, the osterior robability is either,, or jums from one half of the osterior sace [ ; ] to the other. Therefore, we need only nd a function V : [ ; =2]! R, such that + V () = + ( )V (2 ) + V ( ): 4 One can verify that the function V () = follows. 5.2 The case = w :7589 4 is a solution. The result We here analyze the case where =, the unique real solution of the equation 9 3 3 2 + 6 = : We show that v = w :34829 by showing that it is an equilibrium ayo. Since the game is zero-sum, the equilibrium consists of otimal 3+6 2 strategies. We are going to show that the following air of strategies is an equilibrium: For layer we take the strategy that is de ned in Section 2.2. For layer 2 we take the following strategy that can be imlemented by the following automaton with four states,,, and. 2 2 B B B B - x - x - x - x T T T T Figure 3: The strategy of layer 2 2

In Figure 3, below each state appears its name, and above it appears the probability that the action $L$ is played at that state. Whenever player 1 plays $B$, the posterior belief of player 2 evolves as follows: (i) from $\bar p$ it moves to $\lambda$, (ii) from $\lambda$ it moves to $1-\lambda$, whereas (iii) from either $1-\lambda$ or $1-\bar p$, it moves to $1-\bar p$. By symmetry arguments, this implies that the transitions of the automaton of player 2 mimic the evolution of his posterior belief, provided player 1 follows $\sigma^*$.

When player 1 follows $\sigma^*$, player 2 is always indifferent between playing $L$ or $R$. Since $\sigma^*$ is independent of player 2's moves, the payoff is independent of player 2's strategy. In particular, player 2's automaton is a best reply to $\sigma^*$.

We now turn to the optimization problem of player 1. In effect he faces an MDP with 8 states. Each state is composed of the actual state of nature (2 alternatives) and the state of the automaton of player 2 (4 alternatives). Label the automaton states $\omega_1, \omega_2, \omega_3, \omega_4$ (corresponding to the beliefs $\bar p$, $\lambda$, $1-\lambda$, $1-\bar p$), let $x_i$ denote the probability of $L$ at $\omega_i$ (with $x_3 = 1-x_2$ and $x_4 = 1-x_1$, by symmetry), and write $p$ for $\bar p$. For $\sigma^*$ to be a best response against the strategy of player 2, and $v$ the value, the ACOE implies that there should be a function $V$ that satisfies the following (equality between the two actions at the states where $\sigma^*$ mixes, and weak preference for $T$ at the others):

$v + V(s,\omega_1) = x_1 + pV(s,\omega_1) + (1-p)V(\bar s,\omega_1) = pV(s,\omega_2) + (1-p)V(\bar s,\omega_2),$
$v + V(s,\omega_2) = x_2 + pV(s,\omega_1) + (1-p)V(\bar s,\omega_1) = pV(s,\omega_3) + (1-p)V(\bar s,\omega_3),$
$v + V(s,\omega_3) = x_3 + pV(s,\omega_2) + (1-p)V(\bar s,\omega_2) \geq pV(s,\omega_4) + (1-p)V(\bar s,\omega_4),$
$v + V(s,\omega_4) = x_4 + pV(s,\omega_3) + (1-p)V(\bar s,\omega_3) \geq pV(s,\omega_4) + (1-p)V(\bar s,\omega_4),$

together with the symmetric conditions for the states $(\bar s, \omega_i)$, which hold automatically given $V(\bar s,\omega_i) = V(s,\omega_{5-i})$. Since $V$ is determined up to an additive constant, we can set $V(s,\omega_1) = 0$, and then this system contains 6 equations in 6 variables.
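This linear system is easy to solve numerically (a sketch of ours, assuming numpy is available; the ordering of the unknowns is arbitrary):

```python
# Sketch (ours): solve the 6 linear equations for (v, the V's, and the two
# L-probabilities), normalizing V(s, omega_1) = 0, writing a_i = V(s, omega_i),
# and using the symmetry V(sbar, omega_i) = V(s, omega_{5-i}).
import numpy as np

p = 0.7589141  # p-bar, root of 9p^3 - 13p^2 + 6p - 1 = 0
A = np.array([
    #  v,    a2,     a3,     a4,     x1,   x2
    [1.0,  0.0,    0.0,   -(1-p), -1.0,  0.0],  # (s,w1), T: indifference
    [1.0, -p,     -(1-p),  0.0,    0.0,  0.0],  # (s,w1), B
    [1.0,  1.0,    0.0,   -(1-p),  0.0, -1.0],  # (s,w2), T: indifference
    [1.0,  p,     -p,      0.0,    0.0,  0.0],  # (s,w2), B
    [1.0, -p,      p,      0.0,    0.0,  1.0],  # (s,w3), T optimal: payoff 1-x2
    [1.0, -(1-p), -p,      1.0,    1.0,  0.0],  # (s,w4), T optimal: payoff 1-x1
])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
v, a2, a3, a4, x1, x2 = np.linalg.solve(A, b)
print(v)             # ~0.348291 = p/(1 - 3p + 6p^2)
print(1-x1, 1-x2)    # ~0.934232, 0.696583: the two probabilities listed below
print(a2, a3, a4)    # ~0.237650, 0.696583, 1.171883
```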

The unique solution is

$v = \frac{\bar p}{1 - 3\bar p + 6\bar p^2} \approx 0.348291466,$
$x_4 = \frac{4\bar p - 1}{1 - 3\bar p + 6\bar p^2} \approx 0.93423229,$
$x_3 = \frac{2\bar p}{1 - 3\bar p + 6\bar p^2} \approx 0.696582932,$
$V(s,\omega_4) = \frac{2(3\bar p - 1)}{1 - 3\bar p + 6\bar p^2} \approx 1.17188327,$
$V(s,\omega_3) = \frac{2\bar p}{1 - 3\bar p + 6\bar p^2} \approx 0.696582932,$
$V(s,\omega_2) = \frac{2\bar p - 1}{1 - 3\bar p + 6\bar p^2} \approx 0.23764997,$
$V(s,\omega_1) = 0.$

One can verify that the incentive constraints (the two inequalities above) are satisfied in this case. ∎

6 A lower bound on $v_p$

Item 2 in Theorem 2 provides an upper bound on $v_p$ for $p \geq 2/3$. We here illustrate how to compute a lower bound on $v_p$, and numerically investigate the tightness of these bounds. We will obtain a lower bound by computing the payoff $\gamma(p)$ induced by $\sigma^*$. By the ACOE, $\gamma$ is the unique real number such that there is a function $V$ (symmetric around $\pi = 1/2$) that satisfies

$\gamma + V(\pi) = \pi + (1-\pi)\,V\Bigl(1-p+(2p-1)\frac{\pi}{1-\pi}\Bigr) + \pi\,V(1-p)$ for $\pi \leq 1/2$,
$\gamma + V(\pi) = (1-\pi) + \pi\,V\Bigl(p-(2p-1)\frac{1-\pi}{\pi}\Bigr) + (1-\pi)\,V(p)$ for $\pi \geq 1/2$.

As $V$ can be determined up to an additive constant, set $V(1-p) = 0$. Set $\pi_1 = 1-p$, and, for each $k \geq 1$,

$\pi_{k+1} = \min\Bigl\{\,1-p+(2p-1)\frac{\pi_k}{1-\pi_k},\;\; p-(2p-1)\frac{\pi_k}{1-\pi_k}\,\Bigr\}.$

Then $\pi_k \in [1-p, 1/2]$, and

$\gamma + V(\pi_k) = \pi_k + (1-\pi_k)\,V(\pi_{k+1}) + \pi_k\,V(1-p).$

Using $V(1-p) = 0$, and given the boundedness of $V$, it follows that

$\gamma = \frac{\pi_1 + (1-\pi_1)\pi_2 + (1-\pi_1)(1-\pi_2)\pi_3 + \cdots}{1 + (1-\pi_1) + (1-\pi_1)(1-\pi_2) + \cdots}.$

It is convenient to introduce the sequence $(u_n)$, with $u_n = 1-\pi_n$ (so that $u_1 = p$); $(u_n)$ satisfies the recursive equation

$u_{n+1} = \max\Bigl\{\,(3p-1) - \frac{2p-1}{u_n},\;\; (2-3p) + \frac{2p-1}{u_n}\,\Bigr\}.$

With these notations, the payoff is

$\gamma = \frac{(1-u_1) + u_1(1-u_2) + u_1u_2(1-u_3) + \cdots}{1 + u_1 + u_1u_2 + u_1u_2u_3 + \cdots}.$

This last formula provides a method of computing $\gamma(p)$ for any $p$.
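Concretely (our sketch, not part of the paper), both series may be truncated after enough terms, since the products $u_1\cdots u_k$ decay geometrically:

```python
# Sketch (ours): compute the lower bound gamma(p) from the folded posterior
# sequence (u_n), truncating the series; u_{n+1} = max(phi(u_n), 1 - phi(u_n)).

def gamma(p, n_terms=200):
    phi = lambda u: (3*p - 1) - (2*p - 1) / u
    u, prod = p, 1.0                 # u_1 = p; prod holds u_1 * ... * u_{k-1}
    num, den = 0.0, 1.0
    for _ in range(n_terms):
        num += prod * (1 - u)        # numerator term: u_1...u_{k-1} (1 - u_k)
        prod *= u
        den += prod                  # denominator term: u_1...u_k
        u = max(phi(u), 1 - phi(u))
    return num / den

print(gamma(0.7589141))   # ~0.348291 (equals v at p-bar)
print(gamma(0.8583))      # ~0.288
print(gamma(0.9073))      # ~0.246
```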

We do not know how to solve explicitly for the sequence $(u_n)$ in general, and no simple explicit formula seems to hold. For some values of $p$, however, a closed formula can be obtained. Observe indeed that the recurrence equation on $(u_n)$ writes

$u_{n+1} = \max\{\varphi(u_n),\, 1-\varphi(u_n)\}$, where $\varphi(u) := (3p-1) - \frac{2p-1}{u}$ is increasing.

Let $u^*$ be the solution of $u = 1 - \varphi(u)$, that is, $u^* = \frac{2-3p+\sqrt{p(9p-4)}}{2}$. It is immediate to check that $u^* \geq 1/2$, hence $\varphi(u^*) \leq u^*$, for otherwise the inequality $2u^* < \varphi(u^*) + (1-\varphi(u^*)) = 1$ would hold. In particular, if $u_N = u^*$ for some $N$, then the sequence $(u_n)$ is stationary from that stage on.

Next, consider the sequence $(w_n)$ defined by $w_1 = p = u_1$ and

$w_{n+1} = (3p-1) - \frac{2p-1}{w_n}.$  (10)

We claim that if $w_N = u^*$ (and $w_n \neq u^*$ for $n < N$), then $u_n = w_n$ for each $n < N$ and $u_n = u^*$ for each $n \geq N$. To prove this claim, it is enough to check that $\varphi(w_n) \geq 1 - \varphi(w_n)$ for $n = 1, 2, \ldots, N-1$. To see why this holds, observe first that the sequence $(w_n)$ is decreasing. We argue by contradiction, and assume that $w_{k+1} = \varphi(w_k) < 1 - \varphi(w_k)$ for some $k < N$. Since $w_k > w_N$ and $\varphi$ is increasing, this yields $w_N \leq \varphi(w_k) < 1 - \varphi(w_k) \leq 1 - \varphi(w_N) = 1 - \varphi(u^*) \leq u^* = w_N$, a contradiction.

As a result, the computation of $\gamma(p)$ is easy for those values of $p$ with the property that $w_n = u^*$ for some $n$. The solution of (10) is

$w_n = \sqrt{2p-1}\,\frac{\cos((n+1)\theta - \delta)}{\cos(n\theta - \delta)},$  (11)

where $\tan\theta = \frac{\sqrt{(1-p)(9p-5)}}{3p-1}$ and $\tan\delta = \frac{3(1-p)}{\sqrt{(1-p)(9p-5)}}$.

Using standard manipulations, the following appears. Given $N$, there exists a unique $p$ such that $N$ is the smallest integer for which $u_N = u^*$; this $p$ solves the equation

$\tan(N\theta - \delta) = \frac{3(2p-1) - \sqrt{p(9p-4)}}{\sqrt{(1-p)(9p-5)}},$

which reduces to a polynomial equation in $p$. For instance, (i) $p \approx 0.7589$ for $N = 2$, which is the special case already studied, (ii) $p \approx 0.8583$ for $N = 3$, (iii) $p \approx 0.9073$ for $N = 4$, (iv) $p \approx 0.9348$ for $N = 5$, etc.

From the expression of $w_n$, one deduces

$w_1 w_2 \cdots w_n = (2p-1)^{n/2}\,\frac{\cos((n+1)\theta - \delta)}{\cos(\theta - \delta)}.$

Since $u_n = u^*$ for $n \geq N$, and since the numerator of the expression for $\gamma$ sums to 1 (the products $u_1\cdots u_k$ tend to 0), it follows that

$\gamma = \Bigl(1 + \sum_{k=1}^{N-1} w_1\cdots w_k + \frac{w_1\cdots w_N}{1-u^*}\Bigr)^{-1},$

an explicit, if cumbersome, expression. For instance, $\gamma \approx 0.288$ for $p \approx 0.8583$, and $\gamma \approx 0.246$ for $p \approx 0.9073$ (more precisely, in each case $\gamma$ and $p$ are algebraic numbers, roots of explicit quintic polynomials with integer coefficients). It is readily verified that the equation $\gamma(p) = \frac{p}{1-3p+6p^2}$, valid for $\bar p \approx 0.7589$, is not valid on any open interval.
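The values of $p$ listed in (i)-(iv) can be recovered numerically (our sketch) by bisecting on $w_N - u^*$, bracketing each root to the right of the previous one:

```python
# Sketch (ours): for N = 2,...,5 find the p at which w_N first hits u*,
# reproducing p ~ 0.7589, 0.8583, 0.9073, 0.9348 from the text.

def g(p, N):
    w = p                                    # w_1 = u_1 = p
    for _ in range(N - 1):
        w = (3*p - 1) - (2*p - 1) / w        # w_{n+1} = phi(w_n)
    ustar = (2 - 3*p + (p*(9*p - 4))**0.5) / 2   # the solution of u = 1 - phi(u)
    return w - ustar

prev = 0.70
for N in range(2, 6):
    lo, hi = prev + 0.005, 0.99              # bracket: g(lo) < 0 < g(hi)
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid, N) < 0: lo = mid
        else: hi = mid
    prev = (lo + hi) / 2
    print(N, round(prev, 4))
```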

We draw below a graph that features simultaneously the upper bound $\frac{p}{4p-1}$ and the functions $v_p$ and $\gamma(p)$. The latter two graphs were obtained numerically.

References

[1] Aumann R.J. and Maschler M.B. (1995), Repeated Games with Incomplete Information, The MIT Press.

[2] Feinberg E.A. and Shwartz A. (2002), Handbook of Markov Decision Processes, Kluwer.

[3] Marino A. (2005), The Value and Optimal Strategies of a Particular Markov Chain Game, preprint.

[4] Neyman A. (2004), Existence of Optimal Strategies in Markov Games with Incomplete Information, preprint.

[5] Renault J. (2002), The Value of Markov Chain Games with Lack of Information on One Side, preprint.