

Nash Equilibrium for Upward-Closed Objectives

Krishnendu Chatterjee

Report No. UCB/CSD, August 2005

Computer Science Division (EECS), University of California, Berkeley, California 94720

Nash Equilibrium for Upward-Closed Objectives*

Krishnendu Chatterjee
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley, USA
krish@cs.berkeley.edu

August 2005

Abstract. We study infinite stochastic games played by n players on a finite graph, with goals specified by sets of infinite traces. The games are concurrent (each player simultaneously and independently chooses an action at each round), stochastic (the next state is determined by a probability distribution depending on the current state and the chosen actions), infinite (the game continues for an infinite number of rounds), nonzero-sum (the players' goals are not necessarily conflicting), and undiscounted. We show that if each player has an upward-closed objective, then there exists an ε-Nash equilibrium in memoryless strategies for every ε > 0; exact Nash equilibria need not exist. Upward-closure of an objective means that if a set Z of infinitely repeating states is winning, then all supersets of Z of infinitely repeating states are also winning. Memoryless strategies are strategies that are independent of the history of play and depend only on the current state. We also study the complexity of finding the values (payoff profile) of an ε-Nash equilibrium. We show that the values of an ε-Nash equilibrium in nonzero-sum concurrent games with upward-closed objectives for all players can be computed by computing ε-Nash equilibrium values of nonzero-sum concurrent games with reachability objectives for all players, followed by a polynomial-time procedure. As a consequence we establish that the values of an ε-Nash equilibrium can be computed in TFNP (total functional NP), and hence in EXPTIME.

* This research was supported in part by the ONR grant N , the AFOSR MURI grant F , and the NSF grant CCR .

1 Introduction

Stochastic games. Non-cooperative games provide a natural framework to model interactions between agents [16, 18]. The simplest class of non-cooperative games consists of the "one-step" games: games with a single interaction between the agents, after which the game ends and the payoffs are decided (e.g., matrix games). However, a wide class of games progress over time and in a stateful manner, and the current game depends on the history of interactions. Infinite stochastic games [20, 9] are a natural model for such games. A stochastic game is played over a finite state space and proceeds in rounds. In concurrent games, in each round each player chooses an action from a finite set of available actions, simultaneously and independently of the other players. The game proceeds to a new state according to a probabilistic transition relation (stochastic transition matrix) based on the current state and the joint actions of the players. Concurrent games subsume the simpler class of turn-based games, where at every state at most one player can choose between multiple actions. In verification and control of finite-state reactive systems such games proceed for infinitely many rounds, generating an infinite sequence of states, called the outcome of the game. The players receive a payoff based on a payoff function that maps every outcome to a real number.

Objectives. Payoffs are generally Borel measurable functions [15]. The payoff set for each player is a Borel set B_i in the Cantor topology on S^ω (where S is the set of states); player i gets payoff 1 if the outcome of the game is in B_i, and 0 otherwise. In verification, payoff functions are usually index sets of ω-regular languages. The ω-regular languages generalize the classical regular languages to infinite strings; they occur in the low levels of the Borel hierarchy (they are in Σ₃ ∩ Π₃), and they form a robust and expressive language for determining payoffs for commonly used specifications.
The simplest ω-regular objectives correspond to safety ("closed sets") and reachability ("open sets") objectives.

Zero-sum games. Games may be zero-sum, where two players have directly conflicting objectives and the payoff of one player is one minus the payoff of the other, or nonzero-sum, where each player has a prescribed payoff function based on the outcome of the game. The fundamental question for games is the existence of equilibrium values. For zero-sum games, this involves showing a determinacy theorem, which states that the expected optimum value obtained by player 1 is exactly one minus the expected optimum value obtained by player 2. For one-step zero-sum games, this is von

Neumann's minimax theorem [25]. For infinite games, the existence of such equilibria is not obvious; in fact, using the axiom of choice, one can construct games for which determinacy does not hold. However, a remarkable result by Martin [15] shows that all stochastic zero-sum games with Borel payoffs are determined.

Nonzero-sum games. For nonzero-sum games, the fundamental equilibrium concept is a Nash equilibrium [11], that is, a strategy profile such that no player can gain by deviating from the profile, assuming the other players continue playing the strategies in the profile. Again, for one-step games, the existence of such equilibria is guaranteed by Nash's theorem [11]. However, the existence of Nash equilibria in infinite games is not immediate: Nash's theorem holds for finite bimatrix games, but in the case of stochastic games the strategy space is not compact. The existence of Nash equilibria is known only in very special cases of stochastic games. In fact, Nash equilibria may not exist, and the best one can hope for is an ε-Nash equilibrium for all ε > 0, where an ε-Nash equilibrium is a strategy profile in which unilateral deviation can increase the payoff of a player by at most ε. Exact Nash equilibria do exist in discounted stochastic games [10]. For concurrent nonzero-sum games with payoffs defined by Borel sets, surprisingly little is known. Secchi and Sudderth [19] showed that exact Nash equilibria exist when all players have payoffs defined by closed sets ("safety objectives", or Π₁ objectives). In the case of open sets ("reachability objectives", or Σ₁ objectives), the existence of an ε-Nash equilibrium for every ε > 0 has been established in [4]. For the special case of two-player games, the existence of an ε-Nash equilibrium, for every ε > 0, is known for ω-regular objectives [2] and limit-average objectives [23, 24].
The existence of ε-Nash equilibria in n-player concurrent games with objectives in higher levels of the Borel hierarchy than Σ₁ and Π₁ has been an intriguing open problem; the existence of an ε-Nash equilibrium is not known even when each player has a Büchi objective.

Result and proof techniques. In this paper we show that an ε-Nash equilibrium exists, for every ε > 0, in n-player concurrent games with upward-closed objectives. However, exact Nash equilibria need not exist. Informally, an objective Ψ is upward-closed if, whenever a play ω that visits a set Z of states infinitely often is in Ψ, every play ω′ that visits a superset Z′ ⊇ Z of states infinitely often is also in Ψ. The class of upward-closed objectives subsumes Büchi and generalized Büchi objectives as special cases. For n-player concurrent games our result extends the existence of ε-Nash equilibria from the lowest level of the Borel hierarchy (open and closed sets) to a class of objectives that lie in higher levels of the Borel hierarchy (upward-closed objectives lie

in Π₂) and that subsumes several interesting classes of objectives. Along with existence, our result gives a finer characterization of ε-Nash equilibria, showing the existence of ε-Nash equilibria in memoryless strategies (strategies that are independent of the history of the play and depend only on the current state). The proof is organized as follows:

1. In Section 3 we develop some results on one-player versions of concurrent games and on n-player concurrent games with reachability objectives.

2. In Section 4 we use induction on the number of players, the results of Section 3, and an analysis of Markov chains to establish the desired result.

Complexity of ε-Nash equilibrium. Computing the values of a Nash equilibrium, when one exists, is another challenging problem [17, 26]. For one-step zero-sum games, equilibrium values and strategies can be computed in polynomial time (by reduction to linear programming) [16]. For one-step nonzero-sum games, no polynomial-time algorithm is known for computing an exact Nash equilibrium, even in two-player games [17]. From the computational point of view, a desirable property of an existence proof of Nash equilibrium is its ease of algorithmic analysis. Our proof of the existence of ε-Nash equilibria is completely constructive and algorithmic. It shows that an ε-Nash equilibrium in n-player concurrent games with upward-closed objectives can be computed by computing ε-Nash equilibria of games with reachability objectives together with a polynomial-time procedure. Thus computing ε-Nash equilibria for upward-closed objectives is no harder, up to a polynomial factor, than computing ε-Nash equilibria of n-player games with reachability objectives. We then prove that an ε-Nash equilibrium can be computed in TFNP (total functional NP), and hence in EXPTIME.

2 Definitions

Notation. For a countable set A, a probability distribution on A is a function δ : A → [0, 1] such that Σ_{a∈A} δ(a) = 1.
We denote the set of probability distributions on A by D(A). Given a distribution δ ∈ D(A), we denote by Supp(δ) = {x ∈ A | δ(x) > 0} the support of δ.

Definition 1 (Concurrent game structures) An n-player concurrent game structure G = ⟨S, A, Γ₁, Γ₂, …, Γₙ, δ⟩ consists of the following components:

• A finite state space S and a finite set A of moves.

• Move assignments Γ₁, Γ₂, …, Γₙ : S → 2^A \ ∅. For i ∈ {1, 2, …, n}, the move assignment Γ_i associates with each state s ∈ S the non-empty set Γ_i(s) ⊆ A of moves available to player i at state s.

• A probabilistic transition function δ : S × A × A × ⋯ × A → D(S), which gives the probability δ(s, a₁, a₂, …, aₙ)(t) of a transition from s to t when player i plays move a_i, for all s, t ∈ S and a_i ∈ Γ_i(s), for i ∈ {1, 2, …, n}.

We define the size of the game structure G to be equal to the size of the transition function δ; specifically,

  |G| = Σ_{s∈S} Σ_{(a₁,a₂,…,aₙ)∈Γ₁(s)×Γ₂(s)×⋯×Γₙ(s)} Σ_{t∈S} |δ(s, a₁, a₂, …, aₙ)(t)|,

where |δ(s, a₁, a₂, …, aₙ)(t)| denotes the space needed to specify the probability. At every state s ∈ S, each player i chooses a move a_i ∈ Γ_i(s), simultaneously and independently of the other players, and the game then proceeds to the successor state t with probability δ(s, a₁, a₂, …, aₙ)(t), for all t ∈ S. A state s is called an absorbing state if for all a_i ∈ Γ_i(s) we have δ(s, a₁, a₂, …, aₙ)(s) = 1. In other words, at s, for all choices of moves of the players, the next state is always s. For all states s ∈ S and moves a_i ∈ Γ_i(s) we denote by Dest(s, a₁, a₂, …, aₙ) = Supp(δ(s, a₁, a₂, …, aₙ)) the set of possible successors of s when the moves a₁, a₂, …, aₙ are selected. A path or a play ω of G is an infinite sequence ω = ⟨s₀, s₁, s₂, …⟩ of states in S such that for all k ≥ 0 there are moves a_i^k ∈ Γ_i(s_k) with δ(s_k, a₁^k, a₂^k, …, aₙ^k)(s_{k+1}) > 0. We denote by Ω the set of all paths and by Ω_s the set of all paths ω = ⟨s₀, s₁, s₂, …⟩ such that s₀ = s, i.e., the set of plays starting from state s.

Randomized strategies. A selector ξ_i for player i ∈ {1, 2, …, n} is a function ξ_i : S → D(A) such that for all s ∈ S and a ∈ A, if ξ_i(s)(a) > 0 then a ∈ Γ_i(s). We denote by Λ_i the set of all selectors for player i ∈ {1, 2, …, n}.
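Definition 1 can be rendered as a small data structure. Below is a minimal Python sketch (all names are ours, not the report's): a concurrent game structure with a validity check that every δ(s, a₁, …, aₙ) is a probability distribution over S. The example instance is a hypothetical encoding of the game of Example 1(a) later in the report (absorbing states are given singleton move sets).

```python
from itertools import product

class ConcurrentGameStructure:
    """Minimal n-player concurrent game structure <S, A, Gamma_1..Gamma_n, delta>."""

    def __init__(self, states, moves, gamma, delta):
        # states: the set S; moves: the set A
        # gamma[i][s]: non-empty list of moves available to player i at s
        # delta[(s, joint_moves)]: dict mapping successor t -> probability
        self.states, self.moves = states, moves
        self.gamma, self.delta = gamma, delta
        self.n = len(gamma)
        self._check()

    def _check(self):
        # every delta(s, a_1, ..., a_n) must be a probability distribution on S
        for s in self.states:
            for joint in product(*(self.gamma[i][s] for i in range(self.n))):
                dist = self.delta[(s, joint)]
                assert abs(sum(dist.values()) - 1.0) < 1e-9
                assert all(t in self.states for t in dist)

    def dest(self, s, joint):
        # Dest(s, a_1, ..., a_n) = Supp(delta(s, a_1, ..., a_n))
        return {t for t, p in self.delta[(s, joint)].items() if p > 0}

# Hypothetical encoding of Example 1(a): player 1 has moves {a, b} at s0,
# player 2 has {c, d}; s1 and s2 are absorbing.
G = ConcurrentGameStructure(
    states={'s0', 's1', 's2'},
    moves={'a', 'b', 'c', 'd'},
    gamma=[
        {'s0': ['a', 'b'], 's1': ['a'], 's2': ['a']},
        {'s0': ['c', 'd'], 's1': ['c'], 's2': ['c']},
    ],
    delta={
        ('s0', ('a', 'c')): {'s0': 1.0}, ('s0', ('a', 'd')): {'s1': 1.0},
        ('s0', ('b', 'c')): {'s1': 1.0}, ('s0', ('b', 'd')): {'s2': 1.0},
        ('s1', ('a', 'c')): {'s1': 1.0}, ('s2', ('a', 'c')): {'s2': 1.0},
    },
)
```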
A strategy σ_i for player i is a function σ_i : S⁺ → Λ_i that associates with every finite non-empty sequence of states, representing the history of the play so far, a selector. A memoryless strategy is independent of the history of the play and depends only on the current state. Memoryless strategies coincide with selectors, and we often write σ_i for the selector corresponding to a memoryless strategy σ_i. A memoryless strategy σ_i for player i is uniform memoryless if the selector of the memoryless strategy is a uniform

distribution over its support, i.e., for all states s we have σ_i(s)(a_i) = 0 if a_i ∉ Supp(σ_i(s)), and σ_i(s)(a_i) = 1/|Supp(σ_i(s))| if a_i ∈ Supp(σ_i(s)). We denote by Σ_i, Σ_i^M and Σ_i^UM the set of all strategies, the set of all memoryless strategies and the set of all uniform memoryless strategies for player i, respectively. Given strategies σ_i for the players, we denote by σ the strategy profile ⟨σ₁, σ₂, …, σₙ⟩. A strategy profile σ is memoryless (resp. uniform memoryless) if all the component strategies are memoryless (resp. uniform memoryless). Given a strategy profile σ = (σ₁, σ₂, …, σₙ) and a state s, we denote by

  Outcome(s, σ) = {ω = ⟨s₀, s₁, s₂, …⟩ | s₀ = s and for all k ≥ 0 there exist moves a_i^k with σ_i(⟨s₀, s₁, …, s_k⟩)(a_i^k) > 0 and δ(s_k, a₁^k, a₂^k, …, aₙ^k)(s_{k+1}) > 0}

the set of all possible plays from s, given σ. Once the starting state s and the strategies σ_i of the players have been chosen, the game is reduced to an ordinary stochastic process. Hence the probabilities of events are uniquely defined, where an event A ⊆ Ω_s is a measurable set of paths. For an event A ⊆ Ω_s, we denote by Pr_s^σ(A) the probability that a path belongs to A when the game starts from s and the players follow the strategies σ_i, with σ = ⟨σ₁, σ₂, …, σₙ⟩.

Objectives. Objectives for the players in nonterminating games are specified by providing the set of winning plays Ψ ⊆ Ω for each player. A general class of objectives are the Borel objectives [13]. A Borel objective Φ ⊆ S^ω is a Borel set in the Cantor topology on S^ω. The class of ω-regular objectives [21] lies in the first 2½ levels of the Borel hierarchy (i.e., in the intersection of Σ₃ and Π₃). The ω-regular objectives, and subclasses thereof, can be specified in the following forms. For a play ω = ⟨s₀, s₁, s₂, …⟩ ∈ Ω, we define Inf(ω) = {s ∈ S | s_k = s for infinitely many k ≥ 0} to be the set of states that occur infinitely often in ω.

1. Reachability and safety objectives.
Given a game graph G and a set T ⊆ S of target states, the reachability specification Reach(T) requires that some state in T be visited. The reachability specification Reach(T) defines the objective [[Reach(T)]] = {⟨s₀, s₁, s₂, …⟩ ∈ Ω | ∃k ≥ 0. s_k ∈ T} of winning plays. Given a set F ⊆ S of safe states, the safety specification Safe(F) requires that only states in F be visited. The safety specification Safe(F) defines the objective [[Safe(F)]] = {⟨s₀, s₁, …⟩ ∈ Ω | ∀k ≥ 0. s_k ∈ F} of winning plays.

2. Büchi and generalized Büchi objectives. Given a game graph G and a set B ⊆ S of Büchi states, the Büchi specification Büchi(B) requires that states in B be visited infinitely often. The Büchi specification

Büchi(B) defines the objective [[Büchi(B)]] = {ω ∈ Ω | Inf(ω) ∩ B ≠ ∅} of winning plays. Let B₁, B₂, …, Bₙ be subsets of states, i.e., each B_i ⊆ S. The generalized Büchi specification requires that every Büchi specification Büchi(B_i) be satisfied. Formally, the generalized Büchi objective is ∩_{i∈{1,2,…,n}} [[Büchi(B_i)]].

3. Müller and upward-closed objectives. Given a set M ⊆ 2^S of Müller sets of states, the Müller specification Müller(M) requires that the set of states visited infinitely often in a play is exactly one of the sets in M. The Müller specification Müller(M) defines the objective [[Müller(M)]] = {ω ∈ Ω | Inf(ω) ∈ M} of winning plays. The upward-closed objectives form a subclass of Müller objectives, with the restriction that the set M is upward-closed. Formally, a set UC ⊆ 2^S is upward-closed if the following condition holds: if U ∈ UC and U ⊆ Z, then Z ∈ UC. Given an upward-closed set UC ⊆ 2^S, the upward-closed objective is defined as the set [[UpClo(UC)]] = {ω ∈ Ω | Inf(ω) ∈ UC} of winning plays. Observe that an upward-closed objective specifies that if a play ω that visits a set U of states infinitely often is winning, then a play ω′ that visits a superset of U infinitely often is also winning. The upward-closed objectives subsume Büchi and generalized Büchi (i.e., conjunctions of Büchi) objectives. The upward-closed objectives also subsume disjunctions of Büchi objectives. Since Büchi objectives lie in the second level of the Borel hierarchy (in Π₂), it follows that upward-closed objectives can express objectives that lie in Π₂. Müller objectives are a canonical form for ω-regular objectives; the class of upward-closed objectives forms a strict subset of Müller objectives and cannot express all ω-regular properties. We write Ψ for an arbitrary objective, and Ψ_i for the objective of player i. Given a Müller objective Ψ, the set of paths Ψ is measurable for any choice of strategies for the players [22].
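For a small state space, the upward-closure condition and the UpClo winning condition can be checked mechanically by enumeration. A sketch (helper names are ours), representing a family UC ⊆ 2^S as a set of frozensets, and rebuilding [[Büchi(B)]] as the upward-closed family of all sets meeting B:

```python
from itertools import chain, combinations

def is_upward_closed(uc, states):
    """Check the condition: U in UC and U subset of Z implies Z in UC."""
    def supersets(u):
        rest = sorted(states - u)
        for r in range(len(rest) + 1):
            for extra in combinations(rest, r):
                yield u | frozenset(extra)
    return all(z in uc for u in uc for z in supersets(u))

def upclo_wins(inf_states, uc):
    """A play omega satisfies [[UpClo(UC)]] iff Inf(omega) is in UC."""
    return frozenset(inf_states) in uc

def buechi_as_upclo(b, states):
    """[[Buechi(B)]] as an upward-closed family: all subsets meeting B."""
    all_subsets = chain.from_iterable(
        combinations(sorted(states), r) for r in range(len(states) + 1)
    )
    return {frozenset(u) for u in all_subsets if set(u) & set(b)}
```

For example, the family {{s0}} alone is not upward-closed, because its superset {s0, s1} is missing.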
Hence the probability that a path satisfies a Müller objective Ψ starting from state s ∈ S under a strategy profile σ is Pr_s^σ(Ψ).

Notations. Given a strategy profile σ = (σ₁, σ₂, …, σₙ), we denote by σ_{−i} = (σ₁, σ₂, …, σ_{i−1}, σ_{i+1}, …, σₙ) the strategy profile with the strategy of player i removed. Given a strategy σ′_i ∈ Σ_i and a strategy profile σ_{−i}, we denote by σ_{−i} ∪ σ′_i the strategy profile (σ₁, σ₂, …, σ_{i−1}, σ′_i, σ_{i+1}, …, σₙ). We also use the following notations: Σ = Σ₁ × Σ₂ × ⋯ × Σₙ; Σ^M = Σ₁^M × Σ₂^M × ⋯ × Σₙ^M; Σ^UM = Σ₁^UM × Σ₂^UM × ⋯ × Σₙ^UM; and Σ_{−i} = Σ₁ × Σ₂ ×

⋯ × Σ_{i−1} × Σ_{i+1} × ⋯ × Σₙ. The notations for Σ^M_{−i} and Σ^UM_{−i} are similar. For n ∈ ℕ, we denote by [n] the set {1, 2, …, n}.

Concurrent nonzero-sum games. A concurrent nonzero-sum game consists of a concurrent game structure G with an objective Ψ_i for each player i. The zero-sum values for the players in concurrent games with objective Ψ_i for player i are defined as follows.

Definition 2 (Zero-sum values) Let G be a concurrent game structure with objective Ψ_i for player i. Given a state s ∈ S, the maximal probability with which player i can ensure that Ψ_i holds from s, against all strategies of the other players, is the zero-sum value of player i at s. Formally, the zero-sum value for player i is given by the function val_i^G(Ψ_i) : S → [0, 1], defined for all s ∈ S by

  val_i^G(Ψ_i)(s) = sup_{σ_i ∈ Σ_i} inf_{σ_{−i} ∈ Σ_{−i}} Pr_s^{σ_{−i} ∪ σ_i}(Ψ_i).

A two-player concurrent game structure G with objectives Ψ₁ and Ψ₂ for player 1 and player 2, respectively, is zero-sum if the objectives of the players are complementary, i.e., Ψ₁ = Ω \ Ψ₂. Concurrent zero-sum games satisfy a quantitative version of determinacy [15], stating that for all two-player concurrent games with Müller objectives Ψ₁ and Ψ₂ such that Ψ₁ = Ω \ Ψ₂, and all s ∈ S, we have

  val₁^G(Ψ₁)(s) + val₂^G(Ψ₂)(s) = 1.

The determinacy also establishes the existence of ε-Nash equilibria, for all ε > 0, in concurrent zero-sum games.

Definition 3 (ε-Nash equilibrium) Let G be a concurrent game structure with objective Ψ_i for player i. For ε ≥ 0, a strategy profile σ* = (σ*₁, …, σ*ₙ) ∈ Σ is an ε-Nash equilibrium for a state s ∈ S iff the following condition holds for all i ∈ [n]:

  sup_{σ_i ∈ Σ_i} Pr_s^{σ*_{−i} ∪ σ_i}(Ψ_i) ≤ Pr_s^{σ*}(Ψ_i) + ε.

A Nash equilibrium is an ε-Nash equilibrium with ε = 0.

Example 1 (ε-Nash equilibrium) Consider the two-player game structure shown in Fig. 1(a). The states s₁ and s₂ are absorbing states, and the

sets of available moves for player 1 and player 2 at s₀ are {a, b} and {c, d}, respectively. The transition function is defined as follows:

  δ(s₀, a, c)(s₀) = 1;  δ(s₀, b, d)(s₂) = 1;  δ(s₀, a, d)(s₁) = δ(s₀, b, c)(s₁) = 1.

The objective of player 1 is the upward-closed objective [[UpClo(UC₁)]] with UC₁ = {{s₁}, {s₁, s₂}, {s₀, s₁}, {s₀, s₁, s₂}}, i.e., the objective of player 1 is to visit s₁ infinitely often (i.e., [[Büchi({s₁})]]). Since s₁ is an absorbing state in the game shown, the objective of player 1 is equivalent to [[Reach({s₁})]]. The objective of player 2 is the upward-closed objective [[UpClo(UC₂)]] with UC₂ = {{s₂}, {s₁, s₂}, {s₀, s₂}, {s₀, s₁, s₂}, {s₀}, {s₀, s₁}}. Observe that any play ω with Inf(ω) ≠ {s₁} is winning for player 2; hence the objectives of player 1 and player 2 are complementary. For ε > 0, consider the memoryless strategy σ₁^ε ∈ Σ₁^M that plays move a with probability 1 − ε and move b with probability ε. The game starts at s₀, and in each round, if player 2 plays move c, then the play reaches s₁ with probability ε and stays in s₀ with probability 1 − ε; whereas if player 2 plays move d, then the game reaches state s₁ with probability 1 − ε and state s₂ with probability ε. Hence it is easy to argue that against all strategies σ₂ of player 2, given the strategy σ₁^ε of player 1, the game reaches s₁ with probability at least 1 − ε. Hence for all ε > 0 there exists a strategy σ₁^ε for player 1 such that against all strategies σ₂ we have Pr_{s₀}^{σ₁^ε, σ₂}([[Reach({s₁})]]) ≥ 1 − ε; hence (1 − ε, ε) is an ε-Nash equilibrium value profile at s₀. However, we argue that (1, 0) is not a Nash equilibrium at s₀.
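The guarantee claimed for σ₁^ε in Fig. 1(a) can be checked by a short calculation against player 2's two pure stationary responses (a sketch with our own function names; the report's claim covers all strategies, not just these two):

```python
def reach_s1_vs_always_c(eps, rounds=100000):
    # Each round at s0, player 2 plays c: reach s1 with prob eps, stay otherwise.
    # The probability of never reaching s1 in `rounds` rounds is (1 - eps)^rounds,
    # which tends to 0, so the reach probability tends to 1.
    return 1 - (1 - eps) ** rounds

def reach_s1_vs_always_d(eps):
    # Player 2 plays d in the first round: s1 with prob 1 - eps, s2 with prob eps.
    return 1 - eps

eps = 0.01
v_c = reach_s1_vs_always_c(eps)   # essentially 1
v_d = reach_s1_vs_always_d(eps)   # exactly 1 - eps
worst = min(v_c, v_d)             # player 1 secures at least 1 - eps
```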
To prove the claim, given a strategy σ₁ for player 1, consider the following counter-strategy σ₂ for player 2: for k ≥ 0, at round k, if player 1 plays move a with probability 1, then player 2 chooses move c and ensures that state s₁ is reached with probability 0; otherwise, if player 1 plays move b with positive probability at round k, then player 2 chooses move d, and the play reaches s₂ with positive probability. That is, either s₂ is reached with positive probability or s₀ is visited infinitely often. Hence player 1 cannot satisfy [[UpClo(UC₁)]] with probability 1. This shows that in game structures with upward-closed objectives Nash equilibria need not exist, and an ε-Nash equilibrium, for all ε > 0, is the best one can achieve.

Consider the game shown in Fig. 1(b). The transition function at state s₀ is the same as in Fig. 1(a). The state s₂ is an absorbing state, and from state s₁ the next state is always s₀. The objective of player 1 is the same as in the previous example, i.e., [[Büchi({s₁})]]. Consider any upward-closed objective [[UpClo(UC₂)]] for player 2. We claim that the following strategy profile (σ₁, σ₂) is a Nash equilibrium at s₀: σ₁ is the memoryless strategy that plays a

Figure 1: Examples of Nash equilibria in two-player concurrent game structures.

with probability 1, and σ₂ is the memoryless strategy that plays c and d with probability 1/2 each. Given σ₁ and σ₂, the states s₀ and s₁ are visited infinitely often with probability 1. If {s₀, s₁} ∈ UC₂, then the objectives of both players are satisfied with probability 1. If {s₀, s₁} ∉ UC₂, then no subset of {s₀, s₁} is in UC₂. Given the strategy σ₁, for all strategies σ′₂ of player 2, the plays under σ₁ and σ′₂ visit subsets of {s₀, s₁} infinitely often. Hence, if {s₀, s₁} ∉ UC₂, then given σ₁, for all strategies of player 2 the objective [[UpClo(UC₂)]] is satisfied with probability 0, and player 2 has no incentive to deviate from σ₂. The claim follows. Note that the present example can be contrasted with the zero-sum game on the same game structure with objective [[Büchi({s₁})]] for player 1 and the complementary objective for player 2 (which is not upward-closed). In the zero-sum case, ε-optimal strategies for player 1 require infinite memory (see [7]). In the case of the nonzero-sum game with upward-closed objectives (which do not generalize the zero-sum case) we exhibited the existence of a memoryless Nash equilibrium.

3 Markov Decision Processes and Nash Equilibrium for Reachability Objectives

This section is divided into two parts: Subsection 3.1 develops facts about one-player concurrent game structures, and Subsection 3.2 develops facts about n-player concurrent game structures with reachability objectives. The facts developed in this section play a key role in the analysis of the later sections.

3.1 Markov decision processes

In this section we develop some facts about one-player versions of concurrent game structures, known as Markov decision processes (MDPs) [1]. For i ∈ [n], a player-i MDP is a concurrent game structure where for all s ∈ S and all j ∈ [n] \ {i} we have |Γ_j(s)| = 1, i.e., at every state only player i can choose between multiple moves, and the choices of the other players are singletons. If for all states s ∈ S and all i ∈ [n] we have |Γ_i(s)| = 1, then we have a Markov chain. Given a concurrent game structure G, if we fix a memoryless strategy profile σ_{−i} = (σ₁, …, σ_{i−1}, σ_{i+1}, …, σₙ) for the players in [n] \ {i}, the game structure is equivalent to a player-i MDP G_{σ_{−i}} with transition function

  δ_{σ_{−i}}(s, a_i)(t) = Σ_{(a₁,…,a_{i−1},a_{i+1},…,aₙ)} δ(s, a₁, a₂, …, aₙ)(t) · Π_{j∈[n]\{i}} σ_j(s)(a_j),

for all s ∈ S and a_i ∈ Γ_i(s). Similarly, if we fix a memoryless strategy profile σ ∈ Σ^M for a concurrent game structure G, we obtain a Markov chain, which we denote by G_σ. In an MDP, the sets of states that play a role analogous to that of the closed recurrent sets of states in Markov chains [14] are called end components [5, 6]. Without loss of generality we consider player-1 MDPs, and since the set Σ_{−1} is a singleton in player-1 MDPs, we only consider strategies for player 1.

Definition 4 (End components and maximal end components) Given a player-1 MDP G, an end component (EC) in G is a subset C ⊆ S such that there is a memoryless strategy σ₁ ∈ Σ₁^M for player 1 under which C forms a closed recurrent set in the resulting Markov chain, i.e., in the Markov chain G_{σ₁}. An end component C is a maximal end component if the following condition holds: if C ⊆ Z and Z is an end component, then C = Z, i.e., there is no end component that strictly encloses C.
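The sum-product construction of the induced player-i MDP above is direct to implement. A sketch (our naming and encoding, not the report's): joint transitions are averaged over the fixed memoryless strategies of the other players. The example fixes player 2's half-half strategy in a structure like Fig. 1(a).

```python
from itertools import product

def induced_mdp(delta, gamma, sigma, i, states):
    # delta[(s, joint)] = {t: prob}; gamma[j][s] = moves of player j at s
    # sigma[j][s] = {move: prob}: memoryless strategy of each player j != i
    n = len(gamma)
    others = [j for j in range(n) if j != i]
    result = {}
    for s in states:
        for a_i in gamma[i][s]:
            dist = {}
            # sum over joint moves of the other players, weighted by sigma
            for choice in product(*(gamma[j][s] for j in others)):
                joint = list(choice)
                joint.insert(i, a_i)  # place player i's move at position i
                weight = 1.0
                for j, a_j in zip(others, choice):
                    weight *= sigma[j][s].get(a_j, 0.0)
                for t, p in delta[(s, tuple(joint))].items():
                    dist[t] = dist.get(t, 0.0) + weight * p
            result[(s, a_i)] = dist
    return result

# Hypothetical two-player example: player 0 has {a, b} at s0, player 1 has {c, d}.
states = ['s0', 's1', 's2']
gamma = [
    {'s0': ['a', 'b'], 's1': ['a'], 's2': ['a']},
    {'s0': ['c', 'd'], 's1': ['c'], 's2': ['c']},
]
delta = {
    ('s0', ('a', 'c')): {'s0': 1.0}, ('s0', ('a', 'd')): {'s1': 1.0},
    ('s0', ('b', 'c')): {'s1': 1.0}, ('s0', ('b', 'd')): {'s2': 1.0},
    ('s1', ('a', 'c')): {'s1': 1.0}, ('s2', ('a', 'c')): {'s2': 1.0},
}
# Player 1 plays c and d with probability 1/2 at s0:
sigma = {1: {'s0': {'c': 0.5, 'd': 0.5}, 's1': {'c': 1.0}, 's2': {'c': 1.0}}}
mdp = induced_mdp(delta, gamma, sigma, 0, states)
```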
Graph of an MDP. Given a player-1 MDP G, the graph of G is the directed graph (S, E) with the set E of edges defined as E = {(s, t) | s, t ∈ S. ∃a ∈ Γ₁(s). t ∈ Dest(s, a)}; E(s) = {t | (s, t) ∈ E} denotes the set of possible successors of the state s in the MDP G.

Equivalent characterization. An equivalent characterization of an end component C is as follows: for each s ∈ C, there is a subset of moves M₁(s) ⊆ Γ₁(s) such that:

1. when a move in M₁(s) is chosen at s, all states that can be reached with non-zero probability are in C, i.e., for all s ∈ C and all a ∈ M₁(s), Dest(s, a) ⊆ C;

2. the graph (C, E) is strongly connected, where E consists of the transitions that occur with non-zero probability when moves in M₁(·) are taken, i.e., E = {(s, t) | s, t ∈ C. ∃a ∈ M₁(s). t ∈ Dest(s, a)}.

Given a set F ⊆ 2^S of subsets of states, we denote by InfSt(F) the event {ω | Inf(ω) ∈ F}. The following lemma states that in a player-1 MDP, for all strategies of player 1, the set of states visited infinitely often is an end component with probability 1. Lemma 2 follows easily from Lemma 1.

Lemma 1 ([6, 5]) Let C be the set of end components of a player-1 MDP G. For all strategies σ₁ ∈ Σ₁ and all states s ∈ S, we have Pr_s^{σ₁}(InfSt(C)) = 1.

Lemma 2 Let C be the set of end components and Z be the set of maximal end components of a player-1 MDP G. Then the following assertions hold:

• L = ∪_{C∈C} C = ∪_{Z∈Z} Z; and

• for all strategies σ₁ ∈ Σ₁ and all states s ∈ S, we have Pr_s^{σ₁}([[Reach(L)]]) = 1.

Lemma 3 Given a player-1 MDP G and an end component C, there is a uniform memoryless strategy σ₁ ∈ Σ₁^UM such that for all states s ∈ C we have Pr_s^{σ₁}({ω | Inf(ω) = C}) = 1.

Proof. For a state s ∈ C, let M₁(s) ⊆ Γ₁(s) be the subset of moves such that the conditions of the equivalent characterization of end components hold. Consider the uniform memoryless strategy σ₁ defined as follows: for all states s ∈ C,

  σ₁(s)(a) = 1/|M₁(s)| if a ∈ M₁(s), and σ₁(s)(a) = 0 otherwise.

Given the strategy σ₁, in the Markov chain G_{σ₁} the set C is a closed recurrent set of states. Hence the result follows.
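The maximal end components of an MDP can be computed by the standard iterative refinement suggested by the characterization above: restrict each candidate set to moves whose destinations stay inside it, split the result into strongly connected components, and repeat. A sketch of this classical algorithm (our implementation, not taken from the report; the quadratic SCC routine is chosen for brevity, not speed):

```python
def sccs(nodes, succ):
    """SCCs via mutual reachability (O(n^2), fine for a sketch)."""
    def reach(u):
        seen, stack = {u}, [u]
        while stack:
            v = stack.pop()
            for w in succ.get(v, ()):
                if w in nodes and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen
    fwd = {u: reach(u) for u in nodes}
    comps, assigned = [], set()
    for u in nodes:
        if u not in assigned:
            comp = {v for v in fwd[u] if u in fwd[v]}
            comps.append(comp)
            assigned |= comp
    return comps

def maximal_end_components(states, moves, dest):
    # moves[s]: available moves at s; dest[(s, a)]: set of possible successors
    work, mecs = [set(states)], []
    while work:
        c = work.pop()
        while True:  # prune states with no move that stays inside c
            allowed = {s: [a for a in moves[s] if dest[(s, a)] <= c] for s in c}
            bad = {s for s in c if not allowed[s]}
            if not bad:
                break
            c = c - bad
        if not c:
            continue
        succ = {s: set().union(*(dest[(s, a)] for a in allowed[s])) for s in c}
        comps = sccs(c, succ)
        if len(comps) == 1 and comps[0] == c:
            mecs.append(c)  # closed and strongly connected: an end component
        else:
            work.extend(comps)
    return mecs

# Hypothetical player-1 MDP: from 0, move x goes to {0, 1} and move y to {2};
# 1 returns to 0; 2 is absorbing. MECs: {0, 1} and {2}.
moves = {0: ['x', 'y'], 1: ['x'], 2: ['x']}
dest = {(0, 'x'): {0, 1}, (0, 'y'): {2}, (1, 'x'): {0}, (2, 'x'): {2}}
mecs = maximal_end_components([0, 1, 2], moves, dest)
```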

3.2 Nash equilibrium for reachability objectives

Memoryless Nash equilibria in discounted games. We first prove the existence of Nash equilibria in memoryless strategies in n-player discounted games with reachability objective [[Reach(R_i)]] for player i, for R_i ⊆ S. We then characterize ε-Nash equilibria in memoryless strategies with full support in n-player discounted games with reachability objectives.

Definition 5 (β-discounted games) Given an n-player game structure G, we write G^β to denote the β-discounted version of the game structure G. At each step the game G^β halts with probability β (it goes to a special absorbing state halt such that halt ∉ R_i for all i), and continues as the game G with probability 1 − β. We refer to β as the discount factor. In this paper we write G^β to denote a β-discounted game.

Definition 6 (Stopping time of a history in β-discounted games) Consider the stopping time τ defined on histories h = ⟨s₀, s₁, …⟩ by τ(h) = inf{k ≥ 0 | s_k = halt}, where the infimum of the empty set is +∞.

Lemma 4 Let G^β be an n-player β-discounted game structure, with β > 0. Then, for all states s ∈ S and all strategy profiles σ, we have

  Pr_s^σ[τ > m] ≤ (1 − β)^m.

Proof. At each step of the game G^β the game reaches the halt state with probability β. Hence the probability of not reaching the halt state in m steps is at most (1 − β)^m.

The proof of the following lemma is similar to the proof of Lemma 2.2 of [19].

Lemma 5 For every n-player β-discounted game structure G^β, with β > 0, with reachability objective [[Reach(R_i)]] for player i, there exist memoryless strategies σ_i, for i ∈ [n], such that the memoryless strategy profile σ = (σ₁, σ₂, …, σₙ) is a Nash equilibrium in G^β for every s ∈ S.

Proof. Regard each n-tuple σ = (σ₁, σ₂, …, σₙ) of memoryless strategies as a vector in a compact, convex subset K of the appropriate Euclidean space. Then define a correspondence that maps each element σ of K to the set Θ(σ) of all elements g = (g₁, g₂, …, gₙ) of K such that, for all i ∈ [n] and all s ∈ S, g_i is an optimal response for player i in G^β against σ_{−i}. Clearly, it suffices to show that there is a σ ∈ K such that σ ∈ Θ(σ). To show this, we verify the conditions of Kakutani's fixed point theorem [12]:

1. for every σ ∈ K, the set Θ(σ) is closed, convex and nonempty;

2. if g^(k) ∈ Θ(σ^(k)) for all k, with lim_{k→∞} g^(k) = g and lim_{k→∞} σ^(k) = σ, then g ∈ Θ(σ).

To verify condition 1, fix σ = (σ₁, σ₂, …, σₙ) ∈ K and i ∈ {1, 2, …, n}. For each s ∈ S, let v_i^β(s) be the maximal payoff that player i can achieve in G^β against σ_{−i}, i.e., v_i^β(s) = val_i^{G^β_{σ_{−i}}}([[Reach(R_i)]])(s). Since, after fixing the strategies of all the other players, the game structure becomes an MDP, we know that g_i is an optimal response to σ_{−i} if and only if, for each s ∈ S, g_i(s) puts positive probability only on actions a_i that maximize the expectation of v_i^β, namely

  Σ_{s′} v_i^β(s′) · δ_{σ_{−i}}(s, a_i)(s′).

The fact that any convex combination of optimal responses is again an optimal response in MDPs with reachability objectives follows from the fact that MDPs with reachability objectives can be solved by a linear program, and convex combinations of optimal responses satisfy the constraints of the linear program with optimal values. Hence condition 1 follows. Condition 2 is an easy consequence of the continuity of the mapping σ ↦ Pr_s^σ([[Reach(R_i)]]) from K to the real line; it follows from Lemma 4 that the mapping is continuous. The desired result follows.

Definition 7 (Difference of two MDPs) Let G₁ and G₂ be two player-i MDPs defined on the same state space S with the same set A of moves.
The difference of the two MDPs, denoted diff(G_1, G_2), is defined as:

diff(G_1, G_2) = Σ_{s ∈ S} Σ_{a ∈ A} Σ_{s' ∈ S} | δ_1(s, a)(s') − δ_2(s, a)(s') |.

That is, diff(G_1, G_2) is the sum of the differences of the probabilities of all the edges of the MDPs.
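Definition 7 is straightforward to compute. A minimal sketch (illustrative names, not from the paper), with each transition function represented as a nested dictionary δ[s][a][s']:

```python
def mdp_diff(delta1, delta2, states, moves):
    # diff(G1, G2): sum over all edges (s, a, s') of the absolute
    # difference of the two transition probabilities.
    return sum(
        abs(delta1[s][a].get(t, 0.0) - delta2[s][a].get(t, 0.0))
        for s in states
        for a in moves
        for t in states
    )

states, moves = ["s0", "s1"], ["a"]
d1 = {"s0": {"a": {"s0": 0.5, "s1": 0.5}}, "s1": {"a": {"s1": 1.0}}}
d2 = {"s0": {"a": {"s0": 0.6, "s1": 0.4}}, "s1": {"a": {"s1": 1.0}}}
d = mdp_diff(d1, d2, states, moves)  # |0.5 - 0.6| + |0.5 - 0.4| = 0.2
```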

Observe that in the context of a player-i MDP G, for all objectives Ψ_i, we have val_i^G(Ψ_i)(s) = sup_{σ_i ∈ Σ_i} Pr_s^{σ_i}(Ψ_i). The following lemma follows from a theorem of Filar-Vrieze [9] (page 185).

Lemma 6 (Lipschitz continuity for reachability objectives). Let G_1^β and G_2^β be two β-discounted player-i MDPs, with β > 0, on the same state space and with the same set of moves. For j ∈ {1, 2}, let v_j^β(s) = val_i^{G_j^β}(⟦Reach(R_i)⟧)(s), for R_i ⊆ S, i.e., v_j^β(s) denotes the value for player i for the reachability objective ⟦Reach(R_i)⟧ in the β-discounted MDP G_j^β. Then the following assertion holds:

|v_1^β(s) − v_2^β(s)| ≤ diff(G_1^β, G_2^β).

Lemma 7 (Nash equilibrium with full support). Let G_β be an n-player β-discounted game structure, with β > 0, and reachability objective ⟦Reach(R_i)⟧ for player i. For every ε > 0, there exist memoryless strategies σ_i for i ∈ [n] such that for all i ∈ [n] and all s ∈ S, Supp(σ_i(s)) = Γ_i(s) (i.e., every move of player i is played with positive probability), and the memoryless strategy profile σ = (σ_1, σ_2, …, σ_n) is an ε-Nash equilibrium in G_β for every s ∈ S.

Proof. Fix a Nash equilibrium σ' = (σ'_1, σ'_2, …, σ'_n) as obtained from Lemma 5. For all i ∈ [n], define

σ_i(s)(a) = w / |Γ_i(s)| + (1 − w) · σ'_i(s)(a), where w = ε / (n · |A| · |S|²),

for all s ∈ S and all a ∈ Γ_i(s). Let σ = (σ_1, σ_2, …, σ_n). Note that for all s ∈ S we have Supp(σ_i(s)) = Γ_i(s). For all i ∈ [n], consider the two player-i MDPs G_{β,σ_{−i}} and G_{β,σ'_{−i}}: for all s, s' ∈ S and a_i ∈ Γ_i(s), we have from the construction that

|δ_{σ_{−i}}(s, a_i)(s') − δ_{σ'_{−i}}(s, a_i)(s')| ≤ ε / (|S|² · |Γ_i(s)|).

It follows that diff(G_{β,σ_{−i}}, G_{β,σ'_{−i}}) ≤ ε. Since σ' is a Nash equilibrium, and for all i ∈ [n] the structures G_{β,σ_{−i}} and G_{β,σ'_{−i}} are β-discounted player-i MDPs with diff bounded by ε, the result follows from Lemma 6.
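The perturbation in the proof of Lemma 7 mixes the equilibrium strategy with a small uniform component. The sketch below (illustrative; the exact constants in the paper's definition may differ) shows the shape of the construction: every available move receives positive probability, the result is still a probability distribution, and each move's probability shifts by at most the mixing weight w:

```python
def full_support_perturbation(sigma_prime_s, gamma_s, w):
    # Mix sigma_prime(s) with the uniform distribution over the
    # available moves gamma_s, using total mixing weight w.
    k = len(gamma_s)
    return {a: w / k + (1.0 - w) * sigma_prime_s.get(a, 0.0) for a in gamma_s}

# A pure (hence not full-support) equilibrium strategy at some state s:
sigma_prime_s = {"a": 1.0, "b": 0.0}
sigma_s = full_support_perturbation(sigma_prime_s, ["a", "b"], w=0.01)
# Every move now has positive probability, and each entry moved by at most w.
```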
Lemma 8 (ε-Nash equilibrium for reachability games [4]). For every n-player game structure G with reachability objective ⟦Reach(R_i)⟧ for player i, and for every ε > 0, there exists β > 0 such that a memoryless ε-Nash equilibrium in G_β is a 2ε-Nash equilibrium in G.

Lemma 8 follows from the results in [4]. Lemma 7 and Lemma 8 yield Theorem 1.

Theorem 1 (ε-Nash equilibrium of full support). For every n-player game structure G with reachability objective ⟦Reach(R_i)⟧ for player i, and for every ε > 0, there exists a memoryless ε-Nash equilibrium σ* = (σ*_1, σ*_2, …, σ*_n) such that for all s ∈ S and all i ∈ [n], we have Supp(σ*_i(s)) = Γ_i(s).

4 Nash Equilibrium for Upward-closed Objectives

In this section we prove the existence of memoryless ε-Nash equilibria, for all ε > 0, for all n-player concurrent game structures with upward-closed objectives for all players. The key arguments use induction on the number of players, the results of Section 3, and an analysis of Markov chains and MDPs. We first present some definitions required for the analysis in the rest of the section.

MDP and graph of a game structure. Given an n-player concurrent game structure G, we define an associated MDP G̅ of G and an associated graph of G. The MDP G̅ = (S̅, A̅, Γ̅, δ̅) is defined as follows:

• S̅ = S; A̅ = A × A × … × A = A^n; and Γ̅(s) = { (a_1, a_2, …, a_n) | a_i ∈ Γ_i(s) }.
• δ̅(s, (a_1, a_2, …, a_n)) = δ(s, a_1, a_2, …, a_n).

The graph of the game structure G is defined as the graph of the MDP G̅.

Games with absorbing states. Given a game structure G, we partition the state space of G as follows:

1. The set of absorbing states in S is denoted T, i.e., T = { s ∈ S | s is an absorbing state }.
2. A set U of states consisting of the states s such that |Γ_i(s)| = 1 for all i ∈ [n], and such that all edges from U lead to U ∪ T. In other words, at states in U there is no non-trivial choice of moves for the players, and thus from any state s in U the game proceeds to the set T according to the probability distribution of the transition function δ at s.
3. C = S \ (U ∪ T).
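The associated MDP G̅ replaces the n simultaneous choices by a single choice of a joint move profile. A small illustrative sketch of Γ̅(s) as the product of the players' available move sets:

```python
from itertools import product

def joint_moves(move_sets):
    # Gamma_bar(s): all joint profiles (a_1, ..., a_n), one move per
    # player, given the list [Gamma_1(s), ..., Gamma_n(s)].
    return list(product(*move_sets))

profiles = joint_moves([["a", "b"], ["x", "y"]])
# Two players with two moves each induce 2 * 2 = 4 joint profiles in the MDP.
```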

Reachable sets. Given a game structure G and a state s ∈ S, we define Reachable(s, G) = { t ∈ S | there is a path from s to t in the graph of G } as the set of states that are reachable from s in the graph of the game structure. For a set Z ⊆ S, we denote by Reachable(Z, G) the set of states reachable from some state in Z, i.e., Reachable(Z, G) = ∪_{s ∈ Z} Reachable(s, G). Given a set Z, let Z_R = Reachable(Z, G). We denote by G ↾ Z_R the subgame induced by the set Z_R of states. Similarly, given a set F ⊆ 2^S, we denote by F ↾ Z_R the set { U | ∃F ∈ F. U = F ∩ Z_R }.

Terminal non-absorbing maximal end components (Tnec). Given a game structure G, let Z be the set of maximal end components of G̅. Let L = Z \ T be the set of maximal non-absorbing end components, and let H = ∪_{L ∈ L} L. A maximal end component Z ⊆ C is a terminal non-absorbing maximal end component (Tnec) if Reachable(Z, G) ∩ (H \ Z) = ∅, i.e., no other non-absorbing maximal end component is reachable from Z.

We consider in this section game structures G with upward-closed objective ⟦UpClo(UC_i)⟧ for player i. We also denote by R_i = { s ∈ T | {s} ∈ UC_i } the set of absorbing states in T whose singletons belong to UC_i. We now prove the following key result.

Theorem 2. For all n-player concurrent game structures G with upward-closed objective ⟦UpClo(UC_i)⟧ for player i, one of the following conditions (condition C1 or C2) holds:

1. (Condition C1) There exists a memoryless strategy profile σ ∈ Σ^M such that in the Markov chain G_σ there is a closed recurrent set Z ⊆ C such that σ is a Nash equilibrium for all states s ∈ Z.
2. (Condition C2) There exists a state s ∈ C such that for all ε > 0 there exists a memoryless ε-Nash equilibrium σ ∈ Σ^M for state s such that Pr_s^σ(⟦Reach(T)⟧) = 1 and, for all s ∈ S and all i ∈ [n], we have Supp(σ_i(s)) = Γ_i(s).

The proof of Theorem 2 is by induction on the number of players. We first analyze the base case.

Base Case.
(One-player game structures, i.e., MDPs.) We consider player-1 MDPs and analyze the following cases:

• (Case 1.) If there is no Tnec in C, then it follows from Lemma 2 that for all states s ∈ C and all strategies σ_1 ∈ Σ_1, we have Pr_s^{σ_1}(⟦Reach(T)⟧) = 1 and Pr_s^{σ_1}(⟦Reach(R_1)⟧) = Pr_s^{σ_1}(⟦UpClo(UC_1)⟧) (recall that R_1 = { s ∈ T | {s} ∈ UC_1 }). The result

of Theorem 1 yields an ε-Nash equilibrium σ_1 that satisfies condition C2 of Theorem 2 for all states s ∈ C.

• (Case 2.) Otherwise, let Z ⊆ C be a Tnec.

1. If Z ∈ UC_1, fix a uniform memoryless strategy σ_1 ∈ Σ_1^UM such that for all s ∈ Z, Pr_s^{σ_1}({ ω | Inf(ω) = Z }) = 1 and Pr_s^{σ_1}(⟦UpClo(UC_1)⟧) = 1 (such a strategy exists by Lemma 3, since Z is an end component). In other words, Z is a closed recurrent set in the Markov chain G_{σ_1} and the objective of player 1 is satisfied with probability 1. Hence condition C1 of Theorem 2 is satisfied.

2. If Z ∉ UC_1, then since UC_1 is upward-closed, for all sets Z_1 ⊆ Z we have Z_1 ∉ UC_1. Hence for any play ω with ω ∈ ⟦Safe(Z)⟧ we have Inf(ω) ⊆ Z, and therefore ω ∉ ⟦UpClo(UC_1)⟧. Hence for all states s ∈ Z,

sup_{σ_1 ∈ Σ_1} Pr_s^{σ_1}(⟦UpClo(UC_1)⟧) = sup_{σ_1 ∈ Σ_1} Pr_s^{σ_1}(⟦Reach(R_1)⟧).

If the set of edges from Z to U ∪ T is empty, then for all strategies σ_1 we have Pr_s^{σ_1}(⟦UpClo(UC_1)⟧) = 0, and hence any uniform memoryless strategy can be fixed and condition C1 of Theorem 2 is satisfied. Otherwise the set of edges from Z to U ∪ T is non-empty; then, for ε > 0, consider an ε-Nash equilibrium σ_1 for the reachability objective ⟦Reach(R_1)⟧ satisfying the conditions of Theorem 1. Since Z is an end component, for all states s ∈ Z we have Supp(σ_1(s)) = Γ_1(s), and since the set of edges from Z to U ∪ T is non-empty, it follows that for all states s ∈ Z we have Pr_s^{σ_1}(⟦Reach(T)⟧) = 1. Thus condition C2 of Theorem 2 is satisfied.

We prove the following lemma, which will be useful for the analysis of the inductive case.

Lemma 9. Consider a player-i MDP G with an upward-closed objective ⟦UpClo(UC_i)⟧ for player i. Let σ_i ∈ Σ_i^M be a memoryless strategy such that for all s ∈ S we have Supp(σ_i(s)) = Γ_i(s). Let Z ⊆ S be a closed recurrent set in the Markov chain G_{σ_i}. Then σ_i is a Nash equilibrium for all states s ∈ Z.

Proof. The proof follows from the analysis of two cases.

1. If Z ∈ UC_i, then since Z is a closed recurrent set in G_{σ_i}, for all states s ∈ Z we have Pr_s^{σ_i}({ ω | Inf(ω) = Z }) = 1. Hence we have Pr_s^{σ_i}(⟦UpClo(UC_i)⟧) = 1 ≥ sup_{σ'_i ∈ Σ_i} Pr_s^{σ'_i}(⟦UpClo(UC_i)⟧). The result follows.

2. We now consider the case Z ∉ UC_i. Since for all s ∈ Z we have Supp(σ_i(s)) = Γ_i(s), it follows that for all strategies σ'_i ∈ Σ_i and all s ∈ Z we have Outcome(s, σ'_i) ⊆ Outcome(s, σ_i) ⊆ ⟦Safe(Z)⟧ (since Z is a closed recurrent set in G_{σ_i}). It follows that for all strategies σ'_i we have Pr_s^{σ'_i}(⟦Safe(Z)⟧) = 1. Hence for all strategies σ'_i and all states s ∈ Z we have Pr_s^{σ'_i}({ ω | Inf(ω) ⊆ Z }) = 1. Since Z ∉ UC_i, and UC_i is upward-closed, it follows that for all strategies σ'_i and all states s ∈ Z we have Pr_s^{σ'_i}(⟦UpClo(UC_i)⟧) = 0. Hence for all states s ∈ Z we have sup_{σ'_i ∈ Σ_i} Pr_s^{σ'_i}(⟦UpClo(UC_i)⟧) = 0 = Pr_s^{σ_i}(⟦UpClo(UC_i)⟧). Thus the result follows.

Inductive case. Given a game structure G, consider the MDP G̅: if there is no Tnec in C, then the result follows from an analysis similar to Case 1 of the base case. Otherwise consider a Tnec Z ⊆ C in G̅. If for every player i we have Z ∈ UC_i, then fix a uniform memoryless strategy σ ∈ Σ^UM such that for all s ∈ Z, Pr_s^σ({ ω | Inf(ω) = Z }) = 1 (such a strategy exists by Lemma 3, since Z is an end component in G̅). Hence for all i ∈ [n] we have Pr_s^σ(⟦UpClo(UC_i)⟧) = 1. That is, Z is a closed recurrent set in the Markov chain G_σ and the objective of each player is satisfied with probability 1 from all states s ∈ Z. Hence condition C1 of Theorem 2 is satisfied. Otherwise there exists i ∈ [n] such that Z ∉ UC_i; without loss of generality we assume that this holds for player 1, i.e., Z ∉ UC_1. In this case we prove Lemma 10, which yields Theorem 2.

Lemma 10. Consider an n-player concurrent game structure G with upward-closed objective ⟦UpClo(UC_i)⟧ for player i.
Let Z be a Tnec in G such that Z ∉ UC_1 and let Z_R = Reachable(Z, G). The following assertions hold:

1. If there exists σ_1 ∈ Σ_1^M such that for all s ∈ Z, Supp(σ_1(s)) = Γ_1(s), and condition C1 of Theorem 2 holds in G_{σ_1} ↾ Z_R, then condition C1 of Theorem 2 holds in G.
2. Otherwise, condition C2 of Theorem 2 holds in G.

Proof. Given a memoryless strategy σ_1, fixing the strategy σ_1 for player 1 yields an (n − 1)-player game structure, and by the inductive hypothesis either condition C1 or C2 of Theorem 2 holds in it.

• Case 1. If there is a memoryless strategy σ_1 ∈ Σ_1^M such that for all s ∈ Z, Supp(σ_1(s)) = Γ_1(s), and condition C1 of Theorem 2 holds in G_{σ_1}, then let σ_{−1} = (σ_2, σ_3, …, σ_n) be the memoryless Nash equilibrium and Z_1 ⊆ Z be the closed recurrent set in G_{σ_1 ∪ σ_{−1}} satisfying condition C1 of Theorem 2 in G_{σ_1}. Observe that (σ_1, Z_1) satisfies the conditions of Lemma 9 in the MDP G_{σ_{−1}}. Hence an application of Lemma 9 yields that σ_1 is a Nash equilibrium for all states s ∈ Z_1 in the MDP G_{σ_{−1}}. Since σ_{−1} is a Nash equilibrium for all states in Z_1 in G_{σ_1}, it follows that σ = σ_1 ∪ σ_{−1} and Z_1 satisfy condition C1 of Theorem 2.

• Case 2. For ε > 0, consider a memoryless ε-Nash equilibrium σ = (σ_1, σ_2, …, σ_n) in G with objective ⟦Reach(R_i)⟧ for player i, such that for all s ∈ S and all i ∈ [n] we have Supp(σ_i(s)) = Γ_i(s) (such an ε-Nash equilibrium exists by Theorem 1). We now prove the desired result by analyzing two sub-cases:

1. Suppose there exist j ∈ [n] and Z_j ⊆ Z such that Z_j ∈ UC_j and Z_j is an end component in G_{σ_{−j}}; then let σ'_j be a memoryless strategy for player j such that Z_j is a closed recurrent set of states in the Markov chain G_{σ_{−j} ∪ σ'_j}. Let σ' = σ_{−j} ∪ σ'_j. Since Z_j ∈ UC_j, it follows that for all states s ∈ Z_j we have Pr_s^{σ'}(⟦UpClo(UC_j)⟧) = 1, and hence player j has no incentive to deviate from σ'. Since for all σ_i with i ≠ j and all states s ∈ S we have Supp(σ_i(s)) = Γ_i(s), and Z_j is a closed recurrent set in G_{σ'}, it follows from Lemma 9 that for all i ≠ j the strategy σ_i is a Nash equilibrium in G_{σ'_{−i}}. Hence σ' is a Nash equilibrium for all states s ∈ Z_j in G, and condition C1 of Theorem 2 is satisfied.

2.
Hence it follows that if sub-case 1 fails, then for all i ∈ [n] and all end components Z_i ⊆ Z in G_{σ_{−i}}, we have Z_i ∉ UC_i. Hence for all i ∈ [n], all s ∈ Z, and all σ'_i ∈ Σ_i, Pr_s^{σ_{−i} ∪ σ'_i}(⟦UpClo(UC_i)⟧) = Pr_s^{σ_{−i} ∪ σ'_i}(⟦Reach(R_i)⟧). Since σ is an ε-Nash equilibrium with objectives ⟦Reach(R_i)⟧ for player i in G, it follows that σ is an ε-Nash equilibrium in G with objectives ⟦UpClo(UC_i)⟧ for player i. Moreover, if there were a closed recurrent set Z' ⊆ Z in the Markov chain G_σ, then sub-case 1 would have been true (follows

from Lemma 9). Hence if sub-case 1 fails, then there is no closed recurrent set Z' ⊆ Z in G_σ, and hence for all states s ∈ Z we have Pr_s^σ(⟦Reach(T)⟧) = 1. Hence condition C2 of Theorem 2 holds, and the desired result is established.

Inductive application of Theorem 2. Given a game structure G with upward-closed objective ⟦UpClo(UC_i)⟧ for player i, to prove the existence of an ε-Nash equilibrium for all states s ∈ S, for ε > 0, we apply Theorem 2 recursively. We convert the game structure G to a game structure G' as follows:

1. Transformation 1. If condition C1 of Theorem 2 holds, then let Z be the closed recurrent set that satisfies condition C1 of Theorem 2.
• In G' convert every state s ∈ Z to an absorbing state;
• if Z ∉ UC_i for player i, then the objective for player i in G' is UC_i;
• if Z ∈ UC_i for player i, then the objective for player i in G' is modified to include every set containing a state of Z, i.e., for all Q ⊆ S, if s ∈ Q for some s ∈ Z, then Q is included in UC_i.

Observe that the states in Z are converted to absorbing states and will be interpreted as states in T in G'.

2. Transformation 2. If condition C2 of Theorem 2 holds, then let σ* be an ε/|S|-Nash equilibrium from state s such that Pr_s^{σ*}(⟦Reach(T)⟧) = 1. The state s is converted as follows: for all i ∈ [n] the set of moves available to player i at s is reduced to a single move, i.e., Γ_i(s) = { a_i } for all i ∈ [n], and the transition function δ' in G' at s is defined as:

δ'(s, a_1, a_2, …, a_n)(t) = Pr_s^{σ*}(⟦Reach(t)⟧) if t ∈ T, and 0 otherwise.

Note that the state s can be interpreted as a state in U in G'. To obtain an ε-Nash equilibrium for all states s ∈ S in G, it suffices to obtain an ε-Nash equilibrium for all states in G'. Also observe that for all states in U ∪ T a Nash equilibrium exists by definition. Applying the transformations recursively on G', we proceed to convert every state to a state in U ∪ T, and the desired result follows. This yields Theorem 3.
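Transformation 2 can be sketched as follows (illustrative names, not from the paper): the state s keeps a single joint move whose outcome distribution is the reach-probability profile of σ* over the terminal states T:

```python
def transformation2_distribution(reach_prob, terminals):
    # delta'(s, a_1, ..., a_n): send s to each t in T with probability
    # Pr_s^{sigma*}(Reach(t)), and to every other state with probability 0.
    dist = {t: reach_prob.get(t, 0.0) for t in terminals}
    # Since Pr_s^{sigma*}(Reach(T)) = 1, the entries must sum to 1.
    assert abs(sum(dist.values()) - 1.0) < 1e-9
    return dist

dist = transformation2_distribution({"t1": 0.3, "t2": 0.7}, ["t1", "t2"])
```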

Theorem 3. For all n-player concurrent game structures G with upward-closed objective ⟦UpClo(UC_i)⟧ for player i, for all ε > 0 and all states s ∈ S, there exists a memoryless strategy profile σ* such that σ* is an ε-Nash equilibrium for state s.

Remark 1. It may be noted that upward-closed objectives are not closed under complementation, and hence Theorem 3 is not a generalization of the determinacy result for concurrent zero-sum games with an upward-closed objective for one player. For example, in concurrent zero-sum games with a Büchi objective for a player, ε-optimal strategies require infinite memory in general, but the complement of a Büchi objective is not upward-closed (recall Example 1). In contrast, we show the existence of memoryless ε-Nash equilibria for n-player concurrent games where each player has an upward-closed objective. For the special case of zero-sum turn-based games with an upward-closed objective for a player, the existence of memoryless optimal strategies was proved in [3]; note, however, that the memoryless strategies require randomization, as pure or deterministic strategies require memory even for turn-based games with generalized Büchi objectives.

5 Computational Complexity

In this section we present an algorithm to compute an ε-Nash equilibrium for n-player game structures with upward-closed objectives, for ε > 0. A key result for the algorithmic analysis is Lemma 11.

Lemma 11. Consider an n-player concurrent game structure G with upward-closed objective ⟦UpClo(UC_i)⟧ for player i. Let Z be a Tnec in G such that Z ∉ UC_n and let Z_R = Reachable(Z, G). The following assertion holds:

1. Suppose there exists σ_n ∈ Σ_n^M such that for all s ∈ Z, Supp(σ_n(s)) = Γ_n(s), and condition C1 of Theorem 2 holds in G_{σ_n} ↾ Z_R. Let σ*_n ∈ Σ_n^UM be such that for all s ∈ Z we have Supp(σ*_n(s)) = Γ_n(s) (i.e., σ*_n is a uniform memoryless strategy that plays all available moves at all states in Z).
Then condition C1 holds in G_{σ*_n} ↾ Z_R.

Proof. The result follows from the observation that for any strategy profile σ_{−n} ∈ Σ_{−n}^M, the closed recurrent sets of states in G_{σ_{−n} ∪ σ_n} and G_{σ_{−n} ∪ σ*_n} are the same.

Lemma 11 presents the basic principle to identify whether condition C1 of Theorem 2 holds in a game structure G with upward-closed objective

Algorithm 1 UpCloCondC1
Input: An n-player game structure G, with upward-closed objective ⟦UpClo(UC_i)⟧ for player i, for all i ∈ [n].
Output: Either (Z, σ) satisfying condition C1 of Theorem 2, or else (∅, ∅).
1. if n = 0,
   1.1 if there is a non-absorbing closed recurrent set Z in the Markov chain G, return (Z, ∅).
   1.2 else return (∅, ∅).
2. Z := ComputeMaximalEC(G̅) (i.e., Z is the set of maximal end components in the MDP G̅ of G).
3. if there is no Tnec in G, return (∅, ∅).
4. if there exists Z ∈ Z such that for all i ∈ [n], Z ∈ UC_i,
   4.1 return (Z, σ) such that σ ∈ Σ^UM and Z is a closed recurrent set in G_σ.
5. Let Z be a Tnec in G, and let Z_R = Reachable(Z, G).
6. else, without loss of generality, let Z ∉ UC_n.
   6.1 Let σ_n ∈ Σ_n^UM be such that for all states s ∈ Z_R, Supp(σ_n(s)) = Γ_n(s).
   6.2 (Z_1, σ) := UpCloCondC1(G_{σ_n} ↾ Z_R, n − 1, ⟦UpClo(UC_i ↾ Z_R)⟧ for i ∈ [n − 1]).
   6.3 if Z_1 = ∅, return (∅, ∅);
   6.4 else return (Z_1, σ_n ∪ σ).

⟦UpClo(UC_i)⟧ for player i. An informal description of the algorithm (Algorithm 1) is as follows. The algorithm takes as input a game structure G of n players with objectives ⟦UpClo(UC_i)⟧ for player i, and either returns (Z, σ) ∈ 2^S × Σ^M satisfying condition C1 of Theorem 2 or returns (∅, ∅). Let G̅ be the MDP of G, and let Z be the set of maximal end components in G̅ (computed in Step 2 of Algorithm 1). If there is no Tnec in G, then condition C1 of Theorem 2 fails and (∅, ∅) is returned (Step 3 of Algorithm 1). If there is a maximal end component Z ∈ Z such that for all i ∈ [n], Z ∈ UC_i, then fix a uniform memoryless strategy σ ∈ Σ^UM such that Z is a closed recurrent set in G_σ and return (Z, σ) (Step 4 of Algorithm 1). Else let Z be a Tnec and without loss of generality let Z ∉ UC_n. Let Z_R = Reachable(Z, G), and fix a strategy σ_n ∈ Σ_n^UM such that for all s ∈ Z_R, Supp(σ_n(s)) = Γ_n(s).
The (n − 1)-player game structure G_{σ_n} ↾ Z_R is solved by a recursive call (Step 6.2) and the result of the recursive call is returned. It follows from Lemma 11 that if Algorithm 1
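The observation behind Lemma 11 is that the closed recurrent sets of a finite Markov chain depend only on which edges have positive probability, i.e., only on the supports of the strategies. A sketch (illustrative, not from the paper) that computes them as the bottom strongly connected components of the support graph:

```python
from collections import deque

def reachable_from(s, edges):
    # States reachable from s (including s) in the support graph.
    seen, frontier = {s}, deque([s])
    while frontier:
        u = frontier.popleft()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen

def closed_recurrent_sets(edges):
    # Closed recurrent sets = bottom SCCs: mutually reachable classes
    # with no positive-probability edge leaving the class.
    nodes = set(edges) | {v for vs in edges.values() for v in vs}
    reach = {u: reachable_from(u, edges) for u in nodes}
    classes, seen = [], set()
    for u in nodes:
        if u in seen:
            continue
        scc = {v for v in reach[u] if u in reach[v]}
        seen |= scc
        classes.append(scc)
    return [c for c in classes
            if all(v in c for u in c for v in edges.get(u, ()))]

# Support graph of a small chain: {a, b} can escape to c; c is absorbing.
support = {"a": ["b"], "b": ["a", "c"], "c": ["c"]}
bottoms = closed_recurrent_sets(support)  # only {"c"} is closed and recurrent
```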

A Survey of Stochastic ω-regular Games

A Survey of Stochastic ω-regular Games A Survey of Stochastic ω-regular Games Krishnendu Chatterjee Thomas A. Henzinger EECS, University of California, Berkeley, USA Computer and Communication Sciences, EPFL, Switzerland {c krish,tah}@eecs.berkeley.edu

More information

A Survey of Partial-Observation Stochastic Parity Games

A Survey of Partial-Observation Stochastic Parity Games Noname manuscript No. (will be inserted by the editor) A Survey of Partial-Observation Stochastic Parity Games Krishnendu Chatterjee Laurent Doyen Thomas A. Henzinger the date of receipt and acceptance

More information

The Complexity of Stochastic Müller Games

The Complexity of Stochastic Müller Games The Complexity of Stochastic Müller Games Krishnendu Chatterjee Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2007-110 http://www.eecs.berkeley.edu/pubs/techrpts/2007/eecs-2007-110.html

More information

Finitary Winning in \omega-regular Games

Finitary Winning in \omega-regular Games Finitary Winning in \omega-regular Games Krishnendu Chatterjee Thomas A. Henzinger Florian Horn Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2007-120

More information

Report Documentation Page

Report Documentation Page On Nash Equilibria in Stochastic Games Krishnendu Chatterjee Marcin Jurdziński Rupak Majumdar Report No. UCB/CSD-3-28 October 2003 Computer Science Division (EECS) University of California Berkeley, California

More information

The Complexity of Ergodic Mean-payoff Games,

The Complexity of Ergodic Mean-payoff Games, The Complexity of Ergodic Mean-payoff Games, Krishnendu Chatterjee Rasmus Ibsen-Jensen Abstract We study two-player (zero-sum) concurrent mean-payoff games played on a finite-state graph. We focus on the

More information

Markov Decision Processes with Multiple Long-run Average Objectives

Markov Decision Processes with Multiple Long-run Average Objectives Markov Decision Processes with Multiple Long-run Average Objectives Krishnendu Chatterjee UC Berkeley c krish@eecs.berkeley.edu Abstract. We consider Markov decision processes (MDPs) with multiple long-run

More information

Finitary Winning in ω-regular Games

Finitary Winning in ω-regular Games Finitary Winning in ω-regular Games Krishnendu Chatterjee 1 and Thomas A. Henzinger 1,2 1 University of California, Berkeley, USA 2 EPFL, Switzerland {c krish,tah}@eecs.berkeley.edu Abstract. Games on

More information

Qualitative Concurrent Parity Games

Qualitative Concurrent Parity Games Qualitative Concurrent Parity Games Krishnendu Chatterjee Luca de Alfaro Thomas A Henzinger CE, University of California, Santa Cruz,USA EECS, University of California, Berkeley,USA Computer and Communication

More information

Randomness for Free. 1 Introduction. Krishnendu Chatterjee 1, Laurent Doyen 2, Hugo Gimbert 3, and Thomas A. Henzinger 1

Randomness for Free. 1 Introduction. Krishnendu Chatterjee 1, Laurent Doyen 2, Hugo Gimbert 3, and Thomas A. Henzinger 1 Randomness for Free Krishnendu Chatterjee 1, Laurent Doyen 2, Hugo Gimbert 3, and Thomas A. Henzinger 1 1 IST Austria (Institute of Science and Technology Austria) 2 LSV, ENS Cachan & CNRS, France 3 LaBri

More information

Infinite Games. Sumit Nain. 28 January Slides Credit: Barbara Jobstmann (CNRS/Verimag) Department of Computer Science Rice University

Infinite Games. Sumit Nain. 28 January Slides Credit: Barbara Jobstmann (CNRS/Verimag) Department of Computer Science Rice University Infinite Games Sumit Nain Department of Computer Science Rice University 28 January 2013 Slides Credit: Barbara Jobstmann (CNRS/Verimag) Motivation Abstract games are of fundamental importance in mathematics

More information

Value Iteration. 1 Introduction. Krishnendu Chatterjee 1 and Thomas A. Henzinger 1,2

Value Iteration. 1 Introduction. Krishnendu Chatterjee 1 and Thomas A. Henzinger 1,2 Value Iteration Krishnendu Chatterjee 1 and Thomas A. Henzinger 1,2 1 University of California, Berkeley 2 EPFL, Switzerland Abstract. We survey value iteration algorithms on graphs. Such algorithms can

More information

Generalized Parity Games

Generalized Parity Games Generalized Parity Games Krishnendu Chatterjee 1, Thomas A. Henzinger 1,2, and Nir Piterman 2 1 University of California, Berkeley, USA 2 EPFL, Switzerland c krish@eecs.berkeley.edu, {tah,nir.piterman}@epfl.ch

More information

Generalized Parity Games

Generalized Parity Games Generalized Parity Games Krishnendu Chatterjee Thomas A. Henzinger Nir Piterman Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2006-144

More information

Quantitative Languages

Quantitative Languages Quantitative Languages Krishnendu Chatterjee 1, Laurent Doyen 2, and Thomas A. Henzinger 2 1 University of California, Santa Cruz 2 EPFL, Lausanne, Switzerland Abstract. Quantitative generalizations of

More information

Lecture notes for Analysis of Algorithms : Markov decision processes

Lecture notes for Analysis of Algorithms : Markov decision processes Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with

More information

Complexity Bounds for Muller Games 1

Complexity Bounds for Muller Games 1 Complexity Bounds for Muller Games 1 Paul Hunter a, Anuj Dawar b a Oxford University Computing Laboratory, UK b University of Cambridge Computer Laboratory, UK Abstract We consider the complexity of infinite

More information

On Strong Determinacy of Countable Stochastic Games

On Strong Determinacy of Countable Stochastic Games On Strong Determinacy of Countable Stochastic Games Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Dominik Wojtczak University of Oxford, UK University of Edinburgh, UK University of Liverpool, UK arxiv:70405003v

More information

Theoretical Computer Science

Theoretical Computer Science Theoretical Computer Science 458 (2012) 49 60 Contents lists available at SciVerse ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Energy parity games Krishnendu

More information

Stochastic Games with Time The value Min strategies Max strategies Determinacy Finite-state games Cont.-time Markov chains

Stochastic Games with Time The value Min strategies Max strategies Determinacy Finite-state games Cont.-time Markov chains Games with Time Finite-state Masaryk University Brno GASICS 00 /39 Outline Finite-state stochastic processes. Games over event-driven stochastic processes. Strategies,, determinacy. Existing results for

More information

Synthesis weakness of standard approach. Rational Synthesis

Synthesis weakness of standard approach. Rational Synthesis 1 Synthesis weakness of standard approach Rational Synthesis 3 Overview Introduction to formal verification Reactive systems Verification Synthesis Introduction to Formal Verification of Reactive Systems

More information

Perfect-information Stochastic Parity Games

Perfect-information Stochastic Parity Games Perfect-information Stochastic Parity Games Wies law Zielonka LIAFA, case 7014 Université Paris 7 2, Place Jussieu 75251 Paris Cedex 05, France zielonka@liafa.jussieu.fr Abstract. We show that in perfect-information

More information

Mean-Payoff Games and the Max-Atom Problem

Mean-Payoff Games and the Max-Atom Problem Mean-Payoff Games and the Max-Atom Problem Albert Atserias Universitat Politècnica de Catalunya Barcelona, Spain Elitza Maneva Universitat Politècnica de Catalunya Barcelona, Spain February 3, 200 Abstract

More information

Quantitative Solution of Omega-Regular Games

Quantitative Solution of Omega-Regular Games Quantitative Solution of Omega-Regular Games Luca de Alfaro Department of EECS UC Berkeley Berkeley, CA 9470, USA dealfaro@eecs.berkeley.edu Rupak Majumdar Department of EECS UC Berkeley Berkeley, CA 9470,

More information

Chapter 9. Mixed Extensions. 9.1 Mixed strategies

Chapter 9. Mixed Extensions. 9.1 Mixed strategies Chapter 9 Mixed Extensions We now study a special case of infinite strategic games that are obtained in a canonic way from the finite games, by allowing mixed strategies. Below [0, 1] stands for the real

More information

Maths 212: Homework Solutions

Maths 212: Homework Solutions Maths 212: Homework Solutions 1. The definition of A ensures that x π for all x A, so π is an upper bound of A. To show it is the least upper bound, suppose x < π and consider two cases. If x < 1, then

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

Basic Game Theory. Kate Larson. January 7, University of Waterloo. Kate Larson. What is Game Theory? Normal Form Games. Computing Equilibria

Basic Game Theory. Kate Larson. January 7, University of Waterloo. Kate Larson. What is Game Theory? Normal Form Games. Computing Equilibria Basic Game Theory University of Waterloo January 7, 2013 Outline 1 2 3 What is game theory? The study of games! Bluffing in poker What move to make in chess How to play Rock-Scissors-Paper Also study of

More information

MORE ON CONTINUOUS FUNCTIONS AND SETS

MORE ON CONTINUOUS FUNCTIONS AND SETS Chapter 6 MORE ON CONTINUOUS FUNCTIONS AND SETS This chapter can be considered enrichment material containing also several more advanced topics and may be skipped in its entirety. You can proceed directly

More information

Quitting games - An Example

Quitting games - An Example Quitting games - An Example E. Solan 1 and N. Vieille 2 January 22, 2001 Abstract Quitting games are n-player sequential games in which, at any stage, each player has the choice between continuing and

More information

Near-Potential Games: Geometry and Dynamics






