Monomial strategies for concurrent reachability games and other stochastic games


Søren Kristoffer Stiil Frederiksen and Peter Bro Miltersen

Aarhus University

Abstract. We consider two-player zero-sum finite but infinite-horizon stochastic games with limiting average payoffs. We define a family of stationary strategies for Player I parameterized by $\varepsilon > 0$ to be monomial, if for each state $k$ and each action $j$ of Player I in state $k$ except possibly one action, we have that the probability of playing $j$ in $k$ is given by an expression of the form $c\varepsilon^d$ for some non-negative real number $c$ and some non-negative integer $d$. We show that for all games, there is a monomial family of stationary strategies that are $\varepsilon$-optimal among stationary strategies. A corollary is that all concurrent reachability games have a monomial family of $\varepsilon$-optimal strategies. This generalizes a classical result of de Alfaro, Henzinger and Kupferman who showed that this is the case for concurrent reachability games where all states have value 0 or 1.

1 Introduction

We consider two-player zero-sum finite but infinite-horizon stochastic games $G$ with state set $\{1, 2, \dots, N\}$ and set of actions $\{1, 2, \dots, m\}$ available to each of the two players in each state. The reward to Player I when Player I plays $i$ and Player II plays $j$ in state $k$ is denoted $a^k_{ij}$. Transition probabilities are denoted $p^{kl}_{ij}$. We assume stopping probabilities are 0, i.e., for all $k, i, j$ we have $\sum_l p^{kl}_{ij} = 1$. We are interested in games with limiting average (undiscounted) payoffs [8, 12], i.e., payoff $\liminf_{T \to \infty} \frac{1}{T}\sum_{t=0}^{T-1} r_t$ to Player I, where $r_t$ is the reward collected by Player I at stage $t$. A stationary strategy $x$ for a player in a stochastic game is a fixed (time-independent) assignment of probabilities to his actions, for each of the states of the game. We let $x^k_j$ denote the probability of playing action $j$ in state $k$ according to stationary strategy $x$. We denote the set of stationary strategies for Player I (II) by $S_I$ ($S_{II}$).
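Once both players fix stationary strategies, play is a finite Markov chain, so the limiting average payoff can be approximated numerically. The following Python sketch illustrates this under stated assumptions: the nested-list layout `p[k][i][j][l]` and `a[k][i][j]`, the function names, and the use of a long finite horizon in place of the limit are all choices made for this example and do not come from the paper.

```python
from typing import List, Tuple

def induced_chain(x, y, p, a) -> Tuple[List[List[float]], List[float]]:
    """Markov chain (P, r) obtained when both stationary strategies are fixed.

    x[k][i], y[k][j]: action probabilities in state k (hypothetical layout).
    p[k][i][j][l]:    probability of moving from k to l under actions (i, j).
    a[k][i][j]:       reward to Player I in state k under actions (i, j).
    """
    n = len(p)
    P = [[0.0] * n for _ in range(n)]
    r = [0.0] * n
    for k in range(n):
        for i, xi in enumerate(x[k]):
            for j, yj in enumerate(y[k]):
                w = xi * yj
                r[k] += w * a[k][i][j]
                for l in range(n):
                    P[k][l] += w * p[k][i][j][l]
    return P, r

def average_payoff(P, r, horizon: int = 10_000) -> List[float]:
    """Expected average reward over `horizon` stages, for each start state."""
    n = len(r)
    total = [0.0] * n  # total[k]: expected sum of rewards collected so far from k
    for _ in range(horizon):
        total = [r[k] + sum(P[k][l] * total[l] for l in range(n)) for k in range(n)]
    return [s / horizon for s in total]

# Toy game: state 1 is absorbing with reward 1; in state 0, Player I's
# action 1 moves to state 1 and action 0 stays put. Player II has one action.
p = [[[[1.0, 0.0]], [[0.0, 1.0]]],
     [[[0.0, 1.0]], [[0.0, 1.0]]]]
a = [[[0.0], [0.0]], [[1.0], [1.0]]]
x = [[0.5, 0.5], [1.0, 0.0]]
y = [[1.0], [1.0]]
P, r = induced_chain(x, y, p, a)
u = average_payoff(P, r)
```

In the toy game the chain is absorbed in the reward-1 state with probability 1, so both entries of `u` are close to 1.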
For a state $k$, the lower value in stationary strategies of $k$, denoted $v_k$, is defined as $\sup_{x \in S_I} \inf_{y \in S_{II}} u_k(x, y)$, where $u_k(x, y)$ is the expected limiting average payoff when stationary strategy $x$ of Player I is played against stationary strategy $y$ of Player II and play starts in state $k$.

The authors acknowledge support from The Danish National Research Foundation and The National Science Foundation of China under the grant for the Sino-Danish Center for the Theory of Interactive Computation and from the Center for research in the Foundations of Electronic Markets (CFEM), supported by the Danish Strategic Research Council.

Given $\varepsilon > 0$, a stationary strategy $x$ for Player I is called $\varepsilon$-optimal among stationary strategies if for all states $k$, we have $\inf_{y \in S_{II}} u_k(x, y) \ge v_k - \varepsilon$. Notice that when Player I has fixed his stationary strategy, Player II is just playing a Markov decision process, so he has an optimal positional response. The main purpose of the present paper is to prove that all stochastic games have a family of $\varepsilon$-optimal strategies among stationary strategies of a particular regular kind. We introduce the following definition.

Definition 1. A family of stationary strategies $(x_\varepsilon)_{0 < \varepsilon \le \varepsilon_0}$ for Player I in a stochastic game is called monomial if for all states $k$, and all actions $j$ available to Player I in state $k$ except possibly one action, we have that $x^k_{\varepsilon,j}$ is given by a monomial in $\varepsilon$, i.e., an expression of the form $c^k_j \varepsilon^{d^k_j}$, where $d^k_j$ is a non-negative integer and $c^k_j$ is a non-negative real number.

The exception made in the definition for some single action in each state is natural and necessary: The sum of probabilities assigned to the actions in each state must be 1, so without this exception, it is easy to see that a monomial family would have $d^k_j = 0$ for all $k, j$, i.e., it would be a single strategy rather than a family. Also note that when we specify a monomial family of strategies, we do not have to specify the probability assigned to the special action in each state, as it is simply the result of subtracting the sum of the probabilities assigned to the remaining actions from one. We can now state our main theorem:

Theorem 1. For any game $G$, there is an $\varepsilon_0 > 0$ and a monomial family of stationary strategies $(x_\varepsilon)_{0 < \varepsilon \le \varepsilon_0}$ for Player I, so that for each $\varepsilon \in (0, \varepsilon_0]$, we have that $x_\varepsilon$ is $\varepsilon$-optimal among stationary strategies.

Discussion of the main theorem. A monomial family of strategies can be naturally interpreted as a parameterized strategy where probabilities have well-defined "orders of magnitude", given by the degrees $d^k_j$.
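To make Definition 1 concrete, here is a minimal Python sketch of one state of a monomial family. It stores a pair (coefficient, degree) for every action except the special one and evaluates the resulting distribution at a given ε. The function name, the dictionary layout and the example numbers are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

def monomial_distribution(
    monomials: Dict[int, Tuple[float, int]],
    special: int,
    num_actions: int,
    eps: float,
) -> List[float]:
    """Distribution over actions in one state of a monomial family.

    monomials[j] = (c, d) means action j is played with probability c * eps**d.
    The special action receives whatever probability mass is left over.
    """
    probs = [0.0] * num_actions
    for j, (c, d) in monomials.items():
        if j == special:
            raise ValueError("the special action carries no monomial")
        if c < 0 or d < 0:
            raise ValueError("coefficients and degrees must be non-negative")
        probs[j] = c * eps ** d
    remainder = 1.0 - sum(probs)
    if remainder < 0:
        raise ValueError("eps is too large for this family")
    probs[special] = remainder
    return probs

# Action 0 has order of magnitude eps, action 1 has order eps**2,
# and action 2 is the special action absorbing the remaining mass.
dist = monomial_distribution({0: (1.0, 1), 1: (0.5, 2)}, special=2,
                             num_actions=3, eps=0.1)
```

The check on `remainder` mirrors the role of $\varepsilon_0$ in the definition: the family only describes probability distributions once ε is small enough.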
Our main theorem informally states that such "clean" strategies are sufficient for playing stochastic games well, at least if one is restricted to the use of stationary strategies. Our main motivation for the theorem is computational: A monomial family of strategies is a finite object, and our theorem makes it possible to ask the question of whether a family of $\varepsilon$-optimal strategies parameterized by $\varepsilon$ can be efficiently computed for a given game, as the result makes this question well-defined. The existence proof of the present paper is essentially non-constructive and provides no efficient algorithm (although it is possible to derive an inefficient algorithm using standard techniques), so we do not answer the question in this paper. It should also be noted that it is easy to give examples of games with rational rewards and transition probabilities where the coefficients $c^k_j$ cannot be rational numbers, so one has to worry about how to represent those. Fortunately, a straightforward application of the Tarski transfer principle yields that algebraic coefficients suffice, and such a number has a finite representation in the form of a univariate polynomial with rational coefficients and an isolating interval within which the number is the only root of the polynomial.

Our main theorem is particularly natural for classes of stochastic games that are guaranteed to have a value in stationary strategies, that is, games for which the lower value $\sup_{x \in S_I} \inf_{y \in S_{II}} u_k(x, y)$ and the upper value $\inf_{y \in S_{II}} \sup_{x \in S_I} u_k(x, y)$ coincide. A natural subclass of stochastic games with this property is Everett's recursive games [6]. In a recursive game, all non-zero rewards occur at absorbing states: states $k$ with only one action "1" available to each player and $p^{kk}_{1,1} = 1$ (terminal states). Everett presents several examples of families of $\varepsilon$-optimal strategies for natural recursive games and upon inspection, we note that they are monomial. An interesting subclass of recursive games widely studied in the computer science literature [5, 3, 11, 9] is the class of concurrent reachability games. In a concurrent reachability game, Player I is trying to reach a distinguished goal state and Player II is trying to prevent him from reaching this state. To view such a game as a recursive game, we simply interpret the goal state as an absorbing state $g$ with reward $r^g_{1,1} = 1$. Then, the lower value $v_k$ of a state $k$ is naturally interpreted as the optimal probability of reaching the goal state from $k$. De Alfaro, Henzinger and Kupferman [5] presented a polynomial time algorithm for deciding which states in a concurrent reachability game have value 1. Inspecting their proof of correctness, we see that it yields an explicit construction of a monomial family of $\varepsilon$-optimal strategies for Player I if the concurrent reachability game satisfies the very restrictive property that each state has value either 0 or 1. Note that even this case requires non-trivial strategies for near optimal play [11]. Also, their polynomial time algorithm can easily be adapted to output this strategy. It is interesting to note that in the computed strategy, all coefficients $c^k_j$ are either 0 or 1.

Discussion of the proof. Our proof relies heavily on semi-algebraic geometry. In this respect, the proof technique is much in line with classical works on stochastic games, in particular the work of Bewley and Kohlberg [1], and semi-algebraic geometry has seen several uses in stochastic games, see for example [13, 4, 15, 10].
Our proof can be outlined as follows. First, we show that it is possible in first order logic over the reals to uniquely define a particular distinguished $\varepsilon$-optimal strategy among stationary strategies, with $\varepsilon$ being a free variable in this definition. Then, standard theorems of semi-algebraic geometry imply that there is a family of $\varepsilon$-optimal strategies the probabilities of which can be described as Puiseux series in the parameter $\varepsilon > 0$. We then round these series to their most significant terms and finally massage them into monomials. To argue that $\varepsilon$-optimality is not lost in the process, we appeal to theorems upper bounding the sensitivity of the limiting average values of Markov chains to perturbations of their transition probabilities. These sensitivity theorems are due to Solan [14], building on work of Freidlin and Wentzell [7].

As our main theorem is very simply stated, one might speculate that it has an elementary proof, avoiding the use of semi-algebraic geometry. However, we are not aware of any such proof, even for the case of concurrent reachability games. It should be noted that the proof by De Alfaro, Henzinger and Kupferman is combinatorial in nature, and does not rely on semi-algebraic geometry, so at least for the simpler case considered by them, elementary arguments do exist.

Organization of paper. In Section 2 we will introduce the definitions, lemmas and previous results necessary for the proof. In Section 3 we prove a version of the main theorem with Puiseux series in place of monomials. In Section 4 we prove the actual main theorem.

2 Preliminaries

For $n \in \mathbb{N}$, let $[n]$ denote $\{1, \dots, n\}$. A Puiseux series $p$ over some indeterminate $T$ and field $F$ is an expression of the form $p = \sum_{i=K}^{\infty} a_i T^{i/M}$ where $K \in \mathbb{Z}$, $M \in \mathbb{N}$, and for all $i$, $a_i \in F$, with the expression satisfying that if $p \neq 0$ then $\exists i \in \mathbb{Z} : a_i \neq 0 \wedge \gcd(i, M) = 1$. Similarly, a function $p : \mathbb{R} \to \mathbb{R}$ is a Puiseux function on an interval $I$, if there exist $K \in \mathbb{Z}$, $M \in \mathbb{N}$, $a_i \in \mathbb{R}$ such that $p(\varepsilon) = \sum_{i=K}^{\infty} a_i \varepsilon^{i/M}$ for all $\varepsilon \in I$. In the context of this paper we will only look at Puiseux functions, and we will often call the function $p(\varepsilon)$ a Puiseux series. The order of a Puiseux series $p = \sum_{i=K}^{\infty} a_i T^{i/M}$ is the smallest integer $i$ such that $a_i \neq 0$, and we will write $\mathrm{ord}(p) = i$. If $p = 0$ then the order is defined to be $\infty$. The proofs of the following elementary lemmas on Puiseux series are easy and we omit them.

Lemma 1. If $q(\varepsilon) = \sum_{i=K}^{\infty} c_i \varepsilon^{i/M}$ is a Puiseux series that is convergent and bounded on some $(0, \varepsilon_0)$, then $c_i = 0$ for all $i < 0$. In other words, the order of $q$ is greater than or equal to 0.

Lemma 2. For any Puiseux series $q(\varepsilon) = \sum_{i=K}^{\infty} c_i \varepsilon^{i/M}$ with $\mathrm{ord}(q) = K \ge 0$ there exists an $\varepsilon_0$ such that $\mathrm{sign}(q(\varepsilon)) = \mathrm{sign}(c_K)$ for all $\varepsilon \in (0, \varepsilon_0)$.

A semi-algebraic set is a subset of real Euclidean space defined by a finite set of polynomial equalities and inequalities. The well-known Tarski-Seidenberg theorem states that any set that can be defined in the language of first order arithmetic is semi-algebraic. We will use this theorem throughout this paper to establish that sets are semi-algebraic. A semi-algebraic function is a real-valued function whose graph is a semi-algebraic set. We shall use the following lemma, establishing a close relationship between semi-algebraic functions and Puiseux functions.

Lemma 3. [13, Lemma 6.2] Let $a > 0$. If $f : (0, a) \to \mathbb{R}$ is a semi-algebraic function, then there exists an $0 < \varepsilon < a$ such that $f$ is a Puiseux function on $(0, \varepsilon)$.

For stochastic games, we use the notation introduced in the introduction.
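The notions of order and of sign near zero can be tried out on a finite Puiseux series, i.e., one with finitely many non-zero coefficients. The Python sketch below is only an illustration: the class name `FinitePuiseux` and its methods are hypothetical, and restricting to finitely many terms is a simplification of the definition above.

```python
import math
from typing import Dict, Union

class FinitePuiseux:
    """q(eps) = sum of coeffs[i] * eps**(i / M) over finitely many integers i."""

    def __init__(self, coeffs: Dict[int, float], M: int):
        # Drop zero coefficients so that min(self.coeffs) is the order.
        self.coeffs = {i: c for i, c in coeffs.items() if c != 0}
        self.M = M

    def order(self) -> Union[int, float]:
        """Smallest i with a non-zero coefficient; infinity for the zero series."""
        return min(self.coeffs) if self.coeffs else math.inf

    def __call__(self, eps: float) -> float:
        return sum(c * eps ** (i / self.M) for i, c in self.coeffs.items())

    def sign_near_zero(self) -> int:
        """Sign of q(eps) for all sufficiently small eps > 0 (cf. Lemma 2)."""
        if not self.coeffs:
            return 0
        lead = self.coeffs[min(self.coeffs)]
        return 1 if lead > 0 else -1

# q(eps) = -2 * eps**(1/2) + 5 * eps, so M = 2 and ord(q) = 1.
q = FinitePuiseux({1: -2.0, 2: 5.0}, M=2)
```

Here `q(0.01)` is negative, in line with the sign of the leading coefficient, while `q(1.0)` is positive: the sign statement holds only on a small enough interval to the right of 0.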
We shall use the following theorem, due to Solan, as an important lemma. The theorem applies to 1-player stochastic games (a.k.a. Markov decision processes). In a 1-player stochastic game, Player 2 has only a single action in each state. We therefore write $p^{kl}_i$ rather than $p^{kl}_{ij}$ for the transition probabilities.

Theorem 2. [14, Theorem 6] Let $G$ and $\tilde{G}$ be 1-player stochastic games with identical state set $\{1, 2, \dots, N\}$, transition probabilities $p^{kl}_i$, $\tilde{p}^{kl}_i$ and identical rewards. Let $a$ be an upper bound on the absolute value of all rewards. Let $v_k$, $\tilde{v}_k$ be the lower value in stationary strategies in each of the games. Let $\delta \in (0, \frac{1}{2N})$ satisfy

$$\max_{i,k,l} \left( \frac{p^{kl}_i}{\tilde{p}^{kl}_i}, \frac{\tilde{p}^{kl}_i}{p^{kl}_i} \right) - 1 \le \delta,$$

where $x/0 := \infty$ and $0/0 := 1$. Then, $|v_k - \tilde{v}_k| \le 4Na\delta$.
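The closeness measure in Theorem 2 is multiplicative rather than additive, which is easy to get wrong in practice. The Python sketch below computes it for two tables of transition probabilities, using the conventions for division by zero stated in the theorem. The layout `p[k][i][l]`, the function names, and the reading of the bound as $4N\delta$ times the reward bound are assumptions of this example; the helper does not check that $\delta$ is small enough for the theorem to apply.

```python
import math
from typing import List

def ratio(num: float, den: float) -> float:
    """num / den with the conventions x/0 := infinity and 0/0 := 1."""
    if den == 0.0:
        return 1.0 if num == 0.0 else math.inf
    return num / den

def multiplicative_distance(p: List[List[List[float]]],
                            p_tilde: List[List[List[float]]]) -> float:
    """Largest ratio between matching entries (in either direction), minus 1."""
    worst = 1.0
    for state, state_t in zip(p, p_tilde):
        for row, row_t in zip(state, state_t):
            for u, v in zip(row, row_t):
                worst = max(worst, ratio(u, v), ratio(v, u))
    return worst - 1.0

def value_gap_bound(delta: float, num_states: int, reward_bound: float) -> float:
    """The bound 4 * N * delta * (reward bound), as read from Theorem 2."""
    return 4 * num_states * delta * reward_bound

# Two states, one action each; only the first row is perturbed.
p = [[[0.5, 0.5]], [[0.0, 1.0]]]
p_tilde = [[[0.55, 0.45]], [[0.0, 1.0]]]
delta = multiplicative_distance(p, p_tilde)

# A transition that exists in one chain but not in the other is infinitely far.
delta_inf = multiplicative_distance([[[1.0, 0.0]]], [[[0.9, 0.1]]])
```

The second example shows why the proof in Section 4 keeps every probability that is identically zero at exactly zero: introducing or removing a transition makes the distance infinite.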

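3 Puiseux family of strategies

Lemma 4. For any game $G$ there exists an $\varepsilon_0$ and a family of stationary strategies $(x_\varepsilon)_{0 < \varepsilon \le \varepsilon_0}$ that are $\varepsilon$-optimal among stationary strategies, where for all states $k$ and all actions $j$, $x^k_{\varepsilon,j}$ is given by a Puiseux series in $\varepsilon$, that is, there is an expression $q^k_j(\varepsilon) = \sum_{i=K}^{\infty} c^k_{i,j} \varepsilon^{i/M}$ such that $x^k_{\varepsilon,j} = q^k_j(\varepsilon)$ for $\varepsilon \in (0, \varepsilon_0]$.

Proof. We want to create a first-order formula $\Phi^k_j(x, \varepsilon)$ for every state $k$ and every action $j$, which is true if and only if $x$ is the probability that Player I should play action $j$ in state $k$ in a specific strategy that is $\varepsilon$-optimal among stationary strategies. Then, since we have described the function by a first-order formula, it is semi-algebraic, and by Lemma 3 we get that there exists a Puiseux series that is equal to the function, thus completing the proof.

We are going to use several smaller first-order formulas to describe the formulas $\Phi^k_j(x, \varepsilon)$. To ease notation, during the proof, $k, l$ will only refer to states in the game, so they will be numbers $k, l \in [N]$; $i, j$ will refer to actions in a given state, so they will be numbers $i, j \in [m]$. We will also use the following vectors:

$$x := (x^k_i)_{k \in [N], i \in [m]}, \quad y := (y^k_i)_{k \in [N], i \in [m]}, \quad v := (v_k)_{k \in [N]}, \quad \nu := (\nu_k)_{k \in [N]}.$$

$x$ and $y$ will represent the strategies of Player I and Player II respectively, while $v$ and $\nu$ will be used to represent different values of stationary strategies of the game starting in each position. The first two formulas $\alpha(x)$, $\beta(y)$ describe that $x$ is a stationary strategy and $y$ is a stationary strategy respectively.

$$\alpha(x) := \bigwedge_{k \in [N], i \in [m]} \left[ x^k_i \ge 0 \right] \wedge \bigwedge_{k \in [N]} \Big[ \sum_{i \in [m]} x^k_i = 1 \Big]$$

$$\beta(y) := \bigwedge_{k \in [N], i \in [m]} \left[ y^k_i \ge 0 \right] \wedge \bigwedge_{k \in [N]} \Big[ \sum_{i \in [m]} y^k_i = 1 \Big]$$

Next we want to create a first-order formula $\Psi(v)$ which expresses that $v_k$ is the lower value in stationary strategies when the game starts in state $k$, that is, the quantity

$$\sup_{x \in S_I} \inf_{y \in S_{II}} E^k_{x,y}\Big[ \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} r_t \Big].$$

We can rewrite this quantity by using the following equations proved in [2, Theorem 5.2]:

$$\inf_{y \in S_{II}} E^k_{x,y}\Big[ \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} r_t \Big] = \inf_{y \in S_{II}} \liminf_{\lambda \to 0} E^k_{x,y}\Big[ \frac{\lambda}{1+\lambda} \sum_{t=0}^{\infty} (1+\lambda)^{-t} r_t \Big], \quad \forall x \in S_I.$$

So the suprema over the two sets are the same, and we can express the value by creating a formula which expresses that

$$v_k = \sup_{x \in S_I} \inf_{y \in S_{II}} \liminf_{\lambda \to 0} E^k_{x,y}\Big[ \frac{\lambda}{1+\lambda} \sum_{t=0}^{\infty} (1+\lambda)^{-t} r_t \Big], \quad \forall k \in [N].$$

A common way of rewriting these value equations is by expanding the expectations for one stage and substituting $v_l$ into the equations:

$$v_k = \sup_{x \in S_I} \inf_{y \in S_{II}} \liminf_{\lambda \to 0} \sum_{i,j \in [m]} x^k_i y^k_j \Big( \frac{\lambda}{1+\lambda} a^k_{ij} + \frac{1}{1+\lambda} \sum_{l \in [N]} p^{kl}_{ij} v_l \Big), \quad \forall k \in [N].$$

First notice that for any semi-algebraic sets $A$ and $B$, and any function $f : A \to B$ where there is a formula $\Pi(a, b)$ that is true if and only if $f(a) = b$, we can express the supremum $\sup_{a \in A} f(a)$ in the following way:

$$\Pi_{\sup}(s) := \left[ \forall a \in A\ \forall b \in B : \Pi(a, b) \Rightarrow s \ge b \right] \wedge \left[ \forall \varepsilon > 0\ \exists a \in A\ \exists b \in B : \Pi(a, b) \wedge s < b + \varepsilon \right].$$

Similar formulas can be created for the infimum and the limit, and since $\liminf_{\lambda \to 0} f(\lambda)$ is $\lim_{\lambda' \to 0} \inf_{0 < \lambda < \lambda'} f(\lambda)$, we only need to create a formula for the inner part:

$$\sum_{i,j \in [m]} x^k_i y^k_j \Big( \frac{\lambda}{1+\lambda} a^k_{ij} + \frac{1}{1+\lambda} \sum_{l \in [N]} p^{kl}_{ij} v_l \Big), \quad \forall k \in [N].$$

We then create the formula

$$\Pi(x, y, \nu, \lambda) := \bigwedge_{k \in [N]} \Big[ \nu_k = \sum_{i,j \in [m]} x^k_i y^k_j \Big( \frac{\lambda}{1+\lambda} a^k_{ij} + \frac{1}{1+\lambda} \sum_{l \in [N]} p^{kl}_{ij} \nu_l \Big) \Big].$$

Since $S_I = \{x \in \mathbb{R}^{Nm} \mid \alpha(x)\}$, and similarly for $S_{II}$ with $\beta$, we have that $S_I, S_{II}$ are semi-algebraic. Then from the previous argument we can create a formula $\Pi_{\sup}(v)$ for the lower value in stationary strategies. Also, by leaving out the last supremum, we can create a formula $\Xi(x, v)$ that is true if the value of Player I playing strategy $x$ is $v$. It is now straightforward to create a formula $\Upsilon(x, \varepsilon)$ that is true if and only if $x$ is a stationary strategy that is $\varepsilon$-optimal among stationary strategies:

$$\Upsilon(x, \varepsilon) := \exists v \in \mathbb{R}^N\ \exists \nu \in \mathbb{R}^N : \alpha(x) \wedge 0 < \varepsilon < 1 \wedge \Pi_{\sup}(v) \wedge \Xi(x, \nu) \wedge \bigwedge_{k \in [N]} \left[ \nu_k \ge v_k - \varepsilon \right].$$

Now to create $\Phi^k_j(x, \varepsilon)$, we need to select a unique strategy from the set of stationary strategies that are $\varepsilon$-optimal among stationary strategies. Let $\varphi : [N] \times [m] \to [Nm]$ be some bijection, which we will use to get an ordering on the pairs consisting of an action $i$ and a state $k$. Using this we can write a strategy as $(x_\iota)_{\iota \in [Nm]}$. We define formulas $P_\iota(x_1, \dots, x_\iota, \varepsilon)$ for $\iota \in [Nm]$ which are true if there exists a strategy that is $\varepsilon$-optimal among stationary strategies and whose first $\iota$ entries are $x_1, \dots, x_\iota$:

$$P_\iota(x_1, \dots, x_\iota, \varepsilon) := \exists x_{\iota+1}, \dots, x_{Nm} \in \mathbb{R} : \Upsilon(x_1, \dots, x_\iota, x_{\iota+1}, \dots, x_{Nm}, \varepsilon).$$

Notice that for each $\iota \in [Nm]$, if we assume that we have chosen $x_1, \dots, x_{\iota-1}$ such that $P_{\iota-1}(x_1, \dots, x_{\iota-1}, \varepsilon)$ is true, then the set $\{x \in \mathbb{R} \mid P_\iota(x_1, \dots, x_{\iota-1}, x, \varepsilon)\}$ is non-empty. From the Tarski-Seidenberg theorem the set is semi-algebraic, so it is defined by a finite set of polynomial equalities and inequalities. This implies that the set must consist of a finite set of intervals¹, so we can choose a unique strategy by taking the midpoint of the interval whose lower endpoint is closest to 0. Using this observation, we can now create a new series of formulas $\Psi_\iota(x_1, \dots, x_{\iota-1}, x, \varepsilon)$ for $\iota \in [Nm]$ which, given that $P_{\iota-1}(x_1, \dots, x_{\iota-1}, \varepsilon)$ is true, express that $x$ is the midpoint of the interval with the lower endpoint closest to 0 among the intervals in the set $\{x \in \mathbb{R} \mid P_\iota(x_1, \dots, x_{\iota-1}, x, \varepsilon)\}$:

$$\begin{aligned}
\Psi_\iota(x_1, \dots, x_{\iota-1}, x, \varepsilon) := {} & \exists x_{\iota+1}, \dots, x_{Nm}, a, b \in \mathbb{R} : a \le b \wedge x = \frac{a+b}{2} \wedge \Upsilon(x_1, \dots, x_{\iota-1}, x, x_{\iota+1}, \dots, x_{Nm}, \varepsilon) \\
& \wedge \left[ P_\iota(x_1, \dots, x_{\iota-1}, a, \varepsilon) \wedge \left( a < b \Rightarrow \forall y \in (a, b) : P_\iota(x_1, \dots, x_{\iota-1}, y, \varepsilon) \right) \right] \\
& \wedge \left[ \forall y < a : \neg P_\iota(x_1, \dots, x_{\iota-1}, y, \varepsilon) \right] \\
& \wedge \left[ \forall \varepsilon' > 0\ \exists y \in (b, b + \varepsilon') : \neg P_\iota(x_1, \dots, x_{\iota-1}, y, \varepsilon) \right].
\end{aligned}$$

Now to select our unique strategy we will do the following: For each $\varepsilon$, pick $x_1$ to be the midpoint of the interval with the lower endpoint closest to 0 among the intervals in the set $\{x \in \mathbb{R} \mid P_1(x, \varepsilon)\}$, next pick $x_2$ to be the midpoint of the interval with the lower endpoint closest to 0 among the intervals in the set $\{x \in \mathbb{R} \mid P_2(x_1, x, \varepsilon)\}$, and so on. We can then recursively define new formulas $\Omega_\iota(x_1, \dots, x_\iota, \varepsilon)$ for $\iota \in [Nm]$ that are true if and only if the unique choice of the first $\iota$ entries described by the above procedure is exactly $x_1, \dots, x_\iota$.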
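$$\Omega_1(x, \varepsilon) := \Psi_1(x, \varepsilon), \qquad \Omega_\iota(x_1, \dots, x_\iota, \varepsilon) := \Omega_{\iota-1}(x_1, \dots, x_{\iota-1}, \varepsilon) \wedge \Psi_\iota(x_1, \dots, x_\iota, \varepsilon).$$

Using this we can now immediately create the formulas $\Phi_\iota(x, \varepsilon)$ for $\iota \in [Nm]$ in the following way:

$$\Phi_\iota(x, \varepsilon) := \exists x_1, \dots, x_{Nm} \in \mathbb{R} : \Omega_{Nm}(x_1, \dots, x_{Nm}, \varepsilon) \wedge x = x_\iota.$$

Now we have obtained that each formula $\Phi_\iota(x, \varepsilon)$ implicitly defines a semi-algebraic function $x_\iota(\varepsilon)$, and due to Lemma 3 we have that there exist Puiseux series $q_\iota(\varepsilon)$ and numbers $\varepsilon_\iota$ such that $x_\iota(\varepsilon) = q_\iota(\varepsilon)$ for $\varepsilon \in (0, \varepsilon_\iota)$. Now take $\varepsilon_0 = \min_{\iota \in [Nm]} \varepsilon_\iota$ and we have the lemma.

¹ In this terminology we allow for the interval $[a, a]$ and identify it with the point $\{a\}$.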
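The selection rule used in the proof of Lemma 4 is a coordinate-by-coordinate greedy choice, and its shape can be sketched in a few lines of Python. In the proof the feasible set of each coordinate is defined by the formula $P_\iota$ and is not computed explicitly; in the sketch it is supplied by a hypothetical oracle `feasible`, which returns finitely many disjoint intervals given the coordinates chosen so far. All names below are assumptions of this example.

```python
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower endpoint, upper endpoint); [a, a] is a point

def pick_midpoint(intervals: Sequence[Interval]) -> float:
    """Midpoint of the interval whose lower endpoint is closest to 0."""
    lo, hi = min(intervals, key=lambda ab: abs(ab[0]))
    return (lo + hi) / 2

def select_strategy(
    num_coords: int,
    feasible: Callable[[List[float]], Sequence[Interval]],
) -> List[float]:
    """Fix the coordinates one at a time, always using the midpoint rule."""
    chosen: List[float] = []
    for _ in range(num_coords):
        chosen.append(pick_midpoint(feasible(chosen)))
    return chosen

def toy_feasible(prefix: List[float]) -> Sequence[Interval]:
    """Toy oracle: x1 lies in [0.2, 0.4] or [0.5, 0.7], and x2 = 1 - x1."""
    if not prefix:
        return [(0.5, 0.7), (0.2, 0.4)]
    return [(1.0 - prefix[0], 1.0 - prefix[0])]

strategy = select_strategy(2, toy_feasible)
```

With the toy oracle the procedure picks 0.3 for the first coordinate and then 0.7 for the second. The point of the rule is only that it is deterministic and first-order definable, so that the selected strategy becomes a function of ε.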

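4 Proof of main theorem

The proof will be carried out in two steps. First we will use the family of strategies obtained from Lemma 4 to create a family of strategies consisting only of the first term of the Puiseux series of the original family. Then, by using Theorem 2, we prove that their value cannot be much worse. Finally, we transform this family into a monomial family of strategies that are $\varepsilon$-optimal among stationary strategies.

Proof of Theorem 1. From Lemma 4 we know that there exists an $\varepsilon_1$ and a family of stationary strategies $(x_\varepsilon)_{0 < \varepsilon \le \varepsilon_1}$ that are $\varepsilon$-optimal among stationary strategies such that $x^k_{\varepsilon,j} = q^k_j(\varepsilon) = \sum_{i=K^k_j}^{\infty} c^k_{i,j} \varepsilon^{i/M^k_j}$ for $\varepsilon \in (0, \varepsilon_1]$ and for all states $k$ and actions $j$. Assume without loss of generality that $K^k_j = \mathrm{ord}(q^k_j)$, and observe that $K^k_j$ can be $\infty$ if the Puiseux series is identically 0. Also observe that since each $x^k_{\varepsilon,j}$ is a probability, it is non-negative and bounded, so by Lemma 1 we know that all $K^k_j \ge 0$. Now for each $k$, look at the set of Puiseux series $\{q^k_j(\varepsilon)\}_{j \in [m]}$ and let $j_k$ be an index so that $q^k_{j_k}(\varepsilon)$ is one of the Puiseux series in the set which has minimal order. Observe that $q^k_{j_k}(\varepsilon)$ has order 0. To see this, assume for contradiction that $\mathrm{ord}(q^k_j) > 0$ for all actions $j$. Then all of them behave as power series around 0, thus $q^k_j(\varepsilon) \to 0$ for $\varepsilon \to 0$, so the sum $\sum_{j \in [m]} q^k_j(\varepsilon) \to 0$ for $\varepsilon \to 0$, which contradicts that $\sum_{j \in [m]} q^k_j(\varepsilon) = 1$ for all $\varepsilon \in (0, \varepsilon_1]$.

Now look at any $k$ again. We want to approximate the family of strategies defined by $q^k_j(\varepsilon)$ by a new family of strategies defined by finite Puiseux series $\rho^k_j(\varepsilon)$ for $\varepsilon \in (0, \varepsilon_2]$, where $\varepsilon_2$ will be defined later. We define $\rho^k_j(\varepsilon)$ as a conditional function on the following sets:

$$\begin{aligned}
S_1 &= \{(k, j) \in [N] \times [m] \mid \mathrm{ord}(q^k_j) = \infty\} \\
S_2 &= \{(k, j) \in [N] \times [m] \mid \mathrm{ord}(q^k_j) \neq \infty \wedge j \neq j_k\} \\
S_3 &= \{(k, j) \in [N] \times [m] \mid j = j_k\}
\end{aligned}$$

Then $\rho^k_j(\varepsilon)$ is defined as follows:

$$\rho^k_j(\varepsilon) = \begin{cases}
0 & \text{if } (k, j) \in S_1 \\
c^k_{K^k_j, j}\, \varepsilon^{K^k_j / M^k_j} & \text{if } (k, j) \in S_2 \\
1 - \sum_{j' : (k, j') \in S_2} c^k_{K^k_{j'}, j'}\, \varepsilon^{K^k_{j'} / M^k_{j'}} & \text{if } (k, j) \in S_3
\end{cases}$$

So $(\rho^k_j(\varepsilon))_{k \in [N], j \in [m]}$ is the family of strategies derived from $q^k_j(\varepsilon)$, defined by $\rho^k_j(\varepsilon) \equiv 0$ when $q^k_j(\varepsilon) \equiv 0$, and otherwise equal to the first term in $q^k_j(\varepsilon)$, except for one action, $j_k$, whose probability is 1 minus the sum of the other probabilities, to ensure that $\rho^k(\varepsilon)$ is a probability distribution.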
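The passage from $q$ to $\rho$ can be illustrated for a single state with the Python sketch below. Each action's series is given as a dictionary from the integer index $i$ to a non-zero coefficient, with its own denominator $M$. The representation, the function names and the example series are assumptions made for this illustration; in particular the example uses finite series, whereas the series in the proof may be infinite.

```python
import math
from typing import Dict, List, Union

Series = Dict[int, float]  # index i -> non-zero coefficient of eps**(i / M)

def order(q: Series) -> Union[int, float]:
    """Smallest index with a non-zero coefficient; infinity for the zero series."""
    return min(q) if q else math.inf

def leading_term_strategy(series: List[Series], M: List[int], eps: float) -> List[float]:
    """Keep only the leading term of each action; one order-0 action takes the rest."""
    special = next((j for j, q in enumerate(series) if order(q) == 0), None)
    if special is None:
        raise ValueError("some action must have order 0")
    rho = [0.0] * len(series)
    for j, q in enumerate(series):
        if j == special or not q:
            continue  # the special action is filled in below; zero series stay 0
        K = order(q)
        rho[j] = q[K] * eps ** (K / M[j])
    rho[special] = 1.0 - sum(rho)
    return rho

# One state with four actions (the series sum to 1 for every eps):
#   q0 = 1 - eps**(1/2) - 2*eps,  q1 = eps**(1/2) + eps,  q2 = eps,  q3 = 0
series = [{0: 1.0, 1: -1.0, 2: -2.0}, {1: 1.0, 2: 1.0}, {2: 1.0}, {}]
rho = leading_term_strategy(series, M=[2, 2, 2, 1], eps=0.01)
```

At `eps = 0.01` the truncated strategy is approximately `[0.89, 0.1, 0.01, 0.0]`, while the original one is `[0.88, 0.11, 0.01, 0.0]`; the ratios between matching entries tend to 1 as ε decreases, which is the kind of closeness Theorem 2 requires.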
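Since $q^k_j(\varepsilon)$ is a probability, it is non-negative, so from Lemma 2 we have that for $(k, j) \in S_2$ the constant $c^k_{K^k_j, j}$ is positive. But then we can choose $\varepsilon_2$ to be small enough so that for all $(k, j) \in S_2$, $\rho^k_j(\varepsilon) \le 1$. So for each $k \in [N]$, $(\rho^k_j(\varepsilon))_{j \in [m]}$ becomes a probability distribution.

We will use Theorem 2 to prove that the value of the game where Player I fixes his strategy to $(\rho^k_j(\varepsilon))_{k \in [N], j \in [m]}$ is not much different from the value of the game where Player I fixes his strategy to $(q^k_j(\varepsilon))_{k \in [N], j \in [m]}$. To do this, we must show that for all states $k$ and all actions $j$, $\rho^k_j(\varepsilon)$ is multiplicatively close to $q^k_j(\varepsilon)$ in the sense of Theorem 2. We look at the three cases where a pair $(k, j)$ is in $S_1$, $S_2$ or $S_3$. For the case $(k, j) \in S_1$, $q^k_j(\varepsilon) = 0 = \rho^k_j(\varepsilon)$, so they are trivially close.

Now we look at an arbitrary $(k, j) \in S_2$. To simplify notation we omit $k, j$, and hence $\rho^k_j(\varepsilon)$ becomes $\rho(\varepsilon) = c_K \varepsilon^{K/M}$ and $q^k_j(\varepsilon)$ becomes $q(\varepsilon) = \sum_{i=K}^{\infty} c_i \varepsilon^{i/M}$. We want to show that there exists an $\varepsilon'$ for this $(k, j) \in S_2$ such that for all $\varepsilon \in (0, \varepsilon')$ we have

$$q(\varepsilon)\Big(1 - \varepsilon^{1/M}\Big(1 + \frac{|c_{K+1}|}{c_K}\Big)\Big) \le \rho(\varepsilon) \le q(\varepsilon)\Big(1 + \varepsilon^{1/M}\Big(1 + \frac{|c_{K+1}|}{c_K}\Big)\Big).$$

To see that this holds, we look at the difference between the two numbers:

$$q(\varepsilon)\Big(1 + \varepsilon^{1/M}\Big(1 + \frac{|c_{K+1}|}{c_K}\Big)\Big) - \rho(\varepsilon) = \sum_{i=K+1}^{\infty} c_i \varepsilon^{i/M} + \varepsilon^{1/M}\Big(1 + \frac{|c_{K+1}|}{c_K}\Big) \sum_{i=K}^{\infty} c_i \varepsilon^{i/M} = \varepsilon^{(K+1)/M}\Big(c_{K+1} + c_K\Big(1 + \frac{|c_{K+1}|}{c_K}\Big)\Big) + \dots$$

So the first term is positive, and Lemma 2 gives us that the series is positive on some interval $(0, \varepsilon'')$. Similarly we can show that $q(\varepsilon)\big(1 - \varepsilon^{1/M}\big(1 + \frac{|c_{K+1}|}{c_K}\big)\big) - \rho(\varepsilon)$ is negative on some interval $(0, \varepsilon''')$, so by letting $\varepsilon' = \min(\varepsilon'', \varepsilon''')$ we get the desired inequalities. Since this works for an arbitrary state and action where $(k, j) \in S_2$, we can create similar inequalities that work for all the states and actions in $S_2$ by defining

$$C := \max_{(k,j) \in S_2} \Big(1 + \frac{|c^k_{K^k_j+1, j}|}{c^k_{K^k_j, j}}\Big), \qquad Q := \min_{(k,j) \in S_2} \frac{1}{M^k_j}, \qquad \varepsilon_3 := \min_{(k,j) \in S_2} \varepsilon'_{k,j},$$

where $\varepsilon'_{k,j}$ is the $\varepsilon'$ obtained above for the pair $(k, j)$. This immediately implies that for all $(k, j) \in S_2$ we get the following multiplicative relation between $q^k_j(\varepsilon)$ and $\rho^k_j(\varepsilon)$:

$$q^k_j(\varepsilon)(1 - \varepsilon^Q C) \le \rho^k_j(\varepsilon) \le q^k_j(\varepsilon)(1 + \varepsilon^Q C), \quad \forall \varepsilon \in (0, \varepsilon_3).$$

Now we look at $(k, j) \in S_3$. From the observations on $S_2$ we have that for all $(l, i) \in S_2$, $\rho^l_i(\varepsilon) \ge q^l_i(\varepsilon)(1 - \varepsilon^Q C)$ for $\varepsilon \in (0, \varepsilon_3)$. Furthermore, since we know that $\sum_{i \in [m]} q^k_i(\varepsilon) = 1$, it holds that $q^k_j(\varepsilon) = 1 - \sum_{i : (k,i) \in S_2} q^k_i(\varepsilon)$. We use these observations to compute the following:

$$\begin{aligned}
\rho^k_j(\varepsilon) &= 1 - \sum_{i : (k,i) \in S_2} \rho^k_i(\varepsilon) \\
&\le 1 - (1 - \varepsilon^Q C) \sum_{i : (k,i) \in S_2} q^k_i(\varepsilon) \\
&= \varepsilon^Q C + (1 - \varepsilon^Q C) - (1 - \varepsilon^Q C) \sum_{i : (k,i) \in S_2} q^k_i(\varepsilon) \\
&= \varepsilon^Q C + (1 - \varepsilon^Q C)\Big(1 - \sum_{i : (k,i) \in S_2} q^k_i(\varepsilon)\Big) \\
&= \varepsilon^Q C + (1 - \varepsilon^Q C)\, q^k_j(\varepsilon) \\
&= q^k_j(\varepsilon)\Big(\frac{\varepsilon^Q C}{q^k_j(\varepsilon)} + 1 - \varepsilon^Q C\Big) \\
&\le q^k_j(\varepsilon)\Big(\frac{2\varepsilon^Q C}{c^k_{0,j}} + 1 - \varepsilon^Q C\Big).
\end{aligned}$$

The last inequality is conditioned on $\varepsilon$ being small enough. To see how small $\varepsilon$ must be, consider the Puiseux series $q^k_j(\varepsilon)$. First recall that for $(l, i) \in S_3$, $q^l_i(\varepsilon)$ has order 0, so the initial term is just a constant $c^k_{0,j}$, and from Lemma 2 we know that the constant is positive. Now look at the tail $\sum_{i=1}^{\infty} c^k_{i,j} \varepsilon^{i/M}$ without the first term. The tail is just a fractional power series, so it tends to 0 for $\varepsilon \to 0$. This means that for any constant $\kappa > 0$ there exists an $\varepsilon''$ such that for all $\varepsilon < \varepsilon''$ the absolute value of the tail is smaller than $\kappa$. By using the constant $c^k_{0,j}/2$, we get that $q^k_j(\varepsilon)$ must be larger than $c^k_{0,j}/2$ when $\varepsilon \in (0, \varepsilon'')$, giving us the inequality for $\varepsilon \in (0, \varepsilon'')$. If we then choose $\varepsilon' = \min(\varepsilon'', \varepsilon_3)$, then all the inequalities of the above computation hold. In the same way, we get that there exists an $\varepsilon'''$ such that

$$\rho^k_j(\varepsilon) \ge q^k_j(\varepsilon)\Big(1 - \frac{2\varepsilon^Q C}{c^k_{0,j}} + \varepsilon^Q C\Big), \quad \forall \varepsilon \in (0, \varepsilon''').$$

Now let $\hat{\varepsilon}_{k,j} = \min(\varepsilon', \varepsilon''')$, and let $\varepsilon_4 = \min_{(k,j) \in S_3} \hat{\varepsilon}_{k,j}$. We now get that both inequalities hold for all $(k, j) \in S_3$:

$$q^k_j(\varepsilon)\Big(1 - \frac{2\varepsilon^Q C}{c^k_{0,j}} + \varepsilon^Q C\Big) \le \rho^k_j(\varepsilon) \le q^k_j(\varepsilon)\Big(\frac{2\varepsilon^Q C}{c^k_{0,j}} + 1 - \varepsilon^Q C\Big), \quad \forall \varepsilon \in (0, \varepsilon_4).$$

Next, by defining $c := \min_{(k,j) \in S_3} c^k_{0,j}$, and inverting the signs of $\varepsilon^Q C$ in the above inequalities, the bound also covers $(k, j) \in S_2$. But then we have for all $\varepsilon \in (0, \varepsilon_4)$ and all $(k, j) \in S_1 \cup S_2 \cup S_3$ that

$$q^k_j(\varepsilon)\Big(1 - \frac{2\varepsilon^Q C}{c} - \varepsilon^Q C\Big) \le \rho^k_j(\varepsilon) \le q^k_j(\varepsilon)\Big(\frac{2\varepsilon^Q C}{c} + 1 + \varepsilon^Q C\Big).$$

Notice that $\frac{2\varepsilon^Q C}{c} + 1 + \varepsilon^Q C = 1 + \varepsilon^Q\big(\frac{2C}{c} + C\big)$. To ease the notation of the upcoming calculations we define

$$\mathrm{lw}(\varepsilon) := 1 - \varepsilon^Q\Big(\frac{2C}{c} + C\Big), \qquad \mathrm{up}(\varepsilon) := 1 + \varepsilon^Q\Big(\frac{2C}{c} + C\Big).$$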

Now we are ready to use Theorem 2 to bound the difference in the value of the two Markov decision processes that appear when we fix the strategy of Player I to be $(q^k_j(\varepsilon))_{k \in [N], j \in [m]}$ and $(\rho^k_j(\varepsilon))_{k \in [N], j \in [m]}$. Since the strategy $(q^k_j(\varepsilon))_{k \in [N], j \in [m]}$ is $\varepsilon$-optimal among stationary strategies, when Player I fixes his strategy to $(q^k_j(\varepsilon))_{k \in [N], j \in [m]}$, Player II cannot gain more than $-v_k + \varepsilon$. Similarly we can look at the game where Player I fixes his strategy to $(\rho^k_j(\varepsilon))_{k \in [N], j \in [m]}$. If we can prove that Player II cannot gain more than $-v_k + \gamma$ in this game, then we get that the strategy is $\gamma$-optimal among stationary strategies. Let $(p^{kl}_j(\varepsilon))_{k,l \in [N], j \in [m]}$ be the transition probabilities of the Markov decision process where we fix the strategy of Player I to be $(q^k_j(\varepsilon))_{k \in [N], j \in [m]}$. Similarly, let $(\tilde{p}^{kl}_j(\varepsilon))_{k,l \in [N], j \in [m]}$ be the transition probabilities when we fix Player I's strategy to be $(\rho^k_j(\varepsilon))_{k \in [N], j \in [m]}$. Then, we get:

$$\frac{\tilde{p}^{kl}_j(\varepsilon)}{p^{kl}_j(\varepsilon)} = \frac{\sum_{i \in \{1, \dots, m\}} \rho^k_i(\varepsilon)\, p^{kl}_{ij}}{\sum_{i \in \{1, \dots, m\}} q^k_i(\varepsilon)\, p^{kl}_{ij}}, \qquad \text{hence} \qquad \mathrm{lw}(\varepsilon) \le \frac{\tilde{p}^{kl}_j(\varepsilon)}{p^{kl}_j(\varepsilon)} \le \mathrm{up}(\varepsilon).$$

So we have an upper bound on the fraction $\tilde{p}^{kl}_j(\varepsilon) / p^{kl}_j(\varepsilon)$. To upper bound the fraction $p^{kl}_j(\varepsilon) / \tilde{p}^{kl}_j(\varepsilon)$, observe that when $\varepsilon$ is smaller than some $\varepsilon'$, then $\mathrm{lw}(\varepsilon), \mathrm{up}(\varepsilon) > 0$ and we get the following upper bound:

$$\mathrm{lw}(\varepsilon) \le \frac{\tilde{p}^{kl}_j(\varepsilon)}{p^{kl}_j(\varepsilon)} \quad \Rightarrow \quad \frac{p^{kl}_j(\varepsilon)}{\tilde{p}^{kl}_j(\varepsilon)} \le \frac{1}{\mathrm{lw}(\varepsilon)}.$$

Also, since $\mathrm{lw}(\varepsilon)\,\mathrm{up}(\varepsilon) \le 1$, we have $\mathrm{up}(\varepsilon) \le \frac{1}{\mathrm{lw}(\varepsilon)}$, so the fraction $\tilde{p}^{kl}_j(\varepsilon) / p^{kl}_j(\varepsilon)$ is also upper bounded by $\frac{1}{\mathrm{lw}(\varepsilon)}$. We now use Theorem 2 with $\delta := \frac{1}{\mathrm{lw}(\varepsilon)} - 1$, and $a$ as an upper bound on the absolute value of the rewards. Now look at any state $k$, and let $\gamma, \gamma' > 0$ be the numbers such that $-v_k + \gamma$ and $-v_k + \gamma'$ are the values for Player II of the games where Player I has fixed his strategy to $(q^k_j(\varepsilon))_{k \in [N], j \in [m]}$ and $(\rho^k_j(\varepsilon))_{k \in [N], j \in [m]}$ respectively. Then from Theorem 2 we get

$$|(-v_k + \gamma') - (-v_k + \gamma)| = |\gamma' - \gamma| \le 4N\delta a \quad \Rightarrow \quad \gamma' \le 4N\Big(\frac{1}{\mathrm{lw}(\varepsilon)} - 1\Big)a + \gamma \le 4N\, \frac{\varepsilon^Q\big(\frac{2C}{c} + C\big)}{1 - \varepsilon^Q\big(\frac{2C}{c} + C\big)}\, a + \varepsilon.$$

Since the denominator $1 - \varepsilon^Q\big(\frac{2C}{c} + C\big)$ tends to 1 for $\varepsilon \to 0$, for $\varepsilon$ smaller than some $\varepsilon''$ the denominator is always larger than $\frac{1}{2}$. So by letting $\varepsilon_0 := \min(\varepsilon'', \varepsilon_4)$ we get that $\gamma' \le 8Na\big(\frac{2C}{c} + C\big)\varepsilon^Q + \varepsilon$. This implies that $(\rho^k_j(\varepsilon))_{k \in [N], j \in [m]}$ is an $\big(8Na\big(\frac{2C}{c} + C\big)\varepsilon^Q + \varepsilon\big)$-optimal strategy among stationary strategies. Now consider the strategy defined by

$$\varphi^k_j(\varepsilon) := \rho^k_j\Big(\Big(\frac{\varepsilon}{8Na\big(\frac{2C}{c} + C\big)}\Big)^{1/Q}\Big),$$

which is then $(\varepsilon^{1/Q} + \varepsilon)$-optimal among stationary strategies. The strategy $(\varphi^k_j(\varepsilon^2))_{k \in [N], j \in [m]}$ is then an $\varepsilon$-optimal strategy, since $\varepsilon^{2/Q} + \varepsilon^2 \le \varepsilon$.

Finally notice that the strategy $(\varphi^k_j(\varepsilon^2))_{k \in [N], j \in [m]}$ is not a monomial family of strategies, since it could have fractional exponents. To fix this, we define $M := \mathrm{lcm}_{j \in \{1, \dots, m\}, k \in \{1, \dots, N\}} M^k_j$, and let $x^k_{\varepsilon,j} := \varphi^k_j(\varepsilon^{2QM})$. Then $(x_\varepsilon)_{0 < \varepsilon \le \varepsilon_0}$ is a monomial family of strategies, which is also $\varepsilon$-optimal among stationary strategies, because $\varepsilon^{2QM} \le \varepsilon^2$, hence adding the exponent $QM$ only improves the approximation of the value.

References

1. Truman Bewley and Elon Kohlberg. The asymptotic theory of stochastic games. Mathematics of Operations Research, 1(3).
2. Truman Bewley and Elon Kohlberg. On stochastic games with stationary optimal strategies. Mathematics of Operations Research, 3(2).
3. Krishnendu Chatterjee, Luca de Alfaro, and Thomas A. Henzinger. Strategy improvement for concurrent reachability games. In Third International Conference on the Quantitative Evaluation of Systems (QEST '06). IEEE Computer Society.
4. Krishnendu Chatterjee, Rupak Majumdar, and Thomas A. Henzinger. Stochastic limit-average games are in EXPTIME. International Journal of Game Theory, 37(2).
5. Luca de Alfaro, Thomas A. Henzinger, and Orna Kupferman. Concurrent reachability games. Theor. Comput. Sci., 386(3).
6. H. Everett. Recursive games. In H. W. Kuhn and A. W. Tucker, editors, Contributions to the Theory of Games Vol. III, volume 39 of Annals of Mathematical Studies. Princeton University Press.
7. M. Freidlin and A. Wentzell. Random Perturbations of Dynamical Systems. Springer Verlag.
8. D. Gillette. Stochastic games with zero stop probabilities. In Contributions to the Theory of Games III, volume 39 of Ann. Math. Studies. Princeton University Press.
9. Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, and Peter Bro Miltersen. The complexity of solving reachability games using value and strategy iteration. In CSR '11, volume 6651 of Lecture Notes in Computer Science, pages 77-90.
10. Kristoffer Arnsfelt Hansen, Michal Koucký, Niels Lauritzen, Peter Bro Miltersen, and Elias P. Tsigaridas. Exact algorithms for solving stochastic games. In Proceedings of the 43rd annual ACM symposium on Theory of computing. ACM.
11. Kristoffer Arnsfelt Hansen, Michal Koucký, and Peter Bro Miltersen. Winning concurrent reachability games requires doubly exponential patience. In 24th Annual IEEE Symposium on Logic in Computer Science (LICS '09). IEEE, 2009.
12. J. F. Mertens and A. Neyman. Stochastic games. International Journal of Game Theory, 10:53-66.
13. Emanuel Milman. The semi-algebraic theory of stochastic games. Mathematics of Operations Research, 27(2).
14. E. Solan. Continuity of the value of competitive Markov decision processes. Journal of Theoretical Probability, 16.
15. Eilon Solan and Nicolas Vieille. Computing uniformly optimal strategies in two-player stochastic games. Economic Theory, 42, 2010.
