arXiv: v1 [cs.LO] 17 Dec 2015
Non-Zero Sum Games for Reactive Synthesis

Romain Brenguier (1), Lorenzo Clemente (2), Paul Hunter (3), Guillermo A. Pérez (3), Mickael Randour (3), Jean-François Raskin (3), Ocan Sankur (4), Mathieu Sassolas (5)

(1) University of Oxford, UK
(2) University of Warsaw, Poland
(3) Université Libre de Bruxelles, Belgium
(4) CNRS, Irisa, France
(5) Université Paris-Est Créteil, LACL, France

Abstract. In this invited contribution [7], we summarize new solution concepts useful for the synthesis of reactive systems that we have introduced in several recent publications. These solution concepts are developed in the context of non-zero sum games played on graphs. They are part of the contributions obtained in the inVEST project funded by the European Research Council.

1 Introduction

Reactive systems are computer systems that maintain a continuous interaction with the environment in which they operate. They usually exhibit characteristics, such as real-time constraints, concurrency, and parallelism, that make them difficult to develop correctly. Therefore, formal techniques using mathematical models have been advocated to help with their systematic design. One well-studied formal technique is model checking [21,40,2], which compares a model of a system with its specification. The main objective of this technique is to find design errors early in the development cycle, so model checking can be considered a sophisticated debugging method. A scientifically more challenging goal, called synthesis, is to design algorithms that, given a specification for a reactive system and a model of its environment, directly synthesize a correct system, i.e., a system that enforces the specification no matter how the environment behaves. Synthesis can take different forms: from computing optimal values of parameters to the full-blown automatic synthesis of finite-state machine descriptions for components of the reactive system.
The main mathematical models proposed for the synthesis problem are based on two-player zero-sum games played on graphs, and the main solution concept for those games is the notion of winning strategy. This model encompasses the situation where a monolithic controller has to be designed to interact with a monolithic environment that is supposed to be fully antagonistic. In the sequel, we call the two players Eve and Adam: Eve plays the role of the system and Adam plays the role of the environment. A fully antagonistic environment is most often a bold abstraction of reality: the environment usually has its own goal which, in general, does not correspond to that of falsifying the specification of the reactive system. Nevertheless, this abstraction is popular because it is simple and sound: a winning strategy against an antagonistic environment is winning against any environment that pursues its own objective. However, this approach may fail to find a winning strategy even though solutions exist when the objective of the environment is taken into account, or it may produce sub-optimal solutions that are overcautious because they do not exploit the fact that the environment has its own objective.

Footnote: Work supported by the ERC starting grant inVEST (FP ). G.A. Pérez is supported by an F.R.S.-FNRS ASP fellowship; M. Randour is an F.R.S.-FNRS Postdoctoral Researcher.

In several recent works, we have introduced new solution concepts for the synthesis of reactive systems that take the objective of the environment into account or relax the fully adversarial assumption.

Assume-admissible synthesis. In [8], we proposed a novel notion of synthesis where the objective of the environment can be captured using the concept of admissible strategies [5,3,9]. For a player with objective φ, a strategy σ is dominated by σ' if σ' does as well as σ w.r.t. φ against all strategies of the other players, and better for some of those strategies. A strategy σ is admissible if it is not dominated by another strategy. We use this notion to derive a meaningful notion of synthesis for systems with several players, based on the following idea: only admissible strategies should be played by rational players, as dominated strategies are clearly sub-optimal options. In assume-admissible synthesis, we make the assumption that both players play admissible strategies. Then, when synthesizing a controller, we search for an admissible strategy that is winning against all admissible strategies of the environment.
Assume-admissible synthesis is sound: if both players choose strategies that are winning against admissible strategies of the other player, the objectives of both players will be satisfied.

Regret minimization: best-responses as yardstick. In [33], we studied strategies for Eve which minimize her regret. The regret of a strategy σ of Eve corresponds to the difference between the value she could have ensured if she had known the strategy of Adam in advance and the value Eve achieves by playing σ against Adam. Regret is not a novel concept in game theory (see, e.g., [31]), but it was not explicitly used for games played on graphs before [29]. The complexity of deciding whether a regret-minimizing strategy for Eve exists, and the memory requirements for such strategies, change depending on what type of behavior Adam can use. We have focused on three particular cases: arbitrary behaviors, positional behaviors, and time-dependent behaviors (otherwise known as oblivious environments). The latter class of regret games was shown in [33] to be related to the problem of determining whether an automaton has a certain form of determinism.

Games with an expected adversary. In [13,12,22], we combined the classical formalism of two-player zero-sum games (where the environment is considered to be completely antagonistic) with Markov decision processes (MDPs), a well-known model for decision-making inside a stochastic environment. The motivation is that one often has a good idea of the expected behavior (i.e., average-case) of the environment, represented as a stochastic model based on statistical data such as the frequency of requests for a computer server, the average traffic in a town, etc. In this case, it makes sense to look for strategies that will maximize the expected performance of the system. This is the traditional approach for MDPs, but it gives no guarantee at all if the environment deviates from its expected behavior, which can happen, for example, if events with small probability happen, or if the statistical data upon which probabilities are estimated is noisy or unreliable. On the other hand, two-player zero-sum games lead to strategies guaranteeing a worst-case performance no matter how the environment behaves; however, such strategies may be far from optimal against the expected behavior of the environment. With our new framework of beyond worst-case synthesis, we provide formal grounds to synthesize strategies that both guarantee some minimal performance against any adversary and provide a higher expected performance against a given expected behavior of the environment, thus essentially combining the two traditional standpoints from games and MDPs.

Structure of the paper. Section 2 recalls preliminaries about games played on graphs, while Section 3 recalls the classical setting of two-player zero-sum games. Section 4 summarizes our recent works on the use of the notion of admissibility for the synthesis of reactive systems. Section 5 summarizes our recent results on regret minimization for reactive synthesis. Section 6 summarizes our recent contributions on the synthesis of strategies that ensure good expected performance together with worst-case guarantees.
2 Preliminaries

We consider two-player turn-based games played on finite (weighted) graphs. Such games are played on so-called weighted game arenas.

Definition 1 (Weighted Game Arena). A (turn-based) two-player weighted game arena is a tuple $A = \langle S_\exists, S_\forall, E, s_{\mathsf{init}}, w \rangle$ where:
- $S_\exists$ is the finite set of states owned by Eve, $S_\forall$ is the finite set of states owned by Adam, $S_\exists \cap S_\forall = \emptyset$, and we denote $S_\exists \cup S_\forall$ by $S$;
- $E \subseteq S \times S$ is a set of edges; we say that $E$ is total whenever for all states $s \in S$ there exists $s' \in S$ such that $(s, s') \in E$ (we often assume this w.l.o.g.);
- $s_{\mathsf{init}} \in S$ is the initial state;
- $w : E \to \mathbb{Z}$ is the weight function that assigns an integer weight to each edge.

We do not always use the weight function defined on the edges of the weighted game arena, and in these cases we simply omit it.
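As an illustration only, the arena of Definition 1 can be encoded as a small Python data structure. The class name `WeightedArena`, its field names, and the representation of edges as pairs are assumptions made for this sketch; the checks mirror the conditions of the definition (disjoint state sets, edges inside $S$, weights defined on $E$) and the totality property.

```python
from dataclasses import dataclass, field


@dataclass
class WeightedArena:
    """Hypothetical encoding of <S_E, S_A, E, s_init, w> from Definition 1."""
    eve_states: frozenset        # S_exists: states owned by Eve
    adam_states: frozenset       # S_forall: states owned by Adam
    edges: frozenset             # E, as a set of (s, s') pairs
    init: object                 # s_init
    weight: dict = field(default_factory=dict)  # w : E -> Z, empty when unused

    @property
    def states(self):
        # S is the union of Eve's and Adam's states.
        return self.eve_states | self.adam_states

    def successors(self, s):
        return {t for (u, t) in self.edges if u == s}

    def is_well_formed(self):
        # Disjoint owners, edges inside S, s_init in S, weights (if any) defined exactly on E.
        return (not (self.eve_states & self.adam_states)
                and all(u in self.states and t in self.states for (u, t) in self.edges)
                and self.init in self.states
                and (not self.weight or set(self.weight) == set(self.edges)))

    def is_total(self):
        # E is total if every state has at least one E-successor.
        return all(self.successors(s) for s in self.states)


# A two-state example: Eve owns 1, Adam owns 2.
arena = WeightedArena(
    eve_states=frozenset({1}),
    adam_states=frozenset({2}),
    edges=frozenset({(1, 2), (2, 1), (2, 2)}),
    init=1,
    weight={(1, 2): 1, (2, 1): -1, (2, 2): 0},
)
```

Omitting `weight` corresponds to the case, mentioned above, where the weight function is not used.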
Unless otherwise stated, we consider for the rest of the paper a fixed weighted game arena $A = \langle S_\exists, S_\forall, E, s_{\mathsf{init}}, w \rangle$. A play in the arena $A$ is an infinite sequence of states $\pi = s_0 s_1 \dots s_n \dots$ such that for all $i \ge 0$, $(s_i, s_{i+1}) \in E$. A play $\pi = s_0 s_1 \dots$ is initial when $s_0 = s_{\mathsf{init}}$. We denote by $\mathsf{Plays}(A)$ the set of plays in the arena $A$, and by $\mathsf{InitPlays}(A)$ its subset of initial plays. A history $\rho$ is a finite sequence of states which is a prefix of a play in $A$. We denote by $\mathsf{Pref}(A)$ the set of histories in $A$, and the set of prefixes of initial plays is denoted by $\mathsf{InitPref}(A)$. Given an infinite sequence of states $\pi$ and two finite sequences of states $\rho_1, \rho_2$, we write $\rho_1 < \pi$ if $\rho_1$ is a prefix of $\pi$, and $\rho_2 \le \rho_1$ if $\rho_2$ is a prefix of $\rho_1$. For a history $\rho = s_0 s_1 \dots s_n$, we denote by $\mathsf{last}(\rho)$ its last state $s_n$; for all $i, j$ with $0 \le i \le j \le n$, we denote by $\rho(i..j)$ the infix of $\rho$ between positions $i$ and $j$, i.e., $\rho(i..j) = s_i s_{i+1} \dots s_j$; and we denote by $\rho(i)$ the state at position $i$ of $\rho$, i.e., $\rho(i) = s_i$. The set of histories that belong to Eve, noted $\mathsf{Pref}_\exists(A)$, is the subset of histories $\rho \in \mathsf{Pref}(A)$ such that $\mathsf{last}(\rho) \in S_\exists$, and the set of histories that belong to Adam, noted $\mathsf{Pref}_\forall(A)$, is the subset of histories $\rho \in \mathsf{Pref}(A)$ such that $\mathsf{last}(\rho) \in S_\forall$.

Definition 2 (Strategy). A strategy for Eve in the arena $A$ is a function $\sigma_\exists : \mathsf{Pref}_\exists(A) \to S$ such that for all $\rho \in \mathsf{Pref}_\exists(A)$, $(\mathsf{last}(\rho), \sigma_\exists(\rho)) \in E$, i.e., it assigns to each history of $A$ that belongs to Eve a state which is an $E$-successor of the last state of the history. Symmetrically, a strategy for Adam in the arena $A$ is a function $\sigma_\forall : \mathsf{Pref}_\forall(A) \to S$ such that for all $\rho \in \mathsf{Pref}_\forall(A)$, $(\mathsf{last}(\rho), \sigma_\forall(\rho)) \in E$. The set of strategies of Eve is denoted by $\Sigma_\exists$ and the set of strategies of Adam by $\Sigma_\forall$. When we want to refer to a strategy of either Eve or Adam, we write it $\sigma$. We denote by $\mathsf{Dom}(\sigma)$ the domain of definition of the strategy $\sigma$, i.e., for all strategies $\sigma$ of Eve (resp. Adam), $\mathsf{Dom}(\sigma) = \mathsf{Pref}_\exists(A)$ (resp. $\mathsf{Dom}(\sigma) = \mathsf{Pref}_\forall(A)$).

A play $\pi = s_0 s_1 \dots s_n \dots$ is compatible with a strategy $\sigma$ if for all $i \ge 0$ such that $\pi(0..i) \in \mathsf{Dom}(\sigma)$, we have $s_{i+1} = \sigma(\pi(0..i))$. We denote by $\mathsf{Outcome}_s(\sigma)$ the set of plays that start in $s$ and are compatible with the strategy $\sigma$. Given a strategy $\sigma_\exists$ for Eve, a strategy $\sigma_\forall$ for Adam, and a state $s$, we write $\mathsf{Outcome}_s(\sigma_\exists, \sigma_\forall)$ for the unique play that starts in $s$ and is compatible with both $\sigma_\exists$ and $\sigma_\forall$.

A strategy $\sigma$ is memoryless when for all histories $\rho_1, \rho_2 \in \mathsf{Dom}(\sigma)$, if $\mathsf{last}(\rho_1) = \mathsf{last}(\rho_2)$ then $\sigma(\rho_1) = \sigma(\rho_2)$, i.e., memoryless strategies only depend on the last state of the history, and so they can be seen as (partial) functions from $S$ to $S$. $\Sigma^{\mathsf{ML}}_\exists$ and $\Sigma^{\mathsf{ML}}_\forall$ denote the memoryless strategies of Eve and of Adam, respectively. A strategy $\sigma$ is finite-memory if there exists an equivalence relation ${\equiv} \subseteq \mathsf{Dom}(\sigma) \times \mathsf{Dom}(\sigma)$ of finite index such that for all histories $\rho_1, \rho_2$ with $\rho_1 \equiv \rho_2$, we have $\sigma(\rho_1) = \sigma(\rho_2)$. If the relation $\equiv$ is regular (computable by a finite-state machine), then the finite-memory strategy can be modeled by a finite-state transducer (a so-called Moore or Mealy machine). If a strategy is encoded by a machine with $m$ states, we say that it has memory size $m$.
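Because a memoryless strategy only looks at the last state, the outcome of two memoryless strategies on a finite arena is ultimately periodic: once a state repeats, the play repeats forever from its first occurrence. The sketch below assumes memoryless strategies are given as Python dicts from states to chosen successors (a hypothetical encoding, not one used in the paper) and computes $\mathsf{Outcome}_s(\sigma_\exists, \sigma_\forall)$ as a prefix followed by a cycle.

```python
def memoryless_outcome(eve_states, sigma_eve, sigma_adam, start):
    """Return (prefix, cycle) with Outcome_start(sigma_eve, sigma_adam) = prefix . cycle^omega.

    Memoryless strategies are encoded (hypothetically) as dicts mapping a state
    to its chosen successor; the owner of the current state picks the next one.
    """
    first_seen = {}    # state -> index of its first occurrence in the play
    play = []
    s = start
    while s not in first_seen:
        first_seen[s] = len(play)
        play.append(s)
        s = sigma_eve[s] if s in eve_states else sigma_adam[s]
    # s repeats: from its first occurrence on, the play cycles forever.
    k = first_seen[s]
    return play[:k], play[k:]


# Hypothetical profile: Eve owns 1 and 3, Adam owns 2 and 4.
prefix, cycle = memoryless_outcome({1, 3}, {1: 2, 3: 4}, {2: 3, 4: 3}, 1)
```

Here the outcome is the play $1\,2\,(3\,4)^\omega$.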
An objective $\mathrm{Win} \subseteq \mathsf{Plays}(A)$ is a subset of plays. A strategy $\sigma$ is winning from state $s$ if $\mathsf{Outcome}_s(\sigma) \subseteq \mathrm{Win}$. We will consider both qualitative objectives, which do not depend on the weight function of the game arena, and quantitative objectives, which do. Our qualitative objectives are defined with Muller conditions (which are a canonical way to represent all the regular sets of plays). Let $\pi \in S^\omega$ be a play; then $\mathsf{inf}(\pi) = \{ s \in S \mid \forall i \ge 0, \exists j \ge i : \pi(j) = s \}$ is the subset of elements of $S$ that occur infinitely often along $\pi$. A Muller objective for a game arena $A$ is defined by a set of sets of states $\mathcal{F}$ and contains the plays $\{ \pi \in S^\omega \mid \mathsf{inf}(\pi) \in \mathcal{F} \}$. We sometimes take the liberty to define such regular sets using standard LTL syntax. For a formal definition of the syntax and semantics of LTL, we refer the interested reader to [2].

We associate to each play $\pi$ an infinite sequence of weights, denoted $w(\pi)$ and defined as follows:
$$w(\pi) = w(\pi(0), \pi(1))\, w(\pi(1), \pi(2)) \cdots w(\pi(i), \pi(i+1)) \cdots \in \mathbb{Z}^\omega.$$
To assign a value $\mathsf{Val}(\pi)$ to a play $\pi$, we classically use functions like $\mathsf{sup}$ (the supremum of the weights along the play), $\mathsf{inf}$ (the infimum), $\mathsf{limsup}$ (the limit superior), $\mathsf{liminf}$ (the limit inferior), $\mathsf{MP}$ (the limit inferior of the running averages of the weights along the play), or $\mathsf{dsum}$ (the discounted sum of the weights along the play). We only define the mean-payoff measure formally. Let $\rho = s_0 s_1 \dots s_n$ be such that $(s_i, s_{i+1}) \in E$ for all $i$, $0 \le i < n$; the mean-payoff of this sequence of edges is
$$\mathsf{MP}(\rho) = \frac{1}{n} \sum_{i=0}^{n-1} w(\rho(i), \rho(i+1)),$$
i.e., the mean value of the weights of the edges traversed by the finite sequence $\rho$. The mean-payoff of an (infinite) play $\pi$, denoted $\mathsf{MP}(\pi)$, is a real number defined from the sequence of weights $w(\pi)$ as follows:
$$\mathsf{MP}(\pi) = \liminf_{n \to +\infty} \frac{1}{n} \sum_{i=0}^{n-1} w(\pi(i), \pi(i+1)),$$
i.e., $\mathsf{MP}(\pi)$ is the limit inferior of the running averages of weights seen along the play $\pi$.
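For ultimately periodic (lasso) plays of the form $\text{prefix} \cdot \text{cycle}^\omega$, these notions are easy to compute: $\mathsf{inf}(\pi)$ is the set of cycle states, and the running averages converge to the average weight of the cycle, so the prefix does not influence $\mathsf{MP}(\pi)$. The sketch below assumes this lasso encoding and a dict of edge weights; the function names and the sample weights are hypothetical.

```python
from fractions import Fraction


def inf_states(prefix, cycle):
    # On a lasso play prefix . cycle^omega, exactly the cycle states occur infinitely often.
    return frozenset(cycle)


def muller_holds(prefix, cycle, family):
    # Muller objective given by F: inf(pi) must be one of the sets in F.
    return inf_states(prefix, cycle) in {frozenset(f) for f in family}


def mean_payoff_history(rho, w):
    # MP(rho) = (1/n) * sum_{i=0}^{n-1} w(rho(i), rho(i+1)) for rho = s_0 ... s_n, n >= 1.
    n = len(rho) - 1
    return Fraction(sum(w[(rho[i], rho[i + 1])] for i in range(n)), n)


def mean_payoff_lasso(prefix, cycle, w):
    # The prefix does not affect the limit: MP(pi) is the average weight of the
    # cycle's edges, including the edge from the last cycle state back to the first.
    n = len(cycle)
    return Fraction(sum(w[(cycle[i], cycle[(i + 1) % n])] for i in range(n)), n)


# Hypothetical weights on the edges of the lasso play 1 (2 3)^omega.
w = {(1, 2): 10, (2, 3): 1, (3, 2): -3}
```

With these weights, the large weight 10 on the prefix edge shows up in finite running averages but has no effect on $\mathsf{MP}$ of the infinite play, which is $(1 - 3)/2 = -1$.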
Note that we need to use liminf because the running averages of weights may oscillate along $\pi$, so the limit is not guaranteed to exist. A game is defined by a (weighted) game arena and objectives for Eve and Adam.

Definition 3 (Game). A game $G = (A, \mathrm{Win}_\exists, \mathrm{Win}_\forall)$ is defined by a game arena $A$, an objective $\mathrm{Win}_\exists$ for Eve, and an objective $\mathrm{Win}_\forall$ for Adam.
3 Classical Zero-Sum Setting

In zero-sum games, players have antagonistic objectives.

Definition 4. A game $G = (A, \mathrm{Win}_\exists, \mathrm{Win}_\forall)$ is zero-sum if $\mathrm{Win}_\forall = \mathsf{Plays} \setminus \mathrm{Win}_\exists$.

Fig. 1. An example of a two-player game arena. Rounded positions belong to Eve, and squared positions belong to Adam.

Example 1. Let us consider the example of Fig. 1. Assume that the objective of Eve is to visit 4 infinitely often, i.e., $\mathrm{Win}_\exists = \{ \pi \in \mathsf{Plays} \mid \pi \models \Box\Diamond 4 \}$, and that the objective of Adam is $\mathrm{Win}_\forall = \mathsf{Plays} \setminus \mathrm{Win}_\exists$. Then Eve does not have a strategy that enforces a play in $\mathrm{Win}_\exists$ no matter what Adam plays. Indeed, if Adam always chooses to stay at state 2, there is no way for Eve to visit 4 at all.

As we already said, zero-sum games are usually a bold abstraction of reality. This is because the system to synthesize usually interacts with an environment that has its own objective, and this objective is not necessarily the complement of the objective of the system. A classical way to handle this situation (see, e.g., [4]) is to ask the system to win only when the environment meets its own objective.

Definition 5 (Win-Hyp). Let $G = (A, \mathrm{Win}_\exists, \mathrm{Win}_\forall)$ be a game. Eve achieves $\mathrm{Win}_\exists$ from state $s$ under hypothesis $\mathrm{Win}_\forall$ if there exists $\sigma_\exists$ such that $\mathsf{Outcome}_s(\sigma_\exists) \subseteq \mathrm{Win}_\exists \cup (\mathsf{Plays} \setminus \mathrm{Win}_\forall)$.

The synthesis rule in the definition above is called winning under hypothesis, Win-Hyp for short.

Example 2. Let us consider the example of Fig. 1 again, but now assume that the objective of Adam is to visit 3 infinitely often, i.e., $\mathrm{Win}_\forall = \{ \pi \in \mathsf{Plays} \mid \pi \models \Box\Diamond 3 \}$. In this case, the strategy $1 \to 2$ and $3 \to 4$ for Eve is winning for the objective
$$\mathrm{Win\text{-}Hyp}(\Box\Diamond 4, \Box\Diamond 3) = \{ \pi \in \mathsf{Plays} \mid \pi \models \Box\Diamond 4 \} \cup \{ \pi \in \mathsf{Plays} \mid \pi \not\models \Box\Diamond 3 \},$$
i.e., under the hypothesis that the outcome satisfies the objective of Adam. Unfortunately, there are strategies of Eve which are winning for the rule Win-Hyp but which are not desirable. As an example, consider the strategy that in 1 chooses to go to 5.
In that case, the objective of Adam is unmet, and so this strategy of Eve is winning for $\mathrm{Win\text{-}Hyp}(\Box\Diamond 4, \Box\Diamond 3)$; but clearly such a strategy is not interesting, as it excludes the possibility of meeting the objective of Eve.
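The weakness of Win-Hyp can be seen by evaluating the rule on individual lasso plays. In this sketch, plays are assumed to be given as a prefix and a cycle, a Büchi objective ("visit the target infinitely often") holds exactly when the target lies on the cycle, and the sample plays are hypothetical shapes rather than a faithful encoding of Fig. 1.

```python
def buchi(target):
    # "Visit target infinitely often" (LTL: G F target) on a lasso play
    # prefix . cycle^omega holds iff target occurs on the cycle.
    return lambda prefix, cycle: target in cycle


def win_hyp(win_eve, win_adam):
    # Win-Hyp accepts a play if it is in Win_E or not in Win_A.
    return lambda prefix, cycle: win_eve(prefix, cycle) or not win_adam(prefix, cycle)


hyp = win_hyp(buchi(4), buchi(3))

# Hypothetical lasso plays (prefix, cycle):
both_met = ([1, 2], [3, 4])    # 3 and 4 recur: both objectives hold
eve_gave_up = ([1], [5])       # neither 3 nor 4 recurs
only_adam = ([1, 2], [3])      # 3 recurs, 4 does not
```

Every play in which Adam's objective fails passes the Win-Hyp check, whatever happens to Eve's objective; this is the loophole used by the strategy that goes from 1 to 5.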
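4 Assume Admissible Synthesis

To define the notion of admissible strategy, we first need to define when a strategy $\sigma$ is dominated by a strategy $\sigma'$. We define the notion for Eve; the definition for Adam is symmetric. Let $\sigma$ and $\sigma'$ be two strategies of Eve in the game arena $A$. We say that $\sigma'$ dominates $\sigma$ if the following two conditions hold:
1. $\forall \sigma_\forall \in \Sigma_\forall : \mathsf{Outcome}_{s_{\mathsf{init}}}(\sigma, \sigma_\forall) \in \mathrm{Win}_\exists \Rightarrow \mathsf{Outcome}_{s_{\mathsf{init}}}(\sigma', \sigma_\forall) \in \mathrm{Win}_\exists$
2. $\exists \sigma_\forall \in \Sigma_\forall : \mathsf{Outcome}_{s_{\mathsf{init}}}(\sigma, \sigma_\forall) \notin \mathrm{Win}_\exists \wedge \mathsf{Outcome}_{s_{\mathsf{init}}}(\sigma', \sigma_\forall) \in \mathrm{Win}_\exists$

So a strategy $\sigma$ is dominated by $\sigma'$ if $\sigma'$ does as well as $\sigma$ against any strategy of Adam (condition 1), and there exists a strategy of Adam against which $\sigma'$ does better than $\sigma$ (condition 2).

Definition 6 (Admissible Strategy). A strategy is admissible if there does not exist a strategy that dominates it. Let $G = (A, \mathrm{Win}_\exists, \mathrm{Win}_\forall)$ be a game; the set of admissible strategies for Eve is denoted $\mathrm{Adm}_\exists$, and the set of admissible strategies for Adam is denoted $\mathrm{Adm}_\forall$.

Clearly, a rational player should not play a dominated strategy, as there always exists some strategy that behaves strictly better than the dominated strategy. So, a rational player only plays admissible strategies.

Example 3. Let us consider again the example of Fig. 1 with $\mathrm{Win}_\exists = \{ \pi \in \mathsf{Plays} \mid \pi \models \Box\Diamond 4 \}$ and $\mathrm{Win}_\forall = \{ \pi \in \mathsf{Plays} \mid \pi \models \Box\Diamond 3 \}$. We claim that the strategy $\sigma$ that plays $1 \to 5$ is not admissible in $A$ from state 1. This is because the strategy $\sigma'$ that plays $1 \to 2$ and $4 \to 3$ dominates this strategy. Indeed, while $\sigma$ is always losing for the objective of Eve, the strategy $\sigma'$ wins for this objective whenever Adam eventually plays $2 \to 3$.

Definition 7 (AA). Let $G = (A, \mathrm{Win}_\exists, \mathrm{Win}_\forall)$ be a game. Eve achieves $\mathrm{Win}_\exists$ from $s$ under the hypothesis that Adam plays admissible strategies if
$$\exists \sigma_\exists \in \mathrm{Adm}_\exists \cdot \forall \sigma_\forall \in \mathrm{Adm}_\forall : \mathsf{Outcome}_s(\sigma_\exists, \sigma_\forall) \in \mathrm{Win}_\exists.$$

Example 4. Let us consider again the example of Fig. 1 with $\mathrm{Win}_\exists = \{ \pi \in \mathsf{Plays} \mid \pi \models \Box\Diamond 4 \}$ and $\mathrm{Win}_\forall = \{ \pi \in \mathsf{Plays} \mid \pi \models \Box\Diamond 3 \}$. We claim that the strategy $\sigma_\exists$ of Eve that plays $1 \to 2$ and $4 \to 3$ is admissible (see the previous example) and winning against all the admissible strategies of Adam.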
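This is a consequence of the fact that the strategy of Adam that always plays $2 \to 2$, which is the only counter-strategy of Adam against $\sigma_\exists$, is not admissible. Indeed, this strategy falsifies $\mathrm{Win}_\forall$, while a strategy that always chooses $2 \to 3$ enforces the objective of Adam.

Theorem 1 ([3,9,8]). For all games $G = (A, \mathrm{Win}_\exists, \mathrm{Win}_\forall)$, if $\mathrm{Win}_\exists$ and $\mathrm{Win}_\forall$ are omega-regular sets of plays, then $\mathrm{Adm}_\exists$ and $\mathrm{Adm}_\forall$ are both non-empty sets. The problem of deciding whether a game $G = (A, \mathrm{Win}_\exists, \mathrm{Win}_\forall)$, where $\mathrm{Win}_\exists$ and $\mathrm{Win}_\forall$ are omega-regular sets of plays expressed as Muller objectives, satisfies
$$\exists \sigma_\exists \in \mathrm{Adm}_\exists \cdot \forall \sigma_\forall \in \mathrm{Adm}_\forall : \mathsf{Outcome}_s(\sigma_\exists, \sigma_\forall) \in \mathrm{Win}_\exists$$
is PSpace-complete.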
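To make dominance, admissibility, and the AA rule concrete, the sketch below works on a finite abstraction in which each player has finitely many strategies and a function reports whether an outcome satisfies a player's objective. This abstraction is an assumption for illustration: in arenas, strategy sets are infinite, and the algorithms behind Theorem 1 do not enumerate strategies. The toy strategy names loosely mirror Examples 3 and 4 (Eve either goes to 5 or to 2; Adam either stays at 2 forever or eventually leaves it) and are hypothetical.

```python
def dominates(better, worse, opponents, wins):
    """True iff `better` dominates `worse` for the player whose success is wins(own, opp)."""
    # Condition 1: whenever `worse` wins against an opponent strategy, so does `better`.
    as_good = all(wins(better, t) for t in opponents if wins(worse, t))
    # Condition 2: some opponent strategy is beaten by `better` but not by `worse`.
    strictly_better = any(wins(better, t) and not wins(worse, t) for t in opponents)
    return as_good and strictly_better


def admissible(strategies, opponents, wins):
    return [s for s in strategies
            if not any(dominates(d, s, opponents, wins) for d in strategies)]


def assume_admissible(eve, adam, eve_wins, adam_wins):
    """Eve's admissible strategies that win against every admissible strategy of Adam."""
    adm_eve = admissible(eve, adam, eve_wins)
    # Adam's admissibility is judged w.r.t. his own objective, with his strategy first.
    adm_adam = admissible(adam, eve, lambda a, e: adam_wins(e, a))
    return [e for e in adm_eve if all(eve_wins(e, a) for a in adm_adam)]


# Hypothetical toy mirroring Examples 3 and 4: both objectives hold only
# when Eve goes to 2 and Adam eventually leaves 2.
EVE = ["go5", "go2"]
ADAM = ["stay", "leave"]


def both_win(e, a):
    return (e, a) == ("go2", "leave")


aa_witnesses = assume_admissible(EVE, ADAM, both_win, both_win)   # ["go2"]
```

In this toy, "go2" is not winning in the classical zero-sum sense (it loses against "stay"), but "stay" is dominated for Adam, so "go2" satisfies the AA rule.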
Additional Results. The assume-admissible setting we present here relies on procedures for the iterative elimination of dominated strategies for multiple players, which were studied in [3] for games played on graphs. In this context, dominated strategies are repeatedly eliminated for each player: with respect to the new, smaller set of strategies of its opponents, new strategies may become dominated and are therefore eliminated, and so on until the process stabilizes. In [9], we studied the algorithmic complexity of this problem and proved that for games with Muller objectives, deciding whether all outcomes compatible with iteratively admissible strategy profiles satisfy an omega-regular objective defined by a Muller condition is PSpace-complete, and in UP ∩ coUP for the special case of Büchi objectives. The assume-admissible rule introduced in [8] is also defined for multiple players and corresponds, roughly, to the first iteration of the elimination procedure. We additionally prove that if players have Büchi objectives, then the rule can be decided in polynomial time. One advantage of the assume-admissible rule is the rectangularity of the solution set: the set of strategy profiles that witness the rule can be written as a product of sets of strategies, one for each player. In particular, this means that a strategy witnessing the rule can be chosen separately for each player. Thus, the rule is robust in the sense that the players do not need to agree on a strategy profile, but only on the admissibility assumption on each other. In addition, we show in [8] that the rule is amenable to abstraction techniques: state-space abstractions can be used to check a sufficient condition for assume-admissibility while only doing computations on the abstract state space.

Related Works. The rule winning under hypothesis (Win-Hyp) and its weaknesses are discussed in [4]. We have illustrated the limitations of this rule in Example 2.
There are related works in the literature which propose concepts to model systems composed of several parts, each having their own objectives. The solutions that are proposed are based on n-player non-zero sum games. This is the case both for assume-guarantee synthesis [18] (AG) and for rational synthesis [30] (RS). For the case of two-player games, AG is based on the concept of secure equilibria [19] (SE), a refinement of Nash equilibria [38] (NE). In SE, the objectives of the players are lexicographic: each player first tries to force his own objective, and then tries to falsify the objectives of the other players. It was shown in [19] that SE are the NE that form enforceable contracts between the two players. When the AG rule is extended to several players, as in [18], it no longer corresponds to secure equilibria. We gave a direct algorithm for multiple players in [8]. The difference between AG and SE is that AG strategies have to be resilient to deviations of all the other players, while SE profiles have to be resilient to deviations by only one player. A variant of the rule AG, called Doomsday equilibria, has been proposed in [15]. We have also studied quantitative extensions of the notion of secure equilibria in [14].

In the context of infinite games played on graphs, one well-known limitation of NE is the existence of non-credible threats. Refinements of the notion of NE, like sub-game perfect equilibria (SPE), have been proposed to overcome this limitation. SPE for games played on graphs have been studied in, e.g., [43,10]. Admissibility does not suffer from this limitation.

In RS, the system is assumed to be monolithic and the environment is made of several components that are only partially controllable. In RS, we search for a profile of strategies in which the system forces its objective and the players that model the environment are given an acceptable strategy profile, from which it is assumed that they will not deviate. Acceptability can be formalized by any solution concept, e.g., by NE, dominant strategies, or sub-game perfect equilibria. This is the existential flavor of RS. More recently, Kupferman et al. have proposed in [35] a universal variant of this rule. In this variant, we search for a strategy of the system such that in all strategy profiles that extend this strategy for the system and that are NE, the outcome of the game satisfies the specification of the system.

In [26], Faella studies several alternatives to the notion of winning strategy, including the notion of admissible strategy. His work is for two players, but only the objective of one player is taken into account; the objective of the other player is left unspecified. In that work, the notion of admissibility is used to define a notion of best-effort in synthesis.

The notion of admissible strategy is definable in strategy logics [20,37], and decision problems related to the assume-admissible rule can be reduced to satisfiability queries in such logics. This reduction does not lead to worst-case optimal algorithms; we presented worst-case optimal algorithms in [8] based on our previous work [9].

5 Regret Minimization

In the previous section, we have shown how the notion of admissible strategy can be used to relax the classical worst-case hypothesis made on the environment. In this section, we review another way to relax this worst-case hypothesis.
The idea is simple and intuitive. When looking for a strategy, instead of trying to find a strategy which is worst-case optimal, we search for a strategy that takes best-responses (against the behavior of the environment) as a yardstick. That is, we would like to find a strategy that behaves not far from an optimal response to the strategy of the environment when the latter is fixed. The notion of regret minimization is naturally defined in a quantitative setting (although it also makes sense in a Boolean setting). Let us now formally define the notion of regret associated to a strategy of Eve. This definition is parameterized by a set of strategies for Adam.

Definition 8 (Relative Regret). Let $A = \langle S_\exists, S_\forall, E, s_{\mathsf{init}}, w \rangle$ be a weighted game arena and let $\sigma_\exists$ be a strategy of Eve. The regret of this strategy relative to a set of strategies $\mathrm{Str}_\forall \subseteq \Sigma_\forall$ is defined as follows:
$$\mathsf{Reg}(\sigma_\exists, \mathrm{Str}_\forall) = \sup_{\sigma_\forall \in \mathrm{Str}_\forall} \Big( \sup_{\sigma'_\exists \in \Sigma_\exists} \mathsf{Val}(\sigma'_\exists, \sigma_\forall) - \mathsf{Val}(\sigma_\exists, \sigma_\forall) \Big).$$
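On finite strategy sets, the suprema in Definition 8 become maxima, and the regret can be computed directly from a payoff table. The sketch below assumes such a table (the strategy names and numbers are invented for illustration) and shows that a regret-minimizing choice can differ from a worst-case optimal one.

```python
def regret(sigma, eve_strategies, adam_strategies, val):
    """Reg(sigma, Str_A) when both strategy sets are finite (sup becomes max).

    val(e, a) is the payoff of the outcome of Eve's e against Adam's a.
    """
    return max(
        max(val(e, a) for e in eve_strategies) - val(sigma, a)  # best-response value minus achieved value
        for a in adam_strategies
    )


def min_regret(eve_strategies, adam_strategies, val):
    return min(eve_strategies, key=lambda s: regret(s, eve_strategies, adam_strategies, val))


# Invented payoff table: Eve chooses x or y, Adam chooses p or q.
TABLE = {("x", "p"): 5, ("x", "q"): 1, ("y", "p"): 2, ("y", "q"): 2}
EVE, ADAM = ["x", "y"], ["p", "q"]


def val(e, a):
    return TABLE[(e, a)]


regret_minimizer = min_regret(EVE, ADAM, val)                              # "x": regret 1
worst_case_optimal = max(EVE, key=lambda e: min(val(e, a) for a in ADAM))  # "y": guarantees 2
```

Here "y" maximizes the guaranteed payoff, while "x" never ends more than 1 below the best response to Adam's actual choice.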
We interpret the sub-expression $\sup_{\sigma'_\exists \in \Sigma_\exists} \mathsf{Val}(\sigma'_\exists, \sigma_\forall)$ as the value of Eve's best-response against $\sigma_\forall$. Then, the relative regret of a strategy $\sigma_\exists$ of Eve can be seen as the supremum, over the strategies of Adam, of the difference between the value achieved by the corresponding best-response and the value achieved by $\sigma_\exists$. We are now equipped to formally define the problem under study, which is parameterized by a payoff function $\mathsf{Val}(\cdot)$ and a set $\mathrm{Str}_\forall$ of strategies of Adam.

Definition 9 (Regret Minimization). Given a weighted game arena $A$ and a rational threshold $r$, decide if there exists a strategy $\sigma_\exists$ for Eve such that $\mathsf{Reg}(\sigma_\exists, \mathrm{Str}_\forall) \le r$, and synthesize such a strategy if one exists.

In [33], we have considered several types of strategies for Adam: the set $\Sigma_\forall$, i.e., any strategy; the set $\Sigma^{\mathsf{ML}}_\forall$, i.e., memoryless strategies for Adam; and the set $\Sigma^{\mathsf{W}}_\forall$, i.e., word strategies for Adam (footnote 6). We illustrate the last two cases on examples below.

Example 5. Let us consider the weighted game arena of Fig. 2, and let us assume that we want to synthesize a strategy for Eve that minimizes her mean-payoff regret against Adam playing a memoryless strategy. The memoryless restriction is useful when designing a system that needs to perform well in an environment which is only partially known. In practice, a controller may discover the environment with which it is interacting during run-time. Such a situation can be modeled by an arena in which choices in nodes of the environment model an entire family of environments, and each memoryless strategy models a specific environment of the family. In such cases, if we want to design a controller that performs reasonably well against all the possible environments, we can consider each best-response of Eve for each environment and then try to choose one unique strategy for Eve that minimizes the difference in performance w.r.t. those best-responses: a regret-minimizing strategy.

Fig. 2. An example of a two-player game arena with MP objective for Eve.
Rounded positions belong to Eve, and squared positions belong to Adam.

Footnote 6: To define word strategies, it is convenient to consider game arenas where edges have labels called letters. In that case, when playing a word strategy, Adam commits to a sequence of letters (i.e., a word) and plays that word regardless of the exact state of the game. Word strategies are formally defined in [33] and below.
In our example, prior to a first visit to state 3, we do not know whether the edge $3 \to 2$ or the edge $3 \to 1$ will be activated by Adam. But as Adam is bound to play a memoryless strategy, once he has chosen one of the two edges, we know that he will stick to this choice. A regret-minimizing strategy in this example is as follows: play $1 \to 2$, then $2 \to 3$; if Adam plays $3 \to 2$, then play $2 \to 1$ and then $1 \to 1$ forever; otherwise Adam plays $3 \to 1$, and Eve should continue to play $1 \to 2$ and $2 \to 3$ forever. This strategy has regret 0. Note that this strategy uses memory and that there is no memoryless strategy of Eve with regret 0 in this game.

Let us now illustrate the interest of the notion of regret minimization when Adam plays word strategies. When considering this restriction, it is convenient to consider letters that label the edges of the graph (Fig. 3). A word strategy for Adam is a function $w : \mathbb{N} \to \{a, b\}$. In this setting, Adam plays a sequence of letters, and this sequence is independent of the current state of the game. We have shown in [33] that the notion of regret minimization relative to word strategies is a generalization of the notion of good-for-games automata introduced by Henzinger and Piterman in [32].

Fig. 3. An example of a two-player game arena with MP objective for Eve. Edges are annotated by letters: Adam chooses a word w and Eve resolves the non-determinism on edges.

Example 6. In this example, a strategy of Eve determines how to resolve non-determinism in state 1. The best strategy of Eve for mean-payoff regret minimization is to always take the edge $1 \to 3$. Indeed, let us consider all the sequences of two letters that Adam can choose and compute the regret of choosing $1 \to 2$ (left) and the regret of choosing $1 \to 3$ (right):
- $a \cdot x$ with $x \in \{a, b\}$: the regret of left is 0, and the regret of right is 1;
- $b \cdot x$ with $x \in \{a, b\}$: the regret of left is 3, and the regret of right is 0.
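The case analysis of Example 6 amounts to a min-max over a small regret table. The sketch below only tabulates the regret values stated above (it does not recompute them from the weights of Fig. 3) and picks the choice whose worst regret is smallest; the names are hypothetical.

```python
# Regrets stated in Example 6, indexed by Eve's choice in state 1 and the
# first letter chosen by Adam (the second letter does not change them).
REGRET = {
    "left": {"a": 0, "b": 3},    # taking the edge 1 -> 2
    "right": {"a": 1, "b": 0},   # taking the edge 1 -> 3
}


def worst_case_regret(choice):
    # Adam picks the word that maximizes Eve's regret for this choice.
    return max(REGRET[choice].values())


best = min(REGRET, key=worst_case_regret)   # "right", with regret 1
```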
So the strategy that minimizes the regret of Eve is to always take the edge $1 \to 3$ (right), and the regret is then equal to 1. In [33], we have studied the complexity of deciding the existence of strategies for Eve whose regret is at most a given threshold. The results that we have obtained are summarized in the theorem below.
Theorem 2 ([33]). Let $A = \langle S_\exists, S_\forall, E, s_{\mathsf{init}}, w \rangle$ be a weighted game arena. The complexity of deciding if Eve has a strategy with regret less than or equal to a threshold $r \in \mathbb{Q}$ against Adam playing:
- a strategy in $\Sigma_\forall$ is PTime-complete for payoff functions inf, sup, liminf, limsup, and in NP ∩ coNP for MP;
- a strategy in $\Sigma^{\mathsf{ML}}_\forall$ is in PSpace for payoff functions inf, sup, liminf, limsup, and MP, and is coNP-hard for inf, sup, limsup, and PSpace-hard for liminf and MP;
- a strategy in $\Sigma^{\mathsf{W}}_\forall$ is ExpTime-complete for payoff functions inf, sup, liminf, limsup, and undecidable for MP.

The above results are obtained by reducing the synthesis of regret-minimizing strategies to finding winning strategies in classical games. For instance, finding a strategy for Eve that minimizes regret against $\Sigma^{\mathsf{ML}}_\forall$ for the mean-payoff measure corresponds to finding a winning strategy in a mean-payoff game played on a larger game arena which encodes the witnessed choices of Adam and forces him to play positionally. When minimizing regret against word strategies, in the decidable cases the reduction is to parity games and is based on the quantitative simulation games defined in [16].

Additional Results. Since the synthesis of regret-minimizing strategies against word strategies of Adam is undecidable for the measure MP, we have considered the sub-case which limits the amount of memory the desired controller can use (as in [1]). That is, we ask whether there exists a strategy of Eve which uses memory at most $m$ and ensures regret at most $r$. In [33], we showed that this problem is in $\mathsf{NTime}(m^2 |A|^2)$ for MP.

Theorem 3 ([33]). Let $A = \langle S_\exists, S_\forall, E, s_{\mathsf{init}}, w \rangle$ be a weighted game arena. The complexity of deciding if Eve has a strategy using memory of at most $m$ with regret less than or equal to a threshold $\lambda \in \mathbb{Q}$ against Adam playing a strategy in $\Sigma^{\mathsf{W}}_\forall$ is in non-deterministic polynomial time w.r.t. $m$ and $|A|$ for inf, sup, liminf, limsup, and MP.
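The idea of forcing Adam to play positionally can be sketched as a product arena whose states pair an original state with the choices Adam has already revealed. This is a simplified, hypothetical rendering: the actual reduction in [33] must also keep track of the information needed to compare with Eve's best-responses, which is omitted here. Edges are assumed to be given as a set of pairs, and the usage example takes its edges and ownership from the text of Example 5 (weights omitted).

```python
from collections import deque


def positional_adam_arena(eve_states, edges, init):
    """Expand an arena so that Adam must repeat his first choice at each state.

    States of the new arena are pairs (s, memo), where memo is a frozenset of
    (adam_state, successor) pairs recording the choices Adam already made.
    """
    succ = {}
    for s, t in edges:
        succ.setdefault(s, set()).add(t)
    start = (init, frozenset())
    new_edges, seen, todo = set(), {start}, deque([start])
    while todo:
        s, memo = todo.popleft()
        chosen = dict(memo)
        if s in eve_states:
            moves = [(t, memo) for t in succ.get(s, ())]
        elif s in chosen:                  # Adam already committed at s
            moves = [(chosen[s], memo)]
        else:                              # first visit: any choice, then recorded
            moves = [(t, memo | {(s, t)}) for t in succ.get(s, ())]
        for nxt in moves:
            new_edges.add(((s, memo), nxt))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen, new_edges


# Edges of Example 5 as described in the text: Eve owns 1 and 2, Adam owns 3.
states, edges = positional_adam_arena(
    {1, 2}, {(1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2)}, 1)
```

The product can be exponential in the number of Adam's states, since each reachable memo is a partial choice function for Adam.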
Finally, we have established the equivalence of a quantitative extension of the notion of good-for-games automata [32] with determinization-by-pruning of the refinement of an automaton [1] and with our regret games against word strategies of Adam. Before we can formally state these results, some definitions are needed.

Definition 10 (Weighted Automata). A finite weighted automaton is a tuple $\langle Q, q_{\mathsf{init}}, A, \Delta, w \rangle$ where: $Q$ is a finite set of states, $q_{\mathsf{init}} \in Q$ is the initial state, $A$ is a finite alphabet of actions or symbols, $\Delta \subseteq Q \times A \times Q$ is the transition relation, and $w : \Delta \to \mathbb{Z}$ is the weight function.

A run of an automaton on a word $a \in A^\omega$ is an infinite sequence of transitions $\rho = (q_0, a_0, q_1)(q_1, a_1, q_2) \dots \in \Delta^\omega$ such that $q_0 = q_{\mathsf{init}}$ and $a_i = a(i)$ for all $i \ge 0$. As with plays in a game, each run is assigned a value with a payoff function $\mathsf{Val}(\cdot)$. A weighted automaton $M$ defines a function $A^\omega \to \mathbb{R}$ by assigning to $a \in A^\omega$ the supremum over all the values of its runs on $a$. The automaton is said to be deterministic if for all $q \in Q$ and all letters $\ell \in A$, the set $\{ q' \in Q \mid (q, \ell, q') \in \Delta \}$ is a singleton.

In [32], Henzinger and Piterman introduced the notion of good-for-games automata. A non-deterministic automaton is good for solving games if it fairly simulates the equivalent deterministic automaton.

Definition 11 (α-good-for-games). A finite weighted automaton $M$ is α-good-for-games if a player (Simulator), against any word $x \in A^\omega$ spelled by Spoiler, can resolve non-determinism in $M$ so that the resulting run has value $v$ and $M(x) - v \le \alpha$.

The above definition is a quantitative generalization of the notion proposed in [32]. We link their class of automata with our regret games in the sequel.

Proposition 1 ([33]). A weighted automaton $M = \langle Q, q_{\mathsf{init}}, A, \Delta, w \rangle$ is α-good-for-games if and only if there exists a strategy $\sigma_\exists$ for Eve with relative regret of at most $\alpha$ against strategies $\Sigma^{\mathsf{W}}_\forall$ of Adam.

Our definitions also suggest a natural notion of approximate determinization for weighted automata on infinite words. This is related to recent work by Aminof et al.: in [1], they introduce the notion of approximate-determinization-by-pruning for weighted sum automata over finite words. For $\alpha \in (0, 1]$, a weighted sum automaton is α-determinizable-by-pruning if there exists a finite-state strategy to resolve non-determinism that constructs a run whose value is at least $\alpha$ times the value of the maximal run of the given word. So, they consider a notion of approximation which is a ratio. Let us introduce some additional definitions required to formalize the notion of determinizable-by-pruning. Consider two weighted automata $M = \langle Q, q_{\mathsf{init}}, A, \Delta, w \rangle$ and $M' = \langle Q', q'_{\mathsf{init}}, A, \Delta', w' \rangle$. We say that $M'$ α-approximates $M$ if $M(x) - M'(x) \le \alpha$ for all $x \in A^\omega$. We say that $M$ embodies $M'$ if $Q' \subseteq Q$, $\Delta' \subseteq \Delta$, and $w'$ agrees with $w$ on $\Delta'$. For an integer $k \ge 0$, the $k$-refinement of $M$ is the automaton obtained by refining the state space of $M$ using $k$ Boolean variables.
Definition 12 ((α, k)-determinizable-by-pruning). A finite weighted automaton $M$ is $(\alpha, k)$-determinizable-by-pruning if the $k$-refinement of $M$ embodies a deterministic automaton which α-approximates $M$.

We show in [33] that when Adam plays word strategies only, our notion of regret defines a notion of approximation with respect to the difference metric for weighted automata (as defined above).

Proposition 2 ([33]). A weighted automaton $M = \langle Q, q_{\mathsf{init}}, A, \Delta, w\rangle$ is $(\alpha, m)$-determinizable-by-pruning if and only if there exists a strategy σ for Eve using memory at most $2^m$ with relative regret at most α against strategies in $\Sigma^{\mathsf{W}}$ of Adam.
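The syntactic conditions used above (determinism in Definition 10 and embodiment) are easy to check mechanically. The following Python sketch is only an illustration under our own encoding assumptions: the class `WeightedAutomaton` and its methods are hypothetical, $\Delta$ is stored as the key set of the weight map, and the α-approximation condition is not checked, since it involves the values of runs on infinite words.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical encoding (not from the paper): a transition is a triple (q, a, q').
Transition = Tuple[Hashable, Hashable, Hashable]


class WeightedAutomaton:
    """Finite weighted automaton <Q, q_init, A, Delta, w> (Definition 10).

    Delta is stored implicitly as the key set of the weight map w : Delta -> Z.
    """

    def __init__(self, states: Iterable[Hashable], init: Hashable,
                 alphabet: Iterable[Hashable], weights: Dict[Transition, int]):
        self.states = set(states)
        self.init = init
        self.alphabet = set(alphabet)
        self.weights = dict(weights)

    @property
    def delta(self) -> set:
        return set(self.weights)

    def successors(self, q: Hashable, a: Hashable) -> set:
        # The set {q' in Q | (q, a, q') in Delta}.
        return {t for (p, b, t) in self.weights if p == q and b == a}

    def is_deterministic(self) -> bool:
        # Every state has exactly one a-successor for every letter a.
        return all(len(self.successors(q, a)) == 1
                   for q in self.states for a in self.alphabet)

    def embodies(self, other: "WeightedAutomaton") -> bool:
        # Q' subset of Q, Delta' subset of Delta, and w' agrees with w on Delta'.
        return (other.states <= self.states
                and other.delta <= self.delta
                and all(self.weights[t] == wt for t, wt in other.weights.items()))


# Hypothetical example: M is non-deterministic in state "p" on letter "x";
# pruning ("p", "x", "p") yields a deterministic automaton embodied in M.
M = WeightedAutomaton({"p", "q"}, "p", {"x"},
                      {("p", "x", "p"): 0, ("p", "x", "q"): 1, ("q", "x", "q"): 2})
M_pruned = WeightedAutomaton({"p", "q"}, "p", {"x"},
                             {("p", "x", "q"): 1, ("q", "x", "q"): 2})
print(M.is_deterministic(), M_pruned.is_deterministic(), M.embodies(M_pruned))
# False True True
```

Pruning is exactly what "embodies" captures: the deterministic automaton keeps a subset of the states and transitions of $M$ with unchanged weights.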
Related Works
The notion of regret minimization is important in game and decision theory; see e.g. [46] and the bibliographical pointers therein. The concept of iterated regret minimization has recently been proposed by Halpern et al. for non-zero sum games [31]. In [29], the concept is applied to games played on weighted graphs with shortest path objectives. Variants based on different sets of strategies for Adam were not studied there. In [24], Damm and Finkbeiner introduce the notion of remorse-free strategies in order to define a notion of best-effort strategy when winning strategies do not exist. Remorse-free strategies are exactly the strategies which minimize regret in games with ω-regular objectives in which the environment (Adam) plays word strategies only. The authors of [24] do not establish lower bounds on the complexity of the realizability and synthesis problems for remorse-free strategies. A concept equivalent to good-for-games automata is that of history-determinism [23]. Proposition 1 thus allows us to generalize history-determinism to a quantitative setting via this relationship with good-for-games automata. Finally, we would like to highlight some differences between our work and the study of Aminof et al. in [1] on determinization-by-pruning. First, we consider infinite words while they consider finite words. Second, we study a general notion of regret minimization problem in which Eve can use any strategy, while they restrict their study to fixed-memory strategies only and leave the problem open when the memory is not fixed a priori.

6 Game Arenas with Expected Adversary
In the two previous sections we have relaxed the worst-case hypothesis on the environment (modeled by the behavior of Adam) either by considering an explicit objective for the environment or by considering as yardsticks the best responses to the strategies of Adam.
Here, we introduce another model where the environment is modeled as a stochastic process (i.e., Adam is expected to play according to some known randomized strategy), and we look for strategies for Eve that ensure a good expectation against this stochastic process while guaranteeing an acceptable worst-case performance even if Adam deviates from his expected behavior. To formally define this new framework, we need game arenas in which an expected behavior for Adam is given as a memoryless randomized strategy.⁷ We first introduce some notation. Given a set $A$, let $\mathcal{D}(A)$ denote the set of rational probability distributions over $A$, and, for $d \in \mathcal{D}(A)$, we denote its support by $\mathsf{Supp}(d) = \{a \in A \mid d(a) > 0\} \subseteq A$.

⁷ We can easily consider finite-memory randomized strategies for Adam instead of memoryless randomized strategies: we can always take the synchronized product of a finite-memory randomized strategy with the game arena to obtain a new game arena in which the finite-memory strategy on the original game arena is equivalent to a memoryless strategy.
Fig. 4. A game arena associated with a memoryless randomized strategy for Adam can be seen as an MDP: the fractions represent the respective probabilities of taking each outgoing edge when leaving state 3.

Definition 13. Fix a weighted game arena $\mathcal{A} = \langle S, S_\exists, E, s_{\mathsf{init}}, w\rangle$. A memoryless randomized strategy for Adam is a function $\sigma_{\mathsf{rnd}} : S_\forall \to \mathcal{D}(S)$ such that for all $s \in S_\forall$, $\mathsf{Supp}(\sigma_{\mathsf{rnd}}(s)) \subseteq \{s' \in S \mid (s, s') \in E\}$.

For the rest of this section, we model the expected behavior of Adam with a strategy $\sigma_{\mathsf{rnd}}$, given as part of the input of the problem we consider. Given a weighted game arena $\mathcal{A}$ and a memoryless randomized strategy $\sigma_{\mathsf{rnd}}$ for Adam, we are left with a model with both non-deterministic choices (for Eve) and stochastic transitions (due to the randomized strategy of Adam). This is what is known in the literature as a 1½-player game or, more commonly, a Markov Decision Process (MDP), see for example [39,27]. One can talk about plays, strategies and other notions in MDPs as introduced for games.

Consider the game in Fig. 4. We can see it as a classical two-player game if we forget about the fractions around state 3. Now assume that we fix the memoryless randomized strategy $\sigma_{\mathsf{rnd}}$ for Adam to be the one that, from 3, goes to 1 with probability 9/10 and to 2 with the remaining probability 1/10. This is represented by the fractions on the corresponding outgoing edges. In the remaining model, only Eve still has to pick a strategy: it is an MDP, which we denote by $\mathcal{A}[\sigma_{\mathsf{rnd}}]$. Let us go one step further. Assume now that Eve also picks a strategy σ in this MDP. We then obtain a fully stochastic process called a Markov Chain (MC), which we denote by $\mathcal{A}[\sigma, \sigma_{\mathsf{rnd}}]$. In an MC, an event is a measurable set of plays. It is well known [44] that every event has a uniquely defined probability (Carathéodory's extension theorem induces a unique probability measure on the Borel σ-algebra over plays in the MC).
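To make Definition 13 and the construction of $\mathcal{A}[\sigma, \sigma_{\mathsf{rnd}}]$ concrete, here is a Python sketch under our own encoding assumptions; all names are hypothetical. The arena stores weights in a dictionary indexed by edges, Adam owns every state not listed in `eve_states`, and Eve's strategy σ is memoryless and deterministic. The sketch also computes the expected mean-payoff of the induced chain with the two-step method used in Example 7 (stationary distribution, then expected outgoing weights), assuming the chain has a single recurrent class (for instance, it is irreducible). The arena below is made up and does not reproduce the weights of Fig. 4.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, Hashable, Set, Tuple


@dataclass
class WeightedArena:
    """Hypothetical encoding: eve_states is S_exists, Adam owns the rest;
    edges maps each (s, s') in E to its integer weight w(s, s')."""
    states: Set[Hashable]
    eve_states: Set[Hashable]
    edges: Dict[Tuple[Hashable, Hashable], int]

    def successors(self, s):
        return {t for (u, t) in self.edges if u == s}


def is_memoryless_randomized(arena, sigma_rnd):
    # Definition 13: each sigma_rnd(s) is a distribution supported on successors of s.
    for s in arena.states - arena.eve_states:
        d = sigma_rnd[s]
        if any(p < 0 for p in d.values()) or sum(d.values()) != 1:
            return False
        if not {t for t, p in d.items() if p > 0} <= arena.successors(s):
            return False
    return True


def induced_chain(arena, sigma, sigma_rnd):
    # Transition probabilities of A[sigma, sigma_rnd] for a memoryless Eve strategy.
    P = {}
    for s in arena.states:
        if s in arena.eve_states:
            if sigma[s] not in arena.successors(s):
                raise ValueError(f"sigma({s!r}) is not a successor of {s!r}")
            P[s] = {sigma[s]: Fraction(1)}
        else:
            P[s] = {t: Fraction(p) for t, p in sigma_rnd[s].items() if p > 0}
    return P


def stationary_distribution(P):
    # Solve pi P = pi with sum(pi) = 1 exactly, by Gauss-Jordan elimination over
    # Fractions; unique when the chain has a single recurrent class.
    states = list(P)
    n = len(states)
    rows = [[P[si].get(sj, Fraction(0)) - (1 if i == j else 0)
             for i, si in enumerate(states)] + [Fraction(0)]
            for j, sj in enumerate(states)]
    rows[-1] = [Fraction(1)] * n + [Fraction(1)]  # one balance equation is redundant
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [x / pv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return {s: rows[i][n] for i, s in enumerate(states)}


def expected_mean_payoff(arena, P):
    # Stationary distribution times the expected weight of each state's outgoing edges.
    pi = stationary_distribution(P)
    return sum(pi[s] * sum(p * arena.edges[(s, t)] for t, p in P[s].items()) for s in P)


# Hypothetical arena (not Fig. 4): Eve owns "a" and "b", Adam owns "c".
arena = WeightedArena(states={"a", "b", "c"}, eve_states={"a", "b"},
                      edges={("a", "a"): 1, ("a", "b"): 2, ("b", "c"): 3,
                             ("c", "a"): 0, ("c", "b"): 1})
sigma_rnd = {"c": {"a": Fraction(1, 2), "b": Fraction(1, 2)}}
assert is_memoryless_randomized(arena, sigma_rnd)
chain = induced_chain(arena, {"a": "b", "b": "c"}, sigma_rnd)
print(expected_mean_payoff(arena, chain))  # 9/5
```

Exact rational arithmetic is used because the distributions in $\mathcal{D}(S)$ are rational, so the expected value is also rational (as with the value 54/29 in Example 7).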
Given a set $\mathcal{E}$ of plays in $M = \mathcal{A}[\sigma, \sigma_{\mathsf{rnd}}]$, we denote by $\mathbb{P}_M(\mathcal{E})$ the probability that a play belongs to $\mathcal{E}$ when $M$ is executed for an infinite number of steps. Given a measurable value function $\mathsf{Val}$, we denote by $\mathbb{E}_M(\mathsf{Val})$ the expected value, or expectation, of $\mathsf{Val}$ over plays in $M$. In this paper, we focus on the mean-payoff function MP. We are now equipped to formally define the problem under study.

Definition 14 (Beyond Worst-Case Synthesis). Given a weighted game arena $\mathcal{A}$, a stochastic model of Adam given as a memoryless randomized strategy $\sigma_{\mathsf{rnd}}$, and two rational thresholds $\lambda_{\mathsf{wc}}, \lambda_{\mathsf{exp}}$, decide whether there exists a strategy σ for Eve such that
$$\begin{cases} \forall \pi \in \mathsf{Outcome}_{\mathcal{A}}(s_{\mathsf{init}}, \sigma),\ \mathsf{Val}(\pi) > \lambda_{\mathsf{wc}},\\ \mathbb{E}_{\mathcal{A}[\sigma, \sigma_{\mathsf{rnd}}]}(\mathsf{Val}) > \lambda_{\mathsf{exp}}, \end{cases}$$
and synthesize such a strategy if one exists.

Intuitively, we are looking for strategies that simultaneously guarantee a worst-case performance higher than $\lambda_{\mathsf{wc}}$, i.e., against any behavior of Adam in the game $\mathcal{A}$, and an expectation higher than $\lambda_{\mathsf{exp}}$ when faced with the expected behavior of Adam, i.e., when played in the MDP $\mathcal{A}[\sigma_{\mathsf{rnd}}]$. We can assume w.l.o.g. that $\lambda_{\mathsf{wc}} < \lambda_{\mathsf{exp}}$; otherwise the problem trivially reduces to a worst-case requirement alone, since any lower bound on the worst-case value is also a lower bound on the expected value.

Example 7. Consider the arena depicted in Fig. 4. As mentioned before, the probability distribution models the expected behavior of Adam. Assume that we now want to synthesize a strategy for Eve which ensures that (C1) the mean-payoff is at least 1/3 no matter how Adam behaves (worst-case guarantee), and (C2) at least 3/2 if Adam plays according to his expected behavior (good expectation). First, let us study whether this can be achieved through the two classical solution concepts used in games and MDPs, respectively. We start by considering the arena as a traditional two-player zero-sum game: in this case, it is known that an optimal memoryless strategy exists [25]. Let $\sigma_{\mathsf{wc}}$ be the strategy of Eve that always plays 1 → 1 and 2 → 1. This strategy maximizes the worst-case mean-payoff, as it enforces a mean-payoff of 1 no matter how Adam behaves. Thus, (C1) is satisfied. Observe that if we consider the arena as an MDP (i.e., taking the probabilities into account), this strategy yields an expected value of 1, as the unique possible play from state 1 is to take the self-loop forever. Hence this strategy does not satisfy (C2). Now, consider the arena as an MDP.
Again, it is known that the expected value can be maximized by a memoryless strategy [39,27]. Let $\sigma_{\mathsf{exp}}$ be the strategy of Eve that always chooses the edges 1 → 2 and 2 → 3. Its expected mean-payoff can be computed in two steps: first, compute the probability vector that represents the limiting stationary distribution of the irreducible MC induced by this strategy; second, multiply it by the vector containing the expected weights over outgoing edges for each state. In this case, it can be shown that the expected value is equal to 54/29, hence the strategy does satisfy (C2). Unfortunately, it is clearly not acceptable for (C1): if Adam does not behave according to the stochastic model and always chooses to play 3 → 2, the mean-payoff is equal to zero. This shows that the classical solution concepts do not suffice if one wants to go beyond the worst case and mix guarantees on the worst-case and the expected performance of strategies.

In contrast, with the framework developed in [13,12], it is indeed possible for the considered arena (Fig. 4) to build a strategy for Eve that ensures the worst-case constraint (C1) and, at the same time, yields an expected value arbitrarily close to the optimal expectation achieved by strategy $\sigma_{\mathsf{exp}}$. In particular, one can build a finite-memory strategy that guarantees both (C1) and (C2). The general form of such strategies is a combination of $\sigma_{\mathsf{exp}}$ and $\sigma_{\mathsf{wc}}$ in a well-chosen pattern. Let $\sigma_{\mathsf{cmb}}(K, L)$ be a combined strategy parameterized by two integers $K, L \in \mathbb{N}$. The strategy is as follows.
1. Play according to $\sigma_{\mathsf{exp}}$ for $K$ steps.
2. If the mean-payoff over the last $K$ steps is larger than the worst-case threshold $\lambda_{\mathsf{wc}}$ (here 1/3), then go to phase 1.
3. Otherwise, play according to $\sigma_{\mathsf{wc}}$ for $L$ steps, and then go to phase 1.

Intuitively, the strategy starts by mimicking $\sigma_{\mathsf{exp}}$ for a long time, and with high probability the mean-payoff witnessed over the $K$ steps will be close to the optimal expectation. Thus, with high probability it will be higher than $\lambda_{\mathsf{exp}}$, and therefore higher than $\lambda_{\mathsf{wc}}$ (recall that we assumed $\lambda_{\mathsf{wc}} < \lambda_{\mathsf{exp}}$). If this is not the case, then Eve has to switch to $\sigma_{\mathsf{wc}}$ for sufficiently many steps $L$ in order to make sure that the worst-case constraint (C1) is satisfied before switching back to $\sigma_{\mathsf{exp}}$. One of the key results of [13] is that for any $\lambda_{\mathsf{wc}} < \mu$, where $\mu$ denotes the optimal worst-case value guaranteed by $\sigma_{\mathsf{wc}}$, and for any expected value threshold $\lambda_{\mathsf{exp}} < \nu$, where $\nu$ denotes the optimal expected value guaranteed by $\sigma_{\mathsf{exp}}$, it is possible to compute values for $K$ and $L$ such that $\sigma_{\mathsf{cmb}}(K, L)$ satisfies the beyond worst-case constraint for thresholds $\lambda_{\mathsf{wc}}$ and $\lambda_{\mathsf{exp}}$. For instance, in the example, where $\lambda_{\mathsf{wc}} = 1/3 < 1 = \mu$ and $\lambda_{\mathsf{exp}} = 3/2 < 54/29 = \nu$, one can compute appropriate values of the parameters following the technique presented in [13, Theorem 5].
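The three phases of $\sigma_{\mathsf{cmb}}(K, L)$ can be sketched as a small finite-memory controller. This Python sketch is illustrative only: the class `CombinedStrategy` and the helper `average_weight` are hypothetical names, $K$ and $L$ are chosen by hand rather than computed as in [13, Theorem 5], and the arena in the usage lines is made up (it does not reproduce Fig. 4). The controller observes every step of the play, including Adam's moves, because phase 2 inspects the mean-payoff of the last $K$ steps.

```python
from fractions import Fraction


class CombinedStrategy:
    """Hypothetical sketch of sigma_cmb(K, L) following phases 1-3 above.

    sigma_exp and sigma_wc are memoryless strategies of Eve (dicts from her
    states to successors); K, L and lambda_wc are given, not computed.
    """

    def __init__(self, sigma_exp, sigma_wc, K, L, lambda_wc):
        self.sigma_exp, self.sigma_wc = sigma_exp, sigma_wc
        self.K, self.L, self.lambda_wc = K, L, Fraction(lambda_wc)
        self._start("exp")

    def _start(self, phase):
        # "exp" is phase 1 (follow sigma_exp), "wc" is phase 3 (follow sigma_wc).
        self.phase, self.steps, self.window_sum = phase, 0, 0

    def choose(self, state):
        # Eve's move in one of her states, according to the current phase.
        return (self.sigma_exp if self.phase == "exp" else self.sigma_wc)[state]

    def observe(self, weight):
        # Called after every step of the play (Eve's or Adam's edge).
        self.steps += 1
        if self.phase == "exp":
            self.window_sum += weight
            if self.steps == self.K:
                # Phase 2: compare the mean-payoff of the last K steps with lambda_wc.
                ok = Fraction(self.window_sum, self.K) > self.lambda_wc
                self._start("exp" if ok else "wc")
        elif self.steps == self.L:
            self._start("exp")


def average_weight(edges, eve_states, start, strategy, adam, n_steps):
    # Finite-horizon average weight: only a proxy for the mean-payoff value MP.
    s, total = start, 0
    for _ in range(n_steps):
        t = strategy.choose(s) if s in eve_states else adam(s)
        strategy.observe(edges[(s, t)])
        total += edges[(s, t)]
        s = t
    return Fraction(total, n_steps)


# Hypothetical arena (not Fig. 4): Eve owns "a" and "b", Adam owns "c".
edges = {("a", "a"): 1, ("a", "b"): 0, ("b", "a"): 1, ("b", "c"): 0,
         ("c", "a"): 4, ("c", "b"): 0}
sigma_exp = {"a": "b", "b": "c"}
sigma_wc = {"a": "a", "b": "a"}
cmb = CombinedStrategy(sigma_exp, sigma_wc, K=4, L=6, lambda_wc=Fraction(1, 3))
# Adam ignores any stochastic model and always plays c -> b.
print(average_weight(edges, {"a", "b"}, "a", cmb, lambda s: "b", 100))  # 1/2
```

In this run, $\sigma_{\mathsf{exp}}$ alone would yield an average of 0 against Adam's deviation, while the phases played with $\sigma_{\mathsf{wc}}$ keep the average at 1/2, above the threshold 1/3. This is a single simulated behavior of Adam, not a proof of the worst-case guarantee.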
The crux is proving that, for large enough values of $K$ and $L$, the contribution to the expectation of the phases in which $\sigma_{\mathsf{cmb}}(K, L)$ mimics $\sigma_{\mathsf{wc}}$ is negligible, and thus the expected value yielded by $\sigma_{\mathsf{cmb}}(K, L)$ tends to the optimal one given by $\sigma_{\mathsf{exp}}$, while at the same time the strategy ensures that the worst-case constraint is met.

In the next theorem, we sum up some of the main results that we have obtained for the beyond worst-case synthesis problem applied to the mean-payoff value function.

Theorem 4 ([13,12,22]). The beyond worst-case synthesis problem for the mean-payoff is in NP ∩ coNP, and at least as hard as deciding the winner in two-player zero-sum mean-payoff games, whether one looks for finite-memory or infinite-memory strategies of Eve. When restricted to finite-memory strategies, pseudo-polynomial memory is both sufficient and necessary.

The NP ∩ coNP membership is good news as it matches the long-standing complexity barrier for two-player zero-sum mean-payoff games [25,47,11,17]: the beyond worst-case framework offers additional modeling power for free in terms of decision complexity. It is also interesting to note that, in general, infinite-memory strategies are more powerful than finite-memory ones in the beyond worst-case setting, which is not the case for the classical problems in games and MDPs. Looking carefully at the techniques from [13,12], one can see that the main complexity bottleneck is solving mean-payoff games in order to check whether the worst-case constraint can be met. Therefore, a natural relaxation of the problem is the beyond almost-sure threshold problem, where the worst-case constraint is softened by only asking that a threshold be satisfied with probability one against the stochastic model given as the strategy $\sigma_{\mathsf{rnd}}$ of Adam. In this case, the complexity is reduced.

Theorem 5 ([22]). The beyond almost-sure threshold problem for the mean-payoff is in PTime and finite-memory strategies are sufficient.

Related Works
We originally introduced the beyond worst-case framework in [13], where we studied both mean-payoff and shortest path objectives. This framework generalizes classical problems for two-player zero-sum games and MDPs. In mean-payoff games, optimal memoryless strategies exist and deciding the winner lies in NP ∩ coNP, while no polynomial algorithm is known [25,47,11,17]. For shortest path games, where we consider game graphs with strictly positive weights and try to minimize the accumulated cost to the target, it can be shown that memoryless strategies also suffice, and the problem is in PTime [34]. In MDPs, optimal strategies for the expectation are studied in [39,27] for the mean-payoff and the shortest path: in both cases, memoryless strategies suffice and they can be computed in PTime. While we saw that the beyond worst-case synthesis problem does not cost more than solving games for the mean-payoff, this is no longer the case for the shortest path: we jump from PTime to a pseudo-polynomial-time algorithm. We proved in [13, Theorem 11] that the problem is inherently harder, as it is NP-hard. The beyond worst-case framework was extended in [22] to the multi-dimensional setting, where edges are fitted with vectors of integer weights.
The general case is proved to be coNP-complete.

Our strategies can be considered as strongly risk averse: they avoid at all cost outcomes that are below a given threshold (no matter their probability), and inside the set of those safe strategies, we maximize the expectation. Other notions of risk have been studied for MDPs: in [45], the authors want to find policies which minimize the probability (risk) that the total discounted rewards do not exceed a specified value (target); in [28] the authors want policies that achieve a specified value of the long-run limiting average reward at a specified probability level (percentile). The latter problem was recently extended significantly in the framework of percentile queries, which provide elaborate guarantees on the performance profile of strategies in multi-dimensional MDPs [41]. While all those strategies limit risk, they only ensure a low probability for bad behaviors, not their absence; furthermore, they do not ensure a good expectation either.

Another body of work is the study of strategies in MDPs that achieve a trade-off between the expectation and the variance over the outcomes (e.g., [6] for the mean-payoff, [36] for the cumulative reward), giving a statistical measure of the stability of the performance. In our setting, we strengthen this requirement by asking for strict guarantees on individual outcomes, while maintaining an appropriate expected payoff. A survey of rich behavioral models extending the classical approaches for MDPs, including the beyond worst-case framework presented here, was published in [42], with a focus on the shortest path problem.

References
1. B. Aminof, O. Kupferman, and R. Lampert. Reasoning about online algorithms with weighted automata. ACM Transactions on Algorithms,
2. C. Baier and J.-P. Katoen. Principles of model checking. MIT Press,
3. D. Berwanger. Admissibility in infinite games. In Proc. of STACS, LNCS 4393, pages Springer,
4. R. Bloem, R. Ehlers, S. Jacobs, and R. Könighofer. How to handle assumptions in synthesis. In Proc. of SYNT, EPTCS 157, pages 34–50,
5. A. Brandenburger, A. Friedenberg, and H. J. Keisler. Admissibility in games. Econometrica, 76(2),
6. T. Brázdil, K. Chatterjee, V. Forejt, and A. Kučera. Trading performance for stability in Markov decision processes. In Proc. of LICS, pages IEEE,
7. R. Brenguier, L. Clemente, P. Hunter, G. A. Pérez, M. Randour, J.-F. Raskin, O. Sankur, and M. Sassolas. Non-zero sum games for reactive synthesis. In Proc. of LATA, LNCS. Springer, To appear.
8. R. Brenguier, J.-F. Raskin, and O. Sankur. Assume-admissible synthesis. In Proc. of CONCUR, LIPIcs 42, pages Schloss Dagstuhl - LZI,
9. R. Brenguier, J.-F. Raskin, and M. Sassolas. The complexity of admissibility in omega-regular games. In Proc. of CSL-LICS, pages 23:1–23:10. ACM,
10. T. Brihaye, V. Bruyère, N. Meunier, and J.-F. Raskin. Weak subgame perfect equilibria and their application to quantitative reachability. In Proc. of CSL, LIPIcs 41, pages Schloss Dagstuhl - LZI,
11. L. Brim, J. Chaloupka, L. Doyen, R. Gentilini, and J.-F. Raskin. Faster algorithms for mean-payoff games. Formal Methods in System Design, 38(2):97–118,
12. V. Bruyère, E. Filiot, M. Randour, and J.-F. Raskin. Expectations or guarantees? I want it all! A crossroad between games and MDPs. In Proc. of SR, EPTCS 146, pages 1–8,
13. V. Bruyère, E. Filiot, M. Randour, and J.-F. Raskin. Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games. In Proc. of STACS, LIPIcs 25, pages Schloss Dagstuhl - LZI,
14. V. Bruyère, N. Meunier, and J.-F. Raskin. Secure equilibria in weighted games. In Proc. of CSL-LICS, pages 26:1–26:26. ACM,
15. K. Chatterjee, L. Doyen, E. Filiot, and J.-F. Raskin. Doomsday equilibria for omega-regular games. In Proc. of VMCAI, LNCS 8318, pages Springer,
16. K. Chatterjee, L. Doyen, and T. A. Henzinger. Quantitative languages. ACM Transactions on Computational Logic, 11(4),
17. K. Chatterjee, L. Doyen, M. Randour, and J.-F. Raskin. Looking at mean-payoff and total-payoff through windows. Information and Computation, 242:25–52,
More informationRepresenting Arithmetic Constraints with Finite Automata: An Overview
Representing Arithmetic Constraints with Finite Automata: An Overview Bernard Boigelot Pierre Wolper Université de Liège Motivation Linear numerical constraints are a very common and useful formalism (our
More informationQuantitative Reductions and Vertex-Ranked Infinite Games
Quantitative Reductions and Vertex-Ranked Infinite Games Alexander Weinert Reactive Systems Group, Saarland University, 66123 Saarbrücken, Germany weinert@react.uni-saarland.de Abstract. We introduce quantitative
More informationRobust Reachability in Timed Automata: A Game-based Approach
Robust Reachability in Timed Automata: A Game-based Approach Patricia Bouyer, Nicolas Markey, and Ocan Sankur LSV, CNRS & ENS Cachan, France. {bouyer,markey,sankur}@lsv.ens-cachan.fr Abstract. Reachability
More informationMarkov Decision Processes with Multiple Long-run Average Objectives
Markov Decision Processes with Multiple Long-run Average Objectives Krishnendu Chatterjee UC Berkeley c krish@eecs.berkeley.edu Abstract. We consider Markov decision processes (MDPs) with multiple long-run
More informationSynthesis of Winning Strategies for Interaction under Partial Information
Synthesis of Winning Strategies for Interaction under Partial Information Oberseminar Informatik Bernd Puchala RWTH Aachen University June 10th, 2013 1 Introduction Interaction Strategy Synthesis 2 Main
More informationProbabilistic Weighted Automata
Probabilistic Weighted Automata Krishnendu Chatterjee 1, Laurent Doyen 2, and Thomas A. Henzinger 3 1 Institute of Science and Technology (IST), Austria 2 Université Libre de Bruxelles (ULB), Belgium 3
More informationSynthesis of Designs from Property Specifications
Synthesis of Designs from Property Specifications Amir Pnueli New York University and Weizmann Institute of Sciences FMCAD 06 San Jose, November, 2006 Joint work with Nir Piterman, Yaniv Sa ar, Research
More informationThe cost of traveling between languages
The cost of traveling between languages Michael Benedikt, Gabriele Puppis, and Cristian Riveros Department of Computer Science, Oxford University Parks Road, Oxford OX13QD UK Abstract. We show how to calculate
More informationMean-Payoff Games and the Max-Atom Problem
Mean-Payoff Games and the Max-Atom Problem Albert Atserias Universitat Politècnica de Catalunya Barcelona, Spain Elitza Maneva Universitat Politècnica de Catalunya Barcelona, Spain February 3, 200 Abstract
More informationPartially Ordered Two-way Büchi Automata
Partially Ordered Two-way Büchi Automata Manfred Kufleitner Alexander Lauser FMI, Universität Stuttgart, Germany {kufleitner, lauser}@fmi.uni-stuttgart.de June 14, 2010 Abstract We introduce partially
More informationSelecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden
1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized
More informationOn Recognizable Languages of Infinite Pictures
On Recognizable Languages of Infinite Pictures Equipe de Logique Mathématique CNRS and Université Paris 7 JAF 28, Fontainebleau, Juin 2009 Pictures Pictures are two-dimensional words. Let Σ be a finite
More informationFrom Liveness to Promptness
From Liveness to Promptness Orna Kupferman Hebrew University Nir Piterman EPFL Moshe Y. Vardi Rice University Abstract Liveness temporal properties state that something good eventually happens, e.g., every
More information: Cryptography and Game Theory Ran Canetti and Alon Rosen. Lecture 8
0368.4170: Cryptography and Game Theory Ran Canetti and Alon Rosen Lecture 8 December 9, 2009 Scribe: Naama Ben-Aroya Last Week 2 player zero-sum games (min-max) Mixed NE (existence, complexity) ɛ-ne Correlated
More informationNash Equilibria for Reachability Objectives in Multi-player Timed Games
Nash Equilibria for Reachability Objectives in Multi-player Timed Games Patricia Bouyer, Romain Brenguier, and Nicolas Markey LSV, ENS Cachan & CNRS, France {bouyer,brenguie,markey}@lsv.ens-cachan.fr Abstract.
More informationReasoning about Strategies: From module checking to strategy logic
Reasoning about Strategies: From module checking to strategy logic based on joint works with Fabio Mogavero, Giuseppe Perelli, Luigi Sauro, and Moshe Y. Vardi Luxembourg September 23, 2013 Reasoning about
More informationAlternating Time Temporal Logics*
Alternating Time Temporal Logics* Sophie Pinchinat Visiting Research Fellow at RSISE Marie Curie Outgoing International Fellowship * @article{alur2002, title={alternating-time Temporal Logic}, author={alur,
More informationSemi-Automatic Distributed Synthesis
Semi-Automatic Distributed Synthesis Bernd Finkbeiner and Sven Schewe Universität des Saarlandes, 66123 Saarbrücken, Germany {finkbeiner schewe}@cs.uni-sb.de Abstract. We propose a sound and complete compositional
More informationarxiv: v3 [cs.gt] 10 Apr 2009
The Complexity of Nash Equilibria in Simple Stochastic Multiplayer Games Michael Ummels and Dominik Wojtczak,3 RWTH Aachen University, Germany E-Mail: ummels@logicrwth-aachende CWI, Amsterdam, The Netherlands
More informationStrategy Logic. 1 Introduction. Krishnendu Chatterjee 1, Thomas A. Henzinger 1,2, and Nir Piterman 2
Strategy Logic Krishnendu Chatterjee 1, Thomas A. Henzinger 1,2, and Nir Piterman 2 1 University of California, Berkeley, USA 2 EPFL, Switzerland c krish@eecs.berkeley.edu, {tah,nir.piterman}@epfl.ch Abstract.
More informationOn Recognizable Languages of Infinite Pictures
On Recognizable Languages of Infinite Pictures Equipe de Logique Mathématique CNRS and Université Paris 7 LIF, Marseille, Avril 2009 Pictures Pictures are two-dimensional words. Let Σ be a finite alphabet
More informationCooperative Reactive Synthesis
Cooperative Reactive Synthesis Roderick Bloem IAIK, Graz University of Technology, Graz, Austria Rüdiger Ehlers University of Bremen and DFKI GmbH, Bremen, Germany Robert Könighofer IAIK, Graz University
More informationApproximation Metrics for Discrete and Continuous Systems
University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science May 2007 Approximation Metrics for Discrete Continuous Systems Antoine Girard University
More informationGames with Imperfect Information: Theory and Algorithms
Games with Imperfect Information: Theory and Algorithms Laurent Doyen 1 and Jean-François Raskin 2 1 LSV, ENS Cachan & CNRS, France 2 Université Libre de Bruxelles (ULB), Belgium Abstract. We study observation-based
More informationDistributed Optimization. Song Chong EE, KAIST
Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links
More informationMDPs with Energy-Parity Objectives
MDPs with Energy-Parity Objectives Richard Mayr, Sven Schewe, Patrick Totzke, Dominik Wojtczak University of Edinburgh, UK University of Liverpool, UK Abstract Energy-parity objectives combine ω-regular
More informationPlaying Stochastic Games Precisely
Playing Stochastic Games Precisely Taolue Chen 1, Vojtěch Forejt 1, Marta Kwiatkowska 1, Aistis Simaitis 1, Ashutosh Trivedi 2, and Michael Ummels 3 1 Department of Computer Science, University of Oxford,
More informationarxiv: v3 [cs.fl] 30 Aug 2011
Qualitative Concurrent Stochastic Games with Imperfect Information Vincent Gripon and Olivier Serre arxiv:0902.2108v3 [cs.fl] 30 Aug 2011 LIAFA (CNRS & Université Paris Diderot Paris 7) Abstract. We study
More informationControlling probabilistic systems under partial observation an automata and verification perspective
Controlling probabilistic systems under partial observation an automata and verification perspective Nathalie Bertrand, Inria Rennes, France Uncertainty in Computation Workshop October 4th 2016, Simons
More informationSynthesis from Probabilistic Components
Synthesis from Probabilistic Components Yoad Lustig, Sumit Nain, and Moshe Y. Vardi Department of Computer Science Rice University, Houston, TX 77005, USA yoad.lustig@gmail.com, nain@cs.rice.edu, vardi@cs.rice.edu
More informationOn the Accepting Power of 2-Tape Büchi Automata
On the Accepting Power of 2-Tape Büchi Automata Equipe de Logique Mathématique Université Paris 7 STACS 2006 Acceptance of infinite words In the sixties, Acceptance of infinite words by finite automata
More informationA Framework for Automated Competitive Analysis of On-line Scheduling of Firm-Deadline Tasks
A Framework for Automated Competitive Analysis of On-line Scheduling of Firm-Deadline Tasks Krishnendu Chatterjee 1, Andreas Pavlogiannis 1, Alexander Kößler 2, Ulrich Schmid 2 1 IST Austria, 2 TU Wien
More informationan efficient procedure for the decision problem. We illustrate this phenomenon for the Satisfiability problem.
1 More on NP In this set of lecture notes, we examine the class NP in more detail. We give a characterization of NP which justifies the guess and verify paradigm, and study the complexity of solving search
More informationBounded Synthesis. Sven Schewe and Bernd Finkbeiner. Universität des Saarlandes, Saarbrücken, Germany
Bounded Synthesis Sven Schewe and Bernd Finkbeiner Universität des Saarlandes, 66123 Saarbrücken, Germany Abstract. The bounded synthesis problem is to construct an implementation that satisfies a given
More informationOn Model Checking Techniques for Randomized Distributed Systems. Christel Baier Technische Universität Dresden
On Model Checking Techniques for Randomized Distributed Systems Christel Baier Technische Universität Dresden joint work with Nathalie Bertrand Frank Ciesinski Marcus Größer / 6 biological systems, resilient
More informationSFM-11:CONNECT Summer School, Bertinoro, June 2011
SFM-:CONNECT Summer School, Bertinoro, June 20 EU-FP7: CONNECT LSCITS/PSS VERIWARE Part 3 Markov decision processes Overview Lectures and 2: Introduction 2 Discrete-time Markov chains 3 Markov decision
More informationNote on winning positions on pushdown games with omega-regular winning conditions
Note on winning positions on pushdown games with omega-regular winning conditions Olivier Serre To cite this version: Olivier Serre. Note on winning positions on pushdown games with omega-regular winning
More informationController Synthesis with Budget Constraints
Controller Synthesis with Budget Constraints Krishnendu Chatterjee 1, Rupak Majumdar 3, and Thomas A. Henzinger 1,2 1 EECS, UC Berkeley, 2 CCS, EPFL, 3 CS, UC Los Angeles Abstract. We study the controller
More informationThe Target Discounted-Sum Problem
The Target Discounted-Sum Problem Udi Boker The Interdisciplinary Center (IDC), Herzliya, Israel Email: udiboker@idc.ac.il Thomas A. Henzinger and Jan Otop IST Austria Klosterneuburg, Austria Email: {tah,
More informationSymmetric Nash Equilibria
Symmetric Nash Equilibria Steen Vester Supervisors: Patricia Bouyer-Decitre & Nicolas Markey, Laboratoire Spécification et Vérification, ENS de Cachan September 6, 2012 Summary In this report we study
More informationLogic Model Checking
Logic Model Checking Lecture Notes 10:18 Caltech 101b.2 January-March 2004 Course Text: The Spin Model Checker: Primer and Reference Manual Addison-Wesley 2003, ISBN 0-321-22862-6, 608 pgs. the assignment
More informationLimiting Behavior of Markov Chains with Eager Attractors
Limiting Behavior of Markov Chains with Eager Attractors Parosh Aziz Abdulla Uppsala University, Sweden. parosh@it.uu.se Noomene Ben Henda Uppsala University, Sweden. Noomene.BenHenda@it.uu.se Sven Sandberg
More informationReasoning about Equilibria in Game-like Concurrent Systems
Reasoning about Equilibria in Game-like Concurrent Systems Julian Gutierrez, Paul Harrenstein, Michael Wooldridge Department of Computer Science University of Oxford Abstract In this paper we study techniques
More informationTwo Views on Multiple Mean-Payoff Objectives in Markov Decision Processes
wo Views on Multiple Mean-Payoff Objectives in Markov Decision Processes omáš Brázdil Faculty of Informatics Masaryk University Brno, Czech Republic brazdil@fi.muni.cz Václav Brožek LFCS, School of Informatics
More informationThe purpose here is to classify computational problems according to their complexity. For that purpose we need first to agree on a computational
1 The purpose here is to classify computational problems according to their complexity. For that purpose we need first to agree on a computational model. We'll remind you what a Turing machine is --- you
More informationSynthesizing Robust Systems
Synthesizing Robust Systems Roderick Bloem and Karin Greimel (TU-Graz) Thomas Henzinger (EPFL and IST-Austria) Barbara Jobstmann (CNRS/Verimag) FMCAD 2009 in Austin, Texas Barbara Jobstmann 1 Motivation
More informationGames with Discrete Resources
Games with Discrete Resources Sylvain Schmitz with Th. Colcombet, J.-B. Courtois, M. Jurdziński, and R. Lazić LSV, ENS Paris-Saclay & CNRS IBISC, October 19, 217 1/12 Outline multi-dimensional energy parity
More informationProbabilistic Model Checking and Strategy Synthesis for Robot Navigation
Probabilistic Model Checking and Strategy Synthesis for Robot Navigation Dave Parker University of Birmingham (joint work with Bruno Lacerda, Nick Hawes) AIMS CDT, Oxford, May 2015 Overview Probabilistic
More information