Computing Laboratory
A GAME-BASED ABSTRACTION-REFINEMENT FRAMEWORK FOR MARKOV DECISION PROCESSES
Mark Kattenbelt, Marta Kwiatkowska, Gethin Norman, David Parker
CL-RR-08-06
Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford OX1 3QD

Abstract

In the field of model checking, abstraction refinement has proved to be an extremely successful methodology for combating the state-space explosion problem. However, little practical progress has been made in the setting of probabilistic verification. In this paper we present a novel abstraction-refinement framework for Markov decision processes (MDPs), which are widely used for modelling and verifying systems that exhibit both probabilistic and nondeterministic behaviour. Our framework comprises an abstraction approach based on stochastic two-player games, two refinement methods and an efficient algorithm for the abstraction-refinement loop. The key idea behind the abstraction approach is to maintain a separation between nondeterminism present in the original MDP and nondeterminism introduced during the abstraction process, each type being represented by a different player in the game. Crucially, this allows lower and upper bounds to be computed for the values of reachability properties of the MDP. These give a quantitative measure of the quality of the abstraction and form the basis of the corresponding refinement methods. We describe a prototype implementation of our framework and present experimental results demonstrating automatic generation of compact, yet precise, abstractions for a large selection of real-world case studies.

1 Introduction

Numerous real-life systems from a wide range of application domains, including communication and network protocols, security protocols and distributed algorithms, exhibit both probabilistic and nondeterministic behaviour, and therefore developing techniques to verify the correctness of such systems is an important research topic. Markov decision processes (MDPs) are a natural and widely used model for this purpose and the automatic verification of MDPs using probabilistic model checking has proved successful for their analysis. Despite improvements in implementations and tool support in this area, the state-space explosion problem remains a major hurdle for the practical application of these methods.

This paper is motivated by the success of abstraction-refinement techniques [3], which have been established as one of the most effective ways of attacking the state-space explosion problem in non-probabilistic model checking. The basic idea is to construct a smaller abstract model, by removing details from the concrete system not relevant to the property of interest, which is consequently easier to analyse. This is done in such a way that when the property is verified true in the abstraction it also holds in the concrete system. On the other hand, if the property does not hold in the abstraction, information from the model checking process (typically a counterexample) is used either to show that the property is false in the concrete system or to refine the abstraction. This process forms the basis of a loop which refines the abstraction until the property is shown to be true or false in the concrete system.

In the probabilistic setting, it is typically necessary to consider quantitative properties, in which case the actual probability or expectation of some event must be determined, e.g. the probability of reaching an error state within T time units. Therefore, in this setting a different notion of the correspondence between properties of the concrete and

abstract models is required. A suitable alternative, for example, would be the case where quantitative results computed from the abstraction constitute conservative bounds on the actual values for the concrete model. In fact, due to the presence of nondeterminism in an MDP there is not necessarily a single value corresponding to a given quantitative measure. Instead, best-case and worst-case scenarios are analysed. More specifically, model checking of MDPs typically reduces to computation of probabilistic reachability and expected reachability properties, namely the minimum or maximum probability of reaching a set of states, and the minimum or maximum expected reward cumulated in doing so.

When constructing an abstraction of an MDP, the resulting model will invariably exhibit a greater degree of nondeterminism caused by the uncertainty with regards to the precise behaviour of the system that the abstraction process introduces. The key idea in our abstraction approach is to maintain a distinction between the nondeterminism from the original MDP and the nondeterminism introduced during the abstraction process. To achieve this, we model abstract MDPs as stochastic two-player games [35, 4], where the two players correspond to the two different forms of nondeterminism. We can then analyse these models using techniques developed for such games [5, 3, 0]. Our analysis of these abstract models results in a separate lower and upper bound for both the minimum and maximum probabilities (or expected reward) of reaching a set of states.

This approach is particularly appealing since this information provides both a quantitative measure of the quality (or preciseness) of the abstraction and an indication of how to improve it. By comparison, if no discrimination between the two forms of nondeterminism is made, a single lower and upper bound would be obtained. Therefore, in the (common) situation where the minimum and maximum probabilities (or expected rewards) are notably different, the quantitative measure and corresponding basis for refinement would be lost. Consider, for example, the extreme case where the two-player game approach reveals that the minimum probability of reaching some set of states is in the interval [0, ε_min] and the maximum probability is in the interval [1 − ε_max, 1]. In this case, a single pair of bounds could at best establish that both the minimum and maximum probability lie within the interval [0, 1], effectively yielding no information about the utility of the abstraction or how to refine it in order to improve the bounds.

Based on the separate bounds obtained through this approach, we present two methods for automatically refining our game-based abstractions of MDPs and an efficient algorithm for an abstraction-refinement loop. In addition, using a prototype implementation of the framework, we give experimental results that demonstrate automatic generation of compact, yet precise, abstractions for a large selection of MDP case studies. For convenience, the current prototype performs model-level abstractions, first building an MDP in the probabilistic model checker PRISM [2, 32] and then reducing it to a stochastic two-player game. Our framework is also designed to function with higher-level abstraction methods such as predicate abstraction [20], which have recently been adapted to probabilistic models [38, 24]. A preliminary version of this paper, introducing the game-based abstraction approach, was published in conference proceedings as [26].

Outline of the Paper. In the next section, we present background material on Markov decision processes and stochastic two-player games required in the remainder of the paper. Section 3 describes our abstraction technique and shows its correctness. Based on this, Section 4 introduces a corresponding abstraction-refinement framework. In Section 5, we describe a prototype implementation of the techniques from Sections 3 and 4, and present experimental results for a large selection of real-world case studies. Section 6 discusses related work and Section 7 concludes the paper.

2 Background

Let R≥0 denote the set of non-negative reals. For a finite set Q, we denote by Dist(Q) the set of discrete probability distributions over Q, i.e. the set of functions µ : Q → [0, 1] such that Σ_{q∈Q} µ(q) = 1.

2.1 Markov Decision Processes

Markov decision processes (MDPs) are a natural representation for the modelling and analysis of systems with both probabilistic and nondeterministic behaviour.

Definition 1 A Markov decision process is a tuple M = (S, s_init, Steps, r), where:
- S is a set of states;
- s_init ∈ S is an initial state;
- Steps : S → 2^{Dist(S)} is a probabilistic transition function;
- r : S × Dist(S) → R≥0 is a reward function.

A probabilistic transition s →_µ s′ is made from a state s by first nondeterministically selecting a distribution µ ∈ Steps(s) and then making a probabilistic choice of target state s′ according to the distribution µ. A path of an MDP represents a particular resolution of both nondeterminism and probability. Formally, a path of an MDP is a non-empty finite or infinite sequence of probabilistic transitions:

π = s_0 →_{µ_0} s_1 →_{µ_1} s_2 →_{µ_2} ···

such that µ_i(s_{i+1}) > 0 for all i ≥ 0. We denote by π(i) the state s_i and by step(π, i) the distribution µ_i. For a finite path π_fin, we let |π_fin| denote the length of π_fin (i.e. the number of transitions) and last(π_fin) be its final state. Finally, π^{(i)} denotes the prefix of length i of π.

In contrast to a path, an adversary (sometimes also known as a scheduler or policy) represents a particular resolution of nondeterminism only. More precisely, an adversary is a function mapping every finite path π_fin to a distribution µ ∈ Steps(last(π_fin)). For any state s ∈ S and adversary A, let Path^A_fin(s) and Path^A(s) denote the sets of finite and infinite paths starting in s that correspond to A.
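
As an illustration of Definition 1, the following minimal Python sketch encodes an MDP with dictionary-based distributions; the names (Mdp, steps, reward) and the indexing of distributions by position are illustrative choices for this sketch, not notation from the paper or from PRISM.

    from typing import Dict, Hashable, List, Tuple

    State = Hashable
    Dist = Dict[State, float]   # discrete probability distribution: values sum to 1

    class Mdp:
        """Container for M = (S, s_init, Steps, r) in the sense of Definition 1."""
        def __init__(self, states: List[State], s_init: State,
                     steps: Dict[State, List[Dist]],
                     reward: Dict[Tuple[State, int], float]):
            self.states = list(states)
            self.s_init = s_init
            self.steps = steps      # Steps(s): the non-empty list of distributions in s
            self.reward = reward    # reward[(s, i)]: reward r(s, mu) of the i-th distribution

    # A tiny example: in state 0 there is a nondeterministic choice between looping
    # with probability 1 and moving to the absorbing target state 1 with probability 0.9.
    tiny = Mdp(
        states=[0, 1],
        s_init=0,
        steps={0: [{0: 1.0}, {0: 0.1, 1: 0.9}], 1: [{1: 1.0}]},
        reward={(0, 0): 1.0, (0, 1): 1.0, (1, 0): 0.0},
    )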

Definition 2 An adversary A is called memoryless (or simple) if, for any finite paths π_fin and π′_fin for which last(π_fin) = last(π′_fin), we have A(π_fin) = A(π′_fin).

The behaviour of an MDP from a state s, under a given adversary A, is purely probabilistic and is described by a probability space (Path^A(s), F^A_s, Prob^A_s) over the infinite paths corresponding to A that start in s. This can be defined in standard fashion [25]. Based on this, we introduce two quantitative measures for MDPs which together form the basis for probabilistic model checking of MDPs [, 7].

The first measure is probabilistic reachability, which refers to the probability of reaching a set of target states. For adversary A, the probability of reaching the target set F ⊆ S from state s is given by:

p^A_s(F) def= Prob^A_s{π ∈ Path^A(s) | ∃ i ∈ N . π(i) ∈ F}.

The second measure we consider is expected reachability, which refers to the expected reward accumulated before reaching a set of target states. For an adversary A, the expected reward of reaching the target set F from state s, denoted e^A_s(F), is defined as the usual expectation of the function r(F, ·) (which returns, for a given path, the total reward accumulated until a state in F is reached along the path) with respect to the probability measure Prob^A_s. More precisely:

e^A_s(F) def= ∫_{π ∈ Path^A(s)} r(F, π) dProb^A_s

where for any path π ∈ Path^A(s):

r(F, π) def= Σ_{i=1}^{min{j | π(j) ∈ F}} r(π(i−1), step(π, i−1)) if ∃ j ∈ N . π(j) ∈ F, and ∞ otherwise.

For simplicity, we have defined the reward of a path which does not reach F to be ∞, even though the total reward of the path may not be infinite. Essentially, this means that the expected reward of reaching F from s under A is finite if and only if, under the adversary A, a state in F is reached from s with probability 1. Quantifying over all adversaries, we consider both the minimum and maximum values of these measures.

Definition 3 For an MDP M = (S, s_init, Steps, r), reachability objective F ⊆ S and state s ∈ S, the minimum and maximum reachability probabilities of reaching F from s equal:

p^{min}_s(F) = inf_A p^A_s(F)  and  p^{max}_s(F) = sup_A p^A_s(F)

and the minimum and maximum expected rewards of reaching F from s equal:

e^{min}_s(F) = inf_A e^A_s(F)  and  e^{max}_s(F) = sup_A e^A_s(F).
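
The function r(F, ·) above can be evaluated incrementally along a path. The following short Python sketch (reusing the hypothetical Mdp encoding from the earlier sketch, with paths given as lists of (state, choice index) pairs) applies the same convention: accumulation stops at the first visit to F, and a path that never enters F is assigned infinite reward.

    import math

    def path_reward(reward, steps_taken, last_state, target):
        """r(F, pi) for a path prefix: reward accumulated before the first visit
        to the target set F, and infinity if F is never entered."""
        states = [s for s, _ in steps_taken] + [last_state]
        total = 0.0
        for k, s in enumerate(states):
            if s in target:
                return total                 # accumulation stops once F is reached
            if k < len(steps_taken):
                total += reward[steps_taken[k]]
        return math.inf                      # F not reached along this path

    # For example, path_reward(tiny.reward, [(0, 1)], 1, {1}) evaluates to 1.0.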

Computing values for probabilistic and expected reachability reduces to the stochastic shortest path problem for Markov decision processes; see for example [8, 2]. A key result in this respect is that optimality with respect to probabilistic and expected reachability can always be achieved with memoryless adversaries (see Definition 2). A consequence of this is that these quantities can be computed through an iterative process known as value iteration [8, 2], the basis of which is given in the lemma below.

Lemma 2.1 Consider an MDP M = (S, s_init, Steps, r) and a set of target states F ⊆ S. The sequences of vectors ⟨p^{min}_n⟩_{n∈N} and ⟨p^{max}_n⟩_{n∈N} converge to the minimum and maximum probability of reaching the target set F, where for any state s ∈ S:
- if s ∈ F, then p^{min}_n(s) = p^{max}_n(s) = 1 for all n ∈ N;
- if s ∉ F, then:

  p^{min}_n(s) = 0 if n = 0, and min_{µ∈Steps(s)} Σ_{s′∈S} µ(s′)·p^{min}_{n−1}(s′) otherwise
  p^{max}_n(s) = 0 if n = 0, and max_{µ∈Steps(s)} Σ_{s′∈S} µ(s′)·p^{max}_{n−1}(s′) otherwise.

The sequences of vectors ⟨e^{min}_n⟩_{n∈N} and ⟨e^{max}_n⟩_{n∈N} converge to the minimum and maximum expected reward of reaching the target set F, where for any state s ∈ S:
- if s ∈ F, then e^{min}_n(s) = e^{max}_n(s) = 0 for all n ∈ N;
- if s ∉ F, then:

  e^{min}_n(s) = ∞ if p^{max}_s(F) < 1; 0 if p^{max}_s(F) = 1 and n = 0; and min_{µ∈Steps(s)} ( r(s, µ) + Σ_{s′∈S} µ(s′)·e^{min}_{n−1}(s′) ) otherwise
  e^{max}_n(s) = ∞ if p^{min}_s(F) < 1; 0 if p^{min}_s(F) = 1 and n = 0; and max_{µ∈Steps(s)} ( r(s, µ) + Σ_{s′∈S} µ(s′)·e^{max}_{n−1}(s′) ) otherwise.

2.2 Stochastic Two-Player Games

In this section, we review (simple) stochastic games [35, 4], which are turn-based games involving two players and chance.

Definition 4 A stochastic two-player game is a tuple G = ((V, E), v_init, (V_1, V_2, V_p), δ, r) where:
- (V, E) is a finite directed graph;
- v_init ∈ V is an initial vertex;

- (V_1, V_2, V_p) is a partition of V;
- δ : V_p → Dist(V) is the probabilistic transition function;
- r : E → R≥0 is a reward function over edges.

Vertices in V_1, V_2 and V_p are called player 1, player 2 and probabilistic vertices, respectively. The game operates as follows. Initially, a token is placed on the starting vertex v_init. At each step of the game, the token moves from its current vertex v to a neighbouring vertex v′ in the game graph. The choice of v′ depends on the type of the vertex v. If v ∈ V_1 then player 1 chooses v′, if v ∈ V_2 then player 2 makes the choice, and if v ∈ V_p then v′ is selected randomly according to the distribution δ(v). A Markov decision process can be thought of as a stochastic game in which there are no player 2 vertices and where there is a strict alternation between player 1 and probabilistic vertices.

A play in the game G is a sequence of vertices ω = v_0 v_1 v_2 ... such that (v_i, v_{i+1}) ∈ E for all i ∈ N. We denote by ω(i) the vertex v_i and, for a finite play ω_fin, we write last(ω_fin) for the final vertex of ω_fin and |ω_fin| for its length (the number of transitions). The prefix of length i of play ω is denoted ω^{(i)}.

A strategy for player 1 is a function σ_1 : V*·V_1 → Dist(V), i.e. a function from the set of finite plays ending in a player 1 vertex to the set of distributions over vertices, such that for any ω_fin ∈ V*·V_1 and v ∈ V, if σ_1(ω_fin)(v) > 0, then (last(ω_fin), v) ∈ E. Strategies for player 2, denoted by σ_2, are defined analogously. For a fixed pair of strategies (σ_1, σ_2) we denote by Play^{σ_1,σ_2}_fin(v) and Play^{σ_1,σ_2}(v) the sets of finite and infinite plays starting in vertex v that correspond to these strategies. For a strategy pair (σ_1, σ_2), the behaviour of the game is completely random and, for any vertex v, we can construct a probability space (Play^{σ_1,σ_2}(v), F^{σ_1,σ_2}_v, Prob^{σ_1,σ_2}_v). This construction proceeds similarly to MDPs [25].

A reachability objective of a game G is a set of vertices F ⊆ V which player 1 attempts to reach. For a fixed strategy pair (σ_1, σ_2) and vertex v ∈ V we define both the probability and expected reward corresponding to the reachability objective F as:

p^{σ_1,σ_2}_v(F) def= Prob^{σ_1,σ_2}_v{ω ∈ Play^{σ_1,σ_2}(v) | ∃ i ∈ N . ω(i) ∈ F}
e^{σ_1,σ_2}_v(F) def= ∫_{ω ∈ Play^{σ_1,σ_2}(v)} r(F, ω) dProb^{σ_1,σ_2}_v

where for any play ω ∈ Play^{σ_1,σ_2}(v):

r(F, ω) def= Σ_{i=1}^{min{j | ω(j) ∈ F}} r(ω(i−1), ω(i)) if ∃ j ∈ N . ω(j) ∈ F, and ∞ otherwise.

Definition 5 For a game G = ((V, E), v_init, (V_1, V_2, V_p), δ, r), reachability objective F ⊆ V and vertex v ∈ V, the optimal probabilities of the game for player 1 and player 2, with respect to F and v, are defined as follows:

sup_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F)  and  sup_{σ_2} inf_{σ_1} p^{σ_1,σ_2}_v(F)

and the optimal expected rewards are:

sup_{σ_1} inf_{σ_2} e^{σ_1,σ_2}_v(F)  and  sup_{σ_2} inf_{σ_1} e^{σ_1,σ_2}_v(F).

A player 1 strategy σ_1 is optimal from vertex v with respect to the probability of the objective if:

inf_{σ_2} p^{σ_1,σ_2}_v(F) = sup_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F).   (1)

The optimal strategies for player 2 and for expected rewards can be defined analogously. We now summarise a number of results from [0, 4, 5].

Definition 6 A strategy σ_i is pure if it does not use randomisation, that is, for any finite play ω_fin such that last(ω_fin) ∈ V_i, there exists v ∈ V such that σ_i(ω_fin)(v) = 1. A strategy σ_i is memoryless if its choice depends only on the current vertex, that is, σ_i(ω_fin) = σ_i(ω′_fin) for any finite plays ω_fin and ω′_fin such that last(ω_fin) = last(ω′_fin).

Similarly to MDPs, for any stochastic game, the family of pure memoryless strategies suffices for optimality with respect to reachability objectives.

Lemma 2.2 Consider a stochastic game G = ((V, E), v_init, (V_1, V_2, V_p), δ, r) and a set of target vertices F ⊆ V. The sequence of vectors ⟨p_n⟩_{n∈N} converges to the optimal probabilities for player 1 with respect to the reachability objective F, where for any vertex v ∈ V:
- if v ∈ F, then p_n(v) = 1 for all n ∈ N; and otherwise:

  p_n(v) = 0 if n = 0; max_{(v,v′)∈E} p_{n−1}(v′) if n > 0 and v ∈ V_1; min_{(v,v′)∈E} p_{n−1}(v′) if n > 0 and v ∈ V_2; Σ_{v′∈V} δ(v)(v′)·p_{n−1}(v′) if n > 0 and v ∈ V_p.

The sequence of vectors ⟨e_n⟩_{n∈N} converges to the optimal expected rewards for player 1 with respect to the reachability objective F, where for any vertex v ∈ V:
- if v ∈ F, then e_n(v) = 0 for all n ∈ N;
- if sup_{σ_2} inf_{σ_1} p^{σ_1,σ_2}_v(F) < 1, then e_n(v) = ∞ for all n ∈ N; and otherwise:

  e_n(v) = 0 if n = 0; max_{(v,v′)∈E} ( r(v, v′) + e_{n−1}(v′) ) if n > 0 and v ∈ V_1; min_{(v,v′)∈E} ( r(v, v′) + e_{n−1}(v′) ) if n > 0 and v ∈ V_2; Σ_{v′∈V} δ(v)(v′)·( r(v, v′) + e_{n−1}(v′) ) if n > 0 and v ∈ V_p.

Lemma 2.2 forms the basis of an iterative method to compute the vector of optimal values for a game. Note that although this concerns only the optimal probability for player 1, similar results hold for player 2. Observe the similarity between this and the value iteration method for MDP solution described in Lemma 2.1.
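
The iteration of Lemma 2.2 translates almost directly into code. The following Python sketch computes, for a game encoded by successor maps and a distribution map, the optimal reachability probabilities for player 1 (player 1 maximising, player 2 minimising); Lemma 2.1 is recovered as the special case with no player 2 vertices. The dictionary-based encoding, the fixed number of sweeps and the assumption that every vertex appears as a key of one of the three maps (or in the target set) are simplifications made for this sketch only.

    from typing import Dict, Hashable, List, Set

    Vertex = Hashable

    def game_value_iteration(v1_succs: Dict[Vertex, List[Vertex]],
                             v2_succs: Dict[Vertex, List[Vertex]],
                             delta: Dict[Vertex, Dict[Vertex, float]],
                             target: Set[Vertex],
                             sweeps: int = 1000) -> Dict[Vertex, float]:
        """Value iteration for sup_{sigma_1} inf_{sigma_2} reachability probabilities."""
        vertices = set(v1_succs) | set(v2_succs) | set(delta) | set(target)
        p = {v: (1.0 if v in target else 0.0) for v in vertices}
        for _ in range(sweeps):
            new_p = {}
            for v in vertices:
                if v in target:
                    new_p[v] = 1.0
                elif v in v1_succs:                    # player 1 vertex: maximise
                    new_p[v] = max(p[w] for w in v1_succs[v])
                elif v in v2_succs:                    # player 2 vertex: minimise
                    new_p[v] = min(p[w] for w in v2_succs[v])
                else:                                  # probabilistic vertex
                    new_p[v] = sum(q * p[w] for w, q in delta[v].items())
            p = new_p
        return p

In practice one would iterate until successive vectors differ by less than some tolerance, as in the value iteration algorithm of Figure 5(b) later in the paper, rather than for a fixed number of sweeps.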

3 Abstraction for Markov Decision Processes

We now present our notion of abstraction for MDPs. As described in Section 1, the abstract version of a concrete MDP takes the form of a stochastic two-player game where the choices made by player 2 correspond to the nondeterminism in the original MDP and the choices made by player 1 correspond to the nondeterminism introduced by the abstraction process.

An abstract MDP is a stochastic two-player game, defined by an MDP M = (S, s_init, Steps, r) and a partition P = {S_1, S_2, ..., S_n} of its state space S. The elements of P are referred to as abstract states, each comprising a set of concrete states. Intuitively, player 1 vertices in the game are abstract states and player 2 vertices correspond to the sets of nondeterministic choices in individual concrete states. In the following, for any distribution µ over S, we denote by µ̄ the probability distribution over P lifted from µ, i.e. µ̄(S_i) = Σ_{s∈S_i} µ(s) for all S_i ∈ P. Similarly, we denote by Steps̄(s) the set {(r(s, µ), µ̄) | µ ∈ Steps(s)}. The inclusion of rewards in the definition of Steps̄(s) ensures that the reward values associated with probability distributions in the concrete model are preserved, even in the case where multiple (concrete) distributions are lifted to the same distribution over abstract states. This differs from [26] where, to simplify the presentation, we assumed that distributions that were lifted to the same abstract distribution had identical reward values.

Definition 7 Given an MDP M = (S, s_init, Steps, r) and a partition P of the state space S, we define the corresponding abstract MDP as the stochastic game

G^P_M = ((V, E), v_init, (V_1, V_2, V_p), δ, r)

in which:
- V_1 = P;
- V_2 = { v_2 ⊆ R≥0 × Dist(P) | v_2 = Steps̄(s) for some s ∈ S };
- V_p = { v_p ∈ R≥0 × Dist(P) | v_p ∈ Steps̄(s) for some s ∈ S };
- v_init = S_i where s_init ∈ S_i;
- (v, v′) ∈ E if and only if one of the following conditions holds:
  - v ∈ V_1, v′ ∈ V_2 and v′ = Steps̄(s) for some s ∈ v;
  - v ∈ V_2, v′ ∈ V_p and v′ ∈ v;
  - v ∈ V_p, v′ ∈ V_1 and v = (r, µ̄) such that µ̄(v′) > 0;
- δ : V_p → Dist(V_1) projects (r, µ̄) ∈ V_p onto µ̄;
- r : E → R≥0 is the reward function such that for any (v, v′) ∈ V × V: r(v, v′) = r if (v, v′) ∈ V_2 × V_p and v′ = (r, µ̄) for some µ̄ ∈ Dist(P), and r(v, v′) = 0 otherwise.
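
The construction of Definition 7 amounts to lifting each concrete state's reward-distribution pairs to the abstract state space and grouping them by concrete state. The following Python sketch (built on the hypothetical Mdp encoding used earlier, with the partition given as a collection of frozensets) computes, for every abstract state, its set of player 2 vertices; the edges from probabilistic vertices back to abstract states are implicit in the lifted distributions, and probabilities are rounded only so that equal lifted distributions compare equal. It is a sketch of the construction, not the paper's implementation.

    from typing import Dict, FrozenSet, Tuple

    def lift(dist: Dict, block_of: Dict) -> Tuple:
        """Lift a distribution over concrete states to the abstract state space,
        returned as a canonical tuple of (block, probability) pairs."""
        lifted: Dict = {}
        for s, p in dist.items():
            b = block_of[s]
            lifted[b] = lifted.get(b, 0.0) + p
        return tuple(sorted(((b, round(q, 12)) for b, q in lifted.items()),
                            key=lambda bp: str(bp[0])))

    def build_abstraction(mdp, partition):
        """For each block (player 1 vertex), collect its player 2 vertices: the sets
        of lifted (reward, distribution) pairs of the concrete states it contains."""
        block_of = {s: block for block in partition for s in block}
        game: Dict[FrozenSet, set] = {}
        for block in partition:
            succs = set()
            for s in block:
                lifted_steps = frozenset(
                    (mdp.reward[(s, i)], lift(mu, block_of))
                    for i, mu in enumerate(mdp.steps[s]))
                succs.add(lifted_steps)        # one player 2 vertex per concrete state
            game[block] = succs                # identical vertices collapse in the set
        return game

For instance, build_abstraction(tiny, [frozenset({0}), frozenset({1})]) yields one player 2 vertex for each of the two abstract states of the tiny MDP sketched earlier.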

Figure 1: Simple example of the abstraction process for Markov decision processes: (a) concrete model (MDP); (b) abstraction (stochastic game).

Example 3.1 We illustrate the abstraction process on a simple MDP, shown in Figure 1(a), and the partition P = {{s_0}, {s_1, s_2}, {s_4, s_5}, {s_3, s_6}}. The abstract MDP is given in Figure 1(b): shapes outlined in thick black lines are player 1 vertices and each one's successor player 2 vertices are outlined with grey lines and enclosed within it. Successor probabilistic vertices (distributions over player 1 vertices) are represented by sets of grouped, labelled arrows emanating from each player 2 vertex. The states of the original MDP, sets of which comprise player 1 and player 2 vertices, are also depicted.

Intuitively, the roles of the vertices and players in the abstract MDP can be understood as follows. A player 1 vertex corresponds to an abstract state: an element of the partition of the states from the original MDP. Player 1 chooses a concrete state from this set and then player 2 chooses a probability distribution from those available in the concrete state (which is now a distribution over abstract states rather than concrete states).

This description, and Example 3.1 (see Figure 1), perhaps give the impression that the abstraction does not reduce the size of the model. In fact this is generally not the case. Firstly, note that V_2 vertices are actually sets of reward-distribution pairs that correspond to concrete states, not the concrete states themselves. Hence, concrete states with the same outgoing distributions and reward values are collapsed into one player 2 state. In fact, there is an even greater reduction since these outgoing distributions have now been lifted to the (smaller) abstract state space. Furthermore, in practice there is no need to store the entire vertex set V of the abstract MDP. Since we have a strict alternation between V_1, V_2 and V_p vertices, we need only store the vertices in V_1, the outgoing transitions comprising the reward-distribution pairs, and how these pairs are grouped (into elements of V_2). Later, in Example 3.5 and Section 5 we will show how, on all the case studies considered, the abstraction process brings a significant reduction in model size.

3.1 Analysis of the Abstract MDP

In this section we describe how, for an MDP M and state partition P, the abstract MDP G^P_M yields lower and upper bounds for probabilistic reachability and expected reachability properties of M.

We also show that a finer partition results in a more precise abstraction, i.e. an abstract MDP for which the lower and upper bounds are tighter. The notion of a finer partition is defined as follows.

Definition 8 If P and P′ are partitions of a state space S, then P is finer than P′ (denoted P ⊑ P′) if, for any v ∈ P, there exists v′ ∈ P′ such that v ⊆ v′. Equivalently, we say that P′ is coarser than P.

To ease notation we let, for x ∈ {p, e}:

x^{lb,min}_v(F) def= inf_{σ_1,σ_2} x^{σ_1,σ_2}_v(F)
x^{ub,min}_v(F) def= sup_{σ_1} inf_{σ_2} x^{σ_1,σ_2}_v(F)
x^{lb,max}_v(F) def= sup_{σ_2} inf_{σ_1} x^{σ_1,σ_2}_v(F)
x^{ub,max}_v(F) def= sup_{σ_1,σ_2} x^{σ_1,σ_2}_v(F)

The values x^{ub,min}_v(F) and x^{lb,max}_v(F) are the optimal reachability probabilities (if x = p) and expected rewards (if x = e) for players 1 and 2, respectively. As we have seen in Lemma 2.2, these optimal values can be computed using an iterative process. This process can also be used to generate a pair of (memoryless) strategies that result in these optimal values [5]. The other values, x^{lb,min}_v(F) and x^{ub,max}_v(F), although not usually considered for games (because the players co-operate), can be computed similarly by considering the game as an MDP [8] (see Lemma 2.1).

The following theorem demonstrates that a finer partition results in a more precise abstraction, i.e. an abstract MDP for which the lower and upper bounds are tighter.

Theorem 3.2 Let M = (S, s_init, Steps, r) be an MDP, P and P′ partitions of S such that P ⊑ P′ and F ∈ P ∩ P′ a set of target states. If G^P_M = ((V, E), v_init, (V_1, V_2, V_p), δ, r) and G^{P′}_M = ((V′, E′), v′_init, (V′_1, V′_2, V′_p), δ′, r′) are the stochastic games constructed from P and P′ according to Definition 7, then for any x ∈ {e, p} and v ∈ P:

x^{lb,min}_{v′}(F̄) ≤ x^{lb,min}_v(F̄)  and  x^{ub,min}_v(F̄) ≤ x^{ub,min}_{v′}(F̄)
x^{lb,max}_{v′}(F̄) ≤ x^{lb,max}_v(F̄)  and  x^{ub,max}_v(F̄) ≤ x^{ub,max}_{v′}(F̄)

where v′ is the unique element of P′ such that v ⊆ v′ and F̄ = {F}.

The following theorem shows how an analysis of the abstract MDP G^P_M yields lower and upper bounds for probabilistic reachability and expected reachability properties of the MDP M from which it was constructed.

Theorem 3.3 Let M = (S, s_init, Steps, r) be an MDP, P a partition of S and F ∈ P a set of target states. If G^P_M = ((V, E), v_init, (V_1, V_2, V_p), δ, r) is the stochastic game constructed from M and P according to Definition 7, then for any x ∈ {e, p} and s ∈ S:

x^{lb,min}_v(F̄) ≤ x^{min}_s(F) ≤ x^{ub,min}_v(F̄)
x^{lb,max}_v(F̄) ≤ x^{max}_s(F) ≤ x^{ub,max}_v(F̄)

where v is the unique vertex of V_1 such that s ∈ v and F̄ = {F}.
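
The four quantities defined above can all be obtained from the same value-iteration scheme by choosing, independently for player 1 and player 2 vertices, whether the minimum or the maximum over successors is taken. The following Python sketch makes this explicit; it assumes the same dictionary-based game encoding as the earlier value-iteration sketch and, unlike the algorithms used in the paper, does not extract the corresponding optimal strategies.

    def game_bounds(v1_succs, v2_succs, delta, target, sweeps=1000):
        """Lower/upper bounds on minimum and maximum reachability probabilities,
        obtained by fixing the operator used at player 1 and player 2 vertices."""
        def solve(op1, op2):
            vertices = set(v1_succs) | set(v2_succs) | set(delta) | set(target)
            p = {v: (1.0 if v in target else 0.0) for v in vertices}
            for _ in range(sweeps):
                new_p = {}
                for v in vertices:
                    if v in target:
                        new_p[v] = 1.0
                    elif v in v1_succs:
                        new_p[v] = op1(p[w] for w in v1_succs[v])
                    elif v in v2_succs:
                        new_p[v] = op2(p[w] for w in v2_succs[v])
                    else:
                        new_p[v] = sum(q * p[w] for w, q in delta[v].items())
                p = new_p
            return p

        return {
            ("lb", "min"): solve(min, min),   # inf over both players
            ("ub", "min"): solve(max, min),   # player 1 maximises, player 2 minimises
            ("lb", "max"): solve(min, max),   # player 2 maximises, player 1 minimises
            ("ub", "max"): solve(max, max),   # sup over both players
        }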

3.2 Examples

In this section, we present examples illustrating the application of the results given above. Then, in the following section, we provide proofs of Theorems 3.2 and 3.3.

Example 3.4 Let us return to the previous simple example (see Example 3.1 and Figure 1). Suppose that we are interested in the probability in the original MDP (Figure 1(a)) of, starting from state s_0, reaching the target set {s_4, s_5}. The minimum and maximum reachability probabilities can be computed as 15/19 (0.789473) and 18/19 (0.947368) respectively. From the abstraction shown in Figure 1(b) and the results of Theorem 3.3, we can establish that the minimum and maximum probabilities lie within the intervals [7/10, 8/9] ([0.7, 0.888889]) and [8/9, 1] ([0.888889, 1]) respectively. If, on the other hand, the abstract model had instead been constructed as an MDP, i.e. with no discrimination between the two forms of nondeterminism, we would only have been able to determine that the minimum and maximum reachability probabilities both lay in the interval [0.7, 1].

Example 3.5 To illustrate the application of our abstraction to a larger MDP, we consider a more complex example: a model of the Zeroconf protocol [2] for dynamic self-configuration of local IP addresses within a local network. Zeroconf provides a distributed, plug and play approach to IP address configuration, managed by the individual devices of the network. The model concerns the situation where a new device joins a network of N existing hosts, in which there are a total of M IP addresses available. For full details of the MDP model of the protocol and the partition used in the abstraction process, see [26].

Table 1 shows the size of the concrete model (MDP) and its abstraction (game) for a range of values of N and M. For each model, we compute the minimum and maximum expected time for a host to complete the protocol (i.e. to configure and start using an IP address). Table 1 shows the exact results, obtained from the MDP, and the lower and upper bounds, obtained from the abstraction. In addition, Figure 2 illustrates results for the minimum probability that the new host configures successfully by time T.

As stated previously, an advantage of our approach is the ability to quantify the utility of the abstraction, based on the difference between the lower and upper bounds obtained. In the case of the plots in Figure 2, for a particular time bound T this difference is indicated by the vertical distance between the curves for the lower and upper bounds at the point T on the horizontal axis. Examining these differences between bounds for the presented results, it can be seen that our abstraction approach yields tight approximations for the performance characteristics of the protocol while at the same time producing a significant reduction in both the number of states and transitions. Observe also that the size of the abstract model increases linearly, rather than exponentially, in N and is independent of M.

Table 1: Zeroconf protocol (Example 3.5): model statistics and numerical results (expected completion time)

    N   M    States (transitions)                       Minimum expected time          Maximum expected time
             Concrete system       Abstraction          lb       Exact    ub           lb       Exact    ub
    4   32   26,2 (50,624)         737 (,594)           8.088    8.572    8.290        8.23     8.2465   8.3047
    5   32   58,497 (39,04)        785 (,678)           8.40     8.2035   8.2839       8.595    8.383    8.3950
    6   32   45,80 (432,944)       833 (,762)           8.758    8.2533   8.3538       8.988    8.3956   8.4922
    7   32   220,53 (64,976)       857 (,806)           8.23     8.2939   8.4293       8.242    8.4579   8.5972
    8   32   432,85 (,254,480)     88 (,850)            8.2538   8.3379   8.50         8.287    8.5254   8.709
    4   64   50,377 (98,080)       737 (,594)           8.0508   8.0733   8.022        8.0574   8.50     8.422
    5   64   3,27 (270,272)        785 (,678)           8.0645   8.093    8.299        8.0730   8.457    8.808
    6   64   282,85 (839,824)      833 (,762)           8.0788   8.36     8.586        8.089    8.774    8.2206
    7   64   426,529 (,89,792)     857 (,806)           8.0935   8.289    8.883        8.058    8.2008   8.269
    8   64   838,905 (2,439,600)   88 (,850)            8.088    8.448    8.290        8.23     8.2252   8.3047

Figure 2: Zeroconf protocol (Example 3.5): minimum probability the new host configures successfully by time T. Each plot shows the lower bound, actual value and upper bound for the probability of not being configured by time T (seconds): (a) N=8 and M=32; (b) N=8 and M=64.

3.3 Correctness of the Abstraction

We now prove Theorems 3.2 and 3.3, presented in Section 3.1. Before doing so, we require a number of preliminary concepts. For the remainder of this section we fix

an MDP M = (S, s_init, Steps, r), partitions P and P′ of S such that P ⊑ P′ and a set of target states F ∈ P ∩ P′. Let G^P_M = ((V, E), v_init, (V_1, V_2, V_p), δ, r) and G^{P′}_M = ((V′, E′), v′_init, (V′_1, V′_2, V′_p), δ′, r′) be the abstract MDPs obtained from P and P′ according to Definition 7 and let F̄ = {F}.

To simplify notation, we will add subscripts to distributions µ̄ and sets of reward-distribution pairs Steps̄(s), indicating the partition that was used to lift them from the concrete to the abstract state space, e.g. for partition P, µ ∈ Dist(S), v ∈ P and s ∈ S, we have µ̄_P(v) = Σ_{s∈v} µ(s) and Steps̄(s)_P = {(r(s, µ), µ̄_P) | µ ∈ Steps(s)}.

We next define a mapping from vertices of G^P_M to the vertices of G^{P′}_M.

Definition 9 Let [·] : V → V′ be the mapping from vertices of G^P_M to vertices of G^{P′}_M such that for any v ∈ V:
- if v ∈ P, then [v] = v′ where v′ is the unique element of P′ such that v ⊆ v′ (such a v′ exists from the fact that P ⊑ P′);
- if v = Steps̄(s)_P for some s ∈ S, then [v] = Steps̄(s)_{P′};

- if v = (r, µ̄_P) for some r ∈ R≥0 and µ ∈ Dist(S), then [v] = (r, µ̄_{P′}).

By construction, this mapping extends naturally to plays, i.e. for any play ω = v_0 v_1 v_2 ... of G^P_M we have that [ω] = [v_0][v_1][v_2]... is a play of G^{P′}_M. We denote by [·]⁻¹ the inverse mapping from plays of G^{P′}_M to the plays of G^P_M, i.e. for any finite play ω′ of G^{P′}_M we have [ω′]⁻¹ = {ω ∈ V* | [ω] = ω′}.

Lemma 3.6 For any play ω of G^P_M, it holds that r(F̄, ω) = r′(F̄, [ω]).

Proof. The proof follows directly from Definition 9 and since we assume that F is an element of both P and P′.

Using this mapping between plays, for any player 1 vertex v and strategy pair σ̄ = (σ_1, σ_2) of the game G^P_M, we now construct a (randomised) strategy pair [σ̄] = ([σ_1], [σ_2]) of the game G^{P′}_M such that, under σ̄ and [σ̄], when starting at v and [v] respectively, the probability of reaching F̄ is equal. For any finite play ω′ of G^{P′}_M such that ω′(0) = [v] and last(ω′) ∈ V′_i, let the probability of [σ_i] selecting vertex v′ ∈ V′ after play ω′ equal:

[σ_i](ω′)(v′) def= Prob^{σ̄}_v([ω′v′]⁻¹_v) / Prob^{σ̄}_v([ω′]⁻¹_v)

where [ω′]⁻¹_v = {ω ∈ [ω′]⁻¹ | ω(0) = v} for any play ω′ of G^{P′}_M and Prob^{σ̄}_v(Ω) = Prob^{σ̄}_v{ω ∈ Play^{σ̄}(v) | ω^{(i)} ∈ Ω for some i ∈ N} for any set of finite plays Ω of G^P_M.

Proposition 3.7 For any vertex v and strategy pair σ̄ = (σ_1, σ_2) of G^P_M, if Ω is a measurable set of plays of Play^{[σ̄]}([v]), then Prob^{[σ̄]}_{[v]}(Ω) = Prob^{σ̄}_v([Ω]⁻¹_v).

Proof. From the measure construction (see [25]) it is sufficient to show that Prob^{[σ̄]}_{[v]}(ω′) = Prob^{σ̄}_v([ω′]⁻¹_v) for any finite play ω′ of G^{P′}_M and vertex v with [v] = ω′(0), which we now prove by induction on the length of the play ω′.

If ω′ consists of a single vertex, then since [ω′]⁻¹_v = {v}, by the measure construction [25]: Prob^{[σ̄]}_{[v]}(ω′) = Prob^{σ̄}_v([ω′]⁻¹_v) = 1.

Next, suppose by induction that the lemma holds for all plays of length n. Consider any play ω′ of length n+1 of G^{P′}_M. By definition ω′ is of the form ω̃′ṽ′ for some play ω̃′ of length n and vertex ṽ′ ∈ V′. If last(ω̃′) ∈ V′_i (for i = 1, 2), then from the measure construction (see [25]):

Prob^{[σ̄]}_{[v]}(ω′) = Prob^{[σ̄]}_{[v]}(ω̃′) · [σ_i](ω̃′)(ṽ′)
 = Prob^{σ̄}_v([ω̃′]⁻¹_v) · [σ_i](ω̃′)(ṽ′)   (by induction)
 = Prob^{σ̄}_v([ω̃′]⁻¹_v) · Prob^{σ̄}_v([ω̃′ṽ′]⁻¹_v) / Prob^{σ̄}_v([ω̃′]⁻¹_v)   (by definition of [σ_i])
 = Prob^{σ̄}_v([ω̃′ṽ′]⁻¹_v)   (rearranging)
 = Prob^{σ̄}_v([ω′]⁻¹_v)   (since ω′ = ω̃′ṽ′).

On the other hand, if last(ω̃′) ∈ V′_p, then last(ω̃′) = (r, µ̄_{P′}) for some r ∈ R≥0 and µ ∈ Dist(S), and hence in this case, from the measure construction [25] we have:

Prob^{[σ̄]}_{[v]}(ω′) = Prob^{[σ̄]}_{[v]}(ω̃′) · µ̄_{P′}(ṽ′)
 = Prob^{σ̄}_v([ω̃′]⁻¹_v) · µ̄_{P′}(ṽ′)   (by induction)
 = Prob^{σ̄}_v([ω̃′]⁻¹_v) · ( Σ_{ṽ ∈ [ṽ′]⁻¹} µ̄_P(ṽ) )   (by Definition 9)
 = Σ_{ṽ ∈ [ṽ′]⁻¹} Prob^{σ̄}_v([ω̃′]⁻¹_v) · µ̄_P(ṽ)   (rearranging)
 = Σ_{ṽ ∈ [ṽ′]⁻¹} Prob^{σ̄}_v([ω̃′]⁻¹_v ṽ)   (by definition of Prob^{σ̄}_v)
 = Prob^{σ̄}_v([ω′]⁻¹_v)   (by Definition 9 and since ω′ = ω̃′ṽ′).

Since these are all the cases to consider, the lemma holds by induction.

Finally, we require the following lemma and a classical result from measure theory.

Lemma 3.8 For any vertex v and strategy pair σ̄ = (σ_1, σ_2) of G^P_M, the mapping [·] of Definition 9 is a measurable function between (Play^{σ̄}(v), F^{σ̄}_v) and (Play^{[σ̄]}([v]), F^{[σ̄]}_{[v]}).

Proof. The proof follows from the measure construction (see [25]).

Theorem 3.9 ([9]) Let (Ω, F) and (Ω′, F′) be measurable spaces, and suppose that Pr is a measure on (Ω, F) and the function T : Ω → Ω′ is measurable. If f is a real non-negative measurable function on (Ω′, F′), then:

∫_{ω∈Ω} f(Tω) dPr = ∫_{ω′∈Ω′} f(ω′) dPrT⁻¹.

We are now in a position to give the proofs of Theorems 3.2 and 3.3.

Proof (of Theorem 3.2). The proof can be reduced to showing that for any pair of strategies σ̄ = (σ_1, σ_2) and player 1 vertex v of G^P_M:

p^{σ̄}_v(F̄) = p^{[σ̄]}_{[v]}(F̄)  and  e^{σ̄}_v(F̄) = e^{[σ̄]}_{[v]}(F̄)

where [σ̄] is the strategy pair of the game G^{P′}_M constructed from σ̄ and v as described above. The first equation follows from Proposition 3.7 using standard results from measure theory. For the second equation, by definition of the expected rewards for games

(see Section 2.2), we have:

e^{σ̄}_v(F̄) = ∫_{ω ∈ Play^{σ̄}(v)} r(F̄, ω) dProb^{σ̄}_v
 = ∫_{ω ∈ Play^{σ̄}(v)} r′(F̄, [ω]) dProb^{σ̄}_v   (by Lemma 3.6)
 = ∫_{ω′ ∈ Play^{[σ̄]}([v])} r′(F̄, ω′) dProb^{σ̄}_v([·]⁻¹)   (by Theorem 3.9 and Lemma 3.8)
 = ∫_{ω′ ∈ Play^{[σ̄]}([v])} r′(F̄, ω′) dProb^{[σ̄]}_{[v]}   (by Proposition 3.7)
 = e^{[σ̄]}_{[v]}(F̄)

as required.

Proof (of Theorem 3.3). The proof follows from Theorem 3.2 together with the following facts.
- The partition P = {{s} | s ∈ S} is finer than all other partitions.
- If G^P_M is constructed from the partition P = {{s} | s ∈ S} according to Definition 7, then:

  x^{lb,min}_{{s}}(F̄) = x^{min}_s(F) = x^{ub,min}_{{s}}(F̄)
  x^{lb,max}_{{s}}(F̄) = x^{max}_s(F) = x^{ub,max}_{{s}}(F̄)

  for all s ∈ S and x ∈ {e, p}.

4 An Abstraction-Refinement Framework for Markov Decision Processes

We now present a framework for automatic abstraction-refinement of MDPs which uses the game-based abstraction method of Section 3. Unlike in the non-probabilistic case, probabilistic verification is typically quantitative in nature: probabilistic model checking of MDPs essentially involves computing minimum or maximum probabilistic or expected reachability measures. The construction and analysis of a game-based abstraction of an MDP yields lower and upper bounds for these values (plus corresponding strategies). The difference between these bounds can be seen as a measure of the quality of the abstraction. If this difference is too high (say, above some tolerated error ε), then it is desirable to refine the abstraction to reduce the difference. This provides the basis for a quantitative analogue of the counterexample-guided abstraction-refinement process, illustrated in Figure 3: starting with a simple, coarse abstraction, we refine the abstraction until the difference between the bounds is below ε. In fact we can consider different criteria for termination of the process: for example checking that the difference is below ε for a single abstract state or set of abstract states, such as the abstract states containing an initial state.

Figure 3: Abstraction-refinement framework for MDPs. Inputs: MDP M, partition P of S, property ϕ and error bound ε; the loop builds the abstract MDP, model checks it, returns the bounds when the error is below ε, and otherwise refines the partition and repeats.

4.1 Refinement strategies

In our context, refinement corresponds to replacing the partition used to build the abstraction with a finer partition. We propose two methods called strategy-based and value-based refinement. The first examines the difference between strategy pairs that yield the lower and upper bounds. The motivation for this is that, since the bounds are different and the actual value for the concrete model falls between the bounds, one (or both) of the strategy pairs must make choices that are not valid in the concrete model (this is analogous to the non-probabilistic case where refinement is based on a single counterexample). The second method differs by considering all strategy pairs yielding the bounds.

For both methods, we demonstrate that the refinement results in a strictly finer partition. This guarantees that, for finite state models, the process eventually converges (results in more accurate abstractions). In addition, if a partition is coarser than probabilistic bisimulation [28], i.e. bisimilar states are in the same element of the partition, then so is the partition after refinement. This follows from the fact that, under such a partition, bisimilar states are represented by the same player 2 vertex and, as will be seen, neither method separates such states. Consequently, if the initial partition is coarser than probabilistic bisimulation, then any subsequent refinement will also be coarser. This observation guarantees the convergence of the refinement process when it is applied to infinite-state MDPs that have a finite bisimulation quotient.

To simplify presentation, we describe the refinement step when computing minimum reachability probabilities, i.e. we assume that we have an MDP M = (S, s_init, Steps, r), partition P, target set F ∈ P and corresponding game G^P_M = ((V, E), v_init, (V_1, V_2, V_p), δ, r) in which p^{lb,min}_v(F̄) < p^{ub,min}_v(F̄) for some v ∈ V_1. The cases for maximum probabilistic and expected reachability follow in identical fashion.

Strategy-based Refinement. As mentioned in Section 2, when computing p^{lb,min}_v(F̄) and p^{ub,min}_v(F̄), a pair of strategies that obtain these values is also generated. Since a player 1 strategy's choice can be considered as a set of concrete states (the states whose choices are abstracted to the chosen player 2 vertex), the player 1 strategy from each pair provides a method for refinement. More precisely, we refine P as follows.

1. Find memoryless strategy pairs (σ^{lb}_1, σ^{lb}_2) and (σ^{ub}_1, σ^{ub}_2) such that for any v ∈ V_1:

   p^{σ^{lb}_1,σ^{lb}_2}_v(F̄) = p^{lb,min}_v(F̄)  and  p^{σ^{ub}_1,σ^{ub}_2}_v(F̄) = p^{ub,min}_v(F̄).

2. For each player 1 vertex v ∈ V_1 such that σ^{lb}_1(v) ≠ σ^{ub}_1(v), replace v in the partition P with v^{σlb}, v^{σub} and v \ (v^{σlb} ∪ v^{σub}), where:

   v^{σlb} = { s ∈ v | σ^{lb}_1(v) = Steps̄(s) }  and  v^{σub} = { s ∈ v | σ^{ub}_1(v) = Steps̄(s) }.

The following lemma states that there always exists a vertex v such that σ^{lb}_1(v) ≠ σ^{ub}_1(v), and hence applying this refinement will always (strictly) refine the partition.

Lemma 4.1 If there exists a vertex v ∈ V_1 such that p^{lb,min}_v(F̄) < p^{ub,min}_v(F̄) and (σ^{lb}_1, σ^{lb}_2) and (σ^{ub}_1, σ^{ub}_2) are corresponding memoryless optimal strategy pairs, then there also exists a vertex v′ ∈ V_1 such that σ^{lb}_1(v′) ≠ σ^{ub}_1(v′).

Proof. The proof is by contradiction. Therefore, suppose that there exists v ∈ V_1 such that p^{lb,min}_v(F̄) < p^{ub,min}_v(F̄) and that (σ^{lb}_1, σ^{lb}_2) and (σ^{ub}_1, σ^{ub}_2) are memoryless strategy pairs such that:

p^{σ^{lb}_1,σ^{lb}_2}_{v′}(F̄) = p^{lb,min}_{v′}(F̄), p^{σ^{ub}_1,σ^{ub}_2}_{v′}(F̄) = p^{ub,min}_{v′}(F̄) and σ^{lb}_1(v′) = σ^{ub}_1(v′) for all v′ ∈ V_1.

Now, by the hypothesis we have:

p^{lb,min}_v(F̄) < p^{ub,min}_v(F̄)
 = sup_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F̄)   (by definition)
 = inf_{σ_2} p^{σ^{ub}_1,σ_2}_v(F̄)   (since σ^{ub}_1 is an optimal strategy)
 ≤ p^{σ^{ub}_1,σ^{lb}_2}_v(F̄)   (since σ^{lb}_2 is a player 2 strategy)
 = p^{σ^{lb}_1,σ^{lb}_2}_v(F̄)   (since σ^{lb}_1(v′) = σ^{ub}_1(v′) for all v′ ∈ V_1)
 = p^{lb,min}_v(F̄)   (by construction of (σ^{lb}_1, σ^{lb}_2))

which is a contradiction.

Value-based Refinement. The second refinement method is again based on the choices made by optimal strategy pairs. However, rather than looking at a single strategy pair for each of the bounds, we use all strategies which achieve one of the bounds when refining the partition. More precisely, letting v_2(s) = Steps̄(s), we refine each element v ∈ P as follows.

1. Construct the following subsets of v:

   v^{min}_{lb} = { s ∈ v | p^{lb,min}_{v_2(s)}(F̄) = p^{lb,min}_v(F̄) }
   v^{min}_{ub} = { s ∈ v | p^{ub,min}_{v_2(s)}(F̄) = p^{ub,min}_v(F̄) }.

2. Replace v with v^{min}_{lb} \ v^{min}_{ub}, v^{min}_{ub} \ v^{min}_{lb}, v^{min}_{lb} ∩ v^{min}_{ub} and v \ (v^{min}_{lb} ∪ v^{min}_{ub}).
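
Both refinement methods reduce to splitting a single player 1 vertex (a block of the partition) into smaller sets of concrete states. The following Python sketch gives one possible reading of the two splitting rules; steps_bar, sigma_lb, sigma_ub and the value maps are placeholders for the quantities defined above (they are not names used in the paper), empty pieces are simply discarded, and the strategy-based rule is intended for blocks where the two strategy choices differ.

    def refine_strategy_based(block, steps_bar, sigma_lb, sigma_ub):
        """Split a block using the player 2 vertices chosen by the optimal
        strategies for the lower and upper bound (step 2 of the first method)."""
        v_lb = {s for s in block if steps_bar(s) == sigma_lb}
        v_ub = {s for s in block if steps_bar(s) == sigma_ub}
        rest = set(block) - v_lb - v_ub
        return [part for part in (v_lb, v_ub, rest) if part]

    def refine_value_based(block, steps_bar, lb_val, ub_val, block_lb, block_ub):
        """Split a block using all concrete states whose player 2 vertex attains
        the block's lower (resp. upper) bound value (the second method)."""
        v_lb = {s for s in block if lb_val[steps_bar(s)] == block_lb}
        v_ub = {s for s in block if ub_val[steps_bar(s)] == block_ub}
        parts = [v_lb & v_ub, v_lb - v_ub, v_ub - v_lb, set(block) - (v_lb | v_ub)]
        return [part for part in parts if part]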

Lemma 4.2 below states that this refinement method will always lead to a (strictly) finer partition.

Lemma 4.2 If there exists v ∈ V_1 such that p^{lb,min}_v(F̄) < p^{ub,min}_v(F̄), then there exists v′ ∈ V_1 such that v′^{min}_{lb} ≠ v′^{min}_{ub}.

Proof. The proof is by contradiction. Therefore, suppose that there exists v ∈ V_1 such that p^{lb,min}_v(F̄) < p^{ub,min}_v(F̄) and that v′^{min}_{lb} = v′^{min}_{ub} for all v′ ∈ V_1. Now, consider any memoryless strategy pairs (σ^{lb}_1, σ^{lb}_2) and (σ^{ub}_1, σ^{ub}_2) such that:

p^{σ^{lb}_1,σ^{lb}_2}_{v′}(F̄) = p^{lb,min}_{v′}(F̄)  and  p^{σ^{ub}_1,σ^{ub}_2}_{v′}(F̄) = p^{ub,min}_{v′}(F̄)  for all v′ ∈ V_1.

Now, for any player 1 vertex v′ ∈ V_1, by definition we have:

v′^{min}_{lb} = { s ∈ v′ | p^{lb,min}_{v_2(s)}(F̄) = p^{lb,min}_{v′}(F̄) }
v′^{min}_{ub} = { s ∈ v′ | p^{ub,min}_{v_2(s)}(F̄) = p^{ub,min}_{v′}(F̄) }

where v_2(s) = Steps̄(s), and hence, by construction, σ^{lb}_1(v′) = v_2(s_lb) for some s_lb ∈ v′^{min}_{lb} and σ^{ub}_1(v′) = v_2(s_ub) for some s_ub ∈ v′^{min}_{ub}. Now, by the hypothesis we have v′^{min}_{lb} = v′^{min}_{ub} for all v′ ∈ V_1 and therefore, since σ^{lb}_2 is optimal, it follows that for the player 1 vertex v:

p^{lb,min}_v(F̄) = p^{σ^{ub}_1,σ^{lb}_2}_v(F̄)
 ≥ inf_{σ_2} p^{σ^{ub}_1,σ_2}_v(F̄)   (since σ^{lb}_2 is a player 2 strategy)
 = sup_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F̄)   (since σ^{ub}_1 is an optimal strategy)
 = p^{ub,min}_v(F̄)   (by definition)

which contradicts the fact that p^{lb,min}_v(F̄) < p^{ub,min}_v(F̄).

Example 4.3 To illustrate the refinement strategies, we consider the simple MDP given in Figure 4(a) with a reward of 1 associated with each transition and the maximum expected reward of reaching the target set {s_4}. We start by partitioning the state space into the target set ({s_4}) and the remaining states, resulting in the stochastic two-player game shown in Figure 4(b). As in Example 3.1, shapes outlined in thick black lines are player 1 vertices and each one's successor player 2 vertices are outlined with grey lines and enclosed within it. Successor probabilistic vertices (distributions over player 1 vertices) are represented by sets of grouped, labelled arrows emanating from the vertex. The states of the original MDP, sets of which comprise player 1 and player 2 vertices, are also depicted.

Calculating the lower and upper bounds on the maximum expected reward to reach {s_4}, using the game in Figure 4(b), yields the interval [2, ∞) for the vertex containing states s_0-s_3 (and [0, 0] for the one containing s_4). There is only one memoryless strategy resulting in each bound: the one that selects {s_3} (for the lower bound) and

the one that selects {s_0, s_1, s_2} (for the upper bound). Consequently, both the strategy-based and value-based refinement techniques refine the partition from {s_0, s_1, s_2, s_3}, {s_4} to {s_0, s_1, s_2}, {s_3}, {s_4}. The new abstraction is shown in Figure 4(c). Note that the player 1 vertex containing {s_0, s_1, s_2} now leads to three distinct player 2 vertices, since the (induced) distributions for MDP states s_0, s_1 and s_2 are no longer equivalent under the new partition. Calculating bounds with the new game now gives [4, ∞) and [3, ∞) for the vertices {s_0, s_1, s_2} and {s_3}. From the vertex {s_0, s_1, s_2}, the optimal strategy for the lower bound chooses {s_2} and for the upper bound can choose either {s_0} or {s_1}. With strategy-based refinement, using either strategy for the upper bound gives the partition {s_0}, {s_1}, {s_2}, {s_3}, {s_4}. For value-based refinement, s_0 and s_1 are kept together, giving rise instead to the partition {s_0, s_1}, {s_2}, {s_3}, {s_4}.

Figure 4: Example of the abstraction-refinement process: (a) concrete MDP; (b) initial abstraction; (c) refined abstraction.

4.2 The Abstraction-Refinement Algorithm

With the refinement strategies of the previous section, we can implement the abstraction-refinement loop shown in Figure 3. Starting with a coarse partition, we iteratively: construct an abstraction (according to Definition 7); compute lower and upper bounds (and corresponding strategies); and refine using either the strategy-based or value-based method of Section 4.1.

In this section, we present an optimised algorithm for the refinement loop, illustrated in Figure 5(a). At the i-th refinement step, we construct the abstraction G^{P_i}_M of the MDP using partition P_i. Using G^{P_i}_M, we then compute vectors of lower and upper bounds x^{lb}_i and x^{ub}_i (indexed over P_i, i.e. over player 1 vertices of G^{P_i}_M) for the minimum probability of reaching F in the MDP, using value iteration. If the bounds are sufficiently close (within ε), the loop terminates; otherwise, we refine the partition. The algorithm includes two optimisations, designed to improve the convergence rate of numerical solution:
- wherever possible, we re-use existing numerical results: when calculating lower bounds (line 5), we use the same vector from the previous step as the initial solution; for upper bounds (line 6), we re-use the lower bound from the current step; both are guaranteed lower bounds on the solution.
- we store (in the set V_done) vertices for which lower and upper bounds coincide (and we

thus have the exact solution for their concrete states); again, we use these values in initial solutions for value iteration.

Numerical solution of the game at each step is performed with an improved version of the standard value iteration algorithm [5], illustrated in Figure 5(b), which includes several optimisations. Firstly, we employ the qualitative algorithms of [4] to efficiently find vertices for which the solution is exactly 0 or 1 (lines 2 and 4). Secondly, we only compute new values for vertices where the answer is unknown, i.e. those not in the set V_done, mentioned above, and not identified by the qualitative algorithms (lines 9 and 14). Thirdly, the value iteration algorithm is given an initial vector x_init for the fixpoint computation (normally this is a vector of zeros). This is used, as described above, to improve convergence. The correctness of the value iteration scheme is given in Section 4.3 below.

As in the previous section, the code presented in Figure 5 is for computation of minimum probabilities. The cases for maximum probabilities and for expected rewards require only trivial changes to the value iteration algorithm. The Refine function referenced in line 16 of Figure 5(a) can be either the strategy-based or value-based refinement from Section 4.1. For the latter, we require that lines 11 and 16 of the ValIter algorithm in Figure 5(b) return all (rather than just one) of the player 1 strategy choices which achieve the lower/upper bound.

Figure 5: Abstraction-refinement loop algorithm (minimum reachability probabilities)

(a) AbstractRefineLoop(M, P, F)

    // Initialise partition and probability vector
    1.  P_0 := P; V_done := ∅
    2.  for (v ∈ P_0) x^{lb}_0(v) := 0
    // Main loop
    3.  for (i = 0, 1, 2, ...)
    4.    G^{P_i}_M := ConstructAbstractMdp(M, P_i)
    5.    (x^{lb}_i, σ^{lb}_i) := ValIter(G^{P_i}_M, F, lb, P_i \ V_done, x^{lb}_i)
    6.    (x^{ub}_i, σ^{ub}_i) := ValIter(G^{P_i}_M, F, ub, P_i \ V_done, x^{lb}_i)
    // Terminate loop if done
    7.    if (‖x^{ub}_i − x^{lb}_i‖ < ε) return (x^{lb}_i, x^{ub}_i)
    // Otherwise, refine, build partition P_{i+1} and vector x^{lb}_{i+1} for next iteration
    8.    P_{i+1} := ∅
    9.    for (v ∈ P_i)
    // Vertices that cannot be refined
    10.     if (x^{lb}_i(v) = x^{ub}_i(v))
    11.       P_{i+1} := P_{i+1} ∪ {v}
    12.       V_done := V_done ∪ {v}
    13.       x^{lb}_{i+1}(v) := x^{lb}_i(v)
    // Vertices that can be refined
    15.     else
    16.       P_v := Refine(G^{P_i}_M, v, (x^{lb}_i, σ^{lb}_i), (x^{ub}_i, σ^{ub}_i))
    17.       P_{i+1} := P_{i+1} ∪ P_v
    18.       for (v′ ∈ P_v) x^{lb}_{i+1}(v′) := x^{lb}_i(v)
    19.     end
    20.   end
    21. end

(b) ValIter(G, F, opt, V?, x_init)

    // Initialise probability vector
    1.  x := x_init
    // Qualitative pre-computation
    2.  V^{opt}_0 := Prob0(G, F, opt)
    3.  for (v′ ∈ V^{opt}_0) x(v′) := 0
    4.  V^{opt}_1 := Prob1(G, F, opt)
    5.  for (v′ ∈ V^{opt}_1) x(v′) := 1
    // Main fixpoint loop
    6.  repeat
    7.    x̂ := x
    8.    if (opt = lb)
    9.      for (v ∈ V? \ (V^{opt}_0 ∪ V^{opt}_1))
    10.       x(v) := min_{(v,v_2)∈E} min_{(v_2,v_p)∈E} Σ_{v′∈V} δ(v_p)(v′)·x̂(v′)
    11.       σ(v) := argmin_{v_2:(v,v_2)∈E} min_{(v_2,v_p)∈E} Σ_{v′∈V} δ(v_p)(v′)·x̂(v′)
    12.     end
    13.   else
    14.     for (v ∈ V? \ (V^{opt}_0 ∪ V^{opt}_1))
    15.       x(v) := max_{(v,v_2)∈E} min_{(v_2,v_p)∈E} Σ_{v′∈V} δ(v_p)(v′)·x̂(v′)
    16.       σ(v) := argmax_{v_2:(v,v_2)∈E} min_{(v_2,v_p)∈E} Σ_{v′∈V} δ(v_p)(v′)·x̂(v′)
    17.     end
    18.   end
    19. until ‖x − x̂‖ < δ
    // Return vector of probabilities and corresponding player 1 strategy
    20. return (x, σ)
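
Putting the pieces together, the following Python sketch shows a simplified version of the loop of Figure 5(a): it omits the V_done bookkeeping and the re-use of value-iteration results described above, and the build, solve and refine parameters stand for (hypothetical) routines along the lines of the earlier sketches rather than for the paper's implementation.

    def abstraction_refinement(mdp, initial_partition, target_blocks, epsilon,
                               build, solve, refine):
        """Simplified abstraction-refinement loop: build the game, compute lower and
        upper bounds, stop when they are within epsilon, otherwise split imprecise
        blocks and repeat."""
        partition = list(initial_partition)
        while True:
            game = build(mdp, partition)
            lb, ub, sigma_lb, sigma_ub = solve(game, target_blocks)
            if max(ub[v] - lb[v] for v in lb) < epsilon:
                return lb, ub, partition
            new_partition = []
            for block in partition:
                if ub[block] - lb[block] < epsilon:
                    new_partition.append(block)      # already precise enough
                else:
                    new_partition.extend(refine(game, block, sigma_lb, sigma_ub))
            partition = new_partition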

4.3 Correctness of the Abstraction-Refinement Algorithm

We now demonstrate that the value iteration scheme of Figure 5 converges. For simplicity we only consider computing bounds on minimum reachability probabilities and assume a fixed MDP M = (S, s_init, Steps, r) and target set F. The cases for maximum probabilistic and expected reachability follow in identical fashion. We begin with a number of preliminary concepts.

For a set of vertices V, let ⊑ be the partial order over (V → [0, 1]) × (V → [0, 1]) where x ⊑ x′ if and only if x(v) ≤ x′(v) for all v ∈ V. For any stochastic game G = ((V, E), v_init, (V_1, V_2, V_p), δ, r), subset of player 1 vertices V_done and reachability objective F, let F^{lb}_{V_done} : (V → [0, 1]) → (V → [0, 1]) be the function:

F^{lb}_{V_done}(x)(v) def=
  1   if v ∈ V^{lb}_1
  0   if v ∈ V^{lb}_0
  inf_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F)   if v ∈ V_done \ (V^{lb}_1 ∪ V^{lb}_0)
  min_{(v,v_2)∈E} min_{(v_2,v_p)∈E} Σ_{v′∈V} δ(v_p)(v′)·x(v′)   otherwise

where

V^{lb}_1 = { v ∈ V | inf_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F) = 1 }
V^{lb}_0 = { v ∈ V | inf_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F) = 0 }

and F^{ub}_{V_done} : (V → [0, 1]) → (V → [0, 1]) the function:

F^{ub}_{V_done}(x)(v) def=
  1   if v ∈ V^{ub}_1
  0   if v ∈ V^{ub}_0
  sup_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F)   if v ∈ V_done \ (V^{ub}_1 ∪ V^{ub}_0)
  max_{(v,v_2)∈E} min_{(v_2,v_p)∈E} Σ_{v′∈V} δ(v_p)(v′)·x(v′)   otherwise

where

V^{ub}_1 = { v ∈ V | sup_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F) = 1 }
V^{ub}_0 = { v ∈ V | sup_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F) = 0 }.

The following results are adapted from [4] to the notation used in this paper.

Lemma 4.4 For any stochastic game G, objective F and subset of player 1 vertices V_done, the function F^{lb}_{V_done} is monotonic with respect to ⊑.

Theorem 4.5 For any stochastic game G, objective F and subset of player 1 vertices V_done, the function F^{lb}_{V_done} has a unique fixed point u^{lb} and u^{lb}(v) is the optimal value of the probabilistic reachability objective F when player 1 and player 2 cooperate to minimise the objective, that is u^{lb}(v) = inf_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F).

Lemma 4.6 For any stochastic game G, objective F and subset of player 1 vertices V_done, the function F^{ub}_{V_done} is monotonic with respect to ⊑.

Theorem 4.7 For any stochastic game G, objective F and subset of player 1 vertices V_done, the function F^{ub}_{V_done} has a unique fixed point u^{ub} and u^{ub}(v) is the optimal value of the probabilistic reachability objective F for player 1, that is u^{ub}(v) = sup_{σ_1} inf_{σ_2} p^{σ_1,σ_2}_v(F).