Characterizing Fault-Tolerant Systems by Means of Simulation Relations

TECHNICAL REPORT

Ramiro Demasi 1, Pablo F. Castro 2,3, Thomas S.E. Maibaum 1, and Nazareno Aguirre 2,3

1 Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada, demasira@mcmaster.ca, tom@maibaum.org
2 Departamento de Computación, FCEFQyN, Universidad Nacional de Río Cuarto, Río Cuarto, Córdoba, Argentina, {pcastro,naguirre}@dc.exa.unrc.edu.ar
3 Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina

Abstract. We aim to formally characterize the various fault-tolerant behaviors that a computer system may exhibit, i.e., to capture the various possible executions of a system that in some way tolerate faults. This formalization is achieved by using particular notions of simulation and bisimulation, in order to compare executions of systems that exhibit faults with executions where no faults occur. This characterization allows us to algorithmically verify, using standard (bi)simulation algorithms, that a system behaves in an acceptable way even under the occurrence of faults. This approach has the benefit of being simple and supporting an efficient automated treatment. We demonstrate the practical application of this theoretical framework through some well-known case studies, which illustrate that the main ideas behind most fault-tolerance mechanisms are naturally captured in this setting.

Keywords: Formal Methods, Fault-Tolerance, Temporal Logic, Model Checking, Bisimulation.

1 Introduction

The increasing use of computer systems in critical areas implies that dependability and constant availability are two important characteristics that need to be taken into account when designing software. Some examples of such critical systems include software for medical devices and software controllers in the avionics and automotive industries. These systems are subject to a variety of potential failures that may corrupt, or degrade, their performance. Being able to reason about the behavior of these systems in the presence of potential faults, in order to provide guarantees of software robustness, has therefore gained considerable attention.

When a system is able to tolerate faults in some way, it is called fault-tolerant. It is well known that a system that tolerates faults may do so exhibiting different degrees of so-called fault-tolerance.

For instance, the system may completely tolerate the faults, not allowing them to have any observable consequences for the users; this is often called masking fault-tolerance. Alternatively, after a fault occurs, the system may undergo some process that eventually takes it back to a good behavior (after some amount of time); this is usually called nonmasking tolerance, and it is normally observable at the fault-tolerant component's interface. Also, the system may react to a fault by switching to a behavior that is safe but in which the system is restricted in its capacity; this is called failsafe fault-tolerance.

The field of fault-tolerant systems is concerned with providing techniques that can be used to increase the fault-tolerance characteristics of software, or of systems in general. Some examples of traditional techniques employed to deal with fault-tolerance are component replication, N-version programming, exception mechanisms, transactions, etc. Some emerging approaches try to deal with fault-tolerance in formal settings, with the aim of mathematically proving that a given system effectively tolerates faults. One of the early examples of formal approaches to fault-tolerance is the one presented in [3-5], where Unity programs are complemented with fault steps, and the logic underlying Unity is used to prove properties of programs. More recently, formal approaches involving model checking, applied to fault-tolerance, have been proposed. In these approaches, temporal logics are employed to capture fault-tolerance properties of reactive systems, and model checking algorithms are used to automatically verify that these properties hold for a given system. It is widely known that model checking provides significant benefits over other semi-automated or manual formal approaches. This is because model checking provides fully automated analysis, and the counterexamples that are generated when a property does not hold are extremely helpful in finding the source of the problem in the system, and dealing with it.

The languages employed for the description of systems and system properties in model checking do not provide a built-in way of distinguishing between normal and abnormal behaviors. Thus, when capturing fault-tolerant systems, and expressing fault-tolerance properties, the specifier needs to encode the faults and their consequences in some suitable way. This may make formulas longer and more difficult to understand, and it may even have a negative impact on analysis, since the performance of model checking algorithms depends on the length of the formula being analyzed.

In this paper, we propose an alternative automated approach for dealing with the analysis of fault-tolerance, which appropriately distinguishes faulty behaviors from normal ones, with a notion of abnormal transition built into the formalism. This approach enables the automated analysis of fault-tolerance, meaning that it allows one to show, without the need for user intervention, that a system is able to tolerate faults to some degree (masking, nonmasking, failsafe). The approach is based on defining simulation/bisimulation relations for different degrees of fault-tolerance, and employing (bi)simulation algorithms for checking fault-tolerance.
The remainder of the paper is structured as follows. In Section 2 we introduce the main notions used throughout the paper. In Section 3 we define masking, nonmasking, and failsafe versions of fault-tolerance by means of simulation relations.

We present some examples in Section 5. Finally, we discuss some conclusions and directions for further work in Section 7.

2 Preliminaries

In this section, we introduce some concepts that will be necessary throughout the paper. For the sake of brevity, we assume some basic knowledge of model checking; the interested reader may consult [7].

We model fault-tolerant systems by means of colored Kripke structures, as introduced in [8]. Given a set of propositional letters AP = {p, q, s, ...}, a colored Kripke structure is a 5-tuple ⟨S, I, R, L, N⟩, where S is a set of states, I ⊆ S is a set of initial states, R ⊆ S × S is a transition relation, L : S → ℘(AP) is a labeling function indicating which propositions are true in each state, and N ⊆ S is a set of normal, or green, states. Arcs leading to abnormal states (i.e., states not in N) can be thought of as faulty transitions, or simply faults. Then, normal executions are those transiting only through green states. The set of normal executions is denoted by NT. We assume that, in every colored Kripke structure, every normal state has at least one successor state that is also normal, and that at least one initial state is green. This guarantees that every system has at least one normal execution, i.e., that NT ≠ ∅.

As is usual in the definition of temporal operators, we employ the notion of trace. A trace is a maximal sequence of states whose consecutive pairs are adjacent w.r.t. R. When a trace starts in an initial state, it is called an execution of M, with partial executions corresponding to non-maximal sequences of adjacent states. Given a trace σ = s0, s1, s2, s3, ..., the i-th state of σ is denoted by σ[i], and the final segment of σ starting in position i is denoted by σ[i..].

Moreover, we distinguish among the different kinds of outgoing transitions from a state. Given a colored Kripke structure M = ⟨S, I, R, L, N⟩, we denote by ⇢ the restriction of R to faulty transitions, and by → the restriction of R to non-faulty transitions. We define Post_N(s) = {s' ∈ S | s → s'} as the set of successors of s reachable via non-faulty (or good) transitions; similarly, Post_F(s) = {s' ∈ S | s ⇢ s'} is the set of successors of s reachable via faulty arcs. Analogously, we define Pre_N(s') and Pre_F(s') as the sets of predecessors of s' via normal and faulty transitions, respectively. Moreover, Post*(s) denotes the set of states that are reachable from s. Without loss of generality, we assume that every state has a successor [7]. We denote by →* the transitive closure of →.
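For illustration, the following Python sketch (our own encoding, not part of the paper's formalism; all names are ours) shows one way of representing colored Kripke structures together with the Post_N, Post_F, Pre_N and Pre_F operations:

from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

State = str
Edge = Tuple[State, State]

@dataclass
class ColoredKripkeStructure:
    states: Set[State]                  # S
    init: Set[State]                    # I, a subset of S
    normal_edges: Set[Edge]             # restriction of R to non-faulty transitions
    faulty_edges: Set[Edge]             # restriction of R to faulty transitions
    label: Dict[State, FrozenSet[str]]  # L : S -> powerset of AP
    green: Set[State]                   # N, the normal (green) states

    def post_n(self, s: State) -> Set[State]:
        """Successors of s reachable via non-faulty (good) transitions."""
        return {t for (u, t) in self.normal_edges if u == s}

    def post_f(self, s: State) -> Set[State]:
        """Successors of s reachable via faulty transitions."""
        return {t for (u, t) in self.faulty_edges if u == s}

    def pre_n(self, t: State) -> Set[State]:
        """Predecessors of t via non-faulty transitions."""
        return {u for (u, v) in self.normal_edges if v == t}

    def pre_f(self, t: State) -> Set[State]:
        """Predecessors of t via faulty transitions."""
        return {u for (u, v) in self.faulty_edges if v == t}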

In order to state properties of systems, we employ a fragment of dCTL [8] (named dCTL-), a branching-time temporal logic with deontic operators designed for fault-tolerant system verification. Formulas in this logic refer to properties of behaviors of colored Kripke structures, in which a distinction between normal and abnormal states (and therefore also between normal and abnormal traces) is made. The logic dCTL is defined over the Computation Tree Logic CTL; its novel part consists of the deontic operators O(ψ) (obligation) and P(ψ) (permission), which are applied to a certain kind of path formula ψ. The intention of these operators is to capture the notions of obligation and permission over traces. Intuitively, these operators have the following meaning:

O(ψ): property ψ is obliged in every future state reachable via non-faulty transitions.
P(ψ): there exists a normal execution, i.e., not involving faults, starting from the current state and along which ψ holds.

Obligation and permission will enable us to express intended properties which should hold in all normal behaviors and in some normal behaviors, respectively. These deontic operators have an implicit temporal character, since ψ is a path formula.

Let us present the syntax of dCTL-. Let AP be a set {p0, p1, ...} of atomic propositions; the sets Φ and Ψ of state and path formulas, respectively, are mutually recursively defined as follows:

Φ ::= pi | ¬Φ | Φ ∨ Φ | A(Ψ) | E(Ψ) | O(Ψ) | P(Ψ)
Ψ ::= XΦ | Φ U Φ | Φ W Φ

Other boolean connectives (here, state operators), such as ∧, →, etc., can be defined as usual. Also, the traditional temporal operators G and F can be expressed as G(φ) ≡ φ W ⊥ and F(φ) ≡ ⊤ U φ. The standard boolean operators and the CTL quantifiers A and E have the usual semantics.

Now, we formally state the semantics of the logic. We start by defining the relation ⊨, formalizing the satisfaction of dCTL- state formulas in colored Kripke structures. The definition of ⊨ is as follows:

M, s ⊨ ⊤.
M, s ⊨ pi ⇔ pi ∈ L(s), where pi ∈ AP.
M, s ⊨ ¬ϕ ⇔ not M, s ⊨ ϕ.
M, s ⊨ ϕ ∨ ϕ' ⇔ (M, s ⊨ ϕ) or (M, s ⊨ ϕ').
M, s ⊨ E(ψ) ⇔ M, σ ⊨ ψ for some trace σ such that σ[0] = s.
M, s ⊨ A(ψ) ⇔ M, σ ⊨ ψ for all traces σ such that σ[0] = s.
M, s ⊨ O(ψ) ⇔ for every σ ∈ NT such that σ[0] = s we have that, for every i ≥ 0, M, σ[i..] ⊨ ψ.
M, s ⊨ P(ψ) ⇔ for some σ ∈ NT such that σ[0] = s we have that, for every i ≥ 0, M, σ[i..] ⊨ ψ.

For the standard CTL operators the definition of ⊨ is as usual (see [7]). We denote by M ⊨ ϕ the fact that M, s ⊨ ϕ holds for every state s of M, and by ⊨ ϕ the fact that M ⊨ ϕ for every colored Kripke structure M.
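For illustration, the dCTL- grammar above can be transcribed into a small abstract syntax tree; the Python classes below are our own naming and only serve to make the later sketches concrete:

from dataclasses import dataclass

class StateFormula: ...   # Φ
class PathFormula: ...    # Ψ

@dataclass
class Atom(StateFormula):       # p_i
    name: str

@dataclass
class Not(StateFormula):        # ¬Φ
    arg: StateFormula

@dataclass
class Or(StateFormula):         # Φ ∨ Φ'
    left: StateFormula
    right: StateFormula

@dataclass
class A(StateFormula):          # A(Ψ)
    path: PathFormula

@dataclass
class E(StateFormula):          # E(Ψ)
    path: PathFormula

@dataclass
class O(StateFormula):          # O(Ψ), obligation
    path: PathFormula

@dataclass
class P(StateFormula):          # P(Ψ), permission
    path: PathFormula

@dataclass
class X(PathFormula):           # XΦ
    arg: StateFormula

@dataclass
class U(PathFormula):           # Φ U Φ'
    left: StateFormula
    right: StateFormula

@dataclass
class W(PathFormula):           # Φ W Φ'
    left: StateFormula
    right: StateFormula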

In order to illustrate the semantics of the deontic operators, let us consider the colored Kripke structure in Figure 1, where the set of propositional variables is {p, q, r, t} and each state is labeled by the set of propositional variables that hold in it. The states that are the target of dashed arcs are abnormal states (ones indicating that something has gone wrong); faulty states are also drawn with dashed lines, while the remaining ones represent normal configurations. Note that the unique faulty state in this model is the one labeled with t. In this simple model, for every non-faulty execution, p ∨ q is always true. In dCTL- this is expressed by the formula O(p ∨ q). Note that there also exist normal executions along which p ∨ q ∨ r holds; this fact is expressed as P(p ∨ q ∨ r). Other deontic operators, such as prohibition, can be expressed by using the ones introduced above (cf. [8]).

Fig. 1. A Simple Colored Kripke Structure

One of the interesting characteristics of dCTL- is the possibility of distinguishing between formulas that state properties of good executions and standard formulas, which state properties over all possible executions. For any formula ϕ, we define an analogous formula ϕ^N which captures the same property as ϕ, but restricted to good executions. The definition is as follows.

Definition 1. The normative formula ϕ^N, derived from a dCTL- formula ϕ over an alphabet AP, is defined as follows:

(pi)^N ≝ pi.
(¬ϕ)^N ≝ ¬ϕ^N.
(ϕ ∨ ϕ')^N ≝ ϕ^N ∨ ϕ'^N.
(A(ϕ U ϕ'))^N ≝ O(ϕ^N U ϕ'^N).
(E(ϕ U ϕ'))^N ≝ P(ϕ^N U ϕ'^N).
(A(ϕ W ϕ'))^N ≝ O(ϕ^N W ϕ'^N).
(E(ϕ W ϕ'))^N ≝ P(ϕ^N W ϕ'^N).
(O(ϕ U ϕ'))^N ≝ O(ϕ^N U ϕ'^N).
(P(ϕ U ϕ'))^N ≝ P(ϕ^N U ϕ'^N).
(O(ϕ W ϕ'))^N ≝ O(ϕ^N W ϕ'^N).
(P(ϕ W ϕ'))^N ≝ P(ϕ^N W ϕ'^N).
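Definition 1 can be transcribed directly over the AST sketched above; the function below (normative is our own name) is a straightforward rendering and leaves out the X operator, for which Definition 1 gives no clause:

def normative(phi: StateFormula) -> StateFormula:
    """Compute phi^N: the property phi restricted to good (normal) executions."""
    if isinstance(phi, Atom):
        return phi
    if isinstance(phi, Not):
        return Not(normative(phi.arg))
    if isinstance(phi, Or):
        return Or(normative(phi.left), normative(phi.right))
    if isinstance(phi, (A, O)) and isinstance(phi.path, (U, W)):
        p = phi.path                    # A and O both become O
        return O(type(p)(normative(p.left), normative(p.right)))
    if isinstance(phi, (E, P)) and isinstance(phi.path, (U, W)):
        p = phi.path                    # E and P both become P
        return P(type(p)(normative(p.left), normative(p.right)))
    raise ValueError(f"Definition 1 gives no clause for: {phi!r}")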

3 Simulations and Fault-Tolerance

In this section we present suitable notions of simulation relations that allow us to capture diverse fault-tolerance properties, namely masking, nonmasking, and failsafe fault-tolerance. In order to define these properties, we follow the basic definitions of simulation and bisimulation relations given in [7]. Fault-tolerance properties are based on the degrees to which safety and liveness properties of a model are preserved in the presence of faults. Recall that any specification can be written as a conjunction of safety and liveness properties [1].

In the following definitions, given a Kripke structure with a labeling L, we use the notion of a sub-labeling function: we say that L0 is a sub-labeling of L (denoted by L0 ⊆ L) if L0(s) = L(s) ∩ AP' for all states s and some AP' ⊆ AP; we also say that L0 is obtained by restricting AP to AP'. The concept of sub-labeling allows us to focus on certain properties of models.

Basically, in order to check fault-tolerance we consider two colored Kripke structures of a system, the first one acting as a specification of the intended behavior and the second as the fault-tolerant implementation. The existence of an appropriate (bi)simulation relation between the two will correspond to the achievement of a certain kind of fault-tolerant behavior. Now, we introduce the notion of masking tolerance relations.

3.1 Masking Fault-Tolerance

Definition 2. (Masking fault-tolerance) Given two colored Kripke structures M = ⟨S, I, R, L, N⟩ and M' = ⟨S', I', R', L', N'⟩, we say that a relation Mask ⊆ S × S' is masking fault-tolerant for sub-labelings L0 ⊆ L and L0' ⊆ L' iff:

(A) ∀ s1 ∈ I (∃ s2 ∈ I' : s1 Mask s2) and ∀ s2 ∈ I' (∃ s1 ∈ I : s1 Mask s2).
(B) for all s1 Mask s2 the following holds:
(1) L0(s1) = L0'(s2).
(2) if s1' ∈ Post_N(s1), then there exists s2' ∈ Post_N(s2) with s1' Mask s2'.
(3) if s2' ∈ Post_N(s2), then there exists s1' ∈ Post_N(s1) with s1' Mask s2'.
(4) if s2' ∈ Post_F(s2), then either there exists s1' ∈ Post_N(s1) with s1' Mask s2', or s1 Mask s2'.

Roughly speaking, conditions A, B.1, B.2 and B.3 imply that we have a bisimulation between the normative parts of M and M'. Condition B.4 states that every outgoing faulty transition from s2 either must be matched by an outgoing normal transition from s1, or s2' is masking fault-tolerant for s1. Observe that, if there is a self-loop at state s2', then we can stay there forever, satisfying s1 Mask s2'. Therefore, a fairness condition needs to be imposed on B.4 to ensure that masking implementations preserve liveness properties; for example, we can assume that, if a set of non-faulty transitions is enabled infinitely often, then these transitions are executed infinitely often. We denote by M ⊨_f ϕ the restriction of ⊨ to fair executions. It is worth remarking that the symmetric condition to (B.4) is not required; this is because we are only interested in the masking properties of M'.
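As an illustration of how Definition 2 can be checked mechanically, the sketch below verifies conditions (A) and (B.1)-(B.4) for a given candidate relation, reusing the ColoredKripkeStructure sketch of Section 2; the names is_masking and ap (the restricted set AP' of the sub-labelings) are ours, and this is only one straightforward reading of the definition, not an algorithm taken from the paper:

from typing import Set, Tuple

def is_masking(M, M2, rel: Set[Tuple[State, State]], ap: Set[str]) -> bool:
    def lab(K, s):                      # sub-labeling L0(s) = L(s) ∩ AP'
        return K.label[s] & ap

    # condition (A): initial states are related in both directions
    if not all(any((s1, s2) in rel for s2 in M2.init) for s1 in M.init):
        return False
    if not all(any((s1, s2) in rel for s1 in M.init) for s2 in M2.init):
        return False

    for (s1, s2) in rel:
        if lab(M, s1) != lab(M2, s2):                                  # (B.1)
            return False
        for t1 in M.post_n(s1):                                        # (B.2)
            if not any((t1, t2) in rel for t2 in M2.post_n(s2)):
                return False
        for t2 in M2.post_n(s2):                                       # (B.3)
            if not any((t1, t2) in rel for t1 in M.post_n(s1)):
                return False
        for t2 in M2.post_f(s2):                                       # (B.4)
            if not (any((t1, t2) in rel for t1 in M.post_n(s1))
                    or (s1, t2) in rel):
                return False
    return True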

We say that state s2 is masking fault-tolerant for s1 when s1 Mask s2. Intuitively, starting in s2 we can mask the faults in such a way that the behavior exhibited is the same as that observed when starting from s1 (and executing transitions without faults). Moreover, we say that M' masks faults for M iff s0 Mask s0' (where s0 and s0' are the corresponding initial states 4) for some masking fault-tolerance relation Mask; we denote this situation by M ⪯_Mask M'. We present a simple example to illustrate the above definition.

4 In general, models have a unique initial state.

Example 1. Let us consider a memory cell that stores a bit of information and supports reading and writing operations. Briefly, each state maintains the current value of the memory cell (m = i, for i = 0, 1) and the last write operation performed (w = i, for i = 0, 1). Evidently, in this system the result of a read operation depends on the value stored in the cell. A potential fault in this setting occurs when a bit's value (say 1) unexpectedly loses its charge and turns into another value (say 0). We employ the technique of redundancy to deal with this situation; that is, we use three memory bits instead of one. Write operations are performed simultaneously on the three bits, whereas a read returns the value that is repeated at least twice in the memory bits (also known as voting), and the value read by voting is then written back into all the bits. Each state in the model is described by the variables w and m, which record the last write operation performed and the value actually read in the state, respectively. In addition, each state has three bits, described by the boolean variables c0, c1 and c2. For instance, the state s0 contains the information 11/111, representing the state w = 1, m = 1, c0 = 1, c1 = 1, and c2 = 1.

Fig. 2. Two masking fault-tolerant colored Kripke structures.

Now, consider the colored Kripke structures M (left) and M' (right) depicted in Figure 2.

M contains only normal transitions, describing the expected ideal behavior, and does not take faults into account. On the other hand, M' expresses the same ideal behavior, but now a fault may occur: a bit may suffer a discharge, changing its value from 1 to 0. We can show that in this simple case there exists a masking fault-tolerance relation between M and M' for the sub-labelings L0 and L0' obtained by restricting L and L' to the propositions m and w, respectively. The relation R1 = {(s0, t0), (s1, t1), (s0, t2)} is a masking fault-tolerance relation for (M, M'). Notice that each pair of R1 satisfies Definition 2:

- (s0, t0) satisfies condition B.1 because L0(s0) = L0'(t0). Conditions B.2 and B.3 are satisfied because the transition s0 → s0 is masked by t0 → t0 and vice versa, with (s0, t0) ∈ R1, and the transition t0 → t1 masks s0 → s1 and vice versa, with (s1, t1) ∈ R1. Finally, condition B.4 is satisfied because the faulty transition t0 ⇢ t2 is matched by the normal transition s0 → s0, with (s0, t2) ∈ R1.
- (s1, t1) satisfies condition B.1 because L0(s1) = L0'(t1). Conditions B.2 and B.3 are satisfied because the normal transition t1 → t0 is masked by s1 → s0 and vice versa, with (s0, t0) ∈ R1, and the normal transition t1 → t1 masks s1 → s1 and vice versa, with (s1, t1) ∈ R1.
- (s0, t2) satisfies condition B.1 since L0(s0) = L0'(t2) (taking into account that read operations return the value that is repeated at least twice in the memory bits). Conditions B.2 and B.3 are satisfied because the normal transition t2 → t0 masks s0 → s0 and vice versa, with (s0, t0) ∈ R1, and the normal transition t2 → t1 masks s0 → s1 and vice versa, with (s1, t1) ∈ R1.
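For illustration, Example 1 can be encoded with the sketches introduced earlier; the transitions and labels below are read off from the discussion in this example (the figure itself is not reproduced here), so they should be taken as a hypothetical rendering of Figure 2 rather than its exact contents:

M = ColoredKripkeStructure(
    states={"s0", "s1"}, init={"s0"},
    normal_edges={("s0", "s0"), ("s0", "s1"), ("s1", "s1"), ("s1", "s0")},
    faulty_edges=set(),
    label={"s0": frozenset({"w1", "m1"}), "s1": frozenset({"w0", "m0"})},
    green={"s0", "s1"},
)

Mp = ColoredKripkeStructure(                       # the fault-affected version M'
    states={"t0", "t1", "t2"}, init={"t0"},
    normal_edges={("t0", "t0"), ("t0", "t1"), ("t1", "t1"), ("t1", "t0"),
                  ("t2", "t0"), ("t2", "t1")},
    faulty_edges={("t0", "t2")},                   # one bit loses its charge
    label={"t0": frozenset({"w1", "m1"}), "t1": frozenset({"w0", "m0"}),
           "t2": frozenset({"w1", "m1"})},         # voting still reads 1 in t2
    green={"t0", "t1"},
)

R1 = {("s0", "t0"), ("s1", "t1"), ("s0", "t2")}
assert is_masking(M, Mp, R1, {"w0", "w1", "m0", "m1"})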

Now, we extend the definition of masking fault-tolerance to paths.

Lemma 1. (Masking fault-tolerance on paths) Suppose we are given two colored Kripke structures M and M' and a masking fault-tolerance relation Mask for (M, M') with s1 Mask s2. Then for each path fragment σ2 = s_{0,2} s_{1,2} s_{2,2} s_{3,2} ... starting in s_{0,2} = s2 there exists a normative path fragment σ1 = s_{0,1} s_{1,1} s_{2,1} s_{3,1} ... from state s_{0,1} = s1 of the same length such that s_{j,1} Mask s_{j,2} for all j.

Proof. Let σ2 = s_{0,2} s_{1,2} s_{2,2} s_{3,2} ... ∈ Paths(s2) be a path fragment in M' starting in s_{0,2} = s2, and assume s1 Mask s2. We successively define a corresponding path fragment in M from state s_{0,1} = s1, where each transition s_{i,2} → s_{i+1,2} or s_{i,2} ⇢ s_{i+1,2} is matched by a transition s_{i,1} → s_{i+1,1} such that s_{i+1,1} Mask s_{i+1,2}. We proceed by induction on i.

Base case: i = 0. From s_{0,1} Mask s_{0,2}, which is equivalent to s1 Mask s2, it follows that the transition s2 = s_{0,2} → s_{1,2}, or s2 = s_{0,2} ⇢ s_{1,2}, can be matched by a transition s1 = s_{0,1} → s_{1,1} such that s_{1,1} Mask s_{1,2}, by conditions (B.3) and (B.4) of Definition 2, respectively. This yields the path fragment s1 s_{1,1} in M.

Induction step: Assume i ≥ 0 and that the path fragment s_{0,1} s_{1,1} s_{2,1} ... s_{i,1} has already been constructed, with s_{j,1} Mask s_{j,2} for j = 1, ..., i. We consider the step s_{i,2} → s_{i+1,2} or s_{i,2} ⇢ s_{i+1,2} in σ2. Since s_{i,1} Mask s_{i,2}, there exists a transition s_{i,1} → s_{i+1,1} with s_{i+1,1} Mask s_{i+1,2}, by conditions (B.3) and (B.4) of Definition 2, respectively. This yields a path fragment s1 s_{1,1} ... s_{i,1} s_{i+1,1} which is statewise related to the prefix s2 s_{1,2} ... s_{i,2} s_{i+1,2} of σ2. □

Notice that if we only consider normal executions starting in s1 and s2, with s1 Mask s2, then by conditions (B.2) and (B.3) of Definition 2 we have that for each path fragment starting in s2 there exists a path fragment from s1 of the same length, and vice versa. This means that normal behaviors are respected in both cases.

An important property of masking fault-tolerance is that both safety and liveness properties of normative (i.e., non-faulty) executions are preserved by masking tolerant implementations, under fairness restrictions. As said above, we can assume that, if a set of non-faulty transitions is enabled infinitely often, then these transitions are executed infinitely often. In order to express this fairness assumption, we follow the ideas introduced in [2], where Kripke structures are augmented with fairness conditions; the authors consider a transition fairness defined as follows: a path π is fair with respect to the transition fairness condition iff all the transitions that are enabled along π infinitely often are also taken along π infinitely often. Formally, a transition (v, w) ∈ R is enabled in position i along π if π[i] = v; it is taken if, in addition, π[i + 1] = w. Thus, the semantics of dCTL- is slightly modified so that the state formulas O(ϕ) and P(ϕ) are interpreted over all normative fair paths rather than over all possible normative paths. Finally, the fairness assumption f is expressed as follows: f ≝ GF(v) ⇒ GF(w), with w ∈ Post_N(v). Now, we can state the following theorem.

Theorem 1. If s1 Mask s2 for sub-labelings L0 and L0' obtained by restricting AP to AP', then M, s1 ⊨_f ϕ^N ⇒ M', s2 ⊨_f ϕ, where all the propositional variables of ϕ are in AP'.

Proof. We proceed by induction over the structure of the formula ϕ.

Base case: ϕ = p. From s1 Mask s2 it follows by condition (B.1) for masking fault-tolerance that s1 and s2 have the same valuation. Thus, M, s1 ⊨_f p^N ⇒ M', s2 ⊨_f p, where (pi)^N = pi by Definition 1.

Inductive case:

Case 1: ϕ = ¬ψ. Suppose M, s1 ⊨_f (¬ψ)^N and M', s2 ⊭_f ¬ψ; the latter is equivalent to M', s2 ⊨_f ψ. But, by the induction hypothesis and Lemma 1, M, s1 ⊨_f ψ^N holds, which contradicts our assumption. Then, M, s1 ⊨_f (¬ψ)^N ⇒ M', s2 ⊨_f ¬ψ. The case ϕ = ψ1 ∨ ψ2 is similar.

Case 2: ϕ = A(ψ1 U ψ2). Suppose that M, s1 ⊨_f (A(ψ1 U ψ2))^N and M', s2 ⊭ A(ψ1 U ψ2); the latter is equivalent to M', s2 ⊨_f E(¬(ψ1 U ψ2)). Intuitively, this means that along some path starting in s2 either ψ1 stops holding too early, i.e., before ψ2 becomes valid, or ψ1 always holds but ψ2 never does. Formally, in the first case, for some path fragment σ2 such that σ2[0] = s2 there exists a j ≥ 0 such that M', σ2[j] ⊭ ψ1 and M', σ2[j] ⊭ ψ2, and for every 0 ≤ k < j it holds that M', σ2[k] ⊭ ψ2. In the second case, for some path fragment σ2 such that σ2[0] = s2 we have M', σ2[j] ⊨ ψ1 ∧ ¬ψ2 for every j ≥ 0. Furthermore, we have that s1 Mask s2; then, by Lemma 1, for each path fragment starting in s2 there exists a normative path fragment σ1 with σ1[0] = s1 of the same length which is statewise related to σ2; i.e., in the second case, for each path fragment σ2 with M', σ2[j] ⊨ ψ1 ∧ ¬ψ2 for every j ≥ 0, there exists a normative path fragment σ1 with σ1[0] = s1 such that M, σ1[j] ⊨ ψ1 ∧ ¬ψ2. Similarly for the first case. This contradicts the assumption that M, s1 ⊨_f (A(ψ1 U ψ2))^N, i.e., M, s1 ⊨_f O((ψ1)^N U (ψ2)^N): for all normal paths σ1 with σ1[0] = s1 there exists a j ≥ 0 such that M, σ1[j] ⊨ ψ2 and, for every 0 ≤ k < j, M, σ1[k] ⊨ ψ1.

Case 3: ϕ = E(ψ1 U ψ2). Similar to the proof of Case 2.

Case 4: ϕ = E(ψ1 W ψ2). Suppose that M, s1 ⊨_f (E(ψ1 W ψ2))^N and M', s2 ⊭ E(ψ1 W ψ2); the latter is equivalent to M', s2 ⊨_f A(¬(ψ1 W ψ2)), that is, M', s2 ⊨_f A((ψ1 ∧ ¬ψ2) U (¬ψ1 ∧ ¬ψ2)). Intuitively, this means that along every path starting in s2, ψ1 stops holding too early, i.e., before ψ2 becomes valid. In addition, we have that s1 Mask s2; then, by Lemma 1, for each path fragment σ2 with σ2[0] = s2 such that there exists a j ≥ 0 with M', σ2[j] ⊨ ¬ψ1 ∧ ¬ψ2 and, for every 0 ≤ k < j, M', σ2[k] ⊨ ψ1 ∧ ¬ψ2, there exists a normative path fragment from state s1 of the same length which is statewise related to σ2. This contradicts the assumption that M, s1 ⊨_f (E(ψ1 W ψ2))^N, i.e., M, s1 ⊨_f P((ψ1)^N W (ψ2)^N): for some normal path σ, either there exists a j ≥ 0 such that M, σ[j] ⊨_f ψ2 and, for every 0 ≤ k < j, M, σ[k] ⊨_f ψ1, or for every j ≥ 0 we have M, σ[j] ⊨_f ψ1. The proof is similar for the case ϕ = A(ψ1 W ψ2).

Case 5: ϕ = O(ψ1 U ψ2). Suppose that M, s1 ⊨_f (O(ψ1 U ψ2))^N and M', s2 ⊭ O((ψ1)^N U (ψ2)^N). Intuitively, this means that (ψ1)^N U (ψ2)^N does not hold along some normative path starting in s2. But note that, by Lemma 1, we can construct a matching execution from s1 of the same length which is statewise related to the path along which (ψ1)^N U (ψ2)^N does not hold. This contradicts the assumption that (O(ψ1 U ψ2))^N holds along all normative paths starting in s1. Hence, we can conclude M, s1 ⊨_f (O(ψ1 U ψ2))^N ⇒ M', s2 ⊨_f O((ψ1)^N U (ψ2)^N).

The proofs for the cases ϕ = P(ψ1 U ψ2), ϕ = O(ψ1 W ψ2), and ϕ = P(ψ1 W ψ2) are similar to that of Case 5. □

3.2 Nonmasking Fault-Tolerance

Nonmasking tolerance is less strict than masking tolerance: it allows for the existence of some states which do not mask faults. Technically speaking, the liveness part is always guaranteed, but the safety part is only eventually respected. Intuitively, this type of fault-tolerance allows the program to violate the safety specification while it is recovering to the normal behavior. The characterization of this kind of behavior is given as follows:

Definition 3. (Nonmasking fault-tolerance) Given two colored Kripke structures M = ⟨S, I, R, L, N⟩ and M' = ⟨S', I', R', L', N'⟩, we say that a relation Nonmask ⊆ S × S' is nonmasking for sub-labelings L0 ⊆ L and L0' ⊆ L' iff:

(A) ∀ s1 ∈ I (∃ s2 ∈ I' : s1 Nonmask s2) and ∀ s2 ∈ I' (∃ s1 ∈ I : s1 Nonmask s2).
(B) for all s1 Nonmask s2 the following holds:
(1) L0(s1) = L0'(s2).
(2) if s1' ∈ Post_N(s1), then there exists s2' ∈ Post_N(s2) with s1' Nonmask s2'.
(3) if s2' ∈ Post_N(s2), then there exists s1' ∈ Post_N(s1) with s1' Nonmask s2'.
(4) if s2' ∈ Post_F(s2), then there exists s1' ∈ Post_N(s1) with s1' Nonmask s2', or s1 Nonmask s2',
(5) if s2' ∈ Post_F(s2) with ¬(s1' Nonmask s2') for all s1' ∈ Post_N(s1), then there exists a finite path fragment s2 ⇢ s2' → ... → s2'' such that either: s1' Nonmask s2'' for some s1' ∈ Post_N(s1), or s1 Nonmask s2''.

Conditions A, B.1, B.2, B.3 and B.4 are similar to the conditions of Definition 2 of masking fault-tolerance. Condition B.5 asserts that, in case there is an outgoing faulty transition s2 ⇢ s2' such that no normal successor of s1 is in a nonmasking relation with s2', then there exists a path fragment leading from s2 to some s2'' such that s1' Nonmask s2'' for some normal successor s1' of s1, or s2'' is nonmasking fault-tolerant for s1. In this case we say that state s2 is nonmasking tolerant for s1, denoted by s1 Nonmask s2. In addition, we say that M' is nonmasking fault-tolerant w.r.t. M iff s0 Nonmask s0' (where s0 and s0' are the corresponding initial states) for some nonmasking fault-tolerance relation Nonmask; this fact is indicated by M ⪯_Nonmask M'.

At first sight this kind of simulation may seem similar to the notion of weak bisimulation used in process algebra [12], where silent steps are taken into account; note, however, that faults may produce observable changes, whereas silent steps produce no observable (i.e., only internal) changes.
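The conditions (B.4) and (B.5) that distinguish Definition 3 from Definition 2 can be sketched as follows, again reusing the earlier structure class. Since the definition leaves open which transitions the recovery fragment of (B.5) may use, the sketch lets it use any transition of M'; this is one possible reading, under our own naming, and not an algorithm taken from the paper:

from collections import deque
from typing import Set, Tuple

def reaches_related(M2, start: State, targets: Set[State],
                    rel: Set[Tuple[State, State]]) -> bool:
    """Is some state reachable from `start` in M2 related (by rel)
    to some state in `targets`?"""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if any((t, s) in rel for t in targets):
            return True
        for nxt in M2.post_n(s) | M2.post_f(s):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def nonmasking_fault_conditions(M, M2, rel, s1: State, s2: State) -> bool:
    # (B.4): every faulty successor of s2 is matched by a normal successor of s1
    # or is related to s1 itself; failing that, (B.5) asks for a finite recovery
    # fragment reaching a state related to s1 or to one of its normal successors.
    for t2 in M2.post_f(s2):
        matched = any((t1, t2) in rel for t1 in M.post_n(s1)) or (s1, t2) in rel
        if not matched and not reaches_related(M2, t2, M.post_n(s1) | {s1}, rel):
            return False
    return True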

Let us present an example of nonmasking tolerance.

Example 2. For the memory cell introduced in Example 1, consider now the colored Kripke structures M (left) and M' (right) depicted in Figure 3. In this case, M' expresses the same behavior that M does, but now we consider that two faults may occur: a bit changes its value from 1 to 0, and after that another bit may change its value too (i.e., two errors may occur consecutively). The relation R2 = {(s0, t0), (s1, t1), (s0, t2)} is nonmasking tolerant for (M, M') and the sub-labelings L0 and L0' obtained by restricting L and L' to the propositions m and w, respectively. Let us check the required conditions of Definition 3 for each pair of R2:

Fig. 3. Two nonmasking fault-tolerant colored Kripke structures.

- (s0, t0) satisfies conditions B.1, B.2 and B.3. Finally, condition B.4 is satisfied because the faulty transition t0 ⇢ t2 is matched, in the nonmasking sense, by the normal transition s0 → s0, with (s0, t2) ∈ R2.
- (s0, t2) satisfies conditions B.1, B.2, and B.3. Finally, condition B.5 is satisfied since t3 is not nonmasking-related to any successor of s0, i.e., ¬(s0 Nonmask t3) and ¬(s1 Nonmask t3), and there exists a finite path fragment t2 ⇢ t3 → t0 with (s0, t0) ∈ R2. Moreover, there exists another finite path fragment t2 ⇢ t3 → t1 with (s1, t1) ∈ R2.
- (s1, t1) satisfies conditions B.1, B.2 and B.3.

Note that for each pair described above, conditions B.1, B.2, and B.3 are satisfied by arguments similar to those used in the previous example.

We can extend nonmasking to paths in a way similar to Lemma 1.

Lemma 2. (Nonmasking fault-tolerance on paths) Suppose we are given two colored Kripke structures M and M' and a nonmasking fault-tolerance relation Nonmask for (M, M') with s1 Nonmask s2. Then for each path fragment σ2 = s_{0,2} s_{1,2} s_{2,2} s_{3,2} ... starting in s_{0,2} = s2 there exists a normative path fragment σ1 = s_{0,1} s_{1,1} s_{2,1} s_{3,1} ... from state s_{0,1} = s1 such that s_{j,1} Nonmask s_{j,2} for all j.

Let us investigate the properties of our definition of nonmasking. An obvious property is the following: if we have a nonmasking fault-tolerance relation between s1 and s2 and, in the absence of faults, ϕ holds in every state of the normal paths starting in s1, then, in fair executions and in the presence of faults, the paths starting in s2 eventually return to the ideal behavior in all possible futures. The fairness assumption is the same as the one imposed for Theorem 1.

Theorem 2. If s1 Nonmask s2 for sub-labelings L0 and L0' obtained by restricting AP to AP', then M, s1 ⊨_f ϕ^N ⇒ M', s2 ⊨_f AFAG(ϕ) 5, where all the propositional variables of ϕ are in AP'.

5 We assume that a subsequent execution of M' from these states eventually reaches a state from which the desired correctness properties of M are satisfied.

Proof. We proceed by induction over the structure of the formula ϕ.

Base case: ϕ = p. From s1 Nonmask s2 it follows by condition (B.1) for nonmasking fault-tolerance that s1 and s2 have the same valuation. Thus, M, s1 ⊨_f p^N ⇒ M', s2 ⊨_f p, where (pi)^N = pi by Definition 1.

Inductive case:

Case 1: ϕ = ¬ψ. Suppose M, s1 ⊨_f (¬ψ)^N and M', s2 ⊭_f ¬ψ; the latter is equivalent to M', s2 ⊨_f ψ. But, by the induction hypothesis and Lemma 2, M, s1 ⊨_f ψ^N holds, which contradicts our assumption. Then, M, s1 ⊨_f (¬ψ)^N ⇒ M', s2 ⊨_f ¬ψ; in fact, M', s2 ⊨_f AFAG(¬ψ) holds. The case ϕ = ψ1 ∨ ψ2 is similar.

Case 2: ϕ = A(ψ1 U ψ2). Suppose that M, s1 ⊨_f (A(ψ1 U ψ2))^N and M', s2 ⊭_f AFAG(A(ψ1 U ψ2)); the latter is equivalent to M', s2 ⊨_f E(¬(ψ1 U ψ2)). Intuitively, this means that along some path starting in s2 either ψ1 stops holding too early, i.e., before ψ2 becomes valid, or ψ1 always holds but ψ2 never does. Formally, in the first case, for some path fragment σ2 such that σ2[0] = s2 there exists a j ≥ 0 such that M', σ2[j] ⊭ ψ1 and M', σ2[j] ⊭ ψ2, and for every 0 ≤ k < j it holds that M', σ2[k] ⊭ ψ2. In the second case, for some path fragment σ2 such that σ2[0] = s2 we have M', σ2[j] ⊨ ψ1 ∧ ¬ψ2 for every j ≥ 0. Furthermore, we have that s1 Nonmask s2; then, by Lemma 2, for each path fragment starting in s2 there exists a normative path fragment σ1 with σ1[0] = s1 of the same length which is statewise related to σ2; i.e., in the second case, for each path fragment σ2 with M', σ2[j] ⊨ ψ1 ∧ ¬ψ2 for every j ≥ 0, there exists a normative path fragment σ1 with σ1[0] = s1 such that M, σ1[j] ⊨ ψ1 ∧ ¬ψ2. Similarly for the first case. This contradicts the assumption that M, s1 ⊨_f (A(ψ1 U ψ2))^N, i.e., M, s1 ⊨_f O((ψ1)^N U (ψ2)^N): for all normal paths σ1 with σ1[0] = s1 there exists a j ≥ 0 such that M, σ1[j] ⊨ ψ2 and, for every 0 ≤ k < j, M, σ1[k] ⊨ ψ1.

Case 3: ϕ = E(ψ1 U ψ2). Similar to the proof of Case 2.

Case 4: ϕ = E(ψ1 W ψ2). Suppose that M, s1 ⊨_f (E(ψ1 W ψ2))^N and M', s2 ⊭_f AFAG(E(ψ1 W ψ2)); the latter is equivalent to M', s2 ⊨_f A(¬(ψ1 W ψ2)), that is, M', s2 ⊨_f A((ψ1 ∧ ¬ψ2) U (¬ψ1 ∧ ¬ψ2)). Intuitively, this means that along every path starting in s2, ψ1 stops holding too early, i.e., before ψ2 becomes valid. In addition, we have that s1 Nonmask s2; then, by Lemma 2, for each path fragment σ2 with σ2[0] = s2 such that there exists a j ≥ 0 with M', σ2[j] ⊨ ¬ψ1 ∧ ¬ψ2 and, for every 0 ≤ k < j, M', σ2[k] ⊨ ψ1 ∧ ¬ψ2, there exists a normative path fragment from state s1 of the same length which is statewise related to σ2. This contradicts the assumption that M, s1 ⊨_f (E(ψ1 W ψ2))^N, i.e., M, s1 ⊨_f P((ψ1)^N W (ψ2)^N): for some normal path σ, either there exists a j ≥ 0 such that M, σ[j] ⊨_f ψ2 and, for every 0 ≤ k < j, M, σ[k] ⊨_f ψ1, or for every j ≥ 0 we have M, σ[j] ⊨_f ψ1. The proof is similar for the case ϕ = A(ψ1 W ψ2).

Case 5: ϕ = O(ψ1 U ψ2). Suppose that M, s1 ⊨_f (O(ψ1 U ψ2))^N and M', s2 ⊭_f AFAG(O((ψ1)^N U (ψ2)^N)). Intuitively, this means that (ψ1)^N U (ψ2)^N does not hold along some normative path starting in s2. But note that, by Lemma 2, we can construct a matching execution from s1 of the same length which is statewise related to the path along which (ψ1)^N U (ψ2)^N does not hold. This contradicts the assumption that (O(ψ1 U ψ2))^N holds along all normative paths starting in s1. Hence, we can conclude M, s1 ⊨_f (O(ψ1 U ψ2))^N ⇒ M', s2 ⊨_f AFAG(O((ψ1)^N U (ψ2)^N)).

The proofs for the cases ϕ = P(ψ1 U ψ2), ϕ = O(ψ1 W ψ2), and ϕ = P(ψ1 W ψ2) are similar to that of Case 5. □

3.3 Failsafe Fault-Tolerance

Finally, we present a characterization of failsafe fault-tolerance. Failsafe fault-tolerance ensures that the system stays in a safe state; this means that the safety invariants (regarding normative states) are preserved, but liveness conditions may not be.

Definition 4. (Failsafe fault-tolerance) Given two colored Kripke structures M = ⟨S, I, R, L, N⟩ and M' = ⟨S', I', R', L', N'⟩, we say that a relation Failsafe ⊆ S × S' is failsafe for sub-labelings L0 ⊆ L and L0' ⊆ L' iff:

(A) ∀ s1 ∈ I (∃ s2 ∈ I' : s1 Failsafe s2) and ∀ s2 ∈ I' (∃ s1 ∈ I : s1 Failsafe s2).
(B) for all s1 Failsafe s2 the following holds:
(1) L0(s1) = L0'(s2).
(2) if s1' ∈ Post_N(s1), then there exists s2' ∈ Post_N(s2) with s1' Failsafe s2'.
(3) if s2' ∈ Post_N(s2), then there exists s1' ∈ Post_N(s1) with s1' Failsafe s2'.
(4) if s2' ∈ Post_F(s2), then either there exists s1' ∈ Post_N(s1) with L0(s1') = L0'(s2'), or L0(s1) = L0'(s2').

Conditions A, B.1, B.2, and B.3 are similar to the conditions of Definition 2 of masking fault-tolerance. Condition B.4 states that every outgoing faulty transition from s2 either must be matched by an outgoing normal transition from s1, requiring only that the states s1' and s2' are labeled with the same propositions, or that s1 and s2' are labeled with the same propositions. We say that state s2 is failsafe fault-tolerant for s1, denoted s1 Failsafe s2.
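The distinguishing condition (B.4) of Definition 4 can be sketched as follows; conditions (A) and (B.1)-(B.3) can be checked exactly as in the masking case, so only the treatment of faulty transitions is shown (names are ours, and this is merely an illustrative reading of the definition):

from typing import Set

def failsafe_fault_condition(M, M2, ap: Set[str], s1: State, s2: State) -> bool:
    def lab(K, s):                      # sub-labeling L0(s) = L(s) ∩ AP'
        return K.label[s] & ap
    # (B.4): a faulty successor of s2 needs only to agree on the sub-labeling
    # with some normal successor of s1, or with s1 itself.
    for t2 in M2.post_f(s2):
        if not (any(lab(M, t1) == lab(M2, t2) for t1 in M.post_n(s1))
                or lab(M, s1) == lab(M2, t2)):
            return False
    return True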

Moreover, we say that M' is failsafe fault-tolerant for M iff s0 Failsafe s0' (where s0 and s0' are the corresponding initial states) for some failsafe relation Failsafe; we denote this situation by M ⪯_Failsafe M'. We present a simple example to illustrate this notion.

Example 3. Consider the colored Kripke structures M (left) and M' (right) depicted in Figure 4. M contains only normal transitions, describing the expected ideal behavior. On the other hand, M' expresses the same behavior that M does, but it considers the occurrence of one fault. The relation R3 = {(s0, t0), (s1, t1)} is a failsafe fault-tolerance relation for (M, M') and the sub-labelings obtained by restricting L and L' to the propositions m and w. Now, we show that each pair of R3 satisfies Definition 4:

Fig. 4. Two failsafe fault-tolerant colored Kripke structures.

- (s0, t0) satisfies conditions B.1, B.2, and B.3. Finally, condition B.4 is satisfied because the faulty transition t0 ⇢ t2 is matched, in the failsafe sense, by the normal transition s0 → s0, with L0(s0) = L0'(t2).
- (s1, t1) satisfies conditions B.1, B.2, and B.3.

Note that for each pair described above, conditions B.1, B.2, and B.3 are satisfied by arguments identical to those used in Example 1.

Now, we extend the definition of failsafe fault-tolerance to paths.

Lemma 3. (Failsafe fault-tolerance on paths) Suppose we are given two colored Kripke structures M and M' and a failsafe fault-tolerance relation Failsafe for (M, M') with s1 Failsafe s2. Then for each path fragment σ2 = s_{0,2} s_{1,2} s_{2,2} s_{3,2} ... starting in s_{0,2} = s2 there exists a normative path fragment σ1 = s_{0,1} s_{1,1} s_{2,1} s_{3,1} ... from state s_{0,1} = s1 of the same length such that s_{j,1} Failsafe s_{j,2} for all j.

Proof. The proof is similar to that of Lemma 1. □

The following theorem states that our definition of failsafe fault-tolerance preserves safety properties.

Theorem 3. If s1 Failsafe s2 for sub-labelings L0 and L0' obtained by restricting AP to AP', and ϕ is a safety property, then M, s1 ⊨ ϕ^N ⇒ M', s2 ⊨ ϕ, where all the propositional variables of ϕ are in AP'.

Proof. We proceed by induction over the structure of the formula ϕ.

Base case: ϕ = p. M, s1 ⊨ p^N iff p ∈ L(s1) (where (pi)^N = pi by Definition 1); the latter is equivalent to p ∈ L'(s2) by condition (B.1) of Definition 4 of failsafe fault-tolerance, since s1 and s2 have the same valuation. Thus, M', s2 ⊨ p iff p ∈ L'(s2).

Inductive case:

Case 1: ϕ = ¬ψ. Suppose M, s1 ⊨ (¬ψ)^N and M', s2 ⊭ ¬ψ; the latter is equivalent to M', s2 ⊨ ψ. But, by the induction hypothesis and Lemma 3, M, s1 ⊨ ψ^N holds, which contradicts our assumption. Then, M, s1 ⊨ (¬ψ)^N ⇒ M', s2 ⊨ ¬ψ. The case ϕ = ψ1 ∨ ψ2 is similar.

Case 2: ϕ = A(ψ1 U ψ2). Suppose that M, s1 ⊨ (A(ψ1 U ψ2))^N and M', s2 ⊭ A(ψ1 U ψ2); the latter is equivalent to M', s2 ⊨ E(¬(ψ1 U ψ2)). Intuitively, this means that along some path starting in s2 either ψ1 stops holding too early, i.e., before ψ2 becomes valid, or ψ1 always holds but ψ2 never does. Formally, in the first case, for some path fragment σ2 such that σ2[0] = s2 there exists a j ≥ 0 such that M', σ2[j] ⊭ ψ1 and M', σ2[j] ⊭ ψ2, and for every 0 ≤ k < j it holds that M', σ2[k] ⊭ ψ2. In the second case, for some path fragment σ2 such that σ2[0] = s2 we have M', σ2[j] ⊨ ψ1 ∧ ¬ψ2 for every j ≥ 0. Furthermore, we have that s1 Failsafe s2; then, by Lemma 3, for each path fragment starting in s2 there exists a normative path fragment σ1 with σ1[0] = s1 of the same length which is statewise related to σ2; i.e., in the second case, for each path fragment σ2 with M', σ2[j] ⊨ ψ1 ∧ ¬ψ2 for every j ≥ 0, there exists a normative path fragment σ1 with σ1[0] = s1 such that M, σ1[j] ⊨ ψ1 ∧ ¬ψ2. Similarly for the first case. This contradicts the assumption that M, s1 ⊨ (A(ψ1 U ψ2))^N, i.e., M, s1 ⊨ O((ψ1)^N U (ψ2)^N): for all normal paths σ1 with σ1[0] = s1 there exists a j ≥ 0 such that M, σ1[j] ⊨ ψ2 and, for every 0 ≤ k < j, M, σ1[k] ⊨ ψ1.

Case 3: ϕ = E(ψ1 U ψ2). Similar to the proof of Case 2.

Case 4: ϕ = E(ψ1 W ψ2). Suppose that M, s1 ⊨ (E(ψ1 W ψ2))^N and M', s2 ⊭ E(ψ1 W ψ2); the latter is equivalent to M', s2 ⊨ A(¬(ψ1 W ψ2)), that is, M', s2 ⊨ A((ψ1 ∧ ¬ψ2) U (¬ψ1 ∧ ¬ψ2)). Intuitively, this means that along every path starting in s2, ψ1 stops holding too early, i.e., before ψ2 becomes valid. In addition, we have that s1 Failsafe s2; then, by Lemma 3, for each path fragment σ2 with σ2[0] = s2 such that there exists a j ≥ 0 with M', σ2[j] ⊨ ¬ψ1 ∧ ¬ψ2 and, for every 0 ≤ k < j, M', σ2[k] ⊨ ψ1 ∧ ¬ψ2, there exists a normative path fragment from state s1 of the same length which is statewise related to σ2. This contradicts the assumption that M, s1 ⊨ (E(ψ1 W ψ2))^N, i.e., M, s1 ⊨ P((ψ1)^N W (ψ2)^N): for some normal path σ, either there exists a j ≥ 0 such that M, σ[j] ⊨ ψ2 and, for every 0 ≤ k < j, M, σ[k] ⊨ ψ1, or for every j ≥ 0 we have M, σ[j] ⊨ ψ1. The proof is similar for the case ϕ = A(ψ1 W ψ2).

Case 5: ϕ = O(ψ1 U ψ2). Suppose that M, s1 ⊨ (O(ψ1 U ψ2))^N and M', s2 ⊭ O((ψ1)^N U (ψ2)^N). Intuitively, this means that (ψ1)^N U (ψ2)^N does not hold along some normative path starting in s2. But note that, by Lemma 3, we can construct a matching execution from s1 of the same length which is statewise related to the path along which (ψ1)^N U (ψ2)^N does not hold. However, this contradicts the assumption that (O(ψ1 U ψ2))^N holds along all normative paths starting in s1. Hence, we can conclude M, s1 ⊨ (O(ψ1 U ψ2))^N ⇒ M', s2 ⊨ O((ψ1)^N U (ψ2)^N).

The proofs for the cases ϕ = P(ψ1 U ψ2), ϕ = O(ψ1 W ψ2), and ϕ = P(ψ1 W ψ2) are similar to that of Case 5. □

Roughly speaking, this property says that, if we have a failsafe relation between s1 and s2 and ϕ holds in every state of the normal paths starting in s1 in the absence of faults, then ϕ is always true, even in the presence of faults, along the paths starting in s2.

It is worth investigating the relational properties of the relations defined above; this is done in the following lemma.

Lemma 4. Given relations Mask, Nonmask and Failsafe, we have the following properties:

- Mask, Nonmask, and Failsafe are transitive;
- if M does not have faults, then M ⪯ M' implies M' ⪯ M, where ⪯ ∈ {⪯_Mask, ⪯_Nonmask, ⪯_Failsafe};
- Mask, Nonmask, and Failsafe are not necessarily reflexive.

Proof. Nonreflexivity: if Mask were reflexive, then for any colored Kripke structure M with state space S, the identity relation R = {(s, s) | s ∈ S} would be a masking fault-tolerance relation for (M, M). However, we show that Mask is not reflexive via the following counterexample: given the colored Kripke structure M with state space S = {s0, s1} depicted in Figure 5, its identity relation R = {(s0, s0), (s1, s1)} is not a masking fault-tolerance relation, because the pair (s0, s0) does not satisfy condition (B.4) of Definition 2: for s1 ∈ Post_F(s0) there exists no normal successor s1' of s0 with s1' Mask s1, nor does s0 Mask s1 hold.

Fig. 5. Counterexample for the nonreflexivity of Mask, Failsafe and Nonmask.

Symmetry: Assume R is a masking fault-tolerance relation for two colored Kripke structures M and M'. Consider the relation R⁻¹ = {(s2, s1) | (s1, s2) ∈ R}, obtained by swapping the states in each pair of R.

Clearly, relation R⁻¹ satisfies conditions (A) and (B.1). Conditions (B.2) and (B.3) also hold for R⁻¹, by the symmetry of (B.2) and (B.3). Finally, condition (B.4) is satisfied because M does not have faulty transitions (note that the faulty transitions of the left structure of the pair (M, M') are never taken into account). Hence, R⁻¹ is a masking fault-tolerance relation for (M', M).

Transitivity: Let R_{1,2} and R_{2,3} be masking fault-tolerance relations for (M1, M2) and (M2, M3), respectively. The relation R_{1,3} = R_{1,2} ∘ R_{2,3}, given by R_{1,3} = {(s1, s3) | ∃ s2 ∈ S2 : (s1, s2) ∈ R_{1,2} ∧ (s2, s3) ∈ R_{2,3}}, is a masking fault-tolerance relation for (M1, M3), where S2 denotes the set of states of M2. This can be proven by checking all the conditions of Definition 2 for a masking fault-tolerance relation.

(A) Consider an initial state s1 of M1. Since R_{1,2} is a masking fault-tolerance relation, there is an initial state s2 of M2 with (s1, s2) ∈ R_{1,2}. As R_{2,3} is a masking fault-tolerance relation, there is an initial state s3 of M3 with (s2, s3) ∈ R_{2,3}. Thus, (s1, s3) ∈ R_{1,3}. In the same way we can check that for any initial state s3 of M3 there is an initial state s1 of M1 with (s1, s3) ∈ R_{1,3}.

(B.1) By definition of R_{1,3}, there is a state s2 in M2 with (s1, s2) ∈ R_{1,2} and (s2, s3) ∈ R_{2,3}. Then, L1(s1) = L2(s2) = L3(s3).

(B.2) Assume (s1, s3) ∈ R_{1,3}. As (s1, s2) ∈ R_{1,2}, it follows that if s1' ∈ Post_N(s1) then (s1', s2') ∈ R_{1,2} for some s2' ∈ Post_N(s2). Since (s2, s3) ∈ R_{2,3}, we have (s2', s3') ∈ R_{2,3} for some s3' ∈ Post_N(s3). Hence, (s1', s3') ∈ R_{1,3}.

(B.3) Similar to the proof for (B.2).

(B.4) Assume (s1, s3) ∈ R_{1,3}, and let s3' ∈ Post_F(s3). Since (s2, s3) ∈ R_{2,3}, either there exists s2' ∈ Post_N(s2) with (s2', s3') ∈ R_{2,3}, or (s2, s3') ∈ R_{2,3}. In the first case, since (s1, s2) ∈ R_{1,2}, by condition (B.3) there exists s1' ∈ Post_N(s1) with (s1', s2') ∈ R_{1,2}; thus (s1', s3') ∈ R_{1,3}. In the second case, (s1, s3') ∈ R_{1,3}. Then, for every s3' ∈ Post_F(s3) there either exists s1' ∈ Post_N(s1) with (s1', s3') ∈ R_{1,3}, or (s1, s3') ∈ R_{1,3}.

The proofs of symmetry and transitivity for nonmasking and failsafe are similar to the one presented above. □
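For illustration, the composition R_{1,3} = R_{1,2} ∘ R_{2,3} used in the transitivity argument can be written out directly (compose is our own name; the proof obligation above is that the result again satisfies Definition 2):

from typing import Set, Tuple

def compose(r12: Set[Tuple[State, State]],
            r23: Set[Tuple[State, State]]) -> Set[Tuple[State, State]]:
    """Relational composition: (s1, s3) is included iff some s2 links them."""
    return {(s1, s3) for (s1, s2) in r12 for (t2, s3) in r23 if s2 == t2}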

Also, we have inclusions between these kinds of relations; for example, any masking fault-tolerant system is also failsafe. This is stated in the following theorem.

Theorem 4. Let Mask, NMask and FSafe be the sets of masking, nonmasking and failsafe relations between two colored Kripke structures M and M'. Then we have: Mask ⊆ FSafe and Mask ⊆ NMask.

Proof sketch: The proof is straightforward by inspecting the definitions of masking, nonmasking, and failsafe fault-tolerance. All three share the same conditions on the normative behaviors, that is, conditions A, B.1, B.2, and B.3 appear in each of the definitions; thus, as far as the ideal behavior is concerned, the theorem holds. In the presence of faults, masking fault-tolerance requires both safety and liveness properties to be preserved; it is the most restrictive notion, compared with failsafe (where only safety is guaranteed) and nonmasking (where liveness is respected but safety is only eventually satisfied). Hence, if we have a masking tolerance relation, it is to be expected that it is also failsafe and nonmasking. Let us check our definitions to be sure that they work as intended. Condition B.4 of Definition 2 is subsumed by Definition 3 of nonmasking tolerance, through its condition B.4 and part of B.5, respectively. Finally, condition B.4 of Definition 2 is subsumed by condition B.4 of Definition 4 of failsafe tolerance: if s2' ∈ Post_F(s2), then either there exists s1' ∈ Post_N(s1) with L0(s1') = L0'(s2'), or L0(s1) = L0'(s2'); here we only require that, in both cases, the states s1' (respectively s1) and s2' are equally labeled, and we do not require that these states are in the relation, since failsafe focuses on guaranteeing safety properties. Clearly, in the definition of masking tolerance, requiring that s1' Mask s2' or s1 Mask s2' in particular implies that L0(s1') = L0'(s2') or L0(s1) = L0'(s2'). □

4 Checking Fault-Tolerance Properties

Simulation and bisimulation relations are amenable to efficient treatment. For instance, in [7, 10] algorithms for calculating several simulation and bisimulation relations are described and proven to be polynomial with respect to the number of states and transitions of the corresponding models. We have adapted these algorithms to our setting and have thereby obtained efficient procedures for proving masking, nonmasking and failsafe fault-tolerance. Such algorithms can be used to verify whether M ⪯ M', where ⪯ ∈ {⪯_Mask, ⪯_Nonmask, ⪯_Failsafe}. In order to verify fault-tolerance properties, we take colored Kripke structures M and M' over AP and combine them into a single colored Kripke structure M ⊎ M', which basically results from the disjoint union of their state spaces. The standard bisimulation algorithms can then be used for checking that the intended behavior described by the specification is bisimilar to the behavior exhibited by the fault-tolerant implementation (conditions B.2 and B.3 of masking, nonmasking, and failsafe).
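As a naive sketch of this procedure (our own rendering, not the optimized algorithms of [7, 10]), the code below computes the largest relation satisfying conditions (B.1)-(B.4) of Definition 2 by iteratively deleting violating pairs, and then checks condition (A); for simplicity it works on the two structures directly instead of building the combined structure M ⊎ M' explicitly:

from typing import Set, Tuple

def largest_masking_relation(M, M2, ap: Set[str]) -> Set[Tuple[State, State]]:
    rel = {(s1, s2) for s1 in M.states for s2 in M2.states
           if M.label[s1] & ap == M2.label[s2] & ap}                 # (B.1)
    changed = True
    while changed:
        changed = False
        for (s1, s2) in set(rel):
            ok = (all(any((t1, t2) in rel for t2 in M2.post_n(s2))
                      for t1 in M.post_n(s1))                        # (B.2)
                  and all(any((t1, t2) in rel for t1 in M.post_n(s1))
                          for t2 in M2.post_n(s2))                   # (B.3)
                  and all(any((t1, t2) in rel for t1 in M.post_n(s1))
                          or (s1, t2) in rel
                          for t2 in M2.post_f(s2)))                  # (B.4)
            if not ok:
                rel.discard((s1, s2))
                changed = True
    return rel

def masks(M, M2, ap: Set[str]) -> bool:
    rel = largest_masking_relation(M, M2, ap)
    return (all(any((s1, s2) in rel for s2 in M2.init) for s1 in M.init)   # (A)
            and all(any((s1, s2) in rel for s1 in M.init) for s2 in M2.init))

On the hypothetical encoding of Example 1 given earlier, masks(M, Mp, {"w0", "w1", "m0", "m1"}) returns True.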