Diagnosis of Discrete-Event Systems in Rules-based Model using First-order Linear Temporal Logic

Diagnosis of Discrete-Event Systems in Rules-based Model using First-order Linear Temporal Logic Zhongdong Huang and Siddhartha Bhattacharyya Dept. of Elec. & Comp. Eng., Univ. of Kentucky, Lexington, KY Vigyan Chandra Technology Dept., Eastern Kentucky University, Richmond, KY Shengbing Jiang GM R&D Planning, Warren, MI Ratnesh Kumar Dept. of Elec. and Comp. Eng., Iowa State Univ., Ames, IA Abstract We study diagnosis of discrete-event systems (DESs) modeled in the rules-based modeling formalism introduced in [6, 7], and applied in [15, 17] to model failure-prone systems. An attractive feature of rules-based model is it s compactness (size is polynomial in number of signals). A motivation for the work presented is to develop failure diagnosis techniques that are able to exploit this compactness. In this regard, we develop symbolic techniques for testing diagnosability and online diagnosis. Diagnosability test is shown to be an instance of first-order temporal logic model-checking. An algorithm for online diagnosis is obtained by using predicates and their transformers. Keywords: Discrete event system, rules-based model, diagnosability, first-order linear temporal logic model-checking, online diagnosis, predicates and predicate transformers 1 Introduction Detection and isolation of failures in large, complex systems is a crucial and challenging task. A failure is a deviation of a system from its normal or required behavior, such as occurrence of a failure event, or visiting a failed state, or more generally, violating a design The research was supported in part by the National Science Foundation under the grants NSF-ECS- 0218207, NSF-ECS-0244732, NSF-EPNES-0323379, and NSF-0424048, and a DoD-EPSCoR grant through the Office of Naval Research under the grant N000140110621. 1

specification. A stuck-close valve, decrease in the efficiency of a heat exchanger, abnormal bias in the output of a sensor, and leakage in pipelines are examples of events that can lead to failures. Failure diagnosis is the process of detecting and identifying deviations in a system behavior using the information available through sensors. The problem of failure diagnosis has received considerable attention in the literature of reliability engineering, control, and computer science; and a wide variety of schemes have been proposed. Recently it has also been studied in the framework of discrete-event systems (DESs) such as [11, 28, 25, 33, 32, 18, 39, 24, 40, 41, 1]. For extensions to decentralized setting see [12, 2, 31, 35, 30, 36], and to detection of repeatedly-occurring/intermitten failures see [21, 19, 9, 38]. Some applications can be found in [34, 3, 4, 27, 14, 37, 26]. In above works, the non-faulty behavior of the system, also called the specification, is either specified by an automaton (containing no failure states) or by a language (event-traces containing no failure events). Temporal logic based specification has been used for studying diagnosis of DESs in [29, 10, 20, 19]. In this paper, we study failure diagnosis of systems modeled in a rules-based model introduced in [6, 7], and applied in [15, 17] to model failure prone systems. In this formalism, state-variables and rules for modifying their values are used to compactly model a DES. The representation of a system with faults in the rules-based modeling formalism is polynomial in the number of signals and faults. The compactness of this model, together with its intuitive nature, makes it user-friendly, less error-prone, more flexible, easily scalable, and provides canonicity of representation for models of systems with or without faults. The motivation of the work presented here is to develop techniques for failure diagnosis that are able to exploit the compactness of the model. In this regard, we develop symbolic techniques based on first-order temporal logic model-checking for verifying diagnosability, and predicates and their transformers for online diagnosis. In the rules-based modeling formalism, states correspond to values of certain variables which start from certain initial values and evolve as the events occur. The input signals of the system are the independent variables, whereas the output signals are the dependent variables which are a function of the independent variables and of other dependent variables. For simplicity, all signals in the system are assumed to be binary valued (extension to nonbinary valued signals is given in [7]), and it is also assumed that the state of the system can be specified by the current values of the signals (extension to where the state depends on also the past values is also given in [7]). Initial condition is used to specify the initial values of the system signals. An event is a transition of an input or an output signal from one binary value to another. For each of the input and output events of the system an event occurrence rule is specified. The antecedent of such a rule is a predicate over the signal values that serves as the event enabling condition, whereas the consequent of the rule assigns new values to the various variables. In order to model failure prone systems, a binary valued fault signal is introduced to model presence or absence of each fault. From a modeling standpoint, the fault signals are treated just the same as any other signal in the system. Addition of fault signals to capture the faulty behavior requires new rules for the added fault events, and modification of rules of existing non-faulty events, by appropriately weakening their enabling guard conditions. 2

In this paper we use first-order LTL model-checking for testing diagnosability of DESs, and predicates and their transformers for online diagnosis. We illustrate through various examples how the diagnosability of DESs modeled using a rules-based formalism [7] prone to faults can be checked, and how online diagnosis can be performed. A conference version of the paper appeared as [16]. The rest of the paper is organized as follows. In Section 2, the definitions of predicates and their transformers, rules-based model, diagnosability, and first-order LTL temporal logic are introduced. In Section 3, diagnosability as a first-order LTL temporal logic model-checking is studied, and illustrated via an example. An algorithm for online diagnosis is provided in Section 4 and illustrated using the same example of Section 3. Section 5 provides another illustrative exmaple. Conclusions is provided in Section 6. 2 Notation and Preliminaries 2.1 Predicates, their transformers, and rules-based model A discrete-event system, denoted G, is a 4-tuple G := (X, Σ,, X 0 ), where X denotes the state set, Σ is the finite event set, X Σ X is the set of state transitions, and X 0 X is the set of initial states. A set of events Σ F Σ denote the set of failure events. We use state-variables to represent the states and a finite set of conditional assignment statements, called rules, to represent the state transitions. The notation v is used to denote the vector of state-variables of G. If v is n-dimensional, then v = [v 1,..., v i,..., v n ], where v i is the ith state-variable. The state space X of G equals the Cartesian product of domains of all state-variables, i.e., X := n i=1 D(v i ), where D(v i ) is the domain of v i. By definition D(v i ) is a countable set and can be identified with the set of natural numbers N. We use predicates for describing various subsets of the state space. A predicate P ( v) is a boolean-valued map over the set of states, P ( v) : X {0, 1}. The set of all predicates is denoted by, P( v). Consider for example a two dimensional state space X = Z 2. Then the predicate P ( v) = [v 1 v 2 ] refers to all the states in which the value of variable v 1 is at least as large as the value of variable v 2. The symbols true and false are used for denoting predicates that hold on all and none of the states respectively. With every predicate P ( v) P( v), we associate a set X P X on which P ( v) takes the value one. Thus the collection of predicates P( v) has a one-to-one correspondence with the power set 2 X, and the names predicates and state-sets can be used interchangeably. We say that the predicate P ( v) holds on ˆX X if ˆX X P. Given P ( v) P( v), its negation is denoted by P ( v). Given an indexing set Λ such P λ ( v) P( v) for each λ Λ, the conjunction and disjunction over Λ is denoted by λ Λ P λ ( v) and λ Λ P λ ( v) respectively. The quadruple (P( v),,, ) forms a boolean algebra that is isomorphic to the algebra of the subsets of X with the operations of complementation, intersection, and union. For predicates P ( v) and Q( v), we say that P ( v) is stronger than Q( v) (equivalently, Q( v) is weaker than P ( v)), if P ( v) Q( v) = P ( v), or equivalently, P ( v) Q( v) = Q( v). 3

State transitions map a state to another state. Such mappings are extended to set of states or predicates in a natural way, and are known as predicate transformers. We use F to denote the collection of all predicate transformers, i.e., if f F, then f : P( v) P( v). The conjunctive closure of f, denoted f, and disjunctive closure of f denoted f is defined to be i 0 f i and i 0 f i respectively, where f 0 is the identity predicate transformer and f i+1 := f(f i ). Given f : X X, the substitution predicate transformer v f( v) maps a predicate P ( v) to P (f( v)). Consider for example the f given by (v 1, v 2 ) (v 1 + v 2, v 1 v 2 ). Then the corresponding substitution predicate transformer maps the predicate [v 1 < v 2 ] to [v 1 + v 2 < v 1 v 2 ] = [v 2 < 0]. Next we review the rules-based model [7] (which is a specific assignment program model [23]) for representing a DES G described above. The initial state set of G is specified as an initial predicate, denoted I( v), which implies X 0 = X I. The state transitions of G is specified using a finite set of rules, also called conditional assignment statements, of the form: σ : [C σ ( v)] [ v f σ ( v)], where σ Σ is an event, C σ ( v) is a predicate, called the enabling condition or the guard, and f σ : X X is a map defined on the state space. If no guard is present, then true is treated as the guard. A conditional assignment statement of the above type is enabled if the condition C σ ( v) holds. An enabled assignment statement may execute. Upon execution, new values are assigned to the state-variables according to the map f σ and a state transition on the event σ occurs. We assume that if multiple assignment statements are simultaneously enabled, only one of them is nondeterministically executed. This assumption may be relaxed to allow concurrency of execution. Example 1 For the purpose of illustrating the representation of a DES in the rules-based model, we consider a simplified model of the transporter section of the LEGO r based assembly-line [5]. A schematic of the transporter is shown in Figure 1. M1 Gearbox motor (Issued commands Ifon/Ifoff; Iron/Iroff for forward/reverse movement) angle sensor rack A1 pinion FIXTURE TRANSPORTER Intermediate Home position positions Extended position (Angle sensor records intermediate, iup/idn, and extended, eup/edn, positions) Figure 1: Schematic of transporter 4

The transporter moves between home and extended positions, crossing an of intermediary position. The events that can occur in transport system are: Tfon, Tfoff, Tron, Troff, iup, idn, eup, edn. Tfon/Tfoff refers to the gear-box motor being turned on/off in the forward direction, while Tron/Troff is for the reverse direction. When the transporter leaves the home position it enters an intermediary position, i, which is a collection of all those positions whose values are unimportant from the positioning point of view. The events iup/idn correspond to the transporter arriving/leaving the intermediary position from/to the home position; and eup/edn correspond to its arriving/leaving the extended position from/to the intermediate one. We assume that the initial state for the present system is when all the actuators are off (Tr, Tf = off) and the transporter is in home position (i, e = dn). The model of the transporter in the rules-based formalism is given in Figure 2. As an example, the first rule is for the intermediate position becoming true, and is enbled when forward motor is on, reverse motor is off, and both the intermediate and extended positions are false. Initial condition: T f, T r, i, e = [off, off, dn, dn] Event occurrence rules: iup : [T f T r (i e)] [i i] idn : [T r T f (i e)] [i i] eup : [T f T r (i e)] [e e] edn : [T r T f (i e)] [e e] T fon : [T f] [T f T f] T foff : [T f] [T f T f] T ron : [T r] [T r T r] T roff : [T r] [T r T r] Figure 2: Rules-based model of transporter (without faults) The substitution predicate transformer can be used to define the forward one-step reachable, f r, and backward one-step reachable, br, predicate transformers for G. f r determines the postcondition after the occurrence of a state transition for a given precondition, whereas br determines the precondition prior to the occurrence of a state transition for a given postcondition. For the assignment statement σ : [C σ ( v)] [ v f σ ( v)] and a condition P ( v), these are formally defined as follows: fr(p ( v), σ) := C σ (fσ 1 ( v)) P (fσ 1 ( v)); br(p ( v), σ) := C σ ( v) P (f σ ( v)). Note that the computation of br is easier as compared to that of fr, since its computation does not require the extra computation of f 1. For ˆΣ Σ, we define fr(p ( v), ˆΣ) := 5

σ ˆΣ fr(p ( v), σ), and similarly, br(p ( v), ˆΣ) := σ ˆΣ br(p ( v), σ). Finally, note that fr (P ( v), ˆΣ) denotes the set of states which are reachable from a state in P ( v) by execution of zero or more transitions of events in ˆΣ. Similarly, br (P ( v), ˆΣ) denotes the set of states from where a state in P ( v) can be reached by execution of zero or more transitions of events in ˆΣ. Clearly, fr is useful in characterizing the forward reachability, whereas br is useful in characterizing the backward reachability. 2.2 Modeling failure-prone systems & diagnosability In order to obtain event occurrence rules for a system with faults, we consider both the signal faults and the system faults. Stuck-signal faults: These faults occur when any of the signals (such as actuators or sensors), owing to mechanical, electrical, or electromagnetic interference problems, gets stuck in a particular position with its logic status becoming either true or false permanently, until a recovery occurs through a repair or a replacement. When an actuator gets stuck in the on/off position such fault signals are denoted by so (stuck open)/sc (stuck closed) respectively; where the associated fault events are denoted by sof/scf and the recovery events by sor/scr, respectively. When a sensor gets stuck in the up/dn(down) position such fault signals are denoted by sup(stuck up)/sdn(stuck dn) respectively; whereas the associated fault events are denoted by supf/sdnf, and the recovery events by supr/sdnr, respectively. If the r th actuator/sensor, A r, is prone to both the stuck open/up fault which occurs only after the actuator/sensor is already in the on/up condition, and the stuck closed/down fault which occurs only when the actuator/sensor is already in the off/down condition, then the rule for the occurrence of these stuck actuator/sensor signal faults is written as: A r sof : [A r A r so] [A r so A r so] A r sor : [A r so] [A r so A r so] A r scf : [A r A r sc] [A r sc A r sc] A r scr : [A r sc] [A r sc A r sc] System/equipment faults: Certain fault signals such as equipment failures, power disruptions, system software crashes, etc., affect the entire system. These can occur spontaneously in the system depending only on their own values. In the transporter system of Example 1, a transporter crack is an example of a system/equipment fault. The events of this fault are crackf and crackr, corresponding to the occurrence of crack fault and crack recovery. The rules for the t th equipment fault, E t, can be expressed in the rules-based modeling formalism in a similar way to stuck-signal faults. In the presence of faults the guard conditions of the non-fault events are weakened by introducing additional disjunctive conditions under which the event can also occur. The antecedent of each rule now contains the disjunct of enabling condition with and without fault. 6

Example 2 For illustrating modeling of failure-prone systems in rules-based formalism, we extend the transporter model considered in Example 1 to include a fault of interest. We suppose the transport is prone to the forward motor stuck open fault (T fsonf ), whose occurrence does not alter the transport speed. The rules-based model of the transporter with this fault is given in Figure 3. As an example, the first rule for the event intermediate position becoming true has a weaker guard condition owing to the added disjunct with T fson in the first clause, implying that the effect of forward motor on or forward motor stuck-on on this event is the same. Initial condition: T f, T r, T fson, i, e = [off, off, off, dn, dn] Event occurrence rules: iup : [(T f T fson) T r (i e)] [i i] idn : [T r T f (i e)] [i i] eup : [(T f T fson) T r (i e)] [e e] edn : [T r T f (i e)] [e e] T fon : [T f] [T f T f] T foff : [T f T fson] [T f T f] T ron : [T r] [T r T r] T roff : [T r] [T f T r] T fsonf : [T f T fson] [T fson T fson] T fsonr : [T fson] [T fson T fson] Figure 3: Rules-based model of transporter with T f sonf fault Diagnosability is the ability to infer about past occurrences of unobservable failure events from a bounded number of observed events. We let M : Σ {ɛ} {ɛ} with M(ɛ) = ɛ denote the event observation mask. Without loss of generality, M(σ) = ɛ for all σ Σ F. The diagnosability of DESs [33] is defined as follows (without loss of generality [18], we only consider a single failure type ): Definition 1 A system G is said to be diagnosable with respect to the observation mask M and the failure event set Σ F if ( n N) ( s L(G), s f Σ F ) ( v = st L(G), t n) ( w L(G), M(w) = M(v)) ( u pr({w}), u f Σ F ), where L(G) is the set of all event-traces G can execute starting from an initial state, s f, u f Σ denote the last events in traces s, u Σ respectively, and pr({w}) Σ is the set of all prefixes of w Σ. Diagnosability requires that every failure event leads to observations distinct enough to enable identification of the failure within a bounded delay. 7

2.3 First-order linear temporal logic The language of propositional logic (PL) consists of finite number of atomic proposition together with the Boolean connectives. Propositional linear temporal logic (PLTL) [13] is an extension of PL by the temporal logic quantifiers/operator {X, U, F, G, B}, called next time, until, eventually, always, and before, respectively. The temporal operators describe properties of a state-trace in a computation: X ( next time ): it requires that a property hold in the next state of the state-trace. U ( until ): it is used to combine two properties. The combined property holds if there is a state in the state-trace where the second property holds, and at every preceding state in the trace, the first property holds. F ( eventually or in the future ): it is used to assert that a property will hold at some future state in the state-trace. It is a special case of until. G ( always or globally ): it specifies that a property holds at every state in the state-trace. B ( before ): it also combines two properties. It requires that if there is a state in the state-trace where the second property holds, then there exists a preceding state in the trace where the first property holds. We have following relations among the above operators, where p and q denote temporal logic formulae: F p trueup; Gp F p; pbq ( puq). So X and U can be used to express the other temporal operators. First-order linear temporal logic (FOLTL) [13] is obtained by taking PLTL and endowing it with a first-order logic. That is, in addition to atomic propositions, Boolean connectives, and temporal operators we now also have variables, constants & functions, and relations (also called predicates), each interpreted over appropriate domains, and existential and universal quantifiers. The terms of a FOLTL are defined inductively as follows: T1 Each zero-ary function (i.e, a constant) is a term. T2 Each variable is a term. T3 If t 1,, t n are terms and f is an n-ary function symbol, then f(t 1,, t n ) is a term. The formulae of a FOLTL are defined inductively as follows: F1 Each zero-ary relation/predicate symbol (i.e., an atomic proposition) is a formula. F2 If t 1,, t n formula. are terms and ψ is an n-ary relation/predicate, then ψ(t 1,, t n ) is a 8

F3 If p, q are formulae, then p q, p q, p, Xp, puq are formulae. F4 If p is a formula with a free variable v, then v.p, v.p are formulae. (Note v.p v. p.) The semantics of FOLTL is defined over state-traces pi : N X. For a FOLTL formula p, the notation π = p denotes that π satisfies p, which is defined inductively as follows. π = p iff π(0) p, where p is an atomic proposition. π = ψ(t 1,, t n ) iff π(0) ψ(t 1,, t n )). π = p q iff π = p and π = q. π = p iff it is not the case that π = p. π = Xp iff π 1 = p, where π j is state sequence π starting from π(j). π = (puq) iff j(π j = q and k < j(π k = p)). π = v.p, where v is free variable in p, iff there exists d D(v) such that π = p[v d], where p[v d] is formula p with variable v assigned the value d. Given a FOLTL formula p, a system G is said to satify Ap (resp., Ep) if each (resp., exists) state-trace of G satisfies p. Verifying the satisfaction of such formulae Ap or Ep by a system G is known as a model-checking problem, whereas verifying the existence of a system G such that it satifies a given formula Ap or Ep is known as a satisfiability problem. There exist model-checking software tools such as NuSMV [8] for verifying FOLTL formulae provided each state-variable has a bounded domain, in which case a FOLTL formula can be viewed as a PLTL formula. 3 Diagnosability as First-Order LTL Model-Checking In order to test the diagnosability in the rules-based model, we generalize the test for diagnosability in the automaton setting presented in our past work [18]. The test in the automaton setting consists of the following steps: Refine the state set X of G by augmenting each state by a binary valued label such that for each x X, (x, 1) (resp., (x, 0)) represents the traces in L(G) that lead to x and contain a failure (resp., no failure) event. Construct a testing automaton T by taking masked synchronous composition of two copies of the refined G, where a pair of transitions, one from each copy, synchronize only if they are one a pair of indistinguishable events. 9

Check if the synchronous composition contains a cycle of state-pairs where the two components carry non-identical labels. (G is diagnosable if and only if no such cycles are found.) As in the automaton setting, we introduce a binary valued variable F to indicate whether or not a fault happened in past. with this the new state-variable set becomes, x := ( v, F ). We next need to extend the rules-based model to include this new state-variable. Since initially when no event has ocurred, no fault has occurred either, the initial state is given by the predicate, I( x) := I( v) [F = 0]. The rule for each event σ Σ, [C σ ( v)] [ v f σ ( v)] is extended as follows. For a non-faulty event σ Σ F, the extended rule is given by, [C σ ( v)] [( v, F ) (f σ ( v), F ) (non-faulty event retains the value of F as unchanged), whereas for a faulty event σ Σ F, the extended rule is given by, [C σ ( v)] [( v, F ) (f σ ( v), 1) (faulty event makes the value of F equal to 1). To facilitate diagnosis, we define a faulty-state predicate, B( x) = B(( v, F )) := [F = 1]. Using this predicate and the extended rules-base model (which includes the new boolean variable F, extended initial condition, and extended rules), we perform the diagnosis test as follows. Algorithm 1 Consider G with state-variables v, event set Σ, and model given by: Initial condition: I( v) P( v), and Event occurrence rules: σ Σ : [C σ ( v)] [ v f σ ( v)]. The set of fault events is denoted by Σ F Σ, and the events are partially observed through a event observation mask M : Σ {ɛ} {ɛ} with M(ɛ) = ɛ, and M(σ) = ɛ for each σ Σ F. Augment the state-variables by a boolean variable, F, to identify whether or not a fault happened in past. The augmented state-variable is given by, x = ( v, F ). The augmented system model is given by, Initial condition: I( x) = I( v) [F = 0] 10

Event occurrence rules: σ Σ F : [C σ ( v)] [ x (f σ ( v), 1)] σ Σ F : [C σ ( v)] [ x (f σ ( v), F )] Denote the set of states that are visited after a fault has happened in past by the predicate, B( x) = B(( v, F )) := [F = 1]. Obtain the testing system T model by performing a masked synchronous composition of augmented G with itself (here x and y are used to denote the state-variables of the two copies of the augmented G): Initial condition: I( x) I( y) Event occurrence rule: (σ, σ ) [(Σ {ɛ}) 2 {ɛ, ɛ}] s.t. M(σ) = M(σ ): (σ, σ ) : [C σ ( x) C σ( y)] [( x, y) (f σ ( x), f σ( y))] if M(σ) = Mσ ) ɛ (σ, ɛ) : [C σ ( x)] [( x, y) (f σ ( x), y)] if M(σ) = ɛ (ɛ, σ ) : [C σ( y)] [( x, y) ( x, f σ( y))] if M(σ ) = ɛ Note that when M(σ) = M(σ ) ɛ, a synchronized transition on (σ, σ ) occurs when both σ and σ are enabled in the respective copies, i.e., C σ ( x) and C σ( y) holds, in which case x is assigned the value f σ ( x) and y is assigned the value f σ( y). When σ (resp., σ ) is unobservable, an asynchronous transition occurs only in the first (resp., second) copy, provided it is enable there. Using first-order LTL model-checking check whether there exists an ambiguous cycle in T by verifying the following formula: x 0, y 0 [EGF ( x = x 0 y = y0 B( x0 ) B( y 0 ))]. Then G is diagnosable if and only if the above formula does not hold in T. The formula to be verified checks for the existence of a state pair ( x 0, y 0 ) (X {0, 1}) 2 with the property that x 0 is a faulty state: B( x 0 ) holds, y 0 is a non-faulty state: B( y 0 ) holds, ( x 0, y 0 ) is visited infinitely often along some state trajectory starting from the initial condition I( x) I( y): EGF ( x = x 0 y = y 0 ), i.e., exists a path (E) such that globally (G) along each state of the path, in future (F ) it holds that x = x 0 and y = y 0. Whenever the above formula is satisfiable, there exist a pair of indistinguishable traces of arbitrary long length in G one containing fault and another no fault, and as a result the system is not diagnosable. (Note that once the state-variable F becomes 1, it remains at 1. So the fact that B(y 0 ) holds infinitely often along one of the traces implies that this trace does not contain a fault. It follows that when state-variables of G are bounded a model-checking tool such as NuSMV [8] can be used to verify diagnosability. 11

Example 3 In order to illustrate our result, we give a simple example which consists of a traffic monitoring problem of a mouse in a maze. The maze, shown in Figure 4, consists of four rooms connected by various one-way passages, where some of them have sensors 0 mouse cat 1 food 3 2 : observable : unobservable Figure 4: Mouse in a maze installed to detect the passing of the mouse. There is also a cat which alway stays in room 1. The mouse is initially in room 0, and it can visit other rooms by using one way passages, and it never stays at one room forever. A failure occurs when the mouse visits the room occupied by the cat. Our task is to monitor the behavior of the mouse by observing the sensor signals to detect whether or not a failure occurred. The above problem can be formulated as a failure diagnosis problem in the rules-based model setting. The system to be diagnosed, G, has a single state-variable v denoting the location of the mouse in the maze, and it can take the values in the set {0, 1, 2, 3}. The initial state is I(v) = [v = 0], the event set is Σ = {o 1, o 2, o 3, u 1, u 2, u 3 }, and the event observation mask M is given as M(u i ) = ɛ and M(o i ) = o i for 1 i 3. Since a fault occurs when room 1 is visited, Σ F = {u 1 }, where note that u 1 is an unobservable event. The rules-based model of mouse in a maze is shown in Figure 5. Initial condition: I(v) = [v = 0]. Event occurrence rules: o 1 : [v = 0] [v 3] o 2 : [v = 3] [v 0] o 3 : [v = 2] [v 3] u 1 : [v = 0] [v 1] u 2 : [v = 1] [v 2] u 3 : [v = 3] [v 2] Figure 5: Rules-based model of mouse in a maze In order to verify diagnosability, we augment the state-variable v by the binary valued 12

variable F to obtain x := (v, F ). The augmented system model is shown in Figure 6. Initial condition: I( x) = (I(v), 0) = [0, 0] Event occurrence rules: o 1 : [v = 0] [ x (3, F )] o 2 : [v = 3] [ x (0, F )] o 3 : [v = 2] [ x (3, F )] u 1 : [v = 0] [ x (1, 1)] u 2 : [v = 1] [ x (2, F )] u 3 : [v = 3] [ x (2, F )] Figure 6: Augmented rules-based model of mouse in a maze Next using the state-variable y = (u, E) for the second copy of G, we compute the masked composition of augmented G with itself to obtain T as shown in Figure 7. Note that the observable events o 1, o 2, o 3 execute synchronously, whereas the unobservable events u 1, u 2, u 3 occur asynchronously. Initial condition: I( x, y) = (I(v), 0, I(u), 0) = [0, 0, 0, 0] Event occurrence rules: (o 1, o 1 ) : [v = 0] [u = 0] [( x, y) (3, F, 3, F )] (o 2, o 2 ) : [v = 3] [u = 3] [( x, y) (0, F, 0, F )] (o 3, o 3 ) : [v = 2] [u = 2] [( x, y) (3, F, 3, F )] (u 1, ɛ) : [v = 0] [( x, y) (1, 1, y)] (ɛ, u 1 ) : [u = 0] [( x, y) ( x, 1, 1)] (u 2, ɛ) : [v = 1] [( x, y) (2, F, y)] (ɛ, u 2 ) : [u = 1] [( x, y) ( x, 2, F )] (u 3, ɛ) : [v = 3] [( x, y) (2, F, y)] (ɛ, u 3 ) : [u = 3] [( x, y) ( x, 2, F )] Figure 7: Masked synchronizaiton of two augmented mouse in a maze We used NuSMV [8] for model-checking the diagnosability condition: x 0, y 0 [EGF ( x = x 0 y = y0 B( x0 ) B( y 0 ))]. This NuSMV tool allows computation of masked synchronous composition. We verified that the mouse in a maze is diagnosable. 13

Remark 1 The number of rules in the rules-based model is given by Σ, and so the number of rules in the masked synchronization of two copies of augmented systems is O( Σ 2 ). It turns out that both the system model and testing system model remain compact in the rules-based framework. This is in contrast to the number of states in an automaton representation which is in general exponential in the number of state-variables. The complexity of a rules-based model additionally also depends on the complexity of each guard condition, measured as the number of Boolean operators present in it. Discrete-event systems are typically man-made, and by design possess sufficient structure such that the complexity of the guard conditions is likely small (polynomial, as opposed to exponential, in the number of state-variables). Remark 2 We can use the diagnosability algorithm developed above together with the optimal sensor selection algorithm given in [22] to obtain an optimal observation mask while preserving the system diagnosability. Using this approach, we determined that the event o 3 need not be observable for the system to remain diagnosable. 4 Online Diagnosis using Predicate Transformers In this section we provide a symbolic technique for online diagnosis of systems modeled in a rules-based framework. Using this technique, we guaranteed to detect every fault within a bounded delay of its occurrence provided the system is diagnosable. The online diagnosis algorithm can be run even when the system is not diagnosable, in which case we will miss reporting some of the faults. The diagnoser maintains a predicate, denoted E k ( x) P( x), which is an estimate of the possible states following the occurrence of kth observable event. Initially, when no observation has occurred, i.e., when k = 0, E 0 ( x) consists of the set of states reached by zero or more unobservable events starting from I( x). Upon the occurrence of the (k + 1)th observable event (k 0), E k ( x) is updated to obtain E k+1 ( x). Whenver false E k ( x) B( x), it implies that a fault occurred in past, which can be repoted. Algorithm 2 Initiation step: E 0 ( x) = fr M 1 (ɛ) [I( x)] Iteration step: Upon (k + 1)th observation δ M(Σ) {ɛ}: E k+1 ( x) = fr M 1 (ɛ) [fr M 1 (δ)(e k ( x))] Declare a fault if: [false E k+1 ( x) B( x)] In the initiation step, E 0 ( x) is computed by the forward reachability of I( x) on zero or more unobservable events in M 1 (ɛ) (since a no observation corresponds to some execution 14

of an unobservable events sequence). In the iteration step, E k+1 ( x) is computed using a reachability computation starting from E k ( x) on a single event in M 1 (δ) followed by zero or more unobservable events in M 1 (ɛ) (since the (k + 1)th observation of δ results from the execution of an event in M 1 (δ) followed by the execution of a sequence of unobservable events in M 1 (ɛ) Σ). We illustrate the above algorithm for online diagnosis using the example of mouse in a maze given earlier. Example 4 Refer to the example of Section 3. We can perform the on-line diagnosis as follows: k = 0: Since I( x) = [v = F = 0] and B( x) = [F = 1], we have E 0 ( x) = [v, F = 0, 0] [v, F = 1, 1] [v, F = 2, 1]. Since false E 0 ( x) B( x), diagnoser will be ambiguous about whether a fault occurred or not. k = 1: There are three observable events o 1, o 2, o 3 Σ. If the first observation is o 1, then E 1 ( x) = [v, F = 3, 0] [v, F = 2, 0]. Since false E 1 ( x) B( x), diagnoser can be certain whether a fault occurred in past. If the first observation is o 2, then E 1 ( x) = false, implying o 2 as first observation is not possible. If the first observation is o 3, then E 1 ( x) = [v, F = 3, 1] [v, F = 2, 1]. Since false E 1 ( x) B( x), diagnoser can be certain that a fault has occurred sometimes in past. k = 2: Suppose the first observation (k = 1) is o 1. If the next observation is o 1, then E 2 ( x) = false, implying that the observation sequence o 1 o 1 is not possible. If the next observation is o 2, then E 2 ( x) = E 0 ( x), and the situation is the same as when no observation had occurred. If the next observation is o 3, then E 2 ( x) = E 1 ( x), and the situation is the same as when just the observation o 1 had occurred. 15

Next suppose the first observation (k = 1) is o 2. Then since E 1 ( x) = false, it follows that E 2 ( x) = false. Finally suppose the first observation (k = 1) is o 3. If the next observation is o 1, then E 2 ( x) = false, implying that the observation sequence o 3 o 1 is not possible. If the next observation is o 2, then E 2 ( x) = [v, F = 0, 1] [v, F = 1, 1] [v, F = 2, 1]. If the next observation is o 3, then E 2 ( x) = E 1 ( x), and the situation is the same as when just the observation o 3 had occurred. In both the above cases, we know that a fault has occurred in past. k = 3: After two observations the only predicate that is not already visited is the one following the observation sequence o 3 o 2 : E 2 ( x) = [v, F = 0, 1] [v, F = 1, 1] [v, F = 2, 1]. So we examine this predicate. If the next observation is o 1, then E 3 ( x) = [v, F = 3, 1] [v, F = 2, 1], and the situation is the same as when just the observation o 3 had occurred. If the next observation is o 2, then E 3 ( x) = false, implying that the observation sequence o 3 o 2 o 2 is not possible. If the next observation is o 3, then E 3 ( x) = [v, F = 3, 1] [v, F = 2, 1], and the situation is the same as when just the observation o 3 had occurred. Thus the third iteration step does not introduce a predicate that has not been visited before, further iterations will yield a predicate that has already been computed above. For the above example, the first three steps of the online diagnosis considered above can be represented as a offline diagnoser as shown in Figure 8. In Figure 8, when there is a transition from a predicate false E( x) B( x)] to a predicate [false E( x) B( x)], a fault is detected and reported. Further by removing any diagnoser state that is either impossible (i.e., [E( x) = false]) or where the diagnoser is certain that a fault has occurred in past (i.e., false E( x) B( x)), we can obtain a reduced representation of the diagnoser as shown in Figure 9. Any feasible observation sequence that is not executable in the reduced diagnoser automaton indicates that a fault must have occurred in past. 16

[v,f=0,0] OR [v,f=1,1] OR [v,f=2,1] o 3 o 3 o 1 o 2 o 2 [v,f=3,1] OR [v,f=2,1] o 1 [v,f=3,0] OR [v,f=2,0] o 1 False o 2 o1 o 3 o 3 o 1, o2, o 3 o 2 [v,f=0,1] OR [v,f=1,1] OR [v,f=2,1] Figure 8: Diagnoser for mouse in a maze Remark 3 Instead of tracking the possible states of the total system upon each observation, another algorithm can be devised that only tracks the states of the fault-free system. In order to obtain the rules-based model of the fault-free system, rules corresponding to the faulty events can be removed, and any dependence on the values of the faulty signals can be omitted from the guard conditions of the non-faulty signals. Letting E Σ F k ( x) denote the corresponding predicate to be tracked for online diagnosis, a fault can be detected at the arrival of kth observation if E Σ F k ( x) false and E Σ F k+1 ( x) = false. Indeed an automaton tracking the different values of E Σ F k is the same as the reduced diagnoser described above. o 1 [v,f=0,0] OR [v,f=1,1] OR [v,f=2,1] [v,f=3,0] OR [v,f=2,0] o 2 o 3 Figure 9: Reduced diagnoser for mouse in a maze 17

5 Illustrative Example In order to further illustrate our technique, we revisit the transporter example introduced in Example 1. We begin with the test for diagnosability. We let v = (T f, T r, T fson, i, e) denote the state-variables of the transporter, and augment it with the boolean valued variable F to obtain the augmented state-variable x = ( v, F ). The augmented rules-based model of the transporter with T fsonf fault is given in Figure 10. Initial condition: I( x) = I(T f, T r, T fson, i, e, F ) = [off, off, off, dn, dn, 0] Event occurrence rules: iup : [(T f T fson) T r (i e)] [i, F i, F ] idn : [T r T f (i e)] [i, F i, F ] eup : [(T f T fson) T r (i e)] [e, F e, F ] edn : [T r T f (i e)] [e, F e, F ] T fon : [T f] [T f, F T f, F ] T foff : [T f T fson] [T f, F T f, F ] T ron : [T r] [T r, F T r, F ] T roff : [T r] [T r, F T r, F ] T fsonf : [T f T fson] [T fson, F T fson, 1] T fsonr : [T fson] [T fson, F T fson, 0] Figure 10: Augmented rules-based model of transporter with T f sonf fault Next using the state-variable x = ( v, F ) for the second copy of the augmented model, where v = (T f, T r, T fson, i, e ), we compute the masked composition of the two augmented models. The resulting rules-based model is shown in Figure 11. Using NuSMV for model-checking the condition for the diagnosability, we found that the transporter with T fsonf fault is not diagnosable. The above example demonstrates that the symbolic technique for diagnosis developed above allows for the diagnosability verification of practical systems. When a given system is diagnosable, we can perform an online diagnosis and we are guranteed that every fault will be detected in a bounded delay. Otherwise, the system can be made diagnosable by sensor refinement using the technique developed in [22]. For example the transporter of Figure 1 prone to the forward motor stuck on fault, T fsonf, is not diagnosable. If we install a smart sensor that can sense the motor speed, then the occurrence of the T fsonf fault can be detected when T foff holds, but the motor speed is non-zero. By tracking all possible observation sequences and computing the corresponding stateestimates as computed by a reduced diagnoser, we can obtain the entire offline diagnoser. The result of such a computation for the transporter system is shown as an automaton in 18

Initial condition: I( x, ( x ) = (I( v), 0, I( v ), 0) Event occurrence rules: (iup, iup) : C iup ( x) C iup ( x ) [i, F, i, F i, F, i, F ] (idn, idn) : C idn ( x) C idn ( x ) [i, F, i, F i, F, i, F ] (eup, eup) : C eup ( x) C eup ( x ) [e, F, e, F e, F, e, F ] (edn, edn) : C edn ( x) C edn ( x ) [e, F, e, F, e, F e, F ] (T fon, T fon) : C T fon ( x) C T fon ( x ) [T f, F, T f, F T f, F, T f, F ] (T foff, T foff) : C T foff ( x) C T foff ( x ) [T f, F, T f, F T f, F, T f, F ] (T ron, T ron) : C T ron ( x) C T ron ( x ) [T r, F, T r, F T r, F, T r, F ] (T roff, T roff) : C T roff ( x) C T roff ( x ) [T r, F, T r, F T r, F, T r, F ] (ɛ, T fsonf ) : C T fsonf ( x ) [T fson, F, T fson, F T fsonf, T fson, 1] (T fsonf, ɛ) : C T fsonf ( x) [T fson, F, T fson, F T fson, 1, T fson, F ] (T fsonr, T fsonr) : C T fsonr ( x) C T fsonr ( x ) [T fson, F, T fson, F T fson, F, T fson, F ] Figure 11: Masked synchronization of two augmented rules-based model of transporter Figure 12. Owing to the fact that the transporter system is not diagnosable, the stateestimate computed by the reduced diagnoser never becomes f alse, i.e., it is unable to detect the occurrence of the T fsonf fault, as expected. Inclusion of a motor speed sensor will essentially make T fsonf fault observable in the T f = off state, in which case the occurrence of this observable event in the T f = off state will make the state-estimate computed by the reduced diagnoser become false, thereby detecting the occrrence of fault T fsonf. 6 Conclusion The rules-based modeling formalism of [6, 7, 15, 17] has been used to model discreteevent systems prone to failures. Stuck-signal faults, and system/equipment faults can be easily modeled in the rules-based modeling formalism. A rules-based model is compact for two reasons. First, the number of rules equals the number of events, i.e., it is linear in the number of signals. Second, the rules-based models are able to expolit any structure inherent in the underlying systems, which are usually man-made and so by design not very involved. So the complexity of each guard condition in each rule, measured as the number of boolean operators present, is likely small. Symbolic methods for verifying diagnosability by reduction to first-order linear temporal logic model-checking, and also for online diagnosis using predicates and their transformers have been developed. When the state-variables are bounded, FOLTL model-checking is equivalent to PLTL model-checking and thus amenable to automatic verification using tools such as NuSMV. Owing to the compactness of rulesbased model and given the success of model-checking tools (such as NuSMV) in verifying 19

iup eup Troff Tron 2 Tfon Tfon 3 8 Tfon Tron Troff Tfon 5 edn 12 Tfon Tron Troff Tfon 9 1 Tfoff Troff Tron Tfoff Legend: iup/idn: Intermediate angle sensor High/Low eup/edn: Extended angle sensor High/Low Tfon/off: Transporter forward command on/off Tron/off: Transporter reverse command on/off 4 idn 7 Tfoff Tron Tfoff Troff edn 6 11 Tfoff Tron Tfoff Troff 10 Figure 12: Diagnoser for transporter with T f sonf fault industrial size problems, we hope that the symbolic techniques developed in this paper will allow for the diagnosis analysis of large industrial applications. References [1] A. Beneveniste, E. Fabre, S. Haar, and C. Jard. Diagnosis of asynchronous discreteevent systems: A net unfolding approach. IEEE Transactions on Automatic Control, 48(5):714 727, 2003. [2] R. K. Boel and J. H. van Schuppen. Decentralized failure diagnosis for discrete-event systems with constrained communication between diagnosers. In Proceedings of International Workshop on Discrete Event Systems, 2002. [3] A. Bouloutas, G. W. Hart, and M. Schwartz. On the design of observers for fault detection in communication networks. In A. Kershenbaum and et al., editors, Network Management and Control, pages 319 338. Plenum Press, 1990. [4] A. Bouloutas, G. W. Hart, and M. Schwartz. Simple finite-state fault detectors for communication networks. IEEE Trans. on Communications, 40(3):477 479, March 1992. [5] V. Chandra, Z. Huang, and R. Kumar. Automated control syntesis for and assembly line using discrete event system control theory. IEEE Transactions on Systems, Man, and Cybernetics: Part C, 33(2):284 289, 2003. [6] V. Chandra and R. Kumar. A event occurrence rules-based compact modeling formalism for a class of discrete event systems. In Proceedings of 2002 American Control Conference, pages 724 729, Anchorage, Alaska, 2002. 20

[7] V. Chandra and R. Kumar. A event occurrence rules-based compact modeling formalism for a class of discrete event systems. Mathematical and Computer Modeling of Dynamical Systems, 8(1):49 73, 2002. [8] A. Cimatti, E. M. Clarke, E. Giunchiglia, F. Giunchiglia, and M. Pistore. NuSMV 2: An opensourse tool for symbolic model checking. In Proccedings of International Conference on Computer-Aided Verification, Copenhagen, Denmark, July 2002. [9] O. Contant, S. Lafortune, and D. Teneketzis. Diagnosis of intermittent faults. Discrete Event Dynamical Systems: Theory and Application, 14:171 202, 2004. [10] A. Darwiche and G. Provan. Exploiting system structure in model-based diagnosis of discrete event systems. In Proceedings of the Seventh International Workshop on Principles of Diagnosis, Val Morin Canada, 1996. [11] S. R. Das and L. E. Holloway. Characterizing a confidence space for discrete event timings for fault monitoring using discrete sensing and actuation signals. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 30(1):52 66, 2000. [12] R. Debouk, S. Lafortune, and D. Teneketzis. Coordinated decentralized protocols for failure diagnosis of discrete event systems. Discrete Event Dynamical Systems: Theory and Applications, 10:33 79, 2000. [13] E. A. Emerson. Temporal and modal logic. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science. Elsevier Science Publishers, 1990. [14] D. N. Godbole, J. Lygeros, E. Singh, A. Deshpande, and A. E. Lindsey. Communication protocols for a fault-tolerant automated highway system. IEEE Transactions on Control Systems Technology, 8(5):787 800, September 2000. [15] Z. Huang, V. Chandra, S. Jiang, and R. Kumar. Modeling discrete event systems with faults using a rules based modeling formalism. In 2002 IEEE Conference on Decision and Control, pages 4012 4017, Las Vegas, Dec 2002. [16] Z. Hunag, S. Bhattacharyya, V. Chandra, S. Jiang, and R. Kumar. Diagnosis of discrete event systems in rules-based model using first order linear temporal logic. In Proceedings of 2004 American Control Conference, pages 5114 5119, Boston, MA, June 2004. [17] Z. Hunag, V. Chandra, S. Jiang, and R. Kumar. Modeling discrete event systems with faults using a rules-based modeling formalism. Mathematical and Computer Modeling of Dynamical Systems, 9(3):233 254, 2003. [18] S. Jiang, Z. Huang, V. Chandra, and R. Kumar. A polynomial time algorithm for diagnosability of discrete event systems. IEEE Transactions on Automatic Control, 46(8):1318 1321, 2001. 21

[19] S. Jiang and R. Kumar. Diagnosis of repeated failures for discrete event systems with linear-time temporal logic specifications. In Proceedings of IEEE Conference on Decision and Control, pages 3221 3226, Maui, Hawaii, 2003. [20] S. Jiang and R. Kumar. Failure diagnosis of discrete event systems with linear-time temporal logic fault specifications. IEEE Transactions on Automatic Control, 49(6):934 945, 2004. [21] S. Jiang, R. Kumar, and H. E. Garcia. Diagnosis of repeated/intermittent failures in discrete event systems. IEEE Transactions on Robotics and Automation, 19(2):310 323, 2003. [22] S. Jiang, R. Kumar, and H. E. Garcia. Optimal sensor selection for discrete event systems under partial observation. IEEE Transactions on Automatic Control, 48(3):369 381, 2003. [23] R. Kumar, V. K. Garg, and S. I. Marcus. Predicates and predicate transformers for supervisory control of discrete event systems. IEEE Transactions on Automatic Control, 38(2):232 247, February 1993. [24] M. Larsson. Behavioral and structural model based approaches to discrete diagnosis. PhD thesis, Linkoping University, Linkoping, Sweden, 1999. [25] F. Lin. Diagnosability of discrete event systems and its applications. Discrete Event Dynamic Systems: Theory and Applications, 4(1):197 212, 1994. [26] F. Lin, J. Markee, and B. Rado. Design and test of mixed signal circuits: a discrete event approach. In Proceedings of the 32nd IEEE Conference on Decision and Control, pages 246 251, 1993. [27] J. Lygeros, D. N. Godbole, and M. Broucke. A fault tolerant control architecture for automated highway system. IEEE Transactions on Control Systems Technology, 8(2):205 219, March 2000. [28] D. Pandalai and L. Holloway. Template languages for fault monitoring of timed discrete event processes. IEEE Transactions on Automatic Control, 45(5):868 882, May 2000. [29] C. Pecheur and A. Cimatti. Formal verification of diagnosability via symbolic model checking. In Workshop on Model Checking and Artificial Intelligence, Lyon, France, 2002. [30] W. Qiu and R. Kumar. Decentralized failure diagnosis of discrete event systems. In Proceedings of 2004 International Workshop on Discrete Event Systems, Reim, France, September 2004. 22

[31] S. L. Ricker and J. H. van Schuppen. Decentralized failure diagnosis with asynchronous communication between supervisors. In Proceedings of the European Control Conference, pages 1002 1006, 2001. [32] M. Sampath and S. Lafortune. Active diagnosis of discrete event systems. IEEE Transactions on Automatic Control, 43(7):908 929, 1998. [33] M. Sampath, R. Sengupta, S. Lafortune, K. Sinaamohideen, and D. Teneketzis. Diagnosability of discrete event systems. IEEE Transactions on Automatic Control, 40(9):1555 1575, September 1995. [34] M. Sampath, R. Sengupta, S. Lafortune, K. Sinaamohideen, and D. Teneketzis. Failure diagnosis using discrete event models. IEEE Transactions on Control Systems Technology, 4(2):105 124, March 1996. [35] R. Sengupta and S. Tripakis. Decentralized diagnosis of regular language is undecidable. In Proceedings of IEEE Conference on Decision and Control, pages 423 428, Las Vegas, NV, December 2002. [36] R. Su, W. M. Wonham, J. Kurien, and X. Koutsoukos. Distributed diagnosis for qualitative systems. In Proceedings of International Workshop on Discrete Event Systems, 2002. [37] G. Westerman, R. Kumar, C. Stroud, and J. R. Heath. Discrete event systems approach for delay fault analysis in digital circuits. In Proceedings of 1998 American Control Conference, Philadelphia, PA, 1998. [38] T. Yoo and H. E. Garcia. Event diagnosis of discrete-event systems with uniformly and nonuniformly bounded diagnosis delays. In Proceedings of 2004 American Control Conference, pages 5102 5107, Boston, MA, June 2004. [39] T. S. Yoo and S. Lafortune. Polynomial-time verification of diagnosability of partially observed discrete-event systems. IEEE Transactions on Automatic Control, 47(9):1491 1495, 2002. [40] S. Young and V. K. Garg. Model uncertainty in discrete event systems. SIAM Journal of Control and Optimization, 33(1):208 226, 1995. [41] S. H. Zad, R. H. Kwong, and W. M. Wonham. Fault diagnosis in discrete-event systems: Framework and model reduction. IEEE Transactions on Automatic Control, 48(7):1199 1212, 2003. 23