Model-Based Systems. Bernhard Peischl, Neal Snooke, Gerald Steinbauer and Cees Witteveen


Model-Based Systems

The Model-Based Systems (MBS) paradigm refers to a methodology that allows various kinds of systems to be described for various tasks in a uniform way. For example, MBS has been used to specify monitoring tasks in medical systems, for planning in cognitive systems, and for control and diagnosis in hardware and software systems. Consequently, research in MBS is spread across various application domains and tasks. As many scientific workshops are application-specific or system- and task-oriented, it is difficult to exchange experiences and novel concepts across the various application domains and tasks. In recent years MBS technology has increasingly contributed to mastering the inherent and ever-increasing complexity of software and software-enabled systems. It is therefore the aim of this workshop to cross-fertilize the established concepts in model-based software engineering and MBS technology, to further leverage model-oriented techniques in the software engineering domain. The MBS 2008 workshop attracted researchers and practitioners dealing with modeling for specific reasoning tasks, knowledge representation, qualitative reasoning, and related areas such as model-based testing and fault detection and localization. MBS 2008 is the fourth workshop in a series of workshops on this topic. Previous workshops were co-located with ECAI 2004 in Valencia, Spain, IJCAI 2005 in Edinburgh, United Kingdom, and ECAI 2006 in Riva del Garda, Italy. The submissions to MBS 2008 cover a wide range of topics within the area of model-based systems. They range from application-oriented solutions to modeling problems, including automated generation and debugging of models, to more theoretical contributions in the areas of diagnosis, qualitative reasoning and testing. The good mixture of theoretical and application-oriented articles from various domains promises a very interesting and fruitful workshop.
Finally, we would like to thank all the authors who submitted to this workshop. Moreover, we thank all members of the program committee for their careful reviews.

Bernhard Peischl, Neal Snooke, Gerald Steinbauer and Cees Witteveen
July 2008

Organizing Committee

Bernhard Peischl, Technische Universität Graz, Austria
Neal Snooke, University of Wales, Aberystwyth, UK
Gerald Steinbauer, Technische Universität Graz, Austria
Cees Witteveen, Delft University of Technology, The Netherlands

Program Committee

Gautam Biswas, Vanderbilt University
Bert Bredeweg, Universiteit van Amsterdam, The Netherlands
Marie-Odile Cordier, IRISA, Campus de Beaulieu, France
Carlos J. Alonso González, Universidad de Valladolid, Spain
Bernhard Peischl, Technische Universität Graz, Austria
Claudia Picardi, Università di Torino, Italy
Belarmino Pulido Junquera, Universidad de Valladolid, Spain
Martin Sachenbacher, Technische Universität München, Germany
Paulo Salles, Universidade de Brasília, Brazil
Neal Snooke, University of Wales, Aberystwyth, UK
Gerald Steinbauer, Technische Universität Graz, Austria
Cees Witteveen, Delft University of Technology, The Netherlands

Table of Contents

Comparing GDE and Conflict-based Diagnosis
  Ildikó Flesch, Peter J.F. Lucas

On computing minimal conflicts for ontology debugging
  Kostyantyn Shchekotykhin, Gerhard Friedrich, Dietmar Jannach

Supporting Conceptual Knowledge Capture Through Automatic Modelling
  Jochem Liem, Hylke Buisman, Bert Bredeweg

Automated Learning of Communication Models for Robot Control Software
  Alexander Kleiner, Gerald Steinbauer, Franz Wotawa

Relaxation of Temporal Observations in Model-Based Diagnosis of Discrete-Event Systems
  Gianfranco Lamperti, Federica Vivenzi, Marina Zanella

The Concept of Entropy by means of Generalized Orders of Magnitude Qualitative Spaces
  Llorenç Roselló, Francesc Prats, Mónica Sánchez, Núria Agell

Model-based Testing using Quantified CSPs: A Map
  Martin Sachenbacher, Stefan Schwoon


Comparing GDE and Conflict-based Diagnosis

Ildikó Flesch 1 and Peter J.F. Lucas 2

Abstract. Conflict-based diagnosis is a recently proposed method for model-based diagnosis, inspired by consistency-based diagnosis, that incorporates a measure of data conflict, called the diagnostic conflict measure, to rank diagnoses. The probabilistic information that is required to compute the diagnostic conflict measure is represented by means of a Bayesian network. The general diagnostic engine is a classical implementation of consistency-based diagnosis and incorporates a way to rank diagnoses using probabilistic information. Although conflict-based and consistency-based diagnosis are related, the way the general diagnostic engine handles probabilistic information to rank diagnoses is different from the method used in conflict-based diagnosis. In this paper, both methods are compared to each other.

1 INTRODUCTION

In the last two decades, research into model-based diagnostic software has become increasingly important, mainly because the complexity of the devices for which such software can be used has risen considerably, and troubleshooting faults in such devices has therefore become increasingly difficult. Basically, two types of model-based diagnosis are distinguished in the literature: (i) consistency-based diagnosis [2, 8], and (ii) abductive diagnosis [7]. In consistency-based diagnosis a diagnosis has to be consistent with the modelled system behaviour and observations made on the actual system, whereas in abductive diagnosis the observations have to be implied by the modelled system given the diagnosis [1]. In this paper, we focus on consistency-based diagnosis as implemented in the general diagnostic engine, GDE for short [2]. In addition, particular probabilistic extensions to consistency-based diagnosis as implemented in GDE are considered [2].
There is also a third kind of model-based diagnosis that can best be seen as a translation of consistency-based diagnosis from a mixed logical-probabilistic setting to a purely probabilistic setting, using a statistical measure of information conflict. The method has been called conflict-based diagnosis; it exploits Bayesian-network representations for the purpose of model-based diagnosis [4]. Although both GDE and conflict-based diagnosis take consistency-based diagnosis as a foundation, the way uncertainty is handled, as well as the way in which diagnoses are ranked, are different. The aim of this paper is to shed light on the differences and similarities between these two approaches to model-based diagnosis. It is shown that conflict-based diagnosis yields a ranking that, under particular circumstances, is more informative than that obtained by GDE.

1 Department of Computer Science, Maastricht University, ildiko@micc.unimaas.nl
2 Institute for Computing and Information Sciences, Radboud University Nijmegen, peterl@cs.ru.nl

The paper is organised as follows. In Section 2, the necessary basic concepts from model-based diagnosis, including GDE, and the use of Bayesian networks for model-based diagnosis are reviewed. Next, in Section 3, the basic concepts from conflict-based diagnosis are explained. What can be achieved by the method of probabilistic reasoning in GDE is subsequently compared to the method of conflict-based diagnosis in Section 4. Finally, in Section 5, the paper is rounded off with some conclusions.

2 PRELIMINARIES

2.1 Model-based Diagnosis

In the theory of consistency-based diagnosis [8, 2, 3], the structure and behaviour of a system is represented by a logical diagnostic system S_L = (SD, COMPS), where SD denotes the system description, which is a finite set of logical formulae specifying structure and behaviour, and COMPS is a finite set of constants corresponding to the components of the system that can be faulty.
The system description consists of behaviour descriptions and connections. A behavioural description is a formula specifying normal and abnormal (faulty) functionality of the components. An abnormality literal of the form A_c is used to indicate that component c is behaving abnormally, whereas a literal of the form ¬A_c is used to indicate that component c is behaving normally. A connection is a formula of the form i_c' = o_c, where i_c' and o_c denote an input of component c' and an output of component c, respectively. A logical diagnostic problem is defined as a pair P_L = (S_L, OBS), where S_L is a logical diagnostic system and OBS is a finite set of logical formulae representing observations. Adopting the definition from [3], a diagnosis in the theory of consistency-based diagnosis is defined as follows. Let Δ_C consist of the assignment of abnormal behaviour, i.e. A_c, to the components in C ⊆ COMPS and of normal behaviour, i.e. ¬A_c, to the remaining components COMPS − C; then Δ_C is a consistency-based diagnosis of the logical diagnostic problem P_L iff the observations are consistent with both the system description and the diagnosis; formally: SD ∪ Δ_C ∪ OBS ⊭ ⊥. Here, ⊭ stands for the negation of the logical entailment relation, and ⊥ represents a contradiction. Usually, one is in particular interested in subset-minimal diagnoses, i.e. diagnoses Δ_C where the set C is subset-minimal. Thus, a subset-minimal diagnosis assumes that a subset-minimal set of components is faulty; this often corresponds to the most likely diagnosis.
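To make the definition concrete, the following sketch enumerates the consistency-based diagnoses of a toy circuit of two inverters in series; the circuit, the component names and the brute-force search over behaviour modes are illustrative choices, not part of the theory above:

```python
from itertools import combinations

# Toy system: two inverters n1, n2 in series (in -> n1 -> wire -> n2 -> out).
COMPS = ["n1", "n2"]

def behaviour_ok(abnormal, inp, wire, out):
    """SD: a normally behaving inverter must invert its input; an
    abnormal one is unconstrained (the weak fault model used above)."""
    if "n1" not in abnormal and wire != (not inp):
        return False
    if "n2" not in abnormal and out != (not wire):
        return False
    return True

def diagnoses(obs_in, obs_out):
    """All consistency-based diagnoses: abnormality assignments C such
    that SD together with Delta_C and OBS is satisfiable, i.e. some
    value of the hidden internal wire makes everything consistent."""
    found = []
    for k in range(len(COMPS) + 1):
        for c in combinations(COMPS, k):
            if any(behaviour_ok(set(c), obs_in, w, obs_out)
                   for w in (False, True)):
                found.append(frozenset(c))
    return found

# Observing in=1, out=0 contradicts normal behaviour (two inverters
# should give out = in), so the empty assignment is not a diagnosis;
# {n1} and {n2} are the subset-minimal diagnoses.
print(diagnoses(True, False))
```

Here the subset-minimal diagnoses coincide with the single-fault candidates, mirroring the remark that subset-minimality often corresponds to the most likely diagnosis.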

Figure 1. Full adder with all outputs computed under the assumption of normality, together with the observed and predicted outputs; i_1 (1), ī_2 (0) and i_3 (1) indicate the inputs of the circuit and o_1 (1) and ō_2 (0) its observed outputs. [For the gates X2 and R1 the predicted output values (0 and 1, respectively) differ from the observed ones.]

EXAMPLE 1. Figure 1 presents the full-adder example, which consists of two AND gates (A1 and A2), one OR gate (R1) and two exclusive-or (XOR) gates (X1 and X2). Note that the predicted output ō_1 of gate X2 contradicts the observation o_1, and similarly for the output of gate R1. As a consequence, the assumption that all components are behaving normally is invalid; thus, this is not a consistency-based diagnosis. However, a consistency-based diagnosis would be to assume the malfunctioning of component X1, as this would restore consistency.

2.2 GDE

Next, GDE is briefly described, where [2] is used as a point of reference; however, the terminology defined above in this paper is adopted throughout this section. For example, where [2] speaks of a candidate, in this paper the term diagnosis is used. The logical reasoning implemented by GDE can best be seen as an efficient implementation of consistency-based diagnosis. GDE can also deal with uncertainty by attaching a prior probability of malfunctioning to components. After an observation is made, the prior probability becomes a posterior probability, conditioned on this observation. Based on new observations, previous diagnoses may become inconsistent with the observations and the system description. The set of diagnoses that are still possible is denoted by R and called the set of remaining diagnoses; it can be partitioned into two disjoint subsets: (i) the set of diagnoses that imply the observations, called the set of selected diagnoses and denoted by S, and (ii) the set of diagnoses that neither predict nor contradict the observations, called the set of uncommitted diagnoses, denoted by U. By definition, R = S ∪ U and S ∩ U = ∅.
The posterior probability of a set of behaviour assumptions Δ_C that is either inconsistent (not in R), a selected diagnosis (in S), or an uncommitted diagnosis (in U) is computed as follows:

  P(Δ_C | OBS) = 0                          if Δ_C ∉ R,
  P(Δ_C | OBS) = P(Δ_C) / P(OBS)            if Δ_C ∈ S,      (1)
  P(Δ_C | OBS) = (P(Δ_C)/m) / P(OBS)        if Δ_C ∈ U,

where m = 1/P(OBS | Δ_C). Finally, the probability P(OBS) is computed as follows:

  P(OBS) = Σ_{Δ_C ∈ R} P(OBS, Δ_C)
         = Σ_{Δ_C ∈ S} P(OBS, Δ_C) + Σ_{Δ_C ∈ U} P(OBS, Δ_C)
         = Σ_{Δ_C ∈ S} P(Δ_C) + Σ_{Δ_C ∈ U} P(Δ_C)/m.        (2)

Computation of P(Δ_C) is made easy in GDE by assuming independence between components behaving normally or abnormally. One of the consequences of this assumption is the following proposition.

Proposition 1. Let P_L = (S_L, OBS) be a logical diagnostic problem with associated joint probability distribution P as defined above for GDE, such that P(A_c) ≤ P(¬A_c) for each c ∈ COMPS, and let Δ_C and Δ_C' be two consistency-based diagnoses that are both in either S or U; then it holds that: P(Δ_C | OBS) ≥ P(Δ_C' | OBS) if C ⊆ C'.

Proof. The result follows from the assumption of independence together with P(A_c) ≤ P(¬A_c):

  P(Δ_C) = ∏_{c ∈ C} P(A_c) ∏_{c ∈ COMPS−C} P(¬A_c)
         ≥ ∏_{c ∈ C'} P(A_c) ∏_{c ∈ COMPS−C'} P(¬A_c) = P(Δ_C').

Filling this result into Equation (1) gives the requested outcome.

For further details of GDE the reader is referred to the paper by De Kleer and Williams [2]. The following example illustrates how GDE works.

Table 1. Comparison of the values of the diagnostic conflict measure and GDE for the full-adder circuit with observations OBS = ω = {i_1, ī_2, i_3, o_1, ō_2} and the probability distribution P, assuming fixed values of P(a_c) and p_c = P(o_c | a_c). [The table lists, for each diagnosis over the components X2, R1, X1, A1, A2, the values conf[P_{δ_k}](ω) and GDE's P(Δ_k | OBS); the numerical entries are not reproduced here.]

EXAMPLE 2. Reconsider the full-adder shown in Figure 1, where

each component can only be normal or abnormal. Assume that the probability of faulty behaviour of a component is equal to P(A_c) = 0.001. Without any observations, the diagnosis space consists of 2^5 = 32 members, where the diagnosis Δ_∅ = {¬A_c | c ∈ COMPS} is the most probable diagnosis with probability P(Δ_∅) = (1 − P(A_c))^5 = (0.999)^5 ≈ 0.995. When more components are assumed to be faulty, the probabilities decrease quickly to very small values. Now, suppose that OBS = {i_1, ī_2, i_3, o_1, ō_2}. The new probabilities obtained from GDE are shown in the right-most column of Table 1, where 1 for a component means normal behaviour and 0 means abnormal behaviour. The diagnoses Δ_k, for k = 1, 3, 4, 9, ..., 12, 17, 19, are eliminated by these observations. Furthermore, since there are no diagnoses in the set R that imply the two output observations, the set S is empty and, thus, the set of uncommitted diagnoses U is equal to R. The posterior probability of a diagnosis Δ_k can then be computed as follows:

  P(Δ_k | OBS) = (P(Δ_k)/m) / ((Σ_{Δ_C ∈ U} P(Δ_C))/m) = P(Δ_k) / Σ_{Δ_C ∈ U} P(Δ_C).

In the example, the probabilities of the Δ_k's that can still be diagnoses become about 1000 times larger when conditioning on the observations than without observations. However, either with or without observations, the diagnosis with the fewest abnormality assumptions is the most likely one. Thus the resulting diagnostic reasoning behaviour is very similar to that obtained by exploiting the concept of subset-minimal diagnosis.

2.3 Bayesian Networks and the Conflict Measure

Let P(X) be a joint probability distribution of the set of discrete binary random variables X. A single random variable taking the value true or false is written as (upright) y and ȳ, respectively. Arbitrary values of a set of variables X, sometimes a single variable, are denoted by (italic) x.
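As an aside, the GDE-style ranking of Example 2 can be reproduced by brute-force enumeration. The sketch below is an illustrative reconstruction, not GDE's actual candidate-generation machinery: it assumes an abnormal gate emits 0 or 1 with probability 1/2 each, which plays the role of the factor 1/m above:

```python
from itertools import combinations, product

COMPS = ["X1", "X2", "A1", "A2", "R1"]
P_AB = 0.001  # prior probability of abnormal behaviour, as in Example 2

def outputs(i1, i2, i3, abnormal, faults):
    """Evaluate the full adder of Figure 1; an abnormal gate emits the
    value chosen for it in `faults` instead of its defined function."""
    def gate(name, value):
        return faults[name] if name in abnormal else value
    x = gate("X1", i1 != i2)    # XOR
    a = gate("A1", i1 and i2)   # AND
    o1 = gate("X2", x != i3)    # XOR
    b = gate("A2", x and i3)    # AND
    o2 = gate("R1", a or b)     # OR
    return o1, o2

def posterior(obs_in, obs_out):
    """P(Delta | OBS) by enumerating all 2^5 abnormality assignments;
    the likelihood of OBS given a diagnosis is the (uniform) fraction
    of fault-output assignments that match the observed outputs."""
    scores = {}
    for k in range(len(COMPS) + 1):
        for c in combinations(COMPS, k):
            prior = P_AB ** k * (1 - P_AB) ** (len(COMPS) - k)
            hits = sum(
                outputs(*obs_in, set(c), dict(zip(c, vals))) == obs_out
                for vals in product((False, True), repeat=k))
            scores[frozenset(c)] = prior * hits / 2 ** k
    z = sum(scores.values())
    return {d: s / z for d, s in scores.items()}

post = posterior((True, False, True), (True, False))
# The all-normal assignment gets posterior 0; {X1} is the top diagnosis.
print(max(post, key=post.get))
```

The resulting ranking favours diagnoses with fewer abnormality assumptions, in line with Proposition 1.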
Let U, W, Z ⊆ X be disjoint sets of random variables; then U is said to be conditionally independent of W given Z if, for each value u, w and z:

  P(u | w, z) = P(u | z), with P(w, z) > 0.    (3)

A Bayesian network B is defined as a pair B = (G, P), where G = (V, E) is an acyclic directed graph, with set of vertices V and set of arcs E, and P is the associated joint probability distribution of the set of random variables X, which is associated one-to-one with V. We will normally use the same names for variables and their associated vertices. The factorisation of P respects the independence structure of G as follows: P(x) = ∏_{y ∈ x} P(y | π(y)), where π(y) denotes the values of the parent set of vertex Y. Finally, we will frequently marginalise out particular variables W, written as P(u) = Σ_w P(u, w). Bayesian networks specify probabilistic patterns that must be fulfilled by observations. Observations are random variables that obtain a value through an intervention, such as a diagnostic test. The set of observations is denoted by ω. The conflict measure has been proposed as a tool for the detection of potential conflicts between observations and a given Bayesian network and is defined as [5]:

  conf(ω) = log [P(ω_1) P(ω_2) ··· P(ω_m) / P(ω)],    (4)

with ω = ω_1 ∪ ω_2 ∪ ··· ∪ ω_m.

Figure 2. Example of a Bayesian network, with arcs u → v and u → n and probabilities P(u) = 0.2, P(v | u) = 0.8, P(v | ū) = 0.01, P(n | u) = 0.9 and P(n | ū) = 0.1.

The interpretation of the conflict measure is as follows. A zero or negative conflict measure means that the denominator is equally likely or more likely than the numerator; this is interpreted as the joint occurrence of the observations being in accordance with the probabilistic patterns in P. A positive conflict measure, however, implies negative correlation between the observations and P, indicating that the observations do not match P very well.

EXAMPLE 3. Consider the Bayesian network shown in Figure 2, which describes that stomach ulcer (u) may give rise to both vomiting (v) and nausea (n).
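For this two-symptom network the conflict measure can be computed directly by marginalisation; the following sketch (variable and function names are illustrative) does exactly that:

```python
import math

# CPTs of the network in Figure 2 (arcs u -> v and u -> n).
P_U = 0.2
P_V = {True: 0.8, False: 0.01}  # P(v | u) and P(v | not-u)
P_N = {True: 0.9, False: 0.1}   # P(n | u) and P(n | not-u)

def joint(v, n):
    """P(V=v, N=n), marginalising out the hidden cause u."""
    total = 0.0
    for u, pu in ((True, P_U), (False, 1.0 - P_U)):
        pv = P_V[u] if v else 1.0 - P_V[u]
        pn = P_N[u] if n else 1.0 - P_N[u]
        total += pu * pv * pn
    return total

def conf(v, n):
    """Jensen's conflict measure for the observation set {V=v, N=n}."""
    p_v = joint(v, True) + joint(v, False)   # marginal P(V=v)
    p_n = joint(True, n) + joint(False, n)   # marginal P(N=n)
    return math.log(p_v * p_n / joint(v, n))

print(conf(True, True))   # negative: vomiting and nausea support each other
print(conf(True, False))  # positive: vomiting without nausea is conflicting
```

The sign of the result, rather than its absolute magnitude, carries the diagnostic interpretation described above.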
Now, suppose that a patient comes in with the symptoms of vomiting and nausea. The conflict measure then has the following value:

  conf({v, n}) = log [P(v)P(n) / P(v, n)] = log(0.168 · 0.26 / 0.1448) ≈ log 0.30 < 0.

As the conflict measure assumes a negative value, there is no conflict between the two observations. This is consistent with medical knowledge, as we do expect that a patient with a stomach ulcer displays symptoms of both vomiting and nausea. As a second example, suppose that a patient has only the symptom of vomiting. The conflict measure now obtains the following value:

  conf({v, n̄}) = log [P(v)P(n̄) / P(v, n̄)] ≈ log 5.4 > 0.

As the conflict measure is positive, there is a conflict between the two observations, which is in accordance with medical expectations.

2.4 Bayesian Diagnostic Problems

A Bayesian diagnostic system is denoted as a pair S_B = (G, P), where P is a joint probability distribution of the vertices of G, interpreted as random variables, and G is obtained by mapping a logical diagnostic system S_L = (SD, COMPS) to a Bayesian diagnostic system S_B as follows [6]: 1. component c is represented by its input vertices I_c and output vertex O_c, where the inputs are connected by an arc to the output; 2. to each component c there belongs an abnormality vertex A_c, which has an arc pointing to the output O_c. Figure 3 shows the Bayesian diagnostic system corresponding to the logical diagnostic system shown in Figure 1. Let O denote the set of all output variables and I the set of all input variables, let o and i denote (arbitrary) values of the sets of output and input variables, respectively, and let δ_C = {a_c | c ∈ C} ∪ {ā_c | c ∈ COMPS − C} be the set of values of the abnormality variables A_c, with c ∈ COMPS. The latter definition establishes a link between Δ_C in logical diagnostic systems and the abnormality variables in Bayesian diagnostic systems.

Figure 3. The graphical representation of a Bayesian diagnostic system corresponding to the full-adder in Figure 1; it contains the input vertices I_1, I_2 and I_3 and, for each of the components X1, X2, A1, A2 and R1, an abnormality vertex A_c and an output vertex O_c.

Due to the independences that hold for a Bayesian diagnostic system, it is possible to simplify the computation of the joint probability distribution P by exploiting the following properties:

Property 1: the joint probability distribution of a set of output variables O can be factorised as follows:

  P(o) = Σ_{i, δ_C} P(i, δ_C) ∏_{c ∈ COMPS} P(o_c | π(o_c));    (5)

Property 2: the input variables and the abnormality variables are mutually independent of each other; formally: P(i, δ_C) = P(i) P(δ_C).

Recall that logical diagnostic problems are logical diagnostic systems augmented with observations; Bayesian diagnostic problems are defined similarly. The input and output variables that have been observed are referred to as I_ω and O_ω, respectively; the unobserved input and output variables are referred to as I_u and O_u. The set of actual observations is then denoted by ω = i_ω ∪ o_ω. Thus, a Bayesian diagnostic problem P_B = (S_B, ω) consists of (i) a Bayesian diagnostic system representing the components, their behaviour and interaction, and (ii) a set of observations ω [4]. In Bayesian diagnostic problems, the normal behaviour of component c is expressed in a probabilistic setting by the assumption that a normally functioning component yields an output value with probability of either 0 or 1. Thus, P(o_c | π(o_c)) ∈ {0, 1} when the abnormality variable A_c ∈ π(O_c) takes the value false, i.e. is ā_c. For the abnormal behaviour of a component c it is assumed that the random variable O_c is conditionally independent of its parent set π(O_c) if component c is assumed to function abnormally, i.e. A_c takes the value true, written as: P(o_c | π(o_c)) = P(o_c | a_c). Thus, the fault behaviour of an abnormal component cannot be influenced by its environment. We use the abbreviation P(o_c | a_c) = p_c. Note that this assumption is not made when a component is behaving normally, i.e. when ā_c holds.

3 CONFLICT-BASED DIAGNOSIS

There exists a one-to-one correspondence between a consistency-based diagnosis Δ_C of a logical diagnostic problem P_L and a δ_C for which it holds that P(ω | δ_C) ≠ 0, if P_B is the result of the mapping described above applied to P_L. The basic idea behind conflict-based diagnosis is that the conflict measure can be used to rank these consistency-based diagnoses (cf. [4]). We start with the definition of the diagnostic conflict measure.

Definition 1 (diagnostic conflict measure). Let P_B = (S_B, ω) be a Bayesian diagnostic problem. The diagnostic conflict measure, denoted by conf[P_{δ_C}](·, ·), is defined for P(ω | δ_C) ≠ 0 as:

  conf[P_{δ_C}](i_ω, o_ω) = log [P(i_ω | δ_C) P(o_ω | δ_C) / P(i_ω, o_ω | δ_C)],    (6)

with observations ω = i_ω ∪ o_ω.

Using the independence properties of Bayesian diagnostic problems we obtain [4]:

  conf[P_{δ_C}](i_ω, o_ω) = log [Σ_i P(i) Σ_{o_u} ∏_c P(o_c | π(o_c))] / [Σ_{i_u} P(i_u) Σ_{o_u} ∏_c P(o_c | π(o_c))],

where π(o_c) may include input variables from I. The diagnostic conflict measure can take positive, zero and negative values, each having a different diagnostic meaning. Note that the numerator of the diagnostic conflict measure is defined as the probability of the individual occurrence of the inputs and outputs, whereas the denominator is defined as the probability of the joint occurrence of the observations. Intuitively, if the probability of the individual occurrence of the observations is higher than that of the joint occurrence, then the observations do not support each other. Thus, more conflict between diagnosis and observations yields higher (more positive) values of the diagnostic conflict measure. This means that the sign of the diagnostic conflict measure, negative, zero or positive, can already be used to rank diagnoses in a qualitative fashion. This interpretation gives rise to the following definition.
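The diagnostic conflict measure for the full adder can be computed by summing over the hidden internal wires. The sketch below assumes uniform priors over the input vectors and p_c = 0.5 for every component; both are assumed values chosen for illustration, not the settings behind the paper's Table 1:

```python
import math
from itertools import product

P_FAULT_OUT = 0.5  # p_c = P(o_c = 1 | a_c); an assumed value

def behaviour_prob(name, out, normal_value, abnormal):
    """P(O_c = out | parents) under the normal/abnormal model above."""
    if name in abnormal:
        return P_FAULT_OUT if out else 1.0 - P_FAULT_OUT
    return 1.0 if out == normal_value else 0.0

def p_outputs(i1, i2, i3, o1, o2, abnormal):
    """P(o1, o2 | i1, i2, i3, delta): sum over the hidden wires x, a, b."""
    total = 0.0
    for x, a, b in product((False, True), repeat=3):
        total += (behaviour_prob("X1", x, i1 != i2, abnormal)
                  * behaviour_prob("A1", a, i1 and i2, abnormal)
                  * behaviour_prob("X2", o1, x != i3, abnormal)
                  * behaviour_prob("A2", b, x and i3, abnormal)
                  * behaviour_prob("R1", o2, a or b, abnormal))
    return total

def dcm(inputs, outs, abnormal):
    """Diagnostic conflict measure conf[P_delta](i_w, o_w), assuming
    uniform priors over the 2^3 input vectors."""
    num = sum(p_outputs(*i, *outs, abnormal)        # P(o_w | delta)
              for i in product((False, True), repeat=3)) / 8
    den = p_outputs(*inputs, *outs, abnormal)       # P(o_w | i_w, delta)
    return math.log(num / den)

# For the observations of Figure 1, delta_{X1} is a conflict-based
# diagnosis: its measure is negative.
print(dcm((True, False, True), (True, False), {"X1"}))
```

The all-normal assignment yields P(ω | δ) = 0, so its measure is undefined, exactly the condition P(ω | δ_C) ≠ 0 in Definition 1.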
Definition 2 ((minimal) conflict-based diagnosis). Let P_B = (S_B, ω) be a Bayesian diagnostic problem and let δ_C be a consistency-based diagnosis of P_B (i.e. P(ω | δ_C) ≠ 0). Then, δ_C is called a conflict-based diagnosis if conf[P_{δ_C}](ω) ≤ 0. A conflict-based diagnosis δ_C is called minimal if for each conflict-based diagnosis δ_{C'} it holds that conf[P_{δ_C}](ω) ≤ conf[P_{δ_{C'}}](ω).

In general, the diagnostic conflict measure has the important property that its value can be seen as the overall result of a local analysis of component behaviours under particular logical and probabilistic normality and abnormality assumptions. A smaller value of the diagnostic conflict measure is due to a higher likelihood of dependence between observations, and this indicates a better fit between observations and component behaviours. Consider the following example.

EXAMPLE 4. Reconsider the full-adder circuit example from Figure 1. Let, as before, ω = {i_1, ī_2, i_3, o_1, ō_2}. The diagnostic conflict measures for all possible diagnoses are listed in Table 1. As an example, the diagnostic conflict measures for the diagnoses δ_5, δ_6, δ_7 and δ_8 are compared to one another for a given value of the probability p_X1 = P(o_X1 | a_X1), and it is explained what it means that, according to Table 1, conf[P_{δ_5}](ω) = conf[P_{δ_6}](ω) < conf[P_{δ_7}](ω) = conf[P_{δ_8}](ω). First, the diagnoses δ_k, for k = 6 and k = 7, are considered in more detail in order to explain the meaning of the diagnostic conflict measure. The difference in the value of the diagnostic conflict measure for these two diagnoses can be explained by noting that for δ_6 it is assumed that the adder A1 functions normally and A2 abnormally, whereas for δ_7 it is the other way around. The diagnostic conflict measure of the diagnosis δ_6 is lower than that for δ_7, because if A1

functions normally, then its output has to be equal to 0, whereas if A2 functions normally, then its output has to be equal to 1. Note that it has been observed for R1 that the output is equal to 0. Because 0 is the output of the OR gate R1, its inputs must be 0; therefore, the assumption that A1 functions normally with output 0 offers a better explanation for the output 0 of the R1 gate than the assumption in δ_7 that A2 functions normally (which yields output value 1). Furthermore, since in both diagnoses δ_6 and δ_7 component X1 is assumed to be faulty, and the output of X1 acts as an input of A2, the assumption about the output of A2 is already relaxed. This also explains the preference for diagnosis δ_6 over δ_7 and why δ_6 is ranked higher than δ_7. Next, the diagnoses δ_7, δ_8, δ_13, δ_14, δ_15 and δ_16 are compared to one another, and we explain why it is reasonable that these diagnoses have the same value of the diagnostic conflict measure. Note that the diagnoses δ_7 and δ_8 include the faulty assumptions a_X1 and a_A1, whereas δ_13, δ_14, δ_15 and δ_16 include the faulty behaviours a_R1 and a_X1. Note that for both {a_X1, a_A1} and {a_R1, a_X1}, one input of X2 and the two inputs of R1 are relaxed. Therefore, they yield the same qualitative information about the fault behaviour of the system. Below, these results are compared with those obtained by GDE. The example above illustrates that comparing the value of the diagnostic conflict measure for different diagnoses gives considerable insight into the behavioural abnormality of a system.

4 COMPARISON

In this section, the diagnostic conflict measure and GDE's probabilistic method are compared to each other in terms of the difference in ranking they give rise to. To start, the main differences between the diagnostic conflict measure and GDE are summarised, which is followed by an example.
The example is used to illustrate that the diagnostic conflict measure yields a ranking that, for the probability distribution defined earlier, conveys more useful diagnostic information than the ranking by GDE. The following facts summarise the differences and similarities between the diagnostic conflict measure and GDE:

1. an abnormality assumption Δ_C is a diagnosis according to GDE iff its associated diagnostic conflict measure is defined, i.e. [4] P(ω | δ_C) ≠ 0 iff SD ∪ Δ_C ∪ OBS ⊭ ⊥;
2. computation of the diagnostic conflict measure requires the conditional probability p_c = P(o_c | a_c), i.e. the probability that the component's output is o_c when the component is faulty; this probability is assumed to be always 0 or 1 by GDE;
3. in GDE the probability P(a_c), i.e. the probability that component c functions abnormally, acts as the basis for ranking diagnoses; this probability is not needed to rank diagnoses using conflict-based diagnosis, because it is summed out in the computation of the diagnostic conflict measure;
4. the ranking of a conflict-based diagnosis is based on a local analysis of interactions between inputs and outputs of components, taking into account the probability of particular faulty behaviours of components, and thus can be interpreted as a measure of how well the diagnosis, observations and system behaviour match; GDE offers nothing that is to some extent similar;
5. in GDE, assuming more components to be functioning abnormally renders a diagnosis less likely, as proved in Proposition 1; a similar property does not hold for conflict-based diagnosis using the diagnostic conflict measure.

All properties above have already been discussed extensively; therefore, only the last issue is illustrated.

EXAMPLE 5. Consider the Bayesian diagnostic problem discussed above. Table 1 summarises the results of GDE and conflict-based diagnosis, which makes it easier to compare the results. Note that δ_k corresponds to Δ_k and ω to OBS.
Consider again the Bayesian diagnostic problem P_B with set of observations ω = {i_1, ī_2, i_3, o_1, ō_2} and the two diagnoses δ_5 = δ_{X1} = {ā_X2, ā_R1, a_X1, ā_A1, ā_A2} and δ_6 = δ_{X1,A2} = {ā_X2, ā_R1, a_X1, ā_A1, a_A2}. According to Table 1, the posterior probability P(Δ_5 | OBS) computed by GDE is much larger than P(Δ_6 | OBS), which is due to the inclusion of an extra abnormality assumption in Δ_6 in comparison to Δ_5. Consequently, the ranking obtained is compatible with subset-minimality. However, using the diagnostic conflict measure gives, according to Table 1, the same value for both diagnoses. This means that relaxing one extra logical and probabilistic constraint, i.e. A2 in addition to X1, has no effect on the likelihood of the diagnosis in this case. Next consider the diagnoses Δ_7 and Δ_6, which both have the same number of components assumed to be abnormal and thus obtain the same ranking according to GDE. However, δ_6 and δ_7 have different diagnostic conflict measures, as explained in Example 4. This example again illustrates that GDE and conflict-based diagnosis rank diagnoses differently. Conflict-based diagnosis really looks into the system behaviour and, based on a local analysis of the strength of the various constraints, comes up with a ranking.

5 CONCLUSION AND FUTURE WORK

Conflict-based diagnosis is a new concept in the area of model-based diagnosis that was introduced recently [4]. In this paper, we have compared this new method with the well-known probabilistic method employed in GDE. It was shown that the probabilistic method underlying conflict-based diagnosis yields detailed insight into the behaviour of a system. As the obtained information differs from the information obtained from GDE, it may be useful as an alternative or complementary method.
In the near future, we intend to implement the method as part of a diagnostic reasoning engine in order to build up experience with regard to the practical usefulness of the method.

REFERENCES
[1] L. Console and P. Torasso. A spectrum of logical definitions of model-based diagnosis. Computational Intelligence, 7:133–141, 1991.
[2] J. de Kleer and B.C. Williams. Diagnosing multiple faults. Artificial Intelligence, 32:97–130, 1987.
[3] J. de Kleer, A.K. Mackworth, and R. Reiter. Characterizing diagnoses and systems. Artificial Intelligence, 56:197–222, 1992.
[4] I. Flesch, P.J.F. Lucas, and Th.P. van der Weide. Conflict-based diagnosis: Adding uncertainty to model-based diagnosis. In Proc. IJCAI-2007.
[5] F.V. Jensen. Bayesian Networks and Decision Graphs. Springer-Verlag, New York.
[6] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, 1988.
[7] D. Poole, R. Goebel, and R. Aleliunas. A logical reasoning system for defaults and diagnosis. In: The Knowledge Frontier, eds. N. Cercone and G. McCalla, Springer-Verlag, 1987.
[8] R. Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32:57–95, 1987.


On computing minimal conflicts for ontology debugging

Kostyantyn Shchekotykhin and Gerhard Friedrich 1 and Dietmar Jannach 2

Abstract. Ontology debugging is an important stage of the ontology life-cycle and supports a knowledge engineer during the ontology development and maintenance processes. Model-based diagnosis is the basis of many recently suggested ontology debugging methods. The main difference between the proposed approaches is the method of computing the required conflict sets, i.e. sets of axioms such that at least one axiom of each set should be changed (or removed) to make the ontology coherent. Conflict set computation is, however, the most time-consuming part of the debugging process. Consequently, the choice of an efficient conflict set computation method is crucial for ensuring the practical applicability of an ontology debugging approach. In this paper we evaluate and compare two popular minimal conflict computation methods: QUICKXPLAIN and SINGLE JUST. First, we analyze the best and worst cases of the required number of coherency checks of both methods on a theoretical basis, assuming a black-box reasoner. Then, we empirically evaluate the run-time efficiency of the algorithms in both black-box and glass-box settings. Although both algorithms were designed to view the reasoner as a black box, the exploitation of specific knowledge about the reasoning process (glass-box) can significantly speed up run-time performance in practical applications. Therefore, we present modifications of the original algorithms that can also exploit specific data from the reasoning process. Both a theoretical analysis of best- and worst-case complexity and an empirical evaluation of run-time performance show that QUICKXPLAIN is preferable over SINGLE JUST.

1 MOTIVATION

With an increasing number of applications that rely on ontologies, these knowledge bases are getting larger and more complex.

1 Universität Klagenfurt, Austria, firstname.lastname@ifit.uni-klu.ac.at
2 Dortmund University of Technology, Germany, dietmar.jannach@u do.edu
Thus, the corresponding knowledge bases can include definitions of thousands of concepts and roles from different domains. RDF search engines like Watson [2], for instance, facilitate the creation of composite ontologies that reuse the definitions of concepts and roles published on the Web. Moreover, the community of ontology users is getting more heterogeneous and nowadays includes many members from various industrial and scientific fields. Hence, different faults can easily be introduced during the creation and maintenance of ontologies. Recent debugging methods as described in [4, 8, 9, 11] help the user to localize, understand, and correct faults in ontologies and are already implemented in popular ontology development tools like Protégé 3 or Swoop 4. All currently suggested approaches for ontology debugging aim at the automated computation of a set of changes to the ontology that restores the coherence of its terminology (diagnosis). In order to accomplish this task efficiently, current diagnosis approaches are based on the computation of axiom subsets that define an incoherent terminology (conflict sets).

1 University Klagenfurt, Austria, firstname.lastname@ifit.uni-klu.ac.at
2 Dortmund University of Technology, Germany, dietmar.jannach@udo.edu

Diagnosis techniques: Currently, two approaches are used for the computation of diagnoses in ontology debugging: Pinpointing [12] and Reiter's model-based diagnosis (MBD) [10]. Pinpoints are used to avoid the computation of minimal hitting sets of conflict sets by approximating minimal diagnoses by their supersets. However, the pinpoints themselves are computed on the basis of all minimal conflicts. In contrast, in MBD approaches (minimal) conflicts are computed on demand and diagnoses are computed with increasing cardinality by constructing a hitting-set tree (HSTREE).
Consequently, this method will find those diagnoses first that suggest minimal changes, and it avoids both the computation of very implausible multi-fault diagnoses and the costly computation of the set of all minimal conflicts. Note that Reiter's original proposal does not work correctly for non-minimal conflicts [5] and shows limited performance in their presence. A modified diagnosis method was however introduced in [4] which avoids these deficits. The general question whether pinpoints or leading diagnoses are more appropriate as an output of the debugging process is still open. Conflict computation: Current approaches like SINGLE JUST [8] and QUICKXPLAIN [4] treat the underlying reasoner either as a black box or as a glass box. In (invasive) glass-box approaches the developer of the debugging method can exploit specifics of the theorem prover. In [9], for instance, a conflict computation approach was proposed which requires modifications of existing reasoning algorithms, as its aim is to compute sets of conflicts during the reasoning process as efficiently as possible. The main drawback of such glass-box approaches, however, is that they can be used only for a particular description logic [1], like SHOIN(D) [9]. Hence, only a particular reasoner (or even a particular version of a reasoner) and a particular type of logic can be used. Moreover, glass-box modifications to reasoning systems often remove existing optimizations and are thus typically slower than their non-modified analogues. In addition, glass-box approaches to conflict set minimization do not guarantee the minimality of returned conflict sets, and further (black-box) minimization is required [8]. On the other hand, black-box algorithms are completely independent from the reasoning process and just use the boolean outputs of the theorem prover. These algorithms are therefore logic-independent and can exploit the full power of highly optimized reasoning methods.
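To make the hitting-set connection concrete, the relationship can be illustrated with a small Python sketch: minimal diagnoses are exactly the minimal hitting sets of the minimal conflict sets. This brute-force enumeration is an illustration only, not the paper's HSTREE, which builds the tree on demand and computes conflicts lazily; the function name is an assumption.

```python
from itertools import combinations

def minimal_hitting_sets(conflicts):
    """Brute-force minimal hitting sets: each returned set shares at least
    one element with every conflict set, and no proper subset does."""
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(1, len(universe) + 1):
        for cand in combinations(universe, size):
            s = set(cand)
            # s must hit every conflict and must not contain a smaller solution
            if all(s & c for c in conflicts) and not any(f <= s for f in found):
                found.append(s)
    return found
```

For the conflicts {a, b} and {b, c} this yields {b} (the single-fault diagnosis) and {a, c}; an HS-tree reaches the same sets while exploring conflicts only as needed.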
Still, in the case of an unsatisfiable set of axioms, all axioms are considered a conflict, since no further information is available. Conflicts are typically minimized by additional calls to the theorem prover. In order to make black-box approaches applicable in cases where theorem proving is expensive, the number of such calls must

be minimized. In current systems, two main approaches for conflict set computation are used, SINGLE JUST [8] and QUICKXPLAIN [4]. In general, both of them can be used in glass-box and black-box settings. In this paper we show that QUICKXPLAIN is preferable over SINGLE JUST in both settings, based on a theoretical analysis of best and worst cases and an empirical performance evaluation for a simulated average case. In addition, we propose modifications to the original algorithms to further improve the run-time performance in glass-box settings. The remainder of the paper is organized as follows. Section 2 provides a theoretical study of conflict set computation methods and includes a brief description of the main algorithms as well as an analysis of the extreme cases. In Section 3 we present the results of an empirical evaluation of QUICKXPLAIN and SINGLE JUST in both black- and glass-box settings. The paper closes with a discussion of future work.

2 COMPUTING MINIMAL CONFLICTS

We will focus on the comparison of two popular algorithms, QUICKXPLAIN [6] and SINGLE JUST [8]. The presented comparison is possible because the application scenarios and strategies of these algorithms are similar. Both methods are designed to compute only one minimal conflict set per execution. The combinations of QUICKXPLAIN + HSTREE [4] and SINGLE JUST + HSTREE (also referred to as ALL JUST) [8] are used to obtain a set of minimal diagnoses or to enumerate minimal conflict sets (justifications). Therefore, QUICKXPLAIN and SINGLE JUST can be compared both theoretically and empirically.
Algorithm: QUICKXPLAIN(B, C)
Input: trusted knowledge B, set of axioms C
Output: minimal conflict set CS
(1) if isCoherent(B ∪ C) or C = ∅ return ∅;
(2) AX ← getFaultyAxioms(C);
(3) if AX ≠ ∅ then C ← C ∩ AX;
(4) return computeConflict(B, B, C)

function computeConflict(B, Δ, C)
(5) if Δ ≠ ∅ and not isCoherent(B) then return ∅;
(6) if |C| = 1 then return C;
(7) n ← |C|; k ← split(n)
(8) C1 ← {ax1, ..., axk} and C2 ← {axk+1, ..., axn};
(9) CS1 ← computeConflict(B ∪ C1, C1, C2);
(10) if CS1 = ∅ then C1 ← getFaultyAxioms(C1);
(11) CS2 ← computeConflict(B ∪ CS1, CS1, C1);
(12) return CS ← CS1 ∪ CS2;

function getFaultyAxioms(C)
(13) AX ← getConflictSet_glassbox();
(14) if AX = ∅ then return C;
(15) else return AX;

Figure 1. Generalized QUICKXPLAIN algorithm

QUICKXPLAIN. This algorithm (listed in Figure 1) takes two parameters as input: B, a set of axioms that are considered correct by the knowledge engineer, and C, the set of axioms to be analyzed. QUICKXPLAIN follows a divide-and-conquer strategy and splits the input set of axioms C into two subsets C1 and C2 on each recursive call. If the conflict set is a subset of either C1 or C2, the algorithm significantly reduces the search space. If, for instance, the splitting function is defined as split(n) = n/2, the search space is halved with just one call to the reasoner. Otherwise, the algorithm re-adds some axioms ax ∈ C2 to C1. With the splitting function defined above, the algorithm will re-add half of the axioms of the set C2. The choice of the splitting function is crucial since it affects the number of required coherency checks. The knowledge engineer can define a very effective splitting function for a concrete problem, e.g., if there exists some a priori knowledge about the faulty axioms of an ontology.
However, in the general case it is recommended to use the function that splits the set C of all axioms into two subsets of the same size, since the path from the root of the recursion tree to a leaf will then contain at most log2 n nodes. Thus, if the cardinality of the searched minimal conflict set is |CS| = k, in the best case, i.e., when all k elements belong to a single subset C1, the number of required coherency checks is log2(n/k) + 2k. The worst case for QUICKXPLAIN is observed when the axioms of a minimal conflict set always belong to different sets C1 and C2, e.g., if a minimal conflict set has two axioms and one is positioned at the first place of the set C and the other one at the last. In this case the number of coherency checks is 2k(log2(n/k) + 1) [6]. Note that we modified the original QUICKXPLAIN algorithm (Figure 1) such that it can be used with both black- and glass-box approaches. The algorithm issues two types of calls to a reasoner, isCoherent(T) and getConflictSet_glassbox(). The first function returns true if the given terminology T is coherent. getConflictSet_glassbox() returns a set of axioms AX that are responsible for the incoherence (CS ⊆ AX). This function can only be used if the reasoner supports glass-box debugging. If this is the case, the reasoner will be able to return the set AX which was generated during the reasoning process. If only black-box usage is possible, a practical implementation of an ontology debugger should override this function with one that returns an empty set. In this case the modified algorithm is equal to the original one given in [6]. Moreover, the first part of the algorithm (lines 1-4) checks whether the ontology is actually incoherent. This check is required for two reasons.
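As a black-box illustration, the divide-and-conquer scheme can be sketched in Python. This is a sketch under stated assumptions, not the authors' implementation: is_coherent stands in for the reasoner's coherency check, the glass-box shortcut (getFaultyAxioms) is omitted, and split(n) = n/2 is hard-coded.

```python
def quickxplain(background, axioms, is_coherent):
    """Return one minimal conflict subset of `axioms` (black-box sketch).
    `is_coherent` is a boolean oracle over a list of axioms."""
    if is_coherent(background + axioms) or not axioms:
        return []

    def compute(b, delta, c):
        # If the last split alone made the background incoherent,
        # the conflict lies entirely inside b.
        if delta and not is_coherent(b):
            return []
        if len(c) == 1:
            return list(c)
        k = len(c) // 2                  # split(n) = n/2
        c1, c2 = c[:k], c[k:]
        cs1 = compute(b + c1, c1, c2)    # conflict part inside c2
        cs2 = compute(b + cs1, cs1, c1)  # conflict part inside c1
        return cs1 + cs2

    return compute(background, background, axioms)
```

With eight axioms and a planted two-axiom incoherence, the sketch isolates exactly those two axioms while checking only O(k log(n/k)) subsets.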
First, the result of conflict set computation for an already coherent ontology using a reasoner as a black box would include all axioms of the ontology; second, the feedback of a glass-box reasoner executed at this stage can significantly reduce the search space of QUICKXPLAIN. The same also holds for SINGLE JUST.

SINGLE JUST. This algorithm (see Figure 2) follows an expand-and-shrink strategy and has two main loops. The first one creates a set CS that includes all axioms of the minimal conflict set, and the second one minimizes CS by removing axioms that do not belong to the minimal conflict set. The algorithm includes two functions, select(T) and fastPruning(T), that can be tuned to improve its performance. The first function starts by selecting a predefined number of axioms num from the given set. The number of axioms that are selected can grow with a certain factor f (see [7]). The fastPruning function implements a pruning strategy for CS with a sliding-window technique. The pruning algorithm takes the size of the window

Algorithm: SINGLE_JUST(B, C)
Input: trusted knowledge B, set of axioms C
Output: minimal conflict set CS
(1) if isCoherent(B ∪ C) or C = ∅ return ∅;
(2) AX ← getFaultyAxioms(C);
(3) if AX ≠ ∅ then C ← C ∩ AX;
(4) return computeConflict(B, C)

function computeConflict(B, C)
(5) CS ← B;
(6) do
(7) CS ← CS ∪ select(C \ CS);
(8) while (isCoherent(CS));
(9) CS ← fastPruning(getFaultyAxioms(CS));
(10) for each ax ∈ CS do
(11) CS ← CS \ {ax}
(12) if isCoherent(CS) then CS ← CS ∪ {ax};
(13) else CS ← getFaultyAxioms(CS);
(14) return CS;

Figure 2. Generalized SINGLE JUST algorithm

window and the set of axioms CS as input and outputs a set of axioms CS' ⊆ CS. In the form in which it is implemented in the OWL-API 5, the pruning algorithm partitions the input set CS with n axioms into p = n/window parts Pi, i = 1, ..., p and then sequentially tests the coherency of each set CSi = CS \ Pi, i = 1, ..., p. Note also that the OWL-API includes two variants of the pruning method, one with constant and one with shrinking window size. In the further analysis and evaluation we consider only the variant with the constant window size. Let us consider the best and worst cases for SINGLE JUST. In the best case, all axioms of a minimal conflict set CS belong to some partition set Pi. Thus, given an axiom set C of cardinality n that contains a minimal conflict set CS of cardinality k, the algorithm will make at most 1 + p + min(window, num) coherency checks. In the worst case, i.e., if all k axioms of the minimal conflict set belong to different partitions Pi, the first loop will require at least log_f(1 + n(f−1)/num) coherency checks, and the sliding window and final minimization together require p + min(k/p, 1)·n checks. The theoretical analysis of the two algorithms thus shows that QUICKXPLAIN has a smaller interval of possible numbers of coherency checks in comparison to SINGLE JUST (see Figure 3) 6.
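For comparison, the expand-and-shrink scheme can be sketched similarly, again with an assumed boolean is_coherent oracle. The sliding-window fastPruning step and the glass-box feedback are omitted for brevity, so this sketch shows only the two main loops; num and f default to the OWL-API values quoted in the paper.

```python
def single_just(background, axioms, is_coherent, num=50, f=1.25):
    """Expand-and-shrink sketch of SINGLE_JUST (black-box, no fast pruning)."""
    cs = list(background)
    step = num
    # Expansion: add growing chunks of axioms until the set becomes incoherent.
    while is_coherent(cs):
        remaining = [ax for ax in axioms if ax not in cs]
        if not remaining:
            return []                 # B together with C is coherent: no conflict
        cs += remaining[:int(step)]
        step *= f                     # chunk size grows by factor f
    # Shrinking: drop each axiom; keep it only if its removal restores
    # coherency, i.e. the axiom is necessary for the conflict.
    for ax in list(cs):
        trial = [a for a in cs if a != ax]
        if not is_coherent(trial):
            cs = trial
    return [ax for ax in cs if ax not in background]
```

Note how the shrinking loop issues one coherency check per collected axiom, which is the source of the larger worst-case interval discussed above.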
Note also that the interaction with the reasoner used in SINGLE JUST in Figure 2 is organized in the same way as in QUICKXPLAIN, i.e., by means of the functions isCoherent and getFaultyAxioms. If a black-box approach to ontology debugging is used, the modified algorithm presented in Figure 2 is equal, in terms of the number of coherency checks, to the original one suggested in [8]. Moreover, both generalized algorithms can also be used to detect conflict sets that cause the unsatisfiability of a certain concept. This is possible if we introduce one more input parameter Concept and rewrite the coherency checking function such that isCoherent(C, Concept) returns false if Concept is unsatisfiable with respect to the terminology defined by the set of axioms C. Otherwise this function returns true. The algorithms presented in Figures 1 and 2 can also exploit structural relations between axioms by means of specifically implemented functions split, select and fastPruning. One can thus, for instance, select and/or partition axioms so that axioms with intersecting sets of concepts are considered first.

5 Unfortunately, the authors of the SINGLE JUST algorithm provided a specification of the fast pruning method neither in [7] nor in [8]. Therefore we analyzed the OWL-API ( checked on June 7, 2008) implementation that is referred to by the authors in [8].
6 The values of the SINGLE JUST parameters were taken from the OWL-API implementation.

Figure 3. Intervals for the numbers of possible coherency checks required to identify a minimal conflict set of cardinality k = 8 in an ontology of n axioms (n = 10^2, 10^4, 10^6). QUICKXPLAIN parameters: split(n) = n/2. SINGLE JUST parameters: number of axioms in the first iteration num = 50, increment factor f = 1.25, window size window = 10.
3 EMPIRICAL EVALUATION

The theoretical analysis of the algorithms showed that QUICKXPLAIN is preferable over SINGLE JUST since it has a much lower variation of the number of required reasoner calls. Nevertheless, the extremum conditions of the discussed best and worst cases are rather specific. Therefore, an analysis of the average case has to be done in order to make the comparison complete. However, evaluating this case is problematic, since there are no publicly available collections of incoherent ontologies that are suitable for such tests. Moreover, there is no a priori knowledge about the distribution of conflicts. In other words, we do not know how the faulty axioms are most often positioned in an ontology. Therefore, we simulated the occurrence of faults in the ontology in order to obtain a measure of the numbers of coherency checks required by QUICKXPLAIN and SINGLE JUST. These statistics can then be used to calculate the average number of required coherency checks. Moreover, for our purposes it is enough to generate and then compute only one conflict set, since none of the analyzed algorithms can improve its performance on subsequent executions by using data from previous runs. The test case generation method was designed under the following assumptions: (1) All axioms have the same probability to be part of a conflict set (uniform distribution). Thus, for an ontology with n axioms, the probability for each axiom to be a source of a conflict is 1/n. (2) The cardinalities of minimal conflict sets follow the binomial distribution with the number of trials t equal to the maximal length of the dependency path from an axiom selected according to the first assumption to all axioms that directly or indirectly depend on concepts declared in the selected axiom. The value t corresponds to the maximum possible cardinality of a conflict that can be generated for the selected axiom. The success probability, which is the second

Figure 4. Average number of consistency checks for QUICKXPLAIN and SINGLE JUST using the reasoner as a black box (ontologies: MyGrid, Sweet-JPL, BCS3, Galen, Bike9, MGED Ontology, Gene Ontology, Sequence Ontology)

Figure 5. Average running times for black-box QUICKXPLAIN and SINGLE JUST

parameter of the distribution, is set to 1/t. Hence minimal conflict sets of smaller cardinality are more likely to appear. The process starts with the generation of a uniformly distributed number i from the interval [1, n]. This number corresponds to an axiom ax_i that is the initial axiom of a conflict set. Then the algorithm queries a reasoner for a taxonomy of the given ontology to find all axioms that contain concept definitions that depend, either directly or indirectly, on a concept defined in ax_i. The length t of the longest dependency path is then used to generate the value c, which corresponds to the minimal cardinality of a conflict set according to the second assumption. Next, we retrieve all axioms which define concepts that subsume one of the concepts declared in ax_i such that the subsumption is made over c − 1 other concepts (C1 ⊑ ... ⊑ C(c−1) ⊑ Cc). If more than one axiom is retrieved, we randomly select one of them (denoted ax_j). Both axioms are then modified to create a conflict (e.g., by inserting a new concept definition in ax_i and its negation in ax_j). Thus, the generation method produces faults that correspond to the local or propagated causes of unsatisfiability that were observed by Wang et al. [14] in many real-world ontologies. Note that the real cardinality of a minimal conflict is unknown prior to the execution of a conflict computation algorithm, since we do not investigate all possible dependencies of the modified axioms.
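The two distributional assumptions can be sketched as a small sampler. The function and parameter names are illustrative, not the authors' code; max_depth plays the role of t, and forcing the cardinality to be at least 1 is an assumption added so that a conflict can always be planted.

```python
import random

def sample_conflict_spec(n_axioms, max_depth, rng=None):
    """Sample a (seed axiom index, conflict cardinality) pair following the
    paper's two assumptions:
    (1) the seed axiom is uniform over the n axioms;
    (2) the cardinality follows Binomial(t, 1/t), where t is the length of
        the longest dependency path from the seed axiom."""
    rng = rng or random.Random()
    i = rng.randrange(n_axioms)                        # assumption (1)
    t = max_depth
    c = sum(rng.random() < 1.0 / t for _ in range(t))  # assumption (2)
    return i, max(c, 1)  # at least one axiom must be modified (assumption)
```

Since Binomial(t, 1/t) has mean 1, small conflict cardinalities dominate, matching the remark that smaller conflict sets are more likely to appear.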
In the tests we used Pellet as a reasoner and the SSJ statistical library 8 to generate the required random numbers. As can be seen in Figure 4 and Figure 5, in the average case (after 100 simulations) QUICKXPLAIN outperformed SINGLE JUST in all eight test ontologies: MyGrid (8179 axioms), Sweet-JPL (3833 axioms), BCS3 (432 axioms), Galen (3963 axioms), MGED Ontology (236 axioms), Bike9 (215 axioms), Gene Ontology (1759 axioms) and Sequence Ontology (1745 axioms). In this test we measured both the number of checks and the elapsed time. Note also that the results that we obtained when using Pellet can in general also be transferred to other reasoners, which in these settings have been shown to have comparable performance [13]. All experiments were performed on a MacBookPro (Intel Core Duo) 2 GHz with 2 GB RAM and 1.5 GB maximum Java memory heap size. Besides using the reasoner as a black box, both variants of QUICKXPLAIN and SINGLE JUST can also be used in glass-box settings. However, the theoretical analysis of these cases is not trivial, since we cannot predict the number of axioms that will be returned by a glass-box method on each iteration of the conflict computation algorithm. The computed conflict set can include extra axioms that do not belong to the searched minimal conflict set because of non-determinisms of the reasoning algorithm, such as max-cardinality restrictions, or special features of the tracing algorithm itself. Therefore, our empirical evaluation of different combinations of QUICKXPLAIN and SINGLE JUST is based on two different glass-box implementations: invasive [9] and naïve non-invasive. In general, all glass-box methods implement tracing of the axioms that were used by the reasoner to prove the unsatisfiability of a concept.
The invasive method first stores all correspondences between source axioms and internal data structures of the reasoner and tracks the changes in the internal data structures during the normalization and absorption phases (see [9] for details). In [9], it is also suggested to add tracing to the SHOIN(D) tableaux expansion rules to enable a very precise tracking of axioms, so that the resulting axiom set will be as small as possible. The main drawback of this approach, however, is that such a modification disables many key optimizations which are critical for the excellent performance of modern OWL reasoners [8]. In the non-invasive approach that we developed for our evaluation, we only track which concepts were unfolded by the reasoner and then search for all axioms in which these concepts are defined using the OWL-API. This method does not analyze the details of the reasoning process, and thus the resulting set of axioms equals the set returned by the invasive method only in the best case. However, such an approach can have a shorter execution time, since it does not require changes in the optimized reasoning process except for the insertion of a logging method for unfolded concepts. Pellet already includes an implementation of the invasive method (explanations of clashes) and can also be configured to turn on the logging which is required for the non-invasive method. The only modification to the reasoner was to add a fast-fail behavior in the satisfiability check. By default, Pellet searches for all unsatisfiable concepts. However, for the minimal conflict set computation algorithm it is enough to find just one such concept, since in this case the terminology is already incoherent. We performed the tests of the glass-box methods using the same test bundle that was used for the black-box tests. The evaluation shows that QUICKXPLAIN is faster in both approaches (see Figures 6 and 7).
When using the feedback from the glass-box satisfiability check, QUICKXPLAIN performed better than SINGLE JUST in all the cases.

Figure 6. Average running times for non-invasive glass-box QUICKXPLAIN and SINGLE JUST

Figure 7. Average running times for invasive glass-box QUICKXPLAIN and SINGLE JUST

Note also that the difference in the average running times of QUICKXPLAIN and SINGLE JUST in the invasive and non-invasive glass-box settings is not significant, as both glass-box methods can in general reduce the search space of the minimal conflict computation algorithms very rapidly. Finally, note that in the context of this work we generally understand axioms as valid description logic statements of any kind. Concept definitions whose left-hand sides are atomic are the most frequent form of axioms, since the available ontology editors mainly support this presentation. However, if the terminology includes axioms with a different structure, approaches like those presented in [3, 7] can be used to transform the axioms. These approaches support fine-grained debugging of ontologies and allow faults to be located within parts of the original axioms. Although the evaluation in this paper was limited to the more coarse-grained case, the conflict computation techniques can also be applied in the fine-grained debugging approaches.
4 CONCLUSIONS & FUTURE WORK

Adequate debugging support is an important prerequisite for the broad application of ontologies in real-world scenarios, and in recent years different techniques for the automated detection of problematic chunks in knowledge bases have been developed. One of the most critical and time-intensive tasks in most debugging approaches is the detection of small sets of axioms that contain the faults (conflict sets). In general, efficient conflict computation and minimization is central not only in debugging scenarios, as conflict sets are also helpful to compute justifications for axioms and assertions, which in turn can serve as the basis of an explanation facility [8]. In this paper we have analyzed two recent proposals for the identification of conflicts, both in black-box and glass-box application scenarios. Both the theoretical analysis and an empirical evaluation showed that QUICKXPLAIN is currently the more efficient method for these purposes. Due to the lack of publicly available mass data about typical ontology faults, artificial tests had to be used in the experiments. For future work, it would be useful if ontology editors like Protégé or Swoop supported anonymous user feedback for debugging purposes, such as statistics on the number of conflict sets and their average cardinality. This data would help to make even more precise evaluations of the average case.

REFERENCES
[1] The Description Logic Handbook: Theory, Implementation, and Applications, eds., Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, Cambridge University Press.
[2] Mathieu d'Aquin, Claudio Baldassarre, Laurian Gridinoc, Sofia Angeletou, Marta Sabou, and Enrico Motta, Watson: A gateway for next generation semantic web applications, in Poster session of the International Semantic Web Conference, ISWC, (2007).
[3] Gerhard Friedrich, Stefan Rass, and Kostyantyn Shchekotykhin, A general method for diagnosing axioms, in DX'06 - 17th International Workshop on Principles of Diagnosis, eds., C.A. Gonzalez, T. Escobet, and B. Pulido, Penaranda de Duero, Burgos, Spain, (2006).
[4] Gerhard Friedrich and Kostyantyn Shchekotykhin, A general diagnosis method for ontologies, in Proceedings of the 4th International Semantic Web Conference (ISWC-05), (2005).
[5] Russell Greiner, Barbara A. Smith, and Ralph W. Wilkerson, A correction to the algorithm in Reiter's theory of diagnosis, Artificial Intelligence, 41(1), 79-88, (1989).
[6] Ulrich Junker, QUICKXPLAIN: Preferred explanations and relaxations for over-constrained problems, in Proceedings of AAAI, San Jose, CA, USA, (2004).
[7] Aditya Kalyanpur, Debugging and repair of OWL ontologies, Ph.D. dissertation, University of Maryland, College Park, MD, USA. Adviser: James Hendler.
[8] Aditya Kalyanpur, Bijan Parsia, Matthew Horridge, and Evren Sirin, Finding all justifications of OWL DL entailments, in Proc. of ISWC/ASWC 2007, Busan, South Korea, volume 4825 of LNCS, Berlin, Heidelberg, (November 2007). Springer Verlag.
[9] Aditya Kalyanpur, Bijan Parsia, Evren Sirin, and James Hendler, Debugging unsatisfiable classes in OWL ontologies, Web Semantics: Science, Services and Agents on the World Wide Web, 3(4), (2005).
[10] Raymond Reiter, A theory of diagnosis from first principles, Artificial Intelligence, 32(1), 57-95, (1987).
[11] Stefan Schlobach, Diagnosing terminologies, in Proc. of AAAI, eds., Manuela M. Veloso and Subbarao Kambhampati, AAAI Press / The MIT Press, (2005).
[12] Stefan Schlobach, Zhisheng Huang, Ronald Cornet, and Frank van Harmelen, Debugging incoherent terminologies, J. Autom. Reason., 39(3), (2007).
[13] Evren Sirin, Bijan Parsia, Bernardo Cuenca Grau, Aditya Kalyanpur, and Yarden Katz, Pellet: A practical OWL-DL reasoner, Technical report, UMIACS, (2005).
[14] H. Wang, M. Horridge, A. Rector, N. Drummond, and J. Seidenberg, Debugging OWL-DL ontologies: A heuristic approach, in Proceedings of the 4th International Semantic Web Conference (ISWC-05), (2005).


Supporting Conceptual Knowledge Capture Through Automatic Modelling

Jochem Liem and Hylke Buisman and Bert Bredeweg
Human Computer Studies Laboratory, Informatics Institute, Faculty of Science, University of Amsterdam, The Netherlands.

Abstract. Building qualitative models is still a difficult and lengthy endeavour for domain experts. This paper discusses progress towards an automated modelling algorithm that learns Garp3 models based on a full qualitative description of the system's behaviour. In contrast with other approaches, our algorithm attempts to learn the causality that explains the system's behaviour. The algorithm achieves good results when recreating four well-established models.

1 Introduction

In this paper we focus on the groundwork required to advance towards an automated modelling program. The input is considered to have a qualitative representation, i.e., a state graph that represents the possible situations that can emerge from a system, and the values of the quantities in each situation. Furthermore, the input is assumed to contain no noise and no inconsistencies. The completed algorithm is envisioned to support researchers in articulating their conceptual understanding. As such it will help to establish theories that explain the phenomena provided as input data.

2 Related Work

Recently, researchers in the machine learning community proposed inductive process modelling as a new research agenda [4]. They argue that models should not just be accurate, but should also provide explanations (often requiring variables, objects and mechanisms that are unobserved). In their work, quantitative process models are learned from numerical data. Based on changing continuous variables (observations), a set of variables, a set of processes (described by generalized functional forms), and constraints such as variable type information, a specific process model is generated that explains the observed data and predicts unseen data accurately.
As in our approach, there is an explicit notion of individual processes, variables (quantities) and subtype hierarchies to represent different types. Our approach differs from this work in two ways. Firstly, we learn qualitative models based on qualitative data, making our approach a viable alternative when no numerical data is available. Secondly, our approach represents causality more explicitly through causal dependencies. We argue that this representation provides a better explanation than equations. However, our generated models cannot perform numerical predictions. An earlier approach to learning qualitative models is Qualitative Induction (QUIN) [1]. QUIN searches for qualitative patterns in numeric data and outputs the result as qualitative trees (similar to decision trees). The choices in the qualitative tree can be seen as conditional inequalities for specific model fragments in our approach. As with the inductive process modelling approach, equations are used to represent the causality, in this case Qualitative Differential Equations (QDEs). Similar work to QUIN learns models for the JMorven language, which uses fuzzy quantity spaces to specify variables [7]. However, this work also uses QDEs, which leaves the representation of causality implicit.

3 QR Model and Simulation Workbench: Garp3

The automatic model building algorithm is implemented in Garp3 1 [2]. Garp3 allows modellers to represent their knowledge about the structure and the important processes in their system as model fragments, which can be considered formalisations of the knowledge that applies in certain general situations. Next to model fragments, different scenarios can be modelled. These represent specific start states of a system. Garp3 can run simulations of models based on a particular scenario.
The result of such a simulation is a state graph, in which each state represents a particular possible situation of the system, and the transitions represent the possible ways a situation can change into another. The simulation engine takes a scenario as input and finds all the model fragments that apply to that scenario. The consequences of the matching model fragments are added to the scenario to create a state description from which new knowledge can be inferred, such as the derivatives of quantities. Given the completed state description, the possible successor states are inferred. The complete state graph is generated by applying the reasoning to the new states. In Garp3 the structure of a system is represented using entities (objects) and configurations (relations). For example, a lion hunting a zebra would be represented as two entities (lion and zebra) and a configuration (hunts). Quantities represent the features of entities and agents that change during simulation. A quantity has a magnitude and a derivative, representing its current value and trend. The magnitude and derivative are each defined by a quantity space that represents the possible values the magnitude and the derivative can have. Such a quantity space is defined by a set of alternating point and interval values. We use Mv(Q1) to refer to the current value of the magnitude of a quantity. Ms(Q1), the sign of the magnitude, indicates whether the magnitude is positive, zero or negative (Ms(Q1) ∈ {+, 0, −}).

Dv(Q1) refers to the current value of the derivative of a quantity, which has a value from the predefined derivative quantity space (Dv(Q1) ∈ {−, 0, +}). Ds(Q1) refers to the current sign of a derivative. Note that the predefined values of derivatives correspond exactly to the possible signs of the derivative.

3.1 Causality

Garp3 explicitly represents causality using direct and indirect influences. Direct influences are represented as Q1 →I+ Q2. Influences can be either positive or negative. The positive influence will increase Dv(Q2) if Ms(Q1) = +, decrease it if Ms(Q1) = −, and have no effect when Ms(Q1) = 0. For a negative influence, it is vice versa. The indirect influences, called proportionalities, are represented as Q1 →P+ Q2. Similar to influences, proportionalities can be either positive or negative. The positive proportionality will increase Dv(Q2) if Ds(Q1) = +, have no effect if Ds(Q1) = 0, and decrease it if Ds(Q1) = −. For a negative proportionality, it is vice versa.

3.2 Other Behavioural Ingredients

Other behavioural ingredients in Garp3 are operators, inequalities, value assignments and correspondences. Operators (+ and −) are used to calculate the magnitude value of quantities (e.g. Q1 − Q2 = Q3, to indicate Mv(Q1) − Mv(Q2) = Mv(Q3)). Inequalities can be placed between different model ingredient types: (1) magnitudes (Mv(Q1) = Mv(Q2)), (2) derivatives (Dv(Q1) < Dv(Q2)), (3) values (Q1(point(Max)) = Q2(point(Max))), (4) operator relations (Mv(Q1) − Mv(Q2) < Mv(Q3) − Mv(Q4)), and (5) combinations of 1, 2, 3 and 4 (although only between either magnitude or derivative items). Value assignments simply indicate that a quantity has a certain qualitative value (Mv(Q1) = Q1(Plus)). Finally, correspondences indicate that from certain values of one quantity, values of another quantity can be inferred.
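Before correspondences are described in more detail, the influence and proportionality semantics of Section 3.1 can be summarised in a small sketch (plain Python for illustration; the function names are mine, not Garp3 code):

```python
# Qualitative semantics of influences (I+/I-) and proportionalities
# (P+/P-) as described in Section 3.1. Signs are '+', '0' or '-'.

def influence_effect(ms_q1, positive=True):
    """Effect of Q1 --I+--> Q2 (or I- if positive=False) on Dv(Q2)."""
    if ms_q1 == '0':
        return '0'                       # zero magnitude: no effect
    if positive:
        return ms_q1                     # I+ passes on the magnitude sign
    return '+' if ms_q1 == '-' else '-'  # I- inverts it

def proportionality_effect(ds_q1, positive=True):
    """Effect of Q1 --P+--> Q2 (or P- if positive=False) on Dv(Q2)."""
    if ds_q1 == '0':
        return '0'                       # stable quantity: no effect
    if positive:
        return ds_q1                     # P+ passes on the derivative sign
    return '+' if ds_q1 == '-' else '-'  # P- inverts it

# A positive magnitude acting through a negative influence
print(influence_effect('+', positive=False))   # '-'
```

The only difference between the two dependency types is visible in the arguments: an influence reads the magnitude sign of its source, a proportionality reads the derivative sign.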
There are quantity correspondences (Q1 ⇔qs Q2) and value correspondences (Q1(Plus) ⇔v Q2(Plus)), which can both be either directed or undirected. The value correspondence indicates that if Mv(Q1) = Q1(Plus), then Mv(Q2) = Q2(Plus). If the value correspondence is bidirectional, the reverse inference is also possible. Quantity correspondences can be considered a set of value correspondences between each consecutive pair of the values of both quantities. There are also inverse quantity space correspondences (Q1 ⇔qs−1 Q2) that indicate that the first value in Q1 corresponds to the last value in Q2, the second to the one before last, etc.

4 Algorithm Requirements and Approach

4.1 Assumptions and Scoping

The goal of the automatic model building algorithm is to take a state graph and a scenario as input, and generate the model that provides an explanation for the behaviour. Our approach focusses on the generation of causal explanations. Several assumptions are made to scope the work; in further research these assumptions can be alleviated. Firstly, the input is assumed to have no noise or inconsistencies. Secondly, the state graph is assumed to be a full envisionment of the system's behaviour. Thirdly, it is assumed that a model can be built using a single model fragment. From a causal explanation point of view, it is reasonable to assume that influences and proportionalities never disappear, but that their effects are only nullified when quantities become zero or stable. Finally, the algorithm is focussed on causal explanation and less on structure. Therefore, the entity hierarchy is assumed known.

4.2 Input and Output

The algorithm takes a complete state graph as input, which includes (1) the quantity names, (2) the quantity spaces, (3) the magnitudes and derivatives of the quantities in different states, (4) the observable inequalities, and (5) the state transitions.
Furthermore, the algorithm is provided with the scenario that should produce the state graph, which consists of: (1) the entities, agents and assumptions involved, (2) structural information about the configurations between them, (3) the quantities and their initial values, and (4) the inequalities that hold in the initial state. The output of the algorithm is one or more Garp3 qualitative models that explain (are consistent with) the input and that can be immediately simulated.

4.3 Algorithm Design Approach

Since the semantics of model ingredients are formally defined, one would assume that it is clear how each ingredient manifests itself in the simulation results of a model. Otherwise, how would the implementation of a simulation engine have been possible? However, in practice, it is hard even for expert modellers to pinpoint the model ingredients that are responsible for certain (lack of) behaviour. This has several reasons. Firstly, a large set of inequalities is derived during qualitative simulation, of which the implications (other inequalities) are difficult to foresee. Secondly, the engine has a lot of intricacies (such as second order derivatives) which make simulation results hard to predict. Thirdly, the branching in the state graph that results from ambiguity is difficult for people to completely envision. For these reasons, an iterative algorithm design approach is chosen. Well-established models are ordered by complexity, and attempts are made to generate them using their own output. Each of the models requires a different (and increasingly large) set of considerations that must be dealt with. The models chosen are Tree and Shade, Communicating Vessels, Deforestation, Population Dynamics and a set of other even more complex models². Tree and Shade is the least complex model, containing only a few quantities and causal dependencies, and no conditions, causal interactions, inequalities or operator relations.
Communicating Vessels is more complex, as it contains causal interactions, an operator, and inequalities. The Deforestation model differs from the previous models as it contains many clusters linked to each other by proportionalities. Population Dynamics is again more complex, due to the large number of quantities, interactions and conditions.

4.4 Causality and Clusters

Causal Paths

Important for the algorithm is the concept of causal paths. These are series of quantities connected by influences and proportionalities. A causal path is defined as a set of quantities that starts with an influence, which is followed by an arbitrary number of proportionalities, for example: Q1 →I+ Q2 →P+ ... →P+ Qn−1 →P+ Qn. A quantity that has no proportionalities leading out of it ends the causal path. If a quantity has more than one proportionality leading out of it, multiple causal paths can be defined. Since each influence represents the causal effect of a process, a causal path can be seen as the cascade of effects of a process. Given this perspective, certain successions of causal relations become unlikely. For example, the causal path Q1 →I+ Q2 →I+ Q3 →P Q4 →I+ Q5 would imply there are many active processes with short or no cascading effects.

Direction of Causality

An important issue in scientific enquiry is the problem of correlation and causality. This issue appears when trying to derive causal relations from the state graph. For example, Ds(Q1) = Ds(Q2) can be caused by Q1 →P+ Q2, by Q2 →P+ Q1, or even by Q3 →P+ Q1 and Q3 →P+ Q2. Another example of this is in the communicating vessels model. Ideally, a model capturing the idea of a contained liquid would distinguish between Volume, Height and Bottom pressure, and have a particular causal account (Volume →P+ Height →P+ Bottom pressure). However, from the model's behaviour this causality may not be derivable, e.g. when the width of the containers does not change. As a result, the unique role of the quantities involved can only be inferred when the required variation is apparent in the input state graph. Therefore, it is considered the modeller's responsibility to provide simulation examples which allow the algorithm to make these critical distinctions. However, it can be considered the responsibility of the tool to indicate to the modeller that the causality between certain sets of quantities cannot be derived, and that examples showing these differences should be provided.

Clusters

The algorithm makes use of a specific subset of causal paths called clusters. We define clusters as groups of quantities that exhibit equivalent behaviour.

² The models and references to articles are available at
More specifically, a set of quantities constitutes a cluster if their values either correspond (Q1 ⇔qs Q2) or inversely correspond (Q1 ⇔qs−1 Q2) to each other. Additionally, the corresponding derivatives should be equal (Dv(Q1) = Dv(Q2)), while inversely corresponding derivatives should be each other's inverse (Dv(Q1) = −Dv(Q2)). A further constraint is that the corresponding quantities (not inverse) in a cluster must be completely equivalent. Therefore, Mv(Q1) = Mv(Q2) must always hold. If an inequality holds between two quantities, they are considered not to belong to the same cluster. During implementation it became obvious that clusters are not meaningful when quantities within a cluster belong to different entities. The reason for this originates from the idea of no function in structure [5]. Clusters involving multiple entities would integrate causality across individual structural units, which is undesired. Therefore, clusters can only contain quantities that belong to the same entity. Quantities cannot be a member of more than one cluster. If Q1 and Q2 are in a cluster, and Q1 and Q3 are in a cluster, then Q1, Q2 and Q3 must be in the same cluster. After all, if Q1 and Q2 have equivalent behaviour, and Q1 and Q3 have equivalent behaviour, then by transitivity Q2 and Q3 also have to exhibit equivalent behaviour.

4.5 Minimal Covering

The key requirement of the model building algorithm is that it explains the input behaviour. A second requirement is that the generated model does not contain redundant dependencies. That is, the algorithm should return the minimal set of dependencies that explains the behaviour. Two dependencies are considered substitutionary if they have the same effect on the simulation result (i.e. removing one of them would have no effect, but removing both would). Complementary dependencies are responsible for different aspects of the behaviour, and both have to be present to explain the data.
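The transitivity constraint on clusters described in Section 4.4 maps naturally onto a union-find structure; a minimal sketch (the quantity names are hypothetical):

```python
# Sketch of transitive cluster merging: if Q1~Q2 and Q1~Q3 hold,
# all three quantities end up in one cluster.

def merge_clusters(equivalent_pairs):
    parent = {}

    def find(q):
        parent.setdefault(q, q)
        while parent[q] != q:
            parent[q] = parent[parent[q]]   # path compression
            q = parent[q]
        return q

    for a, b in equivalent_pairs:           # union step
        parent[find(a)] = find(b)

    clusters = {}
    for q in parent:                        # group by representative
        clusters.setdefault(find(q), set()).add(q)
    return list(clusters.values())

clusters = merge_clusters([('Q1', 'Q2'), ('Q1', 'Q3')])
# a single cluster containing Q1, Q2 and Q3
```

In a fuller implementation the pairs would only be generated for quantities of the same entity, reflecting the no-function-in-structure constraint above.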
The aim is to create an algorithm that is minimally covering, i.e. the generated model should only contain complementary dependencies.

5 Algorithm

5.1 Finding Naive Dependencies

The goal of this step is to find (non-interacting) dependencies that are valid throughout the entire model (i.e. are not conditional). These causal relations are called naive dependencies, and provide the basis for the rest of the algorithm.

Consistency Rules

Naive dependencies are identified using consistency rules. Each pair of quantities is checked using these rules to determine which of them potentially holds throughout the state graph. These rules make use of Mv(Qx), Ms(Qx), Dv(Qx), Ds(Qx) of each quantity in a pair, and the inequalities that hold between them. These statements are referred to as the state information of a quantity. The consistency rules are derived from the semantics of the causal dependencies (see Section 3 on Garp3). Examples of rules (that should hold throughout the state graph) are:

Q1 →I+ Q2 if Ms(Q1) = Ds(Q2) (1)
Q1 →I− Q2 if Ms(Q1) = −Ds(Q2) (2)
Q1 →P+ Q2 if Ds(Q1) = Ds(Q2) (3)
Q1 →P− Q2 if Ds(Q1) = −Ds(Q2) (4)
Q1(Vx) ⇔v Q2(Vy) if Mv(Q1) = Q1(Vx) ⇒ Mv(Q2) = Q2(Vy) (5)
Q1 ⇔qs Q2 if ∀n: Q1(Vn) ⇔v Q2(Vn) (6)

Redundancy

The set of dependencies that is found contains a lot of redundancy, i.e. many dependencies are substitutionary. For example, in the communicating vessels model height →P+ pressure can be substituted by pressure →P+ height. The remainder of the algorithm selects the correct substitutionary groups, and uses the selected naive dependencies to derive more complex dependencies.

5.2 Determining Clusters

This step tries to determine clusters within the set of naive dependencies. The algorithm searches for quantities belonging to the same entity that exhibit equivalent behaviour, and tries to expand these candidate clusters by adding other quantities. Quantities are only added if

they exhibit behaviour equivalent to the quantities already contained in the candidate cluster. If no more quantities can be added to a candidate cluster, the algorithm searches for other candidate clusters. By only considering models composed of clusters, the space of possible models is significantly reduced. The validity of the candidate clusters is checked by determining whether there is overlap between the clusters. All clusters that overlap are removed. An alternative would be to only remove clusters until no more overlap is present. However, in practice no situations were encountered where this was desirable. An example of a found cluster is volume, height and pressure in the communicating vessels model. Note that these clusters are still missing influences (their actuators); these are determined later in the algorithm.

5.3 Generating Causal Paths

This step returns the possible causal orderings within clusters based on the cluster and naive dependency sets. For each cluster a valid causal ordering is returned. Through backtracking other possible orderings are generated. The quantities in a cluster can be either connected in a linear fashion (Q1 →P+ Q2 →P+ Q3) or using branching (Q1 →P+ Q2 and Q1 →P+ Q3). The algorithm prefers linear orderings, as branching does not often occur in practice. Additionally, the reduction of possible models is a significant advantage. Another constraint that reduces the number of possible models is requiring clusters that belong to entities of the same type to have the same causal ordering. For example, if for one container Volume →P+ Height →P+ Pressure, then for other containers the same causal ordering must hold.

5.4 Actuating Clusters

The goal of the actuating clusters step is to connect clusters by identifying cluster actuations. This step takes the set of clusters with established causal orderings and the naive dependencies as input. Clusters can either be actuated by another cluster, or act as an actuator themselves.
Furthermore, clusters can be connected by propagating an actuation. In a model, each cluster should take part in at least one of these kinds of relations, such that all clusters are related in some way. Otherwise, the model would include two separate non-interacting subsystems. When one cluster actuates another, there is an influence relation between the two. Actuations are the most important form of connecting clusters, since these connections are the cause of change in the system. They are also the easiest to detect, due to the specific way influences manifest themselves in the state information. For this reason, actuations by influences are identified first. Two types of actuations through influences are distinguished: (1) equilibrium seeking mechanisms (ESM) and (2) external actuators.

Equilibrium Seeking Mechanisms

ESMs are better known as flows, and are common in qualitative models. Flows cause two unequal quantities to equalize. The flow in the communicating vessels model has a non-zero value when the pressures in the two containers are unequal. The flow changes the volumes of the containers, and thus causes the pressures to equalize. An ESM holds under the following two conditions: (1) Q1 = Q2 − Q3, where Q1 ∈ C1, Q2 ∈ C2, Q3 ∈ C3, with the C's being clusters, and (2) Q4 →I− Q5 and Q4 →I+ Q6, where Q4 ∈ C1, Q5 ∈ C2, Q6 ∈ C3. Note that in many cases Q1 = Q4, such as in the communicating vessels model.

Finding Calculus Relations

The algorithm reduces the search space of finding ESMs using four constraints. Firstly, all quantities involved in the operator should be in different clusters (C1, C2 and C3 are unequal). Secondly, the set of naive dependencies should contain at least one influence from Q1 (to serve as an actuation). Thirdly, both Q2 and Q3 should be at the end of the causal paths within their cluster, as in most cases this is the most meaningful interpretation.
Finally, Q2 and Q3 are required to be of the same type, as only things of the same type can be subtracted.

External Actuators

External actuators are causes of change located more at the edges of the system compared to ESMs. To identify external actuators, the algorithm considers the influences in the naive dependencies that are not part of an ESM. Again, the minimal covering principle is applied to keep the number of dependencies to a minimum. As a result a cluster will never have more than one incoming actuation. An actuation from C1 to C2 is only considered if the set of naive dependencies contains influences between each possible pair of quantities, such that ∀Qx ∈ C1, Qy ∈ C2: Qx →I+ Qy. This removes the influences in the set of naive dependencies that are consistent with the behaviour only by chance. Alternative actuations are returned through backtracking. In the future, actuations may be chosen based on the structure of the system, as causal relations are more likely to occur parallel to structurally related entities.

Feedback

A common pattern in qualitative models is feedback, which is a proportionality originating from the end of a causal path back to the quantity actuating the causal path. Feedbacks are simply added if the naive dependencies contain one. The algorithm always adds feedback at the end of causal paths, since this is what happens in the investigated models. However, it could be the case that feedbacks from halfway along a causal chain are also possible.

5.5 Linking Clusters by Propagation

This step connects the clusters that have not yet been connected, through proportionalities, based on the naive dependencies. As with quantities within clusters, the causal ordering of the clusters cannot always be distinguished. Therefore all possibilities are generated. Furthermore, the same design choices as with finding causal paths within clusters have been made: only linear orderings of clusters are allowed (i.e. no branching).
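Stepping back over Sections 5.1 to 5.5, the consistency-rule check that produces the naive dependencies (rules 1 to 4 in Section 5.1) can be sketched as follows. The state representation — a (magnitude sign, derivative sign) pair per quantity — is an assumption made for illustration:

```python
# Sketch of the consistency-rule check: a dependency is only kept as a
# naive dependency if its rule holds in every state of the state graph.

def holds_everywhere(states, rule):
    return all(rule(state) for state in states)

def propose_dependencies(states, q1, q2):
    """Return causal dependencies between q1 and q2 consistent with all states."""
    inverse = {'+': '-', '0': '0', '-': '+'}
    candidates = {
        f'{q1} I+ {q2}': lambda s: s[q1][0] == s[q2][1],           # Ms(Q1) =  Ds(Q2)
        f'{q1} I- {q2}': lambda s: s[q1][0] == inverse[s[q2][1]],  # Ms(Q1) = -Ds(Q2)
        f'{q1} P+ {q2}': lambda s: s[q1][1] == s[q2][1],           # Ds(Q1) =  Ds(Q2)
        f'{q1} P- {q2}': lambda s: s[q1][1] == inverse[s[q2][1]],  # Ds(Q1) = -Ds(Q2)
    }
    return [name for name, rule in candidates.items()
            if holds_everywhere(states, rule)]

# Two toy states: each quantity maps to (magnitude sign, derivative sign)
states = [{'flow': ('+', '-'), 'volume': ('+', '+')},
          {'flow': ('0', '0'), 'volume': ('+', '0')}]
print(propose_dependencies(states, 'flow', 'volume'))
```

On this toy data both flow →I+ volume and flow →P− volume survive; such substitutionary alternatives are exactly the redundancy that the cluster and actuation steps subsequently prune.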
5.6 Setting Initial Magnitudes

An influence has no effect if the magnitude of the quantity from which it originates is unknown. Therefore this step assigns initial values to quantities. Note that this step first generates a set of candidate assignments. When a value can be derived in another way than through assignment, it is removed from the set of value assignment candidates.

There are six ways to assign initial magnitudes. Firstly, if a value assignment for the quantity is present in the scenario, it requires no initialisation. Secondly, if the magnitude can be derived through a correspondence, the value is known. Thirdly, the result of a minus operator can be derived if an inequality between its arguments is known. Based on the possible magnitudes of the result this inequality can be derived. Either this inequality is present in the scenario, or multiple inequalities should be made assumable by adding them as conditions in multiple model fragments. Garp3 automatically assumes unprovable values and inequalities if they are conditions in model fragments. Note that generating the conditional inequalities is currently beyond the scope of the algorithm, as it involves adding model ingredients to multiple model fragments. Fourthly, it is possible that a certain magnitude holds everywhere throughout the state graph. In this case, a value assignment is added as a (conditionless) consequence. Fifthly, a value could hold under certain conditions. However, this would require a value assignment with conditional inequalities in separate model fragments. Therefore, it is currently beyond the scope of the algorithm. Finally, multiple model fragments could be created in which the magnitudes are present as conditions. Garp3 will generate the different states that would result from assuming each of the values. As with the conditional value assignments, having value assignments as conditions in multiple model fragments is currently beyond the scope of the algorithm.

5.7 Dependency Interactions

This step identifies dependency interactions (of influences or proportionalities) based on the input behaviour. Dependency interactions are detected in the same way as naive dependencies, i.e. using a set of consistency rules.
Interactions are not found as naive dependencies, because the individual dependencies are not consistent with the entire state graph (an interaction results in more behaviour than a single dependency). The algorithm assumes that an interaction consists of opposing dependencies, such as birth vs. death and immigration vs. emigration.

6 Results

The Tree and Shade model [3] is successfully modelled by the algorithm. It returns two models, representing both possible directions of causality between Size and Shade. The initial magnitude assignment correctly finds the conditionless value assignment on Growth rate. The models' simulation results are equivalent to those of the original model.

The dependencies of the Communicating Vessels model are correctly found. The algorithm returns six models, one for each possible causal ordering of amount, height and pressure. The algorithm also correctly identifies the ESM-based actuations of the clusters, by properly finding the minus operator. Furthermore, all necessary causal dependencies and correspondences are identified. Model fragments that allow the assumption of initial values are missing (due to the fact that the algorithm generates a single model fragment). Adding an inequality between the pressures of the containers in the scenario allows the model to simulate without problems.

The Deforestation model (containing the entities Woodcutters, Vegetation, Water, Land and Humans) is successfully modelled, including setting initial magnitudes using conditions. The simulation is equivalent to that of the original model. The causal ordering does differ, as it does not capture the branching of the causal paths in the original model. The resulting model, however, is not considered wrong by experts, and is arguably better than the original. Over 2000 models are returned when generating all possible results, due to the many possible causal orderings.
The Population Dynamics model [3] generates the correct models for the open and closed population scenarios. However, the initial values are not set. The algorithm does not yet give correct results for the heating/boiling, R-Star [6] and Ants' Garden [8] models. For the heating model this is due to inequalities that hold under specific conditions, which are not taken care of by the algorithm. R-Star and Ants' Garden are large models that resulted from specific research projects. As such, these models are an order of magnitude more complex than the other models. It is therefore not surprising that the algorithm in its current form cannot cope with them.

7 Conclusions & Future Work

This paper presents preliminary work towards an algorithm that automatically determines a Garp3 qualitative model, using an enumeration of all possible system behaviour as input. The algorithm uses consistency rules to determine the causal dependencies that hold within the system. Using the concept of clusters, the search space is significantly reduced. Accurate results are generated for a set of well-established models. The results suggest that it is possible to derive causal explanations from the behaviour of a system, and that model building support through an automatic model building algorithm is viable.

There are several algorithm improvements planned. The first improvement is to have a generalised representation for the ambiguity within and between clusters, that is, a single representation for the complete model space. For simulation purposes an arbitrary instantiation can be chosen, as each one has an equivalent result. Secondly, the algorithm has to be improved to be able to create multiple model fragments in order to deal with conditional model ingredients. Thirdly, means have to be developed to compare generated state graphs with the desired state graph.

ACKNOWLEDGEMENTS

We would like to thank the referees for their insightful comments.
REFERENCES

[1] Ivan Bratko and Dorian Šuc, Learning qualitative models, AI Magazine, 24(4), (2004).
[2] B. Bredeweg, A. Bouwer, J. Jellema, D. Bertels, F. Linnebank, and J. Liem, Garp3 - a new workbench for qualitative reasoning and modelling, in 20th International Workshop on Qualitative Reasoning (QR-06), eds., C. Bailey-Kellogg and B. Kuipers, Hanover, New Hampshire, USA, (July 2006).
[3] Bert Bredeweg and Paulo Salles, Mediating conceptual knowledge using qualitative reasoning, in Handbook of Ecological Modelling and Informatics, WIT Press, (in press).
[4] W. Bridewell, P. Langley, L. Todorovski, and S. Džeroski, Inductive process modeling, Machine Learning, 71, 1-32, (2008).
[5] J. de Kleer and J. S. Brown, A qualitative physics based on confluences, Artificial Intelligence, 24(1-3), 7-83, (December 1984).
[6] T. Nuttle, B. Bredeweg, and P. Salles, R-Star - a qualitative model of plant growth based on exploitation of resources, in 19th International Workshop on Qualitative Reasoning (QR-05), eds., M. Hofbaur, B. Rinner, and F. Wotawa, Graz, Austria, (May 2005).
[7] Wei Pang and George M. Coghill, Advanced experiments for learning qualitative compartment models, in 21st International Workshop on Qualitative Reasoning (QR-07), ed., C. Price, (2007).
[8] P. Salles, B. Bredeweg, and N. Bensusan, The Ants' Garden: Qualitative models of complex interactions between populations, Ecological Modelling, 194(1-3), (2006).


Automated Learning of Communication Models for Robot Control Software

Alexander Kleiner¹ and Gerald Steinbauer² and Franz Wotawa²

Abstract. Control software of autonomous mobile robots comprises a number of software modules which show very rich behaviors and interact in a very complex manner. These facts, among others, have a strong influence on the robustness of robot control software in the field. In this paper we present an approach which is able to automatically derive a model of the structure and the behavior of the communication within a component-orientated control software. Such a model can be used for on-line model-based diagnosis in order to increase the robustness of the software by allowing the robot to autonomously cope with faults that occur during runtime. Because the model is learned from recorded data, and because the approach builds on the popular publisher-subscriber paradigm, it can be applied to a wide range of complex and even partially unknown systems.

1 Introduction

Control software of autonomous mobile robots comprises a number of software modules which show very rich behaviors and interact in a very complex manner. Because of this complexity and other reasons, like bad design and implementation, there is always the possibility that a fault occurs at runtime in the field. Such faults can have different characteristics, like crashes of modules, deadlocks or wrong data leading to a hazardous decision of the robot. This situation can occur even if the software is carefully designed, implemented and tested. In order to have truly autonomous robots operating for a long time without or with limited possibility for human intervention, e.g., planetary rovers exploring Mars, such robots have to have the capability to detect, localize and cope with such faults. In [8, 7] the authors presented a model-based diagnosis framework for control software for autonomous mobile robots.
The control software is based on the robot control framework Miro [10, 9] and has a client-server architecture where the software modules communicate by exchanging events. The idea is to use the different communication behaviors between the modules of the control software in order to monitor the status of the system and to detect and localize faults. The model comprises a graph specifying which modules communicate with each other. Moreover, the model has information about the type of a particular communication path, e.g., whether the communication occurs on a regular basis or sporadically. Finally, the model includes information about which inputs and outputs of the software modules have a functional relation, e.g., which output is triggered by which input. The model is specified by a set of logic clauses and uses a component-based modeling schema [1]. Please refer to [8, 7] for more details. The diagnosis process itself uses the well-known consistency-based diagnosis techniques of Reiter [5].

The models of the control software and the communication were created by hand, by analyzing the structure of the software and its communication behavior during runtime. Because of the complexity of such control software, or the possible lack of information about the system, it is not feasible to do this by hand for large or partially unknown systems. Therefore, it is desirable that such models can be created automatically, either from a formal specification of the system or from observation of the system. In this paper we present an approach which allows the automatic extraction of all necessary information from the recorded communication between the software modules.

¹ Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee, D Freiburg, Germany, kleiner@informatik.uni-freiburg.de
² Institute for Software Technology, Graz University of Technology, Inffeldgasse 16b/II, A-8010, Austria, {steinbauer,wotawa}@ist.tugraz.at
The algorithm provides all information needed for model-based diagnosis: a communication graph showing which modules communicate, the desired behavior of the particular communication paths, and the relations between the inputs and outputs of the software modules. This model learning approach was originally developed for and tested with the control software of the Lurker robots [2] used in the RoboCup rescue league. This control software uses the IPC communication framework [6], which is a very popular event-based communication library used by a number of robotic research labs worldwide. However, the algorithm can easily be adapted to other event-based communication frameworks, such as Miro. The next section describes in more detail how the model is extracted from the observed communication.

2 Model Learning

Control systems based on IPC use an event-based communication paradigm. Software modules which want to provide data publish an event containing the data. Other software modules which would like to use this data subscribe for the appropriate event and get automatically informed when such an event is available. A central software module of IPC is in charge of all aspects of this communication. Moreover, this software module is able to record all the communication details: the type of the event, the time the event was published or consumed, the content of the event, and the names of the publishing and the receiving module.
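For illustration, such recorded data might be processed as follows once collected; the record layout (sender, receiver, event name, timestamp) is an assumption of this sketch, and the construction itself is described in prose in Section 2.1 below:

```python
# Sketch: building the communication graph (M, C) from a recorded
# event log. The log format is assumed for illustration.

def build_communication_graph(log):
    modules, connections = set(), set()
    for sender, receiver, event, _timestamp in log:
        modules.add(sender)
        modules.add(receiver)
        connections.add((sender, receiver, event))  # edge labelled by event
    return modules, connections

def incoming(connections, module):   # in(m): edges pointing to a node
    return {c for c in connections if c[1] == module}

def outgoing(connections, module):   # out(m): edges pointing from a node
    return {c for c in connections if c[0] == module}

log = [('Odometry', 'SelfLoc', 'odom', 0.0),
       ('Vision', 'Tracker', 'objects', 0.1),
       ('Odometry', 'SelfLoc', 'odom', 0.2)]
M, C = build_communication_graph(log)
print(len(M), len(C))   # 4 2
```

Repeated events add no duplicate edges, so the sets correspond to the nodes M and labelled edges C of the communication graph defined later.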

The collected data is the basis for our model learning algorithm. Figure 1 depicts such collected data for a small example control software comprising only 5 modules with a simple communication structure. This example will be used in the following description of the model learning algorithm. The control software comprises two data paths. One is the path for the self-localization of the robot; the two modules in this path, Odometry and SelfLoc, provide data on a regular basis. The other is the path for object tracking. The module Vision provides new data on a regular basis. The module Tracker provides data only if new data is available from the module Vision. The figure shows when the different events were published. Based on this recorded communication we extract the communication model step by step.

2.1 The communication graph

As a first step the algorithm extracts a communication graph from the data. The nodes of the graph are the different software modules. The edges represent the different events which are exchanged between the modules. Each event is represented by at least one edge. If the same event is received by multiple modules, there is an edge from the publishing module to every receiving module. Figure 2 depicts the communication graph for the above example. This graph shows the communication structure of the control software. Moreover, it shows the relation of inputs and outputs of the different software modules, because each node knows its connections. Such a communication graph is not only useful for diagnosis purposes; it is also able to expressively visualize the relation of modules from a larger or partially unknown control software. Formally the communication graph can be defined as follows:

Definition 1 (CG) A communication graph (CG) is a directed graph with the set of nodes M and the set of labeled edges C where:

M is a set of software modules sending or receiving at least one event.
C is a set of connections between modules; the direction of an edge points from the sending to the receiving module, and the edge is labeled with the name of the related event.

Please note that the communication graph may contain cycles. Usually such cycles emerge from acknowledgement mechanisms between two modules. The algorithm for the creation of the communication graph is straightforward. It starts with an empty set of nodes M and an empty set of edges C and iterates through all recorded communication events. If either the sender or the receiver is not yet in the set of nodes, it is added to the set. If there is no edge with the proper label pointing from the sending to the receiving node, a new edge with the appropriate label is added between the two modules. Moreover, we define two functions: in : M → 2^C, which returns the edges pointing to a node, and out : M → 2^C, which returns the edges pointing from a node.

2.2 The communication behavior

In a next step the behavior or type of each event connection is determined. For this determination we use the information of the node the event connection originates from, the recorded information of the event related to the connection, and all events related to the sending node. We distinguish the following types: triggered event connection (1), periodic event connection (2), bursted event connection (3) and random event connection (4). In order to describe the behavior of a connection formally, we define a set of connection types CT = {periodic, triggered, bursted, random} and a function ctype : C → CT which returns the type of a particular connection c ∈ C. The type of an event connection is determined by tests such as measurements of the mean and the standard deviation of the time between the occurrence of the events on the connection, and comparison or correlation of the occurrences of two events.
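To illustrate the kind of statistical test meant here, the following sketch classifies a connection as periodic when the inter-event times cluster tightly around one value, and as triggered when most output events closely follow some input event. The thresholds and helper names are our own assumptions, not taken from the paper:

```python
import statistics

def classify_periodic(timestamps, rel_tol=0.2):
    """Return the estimated frequency (Hz) if the inter-event times
    cluster around one value, else None."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return None
    mean, stdev = statistics.mean(gaps), statistics.stdev(gaps)
    return 1.0 / mean if stdev < rel_tol * mean else None

def is_triggered(out_times, in_times, window=0.05, threshold=0.9):
    """True if the fraction of output events that occur shortly after
    some input event exceeds the threshold."""
    hits = sum(any(0 <= t_out - t_in <= window for t_in in in_times)
               for t_out in out_times)
    return hits / len(out_times) >= threshold

rate = classify_periodic([0.0, 0.1, 0.2, 0.31, 0.4])  # ~10 Hz, periodic
jittery = classify_periodic([0.0, 0.1, 0.5, 0.55, 2.0])  # None, not periodic
```

A real implementation would build the full discrete distribution of time differences rather than relying on mean and standard deviation alone, but the decision criterion is the same: one dominant inter-event time indicates a periodic connection.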
The criteria used to assign an event connection to one of the four categories are summarized below:

triggered A triggered event only occurs if its publishing module recently received a trigger event. In order to determine whether an event connection is a triggered event connection, the events on a connection c ∈ out(m) are correlated with the events on the set of input connections of the software module, I = in(m). If the number of events on connection c which are correlated with an event on a particular connection t ∈ in(m) exceeds a certain threshold, connection t is named a trigger of connection c. The correlation test looks for the occurrence of the trigger event prior to the observed event. Note that each trigger event can trigger only one event. If connection c is correlated with at least one connection t ∈ in(m), connection c is categorized as a triggered connection. Usually, such connections are found in modules that perform calculations only if new data are available.

periodic On a periodic event connection the same event regularly occurs with a fixed frequency. From the time stamps of the occurrences of all events we calculate a discrete distribution of the time difference between two successive events. If there is high evidence in the distribution for one particular time difference, the connection is periodic with a period equal to the estimated time difference. For a purely periodic event connection one gets a distribution close to a Dirac impulse. Usually, such connections are found with modules providing data at a fixed frame rate, such as a module sending data from a video camera.

bursted A bursted event is similar to the periodic event, but its regular occurrence can be switched on and off for periods of time. An event connection is classified as bursted if there exist time periods where the criteria of the periodic event connection hold.
Usually, such connections are found with modules which perform specific measurements only if the central controller explicitly enables them, e.g., a complete 3D laser scan.

random For random event connections none of the above categories match, and therefore no useful information about the behavior of that connection can be derived. Usually, such

connections are found in modules which provide data only if some specific circumstance occurs in the system or its environment.

Figure 1. Recorded communication of the example robot control software. The peaks indicate the occurrence of the particular event.

Figure 2. Communication graph learned from the recorded data of the example control software.

In the case of the above example, the algorithm correctly classified the event connections odometry, objects and pose as periodic, and the connection velocities as triggered with the trigger objects.

2.3 The observers

In order to monitor the actual behavior of the control software, the algorithm instantiates an observer for each event connection. The type of the observer is determined by the type of the connection and its parameters, estimated by the methods described before. An observer raises an alarm if there is a significant discrepancy between the currently observed behavior of an event connection and the behavior learned beforehand during normal operation. The observer provides as an observation the atom ok(l) if the behavior is within the tolerance, and the atom ¬ok(l) otherwise, where l is the label of the corresponding edge in the communication graph. The observation OBS of the complete control software is the union of all individual observations:

OBS = ⋃_{i=1}^{n} O_i

where n is the number of observers. The following observers are used:

triggered This observer raises an alarm if within a certain timeout after the occurrence of a trigger event no corresponding event occurs, or if the trigger event is missing prior to the occurrence of the corresponding event. In order to be robust against noise, the observer uses a majority vote over a number of succeeding events, e.g., 3 votes.
periodic This observer raises an alarm if there is a significant change in the frequency of the events on the observed connection. The observer checks whether the frequency of successive events varies too much from the specified frequency. For this purpose, the observer estimates the frequency of the events within a sliding time window.

bursted This observer is similar to the observer above. It differs in the fact that this observer starts the frequency check only if events occur, and does not raise an alarm if no events occur.

random This is a dummy observer which always provides the observation ok(l). This observer is implemented for completeness.

2.4 The system description

The communication graph together with the types of the connections is a sufficient specification of the communication behavior of the robot control software. This specification can be used to derive a system description for the diagnosis process. It is a description of the desired or nominal behavior of the system. In order to be usable in the diagnosis process, the system description is automatically written down as a set of logical clauses. This set can easily be derived from the communication graph and the behavior of the connections. The algorithm to derive the system description starts with an empty set SD. For every event connection, clauses are added to the system description in two steps. In the first step, a clause for forward reasoning is added. The clause specifies that if a module works correctly and all related inputs behave as expected, its output behaves as expected. Depending on the type of the connection, we add the following clause to the SD.

If connection c of module m is triggered, we add the clause

¬AB(m) ∧ ⋀_{t ∈ trigger(c)} ok(t) → ok(c)

and the clause

¬AB(m) → ok(c)

otherwise. ¬AB(m) means that the module m is not abnormal, i.e., the module works as expected. The atom ok(c) specifies that the connection c behaves as expected. In a second step, a clause for backward reasoning is added. This clause specifies that if all output connections c′ of module m behave as expected, the module itself behaves as expected. We add the clause

⋀_{c′ ∈ out(m)} ok(c′) → ¬AB(m)

Figure 3 depicts the system description obtained for the above example control software.

3 Model-based diagnosis

For the detection and localization of faults we use the consistency-based diagnosis technique of [5]. A fault detectable by the derived model causes a change in the behavior of the system. If an inconsistency between the modeled and observed behavior emerges, a failure has been detected. Formally, we define this by:

SD ∪ OBS ∪ {¬AB(m) | m ∈ M} ⊢ ⊥

where the latter set says that we assume that all modules work as expected. In order to localize the module responsible for the detected fault, we have to calculate a diagnosis Δ, where Δ is a set of modules m ∈ M we have to declare as faulty (change ¬AB(m) to AB(m)) in order to resolve the above contradiction. We use our implementation of this diagnosis process (freely available for download) for the experimental evaluation of the models. Please refer to [8, 7] for the details of the diagnosis process.

4 Experimental Results

In order to show the potential of our model learning approach, it has been tested on three different types of robot control software. We evaluated whether the approach is able to derive an appropriate model reflecting all aspects of the behavior of the system. The derived model was evaluated by the system engineer who developed the system. Moreover, we injected artificial faults like module crashes into the system, and evaluated whether the faults can be detected and localized using the derived model.
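A minimal sketch of this two-step clause generation, writing ¬AB(m) as -AB(m) and representing clauses as plain strings, is given below; the representation and function names are chosen here for illustration only:

```python
def derive_sd(connections, ctype, trigger):
    """Derive forward and backward clauses for each module.

    connections: set of (sender, receiver, event) edges;
    ctype: event -> connection type; trigger: event -> trigger events.
    """
    sd = []
    out = {}
    for sender, _, event in connections:
        out.setdefault(sender, set()).add(event)
    for m, events in sorted(out.items()):
        for c in sorted(events):
            # forward clause: module ok (and triggers ok) implies output ok
            if ctype[c] == "triggered":
                pre = f"-AB({m})" + "".join(f" & ok({t})" for t in trigger[c])
            else:
                pre = f"-AB({m})"
            sd.append(f"{pre} -> ok({c})")
        # backward clause: all outputs ok implies module ok
        lhs = " & ".join(f"ok({c})" for c in sorted(events))
        sd.append(f"{lhs} -> -AB({m})")
    return sd

connections = {("Vision", "Tracker", "objects"),
               ("Tracker", "User", "velocities")}
ctype = {"objects": "periodic", "velocities": "triggered"}
trigger = {"velocities": ["objects"]}
sd = derive_sd(connections, ctype, trigger)
# yields four clauses, among them
# "-AB(Tracker) & ok(objects) -> ok(velocities)" and
# "ok(velocities) -> -AB(Tracker)"
```

On the two-connection fragment above this produces exactly the forward and backward clause pattern of clauses 1, 3, 5 and 7 in Figure 3.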
4.1 A small example control software

The example software from the introduction comprises five modules. The module Odometry provides odometry data on a regular basis. This data is consumed by the module SelfLoc, which does pose tracking by integrating the odometry data and continuously provides a pose estimate to a visualization module User. The module Vision provides position measurements of objects. The module Tracker uses these measurements to estimate the velocity of the objects. New velocity estimations are only generated if new data is available. The velocity estimates are also visualized by the GUI. Figure 1 shows the recorded communication of this example. Figure 2 depicts the communication graph extracted from the recorded data. It correctly represents the actual communication structure of the example, and shows the correct relation of event producers and event consumers. Moreover, the algorithm correctly identified the types of the event connections. This can be seen from the system description the algorithm has derived, which is depicted in Figure 3. It also instantiated the correct observers for the four event connections: a periodic event observer was instantiated for odometry, objects and pose, and a triggered event observer was instantiated for velocities.

1. ¬AB(Vision) → ok(objects)
2. ¬AB(Odometry) → ok(odometry)
3. ¬AB(Tracker) ∧ ok(objects) → ok(velocities)
4. ¬AB(Selfloc) → ok(pose)
5. ok(objects) → ¬AB(Vision)
6. ok(odometry) → ¬AB(Odometry)
7. ok(velocities) → ¬AB(Tracker)
8. ok(pose) → ¬AB(Selfloc)

Figure 3. The system description automatically derived for the example control software.

Figure 3 depicts the extracted system description. Clauses 1 to 4 describe the forward reasoning; clauses 5 to 8 describe the backward reasoning. Clause 3 states that if the module Tracker works correctly and its trigger connection objects behaves as expected, the velocities connection behaves as expected.
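Under the simplifying assumption that every connection is observed (each observer reports ok or ¬ok), consistency checking and minimal-cardinality diagnosis over the Figure 3 model can be sketched as follows. This is our illustrative reconstruction, not the authors' implementation:

```python
from itertools import combinations

MODULE_OF = {"objects": "Vision", "odometry": "Odometry",
             "velocities": "Tracker", "pose": "Selfloc"}
TRIGGERS = {"velocities": ["objects"]}  # the other connections are periodic

def consistent(delta, obs):
    """Check SD + OBS + {AB(m) iff m in delta} against Figure 3's clauses."""
    for conn, ok in obs.items():
        m = MODULE_OF[conn]
        trig_ok = all(obs[t] for t in TRIGGERS.get(conn, []))
        if m not in delta and trig_ok and not ok:
            return False          # a forward clause (1-4) is violated
    for m in delta:
        outs = [c for c, mod in MODULE_OF.items() if mod == m]
        if all(obs[c] for c in outs):
            return False          # a backward clause (5-8) is violated
    return True

def diagnoses(obs):
    """Minimal-cardinality diagnoses: smallest AB-sets restoring consistency."""
    mods = sorted(set(MODULE_OF.values()))
    for k in range(len(mods) + 1):
        found = [set(d) for d in combinations(mods, k) if consistent(d, obs)]
        if found:
            return found
    return []

# Odometry crashed: its events stop while everything else looks normal.
obs = {"objects": True, "odometry": False, "velocities": True, "pose": True}
print(diagnoses(obs))  # [{'Odometry'}]
```

With all-ok observations the empty set is consistent (no fault detected); once ¬ok(odometry) is observed, the empty assignment contradicts clause 2, and {Odometry} is the single minimal diagnosis.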
For instance, clause 6 states that if all output connections of module Odometry work as expected, the module itself works correctly. This automatically generated system description was used in some diagnosis tests. We randomly shut down modules and evaluated whether the fault was correctly detected and localized. For this simple example the faults were always properly identified.

4.2 Autonomous exploration robot Lurker

In a second experiment we recorded the communication of the control software of the rescue robot Lurker [2] while the robot was autonomously exploring an unknown area. The robot is shown in Figure 4. The control software of this robot is far more complex than in the example above, since it comprises all software modules enabling a rescue robot to autonomously explore an area after a disaster. Figure 5 shows the communication graph derived from the recorded data, clearly showing the complex structure of the control software. From the communication graph and the categorized event connections, a system description comprising 70 clauses over 51 atoms, and 35 observers, was derived. After a double check with the system engineer of the control software, it was confirmed that the automatically derived model reflects the behavior of the system.

Figure 4. The autonomous rescue robot Lurker of the University of Freiburg.

Figure 5. Communication graph of the Lurker robot.

4.3 Teleoperated Telemax robot

In a final experiment we recorded data during a teleoperated run with the bomb-disposal robot Telemax. The robot Telemax is shown in Figure 6. Figure 7 depicts the communication graph derived from the recorded data. It clearly shows that the control software for teleoperation has a far less complex communication structure than that of the autonomous system. From the communication graph and the categorized event connections, a system description comprising 44 clauses over 31 atoms, and 22 observers, was derived.

Figure 6. The teleoperated robot Telemax.

5 Related Research

There are many proposed and implemented systems for fault detection and repair in autonomous systems. Due to lack of space we refer to only a few. The Livingstone architecture by Williams and colleagues [4] was used on the space probe Deep Space One to detect failures in the probe's hardware and to recover from them. Model-based diagnosis has also been successfully applied for fault detection and localization in digital circuits and car electronics, and for software debugging of VHDL programs [1]. In [3] the authors show how model-based reasoning can be used for diagnosis of a group of robots in the health care domain; the system model comprises interconnected finite state automata. All these methods have in common that the used models of the system behavior are generated by hand.

6 Conclusion and Future Work

In this paper we presented an approach which allows the automated learning of communication models for robot control software.
The approach uses recorded event communication and is able to automatically extract a model of the behavior of the communication within a component-oriented control software. Moreover, the approach is able to derive a system description which can be used for model-based diagnosis. The approach was successfully tested on IPC-based robot control software.


More information

Reasoning with Inconsistent and Uncertain Ontologies

Reasoning with Inconsistent and Uncertain Ontologies Reasoning with Inconsistent and Uncertain Ontologies Guilin Qi Southeast University China gqi@seu.edu.cn Reasoning Web 2012 September 05, 2012 Outline Probabilistic logic vs possibilistic logic Probabilistic

More information

Quantifying Uncertainty & Probabilistic Reasoning. Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari

Quantifying Uncertainty & Probabilistic Reasoning. Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari Quantifying Uncertainty & Probabilistic Reasoning Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari Outline Previous Implementations What is Uncertainty? Acting Under Uncertainty Rational Decisions Basic

More information

UNCERTAINTY. In which we see what an agent should do when not all is crystal-clear.

UNCERTAINTY. In which we see what an agent should do when not all is crystal-clear. UNCERTAINTY In which we see what an agent should do when not all is crystal-clear. Outline Uncertainty Probabilistic Theory Axioms of Probability Probabilistic Reasoning Independency Bayes Rule Summary

More information

Part I Qualitative Probabilistic Networks

Part I Qualitative Probabilistic Networks Part I Qualitative Probabilistic Networks In which we study enhancements of the framework of qualitative probabilistic networks. Qualitative probabilistic networks allow for studying the reasoning behaviour

More information

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. CS 188 Fall 2015 Introduction to Artificial Intelligence Final You have approximately 2 hours and 50 minutes. The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.

More information

LOGIC PROPOSITIONAL REASONING

LOGIC PROPOSITIONAL REASONING LOGIC PROPOSITIONAL REASONING WS 2017/2018 (342.208) Armin Biere Martina Seidl biere@jku.at martina.seidl@jku.at Institute for Formal Models and Verification Johannes Kepler Universität Linz Version 2018.1

More information

Boolean Algebra and Digital Logic

Boolean Algebra and Digital Logic All modern digital computers are dependent on circuits that implement Boolean functions. We shall discuss two classes of such circuits: Combinational and Sequential. The difference between the two types

More information

A brief introduction to Logic. (slides from

A brief introduction to Logic. (slides from A brief introduction to Logic (slides from http://www.decision-procedures.org/) 1 A Brief Introduction to Logic - Outline Propositional Logic :Syntax Propositional Logic :Semantics Satisfiability and validity

More information

3. Only sequences that were formed by using finitely many applications of rules 1 and 2, are propositional formulas.

3. Only sequences that were formed by using finitely many applications of rules 1 and 2, are propositional formulas. 1 Chapter 1 Propositional Logic Mathematical logic studies correct thinking, correct deductions of statements from other statements. Let us make it more precise. A fundamental property of a statement is

More information

Critical Reading of Optimization Methods for Logical Inference [1]

Critical Reading of Optimization Methods for Logical Inference [1] Critical Reading of Optimization Methods for Logical Inference [1] Undergraduate Research Internship Department of Management Sciences Fall 2007 Supervisor: Dr. Miguel Anjos UNIVERSITY OF WATERLOO Rajesh

More information

Propositional Logic: Evaluating the Formulas

Propositional Logic: Evaluating the Formulas Institute for Formal Models and Verification Johannes Kepler University Linz VL Logik (LVA-Nr. 342208) Winter Semester 2015/2016 Propositional Logic: Evaluating the Formulas Version 2015.2 Armin Biere

More information

A Goal-Oriented Algorithm for Unification in EL w.r.t. Cycle-Restricted TBoxes

A Goal-Oriented Algorithm for Unification in EL w.r.t. Cycle-Restricted TBoxes A Goal-Oriented Algorithm for Unification in EL w.r.t. Cycle-Restricted TBoxes Franz Baader, Stefan Borgwardt, and Barbara Morawska {baader,stefborg,morawska}@tcs.inf.tu-dresden.de Theoretical Computer

More information

Belief Update in CLG Bayesian Networks With Lazy Propagation

Belief Update in CLG Bayesian Networks With Lazy Propagation Belief Update in CLG Bayesian Networks With Lazy Propagation Anders L Madsen HUGIN Expert A/S Gasværksvej 5 9000 Aalborg, Denmark Anders.L.Madsen@hugin.com Abstract In recent years Bayesian networks (BNs)

More information

A Crisp Representation for Fuzzy SHOIN with Fuzzy Nominals and General Concept Inclusions

A Crisp Representation for Fuzzy SHOIN with Fuzzy Nominals and General Concept Inclusions A Crisp Representation for Fuzzy SHOIN with Fuzzy Nominals and General Concept Inclusions Fernando Bobillo Miguel Delgado Juan Gómez-Romero Department of Computer Science and Artificial Intelligence University

More information

Non-impeding Noisy-AND Tree Causal Models Over Multi-valued Variables

Non-impeding Noisy-AND Tree Causal Models Over Multi-valued Variables Non-impeding Noisy-AND Tree Causal Models Over Multi-valued Variables Yang Xiang School of Computer Science, University of Guelph, Canada Abstract To specify a Bayesian network (BN), a conditional probability

More information

Role-depth Bounded Least Common Subsumers by Completion for EL- and Prob-EL-TBoxes

Role-depth Bounded Least Common Subsumers by Completion for EL- and Prob-EL-TBoxes Role-depth Bounded Least Common Subsumers by Completion for EL- and Prob-EL-TBoxes Rafael Peñaloza and Anni-Yasmin Turhan TU Dresden, Institute for Theoretical Computer Science Abstract. The least common

More information

Stochastic Sampling and Search in Belief Updating Algorithms for Very Large Bayesian Networks

Stochastic Sampling and Search in Belief Updating Algorithms for Very Large Bayesian Networks In Working Notes of the AAAI Spring Symposium on Search Techniques for Problem Solving Under Uncertainty and Incomplete Information, pages 77-82, Stanford University, Stanford, California, March 22-24,

More information

Part I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

Part I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Part I C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Probabilistic Graphical Models Graphical representation of a probabilistic model Each variable corresponds to a

More information

Characterization of Semantics for Argument Systems

Characterization of Semantics for Argument Systems Characterization of Semantics for Argument Systems Philippe Besnard and Sylvie Doutre IRIT Université Paul Sabatier 118, route de Narbonne 31062 Toulouse Cedex 4 France besnard, doutre}@irit.fr Abstract

More information

A new Evaluation of Forward Checking and its Consequences on Efficiency of Tools for Decomposition of CSPs

A new Evaluation of Forward Checking and its Consequences on Efficiency of Tools for Decomposition of CSPs 2008 20th IEEE International Conference on Tools with Artificial Intelligence A new Evaluation of Forward Checking and its Consequences on Efficiency of Tools for Decomposition of CSPs Philippe Jégou Samba

More information

A Model-Based Framework for Stochastic Diagnosability

A Model-Based Framework for Stochastic Diagnosability A Model-Based Framework for Stochastic Diagnosability Gregory Provan Computer Science Department, University College Cork, Cork, Ireland email: g.provan@cs.ucc.ie Abstract We propose a model-based framework

More information

Bayesian belief networks

Bayesian belief networks CS 2001 Lecture 1 Bayesian belief networks Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square 4-8845 Milos research interests Artificial Intelligence Planning, reasoning and optimization in the presence

More information

Advanced Probabilistic Modeling in R Day 1

Advanced Probabilistic Modeling in R Day 1 Advanced Probabilistic Modeling in R Day 1 Roger Levy University of California, San Diego July 20, 2015 1/24 Today s content Quick review of probability: axioms, joint & conditional probabilities, Bayes

More information

Approximate Model-Based Diagnosis Using Greedy Stochastic Search

Approximate Model-Based Diagnosis Using Greedy Stochastic Search Journal of Artificial Intelligence Research 38 (21) 371-413 Submitted 2/1; published 7/1 Approximate Model-Based Diagnosis Using Greedy Stochastic Search Alexander Feldman Delft University of Technology

More information

Revising General Knowledge Bases in Description Logics

Revising General Knowledge Bases in Description Logics Revising General Knowledge Bases in Description Logics Zhe Wang and Kewen Wang and Rodney Topor Griffith University, Australia Abstract Revising knowledge bases (KBs) in description logics (DLs) in a syntax-independent

More information

CS:4420 Artificial Intelligence

CS:4420 Artificial Intelligence CS:4420 Artificial Intelligence Spring 2018 Propositional Logic Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart

More information

Introduction to Metalogic

Introduction to Metalogic Philosophy 135 Spring 2008 Tony Martin Introduction to Metalogic 1 The semantics of sentential logic. The language L of sentential logic. Symbols of L: Remarks: (i) sentence letters p 0, p 1, p 2,... (ii)

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability

More information

On Conditional Independence in Evidence Theory

On Conditional Independence in Evidence Theory 6th International Symposium on Imprecise Probability: Theories and Applications, Durham, United Kingdom, 2009 On Conditional Independence in Evidence Theory Jiřina Vejnarová Institute of Information Theory

More information

Methods for the specification and verification of business processes MPB (6 cfu, 295AA)

Methods for the specification and verification of business processes MPB (6 cfu, 295AA) Methods for the specification and verification of business processes MPB (6 cfu, 295AA) Roberto Bruni http://www.di.unipi.it/~bruni 17 - Diagnosis for WF nets 1 Object We study suitable diagnosis techniques

More information

Proving Completeness for Nested Sequent Calculi 1

Proving Completeness for Nested Sequent Calculi 1 Proving Completeness for Nested Sequent Calculi 1 Melvin Fitting abstract. Proving the completeness of classical propositional logic by using maximal consistent sets is perhaps the most common method there

More information

2 SUMMARISING APPROXIMATE ENTAILMENT In this section we will summarise the work in (Schaerf and Cadoli 995), which denes the approximate entailment re

2 SUMMARISING APPROXIMATE ENTAILMENT In this section we will summarise the work in (Schaerf and Cadoli 995), which denes the approximate entailment re Computing approximate diagnoses by using approximate entailment Annette ten Teije SWI University of Amsterdam The Netherlands annette@swi.psy.uva.nl Abstract The most widely accepted models of diagnostic

More information

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/

More information

Revision of DL-Lite Knowledge Bases

Revision of DL-Lite Knowledge Bases Revision of DL-Lite Knowledge Bases Zhe Wang, Kewen Wang, and Rodney Topor Griffith University, Australia Abstract. We address the revision problem for knowledge bases (KBs) in Description Logics (DLs).

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

NP Completeness and Approximation Algorithms

NP Completeness and Approximation Algorithms Chapter 10 NP Completeness and Approximation Algorithms Let C() be a class of problems defined by some property. We are interested in characterizing the hardest problems in the class, so that if we can

More information

Modeling and reasoning with uncertainty

Modeling and reasoning with uncertainty CS 2710 Foundations of AI Lecture 18 Modeling and reasoning with uncertainty Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square KB systems. Medical example. We want to build a KB system for the diagnosis

More information

Completing Description Logic Knowledge Bases using Formal Concept Analysis

Completing Description Logic Knowledge Bases using Formal Concept Analysis Completing Description Logic Knowledge Bases using Formal Concept Analysis Franz Baader 1, Bernhard Ganter 1, Ulrike Sattler 2 and Barış Sertkaya 1 1 TU Dresden, Germany 2 The University of Manchester,

More information

Fuzzy Propositional Logic for the Knowledge Representation

Fuzzy Propositional Logic for the Knowledge Representation Fuzzy Propositional Logic for the Knowledge Representation Alexander Savinov Institute of Mathematics Academy of Sciences Academiei 5 277028 Kishinev Moldova (CIS) Phone: (373+2) 73-81-30 EMAIL: 23LSII@MATH.MOLDOVA.SU

More information

Deliberative Agents Knowledge Representation I. Deliberative Agents

Deliberative Agents Knowledge Representation I. Deliberative Agents Deliberative Agents Knowledge Representation I Vasant Honavar Bioinformatics and Computational Biology Program Center for Computational Intelligence, Learning, & Discovery honavar@cs.iastate.edu www.cs.iastate.edu/~honavar/

More information

Structural Observability. Application to decompose a System with Possible Conflicts.

Structural Observability. Application to decompose a System with Possible Conflicts. Structural Observability. Application to decompose a System with Possible Conflicts. Noemi Moya, Gautam Biswas 2, Carlos J. Alonso-Gonzalez, and Xenofon Koutsoukos 2 Department of Computer Science, University

More information

Lecture 2: Symbolic Model Checking With SAT

Lecture 2: Symbolic Model Checking With SAT Lecture 2: Symbolic Model Checking With SAT Edmund M. Clarke, Jr. School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 (Joint work over several years with: A. Biere, A. Cimatti, Y.

More information

Introduction to Artificial Intelligence Propositional Logic & SAT Solving. UIUC CS 440 / ECE 448 Professor: Eyal Amir Spring Semester 2010

Introduction to Artificial Intelligence Propositional Logic & SAT Solving. UIUC CS 440 / ECE 448 Professor: Eyal Amir Spring Semester 2010 Introduction to Artificial Intelligence Propositional Logic & SAT Solving UIUC CS 440 / ECE 448 Professor: Eyal Amir Spring Semester 2010 Today Representation in Propositional Logic Semantics & Deduction

More information

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag Decision Trees Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Supervised Learning Input: labelled training data i.e., data plus desired output Assumption:

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Mathematical Logic Part Three

Mathematical Logic Part Three Mathematical Logic Part hree riday our Square! oday at 4:15PM, Outside Gates Announcements Problem Set 3 due right now. Problem Set 4 goes out today. Checkpoint due Monday, October 22. Remainder due riday,

More information

Clause/Term Resolution and Learning in the Evaluation of Quantified Boolean Formulas

Clause/Term Resolution and Learning in the Evaluation of Quantified Boolean Formulas Journal of Artificial Intelligence Research 1 (1993) 1-15 Submitted 6/91; published 9/91 Clause/Term Resolution and Learning in the Evaluation of Quantified Boolean Formulas Enrico Giunchiglia Massimo

More information