
OntoRevision: A Plug-in System for Ontology Revision in Protégé

Nathan Cobby¹, Kewen Wang¹, Zhe Wang², and Marco Sotomayor¹
¹ Griffith University, Australia
² Oxford University, UK

Abstract. Ontologies have been widely used in advanced information systems. However, efficiently revising ontologies as new information becomes available remains a challenging issue in ontology engineering. A novel method of revising ontologies has been proposed recently by Wang et al., but the related algorithms have not yet been implemented. In this article we describe an implementation of these algorithms, called OntoRevision, and report some experimental results. Our system is a plug-in for revising general ontologies in Protégé and can thus be used by Protégé users to revise ontologies automatically.

1 Introduction

In knowledge engineering, an ontology is a formal model of some domain knowledge of the world [6]. It provides a shared vocabulary relevant to the domain, a specification of the meaning (semantics) of the terms, and a formalized specification of the conceptualization. Ontologies have been applied in a wide range of practical domains such as e-Science, e-Commerce, medical informatics, bio-informatics, and the Semantic Web.

As with all knowledge formalizing structures, ontologies are not static, but may evolve over time. In particular, ontologies may need to be extended and sometimes revised. Although the operation of incorporating an ontology into another existing ontology is supported by Protégé³, it does not provide any machinery to assure the validity or usefulness of such an incorporation. Firstly, classes with the same name in different ontologies are, by default, considered to be distinct. When two ontologies are incorporated, classes with the same name co-exist in the resulting ontology. For instance, suppose we have two ontologies that both contain a class called Student. When the two ontologies are merged, two classes both named Student will occur in the result.
The two classes can only be distinguished by referring to their respective URIs, inherited from their source ontologies. Secondly, even if we change the URIs of the two Student classes to unify them, another problem occurs when the knowledge in the two ontologies contradicts each other. In such a case, Protégé simply combines the two ontologies, leaving the result inconsistent. Although Protégé can detect such an inconsistency, it provides no means of resolving it.

Recently, a novel framework for revising ontologies in DL-Lite was introduced in [7]. DL-Lite [2, 1], which forms the basis of OWL 2 QL [3], is a family of lightweight description logics (DLs) with efficient ontology reasoning and query answering algorithms. However, Wang et al.'s algorithm for ontology revision has not been implemented yet. In this article we describe a reasoning system for ontology revision called OntoRevision⁴. This system is an implementation of Wang et al.'s original revision algorithm and an improved algorithm. Our system is a plug-in for revising general ontologies in Protégé and can thus be used by Protégé users to revise ontologies automatically. We also report some preliminary experimental results.

Corresponding author: Kewen Wang, k.wang@griffith.edu.au
³ http://protege.stanford.edu
⁴ http://www.ict.griffith.edu.au/ kewen/ontorevision/

2 Feature-based Revision

In this section we briefly recall some basics of ontology revision introduced in [7]. The revision operator is based on a new semantic characterization called features, so we first introduce the definition of features.

2.1 An Alternative Semantics for DL-Lite

A signature is a finite set $\mathcal{S} = S_C \cup S_R \cup S_I \cup S_N$ where $S_C$ is the set of atomic concepts, $S_R$ is the set of atomic roles, $S_I$ is the set of individual names and $S_N$ is the set of natural numbers in $\mathcal{S}$. We assume 1 is always in $S_N$. $\top$ and $\bot$ will not be considered as atomic concepts or atomic roles.

Formally, given a signature $\mathcal{S}$, a DL-Lite$^{\mathcal{R},\mathcal{N}}_{bool}$ language has the following syntax:

$$R \leftarrow P \mid P^-, \qquad S \leftarrow P \mid \neg P, \qquad B \leftarrow \top \mid A \mid \geq n\,R, \qquad C \leftarrow B \mid \neg C \mid C_1 \sqcap C_2$$

where $n \in S_N$, $A \in S_C$ and $P \in S_R$. $B$ is called a basic concept and $C$ is called a general concept. We write $\bot$ as a shorthand for $\neg\top$, $\exists R$ for $\geq 1\,R$, $\leq n\,R$ for $\neg(\geq (n+1)\,R)$, and $C_1 \sqcup C_2$ for $\neg(\neg C_1 \sqcap \neg C_2)$. Let $R^+ = P$, where $P \in S_R$, whenever $R = P$ or $R = P^-$.

A TBox $\mathcal{T}$ is a finite set of concept inclusions of the form $C_1 \sqsubseteq C_2$, with $C_1$ and $C_2$ being general concepts, and role inclusions of the form $R_1 \sqsubseteq R_2$. An ABox $\mathcal{A}$ is a finite set of membership assertions of the form $C(a)$ or $S(a, b)$, where $a, b$ are individual names. We call $C(a)$ a concept assertion and $S(a, b)$ a role assertion. A knowledge base (KB) is a pair $\mathcal{K} = \langle \mathcal{T}, \mathcal{A} \rangle$. In this paper, a DL ontology is represented as a DL KB, and we use the terms ontology and KB interchangeably.

Features for DL-Lite$^{\mathcal{N}}_{bool}$ are based on the notion of types defined in [5]. An $\mathcal{S}$-type $\tau$ is a set of basic concepts over $\mathcal{S}$ such that $\top \in \tau$, and for any $m, n \in S_N$ with $m < n$, $\geq n\,R \in \tau$ implies $\geq m\,R \in \tau$. When the signature $\mathcal{S}$ is clear from context, we will simply call an $\mathcal{S}$-type a type. As $\top \in \tau$ for any type $\tau$, we omit it in examples for simplicity.
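For example, let $S_C = \{A, B\}$, $S_R = \{P\}$, and $S_N = \{1, 3\}$. Then $\tau = \{A, \exists P, \geq 3\,P, \exists P^-\}$ is a type.

A type $\tau$ satisfies a concept in the following way: $\tau$ satisfies a basic concept $B$ if $B \in \tau$, $\tau$ satisfies $\neg C$ if $\tau$ does not satisfy $C$, and $\tau$ satisfies $C \sqcap D$ if $\tau$ satisfies both $C$ and $D$. We also say that a type $\tau$ satisfies a concept inclusion $C \sqsubseteq D$ if $\tau$ satisfies the concept $\neg C \sqcup D$. A type $\tau$ satisfies a TBox $\mathcal{T}$ if it satisfies every inclusion in $\mathcal{T}$.

Types are sufficient to capture the semantics of TBoxes, but as they do not refer to individuals, they are insufficient to capture the semantics of ABoxes. We therefore extend the notion of types with individuals and define Herbrand sets in DL-Lite.

Definition 2.1. An $\mathcal{S}$-Herbrand set (or simply Herbrand set) $\mathcal{H}$ is a finite set of assertions of the form $B(a)$ or $P(a, b)$, where $a, b \in S_I$, $P \in S_R$ and $B$ is a basic concept over $\mathcal{S}$, satisfying the following conditions:
1. For each $a \in S_I$, $\top(a) \in \mathcal{H}$, and $\geq n\,R(a) \in \mathcal{H}$ implies $\geq m\,R(a) \in \mathcal{H}$ for $m, n \in S_N$ with $m < n$.
2. For each $P \in S_R$, $P(a, b_i) \in \mathcal{H}$ $(i = 1, \ldots, n)$ implies $\geq m\,P(a) \in \mathcal{H}$ for any $m \in S_N$ such that $m \leq n$.
3. For each $P \in S_R$, $P(b_i, a) \in \mathcal{H}$ $(i = 1, \ldots, n)$ implies $\geq m\,P^-(a) \in \mathcal{H}$ for any $m \in S_N$ such that $m \leq n$.

We use $\mathcal{H}_R$ to denote the set of all role assertions in $\mathcal{H}$. Given a Herbrand set $\mathcal{H}$ for a KB $\mathcal{K} = \langle \mathcal{T}, \mathcal{A} \rangle$ and an individual $a$, $\tau(a, \mathcal{H}) = \{C \mid C(a) \in \mathcal{H}\}$ is a type, called the type of $a$ in $\mathcal{H}$. A Herbrand set $\mathcal{H}$ satisfies a concept assertion $C(a)$ if $\tau(a, \mathcal{H})$ satisfies the concept $C$. $\mathcal{H}$ satisfies a role assertion $P(a, b)$ if $P(a, b)$ is in $\mathcal{H}$, and $\neg P(a, b)$ if $P(a, b)$ is not in $\mathcal{H}$. $\mathcal{H}$ satisfies an ABox $\mathcal{A}$ if $\mathcal{H}$ satisfies every assertion in $\mathcal{A}$.

The concept of features is defined as follows.

Definition 2.2 (Features). Given a signature $\mathcal{S}$, an $\mathcal{S}$-feature (or simply feature) is defined as a pair $\mathcal{F} = \langle \Xi, \mathcal{H} \rangle$, where $\Xi$ is a non-empty set of $\mathcal{S}$-types and $\mathcal{H}$ is an $\mathcal{S}$-Herbrand set, satisfying the following conditions:
1. $\exists P \in \bigcup \Xi$ iff $\exists P^- \in \bigcup \Xi$, for each $P \in S_R$.
2. $\tau(a, \mathcal{H}) \in \Xi$, for each $a \in S_I$.

Example 2.1. Consider the knowledge base $\mathcal{K} = \langle \mathcal{T}, \mathcal{A} \rangle$, where

$$\mathcal{T} = \{\, A \sqsubseteq \exists P,\ B \sqsubseteq \exists P,\ \exists P^- \sqsubseteq B,\ A \sqcap B \sqsubseteq \bot,\ \geq 2\,P^- \sqsubseteq \bot \,\}, \qquad \mathcal{A} = \{\, A(a),\ P(a, b) \,\}.$$

Take $\mathcal{S} = sig(\mathcal{K}) = \{A, B, P, 1, 2, a, b\}$. Then $\mathcal{F} = \langle \Xi, \mathcal{H} \rangle$ is a (finite) model feature of $\mathcal{K}$, where $\Xi = \{\tau_1, \tau_2\}$ with $\tau_1 = \{A, \exists P\}$ and $\tau_2 = \{B, \exists P, \exists P^-\}$, and $\mathcal{H} = \{\, A(a),\ \exists P(a),\ B(b),\ \exists P(b),\ \exists P^-(b),\ P(a, b) \,\}$.

Definition 2.3. Given a feature $\mathcal{F} = \langle \Xi, \mathcal{H} \rangle$, we say $\mathcal{F}$ satisfies
- a concept $C$ if there is a type in $\Xi$ satisfying $C$;
- an inclusion $C \sqsubseteq D$ if $\tau$ satisfies $C \sqsubseteq D$ for all $\tau \in \Xi$;
- an assertion $C(a)$ or $S(a, b)$ if $\mathcal{H}$ satisfies it.

$\mathcal{F}$ is a model feature of a KB $\mathcal{K}$ if $\mathcal{F}$ satisfies every concept inclusion and every membership assertion in $\mathcal{K}$. $\mathcal{M}_F(\mathcal{K})$ denotes the set of all model features of $\mathcal{K}$.

It has been shown in [7] that the semantics defined in terms of features characterizes the standard semantics of DL-Lite with respect to all major reasoning forms for DL-Lite ontologies.

A KB $\mathcal{K}$ is said to be a maximal approximation of $\mathcal{M}$ over $\mathcal{S}$ if (1) $sig(\mathcal{K}) \subseteq \mathcal{S}$ and $\mathcal{M} \subseteq mod(\mathcal{K})$, and (2) there exists no KB $\mathcal{K}'$ satisfying (1) such that $mod(\mathcal{K}') \subset mod(\mathcal{K})$. It is shown in [4] that a maximal approximation may not exist for some DLs. However, as shown in [7], maximal approximations always exist in DL-Lite$^{\mathcal{N}}_{bool}$.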
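To make the definitions of this section concrete, the following Python sketch checks that the feature given in Example 2.1 is a model feature. The encoding is an assumption made purely for illustration and is not taken from OntoRevision: basic concepts are strings (">=1 P" stands for ∃P, ">=1 P-" for ∃P⁻), general concepts are nested tuples, and all function names are hypothetical. The sketch does not verify the closure conditions of Definition 2.1 and handles only positive role assertions.

```python
TOP = "TOP"
BOT = ("not", TOP)  # bottom is shorthand for the negation of top

def satisfies(tau, concept):
    """True if the type tau (a frozenset of basic concepts) satisfies the concept."""
    if isinstance(concept, str):          # basic concept; TOP is in every type
        return concept == TOP or concept in tau
    op = concept[0]
    if op == "not":
        return not satisfies(tau, concept[1])
    if op == "and":
        return satisfies(tau, concept[1]) and satisfies(tau, concept[2])
    raise ValueError(f"unknown concept: {concept!r}")

def satisfies_inclusion(tau, lhs, rhs):
    """tau satisfies lhs ⊑ rhs iff it satisfies (not lhs) or rhs."""
    return (not satisfies(tau, lhs)) or satisfies(tau, rhs)

def type_of(a, herbrand):
    """tau(a, H): the basic concepts asserted of individual a in H."""
    return frozenset(x[0] for x in herbrand if len(x) == 2 and x[1] == a)

def is_model_feature(types, herbrand, tbox, abox, individuals, roles):
    if not types:                          # Xi must be non-empty
        return False
    union = set().union(*types)
    # Definition 2.2, condition 1: ∃P occurs in some type iff ∃P⁻ does
    cond1 = all((f">=1 {p}" in union) == (f">=1 {p}-" in union) for p in roles)
    # Definition 2.2, condition 2: the type of every individual is in Xi
    cond2 = all(type_of(a, herbrand) in types for a in individuals)
    # Definition 2.3: every type satisfies every inclusion
    tbox_ok = all(satisfies_inclusion(t, c, d) for t in types for c, d in tbox)
    # Definition 2.3: H satisfies every (positive) assertion
    abox_ok = all(
        (x in herbrand) if len(x) == 3
        else satisfies(type_of(x[1], herbrand), x[0])
        for x in abox
    )
    return cond1 and cond2 and tbox_ok and abox_ok

# Example 2.1
EP, EPI = ">=1 P", ">=1 P-"
tau1 = frozenset({"A", EP})
tau2 = frozenset({"B", EP, EPI})
types = {tau1, tau2}
H = {("A", "a"), (EP, "a"), ("B", "b"), (EP, "b"), (EPI, "b"), ("P", "a", "b")}
tbox = [("A", EP), ("B", EP), (EPI, "B"), (("and", "A", "B"), BOT), (">=2 P-", BOT)]
abox = [("A", "a"), ("P", "a", "b")]

print(is_model_feature(types, H, tbox, abox, ["a", "b"], ["P"]))
```

Dropping $\tau_2$ from the type set makes the check fail, because ∃P would then occur in a type while ∃P⁻ would not, and the type of $b$ would no longer belong to $\Xi$.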

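2.2 Ontology Revision

Given two $\mathcal{S}$-features $\mathcal{F}_1 = \langle \Xi_1, \mathcal{H}_1 \rangle$ and $\mathcal{F}_2 = \langle \Xi_2, \mathcal{H}_2 \rangle$, the distance between $\mathcal{F}_1$ and $\mathcal{F}_2$, denoted $\mathcal{F}_1 \triangle \mathcal{F}_2$, is the pair $\langle \Xi_1 \triangle \Xi_2, \mathcal{H}_1 \triangle \mathcal{H}_2 \rangle$. Recall that $X \triangle Y$ is the symmetric difference of two sets $X$ and $Y$. To compare two distances, we define $\mathcal{F}_1 \triangle \mathcal{F}_2 \subseteq \mathcal{F}_3 \triangle \mathcal{F}_4$ if $\Xi_1 \triangle \Xi_2 \subseteq \Xi_3 \triangle \Xi_4$ and $\mathcal{H}_1 \triangle \mathcal{H}_2 \subseteq \mathcal{H}_3 \triangle \mathcal{H}_4$; and $\mathcal{F}_1 \triangle \mathcal{F}_2 \subset \mathcal{F}_3 \triangle \mathcal{F}_4$ if $\mathcal{F}_1 \triangle \mathcal{F}_2 \subseteq \mathcal{F}_3 \triangle \mathcal{F}_4$ and $\mathcal{F}_3 \triangle \mathcal{F}_4 \not\subseteq \mathcal{F}_1 \triangle \mathcal{F}_2$.

Definition 2.4 (F-Revision). Let $\mathcal{K}, \mathcal{K}'$ be two DL-Lite$^{\mathcal{N}}_{bool}$ KBs and $\mathcal{S} = sig(\mathcal{K} \cup \mathcal{K}')$. Define the f-revision of $\mathcal{K}$ by $\mathcal{K}'$, denoted $\mathcal{K} \circ_f \mathcal{K}'$, such that $\mathcal{M}_F(\mathcal{K} \circ_f \mathcal{K}') = \mathcal{M}_F(\mathcal{K}')$ if $\mathcal{M}_F(\mathcal{K}) = \emptyset$, and otherwise

$$\mathcal{M}_F(\mathcal{K} \circ_f \mathcal{K}') = \{\, \langle \Xi', \mathcal{H}' \rangle \in \mathcal{M}_F(\mathcal{K}') \mid \exists \langle \Xi, \mathcal{H} \rangle \in \mathcal{M}_F(\mathcal{K}) \text{ s.t. } \mathcal{H} \triangle \mathcal{H}' \in d_H(\mathcal{K}, \mathcal{K}') \text{ and } \langle \Xi \triangle \Xi', \mathcal{H} \triangle \mathcal{H}' \rangle \in d_F(\mathcal{K}, \mathcal{K}') \,\},$$

where

$$d_H(\mathcal{K}_1, \mathcal{K}_2) = \min\nolimits_{\subseteq}\big( \{\, \mathcal{H}_1 \triangle \mathcal{H}_2 \mid \langle \Xi_1, \mathcal{H}_1 \rangle \in \mathcal{M}_F(\mathcal{K}_1),\ \langle \Xi_2, \mathcal{H}_2 \rangle \in \mathcal{M}_F(\mathcal{K}_2) \,\} \big),$$

$$d_F(\mathcal{K}_1, \mathcal{K}_2) = \min\nolimits_{\subseteq}\big( \{\, \mathcal{F}_1 \triangle \mathcal{F}_2 \mid \mathcal{F}_1 \in \mathcal{M}_F(\mathcal{K}_1),\ \mathcal{F}_2 \in \mathcal{M}_F(\mathcal{K}_2) \,\} \big).$$

Example 2.2. Consider the following knowledge base:

$$\mathcal{K} = \langle \{\, PhDStudent \sqsubseteq Student \sqcap Postgrad,\ Student \sqsubseteq \neg\exists teaches,\ \exists teaches^- \sqsubseteq Course,\ Student \sqcap Course \sqsubseteq \bot \,\},\ \{\, PhDStudent(Tom) \,\} \rangle.$$

The TBox of $\mathcal{K}$ specifies that PhD students are postgraduate students, and students are not allowed to teach any courses, while the ABox states that Tom is a PhD student. Suppose PhD students are actually allowed to teach, and we want to revise $\mathcal{K}$ with $\mathcal{K}' = \langle \{\, PhDStudent \sqsubseteq \exists teaches \,\}, \emptyset \rangle$. Then, $\mathcal{K} \circ_f \mathcal{K}'$ is

$$\langle \{\, PhDStudent \sqsubseteq Student \sqcap Postgrad,\ PhDStudent \sqsubseteq \exists teaches,\ Student \sqcap \exists teaches \sqsubseteq PhDStudent,\ \exists teaches^- \sqsubseteq Course,\ Student \sqcap Course \sqsubseteq \bot \,\},\ \{\, Student(Tom),\ Postgrad(Tom) \,\} \rangle.$$

2.3 Algorithms for Ontology Revision

In this section, we introduce an algorithm for computing the maximal approximation of revision syntactically and briefly explain how it can be improved. Given an $\mathcal{S}$-type $\tau$, we denote by $C_\tau$ the concept

$$C_\tau = \bigsqcap_{B \in \tau} B \ \sqcap \bigsqcap_{B \notin \tau} \neg B,$$

where $B$ is a basic concept over $\mathcal{S}$. In what follows, we present an algorithm for DL-Lite$^{\mathcal{N}}_{bool}$ KB revision (see Figure 1).

In general, it is inefficient to compute the set of features for an ontology. For this reason, we have developed an improved algorithm. In particular, we only need to consider subsets of $\mathcal{M}_F(\mathcal{K})$ and $\mathcal{M}_F(\mathcal{K}')$ when selecting the model features of the revision. The optimisation is based on the following observations when selecting model features $\mathcal{F} = \langle \Xi, \mathcal{H} \rangle$ for the revision. Firstly, we can compare the Herbrand sets $\mathcal{H}$ independently from the type sets $\Xi$, and eliminate those features whose Herbrand sets do not have a minimal distance. Secondly, we do not need to consider all the Herbrand sets, but only those whose role assertions all appear explicitly in the ABoxes $\mathcal{A}$ and $\mathcal{A}'$. Thirdly, once the Herbrand sets $\mathcal{H}$ are fixed, the corresponding type sets $\Xi$ can be constructed based on $\mathcal{H}$.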
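The selection of model features in Definition 2.4 can be illustrated with a short Python sketch. It is a naive reading of the definition, not the code of OntoRevision and not the improved algorithm; the representation (a feature as a pair of frozensets) and all function names are assumptions made for this example, and the toy inputs are hand-written rather than computed from KBs.

```python
def distance(f1, f2):
    """Feature distance: pair of symmetric differences (types, Herbrand sets)."""
    (xi1, h1), (xi2, h2) = f1, f2
    return (xi1 ^ xi2, h1 ^ h2)

def pair_leq(d1, d2):
    """Component-wise set inclusion between two feature distances."""
    return d1[0] <= d2[0] and d1[1] <= d2[1]

def minimal(items, leq):
    """Elements of items with no strictly smaller element under leq."""
    items = list(items)
    return [d for d in items
            if not any(leq(e, d) and not leq(d, e) for e in items)]

def f_revision(models_k, models_k_new):
    """Model features of the revision, following Definition 2.4 literally."""
    if not models_k:
        return list(models_k_new)
    pairs = [(f, g) for f in models_k for g in models_k_new]
    d_h = minimal({f[1] ^ g[1] for f, g in pairs}, lambda x, y: x <= y)
    d_f = minimal({distance(f, g) for f, g in pairs}, pair_leq)
    return [g for g in models_k_new
            if any((f[1] ^ g[1]) in d_h and distance(f, g) in d_f
                   for f in models_k)]

# Toy input: one model feature of K, two candidate model features of K'.
t_a, t_b, t_0 = frozenset({"A"}), frozenset({"B"}), frozenset()
F1 = (frozenset({t_a}), frozenset({("A", "a")}))
G1 = (frozenset({t_0}), frozenset())
G2 = (frozenset({t_b}), frozenset({("B", "a")}))

print(f_revision([F1], [G1, G2]) == [G1])
```

In this toy input the two feature distances are incomparable, so both belong to $d_F$; only G1 is kept because its Herbrand distance {A(a)} is strictly contained in that of G2. This is the first observation above in miniature: candidates can be discarded on their Herbrand sets alone.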

Algorithm 1
Input: Two DL-Lite$^{\mathcal{N}}_{bool}$ KBs $\mathcal{K}$ and $\mathcal{K}'$, $\mathcal{S} = sig(\mathcal{K} \cup \mathcal{K}')$.
Output: $\mathcal{K} \circ_f \mathcal{K}'$.
Method: Initially, let $\mathcal{T} = \emptyset$ and $\mathcal{A} = \emptyset$.
Step 1. Compute $\mathcal{M}_F(\mathcal{K})$ and $\mathcal{M}_F(\mathcal{K}')$.
Step 2. Obtain $\mathcal{M}_F(\mathcal{K} \circ_f \mathcal{K}')$ from $\mathcal{M}_F(\mathcal{K})$ and $\mathcal{M}_F(\mathcal{K}')$ by Definition 2.4.
Step 3. For each $\mathcal{S}$-type $\tau$ not occurring in any type set in $\mathcal{M}_F(\mathcal{K} \circ_f \mathcal{K}')$, add the inclusion $C_\tau \sqsubseteq \bot$ into $\mathcal{T}$.
Step 4. For each individual $a \in S_I$, add the concept assertion $(\bigsqcup_{\tau \in \Xi_a} C_\tau)(a)$ into $\mathcal{A}$, where $\Xi_a = \{\, \tau \mid \exists \langle \Xi, \mathcal{H} \rangle \in \mathcal{M}_F(\mathcal{K} \circ_f \mathcal{K}') \text{ s.t. } \tau \text{ is the type of } a \text{ in } \mathcal{H} \,\}$.
Step 5. For each role assertion $P(a, b)$ occurring in every Herbrand set in $\mathcal{M}_F(\mathcal{K} \circ_f \mathcal{K}')$, add $P(a, b)$ into $\mathcal{A}$.
Step 6. Return $\langle \mathcal{T}, \mathcal{A} \rangle$ as $\mathcal{K} \circ_f \mathcal{K}'$.

Fig. 1. Compute f-revision.

3 Implementation Details

OntoRevision is implemented in Java as a plug-in of Protégé. The system has been tested with Protégé version 4.1.0 (Build 213). To install OntoRevision, it suffices to copy the file OntoRevision.jar into Protégé's plug-in directory. Once the plug-in is installed, it can be displayed within Protégé by selecting the OntoRevision menu item under View > Ontology View and placing it within the tab user interface. Protégé uses the Manchester OWL syntax for editing ontologies.

Besides the necessary preprocessing and postprocessing, OntoRevision has four major modules: (1) Feature Constructor (for computing the set of features for a given KB); (2) Distance Calculator (for calculating the distance between two features); (3) Feature Selector (for picking out features with minimal distances); and (4) KB Constructor (for constructing a KB from a set of features).

An input of OntoRevision is a pair $(\mathcal{K}, \mathcal{K}')$ of DL-Lite KBs. The system first computes the sets $\mathcal{M}_F(\mathcal{K})$ and $\mathcal{M}_F(\mathcal{K}')$ of features for $\mathcal{K}$ and $\mathcal{K}'$, respectively. Then $\mathcal{M}_F(\mathcal{K} \circ_f \mathcal{K}')$ is obtained by the Feature Selector module, which uses the Distance Calculator. Finally, the revision result (maximal approximation) is obtained by the KB Constructor. A screenshot of a completed revision operation in Protégé is shown in Figure 2.
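Steps 3 to 5 of Algorithm 1, which correspond to the task of the KB Constructor module, can be sketched in Python as follows. This is an illustration only, not the plug-in's Java code. It assumes a simplified signature in which every subset of the listed basic concepts is a type (that is, no number restrictions beyond 1), it represents each generated axiom by the type or set of types it is built from rather than by the concept $C_\tau$ itself, and all names are hypothetical.

```python
from itertools import combinations

def all_types(basic_concepts):
    """Every subset of the basic concepts (assumed here to be a valid type)."""
    for r in range(len(basic_concepts) + 1):
        for subset in combinations(basic_concepts, r):
            yield frozenset(subset)

def type_of(a, herbrand):
    """Basic concepts asserted of individual a in the Herbrand set."""
    return frozenset(x[0] for x in herbrand if len(x) == 2 and x[1] == a)

def construct_kb(features, basic_concepts, individuals):
    """features: non-empty list of (Xi, H) pairs selected for the revision."""
    if not features:
        raise ValueError("at least one model feature is required")
    used = set().union(*(xi for xi, _ in features))
    # Step 3: each unused type tau yields the inclusion C_tau ⊑ ⊥
    excluded_types = [t for t in all_types(basic_concepts) if t not in used]
    # Step 4: each individual is asserted to have one of its possible types
    possible_types = {a: {type_of(a, h) for _, h in features}
                      for a in individuals}
    # Step 5: keep role assertions that occur in every Herbrand set
    role_sets = [{x for x in h if len(x) == 3} for _, h in features]
    common_roles = set.intersection(*role_sets)
    return excluded_types, possible_types, common_roles

# Toy input with atomic concepts A, B and a single individual a.
t_a, t_ab = frozenset({"A"}), frozenset({"A", "B"})
features = [
    (frozenset({t_a}), frozenset({("A", "a")})),
    (frozenset({t_a, t_ab}), frozenset({("A", "a"), ("B", "a")})),
]
tbox, abox, roles = construct_kb(features, ["A", "B"], ["a"])
print(tbox, abox, roles)
```

For this input the excluded types are the empty type and {B}, so the resulting TBox would state that every object is an A, while the ABox would leave open whether a is also a B.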
Some preliminary experiments have been performed on a desktop computer (Intel Pentium 4 CPU 3.4 GHz, 2 GB RAM). We compared the performance of the original algorithm for ontology revision in [7] (v1) with that of the improved version (v2).

In the first example we tested the performance of the two algorithms as the number of individuals in $\mathcal{K}'$ increases. The example used is $\mathcal{K} = \langle \{A \sqsubseteq B\}, \emptyset \rangle$ and $\mathcal{K}'_k = \langle \{A \sqsubseteq B\}, \{A(a_1), A(a_2), \ldots, A(a_k)\} \rangle$ with $k > 0$. The experimental results are shown in Figure 3. The improved algorithm performs better than the original one, but the improvement is modest.

We also tested the performance of the two algorithms as the number of concepts in $\mathcal{K}'$ increases. The example used is $\mathcal{K} = \langle \{A \sqsubseteq B_1\}, \{A(a)\} \rangle$ and $\mathcal{K}'_k = \langle \{A \sqsubseteq B_1, B_1 \sqsubseteq B_2, \ldots, B_k \sqsubseteq B_{k+1}\}, \emptyset \rangle$ with $k > 0$. The results show that the improved algorithm is significantly faster than the original one (see Figure 4).
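One way to see why the number of atomic concepts matters is to count candidate types for the chain TBox $\{A \sqsubseteq B_1, B_1 \sqsubseteq B_2, \ldots, B_k \sqsubseteq B_{k+1}\}$ used in the second experiment. The Python sketch below is an illustration written for this purpose, not part of OntoRevision, and its function names are hypothetical. It enumerates every subset of the atomic concepts and keeps those satisfying the chain; the number of candidates doubles with each added concept, while only a handful of them satisfy the TBox.

```python
from itertools import combinations

def chain_tbox(k):
    """Atomic concepts A, B1, ..., B(k+1) and the inclusions A ⊑ B1, ..., Bk ⊑ B(k+1)."""
    names = ["A"] + [f"B{i}" for i in range(1, k + 2)]
    return names, list(zip(names, names[1:]))

def all_types(names):
    """Every subset of the atomic concepts, as a candidate type."""
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            yield frozenset(subset)

def satisfying_types(k):
    """Candidate types that satisfy every inclusion of the chain TBox."""
    names, inclusions = chain_tbox(k)
    return [t for t in all_types(names)
            if all(c not in t or d in t for c, d in inclusions)]

for k in (1, 3, 5):
    names, _ = chain_tbox(k)
    print(k, 2 ** len(names), len(satisfying_types(k)))
```

A procedure that materialises all candidate types before filtering therefore does exponentially more work than the size of the surviving set would suggest, which is consistent with the scalability concern raised in the conclusion.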

Fig. 2. A Screen Shot of OntoRevision

Fig. 3. Number of Individuals vs Time (time in ms against the number of individuals in K Prime, for Revision v1 and Revision v2)

Fig. 4. Number of Atomic Concepts vs Time (time in ms against the number of atomic concepts in K Prime, for Revision v1 and Revision v2)

4 Conclusion

We have implemented a prototype system for revising DL-Lite ontologies, called OntoRevision. It is able to revise general DL-Lite knowledge bases (i.e., those containing both TBoxes and ABoxes). The system is implemented as a plug-in for the ontology editor Protégé. Some experimental results have also been reported in the paper. However, the scalability of OntoRevision is still a challenge. Currently, we are working on developing more efficient algorithms for DL-Lite revision.

Acknowledgments

We would like to thank all anonymous reviewers for their comments. This work was supported by the Australian Research Council (ARC) Discovery Projects DP110101042 and DP1093652.

References

1. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. DL-Lite in the light of first-order logic. In Proc. of 22nd AAAI, pages 361-366, 2007.
2. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning, 39(3):385-429, 2007.
3. M. Dean, D. Connolly, F. van Harmelen, J. Hendler, I. Horrocks, D. McGuinness, P. Patel-Schneider, and L. Stein. OWL web ontology language reference. http://www.w3.org/tr/2004/rec-owl-ref-20040210/, W3C Recommendation, 10 February 2004.
4. G. De Giacomo, M. Lenzerini, A. Poggi, and R. Rosati. On the approximation of instance level update and erasure in description logics. In Proc. of 22nd AAAI, pages 403-408, 2007.
5. R. Kontchakov, F. Wolter, and M. Zakharyaschev. Can you tell the difference between DL-Lite ontologies? In Proc. of 11th KR, pages 285-295, 2008.
6. S. Staab and R. Studer, editors. Handbook on Ontologies. Springer, Berlin, 2nd edition, 2009.
7. Zhe Wang, Kewen Wang, and Rodney W. Topor. A new approach to knowledge base revision in DL-Lite. In Proc. of 24th AAAI, pages 369-374, 2010.