MATHEMATICAL AND EXPERIMENTAL INVESTIGATION OF ONTOLOGICAL SIMILARITY MEASURES AND THEIR USE IN BIOMEDICAL DOMAINS


ABSTRACT

MATHEMATICAL AND EXPERIMENTAL INVESTIGATION OF ONTOLOGICAL SIMILARITY MEASURES AND THEIR USE IN BIOMEDICAL DOMAINS

by Xinran Yu

Similarity measurement is an important notion. In the context of ontologies, similarity measures are used to determine how similar one concept is to another. Because graph models have been used to represent ontologies, a variety of algorithms have been proposed for calculating the similarity between the graph nodes which represent ontological concepts. This thesis overviews existing ontological similarity measures and investigates a wide range of these measures mathematically and experimentally. The objective is not to assess performance against a gold standard of similarity judgment but to develop a better understanding of the relationships among these measures by comparing their results when applied to the Gene Ontology. The experimental results show that some ontological similarity measures, especially information content-based measures, are highly correlated. The results of experiments comparing corpus-based to ontology-based information content measures for the Gene Ontology support previous experimental results using WordNet, which demonstrated little difference between the two approaches.

MATHEMATICAL AND EXPERIMENTAL INVESTIGATION OF ONTOLOGICAL SIMILARITY MEASURES AND THEIR USE IN BIOMEDICAL DOMAINS

A Thesis Submitted to the Faculty of Miami University in partial fulfillment of the requirements for the degree of Master of Science
Department of Computer Science
by Xinran Yu
Miami University
Oxford, Ohio
2010

Advisor: Valerie Cross, PhD.
Reader: Alton Sanders, PhD.
Reader: Eric Bachmann, PhD.

CONTENT

1. Introduction
Brief Historical Overview of Semantic (Ontological) Similarity
Overview of Standard Ontological Similarity Measures
Path-based or Edge Counting Ontological Similarity Measures
Information-content Ontological Similarity Measure
Tversky feature-based Ontological Similarity Measures
TaxPac Implementation of Standard Ontological Similarity Measures
More Recent Proposals for "Novel" Ontological Similarity Measures
Similarity between Biomedical Concepts
Semantic Relatedness Measure Using Object Properties in an Ontology
Plethora of Similarity Measures in Bioinformatics
Experimental Investigations of Ontological Similarity Measures
GO Description and its Concept Attributes
Structure Analysis of Gene Ontology
Experimental Investigation on IC and IC-based Ontological Similarity Measures
IC Experiments Using the GO
Experimental Investigation on GRASM method
Cellular Component Sub-Ontology Analysis using Average IC
Molecular Function Sub-Ontology Analysis
Biological Process Sub-Ontology Analysis
Experimental Investigation on Path Based Measures
Cellular Component Sub-Ontology Analysis
Molecular Function Sub-Ontology Analysis
Biological Process Sub-Ontology Analysis
Experimental Investigation on Set Based Measures
Correlations among different similarity measures in the three categories
A Classification of Ontological Similarity Measures
Conclusions and Future Work
References
Appendix

LIST OF TABLES

Table 5.1 CC Concept Attributes
Table 5.2 MF Concept Attributes
Table 5.3 BP Concept Attributes
Table 5.4 How Many Nodes Have the Following Number of Children
Table 5.5 How Many Nodes Have the Following Numbers of Parents
Table 5.6 Percentage of Descendents for Top 5 Children Concepts of The Root Concept
Table 5.7 Means and Standard Deviations for IC Values for CC Terms
Table 5.8 Pearson Correlation between IC Measures for CC Terms
Table 5.9 Spearman Correlation between IC Measures for CC Terms
Table Kendall Tau Correlation between IC Measures for CC Terms
Table Means and Standard Deviations for IC Values for MF Terms
Table Pearson Correlation between IC Measures for MF Terms
Table Spearman Correlation between IC Measures for MF Terms
Table Kendall Tau Correlation between IC Measures for MF Terms
Table Means and Standard Deviations for IC Values for BP Terms
Table Pearson Correlation between IC Measures for BP Terms
Table Spearman Correlation between IC Measures for BP Terms
Table Kendall Tau Correlation between IC Measures for BP Terms
Table Comparison of IC's Mean Values for CC, MF and BP
Table GO Max Depth Statistics
Table Pearson Correlation Corpus IC to Ontology IC with k parameter
Table Spearman Correlation Corpus IC to Ontology IC with k parameter
Table Kendall Tau Correlation Corpus IC to Ontology IC with k parameter
Table Means and Standard Deviations for IC-Based Similarity Measures for CC Terms
Table Pearson Correlations for Ontological IC-based Similarity measures on CC
Table Pearson Correlations for Corpus IC-based Similarity Measures on CC
Table Pearson Correlations for Ontological IC vs. Corpus IC Similarity Measures for CC
Table Spearman Correlations for Ontological IC-based Similarity measures on CC
Table Spearman Correlations for Corpus IC-based Similarity measures on CC
Table Spearman Correlations for Ontological IC vs. Corpus IC Similarity Measures for CC
Table Kendall Tau Correlations for Ontological IC-based Similarity measures on CC
Table Kendall Tau Correlations for Corpus IC-based Similarity measures on CC

Table Kendall Tau Correlations for Ontological IC vs. Corpus IC Similarity Measures on CC
Table Means and Standard Deviations for IC ontological similarity measures for MF
Table Pearson Correlations for Ontological IC-based Similarity measures on MF
Table Pearson Correlations for Corpus IC-based Similarity measures on MF
Table Pearson Correlations for Ontological IC vs. Corpus IC Similarity Measures for MF
Table Spearman Correlations for Ontological IC-based Similarity measures on MF
Table Spearman Correlations for Corpus IC-based Similarity measures on MF
Table Spearman Correlations for Ontological IC vs. Corpus IC Similarity Measures for MF
Table Kendall Tau Correlations for Ontological IC-based Similarity measures on MF
Table Kendall Tau Correlations for Corpus IC-based Similarity measures on MF
Table Kendall Tau Correlations for Ontological IC vs. Corpus IC Similarity Measures on MF
Table Means and Standard Deviations for IC-Based Similarity Measures for BP Terms
Table Pearson Correlations for Ontological IC-based Similarity measures on BP
Table Pearson Correlations for Corpus IC-based Similarity measures on BP
Table Pearson Correlations for Ontological IC vs. Corpus IC Similarity Measures for BP
Table Spearman Correlations for Ontological IC-based Similarity measures on BP
Table Spearman Correlations for Corpus IC-based Similarity measures on BP
Table Spearman Correlations for Ontological IC vs. Corpus IC Similarity Measures for BP
Table Kendall Tau Correlations for Ontological IC-based Similarity measures on BP
Table Kendall Tau Correlations for Corpus IC-based Similarity measures on BP
Table Kendall Tau Correlations for Ontological IC vs. Corpus IC Similarity Measures on BP
Table Pearson Correlations for Ontological IC vs. Corpus IC Similarity Measures
Table Spearman Correlations for Ontological IC vs. Corpus IC Similarity Measures
Table Kendall Tau Correlations for Ontological IC vs. Corpus IC Similarity Measures
Table Mean and Standard Deviation for GRASM Similarity Measures on CC Terms
Table Pearson Correlations for GRASM Similarity measures on CC
Table Spearman Correlations for GRASM Similarity measures on CC
Table Kendall Tau Correlations for GRASM Similarity measures on CC
Table Mean and Standard Deviation for GRASM Similarity Measures on MF Terms
Table Pearson Correlations for GRASM Similarity measures on MF

Table Spearman Correlations for GRASM Similarity measures on MF
Table Kendall Tau Correlations for GRASM Similarity measures on MF
Table Mean and Standard Deviation for GRASM Similarity Measures on BP Terms
Table Pearson Correlations for GRASM Similarity measures on BP
Table Spearman Correlations for GRASM Similarity measures on BP
Table Kendall Tau Correlations for GRASM Similarity measures on BP
Table Means for each similarity measure to compare original to average IC
Table Pearson Correlations for Each Similarity measure and their corresponding average measures
Table Spearman Correlations for Each Similarity measure and their corresponding average measures
Table Kendall Correlations for Each Similarity measure and their corresponding average measures
Table 6.1 Means and Standard Deviations for Path-Based Similarity Measures for CC Terms
Table 6.2 Pearson Correlations for Path-based Similarity measures on CC
Table 6.3 Spearman Correlations for Path-based Similarity measures on CC
Table 6.4 Kendall Tau Correlations for Path-based Similarity measures on CC
Table 6.5 Means and Standard Deviations for Path-Based Similarity Measures for MF Terms
Table 6.6 Pearson Correlations for Path-based Similarity measures on MF
Table 6.7 Spearman Correlations for Path-based Similarity measures on MF
Table 6.8 Kendall Tau Correlations for Path-based Similarity measures on MF
Table 6.9 Means and Standard Deviations for Path-Based Similarity Measures for BP Terms
Table Pearson Correlations for Path-based Similarity measures on BP
Table Spearman Correlations for Path-based Similarity measures on BP
Table Kendall Tau Correlations for Path-based Similarity measures on BP
Table 7.1 Means and Standard Deviations for Set-Based Similarity Measures for CC Terms
Table 7.2 Pearson Correlations for Set-based Similarity measures on CC
Table 7.3 Spearman Correlations for Set-based Similarity measures on CC
Table 7.4 Kendall Tau Correlations for Set-based Similarity measures on CC

Table 7.5 Means and Standard Deviations for Set-Based Similarity Measures for MF Terms
Table 7.6 Pearson Correlations for Set-based Similarity measures on MF
Table 7.7 Spearman Correlations for Set-based Similarity measures on MF
Table 7.8 Kendall Tau Correlations for Set-based Similarity measures on MF
Table 7.9 Means and Standard Deviations for Set-Based Similarity Measures for BP Terms
Table Pearson Correlations for Set-based Similarity measures on BP
Table Spearman Correlations for Set-based Similarity measures on BP
Table Kendall Tau Correlations for Set-based Similarity measures on BP
Table 8.1 Means and Standard Deviations for Different Categories Similarity Measures for CC Terms
Table 8.2 Pearson Correlations for Different Categories Similarity measures on CC
Table 8.3 Spearman Correlations for Different Categories Similarity measures on CC
Table 8.4 Kendall Tau Correlations for Different Categories Similarity measures on CC
Table 8.5 Means and Standard Deviations for Different Categories Similarity Measures for MF Terms
Table 8.6 Pearson Correlations for Different Categories Similarity measures on MF
Table 8.7 Spearman Correlations for Different Categories Similarity measures on MF
Table 8.8 Kendall Tau Correlations for Different Categories Similarity measures on MF
Table 8.9 Means and Standard Deviations for Different Categories Similarity Measures for BP Terms
Table Pearson Correlations for Different Categories Similarity measures on BP
Table Spearman Correlations for Different Categories Similarity measures on BP
Table Kendall Tau Correlations for Different Categories Similarity measures on BP
Table 9.1 Property Analysis of Some Ontological Similarity Measures

LIST OF FIGURES

Figure 1 Illustration for Ontological Similarity Measure Examples
Figure 2 Portion of the GO
Figure 3 Example for Minimum Height and Minimum Depth
Figure 4 Union and Intersection of Ancestors and Descendents
Figure 5 Mean Values for Each Measure in CC, MF and BP
Figure 6 Pearson Correlations for Each Pair of Measures in CC, MF and BP
Figure 7 Spearman Correlations for Each Pair of Measures in CC, MF and BP
Figure 8 Kendall Correlations for Each Pair of Measures in CC, MF and BP
Figure 9 Classification of Semantic Similarity Measures [Pesquita et al. 2009]. MICA is most informative common ancestor; DCA is disjoint common ancestor
Figure 10 Classification Scheme for Ontological Similarity Measures

ACKNOWLEDGEMENTS

I would like to give my sincere thanks to Prof. Valerie Cross for her valuable advice, constant encouragement and technical help on my thesis. Without her guidance, I could not have finished this work. I also want to thank my thesis committee members, Prof. Alton Sanders and Prof. Eric Bachmann, for their guidance and support. I am also grateful to Dr. Joslyn and his lab at PNNL. Without his funding and help, as well as the TaxPac software from PNNL, I could not have done so well on my research and experiments. Finally, I would like to thank my mom for her support and care for me during the summer.

1. Introduction

We live in an information society, and information plays an important role almost everywhere at any time. More and more emphasis is being placed on the semantics of the information that is made readily available on the World Wide Web. Ontologies are being used to represent concepts and the relationships among these concepts in a wide range of problem domains, particularly in bioinformatics and biomedical domains. The visual representation of such ontologies typically takes the form of a graph model in which the nodes represent the concepts and the links between the nodes represent the relationships. Graph theory is being used to analyze and understand the structure of ontologies.

Similarity measurement is an important notion used to compare two different objects to determine how well they agree with or match each other. It is extremely important in information processing in many contexts, such as search engines, collaborative filtering, and clustering. In the context of ontologies, an important use of similarity measurement is to determine how similar one concept is to another. Because graph models have been used to represent ontologies, a variety of algorithms have been proposed for calculating the similarity between or among nodes within a graph where the nodes represent the ontological concepts.

Historically, the term semantic similarity measure has been used to refer to measures that assess how similar one concept is to another concept. In the context of this thesis, an ontological similarity measure is a semantic similarity measure specific to assessing similarity between concepts within an ontology. Other semantic similarity measures have also been developed using dictionary-based [Kozima and Ito 1997] and thesaurus-based [Okumura and Honda, 1994] approaches. A wide variety of ontological similarity measures have been proposed. One of the earliest and simplest is a path-based measure [Rada 1989] that just counts the number of edges between the two nodes representing the concepts to find the distance between these concepts. This distance can be converted to a similarity measure.

Others are more complex, such as the method of Wu and Palmer [Wu and Palmer 1994], which takes the position of the concepts in the hierarchy into account, and the method of Jiang and Conrath [Jiang and Conrath 1997], which considers the information content of the concepts. Information content has been measured in different ways, such as by the concept's position or role in the graph [Seco et al. 2004] or by a usage analysis of the concept within a reference corpus [Resnik 1995]. For the most part, these algorithms have been developed relying on the graphical representation of an ontology.

As new ontological similarity measures have been proposed, evaluations of them against existing ones have typically used one of three approaches: mathematical analysis, domain-specific applications, and comparison to human judgments of similarity [Budanitsky 1999]. The primary approach, however, has been comparison to human judgments of similarity. More recently, because of the use of ontological similarity in bioinformatics, comparisons have shifted to other measures of similarity for genes or gene products, such as sequence similarity [Lord et al. 2003a]. Few efforts have been made on mathematical analysis of ontological similarity measures [Wei 1993] [Lin 1998] [Cross 2006]. Mathematical analysis explores a measure's mathematical properties to determine whether it is a true metric and how it relates mathematically to other existing measures.

This thesis research reviews existing ontological similarity methods found in the research literature, particularly in the biomedical and bioinformatics area due to the recent proliferation of these measures [Pesquita et al. 2009].
In particular, this research makes the following contributions: 1) analyze existing ontological similarity measures to discover their relationships and, if possible, an ordering for the results of these measures, 2) explore the role of aggregation of ontological similarity when it is used in measuring similarity between objects described using ontological concepts, 3) investigate in detail the overlap between recent "novel" measures and the older existing measures, 4) perform a controlled experimental evaluation of these measures using generated ontologies with different characteristics to explore how ontological structure affects a measure's performance, and 5) develop a formal framework for categorization of ontological similarity measures and compare this framework to the limited existing ontological similarity categorization frameworks.

The remainder of this thesis begins with section 2, which provides a brief introduction to ontological similarity and its historical development, noting some of the standard ontological measures. Section 3 first provides the background for the standard historical ontological similarity measures. It then describes the work done to incorporate these into the TaxPac software being developed at Pacific Northwest National Laboratory [Joslyn et al. 2009], in order to illustrate the proliferation of these measures. More recent "novel" ontological similarity measures are discussed in Section 4. Sections 5, 6, 7 and 8 describe experiments designed to examine similarity measures using GO's three sub-ontologies. Section 9 presents the classification scheme, and section 10 gives conclusions and future work.

2. Brief Historical Overview of Semantic (Ontological) Similarity

One of the earliest proposals for similarity between concepts in a semantic network simply determines the distance between the two concepts by counting the number of edges on the shortest path between them [Rada 1989]. This measure is considered a semantic distance. The term "semantic" is used because this distance is determined in a semantic network. The shorter this distance, the more semantically similar the two concepts are, i.e., the semantic distance can be converted into a semantic similarity measure since similarity is considered an inverse function of the distance.

This approach is criticized because it does not consider where the concepts occur in the semantic network. The criticism results from the shortcoming of the measure in properly capturing similarities in a network that is organized hierarchically. In this case, two concepts that are deeper in the hierarchy should be considered more similar than two concepts higher in the hierarchy, even though the same number of edges separates the two pairs of concepts. For example, if the network is a tree structure, two nodes directly under the root should have less similarity than two leaves at a depth of 10, because the upper nodes are more general and the lower ones are more specific. Obviously, considering only the distance between two concepts and ignoring the position of the concepts within the hierarchical structure is not enough to determine their similarity.

Because of the weakness of Rada's simple edge counting measure, numerous researchers proposed methods to weight the edges in the network so that edges lower in the hierarchy were weighted less than edges higher up in the hierarchy [Lee et al. 1993] [Leacock and Chodorow 1998].
Another measure using edge counts scales the distance by the distance of the lowest common ancestor of the two concepts from the root of the hierarchy [Wu and Palmer 1994].

Besides edge counting methods, other researchers focus on determining similarity between two concepts based on the shared information content of the concepts. The first proposal using this approach utilizes the information content of the lowest common ancestor of the two concepts [Resnik 1995] to determine the amount of information shared between the two concepts. In this ontological similarity measure, information content for a concept is determined using the frequency of occurrence of the concept within a corpus to calculate a probability p(c) for that concept. Information content of a concept is then quantified as -log p(c). The similarity between two concepts is then given as a function of the information content of the most specific common ancestor of both concepts. A criticism of this approach is that it only looks at the shared information content and does not consider the information content of the two concepts themselves. Other researchers created a measure that incorporates the individual information content of each concept into the ontological similarity [Lin 1998]. Another approach to determining information content has also been proposed that avoids the need for an external corpus: the information content of a concept is specified as a function of the number of descendents of the concept [Seco et al. 2004].

Another approach to evaluating ontological similarity [Rodriguez and Egenhofer 2003] relies heavily on a psychological model of similarity, Tversky's parameterized ratio model of similarity [Tversky 1977]. In this model, the similarity between two objects is based on the ratio of the features they share to the combination of the features they share and the features they do not share. In ontological similarity, the objects are the concepts, and a variety of characteristics of the concepts have been proposed as the features for measuring ontological similarity.

This section has provided an overview of standard ontological similarity measures in the context of their historical development but has not presented the details of their mathematical formalization. The next section gives examples of various ontological similarity measures along with their detailed formulas. The general categories are presented with at least one example in each category.

3. Overview of Standard Ontological Similarity Measures

As seen in the previous section, a variety of measures have been proposed to measure the similarity between two concepts in an ontology. Initially, the two major categories of such measures were path-based or distance-based measures and information content measures. Later, interest returned to a feature-based or set-based approach to measuring concept similarity. The whole basis for this approach is Tversky's parameterized ratio model of similarity [Tversky 1977]. The following overview discusses some of the standard examples in these categories and summarizes some of the material in [Cross 2009]. Then implementations of some of these ontological measures in the TaxPac software are described. Discussion of the implementation of a Tversky generalization is included in section 6, which presents components of this thesis research. In the following sections, Figure 1 is used to clarify the discussion of standard ontological similarity measures.

Figure 1 Illustration for Ontological Similarity Measure Examples

Assume an ontology contains the two concepts c1 and c2, for which distance and corresponding similarity are to be assessed. The concept c3 in the discussion represents a common subsumer or ancestor of c1 and c2 in the ontology that is typically selected to maximize the similarity between the two concepts.

3.1 Path-based or Edge Counting Ontological Similarity Measures

Path-based measures are also referred to as edge-counting or distance-based measures since they are determined primarily using a distance or a count of the number of edges that separate one concept in the ontology from another. One of the problems immediately noticed with such measures is that edges should not all represent the same uniform distance; that is, edges higher in an ontology represent greater distances than edges lower in the ontology. Also, these calculations represent distances, some of which are converted into similarity using various approaches. In this section we present the progression of standard distance-based ontological similarity measures.

The simplest measure is Rada's edge counting distance measure [Rada 1989]

\[ \mathrm{dist}_{R}(c_1, c_2) = \min_{p} \left[ \mathrm{len}(p(c_1, c_2)) \right] \tag{3.1} \]

which produces the minimum length of all paths p between c1 and c2. Rada's measure was used in semantic networks and was not necessarily restricted to edges or links that represented is-a or hierarchical subsumption links. The obvious criticism was that edges were all weighted the same regardless of where they occurred in the semantic network, either high or low.

An approach to converting distance measures into a similarity measure is proposed by Leacock and Chodorow (1998) as

\[ \mathrm{sim}_{LC}(c_1, c_2) = \max_{p} \left[ -\log \frac{N_p}{2D} \right] \tag{3.2} \]

where N_p is the number of nodes in path p from c1 to c2 and D is the maximum depth of the taxonomy.

Another approach to using path-based distances in the calculation of ontological similarity uses the distance of the common subsumer c3 from the root of the ontology [Wu and Palmer 1994] to express the intuition that the lower the concepts c1 and c2 are in the ontology, the greater the similarity of the concepts:

\[ \mathrm{sim}_{WP}(c_1, c_2) = \frac{2\,\mathrm{len}(p(c_3, \mathit{root}))}{\mathrm{len}(p(c_1, c_3)) + \mathrm{len}(p(c_2, c_3)) + 2\,\mathrm{len}(p(c_3, \mathit{root}))} \tag{3.3} \]

In this measure, c3 is typically selected as the lowest or deepest common subsumer in the ontology in order to maximize the numerator.
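The three path-based measures above can be illustrated on a small example. The following is a minimal sketch, not code from the thesis or from TaxPac: the toy taxonomy `PARENTS`, the helper names, and the choice to count both N_p and D in nodes are assumptions made only for illustration.

```python
import math
from collections import deque

# Hypothetical toy taxonomy (a tree): each concept maps to its is-a parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a"],
    "e": ["b"],
}


def bfs_distances(start, neighbors):
    """Edge counts from start to every node reachable via neighbors(node)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def undirected_neighbors(node):
    """Parents and children of node, so edges can be walked in both directions."""
    children = [c for c, ps in PARENTS.items() if node in ps]
    return PARENTS[node] + children


def rada_distance(c1, c2):
    """Eq. (3.1): length in edges of the shortest path between c1 and c2."""
    return bfs_distances(c1, undirected_neighbors)[c2]


def leacock_chodorow(c1, c2, max_depth):
    """Eq. (3.2): -log(Np / (2 D)); Np and D are both counted in nodes here."""
    n_p = rada_distance(c1, c2) + 1
    return -math.log(n_p / (2 * max_depth))


def wu_palmer(c1, c2):
    """Eq. (3.3), with c3 chosen as the deepest common subsumer."""
    up1 = bfs_distances(c1, PARENTS.__getitem__)
    up2 = bfs_distances(c2, PARENTS.__getitem__)
    depth = lambda c: bfs_distances(c, PARENTS.__getitem__)["root"]
    c3 = max(up1.keys() & up2.keys(), key=depth)
    denom = up1[c3] + up2[c3] + 2 * depth(c3)
    # The denominator is 0 only when both concepts are the root itself.
    return 2 * depth(c3) / denom if denom else 1.0


print(rada_distance("c", "d"), rada_distance("c", "e"))
print(wu_palmer("c", "d"), wu_palmer("c", "e"))
print(leacock_chodorow("c", "d", max_depth=3))
```

In this toy tree the siblings c and d are two edges apart and share the subsumer a, while c and e meet only at the root, so the Wu and Palmer value drops to zero for the second pair even though the measure uses nothing but edge counts.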

These ontological similarity measures are the early standard ones for path-based measures. Other more recent variations of these path-based measures are discussed in Section 4.

3.2 Information-content Ontological Similarity Measure

Path-based measures of ontological similarity are criticized for not having an appropriate edge weighting mechanism to reflect that the same path distance means something different at different levels of the ontology. Instead, information content-based (IC) ontological measures rely on a measure of how informative concepts are within an ontology when determining how similar two concepts are. The earliest approach to determining information content is based on using an external resource such as an associated corpus for the problem domain [Resnik 1995]. Using standard information theory [Ross 1976], the information content of a concept c is given as

\[ IC_{corpus}(c) = -\log p(c) \tag{3.4} \]

with p(c) being the probability of the occurrence of an instance of concept c in a specified corpus. The value p(c) is based on the frequency of the concept. The frequency of a concept is the number of occurrences in the corpus of all words representing that concept. The frequency of the concept also includes the total frequencies of all its children concepts. The probability is calculated by dividing this total frequency count by the total number of words in the corpus. Because the formula is the negative logarithm of the probability, as the probability increases the information content decreases; therefore, concepts higher in the ontology, which have a greater probability of occurring, have less information content than those lower in the ontology.

Others have argued that instead of using an external resource for determining the information content of a concept, the ontology structure itself and a concept's position within that structure should be used (Seco, Veale, and Hayes 2004). The intuition is that leaf concepts in the ontology are the most specific and therefore contain the most information, and the root concept is the least specific and therefore contains the least information. They developed the following information content measure:

\[ IC_{ont}(c) = \frac{\log\left(\frac{\mathit{num\_desc}(c) + 1}{\mathit{max}_{ont}}\right)}{\log\left(\frac{1}{\mathit{max}_{ont}}\right)} = 1 - \frac{\log(\mathit{num\_desc}(c) + 1)}{\log(\mathit{max}_{ont})} \tag{3.5} \]

where num_desc(c) is the number of descendents of concept c and max_ont is the maximum number of concepts in the ontology. This IC measure is normalized in that all leaf concepts have the maximum information content of 1. The information content decreases until the value is 0 for the root concept of the ontology.

A newer IC method proposed by Wang et al. in 2009 is called a weighted information content measure. It incorporates not only the number of descendants a concept has but also the depth of the concept within the ontology:

\[ IC_{ont\text{-}desc\text{-}depth}(c) = k \left( 1 - \frac{\log(\mathit{num\_desc}(c) + 1)}{\log(\mathit{max}_{ont})} \right) + (1 - k)\, \frac{\log(\mathit{depth}(c))}{\log(\mathit{depth}_{max})} \tag{3.6} \]

where depth(c) is the depth of concept c, depth_max is the maximum depth of the ontology, and the parameter k weights the two contributions.

The first IC-based ontological similarity measure was proposed by Resnik (1995) as

\[ \mathrm{sim}_{RES}(c_1, c_2) = \max_{c \in S(c_1, c_2)} \left[ IC_{corpus}(c) \right] \tag{3.7} \]

where S(c1,c2) is the set of concepts that subsume both c1 and c2. From Figure 1, assume that c3 is the concept which produces the maximum IC value. Basically, this measure examines the concepts that subsume both c1 and c2 and have no descendents that also subsume both c1 and c2, and uses the one with the most information content, i.e., the most informative.

A major criticism of Resnik's measure is that it only looks at the information shared between the two concepts and does not incorporate the separate information content of the two concepts themselves. Lin (1998) defined another ontological measure to address this criticism:

\[ \mathrm{sim}_{Lin}(c_1, c_2) = \frac{2\, IC(c_3)}{IC(c_1) + IC(c_2)} \tag{3.8} \]

where c3 is the subsuming concept with the most information content. Note that IC is not subscripted since either an external resource such as a corpus or the ontological structure could be used to determine IC.

Jiang and Conrath (1997) define another distance measure between ontological concepts. Their objective is to integrate path-based measures and information content methods. Intuitively, the distance is obtained by totaling the separate information contents of the two concepts and subtracting twice the information content of their most informative subsumer:

\[ \mathrm{dist}_{JC}(c_1, c_2) = IC(c_1) + IC(c_2) - 2\, IC(c_3) \tag{3.9} \]

Whatever information content remains indicates the distance between them. If there is no IC left, i.e., 0, then the two concepts are the same. This distance measure can be converted to similarity, and several approaches have been proposed. For example, (Seco, Veale, and Hayes 2004) used the following:

\[ \mathrm{sim}_{JC}(c_1, c_2) = 1 - 0.5 \left( IC(c_1) + IC(c_2) - 2\, IC(c_3) \right) \tag{3.10} \]

The relationship between the Lin ontological similarity measure and the Jiang and Conrath ontological distance measure can be seen if Lin's measure is converted into a distance by subtracting it from 1, since it is normalized to the [0, 1] range [Cross 2009]:

\[ \mathrm{dist}_{Lin}(c_1, c_2) = 1 - \mathrm{sim}_{Lin}(c_1, c_2) = 1 - \frac{2\, IC(c_3)}{IC(c_1) + IC(c_2)} = \frac{IC(c_1) + IC(c_2) - 2\, IC(c_3)}{IC(c_1) + IC(c_2)} \tag{3.11} \]

The dist_JC ontological distance measure is simply an unnormalized version of dist_Lin.
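To make the information content measures of this section concrete, the sketch below computes a corpus-based IC and the descendant-based IC of Seco et al., then applies the Resnik, Lin, and Jiang and Conrath formulas. It is an illustration under stated assumptions, not the thesis implementation: the toy ontology `PARENTS`, the invented counts in `COUNTS`, the function names, and the simplification that the corpus consists only of concept mentions are all hypothetical.

```python
import math

# Hypothetical toy ontology (a DAG): each concept maps to its is-a parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a"],
    "e": ["a", "b"],  # e has two parents
}

# Hypothetical corpus counts of direct mentions of each concept.
COUNTS = {"root": 0, "a": 2, "b": 1, "c": 4, "d": 2, "e": 1}


def ancestors(c):
    """All proper ancestors (subsumers) of concept c."""
    seen, stack = set(), list(PARENTS[c])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def descendants(c):
    """All proper descendants of concept c."""
    return {other for other in PARENTS if c in ancestors(other)}


def ic_corpus(c):
    """Eq. (3.4): a concept's frequency includes that of its descendants."""
    freq = COUNTS[c] + sum(COUNTS[d] for d in descendants(c))
    return -math.log(freq / sum(COUNTS.values()))


def ic_ont(c):
    """Eq. (3.5): 1 - log(num_desc(c) + 1) / log(max_ont)."""
    return 1.0 - math.log(len(descendants(c)) + 1) / math.log(len(PARENTS))


def most_informative_subsumer(c1, c2, ic):
    """The common subsumer c3 with the highest information content."""
    common = (ancestors(c1) | {c1}) & (ancestors(c2) | {c2})
    return max(sorted(common), key=ic)


def sim_resnik(c1, c2, ic=ic_ont):
    """Eq. (3.7), with the IC function passed in as a parameter."""
    return ic(most_informative_subsumer(c1, c2, ic))


def sim_lin(c1, c2, ic=ic_ont):
    """Eq. (3.8)."""
    total = ic(c1) + ic(c2)
    # total is 0 only when neither concept carries information (here, the root).
    return 2 * sim_resnik(c1, c2, ic) / total if total > 0 else 1.0


def dist_jc(c1, c2, ic=ic_ont):
    """Eq. (3.9)."""
    return ic(c1) + ic(c2) - 2 * sim_resnik(c1, c2, ic)


def sim_jc(c1, c2, ic=ic_ont):
    """Eq. (3.10)."""
    return 1.0 - 0.5 * dist_jc(c1, c2, ic)


for pair in [("c", "d"), ("b", "e")]:
    print(pair, sim_resnik(*pair), sim_lin(*pair), sim_jc(*pair))
print(sim_resnik("c", "d", ic=ic_corpus))
```

Passing the IC function as an argument mirrors the remark that IC is left unsubscripted in the Lin and Jiang and Conrath formulas: the same similarity code runs with either `ic_ont` or `ic_corpus`.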

3.3 Tversky feature-based Ontological Similarity Measures

How humans judge similarity is an active research area of psychology. One of the most famous models for similarity assessment is Tversky's parameterized ratio model of similarity [Tversky 1977]:

\[ S_{Tversky}(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha\, |X - Y| + \beta\, |Y - X|} \tag{3.12} \]

With \alpha = \beta = 1, S_Tversky becomes the Jaccard index:

\[ S_{jaccard}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \tag{3.11} \]

With \alpha = \beta = 1/2, S_Tversky becomes Dice's coefficient of similarity:

\[ S_{dice}(X, Y) = \frac{2\, |X \cap Y|}{|X| + |Y|} \tag{3.13} \]

With \alpha = 1, \beta = 0, S_Tversky becomes the degree of inclusion for X, that is, the proportion of X overlapping with Y:

\[ S_{inclusion}(X, Y) = \frac{|X \cap Y|}{|X|} \tag{3.14} \]

Similarly, with \alpha = 0, \beta = 1, S_Tversky becomes the degree of inclusion for Y, the proportion of Y overlapping with X.

Using this model, researchers have begun looking at a concept in an ontology as an object with a set of features. There is a wide variety of "features" that may be selected to describe a concept within an ontology. For example, the set of features of a concept could be its set of ancestors. Then a natural ontological similarity measure between two concepts x and y with respective sets of ancestors X and Y would be the application of Tversky's parameterized ratio model of similarity. Another set of features to describe a concept is its set of descendents. Tversky's model is especially flexible in that any set of features describing a concept can be used in determining its similarity to another concept.
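The special cases of the ratio model can be checked numerically. The sketch below is a minimal illustration that assumes set cardinality as the feature measure and names the two weights `alpha` and `beta`; the feature sets `X` and `Y` are hypothetical ancestor sets (each including the concept itself) and do not come from the thesis.

```python
def tversky(x, y, alpha, beta):
    """Tversky's ratio model on feature sets, using set sizes as the measure."""
    x, y = set(x), set(y)
    shared = len(x & y)
    denom = shared + alpha * len(x - y) + beta * len(y - x)
    # Convention chosen here: similarity is 0 when the ratio is undefined.
    return shared / denom if denom else 0.0


# Hypothetical feature sets: the ancestors of two concepts, each including
# the concept itself.
X = {"root", "a", "c"}
Y = {"root", "a", "b", "e"}

jaccard = tversky(X, Y, 1, 1)        # equals |X & Y| / |X | Y|
dice = tversky(X, Y, 0.5, 0.5)       # equals 2 |X & Y| / (|X| + |Y|)
inclusion_x = tversky(X, Y, 1, 0)    # equals |X & Y| / |X|
inclusion_y = tversky(X, Y, 0, 1)    # equals |X & Y| / |Y|

print(jaccard, dice, inclusion_x, inclusion_y)
```

Because the feature sets are ordinary Python sets, the same function applies unchanged if descendants or any other feature set is substituted for the ancestors.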

Researchers [Rodriquez and Egenhofer 2003] applied Tversky's model repeatedly to define a similarity measure between two classes in an ontology. Their ontological similarity measure between entity classes c1 and c2 is a weighted aggregation of Tversky's similarity measure applied to a wide range of feature sets of the concepts, including synonym sets, semantic neighborhoods, and distinguishing features. Distinguishing features are further classified into parts, functions and attributes. The only difference is that the α and β parameters are determined as a function of the depth of the two concepts.

3.4 TaxPac Implementation of Standard Ontological Similarity Measures

TaxPac stands for Taxonomy Package, an experimental mathematics environment for knowledge systems analysis being developed at PNNL [Joslyn and White 2009]. It is a Python platform built as an extension of the NetworkX system for graph analysis developed by the Los Alamos National Laboratory. Its main goal is to use mathematical order theory to express and analyze knowledge bases that can be represented in various graph structures, ranging from digraphs to concept lattices. As part of the PNNL contract work this past summer, this package was extended with the basic standard ontological similarity measures described in the previous sections. These measures are included in the class BoundedDAG, which stands for bounded directed acyclic graph. The implementation of these ontological measures took advantage of the existing TaxPac data structures and classes suitable for representing ontologies and the concepts within them. Using and further extending this TaxPac environment is planned in order to accomplish the goals of this thesis research, which will require experiments and analysis on a wide range of ontological similarity measures.

In the TaxPac environment, the implementation of all the standard path-based measures uses only the distance between two nodes c1 and c2 through a common subsumer c3.

However, because there can be multiple common subsumers (each of the nodes c1 and c2 may have multiple parent nodes), the measures have been parameterized (min, max, ave) to allow selection of the minimum, the maximum or the average over all the similarity values calculated using each of the common subsumers of the two nodes c1 and c2. The standard ontological similarity measures assume that the common subsumer which maximizes the similarity measure should determine the result. The following section discusses newer measures that consider different aggregation approaches; those approaches provided the motivation for this parameterization.

For the information-content-based measures, a major aspect is the method used to calculate the IC value: either an outside corpus is used to determine the IC value probabilistically and assign it as a node weight, or some other node metric or node weighting scheme based on the structure of the ontology graph is used, such as the number of descendants of a node. TaxPac provides both edge-weighting and node-weighting capabilities. In the current implementation and testing, only the method proposed in [Seco et al 2004] has been tested within the TaxPac environment. Other node weighting schemes that have been coded but not tested are presented in Section 6.

From the presentation on information-content-based measures, one can see that all of them essentially draw from the same components: IC(c1), IC(c2) and IC(c3). A standard parameterized method was created that allows the creation of any of the IC measures based on the components selected in the standard IC formula. As with the path-based measures, the key to the standard IC measures is the common subsumer c3.
Because there can be multiple common subsumers, the standard IC ontological similarity measures are also parameterized to allow selection of the minimum, the maximum or the average of all the similarity values calculated using each of the common subsumers of the two nodes c1 and c2.

The following section discusses newer measures that consider different aggregation approaches; those approaches provide the motivation for this parameterization.
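The (min, max, ave) parameterization over common subsumers can be sketched as follows. This is not the TaxPac implementation or its API; the function names, the toy graph and the IC values are hypothetical, and the Lin measure is used only as an example of an IC-based measure built from IC(c1), IC(c2) and IC(c3).

```python
from statistics import mean


def subsumers(node, parents):
    """Return the node together with all of its ancestors."""
    seen = {node}
    stack = [node]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def lin(ic1, ic2, ic3):
    """Lin similarity built from the three IC components."""
    total = ic1 + ic2
    # Assumption: two concepts with no information content are identical.
    return 2.0 * ic3 / total if total else 1.0


AGGREGATORS = {"min": min, "max": max, "ave": mean}


def aggregated_similarity(c1, c2, parents, ic, measure=lin, how="max"):
    """Evaluate the measure for every common subsumer c3, then aggregate."""
    # Assumption: the DAG is bounded, so at least the root is shared.
    common = subsumers(c1, parents) & subsumers(c2, parents)
    values = [measure(ic[c1], ic[c2], ic[c3]) for c3 in common]
    return AGGREGATORS[how](values)


# Hypothetical bounded DAG (child -> parents) with made-up IC values.
parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["A", "B"]}
ic = {"root": 0.0, "A": 0.3, "B": 0.5, "C": 0.9, "D": 0.8}

for how in ("min", "max", "ave"):
    print(how, aggregated_similarity("C", "D", parents, ic, how=how))
```

With `how="max"` the sketch reproduces the standard behavior, in which the common subsumer that maximizes the measure determines the result.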

4. More Recent Proposals for "Novel" Ontological Similarity Measures

The previous section presented the standard approaches to ontological similarity measures. This section describes numerous other proposed measures, many of which have been developed for use in biomedical domains. Biomedical engineering is a unique mix of engineering, medicine and science which emerged early last century. Breakthrough advances in biotechnology have given rise to rapid production of biomedical data [Spasić and Ananiadou 2005] and the creation of a wide variety of ontologies such as MeSH, SNOMED, the ICD family, the Gene Ontology and so on. For example, the Gene Ontology has been used in the assessment of similarity between genes and gene products based on the ontological similarity between the concepts, or GO terms, annotating the genes. The biomedical domain is serving as the primary impetus for the creation of new ontological similarity measures.

4.1 Similarity between Biomedical Concepts

Recently two new ontological similarity measures for biomedical concepts were proposed in [Nguyen and Al-Mubaid 2006] and [Al-Mubaid and Nguyen 2009]. The measure proposed in the second paper is essentially the measure from the first paper incorporated into a measure between two concepts in two different ontologies. These ontologies have "bridge" concepts, i.e., concepts that occur in both ontologies.

The first measure is based on the observation that the lower two nodes are in a hierarchy, the more similar they are. This observation is not new, since some existing path-based and IC ontological similarity measures adjust for the position of the concepts in the ontology based on their lowest (deepest) common ancestor or their most informative common ancestor, respectively. Their proposed method, however, adjusts this depth by subtracting it from the overall depth D of the ontology:

$$D - \mathrm{Depth}(LCS(c_1, c_2)) \tag{4.1}$$

Their proposed similarity measure is then defined as

$$\mathrm{sim}_{NM}(c_1, c_2) = \log_2\bigl((\mathrm{len}(c_1, c_2) - 1) \times (D - \mathrm{Depth}(LCS(c_1, c_2))) + 2\bigr) \tag{4.2}$$

Looking at this measure, one sees that it uses the distance between the two concepts and then increases the distance based on the difference between the greatest depth and the depth of LCS(c1, c2). The greater the depth of LCS(c1, c2), the smaller the increase in the distance between c1 and c2. Therefore, concepts c1 and c2 that have the same distance between them as concepts c3 and c4 but have an LCS of greater depth than that of c3 and c4 will receive a smaller increase in the overall calculated distance being fed into the logarithm function.

One observation is that this method does not have the problem of the Leacock & Chodorow method, namely that when two pairs of concepts have the same path distance but lie at different levels of the ontology, they still have the same proportional ontological similarity, since the maximum depth is the same for the whole ontology. Their measure (actually a distance measure) does use the depth of the deepest common subsumer but again adjusts it by the overall depth of the whole ontology. This adjustment process is the same for each pair of concepts. Although their proposed distance method does improve on Leacock and Chodorow's measure in that it takes into account the depth of the lowest common ancestor of the two concepts, other ontological similarity measures such as the Wu-Palmer measure use the depth of the lowest (deepest) common ancestor without the adjustment of subtracting it from D.

Their experiments report results obtained by applying the proposed method and four other existing methods to biomedical datasets. The average correlation of the proposed method with the judgments of physicians and experts is higher than that of the other similarity methods (except that the Leacock & Chodorow method's correlation with physicians' judgments is slightly higher than that of the proposed method).
However, when one examines the resulting tables, their measure is at best marginally better with respect to correlation with human judgments of similarity. They do not state what kind of correlation measure was used in this analysis. In discussing the results, the authors briefly mention that the Wu and Palmer method is similar to their measure in that it takes into account the depth of the deepest common ancestor of two concepts. Part of this thesis research is to show mathematically the relationships between the newer proposed ontological similarity measures and the standard ones.

In their more recent 2009 paper [Al-Mubaid and Nguyen 2009], they correct their measure into a semantic distance measure, SemDist. In this paper, the authors state that they want to combine both path length and depth of the nodes in their new measure. They incorrectly state, however, "In addition, the measure of Wu and Palmer [Wu and Palmer 1994] uses only depth of concept nodes" [Al-Mubaid and Nguyen 2009]. The measure they propose is the same as in the previous paper but with a few added parameters. It is a path-based measure that uses the depth of the lowest common subsumer, i.e., the one that is deepest in the ontology, and normalizes it by subtracting it from D, the depth of the overall ontology, to define the common specificity as in their first paper:

$$CSpec(c_1, c_2) = D - \mathrm{depth}(LCS(c_1, c_2)) \tag{4.3}$$

Here c3 = LCS(c1, c2) is selected not by the maximum information content but by the maximum depth. They then define a semantic distance between c1 and c2 as

$$SemDist(c_1, c_2) = \log\bigl((path - 1)^{\alpha} \times (CSpec)^{\beta} + k\bigr) \tag{4.4}$$

where path is the shortest path length between the two concept nodes. This SemDist is the same measure as in their previous paper except for the added parameters α, β and k. These parameters are all set to 1 in the experiments described in the paper, so this SemDist measure is what they previously proposed as their semantic similarity measure except that they previously added 2 instead of 1 (k = 1). This SemDist measure is to be used for concepts that occur in the same primary ontology.
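The connection between (4.2) and (4.4) can be made concrete with a short sketch. It assumes that the path length, the depth of the LCS and the overall depth D are already available, and that path is at least 1. The function names and sample numbers are hypothetical, and the logarithm base of (4.4) is taken to be natural since the source does not specify one.

```python
import math


def cspec(depth_lcs, max_depth):
    """Common specificity (4.3): D minus the depth of the LCS."""
    return max_depth - depth_lcs


def sem_dist(path, depth_lcs, max_depth, alpha=1.0, beta=1.0, k=1.0,
             base=math.e):
    """SemDist (4.4); the natural-log default is an assumption."""
    value = (path - 1) ** alpha * cspec(depth_lcs, max_depth) ** beta + k
    return math.log(value, base)


def sim_nm(path, depth_lcs, max_depth):
    """The earlier measure (4.2): base-2 log with alpha = beta = 1, k = 2."""
    return sem_dist(path, depth_lcs, max_depth, k=2.0, base=2)


# Two hypothetical concept pairs with the same path length in an ontology
# of depth D = 10, but with LCS nodes at different depths.
shallow = sim_nm(path=3, depth_lcs=6, max_depth=10)
deep = sim_nm(path=3, depth_lcs=8, max_depth=10)
print("LCS at depth 6:", shallow)
print("LCS at depth 8:", deep)

# The pair with the deeper LCS receives the smaller distance value.
assert deep < shallow
```

When path equals 1 the product term vanishes, so the value reduces to log(k), which is 0 for k = 1 and 1 for the base-2, k = 2 form of (4.2).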
The objective of this paper is also to define similarity measures for concepts that occur in multiple ontologies. The authors define the primary ontology as the ontology with the greatest granularity. What is meant by greatest granularity is not clear, but it appears to be the ontology with the greatest depth. The authors then propose a measure that can be used when concepts are in different ontologies but these ontologies have common "bridge" concepts. Given a primary ontology containing c1, a secondary ontology containing c2, and a set of bridge concepts bridge_i that occur in both ontologies, the formulas all remain the same except that bridge_i is used as follows:

$$CSpec_i(c_1, c_2) = D - \mathrm{depth}(LCS(c_1, bridge_i)) \tag{4.5}$$

$$SemDist_i(c_1, c_2) = \log\bigl((path_i - 1)^{\alpha} \times (CSpec_i)^{\beta} + k\bigr) \tag{4.6}$$

$$SemDist(c_1, c_2) = \min_i \, SemDist_i(c_1, c_2) \tag{4.7}$$

The path distance between c1 and c2 is calculated as the sum of c1's distance to the bridge and c2's distance to the bridge. The distance of c2 to the bridge is scaled by the pathrate, calculated as the ratio (2D_1 - 1)/(2D_2 - 1), where D_1 is the overall depth of the primary ontology and D_2 is the overall depth of the secondary ontology. The bridge concept in the primary ontology also serves to determine its lowest common subsumer with concept c1 in the primary ontology.

Numerous other rules are proposed for finding ontological similarity between concepts when they are in secondary ontologies. One case is when the concepts are both in the same secondary ontology. This case uses the same formula for SemDist, but Path(c1, c2) in the secondary ontology is scaled by the pathrate and CSpec(c1, c2) in the secondary ontology is scaled by (D_1 - 1)/(D_2 - 1), where D_1 is the overall depth of the primary ontology and D_2 is the overall depth of the secondary ontology. Their rationale is that the semantic distance between the concepts in the secondary ontology must be converted into the scale of the primary ontology. The other case occurs when the two concepts c1 and c2 are in different secondary ontologies and neither concept exists in the primary ontology. One of the secondary ontologies then temporarily acts as the primary ontology. Their discussion of this case is not clear.
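The bridge-based calculation in (4.5) to (4.7) can be sketched as below. This follows only the description given here, not the original paper's implementation. It assumes that D in (4.5) is the depth D_1 of the primary ontology, that each scaled path is at least 1, and that a natural logarithm is used. All names and numbers are hypothetical.

```python
import math


def path_rate(d_primary, d_secondary):
    """Scaling ratio (2*D1 - 1) / (2*D2 - 1) for secondary-ontology paths."""
    return (2 * d_primary - 1) / (2 * d_secondary - 1)


def cross_sem_dist(bridges, d_primary, d_secondary,
                   alpha=1.0, beta=1.0, k=1.0):
    """Minimum SemDist_i over all bridge concepts, as in (4.7).

    Each bridge is a tuple (dist_c1, dist_c2, depth_lcs):
      dist_c1   path length from c1 to the bridge in the primary ontology
      dist_c2   path length from c2 to the bridge in the secondary ontology
      depth_lcs depth of LCS(c1, bridge) in the primary ontology
    """
    rate = path_rate(d_primary, d_secondary)
    distances = []
    for dist_c1, dist_c2, depth_lcs in bridges:
        # Only the secondary-ontology leg of the path is rescaled.
        path = dist_c1 + rate * dist_c2
        # Assumption: D in (4.5) is the primary ontology depth.
        cspec = d_primary - depth_lcs
        distances.append(math.log((path - 1) ** alpha * cspec ** beta + k))
    return min(distances)


# Hypothetical setting: primary depth 10, secondary depth 5, two bridges.
bridges = [(2, 1, 6), (1, 1, 8)]
print(cross_sem_dist(bridges, d_primary=10, d_secondary=5))
```

Taking the minimum means the bridge that yields the closest connection between the two concepts decides the final distance.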

For calculating SemDist between concepts in multiple ontologies, they recommend that the ontology with the greatest granularity be selected as the primary ontology. If a concept occurs in multiple secondary ontologies, they recommend selecting the ontology that has the most overlap of concepts with the primary ontology.

The authors also develop a set of experiments using two vocabularies from the UMLS, SNOMED-CT and MeSH, together with the WordNet 2.0 ontology and several different datasets based on previous experiments that evaluate measures by their correlation with human judgments of similarity between concepts in the vocabulary. Their experiments use WordNet 2.0 as the primary ontology and MeSH and SNOMED-CT as the secondary ontologies. One aspect that is not clear is the results reported for two other measures. No explanation is given of how the results are calculated for the Leacock and Chodorow measure and the Wu and Palmer measure. These two measures are defined for a single ontology. It is not clear how they are adapted for multiple ontologies in order to produce the numbers provided in the tables.

4.2 Semantic Relatedness Measure Using Object Properties in an Ontology

In [Mazuel and Sabouret 2008] a semantic relatedness measure is proposed that makes use of the Hirst & St-Onge patterns for semantically correct paths [Hirst and St-Onge 1998] and the information-theoretic paradigm introduced in [Resnik 1995]. In all of the previous discussions of ontological similarity measures, the type of relationship used to link concepts to one another is the is-a or subsumption relationship or the part-of relationship. In ontologies where other relationships exist between concepts, it might be the case that ontological similarity is low but the concepts are still highly related.
Although most measures focus on the hierarchical structuring relationships, Hirst and St-Onge proposed a semantic relatedness measure that requires certain patterns or changes in direction to hold in order to calculate the semantic relatedness between two concepts. In [Mazuel and Sabouret 2008] a relatedness measure is proposed that integrates the use of other kinds of links in determining path-based measures.

Their discussion contains some errors. For example, they state: "The first node-based similarity measure, proposed by Resnik in [Resnik 1995], is defined by the information content of the closest common parent (ccp) of the two concept c1 and c2." This statement is incorrect. Closeness has nothing to do with which common parent is selected. It is the most informative common ancestor, i.e., the one with the highest IC value, that should be selected.

The objective of the authors is to extend to non-hierarchical links the assumption that two different hierarchical edges do not carry the same information content. There are two situations: a single-relation path and a mixed-relation path. For a single-relation path, the Jiang & Conrath method is used if the path has only "is-a" (upward) and "includes" (downward) relations, although the authors state that the upward path distance has to be calculated separately from the downward path distance and the two added together. This approach is simply the same as

$$\mathrm{dist}_{JC}(c_1, c_2) = IC(c_1) + IC(c_2) - 2\,IC(c_3) \tag{4.8}$$

They use the method of calculating IC given in [Seco et al. 2004], defined above as IC_ont(c). For paths that use relations which are not hierarchical, a static strength TC_X is associated with each relation type X, and the path weight is calculated as:

$$W(path_X(x, y)) = TC_X \tag{4.9}$$

A mixed-relation path from concept c1 to concept c2 can be factorized into an ordered set of n single-relation sub-paths, and the single-relation path weights are then added together. They define the minimal factorization T_min(path(c1, c2)) as the factorization which minimizes the value n. The weight of the mixed path (c1, c2) is then defined as the sum of the weights of all sub-paths of T_min. The final distance between two concepts is defined as

$$\mathrm{dist}(c_1, c_2) = \min_{p \,:\, HSO(p)} W(p) \tag{4.10}$$

where HSO(p) allows only paths that are semantically correct according to the rules of Hirst and St-Onge to be used. Since this is a distance measure, the authors convert it to a similarity measure by subtracting it from the greatest distance:

$$\mathrm{rel}(c_1, c_2) = 2\,IC_{max} - \mathrm{dist}(c_1, c_2) \tag{4.10}$$

Tests were run on the Miller & Charles data [Miller and Charles 1991] and the WordSimilarity-353 data set using the WordNet ontology (only the noun part, which is the standard approach). Their experiments show that their measure has a higher Pearson correlation with human similarity judgments than any of the Rada, Resnik, Lin, Jiang & Conrath, and Hirst & St-Onge measures.

4.3 Plethora of Similarity Measures in Bioinformatics

In [Pesquita et al 2009] an overview of the wide variety of semantic similarity measures is presented. In this section, some of these measures are presented in order to illustrate the proliferation of such measures and to argue for the development of a framework for comparing such measures mathematically and experimentally without using correlation with some gold standard of similarity assessment. In this paper, the primary ontology with which these similarity measures have been used is the Gene Ontology. The performance of the semantic similarity measures is assessed on how well they can be used to determine the similarity of genes or gene products that are annotated using GO terms. The similarity between two genes or gene products is determined as an aggregation of the similarities between their sets of GO term annotations. The performance of the ontological similarity measures is then

Analysis and visualization of protein-protein interactions. Olga Vitek Assistant Professor Statistics and Computer Science

Analysis and visualization of protein-protein interactions. Olga Vitek Assistant Professor Statistics and Computer Science 1 Analysis and visualization of protein-protein interactions Olga Vitek Assistant Professor Statistics and Computer Science 2 Outline 1. Protein-protein interactions 2. Using graph structures to study

More information

Semantic Similarity and Relatedness

Semantic Similarity and Relatedness Semantic Relatedness Semantic Similarity and Relatedness (Based on Budanitsky, Hirst 2006 and Chapter 20 of Jurafsky/Martin 2 nd. Ed. - Most figures taken from either source.) Many applications require

More information

Francisco M. Couto Mário J. Silva Pedro Coutinho

Francisco M. Couto Mário J. Silva Pedro Coutinho Francisco M. Couto Mário J. Silva Pedro Coutinho DI FCUL TR 03 29 Departamento de Informática Faculdade de Ciências da Universidade de Lisboa Campo Grande, 1749 016 Lisboa Portugal Technical reports are

More information

Review Article From Ontology to Semantic Similarity: Calculation of Ontology-Based Semantic Similarity

Review Article From Ontology to Semantic Similarity: Calculation of Ontology-Based Semantic Similarity Hindawi Publishing Corporation The Scientific World Journal Volume 2013, Article ID 793091, 11 pages http://dx.doi.org/10.1155/2013/793091 Review Article From Ontology to Semantic Similarity: Calculation

More information

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms Bioinformatics Advance Access published March 7, 2007 The Author (2007). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

More information

A Study of Correlations between the Definition and Application of the Gene Ontology

A Study of Correlations between the Definition and Application of the Gene Ontology University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Theses, Dissertations, & Student Research in Computer Electronics & Engineering Electrical & Computer Engineering, Department

More information

A set theoretic view of the ISA hierarchy

A set theoretic view of the ISA hierarchy Loughborough University Institutional Repository A set theoretic view of the ISA hierarchy This item was submitted to Loughborough University's Institutional Repository by the/an author. Citation: CHEUNG,

More information

Event Operators: Formalization, Algorithms, and Implementation Using Interval- Based Semantics

Event Operators: Formalization, Algorithms, and Implementation Using Interval- Based Semantics Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019 Event Operators: Formalization, Algorithms, and Implementation Using Interval- Based Semantics Raman

More information

Functional Characterization and Topological Modularity of Molecular Interaction Networks

Functional Characterization and Topological Modularity of Molecular Interaction Networks Functional Characterization and Topological Modularity of Molecular Interaction Networks Jayesh Pandey 1 Mehmet Koyutürk 2 Ananth Grama 1 1 Department of Computer Science Purdue University 2 Department

More information

Toponym Disambiguation using Ontology-based Semantic Similarity

Toponym Disambiguation using Ontology-based Semantic Similarity Toponym Disambiguation using Ontology-based Semantic Similarity David S Batista 1, João D Ferreira 2, Francisco M Couto 2, and Mário J Silva 1 1 IST/INESC-ID Lisbon, Portugal {dsbatista,msilva}@inesc-id.pt

More information

The OntoNL Semantic Relatedness Measure for OWL Ontologies

The OntoNL Semantic Relatedness Measure for OWL Ontologies The OntoNL Semantic Relatedness Measure for OWL Ontologies Anastasia Karanastasi and Stavros hristodoulakis Laboratory of Distributed Multimedia Information Systems and Applications Technical University

More information

Measuring Semantic Similarity between Gene Ontology Terms

Measuring Semantic Similarity between Gene Ontology Terms Measuring Semantic Similarity between Gene Ontology Terms Francisco M. Couto a Mário J. Silva a Pedro M. Coutinho b a Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Portugal

More information

Similarity for Conceptual Querying

Similarity for Conceptual Querying Similarity for Conceptual Querying Troels Andreasen, Henrik Bulskov, and Rasmus Knappe Department of Computer Science, Roskilde University, P.O. Box 260, DK-4000 Roskilde, Denmark {troels,bulskov,knappe}@ruc.dk

More information

John Pavlopoulos and Ion Androutsopoulos NLP Group, Department of Informatics Athens University of Economics and Business, Greece

John Pavlopoulos and Ion Androutsopoulos NLP Group, Department of Informatics Athens University of Economics and Business, Greece John Pavlopoulos and Ion Androutsopoulos NLP Group, Department of Informatics Athens University of Economics and Business, Greece http://nlp.cs.aueb.gr/ A laptop with great design, but the service was

More information

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

More information

Calculating Semantic Relatedness with GermaNet

Calculating Semantic Relatedness with GermaNet Organismus, Lebewesen organism, being Katze cat... Haustier pet Hund dog...... Baum tree Calculating Semantic Relatedness with GermaNet Verena Henrich, Düsseldorf, 19. Februar 2015 Semantic Relatedness

More information

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón What is GO? The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in

More information

Information-theoretic and Set-theoretic Similarity

Information-theoretic and Set-theoretic Similarity Information-theoretic and Set-theoretic Similarity Luca Cazzanti Applied Physics Lab University of Washington Seattle, WA 98195, USA Email: luca@apl.washington.edu Maya R. Gupta Department of Electrical

More information

Exploring Spatial Relationships for Knowledge Discovery in Spatial Data

Exploring Spatial Relationships for Knowledge Discovery in Spatial Data 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Exploring Spatial Relationships for Knowledge Discovery in Spatial Norazwin Buang

More information

Using C-OWL for the Alignment and Merging of Medical Ontologies

Using C-OWL for the Alignment and Merging of Medical Ontologies Using C-OWL for the Alignment and Merging of Medical Ontologies Heiner Stuckenschmidt 1, Frank van Harmelen 1 Paolo Bouquet 2,3, Fausto Giunchiglia 2,3, Luciano Serafini 3 1 Vrije Universiteit Amsterdam

More information

WEST: WEIGHTED-EDGE BASED SIMILARITY MEASUREMENT TOOLS FOR WORD SEMANTICS

WEST: WEIGHTED-EDGE BASED SIMILARITY MEASUREMENT TOOLS FOR WORD SEMANTICS WEST: WEIGHTED-EDGE BASED SIMILARITY MEASUREMENT TOOLS FOR WORD SEMANTICS Liang Dong, Pradip K. Srimani, James Z. Wang School of Computing, Clemson University Web Intelligence 2010, September 1, 2010 Outline

More information

Hierachical Name Entity Recognition

Hierachical Name Entity Recognition Hierachical Name Entity Recognition Dakan Wang, Yu Wu Mentor: David Mcclosky, Mihai Surdeanu March 10, 2011 1 Introduction In this project, we investigte the hierarchical name entity recognition problem

More information

A Multiobjective GO based Approach to Protein Complex Detection

A Multiobjective GO based Approach to Protein Complex Detection Available online at www.sciencedirect.com Procedia Technology 4 (2012 ) 555 560 C3IT-2012 A Multiobjective GO based Approach to Protein Complex Detection Sumanta Ray a, Moumita De b, Anirban Mukhopadhyay

More information

Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction

Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction Alexander Panchenko alexander.panchenko@student.uclouvain.be Université catholique de Louvain & Bauman Moscow State

More information

Experimental designs for multiple responses with different models

Experimental designs for multiple responses with different models Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:

More information

ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database

ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database Dina Vishnyakova 1,2, 4, *, Julien Gobeill 1,3,4, Emilie Pasche 1,2,3,4 and Patrick Ruch

More information

Test of Complete Spatial Randomness on Networks

Test of Complete Spatial Randomness on Networks Test of Complete Spatial Randomness on Networks A PROJECT SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Xinyue Chang IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

More information

A Game-Theoretic Approach to Graph Transduction: An Experimental Study

A Game-Theoretic Approach to Graph Transduction: An Experimental Study MSc (ex D.M. 270/2004) in Computer Science Dissertation A Game-Theoretic Approach to Graph Transduction: An Experimental Study Supervisor Prof. Marcello Pelillo Candidate Michele Schiavinato Id 810469

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:

More information

Discovering molecular pathways from protein interaction and ge

Discovering molecular pathways from protein interaction and ge Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why

More information

Paper presented at the 9th AGILE Conference on Geographic Information Science, Visegrád, Hungary,

Paper presented at the 9th AGILE Conference on Geographic Information Science, Visegrád, Hungary, 220 A Framework for Intensional and Extensional Integration of Geographic Ontologies Eleni Tomai 1 and Poulicos Prastacos 2 1 Research Assistant, 2 Research Director - Institute of Applied and Computational

More information

2 : Directed GMs: Bayesian Networks

2 : Directed GMs: Bayesian Networks 10-708: Probabilistic Graphical Models, Spring 2015 2 : Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Yi Cheng, Cong Lu 1 Notation Here the notations used in this course are defined:

More information

Least Common Subsumers and Most Specific Concepts in a Description Logic with Existential Restrictions and Terminological Cycles*

Least Common Subsumers and Most Specific Concepts in a Description Logic with Existential Restrictions and Terminological Cycles* Least Common Subsumers and Most Specific Concepts in a Description Logic with Existential Restrictions and Terminological Cycles* Franz Baader Theoretical Computer Science TU Dresden D-01062 Dresden, Germany

More information
