Chemical Ontologies. ChemAxon UGM, May 23, 2012


Chemical Ontologies
ChemAxon UGM, May 23, 2012
OntoChem GmbH
Heinrich-Damerow-Str. 4, 06120 Halle (Saale), Germany
Tel. +49 345 4780472, Fax: +49 345 4780471
mail: info(at)ontochem.com

Chemical Ontologies - Why?
  Chemistry has a long history in specialized databases:
    - Compound databases
    - Reaction databases
    - Structure-Activity Relationships (SAR)
  Open needs:
    - Smart chemistry search for non-chemists
    - Structure search over text documents
    - Searches over patents
    - Complex high-level chemistry-related queries

Examples
  [Figure: relationships between compound classes and terpenes; terpenes in OCMiner: 724,137]

Chemical Ontologies - Why?
  Bringing chemistry to non-chemists!
    - Search for steroids and find ecdysterone
    - Resolve Markush structures: 1-cycloalkyl isoquinolines
  High-level queries:
    - What diseases can be treated with terpenes?
    - Which proteins are inhibited by terpenes?
    - Which steroids are found in plants?
    - Which compound classes have antibacterial properties?
    - Drug repurposing: alternative uses of my compound?

Chemical Ontologies Status Today
  PubMed (MeSH)
    - PubMed assigns compounds to classes
    - 9,097 classes and compounds
    - 58,822 synonym terms
    - Manual assignment
  ChEBI (Chemical Entities of Biological Interest)
    - ChEBI assigns compounds to classes
    - 30,935 classes and compounds
    - 182,608 synonym terms
    - Manual assignment
  Problems of manual creation and assignment
    - Redundant relationships
    - Homonyms in the hierarchy path
    - Missing relationships
    - Wrong relationships

Chemical Ontologies Status Today
  Ontology problems (e.g. MeSH, ChEBI)
  Redundant relationships
    - Amino Acids <- Threonine
    - Amino Acids <- Essential Amino Acids <- Threonine
    - Amino Acids <- Neutral Amino Acids <- Threonine
  Homonyms in the hierarchy path
    - Insulin (regular insulin) <- regular pork insulin (regular insulin)
  Missing relationships
    - Histidine -> imidazole, carboxylic acid, amine
    - Steroids -> terpenes
  Wrong relationships
    - Chlorohydrocarbon <- 2,2-bis(4-chlorophenyl)ethanol
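The redundant-relationship problem can be checked mechanically. The sketch below is an illustration only, not OntoChem's implementation: it assumes the ontology is available as a list of (child, parent) is_a pairs, and the function name find_redundant_links is hypothetical. A direct link counts as redundant when another direct parent of the same child already leads to the same ancestor.

```python
from collections import defaultdict

def find_redundant_links(is_a_edges):
    """Return direct child->parent links that are also implied by a longer is_a path."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)

    def ancestors(node, seen=None):
        # Collect every class reachable by following is_a links upward.
        seen = set() if seen is None else seen
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                ancestors(p, seen)
        return seen

    redundant = []
    for child, parent in is_a_edges:
        # Redundant if some other direct parent already reaches `parent`.
        for other in parents[child] - {parent}:
            if parent in ancestors(other):
                redundant.append((child, parent))
                break
    return sorted(redundant)

# The threonine example from the slide, written as (child, parent) pairs.
edges = [
    ("Threonine", "Amino Acids"),
    ("Threonine", "Essential Amino Acids"),
    ("Threonine", "Neutral Amino Acids"),
    ("Essential Amino Acids", "Amino Acids"),
    ("Neutral Amino Acids", "Amino Acids"),
]
print(find_redundant_links(edges))
```

For the threonine example only the direct link Threonine -> Amino Acids is reported, since both intermediate classes already imply it.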

Chemical Ontologies Status Today
  Conceptual problems: chemistry class terms in life science
    - derivative, backbone, scaffold, compound class, compound substructure, superstructure, tautomer, diastereomer, enantiomer, epimer
  ChEBI relationship types
    - is_conjugate_base_of, is_conjugate_acid_of
    - is_tautomer_of, has_parent_hydride, is_enantiomer_of
    - has_functional_parent, is_substituent_group_from, has_role
  A common understanding of terms is needed!

Chemical Ontology Editor
  Goal: create an ontology editor for chemistry that supports
    - Classification of chemical compounds into structure classes, e.g. steroids, terpenes, alcohols etc.
    - Automated assignment of compound classes to compounds in structure files (SDF, SMI or others) or in databases
    - Quality checks: loop detection, redundant links, homonyms, chemical validity and accessibility
    - Named entity recognition in text: recognize compounds and compound classes, e.g. propane, propanes, propane derivatives and propyl substituent
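Of the quality checks listed, loop detection is the easiest to illustrate. The following is a minimal sketch under stated assumptions: the hierarchy is given as a dict mapping each class to its direct is_a parents, the function name find_cycle is hypothetical, and the second toy hierarchy is deliberately broken for demonstration. It uses a standard depth-first search with three node states and is not taken from the editor itself.

```python
def find_cycle(parents):
    """Return one is_a loop as a list of class names, or None if there is no loop."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for p in parents.get(node, ()):
            s = state.get(p, WHITE)
            if s == GREY:
                # p is already on the current path: the loop closes here.
                return path[path.index(p):] + [p]
            if s == WHITE:
                found = visit(p)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in list(parents):
        if state.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None

ok = {"steroids": ["triterpenes"], "triterpenes": ["terpenes"]}
broken = {"steroids": ["triterpenes"], "triterpenes": ["terpenes"], "terpenes": ["steroids"]}
print(find_cycle(ok))
print(find_cycle(broken))
```

The first call finds no loop; the second returns the path steroids, triterpenes, terpenes, steroids.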

Chemical Ontologies Design Principles
  is_a compound class
    - is_a cycloalkane, cycloalkanes: defined by common (biological) understanding
    - not is_a cycloalkane
  is_a compound derivative class
    - cycloalkane derivatives (contains cycloalkane): cycloalkane derivatives, scaffold, substituent, group, moiety

Chemical Ontologies Design Principles
  is_a class type
    - is_a androstane, not is_a pregnane
    - is_a pregnane, not is_a androstane

Chemical Ontologies Design Principles
  is_a class type
    organic compounds > lipids > prenol lipids > terpenes > triterpenes > steroids > androstanes, pregnanes
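The original diagram for this slide is lost. If the class names are read as one chain from organic compounds down to steroids, with androstanes and pregnanes as the two children (an assumption about the lost diagram), the hierarchy can be sketched as a parent table. The names PARENT, lineage and members are hypothetical.

```python
# Assumed reading of the slide: each class has exactly one is_a parent.
PARENT = {
    "androstanes": "steroids",
    "pregnanes": "steroids",
    "steroids": "triterpenes",
    "triterpenes": "terpenes",
    "terpenes": "prenol lipids",
    "prenol lipids": "lipids",
    "lipids": "organic compounds",
}

def lineage(cls):
    """Return the class followed by all of its ancestors, nearest first."""
    path = [cls]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def members(cls):
    """Return the class itself plus every class below it in the hierarchy."""
    all_classes = set(PARENT) | set(PARENT.values())
    return sorted(c for c in all_classes if cls in lineage(c))

print(lineage("androstanes"))
print(members("steroids"))
```

Expanding a query term with members is what lets a search for steroids also retrieve compounds annotated only as androstanes or pregnanes.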

SMILES
  Compounds have SMILES
    - SMILES = Simplified Molecular Input Line Entry System
    - structures as ASCII strings
    - SMILES: O=C1NS(=O)(=O)C2=CC=CC=C12
    - human readable
    - 2 symbol types: atoms (C, O, N, S) and bonds (-, =, #, :)
  SMILES for describing compounds
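To make the symbol types concrete, here is a small tokenizer sketch for the organic subset of SMILES. It is an illustration and not a full parser: it does not validate ring closures or valences. Besides atoms and bonds it also separates ring-closure digits and branch parentheses, which the example string uses. The names TOKEN, tokenize and atom_counts are hypothetical.

```python
import re
from collections import Counter

# One named group per token kind; order matters (Cl/Br before single letters).
TOKEN = re.compile(
    r"(?P<bracket>\[[^\]]+\])"
    r"|(?P<atom>Cl|Br|[BCNOSPFI]|[bcnosp])"
    r"|(?P<bond>[-=#:/\\])"
    r"|(?P<ring>%\d{2}|\d)"
    r"|(?P<branch>[()])"
    r"|(?P<dot>\.)"
)

def tokenize(smiles):
    """Split a SMILES string into (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def atom_counts(smiles):
    """Count organic-subset atoms by element symbol (hydrogens are implicit)."""
    return Counter(text for kind, text in tokenize(smiles) if kind == "atom")

smiles = "O=C1NS(=O)(=O)C2=CC=CC=C12"   # the example string from the slide
print(atom_counts(smiles))
print([text for kind, text in tokenize(smiles) if kind == "ring"])
```

For the slide's example this gives 7 carbon, 3 oxygen, 1 nitrogen and 1 sulfur atom, and the ring-closure digits 1, 2, 1, 2.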

SMARTS
  Compound classes may have SMARTS
    - SMARTS = SMILES Arbitrary Target Specification
    - molecular pattern
    - SMARTS: [#6]1[#6][#6][#6]2[!#6][C,N]C[#6]2[#6]1
    - atom lists and negated atoms ([C,N], [!C])
    - bond lists: any, single OR double, double OR aromatic
  Compound classes may have no SMARTS of their own, but only children with SMARTS, e.g. lipids, terpenes
  SMARTS for describing compound classes
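The atom-list idea can be shown for a single atom without a cheminformatics toolkit. The sketch below is heavily simplified and handles only the primitives that appear in the slide's pattern: element symbols, atomic numbers (#n), negation (!) and comma-separated alternatives. Real SMARTS matching works on whole molecular graphs and needs a proper toolkit; the name matches_atom and the small ATOMIC_NUMBER table are assumptions made for illustration.

```python
# Only the elements needed for the examples below.
ATOMIC_NUMBER = {"C": 6, "N": 7, "O": 8, "S": 16}

def matches_atom(expr, symbol, aromatic=False):
    """Test one atom against a simplified SMARTS atom expression such as '[C,N]' or '[!#6]'."""
    body = expr.strip("[]")

    def primitive(p):
        negate = p.startswith("!")
        if negate:
            p = p[1:]
        if p.startswith("#"):
            # '#6' matches by atomic number, aromatic or not.
            hit = ATOMIC_NUMBER[symbol] == int(p[1:])
        elif p.islower():
            # Lowercase symbol: aromatic atom of that element.
            hit = aromatic and symbol.lower() == p
        else:
            # Uppercase symbol: aliphatic atom of that element.
            hit = (not aromatic) and symbol == p
        return hit != negate

    # A comma-separated list is a logical OR over its primitives.
    return any(primitive(p) for p in body.split(","))

print(matches_atom("[C,N]", "N"))
print(matches_atom("[!#6]", "O"))
print(matches_atom("[!#6]", "C"))
```

In the slide's pattern, [!#6] therefore accepts any non-carbon atom at that ring position, and [C,N] accepts an aliphatic carbon or nitrogen.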

Chemical Ontologies Design Principles
  Class definitions use SMARTS logic, e.g. androstanes: SMARTS patterns combined with AND / OR / AND [structure diagram not reproduced]
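How SMARTS patterns might be combined with AND and OR can be sketched as a small rule evaluator. This is an assumption about the general shape of such logic, not the actual androstane definition, which was a diagram and is not recoverable from the text. The pattern names A to D are placeholders, and the set of matched patterns is assumed to come from a separate substructure matcher.

```python
def evaluate(rule, matched):
    """Evaluate a nested ('AND' | 'OR' | 'NOT', ...) rule against the set of matched pattern names."""
    if isinstance(rule, str):
        # A leaf is the name of one SMARTS pattern.
        return rule in matched
    op, *args = rule
    if op == "AND":
        return all(evaluate(a, matched) for a in args)
    if op == "OR":
        return any(evaluate(a, matched) for a in args)
    if op == "NOT":
        return not evaluate(args[0], matched)
    raise ValueError(f"unknown operator {op!r}")

# Placeholder rule with the AND / OR / AND shape: (A AND B) OR (C AND D).
RULE = ("OR", ("AND", "A", "B"), ("AND", "C", "D"))

print(evaluate(RULE, {"A", "B"}))
print(evaluate(RULE, {"A", "C"}))
```

Keeping the rule as data rather than code means a class definition can be stored alongside the ontology term and checked or edited without touching the matcher.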

Chemical Ontologies Editor - SODIAC
    - reads SMILES, SDF, OWL and OBO files
    - writes SMILES, SDF and OBO files
    - chemistry ontologies based on SMARTS
    - integrates with name2structure tools
    - allows editing and exporting any OBO ontology
    - integrity and ontology checking
    - structure checking
    - assigns compound classes in files or databases

Examples
  PubChem: 37 million compounds
    - currently 337 million compound class annotations found with SODIAC
  Chemistry in Medline: 92,813 chemicals found in about 19 million abstracts
    - 1,225,000 times water
    - 1,191,000 times calcium
    - 756,800 times glucose
    - 353,000 times cholesterol
    - natural products: the largest compound class with interesting data

Examples
  Annotation of classes and compounds in text
  [Figure: annotated text; legend labels: compound group; anaphora; compound; compound class; OPSIN compound; dictionary; ChemAxon compound]

Examples: search terms

  PubMed term       PubMed     OCMiner term             OCMiner
  alkanes           51,992     alkanes                  108,164
  heterocyclic      24,711     heterocyclic compounds   3,124,129
  spiro compounds   33,528     spiro compounds          157,396
  terpenes          218,433    terpenes                 724,137

Thanks for the attention! Try it at www.ocminer.com