Tautomerism in chemical information management systems

Tautomerism in chemical information management systems Dr. Wendy A. Warr http://www.warr.com

Tautomerism in chemical information management systems Author: Wendy A. Warr DOI: 10.1007/s10822-010-9338-4 Perspectives Issue Devoted to Tautomerism in Molecular Design Edited by Yvonne Martin

Chemical Information Aspects
Registration procedures
Storage of tautomers
Exact and substructure search
Depiction of results

Software and Database Vendors
Accelrys, ACD/Labs, Beilstein/Reaxys, CambridgeSoft, CAS, CCDC, CCG, ChemAxon, ChemoSoft, ChemSpider, CWM Global Search, Daylight, Dialog, IDBS, InfoChem, InhibOx, John Wiley & Sons, Molecular Networks, NCI/CADD, OpenEye, Thieme, PubChem, Questel, Schrödinger, SciTouch, Symyx, Thomson Reuters, Xemistry (CACTVS)

Not Included
ABCD (J&J), BioRad (KnowItAll), CDK, eMolecules, SimBioSys, Tripos, ZINC

Chemical Structure Representation

Morgan Algorithm
Morgan, H. L. The generation of a unique machine description for chemical structures - a technique developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5(2), 107-113.

CTfile

SMILES
CC1=CC(Br)CCC1

SMILES
OpenEye canonical SMILES
Daylight canonical SMILES
SciTouch canonical SMILES
ChemAxon canonical SMILES

IUPAC International Chemical Identifier (InChI)
InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
InChIKey=RYYVLZVUVIJVGH-UHFFFAOYSA-N
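The identifier on this slide is built from slash-separated layers, and the InChIKey from three hyphen-separated blocks. A minimal Python sketch of pulling them apart is given here for orientation only. The function names `split_inchi` and `split_inchikey` are hypothetical, and the sketch assumes a standard InChI that has a formula layer and in which no layer tag occurs twice (a repeated tag would overwrite the earlier one).

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula, and one-letter-tagged layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        # Later layers start with a one-letter tag, e.g. c = connections, h = hydrogens.
        layers[part[0]] = part[1:]
    return layers


def split_inchikey(key: str) -> tuple:
    """Split an InChIKey into its 14-character, 10-character, and 1-character blocks."""
    first, second, protonation = key.split("-")
    return first, second, protonation


example_inchi = "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
example_key = "RYYVLZVUVIJVGH-UHFFFAOYSA-N"

layers = split_inchi(example_inchi)
blocks = split_inchikey(example_key)
print(layers["formula"], layers["h"], [len(b) for b in blocks])
```

Splitting on the tags rather than on fixed positions keeps the sketch usable for InChIs that carry extra layers such as charge or stereochemistry.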

NCI/CADD Identifiers (CACTVS Hashcodes)
9850FD9F9E2B4E25-FICTS-01-57
9850FD9F9E2B4E25-FICuS-01-78
9850FD9F9E2B4E25-uuuuu-01-27

Definition of Tautomerism
M=Q-ZH ⇌ HM-Q=Z
Q = C, N, S, P, Sb, As, Se, Te, Br, Cl or I
M, Z = trivalent N, bivalent O, S, Se or Te [Either M or Z = C]
H = H, D, T [or + or -]
Extended system, ring/chain, etc.

Straightforward
1,3 shift
1,5 shift
1,7 shift

More Complex [structure drawings 1-4 not reproduced]

Degree of Unsaturation

Ring Opening

Fluxional structures

Mesomers: different NEMA Keys, same InChIKey
NEMA Key=6P1SUP7NENNHV4V61WRZP5S2ES8NZF
InChI=1S/C16H18N3S.ClH/c1-18(2)11-5-7-13-15(9-11)20-16-10-12(19(3)4)6-8-14(16)17-13;/h5-10H,1-4H3;1H/q+1;/p-1
NEMA Key=CKGEHDBX4KZPW3VV6DXTVM5BB689GB
InChI=1S/C16H18N3S.ClH/c1-18(2)11-5-7-13-15(9-11)20-16-10-12(19(3)4)6-8-14(16)17-13;/h5-10H,1-4H3;1H/q+1;/p-1
InChIKey=CXKWCBBOMKCUKX-UHFFFAOYSA-M

Tautomers: different NEMA Keys, same InChIKey
NEMA Key=CU3YSHT7DX8KUTKGRNS5GH3B4UQBFA
InChI=1S/C10H13N5O3/c11-8-7-9(14-10(17)13-8)15(4-12-7)6-2-1-5(3-16)18-6/h4-6,16H,1-3H2,(H3,11,13,14,17)/t5-,6+/m0/s1
NEMA Key=CTDBHWQW8CQJHC3S5AH6X4QJWAVMKD
InChI=1S/C10H13N5O3/c11-8-7-9(14-10(17)13-8)15(4-12-7)6-2-1-5(3-16)18-6/h4-6,16H,1-3H2,(H3,11,13,14,17)/t5-,6+/m0/s1
InChIKey=KITPKMKMNZXFDK-NTSWFWBYSA-N
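One reason both tautomeric drawings collapse to the same InChI is visible in the hydrogen layer: the parenthesised group (H3,11,13,14,17) records three mobile hydrogens shared among atoms 11, 13, 14 and 17 instead of fixing them on particular atoms. The following Python sketch extracts such groups. The helper name `mobile_h_groups` and the regular expression are illustrative assumptions, not part of any InChI library, and only the first hydrogen layer is examined.

```python
import re

# Matches groups such as "(H3,11,13,14,17)" or "(H,1,2)"; an optional "-" marks a shared charge.
MOBILE_H = re.compile(r"\(H(\d*)-?((?:,\d+)+)\)")


def mobile_h_groups(inchi: str) -> list:
    """Return (hydrogen_count, atom_numbers) for each mobile-H group in the /h layer."""
    parts = inchi.split("/")
    h_layer = next((p[1:] for p in parts[2:] if p.startswith("h")), "")
    groups = []
    for count, atoms in MOBILE_H.findall(h_layer):
        n_hydrogens = int(count) if count else 1
        atom_numbers = [int(a) for a in atoms.strip(",").split(",")]
        groups.append((n_hydrogens, atom_numbers))
    return groups


tautomer_inchi = (
    "InChI=1S/C10H13N5O3/c11-8-7-9(14-10(17)13-8)15(4-12-7)6-2-1-5(3-16)18-6"
    "/h4-6,16H,1-3H2,(H3,11,13,14,17)/t5-,6+/m0/s1"
)
mesomer_inchi = (
    "InChI=1S/C16H18N3S.ClH/c1-18(2)11-5-7-13-15(9-11)20-16-10-12(19(3)4)6-8-14(16)17-13;"
    "/h5-10H,1-4H3;1H/q+1;/p-1"
)

print(mobile_h_groups(tautomer_inchi))
print(mobile_h_groups(mesomer_inchi))
```

The mesomer example has no mobile-H group at all, which fits the slide's distinction: there the two drawings differ only in bond and charge placement, not in hydrogen positions.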

Unreasonable

Multiple Overlapping [structure drawings 5-8 not reproduced]

Overlapping [structure drawings 9-11 not reproduced]

Registration

Registration Objectives
Corporate database
Stock room database
Predicting spectra
Reaction mechanisms
Ultra-low temperature lab

Registration Options
Enumerate all tautomers; store all tautomers
Calculate canonical tautomer; store canonical tautomer
Enumerate all tautomers; rank [as major, minor, or conditions dependent (ACD/Labs)]
Allow user to choose which form to store

Epik (Schrödinger)
Enumerate all energetically reasonable tautomers
Enumerate all energetically reasonable ionization states
Store all tautomers and ionization states
Canvas identifies duplicates by canonical SMILES

Are A and B Tautomers?
1. If A and B are identical, accept.
2. If the total number of hydrogen atoms or charges is not identical, reject.
3. Examine the heavy-atom skeletons; reject if not identical.
4. Enumerate all tautomers for A; if any is the same as B, accept.
5. Enumerate all tautomers for B; if any is the same as A, accept.
6. Otherwise reject.
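The decision procedure on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not any vendor's implementation: the `Mol` record, its fields, and the `enumerate_tautomers` callback are hypothetical stand-ins for whatever a real toolkit provides, and the toy enumerator simply looks up a hand-written tautomer family.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Mol:
    """Toy structure record; every field is a hypothetical stand-in."""
    canonical: str   # canonical string for this exact drawn form
    skeleton: str    # label for the heavy-atom skeleton, bond orders ignored
    h_count: int     # total number of hydrogen atoms
    charge: int = 0  # net charge


def are_tautomers(a: Mol, b: Mol,
                  enumerate_tautomers: Callable[[Mol], Iterable[Mol]]) -> bool:
    """Apply the slide's accept/reject steps in order, cheapest checks first."""
    if a.canonical == b.canonical:
        return True                      # identical structures: accept
    if a.h_count != b.h_count or a.charge != b.charge:
        return False                     # hydrogen count or charge differs: reject
    if a.skeleton != b.skeleton:
        return False                     # heavy-atom skeletons differ: reject
    if any(t.canonical == b.canonical for t in enumerate_tautomers(a)):
        return True                      # B is among the tautomers of A
    if any(t.canonical == a.canonical for t in enumerate_tautomers(b)):
        return True                      # A is among the tautomers of B
    return False                         # otherwise reject


# Hand-written toy data (assumed, for illustration): 2-hydroxypyridine and 2-pyridone
# form one family; 3-hydroxypyridine has a different skeleton.
HYDROXYPYRIDINE = Mol("Oc1ccccn1", "OC1CCCCN1", 5)
PYRIDONE = Mol("O=C1C=CC=CN1", "OC1CCCCN1", 5)
ISOMER = Mol("Oc1cccnc1", "OC1CCCNC1", 5)
FAMILY = {m.canonical: [HYDROXYPYRIDINE, PYRIDONE] for m in (HYDROXYPYRIDINE, PYRIDONE)}


def toy_enumerate(mol: Mol) -> list:
    """Return the known family of a structure, or just the structure itself."""
    return FAMILY.get(mol.canonical, [mol])


print(are_tautomers(HYDROXYPYRIDINE, PYRIDONE, toy_enumerate))
print(are_tautomers(HYDROXYPYRIDINE, ISOMER, toy_enumerate))
```

Enumerating from both A and B, as the slide prescribes, guards against an enumerator that is not symmetric, for example one that stops at a size limit or applies direction-dependent rules.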

Enumeration of tautomers
Sayle, R. A.; Delany, J. J. Canonicalization and enumeration of tautomers. Paper presented at EuroMUG99, Cambridge, UK, 28-29 Oct 1999.
Oellien, F.; Cramer, J.; Beyer, C.; Ihlenfeldt, W-D.; Selzer, P. M. The impact of tautomer forms on pharmacophore-based virtual screening. J. Chem. Inf. Model. 2006, 46, 2342-2354.
Greenwood, J. R.; Calkins, D.; Sullivan, A. P.; Shelley, J. C. Towards the comprehensive, rapid, and accurate prediction of the favorable tautomeric states of drug-like molecules in aqueous solution. J. Comput.-Aided Mol. Des. 2010, published online March 31, 2010.

Storage of Tautomers

Concept A
Generate all tautomers
Impossible to calculate lowest energy tautomer
Use rules for consistent generation [of a low energy form]
Store this form [as canonical SMILES]

Concept B
Generate all tautomers
Impossible to calculate lowest energy tautomer
Store all tautomers
[Store all protomers]
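The two storage concepts can be contrasted with a small Python sketch. Everything named here is an assumption made for illustration: `register_concept_a`, `register_concept_b`, and `toy_enumerate` are hypothetical, structures are plain strings, and the built-in `min` stands in for the consistent rule set that a real system would use to pick the stored form.

```python
from typing import Callable, Dict, Iterable


def register_concept_a(registry: Dict[str, str], structure: str,
                       enumerate_tautomers: Callable[[str], Iterable[str]],
                       pick_rule: Callable[[Iterable[str]], str]) -> str:
    """Concept A: store one consistently chosen form as the registry key."""
    forms = set(enumerate_tautomers(structure)) | {structure}
    key = pick_rule(forms)                 # same family always yields the same key
    registry.setdefault(key, structure)    # remember the first submitted drawing
    return key


def register_concept_b(index: Dict[str, int], structure: str,
                       enumerate_tautomers: Callable[[str], Iterable[str]],
                       new_id: int) -> int:
    """Concept B: store every tautomer, each pointing at one compound id."""
    forms = set(enumerate_tautomers(structure)) | {structure}
    for form in forms:
        if form in index:
            return index[form]             # some tautomer is already registered
    for form in forms:
        index[form] = new_id
    return new_id


# Hand-written toy family (assumed): the two forms of 2-pyridone.
TOY_FAMILY = ["O=C1C=CC=CN1", "Oc1ccccn1"]


def toy_enumerate(structure: str) -> list:
    return TOY_FAMILY if structure in TOY_FAMILY else [structure]


registry_a: Dict[str, str] = {}
key1 = register_concept_a(registry_a, "Oc1ccccn1", toy_enumerate, min)
key2 = register_concept_a(registry_a, "O=C1C=CC=CN1", toy_enumerate, min)

index_b: Dict[str, int] = {}
id1 = register_concept_b(index_b, "Oc1ccccn1", toy_enumerate, 1)
id2 = register_concept_b(index_b, "O=C1C=CC=CN1", toy_enumerate, 2)
print(key1 == key2, id1 == id2, len(registry_a), len(index_b))
```

In this sketch Concept A keeps the registry small but depends on the picking rule being stable across software versions, while Concept B spends storage to make exact lookup of any drawn form a plain dictionary hit.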

Structure Search

Structure Search
Exact matches done by flexmatch, SMILES, hashcodes, etc.
Substructure search: hard to perceive all tautomers for a substructure

Approaches to Substructure Search
Address problem at registration stage: store all tautomers
Address problem at search stage: enumerate database structures on the fly, or enumerate query structure, or user takes care specifying query
Combine methods
Ignore the problem

Depiction of Results

Depicting Results
Display input, registered structure
Display matched tautomer: good approach if substructure is highlighted
Display standard form
Let user choose
Experimental results match displayed tautomer

ChemAxon Approaches
Normalize the structure ("generic tautomers")
Allow for tautomers at search time
Choose a preferred tautomer
Customize preferences in Standardizer

ChemAxon Tautomerization Plugin
Generates all, dominant and canonical tautomers
Calculates canonical tautomer by empirical rules
Tries to make canonical tautomer the dominant tautomer (includes pKa filter)
Handles dearomatization and stereochemistry

Customization
Choose dominant tautomer
Set operating pH
Set maximum distance (# bonds) of a single proton migration
Protect structural features: aromaticity, charge, stereochemistry, stable functional groups
Exclude unstable antiaromatic compounds

ChemAxon software
Stores canonical form, or all tautomers
Enumerates query tautomers (as far as possible)
Usually displays structure originally input
Optionally displays standard tautomer

Observations
Computational chemistry companies: does the ligand match the receptor? Ligand preparation; pKa algorithms, rules, energetics; rigorous approaches
Informatics companies: does the compound match the patent? Building registries and inventories; graph theory; examples (structures); pragmatic approaches
Hybrid companies

Acknowledgments
All 28 vendors, including ChemAxon
Jonathan Brecher
Geoff Skillman
Russ Hillard
Keith Taylor