InChI/InChIKey vs. NCI/CADD Structure Identifiers: A comparison


InChI/InChIKey vs. NCI/CADD Structure Identifiers: A comparison
Markus Sitzmann, Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS
Comparison Standard InChI/InChIKeys - NCI/CADD Structure Identifiers

The Adaption and Use of the IUPAC InChI/InChIKey
Chemical Structure Lookup Service: 74 million structure records, 46 million unique structures
Identifiers: InChI/InChIKey, Std. InChI/InChIKey, NCI/CADD Identifiers (FICTS, FICuS, uuuuu)

Unique Representation of Chemical Structures
NCI/CADD Structure Identifiers are based on hashcodes calculated by the chemoinformatics toolkit CACTVS.
CACTVS hashcodes (example: 9850FD9F9E2B4E25):
- represent a chemical structure uniquely as a 16-digit hexadecimal number (64-bit unsigned)
- have a high sensitivity to structural features of a compound
- change if connectivity changes
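The "16-digit hexadecimal number (64-bit unsigned)" statement can be illustrated with a minimal Python sketch. The function names `hashcode_to_int` and `int_to_hashcode` are hypothetical helpers introduced here; they are not part of CACTVS, and the sketch only converts between the two representations without computing any hashcode.

```python
import string

HEX_DIGITS = set(string.hexdigits)

def hashcode_to_int(hashcode: str) -> int:
    """Interpret a 16-digit hexadecimal hashcode as an unsigned 64-bit integer."""
    if len(hashcode) != 16 or not set(hashcode) <= HEX_DIGITS:
        raise ValueError("expected exactly 16 hexadecimal digits")
    return int(hashcode, 16)

def int_to_hashcode(value: int) -> str:
    """Format an unsigned 64-bit integer as a 16-digit uppercase hex string."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit into 64 unsigned bits")
    return f"{value:016X}"

# Round trip with the example hashcode from the slide.
example = "9850FD9F9E2B4E25"
assert int_to_hashcode(hashcode_to_int(example)) == example
```

Sixteen hexadecimal digits carry exactly 64 bits, which is why any valid hashcode string fits into an unsigned 64-bit integer.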

[Figure: a compound and its related forms (tautomers, stereoisomers, salt, charged form, isotope-labeled form, drawing errors), each with its own CACTVS hashcode. Hashcodes shown, in slide order: 6C16DE2351F9FF50, E92E4BA2869F3611, 8A7AD1EB498CC76A, 3ECEF579D7DF025A, 9850FD9F9E2B4E25, A3DAE0788050DDE4, 8F7A1DE5A733F0E0, B2FDA68AEDA06DB9, 60525E1AF41497B6]

Unique Representation of Chemical Structures: NCI/CADD Structure Identifiers
Workflow:
- input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB)
- structure normalization
- parent structure
- hashcode calculation (E_HASHISY)
- NCI/CADD Identifier
Output: MDL SDF, SMILES, database

NCI/CADD Structure Identifiers: Structure Normalization
Adjustable levels of sensitivity. Each feature can be set to sensitive or un-sensitive:

Feature            Un-sensitive setting
Fragments          keep only largest organic fragment
Isotopes           ignore isotope labels
Charges            uncharge
Tautomers          find canonical tautomer
Stereochemistry    discard stereo information

NCI/CADD Structure Identifiers: Structure Normalization
[Figure: sensitivity switches for fragments, isotopes, charges, tautomers, and stereochemistry, each set to sensitive or un-sensitive]

NCI/CADD Structure Identifiers: Structure Normalization
FICTS identifier: representation of the exact drawing
Fragments (F), Isotopes (I), Charges (C), Tautomers (T), Stereochemistry (S): all sensitive

NCI/CADD Structure Identifiers: Structure Normalization
FICuS identifier: comes closest to how a chemist perceives a compound
Fragments (F), Isotopes (I), Charges (C), Stereochemistry (S): sensitive; Tautomers (u): un-sensitive

NCI/CADD Structure Identifier: Structure Normalization
uuuuu identifier: closely related forms of the same compound
Fragments, Isotopes, Charges, Tautomers, Stereochemistry: all un-sensitive (u)
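The tag naming scheme on the preceding slides (an uppercase letter where a feature is sensitive, `u` where it is not) can be decoded mechanically. The following Python sketch assumes only that convention; `tag_sensitivity` is a hypothetical helper name and not part of CACTVS.

```python
FEATURES = ("fragments", "isotopes", "charges", "tautomers", "stereochemistry")
LETTERS = "FICTS"  # one letter per feature, in the same order

def tag_sensitivity(tag: str) -> dict[str, bool]:
    """Map an identifier tag such as 'FICuS' to per-feature sensitivity flags."""
    if len(tag) != len(LETTERS):
        raise ValueError("tag must have exactly five characters")
    flags = {}
    for feature, letter, char in zip(FEATURES, LETTERS, tag):
        if char == letter:
            flags[feature] = True    # feature is kept as drawn (sensitive)
        elif char == "u":
            flags[feature] = False   # feature is normalized away (un-sensitive)
        else:
            raise ValueError(f"unexpected character {char!r} for {feature}")
    return flags

# FICuS is sensitive to everything except tautomers.
print(tag_sensitivity("FICuS"))
```

Under this reading, `FICTS` yields five `True` flags and `uuuuu` five `False` flags, matching the three identifier levels described above.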

NCI/CADD Structure Identifier: Structure Normalization
[Flowchart: from the input structure to the parent structures]
Steps shown in the flowchart:
- correct structure: add hydrogen atoms, correct functional groups, correct metal atom bonds
- get largest fragment & uncharge: delete complex center, get largest organic fragment, delete radical center, uncharge structure
- discard isotope labels
- define canonical resonance form/protonation state
- define canonical tautomer
- normalize or discard stereo information (branches labeled n and d)
Resulting identifiers: FICTS, FICTu, FICuS, FICuu, uuuTS, uuuTu, uuuuS, uuuuu

NCI/CADD Structure Identifier
Examples:
9850FD9F9E2B4E25-FICTS-01-57
9850FD9F9E2B4E25-FICuS-01-78
9850FD9F9E2B4E25-uuuuu-01-27
Format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>
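The four-part identifier format can be split with a regular expression, as in the Python sketch below. The field widths for version and checksum (two decimal digits each) are an assumption inferred from the three examples on the slide, and the checksum is not verified because the slide does not state how it is calculated. `StructureIdentifier` and `parse_identifier` are hypothetical names.

```python
import re
from typing import NamedTuple

class StructureIdentifier(NamedTuple):
    hashcode: str   # 16 hexadecimal digits
    tag: str        # e.g. FICTS, FICuS, uuuuu
    version: str    # assumed: two decimal digits
    checksum: str   # assumed: two decimal digits, not verified here

# Assumed layout: <hashcode>-<tag>-<version>-<checksum>
_PATTERN = re.compile(r"([0-9A-F]{16})-([FICTSu]{5})-(\d{2})-(\d{2})")

def parse_identifier(text: str) -> StructureIdentifier:
    """Split an identifier string into its four fields."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognized structure identifier: {text!r}")
    return StructureIdentifier(*match.groups())

ident = parse_identifier("9850FD9F9E2B4E25-FICuS-01-78")
print(ident.hashcode, ident.tag, ident.version, ident.checksum)
```

The tag pattern here is deliberately loose (any five characters from `FICTSu`); a stricter check on letter positions could be layered on top if needed.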

[Figure: FICTS identifiers for the compound and its related forms (tautomers, stereoisomers, salt, charged form, isotope-labeled form, drawing errors). Identifiers shown, in slide order: 6C16DE2351F9FF50-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, E5F83F10C5DB080A-FICTS, 9850FD9F9E2B4E25-FICTS, A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS]

[Figure: FICuS identifiers for the compound and its related forms (tautomers, stereoisomers, salt, charged form, isotope-labeled form, drawing errors). Identifiers shown, in slide order: 9850FD9F9E2B4E25-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, E5F83F10C5DB080A-FICuS, 9850FD9F9E2B4E25-FICuS, A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS]

[Figure: uuuuu identifiers for the compound and its related forms (tautomers, stereoisomers, salt, charged form, isotope-labeled form, drawing errors). Eight of the nine labels read 9850FD9F9E2B4E25-uuuuu; the last label on the slide reads 9850FD9F9E2B4E25-FICuS]

[Figure: Std. InChIKeys for the compound and its related forms (tautomers, stereoisomers, salt, charged form, isotope-labeled form, drawing errors). Keys shown, in slide order: DVDQJCIGZP-UFFFAYSA-, DVDQJCIGZP-RXMQYKEDSA-, DVDQJCIGZP-YFKPBYRVSA-, UPKBYGGMJTIM-UFFFAYSA-M, DVDQJCIGZP-UFFFAYSA-, DVDQJCIGZP-UFFFAYSA-, UPKBYGGMJTIM-UFFFAYSA-M, DVDQJCIGZP-UFFFAYSA-, DVDQJCIGZP-CDYZYAPPSA-]

Structure Normalization: Tautomers
canonical tautomer?

Structure Normalization: Tautomers
CACTVS: generation of all formal tautomers for a given organic compound (prototropic tautomerism)
Rule set of 21 transforms encoded as (CACTVS-extended) SMIRKS
Types of tautomerism covered:
- 1.3, 1.5 keto/enol
- imine/enamine
- imine/amine
- lactam/lactim
- 1.3, 1.5, 1.7, 1.11 hydrogen atom shift on (aromatic) heteroatoms
- ketene/ynol
- nitro/aci-nitro
- nitroso/oxime
- special cases: cyanic/iso-cyanic acid, phosphonic acid, formamidinesulfonic acid, isocyanide, furanones, and more

Structure Normalization: Tautomers
21 SMIRKS transforms, examples:
transform: 1.3 keto-enol
[,S,Se,Te;X1:1]=[Cx1:2][CX4R{0-2}:3][#1:4]>> [#1:4][,S,Se,Te;X2:1][Cx1,cx1:2]=[C,cx1,cx0:3]
transform: 1.3 heteroatom shift
[,n,s,s,,o,se,te:1]=[x2,nx2,c,c,p,p:2] [,n,s,,se,te:3][#1:4]>>[#1:4][,n,s,,se,te:1] [X2,nX2,C,c,P,p:2]=[,n,S,s,,o,Se,Te:3]
transform: 1.5 heteroatom shift
[nx2,x2,s,,se,te:1]=[c,c,nx2,x2:6][c,c:5]=[c,c,nx2:2] [,n,s,s,,o,se,te:3][#1:4]>>[#1:4][,n,s,,se,te:1] [C,c,nX2,X2:6]=[C,c:5][C,c,nX2:2]=[X2,S,,Se,Te:3]

Structure Normalization: Tautomers
guanine
[Figure: 15 formal tautomers of guanine, each with its own FICTS identifier]
A6199E68A788F2F5-FICTS
67196F0B20B1D934-FICTS
D979CF9770AC0BA5-FICTS
675R4FCC50F45026-FICTS
959B273B619C709F-FICTS
1AD375920BE60DAD-FICTS
BCCDA7D0CDACF120-FICTS
CE8F480C11DBFC4F-FICTS
61248C4A7D045A47-FICTS
0B345B47F6625113-FICTS
181CA9BCE3EF47F4-FICTS
D46A1E6500B06AB6-FICTS
56FFE8B5619FB01-FICTS
F802E527EC5C61BF-FICTS
EF060DA9D97091DE-FICTS
All tautomers share one Std. InChIKey and one FICuS identifier:
UYTPUPDQBNUYGX-UHFFFAOYSA-N
BCCDA7D0CDACF120-FICuS

Structure Normalization: Tautomerism & Stereochemistry
methyl propenyl ketone
[Figure: E and Z isomers of methyl propenyl ketone and their tautomers; FICuS identifier shown: 76D03F08ACDF6C0C-FICuS]
FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation.

InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry
methyl propenyl ketone
No double-bond stereo layer:
InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3
LABTWGUMFABVFG-UHFFFAOYSA-N
E isomer:
InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3+
LABTWGUMFABVFG-ONEGZZNKSA-N
Tautomer:
InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4,6H,1H2,2H3/b5-4-
LYGWZVQSCPYDG-PLGDYQASA-
Z isomer:
InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3-
LABTWGUMFABVFG-ARJAWSKDSA-N
FICuS identifier: 76D03F08ACDF6C0C-FICuS
FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation.

InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry
methyl propenyl ketone
FICTS sees four different structures:
- E isomer: 821D8C17ACE5040E-FICTS
- tautomers: 76D03F08ACDF6C0C-FICTS, 6EB4AA2BAA11965F-FICTS
- Z isomer: 1677645190718885-FICTS

Structure Normalization: Charges in Resonance Systems
canonical resonance structure?
[Figure: two resonance structures of a charged compound are uncharged separately; hashcodes shown: F3A27F03AE77A722 with 62FADCB01F197FC9, and F3A27F03AE77A722 with 2E011EE4519F7920]
Problem: uncharging leads to different protonation states.

Structure Normalization: Charges in Resonance Systems
Generation of all formal resonance structures for a given (charged) organic compound
Rule set of 14 transforms encoded as (CACTVS-extended) SMIRKS:
- shifting of charges: 5 rules
- recombination of charges: 5 rules
- separation of charges: 4 rules

Structure Normalization: Charges in Resonance Systems
münchnones (no plausible unpolarized resonance structure can be drawn)
[Figure: network of resonance structures connected by transforms labeled, in slide order: 1.2 recombination, separation (pentavalent atom), 1.3 shift, 1.2 recombination, 1.3 shift, 1.2 shift, 1.3 recombination, 1.3 shift, 1.3 shift, 1.3 shift, 1.3 shift]
Identifiers shown: IUYUGWCTLFFCL-UFFFAYSA-, F68AC07DE0D3379F-FICuS

InChI/InChIKey - NCI/CADD Identifier comparison: »Chemical Structure Lookup Service« Database
- PubChem database (including Open NCI database, EPA DSSTox databases, NIAID HIV databases, NIST Webbook, NLM ChemIDplus, ChemSpider, ...)
- ChemNavigator iResearch Library (compilation of commercially available screening compounds from ~250 international chemistry suppliers)
- Commercial Sources / Others (Asinex, Comgenex, ...)
Shares of structure records: PubChem ~47%, ChemNavigator iResearch Library ~43%, Others ~10%
74 million structure records (~46 million unique structures)

InChI/InChIKey - NCI/CADD Identifier comparison: Unique Structure Counts
Structure records registered in CSLS: 74.2 million
Successful calculation of:
- Standard InChI/InChIKey: 73.8 million records
- NCI/CADD Structure Identifiers: 73.7 million records
Unique structure counts (compound sets):

Identifier                          Unique structures
Standard InChI/InChIKey             48,027,940
FICTS Identifier                    48,023,835
FICuS Identifier                    46,715,521
Standard InChIKey (first block)     43,055,589
uuuuu Identifier                    41,671,010

Standard InChI/InChIKeys were calculated by stdinchi-1 (Linux i-386 executable) from the original SD file records.
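The difference between counting by full Standard InChIKey and by its first block can be shown on a toy list. A Standard InChIKey consists of a 14-letter first block (derived from the connectivity part of the InChI), a second block, and a final protonation character; the sketch below relies only on that layout. `first_block` and `unique_counts` are hypothetical helper names, and the four keys are used purely as sample input.

```python
def first_block(inchikey: str) -> str:
    """Return the 14-letter first block of an InChIKey."""
    block = inchikey.partition("-")[0]
    if len(block) != 14 or not (block.isalpha() and block.isupper()):
        raise ValueError(f"unexpected InChIKey first block: {block!r}")
    return block

def unique_counts(inchikeys):
    """Count unique full keys and unique first blocks in a collection."""
    full_keys = set(inchikeys)
    blocks = {first_block(key) for key in full_keys}
    return len(full_keys), len(blocks)

sample = [
    "LABTWGUMFABVFG-UHFFFAOYSA-N",  # pent-3-en-2-one, no stereo layer
    "LABTWGUMFABVFG-ONEGZZNKSA-N",  # E isomer
    "LABTWGUMFABVFG-ARJAWSKDSA-N",  # Z isomer
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",  # ethanol
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",  # duplicate record of ethanol
]
print(unique_counts(sample))
```

Stereoisomers collapse onto one first block, which is the effect behind the lower "Standard InChIKey (first block)" count in the table.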

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison
Data sets:
- original structure record set (74.2 million)
- Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique)
- FICuS compound set (46.7 million unique)
Questions:
1. Conflicts? Do the FICuS compound set and the Standard InChI/InChIKey set group the structure records differently?
2. Same InChI/InChIKey? Does the Standard InChI/InChIKey calculated by CACTVS from the FICuS compound structure match the one calculated by stdinchi-1?

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison 1
No conflicts between Std. InChI/InChIKey and FICuS

                                                  structure records (million)
all structure records                             73.7
FICuS linked to a single InChI/InChIKey           62.3 (84.5%)
  both linked to a single structure record        34.4 (46.9%)
  both linked to multiple structure records       27.9 (38.0%)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison 1
Conflicts between Std. InChI/InChIKey and FICuS

                                                              structure records (million)
all structure records                                         73.7
FICuS is linked to multiple InChI/InChIKeys or vice versa     10.9 (14.7%)
  one FICuS is linked to multiple InChI/InChIKeys             6.8 (9.2%)
    number for InChIKey first block                           2.3 (3.1%)
  one InChI/InChIKey is linked to multiple FICuS              4.1 (5.5%)
    number for InChIKey first block                           1.0 (1.3%)
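One way to arrive at conflict categories like these is to build both mappings (FICuS to InChIKeys and InChIKey to FICuS) and label each structure record. The Python sketch below is an illustration under stated assumptions, not the procedure used for the slides: the precedence of the two conflict labels when both apply is an arbitrary choice made here, and the identifiers `F1`, `K1`, and so on are placeholders.

```python
from collections import defaultdict

def classify_conflicts(records):
    """Label each (ficus, inchikey) record by how the two identifiers agree."""
    records = list(records)
    keys_per_ficus = defaultdict(set)
    ficus_per_key = defaultdict(set)
    for ficus, key in records:
        keys_per_ficus[ficus].add(key)
        ficus_per_key[key].add(ficus)

    labels = []
    for ficus, key in records:
        if len(keys_per_ficus[ficus]) > 1:
            # Assumed precedence: this label wins if both conflicts apply.
            labels.append("one FICuS, multiple InChIKeys")
        elif len(ficus_per_key[key]) > 1:
            labels.append("one InChIKey, multiple FICuS")
        else:
            labels.append("no conflict")
    return labels

# Placeholder identifiers, one tuple per structure record.
toy_records = [
    ("F1", "K1"), ("F1", "K1"),   # both identifiers agree on the grouping
    ("F2", "K2"), ("F2", "K3"),   # FICuS merges what InChIKey separates
    ("F3", "K4"), ("F4", "K4"),   # InChIKey merges what FICuS separates
]
for record, label in zip(toy_records, classify_conflicts(toy_records)):
    print(record, label)
```

Repeating the same classification with the InChIKey truncated to its first block would give the analogue of the "first block" rows.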

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison 2
Same InChI/InChIKey?

                                   compounds (unique structures, million)   structure records (million)
Identifier                         all compounds   InChI changes            all records   InChI changes
FICTS                              48.0            3.8 (7.9%)               73.7          4.6 (6.2%)
FICuS                              46.7            6.4 (13.7%)              73.7          9.3 (12.7%)
uuuuu                              41.6            11.9 (28.6%)             73.7          21.9 (29.7%)
uuuuu vs. InChIKey first block     41.6            3.2 (7.6%)               73.7          6.3 (8.4%)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

Compound classification          Occurrence in FICuS set   Occurrence in FICuS subset (InChI changes)
(formal) tautomer count > 1      56.4%                     14.5%
(formal) tautomer count > 3      25.4%                     18.5%
(formal) tautomer count > 10     5.5%                      28.9%
full stereo                      25.7%                     16.9%
contains metal atoms             0.8%                      34.5%
metal complexes                  0.2%                      52.1%
salt                             1.0%                      18.6%
has resonance charges            0.2%                      52.1%
inorganic                        0.1%                      33.9%

InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215
- FICuS: 12 different structure records are linked to this structure.
- Std. InChI/InChIKey (stdinchi-1): calculates 3 different strings/keys for these 12 structure records (all have the same connectivity layer/first block).
- All 3 of these StdInChI/InChIKeys differ from the StdInChI/InChIKey calculated after FICuS normalization (including the connectivity layer/first block).

InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215
[Figure: the compound drawn as E and Z isomers, as a tautomer, and as R and S forms. Labels on the arrows between the forms: tautomeric interconversion?]

InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215
How many structures?
[Figure: E, Z, R, S, and tautomer forms with the question "tautomeric interconversion?" between them. Source records shown: ZINC04685909, ChemBlock A3422/0145215, ChemNavigator 47748165, NIST MS-Lib 1967005690, ChemNavigator 65635274, ChemNavigator 34903393]

InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215
How many structures?
[Figure: E, Z, R, S, and tautomer forms. The tautomer is marked as the FICuS parent structure. The other forms are assigned InChIKey A, InChIKey B, and InChIKey C, which share the same connectivity layer/block]

The Adaption and Use of the IUPAC InChI/InChIKey
Chemical Structure Lookup Service: 74 million structure records, 46 million unique structures
http://cactus.nci.nih.gov/lookup
Identifiers: InChI/InChIKey, Std. InChI/InChIKey, NCI/CADD Identifiers (FICTS, FICuS, uuuuu)

Web Service: Chemical Structure REST Service (beta)
URL scheme: http://cactus.nci.nih.gov/chemical/structure/{identifier}/{method}
Examples:
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/names
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/ficus
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/stdinchi
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/image
http://cactus.nci.nih.gov/chemical/structure/ethanol/stdinchikey
http://cactus.nci.nih.gov/chemical/structure/64-17-5/stdinchikey
Returns plain text or a GIF image. If the structure identifier is not resolvable: HTTP 404 status code.
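A request against the URL scheme above can be sketched in Python with the standard library alone. `build_url` and `resolve` are hypothetical helper names; the sketch assumes the service is reachable at the address given on the slide and that text methods return UTF-8 plain text. Only the URL construction is exercised without network access.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"

def build_url(identifier: str, method: str) -> str:
    """Fill the {identifier}/{method} URL scheme, percent-encoding both parts."""
    # '=' is kept literal so that prefixed identifiers stay readable.
    return f"{BASE_URL}/{quote(identifier, safe='=')}/{quote(method, safe='')}"

def resolve(identifier: str, method: str) -> Optional[str]:
    """Fetch a plain-text result; return None if the identifier is not resolvable."""
    try:
        with urlopen(build_url(identifier, method), timeout=30) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as error:
        if error.code == 404:   # documented answer for unresolvable identifiers
            return None
        raise

print(build_url("ethanol", "stdinchikey"))
print(build_url("64-17-5", "stdinchikey"))
```

Calling `resolve("ethanol", "stdinchikey")` would perform the actual lookup; the `image` method returns binary data and would need `response.read()` without decoding.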

Acknowledgments
- CADD Group, LMC, NCI: Marc Nicklaus, Igor V. Filippov
- ChemNavigator: Scott Hutton, Tad Hurst
- CACTVS, Xemistry GmbH: Wolf-Dietrich Ihlenfeldt
- Thanks to all database providers
- Thanks to the InChI Team
Our web site: http://cactus.nci.nih.gov